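Displaying Taobao WangWang and QQ online-status icons on a web page
Publish your Taobao WangWang status on the internet: by clicking the "WangWang icon", others can contact you anytime and anywhere to buy and sell items and make friends on Taobao. "WangWang everywhere" (旺遍天下) gives you a more convenient Taobao experience.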
Link: http://www.taobao.com/help/wangwang/wangwang_0628_04.php
Generating a QQ status icon online
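Scrapy
Scrapy is a crawler framework written in Python. It is simple, lightweight, and very convenient to use, and according to the official site it is already used in production, so it is not a toy. http://scrapy.org/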
Latest stable release: 0.8
The latest stable release is available from several sources:
- Download the source tarball: Scrapy-0.8.tar.gz
- Download the Windows installer: Scrapy-0.8.win32.exe
- Scrapy 0.8 on PyPI
  - install with: easy_install -U Scrapy
- Ubuntu PPA for Scrapy 0.8 (maintained by Insophia, the company behind Scrapy)
  - after adding PPA, install with: apt-get install scrapy
  - the PPA version also includes post-release bug fixes
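Python crawlers and data mining
Python modules worth studying if you plan to write a crawler in Python:
1 The built-in urllib and urllib2 libraries, for fetching data
2 BeautifulSoup, for data cleaning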
http://www.crummy.com/software/BeautifulSoup/
Encoding rules
Beautiful Soup tries the following encodings, in order of priority, to turn your document into Unicode:
1 An encoding you pass in as the fromEncoding argument to the soup constructor.
2 An encoding discovered in the document itself: for instance, in an XML declaration or (for HTML documents) an http-equiv META tag. If Beautiful Soup finds this kind of encoding within the document, it parses the document again from the beginning and gives the new encoding a try. The only exception is if you explicitly specified an encoding, and that encoding actually worked: then it will ignore any encoding it finds in the document.
3 An encoding sniffed by looking at the first few bytes of the file. If an encoding is detected at this stage, it will be one of the UTF-* encodings, EBCDIC, or ASCII.
4 An encoding sniffed by the chardet library, if you have it installed.
5 UTF-8
6 Windows-1252
You can pass the fromEncoding argument when constructing BeautifulSoup, for example for GBK-encoded input:
soup = BeautifulSoup(gbk_html, fromEncoding="gbk")
3 The Python chardet library, for character-encoding detection
http://chardet.feedparser.org/download/
4 The more powerful selenium
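The encoding fallback order quoted above for Beautiful Soup can be sketched with the standard library alone. This is a minimal Python 3 illustration of the idea, not Beautiful Soup's actual implementation: the names candidate_encodings and decode_html are hypothetical, the chardet step and EBCDIC sniffing are left out, and the in-document lookup is a rough regular expression rather than a real parser.

```python
import codecs
import re

# Byte-order marks to sniff; UTF-32 must be checked before UTF-16
# because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Rough match for charset=... (META tag) or encoding=... (XML declaration).
_DECLARED_RE = re.compile(
    rb"(?:charset|encoding)\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.I
)


def candidate_encodings(data, explicit=None):
    """Yield encodings to try, in priority order (hypothetical helper)."""
    if explicit:                       # 1. encoding passed by the caller
        yield explicit
    match = _DECLARED_RE.search(data[:2048])
    if match:                          # 2. encoding declared in the document
        yield match.group(1).decode("ascii")
    for bom, name in _BOMS:            # 3. encoding sniffed from the first bytes
        if data.startswith(bom):
            yield name
            break
    yield "utf-8"                      # 5. UTF-8 (step 4, chardet, is omitted)
    yield "windows-1252"               # 6. Windows-1252


def decode_html(data, explicit=None):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for encoding in candidate_encodings(data, explicit):
        try:
            return data.decode(encoding), encoding
        except (LookupError, UnicodeDecodeError):
            continue
    # Last resort: never fail, replace undecodable bytes.
    return data.decode("utf-8", errors="replace"), "utf-8"
```

For a page fetched as raw bytes, decode_html(raw, explicit="gbk") plays the same role as passing fromEncoding to the soup constructor: the explicit encoding is tried first and the other candidates are only used if it fails.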
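Quickly fixing the nginx FastCGI configuration vulnerability
On May 20, 2010, "80后" disclosed an nginx 0-day vulnerability: by uploading an image, an attacker could compromise as many as one million servers. Several large internet companies have already been compromised, including e-commerce, gaming, and SNS companies.
It now appears that this is not a vulnerability in nginx itself but a configuration issue, even though it is being reported everywhere as an nginx bug. Disabling cgi.fix_pathinfo (which is enabled by default) resolves it.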
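The temporary fixes are as follows; choose any one of the three.
1. Set cgi.fix_pathinfo to 0 in php.ini and restart PHP. This is the most convenient option, but you need to assess the impact of the change yourself.
2. Add the following to the nginx vhost configuration and restart nginx. This is also convenient when there are only a few vhosts.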
if ( $fastcgi_script_name ~ \..*\/.*php ) {
return 403;
}
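3. Forbid upload directories from executing PHP. This does not require touching the web server, but with many vhosts and servers the difficulty rises sharply in the short term, so it is recommended only when there are few vhosts and servers.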
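To see which requests the regular expression in option 2 would reject, the same pattern can be tried out in Python. This is only a sketch for exploring the pattern: is_blocked is a hypothetical helper, the sample paths are made up, and it assumes that Python's re.search behaves like nginx's case-sensitive `~` match for this simple expression.

```python
import re

# Same pattern as the nginx rule. The "\/" in the nginx config is just "/".
# nginx's `~` operator is an unanchored, case-sensitive regex match,
# which corresponds to re.search here.
BLOCK_RE = re.compile(r"\..*/.*php")


def is_blocked(script_name):
    """Return True if the rule would answer 403 for this script name."""
    return BLOCK_RE.search(script_name) is not None


if __name__ == "__main__":
    samples = [
        "/upload/avatar.jpg/x.php",  # the attack shape: a "file" below an image
        "/index.php",                # normal script, no dot before a slash
        "/blog/index.php",           # normal script in a subdirectory
        "/v1.2/index.php",           # legitimate, but has a dot in a directory name
    ]
    for path in samples:
        print(path, "->", "403" if is_blocked(path) else "pass")
```

The pattern fires whenever a dot appears somewhere before a slash that is later followed by "php", so the attack path is rejected, but so is a legitimate script under a directory whose name contains a dot. That side effect is worth checking against your own URL layout before deploying the rule.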
Learning Django, a Python web framework: templates
Changing Django's URL rules and basic template operations
We’ll just have to take a few steps to make the conversion. We will:
1. Convert the URLconf.
2. Rename a few templates.
3. Delete some of the old, unneeded views.
4. Fix up URL handling for the new views.
Learning Django, a Python web framework, part 2: the admin site
I. Activate the admin site
1 Add “django.contrib.admin” to your INSTALLED_APPS setting.
2 Run python manage.py syncdb. Since you have added a new application to INSTALLED_APPS, the database tables need to be updated.
3 Edit your mysite/urls.py file and uncomment the lines that reference the admin – there are three lines in total to uncomment. This file is a URLconf; we’ll dig into URLconfs in the next tutorial. For now, all you need to know is that it maps URL roots to applications. In the end, you should have a urls.py file that looks like this:
Learning Django, a Python web framework
1 Download
bear@njava:~$ wget http://www.djangoproject.com/download/1.2/tarball/
bear@njava:~$ tar -xzvf Django-1.2.tar.gz
bear@njava:~$ cd Django-1.2/
bear@njava:~$ sudo python setup.py install
2 Create a new Django project
bear@njava:~$ django-admin.py startproject njava
bear@njava:~$ cd njava
bear@njava:~$ ls
total 20K
drwxr-xr-x 2 bear bear 4.0K 2010-05-19 23:27 .
drwxr-xr-x 4 bear bear 4.0K 2010-05-19 23:27 ..
-rw-r--r-- 1 bear bear    0 2010-05-19 23:27 __init__.py
-rw-r--r-- 1 bear bear  546 2010-05-19 23:27 manage.py
-rw-r--r-- 1 bear bear 3.3K 2010-05-19 23:27 settings.py
-rw-r--r-- 1 bear bear  534 2010-05-19 23:27 urls.py
bear@njava:~$ python manage.py runserver 0.0.0.0:8000
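Managing a Git service with gitosis on Ubuntu
Git is a distributed version control system (Distributed SCM) that Linus Torvalds developed to help manage Linux kernel development. Git draws on Torvalds's experience maintaining large distributed projects and on his deep knowledge of file system performance. As its documentation puts it, Git is "a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals." Open source projects such as the Linux kernel, the X.org server, and Ruby on Rails have already switched their version control to Git.
Gitosis is a tool developed by Tommi Virtanen to make setting up and managing Git software repositories easier and more secure. Git itself provides git-daemon for serving repositories, but its user access control is not strict. Gitosis lets a single user account manage multiple repositories and uses SSH keys for user authentication, so multiple users can access a central repository without needing shell accounts.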