Python HTTP Libraries: the requests Module
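I. Introduction
requests is a Python library for sending HTTP requests. Its API is concise, and it is more convenient to use than urllib. Under the hood it is built on urllib3.
- After requests downloads a page, it does not execute any of the page's JavaScript. We have to analyze the target site ourselves and send any follow-up requests the page would have made.
- Before studying requests in earnest, it is worth getting familiar with the HTTP protocol.
- GET is the most commonly used request method.
- The other request methods are shown below: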
import requests

r = requests.get('')
r = requests.post('', data={'key': 'value'})
r = requests.put('', data={'key': 'value'})
r = requests.delete('')
r = requests.head('')
r = requests.options('')
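Each helper above is a thin wrapper that forwards to requests.request(method, url, **kwargs). The following minimal sketch shows what each helper would send without touching the network. It builds requests.Request objects and calls .prepare() on them instead of sending them. The URL https://example.com/api is a placeholder and is not taken from the original.

```python
import requests

# Placeholder URL (assumption): requests are only prepared, never sent
URL = 'https://example.com/api'

prepared = {}
for method in ('GET', 'POST', 'PUT', 'DELETE', 'HEAD', 'OPTIONS'):
    # Only POST and PUT carry a form body in the examples above
    data = {'key': 'value'} if method in ('POST', 'PUT') else None
    p = requests.Request(method, URL, data=data).prepare()
    prepared[method] = p
    print(p.method, p.url, p.body)
```

The POST and PUT entries get the body key=value, and the other methods have no body.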
II. GET Requests
1. Basic request
import requests

response = requests.get('')
print(response.text)
2. GET request with parameters
Building the query string by hand
import requests

# The query parameters are written directly into the URL
response = requests.get('', headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
})
print(response.text)
URL encoding
If the query keyword contains Chinese or other special characters, it must be URL-encoded before it is spliced into the URL.
import requests
from urllib.parse import urlencode

wd = 'jerry老师'
# urlencode returns 'k=<encoded keyword>'; keep only the part after '='
encode_res = urlencode({'k': wd}, encoding='utf-8')
keyword = encode_res.split('=')[1]

# Splice the encoded keyword into the query string
url = f'{keyword}&pn=1'
response = requests.get(url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
})
print(response.text)
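Splitting the output of urlencode on '=' works, but the standard library can also encode a single value directly. The following minimal sketch uses urllib.parse.quote, which encodes as UTF-8 by default, and compares it with the urlencode trick above:

```python
from urllib.parse import quote, urlencode

wd = 'jerry老师'

# quote() percent-encodes a single value as UTF-8
direct = quote(wd)

# The urlencode-and-split approach from the example above
via_urlencode = urlencode({'k': wd}, encoding='utf-8').split('=')[1]

print(direct)                  # jerry%E8%80%81%E5%B8%88
print(direct == via_urlencode) # True
```

The two approaches differ only for spaces. quote() produces %20, while urlencode() uses quote_plus and produces +.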
Using params
Pass a dict through params and requests will build the query string and URL-encode it for you.
import requests

wd = 'jerry老师'
pn = 1

response = requests.get('', params={
    'wd': wd,
    'pn': pn,
}, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
})
print(response.text)
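To see the URL that params produces without sending anything, you can prepare the request instead. In this sketch, https://www.example.com/s is a placeholder base URL that stands in for the elided one.

```python
import requests

wd = 'jerry老师'
pn = 1

# Placeholder base URL (assumption); .prepare() builds the final URL offline
prepared = requests.Request('GET', 'https://www.example.com/s',
                            params={'wd': wd, 'pn': pn}).prepare()
print(prepared.url)  # https://www.example.com/s?wd=jerry%E8%80%81%E5%B8%88&pn=1
```

The keyword is encoded exactly as in the manual urlencode example, without any string splitting.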
3. GET request with headers
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36',
}

response = requests.get('', headers=headers)
print(response.status_code)
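Why set a User-Agent at all? Without one, requests identifies itself as python-requests/<version>, and some sites block or treat that value differently. The following offline check uses requests.utils.default_headers() together with a Session's header merging. The URL is a placeholder.

```python
import requests

# Headers requests sends when you do not override them
defaults = requests.utils.default_headers()
print(defaults['User-Agent'])   # e.g. python-requests/2.x.y

# Per-request headers are merged over the defaults; the custom value wins
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N)'}
prepared = requests.Session().prepare_request(
    requests.Request('GET', 'https://example.com/', headers=headers))  # placeholder URL
print(prepared.headers['User-Agent'])
```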
4. GET request with cookies
import requests

cookies = {
    'sessionid': 'abc123',
}

response = requests.get('', cookies=cookies)
print(response.text)
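The cookies dict is sent as an ordinary Cookie request header. A minimal offline check follows, using a placeholder URL:

```python
import requests

cookies = {'sessionid': 'abc123'}

# Preparing the request turns the dict into a Cookie header without sending anything
prepared = requests.Request('GET', 'https://example.com/', cookies=cookies).prepare()
print(prepared.headers['Cookie'])   # sessionid=abc123
```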
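III. POST Requests
1. Introduction
GET requests:
- GET is the default request method.
- A GET request has no request body.
- The amount of data is limited. HTTP itself sets no fixed maximum, but browsers and servers cap URL length, so the data should stay small (it is often quoted as under 1 KB).
- The data is visible in the browser's address bar.
- Typical ways a GET request is made:
  - Typing a URL directly into the address bar.
  - Clicking a hyperlink on a page.
  - Submitting a form, which uses GET by default but can be set to POST.
POST requests:
- The data does not appear in the address bar.
- HTTP puts no upper limit on the data size, although servers may impose their own.
- A POST request has a request body.
- If a form-encoded request body contains Chinese characters, they are URL-encoded.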
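requests.post() is used exactly like requests.get(). The only difference is that it also accepts a data parameter, which holds the request body.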
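You can check the GET/POST contrast above offline by preparing the same dict both ways. With params, the data ends up in the URL. With data, it becomes a URL-encoded form body. This sketch uses a placeholder URL.

```python
import requests

payload = {'wd': 'jerry老师', 'pn': 1}
url = 'https://example.com/search'   # placeholder URL; nothing is sent

get_req = requests.Request('GET', url, params=payload).prepare()
post_req = requests.Request('POST', url, data=payload).prepare()

# GET: the data travels in the URL and there is no body
print(get_req.url)    # https://example.com/search?wd=jerry%E8%80%81%E5%B8%88&pn=1
print(get_req.body)   # None

# POST: the URL is unchanged and the data is in a URL-encoded body
print(post_req.url)   # https://example.com/search
print(post_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(post_req.body)  # wd=jerry%E8%80%81%E5%B8%88&pn=1
```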
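2. Sending a POST request
Example: logging in to GitHub automatically
import requests
import re

# Step 1: GET the login page to obtain the initial cookies and the authenticity_token
r1 = requests.get('')
r1_cookie = r1.cookies.get_dict()
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text)[0]

# Step 2: POST the login form together with the cookies from step 1
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': 'your_username',
    'password': 'your_password',
}
r2 = requests.post('', data=data, cookies=r1_cookie)
login_cookie = r2.cookies.get_dict()

# Step 3: request a page with the login cookies and check that we are logged in
r3 = requests.get('', cookies=login_cookie)
print('your_username' in r3.text)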
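The authenticity_token line is the fragile part of this flow, because it depends on the attribute order in the page's HTML. The following sketch runs the same regular expression against a hypothetical HTML fragment (not GitHub's real markup) to show what the expression captures:

```python
import re

PATTERN = r'name="authenticity_token".*?value="(.*?)"'

# Hypothetical login-form fragment; real pages may order attributes differently
html = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="tok123+/=" />
  <input type="text" name="login" />
</form>
'''

matches = re.findall(PATTERN, html)
token = matches[0] if matches else None   # avoid IndexError when the field is missing
print(token)   # tok123+/=

# If value= came before name=, the same pattern would find nothing
reordered = '<input type="hidden" value="tok123" name="authenticity_token" />'
print(re.findall(PATTERN, reordered))   # []
```

For anything beyond a quick script, an HTML parser is more robust than a regular expression for this lookup.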
Using requests.Session
A Session manages cookies automatically. Cookies set by any response are stored and sent with later requests from the same session.
import requests
import re

session = requests.Session()

r1 = session.get('')
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text)[0]

data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': 'your_username',
    'password': 'your_password',
}
r2 = session.post('', data=data)
login_cookie = r2.cookies.get_dict()  # for inspection only; the session already holds these cookies

r3 = session.get('')
print('your_username' in r3.text)
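Because the real login URLs are elided above, here is a self-contained sketch of the same idea. It runs against a throwaway local server built with the standard library's http.server. The /login and /home paths and the sid cookie are hypothetical: the POST handler sets a cookie, and the GET handler reports whether that cookie came back.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical login server: POST sets a cookie, GET checks for it."""

    def do_POST(self):
        # Consume the form body, then hand out a session cookie
        self.rfile.read(int(self.headers.get('Content-Length', 0)))
        self.send_response(200)
        self.send_header('Set-Cookie', 'sid=abc123; Path=/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        logged_in = 'sid=abc123' in self.headers.get('Cookie', '')
        body = b'logged in' if logged_in else b'anonymous'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(('127.0.0.1', 0), LoginHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

with requests.Session() as session:
    session.trust_env = False  # ignore proxy environment variables for the local server
    session.post(f'{base}/login', data={'login': 'your_username', 'password': 'x'})
    stored = session.cookies.get_dict()
    with_session = session.get(f'{base}/home').text

with requests.Session() as fresh:  # a new session starts without cookies
    fresh.trust_env = False
    without_session = fresh.get(f'{base}/home').text

server.shutdown()
server.server_close()

print(stored)           # {'sid': 'abc123'}
print(with_session)     # logged in
print(without_session)  # anonymous
```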
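3. Supplement: sending JSON
import json
import requests

url = ''
data = {
    'key': 'value',
}

# Option 1: serialize the body yourself and declare it as JSON
headers = {'Content-Type': 'application/json'}
response = requests.post(url, data=json.dumps(data), headers=headers)
print(response.json())

# Option 2: json= serializes the dict and sets Content-Type: application/json for you
response = requests.post(url, json=data)
print(response.json())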
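The differences between these options are easiest to see in the prepared request bodies. The offline sketch below also includes the easy mistake of passing the dict through data= while labelling it as JSON. The URL is a placeholder.

```python
import json

import requests

url = 'https://example.com/api'   # placeholder; nothing is sent
data = {'key': 'value'}

form = requests.Request('POST', url, data=data).prepare()
mislabeled = requests.Request('POST', url, data=data,
                              headers={'Content-Type': 'application/json'}).prepare()
manual = requests.Request('POST', url, data=json.dumps(data),
                          headers={'Content-Type': 'application/json'}).prepare()
auto = requests.Request('POST', url, json=data).prepare()

print(form.body, form.headers['Content-Type'])
# key=value application/x-www-form-urlencoded
print(mislabeled.body, mislabeled.headers['Content-Type'])
# key=value application/json   (the header says JSON but the body is form data)
print(manual.body)              # {"key": "value"}
print(json.loads(auto.body))    # {'key': 'value'}
print(auto.headers['Content-Type'])  # application/json
```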
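Handling JSON data
Sending the dict as a JSON body:
import requests

url = ''
data = {'key': 'value'}

response = requests.post(url, json=data)
print(response.json())

Sending the dict as a form body instead. response.json() only parses the server's reply, so it still works as long as the server answers with JSON:
import requests

url = ''
data = {'key': 'value'}

response = requests.post(url, data=data)
print(response.json())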
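IV. Responses
1. Response attributes
import requests

response = requests.get('')

print(response.text)         # body decoded to str using response.encoding
print(response.content)      # raw body as bytes
print(response.status_code)  # HTTP status code, e.g. 200
print(response.headers)      # response headers (case-insensitive dict)
print(response.cookies)      # cookies set by the server (RequestsCookieJar)
print(response.url)          # final URL after any redirects
print(response.history)      # redirect responses that led to this one
print(response.encoding)     # encoding used to decode .text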
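2. Closing the response
With stream=True, the connection is not released until the body has been read or the response is closed. Wrap the response in closing() so it is always released. In recent versions of requests, Response objects can also be used directly in a with statement.
import requests
from contextlib import closing

with closing(requests.get('', stream=True)) as response:
    # Without chunk_size, iter_content() yields one byte at a time
    for chunk in response.iter_content(chunk_size=1024):
        pass  # process each chunk here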
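3. Encoding issues
If response.text comes out garbled, set response.encoding to the page's real encoding before reading the text:
import requests

response = requests.get('')
response.encoding = 'gbk'  # decode the body as GBK instead of the guessed encoding
print(response.text)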
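Where does the wrong guess come from? When a text/* response has no charset in its Content-Type header, requests falls back to ISO-8859-1. The following sketch reproduces that with requests.utils.get_encoding_from_headers and some GBK bytes. The sample text is made up.

```python
import requests

# What requests infers from the Content-Type header alone
print(requests.utils.get_encoding_from_headers({'content-type': 'text/html'}))
# ISO-8859-1
print(requests.utils.get_encoding_from_headers({'content-type': 'text/html; charset=gbk'}))
# gbk

raw = '中文网页'.encode('gbk')   # hypothetical page body served as GBK bytes

garbled = raw.decode('ISO-8859-1')  # what .text gives with the fallback encoding
fixed = raw.decode('gbk')           # what .text gives after response.encoding = 'gbk'
print(garbled != fixed, fixed)      # True 中文网页
```

If the real encoding is unknown, response.apparent_encoding (which is detected from the body bytes) is a common value to assign to response.encoding.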
4. Getting binary data
import requests

response = requests.get('')

# .content holds the raw bytes, so write the file in binary mode
with open('a.jpg', 'wb') as f:
    f.write(response.content)
5. Streaming a large file
import requests

# stream=True defers downloading the body until it is iterated
response = requests.get('', stream=True)

with open('b.mp4', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # skip empty keep-alive chunks
            f.write(chunk)
6. Parsing JSON
import requests

response = requests.get('')

# Parse the body as JSON; raises a ValueError subclass if the body is not valid JSON
res1 = response.json()
print(res1)
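7. Redirects and history
import requests

r = requests.get('')
print(r.url)          # final URL after following redirects
print(r.status_code)  # status of the final response
print(r.history)      # intermediate redirect responses, oldest first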
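You can watch history fill up without depending on a real site. The following sketch starts a local server built with http.server that redirects once, using the hypothetical paths /old and /new. Passing allow_redirects=False makes requests stop at the first response instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)              # temporary redirect
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'new page'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

with requests.Session() as s:
    s.trust_env = False   # bypass proxy environment variables for the local server
    followed = s.get(f'{base}/old')
    stopped = s.get(f'{base}/old', allow_redirects=False)

server.shutdown()
server.server_close()

print(followed.status_code, followed.url, followed.history)
# 200 http://127.0.0.1:<port>/new [<Response [302]>]
print(stopped.status_code, stopped.headers['Location'], stopped.history)
# 302 /new []
```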
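V. Advanced Usage
1. SSL certificate verification
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# Certificates are verified by default; an invalid certificate raises requests.exceptions.SSLError
response = requests.get('')

# verify=False skips verification (urllib3 emits an InsecureRequestWarning)
response = requests.get('', verify=False)
print(response.status_code)

# Suppress the warning, then skip verification
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
response = requests.get('', verify=False)
print(response.status_code)

# Send a client certificate as a (certificate file, private key file) pair
response = requests.get('', cert=('/path/server.crt', '/path/key'))
print(response.status_code)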
2. Using a proxy
import requests

# Map each URL scheme to the proxy that should handle it
proxies = {
    'http': '',
    'https': '',
}

response = requests.get('', proxies=proxies)
print(response.status_code)
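The proxy entry that gets used depends on the scheme of the target URL, and optionally its host. requests.utils.select_proxy is the helper requests uses internally for this lookup. It is not part of the documented public API. The proxy addresses below are hypothetical.

```python
from requests.utils import select_proxy

# Hypothetical proxy addresses; keys are a scheme, or scheme://host for a single host
proxies = {
    'http': 'http://10.0.0.1:3128',
    'https': 'http://10.0.0.2:3128',
    'https://internal.example.com': 'http://10.0.0.3:3128',
}

print(select_proxy('http://example.com/', proxies))             # http://10.0.0.1:3128
print(select_proxy('https://example.com/', proxies))            # http://10.0.0.2:3128
print(select_proxy('https://internal.example.com/x', proxies))  # http://10.0.0.3:3128
```

The host-specific key wins over the plain scheme key, which lets one host use a different proxy.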
3. Session objects
import requests

session = requests.Session()
# Headers set on the session are sent with every request it makes
session.headers.update({'User-Agent': 'MyApp/1.0'})

response = session.get('')
print(response.status_code)
print(response.json())
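Session-level headers are merged with per-request headers. Per-request values take precedence, and setting a value of None removes that header for that one request. This sketch uses Session.prepare_request so nothing is sent, with a placeholder URL.

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'MyApp/1.0'})
url = 'https://example.com/'  # placeholder

plain = session.prepare_request(requests.Request('GET', url))
override = session.prepare_request(
    requests.Request('GET', url, headers={'User-Agent': 'Other/2.0'}))
dropped = session.prepare_request(
    requests.Request('GET', url, headers={'Accept': None}))

print(plain.headers['User-Agent'])     # MyApp/1.0
print(override.headers['User-Agent'])  # Other/2.0
print('Accept' in dropped.headers)     # False
```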
4. Exception handling
import requests
from requests.exceptions import RequestException

try:
    # Without a timeout, requests can wait indefinitely for a response
    response = requests.get('', timeout=5)
    response.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
except RequestException as e:
    print(f"An error occurred: {e}")
else:
    print(response.json())
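Catching RequestException works because the specific errors all derive from it. The following offline sketch checks that hierarchy. It then builds a Response by hand, with no network, to show the HTTPError that raise_for_status() raises. The URL is a placeholder.

```python
import requests
from requests.exceptions import ConnectionError as RequestsConnectionError
from requests.exceptions import HTTPError, RequestException, Timeout

# Each specific error is a subclass of RequestException
for exc in (RequestsConnectionError, HTTPError, Timeout):
    print(exc.__name__, issubclass(exc, RequestException))   # ... True

# A hand-built 404 response
resp = requests.Response()
resp.status_code = 404
resp.reason = 'Not Found'
resp.url = 'https://example.com/missing'   # placeholder URL

message = None
try:
    resp.raise_for_status()
except HTTPError as e:
    message = str(e)
print(message)   # 404 Client Error: Not Found for url: https://example.com/missing
```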
GitHub login: inspecting redirects
import requests
import re

r1 = requests.get('')
r1_cookie = r1.cookies.get_dict()
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text)[0]

data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': 'your_username',
    'password': 'your_password',
}

# If the login endpoint answers with a redirect, requests follows it by default:
# r2 is the final page and r2.history holds the redirect response(s)
r2 = requests.post('', data=data, cookies=r1_cookie)
print(r2.status_code)
print(r2.url)
print(r2.history)

# allow_redirects=False stops at the first response (the redirect itself), so history is empty
r2 = requests.post('', data=data, cookies=r1_cookie, allow_redirects=False)
print(r2.status_code)
print(r2.url)
print(r2.history)