Scraping Baidu Images: a Python script that collects Baidu image-search results for a given keyword

The script below builds the Baidu image-search URL for each result page, extracts the "objURL" image addresses from the returned HTML, and downloads every image to a local folder.
# -*- coding: UTF-8 -*-
import re
import urllib.parse

import requests


def getPage(keyword, page, n):
    """Build the Baidu image-search "flip" page URL for the given keyword and page index."""
    page = page * n
    keyword = urllib.parse.quote(keyword, safe='/')
    # Baidu image-search endpoint (tn=baiduimage)
    url_begin = "http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word="
    url = url_begin + keyword + "&pn=" + str(page) + "&gsm=" + str(hex(page)) + "&ct=&ic=0&lm=-1&width=0&height=0"
    return url


def get_onepage_urls(onepageurl):
    """Fetch one result page and return the image URLs it contains."""
    try:
        html = requests.get(onepageurl).text
        # print(html)  # uncomment to inspect the raw page while debugging
    except Exception as e:
        print(e)
        pic_urls = []  # return an empty list when the request fails
        return pic_urls
    # The original image addresses are embedded in the page as "objURL" fields.
    pic_urls = re.findall('"objURL":"(.*?)",', html, re.S)
    return pic_urls


def down_pic(pic_urls):
    """Given a list of image URLs, download every image."""
    for i, pic_url in enumerate(pic_urls):
        try:
            pic = requests.get(pic_url, timeout=15)
            # Save path is hard-coded; make sure the folder exists or change it.
            string = "C:\\Users\\qinzu\\Pictures\\tupian\\" + str(i + 1) + '.jpg'
            with open(string, 'wb') as f:
                f.write(pic.content)
            print('Downloaded image %s: %s' % (str(i + 1), str(pic_url)))
        except Exception as e:
            print('Failed to download image %s: %s' % (str(i + 1), str(pic_url)))
            print(e)
            continue


if __name__ == '__main__':
    keyword = '热血传奇'  # search keyword; change it to whatever you would type into Baidu Images
    page_begin = 0
    page_number = 30   # results per page
    image_number = 3   # number of result pages to request
    all_pic_urls = []
    while 1:
        if page_begin > image_number:
            break
        print("Requesting page %d" % page_begin)
        url = getPage(keyword, page_begin, page_number)
        onepage_urls = get_onepage_urls(url)
        page_begin += 1
        all_pic_urls.extend(onepage_urls)

    down_pic(list(set(all_pic_urls)))
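The script above calls requests with its default headers and writes to a hard-coded Windows path. As a minimal hardening sketch (not part of the original article), the variant below creates the target folder if it is missing and sends a browser-like User-Agent, since some servers reject header-less clients; the down_pic_safe name, the User-Agent string, and the save_dir default are illustrative assumptions.

import os

import requests


def down_pic_safe(pic_urls, save_dir="images"):
    """Download every URL in pic_urls into save_dir, creating the folder if needed."""
    os.makedirs(save_dir, exist_ok=True)        # avoid FileNotFoundError on open()
    headers = {"User-Agent": "Mozilla/5.0"}     # assumed browser-like UA string
    for i, pic_url in enumerate(pic_urls, start=1):
        try:
            pic = requests.get(pic_url, headers=headers, timeout=15)
            pic.raise_for_status()              # skip 4xx/5xx responses
            path = os.path.join(save_dir, "%d.jpg" % i)
            with open(path, "wb") as f:
                f.write(pic.content)
            print("Downloaded image %d: %s" % (i, pic_url))
        except Exception as e:
            print("Failed to download image %d: %s (%s)" % (i, pic_url, e))

Replacing the last line of the main block with down_pic_safe(list(set(all_pic_urls))) keeps the rest of the script unchanged.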