Python crawler: batch-downloading wallpapers from any wallhaven page

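A Python script that batch-downloads the wallpapers listed on any wallhaven page. Three variables control it:
Maxpagenum: number of pages to crawl
fpath: directory where the images are saved
url: base address of the wallpaper list page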
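A base address copied from the browser often already carries a page value, as the example addresses in the script do. The following is a minimal sketch, using only the standard library, of how a page number could be set on such an address without duplicating the parameter. The helper name with_page and the example.com host are hypothetical and are not part of the original script.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(base_url: str, page: int) -> str:
    """Return base_url with its `page` query parameter set to `page`."""
    parts = urlsplit(base_url)
    # Keep every existing parameter except a stale `page` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # example.com is a placeholder host used only for illustration.
    base = "https://example.com/search?q=id%3A4641&page=4"
    for page in range(1, 4):
        print(with_page(base, page))
```

With an address built this way, the page number appears exactly once in the query string, whichever page the base address was copied from.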
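import os
import re
import time

import requests

# Number of pages to crawl
Maxpagenum = 10
# Pause between two detail-page requests, in seconds
Sleeptime = 0.1


def creatPath(path):
    """Create the save directory if it does not exist yet."""
    if not os.path.exists(path):
        print("Create path")
        os.makedirs(path)


if __name__ == '__main__':
    # Create the save directory (raw string, so the backslashes are kept literally)
    fpath = r"D:\Download\pic"
    creatPath(path=fpath)
    # Source address of the wallpaper list, for example
    # 'https:///search?q=id%3A2278&sorting=random&ref=fp&seed=ZYNEUQ&page=2',
    # 'https:///hot', 'https:///hot?page=4', ...
    # (the host part is missing from these addresses in this copy of the listing)
    # The page number is sent through params below, so it is left out of the base address.
    url = 'https:///search?q=id%3A4641'
    # Initialization
    pagenum = 0
    picnum = 0
    headers = {
        'referer': url,
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
    }
    # Regular expression for the links to the image detail pages
    ex = '<a class="preview" href="(.*?)" target="_blank" ></a>'
    # Regular expression for the image link on a detail page
    img_url_ex = '<img id="wallpaper" src="(.*?)" alt'
    # Fetch each page
    while pagenum < Maxpagenum:
        pagenum = pagenum + 1
        par = {
            'page': str(pagenum)
        }
        page_html = requests.get(url=url, headers=headers, params=par).text
        img_src_list = re.findall(ex, page_html, re.S)
        # Get the image link from each detail page
        for src in img_src_list:
            time.sleep(Sleeptime)
            img_page = requests.get(url=src, headers=headers).text
            img_urls = re.findall(img_url_ex, img_page, re.S)
            if not img_urls:
                # Skip detail pages on which the pattern does not match
                continue
            img_url = img_urls[0]
            img_data = requests.get(url=img_url).content
            img_name = img_url.split('/')[-1]
            img_path = os.path.join(fpath, img_name)
            # The with block closes the file even if the write fails
            with open(img_path, 'wb') as fp:
                fp.write(img_data)
            print("finish " + str(picnum))
            picnum += 1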
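The two regular expressions in the script can be tried out without any network access. The sketch below wraps them in small functions and runs them on made-up HTML fragments. The function names, the sample markup and the example.com addresses are assumptions for illustration only; the real page markup may differ and may change over time.

```python
import re
from typing import List, Optional

# The same two patterns that the script uses.
PREVIEW_EX = r'<a class="preview" href="(.*?)" target="_blank" ></a>'
IMG_URL_EX = r'<img id="wallpaper" src="(.*?)" alt'


def detail_links(listing_html: str) -> List[str]:
    """Return every detail-page link found in a list page."""
    return re.findall(PREVIEW_EX, listing_html, re.S)


def image_url(detail_html: str) -> Optional[str]:
    """Return the full-size image link of a detail page, or None if absent."""
    match = re.search(IMG_URL_EX, detail_html, re.S)
    return match.group(1) if match else None


def file_name(img_url: str) -> str:
    """Use the last path segment of the image link as the file name."""
    return img_url.split('/')[-1]


if __name__ == "__main__":
    # Hypothetical fragments, written to match the patterns above.
    listing = (
        '<li><a class="preview" href="https://example.com/w/abc123" target="_blank" ></a></li>'
        '<li><a class="preview" href="https://example.com/w/def456" target="_blank" ></a></li>'
    )
    detail = '<img id="wallpaper" src="https://example.com/full/ab/pic-abc123.jpg" alt="sample">'
    print(detail_links(listing))
    print(image_url(detail))
    print(file_name(image_url(detail)))
```

Returning None instead of indexing the result list directly lets the caller skip a page whose markup does not match, rather than stopping the whole run with an IndexError.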