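Task and goal: automatically fetch the titles and hot-search index values from Baidu's real-time hot search board.
Title element: <div class="c-single-text-ellipsis"> 东部战区台岛战巡演练模拟动画 <!--48--></div>
Hot-search index element: <div class="hot-index_1Bl1a"> 4946724 </div>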
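As a quick offline check of the two class names, the snippets above can be parsed with Python's standard-library html.parser. This is only a sketch: HotSearchParser and SAMPLE_HTML are hypothetical names introduced here, and it assumes the target <div> elements contain no nested <div>.

```python
from html.parser import HTMLParser

# The two elements quoted above, used here as offline sample input.
SAMPLE_HTML = (
    '<div class="c-single-text-ellipsis"> 东部战区台岛战巡演练模拟动画 <!--48--></div>\n'
    '<div class="hot-index_1Bl1a"> 4946724 </div>'
)


class HotSearchParser(HTMLParser):
    """Collect the text of title and hot-index <div> elements (hypothetical helper)."""

    TITLE_CLASS = "c-single-text-ellipsis"
    INDEX_CLASS = "hot-index_1Bl1a"

    def __init__(self):
        super().__init__()
        self.titles = []
        self.indices = []
        self._target = None  # list that receives the text of the current <div>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.TITLE_CLASS in classes:
            self._target, self._buffer = self.titles, []
        elif self.INDEX_CLASS in classes:
            self._target, self._buffer = self.indices, []

    def handle_data(self, data):
        # HTML comments such as <!--48--> never reach this method.
        if self._target is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._target is not None:
            self._target.append("".join(self._buffer).strip())
            self._target = None


parser = HotSearchParser()
parser.feed(SAMPLE_HTML)
parser.close()
rows = list(zip(parser.titles, parser.indices))
print(rows)
```

The live page is rendered by the browser, so this sketch only confirms that the two class names are enough to pull out the text; it does not replace selenium for the real scrape.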
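Step 1: enter the following prompt in DeepSeek:
You are a Python web-scraping expert. Write a Python script that completes the following web-scraping task:
Create a new Excel file in the folder F:\aivideo: topbaidu.xlsx
Set the chromedriver path to: "D:\Program Files\chromedriver125\chromedriver.exe"
Use selenium to open the page: https://top.baidu.com/board?tab=realtime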
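The request headers are:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Step 2: DeepSeek returns a Python script along these lines:
import os
import random
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Path to chromedriver (raw string so the backslashes are kept literally)
chromedriver_path = r"D:\Program Files\chromedriver125\chromedriver.exe"
# Request headers (only the user-agent is passed to Chrome further down)
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-encoding': 'gzip, deflate, br, zstd',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'cache-control': 'max-age=0',
    'cookie': '__root_domain_v=.baidu.com; _qddaz=QD.484716194472545; _ntes_origin_from=sogou; _ntes_nuid=af525edefbd3d363cd5876dbe902d85b; s_n_f_l_n3=01cb466d244b7db51716517702302; _antanalysis_s_id=1716517702995; UserProvince=全国; ne_analysis_trace_id=1716518441938; vinfo_n_f_l_n3=01cb466d244b7db5.1.0.1716517702302.0.1716518672850',
    'if-modified-since': 'Fri, 24 May 2024 02:40:05 GMT',
    'priority': 'u=0, i',
    'referer': 'https://www.sogou.com/link?url=hedJjaC291M4oaTBlXc5yCiioa5eDVgw',
    'sec-ch-ua': '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'cross-site',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'
}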
# Configure Chrome options
chrome_options = Options()
chrome_options.add_argument(f'user-agent={headers["user-agent"]}')
# Set up the Chrome service
service = Service(chromedriver_path)
# Launch the Chrome browser
driver = webdriver.Chrome(service=service, options=chrome_options)
# Open the page
url = 'https://top.baidu.com/board?tab=realtime'
driver.get(url)
# Wait for the page to finish loading
time.sleep(5)
# Get the page source
html = driver.page_source
print("Page source:")
print(html)
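# Create the Excel file with the two column headers (title, hot-search index).
# The folder is a raw string; otherwise "\a" would be read as an escape sequence.
excel_path = os.path.join(r'F:\aivideo', 'topbaidu.xlsx')
df = pd.DataFrame(columns=['热搜标题', '热搜指数'])
df.to_excel(excel_path, index=False)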
# Locate the hot-search titles and index values
titles = driver.find_elements(By.CLASS_NAME, 'c-single-text-ellipsis')
indices = driver.find_elements(By.CLASS_NAME, 'hot-index_1Bl1a')
# Collect the titles and index values for the Excel file
data = []
for title, index in zip(titles, indices):
    title_text = title.text
    index_text = index.text
    # Print what was found
    print(f"Hot-search title: {title_text}")
    print(f"Hot-search index: {index_text}")
    # Add the row to the list (keys match the Excel column headers)
    data.append({'热搜标题': title_text, '热搜指数': index_text})
    # Pause for a random 1 to 10 seconds
    time.sleep(random.randint(1, 10))
# Use concat to append the collected rows to the DataFrame
df = pd.concat([df, pd.DataFrame(data)], ignore_index=True)
df.to_excel(excel_path, index=False)
# Close the browser
driver.quit()
print("Hot-search data has been saved to the Excel file.")
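The script stores the hot-search index exactly as the page shows it, that is, as text. If numeric values are wanted in Excel (for sorting, say), the scraped strings could be cleaned before the DataFrame is built. The following is a minimal sketch under that assumption; clean_rows is a hypothetical helper and not part of the script DeepSeek generated.

```python
def clean_rows(titles, indices):
    """Pair title strings with index strings, skipping blanks and converting the index to int.

    Hypothetical helper: expects plain strings, e.g. the .text of each selenium element.
    """
    rows = []
    for title, index in zip(titles, indices):
        title = title.strip()
        digits = index.strip().replace(",", "")
        if not title or not digits.isdecimal():
            continue  # skip entries that are empty or not purely numeric
        rows.append({"热搜标题": title, "热搜指数": int(digits)})
    return rows


# Sample input: one valid entry and one entry with an empty title.
sample = clean_rows([" 东部战区台岛战巡演练模拟动画 ", ""], [" 4946724 ", "123"])
print(sample)
```

In the script, this could be called as clean_rows([t.text for t in titles], [i.text for i in indices]), with the result passed to pd.DataFrame.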
Step 3: open Visual Studio Code, create a new .py file, paste the Python code into it, and press F5 to run the program: