Python Web Scraping Basics (2): Introduction to the Beautiful Soup Framework - 688IT编程网

Introduction to the Beautiful Soup framework

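Basic usage:

Beautiful Soup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four types:

Tag: an HTML or XML tag

NavigableString: the text inside a tag, as a navigable string object

BeautifulSoup: the entire content of a document

Comment: a comment (a special kind of NavigableString)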
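The four object types can be seen by parsing a tiny document and inspecting its nodes. This is a minimal sketch: the HTML snippet and the variable names are made up for illustration, and it assumes the bs4 package is installed and uses Python's built-in "html.parser".

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical snippet: one tag containing a text node and a comment.
html = "<p class='intro'>Hello<!-- a hidden note --></p>"
soup = BeautifulSoup(html, "html.parser")

p = soup.p                    # Tag
text_node = p.contents[0]     # NavigableString
comment_node = p.contents[1]  # Comment

for node in (soup, p, text_node, comment_node):
    print(type(node).__name__, repr(str(node)))
```

Since Comment is a subclass of NavigableString, code that only wants visible text may need to check for Comment explicitly.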

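The find_all() function:

find_all(name, attrs, recursive, string, **kwargs)

The name argument selects tags by name. Its value can be any kind of filter: a string, a regular expression, a list, a function, or True.

recursive: a Boolean that controls whether all descendants are searched. It defaults to True; with recursive=False only direct children are examined.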
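The filter kinds accepted by name, and the effect of recursive, can be sketched as follows. The HTML and the variable names are assumptions chosen for the example, not part of the original article.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical document: one <p> directly under <div>, one nested deeper.
html = "<div><b>bold</b><p>direct</p><section><p>nested</p></section></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

by_string = div.find_all("p")                # exact tag name
by_regex = div.find_all(re.compile("^b"))    # tag names starting with "b"
by_list = div.find_all(["b", "section"])     # any name in the list
by_true = div.find_all(True)                 # every tag
by_func = div.find_all(lambda tag: tag.name == "p" and tag.string == "nested")

# recursive=False restricts the search to direct children of <div>.
direct_only = div.find_all("p", recursive=False)

print([t.name for t in by_true])
print([t.get_text() for t in direct_only])
```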

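Examples:

find_all("a") finds every <a> tag

find_all(id="xx") finds tags whose id attribute is "xx"

soup.find_all(href=re.compile("elsie"), id='link1') finds tags whose href attribute contains "elsie" and whose id is 'link1'

data_soup.find_all(attrs={"data-foo": "value"}) finds tags whose data-foo attribute has the value "value"

soup.find_all("a", class_="sister") finds <a> tags with the class sister (class is a reserved word in Python, so the keyword argument is spelled class_)

soup.find_all("a", attrs={"class": "sister"}) finds <a> tags whose class attribute is sister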
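The calls listed above can be tried against a small document. The sketch below assumes a made-up page with three links and one element carrying a data-foo attribute. For simplicity it uses a single soup object where the list above uses both soup and data_soup.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page used only for this example.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
<div data-foo="value">foo!</div>
"""
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")
by_id = soup.find_all(id="link2")
elsie = soup.find_all(href=re.compile("elsie"), id="link1")
data_foo = soup.find_all(attrs={"data-foo": "value"})
sisters = soup.find_all("a", class_="sister")
sisters_attrs = soup.find_all("a", attrs={"class": "sister"})

print([a["id"] for a in sisters])
print(data_foo[0].get_text())
```

The attrs dictionary form is the one to reach for when an attribute name, such as data-foo, is not a valid Python keyword argument.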

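find_next_siblings() and find_next_sibling(): search among siblings, that is, the nodes that follow the current node under the same parent. find_next_siblings() returns all matching following siblings, while find_next_sibling() returns only the first one.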
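A short sketch of the sibling search follows. The markup is hypothetical; the extra <a> placed outside the paragraph is there to show that nodes under a different parent are not returned.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three links share one parent, a fourth sits outside it.
html = (
    '<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and '
    '<a id="link3">Tillie</a></p><a id="outside">Other</a>'
)
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a", id="link1")
following = first.find_next_siblings("a")  # all later <a> siblings
nearest = first.find_next_sibling("a")     # only the first later <a> sibling

print([a["id"] for a in following])
print(nearest["id"])
```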

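CSS selectors: select()

soup.select("title") finds the <title> tag

soup.select("p:nth-of-type(3)") finds each <p> tag that is the third <p> among its siblings

soup.select("body a") finds <a> tags inside the <body> tag

soup.select(".sister") finds tags with the class sister (note the leading dot)

soup.select("[class~=sister]") finds tags whose class list contains sister (note the ~= operator)

soup.select("a#link2") finds the <a> tag whose id is link2

soup.select("#link1,#link2") finds tags whose id is link1 or link2

soup.select('a[href]') finds <a> tags that have an href attribute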
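The selectors above can be checked against a sample page. In this sketch the page is invented for illustration, and the third link deliberately has no href so that a[href] returns fewer results than a. It assumes a bs4 version whose select() is backed by the soupsieve package (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical page; link3 intentionally lacks an href attribute.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a></p>
<p class="story">...</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

title = soup.select("title")
third_p = soup.select("p:nth-of-type(3)")
body_links = soup.select("body a")
by_class = soup.select(".sister")
by_class_attr = soup.select("[class~=sister]")
link2 = soup.select("a#link2")
link1_or_2 = soup.select("#link1,#link2")
with_href = soup.select("a[href]")

print(third_p[0].get_text())
print([a["id"] for a in with_href])
```

select() always returns a list; select_one() returns only the first match or None.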

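Output:

Formatted output: soup.prettify()

Compact output: str(soup), or unicode(soup.a) in Python 2 (useful when you only need the resulting string and do not care about formatting)

Text content of a tag: get_text()

Text content with whitespace removed: get_text(strip=True) strips leading and trailing whitespace from each piece of text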

For further details, see the official documentation.
