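Introduction
BeautifulSoup is a Python library for extracting data from HTML or XML files. Working through a parser, it provides idiomatic ways to navigate, search, and modify the document.
BeautifulSoup does not parse markup itself: it builds on an underlying parser (Python's built-in html.parser, or third-party parsers such as lxml and html5lib) and adds a convenient API on top. Using it can make both data extraction and crawler development more efficient.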
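As a rough illustration of the three operations mentioned above (navigating, searching, modifying), here is a minimal sketch. The `SAMPLE` markup and the example.com URLs are made up for this illustration and are not part of the page parsed later; the built-in `html.parser` is used so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
SAMPLE = """
<ul id="links">
  <li><a href="http://example.com/a">first</a></li>
  <li><a href="http://example.com/b">second</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Navigate: attribute access returns the first matching descendant tag.
first_link = soup.ul.li.a

# Search: find_all returns every matching tag in document order.
hrefs = [a["href"] for a in soup.find_all("a")]

# Modify: changes are applied to the tree in place.
first_link.string = "renamed"
first_link["href"] = "http://example.com/renamed"

print(hrefs)
print(first_link.get_text())
print(soup.find("a"))
```

Note that `hrefs` is collected before the modification, so it still holds the original URLs, while `soup.find("a")` reflects the edited tree.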
Installation
shell
pip install beautifulsoup4
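The code later in this article passes 'lxml' as the parser name, and lxml is a separate package from beautifulsoup4. The following sketch checks which of the relevant packages are importable in the current environment; the helper name `is_installed` is an illustrative choice, not part of any library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# bs4 is the import name of the beautifulsoup4 package;
# lxml and html5lib are optional third-party parsers.
for name in ("bs4", "lxml", "html5lib"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If lxml is reported as missing, it can be installed separately with pip, or "html.parser" can be passed to BeautifulSoup instead.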
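Usage
1 Building the document tree
BeautifulSoup parses a document into a tree structure, and that tree is built from four kinds of objects: BeautifulSoup, Tag, NavigableString, and Comment.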
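To make the four object types concrete, here is a small sketch. The `SAMPLE` string is invented for this illustration, and `html.parser` is used instead of lxml only so that the snippet runs without extra dependencies.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup, used only for this illustration.
SAMPLE = '<p class="title"><b>The Dormouse\'s story</b><!--a note--></p>'

soup = BeautifulSoup(SAMPLE, "html.parser")

tag = soup.p                   # Tag: an element such as <p>
text = soup.b.string           # NavigableString: the text inside a tag
comment = soup.p.contents[1]   # Comment: a special kind of NavigableString

print(type(soup).__name__)
print(type(tag).__name__, tag.name, tag.attrs)
print(type(text).__name__, text)
print(type(comment).__name__, comment)
```

Because Comment is a subclass of NavigableString, reading `.string` on a tag whose only child is a comment returns the comment text; checking `isinstance(value, Comment)` is one way to tell the two apart.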
python
from bs4 import BeautifulSoup
html = """<div class="post js_watermark quill-editor" style="background-repeat: repeat; background-size: 940px 222.942px;">
<h1 class="title">版面分析------网页HTML解析 BeautifulSoup</h1>
<div class="group-info">
<a href="https://wx.zsxq.com/dweb2/index/group/51112141255244">
<span>来自:</span>
<span class="group-name">AiGC面试宝典</span>
</a>
</div>
<div class="author-info">
<div class="author">
<img src="https://images.zsxq.com/FpFYmnHpgmz5J4DicXxscPfi3GI2?e=2064038400&token=kIxbL07-8jAj8w1n4s9zv64FuZZNEATmlU_Vm6zD:hS7fTOpUpCI18IU4GweitfivQIU=" alt="用户头像">
<span class="nick-name">Just do it!</span>
</div>
<span class="date" id="article-date">2024年04月27日 14:30</span>
</div>
<div class="ql-snow">
<div class="content ql-editor"><p><img src="https://article-images.zsxq.com/FsOmOdM3jIkLawUT9z7sEbkMZgpV"></p><p><img src="https://article-images.zsxq.com/FnbQkQK1pNTESbYjScR42_PrYb9E"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = 
BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、Tag对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head:{soup.head} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.name:{soup.head.name} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.attrs:{soup.head.attrs} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.head):{type(soup.head)} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>()</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、Navigable String对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.title.string:{soup.title.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.title.string):{type(soup.title.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、Comment对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.a.string:{soup.a.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token 
hljs-string">f"type(soup.a.string):{type(soup.a.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#5、结构化输出soup对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.prettify()=>{soup.prettify()}"</span>)</div></div><p><img src="https://article-images.zsxq.com/FmlPl-0tw4xgHRqKTWm5F2R15YJq"></p><div class="ql-code-block-container"><div class="ql-code-block">type(soup):<span class="ql-token hljs-tag"><class 'bs4.BeautifulSoup'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head:<span class="ql-token hljs-tag"><head><title></span>The Dormouse's story<span class="ql-token hljs-tag"></title></head></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.name:head</div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.attrs:{}</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.head):<span class="ql-token hljs-tag"><class 'bs4.element.Tag'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.title.string:The Dormouse's story</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.title.string):<span class="ql-token hljs-tag"><class 'bs4.element.NavigableString'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.a.string:Elsie</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.a.string):<span class="ql-token hljs-tag"><class 'bs4.element.Comment'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.prettify()=><span class="ql-token hljs-tag"><html></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><head></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><title></span></div><div class="ql-code-block"> The Dormouse's 
story</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></title></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></head></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><body></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><p class="title"></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><b></span></div><div class="ql-code-block"> The Dormouse's story</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></b></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></p></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><p class="story"></span></div><div class="ql-code-block"> Once upon a time there were three little sisters; and their names were</div><div class="ql-code-block"> <span class="ql-token hljs-tag"><a class="sister" href="http://example.com/elsie" id="link1"></span></div><div class="ql-code-block"> <span class="ql-token hljs-comment"><!--Elsie--></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></a></span></div><div class="ql-code-block"> ,</div><div class="ql-code-block"> <span class="ql-token hljs-tag"><a class="sister" href="http://example.com/lacie" id="link2"></span></div><div class="ql-code-block"> Lacie</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></a></span></div><div class="ql-code-block"> and</div><div class="ql-code-block"> <span class="ql-token hljs-tag"><a class="sister" href="http://example.com/tillie" id="link3"></span></div><div class="ql-code-block"> Tillie</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></a></span></div><div class="ql-code-block"> ;</div><div class="ql-code-block">and they lived at the bottom of a well.</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></p></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><p class="story"></span></div><div class="ql-code-block"> 
...</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></p></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-tag"></html></span></div></div><p><br></p><p><img src="https://article-images.zsxq.com/FtPX-qsEEgZYHos3AnyDni1jH6rn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token 
hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、向下遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.contents)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.children))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.descendants))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、向上遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.parent.name,<span class="ql-token hljs-string">'\n'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.p.parents:</div><div class="ql-code-block"> <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、平行遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_next:'</span>,soup.a.next_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.next_siblings:</div><div class="ql-code-block"> <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_nexts:'</span>,i)</div><div 
class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previous:'</span>,soup.a.previous_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.previous_siblings:</div><div class="ql-code-block"> <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previouss:'</span>,i)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/FuyJDzHROhQahkpBUh4jWRuaB-mo"></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):<<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>></div><div class="ql-code-block"><br></div><div class="ql-code-block">[<b>The Dormouse<span class="ql-token hljs-string">'s story</b>]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[<b>The Dormouse'</span>s story</b>]</div><div class="ql-code-block">[<b>The Dormouse<span class="ql-token hljs-string">'s story</b>, "The Dormouse'</span>s story<span class="ql-token hljs-string">"]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><span class="ql-token hljs-string">html</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[document]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_next: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: <a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/lacie<span 
class="ql-token hljs-string">" id="</span>link2<span class="ql-token hljs-string">">Lacie</a></span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: and</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: <a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/tillie<span class="ql-token hljs-string">" id="</span>link3<span class="ql-token hljs-string">">Tillie</a></span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previous: Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previouss: Once upon a time there were three little sisters; and their names were</span></div></div><p><img src="https://article-images.zsxq.com/FtqWdNWSM0b8quez92lJ9SqPTK76"></p><p><span style="background-color: rgb(240, 240, 240); color: rgb(92, 92, 92);">Code</span></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names 
were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment"># Create the BeautifulSoup object</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 1. find_all()</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>)) <span class="ql-token hljs-comment"># search by tag name</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link1'</span>)) <span class="ql-token hljs-comment"># search by attribute value</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span 
class="ql-token hljs-string">'a'</span>,class_=<span class="ql-token hljs-string">'sister'</span>)) <span class="ql-token hljs-comment"># search by CSS class</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 2. find()</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link2'</span>))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 3. Search upward (parents)</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.find_parent().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_parents():</div><div class="ql-code-block"> <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-comment"># 4. Search sideways (siblings)</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.head.find_next_sibling().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.head.find_next_siblings():</div><div class="ql-code-block"> <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.title.find_previous_sibling())</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> 
soup.title.find_previous_siblings():</div><div class="ql-code-block"> <span class="ql-token hljs-built_in">print</span>(i.name)</div></div><p><img src="https://article-images.zsxq.com/FgdDcWod8Suvbq5UuGYLvXz0UI8R"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):<<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>></div><div class="ql-code-block"><br></div><div class="ql-code-block">[<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block">[<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div><div class="ql-code-block">[<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token 
hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block">F:\AwesomeRAG\tutorial\layout_analysis\html\tutorial\BeautifulSoup4\test3.py:<span class="ql-token hljs-number">24</span>: DeprecationWarning: The <span class="ql-token hljs-string">'text'</span> argument to find()-<span class="ql-token hljs-built_in">type</span> methods <span class="ql-token hljs-keyword">is</span> deprecated. Use <span class="ql-token hljs-string">'string'</span> instead.</div><div class="ql-code-block"> <span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block">[<span class="ql-token hljs-string">'Elsie'</span>, <span class="ql-token hljs-string">'Lacie'</span>]</div><div class="ql-code-block"><a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a></div><div class="ql-code-block"><a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token 
hljs-string">"link2"</span>>Lacie</a></div><div class="ql-code-block">body</div><div class="ql-code-block">head</div><div class="ql-code-block">html</div><div class="ql-code-block">[document]</div><div class="ql-code-block">body</div><div class="ql-code-block">body</div><div class="ql-code-block"><span class="ql-token hljs-literal">None</span></div></div><p><img src="https://article-images.zsxq.com/FiqTmlpR_fGE6pUZ8gCcdD9z1ao_"></p><div class="ql-code-block-container"><div class="ql-code-block">HTML heading: <h1> </h1></div><div class="ql-code-block">HTML paragraph: <p> </p></div><div class="ql-code-block">HTML link: <a href=<span class="ql-token hljs-string">'https://www.baidu.com/'</span>> this <span class="ql-token hljs-keyword">is</span> a link </a></div><div class="ql-code-block">HTML image: <img src=<span class="ql-token hljs-string">'Ai-code.jpg'</span> width=<span class="ql-token hljs-string">'104'</span> height=<span class="ql-token hljs-string">'144'</span> /></div><div class="ql-code-block">HTML table: <table> </table></div><div class="ql-code-block">HTML list: <ul> </ul></div><div class="ql-code-block">HTML block: <div> </div></div></div><p><img src="https://article-images.zsxq.com/FkTgptMBTLt2w7nUUKs13PNKkckn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><br></div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names 
were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment"># Create the BeautifulSoup object</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'Tag lookup:'</span>,soup.select(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'Attribute lookup:'</span>,soup.select(<span class="ql-token hljs-string">'a[id="link1"]'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'Class lookup:'</span>,soup.select(<span class="ql-token hljs-string">'.sister'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span 
class="ql-token hljs-string">'id lookup:'</span>,soup.select(<span class="ql-token hljs-string">'#link1'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'Combined lookup:'</span>,soup.select(<span class="ql-token hljs-string">'p #link1'</span>))</div></div><p><img src="https://article-images.zsxq.com/FjQkiig9fOl0Bd5qiCbyH4OddW50"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):<<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>></div><div class="ql-code-block"><br></div><div class="ql-code-block">Tag lookup: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block">Attribute lookup: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div><div class="ql-code-block">Class lookup: [<a <span class="ql-token hljs-keyword">class</span>=<span 
class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span> lookup: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div><div class="ql-code-block">Combined lookup: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div></div><p><img src="https://article-images.zsxq.com/FqZck0in441U4EYGi6KobKlS0emA"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> requests</div><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> os</div><div 
class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getUrl</span>(<span class="ql-token hljs-params">url</span>):</div><div class="ql-code-block">    <span class="ql-token hljs-comment"># Fetch a page and return its HTML text, or an empty string on failure</span></div><div class="ql-code-block">    <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">        read = requests.get(url, timeout=<span class="ql-token hljs-number">10</span>)</div><div class="ql-code-block">        read.raise_for_status()  <span class="ql-token hljs-comment"># raise on 4xx/5xx responses</span></div><div class="ql-code-block">        read.encoding = read.apparent_encoding  <span class="ql-token hljs-comment"># guess the encoding from the body</span></div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> read.text</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">except</span> requests.RequestException:</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"Connection failed!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> <span class="ql-token hljs-string">""</span></div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getPic</span>(<span class="ql-token hljs-params">html</span>):</div><div class="ql-code-block">    soup = BeautifulSoup(html, <span class="ql-token hljs-string">"html.parser"</span>)</div><div class="ql-code-block">    ul = soup.find(<span class="ql-token hljs-string">'ul'</span>)</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">if</span> ul <span class="ql-token hljs-keyword">is</span> <span class="ql-token hljs-literal">None</span>:  <span class="ql-token hljs-comment"># nothing to parse, e.g. the download failed</span></div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"No image list found!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span></div><div class="ql-code-block">    root = <span class="ql-token hljs-string">"F:/Pic/"</span>  <span class="ql-token hljs-comment"># local folder where the images are saved</span></div><div class="ql-code-block">    <span class="ql-token hljs-keyword">for</span> img <span class="ql-token hljs-keyword">in</span> ul.find_all(<span class="ql-token hljs-string">'img'</span>):</div><div class="ql-code-block">        img_url = img[<span class="ql-token hljs-string">'src'</span>]</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(img_url)</div><div class="ql-code-block">        path = root + img_url.split(<span class="ql-token hljs-string">'/'</span>)[-<span class="ql-token hljs-number">1</span>]  <span class="ql-token hljs-comment"># name the file after the last URL segment</span></div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(path)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">            os.makedirs(root, exist_ok=<span class="ql-token hljs-literal">True</span>)  <span class="ql-token hljs-comment"># create the folder if it is missing</span></div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(path):</div><div class="ql-code-block">                read = requests.get(img_url, timeout=<span class="ql-token hljs-number">10</span>)</div><div class="ql-code-block">                read.raise_for_status()</div><div class="ql-code-block">                <span class="ql-token hljs-keyword">with</span> <span class="ql-token hljs-built_in">open</span>(path, <span class="ql-token hljs-string">"wb"</span>) <span class="ql-token hljs-keyword">as</span> f:  <span class="ql-token hljs-comment"># the with block closes the file</span></div><div class="ql-code-block">                    f.write(read.content)</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"File saved!"</span>)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">else</span>:</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"File already exists!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">except</span> (requests.RequestException, OSError):</div><div class="ql-code-block">            <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"Failed to download file!"</span>)</div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">if</span> __name__ == <span class="ql-token hljs-string">'__main__'</span>:</div><div class="ql-code-block">    html_url = getUrl(<span class="ql-token hljs-string">"https://findicons.com/search/nature"</span>)</div><div class="ql-code-block">    getPic(html_url)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/Fh_dDSbuteEI_0ArnWoZrCFDRuvm"></p></div>
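<p>The <code>find_all(text=[...])</code> call in the search example emits a DeprecationWarning that recommends <code>string</code> instead. The following is a minimal sketch of the same search written with <code>string=</code>. It assumes a reasonably recent <code>beautifulsoup4</code> and uses a shortened copy of the sample document with the built-in <code>html.parser</code>, so that <code>lxml</code> is not required. It also shows one way to tell ordinary text apart from <code>Comment</code> nodes, since the text of the first link is an HTML comment. The variable names <code>matches</code> and <code>visible</code> are illustrative only.</p>

```python
from bs4 import BeautifulSoup, Comment

# Shortened copy of the sample document used throughout this tutorial.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Same search as find_all(text=[...]), written with the newer keyword.
matches = soup.find_all(string=["Elsie", "Lacie"])
print(matches)

# Comment is a subclass of NavigableString, so comment text can show up
# in the results; isinstance() filters it out when only real text is wanted.
visible = [s for s in matches if not isinstance(s, Comment)]
print(visible)
```

<p>Filtering with <code>isinstance</code> keeps the search itself unchanged, which may be convenient when the comments are needed elsewhere in the same script.</p>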
</div>
<div class="milkdown-preview" style="display: none;"><p><img src="https://article-images.zsxq.com/FsOmOdM3jIkLawUT9z7sEbkMZgpV"></p><p><img src="https://article-images.zsxq.com/FnbQkQK1pNTESbYjScR42_PrYb9E"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div 
class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、Tag对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head:{soup.head} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.name:{soup.head.name} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.attrs:{soup.head.attrs} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.head):{type(soup.head)} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>()</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、Navigable String对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.title.string:{soup.title.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.title.string):{type(soup.title.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、Comment对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.a.string:{soup.a.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token 
hljs-string">f"type(soup.a.string):{type(soup.a.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#5、结构化输出soup对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.prettify()=>{soup.prettify()}"</span>)</div></div><p><img src="https://article-images.zsxq.com/FmlPl-0tw4xgHRqKTWm5F2R15YJq"></p><div class="ql-code-block-container"><div class="ql-code-block">type(soup):<span class="ql-token hljs-tag"><class 'bs4.BeautifulSoup'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head:<span class="ql-token hljs-tag"><head><title></span>The Dormouse's story<span class="ql-token hljs-tag"></title></head></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.name:head</div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.attrs:{}</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.head):<span class="ql-token hljs-tag"><class 'bs4.element.Tag'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.title.string:The Dormouse's story</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.title.string):<span class="ql-token hljs-tag"><class 'bs4.element.NavigableString'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.a.string:Elsie</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.a.string):<span class="ql-token hljs-tag"><class 'bs4.element.Comment'></span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.prettify()=><span class="ql-token hljs-tag"><html></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><head></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><title></span></div><div class="ql-code-block"> The Dormouse's 
story</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></title></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></head></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><body></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><p class="title"></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><b></span></div><div class="ql-code-block"> The Dormouse's story</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></b></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></p></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><p class="story"></span></div><div class="ql-code-block"> Once upon a time there were three little sisters; and their names were</div><div class="ql-code-block"> <span class="ql-token hljs-tag"><a class="sister" href="http://example.com/elsie" id="link1"></span></div><div class="ql-code-block"> <span class="ql-token hljs-comment"><!--Elsie--></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></a></span></div><div class="ql-code-block"> ,</div><div class="ql-code-block"> <span class="ql-token hljs-tag"><a class="sister" href="http://example.com/lacie" id="link2"></span></div><div class="ql-code-block"> Lacie</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></a></span></div><div class="ql-code-block"> and</div><div class="ql-code-block"> <span class="ql-token hljs-tag"><a class="sister" href="http://example.com/tillie" id="link3"></span></div><div class="ql-code-block"> Tillie</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></a></span></div><div class="ql-code-block"> ;</div><div class="ql-code-block">and they lived at the bottom of a well.</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></p></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"><p class="story"></span></div><div class="ql-code-block"> 
...</div><div class="ql-code-block"> <span class="ql-token hljs-tag"></p></span></div><div class="ql-code-block"> <span class="ql-token hljs-tag"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-tag"></html></span></div></div><p><br></p><p><img src="https://article-images.zsxq.com/FtPX-qsEEgZYHos3AnyDni1jH6rn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token 
hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment"># Build the BeautifulSoup object</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 1. Traverse downward (children and descendants)</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.contents)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.children))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.descendants))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 2. Traverse upward (parents)</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.parent.name,<span class="ql-token hljs-string">'\n'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.p.parents:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 3. Traverse sideways (siblings)</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_next:'</span>,soup.a.next_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.next_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_nexts:'</span>,i)</div><div 
class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previous:'</span>,soup.a.previous_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.previous_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previouss:'</span>,i)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/FuyJDzHROhQahkpBUh4jWRuaB-mo"></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):<<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>></div><div class="ql-code-block"><br></div><div class="ql-code-block">[<b>The Dormouse<span class="ql-token hljs-string">'s story</b>]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[<b>The Dormouse'</span>s story</b>]</div><div class="ql-code-block">[<b>The Dormouse<span class="ql-token hljs-string">'s story</b>, "The Dormouse'</span>s story<span class="ql-token hljs-string">"]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><span class="ql-token hljs-string">html</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[document]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_next: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: <a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/lacie<span 
class="ql-token hljs-string">" id="</span>link2<span class="ql-token hljs-string">">Lacie</a></span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: and</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: <a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/tillie<span class="ql-token hljs-string">" id="</span>link3<span class="ql-token hljs-string">">Tillie</a></span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previous: Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previouss: Once upon a time there were three little sisters; and their names were</span></div></div><p><img src="https://article-images.zsxq.com/FtqWdNWSM0b8quez92lJ9SqPTK76"></p><p><span style="background-color: rgb(240, 240, 240); color: rgb(92, 92, 92);">Code</span></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names 
were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment"># Build the BeautifulSoup object</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 1. find_all()</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>)) <span class="ql-token hljs-comment"># search by tag name</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link1'</span>)) <span class="ql-token hljs-comment"># search by attribute value</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span 
class="ql-token hljs-string">'a'</span>,class_=<span class="ql-token hljs-string">'sister'</span>)) <span class="ql-token hljs-comment"># search by CSS class</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(string=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>])) <span class="ql-token hljs-comment"># search by string; the older text= argument is deprecated</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 2. find()</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link2'</span>))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 3. Search upward (parents)</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.find_parent().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_parents():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment"># 4. Search sideways (siblings)</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.head.find_next_sibling().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.head.find_next_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.title.find_previous_sibling())</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> 
soup.title.find_previous_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div></div><p><img src="https://article-images.zsxq.com/FgdDcWod8Suvbq5UuGYLvXz0UI8R"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):<<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>></div><div class="ql-code-block"><br></div><div class="ql-code-block">[<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block">[<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div><div class="ql-code-block">[<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token 
hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block">[<span class="ql-token hljs-string">'Elsie'</span>, <span class="ql-token hljs-string">'Lacie'</span>]</div><div class="ql-code-block"><a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a></div><div class="ql-code-block"><a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token 
hljs-string">"link2"</span>>Lacie</a></div><div class="ql-code-block">body</div><div class="ql-code-block">head</div><div class="ql-code-block">html</div><div class="ql-code-block">[document]</div><div class="ql-code-block">body</div><div class="ql-code-block">body</div><div class="ql-code-block"><span class="ql-token hljs-literal">None</span></div></div><p><img src="https://article-images.zsxq.com/FiqTmlpR_fGE6pUZ8gCcdD9z1ao_"></p><div class="ql-code-block-container"><div class="ql-code-block">HTML heading: <h1> </h1></div><div class="ql-code-block">HTML paragraph: <p> </p></div><div class="ql-code-block">HTML link: <a href=<span class="ql-token hljs-string">'https://www.baidu.com/'</span>> this <span class="ql-token hljs-keyword">is</span> a link </a></div><div class="ql-code-block">HTML image: <img src=<span class="ql-token hljs-string">'Ai-code.jpg'</span> width=<span class="ql-token hljs-string">'104'</span> height=<span class="ql-token hljs-string">'144'</span> /></div><div class="ql-code-block">HTML table: <table> </table></div><div class="ql-code-block">HTML list: <ul> </ul></div><div class="ql-code-block">HTML block: <div> </div></div></div><p><img src="https://article-images.zsxq.com/FkTgptMBTLt2w7nUUKs13PNKkckn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><br></div><div class="ql-code-block">html = <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><html><head><title>The Dormouse's story</title></head></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="title"><b>The Dormouse's story</b></p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">Once upon a time there were three little sisters; and their names 
were</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and</span></div><div class="ql-code-block"><span class="ql-token hljs-string"><a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"><p class="story">...</p></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></body></span></div><div class="ql-code-block"><span class="ql-token hljs-string"></html></span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment"># Build the BeautifulSoup object</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'By tag:'</span>,soup.select(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'By attribute:'</span>,soup.select(<span class="ql-token hljs-string">'a[id="link1"]'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'By class:'</span>,soup.select(<span class="ql-token hljs-string">'.sister'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span 
class="ql-token hljs-string">'By id:'</span>,soup.select(<span class="ql-token hljs-string">'#link1'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'Combined:'</span>,soup.select(<span class="ql-token hljs-string">'p #link1'</span>))</div></div><p><img src="https://article-images.zsxq.com/FjQkiig9fOl0Bd5qiCbyH4OddW50"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):<<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>></div><div class="ql-code-block"><br></div><div class="ql-code-block">By tag: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block">By attribute: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div><div class="ql-code-block">By class: [<a <span class="ql-token hljs-keyword">class</span>=<span 
class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>>Lacie</a>, <a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>>Tillie</a>]</div><div class="ql-code-block">By <span class="ql-token hljs-built_in">id</span>: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div><div class="ql-code-block">Combined: [<a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>><!--Elsie--></a>]</div></div><p><img src="https://article-images.zsxq.com/FqZck0in441U4EYGi6KobKlS0emA"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> requests</div><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> os</div><div 
class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getUrl</span>(<span class="ql-token hljs-params">url</span>):</div><div class="ql-code-block">    <span class="ql-token hljs-comment"># Fetch the page and return its text, or None if the request fails</span></div><div class="ql-code-block">    <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">        read = requests.get(url)</div><div class="ql-code-block">        read.raise_for_status()</div><div class="ql-code-block">        read.encoding = read.apparent_encoding</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> read.text</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">except</span> requests.RequestException:</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"Connection failed!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> <span class="ql-token hljs-literal">None</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getPic</span>(<span class="ql-token hljs-params">html</span>):</div><div class="ql-code-block">    soup = BeautifulSoup(html, <span class="ql-token hljs-string">"html.parser"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block">    <span class="ql-token hljs-comment"># Collect every img tag inside the first ul tag on the page</span></div><div class="ql-code-block">    all_img = soup.find(<span class="ql-token hljs-string">'ul'</span>).find_all(<span class="ql-token hljs-string">'img'</span>)</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">for</span> img <span class="ql-token hljs-keyword">in</span> all_img:</div><div class="ql-code-block">        img_url = img[<span class="ql-token hljs-string">'src'</span>]</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(img_url)</div><div class="ql-code-block">        root = <span class="ql-token hljs-string">"F:/Pic/"</span></div><div class="ql-code-block">        path = root + img_url.split(<span class="ql-token hljs-string">'/'</span>)[-<span class="ql-token hljs-number">1</span>]</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(path)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(root):</div><div class="ql-code-block">                os.mkdir(root)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(path):</div><div class="ql-code-block">                read = requests.get(img_url)</div><div class="ql-code-block">                <span class="ql-token hljs-keyword">with</span> <span class="ql-token hljs-built_in">open</span>(path, <span class="ql-token hljs-string">"wb"</span>) <span class="ql-token hljs-keyword">as</span> f:  <span class="ql-token hljs-comment"># the with block closes the file</span></div><div class="ql-code-block">                    f.write(read.content)</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"File saved!"</span>)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">else</span>:</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"File already exists!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">except</span> (requests.RequestException, OSError):</div><div class="ql-code-block">            <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"Failed to download the file!"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">if</span> __name__ == <span class="ql-token hljs-string">'__main__'</span>:</div><div class="ql-code-block">    html = getUrl(<span class="ql-token hljs-string">"https://findicons.com/search/nature"</span>)</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">if</span> html <span class="ql-token hljs-keyword">is</span> <span class="ql-token hljs-keyword">not</span> <span class="ql-token hljs-literal">None</span>:</div><div class="ql-code-block">        getPic(html)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/Fh_dDSbuteEI_0ArnWoZrCFDRuvm"></p></div>
</div>
"""
## Note: the HTML above does not match the output shown below, but that should not get in the way of understanding; substitute your own HTML when running this.
# 1. BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")
# 2. Tag object
print(f"soup.head:{soup.head} \n")
print(f"soup.head.name:{soup.head.name} \n")
print(f"soup.head.attrs:{soup.head.attrs} \n")
print(f"type(soup.head):{type(soup.head)} \n")
# 3. NavigableString object
print(f"soup.title.string:{soup.title.string} \n")
print(f"type(soup.title.string):{type(soup.title.string)} \n")
# 4. Comment object
print(f"soup.a.string:{soup.a.string} \n")
print(f"type(soup.a.string):{type(soup.a.string)} \n")
# 5. Pretty-printed output of the soup object
print(f"soup.prettify()=>{soup.prettify()}")
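Because the HTML string above is a captured web page that has no `<title>` or plain `<a>` elements to inspect, the four object types are easier to see on a small document. The following is a minimal sketch: `sample_html` is a made-up document, and the built-in `html.parser` is used instead of `lxml` only so that no extra dependency is assumed.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical sample document, chosen so that every lookup below succeeds.
sample_html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story"><a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a></p>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

title_tag = soup.title            # Tag: an element of the document
title_text = soup.title.string    # NavigableString: the text inside a tag
first_link_text = soup.a.string   # Comment: the first <a> holds only a comment

print(type(soup).__name__)             # BeautifulSoup
print(type(title_tag).__name__)        # Tag
print(type(title_text).__name__)       # NavigableString
print(type(first_link_text).__name__)  # Comment
print(soup.a.attrs)                    # attributes of the first <a> as a dict
```

Comment is a subclass of NavigableString, so checking `isinstance(text, Comment)` is one way to skip comments when extracting visible text.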
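2 Traversing the document tree
BeautifulSoup converts a document into a tree structure because a tree makes it easier to traverse the content and extract data from it.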
python
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p ...>...</p>
...
</body>
</html>
"""
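# 1. BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")
# 2. Downward traversal
print(soup.p.contents)           # direct children, as a list
print(list(soup.p.children))     # direct children, as an iterator
print(list(soup.p.descendants))  # all nested descendants
# 3. Upward traversal
print(soup.p.parent.name)        # name of the direct parent
for i in soup.p.parents:         # every ancestor up to the document root
    print(i.name)
# 4. Sideways traversal
print('a_next:', soup.a.next_sibling)
for i in soup.a.next_siblings:       # plural: iterate over all following siblings
    print('a_nexts:', i)
print('a_previous:', soup.a.previous_sibling)
for i in soup.a.previous_siblings:   # plural: iterate over all preceding siblings
    print('a_previous:', i)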
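Since the HTML in the listing above is abbreviated, here is a self-contained sketch of the same three traversal directions. The document in `sample_html` is invented for illustration, and `html.parser` is assumed in place of `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample document written without extra whitespace,
# so the node counts below are easy to verify by hand.
sample_html = (
    "<html><head><title>The Dormouse's story</title></head>"
    '<body><p class="story">'
    '<a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>'
    "</p></body></html>"
)
soup = BeautifulSoup(sample_html, "html.parser")

# Downward: .contents holds direct children only; .descendants walks the whole subtree.
child_count = len(soup.p.contents)                # 3 <a> tags + 2 text nodes
descendant_count = len(list(soup.p.descendants))  # the 5 children + 3 texts inside <a>

# Upward: .parents yields every ancestor, ending at the document root.
parent_names = [parent.name for parent in soup.p.parents]

# Sideways: siblings include text nodes, so keep only Tag objects.
next_ids = [s["id"] for s in soup.a.next_siblings if isinstance(s, Tag)]
last_link = soup.find(id="link3")
previous_ids = [s["id"] for s in last_link.previous_siblings if isinstance(s, Tag)]

print(child_count, descendant_count)  # 5 8
print(parent_names[:2])               # ['body', 'html']
print(next_ids)                       # ['link2', 'link3']
print(previous_ids)                   # ['link2', 'link1']
```

Note that `previous_siblings` walks backwards from the starting node, which is why `link2` comes before `link1`.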
4 Searching the document tree
Search methods:
python
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p ...>...</p>
...
</body>
</html>
"""
# 1. BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")
# 2. find()
print(soup.find('a'))  # find the first <a> tag
print(soup.find(id='link2'))  # find the element whose id is link2
# 3. find_all()
print(soup.find_all('a'))  # search by tag name
print(soup.find_all('a', id='link1'))  # filter by attribute value
print(soup.find_all('a', class_='sister'))  # filter by class; class is a Python keyword, so use class_
print(soup.find_all(string=['Elsie', 'Lacie']))  # search by text content
# 4. Upward search
print(soup.p.find_parent().name)
for i in soup.title.find_parents():
    print(i.name)
# 5. Sideways search
print(soup.head.find_next_sibling().name)  # .name is an attribute, not a method
for i in soup.head.find_next_siblings():
    print('a_nexts:', i)
print(soup.title.find_previous_sibling())
for i in soup.title.find_previous_siblings():
    print('a_previous:', i)
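The search calls above can be tried on a complete document as follows. This is only a sketch: `sample_html` is an invented three-link page, and `html.parser` is assumed instead of `lxml` so the snippet runs with a plain `beautifulsoup4` install.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document with three sibling links.
sample_html = """<html><head><title>The Dormouse's story</title></head>
<body><p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p></body></html>"""
soup = BeautifulSoup(sample_html, "html.parser")

first_link = soup.find("a")                             # first match only
by_id = soup.find(id="link2")                           # match on an attribute
by_class = soup.find_all("a", class_="sister")          # class_ avoids the keyword
by_text = soup.find_all(string=["Elsie", "Lacie"])      # match text nodes
missing = soup.find("table")                            # find() returns None on no match

# Upward and sideways searches start from an element instead of the soup.
ancestors = [tag.name for tag in soup.title.find_parents()]
next_link = first_link.find_next_sibling("a")
previous_link = soup.find(id="link3").find_previous_sibling("a")

print(first_link["id"], by_id.get_text())  # link1 Lacie
print(len(by_class), by_text)              # 3 ['Elsie', 'Lacie']
print(missing)                             # None
print(ancestors[:2])                       # ['head', 'html']
print(next_link["id"], previous_link["id"])  # link2 link2
```

Because `find()` returns `None` when nothing matches, chaining attribute access directly onto its result raises `AttributeError` on pages that lack the element; checking for `None` first is the safer habit.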
5 CSS selectors
Pass a CSS selector string to the select() method of a Tag or BeautifulSoup object to find the matching Tags.
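A short sketch of `select()` and its single-result counterpart `select_one()` is shown below. The document in `sample_html` and the selectors are illustrative choices, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document.
sample_html = """<html><body><p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p></body></html>"""
soup = BeautifulSoup(sample_html, "html.parser")

# select() always returns a list of matching Tags (possibly empty).
child_links = soup.select("p.story > a")        # child combinator plus class
by_attr = soup.select('a[href$="tillie"]')      # attribute value ends with "tillie"

# select_one() returns the first match, or None when nothing matches.
by_id = soup.select_one("#link2")
missing = soup.select_one("table")

print([a["id"] for a in child_links])  # ['link1', 'link2', 'link3']
print(by_attr[0]["id"])                # link3
print(by_id.get_text())                # Lacie
print(missing)                         # None
```

The same methods can be called on any Tag, in which case the selector is matched only within that tag's subtree.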
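6 Image scraping example
python
import os
import requests
from bs4 import BeautifulSoup

def getUrl(url):
    """Download a page and return its HTML text."""
    try:
        read = requests.get(url)
        read.raise_for_status()                  # raise on 4xx/5xx responses
        read.encoding = read.apparent_encoding   # use the detected encoding
        return read.text
    except requests.RequestException:
        return "Connection failed!"

def getPic(html):
    """Save every image found inside the first <ul> of the page."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find('ul')
    if ul is None:  # no <ul> found, e.g. when the download failed
        print("No image list found!")
        return
    all_img = ul.find_all('img')
    for img in all_img:
        src = img['src']
        img_url = src
        print(img_url)
        root = 'F:/Pic/'
        path = root + img_url.split('/')[-1]
        print(path)
        try:
            if not os.path.exists(root):
                os.mkdir(root)
            if not os.path.exists(path):
                read = requests.get(img_url)
                with open(path, "wb") as f:  # the with block closes the file
                    f.write(read.content)
                print("File saved successfully!")
            else:
                print("File already exists!")
        except Exception:
            print("File download failed!")

if __name__ == '__main__':
    html_url = getUrl('https://findicons.com/search/nature')
    getPic(html_url)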
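The example above passes each `src` value straight to `requests.get`, which only works when the page uses absolute image URLs. One possible refinement, sketched here without any network access, is to resolve every `src` against the page URL with the standard library's `urljoin`. The names `page_url`, `sample_html`, `collect_image_urls`, and `file_name_for` are hypothetical and not part of the original script.

```python
import os
from urllib.parse import urljoin, urlsplit

from bs4 import BeautifulSoup

# Hypothetical page URL and markup, used only to illustrate URL resolution.
page_url = "https://example.com/gallery/index.html"
sample_html = """
<ul>
  <li><img src="/icons/leaf.png"></li>
  <li><img src="tree.png?size=64"></li>
  <li><img src="https://cdn.example.com/img/sun.png"></li>
  <li><img alt="no source"></li>
</ul>
"""

def collect_image_urls(html, base_url):
    """Return absolute URLs for every <img src> inside the first <ul>."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ul")
    if container is None:
        return []
    urls = []
    for img in container.find_all("img"):
        src = img.get("src")  # .get() returns None instead of raising KeyError
        if src:
            urls.append(urljoin(base_url, src))
    return urls

def file_name_for(url):
    """Derive a local file name from the URL path, ignoring any query string."""
    return os.path.basename(urlsplit(url).path)

image_urls = collect_image_urls(sample_html, page_url)
for url in image_urls:
    print(url, "->", file_name_for(url))
```

Using `img.get("src")` rather than `img["src"]` also means an `<img>` without a `src` attribute is skipped instead of stopping the loop with a `KeyError`.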