python - BeautifulSoup URL loading error
I am trying to get the content of this page using Beautiful Soup. I want to create a dictionary of CSS color names, and this seemed like a quick and easy way to get them. So naturally I did the quick, basic thing:
    from bs4 import BeautifulSoup as bs
    url = 'http://www.w3schools.com/cssref/css_colornames.asp'
    soup = bs(url)
For some reason I am only getting the URL in a p tag inside the body, and that's it:
    >>> print soup.prettify()
    <html>
     <body>
      <p>
       http://www.w3schools.com/cssref/css_colornames.asp
      </p>
     </body>
    </html>
Why won't BeautifulSoup give me access to the information I need?
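BeautifulSoup does not load a URL for you. It treats the string you pass in as the markup itself, which is why the URL ended up wrapped in a p tag.
You need to pass in the full HTML page, which means you need to load it from the URL first. Here is a sample using the urllib2.urlopen function to achieve that: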
    from urllib2 import urlopen
    from bs4 import BeautifulSoup as bs

    # url is the address defined in the question
    source = urlopen(url).read()
    soup = bs(source)
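On Python 3, urllib2 no longer exists; its urlopen lives in urllib.request. A minimal sketch of the same download step under that assumption follows. The name fetch_html, the 10 second timeout, and the UTF-8 fallback are illustrative choices, not part of the original answer.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its markup as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string is what the parser expects, so something like soup = bs(fetch_html(url), 'html.parser') would replace the bs(url) call from the question.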
Now you can extract the colours just fine:
    css_table = soup.find('table', class_='reference')
    for row in css_table.find_all('tr'):
        cells = row.find_all('td')
        # header rows have no td cells, so skip them
        if cells:
            print cells[0].a.text, cells[1].a.text
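Since the goal in the question was a dictionary rather than printed output, here is a rough Python 3 sketch of collecting the same cells into one. It runs against a small stand-in table instead of the live page, whose markup is assumed to match the loop above and may have changed. The helper name color_table_to_dict and the SAMPLE_HTML rows are made up for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in markup shaped like the table the loop above expects; the live
# page's structure is an assumption and may differ.
SAMPLE_HTML = """
<table class="reference">
  <tr><th>Color Name</th><th>HEX</th></tr>
  <tr><td><a href="#">AliceBlue</a></td><td><a href="#">#F0F8FF</a></td></tr>
  <tr><td><a href="#">AntiqueWhite</a></td><td><a href="#">#FAEBD7</a></td></tr>
</table>
"""


def color_table_to_dict(html):
    """Map colour name -> hex code (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="reference")
    colors = {}
    if table is None:
        # No matching table, e.g. the page layout changed.
        return colors
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        if len(cells) >= 2:
            name = cells[0].get_text(strip=True)
            colors[name] = cells[1].get_text(strip=True)
    return colors


print(color_table_to_dict(SAMPLE_HTML))
```

Using get_text on the cell rather than .a.text avoids an AttributeError if a cell has no link inside it.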