Web Links Gatherer (ver 2) - Spentera Blog

by Hanny Haliwela / October 13, 2010

Using Beautiful Soup, we can rewrite the code from the previous post as shown below. Simply replacing the regex matching with Beautiful Soup's HTML parsing gives a better result:

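#!/usr/bin/python
# otoy -- https://otoyrood.wordpress.com
# 0x102010

from urllib import urlopen
from BeautifulSoup import BeautifulSoup

# Download the page and parse it (Python 2, BeautifulSoup 3).
text = urlopen('https://otoyrood.wordpress.com').read()
soup = BeautifulSoup(text)

# Collect unique link targets; href=True skips <a> tags that have no
# href attribute, which would otherwise raise a KeyError.
pages = set()
for anchor in soup('a', href=True):
    pages.add(anchor['href'])

# Print the links in sorted order, one per line.
print '\n'.join(sorted(pages))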
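
The script above targets Python 2 and BeautifulSoup 3. For readers on Python 3, here is a rough sketch of the same idea. Note that it swaps Beautiful Soup for the standard library's html.parser so that it has no third-party dependency. The names LinkCollector, gather_links and fetch_links are made up for this illustration, and resolving relative links with urljoin is an extra step that the original script does not do.

```python
# Python 3 sketch using only the standard library (assumption: the
# original post uses Python 2 with BeautifulSoup 3 instead).
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = set()  # a set drops duplicate links automatically

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors whose href is missing or empty.
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.add(urljoin(self.base_url, value))


def gather_links(html, base_url=""):
    """Return the sorted, de-duplicated links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return sorted(collector.links)


def fetch_links(url):
    """Download a page and return its links (needs network access)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return gather_links(html, url)
```

Calling print('\n'.join(fetch_links('https://otoyrood.wordpress.com'))) should give output comparable to the original script, except that relative links come out as absolute URLs.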
