Channel: Extracting text from link in python - Stack Overflow

Extracting text from link in python


I have a Python 2.7 script that scrapes the table on this page: http://www.the-numbers.com/movie/budgets/all

I want to extract each of the columns. The problem is that my code doesn't pick up the columns that contain links (the 2nd and 3rd columns).
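The likely cause can be seen on a tiny fragment. When a cell's text is wrapped in an `<a>`, the `<td>` itself has no direct text child, so anything that reads only a cell's own text (as the XPath node test `text()` on a `<td>` does) finds nothing for that cell. The sketch below uses the standard library's ElementTree, where an element's `.text` plays the role of that direct text child. The HTML fragment and its `href` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: one plain-text cell, one cell whose text sits inside a link.
row = ET.fromstring(
    '<tr>'
    '<td class="data">1</td>'
    '<td class="data"><a href="/movie/example">Example Movie</a></td>'
    '</tr>'
)

plain_cell, link_cell = row.findall("td")

# .text is only the text directly inside <td>, comparable to td/text() in XPath.
print(plain_cell.text)  # 1
print(link_cell.text)   # None: the title lives inside the <a> child

# itertext() walks every descendant text node, including the one inside <a>.
print("".join(link_cell.itertext()))  # Example Movie
```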

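```python
import urllib
from lxml import etree

budgeturl = "http://www.the-numbers.com/movie/budgets/all"
s = urllib.urlopen(budgeturl).read()  # Python 2.7 urllib API
htmlpage = etree.HTML(s)
# text() selects only text nodes that are direct children of each <td>
htmltable = htmlpage.xpath("//td[@class='data']/text()")
```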

With this code, htmltable[0] is the rank, htmltable[1] is the production budget, and so on from there. For the columns I am missing, I need the text, not the link.
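A sketch of one way to get the visible text of every cell, links included, while keeping the values grouped by row instead of counting through one flat list. It uses the standard library's ElementTree on a made-up, well-formed fragment so it runs anywhere; the markup, dates, titles and amounts are placeholders, not the real page. For the real page, which may not be well-formed XML, `etree.HTML` from lxml is still the parser to use. The same idea should carry over: drop the trailing `/text()` so the XPath returns the `<td>` elements themselves, then take `"".join(td.itertext())` or `td.xpath("string()")` for each one.

```python
import xml.etree.ElementTree as ET

# Placeholder markup shaped like a results table; NOT the real page's HTML.
TABLE = """
<table>
  <tr>
    <td class="data">1</td>
    <td class="data"><a href="/date/a">Jan 1, 2000</a></td>
    <td class="data"><a href="/movie/a">Movie A</a></td>
    <td class="data">$100</td>
  </tr>
  <tr>
    <td class="data">2</td>
    <td class="data"><a href="/date/b">Feb 2, 2000</a></td>
    <td class="data"><a href="/movie/b">Movie B</a></td>
    <td class="data">$90</td>
  </tr>
</table>
""".strip()


def cell_text(td):
    # Join every descendant text node, so text inside <a> is included.
    return "".join(td.itertext()).strip()


def table_rows(table):
    # One list per <tr>, so each value keeps its column position.
    return [
        [cell_text(td) for td in tr.findall("td[@class='data']")]
        for tr in table.iter("tr")
    ]


rows = table_rows(ET.fromstring(TABLE))
for row in rows:
    print(row)
```

Each row can then be indexed by position (for example `row[2]` for the movie title). A cell that happens to be empty still occupies its slot as `""` instead of shifting every later value, which is what happens with a flat list of `text()` results.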

