Channel: Extracting text from link in python - Stack Overflow

Answer by litepresence for Extracting text from link in python

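# Python 2 (urllib.urlopen and the print statement)
import urllib

budgeturl = "http://www.the-numbers.com/movie/budgets/all"
s = urllib.urlopen(budgeturl).read()  # raw HTML of the page

def find_between( s, first, last ):
    # Return the text between the first occurrence of `first` and the next `last`
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        # Either marker is missing
        return ""

s = find_between(s, '<table>', '</table>')
print s[:500]    # start of the table
print '.............................................................'
print s[-250:]   # end of the table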

Find string between two substrings

returns:

>>><tr><th>&nbsp;</th><th>Release Date</th><th>Movie</th><th>Production Budget</th><th>Domestic Gross</th><th>Worldwide Gross</th></tr><tr><td class="data">1</td><td><a href="/box-office-chart/daily/2009/12/18">12/18/2009</a></td><td><b><a href="/movie/Avatar#tab=summary">Avatar</a></td><td class="data">$425,000,000</td><td class="data">$760,507,625</td><td class="data">$2,783,918,982</td><tr><tr><td class="data">2</td><td><a href="/box-office-chart/daily/2015/12/18">12/18/2015</a></td>
.............................................................
</td><td><a href="/box-office-chart/daily/2005/08/05">8/5/2005</a></td><td><b><a href="/movie/My-Date-With-Drew#tab=summary">My Date With Drew</a></td><td class="data">$1,100</td><td class="data">$181,041</td><td class="data">$181,041</td><tr>
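
For anyone on Python 3, here is a rough sketch of the same substring slicing. The `fetch` helper, the UTF-8 decoding, and the `sample_page` string are my own assumptions for illustration, not part of the original answer; the sample string stands in for the downloaded page so the slicing can be tried offline.

```python
from urllib.request import urlopen


def fetch(url):
    # Python 3 counterpart of urllib.urlopen(url).read().
    # Decoding as UTF-8 is an assumption about the page encoding.
    return urlopen(url).read().decode("utf-8", errors="replace")


def find_between(s, first, last):
    """Return the text between the first `first` and the next `last`, or ""."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        # Either marker is missing
        return ""


# Hypothetical sample standing in for the downloaded page
sample_page = "<html><body><table><tr><td>Avatar</td></tr></table></body></html>"
table_html = find_between(sample_page, "<table>", "</table>")
print(table_html)
```

Note that this only matches a literal `<table>` tag; a tag written with attributes would need a different start marker.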


I need the text not the link.

via http://www.convertcsv.com/html-table-to-csv.htm

Release Date,Movie,Production Budget,Domestic Gross,Worldwide Gross
1,12/18/2009,Avatar,"$425,000,000","$760,507,625","$2,783,918,982"
8/5/2005,My Date With Drew,"$1,100","$181,041","$181,041"

You can use BeautifulSoup to do the same; see:

beautifulSoup html csv
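
If you would rather not install anything, a similar result can probably be had with the standard library alone. The sketch below swaps BeautifulSoup for `html.parser` and `csv` (Python 3); the class name `TableText`, the function `table_to_csv`, and the shortened `sample` string are assumptions made up for this example. It keeps only the visible cell text, so the link targets are dropped, and it treats a stray `<tr>` as the end of the previous row because the page closes rows that way.

```python
import csv
import io
from html.parser import HTMLParser


class TableText(HTMLParser):
    """Collect the visible text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new <tr> also ends any row left open
            self.flush_row()
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr":
            self.flush_row()

    def handle_data(self, data):
        # Text inside <a> or <b> still belongs to the enclosing cell
        if self._cell is not None:
            self._cell.append(data)

    def _flush_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def table_to_csv(html):
    parser = TableText()
    parser.feed(html)
    parser.close()
    parser.flush_row()  # keep a final row that was never closed
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Shortened sample based on the table output shown earlier
sample = (
    '<tr><th>&nbsp;</th><th>Release Date</th><th>Movie</th>'
    '<th>Production Budget</th></tr>'
    '<tr><td class="data">1</td>'
    '<td><a href="/box-office-chart/daily/2009/12/18">12/18/2009</a></td>'
    '<td><b><a href="/movie/Avatar#tab=summary">Avatar</a></td>'
    '<td class="data">$425,000,000</td><tr>'
)
print(table_to_csv(sample))
```

The `csv` writer quotes the dollar amounts because they contain commas, which matches the converter output above.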

