Easy method to scrape html table using Python

August 30, 2017

Scraping tables can be quite annoying. I have worked on more than 50 scraping projects that extract HTML tables. Here is the easiest method I have found; it works on most websites where the table is present in the page's HTML. For this purpose I will be using Beautiful Soup and pandas.

datafile = "<YOUR HTML PAGE>"  # path to your downloaded HTML file

from bs4 import BeautifulSoup
with open(datafile, "r") as f:
    soup = BeautifulSoup(f, "html.parser")

table = soup.find('table')

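from io import StringIO
import pandas as pd
# Recent pandas versions expect a file-like object rather than a raw HTML string.
# flavor='bs4' also needs the html5lib package installed.
data = pd.read_html(StringIO(str(table)), flavor='bs4')[0]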

In the code above, datafile is the path to your downloaded HTML page. Alternatively, you can use the requests Python module to fetch the webpage and store its HTML in a variable.

The second block imports BeautifulSoup and loads the HTML page into the soup variable. If you are scraping a live website, you can pass the HTML fetched with requests to BeautifulSoup instead of a file.
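For a live website, the file-reading step can be replaced with a download. The following is a minimal sketch rather than a definitive implementation: the helper names fetch_html and first_table, the timeout value, and the example URL are all assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (raises on HTTP errors)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def first_table(html):
    """Return the first <table> element in the HTML, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table")


# Example with a hypothetical URL (replace it with the page you want to scrape):
# table = first_table(fetch_html("https://example.com/page-with-table"))
```

Keeping the download and the parsing in separate functions means the parsing step works the same way on a saved file and on a live page.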

The table variable stores the first table tag found on the page, which is assumed here to be the table we need to scrape.
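When the first table on the page is not the one you want, Beautiful Soup can select a table by attribute or by position. The sketch below uses a made-up HTML snippet; the ids "nav" and "prices" and the class "data" are hypothetical and stand in for whatever the real page uses.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; only the second one holds the data.
html = """
<table id="nav"><tr><td>Home</td></tr></table>
<table id="prices" class="data">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Tea</td><td>3</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")                    # first <table> on the page
by_id = soup.find("table", id="prices")       # match on the id attribute
by_class = soup.find("table", class_="data")  # match on a CSS class
all_tables = soup.find_all("table")           # every table, in document order
second = all_tables[1]                        # pick a table by position
```

Matching on an id or class is usually more robust than relying on position, since the order of tables can change when the page layout changes.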

pandas then reads the table variable and stores it as a DataFrame, its tabular data structure. The DataFrame can easily be exported to CSV or Excel.
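The export step might look like the sketch below. The DataFrame here is a made-up stand-in for the one returned by pd.read_html, and the helper name export_csv and the file names are assumptions for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pd.read_html; the values are made up.
data = pd.DataFrame({"Item": ["Tea", "Coffee"], "Price": [3, 4]})


def export_csv(df, path):
    """Write the DataFrame to a CSV file without the row index column."""
    df.to_csv(path, index=False)


# export_csv(data, "table.csv")
# Excel export needs an Excel writer package such as openpyxl installed:
# data.to_excel("table.xlsx", index=False)
```

Passing index=False leaves out the automatically generated row numbers, which are rarely wanted in the exported file.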
