An easy method to scrape an HTML table using Python

Scraping tables can sometimes be quite annoying. I have worked on more than 50 scraping projects that involved HTML tables. Here is the easiest way I have found, and it should work with any website that serves the table in its HTML. For this purpose I will be using Beautiful Soup and Pandas.

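datafile = "<YOUR HTML PAGE>"  # path to your downloaded HTML file

from bs4 import BeautifulSoup
# Load the saved page into BeautifulSoup
with open(datafile, "r") as f:
    soup = BeautifulSoup(f, "html.parser")

# Grab the first <table> tag on the page
table = soup.find("table")

import pandas as pd
from io import StringIO
# Wrap the HTML in StringIO: newer pandas versions deprecate raw HTML strings.
# flavor="bs4" also needs the html5lib package installed.
data = pd.read_html(StringIO(str(table)), flavor="bs4")[0]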


In the code above, datafile is the path to your downloaded HTML page. Alternatively, you can use the requests Python module to fetch the webpage and store its HTML in a variable.

The second block imports BeautifulSoup and loads the HTML page into the soup variable. If you are going to scrape a live website, pass the text fetched with requests to BeautifulSoup instead of an open file.
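For the live-website case, a minimal sketch might look like the following. The function names, the 30-second timeout, and the inline sample page are my own assumptions for illustration, not part of the original snippet, and flavor is left at the pandas default here so either lxml or bs4 with html5lib will do.

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup


def first_table_from_html(html):
    """Parse HTML text and return its first <table> as a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the page")
    # StringIO avoids the deprecation warning newer pandas versions raise
    # when read_html is given a literal HTML string.
    return pd.read_html(StringIO(str(table)))[0]


def first_table_from_url(url):
    """Download a page with requests and return its first table."""
    response = requests.get(url, timeout=30)  # timeout value is an assumption
    response.raise_for_status()  # stop on 4xx/5xx responses
    return first_table_from_html(response.text)


# Small inline page standing in for a downloaded one (made-up data).
sample_html = """
<html><body>
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>alice</td><td>90</td></tr>
  <tr><td>bob</td><td>85</td></tr>
</table>
</body></html>
"""
sample = first_table_from_html(sample_html)
print(sample)
```

To scrape a real page you would call first_table_from_url with the page's address. Tables that are built by JavaScript after the page loads will not be in response.text, so this only covers tables present in the raw HTML.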

The table variable stores the first table tag on the page, which here is the table we need to scrape.
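If the table you need is not the first one on the page, find will return the wrong table. A small sketch of the alternatives, using a made-up page and a hypothetical id of "prices":

```python
from bs4 import BeautifulSoup

# Made-up page with two tables; the id "prices" is hypothetical.
html = """
<table><tr><td>menu</td></tr></table>
<table id="prices">
  <tr><th>item</th><th>price</th></tr>
  <tr><td>tea</td><td>2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")               # first <table> in document order
all_tables = soup.find_all("table")      # every <table>, as a list
second = all_tables[1]                   # pick a table by position
by_id = soup.find("table", id="prices")  # or pick it by an attribute

print(len(all_tables))
print(by_id.get("id"))
```

Picking by an attribute such as id is usually less fragile than picking by position, since the position changes whenever the site adds another table above yours.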

Pandas then reads the table variable. read_html returns a list of DataFrames, so [0] picks the first one and stores it in data. This DataFrame can be easily exported to CSV or Excel.
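As a rough sketch of the export step, assuming data is the DataFrame from the snippet above (a small made-up stand-in is used here, and the output filename is hypothetical):

```python
import os
import tempfile

import pandas as pd

# Stand-in for the `data` DataFrame produced by pd.read_html (made-up values).
data = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

# Hypothetical output location; use any path you like.
out_path = os.path.join(tempfile.gettempdir(), "table.csv")

# index=False leaves out the 0, 1, 2... row labels pandas adds.
data.to_csv(out_path, index=False)

# Excel export works the same way but needs an engine such as openpyxl:
# data.to_excel("table.xlsx", index=False)

# Read the CSV back to check what was written.
restored = pd.read_csv(out_path)
print(restored)
```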
