Posts

Showing posts from August, 2017

How to remove special characters using Python?

In this blog post I am going to share a Python function which can be used to remove special characters from your data. It is normal to end up with special characters like Ä, NÄ�sÄ«f etc. when you scrape a website. These cause trouble when you want to export the data, and they don't look good either. The code below solves the problem:

```python
def cleanup(value):
    # Encode to ASCII, turning every non-ASCII character into '?',
    # then swap the '?' placeholders for spaces.
    return value.encode('ascii', errors='replace').decode('ascii').replace('?', ' ')
```

You can run the above function in Pandas on the column that has the special characters. Below is one sample:

```python
data['Name'] = data['Name'].apply(cleanup)
```

In the above line, data is the data frame and it has a column named Name. I am using the apply function to run my custom code, which replaces any weird character with a blank. You can convert any column to numeric using the line below:

```python
data['Age'] = data['Age'].apply(pd.to_numeric, errors='coerce')
```

Again in the above line I am using ap…
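The steps above can be sketched end to end. This is a minimal, self-contained example; the sample names and ages are invented for illustration:

```python
import math
import pandas as pd

def cleanup(value):
    # Replace any non-ASCII character with a space.
    return value.encode('ascii', errors='replace').decode('ascii').replace('?', ' ')

# Hypothetical sample data: one mis-encoded name, one non-numeric age.
data = pd.DataFrame({'Name': ['Näsïf', 'John'], 'Age': ['25', 'abc']})

data['Name'] = data['Name'].apply(cleanup)                      # non-ASCII letters become spaces
data['Age'] = data['Age'].apply(pd.to_numeric, errors='coerce') # 'abc' becomes NaN

print(data['Name'].tolist())  # ['N s f', 'John']
```

With `errors='coerce'`, values that cannot be parsed as numbers are set to NaN instead of raising an error, which is usually what you want for scraped data.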

Pandas create new column based on existing column

In SQL, we can just use a SELECT statement to create a new column. In Python, we can do the same using Pandas. First, create a custom function describing how the new column should be built. I had a scenario where I had to classify a person as Adult or Child, using the Age column in my data-set: if the age is 18 or above, the corresponding value in my new column is Adult; otherwise it is Child. So let's see how this can be achieved:

```python
def checkAdult(age):
    if age >= 18:
        return 'Adult'
    else:
        return 'Child'
```

Above is my custom function. It takes one argument, age, and returns 'Adult' or 'Child'. The function can be used on the existing data frame (data) to create the new column (Adult/Child):

```python
data['Adult/Child'] = data['Age'].apply(checkAdult)
```

I am creating a new column with the name Adult/Child, and passing the Age column to the checkAdult function as defined on the right-hand side.
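Here is the approach as one runnable snippet; the sample ages are made up for illustration:

```python
import pandas as pd

def checkAdult(age):
    # Classify using the 18-year threshold from the post.
    if age >= 18:
        return 'Adult'
    else:
        return 'Child'

data = pd.DataFrame({'Age': [10, 18, 45]})
data['Adult/Child'] = data['Age'].apply(checkAdult)
print(data['Adult/Child'].tolist())  # ['Child', 'Adult', 'Adult']
```

Each value of the Age column is passed to checkAdult in turn, and the returned strings form the new column.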

Easy method to scrape html table using Python

Scraping a table can sometimes be quite annoying. I have worked on more than 50 scraping projects involving HTML tables, and here is the easiest way that works with most websites. For this purpose I will be using Beautiful Soup and Pandas:

```python
from bs4 import BeautifulSoup
import pandas as pd

datafile = '<YOUR HTML PAGE>'

# Load the downloaded HTML page into BeautifulSoup.
with open(datafile, 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

# The first <table> tag is the table we need to scrape.
table = soup.find('table')

# Let Pandas parse the table markup into a data frame.
data = pd.read_html(str(table), flavor='bs4')[0]
```

In the above code, datafile is your downloaded HTML page. Alternatively, you can use the requests module to fetch the webpage and store it in a variable, which is the way to go if you are scraping a live website. The table variable stores the first table tag, which is the table we need to scrape. Pandas is used to read the table variable and…
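To see the whole flow without downloading a page first, here is a minimal sketch that parses an HTML snippet held in a string; the table contents are invented for illustration, and a real page would come from open(datafile) or requests.get(url).text instead:

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for a downloaded page.
html = """
<html><body>
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
  <tr><td>Bob</td><td>25</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')  # first <table> tag on the page

# Wrapping in StringIO keeps newer Pandas versions happy, which
# expect a file-like object rather than a raw HTML string.
data = pd.read_html(io.StringIO(str(table)))[0]
print(data.columns.tolist())  # ['Name', 'Age']
```

read_html returns a list of data frames, one per table found, which is why the `[0]` is needed even when there is only one table.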

Excel Macro to Find and Replace in multiple sheet

Below are the two Sub procedures which can be used to find and replace a particular search term with a replacement value. The macro runs over multiple sheets in the same directory. In the code below you have to make the following changes according to your settings:

Pathname - the directory where all your sheets are present. The macro will run for every sheet present in the directory you mention.
Filename - if you notice, I have given "*csv", since I was using the macro for CSV files in that directory. Change the extension accordingly, e.g. xlsx or xls.
Search and Replacement - use these to specify the search term and the replacement term.
Columns(3) - here I am running the search and replace only on column C of the sheet.
Worksheets(1) - this means I am running the macro only on the first sheet of the workbook.

How to run: open a blank Excel sheet and press ALT + F11. It will open the Visual Basic editor. Then right click on the sheet -> Inser…
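The macro itself is VBA and its code is not shown in this excerpt, but the behaviour it describes can be sketched in Python as a rough equivalent. The function name, its parameters, and the sample values below are all hypothetical stand-ins for the macro's Pathname, Filename, Search/Replacement, and Columns(3) settings:

```python
import csv
import glob
import os

def find_and_replace(pathname, pattern, search, replacement, column_index):
    """Replace `search` with `replacement` in one column of every
    file in `pathname` matching `pattern` (e.g. '*.csv')."""
    for filepath in glob.glob(os.path.join(pathname, pattern)):
        with open(filepath, newline='') as f:
            rows = list(csv.reader(f))
        for row in rows:
            # Only touch the chosen column, mirroring Columns(3).
            if len(row) > column_index:
                row[column_index] = row[column_index].replace(search, replacement)
        with open(filepath, 'w', newline='') as f:
            csv.writer(f).writerows(rows)
```

Note the indexing difference: Columns(3) in VBA is column C, which in zero-based Python corresponds to column_index=2.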