Pandas: Another library widely used in machine learning

Pandas- Step By Step Guide By Sagar Jaybhay

Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools. It makes everyday data science work simpler and more effective. In a Jupyter notebook, pressing Shift + Tab with the cursor on a function name shows that function's documentation.

Link: https://pandas.pydata.org/

Which problem does pandas solve?

Python is a good programming language for data munging and preparation, but for data analysis it has lagged behind in some areas. Pandas fills this gap. With it you can carry out your whole data analysis workflow in Python without switching to another language such as R.

To get started, first import the pandas library.

import pandas as pd

pd.__version__

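This shows the version currently installed on your machine. If it is not the latest, you can upgrade from the command line with the command below. To install one specific release instead, append ==<version> to the package name, for example pandas==0.23.0.

python3 -m pip install --upgrade pandas

To list the versions of pandas and its dependencies, run the following in Python. The function prints its report itself, so it does not need to be wrapped in print().

pd.show_versions()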

DataFrames

A DataFrame is a 2-dimensional, labeled data structure, similar to a spreadsheet. Its size is mutable and its columns can hold different data types (it is potentially heterogeneous). We use Olympic medal data for the examples.

The data can be downloaded from https://docs.google.com/spreadsheet/ccc?key=0AonYZs4MzlZbdHlfd0F1QlAxYjgtOW53ZXNOZ0JzNVE

The download link comes from the page below.

https://www.theguardian.com/sport/datablog/2012/jun/25/olympic-medal-winner-list-data

City | Edition | Sport | Discipline | Athlete | NOC | Gender | Event | Event_gender | Medal
Athens | 1896 | Aquatics | Swimming | HAJOS, Alfred | HUN | Men | 100m freestyle | M | Gold
Athens | 1896 | Aquatics | Swimming | HERSCHMANN, Otto | AUT | Men | 100m freestyle | M | Silver
Athens | 1896 | Aquatics | Swimming | DRIVAS, Dimitrios | GRE | Men | 100m freestyle for sailors | M | Bronze
Athens | 1896 | Aquatics | Swimming | MALOKINIS, Ioannis | GRE | Men | 100m freestyle for sailors | M | Gold
Athens | 1896 | Aquatics | Swimming | CHASAPIS, Spiridon | GRE | Men | 100m freestyle for sailors | M | Silver
Athens | 1896 | Aquatics | Swimming | CHOROPHAS, Efstathios | GRE | Men | 1200m freestyle | M | Bronze
Athens | 1896 | Aquatics | Swimming | HAJOS, Alfred | HUN | Men | 1200m freestyle | M | Gold
Athens | 1896 | Aquatics | Swimming | ANDREOU, Joannis | GRE | Men | 1200m freestyle | M | Silver
Athens | 1896 | Aquatics | Swimming | CHOROPHAS, Efstathios | GRE | Men | 400m freestyle | M | Bronze
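The "mutable size" and "heterogeneous" properties can be tried without the CSV file. The block below is a minimal sketch that builds a small DataFrame in memory from three of the sample rows above. The variable name sample is illustrative only, and only a few of the ten columns are used.

```python
import pandas as pd

# Three rows copied from the sample above; column names follow its header.
sample = pd.DataFrame({
    "City": ["Athens", "Athens", "Athens"],
    "Edition": [1896, 1896, 1896],
    "Athlete": ["HAJOS, Alfred", "HERSCHMANN, Otto", "DRIVAS, Dimitrios"],
    "NOC": ["HUN", "AUT", "GRE"],
    "Medal": ["Gold", "Silver", "Bronze"],
})

# Heterogeneous: Edition holds integers, the other columns hold strings.
print(sample.dtypes)

# Mutable size: adding a column changes the shape from (3, 5) to (3, 6).
sample["Event"] = ["100m freestyle", "100m freestyle", "100m freestyle for sailors"]
print(sample.shape)
```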

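To read a CSV file and display the first few rows in a notebook, use the code below.

# Raw string (r'...') so the backslashes in the Windows path are kept literally
head = pd.read_csv(r'C:\data\olympicData.csv', skiprows=0)
head.head()  # show the first 5 rows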

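If you display a DataFrame in a Jupyter notebook by typing only its name, a large frame is truncated. Older pandas versions show the first 30 and last 30 rows, while newer versions show fewer rows by default. The display.max_rows and display.min_rows options control this.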
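How much of a large frame is displayed depends on pandas display options. The sketch below assumes a pandas version that has the display.min_rows option (0.25 or later) and uses a made-up 100-row frame called numbers. It lowers the limits temporarily so the truncation is easy to see.

```python
import pandas as pd

# A throwaway frame with 100 rows, used only to trigger truncation.
numbers = pd.DataFrame({"n": range(100)})

# option_context restores the previous settings when the block ends.
# max_rows: above this many rows the output is truncated.
# min_rows: how many rows are shown once truncation happens.
with pd.option_context("display.max_rows", 10, "display.min_rows", 4):
    text = repr(numbers)

print(text)
```

To change the settings for the whole session instead, pd.set_option("display.max_rows", ...) takes the same option names.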

Series

A Series is a one-dimensional array of indexed data. It can hold any kind of data, such as integers, strings, or arbitrary Python objects. Its axis labels are collectively called the index. In a DataFrame, every column is a Series, and a single row selected from it is also returned as a Series.
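A short sketch of these ideas follows. The names medals and frame are illustrative, and the values are taken from the sample rows shown earlier rather than from the full file.

```python
import pandas as pd

# A Series with explicit axis labels (the index).
medals = pd.Series(["Gold", "Silver", "Bronze"], index=["HUN", "AUT", "GRE"])
print(medals.index)     # the axis labels
print(medals["AUT"])    # label-based lookup

# Columns and rows of a DataFrame both come back as Series.
frame = pd.DataFrame({"City": ["Athens", "Athens"], "Edition": [1896, 1896]})
column = frame["City"]  # one column, indexed by the row labels
row = frame.iloc[0]     # one row, indexed by the column names
print(type(column).__name__, type(row).__name__)
```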

[Image: Series and DataFrame]

To select two columns, pass a list of column names as shown below. The result is a DataFrame, not a Series.

head[['City','Athlete']]

To check the type of each object, use the code below.

type(head)                      # output: pandas.core.frame.DataFrame
type(head.City)                 # output: pandas.core.series.Series
type(head[['City','Athlete']])  # output: pandas.core.frame.DataFrame

Data Input

pd.read_csv()
pd.read_excel()
pd.read_html()

These three functions are commonly used in pandas to load data into a DataFrame.
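As a minimal sketch of the first function, pd.read_csv() can read from any file-like object, so the example below parses a small in-memory CSV instead of a file on disk. The csv_text content is a cut-down version of the sample rows, and the name olympics is illustrative. pd.read_excel() and pd.read_html() are called in a similar way but rely on optional packages (for example openpyxl or lxml), so they are not run here.

```python
import io
import pandas as pd

# Athlete names contain commas, so they are quoted in the CSV text.
csv_text = """City,Edition,Athlete,NOC,Medal
Athens,1896,"HAJOS, Alfred",HUN,Gold
Athens,1896,"HERSCHMANN, Otto",AUT,Silver
"""

# StringIO makes the string behave like an open file.
olympics = pd.read_csv(io.StringIO(csv_text))
print(olympics.shape)
print(olympics.columns.tolist())
```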

  1. shape: This attribute returns the shape of the DataFrame as a tuple of (rows, columns). In our case there are 29216 rows and 10 columns.
    • head.shape  # output: (29216, 10)
  2. head(): If you don't specify the number of rows, you get the first 5 rows by default.
    • head.head()
  3. tail(): Likewise, if you don't specify the number of rows, you get the last 5 rows.
    • head.tail()
  4. info(): This method prints a summary of the DataFrame: the index range, each column's name, non-null count and data type, and the memory usage.
    • head.info()
  5. value_counts(): This Series method returns the count of each unique value. By default the result is sorted in descending order of count; pass ascending=True to reverse it.
    • head.Gender.value_counts(ascending=True)
  6. sort_values(): Called on a Series, this sorts the values in ascending order by default. Called on a DataFrame, it needs the by argument naming the column or columns to sort by.
    • head.sort_values(by=['City', 'Athlete'])

The result is a new DataFrame whose rows are sorted by the columns passed to sort_values().
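The inspection and sorting calls above can be tried on a small frame without downloading the full file. The sketch below uses the nine sample rows shown earlier, reduced to the Athlete and NOC columns; the variable names are illustrative, and NOC is used for counting because Gender has only one value in the sample.

```python
import pandas as pd

# The nine sample rows from the table above, reduced to two columns.
rows = pd.DataFrame({
    "Athlete": [
        "HAJOS, Alfred", "HERSCHMANN, Otto", "DRIVAS, Dimitrios",
        "MALOKINIS, Ioannis", "CHASAPIS, Spiridon", "CHOROPHAS, Efstathios",
        "HAJOS, Alfred", "ANDREOU, Joannis", "CHOROPHAS, Efstathios",
    ],
    "NOC": ["HUN", "AUT", "GRE", "GRE", "GRE", "GRE", "HUN", "GRE", "GRE"],
})

print(rows.shape)    # (rows, columns) as a tuple
print(rows.head(2))  # first 2 rows instead of the default 5
print(rows.tail(2))  # last 2 rows
rows.info()          # prints the summary itself

# Count of each unique value: most frequent first, then least frequent first.
counts = rows["NOC"].value_counts()
counts_asc = rows["NOC"].value_counts(ascending=True)
print(counts)

# Sort by NOC first, then by Athlete within each NOC.
by_noc_then_name = rows.sort_values(by=["NOC", "Athlete"])
print(by_noc_then_name)
```

sort_values() returns a new DataFrame and leaves rows unchanged, which is why the result is assigned to a new name.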

Sagar Jaybhay, from Maharashtra, India, is currently a Senior Software Developer at Software Company. He has continuously grown in the roles that he has held in the more than seven years he has been with this company. Sagar Jaybhay is an excellent team member and prides himself on his work contributions to his team and company as a whole.
