Pandas: Library heavily used in machine learning
Pandas By Sagar Jaybhay
Boolean vectors can be used to filter data. If you use multiple conditions, each condition must be grouped in its own parentheses.
If you want to find who won a gold medal, you can write head.Medal == 'Gold'. This gives you a Series of True and False values, but you usually want the actual rows, so you pass that Series back to the data frame.
head[head.Medal == 'Gold'] returns a data frame as a result. That is a single condition; multiple conditions are joined with & (and) or | (or).
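The filtering described above can be sketched as follows. The rows are hypothetical sample data standing in for the Olympics data frame head, and the column names Athlete and Sport are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real `head` data frame has 29216 rows.
head = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred", "LANE, Francis", "BURKE, Thomas"],
    "Sport": ["Athletics", "Aquatics", "Athletics", "Athletics"],
    "Medal": ["Gold", "Gold", "Bronze", "Gold"],
})

# Single condition: first a Boolean Series, then the matching rows.
is_gold = head.Medal == "Gold"
gold = head[is_gold]

# Multiple conditions: wrap each one in parentheses and join with & (and) or | (or).
gold_athletics = head[(head.Medal == "Gold") & (head.Sport == "Athletics")]
print(gold_athletics)
```

The parentheses matter because & and | bind more tightly than == in Python, so leaving them out changes what is compared.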
Through Series.str you can access the string methods and apply them to every value in a column.
For example, you can find the athletes whose names start with FLACK.
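A minimal sketch of the FLACK lookup, assuming the names live in a column called Athlete and that str.startswith is the string method wanted (the original snippet is not shown, so both are assumptions). The rows are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample rows; `Athlete` is an assumed column name.
head = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred", "FLACK, Edwin"],
    "Medal": ["Gold", "Gold", "Gold"],
})

# .str exposes string methods element-wise; startswith returns a Boolean Series.
starts_with_flack = head.Athlete.str.startswith("FLACK")
flack = head[starts_with_flack]
print(flack)

# Other string methods work the same way, for example lower().
lower_names = head.Athlete.str.lower()
```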
head.index #o/p RangeIndex(start=0, stop=29216, step=1)
pandas.DataFrame.set_index sets the index of a data frame using an existing column.
The inplace parameter applies the change to the underlying data frame itself instead of returning a new one.
Once you set the index with inplace=True, the column is moved into the index, so calling set_index with the same column name a second time raises a KeyError.
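The behaviour of set_index and inplace can be sketched like this. The rows and the Athlete column name are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample rows; `Athlete` is an assumed column name.
head = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred", "BURKE, Thomas"],
    "Medal": ["Gold", "Gold", "Gold"],
})

# Without inplace, set_index returns a new data frame and leaves `head` alone.
indexed_copy = head.set_index("Athlete")

# With inplace=True, `head` itself is changed and the column moves into the index.
head.set_index("Athlete", inplace=True)

# The column no longer exists, so setting it again raises a KeyError.
raised = False
try:
    head.set_index("Athlete", inplace=True)
except KeyError as err:
    raised = True
    print("KeyError:", err)
```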
To remove the index and go back to the default one, use reset_index.
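A small sketch of removing the index, assuming reset_index is the call meant here (the original line is not shown). The rows are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample rows, already indexed by the assumed `Athlete` column.
head = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred", "BURKE, Thomas"],
    "Medal": ["Gold", "Gold", "Gold"],
}).set_index("Athlete")

# reset_index moves the index back into a regular column
# and restores the default 0..n-1 index.
head.reset_index(inplace=True)
print(head)
```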
You can sort the elements along an axis.
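The original snippet for sorting is not shown. Since this part of the article is about the index, the sketch below assumes sort_index was meant; sort_values would be the counterpart for sorting by column values. The rows are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample rows, indexed by the assumed `Athlete` column.
head = pd.DataFrame({
    "Athlete": ["HAJOS, Alfred", "FLACK, Edwin", "BURKE, Thomas"],
    "Medal": ["Gold", "Gold", "Gold"],
}).set_index("Athlete")

# Sort the rows by index label, ascending by default.
sorted_head = head.sort_index()

# ascending=False reverses the order.
sorted_desc = head.sort_index(ascending=False)
print(sorted_head)
```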
You can search the index by label; if the label is not found, a KeyError is raised.
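A label lookup can be sketched with loc, which is assumed to be the accessor meant here. The rows and the missing label are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows, indexed by the assumed `Athlete` column.
head = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred", "BURKE, Thomas"],
    "Medal": ["Gold", "Gold", "Gold"],
}).set_index("Athlete")

# A label that exists returns the matching row.
row = head.loc["FLACK, Edwin"]
print(row)

# A label that does not exist raises a KeyError.
found = True
try:
    head.loc["NOBODY, Nobody"]
except KeyError:
    found = False
```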
With groupby you can split a data frame into groups based on some criteria and then combine the results per group. A GroupBy object is not a data frame; it is a dictionary-like collection of data frames, one per group.
groupby returns a GroupBy object. We can check its type and list its contents:
type(head.groupby('Sport')) #o/p is pandas.core.groupby.groupby.DataFrameGroupBy
list(head.groupby('Sport'))
You can iterate through the groups; each step yields the group key and that group's data frame.
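Iterating over the groups can be sketched as below. The rows are hypothetical sample data; only the Sport column name comes from the article.

```python
import pandas as pd

# Hypothetical sample rows.
head = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred", "LANE, Francis", "BURKE, Thomas"],
    "Sport": ["Athletics", "Aquatics", "Athletics", "Athletics"],
    "Medal": ["Gold", "Gold", "Bronze", "Gold"],
})

# Each iteration yields (group key, data frame holding that group's rows).
sizes = {}
for sport, group in head.groupby("Sport"):
    sizes[sport] = len(group)
    print(sport)
    print(group)
```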
Counting gives you the total number of rows for each column.
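The call behind this description is not shown in the article. One method that matches it is count(), used here as an assumption; note that it counts non-missing values, so a column with gaps reports fewer rows. The rows are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample rows; one medal is deliberately missing.
head = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred", "LANE, Francis", "BURKE, Thomas"],
    "Sport": ["Athletics", "Aquatics", "Athletics", "Athletics"],
    "Medal": ["Gold", "Gold", None, "Gold"],
})

# Non-missing values per column for the whole data frame.
counts = head.count()
print(counts)

# The same count, computed per group.
per_sport = head.groupby("Sport").count()
print(per_sport)
```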
Slicing selects rows from a start position up to an end position. The end position itself is excluded, so the last row returned is at end - 1; a slice from 1 to 100 returns the rows at positions 1 through 99.
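A sketch of the slice described above, assuming positional slicing with iloc and the bounds 1 and 100 (the original code is not shown). The data frame is generated sample data.

```python
import pandas as pd

# Generated sample data: 200 rows with the default 0..199 index.
head = pd.DataFrame({"Athlete": [f"athlete_{i}" for i in range(200)]})

# Positions 1 up to, but not including, 100.
subset = head.iloc[1:100]
print(len(subset))
```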
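Operations on data frames: To import data from an Excel file, we need to install a reader package. Older pandas setups used xlrd for this; xlrd 2.0 and later reads only legacy .xls files, so current pandas reads .xlsx files through openpyxl.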
The to_csv method saves data from a data frame to a CSV file at the location you give, for example df.to_csv(r'c:\data\mydata.csv'). By default it also saves the index of the data frame. If you don't want the index in the CSV, pass index=False.
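The difference the index flag makes can be sketched as follows. The file names and the temporary directory are hypothetical, chosen so the example runs on any machine; the rows are sample data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred"],
    "Medal": ["Gold", "Gold"],
})

# Write into a temporary folder instead of a fixed path like c:\data.
out_dir = tempfile.mkdtemp()
with_index = os.path.join(out_dir, "with_index.csv")
without_index = os.path.join(out_dir, "without_index.csv")

df.to_csv(with_index)                  # default: index written as first column
df.to_csv(without_index, index=False)  # index left out

# Compare the header lines of the two files.
with open(with_index) as f:
    header_with = f.readline().strip()
with open(without_index) as f:
    header_without = f.readline().strip()
print(header_with)
print(header_without)
```

The file written with the default settings has an extra unnamed first column holding the index values.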
To store data in an Excel file using pandas, you need to install the openpyxl package.
If you don't want to save the index in the file, pass index=False.
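A minimal sketch of writing an Excel file without the index. The file name mydata.xlsx and the temporary folder are hypothetical, and the try/except is only there so the example degrades gracefully when no Excel writer package such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Athlete": ["FLACK, Edwin", "HAJOS, Alfred"],
    "Medal": ["Gold", "Gold"],
})

path = os.path.join(tempfile.mkdtemp(), "mydata.xlsx")

try:
    # index=False leaves the data frame index out of the sheet.
    df.to_excel(path, index=False)
    written = True
except ImportError:
    # Raised when no Excel writer package is available.
    written = False
    print("Install openpyxl first: pip install openpyxl")
```

Passing a bare file name such as 'mydata.xlsx' instead of a full path writes the file relative to the current working directory.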
This will create an xlsx file in the folder where you created the Jupyter notebook file. Basically it creates folders under