Life of Mechon
Life of Mechon is an information resource site for Mechons and Geeks. Here we focus on Machine Learning, Artificial Intelligence, 3D printing, and tips and tricks related to programming and front-end CSS.
Day 3 Indexing, Manipulation and Visualization of data - Part 1
Indexing means accessing specific rows and columns of a data frame in Python. It is done with square brackets, much like slicing a string in Python.
You do not need any special dataset for this; use the one provided on Day 2.
import pandas as pd

df = pd.read_csv("data.csv")
df.head()

# print the values of a specific column, 'Fare'
df['Fare']

# print the values of multiple columns, 'Name' and 'Cabin'
df[['Name', 'Cabin']]

# iloc is integer-position based: the first argument selects rows, the second selects columns
# print rows 2 to 4 (the end position, 5, is excluded)
df.iloc[2:5]

# print rows 0 to 4
df.iloc[:5]

# print rows 1 to 4 for all the columns
df.iloc[1:5, :]

# print the rows whose 'Embarked' value is 'S'
df[df['Embarked'] == 'S']
Change the index values and explore different ways to filter the data.
Try these:
1. Access columns up to 4
2. Access the values of rows 1 to 7 for columns 3, 4 and 5
3. Access the values of any one column using indexing
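As a warm-up for these exercises, here is a minimal sketch of position-based slicing and boolean filtering. It runs on a small made-up data frame (the rows are hypothetical; only the column names mimic the ones used above), so it works without data.csv.

```python
import pandas as pd

# Hypothetical sample data; the column names only mimic the ones used above
sample = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Fare": [7.25, 71.28, 8.05, 53.10, 8.46],
    "Embarked": ["S", "C", "S", "S", "Q"],
})

# iloc slices by position and excludes the end position:
# rows 1 and 2, columns 0 and 1
subset = sample.iloc[1:3, 0:2]

# A boolean mask keeps only the rows where the condition is True
from_s = sample[sample["Embarked"] == "S"]

# Conditions combine with & (and) or | (or); each needs its own parentheses
cheap_from_s = sample[(sample["Embarked"] == "S") & (sample["Fare"] < 10)]

print(subset)
print(from_s)
print(cheap_from_s)
```

Note that the slice 1:3 returns two rows, not three, because the end position is left out, just as with Python lists and strings.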
Data Manipulation
Data manipulation, in simple words, means organizing data into the required structure. We will use some new functions to manipulate the data, covering operations such as sorting and merging, along with functions for finding and handling missing data.
First, look through the code below and read the comments for the explanation.
isna() and notna() detect missing values, dropna() removes them, and fillna() replaces them, across rows or columns.
We use dropna() here. It drops the rows (or columns) that contain null values, and it accepts several parameters.
The code below passes how="any", which drops a row if any of its values is missing; how="all" would drop a row only when all of its values are missing. The image above lists the parameters for comparison.
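To see the four functions side by side, here is a small sketch on a made-up data frame. The values are hypothetical, and filling 'Age' with the column mean and 'Cabin' with "Unknown" is just one possible choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with some missing values
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": ["C85", np.nan, "E46", np.nan],
    "Fare": [7.25, 71.28, np.nan, np.nan],
})

# isna() marks missing cells; summing the marks counts them per column
missing_per_column = sample.isna().sum()

# how="any" drops a row if at least one value is missing
any_dropped = sample.dropna(how="any")

# how="all" drops a row only if every value is missing
all_dropped = sample.dropna(how="all")

# fillna() with a dict fills each listed column with its own value
filled = sample.fillna({"Age": sample["Age"].mean(), "Cabin": "Unknown"})

# notna() is the opposite of isna(); here it keeps rows that have a cabin
has_cabin = sample[sample["Cabin"].notna()]

print(missing_per_column)
print(any_dropped)
print(all_dropped)
print(filled)
print(has_cabin)
```

None of these calls change the original data frame; each returns a new one, so assign the result if you want to keep it.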
Sorting in pandas relies on two main functions:
- sort_values()
- sort_index()
# Sorting a single column
import pandas as pd
import numpy as np
data_file = pd.read_csv('data.csv')
# drop the null values
data_file = data_file.dropna(how="any")
# view the top results
data_file.head()
# sort by fare (ascending by default)
sorted_data = data_file.sort_values(by='Fare')
# print sorted data
sorted_data[:5]
# sort in place, in descending order
# (the column named here must exist in your data.csv; otherwise swap in one of your own columns)
data_file.sort_values(by='Outlet_Establishment_Year', ascending=False, inplace=True)
data_file[:5]
# inplace=True updates data_file itself; the default, inplace=False, returns a new sorted data frame
# Sorting by multiple columns
# data_file was sorted in place above, so we read the csv file again to start from the original order
data_file = pd.read_csv('data.csv')
data_file.sort_values(by=['Outlet_Establishment_Year', 'Item_Outlet_Sales'], ascending=False)[:5]
# changed the order of columns
data_file.sort_values(by=['Item_Outlet_Sales', 'Outlet_Establishment_Year'], ascending=False, inplace=True)
data_file[:5]
# compare the two results to see how the order of the columns changes the sort
# Sort using Row index
# sort by index
data_file.sort_index(inplace=True)
data_file[:5]
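The sorting calls above can also be tried on a tiny made-up data frame, which makes the effect easier to see. In this sketch the values and the 'Pclass' column are hypothetical; it also shows that ascending accepts a list, one flag per column, which the code above does not use.

```python
import pandas as pd

# Hypothetical sample data with a non-default row index
sample = pd.DataFrame(
    {"Pclass": [3, 1, 3, 1], "Fare": [8.05, 71.28, 7.25, 53.10]},
    index=[10, 11, 12, 13],
)

# Sort by Pclass ascending, then by Fare descending within each Pclass
by_class_then_fare = sample.sort_values(
    by=["Pclass", "Fare"], ascending=[True, False]
)

# sort_index() puts the rows back in row-label order
restored = by_class_then_fare.sort_index()

# reset_index(drop=True) discards the old labels and renumbers from 0
renumbered = by_class_then_fare.reset_index(drop=True)

print(by_class_then_fare)
print(restored)
print(renumbered)
```

Because inplace is left at its default here, sample itself keeps its original order and each call returns a new data frame.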
Merging Data frames
pd.concat() and pd.merge() are the two most useful functions for combining data frames.
Let us create three data frames ourselves and combine them.
Below is the syntax for creating a data frame, used here to create all three:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3'], 'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'], 'B': ['B4', 'B5', 'B6', 'B7'], 'C': ['C4', 'C5', 'C6', 'C7'], 'D': ['D4', 'D5', 'D6', 'D7']}, index=[4, 5, 6, 7])
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'], 'B': ['B8', 'B9', 'B10', 'B11'], 'C': ['C8', 'C9', 'C10', 'C11'], 'D': ['D8', 'D9', 'D10', 'D11']}, index=[8, 9, 10, 11])
Now Let's combine the data into a single data frame using concat()
# combine dataframes
result = pd.concat([df1, df2, df3])
# result now holds the rows of all three data frames
result
As with dictionaries, we can attach a key (label) to each of the data frames we combine, for instance:
# combine dataframes with labels
result = pd.concat([df1, df2, df3], keys=['A', 'B', 'C'])
result
The labels now appear on the left-hand side of the data frame, as an outer index level.
Using the loc attribute, we can easily access the rows stored under a given key.
# This code would print the values linked with the key 'A'
result.loc[ 'A' ]
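merge() was mentioned above but not shown, so here is a minimal sketch of it. Unlike concat(), which stacks data frames, merge() joins them on matching values in a shared column. The two data frames and the column name 'key' are made up for illustration.

```python
import pandas as pd

# Two hypothetical data frames that share a 'key' column
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B3"]})

# Default (inner) join: keeps only the keys present in both data frames
inner = pd.merge(left, right, on="key")

# Outer join: keeps every key from either side and fills the gaps with NaN
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```

The how parameter also accepts "left" and "right", which keep all the keys of one side only.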