Day 4 Indexing, Manipulation and Visualization of data - Part 2


Now let us use the merge().

merge() is some what related to the joins in SQL, so if you google about the joins in SQL it is very easy for you to grasp these things quicker.

Ok look at these tables and we are going to merge these tables using different join operations

While we should create those data frames 

df_a = pd.DataFrame({
        'roll_num': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']})

df_b = pd.DataFrame({
        'roll_num': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']})

df_c = pd.DataFrame({
        'roll_num': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

roll_num first_namelast_name
1 Alex Anderson
2 Amy Ackerman
3 Allen Ali
4 Alice Aoni
5 Ayoung Atitches

roll_num first_namelast_name
4 Billy Bonder
5 Bryan Black
6 Bran Balwner
7 Bryce Brice
8 Betty Btisan

roll_num test_id
1 51
2 15
3 15
4 61
5 16
7 14
8 15
9 1
10 61
11 16

Using these dataframe we'll try to use basic join operation, inner, left, outer, right.

Try this code snippets , please put import statements by yourself 

pd.merge(df_a, df_c, on='roll_num')


pd.merge(df_a, df_b, on='roll_num', how='outer')


pd.merge(df_a, df_b, on='roll_num', how='inner')


pd.merge(df_a, df_b, on='roll_num', how='right')


pd.merge(df_a, df_b, on='roll_num', how='left')


You might wonder which one to use merge() or concat() that belongs to the data you are going to work with, you can find one variation between merge and concate as concate function join the data at bottom of another dat frame , and merge join to the right.

And that one also can be changed by changing axes value.

Note: 
         Check this link for the pandas functions and uses , you can navigate to all functions, their attributes and their values.

Well now we will look how to visualize the data using visualization tools

Oh! wait one minute,  how did I forget this one ,

You can use apply() function which was often you are going to use in Data Manipulation.

This is very similar to for loop, as we use the keywords to access the data according to our requirement.

Let me share the codes with you , try it yourself. And also try with different datasets, data sets can be downloaded from Kaggle or in google.

import pandas as pd
import numpy as np

data_BM = pd.read_csv('data.csv')
# drop the null values
data_BM = data_BM.dropna(how="any")
# reset index after dropping
data_BM = data_BM.reset_index(drop=True)
# view the top results
data_BM.head()

# accessing row wise
data_BM.apply(lambda x: x)

# access first row
data_BM.apply(lambda x: x[0])

# accessing column wise
data_BM.apply(lambda x: x, axis=1)

# before clipping
data_BM["Fare"][:5]

# clip fare it is greater than 50
def clip_price(price):
    if price > 50:
        price = 50
    return price

# after clipping
data_BM["Fare"].apply(lambda x: clip_price(x))[:5]

data_BM["Embarked"][:5]

# label encode type 
def label_encode(type):
    if type == 'C':
        label = 0
    elif type == 'S':
        label = 1
    else:
        label = 2
    return label

# operate label_encode on every row of Outlet_Location_Type
data_BM["Embarked"] = data_BM["Embarked"].apply(label_encode)

# after label encoding
data_BM["Embarked"][:5]


Recommended

Post a Comment