Machine Learning -- This tutorial will help you learn Machine Learning day by day, with a 100-day goal.

We are very happy to announce that we've created a brand new Machine Learning series.




How is it different from other Machine Learning tutorials?

Here we won't make you learn Machine Learning from the basics, as most online courses do.

Instead, we give you a little background on Machine Learning and then dive into the case study approach.


What is the case study approach?

Instead of learning the algorithms and linear algebra first, we step directly into applications.

We will work through a series of projects and, in the end, understand why Machine Learning needs these algorithms and statistical functions.

Do I need to learn any basics for Machine Learning?

Of course, you should learn Python. It is an easy language to learn, even for writing complex functions. Learn at least the basics of Python; it will be very useful as you go through the course.

Is there any fee for this course?

No. As we're contributing our articles to the mechanical engineering community, we don't charge any fee. It is completely free.

Do Mechanical Engineers find jobs in Machine Learning?

Definitely yes. One of the trendiest jobs in 2020 is Machine Learning Engineer, and it could well be the future for Mechanical Engineers.

Do we receive any certificates?

You will receive a certificate of completion once you finish and submit the final project of the course.

How long does the course take?

It is a self-paced course, but it should take around 12 weeks.

For live updates on the latest series, join our Telegram channel: https://t.me/lifeofmechon

1.1 This post will help you install the basic software required to work with Machine Learning.


Basic Requirements:
  1. Python
  2. Jupyter Notebook

Go through the steps below if you haven't installed Python yet.

If you've already installed Python, skip the first step.

1. Download Python from this website https://www.python.org/downloads/


    Specific versions for other operating systems can also be downloaded from the same website.

2. After installation, check whether Python is installed correctly on your PC/laptop.
 
Open your Command Prompt window and type "python --version" without quotes.


3. After installing Python, make sure you have pip, the package installer for Python.

For Python 3.4 and later, pip is installed by default.

Just ensure it is installed properly by typing "pip --version" without quotes in the Command Prompt window, just like above.

4. Jupyter Notebook is an open-source project developed on GitHub. It gives Python some additional functionality for visualizing statistical data effectively.

Open your Command Prompt window and type "pip install notebook" without quotes to install Jupyter.



After installing Jupyter, check it using the command "jupyter notebook" without quotes.

It will open your browser at a local host address (typically http://localhost:8888).

With that, you've installed Python and Jupyter Notebook.

Next, we will install some important packages for Machine Learning.

1.2 With the setup done, we are confident it is now easy to learn the Machine Learning fundamentals.

As said earlier, Python is used throughout this series, so you should know the basics of Python. If you're ready, continue; otherwise, take some online courses or lectures to learn Python basics first. There are many sites which offer Python basics for free.


Now that we've already installed Jupyter Notebook and Python, let us get into the Machine Learning packages which are going to be used throughout this course:

  1. Numpy
  2. Pandas
  3. Scikit-Learn
  4. Scipy
  5. Matplotlib

First of all, these libraries need to be installed on your PC or laptop.

Use the following commands in the Command Prompt window to install them:

pip install numpy

pip install scipy

pip install pandas

pip install scikit-learn

pip install matplotlib

There is also a Machine Learning package for Python called PyCaret.

You can simply type

pip install pycaret

to install all the packages mentioned above, plus some other libraries useful for Machine Learning.
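Once they're installed, you can quickly confirm that each package imports and report its version. A minimal sketch (note that scikit-learn is imported under the name sklearn); run it in Python or a Jupyter cell:

# quick sanity check that the installed packages import correctly
import numpy
import scipy
import pandas
import sklearn
import matplotlib

# every one of these packages exposes a __version__ string
for package in (numpy, scipy, pandas, sklearn, matplotlib):
    print(package.__name__, package.__version__)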

Don't confuse yourself too much with the algorithms and models yet; there are numerous courses that deal with Machine Learning that way. We framed this course to deal with applications first and the algorithms afterwards, so you can better understand where you are going to use Machine Learning.

After installing the required packages, everything is complete from the software end. From the next post onwards we're going to write code in the application arena. So be ready, folks: go to the bottom of the page and drop your mail address to receive Machine Learning posts in your mail as soon as we post.

2.1 Introduction to Pandas



Pandas is a powerful Python data analysis tool for reading, filtering, manipulating, visualizing and exporting data.

Pandas is used by most data scientists and IT professionals to analyze data.

Pandas has many alternatives, but we use pandas because it has more functionality than the others.

It has huge contribution and support from the community, and anyone can use pandas as it is an open-source library. It is built on top of NumPy, another Python package, which handles the underlying numerical computation.

Pandas can read different forms of data: CSV files, JSON and many other formats are supported.

Filtering, selecting and manipulating the data are all done easily.

Pandas can help you read different types of files; for an overview, see the table below.

Format Type | Data Description        | Reader         | Writer
text        | CSV                     | read_csv       | to_csv
text        | Fixed-Width Text File   | read_fwf       | -
text        | JSON                    | read_json      | to_json
text        | HTML                    | read_html      | to_html
text        | Local clipboard         | read_clipboard | to_clipboard
binary      | MS Excel                | read_excel     | to_excel
binary      | OpenDocument            | read_excel     | -
binary      | HDF5 Format             | read_hdf       | to_hdf
binary      | Feather Format          | read_feather   | to_feather
binary      | Parquet Format          | read_parquet   | to_parquet
binary      | ORC Format              | read_orc       | -
binary      | Msgpack                 | read_msgpack   | to_msgpack
binary      | Stata                   | read_stata     | to_stata
binary      | SAS                     | read_sas       | -
binary      | SPSS                    | read_spss      | -
binary      | Python Pickle Format    | read_pickle    | to_pickle
SQL         | SQL                     | read_sql       | to_sql
SQL         | Google BigQuery         | read_gbq       | to_gbq

Mostly we will deal with CSV and Excel files.
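The writers in the table work just like the readers. As a quick illustration, here is a sketch with a made-up data frame (out.csv and out.xlsx are arbitrary file names chosen for the example):

import pandas as pd

df = pd.DataFrame({"name": ["Alex", "Amy"], "fare": [51.0, 15.0]})

# write to CSV, dropping the row index
df.to_csv("out.csv", index=False)

# write to Excel (.xlsx files typically need the openpyxl package)
df.to_excel("out.xlsx", index=False)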

Step 0:

For reading Excel files there is an additional dependency to install.

In the command prompt window type

pip install xlrd==1.2.0

Now, download the dataset files from here. [Source: Analytics Vidhya]

Step 1: Reading datasets with Pandas

     1. Open Jupyter Notebook by typing jupyter notebook in the command prompt window.

     2. Upload the dataset files to the Jupyter notebook.

     3. Create a new Python 3 notebook and run the following commands separately.

# importing the pandas library and naming it pd for ease of use
import pandas as pd

# assigning the data read from the dataset file to df
df = pd.read_csv("data.csv")

# head() prints the first 5 rows of the data frame
df.head()

# reading an Excel file works the same way
df1 = pd.read_excel("data.xlsx")

df1.head()

2.2 Data frames

Data frames are the tabular structures used to hold data in Python. The pandas DataFrame and SFrame are two kinds of data frame, and each has its own unique functions.

They are used to perform several operations. Some of them are:
  • df.shape which provides the dimensions, i.e. rows × columns (note: it is an attribute, not a method)
  • df.head()  is used to access top of the data frame
  • df.tail() is used to access bottom of the data frame
  • df.columns is used to access all columns
  • df["column_name"] is used to access data in a specified column
  • df[["column1","column2"]] for accessing data of multiple columns (note the double brackets)
Try the above functions to perform different kinds of data filtering; a short sketch follows.
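For example, a quick sketch with the day-2 dataset (assuming data.csv has the Name and Fare columns used later in this series):

import pandas as pd

df = pd.read_csv("data.csv")

# dimensions as (rows, columns)
print(df.shape)

# first and last five rows
print(df.head())
print(df.tail())

# all column names
print(df.columns)

# one column, then multiple columns (note the double brackets)
print(df["Name"])
print(df[["Name", "Fare"]])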

3.1 Indexing, Manipulation and Visualization of data - Part 1

Indexing refers to accessing data in a data frame in Python. It is done within square brackets and is very similar to string slicing in Python.

There is no need for any special dataset; the dataset already provided on day 2 will do.


import pandas as pd
df = pd.read_csv("data.csv")
df.head()
# printing values of the specific column 'Fare'
df['Fare']
# printing values of multiple columns 'Name' and 'Cabin'
df[['Name','Cabin']]
# iloc is integer-index based, so the first parameter selects rows and the second selects columns
# to print rows 2 to 4
df.iloc[2:5]
# to print rows 0 to 4
df.iloc[:5]
# to print rows 1 to 4 for all the columns
df.iloc[1:5,:]
# to print the rows where a column matches a particular value
df[df['Embarked']=='S']
Change the indexing values and explore the ways to filter the data.
Try these:
1. Access columns up to 4
2. Access values of rows 1 to 7 for columns 3, 4 and 5
3. Access values of any one column using indexing
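If you get stuck, here is one possible approach, continuing from the code above (a sketch; the exact column positions depend on your file):

# 1. access columns up to 4 (the first four columns)
df.iloc[:, :4]

# 2. access rows 1 to 7 for columns 3, 4 and 5
df.iloc[1:8, 3:6]

# 3. access the values of one column using indexing
df["Fare"]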

Data Manipulation

Data manipulation, in simple words, is organizing the data into the required structure. We are going to use some new functions to manipulate the data and perform operations such as sorting and merging. We will also deal with functions used for finding missing data.

First of all, look at the code below and read the comments for the explanation.

dropna(), isna(), notna() and fillna() are the functions used for handling missing values in rows or columns.

We use dropna() here; it is the function which drops the null values, and it accepts several parameters.


You can see why how="any" is used in the code further below.
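First, here is a small sketch of the other missing-value functions in practice (assuming the same data.csv, which, like the Titanic data, has missing values in columns such as Cabin):

import pandas as pd

df = pd.read_csv("data.csv")

# count the missing values in each column
print(df.isna().sum())

# boolean mask of rows where the value is present
print(df["Cabin"].notna()[:5])

# fill missing values with a placeholder instead of dropping them
df["Cabin"] = df["Cabin"].fillna("Unknown")

# drop any rows that still contain null values
df = df.dropna(how="any")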

Sorting in pandas uses two main functions and they are
  • sort_values()
  • sort_index()

# Sorting a single column

import pandas as pd

import numpy as np

data_file = pd.read_csv("data.csv")

# drop the null values

data_file = data_file.dropna(how="any")

# view the top results

data_file.head()

# sort by fare

sorted_data = data_file.sort_values(by='Fare')

# print sorted data

sorted_data[:5]

# sort in place and descending order

data_file.sort_values(by='Outlet_Establishment_Year', ascending=False, inplace=True)

data_file[:5]

# inplace=True means the sorted values update the real data frame; by default inplace is False

# Sorting Multiple columns

# as the existing data_file is sorted due to inplace=True, we read the csv file again

data_file = pd.read_csv('data.csv')

data_file.sort_values(by=['Outlet_Establishment_Year', 'Item_Outlet_Sales'], ascending=False)[:5]

# changed the order of columns

data_file.sort_values(by=['Item_Outlet_Sales', 'Outlet_Establishment_Year'], ascending=False, inplace=True)

data_file[:5]

# you can see the difference between the two sort orders

# Sort using Row index

# sort by index

data_file.sort_index(inplace=True)

data_file[:5]

Merging Data frames

concat() and merge() are two useful functions for merging data frames.

Let us create three data frames ourselves and combine them all.

Below is the syntax for creating a data frame; we create three data frames this way.

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'C': ['C0', 'C1', 'C2', 'C3'],
                     'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])

df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                     'B': ['B4', 'B5', 'B6', 'B7'],
                     'C': ['C4', 'C5', 'C6', 'C7'],
                     'D': ['D4', 'D5', 'D6', 'D7']},
                    index=[4, 5, 6, 7])
 
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                     'B': ['B8', 'B9', 'B10', 'B11'],
                     'C': ['C8', 'C9', 'C10', 'C11'],
                     'D': ['D8', 'D9', 'D10', 'D11']},
                    index=[8, 9, 10, 11])

Now let's combine the data into a single data frame using concat().


# combine dataframes

result = pd.concat([df1, df2, df3])

# the combined data frame is stored in result

result
Similar to dictionaries, we can add keys or labels to each data frame that we would like to combine.
For instance:

# combine dataframes with labels
result = pd.concat([df1, df2, df3], keys=['A', 'B', 'C'])
result

Now the labels appear on the left-hand side of the data frame.
Using the loc attribute we can easily access the data frame by key.

# This code would print the values linked with the key 'A'

result.loc['A']

4.1 Indexing, Manipulation and Visualization of data - Part 2


Now let us use merge().

merge() is somewhat like the joins in SQL, so if you look up SQL joins it will be easy to grasp these operations quickly.

OK, look at the tables below; we are going to merge them using different join operations.

First, let us create those data frames:

df_a = pd.DataFrame({
        'roll_num': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']})

df_b = pd.DataFrame({
        'roll_num': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']})

df_c = pd.DataFrame({
        'roll_num': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

roll_num | first_name | last_name
1        | Alex       | Anderson
2        | Amy        | Ackerman
3        | Allen      | Ali
4        | Alice      | Aoni
5        | Ayoung     | Atiches

roll_num | first_name | last_name
4        | Billy      | Bonder
5        | Brian      | Black
6        | Bran       | Balwner
7        | Bryce      | Brice
8        | Betty      | Btisan

roll_num | test_id
1        | 51
2        | 15
3        | 15
4        | 61
5        | 16
7        | 14
8        | 15
9        | 1
10       | 61
11       | 16

Using these data frames we'll try the basic join operations: inner, left, outer and right.

Try these code snippets (add the import statements yourself):

# inner join on roll_num (the default)
pd.merge(df_a, df_c, on='roll_num')

# outer join: keep all rows from both frames
pd.merge(df_a, df_b, on='roll_num', how='outer')

# inner join: keep only the roll_num values present in both frames
pd.merge(df_a, df_b, on='roll_num', how='inner')

# right join: keep all rows from df_b
pd.merge(df_a, df_b, on='roll_num', how='right')

# left join: keep all rows from df_a
pd.merge(df_a, df_b, on='roll_num', how='left')


You might wonder which one to use, merge() or concat(). That depends on the data you are going to work with. One visible difference is that concat() by default joins data at the bottom of another data frame, while merge() joins data to the right on key columns.

That behaviour can also be changed by changing the axis value, as the short sketch below shows.
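For instance, using the df_a and df_b created above:

# default axis=0 stacks df_b below df_a
pd.concat([df_a, df_b])

# axis=1 places df_b to the right of df_a, aligning rows by index
pd.concat([df_a, df_b], axis=1)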

Note:
         Check this link for the pandas functions and their uses; you can navigate to all the functions, their attributes and their values.

Well, next we will look at how to visualize the data using visualization tools.

Oh! Wait a minute, how did I forget this one:

You can use the apply() function, which you are going to use often in data manipulation.

It works much like a for loop, applying a function across the data frame to access the data according to our requirement.

Let me share the code with you; try it yourself. Also try it with different datasets, which can be downloaded from Kaggle or found via Google.

import pandas as pd
import numpy as np

data_BM = pd.read_csv('data.csv')
# drop the null values
data_BM = data_BM.dropna(how="any")
# reset index after dropping
data_BM = data_BM.reset_index(drop=True)
# view the top results
data_BM.head()

# apply a function column-wise (default axis=0: each x is a column)
data_BM.apply(lambda x: x)

# access the first row (the first element of every column)
data_BM.apply(lambda x: x[0])

# apply a function row-wise (axis=1: each x is a row)
data_BM.apply(lambda x: x, axis=1)

# before clipping
data_BM["Fare"][:5]

# clip fare if it is greater than 50
def clip_price(price):
    if price > 50:
        price = 50
    return price

# after clipping
data_BM["Fare"].apply(lambda x: clip_price(x))[:5]

data_BM["Embarked"][:5]

# label encode type 
def label_encode(type):
    if type == 'C':
        label = 0
    elif type == 'S':
        label = 1
    else:
        label = 2
    return label

# apply label_encode to every value of the Embarked column
data_BM["Embarked"] = data_BM["Embarked"].apply(label_encode)

# after label encoding
data_BM["Embarked"][:5]