Life of Mechon
Life of Mechon is an information resource site for Mechons and Geeks. Here we focus on Machine Learning, Artificial Intelligence, 3D printing, and tips and tricks related to programming and front-end CSS.
Introduction to Pandas
Pandas is a powerful Python data analysis library for reading, filtering, manipulating, visualizing, and exporting data.
Many data scientists and IT professionals use pandas to analyze data.
Pandas has alternatives, but we use it here because it covers a broad range of data-handling features in a single package.
It is an open-source library, so anyone can use it, and it benefits from a large community of contributors and supporters. It is built on top of NumPy, the Python package for numerical computing with arrays.
Pandas can read data in many formats, including CSV and JSON files.
Filtering, selecting, and manipulating data are all straightforward.
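As a quick taste of filtering, selecting, and manipulating, here is a minimal sketch. The column names and values are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina"],
    "age": [23, 35, 41, 29],
    "city": ["Pune", "Leeds", "Taipei", "Cairo"],
})

# Filtering: keep only the rows where age is above 30
over_30 = df[df["age"] > 30]

# Selecting: keep only the name and city columns
names_and_cities = df[["name", "city"]]

# Manipulating: add a new column derived from an existing one
df["age_in_months"] = df["age"] * 12

print(over_30)
print(names_and_cities)
print(df)
```

The filter uses a boolean mask (df["age"] > 30), which is the usual way to pick rows by condition in pandas.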
Pandas can read and write many types of files. The table below lists the supported formats along with their reader and writer functions.
Format Type | Data Description | Reader | Writer |
---|---|---|---|
text | CSV | read_csv | to_csv |
text | Fixed-Width Text File | read_fwf | |
text | JSON | read_json | to_json |
text | HTML | read_html | to_html |
text | Local clipboard | read_clipboard | to_clipboard |
binary | MS Excel | read_excel | to_excel |
binary | OpenDocument | read_excel | |
binary | HDF5 Format | read_hdf | to_hdf |
binary | Feather Format | read_feather | to_feather |
binary | Parquet Format | read_parquet | to_parquet |
binary | ORC Format | read_orc | |
binary | Msgpack (removed in pandas 1.0) | read_msgpack | to_msgpack |
binary | Stata | read_stata | to_stata |
binary | SAS | read_sas | |
binary | SPSS | read_spss | |
binary | Python Pickle Format | read_pickle | to_pickle |
SQL | SQL | read_sql | to_sql |
SQL | Google BigQuery | read_gbq | to_gbq |
Mostly we will deal with CSV and Excel files.
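To see how a reader and its matching writer pair up, here is a minimal sketch that writes a small DataFrame to CSV and JSON text and reads it back. The sample data is hypothetical, and the text is kept in memory with io.StringIO so no files are needed.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 12.0]})

# Writer: with no path given, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)

# Reader: read_csv accepts a file path or a file-like object
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# The same pairing for JSON: to_json writes, read_json reads
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text))

print(df_from_csv)
print(df_from_json)
```

With real files you would pass a file name instead, for example df.to_csv("data.csv", index=False) followed by pd.read_csv("data.csv").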
Step 0:
Reading Excel files requires an additional dependency. The version is pinned because xlrd 2.0 and later read only the older .xls format, not .xlsx.
In a command prompt window, type:
pip install xlrd==1.2.0
Now, download the dataset files from here. [Source: Analytics Vidhya]
Step 1: Reading datasets with Pandas
1. Open Jupyter Notebook by typing jupyter notebook in a command prompt window.
2. Upload the dataset files to Jupyter Notebook.
3. Create a new Python 3 notebook and run the following commands in separate cells.
# import the pandas library under the alias pd for convenience
import pandas as pd
# read the CSV dataset into a DataFrame named df
df = pd.read_csv("data.csv")
# head() shows the first 5 rows (all columns) by default
df.head()
# read the Excel dataset into a second DataFrame named df1
df1 = pd.read_excel("data.xlsx")
df1.head()
A data frame is a two-dimensional, table-like data structure with labeled rows and columns. The pandas DataFrame and SFrame are both kinds of data frame, and each has its own functions.
Data frames support many operations.
Some of them are:
- df.shape gives the dimensions as (rows, columns); it is an attribute, so it takes no parentheses
- df.head() is used to access the top of the data frame
- df.tail() is used to access the bottom of the data frame
- df.columns is used to access all column names
- df["column_name"] is used to access the data in a specified column
- df[["column1", "column2"]] is used to access the data in multiple columns (note the double brackets)
Try the operations above to inspect and select data in different ways.
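The operations listed above can be tried on a small in-memory table, as in this minimal sketch. The column names and values are hypothetical and stand in for the downloaded dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5, 6],
    "column2": ["a", "b", "c", "d", "e", "f"],
    "column3": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

print(df.shape)          # (rows, columns) tuple; an attribute, not a method
print(df.head())         # first 5 rows by default
print(df.tail(2))        # last 2 rows
print(list(df.columns))  # the column labels

one_column = df["column1"]                # a single column, as a Series
two_columns = df[["column1", "column2"]]  # several columns, as a DataFrame
print(one_column)
print(two_columns)
```

Passing a list of names inside the brackets is what selects several columns at once; a bare df["column1", "column2"] would be treated as a single key and raise a KeyError on a table like this one.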