1. The pandas Library
Haim Michael
September 8th, 2020
All logos, trade marks and brand names used in this presentation belong
to the respective owners.
lifemichael
https://youtu.be/Go_6xXYEtkw
2. © 2008 Haim Michael 20150729
What is Pandas?
Pandas is a fast, powerful, flexible and easy to use open
source data analysis and manipulation tool, built on top of
the Python programming language. (pandas.pydata.org)
3. Installing Pandas
There are several ways to install Pandas (for example, pip or conda).
The simplest is to use the pip utility:
pip install pandas
4. Checking Pandas Version
You can check which version of the Pandas library is installed by
printing the __version__ attribute of the pandas module. The output
is the installed version number as a string.
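A minimal sketch of that check; it relies only on the standard __version__ attribute that the pandas module exposes:

```python
import pandas as pd

# __version__ holds the installed pandas version as a string
print(pd.__version__)
```

For a fuller report that also lists the versions of Python and of pandas' dependencies, pd.show_versions() can be called instead.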
5. Importing Pandas
To use Pandas we must first import it. It is common practice to
import it using the alias pd.
import pandas as pd
6. The DataFrame Class
Instantiating the DataFrame class gives us an object that represents
a table. When we pass a dict, each key becomes a column name and each
list becomes that column's values.
import pandas as pd
# each dict key becomes a column name, each list holds that column's values
df = pd.DataFrame({"country": ["israel", "france", "germany"],
                   "currency": ["ils", "euro", "euro"],
                   "capital": ["jerusalem", "paris", "berlin"]})
print(df)
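A DataFrame also exposes its dimensions and labels as attributes. This sketch rebuilds the same table and inspects its shape, column labels and row labels:

```python
import pandas as pd

df = pd.DataFrame({"country": ["israel", "france", "germany"],
                   "currency": ["ils", "euro", "euro"],
                   "capital": ["jerusalem", "paris", "berlin"]})

# shape is a (rows, columns) tuple
print(df.shape)          # (3, 3)
# column labels, in the order the dict keys were given
print(list(df.columns))  # ['country', 'currency', 'capital']
# without an explicit index, rows are labeled 0..n-1
print(list(df.index))    # [0, 1, 2]
```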
7. The Series Class
Each column of a DataFrame object is represented by a Series object,
a one-dimensional labeled array. A Series can also be created on its own.
import pandas as pd
marks = pd.Series([88,90,72,64], name="Mark")
print(marks)
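The link between columns and Series can be checked directly: pulling one column out of a DataFrame yields a Series that keeps the column name and the DataFrame's row labels. The "name" and "mark" columns below are illustrative only.

```python
import pandas as pd

# illustrative table; the column names are hypothetical
df = pd.DataFrame({"name": ["moshe", "daniel", "tal", "jane"],
                   "mark": [88, 90, 72, 64]})

marks = df["mark"]
print(isinstance(marks, pd.Series))  # True
# the column label becomes the Series name
print(marks.name)                    # mark
# the Series shares the DataFrame's row index
print(list(marks.index))             # [0, 1, 2, 3]
```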
8. The describe() Function
We can invoke the describe() method on both Series and DataFrame
objects. It returns summary statistics: for numeric data, the count,
mean, standard deviation, minimum, 25th, 50th and 75th percentiles,
and maximum.
import pandas as pd
marks = pd.Series([88,90,72,64], name="Mark")
print(marks.describe())
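The same method works on a DataFrame. In this sketch (the "name" column is illustrative), describe() summarizes only the numeric column by default, while include="all" also summarizes the string column with count, unique, top and freq:

```python
import pandas as pd

df = pd.DataFrame({"name": ["moshe", "daniel", "tal", "jane"],
                   "mark": [88, 90, 72, 64]})

# default: numeric columns only
print(df.describe())

# include="all": string columns are summarized as well
print(df.describe(include="all"))
```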
9. Reading & Writing Data
Pandas provides functions that read data from files in various
formats (such as CSV and Excel) directly into a DataFrame object,
and methods that write the data in a DataFrame object directly to
files in those formats.
10. Reading & Writing Data
The to_excel(), to_csv() and similar methods are invoked on a
DataFrame object. They write the data already organized in that
DataFrame to a new file in a specific format.
The read_excel(), read_csv() and similar functions are public
functions defined at the top level of the pandas module, so we call
them as pd.read_csv() and so on. Each one returns a new DataFrame.
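A minimal round trip with to_csv() and read_csv(). To keep the sketch self-contained it writes to an in-memory io.StringIO buffer instead of a file on disk; both calls also accept a file path.

```python
import io
import pandas as pd

df = pd.DataFrame({"country": ["israel", "france", "germany"],
                   "currency": ["ils", "euro", "euro"]})

# write to an in-memory text buffer; index=False leaves out the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# rewind the buffer and read it back with the module-level read_csv function
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.equals(df))  # True
```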
11. Writing to Excel Sample
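import pandas as pd
df = pd.DataFrame({"country": ["israel", "france", "germany"],
                   "currency": ["ils", "euro", "euro"],
                   "capital": ["jerusalem", "paris", "berlin"]})
# writing .xlsx files requires an Excel engine such as openpyxl
# the row index is written as the first column unless index=False is passed
df.to_excel("countries.xlsx")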
12. Reading from CSV
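import pandas as pd
# assumes countries.csv exists in the current working directory;
# by default its first row is used as the column names
ob = pd.read_csv("countries.csv")
print(ob)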
13. Selecting a Column
A single column is selected by writing its name inside square
brackets, for example df["country"]. The result is an object of
type Series.
14. Selecting a Column
import pandas as pd
df = pd.DataFrame({"country": ["israel", "france", "germany"],
                   "currency": ["ils", "euro", "euro"],
                   "capital": ["jerusalem", "paris", "berlin"]})
# a single column name returns a Series
countries = df["country"]
print(countries)
print(type(countries))
15. Selecting Columns
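Multiple columns are selected by passing a list of column names
inside the square brackets, hence the double brackets: the outer
pair performs the selection and the inner pair is the list. The
result is a DataFrame.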
16. Selecting Columns
import pandas as pd
df = pd.DataFrame({"country": ["israel", "france", "germany"],
                   "currency": ["ils", "euro", "euro"],
                   "capital": ["jerusalem", "paris", "berlin"]})
# a list of column names returns a DataFrame
ob = df[["country", "capital"]]
print(ob)
print(type(ob))
17. Selecting Rows
Rows that meet a specific condition are selected by placing that
condition inside the selection brackets. The condition produces a
True/False value for every row, and only the rows where it is True
are kept.
18. Selecting Rows
import pandas as pd
df = pd.DataFrame({"first name": ["moshe", "daniel", "tal"],
                   "last name": ["israeli", "cohen", "lahat"],
                   "id": ["234234", "645645", "678678"],
                   "average": [85, 90, 64]})
# keep only the rows whose average is above 80
beststudents = df[df["average"] > 80]
print(beststudents)
print(type(beststudents))
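To see what the condition inside the brackets actually is, it can be evaluated on its own. This sketch uses a trimmed version of the table above:

```python
import pandas as pd

df = pd.DataFrame({"first name": ["moshe", "daniel", "tal"],
                   "average": [85, 90, 64]})

# the comparison produces a boolean Series with one value per row
mask = df["average"] > 80
print(mask.tolist())  # [True, True, False]

# indexing with the mask keeps only the rows where it is True;
# the original row labels are preserved
best_students = df[mask]
print(best_students.index.tolist())  # [0, 1]
```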
19. Selecting Rows
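We can also select the rows in which a specific column holds one of
several specific values. Each condition is wrapped in parentheses,
and the conditions are combined with | (or); & (and) works the same way.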
20. Selecting Rows
import pandas as pd
df = pd.DataFrame({"first name": ["moshe", "daniel", "tal", "jane"],
                   "last name": ["israeli", "cohen", "lahat", "lala"],
                   "id": ["234234", "645645", "678678", "234234"],
                   "class": ["1st", "1st", "2nd", "3rd"]})
# rows whose class is either 1st or 3rd; each comparison needs its own parentheses
premiumpassengers = df[(df["class"] == "1st") | (df["class"] == "3rd")]
print(premiumpassengers)
print(type(premiumpassengers))
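When a column must match one of several values, the Series isin() method is a shorter alternative to chaining == comparisons with |. This sketch compares both forms on a trimmed version of the table above:

```python
import pandas as pd

df = pd.DataFrame({"first name": ["moshe", "daniel", "tal", "jane"],
                   "class": ["1st", "1st", "2nd", "3rd"]})

# isin() tests each value against a collection of allowed values
selected = df[df["class"].isin(["1st", "3rd"])]

# the same filter written with |; the parentheses are required
# because | binds more tightly than == in Python
same = df[(df["class"] == "1st") | (df["class"] == "3rd")]

print(selected.equals(same))            # True
print(selected["first name"].tolist())  # ['moshe', 'daniel', 'jane']
```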
21. Selecting Multiple Rows & Cols
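To select multiple rows and columns at once, that is a subset of our
data, we can use the iloc indexer, which selects by integer position:
df.iloc[rows, columns]. Its label-based counterpart is loc.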
22. Selecting Multiple Rows & Cols
import pandas as pd
df = pd.DataFrame({"first name": ["moshe", "daniel", "tal"],
                   "last name": ["israeli", "cohen", "lahat"],
                   "id": ["234234", "645645", "678678"],
                   "class": ["1st", "1st", "2nd"]})
# rows 0-1 and columns 0-1 by position (the end of each slice is excluded)
ob = df.iloc[0:2, 0:2]
print(ob)
print(type(ob))
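For comparison, a sketch of the label-based loc indexer on the same data. Note the different slice semantics: iloc excludes the end of a positional slice, while loc includes the end label.

```python
import pandas as pd

df = pd.DataFrame({"first name": ["moshe", "daniel", "tal"],
                   "last name": ["israeli", "cohen", "lahat"],
                   "id": ["234234", "645645", "678678"],
                   "class": ["1st", "1st", "2nd"]})

# position based: rows 0-1, columns 0-1
by_position = df.iloc[0:2, 0:2]

# label based: with the default 0..n-1 index, labels 0..1 are the same
# two rows, and the column slice includes "last name"
by_label = df.loc[0:1, "first name":"last name"]
print(by_position.equals(by_label))  # True

# loc also accepts a boolean mask for rows and a list of column labels
first_class = df.loc[df["class"] == "1st", ["first name", "id"]]
print(first_class)
```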