This document provides an overview of Python basics for data analysis, including introductions to key Python packages like NumPy, Pandas, and Matplotlib. It covers fundamental Python concepts like data types, operators, conditional statements, loops and functions. It also demonstrates how to load and manipulate data with NumPy arrays and Pandas DataFrames, including indexing, slicing, grouping, merging, and handling missing values. Visualization with Matplotlib charts is also covered.
2. GUI
• Anaconda-navigator 1.8.7
– Use either of the following IDEs:
– Jupyter Notebook is an interactive computational environment in which you
can combine code execution, rich text, mathematics, plots and rich media.
– Spyder is an open-source, cross-platform integrated development environment
for scientific programming in the Python language.
• Download and install
– https://anaconda.org/anaconda/anaconda-navigator
3. Why Python
• A simple language
• Free and open-source
• Portable across platforms
• Extensible and embeddable with other languages
• A high-level, interpreted language
• Standard libraries
• Object-oriented
6. History and path setup
Run the history command (%history in IPython) to see previously executed
commands, which also makes it easy to copy executed code.
Set up a path on your local machine so that you can refer to a script there
and run it with the %run command. You need to create
MyHelloWorld.py in that local path.
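The slides only name the file, so the contents below are a hypothetical sketch of what MyHelloWorld.py might look like. The function name greet and its default argument are assumptions made for illustration.

```python
# MyHelloWorld.py (hypothetical contents; the slides only name the file)

def greet(name="World"):
    """Return a greeting string for the given name."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # This branch runs when the file is executed as a script,
    # for example with `%run MyHelloWorld.py` in IPython or Jupyter.
    print(greet())
```

After running the script with %run, names defined in it (here, greet) are available in the interactive session.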
7. Comments
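One-line comments are denoted by # at the start of the comment.
Python has no dedicated block-comment syntax; a triple-quoted string (starting with ''' and ending with ''') that is not assigned to anything is commonly used as a multi-line comment.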
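A minimal sketch of both comment styles. The variable names are illustrative only. Note that the triple-quoted block is technically a string literal that Python evaluates and discards.

```python
# A one-line comment: everything after the # is ignored by Python.
radius = 2.0  # a comment can also follow code on the same line

'''
This triple-quoted string is not assigned to anything,
so it can serve as a multi-line (block) comment.
'''
area = 3.14159 * radius ** 2
```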
8. Data structures
• Data Types
– Integers, floating-point numbers, strings,
Boolean values, dates and timestamps
• Advanced Data Types
– Tuples, Lists, Dictionaries
10. String
Python has very strong string processing capabilities. Subsets of strings can be
taken using the slice operator ( [ ] and [ : ] ), with indexes starting at 0 at the
beginning of the string; negative indexes count backward from -1 at the end.
Strings in Python are immutable.
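The indexing, slicing and immutability rules above can be sketched as follows. The sample string and variable names are chosen only for illustration.

```python
text = "Data Analysis"

first = text[0]    # index 0 is the first character
last = text[-1]    # index -1 is the last character
word = text[0:4]   # slice from index 0 up to (not including) index 4
tail = text[5:]    # slice from index 5 to the end

# Strings are immutable: assigning to an index raises TypeError.
is_immutable = False
try:
    text[0] = "d"
except TypeError:
    is_immutable = True

# String methods return a new string instead of changing the original.
lowered = text.lower()
```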
11. Date Time
Python has a built-in datetime module for working with dates and times.
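A short sketch of common datetime operations, using only the standard library. The dates are arbitrary sample values.

```python
from datetime import date, datetime, timedelta

d = date(2019, 1, 15)                 # a calendar date
dt = datetime(2019, 1, 15, 9, 30)     # a date with a time of day

# Date arithmetic uses timedelta objects.
next_week = d + timedelta(days=7)

# strftime formats a datetime as text; strptime parses text into a datetime.
label = dt.strftime("%Y-%m-%d %H:%M")
parsed = datetime.strptime("2019-01-15", "%Y-%m-%d")
```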
12. List
A list contains items separated by commas and enclosed within square brackets ([]).
The items in a list can be of different data types.
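As an illustration, the list below mixes several data types and is then modified in place. The values are arbitrary.

```python
# A list can hold items of different types.
items = [1, 2.5, "three", True]

items.append(None)     # lists are mutable: add an item at the end
first_two = items[:2]  # slicing returns a new list
items[0] = 100         # replace an item by index
length = len(items)
```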
14. Dictionary
Python's dictionaries are a kind of hash table: associative arrays made of key-value pairs.
Dictionaries are enclosed by curly braces ( { } ).
Values are assigned and accessed using square brackets ( [ ] ).
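A minimal sketch of creating, updating and reading a dictionary. The keys and values are made-up sample data.

```python
ages = {"alice": 30, "bob": 25}

ages["carol"] = 41   # assign a value to a new key
ages["bob"] = 26     # overwrite the value of an existing key
bob_age = ages["bob"]

# get() returns a default instead of raising KeyError for a missing key.
missing = ages.get("dave", 0)

keys = sorted(ages)  # iterating a dictionary yields its keys
```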
16. Data conversion
To convert an integer to a float, use the float() function.
To convert a float to an integer, use the int() function, which truncates toward zero.
To convert a value to a string, use the str() function.
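The conversions can be sketched as follows. Note that int() drops the fractional part rather than rounding.

```python
as_float = float(7)      # integer to float
as_int = int(7.9)        # float to integer: the fraction is discarded
as_int_neg = int(-7.9)   # truncation is toward zero, not downward
as_str = str(7.9)        # number to string
from_text = int("42")    # a numeric string can be converted back
```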
22. Function
• A function begins with the keyword def, followed by the function name and
parentheses ( ( ) ).
• Any input parameters or arguments are placed within these
parentheses, separated by commas.
• Parameters can be given default values, which makes a function more
flexible: callers may omit those arguments and the function still behaves predictably.
• The function header ends with a colon (:), and the code block that follows is indented.
• A parameter prefixed with an asterisk (*argv) accepts any number of positional
arguments, so you do not have to specify their number when writing the function.
This is especially useful when dealing with lists or input data where
you do not know the number of observations beforehand.
• The statement return [expression] exits a function, optionally passing back
an expression to the caller.
• A return statement with no arguments is the same as return None.
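The points above can be sketched with three small functions. The function names are hypothetical and chosen only for illustration.

```python
def scale(value, factor=2):
    """Multiply value by factor; factor defaults to 2 when omitted."""
    return value * factor


def total(*argv):
    """Sum any number of positional arguments."""
    result = 0
    for number in argv:
        result += number
    return result


def do_nothing():
    """A bare return hands None back to the caller."""
    return


doubled = scale(3)               # uses the default factor
tenfold = scale(3, factor=10)    # overrides the default
summed = total(1, 2, 3)
from_list = total(*[4, 5])       # unpack a list into positional arguments
nothing = do_nothing()
```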
24. Exception
• An exception is an error that interrupts the normal execution of a program. When such an error
occurs, Python raises an exception, which can be handled so that the program does not stop.
• We can handle exceptions using the try..except statement.
• We put our usual statements within the try-block
and put all our error handlers in the except-block.
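A minimal sketch of try..except, using a hypothetical safe_divide helper. Returning None on failure is a design choice made for this example, not a general recommendation.

```python
def safe_divide(a, b):
    """Divide a by b, returning None instead of stopping on an error."""
    try:
        # The usual statements go in the try-block.
        return a / b
    except ZeroDivisionError:
        # Handler for division by zero.
        return None
    except TypeError:
        # Handler for operands that do not support division.
        return None


ok = safe_divide(6, 3)
by_zero = safe_divide(1, 0)
bad_type = safe_divide("a", 2)
```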
27. Classes
Python is an object-oriented programming language, and a class acts as an object constructor: a blueprint for creating objects.
The __init__() Function
All classes have a function called __init__(),
which is always executed when the class is being instantiated.
Use the __init__() function to assign values to object properties.
The __init__() function is called automatically every time the class is used
to create a new object.
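As an illustration, the hypothetical Person class below uses __init__() to set object properties when an instance is created.

```python
class Person:
    def __init__(self, name, age):
        # Called automatically when Person(...) creates a new object.
        self.name = name
        self.age = age

    def describe(self):
        """Return a short description built from the object's properties."""
        return f"{self.name} is {self.age} years old"


p = Person("Asha", 30)   # __init__ runs here
summary = p.describe()
```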
28. Numpy
• NumPy, short for Numerical Python, is the foundational package for
scientific computing in Python.
– NumPy provides basic numerical functions, especially for multi-dimensional
arrays and mathematical computation.
– SciPy builds on NumPy to provide features for scientific applications.
– a powerful N-dimensional array object
– sophisticated (broadcasting) functions for performing element-wise
computations on arrays and mathematical operations between arrays
– tools for integrating C/C++ and Fortran code
– useful linear algebra, Fourier transform, and random number capabilities
29. N-dimensional array
• import numpy as np
– np is the conventional alias for numpy
– use np.array() to create an array
– use np.arange() to create an arithmetic progression array
• ndarray: The N-dimensional array
– Use the np.array() constructor to create an
array with any number of dimensions.
– np.array(object, dtype=None)
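A short sketch of creating arrays with np.array() and np.arange(). The values are arbitrary sample data.

```python
import numpy as np

# One-dimensional array from a Python list.
a = np.array([1, 2, 3])

# Two-dimensional array; dtype forces the element type.
b = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

# Arithmetic progression: start 0, stop before 10, step 2.
c = np.arange(0, 10, 2)
```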
34. Array attributes
• array.ndim: The number of dimensions of this array.
• array.shape: A tuple of the array's dimensions.
• array.dtype: The array's data type.
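The three attributes can be inspected as follows. The exact integer dtype depends on the platform, so the example only checks that it is an integer type.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

dims = m.ndim      # number of dimensions
shape = m.shape    # tuple of sizes along each dimension
kind = m.dtype     # element data type (a platform-dependent integer type here)
```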
35. Array Method
<Array name>.astype(<data type>) returns a copy of the array converted to the specified data type
<Array name>.mean() returns the mean of the values in the array
<Array name>.var() returns the variance of the values in the array
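A minimal sketch of these methods on a small sample array. By default var() computes the population variance (ddof=0).

```python
import numpy as np

values = np.array([1, 2, 3, 4])

as_float = values.astype(float)   # new array with a float dtype
mean = values.mean()              # arithmetic mean
variance = values.var()           # population variance (ddof=0)
sample_variance = values.var(ddof=1)  # sample variance
```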
39. Mathematical Operations
• The usual mathematical operators (+ - * /) generalize to NumPy
arrays and operate element by element.
• For two vectors vector1 and vector2 of the same length, the “+”
operator gives you an element-by-element sum; when the shapes differ but are compatible, NumPy applies broadcasting.
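Element-wise arithmetic and broadcasting can be sketched as follows. The array values are arbitrary.

```python
import numpy as np

vector1 = np.array([1, 2, 3])
vector2 = np.array([10, 20, 30])

total = vector1 + vector2     # element-by-element sum
product = vector1 * vector2   # element-by-element product

# Broadcasting: the scalar is applied to every element.
scaled = vector1 * 2

# Broadcasting: the 1-D vector is added to each row of the 2-D array.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
shifted = matrix + vector1
```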
41. Pandas
• import pandas as pd
• import numpy as np
• path = 'C:/BigData/Python'
• A Series is a one-dimensional array-like object containing an array of data and an associated
array of data labels, called its index.
• s1 = pd.Series([1,2,4,5,6,7])
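A short sketch of working with a Series. s1 is taken from the slide, while s2 and its labels are added here for illustration.

```python
import pandas as pd

# Without an explicit index, pandas assigns integer labels 0..n-1.
s1 = pd.Series([1, 2, 4, 5, 6, 7])
default_labels = list(s1.index)

# An explicit index attaches custom labels to the data.
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])
value_b = s2["b"]

# Boolean filtering keeps only the values that satisfy a condition.
large = s1[s1 > 4].tolist()
```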
Anaconda Navigator is a desktop graphical user interface included in Anaconda that allows you to launch applications and easily manage conda packages, environments and channels without the need to use command line commands.
A simple language which is easier to learn
Python has a very simple and elegant syntax. It's much easier to read and write Python programs compared to other languages like C++, Java and C#. Python makes programming fun and allows you to focus on the solution rather than syntax.
Free and open-source
You can freely use and distribute Python, even for commercial use. Not only can you use and distribute software written in it, you can even make changes to Python's source code. Python has a large community constantly improving it in each iteration.
Portability
You can move Python programs from one platform to another and run them without any changes. Python runs seamlessly on almost all platforms including Windows, Mac OS X and Linux.
Extensible and Embeddable
Suppose an application requires high performance. You can easily combine pieces of C/C++ or other languages with Python code. This will give your application high performance as well as scripting capabilities which other languages may not provide out of the box.
A high-level, interpreted language
Unlike C/C++, you don't have to worry about daunting tasks like memory management, garbage collection and so on. Likewise, when you run Python code, the interpreter automatically converts your code to the language your computer understands. You don't need to worry about any lower-level operations.
Large standard libraries to solve common tasks
Python has a number of standard libraries which make the life of a programmer much easier, since you don't have to write all the code yourself. For example: need to connect to a MySQL database on a Web server? You can use the MySQLdb library (a third-party package rather than part of the standard library) using import MySQLdb. Standard libraries in Python are well tested and widely used, so you can be confident that they won't break your application.
Object-oriented
Everything in Python is an object. Object-oriented programming (OOP) helps you solve a complex problem intuitively. With OOP, you are able to divide these complex problems into smaller sets by creating objects.
Applications of Python
Web Applications
You can create scalable Web Apps using frameworks and CMS (Content Management System) that are built on Python. Some of the popular platforms for creating Web Apps are: Django, Flask, Pyramid, Plone, Django CMS.
Sites like Mozilla, Reddit, Instagram and PBS are written in Python.
Scientific and Numeric Computing
There are numerous libraries available in Python for scientific and numeric computing. There are libraries like: SciPy and NumPy that are used in general purpose computing. And, there are specific libraries like: EarthPy for earth science, AstroPy for Astronomy and so on.
Also, the language is heavily used in machine learning, data mining and deep learning.
Creating software Prototypes
Python is slow compared to compiled languages like C++ and Java. It might not be a good choice if resources are limited and efficiency is a must.
However, Python is a great language for creating prototypes. For example, you can use Pygame (a library for creating games) to create your game's prototype first. If you like the prototype, you can use a language like C++ to create the actual game.
Good Language to Teach Programming
Python is used by many companies to teach programming to kids and newbies.
It is a good language with a lot of features and capabilities. Yet, it's one of the easiest languages to learn because of its simple, easy-to-use syntax.
Dictionaries are one of the most important built-in data structures.
Python's dictionaries are a kind of hash table.
They work like associative arrays and consist of key-value pairs.
A dictionary key can be any hashable Python type, but keys are usually numbers or strings.
Values, on the other hand, can be any arbitrary Python object.
Dictionaries are enclosed by curly braces ( { } ) and values can be assigned and accessed using square brackets ( [ ] ).
Library Highlights
A fast and efficient DataFrame object for data manipulation with integrated indexing;
Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
Flexible reshaping and pivoting of data sets;
Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
Columns can be inserted and deleted from data structures for size mutability;
Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
High performance merging and joining of data sets;
Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
Time series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;
Highly optimized for performance, with critical code paths written in Cython or C.
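A few of the highlights above (missing-data handling, group by, and merging) can be sketched on a tiny DataFrame. The table names, columns and values are made-up sample data, not taken from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical sales table with one missing amount.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "amount": [100.0, np.nan, 50.0, 200.0],
})
# Hypothetical lookup table keyed by region.
managers = pd.DataFrame({
    "region": ["east", "west"],
    "manager": ["Ana", "Raj"],
})

# Handle missing data: replace NaN in the amount column with 0.
filled = sales.fillna({"amount": 0.0})

# Split-apply-combine: total amount per region.
totals = filled.groupby("region")["amount"].sum()

# Merge: attach each region's manager to every sales row.
merged = filled.merge(managers, on="region", how="left")
```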
Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.