21 Dec 2020

If you are a programming enthusiast who has not yet taken off into this world, the road is still open: the whole space is in front of you, so pick one of the paths and start down it right away. Choosing a path is usually the barrier where we get stuck, and it often takes longer than the learning and practice themselves, but there is nothing better than using the technologies we have at hand today to build tools we can benefit from. For more information, subscribe to our mailing list: https://www.apptrainers.com/

App Ttrainers .com

- Content: Introduction to Python • Numpy • Pandas | @Apptrainers
- “In December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office ... would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).” (Guido van Rossum)
- The big technology companies have each largely aligned themselves with different language stacks. Oracle and IBM are aligned with Java (Oracle owns Java). Google is known for its use of Python (first released in 1991), a very versatile, dynamic and extensible language, although in reality it is also a heavy user of C++ and Java. Google has also created its own language, Go (2009).
- Python is an easy-to-learn and powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. It is freely available in source or binary form for all major platforms from the Python web site, https://www.python.org/. The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications. It is widely used (Google, NASA, Quora).
- When you run a Python program, the interpreter works through it statement by statement, whereas with compiled languages such as C or C++ a compiler first translates the whole program and only then does it start running. The trade-off is that interpreted languages are usually somewhat slower than compiled ones.
- In Python you don't need to declare a variable's data type ahead of time; Python determines the type automatically from the value the variable holds.
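The point about types can be seen in a few lines. This is only an illustrative sketch; the variable names are made up for the example and do not come from the slides.

```python
# The same name can be rebound to values of different types:
# the type belongs to the value, not to the variable.
value = 42
first_type = type(value).__name__      # name of the type of an integer value

value = "forty-two"
second_type = type(value).__name__     # now the name refers to a string

value = 4.2
third_type = type(value).__name__      # and now to a float

print(first_type, second_type, third_type)
```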
- Python programs are often a third to a fifth the length of the equivalent Java code, so the same task usually takes less code in Python than in Java.
- There are many good options for writing and editing code: Sublime Text (unlimited free trial available), Notepad++, Xcode (Mac), TextWrangler (Mac) and TextEdit (Mac). There are also several platforms for taking online courses for free: Coursera, edX, Stanford Online, Khan Academy and Udacity.
- To download Python, follow the instructions on the official website: https://www.python.org/
- I would strongly recommend this video: https://www.youtube.com/watch?v=HW29067qVWk
- https://git-scm.com/book/en/v2/Getting-Started-Installing-Git and https://github.com
- “GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.” GitHub accounts can be public (free) or private (not free). A repository is usually used to organize a single project. It contains folders and files, images, videos, spreadsheets, and data sets: anything your project needs.
- Master in a repository: the final version. Branch: used to try out new ideas that don't affect the master unless a pull request is accepted; changes committed to a branch are recorded so you can keep track of different versions. Adding commits: keeps track (history) of progress on a branch or master. Forking a repository: creates a copy of the repo; submit a pull request to the owner so that the owner can incorporate your changes.
- Download Python and Jupyter Notebook. Write Python code to print your name, your id, and your favorite quote! Save the project as .html and as .ipynb. Install git and create a GitHub account. Upload your first project as .html to e-learning. Upload your first project as .ipynb to your GitHub account. Share the link to your GitHub with me on e-learning.
- https://www.tutorialspoint.com/execute_python_online.php and https://www.onlinegdb.com/online_python_compiler
- You can type things directly into a running Python session.
- Most programming languages, such as C, C++ and Java, use braces { } to define a block of code. Python uses indentation. A code block (body of a function, loop etc.) starts with indentation and ends with the first unindented line. The amount of indentation is up to you, but it must be consistent throughout that block. Generally four spaces are used for indentation and are preferred over tabs. Here is an example:

      for i in range(1, 11):
          print(i)
          if i == 5:
              break

  Incorrect indentation will result in an IndentationError.
- In Python, we use the hash (#) symbol to start writing a comment. It extends up to the newline character. Comments help programmers understand a program; the Python interpreter ignores them.

      # This is a comment
      # print out Hello
      print('Hello')

  If a comment extends over multiple lines, one way of writing it is to put a hash (#) at the beginning of each line. Another way is to use triple quotes, either ''' or """. Triple quotes are generally used for multi-line strings, but they can be used as multi-line comments as well.

      """This is also a
      perfect example of
      multi-line comments"""
- expression: A data value or set of operations to compute a value. Examples: 1 + 4 * 3 and 42.

  Arithmetic operators we will use: + - * / (addition, subtraction, multiplication, division), % (modulus, a.k.a. remainder) and ** (exponentiation).

  precedence: Order in which operations are computed. * / % ** have a higher precedence than + -, so 1 + 3 * 4 is 13. Parentheses can be used to force a certain order of evaluation: (1 + 3) * 4 is 16.

  | Operator | Description | Example |
  |---|---|---|
  | = | Assignment | num = 7 |
  | + | Addition | num = 2 + 2 |
  | - | Subtraction | num = 6 - 4 |
  | * | Multiplication | num = 5 * 4 |
  | / | Division | num = 25 / 5 |
  | % | Modulo | num = 8 % 3 |
  | ** | Exponent | num = 9 ** 2 |
- When we divide integers with // (floor division), the quotient is also an integer: 35 // 5 is 7, 84 // 10 is 8, 156 // 100 is 1. In Python 2 the / operator behaved this way for integers; in Python 3, / always produces a float, so 35 / 5 is 7.0.

  The % operator computes the remainder from a division of integers.

  The operators + - * / % ** ( ) all work for real numbers. The / operator produces the real-valued answer: 15.0 / 2.0 is 7.5. The same rules of precedence also apply to real numbers: evaluate ( ) before * / % before + -.

  When integers and reals are mixed, the result is a real number. Example: 1 / 2.0 is 0.5. The conversion occurs on a per-operator basis:

      7 // 3 * 1.2 + 3 // 2
      2 * 1.2 + 3 // 2
      2.4 + 3 // 2
      2.4 + 1
      3.4
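The division rules can be checked directly in Python 3. The snippet below is a small sketch with illustrative variable names; it assumes Python 3 semantics, where / is true division and // is floor division.

```python
# Floor division and remainder on integers
quotient = 7 // 3          # whole-number part of 7 divided by 3
remainder = 7 % 3          # what is left over

# True division always gives a float in Python 3
true_division = 7 / 2

# Mixed expression evaluated operator by operator:
# 7 // 3 -> 2, then 2 * 1.2 -> 2.4, then 3 // 2 -> 1, then 2.4 + 1
mixed = 7 // 3 * 1.2 + 3 // 2

print(quotient, remainder, true_division, mixed)
```

Note that // rounds toward negative infinity, so -7 // 2 is -4 rather than -3.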
- Python has useful commands for performing calculations.

  | Command name | Description |
  |---|---|
  | abs(value) | absolute value |
  | ceil(value) | rounds up |
  | cos(value) | cosine, in radians |
  | floor(value) | rounds down |
  | log(value) | logarithm, base e |
  | log10(value) | logarithm, base 10 |
  | max(value1, value2) | larger of two values |
  | min(value1, value2) | smaller of two values |
  | round(value) | nearest whole number |
  | sin(value) | sine, in radians |
  | sqrt(value) | square root |

  | Constant | Description |
  |---|---|
  | e | 2.7182818... |
  | pi | 3.1415926... |

  abs, max, min and round are built in. To use the other commands and the constants, you must write the following at the top of your Python program: from math import *
- variable: A named piece of memory that can store a value. Usage: compute an expression's result, store that result into a variable, and use that variable later in the program.

  assignment statement: Stores a value into a variable. Syntax: name = value

      x = 5         # x now holds 5
      gpa = 3.14    # gpa now holds 3.14

  A variable that has been given a value can be used in expressions: x + 4 is 9.

  Exercise: Evaluate the quadratic equation for a given a, b, and c.
- print: Produces text output on the console. Syntax:

      print("Message")
      print(Expression)

  Prints the given text message or expression value on the console, and moves the cursor down to the next line.

      print(Item1, Item2, ..., ItemN)

  Prints several messages and/or expressions on the same line. Examples:

      print("Hello, world!")
      age = 45
      print("You have", 65 - age, "years until retirement")

  Output:

      Hello, world!
      You have 20 years until retirement
- input: Reads a line of text from user input. You can assign (store) the result of input into a variable. In Python 3 the result is always a string, so convert it with int() or float() before doing arithmetic. Example:

      age = int(input("How old are you? "))
      print("Your age is", age)
      print("You have", 65 - age, "years until retirement")

  Output:

      How old are you? 53
      Your age is 53
      You have 12 years until retirement

  Exercise: Write a Python program that prompts the user for his/her amount of money, then reports how many Nintendo Wiis the person can afford, and how much more money he/she will need to afford an additional Wii.
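One possible approach to the exercise is sketched below. It assumes the amount arrives as text (as input would return it) and uses a hypothetical price of 250 per console; the function name wii_budget and the price are assumptions for illustration, not part of the slides.

```python
def wii_budget(raw_money, price=250):
    """Return (consoles affordable, extra money needed for one more).

    raw_money is text, as returned by input(); price is a made-up value.
    """
    money = int(raw_money)                    # convert the text to a number first
    count, leftover = divmod(money, price)    # whole consoles and the change left over
    needed_for_next = price - leftover        # shortfall for one additional console
    return count, needed_for_next


count, needed = wii_budget("600")
print("You can afford", count, "Wiis and need", needed, "more for another one")
```

In an interactive program the argument would come from input("How much money do you have? ") instead of a literal string.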
- for loop: Repeats a set of statements over a group of values. Syntax:

      for variableName in groupOfValues:
          statements

  We indent the statements to be repeated with tabs or spaces. variableName gives a name to each value, so you can refer to it in the statements. groupOfValues can be a range of integers, specified with the range function. Example:

      for x in range(1, 6):
          print(x, "squared is", x * x)

  Output:

      1 squared is 1
      2 squared is 4
      3 squared is 9
      4 squared is 16
      5 squared is 25
- The range function specifies a range of integers: range(start, stop) gives the integers between start (inclusive) and stop (exclusive). It can also accept a third value specifying the change between values: range(start, stop, step) gives the integers between start (inclusive) and stop (exclusive) by step. Example:

      for x in range(5, 0, -1):
          print(x)
      print("Hello!")

  Output:

      5
      4
      3
      2
      1
      Hello!
- Some loops incrementally compute a value that is initialized outside the loop. This is sometimes called a cumulative sum. (The variable is named total here so that it does not hide Python's built-in sum function.)

      total = 0
      for i in range(1, 11):
          total = total + (i * i)
      print("sum of first 10 squares is", total)

  Output:

      sum of first 10 squares is 385

  Exercise: Write a Python program that computes the factorial of an integer.
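The same cumulative pattern, with a running product instead of a running sum, gives one way to approach the factorial exercise. This is a minimal sketch; the function name factorial is chosen for the example.

```python
def factorial(n):
    """Multiply 1 * 2 * ... * n using a running product."""
    product = 1                      # initialized outside the loop, like the running sum
    for i in range(1, n + 1):
        product = product * i        # fold the next factor into the result
    return product


print("5! is", factorial(5))
```

For n equal to 0 the loop body never runs, so the function returns 1, which matches the usual definition of 0!.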
- if statement: Executes a group of statements only if a certain condition is true. Otherwise, the statements are skipped. Syntax:

      if condition:
          statements

  Example:

      gpa = 3.4
      if gpa > 2.0:
          print("Your application is accepted.")
- if/else statement: Executes one block of statements if a certain condition is True, and a second block of statements if it is False. Syntax:

      if condition:
          statements
      else:
          statements

  Example:

      gpa = 1.4
      if gpa > 2.0:
          print("Welcome to JUST University!")
      else:
          print("Your application is denied.")

  Multiple conditions can be chained with elif ("else if"):

      if condition:
          statements
      elif condition:
          statements
      else:
          statements
- while loop: Executes a group of statements as long as a condition is True. It is good for indefinite loops (repeat an unknown number of times). Syntax:

      while condition:
          statements

  Example:

      number = 1
      while number < 200:
          print(number, end=" ")
          number = number * 2

  Output:

      1 2 4 8 16 32 64 128
- Many logical expressions use relational operators:

  | Operator | Meaning | Example | Result |
  |---|---|---|---|
  | == | equals | 1 + 1 == 2 | True |
  | != | does not equal | 3.2 != 2.5 | True |
  | < | less than | 10 < 5 | False |
  | > | greater than | 10 > 5 | True |
  | <= | less than or equal to | 126 <= 100 | False |
  | >= | greater than or equal to | 5.0 >= 5.0 | True |

  Logical expressions can be combined with logical operators:

  | Operator | Example | Result |
  |---|---|---|
  | and | 9 != 6 and 2 < 3 | True |
  | or | 2 == 3 or -1 < 5 | True |
  | not | not 7 > 0 | False |

  Exercise: Write code to display and count the factors of a number.
- string: A sequence of text characters in a program. Strings start and end with quotation mark " or apostrophe ' characters. Examples:

      "hello"
      "This is a string"
      "This, too, is a string. It can be very long!"

  A string written with single or double quotes may not span multiple lines or contain an unescaped " character:

      "This is not
      a legal String."
      "This is not a "legal" String either."

  A string can represent special characters by preceding them with a backslash: \t is the tab character, \n the new line character, \" the quotation mark character, and \\ the backslash character. Example:

      "Hello\tthere\nHow are you?"
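The escape sequences above can be tried out as follows. This is a small sketch with illustrative variable names; the triple-quoted string is included only to show the usual way of writing text that spans lines.

```python
# \t inserts a tab and \n starts a new line inside the string
text = "Hello\tthere\nHow are you?"
lines = text.split("\n")               # split on the embedded newline

# A double quote inside a double-quoted string must be escaped...
quoted = "This is a \"legal\" string."
# ...or the string can be delimited with apostrophes instead
also_quoted = 'This is a "legal" string.'

# Triple quotes allow a string to span several lines
multi_line = """first line
second line"""

print(text)
print(quoted)
```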
- Characters in a string are numbered with indexes starting at 0. Example: name = "P. Diddy"

  | index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
  |---|---|---|---|---|---|---|---|---|
  | character | P | . | (space) | D | i | d | d | y |

  Accessing an individual character of a string: variableName[index]. Example:

      print(name, "starts with", name[0])

  Output:

      P. Diddy starts with P
- len(string) gives the number of characters in a string (including spaces), str.lower(string) a lowercase version of a string, and str.upper(string) an uppercase version of a string. Example:

      name = "Martin Douglas Stepp"
      length = len(name)
      big_name = str.upper(name)
      print(big_name, "has", length, "characters")

  Output:

      MARTIN DOUGLAS STEPP has 20 characters
- A list is a compound data type:

      [0]
      [2.3, 4.5]
      [5, "Hello", "there", 9.8]
      []

  Use len() to get the length of a list:

      >>> names = ["Ben", "Chen", "Yaqin"]
      >>> len(names)
      3
- http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html
- Certain features of Python are not loaded by default. In order to use these features, you'll need to import the modules that contain them. E.g.

      import matplotlib.pyplot as plt
      import numpy as np
-     f = 7 / 2           # in Python 2, f will be 3, unless "from __future__ import division" is used
      f = 7 / 2           # in Python 3, f = 3.5
      f = 7 // 2          # f = 3 in both Python 2 and 3
      f = 7 / 2.          # f = 3.5 in both Python 2 and 3
      f = 7 / float(2)    # f is 3.5 in both Python 2 and 3
      f = int(7 / 2)      # f is 3 in both Python 2 and 3
- Get the i-th element of a list:

      x = [i for i in range(10)]         # is the list [0, 1, ..., 9]
      zero = x[0]                        # equals 0, lists are 0-indexed
      one = x[1]                         # equals 1
      nine = x[-1]                       # equals 9, 'Pythonic' for last element
      eight = x[-2]                      # equals 8, 'Pythonic' for next-to-last element
      one_to_four = x[1:5]               # [1, 2, 3, 4]
      first_three = x[:3]                # [0, 1, 2]
      last_three = x[-3:]                # [7, 8, 9]
      three_to_end = x[3:]               # [3, 4, ..., 9]
      without_first_and_last = x[1:-1]   # [1, 2, ..., 8]
      copy_of_x = x[:]                   # [0, 1, 2, ..., 9]
      another_copy_of_x = x[:3] + x[3:]  # [0, 1, 2, ..., 9]
-     1 in [1, 2, 3]     # True
      0 in [1, 2, 3]     # False

      x = [1, 2, 3]
      y = [4, 5, 6]
      x.extend(y)        # x is now [1,2,3,4,5,6]

      x = [1, 2, 3]
      y = [4, 5, 6]
      z = x + y          # z is [1,2,3,4,5,6]; x is unchanged.

      x, y = [1, 2]      # x is 1 and y is 2
      [x, y] = 1, 2      # same as above
      x, y = [1, 2]      # same as above
      x, y = 1, 2        # same as above
      _, y = [1, 2]      # y is 2, didn't care about the first element
-     >>> a = ['Mary', 'had', 'a', 'little', 'lamb']
      >>> for i in range(len(a)):
      ...     print(i, a[i])
      ...
      0 Mary
      1 had
      2 a
      3 little
      4 lamb
- What is the expected output of the following code?

      a = list(range(10))
      b = a              # b is another name for the same list
      b[0] = 100
      print(a)           # [100, 1, 2, 3, 4, 5, 6, 7, 8, 9]

      a = list(range(10))
      b = a[:]           # b is a copy of a
      b[0] = 100
      print(a)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

      a = [0, 1, 2, 3, 4]
      b = a
      c = a[:]
      a == b             # True
      a is b             # True
      a == c             # True
      a is c             # False
- Tuples are similar to lists, but are immutable.

      a_tuple = (0, 1, 2, 3, 4)
      other_tuple = 3, 4
      another_tuple = tuple([0, 1, 2, 3, 4])
      heterogeneous_tuple = ('john', 1.1, [1, 2])

  They can be sliced, concatenated, or repeated:

      a_tuple[2:4]       # evaluates to (2, 3)

  They cannot be modified:

      a_tuple[2] = 5     # TypeError: 'tuple' object does not support item assignment

  Note: a tuple is defined by the comma, not the parentheses, which are only used for convenience and for grouping elements. So a = (1) is not a tuple, but a = (1,) is.
- Tuples are useful for returning multiple values from functions. Tuples and lists can also be used for multiple assignments.

      def sum_and_product(x, y):
          return (x + y), (x * y)

      sp = sum_and_product(2, 3)       # equals (5, 6)
      s, p = sum_and_product(5, 10)    # s is 15, p is 50

      x, y = 1, 2
      [x, y] = [1, 2]
      (x, y) = (1, 2)
      x, y = y, x                      # swaps x and y
-     a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a,)
      my_tuple[0] = a       # ERROR: a tuple does not support item assignment

      a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a)        # no comma, so my_tuple is just the list a
      my_tuple[0] = a       # No ERROR

      a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a,)
      my_tuple[0] = 5       # ERROR

      a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a,)
      my_tuple[0][0] = 5    # No ERROR: the list inside the tuple is still mutable
- A dictionary associates values with unique keys.

      empty_dict = {}                          # Pythonic
      empty_dict2 = dict()                     # less Pythonic
      grades = { "Joel" : 80, "Tim" : 95 }     # dictionary literal

  Access or modify a value with its key:

      joels_grade = grades["Joel"]             # equals 80
      grades["Tim"] = 99                       # replaces the old value
      grades["Kate"] = 100                     # adds a third entry
      num_students = len(grades)               # equals 3

      try:
          kates_grade = grades["Kate"]
      except KeyError:
          print("no grade for Kate!")
- Check for the existence of a key:

      joel_has_grade = "Joel" in grades        # True
      kate_has_grade = "Kate" in grades        # False (for the original two-entry dictionary)

  Use get to avoid a KeyError and supply a default value:

      joels_grade = grades.get("Joel", 0)      # equals 80
      kates_grade = grades.get("Kate", 0)      # equals 0
      no_ones_grade = grades.get("No One")     # default default is None

  Get all items. In Python 2 the following return lists; in Python 3 they return iterable view objects instead:

      all_keys = grades.keys()                 # all keys
      all_values = grades.values()             # all values
      all_pairs = grades.items()               # (key, value) tuples

  Which of the following is faster?

      'Joel' in grades       # faster: hash table lookup
      'Joel' in all_keys     # slower when all_keys is a list (Python 2)
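The difference between a Python 3 view and a list snapshot can be sketched as follows. The variable names are illustrative; the snippet starts from the two-entry grades dictionary used on the slides.

```python
grades = {"Joel": 80, "Tim": 95}

keys_view = grades.keys()          # a live view in Python 3, not a list
keys_snapshot = list(grades)       # an explicit list copy of the keys

grades["Kate"] = 100               # add an entry after taking both

view_sees_kate = "Kate" in keys_view          # the view follows the dictionary
snapshot_sees_kate = "Kate" in keys_snapshot  # the list copy does not

missing = grades.get("No One")     # get returns None when no default is given

print(view_sees_kate, snapshot_sees_kate, missing)
```

When a real list is needed, for example to index into the keys, wrapping the view in list() as above is the usual approach.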
-     a = [0, 0, 0, 1]
      any(a)     # True
      all(a)     # False
-     try:
          print(0 / 0)
      except ZeroDivisionError:
          print("cannot divide by zero")

  https://docs.python.org/3/tutorial/errors.html
- Functions are defined using def:

      def double(x):
          """this is where you put an optional docstring
          that explains what the function does.
          for example, this function multiplies its input by 2"""
          return x * 2

  You can call a function after it is defined:

      z = double(10)     # z is 20

  You can give default values to parameters:

      def my_print(message="my default message"):
          print(message)

      my_print("hello")  # prints 'hello'
      my_print()         # prints 'my default message'
- Sometimes it is useful to specify arguments by name:

      def subtract(a=0, b=0):
          return a - b

      subtract(10, 5)         # returns 5
      subtract(0, 5)          # returns -5
      subtract(b = 5)         # same as above
      subtract(b = 5, a = 0)  # same as above
- Functions are objects too:

      def double(x):
          return x * 2

      DD = double
      DD(2)                   # 4

      def apply_to_one(f):
          return f(1)

      x = apply_to_one(DD)
      x                       # 2
- Small anonymous functions can be created with the lambda keyword. The power of lambda is better shown when you use it as an anonymous function inside another function:

      def myfunc(n):
          return lambda a : a * n

      mydoubler = myfunc(2)
      mytripler = myfunc(3)
      print(mydoubler(11))    # 22
      print(mytripler(11))    # 33

  A lambda function can take any number of arguments, but can only have one expression:

      x = lambda a : a + 10
      print(x(5))             # 15

      x = lambda a, b, c : a * b - c
      print(x(5, 6, 2))       # 28
-     pairs = [(2, 'two'), (3, 'three'), (1, 'one'), (4, 'four')]
      pairs.sort(key = lambda pair: pair[0])
      print(pairs)       # [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

      def getKey(pair):
          return pair[0]

      pairs.sort(key=getKey)
      print(pairs)       # [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
- A list comprehension is a very convenient way to create a new list:

      squares = [x * x for x in range(5)]
      print(squares)     # [0, 1, 4, 9, 16]

      squares = [0, 0, 0, 0, 0]
      for x in range(5):
          squares[x] = x * x
      print(squares)     # [0, 1, 4, 9, 16]
- A list comprehension can also be used to filter a list:

      even_numbers = []
      for x in range(5):
          if x % 2 == 0:
              even_numbers.append(x)
      even_numbers       # [0, 2, 4]

      even_numbers = [x for x in range(5) if x % 2 == 0]
      even_numbers       # [0, 2, 4]
- More complex examples:

      # create 100 pairs (0,0) (0,1) ... (9,8), (9,9)
      pairs = [(x, y)
               for x in range(10)
               for y in range(10)]

      # only pairs with x < y,
      # range(lo, hi) equals
      # [lo, lo + 1, ..., hi - 1]
      increasing_pairs = [(x, y)
                          for x in range(10)
                          for y in range(x + 1, 10)]
      # [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3) ...etc
- Convenient tools in Python for applying a function to sequences of data:

      def double(x):
          return 2 * x

      b = range(5)
      list(map(double, b))                   # [0, 2, 4, 6, 8]
      double(b)                              # TypeError: unsupported operand type(s) for *: 'int' and 'range'
      print([double(i) for i in range(5)])   # [0, 2, 4, 6, 8]
-     map_output = map(lambda x: x*2, [1, 2, 3, 4])
      print(map_output)                      # a map object, e.g. <map object at 0x04D6BAB0>
      list_map_output = list(map_output)
      print(list_map_output)                 # [2, 4, 6, 8]

      map(lambda x : x*2, [1, 2, 3, 4])          # Python 2 returned the list [2, 4, 6, 8] directly
      map(lambda x, y: x + y, list_a, list_b)    # Output: [11, 22, 33] (list_a and list_b are not shown on the slide)
-     def is_even(x):
          return x % 2 == 0

      a = [0, 1, 2, 3]
      list(filter(is_even, a))               # [0, 2]
      [a[i] for i in a if is_even(i)]        # [0, 2]

      a = [1, 2, 3, 4, 5, 6]
      print(list(filter(lambda x : x % 2 == 0, a)))   # Output: [2, 4, 6]
-     from functools import reduce
      reduce(lambda x, y: x+y, range(10))        # 45
      reduce(lambda x, y: x*y, [1, 2, 3, 4])     # 24
- zip is useful for combining multiple lists into a list of tuples:

      list(zip(['a', 'b', 'c'], [1, 2, 3], ['A', 'B', 'C']))
      # [('a', 1, 'A'), ('b', 2, 'B'), ('c', 3, 'C')]

      names = ['James', 'Tom', 'Mary']
      grades = [100, 90, 95]
      list(zip(names, grades))
      # [('James', 100), ('Tom', 90), ('Mary', 95)]
- file object = open(file_name [, access_mode])

  access_mode: determines the mode in which the file has to be opened, i.e., read, write, append, etc. A complete list of possible values is given below in the table. This is an optional parameter and the default file access mode is read (r).
- read(): reads the entire file and returns its contents in the form of a string. readline(): reads the next line of the file (the first line on a freshly opened file), i.e. up to a newline character, or up to EOF if the file has a single line, and returns a string. readlines(): reads the entire file line by line and returns a list of line strings.

  Sample file text shown on the slide: 1 hello 40 50 hi This is my course Welcome to this course \n wish you all the best

      f = open("my_file2.txt", 'w')
      f.write("Hello Everyone!")
      f.close()
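The access modes and the three read methods can be exercised together as below. This is a sketch under a few assumptions: the file is written to a temporary directory so nothing in the working directory is touched, the with statement is used so files are closed automatically, and the file contents are made up for the example.

```python
import os
import tempfile

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "my_file2.txt")

with open(path, "w") as f:            # 'w' creates or overwrites the file
    f.write("Hello Everyone!\n")
    f.write("Welcome to this course\n")

with open(path, "a") as f:            # 'a' appends to the end
    f.write("Wish you all the best\n")

with open(path) as f:                 # default mode is 'r' (read)
    whole_text = f.read()             # entire file as one string

with open(path) as f:
    first_line = f.readline()         # one line, including its newline
    remaining_lines = f.readlines()   # the rest, as a list of strings

print(whole_text)
```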
- Notice how each piece of data is separated by a comma.
- NumPy: Numerical Computing in Python
- What is NumPy? NumPy, SciPy, and Matplotlib provide MATLAB-like functionality in Python. NumPy features: typed multidimensional arrays (matrices), fast numerical computations (matrix math), and high-level math functions.
- Why do we need NumPy? Let's see for ourselves!
- Why do we need NumPy? Python does numerical computations slowly. For a 1000 x 1000 matrix multiply, a Python triple loop takes > 10 min; NumPy takes ~0.03 seconds.
- NumPy overview: 1. Arrays 2. Shaping and transposition 3. Mathematical operations 4. Indexing and slicing 5. Broadcasting
- Arrays: structured lists of numbers. Examples: vectors, matrices, images, tensors, ConvNets.

  $$\begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix} \qquad \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$
- Arrays, basic properties:

      import numpy as np
      a = np.array([[1,2,3],[4,5,6]], dtype=np.float32)
      print(a.ndim, a.shape, a.dtype)

  1. Arrays can have any number of dimensions, including zero (a scalar).
  2. Arrays are typed: np.uint8, np.int64, np.float32, np.float64.
  3. Arrays are dense. Each element of the array exists and has the same type.
- Arrays, creation: np.ones, np.zeros • np.arange • np.concatenate • a.astype (a method of the array) • np.zeros_like, np.ones_like • np.random.random
- Arrays, danger zone: Must be dense, no holes. Must be one type. Cannot combine arrays of different shape.
- Shaping:

      a = np.array([1,2,3,4,5,6])
      a = a.reshape(3,2)
      a = a.reshape(2,-1)
      a = a.ravel()

  1. Total number of elements cannot change.
  2. Use -1 to infer axis shape.
  3. Row-major by default (MATLAB is column-major).
-     import numpy as np
      a = np.array([1,2,3,4,5,6])
      print(a)
      print('-'*20)
      b = a.reshape(3,2)
      print(b)
      print('-'*20)
      c = a.reshape(2,-1)
      print(c)
      print('-'*20)
      d = a.ravel()
      print(d)
- Return values: NumPy functions return either views or copies. Views share data with the original array, like references in Java/C++. Altering entries of a view changes the same entries in the original. The NumPy documentation says which functions return views and which return copies. np.copy (or a.copy()) makes an explicit copy and a.view() makes an explicit view.
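The view/copy distinction can be checked with np.shares_memory. This is a minimal sketch with illustrative names; it relies on reshape returning a view for a contiguous array such as the one produced by np.arange.

```python
import numpy as np

a = np.arange(6)                 # [0 1 2 3 4 5]

view = a.reshape(2, 3)           # a view: same data, different shape
view[0, 0] = 100                 # writing through the view changes a as well

copy = a.copy()                  # an explicit copy owns its own data
copy[1] = -1                     # so a is not affected by this write

view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, copy)

print(a, view_shares, copy_shares)
```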
- Transposition:

      a = np.arange(10).reshape(5,2)
      a = a.T
      a = a.transpose((1,0))

  np.transpose permutes axes. a.T reverses the order of the axes; for a 2-D array this swaps rows and columns.
- Saving and loading arrays:

      np.savez('data.npz', a=a)
      data = np.load('data.npz')
      a = data['a']

  1. NPZ files can hold multiple arrays.
  2. np.savez_compressed is similar.
- Mathematical operators: Arithmetic operations are element-wise. Logical operators return a bool array. In-place operations modify the array.
- Math, upcasting: Just as in Python and Java, the result of a math operator is cast to the more general or precise datatype.

      uint64 + uint16 => uint64
      float32 / int32 => float64    (float32 cannot represent every int32 value exactly)

  Warning: upcasting does not prevent overflow/underflow. You must manually cast first. Use case: images are often stored as uint8. You should convert to float32 or float64 before doing math.
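The overflow warning is easy to reproduce. The sketch below uses made-up pixel values; adding two uint8 arrays keeps the uint8 type, so results wrap around modulo 256, while casting to float32 first avoids this.

```python
import numpy as np

pixels = np.array([200, 250], dtype=np.uint8)

wrapped = pixels + pixels                  # stays uint8: 400 and 500 wrap modulo 256
safe = pixels.astype(np.float32) * 2       # cast first, then do the math

# np.result_type reports the type NumPy would promote to
promoted_ints = np.result_type(np.uint64, np.uint16)
promoted_mixed = np.result_type(np.float32, np.int32)

print(wrapped, safe, promoted_ints, promoted_mixed)
```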
- Math, universal functions: Also called ufuncs. They operate element-wise. Examples: np.exp, np.sqrt, np.sin, np.cos, np.isnan.
- Indexing:

      x[0,0]     # top-left element
      x[0,-1]    # first row, last column
      x[0,:]     # first row (many entries)
      x[:,0]     # first column (many entries)

  Notes: indexing is zero-based. Multi-dimensional indices are comma-separated (i.e., a tuple).
- Python slicing. Syntax: start:stop:step

      a = list(range(10))
      a[:3]       # indices 0, 1, 2
      a[-3:]      # indices 7, 8, 9
      a[3:8:2]    # indices 3, 5, 7
      a[4:1:-1]   # indices 4, 3, 2 (this one is tricky)
- Axes:

      a.sum()                         # sum all entries
      a.sum(axis=0)                   # sum over rows (one result per column)
      a.sum(axis=1)                   # sum over columns (one result per row)
      a.sum(axis=1, keepdims=True)

  1. Use the axis parameter to control which axis NumPy operates on.
  2. Typically, the axis specified will disappear; keepdims keeps all dimensions.
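The effect of the axis and keepdims parameters is easiest to see on a small array. The 2 x 3 array below is chosen only for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)        # [[0 1 2]
                                      #  [3 4 5]]

total = a.sum()                       # every entry added together
per_column = a.sum(axis=0)            # axis 0 (rows) collapses: one sum per column
per_row = a.sum(axis=1)               # axis 1 (columns) collapses: one sum per row
per_row_kept = a.sum(axis=1, keepdims=True)   # same sums, but shape (2, 1)

# keepdims makes the result line up with a again, e.g. for row-wise fractions
row_fractions = a / per_row_kept

print(total, per_column, per_row, per_row_kept.shape)
```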
- Broadcasting:

      a = a + 1    # add one to every element

  When operating on multiple arrays, broadcasting rules are used. Each dimension must match, from right to left.

  1. Dimensions of size 1 will broadcast (as if the value was repeated).
  2. Otherwise, the dimension must have the same shape.
  3. Extra dimensions of size 1 are added to the left as needed.
- Broadcasting example: Suppose we want to add a color value to an image. a.shape is 100, 200, 3 and b.shape is 3. a + b will pad b with two extra dimensions so it has an effective shape of 1 x 1 x 3. So, the addition will broadcast over the first and second dimensions.
- Broadcasting failures: If a.shape is 100, 200, 3 but b.shape is 4, then a + b will fail. The trailing dimensions must have the same shape (or be 1).
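Both the working case and the failing case above can be reproduced with small placeholder arrays. The shapes follow the slides; the array contents are made up for the example.

```python
import numpy as np

image = np.zeros((100, 200, 3), dtype=np.float32)       # stand-in for an RGB image
color = np.array([0.1, 0.2, 0.3], dtype=np.float32)     # one value per channel

# color has shape (3,), which is treated as (1, 1, 3) and repeated
# over the first two dimensions of the image
tinted = image + color

# A trailing dimension of 4 cannot be matched against 3, so this fails
try:
    image + np.zeros(4, dtype=np.float32)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

print(tinted.shape, mismatch_failed)
```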
- Tips to avoid bugs: 1. Know what your datatypes are. 2. Check whether you have a view or a copy. 3. Know np.dot vs np.multiply.
- numpy.dot: numpy.dot(a, b, out=None). Dot product of two arrays. Specifically:
  - If both a and b are 1-D arrays, it is the inner product of vectors (without complex conjugation).
  - If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
  - If either a or b is 0-D (scalar), it is equivalent to multiply, and using numpy.multiply(a, b) or a * b is preferred.
  - If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
  - If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b:

        dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
- numpy.multiply
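The difference between np.dot and np.multiply shows up clearly on two small 2 x 2 arrays. The values below are arbitrary and chosen so the results are easy to verify by hand.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

elementwise = np.multiply(a, b)    # same as a * b: multiplies matching entries
matrix_product = np.dot(a, b)      # same as a @ b for 2-D arrays: rows times columns

# For 1-D arrays, np.dot is the inner product
inner = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))

print(elementwise)
print(matrix_product)
print(inner)
```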
- What is Pandas? Pandas is a Python module that rounds out the capabilities of NumPy, SciPy and Matplotlib. The word pandas is an acronym derived from "Python and data analysis" and "panel data". There is often some confusion about whether Pandas is an alternative to NumPy, SciPy and Matplotlib. The truth is that it is built on top of NumPy, which means that NumPy is required by pandas. SciPy and Matplotlib, on the other hand, are not required by pandas, but they are extremely useful. That's why the Pandas project lists them as "optional dependency".
- What is Pandas? Pandas is a software library written for the Python programming language. It is used for data manipulation and analysis. It provides special data structures and operations for the manipulation of numerical tables and time series.
- Common data structures in Pandas: Series and DataFrame.
- Series: A Series is a one-dimensional labelled array-like object. It is capable of holding any data type, e.g. integers, floats, strings, Python objects, and so on. It can be seen as a data structure with two arrays: one functioning as the index, i.e. the labels, and the other containing the actual data.
- Example import pandas as pd S = pd.Series([11, 28, 72, 3, 5, 8]) S The above code returns: 0 11 1 28 2 72 3 3 4 5 5 8 dtype: int64 | @Apptrainers
- • We can directly access the index and the values of our Series S: print(S.index) print(S.values) RangeIndex(start=0, stop=6, step=1) [11 28 72 3 5 8] | @Apptrainers
- • If we compare this to creating an array in numpy, there are still lots of similarities: import numpy as np X = np.array([11, 28, 72, 3, 5, 8]) print(X) print(S.values) # both are the same type: print(type(S.values), type(X)) [11 28 72 3 5 8] [11 28 72 3 5 8] <class 'numpy.ndarray'> <class 'numpy.ndarray'> | @Apptrainers
- Another example: fruits = ['apples', 'oranges', 'cherries', 'pears'] quantities = [20, 33, 52, 10] S = pd.Series(quantities, index=fruits) S Output: apples 20 oranges 33 cherries 52 pears 10 dtype: int64 | @Apptrainers
- If we add two Series with the same index, we get a new Series with the same index, and the corresponding values are added: fruits = ['apples', 'oranges', 'cherries', 'pears'] S = pd.Series([20, 33, 52, 10], index=fruits) S2 = pd.Series([17, 13, 31, 32], index=fruits) print(S + S2) print("sum of S: ", sum(S)) Output: apples 37 oranges 46 cherries 83 pears 42 dtype: int64 sum of S: 115
- The indices do not have to be the same for the Series addition. The index will be the "union" of both indices. If an index doesn't occur in both Series, the value for this Series will be NaN: fruits = ['peaches', 'oranges', 'cherries', 'pears'] fruits2 = ['raspberries', 'oranges', 'cherries', 'pears'] S = pd.Series([20, 33, 52, 10], index=fruits) S2 = pd.Series([17, 13, 31, 32], index=fruits2) print(S + S2) Output: cherries 83.0 oranges 46.0 peaches NaN pears 42.0 raspberries NaN dtype: float64 | @Apptrainers
- fruits = ['apples', 'oranges', 'cherries', 'pears'] fruits_ro = ["mere", "portocale", "cireșe", "pere"] S = pd.Series([20, 33, 52, 10], index=fruits) S2 = pd.Series([17, 13, 31, 32], index=fruits_ro) print(S+S2) Output: apples NaN cherries NaN cireșe NaN mere NaN oranges NaN pears NaN pere NaN portocale NaN dtype: float64 | @Apptrainers
- It's possible to access single values of a Series or more than one value by a list of indices: print(S['apples']) 20 print(S[['apples', 'oranges', 'cherries']]) apples 20 oranges 33 cherries 52 dtype: int64 | @Apptrainers
- Similar to Numpy we can use scalar operations or mathematical functions on a series: import numpy as np print((S + 3) * 4) print("======================") print(np.sin(S)) Output: apples 92 oranges 144 cherries 220 pears 52 dtype: int64 ====================== apples 0.912945 oranges 0.999912 cherries 0.986628 pears -0.544021 dtype: float64 | @Apptrainers
- pandas.Series.apply: Series.apply(func, convert_dtype=True, args=(), **kwds) Parameters: • func: a function, either a NumPy function that is applied to the entire Series or a Python function that is applied to every single value of the Series. • convert_dtype: a Boolean value. If True (the default), apply tries to find a better dtype for the element-wise function results; if False, the result is left as dtype=object. • args: positional arguments that are passed to func in addition to the values from the Series. • **kwds: additional keyword arguments that are passed to func as keywords.
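The args and **kwds parameters are easier to follow with an example. In this sketch, restock is a hypothetical helper written for illustration, and the fruit data mirrors the Series used on the earlier slides.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

def restock(quantity, threshold, amount=10):
    """Hypothetical helper: top up any stock below the threshold."""
    return quantity + amount if quantity < threshold else quantity

# threshold is passed positionally via args, amount as a keyword.
result = S.apply(restock, args=(50,), amount=5)
print(result)
```

Every value of S is passed to restock as the first argument; 50 fills threshold and 5 fills amount, so only the stocks below 50 are increased.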
- S.apply(np.sin) apples 0.912945 oranges 0.999912 cherries 0.986628 pears -0.544021 dtype: float64 | @Apptrainers
- • We can also use Python lambda functions. Let's assume we have the following task: check the amount of fruit for every kind, and if fewer than 50 are available, augment the stock by 10: S.apply(lambda x: x if x >= 50 else x + 10) apples 30 oranges 43 cherries 52 pears 20 dtype: int64
- Filtering with a Boolean array: S[S>30] oranges 33 cherries 52 dtype: int64 | @Apptrainers
- • A series can be seen as an ordered Python dictionary with a fixed length. "apples" in S True | @Apptrainers
- • We can even pass a dictionary to a Series object when we create it. We get a Series with the dict's keys as the index. Older pandas versions sorted the keys; current versions keep the dictionary's insertion order. cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome": 2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900, "Milan": 1350680} city_series = pd.Series(cities) print(city_series)
- NaN: Missing data is a common problem in data analysis tasks, and Pandas makes it as easy as possible to work with. Index labels that are not keys of the dictionary (here Zurich and Stuttgart) get the value NaN: my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) my_city_series
- • Due to the NaN values the population values for the other cities are turned into floats. There is no missing data in the following examples, so the values are int: my_cities = ["London", "Paris", "Berlin", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) my_city_series
- The Methods isnull() and notnull() my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) print(my_city_series.isnull()) | @Apptrainers
- print(my_city_series.notnull())
- • We also get a NaN if a value in the dictionary is None: d = {"a":23, "b":45, "c":None, "d":0} S = pd.Series(d) print(S)
- print(pd.isnull(S)) | @Apptrainers
- print(pd.notnull(S))
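The effect of a missing value on the dtype, and the null checks described above, can be reproduced on a shortened version of the cities dictionary. This is a sketch with three of the original entries.

```python
import pandas as pd

cities = {"London": 8615246, "Paris": 2273305, "Berlin": 3562166}

# Every requested label exists: the values stay integers.
complete = pd.Series(cities, index=["London", "Paris"])
# "Zurich" is not a key: it becomes NaN and the Series turns float.
with_gap = pd.Series(cities, index=["London", "Zurich", "Paris"])

print(complete.dtype)                # integer dtype
print(with_gap.dtype)                # float64
print(with_gap.isnull().tolist())    # [False, True, False]
print(with_gap.notnull().sum())      # 2
```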
- Filtering out Missing Data It's possible to filter out missing data with the Series method dropna. It returns a Series which consists only of non-null data: import pandas as pd cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome": 2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900, "Milan": 1350680} my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) print(my_city_series.dropna()) | @Apptrainers
- Filling in Missing Data • In many cases you don't want to filter out missing data, but you want to fill in appropriate data for the empty gaps. A suitable method in many situations will be fillna: print(my_city_series.fillna(0)) London 8615246.0 Paris 2273305.0 Zurich 0.0 Berlin 3562166.0 Stuttgart 0.0 Hamburg 1760433.0 dtype: float64 | @Apptrainers
- • If we call fillna with a dictionary, we can provide the appropriate data, i.e. the population of Zurich and Stuttgart: missing_cities = {"Stuttgart":597939, "Zurich":378884} my_city_series.fillna(missing_cities) London 8615246.0 Paris 2273305.0 Zurich 378884.0 Berlin 3562166.0 Stuttgart 597939.0 Hamburg 1760433.0 dtype: float64 | @Apptrainers
- cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome": 2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900, "Milan": 1350680} my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) my_city_series = my_city_series.fillna(0).astype(int) print(my_city_series) | @Apptrainers
- London 8615246 Paris 2273305 Zurich 0 Berlin 3562166 Stuttgart 0 Hamburg 1760433 dtype: int64 | @Apptrainers
- DataFrame • The underlying idea of a DataFrame is based on spreadsheets. We can see the data structure of a DataFrame as tabular and spreadsheet-like. • A DataFrame logically corresponds to a "sheet" of an Excel document. • A DataFrame has both a row and a column index. | @Apptrainers
- • Like a spreadsheet or Excel sheet, a DataFrame object contains an ordered collection of columns. • Each column has a single data type, but different columns can have different types, e.g. the first column may consist of integers, while the second one consists of Boolean values and so on. • There is a close connection between the DataFrames and the Series of Pandas. • A DataFrame can be seen as a concatenation of Series, each Series having the same index, i.e. the index of the DataFrame.
- import pandas as pd years = range(2014, 2018) shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years) shop2 = pd.Series([1203.45, 3441.62, 3007.83, 3619.53], index=years) shop3 = pd.Series([3412.12, 3491.16, 3457.19, 1963.10], index=years) print(pd.concat([shop1, shop2, shop3])) | @Apptrainers
- • This result is not what we have intended or expected. The reason is that concat used 0 as the default for the axis parameter. Let's do it with "axis=1": shops_df = pd.concat([shop1, shop2, shop3], axis=1) print(shops_df) | @Apptrainers
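The difference between the two axis settings shows up in the type and shape of the result. A minimal sketch with two short, made-up Series:

```python
import pandas as pd

years = range(2014, 2018)
s1 = pd.Series([1.0, 2.0, 3.0, 4.0], index=years)
s2 = pd.Series([5.0, 6.0, 7.0, 8.0], index=years)

stacked = pd.concat([s1, s2])               # axis=0: one long Series
side_by_side = pd.concat([s1, s2], axis=1)  # axis=1: one column per Series

print(type(stacked).__name__, stacked.shape)            # Series (8,)
print(type(side_by_side).__name__, side_by_side.shape)  # DataFrame (4, 2)
```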
- cities = ["Zürich", "Winterthur", "Freiburg"] shops_df.columns = cities print(shops_df) # alternative way: give names to series: shop1.name = "Zürich" shop2.name = "Winterthur" shop3.name = "Freiburg" print("------") shops_df2 = pd.concat([shop1, shop2, shop3], axis=1) print(shops_df2) | @Apptrainers
- print(type(shops_df)) <class 'pandas.core.frame.DataFrame'> | @Apptrainers
- DataFrames from Dictionaries cities = {"name": ["London", "Berlin", "Madrid", "Rome", "Paris", "Vienna", "Bucharest", "Hamburg", "Budapest", "Warsaw", "Barcelona", "Munich", "Milan"], "population": [8615246, 3562166, 3165235, 2874038, 2273305, 1805681, 1803425, 1760433, 1754000, 1740119, 1602386, 1493900, 1350680], "country": ["England", "Germany", "Spain", "Italy", "France", "Austria", "Romania", "Germany", "Hungary", "Poland", "Spain", "Germany", "Italy"]} city_frame = pd.DataFrame(cities) print(city_frame) | @Apptrainers
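- Retrieving the Column Names: city_frame.columns.values Output: array(['country', 'name', 'population'], dtype=object) (This output comes from an older pandas version that sorted the dictionary keys; current versions keep the dictionary's order: 'name', 'population', 'country'.)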
- Custom Index • We can see that an index (0, 1, 2, ...) has been automatically assigned to the DataFrame. We can also assign a custom index to the DataFrame object: ordinals = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth"] city_frame = pd.DataFrame(cities, index=ordinals) print(city_frame)
- Rearranging the Order of Columns: We can also define the order of the columns when we create the DataFrame. This also ensures a defined column order when the DataFrame is created from a dictionary, because dictionaries are only guaranteed to keep their insertion order from Python 3.7 on.
- city_frame = pd.DataFrame(cities, columns=["name", "country", "population"]) print(city_frame) | @Apptrainers
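- • But what if you want to change the column names and the ordering of an existing DataFrame? To reorder, pass the new order to reindex through the columns parameter and keep the returned DataFrame, since reindex does not work in place and acts on the row index when columns is not given: city_frame = city_frame.reindex(columns=["country", "name", "population"]) print(city_frame)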
- • Now, we want to rename our columns. For this purpose, we will use the DataFrame method 'rename'. This method supports two calling conventions: • (index=index_mapper, columns=columns_mapper, ...) • (mapper, axis={'index', 'columns'}, ...) • We will rename the columns of our DataFrame into Romanian names in the following example. • We set the parameter inplace to True so that our DataFrame is changed directly. With inplace=False, which is the default, rename returns a new DataFrame and leaves the original unchanged.
- city_frame.rename(columns={"name":"Nume", "country":"țară", "population":"populație"}, inplace=True) print(city_frame) | @Apptrainers
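Reordering with reindex and the two calling conventions of rename can be compared on a small frame. This is a sketch: the two-row frame and the new column name "city" are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"name": ["London", "Berlin"],
                      "population": [8615246, 3562166],
                      "country": ["England", "Germany"]})

# reindex returns a new DataFrame; columns= selects the column axis.
reordered = frame.reindex(columns=["country", "name", "population"])

# Two equivalent calling conventions of rename.
renamed_a = reordered.rename(columns={"name": "city"})
renamed_b = reordered.rename({"name": "city"}, axis="columns")

print(list(renamed_a.columns))   # ['country', 'city', 'population']
print(list(renamed_b.columns))   # ['country', 'city', 'population']
print(list(frame.columns))       # ['name', 'population', 'country']
```

Neither call uses inplace=True, so frame itself keeps its original column names and order.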
- Existing Column as the Index of a DataFrame • We want to create a more useful index in the following example. We will use the country name as the index, i.e. the list value associated to the key "country" of our cities dictionary: city_frame = pd.DataFrame(cities, columns=["name", "population"], index=cities["country"]) print(city_frame) | @Apptrainers
- • Alternatively, we can change an existing DataFrame. • We can use the method set_index to turn a column into an index. • "set_index" does not work in-place, it returns a new data frame with the chosen column as the index: | @Apptrainers
- city_frame = pd.DataFrame(cities) city_frame2 = city_frame.set_index("country") print(city_frame2) | @Apptrainers
- • We saw in the previous example that the set_index method returns a new DataFrame object and doesn't change the original DataFrame. If we set the optional parameter "inplace" to True, the DataFrame will be changed in place, i.e. no new object will be created: city_frame = pd.DataFrame(cities) city_frame.set_index("country", inplace=True) print(city_frame) | @Apptrainers
- Label-Indexing on the Rows • So far we have indexed DataFrames via the columns. We will now demonstrate how we can access rows of DataFrames via the locators 'loc' and 'iloc'. ('ix' was deprecated and has been removed as of pandas 1.0.) city_frame = pd.DataFrame(cities, columns=("name", "population"), index=cities["country"]) print(city_frame.loc["Germany"])
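The two locators differ in what they accept: loc works with index labels (and Boolean masks), iloc with integer positions. A minimal sketch on a three-row excerpt of the city data:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["London", "Berlin", "Hamburg"],
                      "population": [8615246, 3562166, 1760433]},
                     index=["England", "Germany", "Germany"])

by_label = frame.loc["Germany"]    # all rows labelled "Germany"
by_position = frame.iloc[0]        # first row, returned as a Series

# loc also accepts a Boolean mask plus a column label.
large = frame.loc[frame["population"] > 2_000_000, "name"]

print(by_label)
print(by_position["name"])         # London
print(large.tolist())              # ['London', 'Berlin']
```

Because the label "Germany" occurs twice in the index, loc returns a DataFrame with two rows rather than a single Series.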
- Sum and Cumulative Sum • We can calculate the sum of all the columns of a DataFrame or the sum of certain columns: print(city_frame.sum()) | @Apptrainers
- city_frame["population"].sum() 33800614 | @Apptrainers
- We can use "cumsum" to calculate the cumulative sum: x = city_frame["population"].cumsum() print(x)
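As a small worked example of the cumulative sum, here are the first three populations from the cities data; each entry of the result is the running total up to and including that row.

```python
import pandas as pd

population = pd.Series([8615246, 3562166, 3165235],
                       index=["London", "Berlin", "Madrid"])

x = population.cumsum()
print(x)
# London     8615246
# Berlin    12177412
# Madrid    15342647
# dtype: int64

# The last entry of the cumulative sum equals the plain sum.
print(x.iloc[-1] == population.sum())   # True
```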
- Assigning New Values to Columns • x is a Pandas Series. • We can reassign the previously calculated cumulative sums to the population column: city_frame["population"] = x print(city_frame) | @Apptrainers
- • Instead of replacing the values of the population column with the cumulative sum, we want to add the cumulative population sum as a new column with the name "cum_population". city_frame = pd.DataFrame(cities, columns=["country", "population", "cum_population"], index=cities["name"]) print(city_frame) | @Apptrainers
- • We can see that the column "cum_population" is set to NaN, as we haven't provided any data for it. • We will assign now the cumulative sums to this column: city_frame["cum_population"] =city_frame["population"].cumsum() print(city_frame) | @Apptrainers
- • We can also include a column name which is not contained in the dictionary, when we create the DataFrame from the dictionary. In this case, all the values of this column will be set to NaN: city_frame = pd.DataFrame(cities, columns=["country", "area", "population"], index=cities["name"]) print(city_frame) | @Apptrainers
- Accessing the Columns of a DataFrame • There are two ways to access a column of a DataFrame. The result is in both cases a Series: # in a dictionary-like way: print(city_frame["population"]) | @Apptrainers
- # as an attribute print(city_frame.population) | @Apptrainers
- print(type(city_frame.population)) <class 'pandas.core.series.Series'> | @Apptrainers
- p = city_frame.population Accessing the column this way does not copy the population column: in pandas versions without copy-on-write, "p" is a view on the data of city_frame.
- Assigning New Values to a Column • The column area is still not defined. We can set all elements of the column to the same value: city_frame["area"] = 1572 print(city_frame) | @Apptrainers
- • In this case, it will be definitely better to assign the exact area to the cities. The list with the area values needs to have the same length as the number of rows in our DataFrame. # area in square km: area = [1572, 891.85, 605.77, 1285, 105.4, 414.6, 228, 755, 525.2, 517, 101.9, 310.4, 181.8] # area could have been designed as a list, a Series, an array or a scalar city_frame["area"] = area print(city_frame) | @Apptrainers
- Sorting DataFrames city_frame = city_frame.sort_values(by="area", ascending=False) print(city_frame) | @Apptrainers
- Let's assume, we have only the areas of London, Hamburg and Milan. The areas are in a series with the correct indices. We can assign this series as well: city_frame = pd.DataFrame(cities, columns=["country", "area", "population"], index=cities["name"]) some_areas = pd.Series([1572, 755, 181.8], index=['London', 'Hamburg', 'Milan']) city_frame['area'] = some_areas print(city_frame) | @Apptrainers
- Inserting new columns into existing DataFrames • In the previous example we added the column area at creation time. Quite often it will be necessary to add or insert columns into existing DataFrames. • For this purpose the DataFrame class provides a method "insert", which allows us to insert a column into a DataFrame at a specified location: insert(self, loc, column, value, allow_duplicates=False)
- city_frame = pd.DataFrame(cities, columns=["country", "population"], index=cities["name"]) idx = 1 city_frame.insert(loc=idx, column='area', value=area) print(city_frame) <class 'pandas.core.frame.DataFrame'> | @Apptrainers
- DataFrame from Nested Dictionaries: A nested dictionary of dictionaries can be passed to a DataFrame as well. The keys of the outer dictionary are taken as the columns, and the inner keys, i.e. the keys of the nested dictionaries, are used as the row indices:
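A minimal sketch of such a nested dictionary follows. The growth figures are invented placeholders for illustration, not real statistics, and the name growth_frame is only borrowed from the later slides.

```python
import pandas as pd

# Made-up illustrative numbers, not real statistics.
growth = {"Switzerland": {2010: 3.0, 2011: 1.8},
          "Germany":     {2010: 4.1, 2011: 3.6},
          "France":      {2010: 2.0, 2011: 2.1}}

growth_frame = pd.DataFrame(growth)
print(growth_frame)

print(list(growth_frame.columns))  # outer keys become the columns
print(list(growth_frame.index))    # inner keys become the row index
print(growth_frame.T.shape)        # (3, 2) after transposing
```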
- • Would you like to have the years in the columns and the countries in the rows? No problem, you can transpose the data: growth_frame.T
- • Consider: growth_frame = growth_frame.T growth_frame2 = growth_frame.reindex(["Switzerland", "Italy", "Germany", "Greece"]) # remove France print(growth_frame2) | @Apptrainers
- Filling a DataFrame with random values: import numpy as np names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara'] index = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"] df = pd.DataFrame((np.random.randn(12, 5)*1000).round(2), columns=names, index=index) print(df) randn returns samples from the standard normal distribution (mean 0, variance 1). Its arguments are the dimensions of the returned array, here 12 x 5; they are not the mean and the variance.
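Since np.random.randn takes only dimensions, a different mean or spread has to be applied by shifting and scaling the samples. A minimal sketch, where mu and sigma are arbitrary illustrative values:

```python
import numpy as np

samples = np.random.randn(12, 5)   # arguments are the array dimensions
print(samples.shape)               # (12, 5)

# Shift and scale standard-normal samples to mean mu and
# standard deviation sigma (illustrative values).
mu, sigma = 100.0, 15.0
scaled = mu + sigma * np.random.randn(100_000)

print(scaled.mean())   # close to 100
print(scaled.std())    # close to 15
```

This is also what the slide's code does when it multiplies by 1000: the values then have a standard deviation of roughly 1000 around a mean of 0.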
- Summary • So far we have covered the following: • Python 3.0 (scalars, lists, dictionaries, loops, selection, functions) • Numpy • Pandas • The reason for studying these packages is to be able to program the 5 steps in any data science process.