2. Dictionaries
Stores unordered, arbitrarily indexed data
Consists of key-value pairs
Dict = {key:value, key:value, key:value...}
Note: keys must be immutable!
ergo: numbers, tuples or strings
Values may be anything, incl. another
dictionary
Mainly used for storing associations or
mappings
3. Create, add, lookup, remove
Creation:
mydict = {} (empty), or
mydict = { mykey:myval, mykey2:myval2 }
Adding:
mydict[key] = value
Lookup:
mydict[key]
Remove:
del mydict[key]
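A minimal sketch of the four operations above. The dictionary name codon_table and its contents are made up for illustration; print() with a single argument is used so the code runs under both Python 2 and Python 3.

```python
# Hypothetical example data: codons mapped to amino acids
codon_table = {"ATG": "Met", "TGG": "Trp"}

# Adding: assigning to a new key creates the key-value pair
codon_table["TAA"] = "Stop"

# Assigning to an existing key overwrites the old value
codon_table["TGG"] = "W"

# Lookup: a key that is not present would raise KeyError
start = codon_table["ATG"]

# Remove: deletes both the key and its value
del codon_table["TAA"]

print(start)
print(len(codon_table))
```

Note that adding and overwriting use the same syntax, so it is easy to replace a value by accident.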
4. Dictionary methods
All keys:
mydict.keys() - returns list of keys
All values:
mydict.values() - returns list of values
All key-value pairs as list of tuples:
mydict.items()
Get one specific value:
mydict.get(key [, default])
if default is given, that is returned if key is not present in the
dictionary, else None is returned
Test for presence of key:
key in mydict – returns True or False
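The methods above can be tried out roughly like this. The dictionary counts is a made-up example; sorted() is used so that the results are plain lists in a predictable order, since the order of a dictionary should not be relied on.

```python
# Hypothetical nucleotide counts
counts = {"A": 3, "C": 1, "G": 2}

keys = sorted(counts.keys())      # all keys
values = sorted(counts.values())  # all values
pairs = sorted(counts.items())    # all key-value pairs as tuples

# get() never raises KeyError for a missing key
t_default = counts.get("T", 0)    # key absent, default given
t_none = counts.get("T")          # key absent, no default: None

# Test for presence of a key
has_a = "A" in counts
has_t = "T" in counts

print(pairs)
```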
5. Dictionary exercise
Log in to freebee as before
Do module load python, then start python
Create this dictionary:
{"A": 1, 1:"A", "B":[1,2,3]}
Find out the following:
how many keys are there?
add "str": {1:"X"} to the dictionary
is there something stored with key "strx"?
what about the key "str"?
remove the number 3 from the list stored under “B” -
print the results
6. Sets
Similar to lists but:
no order
every element is unique
Can create set from list (duplicates are then
removed)
Add elements with
myset = set()
myset.add(elem)
Neat trick - how to create unique list:
newlist = list(set(oldlist))
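A short sketch of the points above, using a made-up list. The order of list(set(...)) is arbitrary, so sorted() is used here to get a predictable result.

```python
# Hypothetical list with duplicates
oldlist = ["a", "b", "a", "c", "b"]

myset = set(oldlist)   # duplicates are removed
myset.add("d")         # new element is added
myset.add("a")         # already present, so nothing changes

# Unique list in a predictable order
newlist = sorted(set(oldlist))

print(newlist)
print(len(myset))
```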
7. Set operations
Intersection – found in both sets
set1.intersection(set2)
Union – all elements from both sets
set1.union(set2)
Difference – elements in set1 that are not in set2
set1 - set2
Symmetric difference – elements found in exactly one of the sets
set1.symmetric_difference(set2)
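The four operations can be compared side by side on two small sets. The contents of set1 and set2 are arbitrary example values.

```python
set1 = set([1, 2, 3, 4])
set2 = set([3, 4, 5])

both = set1.intersection(set2)                 # in set1 and in set2
either = set1.union(set2)                      # in set1, in set2, or in both
only_first = set1 - set2                       # in set1 but not in set2
not_shared = set1.symmetric_difference(set2)   # in exactly one of the sets

print(sorted(both))
print(sorted(not_shared))
```

Difference is the only one of these where the order of the two sets matters: set2 - set1 gives a different result than set1 - set2.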
8. Set exercise
Create these lists:
["a", "B", 1, "a", 4], ["c", 1, 2, "A", "c", "a"]
make sets from these two lists
Figure out:
the number of unique elements in each list
the elements present in both
the elements that are not shared
the number of unique elements altogether
the elements that are present in the second set,
but not in the first
9. Input from terminal
Can get input from terminal (user)
Code:
variable = raw_input("Prompt text")
Prompt text will be printed to screen and the
text the user types in will be stored in
variable
10. Indentation and scope
Python does not use brackets or other
symbols to delineate a block of code
Python uses indentation – either tab or
space
Note: a variable can only be seen and used
within the scope it is defined in. In Python,
functions, classes and modules create a new
scope; if, for and while blocks do not
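A small sketch of indentation and scope in practice, assuming a function is used to create the new scope. The function name count_gc and the variable names are made up for illustration.

```python
def count_gc(sequence):
    # The indented lines form the body of the function.
    # gc_count is created here, so it is local to the function.
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count

# Back at the outer indentation level: the function body has ended
result = count_gc("ATGGCGGA")

# Outside the function, the name gc_count does not exist
try:
    gc_count
    visible = True
except NameError:
    visible = False

print(result)
print(visible)
```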
11. Flow control
Flow control determines which blocks of
code will be executed
One conditional statement
If – else
Two iteration statements
For: iterate over group of elements
While: repeat as long as a condition is true
12. If
Structure:
if <boolean expression>:
    code block 1
elif <boolean expression>:
    code block 2
else:
    code block 3
Only one of these code blocks is executed
Executed block: the one whose expression
first evaluates to True
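A minimal sketch of the structure above. The function describe_length and its thresholds are made up for illustration. An empty string satisfies both the first and the second test, but only the first matching block runs.

```python
def describe_length(sequence):
    # Hypothetical helper: the threshold of 10 is arbitrary
    if len(sequence) == 0:
        return "empty"
    elif len(sequence) < 10:
        # Not reached for an empty string, even though 0 < 10
        return "short"
    else:
        return "long"

print(describe_length(""))
print(describe_length("ATG"))
print(describe_length("ATGATGATGATG"))
```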
13. Boolean expressions
Comparisons
A > B    A greater than B
A < B    A less than B
A >= B   A greater than or equal to B
A <= B   A less than or equal to B
A == B   A equal to B
A != B   A not equal to B
Comparisons can be combined:
and, or, and not
B != C and B > A - results evaluated left-right
Other values
True: non-empty lists, sets, tuples etc, and non-zero numbers
False: 0, None, and empty lists, sets, tuples, strings etc
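A short sketch of how comparisons combine with and, or and not. The values of A, B and C are arbitrary.

```python
A = 1
B = 5
C = 7

first = B != C and B > A       # True and True
second = A > B or not C < B    # False or (not False)
third = not (A < B)            # not True

print(first)
print(second)
print(third)
```

Parentheses, as in the last line, make the intended grouping explicit.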
14. If exercise
Use the interactive python shell
Create the following:
Empty list
List with elements
A variable with value 0
A variable with value -1
A variable with value None
Use these in an if structure to see which
ones evaluate to True
15. If script
Create variable that takes input from user
Test to see if:
The sequence contains anything other than
ATGC
The sequence is at least 10 nucleotides long
Report results to user
16. If script
inputstring = raw_input("Input your DNA string: ")
mystring = inputstring.upper()
mylength = len(mystring)
# Count each of the four valid nucleotides
myAs = mystring.count("A")
myCs = mystring.count("C")
myTs = mystring.count("T")
myGs = mystring.count("G")
nucleotidesum = myAs + myCs + myTs + myGs
# If the counts do not add up to the length, other characters are present
if nucleotidesum < mylength:
    print "String contains something other than DNA"
elif mylength < 10:
    print "Length is below 10"
else:
    print "Sequence is ok"
17. For
Structure:
for VAR in ITERABLE:
    code block
Code block executed for each element in
ITERABLE
VAR takes on value of current element
Iterables are:
Strings, lists, tuples, xrange, byte arrays,
buffers
18. For example
Use the python interactive shell
Create string "ATGGCGGA"
Print out each letter in this string
>>> a = "ATGGCGGA"
>>> for var in a:
...     print var
...
A
T
G
G
C
G
G
A
>>>
19. For exercise
Define list of numbers 1-9
Show each number multiplied with itself
>>> a = [1,2,3,4,5,6,7,8,9]
>>> for var in a:
...     print var*var
...
1
4
9
16
25
36
49
64
81
>>>
20. xrange
Iterate over a range of numbers
xrange(int): numbers from 0 up to, but not
including, int
xrange(start, stop, step):
Start at start, stop before stop, increase by
step each time
>>> for i in xrange(0,10,2):
...     print i
...
0
2
4
6
8
>>>
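The start, stop and step behaviour can be checked with a few small cases. This sketch uses range instead of xrange, since xrange does not exist in Python 3; range takes the same arguments and runs under both Python 2 and Python 3.

```python
# Start at 0, stop before 10, step 2
evens = []
for i in range(0, 10, 2):
    evens.append(i)

# One argument: from 0 up to, but not including, 3
first_three = list(range(3))

# A negative step counts downwards; stop (0) is still excluded
countdown = list(range(10, 0, -3))

print(evens)
print(first_three)
print(countdown)
```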
21. For exercise
Create dictionary where:
Keys are all two-letter combinations of A, B, C
Values increase from 1 and up
Hints
Can use two for loops
Adding to an integer variable:
i += 1
22. For exercise
letters = "ABC"
valuedict = {}
i = 1
for letter1 in letters:
    for letter2 in letters:
        k = letter1 + letter2
        # Store the current value first, then increment, so values start at 1
        valuedict[k] = i
        i += 1
print valuedict
[karinlag@freebee]~/tmp/course% python forloopdict.py
{'AA': 1, 'AC': 3, 'AB': 2, 'BA': 4, 'BB': 5, 'BC': 6,
'CC': 9, 'CB': 8, 'CA': 7}
[karinlag@freebee]~/tmp/course%
23. While
Structure
while EXPRESSION:
code block
Important: code block MUST eventually make the
expression false (or break out of the loop),
otherwise infinite loop
24. While example
>>> a=10
>>> while True:
...     if a<40:
...         print a
...     else:
...         break
...     a += 10
...
10
20
30
25. Break
Can be used to break out of a loop
Can greatly improve legibility and efficiency
What happens when next tuple is iterated
over, after 'blue' is found?
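As a small illustration of break in a for loop: the loop stops as soon as the target is found, so later elements are never visited. The list of colours is made up for this sketch and is not taken from the original example.

```python
# Hypothetical data
colours = ["red", "green", "blue", "yellow"]

visited = []
for colour in colours:
    visited.append(colour)
    if colour == "blue":
        break   # leave the loop immediately

print(visited)
```

In nested loops, break only leaves the innermost loop; the outer loop carries on with its next element.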
26. Homework
ATCurve.py
take an input string from the user
check if the sequence only contains DNA – if
not, prompt for new sequence.
calculate a running average of AT content along
the sequence. Window size should be 3, and
the step size should be 1. Print one value per
line.
Note: you need to include several runtime
examples to show that all parts of the code
work.
27. Homework
CodonFrequency.py
take an input string from the user
check if the sequence only contains DNA
– if not, prompt for new sequence
find an open reading frame in the string (note,
must be multiple of three)
– if not, prompt for new sequence
calculate the frequency of each codon in the
ORF