SlideShare une entreprise Scribd logo
1  sur  23
FBW
05-12-2017
Wim Van Criekinge
Google Calendar
esearch + efetch
Install Extra Packages … finance ?
pip is the preferred installer program.
Starting with Python 3.4, it is included
by default with the Python binary
installers.
pip3.5 install Biopython
#pip3.5 install yahoo_finance
from yahoo_finance import Share
yahoo = Share('AAPL')
print (yahoo.get_open())
Numpy – SciPy – Matplib
• Numpy: Fundamental open source package for
scientific computing with Python.
– N-dimensional array object
Linear algebra, Fourier transform, random
number capabilities
• SciPy (prono1unced “Sigh Pie”) is a Python-based
ecosystem of open-source software for
mathematics, science, and engineering.
• Matplotlib is a python 2D plotting library which
produces publication quality figures in a variety of
hardcopy formats and interactive environments
across platforms.
Numpy – SciPy – Matplib -- > PANDAS ?
• Python Data Analysis Library, similar to:
– R
– MATLAB
– SAS
• Combined with the IPython toolkit
• Built on top of NumPy, SciPy, to some extent
matplotlib
• Panel Data System
– Open source, BSD-licensed
• Key Components
– Series is a named Python list (dict with list as value).
{ ‘grades’ : [50,90,100,45] }
– DataFrame is a dictionary of Series (dict of series):
{ { ‘names’ : [‘bob’,’ken’,’art’,’joe’]}
{ ‘grades’ : [50,90,100,45] }
}
Install Extra Packages … googleapiclient?
Install Extra Packages … twitter
Install Extra Packages …matplotlib …
T = +1
A = -1
C = +1G = -1
2D – random walk
1D – random walk
u(i)=1 for pyrimidines (C or T) and u(i)=-1 for purines (A or G)
Install Extra Packages …matplotlib …. and nolds
Extra Questions (2)
• How many human proteins in Swiss Prot ?
• What is the longest human protein ? The shortest ?
• Calculate for all human proteins their MW and pI, display as
two histograms (2D scatter ?)
• How many human proteins have “cancer” in their
description?
• Which genes has the highest number of SNPs/somatic
mutations (COSMIC)
• How many human DNA-repair enzymes are represented in
Swiss Prot (using description / GO)?
• List proteins that only contain alpha-helices based on the
Chou-Fasman algorithm
• List proteins based on the number of predicted
transmembrane regions (Kyte-Doollittle)
 Amino acid sequences fold onto themselves to become a
biologically active molecule.
There are three types of local segments:
Helices: Where protein residues seem to be following the shape
of a spring. The most common are the so-called alpha helices
Extended or Beta-strands: Where residues are in line and
successive residues turn back to each other
Random coils: When the amino acid chain is neither helical nor
extended
Secondary structure of protein
Chou-Fasman Algorithm
Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical,
b-sheet, and random coil regions calculated from proteins.Biochemistry 13, 211-221.
Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13,
222-245.
Analyzed the frequency of the 20 amino acids in alpha helices,
Beta sheets and turns.
• Ala (A), Glu (E), Leu (L), and Met (M) are strong predictors of
 helices
• Pro (P) and Gly (G) break  helices.
• When 4 of 5 amino acids have a high probability of being in an alpha helix, it predicts a
alpha helix.
• When 3 of 5 amino acids have a high probability of being in a
b strand, it predicts a b strand.
• 4 amino acids are used to predict turns.
Calculation of Propensities
Pr[i|b-sheet]/Pr[i], Pr[i|-helix]/Pr[i], Pr[i|other]/Pr[i]
determine the probability that amino acid i is in
each structure, normalized by the background
probability that i occurs at all.
Example.
let's say that there are 20,000 amino acids in the database, of
which 2000 are serine, and there are 5000 amino acids in
helical conformation, of which 500 are serine. Then the
helical propensity for serine is: (500/5000) / (2000/20000) =
1.0
Calculation of preference parameters
• Preference parameter > 1.0  specific
residue has a preference for the specific
secondary structure.
• Preference parameter = 1.0  specific
residue does not have a preference for, nor
dislikes the specific secondary structure.
• Preference parameter < 1.0  specific
residue dislikes the specific secondary
structure.
Calculation of Propensities
Pr[i|b-sheet]/Pr[i], Pr[i|-helix]/Pr[i], Pr[i|other]/Pr[i]
determine the probability that amino acid i is in
each structure, normalized by the background
probability that i occurs at all.
Example.
let's say that there are 20,000 amino acids in the database, of
which 2000 are serine, and there are 5000 amino acids in
helical conformation, of which 500 are serine. Then the
helical propensity for serine is: (500/5000) / (2000/20000) =
1.0
Calculation of preference parameters
• Preference parameter > 1.0  specific
residue has a preference for the specific
secondary structure.
• Preference parameter = 1.0  specific
residue does not have a preference for, nor
dislikes the specific secondary structure.
• Preference parameter < 1.0  specific
residue dislikes the specific secondary
structure.
Preference parameters
Residue P(a) P(b) P(t) f(i) f(i+1) f(i+2) f(i+3)
Ala 1.45 0.97 0.57 0.049 0.049 0.034 0.029
Arg 0.79 0.90 1.00 0.051 0.127 0.025 0.101
Asn 0.73 0.65 1.68 0.101 0.086 0.216 0.065
Asp 0.98 0.80 1.26 0.137 0.088 0.069 0.059
Cys 0.77 1.30 1.17 0.089 0.022 0.111 0.089
Gln 1.17 1.23 0.56 0.050 0.089 0.030 0.089
Glu 1.53 0.26 0.44 0.011 0.032 0.053 0.021
Gly 0.53 0.81 1.68 0.104 0.090 0.158 0.113
His 1.24 0.71 0.69 0.083 0.050 0.033 0.033
Ile 1.00 1.60 0.58 0.068 0.034 0.017 0.051
Leu 1.34 1.22 0.53 0.038 0.019 0.032 0.051
Lys 1.07 0.74 1.01 0.060 0.080 0.067 0.073
Met 1.20 1.67 0.67 0.070 0.070 0.036 0.070
Phe 1.12 1.28 0.71 0.031 0.047 0.063 0.063
Pro 0.59 0.62 1.54 0.074 0.272 0.012 0.062
Ser 0.79 0.72 1.56 0.100 0.095 0.095 0.104
Thr 0.82 1.20 1.00 0.062 0.093 0.056 0.068
Trp 1.14 1.19 1.11 0.045 0.000 0.045 0.205
Tyr 0.61 1.29 1.25 0.136 0.025 0.110 0.102
Val 1.14 1.65 0.30 0.023 0.029 0.011 0.029
Applying algorithm
1. Assign parameters (propensities) to residue.
2. Identify regions (nucleation sites) where 4 out of 6 residues have
P(a)>100: a-helix. Extend helix in both directions until four
contiguous residues have an average P(a)<100: end of a-helix. If
segment is longer than 5 residues and P(a)>P(b): a-helix.
3. Repeat this procedure to locate all of the helical regions.
4. Identify regions where 3 out of 5 residues have P(b)>100: b-
sheet. Extend sheet in both directions until four contiguous
residues have an average P(b)<100: end of b-sheet. If P(b)>105
and P(b)>P(a): b-sheet.
5. Rest: P(a)>P(b)  a-helix. P(b)>P(a)  b-sheet.
6. To identify a bend at residue number i, calculate the following
value: p(t) = f(i)f(i+1)f(i+2)f(i+3)
If: (1) p(t) > 0.000075; (2) average P(t)>1.00 in the tetrapeptide;
and (3) averages for tetrapeptide obey P(a)<P(t)>P(b): b-turn.
Additional fun
• Find proteins of at least 250aa that contain the fewest
secondary – structure elements ? Are they candidates
for being IDP ?
• Find proteins that contain no prosite patterns (using
scanner from previous exercise) ?
• Calculate for all human proteins their MW and pI (display
as 2D gel)
P7 2017 biopython3

Contenu connexe

Similaire à P7 2017 biopython3

Bioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekingeBioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekingeProf. Wim Van Criekinge
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Vijay Hemmadi
 
Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Prof. Wim Van Criekinge
 
Biol161 02
Biol161 02Biol161 02
Biol161 02gfb1
 
Chapter-21-Proteins-and-Enzymes Biochem.pdf
Chapter-21-Proteins-and-Enzymes Biochem.pdfChapter-21-Proteins-and-Enzymes Biochem.pdf
Chapter-21-Proteins-and-Enzymes Biochem.pdfMaryJoyceMejia
 
Protein Purification Lecture
Protein Purification LectureProtein Purification Lecture
Protein Purification Lectureouopened
 
Cross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene OntologyCross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene OntologyChris Mungall
 
Finding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docxFinding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docxernestc3
 
Learning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and Proteins
Learning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and ProteinsLearning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and Proteins
Learning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and ProteinsTauqeer Ahmad
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiomejukais
 
04 Amino acids SPECIAL @.pdf
04 Amino acids SPECIAL @.pdf04 Amino acids SPECIAL @.pdf
04 Amino acids SPECIAL @.pdfAlmazGebru2
 
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSEVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSAksw Group
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesConnected Data World
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issuesDongyan Zhao
 

Similaire à P7 2017 biopython3 (20)

Bioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekingeBioinformatics t7-protein structure-v2013_wim_vancriekinge
Bioinformatics t7-protein structure-v2013_wim_vancriekinge
 
Ch08 massspec
Ch08 massspecCh08 massspec
Ch08 massspec
 
Bioalgo 2012-03-massspec
Bioalgo 2012-03-massspecBioalgo 2012-03-massspec
Bioalgo 2012-03-massspec
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014
 
Bioinformatica 01-12-2011-t7-protein
Bioinformatica 01-12-2011-t7-proteinBioinformatica 01-12-2011-t7-protein
Bioinformatica 01-12-2011-t7-protein
 
Biol161 02
Biol161 02Biol161 02
Biol161 02
 
Chapter-21-Proteins-and-Enzymes Biochem.pdf
Chapter-21-Proteins-and-Enzymes Biochem.pdfChapter-21-Proteins-and-Enzymes Biochem.pdf
Chapter-21-Proteins-and-Enzymes Biochem.pdf
 
Protein Purification Lecture
Protein Purification LectureProtein Purification Lecture
Protein Purification Lecture
 
Cross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene OntologyCross Product Extensions to the Gene Ontology
Cross Product Extensions to the Gene Ontology
 
Finding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docxFinding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docx
 
Learning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and Proteins
Learning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and ProteinsLearning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and Proteins
Learning Keys , Lehninger Chapter # 3 Amino Acids,Peptides and Proteins
 
Ionomics
IonomicsIonomics
Ionomics
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiome
 
04 Amino acids SPECIAL @.pdf
04 Amino acids SPECIAL @.pdf04 Amino acids SPECIAL @.pdf
04 Amino acids SPECIAL @.pdf
 
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSEVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
 
Ionomics
IonomicsIonomics
Ionomics
 
Interpreting MS\\MS Results
Interpreting  MS\\MS ResultsInterpreting  MS\\MS Results
Interpreting MS\\MS Results
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical Sciences
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 

Plus de Prof. Wim Van Criekinge

2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_uploadProf. Wim Van Criekinge
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_uploadProf. Wim Van Criekinge
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_uploadProf. Wim Van Criekinge
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Prof. Wim Van Criekinge
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_uploadProf. Wim Van Criekinge
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 

Plus de Prof. Wim Van Criekinge (20)

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload
 
P6 2018 biopython2b
P6 2018 biopython2bP6 2018 biopython2b
P6 2018 biopython2b
 
P4 2018 io_functions
P4 2018 io_functionsP4 2018 io_functions
P4 2018 io_functions
 
P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
T1 2018 bioinformatics
T1 2018 bioinformaticsT1 2018 bioinformatics
T1 2018 bioinformatics
 
P1 2018 python
P1 2018 pythonP1 2018 python
P1 2018 python
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]
 
2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload
 
2018 03 20_biological_databases_part3
2018 03 20_biological_databases_part32018 03 20_biological_databases_part3
2018 03 20_biological_databases_part3
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
Van criekinge 2017_11_13_rodebiotech
Van criekinge 2017_11_13_rodebiotechVan criekinge 2017_11_13_rodebiotech
Van criekinge 2017_11_13_rodebiotech
 
P4 2017 io
P4 2017 ioP4 2017 io
P4 2017 io
 
T5 2017 database_searching_v_upload
T5 2017 database_searching_v_uploadT5 2017 database_searching_v_upload
T5 2017 database_searching_v_upload
 

Dernier

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 

Dernier (20)

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 

P7 2017 biopython3

  • 1.
  • 5. Install Extra Packages … finance ? pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers. pip3.5 install Biopython #pip3.5 install yahoo_finance from yahoo_finance import Share yahoo = Share('AAPL') print (yahoo.get_open())
  • 6.
  • 7. Numpy – SciPy – Matplib • Numpy: Fundamental open source package for scientific computing with Python. – N-dimensional array object Linear algebra, Fourier transform, random number capabilities • SciPy (prono1unced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. • Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
  • 8. Numpy – SciPy – Matplib -- > PANDAS ? • Python Data Analysis Library, similar to: – R – MATLAB – SAS • Combined with the IPython toolkit • Built on top of NumPy, SciPy, to some extent matplotlib • Panel Data System – Open source, BSD-licensed • Key Components – Series is a named Python list (dict with list as value). { ‘grades’ : [50,90,100,45] } – DataFrame is a dictionary of Series (dict of series): { { ‘names’ : [‘bob’,’ken’,’art’,’joe’]} { ‘grades’ : [50,90,100,45] } }
  • 9. Install Extra Packages … googleapiclient?
  • 10. Install Extra Packages … twitter
  • 11. Install Extra Packages …matplotlib … T = +1 A = -1 C = +1G = -1 2D – random walk 1D – random walk u(i)=1 for pyrimidines (C or T) and u(i)=-1 for purines (A or G)
  • 12. Install Extra Packages …matplotlib …. and nolds
  • 13. Extra Questions (2) • How many human proteins in Swiss Prot ? • What is the longest human protein ? The shortest ? • Calculate for all human proteins their MW and pI, display as two histograms (2D scatter ?) • How many human proteins have “cancer” in their description? • Which genes has the highest number of SNPs/somatic mutations (COSMIC) • How many human DNA-repair enzymes are represented in Swiss Prot (using description / GO)? • List proteins that only contain alpha-helices based on the Chou-Fasman algorithm • List proteins based on the number of predicted transmembrane regions (Kyte-Doollittle)
  • 14.  Amino acid sequences fold onto themselves to become a biologically active molecule. There are three types of local segments: Helices: Where protein residues seem to be following the shape of a spring. The most common are the so-called alpha helices Extended or Beta-strands: Where residues are in line and successive residues turn back to each other Random coils: When the amino acid chain is neither helical nor extended Secondary structure of protein
  • 15. Chou-Fasman Algorithm Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, b-sheet, and random coil regions calculated from proteins.Biochemistry 13, 211-221. Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245. Analyzed the frequency of the 20 amino acids in alpha helices, Beta sheets and turns. • Ala (A), Glu (E), Leu (L), and Met (M) are strong predictors of  helices • Pro (P) and Gly (G) break  helices. • When 4 of 5 amino acids have a high probability of being in an alpha helix, it predicts a alpha helix. • When 3 of 5 amino acids have a high probability of being in a b strand, it predicts a b strand. • 4 amino acids are used to predict turns.
  • 16. Calculation of Propensities Pr[i|b-sheet]/Pr[i], Pr[i|-helix]/Pr[i], Pr[i|other]/Pr[i] determine the probability that amino acid i is in each structure, normalized by the background probability that i occurs at all. Example. let's say that there are 20,000 amino acids in the database, of which 2000 are serine, and there are 5000 amino acids in helical conformation, of which 500 are serine. Then the helical propensity for serine is: (500/5000) / (2000/20000) = 1.0
  • 17. Calculation of preference parameters • Preference parameter > 1.0  specific residue has a preference for the specific secondary structure. • Preference parameter = 1.0  specific residue does not have a preference for, nor dislikes the specific secondary structure. • Preference parameter < 1.0  specific residue dislikes the specific secondary structure.
  • 18. Calculation of Propensities Pr[i|b-sheet]/Pr[i], Pr[i|-helix]/Pr[i], Pr[i|other]/Pr[i] determine the probability that amino acid i is in each structure, normalized by the background probability that i occurs at all. Example. let's say that there are 20,000 amino acids in the database, of which 2000 are serine, and there are 5000 amino acids in helical conformation, of which 500 are serine. Then the helical propensity for serine is: (500/5000) / (2000/20000) = 1.0
  • 19. Calculation of preference parameters • Preference parameter > 1.0  specific residue has a preference for the specific secondary structure. • Preference parameter = 1.0  specific residue does not have a preference for, nor dislikes the specific secondary structure. • Preference parameter < 1.0  specific residue dislikes the specific secondary structure.
  • 20. Preference parameters Residue P(a) P(b) P(t) f(i) f(i+1) f(i+2) f(i+3) Ala 1.45 0.97 0.57 0.049 0.049 0.034 0.029 Arg 0.79 0.90 1.00 0.051 0.127 0.025 0.101 Asn 0.73 0.65 1.68 0.101 0.086 0.216 0.065 Asp 0.98 0.80 1.26 0.137 0.088 0.069 0.059 Cys 0.77 1.30 1.17 0.089 0.022 0.111 0.089 Gln 1.17 1.23 0.56 0.050 0.089 0.030 0.089 Glu 1.53 0.26 0.44 0.011 0.032 0.053 0.021 Gly 0.53 0.81 1.68 0.104 0.090 0.158 0.113 His 1.24 0.71 0.69 0.083 0.050 0.033 0.033 Ile 1.00 1.60 0.58 0.068 0.034 0.017 0.051 Leu 1.34 1.22 0.53 0.038 0.019 0.032 0.051 Lys 1.07 0.74 1.01 0.060 0.080 0.067 0.073 Met 1.20 1.67 0.67 0.070 0.070 0.036 0.070 Phe 1.12 1.28 0.71 0.031 0.047 0.063 0.063 Pro 0.59 0.62 1.54 0.074 0.272 0.012 0.062 Ser 0.79 0.72 1.56 0.100 0.095 0.095 0.104 Thr 0.82 1.20 1.00 0.062 0.093 0.056 0.068 Trp 1.14 1.19 1.11 0.045 0.000 0.045 0.205 Tyr 0.61 1.29 1.25 0.136 0.025 0.110 0.102 Val 1.14 1.65 0.30 0.023 0.029 0.011 0.029
  • 21. Applying algorithm 1. Assign parameters (propensities) to residue. 2. Identify regions (nucleation sites) where 4 out of 6 residues have P(a)>100: a-helix. Extend helix in both directions until four contiguous residues have an average P(a)<100: end of a-helix. If segment is longer than 5 residues and P(a)>P(b): a-helix. 3. Repeat this procedure to locate all of the helical regions. 4. Identify regions where 3 out of 5 residues have P(b)>100: b- sheet. Extend sheet in both directions until four contiguous residues have an average P(b)<100: end of b-sheet. If P(b)>105 and P(b)>P(a): b-sheet. 5. Rest: P(a)>P(b)  a-helix. P(b)>P(a)  b-sheet. 6. To identify a bend at residue number i, calculate the following value: p(t) = f(i)f(i+1)f(i+2)f(i+3) If: (1) p(t) > 0.000075; (2) average P(t)>1.00 in the tetrapeptide; and (3) averages for tetrapeptide obey P(a)<P(t)>P(b): b-turn.
  • 22. Additional fun • Find proteins of at least 250aa that contain the fewest secondary – structure elements ? Are they candidates for being IDP ? • Find proteins that contain no prosite patterns (using scanner from previous exercise) ? • Calculate for all human proteins their MW and pI (display as 2D gel)