3. Learn to Code for Data Analysis
• Started as a 4-week, 20-30 hour FutureLearn MOOC
–Basic Python 3 + function definitions – loops
–R-like pandas library for data analysis
–http://tiny.cc/lcda-ol
–Jupyter notebooks with Anaconda or cocalc.com
• Problems
–learners: time; installation; navigation; feedback
–us: software, sites and data change; assessing
4. Learn to Code for Data Analysis
Follows First Principles of Instruction http://tiny.cc/fpoi
• Problem-driven: weekly project; clean, merge, etc.
• ‘Authentic’: real open data from WHO, WU, WB, UN
• Demonstrate:
–we do analysis and introduce concepts as needed
–we show the written-up analysis (reproducible research)
• Apply: students work on exercise notebook in parallel
• Integrate: do a different analysis and share (show & tell)
7. Context
• Not a programming module
– i.e. we don’t teach Python programming
– an understanding of Python is necessary to engage with the scientific Python libraries
• Expect appropriate competence for level 3 study
• Part of data science DA strand
8. Content
• Data lifecycle: Acquire, prepare, analyse, present
– Python techniques for acquiring and cleaning data
– DBs for data storage
– Some machine learning and statistical analyses
– Graph plotting with Matplotlib
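The four lifecycle stages can be sketched as one small function each. This is a minimal illustration, not the module's own code: it uses only the Python standard library so that it runs without any installation, whereas the module works with pandas and Matplotlib. The CSV text, country names and figures are made up for the example.

```python
import csv
import io
from statistics import mean

# Hypothetical CSV text standing in for a downloaded open-data file.
RAW_CSV = """country,year,cases
Alpha,2013,120
Beta,2013,
Gamma,2013,80
"""

def acquire(text):
    """Acquire: read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """Prepare: drop rows with no 'cases' value and convert the rest to int."""
    return [
        {**row, "cases": int(row["cases"])}
        for row in rows
        if row["cases"].strip()
    ]

def analyse(rows):
    """Analyse: compute the mean number of cases over the cleaned rows."""
    return mean(row["cases"] for row in rows)

def present(rows, average):
    """Present: format a plain-text report (a chart would go here instead)."""
    lines = [f"{row['country']:<8}{row['cases']:>6}" for row in rows]
    lines.append(f"{'mean':<8}{average:>6.1f}")
    return "\n".join(lines)

rows = prepare(acquire(RAW_CSV))
average = analyse(rows)
print(present(rows, average))
```

Keeping each stage as a separate function mirrors the way a notebook splits an analysis into cells, so any one stage can be rerun or replaced on its own.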
9. Tools
• Python 3 language
• PostgreSQL, MongoDB databases
• pandas, Matplotlib (some scikit-learn) libraries
• Accessed through Jupyter notebooks
– significant teaching materials using notebooks
– including TMA01 submission
12. REQUIRED / NOT REQUIRED / DESIRABLE
• Python distribution includes non-standard Python package, or student can install it themselves
• Python process can call out to third party APIs using http
• Jupyter notebook customised with OU branding
• Notebook server seeded with course notebooks
• Jupyter notebook server includes “docx” export extension and functionality
• Saved kernel state
• Persisted student files
Have to understand given library and implement function
Have to include tests in screenshot
The TM351 Jupyter notebook server has several customisations, including:
• OU branding of notebooks
• Custom exports: Microsoft Word .docx and ODSzip (a zip file containing the original notebook and its HTML rendering)
The notebook source file (suffix .ipynb) is a JSON text file.
It can be rendered to an HTML document using the `nbviewer` application.
The notebook file can also be used interactively, as a GUI to a backend computational process that executes the elements marked as “code” in the notebook and returns the results for display in the notebook.
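Because the .ipynb file is plain JSON, its structure can be inspected with nothing more than Python's json module. The sketch below is illustrative only: it builds a tiny hand-written notebook in the nbformat 4 layout (the cell contents are invented) and pulls out the elements identified as code, which are the parts a notebook server would send to the backend process.

```python
import json

# A hand-written, minimal notebook document (nbformat 4 layout).
# The cell contents are made up for this example.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1\n", "print(x)"]},
    ],
})

def code_cells(ipynb_text):
    """Return the source of each code cell as a single string."""
    nb = json.loads(ipynb_text)
    sources = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # 'source' may be stored as one string or as a list of lines.
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources

for source in code_cells(notebook_text):
    print(source)
```

To apply the same function to a real notebook, read the file's text first, for example with `open(path, encoding="utf-8").read()`, where `path` is whatever .ipynb file is to hand.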
--
{
ipynb[label=".ipynb\n(JSON text file)"];
nbviewer[label="nbviewer",color='yellow'];
ipynb -> nbviewer;
group {
shape=line;
html1[label="HTML",color='lightgreen'];
nbviewer -> html1;
}
nbserver[label="nbserver",color='yellow'];
ipynb -> nbserver;
group {
shape=line;
py[label="Python process", color='lightblue'];
dots[shape = "dots"];
html2[label="HTML",color='lightgreen'];
nbserver -> dots, py, html2;
nbserver -> dots[style='none'];
}
}