This document discusses reproducibility in development environments. It defines reproducibility as the ability to duplicate an entire experiment or study. A development environment is the collection of tools, methods, and infrastructure used to build software. Key components of a reproducible development environment include coding styles, software dependencies, configuration files, data, and security keys. The document advocates for exporting dependencies using tools like pip and conda, sharing notebooks on platforms like Anaconda Cloud, checking configuration files into version control, and automating environment setup to improve reproducibility.
11/20/2015
| |PyData2015 ContinuumAnalytics malev
Reproducibility of your development environment
by Marcos Vanetta / @malev / Continuum Analytics
PyData NYC 2015
Reproducibility
Reproducibility is the ability of an entire experiment or study to be
duplicated, either by the same researcher or by someone else working
independently.
Development Environment
A computer system in which a computer program or software
component is deployed and executed.
A development environment is a collection of procedures and
tools for developing, testing and debugging an application or
program.
A development environment contains everything required by
a team to build and deploy software-intensive systems.
Method
Roles, work products, tasks, and processes
Standards, guidelines, checklists, templates, and examples
Deployment topology
Tools
Development tools and their integrations
Development tool configurations and installation scripts
Deployment topology, which considers the software and
hardware required
Infrastructure
A development environment considers infrastructure in terms
of both hardware and software.
Locations, nodes, and connectivity
Software (such as operating systems, database management
systems, board-level controls, and test harnesses).
How do we work with data?
Everything is production
Everything is NOT production
Multi-language
Local | Cloud | both
Data ~GB | Data ~TB | ...
What do we want to reproduce?
Coding and documentation styles
Software dependencies (libraries, databases, etc.)
Configuration files and environment variables
Data (dummy data and real data)
Keys (AWS, SSH, etc.)
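As a first step toward the dependency item in this list, it can help to record what the current interpreter actually has installed. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The function name `environment_snapshot` and the output file `environment-snapshot.json` are hypothetical. It only records state for later comparison and is no substitute for `pip freeze` or `conda env export`.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot() -> dict:
    """Return a JSON-serializable description of the running interpreter."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Hypothetical output file: commit it and diff it between machines.
    with open("environment-snapshot.json", "w", encoding="utf-8") as fh:
        json.dump(environment_snapshot(), fh, indent=2, sort_keys=True)
```

Diffing two such files from different machines gives a rough first answer to "why does it work for you and not for me?"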
Dependencies
Database engines
Installation instructions
Schema
Configuration
Dummy data
Docker or Vagrant
Makefiles or bash scripts
SaaS
Migrations
Automate
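The schema, dummy-data and automation items can be combined into one idempotent setup step. Here is a sketch that assumes SQLite (through the standard `sqlite3` module) stands in for the real database engine. The `users` table, its rows, the file name `dev.sqlite3`, and `setup_database` are all hypothetical.

```python
import sqlite3

# Hypothetical schema and dummy rows, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
"""

DUMMY_USERS = [
    ("Ada", "ada@example.com"),
    ("Grace", "grace@example.com"),
]


def setup_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema and load dummy data; safe to run more than once."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    # INSERT OR IGNORE keeps re-runs from failing on the UNIQUE email column.
    conn.executemany(
        "INSERT OR IGNORE INTO users (name, email) VALUES (?, ?)", DUMMY_USERS
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = setup_database("dev.sqlite3")
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
    conn.close()
```

A real project would likely manage schema changes with a migration tool rather than `CREATE TABLE IF NOT EXISTS`; the point here is only that running the script twice leaves the database in the same state, so it can be called from a Makefile, a bash script, or a Docker/Vagrant provisioning step.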
Dependencies: libraries
pip                  | conda
Lots of packages     | Mostly data packages
~ Multi-platform     | Multi-platform
Not so fast          | Fast
Included in Anaconda | Included in Anaconda
Consider tools like pipreqs or defrost.
Exporting your dependencies with pip
Reusing an environment
Keep it simple
$ pip freeze > requirements.txt
$ cat requirements.txt
requests==2.8.1
virtualenv==13.0.1
wheel==0.26.0
$ virtualenv .my-env
$ source .my-env/bin/activate
(my-env)$ pip install -r requirements.txt
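Once `requirements.txt` is committed, a quick check can report where the active environment drifts from it. A minimal sketch, assuming the file holds only `name==version` lines as `pip freeze` writes them; `parse_pinned` and `mismatches` are hypothetical helpers, not part of pip.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def parse_pinned(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines as written by `pip freeze`."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and environment markers (anything after '#' or ';').
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue  # options such as -r/-e and unpinned names are out of scope
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for every pin the current env does not match."""
    out = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            out[name] = (wanted, installed)
    return out


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as fh:
        for name, (wanted, got) in mismatches(parse_pinned(fh.read())).items():
            print(f"{name}: pinned {wanted}, installed {got or 'nothing'}")
```

Anything the parser skips (editable installs, `-r` includes, unpinned names) is simply ignored; `pip install -r requirements.txt` remains the way to actually fix a mismatch.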
Exporting your dependencies with conda
Reusing with conda
Keep it simple
$ conda env export -n please-work -f environment.yml
$ cat environment.yml
name: my-project
dependencies:
- bokeh=0.8.0=np19py27_0
- colorama=0.3.3=py27_0
- pip:
  - flask
$ conda env create
...
$ source activate my-project
discarding /Users/mvanetta/miniconda/bin from PATH
prepending /Users/mvanetta/miniconda/envs/my-project/bin to PATH
(my-project)$
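Each entry under `dependencies` in the exported file above has the form `name=version=build`. The sketch below, with hypothetical names `CondaSpec`, `parse_conda_spec` and `without_build`, splits such entries. It handles only that exported form, not conda's full match-spec syntax (`>=`, wildcards, channels).

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_conda_spec(spec: str) -> CondaSpec:
    """Split an exported entry such as 'bokeh=0.8.0=np19py27_0' into its parts."""
    name, _, rest = spec.strip().partition("=")
    version, _, build = rest.partition("=")
    return CondaSpec(name, version or None, build or None)


def without_build(spec: str) -> str:
    """Keep name and version but drop the build string."""
    s = parse_conda_spec(spec)
    return s.name if s.version is None else f"{s.name}={s.version}"


if __name__ == "__main__":
    for entry in ["bokeh=0.8.0=np19py27_0", "colorama=0.3.3=py27_0"]:
        print(parse_conda_spec(entry), "->", without_build(entry))
```

Build strings such as `np19py27_0` encode the NumPy/Python combination a package was built for and may differ across platforms, so dropping them is one common way to make a shared `environment.yml` less brittle, at the cost of a looser pin.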
Working with notebooks
Reusing your notebook
$ conda create -n project
$ conda install -y bokeh pandas jupyter
$ ipython notebook iris.ipynb
$ conda env attach -n iris iris.ipynb
$ anaconda notebook upload iris.ipynb
$ anaconda notebook download malev/iris
$ conda env create iris.ipynb
$ source activate iris
$ ipython notebook iris.ipynb
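A notebook file is JSON with a top-level `metadata` object, which is what makes it possible to carry an environment description inside the `.ipynb` itself. The sketch below shows the idea with the standard `json` module. The key `environment_spec` and both function names are hypothetical and are not necessarily what `conda env attach` writes.

```python
import json
from pathlib import Path

# Hypothetical metadata key; not necessarily the one `conda env attach` uses.
ENV_KEY = "environment_spec"


def attach_environment(notebook: Path, dependencies: list[str]) -> None:
    """Store a dependency list in the notebook's top-level metadata, keeping all cells."""
    nb = json.loads(notebook.read_text(encoding="utf-8"))
    nb.setdefault("metadata", {})[ENV_KEY] = {"dependencies": dependencies}
    notebook.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")


def read_environment(notebook: Path) -> list[str]:
    """Return the stored dependency list, or an empty list if none was attached."""
    nb = json.loads(notebook.read_text(encoding="utf-8"))
    return nb.get("metadata", {}).get(ENV_KEY, {}).get("dependencies", [])
```

Because the description travels inside the file, whoever downloads the notebook gets the dependency list along with the code.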
Configuration files and environment variables
Essential part of configuration management
YAML, INI, JSON files
Generally stored in programmers' brains
Attach them to the repo, document them, use tools like autoenv,
and automate.
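A common way to keep configuration in the repo while leaving secrets (such as the keys listed earlier) out of it is to layer committed defaults, an optional file, and environment variables. A minimal sketch; the `MYAPP_` prefix, `DEFAULTS`, and `load_config` are hypothetical. A tool like autoenv can then export the variables when you `cd` into the project.

```python
import json
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

# Hypothetical defaults, safe to commit: no secrets here.
DEFAULTS = {"database_url": "sqlite:///dev.sqlite3", "debug": "false"}
PREFIX = "MYAPP_"


def load_config(path: Optional[Path] = None, environ: Mapping[str, str] = os.environ) -> dict:
    """Merge defaults, an optional JSON file, then MYAPP_* variables (later wins)."""
    config = dict(DEFAULTS)
    if path is not None and path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    for key, value in environ.items():
        if key.startswith(PREFIX):
            # MYAPP_DEBUG=true becomes config["debug"] = "true"
            config[key[len(PREFIX):].lower()] = value
    return config
```

Passing `environ` explicitly keeps the function easy to test; in normal use it reads the real process environment.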