2. Motivation: No code rule
“Data beers” rules do not allow showing code during talks.
But data is the same as code!
(cf. Lisp homoiconicity, Unix “rule of representation”)
3. Where is there a lot of code? GitHub
Git is the VCS developed by the Linux kernel team
GitHub is a web-based hosting service for Git repositories
4. Inspiration: Blatt maps
U.S. Census Bureau data on second languages in American households [1]
[1] http://gizmodo.com/the-most-common-languages-spoken-in-the-u-s-state-by-1575719698
10. Processing GitHub information
GitHub offers a REST API, but it has rate limits
GitHub Archive publishes all public events (commits included) in hourly archives
Google BigQuery hosts the GitHub timeline as a public dataset
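A minimal sketch of what "hourly archives" means in practice: one compressed JSON file per hour, so a time range maps to a list of file URLs. The host name `data.gharchive.org`, the file naming pattern (hour not zero-padded), and the helper name `hourly_archive_urls` are assumptions for illustration, not something stated in the talk.

```python
from datetime import datetime, timedelta

# Assumption: archive host and naming scheme YYYY-MM-DD-H.json.gz.
ARCHIVE_HOST = "https://data.gharchive.org"


def hourly_archive_urls(start, end):
    """Return one archive URL per hour in the half-open range [start, end)."""
    urls = []
    # Truncate to the start of the hour, since archives are per hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        # The hour component is written without zero padding.
        urls.append(f"{ARCHIVE_HOST}/{current:%Y-%m-%d}-{current.hour}.json.gz")
        current += timedelta(hours=1)
    return urls


if __name__ == "__main__":
    for url in hourly_archive_urls(datetime(2015, 1, 1, 0), datetime(2015, 1, 1, 3)):
        print(url)
```

Downloading a whole year this way means thousands of files, which is one reason to prefer a hosted copy such as the BigQuery public dataset.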
11. Which countries are there in Europe?
There may be new countries
There may be fewer countries
A solution: DBpedia and SPARQL
DBpedia has a SPARQL endpoint to receive queries. There are wrapper libraries
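A hedged sketch of asking DBpedia for European countries, using only the standard library instead of a wrapper library. The query itself is an assumption: it supposes that countries are typed `dbo:Country` and tagged with the category `dbc:Countries_in_Europe`, which may not match the query used for the talk. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: this class/category pair selects the countries of interest.
EUROPE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?country WHERE {
  ?country a dbo:Country ;
           dct:subject dbc:Countries_in_Europe .
}
"""


def build_sparql_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return endpoint + "?" + params


def parse_bindings(payload, var="country"):
    """Extract the values bound to `var` from a SPARQL JSON results document."""
    return [row[var]["value"] for row in payload["results"]["bindings"]]


def european_countries():
    """Run the query against the live endpoint (requires network access)."""
    with urllib.request.urlopen(build_sparql_url(EUROPE_QUERY)) as resp:
        return parse_bindings(json.load(resp))
```

Because the list comes from a live knowledge base rather than a hard-coded table, it follows changes in the set of countries without editing the script.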
12. No Twitter
Quite tired of people categorizing tweets. There are many APIs out there!
Do not worry, we are still going to get rich! → using World Bank macroeconomic data [2]
[2] Sherouse, Oliver (2014). Wbdata. Arlington, VA. Available from http://github.com/OliverSherouse/wbdata.
14. corr(GDP, language)
Figure: Pearson correlation of GDP with language preference [3]
[3] Negative values denote a language used in richer countries; a low value in the language precedence means a higher place in the language preference list for a country.
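As a rough illustration of the statistic behind these figures, the sketch below computes a Pearson correlation between a macroeconomic indicator and a language's rank per country. It is not the code used for the talk; the function name is hypothetical and the numbers in the usage example are made up purely to show the sign convention.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values: one entry per country.
    gdp_per_capita = [20000, 30000, 40000, 50000]
    language_rank = [8, 5, 3, 1]  # 1 = most used language in that country
    # Rank falls as GDP rises, so the coefficient is negative.
    print(pearson(gdp_per_capita, language_rank))
```

A negative coefficient here means the rank number shrinks as the indicator grows, that is, the language sits higher in the preference list of richer countries.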
15. corr(unemployment, language)
Figure: Pearson correlation of unemployment with language preference [4]
[4] Positive values show preferred languages in countries with low unemployment
16. corr(debt, language)
Figure: Pearson correlation of total government debt as % of GDP with language preference [5]
[5] Positive values show preferred languages in countries with low debt
21. Take away messages
Data talk about code!
SPARQL and other APIs: all data is on your laptop
BigQuery and other tools: your laptop controls clusters
All languages are beautiful
but do not program in OCaml if you can avoid it