2. @_opendatahack
About opendatahack.org
Open Data Hack is a collaborative effort to solve day-to-day problems
faced by local communities, civic bodies and non-profit institutions.
Technologists, designers, innovators and government bodies with strong
social insight come together for a day to build technology-based
solutions using freely accessible open data.
Our current projects in India
● Real-time environmental vitals monitoring system with suggestions
● Heat map of health factors across urban localities in India
● Mapping all quality abortion clinics in India
Definition by the Open Knowledge Foundation (OKF)
A piece of data or content is open if anyone is
free to use, reuse, and redistribute it -
subject only, at most, to the requirement to
attribute and/or share-alike.
Definition by the Open Data Institute (ODI)
Open data is data that is made available by
organizations, businesses and individuals for
anyone to access, use and share.
Open Data is accessible public data that people,
companies and organizations can use to launch new
ventures, analyze patterns and trends, make data-
driven decisions, and solve complex problems.
Benefits of Open Data
● Data Driven Decision Making
● Performance Measurement
● Reduction of Government Costs
● Support for Open Government Initiatives (e.g. transparency)
● Economic Development
● Increased Citizen Engagement
● Talent Attraction / Retention
Open Data Licenses
● Open Data Commons Public Domain Dedication
and Licence (ODC PDDL) – Public domain
● Creative Commons CC0 (CC Zero) – Public domain
● Open Data Commons Attribution License (ODC-By) –
Attribution for data(bases)
● Open Data Commons Open Database License
(ODbL) – Attribution-ShareAlike for data(bases)
What is Data Science?
Data science ~ computer science +
mathematics/statistics + visualization
Data is like crude oil
● It’s valuable, but if unrefined it cannot really be used.
● Crude has to be refined into gas, plastic, chemicals, etc.
to create valuable products that drive profitable
activity.
- Likewise, data must be broken down and analyzed
before it has value.
Data cleansing
● Harvested data may come with lots of noise or
interesting anomalies.
● The goal is to produce a structured representation
for analysis, e.g.:
- Network (graph)
- Values with dimensions
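A minimal cleansing sketch in Python, using only the standard library. The sample data, the column names (sensor, locality, reading) and the rule that negative readings are noise are all hypothetical assumptions for illustration, not values from any real project. It drops noisy rows, then groups the surviving values along one dimension (locality).

```python
import csv
import io
import math

# Hypothetical harvested sample (made-up values, not real measurements):
# stray whitespace, a non-numeric reading, a missing sensor id,
# and an impossible negative value.
RAW = """sensor,locality,reading
 S1 ,Locality A,153
S2,Locality A,n/a
S3,Locality B, 88
,Locality C,40
S4,Locality C,-5
"""


def clean_readings(text):
    """Return (sensor, locality, reading) tuples, dropping noisy rows."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        sensor = (row["sensor"] or "").strip()
        locality = (row["locality"] or "").strip()
        try:
            value = float(row["reading"])  # float() tolerates surrounding spaces
        except (TypeError, ValueError):
            continue  # missing or non-numeric reading such as "n/a"
        if not sensor or not locality or not math.isfinite(value) or value < 0:
            continue  # missing key field, NaN/inf, or assumed-impossible value
        cleaned.append((sensor, locality, value))
    return cleaned


def by_locality(rows):
    """Group cleaned readings along one dimension (locality)."""
    grouped = {}
    for _sensor, locality, value in rows:
        grouped.setdefault(locality, []).append(value)
    return grouped


rows = clean_readings(RAW)
print(rows)
print(by_locality(rows))
```

The same cleaned tuples could just as well be turned into a graph (e.g. sensor and locality as nodes) when a network view suits the analysis better.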
Some tips & ethics
● Use the mobile version of a site if available
● Don't use cookies
● Respect robots.txt
● Identify yourself (e.g. in the User-Agent header)
● If possible, download bulk data first, process it later
● Prefer dumps over APIs, APIs over scraping
● Be polite and request permission to gather the data
● Worth checking: https://scraperwiki.com/
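Several of these tips can be sketched with Python's standard library: urllib.robotparser checks robots.txt, a descriptive User-Agent identifies the scraper, and a pause between requests keeps the load polite. The USER_AGENT string and the 2-second delay are hypothetical choices, not recommendations from the original; urllib's default opener does not store or send cookies.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical identity string: replace with your project name and contact.
USER_AGENT = "opendatahack-example-bot/0.1 (contact: you@example.org)"


def make_robot_parser(robots_lines):
    """Build a robots.txt parser from lines you have already downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


def polite_fetch(url, parser, delay_seconds=2.0):
    """Fetch url only if robots.txt allows it; return bytes, or None if disallowed."""
    if not parser.can_fetch(USER_AGENT, url):
        return None  # respect robots.txt
    time.sleep(delay_seconds)  # assumed delay, to avoid hammering the server
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    # The default opener has no cookie handler, so no cookies are kept.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


parser = make_robot_parser(["User-agent: *", "Disallow: /private/"])
print(parser.can_fetch(USER_AGENT, "https://example.org/data.csv"))
print(parser.can_fetch(USER_AGENT, "https://example.org/private/report"))
```

In line with "download bulk data first, process it later", the bytes returned by polite_fetch would be saved to disk once and parsed offline.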
Data analysis
● NumPy
- Offers an efficient multidimensional array object, ndarray
- Basic linear algebra operations and data types
- Building from source may require a Fortran compiler (e.g. GNU Fortran)
● SciPy
- Builds on top of NumPy
- Modules for statistics, optimization, signal processing, ...
- Add-ons (called SciKits) for machine learning, data mining, etc.
● For analysing networks
- NetworkX
- igraph
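A small NumPy sketch of the ndarray and its linear algebra, assuming NumPy is installed. The readings matrix is made-up data. SciPy is left out to keep dependencies minimal, though its functions operate on these same arrays.

```python
import numpy as np

# Hypothetical readings: rows = days, columns = three localities.
readings = np.array([
    [120.0, 80.0, 45.0],
    [150.0, 95.0, 50.0],
    [130.0, 70.0, 40.0],
])

# Vectorised statistics along an axis, no explicit Python loops.
mean_per_locality = readings.mean(axis=0)  # one value per column
max_per_day = readings.max(axis=1)         # one value per row

# Basic linear algebra: solve the system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)  # 3*2 + 3 = 9 and 2 + 2*3 = 8

print(mean_per_locality, max_per_day, x)
```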
Some convenient data formats
● JSON (import json, or simplejson)
● XML (import xml.etree.ElementTree)
● RDF (import rdflib, SPARQLWrapper)
● GraphML (import networkx)
● CSV (import csv)
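A sketch of reading the same record from JSON, XML and CSV with the standard-library modules only (json, xml.etree.ElementTree, csv). The record itself ("Site A", its city and count) and the tag/field names are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in three formats.
JSON_TEXT = '{"name": "Site A", "city": "Delhi", "count": 12}'
XML_TEXT = '<site name="Site A"><city>Delhi</city><count>12</count></site>'
CSV_TEXT = "name,city,count\nSite A,Delhi,12\n"


def from_json(text):
    data = json.loads(text)
    return data["name"], data["city"], int(data["count"])


def from_xml(text):
    root = ET.fromstring(text)  # root element is <site>
    return root.get("name"), root.findtext("city"), int(root.findtext("count"))


def from_csv(text):
    row = next(csv.DictReader(io.StringIO(text)))  # first data row
    return row["name"], row["city"], int(row["count"])


print(from_json(JSON_TEXT), from_xml(XML_TEXT), from_csv(CSV_TEXT))
```

Each parser normalises to the same tuple, so downstream analysis need not care which format the data arrived in.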
Resource Description Framework (RDF)
● A family of W3C standards for modeling complex
relations and exchanging information
● Allows data from multiple sources to be combined easily
● RDF describes data with triples
- each triple has the form subject – predicate – object
- e.g. PyConIndia2017 – is organized in – Delhi
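To illustrate the triple model without extra dependencies, this sketch deliberately avoids rdflib and stores triples as plain Python tuples in a set. The "ex:" identifiers are hypothetical shorthand; real RDF uses full IRIs. It shows why merging sources is easy (a set union) and how a pattern query with wildcards chains two facts.

```python
# Hypothetical identifiers with an "ex:" prefix; real RDF uses full IRIs.
source_a = {
    ("ex:PyConIndia2017", "ex:isOrganizedIn", "ex:Delhi"),
}
source_b = {
    ("ex:PyConIndia2017", "ex:isOrganizedIn", "ex:Delhi"),  # same fact, second source
    ("ex:Delhi", "ex:isLocatedIn", "ex:India"),
}

# Merging sources is a set union; identical triples collapse into one.
graph = source_a | source_b


def match(graph, s=None, p=None, o=None):
    """Return triples matching a subject/predicate/object pattern; None is a wildcard."""
    return sorted(
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    )


# Two-step query: where is the event organized, and where is that place?
city = match(graph, s="ex:PyConIndia2017", p="ex:isOrganizedIn")[0][2]
country = match(graph, s=city, p="ex:isLocatedIn")[0][2]
print(city, country)
```

A real project would likely use rdflib or a SPARQL endpoint for the same operations.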
Why R for Data Science?
● Algorithms
● Visualizations
● Data manipulation
● Integrations
● Easily scalable
Simple R code for a bar chart
# Create the data for the chart.
H <- c(7,12,28,3,41)
# Give the chart file a name.
png(file = "barchart.png")
# Plot the bar chart.
barplot(H)
# Save the file.
dev.off()