How can startups find data and use it to help their business?
Presentation for the Digital Incubation Center, Qatar Ministry of Transportation and Communications
Heather Leson
March 9, 2016
http://www.ictqatar.qa/en/dic
http://qcri.org.qa/
6. Cultural: Data about cultural works and artefacts — for example titles and authors —
and generally collected and held by galleries, libraries, archives and museums.
Science: Data that is produced as part of scientific research from astronomy to
zoology.
Finance: Data such as government accounts (expenditure and revenue) and
information on financial markets (stocks, shares, bonds etc).
Statistics: Data produced by statistical offices such as the census and key
socioeconomic indicators.
Weather: The many types of information used to understand and predict the weather
and climate.
Environment: Information related to the natural environment such as the presence and
level of pollutants, and the quality of rivers and seas.
Transport: Data such as timetables, routes, on-time statistics.
Types of Open Data
(Source: okfn.org)
14. Storyteller
Role: Generate ideas and interesting questions, help define the questions, and assist with the information
products/story outputs.
Scout
Role: Scouts hunt down data from across the web. They can be non-technical or technical, depending on
how difficult it is to obtain data (whether it is easily downloadable or needs to be scraped etc).
Analyst
Role: Analysts are the ones who crunch the data found by the scouts and test the hypotheses generated
by the storytellers.
“Engineers” (Optional)
Role: Create information outputs (varying degrees of technical, from coding to using ‘off the shelf’ tools).
Designers
Role: Beautify the outputs and make sure the story really comes through the data.
15. 3. How to: Data Clinics to connect entrepreneurs, business and government
22. What are the questions you seek to answer?
What is the license? Can you reuse/publish the data?
Is the source credible?
Is the data credible?
Where did they get their data?
How much time do I have to search?
How am I organizing my research?
Keen to learn more about verification? http://verificationhandbook.com/ (it
is in Arabic too!)
Consider
23. Who is publishing about Qatar...on biodiversity?
United States 7,440 occurrences, 97.77% geo-referenced.
United Kingdom 832 occurrences, 8.29% geo-referenced.
Sweden 620 occurrences, 0.32% geo-referenced.
Netherlands 298 occurrences, 5.03% geo-referenced.
Source: Global Biodiversity Information Facility
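The percentages above are easier to compare once they are turned back into record counts. The following is a minimal sketch that uses only the figures quoted on this slide; the function names are illustrative and the rounding gives approximate counts, not official GBIF numbers.

```python
# Occurrence counts and geo-referenced shares as quoted on the slide (GBIF, no date given).
publishers = {
    "United States": (7440, 97.77),
    "United Kingdom": (832, 8.29),
    "Sweden": (620, 0.32),
    "Netherlands": (298, 5.03),
}


def georeferenced_count(occurrences, percent):
    """Approximate number of records that carry coordinates."""
    return round(occurrences * percent / 100)


def summarize(data):
    """Return (country, total records, approximate geo-referenced records) tuples."""
    rows = []
    for country, (occurrences, percent) in data.items():
        rows.append((country, occurrences, georeferenced_count(occurrences, percent)))
    return rows


for country, total, mapped in summarize(publishers):
    print(f"{country}: about {mapped} of {total} records can be placed on a map")
```

Seen this way, almost all of the mappable records come from a single publishing country, which is worth knowing before you build a product on top of the data.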
24. What about data on tourism?
Source: Knoema Data Atlas, which
aggregates the World Development
Indicators, 2015
$6,616,000,000 USD
International Tourism
expenditures for travel items
(Time for more boutique
travel startups)
25. World Bank UN Data
UNESCO Institute for Statistics
HDX WEF
Forbes: Top 35 big data sources
Visually: 30 places to find Open Data
27. Ministry of Development Planning and Statistics
In economic statistics:
Quarterly and annual Gross Domestic Product -GDP (constant and current) by economic
activity
Monthly, quarterly and annual Consumer Price Index, Producer Price Index (PPI),
Foreign Trade Statistics (import and export), Building permits
In social statistics:
Labor force statistics (through a labor force sample survey)
Marriage, health, birth, fertility, education, disability, mortality statistics (in coordination with
other ministries)
In environmental statistics:
Monthly rainfall, Monthly and annual average concentrations of air pollutants, Capacities
of urban wastewater treatment plants
In population statistics: Population growth rate, Population sex ratio
28. QALM portal (Qatar Information Exchange)
QALM is an ambitious national project, developed by a number of government partners
including: The General Secretariat for Development Planning, The Statistics Authority, The
Supreme Council of Health, The Supreme Education Council, Supreme Council of Family Affairs,
ictQATAR, Ministerial Cabinet and the Permanent Population Committee.
http://www.qalm.gov.qa/
Data is available in multiple formats!
To get data from the Ministry of Development Planning and Statistics, check their website. If you are looking for other
data, they are an email away: ICU@mdps.gov.qa
35. OpenRefine http://openrefine.org/
Sublime Text
https://www.sublimetext.com/
There are many tools for software
developers and data scientists too.
Note: you still need the Human API to analyze and
make decisions for your business. Of course, if you
can afford it, then you can get your business
intelligence from KPMG, Gartner, Bloomberg,
McKinsey or PwC. Until then….
Some tools to Clean Datasets
Learn more with Lillian and her
online courses.
36. Tools for Charts, Graphs and Infographics
http://tableau.com/
http://infogr.am/
http://piktochart.com/
https://www.canva.com/
More LMGTFY: http://www.creativebloq.com/design-tools/data-visualization-712402
(source: TuktukDesign, Noun Project ccby)
37. Map tools
Mapbox: http://mapbox.com/
CartoDB: http://academy.cartodb.com/
Leaflet: http://leafletjs.com/
Google: https://www.google.com/mapmaker
ArcGIS: https://www.arcgis.com/features/
Time mapper: http://timemapper.okfnlabs.org/
Also: if you are collecting your own location data, try Field Papers or
crowdsource map photos with Mapillary. (They just got 8M funding!)
(source: Mister Pixel, Noun Project, ccby)
38. QCRI Combining Data Sources: Real-Time Traffic
Monitoring
● Collection and classification of traffic
related tweets (script, research tool)
● Continuous Real-time querying of
Google Traffic API
● Qatar Traffic Profiling & Modeling
○ Geo: City, zone, district
○ Time: Hourly, daily, weekly,
and monthly
● Usage:
○ Detection of abnormal
behaviors
○ Predictions
○ Monthly Public reports
■ Commute status
■ Deadpoints
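To make "profiling" and "detection of abnormal behaviors" concrete, here is a minimal sketch of one common approach: build an average per hour of day from past readings, then flag new readings that sit far from that average. This is an illustration only and not QCRI's actual method; the travel times, the function names and the threshold of 3 standard deviations are all assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_hourly_profile(observations):
    """Group historical travel times by hour of day and return (mean, stdev) per hour."""
    by_hour = defaultdict(list)
    for hour, minutes in observations:
        by_hour[hour].append(minutes)
    # stdev needs at least two readings, so hours with fewer are left out
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_abnormal(profile, hour, minutes, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the hourly mean."""
    if hour not in profile:
        return False  # no baseline for this hour, so nothing to compare against
    avg, spread = profile[hour]
    if spread == 0:
        return minutes != avg
    return abs(minutes - avg) / spread > threshold


# Hypothetical travel times for one route: (hour of day, minutes). Not real data.
history = [(8, 22), (8, 25), (8, 24), (8, 23), (14, 12), (14, 11), (14, 13), (14, 12)]
profile = build_hourly_profile(history)

print(is_abnormal(profile, 8, 24))  # a typical morning reading
print(is_abnormal(profile, 8, 45))  # a much slower morning reading
```

The same pattern extends to the daily, weekly and monthly profiles listed above by grouping on a different time key.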
39. The best way to learn
is to find data and
make data information
products.
Try to recreate the
diagrams and track
back the data.
Track how other
startups use data.
Copy. Remix.
41. Impact of Data-Driven Business
You know your business. Data can give you a
leading edge. Be a Data-Driven Startup.
Some reading:
ODI Report: The Economic Impact of Open Data
ODI - Open Data Means Business
How to build a business from Open Data (1)
How to build a business from Open Data (2)
OpenMENA - 19 studies on Open Data
42. ABC: Always be Charging
How can you have a Data-Driven Career?
What is your Data Plan for your startup?
Can you use Data-Driven Journalism techniques to improve your
business?
What kind of data do you need to grow your business?
What type of training do you want/need?
Data-Driven Startups to be held at the DIC, Qatar Ministry of Transportation and Communications
http://www.ticketfun.me/index/event?eid=999
http://textontechs.com/2016/03/primer-on-data-driven-innovation-for-startups/
Your startup is all about data, from your market segmentation analysis to your business intelligence to your customer management system and beyond. Understanding the tools, formats and skills for using data makes you a business leader and a “Data Driven Startup”.
To show how data-driven startups can be successful, I’ll share some data basics followed by some local and regional examples of data startups.
There are many types of data. I like to think of it in layers (mainly due to my love of maps). This diagram is to give you a picture of all the types of data and how they might interact to tell stories, do good and sell your startup outputs. Every startup will use a different combination of these.
Open Data is available in some countries and regions. Qatar currently has an open data policy, and it is listed in the National Strategy. http://opendatahandbook.org/ See some of the impact via this report - http://odimpact.org/ More from https://okfn.org/opendata/
Kasra.co is an Arabic online news site that targets Arabic language speakers worldwide, especially in MENA. Kasra leverages social media to help drive traffic, mainly from Facebook. Kasra’s Facebook page has 1M followers. The News Analytics team at QCRI is working to help with social data analytics. (The team is led by Jim Jansen, Principal Scientist) http://qcri.org.qa/our-people/bio?pid=235&par=acc&name=JimJansen
1. We are using online traffic data to assist in topic selection for their online articles
2. Goal is to understand what types of articles go viral
3. Research aim is to predict the popularity of articles
From Kasra.co
http://goo.gl/H3mLyc
More about Kasra - http://textontechs.com/2015/08/in-their-own-words-via-kasra/
Metis is a local startup that focuses on connecting students to planning. The objective of this project is to develop student-centric academic planning software for universities and students that use an elective-based system, which is very flexible but makes it harder for students to complete on time.
http://www.menafn.com/1094627802/Qatar--New-tool-helps-university-students-plan-their-courses
http://www.gulf-times.com/story/483430/New-tool-helps-university-students-plan-their-cour
https://www.facebook.com/metiscmu/ From Sabih “Regarding data, we have relied primarily on statistics. We did our pilot for 2 weeks, collected data on student interaction and their behavior towards short-term and long-term degree planning. Even though the data itself was not statistically significant, we got good insights on what further data to collect in production mode and how this data can be input back to our recommendation system.”
Waleed Abd El Rahman is creating a data-driven business that makes healthy nutrition available for everyone by spreading entrepreneurship. He is also connecting with local communities to help grow their businesses. This brings up an important point: let’s move beyond hackathons to ongoing sustainable growth for entrepreneurs. With the local community behind his business, he is growing his supporters and his ability to use talent to inspire. https://eg.linkedin.com/in/waleed-abd-el-rahman-1b9a6312 http://getmumm.com/
Exantium is a leading UAE-based advisory firm focusing on public sector transformation in the GCC and the Arab world, driven by cutting-edge innovations, strategic digital transformation initiatives and world-class informational policies. http://exantium.com/ Exantium recently ran a Smart Government course (http://exantium.com/?p=609) and is focused on Smart Cities. They are also an Open Data Institute Node. ODI works to connect business, government and entrepreneurs to the power of data. http://dubai.theodi.org/ They have an upcoming course - http://www.mbrsg.ae/HOME/EXECUTIVE-EDUCATION/Open-Enrolments-Programs/Open-Data.aspx?lang=en-US
For the Data-Driven Innovation workshop, I wrote a blog post about what I think needs to happen to connect data-driven innovation for local entrepreneurs. http://ddi-mena.org/ http://textontechs.com/2016/02/hybrid-skills-needed-to-foster-change/
If you are unsure of the data available, you can use the Data Expedition model to seek and find all the data you might use to answer your questions. Example from the amazing Kathmandu Living Labs: https://twitter.com/KTMLivingLabs/status/706338515684995072
How to do this http://schoolofdata.org/data-expeditions/
How to do some data projects together http://schoolofdata.org/data-expeditions/
Note all the free courses http://schoolofdata.org/learn/
A data clinic is like a hackathon, but you involve all the stakeholders to consider a project. Let’s say you have a dataset and you are trying to prove that this type of data will help business. A data clinic is a technique for showing that you intend to use the data appropriately, while also giving the officials some insight into how the data might help business. An example from my friend Olu in Nigeria: http://schoolofdata.org/tag/data-clinic/ It is always about achieving buy-in. More details: “A data clinic is a workshop where participants bring troublesome data and a data scientist/data journalist together to think about how to use the data and view it.”
Data has the power to connect us to our audiences. Ferras Mohssen of BQ Magazine advised that staff called all the embassies and collected population data on nationalities in Qatar for 2013-2014. They took professional photos and created this map information diagram. The group did this for other GCC countries (Kuwait, UAE, Bahrain). I use this poster daily to remind myself that local social innovation has such a diverse audience. http://www.bq-magazine.com/economy/2013/12/population-qatar
Sometimes you need to collect your own data with trusted partners. QCRI worked with health professionals and two schools to get data insights into health monitoring. The proposed intervention targets Qatari nationals who are overweight or obese. It involves three phases: (1) weight loss camps, (2) after-school clubs as a supplement, and (3) maintenance through web and social/family support. The data could provide a basis for efforts to stem the rise of obesity in Qatar through lifestyle changes.
Things we’d like to infer from these images:
- what kids *don’t* eat (e.g. leaving vegetables) and if this is personal (= different preferences)
- how they eat (e.g. many kids leave the cutlery clean and unused, others make a huge mess)
- track their calorie intake
Using Crowdflower to label the images, Instagram, mobile data collection
Partners: Qatar University, Imperial College and Leeds
OpenStreetMap is a global map of the world, free and open source. It counts on local communities to keep improving the map with data. Imagine if we had a map of Doha to help businesses. This data is pure DIY raw business intelligence. All you have to do is look at groups like Mapillary, Mapbox or CartoDB to see how people are using maps and location data. Here is how - http://learnosm.org/en/
The Data Pipeline really varies from project to project. There are tools, skills and activities common to some projects. I like to add ethical questions and more. See my article - http://textontechs.com/2014/09/infusing-ethics-into-data-projects/ and the Responsible Data Forum’s work on a project lifecycle - https://wiki.responsibledata.io/Data_in_the_project_lifecycle
You are now on a data expedition. While you are doing this research, you should get ready to answer some of these questions. If you are really keen to learn more about verifying data, consider reading the Verification Handbook. http://verificationhandbook.com/
Also be responsible with your data - http://responsibledata.io/
I found out on my data expedition that the Global Biodiversity Information Facility has free and open datasets (about 54 datasets about Qatar from a number of sources). While maybe not useful for every startup, it makes you wonder how these could be used for studying the SDGs, or for a tourism startup. More on that topic soon. Source: http://www.gbif.org/country/QA/about/countries. No date provided on the data.
It is always a good idea to ask questions about the sources and check the dates. Where are they getting this data? Is it predictive? Can it be reused? Could we have it in another format? Source: http://knoema.com/atlas/Qatar/Expenditures-for-travel-items but the data is really from the World Development Indicators, which is published by the World Bank. http://data.worldbank.org/data-catalog/world-development-indicators
Here is a quick list of some data sources available http://blog.visual.ly/data-sources/ https://data.hdx.rwlabs.org/ http://data.uis.unesco.org/ http://data.worldbank.org/ http://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#a10f86667961 http://data.un.org/ http://reports.weforum.org/global-competitiveness-report-2015-2016/economies/#economy=QAT
Location data is key for many businesses. There are a few startups here who are using map data. I am just providing some free sources. How to download data: http://wiki.openstreetmap.org/wiki/Downloading_data.
Data Journalism is simply using data to tell a compelling story. Well, startups do that every day with their investors, supporters and customers. You need to differentiate yourself. This is just another item to add to your toolkit, right by ‘how to stay financially viable’.
The Qatar Census is full of usable business intelligence. The data is available on QALM, but let’s say you did not find it. How would you get access to the data? You just need to use it. PDF -
Tools - http://tabula.technology/
I loaded the 111 pages of census data into Tabula. The next step is to put the CSV into OpenRefine or another tool. (QALM also allows you to download into Excel if you wish.)
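If you would rather script the clean-up than use OpenRefine, the sketch below shows the kind of steps a CSV extracted from a PDF usually needs: trimming stray spaces, turning numbers written with thousands separators into integers, and dropping blank rows and header rows that repeat on every page. It uses only the Python standard library. The column names and values are made up for illustration and are not real census figures, and a real export will likely need extra rules.

```python
import csv
import io


def clean_cell(value):
    """Trim whitespace and turn strings like '1,234' into integers where possible."""
    text = value.strip()
    digits = text.replace(",", "")
    if digits.isdigit():
        return int(digits)
    return text


def clean_rows(csv_text):
    """Parse CSV text, drop empty rows and repeated header rows, clean each cell."""
    reader = csv.reader(io.StringIO(csv_text))
    header = None
    cleaned = []
    for row in reader:
        cells = [clean_cell(c) for c in row]
        if not any(str(c) for c in cells):
            continue  # skip blank rows left over from page breaks
        if header is None:
            header = cells  # the first non-empty row is treated as the header
            continue
        if cells == header:
            continue  # tables spanning several PDF pages repeat their header
        cleaned.append(dict(zip(header, cells)))
    return cleaned


# Hypothetical sample standing in for a Tabula export; not real census figures.
sample = """Zone,Population 2010,Population 2015
 Zone A ,"1,200","1,500"
,,
Zone,Population 2010,Population 2015
Zone B,800,"2,050"
"""

for record in clean_rows(sample):
    print(record)
```

Once the rows are clean dictionaries, they can be written back out with `csv.DictWriter` and loaded into a charting tool.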
Doha News either transcribed the data or used a tool to clean up the data and load it into Tableau. http://dohanews.co/what-are-the-fastest-growing-neighborhoods-in-qatar/
(Census data: http://www.gsdp.gov.qa/portal/page/portal/gsdp_en/knowledge_center/Publications/Tab7/Tab/Census%202015.pdf)
Tableau http://public.tableau.com/profile/peter4596#!/vizhome/ChangeinQatarspopulation2010-2015/Dashboard2
This is just another example of how data from a census can be used to help people see details about their communities. In this case, it was about education and age. At Open Data Day on March 5, 2016, a team in South Africa used census data to do this. They used a tool called plot.ly https://plot.ly/~collierab/457/count-vs-age/
This is just a sampler of Data Visualization tools. You can find more all over the net, like this great guide: http://visualisingadvocacy.org/
Noun Project: http://nounproject.com/
More on vis tools and how to decide: https://blog.infogr.am/15-thought-leaders-define-what-is-data-visualization/
Map tools - http://fieldpapers.org/
My colleagues are working on Real-time Traffic monitoring as part of the Urban Informatics team. Where did they get the data? Google, Social media, Admin boundaries. Doha over time - http://earthshots.usgs.gov/earthshots/node/69#ad-image-0
There are many ways to think about data skills for good. In fact this is how I learned some of these techniques and innovations. The Digital Humanitarian Network is a group of people and communities that do this for humanitarian activities. http://digitalhumanitarians.com/ I would like to point out DataKind and the Standby Task Force. Learning about how people use information for social good is about taking care of our future. It is also a good way to see how you can apply these learnings to social entrepreneurship.
How to collect data to help your career - http://www.slideshare.net/heatherleson/using-your-voice-to-amplify-your-career-may-14-2015
Thanks so much for your interest in QCRI. @qatarcomputing
http://qcri.org.qa/