Fortune Time Institute: Big Data - Challenges for Smartcity
1. Big and Open Data
Challenges for Smartcity
Dr. Victoria López
Grupo G-TeC
www.tecnologiaUCM.es
Universidad Complutense de Madrid
August 26th, 2014
55 Exchange Place, NYC
2. Big and Open data. Challenges for Smartcity
• What about Big Data?
• Fighting with Big Data.
• Big Data. Big Projects. Privacy.
• Open Data. Transparency. Smartcities.
3. What about Big Data?
From Data Warehouse to Big Data (large Data Bases)
1970: relational model invented
RDBMS remained mainstream until the 1990s
One size fits all: "elephant" vendors, heavily
encoded, even indexing by B-trees.
6. Fighting with Big Data
Bioinformatics, genome data, DNA, RNA, proteins and,
in general, all biological data have required
computer monitoring and storage in large databases in
several laboratories and research centers around the
world
The Human Genome Project
8. Web Issues: Shortest path
A joke, but behind our comfortable position there is
some math and programming…
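As an illustration of the math and programming behind a route search, here is a minimal sketch of Dijkstra's shortest-path algorithm. The slides do not say which algorithm any particular web service uses; the function name, the graph layout (a dictionary of dictionaries) and the toy network are assumptions made only for this example.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights.

    Returns (total_cost, path), or (float("inf"), []) if target is unreachable.
    """
    best = {source: 0}       # cheapest known cost to reach each node
    previous = {}            # predecessor of each node on its cheapest route
    queue = [(0, source)]    # priority queue of (cost so far, node)
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical toy network: nodes are places, weights are travel minutes.
TOY_GRAPH = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 3},
    "C": {"B": 2, "D": 7},
    "D": {},
}

print(shortest_path(TOY_GRAPH, "A", "D"))  # (6, ['A', 'C', 'B', 'D'])
```

The priority queue always expands the cheapest node reached so far, which is what keeps the search fast on large road or web graphs.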
9. Web issues: Searching & Sorting
• Restrictions:
– Total time
– Total costs
– Date/hour
• How to sort the results?
– http://www.sorting-algorithms.com/
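The restrictions listed on this slide can be read as filters, with the remaining results sorted on several keys at once. The sketch below is a minimal illustration under that reading: the Result record, its field names and the sample values are hypothetical and do not come from the slides. It relies only on Python's built-in sorted, which is stable and accepts a tuple as the sort key.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Result:
    """One hypothetical search result (for example, a trip option)."""
    label: str
    total_minutes: int
    total_cost: float
    departure: datetime


def filter_and_sort(results, max_minutes, max_cost, not_before):
    """Drop results that break a restriction, then sort the rest.

    Sort order: cheapest first, then fastest, then earliest departure.
    """
    allowed = [
        r for r in results
        if r.total_minutes <= max_minutes
        and r.total_cost <= max_cost
        and r.departure >= not_before
    ]
    return sorted(
        allowed,
        key=lambda r: (r.total_cost, r.total_minutes, r.departure),
    )


# Hypothetical sample data, invented for illustration only.
SAMPLE = [
    Result("slow-cheap", 300, 40.0, datetime(2014, 8, 26, 9, 0)),
    Result("fast-pricey", 90, 120.0, datetime(2014, 8, 26, 8, 0)),
    Result("balanced", 150, 40.0, datetime(2014, 8, 26, 10, 0)),
    Result("too-early", 100, 30.0, datetime(2014, 8, 26, 5, 0)),
]

ranked = filter_and_sort(
    SAMPLE,
    max_minutes=320,
    max_cost=100.0,
    not_before=datetime(2014, 8, 26, 7, 0),
)
print([r.label for r in ranked])  # ['balanced', 'slow-cheap']
```

Changing the order of the fields in the key tuple changes the priority of the criteria without touching the filter.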
10. How many?
Order your room now!
One teenager working = one afternoon at home
11. How many?
Order all New York rooms NOW!
One teenager working alone?
13. Big Data: MapReduce
• Created by Google (2004)
– Parallel programming model
– Simple concept, smart, suitable for multiple applications
– Big datasets processed across multiple nodes and multiprocessors
– Sets of nodes: Clusters or Grids (distributed programming)
– Able to process 20 PB per day
– Based on Map & Reduce, classical methods in functional programming
related to the classic Divide & Conquer
– Comes from numerical analysis (big matrix products).
• Main feature: scalability to many nodes
– Scan of 100 TB in 1 node @ 50 MB/sec = 23 days
– Scan in a cluster of 1000 nodes = 33 minutes
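A minimal single-machine sketch of the map, shuffle and reduce steps, using word counting as the example task, followed by the arithmetic behind the two scan times quoted above. This is not Google's or Hadoop's implementation: all function names are hypothetical, and the scan-time helper assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfect linear scaling across nodes.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: fold each key's list of values into a single total."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def word_count(documents):
    """Run the three phases in sequence; a cluster would run the map
    and reduce calls on many nodes in parallel."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


def scan_seconds(total_bytes, bytes_per_second_per_node, nodes=1):
    """Time to scan a dataset, assuming perfect linear scaling."""
    return total_bytes / (bytes_per_second_per_node * nodes)


TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

print(word_count(["big data", "open data"]))
# {'big': 1, 'data': 2, 'open': 1}

single_node_days = scan_seconds(100 * TB, 50 * MB) / 86400
cluster_minutes = scan_seconds(100 * TB, 50 * MB, nodes=1000) / 60
print(round(single_node_days), round(cluster_minutes))  # 23 33
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over as many nodes as are available, which is where the drop from days to minutes comes from.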
14. Big Data: Hadoop, Spark
– Used by Yahoo!, Facebook, Twitter, Amazon, eBay…
– Can be used in different architectures:
both clusters (in-house) and grid
(cloud computing)
https://hadoop.apache.org/ https://spark.apache.org/
19. Big Data for Big projects
Real Time
The Obama 2012 campaign used data analytics and the
experimental method to assemble a winning coalition vote by
vote. In doing so, it overturned the long dominance of TV
advertising in U.S. politics and created something new in the
world: a national campaign run like a local ward election, where
the interests of individual voters were known and addressed.
20. Big Data for Big projects
Real Time
How Brazil vs. Germany played out on Twitter
Geotagged tweets mentioning key terms around the World Cup game,
July 8, 2014
23. Open Data
“Open data is data that can be freely used, reused and redistributed by anyone –
subject only, at most, to the requirement to attribute and share alike.”
OpenDefinition.org
Availability and Access: the data must be
available as a whole and at no more than a
reasonable reproduction cost, preferably by
downloading over the internet. The data
must also be available in a convenient and
modifiable form.
Reuse and Redistribution: the data must be
provided under terms that permit reuse and
redistribution including the intermixing with
other datasets. The data must be machine-readable.
Universal Participation: everyone must be
able to use, reuse and redistribute – there
should be no discrimination against fields of
endeavour or against persons or groups. For
example, ‘non-commercial’ restrictions that
would prevent ‘commercial’ use, or
restrictions of use for certain purposes (e.g.
only in education), are not allowed.
26. Recycla.me
Mariam Saucedo
Pilar Torralbo
Daniel Sanz
Ana Alfaro
Sergio Ballesteros
Lidia Sesma
Héctor Martos
Álvaro Bustillo
Arturo Callejo
Belén Abellanas
Jaime Ramos
Ignacio P. de Ziriza
Victor Torres
Alberto Segovia
Miguel Bueno
Mar Octavio de Toledo
Antonio Sanmartín
Carlos Fernández
RESOURCE MAP
RECYCLA.TE
27. Madrid – Smart City
• Parks and gardens
• Parking for
– Cars
– Motorbikes
– Bikes
• Recycling Points
– Fixed
– Mobile
– Clothes
• Stations
– Bioethanol
– Gas
– Oil
– Electric
• Routes for bikes
– Cycle lanes
– Safe streets
• Residential Priority Areas
29. The way from data to value
• Big Data Collection
– Monitoring
– Data cleaning and integration
– Hosted Data Platforms and the Cloud
• Big Data Storage
– Modern databases
– Distributed Computing Platforms
– NoSQL, NewSQL
• Big Data Systems
– Security
– Multicore scalability
– Visualization and User Interfaces
• Big Data Analytics
– Fast algorithms
– Data compression
– Machine learning tools
– Visualization & Reporting
The stages proposed by MIT
for dealing with Big Data
30. Conclusions
Big Data, Open Data and Smartcity
• Era of Data Revolution (Alex 'Sandy' Pentland,
http://www.media.mit.edu/people/sandy)
• New technologies & development
• New Business
• Great opportunities in Smartcity development