1. 22nd June, 2018, 7th Euro-African Conference on Finance and Economics
Big Data Projects: Unknowns, Estimates and Returns
Rim Moussa
ENI-Carthage
University of Carthage
LaTICE Lab.
7th Euro-African Conference on Finance and Economics @ Beït-al-Hekma, Carthage
22nd of June, 2018
2.
Outline
●Big Data: the 5 V's
●Overview of some Big Data Vertical Markets
»Web (Google)
»Social Networks (Facebook)
»Maritime Trajectory (Marine Traffic)
●Big Data Projects
»Cost Estimates?
»Causes of Failure?
»“High qualifications” Pattern
●DEBS'2018 Grand Challenge
●Conclusions
3.
Big Data
↬ The 5 V's
●Volume
»Data at rest: historical data
»Volume refers to the amount of data (terabytes to petabytes).
»The challenge is processing data at scale.
●Velocity
»Data in motion
»Velocity refers to the speed at which new data is generated.
»The challenge is to integrate and analyze data while it is being generated.
●Variety
»Data in many forms
»Variety refers to the different types of data, e.g. structured (relational data), semi-structured (XML, JSON, BSON), unstructured, multimedia.
»The challenge is processing different types of data.
4.
Big Data
↬ The 5 V's
●Veracity
»Veracity refers to the messiness or trustworthiness of the data.
»The challenge is to cope with uncertain data quality across data sources.
●Value
»Value refers to our ability to turn data into value.
»Investments are big (infrastructure, experts, software development...), so the returned insights must be valuable.
5.
Big Data
↬ Information Retrieval on Web Data
●Crawling, Indexing, Information Retrieval
●The 5 V's
»Volume
●How big is the web?
●Google: the indexed Web contains at least ~45 billion pages (Saturday, 16 June, 2018).
●http://www.worldwidewebsize.com/
»Velocity
●Real-time data: web page edits, new web content, ...
»Variety
●Web docs (.html), text docs (.doc, .pdf), images, videos, news, ...
»Veracity
●To investigate
»Value
●To investigate
7.
Big Data
↬ Google Business Model
●Leader in Algorithmic Search Technology
●Google Revenue Equation
» Revenue = Amount of Time on the Web
» websites hold a Google Ad slot
●Hidden revenue business model
»Keeps users out of the equation: they don't pay for the service or product offered.
●The revenue streams come from advertising money spent by businesses bidding on keywords
●In 2017, over 90 billion dollars, about 86% of Google's revenue, came from advertising
●Google AdWords and Google AdSense
●A win-win-win business model
»CPC (cost per click)
»CPM (cost per mille, i.e. cost per thousand impressions)
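The two pricing models above can be illustrated with a small calculation. This is a minimal sketch, not Google's billing logic: the function names and the sample campaign figures (impressions, clicks, rates) are hypothetical and chosen only to show the arithmetic.

```python
def cpc_cost(clicks: int, cost_per_click: float) -> float:
    """Cost under CPC: the advertiser pays for each click on the ad."""
    return clicks * cost_per_click


def cpm_cost(impressions: int, cost_per_mille: float) -> float:
    """Cost under CPM: the advertiser pays per block of 1,000 impressions."""
    return impressions / 1000 * cost_per_mille


def effective_cpm(total_cost: float, impressions: int) -> float:
    """Express any campaign cost as a cost per 1,000 impressions."""
    return total_cost / impressions * 1000


if __name__ == "__main__":
    # Hypothetical campaign: 200,000 impressions with a 1% click-through rate.
    impressions = 200_000
    clicks = 2_000
    cpc_total = cpc_cost(clicks, 0.50)       # 0.50 per click (hypothetical rate)
    cpm_total = cpm_cost(impressions, 4.00)  # 4.00 per thousand (hypothetical rate)
    print("CPC total:", cpc_total)
    print("CPM total:", cpm_total)
    print("CPC campaign expressed as CPM:", effective_cpm(cpc_total, impressions))
```

Converting a CPC campaign to an effective CPM makes the two models comparable on the same scale, which is how an advertiser could decide which one is cheaper for a given click-through rate.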
8.
Big Data
↬ Social Networks
●Facebook companies
»Facebook Payments Inc.: lets Facebook generate revenue through its payments business.
»Atlas: ad-serving and measurement platform, offering services to advertisers and agencies.
»Instagram: media sharing platform.
»Onavo: mobile utility application.
»Parse: back-end infrastructure provider for mobile applications.
»Moves: exercise (steps) tracking application.
»Oculus: virtual reality technology.
»LiveRail: publisher monetization platform.
»WhatsApp: instant messaging client.
»Masquerade: visual filters mobile application.
11.
Big Data
↬ Trajectory Maritime Data
Snapshot of vessels tracked by MarineTraffic on 22nd of June, 2018 (8am GMT+1)
12.
Big Data
↬ Flights Data
Snapshot of flights tracked by Flightradar24 on 22nd of June, 2018 (8am GMT+1)
13.
Big Data
↬ The 5 V's by example: Trajectory Maritime Data
●The Danish Maritime Authority (DMA) makes historical AIS data (2006 to 2018) available to anyone interested
»1.8 TB
14.
Big Data
↬ Trajectory Maritime Data: Data Pricing
1000 Danish kroner = 134.2 euros (on 22nd of June, 2018)
15.
Big Data
↬ MarineTraffic Online Services
Prices on 1st June, 2018
16.
Big Data
↬ MarineTraffic Online Services
Prices on 1st June, 2018
17.
Big Data
↬ MarineTraffic Online Services
Prices on 1st June, 2018
18.
Big Data
↬ Maritime Traffic: New Agenda
●Autonomous Vessels
●Smart Vessels
●Connected Vessels
●Increase
»Maritime surveillance
»Safety
»Security
»Economy
●Optimum route planning
20.
Data Generation & Consumption Models
●Old Model
»Few companies (producers) are generating data, all
others are consuming data
●New Model
»All of us are generating data, and all of us are
consuming data
If you aren’t paying for it, you’re the product!
21.
Big Data Project Cost Estimation
●Hardware cost
●Software cost
●Human Resources cost
»Hardware technicians
»Software developers
»Domain experts
»Decision makers
»Researchers
●...
22.
Hardware Cost
●Hardware Trends
»Large hard drive capacities
»High computing capacities
»High-speed networks
»I/O bottleneck
●I/O bottleneck
»Hard drives are like bottles of different sizes having the same throughput
●Solution:
»Aggregate read/write throughputs
»Read/write from/into multiple hard drives
●RAID systems by Patterson, Gibson and Katz @ University of California, Berkeley
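The benefit of aggregating throughput across drives can be sketched with simple arithmetic. The sketch below assumes ideal striping (data spread evenly, no controller or network overhead) and a hypothetical per-drive read rate; it illustrates the principle only and does not model any specific RAID level.

```python
def scan_time_seconds(data_bytes: float, drive_throughput_bps: float,
                      n_drives: int = 1) -> float:
    """Time to read data striped evenly over n_drives that are read in parallel.

    Assumption: aggregate throughput = n_drives * per-drive throughput.
    """
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    return data_bytes / (drive_throughput_bps * n_drives)


if __name__ == "__main__":
    TB = 10**12
    MB = 10**6
    data = 1.8 * TB        # size quoted in this deck for the DMA AIS archive
    throughput = 150 * MB  # hypothetical sequential read rate per drive (bytes/s)
    for n in (1, 4, 16):
        hours = scan_time_seconds(data, throughput, n) / 3600
        print(f"{n:2d} drive(s): {hours:.2f} h")
```

Under these assumptions the scan time shrinks in proportion to the number of drives, while a single larger drive with the same throughput would bring no gain.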
23.
New Hardware Architecture for Big Data
Scale-out: Horizontal Scaling
»But: high cost, new software, high power consumption, cooling systems ... → let's migrate data/software to the cloud
24.
Big Data Technologies
↬ Landscape
Source: https://chiefmartec.com/2017/05/marketing-techniology-landscape-supergraphic-2017/
Retrieved: 1st of June, 2018
25.
Why do some Big Data Projects Fail?
●Unknown Unknowns
»"There are known knowns; there are things we know we know. ... There are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know." D. Rumsfeld (US Defense Secretary, 2002)
●"Data will speak for themselves"
»Business questions are undefined or imprecise
26.
Why do some Big Data Projects Fail?
●Data quality and provenance
●Hardware cost
●Complex architectures
●Cost of hiring skilled teams
»Expert software developers (graduate studies in CS)
»Business experts in each vertical market
●Immature technologies
●Cost management of systems in the cloud
27.
Qualifications
↬ Example: DEBS'2018 Grand Challenge
●Data
»Static information
●Ports' locations around the world.
»Historical data streams
●Each ship sends a tuple according to its behaviour, based on the AIS specifications
●Queries
»Q1: Predicting destinations of ships
»Q2: Predicting arrival times of ships
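As a rough illustration of what Q2 involves, the sketch below estimates an arrival time from a single AIS-like position report by dividing the great-circle distance to the destination port by the reported speed. It is a naive baseline under stated assumptions, not the solution presented in this talk: the function names, the sample coordinates, and the constant-speed, direct-course assumption are all hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0      # mean Earth radius
KM_PER_NAUTICAL_MILE = 1.852  # speed in knots = nautical miles per hour


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_eta(report_time, lat, lon, speed_knots, port_lat, port_lon):
    """Estimate arrival time assuming constant speed along the great circle.

    Returns None when the vessel is not moving, since this baseline
    cannot produce an estimate in that case.
    """
    if speed_knots <= 0:
        return None
    distance_nm = haversine_km(lat, lon, port_lat, port_lon) / KM_PER_NAUTICAL_MILE
    return report_time + timedelta(hours=distance_nm / speed_knots)


if __name__ == "__main__":
    # Hypothetical report: position, speed and destination are made-up values.
    eta = naive_eta(datetime(2018, 6, 22, 8, 0), 37.5, 11.0, 12.0, 36.8, 10.3)
    print("Estimated arrival:", eta)
```

A real solution has to go well beyond this baseline, since ships follow shipping lanes, change speed, and do not always declare their destination, which is why historical trip patterns matter for both queries.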
28.
DEBS'2018 Grand Challenge
↬ Computing Vessels' Trips Patterns (by R. Moussa, 2018)
29.
DEBS'2018 Grand Challenge
↬ Real-time prediction of vessels' future locations (by R. Moussa, 2018)
[Figure label: Departure Port]
30.
DEBS'2018 Grand Challenge
↬ Real-time prediction of vessels' future locations (by R. Moussa, 2018)
[Figure axis: time]
31.
DEBS'2018 Grand Challenge
↬ Real-time prediction of vessels' future locations (by R. Moussa, 2018)
[Figure axis: time]
32.
DEBS'2018 Grand Challenge
↬ Solution's Engineering (graduate classes)
●Theoretical knowledge
»Advanced System Architectures
»Distributed processing
»Advanced Algorithmics
»Spatial data processing
»Information Retrieval
»Engineering a solution
●Practical Knowledge
»Java programming
»Big Data Frameworks
●Apache Spark
33.
Conclusions
●There is still much room for innovation and improvement in several directions, including architecture, applications and systems
34.
Thank you for your Attention
Q & A
Big Data Projects: Unknowns, Estimates and Returns
Rim Moussa
22nd of June, 2018
7th Euro-African Conference on Finance and Economics @ Beït-al-Hekma, Carthage
35.
About Me
Rim Moussa is a tenured associate professor at the University of Carthage and a researcher at LaTICE lab. She is also habilitated as associate professor in Computer Science Engineering by the French National Council of Universities. She received her M.Sc. and Ph.D. in Computer Science (Scalable and Distributed Data Management Systems) from Université Paris IX Dauphine (France) under the supervision of Prof. Witold Litwin.
She teaches both undergraduate and graduate courses on operating systems, distributed data management systems, agile methods for software engineering, business intelligence fundamentals and practices (Data Warehousing and OLAP), NoSQL databases, spatial databases, and Cloud Computing & High Performance Computing (Big Data, Apache Hadoop, Apache Spark, ...).
She has participated in multiple R&D projects (SDDS funded by Microsoft CERIA, HA Grid CERN, ICONS, GORDA, WebArchive, DataScale PIA Inria). Her current research interests include Scalable and Distributed Data Management systems, Multidimensional data modeling and querying, Data warehousing and OLAP, Smart Cities, Big Data Architectures at scale and Spatial Computing at scale.