Big data is large and complex data that exceeds the processing capacity of conventional database systems. It is characterized by high volume, velocity, and variety of data. An enterprise can leverage big data through analytical use, to gain new insights, or by enabling new data-driven products and services. An analogy compares an enterprise's big data architecture to a sugar cane factory that acquires, organizes, analyzes, and generates business intelligence from big data sources to create value for the organization. NoSQL databases are complementary to, rather than replacements for, relational databases in big data solutions.
Introduction to Big Data - An Overview of Big Data Concepts and Technologies
1. Introduction to Big Data
An analogy between Sugar Cane & Big Data
Image Source: alternative-energy-fuels.com Image Source: MicFarris.com
Jean-Marc Desvaux – March 2012
2. Session Abstract:
What is Big Data? Where does it apply?
What are the technologies behind it?
Is it going to replace your RDBMS? …
3. Big data: it’s all Silicon Valley is talking about. It’s
the new buzzword after ‘cloud’.
“Everybody is speaking of it and many are
convinced it is the only way forward. As always,
such dramatic statements are not only dangerous
but serve to put some people off the concept.”
6. Big Data is data that exceeds the processing
capacity of conventional database systems.
It’s too big, too fast or does not fit the
structures of database architectures.
To gain value from this type of data you need
an alternative way to process it.
Why is this happening?
Data is growing faster than computers are
getting bigger.
7. A catch-all term.
Includes social network data, web logs, MP3s,
unstructured web page content, XML, GPS
tracking data, vehicle telemetry, financial market
data and much more.
Can be characterized by the 3 Vs:
Image Source: Tom Kyte’s “Big Data: Are you ready?” presentation
8. Volume
Data is growing faster than machines are getting
bigger.
Data sources are adding up.
Velocity
The rate of acquisition and the desired rate of
consumption.
Variety
Extends beyond structured data to include
unstructured data of all kinds.
Image Source: Tom Kyte’s “Big Data: Are you ready?” presentation
10. The value of Big Data to an organisation falls into two
main categories:
Analytical Use
Enabling new products and services
11. Analytical Use
Revealing insights that were previously hidden because
the data was hard to record and exploit.
An edge over classic analytics based on
sampling and more “static”,
predetermined reports.
It promotes an investigative approach to
data and puts the data scientist and analyst
in the spotlight.
Hal Varian, chief economist at Google
“I keep saying that the sexy job in the next 10 years
will be statisticians”
12. Some terms linked to the Analytical Use of Big Data
Sentiment Analysis:
Mining the Web in real time to get a quick read of what people are thinking.
Named-entity recognition (NER) (also known as entity
identification and entity extraction) is a subtask of information extraction that
seeks to locate and classify atomic elements in text into predefined categories
such as the names of persons, organizations, locations, expressions of times,
quantities, monetary values, percentages, etc. (e.g. does “Big B” in a tweet refer to Big
Brother or to Amitabh Bachchan?)
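A quick read of sentiment can be sketched with a simple word lexicon. The Python below is a minimal illustration under stated assumptions: the word lists, the function names and the sample messages are all hypothetical. Production sentiment analysis (like NER, which must resolve ambiguities such as “Big B”) relies on much larger lexicons or trained models and runs over a live stream rather than a fixed list.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; real systems use far larger lexicons or trained models.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "angry"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text: str) -> int:
    """Score = (#positive words) - (#negative words); >0 positive, <0 negative."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def quick_read(stream: list[str]) -> Counter:
    """Label each message positive, negative or neutral and count the labels."""
    buckets: Counter = Counter()
    for msg in stream:
        score = sentiment(msg)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        buckets[label] += 1
    return buckets


if __name__ == "__main__":
    tweets = [
        "I love the new phone, great battery",
        "Checkout is so slow today, I hate it",
        "Watching the match tonight",
    ]
    print(quick_read(tweets))
```

Counting words against a fixed list ignores negation and context, which is exactly why entity and sentiment extraction at scale tends to move to statistical models.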
13. Product/Service Enabler
Some products and services cannot exist unless
backed by Big Data technologies, because they:
- need to scale;
- need a fast feedback loop on complex
analytics.
Highly successful Web startups that pioneered Big
Data technologies through R&D to enable new
types of products are good examples:
Google, Yahoo, Amazon, Facebook.
14. Sectors with Fast Adoption and High Potential
Financial Sector
Telecommunications
Government
Health
Retail
19. A Sugar Factory
SUGAR CANE FIELDS
ACQUIRE (HARVEST)
EXTRACT/SHRED
EVAPORATE/DISTILL/BOIL
DRY/STORE (SUGAR)
BOTTOM LINE = VALUE
20. An Enterprise Big Data Factory
DATA SOURCES (RDBMS & Data Marketplaces)
ACQUIRE (HARVEST): HDFS (Hadoop Distributed FS), NoSQL Database, RDBMS (Enterprise Applications)
ORGANIZE (EXTRACT): MapReduce (Hadoop), Big Data Connectors, RDBMS Connectors
ANALYSE (SHRED/DISTILL/BOIL): Data Warehousing / RDBMS stores
BUSINESS INTELLIGENCE (DECIDE): Analytic Applications, the sweet part (sugar/rum)
BOTTOM LINE = VALUE
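The ORGANIZE (EXTRACT) stage relies on MapReduce. As a rough illustration only, here is an in-process Python sketch of its three phases (map, shuffle, reduce) counting tokens in a few web-log lines. It does not use Hadoop's API; the function names are hypothetical, and on a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (token, 1) for every token in one input record."""
    for token in record.lower().split():
        yield token, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key, keys in sorted order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    """Chain the three phases sequentially (a cluster would parallelize them)."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    web_log = ["GET /home", "GET /cart", "POST /cart"]
    print(map_reduce(web_log))
```

Because each mapper only sees one record and each reducer only sees one key's values, both phases can be spread over many machines, which is what lets the approach scale with data volume.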
23. Another “Turnkey Factory” Example from Oracle
Targeting high-end analytics
ACQUIRE (HARVEST) → ORGANIZE (EXTRACT) → ANALYSE (SHRED/DISTILL/BOIL) → BUSINESS INTELLIGENCE (DECIDE)
Image Source: Tom Kyte’s “Big Data: Are you ready?” presentation
24. The Microsoft way
+ Of course, you can also build your own factory using
widely available open-source software, on which most
turnkey factories are built.
28. Turning the RDBMS into a legacy data store?
Not at all.
We need the RDBMS to store high-value data and for its
feature-rich approach (feature first).
NoSQL (scale first) is not a superset of RDBMS
technologies (it is not to the RDBMS what Einstein’s relativity
is to Newtonian physics).
Remember: NoSQL is not “No SQL” but “Not Only SQL”.
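One way to picture “Not Only SQL” is a sketch in which each store does what it is best at. Below, Python's built-in sqlite3 stands in for the RDBMS holding high-value orders, and a plain dictionary stands in for a hypothetical key-value NoSQL store holding clickstream events. The table names, data and helper function are illustrative assumptions, not a reference architecture.

```python
import sqlite3
from collections import defaultdict

# Relational store (feature first): schema, constraints, transactions, SQL joins.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL CHECK (amount > 0)
);
""")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
)

# Key-value store (scale first): a dict stands in for a hypothetical NoSQL store.
# Events are schemaless and simply appended under the customer id key.
clickstream: dict[int, list[dict]] = defaultdict(list)
for cust, page in [(1, "/home"), (1, "/cart"), (2, "/home"), (1, "/checkout")]:
    clickstream[cust].append({"page": page})


def revenue_and_clicks() -> list[tuple[str, float, int]]:
    """Combine both stores: SQL aggregates orders, a key lookup adds click counts."""
    rows = db.execute(
        "SELECT c.id, c.name, SUM(o.amount) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id GROUP BY c.id ORDER BY c.id"
    ).fetchall()
    return [(name, total, len(clickstream.get(cid, []))) for cid, name, total in rows]


if __name__ == "__main__":
    print(revenue_and_clicks())
```

The relational side rejects bad high-value data (for example, a non-positive amount fails the CHECK constraint), while the key-value side accepts any event shape cheaply; neither replaces the other.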
30. Rise of Data Marketplaces
Data Science tools development:
More powerful and expressive toolsets for analysis
Emerging streaming data processing tools
(Twitter Storm, Yahoo S4, StreamBase): real-time enablement / live BI
Further cloud enablement
Ease of integration with enterprise sources
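A common live-BI task for streaming tools is a rolling count, for example trending hashtags over the last minute. The sketch below shows the underlying sliding-window idea in plain Python. It does not use the Storm, S4 or StreamBase APIs, the class name is hypothetical, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count keys seen in the window (now - window, now], assuming ordered timestamps."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: deque[tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self.counts: Counter[str] = Counter()

    def add(self, timestamp: float, key: str) -> None:
        """Record one event, then evict events that fell out of the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Oldest events sit at the left of the deque, so eviction stops early.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n: int) -> list[tuple[str, int]]:
        """Return the n most frequent keys currently in the window."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    w = SlidingWindowCounter(window=60)
    for ts, tag in [(0, "#bigdata"), (10, "#nosql"), (30, "#bigdata"), (75, "#hadoop")]:
        w.add(ts, tag)
    print(w.top(2))
```

Each event is added and evicted exactly once, so the cost per event stays constant no matter how long the stream runs; only events inside the window are kept in memory.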
32. To leverage Big Data you need something like a sugar
factory.
It can be a very entry-level factory (Excel with an Azure source)
or a more complex one.
The more complex and complete the factory, the more value at the
end of the processing chain.
To turn Big Data technologies from developer-centric
solutions into enterprise solutions, they must be
combined with SQL solutions into a single proven
infrastructure that meets the manageability and security
requirements of enterprises.
33. The challenge for enterprises is to simplify Big Data
integration/engineering and leverage it where possible
to improve their processes at tactical and strategic
levels.
Architects and DBAs will be able to choose among
datastore technologies and will need to understand
where one is better than another.
Big Data has to be part of the enterprise applications
ecosystem, where it will be turned into value.