On 3 July 2014, we held a one-day conference at the Fundación Ramón Areces under the theme 'Big Data: from scientific research to business management'. We examined the challenges and opportunities of Big Data in the social sciences, economics and business management. Speakers included experts from the London School of Economics, BBVA, Deloitte, the Universities of Valencia and Oviedo, the Centro Nacional de Supercomputación...
2. Too many Vs for Big Data
A batch of new technologies that allow us to extract value from a dataset
which, due to its volume, variety or velocity, was not previously exploited
3. “A set of new technologies able to extract additional
value from all the available data of a company”
4. Petabytes: Google 300 PB, Facebook 45 PB, Yahoo! 180 PB
Exabytes: U.S. healthcare
Zettabytes: 1.8 ZB created in 2011. World information: 9.57 ZB
Yottabyte, Brontobyte, Geopbyte: yet to be reached
“I don’t have that much data…”
A big European company = terabytes
16. UNIVERSAL DATA VALUE (UDV)
BUSINESS INTELLIGENCE: 10
DATA DRIVEN DECISIONS: 10
BIG DATA INTERACTIVE: 20
BIG DATA REAL TIME DATA STREAMING: 20
BIG DATA INTELLIGENCE: 20
BIG DATA STORED: 20
TOTAL: 100
PDR: 35
UDV = PDR / 100 = 35 / 100 = 0.35
17. YOUR COMPANY'S UDV
BUSINESS INTELLIGENCE: 5
DATA DRIVEN DECISIONS: 5
BIG DATA INTERACTIVE: 0
BIG DATA REAL TIME DATA STREAMING: 0
BIG DATA INTELLIGENCE: 0
BIG DATA STORED: 0
TOTAL: 10
PDR (you) = UDV × 10 = 0.35 × 10 = 3.5
PDR at full score (10 + 10 + 20 + 20 + 20 + 20 = 100): 35. PDR (you): 3.5
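The arithmetic on slides 16 and 17 can be sketched in Python as follows. This is only a minimal sketch: the slides do not expand the acronym PDR, so its full-score value of 35 is taken as given, and the function and variable names are hypothetical.

```python
# Scores per capability level, as listed on slides 16 and 17.
FULL_SCORES = {
    "business_intelligence": 10,
    "data_driven_decisions": 10,
    "big_data_interactive": 20,
    "big_data_real_time_streaming": 20,
    "big_data_intelligence": 20,
    "big_data_stored": 20,
}
YOUR_SCORES = {
    "business_intelligence": 5,
    "data_driven_decisions": 5,
    "big_data_interactive": 0,
    "big_data_real_time_streaming": 0,
    "big_data_intelligence": 0,
    "big_data_stored": 0,
}
FULL_PDR = 35  # PDR at full score, taken from the slide (acronym not expanded there)


def udv(pdr, total_score):
    """UDV = PDR / total score (slide 16: 35 / 100)."""
    return pdr / total_score


def pdr_for(scores, udv_value):
    """PDR = UDV x total score of the company (slide 17: 0.35 x 10)."""
    return udv_value * sum(scores.values())


ratio = udv(FULL_PDR, sum(FULL_SCORES.values()))
your_pdr = pdr_for(YOUR_SCORES, ratio)
print(f"UDV = {ratio:.2f}, PDR (you) = {your_pdr:.1f}")
```

A company that only scores on the first two levels therefore reaches a tenth of the full-score PDR.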
22. 83% of the surveyed companies were
able to do things with Big Data that
seemed impossible to achieve before
“The art of possible”
“Impossible is not a fact, it’s an opinion”
28. Social network tracking application
Description:
Searches social networks for comments and mentions about a particular issue or event,
for further evaluation, influencer detection and graphical display of the
conversation to facilitate analysis.
Advantages:
Shows a real-time event (symposium, forum, seminar, etc.) with visual information.
Gets opinions and feelings about a topic in social networks in real time.
Identifies the influencers of a hot topic.
Risk detection and prevention.
Emotional mining: finds the terms most often used about a person, brand, event, etc.,
and so reveals the feelings associated with the most important terms.
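As an illustration of the term counting and influencer detection described for the social network tracking application, here is a minimal Python sketch. The sample posts, the stopword list and the function names are hypothetical, and approximating "influencers" by counting @mentions is an assumption; the slides do not say how the real application does it.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real system would collect them from a social network.
POSTS = [
    {"author": "ana", "text": "Great talk on #bigdata by @oscar at the forum"},
    {"author": "luis", "text": "@oscar explains Spark streaming, great demo"},
    {"author": "eva", "text": "Slow wifi at the forum, but @marta gave a great keynote"},
]

STOPWORDS = {"a", "at", "the", "on", "by", "but", "of", "and"}


def top_terms(posts, n=3):
    """Most frequent plain words, skipping stopwords, hashtags and mentions."""
    words = Counter()
    for post in posts:
        for token in re.findall(r"[#@]?\w+", post["text"].lower()):
            if token[0] not in "#@" and token not in STOPWORDS:
                words[token] += 1
    return words.most_common(n)


def top_mentions(posts, n=2):
    """Most mentioned accounts, used here as a rough proxy for influencers."""
    mentions = Counter()
    for post in posts:
        mentions.update(re.findall(r"@(\w+)", post["text"].lower()))
    return mentions.most_common(n)


print(top_terms(POSTS, 2))
print(top_mentions(POSTS, 1))
```

Sentiment scoring per term would sit on top of these counts; it is left out because the slides give no detail on the method used.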
29. Web content crawling and scraping
Description:
Searches web content and publications on specific subjects of interest to detect,
filter, collect and process relevant information in semi-real time or in batch.
Combined with semantic analysis, this allows content to be detected and classified
effectively.
Advantages:
Allows sites to be generated dynamically from the collected and categorized content,
without manual intervention or exhaustive searches.
Unifies in a single website all the tasks that users would otherwise do manually,
which saves them money and builds loyalty.
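The "detect, filter, collect" steps of the crawling and scraping use case can be sketched with Python's standard library alone. This is a simplified illustration, not the tooling used in the project: the sample HTML, the keyword list and the names `LinkCollector` and `relevant_links` are hypothetical, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a href="..."> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def relevant_links(html, keywords):
    """Return the links whose anchor text contains any of the keywords."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        (href, text)
        for href, text in parser.links
        if any(keyword in text.lower() for keyword in keywords)
    ]


SAMPLE_HTML = """
<ul>
  <li><a href="/news/big-data-banking">Big Data in banking</a></li>
  <li><a href="/news/football">Football results</a></li>
  <li><a href="/news/spark">Spark streaming tutorial</a></li>
</ul>
"""

print(relevant_links(SAMPLE_HTML, ["big data", "spark"]))
```

Keyword matching stands in for the semantic analysis mentioned on the slide, which would replace the simple `any(...)` filter.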
31. Online marketing: customizing websites (behavioral customization)
+160% clicks vs. one size fits all
+79% clicks vs. randomly selected
+43% clicks vs. editor selected
(Figure panels: Recommended links, News Interests, Top Searches)
Description:
Customizes homepages based on user navigation.
Analyzes and customizes the homepage and site in real time for each user, based on
their browsing.
Modifies content, highlights and ads in real time, based on user history.
Advantages:
Over 300% increase in clickthrough
Millions of web pages created in real time
Increased conversions
Increased sales
Cost ten times lower than other solutions
32. Offline marketing: personalized marketing with Big Data
Description:
Development of newsletters, email marketing or any other mailed material, segmented
by individual preferences.
Analyzes and takes into account:
• Financial information and user data
• Navigation and usage information from previous marketing mailings
• Mobile app data (GPS, payments, browsing of offers…)
• Users’ information from social networks
Advantages:
Increased clickthrough
Increased conversions and sales
Natural language processing: semantics and sentiment
Combines private and public data
33. Marketing combining private structured data with unstructured public data
NH Quality Focus:
Complementing a company's internal data, structured and unstructured, with data
generated by the web and social networks allows us to determine the validity of the
data about our brand, product or company.
Comparing and analyzing internal and external (web) data increases the value of our
data and gives us a competitive advantage.
Advantages:
Improves sales.
Improves loyalty.
Increases conversions.
Detects errors or data manipulation.
Improves SEO with respect to users and public data.
Improves marketing and product promotion in line with trends.
34. Massive information tagging
Description:
Allows any type of content or information to be labeled and categorized
automatically and at scale.
Advantages:
Allows searching, categorization and clustering, and extracts value from information
that would otherwise be hard to find and use.
Uses state-of-the-art tools to identify entities (NED and NERD systems), combined
with entity disambiguation backed by a Big Data system containing Wikipedia and
other information sources.
Processing speed and data volume capacity superior to those of other systems.
36. COMBINATION AND SPEED
COUCHDB
Combine all types of data: past, present and future
The main mission of “Cross Data Spark” is:
• To facilitate the use of data stored in different NoSQL databases and data containers
• To allow combining stored data (past), real-time data (present) and future data (predictive).
37. MACHINE LEARNING AND ALGORITHMS
USING ONLY SPARK FOR ALL PROCESSING: BATCH, INTERACTIVE AND STREAMING
CROSSDATA SPARK: Stratio is able to combine, in one query, stored data with
streaming data entering the system
Polyglots: Spark integrated with the main NoSQL databases, starting with Cassandra
and MongoDB.
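The idea of answering one query from both stored data and data still arriving can be illustrated in plain Python. This is a conceptual sketch only: it does not use Spark or any Stratio API, and the sample figures and the names `STORED_SALES`, `incoming_events` and `running_totals` are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical totals per city, standing in for data already stored.
STORED_SALES = [("madrid", 120.0), ("valencia", 80.0)]


def incoming_events():
    """Hypothetical stream of new sales arriving one at a time."""
    yield ("madrid", 10.0)
    yield ("oviedo", 5.0)
    yield ("madrid", 2.5)


def running_totals(stored, stream):
    """Seed totals from stored data, then update them as each event arrives."""
    totals = defaultdict(float)
    for key, amount in stored:
        totals[key] += amount
    for key, amount in stream:
        totals[key] += amount
        # Emit the up-to-date total for the key that just changed.
        yield key, totals[key]


for city, total in running_totals(STORED_SALES, incoming_events()):
    print(city, total)
```

Each emitted total already includes the historical figure, which is the property the slide describes as combining stored data with streaming data in one query.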
38. Lean = Easier deployment, management,
and use of the system
Former Hadoop or
Hybrid Hadoop-Spark Platforms
Stratio Platform
SIMPLE AND EASY
46. “The best way to predict the future is to create it”
THANKS
Óscar Méndez, CEO of Stratio
Editor's notes
Presentation outline:
THESIS
----------
Emergence of Big Data 2.0 (paradigm shift: Big Query)
Requirements: 100X
Need for a NON-HADOOP architecture to meet these requirements
OPPORTUNITY
------------------------
Since it is the only open source NON-HADOOP platform, if the thesis is correct it will be:
The Open Source Big Data 2.0 Platform
There is no longer one Internet; there are thousands of Internets, one for every user, because the Internet is filtered and adapted to every user…
This is precisely what they are doing: taking the initiative, being proactive, anticipating what users want or will want.
In some cases by predicting what they will want
In other cases by creating products that users will want
Both approaches need work:
You have to predict, and be smarter
You have to create products, or complement your existing ones, with what users will want. Integrate them with the digital, technological world we live in.
Projects delivered:
NH HOTEL
Project using NoSQL databases: Lucene indexes, Solr, scraping, semantic technologies for sentiment measurement and entity identification
Tags: Crawling / Categorization and aggregation of opinions.
Project: Measuring a hotel's reputation
Target: Crawl the most important sites that carry information about the hotels. User ratings are extracted from them and classified, then presented in a portal that lets hotel managers quickly see how their hotels compare with the competition and with the data gathered in customer surveys. If the customer surveys show a discrepancy, it is detected and analyzed to see whether hotel management has "fixed" the results, or the reasons are studied. It is also integrated with the SAP accounting system to corroborate that rising ratings go hand in hand with rising revenue.
http://www.youtube.com/watch?v=6gmP4nk0EOE
The gap with the companies that are doing this well keeps growing; you have to close the distance to them and pull ahead of the traditional ones
Your future competitors will be companies you do not expect…. Books, retail, travel, banking (PayPal), telephony…
Bill Gates: You should start being nice to the geeks, because in the future you will probably end up working for one of them….
It is vital to reinvent yourself and create new, disruptive, innovative products... you cannot be complacent about your products and settle into them; you have to innovate, launch new products, turn them into your purple cow, and earn profits from them to create an environment that favors new purple cows. That is what has been done with IRO, and now it is time to extract as much benefit as possible from this purple cow and occupy as much space as possible.