© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Hadoop User Group
29 January 2015
Agenda
#1: Processing unstructured data (videos, images, …) with Haven for Hadoop,
#2: Apache Flink: Fast and Reliable Large-scale Data Processing,
#3: Case study: a Hadoop project in the HR domain with Capgemini.
Document vectorization: making unstructured information comparable, new opportunities for an employment-sector organization
21:00: Cocktail reception
Hadoop User Group
Haven to analyze 100% of information
Frédéric Demongeot – EMEA Subject Matter Expert
29 January 2015
Copyright © 2015 Autonomy Inc., an HP Company. All rights reserved. Other trademarks are registered trademarks and the properties of their respective owners.
Big Data landscape
Human Information
Machine Data
Business Data
10% of Information
90% of Information
Annual Growth
~100%
~10%
Haven – Big Data platform
Haven
Social media, IT/OT, Images, Audio, Video, Transactional data, Mobile, Search engine, Email, Texts, Documents
Hadoop/HDFS: Catalog massive volumes of distributed data
Autonomy IDOL: Process and index all information
Vertica: Analyze at extreme scale in real-time
Enterprise Security: Collect & unify machine data
nApps: Powering HP Software + your apps
hp.com/haven
Leverage existing tools in shared Vertica and Hadoop storage environment
A few words on Vertica
Hadoop
HDFS
External Tables
Flex Tables
Click Stream, Web Session Data
Hive Integration (HCatalog)
webHDFS
ANSI SQL
webHCAT
Storage Tiering
Hive Pig
MapReduce
HBase
Copy
Vertica SQL on Hadoop
Autonomy Haven Examples
“Autonomy, with the power of its IDOL engine,
takes fan data, collects it, stores it, and stitches it
together…that helps us understand what is being
talked about across the ecosystem of the sport.”
- Senior Director of IT, NASCAR
HAVEn Solution:
• Autonomy IDOL + Autonomy Explore + HP Enterprise
services
Results:
• Wildly successful NASCAR Fan and Media Engagement
Center collects and aggregates fan information
• Understands sentiment, identifies emerging issues, and
uncovers trends that help the NASCAR team share and enrich
the fan and broadcast experience
NASCAR
Car Manufacturer
Use Case: Brand monitoring
• Have one single Big Data platform they leverage for multiple projects
• Customer brand and partner analytics
• Connected vehicles
• Aggregate multiple data sources and store all on HDFS, such as:
• Social media
• YouTube rich media
• Internal data sources, such as CRM, car logs
• Rich media analysis - logo recognition, face recognition, speech to text, etc.
• Sentiment Analysis
HAVEn technology
• Autonomy
• Hadoop
• nApps – Integration of Haven technologies
BlaBlaCar
Improving a ride-sharing community marketplace
Challenge
• Marketing campaigns and web experience for 10M+
members in 12 countries limited by infrequent and slow
data analysis
Solution
• HP Vertica Analytics Platform
• Cloudera Distributed Hadoop, Tableau, & Data Science
Studio
Result
• Optimized performance of CRM campaigns: program
development improved experience for 2M customers per
month
• Refined targeted marketing by integrating social media
and predicting customer behavior through pattern
recognition
The Challenge of Human
Information
What is Human Information?
Information that is created by people and understood by people
Why is human information different?
Human Information is made up of ideas, is diverse, and has context.
Strong Information & Weak Information
Key words are small amounts of very strong information without context.
Larger amounts of weaker information are what humans refer to as “context”.
“Mercury”
Is it a planet? Is it an element? Is it a car?
“A heavy element and the only metal that is liquid at standard conditions for temperature and pressure with the symbol Hg and atomic number 80, commonly known as quicksilver”
With high certainty, it’s an element!
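The way many weak context words settle what a single strong keyword cannot can be sketched as a simple overlap score. This is only an illustration: the sense profiles and the `disambiguate` helper are hypothetical and say nothing about how IDOL works internally.

```python
import re

# Hypothetical sense profiles: a few context words for each meaning of "Mercury".
SENSES = {
    "planet": {"orbit", "sun", "solar", "planet", "crater"},
    "element": {"metal", "liquid", "element", "atomic", "symbol", "hg"},
    "car": {"engine", "sedan", "ford", "dealer", "model"},
}

def disambiguate(context: str) -> str:
    """Return the sense whose profile shares the most words with the context."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

context = ("A heavy element and the only metal that is liquid at standard "
           "conditions for temperature and pressure with the symbol Hg and "
           "atomic number 80, commonly known as quicksilver")
print(disambiguate(context))
```

No single context word is decisive here; the answer comes from several weak signals agreeing.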
How does HP IDOL approach Human Information?
Using Adaptive Probabilistic Concept Modeling:
• Techniques that provide continuous learning based on context.
• Techniques that deal in a scalable manner with the subtlety of the real world.
• Techniques that inform the importance of patterns found in data.
This proprietary combination of mathematics models Human Information and is:
Automatic, Fast, Data agnostic, Language independent, Scalable, Accurate, Dynamic, Real-time, Voice & Video
IDOL - OS of Human
Information
The OS of Human Information
IDOL (Intelligent Data Operating Layer) provides an interface for accessing human information from all sources and of all types, and provides common services that leverage the information to applications that need to access them to:
Connect: securely to any source of information through a single cross-enterprise interface layer;
Understand: the meaning, concepts and key attributes in all types of human-friendly information including documents, emails, databases, clickstreams, audio, social and rich media, etc.;
Act & Automate: Inquire, Investigate, Interact and Improve quickly, correctly and compliantly based on a holistic view of information, market conditions and social trends.
IDOL: the OS for human information
• Mathematically based
• 15 years and over $280M in R&D
• 170+ Patents
• Language independent
• Built for infrastructure
• All file types, all media types (voice/video)
• Scalable and with security
• Platform/OS /device agnostic
• Managed in place
Inquire: “Search your data”
When you have criteria or an object that form a question, Inquire functions allow you to return results that answer that question. They allow you to sift through large quantities of data to find specific documents that relate to your question or an area of interest.
Investigate: “Analyze your data”
Investigate functions allow you to use information contained in the results of an inquiry to analyse those results. The analysis might provide insights that allow you to improve your inquiry, or it might provide more general information about your content.
Interact: “Personalize your data”
Interact functions use information with affinity to the user to create conceptual Profiles and Agents that reflect the user’s information needs, which in turn can be used to power other functions.
Improve: “Enhance your data”
Improve functions enhance your information with more details that help with the Inquire and Investigate functions. These functions allow you to add information to data of any type, be it audio, video or text, that makes it easier to search and retrieve information, or to identify key features of your content.
The World as Services
Analyse your Data
What insights does our Human Information hold?
Is there structure that I can use to navigate the data?
Expose Concepts and Patterns
Help me evaluate the information quickly
Intelligent Summarization (simple, concept and context)
Intelligent Highlighting (search terms, phrases, concepts, context, fidelity to query grammar)
Concept Streaming (Real time summaries from Audio, contextual to queries and intent)
Intelligent Results de-duplication including “near” de-duplication
Structured, Semi-structured & XML support
Parametric Searching (unlimited nesting and association support)
Directed Navigation (create compelling navigation for users)
Structured Refinement
Automatic Query Guidance (providing top themes from query results in real time)
Concept Navigation via advanced visualizations (node graphs, theme tracking, broadcast analysis)
Language Independent
Visualization of main topics
Enhance your Data
Human Information is rich in features that can, when identified, enhance our analysis
Automatic Classification or Clustering
Automatically determine categories based on patterns and relationships in Human Information
Spot analysis of all themes and grouping within Human Information at any moment in time
Time sensitive analysis; What’s hot? What’s New?
Supervised Classification
Create categories using business rules or training and classify information into those categories
Eduction and Entity Extraction
Extract features and determine characteristics in Human Information
Names, Addresses, Credit Card Information, Sentiment, Intent…..
Audio Analysis
Extract features of Audio information
Speaker independent speech to text, speaker identification, audio events, language identification…..
Image and Video Analysis
Extract features from Video information
Next generation image classification (is this a car?/find more like “this”)
On-screen OCR, logo detection, intelligent scene analysis, Colour and texture analysis, story segmentation….
Hundreds of conceptual entities
Eduction
Quickly narrow search results with auto-identified facets
and conceptual entities such as employee names from
documents
Validate or customize entities
• Is this a valid credit card number?
• What are all docs that contain SSNs?
• If area code is 415, output as Home Office
Pinpoint accuracy for multibyte languages such as CJK,
Thai and some European languages
Names
Places
IP addresses
Companies
Events
Relationships
Medicines
Airports
Cars
Social Security numbers
Phone numbers
Credit cards
Dates
Holidays
Job titles
Currencies
… many more
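The "Is this a valid credit card number?" validation step is commonly implemented with the Luhn checksum. The sketch below assumes that check; `luhn_valid` is a hypothetical helper and not part of the Eduction API.

```python
def luhn_valid(number: str) -> bool:
    """Check a card-like digit string with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

print(luhn_valid("4111 1111 1111 1111"))
```

A pattern match finds candidates that look like card numbers; a checksum of this kind filters out digit runs that only happen to have the right length.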
Eduction
<Organization>
• National Security Agency
<Names>
• President Obama
• Vladimir Putin
• Edward Snowden
<Places>
• Moscow
• St. Petersburg
• Washington
• Syria
• Russia
<Author>
• Carla Anne Robbins
Topical sentiment analysis
Decomposition and classification within a
sentence to pull out specific topics
“I stayed at the Marriott last week, and though the
mattresses were very nice, the service was awful.”
Is this Positive? Negative? Neutral?
How much Positive? How much Negative?
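Decomposing a sentence so that each topic gets its own score can be sketched as follows. This is a toy illustration under stated assumptions: the word lists, the clause-splitting rule and the `topical_sentiment` function are hypothetical, and a production system would rely on trained models rather than a lexicon.

```python
import re

# Tiny illustrative lexicon; a real system would use a trained model.
POSITIVE = {"nice", "great", "clean"}
NEGATIVE = {"awful", "dirty", "rude"}
TOPICS = {"mattresses", "service", "room"}

def topical_sentiment(sentence: str) -> dict:
    """Score each topic by the sentiment words found in its own clause."""
    scores = {}
    # Split on commas and contrastive connectives to isolate clauses.
    for clause in re.split(r",|\bbut\b|\bthough\b", sentence.lower()):
        words = re.findall(r"[a-z]+", clause)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        for topic in TOPICS.intersection(words):
            scores[topic] = scores.get(topic, 0) + score
    return scores

review = ("I stayed at the Marriott last week, and though the "
          "mattresses were very nice, the service was awful.")
print(topical_sentiment(review))
```

Scoring the sentence as a whole would cancel the two opinions out; scoring per clause keeps the positive mattresses and the negative service apart.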
Search video as easily as text
Transform rich media into intelligent assets
Live video or playback from archived footage
On-screen text recognition
Face identification
Automatically generated transcript using speech recognition
Speaker identification
Timecode synchronization
Automatic keyframe generation
Automate: Automatically create metadata, keyframes, transcriptions
Understand: Understand video footage and audio streams in real time
Act: Apply advanced analytics such as clustering and categorization, and link with other file types
Most advanced speech technology
Convert spoken words to text
• Acoustic + Language Model
• Speech-to-Text and IDOL’s conceptual understanding
Eliminate manually adding metadata to A/V clips
Phonetic approaches have major problems
• No Conceptual or Contextual Language Understanding
• Keyword-Based
Model of language disambiguates similar terms
• U.S. President “Bush”
• “bush” as in a large plant
Image technology: Text
Document field extraction
<item>
<price>$6.23</price>
<date>10/2/2012</date>
<purpose>Lunch</purpose>
…
</item>
OCR: Read text from images
1D and 2D barcode reading
ISBN (“9870140189865”)
PDF-417 (“LASTNAME, FIRSTNAME,…”)
Data Matrix (“The Future of Ticketing…”)
Many more (about 20 barcode types)
Image artifacts such as wrinkled paper
Avoid non-text parts of the image
Column understanding
Image technology: 2D objects
Registered image Test image
Generic Logo recognition
Registered
Logos
Test image
Image technology: Human analysis
Primary clothing color = white
Not nude
Primary clothing color = white
Not nude
Primary clothing color = black
Not nude
Face detection
Face analysis
Found “President Obama” face
IDOL Architecture
IDOL architecture supporting next gen apps
Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, Documents, XML, Search Engine, Images
HP Autonomy
IDOL Applications
Autonomy Connectors
eDiscovery
Enterprise Search
Media
Monitoring
Social Media
Analytics
Decision
Support
Augmented
Reality
Partner/
In-house apps
HC Analytics
2D/3D clustering, Acoustic signature, Active matching, Agents, Alerting, Auto language detection, Auto query guidance, Boolean & legacy, Operations, Breaking news
clustering, Categorization, Collaboration, Community, Concept highlighting, Concept-query, Summarization, Conceptual retrieval, Context summarization, Cross-
modal suggest, Dynamic n-dimensional, Taxonomy generation, Dynamic XML, Consumption, Eduction, Exact phrase matching, Expertise location, Explicit
profiling, Face recognition, Field modulation, Frame analysis, Fuzzy matching, Hot clustering, Hyperlinking, Image analysis, Image association, Implicit profiling,
Keyword search, Mail object identification, Melody classification, Melody identification, Metadata recognition, Natural language retrieval, Object identification, Object
recognition, Ontology generation, Parametric refinement, Phrase spotting, Proper name identification, Query by example, Real-time aggregation, Routing,
Scene detection, Script alignment, Sentiment analysis, Soundex matching, Speaker identification, Speaker recognition, Spectrographic analysis, Spell checking,
Tag reconciliation, Transcription, Video analysis, Voice printing, Word spotting, Work groups, XML tagging….
Repositories
Information
Types
Apps
500
Functions
IDOL Services Multimedia
Informatics
Enrichment
Capture
Interaction
Analytics
Discovery
Concept
Clouds
Active Matching
Visualization
SharePoint, Hadoop, Email, ERP, CRM, DB, Data Warehouse, Jive, …
ACA
MediaBin
Connected LiveVault
TRIM
AeD
Data Protector
WorkSite
DigitalSafe
Connectors
…
Cloud
Enterprise
IDOL
OS for Human Information
Hadoop Plays
1. KeyView used in Hadoop to extract text and metadata from data objects using MapReduce
2. Connectors used to fill the data lake from enterprise repositories
3. IDOL (in conjunction with 1 & 2) used to provide deep text analytics on data objects
HP IDOL for Hadoop – Potential use case
• HP Hadoop Connectors can ingest
data from other systems into Hadoop
• HP KeyView extracts text and
metadata from Hadoop data
• HP IDOL functions can be performed
on Hadoop data via SDK
Thank You
Robert Metzger
Flink committer
co-founder, data Artisans
@rmetzger_
rmetzger@data-artisans.com
Apache
Flink
What is Flink
• Collection programming APIs for batch and real-time streaming analysis
• Backed by a very robust execution backend
  • with true streaming capabilities,
  • custom memory manager,
  • native iteration execution,
  • and a cost-based optimizer.
The case for Flink
• Performance and ease of use
  • Exploits in-memory and pipelining, language-embedded logical APIs
• Unified batch and real streaming
  • Batch and Stream APIs on top of streaming engine
• A runtime that "just works" without tuning
  • C++ style memory management inside the JVM
• Predictable and dependable execution
  • Bird’s-eye view of what runs and how, and what failed and why
Example: WordCount
import org.apache.flink.api.scala._

case class Word(word: String, frequency: Int)

val env = ExecutionEnvironment.getExecutionEnvironment
// Split each line into words, then sum the frequency per word.
env.readTextFile(...)
  .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency").print()
env.execute()
Flink has mirrored Java and Scala APIs that offer the same
functionality, including by-name addressing.
Example: Window WordCount
case class Word(word: String, frequency: Int)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val lines = env.fromSocketStream(...)
// Count words over the last 100 elements, re-evaluated every 10 elements.
lines
  .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .window(Count.of(100)).every(Count.of(10))
  .groupBy("word").sum("frequency").print()
env.execute()
Defining windows
• Trigger policy
  • When to trigger the computation on current window
• Eviction policy
  • When data points should leave the window
  • Defines window width/size
• E.g., count-based policy
  • evict when #elements > n
  • start a new window every n-th element
• Built-in: Count, Time, Delta policies
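The interplay of the two policies can be sketched without Flink. The generator below is a minimal illustration in plain Python of a count-based eviction policy combined with a count-based trigger policy; `windowed_word_counts` and its parameters are hypothetical names, not Flink API.

```python
from collections import Counter, deque

def windowed_word_counts(words, size, every):
    """Yield word counts over the last `size` words, once per `every` words."""
    window = deque()
    for seen, word in enumerate(words, start=1):
        window.append(word)
        # Eviction policy: drop the oldest element once the window exceeds `size`.
        if len(window) > size:
            window.popleft()
        # Trigger policy: emit a result after every `every`-th element.
        if seen % every == 0:
            yield dict(Counter(window))

stream = ["a", "b", "a", "c", "a", "b"]
results = list(windowed_word_counts(stream, size=4, every=2))
print(results)
```

With `size=100` and `every=10` this mirrors the shape of `window(Count.of(100)).every(Count.of(10))`: eviction bounds what the window holds, and the trigger decides when a result is emitted.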
Flink API in a nutshell
• map, flatMap, filter, groupBy, reduce, reduceGroup, aggregate, join, coGroup, cross, project, distinct, union, iterate, iterateDelta, ...
• All Hadoop input formats are supported
• API similar for data sets and data streams with slightly different operator semantics
• Window functions for data streams
• Counters, accumulators, and broadcast variables
Flink stack
Flink Optimizer, Flink Stream Builder
Common API
Scala API, Java API, Python API (upcoming), Graph API (Gelly), Apache MRQL
Flink Local Runtime
Embedded environment (Java collections)
Local Environment (for debugging)
Remote environment (Regular cluster execution)
Apache Tez
Data storage: HDFS, Files, S3, JDBC, Flume, RabbitMQ, Kafka, HBase, …
Single node execution
Standalone or YARN cluster
Technology inside Flink
• Technology inspired by compilers + MPP databases + distributed systems
• For ease of use, reliable performance, and scalability
case class Path(from: Long, to: Long)
val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) =>
      Path(path.from, edge.to)
    }
    .union(paths)
    .distinct()
  next
}
Cost-based optimizer
Type extraction stack
Memory manager
Out-of-core algos
Real-time streaming
Task scheduling
Recovery metadata
Data serialization stack
Streaming network stack
...
Pre-flight (client)
Master
Workers
Notable runtime features
1. Pipelined data transfers
2. Management of memory
3. Native iterations
4. Program optimization
Pipelined data transfers
Staged (batch) execution
Romeo, Romeo, where art thou Romeo?
Load Log
Search for str1 (Grep 1)
Search for str2 (Grep 2)
Search for str3 (Grep 3)
Stage 1: Create/cache Log
Subsequent stages: Grep log for matches
Caching in memory and on disk if needed
Pipelined execution
Romeo, Romeo, where art thou Romeo?
Load Log
Search for str1 (Grep 1)
Search for str2 (Grep 2)
Search for str3 (Grep 3)
Stage 1: Deploy and start operators
Data transfer in memory and on disk if needed
Note: Log DataSet is never “created”!
Pipelining in Flink
• Currently the default mode of operation
  • Much better performance in many cases – no need to materialize large data sets
  • Supports both batch and real-time streaming
• In the future pluggable
  • Batch will use combination of blocking and pipelining
  • Streaming will use pipelining
  • Interactive will use blocking
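The difference between staged and pipelined execution can be illustrated with Python generators. This is an analogy only, not Flink code: `load_log` and `grep` are hypothetical operators, and the `trace` list simply records the order in which work happens.

```python
def load_log(lines, trace):
    """Source operator: hands lines downstream one at a time."""
    for line in lines:
        trace.append(("load", line))
        yield line

def grep(records, needle, trace):
    """Filter operator: consumes records as soon as they arrive."""
    for record in records:
        trace.append(("grep", record))
        if needle in record:
            yield record

lines = ["Romeo, Romeo,", "where art thou", "Romeo?"]

# Staged: materialize the whole log first, then search it.
staged_trace = []
log = list(load_log(lines, staged_trace))
staged = list(grep(log, "Romeo", staged_trace))

# Pipelined: each line flows straight from the source into the filter.
pipelined_trace = []
pipelined = list(grep(load_log(lines, pipelined_trace), "Romeo", pipelined_trace))

print([op for op, _ in staged_trace])
print([op for op, _ in pipelined_trace])
```

Both variants return the same matches. In the pipelined run the intermediate log is never built as a whole, which is the point of the "never created" note on the slide.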
Memory management
Memory management in Flink
public class WC {
public String word;
public int count;
}
empty page
Pool of Memory Pages
Managed: Sorting, hashing, caching; Shuffling, broadcasts
Unmanaged: User code objects
Flink contains its own memory management stack. Memory is
allocated, de-allocated, and used strictly using an internal buffer pool
implementation. To do that, Flink contains its own type extraction and
serialization components.
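The idea of keeping records as serialized bytes inside a fixed pool of pages, rather than as heap objects, can be sketched as follows. This is a simplified illustration and not Flink's implementation: `PagePool`, `serialize_wc`, `deserialize_wc` and the 32-byte page size are all assumptions made for the example.

```python
import struct

PAGE_SIZE = 32  # bytes per page; tiny on purpose so the example stays readable

class PagePool:
    """A fixed pool of pre-allocated pages, handed out and returned explicitly."""
    def __init__(self, num_pages):
        self.free = [bytearray(PAGE_SIZE) for _ in range(num_pages)]

    def acquire(self):
        # None signals that the caller should spill to disk instead of allocating.
        return self.free.pop() if self.free else None

    def release(self, page):
        self.free.append(page)

def serialize_wc(word, count):
    """Encode a (word, count) record as: 2-byte length, UTF-8 word, 4-byte count."""
    raw = word.encode("utf-8")
    return struct.pack(f">H{len(raw)}sI", len(raw), raw, count)

def deserialize_wc(buf, offset):
    """Decode one record starting at `offset`; return (word, count, next_offset)."""
    (length,) = struct.unpack_from(">H", buf, offset)
    word, count = struct.unpack_from(f">{length}sI", buf, offset + 2)
    return word.decode("utf-8"), count, offset + 2 + length + 4

pool = PagePool(num_pages=2)
page = pool.acquire()
record = serialize_wc("flink", 3)
page[0:len(record)] = record
print(deserialize_wc(page, 0))
```

Because the pool never grows, running out of pages becomes an explicit signal to go to disk rather than an out-of-memory error.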
Configuring Flink
• Per job
  • Parallelism
• System config
  • Total JVM heap size (-Xmx)
  • % of total JVM size for Flink runtime
  • Memory for network buffers (soon not needed)
• That's all you need. The system will not throw an OOM exception at you.
Benefits of managed memory
• More reliable and stable performance (fewer GC effects, easy to go to disk)
Native iterative processing
Example: Transitive Closure
import org.apache.flink.api.scala._

case class Path(from: Long, to: Long)

val env = ExecutionEnvironment.getExecutionEnvironment
val edges = ...
// Run 10 iterations; each one extends every known path by one edge.
val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges).where("to").equalTo("from") {
      (path, edge) => Path(path.from, edge.to)
    }
    .union(paths).distinct()
  next
}
tc.print()
env.execute()
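The same join, union and distinct loop can be traced on plain Python sets to see what each pass computes. This is a sketch for illustration, not Flink code; the early exit once nothing changes is an addition of this sketch, whereas the Scala example runs a fixed 10 passes.

```python
def transitive_closure(edges, max_iterations=10):
    """Bulk iteration: re-join all known paths with the edges in every pass."""
    paths = set(edges)
    for _ in range(max_iterations):
        # Join paths.to == edges.from, then union with the existing paths.
        extended = {(src, dst) for (src, mid) in paths
                    for (mid2, dst) in edges if mid == mid2}
        next_paths = paths | extended  # the set plays the role of distinct()
        if next_paths == paths:
            break  # early exit, an assumption of this sketch
        paths = next_paths
    return paths

edges = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(edges)))
```

Every pass re-joins the complete set of known paths, including those that can no longer produce anything new.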
Iterate natively
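[Diagram: the initial solution becomes the first partial solution. In each pass the step function (operators X and Y, optionally reading other datasets) computes a new partial solution that replaces the previous one. The last partial solution is the iteration result.]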
Iterate natively with deltas
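[Diagram: the iteration starts from an initial solution and an initial workset. In each pass the step function (operators A and B on the workset, X and Y optionally reading other datasets) produces a delta set that is merged into the partial solution, and a new workset that replaces the previous one. The final partial solution is the iteration result.]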
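A delta iteration can be sketched on plain Python sets by extending only the paths discovered in the previous pass. This is an illustration of the workset idea, not Flink's `iterateDelta` API; the function name and the `updates_per_pass` bookkeeping are assumptions of the sketch.

```python
def transitive_closure_delta(edges):
    """Delta iteration: only paths found in the previous pass are extended."""
    solution = set(edges)   # solution set, merged with deltas each pass
    workset = set(edges)    # elements that changed in the previous pass
    updates_per_pass = []
    while workset:
        candidates = {(src, dst) for (src, mid) in workset
                      for (mid2, dst) in edges if mid == mid2}
        delta = candidates - solution   # keep only genuinely new paths
        solution |= delta               # merge the delta into the solution
        workset = delta                 # the delta becomes the next workset
        updates_per_pass.append(len(delta))
    return solution, updates_per_pass

edges = {(1, 2), (2, 3), (3, 4)}
closure, updates = transitive_closure_delta(edges)
print(sorted(closure), updates)
```

The number of updated elements shrinks from pass to pass, so each pass touches less data, and the loop stops by itself once the workset is empty.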
Effect of delta iterations
[Figure: number of elements updated per iteration; x-axis: iteration (1 to 61), y-axis: # of elements updated (0 to 45,000,000)]
Iteration performance
MapReduce
Closing
Flink roadmap for 2015
• Unify batch and streaming
• Machine learning library and Mahout
• Graph processing library improvements
• Interactive programs and Zeppelin
• Logical queries and SQL
• And many more
Thank you for your invitation
• Check out the project website: http://flink.apache.org
• news@flink.apache.org and other mailing lists
• Twitter: @ApacheFlink
• Feedback & contributions welcome
Flink community
[Figure: # unique contributors by git commits (without manual de-dup); x-axis: Jul-09 to May-16, y-axis: 0 to 120]
flink.apache.org
@ApacheFlink
Hadoop User Group
29th January 2015 @ HP
Text matching engine
About Capgemini
With more than 130,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2013 global revenues of EUR 10.1 billion. Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.
Learn more about us at www.capgemini.com.
Rightshore® is a trademark belonging to Capgemini
© 2015 Capgemini. All rights reserved.
Capgemini Global BIM Service Line
• Capgemini’s global reach, with operations in 44 countries and a focus on BIM with over 9,600 BIM practitioners.
• A uniquely integrated approach to Information Strategy based around the Capgemini “Intelligence Enterprise”.
• Deep industry sector knowledge supported by sector-specific BIM offerings.
• Capgemini’s best-in-class Rightshore® capability for development and management of BIM: 4,000 BIM experts in the India CoE.
• An unmatched (and vendor-independent) depth of technology experience. Capgemini works with all the major BI software vendors to deliver solutions appropriate to the customer’s needs.
850+ M EUR revenue in 2013
South Africa, Argentina, Brazil, Mexico, United States, Canada, Saudi Arabia, India, Australia, China, Morocco
Europe: Austria, Finland, France, Italy, Germany, Norway, Netherlands, Poland, Spain, Sweden, Switzerland, UK
Contact information
Please contact:
• Edmond SEGALEN
edmond.segalen@capgemini.com
• Mouloud LOUNACI
mouloud.lounaci@capgemini.com
• Nathalie SIMON
nathalie.simon@capgemini.com
The matching process in 3 steps
Inputs: 6 months of applications (~200,000) and 7 days of offers (~50,000)
1. Cleaning up documents
2. Vectorization of documents
3. Similarity computation (~200,000 × ~50,000 ≈ 10 billion possible combinations)
Cleaning Up the documents with Apache Lucene (UDF Hive)
• Removing useless words with:
  • A French vocabulary analyzer (words that are generally useless for analysis: le, la, … articles, etc.)
  • A customized dictionary defined by users (specific useless words/regexes such as emails, addresses, numbers, dates, etc.)
• Extracting the roots of the remaining words (stemming):
  • Gets rid of gender and plural problems
  • Corrects some possible misspellings in job applications
Before:
Formation en production mecanique bac+4 Experience en management en tant que responsable de service qualite :service de controle, de validation et de production de lavage de pieces clients; Sydeb Renault, GM Strasbourg, ACMS Mecachrome Encadrement et supervision du personnel du service qualite : …
After:
form production mecan bac experienc manag tant responsabl servic qualit servic control valid production lavag piec client sydeb renault gm strasbourg acm mecachrom encadr supervision personel servic qualit …
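The cleaning step can be approximated in a few lines to show what it does to a sentence. This is a toy sketch under stated assumptions: the stop-word set, noise patterns and suffix list are hypothetical stand-ins for the Lucene French analyzer and the user-defined dictionary, and the suffix stripping is far cruder than a real stemmer.

```python
import re

# Toy stand-ins for the Lucene French analyzer and the user dictionary.
STOP_WORDS = {"le", "la", "les", "de", "du", "des", "en", "et", "que"}
CUSTOM_PATTERNS = [r"\S+@\S+", r"\d+"]      # e-mail addresses, numbers
SUFFIXES = ("ement", "es", "e", "s")        # crude suffix list, longest first

def clean(text):
    """Lowercase, drop noise patterns and stop words, then strip suffixes."""
    text = text.lower()
    for pattern in CUSTOM_PATTERNS:
        text = re.sub(pattern, " ", text)
    stems = []
    # Only unaccented letters are handled, as in the sample text.
    for word in re.findall(r"[a-z]+", text):
        if word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        stems.append(word)
    return stems

print(clean("Responsable de service qualite, controle des pieces"))
```

Singular and plural forms collapse to the same stem, so two documents that use different inflections of a word still share a vector coordinate.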
Cleaning up the documents
The document vectorization transforms text inputs into comparable, quantifiable mathematical objects: vectors.
1 – Corpus dictionary
Vector basis (~1.2 million words/pairs of words)
Key: a.a: Value: 0
Key: a.a.c: Value: 1 …
Key: form: Value: 1474 …
Key: production: Value: 15500 …
2 – Relative weight
TFIDF(word, doc) = TF / DF
• TF (Term Frequency)
• DF (Document Frequency)
Normalized TFIDF vectors
Doc_1: { (w_1; tfidf_1); (w_2; tfidf_2); … }
Doc_2: { (w_1; tfidf_1); (w_2; tfidf_2); … }
…
Vector coordinates (TFIDF)
Documents (Applications + Offers) | Weight word 1 | Weight word 2 | … | Weight word X
Doc_1 | 0.53 | 0.93 | … | 0
Doc_2 | 0 | 0.89 | … | 0.12
… | … | … | … | …
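The weighting and normalization can be sketched directly from the formula on the slide. The function below follows the TF / DF definition as stated there (many TF-IDF variants use a logarithmic inverse document frequency instead); `tfidf_vectors` and its input format, a list of already cleaned token lists, are assumptions of this sketch.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized {word: weight} vector per tokenized document."""
    # DF: number of documents in which each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {word: count / df[word] for word, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({word: w / norm for word, w in weights.items()})
    return vectors

docs = [["servic", "qualit", "servic"], ["servic", "production"]]
vectors = tfidf_vectors(docs)
print(vectors)
```

Words that appear in many documents are weighted down, and normalizing every vector to length 1 makes a long application and a short offer directly comparable.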
“Vectorization” of documents
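Similarity coefficient = cosine of the angle between 2 TFIDF vectors
Since TFIDF weights are positive, the cosine lies between 0 and +1:
90° (cosine = 0): independent information … 0° (cosine = 1): exact same information
Because both vectors are normalized (norm = 1), the cosine reduces to the sum of the products of the weights of shared words.
Application (CV): id_CV, id_word, tfidf
Offer: id_offer, id_word, tfidf
In SQL (joining on Application.id_word = Offer.id_word):
SELECT
  offer.id_offer
  ,cv.id_cv
  ,SUM(cv.tfidf * offer.tfidf) AS cos_sim
FROM offer
INNER JOIN cv ON offer.id_word = cv.id_word
GROUP BY offer.id_offer, cv.id_cv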
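For a single application and offer pair, the same computation can be written over sparse vectors. This is a minimal sketch assuming both vectors are already normalized and stored as `{id_word: tfidf}` dictionaries; `cosine_similarity` is a hypothetical helper name.

```python
def cosine_similarity(cv, offer):
    """Dot product of two normalized sparse vectors ({id_word: tfidf})."""
    # Iterate over the smaller vector; only shared words contribute,
    # which mirrors the INNER JOIN on id_word in the SQL query.
    small, large = (cv, offer) if len(cv) <= len(offer) else (offer, cv)
    return sum(weight * large[word] for word, weight in small.items() if word in large)

cv = {1: 0.6, 2: 0.8}
offer = {2: 1.0}
print(cosine_similarity(cv, offer))
```

Words present in only one of the two documents contribute nothing, so the cost of a comparison depends on the vocabulary the pair shares rather than on the size of the vector basis.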
Similarity process
Facts from the field…
• JOB OFFER
• Indeed offer: Operator in chemical manufacturing. In a company dedicated to the production and packaging of chemicals such as solvents, aerosols and greases for vehicle engines, you'll be in charge of various product mixes.
• Similarity = 17%
XXXX: Chemical industry manufacturing technician
AREAS OF EXPERTISE: Monitor compliance on instruments…
• Similarity = 13%
YYYY: Chemistry operator
PROFESSIONAL SKILLS: Handling of measurement and control tools
• Similarity = 9.7%
ZZZZ
Weighing and metering products, techniques for handling control tools: pH meter, meter, microscope ...
Procedures for cleaning and disinfection...
Similarity > 12%: High confidence
Similarity 9-12%: Moderate confidence
Similarity 5-9%: Risky match
Similarity < 5%: High risk match
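The confidence bands above can be applied to a similarity score with a small helper such as the one sketched here. The function name is hypothetical, and the handling of the exact boundaries (9% counted as moderate, 5% as risky) is an assumption, since the slide does not say which side they fall on.

```python
def confidence_band(similarity):
    """Map a cosine similarity in [0, 1] to the band named on the slide."""
    if similarity > 0.12:
        return "High confidence"
    if similarity >= 0.09:  # assumption: 9% itself counts as moderate
        return "Moderate confidence"
    if similarity >= 0.05:  # assumption: 5% itself counts as risky
        return "Risky match"
    return "High risk match"


# The three candidates from the slide.
for name, score in [("XXXX", 0.17), ("YYYY", 0.13), ("ZZZZ", 0.097)]:
    print(name, confidence_band(score))
# XXXX High confidence
# YYYY High confidence
# ZZZZ Moderate confidence
```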
Thank you
29 January 2015

 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 

Hadoop User Group Agenda and Presentations

  • 1. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Hadoop User Group, 29 January 2015
  • 2. Agenda
      #1: Processing unstructured data (videos, images, …) with Haven for Hadoop
      #2: Apache Flink: Fast and Reliable Large-scale Data Processing
      #3: Case study: a Hadoop project in the HR domain with Capgemini. Document vectorization: making unstructured information comparable, new opportunities for an employment-sector player
      21:00: Cocktail reception
  • 3. Hadoop User Group: Haven to analyze 100% of information. Frédéric Demongeot – EMEA Subject Matter Expert, 29 January 2015
  • 4. Copyright © 2015 Autonomy Inc., an HP Company. All rights reserved. Other trademarks are registered trademarks and the properties of their respective owners. Big Data landscape (diagram): Human Information, Machine Data, Business Data; 10% of information, 90% of information; annual growth ~100%, ~10%
  • 5. Haven – Big Data platform (diagram)
      Sources: social media, IT/OT, images, audio, video, transactional data, mobile, search engine, email, texts, documents
      • Hadoop/HDFS: catalog massive volumes of distributed data
      • Autonomy IDOL: process and index all information
      • Vertica: analyze at extreme scale in real time
      • Enterprise Security: collect & unify machine data
      • nApps: powering HP Software + your apps
      hp.com/haven
  • 6. A few words on Vertica: leverage existing tools in a shared Vertica and Hadoop storage environment (diagram labels: Hadoop, HDFS, External Tables, Flex Tables, Click Stream, Web Session Data, Hive Integration (HCatalog), webHDFS, ANSI SQL, webHCAT, Storage Tiering, Hive, Pig, MapReduce, HBase, Copy)
  • 7. Vertica SQL on Hadoop
  • 8. Autonomy Haven Examples
  • 9. NASCAR
      “Autonomy, with the power of its IDOL engine, takes fan data, collects, it stores, and stitches it together…that helps us understand what is being talked about across the ecosystem of the sport.” - Senior Director of IT, NASCAR
      HAVEn Solution:
      • Autonomy IDOL + Autonomy Explore + HP Enterprise services
      Results:
      • Wildly successful NASCAR Fan and Media Engagement Center collects and aggregates fan information
      • Understands sentiment, identifies emerging issues, and uncovers trends that help the NASCAR team share and enrich the fan and broadcast experience
  • 10. Car Manufacturer
      Use case: brand monitoring
      • Have one single Big Data platform they leverage for multiple projects
          • Customer brand and partner analytics
          • Connected vehicles
      • Aggregate multiple data sources and store all on HDFS, such as:
          • Social media
          • YouTube rich media
          • Internal data sources, such as CRM, car logs
      • Rich media analysis - logo recognition, face recognition, speech to text, etc.
      • Sentiment analysis
      HAVEn technology
      • Autonomy
      • Hadoop
      • nApps – Integration of Haven technologies
  • 11. BlaBlaCar: improving a ride-sharing community marketplace
      Challenge
      • Marketing campaigns and web experience for 10M+ members in 12 countries limited by infrequent and slow data analysis
      Solution
      • HP Vertica Analytics Platform
      • Cloudera Distributed Hadoop, Tableau, & Data Science Studio
      Result
      • Optimized performance of CRM campaigns: program development improved experience for 2M customers per month
      • Refined targeted marketing by integrating social media and predicting customer behavior through pattern recognition
  • 12. The Challenge of Human Information
  • 13. What is Human Information? Information that is created by people and understood by people
  • 14. Why is human information different? Human Information is made up of ideas, is diverse, and has context.
  • 15. Strong Information & Weak Information
      • Key words are small amounts of very strong information without context
      • Larger amounts of weaker information are what humans refer to as “context”
      Example: “Mercury”. Is it a planet? Is it an element? Is it a car?
      Context: “A heavy element and the only metal that is liquid at standard conditions for temperature and pressure with the symbol Hg and atomic number 80, commonly known as quicksilver”
      With high certainty, it’s an element!
  • 16. How does HP IDOL approach Human Information? Using Adaptive Probabilistic Concept Modeling:
      • Techniques that provide continuous learning based on context
      • Techniques that deal in a scalable manner with the subtlety of the real world
      • Techniques that inform the importance of patterns found in data
      This proprietary combination of mathematics models Human Information and is: automatic, fast, data agnostic, language independent, scalable, accurate, dynamic, real-time, and covers voice & video
  • 17. IDOL - OS of Human Information
  • 18. The OS of Human Information
      IDOL (Intelligent Data Operating Layer) provides an interface for accessing human information from all sources and of all types, and provides common services that leverage the information to applications that need to access them to:
      • Connect securely to any source of information through a single cross-enterprise interface layer
      • Understand the meaning, concepts and key attributes in all types of human-friendly information including documents, emails, databases, clickstreams, audio, social and rich media, etc.
      • Act & Automate: inquire, investigate, interact and improve quickly, correctly and compliantly based on a holistic view of information, market conditions and social trends
  • 19. IDOL: the OS for human information
      • Mathematically based
      • 15 years and over $280M in R&D
      • 170+ patents
      • Language independent
      • Built for infrastructure
      • All file types, all media types (voice/video)
      • Scalable and with security
      • Platform/OS/device agnostic
      • Managed in place
  • 20. The World as Services
      • Inquire, “Search your data”: When you have criteria or an object that form a question, Inquire functions allow you to return results that answer that question. They allow you to sift through large quantities of data to find specific documents that relate to your question or an area of interest.
      • Investigate, “Analyze your data”: Investigate functions allow you to use information contained in the results of an inquiry to analyse those results. The analysis might provide insights that allow you to improve your inquiry, or it might provide more general information about your content.
      • Interact, “Personalize your data”: Interact functions use information with affinity to the user to create conceptual Profiles and Agents that reflect the user’s information needs, which in turn can be used to power other functions.
      • Improve, “Enhance your data”: Improve functions enhance your information with more details that help with the Inquire and Investigate functions. These functions allow you to add information to data of any type, be it audio, video or text, that makes it easier to search and retrieve information, or to identify key features of your content.
  • 21. Analyse your Data: what insights does our Human Information hold?
      • Is there structure that I can use to navigate the data?
      • Expose concepts and patterns
      • Help me evaluate the information quickly
      • Intelligent Summarization (simple, concept and context)
      • Intelligent Highlighting (search terms, phrases, concepts, context, fidelity to query grammar)
      • Concept Streaming (real-time summaries from audio, contextual to queries and intent)
      • Intelligent results de-duplication including “near” de-duplication
      • Structured, semi-structured & XML support
      • Parametric Searching (unlimited nesting and association support)
      • Directed Navigation (create compelling navigation for users)
      • Structured Refinement
      • Automatic Query Guidance (providing top themes from query results in real time)
      • Concept Navigation via advanced visualizations (node graphs, theme tracking, broadcast analysis)
      • Language independent
  • 22. Visualization of main topics
  • 23. Enhance your Data: Human Information is rich in features that can, when identified, enhance our analysis
      • Automatic Classification or Clustering: automatically determine categories based on patterns and relationships in Human Information; spot analysis of all themes and grouping within Human Information at any moment in time; time-sensitive analysis (What’s hot? What’s new?)
      • Supervised Classification: create categories using business rules or training and classify information into those categories
      • Eduction and Entity Extraction: extract features and determine characteristics in Human Information (names, addresses, credit card information, sentiment, intent, …)
      • Audio Analysis: extract features of audio information (speaker-independent speech to text, speaker identification, audio events, language identification, …)
      • Image and Video Analysis: extract features from video information; next-generation image classification (is this a car? / find more like “this”); on-screen OCR, logo detection, intelligent scene analysis, colour and texture analysis, story segmentation, …
  • 24. Eduction: hundreds of conceptual entities
      • Quickly narrow search results with auto-identified facets and conceptual entities such as employee names from documents
      • Validate or customize entities
          • Is this a valid credit card number?
          • What are all docs that contain SSNs?
          • If area code is 415, output as Home Office
      • Pinpoint accuracy for multibyte languages such as CJK, Thai and some European languages
      Entities: names, places, IP addresses, companies, events, relationships, medicines, airports, cars, Social Security numbers, phone numbers, credit cards, dates, holidays, job titles, currencies, … many more
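The slide's validation question, “Is this a valid credit card number?”, is commonly answered with the Luhn checksum. The Scala sketch below is an illustration only: the deck does not say how Eduction validates card numbers, and `LuhnCheck` and `isValid` are hypothetical names chosen for this example.

```scala
object LuhnCheck {
  /** Returns true if the candidate (digits, optionally separated by
    * spaces or hyphens) passes the Luhn checksum. */
  def isValid(candidate: String): Boolean = {
    val cleaned = candidate.filterNot(c => c == ' ' || c == '-')
    if (cleaned.isEmpty || !cleaned.forall(_.isDigit)) false
    else {
      // Walk the digits from the right; double every second digit and
      // subtract 9 when the doubled value exceeds 9.
      val sum = cleaned.reverse.zipWithIndex.map { case (ch, idx) =>
        val d = ch.asDigit
        if (idx % 2 == 1) {
          val doubled = d * 2
          if (doubled > 9) doubled - 9 else doubled
        } else d
      }.sum
      sum % 10 == 0
    }
  }
}
```

A checksum only rules out mistyped or random digit strings; it says nothing about whether the card exists.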
  • 25. Eduction (example)
      <Organization> National Security Agency
      <Names> President Obama, Vladimir Putin, Edward Snowden
      <Places> Moscow, St. Petersburg, Washington, Syria, Russia
      <Author> Carla Anne Robbins
  • 26. Topical sentiment analysis: decomposition and classification within a sentence to pull out specific topics
      “I stayed at the Marriott last week, and though the mattresses were very nice, the service was awful.”
      Is this positive? Negative? Neutral? How much positive? How much negative?
  • 27. Search video as easily as text: transform rich media into intelligent assets
      • Live video or playback from archived footage
      • On-screen text recognition
      • Face identification
      • Automatically generated transcript using speech recognition
      • Speaker identification
      • Timecode synchronization
      • Automatic keyframe generation
      Automate: automatically create metadata, keyframes, transcriptions
      Understand: understand video footage and audio streams in real time
      Act: apply advanced analytics such as clustering and categorization, and link with other file types
  • 28. Most advanced speech technology
      • Convert spoken words to text
          • Acoustic + Language Model
          • Speech-to-Text and IDOL’s conceptual understanding
      • Eliminate manually adding metadata to A/V clips
      • Phonetic approaches have major problems
          • No conceptual or contextual language understanding
          • Keyword-based model of language disambiguates similar terms
              • U.S. President “Bush”
              • “bush” as in a large plant
  • 29. Image technology: Text
      • Document field extraction, e.g. <item> <price>$6.23</price> <date>10/2/2012</date> <purpose>Lunch</purpose> … </item>
      • OCR: read text from images
          • Image artifacts such as wrinkled paper
          • Avoid non-text parts of the image
          • Column understanding
      • 1D and 2D barcode reading
          • ISBN (“9870140189865”)
          • PDF-417 (“LASTNAME, FIRSTNAME,…”)
          • Data Matrix (“The Future of Ticketing…”)
          • Many more (about 20 barcode types)
  • 30. Image technology: 2D objects (figure: a registered image matched against a test image; generic logo recognition matching registered logos against a test image)
  • 31. Image technology: Human analysis (figure: face detection; face analysis, found “President Obama” face; per-person attributes such as “Primary clothing color = white, Not nude” and “Primary clothing color = black, Not nude”)
  • 32. IDOL Architecture
  • 33. IDOL architecture supporting next gen apps (diagram)
      Information types: Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, Documents, XML, Search Engine, Images
      Apps: HP Autonomy IDOL Applications, eDiscovery, Enterprise Search, Media Monitoring, Social Media Analytics, Decision Support, Augmented Reality, Partner/In-house apps, HC Analytics
      500 functions: 2D/3D clustering, Acoustic signature, Active matching, Agents, Alerting, Auto language detection, Auto query guidance, Boolean & legacy, Operations, Breaking news clustering, Categorization, Collaboration, Community, Concept highlighting, Concept-query, Summarization, Conceptual retrieval, Context summarization, Cross-modal suggest, Dynamic n-dimensional, Taxonomy generation, Dynamic XML, Consumption, Eduction, Exact phrase matching, Expertise location, Explicit profiling, Face recognition, Field modulation, Frame analysis, Fuzzy matching, Hot clustering, Hyperlinking, Image analysis, Image association, Implicit profiling, Keyword search, Mail object identification, Melody classification, Melody identification, Metadata recognition, Natural language retrieval, Object identification, Object recognition, Ontology generation, Parametric refinement, Phrase spotting, Proper name identification, Query by example, Real-time aggregation, Routing, Scene detection, Script alignment, Sentiment analysis, Soundex matching, Speaker identification, Speaker recognition, Spectrographic analysis, Spell checking, Tag reconciliation, Transcription, Video analysis, Voice printing, Word spotting, Work groups, XML tagging, …
      IDOL services: Multimedia Informatics, Enrichment, Capture, Interaction, Analytics, Discovery, Concept Clouds, Active Matching, Visualization
      Repositories (Autonomy Connectors): SharePoint, Hadoop, Email, ERP, CRM, DB, Data Warehouse, Jive, …; ACA, MediaBin, Connected, LiveVault, TRIM, AeD, Data Protector, WorkSite, DigitalSafe, …
      Cloud, Enterprise: IDOL OS for Human Information
  • 34. Hadoop Plays
      1. KeyView used in Hadoop to extract text and metadata from data objects using MapReduce
      2. Connectors used to fill the data lake from enterprise repositories
      3. IDOL (in conjunction with 1 & 2) used to provide deep text analytics on data objects
  • 35. HP IDOL for Hadoop – Potential use case
      • HP Hadoop Connectors can ingest data from other systems into Hadoop
      • HP KeyView extracts text and metadata from Hadoop data
      • HP IDOL functions can be performed on Hadoop data via SDK
  • 36. Thank You
  • 37. Apache Flink. Robert Metzger, Flink committer, co-founder of data Artisans. @rmetzger_, rmetzger@data-artisans.com
  • 38. What is Flink
      • Collection programming APIs for batch and real-time streaming analysis
      • Backed by a very robust execution backend
          • with true streaming capabilities,
          • custom memory manager,
          • native iteration execution,
          • and a cost-based optimizer.
  • 39. The case for Flink
      • Performance and ease of use
          • Exploits in-memory and pipelining, language-embedded logical APIs
      • Unified batch and real streaming
          • Batch and Stream APIs on top of streaming engine
      • A runtime that "just works" without tuning
          • C++ style memory management inside the JVM
      • Predictable and dependable execution
          • Bird’s-eye view of what runs and how, and what failed and why
  • 40. Example: WordCount
      import org.apache.flink.api.scala._

      case class Word(word: String, frequency: Int)

      val env = ExecutionEnvironment.getExecutionEnvironment
      env.readTextFile(...)
        // split each line into (word, 1) records
        .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        // group by the "word" field and sum the "frequency" field
        .groupBy("word").sum("frequency")
        .print()
      env.execute()

      Flink has mirrored Java and Scala APIs that offer the same functionality, including by-name addressing.
  • 41. Example: Window WordCount
      case class Word(word: String, frequency: Int)

      val env = StreamExecutionEnvironment.getExecutionEnvironment
      val lines = env.fromSocketStream(...)
      lines
        .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        // window over the last 100 elements, evaluated every 10 elements
        .window(Count.of(100)).every(Count.of(10))
        .groupBy("word").sum("frequency")
        .print()
      env.execute()
  • 42. Defining windows
      • Trigger policy
          • When to trigger the computation on the current window
      • Eviction policy
          • When data points should leave the window
          • Defines window width/size
      • E.g., count-based policy
          • evict when #elements > n
          • start a new window every n-th element
      • Built-in: Count, Time, Delta policies
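The trigger and eviction policies above can be sketched without Flink, using plain Scala collections. This is an illustration of the count-based idea only, not Flink's implementation, and the exact semantics of Flink's built-in policies may differ; `CountWindowSketch`, `windowedCounts`, `size` and `slide` are hypothetical names.

```scala
import scala.collection.mutable

object CountWindowSketch {
  /** Emits one word-frequency map every `slide` elements (trigger policy),
    * computed over at most the last `size` elements (eviction policy). */
  def windowedCounts(words: Iterator[String], size: Int, slide: Int): List[Map[String, Int]] = {
    val window = mutable.Queue.empty[String]
    val results = List.newBuilder[Map[String, Int]]
    var sinceLastTrigger = 0
    for (w <- words) {
      window.enqueue(w)
      // Eviction: drop the oldest element once the window exceeds `size`.
      if (window.size > size) window.dequeue()
      sinceLastTrigger += 1
      // Trigger: evaluate the current window after every `slide` elements.
      if (sinceLastTrigger == slide) {
        results += window.toList.groupBy(identity).map { case (word, ws) => word -> ws.size }
        sinceLastTrigger = 0
      }
    }
    results.result()
  }
}
```

With `size = 100` and `slide = 10` this mirrors the shape of `window(Count.of(100)).every(Count.of(10))` from the Window WordCount slide.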
  • 43. Flink API in a nutshell
      • map, flatMap, filter, groupBy, reduce, reduceGroup, aggregate, join, coGroup, cross, project, distinct, union, iterate, iterateDelta, ...
      • All Hadoop input formats are supported
      • API similar for data sets and data streams with slightly different operator semantics
      • Window functions for data streams
      • Counters, accumulators, and broadcast variables
  • 44. Flink stack (diagram)
      • APIs: Scala API, Java API, Python API (upcoming), Graph API (Gelly), Apache MRQL
      • Common API, Flink Optimizer, Flink Stream Builder
      • Execution: Flink Local Runtime, embedded environment (Java collections), local environment (for debugging), remote environment (regular cluster execution), Apache Tez
      • Deployment: single node execution, standalone or YARN cluster
      • Data storage: HDFS, files, S3, JDBC, Flume, RabbitMQ, Kafka, HBase, …
  • 45. Technology inside Flink
      • Technology inspired by compilers + MPP databases + distributed systems
      • For ease of use, reliable performance, and scalability
      case class Path(from: Long, to: Long)

      val tc = edges.iterate(10) { paths: DataSet[Path] =>
        val next = paths
          .join(edges).where("to").equalTo("from") {
            (path, edge) => Path(path.from, edge.to)
          }
          .union(paths)
          .distinct()
        next
      }
      Diagram labels: Pre-flight (client), Master, Workers; cost-based optimizer, type extraction stack, memory manager, out-of-core algos, real-time streaming, task scheduling, recovery metadata, data serialization stack, streaming network stack, ...
  • 46. Notable runtime features
      1. Pipelined data transfers
      2. Management of memory
      3. Native iterations
      4. Program optimization
  • 48. Staged (batch) execution
      Input: “Romeo, Romeo, where art thou Romeo?”
      Diagram: Load Log, then Search for str1 / str2 / str3 (Grep 1, Grep 2, Grep 3)
      • Stage 1: create/cache Log
      • Subsequent stages: grep log for matches
      • Caching in memory, and on disk if needed
  • 49. Pipelined execution
      Input: “Romeo, Romeo, where art thou Romeo?”
      Diagram: Load Log, then Search for str1 / str2 / str3 (Grep 1, Grep 2, Grep 3)
      • Stage 1: deploy and start operators
      • Data transfer in memory, and on disk if needed
      • Note: the Log DataSet is never “created”!
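The difference between the two execution styles can be sketched in plain Scala, with no Flink involved. This is a single-machine analogy under stated assumptions, not how Flink schedules operators; `PipeliningSketch`, `grepStaged` and `grepPipelined` are hypothetical names. Both functions count, per search string, the lines that contain it.

```scala
import scala.collection.mutable

object PipeliningSketch {
  /** Staged: first materialize the whole log, then run one pass per pattern. */
  def grepStaged(lines: Iterator[String], patterns: Seq[String]): Map[String, Int] = {
    val log = lines.toVector // stage 1: the full log is created and cached
    patterns.map(p => p -> log.count(_.contains(p))).toMap
  }

  /** Pipelined: every line flows through all searches as it arrives;
    * the log as a whole is never materialized. */
  def grepPipelined(lines: Iterator[String], patterns: Seq[String]): Map[String, Int] = {
    val counts = mutable.Map(patterns.map(p => p -> 0): _*)
    for (line <- lines; p <- patterns if line.contains(p)) counts(p) += 1
    counts.toMap
  }
}
```

The pipelined version needs memory for the counters only, which is the point of the slide's note that the Log DataSet is never created.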
  • 50. Pipelining in Flink
      • Currently the default mode of operation
          • Much better performance in many cases: no need to materialize large data sets
          • Supports both batch and real-time streaming
      • In the future pluggable
          • Batch will use a combination of blocking and pipelining
          • Streaming will use pipelining
          • Interactive will use blocking
  • 52. Memory management in Flink
      public class WC {
        public String word;
        public int count;
      }
      Diagram labels: Pool of Memory Pages (empty page); sorting, hashing, caching; shuffling, broadcasts; user code objects; Managed, Unmanaged
      Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly using an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components.
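To make the idea of a page pool concrete, here is a minimal Scala sketch that serializes `WC`-like records into a fixed set of pre-allocated pages instead of keeping them as heap objects. It is an analogy under stated assumptions, not Flink's memory manager: the record layout (length-prefixed UTF-8 word followed by the count), the object name `ManagedMemorySketch`, and the `write`/`read` functions are all hypothetical.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object ManagedMemorySketch {
  final case class WC(word: String, count: Int)

  /** Serializes records into the pre-allocated pages, in order.
    * Returns how many records fit; a real system would spill the rest to disk. */
  def write(records: Seq[WC], pages: Array[ByteBuffer]): Int = {
    var page = 0
    var written = 0
    var full = false
    val it = records.iterator
    while (it.hasNext && !full) {
      val r = it.next()
      val bytes = r.word.getBytes(StandardCharsets.UTF_8)
      val needed = 4 + bytes.length + 4 // word length, word bytes, count
      // Move on to the next page when the current one cannot hold the record.
      while (page < pages.length && pages(page).remaining() < needed) page += 1
      if (page == pages.length) full = true
      else {
        pages(page).putInt(bytes.length).put(bytes).putInt(r.count)
        written += 1
      }
    }
    written
  }

  /** Deserializes every record stored in the pages. */
  def read(pages: Array[ByteBuffer]): Vector[WC] = {
    val out = Vector.newBuilder[WC]
    for (p <- pages) {
      val view = p.duplicate() // independent position/limit, same bytes
      view.flip()
      while (view.remaining() > 0) {
        val bytes = new Array[Byte](view.getInt())
        view.get(bytes)
        out += WC(new String(bytes, StandardCharsets.UTF_8), view.getInt())
      }
    }
    out.result()
  }
}
```

Because the pool has a fixed size, running out of space shows up as a return value the caller can react to rather than as an out-of-memory error.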
  • 53. Configuring Flink
      • Per job
          • Parallelism
      • System config
          • Total JVM heap size (-Xmx)
          • % of total JVM size for Flink runtime
          • Memory for network buffers (soon not needed)
      • That's all you need. The system will not throw an OOM exception at you.
  • 54. Benefits of managed memory
      • More reliable and stable performance (fewer GC effects, easy to go to disk)
  • 56. Example: Transitive Closure
      case class Path(from: Long, to: Long)

      val env = ExecutionEnvironment.getExecutionEnvironment
      val edges = ...
      // bulk iteration: extend the known paths by one edge, up to 10 times
      val tc = edges.iterate(10) { paths: DataSet[Path] =>
        val next = paths
          .join(edges).where("to").equalTo("from") {
            (path, edge) => Path(path.from, edge.to)
          }
          .union(paths)
          .distinct()
        next
      }
      tc.print()
      env.execute()
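The same join, union and distinct steps can be traced on an ordinary Scala `Set`, which may help when reading the Flink version. This is a local sketch assuming the whole graph fits in memory; `TransitiveClosureSketch` and `closure` are hypothetical names, and a `Set` stands in for the `distinct()` call.

```scala
object TransitiveClosureSketch {
  final case class Path(from: Long, to: Long)

  /** Repeats the slide's step `maxIterations` times on in-memory sets. */
  def closure(edges: Set[Path], maxIterations: Int): Set[Path] =
    (1 to maxIterations).foldLeft(edges) { (paths, _) =>
      // join paths.to == edges.from, producing paths that are one edge longer
      val next = for {
        p <- paths
        e <- edges
        if p.to == e.from
      } yield Path(p.from, e.to)
      // union with the existing paths; Set semantics remove duplicates
      next union paths
    }
}
```

For the chain 1→2→3→4, the first iteration adds 1→3 and 2→4, the second adds 1→4, and later iterations change nothing.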
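  • 58. Iterate natively with deltas. Diagram: the iteration starts from an initial solution and an initial workset. In each round, step functions A and B, which may also read other data sets (X, Y), use the workset and the partial solution to compute a delta set and the next workset. The deltas are merged into the partial solution and the workset is replaced. At the end, the partial solution is the iteration result.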
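A delta iteration can be illustrated with connected components, computed by propagating the smallest vertex id. This Python sketch is an assumption-laden illustration rather than Flink code: the algorithm choice and all names are hypothetical, but the loop follows the same shape of partial solution, workset, delta merge and workset replacement.

```python
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(
    vertices: Iterable[int], edges: Iterable[Tuple[int, int]]
) -> Tuple[Dict[int, int], List[int]]:
    """Label each vertex with the smallest vertex id in its component.

    Returns the final solution and the number of updated elements per round.
    """
    vertex_list = list(vertices)
    neighbours: Dict[int, Set[int]] = {v: set() for v in vertex_list}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    solution = {v: v for v in vertex_list}  # initial solution
    workset = set(vertex_list)              # initial workset
    updates_per_round: List[int] = []

    while workset:
        # Step function: only vertices in the workset propose their label.
        delta: Dict[int, int] = {}
        for v in workset:
            for n in neighbours[v]:
                if solution[v] < delta.get(n, solution[n]):
                    delta[n] = solution[v]
        solution.update(delta)  # merge deltas into the partial solution
        workset = set(delta)    # replace the workset with the changed vertices
        updates_per_round.append(len(delta))

    return solution, updates_per_round
```

The second return value shows the typical behaviour: the number of updated elements shrinks from round to round, so later rounds touch only a small part of the data.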
  • 59. Effect of delta iterations. Chart: number of elements updated (y-axis, 0 to 45,000,000) per iteration (x-axis, iterations 1 to 61).
  • 62. Flink roadmap for 2015: unify batch and streaming; machine learning library and Mahout; graph processing library improvements; interactive programs and Zeppelin; logical queries and SQL; and many more.
  • 63. Thank you for your invitation. Check out the project website (http://flink.apache.org). Mailing lists: news@flink.apache.org and others. Twitter: @ApacheFlink. Feedback & contributions welcome.
  • 64. Flink community. Chart: number of unique contributors by git commits, without manual de-duplication (y-axis, 0 to 120) over time (x-axis, Jul-09 to May-16).
  • 66. Hadoop User Group, 29th January 2015 @ HP. Text matching engine
  • 67. About Capgemini. With more than 130,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2013 global revenues of EUR 10.1 billion. Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model. Learn more about us at www.capgemini.com. Rightshore® is a trademark belonging to Capgemini. © 2015 Capgemini. All rights reserved.
  • 68. Capgemini Global BIM Service Line. Capgemini's global reach, with operations in 44 countries and a focus on BIM with over 9,600 BIM practitioners. A uniquely integrated approach to Information Strategy based around the Capgemini "Intelligence Enterprise". Deep industry sector knowledge supported by sector-specific BIM offerings. Capgemini's best-in-class Rightshore® capability for the development and management of BIM: 4,000 BIM experts in the India CoE. An unmatched (and vendor-independent) depth of technology experience: Capgemini works with all the major BI software vendors to deliver solutions appropriate to the customer's needs. 850+ M EUR revenue in 2013. Map: South Africa, Argentina, Brazil, Mexico, United States, Canada, Saudi Arabia, India, Australia, China, Morocco; Europe: Austria, Finland, France, Italy, Germany, Norway, Netherlands, Poland, Spain, Sweden, Switzerland, UK.
  • 69. Contact information. Please contact: Edmond SEGALEN (edmond.segalen@capgemini.com); Mouloud LOUNACI (mouloud.lounaci@capgemini.com); Nathalie SIMON (nathalie.simon@capgemini.com).
  • 70. The matching process in 3 steps. Input: 6 months of applications (~200,000) and 7 days of offers (~50,000), i.e. ~10 billion possible combinations (200,000 × 50,000). Steps: (1) cleaning up documents; (2) vectorization of documents; (3) similarity computation.
  • 71. Cleaning up the documents with Apache Lucene (Hive UDF). Removing useless words with: a French vocabulary analyzer (words that are generally useless for analysis: le, la, … articles, etc.); a customized dictionary defined by users (specific useless words/regexes such as emails, addresses, numbers/dates, etc.). Extracting the roots of the remaining words (stemming): gets rid of gender and plural problems; corrects some possible misspellings in job applications. Example input (French): "Formation en production mecanique bac+4 Experience en management en tant que responsable de service qualite :service de controle, de validation et de production de lavage de pieces clients; Sydeb Renault, GM Strasbourg, ACMS Mecachrome Encadrement et supervision du personnel du service qualite : …". Output after cleaning: "form production mecan bac experienc manag tant responsabl servic qualit servic control valid production lavag piec client sydeb renault gm strasbourg acm mecachrom encadr supervision personel servic qualit …".
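The cleaning step can be approximated in a few lines of Python. This is a toy sketch, not the Lucene analyzer used in the project: the stop-word list, the noise regex and the suffix list are small hypothetical stand-ins, and the naive suffix stripping will not reproduce Lucene's French stemmer in general.

```python
import re
import unicodedata
from typing import List

# Hypothetical, deliberately tiny stand-ins for the real dictionaries.
STOP_WORDS = {"le", "la", "les", "de", "des", "du", "en", "et", "que", "un", "une"}
NOISE = re.compile(r"\S+@\S+|\d+")  # e-mail addresses and numbers
SUFFIXES = ("ations", "ation", "ements", "ement", "es", "e", "s")


def strip_accents(text: str) -> str:
    """Remove diacritics so that 'qualité' and 'qualite' become the same token."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def stem(word: str) -> str:
    """Naive stemming: drop the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text: str) -> List[str]:
    """Lower-case, remove noise and stop words, then stem the remaining words."""
    text = NOISE.sub(" ", strip_accents(text.lower()))
    tokens = re.findall(r"[a-z]+", text)
    return [stem(token) for token in tokens if token not in STOP_WORDS]
```

Stemming maps variants such as "client", "cliente" and "clients" to a single token, which is what makes applications and offers comparable word by word.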
  • 72. "Vectorization" of documents. Document vectorization transforms text inputs into comparable, quantifiable mathematical objects: vectors.
      1 – Corpus dictionary (vector basis, ~1.2 million words/pairs of words): Key: a.a, Value: 0; Key: a.a.c, Value: 1; … Key: form, Value: 1474; … Key: production, Value: 15500; …
      2 – Relative weight: TFIDF(word, doc) = TF / DF, where TF is the term frequency and DF the document frequency.
      Normalized TFIDF vectors: Doc_1: { (w_1; tfidf_1); (w_2; tfidf_2); … }  Doc_2: { (w_1; tfidf_1); (w_2; tfidf_2); … }  …
      Vector coordinates (TFIDF):
      Documents (Applications + Offers) | Weight word 1 | Weight word 2 | … | Weight word X
      Doc_1                             | 0.53          | 0.93          | … | 0
      Doc_2                             | 0             | 0.89          | … | 0.12
      …                                 | …             | …             | … | …
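The vectorization can be sketched as follows in Python, using the weight exactly as written on the slide (TF / DF) followed by normalization to unit length. The function names and the sparse dictionary representation are assumptions of this sketch; the project itself ran on Hive.

```python
import math
from collections import Counter
from typing import Dict, List


def build_dictionary(docs: List[List[str]]) -> Dict[str, int]:
    """Corpus dictionary: maps every word of the corpus to a vector index."""
    vocabulary = sorted({token for doc in docs for token in doc})
    return {token: index for index, token in enumerate(vocabulary)}


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[int, float]]:
    """Return one sparse, unit-length vector {word index: weight} per document."""
    dictionary = build_dictionary(docs)
    # DF: number of documents in which each word occurs.
    df = Counter(token for doc in docs for token in set(doc))
    vectors: List[Dict[int, float]] = []
    for doc in docs:
        tf = Counter(doc)  # TF: occurrences of each word in this document
        weights = {dictionary[t]: tf[t] / df[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({i: w / norm for i, w in weights.items()} if norm else {})
    return vectors
```

Only non-zero coordinates are stored, which matters when the basis has on the order of a million dimensions and each document uses a few hundred of them.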
  • 73. Similarity process. Similarity coefficient = cosine of the angle between 2 TFIDF vectors. Because TFIDF weights are non-negative, it lies between 0 (90°: independent information) and +1 (0°: exact same information). Because the vectors are normalized, the cosine reduces to the sum of the products of the weights of the words the two documents share. Tables: Application (id_cv, id_word, tfidf), named cv in the query, and Offer (id_offer, id_word, tfidf), joined on id_word. In SQL:
      SELECT offer.id_offer
            ,cv.id_cv
            ,SUM(cv.tfidf * offer.tfidf) AS cos_sim
      FROM offer
      INNER JOIN cv ON offer.id_word = cv.id_word
      GROUP BY offer.id_offer, cv.id_cv
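The join-and-sum formulation can be tried on a toy data set with Python's built-in sqlite3 module. This is only a sketch for checking the query logic: the table and column names follow the slide, while the in-memory database and the helper name `cosine_by_sql` are assumptions (the original ran on Hive).

```python
import sqlite3
from typing import Dict, Iterable, Tuple

Row = Tuple[int, int, float]  # (document id, id_word, tfidf)


def cosine_by_sql(cv_rows: Iterable[Row], offer_rows: Iterable[Row]) -> Dict[Tuple[int, int], float]:
    """Compute cosine similarities of normalized vectors with a join on id_word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cv (id_cv INTEGER, id_word INTEGER, tfidf REAL)")
    conn.execute("CREATE TABLE offer (id_offer INTEGER, id_word INTEGER, tfidf REAL)")
    conn.executemany("INSERT INTO cv VALUES (?, ?, ?)", cv_rows)
    conn.executemany("INSERT INTO offer VALUES (?, ?, ?)", offer_rows)
    rows = conn.execute(
        """
        SELECT offer.id_offer, cv.id_cv, SUM(cv.tfidf * offer.tfidf) AS cos_sim
        FROM offer
        INNER JOIN cv ON offer.id_word = cv.id_word
        GROUP BY offer.id_offer, cv.id_cv
        """
    ).fetchall()
    conn.close()
    return {(id_offer, id_cv): cos_sim for id_offer, id_cv, cos_sim in rows}
```

Pairs that share no word produce no row at all, so the inner join only materializes the combinations with a non-zero similarity rather than every possible offer/application pair.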
  • 74. Facts from the field… Job offer (Indeed): "Operator / Operator of the chemical manufacturing. In a company dedicated to the production and packaging of chemicals such as solvents, aerosols, greases for vehicle engines, you'll be loads of different product mixes". Matched applications: XXXX, similarity = 17%: "Technician manufacturing Chemical industry. AREAS OF EXPERTISE: Monitor compliance on instruments…"; YYYY, similarity = 13%: "Operator chemistry. PROFESSIONAL SKILLS: Manipulation measurement tools and controls"; ZZZZ, similarity = 9.7%: "Weighing and metering products ehiniiques manipulation tool controle Ph meter, meter, microscope ... Procedures for cleaning and disinfection...". Confidence scale: similarity > 12%: high confidence; 9-12%: moderate confidence; 5-9%: risky match; < 5%: high risk match.
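The confidence scale translates directly into a small lookup function. In this Python sketch the function name is hypothetical, and the handling of values that fall exactly on a boundary (9% and 5% count towards the higher band, 12% towards "moderate") is an assumption, since the slide does not specify it.

```python
def confidence(similarity: float) -> str:
    """Map a cosine similarity (0.0 to 1.0) to the confidence band of the scale."""
    if similarity > 0.12:
        return "High confidence"
    if similarity >= 0.09:  # boundary handling is an assumption of this sketch
        return "Moderate confidence"
    if similarity >= 0.05:
        return "Risky match"
    return "High risk match"
```

Applied to the three applications quoted above, the 17% and 13% matches fall in the high-confidence band and the 9.7% match in the moderate band.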
  • 76. Thank you. 29 January 2015