HORTONWORKS DATA PLATFORM AND IBM SYSTEMS – A COMPLETE SOLUTION FOR COGNITIVE BUSINESS
SynerScope has been helping European organizations across industries unlock competitive business value from data for almost a decade. Now, by leveraging state-of-the-art access control and audit mechanisms from Hortonworks combined with the latest generation high-performance computing and storage solutions from IBM, SynerScope can connect and correlate enterprise data at a scale not previously possible. SynerScope will demonstrate end-to-end analytics workflows including deep-learning based automation using new integrated solutions from Hortonworks and IBM.
3. Dealing with the complexity of data at scale
Volume, variety, velocity and veracity
o Text: a,b,c,…
o Sensors: IoT, wearables, etc.
o Digital images: video, pictures, scans
o Network
o Numbers: 1,2,3,…
4. SYNERSCOPE’S CHOICE FOR A HIGH-PERFORMANCE CONVERGED SYSTEM
o Scalable
o Flexible
o Attacks the right bottlenecks
o Unique balance of compute, memory and storage
SYNERSCOPE’S PRIMARY USE CASES FOR IBM POWER
o High-volume transactions
o On-premises image analytics
o Sensor data (IoT)
o Large-scale networks
IBM Power Systems
5. COMPUTER AUTOMATION PLUS HUMAN INTERACTION
Machine Learning + Visual Sensemaking
o Speed and Flexibility
o Double cognitive system
6. Four main components
o End-to-End Analytics
o Connect & Correlate at Scale
o State-of-the-Art Access Control
o AI Deep Learning
7. TIME BURNERS WHEN WORKING WITH DATA
o Getting the infrastructure ready
o Data Science tooling deployment
o Loading data into the platform
o Data quality, provenance or cleaning
o Searching the data
o Selecting the relevant data
o Finding which data holds information value
o Operationalizing the new findings
o Getting to data-driven ways of working
End-to-End Analytics
Productivity with Raw and Complex Data
8. BULK PROCESSING FOR ALL THE HEAVY LIFTING
o Ingest from unlimited data sources
o Content Scanning at field level
o Data-driven Correlation
o Enterprise Search
Connect and correlate data at Scale
9. Re-arrange the analytic workflow
Traditional ETL data science approach: Find → Ingest → Organize → Enrich → Extract → Analyze
Re-arranged workflow: Bulk Ingest → Scan & Organize → Correlate & Enrich → Find → Extract → Analyze
o Fully automated, zero interaction by Ixiwa
o Iximeer, sped up by Ixiwa
10. AUTOMATED BULK PROCESSES DO THE HEAVY LIFTING
o Ingest
o Content Scanning
o Data-driven Correlation
o Enterprise Search
Replace upfront human efforts with targeted efforts
11. FINE TOUCH INTEGRATION WITH HDP AND SPARK
o Scalable performance
IXIWA BUILT-IN INTELLIGENCE TO
o Detect input format and encodings
o Extract text from PDF, DOCX, XLSX, etc. with Tika
o Index all data with Solr for datalake-wide search
o Automatically tag the data
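The first built-in step above, detecting input format and encoding, can be illustrated with a minimal standard-library sketch. This is not Ixiwa's implementation: the function names `sniff_format` and `sniff_encoding` are hypothetical, the list of magic numbers is deliberately tiny, and a real pipeline would hand the file to Tika for format detection and text extraction.

```python
import codecs

# Byte-order marks, checked longest-first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# A few well-known magic numbers (illustrative only; Tika knows far more).
_MAGIC = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),  # DOCX and XLSX are ZIP containers
]


def sniff_format(data: bytes) -> str:
    """Guess a coarse file format from the leading bytes."""
    for magic, name in _MAGIC:
        if data.startswith(magic):
            return name
    return "text-or-unknown"


def sniff_encoding(data: bytes) -> str:
    """Guess a text encoding: BOM first, then strict UTF-8, then a fallback."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so decoding never fails.
        return "latin-1"
```

The latin-1 fallback is a design shortcut for the sketch: it guarantees that every file can be decoded and indexed, at the cost of possibly mis-rendering text in other legacy encodings.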
Ixiwa on IBM Power Systems with POWER8 for heavy data lifting
12. PATENTED MANY-TO-MANY AUTO-CORRELATOR AT SCALE
Data Fingerprinting → Auto-correlator → Data Similarity & Linking
o Similar data: redundant
o Linkable data: augment, enrich
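The slide does not describe how the patented auto-correlator works, so the following is only a generic illustration of the fingerprint-then-compare idea, not SynerScope's method. It assumes a column can be fingerprinted as its set of normalized values and compared with Jaccard similarity; the thresholds `redundant_at` and `linkable_at` and all column names are hypothetical.

```python
from itertools import combinations


def fingerprint(values):
    """Reduce a column to its set of normalized values (a crude fingerprint)."""
    return {str(v).strip().lower() for v in values if v is not None}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlate(columns, redundant_at=0.9, linkable_at=0.3):
    """Compare every column with every other column (many-to-many).

    Returns (left, right, verdict) tuples, where the verdict is
    "redundant" for near-identical columns and "linkable" for columns
    that share enough values to act as a join key.
    """
    fps = {name: fingerprint(vals) for name, vals in columns.items()}
    result = []
    for left, right in combinations(sorted(fps), 2):
        score = jaccard(fps[left], fps[right])
        if score >= redundant_at:
            result.append((left, right, "redundant"))
        elif score >= linkable_at:
            result.append((left, right, "linkable"))
    return result
```

Comparing all pairs is quadratic in the number of columns; at datalake scale one would typically swap the exact sets for compact sketches such as MinHash, which this example does not attempt.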
13. ACCESS CONTROL
o Role-based access made easy
o First do bulk ingest, then find data-driven access
o Field-level data scanning for precision
o Compartmentalize the data
o Roles and field content together define access
o Continuous audit trails
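The idea that roles and field content together define access can be sketched as follows. This is a simplified illustration under stated assumptions, not the Ixiwa, Ranger or Atlas API: the patterns in `PATTERNS`, the roles in `ROLE_MAY_READ` and the function names are all hypothetical. In the stack named on the slide, the tags would presumably be stored in Atlas and the policies enforced by Ranger.

```python
import re

# Hypothetical content patterns; a real field-level scanner would use many more.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iban": re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{10,30}"),
}

# Hypothetical policy: the sensitive tags each role is allowed to read.
ROLE_MAY_READ = {
    "analyst": set(),
    "compliance": {"email", "iban"},
}


def tag_field(values):
    """Scan a field's content and return the sensitive tags that match."""
    return {
        tag
        for tag, pattern in PATTERNS.items()
        if any(pattern.fullmatch(str(v)) for v in values)
    }


def visible_fields(role, table):
    """Return the fields a role may see: every tag on the field must be allowed."""
    allowed = ROLE_MAY_READ.get(role, set())
    return sorted(
        name for name, values in table.items() if tag_field(values) <= allowed
    )
```

Because tags are derived from the data after bulk ingest, access rules do not have to be written per table up front; an unknown role gets an empty allow-set and therefore sees only untagged fields.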
GDPR READY
o Know where all sensitive data is
o Know which applications and users access what
Enterprise Grade Security for your Datalake
Ixiwa + Ranger + Atlas
14. IXIWA ON APACHE RANGER AND APACHE ATLAS
o Automated data tagging
o Easy setting of access permissions
o Bulk ingested data made easy to share safely
o Full provenance trails kept in Ranger
o Transparent, traceable and auditable
SECURE HIGH-LEVEL TASK AND WORKFLOW SYSTEM
o Reduce the need for one-off data science activities
Smart data scanning and data management
15. BOTTLENECKS
o How data is presented to humans
o Collaboration in and between groups
o Complex decision making
o Data driven decisions
AI BECOMES A HUMAN’S BEST COMPANION
o Changes how data is presented
o Allows humans to do what humans do best
o ASSOCIATE DATA AND INFORMATION
Productivity with data at Scale
17. Analyst
Analyze Emerging Patterns & Context
o Historical Data: test found rules, return high-quality hits
o Streaming Flows: implement found rules, alert in near real-time
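The loop on this slide, testing a found rule on historical data and then implementing the same rule on streaming flows, can be sketched in a few lines. Everything here is hypothetical: the rule, the event fields `amount`, `hour` and `label`, and the function names are invented for illustration, and a production setup would run the streaming half on a stream processor rather than a Python generator.

```python
def high_value_night_order(event):
    """Hypothetical rule an analyst might derive from emerging patterns."""
    return event["amount"] > 10_000 and event["hour"] < 6


def backtest(rule, history):
    """Test a rule on labelled historical data.

    Returns the matching events and the precision of the rule, i.e. the
    share of hits whose "label" field is true.
    """
    hits = [event for event in history if rule(event)]
    true_hits = sum(1 for event in hits if event["label"])
    precision = true_hits / len(hits) if hits else 0.0
    return hits, precision


def alerts(rule, stream):
    """Apply the same rule lazily to a stream, yielding alerts as events arrive."""
    for event in stream:
        if rule(event):
            yield event
```

Keeping the rule a plain function means the exact logic that was validated on history is the logic that fires on the live stream.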
18. COMBINE IBM, HORTONWORKS AND GOOGLE
SynerScope’s fine touch integration of HDP, GPU & TensorFlow
o Mixed cluster with multiple node types
o Leverage the hybrid nature of TensorFlow (CPU+GPU)
o Leverage YARN node labels to launch on GPU nodes
o Models stored on HDFS
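One way to picture the node-label bullet above is a small helper that assembles a `spark-submit` command pinning executors to labelled nodes. `spark.yarn.executor.nodeLabelExpression` is a Spark-on-YARN configuration property; everything else is an assumption made for this sketch: the label name `gpu`, the script name, the HDFS path and the `--model-dir` application argument are hypothetical, and the slide does not say how SynerScope actually launches its jobs.

```python
def gpu_submit_command(script, model_dir, label="gpu", executors=2):
    """Build a spark-submit argument list that targets YARN nodes with a label.

    The cluster administrator must already have created the node label
    and assigned it to the GPU nodes; this function only builds the command.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", str(executors),
        # Ask YARN to place executors only on nodes carrying this label.
        "--conf", f"spark.yarn.executor.nodeLabelExpression={label}",
        script,
        # Hypothetical application argument: where the job reads/writes models.
        "--model-dir", model_dir,
    ]
```

Returning a list rather than a shell string lets the caller pass it straight to `subprocess.run` without quoting concerns, for example `subprocess.run(gpu_submit_command("train.py", "hdfs:///models/demo"), check=True)`.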
HDP / YARN
o CPU Nodes: TensorFlow, PySpark
o GPU Nodes: CUDA, TensorFlow, PySpark
19. SYNERSCOPE & TENSORFLOW ON IBM POWER SYSTEMS
Warp power for Storage, Compute and Learning
o PowerAI Deep Learning distribution includes pre-compiled TensorFlow binaries for fast deployment
o Direct support for NVIDIA Tesla and Pascal series
o NVLink support on Pascal
o Scalable Storage: Spectrum Scale Storage
o Scalable Compute: IBM POWER8, HDP, PySpark
o Scalable Learning: IBM POWER8 & NVIDIA Pascal, HDP, PowerAI, TensorFlow, PySpark
20. EXAMPLE OF A PERFECT TASK FOR IBM POWER SYSTEMS
o Image classification and grading
o Batch-wise training on ever-expanding data
o Field-level data scanning for precision
On-premises Image Classification when a public cloud is off-limits
22. Practical Live Modelling with SAP, IBM Power Systems and ML
o Live Order Data Stream
o Full Historical Data
o Labels & Predictions
o Async Stream Processing
o Trained and Dynamically Updated Models
o Virtualized on IBM POWER8
IBM POWER8 (lab) IBM POWER8 NVIDIA