Introducing:
Trillium DQ for Big Data
Harald Smith, Director Product Marketing
Housekeeping
Webcast Audio
• Today’s webcast audio is streamed through your computer speakers.
• If you need technical assistance with the web interface or audio,
please reach out to us using the chat window.
Questions Welcome
• Submit your questions at any time during the presentation
using the chat window.
• We will answer them during our Q&A session following the
presentation.
Recording and slides
• This webcast is being recorded. You will receive an
email following the webcast with a link to download
both the recording and the slides.
Speaker
Harald Smith
• Director of Product Marketing, Syncsort
• 20+ years in Information Management with a focus
on data quality, integration, and governance
• Co-author of Patterns of Information Management
• Author of two Redbooks on Information Governance
and Data Integration
• Blog on InfoWorld: “Data Democratized”
Data challenges across the business
Business Leaders
Lack trust in data needed to
make rapid, accurate
decisions that grow business
Business Analysts
Can’t access or understand
data and spend excessive
time on investigating
Information Leaders
Must facilitate business
collaboration and data
transparency and governance
Chief Data Officers
Make data a strategic
business asset utilizing
scientific skills from basic
spreadsheet knowledge
Only 35% of senior
executives have a high
level of trust in the
accuracy of their Big
Data Analytics
92% of executives are
concerned about the
negative impact of data
and analytics on
corporate reputation
New survey indicates
nearly 80% of AI/ML
projects stalling due to
poor data quality
84% of CEOs
are concerned about
the quality of the data
they’re basing
decisions on
Big Data Needs
Data Quality
Data Quality Challenges of Big Data
Profiling Data
• Organizations are storing vast amounts of data in data lakes and the Cloud,
from many different sources, but that data isn’t usable unless it is understood.
To understand it, the business users who work with the data must be able
to access and profile it without constant IT help
Matching Entities Accurately
• Distinguishing matches that indicate a single specific entity across so much data
requires sophisticated multi-field matching algorithms – that need to be
understandable by business users to be meaningful
Scalability
• Distinguishing matches across massive datasets requires a lot of compute
power: everything has to be compared to everything else, multiple
times in multiple ways
• Taking advantage of Big Data processing for scalability requires specialized skills
and takes a long time, and it requires tuning and rewriting as technology changes
• Traditional data quality tools are not designed to work on that scale of data
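To make the scalability point concrete, here is a minimal sketch of why naive matching grows quadratically with the number of records, and how grouping records by a blocking key reduces the work. This is a generic illustration, not Trillium's implementation; the sample records and the choice of postal code as the blocking key are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def naive_pair_count(n: int) -> int:
    """Number of comparisons when every record is compared to every other."""
    return n * (n - 1) // 2


def blocked_pairs(records, blocking_key):
    """Yield only pairs of records that share the same blocking key value."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


# Hypothetical sample records: (id, name, postal_code)
records = [
    (1, "Acme Corp", "02110"),
    (2, "ACME Corporation", "02110"),
    (3, "Globex", "10001"),
    (4, "Globex Inc", "10001"),
    (5, "Initech", "73301"),
]

# Assumption: records for the same entity share a postal code.
pairs = list(blocked_pairs(records, blocking_key=lambda r: r[2]))

print(naive_pair_count(len(records)))  # 10 comparisons without blocking
print(len(pairs))                      # 2 comparisons with blocking
```

At one million records the naive count is already about 500 billion comparisons, which is why the work has to be partitioned and distributed across a cluster.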
Trillium DQ for Big Data
Understand, Evaluate, and Resolve Big Data Quality Problems
Trillium Discovery for Big Data
Data Profiling
Gain a complete picture of your data before
use
• Understand the data
• Analyze the data
• Find data quality problems
• Build and evaluate data quality rules
Trillium DQ for Big Data
On Premises or via Trillium Cloud
Deploy any or all products to the cloud - Completely managed SaaS in AWS or Azure
Trillium Quality for Big Data
Data Cleansing and Matching
Cleanse, standardize, and connect
data in accordance with your predefined
standards
• Entity matching and resolution
• Data cleansing and correction
• Data record enrichment
Feature-rich data profiling and data quality processing engines
• Leveraging over two decades of data quality expertise
An efficient orchestration of this engine in Big Data distributed
frameworks
• Powered by an architecture that has been in production with very large
(2000+ node) environments running natively across the cluster
• Close partnership with Cloudera and Hortonworks, with native integration with the stack
• Syncsort has been a major contributor to the Apache Hadoop open source project
• With efficient orchestration, we can process any number of attributes with a handful
of MapReduce jobs
• Same architecture is used for Apache Spark
“Design once, deploy anywhere” architecture
• Native connectivity providing breadth and performance
• “Intelligent Execution” to optimize process execution at run-time
(MapReduce, Spark 1.x, Spark 2.x)
• On premises and in the cloud (e.g. Amazon EMR)
Data Quality for Your Big Data Needs
Key Outcomes
• Reduce the time for business analysts to discover and understand
data on Big Data platforms
• Allow business analysts who understand the data but have little
technical expertise to quickly find data and run data profiling in
three steps
• Let analysts explore results and drilldown to details within 2-5
seconds per view to review and then report on data issues to
business leaders
• Scale to large volumes of data sources & attributes so that business
analysts can understand the contents of any data source needed for
business decisions
• Data is always secured in process and at rest and only available to
authorized users to comply with regulations and avoid fines
Trillium Discovery for Big Data
Trillium Discovery for Big Data
• Delivers enterprise trusted Trillium Discovery on distributed big data
platforms (e.g. Hadoop, Spark) for high-volume, scalable data profiling
• Provides complete Trillium Discovery data profiling for analysis & review
• Attribute metadata, value & pattern frequencies, key & dependency analysis,
cross-source join analysis, drill down to any outlier or issue, and more…
• Provides easily configured native connectivity for Big Data sources
• Provides managing and monitoring for task execution
• Integrates with the security frameworks (Kerberos, AD, LDAP) of
Big Data platforms
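As a rough illustration of what value and pattern frequency profiling produces for a single attribute, the sketch below computes basic metadata and frequency counts in plain Python. It is a simplified, single-machine example and does not represent Trillium Discovery's engine; the `pattern_of` convention (letters become `A`, digits become `9`) and the sample phone values are assumptions.

```python
from collections import Counter


def pattern_of(value: str) -> str:
    """Map a value to a character-class pattern: letters -> A, digits -> 9."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile_attribute(values):
    """Return simple profiling statistics for one attribute (column)."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "value_freq": Counter(non_null),
        "pattern_freq": Counter(pattern_of(v) for v in non_null),
    }


# Hypothetical phone-number column with mixed formats and a missing value
phones = ["617-555-0101", "617-555-0101", "6175550199", None, "n/a"]
profile = profile_attribute(phones)

print(profile["null_count"])    # 1
print(profile["distinct"])      # 3
print(profile["pattern_freq"])  # pattern "999-999-9999" occurs twice
```

Rare patterns such as `A/A` are the kind of outlier an analyst would drill down into when reviewing a profile.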
Trillium Discovery for Big Data – Data Profiling at Scale
Select Source
Explore Profiles
Run Profiling
Stored Profiling Results
▪ Metadata & Statistics
▪ Frequency Distributions
▪ Drilldown Indices
Share &
Govern
Results
Integration
(APIs)
Notification
Collaboration
Native Connectors
▪ HDFS source directories
▪ …
Drilldown to Issues
Evaluate Business Rules
Key Outcomes
• Match and link any data entity – customers, suppliers, products, etc. –
into a trusted single view to support a broad array of business-critical
use cases (e.g. Customer 360, fraud, AML)
• Parse and standardize complex multi-domain data, extended with
enrichment and verification of critical address and geolocation data –
all leveraging out-of-the-box templates
• Utilize “design once, deploy anywhere” approach to speed time-to-
value and focus on building data quality business logic while letting the
product handle the technical aspects of framework execution with no
coding or tuning required
• Leverage the high-performance compute power of distributed Big Data
frameworks including Hadoop MapReduce and Spark to process high
volumes within targeted time windows to meet critical Service Level
Agreements (SLAs)
Trillium Quality for Big Data
Trillium Quality for Big Data
• Integrate, parse, standardize, and match new and legacy customer data
from multiple disparate sources.
• Provide high-quality entity resolution through multi-domain deduplication
and matching with the most comprehensive set of match comparisons
available, including fuzzy matching, distance comparisons, and more.
• Standardize, enhance, and match international data sets with postal and
country-code validation.
• Deploy data quality workflows as native, parallel MapReduce or Spark
processes for optimal efficiency.
• Process hundreds of millions of records of data.
• Increase processing efficiency.
• Support failover through Hadoop’s fault-tolerant design; during a node
failure, processing is redirected to another node.
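The idea of multi-field fuzzy matching can be sketched in a few lines. The example below scores two records by combining per-field string similarity with weights, using Python's standard `difflib.SequenceMatcher`. It is a generic illustration only: the field names, the weights, and the 0.85 threshold are hypothetical, and the product's own match comparisons are not shown here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


# Hypothetical field weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {"name": 0.5, "street": 0.3, "postal_code": 0.2}


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-field similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


a = {"name": "Jon Smith", "street": "12 Main St", "postal_code": "02110"}
b = {"name": "John Smith", "street": "12 Main Street", "postal_code": "02110"}
c = {"name": "Maria Lopez", "street": "9 Elm Ave", "postal_code": "73301"}

THRESHOLD = 0.85  # assumed cut-off for declaring a match

print(match_score(a, b) >= THRESHOLD)  # True: likely the same person
print(match_score(a, c) >= THRESHOLD)  # False: different entities
```

Keeping the per-field scores and weights visible is one way to make a match decision explainable to business users, which the challenges slide calls out as a requirement.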
Trillium Quality for Big Data – Data Cleansing at Scale
Boost effectiveness of machine learning and AI with complete, standardized, matched data.
1. Visually create and test data
quality processes locally
2. Execute in MapReduce or Spark
On premises or in the Cloud
Big Data Platform
Syncsort Trillium Delivers Data You Can Trust
Data Profiling
Business Rules & Data Quality Assessment
Data Validation, Standardization, Enrichment & more
Matching, Entity Resolution & Verification
• Customer 360
• AI/ML
Operational Integrations
• Analytics & Reporting
Data Governance
Trillium Discovery for Big Data
Trillium Quality for Big Data
+ Global Address Verification
Trillium DQ for Big Data
Trillium DQ for Big Data
Use Cases
Turn your Big Data
into a trusted view
of your customers,
products and more
Power machine
learning and
advanced analytics
with reliable, fit-for-
purpose data
Gain actionable
business insights
from high-volume
disparate data sets
from across the
enterprise
Deploy industry-
leading data quality
processes at massive
scale, with no coding
or Big Data skills
required
Trillium DQ for Big
Data evaluates &
transforms your Big
Data for trusted
business insights
Anti-Money
Laundering on
Hadoop at
Global Bank
SOLUTION
CHALLENGE
• Must provide highly accurate
entity resolution
• Must be secure – Kerberos, LDAP
• Must have lineage – data origin
to end point
• Massive data volumes
• Scattered data – Mainframe,
RDBMS, Cloud, …
• Must archive unaltered
mainframe data
Full Anti-Money Laundering
regulatory compliance with
financial crimes data lake –
high performance results at
massive scale.
• Full end-to-end data lineage
supplied to Apache Atlas
and ASG Data Intelligence
• Cluster-native data
verification, enrichment,
and demanding multi-field
entity resolution on Spark
• Unmodified mainframe
“Golden Records” stored
on Hadoop
Bank must monitor transactions
to detect Money Laundering for
FCA compliance.
Machine learning can detect
patterns, but …
Requires large amounts of
current, clean data.
• Trillium DQ for Big Data
• Connect CDC
• Connect for Big Data
Trillium DQ for
Big Data Cleanses
Credit Data for
Creditsafe
CHALLENGE
Ensure ALL DATA on each company is
analyzed – and NO DATA from another
company is accidentally included –
to get accurate corporate credit ratings.
• Need to profile, cleanse and enhance
data to evaluate credit ratings for
80 million companies in the U.S. alone
• Existing solution lacked flexible
de-dupe matching rules, scalability
• Millions of records to analyze per
company, in multiple inconsistent
data sources, about 800 million/day
total and growing
• Solution must scale!
SOLUTION
• Amazon EMR Cloud
• Trillium DQ for Big Data cleansed,
standardized and matched over
130 million records/hour on a basic
10-node test cluster, meeting the
business SLA with room to grow
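The "room to grow" claim can be checked with simple arithmetic on the two figures quoted on this slide. The sketch below assumes the test-cluster throughput is sustained and that the SLA is a 24-hour processing window; neither assumption is stated in the source.

```python
# Figures quoted on the slide
DAILY_RECORDS = 800_000_000      # "about 800 million/day total and growing"
RECORDS_PER_HOUR = 130_000_000   # "over 130 million" per hour on the 10-node test cluster

# Assumption: the SLA is a 24-hour window and throughput is sustained.
SLA_WINDOW_HOURS = 24

hours_needed = DAILY_RECORDS / RECORDS_PER_HOUR
headroom = SLA_WINDOW_HOURS / hours_needed

print(f"{hours_needed:.1f} hours to process one day of data")  # 6.2 hours
print(f"{headroom:.1f}x headroom within the assumed window")    # 3.9x
```

Under these assumptions the daily volume could grow to nearly four times its current size before the test cluster's throughput became the limit.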
96% Address Matching Accuracy
after Trillium cleansing and
standardization
Saved software costs – Replaced
multiple solutions and tools
Saved Amazon cluster costs and
left room for company growth
“We can’t afford to miss
information, or mix up information
about businesses with similar
names. Companies count on our
highly accurate predictive scoring
to provide fast, accurate ratings
for their potential customers
and vendors.”
Next Steps
For more information on Trillium DQ for Big Data and our other
Syncsort Trillium data quality solutions, please visit:
https://www.syncsort.com/en/products/trillium-dq-for-big-data
And:
https://www.syncsort.com/en/integrate
Q & A
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for the Data Lake

Contenu connexe

Tendances

Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data FabricAlan McSweeney
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data Discovery to...
BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data  Discovery to...BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data  Discovery to...
BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data Discovery to...BigID Inc
 
Data Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityData Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityDATAVERSITY
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Data quality architecture
Data quality architectureData quality architecture
Data quality architectureanicewick
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Julien Le Dem
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Selecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachSelecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachChristopher Bradley
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Data Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and RoadmapsData Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and RoadmapsDATAVERSITY
 
RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?DATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 

Tendances (20)

Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data Discovery to...
BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data  Discovery to...BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data  Discovery to...
BigID & Collibra Joint Deck: Using BigID’s Privacy-centric Data Discovery to...
 
Data Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityData Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data Quality
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Data quality architecture
Data quality architectureData quality architecture
Data quality architecture
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Selecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approachSelecting Data Management Tools - A practical approach
Selecting Data Management Tools - A practical approach
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Data Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and RoadmapsData Governance Best Practices, Assessments, and Roadmaps
Data Governance Best Practices, Assessments, and Roadmaps
 
RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Data Warehousing Trends
Data Warehousing TrendsData Warehousing Trends
Data Warehousing Trends
 

Similaire à Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for the Data Lake

The New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need ThemThe New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need ThemPrecisely
 
What’s New in Syncsort’s Trillium Software System (TSS) 15.7
What’s New in Syncsort’s Trillium Software System (TSS) 15.7What’s New in Syncsort’s Trillium Software System (TSS) 15.7
What’s New in Syncsort’s Trillium Software System (TSS) 15.7Precisely
 
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudPrecisely
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AIGary Allemann
 
Empowering Business & IT Teams:  Modern Data Catalog Requirements
Empowering Business & IT Teams:  Modern Data Catalog RequirementsEmpowering Business & IT Teams:  Modern Data Catalog Requirements
Empowering Business & IT Teams:  Modern Data Catalog RequirementsPrecisely
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationDenodo
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with HadoopPrecisely
 
Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsDATAVERSITY
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Dataconomy Media
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudInside Analysis
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Denodo
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Precisely
 
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)Vishal Bamba
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersRevolution Analytics
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Productioniguazio
 

Similaire à Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for the Data Lake (20)

The New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need ThemThe New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need Them
 
What’s New in Syncsort’s Trillium Software System (TSS) 15.7
What’s New in Syncsort’s Trillium Software System (TSS) 15.7What’s New in Syncsort’s Trillium Software System (TSS) 15.7
What’s New in Syncsort’s Trillium Software System (TSS) 15.7
 
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
Data Con LA 2018 - Populating your Enterprise Data Hub for Next Gen Analytics...
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
Empowering Business & IT Teams:  Modern Data Catalog Requirements
Empowering Business & IT Teams:  Modern Data Catalog RequirementsEmpowering Business & IT Teams:  Modern Data Catalog Requirements
Empowering Business & IT Teams:  Modern Data Catalog Requirements
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data Virtualization
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced Analytics
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the Cloud
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)Customer Intelligence_ Harnessing Elephants at Transamerica    Presentation (1)
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1)
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 

Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for the Data Lake

  • 1. Introducing: Trillium DQ for Big Data Harald Smith, Director Product Marketing
  • 2. Housekeeping Webcast Audio • Today’s webcast audio is streamed through your computer speakers. • If you need technical assistance with the web interface or audio, please reach out to us using the chat window. Questions Welcome • Submit your questions at any time during the presentation using the chat window. • We will answer them during our Q&A session following the presentation. Recording and slides • This webcast is being recorded. You will receive an email following the webcast with a link to download both the recording and the slides.
  • 3. Speaker Harald Smith • Director of Product Marketing, Syncsort • 20+ years in Information Management with a focus on data quality, integration, and governance • Co-author of Patterns of Information Management • Author of two Redbooks on Information Governance and Data Integration • Blog on InfoWorld: “Data Democratized”
  • 4. Data challenges across the business Business Leaders Lack trust in data needed to make rapid, accurate decisions that grow business Business Analysts Can’t access or understand data and spend excessive time on investigating Information Leaders Must facilitate business collaboration and data transparency and governance Chief Data Officers Make data a strategic business asset utilizing scientific skills from basic spreadsheet knowledge
  • 5. Big Data Needs Data Quality • Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics • 92% of executives are concerned about the negative impact of data and analytics on corporate reputation • New survey indicates nearly 80% of AI/ML projects stalling due to poor data quality • 84% of CEOs are concerned about the quality of the data they’re basing decisions on
  • 6. Data Quality Challenges of Big Data Profiling Data • Organizations are storing vast amounts of data in data lakes and the Cloud – from many different sources – but that data isn’t usable unless it is understood, and to understand it, the business users who work with the data must be able to access and profile it without constant IT help Matching Entities Accurately • Distinguishing matches that indicate a single specific entity across so much data requires sophisticated multi-field matching algorithms – that need to be understandable by business users to be meaningful Scalability • Distinguishing matches across massive datasets requires a lot of compute power: everything has to be compared to everything else, multiple times in multiple ways • Taking advantage of Big Data processing for scalability requires specialized skills, takes a long time, and requires tuning and re-writing as technology changes • Traditional data quality tools are not designed to work on that scale of data
  • 7. Trillium DQ for Big Data: Understand, Evaluate, and Resolve Big Data Quality Problems Trillium Discovery for Big Data (Data Profiling): Gain a complete picture of your data before use • Understand the data • Analyze the data • Find data quality problems • Build and evaluate data quality rules Trillium Quality for Big Data (Data Cleansing and Matching): Cleanse, standardize, and connect data in accordance with your predefined standards • Entity matching and resolution • Data cleansing and correction • Data record enrichment Trillium DQ for Big Data On Premises or via Trillium Cloud: Deploy any or all products to the cloud - Completely managed SaaS in AWS or Azure
  • 8. Data Quality for Your Big Data Needs Feature-rich data profiling and data quality processing engines • Leveraging over two decades of data quality expertise An efficient orchestration of these engines in Big Data distributed frameworks • Powered by an architecture that has been in production with very large (2000+ node) environments running natively across the cluster • Close partnership with Cloudera and Hortonworks, with native integration with the stack • Syncsort has been a major contributor to the Apache Hadoop open source project • With efficient orchestration, we can process any number of attributes with a handful of MapReduce jobs • The same architecture is used for Apache Spark “Design once, deploy anywhere” architecture • Native connectivity providing breadth and performance • “Intelligent Execution” to optimize process execution at run-time (MapReduce, Spark 1.x, Spark 2.x) • On premises and in the cloud (e.g. Amazon EMR)
  • 9. Trillium Discovery for Big Data: Key Outcomes • Reduce the time for business analysts to discover and understand data on Big Data platforms • Allow business analysts who understand the data but have little technical expertise to quickly find data and run data profiling in three steps • Let analysts explore results and drill down to details within 2-5 seconds per view to review and then report on data issues to business leaders • Scale to large volumes of data sources & attributes so that business analysts can understand the contents of any data source needed for business decisions • Data is always secured in process and at rest and only available to authorized users to comply with regulations and avoid fines
  • 10. Trillium Discovery for Big Data • Delivers enterprise-trusted Trillium Discovery on distributed big data platforms (e.g. Hadoop, Spark) for high-volume, scalable data profiling • Provides complete Trillium Discovery data profiling for analysis & review • Attribute metadata, value & pattern frequencies, key & dependency analysis, cross-source join analysis, drill down to any outlier or issue, and more… • Provides easily configured native connectivity for Big Data sources • Provides managing and monitoring for task execution • Integrates with the security frameworks (Kerberos, AD, LDAP) of Big Data platforms
  • 11. Trillium Discovery for Big Data – Data Profiling at Scale [Diagram. Steps: Select Source, Run Profiling (tasks 1 to n), Explore Profiles, Evaluate Business Rules, Drilldown to Issues. Native Connectors: HDFS source directories, … Stored Profiling Results: Metadata & Statistics, Frequency Distributions, Drilldown Indices. Share & Govern Results: Integration (APIs), Notification, Collaboration.]
  • 12. Trillium Quality for Big Data: Key Outcomes • Match and link any data entity – customers, suppliers, products, etc. – into a trusted single view to support a broad array of business-critical use cases (e.g. Customer 360, fraud, AML) • Parse and standardize complex multi-domain data, extended with enrichment and verification of critical address and geolocation data – all leveraging out-of-the-box templates • Utilize a “design once, deploy anywhere” approach to speed time-to-value and focus on building data quality business logic while letting the product handle the technical aspects of framework execution with no coding or tuning required • Leverage the high-performance compute power of distributed Big Data frameworks including Hadoop MapReduce and Spark to process high volumes within targeted time windows to meet critical Service Level Agreements (SLAs)
  • 13. Trillium Quality for Big Data • Integrate, parse, standardize, and match new and legacy customer data from multiple disparate sources. • Provide high-quality entity resolution through multi-domain deduplication and matching with the most comprehensive set of match comparisons available, including fuzzy matching, distance comparisons, and more. • Standardize, enhance, and match international data sets with postal and country-code validation. • Deploy data quality workflows as native, parallel MapReduce or Spark processes for optimal efficiency. • Process hundreds of millions of records of data. • Increase processing efficiency. • Support failover through Hadoop’s fault-tolerant design; during a node failure, processing is redirected to another node.
  • 14. Trillium Quality for Big Data – Data Cleansing at Scale Boost the effectiveness of machine learning and AI with complete, standardized, matched data. 1. Visually create and test data quality processes locally 2. Execute in MapReduce or Spark on a Big Data platform, on premises or in the Cloud
  • 15. Syncsort Trillium Delivers Data You Can Trust Trillium Discovery for Big Data: Data Profiling; Business Rules & Data Quality Assessment. Trillium Quality for Big Data + Global Address Verification: Data Validation, Standardization, Enrichment & more; Matching, Entity Resolution & Verification. Trillium DQ for Big Data supports: • Customer 360 • AI/ML • Operational Integrations • Analytics & Reporting • Data Governance
  • 16. Trillium DQ for Big Data Use Cases
  • 17. Trillium DQ for Big Data evaluates & transforms your Big Data for trusted business insights • Turn your Big Data into a trusted view of your customers, products and more • Power machine learning and advanced analytics with reliable, fit-for-purpose data • Gain actionable business insights from high-volume disparate data sets from across the enterprise • Deploy industry-leading data quality processes at massive scale, with no coding or Big Data skills required
  • 18. Anti-Money Laundering on Hadoop at Global Bank CHALLENGE Bank must monitor transactions to detect Money Laundering for FCA compliance. Machine learning can detect patterns, but … Requires large amounts of current, clean data. • Must provide highly accurate entity resolution • Must be secure – Kerberos, LDAP • Must have lineage – data origin to end point • Massive data volumes • Scattered data – Mainframe, RDBMS, Cloud, … • Must archive unaltered mainframe data SOLUTION • Trillium DQ for Big Data • Connect CDC • Connect for Big Data Full Anti-Money Laundering regulatory compliance with financial crimes data lake – high performance results at massive scale. • Full end-to-end data lineage supplied to Apache Atlas and ASG Data Intelligence • Cluster-native data verification, enrichment, and demanding multi-field entity resolution on Spark • Unmodified mainframe “Golden Records” stored on Hadoop
  • 19. Trillium DQ for Big Data Cleanses Credit Data for Creditsafe CHALLENGE Ensure ALL DATA on each company is analyzed – and NO DATA from another company is accidentally included – to get accurate corporate credit ratings. • Need to profile, cleanse and enhance data to evaluate credit ratings for 80 million companies in U.S. alone • Existing solution lacked flexible de-dupe matching rules, scalability • Millions of records to analyze per company, in multiple inconsistent data sources, about 800 million/day total and growing • Solution must scale! SOLUTION • Amazon EMR Cloud • Trillium DQ for Big Data cleansed, standardized and matched over 130 million records/hour on basic 10-node test cluster – met the business SLA with room to grow • 96% Address Matching Accuracy after Trillium cleansing, standardization • Saved software costs – Replaced multiple solutions and tools • Saved Amazon cluster costs and left room for company growth “We can’t afford to miss information, or mix up information about businesses with similar names. Companies count on our highly accurate predictive scoring to provide fast, accurate ratings for their potential customers and vendors.”
  • 20. Next Steps For more information on Trillium DQ for Big Data and our other Syncsort Trillium data quality solutions, please visit: https://www.syncsort.com/en/products/trillium-dq-for-big-data And: https://www.syncsort.com/en/integrate
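
For readers unfamiliar with the terms used in slides 10 and 13, the following is a minimal Python sketch of two of the ideas named there: value and pattern frequencies for profiling one attribute, and a weighted multi-field fuzzy match score for comparing two records. It is purely illustrative and is not Trillium code or its API. The function names (`value_pattern`, `profile_column`, `field_similarity`, `match_score`), the `A`/`9` pattern alphabet, and the field weights are all assumptions made for this example, and `difflib.SequenceMatcher` stands in for whatever comparison routines the product actually uses.

```python
from collections import Counter
from difflib import SequenceMatcher


def value_pattern(value: str) -> str:
    """Map a value to a character-class pattern (assumed scheme: letter -> A, digit -> 9)."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)  # punctuation and spaces are kept as-is
    return "".join(out)


def profile_column(values):
    """Return simple profiling statistics for one attribute (column)."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "value_freq": Counter(non_null),
        "pattern_freq": Counter(value_pattern(v) for v in non_null),
    }


def field_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] after trimming and lower-casing both values."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted average of per-field similarities (weights are hypothetical)."""
    total = sum(weights.values())
    weighted = sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in weights.items())
    return weighted / total


if __name__ == "__main__":
    # Pattern frequencies expose the one ZIP code that contains a letter.
    print(profile_column(["10001", "1000A", "", "10001"])["pattern_freq"])

    a = {"name": "Jon Smith", "city": "Boston"}
    b = {"name": "John Smith", "city": "Boston"}
    print(round(match_score(a, b, {"name": 2, "city": 1}), 3))
```

A pattern frequency table like this makes outliers visible without reading every value. The match score shows why multi-field matching is expensive at scale: scoring every record against every other record grows quadratically with the number of records, which is the scalability concern raised in slide 6.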