SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
IBM Software
Integrating and governing big data
Does big data spell big trouble for integration?
Not if you follow these best practices
Integrating and governing big data
1 2 3 4 5Introduction Integration and
governance
requirements
for big data
Best practices:
Integrating and
governing big
data effectively
IBM InfoSphere
delivers the
confidence to
act on big data
Why InfoSphere?
Integrating and governing big data
3
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Introduction
Business leaders are eager to harness
the power of big data. However, as the
opportunity increases, ensuring that source
information is trustworthy and protected
becomes exponentially more difficult. If
this trustworthiness issue is not addressed
directly, end users may lose confidence
in the insights generated from their data—
which can result in a failure to act on
opportunities or against threats.
To make the most of big data, you have to
start with data you trust. But the sheer
volume and complexity of big data means
that the traditional, manual methods of
discovering, governing and correcting
information are no longer feasible. Information
integration and governance must be
implemented within big data applications,
providing appropriate governance
and rapid integration from the start.
1	Introduction
By automating information integration and
governance and employing it at the point
of data creation, organizations can boost
big data confidence.
A solid integration and governance program
must include automated discovery, profiling
and understanding of diverse data sets to
provide context and enable employees
to make informed decisions. It must be
agile to accommodate a wide variety of
data and seamlessly integrate with diverse
technologies, from data marts to Apache
Hadoop systems. And it must automatically
discover, protect and monitor sensitive
information as part of big data applications.
Big data is a phenomenon, not a technology
With all the hype about big data, it’s easy to think that big data can solve all your problems. But big
data isn’t a technology—it’s a phenomenon. To leverage it effectively, you must be able to integrate
and govern key data throughout your enterprise.
Integrating and governing big data
4
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Whenever the topic of big data arises,
discussions often turn to analytics and
Hadoop. Interestingly, big data analytics
have been shifting recently toward
structured data and away from its origins in
unstructured data. But while analytics and
Hadoop are important for both structured
and unstructured data, they represent just
one piece of the big data puzzle.
Forward-thinking IT professionals now
realize that the phenomenon of big data
is affecting all of their systems, creating
a new set of requirements that impact
the results of data warehousing and big
data and analytics initiatives. To ensure
the best results, data from big data
sources must be integrated, governed
and trusted.
Many of the most common challenges
associated with big data aren’t really
analytics problems. In many cases,
these problems are fundamental—even
“traditional”—information integration
problems, and they can be avoided or
addressed with an agile, enterprise-class
data integration and governance solution.
Additionally, new big data sources are not
useful if they exist in a silo—they must be
integrated into your enterprise architecture.
Integration and governance requirements for big data
The best solutions form a solid, integrated
foundation that facilitates the analytics
work that yields valuable, actionable
business information.
Appropriate solutions for integrating and
governing big data should:
1.	Be agile
2.	Be built on a high-performance,
scalable architecture
3.	Support greater efficiency
4.	Help create confidence and trust in
the veracity of data
5.	Meet your requirements for flexible,
agile delivery of data
2	 Integration and
governance requirements
for big data
Integrating and governing big data
5
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Best practices: Integrating and governing big data effectively
Several best practices for integration and
governance can help you make the most of
big data in your organization.
Embrace IT agility for performance
and scalability
Big data streams in at high velocity—so
performance is key. Data changes rapidly,
and it must be fed to various applications in
the system quickly so that business leaders
can react to changing market conditions as
soon as possible.
To successfully handle big data, organizations
need an enterprise-class data integration
solution that is:
•	 Dynamic to meet your current and future
performance requirements.
•	 Extendable and partitioned for fast and
easy scalability.
•	 Integrated with Hadoop. Hadoop itself
is not an integration platform, but it can
be leveraged as part of an integration
architecture—to land and determine the
value of data, as well as for balanced
optimization.
Scalability is one of the most challenging
big data integration requirements, since
business requirements can evolve very
quickly. Consequently, when tackling big
data integration, it’s important that you have
a product that can achieve data scalability
across all architectures with the same
function and with linear speedup, scaling
“N way” without issue.
3	 Best practices:
Integrating and governing
big data effectively
1x
One core
SMP
(GBs)
SMP/MPP
(Hundreds of GB)
MPP/Grid
(Hundreds of TB)
Hundreds of cores
x
y
• Same functionality
• All architectures
• Linear speedup
• Decision (at runtime) to scale “N way”
Data scalability across hardware architectures
2 way
8 way
16 way
64 way
Integrating and governing big data
6
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Work smarter, not harder—and
control costs
Employee time is a valuable and costly
resource. An integration solution for big
data that supports employee productivity
and efficiency helps to improve the
enterprise’s bottom line, eliminate
bottlenecks and enhance agility.
For IT departments, service-level agreements
(SLAs) are often impacted by inefficiencies.
As data volume, variety, velocity and veracity
grows, the time required to process data
integration jobs frequently exceeds the window
allowed by SLAs, meaning that IT is no longer
meeting the needs of internal customers.
To improve productivity, it’s important to
create design logic for Hadoop-oriented
data integration efforts using the same
interface, concepts and logic constructs as
3	 Best practices:
Integrating and governing
big data effectively
for any other deployment method. This
eliminates the need to learn new coding
languages as they evolve and perform
hand-coding and replicating work.
End:
Integrated
data
Start:
Data
sources
Working harder Working smarter
End:
Integrated
data
Start:
Data
sources Multiple data
handoffs between
systems
Data gathered and passed directly
to real-time analytics processes
Single interface for integration tasks
Automated processes and elements
limit hand-coding and replication
Predetermined, confirmed set of
concepts and logic constructs
Support for multiple data sources
and streaming dataWork in multiple
interfaces
Working with different
code languages
Staffing bottlenecks
due to manual processes
Slow analysis speeds require longer
processing windows and more downtime
Integrating and governing big data
7
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
For big data projects focused on real-time
analytical processing, it is also critical to
quickly and easily integrate with systems
that support streaming data (also known
as “data in motion”). Big data integration
solutions should be “smart” enough to allow
standard data integration conventions to
gather and pass data directly to real-time
analytics processes.
Create confidence with accurate,
timely data
Companies usually tackle big data to
augment and improve their existing
analytics capabilities, either through
analyzing new data sources or by tackling
greater volumes of data—neither of which
is possible with traditional technologies.
However, analytics insight is only as
good as the underlying data. If companies
aren’t feeding their analytics systems
with quality data, the insight they gain
is invalid.
Without the ability to agree on and leverage
common definitions for key business terms,
businesses simply cannot be responsive
and adaptable. When departments have
inconsistent definitions for key terms,
decisions cannot be made with the
necessary speed and accuracy. For example,
what happens when someone on the
marketing side asks for “customer” data to
analyze—but receives just a subset of the
data they actually need to make a decision
because the IT team defined “customer” as
a household instead of an individual?
3	 Best practices:
Integrating and governing
big data effectively
Integrating and governing big data
8
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Unfortunately, it isn’t enough to simply
establish definitions and policies for
information, and then hope that people will
follow the rules. To be confident that their
data is trustworthy, organizations must be
able to trace its path through their systems
so they can see where it came from and
how it was manipulated. It’s important to
have a big data integration solution that can
support this level of transparency.
To ensure high-quality data, it is also critical to
have information analysis capabilities that
enable data stewards to test data quality.
For example, stewards might perform a
simple null check to ensure all the fields and
tables they are analyzing actually contain
data. In another scenario, they might run
their data against sophisticated algorithms
to determine its validity. This information
is most useful in a dashboard view, so
business analysts can quickly determine
whether there are any issues and easily
get into the details.
It is important to apply data cleansing
to any big data you want to retain so
you can establish confidence in your
data. Confidence in data quality enables
confidence in analytics results.
Cleansing data as part of the information integration cycle
helps ensure data quality for the rest of the process.
3	 Best practices:
Integrating and governing
big data effectively
Data
sources
Deliver
Create and
maintain
quality
Cleansing
Business
initiatives
Applying data cleansing in the integration
and governance workflow
Understand
and govern
Transform
Information
integration
Integrating and governing big data
9
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Real-time integration
provides flexibility for
transactional integrity
plus high-volume, low-
latency replication for
continuous business
availability.
Self-service data
integration enables
line-of-business and
other nontechnical users
to get information
whenever needed to
fuel their analytics.
Deliver data appropriately
When approaching big data integration
projects, you want to achieve high
performance and scalability for real-time
data processing, as well as for bulk or batch
movement. In many cases, organizations
also need to leverage data replication or
virtualization as part of their larger big
data integration solution. This is true for
traditional data integration as well as for big
data integration. Here are several good
styles of data delivery that can be used
along with big data platforms:
High-speed bulk data
delivery, including ETL,
ELT and dynamic
integration that leverage
Hadoop to support
information exchange
with big data sources.
Virtualized access
to and delivery from
diverse and distributed
information allows virtual
consolidation of big (and
regular) data.
ETL
Log
3	 Best practices:
Integrating and governing
big data effectively
IBM InfoSphere
Information Server for
Data Integration
IBM InfoSphere
Federation Server
IBM InfoSphere Data
Replication
IBM InfoSphere
Data Click
Integrating and governing big data
10
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Leverage data replication
As the amount and variety of data in your
environment builds, it will become less
practical to maintain physical pools of data.
To remain flexible and agile in the big data
world, enterprises must leverage different
technologies—including incremental data
delivery—to ensure they have the data they
need. Data transformation and delivery
requirements have broadened from batch
and bulk data movement to also include
real-time data transfer based on data
replication capabilities—specifically around
change data capture (CDC). Whereas batch
and bulk data movement happens relatively
infrequently, real-time data delivery occurs
whenever data at the source changes. The
changed data is captured, transferred and
transformed, and then loaded into the target.
Three factors impact the performance and
scalability of real-time data transformations:
1.	The approach used to capture a
change at the source or sources.
The most flexible and efficient option for
capturing changes at the source is for a
CDC mechanism to “push” changes as
data streams. As soon as source data
is modified, the mechanism becomes
aware of the alteration and forwards
the data.
2.	The mechanism used. Many
mechanisms can be used for CDC. When
properly implemented, a log-based
capture approach often has a lower
impact on the source database, resulting
in higher overall performance.
3.	Temporary data persistence. Whether
data is temporarily persisted also impacts
CDC performance. Ideally, an organization
would be able to stream changes without
persisting them to increase performance
(since data does not need to be written
to disk and then accessed by a
transformation engine).
For more information about deriving real-
time insight from big data using data
replication, download this IBM white paper.
3	 Best practices:
Integrating and governing
big data effectively
Integrating and governing big data
11
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Virtualize data
Given the massive upswings in the volume,
variety, velocity and veracity of data, the
question of data access is more relevant
than ever before. Data virtualization
technologies can help create the pool of
data you need to support your business.
Data virtualization focuses on simplifying
access to data by isolating the details of
storage and retrieval and making the
process transparent to data consumers.
By doing so, data virtualization reduces the
time required to take advantage of disparate
data, which makes it easier for users and
processes to get the information they need
in a timely manner.
Two primary strategies exist for data
virtualization: data federation and data
services. In both cases, data is exposed to
make it more consumable, accessible and
reusable by users, customers or business
processes throughout the enterprise.
3	 Best practices:
Integrating and governing
big data effectively
Integrating and governing big data
12
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
More about InfoSphere Information Server
Want a deeper dive into the InfoSphere Information Server capabilities that help you support agile integration, business-driven governance and
sustainable quality? Check out this resources website: ibm.com/software/data/integration/info_server/demo.html
IBM InfoSphere delivers the confidence to act on big data
While the term “big data” has only recently
come into vogue, IBM has designed
solutions capable of handling very large
quantities of data for decades. The
company has long led the way with data
integration, management, security and
analytics solutions that are known for their
reliability, flexibility and scalability.
The end-to-end information integration
capabilities of IBM® InfoSphere® Information
Server are designed to help you understand,
cleanse, monitor, transform and deliver
data—as well as collaborate to bridge the
gap between business and IT. InfoSphere
Information Server enables you to be
confident that the information that drives
your business and your strategic initiatives,
from big data and point-of-impact analytics
to master data management and data
warehousing, is trusted, consistent and
governed in real time. In fact, InfoSphere
Information Server is 10-15 times faster
than Hadoop for data integration.1
Be fast and agile
Organizations working with big data
need unlimited data scalability from their
integration software. InfoSphere software is
designed from the ground up to optimize the
usage of hardware resources, allowing the
maximum amount of data to be processed
per node. It has powerful data transformation
and delivery capabilities, enabling clients to
process data on massively parallel systems,
eliminating bottlenecks and dramatically
improving time-to-value.
4	 IBM InfoSphere delivers
the confidence to act
on big data
Integrating and governing big data
13
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
The University of Arizona speeds data
access with InfoSphere Information Server
With over 38,000 students and faculty, the
University of Arizona’s infrastructure carries
a heavy load. To stay competitive, it needed
to replace aging administrative computer
systems that couldn’t handle the demand for BI
information. According to Manav Mehra, senior
manager of information integration for enterprise
information and analytics at the University of
Arizona, the organization wanted a single source
of data that users could easily query at their
convenience and get results in a timely manner.
The University’s BI team used InfoSphere Information
Server to build that single source of trusted data;
the team employed the software to understand,
cleanse, transform and deliver data from source
systems into its enterprise data warehouse.
The solution includes tools to help BI staff:
•	 Discover, model, visualize, relate and standardize
diverse and distributed data sets
•	 Capture and define business requirements
in a common familiar format to support the
development of extract, transform and load
(ETL) jobs
•	 Gain insight into data source analysis, ETL
processes, data quality rules, business
terminology, data models and BI reports
“On average, InfoSphere Information Server
software saves us around six hours per developer
in terms of data modeling and ETL job creation,”
says Mehra. “We had two graduate students from
our MIS department help us create ETL jobs and
they were able to build close to 9,000 ETL jobs from
a template within three months. But more important
for me is the amount of time it saves in finding and
fixing data problems.”
Mehra says the team can run more than 22,000
nightly ETL jobs in 2.5 hours compared to 9 hours
before they introduced InfoSphere Information
Server. And in the six months since deployment, use
of the organization’s enterprise data warehouse has
significantly increased—a sign that users are finding
the information they need.
Learn more about the University’s experience here.
4	 IBM InfoSphere delivers
the confidence to act
on big data
Integrating and governing big data
14
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Be efficient
InfoSphere Information Server includes
capabilities that help make better use of
employee time. For example, Version 9.1
includes InfoSphere Data Click, which
greatly simplifies self-service data
integration and provisioning. As a result,
line-of-business staff can do these tasks
themselves, while skilled IT engineers focus
on higher-value efforts.
InfoSphere Information Server also saves
developer time by providing a single design
palette in a shared application environment.
Developers don’t have to flip through
different interfaces, since everything they
need is easily accessible. In addition, every
InfoSphere Information Server component
uses the same metadata layer. This makes
it simple to track job progress and diagnose
problems quickly. There is also a dashboard
to provide a unified view of the environment.
As big data stores continue to grow, these
high-performance and time-saving features
become even more important. For IT
departments, they can mean the difference
between meeting SLAs or not, between
having time to work on innovative new projects
or low-value efforts just managing existing
systems. For the business, it can mean
quicker, more informed decision making—
leading to stronger profits, better service for
customers and competitive advantage.
4	 IBM InfoSphere delivers
the confidence to act
on big data
InfoSphere Information Server for
Data Integration
Integration is the name of the game when it
comes to boosting accuracy and efficiency.
Watch this video and find out how InfoSphere
Information Server helps you bring data sources
together. Download the video at ibm.co/13jL5mr
Integrating and governing big data
15
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Be confident
Many enterprises have improved data
quality by implementing data governance.
Ideally, a data governance initiative will
encompass three functions: definition of
terms, cleansing of existing data and data
quality monitoring.
To help individuals across the organization
reach a shared understand of key terms,
InfoSphere Information Server provides a
data glossary, which allows business and
IT to create and agree on definitions, rules
and policies. Data modeling capabilities also
are included, enabling data architects to
determine where each piece of data will
come from and where it will go. These tools
allow organizations to establish “truth”—at
least as it relates to business data.
To be truly confident that data is trustworthy,
however, organizations must also be able to
trace the path of data through their systems.
To support this level of transparency,
InfoSphere Information Server provides
metadata and lineage capabilities that
allow users to track data back to its original
source and see every calculation performed
on it along the way.
InfoSphere Information Server for Data Quality in action
Learn more about the four phases of cleansing and standardizing data for maximum quality—and
how InfoSphere Information Server for Data Quality brings them all together. Download the video
and watch now: ibm.co/17yl8nC
Furthermore, InfoSphere Information Server
provides data quality capabilities to help
cleanse data and monitor quality on an
ongoing basis. Cleansing capabilities
include sophisticated tools for investigation,
standardization, matching and survivorship,
enabling data stewards to fix any problems
they find during their analysis. For example,
names should be matched automatically,
so that “William Smith” and “Bill Smith” are
listed as a single customer.
4	 IBM InfoSphere delivers
the confidence to act
on big data
Integrating and governing big data
16
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Be flexible
In many cases, “fast” isn’t enough.
Delivering real-time integration is about
flexibility, not just speed. One way to obtain
that data is to run queries against the
databases or applications where the data
resides. However, that approach can slow
down transactional systems so much that
business leaders find it unacceptable.
A better approach: use a solution, such as
InfoSphere Data Replication, that quickly
captures ever-changing data and sends it
wherever it needs to go, giving business
managers an up-to-the-second view of vital
information without slowing business-critical
processes. InfoSphere Data Replication
uses a “push” CDC mechanism as data
streams to provide flexibility and efficiency.
It also employs log-based capture to reduce
source database impact, and it streams
changed data without data persistence to
increase performance.
Depending on your big data integration
requirements, data federation can also help
you adapt to your big data requirements.
IBM InfoSphere Federation Server quickly
creates a consolidated view of your data to
support key business processes and
decisions. You can access and integrate
diverse data and content sources as if they
were a single resource—regardless of
where the information actually resides.
The four Vs of big data
How do you deal with the volume, velocity,
variety and veracity of big data? InfoSphere
Data Replication gives you the near–real-time
capabilities you need to adjust product offers,
deliver reliable data and more. Download the
video and learn more about the power of
flexibility: ibm.co/11cy27N
4	 IBM InfoSphere delivers
the confidence to act
on big data
Integrating and governing big data
17
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Why InfoSphere?
As a critical element of IBM Watson™
Foundations, the IBM big data and analytics
platform, InfoSphere Information Integration
and Governance (IIG) provides market-
leading functionality to handle the challenges
of big data. InfoSphere IIG provides optimal
scalability and performance for massive data
volumes, agile and rightsized integration
and governance for the increasing velocity
of data, and support and protection for a
wide variety of data types and big data
systems. InfoSphere IIG helps make big
data and analytics projects successful by
delivering business users the confidence to
act on insight.
InfoSphere capabilities include:
•	 Metadata, business glossary and
policy management: Define metadata,
business terminology and governance
policies with IBM InfoSphere Information
Governance Catalog.
•	 Data integration: Handle all integration
requirements, including batch data
transformation and movement (InfoSphere
Information Server), real-time replication
(InfoSphere Data Replication) and data
federation (InfoSphere Federation Server).
•	 Data quality: Parse, standardize, validate
and match enterprise data with IBM
InfoSphere Information Server for
Data Quality.
•	 Master data management (MDM):
Act on a trusted view of your customers,
products, suppliers, locations and
accounts with InfoSphere MDM.
•	 Data lifecycle management: Manage
the data lifecycle from test data creation
through retirement and archiving with IBM
InfoSphere Optim™.
•	 Data security and privacy: Continuously
monitor data access and protect
repositories from data breaches, and
support compliance with IBM InfoSphere
Guardium®. Ensure that sensitive
data is masked and protected with
InfoSphere Optim.
5	 Why InfoSphere?
Integrating and governing big data
18
1	Introduction 2	 Integration and
governance requirements
for big data
3	 Best practices:
Integrating and governing
big data effectively
4	 IBM InfoSphere delivers
the confidence to act
on big data
5	 Why InfoSphere?
Additional resources
To learn more about the IBM approach to information integration and governance for big
data, please contact your IBM representative or IBM Business Partner, or check out
these resources:
•	 ibm.com/software/data/information-integration-governance
•	 ibm.com/software/data/infosphere/information-integration-big-data
•	 ibm.com/software/data/integration/info_server
•	 InfoSphere Information Server: A Forrester Total Economic Impact Study
•	 Delivering Trusted Information for Big Data and Data Warehousing:
A Ventana Research Report
•	 Gartner: Hadoop Is Not a Data Integration Solution
•	 ITG: Business Case for Enterprise Data Integration Strategy: Comparing IBM
InfoSphere Information Server and Open Source Tools
5	 Why InfoSphere?
IMM14125-USEN-03
© Copyright IBM Corporation 2014
IBM Corporation
Software Group
Route 100
Somers, NY 10589
Produced in the United States of America
December 2014
IBM, the IBM logo, ibm.com, Guardium, IBM Watson, InfoSphere,
and Optim are trademarks of International Business Machines
Corp., registered in many jurisdictions worldwide. Other product
and service names might be trademarks of IBM or other
companies. A current list of IBM trademarks is available on the
web at ibm.com/legal/copytrade.shtml
This document is current as of the initial date of publication and
may be changed by IBM at any time. Not all offerings are available
in every country in which IBM operates.
It is the user’s responsibility to evaluate and verify the operation of
any other products or programs with IBM products and programs.
THE INFORMATION IN THIS DOCUMENT IS PROVIDED
“AS IS” WITHOUT ANY WARRANTY, EXPRESS OR
IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND ANY WARRANTY OR CONDITION OF
NON-INFRINGEMENT. IBM products are warranted according
to the terms and conditions of the agreements under which they
are provided.
The client is responsible for ensuring compliance with laws and
regulations applicable to it. IBM does not provide legal advice or
represent or warrant that its services or products will ensure that the
client is in compliance with any law or regulation.
1
IBM internal testing.
Please Recycle

Contenu connexe

Dernier

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfSayantanBiswas37
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 

Dernier (20)

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 

En vedette

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Big Data needs governance ! Integrating and governing big data, with IBM Solutions.

  • 1. IBM Software Integrating and governing big data Does big data spell big trouble for integration? Not if you follow these best practices
  • 2. Integrating and governing big data 1 2 3 4 5Introduction Integration and governance requirements for big data Best practices: Integrating and governing big data effectively IBM InfoSphere delivers the confidence to act on big data Why InfoSphere?
  • 3. Integrating and governing big data 3 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Introduction Business leaders are eager to harness the power of big data. However, as the opportunity increases, ensuring that source information is trustworthy and protected becomes exponentially more difficult. If this trustworthiness issue is not addressed directly, end users may lose confidence in the insights generated from their data— which can result in a failure to act on opportunities or against threats. To make the most of big data, you have to start with data you trust. But the sheer volume and complexity of big data means that the traditional, manual methods of discovering, governing and correcting information are no longer feasible. Information integration and governance must be implemented within big data applications, providing appropriate governance and rapid integration from the start. 1 Introduction By automating information integration and governance and employing it at the point of data creation, organizations can boost big data confidence. A solid integration and governance program must include automated discovery, profiling and understanding of diverse data sets to provide context and enable employees to make informed decisions. It must be agile to accommodate a wide variety of data and seamlessly integrate with diverse technologies, from data marts to Apache Hadoop systems. And it must automatically discover, protect and monitor sensitive information as part of big data applications. Big data is a phenomenon, not a technology With all the hype about big data, it’s easy to think that big data can solve all your problems. But big data isn’t a technology—it’s a phenomenon. To leverage it effectively, you must be able to integrate and govern key data throughout your enterprise.
  • 4. Integrating and governing big data 4 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Whenever the topic of big data arises, discussions often turn to analytics and Hadoop. Interestingly, big data analytics have been shifting recently toward structured data and away from its origins in unstructured data. But while analytics and Hadoop are important for both structured and unstructured data, they represent just one piece of the big data puzzle. Forward-thinking IT professionals now realize that the phenomenon of big data is affecting all of their systems, creating a new set of requirements that impact the results of data warehousing and big data and analytics initiatives. To ensure the best results, data from big data sources must be integrated, governed and trusted. Many of the most common challenges associated with big data aren’t really analytics problems. In many cases, these problems are fundamental—even “traditional”—information integration problems, and they can be avoided or addressed with an agile, enterprise-class data integration and governance solution. Additionally, new big data sources are not useful if they exist in a silo—they must be integrated into your enterprise architecture. Integration and governance requirements for big data The best solutions form a solid, integrated foundation that facilitates the analytics work that yields valuable, actionable business information. Appropriate solutions for integrating and governing big data should: 1. Be agile 2. Be built on a high-performance, scalable architecture 3. Support greater efficiency 4. Help create confidence and trust in the veracity of data 5. Meet your requirements for flexible, agile delivery of data 2 Integration and governance requirements for big data
  • 5. Integrating and governing big data 5 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Best practices: Integrating and governing big data effectively Several best practices for integration and governance can help you make the most of big data in your organization. Embrace IT agility for performance and scalability Big data streams in at high velocity—so performance is key. Data changes rapidly, and it must be fed to various applications in the system quickly so that business leaders can react to changing market conditions as soon as possible. To successfully handle big data, organizations need an enterprise-class data integration solution that is: • Dynamic to meet your current and future performance requirements. • Extendable and partitioned for fast and easy scalability. • Integrated with Hadoop. Hadoop itself is not an integration platform, but it can be leveraged as part of an integration architecture—to land and determine the value of data, as well as for balanced optimization. Scalability is one of the most challenging big data integration requirements, since business requirements can evolve very quickly. Consequently, when tackling big data integration, it’s important that you have a product that can achieve data scalability across all architectures with the same function and with linear speedup, scaling “N way” without issue. 3 Best practices: Integrating and governing big data effectively 1x One core SMP (GBs) SMP/MPP (Hundreds of GB) MPP/Grid (Hundreds of TB) Hundreds of cores x y • Same functionality • All architectures • Linear speedup • Decision (at runtime) to scale “N way” Data scalability across hardware architectures 2 way 8 way 16 way 64 way
  • 6. Integrating and governing big data 6 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Work smarter, not harder—and control costs Employee time is a valuable and costly resource. An integration solution for big data that supports employee productivity and efficiency helps to improve the enterprise’s bottom line, eliminate bottlenecks and enhance agility. For IT departments, service-level agreements (SLAs) are often impacted by inefficiencies. As data volume, variety, velocity and veracity grows, the time required to process data integration jobs frequently exceeds the window allowed by SLAs, meaning that IT is no longer meeting the needs of internal customers. To improve productivity, it’s important to create design logic for Hadoop-oriented data integration efforts using the same interface, concepts and logic constructs as 3 Best practices: Integrating and governing big data effectively for any other deployment method. This eliminates the need to learn new coding languages as they evolve and perform hand-coding and replicating work. End: Integrated data Start: Data sources Working harder Working smarter End: Integrated data Start: Data sources Multiple data handoffs between systems Data gathered and passed directly to real-time analytics processes Single interface for integration tasks Automated processes and elements limit hand-coding and replication Predetermined, confirmed set of concepts and logic constructs Support for multiple data sources and streaming dataWork in multiple interfaces Working with different code languages Staffing bottlenecks due to manual processes Slow analysis speeds require longer processing windows and more downtime
  • 7. Integrating and governing big data 7 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? For big data projects focused on real-time analytical processing, it is also critical to quickly and easily integrate with systems that support streaming data (also known as “data in motion”). Big data integration solutions should be “smart” enough to allow standard data integration conventions to gather and pass data directly to real-time analytics processes. Create confidence with accurate, timely data Companies usually tackle big data to augment and improve their existing analytics capabilities, either through analyzing new data sources or by tackling greater volumes of data—neither of which is possible with traditional technologies. However, analytics insight is only as good as the underlying data. If companies aren’t feeding their analytics systems with quality data, the insight they gain is invalid. Without the ability to agree on and leverage common definitions for key business terms, businesses simply cannot be responsive and adaptable. When departments have inconsistent definitions for key terms, decisions cannot be made with the necessary speed and accuracy. For example, what happens when someone on the marketing side asks for “customer” data to analyze—but receives just a subset of the data they actually need to make a decision because the IT team defined “customer” as a household instead of an individual? 3 Best practices: Integrating and governing big data effectively
  • 8. Integrating and governing big data 8 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Unfortunately, it isn’t enough to simply establish definitions and policies for information, and then hope that people will follow the rules. To be confident that their data is trustworthy, organizations must be able to trace its path through their systems so they can see where it came from and how it was manipulated. It’s important to have a big data integration solution that can support this level of transparency. To ensure high-quality data, it is also critical to have information analysis capabilities that enable data stewards to test data quality. For example, stewards might perform a simple null check to ensure all the fields and tables they are analyzing actually contain data. In another scenario, they might run their data against sophisticated algorithms to determine its validity. This information is most useful in a dashboard view, so business analysts can quickly determine whether there are any issues and easily get into the details. It is important to apply data cleansing to any big data you want to retain so you can establish confidence in your data. Confidence in data quality enables confidence in analytics results. Cleansing data as part of the information integration cycle helps ensure data quality for the rest of the process. 3 Best practices: Integrating and governing big data effectively Data sources Deliver Create and maintain quality Cleansing Business initiatives Applying data cleansing in the integration and governance workflow Understand and govern Transform Information integration
  • 9. Integrating and governing big data 9 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Real-time integration provides flexibility for transactional integrity plus high-volume, low- latency replication for continuous business availability. Self-service data integration enables line-of-business and other nontechnical users to get information whenever needed to fuel their analytics. Deliver data appropriately When approaching big data integration projects, you want to achieve high performance and scalability for real-time data processing, as well as for bulk or batch movement. In many cases, organizations also need to leverage data replication or virtualization as part of their larger big data integration solution. This is true for traditional data integration as well as for big data integration. Here are several good styles of data delivery that can be used along with big data platforms: High-speed bulk data delivery, including ETL, ELT and dynamic integration that leverage Hadoop to support information exchange with big data sources. Virtualized access to and delivery from diverse and distributed information allows virtual consolidation of big (and regular) data. ETL Log 3 Best practices: Integrating and governing big data effectively IBM InfoSphere Information Server for Data Integration IBM InfoSphere Federation Server IBM InfoSphere Data Replication IBM InfoSphere Data Click
  • 10. Integrating and governing big data 10 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Leverage data replication As the amount and variety of data in your environment builds, it will become less practical to maintain physical pools of data. To remain flexible and agile in the big data world, enterprises must leverage different technologies—including incremental data delivery—to ensure they have the data they need. Data transformation and delivery requirements have broadened from batch and bulk data movement to also include real-time data transfer based on data replication capabilities—specifically around change data capture (CDC). Whereas batch and bulk data movement happens relatively infrequently, real-time data delivery occurs whenever data at the source changes. The changed data is captured, transferred and transformed, and then loaded into the target. Three factors impact the performance and scalability of real-time data transformations: 1. The approach used to capture a change at the source or sources. The most flexible and efficient option for capturing changes at the source is for a CDC mechanism to “push” changes as data streams. As soon as source data is modified, the mechanism becomes aware of the alteration and forwards the data. 2. The mechanism used. Many mechanisms can be used for CDC. When properly implemented, a log-based capture approach often has a lower impact on the source database, resulting in higher overall performance. 3. Temporary data persistence. Whether data is temporarily persisted also impacts CDC performance. Ideally, an organization would be able to stream changes without persisting them to increase performance (since data does not need to be written to disk and then accessed by a transformation engine). For more information about deriving real- time insight from big data using data replication, download this IBM white paper. 3 Best practices: Integrating and governing big data effectively
  • 11. Integrating and governing big data 11 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Virtualize data Given the massive upswings in the volume, variety, velocity and veracity of data, the question of data access is more relevant than ever before. Data virtualization technologies can help create the pool of data you need to support your business. Data virtualization focuses on simplifying access to data by isolating the details of storage and retrieval and making the process transparent to data consumers. By doing so, data virtualization reduces the time required to take advantage of disparate data, which makes it easier for users and processes to get the information they need in a timely manner. Two primary strategies exist for data virtualization: data federation and data services. In both cases, data is exposed to make it more consumable, accessible and reusable by users, customers or business processes throughout the enterprise. 3 Best practices: Integrating and governing big data effectively
  • 12. Integrating and governing big data 12 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? More about InfoSphere Information Server Want a deeper dive into the InfoSphere Information Server capabilities that help you support agile integration, business-driven governance and sustainable quality? Check out this resources website: ibm.com/software/data/integration/info_server/demo.html IBM InfoSphere delivers the confidence to act on big data While the term “big data” has only recently come into vogue, IBM has designed solutions capable of handling very large quantities of data for decades. The company has long led the way with data integration, management, security and analytics solutions that are known for their reliability, flexibility and scalability. The end-to-end information integration capabilities of IBM® InfoSphere® Information Server are designed to help you understand, cleanse, monitor, transform and deliver data—as well as collaborate to bridge the gap between business and IT. InfoSphere Information Server enables you to be confident that the information that drives your business and your strategic initiatives, from big data and point-of-impact analytics to master data management and data warehousing, is trusted, consistent and governed in real time. In fact, InfoSphere Information Server is 10-15 times faster than Hadoop for data integration.1 Be fast and agile Organizations working with big data need unlimited data scalability from their integration software. InfoSphere software is designed from the ground up to optimize the usage of hardware resources, allowing the maximum amount of data to be processed per node. It has powerful data transformation and delivery capabilities, enabling clients to process data on massively parallel systems, eliminating bottlenecks and dramatically improving time-to-value. 4 IBM InfoSphere delivers the confidence to act on big data
  • 13. Integrating and governing big data 13 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? The University of Arizona speeds data access with InfoSphere Information Server With over 38,000 students and faculty, the University of Arizona’s infrastructure carries a heavy load. To stay competitive, it needed to replace aging administrative computer systems that couldn’t handle the demand for BI information. According to Manav Mehra, senior manager of information integration for enterprise information and analytics at the University of Arizona, the organization wanted a single source of data that users could easily query at their convenience and get results in a timely manner. The University’s BI team used InfoSphere Information Server to build that single source of trusted data; the team employed the software to understand, cleanse, transform and deliver data from source systems into its enterprise data warehouse. The solution includes tools to help BI staff: • Discover, model, visualize, relate and standardize diverse and distributed data sets • Capture and define business requirements in a common familiar format to support the development of extract, transform and load (ETL) jobs • Gain insight into data source analysis, ETL processes, data quality rules, business terminology, data models and BI reports “On average, InfoSphere Information Server software saves us around six hours per developer in terms of data modeling and ETL job creation,” says Mehra. “We had two graduate students from our MIS department help us create ETL jobs and they were able to build close to 9,000 ETL jobs from a template within three months. But more important for me is the amount of time it saves in finding and fixing data problems.” Mehra says the team can run more than 22,000 nightly ETL jobs in 2.5 hours compared to 9 hours before they introduced InfoSphere Information Server. And in the six months since deployment, use of the organization’s enterprise data warehouse has significantly increased—a sign that users are finding the information they need. Learn more about the University’s experience here. 4 IBM InfoSphere delivers the confidence to act on big data
  • 14. Integrating and governing big data 14 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Be efficient InfoSphere Information Server includes capabilities that help make better use of employee time. For example, Version 9.1 includes InfoSphere Data Click, which greatly simplifies self-service data integration and provisioning. As a result, line-of-business staff can do these tasks themselves, while skilled IT engineers focus on higher-value efforts. InfoSphere Information Server also saves developer time by providing a single design palette in a shared application environment. Developers don’t have to flip through different interfaces, since everything they need is easily accessible. In addition, every InfoSphere Information Server component uses the same metadata layer. This makes it simple to track job progress and diagnose problems quickly. There is also a dashboard to provide a unified view of the environment. As big data stores continue to grow, these high-performance and time-saving features become even more important. For IT departments, they can mean the difference between meeting SLAs or not, between having time to work on innovative new projects or low-value efforts just managing existing systems. For the business, it can mean quicker, more informed decision making— leading to stronger profits, better service for customers and competitive advantage. 4 IBM InfoSphere delivers the confidence to act on big data InfoSphere Information Server for Data Integration Integration is the name of the game when it comes to boosting accuracy and efficiency. Watch this video and find out how InfoSphere Information Server helps you bring data sources together. Download the video at ibm.co/13jL5mr
  • 15. Integrating and governing big data 15 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Be confident Many enterprises have improved data quality by implementing data governance. Ideally, a data governance initiative will encompass three functions: definition of terms, cleansing of existing data and data quality monitoring. To help individuals across the organization reach a shared understand of key terms, InfoSphere Information Server provides a data glossary, which allows business and IT to create and agree on definitions, rules and policies. Data modeling capabilities also are included, enabling data architects to determine where each piece of data will come from and where it will go. These tools allow organizations to establish “truth”—at least as it relates to business data. To be truly confident that data is trustworthy, however, organizations must also be able to trace the path of data through their systems. To support this level of transparency, InfoSphere Information Server provides metadata and lineage capabilities that allow users to track data back to its original source and see every calculation performed on it along the way. InfoSphere Information Server for Data Quality in action Learn more about the four phases of cleansing and standardizing data for maximum quality—and how InfoSphere Information Server for Data Quality brings them all together. Download the video and watch now: ibm.co/17yl8nC Furthermore, InfoSphere Information Server provides data quality capabilities to help cleanse data and monitor quality on an ongoing basis. Cleansing capabilities include sophisticated tools for investigation, standardization, matching and survivorship, enabling data stewards to fix any problems they find during their analysis. For example, names should be matched automatically, so that “William Smith” and “Bill Smith” are listed as a single customer. 4 IBM InfoSphere delivers the confidence to act on big data
  • 16. Integrating and governing big data 16 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Be flexible In many cases, “fast” isn’t enough. Delivering real-time integration is about flexibility, not just speed. One way to obtain that data is to run queries against the databases or applications where the data resides. However, that approach can slow down transactional systems so much that business leaders find it unacceptable. A better approach: use a solution, such as InfoSphere Data Replication, that quickly captures ever-changing data and sends it wherever it needs to go, giving business managers an up-to-the-second view of vital information without slowing business-critical processes. InfoSphere Data Replication uses a “push” CDC mechanism as data streams to provide flexibility and efficiency. It also employs log-based capture to reduce source database impact, and it streams changed data without data persistence to increase performance. Depending on your big data integration requirements, data federation can also help you adapt to your big data requirements. IBM InfoSphere Federation Server quickly creates a consolidated view of your data to support key business processes and decisions. You can access and integrate diverse data and content sources as if they were a single resource—regardless of where the information actually resides. The four Vs of big data How do you deal with the volume, velocity, variety and veracity of big data? InfoSphere Data Replication gives you the near–real-time capabilities you need to adjust product offers, deliver reliable data and more. Download the video and learn more about the power of flexibility: ibm.co/11cy27N 4 IBM InfoSphere delivers the confidence to act on big data
  • 17. Integrating and governing big data 17 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Why InfoSphere? As a critical element of IBM Watson™ Foundations, the IBM big data and analytics platform, InfoSphere Information Integration and Governance (IIG) provides market- leading functionality to handle the challenges of big data. InfoSphere IIG provides optimal scalability and performance for massive data volumes, agile and rightsized integration and governance for the increasing velocity of data, and support and protection for a wide variety of data types and big data systems. InfoSphere IIG helps make big data and analytics projects successful by delivering business users the confidence to act on insight. InfoSphere capabilities include: • Metadata, business glossary and policy management: Define metadata, business terminology and governance policies with IBM InfoSphere Information Governance Catalog. • Data integration: Handle all integration requirements, including batch data transformation and movement (InfoSphere Information Server), real-time replication (InfoSphere Data Replication) and data federation (InfoSphere Federation Server). • Data quality: Parse, standardize, validate and match enterprise data with IBM InfoSphere Information Server for Data Quality. • Master data management (MDM): Act on a trusted view of your customers, products, suppliers, locations and accounts with InfoSphere MDM. • Data lifecycle management: Manage the data lifecycle from test data creation through retirement and archiving with IBM InfoSphere Optim™. • Data security and privacy: Continuously monitor data access and protect repositories from data breaches, and support compliance with IBM InfoSphere Guardium®. Ensure that sensitive data is masked and protected with InfoSphere Optim. 5 Why InfoSphere?
  • 18. Integrating and governing big data 18 1 Introduction 2 Integration and governance requirements for big data 3 Best practices: Integrating and governing big data effectively 4 IBM InfoSphere delivers the confidence to act on big data 5 Why InfoSphere? Additional resources To learn more about the IBM approach to information integration and governance for big data, please contact your IBM representative or IBM Business Partner, or check out these resources: • ibm.com/software/data/information-integration-governance • ibm.com/software/data/infosphere/information-integration-big-data • ibm.com/software/data/integration/info_server • InfoSphere Information Server: A Forrester Total Economic Impact Study • Delivering Trusted Information for Big Data and Data Warehousing: A Ventana Research Report • Gartner: Hadoop Is Not a Data Integration Solution • ITG: Business Case for Enterprise Data Integration Strategy: Comparing IBM InfoSphere Information Server and Open Source Tools 5 Why InfoSphere?
  • 19. IMM14125-USEN-03 © Copyright IBM Corporation 2014 IBM Corporation Software Group Route 100 Somers, NY 10589 Produced in the United States of America December 2014 IBM, the IBM logo, ibm.com, Guardium, IBM Watson, InfoSphere, and Optim are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at ibm.com/legal/copytrade.shtml This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. It is the user’s responsibility to evaluate and verify the operation of any other products or programs with IBM products and programs. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided. The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation. 1 IBM internal testing. Please Recycle