Front cover
Robust Data
Synchronization
with IBM Tivoli Directory Integrator
Complete coverage of architecture and
components
Helpful solution and operational
design guide
Extensive hands-on
scenarios
Axel Buecker
Franc Cervan
Christian Chateauvieux
David Druker
Eddie Hartman
Rana Katikitala
Elizabeth Melvin
Todd Trimble
Johan Varno
ibm.com/redbooks
International Technical Support Organization
Robust Data Synchronization with IBM Tivoli
Directory Integrator
May 2006
SG24-6164-00
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®, Cloudscape™, DB2®, Distributed Relational Database Architecture™, Domino®, DRDA®, Everyplace®, HACMP™, IBM®, Informix®, iNotes™, Lotus®, Lotus Notes®, Metamerge®, Netfinity®, Netfinity Manager™, Notes®, OS/2®, RACF®, RDN™, Redbooks™, Redbooks (logo)™, Tivoli®, Update Connector™, WebSphere®
The following terms are trademarks of other companies:
iPlanet, Java, Javadoc, JavaScript, JDBC, JDK, JMX, JVM, J2EE, Solaris, Sun, Sun Java, Sun ONE, and all
Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or
both.
Microsoft, Windows NT, Windows, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
The team that wrote this book. Top row, left to right: Rana, Todd, and Franc; bottom row, left to right: David, Axel, and Beth.
Axel Buecker is a Certified Consulting Software IT Specialist at the International
Technical Support Organization, Austin Center. He writes extensively and
teaches IBM classes worldwide in the areas of software security architecture and
network computing technologies. He holds a degree in Computer Science from
the University of Bremen, Germany. He has 19 years of experience in a variety of
areas related to workstation and systems management, network computing, and
e-business solutions. Before joining the ITSO in March 2000, Axel worked for
IBM in Germany as a Senior IT Specialist in Software Security Architecture.
Franc Cervan is an IT Specialist working in Technical Presales for the IBM
Software Group, Slovenia. He holds a diploma in Industrial Electronics from the
University of Ljubljana and has 10 years of experience in security and systems
management solutions. Since joining IBM in 2003, his areas of expertise have been Tivoli Security and Automation products.
Christian Chateauvieux is a Consulting IT Specialist helping and mentoring the
IBM Tivoli Software Technical Sales Teams across the EMEA geography. He is a
technical advocate of Tivoli Security solutions, promoting and supporting the
sales and marketing initiatives associated with the Tivoli Directory portfolio and
the rest of the IBM Tivoli Security portfolio, including Tivoli Identity Manager and
Tivoli Access Manager in EMEA. He is an expert in Tivoli Directory products and
joined IBM in 2002. Prior to this he spent two years in Metamerge® professional services and support. Christian holds a master's degree in Computer Science
from the National Institute of Applied Sciences (INSA) in France and is ITIL
certified.
David Druker is a Consulting IT Specialist for Tivoli Security products. He
currently works in the IBM Channel Technical Sales organization and is a
recognized authority on IBM Tivoli Directory Integrator solutions. David holds a
Ph.D. in Speech and Hearing Science from the University of Iowa. He joined IBM
in 2002. Prior to that, he wrote code, built scientific apparatus and managed a
variety of systems in both business and scientific enterprises.
Eddie Hartman is part of the Tivoli Directory Integrator development team,
working with design, documentation and storytelling. Eddie studied Computer
Science at SFASU in Nacogdoches, Texas, and at the University of Oslo in
Norway.
Rana Katikitala is an Advisory Software Specialist for Tivoli Security in the IBM
Software Labs, India. He has eight years of experience in the IT industry in the
areas of development, support, and test of operating systems, systems
management software, and e-business solutions. He holds a master’s degree in
Structural Engineering from Regional Engineering College (REC) Warangal,
India. His areas of expertise include IBM OS/2®, Windows® 2000, Netfinity®
Manager™, IBM Director, Healthcare domain solutions of HIPAA (Health
Insurance Portability and Accountability Act) and HCN (Healthcare Collaborative
Network) and Tivoli Security solutions.
Elizabeth Melvin is a Certified Consulting IT Specialist in Austin, Texas, working
for the IBM TechWorks Americas Group as a subject matter expert supporting
software sales. She has 16 years of experience in a variety of areas including
systems security, identity/data management and architecture as well as network
computing. She holds a degree in Management of Information Systems from the
University of Texas in Austin. Her areas of expertise include security
infrastructure and data synchronization software.
Todd Trimble is a Certified IT Product Specialist. He is ITIL certified and has 25 years of experience in the security and systems management solutions area. Todd
joined IBM in 1998 and has been working with the Tivoli Security products on
major customer engagements. He is responsible for providing a validated
technical solution that resolves the identified business requirements and
eliminates the technical issues and concerns prior to the sale of the IBM Tivoli
Security portfolio.
Johan Varno is the Lead Architect for Tivoli Directory Integrator at the IBM Oslo
Development Lab in Norway. He holds a degree in Computer Science from the
University in Oslo and an MBA from the Norwegian School of Management. He
has 24 years of experience in a variety of areas relating to network technologies,
software development, and business development. Prior to working in IBM,
Johan was cofounder and CTO of Metamerge.
Thanks to the following people for their contributions to this project:
Keith Sams, Jay Leiserson, Bob Hodges, Ralf Willert, Rudy Sutijiato, Cameron
MacLean, Kraicho Kraichev, Lanness Robinson, Jason Todoroff
IBM US
Yogendra Soni
IBM India
David Moore
IBM Australia
Gabrielle Velez
International Technical Support Organization
Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook
dealing with specific products or solutions, while getting hands-on experience
with leading-edge technologies. You'll team with IBM technical professionals,
Business Partners and/or customers.
Your efforts will help increase product acceptance and customer satisfaction. As
a bonus, you'll develop a network of contacts in IBM development labs, and
increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our Redbooks™ to be as helpful as possible. Send us your comments
about this or other Redbooks in one of the following ways:
Use the online Contact us review redbook form found at:
ibm.com/redbooks
Send your comments in an e-mail to:
redbook@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. OSJB Building 905
11501 Burnet Road
Austin, Texas 78758-3493
1.1 A close look at the challenge
Nobody wants to shake the infrastructure too hard. It's holding up the house.
Furthermore, it has grown to fit, the result of evolution: Natural selection; survival
of the highest switching cost.
And yet, businesses still undergo the expense and trauma of infecting their
infrastructure with new software. And they usually do it for the same reason: to
increase value produced by the organization while decreasing the cost involved
in its production. The goal is to improve organizational efficiency, quality,
traceability, agility, or all of the above.
But when companies tamper with the underpinnings of the enterprise, they tread
softly; sometimes so softly that initial goals evaporate down to just getting new
software deployed and running. This task would be less formidable were it not for
the riddle of shared data.
Applications need data—annoyingly often the same data. Since most of these
products are engineered independently of each other, they probably don't see
eye-to-eye on how data is handled. This includes home-grown solutions as well
as commercial products, even many built by the same vendor. Some use
standards, while others maintain their switching costs with proprietary
approaches. And even if two systems agree on a common data store, they
probably do not concur on its structure. So you end up with multiple data sources
carrying bits and pieces of the same information. Disparate pockets of data, with
dependent systems in a tight orbit around them.
Experience shows that this sort of data fragmentation is the rule rather than the
exception. It is the result of the evolutionary, periodically explosive growth of a
company's machine and software infrastructure, and sustained by the constant
fear of breaking something important. Terms like golden directory are born of this
inhibiting, but justifiable fear. And when enough data sources are golden the
infrastructure becomes very heavy. It solidifies and loses agility, making the
ordeal of adding new systems and services even more painful. Nobody plans for
this to happen. It is the natural result of unresolved governance. Intrinsically,
applications presume ownership of their own data—a presumption likely shared by their principal users in the organization. This works fine for some types of
information, but fails dramatically for others; for example (but not limited to)
identity data.
Let us rephrase that. Nowhere is this more true than for identity data. Organizations often discover that their identity information and its structure are owned by everybody, and yet by nobody in the organization.
This apparently contradictory statement refers to the fact that information about
people in the organization is typically managed in multiple places, yet not
coordinated in terms of governance or data structure. This is not a big problem
when applications and user data live in isolation, for example information about
employees residing solely in the HR system and users in the LAN directory.¹ This
indiscretion is often tolerated until the risks involved become too great (or
sometimes, until they simply become obvious).
The proliferation of user registries and the ensuing security exposure make the
argument for directory integration particularly compelling: An employee may be
terminated, but there's no guarantee that there won't be access rights left in
some subset of directories, invisibly providing unwarranted access privileges. Sanctioned users are burdened with a multitude of user names and passwords
spread all over the place, each of which they must remember and maintain
separately, and which they probably write down somewhere. This in itself
represents a security risk, in addition to the productivity loss caused by
inconsistent provisioning. Not to mention the increasingly tough audit requirements (for example, the Sarbanes-Oxley Act²) that force organizations to get serious about traceability and security.
Moreover, identity data fragmentation becomes a serious roadblock as
organizations increasingly implement large-scale, cross-organization solutions
that require consistent data, managed in a 24x7 environment, scalable for
growing usage and demands, and possibly including customers and partners.
Deploying enterprise portals and services (like simplified or single sign-on)
without an enterprise view of identities is practically impossible. Success, for both
tactical deployments and continued strategic growth, hinges on tying the chaos of
existing user registries into a holistic model.
Although the utopian proposition is to condense disparate registries down to a
single physical directory, the multitude of identity stores won't be going away as
long as applications depend on them in their own specific ways. As a result, the
common approach to addressing data fragmentation is with integration tools that
allow silos to stay in place, but give the appearance of unified access. Ideally,
with tools for building integration through careful evolution, rather than revolution.
This means that deployment is broken into measured steps, bringing new
systems and repositories into the picture over time. If the process is planned
correctly, ROI can begin as soon as the first sub-step is complete.
Chapter 1. Business context for evolutionary integration
¹ Even though integration at this stage also makes sense from a security and data integrity perspective.
² More information about the Sarbanes-Oxley Act can be found at http://www.sarbanes-oxley.com/.
This document is not about implementing a single enterprise-wide directory that becomes the master for all others, although such a directory can certainly be implemented with Tivoli Directory Integrator. Rather, it is about the options available with Tivoli Directory Integrator to deal with the wide spectrum of integration
challenges encountered when deploying identity-based applications in the
enterprise.
1.2 Benefits of synchronization
A synchronization solution results in an environment where shared data looks the same to all consuming applications. This is
because changes are propagated throughout the synchronized network of
systems, molded in transit to fit the needs of each consumer. Each data source is
kept up-to-date, maintaining the illusion of a single, common repository. Each
application accesses its data in an optimal manner, utilizing the repository to its
full potential without creating problems for the other applications.
Synchronization strategies are increasingly the choice for deploying new IT
systems. For identity management, this is usually a centralized or metadirectory-style synchronization, where a high-speed store (like a directory) is used to publish the enterprise view of identity data. This approach has a number of advantages:
Security requirements vary from system to system, and they can change over
time. A good repository (like a directory) provides fine-grained control over
how each piece of data is secured. Some provide group management
features as well. These tools enable you to sculpt the enterprise security
profile as required.
Each new IT deployment can be made on an optimal platform instead of
shoe-horned between existing systems into an uninviting infrastructure.
Applications get to live in individually suited environments bridged by
metadirectory synchronization services.
If the availability and performance requirements are not met by some system (legacy, existing, or new), it can be left in place and its contents simply synchronized to a new repository with the required profile, or to multiple repositories for scale.
A metadirectory uncouples the availability of your data from that of its
underlying data sources. It cuts the cord, making it easier to maintain up-time
on enterprise data.
Disruption of IT operations and services must be managed and minimized.
Fortunately, the metadirectory's network of synchronized systems evolves
over time in managed steps. Branches are added or pruned as required.
Tivoli Directory Integrator is designed for infrastructure gardening.
A good metadirectory provides features for on-demand synchronization as well.³ Sure, joining data dynamically can be prohibitively expensive in terms of system and network load; but sometimes it's the optimal solution.
1.3 Directory Integrator in non-synchronizing scenarios
While Tivoli Directory Integrator is a powerful tool to deal with a large number of synchronization scenarios, its core is a general-purpose integration engine that can be used by other systems in real-time, providing these systems with very interesting capabilities. Here are some examples of deployed solutions that illustrate such usage:
A mainframe application sends MQ messages that Tivoli Directory Integrator
picks up, then accesses other data systems in the enterprise, performs some
operations and transformations on the data set and responds back through
MQ to the mainframe.
The Tivoli Access Manager SSO (single sign-on) service calls Tivoli Directory
Integrator during user login in order to authenticate their credentials against
one or multiple systems not supported out-of-the-box by Tivoli Access
Manager. Automatic provisioning of new users is done as required.
Tivoli Directory Integrator monitors the operational status of an LDAP
directory and sends SNMP traps to enterprise monitoring systems.
A SOA-based application calls Tivoli Directory Integrator through Web
services, and Tivoli Directory Integrator writes data to specially formatted log
files and updates databases.
Tivoli Directory Integrator intercepts LDAP traffic to transparently make
multiple directories look like one to an LDAP client application. As in all Tivoli
Directory Integrator solutions, any number of Tivoli Directory Integrator
connectors, transformations, and scripts can be brought to bear on the data flow.
As seen from the above deployments, Tivoli Directory Integrator isn't limited to
synchronizing data. The next sections provide additional scenarios and
examples that illustrate how Tivoli Directory Integrator is inserted into a data flow,
enabling real-time operations to be executed that otherwise would have required
complex and custom code.
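The mainframe call-reply deployment above can be sketched as a simple request handler. This is a generic illustration in plain Python, not Tivoli Directory Integrator code: in-memory queues stand in for MQ, and the field names (uid, mail, status) are invented for the example.

```python
from queue import Queue

def serve_one(request_q, reply_q, lookup):
    """Pick up one request, enrich it from another data system, and reply."""
    request = request_q.get()
    uid = request["uid"]
    # Access another data system, transform the data set, and respond back
    reply = {"uid": uid,
             "mail": lookup.get(uid, ""),
             "status": "ok" if uid in lookup else "not-found"}
    reply_q.put(reply)

requests, replies = Queue(), Queue()
directory = {"jdoe": "jdoe@example.com"}  # stand-in for an enterprise data system

requests.put({"uid": "jdoe"})
serve_one(requests, replies, directory)
answer = replies.get()
```

In a real deployment the queues would be MQ/JMS destinations and the lookup would be a connector to a directory or database, but the shape of the flow is the same: receive, enrich, respond.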
³ In addition to change-driven, schedule-driven, and event-driven synchronization.
1.4 Synchronization patterns and approaches
This section takes a look at synchronization from a conceptual perspective. First,
we look at how and when, meaning how Tivoli Directory Integrator is invoked to
perform its work. Then we look at some of the typical data flow patterns that are
encountered.
1.4.1 How and when synchronization can be invoked
Tivoli Directory Integrator-based synchronization solutions are typically deployed in one of the three following manners, although combinations are also frequently used to enable the various data flows that the entire solution requires:
Batch - In this mode Tivoli Directory Integrator is invoked in some manner
(through its built-in timer, command line or the Tivoli Directory Integrator API),
and expected to perform some small or large job before either terminating or
going back to listening for timer events or incoming API calls. This is often
used when synchronizing data sources where the latency between change
and propagation is not required to be near real-time.
Event - Tivoli Directory Integrator can accept events and incoming traffic from
a number of systems, including directory change notification, JMX™, HTTP,
SNMP, and others. This mode is typically used when Tivoli Directory
Integrator needs to deal with a single, or a small number of data objects.
Call-reply - This is a variation of the event mode, but the difference is that the
originator of the event expects an answer back. IBM products use the Tivoli
Directory Integrator API to call Tivoli Directory Integrator, and solutions in the
field often use HTTP, MQ/JMS and Web services to invoke a Tivoli Directory
Integrator rule and get a reply back.
There is no single answer to the question of when to choose batch versus event-driven integration. For example, enterprises have varying requirements
regarding the propagation of identity data. Delays can be acceptable in the
seconds, minutes, and even in the hours range. It must also be determined
whether the data sources can provide a data change history (LDAP directories
often have changelogs) or notification mechanisms when data changes. Tivoli
Directory Integrator can be utilized both as a batch system, checking for changes
every so often, as well as a notified system, reacting only when the source
system sends a data change notification.
Also keep in mind that the above modes are not mutually exclusive; all of them can be utilized in the same Tivoli Directory Integrator deployment.
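As a rough sketch of the difference between batch and event modes (plain Python, not Tivoli Directory Integrator configuration; all names are invented for illustration), the two differ mainly in who drives the flow — the integrator pulls on a schedule, or the source pushes a change:

```python
def batch_sync(read_all, apply_change):
    """Batch mode: the integrator pulls the full (or changed) data set on a schedule."""
    for record in read_all():
        apply_change(record)

def make_event_handler(apply_change):
    """Event mode: the source pushes single changes; the integrator reacts to each."""
    def on_change(record):
        apply_change(record)
    return on_change

target = {}

def apply_change(record):
    # The same upsert logic serves both modes, keyed by user ID
    target[record["uid"]] = record

# Batch: process everything the source offers right now
batch_sync(lambda: [{"uid": "jdoe"}, {"uid": "asmith"}], apply_change)

# Event: the identical logic invoked once per change notification
handler = make_event_handler(apply_change)
handler({"uid": "bnew"})
```

The point of the sketch is that the transformation logic is shared; only the trigger differs, which is why combined deployments are common.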
1.4.2 Data flow patterns
Tivoli Directory Integrator is often used to implement not just one, but a number
of data flows. Data can flow from one system to another, but also from many
systems to one. As a system becomes the collection point for data from many systems, it often evolves to the next stage, where it becomes the source for updates into
many others.
It is important to understand and then map the intended flow of data. Although your current infrastructure may not yet look like the picture in Figure 1-1, the figure illustrates that enterprise applications are being rolled out with increasing speed in large organizations. These systems often do not share identity repositories (although the same directory may host several instances), simply because the applications have diverging requirements on data format, and because the system owners have different perspectives on how to manage and access the identity data. A well-crafted integration solution lets each business owner retain full control of their data system, while ensuring that common data is kept in harmony across the entire infrastructure.
[Figure 1-1 shows an example IT infrastructure: enterprise applications such as provisioning, single sign-on, LAN, portal, personalization, white pages, and content management, each drawing on personal profile data, alongside other enterprise applications.]
Figure 1-1 IT infrastructure example
A commonly underestimated part of synchronization projects is the planning of
data flows. Successful deployments document the flow of attributes at an early
stage and therefore identify the number and type of data flows required. A project
might look very complicated at first glance, but once the flows are identified, the
project can be approached in incremental steps.
Although the project could at first glance look like a very complex many-to-many
data flow scenario, it might after inspection reveal itself to be a number of simple
one-to-one, many-to-one or one-to-many data flows. Next, we take a look at
these simple data flow patterns that a project typically consists of.
One-to-one data flow
The simplest data flow is the copying or synchronizing of data from a single
source to a single target. However, just because the flow is simple, there can be
any kind of transformation performed on the data, either in content, syntax,
format or protocol. Here are some examples of such data flows:
Updating a database with data from a file that was made available as a report
from another system.
Generating a file that contains changes made in a database.
Keeping a directory synchronized with another, transferring only changes as
they occur on the source directory.
Reading an XML file and writing a CSV formatted file with a selected subset
of the XML file.
Even though the flows above are conceptually simple, transformation of the data
might be required that introduces complexity. For example, when dealing with
identity data, there could be a requirement to join a number of groups into a
single one in the target directory. This join could have further restrictions based
on other data in the source system, such as address, department, or job function.
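The fourth example above — reading an XML file and writing a CSV file with a selected subset — can be sketched in a few lines. This is a plain-Python illustration of the one-to-one flow pattern, not Tivoli Directory Integrator code; the element names (people, person, uid, mail, phone) are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

def xml_subset_to_csv(xml_text, fields):
    """Read <person> records from XML and emit only the selected fields as CSV."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for person in root.iter("person"):
        # Project each record down to the selected subset of child elements
        writer.writerow({f: person.findtext(f, default="") for f in fields})
    return out.getvalue()

xml_text = """<people>
  <person><uid>jdoe</uid><mail>jdoe@example.com</mail><phone>555-0101</phone></person>
  <person><uid>asmith</uid><mail>asmith@example.com</mail><phone>555-0102</phone></person>
</people>"""

csv_text = xml_subset_to_csv(xml_text, ["uid", "mail"])
# The phone element is dropped; only uid and mail survive the transformation
```

Even this conceptually simple flow changes content (subset), syntax (elements to fields), and format (XML to CSV) in one pass, which is the point made above.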
Many-to-one data flow
[Figure shows multiple sources — an e-mail directory, another directory, a database, and a file — feeding through Tivoli Directory Integrator into a single target directory.]
As previously discussed, data ends up in multiple repositories for a number of good reasons. As this happens, additional context is built into the systems as well. Both explicit and implicit relationships between the data are established, which are lost when just copying the data to a new system. Furthermore, the existing systems continue to be updated and managed as before, so copied data quickly loses its relevance. Sometimes a federated approach can be used to access this data set in real-time, but often this is not acceptable because of performance or availability requirements. Therefore, a synchronization data flow must involve multiple source systems in the process of maintaining a target system with the re-contextualized data.
A many-to-one data flow uses the source systems for purposes such as verifying information, making decisions in the data flow, and merging (joining) additional attributes to the initial data set that is intended for the target system.
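The merge (join) step of a many-to-one flow can be sketched with plain Python dictionaries standing in for a directory and an HR feed. This is a generic illustration, not TDI code, and the attribute names (uid, mail, location, manager) are invented.

```python
def join_records(directory_entries, hr_feed, key="uid"):
    """Merge (join) HR attributes into directory entries on a shared key."""
    merged = {}
    for entry in directory_entries:
        merged[entry[key]] = dict(entry)
    for record in hr_feed:
        uid = record.get(key)
        if uid in merged:
            for attr, value in record.items():
                # Directory values win on conflict; HR contributes new attributes
                merged[uid].setdefault(attr, value)
    return merged

directory = [{"uid": "jdoe", "mail": "jdoe@example.com"}]
hr_feed = [{"uid": "jdoe", "location": "Austin", "manager": "bgates"}]
result = join_records(directory, hr_feed)
```

The choice of which source "wins" on a conflicting attribute is exactly the kind of governance decision a real synchronization project must document per attribute.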
One-to-many data flow
[Figure shows a single source feeding through Tivoli Directory Integrator into multiple targets — an e-mail directory, another directory, a database, and a file.]
The illustration does not fully describe the combinations that are possible in one-to-many scenarios. The main point is that data needs to be updated, maintained, or created in several places. For example, as e-mail addresses are added in the e-mail directory, Tivoli Directory Integrator ensures that this is updated in the single sign-on directory for authentication purposes. However, the ERP system may also subscribe to this information, as it is used in automated ERP-based messages to employees. So in this example, Tivoli Directory Integrator would update both the SSO directory and the ERP system as part of a data flow. Another example is propagating password changes in a directory to a number of other directories.
In one-to-many data flows it is important to consider what could happen if a flow
was interrupted and data not updated in all systems as was expected. In
transactional systems, roll-back is used to reset the involved systems to the state
they had before the data flow started. However, in most identity synchronization
projects, this is not much of a problem since the entire data flow can be
repeated—it is not like transferring the same amount of money twice to another
bank account. However, roll-back or compensating logic can be added to a Tivoli
Directory Integrator solution should this be required.
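The point about interrupted one-to-many flows can be sketched as follows, assuming idempotent updates so that a partially completed flow can simply be re-run instead of rolled back. Target systems are modeled as plain dictionaries; every name here is invented, and this is not TDI code.

```python
def propagate(change, targets):
    """Apply one change to many targets; collect failures instead of aborting,
    so the whole flow can be repeated safely."""
    failures = []
    for name, target in targets.items():
        try:
            # An idempotent upsert: applying the same change twice is harmless
            target[change["uid"]] = change["mail"]
        except Exception as exc:
            failures.append((name, exc))
    return failures

sso_dir, erp = {}, {}
failures = propagate({"uid": "jdoe", "mail": "jdoe@example.com"},
                     {"sso": sso_dir, "erp": erp})
```

Because the update is a set-operation rather than an increment, re-running after a failure converges on the correct state — the "not like transferring money twice" property described above. Compensating logic would only be needed for non-idempotent updates.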
1.5 Business and technical scenarios
The previous sections looked at synchronization concepts and some of their benefits in general terms. Now we investigate some real-life scenarios to illustrate the business context. The examples below are intended to bring these concepts to life so that the reader can more readily recognize and identify synchronization opportunities when faced with a new business or technical deployment challenge. The fictional company PingCo is used to illustrate the scenarios. Let us now look at a few identity use cases that show the issues that throw wrenches into the machinery organizations have spent years building.
1.5.1 Multiple existing directories and security concern
PingCo is building a portal that will be used by both employees and external
customers. PingCo has already implemented separate employee and business
partner directories, but the employee directory is on the corporate intranet and
will not be made accessible to non-VPN external users. The portal will be placed
in the DMZ, with no access into the internal network. One solution is to use Tivoli
Directory Integrator to synchronize the employee and the business partner
directory into a new directory placed in the DMZ. Only the necessary information
about the employees is transferred into the DMZ directory to reduce security
exposure. PingCo can choose whether or not to securely synchronize the
employee passwords into the external directory, or create new passwords (but
the same user name) for employees that access the external portal.
The above scenario could be modified to include organizations with many internal directories, possibly managed by separate business units or other organizational entities, which challenges any coordination of efforts. Synchronizing the content of those directories (with possible filtering of data) lets each unit keep ownership of its data, yet enables common applications to be deployed against the joint set of identity data in a new directory, reducing the dependence on each source directory with minimal performance impact.
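The filtering step described above — transferring only the necessary employee attributes into the DMZ directory — can be sketched as a simple projection. This is a generic illustration, not TDI code; the attribute names are invented.

```python
def filter_entry(entry, allowed):
    """Project an internal directory entry down to the attributes safe for the DMZ."""
    return {attr: value for attr, value in entry.items() if attr in allowed}

internal = {"uid": "jdoe",
            "mail": "jdoe@example.com",
            "salary": "90000",          # stays inside the intranet
            "ssn": "123-45-6789"}       # stays inside the intranet
dmz_entry = filter_entry(internal, {"uid", "mail"})
```

Keeping the allowed-attribute set explicit (a whitelist rather than a blacklist) is the safer default for a security-sensitive flow: a newly added internal attribute is excluded from the DMZ until someone deliberately permits it.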
1.5.2 Existing directory cannot be modified
PingCo intends to deploy an enterprise single sign-on (SSO) service and has a directory with all employees. However, for some reason PingCo cannot let the
SSO service use the existing directory directly. Sometimes directories are only
accessed in read-only mode, but sometimes applications that use directories
also need to store data in them as well. That can become a hurdle for reasons
such as:
Technical. The existing applications that use the directories cannot deal with
this change.
Availability. The business owners of the existing directory are not able to meet
the availability requirements of an enterprise (and possible cross-enterprise)
SSO service.
Governance. Existing business owners of the directory don't want others to
modify a system that they own and manage.
Performance. The added performance impact of the SSO service could
extend beyond what the directory platform can provide.
Security. Although the user names are already there, the SSO service adds
new data that might be considered even more sensitive.
The solution in this case is a simple synchronization to a new directory. It could
even be a separate logical directory tree on the same machine or an entirely
different directory implementation on a more scalable and secure physical
machine. PingCo would have the choice of where passwords are managed and
changed. Any change to one directory would immediately be made on the other
as well.
With the IBM single sign-on (SSO) offering, Tivoli Access Manager, there is an additional option available, as described in the following section. That scenario works with a single directory for Tivoli Access Manager authentication, but keeps all other data in a separate and secure directory.
1.5.3 Single sign-on into multiple directories with Access Manager
PingCo intends to implement a single sign-on service with Tivoli Access
Manager, and users are defined in multiple directories. Tivoli Directory Integrator
integrates with Tivoli Access Manager Version 5.1 and later through its EAI
(External Authentication Interface) so that Tivoli Directory Integrator can
authenticate users across any number of back-end sources that Tivoli Directory
Integrator supports. For example, when a user provides credentials to Tivoli
Access Manager, Tivoli Directory Integrator is invoked and then attempts to
authenticate into a number of directories with custom filters and modifications to
the base credentials. Tivoli Directory Integrator can also inspect the supplied credentials and, if they carry enough information, authenticate directly against a single target directory rather than trying all of them.
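The fall-through authentication logic can be sketched as follows. This is an in-memory stand-in, not the EAI plug-in itself: a real deployment would perform LDAP binds against the back-end directories, and all names here are invented.

```python
def authenticate(username, password, directories):
    """Try each back-end directory in turn; return the name of the directory
    that accepted the credentials, or None if all of them reject."""
    for name, users in directories.items():
        # Stand-in for an LDAP bind attempt against this back-end
        if users.get(username) == password:
            return name
    return None

dirs = {"employees": {"jdoe": "s3cret"},
        "partners": {"pat": "pw"}}
hit = authenticate("pat", "pw", dirs)
```

If the credentials carry a hint about which population the user belongs to (for example a domain suffix), the loop can be replaced with a direct lookup against the right directory, as the text notes.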
1.5.4 Data is located in several places
PingCo intends to deploy a portal-based application that requires information about employees, their work location, as well as who their manager is. This information does exist in the infrastructure, but not in a single location. There are directories that contain both unique and overlapping information about employees. The HR system knows about the work location and the managers of the employees. To make things even more complicated for the solution architect, the HR group is not willing to provide direct access to their system, but is willing to provide a weekly report with the required information.
This is a classic example of where Tivoli Directory Integrator can bring order to
the chaos by connecting to all of the directories, identifying the unique set of
users, and merging that data with the weekly feed from HR. The end result is a
directory where all information is collected and users have work location and
manager information added from the HR system. Once the initial job has
completed, Tivoli Directory Integrator continues to monitor the sources for
changes, including the weekly report from HR, and identifies the records that
have been added, modified, and deleted.
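The merge-and-enrich flow described above can be sketched in plain JavaScript. This is purely illustrative and not the Tivoli Directory Integrator API; the attribute names (uid, location, manager) and the shape of the HR report rows are assumptions:

```javascript
// Illustrative sketch: build the unique set of users from two directories,
// then enrich it with fields from the weekly HR report.

function mergeDirectories(dirA, dirB) {
  // Key the merged view by uid; later sources only fill in attributes
  // that are still missing (earlier sources win on conflict).
  const merged = new Map();
  for (const entry of [...dirA, ...dirB]) {
    const existing = merged.get(entry.uid) || {};
    merged.set(entry.uid, { ...entry, ...existing });
  }
  return merged;
}

function enrichFromHrReport(users, hrRows) {
  // hrRows is the parsed weekly HR report: one row per employee.
  for (const row of hrRows) {
    const user = users.get(row.uid);
    if (user) {
      user.location = row.location;
      user.manager = row.manager;
    }
  }
  return users;
}
```

In a real deployment, the merged map would be written to the target enterprise directory, and change detection would keep it current between weekly HR feeds.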
1.5.5 Use of virtual directory - access data in place
PingCo needs to authenticate users against one or more directories that cannot
be synchronized, possibly because they belong to somebody else who does not
allow this to be done. If PingCo uses Tivoli Federated Identity Manager or Tivoli
Access Manager, then there are authentication plug-ins available (using the
External Authentication Interface) to Tivoli Directory Integrator. However, in other
situations, Tivoli Directory Integrator can intercept LDAP messages and forward
them to one or more LDAP directories on behalf of the client, using round-robin,
chaining, or other custom logic. This scenario is often described as a virtual
directory approach, since the client does not need to know that it is actually
communicating with a number of directories in real time. This approach has
some apparent benefits (and sometimes offers the only practical option), such as
leaving data in place and removing the requirement for synchronization. However,
there are both short-term and long-term issues that should be considered:
Availability - Some attribute relationships cannot be reliably resolved in
real-time due to unstable systems, scheduled maintenance, broken links,
latency, firewalls, and so forth; or because some relationships are too
complex to resolve quickly. A synchronization process, by contrast, can take
whatever time is needed to map the data.
Performance - A virtual directory imposes itself into every data access
operation. A separate synchronized directory maximizes performance while it
maintains the enterprise view via change-based synchronization.
Performance requirements are often underestimated, as the use of new
enterprise applications often grows past what was initially assumed. This is
especially true for enterprise portals and single sign-on projects, where a
successful deployment creates major benefits, but increases resource
consumption.
Reliability - The virtual directory is dependent on all connected systems
being available and online. The owners of those systems might not be willing
to provide that level of service to the rest of the enterprise. A synchronized
solution will always be available, and there is no impact from an offline
subsystem. Also, if the synchronization engine (not the synchronized
directory itself) is offline, data only becomes out-of-date; this is remedied as
soon as the synchronization is restarted. If the virtual directory is down, all
dependent applications are down as well.
Agility - New enterprise data means new data relationships, so with both
approaches the integration solution must be updated to include these.
However, the out-of-band nature of synchronized solutions significantly
facilitates maintenance and upgrade since data flows and integration flows
can be added without impacting the operational availability of the directories.
Scalability - Virtual directories cannot scale the way real directories can.
Even with caching, they will always be limited by the scalability of the systems
holding the source data. Furthermore, a good enterprise directory can be
massively scaled in multi-master-slave configurations for high performance.
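The round-robin forwarding mentioned earlier can be sketched in plain JavaScript. This is not Tivoli Directory Integrator configuration, only an illustration; the server names and the online flag are invented for the example:

```javascript
// Illustrative sketch: forward each request to the next directory in
// round-robin order, skipping back ends that are currently offline.

function makeRoundRobinRouter(servers) {
  let next = 0;
  return function route() {
    // Try each server at most once, starting from the rotation point.
    for (let i = 0; i < servers.length; i++) {
      const server = servers[(next + i) % servers.length];
      if (server.online) {
        next = (next + i + 1) % servers.length; // advance the rotation
        return server.name;
      }
    }
    // The reliability concern from the text surfaces directly here:
    // if every back end is down, the virtual directory can serve nothing.
    throw new Error("no back-end directory available");
  };
}
```

Note how the reliability issue discussed above appears in the last branch: a virtual directory has nothing to return once all of its back ends are offline.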
1.6 Conclusion
Synchronization introduces a number of benefits to the architectural design of
new enterprise solutions. Rather than trying to craft an optimal situation,
synchronization can provide a pragmatic approach that is less costly to build and
maintain, while adding operational benefits such as performance, availability, and
agility. These benefits certainly do not apply to all scenarios, but on the other
hand they are often not evaluated: the architectural 20-20 vision prevails where a
pragmatic mind would have provided quicker time to value, as well as a more
future-proof solution, since changes are often less predictable than we
would like.
Chapter 1. Business context for evolutionary integration 15
What typical business requirement is Tivoli Directory Integrator trying to
solve?
What data stores are required to solve the problem?
How can you instrument and test the solution?
Who is responsible for what activity?
2.1 Typical business requirements
Tivoli Directory Integrator is a truly generic data integration tool that is suitable for
a wide range of problems that usually require custom coding and significantly
more resources to address with traditional integration tools. It is designed to
move, transform, harmonize, propagate, and synchronize data across otherwise
incompatible systems.
However, before the tool can be used, it might be necessary to understand what
has brought about the data synchronization requirement. For example, is it the
result of a company’s acquisition of another firm, in which case the acquired
company’s users need to be integrated and kept in synch with the parent
company’s data stores, thereby providing a common data source to be used in
the development of a new enterprise application? A secondary goal may be the
synchronization of user passwords.
Tivoli Directory Integrator can be used in conjunction with the deployment of the
IBM Tivoli Identity Manager product to provide a feed from multiple HR systems
as well as functioning as a custom Identity Manager adapter.
Both of these scenarios will be further expanded upon later in this book.
Regardless of the scenario, it is essential to gain a full understanding of the
environment. This allows you to document the solution.
Typically this is accomplished by developing a series of use cases that are
designed to clarify the business needs and refine the solution through an
iterative process that ultimately provides you with a complete list of documented
and agreed-to customer business requirements.
For example, is the data synchronization solution viewed as business critical, and
will it need to be instrumented into a high availability solution; or is a guaranteed
response time a business requirement that has to be addressed?
It is important to point out that in most cases you are manipulating user identity
data. As such, the appropriate security safeguards for privacy and regulatory
compliance requirements need to be addressed during the requirements
gathering phase.
The ultimate goal is to determine how the information will need to flow through
the enterprise to solve the stated business requirements. This is the essential
first step in breaking down the complex problem of enterprise data
synchronization into manageable pieces.
At a minimum, the solution architect will need to be able to provide:
An agreed upon definition of the business requirements and the translation of
the business objectives into concrete data and directory integration
definitions.
A concise understanding of the various data stores that are part of the
solution and under what circumstances the information needs to flow through
the organization as well as the authoritative source for each data element that
will be managed.
The diagram in Figure 2-1 depicts the various steps required to instrument an
enterprise data synchronization solution.
[Figure 2-1 shows a process flow: business requirements (business scope,
business benefits) feed into detailed data identification (location/data source,
owner, access, initial format, unique data); the results are reviewed, enabling
initial design documentation and communication; data flows are then planned
(workable units, authoritative attributes, naming conventions, unique link criteria,
availability/failover, special business requirements, system administration, final
data format, security, data cleanup, password synchronization, phased approach,
frequency); finally, the data synchronization solution is instrumented and tested.
The identification, planning, and instrumentation steps sit within the Tivoli
Directory Integrator sphere.]
Figure 2-1 Solution architecture process flow
It is important to note that some of the elements in the process flow described in
the figure above are outside of the Tivoli Directory Integrator product sphere,
indicated by not being placed completely inside the grayed-in area. Those found
entirely inside of the grayed-in area are wholly a part of the solution. Let us take a
closer look at each of the different disciplines in order to clarify what we mean.
Chapter 2. Architecting an enterprise data synchronization solution 19
2.2 Detailed data identification
This section discusses the best practice for identifying the nature of the data
required to solve the defined business problem.
Once the business requirements and corresponding use cases have been clearly
stated and agreed upon, the next step in architecting a data synchronization
solution is to identify the nature of the data that will be utilized. At a minimum, the
solution architect will need to be able to:
Identify as much as possible about the data.
Provide a document that describes the data flow.
Describe how the results of the first two steps will be reviewed.
By following this best practice technique of identifying, planning, and reviewing
the nature of the data, the solution architect will be able to craft the technical
solution requirements and design to match the driving business needs.
Continuing with the best practice of simplifying a complex problem, the
systematic definition of the required data further simplifies the task of creating a
successful project. Detailed data identification starts with the understanding
that this is the time when the business-based use cases are used to add more
clarity to what is to be accomplished. At a minimum, the solution architect must
identify the following:
Data location
Data owner
Data access
Initial data format
Uniqueness of data
2.2.1 Data location
The location of the data is typically the primary factor in determining the ultimate
solution design and architecture. The solution architect will be required to identify
both the physical and logical location of the data to be used to satisfy the use
case.
Examples of physical location factors include data that exists in a specific
regional location, resides on a particularly slow or fast hardware platform, or
is limited in accessibility due to distance or network speed. These
factors are used when planning data flows and designing the physical
architecture of the data synchronization solution.
The logical location of the data translates very specifically to IBM Tivoli Directory
Integrator components that are mentioned in the following chapter. By
determining the data sources in the use case, the solution architect can then
determine the type of connection to be used along with the underlying technology
to be utilized.
An example of identifying a logical location of data might be a use case that
involves synchronizing data located within a directory server. The logical location
of the directory server’s data would be described by the server name and/or IP
address. The underlying technology used to connect to a directory server
would typically be the LDAP protocol, or possibly an LDIF file. Similarly, if the
use case incorporated the use of a database, the data source would be identified
as possibly relational in format and accessible via a JDBC™ technology
connection.
2.2.2 Data owner
Determining the owner of the data helps the architect identify any possible
requirements introduced to the solution due to privacy or compliance concerns.
Does the data have a requirement to be handled in a special way or is it even
possible to use the data within the desired use case given its current location and
form? Regulatory and corporate policies should be reviewed with the data owner
at this time as well.
2.2.3 Data access
The data owner is often the same organization or person who provides the data
access. However, this is not always the case. Data access involves determining
what level of access can be granted to the data store or source in order to
synchronize the required attributes.
An example of this is a business use case that requires the solution to
synchronize to an LDAP server. A best practice would be for the owner of the
LDAP server to provide an individual login account with special privileges just for
Tivoli Directory Integrator to use. This allows the server owner to track the
activity generated by the synchronization solution as well as effectively maintain
any security policies the organization may have in place for that server.
If the solution only requires access to a specific container on that LDAP server,
the login account could be limited to read and write privileges within that
specified container. This is an example of where the solution architect would
specify what access privileges are required to each data source in the use case.
2.2.4 Initial data format
Identifying the initial data format involves the determination of all the possible
values each attribute could have when initially connecting to the data source. The
reason for this is that data values tend to show up in one of four states: null,
blank, out-of-range, and valid. As such, the best practice is to determine how
the solution will account for all four possible states, as well as how to handle any
special conditions that could be encountered. For example, how does the
solution resolve duplicate or multiple values?
Tip: A common pitfall many solutions encounter is the issue of converting
integer-valued data to strings. This happens most often when synchronizing
from a database, if you are not careful to take note of the format of the field
values. For example, many database fields designed to hold a numeric entry,
such as an employee number, use an integer format. Sometimes your data
synchronization solution requires you to parse or otherwise process these
values as though they were a string within IBM Tivoli Directory Integrator.
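The four data states and the integer-to-string pitfall can be illustrated with a short sketch in plain JavaScript (not the Tivoli Directory Integrator API; the validity range and the zero-padding are assumptions for the example):

```javascript
// Illustrative sketch: classify an incoming attribute value into the four
// states (null, blank, out-of-range, valid), and coerce a numeric field
// such as an employee number to a string before any parsing.

function classifyValue(value, isValid) {
  if (value === null || value === undefined) return "null";
  if (String(value).trim() === "") return "blank";
  return isValid(value) ? "valid" : "out-of-range";
}

function normalizeEmployeeNumber(value) {
  // Databases often deliver employee numbers as integers; string
  // operations (padding, parsing) expect text, so convert explicitly.
  const inRange = (v) => Number(v) > 0 && Number(v) < 1000000;
  const state = classifyValue(value, inRange);
  if (state !== "valid") return { state, value: null };
  return { state, value: String(value).padStart(6, "0") };
}
```

Handling all four states up front, as the text recommends, keeps surprises such as a null employee number from propagating into the target data store.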
2.2.5 Unique data
The identification of unique data is typically accomplished at the same time that
the initial data format is determined. Often the data values or attributes to be
used are in a specific format that needs to be accounted for within the data
synchronization solution.
Tip: For the advanced user, Tivoli Directory Integrator can be used to help
identify some of the specifics of the data by using data and schema discovery
functions in Directory Integrator.
2.3 Plan the data flows
The second step of designing a solution deals with planning the data flows. Many
times this occurs simultaneously with the data identification phase. At a
minimum, the solution architect needs to identify the following details:
Authoritative attributes
Unique link criteria
Special conditions or business requirements
Final data format
Data cleanup
Phased approach
Frequency
2.3.1 Authoritative attributes
When planning the flow of data, identifying which attributes are authoritative in
what data source(s) is paramount. For example, an enterprise may determine
that the human resources application is authoritative for all attributes describing
an employee except for the employee’s e-mail address. The e-mail server is
considered the authoritative data source for the e-mail address attribute.
Ideally, there should be only one data store within the enterprise identified as
authoritative per attribute. It is possible, although not recommended, to have
multiple data stores be authoritative for the same attribute being synchronized;
the most common such attribute is the user password. It is best not to have any
attribute with more than one authoritative data source.
Tip: This is where the best practice mentioned earlier in the data access
section of having separate logins for each connection comes in handy, so you
know who is changing what attribute in its authoritative data store.
2.3.2 Unique link criteria
When synchronizing data within an enterprise, it is a technical requirement to
identify some way to link the data sources. Simply put, how do you identify the
same user across multiple data stores? A common way to link the multiple data
stores is via a user’s unique identification number. For employees, it tends to be
their unique employee number. In some cases, it is the e-mail address and in
others it is some combination of attribute values.
If there is no pre-existing unique identifier between the data sources to be
synchronized, one must be generated using some combination of attribute
values, or by applying the best available logic to the business case.
Fortunately, Tivoli Directory Integrator provides a simple way to link data sources
on very simple or detailed linking criteria.
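The idea of deriving a link key from a combination of attributes can be sketched in plain JavaScript. This is illustrative only, not Tivoli Directory Integrator link criteria syntax; the choice of last name, first name, and department as the combination is an assumption for the example:

```javascript
// Illustrative sketch: when no shared identifier exists, derive a link key
// from a combination of attributes and use it to match entries across stores.

function linkKey(entry) {
  // Normalize aggressively so the same person matches across stores
  // despite case or whitespace differences.
  const norm = (s) => (s || "").trim().toLowerCase();
  return [norm(entry.lastName), norm(entry.firstName), norm(entry.dept)].join("|");
}

function findMatch(sourceEntry, targetEntries) {
  const key = linkKey(sourceEntry);
  return targetEntries.find((t) => linkKey(t) === key) || null;
}
```

Aggressive normalization matters here: the same person may appear with different capitalization or stray whitespace in each store, and the link criteria must tolerate that.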
2.3.3 Special conditions or requirements
In many cases, special conditions or requirements exist within the use cases.
This is often more obvious after the solution architect completes the detailed data
identification process. A simple example of a special condition would be when
the originating data source only contains the values of first name and last name
for a user and the requirement is to synchronize the full name into a new
attribute in the destination data source. This is where the solution architect would
note the condition required to concatenate the user’s first name and last name
together to generate the full name.
Another example of a special requirement might be that only users in certain
departments have their e-mail address synchronized.
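Both special conditions above, concatenating the full name and synchronizing e-mail only for certain departments, can be sketched in plain JavaScript (illustrative only, not Tivoli Directory Integrator attribute-mapping syntax; the attribute names and the department list are assumptions):

```javascript
// Illustrative sketch: map a source entry to the destination format,
// applying two special conditions from the text.

const SYNC_MAIL_DEPTS = new Set(["Sales", "Support"]); // invented for the example

function mapEntry(src) {
  // Special condition 1: concatenate first and last name into fullName.
  const out = { fullName: `${src.givenName} ${src.sn}` };
  // Special condition 2: only selected departments get e-mail synchronized.
  if (SYNC_MAIL_DEPTS.has(src.dept)) {
    out.mail = src.mail;
  }
  return out;
}
```

In a real flow, conditions like these would be captured during data flow planning so the mapping logic is agreed upon before instrumentation begins.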
2.3.4 Final data format
When planning the flow of data for each use case, identifying the expected
format of the data in the target system(s) is critical. The solution architect needs
to resolve two concerns.
The first concern is to identify attributes that might have special or unique
formatting of their data values. In some cases, this can create a requirement that
might alter the expected flow of data. A common example occurs when the use
case requires the attribute for a user’s manager to be synchronized into an
LDAP data store. Since the solution architect previously identified the nature of
the LDAP data store, they can then determine whether the LDAP server requires
the manager attribute to be in the data format of a fully qualified distinguished
name.
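Reformatting a manager reference into a fully qualified distinguished name can be sketched in plain JavaScript (illustrative only; the directory suffix and the uid naming attribute are assumptions for the example):

```javascript
// Illustrative sketch: turn a bare manager identifier into the fully
// qualified distinguished name that an LDAP target expects.

function managerToDn(managerUid, suffix = "ou=people,dc=pingco,dc=com") {
  // Guard against the null and blank data states before formatting.
  if (!managerUid || String(managerUid).trim() === "") return null;
  return `uid=${String(managerUid).trim()},${suffix}`;
}
```

The guard clause reflects the data-state discussion above: a null or blank manager value must be handled deliberately rather than emitted as a malformed DN.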
The second concern regarding the final data format involves what has been
mentioned in 2.2.4, “Initial data format” on page 21. The solution must allow for
handling any of the four possible data states for the expected output; once
again, those data states are null, blank, out-of-range, and valid. This tends to be
less of an issue on the output side, and occurs most often when the destination
data store is being altered by many sources.
2.3.5 Data cleanup
At this stage of planning, it has most likely become apparent whether a separate
or additional data flow might be required to handle data that either needs to be
cleaned up or has no matching attribute(s) between the source and destination
data stores. These two conditions are the most common and are often referred to
as handling dirty data and creating unique link criteria.
If it becomes apparent that this task is rather large, it is often necessary to plan
for a completely separate initial phase of the project to clean the data. The
ongoing data synchronization can then continue to focus on accommodating the
initial and final data formats mentioned in previous sections and will have solved
the unique link criteria requirements.
2.3.6 Phased approach
Oftentimes it is necessary to utilize a phased approach when planning your data
flows. The need for a phased approach typically occurs when either there is a
large amount of data cleanup required or the use case over time plans on