Usage Landscape of Enterprise Open Source Data Integration
WHITE PAPER
Usage Landscape
Enterprise Open Source Data Integration
Table of Contents
Introduction
Background
Diverse Data Integration Projects
Data Integration Needs and Tools
Open Source Data Integration vs. Proprietary Solutions
Enterprise Requirements
Community Support
Community Involvement
Conclusion
Talend White Paper Usage Landscape - Enterprise Open Source Data Integration
Introduction
Enterprise data integration needs are growing exponentially over
time, as is the interest in open source technologies and the adoption
of open source solutions.
With this in mind, Talend conducted a survey to define the usage
landscape of open source data integration and to profile users of
this technology. The data used in this analysis was collected from
1,013 survey participants. Responses came primarily from the U.S.
(56.5%), followed by Europe (35.2%), with the remaining 8.3%
originating elsewhere in the world.
[Figure: Survey respondents’ demographics (US 57%, Europe 35%, Other 8%)]
Background
As companies merge, acquire new applications, and build their IT
platforms by incorporating disparate applications with legacy
systems, information systems are becoming more and more
heterogeneous. As a result, data integration tools are now
indispensable if enterprise IT departments are to properly manage
the flows of data across the information system.
In addition, alternative models of software deployment—such as
Software as a Service (SaaS)—and the need for interoperability with
partners, customers, providers, etc., all have an important impact
on data integration requirements.
The global economy is imposing cost controls on IT managers, both in
terms of staff and software, at a time when data integration
represents an increasingly large percentage of the enterprise IT
budget. Asked to do more with less, IT personnel would be better
off spending cycles on tasks other than the time-consuming manual
scripting needed to meet custom requirements. In fact, software
resources with lower acquisition and operation costs would allow IT
managers to more easily deploy enterprise-grade solutions.
Data Integration: The process of combining data residing at different
sources and providing the user with a unified view of these data.
In this context, open source solutions offer a very compelling
argument. Open source tools can automate and maintain tasks
formerly requiring manual scripts, and the existing skills of the IT
implementation team easily transfer to an open source offering. In
addition, IT departments don’t have to justify significant up-front
fees.
Diverse Data Integration Projects
Data integration is the collective term for technologies that include
ETL (Extract-Transform-Load) for business intelligence and data
warehousing, and operational data integration, that is, the flows of
data across operational applications and systems. These needs can
range from high-throughput batch transfers of data to near-real-time,
trickle-feed data flows.
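To make the ETL pattern concrete, the following is a minimal sketch of
a batch extract-transform-load job. It is an illustration only, not a
method described in the survey: the sample data, the table name
customers, and the function names extract, transform, and load are
all assumptions, and an in-memory SQLite database stands in for a data
warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an operational system.
SOURCE_CSV = """customer_id,name,country
1, Alice ,us
2,Bob,fr
3, Carol ,us
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast ids to int, trim whitespace, upper-case country codes."""
    return [
        (int(r["customer_id"]), r["name"].strip(), r["country"].strip().upper())
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


# Run the three stages in order against an in-memory database.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
us_count = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE country = 'US'"
).fetchone()[0]
print(us_count)
```

A trickle-feed flow would apply the same three stages to small
increments of data as they arrive rather than to one bulk export.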
Project Type
Consistent with the global data integration market distribution—
whether open source or proprietary—most of the survey participants
(61.5%) use open source solutions for their ETL projects, in
particular for BI, data warehousing, and analytics. This can be
attributed to the fact that ETL is the most mature segment of the
entire data integration market.
[Figure: Types of projects for which open source data integration is
used. Categories: ETL; data loading; operational data integration
(batch); migration; operational data integration (real time);
database synchronization]
Data loading (41.9%) and data migration (26.5%) are the second and
fourth most popular types of project. Both of these are good
candidates for open source solutions, as they are typically one-offs,
with no ongoing purpose that would justify a long-term investment
in an expensive proprietary tool.
Data synchronization (19.1%) is also a popular type of project
conducted by open source data integration users.
Data Loading: The process of loading data in an application or
database, for example prior to its deployment.
Data Migration: The process of transferring data between databases,
applications, or other systems, with the purpose of replacing a
system with another.
Data Synchronization: The process of establishing data consistency
across remote sources and continually harmonizing the data over time.
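The data synchronization definition can be illustrated with a small
one-way sketch. It assumes both systems expose their records as keyed
dictionaries; the function names plan_sync and apply_sync and the
sample data are hypothetical, and real tools also have to handle
conflicts, ordering, and failures, which this sketch ignores.

```python
def plan_sync(source, target):
    """Compare two keyed datasets and return the changes needed on target."""
    inserts = {k: v for k, v in source.items() if k not in target}
    updates = {k: v for k, v in source.items() if k in target and target[k] != v}
    deletes = [k for k in target if k not in source]
    return inserts, updates, deletes


def apply_sync(source, target):
    """Apply the planned changes so that target matches source."""
    inserts, updates, deletes = plan_sync(source, target)
    target.update(inserts)
    target.update(updates)
    for key in deletes:
        del target[key]
    return target


# Hypothetical records held by two systems, keyed by record id.
source = {1: "a", 2: "b", 3: "c"}
target = {2: "x", 4: "d"}

plan = plan_sync(source, target)
apply_sync(source, target)
print(target == source)
```

Running the comparison repeatedly, on a schedule or on change events,
is what keeps the two sides harmonized over time.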
Batch vs. Real-Time
Operational data integration—whether batch or real-time—is also a
good fit for open source solutions. As business tempos speed up,
real-time and nearly real-time operational data integration projects
will prevail over bulk transfer projects. As of the date of the survey,
40% of participants used open source tools to manage their batch
operational data integration tasks, compared to only 22.9% for real-
time projects—but the latter is a much faster growing segment.
ETL vs. Operational Data Integration
Taken together, batch and real-time operational data integration
projects (62.9%) slightly exceed the ETL usage share (61.5%), even
though the former market segment is less mature. And, if we also add
in data synchronization, the operational project share reaches 82%.
The reason for this over-representation is
simply that open source tools are particularly appropriate for
operational projects because they meet a number of data
integration requirements, whereas—traditionally—proprietary tools
focus on ETL. In addition, enterprises that want to diversify their
data integration tools are often discouraged by the licensing costs of
proprietary applications. Open source solutions offer a greater
breadth of connectivity and more flexibility in terms of adoption,
deployment, and maintenance.
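The combined figures in this section can be reproduced by simple
addition of the per-project percentages reported earlier. The sketch
below assumes that respondents could select more than one project
type (the individual shares add up to well over 100%), so the sums
count selections rather than distinct respondents. The variable names
are illustrative.

```python
# Usage shares (percent of respondents) as reported in the survey text.
shares = {
    "ETL": 61.5,
    "Operational data integration: batch": 40.0,
    "Operational data integration: real time": 22.9,
    "Database synchronization": 19.1,
}

# Batch plus real-time operational data integration.
operational = round(
    shares["Operational data integration: batch"]
    + shares["Operational data integration: real time"],
    1,
)

# Operational share once database synchronization is added in.
operational_with_sync = round(operational + shares["Database synchronization"], 1)

print(operational, operational_with_sync, shares["ETL"])
```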
Data Integration Needs and Tools
Although software companies are trying to provide unified
integration solution packages, the data integration needs for most
enterprises are so complex that they often need to multiply the
number and nature of the integration software products they use.
[Figure: Data integration technologies used in conjunction with open
source. Categories: manual scripting; database utilities; commercial
software]
Survey participants reported using a combination of commercial
applications, open source solutions, and database utilities to meet
their data integration needs.
The statistics show that using open source and commercial solutions
in combination is very common (31.2%), and that the two can, and
do, coexist on the same platform. In fact, open source solutions are
often complementary to an existing proprietary solution that—for
whatever reason—cannot address a specific need. In some cases it
may be that it’s not worth the expense of investing in a proprietary
solution extension.
The high incidence of database utilities shown in the survey results
(53.9%) is as expected—these utilities are a no-cost solution and are
usually included with the databases. Their usefulness, however, is
limited to dedicated database usage.
Applications are often stacked as needs arise, which increases
connectivity issues, whether enterprises want their CRM system to
communicate with their ERP module or their disparate databases to
exchange information with their home-grown platform.
Faced with multiple connectivity issues, enterprises often have no
option other than manual scripting to keep data flowing across their
heterogeneous enterprise systems. This is why the survey results
rank manual scripting as one of the technologies most frequently
invoked (54.7%) by enterprises to meet their integration needs.
Although this is much higher than the share for commercial packaged
technologies (31.2%), it is not surprising that manual scripting is
the solution of choice, as it carries the lowest initial cost.
Although manual scripting is often intended to be a short-term fix
for interchange issues, once in production it often becomes a
permanent solution. And, in the end, this simple stop-gap can
become an entire home-grown platform. The drawback of hand
coding or home-grown platforms surfaces over time in the inevitable
maintenance problems that increase the total cost of ownership (TCO).
The advantage,
however, is that it fits a particular need that none of the available
commercial or open source solutions can meet.
Open Source Data Integration vs. Proprietary Solutions
In an ongoing effort to lower their data integration software TCO,
many enterprises are now considering open source solutions, not
just for one-time projects, but also for their ongoing mission-critical
processes, to replace or complement their expensive CPU-
dependent solutions.
[Figure: Decision criteria. Ease of use, performance, avoiding
lock-in, no licensing costs, and source code access, each rated as
very important, important, neutral, or not important]
Open source solutions are a real alternative to the proprietary
world. Key players have made major strides toward improving the
usability and friendliness of open source technologies, traditionally a
weak spot for these applications.
In just a few short years, open source has evolved from something
“geeky” into an enterprise-ready solution. Today, open source
solutions are sufficiently feature-rich to meet complex user
requirements. The survey results reflect these expectations.
Respondents rated ease of use (59%) and performance (53.9%) as the
most important aspects of an open source data integration solution.
Surprisingly, licensing cost is not the gating criterion for enterprises
turning to open source solutions. It actually comes fourth, after
ease of use, performance, and avoiding lock-in (42.5%), with only
42.1% of respondents considering it very important.
Access to the source code comes last on most priority lists when
enterprises are choosing open source tools.
It is a common misconception that control of the source code is
important for users of open source software. Most users today
understand that open source solutions are as mature as their
proprietary counterparts and, therefore, don’t feel the need to
enhance the code themselves.
Today, open source solutions are advantageously replacing the
source code escrow of proprietary software. However, few
enterprises want to allocate in-house resources (or even have the
expertise) to edit, enhance, and maintain their data integration
applications code.
Enterprise Requirements
An analysis of the survey data indicates that users expect the same
performance and enterprise-scale features from open source
solutions that they previously found only in proprietary products. In
order of importance these features include:
• centralized scheduling and execution dashboard
• shared repository
• administration tools
[Figure: Enterprise open source data integration requirements.
Categories: scheduling tool; dashboard; shared repository;
administration tool]
First, 60.5% of respondents want a scheduling tool that lets them
consolidate and centralize their technical processes. Second, 57.8%
of users need a dashboard to centrally monitor processes as they
execute. Because enterprise users often work in teams and need to
share data on large-scale projects, 54.9% consider a shared
repository essential. Finally, 38.4% of enterprise users want an
administration tool to centrally manage users and projects.
However, not all companies have enterprise-scale requirements.
Single users and SMBs might not need that sort of enterprise-grade
feature. What emerges is that open source solutions address diverse
needs for a variety of user profiles, whether large or small.
Community Support
As shown, enterprises want the same support with open source
solutions that commercial applications provide. The major
difference lies in the fact that a significant number of open source
users (84.9%) would rather call on the community for help
addressing issues than get support from a dedicated service. This
lets them reduce the cost of support and decrease their data
integration budget; the return they get from the community is
comparable in quality to traditional support from a proprietary
vendor.
[Figure: Community vs. commercial support expectations. Categories:
community support (forums, etc.); email-based or Web-based support;
guaranteed response times; phone support]
Open source users value the forum and the other community tools at
their disposal, as well as the peace of mind that comes from knowing
that there is no pressure to upgrade or to buy new tools. The
community also tends to be more responsive than traditional support
services, and community tools come at no cost to the enterprise.
However, enterprise users working on mission-critical projects do
need (and demand) vendor-provided, enterprise-grade technical
support. This still represents a minority of the total number of users
of open source data integration (20.9%), but it is a fast-growing
proportion.
Community Involvement
Two-thirds of the respondents say that they are willing to actively
participate in the community, and nearly half are ready to help
beta-test open source products. Open source communities have a
real, live QA lab of thousands at their disposal. Open source users
appreciate getting support from the community and feel at ease in
sharing their experiences and helping other users solve problems.
Getting involved in the community ensures the sustainability of the
open source arena and, by extension, the sustainability and the
quality of the application they use.
[Figure: Expectation for community contributions. Categories: forum;
beta testing; code contributions]
Other community tools—like bug/feature tracking systems—are also
broadly used by the community, especially for feature requests.
Because the development cycle of open source applications is
usually quite short, users know that the chances of getting a feature
request developed and made available in the next release of an
open source application are significantly greater than for a similar
request in the proprietary domain. It’s a win-win situation.
Enterprises in the community are asked to beta-test and report bugs
on features that they requested previously, ensuring both quick access
to these features and the quality of the developed application.
In addition, participating in the community is much less time-
consuming than getting involved in the development itself. Only
10.4% of users want to contribute to code development. A closer
look at this group indicates that most of them want to contribute
external features—such as connectors—rather than core code.