WINTER CORPORATION
WHITE PAPER

SAP Sybase IQ 15.4
An Elastic Platform for Business Analytics

The Large Scale Data Management Experts

SPONSORED RESEARCH PROGRAM

                                   APRIL 2012




                          245 First Street, Suite 1800
                             Cambridge MA 02145
                                  617-695-1800

                     visit us at www.wintercorp.com



            ©2012 Winter Corporation, Cambridge, MA. All rights reserved.
SAP Sybase IQ 15.4: An Elastic Platform for Business Analytics

A WINTER CORPORATION WHITE PAPER



Executive Summary


Executives around the world are intensely focusing on business analytics. They
      see an analytical approach to business decisions—an approach based on more abundant
      data and mathematical analysis of that data—as the cornerstone of new strategies for profitable
operation, profitable growth, new product development and customer engagement.
The opportunity to benefit from business analytics is especially large right now in part because
businesses have access to “big data”—enormous, previously unavailable volumes of data on the
actions, interests and sentiment of customers; on the movement of products, components and raw
materials through the supply chain and the distribution chain; and, on many other aspects of the
operation of businesses and their market environment. Perhaps surprisingly, the challenge of “big
data” is not only the data volume. It is also that much of the new data is less structured and less
regular than the tabular corporate data that has been the focus of data warehousing in the past.
The new big data comes from new or greatly expanded sources: social media, rapidly proliferating
smart mobile devices, vehicles, and a dizzying array of new sensors and intelligent products.
Even beyond the challenges of big data, there are other obstacles to success with business analytics:
data analysis can be a cumbersome, slow, frustrating and expensive process. First you have to find
the data you want. Then you have to get it loaded into a repository where it is accessible. Then you
need to cleanse it, organize it and integrate it with other data of interest. Then you have to conduct
the analysis…every step bedeviled by many practical difficulties, not least of which is often the
difficulty of getting help from people with the right skills.
New open source technology has emerged and is being deployed for “big data”; new vocabulary
includes terms such as “Hadoop clusters” and “MapReduce.” This technology brings new benefits
for certain types of information and analysis. However, it also creates one more data silo in a world
in which there are already too many silos. The complete analytical process thus gets enhanced in
some areas but also becomes more fragmented: to get analytic results and business solutions,
stakeholders must contend with a yet more complex environment with net new skill requirements.
The new, highly analytical business strategies place a particular emphasis on prediction. Knowing
what happened yesterday isn’t enough—you need to predict which of the business actions in front
of you is likely to produce the best result. And, as well as judgment, you need facts, data and analysis
to back that decision. And, you must take into account the new data sources—the customer sentiment
expressed on social media; the customer behavior evident from new data sources and devices; the
subtle patterns that can be seen in purchase behavior, web browsing and many other sources; and,
the supply chain realities now visible as parts, components, goods and materials move around the
world and are affected by weather, catastrophes and human events.
Often, to the decision maker, the unfortunate reality is that predictions of which profitable customers
are at risk may indeed be extremely valuable, but getting such predictions before it’s too late is
easier said than done.

 For many enterprises, then, the key to the analytic opportunity is finding a way to make the
 entire analytic process work smoothly, conveniently, responsively and cost effectively—whether
 the analysis focuses on the tabular data most frequently used for the past 25 years; on newer
 data sources, such as sentiment expressed in social media; or, both.

In response to this challenge, SAP has introduced a new version of its flagship analytic DBMS
product—SAP Sybase IQ 15.4—as a platform and an integrated environment to support and
facilitate the customer’s entire analytic process.







In addition to a greatly enhanced DBMS engine for data warehousing, Sybase IQ 15.4 features
significant new capabilities for business analytics and big data. Highlights are:
•	 A new analytic services layer that supports the use of MapReduce and many other analytic
   functions on data within Sybase IQ itself;
•	 Parallel interaction between Sybase IQ and Hadoop;
•	 Support of R, the open source language for statistical analysis;
•	 Support of new third party SQL-callable functions for data mining and predictive analytics;
•	 An expanded eco-system for the support of third-party applications for information
   lifecycle management, business intelligence and data integration, predictive analytics and
   system/data administration.
At the core of Sybase IQ 15.4 is the most mature column store DBMS for data warehousing on the
market, with sophisticated capabilities for data compression, query processing and query
optimization—an engine with a long record of exceptional query performance and efficiency.
While column storage and column-oriented data compression have been “hot trends” for the last
few years, Sybase IQ was built from day one with these capabilities: its users have been benefitting
from them for more than a decade. And, they contribute significantly to the efficiency of Sybase
IQ for analytics.
In addition to the remarkably efficient storage and query processing technology at its core, Sybase
IQ 15.4 features PlexQ™ technology, a distinctive, elastic design that supports highly parallel query
processing and data loading along with independent scaling for data growth and workload growth.
WinterCorp, an independent expert in analytic data management and big data, has been invited
by Sybase Inc, an SAP company, to review its new product, SAP Sybase IQ 15.4. To conduct its
review, WinterCorp reviewed product designs and documentation, and engaged in technical
discussions of the product architecture with key employees at SAP/Sybase and with independent
parties. This White Paper, sponsored by Sybase Inc, an SAP company, presents WinterCorp’s views
and findings from that review.







Table of Contents
Executive Summary
1 Introduction
2 Architecture of SAP Sybase IQ 15.4
    2.1 A Platform for Business Analytics
3 The SAP Sybase IQ 15.4 Core Data Management Infrastructure
    3.1 Data Load Performance and Scalability
    3.2 Column-Store Storage Efficiency, Indexing, and Compression
    3.3 Query Processing Performance and Scalability
    3.4 Very Large Database (VLDB) Management and Backup
    3.5 In-Database Analytics
    3.6 Text Search and Analysis
4 The Application Services Layer
    4.1 “MPP Enabled” User Defined Functions (UDF)
    4.2 Protected JAVA UDF’s
    4.3 In-Database MapReduce
    4.4 Simulation for In-Database Development
    4.5 Hadoop Interface
    4.6 Geospatial/Geometric Data & Query Support
    4.7 Free Express Edition
5 The Ecosystem Layer
    5.1 SAP BusinessObjects Portfolio Support
    5.2 “R” Language Support
    5.3 MapReduce-Enabled Data Mining
    5.4 Social Network Analysis Modules
    5.5 Sybase PowerDesigner 16 Architecture Recommender
    5.6 In-Database PMML
6 Conclusions








1   Introduction
    This paper examines the architecture and capabilities of SAP Sybase IQ 15.4 with a particular focus
    on demanding new requirements for business analytics and big data.

    Methodology & Sponsorship
    This WinterCorp Executive Report describes two trends, business analytics and “big data,” and the
    approach to them adopted in SAP Sybase IQ. In developing this report, WinterCorp drew on its own
    independent research and experience; interviewed SAP Sybase IQ employees; and, reviewed SAP
    Sybase IQ product materials. In its capacity as the sponsor of this report, Sybase Inc, an SAP
    company, was provided an opportunity to comment on the paper with respect to facts. WinterCorp
    has final editorial control over the content of this publication and is solely responsible for any
    opinions expressed.

    Business Analytics.   People who have been involved with data warehousing for the last decade or
    more, especially those with a technical background in the field, are often puzzled by the new
    wave of executive interest in “business analytics.” A common question is, “Aren’t we doing that
    already?” Surely, the reason all that data has been modeled, cleansed, integrated and stored in
    data warehouses for the last ten or twenty years is so that it can be analyzed!
    Certainly there has been analysis going on with data warehouse data. But, from the perspective of
    the business manager or business end user, data warehousing and business intelligence in practice
    have too often meant little more than ‘routine-ized’ reporting; extraction to other applications
    and systems; and, the occasional ad hoc query. Sure, business intelligence tools have steadily
    improved; data may be delivered on nicer looking, more functional electronic reports and
    dashboards; data access may be more interactive; and, data may even be available on mobile devices.
    All of these advances add some value.
    But most end users will still tell you the same thing: most of what they have been doing with the
    data warehouse has been “looking in the rear view mirror.” Often, business users learn what has
    happened from the data warehouse. They learn which products have been selling; which customers
    have been buying; which suppliers have consistently delivered on time. These insights are
    treasured when good information was not previously available as a basis for decision making.
    The problem is that the practice of business management has moved on from that point. Looking in
    the rear view mirror is no longer enough.
    Increasingly, operating and strategic decisions must be based on forward looking analysis with a
    mathematically sound foundation. The analytical approaches to business exemplified in Competing
    on Analytics¹ and a series of subsequent books, and in the best selling popular book and recent
    hit movie, Moneyball², have influenced business culture. These accounts and many others have
    shown how business performance can undergo radical improvement when the decision making process
    looks forward with analytics. At the heart of this revolutionary analysis is better prediction:
    whether of the performance of a baseball player, of a product, of a service, or of the behavior
    of a customer.

    ¹ Competing on Analytics: The New Science of Winning, Thomas Davenport and Jeanne Harris, Harvard Business
      School Press, 2007 (www.tomdavenportbooks.com)
    ² Moneyball: The Art of Winning an Unfair Game, Michael Lewis, W.W. Norton & Company, 2003






And, while you may feel that your data warehouse already has the capabilities to support these
analytics, there is more to the story.
Big Data.  As predictive analytics have been gaining ever more significance in business circles, another
trend—big data—has made a profound impact on business and data strategies.
“Big data” is a broad phenomenon encompassing the rise of social media; the seemingly sudden
proliferation of machine generated data; the worldwide spread of mobile intelligent devices, including
smart phones and tablets; the widespread use of GPS data, which attaches a location to many events
in daily life; and, rapid decreases in cost associated with capturing, delivering and storing a wide
range of previously costly varieties of data, including voice, image, video, etc.
Taking all of these phenomena together, we are witnessing an enormous explosion of data which is
many times larger and faster growing than what we have seen in data warehouses over the last
decade. While the transactional information about customers, products, stores and the like is still
uniquely valuable—and plays a central role in understanding any business—there is now new and
unprecedented information available that can provide business, engineering, scientific and medical
insights never before available.
To provide one example, a useful technique in customer retention is to observe when a profitable
customer’s activity with a credit card begins to decline and then react quickly to retain the customer
before the account is cancelled. When this technique works it is much more efficient than acquiring
a new customer that is equally or more profitable.
But what if you could know earlier—before the usage declined—that the customer was at risk?
Perhaps the retention rate would become yet higher and the retention cost lower, particularly if you
could discover the reason that the customer relationship was threatened. If you knew the reason,
then your actions to deal with it could be yet more efficiently directed at the root cause.
But how could you know earlier? One possibility is social media. If you are engaged with your
customers on social media, they may tell you what they are thinking: that they like the service or
the incentives or the prices offered by a competitor; that they don’t like your call center or your fees.
Or, if they have opted into your social media program, they may let you see what they are saying to
others about your product or service.
The enormous flood of data pouring out of social media is one of many examples of big data. Data
is also pouring out of a growing tide of products that we use every day, and to the extent that we opt
in, manufacturers can gain precious knowledge about how, when, and where we use products—and
what problems we have with them. This is clearly the case today with smartphones and tablets.
Vehicles are becoming more intelligent and more connected and will increasingly provide similar
capabilities (more expensive commercial vehicles, such as helicopters, already provide telemetry data
that is used to optimize safety and maintenance). The trend will spread to many other products that
we use every day, in every case generating yet more “big data” for analysis.
New Tools and Technologies.   The concurrent rise of predictive analytics and big data has generated
interest in new tools and technologies for several reasons.
First, much of the big data does not fit closely with the relational database model. Much of the
significance of the data is not revealed by fitting it into a tabular structure. Social media data has
textual, image, audio, video and other components that must be analyzed primarily by specialized
or procedural functions—SQL solves a relatively small part of the problem here. Embedded in the
data is a social graph which is most readily analyzed outside of SQL.





In general, a significant element of the new, more predictive analysis—especially of the newly varied
and highly voluminous “big data”—is best attacked with tools other than SQL. In connection with
this, interest has grown in MapReduce, a parallel data analysis framework, and Hadoop, an open
source engine for running MapReduce jobs.
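To make the MapReduce idea concrete, the following is a minimal single-process sketch in Python. It is not Hadoop's API or Sybase IQ's implementation; the function names, the brand names "acme" and "globex", and the sample posts are all hypothetical and chosen only for illustration. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one MapReduce pass in a single process: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        # The mapper turns one input record into zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # The reducer collapses all values that share a key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def mention_mapper(post):
    """Emit (brand, 1) for each hypothetical brand name found in a post."""
    for word in post.lower().split():
        if word in {"acme", "globex"}:
            yield word, 1


def count_reducer(key, values):
    """Sum the per-post counts for one brand."""
    return sum(values)


# Hypothetical social media posts, invented for this sketch.
posts = [
    "Acme fees are too high",
    "Switching from Acme to Globex",
    "Globex call center is great",
]
mention_counts = map_reduce(posts, mention_mapper, count_reducer)
```

Because each post is mapped independently and each key is reduced independently, both phases can be spread over many workers, which is what makes the pattern attractive for loosely structured data such as text.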
Some data analysis jobs can be readily performed in a Hadoop cluster. Others may require the services
of a data warehouse, such as SAP Sybase IQ. Yet others may best be handled with a combination of
the two.
Regardless of where the data is stored, interest has also grown rapidly in other analysis tools, such
as the open source statistical analysis language, R. In general, the new business analytics will use
SQL and the data warehouse, but will also create a strong demand for other tools.
Data Strategies.  As enterprises grapple with this rapidly changing world of big data, they need a
data infrastructure that will enable them to implement analytic business strategies. Especially with
regulatory and governance requirements enforcing longer periods of data retention, enterprises
need a convenient, flexible, cost effective process for solving analytic data problems from beginning
to end.
Sybase seeks to address that customer need—for a comprehensive approach to business analytics—
through its new capabilities in SAP Sybase IQ 15.4.








2   Architecture of SAP Sybase IQ 15.4
    Relational database engines have been commercially available since the 1970s. Most of the products
    still in use today were originally conceived as row storage engines for online transaction
    processing. A notable exception is SAP Sybase IQ. Conceived from its earliest days as a column-storage,
    analytical DBMS, Sybase IQ was in many ways ahead of its time. It was the first commercial column
    storage engine; the first to put a major emphasis on data compression; and, one of the earliest to place
    a strong emphasis on complex queries and analytics, rather than on online transaction processing.
    Sybase IQ has come into substantially widespread use, with thousands of customer installations, and
    thus has developed into a reliable, highly usable, comprehensive product for data warehousing and
    business intelligence.
    But, with Sybase IQ 15.4, that distinctive engine architecture has been expanded into something
    more: a platform for large scale business analytics. This section will discuss the new capabilities of
    Sybase IQ 15.4 and describe how they support and enable analytics for the data warehouse and for the
    newer phenomenon of big data.
    2.1   A PLATFORM FOR BUSINESS ANALYTICS
    With the introduction of Sybase IQ 15.4, SAP has expanded its IQ product line from data warehouse
    engine to business analytics platform, as shown in Figure 1.
    The core data management infrastructure, represented by the innermost layer in Figure 1, is a high
    performance column storage analytic database engine. In recent releases, the core data management
    infrastructure has been enhanced with SAP’s patented PlexQ™ technology, which SAP characterizes
    as a massively parallel, shared-everything architecture. The combination of the relatively new PlexQ™
    technology and Sybase IQ’s previously developed grid structure results in an elastic architecture,
    one in which capacity is readily added or removed. The underlying database engine, a distinctive design
    with sophisticated column storage, compression and indexing techniques, has long established
    advantages in query performance. In Sybase IQ 15.4, the core data management infrastructure is
    further enhanced with new capabilities for large object compression and high performance bulk
    inserts via the industry standard ODBC and JDBC interfaces. The core infrastructure has several
    other noteworthy features, highlights of which are discussed in Section 3.

    	                             Figure 1: Sybase IQ 15.4 as a Platform for Business Analytics







The Application Services Layer, shown in Figure 1, is a greatly expanded set of services designed
specifically for the development and support of analytic applications. It also provides facilities
for users and partners to develop and use their own analytic functions that Sybase IQ will run in
parallel against the database. This layer provides major new services, including an implementation
of native MapReduce that runs in parallel against the database and also provides connectivity with
Hadoop. The Application Services Layer is described further in Section 4.
The Ecosystem Layer, represented by the outermost layer in Figure 1, is an environment in which
SAP and its partners can provide and support analytic applications and tools, as well as the business
intelligence tools that have long been available with Sybase IQ. Key elements of this layer
that are new in Sybase IQ 15.4 include support for:
•	 all major Business Intelligence and Data Integration tools, including
   optimizations for SAP BusinessObjects products;
•	 the R language, an open source language for statistical analysis;
•	 a library of MapReduce enabled data mining functions that will run in parallel against data in
   Sybase IQ;
•	 a set of social network analysis modules; and,
•	 packaged applications for analytics and data lifecycle management.
The Ecosystem layer is yet another significant enhancement to the analytic capabilities of Sybase
IQ and a third major element of SAP’s initiative to make Sybase IQ a major platform for business
analytics. The Ecosystem Layer is described in Section 5.
While Sybase IQ has long enjoyed a respected presence in data warehousing, increasing its customer
base over the last few years from about 2,000 to over 4,500 installations, Sybase IQ 15.4 is clearly
something new and different from what Sybase has offered before. As well as significant continuing
enhancement to its core DBMS engine for data warehousing, SAP is now offering an array of
capabilities for business analytics with Sybase IQ.








3   The SAP Sybase IQ 15.4 Core Data Management Infrastructure
     The core infrastructure of SAP Sybase IQ has been enhanced significantly over the last three
     releases with the implementation of elastic PlexQ™ grids for highly parallel processing.

    The elastic PlexQ™ grid preserves the advantages of the earlier Sybase IQ architecture—a
    sophisticated form of shared data clustering—while adding scale out processing for queries, loads
    and other large data warehouse operations.
    In prior releases, Sybase IQ could run queries and loads in parallel within a single node. In Sybase
    IQ 15.4, with PlexQ™, the system can run an individual query or load in parallel across multiple
    nodes. This ability to scale out for individual queries and loads enables Sybase IQ 15.4 to handle
    significantly larger scale data warehouses and analyses.
    In addition, as in prior releases, Sybase IQ 15.4 can spread the work of multiple users across the
    nodes of the grid. Also, grid nodes can be grouped and assigned to specific workloads or user
    populations, making it possible to dedicate a chosen set of nodes to a particular purpose. New
    nodes can be added to the cluster as the workload grows, providing an elastic character to the
    system. Figure 2 below provides an overview of the core data management infrastructure:

                        Figure 2: SAP Sybase IQ Core Infrastructure with PlexQ™ Technology




                                                         Source: Adapted from a diagram by SAP Inc.

    Sybase IQ runs on Red Hat and SUSE Linux (32- and 64-bit), Windows (32- and 64-bit), AIX (64-bit),
    Sun Solaris (64-bit) and HP-UX (64-bit) systems, allowing customers to independently optimize storage,
    caching, processors, memory, threading, and load distribution.
    3.1   DATA LOAD PERFORMANCE AND SCALABILITY
    Sybase IQ data load performance and scalability depend primarily on seven factors:

     1.	 PlexQ™ technology, making it possible to spread the work of a load job across multiple
         nodes of an elastic PlexQ™ grid.







     2.	 Highly efficient bulk inserts via ODBC and JDBC, a new feature in Sybase IQ 15.4. Many
         third party tools and applications that load via industry standard interfaces can
         therefore load large data volumes much more rapidly. In some practical cases, for
         example when third party ETL tools are used, speedups of 100 times have been measured.³
     3.	 Fast, flexible load processing built into the engine at the most fundamental level.
     4.	 Versioning to minimize contention between data-load and query processing.
     5.	 Automated, flexible remote loads.
     6.	 “Near-real-time” “Trickle-feed” loads.
     7.	 Sybase’s ETL (extract, transform, load) utility.
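The bulk insert idea in item 2 can be sketched from the client side as follows. This is an illustration only: Python's built-in sqlite3 module stands in for an ODBC or JDBC connection, the table and function names are hypothetical, and the sketch says nothing about how Sybase IQ implements bulk inserts internally or about the speedups it achieves. It shows only the general pattern of sending rows in batches rather than one statement per row.

```python
import sqlite3


def load_row_at_a_time(conn, rows):
    """Baseline: one INSERT statement executed per row."""
    for row in rows:
        conn.execute("INSERT INTO sales (store_id, amount) VALUES (?, ?)", row)
    conn.commit()


def load_in_batches(conn, rows, batch_size=1000):
    """Batched load: hand the driver many rows per call via executemany."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "INSERT INTO sales (store_id, amount) VALUES (?, ?)", batch
        )
    conn.commit()


# sqlite3 in-memory database used as a stand-in for a warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")

# Hypothetical sample data: 5,000 rows spread over 10 stores.
rows = [(i % 10, float(i)) for i in range(5000)]
load_in_batches(conn, rows)
loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

With a networked driver, the batched form typically reduces the number of client-server round trips, which is one plausible reason that tools loading through standard interfaces benefit from a bulk insert path.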
Fast Load Processing.  Sybase IQ provides specific features for speeding column-store data loading.
In the batch case, a column-store approach allows loads to be in “flat schema” (or “semi-normalized”)
format—that is, users can avoid the added space and complexity of storing the data as multiple
tables. Sybase IQ’s architecture allows parallelism in loading, including parallel feeds from
distributed clients (the “grid”) to multiple servers and parallelism by using multiple processors for
parallel storage of individual tables and columns in the target data-warehouse database. Sybase
IQ loads only those columns that have changed in a given row (or, of course, in the entire data
store)—this typically allows Sybase IQ to create loads a fraction of the size of the comparable row-
store relational approach.
Versioning.  As the changed columns are loaded, they do not replace old columns. Rather, new
versions are created and old ones maintained while needed by ongoing queries. Within a new
column version, only changed pages create new storage. Thus, Sybase IQ querying is not interrupted
during data loading, data loading is not blocked by ongoing querying, and additional storage for
versioning is minimized.
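
The versioning behavior described above can be illustrated with a small copy-on-write sketch. This is not Sybase IQ's implementation: the `ColumnStore` class, its page size, and its method names are hypothetical, chosen only to show how a running query keeps its own version while a load creates a new one that shares every unchanged page.

```python
"""Hypothetical copy-on-write column versioning (illustration only)."""

PAGE_SIZE = 4  # values per page; an arbitrary choice for this sketch


class ColumnStore:
    def __init__(self, values):
        # Version 0: the column split into immutable pages (tuples).
        pages = [tuple(values[i:i + PAGE_SIZE])
                 for i in range(0, len(values), PAGE_SIZE)]
        self.versions = [pages]

    def snapshot(self):
        """Return the latest version number; a query reads only that version."""
        return len(self.versions) - 1

    def read(self, version):
        """Read the whole column as of the given version."""
        return [v for page in self.versions[version] for v in page]

    def load(self, changes):
        """Apply {row_index: new_value}; only the touched pages are copied."""
        new_pages = list(self.versions[-1])  # shares page objects with the old version
        for row, value in changes.items():
            page_no, offset = divmod(row, PAGE_SIZE)
            page = list(new_pages[page_no])
            page[offset] = value
            new_pages[page_no] = tuple(page)
        self.versions.append(new_pages)
        return len(self.versions) - 1

    def shared_pages(self, old, new):
        """Count pages physically shared between two versions."""
        return sum(a is b for a, b in zip(self.versions[old], self.versions[new]))


store = ColumnStore(list(range(12)))      # three pages of four values
reader_version = store.snapshot()         # a long-running query starts here
loaded_version = store.load({5: 500})     # a load changes one value in page 1

print(store.read(reader_version)[5])      # the reader still sees the old value
print(store.read(loaded_version)[5])      # new queries see the loaded value
print(store.shared_pages(reader_version, loaded_version))  # unchanged pages are shared
```

In this sketch the reader is never blocked and the load never waits, and the extra storage is limited to the one page that actually changed.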
Automated, Flexible Remote Loads. Sybase IQ allows scale-out loading across its grid architecture.
Data can be pulled from the clients, or “pushed” by the client to the server via ODBC. The utility
also enables data loading from SAP Sybase ASE, Microsoft SQL Server and Oracle data stores.
Near-Real-Time Loads. Sybase IQ supports “micro-bursts” of “microbatched” incremental data
loads (i.e., not the constant stream of updates of an OLTP database, but column changes accumulated
over a minute or two, loaded at once). For example, Replication Server—Real Time Loading Edition
15.5 allows delivery of changed data to the Sybase IQ data store within minutes of a data change
elsewhere. This keeps the data current in “near real time.” Combined with versioning, it
allows data to be kept current without interrupting ongoing queries.
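
As a rough illustration of micro-batching, the sketch below accumulates changes for a fixed window and hands them to a loader as one batch. The `MicroBatcher` class, the 60-second window, and the `sink` callback are assumptions made for this example; they do not describe Replication Server or the Sybase IQ load interface.

```python
class MicroBatcher:
    """Accumulate changes and deliver them as one batch per time window (hypothetical)."""

    def __init__(self, interval_seconds, sink):
        self.interval = interval_seconds
        self.sink = sink              # callable that receives a list of changes
        self.pending = []
        self.window_start = None

    def add(self, change, now):
        """Buffer one change; flush first if the current window has expired."""
        if self.window_start is None:
            self.window_start = now
        if self.pending and now - self.window_start >= self.interval:
            self.flush()
            self.window_start = now
        self.pending.append(change)

    def flush(self):
        """Deliver everything buffered so far as a single batch."""
        if self.pending:
            self.sink(list(self.pending))
            self.pending.clear()


batches = []
batcher = MicroBatcher(interval_seconds=60, sink=batches.append)

# (timestamp in seconds, change) pairs arriving from a source system
for t, change in [(0, "a"), (20, "b"), (59, "c"), (61, "d"), (130, "e")]:
    batcher.add(change, now=t)
batcher.flush()   # deliver whatever remains at shutdown

print(batches)
```

Timestamps are passed in explicitly so the behavior is deterministic; a production loader would read a clock and would also flush on a timer when no new changes arrive.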
Sybase InfoPrimer ETL Tool.  This coordinates data loading, including data cleansing as necessary.
It takes advantage of the features described in items 1-4 above, and operates multi-threaded, for a high degree
of concurrency and/or parallelism. InfoPrimer ETL combines loading and indexing (a chunk of
data and its indexes are treated as a single item) for additional ETL speedup. An SAP utility
automates data loading from SAP Sybase ASE, Microsoft SQL Server and Oracle data stores. Sybase
IQ also supports the SAP BusinessObjects Data Services ETL tool and other third-party ETL tools such
as those of Informatica, Syncsort, and DataStage. Note also that Sybase IQ supports “Extract Load
Transform” schemes, in which database functionality or stored procedures are used to speed some
forms of data transformation, as well as “change data capture” via Replication Server.



3   Note that bulk inserts were efficiently implemented in prior releases for the native application interface.




3.2   COLUMN-STORE STORAGE EFFICIENCY, INDEXING, AND COMPRESSION
A key differentiator for Sybase IQ is its ability to store data in the minimum amount of space on
disk or in main memory, which has a dramatic positive effect on performance and scalability.
Relational data stores in row format, by and large, already minimize duplication of records (rows).
However, relational row stores duplicate columns within a row even when there is no data in the
column, and store the same value in a column multiple times. Sybase IQ’s columnar-data-store
approach does not store non-existent column data, and stores each distinct value only once (Figure
3). For example, where a relational row store may store the “Married” value (or any other value) in
the customer-marital-status field in every row, the columnar approach stores a pointer to one
central instance of each value in the field.
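
The idea of storing each distinct value once and pointing to it from every row is commonly called dictionary encoding. The following is a minimal sketch of that general technique, not a description of Sybase IQ's on-disk format; the function names and the sample column are invented for illustration.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): each distinct value is stored once,
    and every row holds only a small integer code pointing to it."""
    dictionary = []   # code -> value
    lookup = {}       # value -> code
    codes = []
    for value in values:
        if value not in lookup:
            lookup[value] = len(dictionary)
            dictionary.append(value)
        codes.append(lookup[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# A hypothetical customer-marital-status column
marital_status = ["Married", "Single", "Married", "Married", "Divorced", "Single"]

dictionary, codes = dictionary_encode(marital_status)
print(dictionary)   # each distinct string appears once
print(codes)        # one small integer per row
```

With three distinct values, each row needs only two bits of code rather than a repeated string, which is where the space saving comes from.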

                               Figure 3: SAP Sybase IQ’s Columnar Data Storage




                                                                                     Source: SAP Inc.

Many queries in BI, complex or otherwise, analyze data using only a few fields in a record, or only
a few columns in a row. For queries involving analysis of many rows, this means that a row-based
query engine will retrieve much more data than necessary, slowing performance, while a column-
based query engine like Sybase IQ will retrieve only those efficiently-stored columns applicable
to the analysis. Add Sybase IQ’s ability to partition data according to columns and thus avoid some
indexing performance overhead (discussed in VLDB Management, below), and the more that
Sybase IQ scales, the greater the frequency and size of its performance advantage.
Note that other queries may favor a row-based approach—for example, those that access a small
number of rows and a large number of columns. The design philosophy of Sybase IQ argues that
such queries typically comprise a modest fraction of the workload in an analytic database. Therefore
the gains from a column-store approach will dominate the performance tradeoffs. While Sybase
IQ was alone in advancing this argument ten or more years ago, several products have
since incorporated some column storage features or capabilities in response. However, few products
have been designed with a column storage approach from the ground up—and Sybase IQ remains
the most mature of these.
To improve storage efficiency, Sybase IQ’s column-store architecture adds data compression,
leveraging its storage of a single data type per column per data page. Aside from the standard
methods of compressing individual word strings, Sybase IQ offers bit-mapped indexing (in which
low-cardinality column data values are represented as bit strings, and query operations can be
carried out as bit operations, for two-orders-of-magnitude performance speedup) where appropriate.
In fact, Sybase IQ provides compression not only of the data, but also of its indexes (Figure 4).
In Sybase IQ 15.4, data compression is enhanced further for large data objects, providing a critical
new capability for unstructured data. The enhanced data compression applies to variable length
and fixed length character and binary large objects (VARCHAR, VARBINARY, CHAR and BINARY).
In early use of these features, data has compressed from 3 times to 16 times more than with prior
releases of Sybase IQ. This enhanced compression means fewer disk I/O operations to read and
write the same data, thus enhancing performance. Large objects are especially prevalent in the
new “big data” arena, where unstructured and semi-structured data accounts for most of the
increased volume.

                                  Figure 4: SAP Sybase IQ Data Compression




                                                                                     Source: SAP Inc.

Many relational databases “retrofit” compression into their database engine by decompressing
the data before processing it. Sybase IQ was designed to process queries without decompression,
so that all operations use the compressed data, and the only time data is decompressed is when
processing is finished and the data is being sent to an end user to read in a report. Also, Sybase
IQ performs “perfect prefetch of pages,” because it knows from its bitmaps exactly which pages
have to be fetched in sequence. The result is an increase in the amount of data that can be stored
in main memory, allowing in-memory-database-like performance plus scalability beyond an
in-memory database.
Sybase IQ’s indexing schemes complement its columnar storage and compression approaches. In
particular, Sybase IQ offers a wide range of indexing schemes that allow columns with different
characteristics to be stored in less space (Figure 5).




                                   Figure 5: Forms of Indexing Supported by SAP Sybase IQ

 Index Name              Type of Data Useful For         Type of Query Operation         Data Type
                                                         Useful For

 Fast Project (Default)  All columns with                Projections with                All
                         < 16M unique values             scalar aggregates

 High Group              High cardinality columns        Large joins, GROUP BYs          All except BITs/CHARs > 255

 High Non Group          High cardinality columns        Range searches                  Mainly for integers and CHARs < 255

 Low Fast                Columns with < 1000 unique      Projections, joins,             All except BITs/CHARs > 255
                         values requiring fast lookup    scalar aggregates

 Date, Time, DateTime    Columns with DATE, TIME,        Queries with dates, times,      DATE, DATETIME, TIME only
                         DATETIME data types             timestamps ranges/compares

 Compare                 Two columns with identical      <, >, = compares                Mainly for integers and CHARs
                         data types (for comparison
                         operations)

 Word                    Data types involving strings    Dictionary Lookup               CHARs, VARCHARs only
                         and words

 Text                    Data types involving strings    Complex text terms/phrase       CHARs, VARCHARs only
                         and words                       searches including boolean,
                                                         proximity, and scoring


This broad range of indexing techniques is partly baked in (that is, data loading will automatically
index data in a compressed form for storage efficiency), but also allows the customer further
flexibility to create additional indexes to deliver performance for the customer’s unique querying
patterns. Because indexes are highly compressed, users can create a multitude of them up front in
anticipation of future ad hoc queries. An “index advisor” built into the query optimizer assists the
user by suggesting indexes that will improve query performance. Sybase IQ’s column store
architecture strongly encourages the use of indexes (in many cases multiple indexes per
column), to which predicates are applied to obtain a speedup. Figure 6 shows how Sybase IQ’s
data-storage approach can minimize I/Os.
Note also that SAP Sybase IQ can fetch data in large page sizes (typically 64K), which can reduce
disk I/O significantly.
                                          Figure 6: SAP Sybase IQ Query I/O Reduction
              EXAMPLE:
              select sum(sales)
                from customers
              where state = 'NY'
                and class = 'A'

              Sybase IQ will use the LF (Low Fast) indexes to filter rows and then apply the HNG (High Non Group) index to compute the sum.
              A minimal amount of data is read to resolve the query.
              Note also that Sybase IQ can fetch data in large page sizes (typically 64K), which can reduce disk I/O significantly.




                                                                                                          Source: SAP Inc.
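
The filtering step in the Figure 6 example can be sketched with simple bitmaps. The code below is a generic bitmap-index illustration under stated assumptions: the sample rows are invented, Python integers stand in for bit strings, and the final sum reads the `sales` values directly rather than modeling the HNG index.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an integer whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Yield the row numbers whose bits are set in the bitmap."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


# Hypothetical "customers" table stored column by column
state = ["NY", "CA", "NY", "NY", "TX", "NY"]
cust_class = ["A", "A", "B", "A", "A", "A"]
sales = [100, 250, 75, 40, 300, 10]

state_index = build_bitmap_index(state)
class_index = build_bitmap_index(cust_class)

# WHERE state = 'NY' AND class = 'A' becomes a single bitwise AND
matching = state_index["NY"] & class_index["A"]

# Only the qualifying rows of the sales column are then read
total = sum(sales[row] for row in rows_of(matching))
print(sorted(rows_of(matching)), total)
```

The point of the sketch is that both predicates are resolved on the compact bitmaps before any `sales` value is touched.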



3.3   QUERY PROCESSING PERFORMANCE AND SCALABILITY
The Sybase IQ query-processing engine is built to take advantage of all of Sybase IQ’s storage,
Shared Everything PlexQ™ architecture, and versioning capabilities. The cost-based optimizer can
load-balance a query across processors and systems, while constantly updating its “sense” of the
relative load on each processor/system. The optimizer also factors in the size of the compressed/
indexed data and its presence or absence in main memory, ensuring quicker data access and
processing. Sybase IQ can dynamically adjust its query execution plan based on concurrent
workload, after having started the execution of the query. Sybase IQ 15.4 rebalances query
resources—threads, processors, and cache—every several seconds, to maintain query performance
for both long-running/larger and short-time-period/smaller queries. Note that the intelligence of
the cost-based optimizer allows users to flexibly deploy heterogeneous small-scale servers if needed,
each with its own SLA (service level agreement).
Once the query is optimized, the engine carries out pipelining of operations within queries as well
as parallelism within and across queries. That is, a query that may involve an initial load and sort
followed by a join might begin the join operation for one column value immediately, without
waiting for all data to have been sorted. When one processor is finished sorting a column value, it
might move to sort the next, passing the value to the “join” processor. Multiple pipelines may
operate in parallel for different sets of data within a query. In the case of joins, in particular, Sybase
IQ provides two levels of parallelism, in which parts of data to be joined may be “grouped” initially
for separate, parallel processing, and then the groups may be joined together in a second step.
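
Pipelining within a query can be pictured with Python generators: each operator hands a row to the next as soon as it is produced, so the join starts before the scan has finished. This is a single-threaded sketch of the pipelining idea only. The operator names and sample data are hypothetical, and it does not model the parallelism or the sort step described above.

```python
def scan(rows, log):
    """Upstream operator: emit rows one at a time."""
    for row in rows:
        log.append(("scan", row["id"]))
        yield row


def hash_join(stream, lookup, key, log):
    """Downstream operator: join each incoming row as soon as it arrives."""
    for row in stream:
        match = lookup.get(row[key])
        if match is not None:
            log.append(("join", row["id"]))
            yield {**row, **match}


orders = [
    {"id": 1, "cust": "c1"},
    {"id": 2, "cust": "c9"},   # no matching customer
    {"id": 3, "cust": "c2"},
]
customers = {"c1": {"state": "NY"}, "c2": {"state": "CA"}}

log = []
result = list(hash_join(scan(orders, log), customers, "cust", log))

print(log)      # scan and join steps are interleaved, not run one after the other
print(result)
```

The log shows the first row being joined before the second row is scanned, which is the essence of pipelining.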
In the case of column data that uses bit-mapped storage and indexing, the engine takes an additional
step. It combines (performs bit operations) early, in order to reduce the number of times that the
engine needs to actually “touch” a data item. In this case, Sybase IQ never needs to do a table scan.
3.4    VERY LARGE DATABASE (VLDB) MANAGEMENT AND BACKUP
The larger Sybase IQ implementations typically manage hundreds of terabytes of data; a few Sybase
IQ systems manage petabytes of data, according to SAP.
Moreover, Sybase IQ allows administrators to bind tables, indexes, and columns to particular storage
structures—thus placing less fresh groups of data on more price-performant storage (offline disk
or nearline tape) without significant diminution of performance. Logical “groups” can be moved
(e.g., from disk to tape) with simple commands, as when “aging” data becomes ready for archiving.
Sybase PowerDesigner (also part of the Sybase Workspace IDE) enables creation of programs that
generate reports based on the data’s logical “age.” To complement logical “data age” partitioning,
Sybase IQ supports physical range partitioning of columns/tables based on the values in a “date
created” or “date last modified” field. Older data can be marked “read-only,” avoiding the need for
further backup (see Figure 7).
Adding and removing data can have significantly less impact on performance (and hence the need
for retuning) than in row-based systems. Specifically, if a field needs to be added to or removed
from the data, it does not require reallocation of each row in storage or immediate redefinition of
all affected row indexes, and does not “lock up” rows during the addition/removal process. Moreover,
efficient data and column representation means quicker field addition or removal.





             Figure 7: Data Partitioning Allows the Placement of Older Data on Lower Cost Storage




                                                                                      Source: SAP Inc.

In general, Sybase IQ emphasizes ease of tool use by administrators. They can perform most needed
operations via the Sybase Central GUI (graphical user interface), and SAP anticipates releasing a
Web version of administrative tools with the same functionality within the next 12 months.
Areas that administrators may tune include data modeling and ETL. At the same time,
Sybase IQ automates job load balancing within a node, as well as ETL-based data-load balancing.
Sybase IQ supports active-passive disaster recovery, with manual failover of a single failed node.
Sybase IQ’s Virtual Backup integrates with the storage subsystem to create and periodically
resynchronize shadow data-device copies online, with delayed logged writing of updates to the
shadow. Effectively, this means that during normal processing, backup overhead is minimal, and
“virtual restore” involves only roll-forward of changes not yet applied to the shadow—often a
matter of seconds.
Note that Sybase IQ reduces the amount of pre-aggregation/materialization and index creation
work required of the typical data-warehouse administrator. Sybase IQ’s columnar approach
effectively aggregates data according to columns and values in a column; index compression is
carried out during data loading; indexes can be created automatically “on the fly” by the query
engine, and can be based on usage patterns rather than pre-defined by the administrator.
Security schemes involve both data communications (e.g., RSA, FIPS 140-2, Kerberos) and data
storage. Data storage encryption is applied to the entire database and to particular columns (using
Sybase IQ’s AES 128-bit encryption or an optional FIPS 140-2 certified version of the encryption).
3.5   IN-DATABASE ANALYTICS
Using stored procedures or user-defined functions compiled and optimized within the database
engine’s process is a time-honored way to improve performance of key query types. SAP extends
the notion to encompass not only built-in math functions and SQL OLAP operators but also SAS/
SPSS-type complex operations such as clustering, simulations, and classifications. And Sybase IQ
specifically opens this capability (e.g., via C++ plug-ins) to third parties such as Fuzzy Logix and
Visual Numerics.
Sybase IQ 15.4 introduced a major expansion of the User Defined Function (UDF) and other analytic
capabilities. This is described in Section 4, The Application Services Layer.



3.6 TEXT SEARCH AND ANALYSIS
Sybase IQ allows full (semi-structured) text data search in combination with traditional relational
(structured) data analysis. For example, users can find all instances of a word or phrase in a set of
text fields stored in Sybase IQ’s data store, without having to scan table rows or having to know
which column the word or phrase is stored in. Specialized text indexes that store positional
information for terms in the indexed column(s) speed up complex text search and analysis. Moreover,
Sybase IQ’s in-database capabilities (outlined earlier) include plug-ins for third-party C++ Text
Analytics/Mining libraries.
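
A text index that stores positional information can answer phrase searches without scanning rows. The sketch below shows the general positional inverted-index technique; it is not the Sybase IQ TEXT index, and the tokenizer (lowercase, split on whitespace, no punctuation handling) and sample documents are simplifying assumptions.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Map term -> {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents containing the phrase terms at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Every following term must appear exactly i positions after the first.
            if all(start + i in index[term].get(doc_id, ())
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


notes = {
    1: "late payment on the customer account",
    2: "customer called about a late delivery",
    3: "payment received late",
}
index = build_positional_index(notes)

print(phrase_search(index, "late payment"))   # phrase match needs adjacent positions
print(phrase_search(index, "customer"))       # single-term lookup
```

Document 3 contains both words but not next to each other, so the positional check excludes it from the phrase result.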




4   The Application Services Layer
    The Application Services Layer, represented by the middle layer in the Sybase IQ 15.4 analytic platform
    architecture, is a greatly expanded set of services designed specifically for the development and
    operation of analytic applications. This layer provides several new services, including an
    implementation of MapReduce that runs in parallel against the database. The Application Services
    Layer also provides facilities for users and partners to develop and use their own analytic functions
    that Sybase IQ 15.4 will perform in parallel against the database.

                             Figure 8: The SAP Sybase IQ 15.4 Application Services Layer




                                                                                         Source: SAP Inc.

    Additional key elements of the Application Services Layer include protected “out of process” Java
    UDFs, spatial/geometric data and query support, and simulation for in-database application
    development and testing.
    4.1 “MPP ENABLED” USER DEFINED FUNCTIONS (UDF)
    Several of the advanced capabilities of the Application Services Layer are possible because of the new
    forms of user defined functions (UDF) supported in Sybase IQ 15.4. SAP characterizes these new
    UDFs as “MPP Enabled,” meaning that Sybase IQ will run them in highly parallel fashion, including
    spreading the work of a single function call across multiple nodes of the PlexQ™ grid.
    These functions are written in C or C++ (some types may also be written in Java) and are
    callable from SQL. Because such functions are enabled for execution in parallel across multiple
    nodes, they are key enablers for business analytics and big data.
    UDFs are a convenient mechanism for the advanced users or database professionals in an enterprise
    to codify certain calculations or analytical techniques specific to a business—and then make them
    available for use throughout the enterprise.
    Though the industry term for this capability is “user defined function,” and while Sybase IQ
    customers will certainly write them, a substantial library of such functions is provided by SAP
    and its partners. UDFs also provide a mechanism whereby a software vendor or data service
    provider can develop proprietary techniques and make them available for use by customers
    without necessarily disclosing the algorithm or its implementation.


Four classes of UDFs are supported:
•	 Scalar functions operate on individual data items, returning a single value;
•	 Aggregate functions operate on sets of values, returning a single value; several aggregation
   operations are built into the SQL language (for example SUM, COUNT and AVG), but
   aggregate UDFs provide an opportunity for users to create their own aggregation functions,
   which may incorporate techniques specific to an industry, company or analytical discipline;
•	 Table functions produce bulk data (that is, a table) as output and may be written in C/C++
   and/or Java;
•	 Table parameterized functions both accept bulk data as input and return bulk data as output
   and may be written in C/C++.
Taken together, considered in light of their enablement for highly parallel execution, these UDFs
provide a potent new analytical capability for SAP customers and partners.
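
The four classes differ mainly in the shape of their inputs and outputs. The sketch below shows those shapes in Python purely for illustration. Sybase IQ UDFs are written in C/C++ or Java and registered with the database, so every function name here is hypothetical and none of this is the Sybase IQ UDF API.

```python
def to_celsius(fahrenheit):
    """Scalar: one value in, one value out."""
    return (fahrenheit - 32) * 5 / 9


def value_range(values):
    """Aggregate: a set of values in, a single value out."""
    values = list(values)
    return max(values) - min(values)


def number_series(start, stop):
    """Table function: scalar arguments in, rows (a table) out."""
    for n in range(start, stop):
        yield {"n": n, "n_squared": n * n}


def top_n_per_group(rows, key, measure, n):
    """Table parameterized function: rows in, rows out."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for group_key in sorted(groups):
        ranked = sorted(groups[group_key], key=lambda r: r[measure], reverse=True)
        yield from ranked[:n]


sales = [
    {"region": "East", "rep": "Ann", "amount": 90},
    {"region": "East", "rep": "Bob", "amount": 120},
    {"region": "West", "rep": "Cy", "amount": 70},
    {"region": "East", "rep": "Dee", "amount": 50},
]

print(to_celsius(212))
print(value_range(row["amount"] for row in sales))
print(list(number_series(1, 4)))
print([row["rep"] for row in top_n_per_group(sales, "region", "amount", 2)])
```

Because a table parameterized function works on one group of rows at a time, it is the natural candidate for being spread across nodes, with each node handling a subset of the groups.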
4.2 PROTECTED JAVA UDFs
Prior to Sybase IQ 15.4, customers could write UDFs in C and C++. Such functions had
to be tested and certified before they could be run as part of a production system. They ran in the
Sybase IQ kernel.
UDFs can now also be written in Java. In addition, they are run in a “protected” mode. This means
that they are executed in a separate process that runs on a database server (that is, it runs on a node
of the PlexQ™ grid). This prevents an error in the UDF from interfering with the operation of
the core infrastructure of Sybase IQ 15.4 or of any other UDF or user process. The
result is therefore more reliable and consistently available data and analytical services.
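
The benefit of running a function in a separate process can be shown with a small sketch that launches the function in a child interpreter and reports failure instead of crashing the caller. This only illustrates the general out-of-process isolation idea; the `run_protected` helper and the two sample functions are hypothetical and unrelated to how Sybase IQ hosts Java UDFs.

```python
import subprocess
import sys


def run_protected(source, argument, timeout=10):
    """Run `source` in a child interpreter and return (ok, output_or_error).

    A failure inside the child cannot corrupt or terminate the calling process.
    """
    completed = subprocess.run(
        [sys.executable, "-c", source, str(argument)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if completed.returncode == 0:
        return True, completed.stdout.strip()
    error_lines = completed.stderr.strip().splitlines()
    return False, error_lines[-1] if error_lines else ""


# Two stand-in "UDFs", passed as source text to the child process
GOOD_UDF = "import sys; print(int(sys.argv[1]) * 2)"
BAD_UDF = "import sys; print(1 // (int(sys.argv[1]) - 21))"   # divides by zero for input 21

good_ok, good_value = run_protected(GOOD_UDF, 21)
bad_ok, bad_error = run_protected(BAD_UDF, 21)

print(good_ok, good_value)
print(bad_ok, bad_error)
print("caller still running")
```

The faulty function fails in its own process, the caller receives an error result, and other work continues unaffected.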
4.3 IN-DATABASE MapReduce
“MapReduce is a software framework for distributed processing of large data sets on compute
clusters,” as described on the website of the Apache Foundation (http://hadoop.apache.org/mapreduce/).
In the MapReduce framework, data analysis tasks can be broken into functional pieces—called
mappers and reducers—each of which performs a portion of the analysis, reading an incoming
set of (key, value) pairs and writing an outgoing set of (key, value) pairs. When mappers and
reducers are run in the correct sequence, the complete analysis task is accomplished.
The MapReduce framework is especially interesting when a large volume of data is to be processed
because it is designed—and MapReduce functions are written—so that many copies of each mapper
and each reducer can be run at the same time in a parallel architecture. Thus, if there is a terabyte
of data to be analyzed and one runs 100 copies of a mapper, then each mapper needs to analyze
only 10 GB of data (assuming that there is a readily available way to partition the data into 100
roughly equal parts). This concept of scalable, highly parallel analysis is similar to the concept of
parallel query processing used in a data warehouse, though there are important differences between
the two.
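
A minimal word-count sketch makes the mapper and reducer roles concrete. It runs the partitions one after another in a single process, whereas a real framework would run the mapper copies in parallel. The `run_mapreduce` driver and the sample records are assumptions for illustration and are not the Sybase IQ or Hadoop API.

```python
from collections import defaultdict


def mapper(key, line):
    """Read one (key, value) pair and emit a (word, 1) pair per word."""
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    """Read one key with all of its values and emit a single (word, total) pair."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer, partitions):
    """Sequential stand-in for a parallel MapReduce run."""
    # Partition the input; each chunk would go to a separate mapper copy.
    chunks = [records[i::partitions] for i in range(partitions)]

    # Map phase, followed by grouping the intermediate pairs by key.
    grouped = defaultdict(list)
    for chunk in chunks:
        for key, value in chunk:
            for out_key, out_value in mapper(key, value):
                grouped[out_key].append(out_value)

    # Reduce phase: one reducer call per distinct key.
    results = {}
    for out_key in sorted(grouped):
        for final_key, final_value in reducer(out_key, grouped[out_key]):
            results[final_key] = final_value
    return results


records = [
    (0, "big data big analytics"),
    (1, "data warehouse"),
    (2, "big warehouse"),
]
word_counts = run_mapreduce(records, mapper, reducer, partitions=2)
print(word_counts)

# The sizing arithmetic from the text: 1 TB (taken as 1000 GB) over 100 mapper copies
data_gb, mapper_copies = 1000, 100
print(data_gb / mapper_copies, "GB per mapper")
```

Neither the mapper nor the reducer contains any partitioning or parallel code, which is why the same functions can be run at any degree of parallelism.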
Prior to the development of MapReduce, procedural programs to analyze data—written outside
of the context of a parallel database system—had to deal with all the complexity of parallel
programming. So, data could be analyzed serially—a very slow process with large data volumes—
or the programmer could get involved in the very complex and error prone process of specifying
manually how the data was to be:
•	 partitioned;
•	 fed to many separate copies of the analysis process; and,
•	 analyzed;
and, then, how the many separate results were to be
•	 recombined; and,
•	 delivered.
In a complex analysis there are many successive stages of parallel analysis, with the data passing
between them in complex patterns, and the difficulty of the programming task escalates rapidly.
The MapReduce framework relieves the programmer of explicitly dealing with the parallel aspect
of the analysis, freeing him or her to concentrate on the data and the analytical logic.
The MapReduce framework has been popularized in connection with Hadoop, an open source software
system that implements MapReduce on compute clusters (typically, clusters of low cost servers and
low cost storage). Hundreds, if not thousands, of companies are now using or experimenting with
Hadoop clusters in part so that they can have an environment for storing large amounts of data—
the so-called “big data”—and analyzing it with MapReduce and other tools.
While a Hadoop cluster provides a repository for the storage and analysis of big data, it has different
advantages and limitations than a data warehouse. WinterCorp believes that most enterprises will
have an analytical environment in which at least one data warehouse and at least one Hadoop
cluster will be present. Section 5.x provides more information about Hadoop clusters and describes
the facilities in Sybase IQ 15.4 for interfacing to them and interacting with data stored in them.
Meanwhile MapReduce as a programming framework has come to be widely viewed as a standard
method for interfacing the procedural program—written in Java, Python or some other popular
language for data analysis—to a large volume of data in storage. This is because programs and
functions written using MapReduce can be executed in highly parallel architectures that speed up
the large scale analysis.

 In Sybase IQ 15.4, a facility is provided for running C++ applications that use the MapReduce
 framework and run within the Sybase IQ PlexQ™ elastic grid. They can run against data stored
 in the Sybase IQ database or against externally stored data. The data can be structured or
 unstructured, as Sybase IQ 15.4 is capable of storing either. And the mappers and reducers
 are stackable.

Note carefully that, in the Sybase IQ 15.4 context, such programs need not have anything to do
with Hadoop. The data that they are analyzing can be data previously stored in Sybase IQ and
that could be analyzed with SQL queries or any other tool that works with Sybase IQ. Given
the popularity of MapReduce, however, this facility is likely to be valuable because:
•	 Many libraries of analytical functions will be implemented for other environments using
   MapReduce; such libraries can then be ported to Sybase IQ 15.4;
•	 Many programmers, data scientists and other data specialists will gain familiarity with
   MapReduce and may prefer to program using it; and,
•	 Sybase IQ customers may want to build their own libraries of functions that can be used both
   on data in Sybase IQ and on data in other environments such as Hadoop; these customers
   will therefore be able to use MapReduce for this purpose.
As described in Section 5.2, at least one Sybase IQ partner has already leveraged this facility to
provide data mining functions to Sybase IQ customers using MapReduce.



 In addition, using MapReduce on Sybase IQ data will typically be simpler than accomplishing
 the same task on data in a Hadoop cluster. This is because the data to be analyzed can be
 selected and partitioned using SQL; the results returned by the analysis can be stored back
 in Sybase IQ using SQL; and, the definition of the data to be analyzed can be maintained in
 SQL. Each of these simplifies some aspect of the analytical process. Also, data that is stored
 in Sybase IQ is managed by Sybase IQ. It benefits from all of the services provided in the
 Sybase IQ environment for other data. For example, it can be incorporated in a routine backup
 schedule; it can be made recoverable; it can be secured via access controls; and, so on.

4.4 SIMULATION FOR IN-DATABASE DEVELOPMENT
Analyzing data within a UDF, rather than transferring the data to an external system for analysis,
has important advantages for a user of Sybase IQ 15.4. First, it takes time and system resources to
transfer the data elsewhere. Second, the moment the transfer begins, the data starts growing stale.
If the analysis is delayed for some reason it becomes even more stale. And the larger the volume
of data to be analyzed, the higher the overhead of first moving it elsewhere. Third, Sybase IQ
15.4 is capable of running UDFs in parallel across multiple nodes. If the data is transferred to
another system for analysis, and if that system is not able to analyze data with an equal or greater
degree of parallelism, then there will be yet more delay. Fourth, if the results of the analysis are
substantial and are to be retained for later use, it will be efficient to write them in parallel back
within Sybase IQ, rather than having to transfer them from another system.
These reasons—and others—provide incentives to analyze data in place within UDFs in Sybase
IQ. But, there are some issues to address. As UDFs are being developed, they may contain errors.
A UDF under test could have unintended—and undesirable—effects on the production environment
if run there.
In some environments, the production data is extremely sensitive and it is not practical to have a
copy of it in a separate test environment.
To address such issues, Sybase IQ 15.4 provides facilities for testing of UDFs and applications that
are intended to perform in-database analysis. These facilitate the process of creating realistic
simulated data in a large scale test database. As a result, the development of in-database analytics
is far more streamlined and UDFs can be more completely tested before they are used with
production data.
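The general idea of simulated test data can be illustrated with a short Python sketch. This is not the Sybase IQ facility itself, whose interface is not described in this paper. The `orders` schema, the column names, the value ranges and the `make_orders` helper are all assumptions chosen for the example. The sketch produces reproducible synthetic rows, so that a function under test sees the same input on every run without any copy of production data.

```python
import random
from datetime import date, timedelta

# Hypothetical set of values for a "region" column.
REGIONS = ["east", "west", "north", "south"]


def make_orders(n, seed=42, start=date(2012, 1, 1), days=90):
    """Yield n synthetic order rows; the same seed always gives the same rows."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    for order_id in range(1, n + 1):
        yield {
            "order_id": order_id,
            "region": rng.choice(REGIONS),
            "order_date": start + timedelta(days=rng.randrange(days)),
            "amount": round(rng.uniform(5.0, 500.0), 2),
        }
```

Because the generator is lazy, `make_orders` can feed a bulk loader with an arbitrarily large number of rows without holding them all in memory.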
4.5 HADOOP INTERFACE
Many enterprises will create—or have already created—an analytic environment in which there
are multiple data repositories, some on data warehouse platforms and others in Hadoop clusters.
In this situation, which WinterCorp believes will become nearly universal within the next several
years, it will be common to have analytic processes which leverage data from multiple sources.
In response to this emerging requirement, Sybase IQ 15.4 has four mechanisms in its Application
Services framework for connection between Sybase IQ and Hadoop. These are:
a.	 Client Side Federation.  The Quest TOAD data query tool (certified with Sybase IQ and Hadoop)
    can retrieve data from each source and bring it together at the client; this can be a good solution
    when the volumes of data returned are not very large;
b.	Analysis in the Sybase IQ Environment that Includes Data Extracted from Hadoop.  Hadoop
   data is extracted and loaded into Sybase IQ via Apache Sqoop, an open source tool for bulk data
   transfers between Hadoop and relational databases (Sqoop stands for “SQL-to-Hadoop”); this is a
   particularly attractive solution when the data extracted from Hadoop is to be joined or aggregated
   with data that resides in Sybase IQ; performing the work in the Sybase IQ environment brings
   all the benefits of a mature relational database environment, including security, compliance,
   backup/recovery, query optimization, and defined and controlled data semantics; this is a bulk
   data transfer which can be highly parallel on both the Sybase IQ side and the Hadoop side;
c.	 Incorporation of Hadoop Data into Sybase IQ Queries.  When the data access is to be more
    frequent, and when the data volumes to be transferred are not very large, data can be retrieved
    from Hadoop using Sybase IQ table functions; these retrievals, while not instant because Hadoop
    is fundamentally a batch environment, can nonetheless be incorporated into SQL queries as
    they are executing;
d.	Coordinate Hadoop Job(s) with Sybase IQ Query.  In this case, a Hadoop MapReduce job runs
   separately from a query but is designed to feed data to it; the query and the Hadoop job interface
   by means of parameterized table functions in the Sybase IQ query; though similar to case (c)
   above, the emphasis here is on coordinating and integrating analysis that is occurring in two jobs
   in two separate environments.
With these four capabilities, Sybase IQ customers can deal effectively with a range of situations in
which a Hadoop repository is to be used in conjunction with a Sybase IQ data warehouse to meet
an analytic requirement.
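The first of these mechanisms, client-side federation, can be sketched in a few lines of Python. In this hypothetical example an in-memory SQLite database stands in for the data warehouse and a list of tuples stands in for output already produced by a Hadoop job. Neither the Sybase IQ nor the Hadoop client interfaces are used, and the table, column and function names are invented for illustration. The client fetches a small result from each source and joins the two locally.

```python
import sqlite3

# Stand-in for the output of a Hadoop job: (customer_id, page_views).
HADOOP_OUTPUT = [(1, 340), (2, 12), (4, 77)]


def open_warehouse():
    """Create an in-memory SQLite database that plays the role of the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    return conn


def federate(conn, hadoop_rows):
    """Join warehouse rows with Hadoop rows at the client (inner join on customer_id)."""
    views = dict(hadoop_rows)  # small enough to hold in client memory
    cursor = conn.execute("SELECT customer_id, name FROM customers ORDER BY customer_id")
    return [(cid, name, views[cid]) for cid, name in cursor if cid in views]
```

Because both result sets are held in client memory, this pattern suits only modest data volumes, which is why the bulk transfer and table function mechanisms exist for larger or more frequent access.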
4.6 GEOSPATIAL/GEOMETRIC DATA & QUERY SUPPORT
Two trends have increased the prevalence and the significance of location data and geometric data in
the analytical environment.
First, GPS enabled devices such as smartphones and tablets are in widespread use and proliferating
rapidly. There are hundreds of millions of such devices in use today and projections are that there
will soon be billions. Such devices frequently communicate their location via the internet and such
location data ends up in many commercially valuable databases.
Second, many other types of GPS-enabled electronic devices are being created and they also
communicate their location with increasing frequency. Examples include vehicles, medical devices,
surveillance devices used for traffic analysis, weather devices and others too numerous to mention
here.
Data on the location of devices, once too expensive or impractical to obtain, now shows up in more
databases every day. Analysis of the location aspects of this data is central to the timely understanding
and management of public safety, public health, the commercial supply chain, customer purchase
patterns and many commercial resources.
A similar trend exists with respect to geometric data significant to the design and manufacture of
products; the maintenance of buildings, highways and bridges; the management of energy use;
and, so on.
In both cases, it is important for the data to be defined, captured and stored in as standardized,
easily specified and easily used a fashion as possible. It is also essential for the database system to
facilitate the specification of queries that exploit geographic and geometric data. And, it is essential
for the database system to perform such queries efficiently, especially when the data volumes
involved are large.
Sybase presently addresses these requirements in SQL Anywhere, the embedded row store DBMS inside
Sybase IQ. SQL Anywhere is a very efficient, small-footprint DBMS that serves as a catalog store for
Sybase IQ’s engine. However, users can also spawn a separate instance of SQL Anywhere from Sybase IQ
to store geospatial data. Sybase IQ then provides facilities for federated query of the
geospatial/geometric data stored in SQL Anywhere and the main analytical column data store in
Sybase IQ 15.4, to enable high performance geospatial analysis.
4.7 FREE EXPRESS EDITION
As with any software platform, it is important for developers to be encouraged to develop tools
and applications for Sybase IQ. The robust and rapidly growing community of millions of Sybase
IQ end users, using the product at over 4,500 installations worldwide, is certainly an incentive
to developers.
But it is still important to remove obstacles from the path of any developer interested in providing
new capabilities to those users.
To this end, SAP is providing a free Express Edition of Sybase IQ 15.4. Anyone developing for
Sybase IQ (utilizing the rich Application Services described in this section), or
thinking about such an activity, can download the product from http://www.sybase.com/iqexpressedition.




5   The Ecosystem Layer
    The Ecosystem Layer, represented by the outermost layer in the Sybase IQ 15.4 analytic platform
    architecture, is an environment in which SAP and its partners can provide and support analytic
    applications and tools, as well as the business intelligence tools that have long been available with
    Sybase IQ.

    Figure 9: The SAP Sybase IQ 15.4 Application Enablement Layer (Source: SAP Inc.)

    Key elements of the Ecosystem Layer include support for SAP BusinessObjects and the “R” statistical
    language; more efficient and scalable data mining functions written to exploit the new in-database
    MapReduce; social network analysis modules; support for PMML, for mathematical modeling;
    new capabilities for PowerDesigner and system administration and monitoring; and applications for
    big data information lifecycle management. Highlights are described in the following sections.
    5.1 SAP BUSINESSOBJECTS PORTFOLIO SUPPORT
    As part of SAP, Sybase IQ is now well integrated with SAP’s market-leading tools for Business
    Intelligence and Data Integration from its BusinessObjects portfolio. The SAP BusinessObjects BI
    Platform is not only certified with every new version of Sybase IQ, including Sybase IQ 15.4, but
    is also being optimized to generate SQL tuned for Sybase IQ. Similarly, SAP
    BusinessObjects Data Services is being certified and optimized to load and transform data into
    Sybase IQ in a very efficient manner.
    5.2 “R” LANGUAGE SUPPORT
    As described at www.r-project.org,
     R is a language and environment for statistical computing and graphics. …R provides a wide variety of statistical
     (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc) and
     graphical techniques, and is highly extensible.
     R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in
     source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD
     and Linux), Windows and MacOS.




Sybase IQ 15.4 provides support for the R language in two ways.
First, R applications can fetch data sets stored in IQ for analysis in the R environment through
RJDBC.
Second, calls to models written in R can be embedded in a Sybase IQ table UDF written in C++.
Then SQL queries submitted to Sybase IQ can call the UDF, thereby causing the model to be invoked
in an R server process.
5.3 MapReduce-ENABLED DATA MINING
Since Version 15.1, the Fuzzy Logix library of data mining and analytic functions has been available
with Sybase IQ.
With Sybase IQ 15.4, this library of over 250 functions has been:
•	 Re-implemented using Sybase IQ’s in-database MapReduce API; and,
•	 Extended with additional new functions.
Most significantly, by using Sybase IQ’s in-database MapReduce API, the new implementation
leverages the Sybase IQ table and table parameterized functions (thus using bulk data input and
bulk data output to gain efficiency) and exploits the elastic PlexQ™ grid to execute the functions
with much higher parallelism.
5.4 SOCIAL NETWORK ANALYSIS MODULES
KXEN’s InfiniteInsight social network analysis and predictive analytic toolset has been certified
with Sybase IQ 15.4 to run on data stored in the database. With Sybase IQ 15.4, KXEN does its
scoring directly in the database, and reports that it realizes large performance benefits both from
the column storage model and the in-database analytic support.
5.5 SYBASE POWERDESIGNER 16 ARCHITECTURE RECOMMENDER
Sybase PowerDesigner is a widely used application and database design product that has long been
available and integrated with Sybase IQ.
PowerDesigner 16 and Sybase IQ 15.4 are now jointly enhanced and integrated to provide a new
capability of recommending the architecture for a Sybase IQ solution. The user provides
PowerDesigner with:
•	 Database design
•	 Expected data volumes & growth
•	 Expected workload
•	 Performance requirements
•	 Hardware preferences (e.g., Intel or Power)
PowerDesigner will then generate an estimate of the configuration required and a bill of materials
based on Sybase IQ reference architectures developed in cooperation with system partners. Where
pre-built appliance-like configurations are available, these can be generated.
The user can then vary input assumptions and examine the sensitivity of the configuration to
variations.
In WinterCorp’s opinion, such estimated configurations would be used only as a starting point in
certain capacity planning situations. Particularly in larger and more complex deployments, users
would be well advised to seek independent confirmation and measurement. However, a fast path
to an initial estimate is often extremely useful in capacity planning and this tool can provide that,
along with an indication of sensitivity to various planning assumptions.




An area where particular caution is advised is in regard to large databases with complex query
requirements. Where query complexity is high and data volumes are large, modest changes to the
query workload can produce surprisingly large variations in capacity requirements. In these cases,
a certain amount of realistic testing, along with larger allowances for unexpected capacity
demands, is in order.
However, with this tool, a history of configuration changes can be initiated, estimated, tracked
and maintained, which can make sizing and deployment much more “factory like.”
5.6 IN-DATABASE PMML
From http://www.dmg.org/pmml-v3-0.html:
 The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications
 to define statistical and data mining models and to share models between PMML compliant applications.
 PMML provides applications a vendor-independent method of defining models so that proprietary issues and
 incompatibilities are no longer a barrier to the exchange of models between applications.
PMML models can be developed in a variety of data mining and statistical workbench environments
available from other parties. However, when PMML models are actually used in production to
score large volumes of data, they must run in a highly parallel environment.
In Sybase IQ 15.4, users can run PMML models with a plug-in developed by Zementis
(http://www.zementis.com/in-DB-plugin.htm). With the plug-in, the PMML model can be run directly
against data in Sybase IQ. The Zementis plug-in is a Sybase IQ UDF, leveraging the new Java API
available in Sybase IQ 15.4.
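To make the idea of a vendor-independent model definition concrete, the following Python sketch scores rows against a hand-written, PMML-style linear regression fragment. It is only an illustration and is not the Zementis plug-in. The fragment omits the namespace declaration and most of what a complete PMML document contains, and the field names and coefficients are invented. The sketch relies only on the `RegressionTable` and `NumericPredictor` elements, with their `intercept`, `name` and `coefficient` attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment; the fields and coefficients are invented.
PMML_FRAGMENT = """
<PMML version="3.0">
  <RegressionModel functionName="regression" modelName="demo">
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="visits" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_linear_model(pmml_text):
    """Return (intercept, {field: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients


def score(model, row):
    """Apply the linear model to one row (a dict of field name to value)."""
    intercept, coefficients = model
    return intercept + sum(c * row[name] for name, c in coefficients.items())
```

Since `score` depends only on the model and a single row, it can be applied to many rows independently, which is the property an in-database, parallel scoring environment exploits.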
Besides the various ecosystem modules outlined above, Sybase IQ supports a substantial variety of
packaged analytical applications through its OEM partnerships covering various functional areas.
A few examples include Ericsson’s OSS product ENIQ, BMMSoft EDMT, and Solix EDMS.




6   Conclusions
    Over the course of five rapid releases in three years, from 15.0 through the present 15.4, SAP
    Sybase IQ has been transformed into a platform for large scale data analytics and big data. It has
    significantly advanced in:
    •	 Scalability, with the development of its elastic PlexQ™ grid that adds highly parallel
       execution of large queries and loads; previously, such operations could run in parallel over a
       single node of the grid; now they can run in parallel over multiple nodes; this is a major
       architectural advance, highly significant for larger data and workload requirements;
    •	 In-database analytics, with a major generalization and extension of the user defined function
       (UDF) facility in Sybase IQ; with these new capabilities, UDFs can be written in Java as well
       as C++; they can read and write bulk data in the form of tables and files; they can be run in a
       protected mode, increasing system reliability and data availability; and, they can be executed
       in parallel over multiple nodes of the grid;
    •	 In-database MapReduce, enabling end users and partners to run MapReduce routines and
       libraries against data in place and in a highly parallel fashion in Sybase IQ, and opening
       Sybase IQ up to a large range of analytic tools and applications from many vendors and sources;
    •	 Interface to Hadoop, enabling the many customers who are investing, or will invest, in an
       open source data repository in a Hadoop cluster to leverage that investment in combination
       with data and analysis in Sybase IQ;
    •	 Other analytic application services leveraging in-database MapReduce and new, more
       powerful UDFs; these include an expanded, more efficient and more highly parallel version
       of the Fuzzy Logix data mining and analytics library; a simulator for testing analytic
       applications; and, other features;
    •	 Partner ecosystem, with other analytical, management and business intelligence tools and
       functions available from partners, certified with Sybase IQ and providing analytical solutions
       and capabilities to customers; these include support for the SAP BusinessObjects tool set and the
       R statistical language; a PMML plug-in for data mining from Zementis; social network
       analysis from KXEN; query and administration tools from Quest TOAD; and other
       capabilities.
    These advances are evidence of a significant reorientation of the product direction and a significant
    enhancement of the product line to focus on the major drivers of change in business today.
    Organizations everywhere are grappling with the implications of a much larger volume and variety
    of data and a much increased focus on business strategies driven by fuller analysis of that data.
    Mobility (tablets, smartphones, other devices), social media and machine generated data are all
    changing our data environments.
    Sybase now claims more than 4,500 installations of Sybase IQ across the globe, following a rapid
    growth in revenue and a large expansion of the development organization.
    In addition to the recent advances in releases 15.0 through 15.4 described here, Sybase IQ retains its
    established advantages in column storage, indexing and compression. These features—present since
    the earliest versions of Sybase IQ—work in combination to confer benefits that are unique to Sybase
    IQ. While other products offer column storage and compression, no other product has the sophistication
    of Sybase IQ in integrating these features with advanced indexing and query optimization. The result
    is that Sybase IQ is particularly efficient in reducing the amount of data that must be read to satisfy
    queries. These fundamental strengths are now combined with increased parallelism and other
    features to deliver product benefits in a wider range of applications, now including those that use
    advanced analytic methods such as MapReduce and those that involve interaction with big data in
    Hadoop clusters.


WinterCorp is an independent consulting firm expert in the architecture
            and scalability of big data and analytic database solutions.
Since our founding in 1992, we have architected solutions to some of the largest scale
    and most demanding big data and data warehouse requirements, worldwide.
     We help technology users define their requirements; architect their solutions;
select their platforms; and, engineer their implementations to optimize business value.
       We create and conduct benchmarks, proofs-of-concept, pilot programs and
         system engineering studies that help our clients manage technical risk,
                          control cost and reach business goals.
         Our seminars and structured workshops help client teams establish
    a shared foundation of knowledge and move forward to meet their challenges
     in big data and analytic database scalability, performance and availability.
      We’re expert with SQL, MapReduce and Hadoop—with structured data,
       unstructured data, and semi-structured data—with the products, tools
              and technologies of data analytics in all its major forms.
     With our in-depth knowledge and experience, we deliver unmatched insight
           into the issues that impede scalability and into the technologies
                       and practice that enable business success.





SAP Sybase Event Streaming ProcessingSAP Sybase Event Streaming Processing
SAP Sybase Event Streaming Processing
 
Sybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem Performans
 
Mobil Uygulama Geliştirme Klavuzu
Mobil Uygulama Geliştirme KlavuzuMobil Uygulama Geliştirme Klavuzu
Mobil Uygulama Geliştirme Klavuzu
 
Mobile Device Management for Dummies
Mobile Device Management for DummiesMobile Device Management for Dummies
Mobile Device Management for Dummies
 
SAP Sybase Data Management
SAP Sybase Data Management SAP Sybase Data Management
SAP Sybase Data Management
 
Sybase IQ ve Big Data
Sybase IQ ve Big DataSybase IQ ve Big Data
Sybase IQ ve Big Data
 
Sybase IQ ile Analitik Platform
Sybase IQ ile Analitik PlatformSybase IQ ile Analitik Platform
Sybase IQ ile Analitik Platform
 
Appcelerator report-q2-2012
Appcelerator report-q2-2012Appcelerator report-q2-2012
Appcelerator report-q2-2012
 
Sybase PowerDesigner Vs Erwin
Sybase PowerDesigner Vs ErwinSybase PowerDesigner Vs Erwin
Sybase PowerDesigner Vs Erwin
 
Actionable Architecture
Actionable Architecture Actionable Architecture
Actionable Architecture
 
Information Architech and DWH with PowerDesigner
Information Architech and DWH with PowerDesignerInformation Architech and DWH with PowerDesigner
Information Architech and DWH with PowerDesigner
 
Why modeling matters ?
Why modeling matters ?Why modeling matters ?
Why modeling matters ?
 
Welcome introduction
Welcome introductionWelcome introduction
Welcome introduction
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQ
 
Mobile Application Strategy
Mobile Application StrategyMobile Application Strategy
Mobile Application Strategy
 
Mobile is the new face of business
Mobile is the new face of businessMobile is the new face of business
Mobile is the new face of business
 
Sybase SUP Mobil Uygulama Geliştirme Genel Bilgilendirme
Sybase SUP Mobil Uygulama Geliştirme Genel BilgilendirmeSybase SUP Mobil Uygulama Geliştirme Genel Bilgilendirme
Sybase SUP Mobil Uygulama Geliştirme Genel Bilgilendirme
 
Sybase IQ Big Data
Sybase IQ Big DataSybase IQ Big Data
Sybase IQ Big Data
 

Dernier

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Elastic Platform for Business Analytics

  • 1. WINTER CORPORATION WHITE PAPER. SAP Sybase IQ 15.4: An Elastic Platform for Business Analytics. The Large Scale Data Management Experts. SPONSORED RESEARCH PROGRAM
  • 2. WINTER CORPORATION. SAP Sybase IQ 15.4: An Elastic Platform for Business Analytics. APRIL 2012. 245 First Street, Suite 1800, Cambridge MA 02145. 617-695-1800. Visit us at www.wintercorp.com. ©2012 Winter Corporation, Cambridge, MA. All rights reserved.
  • 3. SAP Sybase IQ 15.4: An Elastic Platform for Business Analytics. A WINTER CORPORATION WHITE PAPER

    Executive Summary

    Executives around the world are intensely focusing on business analytics. They see an analytical approach to business decisions, an approach based on more abundant data and mathematical analysis of that data, as the cornerstone of new strategies for profitable operation, profitable growth, new product development and customer engagement.

    The opportunity to benefit from business analytics is especially large right now, in part because businesses have access to “big data”: enormous, previously unavailable volumes of data on the actions, interests and sentiment of customers; on the movement of products, components and raw materials through the supply chain and the distribution chain; and on many other aspects of the operation of businesses and their market environment.

    Perhaps surprisingly, the challenge of “big data” is not only the data volume. It is also that much of the new data is less structured and less regular than the tabular corporate data that has been the focus of data warehousing in the past. The new big data comes from new or greatly expanded sources: social media, rapidly proliferating smart mobile devices, vehicles, and a dizzying array of new sensors and intelligent products.

    Even beyond the challenges of big data, there are other obstacles to success with business analytics: data analysis can be a cumbersome, slow, frustrating and expensive process. First you have to find the data you want. Then you have to get it loaded into a repository where it is accessible. Then you need to cleanse it, organize it and integrate it with other data of interest. Then you have to conduct the analysis. Every step is bedeviled by many practical difficulties, not least of which is often the difficulty of getting help from people with the right skills.

    New open source technology has emerged and is being deployed for “big data”; new vocabulary includes terms such as “Hadoop clusters” and “MapReduce.” This technology brings new benefits for certain types of information and analysis. However, it also creates one more data silo in a world in which there are already too many silos. The complete analytical process thus gets enhanced in some areas but also becomes more fragmented: to get analytic results and business solutions, stakeholders must contend with a yet more complex environment with net new skill requirements.

    The new, highly analytical business strategies place a particular emphasis on prediction. Knowing what happened yesterday isn’t enough: you need to predict which of the business actions in front of you is likely to produce the best result. And, as well as judgment, you need facts, data and analysis to back that decision. You must also take into account the new data sources: the customer sentiment expressed on social media; the customer behavior evident from new data sources and devices; the subtle patterns that can be seen in purchase behavior, web browsing and many other sources; and the supply chain realities now visible as parts, components, goods and materials move around the world and are affected by weather, catastrophes and human events.

    Often, to the decision maker, the unfortunate reality is that predictions of which profitable customers are at risk may indeed be extremely valuable, but getting such predictions before it’s too late is easier said than done.

    For many enterprises, then, the key to the analytic opportunity is finding a way to make the entire analytic process work smoothly, conveniently, responsively and cost effectively, whether the analysis focuses on the tabular data most frequently used for the past 25 years; on newer data sources, such as sentiment expressed in social media; or both.

    In response to this challenge, SAP has introduced a new version of its flagship analytic DBMS product, SAP Sybase IQ 15.4, as a platform and an integrated environment to support and facilitate the customer’s entire analytic process.

    Copyright © 2012, WINTER CORPORATION, Cambridge, MA. All rights reserved.
  • 4. In addition to a greatly enhanced DBMS engine for data warehousing, Sybase IQ 15.4 features significant new capabilities for business analytics and big data. Highlights are:

      • A new analytic services layer that supports the use of MapReduce and many other analytic functions on data within Sybase IQ itself;
      • Parallel interaction between Sybase IQ and Hadoop;
      • Support of R, the open source language for statistical analysis;
      • Support of new third party SQL-callable functions for data mining and predictive analytics;
      • An expanded ecosystem for the support of third-party applications for information lifecycle management, business intelligence and data integration, predictive analytics and system/data administration.

    At the core of Sybase IQ 15.4 is the most mature column store DBMS for data warehousing on the market, with sophisticated capabilities for data compression, query processing and query optimization: an engine with a long record of exceptional query performance and efficiency. While column storage and column-oriented data compression have been “hot trends” for the last few years, Sybase IQ was built from day one with these capabilities, and its users have been benefiting from them for more than a decade. These capabilities contribute significantly to the efficiency of Sybase IQ for analytics.

    In addition to the remarkably efficient storage and query processing technology at its core, Sybase IQ 15.4 features PlexQ™ technology, a distinctive, elastic design that supports highly parallel query processing and data loading along with independent scaling for data growth and workload growth.

    WinterCorp, an independent expert in analytic data management and big data, has been invited by Sybase Inc, an SAP company, to review its new product, SAP Sybase IQ 15.4. To conduct its review, WinterCorp reviewed product designs and documentation, and engaged in technical discussions of the product architecture with key employees at SAP/Sybase and with independent parties. This White Paper, sponsored by Sybase Inc, an SAP company, presents WinterCorp’s views and findings from that review.
  • 5. Table of Contents

    Executive Summary (p. 3)
    Table of Contents (p. 5)
    1 Introduction (p. 6)
    2 Architecture of SAP Sybase IQ 15.4 (p. 9)
      2.1 A Platform For Business Analytics (p. 9)
    3 The SAP Sybase IQ 15.4 Core Data Management Infrastructure (p. 11)
      3.1 Data Load Performance and Scalability (p. 11)
      3.2 Column-Store Storage Efficiency, Indexing, and Compression (p. 13)
      3.3 Query Processing Performance and Scalability (p. 16)
      3.4 Very Large Database (VLDB) Management and Backup (p. 16)
      3.5 In-Database Analytics (p. 17)
      3.6 Text Search and Analysis (p. 18)
    4 The Application Services Layer (p. 19)
      4.1 “MPP Enabled” User Defined Functions (UDF) (p. 19)
      4.2 Protected JAVA UDF’s (p. 20)
      4.3 In-Database MapReduce (p. 20)
      4.4 Simulation for In-Database Development (p. 22)
      4.5 Hadoop Interface (p. 22)
      4.6 Geospatial/Geometric Data & Query Support (p. 23)
      4.7 Free Express Edition (p. 24)
    5 The Ecosystem Layer (p. 25)
      5.1 SAP BusinessObjects Portfolio Support (p. 25)
      5.2 “R” Language Support (p. 25)
      5.3 MapReduce-Enabled Data Mining (p. 26)
      5.4 Social Network Analysis Modules (p. 26)
      5.5 Sybase PowerDesigner 16 Architecture Recommender (p. 26)
      5.6 In-Database PMML (p. 27)
    6 Conclusions (p. 28)
  • 6. 1 Introduction

    This paper examines the architecture and capabilities of SAP Sybase IQ 15.4 with a particular focus on demanding new requirements for business analytics and big data.

    Business Analytics. People who have been involved with data warehousing for the last decade or more, especially those with a technical background in the field, are often puzzled by the new wave of executive interest in “business analytics.” A common question is, “Aren’t we doing that already?” Surely, the reason all that data has been modeled, cleansed, integrated and stored in data warehouses for the last ten or twenty years is so that it can be analyzed!

    Certainly there has been analysis going on with data warehouse data. But, from the perspective of the business manager or business end user, data warehousing and business intelligence in practice has too often meant little more than ‘routine-ized’ reporting; extraction to other applications and systems; and the occasional ad hoc query. Sure, business intelligence tools have steadily improved; data may be delivered on nicer looking, more functional electronic reports and dashboards; data access may be more interactive; and data may even be available on mobile devices. All of these advances add some value.

    But most end users will still tell you the same thing: most of what they have been doing with the data warehouse has been “looking in the rear view mirror.” Often, business users learn what has happened from the data warehouse. They learn which products have been selling; which customers have been buying; which suppliers have consistently delivered on time. These insights are treasured when good information was not previously available as a basis for decision making.

    The problem is that the practice of business management has moved on from that point. Looking in the rear view mirror is no longer enough.

    Increasingly, operating and strategic decisions must be based on forward looking analysis with a mathematically sound foundation. The analytical approaches to business exemplified in Competing on Analytics [1] and a series of subsequent books, and in the best selling popular book and recent hit movie Moneyball [2], have influenced business culture. These accounts and many others have shown how business performance can undergo radical improvement when the decision making process looks forward with analytics. At the heart of this revolutionary analysis is better prediction: whether of the performance of a baseball player, of a product, of a service, or of the behavior of a customer.

    Methodology & Sponsorship (sidebar). This WinterCorp Executive Report describes two trends, business analytics and “big data,” and the approach to them adopted in SAP Sybase IQ. In developing this report, WinterCorp drew on its own independent research and experience; interviewed SAP Sybase IQ employees; and reviewed SAP Sybase IQ product materials. In its capacity as the sponsor of this report, Sybase Inc, an SAP company, was provided an opportunity to comment on the paper with respect to facts. WinterCorp has final editorial control over the content of this publication and is solely responsible for any opinions expressed.

    [1] Competing on Analytics, The New Science of Winning, Thomas Davenport and Jeanne Harris, Harvard Business School Press, 2007 (www.tomdavenportbooks.com)
    [2] Moneyball, The Art of Winning an Unfair Game, Michael Lewis, W.W. Norton & Company, 2003
  • 7. SAP Sybase IQ 15.4: An Elastic Platform for Business Analytics 7 A WINTER CORPOR ATION WHITE PAPER And, while you may feel that your data warehouse already has the capabilities to support these analytics, there is more to the story. Big Data.  As predictive analytics have been gaining ever more significance in business circles, another trend—big data—has made a profound impact on business and data strategies. “Big data” is a broad phenomenon encompassing the rise of social media; the seemingly sudden proliferation of machine generated data; the worldwide spread of mobile intelligent devices, including smart phones and tablets; the widespread use of GPS data, which attaches a location to many events in daily life; and, rapid decreases in cost associated with capturing, delivering and storing a wide range of previously costly varieties of data, including voice, image, video, etc. Taking all of these phenomena together, we are witnessing an enormous explosion of data which is many times larger and faster growing than what we have seen in data warehouses over the last decade. While the transactional information about customers, products, stores and the like is still uniquely valuable—and plays a central role in understanding any business—there is now new and unprecedented information available that can provide business, engineering, scientific and medical insights never before available. To provide one example, a useful technique in customer retention is to observe when a profitable customer’s activity with a credit card begins to decline and then react quickly to retain the customer before the account is cancelled. When this technique works it is much more efficient than acquiring a new customer that is equally or more profitable. But what if you could know earlier—before the usage declined—that the customer was at risk? 
Perhaps the retention rate would become yet higher and the retention cost lower, particularly if you could discover the reason that the customer relationship was threatened. If you knew the reason, then your actions to deal with it could be yet more efficiently directed at the root cause. But how could you know earlier? One possibility is social media. If you are engaged with your customers on social media, they may tell you what they are thinking: that they like the service or the incentives or the prices offered by a competitor; that they don’t like your call center or your fees. Or, if they have opted into your social media program, they may let you see what they are saying to others about your product or service. The enormous flood of data pouring out of social media is one of many examples of big data. Data is also pouring out of a growing tide of products that we use every day, and to the extent that we opt in, manufacturers can gain precious knowledge about how, when, and where we use products—and what problems we have with them. This is clearly the case today with smartphones and tablets. Vehicles are becoming more intelligent and more connected and will increasingly provide similar capabilities (more expensive commercial vehicles, such as helicopters, already provide telemetry data that is used to optimize safety and maintenance). The trend will spread to many other products that we use every day, in every case generating yet more “big data” for analysis. New Tools and Technologies.   The concurrent rise of predictive analytics and big data has generated interest in new tools and technologies for several reasons. First, much of the big data does not fit closely with the relational database model. Much of the significance of the data is not revealed by fitting it into a tabular structure. 
Social media data has textual, image, audio, video and other components that must be analyzed primarily by specialized or procedural functions; SQL solves a relatively small part of the problem here. Embedded in the data is a social graph which is most readily analyzed outside of SQL.
In general, a significant element of the new, more predictive analysis, especially of the newly varied and highly voluminous “big data”, is best attacked with tools other than SQL. In connection with this, interest has grown in MapReduce, a parallel data analysis framework, and Hadoop, an open source engine for running MapReduce jobs. Some data analysis jobs can be readily performed in a Hadoop cluster. Others may require the services of a data warehouse, such as SAP Sybase IQ. Yet others may best be handled with a combination of the two.

Regardless of where the data is stored, interest has also grown rapidly in other analysis tools, such as the open source statistical analysis language, R. In general, the new business analytics will use SQL and the data warehouse, but will also create a strong demand for other tools.

Data Strategies. As enterprises grapple with this rapidly changing world of big data, they need a data infrastructure that will enable them to implement analytic business strategies. Especially with regulatory and governance requirements enforcing longer periods of data retention, enterprises need a convenient, flexible, cost effective process for solving analytic data problems from beginning to end.

Sybase seeks to address that customer need for a comprehensive approach to business analytics through its new capabilities in SAP Sybase IQ 15.4.
2 Architecture of SAP Sybase IQ 15.4

Software relational database engines have been commercially available since the 1970s. Most of the products on the market to this day were originally conceived as row storage engines for online transaction processing. A notable exception is SAP Sybase IQ. Conceived from its earliest days as a column-storage, analytical DBMS, Sybase IQ was in many ways ahead of its time. It was the first commercial column storage engine; the first to put a major emphasis on data compression; and one of the earliest to place a strong emphasis on complex queries and analytics, rather than on online transaction processing.

Sybase IQ has come into widespread use, with thousands of customer installations, and thus has developed into a reliable, highly usable, comprehensive product for data warehousing and business intelligence. But, with Sybase IQ 15.4, that distinctive engine architecture has been expanded into something more: a platform for large scale business analytics. This section will discuss the new capabilities of Sybase IQ 15.4 and describe how they support and enable analytics for the data warehouse and for the newer phenomenon of big data.

2.1 A PLATFORM FOR BUSINESS ANALYTICS

With the introduction of Sybase IQ 15.4, SAP has expanded its IQ product line from data warehouse engine to business analytics platform, as shown in Figure 1.

The core data management infrastructure, represented by the innermost layer in Figure 1, is a high performance column storage analytic database engine. In recent releases, the core data management infrastructure has been enhanced with SAP’s patented PlexQ™ technology, which SAP characterizes as a massively parallel shared everything architecture.
The combination of the relatively new PlexQ™ technology and Sybase IQ’s previously developed grid structure results in an elastic architecture, one on which capacity is readily added or removed. The underlying database engine, a distinctive design with sophisticated column storage, compression and indexing techniques, has long established advantages in query performance. In Sybase IQ 15.4, the core data management infrastructure is further enhanced with new capabilities for large object compression and high performance bulk inserts via the industry standard ODBC and JDBC interfaces. The core infrastructure has several other noteworthy features, highlights of which are discussed in Section 3.

Figure 1: Sybase IQ 15.4 as a Platform for Business Analytics
The Application Services Layer, shown in Figure 1, is a greatly expanded set of services designed specifically for the development and support of analytic applications. It also provides facilities for users and partners to develop and use their own analytic functions that Sybase IQ will run in parallel against the database. This layer provides major new services, including an implementation of native MapReduce that runs in parallel against the database and also provides connectivity with Hadoop. The Application Services Layer is described further in Section 4.

The Ecosystem Layer, represented by the outermost layer in Figure 1, is an environment in which SAP and its partners can provide and support analytic applications and tools, as well as the business intelligence tools that have long been available with Sybase IQ. Key elements of this layer that are new in Sybase IQ 15.4 include:
• expanded support for all major Business Intelligence and Data Integration tools, including optimizations for SAP BusinessObjects products;
• support for the R language, an open source language for statistical analysis;
• a library of MapReduce enabled data mining functions that will run in parallel against data in Sybase IQ;
• a set of social network analysis modules; and,
• packaged applications for analytics and data lifecycle management.

The Ecosystem Layer is yet another significant enhancement to the analytic capabilities of Sybase IQ and a third major element of SAP’s initiative to make Sybase IQ a major platform for business analytics. The Ecosystem Layer is described in Section 5.

While Sybase IQ has long enjoyed a respected presence in data warehousing, increasing its customer base over the last few years from about 2,000 to over 4,500 installations, Sybase IQ 15.4 is clearly something new and different from what Sybase has offered before.
In addition to significant continuing enhancement of its core DBMS engine for data warehousing, SAP is now offering an array of capabilities for business analytics with Sybase IQ.
3 The SAP Sybase IQ 15.4 Core Data Management Infrastructure

The core infrastructure of SAP Sybase IQ has been enhanced significantly over the last three releases with the implementation of elastic PlexQ™ grids for highly parallel processing. The elastic PlexQ™ grid preserves the advantages of the earlier Sybase IQ architecture (a sophisticated form of shared data clustering) while adding scale out processing for queries, loads and other large data warehouse operations.

In prior releases, Sybase IQ could run queries and loads in parallel within a single node. In Sybase IQ 15.4, with PlexQ™, the system can run an individual query or load in parallel across multiple nodes. This ability to scale out for individual queries and loads enables Sybase IQ 15.4 to handle significantly larger scale data warehouses and analyses.

In addition, as in prior releases, Sybase IQ 15.4 can spread the work of multiple users across the nodes of the grid. Also, grid nodes can be grouped and assigned to specific workloads or user populations, making it possible to dedicate a chosen set of nodes to a particular purpose. New nodes can be added to the cluster as the workload grows, providing an elastic character to the system.

Figure 2 below provides an overview of the core data management infrastructure:

Figure 2: SAP Sybase IQ Core Infrastructure with PlexQ™ Technology
Source: Adapted from a diagram by SAP Inc.

Sybase IQ runs on Red Hat and SUSE Linux 64/32 bit systems, Windows 64/32 bit, AIX 64 bit, Sun Solaris 64 bit, and HP-UX 64 bit systems, allowing customers to independently optimize storage, caching, processors, memory, threading, and load distribution.

3.1 DATA LOAD PERFORMANCE AND SCALABILITY

Sybase IQ data load performance and scalability depend primarily on seven factors:

1. PlexQ™ technology, making it possible to spread the work of a load job across multiple nodes of an elastic PlexQ™ grid.
2. Highly efficient bulk inserts via ODBC and JDBC, a new feature in Sybase IQ 15.4. This means that many third party tools and applications that load via industry standard interfaces will load large data volumes much more rapidly. In some practical cases, for example when third party ETL tools are used, speedups of 100 times have been measured (see footnote 3).
3. Fast, flexible load processing built into the engine at the most fundamental level.
4. Versioning to minimize contention between data-load and query processing.
5. Automated, flexible remote loads.
6. “Near-real-time” “trickle-feed” loads.
7. Sybase’s ETL (extract, transform, load) utility.

Fast Load Processing. Sybase IQ provides specific features for speeding column-store data loading. In the batch case, a column-store approach allows loads to be in “flat schema” (or “semi-normalized”) format; that is, users can avoid the added space and complexity of storing the data as multiple tables. Sybase IQ’s architecture allows parallelism in loading, including parallel feeds from distributed clients (the “grid”) to multiple servers and parallelism by using multiple processors for parallel storage of individual tables and columns in the target data-warehouse database. Sybase IQ loads only those columns that have changed in a given row (or, of course, in the entire data store); this typically allows Sybase IQ to create loads a fraction of the size of those in the comparable row-store relational approach.

Versioning. As the changed columns are loaded, they do not replace old columns. Rather, new versions are created and old ones maintained while needed by ongoing queries. Within a new column version, only changed pages create new storage. Thus, Sybase IQ querying is not interrupted during data loading, data loading is not blocked by ongoing querying, and additional storage for versioning is minimized.
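The versioning behavior described above (old versions kept for running queries, new storage only for changed pages) can be illustrated with a small copy-on-write sketch. This is not Sybase IQ's implementation: the `VersionedColumn` class, its method names and the tiny page size are hypothetical and chosen only to make the idea visible.

```python
class VersionedColumn:
    """Toy copy-on-write column. Each version is a tuple of page references."""

    def __init__(self, values, page_size=4):
        self.page_size = page_size
        pages = [tuple(values[i:i + page_size])
                 for i in range(0, len(values), page_size)]
        self.versions = [tuple(pages)]  # version 0

    def snapshot(self):
        """A reader pins the latest version; later loads do not change it."""
        return len(self.versions) - 1

    def load(self, changes):
        """Apply {row_index: new_value}. Only the touched pages are copied."""
        latest = list(self.versions[-1])
        touched = {}
        for row, value in changes.items():
            page_no, offset = divmod(row, self.page_size)
            page = touched.setdefault(page_no, list(latest[page_no]))
            page[offset] = value
        for page_no, page in touched.items():
            latest[page_no] = tuple(page)
        self.versions.append(tuple(latest))
        return len(self.versions) - 1

    def read(self, version):
        """Return the column contents as seen by the given version."""
        return [v for page in self.versions[version] for v in page]

    def shared_pages(self, a, b):
        """Count pages that two versions physically share (same object)."""
        return sum(p is q for p, q in zip(self.versions[a], self.versions[b]))


col = VersionedColumn(list(range(12)), page_size=4)
reader_version = col.snapshot()      # a query starts here
new_version = col.load({5: 500})     # a load changes one row
```

In this sketch the running query keeps reading `reader_version` unchanged while the load creates `new_version`, and two of the three pages are shared between the versions rather than copied.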
Automated, Flexible Remote Loads. Sybase IQ allows scale-out loading across its grid architecture. Data can be pulled from the clients, or “pushed” by the client to the server via ODBC. An SAP utility also enables data loading from SAP Sybase ASE, Microsoft SQL Server and Oracle data stores.

Near-Real-Time Loads. Sybase IQ supports “micro-bursts” of “microbatched” incremental data loads (i.e., not the constant stream of updates of an OLTP database, but column changes accumulated over a minute or two, loaded at once). For example, Replication Server—Real Time Loading Edition 15.5 allows delivery of changed data to the Sybase IQ data store within minutes of a data change elsewhere. This keeps the data current in “near real time.” Combined with versioning, it does so without interrupting ongoing queries.

Sybase InfoPrimer ETL Tool. This coordinates data loading, including data cleansing as necessary. It takes advantage of the features described in items 1-4 above, and operates multi-threaded, for a high degree of concurrency and/or parallelism. InfoPrimer ETL combines loading and indexing (a chunk of data and its indexes are treated as a single object) for additional ETL speedup. As noted above, an SAP utility automates data loading from SAP Sybase ASE, Microsoft SQL Server and Oracle data stores. Sybase IQ also supports the SAP BusinessObjects Data Services ETL tool and other third-party ETL tools such as those of Informatica, Syncsort, and DataStage. Note also that Sybase IQ supports “Extract Load Transform” schemes, in which database functionality or stored procedures are used to speed some forms of data transformation, as well as “change data capture” via Replication Server.

Footnote 3: Note that bulk inserts were efficiently implemented in prior releases for the native application interface.
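The micro-batching idea in the Near-Real-Time Loads paragraph above (changes accumulated over a minute or two and loaded at once) can be sketched as follows. This is an illustrative assumption about the general technique, not a description of how Replication Server or Sybase IQ implement it; `MicroBatcher`, `load_fn` and the explicit `now` argument are hypothetical names used to keep the example deterministic.

```python
class MicroBatcher:
    """Accumulate row changes and hand them to a loader once per time window."""

    def __init__(self, load_fn, window_seconds=60.0):
        self.load_fn = load_fn          # called with a list of rows per batch
        self.window = window_seconds
        self.pending = {}               # key -> latest row for that key
        self.window_start = None

    def record(self, key, row, now):
        """Buffer a change. A later change to the same key replaces the earlier one."""
        if self.window_start is None:
            self.window_start = now
        self.pending[key] = row
        if now - self.window_start >= self.window:
            self.flush()

    def flush(self):
        """Load everything buffered so far as a single batch."""
        if self.pending:
            self.load_fn(list(self.pending.values()))
        self.pending = {}
        self.window_start = None


batches = []
batcher = MicroBatcher(batches.append, window_seconds=60)
batcher.record("acct-1", {"balance": 10}, now=0)
batcher.record("acct-1", {"balance": 12}, now=30)   # overwrites the earlier change
batcher.record("acct-2", {"balance": 7}, now=61)    # window elapsed, batch is loaded
```

Three source changes result in a single load containing two rows, which is the trade the text describes: slightly delayed data in exchange for far fewer, larger load operations.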
3.2 COLUMN-STORE STORAGE EFFICIENCY, INDEXING, AND COMPRESSION

A key differentiator for Sybase IQ is its ability to store data in the minimum amount of space on disk or in main memory, which has a dramatic positive effect on performance and scalability. Relational data stores in row format, by and large, already minimize duplication of records (rows). However, relational row stores duplicate columns within a row even when there is no data in the column, and store the same value in a column multiple times. Sybase IQ’s columnar-data-store approach does not store non-existent column data, and stores each distinct value only once (Figure 3). For example, where a relational row store may store the “Married” value (or any other value) in the customer-marital-status field in every row, the columnar approach stores a pointer to one central instance of each value in the field.

Figure 3: SAP Sybase IQ’s Columnar Data Storage
Source: SAP Inc.

Many queries in BI, complex or otherwise, analyze data using only a few fields in a record, or only a few columns in a row. For queries involving analysis of many rows, this means that a row-based query engine will retrieve much more data than necessary, slowing performance, while a column-based query engine like Sybase IQ will retrieve only those efficiently-stored columns applicable to the analysis. Add Sybase IQ’s ability to partition data according to columns and thus avoid some indexing performance overhead (discussed in VLDB Management, below), and the more that Sybase IQ scales, the greater the frequency and size of its performance advantage.

Note that other queries may favor a row-based approach, for example those that access a small number of rows and a large number of columns.
The design philosophy of Sybase IQ argues that such queries typically comprise a modest fraction of the workload in an analytic database. Therefore the gains from a column-store approach will dominate the performance tradeoffs. While Sybase IQ was alone in advancing this argument ten and more years ago, several products have since incorporated some column storage features or capabilities in response. However, few products have been designed with a column storage approach from the ground up, and Sybase IQ remains the most mature of these.

To improve storage efficiency, Sybase IQ’s column-store architecture adds data compression, leveraging its storage of a single data type per column per data page. Aside from the standard methods of compressing individual word strings, Sybase IQ offers bit-mapped indexing (in which low-cardinality column data values are represented as bit strings, and query operations can be
carried out as bit operations, for two-orders-of-magnitude performance speedup) where appropriate. In fact, Sybase IQ provides compression not only of the data, but also of its indexes (Figure 4).

In Sybase IQ 15.4, data compression is enhanced further for large data objects, providing a critical new capability for unstructured data. The enhanced data compression applies to variable length and fixed length character and binary large objects (VARCHAR, VARBINARY, CHAR and BINARY). In early use of these features, data has compressed 3 to 16 times more than with prior releases of Sybase IQ. This enhanced compression means fewer disk I/O operations to read and write the same data, thus enhancing performance. Large objects are especially prevalent in the new “big data” arena, where unstructured and semi-structured data accounts for most of the increased volume.

Figure 4: SAP Sybase IQ Data Compression
Source: SAP Inc.

Many relational databases “retrofit” compression into their database engine by decompressing the data before processing it. Sybase IQ was designed for query processing without decompression, so that all operations use the compressed data, and the only time data is decompressed is when processing is finished and the data is being sent to an end user to read in a report. Also, Sybase IQ performs “perfect prefetch of pages,” because it knows from its bitmaps exactly which pages have to be fetched in sequence. The result is an increase in the amount of data that can be stored in main memory, allowing in-memory-database-like performance plus scalability beyond an in-memory database.

Sybase IQ’s indexing schemes complement its columnar storage and compression approaches. In particular, Sybase IQ offers a wide range of indexing schemes that allow columns with different characteristics to be stored in less space (Figure 5).
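Two ideas from this section, storing each distinct column value only once and evaluating a query against the compressed form, can be shown with a minimal dictionary-encoding sketch. It is a generic illustration under simplifying assumptions, not Sybase IQ's storage format; the function names are hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): each distinct value is stored once,
    and every row holds only a small integer code pointing at it."""
    dictionary = []
    code_of = {}
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def count_equal(dictionary, codes, value):
    """Count matching rows by comparing integer codes, without decoding rows."""
    try:
        code = dictionary.index(value)
    except ValueError:
        return 0                      # value never occurs in the column
    return sum(1 for c in codes if c == code)


def decode(dictionary, codes):
    """Rebuild the original column, needed only when results are returned."""
    return [dictionary[c] for c in codes]


marital_status = ["Married", "Single", "Married", "Married", "Single"]
dictionary, codes = dictionary_encode(marital_status)
married_rows = count_equal(dictionary, codes, "Married")
```

Here the string "Married" is stored once, the predicate is evaluated on integer codes, and `decode` is only needed at the point where values are handed back to a user.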
Figure 5: Forms of Indexing Supported by SAP Sybase IQ

Index Name | Type of Data Useful For | Type of Query Operation | Data Type Useful For
Fast Project (Default) | All columns with < 16M unique values | Projections with scalar aggregates | All
High Group | High cardinality columns | Large joins, GROUP BYs | All except BITs and CHARs > 255
High Non Group | High cardinality columns | Range searches | Mainly for integers and CHARs < 255
Low Fast | Columns with < 1000 unique values requiring fast lookup | Projections, joins, scalar aggregates | All except BITs and CHARs > 255
Date, Time, DateTime | Columns with DATE, TIME, DATETIME data types | Queries with dates, times, timestamps ranges/compares | DATE, DATETIME, TIME only
Compare | Two columns with identical data types (for comparison operations) | <, >, = compares | Mainly for integers and CHARs
Word | Data types involving strings and words | Dictionary lookup | CHARs, VARCHARs only
Text | Data types involving strings and words | Complex text terms/phrase searches including boolean, proximity, and scoring | CHARs, VARCHARs only

This broad range of indexing techniques is partly baked in (that is, data loading will automatically index data in a compressed form for storage efficiency), but also allows the customer further flexibility to create additional indexes to deliver performance for the customer’s unique querying patterns. Because indexes are highly compressed, users can create a multitude of them up front in anticipation of future ad hoc queries. An “index advisor” built into the query optimizer assists the user by suggesting indexes that will improve query performance. Sybase IQ’s column store architecture aggressively encourages usage of indexes, in many cases multiple indexes per column, on which predicates are applied to obtain speedup. Figure 6 shows how Sybase IQ’s data-storage approach can minimize I/Os.
Note also that SAP Sybase IQ can fetch data in large page sizes (typically 64K), which can reduce disk I/O significantly.

Figure 6: SAP Sybase IQ Query I/O Reduction
EXAMPLE: select sum(sales) from customers where state = 'NY' and class = 'A'
Sybase IQ will use the LF (Low Fast) indexes to filter rows and then apply the HNG (High Non Group) index to compute the sum. A minimal amount of data is read to resolve the query.
Source: SAP Inc.
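The Figure 6 example can be approximated with a toy bitmap index: one bitmap per distinct value of each low-cardinality column, combined with a bitwise AND so that only qualifying rows of `sales` are read. This is a sketch of the general bitmap technique only. It does not reproduce the internal layout of the LF or HNG indexes, and for simplicity the sum is taken directly from the `sales` column; the sample data and function names are made up for illustration.

```python
def build_bitmaps(column):
    """Low-cardinality index: one integer bitmap per distinct value.
    Bit r is set when row r holds that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def rows_of(bitmap):
    """Yield the row numbers whose bit is set."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


# Hypothetical sample of the customers table, stored column by column.
state = ["NY", "CA", "NY", "NY", "TX"]
cust_class = ["A", "A", "B", "A", "A"]
sales = [100, 250, 75, 40, 300]

state_index = build_bitmaps(state)
class_index = build_bitmaps(cust_class)

# where state = 'NY' and class = 'A': a single bitwise AND of two bitmaps
matching = state_index.get("NY", 0) & class_index.get("A", 0)

# sum(sales) touches only the qualifying rows
total = sum(sales[r] for r in rows_of(matching))
```

The filter never looks at the `state` or `cust_class` values themselves, and the aggregate reads two of the five `sales` entries, which is the I/O reduction the figure is describing.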
3.3 QUERY PROCESSING PERFORMANCE AND SCALABILITY

The Sybase IQ query-processing engine is built to take advantage of all of Sybase IQ’s storage, Shared Everything PlexQ™ architecture, and versioning capabilities. The cost-based optimizer can load-balance a query across processors and systems, while constantly updating its “sense” of the relative load on each processor/system. The optimizer also factors in the size of the compressed/indexed data and its presence or absence in main memory, ensuring quicker data access and processing. Sybase IQ can dynamically adjust its query execution plan based on concurrent workload, after having started the execution of the query. Sybase IQ 15.4 rebalances query resources (threads, processors, and cache) every several seconds, to maintain query performance for both long-running/larger and short-time-period/smaller queries. Note that the intelligence of the cost-based optimizer allows users to flexibly deploy heterogeneous small-scale servers if needed, each with its own SLA (service level agreement).

Once the query is optimized, the engine carries out pipelining of operations within queries as well as parallelism within and across queries. That is, a query that may involve an initial load and sort followed by a join might begin the join operation for one column value immediately, without waiting for all data to have been sorted. When one processor is finished sorting a column value, it might move to sort the next, passing the value to the “join” processor. Multiple pipelines may operate in parallel for different sets of data within a query. In the case of joins, in particular, Sybase IQ provides two levels of parallelism, in which parts of data to be joined may be “grouped” initially for separate, parallel processing, and then the groups may be joined together in a second step.
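One common way to read the two-level join parallelism described above is a partitioned join: both inputs are first grouped by a hash of the join key, each pair of groups is joined independently, and the partial results are then combined. The sketch below shows that general pattern under this assumption; it is not taken from Sybase IQ, and the function names, the thread pool and the sample rows are illustrative choices. Threads are used only to show the structure, since Python threads do not give real CPU parallelism for this kind of work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    """Step 1: group rows so that equal join keys land in the same group."""
    groups = [[] for _ in range(n)]
    for row in rows:
        groups[hash(row[key]) % n].append(row)
    return groups


def join_group(task):
    """Join one pair of groups with a simple in-memory hash join."""
    left, right, key = task
    by_key = defaultdict(list)
    for r in right:
        by_key[r[key]].append(r)
    return [{**l, **r} for l in left for r in by_key.get(l[key], [])]


def parallel_join(left, right, key, n=4):
    """Step 2: join the groups concurrently, then combine the partial results."""
    left_groups = partition(left, key, n)
    right_groups = partition(right, key, n)
    tasks = [(left_groups[i], right_groups[i], key) for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(join_group, tasks)
        return [row for part in parts for row in part]


customers = [{"cust": 1, "state": "NY"}, {"cust": 2, "state": "CA"}]
orders = [{"cust": 1, "amt": 10}, {"cust": 1, "amt": 5}, {"cust": 3, "amt": 9}]
joined = parallel_join(customers, orders, "cust")
```

Because matching keys always fall into the same group, no group needs to see another group's rows, which is what allows the per-group joins to proceed independently.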
In the case of column data that uses bit-mapped storage and indexing, the engine takes an additional step. It combines (performs bit operations) early, in order to reduce the number of times that the engine needs to actually “touch” a data item. In this case, Sybase IQ never needs to do a table scan.

3.4 VERY LARGE DATABASE (VLDB) MANAGEMENT AND BACKUP

The larger Sybase IQ implementations typically manage hundreds of terabytes of data; a few Sybase IQ systems manage petabytes of data, according to SAP. Moreover, Sybase IQ allows administrators to bind tables, indexes, and columns to particular storage structures, thus placing less fresh groups of data on more price-performant storage (offline disk or nearline tape) without significant diminution of performance. Logical “groups” can be moved (e.g., from disk to tape) with simple commands, as when “aging” data becomes ready for archiving. Sybase PowerDesigner (also part of the Sybase Workspace IDE) enables creation of programs that generate reports based on the data’s logical “age.”

To complement logical “data age” partitioning, Sybase IQ supports physical range partitioning of columns/tables based on the values in a “date created” or “date last modified” field. Older data can be marked “read-only,” avoiding the need for further backup (see Figure 7).

Adding and removing data can have significantly less impact on performance (and hence the need for retuning) than in row-based systems. Specifically, if a field needs to be added to or removed from the data, it does not require reallocation of each row in storage or immediate redefinition of all affected row indexes, and does not “lock up” rows during the addition/removal process. Moreover, efficient data and column representation means quicker field addition or removal.
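The range partitioning by a “date created” field described above can be pictured with a short sketch that maps each row's date to a partition and each partition to a storage tier. The boundary dates, tier names and function names are hypothetical and are not Sybase IQ syntax or defaults; the point is only the mapping from age to placement and read-only status.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical partition boundaries: a row belongs to the partition whose
# range contains its "date created" value.
BOUNDARIES = [date(2010, 1, 1), date(2011, 1, 1), date(2012, 1, 1)]

# Hypothetical storage tiers, oldest partition first (one more than boundaries).
TIERS = ["archive", "nearline", "standard", "fast"]


def partition_for(created):
    """Return the partition number (0 = oldest) for a creation date."""
    return bisect_right(BOUNDARIES, created)


def tier_for(created):
    """Older partitions map to cheaper storage."""
    return TIERS[partition_for(created)]


def is_read_only(created):
    """Every partition except the newest is treated as read-only here,
    so it would not need to be backed up again."""
    return partition_for(created) < len(BOUNDARIES)


placement = {d.isoformat(): tier_for(d)
             for d in (date(2009, 6, 1), date(2011, 5, 5), date(2012, 3, 1))}
```

With these assumed boundaries, a 2009 row lands on the archive tier and is read-only, while a row created in March 2012 stays on the fastest tier and remains writable.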
Figure 7: Data Partitioning Allows the Placement of Older Data on Lower Cost Storage
Source: SAP Inc.

In general, Sybase IQ emphasizes ease of tool use by administrators. They can perform most needed operations via the Sybase Central GUI (graphical user interface), and SAP anticipates releasing a Web version of administrative tools with the same functionality within the next 12 months. Areas that administrators may tune include data modeling and ETL. At the same time, Sybase IQ automates job load balancing within a node, as well as ETL-based data-load balancing.

Sybase IQ supports active-passive disaster recovery, with manual failover of a single failed node. Sybase IQ’s Virtual Backup integrates with the storage subsystem to create and periodically resynchronize shadow data-device copies online, with delayed logged writing of updates to the shadow. Effectively, this means that during normal processing, backup overhead is minimal, and “virtual restore” involves only roll-forward of changes not yet applied to the shadow, often a matter of seconds.

Note that Sybase IQ reduces the amount of pre-aggregation/materialization and index creation work required of the typical data-warehouse administrator. Sybase IQ’s columnar approach effectively aggregates data according to columns and values in a column; index compression is carried out during data loading; indexes can be created automatically “on the fly” by the query engine, and can be based on usage patterns rather than pre-defined by the administrator.

Security schemes involve both data communications (e.g., RSA, FIPS 140-2, Kerberos) and data storage. Data storage encryption is applied to the entire database and to particular columns (using Sybase IQ’s AES 128-bit encryption or an optional FIPS 140-2 certified version of the encryption).

3.5 IN-DATABASE ANALYTICS

Using stored procedures or user-defined functions compiled and optimized within the database engine’s process is a time-honored way to improve performance of key query types. SAP extends the notion to encompass not only built-in math functions and SQL OLAP operators but also SAS/SPSS-type complex operations such as clustering, simulations, and classifications. And Sybase IQ specifically opens this capability (e.g., via C++ plug-ins) to third parties such as Fuzzy Logix and Visual Numerics.

Sybase IQ 15.4 introduced a major expansion of the User Defined Function (UDF) and other analytic capabilities. This is described in Section 4, on the Application Services Layer.
3.6 TEXT SEARCH AND ANALYSIS

Sybase IQ allows full (semi-structured) text data search in combination with traditional relational (structured) data analysis. For example, users can find all instances of a word or phrase in a set of text fields stored in Sybase IQ’s data store, without having to scan table rows or having to know which column the word or phrase is stored in. Specialized text indexes that store positional information for terms in the indexed column(s) speed up complex text search and analysis. Moreover, Sybase IQ’s in-database capabilities (outlined earlier) include plug-ins for third-party C++ Text Analytics/Mining libraries.
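To see why a text index that stores positional information helps with phrase search, consider the following minimal positional inverted index. It is a generic sketch, not the structure of Sybase IQ's TEXT index; tokenization is a plain whitespace split with no punctuation handling, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_text_index(documents):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return ids of documents in which the words of the phrase appear
    consecutively, using only the index (no scan of the documents)."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            if all(start + offset in index[term].get(doc_id, ())
                   for offset, term in enumerate(terms[1:], 1)):
                hits.append(doc_id)
                break
    return sorted(hits)


comments = {
    1: "the call center was slow",
    2: "call me at the center",
    3: "Call Center fees are too high",
}
text_index = build_text_index(comments)
matches = phrase_search(text_index, "call center")
```

Document 2 contains both words but not next to each other, so only the stored positions allow it to be excluded without rereading the text.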
4 The Application Services Layer

The Application Services Layer, represented by the middle layer in the Sybase IQ 15.4 analytic platform architecture, is a greatly expanded set of services designed specifically for the development and operation of analytic applications. This layer provides several new services, including an implementation of MapReduce that runs in parallel against the database. The Application Services Layer also provides facilities for users and partners to develop and use their own analytic functions that Sybase IQ 15.4 will perform in parallel against the database.

Figure 8: The SAP Sybase IQ 15.4 Application Services Layer
Source: SAP Inc.

Additional key elements of the Application Services Layer include protected “out of process” Java UDFs, spatial/geometric data and query support, and simulation for in-database application development and testing.

4.1 “MPP ENABLED” USER DEFINED FUNCTIONS (UDF)

Several of the advanced capabilities of the Application Services Layer are possible because of the new forms of user defined functions (UDF) supported in Sybase IQ 15.4. SAP characterizes these new UDFs as “MPP Enabled,” meaning that Sybase IQ will run them in highly parallel fashion, including spreading the work of a single function call across multiple nodes of the PlexQ™ grid. These functions are written in C or C++ (some types may also be written in Java) and are callable from SQL. Because such functions are enabled for execution in parallel across multiple nodes, they are key enablers for business analytics and big data.

UDFs are a convenient mechanism for the advanced users or database professionals in an enterprise to codify certain calculations or analytical techniques specific to a business, and then make them available for use throughout the enterprise.
Though the industry term for this capability is “user defined function,” and while Sybase IQ customers will certainly write them, a substantial library of such functions is provided by SAP and its partners. UDFs also provide a mechanism whereby a software vendor or data service provider can develop proprietary techniques and make them available for use by customers without necessarily disclosing the algorithm or its implementation.
Four classes of UDFs are supported:
• Scalar functions operate on individual data items, returning a single value;
• Aggregate functions operate on sets of values, returning a single value; several aggregation operations are built into the SQL language (for example SUM, COUNT and AVG), but aggregate UDFs provide an opportunity for users to create their own aggregation functions, which may incorporate techniques specific to an industry, company or analytical discipline;
• Table functions produce bulk data (that is, a table) as output and may be written in C/C++ and/or Java;
• Table parameterized functions both accept bulk data as input and return bulk data as output and may be written in C/C++.

Taken together, and considered in light of their enablement for highly parallel execution, these UDFs provide a potent new analytical capability for SAP customers and partners.

4.2 PROTECTED JAVA UDFs

Prior to Sybase IQ 15.4, customers have been able to write UDFs in C and C++. Such functions had to be tested and certified before they could be run as part of a production system. They ran in the Sybase IQ kernel.

UDFs can now also be written in Java. In addition, they are run in a “protected” mode. This means that they are executed in a separate process that runs on a database server (that is, on a node of the PlexQ™ grid). This prevents an error in the UDF from interfering with the core infrastructure of Sybase IQ 15.4 or with any other UDF or user process. The result is therefore more reliable and consistently available data and analytical services.

4.3 IN-DATABASE MapReduce

“MapReduce is a software framework for distributed processing of large data sets on compute clusters,” as described on the website of the Apache Foundation (http://hadoop.apache.org/mapreduce/).
In the MapReduce framework, data analysis tasks can be broken into functional pieces, called mappers and reducers, each of which performs a portion of the analysis, reading an incoming set of (key, value) pairs and writing an outgoing set of (key, value) pairs. When mappers and reducers are run in the correct sequence, the complete analysis task is accomplished.

The MapReduce framework is especially interesting when a large volume of data is to be processed because it is designed, and MapReduce functions are written, so that many copies of each mapper and each reducer can be run at the same time in a parallel architecture. Thus, if there is a terabyte of data to be analyzed and one runs 100 copies of a mapper, then each mapper needs to analyze only 10 GB of data (assuming that there is a readily available way to partition the data into 100 roughly equal parts).

This concept of scalable, highly parallel analysis is similar to the concept of parallel query processing used in a data warehouse, though there are important differences between the two.

Prior to the development of MapReduce, procedural programs to analyze data (written outside of the context of a parallel database system) had to deal with all the complexity of parallel programming. So, data could be analyzed serially, a very slow process with large data volumes, or the programmer could get involved in the very complex and error prone process of specifying manually how the data was to be:
• partitioned;
• fed to many separate copies of the analysis process; and,
• analyzed;
and, then, how the many separate results were to be
• recombined; and,
• delivered.

In a complex analysis there are many successive stages of parallel analysis, with the data passing between them in complex patterns, and the difficulty of the programming task escalates rapidly. The MapReduce framework relieves the programmer of explicitly dealing with the parallel aspect of the analysis, freeing him or her to concentrate on the data and the analytical logic.

The MapReduce framework has been popularized in connection with Hadoop, an open source software system that implements MapReduce on compute clusters (typically, clusters of low cost servers and low cost storage). Hundreds, if not thousands, of companies are now using or experimenting with Hadoop clusters, in part so that they can have an environment for storing large amounts of data (the so-called “big data”) and analyzing it with MapReduce and other tools.

While a Hadoop cluster provides a repository for the storage and analysis of big data, its advantages and limitations differ from those of a data warehouse. WinterCorp believes that most enterprises will have an analytical environment in which at least one data warehouse and at least one Hadoop cluster will be present. Section 4.5 provides more information about Hadoop clusters and describes the facilities in Sybase IQ 15.4 for interfacing to them and interacting with data stored in them.

Meanwhile, MapReduce as a programming framework has come to be widely viewed as a standard method for interfacing a procedural program (written in Java, Python or some other popular language for data analysis) to a large volume of data in storage. This is because programs and functions written using MapReduce can be executed in highly parallel architectures that speed up large scale analysis.
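The mapper and reducer roles described above can be illustrated with a minimal, single-process sketch in Python. This is not Sybase IQ's MapReduce API or Hadoop's; the function names (`partition`, `mapper`, `shuffle`, `reducer`, `run_job`) and the sample sales records are assumptions made purely for illustration. In a real framework each partition would be handed to a separate mapper copy running in parallel, whereas here the copies simply run one after another.

```python
from collections import defaultdict

# The sizing example from the text: 1 TB (taken as 1,000 GB) over 100 mappers.
GB_PER_MAPPER = 1_000 / 100  # 10 GB per mapper

def partition(records, n_parts):
    """Split the input into n_parts roughly equal slices (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]

def mapper(pairs):
    """Read (key, value) pairs and emit (key, value) pairs.
    Input value is a hypothetical (region, amount) record; output key is the region."""
    for _, (region, amount) in pairs:
        yield region, amount

def shuffle(mapped_outputs):
    """Group all mapper output by key so each reducer sees one key's values."""
    groups = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups

def reducer(key, values):
    """Read one key with all its values and emit a single (key, total) pair."""
    yield key, sum(values)

def run_job(records, n_mappers):
    """Run mappers, then reducers, in the correct sequence."""
    parts = partition(records, n_mappers)
    mapped = [list(mapper(part)) for part in parts]  # one list per mapper copy
    grouped = shuffle(mapped)
    return dict(pair for key in sorted(grouped) for pair in reducer(key, grouped[key]))

# Hypothetical input: (record id, (region, sale amount))
SALES = [(1, ("east", 10)), (2, ("west", 5)), (3, ("east", 7)),
         (4, ("north", 3)), (5, ("west", 1))]

totals = run_job(SALES, n_mappers=3)
print(totals)
```

Because the mapper and reducer see only (key, value) pairs, the result does not depend on how many mapper copies are used; the analytical logic stays the same while the framework decides the degree of parallelism.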
In Sybase IQ 15.4, a facility is provided for running C++ applications that use the MapReduce framework and run within the Sybase IQ PlexQ™ elastic grid. They can run against data stored in the Sybase IQ database or against externally stored data. The data can be structured or unstructured, as Sybase IQ 15.4 is capable of storing either. And the mappers and reducers are stackable.

Note carefully that, in the Sybase IQ 15.4 context, such programs need not have anything to do with Hadoop. The data that they are analyzing can be data previously stored in Sybase IQ that could also be analyzed with SQL queries or any other tool that works with Sybase IQ. Nevertheless, given the popularity of MapReduce, this facility is likely to be valuable because:
• Many libraries of analytical functions will be implemented for other environments using MapReduce; such libraries can then be ported to Sybase IQ 15.4;
• Many programmers, data scientists and other data specialists will gain familiarity with MapReduce and may prefer to program using it; and,
• Sybase IQ customers may want to build their own libraries of functions that can be used both on data in Sybase IQ and on data in other environments such as Hadoop; these customers will therefore be able to use MapReduce for this purpose.

As described in Section 5.3, at least one Sybase IQ partner has already leveraged this facility to provide data mining functions to Sybase IQ customers using MapReduce.
In addition, using MapReduce on Sybase IQ data will typically be simpler than accomplishing the same task on data in a Hadoop cluster. This is because the data to be analyzed can be selected and partitioned using SQL; the results returned by the analysis can be stored back in Sybase IQ using SQL; and, the definition of the data to be analyzed can be maintained in SQL. Each of these simplifies some aspect of the analytical process.

Also, data that is stored in Sybase IQ is managed by Sybase IQ. It benefits from all of the services provided in the Sybase IQ environment for other data. For example, it can be incorporated in a routine backup schedule; it can be made recoverable; it can be secured via access controls; and, so on.

4.4 SIMULATION FOR IN-DATABASE DEVELOPMENT

Analyzing data within a UDF, rather than transferring the data to an external system for analysis, has important advantages for a user of Sybase IQ 15.4.

First, it takes time and system resources to transfer the data elsewhere. Moreover, the moment the transfer begins, the data starts growing stale. If the analysis is delayed for some reason, it becomes even more stale. And the larger the volume of data to be analyzed, the higher the overhead of first moving it elsewhere.

Second, Sybase IQ 15.4 is capable of running UDFs in parallel across multiple nodes. If the data is transferred to another system for analysis, and if that system is not able to analyze data with an equal or greater degree of parallelism, then there will be yet more delay.

Third, if the results of the analysis are substantial and are to be retained for later use, it will be efficient to write them in parallel back within Sybase IQ, rather than having to transfer them from another system.

These reasons, and others, provide incentives to analyze data in place within UDFs in Sybase IQ.
But there are some issues to address. As UDFs are being developed, they may contain errors. A UDF under test could have unintended, and undesirable, effects on the production environment if run there. In some environments, the production data is extremely sensitive and it is not practical to have a copy of it in a separate test environment.

To address such issues, Sybase IQ 15.4 provides facilities for testing UDFs and applications that are intended to perform in-database analysis. These facilitate the process of creating realistic simulated data in a large scale test database. As a result, the development of in-database analytics is far more streamlined and UDFs can be more completely tested before they are used with production data.

4.5 HADOOP INTERFACE

Many enterprises will create, or have already created, an analytic environment in which there are multiple data repositories, some on data warehouse platforms and others in Hadoop clusters. In this situation, which WinterCorp believes will become nearly universal within the next several years, it will be common to have analytic processes which leverage data from multiple sources.

In response to this emerging requirement, Sybase IQ 15.4 has four mechanisms in its Application Services framework for connection between Sybase IQ and Hadoop. These are:

a. Client Side Federation. The Quest TOAD data query tool (certified with Sybase IQ and Hadoop) can retrieve data from each source and bring it together at the client; this can be a good solution when the volumes of data returned are not very large;

b. Analysis in the Sybase IQ Environment that Includes Data Extracted from Hadoop. Hadoop data is extracted and loaded (ETL) into Sybase IQ via Apache SQOOP, an open source tool for bulk data transfers between Hadoop and relational databases; SQOOP stands for “SQL-to-Hadoop”; this is a
particularly attractive solution when the data extracted from Hadoop is to be joined or aggregated with data that resides in Sybase IQ; performing the work in the Sybase IQ environment brings all the benefits of a mature relational database environment, including security, compliance, backup/recovery, query optimization, and defined and controlled data semantics; this is a bulk data transfer which can be highly parallel on both the Sybase IQ side and the Hadoop side;

c. Incorporation of Hadoop Data into Sybase IQ Queries. When the data access is to be more frequent, and when the data volumes to be transferred are not very large, data can be retrieved from Hadoop using Sybase IQ table functions; these retrievals, while not instant because Hadoop is fundamentally a batch environment, can nonetheless be incorporated into SQL queries as they are executing;

d. Coordinate Hadoop Job(s) with Sybase IQ Query. In this case, a Hadoop MapReduce job runs separately from a query but is designed to feed data to it; the query and the Hadoop job interface by means of parameterized table functions in the Sybase IQ query; though similar to case (c) above, the emphasis here is on coordinating and integrating analysis that is occurring in two jobs in two separate environments.

With these four capabilities, Sybase IQ customers can deal effectively with a range of situations in which a Hadoop repository is to be used in conjunction with a Sybase IQ data warehouse to meet an analytic requirement.

4.6 GEOSPATIAL/GEOMETRIC DATA & QUERY SUPPORT

Trends have increased the prevalence and the significance of location data and geometric data in the analytical environment.

First, GPS enabled devices such as smartphones and tablets are in widespread use and proliferating rapidly.
There are hundreds of millions of such devices in use today and projections are that there will soon be billions. Such devices frequently communicate their location via the internet and such location data ends up in many commercially valuable databases.

Second, many other types of GPS-enabled electronic devices are being created and they also communicate their location with increasing frequency. Examples include vehicles, medical devices, surveillance devices used for traffic analysis, weather devices and others too numerous to mention here.

Data on the location of devices, once too expensive or impractical to obtain, now shows up in more databases every day. Analysis of the location aspects of this data is central to the timely understanding and management of public safety, public health, the commercial supply chain, customer purchase patterns and many commercial resources.

A similar trend exists with respect to geometric data significant to the design and manufacture of products; the maintenance of buildings, highways and bridges; the management of energy use; and, so on.

In both cases, it is important for the data to be defined, captured and stored in as standardized, easily specified and easily used a fashion as possible. It is also essential for the database system to facilitate the specification of queries that exploit geographic and geometric data. And it is essential for the database system to perform such queries efficiently, especially when the data volumes involved are large.

Sybase presently addresses these requirements in the embedded row store DBMS inside Sybase IQ: SQL Anywhere, a very efficient small footprint DBMS that serves as a catalog store for Sybase
IQ’s engine. However, users can also spawn a separate instance of SQL Anywhere from Sybase IQ to store geospatial data. Sybase IQ then provides facilities for federated query of the geospatial/geometric data stored in SQL Anywhere and the main analytical column data store in Sybase IQ 15.4 to enable high performance geospatial analysis.

4.7 FREE EXPRESS EDITION

As with any software platform, it is important for developers to be encouraged to develop tools and applications for Sybase IQ. The robust and rapidly growing community of millions of Sybase IQ end users, using the product at over 4,500 installations worldwide, is certainly an incentive to developers. But it is still important to remove obstacles from the path of any developer interested in providing new capabilities to those users.

To this end, SAP is providing a free Express Edition of Sybase IQ 15.4. Anyone developing for Sybase IQ (utilizing the rich Application Services described in this section), or thinking about doing so, can download the product from http://www.sybase.com/iqexpressedition.
5 The Ecosystem Layer

The Ecosystem Layer, represented by the outermost layer in the Sybase IQ 15.4 analytic platform architecture, is an environment in which SAP and its partners can provide and support analytic applications and tools, as well as the business intelligence tools that have long been available with Sybase IQ.

Figure 9: The SAP Sybase IQ 15.4 Application Enablement Layer
Source: SAP Inc.

Key elements of the Ecosystem Layer include support for SAP BusinessObjects and the “R” statistical language; more efficient and scalable data mining functions written to exploit the new in-database MapReduce; social network analysis modules; support for PMML, for mathematical modeling; new capabilities for PowerDesigner and system administration and monitoring; and applications for big data information lifecycle management. Highlights are described in the following sections.

5.1 SAP BusinessObjects PORTFOLIO SUPPORT

As part of SAP, Sybase IQ is now well integrated with SAP’s market leading tools for Business Intelligence and Data Integration from its BusinessObjects portfolio. The SAP BusinessObjects BI Platform is not only certified with every new version of Sybase IQ, including Sybase IQ 15.4, but is also being enhanced to generate SQL optimized for Sybase IQ. Similarly, SAP BusinessObjects Data Services is being certified and optimized to load and transform data into Sybase IQ in a very efficient manner.

5.2 “R” LANGUAGE SUPPORT

As described at www.r-project.org,

R is a language and environment for statistical computing and graphics. …R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc) and graphical techniques, and is highly extensible.
R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
Sybase IQ 15.4 provides support for the R language in two ways. First, R applications can fetch data sets stored in Sybase IQ for analysis in the R environment through RJDBC. Second, calls to models written in R can be embedded in a Sybase IQ table UDF written in C++. SQL queries submitted to Sybase IQ can then call the UDF, thereby causing the model to be invoked in an R server process.

5.3 MapReduce-ENABLED DATA MINING

Since Version 15.1, the Fuzzy Logix library of data mining and analytic functions has been available with Sybase IQ. With Sybase IQ 15.4, this library of over 250 functions has been:
• Re-implemented using Sybase IQ’s in-database MapReduce API; and,
• Extended with additional new functions.

Most significantly, by using Sybase IQ’s in-database MapReduce API, the new implementation leverages the Sybase IQ table and table parameterized functions (thus using bulk data input and bulk data output to gain efficiency) and exploits the elastic PlexQ™ grid to execute the functions with much higher parallelism.

5.4 SOCIAL NETWORK ANALYSIS MODULES

KXEN’s InfiniteInsight social network analysis and predictive analytic toolset has been certified with Sybase IQ 15.4 to run on data stored in the database. With Sybase IQ 15.4, KXEN does its scoring directly in the database, and reports that it realizes large performance benefits both from the column storage model and the in-database analytic support.

5.5 SYBASE POWERDESIGNER 16 ARCHITECTURE RECOMMENDER

Sybase PowerDesigner is a widely used application and database design product that has long been available and integrated with Sybase IQ. PowerDesigner 16 and Sybase IQ 15.4 are now jointly enhanced and integrated to provide a new capability of recommending the architecture for a Sybase IQ solution.
The user provides PowerDesigner with:
• Database design
• Expected data volumes & growth
• Expected workload
• Performance requirements
• Hardware preferences (e.g., Intel or Power)

PowerDesigner will then generate an estimate of the configuration required and a bill of materials based on Sybase IQ reference architectures developed in cooperation with system partners. Where pre-built appliance-like configurations are available, these can be generated. The user can then vary input assumptions and examine the sensitivity of the configuration to variations.

In WinterCorp’s opinion, such estimated configurations would be used only as a starting point in certain capacity planning situations. Particularly in larger and more complex deployments, users would be well advised to seek independent confirmation and measurement. However, a fast path to an initial estimate is often extremely useful in capacity planning and this tool can provide that, along with an indication of sensitivity to various planning assumptions.
An area where particular caution is advised is in regard to large databases with complex query requirements. Where query complexity is high and data volumes are large, modest changes to the query workload can produce surprisingly large variations in capacity requirements. In these cases, a certain amount of realistic testing, along with larger allowances for unexpected capacity demands, is in order. However, with this tool, a history of configuration changes can be initiated, estimated, tracked, and maintained that can make sizing and deployments much more “factory like.”

5.6 IN-DATABASE PMML

From http://www.dmg.org/pmml-v3-0.html:

The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications. PMML provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications.

PMML models can be developed in a variety of data mining and statistical workbench environments available from other parties. However, when PMML models are actually used in production to score large volumes of data, they must run in a highly parallel environment.

In Sybase IQ 15.4, users can run PMML models with a plug-in developed by Zementis (http://www.zementis.com/in-DB-plugin.htm). With the plug-in, the PMML model can be run directly against data in Sybase IQ. The Zementis plug-in is a Sybase IQ UDF, leveraging the new Java API available in Sybase IQ 15.4.

Besides the various ecosystem modules outlined above, Sybase IQ supports a substantial variety of packaged analytical applications through its OEM partnerships covering various functional areas.
A few examples include Ericsson’s OSS product ENIQ, BMMSoft EDMT, and Solix EDMS.
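As a rough illustration of the PMML approach described in Section 5.6, the sketch below parses a small, hand-written PMML-style regression model and scores a few rows in Python. It says nothing about how the Zementis plug-in or Sybase IQ works internally. The model document, the field names (`age`, `income`, `score`), the coefficients and the helper functions `load_linear_model` and `score` are all assumptions made for illustration, and the document is a simplified fragment rather than a complete, validated PMML file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified PMML 3.0 style linear regression model.
PMML_DOC = """<PMML version="3.0" xmlns="http://www.dmg.org/PMML-3_0">
  <Header copyright="illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-3_0"}  # namespace used by the fragment above

def load_linear_model(pmml_text):
    """Extract the intercept and per-field coefficients from the regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients

def score(model, row):
    """Apply the linear model to one row of input values."""
    intercept, coefficients = model
    return intercept + sum(c * row[name] for name, c in coefficients.items())

MODEL = load_linear_model(PMML_DOC)
rows = [{"age": 30.0, "income": 2.0}, {"age": 40.0, "income": 1.5}]
scores = [score(MODEL, row) for row in rows]
print(scores)  # [20.0, 24.0]
```

The point of the format is that the model definition travels as data: any PMML compliant scorer can read the same document, so scoring each row independently is straightforward to run in parallel over large tables.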
6 Conclusions

Over the course of its last five rapid releases in three years, from 15.0 through the present 15.4, SAP Sybase IQ has been transformed into a platform for large scale data analytics and big data. It has significantly advanced in:
• Scalability, with the development of its elastic PlexQ™ grid that adds highly parallel execution of large queries and loads; previously, such operations could run in parallel over a single node of the grid; now they can run in parallel over multiple nodes; this is a major architectural advance, highly significant for larger data and workload requirements;
• In-database analytics, with a major generalization and extension of the user defined function (UDF) facility in Sybase IQ; with these new capabilities, UDFs can be written in Java as well as C++; they can read and write bulk data in the form of tables and files; they can be run in a protected mode, increasing system reliability and data availability; and, they can be executed in parallel over multiple nodes of the grid;
• In-database MapReduce, enabling end users and partners to run MapReduce routines and libraries against data in place and in a highly parallel fashion in Sybase IQ, and opening Sybase IQ up to a large range of analytic tools and applications from many vendors and sources;
• Interface to Hadoop, enabling the many customers who are investing, or will invest, in an open source data repository in a Hadoop cluster to leverage that investment in combination with data and analysis in Sybase IQ;
• Other analytic application services leveraging in-database MapReduce and new, more powerful UDFs; these include an expanded, more efficient and more highly parallel version of the Fuzzy Logix data mining and analytics library; a simulator for testing analytic applications; and, other features;
• Partner Ecosystem: other analytical, management and business intelligence tools and functions available from partners, certified with Sybase IQ and providing analytical solutions and capabilities to customers; these include support for the SAP BusinessObjects tool set and the R statistical language; a PMML plug-in for data mining from Zementis; social network analysis from KXEN; query and administration tools from Quest TOAD; and other capabilities.

These advances are evidence of a significant reorientation of the product direction and a significant enhancement of the product line to focus on the major drivers of change in business today. Organizations everywhere are grappling with the implications of a much larger volume and variety of data and a much increased focus on business strategies driven by fuller analysis of that data. Mobility (tablets, smartphones, other devices), social media and machine generated data are all changing our data environments.

Sybase IQ now claims more than 4,500 installations across the globe, following a rapid growth in revenue and a large expansion of the development organization.

In addition to the recent advances in releases 15.0 through 15.4 described here, Sybase IQ retains its established advantages in column storage, indexing and compression. These features, present since the earliest versions of Sybase IQ, work in combination to confer benefits that are unique to Sybase IQ. While other products offer column storage and compression, no other product has the sophistication of Sybase IQ in integrating these features with advanced indexing and query optimization. The result is that Sybase IQ is particularly efficient in reducing the amount of data that must be read to satisfy queries.
These fundamental strengths are now combined with increased parallelism and other features to deliver product benefits in a wider range of applications, now including those that use advanced analytic methods such as MapReduce and those that involve interaction with big data in Hadoop clusters.
WinterCorp is an independent consulting firm expert in the architecture and scalability of big data and analytic database solutions. Since our founding in 1992, we have architected solutions to some of the largest scale and most demanding big data and data warehouse requirements, worldwide.

We help technology users define their requirements; architect their solutions; select their platforms; and, engineer their implementations to optimize business value.

We create and conduct benchmarks, proofs-of-concept, pilot programs and system engineering studies that help our clients manage technical risk, control cost and reach business goals.

Our seminars and structured workshops help client teams establish a shared foundation of knowledge and move forward to meet their challenges in big data and analytic database scalability, performance and availability.

We’re expert with SQL, MapReduce and Hadoop; with structured data, unstructured data, and semi-structured data; and with the products, tools and technologies of data analytics in all its major forms.

With our in-depth knowledge and experience, we deliver unmatched insight into the issues that impede scalability and into the technologies and practice that enable business success.

245 First Street, Suite 1800
Cambridge MA 02145
617-695-1800

visit us at www.wintercorp.com

©2012 Winter Corporation, Cambridge, MA. All rights reserved.