Enabling Analytics as a Service (AaaS): The Key Analytical Platforms and
Workloads on IBM SoftLayer Cloud
Abstract
In the recent past, two prominent trends have gripped the IT industry. The first is accelerated IT
infrastructure optimization, driven primarily by proven and promising cloud technologies. The second is the
massive volume of data being generated, collected, and subjected to a variety of investigations in order to
extract actionable insights in time, to enable correct and timely decision-making by business executives, and
to empower knowledge workers to be more efficient in their tasks. Established product vendors and researchers
from academic institutions across the world are in fast track and in grand unison in collaboratively conceiving
and concretizing a bevy of service assemblage and delivery platforms (SDPs), data virtualization, ingestion,
analytics and visualization platforms and application enablement platforms (AEPs) in order to speed up and
simplify knowledge extraction and engineering from a variety of data heaps via real-time as well as batch
processing. Towards the knowledge discovery and dissemination, there are design and architectural patterns,
highly synchronized processes, evaluation metrics, key guidelines, best practices, etc. being unearthed and
sustained by data management professionals. We have performed a variety of proofs of concept and pilots in
the fast-growing big and fast data analytics domains and, based on the experience and expertise gained, have
produced a repository of reusable assets to be shared. In this paper, we illustrate how
trendy and transformative data analytics is being exposed and delivered as a service via IBM SoftLayer Cloud
for worldwide users in an affordable, amenable and accelerated fashion.
Introduction
Several disruptive developments are happening in parallel in the IT field. The device ecosystem is growing
at an unprecedented pace towards billions of connected devices; the number of implantables, wearables,
portables, cyber-physical systems (CPS), etc. is rising rapidly; business-critical operational, transactional
and analytical systems are becoming pervasive; social sites are being embraced with greater alacrity by
people across the world; digitization is being pursued more vigorously than ever before, resulting in trillions
of digitized entities / smart objects / sentient materials; and scores of powerful scientific and technical
experiments are being carried out.
Traditionally, business data has been the main source for analytics to squeeze out business insights. Today the
data size is massive; data scope, speed, and structure vary sharply; and the resulting data value for any
individual, innovator and institution is going to be decisive if all the data being collected is crunched
cognitively. With the strategic significance of data-driven insights understood, two grand disciplines (big and
fast data analytics) have become subjects of deeper research and study. Enabling technologies, platforms, and
tools are available in plenty these days from worldwide product vendors for accelerating big and fast data
analytics in a simplified and streamlined fashion. Precisely speaking, there is an insistence on crafting and
composing insights-filled knowledge services towards enhanced care, choice, comfort and convenience for
people. In the ensuing sections, we throw some light on the two principal technologies enabling the smooth and
sagacious realization of analytics as a service (AaaS). We write about the analytics platforms and workloads
that were modernized, migrated and deployed in IBM SoftLayer Cloud towards the ultimate aim of
delivering analytics as a service.
The World of Big and Fast Data Analytics
Big data analytics is now moving beyond the realm of intellectual curiosity and propensity to make tangible and
trendsetting impacts on business operations, offerings and outlooks. It is no longer a hype or a buzzword and
is all set to become a core and central tenet for every sort of business enterprise to be extremely relevant and
rightful to their stakeholders and end-users. Big data analytics is a generic and horizontally applicable idea to be
feverishly leveraged across all kinds of business domains and hence is poised to become a trendsetter for
worldwide businesses to march ahead with all clarity and confidence. Real-time analytics is the hot requirement
today and everyone is working on fulfilling this critical need. The emerging use cases include the use of real-
time data such as the sensor data to detect any abnormalities in plant and machinery and batch processing of
sensor data collected over a period to conduct root cause and failure analysis of plant and machinery.
Public Clouds for Big and Real-time Data Analytics - Most traditional data warehousing and business
intelligence (BI) projects to date have involved collecting, cleansing and analyzing data extracted from
on-premises business-critical systems. This age-old practice is about to change. For the
foreseeable future, however, it is unlikely that many organizations will move their mission-critical systems or data
(customer, confidential and corporate) to public cloud environments for analysis. Businesses are steadily
adopting the cloud idea for business operational and transactional purposes. Packaged and cloud-native
applications are primarily found fit for clouds and they are doing exceedingly well in their new residences. The biggest
potential for cloud computing is the affordable and adept processing of data that already exists in cloud centers.
All sorts of functional web sites, applications and services are bound to be cloud-based sooner rather than later.
The positioning of clouds as the converged, heavily optimized and automated, dedicated and shared, virtualized
and software-defined environment for IT infrastructures (servers, storage and networking), business
infrastructure and management software solutions and applications is getting strengthened fast. Therefore every
kind of physical asset is being seamlessly integrated with cloud-based services in order to be smart in its
behavior. That is, ground-level sensors and actuators are increasingly tied up with cloud-based
software to be distinct in their operations and outputs. All these developments clearly foretell that future
data analytics is set to flourish in clouds.
These days, public clouds natively provide all kinds of big data analytics tools and platforms on
their infrastructures in order to deliver data analytics at high speed and at an affordable
cost. WAN optimization technologies are maturing fast to substantially reduce network latency while
transmitting huge amounts of data from one system to another among geographically distributed clouds.
Federated, open, connected, and interoperable cloud schemes are fast capturing attention,
and hence we can expect the concept of the inter-cloud to be realized soon through open and industry-strength
standards and deeper automation. With the continued adoption and articulation of new capabilities and
competencies such as software-defined compute, storage and networking, cloud-based data analytics
is set to grow immensely. In short, clouds are being positioned as the core, central and cognitive environment for
all kinds of complex tasks.
Hybrid Clouds for Specific Cases - It is anticipated that in the years ahead, the value of hybrid clouds will
climb sharply, as a mixed and multi-site IT environment is more appropriate for most of the emerging
scenarios. For the analytics space, a viable hybrid cloud use case is to filter out sensitive
information from data sets shortly after capture and then leverage the public cloud to perform any complex
analytics on them. For example, when analyzing terabytes of medical data to identify reliable healthcare
patterns that predict susceptibility to a particular disease, the identity details of patients are not
relevant. In this case, a simple filter can strip names, addresses, social security numbers, etc. before pushing
the anonymized set to secure cloud data storage.
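The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production de-identification pipeline: the field names (name, address, ssn), the sample records and the secret value are all hypothetical, and the keyed hash that replaces the social security number is one possible way, assumed here, to keep records of the same patient correlatable after the identity fields are dropped.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical identity fields; a real data set defines its own schema.
IDENTITY_FIELDS = {"name", "address", "ssn"}


def anonymize(record, secret):
    """Return a copy of record without identity fields.

    A keyed hash (HMAC-SHA256) of the SSN is kept as an opaque patient key,
    so rows for the same patient still correlate. The secret stays on premises.
    """
    cleaned = {k: v for k, v in record.items() if k not in IDENTITY_FIELDS}
    token = hmac.new(secret, record["ssn"].encode("utf-8"), hashlib.sha256)
    cleaned["patient_key"] = token.hexdigest()[:16]
    return cleaned


def anonymize_csv(text, secret):
    """Anonymize every row of a CSV document held in memory."""
    return [anonymize(row, secret) for row in csv.DictReader(io.StringIO(text))]


# Made-up sample data for illustration only.
SAMPLE = (
    "name,address,ssn,age,diagnosis\n"
    "Jane Doe,1 Main St,123-45-6789,54,hypertension\n"
    "John Roe,2 Oak Ave,987-65-4321,61,diabetes\n"
)

rows = anonymize_csv(SAMPLE, secret=b"kept-on-premises")
```

Only the rows list would be pushed to the public cloud; the raw records and the secret never leave the private side of the hybrid setup.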
All kinds of software systems are steadily being modernized and moved to cloud environments, especially public
clouds, to be subscribed to and used as a service over the public web. The other noteworthy factor is that a
variety of social sites for capturing and captivating different segments of people across the world are emerging
and joining mainstream computing. We therefore hear about, read about and even use social media, networking, and
computing. One statistic says that the widely used Facebook pours out at least 8 terabytes of data every
day. Similarly, other social sites produce large amounts of personal, social and professional data apart from
musings, blogs, opinions, feedback, reviews, multimedia files, comments, compliments, complaints,
advertisements, and other articulations. These poly-structured data play a bigger role in shaping the data
analytics domain.
The other valuable trends include the movement of enterprise-class operational, transactional, commercial, and
analytics systems to public clouds. For instance, www.salesforce.com is a pioneering public cloud providing
CRM as a service. Thus a growing share of enterprise data originates in public clouds. With public clouds projected to
grow fast, cloud data is being presented as another viable opportunity for cloud-based
data analytics.
The Contemporary Analytics in Hybrid Clouds
Apart from the traditional business analytics, the above-mentioned trends ask for newer kinds of analytics
leveraging big and real-time data. There are domain-specific and agnostic analytics categories. For example,
increasingly the justifications for predictive and prescriptive analytics, operational, security, performance
analytics and so on are being expounded with the purposeful emergence of different and distributed data
sources. Every industry vertical has its big data analytics. With different data velocities, real-time / streaming
analytics is bound to be mandatory. There are a few vital parameters to determine the appropriateness of cloud
environments for powerful data analytics.
• The Data Volume and Velocity
• The Impacts on Compute, Storage and Network Resources
• The Sensitivity of Data and Regulatory / Compliance Requirements
• The Scope of Analytics
• The Types of Environments
Why the Next-Generation Data Analytics Applications and Platforms in Cloud Environments?
Cloud-based data analytics has been picking up fast in order to reap all the originally envisaged benefits of the
cloud paradigm. Here is a list of key benefits to be accrued from the cloud embarkation strategy and journey.
• Agility & Affordability - No capital investment in large-scale IT infrastructure; just use and pay
• Big & Fast Data Platforms - Deploying and using any kind of big data platform (generic or specific,
open or commercial-grade, etc.) for analytics is quick and easy
• End-to-end Hadoop Platforms - Data virtualization, ingestion, processing, mining, analytics, and
information visualization tasks are performed by these platforms
• Data Management Systems - Parallel, clustered and distributed SQL databases, NoSQL and NewSQL
databases are made available in clouds
• Data Warehouse Systems - Data warehouse as a service (DWaaS) capabilities are now
being realized
• Social Sites, Mobile Application Stores, etc. - The popular social media and network applications
are run on public clouds
• WAN Optimization Technologies - There are WAN optimization products and platforms for
efficiently transmitting data over the Internet infrastructure
• Business Applications in Clouds - Along with enterprise information systems (EISs), business-critical
packaged applications such as ERP, CMS, SCM, KM, etc. are also being deployed in clouds
• Cloud Integrators, Brokers & Orchestrators - There are products and platforms for seamless
interoperability among different and distributed systems, services and data
• Operational, Transactional and Analytical Systems are modernized, migrated and hosted in
clouds
• Device / Sensor / Machine Integration with cloud-native as well as cloud-enabled applications, services
and data
Cloud-based Analytical Platforms
We have performed a number of proofs of concept (PoCs) in order to gain a deeper understanding of
cloud-based big and fast data analytics. The following sections depict the various platforms, databases, and
tools that were made to run in IBM SoftLayer Cloud for simplifying and streamlining the provision of analytics
as a service to worldwide clients and customers.
Big Data Analytics Platforms in IBM SoftLayer Cloud
Increasingly, individuals, innovators and institutions are taking advantage of the agility and cost efficiencies that
cloud infrastructures provide. There are several other advantages associated with the cloudification
of enterprise IT infrastructures. Hadoop is the prime platform for proceeding with big data analytics.
The maturity and stability of Hadoop-compliant data analytics platforms are pushing companies towards
big data analytics. As enunciated earlier, the cloud infrastructure is being positioned as the most appropriate
one for big data analytics. There are also several open source as well as commercial-grade distributions of
Hadoop in the market, such as Cloudera, Hortonworks, and MapR. IBM InfoSphere BigInsights is a
favored and full-fledged commercial implementation with Apache Hadoop as the base.
Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH (Cloudera's
Distribution Including Apache Hadoop), the world’s most popular open source Hadoop-based platform, as well as
advanced system management and data management tools. Cloudera Enterprise includes Cloudera Manager to help you easily
deploy, manage, monitor, and diagnose issues with your cluster. Cloudera Manager is critical for operating clusters at
scale. Cloud environments are becoming increasingly popular for critical Apache Hadoop workloads, given
their flexibility and elasticity. With Cloudera Director, you can unlock the full potential of Hadoop in the cloud,
without compromise. The CDH reference architecture is given below.
SoftLayer Cloud not only provides potentially unlimited resources for your high-performance computing
cluster, but also makes it easy to manage with Cloudera Managed Hadoop. Similarly, we have deployed
Hortonworks and MapR Hadoop platforms in SoftLayer Cloud. A typical cloud-based solution comprises
storage, processing and management components deployed on SoftLayer Cloud, an extensible, elegant,
efficient, and elastic environment for processing your data. The other benefits include extreme flexibility, high
performance, agility, and pay-per-use pricing that eliminates upfront costs.
IBM InfoSphere BigInsights is also made available on SoftLayer cloud and this movement brings the following
benefits to the table.
• Accelerates and simplifies cluster deployment - Take advantage of big data analytics without the
need for an on-premises infrastructure.
• Scales as your business demands - Keep infrastructure costs in line with the changing needs of the
business.
• Provides advanced tools to reduce time to value - Gain value from Big SQL, BigSheets, text
analytics and more.
• Optimizes performance and enhances security - Experience speed and reliability with a dedicated
bare-metal infrastructure.
• Offers expertise and best practices - Benefit from a dedicated cloud operations team that deploys
clusters based on best practices.
Thus Hadoop-based platforms are being steadily taken to cloud environments in order to deliver big data
analytics with nimbleness and suppleness.
Real-time Analytics Platforms in IBM SoftLayer Cloud
Not only big data analytics but also real-time analytics on fast and streaming data is comfortably
accomplished in cloud environments. In this section, we explain how a couple of platforms
were methodically modernized and migrated to an IBM SoftLayer cloud center in order to understand the
concerns, challenges and changes associated with cloud-based real-time analytics.
Delivering Real-time Applications via SoftLayer Cloud-based VoltDB - With the data being
generated and captured growing to unprecedented volumes, traditional data analytics platforms and
infrastructures are bound to face a variety of constraints. That means we need robust and resilient algorithms
and IT solutions for big and fast data. Several product vendors, having realized the brewing challenges, are
proactively bringing forth a bevy of big data analytics systems that facilitate the smooth and methodical
transition of captured and consolidated data to information and then to knowledge.
Data virtualization, databases, warehouses, data marts and cubes, business intelligence (BI) and visualization
solutions are very critical for powering up the goals of knowledge extraction and engineering to realize a
growing family of smarter systems and services for fulfilling the ingenious ideas and ideals of the smarter planet
vision. VoltDB is a high performance and scalable relational database management system (RDBMS) for big
data, high-velocity OLTP and real-time analytics. VoltDB, being proclaimed as a kind of NewSQL database, is
a blazingly fast DB designed to run on modern scale-out computing infrastructures. Unlike legacy RDBMS
products and NoSQL data stores, VoltDB enables high-velocity applications without requiring complex and
costly sharding layers or compromising transactional data integrity (ACID) to gain performance and scale.
VoltDB provides:
• Database throughput reaching millions of operations per second
• On-demand scaling
• High availability, fault tolerance and database durability
• Real-time data analytics
VoltDB is deployed in SoftLayer Cloud in order to showcase its real-time and real-world capabilities of
producing actionable insights.
Apache Storm on IBM SoftLayer Cloud for Real-time Analytics
Not only the data size and structure but also the data speed matters much these days. Specific use
cases demanding fast data analytics are emerging across industry verticals. Data are being massaged, encapsulated and
delivered as messages. Data and event messages are emerging as the formalized building block to be received,
opened up, parsed, and used for a variety of deeper and decisive analyses. There are data streams (multimedia)
and events from newer data sources such as sensors, machines, operational systems, platforms, etc., and they
need to be systematically captured and analyzed immediately in order to extract both tactically and strategically
sound insights that empower decision-makers and even systems to decide on the next course of action with
confidence and clarity. While clouds are being positioned as the core and optimized IT infrastructure,
there are several open source as well as commercial-grade platforms for accomplishing and automating the
process of real-time and streaming analytics and its associated tasks.
Apache Storm, one such real-time analytics platform, is a free and open source distributed real-time
computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time
processing what Hadoop did for batch processing. Storm is simple and can be used with any programming
language. Storm has many use cases: real-time analytics, online machine learning, continuous computation,
distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per
second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and
operate. Storm integrates with the queuing and database technologies you already use. A Storm topology
consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams
between each stage of the computation however needed. We have deployed an instance of Apache Storm in
IBM SoftLayer cloud and chosen a small use case in order to understand and enunciate how cloud-based Storm
functions and delivers its originally envisaged goals.
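The idea of a topology that consumes a stream, processes it in stages and repartitions it between stages can be illustrated with a small, single-process Python sketch. It deliberately does not use the Storm API: the function names, the sample sentences and the task count are all hypothetical, and plain generators and lists stand in for what Storm calls spouts, bolts and stream groupings.

```python
from collections import Counter
from zlib import crc32


def sentence_source():
    """Stand-in for a stream source (a spout in Storm terms); finite here."""
    yield from ["the cloud runs storm", "storm processes the stream"]


def split_words(sentences):
    """First processing stage: split each sentence into word tuples."""
    for sentence in sentences:
        yield from sentence.split()


def partition_by_word(words, num_tasks):
    """Repartition the stream so equal words always reach the same task."""
    partitions = [[] for _ in range(num_tasks)]
    for word in words:
        partitions[crc32(word.encode("utf-8")) % num_tasks].append(word)
    return partitions


def count_words(partition):
    """Second processing stage: each task counts only the words routed to it."""
    return Counter(partition)


partitions = partition_by_word(split_words(sentence_source()), num_tasks=3)
per_task_counts = [count_words(p) for p in partitions]
totals = sum(per_task_counts, Counter())
```

Because the repartitioning step routes by a hash of the word, each task holds the complete count for the words it owns, which is what lets the counting stage scale out across workers without coordination.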
High-Performance Big Data Analytics in SoftLayer Cloud
High performance is being demanded everywhere these days. There are
valid concerns expressed in different quarters that cloud environments do not guarantee high performance.
Therefore hosting high-performing platforms on clouds is being touted as one of the viable mechanisms to
ensure high performance of cloud-hosted services and workloads.
Big data analytics (BDA) is emerging as a data-intensive activity mandating high-end IT infrastructures and
integrated platforms to simplify and streamline the tasks typically associated with any data analytics. There are
several viable options these days ranging from mainframes, clusters, grids, appliances, to super computers to
accelerate and accomplish data analytics efficiently. Hadoop platforms are the most sought-after for enabling
cost-effective analysis of multi-structured data mountains. In short, high-performance computing (HPC) is the
most appropriate computing model for approaching the infrastructural challenges thrown up by
BDA. In this paper, we have described how the Netezza software solution can be systematically moved to IBM
SoftLayer Cloud, a leading public cloud offering, configured there, and used for accomplishing
next-generation real-time analytics at a low total cost of ownership (TCO) and high return on investment (RoI). In
our PoC-induced asset document, we have given all the right and relevant details of a sample application in
order to accentuate the power of cloud-based Netezza in fulfilling the various requirements of
high-performance data analytics.
Streaming Analytics in IBM SoftLayer Cloud
Stream computing continuously integrates and analyzes data in motion to deliver real-time analytics. It further
enables organizations to detect insights (risks and opportunities) in high-velocity data that can only be
detected and acted on at a moment’s notice. High-velocity flows of data from real-time sources such as market
data, machines, smartphones, sensors and actuators, clickstreams, and even transactions remain largely
unexplored. IBM Cloud Analytics Application Services delivers high-performance clusters for running
enterprise-grade big data and analytics workloads on a dedicated bare-metal infrastructure pre-installed with
industry-leading big data software. Real-time analytic processing lets organizations store less, analyze more,
and make better decisions faster. IBM InfoSphere Streams is the supported software for this Cloud Analytics
service. IBM InfoSphere
Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and
correlate information as it arrives from thousands of real-time sources. The solution can handle very high data
throughput rates, up to millions of events or messages per second.
Many organizations need to process a large amount of data in real time for real-time analytics, real-time ETL
or to respond to events instantaneously. On-the-fly analysis of big data streams is emerging as a distinct need
for many industry verticals these days. We have deployed DataTorrent in IBM SoftLayer Cloud and verified
how it delivers on its promises for big data streaming analytics. DataTorrent is an enterprise-grade software
platform that enables businesses to perform any sort of data processing or transformation on structured or
unstructured data, all in real time as the data is streamed into a data center. Leveraging Hadoop 2.0,
DataTorrent is a YARN-native application platform. It can be installed directly onto an existing Hadoop cluster,
connect directly to all incoming data sources live, and perform any type of processing or transformation of
your data in memory as it comes streaming in. DataTorrent handles all of the scaling and fault tolerance of
the system, leaving enterprises to focus on just their business logic.
DataTorrent supports today’s most demanding, mission-critical, big-data streaming applications. It enables you
to quickly develop applications that ingest massive amounts of data from various sources in real time and
perform highly scalable computations in real time. With DataTorrent, you can leverage your existing Hadoop
environment for real-time stream processing. We employed a sample application in order to show readers
how cloud-based real-time analytics applications can be implemented in a streamlined manner.
End-to-end Big Data Analytics Platform in IBM SoftLayer Cloud
In general, Hadoop platforms do pre-processing, processing and analytics for knowledge discovery. But an
end-to-end big data analytics platform involves data collection, virtualization, ingestion, analytics and
visualization modules. With such a platform, everything gets accomplished quickly and securely with just a
single click. Datameer is one such platform.
Datameer is an end-to-end big data analytics platform purpose-built for Hadoop that enables the fastest time
from raw data to new insights. The mission is to eliminate the complexity of the tasks associated with big data
analytics and empower everyone to make data-driven decisions in minutes, not months. There is no need for
a data scientist or multiple technical tools to model, integrate, cleanse, prepare, analyze and visualize your data.
Datameer is the one-stop shop for getting all your data into Hadoop, analyzing that data, discovering the
knowledge and visualizing the resulting insights in a preferred form and format. Datameer can handle all kinds
of data from multiple sources as illustrated in the picture below. Datameer has been successfully installed in
IBM SoftLayer cloud environment and tested with a sample application in order to demonstrate its unique
capability.
HBase, a NoSQL Database in IBM SoftLayer Cloud
HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System
(HDFS). HBase is a NoSQL database, is well suited for sparse data sets, and does not support a structured
query language like SQL. An HBase system comprises a set of tables; each table must have an element
defined as a primary key (the row key), and all access attempts to HBase tables must use this key. An HBase column
represents an attribute of an object, and many attributes can be grouped together into what are known
as column families. With HBase, you must predefine the table schema and specify the column families.
However, new columns can be added to families at any time, making the schema flexible
and therefore able to adapt to changing application requirements.
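The data model described above (fixed column families, freely added columns, access only through the key) can be mimicked with nested dictionaries. The following is a toy in-memory sketch and not the HBase client API; the class name SketchTable, its methods and the sample rows are hypothetical and exist only to make the model concrete.

```python
class SketchTable:
    """In-memory stand-in for an HBase-style table (not the HBase client API)."""

    def __init__(self, column_families):
        # Column families are part of the schema and fixed at creation time.
        self.column_families = frozenset(column_families)
        self.rows = {}  # row key -> {family -> {column -> value}}

    def put(self, row_key, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        # Columns inside a family need no prior declaration.
        row.setdefault(family, {})[column] = value

    def get(self, row_key):
        """All access goes through the row key."""
        return self.rows.get(row_key, {})


users = SketchTable(["profile", "activity"])
users.put("user#1001", "profile", "name", "Asha")
users.put("user#1001", "activity", "last_login", "2015-03-01")
users.put("user#1002", "profile", "name", "Ravi")
# A new column appears for one row only; the table stays sparse.
users.put("user#1002", "profile", "twitter", "@ravi")
```

Rows that never set a column simply do not store it, which is why this layout suits sparse data sets.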
HBase is part and parcel of every standard Hadoop distribution and was installed in IBM SoftLayer Cloud.
There are certain usage scenarios wherein big data analytics (BDA) is well accomplished with the help of a
cloud-based HBase database. We developed a small application to test how productive HBase is in
faraway clouds.
There are several other competent and high-end NoSQL databases in the marketplace. Cassandra (originally
developed at Facebook), Google BigTable, etc. are some of the highly popular database management systems
getting into cloud environments in order to tackle the data explosion, data variety, viscosity, and variability.
The Apache Cassandra database is the correct choice when you need scalability and high availability without
compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud
infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across
multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing
that you can survive regional outages.
Cassandra's data model offers the convenience of column indexes with the performance of log-structured
updates, strong support for denormalization and materialized views, and powerful built-in caching. Cassandra is also
deployed in IBM SoftLayer Cloud. Basho Riak is another NoSQL database made available in SoftLayer cloud.
Similarly, other renowned databases such as MongoDB are also being taken to the cloud to reap its infrastructural
innovations.
ScaleBase Distributed Database Management System
ScaleBase brings elasticity, scalability and continuous high availability to MySQL databases and applications
in public, private and hybrid cloud environments. ScaleBase enables instant and transparent MySQL scale-out,
leveraging the power of smaller, less expensive servers working together. The policy-based data distribution
(automated sharding), powered by the ScaleBase Analysis Genie, and the intelligent load balancing with
replication-aware read/write splitting enable growth of the operational load and throughput, increase
application performance and protect against varying usage peaks and load spikes.
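To make the two routing ideas in this paragraph concrete, here is a small Python sketch of hash-based sharding combined with read/write splitting. It is an illustration under stated assumptions and not ScaleBase's implementation: the Shard and Router classes, the node names, the CRC32-modulo distribution policy and the rule that any statement starting with SELECT is a read are all hypothetical simplifications.

```python
import itertools
from zlib import crc32


class Shard:
    """One hypothetical shard: a primary plus at least one read replica."""

    def __init__(self, name, replica_count):
        self.primary = f"{name}-primary"
        replicas = [f"{name}-replica{i}" for i in range(replica_count)]
        self._next_replica = itertools.cycle(replicas)

    def route(self, sql):
        # Reads rotate across replicas; every other statement goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._next_replica)
        return self.primary


class Router:
    """Picks a shard from the shard key, then a node from the statement type."""

    def __init__(self, shard_count, replica_count=2):
        self.shards = [Shard(f"shard{i}", replica_count) for i in range(shard_count)]

    def route(self, shard_key, sql):
        index = crc32(str(shard_key).encode("utf-8")) % len(self.shards)
        return self.shards[index].route(sql)


router = Router(shard_count=4)
write_node = router.route(42, "UPDATE orders SET status = 'paid' WHERE customer_id = 42")
read_node_1 = router.route(42, "SELECT * FROM orders WHERE customer_id = 42")
read_node_2 = router.route(42, "SELECT * FROM orders WHERE customer_id = 42")
```

The application only supplies the shard key and the statement; which server answers is decided by the routing layer, which is the transparency the paragraph refers to.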
ScaleBase automated failover and failback ensure business continuity and protection from both unexpected and
expected outages, and simplify ongoing maintenance tasks, such as software and hardware
upgrades, without impacting the application or database availability. The ability to migrate an application from
a hosted environment with a single growing database to a virtualized environment with smaller, more
manageable data nodes gives companies agility, flexibility and competitiveness. ScaleBase was purpose-built for
cloud deployment. ScaleBase can be run on private clouds and is available on public clouds. We have completed the
initial steps to prepare and migrate the ScaleBase solution to the IBM SoftLayer public cloud,
made the necessary configuration changes, and run a small sample application in order to check
how ScaleBase functions in an online, off-premises and on-demand cloud environment. This forms a major part
of our strategy of empowering public cloud offerings to be high-performing and elastic for data- and
process-intensive applications.
AeroSpike In-Memory NoSQL Database in IBM SoftLayer Cloud
Versatile in-memory computing, NoSQL and NewSQL databases, parallel file systems, etc. are the prominent
IT solutions being enabled to be hosted and run in elastic clouds for fulfilling the varying needs of the
big data world. Aerospike is an open-source distributed NoSQL database optimized for in-memory and
SSD-based indexing and data storage. Aerospike is a modern database built from the ground up to push the limits
of flash storage, processors and networks. It was designed to operate with predictable low latency at high
throughput with uncompromising reliability: both high availability and ACID guarantees. It greatly simplifies
developers’ work, as there is no need to incorporate logic for sharding and for cluster
changes. The perpetual need to stop worrying about data loss or downtime is met by this
database solution. Aerospike is ideal for real-time big data or context-driven applications that must sense and
respond right now. Aerospike operates at in-memory speed and global scale with enterprise-grade reliability.
Identical Aerospike servers scale out to form a shared-nothing cluster which transparently partitions data and
parallelizes processing across nodes. Because the nodes in the cluster are identical, you can start with two and
just add more hardware. The cluster scales linearly.
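How a shared-nothing cluster can partition data transparently and absorb a new node can be sketched as follows. This is an illustrative model only: the partition count, the SHA-1 based hash and the use of rendezvous (highest-random-weight) hashing to assign partitions to nodes are assumptions chosen for the sketch, not a description of Aerospike's internal algorithm, and the node names are hypothetical.

```python
import hashlib

PARTITIONS = 4096  # assumed fixed partition count, for illustration only


def _hash(text):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def partition_of(key):
    """Every record key maps to one fixed partition."""
    return _hash(key) % PARTITIONS


def owner(partition, nodes):
    """Rendezvous hashing: the node with the highest score owns the partition."""
    return max(nodes, key=lambda node: _hash(f"{node}:{partition}"))


def partition_map(nodes):
    """Owner of every partition for the given cluster membership."""
    return [owner(p, nodes) for p in range(PARTITIONS)]


before = partition_map(["node-a", "node-b"])
after = partition_map(["node-a", "node-b", "node-c"])
moved = sum(1 for old, new in zip(before, after) if old != new)
```

In this model, growing the cluster from two nodes to three moves only the partitions that the new node wins, so clients keep hashing keys the same way and existing nodes never exchange data with each other.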
We have migrated an instance of the Aerospike database to the IBM SoftLayer cloud environment and configured it to
deliver on its promises. We have worked on a sample application in order to gain a deeper understanding of
the distinct capabilities of Aerospike in sufficiently meeting the goals of new-generation data-intensive
workloads.
NewSQL Databases in IBM SoftLayer Cloud
Essentially, NewSQL combines the best features of both worlds: it maintains the transactional integrity of
traditional database systems while providing the scalable performance of NoSQL systems. This
combination of integrity, performance and scale is crucial in transaction-intensive environments. NoSQL-based data
systems have succeeded largely on the promise of scalability. NewSQL databases seek to overtake
NoSQL by adding high-speed transactional integrity.
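To make "transactional integrity" concrete, here is a minimal sketch of an atomic transfer. It uses Python's built-in sqlite3 module purely as a stand-in for any SQL engine with ACID transactions; it is not VoltDB or any other NewSQL product, and the table, account names, and amounts are hypothetical. The point is the guarantee itself: either both updates are committed or neither is.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if it was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # This debit violates the CHECK constraint when funds are insufficient,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)         # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # fails and leaves no partial update
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, rejected, balances)
```

A NewSQL system aims to keep this all-or-nothing behavior while spreading the data and the transaction load across many nodes.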
VoltDB is a NewSQL database that we successfully deployed in IBM SoftLayer Cloud and subjected to a
variety of small-scale tests to verify whether it delivers its stated capabilities. Other
popular NewSQL databases, such as Clustrix and NuoDB, are quickly gaining market share and mindshare.
These can also be conveniently hosted and delivered as a service via cloud environments.
Database as a Service (DBaaS)
Today’s applications are expected to manage a variety of structured and unstructured data, accessed by massive
networks of users, devices, and business locations, or even sensors, vehicles and Internet-enabled goods.
Companies of all sizes, from startups to large users such as Samsung, Hothead Games, and Fidelity Investments,
use Cloudant to manage data for large or fast-growing web and mobile applications in e-commerce, online
education, gaming, financial services, and other industries.
Cloudant is best suited for applications that need a database to handle a massively concurrent mix of
low-latency reads and writes. Its data replication and synchronization technology also enables continuous data
availability, as well as offline application usage for mobile or remote users. In a large organization, it can take
several weeks for a DBMS instance to be provisioned for a new development project, which limits innovation
and agility. DBaaS enables instant provisioning of your data layer, so that you can begin new development
whenever you need.
Unlike Do-It-Yourself (DIY) databases, DBaaS solutions like Cloudant provide, and guarantee, a specific
level of data layer performance and uptime. This reduces the risk of service delivery failure for you and your
project. The Cloudant database as a service (DBaaS) is presented as the first data management platform to leverage the
availability, elasticity, and reach of the cloud to create a global data delivery network (DDN) that enables
applications to scale larger and remain available to users wherever they are.
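From an application's point of view, a DBaaS such as Cloudant is reached over HTTP rather than through a locally installed server. Cloudant exposes a CouchDB-compatible API in which a JSON document is stored with a PUT to /database/document-id. The sketch below only builds such a request and does not send it. The account URL, database name, document ID, and document fields are hypothetical, and authentication is deliberately left out.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_put_document(base_url, database, doc_id, document):
    """Build (but do not send) a PUT request that would store one JSON document."""
    url = f"{base_url.rstrip('/')}/{quote(database, safe='')}/{quote(doc_id, safe='')}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_put_document(
    "https://example-account.cloudant.com",  # hypothetical account URL
    "orders",                                # hypothetical database name
    "order-1001",                            # hypothetical document ID
    {"customer": "c-17", "total": 42.5, "items": ["sku-1", "sku-2"]},
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with urllib.request.urlopen) would additionally require credentials for a real account.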
Data Warehouse as a Service (DWaaS)
IBM dashDB is a fully managed, agile data warehousing service in the cloud that puts powerful analytics
at your fingertips. It frees you from infrastructure constraints
when your business demands it. IBM dashDB can help
extend your existing infrastructure into the cloud, or help you start new self-service data warehousing
capabilities. It is powered by high-performance in-memory and in-database technology that delivers
answers quickly. IBM dashDB provides the simplicity of an appliance with the elasticity
and agility of the cloud for organizations of any size, and it is designed to meet enterprise
security expectations. You can gain instant access to critical business insights without a hefty upfront
infrastructure investment, and you can load, analyze, and visualize your data in minutes. Thus the
prospects for data warehouse as a service are brightening.
IBM Watson Analytics in SoftLayer Cloud
As most of us know, Watson Analytics is a natural language-based cognitive service that provides instant
access to predictive and visual analytic tools for businesses. It is designed to make advanced and predictive
analytics easy for anyone to acquire and use. Watson Analytics offers self-service analytics, including access to
easy-to-use data refinement and data warehousing services that make it easier for business users to acquire and
prepare data, beyond simple spreadsheets, for analysis and visualization. IBM Watson Analytics automates
steps such as data preparation, predictive analysis, and visual storytelling for business professionals across
data-intensive disciplines such as marketing, sales, operations, finance and human resources. SoftLayer is integrating the
latest IBM Power Systems into its cloud infrastructure to meet the infrastructural needs of
cost-effective high-performance computing. The IBM Watson system is designed to run efficiently on IBM Power Systems, and
hence Watson Analytics as a service via the SoftLayer cloud for worldwide users is expected to become available
soon.
Containerized Analytics as a Service in IBM SoftLayer Cloud
The concept of containerization for packaging and sandboxing mission-critical applications is catching the
attention of developers as well as system administrators. Bundling a software module together with
its binaries, libraries, configuration details and other dependencies into a single package is an effective
way to achieve faster and error-free deployment and delivery of software workloads. This pragmatic idea has
spread widely, and these days all kinds of mobile, cloud, social, embedded, middleware,
database, enterprise and IoT applications are methodically being containerized using sandboxing (a
subtle and smart isolation technique) to eliminate restrictive dependencies on the underlying operating systems.
Such compact, sandboxed and contained applications are being prescribed as a sought-after
and appropriate solution for meeting portability, extensibility, manoeuvrability, sustainability and security
needs.
With the rapid maturing of the Docker technology, a new paradigm of “containers as a service (CaaS)”
is emerging and evolving. That is, containers are being readied, hosted and delivered as a service over the public
Web. All the necessary procedures to deliver application-aware containers as a service are being meticulously
applied to containers to make them ready for the forthcoming service era. That is, knowledge-filled,
service-oriented, cloud-based, composable, and cognitive containers are being proclaimed as one of the principal
ingredients for the establishment and sustenance of the smarter planet vision. Precisely speaking, applications
are containerized and exposed as services to be discovered and used by a variety of consumers for a growing
set of use cases. Big and fast data analytics platforms such as Hadoop, Apache Storm and Spark are fast maturing and
stabilizing. VMs are widely used for enabling Hadoop as a service. Now, with the faster adoption of
containerization, the prospects are bright for data analytics via portable, substitutable, composable, and replaceable
containers, which are known for faster provisioning, live migration, etc. In short, containers are destined
for cloud environments.
The integration of Hadoop YARN with Docker will allow multiple clusters to utilize the same hardware
resources. We have built YARN containers through the Dockerization steps and hosted them
in IBM SoftLayer Cloud. We have carried out a sample exercise to understand how containerized big data
workloads and analytical platforms ensure higher efficiency, and on that basis the new offering of containerized
analytics as a service via the SoftLayer cloud seems imminent.
Conclusion
Data has become a strategic asset for any organization these days to plan ahead precisely and proceed with
utmost confidence and clarity. Data-driven enterprises are being described as the ones destined for
continued success, sagaciously overcoming all kinds of unexpected business challenges and changes. That is,
any enterprise that systematically subjects all of its data, gleaned from different and distributed sources,
to a series of IT-enabled deeper analytics processes with the help of end-to-end platforms for extracting
actionable insights is bound to attain and retain greater success in its long and arduous journey. With the
steady increase in data sources, it becomes essential for organizations to strengthen their capabilities in order
to capture all the data emanating from different and distributed systems, subject it to a series of deeper and
decisive investigations to extract actionable insights in time, and disseminate the extracted insights to
the people concerned so that they can choose the correct course of action to steer the organization on its
journey. In this white paper, we have explained how IBM SoftLayer can help you squeeze
actionable insights out of your big and real-time data.
The concept of cloud represents extremely optimized and organized IT that enables every kind of
IT capability and competency to be provided as a service via the open, public and cheap Internet
infrastructure to the increasingly connected world.
Authors
Pethuru Raj & Skylab Vanga
IBM Global CAMS Center of Excellence
IBM India, Manyata Tech Park, Bangalore
E-mails: pechelli@in.ibm.com,
skylab.vanga@in.ibm.com

SoftLayer Cloud not only provides potentially unlimited resources for your high-performance computing cluster, but also makes it easy to manage with Cloudera Managed Hadoop. Similarly, we have deployed the Hortonworks and MapR Hadoop platforms in SoftLayer Cloud. A typical cloud-based solution comprises storage, processing and management components deployed on SoftLayer Cloud, an extensible, elegant, efficient, and elastic environment for processing your data. The other benefits include extreme flexibility, high performance, agility, and pay-per-use pricing that eliminates upfront costs.
IBM InfoSphere BigInsights is also made available on SoftLayer cloud, and this move brings the following benefits to the table.
• Accelerates and simplifies cluster deployment - Take advantage of big data analytics without the need for an on-premise infrastructure.
• Scales as your business demands - Keep infrastructure costs in line with the changing needs of the business.
• Provides advanced tools to reduce time to value - Gain value from Big SQL, BigSheets, text analytics and more.
• Optimizes performance and enhances security - Experience speed and reliability with a dedicated bare-metal infrastructure.
• Offers expertise and best practices - Benefit from a dedicated cloud operations team that deploys clusters based on best practices.
Thus Hadoop-based platforms are steadily being taken to cloud environments in order to deliver big data analytics with nimbleness and suppleness.

Real-time Analytics Platforms in IBM SoftLayer Cloud
Not only big data analytics but also real-time analytics on fast and streaming data can be comfortably accomplished in cloud environments. In this section, we explain how a couple of platforms were methodically modernized and migrated to the IBM SoftLayer cloud center in order to understand the concerns, challenges and changes associated with cloud-based real-time analytics.
Delivering Real-time Applications via SoftLayer Cloud-based VoltDB
With the data being generated and captured growing to unprecedented volumes, the traditional data analytics platforms and infrastructures are bound to face a variety of constraints. That means we need robust and resilient algorithms and IT solutions for big and fast data. Several product vendors, having realized the brewing challenges, are proactively bringing forth a bevy of big data analytics systems that methodically facilitate the smooth transition of captured and consolidated data to information and then to knowledge. Data virtualization, databases, warehouses, data marts and cubes, business intelligence (BI) and visualization solutions are critical to knowledge extraction and engineering, and thereby to realizing a growing family of smarter systems and services that fulfil the ideas and ideals of the smarter planet vision.
VoltDB is a high-performance and scalable relational database management system (RDBMS) for big data, high-velocity OLTP and real-time analytics. VoltDB, classified as a NewSQL database, is a very fast database designed to run on modern scale-out computing infrastructures. Unlike legacy RDBMS products and NoSQL data stores, VoltDB enables high-velocity applications without requiring complex and costly sharding layers or compromising transactional data integrity (ACID) to gain performance and scale. VoltDB provides:
• Database throughput reaching millions of operations per second
• On-demand scaling
• High availability, fault tolerance and database durability
• Real-time data analytics
VoltDB is deployed in SoftLayer Cloud in order to showcase its real-time and real-world capabilities of producing actionable insights.

Apache Storm on IBM SoftLayer Cloud for Real-time Analytics
Not only the size and structure of data but also its speed matters a great deal these days. Specific use cases that demand fast data analytics are emerging across industry verticals. Data are being massaged, encapsulated and delivered as messages. Data and event messages are emerging as the formalized building blocks to be received, opened up, parsed, and used for a variety of deeper and decisive analyses. There are data streams (multimedia) and events from newer data sources such as sensors, machines, operational systems, platforms, etc., and they need to be systematically captured and analyzed immediately in order to extract both tactically and strategically sound insights that empower decision-makers, and even systems, to decide the next course of action with confidence and clarity. While clouds are being positioned as the core and optimized IT infrastructure, there are several open source as well as commercial-grade platforms for accomplishing and automating real-time and streaming analytics and its associated tasks.
Apache Storm is one such real-time analytics platform: a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
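The stream-processing model that Storm follows, in which tuples flow through a chain of processing stages and the stream is repartitioned between stages, can be illustrated with a minimal Python sketch. This is a plain-Python simulation, not the Storm API: the function names, the sample sentences, and the CRC32-based partitioning are all assumptions chosen for illustration. In Storm's own terminology the source stage would be a "spout" and the processing stages "bolts", and a real stream would be unbounded rather than a short list.

```python
import zlib
from collections import Counter


def sentence_source(sentences):
    """Stand-in for a stream source: emits one single-field tuple at a time."""
    for sentence in sentences:
        yield (sentence,)


def split_words(stream):
    """First processing stage: turns each sentence tuple into word tuples."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_by_worker(stream, worker_count):
    """Second stage: repartitions the stream by word so that every
    occurrence of a given word is counted by the same worker."""
    workers = [Counter() for _ in range(worker_count)]
    for (word,) in stream:
        # Deterministic partitioning on the word field (hypothetical policy).
        index = zlib.crc32(word.encode("utf-8")) % worker_count
        workers[index][word] += 1
    return workers


if __name__ == "__main__":
    sentences = ["the cloud", "the stream the cloud"]
    workers = count_by_worker(split_words(sentence_source(sentences)), 3)
    for number, counts in enumerate(workers):
        print(number, dict(counts))
```

Because the stages are generators, each tuple is pulled through the whole chain before the next one is read, which mirrors the tuple-at-a-time processing described above.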
Storm integrates with the queuing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. We have deployed an instance of Apache Storm in IBM SoftLayer cloud and chosen a small use case in order to understand and explain how cloud-based Storm functions and delivers on its originally envisaged goals.

High-Performance Big Data Analytics in SoftLayer Cloud
High performance is being demanded everywhere these days. There are valid concerns expressed in different quarters that cloud environments do not guarantee high performance. Therefore hosting high-performing platforms on clouds is being touted as one of the viable mechanisms for ensuring high performance of cloud-hosted services and workloads. Big data analytics (BDA) is emerging as a data-intensive activity that mandates high-end IT infrastructures and integrated platforms to simplify and streamline the tasks typically associated with any data analytics. There are several viable options these days, ranging from mainframes, clusters, grids, and appliances to supercomputers, to accelerate and accomplish data analytics efficiently. Hadoop platforms are the most sought-after for enabling cost-effective analysis of multi-structured data mountains. In short, high-performance computing (HPC) is the most appropriate computing model for approaching the infrastructural challenges posed by BDA.
In this paper, we have described how the Netezza software solution can be systematically moved to IBM SoftLayer Cloud, the leading public cloud offering, configured there, and used for accomplishing next-generation real-time analytics at a low total cost of ownership (TCO) and a high return on investment (RoI).
In the asset document produced from our PoC, we have given all the relevant details of a sample application in order to accentuate the power of cloud-based Netezza in fulfilling the various requirements of high-performance data analytics.

Streaming Analytics in IBM SoftLayer Cloud
Stream computing continuously integrates and analyzes data in motion to deliver real-time analytics. It further enables organizations to detect insights (risks and opportunities) in high-velocity data which can only be detected and acted on at a moment's notice. High-velocity flows of data from real-time sources such as market data, machines, smartphones, sensors and actuators, clickstreams, and even transactions remain largely unnavigated. IBM Cloud Analytics Application Services delivers high-performance clusters for running enterprise-grade big data and analytics workloads on a dedicated bare-metal infrastructure that is pre-installed with industry-leading big data software. Real-time analytic processing lets you store less, analyze more, and make better decisions faster.
IBM InfoSphere Streams is the supported software for this Cloud Analytics service. IBM InfoSphere Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. The solution can handle very high data throughput rates, up to millions of events or messages per second.
Many organizations need to process a large amount of data in real time for real-time analytics, real-time ETL, or to respond to events instantaneously. On-the-fly analysis of big data streams is emerging as a distinct need for many industry verticals these days. We have deployed DataTorrent in IBM SoftLayer Cloud and verified how it delivers on its promises for big data streaming analytics. DataTorrent is an enterprise-grade software platform that enables businesses to perform any sort of data processing or transformation on structured or unstructured data, all in real time as the data is streamed into a data center. Leveraging Hadoop 2.0, DataTorrent is a YARN-native application platform.
It can be installed directly onto an existing Hadoop cluster, connect directly to all incoming data sources live, and perform any type of processing or transformation of your data in memory as it comes streaming in. DataTorrent handles all of the scaling and fault tolerance of the system, leaving enterprises to focus on just their business logic. DataTorrent supports today's most demanding, mission-critical, big data streaming applications. It enables you to quickly develop applications that ingest massive amounts of data from various sources and perform highly scalable computations, all in real time. With DataTorrent, you can leverage your existing Hadoop environment for real-time stream processing. We employed a sample application in order to show readers how cloud-based real-time analytics applications can be implemented in a streamlined manner.

End-to-end Big Data Analytics Platform in IBM SoftLayer Cloud
In general, Hadoop platforms do pre-processing, processing and analytics for knowledge discovery. But an end-to-end big data analytics platform involves data collection, virtualization, ingestion, analytics and visualization modules. With just a single click, everything gets accomplished quickly and securely. Datameer is one such platform. Datameer is an end-to-end big data analytics platform, purpose-built for Hadoop, that enables the fastest time from raw data to new insights. Its mission is to eliminate the complexity of the tasks associated with big data analytics and empower everyone to make data-driven decisions in minutes, not months. There is no need for a data scientist or multiple technical tools to model, integrate, cleanse, prepare, analyze and visualize your data. Datameer is a one-stop shop for getting all your data into Hadoop, analyzing that data, discovering the knowledge and visualizing the resulting insights in a preferred form and format.
Datameer can handle all kinds of data from multiple sources, as illustrated in the picture below. Datameer has been successfully installed in the IBM SoftLayer cloud environment and tested with a sample application in order to demonstrate its unique capability.

HBase, a NoSQL Database in IBM SoftLayer Cloud
HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). HBase is a NoSQL database; it is well suited for sparse data sets and does not support a structured query language like SQL. An HBase system comprises a set of tables. Each table must have an element defined as a primary key (the row key), and all access attempts to HBase tables must use this primary key. An HBase column represents an attribute of an object, and many attributes can be grouped together into what are known as column families. With HBase, you must predefine the table schema and specify the column families. However, new columns can be added to families at any time, which makes the schema flexible and able to adapt to changing application requirements. HBase is part and parcel of every standard Hadoop distribution and was installed in IBM SoftLayer Cloud.
There are certain usage scenarios wherein big data analytics (BDA) is well accomplished with the help of a cloud-based HBase database. We developed a small application to test how productive HBase is in remote clouds.
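The data model described above (access by row key, column families fixed up front, columns added freely within a family, and sparse rows) can be illustrated with a toy in-memory model in Python. This is only a sketch of the idea and not the HBase client API: the class `SketchTable`, its methods, and the sample table and values are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class SketchTable:
    """Toy in-memory model of an HBase-style table (not the HBase API)."""

    def __init__(self, name, column_families):
        self.name = name
        # Column families are part of the predefined schema.
        self.column_families = set(column_families)
        # row key -> column family -> column qualifier -> value
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        """Write one cell; the family must exist, the column need not."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][qualifier] = value

    def get(self, row_key):
        """Read a row by its key; only cells that were written are returned."""
        families = self._rows.get(row_key, {})
        return {family: dict(columns) for family, columns in families.items()}


if __name__ == "__main__":
    users = SketchTable("users", ["profile", "activity"])
    users.put("user#001", "profile", "name", "Asha")
    users.put("user#001", "activity", "last_login", "2015-03-01")
    # A new column in an existing family needs no schema change.
    users.put("user#002", "profile", "city", "Bangalore")
    print(users.get("user#001"))
    print(users.get("user#002"))
```

Each row stores only the cells that were actually written, which is why this kind of layout suits sparse data sets, while a write to an undeclared column family is rejected because families belong to the schema.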
There are several other competent and high-end NoSQL databases in the marketplace. Cassandra (originally developed at Facebook), Google BigTable, etc. are some of the highly popular database management systems moving into cloud environments in order to tackle the data explosion as well as data variety, viscosity, and variability. The Apache Cassandra database is a good choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it a strong platform for mission-critical data. Cassandra's support for replicating across multiple datacenters provides lower latency for your users and the peace of mind of knowing that you can survive regional outages. Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching. Cassandra is also deployed in IBM SoftLayer Cloud. Basho Riak is another NoSQL database made available in SoftLayer cloud. Similarly, other renowned databases such as MongoDB are also being taken to the cloud to reap its infrastructural innovations.

ScaleBase Distributed Database Management System
ScaleBase brings elasticity, scalability and continuous high availability to MySQL databases and applications in public, private and hybrid cloud environments. ScaleBase enables instant and transparent MySQL scale-out, leveraging the power of smaller, less expensive servers working together. Policy-based data distribution (automated sharding), powered by the ScaleBase Analysis Genie, together with intelligent load balancing and replication-aware read/write splitting, enables growth of the operational load and throughput, increases application performance, and protects against varying usage peaks and load spikes.
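The two mechanisms named above, automated sharding and replication-aware read/write splitting, can be illustrated with a minimal Python sketch. It is not ScaleBase's implementation or API: the class names, the hash-based placement policy, the node names, and the round-robin replica selection are all assumptions made purely for illustration.

```python
import hashlib
import itertools


class Shard:
    """One shard: a primary that takes writes plus read replicas."""

    def __init__(self, name, replica_count=2):
        self.primary = f"{name}-primary"
        self.replicas = [f"{name}-replica{i}" for i in range(1, replica_count + 1)]
        self._next_replica = itertools.cycle(self.replicas)

    def node_for(self, is_write):
        # Writes go to the primary; reads rotate across the replicas.
        return self.primary if is_write else next(self._next_replica)


class ShardRouter:
    """Routes a statement to a node using the shard key and statement type."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, shard_names):
        self.shards = [Shard(name) for name in shard_names]

    def shard_for(self, key):
        # Hypothetical distribution policy: hash of the shard key.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def route(self, key, sql):
        words = sql.split()
        is_write = bool(words) and words[0].upper() in self.WRITE_VERBS
        return self.shard_for(key).node_for(is_write)


if __name__ == "__main__":
    router = ShardRouter(["shard0", "shard1", "shard2"])
    print(router.route(42, "UPDATE accounts SET balance = 10 WHERE id = 42"))
    print(router.route(42, "SELECT balance FROM accounts WHERE id = 42"))
    print(router.route(42, "SELECT balance FROM accounts WHERE id = 42"))
```

All statements for the same key land on the same shard, so the application sees a single logical database while reads are spread over replicas and writes stay on the primary.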
ScaleBase automated failover and failback ensure business continuity and protection from both unexpected and expected outages, and simplify ongoing maintenance tasks, such as software and hardware upgrades, without impacting application or database availability. The ability to migrate an application from a hosted environment with a single growing database to a virtualized environment with smaller, more manageable data nodes gives companies agility, flexibility and competitiveness. ScaleBase was purpose-built for cloud deployment. ScaleBase can be run on private clouds and is available on public clouds. We completed the initial steps to prepare and migrate the ScaleBase solution to the IBM SoftLayer public cloud, made the necessary configuration changes, and ran a small sample application in order to check how ScaleBase functions in an online, off-premise and on-demand cloud environment. This forms a major part of our strategy of empowering public cloud offerings to be high-performing, elastic, and attractive for data- and process-intensive applications.

Aerospike In-Memory NoSQL Database in IBM SoftLayer Cloud
Versatile in-memory computing, NoSQL and NewSQL databases, parallel file systems, etc. are the prominent IT solutions being enabled to run in elastic clouds to fulfil the varying needs of the big data world. Aerospike is an open-source distributed NoSQL database optimized for in-memory and SSD-based indexing and data storage. Aerospike is a modern database built from the ground up to push the limits of flash storage, processors and networks. It was designed to operate with predictable low latency at high throughput with uncompromising reliability, offering both high availability and ACID guarantees. It greatly simplifies developers' workloads, as there is no need to incorporate logic for sharding and for cluster changes.
The perennial goal of not having to worry about data loss or downtime is realized with this game-changing database solution. Aerospike is ideal for real-time big data or context-driven applications that must sense and respond right now. Aerospike operates at in-memory speed and global scale with enterprise-grade reliability. Identical Aerospike servers scale out to form a shared-nothing cluster which transparently partitions data and
parallelizes processing across nodes. Nodes in the cluster are identical; you can start with two and just add more hardware. The cluster scales linearly. We have migrated an instance of the Aerospike database to the IBM SoftLayer cloud environment and configured it to deliver on its promises. We have worked on a sample application in order to gain a deeper understanding of the distinct capabilities of Aerospike in meeting the goals of new-generation data-intensive workloads.

NewSQL Databases in IBM SoftLayer Cloud
Essentially, NewSQL combines the best features of both worlds: it maintains the transactional integrity of traditional database systems while providing the high-end scalable performance of NoSQL systems. This combination of performance and scale is crucial in transaction-intensive environments. NoSQL-based data systems are riding a wave of success on the promise of scalability. NewSQL databases seek to overtake NoSQL with the added bonus of high-speed transactional integrity. VoltDB is a NewSQL database; it has been successfully deployed in IBM SoftLayer Cloud and subjected to a variety of small-scale tests in order to verify whether it delivers its promised capabilities. Other popular NewSQL databases such as Clustrix, NuoDB, etc. are fast gaining market share and mind share. These are conveniently hosted and delivered as a service via cloud environments.

Database as a Service (DBaaS)
Today's applications are expected to manage a variety of structured and unstructured data, accessed by massive networks of users, devices, and business locations, or even sensors, vehicles and Internet-enabled goods. Companies of all sizes, from startups to mega-users like Samsung, Hothead Games, and Fidelity Investments, use Cloudant to manage data for large or fast-growing web and mobile applications in ecommerce, online education, gaming, financial services, and other industries.
Cloudant is best suited for applications that need a database to handle a massively concurrent mix of low-latency reads and writes. Its data replication and synchronization technology also enables continuous data availability, as well as offline application usage for mobile or remote users. In a large organization, it can take several weeks for a DBMS instance to be provisioned for a new development project, which limits innovation and agility. DBaaS enables instant provisioning of your data layer, so that you can begin new development whenever you need. Unlike Do-It-Yourself (DIY) databases, DBaaS solutions like Cloudant provide, and guarantee, a specific level of data layer performance and uptime. This eliminates the risk of service delivery failure for you and your project. The Cloudant database as a service (DBaaS) is the first data management platform to leverage the availability, elasticity, and reach of the cloud to create a global data delivery network (DDN) that enables applications to scale larger and remain available to users wherever they are.

Data Warehouse as a Service (DWaaS)
IBM dashDB is a fully managed data warehousing service in the cloud. IBM dashDB is a powerful, agile data warehousing solution on the cloud that puts an analytics powerhouse at your fingertips. IBM dashDB allows you to break free from the bonds of infrastructure when your business demands it. IBM dashDB can help extend your existing infrastructure into the cloud, or help you start new data warehousing self-service capabilities. It is powered by high-performance in-memory and in-database technology that delivers answers as fast as you can think. IBM dashDB provides the simplicity of an appliance with the elasticity
and agility of the cloud for organizations of any size. IBM dashDB is designed to meet your expectations of enterprise security. You can gain instant access to critical business insights without a hefty upfront infrastructure investment. You can simply load, analyze, and visualize your data in minutes. Thus the prospects for data warehouse as a service are brightening.

IBM Watson Analytics in SoftLayer Cloud
As most of us know, Watson Analytics is a natural language-based cognitive service that can provide instant access to predictive and visual analytic tools for businesses. It is designed to make advanced and predictive analytics easy for anyone to acquire and use. Watson Analytics offers self-service analytics, including access to easy-to-use data refinement and data warehousing services that make it easier for business users to acquire and prepare data, beyond simple spreadsheets, for analysis and visualization. IBM Watson Analytics automates steps like data preparation, predictive analysis, and visual storytelling for business professionals across data-intensive disciplines like marketing, sales, operations, finance and human resources. SoftLayer is integrating the latest IBM Power Systems into its cloud infrastructure in order to fulfil the infrastructural needs of cost-effective high-performance computing. The IBM Watson system is designed to run efficiently on IBM Power Systems, and hence Watson Analytics as a service via the SoftLayer cloud is expected to become available to worldwide users soon.

Containerized Analytics as a Service in IBM SoftLayer Cloud
The concept of containerization for packaging and sandboxing mission-critical applications is catching the attention of developers as well as system administrators.
Bundling a software module along with its binaries, libraries, configuration details and other dependencies into a single package is one effective way to achieve faster and error-free deployment and delivery of software workloads. This pragmatic idea has spread further, and these days all kinds of mobile, cloud, social, embedded, middleware, database, enterprise and IoT applications are methodically being containerized using sandboxing (a subtle and smart isolation technique) to eliminate restrictive dependencies on the underlying operating systems. Such compact, sandboxed and contained applications are being prescribed as a sought-after and appropriate solution for meeting portability, extensibility, manoeuvrability, sustainability and security needs.
With the fast-growing maturity of the Docker technology, a new paradigm of "containers as a service (CaaS)" is emerging and evolving. That is, containers are being readied, hosted and delivered as a service over the public Web. All the procedures needed to deliver application-aware containers as a service are being meticulously put in place to ready them for the forthcoming service era. That is, knowledge-filled, service-oriented, cloud-based, composable, and cognitive containers are being proclaimed as one of the principal ingredients for the establishment and sustenance of the smarter planet vision. Precisely speaking, applications are containerized and exposed as services to be discovered and used by a variety of consumers for a growing set of use cases.
Big and fast data analytics via Hadoop, Apache Storm, Spark, etc. are fast maturing and stabilizing. VMs are widely used for enabling Hadoop as a service. Now, with the faster adoption of containerization, the prospects are brightening for data analytics via portable, substitutable, composable, and replaceable containers, which are well known for faster provisioning, live migration, etc.
In short, containers are destined for cloud environments. The integration of Hadoop YARN with Docker will allow multiple clusters to utilize the same hardware resources. We have built YARN containers through the Dockerization steps and hosted them in IBM SoftLayer Cloud. We have done sample work in order to understand how containerized big data workloads and analytical platforms ensure higher efficiency; the new offering of containerized analytics as a service via the SoftLayer cloud therefore seems imminent.

Conclusion
Data has become a strategic asset that allows any organization to plan ahead precisely and proceed with confidence and clarity. Data-driven enterprises are seen as the ones destined for continued success, wisely overcoming all kinds of unexpected business challenges and changes. That is, any enterprise that systematically subjects all of the data gleaned from its different and distributed sources to a series of IT-enabled deeper analytics processes, with the help of end-to-end platforms for extracting actionable insights, is bound to attain and retain greater success in its long and arduous journey. With the steady increase in data sources, it is clear that organizations must strengthen their capabilities in order to capture all the data emanating from different and distributed systems, subject it to a series of deeper and decisive investigations to extract actionable insights in time, and disseminate the extracted insights to the people concerned so that they can choose the correct course of action to steer the organization on its journey. In this white paper, we have explained how IBM SoftLayer can take care of everything needed to squeeze actionable insights out of your big and real-time data. The concept of cloud represents extremely optimized and organized IT that enables every kind of IT capability and competency to be provided as a service to the increasingly connected world via the open, public and cheap Internet infrastructure.

Authors
Pethuru Raj & Skylab Vanga
IBM Global CAMS Center of Excellence
IBM India, Manyata Tech Park, Bangalore
E-mails: pechelli@in.ibm.com, skylab.vanga@in.ibm.com