Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, and social media monitoring.
I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop.
What we now know as Hadoop really started back in 2005, when Eric Baldeschwieler (known as “E14”) started work on a project to build a large-scale data storage and processing technology that would allow Yahoo to store and process massive amounts of data to underpin its most critical application, Search. The initial focus was on building out the technology (the key components being HDFS and MapReduce) that would become the core of what we think of as Hadoop today, and on continuing to innovate it to meet the needs of this specific application.
By 2008, Hadoop usage had greatly expanded inside Yahoo, to the point that many applications were using this data management platform. As a result, the team’s focus extended to include operations: now that applications were beginning to propagate around the organization, sophisticated capabilities for operating the platform at scale were necessary. It was also at this time that usage began to expand well beyond Yahoo, with many notable organizations (including Facebook and others) adopting Hadoop as the basis of their large-scale data processing and storage applications, which necessitated a focus on operations to support what was by now a large variety of critical business applications.
In 2011, recognizing that more mainstream adoption of Hadoop was beginning to take off, and with the objective of facilitating it, the core team left, with the blessing of Yahoo, to form Hortonworks. The goal of the group was to facilitate broader adoption by addressing the Enterprise capabilities that would enable a larger number of organizations to adopt and expand their usage of Hadoop.
[note: if useful as a talk track, Cloudera was formed in 2008, well BEFORE the operational expertise of running Hadoop at scale was established inside of Yahoo]
While overly simplistic, this graphic represents what we commonly see as a general data architecture:
- A set of data sources producing data
- A set of data systems to capture and store that data: most typically a mix of RDBMS and data warehouses
- A set of applications that leverage the data stored in those data systems. These could be packaged BI applications (Business Objects, Tableau, etc.), Enterprise Applications (e.g. SAP) or Custom Applications (e.g. custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications.
Your environment is undoubtedly more complicated, but conceptually it is likely similar.
As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets).
Instead, we increasingly see Hadoop, and HDP in particular, being introduced as a complement to the traditional approaches. It does not replace the database; it complements it, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with:
- Existing applications, such as Tableau, SAS, Business Objects, etc.
- Existing databases and data warehouses, for loading data to and from the data warehouse
- Development tools used for building custom applications
- Operational tools for managing and monitoring
It is for that reason that we focus on HDP interoperability across all of these categories:
- Data systems: HDP is endorsed and embedded with SQL Server, Teradata and more
- BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects and more
- Development tools:
  - For .NET developers: Visual Studio, used to build more than half the custom applications in the world, is certified with HDP to enable Microsoft app developers to build custom apps with Hadoop
  - For Java developers: Spring for Apache Hadoop enables Java developers to quickly and easily build Hadoop-based applications with HDP
- Operational tools: integration with System Center and with Teradata Viewpoint
In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the Core Services, Platform Services, Data Services and Operational Services required by the Enterprise user.
And all of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. And finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
Across all of our user base, we have identified just 3 separate usage patterns. Sometimes more than one is used in concert during a complex project, but the patterns are distinct nonetheless. These are Refine, Explore and Enrich.
The first of these, the Refine case, is probably the most common today. It is about taking very large quantities of data and using Hadoop to distill the information down into a more manageable data set that can then be loaded into a traditional data warehouse for usage with existing tools. This is relatively straightforward and allows an organization to harness a much larger data set for their analytics applications while leveraging their existing data warehousing and analytics tools.
Using the graphic here, in step 1 data is pulled from a variety of sources, in step 2 it lands in the Hadoop platform, and in step 3 it is loaded into a data warehouse for analysis by existing BI tools.
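If it helps to make the Refine pattern concrete for a technical audience, the distill step can be sketched in a few lines of plain Python. This is only an illustration of the map-and-reduce idea, not Hadoop or HDP code: the clickstream records, the field names and the `refine` function are all hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical raw clickstream records, standing in for data landed in Hadoop (step 2).
RAW_EVENTS = [
    {"user": "u1", "page": "/home", "ms": 120},
    {"user": "u2", "page": "/home", "ms": 80},
    {"user": "u1", "page": "/offers", "ms": 200},
    {"user": "u3", "page": "/offers", "ms": 100},
]


def map_event(event):
    """Map phase: emit (page, (view_count, load_time_ms)) for one raw event."""
    return event["page"], (1, event["ms"])


def refine(events):
    """Reduce phase: collapse raw events into one summary row per page."""
    totals = defaultdict(lambda: [0, 0])  # page -> [views, total_ms]
    for event in events:
        page, (views, ms) = map_event(event)
        totals[page][0] += views
        totals[page][1] += ms
    # The summary rows are the small, manageable data set that would be
    # loaded into the data warehouse in step 3.
    return [
        {"page": page, "views": views, "avg_ms": ms / views}
        for page, (views, ms) in sorted(totals.items())
    ]


summary_rows = refine(RAW_EVENTS)
for row in summary_rows:
    print(row)
```

The point to draw out is the size difference: the warehouse only ever sees one row per page, however many raw events were collected.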
The third usage pattern is Application Enrichment.
This is about incorporating data stored in HDP to enrich an existing application. This could be an online application in which we want to surface custom information to a user based on their particular profile. For example, if a user has been searching the web for information on home renovations, you may want to use that knowledge in the context of your application to surface a custom offer for a related product that you sell. Large web companies such as Facebook and others are very sophisticated in the use of this approach.
In the diagram, this is about pulling data from disparate sources into HDP in Step 1, storing and processing it in Step 2, and then interacting with it directly from your applications in Step 3, typically in a bi-directional manner (e.g. request data, return data, store response).
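For a developer audience, the bi-directional interaction in Step 3 can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the profile store, the offer catalog and the function names are hypothetical, and in-memory dictionaries stand in for data that would actually live in HDP.

```python
# Hypothetical profile data, standing in for interest data stored and processed in HDP (Step 2).
PROFILE_STORE = {
    "u1": {"interests": ["home renovation"]},
    "u2": {"interests": []},
}

# Hypothetical catalog mapping an interest category to a custom offer.
OFFERS = {"home renovation": "10% off power tools"}
DEFAULT_OFFER = "Free shipping on your next order"


def pick_offer(user_id):
    """Request data and return data: look up the profile and choose an offer."""
    profile = PROFILE_STORE.get(user_id, {"interests": []})
    for interest in profile["interests"]:
        if interest in OFFERS:
            return OFFERS[interest]
    return DEFAULT_OFFER


def record_response(user_id, offer, accepted, response_log):
    """Store response: write the outcome back so later processing can use it."""
    response_log.append({"user": user_id, "offer": offer, "accepted": accepted})


response_log = []
offer = pick_offer("u1")
record_response("u1", offer, True, response_log)
print(offer, response_log)
```

The write-back in `record_response` is what makes the pattern bi-directional: the application both reads from and contributes to the data set.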
The second usage pattern is what we would refer to as Data Exploration. This is the use case most commonly in question when people talk about “Data Science”.
In simplest terms, it is about using Hadoop as the primary data store rather than performing the secondary step of moving data into a data warehouse. To support this use case, you’ve seen all the BI tool vendors rally to add support for Hadoop, and most commonly HDP, as a peer to the database, and in so doing allow for rich analytics on extremely large datasets that would be both unwieldy and costly in a traditional data warehouse. Hadoop allows for interaction with a much richer dataset and has spawned a whole new generation of analytics tools that rely on Hadoop (HDP) as the data store.
To use the graphic, in Step 1 data is pulled into HDP, in Step 2 it is stored and processed, and in Step 3 it is surfaced directly into the analytics tools for the end user.
We live in a world where organizations must now compete on their differential use of time and information. Because we can have no more of the former, and because we have an unlimited amount of the latter, there is a new responsibility to harness information more effectively in order to compete with speed and agility. Is traditional Business Intelligence able to address this newfound opportunity to put much more information to work? We know that, today, only a small fraction of information workers actually use a traditional BI tool during the course of a day. In fact, according to most industry analysts, only about 25% of information workers actually use BI today. Why? Because those tools are too complex and too costly, which prevents the widespread use of timely, actionable data. The bigger matter, though, is that most information workers do NOT spend their day inside of a BI tool, nor do they want to! We simply can’t expect even the best workers to go and find the right report or data that is relevant to their question or issue at hand.
So, what’s the solution? Bring timely, actionable data TO the users. Information workers today truly need information that finds them, not the other way around. This information should be delivered within the software applications and business processes that are used every day by information workers. From pipeline dashboards within the CRM system to visualized compensation data within the HR system and on to interactive charts inside the native, mobile customer service application – the information generated by these business processes and (transactional) software applications should be put to greater use. At Jaspersoft, we call this a “data-driven” application and our mission is to be the Intelligence Inside.
To truly deliver integrated intelligence within a software application or business process, there are 3 primary requirements: 1. must be a simple self-service reporting and analysis environment that allows any user profile, from an executive to data analyst, to get the information they need; 2. must be easy to embed and integrate within the application or process, enabling different techniques to liberate the data generated by the application and encouraging widespread use of it as information; and 3. must be affordable even on a large scale, so there is no question about the value of delivering more information to any user who could benefit.
Jaspersoft has become the intelligence inside tens of thousands of software applications and business processes globally, because we’ve set the standard for highly-embeddable and affordable self-service BI. Each day, our software touches millions of people and enables them to make decisions faster using timely, actionable data. Our customers have made Jaspersoft the Intelligence Inside.
Today Jaspersoft is the Intelligence Inside over 130,000 applications of every type in every industry. For example, Red Hat integrates Jaspersoft within its Enterprise Virtualization software (RHEV) and exposes system health and monitoring information to allow its customers to better manage their virtualization environment. Verizon embeds us inside their customer portal to share billing information with their customers. Virgin Money embeds Jaspersoft within its charitable “Giving” multi-tenant SaaS application, providing reports and analysis to describe sources and uses of funds. The Naval Safety Center embeds us inside their internally built application to report on Naval incidents. British Telecom has built a comprehensive statistical data warehouse of customer information, using Jaspersoft for customer service reports and analysis that enable reduced call times and improved service levels. Groupon uses Jaspersoft with Hadoop to drive optimized campaigns to better target users with discount offers. FICO’s Entiera division uses us with Vertica to do large scale marketing analytics. With each of these customer examples, Jaspersoft was chosen because of its modern, embeddable architecture that delivers a rich self-service experience at a fraction of the cost of the alternatives.
But we’re not just focused on delivering the Intelligence Inside of applications and business processes today. Our mission is to become the de facto standard for reporting and analysis in the New IT Stack. Specifically, we want to provide BI Builders with a reporting and analytic service inside their preferred Cloud platform, running on any Big Data store, so they can build Intelligence Inside their internal or commercial applications. As part of that mission, we have delivered a number of business intelligence industry firsts, including being:
- The first and, to date, only BI service on VMware’s PaaS, Cloud Foundry
- The first and, to date, only BI service on Red Hat’s PaaS, OpenShift
- The first and, to date, only BI service on GoGrid’s IaaS marketplace
- The first BI vendor to be certified on Amazon’s new data warehouse service, Redshift. In fact, we were the first BI vendor Amazon approached to support their new service, because of our open source model and community
- The first and only BI provider to connect directly (no ETL) to non-SQL Big Data stores like MongoDB and Hadoop HBase
What this means is that BI Builders who are looking to build applications on these new stacks can build in intelligence using Jaspersoft today.
Users can also create and interact with beautiful dashboards that include charts, widgets, maps, etc. These are perfect for executives or managers to get a quick understanding of the business and its KPIs.
And data analysts or power users can do analysis using traditional OLAP or using interactive visualizations to slice and dice their data to get insights into their business.
What is unique about Jaspersoft is that all the functionality we provide can be used to power the Intelligence Inside any application, portal or website. As an example, Tata, the Indian technology giant, uses Jaspersoft to power their Mosaic product, which allows media executives to track their media assets from creation all the way through to distribution. As you can see from the screen shot, this product looks nothing like the Jaspersoft product or what you’d think of as a typical business intelligence product. And that’s the point of the Intelligence Inside. You want business users to use reporting, dashboards and analytics in the context of their preferred application, without having to go to a separate BI system. Another example here is from eBuilder, a cloud-based application which helps companies to automate business processes like travel and expenses, procurement, order fulfillment and After Sales management. All of the reporting and analysis is powered by Jaspersoft. The screen shot here shows the After Sales dashboard, where managers can track the performance of different fulfillment centers geographically in delivering product to customers. Finally, the example here is Virgin Money’s non-profit Giving site, a multi-tenant SaaS application powered by Jaspersoft that allows charities to track and optimize their fundraising activities. All of the JasperReports Server capabilities are available here, from interactive reporting through to ad hoc query, report and analysis, and all of it is branded to look like the Virgin Money website.
We are able to do all of this because of our world-class BI platform. The platform is 100% open web standards based, from the backend Java server to the CSS (Cascading Style Sheet)-driven, HTML5 user interface. The product has a full suite of capabilities, from reporting to dashboards, analysis and visualization, that can be viewed and interacted with in a browser or on a tablet or mobile phone. Underlying these capabilities is a columnar-based in-memory engine that allows the user to work with an in-memory data set for faster performance. The engine is intelligent enough to push expensive aggregations down to the underlying database when that makes sense, for example if you have a high-performance analytic database like Amazon’s Redshift. We have a business metadata layer that allows BI Builders to define more business-user-friendly data objects that abstract from the underlying data complexity. This layer can connect directly to our data connectors. Alternatively, customers can leverage our powerful data integration layer, which allows them to extract, load and transform data to create a data mart or a data warehouse. If they don’t want to move the data but still need to merge multiple data sources, they can use our data virtualization layer, which allows them to federate queries across multiple data sources so that they look like a single source to the business user. The data connectivity layer provides access to any data source, from relational databases to Big Data stores like Hadoop and NoSQL stores like MongoDB and Cassandra, as well as other data stores like files. All of this power is exposed through an extensive set of APIs, from HTTP to SOAP and REST-based web services, that allow BI Builders to integrate the server capabilities into their applications.
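If the audience asks what “pushing aggregations down” means in practice, the general idea can be sketched like this. This is not Jaspersoft’s engine or API: it is a minimal illustration in which Python’s built-in sqlite3 module stands in for the underlying database, and the `sales` table, the function names and the row-count threshold are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the underlying data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5)],
)


def totals_in_memory(conn):
    """Fetch every row and aggregate on the application side."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Let the database aggregate, so only one row per region is returned."""
    return dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))


def totals(conn, pushdown_threshold=2):
    """Hypothetical policy: push the aggregation down once the table is large."""
    (row_count,) = conn.execute("SELECT COUNT(*) FROM sales").fetchone()
    if row_count > pushdown_threshold:
        return totals_pushed_down(conn)
    return totals_in_memory(conn)


print(totals(conn))
```

Both paths produce the same totals; the difference is how much data has to leave the database to get there, which is why push-down pays off against a fast analytic store.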
Only Jaspersoft offers all 3 approaches, giving users the ability to meet any use case requirements.
Jaspersoft is an active sponsor of BigDataUniversity.com. This is a FREE online learning portal to develop Big Data expertise and practical skills. We encourage everyone to register today and learn more about Big Data there.
So, now we’re going to see a demo of Jaspersoft 5, our most recent product release. You’ll see many of the capabilities we’ve discussed already. The thing to note about version 5 is that it highlights the strength of our architecture and vision. When creating this product we had the vision to deliver the power of what today you can only get in a desktop visualization tool like Tableau or QlikTech but to do so completely within a browser. Whereas Tableau and Qlik require the user to create the visualizations using a desktop tool, Jaspersoft allows the user to do that from a browser. There are many benefits to this approach. Apart from avoiding the obvious issue of having to manage desktop software, this approach allows BI Builders to embed this functionality inside their internal and commercial applications, portals or websites. This is not possible with the desktop tools. Now, let’s see the product in action.
At Hortonworks today, our focus is very clear: we Develop, Distribute and Support a 100% open source distribution of Enterprise Apache Hadoop.
- We employ the core architects, builders and operators of Apache Hadoop and drive the innovation in the open source community.
- We distribute the only 100% open source Enterprise Hadoop distribution: the Hortonworks Data Platform.
- Given our operational expertise of running some of the largest Hadoop infrastructure in the world at Yahoo, our team is uniquely positioned to support you.
Our approach is also uniquely endorsed by some of the biggest vendors in the IT market:
- Yahoo is both an investor and a customer and, most importantly, a development partner. We partner to develop Hadoop, and no distribution of HDP is released without first being tested on Yahoo’s infrastructure, using the same regression suite that they have used for years as they grew to have the largest production cluster in the world.
- Microsoft has partnered with Hortonworks to include HDP in both their off-premise offering on Azure and their on-premise offering under the product name HDInsight. This also includes integration with both Visual Studio for application development and System Center for operational management of the infrastructure.
- Teradata includes HDP in their products in order to provide the broadest possible range of options for their customers.