www.stratebi.com
Emilio Arias
Co-Founder at StrateBI.
Follow us on Twitter
@Stratebi
@TodoBI_OS
Roberto Tardío
Head of Big Data at StrateBI.
Follow me on Twitter
@RoberTardio
• OLAP (On-Line Analytical Processing)
• Analytical systems that enable interactive queries.
• Requires very low query latency: milliseconds to seconds.
• Usually supports SQL and, sometimes, the MDX query language.
• Enables aggregation and filtering of KPI data across hierarchical multidimensional
structures (OLAP cubes).
• Used as a data source for different purposes:
• Detailed data analysis (OLAP views).
• Dashboarding.
• Reporting.
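As a minimal illustration of the aggregation and filtering described above, the sketch below rolls up a toy fact table along a date hierarchy (day, month, year) and slices it by one region. The data, the `roll_up` function and the level names are hypothetical; real OLAP engines express the same operations in SQL or MDX over far larger tables.

```python
from collections import defaultdict

# Hypothetical toy fact rows: (ISO date "YYYY-MM-DD", region, sales KPI).
FACTS = [
    ("2018-01-15", "North", 120.0),
    ("2018-01-20", "South", 80.0),
    ("2018-02-03", "North", 50.0),
    ("2019-01-07", "South", 200.0),
]

# Date hierarchy: each level is a prefix of the ISO date string.
LEVELS = {"year": 4, "month": 7, "day": 10}


def roll_up(facts, level, region=None):
    """Aggregate sales at the given date level, optionally sliced by region."""
    totals = defaultdict(float)
    for date, reg, sales in facts:
        if region is not None and reg != region:
            continue  # slice: keep a single member of the region dimension
        totals[date[:LEVELS[level]]] += sales  # roll up to the chosen level
    return dict(totals)
```

For example, `roll_up(FACTS, "year")` returns `{"2018": 250.0, "2019": 200.0}`, and drilling down with `roll_up(FACTS, "month", region="North")` returns `{"2018-01": 120.0, "2018-02": 50.0}`.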
• Big Data OLAP
• Big Data: Volume, Variety and Velocity.
• OLAP applications over Big Data sets.
• Main challenges:
• Very low query latency over fact and dimension tables of billions to trillions of rows.
• Support for ANSI SQL and BI tool integration.
• Real-time data ingestion and processing.
• Current Approaches (some of them).
• Why Apache Kylin?
• Sub-second queries over fact tables with more than 12 billion rows.
• Best query latency results (in our deployments and benchmarks).
• ANSI SQL and BI tool integration.
• Integration with Pentaho is possible through JDBC, Mondrian and PME (Pentaho Metadata Editor).
• Also Superset, Tableau, Power BI, Zeppelin, Microstrategy…
• Full support for star and snowflake schemas.
• Not all tools support them (e.g. Druid).
• Near-real-time data ingestion (Kafka) and processing.
• It is an Apache open-source project.
• Currently in version 2.5
• Apache Kylin Architecture
• M-OLAP approach:
• Data pre-aggregation.
• Enables only analytical queries.
• Hadoop-based tool
• Full scalability
• Hadoop nodes
• HBase and Kylin in separate clusters (if needed)
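Kylin's pre-aggregation can be pictured as precomputing one aggregate table (a cuboid) for every combination of dimensions, so that a query becomes a lookup instead of a scan over the fact table. The Python sketch below is a simplified in-memory illustration of that idea; `build_cube`, `query` and the sample data are hypothetical, and Kylin itself builds cuboids with distributed jobs and stores them in HBase rather than in a dictionary.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("year", "region", "product")

# Hypothetical fact rows with one measure ("sales").
FACTS = [
    {"year": 2018, "region": "North", "product": "A", "sales": 10},
    {"year": 2018, "region": "South", "product": "B", "sales": 5},
    {"year": 2019, "region": "North", "product": "A", "sales": 7},
]


def build_cube(facts, dims):
    """Precompute every cuboid: one aggregate per subset of dims (2**n cuboids)."""
    cube = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(dims, k):
            agg = defaultdict(int)
            for row in facts:
                agg[tuple(row[d] for d in cuboid)] += row["sales"]
            cube[cuboid] = dict(agg)
    return cube


def query(cube, dims, **filters):
    """Answer an aggregate query by lookup in the matching cuboid (no fact scan)."""
    unknown = set(filters) - set(dims)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    cuboid = tuple(d for d in dims if d in filters)  # same order as build_cube
    key = tuple(filters[d] for d in cuboid)
    return cube[cuboid].get(key, 0)
```

With three dimensions the cube holds 8 cuboids; `query(cube, DIMENSIONS, year=2018)` returns 15 without touching `FACTS`. The 2**n growth is why the cube design (which dimensions and combinations to precompute) matters so much in practice.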
• A real love story
• Why Apache Kylin and Pentaho BA Server?
• It is becoming increasingly necessary to provide dashboarding, reporting and OLAP viewing
in Big Data scenarios.
• Using our STTools Pentaho plugins: STPivot, STReport, STDashboard,…
• Also Pentaho Reporting, Community Dashboard Editor, Saiku (plugin),…
• Both Kylin and Pentaho are leading open-source BI & Big Data tools.
• Pentaho enables integration with the best-known Big Data tools: Hive, Impala, Spark SQL,…
• Integration with Pentaho is possible through JDBC, Mondrian and PME:
• Mondrian 4.X, using the existing Mondrian 4.4 (Lagunitas).
• Mondrian 3.X, which required a great effort from our team.
• Using Pentaho BA Server 7.1
• Identified issues and solutions: Kylin and Mondrian 3.X (3.14)
• Issue 1: Kylin requires ANSI-92 inner joins (JOIN … ON), but Mondrian 3.X generated old-style
joins (tables separated by commas, with the join conditions in the WHERE clause).
• Solution: We defined a Mondrian dialect for Kylin and used this patch to implement the
allowsJoinOn() method.
• Issue 2: Mondrian's native cross join and non-empty properties generated invalid SQL
for Kylin.
• Solution: We disabled these properties for the Kylin dialect.
• Issue 3: Kylin requires the fact table to be the first table in the SQL FROM clause.
• Workaround: We modified the Mondrian code to identify fact tables by a name prefix (F or
FT) and place them first in the FROM clause.
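Issues 1 and 3 both concern the shape of the FROM clause. The sketch below produces the target shape, explicit ANSI-92 `INNER JOIN … ON` syntax with the fact table first, picked out by its F/FT name prefix. It is a standalone Python illustration, not the actual Mondrian patch or dialect code; `build_from_clause`, the table names and the assumption that the prefix is followed by an underscore (`F_SALES`, `FT_ORDERS`) are hypothetical.

```python
import re

# Assumed naming convention: fact tables start with "F_" or "FT_".
FACT_PREFIX = re.compile(r"^(FT|F)_", re.IGNORECASE)


def is_fact_table(name):
    return FACT_PREFIX.match(name) is not None


def build_from_clause(tables, join_conditions):
    """Emit an ANSI-92 FROM clause with the single fact table first.

    tables: table names in whatever order the query generator produced them.
    join_conditions: dimension table name -> ON condition joining it to the fact table.
    """
    facts = [t for t in tables if is_fact_table(t)]
    if len(facts) != 1:
        raise ValueError(f"expected exactly one fact table, got {facts!r}")
    fact = facts[0]
    parts = [f"FROM {fact}"]
    for table in tables:
        if table != fact:
            parts.append(f"INNER JOIN {table} ON {join_conditions[table]}")
    return " ".join(parts)
```

Given `["DIM_DATE", "F_SALES", "DIM_STORE"]`, the fact table is moved to the front and every dimension is attached with an explicit `INNER JOIN`, which is the form Kylin accepts.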
• Identified issues and solutions: Kylin and Mondrian 3.X
• Some useful references:
• How to implement Kylin dialect for Mondrian
• https://web.archive.org/web/20171010103502/http://dekarlab.de/wp/?p=443
• Pentaho JIRA - MONDRIAN-955
• Mondrian should support the Dialect.allowsJoinOn() option
• Patch
• Pentaho JIRA - MONDRIAN-2364
• Add dialect for Apache Kylin
• Identified issues and solutions: Kylin and Pentaho Metadata Editor
• Issue 1: There is no dialect for Kylin in PME.
• Solution: We defined the Kylin dialect based on the Hive 2 SQL dialect.
• It works without any further changes.
• JDBC connections between Pentaho BA Server and Kylin:
• Initially we used a generic JDBC connection.
• To simplify the connection, we defined a connection interface for Kylin in Pentaho BA Server.
• We used Pentaho BA Server 7.1; a Kylin connection type is still not included in Pentaho 8.1.
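For reference, a Kylin JDBC connection is described by the driver class `org.apache.kylin.jdbc.Driver` and a URL of the form `jdbc:kylin://<host>:<port>/<project>`, where 7070 is Kylin's default port. The small helper below only assembles those values; the host and project names are placeholders, and the fields of the custom Pentaho connection interface mentioned above are not modelled here.

```python
KYLIN_DRIVER_CLASS = "org.apache.kylin.jdbc.Driver"
KYLIN_DEFAULT_PORT = 7070


def kylin_jdbc_url(host, project, port=KYLIN_DEFAULT_PORT):
    """Build a Kylin JDBC URL: jdbc:kylin://<host>:<port>/<project>."""
    if not host or not project:
        raise ValueError("both a host and a Kylin project name are required")
    return f"jdbc:kylin://{host}:{port}/{project}"
```

For example, `kylin_jdbc_url("kylin.example.com", "learn_kylin")` gives `jdbc:kylin://kylin.example.com:7070/learn_kylin`, which together with the driver class is what a generic JDBC connection in Pentaho needs.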
• Enabling security at schema, concept and data level:
• Mondrian 3.X
• We could not use views to filter data (a limitation of Kylin's approach).
• Solution: We used a Mondrian Dynamic Schema Processor (DSP).
• We extended the standard Mondrian DSP class with a variable that replaces a piece of
XML in the schema.
• Pentaho Metadata Editor
• PME requires the role and user tables to be created in the same data source, which Kylin
does not allow (a limitation of Kylin's approach).
• Solution: We created JDBCSecuritySqlGenerator.
• An extension of an existing PME security class.
• The security is defined in a file we called securitySQLGenerator-properties.xml.
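The DSP approach above boils down to replacing a variable in the schema XML with a role-specific filter before Mondrian loads it. In Mondrian this is Java code; the Python sketch below only mimics the substitution step. The schema fragment is deliberately incomplete (no dimensions or measures), and the `$security_filter` variable, role names and filters are all hypothetical.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical, simplified schema fragment; $security_filter is the replaced variable.
SCHEMA_TEMPLATE = Template("""<Schema name="Sales">
  <Cube name="Sales">
    <Table name="F_SALES">
      <SQL dialect="generic">$security_filter</SQL>
    </Table>
  </Cube>
</Schema>""")

# Hypothetical mapping from a Pentaho role to the rows that role may see.
ROLE_FILTERS = {
    "Admin": "1 = 1",
    "SpainManager": "F_SALES.COUNTRY = 'ES'",
    "Analyst": "F_SALES.AMOUNT < 1000",
}


def process_schema(role):
    """Return the schema XML with the role's SQL filter substituted in.

    Unknown roles get a filter that matches no rows (deny by default).
    """
    filter_sql = ROLE_FILTERS.get(role, "1 = 0")
    # escape() turns &, < and > into entities so the XML stays well-formed.
    xml = SCHEMA_TEMPLATE.substitute(security_filter=escape(filter_sql))
    ET.fromstring(xml)  # raises ParseError if the substitution broke the XML
    return xml
```

Denying by default means a user whose role has no entry sees an empty cube rather than every row, which is the safer failure mode for data-level security.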
• What have we obtained?
• Dashboarding, reporting and OLAP viewing with our Pentaho STTools over cubes
with more than a billion rows (1,000,000,000).
• Enabling sub-second Roll-up, Drill-down, Slice and Dice and Pivot OLAP operations.
• We carried out the first deployment of Kylin for a Spain-based company.
• Try our demo with Kylin, Pentaho and the STPivot viewer (available in the Marketplace)
• http://bigdata.stratebi.com/kylin-olap/index.htm
• Kylin applied to digital marketing scenario
• Initial Scenario
• OLAP system for data analysis using an in-house reporting tool.
• Based on MySQL (80% of queries) + Redshift (20% of queries)
• Several million rows per hour in some fact tables
• Goals
• Reduce query latency (some queries take more than 20 s to run)
• Reduce ETL processing time to improve data freshness.
• Implementation of open-source BI tools (STTools)
• Self-service OLAP, reporting and dashboarding
• Kylin applied to digital marketing scenario
• Architecture
• Kylin applied to digital marketing scenario
• Goals achieved
• Reduced query latency: we compared user queries for the company's three most
important reports.
• Kylin queries run 4 times faster than on Redshift.
• Most Kylin queries have response times below 1 second.
• Some very complex queries that take about 30 seconds in Redshift run in just
over 400 milliseconds with Kylin.
• Full integration with open source BI tools (STTools)
• STPivot, STReport, STDashboard
• Security implemented at schema and data levels (Mondrian and PME).
• Kylin applied to digital marketing scenario
• Kylin vs Redshift
[Chart: query latency, Redshift vs Kylin]
• Why is Vertica an alternative to Kylin for Big Data OLAP?
• Sub-second queries over fact tables with billions of rows.
• In our implementations and benchmarks it achieves very good query latency results.
• But it is not as fast as Kylin for extremely large fact tables.
• ANSI SQL and BI tool integration.
• Integration with Pentaho is possible through JDBC, Mondrian and PME
• Also Superset, Tableau, Power BI, Zeppelin, Microstrategy…
• Full support for star and snowflake schemas.
• Near-real-time data ingestion and processing.
• Micro Focus Vertica is not an open-source project.
• However, there is a free Community Edition, sufficient for many typical Big Data scenarios.
• Vertica Architecture
• Distributed processing in cluster mode.
• But it does not need a Hadoop cluster to work.
• Although it does support integration with Hadoop (e.g. Spark or Hive).
• Columnar and distributed storage
• Hybrid OLAP (tables, projections, flattened tables…)
• Integration with Pentaho and STTools
• Seamless integration with Pentaho PDI for data warehouse loading
• Including bulk load steps
• We have also integrated Vertica with Pentaho BA Server in several successful use cases.
• The Mondrian OLAP schema must be defined carefully to achieve good performance.
• In PME we faced issues similar to those with Kylin (we used the PostgreSQL dialect).
• Retail sector use case
• More than 3,000 points of sale, hence high concurrency
• Data volume driven by sales-line-level detail
• Need for highly customized graphics (we have implemented many CDE dashboards)
• Why a Big Data OLAP Benchmark?
• To test the performance of the two most powerful Big Data OLAP tools:
• Kylin vs Vertica
• To compare their performance against OLAP implementations in traditional databases:
• PostgreSQL: an open-source relational database with good performance for OLAP
systems.
• Benchmark implementation
• We used the SSB (Star Schema Benchmark).
• A star schema version of the well-known, industry-standard TPC-H benchmark.
• The Kyligence team has an implementation of the SSB benchmark for Apache Kylin,
• including the schemas and a data generator.
• We adapted it for use with Vertica and PostgreSQL.
• It provides a set of 13 analytical queries.
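The SSB queries are plain star-join aggregations. As an illustration, the sketch below runs the shape of SSB query Q1.1 (revenue from discounted line orders in 1993) against an in-memory SQLite database holding a handful of hand-made rows. It uses explicit ANSI-92 joins, as Kylin requires; the column subset, the `dates` table name and the sample rows are assumptions for this example, not the Kyligence schema or generated benchmark data.

```python
import sqlite3

# SSB Q1.1 shape, written with an explicit ANSI-92 join.
Q1_1 = """
    SELECT SUM(lo.lo_extendedprice * lo.lo_discount) AS revenue
    FROM lineorder AS lo
    INNER JOIN dates AS d ON lo.lo_orderdate = d.d_datekey
    WHERE d.d_year = 1993
      AND lo.lo_discount BETWEEN 1 AND 3
      AND lo.lo_quantity < 25
"""


def demo():
    """Load a tiny hypothetical star schema and return the Q1.1 revenue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dates (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
        CREATE TABLE lineorder (lo_orderdate INTEGER, lo_quantity INTEGER,
                                lo_extendedprice INTEGER, lo_discount INTEGER);
    """)
    conn.executemany("INSERT INTO dates VALUES (?, ?)",
                     [(19930101, 1993), (19940101, 1994)])
    conn.executemany("INSERT INTO lineorder VALUES (?, ?, ?, ?)", [
        (19930101, 10, 1000, 2),  # qualifies: 1000 * 2 = 2000
        (19930101, 30, 500, 2),   # excluded: quantity >= 25
        (19930101, 5, 300, 5),    # excluded: discount outside 1..3
        (19940101, 10, 700, 1),   # excluded: year 1994
    ])
    revenue = conn.execute(Q1_1).fetchone()[0]
    conn.close()
    return revenue
```

Only the first row passes every filter, so `demo()` returns 2000. At benchmark scale the same query scans the full LINEORDER fact table, which is what the Kylin, Vertica and PostgreSQL timings measure.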
• Tests performed
• Number of rows in the fact and dimension tables for each test:

Test | LINEORDER (fact, KPIs) | CUSTOMER (dimension) | PART (dimension) | SUPPLIER (dimension) | DATE (dimension)
100M | 100,000,000 | 40,000 | 32,000 | 20,000 | 2,556
500M | 500,000,000 | 200,000 | 48,000 | 100,000 | 2,556
1,000M | 1,000,000,000 | 400,000 | 56,000 | 200,000 | 2,556

• Hardware used:

Tool | Distributed processing | Kind of hardware | Nº of hosts | Processor | Cores | RAM
Kylin 2.4 | Yes | Dedicated cloud | 3 | Intel(R) Atom(TM) CPU C2750 @ 2.40GHz | 8 | 32 GB
Vertica 9.1 | Yes | Dedicated cloud | 3 | Intel(R) Atom(TM) CPU C2750 @ 2.40GHz | 8 | 32 GB
PostgreSQL 9.6 | No | Dedicated cloud | 1 | Intel(R) Atom(TM) CPU C2750 @ 2.40GHz | 8 | 32 GB
• Benchmark Results
Test P1, query latency in seconds ("+280" = more than 280 s; "-" = no result reported).

Query | Kylin 100M | Vertica 100M | PostgreSQL 100M | Kylin 500M | Vertica 500M | PostgreSQL 500M | Kylin 1,000M | Vertica 1,000M | PostgreSQL 1,000M
Q1.1 | 0.2 | 0.2 | 22.4 | 0.3 | 0.3 | +280 | 0.6 | 0.6 | -
Q1.2 | 0.2 | 0.4 | 18.7 | 0.3 | 0.2 | +280 | 0.5 | 0.3 | -
Q1.3 | 0.2 | 0.4 | 18.5 | 0.3 | 0.3 | +280 | 0.6 | 0.2 | -
Q2.1 | 0.3 | 1.1 | 18.1 | 0.4 | 2.7 | +280 | 0.6 | 9.1 | -
Q2.2 | 0.3 | 0.8 | 16.3 | 0.4 | 2.7 | +280 | 0.7 | 8.2 | -
Q2.3 | 0.3 | 0.8 | 15.2 | 0.4 | 2.2 | +280 | 0.6 | 7.4 | -
Q3.1 | 0.3 | 1.4 | 23.9 | 0.4 | 3.7 | +280 | 0.8 | 15.1 | -
Q3.2 | 0.6 | 0.7 | 18.5 | 0.8 | 0.7 | +280 | 0.9 | 9.8 | -
Q3.3 | 0.3 | 0.9 | 15.8 | 0.3 | 0.6 | +280 | 0.7 | 3.7 | -
Q3.4 | 0.2 | 0.6 | 15.9 | 0.2 | 0.2 | +280 | 0.2 | 1.0 | -
Q4.1 | 0.3 | 1.4 | 23.7 | 0.4 | 7.3 | +280 | 0.7 | 14.7 | -
Q4.2 | 0.3 | 1.0 | 23.3 | 0.4 | 2.0 | +280 | 0.7 | 3.8 | -
Q4.3 | 2.5 | 0.8 | 17.1 | 2.4 | 1.3 | +280 | 2.9 | 2.0 | -
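To make the Kylin vs Vertica comparison at the largest scale explicit, the sketch below summarizes the 1,000M columns of the results table: how many queries each engine wins, the summed latency, and the query with the largest gap. The numbers are copied from the table; the `summarize` function and the choice of metrics (plain sums and ratios, not an official SSB metric) are assumptions for illustration.

```python
# Latency in seconds at 1,000M rows: query -> (Kylin, Vertica), from the results table.
RESULTS_1000M = {
    "Q1.1": (0.6, 0.6), "Q1.2": (0.5, 0.3), "Q1.3": (0.6, 0.2),
    "Q2.1": (0.6, 9.1), "Q2.2": (0.7, 8.2), "Q2.3": (0.6, 7.4),
    "Q3.1": (0.8, 15.1), "Q3.2": (0.9, 9.8), "Q3.3": (0.7, 3.7),
    "Q3.4": (0.2, 1.0), "Q4.1": (0.7, 14.7), "Q4.2": (0.7, 3.8),
    "Q4.3": (2.9, 2.0),
}


def summarize(results):
    """Count per-query wins, total latency per engine, and the largest Vertica/Kylin ratio."""
    kylin_wins = sum(1 for k, v in results.values() if k < v)
    vertica_wins = sum(1 for k, v in results.values() if v < k)
    best = max(results, key=lambda q: results[q][1] / results[q][0])
    return {
        "kylin_wins": kylin_wins,
        "vertica_wins": vertica_wins,
        "ties": len(results) - kylin_wins - vertica_wins,
        "total_kylin": round(sum(k for k, _ in results.values()), 1),
        "total_vertica": round(sum(v for _, v in results.values()), 1),
        "max_speedup": (best, round(results[best][1] / results[best][0], 1)),
    }
```

On these numbers Kylin is faster on 9 queries, Vertica on 3 (Q1.2, Q1.3, Q4.3) and Q1.1 is a tie; the summed latency is about 10.5 s for Kylin against 75.9 s for Vertica, with the largest gap on Q4.1 (about 21x). A plain sum weights the slowest queries most, so it complements rather than replaces the per-query view.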
• Benchmark Results
[Chart: query latency vs number of rows in the fact table, Kylin vs Vertica]
• Benchmark Results
• Kylin and Vertica are both suitable for Big Data OLAP applications.
• Apache Kylin has the best query performance.
• But it has high hardware, software (Hadoop) and know-how requirements.
• 100% open source, with no feature limitations.
• Vertica is the alternative to Kylin for less extreme Big Data scenarios.
• Lower hardware, software and know-how requirements.
• Free Community Edition with some limitations.
• PostgreSQL is not suitable for Big Data OLAP.
• Pentaho also integrates with many other Big Data tools
• Lince Big Data Stack
• Our selection of Big Data tools based on experience and tests.
• All of them can be integrated with Pentaho open-source tools.
• Lince BI tools (formerly STTools) are used to analyze the data from Big Data repositories.
• STPivot: OLAP Viewer.
• STReport: Ad-Hoc Reporting.
• STDashboard: Fast Dashboards.
• STCard: Balanced Scorecards.
• Visit our Big Data demos website
• http://bigdata.stratebi.com/
• Pentaho
• Kylin + STPivot (Mondrian 4.X)
• Hadoop + PDI (HDFS, Hive, Oozie, … steps)
• PDI + Spark MLlib + Zeppelin
• Other open source Big Data tools
• Kafka + Spark Streaming
• Kylin + Superset
• Neo4j
• …and much more.
• Pentaho BA Server enables Big Data OLAP in combination with Kylin or Vertica.
• Easy to integrate through the JDBC connector with SQL-based plugins (CDE dashboards).
• We have worked hard to integrate these tools with Mondrian 3.X and PME 7.1.
• The best performance results come from integrating Pentaho, Kylin and STTools.
• Sub-second Roll-up, Drill-down, Slice and Dice and Pivot OLAP operations.
• The performance we have observed with STTools is very good, but we still need to extend our
benchmark to measure it (Kylin with Mondrian or PME).
• Pentaho tools are useful for Big Data ETL and analysis.
• However, our experience tells us that many of the Pentaho Big Data connectors and features
are very hard to configure.
• We propose including Kylin and Vertica dialects (Mondrian and PME) in future Pentaho
versions.
PCM18 (Big Data Analytics)

Aplicaciones Big Data Marketing
 
A federated information infrastructure that works
A federated information infrastructure that works A federated information infrastructure that works
A federated information infrastructure that works
 
9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics
 
PowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y CursosPowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y Cursos
 
Sports Analytics
Sports AnalyticsSports Analytics
Sports Analytics
 
Vertica Extreme Analysis
Vertica Extreme AnalysisVertica Extreme Analysis
Vertica Extreme Analysis
 
Businesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBIBusinesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBI
 
Vertica Analytics Database general overview
Vertica Analytics Database general overviewVertica Analytics Database general overview
Vertica Analytics Database general overview
 
Talend Cloud en detalle
Talend Cloud en detalleTalend Cloud en detalle
Talend Cloud en detalle
 
Master Data Management (MDM) con Talend
Master Data Management (MDM) con TalendMaster Data Management (MDM) con Talend
Master Data Management (MDM) con Talend
 
Talend Introducion
Talend IntroducionTalend Introducion
Talend Introducion
 
Talent Analytics
Talent AnalyticsTalent Analytics
Talent Analytics
 

Dernier

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 

Dernier (20)

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

PCM18 (Big Data Analytics)

  • 2. www.stratebi.com
      Emilio Arias, Co-Founder at StrateBI.
      Follow us on Twitter: @Stratebi, @TodoBI_OS
  • 3. Roberto Tardío, Head of Big Data at StrateBI.
      Follow me on Twitter: @RoberTardio
  • 6. OLAP (On-Line Analytical Processing)
      • Analytical systems that enable interactive queries.
      • Require very low query latency: milliseconds to seconds.
      • Usually support SQL and, sometimes, the MDX query language.
      • Enable aggregation and filtering of KPI data across hierarchical multidimensional structures (OLAP cubes).
      • Used as a data source for different purposes:
          • Detailed data analysis (OLAP views).
          • Dashboarding.
          • Reporting.
  • 7. Big Data OLAP
      • Big Data: Volume, Variety and Velocity.
      • OLAP applications over Big Data sets.
      • Main challenges:
          • Very low query latency over fact and dimension tables with billions to trillions of rows.
          • Support for ANSI SQL and BI tools integration.
          • Real-time data ingestion and processing.
  • 9. Why Apache Kylin?
      • Sub-second queries over fact tables with more than 12 billion rows.
      • Best query latency results (in our deployments and benchmarks).
      • ANSI SQL and BI tools integration.
          • Integration with Pentaho is possible through JDBC, Mondrian and PME.
          • Also Superset, Tableau, Power BI, Zeppelin, MicroStrategy…
      • Full support for star and snowflake schemas.
          • Not all tools support them (e.g. Druid).
      • Near-real-time data ingestion (Kafka) and processing.
      • It is an Apache open-source project.
          • Currently in version 2.5.
  • 10. Apache Kylin Architecture
      • M-OLAP approach:
          • Data pre-aggregation.
          • Supports analytical queries only.
      • Hadoop-based tool:
          • Full scalability through Hadoop nodes.
          • HBase and Kylin can run in separate clusters (if needed).
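The pre-aggregation idea behind the M-OLAP approach can be shown with a small Java sketch. For every subset of dimensions (a "cuboid"), the SUM of a measure is computed in advance, so an OLAP query becomes a lookup instead of a scan of the fact table. This is a simplified, in-memory illustration under our own assumptions, not Kylin's implementation. The class `CuboidSketch`, the `FactRow` type and the sample data are hypothetical. Real Kylin cubes are built with distributed jobs, stored in HBase, and usually pruned (for example with aggregation groups) rather than materializing all 2^n combinations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy cube pre-aggregation (hypothetical sketch, not Kylin code). */
final class CuboidSketch {

    /** One fact row: dimension values in a fixed order plus one numeric measure. */
    static final class FactRow {
        final List<String> dims;
        final long measure;

        FactRow(List<String> dims, long measure) {
            this.dims = dims;
            this.measure = measure;
        }
    }

    /**
     * Builds every cuboid: the outer key is a bitmask (bit d set = dimension d kept),
     * the inner map goes from each group-by key to SUM(measure) for that group.
     */
    static Map<Integer, Map<List<String>, Long>> buildAllCuboids(List<FactRow> facts, int dimCount) {
        Map<Integer, Map<List<String>, Long>> cuboids = new HashMap<>();
        for (int mask = 0; mask < (1 << dimCount); mask++) {
            Map<List<String>, Long> groups = new HashMap<>();
            for (FactRow row : facts) {
                // Project the row onto the dimensions selected by this cuboid.
                List<String> key = new ArrayList<>();
                for (int d = 0; d < dimCount; d++) {
                    if ((mask & (1 << d)) != 0) {
                        key.add(row.dims.get(d));
                    }
                }
                groups.merge(key, row.measure, Long::sum);
            }
            cuboids.put(mask, groups);
        }
        return cuboids;
    }

    public static void main(String[] args) {
        // Dimensions: 0 = year, 1 = region; measure = revenue.
        List<FactRow> facts = List.of(
                new FactRow(List.of("2017", "EU"), 10),
                new FactRow(List.of("2017", "US"), 20),
                new FactRow(List.of("2018", "EU"), 5),
                new FactRow(List.of("2018", "EU"), 7));
        Map<Integer, Map<List<String>, Long>> cube = buildAllCuboids(facts, 2);
        // Cuboid 0b01 keeps only "year": the answer to SUM(revenue) GROUP BY year.
        System.out.println("By year: " + cube.get(0b01));
        // Cuboid 0b00 keeps no dimension: the grand total.
        System.out.println("Total:   " + cube.get(0b00));
    }
}
```

Each extra dimension doubles the number of possible cuboids. This is why cube design (which dimensions and combinations to precompute) drives build time and storage.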
  • 13. Why Apache Kylin and Pentaho BA Server?
      • It is increasingly necessary to provide dashboarding, reporting and OLAP viewing over Big Data scenarios.
          • Using our STTools Pentaho plugins: STPivot, STReport, STDashboard,…
          • Also Pentaho Reporting, Community Dashboard Editor, Saiku (plugin),…
      • Both Kylin and Pentaho are leading open-source BI and Big Data tools.
          • Pentaho integrates with the best-known Big Data tools: Hive, Impala, Spark SQL,…
      • Integration with Pentaho is possible through JDBC, Mondrian and PME:
          • Mondrian 4.X: using the existing Mondrian 4.4 (lagunitas).
          • Mondrian 3.X: achieved with considerable effort from our team.
          • Using Pentaho BA Server 7.1.
  • 14. Identified issues and solutions: Kylin and Mondrian 3.X (3.14)
      • Issue 1: Kylin requires ANSI-92 inner joins (INNER JOIN … ON), but Mondrian 3.X generates old-style joins (comma-separated tables with the join conditions in the WHERE clause).
          • Solution: we defined a Mondrian dialect for Kylin and applied a patch (see MONDRIAN-955 in the references) to implement the allowsJoinOn() method.
      • Issue 2: Mondrian's native cross join and native non-empty properties produced SQL that was invalid for Kylin.
          • Solution: we disabled these properties for the Kylin dialect.
      • Issue 3: Kylin requires the fact table to be the first table in the SQL FROM clause.
          • Workaround: we modified the Mondrian code to identify fact tables by a name prefix (F or FT) and place them first in the FROM clause.
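Issues 1 and 3 both concern the shape of the generated FROM clause. The Java sketch below shows the target form for a simple star schema: the fact table first, then one explicit ANSI-92 `INNER JOIN ... ON` per dimension. It is a hypothetical standalone helper, not the patched Mondrian code. The names `KylinFromClause` and `Join` and all table and column names are illustrative. The `F_`/`FT_` prefix check is an assumed reading of the "F or FT" convention. The sketch expects exactly one fact table and does not handle snowflake (dimension-to-dimension) joins.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not Mondrian code) that renders a Kylin-friendly FROM clause. */
final class KylinFromClause {

    /** Equality join between a fact table column and a dimension table column. */
    static final class Join {
        final String factTable;
        final String factColumn;
        final String dimTable;
        final String dimColumn;

        Join(String factTable, String factColumn, String dimTable, String dimColumn) {
            this.factTable = factTable;
            this.factColumn = factColumn;
            this.dimTable = dimTable;
            this.dimColumn = dimColumn;
        }
    }

    /** Assumed naming convention: fact tables start with "F_" or "FT_" (case-insensitive). */
    static boolean isFactTable(String table) {
        String upper = table.toUpperCase(Locale.ROOT);
        return upper.startsWith("F_") || upper.startsWith("FT_");
    }

    /**
     * Puts the fact table first and joins every other table to it with an explicit
     * ANSI-92 INNER JOIN ... ON, instead of old-style "FROM a, b WHERE a.x = b.x".
     */
    static String build(List<String> tables, List<Join> joins) {
        String fact = tables.stream()
                .filter(KylinFromClause::isFactTable)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("no fact table found"));
        StringBuilder sql = new StringBuilder("FROM ").append(fact);
        for (String dim : tables) {
            if (dim.equals(fact)) {
                continue; // already emitted first
            }
            Join join = joins.stream()
                    .filter(j -> j.factTable.equals(fact) && j.dimTable.equals(dim))
                    .findFirst()
                    .orElseThrow(() -> new IllegalArgumentException("no join for " + dim));
            sql.append(" INNER JOIN ").append(dim)
               .append(" ON ").append(join.factTable).append('.').append(join.factColumn)
               .append(" = ").append(join.dimTable).append('.').append(join.dimColumn);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Tables arrive in arbitrary order; the fact table is moved to the front.
        List<String> tables = List.of("D_DATE", "F_SALES", "D_STORE");
        List<Join> joins = List.of(
                new Join("F_SALES", "DATE_ID", "D_DATE", "DATE_ID"),
                new Join("F_SALES", "STORE_ID", "D_STORE", "STORE_ID"));
        System.out.println(build(tables, joins));
    }
}
```

For the tables `D_DATE`, `F_SALES`, `D_STORE` (in that order), `build` returns `FROM F_SALES INNER JOIN D_DATE ON F_SALES.DATE_ID = D_DATE.DATE_ID INNER JOIN D_STORE ON F_SALES.STORE_ID = D_STORE.STORE_ID`.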
  • 15. Identified issues and solutions: Kylin and Mondrian 3.X
      • Useful references:
          • How to implement a Kylin dialect for Mondrian:
            https://web.archive.org/web/20171010103502/http://dekarlab.de/wp/?p=443
          • Pentaho JIRA MONDRIAN-955: "Mondrian should support the Dialect.allowsJoinOn() option" (patch).
          • Pentaho JIRA MONDRIAN-2364: "Add dialect for Apache Kylin".
  • 16. Identified issues and solutions: Kylin and Pentaho Metadata Editor
      • Issue 1: PME has no dialect for Kylin.
          • Solution: we defined the Kylin dialect based on the Hive 2 SQL dialect; it works without any changes.
      • JDBC connections between Pentaho BA Server and Kylin:
          • Initially we used the generic connection through a JDBC driver.
          • To simplify the connection, we defined a Kylin connection interface in Pentaho BA Server.
          • We used Pentaho BA Server 7.1; as of Pentaho 8.1, a Kylin connection has still not been included.
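For reference, the generic JDBC route looks roughly like the Java sketch below. It assumes the Apache Kylin JDBC driver jar is on the classpath, with driver class `org.apache.kylin.jdbc.Driver` and URLs of the form `jdbc:kylin://host:port/project`. The host, port, project name and credentials are placeholders. `KylinJdbcSketch` is a hypothetical helper, not part of Pentaho or Kylin.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Hypothetical helper for a generic JDBC connection to Apache Kylin. */
final class KylinJdbcSketch {

    /** Driver class shipped in the Kylin JDBC jar (must be on the classpath to connect). */
    static final String DRIVER_CLASS = "org.apache.kylin.jdbc.Driver";

    /** Builds a URL of the form jdbc:kylin://host:port/project. */
    static String jdbcUrl(String host, int port, String project) {
        return "jdbc:kylin://" + host + ":" + port + "/" + project;
    }

    /** Credentials are passed as the standard "user" and "password" properties. */
    static Properties credentials(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return props;
    }

    /** Opens a connection; needs the driver jar and a reachable Kylin server. */
    static Connection connect(String host, int port, String project, String user, String password)
            throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER_CLASS); // explicit registration, harmless with JDBC 4 auto-loading
        return DriverManager.getConnection(jdbcUrl(host, port, project), credentials(user, password));
    }

    public static void main(String[] args) {
        // Placeholder values: adjust host, port and project to your deployment.
        System.out.println(jdbcUrl("localhost", 7070, "my_project"));
    }
}
```

With a reachable Kylin server, `connect(...)` returns a standard `java.sql.Connection`, so cube queries go through the usual `Statement`/`ResultSet` API. A generic JDBC connection in Pentaho BA Server would presumably be configured with the same driver class and URL.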
  • 17. Enabling security at schema, concept and data levels
      • Mondrian 3.X
          • We could not use views to filter data (a limitation of the Kylin approach).
          • Solution: we used a Mondrian Dynamic Schema Processor (DSP).
              • We extended the standard Mondrian DSP class with a variable that replaces a piece of the schema XML.
      • Pentaho Metadata Editor
          • PME requires the roles and users tables to be created in the same data source, which Kylin does not allow (a limitation of the Kylin approach).
          • Solution: we created JDBCSecuritySqlGenerator, an extension of an existing PME security class.
              • The security rules are defined in a file we called securitySQLGenerator-properties.xml.
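The DSP technique can be sketched in plain Java. In Mondrian 3.x, a custom DSP typically extends `mondrian.spi.impl.FilterDynamicSchemaProcessor` and rewrites the schema text before Mondrian parses it. The sketch below shows only that rewriting step, without the Mondrian dependency. Everything specific here is hypothetical:

* a `%DATA_FILTER%` placeholder inside the fact table's `<Table>` element;
* a `REGION` column;
* a user-to-region map.

The placeholder is replaced with a Mondrian `<SQL>` filter element. Unknown users get a filter that matches no rows (fail-closed).

```java
import java.util.Map;

/** Hypothetical sketch of the schema-rewriting step of a Mondrian DSP (no Mondrian dependency). */
final class SchemaFilterSketch {

    /** Assumed placeholder, written inside the fact table's <Table> element in the schema. */
    static final String PLACEHOLDER = "%DATA_FILTER%";

    /** Escapes a value for a SQL string literal, then for XML text. */
    static String escape(String value) {
        String sqlSafe = value.replace("'", "''");
        return sqlSafe.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Replaces the placeholder with a Mondrian <SQL> filter on the (assumed) REGION column.
     * Users without an entry get "1 = 0", so they see no rows (fail-closed).
     */
    static String applyFilter(String schemaXml, String user, Map<String, String> regionByUser) {
        String region = regionByUser.get(user);
        String condition = (region == null) ? "1 = 0" : "REGION = '" + escape(region) + "'";
        return schemaXml.replace(PLACEHOLDER, "<SQL dialect=\"generic\">" + condition + "</SQL>");
    }

    public static void main(String[] args) {
        String schema = "<Table name=\"F_SALES\">" + PLACEHOLDER + "</Table>";
        Map<String, String> regions = Map.of("alice", "EU");
        System.out.println(applyFilter(schema, "alice", regions));
        System.out.println(applyFilter(schema, "bob", regions));
    }
}
```

Because the replacement depends on the user, each user's schema restricts the fact table to that user's rows. Escaping the value for both SQL and XML keeps a crafted value from breaking out of the filter.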
  • 18. What have we achieved?
      • Dashboarding, reporting and OLAP viewing with our Pentaho STTools over cubes with more than a billion (1,000,000,000) rows.
          • Sub-second roll-up, drill-down, slice and dice, and pivot OLAP operations.
      • We carried out the first Kylin deployment for a Spain-based company.
      • Try our demo with Kylin, Pentaho and the STPivot viewer (available in the Marketplace):
          • http://bigdata.stratebi.com/kylin-olap/index.htm
  • 20. Kylin applied to a digital marketing scenario
      • Initial scenario
          • OLAP system for data analysis using an in-house reporting tool.
          • Based on MySQL (80% of queries) + Redshift (20% of queries).
          • Several million rows per hour in some fact tables.
      • Goals
          • Reduce query latency (some queries take more than 20 s to run).
          • Reduce ETL processing time to improve data freshness.
          • Implement open-source BI tools (STTools).
          • Self-service OLAP, reporting and dashboarding.
  • 21. Kylin applied to a digital marketing scenario: architecture
  • 22. Kylin applied to a digital marketing scenario: goals achieved
      • Reduced query latency: user queries were compared for the company's three most important reports.
          • Kylin queries run 4 times faster than in Redshift.
          • Most Kylin queries have response times below 1 second.
          • Some very complex queries that take about 30 seconds in Redshift run in just over 400 milliseconds in Kylin.
      • Full integration with open-source BI tools (STTools): STPivot, STReport, STDashboard.
      • Security implemented at schema and data levels (Mondrian and PME).
  • 23. Kylin applied to a digital marketing scenario: Kylin vs Redshift [chart comparing Redshift and Kylin]
  • 25. Why is Vertica an alternative to Kylin for Big Data OLAP?
      • Sub-second queries over fact tables with billions of rows.
          • In our implementations and benchmark, it achieves very good query latency results.
          • But it is not as fast as Kylin for extremely large fact tables.
      • ANSI SQL and BI tools integration.
          • Integration with Pentaho is possible through JDBC, Mondrian and PME.
          • Also Superset, Tableau, Power BI, Zeppelin, MicroStrategy…
      • Full support for star and snowflake schemas.
      • Near-real-time data ingestion and processing.
      • Micro Focus Vertica is not an open-source project.
          • But there is a free Community Edition, sufficient for many typical Big Data scenarios.
  • 26. Vertica Architecture
      • Distributed processing in cluster mode.
          • It does not need a Hadoop cluster to work, although it supports integration with Hadoop (e.g. Spark or Hive).
      • Columnar and distributed storage.
      • Hybrid OLAP (tables, projections, flattened tables…).
  • 27. Integration with Pentaho and STTools
      • Seamless integration with Pentaho PDI for data warehouse loading, including bulk load steps.
      • We have also integrated Vertica with Pentaho BA Server in several successful use cases.
          • Take care when defining the Mondrian OLAP schema to achieve good performance.
          • In PME we faced issues similar to those with Kylin (we used the PostgreSQL dialect).
      • Retail sector use case
          • More than 3,000 points of sale, which means high concurrency.
          • Data volume determined by sales-line-level detail.
          • Need for highly customized graphics (we implemented many CDE dashboards).
  • 29. Why a Big Data OLAP benchmark?
      • To test the performance of the two Big Data OLAP tools we consider most powerful: Kylin vs Vertica.
      • To compare their performance against OLAP implementations on traditional databases.
          • PostgreSQL: an open-source relational database with good performance for OLAP systems.
  • 30. Benchmark implementation
      • We used the Star Schema Benchmark (SSB), a star-schema version of the well-known, industry-standard TPC-H benchmark.
      • The Kyligence team has an implementation of SSB for Apache Kylin, including schemas and a data generator.
          • We adapted it for use with Vertica and PostgreSQL.
      • It provides a set of 13 analytical queries.
  • 31. Tests performed
      • Number of rows in the fact and dimension tables for each test:

        Test    LINEORDER (fact)  CUSTOMER (dim.)  PART (dim.)  SUPPLIER (dim.)  DATE (dim.)
        100M         100,000,000           40,000       32,000           20,000        2,556
        500M         500,000,000          200,000       48,000          100,000        2,556
        1,000M     1,000,000,000          400,000       56,000          200,000        2,556

      • Hardware used:

        Tool            Distributed processing  Hardware         Hosts  Processor                              Cores  RAM
        Kylin 2.4       Yes                     Dedicated Cloud  3      Intel(R) Atom(TM) CPU C2750 @ 2.40GHz  8      32 GB
        Vertica 9.1     Yes                     Dedicated Cloud  3      Intel(R) Atom(TM) CPU C2750 @ 2.40GHz  8      32 GB
        PostgreSQL 9.6  No                      Dedicated Cloud  1      Intel(R) Atom(TM) CPU C2750 @ 2.40GHz  8      32 GB
  • 32. Benchmark results (query latency in seconds; >280: more than 280 s; -: no result)

                | P1 – 100M (seconds)        | P1 – 500M (seconds)        | P1 – 1,000M (seconds)
        Query   | Kylin  Vertica  PostgreSQL | Kylin  Vertica  PostgreSQL | Kylin  Vertica  PostgreSQL
        Q1.1    |   0.2      0.2        22.4 |   0.3      0.3        >280 |   0.6      0.6           -
        Q1.2    |   0.2      0.4        18.7 |   0.3      0.2        >280 |   0.5      0.3           -
        Q1.3    |   0.2      0.4        18.5 |   0.3      0.3        >280 |   0.6      0.2           -
        Q2.1    |   0.3      1.1        18.1 |   0.4      2.7        >280 |   0.6      9.1           -
        Q2.2    |   0.3      0.8        16.3 |   0.4      2.7        >280 |   0.7      8.2           -
        Q2.3    |   0.3      0.8        15.2 |   0.4      2.2        >280 |   0.6      7.4           -
        Q3.1    |   0.3      1.4        23.9 |   0.4      3.7        >280 |   0.8     15.1           -
        Q3.2    |   0.6      0.7        18.5 |   0.8      0.7        >280 |   0.9      9.8           -
        Q3.3    |   0.3      0.9        15.8 |   0.3      0.6        >280 |   0.7      3.7           -
        Q3.4    |   0.2      0.6        15.9 |   0.2      0.2        >280 |   0.2      1.0           -
        Q4.1    |   0.3      1.4        23.7 |   0.4      7.3        >280 |   0.7     14.7           -
        Q4.2    |   0.3      1.0        23.3 |   0.4      2.0        >280 |   0.7      3.8           -
        Q4.3    |   2.5      0.8        17.1 |   2.4      1.3        >280 |   2.9      2.0           -
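A little arithmetic summarizes the 1,000M-row columns of the results. The Java sketch below copies the Kylin and Vertica latencies for that test (PostgreSQL has no result there). It then computes:

* the Vertica/Kylin ratio per query;
* how often each tool was faster;
* the summed latency.

`BenchmarkRatios` is only an illustrative name.

```java
import java.util.Locale;

/** Worked arithmetic on the 1,000M-row benchmark results (values copied from slide 32). */
final class BenchmarkRatios {

    static final String[] QUERIES = {
        "Q1.1", "Q1.2", "Q1.3", "Q2.1", "Q2.2", "Q2.3", "Q3.1",
        "Q3.2", "Q3.3", "Q3.4", "Q4.1", "Q4.2", "Q4.3"};
    // Latencies in seconds for the 1,000M-row test.
    static final double[] KYLIN = {0.6, 0.5, 0.6, 0.6, 0.7, 0.6, 0.8, 0.9, 0.7, 0.2, 0.7, 0.7, 2.9};
    static final double[] VERTICA = {0.6, 0.3, 0.2, 9.1, 8.2, 7.4, 15.1, 9.8, 3.7, 1.0, 14.7, 3.8, 2.0};

    /** Vertica latency divided by Kylin latency, per query (> 1 means Kylin was faster). */
    static double[] ratios(double[] kylin, double[] vertica) {
        double[] r = new double[kylin.length];
        for (int i = 0; i < kylin.length; i++) {
            r[i] = vertica[i] / kylin[i];
        }
        return r;
    }

    /** Number of queries where tool a was strictly faster than tool b. */
    static int countFaster(double[] a, double[] b) {
        int n = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) {
                n++;
            }
        }
        return n;
    }

    /** Sum of the latencies of all queries. */
    static double total(double[] seconds) {
        double sum = 0;
        for (double s : seconds) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] r = ratios(KYLIN, VERTICA);
        for (int i = 0; i < QUERIES.length; i++) {
            System.out.printf(Locale.ROOT, "%s: Vertica/Kylin = %.1f%n", QUERIES[i], r[i]);
        }
        System.out.printf(Locale.ROOT, "Kylin faster on %d of %d queries%n",
                countFaster(KYLIN, VERTICA), QUERIES.length);
        System.out.printf(Locale.ROOT, "Total: Kylin %.1f s, Vertica %.1f s%n",
                total(KYLIN), total(VERTICA));
    }
}
```

With these numbers, Kylin is strictly faster on 9 of the 13 queries and Q1.1 is a tie. Vertica is faster on Q1.2, Q1.3 and Q4.3. The summed latency is 10.5 s for Kylin versus 75.9 s for Vertica. The largest gap is on Q4.1: 14.7 s vs 0.7 s, a ratio of 21. The reported values have a resolution of 0.1 s, so ratios for the sub-second queries are only rough.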
  • 33. Benchmark results: [chart: relationship between the number of rows in the fact table and query latency, Kylin vs Vertica]
  • 34. Benchmark results
      • Kylin and Vertica are both suitable for Big Data OLAP applications.
      • Apache Kylin has the best query performance.
          • But it has high hardware, software (Hadoop) and know-how requirements.
          • 100% open-source version without limitations.
      • Vertica is the alternative to Kylin for less extreme Big Data scenarios.
          • Lower hardware, software and know-how requirements.
          • Free Community Edition with some limitations.
      • PostgreSQL is not suitable for Big Data OLAP.
  • 36. Pentaho also integrates with many other Big Data tools
      • Lince Big Data Stack
          • Our selection of Big Data tools, based on experience and tests.
          • All of them can be integrated with Pentaho open-source tools.
      • Lince BI tools (formerly STTools) are used to analyze the data from Big Data repositories:
          • STPivot: OLAP viewer.
          • STReport: ad-hoc reporting.
          • STDashboard: fast dashboards.
          • STCard: balanced scorecards.
  • 38. Visit our Big Data demos website: http://bigdata.stratebi.com/
      • Pentaho
          • Kylin + STPivot (Mondrian 4.X)
          • Hadoop + PDI (HDFS, Hive, Oozie,… steps)
          • PDI + Spark MLlib + Zeppelin
      • Other open-source Big Data tools
          • Kafka + Spark Streaming
          • Kylin + Superset
          • Neo4j
      • …and much more.
  • 40. Pentaho BA Server enables Big Data OLAP in combination with Kylin or Vertica.
          • Easy to integrate through a JDBC connector with SQL-based plugins (CDE dashboards).
          • We have worked hard to integrate these tools with Mondrian 3.X and PME 7.1.
      • The integration of Pentaho, Kylin and STTools gives the best performance results.
          • Sub-second roll-up, drill-down, slice and dice, and pivot OLAP operations.
          • The performance we have observed with STTools is very good, but we still need to extend our benchmark to measure it (Kylin with Mondrian or PME).
      • Pentaho tools are useful for Big Data ETL and analysis.
          • However, in our experience many of the Pentaho Big Data connectors and features are very hard to configure.
      • We propose including Kylin and Vertica dialects (Mondrian and PME) in future Pentaho versions.