Innovation and Reinvention Driving Transformation
OCTOBER 9, 2018
2018 HPCC Systems® Community Day
Gavin Halliday
A First Look at HPCC Systems 7.0, Innovation in Action
Renewing the foundations
• File processing
• ECLWatch workunit interface
• Visualization Framework
• DESDL
• Configuration manager
HPCC 7.0 2
Usability and Productivity
ECL Watch
Goals
• Highlight important information
• Make it easier to understand queries
• Improved support for very large queries
Examples:
• Gantt
• Graph Viewer
• Timings
• Log data visualizer
Gantt chart
New Graph Viewer
Stats and Timings
Visualization Framework
• Version 2.0 now available
• https://github.com/hpcc-systems/Visualization
• Rebranded as hpcc-js in the node npm repository
• New documentation, demos and gallery
• Includes non-visualization items such as the ESP comms layer
• Dashy beta
• Not tied to HPCC Systems
• Visualizer Bundle 1.1
ECL libraries
• ECL library extensions
• Date – timestamps, time zones, formatting
• Unicode – words, prefixes and suffixes
• Maths – infinity, fmod
• Bundles
• Data Patterns
• ML – Gradient boosted trees, boosted forests
• Visualizer
ESP improvements
• DESDL improvements
• Custom mappings
• Fully integrated into ESP
• Mixing DESDL and ESDL in one service
• Allow disconnection from Dali
• Support for persistent connections.
ECL Compiler
• Activities in other languages.
EXPORT streamed dataset(r) myDataset(unsigned numRows = numRows) :=
EMBED(javascript : activity) …
• Multi-line string constants
message := '''One
Two
Three''';
• Code generator improvements
• Faster archive generation
• Faster syntax checking
Interoperability
Spark
• “An open source distributed general-purpose cluster-computing framework”
• Reading from Spark
• Files and indexes.
• Filter rows
• Select fields required
• N to M parallel reads
• Writing from Spark
• File security
• Spark cluster installation
Log Data Visualizations
https://hpccsystems.com/blog/ELK_visualizations
VS Code
https://code.visualstudio.com/
Security
User Security
• Session management
• Avoid resending credentials
• Users can log out
• Allow sessions to lock and time out
• Minimize the time passwords are retained
System security
• Spark
• File access rights
• Dafilesrv authentication of requests
• The cloud
• Verifying components
• Encryption in transit
• ROXIE HTTPS support
Performance
Thor
• Keyed Join (HPCC-16476)
Thor
• LOOP
• Synchronization overhead
• LOCAL LOOP bodies
• Child Queries
• Reduced overhead
• Improvements to buffering
• Faster Startup
Index improvements
• Hourly: 60K rows, 0.02% of total
• Daily: 1.4M rows, 0.6% of total
• Weekly: 10M rows, 4% of total
• Monthly: 43M rows, 17% of total
• Historical: 520M rows, 100% of total
• Example database containing 250M unique items with 1000 updates each minute
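The row counts on this slide can be roughly reproduced from the stated update rate. The sketch below is a back-of-the-envelope check, not something shown in the talk. It assumes a 30-day month and that "% of total" is measured against the 250M unique items. Both assumptions are mine, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the per-window row counts.
# Assumptions (not stated on the slide): a 30-day month, and
# "% of total" measured against the 250M unique items.
UPDATES_PER_MINUTE = 1_000
UNIQUE_ITEMS = 250_000_000

# Length of each window in minutes.
MINUTES = {
    "Hourly": 60,
    "Daily": 60 * 24,
    "Weekly": 60 * 24 * 7,
    "Monthly": 60 * 24 * 30,
}


def rows_in_window(minutes: int) -> int:
    """Number of update rows produced in a window of the given length."""
    return UPDATES_PER_MINUTE * minutes


def share_of_total(rows: int) -> float:
    """Rows expressed as a percentage of the unique items."""
    return 100.0 * rows / UNIQUE_ITEMS


if __name__ == "__main__":
    for name, minutes in MINUTES.items():
        rows = rows_in_window(minutes)
        print(f"{name:8s} {rows:>12,d} rows  {share_of_total(rows):.3f}% of total")
```

Under these assumptions the hourly window holds 60,000 rows and the daily window 1,440,000, which matches the slide's rounded figures. The historical row (520M rows) is not derived here because the slide does not say how long the history is.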
Index improvements
• Bloom filters
• Supports multiple filters per index
• User configurable probability
• Automatically created.
• Richard’s blog post: hpccsystems.com/blog/bloom-filters
• Hash distributed keys.
• When distribution fields are filtered with equalities
• Easier to create co-distributed keys
• Lower overhead calculating the part containing a match
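For readers unfamiliar with Bloom filters, here is a minimal, generic sketch of the data structure and of what a "user configurable probability" means in practice. It is written in Python purely for illustration. It is not ECL and not how HPCC Systems builds or stores its index filters. The class name, method names and sizing choices are all hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sized from an expected item count and a
    target false-positive probability (illustrative only)."""

    def __init__(self, expected_items: int, probability: float) -> None:
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2
        ln2 = math.log(2)
        self.num_bits = max(1, math.ceil(-expected_items * math.log(probability) / ln2 ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from two halves of one SHA-256 digest
        # (double hashing); h2 is forced odd so it is never zero.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


if __name__ == "__main__":
    bloom = BloomFilter(expected_items=1_000, probability=0.01)
    for i in range(1_000):
        bloom.add(f"item-{i}")
    print(bloom.might_contain("item-42"))      # always True for an added key
    print(bloom.might_contain("not-present"))  # usually False
```

The filter never reports an added key as absent, so a negative answer lets a lookup be skipped safely. A lower probability buys fewer wasted lookups at the cost of a larger bit array.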
Finally
• WsSQL – now part of the core
• Over 1,000 pull requests since 6.4
Talk to us!
• Bloom filters - Richard Chapman
• DESDL - Yanrui Ma
• ELK - Rodrigo Pastrana
• Thor - Jake Cobbett-Smith
• Visualizations - Gordon Smith
• Security - Tony Fishbeck
• Spark - Rodrigo Pastrana
• Config Manager - Ken Rowland
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 

A First Look at HPCC Systems 7.0, Innovation in Action

  • 1. Innovation and Reinvention Driving Transformation OCTOBER 9, 2018 2018 HPCC Systems® Community Day Gavin Halliday A First Look at HPCC Systems 7.0, Innovation in Action
  • 2. Renewing the foundations • File processing • ECLWatch workunit interface • Visualization Framework • DESDL • Configuration manager
  • 4. ECL Watch Goals • Highlight important information • Make it easier to understand queries • Improved support for very large queries Examples: • Gantt • Graph Viewer • Timings • Log data visualizer
  • 9. Visualization Framework • Version 2.0 now available • https://github.com/hpcc-systems/Visualization • Rebranded as hpcc-js in the node npm repository • New documentation, demos and gallery • Includes non visualization items like ESP comms layer • Dashy beta • Not tied to HPCC Systems • Visualizer Bundle 1.1
  • 10. ECL libraries • Ecl Library extensions • Date – timestamps, time zones, formatting • Unicode – words, prefixes and suffixes • Maths – infinity, fmod • Bundles • Data Patterns • ML – Gradient boosted trees, boosted forests • Visualizer
  • 11. ESP improvements • DESDL improvements • Custom mappings • Fully integrated into ESP • Mixing DESDL and ESDL in one service • Allow disconnection from Dali • Support for persistent connections.
  • 12. ECL Compiler • Activities in other languages. EXPORT streamed dataset(r) myDataset(unsigned numRows = numRows) := EMBED(javascript : activity) … • Multi-line string constants message := '''One Two Three'''; • Code generator improvements • Faster archive generation • Faster syntax checking
  • 14. Spark • “An open source distributed general-purpose cluster-computing framework” • Reading from spark • Files and indexes. • Filter rows • Select fields required • N to M parallel reads • Writing from spark • File security • Spark cluster installation
  • 17. Log Data Visualizations https://hpccsystems.com/blog/ELK_visualizations
  • 18. VS Code https://code.visualstudio.com/
  • 21. User Security • Session management • Avoid resending credentials • Users can log out • Allow sessions lock and time out • Minimize time passwords retained
  • 22. System security • Spark • File access rights • Dafilesrv authentication of requests • The cloud • Verifying components • Encryption in transit • ROXIE HTTPS support
  • 24. Thor • Keyed Join (HPCC-16476)
  • 25. Thor • LOOP • Synchronization overhead • LOCAL LOOP bodies • Child Queries • Reduced overhead • Improvements to buffering • Faster Startup
  • 26. Index improvements • Example database containing 250M unique items with 1000 updates each minute • Hourly: 60K rows, 0.02% of total • Daily: 1.4M rows, 0.6% of total • Weekly: 10M rows, 4% of total • Monthly: 43M rows, 17% of total • Historical: 520M rows, 100% of total
  • 27. Index improvements • Bloom filters • Supports multiple filters per index • User configurable probability • Automatically created. • Richard’s blog post hpccsystems.com/blog/bloom-filters • Hash distributed keys. • When distribution fields are filtered with equalities • Easier to create co-distributed keys • Lower overhead calculating the part containing a match
  • 28. Finally • WsSQL – now part of the core • Over 1,000 pull requests since 6.4
  • 29. Talk to us! • Bloom filters - Richard Chapman • DESDL - Yanrui Ma • ELK - Rodrigo Pastrana • Thor - Jake Cobbett-Smith • Visualizations - Gordon Smith • Security - Tony Fishbeck • Spark - Rodrigo Pastrana • Config Manager - Ken Rowland
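
The tier sizes on slide 26 can be reproduced from the stated update rate. The sketch below is a back-of-envelope check in Python, not anything taken from the HPCC code base. It assumes that each tier simply holds every update made during its period, that a month is 30 days, and that the percentages are relative to the 250M unique items. The names `tier_sizes`, `UPDATES_PER_MINUTE`, `UNIQUE_ITEMS` and `MINUTES_PER_TIER` are made up for the illustration.

```python
# Assumed inputs, taken from the figures quoted on slide 26.
UPDATES_PER_MINUTE = 1000
UNIQUE_ITEMS = 250_000_000

# Minutes covered by each incremental tier (30-day month is an assumption).
MINUTES_PER_TIER = {
    "Hourly": 60,
    "Daily": 60 * 24,
    "Weekly": 60 * 24 * 7,
    "Monthly": 60 * 24 * 30,
}


def tier_sizes(updates_per_minute=UPDATES_PER_MINUTE, unique_items=UNIQUE_ITEMS):
    """Return {tier: (rows, percent of unique items)} for each tier."""
    sizes = {}
    for tier, minutes in MINUTES_PER_TIER.items():
        rows = updates_per_minute * minutes
        sizes[tier] = (rows, 100.0 * rows / unique_items)
    return sizes


if __name__ == "__main__":
    for tier, (rows, percent) in tier_sizes().items():
        print(f"{tier:8s} {rows:>12,d} rows  {percent:.3f}% of unique items")
```

Under those assumptions the results round to the slide's figures (60K, 1.4M, 10M and 43M rows), which suggests the percentages on the slide are measured against the number of unique items rather than the 520M historical rows.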

Editor's notes

  1. Good afternoon. In this presentation I am going to guide you through some of the main changes in the new version of the platform. If something catches your eye and you want to find out more, please come and chat afterwards in one of the breaks. Hopefully by the end you’ll all be dying to try it out for yourselves. [20]
  2. So, each major version of the platform is a chance for us to make significant changes to some of the foundations. The changes in 7.0 have enabled us to introduce various new features, but just as importantly they provide the scope for improvements in future releases. Let’s take the first of these as an example. The file changes came about through a combination of different requirements. First of all, we wanted to make it easier for ECL developers when file formats change. Previously, if the format of a file changed, you needed to update your own copy of the ECL definition before you could read it. It would be much better if you could continue to use the old definition until it was convenient for you to update your sources. Secondly, it can be slow reading files and indexes between clusters because the network capacity between them is often much smaller than within a cluster. If the data being transferred could be reduced by filtering and projecting remotely, it should progress much faster. Thirdly, there was a need to improve integration with other platforms, particularly Spark. So we revamped the file processing code to make it more flexible. As a bonus, in future versions it will make it easier to read other file formats, and even reduce the size of the generated C++ code. I’ll return to some of the other items in this list later, but for the rest of this presentation I’m going to group the changes into 4 main areas. [1:40]
  3. The first area is changes that improve your day to day experience as a developer. [10]
  4. EclWatch is something that all ECL developers spend quite a lot of time using – whether it is directly in a browser web page, or embedded within the eclide. We wanted to bring important information to your attention. For instance if something is wrong with your query or with the system it should be clearly presented to you, ideally on a dashboard, rather than needing to go and hunt for it. We also wanted to give you better tools to understand your queries, to dig into the detail, for example where is the time going, and what was happening at a particular point in your query. Let’s look at a few of the changes in more detail. [50]
  5. The workunit timings and graph pages have gained a Gantt chart at the top. It includes all the events in a workunit’s lifetime, tooltips provide extra details, and you can zoom in on any part of the chart. Here are 3 different examples. The first example comes from a system that is busy. It isn’t always obvious why your job took a long time to run. Was the compiler slow, was Thor busy, or is it just a slow job? Here you can quickly see that although the workunit took about 80 seconds to execute, almost one minute of that time was taken up waiting for a Thor to become available before the graph could run. The second example is that same chart zoomed in to highlight the time taken compiling a query, with a tooltip highlighting details from one of the stages. The final example is from a workunit with multiple workflow actions like persists, or independents. You can quickly see where the time has gone, and the order the graphs and subgraphs were executed in. [1:00]
  6. A new JavaScript graph viewer was introduced in 6.0, and in 7.0 it has been fully integrated into Gordon’s visualisation framework. As well as making it available for anyone to use in their visualisations, this allows other components of the visualisation framework to be easily included in the graph. For the moment Gordon has used that to add little tweaks like icons for the activity types, but I suspect he has many other ideas. [30]
  7. One problem with large queries is that the graphs can be unmanageable and take forever to display. One significant change is the graph viewer can now request a much smaller subset – for instance clicking on a subgraph in the timings list brings you to this view – which can be rendered much more quickly. [20]
  8. Our goal for improving the timings tab is simple enough – to make it easy to examine the performance of your query. Unfortunately the best way to present all the information that is available isn’t immediately obvious, but hopefully the changes we have made will be a step in the right direction. This example shows 4 different timings for a graph that reads from disk, sorts, and then writes to disk. The purple bars represent the total time within that activity, and the other coloured bars represent times for different tasks within the activity. It helps give a better idea of where the time is going and why. Again this is another area I expect to change and improve in future versions. So please let us know what sorts of comparisons would be useful to you, and how you would like them displayed. [50]
  9. Many of these changes in ECL Watch rely on the improvements to the visualisation framework, which I think is worth highlighting in its own right. If you are producing any visualisations – with or without HPCC – it would be well worth your time investigating it further. For those who don’t know, the visualisation framework is a separate open source project, held in its own GitHub repository. It provides visualisations that can pull data from various sources, especially big data. It is designed to work well with all common JavaScript frameworks, and is published in the node npm repository, which makes it trivial to include in any project. There are really two different components to the library – visualisations and communications. The visualisation side provides great functionality – like the Gantt charts and graph viewer that you saw earlier. But the framework really comes into its own when it is used in combination with HPCC. For instance you can directly render the results of your Roxie query to a chart embedded on a web page. If you are including visualizations in your ECL queries, then go along to the breakout session that Gordon is hosting later, which will cover the new version of the visualizer bundle in much more detail. [1:20]
  10. I am not going to delve into any detail on the changes within the ecl library. What I want to bring to your attention is that there are improvements in each of these areas. So whether you need to split Unicode strings into words, or process dates in different timezones, there may well be changes in 7.0 that make your job easier. We have already heard details from Dan and Roger about some of the bundle changes, and more about the visualizer is coming up in the following breakout. [30]
  11. The ESP improvements really help those who are developing web services. Dynamic ESDL has been around since 5.0, allowing service definitions to be directly deployed to ESP. But up until now quite a few services could not take advantage of it because the query received from ESP needed to be modified before being passed on to Roxie, and that modification required the use of custom C++. In 7.0 a big improvement is the introduction of custom transforms. Along with the ESDL definition you can include a specification in an XML file that takes inputs like the request, security values, etc. and uses them to modify the query that gets sent to Roxie. What it means to the web service developer is that the custom C++ code can now be replaced with an XML definition. That is probably worthwhile in itself, reducing the scope for mistakes. Even better, it means that the vast majority of services can now use DESDL and be deployed directly from the command line without having to compile C++. Perhaps most significantly, you avoid the need to bring ESP down, deploy the compiled mapping code, and then bring it up again every time a new service definition is required. DESDL is now fully integrated into ESP – it is really more like an ESP v2. It is now just another way of configuring ESP services. A few other improvements to ESP have allowed greater control when ESP instances are acting as standalone web servers. For instance, being able to connect and disconnect from Dali means that operations has control over when service definitions are updated, and allows them to isolate ESP from other parts of the system. [2:00]
  12. Version 6 added support for embedded languages like python, or MySQL, but their use was a bit restricted. For example there was no EMBED equivalent to an output statement that takes a stream of input records and is executed in parallel over all the nodes. The new activity attribute on an EMBED now allows you to achieve that. Other changes in the compiler focus on improving working with a local repository. Some examples include speeding up local syntax checking and generating the archives that are sent to eclccserver, and providing support for auto completion in editors. [40]
  13. We don’t have the resources (or the skills) to solve every problem within the HPCC code base. Instead Richard’s team concentrates on improving and extending our core functionality, while also providing you with the ability to integrate other open source projects into your solutions. Allowing other languages to create activities is part of those improvements. What else have we done? [30]
  14. You have probably heard of it, but what is Spark? According to Wikipedia it is “An open source distributed general-purpose cluster-computing framework”. That sounds awfully like HPCC, so why would you want to use it? They are similar, but HPCC and Spark have different strengths and development communities. For example Spark is particularly strong in the machine learning community, and many researchers use it to develop new machine learning algorithms. If you want to apply that work to your data you will be much more successful running those algorithms on spark, rather than trying to port them to HPCC. Another reason to use Spark might be familiarity. If your data analysts are already using spark, with a development environment they are familiar with, then they will want to continue using it. But if a group want to use Spark, and all your data is on HPCC, you have a problem. Well no longer. Version 7 allows Spark to read both files and indexes from HPCC. This allows you to use HPCC for the bulk of your data processing, and use Spark for the areas that particularly suit it. You can then export your results back to HPCC ready to be processed along with the rest of your data. If you want to experiment, then to make life even easier there will also be an optional package which will install and configure a Spark cluster on the same nodes that are used to run HPCC. Of course in 5 years time there may well be a new trendy platform. If so we will make sure that HPCC can also integrate with that platform, whatever it may be. [1:45]
  15. The log files generated by the system contain really useful information, but it can be a real pain in the neck to get at. Version 7 makes it easy to integrate an ELK stack with the system, including the ability to add Kibana dashboards into eclwatch. This integration is highly configurable, and can be useful for many different roles. For example operations can track system health, segfaults, and many other significant events. Developers can search log entries and identify problems. Here, for example, is a dashboard that shows the summary status of a complete cluster. [40]
  16. This example on the other hand provides details about a single machine within the cluster. [10]
  17. And this dashboard item can track the number of transactions per minute going through esp. If you want to know more, there is a blog post to get you started that contains various recipes for extracting different pieces of information from the logs and then visualising them within eclwatch. [20]
  18. A bit of a change of focus. What is VS Code and why do I care? Well, if you’re writing ECL on a Windows machine then ECL IDE provides a good development environment. If you’re not, then what can you do? VS Code provides the cross-platform equivalent. For those who haven’t heard of it, VS Code is a lightweight source code editor which is gaining widespread adoption. It is designed from the start to be highly customizable and extensible. It has numerous downloadable extensions for different languages, different source control systems, spell checkers, and much much more. Gordon has developed an ECL extension which allows you to use VS Code in a very similar way to ECL IDE. It is fully functional, even including auto completion, and he is actively developing it. A few brave souls might even be tempted to swap from ECL IDE to VS Code – especially if you are writing code in multiple languages, or particularly value its customizability. [60]
  19. Here is an example of what it looks like when you are editing ecl code. You can see a tree of attributes on the right, the syntax colouring in the editor and integration of the compiler errors just like ecl ide. If you want to find out more then go to Arjuna’s breakout session later today. [20]
  20. Improving security is a continual task. It was improved in 6.0, and I’m sure it will be in the list of improvements for 8.0, and the foreseeable future. So what has changed? [15]
  21. Previously there were a couple of potential problems with the way that browsers connect to eclwatch. The scheme used for authenticating users meant the user name and password were sent with each request, and because the browser sends them automatically there wasn’t a natural way to log out or connect as a different user. This has now changed so the user and password is authenticated once, and after that the connection continues using a session cookie. What practical difference will it make? You now see a different dialog to request the username and password, and once logged in there are options in the top right corner to log out and lock your session, and sessions will lock automatically after a period of inactivity. [45]
  22. Adding the capability for Spark to read Thor files is great, but it raises some security issues. There is no point verifying that ECL users have the rights to access files if Spark users can read any file they want. So along with the Spark integration, work needed to be done to ensure the access rights are checked and enforced consistently. The move to host environments in the cloud also poses extra security challenges. Depending on your level of paranoia you may want the system to: verify that you are really talking to the server you think you are; sign messages to verify that the source of a message is who they claim to be; and add encryption in transit to ensure that no one can read the data being sent between components. Version 7 contains several changes to improve this situation – for instance Roxie now supports HTTPS, which allows end-to-end encryption for Roxie queries in the cloud. [55]
  23. Finally of the four, performance is another long term goal that is always going to be on the improvements list. Here are a few areas that are worth highlighting: [10]
  24. Thor has historically been very good at performing standard joins, but not so good at keyed joins. Indeed, sometimes it has been quicker to perform a full join against an index than a keyed join. To tackle this Jake has completely reimplemented keyed joins in Thor. To give you some idea of the improvement, here is a graph of the timings from the performance suite. As you can see it is fairly dramatic! There are more details in the Jira issue if you are interested. Obviously your mileage is going to vary, but I would be very surprised if you did not see a fairly dramatic improvement in your own examples. [40]
  25. Some of the extensions to the ML library have really stretched (and sometimes broken) the LOOP activity. As a result there are fixes to the code generator and improvements to Thor, particularly reducing the synchronization between the slave nodes. The other entries on this slide are all examples of improvements to performance, which have come about in response to issues that have been reported. Hopefully they will benefit many users. [35]
  26. The final performance improvement involves indexes. Indexes are used by Roxie queries to provide quick access to data. They are however read only and do not support incremental updates, and if they are large they can be slow to build. That causes a problem if the data you are storing is constantly being updated. The common solution to this problem is to use a superindex. This is where a collection of indexes with the same structure are treated as a single index. Those sub-indexes are updated at different frequencies – for example on this diagram hourly, daily, weekly, monthly, yearly. [I have also included some typical figures for numbers of rows.] This scheme retains the quick access to the data, but also allows quick updates since the hourly index takes a fraction of the time to build because it is much smaller. This approach does have a disadvantage, though. Now instead of searching a single index file for a match, the system has to search all 5 of the sub-indexes. And since only a small proportion of the records are changed each hour, most of those searches are not going to find any matches. [1:20]
  27. This is where bloom filters help. They allow the system to quickly exclude indexes from consideration. That means that most of the time that 5 index look up will be reduced to 2 or 3. If you want to understand how they work, and how you use them from ECL, then Richard has written a great blog post for you to read. Hash distributed keys are linked because they will help you to build those incremental updates. They provide a simpler way to build distributed keys that are consistently distributed, and don’t develop problems with skew over time. [35]
  28. There have been a lot of bug fixes, improvements and new features. When I last looked there were more than 1,000 changes that were not part of the 6.x series. [40]
  29. So, while you are at the conference please make the most of your opportunity to talk to the developers. Come and ask us questions, give us feedback and suggest your crazy new ideas. If you want to know who to talk to, here are some suggestions to get you started. [15]
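
The session scheme described in note 21 (authenticate once, then carry a session token, with logout, lock and automatic lock after inactivity) can be sketched roughly as follows. This is a minimal illustration in Python, not ESP's implementation. The class `SessionStore`, its method names and the 15 minute default timeout are all hypothetical, the password check is supplied by the caller, and the current time is passed in explicitly so the behaviour is easy to follow.

```python
import secrets


class SessionStore:
    """Hypothetical in-memory session table keyed by an opaque token."""

    def __init__(self, check_password, idle_timeout_seconds=900.0):
        self._check_password = check_password      # callable(user, password) -> bool
        self._idle_timeout = idle_timeout_seconds  # assumed default: 15 minutes
        self._sessions = {}                        # token -> session state

    def login(self, user, password, now):
        """Check the credentials once and hand back a session token."""
        if not self._check_password(user, password):
            return None
        token = secrets.token_urlsafe(32)
        self._sessions[token] = {"user": user, "last_seen": now, "locked": False}
        return token

    def authorize(self, token, now):
        """Return the user for a live session, or None if unknown or locked."""
        session = self._sessions.get(token)
        if session is None:
            return None
        if now - session["last_seen"] > self._idle_timeout:
            session["locked"] = True               # lock after inactivity
        if session["locked"]:
            return None
        session["last_seen"] = now
        return session["user"]

    def lock(self, token):
        """Lock a session on request; it stays locked until unlocked."""
        if token in self._sessions:
            self._sessions[token]["locked"] = True

    def unlock(self, token, password, now):
        """Re-check the password to unlock an existing session."""
        session = self._sessions.get(token)
        if session is None or not self._check_password(session["user"], password):
            return False
        session["locked"] = False
        session["last_seen"] = now
        return True

    def logout(self, token):
        """Forget the session entirely."""
        self._sessions.pop(token, None)
```

After login only the token travels with each request, so the password does not have to be resent or retained by the browser, which is the practical difference the note describes. In a browser the token would normally be carried in a cookie.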
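
Notes 26 and 27 describe how a lookup against a superindex has to search every sub-index, and how Bloom filters let the system skip sub-indexes that cannot contain the key. The following Python sketch shows the general technique only. It is not how HPCC builds or stores its filters, and `BloomFilter`, `SubIndex` and `lookup` are hypothetical names. The sizing uses the standard Bloom filter formulas for a target false positive probability.

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two halves of one digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class SubIndex:
    """Stand-in for one sub-index: a dict of rows plus a filter over its keys."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = dict(rows)
        self.bloom = BloomFilter(len(self.rows))
        for key in self.rows:
            self.bloom.add(key)


def lookup(super_index, key):
    """Return (matches, number of sub-indexes that actually had to be searched)."""
    matches, searched = [], 0
    for sub in super_index:
        if not sub.bloom.might_contain(key):
            continue  # the filter proves the key is absent, so skip the search
        searched += 1
        if key in sub.rows:
            matches.append((sub.name, sub.rows[key]))
    return matches, searched
```

A key that was last updated months ago will usually be rejected by the hourly, daily and weekly filters, so only the larger sub-indexes are searched. An occasional false positive costs one wasted search but never a missed match.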
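
Note 27 and slide 27 also say that hash distributed keys make it cheap to calculate the part containing a match when the distribution fields are filtered with equalities, and make co-distributed keys easier to create. The sketch below illustrates the idea with a simple hash-modulo scheme in Python. The hash function, the part layout and the names `part_for`, `distribute` and `lookup_equal` are assumptions for the illustration; the talk does not describe the scheme HPCC actually uses.

```python
import zlib


def part_for(key_fields, num_parts):
    """Map the distribution fields to a part number with a stable hash."""
    data = "\x00".join(str(field) for field in key_fields).encode("utf-8")
    return zlib.crc32(data) % num_parts


def distribute(rows, key_of, num_parts):
    """Place each row in the part chosen by hashing its distribution fields."""
    parts = [[] for _ in range(num_parts)]
    for row in rows:
        parts[part_for(key_of(row), num_parts)].append(row)
    return parts


def lookup_equal(parts, key_fields, key_of):
    """Equality filter on the distribution fields: only one part is examined."""
    part = parts[part_for(key_fields, len(parts))]
    return [row for row in part if key_of(row) == tuple(key_fields)]
```

Because the part number depends only on the field values and the number of parts, two keys built with the same fields and part count are automatically co-distributed: rows with equal field values land in the same part number in both. The same property keeps incremental builds consistent with each other over time.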