© 2015 Pivotal Software, Inc. All rights reserved.
Building a Stock Prediction system with
Machine Learning using Geode, Spring XD
and Spark MLlib
William Markito
@william_markito
Fred Melo
@fredmelo_br
It's all about DATA
Data Sources
Look for patterns
Prediction
Inputs: price(x), medium avg(x), relative strength(x)
Machine Learning Model (e.g. Linear Regression)
Output: medium avg(x+1)
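The model slide can be read as an ordinary least-squares problem: each row holds price(x), medium avg(x) and relative strength(x), and the target is medium avg(x+1). The following is a minimal sketch of that idea in plain Python, not the Spark MLlib code used in the talk. The helper names `solve`, `fit_linear` and `predict` are hypothetical, and the sketch assumes the feature columns are not collinear.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (features are not collinear).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows: Sequence[Sequence[float]],
               targets: Sequence[float]) -> List[float]:
    """Least squares via the normal equations (X^T X) w = X^T y.

    Returns [intercept, w1, w2, ...]; one weight per feature column.
    """
    xs = [[1.0, *row] for row in rows]  # prepend the intercept term
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Apply the fitted weights to one (price, medium avg, strength) row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))
```

With a fitted weight list `w`, `predict(w, (price, medium_avg, strength))` gives the estimated medium avg for the next step.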
© Copyright 2014 Pivotal. All rights reserved.
Architecture (diagram)
Spring XD: Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native
Stream stages: Transform, Enrich, Filter, Split, Sink
Data sources: Real data, Simulator
Pipeline steps: Indicators, Machine Learning, Predict
Regions: /Stocks, /TechIndicators, /Predictions
Dashboard
Apache Geode (incubating)
Introduction
Introduction
A distributed, memory-based data management platform for
data-oriented apps that need:
High performance, scalability, resiliency and continuous
availability
Fast access to critical data sets
Location-aware distributed data processing
Event-driven data architecture
Concepts
Cache
In-memory storage and management for
your data
Configurable through XML, Spring, Java
API or CLI
Collection of Regions
Concepts
Region
Distributed java.util.Map on steroids
(Key/Value)
Consistent API regardless of where or how data
is stored
Observable (reactive)
Highly available, redundant across cache Member(s)
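To make the "observable map" idea concrete, here is a small single-process sketch of a key/value store that notifies listeners on every write. It is only an illustration of the concept. `ObservableRegion` and its methods are hypothetical names, not the Geode API, and nothing here is distributed or redundant.

```python
from typing import Callable, Dict, Generic, List, Optional, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")

# A listener receives (event_type, key, old_value, new_value).
Listener = Callable[[str, K, Optional[V], V], None]


class ObservableRegion(Generic[K, V]):
    """Hypothetical stand-in for a region: a map that reports its changes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[K, V] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, key: K, value: V) -> None:
        existed = key in self._data
        old = self._data.get(key)
        self._data[key] = value
        event = "update" if existed else "create"
        for listener in self._listeners:
            listener(event, key, old, value)

    def get(self, key: K) -> Optional[V]:
        return self._data.get(key)
```

A caller registers a callback with `add_listener` and then sees one `create` event for a new key and an `update` event for each later write to it.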
Concepts
Region
Local, Replicated or Partitioned
In-memory or persistent
Redundant
LRU
Overflow
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
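The shortcut names above follow a visible pattern: a base kind (LOCAL, PARTITION or REPLICATE) followed by optional modifiers. The sketch below decomposes a name into those parts. It works purely on the strings listed on the slide. `RegionTraits` and `describe_shortcut` are hypothetical helpers, and the exact runtime behaviour of each modifier should be checked against the Geode documentation.

```python
from dataclasses import dataclass

BASE_KINDS = {"LOCAL", "PARTITION", "REPLICATE"}


@dataclass(frozen=True)
class RegionTraits:
    kind: str         # LOCAL, PARTITION or REPLICATE
    redundant: bool   # name contains REDUNDANT
    persistent: bool  # name contains PERSISTENT
    overflow: bool    # name contains OVERFLOW
    heap_lru: bool    # name contains HEAP_LRU
    proxy: bool       # name contains PROXY


def describe_shortcut(name: str) -> RegionTraits:
    """Split a shortcut name into its base kind and modifier flags."""
    kind, _, modifiers = name.partition("_")
    if kind not in BASE_KINDS:
        raise ValueError(f"unknown region kind in {name!r}")
    return RegionTraits(
        kind=kind,
        redundant="REDUNDANT" in modifiers,
        persistent="PERSISTENT" in modifiers,
        overflow="OVERFLOW" in modifiers,
        heap_lru="HEAP_LRU" in modifiers,
        proxy="PROXY" in modifiers,
    )
```

For example, `describe_shortcut("PARTITION_REDUNDANT_PERSISTENT_OVERFLOW")` reports a partitioned region with the redundant, persistent and overflow flags set.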
Concepts
Member
A process that has a connection to the system
A process that has created a cache
Embeddable within your application
Client
Locator
Server
Concepts
Client cache
A process connected to the Geode server(s)
Can have a local copy of the data
Can be notified about events on the servers
Concepts
Listeners
CacheWriter / CacheListener
AsyncEventListener (queue / batch)
Parallel or Serial
Conflation
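Queueing, batching and conflation can be illustrated without any Geode code. The sketch below assumes events are simple (key, value) pairs and that conflation means keeping only the most recent value per key before a batch is handed to a listener. `conflate` and `batches` are hypothetical helpers, not part of the AsyncEventListener API.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

Event = Tuple[Hashable, Any]


def conflate(events: Iterable[Event]) -> List[Event]:
    """Keep only the latest value per key.

    A key that is updated again moves to the end of the queue, so the
    result is ordered by each key's most recent update.
    """
    latest: Dict[Hashable, Any] = {}
    for key, value in events:
        latest.pop(key, None)  # drop the stale entry, if any
        latest[key] = value
    return list(latest.items())


def batches(events: Iterable[Event], batch_size: int) -> List[List[Event]]:
    """Split queued events into consecutive batches of at most batch_size."""
    queued = list(events)
    return [queued[i:i + batch_size] for i in range(0, len(queued), batch_size)]
```

Conflating before batching trades intermediate values for fewer events, which suits consumers that only need the current state of each key.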
Apache Geode (incubating)
Currently under incubation at the Apache Software Foundation
Welcome contributions and contributors
Code and Patches
Bugs, feature requests
Documentation and content
Any form of feedback
Code
New features
Bug fixes (patches)
Writing tests
Documentation
Wiki
Web site
User guides
Community
Join our mailing lists (Ask or answer)
Become a speaker
Find and report bugs
Testing a release candidate or beta
JIRA - https://issues.apache.org/jira/browse/GEODE
GitHub - https://github.com/apache/incubator-geode
Mailing lists:
Development - dev@geode.incubator.apache.org
Users - user@geode.incubator.apache.org
Wiki - cwiki.apache.org/confluence/display/GEODE
StackOverflow - http://stackoverflow.com/questions/tagged/geode+or+gemfire
SpringXD
Introduction
Concepts
A stream is composed of modules. Each module is deployed to a container, and its
channels are bound to the transport.
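The module composition described above can be pictured as a chain of source, processors and sink. The sketch below models that chain with plain Python generators inside one process. It is not Spring XD code: there are no containers or transport, and `transform`, `filter_by` and `run_stream` are hypothetical names.

```python
from typing import Any, Callable, Iterable, Iterator, List

# A processor turns one stream of items into another.
Processor = Callable[[Iterable[Any]], Iterator[Any]]


def transform(fn: Callable[[Any], Any]) -> Processor:
    """Processor module that applies fn to every item."""
    return lambda upstream: (fn(item) for item in upstream)


def filter_by(predicate: Callable[[Any], bool]) -> Processor:
    """Processor module that passes on only the items matching predicate."""
    return lambda upstream: (item for item in upstream if predicate(item))


def run_stream(source: Iterable[Any],
               processors: List[Processor],
               sink: Callable[[Any], None]) -> None:
    """Wire source -> processors -> sink and drain the stream."""
    flow: Iterable[Any] = source
    for processor in processors:
        flow = processor(flow)
    for item in flow:
        sink(item)
```

Each stage only sees the output of the stage before it, which is what lets a real deployment place the modules in different containers.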
Apache Zeppelin
(incubating)
Introduction
Concepts
Web-based REPL
Iterative & Exploratory
Support for Data Ingestion
Concepts
Multiple interpreters
Markdown
Shell
Spark
Geode
Python…
Concepts
Sharing through URLs without Reports
Apache Spark
Introduction
Concepts
RDD
Dataframe
Driver
Worker
"An RDD in Spark is simply an immutable distributed collection of objects.
Each RDD is split into multiple partitions, which may be computed on different nodes
of the cluster. RDDs can contain any type of Python, Java, or Scala objects,
including user-defined classes."
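The two properties in that definition, immutability and partitioning, can be shown with plain tuples. This is a conceptual sketch only: it runs in one process, uses a simple round-robin split that is an assumption rather than Spark's partitioner, and the helpers `partition` and `map_partitions` are hypothetical, not the Spark API.

```python
from typing import Any, Callable, Sequence, Tuple

Partitions = Tuple[Tuple[Any, ...], ...]


def partition(items: Sequence[Any], num_partitions: int) -> Partitions:
    """Split items round-robin into immutable partitions (tuples)."""
    return tuple(tuple(items[i::num_partitions]) for i in range(num_partitions))


def map_partitions(parts: Partitions, fn: Callable[[Any], Any]) -> Partitions:
    """Apply fn to each partition independently and return a new collection.

    The input is never modified. Each inner tuple could, in principle,
    be computed on a different node.
    """
    return tuple(tuple(fn(item) for item in part) for part in parts)
```

Because a transformation returns a new collection rather than changing the old one, the original partitions remain available for other computations.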
"A DataFrame is a distributed collection of rows organized into named columns. An
abstraction for selecting, filtering and plotting structured data (as in pandas), previously
known as SchemaRDD."
Summary
• Spring XD
  • Integration
    • Spark, JDBC, Geode
    • HDFS, Twitter, File, Mail…
  • Data pipeline orchestration
  • Intuitive DSL
  • Streaming & Analytics
  • Distributed and scalable
• Apache Zeppelin
  • Web-based REPL
  • Multiple interpreters
    • Apache Spark
    • Markdown
    • Flink
    • Python
    • Geode…
  • Iterative & Exploratory
• Apache Spark
  • Fast data processing
  • Columnar queries
  • RDDs
  • Machine Learning
  • Analytics & Streaming
• Apache Geode
  • Fast data store and processing
  • In-memory & Persistent
  • Highly Consistent
  • Transaction processing
  • Thousands of concurrent clients
Source Code
http://pivotal-open-source-hub.github.io/StockInference-Spark/