Avalanche
By: Matthew Levandowski, Travis Fisher, Erik Vavro, Eric Nelson, Jonathan Hoatlin
Ideas & Definitions
Workbench / Interface
        - A sandbox environment for developing workflows that can later be used in implementations (e.g., our beer restaurant).
        The workbench acts as a secure entry point to the remote framework (a RESTful cloud service).
Block
    - A single event of data manipulation. Blocks are commonly chained together, and each block usually depends on the
    output of its preceding blocks. They can accept data from Mongo, the UI, or other blocks. Blocks inherit
    the behavior of Celery tasks.
Connection
    - An identifying route between a source block and target block.
Group Block
    - A block that encapsulates sub-blocks, used to provide a basic sense of hierarchy. Group blocks do not perform any data
    manipulation themselves and simply forward incoming data to their sub-blocks.
Workflow
        - A user-owned collection of blocks and/or group blocks and connections, described by a JSON schema. Workflows
        are built in the UI (workbench), displayed as a Graphiti JSON graph, and passed to the remote framework for
        serialization into an executable sequence of blocks (Celery tasks).
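The group block definition above can be illustrated with a small sketch. It uses plain Python classes as stand-ins (not Celery tasks), and it assumes every sub-block receives the same incoming data; the class layout and the `minimum`/`maximum` blocks are hypothetical, since the deck does not show the project's code.

```python
class Block:
    """A single data-manipulation step (stand-in, not a Celery task)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self, data):
        return self.func(data)


class GroupBlock:
    """Holds sub-blocks and forwards incoming data to each of them."""

    def __init__(self, name, sub_blocks):
        self.name = name
        self.sub_blocks = sub_blocks

    def run(self, data):
        # No manipulation here: the same input goes to every sub-block.
        return {block.name: block.run(data) for block in self.sub_blocks}


stats = GroupBlock("stats", [Block("minimum", min), Block("maximum", max)])
result = stats.run([4, 8, 15, 16, 23, 42])
```

The group only provides hierarchy: all of the work happens in the sub-blocks it wraps.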
Ideas & Definitions (cont.)
Framework
         - A RESTful cloud-based framework that mines data, serializes workflows, and performs various statistical/analytical tasks
         (powered by Celery).
Celery
       - An asynchronous task queue/job queue library based on distributed message passing.
Celery Worker
       - An external process connected to the Mongo database that executes tasks on the task queue and returns results to other
       tasks or to the main workflow task.
Task
       - A unit of execution in Celery. Blocks inherit from Task so that they can be run in Celery.
Workbench/Features
•    The administrator page allows the user to create workflows.
•    Each block has metadata so that the front end knows which connections and parameters the block needs.
•    After the user creates a block, a dynamic form is generated to collect its parameters.
•    The restaurant allows the user to generate data by ordering beers and wines.
•    History of results
•    Dataset upload
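To make the metadata idea concrete, here is a minimal sketch of how block metadata could drive form validation. The metadata layout, the `kmeans` entry, and the `validate_params` helper are all hypothetical; the deck does not show the real metadata format.

```python
# Hypothetical metadata for one block type: how many inputs it takes and
# which parameters the generated form must collect.
KMEANS_METADATA = {
    "name": "kmeans",
    "inputs": 1,
    "params": {
        "k": {"type": int, "required": True},
        "max_iter": {"type": int, "required": False, "default": 100},
    },
}


def validate_params(metadata, submitted):
    """Return cleaned parameters, or raise ValueError on bad form input."""
    cleaned = {}
    for name, spec in metadata["params"].items():
        if name not in submitted:
            if spec["required"]:
                raise ValueError(f"missing required parameter: {name}")
            cleaned[name] = spec.get("default")
            continue
        try:
            # Form fields arrive as strings, so coerce to the declared type.
            cleaned[name] = spec["type"](submitted[name])
        except (TypeError, ValueError):
            raise ValueError(
                f"parameter {name} must be {spec['type'].__name__}"
            ) from None
    return cleaned


params = validate_params(KMEANS_METADATA, {"k": "3"})
```

Because the form is derived from the metadata, adding a new block type would only require a new metadata entry rather than a new hand-written form.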
General Use Case
1. User logs in and creates a new dataset upload (the server parses it as JSON).
2. The dataset file is uploaded to the server, which generates a unique filename.
3. User creates a new block by requesting the block's parameters and building a form.
4. The form and data are validated, and the new block is created.
5. Before saving, the block model generates a unique block id and the block is added to the Graphiti canvas.
6. The block model JSON is saved to the workflow field.
7. User clicks the 'Run' button, which serializes the blocks and workflow and sends them to the backend.
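Steps 5 to 7 might look roughly like the sketch below. It is an illustration only: the `new_block` and `serialize_workflow` helpers, the field names, and the use of `uuid4` for ids are assumptions, not the project's actual code.

```python
import json
import uuid


def new_block(block_type, params):
    """Create a block model with a generated unique id (step 5)."""
    return {"id": uuid.uuid4().hex, "type": block_type, "params": params}


def serialize_workflow(blocks, connections):
    """Bundle blocks and connections into the JSON sent on 'Run' (step 7)."""
    return json.dumps({"blocks": blocks, "connections": connections})


load = new_block("load_dataset", {"filename": "orders.json"})
mean = new_block("mean", {"column": "price"})
connections = [{"source": load["id"], "target": mean["id"]}]

# This string is what would be stored in the workflow field (step 6)
# and posted to the backend.
body = serialize_workflow([load, mean], connections)
```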
Framework/Features
•      Uses Celery, a distributed task queue that runs tasks concurrently on worker processes, which improves performance
•      MongoDB is a flexible, schema-free, BSON-based (NoSQL) database
•      Parses workflows into blocks and creates Celery tasks for them


Concepts and Paradigms
•      Distributed, message-based computing
•      Metadata-based (blocks describe their own connections and parameters)
•      Choose between duck and static typing
•      Data confidence
•      Scalability
•      Modularity
•      Cloud-based RESTful service


General Use Case
1.   The workflow JSON is sent to the backend to be executed.
2.   The backend parses the workflow data and creates an executable sequence of blocks.
3.   Celery automatically queues and schedules the blocks and saves results into MongoDB.
4.   The backend returns the ids of the results to the frontend.
5.   The frontend accesses the MongoDB API to get the result data and parses it into a visually pleasing format.
6.   Django displays result views using the Highcharts JavaScript library.
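Step 2 can be sketched with the standard library alone. The example below assumes a workflow JSON with `blocks` and `connections` lists (a hypothetical layout) and uses `graphlib.TopologicalSorter` (Python 3.9+) to derive one valid execution order; it does not involve Celery or MongoDB.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow payload; the real schema is not shown in the deck.
workflow_json = """
{
  "blocks": [{"id": "load"}, {"id": "clean"}, {"id": "mean"}],
  "connections": [
    {"source": "load", "target": "clean"},
    {"source": "clean", "target": "mean"}
  ]
}
"""


def execution_order(raw):
    """Parse workflow JSON and return block ids in a dependency-safe order."""
    workflow = json.loads(raw)
    # Map each block id to the set of blocks it depends on.
    depends_on = {block["id"]: set() for block in workflow["blocks"]}
    for conn in workflow["connections"]:
        depends_on[conn["target"]].add(conn["source"])
    # static_order() raises graphlib.CycleError if the workflow has a cycle.
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(workflow_json)
```

A sequential order like this is enough for correctness; it does not yet exploit blocks that could run at the same time.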
Example Workflow
Celery Constructs
•   Chain
•   Chord
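For readers unfamiliar with the two constructs: in Celery, a chain runs tasks one after another, feeding each result to the next task, while a chord runs a group of tasks in parallel and then passes the list of their results to a callback. The sketch below imitates those semantics with the standard library only; `run_chain` and `run_chord` are hypothetical helpers, not Celery's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def run_chain(value, *steps):
    """Apply steps in order, passing each result to the next step."""
    return reduce(lambda acc, step: step(acc), steps, value)


def run_chord(header, callback):
    """Run header callables in parallel, then call callback on all results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in header]
        results = [future.result() for future in futures]  # keeps header order
    return callback(results)


chained = run_chain(2, lambda x: x + 3, lambda x: x * 10)
chorded = run_chord([lambda: 1, lambda: 2, lambda: 3], sum)
```

A chain expresses a straight line and a chord expresses a fan-in; a general workflow graph can mix both shapes.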
What we need
•   Common Dependencies
•   Multiple Inputs
Solution:
Parallel Topological Sort
1. Blocks without dependencies are started.
2. b0 finishes; b3 is started.
3. b1 finishes; b2 and b4 are started.
4. b2, b3, and b4 finish; b5 is started.
5. b5 finishes.
•   Result ids are returned when all blocks finish
•   The data stays in Mongo
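The walkthrough above can be sketched as a small scheduler that starts each block as soon as all of its dependencies have finished. This is a minimal illustration using threads in place of Celery workers; the dependency graph is inferred from the steps above, and `run_parallel` is a hypothetical helper, not the project's implementation.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Dependencies inferred from the walkthrough (an assumption):
# block -> blocks whose output it needs.
DEPENDS_ON = {
    "b0": set(),
    "b1": set(),
    "b2": {"b1"},
    "b3": {"b0"},
    "b4": {"b1"},
    "b5": {"b2", "b3", "b4"},
}


def run_parallel(depends_on, run_block):
    """Start every block as soon as all of its dependencies have finished."""
    remaining = {block: set(deps) for block, deps in depends_on.items()}
    finished, running = [], {}
    with ThreadPoolExecutor() as pool:
        while remaining or running:
            # Submit all blocks whose dependencies are satisfied.
            ready = [b for b, deps in remaining.items() if not deps]
            for block in ready:
                del remaining[block]
                running[pool.submit(run_block, block)] = block
            if not running:
                raise ValueError("workflow contains a cycle")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                block = running.pop(future)
                future.result()  # re-raise any error from the block
                finished.append(block)
                for deps in remaining.values():
                    deps.discard(block)
    return finished


# The stand-in block body does no real work; only the ordering matters here.
order = run_parallel(DEPENDS_ON, lambda block: block.upper())
```

The returned list records completion order, so independent blocks may appear in either order between runs, but a block never appears before any of its dependencies.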
Framework/Algorithms
•   Basic Statistics
     o Mean, Median, Mode
     o Standard Deviation
     o Variance
     o Maximum, Minimum
•   Set Theory
     o Union
     o Intersection
     o Difference
     o Sorting
•   Apriori Algorithm
•   K-Means Clustering
•   Outlier Detection (Density-Based Clustering)
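As a quick illustration of the simpler entries in this list, the sketch below computes the basic statistics and set operations with Python's standard library. The framework itself lists NumPy and SciPy; the `statistics` module is used here only to keep the example dependency-free, and the sample data is made up.

```python
import statistics

prices = [4.0, 5.5, 5.5, 7.0, 8.0]  # made-up sample data

summary = {
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
    "mode": statistics.mode(prices),
    "stdev": statistics.pstdev(prices),        # population standard deviation
    "variance": statistics.pvariance(prices),  # population variance
    "min": min(prices),
    "max": max(prices),
}

monday = {"stout", "lager", "merlot"}
tuesday = {"lager", "pilsner"}

both_days = monday & tuesday      # intersection
either_day = monday | tuesday     # union
monday_only = monday - tuesday    # difference
ranked = sorted(either_day)       # sorting
```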
Demo
Workbench Technology
•   Django – Python-based web framework
•   jQuery – cross-browser JavaScript library designed to simplify client-side scripting of HTML, with Ajax support
•   Twitter Bootstrap Framework – HTML and CSS-based design templates for typography, forms, buttons, charts, navigation
    and other interface components, as well as optional JavaScript extensions.
•   Gargoyle – toggleable feature switches for the administrator interface
•   HTML5 Canvas – dynamic, scriptable rendering of 2D shapes and bitmap images


Problems Encountered?
•   The HTML5 Canvas GUI frontend does not work correctly in all browsers
•   Getting Django and jQuery UI drag and drop to work together
•   Django has a steep learning curve
Framework Technology
•   Celery
•   MongoDB
•   NumPy
•   SciPy
•   scikit-learn
•   Flask


Problems Encountered?


•   Celery has a steep initial learning curve
•   Spent a lot of time revising the structure of workflows and blocks
•   Machine learning algorithms are difficult
•   Coordination of data formats was difficult to address between the front and back end
