Cascalog Workshop
Example query
Execution

1. Pre-aggregation
2. Aggregation
3. Post-aggregation
Variable dependencies
Pre-aggregation
• Start from generator variables
• Resolve as many variables as possible using:
 • Joins
 • Functions
• Use as many filters as possible
• Join all sources into one set of tuples
Aggregation


• Group by resolved output variables
• Apply all aggregators to each group
Post-aggregation


• Resolve the rest of the variables
• Apply rest of filters
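
As a rough illustration of the three phases, the plain-Python sketch below runs them over small in-memory lists. The example query is not reproduced in this text, so the data, the choice of `?delta` as an age difference, and the final filter on `?count` are all assumptions made for illustration; this models the idea only and is not Cascalog's API or planner.

```python
from collections import defaultdict

# Hypothetical generators (sources of tuples); not from the slides.
age = [("alice", 30), ("bob", 25), ("carol", 40), ("dave", 35)]
follows = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("bob", "dave")]


def pre_aggregation():
    """Join all sources into one set of tuples, resolving variables with functions."""
    ages = dict(age)
    rows = []
    for person1, person2 in follows:          # join follows with age (twice)
        age1, age2 = ages[person1], ages[person2]
        delta = age2 - age1                   # assumed function resolving ?delta
        rows.append({"person1": person1, "person2": person2,
                     "age1": age1, "age2": age2, "delta": delta})
    return rows


def aggregation(rows):
    """Group by the resolved output variable and apply the aggregator (a count)."""
    groups = defaultdict(int)
    for row in rows:
        groups[row["delta"]] += 1
    return [{"delta": d, "count": c} for d, c in sorted(groups.items())]


def post_aggregation(rows):
    """Apply the remaining filters, here one that needs the aggregated ?count."""
    return [r for r in rows if r["count"] >= 2]


result = post_aggregation(aggregation(pre_aggregation()))
print(result)
```

The filter on `count` can only run in the last phase because its input does not exist until the aggregators have run; filters that need only joined variables belong in the first phase.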
Example query
Query planner

1. Start with generators
2. Add functions and filters until fixed point
   [?person2 ?age2 ?double-age2]
3. Do a join
   [?person1 ?person2 ?age2 ?double-age2]
4. Add functions and filters until fixed point
5. Do a join
   [?person1 ?age1 ?person2 ?age2 ?double-age2]
6. Add functions and filters until fixed point
   [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
7. Group by already satisfied output vars
   Group by ?delta
8. Execute aggregators on each group
   [?delta ?count]
9. Add functions and filters until fixed point
10. Project fields to [?delta ?count]
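
The "add functions and filters until fixed point" step can be sketched as a loop that keeps applying any operation whose inputs are already resolved, and stops when a full pass changes nothing. The operation names (`double`, `minus`) and their input and output variables below are assumptions inferred from the variable names on the slides, not the planner's real data structures.

```python
def add_until_fixed_point(resolved, pending_ops):
    """Apply every operation whose inputs are resolved; repeat until nothing changes.

    Each operation is a (name, inputs, outputs) triple. Returns the enlarged
    set of resolved variables and the operations that still cannot run.
    """
    resolved = set(resolved)
    pending = list(pending_ops)
    changed = True
    while changed:
        changed = False
        for op in list(pending):
            _name, inputs, outputs = op
            if set(inputs) <= resolved:
                resolved |= set(outputs)   # the operation resolves new variables
                pending.remove(op)
                changed = True
    return resolved, pending


# Hypothetical operations for illustration.
ops = [
    ("double", ["?age2"], ["?double-age2"]),
    ("minus", ["?age2", "?age1"], ["?delta"]),
]

# After the first generator only ?person2 and ?age2 are known.
after_first, remaining = add_until_fixed_point(["?person2", "?age2"], ops)

# Joins later bring in ?person1 and ?age1, which unblocks the second operation.
after_joins, remaining_after = add_until_fixed_point(
    after_first | {"?person1", "?age1"}, remaining
)
```

In this model a join is what enlarges the resolved set between two fixed-point passes, which is why the planner alternates the two steps.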
Cascading pipes

• Each: can occur in Map or Reduce
• GroupBy: Causes a Reduce step
• Every: One or more follow GroupBy
• CoGroup: Join implementation, causes Reduce step
To Cascading

• Each: [?person2 ?age2 ?double-age2]
• CoGroup: [?person1 ?person2 ?age2 ?double-age2]
• CoGroup: [?person1 ?age1 ?person2 ?age2 ?double-age2]
• Each, Each: [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
• GroupBy: Group by ?delta
• Every: Execute aggregators on each group, [?delta ?count]
• Each: Add functions and filters after aggregation
• Each: Project fields to [?delta ?count]
To MapReduce

• Job 1: first join (CoGroup), producing [?person1 ?person2 ?age2 ?double-age2]
• Job 2: second join (CoGroup), producing [?person1 ?age1 ?person2 ?age2 ?double-age2]
• Job 3: Group by ?delta, aggregators, and Project fields to [?delta ?count]
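
Since GroupBy and CoGroup each cause a Reduce step, the number of MapReduce jobs can be estimated from the pipe sequence. The sketch below is a simplified model, assuming every Reduce-causing pipe closes the previous job and that the pipes following it run in the same job's reducer; the function name and the flat list representation are hypothetical and not part of Cascading.

```python
# Pipes that force a Reduce step, per the "Cascading pipes" slide.
REDUCE_PIPES = {"GroupBy", "CoGroup"}


def split_into_jobs(pipes):
    """Split a linear pipe sequence into jobs, one per Reduce-causing pipe.

    Simplification: pipes before the first Reduce-causing pipe are placed in
    the map phase of the first job, and pipes after a Reduce-causing pipe stay
    in that job until the next Reduce-causing pipe appears.
    """
    jobs, current = [], []
    seen_reduce = False
    for pipe in pipes:
        if pipe in REDUCE_PIPES:
            if seen_reduce:          # a second reduce cannot share the job
                jobs.append(current)
                current = []
            seen_reduce = True
        current.append(pipe)
    if current:
        jobs.append(current)
    return jobs


# Pipe sequence read off the "To Cascading" slides.
plan = ["Each", "CoGroup", "CoGroup", "Each", "Each",
        "GroupBy", "Every", "Each", "Each"]
jobs = split_into_jobs(plan)
for number, job in enumerate(jobs, start=1):
    print(f"Job {number}: {job}")
```

Under this model the example plan needs three jobs, matching the three job labels on the slides.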
defmapop

[A1, B1, C1]  ->  [A1, B1, C1, D1, E1]
[A2, B2, C2]  ->  [A2, B2, C2, D2, E2]
[A3, B3, C3]  ->  [A3, B3, C3, D3, E3]

Appends fields to tuple
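
The behaviour shown for defmapop can be modelled in plain Python as a function that runs once per tuple and has its results appended as new fields. This is a sketch of the semantics only; `map_op` and `sum_and_product` are hypothetical names, not Cascalog functions.

```python
def map_op(tuples, fn):
    """Call fn on each tuple's fields and append what it returns as new fields."""
    return [t + tuple(fn(*t)) for t in tuples]


def sum_and_product(a, b, c):
    """Hypothetical operation producing two output fields (D and E in the slide)."""
    return (a + b + c, a * b * c)


out = map_op([(1, 2, 3), (4, 5, 6)], sum_and_product)
print(out)
```

Every input tuple yields exactly one output tuple, which is what distinguishes this from a filter or a mapcat operation.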
deffilterop

[A1, B1, C1]  ->  true   (kept)
[A2, B2, C2]  ->  false  (dropped)
[A3, B3, C3]  ->  true   (kept)

Output: [A1, B1, C1], [A3, B3, C3]
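
A filter operation can be sketched the same way: a predicate runs on each tuple and only tuples for which it returns true survive, unchanged. `filter_op` and the predicate are hypothetical names used for illustration.

```python
def filter_op(tuples, pred):
    """Keep only the tuples for which pred returns a truthy value."""
    return [t for t in tuples if pred(*t)]


def first_field_is_odd(a, b, c):
    """Hypothetical predicate; any boolean function of the fields would do."""
    return a % 2 == 1


kept = filter_op([(1, 2, 3), (4, 5, 6), (7, 8, 9)], first_field_is_odd)
print(kept)
```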
defmapcatop

Map:
[“a red dog”]  ->  [ [“a red dog”, “a”], [“a red dog”, “red”], [“a red dog”, “dog”] ]
[“ ”]          ->  []
[“hello”]      ->  [ [“hello”, “hello”] ]

Concat:
[“a red dog”, “a”]
[“a red dog”, “red”]
[“a red dog”, “dog”]
[“hello”, “hello”]
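
The Map-then-Concat behaviour of defmapcatop can be modelled as follows: the function returns zero or more results per input tuple, and the per-tuple result lists are flattened into one stream. The names `mapcat_op` and `split_words` are hypothetical; the input data mirrors the slide.

```python
def mapcat_op(tuples, fn):
    """Emit one output tuple per result returned by fn, across all input tuples."""
    out = []
    for t in tuples:
        for result in fn(*t):          # Map: zero or more results per tuple
            out.append(t + (result,))  # Concat: flatten into a single stream
    return out


def split_words(sentence):
    """Split on whitespace; a blank sentence yields no words at all."""
    return sentence.split()


out = mapcat_op([("a red dog",), (" ",), ("hello",)], split_words)
print(out)
```

The blank input produces an empty list in the Map step, so it contributes nothing to the concatenated output.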
Aggregators

Map Task 1 emits:       [“key1”, 1]  [“key3”, 3]
Map Task 2 emits:       [“key2”, 3]  [“key1”, 2]  [“key3”, 1]

Reduce Task 1 receives: [“key1”, 1]  [“key1”, 2]               ->  [“key1”, 3]
Reduce Task 2 receives: [“key2”, 3]  [“key3”, 3]  [“key3”, 1]  ->  [“key2”, 3]  [“key3”, 4]

Regular aggregators - all data goes to reducers
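
A minimal model of a regular aggregator: every map output record is shipped to the reducer that owns its key, and the aggregation (a sum here) happens only on the reduce side. The partitioning function is an assumption chosen to reproduce the slide's layout; a real job would typically partition by a hash of the key.

```python
from collections import defaultdict


def shuffle(map_outputs, num_reducers, partition):
    """Route every (key, value) record, unaggregated, to its reducer."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            reducers[partition(key)][key].append(value)
    return reducers


def reduce_sum(reducer_input):
    """Sum the values of each key seen by one reduce task."""
    return sorted((key, sum(values)) for key, values in reducer_input.items())


def partition(key):
    """Hypothetical partitioner: key1 to reducer 0, everything else to reducer 1."""
    return 0 if key == "key1" else 1


map_task_1 = [("key1", 1), ("key3", 3)]
map_task_2 = [("key2", 3), ("key1", 2), ("key3", 1)]

reducers = shuffle([map_task_1, map_task_2], 2, partition)
results = [reduce_sum(r) for r in reducers]
print(results)
```

All five records cross the network here, which is the cost that parallel aggregators reduce.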
defparallelagg

Map Task 1
  Input:          [“nathan”]  [“alice”]  [“nathan”]
  Init:           [“nathan”, 1]  [“alice”, 1]  [“nathan”, 1]
  Combine (Map):  [“nathan”, 2]  [“alice”, 1]

Map Task 2
  Input:          [“nathan”]  [“sally”]
  Init:           [“nathan”, 1]  [“sally”, 1]
  Combine (Map):  [“nathan”, 1]  [“sally”, 1]

Combine (Reduce)
  Reduce Task 1:  [“nathan”, 3]
  Reduce Task 2:  [“sally”, 1]  [“alice”, 1]

Parallel aggregators - partial aggregation done in mappers
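
The init and combine steps can be sketched as two small functions: init turns one tuple into a partial aggregate, and combine merges two partial aggregates. Because the same combine runs on both the map side and the reduce side, it needs to be associative and commutative. The function names below are hypothetical, and the code models the idea rather than defparallelagg's actual signature.

```python
def init_fn(_key):
    """Init: each occurrence of a key starts as a partial count of 1."""
    return 1


def combine_fn(left, right):
    """Combine: merge two partial counts."""
    return left + right


def map_side(records):
    """Init every record, then combine partial results inside one map task."""
    partial = {}
    for key in records:
        value = init_fn(key)
        partial[key] = combine_fn(partial[key], value) if key in partial else value
    return partial


def reduce_side(partials):
    """Combine the already-reduced partial results coming from all map tasks."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = combine_fn(total[key], value) if key in total else value
    return total


task_1 = map_side(["nathan", "alice", "nathan"])
task_2 = map_side(["nathan", "sally"])
totals = reduce_side([task_1, task_2])
print(task_1, task_2, totals)
```

Map Task 1 sends two records instead of three, so less data reaches the reducers than with a regular aggregator.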
combine

Source 1:  [1]  [2]  [3]
Source 2:  [3]  [4]  [5]

Result:    [1]  [2]  [3]  [3]  [4]  [5]
union

Source 1:  [1]  [2]  [3]
Source 2:  [3]  [4]  [5]

Result:    [1]  [2]  [3]  [4]  [5]
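
The difference between the two operations, as the slides show it, is that combine keeps duplicate tuples while union removes them. A plain-Python model, using lists of tuples as stand-ins for sources (the function bodies are an illustration, not Cascalog's implementation):

```python
def combine(*sources):
    """Concatenate the sources; duplicate tuples are kept."""
    out = []
    for source in sources:
        out.extend(source)
    return out


def union(*sources):
    """Concatenate the sources and drop duplicate tuples (first occurrence wins)."""
    seen, out = set(), []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


source_1 = [(1,), (2,), (3,)]
source_2 = [(3,), (4,), (5,)]

print(combine(source_1, source_2))
print(union(source_1, source_2))
```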
ElephantDB

Generation of domain of data:
Key/Value pairs  ->  Pre-shard and index in MapReduce  ->  Shard 0 to Shard 5 on the Distributed Filesystem

Serving domain of data:
DFS (Shard 0 to Shard 5)  ->  three ElephantDB Servers
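
The two ElephantDB slides can be modelled as a pre-sharding step followed by a routed lookup. Everything specific below is an assumption: the CRC32-modulo sharding function, the contiguous assignment of shards to servers, and all function names are hypothetical and chosen only to make the flow concrete; the slides do not say how ElephantDB actually does either.

```python
import zlib

NUM_SHARDS = 6    # Shard 0 to Shard 5, as on the slide
NUM_SERVERS = 3   # three ElephantDB servers, as on the slide


def shard_for(key, num_shards=NUM_SHARDS):
    """Assumed sharding function: a stable hash of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_domain(pairs, num_shards=NUM_SHARDS):
    """Generation: pre-shard the key/value pairs (done in MapReduce on the slide)."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards


def server_for_shard(shard, num_shards=NUM_SHARDS, num_servers=NUM_SERVERS):
    """Assumed placement: contiguous blocks of shards per server."""
    return shard * num_servers // num_shards


def lookup(key, shards):
    """Serving: find the shard for the key, then read the value from it."""
    shard = shard_for(key, len(shards))
    return server_for_shard(shard, len(shards)), shards[shard].get(key)


pairs = [("nathan", 3), ("alice", 1), ("sally", 1)]
domain = build_domain(pairs)
for key, _ in pairs:
    print(key, lookup(key, domain))
```

Because the same sharding function is used at generation time and at serving time, a reader can locate the one shard that may hold a key without consulting the others.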
