SlideShare une entreprise Scribd logo
1  sur  44
Cloud Computing Lecture #1
What is Cloud Computing?
(and an intro to parallel/distributed processing)




                             Jimmy Lin
                             The iSchool
                             University of Maryland

                             Wednesday, September 3, 2008


       Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet,
       Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License)
       This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States
       See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
Source: http://www.free-pictures-photos.com/
What is Cloud Computing?
1.   Web-scale problems
2.   Large data centers
3.   Different models of computing
4.   Highly-interactive Web applications




                                                       The iSchool
                                           University of Maryland
1. Web-Scale Problems
   Characteristics:
       Definitely data-intensive
       May also be processing intensive
   Examples:
       Crawling, indexing, searching, mining the Web
       “Post-genomics” life sciences research
       Other scientific data (physics, astronomers, etc.)
       Sensor networks
       Web 2.0 applications
       …




                                                                     The iSchool
                                                         University of Maryland
How much data?
   Wayback Machine has 2 PB + 20 TB/month (2006)
   Google processes 20 PB a day (2008)
   “all words ever spoken by human beings” ~ 5 EB
   NOAA has ~1 PB climate data (2007)
   CERN’s LHC will generate 15 PB a year (2008)

                           640K ought to be
                           enough for anybody.




                                                             The iSchool
                                                 University of Maryland
Maximilien Brice, © CERN
Maximilien Brice, © CERN
There’s nothing like more data!
      s/inspiration/data/g;




                                          The iSchool
(Banko and Brill, ACL 2001)
                              University of Maryland
(Brants et al., EMNLP 2007)
What to do with more data?
          Answering factoid questions
                 Pattern matching on the Web
                 Works amazingly well
                            Who shot Abraham Lincoln? → X shot Abraham Lincoln

          Learning relations
                 Start with seed instances
                 Search for patterns on the Web
                 Using patterns to find more instances
                                                            Wolfgang Amadeus Mozart (1756 - 1791)
                                                            Einstein was born in 1879

               Birthday-of(Mozart, 1756)
               Birthday-of(Einstein, 1879)
                                                                    PERSON (DATE –
                                                                    PERSON was born in DATE

                                                                                                    The iSchool
(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )                   University of Maryland
2. Large Data Centers
   Web-scale problems? Throw more machines at it!
   Clear trend: centralization of computing resources in large
    data centers
       Necessary ingredients: fiber, juice, and space
       What do Oregon, Iceland, and abandoned mines have in
        common?
   Important Issues:
       Redundancy
       Efficiency
       Utilization
       Management



                                                               The iSchool
                                                   University of Maryland
Source: Harper’s (Feb, 2008)
Maximilien Brice, © CERN
Key Technology: Virtualization


                           App      App          App

    App     App      App   OS       OS           OS

      Operating System           Hypervisor

          Hardware               Hardware

      Traditional Stack      Virtualized Stack




                                                   The iSchool
                                       University of Maryland
3. Different Computing Models
     “Why do it yourself if you can pay someone to do it for you?”

   Utility computing
       Why buy machines when you can rent cycles?
       Examples: Amazon’s EC2, GoGrid, AppNexus
   Platform as a Service (PaaS)
       Give me nice API and take care of the implementation
       Example: Google App Engine
   Software as a Service (SaaS)
       Just run it for me!
       Example: Gmail



                                                                         The iSchool
                                                             University of Maryland
4. Web Applications
   A mistake on top of a hack built on sand held together by
    duct tape?
   What is the nature of software applications?
       From the desktop to the browser
       SaaS == Web-based applications
       Examples: Google Maps, Facebook
   How do we deliver highly-interactive Web-based
    applications?
       AJAX (asynchronous JavaScript and XML)
       For better, or for worse…




                                                             The iSchool
                                                 University of Maryland
What is the course about?
   MapReduce: the “back-end” of cloud computing
       Batch-oriented processing of large datasets
   Ajax: the “front-end” of cloud computing
       Highly-interactive Web-based applications
   Computing “in the clouds”
       Amazon’s EC2/S3 as an example of utility computing




                                                                  The iSchool
                                                      University of Maryland
Amazon Web Services
   Elastic Compute Cloud (EC2)
       Rent computing resources by the hour
       Basic unit of accounting = instance-hour
       Additional costs for bandwidth
   Simple Storage Service (S3)
       Persistent storage
       Charge by the GB/month
       Additional costs for bandwidth
   You’ll be using EC2/S3 for course assignments!




                                                               The iSchool
                                                   University of Maryland
This course is not for you…
    If you’re not genuinely interested in the topic
    If you’re not ready to do a lot of programming
    If you’re not open to thinking about computing in new ways
    If you can’t cope with uncertainly, unpredictability, poor
     documentation, and immature software
    If you can’t put in the time



             Otherwise, this will be a richly rewarding course!



                                                                    The iSchool
                                                        University of Maryland
Source: http://davidzinger.wordpress.com/2007/05/page/2/
Cloud Computing Zen
   Don’t get frustrated (take a deep breath)…
       This is bleeding edge technology
       Those W$*#T@F! moments
   Be patient…
       This is the second first time I’ve taught this course
   Be flexible…
       There will be unanticipated issues along the way
   Be constructive…
       Tell me how I can make everyone’s experience better




                                                                      The iSchool
                                                          University of Maryland
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Things to go over…
   Course schedule
   Assignments and deliverables
   Amazon EC2/S3




                                               The iSchool
                                   University of Maryland
Web-Scale Problems?
   Don’t hold your breath:
       Biocomputing
       Nanocomputing
       Quantum computing
       …
   It all boils down to…
       Divide-and-conquer
       Throwing more hardware at the problem




                       Simple to understand… a lifetime to master…

                                                                       The iSchool
                                                           University of Maryland
Divide and Conquer

                 “Work”
                                       Partition


        w1         w2         w3


      “worker”   “worker”   “worker”


         r1         r2         r3




                 “Result”              Combine


                                                     The iSchool
                                         University of Maryland
Different Workers
   Different threads in the same core
   Different cores in the same CPU
   Different CPUs in a multi-processor system
   Different machines in a distributed system




                                                             The iSchool
                                                 University of Maryland
Choices, Choices, Choices
   Commodity vs. “exotic” hardware
   Number of machines vs. processor vs. cores
   Bandwidth of memory vs. disk vs. network
   Different programming models




                                                           The iSchool
                                               University of Maryland
Flynn’s Taxonomy

                                        Instructions

                              Single (SI)         Multiple (MI)

                                 SISD                  MISD
           Single (SD)


                            Single-threaded         Pipeline
                                process           architecture
    Data




                                SIMD                   MIMD
           Multiple (MD)




                           Vector Processing     Multi-threaded
                                                 Programming




                                                                          The iSchool
                                                              University of Maryland
SISD



                    Processor


       D   D   D        D         D   D   D




                   Instructions




                                                          The iSchool
                                              University of Maryland
SIMD

                       Processor


       D0   D0   D0       D0         D0   D0   D0
       D1   D1   D1       D1         D1   D1   D1
       D2   D2   D2       D2         D2   D2   D2
       D3   D3   D3       D3         D3   D3   D3
       D4   D4   D4       D4         D4   D4   D4
       …    …    …        …          …    …    …
       Dn   Dn   Dn       Dn         Dn   Dn   Dn



                      Instructions



                                                            The iSchool
                                                University of Maryland
MIMD
                    Processor


       D   D   D        D         D   D   D




                   Instructions

                    Processor


       D   D   D        D         D   D   D




                   Instructions


                                                          The iSchool
                                              University of Maryland
Memory Typology: Shared




           Processor            Processor

                       Memory

           Processor            Processor




                                                        The iSchool
                                            University of Maryland
Memory Typology: Distributed



       Processor   Memory             Processor   Memory


                            Network



       Processor   Memory             Processor   Memory




                                                              The iSchool
                                                  University of Maryland
Memory Typology: Hybrid


       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor


                            Network



       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor




                                                              The iSchool
                                                  University of Maryland
Parallelization Problems
    How do we assign work units to workers?
    What if we have more work units than workers?
    What if workers need to share partial results?
    How do we aggregate partial results?
    How do we know all the workers have finished?
    What if workers die?


                What is the common theme of all of these problems?



                                                                The iSchool
                                                    University of Maryland
General Theme?
   Parallelization problems arise from:
       Communication between workers
       Access to shared resources (e.g., data)
   Thus, we need a synchronization system!
   This is tricky:
       Finding bugs is hard
       Solving bugs is even harder




                                                              The iSchool
                                                  University of Maryland
Managing Multiple Workers
   Difficult because
       (Often) don’t know the order in which workers run
       (Often) don’t know where the workers are running
       (Often) don’t know when workers interrupt each other
   Thus, we need:
       Semaphores (lock, unlock)
       Conditional variables (wait, notify, broadcast)
       Barriers
   Still, lots of problems:
       Deadlock, livelock, race conditions, ...
   Moral of the story: be careful!
       Even trickier if the workers are on different machines
                                                                      The iSchool
                                                          University of Maryland
Patterns for Parallelism
    Parallel computing has been around for decades
    Here are some “design patterns” …




                                                         The iSchool
                                             University of Maryland
Master/Slaves



                master




                slaves




                                     The iSchool
                         University of Maryland
Producer/Consumer Flow




         P     C   P     C


         P     C   P     C


         P     C   P     C




                                         The iSchool
                             University of Maryland
Work Queues




        P                    C
              shared queue

        P     W W W W W      C

        P                    C




                                             The iSchool
                                 University of Maryland
Rubber Meets Road
   From patterns to implementation:
       pthreads, OpenMP for multi-threaded programming
       MPI for clustering computing
       …
   The reality:
       Lots of one-off solutions, custom code
       Write you own dedicated library, then program with it
       Burden on the programmer to explicitly manage everything
   MapReduce to the rescue!
       (for next time)




                                                                 The iSchool
                                                     University of Maryland

Contenu connexe

Similaire à Session1

ccna course 2
ccna course 2ccna course 2
ccna course 2S Sridhar
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionDarian Frajberg
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM educationSu White
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfittingLibgirlTeam
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce George Ang
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceGeorge Ang
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunamiinside-BigData.com
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7CS, NcState
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCameron Kiddle
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchUniversity of Washington
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)Andy Lenards
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmosopenforum
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Martin
 
ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning Mark S. Steed
 

Similaire à Session1 (20)

ccna course 2
ccna course 2ccna course 2
ccna course 2
 
Data Research Vision
Data Research VisionData Research Vision
Data Research Vision
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM education
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfitting
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduce
 
Empirical AI Research
Empirical AI Research Empirical AI Research
Empirical AI Research
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunami
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in Science
 
Environmental Science, Big Data and the Cloud
Environmental Science, Big Data and the CloudEnvironmental Science, Big Data and the Cloud
Environmental Science, Big Data and the Cloud
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible Research
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)
 
Cloud hpc-bigdata-challenges
Cloud hpc-bigdata-challengesCloud hpc-bigdata-challenges
Cloud hpc-bigdata-challenges
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmos
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008
 
ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning
 

Dernier

Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...lizamodels9
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03DallasHaselhorst
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Timedelhimodelshub1
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionMintel Group
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCRashishs7044
 
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...lizamodels9
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607dollysharma2066
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxMarkAnthonyAurellano
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...lizamodels9
 

Dernier (20)

Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Time
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted Version
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
 

Session1

  • 1. Cloud Computing Lecture #1 What is Cloud Computing? (and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Wednesday, September 3, 2008 Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License) This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
  • 3. What is Cloud Computing? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications The iSchool University of Maryland
  • 4. 1. Web-Scale Problems  Characteristics:  Definitely data-intensive  May also be processing intensive  Examples:  Crawling, indexing, searching, mining the Web  “Post-genomics” life sciences research  Other scientific data (physics, astronomers, etc.)  Sensor networks  Web 2.0 applications  … The iSchool University of Maryland
  • 5. How much data?  Wayback Machine has 2 PB + 20 TB/month (2006)  Google processes 20 PB a day (2008)  “all words ever spoken by human beings” ~ 5 EB  NOAA has ~1 PB climate data (2007)  CERN’s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody. The iSchool University of Maryland
  • 8. There’s nothing like more data! s/inspiration/data/g; The iSchool (Banko and Brill, ACL 2001) University of Maryland (Brants et al., EMNLP 2007)
  • 9. What to do with more data?  Answering factoid questions  Pattern matching on the Web  Works amazingly well Who shot Abraham Lincoln? → X shot Abraham Lincoln  Learning relations  Start with seed instances  Search for patterns on the Web  Using patterns to find more instances Wolfgang Amadeus Mozart (1756 - 1791) Einstein was born in 1879 Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) PERSON (DATE – PERSON was born in DATE The iSchool (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … ) University of Maryland
  • 10. 2. Large Data Centers  Web-scale problems? Throw more machines at it!  Clear trend: centralization of computing resources in large data centers  Necessary ingredients: fiber, juice, and space  What do Oregon, Iceland, and abandoned mines have in common?  Important Issues:  Redundancy  Efficiency  Utilization  Management The iSchool University of Maryland
  • 13. Key Technology: Virtualization App App App App App App OS OS OS Operating System Hypervisor Hardware Hardware Traditional Stack Virtualized Stack The iSchool University of Maryland
  • 14. 3. Different Computing Models “Why do it yourself if you can pay someone to do it for you?”  Utility computing  Why buy machines when you can rent cycles?  Examples: Amazon’s EC2, GoGrid, AppNexus  Platform as a Service (PaaS)  Give me nice API and take care of the implementation  Example: Google App Engine  Software as a Service (SaaS)  Just run it for me!  Example: Gmail The iSchool University of Maryland
  • 15. 4. Web Applications  A mistake on top of a hack built on sand held together by duct tape?  What is the nature of software applications?  From the desktop to the browser  SaaS == Web-based applications  Examples: Google Maps, Facebook  How do we deliver highly-interactive Web-based applications?  AJAX (asynchronous JavaScript and XML)  For better, or for worse… The iSchool University of Maryland
  • 16. What is the course about?  MapReduce: the “back-end” of cloud computing  Batch-oriented processing of large datasets  Ajax: the “front-end” of cloud computing  Highly-interactive Web-based applications  Computing “in the clouds”  Amazon’s EC2/S3 as an example of utility computing The iSchool University of Maryland
  • 17. Amazon Web Services  Elastic Compute Cloud (EC2)  Rent computing resources by the hour  Basic unit of accounting = instance-hour  Additional costs for bandwidth  Simple Storage Service (S3)  Persistent storage  Charge by the GB/month  Additional costs for bandwidth  You’ll be using EC2/S3 for course assignments! The iSchool University of Maryland
  • 18. This course is not for you…  If you’re not genuinely interested in the topic  If you’re not ready to do a lot of programming  If you’re not open to thinking about computing in new ways  If you can’t cope with uncertainly, unpredictability, poor documentation, and immature software  If you can’t put in the time Otherwise, this will be a richly rewarding course! The iSchool University of Maryland
  • 20. Cloud Computing Zen  Don’t get frustrated (take a deep breath)…  This is bleeding edge technology  Those W$*#T@F! moments  Be patient…  This is the second first time I’ve taught this course  Be flexible…  There will be unanticipated issues along the way  Be constructive…  Tell me how I can make everyone’s experience better The iSchool University of Maryland
  • 25. Things to go over…  Course schedule  Assignments and deliverables  Amazon EC2/S3 The iSchool University of Maryland
  • 26. Web-Scale Problems?  Don’t hold your breath:  Biocomputing  Nanocomputing  Quantum computing  …  It all boils down to…  Divide-and-conquer  Throwing more hardware at the problem Simple to understand… a lifetime to master… The iSchool University of Maryland
  • 27. Divide and Conquer “Work” Partition w1 w2 w3 “worker” “worker” “worker” r1 r2 r3 “Result” Combine The iSchool University of Maryland
  • 28. Different Workers  Different threads in the same core  Different cores in the same CPU  Different CPUs in a multi-processor system  Different machines in a distributed system The iSchool University of Maryland
  • 29. Choices, Choices, Choices  Commodity vs. “exotic” hardware  Number of machines vs. processor vs. cores  Bandwidth of memory vs. disk vs. network  Different programming models The iSchool University of Maryland
  • 30. Flynn’s Taxonomy Instructions Single (SI) Multiple (MI) SISD MISD Single (SD) Single-threaded Pipeline process architecture Data SIMD MIMD Multiple (MD) Vector Processing Multi-threaded Programming The iSchool University of Maryland
  • 31. SISD Processor D D D D D D D Instructions The iSchool University of Maryland
  • 32. SIMD Processor D0 D0 D0 D0 D0 D0 D0 D1 D1 D1 D1 D1 D1 D1 D2 D2 D2 D2 D2 D2 D2 D3 D3 D3 D3 D3 D3 D3 D4 D4 D4 D4 D4 D4 D4 … … … … … … … Dn Dn Dn Dn Dn Dn Dn Instructions The iSchool University of Maryland
  • 33. MIMD Processor D D D D D D D Instructions Processor D D D D D D D Instructions The iSchool University of Maryland
  • 34. Memory Typology: Shared Processor Processor Memory Processor Processor The iSchool University of Maryland
  • 35. Memory Typology: Distributed Processor Memory Processor Memory Network Processor Memory Processor Memory The iSchool University of Maryland
  • 36. Memory Typology: Hybrid Processor Processor Memory Memory Processor Processor Network Processor Processor Memory Memory Processor Processor The iSchool University of Maryland
  • 37. Parallelization Problems  How do we assign work units to workers?  What if we have more work units than workers?  What if workers need to share partial results?  How do we aggregate partial results?  How do we know all the workers have finished?  What if workers die? What is the common theme of all of these problems? The iSchool University of Maryland
  • 38. General Theme?  Parallelization problems arise from:  Communication between workers  Access to shared resources (e.g., data)  Thus, we need a synchronization system!  This is tricky:  Finding bugs is hard  Solving bugs is even harder The iSchool University of Maryland
  • 39. Managing Multiple Workers  Difficult because  (Often) don’t know the order in which workers run  (Often) don’t know where the workers are running  (Often) don’t know when workers interrupt each other  Thus, we need:  Semaphores (lock, unlock)  Conditional variables (wait, notify, broadcast)  Barriers  Still, lots of problems:  Deadlock, livelock, race conditions, ...  Moral of the story: be careful!  Even trickier if the workers are on different machines The iSchool University of Maryland
  • 40. Patterns for Parallelism  Parallel computing has been around for decades  Here are some “design patterns” … The iSchool University of Maryland
  • 41. Master/Slaves master slaves The iSchool University of Maryland
  • 42. Producer/Consumer Flow P C P C P C P C P C P C The iSchool University of Maryland
  • 43. Work Queues P C shared queue P W W W W W C P C The iSchool University of Maryland
  • 44. Rubber Meets Road  From patterns to implementation:  pthreads, OpenMP for multi-threaded programming  MPI for clustering computing  …  The reality:  Lots of one-off solutions, custom code  Write you own dedicated library, then program with it  Burden on the programmer to explicitly manage everything  MapReduce to the rescue!  (for next time) The iSchool University of Maryland