SlideShare une entreprise Scribd logo
1  sur  17
The University of Phoenix
Wins Big with SAS® Grid
Computing

             Daqing Zhao, PhD, VP Analytics
                      Aptimus, Apollo Group
                            March 24,2009
Business Context
                                Apollo’s Univ. of Phoenix
                                     • Largest private university with 360K students world wide
                                     • Largest online student population (80%)
                                     • Over 200 campuses
                                Associates, Bachelors, Masters and PhDs, as well as High School
                                programs
                                     • US has 18 million college students and 70 million US adult population don’t
                                       have college degree
                                Aptimus is marketing arm of Apollo edu institutions
                                     • Large number of student leads per month and 30K enrollments
                                     • Large amount of data, high economic impact
                                     • Need sophisticated analyses, and scalable data processing and statistical
                                       computing power




Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
CRM Strategy
                             Every student can have own optimal learning
                             environment
                             Right message to the right person at the right time
                             Marketing and recruiting, relevant message to recruit
                             only people who are truly interested in education
                             Student segmentation for services, and predict retention
                             Identify risk factors, goals and objectives
                             Improve quality of education and student experience and
                             graduation rate



Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Data and Analytics Strategy
                                Student and faculty event level data
                                     •        Banner and search behavior
                                     •        Web site behavior
                                     •        Email behavior
                                     •        Demographics and psychographics
                                     •        Call center behavior
                                     •        Classroom behavior
                                     •        Surveys and web logs
                                     •        Experimental designs
                                For Internet media players, Yahoo, Google, MySpace, etc., the game
                                is to optimize from impression to click to lead form submission
                                For University of Phoenix, we manage and optimize not only these,
                                but also lead to enrollment conversion, student educational
                                experience, student retentions
                                Large data problems


Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Grid computing
                             The last couple of years, Moore’s Law has flatten off
                                 • Memory bandwidth is the bottleneck for processing power of a single CPU
                                 • Not clock speed
                             Parallel computing
                                 • Multiple processors access the same copy of memory
                                 • Expensive
                                 • Many SAS installations are still on these big servers
                             Distributed or grid computing
                                 • Multiple CPUs on multiple computers, have own memory
                                 • High Ethernet bandwidth
                                 • COTS computers (consumer off the shelf)
                             Only grid computing guarantees to scale
                                 • SAS some procs are multithreaded, others are not
                                 • For large datasets, disk IO is the bottleneck for processing
                             SAS Grid can be deployed on a cluster of inexpensive servers




Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
SAS grid computing architecture
                                                                                                                                                 Sun X4600, slaves


                                                                                                                                                                                …....
                                                                                                                                                                                                                                                        Fiber channel

                                                       Ethernet


                                                                                                                                                                   Sun T5220
                                                                                                                                                               grid manager




                                                                                                                                                                                                                                                      40TB SAN, ST9990V




Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Sun Hardware for SAS Grid
                                Four identical low cost Sun X4600 servers, four processors each,
                                duo core AMD 3.0GHz processors, 64GB RAM per server
                                     • Solaris 10
                                     • Total 32 nodes
                                     • Quad Gigabit Ethernet
                                40TB of SAN clustered file system, ST9990V
                                     • Some 146GB disks, two fiber channel controllers, 30 MB/sec disk IO
                                       throughput per core, no backups, RAID 5
                                        − SAS temp work space
                                        − Project directories and SAS install directory
                                     • Some 300GB disks that are backed up, RAID 5
                                Grid Manager and Meta Data server on Sun T5220 machine
                                All servers attached to the shared 40TB SAN disk




Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
SAS Installs and Usage Scenarios
                             Single SAS installation on the shared disk
                                 • All slaves read from the same SAS install
                                 • No need to install SAS for new servers
                                 • Grid manager and meta data server
                             Analyst can log in each server or servers independently for smaller
                             tasks
                                 • All servers are interchangeable
                                 • Each server is quite powerful
                             All servers can work together as a grid on larger tasks
                                 • With simple modifications of code, no need to change code as data scale
                                 • Every server can access the grid
                                 • SAS Grid Manager manages jobs on the nodes
                             SAS EM and EG connect to the grid and submit jobs


Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
SAS Setup at University of Phoenix
                                Servers reside at the same location as the data warehouse
                                     • No need to transfer large amount of data over the Internet
                                We have all SAS modules installed as fat clients, with all the SAS
                                functions available on PCs
                                     • Smaller data sets can be sliced and diced locally
                                     • Development can be done on Windows platform
                                System is scalable and fault tolerant, reduce IT SLA requirement
                                     • If one or two servers are down, we can still process data, just slower
                                     • Meta data and grid manager server also has a fail over server
                                     • We can increase computing power by simply adding additional servers
                                         − No need to upgrade entire server system
                                         − We can add newer model machines to work with existing nodes
                                         − No need even to install SAS on new machines




Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Example: Web media optimization
                                Web log data for media optimization
                                Optimize clicks and application and enrollment yield, at CPM, CPC,
                                and CPL
                                Optimize by exposure, session, user behavior and targeted
                                messages
                                Data are at user impression level information, based on anonymous
                                cookie identifier
                                     • For example, one website has 25GB data per day, or 750GB per month
                                     • Longitudinal data sets with data appends and feature generation in the TB
                                       range
                                     • Low signal from large amount of data
                                Disk IO is bottleneck in processing speed



Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Grid Solution
                             Using SAS Base, SAS Connect, and SAS Grid Manager
                                 • Allow each node to process one day of data, and reading and writing
                                   compressed files
                                 • Aggregate and process user level data and take appropriate samples
                                 • SAN disk supports high disk IO speed
                             Build predictive models using SAS/stat and SAS Enterprise Miner
                                 • CPU intensive
                                 • Multiple models on the same data set to optimize model performance
                                 • Many models for different targets at the same time
                             With sufficient computing power, we can add:
                                 •         Complexity of targeting and optimization
                                 •         Update models more frequently
                                 •         Search large set of predictors
                                 •         Get responsive, interactive queries on large data sets




Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Web log processing example
                         Single day log file processing takes 30 minutes (14MB/sec) CPU time
                              • On single SAS session, it takes 15 hours to process 30 days

                         Using 15 nodes on a grid, parallel read and write
                              • 30 minutes to process 15 days
                              • 1 hour to process 30 days, and 1 hour to aggregate (200MB/sec)
                              • Total 2 hours to process one month of data

                         Sorting, 8 hours CPU time, but 5 hours real time
                              • Cannot utilize grid directly
                              • PROC SORT is multithreaded
                              • Total 7 hours processing time is enough for us

                         Alternative, more elaborate strategy is to split each day file based on
                         sorting key
                              • Aggregate by buckets of keys and sort separately
                              • Aggregate all at the end

Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Grid search for best models




Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Why Sun/Windows for server/client
                                Sun consultants are part of the team, providing great services from
                                design to implementation
                                Solaris 10 a mature operating system
                                OS support for fast disk IO
                                SAS on Windows with GUI to simplify development efforts and achieve
                                high productivity
                                Complete development tools for additional usage of the servers, so that
                                we don’t compromise flexibility for other available code, C++, Perl, R,
                                etc.
                                Large number of open source software available on Solaris, if needed
                                Low cost high power Sun servers readily available
                                Blazing fast, rock solid, very responsive, and easy to use
                                Favored by system administrators as well as end users



Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Why we chose SAS
                                SAS consultants are part of the team, providing great services from design to
                                implementation
                                SAS is the most tested software, same scripts run on different platforms, 30+ years in
                                the market, widely used by drug companies, financial institutions, and in academic
                                research
                                Powerful SAS base, stat, OR, ETS, allow us to transform data, sample data, and
                                generate reports efficiently and easily
                                SAS’s efficient use of work temp disk space as well as RAM allows processing of
                                extremely large data sets
                                SAS ACCESS connect to Oracle and MySQL databases easily
                                SAS EG allow easy access of SAS by analysts
                                SAS EM power model building tools, grid enabled, with convenient GUI
                                SAS Text Miner for web page classifications
                                SAS Grid Manager schedules tasks optimally
                                Great SAS user community, excellent SAS tech support and training

Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Conclusion
                                      What we have is not ideal, not “share nothing”
                                      But we can implement many similar strategies to scale
                                           • Parallel read/write/compute
                                           • Also benefit from multithreads

                                      The setup provides computation horsepower for our data size
                                      and complexity of analysis
                                      Scalability and fault tolerance
                                      Flexibility of conducting analysis using SAS
                                      Cost and performance
                                      High computing power make complex queries and analysis
                                      more or less interactive
                                      We evaluated TeraData, and will continue to, but what we have
                                      seems sufficient for now

Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Contenu connexe

En vedette

6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGYGeorge Beaton
 
Transforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux ContainersTransforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux ContainersGiovanni Galloro
 
Cloud Computing Security (Final Year Project) by Pavlos Stefanis
Cloud Computing Security (Final Year Project) by Pavlos StefanisCloud Computing Security (Final Year Project) by Pavlos Stefanis
Cloud Computing Security (Final Year Project) by Pavlos StefanisPavlos Stefanis
 
Virtualization in cloud computing ppt
Virtualization in cloud computing pptVirtualization in cloud computing ppt
Virtualization in cloud computing pptMehul Patel
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple pptAgarwaljay
 
Cloud computing project report
Cloud computing project reportCloud computing project report
Cloud computing project reportNaveed Farooq
 

En vedette (11)

6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
6 STEPS TO CREATE A SUCCESSFUL BUSINESS INTELLIGENCE STRATEGY
 
CS298_presentation
CS298_presentationCS298_presentation
CS298_presentation
 
Transforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux ContainersTransforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux Containers
 
Cloud Computing Security (Final Year Project) by Pavlos Stefanis
Cloud Computing Security (Final Year Project) by Pavlos StefanisCloud Computing Security (Final Year Project) by Pavlos Stefanis
Cloud Computing Security (Final Year Project) by Pavlos Stefanis
 
Virtual machine
Virtual machineVirtual machine
Virtual machine
 
Virtualization in cloud computing ppt
Virtualization in cloud computing pptVirtualization in cloud computing ppt
Virtualization in cloud computing ppt
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Data mining
Data miningData mining
Data mining
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple ppt
 
Cloud computing project report
Cloud computing project reportCloud computing project report
Cloud computing project report
 
cloud computing ppt
cloud computing pptcloud computing ppt
cloud computing ppt
 

Similaire à SAS Cloud Computing and MapReduce

CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...
CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...
CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...Felix Liao
 
Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Zaloni
 
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine LearningAUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine LearningSandesh Rao
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLAmazon Web Services
 
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGIntroducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGSandesh Rao
 
Data Management for High Performance Analytics
Data Management for High Performance AnalyticsData Management for High Performance Analytics
Data Management for High Performance AnalyticsMary Snyder
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopDataWorks Summit
 
Virtual SAP Upgrade Tools in the Cloud
Virtual SAP Upgrade Tools in the CloudVirtual SAP Upgrade Tools in the Cloud
Virtual SAP Upgrade Tools in the Cloudsaumw89402
 
Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...DataWorks Summit
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaSandesh Rao
 
Deep dive session - how to achieve database freedom
Deep dive session - how to achieve database freedomDeep dive session - how to achieve database freedom
Deep dive session - how to achieve database freedomRitesh Toshniwal
 
SAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William NadolskiSAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William NadolskiWilliam Nadolski
 
Memory Management Made Simple: Kevin Mcghee, Madelyn Olson
Memory Management Made Simple: Kevin Mcghee, Madelyn OlsonMemory Management Made Simple: Kevin Mcghee, Madelyn Olson
Memory Management Made Simple: Kevin Mcghee, Madelyn OlsonRedis Labs
 
Virtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise ScaleVirtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise ScaleDenodo
 
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...Amazon Web Services
 
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLMBIM User Day
 

Similaire à SAS Cloud Computing and MapReduce (20)

CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...
CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...
CDAO Public Sector 2018 Presentation - Forces Shaping Data Science and Machin...
 
Machine Learning: Decision Trees
Machine Learning: Decision TreesMachine Learning: Decision Trees
Machine Learning: Decision Trees
 
Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...
 
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine LearningAUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGIntroducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
 
Data Management for High Performance Analytics
Data Management for High Performance AnalyticsData Management for High Performance Analytics
Data Management for High Performance Analytics
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
 
resume
resumeresume
resume
 
Virtual SAP Upgrade Tools in the Cloud
Virtual SAP Upgrade Tools in the CloudVirtual SAP Upgrade Tools in the Cloud
Virtual SAP Upgrade Tools in the Cloud
 
Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
 
UPES-First Indian University to implement SAP
UPES-First Indian University to implement SAPUPES-First Indian University to implement SAP
UPES-First Indian University to implement SAP
 
Deep dive session - how to achieve database freedom
Deep dive session - how to achieve database freedomDeep dive session - how to achieve database freedom
Deep dive session - how to achieve database freedom
 
The Big Data Dream Team
The Big Data Dream TeamThe Big Data Dream Team
The Big Data Dream Team
 
SAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William NadolskiSAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William Nadolski
 
Memory Management Made Simple: Kevin Mcghee, Madelyn Olson
Memory Management Made Simple: Kevin Mcghee, Madelyn OlsonMemory Management Made Simple: Kevin Mcghee, Madelyn Olson
Memory Management Made Simple: Kevin Mcghee, Madelyn Olson
 
Virtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise ScaleVirtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise Scale
 
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
 
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
 

Dernier

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Dernier (20)

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

SAS Cloud Computing and MapReduce

  • 1. The University of Phoenix Wins Big with SAS® Grid Computing Daqing Zhao, PhD, VP Analytics Aptimus, Apollo Group March 24,2009
  • 2. Business Context Apollo’s Univ. of Phoenix • Largest private university with 360K students world wide • Largest online student population (80%) • Over 200 campuses Associates, Bachelors, Masters and PhDs, as well as High School programs • US has 18 million college students and 70 million US adult population don’t have college degree Aptimus is marketing arm of Apollo edu institutions • Large number of student leads per month and 30K enrollments • Large amount of data, high economic impact • Need sophisticated analyses, and scalable data processing and statistical computing power Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 3. CRM Strategy Every student can have own optimal learning environment Right message to the right person at the right time Marketing and recruiting, relevant message to recruit only people who are truly interested in education Student segmentation for services, and predict retention Identify risk factors, goals and objectives Improve quality of education and student experience and graduation rate Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 4. Data and Analytics Strategy Student and faculty event level data • Banner and search behavior • Web site behavior • Email behavior • Demographics and psychographics • Call center behavior • Classroom behavior • Surveys and web logs • Experimental designs For Internet media players, Yahoo, Google, MySpace, etc., the game is to optimize from impression to click to lead form submission For University of Phoenix, we manage and optimize not only these, but also lead to enrollment conversion, student educational experience, student retentions Large data problems Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 5. Grid computing The last couple of years, Moore’s Law has flatten off • Memory bandwidth is the bottleneck for processing power of a single CPU • Not clock speed Parallel computing • Multiple processors access the same copy of memory • Expensive • Many SAS installations are still on these big servers Distributed or grid computing • Multiple CPUs on multiple computers, have own memory • High Ethernet bandwidth • COTS computers (consumer off the shelf) Only grid computing guarantees to scale • SAS some procs are multithreaded, others are not • For large datasets, disk IO is the bottleneck for processing SAS Grid can be deployed on a cluster of inexpensive servers Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 6. SAS grid computing architecture Sun X4600, slaves ….... Fiber channel Ethernet Sun T5220 grid manager 40TB SAN, ST9990V Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 7. Sun Hardware for SAS Grid Four identical low cost Sun X4600 servers, four processors each, duo core AMD 3.0GHz processors, 64GB RAM per server • Solaris 10 • Total 32 nodes • Quad Gigabit Ethernet 40TB of SAN clustered file system, ST9990V • Some 146GB disks, two fiber channel controllers, 30 MB/sec disk IO throughput per core, no backups, RAID 5 − SAS temp work space − Project directories and SAS install directory • Some 300GB disks that are backed up, RAID 5 Grid Manager and Meta Data server on Sun T5220 machine All servers attached to the shared 40TB SAN disk Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 8. SAS Installs and Usage Scenarios Single SAS installation on the shared disk • All slaves read from the same SAS install • No need to install SAS for new servers • Grid manager and meta data server Analyst can log in each server or servers independently for smaller tasks • All servers are interchangeable • Each server is quite powerful All servers can work together as a grid on larger tasks • With simple modifications of code, no need to change code as data scale • Every server can access the grid • SAS Grid Manager manages jobs on the nodes SAS EM and EG connect to the grid and submit jobs Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 9. SAS Setup at University of Phoenix Servers reside at the same location as the data warehouse • No need to transfer large amount of data over the Internet We have all SAS modules installed as fat clients, with all the SAS functions available on PCs • Smaller data sets can be sliced and diced locally • Development can be done on Windows platform System is scalable and fault tolerant, reduce IT SLA requirement • If one or two servers are down, we can still process data, just slower • Meta data and grid manager server also has a fail over server • We can increase computing power by simply adding additional servers − No need to upgrade entire server system − We can add newer model machines to work with existing nodes − No need even to install SAS on new machines Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 10. Example: Web media optimization Web log data for media optimization Optimize clicks and application and enrollment yield, at CPM, CPC, and CPL Optimize by exposure, session, user behavior and targeted messages Data are at user impression level information, based on anonymous cookie identifier • For example, one website has 25GB data per day, or 750GB per month • Longitudinal data sets with data appends and feature generation in the TB range • Low signal from large amount of data Disk IO is bottleneck in processing speed Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 11. Grid Solution Using SAS Base, SAS Connect, and SAS Grid Manager • Allow each node to process one day of data, and reading and writing compressed files • Aggregate and process user level data and take appropriate samples • SAN disk supports high disk IO speed Build predictive models using SAS/stat and SAS Enterprise Miner • CPU intensive • Multiple models on the same data set to optimize model performance • Many models for different targets at the same time With sufficient computing power, we can add: • Complexity of targeting and optimization • Update models more frequently • Search large set of predictors • Get responsive, interactive queries on large data sets Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 12. Web log processing example Single day log file processing takes 30 minutes (14MB/sec) CPU time • On single SAS session, it takes 15 hours to process 30 days Using 15 nodes on a grid, parallel read and write • 30 minutes to process 15 days • 1 hour to process 30 days, and 1 hour to aggregate (200MB/sec) • Total 2 hours to process one month of data Sorting, 8 hours CPU time, but 5 hours real time • Cannot utilize grid directly • PROC SORT is multithreaded • Total 7 hours processing time is enough for us Alternative, more elaborate strategy is to split each day file based on sorting key • Aggregate by buckets of keys and sort separately • Aggregate all at the end Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 13. Grid search for best models Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 14. Why Sun/Windows for server/client Sun consultants are part of the team, providing great services from design to implementation Solaris 10 a mature operating system OS support for fast disk IO SAS on Windows with GUI to simplify development efforts and achieve high productivity Complete development tools for additional usage of the servers, so that we don’t compromise flexibility for other available code, C++, Perl, R, etc. Large number of open source software available on Solaris, if needed Low cost high power Sun servers readily available Blazing fast, rock solid, very responsive, and easy to use Favored by system administrators as well as end users Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 15. Why we chose SAS SAS consultants are part of the team, providing great services from design to implementation SAS is the most tested software, same scripts run on different platforms, 30+ years in the market, widely used by drug companies, financial institutions, and in academic research Powerful SAS base, stat, OR, ETS, allow us to transform data, sample data, and generate reports efficiently and easily SAS’s efficient use of work temp disk space as well as RAM allows processing of extremely large data sets SAS ACCESS connect to Oracle and MySQL databases easily SAS EG allow easy access of SAS by analysts SAS EM power model building tools, grid enabled, with convenient GUI SAS Text Miner for web page classifications SAS Grid Manager schedules tasks optimally Great SAS user community, excellent SAS tech support and training Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 16. Conclusion What we have is not ideal, not “share nothing” But we can implement many similar strategies to scale • Parallel read/write/compute • Also benefit from multithreads The setup provides computation horsepower for our data size and complexity of analysis Scalability and fault tolerance Flexibility of conducting analysis using SAS Cost and performance High computing power make complex queries and analysis more or less interactive We evaluated TeraData, and will continue to, but what we have seems sufficient for now Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
  • 17. Copyright © 2008, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.