  1. http://smallbitesofbigdata.com Big Data in the Cloud. Ravi Patel, Business Intelligence Team Manager, Microsoft Certified Solution Expert (BI) ®, ravi@nealanalytics.com
  2. About Me
  3. Key Takeaways
     - Basic Big Data and Hadoop terminology
     - What projects fit well with Hadoop
     - Why Hadoop in the cloud is so powerful
     - Sample end-to-end architecture
     - See: Data, Hadoop, Hive, Streaming, Analytics, BI
     - Do: Data, Hadoop, Hive, Streaming, Analytics, BI
     - How this tech solves your business problems
  4. Your Goals
     - What are your backgrounds and needs?
     - What is your Big Data experience?
  5. Pre-Req: Azure Subscription
     - Trial: http://azure.microsoft.com/en-us/pricing/free-trial/
     - MSDN Subscription: http://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
     - Startup BizSpark: http://azure.microsoft.com/en-us/pricing/member-offers/bizspark-startups/
     - Classroom: http://www.microsoftazurepass.com/azureu
     - Pay-As-You-Go or Enterprise Agreement: http://azure.microsoft.com/en-us/pricing/
  6. Pre-Reqs
     - Azure subscription with available HDInsight cores
     - Demo file: http://www.slideshare.net/raviumesh/big-datademo
     - Download the Power Query add-in: http://www.microsoft.com/en-us/download/details.aspx?id=39379&CorrelationId=d8002172-0438-4ef5-b0fa-e635f8f17251
     - Enable PowerPivot and Power View in Excel Options, under COM Add-ins
     - HOL labs: http://tinyurl.com/lncd45x (“Clone in Desktop” or “Download ZIP” + unzip)
     - GUI: Install CloudXplorer http://clumsyleaf.com/products/downloads
     - (Optional) Cmd line: Install AzCopy http://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/
     - Install SQL 2014 SSMS: http://www.microsoft.com/en-gb/download/details.aspx?id=42299
     - Today’s slides: http://tinyurl.com/lxutdd4
  7. What is Big Data?
  8. What do you think Big Data is?
  9. What is Big Data?
     It is:
     - Scale out, distributed processing
     - Enables elasticity
     - Encourages exploration
     - Faster data ingestion
     - Lower TCO
     - Empowers self-service BI and analytics
     - Rapid time to insight
     It is NOT:
     - A well-defined thing
     - About volume, size
     - A replacement for everything
     - The answer to every problem
  10. What is Hadoop? Conceptual View
     It is:
     - A type of Big Data
     - Just another data source
     - A loose collection of open source code
     - Distributed by many
     - Handles loosely structured data
     - Write once, read many
     It is not:
     - Actually a thing!
     - The only way to do Big Data
     - Only about data
  11. BASE vs. ACID
     - BASE: Basically Available, Soft State, Eventually Consistent
     - ACID: Atomic, Consistent, Isolated, Durable
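To make the BASE terms concrete, here is a toy sketch in Python. It is not how Hadoop or any Azure service implements replication; the class names, the two-replica setup, and the explicit `sync()` call are all assumptions made for illustration.

```python
class Replica:
    """One copy of the data, stored as key -> value."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on replica 0 and reaches the others only on sync()."""

    def __init__(self, replica_count=2):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, key, value):
        # Basically available: the write is accepted immediately by replica 0.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Soft state: the answer depends on which replica is asked, and when.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Eventually consistent: after pending writes are replayed, replicas agree.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("meter-42", 17.5)
stale = store.read("meter-42", replica_index=1)  # replica 1 has not seen the write yet
store.sync()
fresh = store.read("meter-42", replica_index=1)  # replica 1 now agrees with replica 0
```

An ACID system would instead make the write visible everywhere, or nowhere, before acknowledging it.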
  12. What is Hadoop? Tech View http://hortonworks.com/hdp/
  13. End to End Architecture
  14. Microsoft Azure Data Services (diagram): Data; Capture + manage; Transform + analyze; Visualize + decide
  15. Demo
     - View the Azure portals
     - HDInsight: elasticity, query
  16. Internet of Things – Business Insights (Microsoft Azure architecture diagram). Labels on the diagram: Apps + Data; Source Data; Event Hub; Streaming (Real Time); Azure Storage; HDInsight; SQL Server; Queries; Machine Learning, Analytics, and Business Intelligence; Destination
  17. Architecture – Use Cloud Building Blocks
     Components on the diagram: Any Device!; Other; REST; Blob Storage or In Memory (Landing Zone); Curator; Blob Storage (Persistent Storage); HDInsight Clusters (Hive, Pig, etc); Sqoop; Self-Service Analytics; Reporting / DW
     Optimized for write throughput:
     - Many small blobs
     - Raw/binary format
     - Data kept until curated
     - Azure Blob Storage if persisted
     - Azure Queues & Workers for in memory
     Optimized for query efficiency:
     - Optimized size (combine blobs)
     - Cleansed/masked
     - Partitioned
     - Well-defined, semi-structured data
     Use Case Specific & General Processing:
     - Data governance requirements (PII scrub)
     - Aggregate for efficient storage
     - Publish to real-time consumers and long term storage (Hadoop)
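The curation step on this slide (combine small blobs, scrub PII, partition) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the deck's implementation: the function name `curate`, the record fields (`date`, `device`, `kwh`, `email`), and the `date=...` partition naming are assumptions, and a real curator would read from and write to blob storage rather than Python lists.

```python
import json
from collections import defaultdict


def curate(raw_blobs, pii_fields=("email",)):
    """Merge many small raw JSON blobs into one cleansed body per date partition."""
    partitions = defaultdict(list)
    for blob in raw_blobs:
        record = json.loads(blob)
        # Governance step: mask personally identifiable fields before storing.
        for field in pii_fields:
            if field in record:
                record[field] = "***"
        # Partition by the record's date so queries can skip unrelated data.
        partitions[record["date"]].append(record)
    # Combine each partition's records into one newline-delimited JSON body.
    return {
        f"date={date}": "\n".join(json.dumps(r, sort_keys=True) for r in records)
        for date, records in partitions.items()
    }


# Hypothetical landing-zone contents: many small raw blobs, one record each.
landing_zone = [
    '{"date": "2015-04-01", "device": "meter-1", "kwh": 17.5, "email": "a@example.com"}',
    '{"date": "2015-04-01", "device": "meter-2", "kwh": 3.2, "email": "b@example.com"}',
    '{"date": "2015-04-02", "device": "meter-1", "kwh": 18.1, "email": "a@example.com"}',
]
curated = curate(landing_zone)
```

Three small blobs become two partitioned, masked bodies, which is the shift from "optimized for write throughput" to "optimized for query efficiency" described above.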
  19. When to Use Hadoop
  20. Typical Big Data Use Cases
     - Smart meter monitoring
     - Equipment monitoring
     - Advertising analysis
     - Life sciences research
     - Fraud detection
     - Healthcare outcomes
     - Weather forecasting
     - Natural resource exploration
     - Social network analysis
     - Churn analysis
     - Traffic flow optimization
     - Legal discovery
     - Telemetry
     - IT infrastructure optimization
  21. Hadoop Shines When…
     - Data exploration, analytics and reporting, new data-driven actionable insights
     - Rapid iterating
     - Unknown unknowns
     - Flexible scaling
     - Data-driven actions for early competitive advantage or first to market
     - Low number of direct, concurrent users
     - Low cost data archival
  22. Hadoop Anti-Patterns…
     - Replace system whose pain points don’t align with Hadoop’s strengths
     - OLTP needs adequately met by an existing system
     - Known data with a static schema
     - Many end users
     - Interactive response time requirements
     - Your first Hadoop project + mission critical system
  23. Relational Database vs. Hadoop Platform (axis: scale of storage & processing)
     - Schema: required on write (relational); required on read (Hadoop)
     - Speed: reads are fast (relational); writes are fast (Hadoop)
     - Governance: standards and structured (relational); loosely structured (Hadoop)
     - Processing: limited, no data processing (relational); processing coupled with data (Hadoop)
     - Data types: structured (relational); multi and unstructured (Hadoop)
     - Best fit use (relational): Interactive OLAP Analytics; Complex ACID Transactions; Operational Data Store
     - Best fit use (Hadoop): Data Discovery; Processing unstructured data; Massive Storage/Processing
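The "schema required on read" row can be illustrated with a short Python sketch. The raw text is stored untouched, and a schema is applied only when the data is read, so the same bytes can be read under different schemas. The sample rows, column names, and the choice to turn unparseable values into `None` are assumptions for illustration (the last is loosely similar in spirit to a query engine returning a null for a field that does not match the declared type).

```python
import csv
import io

# Raw data exactly as it landed; nothing was validated when it was written.
RAW_LINES = "2015-04-01,meter-1,17.5\n2015-04-01,meter-2,not-a-number\n"


def read_with_schema(raw_text, schema):
    """Apply (column name, converter) pairs at read time; bad values become None."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (name, convert), value in zip(schema, fields):
            try:
                row[name] = convert(value)
            except ValueError:
                row[name] = None  # the raw data stays as-is; only this read is affected
        rows.append(row)
    return rows


# Two different schemas over the same stored bytes.
as_readings = read_with_schema(RAW_LINES, [("day", str), ("meter", str), ("kwh", float)])
as_text = read_with_schema(RAW_LINES, [("day", str), ("meter", str), ("raw", str)])
```

A schema-on-write system would have rejected the second row at load time; here the write always succeeds and the cost of interpretation moves to the reader.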
  24. Now You Do It http://bit.ly/BDApr2015
     - Cloud Data Camp Lab 2
     - Create: HDInsight cluster
     - Thanks to Lara Rubbelke for demos!
  25. Why Hadoop in the Cloud
  26. Microsoft Hadoop Options
     Cloud:
     - HDInsight Service
     - Windows Azure Storage Blob (WASB)
     - HDP or Cloudera on VMs (Windows or Linux)
     - Any distro on VMs (Windows or Linux)
     Hybrid / On-Premises:
     - Parallel Data Warehouse (PDW) with Polybase
     - APS/PDW Hadoop Regions
     - OneBox for Developers
     - Hortonworks Data Platform (HDP for Windows)
  27. Why Hadoop in the Cloud?
  28. Why Hadoop in the Cloud?
     Hadoop:
     - It’s easier
     - You can concentrate on the analytics
     - WASB: separation of storage and compute
     - Shared data, globally accessible
     - Lowers the cost of discovery & innovation
     - No commitment as you learn
     Cloud in General:
     - Today’s disruptor, tomorrow’s reality
     - Elasticity, capacity
     - Less infrastructure and implementation work
     - Lower TCO
     - Business Continuity
     - Operational Agility
  29. WASB: Separation of Storage & Compute
     - Windows Azure Storage Blob (WASB) = separation of storage and compute
     - Open source code available to any distro
     - Simplified data access
     - Reduced data movement
     - Faster access to new data
     - Enables ETL even when a cluster isn’t up = lower TCO
     - Share data concurrently
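Because the data lives in blob storage rather than on the cluster, jobs address it by a `wasb://` URI instead of a cluster-local path. The small Python helper below only builds that string; the helper name and the container, account, and path values are hypothetical, and the URI layout shown (`wasb://<container>@<account>.blob.core.windows.net/<path>`, with `wasbs://` for the TLS variant) should be checked against the current HDInsight documentation.

```python
def wasb_uri(container, account, path, secure=False):
    """Build a WASB-style URI for a blob path (hypothetical helper for illustration)."""
    scheme = "wasbs" if secure else "wasb"
    # Strip a leading slash so the host and path are joined by exactly one "/".
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical container and storage account names.
raw_location = wasb_uri("data", "myaccount", "/raw/2015/04")
secure_location = wasb_uri("data", "myaccount", "curated/date=2015-04-01", secure=True)
```

The same URI stays valid whether zero, one, or several clusters are running, which is what allows loading data while no cluster is up and sharing it between clusters.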
  30. Why HDInsight
     - Separation of storage and compute is the default
     - Varied workloads: Query, Streaming, NoSQL
     - Elasticity: node sizes, number of nodes
     - Committed to openness: Hortonworks, Linux, WASB
  32. So Far…
     - Basic Big Data and Hadoop terminology
     - What projects fit well with Hadoop
     - Why Hadoop in the cloud is so powerful
     - Sample end-to-end architecture
     - Hands-On: Storage, data load, SQL database, Service Bus Event Hub, HDInsight, Hive, AzureML, Power Query, Power View
  33. Tie It Together
  34. What’s the Goal?
     - Ask a business question
     - Find and load data
     - Explore the data
     - Iterate
     - Analyze, visualize, and/or move the data
     - Productionalize some, all, or none
  35. Key Takeaways
     - Basic Big Data and Hadoop terminology
     - What projects fit well with Hadoop
     - Why Hadoop in the cloud is so powerful
     - Sample end-to-end architecture
     - See: Data, Hadoop, Hive, Streaming, Analytics, BI
     - Do: Data, Hadoop, Hive, Streaming, Analytics, BI
     - How this tech solves your business problems
  36. Hadoop in the Cloud. Ravi Patel, Business Intelligence Team Manager, Microsoft Certified Solution Expert (SQL 2012) ®, ravi@nealanalytics.com
  37. Big Data References
     - Get started / overview with a free ebook, “Introducing Microsoft Azure HDInsight”: http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx
     - Architect a solution with the Patterns and Practices guide, “Developing big data solutions on Microsoft Azure HDInsight”: http://blogs.msdn.com/b/masashi_narumoto/archive/2014/06/30/new-release-developing-big-data-solutions-on-microsoft-hdinsight.aspx
     - The Data Science Laboratory Series is Complete: http://blogs.msdn.com/b/buckwoody/archive/2014/03/24/the-data-science-laboratory-series-is-complete.aspx
  38. Big Data References
     - Microsoft Big Data: http://microsoft.com/bigdata
     - HDP for Windows: http://hortonworks.com/products/hdp-windows/
     - Hadoop: The Definitive Guide, by Tom White
     - Programming Hive, by Capriolo, Wampler, Rutherglen
     - Big Data Learning Resources: http://sqlblog.com/blogs/lara_rubbelke/archive/2012/09/10/big-data-learning-resources.aspx
     - Hurricane Sandy Mash-Up: Hive, SQL Server, PowerPivot & Power View: http://blogs.msdn.com/b/cindygross/archive/2013/01/31/mash-up-hive-sql-server-data-in-powerpivot-amp-power-view-hurricane-sandy-2012.aspx
     - Twitter Search: https://twitter.com/#!/search/%23bigdata
     - Hive Reference: http://hive.apache.org
     - HDInsight Tutorials: http://www.windowsazure.com/en-us/documentation/services/hdinsight/?fb=en-us
     - Denny Lee: http://dennyglee.com/category/bigdata/
     - Carl Nolan: http://blogs.msdn.com/b/carlnol/archive/tags/hadoop+streaming/
     - Cindy Gross: http://tinyurl.com/SmallBitesBigData

