Data lakes can scale along with the cloud, break down integration barriers and data silos, and pave the way for new business opportunities. All of this helps give management and employees a better basis for decision-making. Come and hear how.
David Bojsen, Architect, Microsoft
2. Big Data is changing traditional data
warehousing
… data warehousing has reached the
most significant tipping point since its
inception. The biggest, possibly most
elaborate data management system
in IT is changing.
– Gartner, “The State of Data Warehousing”*
* Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT.: Gartner, 2012)
Data sources
ETL
Data warehouse
BI and analytics
3. Big Data definition
Big data is high-volume, high-velocity
and/or high-variety information assets
that demand cost-effective, innovative
forms of information processing that
enable enhanced insight, decision
making, and process automation.
– Gartner, Big Data Definition*
* Gartner, Big Data (Stamford, CT.: Gartner, 2016), URL: http://www.gartner.com/it-glossary/big-data/
4. Big Data is driving transformative changes
Data characteristics
Traditional: Relational data with highly modeled schema
Big Data: All data with schema agility
Costs
Traditional: Specialized HW
Big Data: Commodity HW
Culture
Traditional: Operational reporting; focus on rear-view analysis
Big Data: Experimentation leading to intelligent action, with machine learning, graph, A/B testing
5. Big Data introduces a culture of experimentation
Tangerine instantly adapts to customer feedback to offer
customers what they want, when they want it
“I can see us…creating predictive, context-
aware financial services applications that give
information based on
the time and where the customer is.”
Billy Lo
Head of Enterprise Architecture
Scenario
Lack of insight for targeted campaigns
Inability to support data growth
Solution
Azure HDInsight (Hadoop-as-a-service) with the
Analytics Platform System (APS) enables instant analysis
of social sentiment and customer feedback across digital,
face-to-face and phone.
Result
Reduced time to customer insight
Ability to make changes to campaigns or adjust product
rollouts based on real-time customer reactions
Ability to offer incentives and new services to retain—and
grow—its customer base
6. Trends in Data and benefits of
getting ahead with your data
platform
Legacy technology led to:
Isolated Data
Information stored across silos made it challenging for employees to access and review data
Historical Analysis
Data reviews provided insight into the reasons for current and past outcomes
Marginal Data Growth
Predictable patterns in data growth without the technology to integrate and analyze
Capitalize on new trends:
Data Self-Service
Ensure the employees that need it most can conduct the analysis on their terms with a familiar set of tools
Predictive Analytics
Rigorous quantitative modeling and simulations are forecasting future opportunities
Exploding Data
Aggregate, store, and make sense of diverse data sets that hold the key to critical business decisions
7. The World of Data HAS CHANGED
Gartner, Merv Adrian, Nick Heudecker, Adam M. Ronthal, “Cool Vendors in DBMS, 2015”
8. However, there are challenges to Big Data…
*Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
Obtaining skills
and capabilities
Determining how
to get value
Integrating with
existing IT investments
9. Big Data as a cornerstone of Cortana Intelligence
[Diagram: Cortana Intelligence components, grouped by layer]
Data Sources: Apps; Sensors and devices; Data
Information Management: Event Hubs; Data Catalog; Data Factory
Big Data Stores: SQL Data Warehouse; Data Lake Store
Machine Learning and Analytics: HDInsight (Hadoop and Spark); Stream Analytics; Data Lake Analytics; Machine Learning
Intelligence: Cortana; Bot Framework; Cognitive Services
Dashboards & Visualizations: Power BI
Action: People; Automated Systems; Apps (Web, Mobile, Bots)
Flow: Data to Intelligence to Action
10. Azure
Data Lake Store
A No limits Data Lake that
powers Big Data Analytics
Petabyte size files and Trillions of objects
Scalable throughput for massively parallel
analytics
HDFS for the cloud
Always encrypted, role-based security &
auditing
Enterprise-grade support
11. Azure
Data Lake Analytics
A No limits Analytics Job
Service to power intelligent
action
Start in seconds, scale instantly,
pay per job
Develop massively parallel programs
with simplicity
Debug and optimize your big data
programs with ease
Virtualize your analytics
Enterprise-grade security, auditing
and support
12. Azure
HDInsight
A Cloud Spark and
Hadoop service for the
Enterprise
Reliable with an industry leading SLA
Enterprise-grade security and monitoring
Productive platform for developers and
scientists
Cost effective cloud scale
Integration with leading ISV applications
Easy for administrators to manage
63% lower TCO than deploying your
own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in
the Cloud with Microsoft Azure HDInsight”
13. Azure Data Lake
YARN
U-SQL
Analytics HDInsight
Hive R Server
HDFS
Store
Store and analyze data of any kind and size
Develop faster, debug and optimize smarter
Interactively explore patterns in your data
No learning curve
Managed and supported
Dynamically scales to match your business
priorities
Enterprise-grade security
Built on YARN, designed for the cloud
14. Azure Data Lake
Big Data made easy
Analytics on any data,
any size
Easier and more
productive for all users Enterprise-ready
15. Petabyte size files and
Trillions of objects
• Store data in its native
format
• PB sized files, 200x larger
than anyone else
• Scalable throughput for
massively parallel analytics
• No need to redesign
application or repartition
data at higher scale
TBs
EBs
Store
16. Start in seconds, Scale
instantly, Pay per job
• Process big data jobs in 30 seconds
• No infrastructure to worry about (no
servers, no VMs, no clusters)
• Instantly scale analytic units up or down
(processing power)
• Architected for cloud scale and
performance
• Frees you up to focus only on your
business logic
17. Azure Data Lake
Big Data made easy
Analytics on any data,
any size
Easier and more
productive for all users Enterprise-ready
18. Easy for administrators
to spin up quickly
• Deploy big data projects
in minutes
• No hardware to install, tune,
configure or deploy
• No infrastructure or software to
manage
• Scale from tens to thousands of
machines instantly
19. Debug and Optimize
your Big Data
programs with ease
• Deep integration with
Visual Studio and Visual Studio Code
• Easy for novices to write
simple queries
• Integrated with U-SQL
• Actively offers recommendations to
improve performance and reduce cost
• Playback visually displays job run
20. Develop massively
parallel programs
with simplicity
• U-SQL: a simple
and powerful language that’s familiar and
easily extensible
• Unifies the declarative
nature of SQL with expressive
power of C#
• Leverage existing libraries in .NET languages,
R and Python
• Massively parallelize code on diverse
workloads (ETL, ML, image tagging, facial
detection)
21. Azure Data Lake
Big Data made easy
Analytics on any data,
any size
Easier and more
productive for all users Enterprise-ready
22. Highest availability
guarantee in the industry
for peace of mind
• Managed, monitored and
supported by Microsoft
• Enterprise-leading SLA—99.9%
uptime
• No IT resources needed for
upgrades and patching
• Microsoft monitors your
deployment so you don’t
have to
99.9% SLA
23. Always encrypted,
Role-based security
& Auditing
• Always encrypted; in motion using SSL,
and at rest using keys in Azure Key
Vault
• Single sign-on, multi-factor
authentication and seamless
integration of on-premises identities
with Active Directory
• Fine-grained POSIX-based ACLs for
role-based access controls
• Auditing every access / configuration
change
24. Lower total cost
of ownership
• No hardware
• Pay only for the processing used
per job
• No paying for unused cluster
capacity
• Independently scale storage and
compute
• No need to hire specialized
operations team
25. Recognized by
top analysts
Forrester Wave for Big Data
Hadoop Cloud
• Named industry leader by
Forrester with the most
comprehensive, scalable, and
integrated platforms*
• Recognized for its cloud-first
strategy that is paying off*
*The Forrester WaveTM: Big Data Hadoop Cloud Solutions, Q2 2016.
26. Get started now
Learn more on the Data Lake website:
http://azure.com/datalake
http://aka.ms/datalake
Watch videos on Azure Data Lake:
https://channel9.msdn.com/Series/AzureDataLake
Take courses and read documentation
on Azure Data Lake:
http://aka.ms/hditraining
http://aka.ms/adlanalytics
http://aka.ms/adlstore
28. Thank you for your time
Questions:
david.bojsen@microsoft.com
Editor's Notes
Key goal of slide: To convey what every IT person knows: the data warehouse and what it is for. Then we set up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?
Slide talk track:
What is the “traditional” data warehouse?
IT professionals know this well. A data warehouse, or enterprise data warehouse, is a database designed specifically for data analysis. It is the single source of truth, the central repository for all data in the company. Disparate data from your transactional systems, ERP, CRM, or line-of-business applications is extracted, transformed, cleansed, and loaded into the warehouse. It was built so that people accessing the warehouse with BI tools work with data that has been provisioned by IT and represents accurate data sanctioned by the company.
However, this traditional data warehouse is reaching an inflection point. Gartner, in its analysis of the state of data warehousing, noted that it is reaching the most significant tipping point since its inception. The question is why? What is going on?
<Note: no video or demo associated with this case study yet>
About Tangerine
Tangerine is a direct bank that delivers simplified everyday banking to Canadians. With nearly 2 million Clients and close to $38 billion in total assets, we are Canada's leading direct bank. Tangerine offers banking that is flexible and accessible, products and services that are innovative, fair fees, and award-winning Client service. From no-fee daily chequing and high-interest savings accounts, Credit Card, GICs, RSPs, TFSAs, mortgages and mutual funds, Tangerine has the everyday banking products Canadians need. With over 1,000 employees in Canada, our presence extends beyond our website and Mobile Banking app to our Café locations, Pop-Up locations, Kiosks and 24/7 Contact Centres. Tangerine was launched as ING DIRECT Canada in 1997. In 2012 it was acquired by Scotiabank, and operates independently as a wholly-owned subsidiary.
Tangerine faced a lack of market differentiation, insight for targeted campaigns, and the ability to support their data growth.
To address this issue, they decided to deploy the Microsoft Analytics Platform System (APS), a turnkey big data analytics appliance, along with Microsoft Azure HDInsight, the Microsoft cloud Hadoop-as-a-service.
For Tangerine, transforming customer data into insight is now much easier—and faster. Employees can instantly access usable BI analysis and make better decisions in order to offer customers what they want, when they want it, adjusting on the fly.
With a better ability to learn what its customers are looking for, Tangerine can offer the incentives and new services it needs to be able to retain—and grow—its customer base while creating new services and campaigns based on social media data.
Other benefits include:
Reduced time to customer insight
Ability to make changes to campaigns or adjust product rollouts based on real-time customer reactions
Ability to offer incentives and new services to retain—and grow—its customer base
Tangerine case study: https://customers.microsoft.com/Pages/CustomerStory.aspx?recid=14594
The Gartner quote is from April 9th, 2015 and is the Summary statement.
In the same article they mention as KEY FINDINGS:
Key Findings
Designing DBMSs for the cloud is driving new approaches to architecture, such as decoupling compute and storage, to enhance elasticity to meet uneven usage profiles.
Applications with complex scaling and throughput requirements, particularly those related to the Internet of Things (IoT) and infrastructure monitoring, can be tackled with multimodel products that optimize compute and store capabilities at different points in a business process.
Supporting existing on-premises DBMSs for enhanced availability and agility expectations is an increasing requirement in shops that want to expand the use of existing deployments.
This is exactly what Microsoft is doing with MPP DWH and especially with “Azure SQL Data Warehouse”!
Petabyte size files and Trillions of objects: With Azure Data Lake Store your organization can analyze all of its data in a single place with no artificial constraints. Your Data Lake Store can store trillions of files where a single file can be greater than a petabyte in size which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.
Scalable throughput for massively parallel analytics: Without redesigning your application or repartitioning your data at higher scale, Data Lake Store scales throughput to support any size of analytic workload. It provides massive throughput to run analytic jobs with 1,000+ concurrent executors that read and write hundreds of terabytes of data efficiently.
HDFS for the Cloud: Microsoft Azure Data Lake Store supports any application that uses the open Apache Hadoop Distributed File System (HDFS) standard. By supporting HDFS, you can easily migrate your existing Hadoop and Spark data to the cloud without recreating your HDFS directory structure.
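To make "without recreating your HDFS directory structure" concrete, here is a minimal sketch of mapping an existing HDFS path to a Data Lake Store path while leaving the directory layout untouched. It assumes the adl:// URI scheme and the <account>.azuredatalakestore.net host form, neither of which appears in this deck. The account name "contosolake", the sample path, and the helper to_adl_uri are hypothetical, and the sketch only rewrites the URI; it does not copy any data.

```python
from urllib.parse import urlsplit

# Hypothetical account name, used only for illustration.
ACCOUNT = "contosolake"


def to_adl_uri(hdfs_uri: str, account: str = ACCOUNT) -> str:
    """Map an hdfs:// URI to an adl:// URI, keeping the directory path unchanged."""
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {hdfs_uri!r}")
    # Assumed host form: <account>.azuredatalakestore.net
    return f"adl://{account}.azuredatalakestore.net{parts.path}"


print(to_adl_uri("hdfs://namenode:8020/data/logs/2016/01.csv"))
# prints adl://contosolake.azuredatalakestore.net/data/logs/2016/01.csv
```

Only the scheme and host change; everything after the host is carried over as-is, which is the property the note above describes.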
Always encrypted, Role-based security & Auditing: Data Lake Store protects your data assets and extends your on-premises security and governance controls to the cloud easily. Data is always encrypted: in motion using SSL, and at rest using service-managed or user-managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store, enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.
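As a rough illustration of what "fine-grained POSIX-based ACLs" means, the sketch below evaluates a simplified POSIX-style ACL in the usual order: owner, named users, groups, then everyone else. It is an assumption-laden teaching model, not Data Lake Store's implementation. The Acl class, the is_allowed helper, and the user and group names are all hypothetical.

```python
from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 4, 2, 1  # rwx bits, as in POSIX file modes


@dataclass
class Acl:
    """Simplified POSIX-style ACL for one file or folder (illustrative only)."""
    owner: str
    owning_group: str
    owner_perms: int
    group_perms: int
    other_perms: int
    named_users: dict = field(default_factory=dict)   # user name -> rwx bits
    named_groups: dict = field(default_factory=dict)  # group name -> rwx bits
    mask: int = READ | WRITE | EXECUTE  # upper bound for named and group entries


def is_allowed(acl: Acl, user: str, groups: set, wanted: int) -> bool:
    """Return True if `user` (a member of `groups`) holds all `wanted` bits."""
    if user == acl.owner:
        return (acl.owner_perms & wanted) == wanted
    if user in acl.named_users:
        return (acl.named_users[user] & acl.mask & wanted) == wanted
    # Collect every group entry that applies to this user.
    group_entries = []
    if acl.owning_group in groups:
        group_entries.append(acl.group_perms)
    group_entries += [p for g, p in acl.named_groups.items() if g in groups]
    if group_entries:
        # Access is granted if any single matching group entry allows it.
        return any((p & acl.mask & wanted) == wanted for p in group_entries)
    return (acl.other_perms & wanted) == wanted


acl = Acl(owner="alice", owning_group="analysts",
          owner_perms=READ | WRITE | EXECUTE,
          group_perms=READ | EXECUTE,
          other_perms=0,
          named_users={"bob": READ})

print(is_allowed(acl, "bob", set(), READ))    # True
print(is_allowed(acl, "bob", set(), WRITE))   # False
print(is_allowed(acl, "carol", {"analysts"}, READ | EXECUTE))  # True
print(is_allowed(acl, "dave", set(), READ))   # False
```

The point of the model is that one named-user entry (bob) can grant read access without widening the group or "other" permissions, which is what makes the ACLs fine-grained.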
Enterprise-grade Support: We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
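For a sense of scale, the following sketch converts an availability percentage into the downtime it still permits. It assumes the SLA is a simple uptime fraction over a calendar period; the deck does not state how the SLA is actually measured, and the helper name allowed_downtime_minutes is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted by an uptime SLA over a period of days."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


per_month = allowed_downtime_minutes(99.9, 30)
per_year = allowed_downtime_minutes(99.9, 365)

print(f"{per_month:.1f} minutes per 30-day month")  # 43.2 minutes per 30-day month
print(f"{per_year / 60:.2f} hours per year")        # 8.76 hours per year
```

Under that assumption, 99.9% leaves roughly three quarters of an hour per month of permitted downtime.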
Start in seconds, Scale instantly, Pay per job: Our on-demand service will have you processing Big Data jobs within 30 seconds. There is no infrastructure to worry about because there are no servers, VMs, or clusters to wait for, manage or tune. You can instantly scale the analytic units (processing power) from one to thousands for each job. You only pay for the processing used per job.
Develop massively parallel programs with simplicity: U-SQL is a simple, expressive, and extensible language that lets you write code once and have it automatically parallelized for the scale you need. You can process petabytes of data for diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries written in .NET languages, R, or Python.
Debug and Optimize your Big Data programs with ease: Debugging failures in cloud distributed programs is now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you use only 50 AUs, resulting in a 20x cost savings.
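The 20x figure follows directly from the ratio of requested to needed AUs, as this small sketch shows. It assumes cost is proportional to the number of AUs for a job of the same duration, which is what the example above implies; the helper name au_savings_factor is hypothetical.

```python
def au_savings_factor(requested_aus: int, needed_aus: int) -> float:
    """How many times cheaper a job is when right-sized to the AUs it needs.

    Assumes cost scales linearly with AUs for the same job duration.
    """
    if needed_aus <= 0 or requested_aus < needed_aus:
        raise ValueError("need 0 < needed_aus <= requested_aus")
    return requested_aus / needed_aus


factor = au_savings_factor(1000, 50)
print(factor)          # 20.0
print(1 - 1 / factor)  # 0.95, i.e. 95% of the original cost is avoided
```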
Virtualize your analytics: Act on all your data with optimized data virtualization of your relational sources such as SQL Server on Azure VMs, Azure SQL Database, and Azure SQL Data Warehouse. Queries are automatically optimized by moving processing close to the source data, without data movement, thereby maximizing performance and minimizing latency.
Enterprise-grade Security, Auditing and Support: Extend your on-premises security and governance controls to the cloud to meet your security and regulatory compliance needs. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. Role-based access control and the ability to audit all processing and management operations are on by default. We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
Reliable Open Source analytics with an Industry leading SLA
HDInsight allows you to easily spin up enterprise-grade open source cluster types guaranteed with the industry’s best 99.9% SLA and 24/7 support. We guarantee this SLA for the entire big data solution, not just the VM instances. HDInsight is architected for full redundancy and high availability, including head node replication, data geo-replication, and a built-in standby NameNode, making HDInsight resilient to critical failures not addressed in standard Hadoop implementations. Azure also offers cluster monitoring and 24x7 enterprise support backed by Microsoft and Hortonworks. Together they have 37 committers for Hadoop core, more than all other managed cloud providers combined, to support your deployment and to fix and commit code back to Hadoop.
Enterprise Grade Security & Monitoring
HDInsight protects your data assets and easily extends your on-premise security and governance controls to the cloud. We feature single sign-on (SSO), multi-factor authentication and seamless management of millions of identities through Azure Active Directory. You can authorize users and groups with fine-grained access control policies over all your enterprise data with Apache Ranger. HDInsight meets HIPAA, PCI, SOC compliance, ensuring your enterprise data assets are always protected with the highest security and regulatory compliance. To ensure the highest level of business continuity, HDInsight extends capabilities for alerting, monitoring, defining pre-emptive actions, and enhanced workload protection through native integration with Azure Operations Management Suite (OMS).
Most Productive platform for developers and scientists
HDInsight offers developers tailored experiences through rich productivity suites for Hadoop & Spark with integrated development environments using Visual Studio, Eclipse, and IntelliJ supporting Scala, Python, R, Java, and .Net. HDInsight gives data scientists the ability to create narratives that combine code, statistical equations, and visualizations that tell a story about the data through integration to the two most popular notebooks: Jupyter and Zeppelin. HDInsight is also the only managed cloud Hadoop solution with integration to Microsoft R Server. Multi-threaded math libraries and transparent parallelization in R Server means handling up to 1000x more data and up to 50x faster speeds than open source R—helping you train more accurate models for better predictions than previously possible.
Cost effective cloud scale
HDInsight has decoupled compute and storage, enabling you to cost-effectively scale workloads up or down, independent of storage. Local storage can still be used for caching and fast I/O. Spark and interactive Hive users can choose SSD memory for interactive performance, while Kafka users can retain all streaming data in premium managed disks. You only pay for the compute and storage you use, and you can choose the Azure VM types that enable the best utilization of resources. A recent study showed HDInsight delivering 63% lower TCO than deploying Hadoop on premises over 5 years.*
Integration with leading Productivity Applications
In the broader ecosystem for Hadoop, there is a thriving market of independent software vendors (ISVs) who provide value-added solutions. Through a unique design where every cluster is extended with edge nodes and script actions, HDInsight lets customers spin up Hadoop and Spark clusters pre-integrated and pre-tuned with any ISV application out of the box. Datameer, Cask, AtScale, and StreamSets are a few such applications that are popular on the HDInsight platform today.
Easy for administrators to manage
With HDInsight, administrators can deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There’s also no time-consuming installation or set up. There is also no need to patch the operating system or upgrade the Hadoop versions. Azure does it for you. Launch your first cluster in minutes.
Get it at //aka.ms/forresterwave
Cortana Intelligence is available as a simple monthly subscription, which gives you a predictable monthly cost for building a comprehensive Big Data and advanced analytics solution.
Learn more about Cortana Intelligence through our website, by scheduling a workshop to determine where our solution may help, or by speaking with your Microsoft contact about licensing options and trained partners in your area that can help you get the most out of the solution.
T: Thanks for your time.