DAL
AUG 9, 2017
#datapopup
Building Serverless Data
Pipelines in the Cloud
Manisha Sule
Director of Big Data Analytics, Linux Academy.
Board Member on SMU’s Big Data Advisory Board.
linkedin.com/in/manisha-sule
@tweetDataS
Agenda
1. What is serverless?
2. Big Data architectures and best practices
3. AWS serverless services:
• Lambda
• Kinesis (Streams, Firehose, Analytics)
• DynamoDB
• S3
• Athena
4. Analytics for CloudAssessments.com
What is serverless?
Source: https://www.slideshare.net/CodeOps/serverless-architecture-a-gentle-overview?qid=aecf8d27-8b16-4da5-987f-600fe1cb0655&v=&b=&from_search=5
Serverless architectures
• Depend on third-party services, known as Backend as a Service (BaaS).
• Are distributed systems that react to events and triggers.
• Scale dynamically, based on demand.
• Use ephemeral (short-lived) containers or computational resources in the cloud.
Advantages of serverless
• Fully managed: the cloud provider manages the servers.
• Highly available and scalable, with no provisioning needed and zero administration.
• Not just compute containers: also includes NoSQL databases, interactive query services, storage services, and messaging services.
• Cost efficient: you never pay for idle time.
• Support for continuous integration/continuous delivery pipelines.
• Developers can focus on architecture and code only.
• Gartner calls this fPaaS (function Platform as a Service) and lists several use cases: utility logic, scheduled processing, event-driven architecture, microservices, and full-blown applications.
AWS Serverless Application Model
Template based mechanism of defining and deploying serverless applications.
Source : AWS Tech Talk Webinar
Big Data Lambda architecture
Requirements of Big Data architectures:
1. Processing real time streams.
2. Processing batch data.
3. Real time ETL.
4. Enrich real time data with batch data.
5. Queries must be answerable using
batch data and real time data.
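Requirement 5 can be illustrated with a minimal sketch: a query is answered by combining a precomputed batch view with a real-time view that covers only the events that arrived after the last batch run. The names `batch_view`, `realtime_view`, and `merged_count`, and the sample counts, are hypothetical and not taken from the talk.

```python
from collections import Counter


def merged_count(batch_view, realtime_view, key):
    """Answer a count query from both layers (illustrative only)."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


# Batch layer: counts computed by a periodic job over historical data.
batch_view = Counter({"quiz_started": 1200, "quiz_passed": 950})

# Speed layer: counts for events that arrived after the last batch run.
realtime_view = Counter({"quiz_started": 14, "quiz_failed": 2})

total_started = merged_count(batch_view, realtime_view, "quiz_started")
```

The sketch assumes the two views never count the same event twice; in practice the real-time view would have to be reset each time a batch run completes.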
Big Data best practices
1. Build a decoupled architecture: decouple the data->store->process->store steps.
2. Use the right tools for the job, chosen by latency, throughput, access patterns, and data structures.
3. Be cost effective: big data, not big cost.
AWS Managed vs Serverless services
Managed services: you need to manage servers, their scale, their location, software updates, etc.
• Elastic MapReduce (EMR): Managed Hadoop framework; includes Apache Spark, Zeppelin, HBase, Flink, etc.
• Elasticsearch: For log analytics, full-text search, application monitoring, and more. Fully integrated with Kibana and Logstash.
• Redshift: Fully managed data warehouse, to analyze data and integrate with BI tools.
• RDS: Database service to set up, operate, and scale a database in the cloud.
Serverless services: automatically available in all Availability Zones in the region; set at a regional level in the AWS infrastructure. Highly available and fault tolerant.
• Lambda
• Kinesis
• S3
• DynamoDB
• Athena
• API Gateway
• CloudWatch
• QuickSight
• IoT
• Cognito
• SQS
AWS Lambda
• Heart of serverless architecture patterns.
• Stateless, event-driven code. Supports Node.js, Python, Java, C#.
• No infrastructure to manage.
• No risk of over-provisioning or under-provisioning; you don't pay for idle time.
• Logging and operational monitoring are built in.
• Efficient performance at scale. If a thousand requests come in, it scales automatically.
• Lets you skip the boring and hard parts: easy to author and deploy, so you can focus on business logic.
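To make "stateless, event-driven code" concrete, here is a minimal sketch of a Python Lambda handler, assuming the function is triggered by a Kinesis stream. It relies only on the `Records[].kinesis.data` field of the event, which carries the payload base64-encoded. The message field `type` and the helper `make_test_event` are hypothetical, added so the handler can be exercised locally.

```python
import base64
import json


def handler(event, context):
    """Stateless entry point: decode each Kinesis record and count event types."""
    counts = {}
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        event_type = message.get("type", "unknown")  # "type" is a hypothetical field
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts


def make_test_event(messages):
    """Build a minimal Kinesis-style event locally (only the field used above)."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(m).encode("utf-8")).decode("ascii")}}
            for m in messages
        ]
    }
```

Nothing is kept between invocations: everything the function needs arrives in `event`, which is what lets the platform run as many copies in parallel as the load requires.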
AWS Kinesis Streams
What is it? A high-throughput, low-latency service for real-time data processing over large, distributed data streams. Stores streaming data for 24 hours by default, during which the data can be read, processed, and stored in real time.
How to use it? Configure producer data sources to emit data into the stream. Build consuming applications that read and process data from that stream in real time.
Applications: Real-time metrics and reporting. Extracting metrics and generating KPIs to power reports and dashboards at real-time speeds. Used for streaming data that needs custom processing.
Why use it? Amazon Kinesis Streams has simple pay-as-you-go pricing, with no up-front costs or minimum fees, and you only pay for the resources you consume. Guarantees durability and availability of data. Also maintains the order of records within a shard.
Source: https://www.slideshare.net/frodriguezolivera/aws-kinesis-streams
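The ordering guarantee follows from how records are routed: Kinesis hashes each record's partition key with MD5 into a 128-bit integer and assigns the record to the shard whose hash key range contains that value, so records sharing a partition key land on the same shard in arrival order. The sketch below imitates that routing locally. It assumes the shards split the key space evenly, which is a simplification (real ranges come from the stream's description), and the function names are hypothetical.

```python
import hashlib


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming shards split the 128-bit key space evenly."""
    range_size = 2 ** 128 // shard_count
    # Clamp so the few keys past the last full range fall in the last shard.
    return min(hash_key(partition_key) // range_size, shard_count - 1)
```

Choosing a partition key with many distinct values (a user ID, for example) spreads load across shards while still keeping each user's events in order.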
AWS Kinesis vs Kafka
Both are data ingestion frameworks for streaming data, offering durability, reliability, and scalability.
Differences:
1. Kafka is open source. The user is responsible for installing and managing clusters.
2. Kinesis is a service managed by AWS, which saves the cost and effort of managing servers.
3. Kafka's costs include DevOps engineers plus storage and compute servers.
4. Because Kinesis is serverless, resource and human costs are much lower.
AWS Kinesis Firehose
What is it? A fully managed service that offers an easy-to-use way to collect and deliver streaming data to Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.
How to use it? Configure and use. No code needed.
Applications: Load streaming data into S3, Redshift, or Elasticsearch, which can connect to BI tools for real-time analysis. Unlike Kinesis Streams, Firehose is used when data does not need custom processing.
Why use it? Scales seamlessly to match data throughput without intervention.
AWS Kinesis Analytics
What is it? Fully managed service to process streaming data with SQL.
How to use it? Configure input stream, write queries and configure output stream.
Applications: Perform continual processing on streaming data.
Why use it? Pre-processing; basic analytics such as aggregates and filtering; advanced analytics such as anomaly detection, alerting, and triggering.
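Kinesis Analytics expresses this kind of continual processing in SQL. Purely to illustrate what an aggregate over a time period computes, the sketch below performs a tumbling-window count in plain Python. It is not Kinesis Analytics code, and the function name and sample events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    `events` is an iterable of (epoch_seconds, key) tuples.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "login"), (12, "login"), (59, "quiz"), (60, "login"), (61, "login")]
per_minute = tumbling_window_counts(events, 60)
```

The running `counts` table is the state that a stateless function would not keep between invocations, which is why this kind of aggregation suits a stateful stream processor.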
AWS Kinesis: serverless stream processing
Kinesis Streams: With Lambda, allows stateless processing of data. Ingests from multiple producers and delivers to multiple destinations. Scale must be managed using shards.
Kinesis Firehose: Transforms streaming data with Lambda and guarantees delivery to S3, Redshift, or Elasticsearch.
Kinesis Analytics: Stateful processing of streaming data, such as aggregations over a time period.
When to use which approach?
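As one illustration of the Firehose-with-Lambda option, here is a minimal sketch of a transformation function. Firehose passes a batch under `records`, each with a `recordId` and base64-encoded `data`, and expects every record back with the same `recordId` and a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The enrichment field `source` and its value are hypothetical.

```python
import base64
import json


def transform_handler(event, context):
    """Firehose transformation: tag each JSON object, flag records that cannot be parsed."""
    output = []
    for record in event["records"]:
        try:
            message = json.loads(base64.b64decode(record["data"]))
            if not isinstance(message, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Return the original payload so Firehose can route it as a failure.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        message["source"] = "firehose-sketch"  # hypothetical enrichment field
        # Newline-delimit the output so downstream tools can split records.
        encoded = base64.b64encode((json.dumps(message) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded.decode("ascii"),
        })
    return {"records": output}
```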
AWS DynamoDB
• Fully managed NoSQL database that supports both key-value and document store models.
• Other than the primary key, the table is schemaless.
• Supports 32 levels of nested attributes.
• An in-memory cache (DynamoDB Accelerator, DAX) can reduce response times to microseconds.
AWS DynamoDB Stream processing
• Durability and high availability
• Managed streams
• Performant
• Native integration with Lambda.
Source: AWS Webinars
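The Lambda integration can be sketched as follows. A function subscribed to a DynamoDB stream receives records carrying an `eventName` (`INSERT`, `MODIFY`, or `REMOVE`) and, when the stream is configured to include new images, a `NewImage` in DynamoDB's typed attribute format such as `{"S": "text"}` or `{"N": "42"}`. The helper below handles only a subset of types, and both function names are hypothetical.

```python
from decimal import Decimal


def from_attribute_value(av):
    """Convert a DynamoDB-typed value such as {"S": "x"} into a plain Python value.

    Covers only S, N, BOOL, NULL, M, and L; other types raise ValueError.
    """
    (type_tag, value), = av.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)  # numbers arrive as strings
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_attribute_value(v) for k, v in value.items()}
    if type_tag == "L":
        return [from_attribute_value(v) for v in value]
    raise ValueError("unsupported attribute type: " + type_tag)


def stream_handler(event, context):
    """Collect the new item images from INSERT and MODIFY stream records."""
    changed = []
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            image = record["dynamodb"].get("NewImage", {})
            changed.append({k: from_attribute_value(v) for k, v in image.items()})
    return changed
```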
AWS S3
Object storage that provides you a highly reliable, secure, and scalable storage for all your data,
big or small. It is designed to deliver 99.999999999% durability, and scale past trillions of objects.
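A short worked calculation shows what eleven nines means in practice, reading the durability figure as an annual, per-object probability. The object count of ten million is an illustrative assumption, and the names are hypothetical.

```python
from fractions import Fraction

# 99.999999999% written as an exact fraction to avoid floating-point error.
DURABILITY = Fraction(99999999999, 10 ** 11)


def expected_annual_losses(object_count):
    """Expected number of objects lost per year at the design durability."""
    return object_count * (1 - DURABILITY)


losses = expected_annual_losses(10_000_000)  # one ten-thousandth of an object per year
years_per_loss = 1 / losses                  # about one object every 10,000 years
```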
AWS Athena
• Launched at AWS re:Invent in November 2016.
• Interactive query service to analyze data stored in S3 buckets.
• Serverless, no infrastructure setup needed.
• Pay only for the queries you run: $5 per terabyte scanned by the queries.
• Works with a variety of standard data formats, including CSV, JSON, ORC, and Parquet.
• Uses Presto with full SQL support.
• Ideal for quick ad-hoc querying as well as complex analysis.
• Powers real-time dashboards.
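Because pricing depends on bytes scanned, the cost of a query can be estimated with simple arithmetic. The sketch below uses the $5 per terabyte figure from the slide, assumes a binary terabyte (1024^4 bytes), and ignores any per-query minimum or rounding. The scan sizes are hypothetical.

```python
PRICE_PER_TB_USD = 5.0      # figure quoted on the slide
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def athena_query_cost(bytes_scanned):
    """Estimated cost in USD of one query, ignoring minimums and rounding."""
    return PRICE_PER_TB_USD * bytes_scanned / BYTES_PER_TB


# Hypothetical comparison: scanning 2 TB in full versus a query that
# reads only a quarter of a terabyte.
full_scan = athena_query_cost(2 * BYTES_PER_TB)
smaller_scan = athena_query_cost(BYTES_PER_TB // 4)
```

Columnar formats such as ORC and Parquet let a query read only the columns it needs, which is one way to bring the scanned bytes, and therefore the cost, down.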
Linux Academy launches Cloud Assessments
(https://www.cloudassessments.com/)
1. Assess: Enroll in Quests (Example: AWS CSA) and take assessments that test real-
world AWS skills on live cloud environments.
2. Learn: Lean learning. Based on your performance, you are presented with a tailor-made learning path.
3. Earn: Earn proven skills and ability to pass certification exams, earn badges and
micro certifications.
Linux Academy and AWS Partnership
Give nonprofit teams and individuals unlimited access to our entire library of cloud certification training content to help build cloud skills at all levels:
• More than 2,500 self-paced video courses
• 209 total hours of AWS course training
• 438 Linux training hours
• 105 OpenStack training hours
• More than 60 hands-on, scenario-based labs for AWS skill building
• Live AWS lab servers for practicing newly-acquired skills
• Quizzes, study guides, flash cards, study groups, and practice exams
Analytics for CloudAssessments.com
(https://www.cloudassessments.com/)
1. Descriptive Analytics: Dashboards with charts and graphs
• Historical views
• Real time views
2. Anomaly Detection: detect abuse of system, operational inefficiencies
3. Recommendation Engine: to provide custom tailor-made learning paths
4. Predictive analytics: Predict student performance
5. Chat bots: Virtual assistants for learning guidance.
Real time processing using Kinesis Streams and Kinesis Analytics
Big Data architecture using AWS serverless
Thank you!

More Related Content

What's hot

Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Cathrine Wilhelmsen
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopCCG
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewRiccardo Zamana
 
Scaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark EcosystemScaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark EcosystemDatabricks
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
Future of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldFuture of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldSrivatsan Srinivasan
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityDatabricks
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Sprinkle Data Inc
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteMark van Rijmenam
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyDatabricks
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataScott Clinton
 
Auckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeAuckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeSergio Zenatti Filho
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
 

What's hot (20)

Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual Workshop
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overview
 
Scaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark EcosystemScaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark Ecosystem
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Future of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldFuture of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native world
 
Azure databricks by usama whaba khan
Azure databricks by usama whaba khanAzure databricks by usama whaba khan
Azure databricks by usama whaba khan
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
BDaas- BigData as a service
BDaas- BigData as a service  BDaas- BigData as a service
BDaas- BigData as a service
 
Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes Keynote
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform Strategy
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
 
Auckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeAuckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data Lake
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 

Similar to Building Data Analytics pipelines in the cloud using serverless technology

Aws re invent 2018 recap
Aws re invent 2018 recapAws re invent 2018 recap
Aws re invent 2018 recapCloudHesive
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?Software Guru
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Amazon Web Services
 
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)Amazon Web Services
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightAmazon Web Services
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAmazon Web Services
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
Amazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs KubernetesAmazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs KubernetesStridely Solutions
 
Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢
Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢
Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢Amazon Web Services
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Jamie Kinney
 
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...Amazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)Adrian Hornsby
 

Similar to Building Data Analytics pipelines in the cloud using serverless technology (20)

Aws re invent 2018 recap
Aws re invent 2018 recapAws re invent 2018 recap
Aws re invent 2018 recap
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
 
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
AWS re:Invent 2016: Accenture Cloud Platform Serverless Journey (ARC202)
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
 
Amazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs KubernetesAmazon AWS vs Azure Cloud vs Kubernetes
Amazon AWS vs Azure Cloud vs Kubernetes
 
2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days
 
Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢
Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢
Track 3 Session 6_打造應用專屬資料庫 (Purpose-built) 與了解託管服務優勢
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
 
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
What's new in AWS?
What's new in AWS?What's new in AWS?
What's new in AWS?
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)
 

More from Domino Data Lab

What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...Domino Data Lab
 
Racial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops dataRacial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops dataDomino Data Lab
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itDomino Data Lab
 
Supporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentationSupporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentationDomino Data Lab
 
Leveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryLeveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryDomino Data Lab
 
Summertime Analytics: Predicting E. coli and West Nile Virus
Summertime Analytics: Predicting E. coli and West Nile VirusSummertime Analytics: Predicting E. coli and West Nile Virus
Summertime Analytics: Predicting E. coli and West Nile VirusDomino Data Lab
 
Reproducible Dashboards and other great things to do with Jupyter
Reproducible Dashboards and other great things to do with JupyterReproducible Dashboards and other great things to do with Jupyter
Reproducible Dashboards and other great things to do with JupyterDomino Data Lab
 
GeoViz: A Canvas for Data Science
GeoViz: A Canvas for Data ScienceGeoViz: A Canvas for Data Science
GeoViz: A Canvas for Data ScienceDomino Data Lab
 
Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Domino Data Lab
 
Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Domino Data Lab
 
Leveraged Analytics at Scale
Leveraged Analytics at ScaleLeveraged Analytics at Scale
Leveraged Analytics at ScaleDomino Data Lab
 
How I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataHow I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataDomino Data Lab
 
Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...Domino Data Lab
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsDomino Data Lab
 
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino Data Lab
 
The Role and Importance of Curiosity in Data Science
The Role and Importance of Curiosity in Data ScienceThe Role and Importance of Curiosity in Data Science
The Role and Importance of Curiosity in Data ScienceDomino Data Lab
 
Fuzzy Matching to the Rescue
Fuzzy Matching to the RescueFuzzy Matching to the Rescue
Fuzzy Matching to the RescueDomino Data Lab
 
How to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical FeaturesHow to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical FeaturesDomino Data Lab
 
Building Up Local Models of Customers
Building Up Local Models of CustomersBuilding Up Local Models of Customers
Building Up Local Models of CustomersDomino Data Lab
 

More from Domino Data Lab (20)

What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...
 
Racial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops dataRacial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops data
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using it
 
Supporting innovation in insurance with randomized experimentation



Building Data Analytics pipelines in the cloud using serverless technology

  • 2. Building Serverless Data Pipelines in the Cloud Manisha Sule Director of Big Data Analytics, Linux Academy. Board Member on SMU’s Big Data Advisory Board. linkedin.com/in/manisha-sule @tweetDataS
  • 3. Agenda 1. What is serverless? 2. Big Data architectures and best practices 3. AWS serverless services: • Lambda • Kinesis (Streams, Firehose, Analytics) • DynamoDB • S3 • Athena 4. Analytics for CloudAssessments.com
  • 4. What is serverless? Source: https://www.slideshare.net/CodeOps/serverless-architecture-a-gentle-overview?qid=aecf8d27-8b16-4da5-987f-600fe1cb0655&v=&b=&from_search=5
  • 5. Serverless architectures • Depend on third-party services, known as Backend as a Service (BaaS). • Are distributed systems that react to events and triggers. • Scale dynamically, based on demand. • Use ephemeral (short-lived) containers or computational resources in the cloud.
  • 6. Advantages of serverless • Fully managed: the cloud provider manages the servers. • Highly available and scalable, with no provisioning needed and zero administration. • Not just compute containers; also includes NoSQL databases, interactive query services, storage services, and messaging services. • Cost efficient: you never pay for idle time. • Support for continuous integration/continuous delivery pipelines. • Developers can focus on architecture and code only. • Gartner terms this fPaaS and lists several use cases: utility logic, scheduled processing, event-driven architecture, microservices, full-blown applications.
  • 7. AWS Serverless Application Model Template-based mechanism for defining and deploying serverless applications. Source: AWS Tech Talk Webinar
  • 8. Big Data Lambda architecture Requirements of Big Data architectures: 1. Processing real-time streams. 2. Processing batch data. 3. Real-time ETL. 4. Enriching real-time data with batch data. 5. Queries must be answerable using both batch data and real-time data.
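Requirement 5 above (queries answerable from both batch and real-time data) can be illustrated with a minimal sketch. It assumes the batch layer has already produced per-key counts and the real-time layer holds counts for events that arrived since the last batch run. The function name `merge_views` and the sample course keys are hypothetical, chosen only for illustration.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Combine precomputed batch counts with counts not yet absorbed by a batch run."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds counts key by key
    return dict(merged)


# Hypothetical data: assessment attempts per course.
batch_view = {"aws-csa": 120, "linux-basics": 75}
realtime_view = {"aws-csa": 3, "openstack": 1}

totals = merge_views(batch_view, realtime_view)
```

A query is then served from `totals`, so it reflects both the historical and the most recent events.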
  • 9. Big Data best practices 1. Build a decoupled architecture: decouple the data->store->process->store steps. 2. Use the right tools for the job, based on latency, throughput, access patterns, and data structures. 3. Be cost effective: Big data, not big cost.
  • 10. AWS Managed vs Serverless services Managed: Need to manage servers, their scale, their location, software updates etc. • Elastic MapReduce: Managed Hadoop framework, includes Apache Spark, Zeppelin, HBase, Flink etc. • Elasticsearch: For log analytics, full-text search, application monitoring, and more. Fully integrated with Kibana and Logstash. • Redshift: Fully managed data warehouse, to analyze data and integrate with BI tools. • RDS: Database service to set up, operate and scale a database in the cloud. Serverless: Automatically available in all availability zones in the region, set on a regional level in the AWS infrastructure. HA and fault tolerant. • Lambda • Kinesis • S3 • DynamoDB • Athena • API Gateway • CloudWatch • QuickSight • IoT • Cognito • SQS
  • 11. AWS Lambda • Heart of serverless architecture patterns. • Stateless, event-driven code. Supports Node.js, Python, Java, C#. • No infrastructure to manage. • No risk of over-provisioning or under-provisioning; you don’t pay for idle time. • Logging and operational monitoring are built in. • Efficient performance at scale. If a thousand requests come in, it scales automatically. • Lets you skip the boring and the hard parts. Easy to author and deploy, so you can focus on business logic.
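As a rough illustration of "stateless, event-driven code", here is a minimal sketch of a Python handler. The `(event, context)` signature is the one Lambda uses for Python functions. The event shape (a `scores` list) and the response fields are assumptions made up for this example, not something the slides specify.

```python
import json


def handler(event, context):
    """Entry point invoked once per event; nothing is kept between calls."""
    scores = event.get("scores", [])  # hypothetical event field
    average = sum(scores) / len(scores) if scores else None
    body = {"count": len(scores), "average": average}
    return {"statusCode": 200, "body": json.dumps(body)}


# Local invocation with a made-up event; Lambda would supply a real context object.
response = handler({"scores": [80, 90, 100]}, None)
```

Because the function reads everything it needs from `event`, any number of copies can run in parallel, which is what lets the service scale it automatically.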
  • 12. AWS Kinesis Streams What is it? A high-throughput, low-latency service for real-time data processing over large, distributed data streams. It stores streaming data for a period of 24 hours, during which the data can be read, processed, and stored in real time. How to use it? Configure producer data sources to emit data into the stream. Build consuming applications that read and process data from that stream in real time. Applications: Real-time metrics and reporting; extracting metrics and generating KPIs to power reports and dashboards at real-time speeds. Used for streaming data that needs custom processing. Why use it? Amazon Kinesis Streams has simple pay-as-you-go pricing, with no up-front costs or minimum fees, and you pay only for the resources you consume. It guarantees durability and availability of data, and also maintains the order of data. Source: https://www.slideshare.net/frodriguezolivera/aws-kinesis-streams
  • 13. AWS Kinesis vs Kafka Both are data ingest frameworks for streaming data with durability, reliability and scalability. Differences: 1. Kafka is open source. The user is responsible for installing and managing clusters. 2. Kinesis is a service managed by AWS and saves the cost and effort of managing servers. 3. Kafka’s costs include DevOps engineers and storage and compute servers. 4. With Kinesis being serverless, resource and human costs are much lower.
  • 14. AWS Kinesis Firehose What is it? A fully managed service that offers an easy-to-use solution to collect and deliver streaming data to Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. How to use it? Configure and use. No code needed. Applications: Load streaming data into S3, Redshift, or Elasticsearch, which can connect to BI tools for real-time analysis. Unlike Kinesis Streams, Firehose is used when data does not need custom processing. Why use it? Seamlessly scales to match data throughput without intervention.
  • 15. AWS Kinesis Analytics What is it? A fully managed service to process streaming data with SQL. How to use it? Configure an input stream, write queries, and configure an output stream. Applications: Perform continual processing on streaming data. Why use it? Pre-processing; basic analytics like aggregates and filtering; advanced analytics like anomaly detection, alerting and triggering.
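Kinesis Analytics expresses such aggregates in SQL. The plain-Python sketch below is not that SQL, only an illustration of the underlying idea of a fixed, non-overlapping (tumbling) time window. The function name, the `(timestamp_seconds, key)` event shape and the sample events are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) over fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, event type).
events = [(0, "login"), (12, "quiz"), (59, "login"), (60, "login"), (61, "quiz")]
per_minute = tumbling_window_counts(events, 60)
```

Each key of `per_minute` pairs a window start with an event type, so the first minute holds two logins and one quiz, and the second minute holds one of each.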
  • 16. AWS Kinesis: serverless stream processing. When to use which approach? Kinesis Streams: With Lambda, allows stateless processing of data. Ingests from multiple producers and delivers to multiple destinations. Needs management of scale using shards. Kinesis Firehose: Transforms streaming data with Lambda and guarantees delivery to S3, Redshift or Elasticsearch. Kinesis Analytics: Stateful processing of streaming data, like aggregations over a time period.
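The "Kinesis Streams with Lambda" case can be sketched as a handler that processes one batch of records at a time without keeping state. Lambda delivers Kinesis records under `event["Records"]`, with each payload base64-encoded in `record["kinesis"]["data"]`. The JSON payload fields (`student`, `passed`) and the helper `fake_record` are hypothetical and exist only so the sketch can be run locally.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and collect failed attempts."""
    failed = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        attempt = json.loads(payload)
        if not attempt["passed"]:
            failed.append(attempt["student"])
    return {"failed": failed}


def fake_record(item):
    """Build a record shaped like the ones Lambda receives from a stream (test helper)."""
    data = base64.b64encode(json.dumps(item).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {"Records": [
    fake_record({"student": "a1", "passed": True}),
    fake_record({"student": "b2", "passed": False}),
]}
result = handler(sample_event, None)
```

Anything that must survive across batches, such as a running total over a time period, belongs in a stateful service like Kinesis Analytics or in an external store rather than in the handler.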
  • 17. AWS DynamoDB • Fully managed NoSQL database that supports both key-value and document store models. • Other than the primary key, the table is schemaless. • Supports up to 32 levels of nested attributes. • An in-memory cache reduces response times to microseconds.
  • 18. AWS DynamoDB Stream processing • Durability and high availability • Managed streams • Performant • Native integration with Lambda. Source: AWS Webinars
  • 19. AWS S3 Object storage that provides highly reliable, secure, and scalable storage for all your data, big or small. It is designed to deliver 99.999999999% durability, and to scale past trillions of objects.
  • 20. AWS Athena • Launched at AWS re:Invent in November 2016. • Interactive query service to analyze data stored in S3 buckets. • Serverless, no infrastructure setup needed. • Pay only for the queries you run: $5 per terabyte scanned by the queries. • Works with a variety of standard data formats, including CSV, JSON, ORC, and Parquet. • Uses Presto with full SQL support. • Ideal for quick ad-hoc querying as well as complex analysis. • Powers real-time dashboards.
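The pay-per-scan pricing above lends itself to a quick back-of-the-envelope estimate. The sketch below uses the $5 per terabyte figure from the slide and assumes 1 TB = 1024^4 bytes. It ignores any rounding or per-query minimums that the actual bill may apply, and the scan sizes are made-up numbers.

```python
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0     # figure quoted on the slide


def athena_query_cost(bytes_scanned):
    """Estimate the cost in USD of a query that scans the given number of bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: a query scanning a full 2 TB of CSV versus the same
# query scanning a tenth of that because it reads only the columns it needs.
full_scan_cost = athena_query_cost(2 * BYTES_PER_TB)
column_scan_cost = athena_query_cost(0.2 * BYTES_PER_TB)
```

Since cost follows bytes scanned rather than query runtime, columnar formats such as ORC and Parquet, which let a query skip unneeded columns, are the usual way to bring the bill down.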
  • 21. Linux Academy launches Cloud Assessments (https://www.cloudassessments.com/) 1. Assess: Enroll in Quests (Example: AWS CSA) and take assessments that test real-world AWS skills on live cloud environments. 2. Learn: Lean learning. Based on your performance, you are presented with a tailor-made learning path. 3. Earn: Earn proven skills and the ability to pass certification exams; earn badges and micro-certifications.
  • 22. Linux Academy and AWS Partnership Give nonprofit teams and individuals unlimited access to our entire library of cloud certification training content to facilitate cloud building skills for all levels: • More than 2,500 self-paced video courses • 209 total hours of AWS course training • 438 Linux training hours • 105 OpenStack training hours • More than 60 hands-on, scenario-based labs for AWS skill building • Live AWS lab servers for practicing newly-acquired skills • Quizzes, study guides, flash cards, study groups, and practice exams
  • 23. Analytics for CloudAssessments.com (https://www.cloudassessments.com/) 1. Descriptive Analytics: Dashboards with charts and graphs • Historical views • Real-time views 2. Anomaly Detection: Detect abuse of the system and operational inefficiencies. 3. Recommendation Engine: To provide tailor-made learning paths. 4. Predictive Analytics: Predict student performance. 5. Chatbots: Virtual assistants for learning guidance.
  • 24. Real time processing using Kinesis streams and Kinesis Analytics
  • 25. Big Data architecture using AWS serverless