Using real time big data analytics for competitive advantage


Many organisations find it challenging to successfully perform real-time data analytics using their own on-premises IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can be a costly and time-consuming exercise.

Most of the time, infrastructure is under-utilised, and it is nearly impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.

To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up resources to support their data analytics projects exactly when they are needed. Importantly, they can 'switch off' that infrastructure when it is not.

BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real-time data analytics to gain new insights from their information, improve the customer experience and drive competitive advantage.


Using real time big data analytics for competitive advantage

  1. Data Is the New Oil
  2. What is Big Data?
     - Quantum of data: TB to PB of data
     - Speed of data: millisecond latency
     - Types of data: hundreds of data sources
     - Quality of data: varies greatly, and affects the accuracy of analysis
     - Business relevance: how does it help the business?
     25+ TB of data is being generated per second globally; 90+% of the world's data was created in the last 2 years; 90+% of data generated is unstructured.
  3. Evolution of Big Data Processing (type of analytics on one axis, speed of analysis from batch to real-time on the other)
     - Descriptive (batch): dashboards, traditional query & reporting; answers what happened and why it happened
     - Real-time: it is happening! Alerts, analysis & detection; what is going wrong, fraudulent use
     - Predictive: prediction engines, inventory forecasting, cross-sell analysis; the probability of 'x' happening
     - Prescriptive: recommendation engines (routes, content recommendations); what to do if 'x' happens
  4. Big Data was built for the Cloud
     - Big Data: potentially massive datasets; an iterative, experimental style of data manipulation & analysis; frequently not a steady-state workload (peaks & valleys); variety & velocity of data; complex tool management
     - AWS Cloud: massive, virtually unlimited capacity; on-demand infrastructure allows iterative, experimental deployment/usage; most efficient with highly variable workloads; tools & services for managing structured & unstructured, batch & stream data; fully managed
  5. Broad, Tightly Integrated Capabilities: AWS provides the broadest platform for big data analytics today. Start with a business case. The pipeline runs Ingest/Collect -> Store -> Process & Analyze -> Consume/Visualize, yielding answers & insights, and is measured by time to answer (latency), throughput and cost.
     - Ingest/Collect: Amazon Kinesis Firehose (real-time), AWS Import/Export Snowball (data import), AWS Direct Connect (data connect), AWS Storage Gateway, AWS Database Migration Service
     - Store: Amazon S3 (object storage), Amazon Kinesis Streams (real-time), Amazon RDS (relational databases), Amazon DynamoDB (NoSQL databases)
     - Process & Analyze: Amazon EMR (Hadoop, Spark, etc.; distributed), AWS Lambda and Amazon Kinesis Analytics (real-time), Amazon Redshift (data warehousing), Amazon Machine Learning, Amazon Elasticsearch Service
     - Consume/Visualize: Amazon QuickSight (BI & data visualization)
  6. Amazon Redshift: fast, fully managed, petabyte-scale data warehouse
     - 10x better performance than traditional DBs
     - Less than one-tenth the cost of traditional solutions
     - Simple and fully managed
     - Flexible & scalable: easily change the number or type of nodes
     - ANSI SQL compatible: use familiar SQL clients/BI tools
     - Secure: encryption, network isolation, audit & compliance
     - Ideal usage patterns: sales, historical, gaming, finance, marketing, ad, social data
     (Architecture diagram: SQL clients/BI tools connect via JDBC/ODBC to a leader node, which fronts compute nodes of 128 GB RAM, 16 TB disk and 16 cores each over 10 GigE; ingestion, backup and restore run against Amazon S3.)
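Because Redshift speaks ANSI SQL over standard JDBC/ODBC drivers, any PostgreSQL-compatible client can query it. Below is a minimal, hypothetical sketch using Python's psycopg2; the cluster endpoint, database, credentials and the sales table are all placeholders for illustration, not a reference implementation.

```python
# Hypothetical sketch: querying Amazon Redshift with a standard
# PostgreSQL driver. Endpoint, credentials and table are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,                 # Redshift's default port
    dbname="dev",
    user="awsuser",
    password="example-password",
)
with conn.cursor() as cur:
    # Any ANSI SQL a BI tool would issue works here, e.g. a daily sales rollup.
    cur.execute("SELECT order_date, SUM(amount) FROM sales GROUP BY order_date ORDER BY 1;")
    for row in cur.fetchall():
        print(row)
conn.close()
```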
  7. Amazon EMR: quickly and cost-effectively process vast amounts of data
     - Largest cloud operator of Hadoop infrastructure
     - Open source & MapR distributions
     - Most current Hadoop distribution
     - Flexibility: decoupled compute & storage, select apps, resize
     - Simple: launch a cluster in minutes, fully managed
     - Scalable: provision as much capacity as needed
     - Multiple pricing options: On-Demand, Reserved Instances, Spot
     - Typical use cases: clickstream analysis, log processing, genomics
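"Launch a cluster in minutes" can be done with a single API call. The following is a hedged boto3 sketch; the cluster name, instance types, release label and log bucket are assumptions you would adapt to your account.

```python
# Hypothetical sketch: launching a small EMR cluster with boto3.
import boto3

emr = boto3.client("emr", region_name="ap-south-1")  # Mumbai region, per the deck

response = emr.run_job_flow(
    Name="analytics-poc",                      # placeholder cluster name
    ReleaseLabel="emr-5.0.0",                  # pick a current release for your account
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,                    # 1 master + 2 core nodes
        "KeepJobFlowAliveWhenNoSteps": True,   # keep the cluster up for interactive use
    },
    JobFlowRole="EMR_EC2_DefaultRole",         # default roles from `aws emr create-default-roles`
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://example-bucket/emr-logs/",    # placeholder bucket
)
print(response["JobFlowId"])
```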
  8. Amazon Kinesis: easily work with real-time streaming data
     - Amazon Kinesis Streams: build custom apps to process or analyze streaming data. Typical use cases: log & event data collection, real-time analytics
     - Amazon Kinesis Firehose: easily load massive volumes of streaming data into S3, Redshift and Amazon Elasticsearch Service. Typical use cases: digital marketing, IoT, mobile data capture
     - Amazon Kinesis Analytics: easily analyze data streams using standard SQL queries
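As a rough illustration of the producer side of Kinesis Streams, here is a hedged boto3 sketch; the stream name and event payload are hypothetical.

```python
# Hypothetical sketch: pushing a clickstream event into a Kinesis stream.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="ap-south-1")

event = {"user_id": "u-123", "page": "/checkout", "ts": 1467000000}
kinesis.put_record(
    StreamName="clickstream",           # placeholder; create the stream first
    Data=json.dumps(event).encode(),    # record payload must be bytes
    PartitionKey=event["user_id"],      # groups a user's events onto one shard
)
```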
  9. Amazon Elasticsearch Service: fully managed, making it easy to set up, operate & scale Elasticsearch clusters in the cloud
     - Easy set-up & configuration; fully managed
     - Flexible storage options
     - Set up for high availability
     - Seamlessly scale
     - Direct access to Elasticsearch APIs
     - Support for ELK; built-in Kibana
     - Integration with AWS IAM for controlling access to your domain
     - Integration with AWS CloudTrail for auditing
     (Diagram: the Elasticsearch API fronted by Amazon Route 53 and Elastic Load Balancing, integrated with IAM, CloudWatch and CloudTrail.)
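Domain creation is likewise one API call. A hedged boto3 sketch follows; the domain name, instance sizing and volume size are assumptions.

```python
# Hypothetical sketch: creating an Amazon Elasticsearch Service domain.
import boto3

es = boto3.client("es", region_name="ap-south-1")

es.create_elasticsearch_domain(
    DomainName="logs-demo",                        # placeholder
    ElasticsearchClusterConfig={
        "InstanceType": "m3.medium.elasticsearch",
        "InstanceCount": 2,
        "ZoneAwarenessEnabled": True,              # spread instances across two AZs
    },
    EBSOptions={
        "EBSEnabled": True,                        # EBS volumes instead of instance storage
        "VolumeType": "gp2",
        "VolumeSize": 20,                          # GiB per node
    },
)
```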
  10. Select Big Data & Analytics Customers: the vast majority of Big Data use cases deployed in the cloud today run on AWS
  11. Now available in the Mumbai region!
  12. Real-time analytics in the cloud
  13. Data is Growing
     - 1.7 MB of new data will be created every second for every human being on the planet by 2020 (http://www.whizpr.be/upload/medialab/21/company/Media_Presentation_2012_DigiUniverseFINAL1.pdf)
     - 58% compound annual growth rate is forecast for the Hadoop market, surpassing $1 billion by 2020 (http://www.ap-institute.com/big-data-articles/big-data-what-is-hadoop-%E2%80%93-an-explanation-for-absolutely-anyone.aspx, http://www.marketanalysis.com/?p=279)
     - 0.5% of all data is ever analyzed and used at the moment (http://www.technologyreview.com/news/514346/the-data-made-me-do-it/)
  14. Why AWS for Big Data: immediately available; broad and deep capabilities; trusted and secure; scalable
  15. Collect, Store, Analyze, and Visualize: it's easy to get data to AWS, store it securely, and analyze it with the engine of your choice, without any long-term commitment or vendor lock-in
     - Collect: AWS Import/Export Snowball, AWS Direct Connect, VM Import/Export
     - Store: Amazon S3, Amazon EMR, Amazon Glacier, Amazon Redshift, Amazon DynamoDB
     - Analyze: Amazon Kinesis, AWS Lambda, Amazon EMR, Amazon EC2
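Getting data to AWS usually starts with landing it in S3. A minimal hedged sketch follows; the bucket, key layout and local file are placeholders.

```python
# Hypothetical sketch: landing collected data in Amazon S3, the usual
# first stop in the pipeline.
import boto3

s3 = boto3.client("s3", region_name="ap-south-1")

with open("clicks-2016-07-01.csv", "rb") as f:    # placeholder local export
    s3.put_object(
        Bucket="example-analytics-raw",           # placeholder bucket
        Key="clickstream/2016/07/01/clicks.csv",  # date-partitioned layout for EMR/Redshift loads
        Body=f,
        ServerSideEncryption="AES256",            # encrypt at rest
    )
```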
  16. AWS: the most complete platform for Big Data. Big data repositories, clickstream analysis, ETL offload, machine learning, online ad serving, BI applications
  17. Benefits
     - Unearth opportunities: save cost and improve revenue
     - Ease of deployment: technologies are powerful and flexible yet easy to deploy and use
     - No upfront charges: you can start small, scale rapidly and pay as you go
  18. BluePi as AWS Partner: AWS Consulting Partner and preferred partner for the last 3 years. 30+ projects and PoCs delivered together. Pan-India collaboration helping industries such as logistics, media, BFSI and startups.
  19. Why BluePi?
     - Successfully executed 10+ big data projects
     - Handles more than 60 TB of data in total, with 1 TB of incremental data per month
     - Serves analytics for 1 billion events per month
     - Solutions include predictive analysis, recommendation engines and real-time dashboards
  20. A cross-industry need
     - Banking and financial services: authentication, validation and fraud prevention
     - Logistics: optimum route planning, SLA tracking
     - Online commerce: recommendations, cross-sell and upsell opportunities
     - Media and entertainment: context-based ads and content
  21. Trusted Partner Of... and many more
  22. Technologies we thrive on: real-time streaming, visualization (Amazon QuickSight) and big data technologies
  23. What is real-time analytics?
     - The analysis of "data in motion": data captured by systems for monitoring, alerts, and creating reports, used for high-level decision making
     - The processing of every incoming event, enabling organizations to inspect, analyze, and act
     - It is NOT batch or micro-batch processing, nor the analysis of static data, even if that data was captured a few minutes ago
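To make "processing every incoming event" concrete, here is a hedged sketch of a single-shard Kinesis consumer loop. This is illustrative only; a production app would use the Kinesis Client Library (KCL) instead. The stream name and the handler are hypothetical.

```python
# Minimal sketch: read and handle each event from one Kinesis shard.
import time
import boto3

kinesis = boto3.client("kinesis", region_name="ap-south-1")

def inspect_analyze_act(payload: bytes) -> None:
    # Hypothetical per-event handler: inspect the event, analyze it, act on it.
    print(payload)

iterator = kinesis.get_shard_iterator(
    StreamName="clickstream",            # placeholder stream
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",          # only new events; TRIM_HORIZON replays the backlog
)["ShardIterator"]

while True:
    batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in batch["Records"]:
        inspect_analyze_act(record["Data"])
    iterator = batch["NextShardIterator"]
    time.sleep(0.2)                      # stay under per-shard read limits
```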
  24. Why real-time analytics?
  25. Pi-stats: Analytics and Recommendation
     - Real-time dashboard
     - Targeted push notifications
     - News and tag recommendation
     - Hybrid machine learning algorithm
     - Trending news prediction
     - Language agnostic
  26. Pi-stats in numbers: 300+ million monthly events; 10,000 events/sec; 6 million unique users; 6 different languages; 500 GB of data/month
  27. Business problem
     - Integrated data store across multiple applications to enable complex business analytics and reporting
     - Adding ~5 GB of data daily to a 5 TB data warehouse increased complexity
     - Daily dispatches: 350-500K; daily package scans: 10-15M
     - Cities served: 550+; active clients: 3,000+
  28. The Solution
     - Implemented a data warehouse and analytics solution on AWS
     - Built with cutting-edge technologies such as Redshift, Kinesis, Redis and Storm
     - Delivered in 6 months using agile methodology
     The Benefit
     - Near real-time reporting to track SLAs
     - Flexible architecture ensures performance at petabyte scale
     - Visualization and custom BI tools available
  29. "This project has been one of the most challenging and ambitious ones for Delhivery. Given the multifaceted challenges and complexity involved, the BluePi team took up the task enthusiastically and executed it throughout with the same perseverance and rigor. The reassurance and ongoing support they provided, combined with their innovative and collaborative style, gives us further reason to appreciate them." - Kapil Bharti, CTO, Delhivery
  30. Questions and concerns: to learn how BluePi and AWS can help you, click here or e-mail us at info@bluepi.in
  31. Q&A

Editor's notes

  • Data in the 21st century is like oil in the 18th century: an immensely valuable yet largely untapped asset. As with oil, for those who see data's fundamental value and learn to extract and use it, there will be huge rewards
  • Big data is typically described in terms of 3 Vs (the volume, variety & velocity of data, all ever increasing), and recently 2 more have been added (value & veracity):
    Value: Refers to the business relevance of the captured data, i.e. how does it help the business?
    Veracity: Refers to the quality of captured data, which varies greatly. This is important as it affects the accuracy of the analysis
    Variety: Refers to the nature of the captured data. You have a plethora of data sources today and hence a broad variety of data, be it log/streaming/IoT data or transactional data. Then you have, for example, file data with a fixed schema (CSV, Parquet, Avro) and file data which is schema-free (JSON, key-value). Then you have small files and large files, and I could go on
    Velocity: Refers to the speed at which the data is generated and processed. Today, for real-time use cases, we are talking about millisecond latency. 1 million reads and writes per second is becoming the norm, for example, for customers in the digital advertising business
    Volume: Refers to the quantity of data being generated and stored. The size of the data determines the value and potential insight, and whether it can actually be considered big data or not. Customers generating 100-150 TB a day is not uncommon now
    25+TB of data being generated per second globally
    90+% of world’s data created in last 2 years
    90+% of data generated is unstructured and hence needs some work before it can be meaningfully used
  • Now let's look at how big data processing is evolving
    On the x-axis you have the speed of analysis, while on the y-axis you have the type of analytics you can derive from it
    With batch analysis it's typically descriptive analytics. Descriptive analytics answers the questions what happened and why did it happen. It looks at past performance and understands that performance by mining historical data for the reasons behind past success or failure. Most management reporting - such as sales, marketing, operations, and finance - uses this type of post-mortem analysis. Good for dashboards, reports in response to queries, looking at trends, looking at outcomes, e.g. (i) a daily customer-preferences report from your web site's click stream: helps you decide how to optimize deals and what ad to try next time; (ii) daily fraud reports: was there fraud yesterday?
    Dealing with data in real time moves this from what happened to what is happening. Great for real-time alerts (what is happening now, what is going wrong now), real-time analysis (what to offer the current customer now), and real-time spending caps (a transaction gets denied because it exceeds your balance, for example)
    The next phase is predictive analytics. Predictive analytics answers the question what might happen. This is when historical performance data is combined with a variety of statistical, modeling, data mining, and machine learning techniques, and occasionally external data, to determine the probable future outcome of an event or the likelihood of a situation occurring
    The final phase is prescriptive analytics, which goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions and showing the implications of each decision option. E.g. think of a traffic navigation app: pick an origin and a destination, and a multitude of factors get mashed together before it advises you on different route choices, each with a predicted ETA. This is everyday prescriptive analytics at work. Prescriptive analytics can continually take in new data to re-predict and re-prescribe, thus automatically improving prediction accuracy and prescribing better decision options. So prescriptive analytics provides intelligent recommendations for the optimal next steps for almost any application or business process to drive desired outcomes. While predictive analytics forecasts what might happen in the future, prescriptive analytics can help alter the future
    Example of a retailer that offers free expedited shipping to loyal customers.
    Descriptive analysis would provide the trends on which this program was structured
    Based on past customer behavior, a predictive model would assume that customers will keep the majority of what they purchase with this promotion. However, one customer purchases eight items of clothing but decides to keep only one.
    The retailer paid for expedited shipping with the assumption that there's this great consumer out there who bought eight items, so they're willing to invest and lose a little margin on shipping. The algorithm didn't take return behavior into account.
    For this retailer, reducing its losses on "outlier" customers who don't follow what predictive analytics forecasted means having policies in place to cover itself. Using prescriptive analytics, the retailer might come up with the options of giving an in-store-only coupon to customers who make returns (to encourage another purchase in which shipping isn't a factor) or notifying customers that they must pay for return shipping
  • Big Data was built for the cloud, and if you aren't using the cloud for big data then you either aren't dealing with big data or you are struggling (or going to run into issues very soon). Let's understand why that's the case
    With big data you typically are dealing with very large or fast-growing data sets, and with on-prem infrastructure you will run into capacity issues sooner than later. You have no such capacity issues with the cloud
    With big data there are typically peaks and valleys, rarely a steady volume, and that creates challenges for on-prem infrastructure because you have to provision for peak load, which is highly inefficient. The cloud, in contrast, is most efficient with highly variable workloads
    Given the variety and velocity of big data you need a set of services & tools to manage it, and managing them yourself is complex, whereas in the AWS cloud that same set of tools & services is fully managed
  • If you look at a typical big data pipeline, data comes in one side and answers/insights come out the other, with multiple stages in between: ingest, store, process & analyze, and consume/visualize, with store and process repeating multiple times to shape the data into whatever format, rate or characteristics the end consuming application demands. What goes on in between is characterized by time to answer (pipeline latency), pipeline throughput = f(volume, request rate), and cost
    Before we get to the components that enable this, it's important to emphasize that it's imperative to start by understanding the use case, in other words the answers and insights that are required, why they are required, and how they will help the business, before embarking on building out the solution and piecing together the elements to enable it. What's important is leveraging the data, not the technology stack. The technology exists today to make it all happen quickly, securely & cost-efficiently!
    Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. Once your models are ready, Amazon Machine Learning makes it easy to obtain predictions for your application using simple APIs, without having to implement custom prediction generation code or manage any infrastructure (a hedged request sketch follows just after this note block). Amazon Machine Learning is based on the same proven, highly scalable ML technology used for years by Amazon's internal data scientist community
    And then we have a new addition to the analytics portfolio in Amazon QuickSight, our very fast, easy-to-use, cloud-powered business intelligence service at 1/10th the cost of traditional BI solutions ($9/user/month). Amazon QuickSight is currently in preview
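As referenced above, requesting a real-time prediction from an Amazon Machine Learning model takes a single call. This is a hedged sketch; the model ID, endpoint URL and feature record are placeholders, and the service is only available in certain regions.

```python
# Hypothetical sketch: a real-time prediction from an Amazon ML model.
import boto3

ml = boto3.client("machinelearning", region_name="us-east-1")

resp = ml.predict(
    MLModelId="ml-exampleModelId",     # placeholder model
    Record={"page": "/checkout", "device": "mobile"},  # feature values as strings
    PredictEndpoint="https://realtime.machinelearning.us-east-1.amazonaws.com",
)
print(resp["Prediction"])
```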
  • Fast, fully managed, petabyte-scale data warehouse
    Fast - Optimized for data warehousing. Redshift has a massively parallel processing (MPP) architecture with columnar storage, data compression and 10GigE networking between nodes for up to 10x better performance than traditional relational, row-based databases
    Cheap - No upfront costs; pay only for the resources you provision. Start small for $0.25 per hour and scale to over a PB for $935 per TB per year, less than a tenth of most other data warehousing solutions
    Simple - Get started in minutes with a few clicks or a simple API call. Fully managed and fault tolerant. Easy to set up, operate and scale. We take care of provisioning, installation, monitoring, backup, restore and patching
    Scalable - With a few clicks in the Console or a simple API call, you can change the type or number of nodes as your performance or capacity needs change (see the resize sketch after these notes). While resizing, your cluster still runs in read-only mode
    ANSI SQL Compliant - Uses standard JDBC and ODBC drivers, allowing you to use a wide range of familiar SQL clients/BI tools
    Secure - You can encrypt data at rest and in transit using hardware-accelerated AES-256 and SSL, isolate your clusters using Amazon VPC, and even manage your keys using hardware security modules (HSMs). Compliant with SOC1, SOC2 & SOC3, FedRAMP, HIPAA and PCI DSS Level 1
    Durability and Availability
    Replication
    Backup
    Automated recovery from failed drives & nodes
    Interfaces
    JDBC/ODBC interface with BI/ETL tools
    Amazon S3 or DynamoDB
    Cost model
    No upfront costs or long term commitments
    Free backup storage equivalent to 100% of provisioned storage
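The resize sketch referenced above: changing the type or number of nodes really is one API call in boto3. The cluster identifier and target sizing are placeholders.

```python
# Hypothetical sketch: resizing a Redshift cluster with a single call.
import boto3

redshift = boto3.client("redshift", region_name="ap-south-1")

redshift.modify_cluster(
    ClusterIdentifier="examplecluster",  # placeholder
    NodeType="dc1.large",
    NumberOfNodes=4,                     # cluster keeps serving reads while resizing
)
```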
  • Amazon Elastic MapReduce (EMR) simplifies big data processing by providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances
    You can also run other popular distributed frameworks such as Apache Spark and Presto or any other application in the Apache Hadoop stack in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB
    A little-trumpeted fact: EMR is the largest cloud operator of Hadoop infrastructure, having spun up tens of millions of clusters for customers since 2009
    EMR supports the open source & MapR distributions and has the most current Hadoop distribution in the market today, with the current versions of the most popular Hadoop apps
    Fully managed and hence simple, allowing you to launch a cluster in minutes; EMR takes care of provisioning, set-up, configuration, tuning and monitoring (a step-submission sketch follows these notes)
    Extremely flexible, as compute & storage are decoupled (which also provides a very significant cost benefit); you can select the apps you need and easily resize a running cluster
    Elastic as you can provision one, hundreds or thousands of instances to process data at any scale
    Typical use cases – Clickstream analysis, log processing, genomics
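The step-submission sketch mentioned above: once a cluster is running, work can be submitted as steps. This is a hedged example; the cluster ID and the S3 location of the Spark script are placeholders.

```python
# Hypothetical sketch: submitting a Spark step to a running EMR cluster.
import boto3

emr = boto3.client("emr", region_name="ap-south-1")

emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLE12345",           # placeholder cluster ID
    Steps=[{
        "Name": "clickstream-aggregation",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",  # runs arbitrary commands on the master node
            "Args": ["spark-submit", "s3://example-bucket/jobs/aggregate.py"],
        },
    }],
)
```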
  • Amazon Kinesis services make it easy to work with real-time streaming data. Let's look at the components and their functionalities
    Amazon Kinesis Streams enables you to build custom applications that process or analyze streaming data for specialized needs. Amazon Kinesis Streams can continuously capture and store terabytes of data per hour from hundreds of thousands of sources such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events. With Amazon Kinesis Client Library (KCL), you can build Amazon Kinesis Applications and use streaming data to power real-time dashboards, generate alerts, implement dynamic pricing and advertising, and more
    Next is Amazon Kinesis Firehose, which is the easiest way to load streaming data into AWS. It can capture and automatically load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, enabling near real-time analytics with the existing business intelligence tools and dashboards you're already using today (a minimal producer sketch follows these notes)
    And then you have Amazon Kinesis Analytics which allows you to easily analyze data streams using standard SQL queries
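The Firehose producer sketch referenced above, hedged as usual: the delivery stream name and payload are placeholders, and the newline delimiter is a common convention for S3/Redshift loads rather than a requirement.

```python
# Hypothetical sketch: sending one record to a Kinesis Firehose delivery
# stream that loads into S3, Redshift, or Elasticsearch.
import json
import boto3

firehose = boto3.client("firehose", region_name="ap-south-1")

firehose.put_record(
    DeliveryStreamName="clicks-to-s3",   # placeholder delivery stream
    Record={"Data": (json.dumps({"user_id": "u-123", "page": "/home"}) + "\n").encode()},
)
```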
  • Easy set-up & configuration
    create domains via console, SDK or CLI
    specify instance types, number of instances & storage options
    modify or delete existing domains at any time
    Fully managed
    addresses time consuming management tasks
    ensures high availability, patch management, backups
    monitors cluster and replaces nodes as required
    Flexible storage options
    choose between local on-instance storage or Amazon EBS volumes to store your Elasticsearch indices
    specify size of the Amazon EBS volume and volume type
    modify the storage options after domain creation as needed (see the config-update sketch after these notes)
    Set-up for High Availability
    Zone Awareness distributes instances supporting the domain across two different AZs
    with replicas enabled, instances are automatically distributed to deliver cross-zone replication
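The config-update sketch referenced above: storage options on an existing Amazon Elasticsearch Service domain can be changed in place. The domain name and volume size are placeholders.

```python
# Hypothetical sketch: growing EBS storage on an existing ES domain.
import boto3

es = boto3.client("es", region_name="ap-south-1")

es.update_elasticsearch_domain_config(
    DomainName="logs-demo",       # placeholder
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp2",
        "VolumeSize": 50,         # grow per-node storage to 50 GiB
    },
)
```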
  • Here is a select set of referenceable customers using our analytics services
    The vast majority of Big Data use cases deployed in the cloud today run on AWS
    We now have a large and growing user base in India too
