What is Big Data?



A new generation of technologies and architectures
designed to economically extract value from very large
volumes of a wide variety of data by enabling high-velocity
capture, discovery, and/or analysis
VELOCITY
VARIETY
VOLUME
+ VISUALIZATION

VALUE

Big Data’s impact can be expressed by the Five V’s
 • E-Commerce Site fed by outsourced Ad Servers
 • Ads appear on a wide range of sites with various offers
 • A massive amount of data is generated by these servers:
    • Web logs and clickstream data from the E-Commerce Site
    • Ad logs and clickstream data from the Ad Servers
    • Relational transactions that result on the site

 • Goal: Maximize Traffic Analysis for Business Value
    • Velocity Demo: Pinpoint activity in real time and react
    • Variety Demo: Examine historical trends across sources
    • Visualization Demo: Enable ad-hoc data analysis for insights



Demo Context
[Diagram: WEB SERVERS, AD SERVERS, LOG FILES]

How can we identify when ad clicks result in site traffic?
 • A high-volume stream of log activity is coming in:
    • Web logs and Ad Server logs
 • Real-time stream analysis pinpoints data as it happens
 • Structured and unstructured data are joined simultaneously
   in a persistent query
 • Can be used for A/B testing, offer improvement,
   dynamic site behavior, or fraud detection

Velocity Architecture
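
The idea of a persistent query that joins the two log streams can be illustrated with a minimal Python sketch. This is not StreamInsight code; the `Event` fields, the `correlate` function, and the 30-second window are all assumptions made for illustration. It pairs each web hit with any ad click from the same cookie that arrived within the window, assuming events are already ordered by time.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    ts: float      # event time in seconds (hypothetical field)
    source: str    # "ad" for ad-server logs, "web" for web-server logs
    cookie: str    # visitor identifier shared by both log types
    action: str    # e.g. "click" or "view"


def correlate(events, window_seconds=30.0):
    """Yield (ad_click, web_hit) pairs that share a cookie, where the
    web hit arrives no more than window_seconds after the click.
    Assumes `events` is ordered by timestamp."""
    recent_clicks = deque()
    for ev in events:
        # Drop ad clicks that have fallen out of the time window.
        while recent_clicks and ev.ts - recent_clicks[0].ts > window_seconds:
            recent_clicks.popleft()
        if ev.source == "ad" and ev.action == "click":
            recent_clicks.append(ev)
        elif ev.source == "web":
            # Join the web hit against every click still in the window.
            for click in recent_clicks:
                if click.cookie == ev.cookie:
                    yield click, ev
```

A production stream engine would also handle out-of-order events and bounded memory; the sketch only shows the windowed join itself.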
DEMO: StreamInsight
[Diagram: WEB SERVERS, AD SERVERS, LOG FILES, M/R]

How can we do historical analysis on unstructured data?
 • Ad Servers and Web Servers generate different log files with different formats,
   making them hard to analyze
 • Map/Reduce processing lets us run a query across the varying data
   formats stored in Hadoop
 • Hive provides a traditional query interface to Map/Reduce
 • Correlate and connect high-variety data for trend analysis

Variety Architecture
Access Azure blob storage via a Hive “view” and aggregate session data
 -- External table over the space-delimited log files in Azure blob storage
 CREATE EXTERNAL TABLE logs (
   date1    STRING,
   time1    STRING,
   action   STRING,
   page_uri STRING,
   cookie   STRING)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
 STORED AS TEXTFILE
 LOCATION 'asv://logs/logs/';

 -- Aggregate the raw log rows into one summary row per cookie and page
 CREATE TABLE log_summary AS
 SELECT l.cookie
   -- strip dashes from the cookie, then take the value modulo 36
   ,MAX(regexp_replace(l.cookie, '[-]', '') % 36) AS geo_hash
   ,MAX(l.time1) AS time1
   ,l.page_uri
   -- latest click and earliest view for this cookie and page
   ,MAX(CASE LOWER(l.action) WHEN 'click' THEN concat(l.date1, ' ', l.time1) ELSE NULL END) AS click_time
   ,MIN(CASE LOWER(l.action) WHEN 'view' THEN concat(l.date1, ' ', l.time1) ELSE NULL END) AS view_time
   ,MAX(l.date1) AS date1
 FROM logs l
 GROUP BY l.cookie, l.page_uri;

Hive HQL Queries
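
To make the grouping logic of the summary query easier to follow, here is a rough Python equivalent that works on in-memory lines. It is a sketch under assumptions: the function name `summarize` is hypothetical, each line is assumed to hold exactly the five space-delimited fields of the `logs` table, and only the `click_time` and `view_time` columns are reproduced.

```python
from collections import defaultdict


def summarize(lines):
    """Group log lines by (cookie, page_uri) and keep the latest click
    time and the earliest view time, mirroring the MAX/MIN CASE
    expressions in the Hive query."""
    summary = defaultdict(lambda: {"click_time": None, "view_time": None})
    for line in lines:
        # Field order follows the logs table: date1 time1 action page_uri cookie
        date1, time1, action, page_uri, cookie = line.split(" ")
        stamp = f"{date1} {time1}"
        row = summary[(cookie, page_uri)]
        action = action.lower()
        if action == "click":
            # MAX(...) over click rows: keep the latest timestamp string
            if row["click_time"] is None or stamp > row["click_time"]:
                row["click_time"] = stamp
        elif action == "view":
            # MIN(...) over view rows: keep the earliest timestamp string
            if row["view_time"] is None or stamp < row["view_time"]:
                row["view_time"] = stamp
    return dict(summary)
```

As in the Hive query, timestamps are compared as strings, which orders correctly only when dates and times use a fixed-width, most-significant-first format.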
DEMO: Azure HDInsight
Hadoop is an open-source framework for building large-scale,
distributed, data-intensive applications

 • Hadoop is HDFS, the kernel, and M/R
 • MapReduce brings the code to the data
 • An open set of tools exists to extend its functional uses
   and representations




Hadoop Ecosystem Overview
The "Map" step
 The mappers are responsible for reading the input data and
 emitting key/value pairs. The input file can be CSV, XML, or any
 format as long as it can be converted into k/v pairs.

The "Reduce" step
 Each reducer executes a function on all values for a given
 key. The framework ensures that all values for the same
 key are sent to the same reducer.




Map/Reduce Distributes Processing of Operations
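
The two steps can be sketched in a few lines of single-process Python. This is not Hadoop code; the names `mapper`, `shuffle`, `reducer`, and `run_job` are hypothetical, and the input is assumed to be space-delimited log lines with the action in the third field and the page URI in the fourth. The `shuffle` function stands in for the framework's guarantee that all values for a key reach the same reducer.

```python
from collections import defaultdict


def mapper(line):
    """Map step: read one input line and emit (key, value) pairs.
    Here: emit (page_uri, 1) for every click."""
    fields = line.split(" ")
    action, page_uri = fields[2], fields[3]
    if action.lower() == "click":
        yield page_uri, 1


def shuffle(pairs):
    """Group values by key, as the framework does between the steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: run one function over all values for a given key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the mappers and reducers run in parallel on the nodes that hold the data; the sketch keeps only the data flow.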
[Diagram: WEB SERVERS, AD SERVERS, LOG FILES, M/R]

How can we do ad-hoc data discovery and visualization?
 • Ad Servers and Web Servers generate different log files with different formats,
   making them hard to analyze
 • Map/Reduce processing lets us run a query across the varying data
   formats stored in Hadoop
 • Hive provides a traditional query interface to Map/Reduce
 • Correlate and connect high-variety data for trend analysis

Visualization Architecture
DEMO: Excel & Hive Adapter
 • Big Data & Analytics projects are often additive
    • New capabilities are layered on top of existing data & apps
    • Analytics can drive applications in new ways
 • Visualizations put Big Data in the hands of the business




Summary
We are BlueMetal Architects
Take the next steps – Imagine, Define, Build
 • Envisioning & Strategy Briefing: Big Data, Analytics & Collaboration
 • Envisioning Session: Data is the App – Envisioning the Next
   Generation, Data-Driven Enterprise
 • Architecture Design Session: Big Data & Analytics
 • Healthcare / Life Sciences: Strategy Briefing or Architecture Design
   Session – Big Data Architecture, Cloud & Use Case Driven Analytics
   and Applications, Portal, M-Health and UX Design for Providers,
   Patients, Pharma & Biotechnology
 • Financial Services: Strategy Briefing or Architecture Design Session –
   Big Data & Analytics for Banking, Capital Markets, Retail Brokerage or
   Insurance




Take the next steps - our offerings
Thank You
[Diagram]
Differentiation: DESIGN
Specialization: UX, DATA, SOCIAL
Foundation: CODE




Who We Are
[Diagram]
Differentiation:
   DESIGN: Strategy, Analysis, Creative
Specialization:
   UX: Desktop, Mobile, Web Client
   DATA: Analytics, Big Data, Core SQL
   SOCIAL: Web Content, Intranets, Collaboration
Foundation:
   .NET, Java, SERVICES, PPP, On-Premise, Cloud




Who We Are

20130117 - Big Data Architectures
