April 10-12, Chicago, IL
Ensuring Compliance of Patient Data with Big Data and BI
Ayad Shammout & Denny Lee
Agenda
A Quick Big Data Primer
Healthcare and Big Data
Compliance and Auditing
SQL Compliance Project
Compliance and Auditing with Big Data and BI
Big Data: Unstructured Volumes of Data
Analytics: PowerPivot, Power View
What is Big Data?
Volume
Exceeds the physical limits of vertical scalability
Velocity
Decision window is small compared to the data change rate
Variety
Many different formats make integration expensive
Variability
Many options or variable interpretations confound analysis
10x increase every five years
85% from new data types
Data explosion
Volume, Velocity, Variety
Hadoop, Cloud
"By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent."
(Gartner, Mark Beyer, “Information Management in the 21st Century”)
Big Data Business Value
140,000-190,000
1.5 million
$300 billion
15 out of 17
€250 billion
50-60%
Hadoop: The most visible face of Big Data
HDInsight: Visit HadoopOnAzure.com
Healthcare and Big Data
Healthcare and IT
Healthcare is often a laggard in technology adoption
Yet applying IT to healthcare can radically change what we can do
Genomic Sequencing
Proteomic Sequencing
Incidence Prediction
Healthcare Big Data Example Scenarios
Clinical Trial Deviations
Viagra was originally developed to lower blood pressure and treat angina
It is now used to help treat pulmonary hypertension in newborns and altitude sickness
Incidence Prediction
Patients who missed 4 or more visits were twice as likely to have an asthmatic incident
A particular cardiac monitor sine wave pattern points to a high likelihood of a heart attack
Campaigns
Social media and advertising campaigns to understand user behavior and sentiment
Patient Satisfaction
Social media and advertising campaigns to understand user behavior and sentiment
BIDMC Auditing Scenario
Auditing is a critical component of HIPAA compliance in ensuring patient privacy
1+ billion rows of audit data
146 mission-critical clinical applications
Comprehensive audits yield 300-500k transactions/day
HIPAA requires an audit system with 20 years of data
Auditing Project
Available to the community as part of the Compliance SDK
Being updated for SQL Server 2012, HDInsight, Power View, and Mobile BI*
"Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data: that's cool!"
John Halamka’s Cool Technology of the Week
(Wellsphere Top Health Blogger, Health Impact Award)
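As a rough sanity check on the storage requirement, the daily volume and the 20-year retention period quoted above can be multiplied out. This is a back-of-the-envelope sketch that assumes a constant daily transaction rate and 365-day years; real volumes will vary from day to day, and the function name is illustrative.

```python
# Back-of-the-envelope estimate of audit rows retained under the
# 20-year requirement quoted on the slide. Assumes a constant daily
# transaction rate and 365-day years (leap days ignored).

DAYS_PER_YEAR = 365
RETENTION_YEARS = 20


def rows_retained(transactions_per_day: int, years: int = RETENTION_YEARS) -> int:
    """Total audit rows accumulated over the retention period."""
    return transactions_per_day * DAYS_PER_YEAR * years


low = rows_retained(300_000)   # lower end of 300-500k transactions/day
high = rows_retained(500_000)  # upper end of 300-500k transactions/day

print(f"{low:,} to {high:,} rows")
```

Under these assumptions the system would eventually hold roughly 2.2 to 3.7 billion rows, so the 1+ billion rows already collected are only part of the eventual total.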
BIDMC Compliance Project
[Diagram: SQL Server 2008/2012 audit logs are extracted with SSIS (ETL logs to HDFS) into HDInsight on Windows and HDInsight on Azure; analysis uses SSAS (tabular) and Excel 2013 PowerPivot and Power View]
Auditing Sensitive Information
Querying Audit Information
Use PowerPivot, Power View, or Analysis Services to query the data.
Security Information
Policy Information
Process Audit Information
Use SSIS to process SQL Server 2008 All-Actions audit information and other CareGroup (CG) application audit log data; the Management Performance DW framework could potentially be used.
[Diagram: CareGroup environment with File Server, SQL Audit, Connect/Logic, and CG application data (InterSystems Caché, SQL2005, Oracle), plus SQL2008 All-Actions audit data, processed by SSIS into SQL 2008 / 2012 R2; SSRS 2008 / Power View deliver policy analysis (policy reports, policy best practices), security analysis (security reports), and compliance reports]
Feedback Action Loop: update systems to keep them compliant and secure
Storage Infrastructure
Audit logs: transfer files to ASV via AzCopy, CloudExplorer, etc.
Storage Infrastructure
[Diagram: Hadoop on Azure compute nodes (Medium VMs); Azure Storage Vault (ASV) on Azure Blob Storage; Azure Flat Network Storage. Data streams to compute, passes through map, sort, shuffle, and reduce, and results are pushed back to storage]
http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
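The diagram shows data streaming from blob storage to the compute nodes, moving through map, sort/shuffle, and reduce, and being pushed back to storage. The sketch below runs those same phases in a single Python process to show what each one does. It is not HDInsight or Hadoop API code, and the comma-separated record layout ("user,action,object") is a hypothetical example.

```python
# Minimal in-memory simulation of the map -> sort/shuffle -> reduce flow
# that Hadoop runs across compute nodes. Here it counts audit events per
# action; the record layout "user,action,object" is a hypothetical example.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Emit (key, 1) for each audit record, keyed by its action column.
    for line in lines:
        _user, action, _obj = line.split(",")
        yield action, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Sort by key and group the values, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Sum the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}


records = [
    "alice,SELECT,patients",
    "bob,UPDATE,orders",
    "alice,SELECT,labs",
]
print(reduce_phase(shuffle_phase(map_phase(records))))  # {'SELECT': 2, 'UPDATE': 1}
```

On a cluster, the map and reduce functions run in parallel on many nodes, and the framework performs the sort/shuffle over the network. The single-process version only illustrates how each phase transforms the data.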
SSIS to HDInsight
SSIS Processing
SSAS Tabular of HoA (Hadoop on Azure) Audit Data
Hadoop / Auditing: File sizes
Currently testing gz vs. raw
For example, a 12 MB raw text file vs. a 633 KB gz file (~20x compression)
~20x smaller, with about the same query time
Approximately the same map/reduce task utilization
Target file size is 250 MB-1 GB
The SSIS package handles the file sizing
Future testing: Avro, protobuf
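Audit logs compress well because they repeat the same timestamps, actions, object names, and user names. A quick way to see how gzip behaves on text like this is to compress a synthetic sample with Python's standard gzip module. The sample lines below are made up for illustration, so the ratio they produce says nothing about the real BIDMC files.

```python
# Compare raw vs. gzip-compressed size for repetitive, log-like text.
# The sample lines are synthetic; real audit logs will compress differently.
import gzip


def compression_ratio(raw: bytes) -> float:
    """Return raw size divided by gzip-compressed size."""
    return len(raw) / len(gzip.compress(raw))


sample = "".join(
    f"2013-04-10 12:{i % 60:02d}:00,SELECT,dbo.Patients,user{i % 50}\n"
    for i in range(20_000)
).encode("utf-8")

ratio = compression_ratio(sample)
print(f"raw={len(sample):,} bytes, ratio={ratio:.1f}x")
```

Measuring the ratio on a representative slice of the real logs before choosing a format would give a more honest number than any synthetic sample.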
Query duration (s):
  select count(*) from sql_audit_asv_raw: 56.066
  select count(*) from sql_audit_asv_gz: 58.994
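The file-size and timing figures on this slide can be combined into a single trade-off. The short calculation below assumes binary units (1 MB = 1,024 KB) for the 12 MB and 633 KB files and uses the two count(*) durations exactly as reported.

```python
# Storage saving vs. query-time overhead for gz compared with raw text,
# using the figures reported on the slide (binary units assumed for sizes).
raw_kb = 12 * 1024   # 12 MB raw text file, expressed in KB
gz_kb = 633          # 633 KB gz file
size_ratio = raw_kb / gz_kb

raw_seconds = 56.066  # select count(*) from sql_audit_asv_raw
gz_seconds = 58.994   # select count(*) from sql_audit_asv_gz
overhead = (gz_seconds - raw_seconds) / raw_seconds

print(f"{size_ratio:.1f}x smaller, {overhead:.1%} slower")  # 19.4x smaller, 5.2% slower
```

On this single test, gzip cuts storage by about 19x for roughly 5% more query time. That supports the "~same query time" observation, but it is one measurement and not a benchmark.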
Hadoop / Auditing: Formats
For ease of processing, replace the line breaks (carriage returns) within embedded SQL statements. For example:
    select col1, col2
    from tableA
becomes
    select col1, col2 from tableA
This lets you create a Hive table that uses the line break as the row delimiter (i.e., the statements do not contain things like SQL quoted identifiers).
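The cleanup step described above can be sketched as a small function that turns every run of CR/LF characters, together with the whitespace around it, into a single space. This puts each audit record on one line. It is an illustrative assumption, not the SSIS logic used in the project. Like the slide's caveat, it would also rewrite line breaks that appear inside string literals or quoted identifiers.

```python
# Collapse line breaks inside an embedded SQL statement so that each audit
# record occupies exactly one line, letting Hive split rows on newlines.
# Caveat: line breaks inside string literals or quoted identifiers would
# also be replaced, so this assumes such constructs are absent.
import re

# A run of CR/LF characters plus any whitespace around it.
_LINE_BREAK_RUN = re.compile(r"\s*[\r\n]+\s*")


def flatten_statement(statement: str) -> str:
    """Replace each line-break run with a single space and trim the ends."""
    return _LINE_BREAK_RUN.sub(" ", statement).strip()


print(flatten_statement("select col1, col2\r\nfrom tableA"))  # select col1, col2 from tableA
```

Applying this before writing the extracted files keeps the Hive table definition simple, because no custom record reader is needed.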
BI Connectivity
Sqoop, Hive ODBC, Templeton, CSV, etc.
Big Data … Excel-lerated!
2 servers, 3 months
110 GB of binary files
SSIS extraction: 1.2 GB of text (120 MB gzipped)
Hadoop to PowerPivot: 6 MB
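The sizes on the "Excel-lerated" slide (110 GB, 1.2 GB, 120 MB, 6 MB) can be turned into per-stage reduction factors. This sketch assumes decimal units (1 GB = 1,000 MB) and simply divides each stage by the next. The slide does not say what processing the Hadoop-to-PowerPivot step performs, so no claim is made about it.

```python
# Reduction factor between consecutive stages of the pipeline on the slide.
# Sizes in MB, decimal units assumed (1 GB = 1,000 MB).
STAGES = [
    ("binary files", 110_000),
    ("SSIS extraction (text)", 1_200),
    ("gz", 120),
    ("PowerPivot", 6),
]


def stage_ratios(stages: list[tuple[str, int]]) -> list[float]:
    """Size of each stage divided by the size of the stage after it."""
    return [prev / cur for (_, prev), (_, cur) in zip(stages, stages[1:])]


for (prev_name, _), (name, _), ratio in zip(STAGES, STAGES[1:], stage_ratios(STAGES)):
    print(f"{prev_name} -> {name}: {ratio:,.0f}x smaller")
print(f"overall: {STAGES[0][1] / STAGES[-1][1]:,.0f}x smaller")
```

By these numbers, the largest single drop (about 92x) happens at the SSIS extraction step. The end-to-end reduction from 110 GB of binaries to a 6 MB workbook is roughly 18,000x.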
PowerPivot workbook of HoA Audit data
Power View of HoA Audit Data
Win a Microsoft Surface Pro!
Complete an online SESSION EVALUATION
to be entered into the draw.
Draw closes April 12, 11:59pm CT
Winners will be announced on the PASS BA
Conference website and on Twitter.
Go to passbaconference.com/evals or follow the QR code link displayed on
session signage throughout the conference venue.
Your feedback is important and valuable. All feedback will be used to improve
and select sessions for future events.
Thank you!

Ensuring Compliance of Patient Data with Big Data and BI [BDII 301-M] (4078)

Speaker notes

Centralizing Logs
Allows you to have one system process all audit logs from your servers
Easier manageability
Set files to 250 MB in size (fewer files, but not too large to process), optimized for Hadoop
General rule of thumb: 250 MB-1 GB file sizes
Can also centralize processing … and centralize reporting
The Compliance SDK contains the full project, organized by Server, Database, DDL, and DML actions
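The notes recommend rolling audit logs into files of about 250 MB, staying within the 250 MB-1 GB range. A minimal sketch of that sizing step groups whole log lines into chunks that never exceed a byte budget, so no audit record is split across files. The helper name `roll_lines` and the in-memory approach are illustrative assumptions and stand in for the SSIS package mentioned in the deck. A real job would stream each chunk to its own file before uploading it to storage.

```python
# Roll log lines into chunks of at most `max_bytes` each, a simple stand-in
# for the SSIS package that sizes files for Hadoop. The helper name and the
# in-memory approach are illustrative assumptions.
from typing import Iterable, Iterator

TARGET_BYTES = 250 * 1024 * 1024  # ~250 MB, the rule-of-thumb lower bound


def roll_lines(lines: Iterable[bytes], max_bytes: int = TARGET_BYTES) -> Iterator[list[bytes]]:
    """Group whole lines into chunks whose total size stays within max_bytes.

    A single line larger than max_bytes still gets a chunk of its own,
    so records are never split.
    """
    chunk: list[bytes] = []
    size = 0
    for line in lines:
        # Start a new chunk if adding this line would exceed the budget.
        if chunk and size + len(line) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk


demo = [b"x" * 40 + b"\n"] * 10  # ten 41-byte lines
chunks = list(roll_lines(demo, max_bytes=100))
print([sum(map(len, c)) for c in chunks])  # [82, 82, 82, 82, 82]
```

Keeping records intact matters more than hitting the size target exactly, because Hive reads each line as one row.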