SlideShare a Scribd company logo
1 of 30
Download to read offline
Technische Universität München
Fabian Prasser, Florian Kohlmayer
Lehrstuhl für Medizinische Informatik
Institut für Medizinische Statistik und Epidemiologie
Klinikum rechts der Isar der TU München
ARX 3.0.0
A Comprehensive Tool for
Anonymizing Biomedical Data
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Towards useful data anonymization
• Usability has many dimensions
• Ability to balance data utility with privacy requirements
• Need to support a broad spectrum of methods
• Privacy models
• Transformation models
• Methods for analyzing data utility
• Methods for analyzing risks
• Further “non-functional” requirements
• Integrated and harmonized: ARX is not a “tool box”
• Compatibility (syntactic and semantic)
• Performance and scalability
• Intuitive visualization and parameterization
• Provide methods to end-users as well as programmers
19.03.2015 2
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Compatibility
• Built-in data import facilities
• Relational databases (MS SQL, DB2, SQLite, MySQL)
• MS Excel
• CSV (all common formats, auto-detection)
• Support for different data types and scales of measure
• Strings (with nominal and ordinal scale)
• Dates (interval scale)
• Numbers (ratio scale)
• Automatic detection of data types and formats
• Methods for handling and cleaning low-quality data
• Handles missing and invalid values correctly
• In privacy models, transformation methods, visualizations
• Manual removal of tuples, query interface, find & replace
19.03.2015 3
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Flexible transformation methods
• Global recoding
• Full-domain generalization
• Top- & bottom coding
• Local recoding
• Tuple suppression
• Fully integrated and parameterizable
• Importance of attributes, suitability of methods
• Functional representations of transformation rules
• Especially functional representations of hierarchies
• Support for categorical and continuous variables (categorization)
• Multiple methods for measuring data utility
• Parametrizable, e.g., with different aggregate functions
• Use functional representations of transformation rules
19.03.2015 4
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Multiple apriori privacy models
• k-Anonymity
• ℓ-Diversity (distinct, entropy, recursive-(c, ℓ))
• t-Closeness (equal and hierarchical ground distance)
• δ-Presence
• Multiple methods for risk-based anonymization
• Sample characteristics
• Average cell size
• Sample uniqueness
• Super-population models
• Decision rule by Dankar et al.
• Based on models by Pitman, Zayatz and the SNB model
• Support for arbitrary combinations of these models
• Optimal solution within our coding model
19.03.2015 5
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Scalability: ARX can handle large datasets (several million data
entries) on commodity hardware
• Efficient in-memory data management engine
• Works with compressed data representations
• Tight coupling between transformation operators and the „database
kernel”
• Provides a space-time trade-off
• Optimized search strategy: Based on multiple pruning strategies
• Efficient implementations of further complex tasks
• Evaluations of privacy criteria (e.g. t-closeness, δ-presence)
• Methods for solving non-linear equation systems
• Background jobs in the user interface
19.03.2015 6
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Comprehensive graphical interface
• Scalablity comparable to the ARX API
• Supports all methods provided by the ARX API
• Cross-platform (Windows, Linux/GTK, OSX) with native interfaces
• Available as binary distributions with installers
• Independent API
• User interface sits on top
of the API
• Java library
• All methods provided by ARX
are first-class citizens in both
worlds
19.03.2015 7
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Anonymization workflow
• Iterative process to successively refine transformations
• Supported by the scalability of our framework
• Three (repeating) steps, mapped to four perspectives
19.03.2015 8
- Define transformation model
- Define privacy model
- Define coding model
- Filter and analyze the
solution space
- Organize transformations
- Compare and analyze
input and output
- Regarding risks and
utility
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 9
ARX: Wizard for data import
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 10
ARX: Configuration (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 11
ARX: Configuration (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 12
ARX: Configuration (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 13
ARX: Configuration (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 14
ARX: Wizard for transformation rules
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 15
ARX: Further dialogs
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 16
ARX: Exploration (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 17
ARX: Exploration (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 18
ARX: Exploration (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 19
ARX: Utility analysis (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 20
ARX: Utility analysis (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 21
ARX: Utility analysis (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 22
ARX: Utility analysis (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 23
ARX: Utility analysis (5)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 24
ARX: Risk analysis (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 25
ARX: Risk analysis (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 26
ARX: Risk analysis (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 27
ARX: Risk analysis (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 28
ARX: Context-sensitive help
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
●
Current projects
●
Non-interactive Differential Privacy
●
Support for high-dimensional data: heuristic algorithms
●
Further analyses and visualizations: utility and risks
●
Support for transactional attributes: (k, km
)-anonymity
●
Integrated data masking methods
●
More flexible definition of quasi-identifiers
●
More risk models
●
Planned projects
●
Auto-detection of HIPAA identifiers
●
Implement more flexible privacy criteria
19.03.2015 29
ARX: Further developments
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
• ARX is open source software
• Contribute: feature requests, code reviews, criticism,
enhancements, questions
• Repository: https://github.com/arx-deidentifier/arx
• Further information & download: http://arx.deidentifier.org
• Get in touch
• Fabian Prasser (prasser@in.tum.de)
• Florian Kohlmayer (florian.kohlmayer@tum.de)
19.03.2015 30
• Any questions?
Thank you for your attention

More Related Content

What's hot

Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchUniversity Medicine Greifswald
 
Model management tools for improved reproducibility in systems biology
Model management tools for improved reproducibility in systems biologyModel management tools for improved reproducibility in systems biology
Model management tools for improved reproducibility in systems biologyUniversity Medicine Greifswald
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningYashwant Rautela
 
20200130_Mannocci_OpenAIRE_ResearchGraph
20200130_Mannocci_OpenAIRE_ResearchGraph20200130_Mannocci_OpenAIRE_ResearchGraph
20200130_Mannocci_OpenAIRE_ResearchGraphOpenAIRE
 
Nicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification systemNicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification systemAIST
 
Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...
Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...
Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...Christiane Behnert
 
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings Kerstin Forsberg
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesOla Spjuth
 
Data mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAPData mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAPjaya lakshmi
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineeringJannatulFerdous160
 
Network visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeNetwork visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeLars Juhl Jensen
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.FAIRDOM
 
Image Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsImage Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsTobias Kuhn
 

What's hot (20)

Medical 3D data retrieval
Medical 3D data retrievalMedical 3D data retrieval
Medical 3D data retrieval
 
Data and Model Management for Systems Biology
Data and Model Management  for Systems BiologyData and Model Management  for Systems Biology
Data and Model Management for Systems Biology
 
Short introduction to SED-ML
Short introduction to SED-MLShort introduction to SED-ML
Short introduction to SED-ML
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical research
 
Data and model management in Systems Biology
Data and model management in Systems BiologyData and model management in Systems Biology
Data and model management in Systems Biology
 
Model management tools for improved reproducibility in systems biology
Model management tools for improved reproducibility in systems biologyModel management tools for improved reproducibility in systems biology
Model management tools for improved reproducibility in systems biology
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
20200130_Mannocci_OpenAIRE_ResearchGraph
20200130_Mannocci_OpenAIRE_ResearchGraph20200130_Mannocci_OpenAIRE_ResearchGraph
20200130_Mannocci_OpenAIRE_ResearchGraph
 
Nicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification systemNicolay Lyfenko - Conceptual scheme for text classification system
Nicolay Lyfenko - Conceptual scheme for text classification system
 
Eira presentation
Eira presentationEira presentation
Eira presentation
 
Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...
Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...
Relevance Clues: Developing an Experimental Design to Examine the Criteria Be...
 
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
 
Heterogeneous data annotation
Heterogeneous data annotationHeterogeneous data annotation
Heterogeneous data annotation
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life Sciences
 
Data mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAPData mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAP
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineering
 
Chaper 13 trend
Chaper 13 trendChaper 13 trend
Chaper 13 trend
 
Network visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeNetwork visualization: A crash course on using Cytoscape
Network visualization: A crash course on using Cytoscape
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
 
Image Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsImage Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical Publications
 

Similar to ARX - A Comprehensive Tool for Anonymizing Biomedical Data

ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...Juan Antonio Vizcaino
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)ijccmsjournal
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuantUniversity
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platformibemam
 
Overview of the altmetrics landscape
Overview of the altmetrics landscapeOverview of the altmetrics landscape
Overview of the altmetrics landscapeRichard Cave
 
TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015Kees van Bochove
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)ijccmsjournal
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Sanjay Padhi, Ph.D
 
Towards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailTowards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailPaulo Zanini
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondonPaul Agapow
 
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsDr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsInnovation Agency
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 

Similar to ARX - A Comprehensive Tool for Anonymizing Biomedical Data (20)

ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
 
Chemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistryChemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistry
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
Principles and risk assessment of managing distributed ontologies hosted by e...
Principles and risk assessment of managing distributed ontologies hosted by e...Principles and risk assessment of managing distributed ontologies hosted by e...
Principles and risk assessment of managing distributed ontologies hosted by e...
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
Overview of the altmetrics landscape
Overview of the altmetrics landscapeOverview of the altmetrics landscape
Overview of the altmetrics landscape
 
TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Towards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailTowards batch one size with industrial semantics email
Towards batch one size with industrial semantics email
 
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
 
CIKB poster_2015
CIKB poster_2015CIKB poster_2015
CIKB poster_2015
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
 
1 3 introduction to open_ehr
1 3 introduction to open_ehr1 3 introduction to open_ehr
1 3 introduction to open_ehr
 
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsDr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 

Recently uploaded

Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 

Recently uploaded (20)

Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 

ARX - A Comprehensive Tool for Anonymizing Biomedical Data

  • 1. Technische Universität München Fabian Prasser, Florian Kohlmayer Lehrstuhl für Medizinische Informatik Institut für Medizinische Statistik und Epidemiologie Klinikum rechts der Isar der TU München ARX 3.0.0 A Comprehensive Tool for Anonymizing Biomedical Data
  • 2. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Towards useful data anonymization • Usability has many dimensions • Ability to balance data utility with privacy requirements • Need to support a broad spectrum of methods • Privacy models • Transformation models • Methods for analyzing data utility • Methods for analyzing risks • Further “non-functional” requirements • Integrated and harmonized: ARX is not a “tool box” • Compatibility (syntactic and semantic) • Performance and scalability • Intuitive visualization and parameterization • Provide methods to end-users as well as programmers 19.03.2015 2
  • 3. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Compatibility • Built-in data import facilities • Relational databases (MS SQL, DB2, SQLite, MySQL) • MS Excel • CSV (all common formats, auto-detection) • Support for different data types and scales of measure • Strings (with nominal and ordinal scale) • Dates (interval scale) • Numbers (ratio scale) • Automatic detection of data types and formats • Methods for handling and cleaning low-quality data • Handles missing and invalid values correctly • In privacy models, transformation methods, visualizations • Manual removal of tuples, query interface, find & replace 19.03.2015 3
  • 4. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Flexible transformation methods • Global recoding • Full-domain generalization • Top- & bottom coding • Local recoding • Tuple suppression • Fully integrated and parameterizable • Importance of attributes, suitability of methods • Functional representations of transformation rules • Especially functional representations of hierarchies • Support for categorical and continuous variables (categorization) • Multiple methods for measuring data utility • Parametrizable, e.g., with different aggregate functions • Use functional representations of transformation rules 19.03.2015 4
  • 5. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Multiple apriori privacy models • k-Anonymity • ℓ-Diversity (distinct, entropy, recursive-(c, ℓ)) • t-Closeness (equal and hierarchical ground distance) • δ-Presence • Multiple methods for risk-based anonymization • Sample characteristics • Average cell size • Sample uniqueness • Super-population models • Decision rule by Dankar et al. • Based on models by Pitman, Zayatz and the SNB model • Support for arbitrary combinations of these models • Optimal solution within our coding model 19.03.2015 5
  • 6. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Scalability: ARX can handle large datasets (several million data entries) on commodity hardware • Efficient in-memory data management engine • Works with compressed data representations • Tight coupling between transformation operators and the „database kernel” • Provides a space-time trade-off • Optimized search strategy: Based on multiple pruning strategies • Efficient implementations of further complex tasks • Evaluations of privacy criteria (e.g. t-closeness, δ-presence) • Methods for solving non-linear equation systems • Background jobs in the user interface 19.03.2015 6
  • 7. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Comprehensive graphical interface • Scalablity comparable to the ARX API • Supports all methods provided by the ARX API • Cross-platform (Windows, Linux/GTK, OSX) with native interfaces • Available as binary distributions with installers • Independent API • User interface sits on top of the API • Java library • All methods provided by ARX are first-class citizens in both worlds 19.03.2015 7
  • 8. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Anonymization workflow • Iterative process to successively refine transformations • Supported by the scalability of our framework • Three (repeating) steps, mapped to four perspectives 19.03.2015 8 - Define transformation model - Define privacy model - Define coding model - Filter and analyze the solution space - Organize transformations - Compare and analyze input and output - Regarding risks and utility
  • 9. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 9 ARX: Wizard for data import
  • 10. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 10 ARX: Configuration (1)
  • 11. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 11 ARX: Configuration (2)
  • 12. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 12 ARX: Configuration (3)
  • 13. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 13 ARX: Configuration (4)
  • 14. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 14 ARX: Wizard for transformation rules
  • 15. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 15 ARX: Further dialogs
  • 16. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 16 ARX: Exploration (1)
  • 17. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 17 ARX: Exploration (2)
  • 18. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 18 ARX: Exploration (3)
  • 19. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 19 ARX: Utility analysis (1)
  • 20. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 20 ARX: Utility analysis (2)
  • 21. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 21 ARX: Utility analysis (3)
  • 22. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 22 ARX: Utility analysis (4)
  • 23. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 23 ARX: Utility analysis (5)
  • 24. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 24 ARX: Risk analysis (1)
  • 25. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 25 ARX: Risk analysis (2)
  • 26. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 26 ARX: Risk analysis (3)
  • 27. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 27 ARX: Risk analysis (4)
  • 28. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 28 ARX: Context-sensitive help
  • 29. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ● Current projects ● Non-interactive Differential Privacy ● Support for high-dimensional data: heuristic algorithms ● Further analyses and visualizations: utility and risks ● Support for transactional attributes: (k, km )-anonymity ● Integrated data masking methods ● More flexible definition of quasi-identifiers ● More risk models ● Planned projects ● Auto-detection of HIPAA identifiers ● Implement more flexible privacy criteria 19.03.2015 29 ARX: Further developments
  • 30. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data • ARX is open source software • Contribute: feature requests, code reviews, criticism, enhancements, questions • Repository: https://github.com/arx-deidentifier/arx • Further information & download: http://arx.deidentifier.org • Get in touch • Fabian Prasser (prasser@in.tum.de) • Florian Kohlmayer (florian.kohlmayer@tum.de) 19.03.2015 30 • Any questions? Thank you for your attention