Introduction
Distributed Near Duplicate Detection
● Integrate medical data from heterogeneous medical data sources and private archives through their public APIs.
● Curate the integrated data into a data warehouse for public access.
● Store the detected duplicate pairs in a separate data source.
● Duplicates are detected by analyzing candidate data pairs from the original data sources, using similarity matrices for textual data.
● Hierarchical metadata attached to the binary medical data is used to identify, classify, and find duplicates among the binary raw data.
● Inconsistencies in representation are taken into account:
– acronyms used instead of the full form of the attributes;
– different measurement units.
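The handling of acronyms and measurement units described above can be sketched as a normalization step applied before any textual comparison. This is a minimal illustration, not MediCurator's code: the acronym table, the unit table, and the function names are hypothetical, and Jaccard similarity over word tokens is only one possible similarity measure.

```python
import re

# Hypothetical acronym table; a real deployment would load a curated vocabulary.
ACRONYMS = {"ct": "computed tomography", "mri": "magnetic resonance imaging"}

# Hypothetical conversion factors to one canonical unit (millimetres).
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def normalize_text(value):
    """Lower-case the text, expand known acronyms, and return the set of word tokens."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    expanded = []
    for token in tokens:
        expanded.extend(ACRONYMS.get(token, token).split())
    return set(expanded)


def normalize_length(value):
    """Convert a string such as '2.5 cm' to millimetres."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-z]+)\s*", value.lower())
    if match is None or match.group(2) not in UNIT_TO_MM:
        raise ValueError(f"unrecognized length: {value!r}")
    return float(match.group(1)) * UNIT_TO_MM[match.group(2)]


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

With these helpers, "CT chest" and "Computed Tomography, Chest" yield the same token set, and "2.5 cm" and "25 mm" both normalize to 25.0 millimetres, so neither difference in representation hides a duplicate.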
● Medical data publishers publish data to various data sources through the respective write APIs of those sources.
● MediCurator connects to the original data sources through their read APIs.
● The consolidated data and the duplicate pairs are stored through the relevant write APIs.
● Medical data consumers read the data from the warehouse composed by MediCurator through its read API.
● The data warehouse is considered to be free from duplicates
– up to false positives and false negatives,
– which depend on the effectiveness of the similarity matrices and similarity join algorithms used.
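One common way to quantify false positives and false negatives is to compare the detected duplicate pairs against a hand-labelled set of true duplicates and report precision and recall. The sketch below assumes such a labelled set exists. The function names and the record identifiers in the test data are illustrative and do not come from the poster.

```python
def pair_key(a, b):
    """Order-independent key for a pair of record identifiers."""
    return frozenset((a, b))


def evaluate(detected, actual):
    """Count errors of a detector given detected and true duplicate pairs."""
    detected_keys = {pair_key(a, b) for a, b in detected}
    actual_keys = {pair_key(a, b) for a, b in actual}
    true_pos = len(detected_keys & actual_keys)
    false_pos = len(detected_keys - actual_keys)   # reported, but not real duplicates
    false_neg = len(actual_keys - detected_keys)   # real duplicates that were missed
    precision = true_pos / len(detected_keys) if detected_keys else 1.0
    recall = true_pos / len(actual_keys) if actual_keys else 1.0
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }
```

A false positive merges two distinct records in the warehouse, while a false negative leaves a duplicate in place, so the two counts are worth tracking separately.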
● Hazelcast is used for distributed near duplicate detection.
● Metadata attached to the binary images in medical image archives
– The Cancer Imaging Archive (TCIA)
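A rough sketch of how hierarchical metadata could flag candidate duplicates among binary images without reading the image bytes. TCIA organizes images by collection, patient, study, and series. The field names, the record layout, and the grouping rule below are assumptions made for illustration, not MediCurator's schema.

```python
from collections import defaultdict

# Hypothetical metadata field names, ordered from the top of the hierarchy down.
HIERARCHY = ("collection", "patient_id", "study_uid", "series_uid")


def hierarchy_key(metadata, depth):
    """Key built from the first `depth` levels of the hierarchy, case-insensitive."""
    return tuple(
        str(metadata.get(level, "")).strip().lower() for level in HIERARCHY[:depth]
    )


def group_candidates(records, depth=len(HIERARCHY)):
    """Group record ids whose metadata agree down to `depth`.

    Groups with two or more ids are duplicate candidates.
    """
    groups = defaultdict(list)
    for record in records:
        groups[hierarchy_key(record["metadata"], depth)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Lowering `depth` widens the candidate groups (for example, everything belonging to the same patient), which trades more comparisons for fewer missed duplicates.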
Near Duplicate Detection for Medical Imaging Data Warehouse Construction
Pradeeban Kathiravelu, Ashish Sharma
● Medical data warehouses and image archives are constructed by integrating multiple private and public data sources.
● Finding near-identical entries is crucial for warehouse construction.
● Medical image archives are huge and consist of structured, hierarchical data that can be accessed by querying the metadata.
● Existing solutions tend to be too specific.
– For example, a Master Patient Index (MPI) covers only patient records.
● Multiple dimensions and attributes
– including medications, clinical, and pathological data
– should be considered for complete duplicate detection and elimination.
● MediCurator is a near duplicate detection framework for heterogeneous medical data sources, used in constructing data warehouses.
● MediCurator has been developed to retrieve medical data from
– various data sources, including MySQL, MongoDB, and CSV files, and
– medical image archives such as TCIA.
● MediCurator fits into the ETL process.
– Duplicates are detected in memory.
– Merged data is stored in data warehouses hosted on the Hadoop Distributed File System (HDFS).
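The ETL placement above can be illustrated with a small in-memory pipeline over CSV sources. This is a simplified sketch under several assumptions: the column names `name` and `modality` are hypothetical, records match only when their normalized keys are identical (a stand-in for a real near duplicate measure), and the load step returns Python lists instead of writing to HDFS.

```python
import csv
import io


def extract(csv_text, source):
    """Extract: read CSV rows into dictionaries tagged with their source name."""
    return [dict(row, source=source) for row in csv.DictReader(io.StringIO(csv_text))]


def record_key(row):
    """Transform: hypothetical matching key, insensitive to case and extra whitespace."""
    return (" ".join(row["name"].lower().split()), row["modality"].strip().lower())


def merge_sources(sources):
    """Return (merged rows, duplicate pairs) for a {source name: CSV text} mapping.

    All bookkeeping is kept in memory; the first occurrence of a key is kept
    and later matches are recorded as duplicate pairs.
    """
    seen = {}
    merged, duplicate_pairs = [], []
    for source, text in sources.items():
        for row in extract(text, source):
            key = record_key(row)
            if key in seen:
                duplicate_pairs.append((seen[key], row))
            else:
                seen[key] = row
                merged.append(row)
    return merged, duplicate_pairs
```

The two return values correspond to the two outputs named in the poster: the consolidated data for the warehouse and the duplicate pairs kept in a separate data source.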
MediCurator Approach
Design
Implementation
● A prototype has been implemented.
– Hazelcast serves as the distributed execution framework.
– Near duplicate detection algorithms from the research literature are executed in a distributed manner on the metadata.
– A ten-fold speed-up is observed compared to existing solutions such as MPI systems.
● MediCurator functions as an integration middleware
– for data warehouse construction
– with duplicate detection and elimination
– from the raw textual medical data, or from the binary data by leveraging the metadata attached to it.
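The idea of distributing pairwise comparisons can be sketched locally with a thread pool standing in for a cluster. This is not Hazelcast code and makes no claim about how the prototype partitions its data: the partitioning, the token-based Jaccard measure, the threshold, and all function names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity over lower-cased whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def compare_partition(partition, threshold):
    """Return id pairs within one partition whose similarity meets the threshold."""
    return [
        (id_a, id_b)
        for (id_a, text_a), (id_b, text_b) in combinations(partition, 2)
        if jaccard(text_a, text_b) >= threshold
    ]


def detect(partitions, threshold=0.8, workers=4):
    """Compare each partition on its own worker and concatenate the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: compare_partition(p, threshold), partitions))
    return [pair for pairs in results for pair in pairs]
```

Only records placed in the same partition are compared, so the way records are assigned to partitions decides which duplicates can be found at all. On a real cluster each partition would be handled by a different node rather than a different thread.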
{pkathi2, ashish.sharma}@emory.edu
Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA.
Acknowledgments
* Google Summer of Code 2015
* NCI U01 [1U01CA187013-01], Resources for development and validation of
Radiomic Analyses & Adaptive Therapy, Fred Prior, Ashish Sharma (UAMS, Emory)
