SlideShare une entreprise Scribd logo
1  sur  35
Télécharger pour lire hors ligne
Big Data
Case Study
Proposed Model
Results and Conclusions
On Big Data Applications
P. M. Pardalos1, 2
1Center for Applied Optimization
Department of Industrial and Systems Engineering
University of Florida
2Laboratory of Algorithms and Technologies for Networks Analysis (LATNA)
National Research University Higher School of Economics
ItForum, 2013
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
How big will the data be tomorrow?
Cisco: Global ’Net Traffic to Surpass 1 Zettabyte in 2016
http://cacm.acm.org/news/150041-cisco-global-net-traffic-to-surpass-
1-zettabyte-in-2016/fulltext
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
What is BigData?
The proliferation of massive datasets brings with it a series
of special computational challenges.
This data avalanche arises in a wide range of scientific and
commercial applications.
With advances in computer and information technologies,
many of these challenges are beginning to be addressed.
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
HandBook of Massive Datasets
Handbook of Massive Data Sets,
co-editors: J. Abello, P.M. Pardalos,
and M. Resende, Kluwer Academic
Publishers, (2002).
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Data Mining: the practice of searching through large
amounts of computerized data to find useful patterns or
trends.
Optimization: An act, process, or methodology of making
something (as a design, system, or decision) as fully
perfect, functional, or effective as possible; specifically :
the mathematical procedures (as finding the maximum of a
function) involved in this.
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Hard drive Cost
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Hard drive Capacity
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Processing power
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Some Examples of Massive Datasets
Internet Data
Weather Data
Telecommunications Data
Medical Data
Demographics Data
Financial Data
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Handling Massive Datasets
New computing technologies are needed to handle massive
datasets
Input-Output Parallel Systems
External Memory Algorithms
Quantum Computing
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Study Objective
An example in biomedicine
Acute Kidney Injury (AKI) is one of common complications
in post-operative patients.
The in-hospital mortality rate for patients with AKI may be
as high as 60%.
An elevated serum creatinine (sCr) level in blood is
commonly recognized as an indicator of AKI.
The degree to which patterns of sCr change are
associated with in-hospital mortality is unknown.
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Study Objective
Study Outline
Objectives:
To develop a comprehensive model for assessing the
mortality risk in post-operative patients
To establish a quantitative relationship between sCr pattern
and mortality risk
Results:
A probabilistic model was designed to assess mortality risk
in post-operative patients
The model provided high discriminative capacity and
accuracy (ROC = 0.92)
A quantitative association between sCr time series and
mortality risk was established
Simple and informative sCr risk factors were derived
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Study Objective
Data Set
We performed a retrospective study involving patients
admitted to Shands Hospital (Gainesville, FL) from 2000
through 2010.
For each patient who underwent a surgery detailed clinical
and outcome data were collected.
The analysis included 60,074 patients who underwent
in-patient surgery at Shands Hospital during a 10-years
period.
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Logistic Function
Probabilistic score 1
P(C = 1|X = x) = 1 + exp w0 +
m
i=1
wigi(xi)
−1
; (1)
C ∈ {0, 1} is an outcome;
X = (X1, . . . , Xm) - the risk factors;
gi - nonlinear function associated with the ith factor
w0 = ln P(C = 1)/P(C = 0) is the a priori log odds ratio.
{wi}, i > 0 - weights indicating importance of the risk factors
1
S. Saria, A. Rajani,J. Gould,D. Koller,A. Penn, Integration of Early
Physiological Responses Predicts Later Illness Severity in Preterm Infants,
Sci Transl Med, 2010
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Nonlinear Risk Factors Estimation
From the probabilistic score equation we can derive
w0 +
m
i=1
wi · gi(xi) = log
P(X = x|C = 0)
P(X = x|C = 1)
+ log
P(C = 0)
P(C = 1)
Under assumption of risk factors independence:
w0 +
m
i=1
wi · gi(xi) =
m
i=1
log
P(Xi = xi|C = 0)
P(Xi = xi|C = 1)
+ log
P(C = 0)
P(C = 1)
.
For continuous factors P(Xi = xi|C) implies
P(Xi ∈ [xi − ε, xi + ε]|C) where ε is sufficiently small
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Model Parameters
max
w
k∈{0,1} j: Cj =k
ln P(Cj
= k|Xi = xj
i , . . . , Xi = xj
m);
j denotes patients in the data set;
Cj ∈ {0, 1} - outcome for the jth patient;
xj
i - value of ith risk factor for the jth patient.
We solved this problem approximately with interior point
method implemented in MATLAB
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Nonlinear risk functions
We get
gi(xi) = ln
P(Xi = xi|C = 0)
P(Xi = xi|C = 1)
;
w0 = ln
P(C = 1)
P(C = 0)
;
wi = 1, i = 1, . . . , m.
The probabilities P(Xi = xi|C), i = 1, . . . , m were learned
from the data using maximum likelihood principle.
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Discrete risk factors estimation
For discrete valued factors
gi(x) = ln
#{j : Cj = 1, xj
i = x}
#{j : Cj = 1}
#{j : Cj = 0}
#{j : Cj = 1, xj
i = x}
.
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Continuous Risk Factors Estimation
For continuous factors we defined
gi(x) = ln
f0
i (x)
f1
i (x)
,
fk
i (x) = ϕ∗
(x − h∗
, θ∗
),
{ϕ∗
, h∗
, θ∗
} = arg max
ϕ∈Φ,θ,h
Lk
i (ϕ, θ, h),
Lk
i (ϕ, θ, h) =
j:Cj =k
ln
ϕ(xj
i − h, θ)
Fϕ(l2
i , θ) − Fϕ(l1
i , θ)
.
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Continuous Risk Factors Estimation
1 2 3 4 5 6 7
0
50
100
150
200
maxCr/minCr value
Histogramofvalues,C=1
Observed values
Learned Inverse Gaussian distribution
1 2 3 4 5 6 7
0
0.5
1
1.5
2
x 10
4
maxCr/minCr value
Histogramofvalues,C=0
Observed values
Learned Loglogistic distribution
1 2 3 4 5 6 7
0
50
100
150
200
recentCr/minCr value
Histogramofvalues,C=1
Histogram of observed values
Learned Lognormal distribution
1 2 3 4 5 6 7
0
1
2
3
x 10
4
recentCr/minCr value
Histogramofvalues,C=0
Histogram of observed values
Learned LogLogistic distribution
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Age Risk Factor
20 40 60 80 100
0
20
40
60
Age
Distributionofvalues,C=1
20 40 60 80 100
0
1000
2000
3000
Age
Distributionofvalues,C=0
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Which creatinine (Cr) characteristics are ”good”?
Absolute Cr value is not very informative
(maximal Cr value)/(minimal Cr value) works good
Recent Cr value is important
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Cr plots in patients with elevated Cr level
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Probabilistic score for mortality risk
Risk Factors Estimation
Creatinine time series analysis
Our Speculations
The risk is defined by a function R(RC, OD),
RC stands for recent condition
OD stands for overall damage suffered during AKI
One should look for features describing these two factors
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Resulting nonlinear risk functions
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Classification accuracy evaluation
70/30 cross-validation analysis has been performed
Random split into 70% training subset and 30% testing
subset
The average over 100 runs results are reported
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Learned risk factors weights
0.2 0.4 0.6 0.8 1
procedure
admission_type
doctor
morbidity
age
maxCr/minCr
recentCr/minCr
Oddsratio
20 30 40 50 60 70 80 90
0
2
4
Age
1 1.5 2 2.5 3
0
5
10
recentCr/minCr value
1 1.5 2 2.5 3
0
1
2
maxCr/minCr value
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
The odds ratio based on Cr factors
1.5 2 2.5 3
1.5
2
2.5
3
maxCr/minCr value
recentCr/minCrvalue
2
4
6
8
10
12
14
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
The Resulting ROC Curves
0 0.2 0.4 0.6 0.8 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
False positive rate (1 − specificity)
Truepositiverate(sensitivity)
Both Cr factors, AUC = 0.85
maxCr/minCr, AUC = 0.84
recentCr/minCr, 0.76
0 0.2 0.4 0.6 0.8 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
False positive rate (1 − specificity)
Truepositiverate(sensitivity)
Preop. + both Cr factors, AUC = 0.92
Preop. + maxCr/minCr, AUC = 0.91
Preop. + recentCr/minCr, AUC = 0.91
Preop. only, AUC = 0.83
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Cr time series patterns
admission discharge
1
2
3
4
Odds ratios: No recovery
time
sCr/minCr
1.79 (1.7,1.85)
6.46 (6.04,6.91)
21.39 (19.42,23.5)
admission discharge
1
2
3
4
Odds ratios: Partial recovery
time
sCr/minCr
0.79 (0.75,0.83)
2.34 (2.26,2.49)
8.3 (7.85,9.04)
admission discharge
1
2
3
4
Odds ratios: Full recovery
time
sCr/minCr
0.48 (0.44,0.53)
0.63 (0.57,0.73)
0.81 (0.71,0.99)
admission discharge
1
2
3
4
Odds ratios: Reference patterns
time
sCr/minCr
1 (0.88,1.22)
1.12
1 (0.93,1.12)
1.24
1 (0.96,1.03) 1.32
0.22 (0.17,0.25)
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Model Calibration
20 40 60 80 100
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Percentile
Mortalityrisk
Probabilistic model score compared to mortality risks derived from the data
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Risk factors included Prob. model AUC (95% CI)
Preop. 0.835 (0.817-0.854)
Hosmer-Lemeshow statistics s = 95.06; p = 0.57
Preop. + maxCr/minCr 0.900 (0.888-0.911)
Hosmer-Lemeshow statistics s = 84.33; p = 0.84
Preop. + recentCr/minCr 0.910 (0.897-0.923)
Hosmer-Lemeshow statistics s = 72.19; p = 0.98
Preop. + both sCr factors 0.917 (0.906-0.929)
Hosmer-Lemeshow statistics s = 68.94; p = 0.99
maxCr/minCr only 0.834 (0.816-0.852)
Hosmer-Lemeshow statistics s = 1,818; p = 0
recentCr/minCr only 0.791 (0.764-0.818)
Hosmer-Lemeshow statistics s = 112.88; p = 0.14
both sCr factors 0.855 (0.837-0.873)
Hosmer-Lemeshow statistics s = 137.31; p = 0.01
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
Conclusions
Relatively high classification accuracy was obtained using
probabilistic model
Creatinine time series do provide complimentary
information on patient’s risk of death
Presumably AKI effect on risk of death depends on two
factors:
Kidney recent condition
Overall kidney damage suffered since surgery
P. M. Pardalos, On Big Data Applications
Big Data
Case Study
Proposed Model
Results and Conclusions
The End
Thank you!
P. M. Pardalos, On Big Data Applications

Contenu connexe

Tendances

Contrast Pattern Aided Regression and Classification
Contrast Pattern Aided Regression and ClassificationContrast Pattern Aided Regression and Classification
Contrast Pattern Aided Regression and Classification
Artificial Intelligence Institute at UofSC
 

Tendances (20)

Paris Data Ladies #14
Paris Data Ladies #14Paris Data Ladies #14
Paris Data Ladies #14
 
Comparative Study of Classification Method on Customer Candidate Data to Pred...
Comparative Study of Classification Method on Customer Candidate Data to Pred...Comparative Study of Classification Method on Customer Candidate Data to Pred...
Comparative Study of Classification Method on Customer Candidate Data to Pred...
 
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
 
Cao report 2007-2012
Cao report 2007-2012Cao report 2007-2012
Cao report 2007-2012
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
 
IRJET - Prediction of Autistic Spectrum Disorder based on Behavioural Fea...
IRJET -  	  Prediction of Autistic Spectrum Disorder based on Behavioural Fea...IRJET -  	  Prediction of Autistic Spectrum Disorder based on Behavioural Fea...
IRJET - Prediction of Autistic Spectrum Disorder based on Behavioural Fea...
 
Assessment of Decision Tree Algorithms on Student’s Recital
Assessment of Decision Tree Algorithms on Student’s RecitalAssessment of Decision Tree Algorithms on Student’s Recital
Assessment of Decision Tree Algorithms on Student’s Recital
 
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
 
MitoGame: Gamification Method for Detecting Mitosis from Histopathological I...
MitoGame:  Gamification Method for Detecting Mitosis from Histopathological I...MitoGame:  Gamification Method for Detecting Mitosis from Histopathological I...
MitoGame: Gamification Method for Detecting Mitosis from Histopathological I...
 
Deep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining IIDeep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining II
 
Contrast Pattern Aided Regression and Classification
Contrast Pattern Aided Regression and ClassificationContrast Pattern Aided Regression and Classification
Contrast Pattern Aided Regression and Classification
 
The Amazing Ways Artificial Intelligence Is Transforming Genomics and Gene Ed...
The Amazing Ways Artificial Intelligence Is Transforming Genomics and Gene Ed...The Amazing Ways Artificial Intelligence Is Transforming Genomics and Gene Ed...
The Amazing Ways Artificial Intelligence Is Transforming Genomics and Gene Ed...
 
AI for Precision Medicine (Pragmatic preclinical data science)
AI for Precision Medicine (Pragmatic preclinical data science)AI for Precision Medicine (Pragmatic preclinical data science)
AI for Precision Medicine (Pragmatic preclinical data science)
 
Machine learning to solve bioinformatics problems
Machine learning to solve bioinformatics problemsMachine learning to solve bioinformatics problems
Machine learning to solve bioinformatics problems
 
1530 track2 humphrey
1530 track2 humphrey1530 track2 humphrey
1530 track2 humphrey
 
Opportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deckOpportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deck
 
AI is the Future of Drug Discovery
AI is the Future of Drug DiscoveryAI is the Future of Drug Discovery
AI is the Future of Drug Discovery
 
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUES
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUESPREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUES
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUES
 
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
 
V34132136
V34132136V34132136
V34132136
 

En vedette

Р.Гаттаров - Электронное правительство: пÑ...
Р.Гаттаров - Электронное правительство: пÑ...Р.Гаттаров - Электронное правительство: пÑ...
Р.Гаттаров - Электронное правительство: пÑ...
Ekaterina Morozova
 
Sign of success ex étudiants
Sign of success   ex étudiantsSign of success   ex étudiants
Sign of success ex étudiants
Lorena Rennes
 
А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...
А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...
А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...
Ekaterina Morozova
 
презентация Microsoft power point
презентация Microsoft power pointпрезентация Microsoft power point
презентация Microsoft power point
Ekaterina Morozova
 
цод нового поколения (It forum 2020 nn)
цод нового поколения (It forum 2020 nn)цод нового поколения (It forum 2020 nn)
цод нового поколения (It forum 2020 nn)
Ekaterina Morozova
 
Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...
Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...
Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...
Ekaterina Morozova
 
доклад гутянская
доклад гутянскаядоклад гутянская
доклад гутянская
Ekaterina Morozova
 
Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...
Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...
Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...
Ekaterina Morozova
 
А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...
А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...
А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...
Ekaterina Morozova
 
М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...
М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...
М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...
Ekaterina Morozova
 
С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...
С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...
С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...
Ekaterina Morozova
 
фролов круглый стол нпп
фролов   круглый стол нппфролов   круглый стол нпп
фролов круглый стол нпп
Ekaterina Morozova
 
В.Гриднев - Стратегия развития ИТ для госсектора
В.Гриднев - Стратегия развития ИТ для госсектораВ.Гриднев - Стратегия развития ИТ для госсектора
В.Гриднев - Стратегия развития ИТ для госсектора
Ekaterina Morozova
 

En vedette (17)

Р.Гаттаров - Электронное правительство: пÑ...
Р.Гаттаров - Электронное правительство: пÑ...Р.Гаттаров - Электронное правительство: пÑ...
Р.Гаттаров - Электронное правительство: пÑ...
 
Sign of success ex étudiants
Sign of success   ex étudiantsSign of success   ex étudiants
Sign of success ex étudiants
 
А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...
А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...
А.О.Федорец - Подключение к СМЭВ кредитных организаций. Итоги реализации прое...
 
презентация Microsoft power point
презентация Microsoft power pointпрезентация Microsoft power point
презентация Microsoft power point
 
цод нового поколения (It forum 2020 nn)
цод нового поколения (It forum 2020 nn)цод нового поколения (It forum 2020 nn)
цод нового поколения (It forum 2020 nn)
 
Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...
Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...
Г.Петров - Эксплуатация комплексной системы обеспечения безопасности жизнедея...
 
доклад гутянская
доклад гутянскаядоклад гутянская
доклад гутянская
 
Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...
Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...
Д.В.Гуртов - Риски и перспективы развития проекта "Информационное общество" в...
 
Rivas
RivasRivas
Rivas
 
нглу курицынагв
нглу курицынагвнглу курицынагв
нглу курицынагв
 
А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...
А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...
А.Э.Бабкин - BigData + LinkedData = Новые возможности получения и обработки з...
 
М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...
М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...
М.И.Дегтярева - Опыт внедрения информационных технологий в здравоохранение Вл...
 
С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...
С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...
С.Н.Сериков - Необходимые условия успешного внедрения интеллектуального опера...
 
5бугрин
5бугрин5бугрин
5бугрин
 
фролов круглый стол нпп
фролов   круглый стол нппфролов   круглый стол нпп
фролов круглый стол нпп
 
проценты вокруг нас1
проценты вокруг нас1проценты вокруг нас1
проценты вокруг нас1
 
В.Гриднев - Стратегия развития ИТ для госсектора
В.Гриднев - Стратегия развития ИТ для госсектораВ.Гриднев - Стратегия развития ИТ для госсектора
В.Гриднев - Стратегия развития ИТ для госсектора
 

Similaire à P.M.Pardalos - On Big Data Application

Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
Chandan Reddy
 
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
DataScienceConferenc1
 
pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...
pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...
pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...
butest
 
Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...
Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...
Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...
Levi Shapiro
 
Supervised deep learning_embeddings_for_the_predic
Supervised deep learning_embeddings_for_the_predicSupervised deep learning_embeddings_for_the_predic
Supervised deep learning_embeddings_for_the_predic
hema latha
 

Similaire à P.M.Pardalos - On Big Data Application (20)

Bayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in IndiaBayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in India
 
Model-driven decision support for monitoring network design based on analysis...
Model-driven decision support for monitoring network design based on analysis...Model-driven decision support for monitoring network design based on analysis...
Model-driven decision support for monitoring network design based on analysis...
 
Santipur Webinar on COVID-19 pandemic
 Santipur Webinar on COVID-19 pandemic Santipur Webinar on COVID-19 pandemic
Santipur Webinar on COVID-19 pandemic
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
 
Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
IRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
IRJET- Breast Cancer Disease Prediction : Using Machine Learning ApproachIRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
IRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
 
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
 
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
 
Quantitive Time Series Analysis of Malware and Vulnerability Trends
Quantitive Time Series Analysis of Malware and Vulnerability TrendsQuantitive Time Series Analysis of Malware and Vulnerability Trends
Quantitive Time Series Analysis of Malware and Vulnerability Trends
 
pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...
pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...
pptx - Preventing Sepsis: Artificial Intelligence, Knowledge ...
 
CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...
CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...
CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...
 
Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...
Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...
Professor Harrison Bai, Artificial Intelligence Applications in Radiology_mHe...
 
How to Validate you Client Churn Model
How to Validate you Client Churn ModelHow to Validate you Client Churn Model
How to Validate you Client Churn Model
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
MLPA for health care presentation smc
MLPA for health care presentation   smcMLPA for health care presentation   smc
MLPA for health care presentation smc
 
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERYLIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
 
Absolute risk estimation in a case cohort study of prostate cancer
Absolute risk estimation in a case cohort study of prostate cancerAbsolute risk estimation in a case cohort study of prostate cancer
Absolute risk estimation in a case cohort study of prostate cancer
 
Basen Network
Basen NetworkBasen Network
Basen Network
 
Supervised deep learning_embeddings_for_the_predic
Supervised deep learning_embeddings_for_the_predicSupervised deep learning_embeddings_for_the_predic
Supervised deep learning_embeddings_for_the_predic
 

Plus de Ekaterina Morozova

электронная демократия бастрыкин-нижний новгород 17.04.13
электронная демократия бастрыкин-нижний новгород 17.04.13электронная демократия бастрыкин-нижний новгород 17.04.13
электронная демократия бастрыкин-нижний новгород 17.04.13
Ekaterina Morozova
 
свободные цифровые технологии как механизм реформированиясистемы государствен...
свободные цифровые технологии как механизм реформированиясистемы государствен...свободные цифровые технологии как механизм реформированиясистемы государствен...
свободные цифровые технологии как механизм реформированиясистемы государствен...
Ekaterina Morozova
 
расширенный доклад по вкц
расширенный доклад по вкцрасширенный доклад по вкц
расширенный доклад по вкц
Ekaterina Morozova
 
презентация полтава мфц
презентация полтава мфцпрезентация полтава мфц
презентация полтава мфц
Ekaterina Morozova
 
почта россии нновгород 16 17 апр 2013 г
почта россии нновгород 16 17 апр 2013 г почта россии нновгород 16 17 апр 2013 г
почта россии нновгород 16 17 апр 2013 г
Ekaterina Morozova
 
основные проблемы и пути развития промышленности пфо гл.
основные проблемы и пути развития промышленности пфо гл.основные проблемы и пути развития промышленности пфо гл.
основные проблемы и пути развития промышленности пфо гл.
Ekaterina Morozova
 
Point of growing nn 16_apr13_mango_v5
Point of growing nn 16_apr13_mango_v5Point of growing nn 16_apr13_mango_v5
Point of growing nn 16_apr13_mango_v5
Ekaterina Morozova
 
2013 04 17_chugunov_e_democracy1
2013 04 17_chugunov_e_democracy12013 04 17_chugunov_e_democracy1
2013 04 17_chugunov_e_democracy1
Ekaterina Morozova
 

Plus de Ekaterina Morozova (20)

2
22
2
 
1
11
1
 
3
33
3
 
электронная демократия бастрыкин-нижний новгород 17.04.13
электронная демократия бастрыкин-нижний новгород 17.04.13электронная демократия бастрыкин-нижний новгород 17.04.13
электронная демократия бастрыкин-нижний новгород 17.04.13
 
уэк чувашия
уэк чувашияуэк чувашия
уэк чувашия
 
свободные цифровые технологии как механизм реформированиясистемы государствен...
свободные цифровые технологии как механизм реформированиясистемы государствен...свободные цифровые технологии как механизм реформированиясистемы государствен...
свободные цифровые технологии как механизм реформированиясистемы государствен...
 
расширенный доклад по вкц
расширенный доклад по вкцрасширенный доклад по вкц
расширенный доклад по вкц
 
презентация полтава мфц
презентация полтава мфцпрезентация полтава мфц
презентация полтава мфц
 
почта россии нновгород 16 17 апр 2013 г
почта россии нновгород 16 17 апр 2013 г почта россии нновгород 16 17 апр 2013 г
почта россии нновгород 16 17 апр 2013 г
 
основные проблемы и пути развития промышленности пфо гл.
основные проблемы и пути развития промышленности пфо гл.основные проблемы и пути развития промышленности пфо гл.
основные проблемы и пути развития промышленности пфо гл.
 
нн 2013 chugunov_v 1
нн 2013 chugunov_v 1нн 2013 chugunov_v 1
нн 2013 chugunov_v 1
 
лад
ладлад
лад
 
долгих
долгихдолгих
долгих
 
бфт найченко
бфт найченкобфт найченко
бфт найченко
 
Sag
SagSag
Sag
 
Point of growing nn 16_apr13_mango_v5
Point of growing nn 16_apr13_mango_v5Point of growing nn 16_apr13_mango_v5
Point of growing nn 16_apr13_mango_v5
 
Meta q гагин
Meta q гагинMeta q гагин
Meta q гагин
 
Kirsanov
KirsanovKirsanov
Kirsanov
 
2013 04 17_chugunov_e_democracy1
2013 04 17_chugunov_e_democracy12013 04 17_chugunov_e_democracy1
2013 04 17_chugunov_e_democracy1
 
6бойченко
6бойченко6бойченко
6бойченко
 

Dernier

Dernier (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

P.M.Pardalos - On Big Data Application

  • 1. Big Data Case Study Proposed Model Results and Conclusions On Big Data Applications P. M. Pardalos1, 2 1Center for Applied Optimization Department of Industrial and Systems Engineering University of Florida 2Laboratory of Algorithms and Technologies for Networks Analysis (LATNA) National Research University Higher School of Economics ItForum, 2013 P. M. Pardalos, On Big Data Applications
  • 2. Big Data Case Study Proposed Model Results and Conclusions P. M. Pardalos, On Big Data Applications
  • 3. Big Data Case Study Proposed Model Results and Conclusions How big will the data be tomorrow? Cisco: Global ’Net Traffic to Surpass 1 Zettabyte in 2016 http://cacm.acm.org/news/150041-cisco-global-net-traffic-to-surpass- 1-zettabyte-in-2016/fulltext P. M. Pardalos, On Big Data Applications
  • 4. Big Data Case Study Proposed Model Results and Conclusions What is BigData? The proliferation of massive datasets brings with it a series of special computational challenges. This data avalanche arises in a wide range of scientific and commercial applications. With advances in computer and information technologies, many of these challenges are beginning to be addressed. P. M. Pardalos, On Big Data Applications
  • 5. Big Data Case Study Proposed Model Results and Conclusions HandBook of Massive Datasets Handbook of Massive Data Sets, co-editors: J. Abello, P.M. Pardalos, and M. Resende, Kluwer Academic Publishers, (2002). P. M. Pardalos, On Big Data Applications
  • 6. Big Data Case Study Proposed Model Results and Conclusions Data Mining: the practice of searching through large amounts of computerized data to find useful patterns or trends. Optimization: An act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, or effective as possible; specifically : the mathematical procedures (as finding the maximum of a function) involved in this. P. M. Pardalos, On Big Data Applications
  • 7. Big Data Case Study Proposed Model Results and Conclusions Hard drive Cost P. M. Pardalos, On Big Data Applications
  • 8. Big Data Case Study Proposed Model Results and Conclusions Hard drive Capacity P. M. Pardalos, On Big Data Applications
  • 9. Big Data Case Study Proposed Model Results and Conclusions Processing power P. M. Pardalos, On Big Data Applications
  • 10. Big Data Case Study Proposed Model Results and Conclusions Some Examples of Massive Datasets Internet Data Weather Data Telecommunications Data Medical Data Demographics Data Financial Data P. M. Pardalos, On Big Data Applications
  • 11. Big Data Case Study Proposed Model Results and Conclusions Handling Massive Datasets New computing technologies are needed to handle massive datasets Input-Output Parallel Systems External Memory Algorithms Quantum Computing P. M. Pardalos, On Big Data Applications
  • 12. Big Data Case Study Proposed Model Results and Conclusions Study Objective An example in biomedicine Acute Kidney Injury (AKI) is one of common complications in post-operative patients. The in-hospital mortality rate for patients with AKI may be as high as 60%. An elevated serum creatinine (sCr) level in blood is commonly recognized as an indicator of AKI. The degree to which patterns of sCr change are associated with in-hospital mortality is unknown. P. M. Pardalos, On Big Data Applications
  • 13. Big Data Case Study Proposed Model Results and Conclusions Study Objective Study Outline Objectives: To develop a comprehensive model for assessing the mortality risk in post-operative patients To establish a quantitative relationship between sCr pattern and mortality risk Results: A probabilistic model was designed to assess mortality risk in post-operative patients The model provided high discriminative capacity and accuracy (ROC = 0.92) A quantitative association between sCr time series and mortality risk was established Simple and informative sCr risk factors were derived P. M. Pardalos, On Big Data Applications
  • 14. Big Data Case Study Proposed Model Results and Conclusions Study Objective Data Set We performed a retrospective study involving patients admitted to Shands Hospital (Gainesville, FL) from 2000 through 2010. For each patient who underwent a surgery detailed clinical and outcome data were collected. The analysis included 60,074 patients who underwent in-patient surgery at Shands Hospital during a 10-years period. P. M. Pardalos, On Big Data Applications
  • 15. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Logistic Function Probabilistic score 1 P(C = 1|X = x) = 1 + exp w0 + m i=1 wigi(xi) −1 ; (1) C ∈ {0, 1} is an outcome; X = (X1, . . . , Xm) - the risk factors; gi - nonlinear function associated with the ith factor w0 = ln P(C = 1)/P(C = 0) is the a priori log odds ratio. {wi}, i > 0 - weights indicating importance of the risk factors 1 S. Saria, A. Rajani,J. Gould,D. Koller,A. Penn, Integration of Early Physiological Responses Predicts Later Illness Severity in Preterm Infants, Sci Transl Med, 2010 P. M. Pardalos, On Big Data Applications
  • 16. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Nonlinear Risk Factors Estimation From the probabilistic score equation we can derive w0 + m i=1 wi · gi(xi) = log P(X = x|C = 0) P(X = x|C = 1) + log P(C = 0) P(C = 1) Under assumption of risk factors independence: w0 + m i=1 wi · gi(xi) = m i=1 log P(Xi = xi|C = 0) P(Xi = xi|C = 1) + log P(C = 0) P(C = 1) . For continuous factors P(Xi = xi|C) implies P(Xi ∈ [xi − ε, xi + ε]|C) where ε is sufficiently small P. M. Pardalos, On Big Data Applications
  • 17. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Model Parameters max w k∈{0,1} j: Cj =k ln P(Cj = k|Xi = xj i , . . . , Xi = xj m); j denotes patients in the data set; Cj ∈ {0, 1} - outcome for the jth patient; xj i - value of ith risk factor for the jth patient. We solved this problem approximately with interior point method implemented in MATLAB P. M. Pardalos, On Big Data Applications
  • 18. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Nonlinear risk functions We get gi(xi) = ln P(Xi = xi|C = 0) P(Xi = xi|C = 1) ; w0 = ln P(C = 1) P(C = 0) ; wi = 1, i = 1, . . . , m. The probabilities P(Xi = xi|C), i = 1, . . . , m were learned from the data using maximum likelihood principle. P. M. Pardalos, On Big Data Applications
  • 19. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Discrete risk factors estimation For discrete valued factors gi(x) = ln #{j : Cj = 1, xj i = x} #{j : Cj = 1} #{j : Cj = 0} #{j : Cj = 1, xj i = x} . P. M. Pardalos, On Big Data Applications
  • 20. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Continuous Risk Factors Estimation For continuous factors we defined gi(x) = ln f0 i (x) f1 i (x) , fk i (x) = ϕ∗ (x − h∗ , θ∗ ), {ϕ∗ , h∗ , θ∗ } = arg max ϕ∈Φ,θ,h Lk i (ϕ, θ, h), Lk i (ϕ, θ, h) = j:Cj =k ln ϕ(xj i − h, θ) Fϕ(l2 i , θ) − Fϕ(l1 i , θ) . P. M. Pardalos, On Big Data Applications
  • 21. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Continuous Risk Factors Estimation 1 2 3 4 5 6 7 0 50 100 150 200 maxCr/minCr value Histogramofvalues,C=1 Observed values Learned Inverse Gaussian distribution 1 2 3 4 5 6 7 0 0.5 1 1.5 2 x 10 4 maxCr/minCr value Histogramofvalues,C=0 Observed values Learned Loglogistic distribution 1 2 3 4 5 6 7 0 50 100 150 200 recentCr/minCr value Histogramofvalues,C=1 Histogram of observed values Learned Lognormal distribution 1 2 3 4 5 6 7 0 1 2 3 x 10 4 recentCr/minCr value Histogramofvalues,C=0 Histogram of observed values Learned LogLogistic distribution P. M. Pardalos, On Big Data Applications
  • 22. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Age Risk Factor 20 40 60 80 100 0 20 40 60 Age Distributionofvalues,C=1 20 40 60 80 100 0 1000 2000 3000 Age Distributionofvalues,C=0 P. M. Pardalos, On Big Data Applications
  • 23. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Which creatinine (Cr) characteristics are ”good”? Absolute Cr value is not very informative (maximal Cr value)/(minimal Cr value) works good Recent Cr value is important P. M. Pardalos, On Big Data Applications
  • 24. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Cr plots in patients with elevated Cr level P. M. Pardalos, On Big Data Applications
  • 25. Big Data Case Study Proposed Model Results and Conclusions Probabilistic score for mortality risk Risk Factors Estimation Creatinine time series analysis Our Speculations The risk is defined by a function R(RC, OD), RC stands for recent condition OD stands for overall damage suffered during AKI One should look for features describing these two factors P. M. Pardalos, On Big Data Applications
  • 26. Big Data Case Study Proposed Model Results and Conclusions Resulting nonlinear risk functions P. M. Pardalos, On Big Data Applications
  • 27. Big Data Case Study Proposed Model Results and Conclusions Classification accuracy evaluation 70/30 cross-validation analysis has been performed Random split into 70% training subset and 30% testing subset The average over 100 runs results are reported P. M. Pardalos, On Big Data Applications
  • 28. Big Data Case Study Proposed Model Results and Conclusions Learned risk factors weights 0.2 0.4 0.6 0.8 1 procedure admission_type doctor morbidity age maxCr/minCr recentCr/minCr Oddsratio 20 30 40 50 60 70 80 90 0 2 4 Age 1 1.5 2 2.5 3 0 5 10 recentCr/minCr value 1 1.5 2 2.5 3 0 1 2 maxCr/minCr value P. M. Pardalos, On Big Data Applications
  • 29. Big Data Case Study Proposed Model Results and Conclusions The odds ratio based on Cr factors 1.5 2 2.5 3 1.5 2 2.5 3 maxCr/minCr value recentCr/minCrvalue 2 4 6 8 10 12 14 P. M. Pardalos, On Big Data Applications
  • 30. Big Data Case Study Proposed Model Results and Conclusions The Resulting ROC Curves 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False positive rate (1 − specificity) Truepositiverate(sensitivity) Both Cr factors, AUC = 0.85 maxCr/minCr, AUC = 0.84 recentCr/minCr, 0.76 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False positive rate (1 − specificity) Truepositiverate(sensitivity) Preop. + both Cr factors, AUC = 0.92 Preop. + maxCr/minCr, AUC = 0.91 Preop. + recentCr/minCr, AUC = 0.91 Preop. only, AUC = 0.83 P. M. Pardalos, On Big Data Applications
  • 31. Big Data Case Study Proposed Model Results and Conclusions Cr time series patterns admission discharge 1 2 3 4 Odds ratios: No recovery time sCr/minCr 1.79 (1.7,1.85) 6.46 (6.04,6.91) 21.39 (19.42,23.5) admission discharge 1 2 3 4 Odds ratios: Partial recovery time sCr/minCr 0.79 (0.75,0.83) 2.34 (2.26,2.49) 8.3 (7.85,9.04) admission discharge 1 2 3 4 Odds ratios: Full recovery time sCr/minCr 0.48 (0.44,0.53) 0.63 (0.57,0.73) 0.81 (0.71,0.99) admission discharge 1 2 3 4 Odds ratios: Reference patterns time sCr/minCr 1 (0.88,1.22) 1.12 1 (0.93,1.12) 1.24 1 (0.96,1.03) 1.32 0.22 (0.17,0.25) P. M. Pardalos, On Big Data Applications
  • 32. Big Data Case Study Proposed Model Results and Conclusions Model Calibration 20 40 60 80 100 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Percentile Mortalityrisk Probabilistic model score compared to mortality risks derived from the data P. M. Pardalos, On Big Data Applications
  • 33. Big Data Case Study Proposed Model Results and Conclusions Risk factors included Prob. model AUC (95% CI) Preop. 0.835 (0.817-0.854) Hosmer-Lemeshow statistics s = 95.06; p = 0.57 Preop. + maxCr/minCr 0.900 (0.888-0.911) Hosmer-Lemeshow statistics s = 84.33; p = 0.84 Preop. + recentCr/minCr 0.910 (0.897-0.923) Hosmer-Lemeshow statistics s = 72.19; p = 0.98 Preop. + both sCr factors 0.917 (0.906-0.929) Hosmer-Lemeshow statistics s = 68.94; p = 0.99 maxCr/minCr only 0.834 (0.816-0.852) Hosmer-Lemeshow statistics s = 1,818; p = 0 recentCr/minCr only 0.791 (0.764-0.818) Hosmer-Lemeshow statistics s = 112.88; p = 0.14 both sCr factors 0.855 (0.837-0.873) Hosmer-Lemeshow statistics s = 137.31; p = 0.01 P. M. Pardalos, On Big Data Applications
  • 34. Big Data Case Study Proposed Model Results and Conclusions Conclusions Relatively high classification accuracy was obtained using probabilistic model Creatinine time series do provide complimentary information on patient’s risk of death Presumably AKI effect on risk of death depends on two factors: Kidney recent condition Overall kidney damage suffered since surgery P. M. Pardalos, On Big Data Applications
  • 35. Big Data Case Study Proposed Model Results and Conclusions The End Thank you! P. M. Pardalos, On Big Data Applications