RAPID PROTOTYPING DATA
PRODUCTS USING SHINY
rstudio::conf 2018
2018-02-02
SPEAKER PROFILE
TANYA CASHORALI
@TANYACASH21
Timeline: 2004, 2005, 2006, 2007, 2012, 2013, 2014, 2015
HUMANS ARE NOT FORTUNE TELLERS
Missing Data
Outliers
Nonlinearity
Collinearity
Delimiters!!
Of course I knew there wouldn’t be enough data in Oglala Lakota County when I wrote the 25-page requirements doc!
WE’RE NOT BUILDING ASTON MARTINS
“Laugh at perfection. It’s boring and keeps you from being done.”
THE DONE MANIFESTO
• http://www.manifestoproject.it/bre-pettis-and-kio-stark/
• https://www.bakadesuyo.com/2015/09/impostor-syndrome/
“Pretending you know what you’re doing
is almost the same as knowing what you
are doing, so just accept that you know
what you’re doing even if you don’t
and do it.”
There are three states of being:
1. Not knowing
2. Action
3. Completion.
CASE STUDIES
1.6 BILLION DOCUMENTS
Problem
Need to enable scientists to query 1.6 billion
“documents” (SNP + phenotype combinations)
quickly and filter on significance and
various other criteria.
CUSTOM RMONGO PACKAGE
The RMongo package, built in Scala, did not support authentication for Mongo 3.0
So we built an RJMongo package using Java = ACTION!
That same issue, originally reported in June 2015, still isn’t resolved
PERFORMANCE?
# qs and recordCount are not defined on the slide; they are assumed to be set elsewhere in the app
action <- dataTableAjax(session, result, rownames = FALSE,
  filter = function(data, params) {
    q <- params
    # fetch only the requested page of rows from Mongo
    data <- dataFromMongo(qs, q$search, q$start, q$length, q$column, q$order)
    list(
      draw = as.integer(q$draw),
      recordsTotal = recordCount,
      recordsFiltered = recordCount,
      data = unname(as.matrix(data)),
      DT_rows_all = 5
    )
  })
widget <- datatable(result,
  rownames = FALSE,
  class = 'display cell-border compact',
  selection = 'none',
  options = list(
    ajax = list(url = action), scrollX = TRUE, serverSide = TRUE, stateSave = TRUE,
    escape = FALSE, filter = FALSE, processing = TRUE,
    language = list(processing = "<img src='spin.gif'>"),
    columnDefs = list(list(targets = c(0:4, 6:25), sortable = FALSE)),
    order = list(list(5, 'asc'))
  )
)
* https://www.rdocumentation.org/packages/DT/versions/0.2/topics/dataTableAjax
To improve query performance… dataTableAjax() to the rescue!
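The request/response shape that server-side processing relies on can be sketched outside of R. The JavaScript below is only an illustration, not the code from the talk: allRows is a made-up in-memory stand-in for the Mongo collection, and buildPageResponse is a hypothetical helper name. It returns one page of rows together with the draw, recordsTotal, recordsFiltered and data fields that also appear in the filter function on the slide.

```javascript
// Hypothetical in-memory stand-in for the Mongo collection (not from the talk).
const allRows = Array.from({ length: 1000 }, (_, i) => ({
  snp: `rs${i}`,
  pvalue: (i + 1) / 1000,
}));

// Build a DataTables-style server-side response for one page request.
// `params` mirrors the request fields used on the slide: draw, start, length.
function buildPageResponse(rows, params) {
  const start = Math.max(0, parseInt(params.start, 10) || 0);
  const length = parseInt(params.length, 10) || 10;
  const page = rows.slice(start, start + length);
  return {
    draw: parseInt(params.draw, 10) || 0, // echoed back so the client can match replies to requests
    recordsTotal: rows.length,
    recordsFiltered: rows.length, // no search filter is applied in this sketch
    data: page.map((r) => [r.snp, r.pvalue]), // rows as plain arrays, like unname(as.matrix(data))
  };
}

const response = buildPageResponse(allRows, { draw: "3", start: "20", length: "5" });
console.log(response.data.length, response.recordsTotal);
```

Only the requested page crosses the wire, which is why the table stays responsive even when the full collection is far too large to send to the browser.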
FIRST VERSION
“Accept that everything is a draft.
It helps to get done.”
CURRENT PRODUCT
LET’S ADD 2.5 BILLION MORE!
• One node cluster w/ 512GB of RAM
• Current data size ~3 terabytes in JSON format
“Done is the engine of more.”
CMR API
Problem – API access to data from
Centre for Medicines Research (CMR)
International, which provides pharmaceutical
industry metrics and trends analysis.
Issues:
• Clunky API
• Tons of parameter combinations and results returned in aggregate
• Time-consuming
• IT dumped some of the data
• Slow
• Poor usability on their GUI (filters are clunky)
• Ineffective visualizations
• Data extracts contained limited details and were difficult to use
CMR API
The first iteration was just ggplots, iterating with the client on which parameters were necessary;
they don’t need thousands of indications
AUTHENTICATION (PYTHON! GASP!)
“The point of being done
is not to finish but
to get other things done.”
HOW IT WORKS
auth.py: get_token()
reticulate
cmr_api.R: fetch_data(token, endpt, params)
server.R, ui.R
“Once you’re done you
can throw it away.”
CURRENT PRODUCT
DRUG MANUFACTURING
• Many combinations of raw materials in specific order used to create final drug substance
• Time-consuming
• Costly
• One problematic substance = lost batches = millions of dollars
• Single user was running 100s of SQL queries manually
Throw out massive
requirements docs
NETWORKD3
“People without dirty
hands are wrong.
Doing something makes
you right.”
FIRST VERSION – CORE FUNCTIONALITY
“There is no editing stage.”
DETAILS COME LATER
“Failure counts as done. So do mistakes.”
SHINY AND D3 COMMUNICATION
server.R: session$sendCustomMessage(type = "jsondata", var_json)
www/: main.js
Shiny.addCustomMessageHandler("jsondata", function (message) {
  if (typeof message !== 'undefined') {
    var json_data = JSON.parse(message);
    initTree(json_data.left);
    initSide(json_data.right);
  }
});
ui.R: tags$script(src = "main.js")
• http://myinspirationinformation.com/visualisation/d3-js/integrating-d3-js-into-r-shiny/
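A slightly more defensive version of the handler might look like the sketch below. It is an illustration under assumptions, not the code from the talk: parseTreeMessage is a hypothetical helper, and it accepts either a JSON string (which needs JSON.parse, as on the slide) or an already-decoded object, since which one arrives depends on how the payload was built on the R side. initTree and initSide are the slide's own D3 functions and are only called when the script runs inside a Shiny page.

```javascript
// Normalise a custom-message payload into { left, right }.
// Returns null when the payload is missing or is not valid JSON.
function parseTreeMessage(message) {
  if (message === undefined || message === null) return null;
  let data = message;
  if (typeof message === "string") {
    try {
      data = JSON.parse(message);
    } catch (err) {
      return null;
    }
  }
  if (typeof data !== "object" || data === null) return null;
  return {
    left: data.left !== undefined ? data.left : null,
    right: data.right !== undefined ? data.right : null,
  };
}

// Register the handler only when running inside a Shiny page.
if (typeof Shiny !== "undefined") {
  Shiny.addCustomMessageHandler("jsondata", function (message) {
    const parsed = parseTreeMessage(message);
    if (parsed !== null) {
      initTree(parsed.left); // D3 drawing functions named on the slide
      initSide(parsed.right);
    }
  });
}

const example = parseTreeMessage('{"left": {"name": "root"}, "right": []}');
console.log(example.left.name);
```

Keeping the parsing in a plain function makes it possible to exercise it without a running Shiny session.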
“FINAL” PRODUCT
Previously: 6 months and full team to identify problematic substance
Now: 1-2 users and 1 day to identify problematic substance
OVERVIEW OF RAPID PROTOTYPING PROCESS
IF WE WERE MAKING DONUTS
THANK YOU
Patrick Brophy
Daron Carlson
Mike Fitzpatrick
Roland Zhou
Olivia Brode-Roger
Rajesh Mikkilineni
Jason Tetrault
Marianna Foos

Editor's Notes

  1. R 2005 story
  2. Number 1 of the done manifesto
  3. Single nucleotide polymorphisms, frequently called SNPs (pronounced “snips”), are the most common type of genetic variation among people. Each SNP represents a difference in a single DNA building block, called a nucleotide. For example, a SNP may replace the nucleotide cytosine (C) with the nucleotide thymine (T) in a certain stretch of DNA. SNPs occur normally throughout a person’s DNA. They occur once in every 300 nucleotides on average, which means there are roughly 10 million SNPs in the human genome. Most commonly, these variations are found in the DNA between genes.
  4. We had authentication issues with RMongo and Mongo 3.0. The package was built in Scala, so we rebuilt it in Java. The issue still wasn’t resolved a year later (I reported it in June 2015 and it is still open today).
  5. It is basically an implementation of server-side processing of DataTables in R. We also set up authentication using the company’s single sign-on.
  6. Full web dev team would take much longer
  7. Two years later, it is still being used and the client wants to expand on it. The Shiny infrastructure is already there.
  8. What are the latest trends in R&D productivity across the industry? What are the key factors that influence R&D productivity? How do different companies compare — with the industry, with competitors? What are the latest trends in industry pipeline volumes, cycle times and success rates – by therapeutic area and granular indications? What are the most effective and useful metrics for measuring and comparing R&D productivity across the global pharmaceutical industry? Are the timelines and success rates by therapy area being experienced by my company competitive with the rest of the industry and what are the drivers for above or below average performance?
  9. Add more charts
  10. Fastest way to get the data, python auth code example in their docs
  11. Refactor, not throw away
  12. networkD3 wasn’t enough; we needed more customization
  13. We needed a bi-directional tree. Colors showed up that the client didn’t know existed!
  14. Send a custom message to the front end. The handler listens for custom messages of type “jsondata”, then takes the contents of the message and assigns them to a JavaScript variable, in this case json_data.