Automatic reports with knitr
                     How to organize recurring data analysis output


                                          Ryan P. Mears

                                         Harvard Medical School


                                         January 24, 2013




Ryan P. Mears (Harvard Medical School)    Automatic reports with knitr   January 24, 2013   1/8
Background



 Outline



  1    Background


  2    Automatic Reports


  3    Other approaches with knitr







 Rationale


          Many data projects require considerable effort to improve the
          signal-to-noise ratio.
          Examples: image enhancement, optical character recognition,
          MR imaging, electrophysiology, etc.
          Unsupervised learning can assist efforts to remove artifacts.
          Strategies: k-means, principal components, independent
          components, etc.
          Method: after classification, select and remove likely noise
          sources, then compute criterion measures of data quality.
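
The method on this slide can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the speaker's actual analysis: the data are simulated, principal components stand in for whichever decomposition is used, the choice of the first component as "noise" is hard-coded, and total variance is a placeholder for a real criterion measure.

```r
# Simulated data (an assumption for illustration): 200 samples x 8 channels,
# with one large artifact shared by every channel.
set.seed(1)
n <- 200
signal <- matrix(rnorm(n * 8), nrow = n)
artifact <- 10 * sin(seq(0, 4 * pi, length.out = n))
x <- signal + outer(artifact, rep(1, 8))

# Decompose the data into principal components.
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Select the component(s) judged to be noise. In practice this choice is made
# by inspecting the components; here the first one is chosen by assumption.
noise_idx <- 1
scores <- pca$x
scores[, noise_idx] <- 0

# Reconstruct the data without the selected component(s).
cleaned <- scores %*% t(pca$rotation)
cleaned <- sweep(cleaned, 2, pca$center, "+")

# Placeholder criterion measure: total variance before and after cleaning.
var_before <- sum(apply(x, 2, var))
var_after <- sum(apply(cleaned, 2, var))
```

Comparing `var_before` with `var_after` shows how much of the variance the removed component accounted for; a real project would substitute its own quality measure.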







 Analysis Pipeline



                Display  →  Select  →  Measure


      1   Display the clustering result.
      2   Select features for further processing.
      3   Measure data quality based on the selection; repeat if the measure is under criterion.
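
The display, select, measure loop can be sketched as follows. Everything here is hypothetical: the simulated data, the `measure_quality` function, the range limit of 8, and the criterion of 0.9 are assumptions chosen only to make the loop runnable, and the "select" step is automated where the talk implies a manual choice.

```r
# Simulated data (an assumption): 100 trials x 6 features, feature f6 is noisy.
set.seed(42)
x <- matrix(rnorm(600), nrow = 100)
x[, 6] <- x[, 6] * 20
colnames(x) <- paste0("f", 1:6)

# Hypothetical quality measure: share of trials whose value range stays
# within a fixed limit.
measure_quality <- function(m, limit = 8) {
  trial_range <- apply(m, 1, function(r) diff(range(r)))
  mean(trial_range <= limit)
}

criterion <- 0.9
quality <- measure_quality(x)
iterations <- 0

# Repeat while the measure is under criterion.
while (quality < criterion && ncol(x) > 1) {
  feature_sd <- apply(x, 2, sd)
  print(round(feature_sd, 2))        # 1. display a summary of each feature
  worst <- which.max(feature_sd)     # 2. select the likeliest noise source
  x <- x[, -worst, drop = FALSE]
  quality <- measure_quality(x)      # 3. measure data quality again
  iterations <- iterations + 1
}
```

The `ncol(x) > 1` guard stops the loop from discarding every feature if the criterion can never be met.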




Automatic Reports



 Generate data table from existing graphics

          Read & filter directory contents.
          Create database of cases, graphics filenames and information.
          Parse filenames and additional information to structure report.
          Important: Initial flexibility of report structure
          (inheritance/modularity etc.) will save work later.



        BlockSubj Valid Clean BlinkA RangeA FinValid FinClean FinBlinkA FinRangeA
     1     b1-109   262   112     87    135      262      191         0        71
     2     b2-109   264   109     82    127      264      208         0        56
     3     b3-109   262    80    152    110      262      203         0        59
     4     b1-276   264   255      6      5      264      261         0         3
     5     b2-276   264   241     19     16      264      263         0         1
     6     b3-276   262   223     34     33      262      259         0         3
     7     b1-297   264   166     83     91      264      230         0        34
     8     b2-297   264   217     36     42      264      249         0        15
     9     b3-297   263   190     64     71      263      245         0        18
     10    b1-302   260   170     88     67      260      246         0        14
     11    b2-302   263   182     72     53      263      253         0        10
     12    b3-302   264   247     15      8      264      261         0         3
     13    b1-333   263   152    103     53      263      255         0         8
     14    b2-333   263   189     74     17      263      262         0         1
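
The steps listed on this slide (read and filter a directory, then parse filenames into a table of cases) might look like the sketch below. The file names, the `_overlay.pdf` suffix, and the `Block`, `Subject` and `File` columns are assumptions made for illustration; only the `b1-109` style of case label comes from the table above.

```r
# Create a throwaway directory with hypothetical graphics files.
fig_dir <- file.path(tempdir(), "figures_demo")
dir.create(fig_dir, showWarnings = FALSE)
demo_files <- c("b1-109_overlay.pdf", "b2-109_overlay.pdf",
                "b1-276_overlay.pdf", "notes.txt")
invisible(file.create(file.path(fig_dir, demo_files)))

# Read and filter directory contents: keep only files named like a case.
files <- list.files(fig_dir, pattern = "^b[0-9]+-[0-9]+_.*\\.pdf$")

# Parse block and subject numbers out of each filename.
parts <- regmatches(files, regexec("^b([0-9]+)-([0-9]+)_", files))

# Build the database of cases, one row per graphic.
cases <- data.frame(
  BlockSubj = sub("_.*$", "", files),
  Block     = as.integer(vapply(parts, `[`, character(1), 2)),
  Subject   = as.integer(vapply(parts, `[`, character(1), 3)),
  File      = file.path(fig_dir, files),
  stringsAsFactors = FALSE
)

# Order the cases so that the report groups blocks within each subject.
cases <- cases[order(cases$Subject, cases$Block), ]
```

Keeping the parsed fields as separate columns is one way to retain the flexibility the slide recommends: the report can later be regrouped by subject or by block without touching the filenames again.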


 LaTeX templates: knitr parent & child documents
         Slides are created from database cases.
         Child document contains slide data & figure formatting.
         Parent document contains LaTeX document information.
         Parent also contains code to iterate through database cases.
         Parent-child document structure separates content from format.

     ICselection_Parent.Rnw

     <<run-all, results='hide', message=FALSE, echo=FALSE>>=
     # Knit the child template once per case and collect the results
     out = NULL
     for (i in seq_len(75)) {
       out = c(out, knit_child('ggchild-overlaypdf.Rnw', sprintf('template-%d.tex', i)))
     }
     @
     <<write-results, message=FALSE, warning=FALSE, error=FALSE, echo=FALSE, results='asis'>>=
     # Write the collected child output into the parent document
     cat(out, sep = '\n')
     @
     \end{document}
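
To make the separation of content from format concrete without depending on knitr, here is a base-R stand-in for what the child template does: a formatting function is applied to each row of a case table to produce one Beamer frame. This is not the speaker's child document; the `cases` table, the file names, and `make_frame` are hypothetical, and `sprintf` is swapped in for `knit_child` purely for illustration.

```r
# Hypothetical case table (content).
cases <- data.frame(
  BlockSubj = c("b1-109", "b2-109", "b3-109"),
  File = c("b1-109_overlay.pdf", "b2-109_overlay.pdf", "b3-109_overlay.pdf"),
  stringsAsFactors = FALSE
)

# Hypothetical slide template (format): one frame per case.
make_frame <- function(case) {
  sprintf(
    "\\begin{frame}{%s}\n  \\includegraphics[width=\\textwidth]{%s}\n\\end{frame}",
    case$BlockSubj, case$File
  )
}

# Iterate through the cases, as the parent document's loop does.
frames <- vapply(seq_len(nrow(cases)),
                 function(i) make_frame(cases[i, ]),
                 character(1))

# Emit the LaTeX source for all frames.
cat(frames, sep = "\n")
```

Changing the slide layout means editing only `make_frame`; changing which cases appear means editing only `cases`.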




Other approaches with knitr



 Other approaches: knitr options, patterns & hooks
          Customized chunk options
          Brew patterns → LaTeX
          Markdown patterns → Pandoc → LaTeX




Acknowledgements


  Neurodynamics Lab




  Yihui Xie (Creator of the knitr package)
  Jeromy Anglim’s Blog (Psychology and Statistics)
  Markus Gesmann (Mages’ Blog)


Greater Boston UseR Meetup talk_2013_mears
