Excel and R: data exchange
   R-meetup of Los Angeles
   Eric Kostello



    14 December 2010


Monday, December 13, 2010
On spreadsheets
   ✤    Power of spreadsheets: you can do “anything”

   ✤    Problem with spreadsheets: anything can happen

   ✤    Spreadsheets are ubiquitous

   ✤    Very handy for certain types of problems

   ✤    Users like the control they give

   ✤    This is not a talk about why not to use spreadsheets, but check these out...

         ✤    http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html

               ✤   Encyclopedia of the Evils, but acknowledges utility when limited in scope

         ✤    “spreadsheet addiction”: search the web with this phrase to see that problems
              with spreadsheets are not confined to data analysis

Living with spreadsheets
   ✤    R users often must exchange data with spreadsheet users

         ✤    Data is stored in spreadsheets because...

               ✤   That is the way it was archived/sent/obtained

               ✤   It is still being created that way, and change is difficult or impossible

   ✤    So, communication is essential

         ✤    Smoother communication can make your day easier and your data
              exchange more reliable


Data exchange between R and Excel
   Read/Write column: R = reads Excel data into R; W = writes from R to Excel.

   Method/package | Read/Write | Details | Pros | Cons | Cross-platform
   Avoid | RW | Import/export CSV | Avoids Excel pitfalls | Manual steps required every time | Yes
   RODBC + drivers | R | Adaptation of SQL APIs | Can read rows and columns; some writing ability on Windows | Complexity and inconsistencies | With driver purchase
   read.xls (gdata) | R | Automates creation of a CSV, then imports it | | Data frame to sheet only; trouble with quotes | Yes
   write.xls (dataframe2xls & Python) | W | Automates creation of CSVs, then converts them | Some formatting ability | Data frame to sheet only (coerces to data frame) | Yes
   WriteXLS (WriteXLS & Perl) | W | Automates creation of CSVs, then converts them | Some formatting ability | Limited flexibility; some oddities in the function call | Yes
   RDCOMClient | RW | Via Windows (COM) APIs | Cell-level control | Not fully vectorized? | No
   (xlsx, rJava & xlsxJars) | RW | Uses a Java library from Apache | Data frames and smaller; fine formatting control; xlsx file format | xlsx format only; low-level calls not all fully vectorized | Yes
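The "Avoid" row, plain CSV exchange, needs nothing beyond base R. The following is a minimal sketch that assumes a temporary file stands in for the CSV shared with the spreadsheet user. The `Note` column is hypothetical. It is there only to show that `write.csv()` escapes embedded quotes by doubling them, that commas inside a quoted field survive, and that `read.csv()` reverses both.

```r
# Plain CSV round trip between R and a spreadsheet user, using only base R.
exampleData <- data.frame(
  X = 10:19,
  Y = 566:557,
  # Hypothetical text column containing embedded quotes and a comma
  Note = rep('He said "ok", then left', 10),
  stringsAsFactors = FALSE
)

# Stands in for the file exchanged with the spreadsheet user (assumption)
csvFile <- tempfile(fileext = ".csv")

# write.csv() always quotes strings and doubles embedded quotes
write.csv(exampleData, csvFile, row.names = FALSE)

# ... the spreadsheet user opens, edits, and saves the CSV here (manual step) ...

roundTrip <- read.csv(csvFile, stringsAsFactors = FALSE)
str(roundTrip)
```

Comparing the columns after each exchange is a cheap check that nothing changed silently while the file was open in a spreadsheet.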




RDCOMClient example
        library("RDCOMClient")

        # Copy a pre-formatted template workbook to a dated report file.
        # file.copy() replaces the original shell("copy ...") call, which needed
        # cmd.exe and could print a harmless warning about UNC paths.
        exampleTemplateFilename <- "Example_Template.xls"
        # Assumes a folder named "reportsDirectory" already exists
        newExcelReportInstance <- file.path("reportsDirectory",
                                            paste0("Report_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls"))
        file.copy(exampleTemplateFilename, newExcelReportInstance, overwrite = TRUE)

        exampleData <- data.frame(X = 10:19, Y = 566:557)
        .COMInit()                               # Start server
        exl <- COMCreate("Excel.Application")    # Hook to Excel
        books <- exl[["workbooks"]]              # Talk to workbooks

        # Open the copied report (Excel expects an absolute path via COM)
        exampleBook <- books$open(normalizePath(newExcelReportInstance))
        exampleSheets <- exampleBook[["sheets"]]
        exampleSheet <- exampleSheets$Item(as.integer(1))

        # But I cannot figure out how to get the "Range" to be larger than 1x1, so iterate through rows
        headerRowPadding <- 1 # Allow for this many header rows
        for (ithRow in seq_len(nrow(exampleData))) {
          # Reference to the worksheet cell in column A, row ithRow + headerRowPadding
          cellReferenceA <- exampleSheet$Range(paste0("A", ithRow + headerRowPadding))
          cellReferenceA[["Value"]] <- exampleData[ithRow, "X"]
          # Same row, column B
          cellReferenceB <- exampleSheet$Range(paste0("B", ithRow + headerRowPadding))
          cellReferenceB[["Value"]] <- exampleData[ithRow, "Y"]
        }
        exampleBook$save()
        exampleBook$close()
        exl$Quit()                               # Release the Excel instance
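The RDCOMClient loop writes one cell per call. A first step toward writing a whole block at once is to compute the A1-style address that block occupies. For example, the 10 x 2 `exampleData` below one header row occupies "A2:B11". The helpers `excelColumnName()` and `excelRangeFor()` are hypothetical and are not part of RDCOMClient. Passing such an address to `exampleSheet$Range()` and assigning a matrix to its Value, possibly through RDCOMClient's `asCOMArray()`, might remove the per-cell loop. That last step is an untested assumption and is not shown here.

```r
# Hypothetical helpers (not part of RDCOMClient): compute the Excel A1-style
# address covered by a data frame written below some header rows.

# Convert 1-based column numbers to Excel column letters (1 -> "A", 27 -> "AA").
excelColumnName <- function(n) {
  vapply(n, function(k) {
    parts <- character(0)
    while (k > 0) {
      remainder <- (k - 1) %% 26          # 0..25 maps to A..Z
      parts <- c(LETTERS[remainder + 1], parts)
      k <- (k - 1) %/% 26
    }
    paste(parts, collapse = "")
  }, character(1))
}

# Address of the block occupied by df, starting in column firstCol,
# below headerRowPadding header rows.
excelRangeFor <- function(df, headerRowPadding = 1, firstCol = 1) {
  firstRow <- headerRowPadding + 1
  lastRow <- headerRowPadding + nrow(df)
  lastCol <- firstCol + ncol(df) - 1
  paste0(excelColumnName(firstCol), firstRow, ":",
         excelColumnName(lastCol), lastRow)
}

exampleData <- data.frame(X = 10:19, Y = 566:557)
excelRangeFor(exampleData)                 # "A2:B11"
excelColumnName(c(1, 26, 27, 702, 703))    # "A" "Z" "AA" "ZZ" "AAA"
```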




xlsx package overview

   ✤    Philosophy: reuse Excel-interface capabilities from a more widely used
        codebase, the Apache POI Java API for Microsoft documents.

   ✤    Many capabilities are obtained “for free.”

   ✤    Fully featured, cross-platform solution

   ✤    A good candidate for one-stop shopping in R-to-Excel
        communication

         ✤    but requiring it may be a problem on some installations (it depends
              on rJava, and therefore on a working Java installation)



xlsx package capabilities
   ✤    Easy data frame import/export: read.xlsx and write.xlsx

         ✤    write.xlsx(exampleData, file = "exampleData Workbook.xlsx")

         ✤    read.xlsx(file = ..., sheetIndex = ...)

               ✤   Reads one sheet at a time; can keep formulas (keepFormulas) and accepts colClasses.

   ✤    Formatting control (using Excel native capabilities, such as borderColor)

   ✤    Read/Write comments

   ✤    Merge regions, freeze panes, set the print area, set the zoom

   ✤    Can insert images (dib, emf, jpeg, pict, png, wmf)


Los Angeles R users group - Dec 14 2010 - Part 4

  • 1. Excel and R: data exchange R-meetup of Los Angeles Eric Kostello 14 December 2010 Monday, December 13, 2010
  • 2. On spreadsheets ✤ Power of spreadsheets: you can do “anything” ✤ Problem with spreadsheets: anything can happen ✤ Spreadsheets are ubiquitous ✤ Very handy for certain types of problems ✤ Users like the control they give ✤ This is not a talk about why not to use spreadsheets, but check these out... ✤ http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html ✤ Encyclopedia of the Evils, but acknowledges utility when limited in scope ✤ “spreadsheet addiction”: search the web with this phrase to see that problems with spreadsheets are not confined to data analysis Monday, December 13, 2010
  • 3. Living with spreadsheets ✤ R users often must exchange data with spreadsheet users ✤ Data is stored in spreadsheets because... ✤ That is the way it was archived/sent/obtained ✤ It is still being created that way and change is difficult/ impossible ✤ So, communication is essential ✤ Easier communication may make your day easier and your exchange more reliable Monday, December 13, 2010
  • 4. Data exchange between R and Excel Method/ Cross RW Details Pros Cons package platform Manual steps required Avoid RW Import/Export CSV Avoid Excel pitfalls Yes every time Can read rows and Complexity & With driver RODBC + drivers R Adaptation of SQL APIs columns. Some writing inconsistencies purchase ability on Windows. read.xls Automates creation of Data frame to sheet only. R Yes (gdata) CSV, then imports Trouble with quotes. write.xls Automates creation of data frame to sheet only. (dataframe2xls & W Some formatting ability Yes CSVs, then converts (Coerces to dataframe.) Python) WriteXLS Automates creation of Limited flexibility. Some W Some formatting ability Yes (WriteXLS & Perl) CSVs, then converts oddities in function call. RDCOMClient RW via Windows APIs Cell level control Not fully vectorized? No Data frames and smaller. xlsx format only. Using Java library from (xlsx , rJava & RW Fine formatting control. Low level calls not all Yes Apache xlsxJars) xlsx file format. fully vectorized. Monday, December 13, 2010
  • 5. RDCOMClient example library ( "RDCOMClient") exampleTemplateFilename <- "Example_Template.xls" newExcelReportInstance <- paste ( "reportsDirectoryReport_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls", sep = '') copyCommand <- paste ( "copy", exampleTemplateFilename, newExcelReportInstance ) shell ( copyCommand, shell = 'cmd %WINDIR%') print ( "Ignore the error message about UNC paths if it occurs; it does not matter.") exampleData <- data.frame ( X = 10:19, Y = 566:557 ) .COMInit() # Start server exl <- COMCreate("Excel.Application") # Hook to Excel books <- exl[["workbooks"]] # Talk to workbooks exampleBook <- books$open(newHOfile) exampleSheets <- exampleBook[["sheets"]] exampleSheet <- exampleSheets$Item(as.integer(1)) # But, I cannot figure out how to get the "Range" to be larger than 1x1, so iterate through rows headerRowPadding <- 1 # Allow for this many header rows for ( ithRow in 1:nrow ( exampleData ) ) { cellReferenceA <- exampleSheet$Range( paste ( "A", r + headerRowPadding, sep = '') ) # Create a reference to worksheet Column A, row ithRow + headerRowPadding cellReferenceA[["Value"]] <- exampleData[ ithRow, "X" ] cellReferenceB <- exampleSheet$Range( paste ( "B", r + headerRowPadding, sep = '') ) cellReferenceB[["Value"]] <- exampleData[ ithRow, "Y" ] } exampleBook$save() exampleBook$close() Monday, December 13, 2010
  • 6. xlsx package overview ✤ Philosophy: Use Excel interface capabilities created in a more widely used codebase: The Apache Java API to Microsoft documents. ✤ Many capabilities are obtained “for free.” ✤ Fully-featured cross platform solution ✤ This is a suitable candidate for one stop shopping in R to Excel communications ✤ but requiring it may be a problem for some installations (rJava dependency) Monday, December 13, 2010
  • 7. xlsx package capabilities ✤ Easy data frame import/export: read.xls and write.xls ✤ write.xlsx ( exampleData, file = “exampleData Workbook.xlsx”) ✤ read.xlsx ( file = ..., sheet = ... ) ✤ One sheet at a time. Can keep formulas, provide colClasses. ✤ Formatting control (using Excel native capabilities, such as borderColor) ✤ Read/Write comments ✤ Merging regions, freezing panes, set print area, set zoom ✤ Can insert images (dib, emf, jpeg, pict, png, wmf) Monday, December 13, 2010