Los Angeles R users group - Dec 14 2010 - Part 4
1. Excel and R: data exchange
R-meetup of Los Angeles
Eric Kostello
14 December 2010
Monday, December 13, 2010
2. On spreadsheets
✤ Power of spreadsheets: you can do “anything”
✤ Problem with spreadsheets: anything can happen
✤ Spreadsheets are ubiquitous
✤ Very handy for certain types of problems
✤ Users like the control they give
✤ This is not a talk about why not to use spreadsheets, but check these out...
✤ http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
✤ An encyclopedia of the evils, but it acknowledges their utility when use is limited in scope
✤ “spreadsheet addiction”: search the web for this phrase to see that problems with spreadsheets are not confined to data analysis
3. Living with spreadsheets
✤ R users often must exchange data with spreadsheet users
✤ Data is stored in spreadsheets because...
✤ That is the way it was archived/sent/obtained
✤ It is still being created that way, and changing that is difficult or impossible
✤ So, communication is essential
✤ Smoother communication can make your day easier and the exchange more reliable
4. Data exchange between R and Excel
| Method / package | Read/Write | Details | Pros | Cons | Cross-platform |
|---|---|---|---|---|---|
| Avoid (use CSV) | RW | Import/export CSV | Avoids Excel pitfalls | Manual steps required every time | Yes |
| RODBC + drivers | RW | Adaptation of SQL APIs | Can read rows and columns; some writing ability on Windows | Complexity and inconsistencies | With driver purchase |
| read.xls (gdata) | R | Automates creation of CSV, then imports | | Data frame to sheet only; trouble with quotes | Yes |
| write.xls (dataframe2xls & Python) | W | Automates creation of CSVs, then converts | Some formatting ability | Data frame to sheet only (coerces to data frame) | Yes |
| WriteXLS (WriteXLS & Perl) | W | Automates creation of CSVs, then converts | Some formatting ability | Limited flexibility; some oddities in function call | Yes |
| RDCOMClient | RW | Via Windows APIs | Cell-level control | Not fully vectorized? | No |
| xlsx (xlsx, rJava & xlsxJars) | RW | Uses a Java library from Apache | Data frames and smaller; fine formatting control; xlsx file format | xlsx format only; low-level calls not all fully vectorized | Yes |
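The "Avoid" row needs nothing beyond base R. A minimal sketch, assuming the spreadsheet user is willing to save and open plain .csv files; the temporary file stands in for whatever path is actually shared.

```r
# Write a data frame to CSV for a spreadsheet user, then read it back.
# tempfile() keeps the example self-contained; the real path is an assumption.
exampleData <- data.frame(X = 10:19, Y = 566:557)

csvPath <- tempfile(pattern = "exampleData_", fileext = ".csv")

# row.names = FALSE avoids an extra unnamed first column when opened in Excel
write.csv(exampleData, file = csvPath, row.names = FALSE)

# stringsAsFactors = FALSE keeps any text columns as character vectors
roundTrip <- read.csv(csvPath, stringsAsFactors = FALSE)

str(roundTrip)
```

The cost listed in the table is visible here: the export and import steps have to be rerun every time the spreadsheet side changes.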
5. RDCOMClient example
library("RDCOMClient")

exampleTemplateFilename <- "Example_Template.xls"
# One report per day; "reportsDirectory" is a placeholder for an existing output directory
newExcelReportInstance <- file.path("reportsDirectory",
  paste0("Report_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls"))
# Copy the template; file.copy avoids shelling out to cmd (and its UNC-path warning)
file.copy(exampleTemplateFilename, newExcelReportInstance, overwrite = TRUE)

exampleData <- data.frame(X = 10:19, Y = 566:557)

.COMInit()                             # Start server
exl <- COMCreate("Excel.Application")  # Hook to Excel
books <- exl[["Workbooks"]]            # Talk to workbooks
# Excel resolves relative paths against its own working directory, so pass an absolute path
exampleBook <- books$Open(normalizePath(newExcelReportInstance, winslash = "\\"))
exampleSheets <- exampleBook[["Sheets"]]
exampleSheet <- exampleSheets$Item(1L)

# But, I cannot figure out how to get the "Range" to be larger than 1x1, so iterate through rows
headerRowPadding <- 1  # Allow for this many header rows
for (ithRow in seq_len(nrow(exampleData))) {
  # Reference to column A, row ithRow + headerRowPadding
  cellReferenceA <- exampleSheet$Range(paste0("A", ithRow + headerRowPadding))
  cellReferenceA[["Value"]] <- exampleData[ithRow, "X"]
  cellReferenceB <- exampleSheet$Range(paste0("B", ithRow + headerRowPadding))
  cellReferenceB[["Value"]] <- exampleData[ithRow, "Y"]
}

exampleBook$Save()
exampleBook$Close()
exl$Quit()  # Release the Excel process
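The loop above hard-codes columns A and B. A hypothetical base-R helper pair (excelColumnLetters and cellAddresses are illustrative names, not part of RDCOMClient) can generate A1-style addresses for any data frame, including columns past Z. It only builds address strings; whether RDCOMClient can assign a whole block through a multi-cell Range in one call is not shown here.

```r
# Hypothetical helper: convert a 1-based column index to Excel letters
# (1 -> "A", 26 -> "Z", 27 -> "AA"), i.e. bijective base-26.
excelColumnLetters <- function(index) {
  lettersOut <- character(0)
  while (index > 0) {
    remainder <- (index - 1) %% 26
    lettersOut <- c(LETTERS[remainder + 1], lettersOut)
    index <- (index - 1) %/% 26
  }
  paste(lettersOut, collapse = "")
}

# Hypothetical helper: matrix of A1-style addresses for every cell a data frame
# would occupy, shifted down by headerRowPadding rows as in the loop above.
cellAddresses <- function(df, headerRowPadding = 1) {
  rowNumbers <- seq_len(nrow(df)) + headerRowPadding
  colLetters <- vapply(seq_len(ncol(df)), excelColumnLetters, character(1))
  outer(rowNumbers, colLetters,
        function(rowNumber, colLetter) paste0(colLetter, rowNumber))
}

addresses <- cellAddresses(data.frame(X = 10:19, Y = 566:557))
# Corner cells give the address of the whole block
blockRange <- paste0(addresses[1, 1], ":", addresses[nrow(addresses), ncol(addresses)])
print(blockRange)
```

Inside the loop, `exampleSheet$Range(addresses[ithRow, 2])` could replace the hand-built "B" address.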
6. xlsx package overview
✤ Philosophy: reuse Excel interface capabilities built in a more widely used codebase, the Apache Java API for Microsoft documents (Apache POI).
✤ Many capabilities are obtained “for free.”
✤ Fully featured, cross-platform solution
✤ A suitable candidate for one-stop shopping in R-to-Excel communication
✤ But requiring it may be a problem for some installations (rJava dependency)
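Because the rJava dependency can block installation, a script can test for xlsx at run time and degrade gracefully. A sketch under that assumption: writeForExcel is a hypothetical helper, and its fallback is simply the CSV ("Avoid") approach from the comparison table.

```r
# Hypothetical helper: prefer xlsx::write.xlsx when the xlsx package (and its
# rJava dependency) can be loaded; otherwise fall back to a base-R CSV file.
writeForExcel <- function(df, basePath, sheetName = "Sheet1") {
  # requireNamespace returns FALSE instead of failing if xlsx or rJava cannot load
  if (requireNamespace("xlsx", quietly = TRUE)) {
    path <- paste0(basePath, ".xlsx")
    xlsx::write.xlsx(df, file = path, sheetName = sheetName, row.names = FALSE)
  } else {
    path <- paste0(basePath, ".csv")
    write.csv(df, file = path, row.names = FALSE)
  }
  path  # report which file was actually written
}

exampleData <- data.frame(X = 10:19, Y = 566:557)
writtenTo <- writeForExcel(exampleData, file.path(tempdir(), "exampleData"))
print(writtenTo)
```

Returning the path lets the caller tell the spreadsheet user which format they received.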
7. xlsx package capabilities
✤ Easy data frame import/export: read.xlsx and write.xlsx
✤ write.xlsx(exampleData, file = "exampleData Workbook.xlsx")
✤ read.xlsx(file = ..., sheetIndex = ...)
✤ One sheet at a time. Can keep formulas (keepFormulas) and set column classes (colClasses).
✤ Formatting control (using Excel native capabilities, such as borderColor)
✤ Read/Write comments
✤ Merge regions, freeze panes, set the print area, set the zoom
✤ Can insert images (dib, emf, jpeg, pict, png, wmf)
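A sketch of the data frame round trip listed above, assuming the xlsx package and a working Java runtime are installed; the round trip is skipped otherwise. It selects the sheet with sheetIndex; read.xlsx can alternatively select it by sheetName.

```r
# Round trip a data frame through an .xlsx workbook with the xlsx package.
exampleData <- data.frame(X = 10:19, Y = 566:557)
workbookPath <- file.path(tempdir(), "exampleData Workbook.xlsx")

readBack <- NULL
if (requireNamespace("xlsx", quietly = TRUE)) {
  # One data frame becomes one sheet; row.names = FALSE omits the row-name column
  xlsx::write.xlsx(exampleData, file = workbookPath,
                   sheetName = "exampleData", row.names = FALSE)
  # Read the first sheet back; read.xlsx also accepts colClasses and
  # keepFormulas (not used here)
  readBack <- xlsx::read.xlsx(workbookPath, sheetIndex = 1)
  print(readBack)
} else {
  message("xlsx package not available; skipping the round trip")
}
```

Excel stores numbers as doubles, so integer columns may come back as numeric; compare values rather than types.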