Statistical analysis is an essential part of all serious scientific research. Without data and a formal process of searching for evidence that supports or disproves stated hypotheses, there is nothing but mere opinion. Evidence-based medicine is no exception.
2. www.samedanltd.com l 19
especially for medium- and large-
sized teams of analysts in advanced,
corporate settings.
The software must be purchased, and certain functionalities are grouped into modules, which can be bought separately. Buying the licence also grants access to a professional helpdesk.
GNU R
Developed by the R Core Team and supported by the R Consortium, GNU R (R) is one of the most popular and most recognisable computational environments. It is a direct successor of the S programming language, created in 1976 at Bell Laboratories. In 1998, S became the first statistical system to receive the Software System Award from the Association for Computing Machinery.
Today, R can be found in almost every area of science and business, such as medicine, pharmacy, genetics, epidemiology, banking, social media, data mining and machine learning.
In particular, its capabilities in clinical research
and genomics are remarkable. Many of the
top companies – including pharmaceutical
ones – not only use R, but also contribute by
supplying specialised packages (1,2).
R is also popular at universities and in research organisations where new algorithms are invented, for example at CERN, NASA and the National Institute of Standards and Technology.
R is well known for its flexibility, full-featured language, strong graphical and reporting capabilities, and ability to access data in multiple formats, as well as for the large number of statistical methods available in nearly 9,000 additional packages (3). It is free software released under the GNU General Public License; however, companies like Microsoft, Oracle and RStudio offer commercial, tweaked versions as well. The support provided by the broad and vibrant community of both users and institutions is comprehensive.
R in Controlled Trials
It is sometimes claimed that R is not
validated and, as a result, cannot be used
in controlled trials and environments.
This requires a deeper explanation but, in
brief, it is a common misunderstanding.
In fact, every CRO develops its own processes for verifying, in the broad sense, both the software and the programmes written in it. Thus, validation is an integral part of creating programmes, and should be carried out regardless of any assurances made by the software vendor.
Since the relevant guidelines released by the FDA, ICH and the R Foundation for Statistical Computing, as well as the entire source code of R and all its packages, are publicly available, validation is readily achievable (4-7). Every R package has an author or maintainer, who must follow a set of rules in order to publish the package on the Comprehensive R Archive Network.
There is also an informal though significant argument: R is created by professional statisticians and used by over two million individuals (8,9). As the code is open, anyone can verify it. Archived messages on mailing lists, forums and GitHub show both that new procedures are constantly checked and improved and that the older ones are clean and stable. It is also important to note that a local, version-frozen repository of packages can be set up in order to ensure reproducibility of results, regardless of possible changes in the code (10).
Last but not least, R has been identified by the FDA as suitable both for interpreting data from clinical trials and for making submissions (11).
Why Combine SAS and R?
Every piece of software has its strengths and weaknesses, and SAS is no exception. There are tasks that can be completed more easily or more cheaply by employing external programmes –
[Figure 1. Diagram labels: SAS base; SAS module 1; SAS module 2; missing or expensive functionality; required algorithm or functionality; bidirectional data exchange via the SAS IML module or a different method of communication]
$$\frac{1}{n h^{d}} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
differentiation, used for advanced,
nonlinear statistical modelling (18).
Scenario 4: Validation
Although there is no requirement to validate critical parts of statistical programmes in a different statistical package, our experience shows that following this route may improve the quality of the validation. The change in thinking forced by using a different programming language may alter how the validation instructions are read and understood, which helps to detect and resolve issues.
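
One way to picture such cross-package validation is a tolerance-based comparison of results produced independently by two systems. The sketch below is only an illustration in Python and does not call SAS or R; the function name `compare_results`, the statistic names, the tolerances and the values are all assumptions made up for this example.

```python
import math

def compare_results(primary, independent, rel_tol=1e-8, abs_tol=1e-12):
    """Return a list of discrepancies between two result sets.

    Both arguments are dicts mapping a statistic name to a number,
    e.g. values exported from two independently written programmes.
    """
    issues = []
    for name in sorted(set(primary) | set(independent)):
        if name not in primary or name not in independent:
            issues.append((name, "missing in one result set"))
            continue
        a, b = primary[name], independent[name]
        # Exact equality is too strict: two packages rarely agree
        # to the last bit of a floating point number.
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            issues.append((name, f"{a!r} != {b!r}"))
    return issues

# Hypothetical values, made up for illustration only.
from_system_a = {"mean": 0.1 + 0.2, "sd": 1.25, "n": 40}
from_system_b = {"mean": 0.3, "sd": 1.26, "n": 40}

discrepancies = compare_results(from_system_a, from_system_b)
for name, detail in discrepancies:
    print(name, detail)
```

Here the two means differ only in the last bits of their floating point representation and pass the check, while the standard deviations differ genuinely and are reported. The tolerance is a choice that would have to be agreed in the validation plan.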
Scenario 5: Advanced Graphing
There is an advanced graphing subsystem in SAS called SAS/GRAPH, which can produce high-quality plots. Opinions vary, but some programmers describe the process as slightly complex. In such a situation, it is worth noting that R is capable of producing advanced and professional-looking graphs by using the well-known ggplot2 package, an implementation of the ‘Grammar of Graphics’. There are also other graphing subsystems available.
Scenario 6: Exposing Results
R may help to expose the results of analyses done in SAS over a network. Three packages simplify the process. The first is knitr, a general-purpose package for dynamic report generation that follows the reproducible research paradigm. Reports are created by mixing formatted content with chunks of R, SAS and SQL code. When the report is processed, the results either replace the commands or are shown together with them. Meanwhile, the OpenCPU and Shiny packages help to set up a full-featured web server that is able to host dynamic web applications and reports.
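
The idea of a dynamic report, in which results replace the commands that produced them, can be shown with a toy sketch. It is written in Python for illustration only and is not how knitr is implemented; the `{{name}}` placeholder syntax, the `CHUNKS` registry, the `render` function and the data are all hypothetical.

```python
import re
import statistics

# Made-up data for illustration only.
data = [4.1, 5.0, 5.6, 6.3]

# Hypothetical registry of named analyses; in a real report each
# chunk would hold code run by a statistical engine.
CHUNKS = {
    "n": lambda: str(len(data)),
    "mean": lambda: f"{statistics.mean(data):.2f}",
}

# Made-up placeholder syntax: {{name}} marks a chunk in the text.
CHUNK_PATTERN = re.compile(r"\{\{(\w+)\}\}")

def render(template):
    """Replace every {{name}} placeholder with the chunk's result."""
    def run_chunk(match):
        name = match.group(1)
        return CHUNKS[name]()
    return CHUNK_PATTERN.sub(run_chunk, template)

report = render("Analysed {{n}} subjects; mean response was {{mean}}.")
print(report)
```

Because the numbers are computed when the report is rendered, rerunning the report on updated data updates the text without manual editing, which is the core of the reproducible research approach.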
Scenario 7: R-Based Tools
Because R is lightweight and fully portable software that requires no installation and runs on various operating systems and architectures (including ARM-based minicomputers), it is a good candidate for a framework used to create advanced statistical solutions, such as:
• Automated processes searching a database for potential fraud
• Local, handy windows-based analytical tools
• Web-enabled reporting systems and dashboards
Key Considerations
SAS and R are two different worlds, so connecting them may introduce issues that significantly affect the results of an analysis. Some fundamental discrepancies are:
• Origin of dates
• Representation of floating point numbers
• Type of sum of squares used
• Default contrasts
• Calculation of quantiles
• Generation of random numbers
• Implementation of advanced models
All of these must be taken into account when integrating the two systems or validating the results of analyses.
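
Two of the discrepancies listed above, the origin of dates and the calculation of quantiles, can be illustrated with a short sketch. It is written in Python purely for illustration and does not call SAS or R. It assumes the commonly documented conventions: SAS counts dates from 1 January 1960 and R from 1 January 1970, and the default quantile definitions are taken to be the empirical distribution function with averaging in SAS and linear interpolation (type 7) in R. All function names are hypothetical.

```python
import math
from datetime import date, timedelta

# Assumed origins: SAS counts days since 1 January 1960,
# R's Date class counts days since 1 January 1970.
SAS_ORIGIN = date(1960, 1, 1)
R_ORIGIN = date(1970, 1, 1)
ORIGIN_SHIFT = (R_ORIGIN - SAS_ORIGIN).days

def sas_to_r_date_number(sas_days):
    """Convert a SAS date number to the equivalent R date number."""
    return sas_days - ORIGIN_SHIFT

def quantile_linear(values, p):
    """Linear interpolation between order statistics
    (assumed R default, type 7)."""
    x = sorted(values)
    h = (len(x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(x) - 1)
    return x[lo] + (h - lo) * (x[hi] - x[lo])

def quantile_averaged_ecdf(values, p):
    """Empirical distribution function with averaging
    (assumed SAS default; type 2 in R's numbering)."""
    x = sorted(values)
    n = len(x)
    np_ = n * p
    j = math.floor(np_)
    if np_ == j and 0 < j < n:
        # n*p is a whole number: average the two neighbours.
        return (x[j - 1] + x[j]) / 2
    return x[min(j, n - 1)]

# The same calendar day has different numeric values in each system.
sas_value = (date(2020, 1, 1) - SAS_ORIGIN).days
r_value = sas_to_r_date_number(sas_value)
print(sas_value, r_value, R_ORIGIN + timedelta(days=r_value))

# The same data give different first quartiles under each definition.
sample = list(range(1, 11))
print(quantile_linear(sample, 0.25), quantile_averaged_ecdf(sample, 0.25))
```

For the ten values 1 to 10, the two definitions give 3.25 and 3 for the first quartile, so a naive comparison of outputs would flag a difference that is purely definitional. Likewise, a date transferred as a raw number without the 3,653-day shift would land ten years away from the intended day.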
In summary, the integration of SAS with R packages, when done properly, may bring noticeable benefits in terms of enhanced functionality and reduced costs. The scenarios shown above do not exhaust the list, which is limited mostly by one’s experience and inventiveness.
References
1. Visit: http://blog.revolutionanalytics.com/2014/05/companies-using-r-in-2014.html
2. Visit: www.cioreview.com/news/gsdesign-explorer-to-optimize-merck-s-clinical-trial-process-nid-1305-cid-36.html
3. Visit: www.r-clinical-research.com
4. Visit: www.fda.gov/ohrms/dockets/98fr/04d-0440-gdl0002.pdf
5. Visit: www.fda.gov/ohrms/dockets/98fr/5667fnl.pdf
6. Visit: www.fda.gov/regulatoryinformation/guidances/ucm085281.htm
7. Visit: www.r-project.org/doc/R-FDA.pdf
8. Visit: www.r-project.org/foundation/board.html
9. Visit: www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/bringing-r-to-the-enterprise-1956618.pdf
10. Visit: https://mran.microsoft.com/documents/rro/reproducibility
11. Visit: http://blog.revolutionanalytics.com/2012/06/fda-r-ok.html
12. Visit: https://support.sas.com/rnd/app/studio/Rinterface2.html
13. Visit: http://support.sas.com/documentation/cdl/en/imlug/63541/HTML/default/viewer.htm#r_toc.htm
14. Visit: www.jstatsoft.org/article/view/v046c02/v46c02.pdf
15. Visit: www.lexjansen.com/nesug/nesug12/bb/bb10.pdf
16. Visit: www.phuse.eu/download.aspx?type=cmsdocid=2847
17. Visit: https://journal.r-project.org/archive/2013-2/wang-shan.pdf
18. Visit: www.admb-project.org
Adrian Olszewski is a
Biostatistician in the
Biometrics and Clinical Trial
Data Execution Systems
Department at KCR. He is
responsible for providing
comprehensive support for trials from early
design considerations, through the data
analysis – including interim evaluations – to
the final report. Adrian holds an MSc degree
in Computer Science.
Email: info@kcrcro.com