WHITE PAPER
BIG DATA AND THE NEEDS OF
THE PHARMA INDUSTRY
JULY 2013
TABLE OF CONTENTS
Introduction
Big Data into Little Data
Big Data as the Next Wave of an Historical Trend
1960’s Big Data – Patents
1970’s Big Data – Chemistry
1980’s Big Data – Sequences
1990’s Big Data – Arrays
2000’s Big Data – Next Generation Sequencing
Big Data in the Present
New tools and techniques for old problems
New problems; new opportunities
Big Data in the Future
Data about Big Data
Big Data engineers
Conclusion
INTRODUCTION
Gartner, Inc. neatly defines Big Data as “high-volume, high-velocity and high-variety information
assets that demand cost-effective, innovative forms of information processing for enhanced insight and
decision making.” Thomson Reuters Life Sciences sees Big Data as both a problem and an opportunity.
In a recent survey, Thomson Reuters asked a group of IT leaders in Pharma how they view Big Data.
Unequivocally, 100 percent responded that Big Data is an opportunity. This isn’t that surprising given
that looking for empirical facts in large bodies of evidence has always been a driver of innovation. The
Pharma industry generally works by identifying some correlation, e.g. between disease and protein,
and then working out how to exploit it. The science explaining why that correlation exists tends to follow later,
despite recent trends towards science-led discovery. The ability to apply this thinking to even larger
and richer sets of data is clearly an opportunity for Pharma to find new correlations and develop new
drugs.
When asked about where they saw Big Data opportunities, our respondents overwhelmingly
highlighted two areas of focus: early-stage discovery (41.2 percent) and understanding the market
(26.5 percent). Figure 1 illustrates this, while also showing the emerging trend of personalized (or
precision) medicine. Drug discovery has always been a data-driven activity and it makes sense to
extend that to the new volumes, varieties and velocities of data coming out of the labs and out of public
initiatives. Market understanding is new. It reflects both the change in Pharma market dynamics,
caused by payers’ greater influence on prescribing behavior, and the promise of rich patient-level
data from electronic health records. Understanding the patient (personalized medicine), scoring 14.7
percent in our survey, is also a significant focus in Pharma right now, so access to, and the ability to
digest, this kind of data represents a real win for our respondents in providing value to their businesses.
Figure 1: THE BIGGEST OPPORTUNITIES FOR BIG DATA
Source: Thomson Reuters Big Data Survey
BIG DATA INTO LITTLE DATA
The problem: humans can’t work with Big Data directly. To realize the value of all this data, we need
to reduce it to human proportions. When you examine what our customers are really doing with Big
Data, what you see is the application of tools and techniques to “shrink” the data. In the Life Sciences
business of Thomson Reuters, we call this “making Big Data look like Little Data.” Little Data is the
data we are equipped to handle. It comprises reliable, evidence-backed facts that scientists can use in
models, visualizations and analyses. It is actionable data that can help Pharma companies with their
core business of developing new and better drugs.
In drug discovery, this usually takes the form of designing in-silico experiments. This means building
up data sets from disparate sources and bringing cross-functional teams together to work on data that
scores highly in the Variety vector of Big Data. Making this data act like Little Data is a challenge in
data harmonization. You need common ontologies and ways to present data to scientists with very
different skills and backgrounds, so it is delivered in a way they understand.
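As a minimal sketch of the harmonization step described above, the following Python fragment maps labels from different sources onto one shared term before merging records. The synonym table, the `GENE:` identifiers and the source names are hypothetical placeholders chosen for illustration; they are not taken from any Thomson Reuters product or real ontology release.

```python
from collections import defaultdict

# Hypothetical synonym table: maps source-specific labels to one shared term.
# The identifiers are illustrative placeholders, not real ontology codes.
SYNONYMS = {
    "her2": "GENE:ERBB2",
    "erbb2": "GENE:ERBB2",
    "neu": "GENE:ERBB2",
    "tp53": "GENE:TP53",
    "p53": "GENE:TP53",
}


def normalize(label):
    """Return the shared term for a label, or None if it is unmapped."""
    return SYNONYMS.get(label.strip().lower())


def harmonize(records):
    """Group (source, label, value) records by shared term.

    Unmapped labels are collected separately so a curator can review them.
    """
    merged = defaultdict(list)
    unmapped = []
    for source, label, value in records:
        term = normalize(label)
        if term is None:
            unmapped.append((source, label, value))
        else:
            merged[term].append((source, value))
    return dict(merged), unmapped


# Assumed example records from three made-up sources.
records = [
    ("assay_db", "HER2", 0.82),
    ("literature", "ErbB2 ", 0.79),
    ("array_study", "p53", 0.40),
    ("array_study", "XYZ1", 0.10),
]
merged, unmapped = harmonize(records)
```

Here the two differently labeled ERBB2 records end up under one term, while the unrecognized label is set aside rather than silently dropped, which is where editorial curation would come in.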
In patient understanding, the challenge is to get the data in one place and to filter the outputs so
that the interesting correlations stand out from the “noise” of correlations that are either obvious or
spurious, a situation analogous to signal amplification in the audio industry.
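The filtering idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any actual pipeline: the column names, the list of already-known pairs and the thresholds (`min_abs_r`, `min_samples`) are all hypothetical, and a strength-plus-sample-size cutoff is only a crude proxy for ruling out spurious correlations.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def interesting_pairs(columns, known_pairs, min_abs_r=0.8, min_samples=5):
    """Return strongly correlated column pairs that are not already known."""
    results = []
    for a, b in combinations(sorted(columns), 2):
        if frozenset((a, b)) in known_pairs:
            continue  # "obvious": already established, so not reported
        xs, ys = columns[a], columns[b]
        if len(xs) < min_samples:
            continue  # too few observations to trust
        r = pearson(xs, ys)
        if abs(r) >= min_abs_r:
            results.append((a, b, round(r, 3)))
    return results


# Made-up patient-level columns, for illustration only.
columns = {
    "marker_a": [1, 2, 3, 4, 5, 6],
    "response": [2, 4, 6, 8, 10, 12],
    "height_cm": [170, 150, 180, 160, 175, 165],
    "weight_kg": [70, 50, 80, 60, 75, 65],
}
known = {frozenset(("height_cm", "weight_kg"))}
hits = interesting_pairs(columns, known)
```

With this toy data, only the `marker_a`/`response` pair is reported: the height/weight pair is suppressed as already known, and the remaining pairs fall below the strength threshold.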
BIG DATA AS THE NEXT WAVE OF AN
HISTORICAL TREND
Challenging volumes of data are not a new phenomenon in Pharma. Rather, Big Data is an evolution in
the application of data in drug R&D. It creates new challenges and provides new tools, but is not the
complete revolution that some commentators claim it to be. Over many years, information companies
like Thomson Reuters have supported customers in their efforts to divide whatever is the current “data
elephant” into manageable chunks.
1960’S BIG DATA – PATENTS
An early Big Data challenge was the boom in patenting. In the first half of the twentieth century, a
Pharma scientist could keep up with all the patents in their field by reading the patent applications
personally. Figure 2 shows that by 1960, a scientist would have had to read over 1,000 patents a year to keep up,
and would have had to be able to read at least English, German, French and Japanese. Abstracting and
indexing services such as Derwent Publications emerged to address this challenge. Derwent’s curation
team read, classified and abstracted all the patents coming out of the leading patent offices around
the world (as they continue to do today). The editorially enhanced information was published in
weekly bulletins separated by area of interest so scientists could easily get to those of relevance.
As can be seen from the graph, this trend has continued more or less unchecked ever since. Nowadays
nobody would even consider trying to keep up with patent information in any other way than setting up
tailored alerting services on patent databases.
Figure 2: THE RISE IN GLOBAL PHARMA PATENTING
(Chart: patent families per year, 1960 to 2010; vertical axis from 0 to 140,000.)
Source: Thomson Reuters Derwent World Patents Index
1970’S BIG DATA – CHEMISTRY
The 1970’s saw the emergence of computer databases, in particular databases
that could store, search and display chemical structures. These enabled Pharma companies to start
to build the internal registry databases now counted as key assets and prime candidates for novel Big
Data experiments. The emergence of online transactional services, the “Hosts,” like Dialog, Questel
Orbit (now called Orbit) and STN, enabled paper-based indexing services like ISI’s Science Citation
