SOCIAL, POLITICAL AND LEGAL 
ASPECTS OF TEXT AND DATA 
MINING (TDM) 
Michelle Brook, The Content Mine, 
michelle@contentmine.org 
Peter Murray-Rust, University of Cambridge and 
Shuttleworth Fellow, pm286@cam.ac.uk 
Charles Oppenheim, Visiting Professor at City, Northampton 
and Robert Gordon Universities, 
c.oppenheim@btinternet.com
SO WHAT ARE THE NON-TECHNICAL 
PROBLEMS OF TDM? 
• LEGAL - copyright, database rights and licensing 
• SOCIAL - the lack of awareness of TDM, and the 
technological gap between many TDM tools and the 
skills of many academics 
• POLITICAL – the massive gap between publishers’ 
approaches to TDM and researchers’ needs; also the 
lack of specific TDM exceptions to copyright in most 
countries’ laws
COPYING IS OFTEN INVOLVED IN TDM 
• PDFs, the lingua franca of academic journals, are not 
machine readable 
• For TDM purposes, they must be transferred into a 
different digital form 
• That form is often custom and specific to the 
research question being asked and the most 
appropriate tools to answer that question 
• So there is a need to copy/adapt the original PDF
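As a minimal sketch of what a "custom form specific to the research question" might look like, the Python below turns plain text into a table of word frequencies. It assumes the text has already been extracted from the PDF by some other tool; the function name `term_frequencies`, the `min_length` parameter and the sample sentence are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def term_frequencies(text, min_length=3):
    """Count lowercase words of at least min_length letters.

    Illustrative only: assumes `text` is plain text that has already
    been extracted from the original PDF.
    """
    # Split the lowercased text into runs of ASCII letters.
    words = re.findall(r"[a-z]+", text.lower())
    # Keep only words long enough to be of interest.
    return Counter(w for w in words if len(w) >= min_length)


# Hypothetical sample standing in for text extracted from an article.
sample = "Text mining of text: mining finds patterns in text."
counts = term_frequencies(sample)
print(counts.most_common(2))
```

The frequency table is a new, derived copy of the article's content, which is why even this simple step can involve the restricted acts of copying and adapting.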
COPYRIGHT/DATABASE RIGHT 
• Gives the owner the right to authorise, or to refuse to authorise, any of 
the so-called restricted acts, including: copying; adapting; redisseminating 
all, or a “substantial” part, of a copyright work (similar rules apply to 
databases) 
• Substantial does not mean “most of”, but rather “what is important” 
• If someone does such restricted acts without permission, they have 
infringed the right and can be sued 
• However, there are certain (but very restricted) exceptions to copyright, 
whereby someone CAN copy, etc., without having to ask for permission or 
pay fees 
• Only a few countries (UK recently being one of them) have a specific 
exception for TDM in their laws 
• In the absence of such an exception in a country’s national law, 
researchers must ask for permission (request a licence) from the 
copyright owners. Generally, the copyright owners are publishers, 
because authors have (foolishly) assigned their copyright to them
THE NEW UK LAW 
• Came into force in June 2014 
• Specific exception to copyright for TDM 
• UK researchers do not have to ask for permission, pay fees, 
etc., to do TDM as long as it is for “non-commercial” purposes 
and as long as they have “lawful access” to the raw materials. 
• What is, or is not, “non-commercial” is controversial, but what 
is clear is that the question must be asked at the time the 
TDM is undertaken, so unexpected commercial benefits at 
the end of the project are OK, so long as the intent at the time 
was non-commercial 
• “Lawful access” usually means licensed content, whether OA 
or a subscription to the materials
THE PROBLEMS OF APPROACHING 
PUBLISHERS FOR LICENCES FOR TDM 
• Many publishers want unreasonably high fees and/or place restrictions on 
what can be done with their materials after TDM, and/or require 
researchers to use their API, and/or take an extremely long time to decide 
how to respond to a TDM request 
• TDM researchers have to approach multiple publishers, each of which 
has different attitudes, conditions, and speed of response to such 
requests. 
• This is very costly to a researcher, and has significant impact upon the take 
up of TDM, as well as inhibiting academics from sharing the outputs of 
their TDM research 
• These problems are inhibiting the take-up of TDM, thereby limiting the 
potential benefits this technology enables. 
• This also explains why so many TDM experiments are limited to OA materials
PUBLISHER TDM LICENCE INITIATIVES 
GENERALLY DO NOT HELP 
• Publishers have started offering their own TDM licences and policies 
• Their licences often impose unfair (and in the case of the UK, 
unenforceable) constraints on researchers’ freedom to exploit TDM. 
• Why “unenforceable”? Because UK law specifically states that any 
contract or licence term that prevents anyone from doing TDM in the 
manner prescribed in the new exception shall be deemed null and void 
• There are exceptions of course – Springer and Royal Society in particular 
offer generous TDM provisions. 
• So why are publishers offering restrictive licences in the UK? 
• One can only surmise that they hope licensees are ignorant of the new 
law, or that the publishers themselves do not know about it. So they are 
either deliberately misleading, or ignorant
WHAT POLITICAL INITIATIVES ARE 
NEEDED? 
• Under EU law, countries in the EU are able to introduce 
exceptions for non-commercial TDM research. 
• However, so far only the UK has taken advantage of this. The 
EC is considering an EU-wide exception for TDM, and the 
Republic of Ireland is also considering such a change to its 
national law. 
• Outside of Europe, only one or two Far East countries have 
introduced such exceptions. 
• There needs to be an international treaty requiring all 
countries to include an exception for TDM in their national 
laws
WHAT CAN PUBLISHERS DO TO HELP? 
• Offer all researchers world-wide the same freedom as is now 
available to UK researchers to undertake TDM for non-commercial 
research purposes, so long as the user has lawful 
access to the original materials 
• Earn goodwill amongst the TDM research community by 
offering user-friendly APIs (without, of course, REQUIRING a 
researcher to use them), free advice, and discussion fora for 
the exchange of experience and ideas in the theory and 
practice of TDM 
• Develop clear, agreed statements as to which types of research 
they consider “non-commercial” and which “commercial”.
ADDRESSING THE 
RESEARCHER/TECHNOLOGY GAP 
• Current TDM researchers are very technologically adept, and 
work is needed to make the existing tools easier to use for 
those with less expertise. 
• While The Content Mine and other organisations such as 
Software Carpentry are running workshops to help academics 
become more technically confident, much more needs to be 
done. 
• The TDM community needs to help close the gaps in 
knowledge, ability and awareness 
• Funders and institutions also have a responsibility to ensure 
academics and PhD students are trained in such skills and 
technologies
IN CONCLUSION 
• The main barriers to the uptake of TDM are a lack of 
awareness among academics, a skills gap, legal issues around 
copyright and database rights, and restrictions imposed 
by publishers’ licences. These problems are all solvable 
• Other countries should change their laws to make TDM lawful 
• Publishers should work with the TDM academic community to 
develop agreed statements as to which types of research they 
consider “non-commercial” and which “commercial”, and prevent any 
possible chilling effect from ambiguity around these terms 
• Funders and institutions should be exploring how to teach TDM 
techniques to interested academics and research students 
• Thank you for your attention.
SOME USEFUL 
RESOURCES/ACKNOWLEDGEMENT 
• Use of TDM to detect scientific fraud - 
http://www.nature.com/news/fraud-found-by-reading-between-the-lines-1.15859 
• General overview of benefits of TDM - D. McDonald and U. Kelly, The value 
and benefits of text mining (2012), 
http://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining 
• Official guidance on the new UK copyright exception for TDM - 
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315014/copyright-guidance-research.pdf 
• Excellent general overview of the change to UK law and its implications - 
http://copyrightuser.org/topics/text-and-data-mining/ - provides link to 
the precise wording in the law 
• Details of Springer’s and Royal Society’s initiatives at 
http://www.springer.com/gb/rights-permissions/springer-s-text-and-data-mining-policy/29056 
and http://royalsocietypublishing.org/text-data-mining 
• Image shown in this presentation is from Wikipedia and is covered by a 
Creative Commons CC BY licence

  • 2. SO WHAT ARE THE NON-TECHNICAL PROBLEMS OF TDM? • LEGAL - copyright, database rights and licensing • SOCIAL - The lack of awareness, and relative technological gap, between many TDM tools and the skills of many academics • POLITICAL – the massive gap between publishers’ approaches to TDM and researchers’ needs; also the lack of specific TDM exceptions to copyright in most countries’ laws
  • 3. COPYING IS OFTEN INVOLVED IN TDM • PDFs, the lingua franca of academic journals, are not machine readable • For TDM purposes, they must be transferred into a different digital form • That form is often custom and specific to the research question being asked and the most appropriate tools to answer that question • So there is a need to copy/adapt the original PDF
  • 4.
  • 5. COPYRIGHT/DATABASE RIGHT • Gives the owner the right to authorise, or to refuse to authorise, any of the so-called restricted acts, including: copying; adapting; redisseminating all, or a “substantial” part, of a copyright work (similar rules apply to databases) • Substantial does not mean “most of”, but rather “what is important” • If someone does such restricted acts without permission, they have infringed the right and can be sued • However, there are certain (but very restricted) exceptions to copyright, whereby someone CAN copy, etc., without having to ask for permission or pay fees • Only a few countries (UK recently being one of them) have a specific exception for TDM in their laws • In the absence of such an exception in a country’s national law, researchers must ask for permission (request a licence) from the copyright owners. Generally, the copyright owners are publishers, because authors have (foolishly) assigned their copyright to them
  • 6. THE NEW UK LAW • Came into force in June 2014 • Specific exception to copyright for TDM • UK researchers do not have to ask for permission, pay fees, etc., to do TDM as long as it is for “non-commercial” purposes and as long as they have “lawful access” to the raw materials. • What is, or is not, “non-commercial” is controversial, but what is clear is that the question must be asked at the time the TDM was undertaken, so unexpected commercial benefits at the end of the project are OK, so long as at the time the intent was non-commercial • “Lawful access” usually means licensed content, whether OA or a subscription to the materials
  • 7. THE PROBLEMS OF APPROACHING PUBLISHERS FOR LICENCES FOR TDM • Many publishers want unreasonably high fees and/or place restrictions on what could be done with their materials after TDM, and/or require researchers to use their API, and/or take an extremely long time to decide how to respond to a TDM request • TDM researchers have to approach multiple publishers, each of whom has different attitudes, conditions, and speed of response to such requests. • This is very costly to a researcher, and has significant impact upon the take-up of TDM, as well as inhibiting academics from sharing the outputs of their TDM research • These problems are inhibiting the take-up of TDM, thereby limiting the potential benefits this technology enables. • Also explains why so many TDM experiments are limited to OA materials
  • 8. PUBLISHER TDM LICENCE INITIATIVES GENERALLY DO NOT HELP • Publishers have started offering their own TDM licences and policies • Their licences often impose unfair (and in the case of the UK, unenforceable) constraints on researchers’ freedom to exploit TDM. • Why “unenforceable”? Because UK law specifically states that any contract or licence term that prevents anyone from doing TDM in the manner prescribed in the new exception shall be deemed null and void • There are exceptions of course – Springer and Royal Society in particular offer generous TDM provisions. • So why are publishers offering restrictive licences in the UK? • One can only surmise that they hope licensees are ignorant of the new law, or the publishers in fact don’t know about it. So they are either deliberately misleading, or ignorant
  • 9. WHAT POLITICAL INITIATIVES ARE NEEDED? • Under EU law, countries in the EU are able to introduce exceptions for non-commercial TDM research. • However, so far only the UK has taken advantage of this. The EC is considering an EU-wide exception for TDM, and the Republic of Ireland is also considering such a change to its national law. • Outside of Europe, only one or two Far East countries have introduced such exceptions. • There needs to be an international treaty requiring all countries to include an exception for TDM in their national laws
  • 10. WHAT CAN PUBLISHERS DO TO HELP? • Offer all researchers world-wide the same freedom as is now available to UK researchers to undertake TDM for non-commercial research purposes, so long as the user has lawful access to the original materials • Earn goodwill amongst the TDM research community by offering user-friendly APIs (without, of course, REQUIRING a researcher to use them), free advice, and discussion fora for the exchange of experience and ideas in the theory and practice of TDM • Develop clear agreed statements as to what types of research they agree is “non-commercial” and which is “commercial”.
  • 11. ADDRESSING THE RESEARCHER/TECHNOLOGY GAP • Current TDM researchers are very technologically adept and work will need to be done to develop the existing tools to be easier to use by those with less expertise. • While The Content Mine and other organisations such as Software Carpentry are running workshops to help academics become more technically confident, much more needs to be done. • The TDM community needs to help close the gaps in knowledge, ability and awareness • Funders and institutions also have a responsibility to ensure academics and PhD students are trained in such skills and technologies
  • 12. IN CONCLUSION • The main barriers to the uptake of TDM are a lack of awareness among academics, a skills gap, legal issues around copyright and database rights, and restrictions being implemented by publishers’ licences. These problems are all solvable • Other countries should change their laws to make TDM lawful • Publishers should work with the TDM academic community to develop agreed statements as to what types of research they agree is “non-commercial” and which is “commercial”, and prevent any possible chilling effect from ambiguity around these terms • Funders and institutions should be exploring how to teach TDM techniques to interested academics and research students • Thank you for your attention.
  • 13. SOME USEFUL RESOURCES/ACKNOWLEDGEMENT • Use of TDM to detect scientific fraud - http://www.nature.com/news/fraud-found-by-reading-between-the-lines-1.15859 • General overview of benefits of TDM - D. McDonald and U. Kelly, The value and benefits of text mining (2012), http://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining • Official guidance on the new UK copyright exception for TDM - https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315014/copyright-guidance-research.pdf • Excellent general overview of the change to UK law and its implications - http://copyrightuser.org/topics/text-and-data-mining/ - provides link to the precise wording in the law • Details of Springer’s and Royal Society’s initiatives at http://www.springer.com/gb/rights-permissions/springer-s-text-and-data-mining-policy/29056 and http://royalsocietypublishing.org/text-data-mining • Image shown in this presentation is from Wikipedia and is covered by a Creative Commons CC BY licence