OPENREFINE
Tricia Clayton
Collection Assessment and Discovery Librarian
Georgia State University
WHAT IS OPENREFINE?
OpenRefine main functions:
• Explore
• Clean & Transform
• Extend & Reconcile
http://openrefine.org
HOW DOES IT COMPARE TO OTHER TOOLS?
OpenRefine
• Can batch edit rows and columns
• Excellent for exploring & transforming data
• No schema needed
• Data is always visible
Spreadsheets
• Edit one cell at a time
• Excellent for data entry, functions, calculations
• No schema needed
• Data is always visible
Databases
• Schema and scripting language needed for editing
• Data is mostly out of sight unless programming is used to run queries or build views
LIVE DEMO – BASIC ORIENTATION
• Create/open/import project
• Basic navigation
• The zones of the central viewing area; the functions of the “All” column vs. the other columns
• Export options
• Undo/redo
• Facet/filter
LIVE DEMO – EXPLORING & TRANSFORMING
• Faceting options
• Flag and remove
• Common transforms
• Transform; Add column based on this column
• GREL
• search/replace with multiple commands
• cell.cross
• Split/join cells
GETTING STARTED
Are you seeing this error when you open a project?
You can ignore it. OpenRefine is trying to reach the Freebase service, which no longer exists.
USEFUL GREL OPERATIONS
Search and replace:
value.replace(",", "")
“Atlanta, GA” becomes “Atlanta GA”
You can combine multiple commands by connecting them with periods:
value.replace(",", "").replace(":", "")
“Atlanta, GA: 30303” becomes “Atlanta GA 30303”
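For readers who know Python better than GREL, the chained replace above can be sketched as follows. This is only an analogue of the GREL expression, not OpenRefine code, and the function name `strip_punctuation` is made up for the illustration.

```python
def strip_punctuation(value: str) -> str:
    # Each replace() returns a new string, so calls can be chained with
    # periods, just like value.replace(",", "").replace(":", "") in GREL.
    return value.replace(",", "").replace(":", "")


print(strip_punctuation("Atlanta, GA: 30303"))  # prints: Atlanta GA 30303
```

The order of the chained calls does not matter here because the two replacements do not overlap.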
USEFUL GREL OPERATIONS
Replace (transform) the values in your current column with
those from another column in the same project:
cells["column"].value
where column represents the name of the column you are
getting the values from
USEFUL GREL OPERATIONS
Concatenation:
Adding a string to the value of the current column:
"added string" + cells["current column"].value
Combining the values of two columns:
cells["column1"].value + " " + cells["column2"].value
Note: if any of the cells are blank, problems will arise. See
http://kb.refinepro.com/2011/07/merge-2-columns-that-have-both-blank.html
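The blank-cell problem is easier to see in a small Python sketch. It is an analogue only: it assumes a blank cell arrives as None or an empty string, and the helper name `join_columns` is hypothetical. It does not reproduce the GREL approach described at the link above.

```python
from typing import Optional


def join_columns(first: Optional[str], second: Optional[str], sep: str = " ") -> str:
    # Treat None and whitespace-only cells as blank, then join whatever is left,
    # so a blank cell never produces an error or a stray separator.
    parts = [p.strip() for p in (first, second) if p is not None and p.strip()]
    return sep.join(parts)


print(join_columns("Atlanta", "GA"))
print(join_columns("Atlanta", None))
```

Filtering out the blanks before joining is the design choice that matters: the separator is only added between values that actually exist.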
USEFUL GREL OPERATIONS
Changing the date format of a string-formatted date:
Note: True dates in OpenRefine are colored green and formatted like this: 2018-10-03T00:00:00Z. But you may have imported dates that kept their text format (particularly if you turned off the option to parse text into numbers and dates during import, which speeds up the import).
To transform 2018-10-03 to display just the year 2018:
toString(toDate(value),"yyyy")
The GREL expression first converts the value to a date, takes just the year, then converts it back to a string.
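The same parse-then-format idea can be sketched in Python. This is an analogue of the GREL expression, and it assumes the text is always in year-month-day form; the function name `year_of` is made up for the illustration.

```python
from datetime import datetime


def year_of(value: str) -> str:
    # Step 1 (like toDate): parse the text into a real date object.
    parsed = datetime.strptime(value, "%Y-%m-%d")
    # Step 2 (like toString with "yyyy"): format only the year, as text.
    return parsed.strftime("%Y")


print(year_of("2018-10-03"))
```

Parsing first, rather than slicing the first four characters, means a value that is not a valid date raises an error instead of silently producing a wrong year.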
USEFUL GREL OPERATIONS
Import a column from a different project into your current project based on a matching column (cell.cross function):
cell.cross("JSTOR 201806 JR1", "Print ISSN").cells["Reporting Period Total"].value[0]
Use the “Add column based on this column” menu option on your Print ISSN column. The other project is “JSTOR 201806 JR1”; you are matching that project’s “Print ISSN” column and importing its “Reporting Period Total” column.
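Conceptually, cell.cross is a lookup from one table into another. The Python sketch below shows that logic with invented sample rows (the titles, ISSNs, and totals are hypothetical, and `cross` and `first_match` are illustrative helper names, not OpenRefine functions).

```python
# Hypothetical sample rows; the ISSNs and totals are invented for illustration.
current_project = [
    {"Title": "Journal A", "Print ISSN": "1111-1111"},
    {"Title": "Journal B", "Print ISSN": "2222-2222"},
    {"Title": "Journal C", "Print ISSN": "3333-3333"},
]
other_project = [  # stands in for the "JSTOR 201806 JR1" project
    {"Print ISSN": "1111-1111", "Reporting Period Total": 42},
    {"Print ISSN": "2222-2222", "Reporting Period Total": 7},
]


def cross(value, rows, key_column):
    # Like cell.cross: return every row of the other project whose key matches.
    return [row for row in rows if row[key_column] == value]


def first_match(value, rows, key_column, wanted_column):
    # Like ...cells["Reporting Period Total"].value[0]: take the first match,
    # or None when no row in the other project matches.
    matches = cross(value, rows, key_column)
    return matches[0][wanted_column] if matches else None


for row in current_project:
    row["Reporting Period Total"] = first_match(
        row["Print ISSN"], other_project, "Print ISSN", "Reporting Period Total"
    )
    print(row)
```

The trailing [0] in the GREL expression plays the role of `matches[0]` here: the lookup can return several matching rows, and only the first one is used.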
CLUSTERING DEMO
Clustering: a semi-automated process that identifies groups of different values that might represent the same thing, so you can correct or normalize them:
“organization” AND “organisation”
“New York” AND “new york”
“François Mauriac” AND “Francois Mauriac”
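One way such groups can be found is a "fingerprint" key: normalize each value and group the values that end up with the same key. The Python sketch below is a simplified approximation of that idea, not OpenRefine's actual implementation, and the function names are made up for the illustration. Note that a key of this kind groups the case and accent variants above, but not “organization” and “organisation”, which differ by a letter and would need an edit-distance (nearest neighbor) method instead.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    # Normalize: trim, lowercase, strip accents, drop ASCII punctuation,
    # then sort and de-duplicate the whitespace-separated tokens.
    text = value.strip().lower()
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    # Only keys shared by more than one distinct spelling form a cluster.
    return [sorted(g) for g in groups.values() if len(g) > 1]


sample = ["New York", "new york", "François Mauriac", "Francois Mauriac",
          "organization", "organisation"]
for group in cluster(sample):
    print(group)
```

The clusters are only suggestions; as in OpenRefine, a person still decides which spelling each group should be normalized to.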
RECONCILIATION
A service that semi-automates the process of matching data in
your project to authoritative data in other sources, for example:
• VIAF (Virtual International Authority File)
• FAST (Faceted Application of Subject Terminology)
• Library of Congress Subject Headings
• Journal TOCs
Other reconcilable data sources
RECONCILIATION
Wikidata reconciliation is the only built-in service. Any others must be added.
To reconcile against only the LC source in VIAF:
http://refine.codefork.com/reconcile/viafproxy/LC
From the column menu: Reconcile:
Start reconciling…
RECONCILIATION
Choose:
• what type of entity to reconcile against
• whether to auto-match candidates with high confidence
RECONCILIATION
Next steps:
• Verify the matched titles. The links will take you to the LC Name Authority File records so you can check.
• Select matches for the unmatched titles by clicking either the single or the double check mark: the single check mark matches just that cell; the double check mark matches all identical cells.
RECONCILIATION
Now you have a list of proper LC headings.
To get the match IDs for the column you just reconciled:
• Edit column – Add column based on this column
• Name the new column
• In the “Expression” box enter:
cell.recon.match.id
ADDITIONAL RESOURCES
• Using OpenRefine (2013), by Ruben Verborgh and Max De Wilde
A somewhat dated but still useful book that provides a comprehensive introduction to OpenRefine.
• Cleaning Data with OpenRefine:
https://libjohn.github.io/openrefine/
An excellent tutorial developed by John Little at Duke University Libraries.
ADDITIONAL RESOURCES
• OpenRefine’s Documentation page:
http://openrefine.org/documentation.html
Links to several online courses and an extensive curated tutorial list
• Official documentation and reference for the General Refine Expression Language (GREL):
https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#reference
ADDITIONAL RESOURCES
• Reconciling author names using Open Refine and VIAF:
http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html
• Reconciling Smithsonian Library data with VIAF:
https://allysonota.weebly.com/uploads/5/7/9/6/57968819/ota_viaf.pdf
• Reconciliation in OpenRefine, videos by Owen Stephens
https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 1)
https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 2)

OpenRefine

  • 1. OPENREFINE Tricia Clayton Collection Assessment and Discovery Librarian Georgia State University
  • 2. WHAT IS OPENREFINE? OpenRefine Main Functions Clean & Transform Extend & ReconcileExplore http://openrefine.org
  • 3. HOW DOES IT COMPARE TO OTHER TOOLS? OpenRefine • Can batch edit rows and columns • Excellent for exploring & transforming data • No schema needed • Data is always visible Spreadsheets • Edit one cell at a time • Excellent for data entry, functions, calculations • No schema needed • Data is always visible Databases • Schema and scripting language needed for editing • Data is mostly out of site unless programming is used to run queries or build views
  • 4. LIVE DEMO – BASIC ORIENTATION • Create/open/import project • Basic navigation • The zones of central viewing area; the functions of the “All” column vs. the other columns • Export options • Undo/redo • Facet/filter
  • 5. LIVE DEMO – EXPLORING & TRANSFORMING • Faceting options • Flag and remove • Common transforms • Transform; Add column based on this column • GREL • search/replace with multiple commands • cell.cross • Split/join cells
  • 6. GETTING STARTED Are you seeing this error when you open a project? You can ignore it. OpenRefine is trying to reach the Freebase service, which no longer exists.
  • 7. USEFUL GREL OPERATIONS Search and replace: value.replace(",", "") turns “Atlanta, GA” into “Atlanta GA”. You can chain multiple commands by connecting them with periods: value.replace(",", "").replace(":", "") turns “Atlanta, GA: 30303” into “Atlanta GA 30303”.
  • 8. USEFUL GREL OPERATIONS Replace (transform) the values in your current column with those from another column in the same project: cells["column"].value, where "column" is the name of the column you are getting the values from.
  • 9. USEFUL GREL OPERATIONS Concatenation. Adding a string to the value of the current column: "added string" + cells["current column"].value Combining the values of two columns: cells["column1"].value + " " + cells["column2"].value Note: if any of the cells have blank values, problems will arise; see http://kb.refinepro.com/2011/07/merge-2-columns-that-have-both-blank.html
  • 10. USEFUL GREL OPERATIONS Changing the format of a date stored as a string. Note: true dates in OpenRefine are colored green and formatted like this: 2018-10-03T00:00:00Z. But you may have imported dates that kept their text format (particularly if you turned off the option to parse text into numbers and dates during import, which speeds up the import). To transform 2018-10-03 to display just the year 2018: toString(toDate(value),"yyyy") This expression first converts the value to a date, takes just the year, then converts it back to a string.
  • 11. USEFUL GREL OPERATIONS Import a column from a different project into your current project based on a matching column (cell.cross function): cell.cross("JSTOR 201806 JR1", "Print ISSN").cells["Reporting Period Total"].value[0] Use the “Add column based on this column” menu option on your Print ISSN column. Here the other project is “JSTOR 201806 JR1”, the match is made on that project’s “Print ISSN” column, and the values imported come from that project’s “Reporting Period Total” column. The trailing [0] keeps the value from the first matching row.
  • 12. CLUSTERING DEMO Clustering: a semi-automated process to identify groups of different values that might represent the same thing, then correct or normalize them: “organization” AND “organisation”; “New York” AND “new york”; “François Mauriac” AND “Francois Mauriac”
  • 13. RECONCILIATION A service that semi-automates the process of matching data in your project to authoritative data in other sources, for example: • VIAF (Virtual International Authority File) • FAST (Faceted Application of Subject Terminology) • Library of Congress Subject Headings • Journal TOCs • Other reconcilable data sources
  • 14. RECONCILIATION Wikidata reconciliation is the only built-in service. Any others must be added. To reconcile against only the LC source in VIAF: http://refine.codefork.com/reconcile/viafproxy/LC From the column menu: Reconcile: Start reconciling… [Screenshot with Steps 1 to 4 labeled]
  • 15. RECONCILIATION Choose: • what type of entity to reconcile against • whether to auto-match candidates with high confidence
  • 16. RECONCILIATION Next steps: • Verify the matched titles. The links will take you to the LC Name Authority File records so you can check. • Select matches for the unmatched titles by either clicking the single or double check marks: the single check mark matches just that cell; the double check mark matches all identical cells
  • 17. RECONCILIATION Now you have a list of proper LC headings. To get the match IDs for the column you just reconciled: • Edit Column – Add column based on this column • Name the new column • In “Expression” box enter: cell.recon.match.id
  • 18. ADDITIONAL RESOURCES • Using OpenRefine (2013), by Ruben Verborgh and Max De Wilde A somewhat dated but still useful book that provides a comprehensive introduction to OpenRefine. • Cleaning Data with OpenRefine: https://libjohn.github.io/openrefine/ An excellent tutorial developed by John Little at Duke University Libraries.
  • 19. ADDITIONAL RESOURCES • OpenRefine’s Documentation page: http://openrefine.org/documentation.html Links to several online courses and an extensive curated tutorial list • Official documentation and reference for the General Refine Expression Language (GREL): https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#reference
  • 20. ADDITIONAL RESOURCES • Reconciling author names using Open Refine and VIAF: http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html • Reconciling Smithsonian Library data with VIAF: https://allysonota.weebly.com/uploads/5/7/9/6/57968819/ota_viaf.pdf • Reconciliation in OpenRefine, videos by Owen Stephens https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 1) https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 2)
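
The GREL recipes on slides 7, 9, 10 and 11 can be prototyped outside OpenRefine. The following is a minimal Python sketch of the same ideas, not OpenRefine's implementation. All function names are hypothetical and chosen for illustration. The blank-handling rule in join_columns (skip empty cells instead of failing) is one possible answer to the blank-cell problem noted on slide 9, and is an assumption rather than the approach described in the linked article.

```python
from datetime import datetime


def strip_punctuation(value):
    # Same idea as the chained GREL value.replace(",", "").replace(":", "")
    return value.replace(",", "").replace(":", "")


def join_columns(left, right, sep=" "):
    # Assumption: None and whitespace-only cells count as blank and are skipped,
    # so a blank cell never produces an error or a stray separator.
    parts = [str(v).strip() for v in (left, right)
             if v is not None and str(v).strip()]
    return sep.join(parts)


def year_of(date_text):
    # Same idea as toString(toDate(value), "yyyy") for text like 2018-10-03:
    # parse the text as a date, then format only the year as a string.
    return datetime.strptime(date_text, "%Y-%m-%d").strftime("%Y")


def cross_lookup(key, other_rows, key_column, value_column):
    # Same idea as cell.cross(...).cells[...].value[0]: return the value
    # from the first row in the other table whose key column matches.
    for row in other_rows:
        if row.get(key_column) == key:
            return row.get(value_column)
    return None  # no matching row in the other table
```

For example, join_columns("Atlanta", None) returns "Atlanta" with no trailing space, and cross_lookup returns None when the other table has no matching key, which makes unmatched rows easy to spot.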
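
The clustering idea on slide 12 can be illustrated with a simplified key-collision sketch in Python: each value is reduced to a normalized key, and values that share a key are proposed as one cluster. The slides do not say which clustering method was demonstrated, so the normalization steps below (trim, lowercase, strip accents and punctuation, sort unique tokens) are an assumption modeled loosely on fingerprint-style keying, and the function names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    # Normalize case and surrounding whitespace.
    text = value.strip().lower()
    # Decompose accented characters and drop the combining marks (ç -> c).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    # Group values by key; only keys shared by 2+ distinct values form a cluster.
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

With this keying, “New York” and “new york” fall into one cluster, as do “François Mauriac” and “Francois Mauriac”. Spelling variants such as “organization” and “organisation” produce different keys, so catching them would need a different method, for example one based on edit distance or phonetics.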