A recipe for grabbing director information from OpenCorporates using OpenRefine
given an OpenCorporates company ID or OpenCorporates company page URL

For more information, contact: schoolOfData.org
  
Here's what we're starting with: a list of companies…
  
Here's the sort of thing we want: lists of directors associated with each company (where that information is available).
  
The first step is to create a web address (URL) that calls the OpenCorporates API and asks it for data about a particular company. OpenRefine can create a new column populated with the responses to calls made to a URL contained in, or generated from, another column (Edit column > Add column by fetching URLs…).
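
To make the fetching step concrete, here is a rough Python sketch of what adding a column by fetching URLs does, assuming each row is a plain dict. The helper names add_column_by_fetching and http_get, and the one-second pause, are illustrative choices rather than anything OpenRefine exposes; the pause plays a similar role to the throttle delay in OpenRefine's fetch dialog.

```python
import time
import urllib.request
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def http_get(url: str) -> str:
    """Fetch url with a plain GET and return the body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def add_column_by_fetching(
    rows: List[Row],
    url_column: str,
    new_column: str,
    fetch: Callable[[str], str] = http_get,
    delay_seconds: float = 1.0,
) -> List[Row]:
    """Return copies of rows with new_column set to the body fetched from url_column.

    Blank URLs and failed requests (network or HTTP errors) give None, so one
    bad row does not stop the whole run.
    """
    result: List[Row] = []
    made_request = False
    for row in rows:
        new_row = dict(row)
        url = row.get(url_column)
        body: Optional[str] = None
        if url:
            if made_request and delay_seconds > 0:
                time.sleep(delay_seconds)  # pause between calls to go easy on the API
            made_request = True
            try:
                body = fetch(url)
            except OSError:  # urllib's URLError and HTTPError are OSError subclasses
                body = None
        new_row[new_column] = body
        result.append(new_row)
    return result
```

Passing a stub function as fetch lets the logic be tried without touching the network.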
  
The URLs should take the form:

http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID

If you already have company page URLs in a column, add a column based on that column using:

value.replace('http://', 'http://api.')

If you have JURISDICTION/COMPANY_ID in a column, use the formula:

"http://api.opencorporates.com/companies/" + value
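
The two URL formulas can be mirrored in Python as a quick way to check the URLs before fetching anything. Note the dot after api: without it the host would become apiopencorporates.com. This is a sketch under assumptions: handling https:// and a leading www. goes beyond the recipe, and the current OpenCorporates API may expect a different form (for example https, a version segment in the path, or an API token), so check its documentation before relying on these exact URLs.

```python
def api_url_from_page_url(page_url: str) -> str:
    """Rewrite a company page URL into the matching API URL by prefixing the host with 'api.'.

    Does the same job as the GREL value.replace('http://', 'http://api.') for
    http URLs; https URLs and a leading 'www.' are also handled (assumptions).
    """
    for scheme in ("https://", "http://"):
        if page_url.startswith(scheme):
            rest = page_url[len(scheme):]
            if rest.startswith("www."):
                rest = rest[len("www."):]
            return scheme + "api." + rest
    raise ValueError(f"not an http(s) URL: {page_url!r}")


def api_url_from_id(jurisdiction_and_id: str) -> str:
    """Build the API URL from a 'JURISDICTION/COMPANY_ID' cell value."""
    return "http://api.opencorporates.com/companies/" + jurisdiction_and_id.strip().strip("/")
```

For example, api_url_from_id("gb/01234567") gives http://api.opencorporates.com/companies/gb/01234567 (the company number here is made up).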
  
The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list at the path:

value.parseJson()['results']['company']['officers']
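
In Python, following that path looks like this. The sample response is hypothetical and trimmed, built from the example directors used later in this recipe; the officer wrapper around each list item and the field names (id, name, position, start_date, end_date) are assumptions about the API's response format, so inspect a real response first.

```python
import json
from typing import Any, Dict, List

# Hypothetical, trimmed response shaped the way this sketch assumes the
# company endpoint replies; check a real response before relying on it.
SAMPLE_RESPONSE = """
{"results": {"company": {"officers": [
  {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
               "position": "director", "start_date": "2010-04-07", "end_date": null}},
  {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
               "position": "director", "start_date": "2006-01-17", "end_date": "2012-02-22"}}
]}}}
"""


def officers_from_response(body: str) -> List[Dict[str, Any]]:
    """Python counterpart of value.parseJson()['results']['company']['officers'].

    A missing or empty officers list gives []. Each item is assumed to wrap its
    fields in an 'officer' key; items without that wrapper are returned as-is.
    """
    officers = json.loads(body)["results"]["company"].get("officers") or []
    return [item.get("officer", item) for item in officers]
```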
  
Let's parse the JSON data and put the directors' information into another column…
  
What we are aiming for is a contrived format along the lines of:

32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
…

where each line lists a director's ID, name, position, appointment date and termination date.
  
This function will parse the data into a string of the form:

32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…

The function reads as follows: “for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||”.

Using two different (and hopefully unique) delimiters means we can split the data on each delimiter type separately.
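
The same logic can be sketched in Python; in GREL it would typically be built from forEach() and join(). The field names (id, name, position, start_date, end_date) are assumptions about the API's response format, and missing values are written as the text null to match the format shown above.

```python
from typing import Any, Dict, List

# Assumed API field names, in the order they are joined.
FIELDS = ["id", "name", "position", "start_date", "end_date"]


def describe_officer(officer: Dict[str, Any]) -> str:
    """Join one officer's fields with '::', writing missing values as 'null'."""
    parts = []
    for field in FIELDS:
        value = officer.get(field)
        parts.append("null" if value is None else str(value))
    return "::".join(parts)


def encode_officers(officers: List[Dict[str, Any]]) -> str:
    """Join every officer description with '||' into a single cell value."""
    return "||".join(describe_officer(o) for o in officers)
```

If a name ever contained :: or ||, the later splits would go wrong, which is why the delimiters need to be strings that never appear in the data.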
  
The parsed data is put into a new column in this combined list form.
  
We can then split the data on the delimiter we defined, ||, so that each director gets a row of their own (Edit cells > Split multi-valued cells…).
  
Note that values from the other columns will not be copied into any newly created rows; we will have to do that ourselves, either now or later.
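
The row split, including the blank cells just described, can be sketched in Python as follows, again treating each row as a dict. split_multivalued_cells is a hypothetical helper named after the OpenRefine operation; it keeps the first piece on the original row and leaves every other column empty (None) on the rows it adds.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def split_multivalued_cells(rows: List[Row], column: str, separator: str = "||") -> List[Row]:
    """Split one column on separator, producing one row per piece.

    The first piece stays on the original row; each further piece gets a new
    row in which every other column is left blank (None).
    """
    result: List[Row] = []
    for row in rows:
        cell = row.get(column)
        pieces = cell.split(separator) if cell else [cell]
        first = dict(row)
        first[column] = pieces[0]
        result.append(first)
        for piece in pieces[1:]:
            blank: Row = {key: None for key in row}
            blank[column] = piece
            result.append(blank)
    return result
```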
  
For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  
We can do this by splitting on the other separator we used, :: (Edit column > Split into several columns…).
  
The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
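
Splitting each director's description into columns and then renaming them can be sketched like this. The numbered "<column> 1", "<column> 2", … names imitate the automatically generated names, and both the directors column name and the friendlier names in DIRECTOR_COLUMN_NAMES are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical friendlier names for the five fields, in the order they were joined.
DIRECTOR_COLUMN_NAMES = {
    "directors 1": "director_id",
    "directors 2": "director_name",
    "directors 3": "position",
    "directors 4": "appointed",
    "directors 5": "terminated",
}


def split_into_columns(rows: List[Row], column: str, separator: str = "::") -> List[Row]:
    """Replace column with numbered columns '<column> 1', '<column> 2', ... split on separator."""
    result: List[Row] = []
    for row in rows:
        # Keep every other column, dropping the one being split.
        new_row: Row = {key: value for key, value in row.items() if key != column}
        cell = row.get(column)
        if cell:
            for i, piece in enumerate(cell.split(separator), start=1):
                new_row[f"{column} {i}"] = piece
        result.append(new_row)
    return result


def rename_columns(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename the columns listed in mapping, leaving the others untouched."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]
```

The termination date for a current director stays as the text null; turning it into a genuinely empty value would be a possible extra tidying step.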
  
Finally, we can do a little more tidying. For any columns we want to export, such as company name or company ID, we can use Fill down (Edit cells > Fill down) to copy in the corresponding values from the original row that the directors' information was pulled from.
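
Fill down is simple enough to sketch directly: walk down the rows and copy the most recent non-blank value into each blank cell of the chosen column. This hypothetical fill_down helper assumes the original company row comes directly before the director rows split out of it, which is how the row split in this recipe lays them out.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_down(rows: List[Row], column: str) -> List[Row]:
    """Copy the last non-blank value in column into the blank cells below it."""
    result: List[Row] = []
    last: Optional[str] = None
    for row in rows:
        new_row = dict(row)
        value = new_row.get(column)
        if value is None or value == "":
            new_row[column] = last  # blank cell: reuse the value from above
        else:
            last = value
        result.append(new_row)
    return result
```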
  
If you want to know more, contact us…
  

Uploaded by Tony Hirst
