Searching Chinese Patents:
Challenges and Solutions When Building an Innovative Discovery Interface
ERIC PUGH | epugh@o19s.com | @dep4b
Who am I?
• Principal at OpenSource Connections
- Solr/Lucene Search Consultancy
http://bit.ly/OSCCommercialSummary	

• Member of Apache Software
Foundation	

• SOLR-284 UpdateRichDocuments
(July 07)
Co-Author
Next Edition May!
Agilista
Selected Customers
Telling some war stories
Risks
• Cloud new at USPTO	

• Discovery is tenuous concept	

• Conflicting User Goals	

• Fixed Budget: trade scope for
budget/quality
Telling some stories
➡How to inject “Discovery” into your
app	

• The Cloud to the Rescue (sorta!)	

• Parsers and Parsers and Parsers	

• Don’t be Afraid to Share!
Flow of understanding
Data, Information, Understanding
Building “Discovery”
Engine
UX, Data, Tension
Grok data at gut level	

Look for outliers	

User Interviews	

Surveys	

Card Sorting	

Scenarios/Personas	

UX
Data
brainstorm
Mockups	

Proof of concept	

Where to spend time?
[Chart: share of time across UX, Engine, and Data. Two series, in slide order: 40% / 20% / 40% and 40% / 40% / 20%; one series is labeled "We spent".]
Telling some stories
• How to inject “Discovery” into your app	

➡The Cloud to the Rescue (sorta!)
• Parsers and Parsers and Parsers	

• Don’t be Afraid to Share!
Boy meets Girl Story
Metadata
Ingest	

Pipeline	

Discovery
UX
Content
Files
How we built it
EmberJS Single Page Search App
HTML
XML
JSON
Server Dashboard
GPSN UI (Bootstrap CSS)
Browsers
Mobile/
Tablet
Third Party
Application
Servers
S3 Bucket
Solr
Solr as a NoSQL
Datastore
• Used “atomic updates” to merge three
source datasets into single final dataset.	

• All text displayed in application stored in
Solr.	

• Dynamic schema supports many languages;
en and cn right now.
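To make the "atomic updates" bullet concrete, here is a minimal sketch of the JSON body such an update uses: each document names its id, and every other field is wrapped in a `set` modifier so that only that field changes and fields loaded from the other source datasets stay untouched. The class name, the document id `patent-1`, and the field `title_en` are hypothetical; the deck does not show the real schema or the client code that was used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the JSON for a Solr atomic update that sets fields on an existing document. */
final class AtomicUpdateSketch {

    /** Minimal escaping for backslashes and double quotes; not a full JSON encoder. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns {"id":"...","field":{"set":"value"},...} for one document. */
    static String setFields(String id, Map<String, String> fields) {
        StringBuilder json = new StringBuilder("{\"id\":" + quote(id));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            json.append(",").append(quote(e.getKey()))
                .append(":{\"set\":").append(quote(e.getValue())).append("}");
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) {
        // Hypothetical field contributed by one of the source datasets.
        Map<String, String> fromOneSource = new LinkedHashMap<>();
        fromOneSource.put("title_en", "Golf glove");
        // The update handler accepts a JSON array of such documents.
        System.out.println("[" + setFields("patent-1", fromOneSource) + "]");
    }
}
```

Running one such pass per source dataset against the same ids is one way to arrive at a single merged record per patent.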
Lessons Learned
Don’t Move Files
• Copying 5 TB data up to S3 was very
painful.	

• We used S3Funnel which is “rsync like”	

• We bought more network bandwidth for
our office
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

–Andrew Tanenbaum, 1981
Data Size
[Chart: Patent Count per year, 1985 to 2011; y-axis from 0 to 1,000,000; one labeled value: 277871]
Think about Data Volume
• Started with the older dataset, and tasks like TIFF -> PNG
conversion became progressively harder. Map/Reduce is nice,
but we needed more visibility into progress.

• Should have sharded our search index from the beginning,
just to make indexing a faster and cheaper process (500 GB
index!).

• 8 shards dropped indexing time from 12 hours to 2 hours.
Merging took 5!

• We had too many steps in our pipeline.
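As an illustration of the sharding point above, the sketch below splits document ids into a fixed number of batches so that each shard can be indexed on its own machine in parallel. It uses a plain hash-modulo rule, which is an assumption made for illustration and is not Solr's own document router; the class name and the ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Assigns documents to shards with a simple hash-modulo rule. */
final class ShardRouterSketch {

    /** Maps a document id to a shard number in [0, shardCount). */
    static int shardFor(String docId, int shardCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), shardCount);
    }

    /** Splits ids into per-shard batches so each shard can be indexed independently. */
    static List<List<String>> partition(List<String> docIds, int shardCount) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (String id : docIds) {
            shards.get(shardFor(id, shardCount)).add(id);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("patent-1", "patent-2", "patent-3", "patent-4");
        List<List<String>> shards = partition(ids, 8);
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + ": " + shards.get(i));
        }
    }
}
```

The same id always lands on the same shard, so a re-run of the pipeline updates documents in place rather than duplicating them across shards.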
Building a Patents Index
[Chart: Machine Count (y-axis 0 to 300) against build time: 1 machine, 5 days; 5 machines, 3 days; 300 machines, 30 minutes]
Key scaling concept
behind GPSN:	

Cloud meets Ocean
More prosaically…
[Diagram: three Clients, three Servers, and a Database, with four "$" markers]
Telling some stories
• How to inject “Discovery” into your app	

• The Cloud to the Rescue (sorta!)	

➡Parsers and Parsers and Parsers
• Don’t be Afraid to Share!
Why so many pipelines?
Morphlines
Tika as a pipeline?
Lots of File Types
• Sometimes in ZIP archives, sometimes not!	

• multiple XML formats as well as CSV and
EDI	

• Purplebook, Yellowbook,
Redbook, Greenbook, Questel, SIPO…
Tika as a pipeline!
• Auto detects content type	

• Metadata structure has all the
key/value needed for Solr	

• Allows us to scale up with
Behemoth project (and
others!).
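The content-type detection mentioned above can be pictured with a standard-library stand-in: look at the first bytes of a file and decide which parser should handle it. This is only a sketch of the idea, not Tika's API; the class name, the `Kind` enum, and the three rules are assumptions chosen to match the file types the deck mentions (ZIP archives, XML, and Greenbook text with its `PATN` marker).

```java
import java.nio.charset.StandardCharsets;

/** Guesses a file's kind from its leading bytes. */
final class SniffSketch {

    enum Kind { ZIP, XML, GREENBOOK, UNKNOWN }

    static Kind sniff(byte[] head) {
        // ZIP local file header magic: 'P' 'K' 0x03 0x04.
        if (head.length >= 4 && head[0] == 'P' && head[1] == 'K'
                && head[2] == 3 && head[3] == 4) {
            return Kind.ZIP;
        }
        String text = new String(head, StandardCharsets.UTF_8).trim();
        if (text.startsWith("<")) {
            return Kind.XML;        // XML declaration or a root element
        }
        if (text.contains("PATN")) {
            return Kind.GREENBOOK;  // section marker seen in Greenbook files
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] sample = "<PatentGrant>".getBytes(StandardCharsets.UTF_8);
        System.out.println(sniff(sample));
    }
}
```

A ZIP result would then mean "unpack and sniff each entry again", which is how the "sometimes in ZIP archives, sometimes not" case can share one code path.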
Lots of files!
HHHHHT APS1 ISSUE - 760106
PATN
WKU 039302717
SRC 5
APN 5328756
APT 1
ART 353
APD 19741216
TTL Golf glove
ISD 19760106
NCL 4
ECL 1

<PatentGrant>
<BibliographicData>
<GrantIdentification>
<DocumentKindCode>B1</DocumentKindCode>
<GrantNumber>06644224</GrantNumber>
<CountryCode>US</CountryCode>
<IssueDateText>2003-11-11</IssueDateText>
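The first sample above (the tagged text starting at `PATN`) maps naturally onto the key/value metadata a search index wants. Below is a minimal sketch that turns such a record into a map, assuming each line is a short tag, a space, and then the value, as the sample suggests. The real format may have rules this ignores (fixed-width columns, continuation lines, repeated tags), and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Parses "TAG value" lines into an ordered tag-to-value map. */
final class TaggedRecordSketch {

    static Map<String, String> parse(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : record.split("\\R")) {
            String trimmed = line.trim();
            int space = trimmed.indexOf(' ');
            if (space < 0) {
                continue; // blank lines and bare markers such as PATN carry no value
            }
            // A repeated tag overwrites the earlier value in this simple sketch.
            fields.put(trimmed.substring(0, space), trimmed.substring(space + 1).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "PATN\nWKU 039302717\nTTL Golf glove\nISD 19760106";
        System.out.println(parse(record));
    }
}
```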
Detector to pick File
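import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Pattern;

import org.apache.commons.io.IOUtils;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.LookaheadInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

public class GreenbookDetector implements Detector {

    // Greenbook files carry a "PATN" section marker near the top.
    private static final Pattern pattern = Pattern.compile("PATN");

    @Override
    public MediaType detect(InputStream stream, Metadata metadata) throws IOException {
        MediaType type = MediaType.OCTET_STREAM;
        if (stream == null) {
            return type; // detect() may be called without a stream
        }

        // Read at most the first 1024 bytes; closing the lookahead resets the original stream.
        try (InputStream lookahead = new LookaheadInputStream(stream, 1024)) {
            String extract = IOUtils.toString(lookahead, "UTF-8");
            if (pattern.matcher(extract).find()) {
                type = GreenbookParser.MEDIA_TYPE;
            }
        }
        return type;
    }
}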
Telling some stories
• How to inject “Discovery” into your app	

• The Cloud to the Rescue (sorta!)	

• Parsers and Parsers and Parsers	

➡Don’t be Afraid to Share!
Your BigData solution
isn’t perfect
• Allow users to export data	

• Most business users want to work in Excel.
Accept it!	

• Allow other applications to build on top of
your application.
GPSN has
• Lots of easy “Print to
PDF” options.	

• Data stored in S3 as:	

• individual patent files	

• chunky downloads.	

• Filtering to expand or
select specific data sets.	

• Permalinks: simple, very
sharable URLs.	

• Underlying Solr service
is exposed to public via
proxy. You can query
Solr yourself.	

• Need advanced querying?
Use Lucene syntax in the
search bar.
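Since the Solr service is reachable through a proxy and accepts Lucene syntax, a third-party application only needs to build a request URL. The sketch below does that with the standard `q`, `rows`, and `wt` parameters of Solr's select handler. The base URL, the core name `patents`, and the field name `title` are placeholders, not the real GPSN endpoint or schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a Solr select URL for a query written in Lucene syntax. */
final class SolrQueryUrlSketch {

    static String selectUrl(String baseUrl, String luceneQuery, int rows) {
        try {
            // The query must be URL-encoded: quotes, colons, and spaces are all significant.
            String q = URLEncoder.encode(luceneQuery, "UTF-8");
            return baseUrl + "/select?q=" + q + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder endpoint and field name.
        String url = selectUrl("http://localhost:8983/solr/patents", "title:\"golf glove\"", 10);
        System.out.println(url);
    }
}
```

The same string works as a permalink, which ties in with the "simple, very sharable URLs" point above.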
One more thought...
Measuring the impact
of our algorithm
changes is just getting
harder with Big Data.
www.quepid.com
Quepid: Give your Queries
some Love
We need beta users!
Thank you!
Questions?
• epugh@o19s.com	

• @dep4b	

• www.opensourceconnections.com	

• slideshare.com/o19s
Nervous about
speaking up? Ask
me later!

Contenu connexe

Tendances

How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning ModelsJosh Patterson
 
DefCore: The Interoperability Standard for OpenStack
DefCore: The Interoperability Standard for OpenStackDefCore: The Interoperability Standard for OpenStack
DefCore: The Interoperability Standard for OpenStackMark Voelker
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about SparkGiivee The
 
Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Josh Patterson
 
OpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient CloudOpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient CloudMark Voelker
 
Getting a Neural Network Up and Running with OpenLab
Getting a Neural Network Up and Running with OpenLabGetting a Neural Network Up and Running with OpenLab
Getting a Neural Network Up and Running with OpenLabMelvin Hillsman
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Jeremy Edberg
 
What's beyond Virtualization - The Future of Cloud Platforms
What's beyond Virtualization - The Future of Cloud PlatformsWhat's beyond Virtualization - The Future of Cloud Platforms
What's beyond Virtualization - The Future of Cloud PlatformsDerek Collison
 
Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)bcantrill
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Hellmar Becker
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemHellmar Becker
 
Markup languages and warp-speed documentation
Markup languages and warp-speed documentationMarkup languages and warp-speed documentation
Markup languages and warp-speed documentationLois Patterson
 
OpenStack 101 - All Things Open 2015
OpenStack 101 - All Things Open 2015OpenStack 101 - All Things Open 2015
OpenStack 101 - All Things Open 2015Mark Voelker
 
What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?DATAVERSITY
 
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4JGeorgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4JJosh Patterson
 
Greenfields tech decisions
Greenfields tech decisionsGreenfields tech decisions
Greenfields tech decisionsTrent Hornibrook
 
Operations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyOperations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyEduardo Piairo
 
Drupal 8 Configuration Management Initiative Update
Drupal 8 Configuration Management Initiative UpdateDrupal 8 Configuration Management Initiative Update
Drupal 8 Configuration Management Initiative Updateheyrocker
 

Tendances (20)

How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning Models
 
Coscup
CoscupCoscup
Coscup
 
DrupalCon 2011 Highlight
DrupalCon 2011 HighlightDrupalCon 2011 Highlight
DrupalCon 2011 Highlight
 
DefCore: The Interoperability Standard for OpenStack
DefCore: The Interoperability Standard for OpenStackDefCore: The Interoperability Standard for OpenStack
DefCore: The Interoperability Standard for OpenStack
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015
 
OpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient CloudOpenStack: Toward a More Resilient Cloud
OpenStack: Toward a More Resilient Cloud
 
Getting a Neural Network Up and Running with OpenLab
Getting a Neural Network Up and Running with OpenLabGetting a Neural Network Up and Running with OpenLab
Getting a Neural Network Up and Running with OpenLab
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)
 
What's beyond Virtualization - The Future of Cloud Platforms
What's beyond Virtualization - The Future of Cloud PlatformsWhat's beyond Virtualization - The Future of Cloud Platforms
What's beyond Virtualization - The Future of Cloud Platforms
 
Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
 
Markup languages and warp-speed documentation
Markup languages and warp-speed documentationMarkup languages and warp-speed documentation
Markup languages and warp-speed documentation
 
OpenStack 101 - All Things Open 2015
OpenStack 101 - All Things Open 2015OpenStack 101 - All Things Open 2015
OpenStack 101 - All Things Open 2015
 
What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?
 
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4JGeorgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
 
Greenfields tech decisions
Greenfields tech decisionsGreenfields tech decisions
Greenfields tech decisions
 
Operations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyOperations for databases: the agile/devops journey
Operations for databases: the agile/devops journey
 
Drupal 8 Configuration Management Initiative Update
Drupal 8 Configuration Management Initiative UpdateDrupal 8 Configuration Management Initiative Update
Drupal 8 Configuration Management Initiative Update
 

En vedette

Oliver Schmid's Value Presentation
Oliver Schmid's Value Presentation Oliver Schmid's Value Presentation
Oliver Schmid's Value Presentation Oliver Schmid
 
G&S 2017 Presentación Corporativa
G&S 2017   Presentación CorporativaG&S 2017   Presentación Corporativa
G&S 2017 Presentación CorporativaJavier Gonzales
 
APM Presentación Corporativa 2012
APM Presentación Corporativa 2012APM Presentación Corporativa 2012
APM Presentación Corporativa 2012APM-Shipley
 
Presentación Corporativa Indra
Presentación Corporativa IndraPresentación Corporativa Indra
Presentación Corporativa Indraalekope
 
Connectu media Company Presentation
Connectu media Company PresentationConnectu media Company Presentation
Connectu media Company Presentationmarykayhoal
 
Deutsche EuroShop | Company Presentation | 09/16
Deutsche EuroShop | Company Presentation | 09/16 Deutsche EuroShop | Company Presentation | 09/16
Deutsche EuroShop | Company Presentation | 09/16 Deutsche EuroShop AG
 
Company_Presentation_2015_-_Short
Company_Presentation_2015_-_ShortCompany_Presentation_2015_-_Short
Company_Presentation_2015_-_ShortHitesh Mathur
 
Click consulting group corporate presentation 2014
Click consulting group corporate presentation 2014Click consulting group corporate presentation 2014
Click consulting group corporate presentation 2014S. Milena C Bonila
 
Rcpower presentation original es
Rcpower presentation original esRcpower presentation original es
Rcpower presentation original esrcpowerrd
 
Presentación corporativa Abengoa 2017
Presentación corporativa Abengoa 2017Presentación corporativa Abengoa 2017
Presentación corporativa Abengoa 2017Abengoa
 
Company presentation Aura Light Spain
Company presentation Aura Light SpainCompany presentation Aura Light Spain
Company presentation Aura Light SpainAuraLight00
 
Abengoa's Corporate Presentation 2017
Abengoa's Corporate Presentation 2017Abengoa's Corporate Presentation 2017
Abengoa's Corporate Presentation 2017Abengoa
 
Deutsche EuroShop | Company Presentation | 02/17
Deutsche EuroShop | Company Presentation | 02/17 Deutsche EuroShop | Company Presentation | 02/17
Deutsche EuroShop | Company Presentation | 02/17 Deutsche EuroShop AG
 
ERTMS Solutions general company presentation
ERTMS Solutions general company presentationERTMS Solutions general company presentation
ERTMS Solutions general company presentationERTMS Solutions
 
PJ Software Company Presentation
PJ Software Company PresentationPJ Software Company Presentation
PJ Software Company PresentationPJ Software
 
2014 LinkedIn Company Presentation
2014 LinkedIn Company Presentation2014 LinkedIn Company Presentation
2014 LinkedIn Company PresentationLinkedIn
 

En vedette (20)

Oliver Schmid's Value Presentation
Oliver Schmid's Value Presentation Oliver Schmid's Value Presentation
Oliver Schmid's Value Presentation
 
G&S 2017 Presentación Corporativa
G&S 2017   Presentación CorporativaG&S 2017   Presentación Corporativa
G&S 2017 Presentación Corporativa
 
APM Presentación Corporativa 2012
APM Presentación Corporativa 2012APM Presentación Corporativa 2012
APM Presentación Corporativa 2012
 
Presentación Corporativa Indra
Presentación Corporativa IndraPresentación Corporativa Indra
Presentación Corporativa Indra
 
Eeb corporativa vf
Eeb corporativa vfEeb corporativa vf
Eeb corporativa vf
 
KP314
KP314KP314
KP314
 
Company Presentation
Company PresentationCompany Presentation
Company Presentation
 
Connectu media Company Presentation
Connectu media Company PresentationConnectu media Company Presentation
Connectu media Company Presentation
 
Deutsche EuroShop | Company Presentation | 09/16
Deutsche EuroShop | Company Presentation | 09/16 Deutsche EuroShop | Company Presentation | 09/16
Deutsche EuroShop | Company Presentation | 09/16
 
Company_Presentation_2015_-_Short
Company_Presentation_2015_-_ShortCompany_Presentation_2015_-_Short
Company_Presentation_2015_-_Short
 
Click consulting group corporate presentation 2014
Click consulting group corporate presentation 2014Click consulting group corporate presentation 2014
Click consulting group corporate presentation 2014
 
Karcher Company presentation
Karcher Company presentation Karcher Company presentation
Karcher Company presentation
 
Rcpower presentation original es
Rcpower presentation original esRcpower presentation original es
Rcpower presentation original es
 
Presentación corporativa Abengoa 2017
Presentación corporativa Abengoa 2017Presentación corporativa Abengoa 2017
Presentación corporativa Abengoa 2017
 
Company presentation Aura Light Spain
Company presentation Aura Light SpainCompany presentation Aura Light Spain
Company presentation Aura Light Spain
 
Abengoa's Corporate Presentation 2017
Abengoa's Corporate Presentation 2017Abengoa's Corporate Presentation 2017
Abengoa's Corporate Presentation 2017
 
Deutsche EuroShop | Company Presentation | 02/17
Deutsche EuroShop | Company Presentation | 02/17 Deutsche EuroShop | Company Presentation | 02/17
Deutsche EuroShop | Company Presentation | 02/17
 
ERTMS Solutions general company presentation
ERTMS Solutions general company presentationERTMS Solutions general company presentation
ERTMS Solutions general company presentation
 
PJ Software Company Presentation
PJ Software Company PresentationPJ Software Company Presentation
PJ Software Company Presentation
 
2014 LinkedIn Company Presentation
2014 LinkedIn Company Presentation2014 LinkedIn Company Presentation
2014 LinkedIn Company Presentation
 

Similaire à Searching Chinese Patents Presentation at Enterprise Data World

Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdbjixuan1989
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 CareerBuilder.com
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriarAdf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriarNilesh Shah
 
Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Jim Adcock
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogC4Media
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsDataWorks Summit
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBDenny Lee
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlySarah Guido
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsRussell Jurney
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning ProductsAndrew Musselman
 

Similaire à Searching Chinese Patents Presentation at Enterprise Data World (20)

Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriarAdf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
Adf and ala design c sharp corner toronto chapter feb 2019 meetup nik shahriar
 
Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning Products
 

Plus de OpenSource Connections

How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessOpenSource Connections
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019OpenSource Connections
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullOpenSource Connections
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonOpenSource Connections
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...OpenSource Connections
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajOpenSource Connections
 
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...OpenSource Connections
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Searching Chinese Patents Presentation at Enterprise Data World

  • 1. Searching Chinese Patents: Challenges and Solutions When Building an Innovative Discovery Interface ERIC PUGH | epugh@o19s.com | @dep4b
  • 2. Who am I? • Principal at OpenSource Connections - Solr/Lucene Search Consultancy http://bit.ly/OSCCommercialSummary • Member of Apache Software Foundation • SOLR-284 UpdateRichDocuments (July 07)
  • 8. Risks • Cloud new at USPTO • Discovery is tenuous concept • Conflicting User Goals • Fixed Budget: trade scope for budget/quality
  • 14. Telling some stories ➡How to inject “Discovery” into your app • The Cloud to the Rescue (sorta!) • Parsers and Parsers and Parsers • Don’t be Afraid to Share!
  • 15. Flow of understanding: Data, Information, Understanding
  • 17. Grok data at gut level; look for outliers. User interviews, surveys, card sorting, scenarios/personas. UX + Data brainstorm: mockups, proof of concept.
  • 18. Where to spend time? UX, Engine, Data: 40%, 20%, 40% and 40%, 40%, 20% (one split labeled “We spent”)
  • 19. Telling some stories • How to inject “Discovery” into your app ➡The Cloud to the Rescue (sorta!) • Parsers and Parsers and Parsers • Don’t be Afraid to Share!
  • 20. Boy meets Girl Story
  • 21. Boy meets Girl Story Metadata Ingest Pipeline Discovery UX Content Files
  • 22. How we built it EmberJS Single Page Search App HTML XML JSON Server Dashboard GPSN UI (Bootstrap CSS) Browsers Mobile/Tablet Third Party Application Servers S3 Bucket Solr
  • 23. Solr as a NoSQL Datastore • Used “atomic updates” to merge three source datasets into single final dataset. • All text displayed in application stored in Solr. • Dynamic schema supports many languages, en, cn right now.
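The atomic-update merge on slide 23 can be pictured with a small sketch. Solr's JSON update format accepts a document that carries only the unique key plus a `{"set": value}` modifier per field, which leaves the document's other fields as they were. The class name, the `patent-1` id, and the `title_en` field below are hypothetical; the talk does not show the real schema or loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the JSON body for a Solr atomic update using the "set" modifier. */
final class AtomicUpdateSketch {

    /** Escapes backslashes and double quotes only; enough for this illustration. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Returns one update document: the unique key plus a {"set": value}
     * modifier for each field supplied by one source dataset.
     */
    static String atomicSet(String id, Map<String, String> fields) {
        StringBuilder json = new StringBuilder("{\"id\":").append(quote(id));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            json.append(',').append(quote(e.getKey()))
                .append(":{\"set\":").append(quote(e.getValue())).append('}');
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        // Hypothetical id and field name; one such call per source dataset.
        Map<String, String> translation = new LinkedHashMap<>();
        translation.put("title_en", "Golf glove");
        // Solr's update handler takes a JSON array of such documents.
        System.out.println("[" + atomicSet("patent-1", translation) + "]");
    }
}
```

Under this approach each of the three source datasets would send its own fields for a given id, and the final record accumulates in Solr rather than in a separate join step.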
  • 25. Don’t Move Files • Copying 5 TB data up to S3 was very painful. • We used S3Funnel which is “rsync like” • We bought more network bandwidth for our office
  • 26. Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
 –Andrew Tanenbaum, 1981
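A rough calculation shows why the 5 TB copy hurt. The sketch below assumes decimal terabytes and a sustained link rate; the 100 Mbit/s figure is an assumption for illustration, not a number from the talk.

```java
/** Back-of-the-envelope upload time for a bulk copy over a network link. */
final class TransferTimeSketch {

    /** Days needed to move decimal terabytes at a sustained rate in megabits per second. */
    static double transferDays(double terabytes, double megabitsPerSecond) {
        double bits = terabytes * 1e12 * 8;                 // 1 TB = 10^12 bytes
        double seconds = bits / (megabitsPerSecond * 1e6);  // Mbit/s to bit/s
        return seconds / 86_400;                            // seconds per day
    }

    public static void main(String[] args) {
        // 100 Mbit/s is an assumed office uplink, not a figure given in the talk.
        System.out.printf("%.1f days%n", transferDays(5, 100));
    }
}
```

Real transfers rarely sustain the nominal rate, so a figure like this is best read as a lower bound.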
  • 27. Data Size (chart: patent count per year, 1985 to 2011; y-axis 0 to 1,000,000; one labeled value of 277871)
  • 28. Think about Data Volume • Started with the older dataset, and tasks like TIFF -> PNG conversion became progressively harder. Map/Reduce is nice, but we needed more visibility into progress. • Should have sharded our search index from the beginning, just to make indexing a faster and cheaper process (500 GB index!) • 8 shards dropped time from 12 hours to 2 hours. Merging took 5! • We had too many steps in our pipeline
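One common way to split indexing work across shards is to route each document by a hash of its id, so that each shard can be built by an independent indexer. The sketch below shows that idea only; the talk does not say how GPSN assigned documents to its 8 shards, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits document ids across a fixed number of index shards. */
final class ShardRouterSketch {

    /** Maps an id to a shard in [0, shardCount); floorMod keeps negative hashes in range. */
    static int shardFor(String docId, int shardCount) {
        return Math.floorMod(docId.hashCode(), shardCount);
    }

    /** Groups ids into per-shard batches that separate indexers could process in parallel. */
    static List<List<String>> partition(List<String> docIds, int shardCount) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (String id : docIds) {
            shards.get(shardFor(id, shardCount)).add(id);
        }
        return shards;
    }
}
```

Taking the slide's figures at face value (and assuming the merge figure is in hours), 2 hours of parallel indexing plus 5 hours of merging is about 7 hours against the original 12, which suggests the merge step, not the indexing, becomes the part worth avoiding.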
  • 29. Building a Patents Index (chart: machine count vs. build time): 1 machine, 5 days; 5 machines, 3 days; 300 machines, 30 minutes
  • 30. Key scaling concept behind GPSN: Cloud meets Ocean
  • 32. Telling some stories • How to inject “Discovery” into your app • The Cloud to the Rescue (sorta!) ➡Parsers and Parsers and Parsers • Don’t be Afraid to Share!
  • 33. Why so many pipelines? Morphlines
  • 34. Tika as a pipeline?
  • 35. Lots of File Types • Sometimes in ZIP archives, sometimes not! • Multiple XML formats as well as CSV and EDI • Purplebook, Yellowbook, Redbook, Greenbook, Questel, SIPO…
  • 36. Tika as a pipeline! • Auto-detects content type • Metadata structure has all the key/value pairs needed for Solr • Allows us to scale up with the Behemoth project (and others!).
  • 37. Lots of files!
      Greenbook sample:
        HHHHHT APS1 ISSUE - 760106
        PATN
        WKU 039302717
        SRC 5
        APN 5328756
        APT 1
        ART 353
        APD 19741216
        TTL Golf glove
        ISD 19760106
        NCL 4
        ECL 1
      XML sample:
        <PatentGrant>
        <BibliographicData>
        <GrantIdentification>
        <DocumentKindCode>B1</DocumentKindCode>
        <GrantNumber>06644224</GrantNumber>
        <CountryCode>US</CountryCode>
        <IssueDateText>2003-11-11</IssueDateText>
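The Greenbook sample on slide 37 looks like a run of tag/value lines, which maps naturally onto the key/value metadata that slide 36 says Solr needs. The sketch below parses that shape with the standard library only. The layout rule (a 3 to 4 letter uppercase tag, whitespace, then the value) is inferred from the sample and is an assumption; repeated tags, continuation lines, and section handling in real Greenbook files are not covered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Turns tag/value lines from a Greenbook-style record into a field map. */
final class GreenbookFieldsSketch {

    // Assumed layout: a 3-4 letter uppercase tag, whitespace, then the value.
    private static final Pattern FIELD = Pattern.compile("^([A-Z]{3,4})\\s+(.+)$");

    /** Lines without a value (such as the PATN section marker) are skipped. */
    static Map<String, String> parse(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : record.split("\\R")) {
            Matcher m = FIELD.matcher(line.trim());
            if (m.matches()) {
                fields.put(m.group(1), m.group(2)); // a repeated tag overwrites the earlier value
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "PATN\nWKU 039302717\nTTL Golf glove\nISD 19760106";
        System.out.println(parse(record));
    }
}
```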
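  • 38. Detector to pick File
      import java.io.IOException;
      import java.io.InputStream;
      import java.util.regex.Pattern;

      import org.apache.commons.io.IOUtils;
      import org.apache.tika.detect.Detector;
      import org.apache.tika.io.LookaheadInputStream;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.mime.MediaType;

      public class GreenbookDetector implements Detector {

          // Greenbook files carry a "PATN" marker near the top of the file.
          private static final Pattern PATTERN = Pattern.compile("PATN");

          @Override
          public MediaType detect(InputStream stream, Metadata metadata) throws IOException {
              // Tika may call detect() with no stream; fall back to the generic type.
              if (stream == null) {
                  return MediaType.OCTET_STREAM;
              }
              // Peek at the first 1024 bytes; closing the lookahead resets the stream.
              try (InputStream lookahead = new LookaheadInputStream(stream, 1024)) {
                  String extract = IOUtils.toString(lookahead, "UTF-8");
                  return PATTERN.matcher(extract).find()
                          ? GreenbookParser.MEDIA_TYPE
                          : MediaType.OCTET_STREAM;
              }
          }
      }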
  • 39. Telling some stories • How to inject “Discovery” into your app • The Cloud to the Rescue (sorta!) • Parsers and Parsers and Parsers ➡Don’t be Afraid to Share!
  • 40. Your BigData solution isn’t perfect • Allow users to export data • Most business users want to work in Excel. Accept it! • Allow other applications to build on top of your application.
  • 41. GPSN has • Lots of easy “Print to PDF” options. • Data stored in S3 as: • individual patent files • chunky downloads. • Filtering to expand or select specific data sets. • Permalinks: simple, very sharable URLs. • Underlying Solr service is exposed to public via proxy. You can query Solr yourself. • Need advanced querying? Use Lucene syntax in search bar.
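Querying an exposed Solr service directly usually comes down to building a `/select` URL with a URL-encoded Lucene-syntax query in the `q` parameter. The sketch below shows that step. The base URL and the `title_en` and `year` field names are hypothetical; the talk does not give the proxy address or the schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a Solr /select URL for a Lucene-syntax query. */
final class SolrQueryUrlSketch {

    /** q, rows and wt are standard Solr request parameters; the query is form-encoded. */
    static String selectUrl(String baseUrl, String luceneQuery, int rows) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(luceneQuery, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical proxy URL and field names.
        System.out.println(selectUrl("https://example.org/solr/patents",
                "title_en:\"golf glove\" AND year:[1976 TO 1980]", 10));
    }
}
```

The same query string, without the encoding, is what a user would type into a search bar that accepts Lucene syntax.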
  • 43. Measuring the impact of our algorithm changes is just getting harder with Big Data.
  • 44. www.quepid.com Quepid: Give your Queries some Love. We need beta users!
  • 45. Thank you! Questions? • epugh@o19s.com • @dep4b • www.opensourceconnections.com • slideshare.com/o19s Nervous about speaking up? Ask me later!