CODING & DEVELOPMENT | KEVIN BRIDGES | FEBRUARY 8 2013




                         Search All the Things


Friday, February 8, 13
Introduction
      Kevin Bridges

       •     Senior Software Engineer, Cloud
             Systems at Acquia
       •     Avid technologist who believes
             Drupal is a component of larger
             systems.
       •     http://drupal.org/user/27802 -
             aka cyberswat
       •     https://twitter.com/cyberswat




The Problem
      Large organizations hold lots of data in multiple formats.
      Different teams use different tools and services, which
      makes a cohesive interface difficult.

      •     Hosted data with services like GitHub
      •     Internal APIs
      •     Wikis
      •     Documents and text files

      This data can span multiple languages and formats. How
      can we combine all of these sources into a single interface
      that is easy to use while maintaining context?



Engineering Week Hackathon




      We had 24 hours to solve the problem.

       •     Build a Drupal 7 site
       •     Integrate with LDAP over SSL for secure access
       •     Serve generated API docs, such as RDoc output
       •     Index generated docs and GitHub docs for searching
       •     Enable an effective faceted search
The Team
      We needed a few specialists to pull this off. 3 Drupal
      developers, 1 Drupal themer, and 2 operations hackers.

       •     Kevin Bridges (@cyberswat) - Drupal & DevOps
       •     Peter Wolanin (@pwolanin) - Drupal & Solr
       •     Peter Jackson (@faoiseamh) - Drupal & DevOps
       •     Richard Burford (@psynaptic) - Drupal Themer
       •     Amin Astaneh (@aastaneh) - Operations
       •     Chris Rutter (@ChrisRut) - Operations




Drupal Modules
      We used 6 contributed modules to accelerate our
      development efforts. We needed to create 1 custom
      module that currently lives in a Drupal Sandbox.

      Contributed Modules
      •     Acquia Connector - Contains the Acquia Search
            module which provides integration between a Drupal site
            and Acquia's hosted search service
      •     Apache Solr - Integrates Drupal with the Apache Solr
            search platform
      •     Apache Solr Attachments - Allows searching within file
            attachments from Solr



Drupal Modules
      Contributed Modules Continued

      •     Apache Solr Multisite Search - Search across multiple
            sites with Solr
      •     Facet API - Abstract facet API that can be used by
            various search backends
      •     LDAP - Provides integration with LDAP services

      Custom Modules

      •     API docs search - Search API docs with Solr




Custom StreamWrappers
      Drupal’s StreamWrappers allow us to keep local copies of
      the data we need to index while maintaining control over
      how the data is displayed to the end user.

      generated
       •     Stores generated content for indexing and viewing.
       •     Makes the files viewable from the search results in
             the context of the Drupal site.
       •     Lets us store raw HTML for display from search
             results.
      github
       •     Stores GitHub content for pre-processing and indexing.
       •     Rewrites external links to this content so they reference
             the document as it lives on GitHub, for additional context.
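
      The link rewriting for the github wrapper can be sketched as
      below. This is a Python illustration only, not the module's
      PHP. The URI layout (github://owner/repo/path), the default
      branch name and the function name github_url are all
      assumptions made for the example.

```python
from urllib.parse import quote


def github_url(uri: str, branch: str = "master") -> str:
    """Map a local stream wrapper URI to the page on github.com.

    Assumes a hypothetical layout of github://<owner>/<repo>/<path>.
    """
    scheme, sep, rest = uri.partition("://")
    if scheme != "github" or not sep:
        raise ValueError(f"not a github:// URI: {uri!r}")
    parts = rest.split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected owner/repo/path in {uri!r}")
    owner, repo, path = parts
    # quote() leaves "/" alone, so nested directories survive.
    return f"https://github.com/{owner}/{repo}/blob/{branch}/{quote(path)}"
```

      For example, github://example-org/example-repo/docs/README.md
      would map to the blob URL for docs/README.md on the chosen
      branch of that (made-up) repository.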

Jenkins
      Jenkins runs a scheduled job that gathers all of the data we
      want indexed and pushes it into the main git repository as
      rendered content for the site. Once the content is in git, it
      is pulled onto the server so that our StreamWrappers can read it.

       •     Checks out the allthethings repo that runs the main
             Drupal install.
       •     Loops over each of the git repositories we are interested
             in indexing.
       •     Scans our standard documentation types and locations
             for changes and commits them to allthethings.
       •     Runs RDoc to generate Ruby Docs and commits the
             documentation to allthethings if it has changed.
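
      The "scan for changes" step can be sketched as below. This is
      a Python illustration under stated assumptions, not the actual
      Jenkins job: the file patterns, the function name sync_docs and
      the copy-into-a-destination-tree approach are hypothetical, and
      the git checkout and commit steps are left out.

```python
import filecmp
import shutil
from pathlib import Path

# Assumed documentation patterns; the deck does not list the real ones.
DOC_PATTERNS = ("README*", "*.md", "*.markdown", "*.rdoc", "*.txt")


def sync_docs(repo_dir, dest_dir, patterns=DOC_PATTERNS):
    """Copy new or changed documentation files from repo_dir to dest_dir.

    Returns the relative paths that were copied, so the caller can
    decide whether there is anything to commit.
    """
    repo_dir, dest_dir = Path(repo_dir), Path(dest_dir)
    changed, seen = [], set()
    for pattern in patterns:
        for src in repo_dir.rglob(pattern):
            rel = src.relative_to(repo_dir)
            # Skip directories, git internals and files already handled
            # by an earlier pattern (README.md matches two patterns).
            if not src.is_file() or ".git" in rel.parts or rel in seen:
                continue
            seen.add(rel)
            dst = dest_dir / rel
            if dst.exists() and filecmp.cmp(src, dst, shallow=False):
                continue  # content is identical, nothing to do
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            changed.append(rel)
    return changed
```

      Running it twice in a row on an unchanged repository returns an
      empty list the second time, which is the signal to skip the
      commit.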



Scanning Content for Indexing
      Before we can index content in Solr we need to identify
      what should be indexed. Once identified, the file is tracked
      in MySQL so that it can be processed efficiently.

       •     Cron is used to pull down changes Jenkins may have
             pushed.
       •     Each of the StreamWrapper file directories is scanned
             for valid content.
       •     A hash of the content is generated with the timestamps
             to help target what should be indexed.
       •     Database record includes uri, hash, timestamp, type,
             mimetype and status.
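
      A minimal sketch of the tracking record, written in Python for
      illustration rather than taken from the module. The field names
      come from the list above; the choice of SHA-256, the meaning of
      "type" (here the wrapper scheme) and the status value are
      assumptions.

```python
import hashlib
import mimetypes
from pathlib import Path

STATUS_PENDING = 0  # assumed status code for "not yet indexed"


def build_record(root, path, scheme):
    """Describe one scanned file as a row for the tracking table."""
    root, path = Path(root), Path(path)
    timestamp = int(path.stat().st_mtime)
    # Hash the content together with the timestamp, as the slide describes.
    digest = hashlib.sha256()
    digest.update(path.read_bytes())
    digest.update(str(timestamp).encode("ascii"))
    mimetype, _ = mimetypes.guess_type(path.name)
    return {
        "uri": f"{scheme}://{path.relative_to(root).as_posix()}",
        "hash": digest.hexdigest(),
        "timestamp": timestamp,
        "type": scheme,
        "mimetype": mimetype or "application/octet-stream",
        "status": STATUS_PENDING,
    }


def needs_indexing(record, known_hashes):
    """True when the file is new or its hash differs from the stored one."""
    return known_hashes.get(record["uri"]) != record["hash"]
```

      Here known_hashes stands in for a lookup against the database
      table, keyed by uri.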




Passing Content to Solr
      For each of the scanned documents we need to build a Solr
      document to be used in search results.

       •     Evaluate the content and render it using the
             github-markup gem if necessary.
       •     Evaluate the content for HTML tags to assist with
             surfacing content in searches.
       •     Identify a good title for the document by searching for
             title and h1 tags.
       •     Send the completed document to Solr for indexing.
       •     Update our scanned document’s status to indicate it has
             been indexed.
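
      The title step can be sketched with Python's standard HTML
      parser. This is an illustration, not the module's code: the
      preference for title over h1, the fallback argument and the
      names TitleFinder and pick_title are assumptions.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of the first non-empty <title> and <h1>."""

    def __init__(self):
        super().__init__()
        self.found = {"title": None, "h1": None}
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._current is None and tag in self.found and self.found[tag] is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            self.found[tag] = text or None
            self._current = None


def pick_title(html, fallback):
    """Prefer <title>, then <h1>, then the caller's fallback."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.found["title"] or finder.found["h1"] or fallback
```

      A file name is a reasonable fallback for documents that have
      neither tag, such as plain text files.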




Create Facets with FacetAPI
      The Facet API module is used to create custom facets. We
      wanted facets that filter by API source and content type.

       •     During generation of the Solr document populate the
             ss_apisource attribute.
       •     FacetAPI provides a block for each content type. This
             corresponds with the entity_type attribute in our Solr
             document.
       •     Implement hook_facetapi_facet_info to provide the
             definition of the facet.
       •     Use apidocs_search_map_source to map different
             sources to labels.
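
      The data flow behind the API Source facet can be sketched as
      below. This is a Python illustration of the idea only; the real
      facet is defined in PHP through hook_facetapi_facet_info. The
      ss_apisource field name comes from the slide, while the other
      document fields, the labels and the helper names are
      assumptions.

```python
from collections import Counter

# Hypothetical labels; in the module this mapping is the job of
# apidocs_search_map_source, whose real values are not shown in the deck.
SOURCE_LABELS = {
    "generated": "Generated API docs",
    "github": "GitHub",
}


def map_source(source):
    """Return a human-readable label, falling back to the raw value."""
    return SOURCE_LABELS.get(source, source)


def build_document(uri, source, title, body):
    """Build a dict shaped like a Solr document with a source attribute."""
    return {
        "id": uri,
        "label": title,
        "content": body,
        "ss_apisource": source,  # the attribute the facet filters on
    }


def facet_counts(documents):
    """Count documents per source label, as a facet block would show."""
    return Counter(map_source(doc["ss_apisource"]) for doc in documents)
```

      In the real site Solr computes these counts; the sketch only
      shows why the attribute has to be populated when the document
      is generated.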




Drush Integration
      It’s always a good idea to start with Drush while building
      advanced tools. This provides easier development,
      troubleshooting and maintenance capabilities.

       •     apidocs-clean - Removes file references from the database
             that no longer exist in the filesystem.
       •     apidocs-index - Indexes files referenced in
             {apidocs_search_files}.
       •     apidocs-scan - Scans existing documentation to record
             references in the database.
       •     apidocs-markup - Parses a GitHub Flavored Markdown
             file into markup.
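
      The check behind apidocs-clean can be sketched as below, in
      Python for illustration. The scheme-to-directory mapping and
      the function name split_missing are assumptions; the real
      command works against the {apidocs_search_files} table.

```python
from pathlib import Path


def split_missing(uris, roots):
    """Separate tracked URIs into those still on disk and those gone.

    roots maps a wrapper scheme (for example "generated") to the
    directory that backs it; this mapping is assumed for the sketch.
    """
    kept, missing = [], []
    for uri in uris:
        scheme, _, rel = uri.partition("://")
        root = roots.get(scheme)
        if root is not None and (Path(root) / rel).is_file():
            kept.append(uri)
        else:
            missing.append(uri)
    return kept, missing
```

      The missing list is what a clean-up command would delete from
      the tracking table.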




Custom apidocs_search Module
      The bulk of our customizations live in the
      apidocs_search module. This module is available in a
      sandbox on drupal.org for your inspection.

       •     apidocs_search.index.inc - Manages Solr indexing
       •     apidocs_search.install - Manages the
             apidocs_search_files schema.
       •     apidocs_search_markup.rb - Uses the github-markup
             gem to render GitHub Flavored Markdown
       •     apidocs_search_streamwrappers.inc - Provides a
             generated documentation and github stream wrapper
       •     apidocs_search.module - Provides the necessary
             callbacks and methods to make it all work


Resources and Links
      Developers
       •     cyberswat - http://drupal.org/user/27802
       •     pwolanin - http://drupal.org/user/49851
       •     faoiseamh - http://drupal.org/user/1999750
       •     psynaptic - http://drupal.org/user/93429
       •     aastaneh - http://drupal.org/user/2318122
       •     ChrisRut - http://drupal.org/user/597820

      More Reading
      •     https://www.acquia.com/blog/finding-all-things-engineering-hackathon
      •     http://www.slideshare.net/cyberswat/drupalcon-sydney


Resources and Links
      Contrib Modules
      •     http://drupal.org/project/acquia_connector
      •     http://drupal.org/project/apachesolr
      •     http://drupal.org/project/apachesolr_attachments
      •     http://drupal.org/project/apachesolr_multisitesearch
      •     http://drupal.org/project/facetapi
      •     http://drupal.org/project/ctools
      •     http://drupal.org/project/ldap

      Custom Modules
      •     http://drupal.org/sandbox/pwolanin/1801674



Acquia is Hiring in Australia
                              (and elsewhere)
                   https://www.acquia.com/careers


                         We Need Your Feedback
                             http://sydney2013.drupal.org/node/348



