This outlines a 24-hour hackathon project at Acquia that combines generated API documentation and docs from GitHub-hosted resources into a single indexable interface managed by Solr and Drupal.
1. CODING & DEVELOPMENT | KEVIN BRIDGES | FEBRUARY 8 2013
Search All the Things
Friday, February 8, 13
2. Introduction
Kevin Bridges
• Senior Software Engineer, Cloud
Systems at Acquia
• Avid technologist who believes
Drupal is a component of larger
systems.
• http://drupal.org/user/27802 -
aka cyberswat
• https://twitter.com/cyberswat
3. The Problem
Large organizations have lots of data that can be in multiple
formats. Different teams can use different tools and
services, which makes a cohesive interface difficult.
• Data hosted with services like GitHub
• Internal APIs
• Wikis
• Documents and text files
This data can span multiple languages and formats. How
can we combine all of these sources into a single interface
that is easy to use while maintaining context?
4. Engineering Week Hackathon
We had 24 hours to solve the problem.
• Build a Drupal 7 site
• Integrate with LDAP over SSL for secure access
• Serve generated API docs, such as RDoc output
• Index generated docs and GitHub docs for searching
• Enable an effective faceted search
5. The Team
We needed a few specialists to pull this off: 3 Drupal
developers, 1 Drupal themer, and 2 operations hackers.
• Kevin Bridges (@cyberswat) - Drupal & DevOps
• Peter Wolanin (@pwolanin) - Drupal & Solr
• Peter Jackson (@faoiseamh) - Drupal & DevOps
• Richard Burford (@psynaptic) - Drupal Themer
• Amin Astaneh (@aastaneh) - Operations
• Chris Rutter (@ChrisRut) - Operations
6. Drupal Modules
We used 6 contributed modules to accelerate our
development efforts. We also created 1 custom
module, which currently lives in a drupal.org sandbox.
Contributed Modules
• Acquia Connector - Contains the Acquia Search
module which provides integration between a Drupal site
and Acquia's hosted search service
• Apache Solr - Integrates Drupal with the Apache Solr
search platform
• Apache Solr Attachments - Allows searching within file
attachments from Solr
7. Drupal Modules
Contributed Modules Continued
• Apache Solr Multisite Search - Search across multiple
sites with Solr
• Facet API - Abstract facet API that can be used by
various search backends
• LDAP - Provides integration with LDAP services
Custom Modules
• API docs search - Search API docs with Solr
8. Custom StreamWrappers
Drupal’s StreamWrappers allow us to keep local copies of
the data we need to index while maintaining control over
how the data is displayed to the end user.
generated
• Store generated content for indexing and viewing.
• Allow the files to be viewable from the search results in
the context of the Drupal site.
• Allow us to store raw HTML for display from search
results.
github
• Store GitHub content for pre-processing and indexing.
• Rewrite external links to this content so they reference the
document as it lives on GitHub, for additional context.
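As a rough illustration of the idea (not the module's PHP stream wrapper code), the sketch below resolves URIs under the two schemes to local paths and derives a GitHub URL for a github:// document. The directory layout, the ROOTS mapping, the organization name example-org, and the master branch are all assumptions made for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical local directories backing each scheme (not from the slides).
ROOTS = {
    "generated": "/var/www/files/generated",
    "github": "/var/www/files/github",
}


def local_path(uri):
    """Translate scheme://repo/path into a path under that scheme's root."""
    parts = urlsplit(uri)
    if parts.scheme not in ROOTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    relative = PurePosixPath(parts.netloc) / parts.path.lstrip("/")
    if ".." in relative.parts:
        raise ValueError("path traversal is not allowed")
    return str(PurePosixPath(ROOTS[parts.scheme]) / relative)


def github_url(uri, org="example-org", branch="master"):
    """Build the URL of a github:// document as it lives on GitHub.

    The org and branch defaults are placeholders for illustration.
    """
    parts = urlsplit(uri)
    if parts.scheme != "github":
        raise ValueError("only github:// URIs map back to GitHub")
    return f"https://github.com/{org}/{parts.netloc}/blob/{branch}{parts.path}"
```

Keeping the scheme-to-directory mapping in one place means search results can link to the local copy for display while still pointing readers at the original document on GitHub.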
9. Jenkins
Jenkins runs a scheduled job that gathers all of the data we want
indexed and pushes it into the main git repository as
rendered content for the site. Once the content is in git, it is
pulled onto the server so that our StreamWrappers can work with it.
• Checks out the allthethings repo that runs the main
Drupal install.
• Loops over each of the git repositories we are interested
in indexing.
• Scans our standard documentation types and locations
for changes and commits them to allthethings.
• Runs RDoc to generate Ruby Docs and commits the
documentation to allthethings if it has changed.
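The "copy only what changed" step of a job like this could be sketched as follows. This is an illustration under assumptions, not the actual Jenkins job: the DOC_PATTERNS list and the sync_docs helper are hypothetical, and the git checkout, RDoc run, and commit steps are left out.

```python
import filecmp
import shutil
from pathlib import Path

# Assumed documentation locations; the real job's list is not given in the slides.
DOC_PATTERNS = ("README*", "docs/**/*.md")


def sync_docs(repo_dir, dest_dir, patterns=DOC_PATTERNS):
    """Copy new or changed documentation files; return their relative paths."""
    repo_dir, dest_dir = Path(repo_dir), Path(dest_dir)
    changed = []
    for pattern in patterns:
        for src in sorted(repo_dir.glob(pattern)):
            if not src.is_file():
                continue
            rel = src.relative_to(repo_dir)
            dest = dest_dir / rel
            # Compare file contents, not just size and mtime.
            if dest.exists() and filecmp.cmp(src, dest, shallow=False):
                continue  # unchanged since the last run
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            changed.append(rel.as_posix())
    return changed
```

A job built this way would commit to allthethings only when the returned list is non-empty, which keeps the repository history free of no-op commits.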
10. Scanning Content for Indexing
Before we can index content in Solr, we need to identify
what should be indexed. Once identified, the file is tracked
in MySQL so that it can be processed efficiently.
• Cron is used to pull down changes Jenkins may have
pushed.
• Each of the StreamWrapper file directories is scanned
for valid content.
• A hash of the content is generated together with the timestamp
to help target what should be indexed.
• Each database record includes uri, hash, timestamp, type,
mimetype and status.
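The scan described above might look roughly like this. It is a sketch, not the module's code: sqlite3 stands in for MySQL so the example runs anywhere, the column types and the meaning of the status values are assumptions, and combining the file contents with the modification time is only one plausible reading of "a hash of the content with the timestamps".

```python
import hashlib
import mimetypes
import os
import sqlite3

# Column names follow the slide; the types and status values are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS apidocs_search_files (
    uri TEXT PRIMARY KEY,
    hash TEXT NOT NULL,
    timestamp INTEGER NOT NULL,
    type TEXT NOT NULL,
    mimetype TEXT,
    status INTEGER NOT NULL DEFAULT 0
)
"""


def scan(conn, root, scheme):
    """Record every file under root; return the URIs that are new or changed."""
    conn.execute(SCHEMA)
    pending = []
    for dirpath, dirs, files in os.walk(root):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            mtime = int(os.path.getmtime(path))
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read() + str(mtime).encode()).hexdigest()
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            uri = f"{scheme}://{rel}"
            row = conn.execute(
                "SELECT hash FROM apidocs_search_files WHERE uri = ?", (uri,)
            ).fetchone()
            if row and row[0] == digest:
                continue  # unchanged, keep its current status
            # status 0 is assumed to mean "needs indexing".
            conn.execute(
                "REPLACE INTO apidocs_search_files VALUES (?, ?, ?, ?, ?, 0)",
                (uri, digest, mtime, scheme, mimetypes.guess_type(name)[0]),
            )
            pending.append(uri)
    conn.commit()
    return pending
```

Because unchanged files are skipped by comparing hashes, a second scan of the same directory returns an empty list and leaves already-indexed rows alone.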
11. Passing Content to Solr
For each of the scanned documents we need to build a Solr
document to be used in search results.
• Evaluate the content and render it using the github-markup
gem if necessary.
• Evaluate the content for HTML tags to assist with
surfacing content in searches.
• Identify a good title for the document by searching for
title and h1 tags.
• Send the completed document to Solr for indexing.
• Update our scanned document’s status to indicate it has
been indexed.
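The title-detection step can be sketched with a small HTML parser. The sketch prefers the title tag and falls back to the first h1, then to the URI; that precedence is an assumption, as are the field names other than ss_apisource. Sending the document to Solr is deliberately left out.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of the first <title> and the first <h1> element."""

    def __init__(self):
        super().__init__()
        self.found = {}       # tag name -> collected text
        self._current = None  # tag currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and tag not in self.found and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.found[tag] = text
            self._current = None


def build_document(uri, html, source):
    """Build a dict shaped like a Solr document (field names are illustrative)."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    title = finder.found.get("title") or finder.found.get("h1") or uri
    return {"id": uri, "label": title, "content": html, "ss_apisource": source}
```

Falling back to the URI guarantees every search result has something to display, even for fragments that contain neither tag.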
12. Create Facets with FacetAPI
Facet API is used to create custom facets. We wanted
facets that allow filtering by API source and content type.
• During generation of the Solr document, populate the
ss_apisource attribute.
• FacetAPI provides a block for each content type. This
corresponds with the entity_type attribute in our Solr
document.
• Implement hook_facetapi_facet_info to provide the
definition of the facet.
• Use apidocs_search_map_source to map different
sources to labels.
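To make the mapping step concrete, here is a small language-neutral sketch of turning raw ss_apisource values into display labels. It does not reproduce apidocs_search_map_source, whose implementation is not shown in the slides: the SOURCE_LABELS table, the map_source helper, and the label strings are all hypothetical. In the real site Solr computes the facet counts; the counting here only shows what the facet block would present.

```python
from collections import Counter

# Hypothetical mapping of raw ss_apisource values to display labels.
SOURCE_LABELS = {
    "generated": "Generated API docs",
    "github": "GitHub",
}


def map_source(value):
    """Return a readable label, falling back to the raw value."""
    return SOURCE_LABELS.get(value, value)


def facet_counts(documents):
    """Count documents per source label, as a facet block would list them."""
    return Counter(map_source(doc.get("ss_apisource", "")) for doc in documents)
```

Falling back to the raw value means a newly added source still shows up in the facet before anyone gives it a friendly label.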
13. Drush Integration
It’s always a good idea to start with Drush while building
advanced tools. This provides easier development,
troubleshooting and maintenance capabilities.
• apidocs-clean - Removes file references from the database
that no longer exist in the filesystem.
• apidocs-index - Indexes files referenced in
{apidocs_search_files}.
• apidocs-scan - Scans existing documentation to record
references in the database.
• apidocs-markup - Parses a GitHub Flavored Markdown
file into markup.
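The behavior described for apidocs-clean can be sketched as below. This is not the Drush command itself: sqlite3 stands in for the Drupal database, the clean helper and its roots argument (scheme to local directory) are hypothetical, and rows with an unknown scheme are left alone as a conservative choice.

```python
import os
import sqlite3


def clean(conn, roots):
    """Delete rows whose file is gone; roots maps scheme -> local directory."""
    removed = []
    rows = conn.execute("SELECT uri FROM apidocs_search_files ORDER BY uri").fetchall()
    for (uri,) in rows:
        scheme, _, rel = uri.partition("://")
        root = roots.get(scheme)
        if root is None:
            continue  # unknown scheme: do not guess, leave the row in place
        if not os.path.exists(os.path.join(root, rel)):
            conn.execute("DELETE FROM apidocs_search_files WHERE uri = ?", (uri,))
            removed.append(uri)
    conn.commit()
    return removed
```

Returning the removed URIs gives a command-line wrapper something to print, which helps when troubleshooting why a document dropped out of the index.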
14. Custom apidocs_search Module
The bulk of our customizations were focused in the
apidocs_search module. This module is available in a
sandbox on drupal.org for your inspection.
• apidocs_search.index.inc - Manages Solr indexing
• apidocs_search.install - Manages the
apidocs_search_files schema.
• apidocs_search_markup.rb - Uses the github-markup
gem to render GitHub Flavored Markdown
• apidocs_search_streamwrappers.inc - Provides the
generated and github stream wrappers
• apidocs_search.module - Provides the necessary
callbacks and methods to make it all work
17. Acquia is Hiring in Australia
(and elsewhere)
https://www.acquia.com/careers
18. CODING & DEVELOPMENT | KEVIN BRIDGES | FEBRUARY 8 2013
Search All the Things
We Need Your Feedback
http://sydney2013.drupal.org/node/348