LUCENE REVOLUTION San Francisco 2011




Welcome to San Francisco!
We are excited to be bringing you the second Lucene Revolution event, following quickly on the
success of our 2010 conference in Boston. In addition to all the great feedback we received
after Boston, many people asked about bringing the conference to the West Coast – and here we
are. It’s great to host the community here in our home state of California.
There’s now no question: the revolution is in full swing, and Lucene and Solr are shaping the future
of search. The diverse range of search technology and applications is without a doubt one of its
greatest strengths. For the extended community and ecosystem of open source search, Lucene
Revolution is an unmatched opportunity to learn, network, share experiences, and see how others have
changed the world of search.
Speakers here at the conference hail from companies large and small, from innovative startups and
established companies, as well as from government, academia and non-profits. Even better, the
range of experience and application interests of your fellow-attendees should inspire you to seek out
new ways to put search technology to work. We’ve allotted ample time in breaks for formal and
informal conversations. And be sure to join the Revolution social network at:
http://lucene.crowdvine.com/. Keep an eye out at the Registration Desk for agenda changes and
updates.
One group you should definitely seek out here is the core group of developers and committers who
are the heart and soul of the Apache Lucene/Solr project. You know them from the mailing lists;
these are the people who do the hard work of making the code do its magic, resolving the challenging
technical and architectural issues so that we all benefit. Don’t just attend their roadmap panel and
technical sessions; make sure you avail yourself of the opportunity to put faces to names, so that
when you’re on the mailing lists, you’ll have more than a ‘to’ and a ‘from’ to go by.
As the commercial entity for Lucene/Solr, we at Lucid Imagination are always looking for new ways
to help you make the most of open source search. Be sure to tell us what you like, what could be
improved, and what topics should be covered in future events. Think about sharing your own
successes with the community by speaking at the next Lucene Revolution.
Let the conference staff, or anyone on the Lucid Imagination team, know if you have any questions,
or if there’s anything you need.
Onward to the revolution!
Eric Gries, CEO
Lucid Imagination







Opening Letter
Contents
Timetable at a Glance
Agenda
About Lucid Imagination
About Our Sponsors
Training
Keynotes
Sessions – Day 1
Lightning Talks
Sessions – Day 2
Speaker Bios
Hotel, Maps & Transportation Info




Lucene, Apache Lucene, Solr, Apache Solr, Hadoop, Apache Hadoop and other Apache projects mentioned are trademarks of The Apache Software Foundation.







SUNDAY MAY 22
16:00 – 18:00   REGISTRATION OPEN
                Sandpebble Foyer outside Grand Peninsula Ballroom

MONDAY MAY 23
8:00 – 9:00     TRAINING REGISTRATION OPEN
9:00 – 17:00    Training Workshops/Day 1
                • Solr Application Development Workshop
                • Developing Search Applications with LucidWorks Enterprise
                • Lucene Application Development Workshop
                • Scaling Search with Solr and Big Data
                See registration desk in Sandpebble Foyer for room assignment.

TUESDAY MAY 24
8:00 – 9:00     TRAINING REGISTRATION OPEN
9:00 – 17:00    Training Workshops/Day 2
                • Solr Application Development Workshop
                • Developing Search Applications with LucidWorks Enterprise
                • Lucene Application Development Workshop
                • Scaling Search with Solr and Big Data
16:00 – 18:00   Ticket Pickup for Giants Game (advance tickets required)
                Tickets may be picked up at the Conference Registration Desk in the Sandpebble Foyer.
18:00           Buses depart for Giants Game from front entrance of Hyatt Hotel






WEDNESDAY MAY 25
7:30 – 18:00    REGISTRATION OPEN
7:30 – 8:30     Light Breakfast Available
8:30 – 10:05    Welcome & Keynotes
                Welcome: Eric Gries, Lucid Imagination
                Keynotes: Marc Krellenstein, Lucid Imagination
                          Stephen Dunn, The Guardian News and Media
10:05 – 10:35   BREAK
10:35 – 11:25   Technical Track Sessions
11:25 – 11:35   BREAK
11:35 – 12:25   Technical Track Sessions
12:25 – 13:30   LUNCH AND SPONSOR EXHIBITS
13:30 – 14:20   Technical Track Sessions
14:20 – 14:30   BREAK
14:30 – 15:20   Technical Track Sessions
15:20 – 15:50   BREAK
15:50 – 16:40   Panel: “Stump the Chump”
16:40 – 17:00   BREAK
17:00 – 18:30   Lightning Talks
18:30           REVOLUTION PARTY

THURSDAY MAY 26
7:45 – 8:45     Light Breakfast Available
8:45 – 10:15    Keynote: Stephen O’Grady, RedMonk
                Panel: Committers Q&A, Lucene/Solr Roadmap
10:15 – 10:45   BREAK
10:45 – 11:35   Technical Track Sessions
11:35 – 11:45   BREAK
11:45 – 12:35   Technical Track Sessions
12:35 – 13:45   LUNCH AND SPONSOR EXHIBITS
13:45 – 14:35   Technical Track Sessions
14:35 – 14:45   BREAK
14:45 – 15:35   Technical Track Sessions
15:35 – 15:45   BREAK
15:45 – 16:35   Technical Track Sessions
16:35 – 17:30   Panel: “Search for Tomorrow (RDBMS for Yesterday)”
17:30           CONFERENCE ENDS






LOGISTICS
       •   REGISTRATION is in the Grand Peninsula Foyer
       •   KEYNOTES and PANEL DISCUSSIONS are in Grand Peninsula Ballroom D
       •   TRACK 1 is in Grand Peninsula Ballroom A/B/C
       •   TRACK 2 is in Grand Peninsula Ballroom D
       •   TRACK 3 is in Grand Peninsula Ballroom E/F/G
       •   TRACK 4 is in Sand Pebble A/B/C
       •   LUNCHES are in the Atrium (upstairs, above the Ballroom)
       •   THE REVOLUTION PARTY is in the Grand Peninsula Foyer
       •   TRAINING CLASSES will be held in the Sandpebble Conference Rooms
       •   TRAINING REGISTRATION is outside the Sandpebble Conference Rooms
              (please contact charelm@gmail.com if you are unsure which class you are in)








As the world’s leading source of expertise in open source search technology and the
commercial company for Apache Solr/Lucene, Lucid Imagination offers the products and
services you need for cost-effective development and production deployment of cutting-edge search
applications that lower your cost of growth. Thousands of organizations around the world have
turned to the power of Apache Solr/Lucene open source technology to drive their cutting-edge
search applications.

LucidWorks: Enterprise Grade Solr/Lucene
LucidWorks Enterprise is a flexible, cost-effective, scalable platform that simplifies development,
tuning, configuration and deployment of Solr/Lucene open source search technology. It features:

POWERFUL SEARCH
•   Complete Apache Solr 4.x Release: integrated and tested with powerful enhancements
•   Scalability: distributed search and indexing
•   Cloud-Ready: centrally managed search replication and configuration
•   REST API: simplifies integration
SIMPLIFIED ADMINISTRATION
•   Easy-to-use Installer & Admin UI: streamlines startup and common configuration tasks
•   Data Connectors: for databases, file systems, Web sites, SharePoint and more
•   Multiple file types: MS Office, PDF, native XML format documents and more
•   Security: LDAP-aware, document-level, role-based, policy-driven
ADVANCED USER EXPERIENCE
•   Enriched Query Parsing: more resilient interpretation of user input
•   Click Scoring: boosts results based on user behavior
•   User Alerts: automatic notification of new results
•   Integrated auto-complete and spellchecking







Global Expertise: Training & 24x7 Services
Lucid Imagination offers a deep bench of resources in search and open source, backed by
unmatched experience with thousands of diverse search applications at the world’s largest
companies.
TRAINING
A comprehensive selection of courses and classes for developers, system administrators, managers,
and search application users on LucidWorks Enterprise, Solr and Lucene; instruction is offered in
a variety of formats around the world.
CONSULTING
Our unique ExpertLink Advisory Services provides consultative guidance on design and
optimization for search applications during development and production to ensure your
Lucene/Solr implementations meet the requirements of your business.
ENTERPRISE SUPPORT AND SUBSCRIPTIONS
Lucid Imagination offers attractively priced subscriptions that deliver Solr/Lucene technology in an
integrated, well-packaged format. Subscriptions combine stability, security, robust interfaces, and
predictable release schedules with unmatched support resources in reach 24 x 7 x 365 across the
globe.








Platinum Sponsor: Basis Technology
Basis Technology provides software solutions for multilingual text analytics, information retrieval,
and name resolution. Our Rosette© Linguistics Platform is the text analysis engine behind many
commercial and government search-based applications, adding language support to Lucene and Solr
for better search precision and recall in English or 27 other languages. Starting with language
identification in 55 languages, our high quality linguistic analysis seamlessly integrates into Lucene
and Solr via a connector — enabling customizable tokenization and stemming/lemmatization for
languages like Chinese, Japanese, Arabic, and Persian. Dictionary-based decompounding is available
in German, Dutch, Danish, Swedish, Norwegian, and Korean. Entity extraction enriches search by
adding auto-generated metadata and faceted navigation to results. Adding support for a new
language to Solr takes less than a day’s work.
The Rosette Platform powers search, business intelligence, e-discovery, and other enterprise and
government applications for customers worldwide including: Microsoft/Bing, Cisco, EMC, Endeca,
Oracle, and Yahoo!
www.basistech.com







Exhibitors
SALESFORCE.COM
Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing
company. We’re also one of the “Best Places to Work” (FORTUNE). Salesforce.com’s Search Team
is strong and experienced, with deep architecture expertise. We’re dedicated to delivering the fastest,
most reliable cloud-scale enterprise search in the industry. In addition to innovating around
scalability and security, we strive to delight our end users with an original, intuitive user experience
and relevancy that’s adaptive, robust, and deeply satisfying. If you share our passion for search and
for solving tough problems, swing by our booth to chat.
www.salesforce.com
SEARCH TECHNOLOGIES
Search Technologies is the leading independent provider of search engine integration and support
services. Operating internationally, we help clients to gain business advantage using search. Our
technical team of more than 80 experts is the most experienced group of search implementation
professionals globally, and this mitigates risk for our customers. In short, we are the experts at fine-
tuning search applications to deliver business benefits.
www.searchtechnologies.com
DOCUMILL
Documill is an independent software vendor (ISV) enabling browser-based access to Microsoft
Office and PDF documents and empowering high volume server-side content processing
solutions. Documill Visual Search dramatically improves search user experience and discoverability
of multi-page documents. Instant document previews and page-level search results improve
document data mining experience and accuracy. With page-level bookmarking features, Documill
Visual Search enables collaborative search, allowing users to take actions based on their findings,
share results and syndicate relevant pages into new documents.
www.documill.com






Community Sponsors
SEMATEXT
Sematext is a software products and services company focused on Search & Analytics using Lucene,
Solr, Nutch, Hadoop, HBase, Flume, Mahout, and other open-source technologies. Sematext also
offers Lucene & Solr technical support subscriptions, consulting packages, and training. The
company also runs the popular search-hadoop.com and search-lucene.com sites. Founded in 2007 in
New York, Sematext is privately held and self-funded with presence in North America and Europe.
Sematext’s customers include The Library of Congress, Lockheed Martin, Simon & Schuster,
Salesforce, NAVTEQ, Comcast, Cox Communications, ProQuest, Citysearch, Gilt Groupe,
Autodesk, and many others.
www.sematext.com
EMC CORPORATION
EMC Corporation is the world’s leading developer and provider of information infrastructure
technology and solutions that enable organizations of all sizes to transform the way they compete
and create value from their information. We can help you design, build, and manage flexible, scalable,
and secure information infrastructures. And with these infrastructures, you’ll be able to intelligently
and efficiently store, protect, and manage your information so that it can be made accessible,
searchable, shareable, and, ultimately, actionable. In short, with an information infrastructure, you
can avoid the potentially serious risks and reduce the significant costs associated with managing
information, while fully exploiting its value for business advantage.
www.emc.com
SPRINGSOURCE, A DIVISION OF VMWARE, INC.
SpringSource, a division of VMware, Inc. (NYSE: VMW), employs the open source leaders who
created and drive innovation for Spring, the de facto standard programming model for enterprise
Java applications, as well as the Java and web thought leaders within the Apache Tomcat, Apache
HTTP Server, RabbitMQ, Hyperic, Groovy and Grails open source communities. SpringSource
forges open source innovations to create lean and powerful technology that people love to use.
From high productivity developer tools and frameworks to lightweight application server runtimes,
including data management solutions for the hardest enterprise and cloud scale problems,
SpringSource provides solutions for tomorrow’s enterprise challenges.
www.springsource.com






MANNING PUBLICATIONS
Manning Publications offers computer books for professionals—programmers, system
administrators, designers, architects, managers and others. Manning’s focus is on computing titles at
professional levels. We care about the quality of our books. Our books are designed without
gimmicks. Their main goal is elegance and readability—we feel the two are often the same. Our
covers are understated, decorated with pictures of worldwide regional dress habits of two hundred
years ago. Many of our books come with online reader support: authors answer the questions of
their readers in our Web-based Author Online discussion forums.
www.manning.com
DZONE
DZone is a social linking and blogging network for the developer and IT communities. According to
PC Magazine, “DZone is a developer’s dream—a vast network of user-submitted links to message
boards, news, coding tricks, and more.” Launched in June, 2006, DZone is in Alexa’s top 3000 sites,
surpassing established leaders like DevX, Sys-con, FTP Online and TheServerSide.com. DZone is
the only vertically focused site regularly listed among the web’s largest social bookmarking sites. In
its first year of operation DZone sent over 5 million visitors to other developer websites. Today,
DZone has curated topic pages for Java, Solr/Lucene, Cloud Computing, PHP, Agile, Mobile, and
much more.
www.dzone.com
TNR GLOBAL
TNR Global is a systems design and integration company focused on enterprise search and cloud
computing solutions. TNR develops scalable, fault-tolerant web-based search solutions built on the
open source LAMP stack and utilizing Amazon Web Services and/or physical servers. TNR has
over ten years of experience in web systems and enterprise search implementations, both proprietary
and open source, and specializes in Lucene/Solr and FAST ESP search applications. TNR Global
builds solutions for: Vertical Search Engines, Publishing, Web Directories, News Sites, Information
Portals, Web Catalogs, Education. We also work with web based startups to build scalable services.
www.tnrglobal.com
UCHIDA SPECTRUM
Uchida Spectrum, Inc. (USI) is a leader in the Japan search market. USI provides SMART/InSight, a
search application that integrates and analyzes enterprise information. SMART/InSight is used by
leading blue chips, like Canon and Moody’s. USI is working with Lucid Imagination as its Strategic
Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support
services. In 2011, USI expanded its offerings to Enterprise Search and Web Services/Ecommerce
companies across Asia. USI now serves clients and partners in Japan, India, China and Singapore.
www.spectrum.co.jp







Scaling Search With Big Data And Solr
Scaling Search with Big Data and Solr is a 2-day instructor-led, hands-on classroom training course
delivered by instructors certified by Lucid in a shared classroom setting. The class is for Solr
developers who want to know how to leverage the flexible search functionality of Apache Solr and
the Big Data processing of Apache Hadoop, to create the indexes for both general search and
augmented data analytics. Lab exercises and real-world examples will be used to reinforce content.
We’ll start with Hadoop from the ground up, and cover MapReduce, HDFS (the Hadoop
Distributed File System), cluster management, “the shuffle,” etc., before continuing on to connecting
it to Solr. We’ll look at common use cases for generating search indexes from big data, typical
patterns for the data processing workflow, and how to make it all work reliably at scale. We will
explore in-depth an example of processing 1 billion records to create a faceted Solr search solution.
You’ll learn how Solr can be used as a NoSQL solution, and how it compares to classic NoSQL
projects such as Cassandra and HBase.
The class will continue with techniques for scaling your Solr installation, how to identify bottlenecks
in your Solr installation, how to monitor your installation, and how to determine resource usage. We’ll
also cover various Solr architectures, their characteristics and use cases. We’ll examine how to apply
these to make appropriate tradeoffs to effectively scale your Solr installation.
THE COURSE COVERS
       •   An overview of Hadoop.
       •   Understanding MapReduce.
       •   Principles of Hadoop development, operations & eco-system.
       •   How to use Hadoop with Solr.
       •   How to index large volumes of data.
       •   How to effectively search large indexes.
       •   Understanding NoSQL.
       •   How to shard/federate/replicate your data for large indexes.
       •   Understanding resource costs & tradeoffs for Solr features.
PREREQUISITES
Prospective students should be familiar with Solr, obtained either through work experience with
Solr, or having completed the Lucid Imagination Solr training course. It is assumed the student does
not have prior Hadoop experience.






Developing Search Applications With LucidWorks Enterprise
Developing Search Applications with LucidWorks Enterprise is a 2-day instructor-led, hands-on
classroom training course designed and developed by the engineers who developed LucidWorks
Enterprise (LWE), and delivered by instructors certified by Lucid in a shared classroom setting.
The objective of this course is to introduce LucidWorks Enterprise to users with no previous
experience working with search applications. Through a combination of lectures and hands-on lab
exercises you will learn how to get up and running with LucidWorks Enterprise, what the
components of a search application are, and how to make your content searchable and findable in a
search application built on LucidWorks Enterprise. There will be time for questions and discussion
to enhance your learning experience.
At the end of the course you will know what a search application is, and how to set up and use
LucidWorks Enterprise to index and search your content. You will also learn about all of the
features of LWE, such as highlighting, spell checking, and custom alerts, and how to use these features
to build a satisfying search experience for end users who will search your content.
THE COURSE COVERS
       •   What a search application is and how to build one with LucidWorks Enterprise.
       •   How to install and configure LWE.
       •   How to make your content searchable and findable.
       •   How to work with different data sources such as web pages, relational databases, and
           rich content files.
       •   How to build queries to search for content in LWE.
       •   Techniques and features in LWE that can be used to make results for end users more
           relevant.
       •   Different ways to process search results returned by LWE.
PREREQUISITES
No programming skills are necessary; however, some technical background and familiarity with
application development will be helpful. There will be labs accompanying the lectures that will
require basic computer skills, including how to run a simple command from the command line. No
previous experience with search applications is necessary.






Solr Application Development Workshop
Solr Application Development Workshop is a two-day hands-on training course designed and
developed by the engineers who helped write the Apache Lucene/Solr code, and delivered by
instructors certified by Lucid in a shared classroom setting. The workshop is targeted at developers
who want to build applications with Apache Solr, the Lucene Search Server. You will learn how to
set up and use Solr to index and search, how to analyze and solve common problems, and how to
use optional Solr modules such as facets, spell check, and highlighting. Lab exercises and real-world
examples will be used to reinforce content.
There will be time for questions and discussion to enhance your learning experience. At the end of
the course you will understand how to set up and use Solr to index and search, how to analyze and
solve common problems, and how to use optional Solr modules such as facets, spell check, and
highlighting.
THE COURSE COVERS
       !   Principles of search application development
       !   Common search use cases and their application
       !   How to make content searchable
       !   Key Solr and Lucene concepts
       !   Basics of indexing and searching using Solr
       !   How to design and run a Solr application
       !   Best practices for indexing, searching and performance
       !   Techniques to analyze and resolve common search problems
       !   How to leverage Solr’s optional modules including spell checking, highlighting, Data
           Import Handler, Tika Integration and other popular capabilities
       !   Advanced topics in designing Solr apps and running a site
       !   Solr operations and deployment tools and strategies
       !   How to customize and extend Solr
PREREQUISITES
Some programming skill and experience with a modern programming language such as Java, PHP,
Perl, Ruby, .NET, or any language that supports HTTP and/or XML.






Lucene Application Development Workshop
Lucene Application Development Workshop is a two-day, instructor-led, hands-on training
workshop, written and led by the engineers who helped write the Apache Lucene/Solr code. The
objective of this course is to provide you with real-life use cases and teach you how to apply Lucene
to real business requirements. During the course you will learn to apply best practices in developing
scalable, highly available and high performance search applications.
There will be time for questions and discussion to enhance your learning experience.
THE COURSE COVERS
       !   Principles of search application development.
       !   Common search use cases and their application.
       !   How to make content searchable.
       !   Key Lucene concepts.
       !   Basics of indexing and searching with the Lucene APIs.
       !   Best practices for indexing, searching and performance.
       !   Analysis techniques for solving common search problems.
       !   Lucene Internals.
       !   Lucene’s optional modules to enable spell checking, highlighting and other common
           search features.
PREREQUISITES
Basic Java programming skills








The Once and Future History
of Enterprise Search and Open Source
MARC KRELLENSTEIN | LUCID IMAGINATION
While it remains challenging to build best practice search applications, core search technology has
become commoditized. Open source Lucene/Solr represents the best form of that commodity, as
good as or better than any commercial search technology while also providing the cost, control and
flexibility advantages of open source. In this talk, we’ll look at how past challenges in search were
met and new ones evolved, and the place of Lucene/Solr in that evolution.

From Publisher To Platform: How The Guardian
Embraced the Internet using Content, Search, and Open Source
STEPHEN DUNN | GUARDIAN NEWS AND MEDIA UK
In 2009 The Guardian launched The Open Platform, a suite of services and tools that enable
content partners and developers to build applications with The Guardian’s rich content. The content
API, hosted on Solr instances on EC2, contains JSON representations of all Guardian articles back
to 1999 - over 1 million articles, and is an increasingly complete representation of the output of the
organization. The DataStore contains curated data sets for use in applications and visualizations.
This talk will cover how The Guardian opened up their business, enriched it, and reached new
markets with its Open Platform strategy. Stephen will cover the technical architecture,
implementation of Solr (the key technology powering the platform), and how The Guardian has
used it to embrace disruption in the media space, while finding new sources of revenue and
innovation. Two years on from the launch, Stephen will share some of the lessons learned and
explain how The Guardian complements its use of Solr with other open-source non-relational
technology as its platform evolves.

All Data Big and Small
STEPHEN O’GRADY | REDMONK
The last twenty-four months have seen a veritable explosion in discussion around what is commonly
referred to as Big Data and the infrastructure technology employed to manage it. The wealth of
available open source software means that businesses from any industry have easily accessible tools
with which to tackle projects that would have been out of their reach just a few years prior. Less
heralded, however, has been the fact that making data actually useful - whatever its size - remains a
challenge. In this session we’ll explore the role of search in putting data - big and small - to work
answering the important questions for businesses and society by reducing the friction between
question and answer.





Integrating Advanced Text Analytics into Solr
STEVE KEARNS | BASIS TECHNOLOGY
Text analytics provides a number of interesting analytic capabilities that can enhance enterprise
search applications, though in practice it is not always obvious how these can be integrated
effectively into Solr. This presentation will describe some of the practical ways that leading
organizations are using text analytics by integrating it directly into Solr and their user interface to
improve relevance, navigate results, and discover new information. The combination of Solr and
quality text analytics can improve existing keyword search solutions, and enable new ways of
discovering knowledge hidden in existing data.

Finite State Automata in Lucene: Internals and Applications
DAWID WEISS | POZNAN UNIVERSITY OF TECHNOLOGY, POLAND
Finite state automata and transducers made it into Lucene fairly recently, but already show a very
promising impact on search performance. This data structure is rarely exploited because it is
commonly (and unfairly) associated with high complexity. During the talk, I will try to show that
automata and transducers are in fact very simple, their construction can be very efficient (memory
and time-wise) and their field of applications very broad. This will be backed by an introduction to
how FSTs are implemented in Lucene (construction and traversals) and practical use cases of where
FSTs have been useful so far. If you’d like to see how to squeeze 150MB of text data into a 1.8MB
compact data structure, this talk is for you.

Case Study - Panasonic Europe Powered by Apache Solr
DANIEL POTZINGER | AOE MEDIA GMBH
In 2010 Panasonic decided to replace its legacy enterprise search tool and switched the
search for all its European websites to an Apache Solr-based solution. Now its customers benefit
from an incredibly fast and feature-rich solution that is much more than just a search and has
become a valuable sales-driving tool for Panasonic. Features like relevancy manipulation,
autosuggest, and contextual filtering for properties like color or product category were implemented
under less than ideal circumstances, chiefly that there was no access to structured data. The
search has been rolled out in close to 30 countries so far, also putting Solr’s multilingual handling to the test.






Real-time Search at Yammer
BORIS ALEKSANDROVSKY | YAMMER, INC.
This talk will focus on the architecture, scalability concerns, performance bottlenecks,
operational characteristics and lessons learned while designing and implementing Yammer’s
distributed real-time search system. Yammer is an enterprise social network SaaS offering with over
100,000 networks (including 85% of the Fortune 100) and nearly 2 million users. The search system
we developed scales well up to 1B messages and serves as a foundation for the knowledge base analysis
services Yammer is developing.

Boosting Documents in Solr by Recency,
Popularity and Personal Preferences
TIMOTHY POTTER | NATIONAL RENEWABLE ENERGY LABORATORY (NREL)
Attendees will come away from this presentation with a good understanding of, and access to source
code for, boosting and/or filtering documents by recency, popularity, and personal preferences. My
solution improves upon the common “recipe” based solution for boosting by document age. The
framework also supports boosting documents by a popularity score, which is calculated and
managed outside the index. I will present a few different ways to calculate popularity in a scalable
manner. Lastly, my solution supports the concept of a personal document collection, where each
user is only interested in a subset of the total number of documents in the index. My presentation
will provide a good example of how to filter and/or boost results based on user preferences, which
is a very common requirement of many Web applications.

Jazzed about Solr: People as a Search Problem
JOSHUA TUBERVILLE | EHARMONY
Search oriented architectures are obvious approaches for web pages, emails, documents, and other
text based entities. Often with traditional structured data, text searching is “added on” to the
traditional Boolean queries in relational stores. When Jazzed was initiated we wanted search to be
front and center. When we evaluated Solr we realized we could take the opposite approach: “add on”
Boolean components to textual searches. This hybrid query approach makes transitioning to flexible
ranking easy and straightforward. In this talk we will cover:
       !   How we model semi-structured user data in Solr
       !   Indexing strategies and their tradeoffs
       !   Where in the Jazzed architecture Solr does and doesn’t fit
       !   What aspects of Solr we are using
       !   Future considerations






Heavy Committing: DocValues
aka. Column Stride Fields in Lucene 4.0
SIMON WILLNAUER | APACHE LUCENE PMC
Lucene 4.0 is on its way to deliver a tremendous number of new features and improvements. Besides
Real-Time Search and Flexible Indexing, DocValues (a.k.a. Column Stride Fields) is one of the “next
generation” features. DocValues enable Lucene to efficiently store and retrieve type-safe document
and value pairs in a column stride fashion, either entirely memory-resident with random access or
disk-resident and iterator-based, without the need to un-invert fields. The final goal is to provide
independently updatable per-document storage for scoring, sorting or even filtering. This talk will
introduce the current state of development, implementation details, features, and how DocValues
have been integrated into Lucene’s Codec API for full extensibility.

Search, APIs, capability management and the Sensis journey
CRAIG REES | SENSIS
Earlier this year, Sensis launched its Business Search API, which allows publishers to develop local
search propositions powered by the two million business listings contained in the Australian Yellow
Pages® and White Pages® directories.
This case study will explore Sensis’ strategic direction for search and explain how the framework and
metrics by which search is managed at Sensis were used to define our search roadmap. Key
architectural decisions including our use of Solr and MongoDB will be discussed as well as our
approach to real-time search tuning and quality management.

A Study of I/O and Virtualization Performance with
a Search Engine based on an XML database and Lucene
ED BUECHE | EMC
Documentum xPlore provides an integrated Search facility for the Documentum Content Server.
The standalone search engine is based on EMC’s xDB (Native XML database) and Lucene. In this
talk we will introduce xPlore and some of its key components and capabilities. These include aspects
of a tight integration of Lucene with the XML database: xQuery translation and optimization into
Lucene queries/APIs, as well as transactional updates to Lucene. In addition, xPlore is being deployed
aggressively into virtualized environments (both disk I/O and VM). We cover some performance
results and tuning tips in these areas.







Four Pillars of Designing the Search Experience
TYLER TATE | TWIGKIT
Lucene and Solr provide many excellent tools for presenting information to users, but what makes
some search user interfaces better than others? Should you aim for a rich, advanced UI or should
you “just make it look like Google”? Through his work at TwigKit with blue-chip corporations,
scientific institutes, and governments, Tyler has identified four guiding pillars of the search
experience:
        ! User Expertise - Novices orienteer, experts teleport
        ! User Behaviour - Lookup, learn, and investigate
        ! Information Diversity - Homogeneous vs. heterogeneous data
        ! Situational Context - factors from the surrounding environment
We’ll delve deep into each dimension and discuss how to achieve useful, usable, and beautiful
search interfaces using design patterns including: autocomplete, faceted navigation, breadcrumbs,
best bets, related searches, spelling suggestions, clickable metadata, result clustering, saved searches,
data visualisation, and more.

Using Solr in Online
Travel Shopping to Improve User Experience
ESTEBAN DONATO, SUDHAKARA KAREGOWDRA AND RAMON RESMA | TRAVELOCITY
In this talk we will present three different use cases of Solr in the travel industry. First,
we will describe how we implemented faceted navigation for hotel shopping. Then, we will
introduce how we implemented destination search features such as auto-complete and
misspelling correction. Lastly, we will show you how we integrated Solr to provide better
experiences to mobile users.

Solr @ eBay Kleinanzeigen
OLAF ZSCHIEDRICH | EBAY.DE
Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr
features are utilized, and how Solr is configured and used in production. Recommended best
practices will be profiled along with eBay Kleinanzeigen’s plans for future deployment of Solr.






Rapid Prototyping with Solr
ERIK HATCHER | LUCID IMAGINATION
Got data? Let’s make it searchable! This interactive presentation will demonstrate getting documents
into Solr quickly, will provide some tips in adjusting Solr’s schema to match your needs better, and
finally will discuss how to showcase your data in a flexible search user interface. We’ll see how to
rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will
be enough time left to outline the next steps in developing your search application and taking it to
production.

Search Analytics: What? Why? How?
OTIS GOSPODNETIC | SEMATEXT
You’ve indexed your data and people are searching it. But how do you know if they are happy with
the results? How do you know if they are finding what they need? With search increasingly
becoming the primary information access mechanism, knowing how your search is doing is not just
a matter of mere curiosity, but often has direct business impact. In this talk we’ll discuss Search
Analytics and how it can be used to answer questions like:
        ! Are too many users getting the dreaded “no matches” results?
        ! How deep into search results do people dig?
        ! Which hits are they clicking on, or what percentage of them don’t click on any hits?
        ! How much do they use the Did You Mean or Auto-Complete suggestions?
We’ll explore what specific Search Analytics reports tell us and what specific actions you should take
based on those reports.





“Stump The Chump”: Get
On The Spot Solutions To Your Real Life Solr/Lucene Challenges
GRANT INGERSOLL | LUCID IMAGINATION
Got a tough problem with your Solr or Lucene application? Facing challenges that you’d like some
advice on? Looking for new approaches to overcome a Lucene/Solr issue? Not sure how to get the
results you expected? Don’t know where to get started? Then this session is for you.
Now, you can get your questions answered live, in front of an audience of hundreds of Lucene
Revolution attendees! Back again by popular demand, “Stump the Chump” at Lucene Revolution
2011 is hosted by PMC chairman and Lucid Imagination co-founder Grant Ingersoll. All you need
to do is send in your questions to us here at info@lucenerevolution.org. You can ask anything you
like, but consider topics in areas like:
        ! Data modelling
        ! Query parsing
        ! Tricky faceting
        ! Text analysis
        ! Scalability
When you write in, please describe in detail the challenge you have faced and any approach
you have taken to solve the problem. Anything related to
Solr/Lucene is fair game. Our MC will read the questions, and Grant will have to formulate a
solution on the spot. A panel of judges will decide if he has provided an effective answer. Prizes will
be awarded by the panel for the best question—and for those deemed to have “stumped the
chump”.








Improve Relevance by Using
Morphology and Named Entity Recognition
CHRISTOPH GOLLER, DIRECTOR, RESEARCH | INTRAFIND SOFTWARE AG
This talk will show how the relevance of search results can be improved by using morphology and
named entity recognition. After briefly explaining the purpose of morphological analysis and of
named entity recognition we will analyze their potential advantages for search, faceting, and
clustering of search results. Based on these ideas we will briefly sketch how to implement a
morphological analyzer in Lucene and how to implement a natural language question answering
system based on Lucene using named entity recognition. The talk will be accompanied by a live
demo of these ideas.
BIO:
Christoph Goller has more than 10 years of experience in the search industry. He received a Ph.D. in computer science from
the Technical University of Munich, where he worked on several research projects on artificial intelligence, machine
learning and neural networks. Christoph started his career at Lernout & Hauspie. Since 2002 he has been Director of
Research at Intrafind Software AG (www.intrafind.de), a German company specializing in full-text search and text
mining based on Lucene/Solr. Christoph has been a Lucene committer since 2004. He has accompanied dozens of
commercial projects using Lucene and Solr. Christoph is author of more than 15 scientific papers, frequently gives
presentations on search related topics and is responsible for partner training at Intrafind.

Scientific Data Search
in the Pharmaceutical Industry with Solr
JEFFREY GUO, CEO | SEMTIFIC SOFTWARE, INC.
A tremendous amount of experimental information and scientific knowledge has been locked or lost
in data silos in the form of semi-structured or unstructured data in today’s pharmaceutical industry.
Out-of-the-box full-text search engines do not understand embedded scientific terms and objects
and their relationships well enough to facilitate context-sensitive and relevant searches. This presentation will
discuss a successful implementation at a major pharmaceutical company that utilizes Solr as its
enterprise search platform and enhances it with chemistry (molecular entities and reactions) search
capabilities. The scope of the document indexing process is expanded to cover embedded chemistry
objects and terms of various types such as common chemical names, corporate IDs, SMILES, and
InChI from documents. Scientifically aware search based on query structure drawing or chemical
terms is therefore enabled. Enterprise scientific search strategies and lessons learned will be
discussed during the presentation.
Bio: Founder of Semtific Software, Inc., a company that provides products and services that streamline drug discovery
workflow and enterprise search of scientific research data.


Using Lucene’s Test Framework
ROBERT MUIR | LUCID IMAGINATION
The Lucene/Solr community takes testing seriously: we have a suite of over 3500 tests to ensure
software quality. Over time we accumulated some useful extensions to JUnit testing, and several
people found themselves using our extensions for other projects. We released this “test framework”
for the first time in Lucene 3.1, and this talk is a short summary of its feature list to hopefully
encourage you to go check it out for yourself. Find out how you can:
       !   Improve test coverage for custom Lucene components.
       !   Speed up your unit test suite by running tests in parallel.
       !   Find resource leaks, localization or timezone-sensitive bugs in your application.
       !   Use our extensions to make unit tests easier to write.
Bio: Robert Muir, software engineer for Lucid Imagination, is a Lucene/Solr committer & PMC member.

Using Apache Solr and Active Directory to
unify data access across Intranet, ERP and Filesystem Cluster
ROBERT WEIßGRAEBER, PROJECT DIRECTOR | LIGHTWERK
Solr is tightly linked into all available data and business intelligence sources in the enterprise,
indexing the TYPO3 CMS-based Intranet, downloads, forms, handbooks, an Oxaion-based ERP
database, and the file system cluster running Microsoft Distributed File System, using Tika for
full-text content extraction. All data is connected via Active Directory servers into user-based
fine-grained access control lists, which are evaluated in real time and in early-binding mode by Solr. A
worldwide Solr cluster using different shards gives additional security for worldwide deployment,
e.g. keeping confidential data inside the headquarters’ own data centers.
Bio: Robert Weißgraeber is Project Director at Lightwerk, primarily specializing in designing, planning and executing
corporate portals.






Thousands of Indexes in the Cloud
SHANEAL MANEK, LEAD SEARCH ENGINEER | GREPLIN
Indexes at Greplin are strange - instead of having one giant index that is searched all the time and
updated infrequently, there are thousands of relatively small indexes that are updated much more
frequently than they are searched. These unorthodox requirements lead to an unorthodox
architecture that uses techniques inspired by Zoie and Bobo. We will discuss techniques that allowed
us to exploit the inherent shardability and access patterns of our data to build an extremely high
throughput information retrieval architecture. We will also examine some of the challenges and
opportunities presented by running Lucene on Amazon’s Elastic Compute Cloud.
Bio: Shaneal Manek is the lead search engineer at Greplin. He was previously the founder and CTO of Signpost.com,
which built a geospatial search and recommendation engine on top of Lucene and Lisp.








Intuit’s Live Community
FLOYD MORGAN | INTUIT
TurboTax Live Community is a large-scale web application that uses user contribution and open
source technology to help millions of TurboTax users complete their tax returns. Other benefits
from Live Community include reduced support calls, highly effective advertising campaigns,
usability engineering and, new this year, conversion prediction analytics. I will present how
Solr/Lucene powers the many facets of TurboTax Live Community now and in the future.

Highly Relevant Search Result Ranking for
Large Law Enforcement Information Sharing Systems
RONALD MAYER | FORENSIC LOGIC
Law enforcement data has many interesting complexities for search. Cross-agency searches are even
more challenging because each agency has its own shorthand. Many different types of similarity
between search clauses and documents should influence the ranking of results. For example, a
search clause mentioning a “tall suspect” might want to include results with “6 foot 4 suspect”.
Spatial clusters are important, as are temporal patterns. Different fields may be more or less
important depending on the type of crime—for example, a victim’s race may matter more than a
vehicle’s make in a sex crime but less in an auto theft. Also, documents may be related to each other
in various ways that may also affect their ideal search ranking.
Solr’s great flexibility in its analyzers, filters, synonyms, and boosting makes it an excellent tool for such
diverse requirements. We’ve contributed a patch to Solr (#SOLR-2058) that helped further improve
search result ranking for cases where a search for a suspect with a “red baseball cap, black leather
jacket” is compared against many documents mentioning red caps, black caps, etc. This presentation
will describe how we addressed some domain-specific challenges of our data.

Using Solr/Lucene/LWE for eCommerce
GRANT INGERSOLL | LUCID IMAGINATION
If your user can’t find it, they can’t buy it, right? In this talk, Apache Lucene and Solr committer
Grant Ingersoll will discuss architecture, techniques and tips for successfully deploying search tools
like Lucene, Solr and LucidWorks Enterprise in eCommerce environments.






Flexible Indexing in Lucene 4.0
UWE SCHINDLER | SD DATASOLUTIONS
Apache Lucene’s next major release, 4.0, will introduce lots of flexibility into indexing, but also
fundamental changes to the well-known APIs: It features a new and consistent, 4-dimensional
iteration API on top of a low-level, pluggable codec API giving applications full control over the
postings data. Terms are now arbitrary opaque bytes enabling users to store terms in any encoding,
not necessarily UTF-8, natively in the index (e.g. numeric fields). Currently under development is a
higher performance postings iteration API, enabling interesting codecs based on recent encoding
algorithms to work effectively. Several codecs have already been created, including the default
“standard” codec, which enables sizable RAM reduction for searchers, and a “pulsing” codec that
inlines postings data directly into the terms dictionary, which provides a solid performance boost for
primary key fields. Many new codecs are under development, such as “PFOR”, “FOR”, “AFOR”, or
“Simple64”. In this talk, Uwe presents an overview of all of these exciting changes, as well as several
concrete, real-world examples of how applications can tap into these new features.

Transforming the House Hunting Experience: How Solr is Helping
Trulia Reshape the Real Estate Industry
ALEXANDER KANARSKY | TRULIA
Trulia is a real estate search company that helps customers find homes for sale or to rent and
provides them with information to help them make better decisions in the process. It is also a hub
for real estate professionals to market their listings, view real estate data and promote their services.
The presentation describes how Solr helped Trulia to transform the traditional real estate experience
and make real estate data accessible and understandable to millions of users. It discusses approaches
we took to achieve this by using custom-built distributed index management, indexing integration
with Hadoop and geospatial search enhancements to Solr.






Extending Solr: Behind CareerBuilder’s
Cloud-like Knowledge Discovery Platform
TREY GRAINGER | CAREERBUILDER
For CareerBuilder, a 1% deviance in search relevancy can mean millions of missed job opportunities
for our users. When CareerBuilder moved to Solr from an expensive, proprietary search vendor, our
top priorities were maintaining the quality of our search results and drastically improving our agility.
This talk will describe how we addressed both needs. For search quality, we’ll cover some of our
internal studies and resulting methods for dealing with multi-lingual content across dozens of
languages, as well as customizing and experimenting with relevancy calculations. For platform agility,
we’ll discuss CareerBuilder’s cloud-like search API framework which seamlessly handles millions of
searches an hour, processes hundreds of millions of documents, and is powered by hundreds of
globally-distributed servers. Come hear the results of our studies and some best practices for quality
and performance. Learn how our framework has led to staggering improvements in both
maintainability and technology innovation, allowing us to learn from our content, not just find it.

Handy Installation Tool “Anuenue” for Solr Cluster & Implementation
of “Did you mean” Facility for Queries in Japanese
TAKAHIKO ITO | MIXI
mixi is one of the largest social networking services in Japan, providing various communication
services for over 14M monthly active users. The latest internal mixi project is to replace the in-house
search engine with Apache Solr. This session covers two topics:
a simple packaging system for Solr that eases the installation process and daily operations, and
the implementation of a “Did you mean” facility for Japanese queries using a log mining tool. These
tools have been released as OSS projects.

Implementing Click-through
Relevance Ranking in Solr and LucidWorks Enterprise
ANDRZEJ BIALECKI | LUCID IMAGINATION
This talk will explain what click-through events are and how to process them with LucidWorks
Enterprise. This innovative technique puts powerful search and relevancy at your fingertips—at a
fraction of the time and effort required to program them yourself with native Apache Solr. Andrzej
will discuss and present how you can use LucidWorks Enterprise for:
       !   Click Scoring to automatically configure relevance for most popular results
       !   Simplified implementation of auto-complete and “did-you-mean” functionality
       !   Unsupervised feedback to automatically provide relevance improvement on every query





Using Solr to find the Right Person for the Right Job
LAURA KANG | THELADDERS
In this talk, we’ll describe how TheLadders.com uses Lucene/Solr to instantly recommend
candidates to a recruiter when he/she posts a job on the recruiter site. Our matching algorithm
scores candidates from our job seeker site based on the criteria and description of jobs and job
seekers’ resume and profile data. This helps recruiters quickly identify candidates that are right for
the job and increases the chance of our job seekers getting hired.
The talk covers an overview of our Solr architecture and a description of our matching algorithm.
We’ll also discuss criteria for evaluating the algorithm, including an overview of our testing
sessions and their format. Finally, we’ll demo the feature so you can see how it works in
practice.

Using Solr For Enabling Highly Customized Sitewide Navigation
SHANTANU DEO | AT&T
The organization needed to enable a very customizable form of Global Navigation for the various
types of users (based on their profile and other factors). This would normally have involved complex
logic to figure out the appropriate set of links to show for a customer, and would have been a
maintenance nightmare. Instead we approached the problem as a search problem. Coupled with a
novel encoding scheme, this let us solve the problem simply by searching on the customer’s
profile groups and returning a coherent global navigation, using Solr to index the data. This has resulted
in a solution that is very simple to understand and maintain and that will stand us in good stead in the future. The
presentation is meant to be a description of using Solr to implement a real-world application.

Building Specialized Industry Applications
Using Solr, And Migration From FAST ESP
RAHUL AGARWALLA | UCHIDA SPECTRUM INC.
Uchida Spectrum, Inc. is a leader in the Japan search market. USI provides SMART/InSight, a search
application used by many Fortune 500 companies for specialized industry applications such as R&D and
quality assurance for manufacturing, and claims and customer management.
Originally SMART/InSight was based on Microsoft FAST. This talk will review how
SMART/InSight has migrated from FAST ESP to LucidWorks Enterprise, and how
SMART/InSight incorporates virtual data integration, enterprise search, and the ability for users to
have a unified way to navigate diverse data sources, analyze data more easily, and personalize results.
Several real-world use cases will be profiled with demonstrations.






The Seven Deadly Sins of Solr
JAY HILL | LUCID IMAGINATION
Sloth. Greed. Pride. Lust. Envy. Gluttony. Wrath. Getting started with Solr can present some pitfalls
and temptations, often turning into a trial and error process. (Confess - some or all of these may
have been part of your development project.) Based on a broad swath of experience across Solr
implementations running in some of the largest Fortune 500 companies as well as some of the
smallest start-ups, this talk will cover common mistakes made by newbies and even veteran
developers—and how to avoid them. You’ll learn how best to face the challenges that can occur
either when starting out with a new Solr implementation, or in keeping up with the latest
improvements and changes.

Advanced Search and Analytics in 20 Minutes
MARK DAVIS | KITENGA
Kitenga’s ZettaVox and ZettaSearch products support Solr and Lucene ecosystems at both the
ingestion point and for the search user. In this talk, I will show how ZettaVox, our professional
content mining platform on Hadoop, can be used to index content and rich metadata into a
LucidWorks Enterprise installation. Being built on Hadoop, ZettaVox scales up by scaling out. I will
then create an end-user search and analytics experience using our ZettaSearch solution that leverages
the faceted metadata to enhance information discovery and analysis. All in about 20 minutes.

Building SaaS Solutions for Online Media Using Apache Solr
ALBERTO MIJARES | CANOO ENGINEERING AG
SaaS applications have the advantage of remote web deployment: they can be used instantaneously
by potentially any consumer on the internet, and they benefit from the cost reduction that a
Web-based deployment provides. In this talk, the speaker explains the architecture of an innovative
SaaS solution built for the Axel Springer media group (Switzerland). This application remotely
extracts the content of multiple online newspaper articles, analyzes and classifies them, determines
which articles are the most similar to a given one, and integrates the results back into the article
to provide the user with a “related articles” feature. The core components of the analysis process
are language-specific tools (used to filter out superfluous language terms) and semantic knowledge
bases (like Wikipedia, used to enrich the indexed information with new context-specific terms, or to
disambiguate the extracted terms). At a more technical level, the speaker will explain the criteria
for selecting the emerging enterprise search framework Apache Solr as the platform and how it
drastically reduced the development effort required.






Solr Performance: Key Innovations
YONIK SEELEY | LUCID IMAGINATION
Recent developments in Solr/Lucene have made significant contributions to distributed search
processing, scalability, and throughput. In this talk, Yonik Seeley, creator of Solr, will survey key
performance strategies for building search applications with Solr, and review innovations included in
Solr 3.1, as well as forthcoming development work in Solr 4.0 and beyond.

Solr and Lucene at Etsy
GREGG DONOVAN | ETSY
Etsy is using Solr and Lucene to serve queries at a rate of more than 8 billion per year (and growing).
In this case study, we will describe how Etsy has integrated Solr/Lucene into our continuous
deployment infrastructure (see: http://codeascraft.etsy.com/2010/05/20/quantum-of-deployment/),
allowing for Solr configuration, Java-based indexers, and query parsing logic to go
from passing tests to production code in minutes. We’ll also discuss how we’re leveraging Solr’s new
Geo-search to power both local item search and GeoIP-personalized location autosuggest.
We’ll also share how we’ve extended Solr, adding personalized faceting and filtering as well as
multi-currency sorting and filtering that accounts for real-time currency fluctuation (contributed in
SOLR-2202). [Note that code will be open-sourced/contributed for both of these features.] We will share
our real-time monitoring techniques, including how we track Solr replication, query, and GC times
in Ganglia. Finally, we’ll discuss how we’ve used Hadoop-based user analytics to improve relevance
and power data-driven spelling corrections, autocomplete suggestions, and related searches.







Lucene @ Yelp
SUDARSHAN GAIKAIWARI | YELP
This talk describes how Yelp uses Lucene to provide search services. It includes:
       •   Statistics of Yelp search usage
       •   Overview of Yelp search architecture: Yelp uses different services to provide searches
           for different types of data. Some are based on Lucene and some on Solr.
       •   Deeper dive into business and review search. This is the most important search service at
           Yelp.
We will cover:
       •   Yelp’s implementation of a micro-sharded architecture and differences with Katta.
       •   Yelp extensions to Lucene to implement features such as filters, and performance
           comparison with Solr/Bobo.
       •   Yelp’s implementation of index replication.
       •   Various tricks used at Yelp to make the service faster.

Using Solr Cloud to Tame an Index Explosion
JON GIFFORD | LOGGLY
We have hundreds of customers, each of whom may have dozens of shards. To manage this
explosion of indexes, I’ll describe how we’re using Solr Cloud to manage every index - from
creation, through migration from box to box, and finally destruction. I’ll describe some of the
performance issues we had to deal with, especially with ZooKeeper.

Lots of Facets, Fast
ANNE VELING | BEYONDTREES
We created a web application for a well-known US newspaper: a maps-like zooming
application on top of the 60,000 newspapers since 1850, using Solr over the 28,000,000 articles
to create an interactive heatmap. Using domain knowledge, we optimized the out-of-the-box faceting
solution by an order of magnitude, which allowed us to create a great visual way of exploring
trends in historical newspapers.







CPython Embedded in Solr - Search Solution
for Python Lovers With the Speed of Native Java
ROMAN CHYLA | CERN
SPIRES is the biggest bibliographic database for High Energy Physics, ArXiv is the biggest
repository of full-text papers in High Energy Physics, and INSPIRE is the biggest digital library
that merges the two. We must work with result sets bigger than 1 million for citation-related queries,
and our partners from Astrophysics with sets of 6 million; however, INSPIRE is written in Python. So
how do we move several million result sets between the two systems fast? How do we take
advantage of our special NLP processing pipeline written in Python? How do we join them? We do
not use Jython. We do not use pipes. We do not embed Solr inside INSPIRE. We embed INSPIRE
into Solr! The talk shows benefits and challenges of this surprisingly elegant solution.








Rahul Agarwalla
HEAD OF INTERNATIONAL BUSINESS, UCHIDA SPECTRUM INC
www.spectrum.co.jp
Rahul Agarwalla heads international business for Uchida Spectrum Inc, Japan. Previously he has
built and exited two content/technology ventures including Matrix Information, the pioneer of
digital content syndication in India. He has over 14 years of experience with various search
technologies like Verity, FAST ESP and Solr/Lucene.

Boris Aleksandrovsky
SEARCH ARCHITECT, YAMMER
www.yammer.com
Boris Aleksandrovsky works for Yammer, the Enterprise Social Network company, where they are
trying to bring benefits of social media to enterprises by creating discoverable knowledge bases. He
specializes in solving problems of search, machine learning and data analysis at large scale by
employing distributed and scalable software architectures. Boris has almost completed his PhD in
Computer Science and Neuroscience at the University of California at Irvine.

Josh Berkus
CORE TEAM, POSTGRESQL
www.pgexperts.com
Josh Berkus has been working as a database application consultant for 8 years. Josh primarily builds
applications for the legal and HR industries and does performance tuning. He was also head of Sun
Microsystems' PostgreSQL support staff for 2 years and helped launch BI startup Greenplum.







Ed Bueche
DISTINGUISHED ENGINEER, EMC
www.emc.com
Ed Bueche is an EMC Distinguished Engineer and one of the Architects of the Documentum xPlore
search engine (part of EMC’s Information Intelligence Group). He has been with Documentum/EMC
for 12+ years and has more than 23 years of experience in performance/development in the industry,
including companies like AT&T Bell Labs and Sybase. At Documentum he worked to improve
performance & scalability for all previous Documentum full-text integrations (Verity and FAST). Ed has
been a regular speaker for over 11 years at the Documentum worldwide user conferences (in both
America and Europe) as well as at EMC World.

Andrzej Bialecki
TECHNICAL ADVISOR, LUCID IMAGINATION
www.lucidimagination.com
Andrzej Bialecki, Apache Lucene PMC Member, also serves as project lead for Nutch, and as committer
in the Lucene-java, Nutch and Hadoop projects. He has broad expertise across domains as diverse as
information retrieval, systems architecture, embedded systems, networking and business
process/e-commerce modeling. He is also the author of the popular Luke index inspection utility.

Roman Chyla
RESEARCH FELLOW, CERN
www.cern.ch
Roman Chyla is a research fellow at CERN, Switzerland. He works in the INSPIRE team to build
the biggest digital library for High Energy Physics. He is a developer and information
specialist, and has presented at four conferences, two of them international: Knihovny soucasnosti 2006,
CASLIN 2007, IKI 2009, CASLIN 2009.

Mark Davis
CTO, KITENGA, INC
www.kitenga.com
Mark Davis is Founder and CTO of Kitenga, Inc. Previously he served as Principal Engineer at
Xerox PARC spin-out InXight (acquired by Business Objects) and designed their enterprise product
suite, as well as at Microsoft as a Program Manager for enterprise search and SharePoint. Mark spent
nearly a decade as an academic researcher in the defense/intelligence community specializing in
cross-language search and computational linguistics. He has extensive speaking experience in
professional and academic forums.



Shantanu Deo
TECHNICAL DIRECTOR, AT&T
www.att.com
Shantanu Deo is a Technical Director at AT&T, in charge of their ecommerce CMS team. He is a
patent holder and has in the past presented and published his work at the INFORMS conference on
Optimization. His interests include web technologies, optimization and, lately, mobile web
communications. Shantanu holds a BS in Computer Engineering from the University of Poona, India
and MS degrees in the areas of Operations Research and Computer Science from Louisiana State
University.

Esteban Donato
LEAD ARCHITECT, TRAVELOCITY
www.travelocity.com
Esteban Donato works as Lead Architect for Travelocity. He has worked as Java Developer,
Technical Leader and Architect for the last 10 years in different industries. Esteban has been
working with Solr and Lucene technology for the last 2 years implementing it in different projects.
Esteban has given talks about Solr and data mining at Travelocity and at universities in
Buenos Aires, Argentina.

Gregg Donovan
TECHNICAL LEAD SEARCH, ETSY
www.etsy.com
Gregg Donovan is currently Technical Lead, Search at Etsy.com, the world’s most vibrant
handmade marketplace. He has worked extensively with Solr and Lucene at Etsy, and, previously, at
TheLadders.com. At Etsy, located in Brooklyn, NY, he leads the search engineering team as it
tackles the challenges presented by a growing international marketplace with a half-million different
sellers in 150 different countries selling tens of millions of items.

Stephen Dunn
HEAD OF TECHNOLOGY STRATEGY, GUARDIAN NEWS AND MEDIA UK
www.theguardian.co.uk
Stephen Dunn is Head of Technology Strategy for Guardian News and Media in the UK. He joined
The Guardian in 1999 where he helps guide the technology strategy for its multiple award-winning
network of web sites and services. His professional interests include open web technologies, digital
identity and security. Prior to joining the Guardian, Stephen completed his PhD at the Center for
Computational Neuroscience and Robotics at Sussex University, UK.



Sudarshan Gaikaiwari
SOFTWARE ENGINEER, YELP INC
www.yelp.com
Sudarshan Gaikaiwari is a software engineer working on Yelp’s search team. Prior to Yelp he
worked on various information retrieval technologies at Symantec’s Data Loss Prevention group.

Jon Gifford
CO-FOUNDER, LOGGLY
www.loggly.com
Jon Gifford is the CTO and co-founder of Loggly, where he spends all day coercing Solr into
playing nice with the cloud, and with high-volume real-time data streams. An active user and
frequent hacker of Lucene since 2004, he’s happy to let Solr take care of some of the hard work for
a change. Prior to Loggly, he has spent more than a decade working on Search systems at Minimal
Loop, Scout Labs, Technorati and LookSmart. He is concerned that his near-complete web-
anonymity is under threat.

Otis Gospodnetic
FOUNDER, SEMATEXT
www.sematext.com
Otis Gospodnetic is a coauthor of Lucene in Action (1st and 2nd edition). He has been involved with
Lucene since 2000 and Solr since 2006. He is also a member of the Nutch and Mahout development
teams, as well as the Lucene Project Management Committee. Otis is an Apache Software Foundation
member and the founder of Sematext, a software development and consulting company focused on
Search & Analytics using open-source technologies like Lucene, Solr, Nutch, Hadoop, HBase,
Flume, and more.







Trey Grainger
SEARCH TECHNOLOGY DEVELOPMENT TEAM LEAD, CAREERBUILDER
www.careerbuilder.com
Trey Grainger leads the Search Technology Development group at CareerBuilder.com. He
introduced Solr to CareerBuilder and led the successful conversion away from the Microsoft FAST
ESP platform. He has been with CareerBuilder for 4 years, and his search experience includes
handling multi-lingual content across dozens of markets/languages, genetic algorithm and user
group based relevancy tuning, geo-spatial search and validation, and work on customized payload
scoring models, data mining, clustering, and recommendations. He is responsible for architecting
CareerBuilder’s cloud-like search API exposing search as a simple, dynamic, and powerful generic
service abstracted away from a large, globally-distributed architecture. Trey is also the founder and
Chief Architect of Celiaccess.com, a gluten-free search engine and networking site.

Eric Gries
PRESIDENT AND CEO, LUCID IMAGINATION
www.lucidimagination.com
Eric Gries joined Lucid Imagination as the President and CEO, after spending more than 20 years in
executive leadership roles, where he built high-growth technology-based businesses. Prior to joining
the company, Eric was an Executive-in-Residence at Granite Ventures. Eric has served as CEO,
general manager and vice president for companies in application development, systems
management, networking, financial services and hardware systems, in both the U.S. and Europe.
Prior to joining Granite Ventures, Eric led XACCT, a pioneering network mediation market leader,
as its president and CEO. XACCT was acquired by Amdocs in 2004, at which time Eric joined
Amdocs’ executive team as Senior Vice President. Earlier in his career, Eric served as general
manager of Compuware’s Network and Systems Management division, and held product
management, marketing, sales and engineering positions at companies such as ACI, Cullinet
Software and DEC.

Erik Hatcher
TECHNICAL STAFF, LUCID IMAGINATION
www.lucidimagination.com
Erik Hatcher is the co-author of two books, Lucene in Action and Java Development with Ant.
Erik has been an active member of the Lucene community - a leading Lucene and Solr committer,
member of the Lucene Project Management Committee, member of the Apache Software
Foundation as well as a frequent invited speaker at various industry events. Erik earned his B.S. in
Computer Science from University of Virginia, Charlottesville, VA.



Jay Hill
SENIOR SEARCH ARCHITECT, LUCID IMAGINATION
www.lucidimagination.com
Jay Hill has been building enterprise search applications since 2003, and has worked extensively with
Autonomy IDOL, Lucene, and Solr. He is a certified Solr trainer, and is lead author for Lucid
Imagination’s Solr training courses.

Grant Ingersoll
CO-FOUNDER, LUCID IMAGINATION
www.lucidimagination.com
Grant Ingersoll is a founder and member of the technical staff at Lucid Imagination. Grant’s
programming interests include information retrieval, machine learning, text categorization, and
extraction. Grant is a regularly featured speaker at ApacheCon and other industry events. He has
been an active member of the Lucene community – a Lucene and Solr committer, co-founder of the
Apache Mahout machine learning project, chairman of the Lucene Project Management Committee
(PMC) as well as a Vice President at the Apache Software Foundation. He is also the co-author of
Taming Text (Manning, forthcoming) covering open source tools for natural-language processing.
Grant’s prior experience includes work at the Center for Natural Language Processing at Syracuse
University in natural language processing and information retrieval. Grant earned his B.S. from
Amherst College in Math and Computer Science and his M.S. in Computer Science from Syracuse
University, NY.

Takahiko Ito
SOFTWARE ENGINEER, MIXI, INC
www.mixi.jp
Takahiko Ito received his Ph.D. in Engineering at Nara Institute of Science and Technology,
specializing in graph mining. He was a specialist for Japanese and Asian language processing at Fast
Search and Transfer prior to joining mixi, Inc as an R&D engineer. Selected Papers include:
       •   Masashi Shimbo, Takahiko Ito, Daichi Mochihashi, Yuji Matsumoto. On the Properties
           of von Neumann Kernels for Link Analysis. Machine Learning, 75:37-67, 2009.
       •   Takahiko Ito, Masashi Shimbo, Taku Kudo, Yuji Matsumoto. Application of Kernels to
           Link Analysis, The Eleventh ACM SIGKDD International Conference on Knowledge
           Discovery and Data Mining. 2005.








Alexander Kanarsky
SENIOR SOFTWARE ENGINEER, TRULIA
www.trulia.com
Alexander Kanarsky is responsible for managing day-to-day operations of Trulia’s indexing and
search infrastructure and oversees the search-related development there. Prior to Trulia he was a
member of the core development team for Autonomy’s Digital Safe, the world’s largest private archive of
electronic documents.

Laura Kang
TECHNICAL LEAD, SEARCH AND MATCHING, THELADDERS
www.theladders.com
Laura Kang holds a B.A. in computer science, mathematics, and economics from University of
California at Berkeley, and M.S. and Ph.D. in computational mechanism design from Harvard
University. She has presented her work at several conferences, including the International
Conference for Electronic Commerce and the ACM Conference on Electronic Commerce. Before
joining TheLadders, she was a manager at a NYC technology startup. At TheLadders, she focuses
on search and matching algorithms.

Sudhakara Karegowdra
PRINCIPAL ARCHITECT, TRAVELOCITY
www.travelocity.com
Sudhakara Karegowdra works as Principal Architect for Travelocity. He has worked as Java
Developer, Technical Leader and Architect for the last 14 years in different industries, 10 of
those in the travel industry. Sudhakara has been working with Solr and Lucene technology for the last 3
years, implementing it in different projects. Sudhakara has given talks about Solr at
Travelocity.







Steve Kearns
ROSETTE PRODUCT MANAGER
www.basistech.com
Steve is the product manager for the Rosette Platform and is also the subject matter expert for the
international compliance market within Basis Technology. Prior to Basis Technology, Steve worked
at BBN Technologies where he worked on the Broadcast and Web Monitoring Systems, which
capture and extract open-source intelligence from live television and internet news websites. He has
experience in information visualization, distributed systems architecture and received his MS in
Information Technology and BS in Computer Information Systems from Bentley University. He
also spoke at the Apache Lucene EuroCon 2010 in Prague, on the topic of Building Multilingual
Search Based Applications.

Marc Krellenstein
FOUNDER, LUCID IMAGINATION
www.lucidimagination.com
Marc Krellenstein is the founder of Lucid Imagination. Marc has 30 years’ experience in the
computer industry, focusing for the last 20 years on information retrieval technology and
applications. Marc was previously Chief Technology Officer and Vice President for Search and
Discovery Technology at Elsevier, the scientific, technical and medical publishing division of Reed-
Elsevier. Prior to Elsevier Marc was Chief Technology Officer and Senior Vice President of
Engineering at Northern Light Technology, where he was the founding technologist and led the
design and development of the Northern Light search service, including designing the data model,
query interpretation, relevancy ranking, automatic document classification and patented technology
for document clustering. Marc has an A.B. in philosophy from Cornell;
he earned his M.S. in computer science from the University of Wisconsin at Madison and a Ph.D. in
psychology (cognitive science) from the New School for Social Research, NY.

Ronald Mayer
CTO, FORENSIC LOGIC, INC.
www.forensiclogic.com
Ronald Mayer has spent his career with technology start-ups in a number of fields ranging from
medical devices to digital video to law enforcement software. Ron has also been involved in Open
Source for decades, with code that has been incorporated in the LAME MP3 library, the
PostgreSQL database, and the PostGIS geospatial extension. His most recent speaking engagement
was when he gave a presentation on a broader aspect of this system to the SD Forum’s Emerging
Tech SIG titled “Fighting Crime: Information Chokepoints & New Software Solutions”.



Alberto Mijares
CANOO ENGINEERING AG
www.canoo.com
Alberto Mijares is a software engineer with more than 10 years of experience. He is a Scrum Master
and an agile practitioner. He has an extensive background in Web technologies and Java, having
participated in W3C activities related to the Semantic Web. His usual role is either leading
projects or designing architectures for web applications. He started working at Canoo Engineering
AG (Switzerland) in 2008 and speaks Spanish, English and German. He has a degree in Computer
Engineering. He has given talks at Java and Web-related conferences and user groups in
Switzerland and Spain.

Floyd Morgan
INTUIT
www.intuit.com
Floyd is a Principal Software Engineer who works in the Central Technology Organization at Intuit,
makers of TurboTax, Quickbooks, Quicken and Intuit Payroll, to name a few. Floyd has developed
core features of the flagship TurboTax product line and recently co-founded Intuit’s newest social
driven technology Live Community. Under Floyd’s direction, Live Community has gone from a
small project to a widely adopted platform used by most Intuit products and services. Floyd earned
his B.S. from San Diego State University in Computer Science.

Stephen O’Grady
CO-FOUNDER AND PRINCIPAL ANALYST, REDMONK
www.redmonk.com
Stephen O’Grady is the co-founder and Principal Analyst of RedMonk, a boutique industry analyst
firm focused on developers. Founded in 2002, RedMonk provides strategic advisory services to
some of the most successful technology firms in the world. Stephen’s focus is on infrastructure
software such as programming languages, operating systems and databases, with a special focus on
open source and big data. Before setting up RedMonk, Stephen worked as an analyst at Illuminata.
Prior to joining Illuminata, Stephen served in various senior capacities with large systems integration
firms like Keane and consultancies like Blue Hammock. Regularly cited in publications such as the
New York Times, NPR, the Boston Globe, and the Wall Street Journal, and a popular speaker and
moderator on the conference circuit, Stephen’s advice and opinion is well respected throughout the
industry.







Timothy Potter
SENIOR ENGINEER, NATIONAL RENEWABLE ENERGY LABORATORY (NREL)
www.nrel.gov
Timothy is a highly skilled technologist with over 13 years of experience delivering innovative software
solutions that encompass a wide range of technologies and business sectors. Currently, Mr. Potter is
a Senior Engineer at the National Renewable Energy Laboratory (NREL) where he leads the effort
to build a large-scale distributed platform for handling smart grid related energy data using Hadoop
and NoSQL technologies. Prior to NREL, Timothy was the CTO for Viyya Technologies where he
developed a large-scale content recommendation system based on Solr, Mahout, and Hadoop
running in the Amazon Cloud. As a Senior Software Engineer for the WebLogic Platform at BEA
Systems, he was the chief inventor of several US Patents that helped revolutionize J2EE-based
enterprise application integration. His technical blog (http://thelabdude.blogspot.com/) is highly
respected as a guide for other developers in the open-source Java community. Mr. Potter has a BS in
Mathematics and BA in Economics with honors (summa cum laude) from the University of
Colorado.

Daniel Potzinger
AOE MEDIA GMBH
www.aoemedia.de
Daniel Potzinger has more than 10 years of web development experience under his belt. He is a
skillful hand at developing clean solutions with a particular love of elegant, easily maintained and
reusable coding. Daniel is always open to new projects and development methods, such as Agile
Software development.
Over the last few years since joining AOE media, Daniel has played “midwife” to more than 60
Enterprise CMS-Projects for such renowned clients as congstar, Cisco WebEx and VMware,
Panasonic and the like: taking care of client requirements, directing the development and launching
the results.






Craig Rees
SENSIS
sensis.com.au
Craig Rees has been at Sensis since 2008. Craig heads up the content and search groups which
manage the search capabilities, platforms and operational teams that support the Yellow Pages® and
White Pages® businesses. Craig is the author of the Sensis Content Strategy and the technology
owner of the Sensis Business Search API. Prior to joining Sensis, Craig worked in digital strategy
development and implementation roles in the United Kingdom with companies including BBC, Sky
and Argos.

Ramon Resma
ARCHITECT, TRAVELOCITY
www.travelocity.com
Ramon Resma works as an Architect for Travelocity Mobile. He has over 22 years of experience in
the travel industry and has worked in technical leadership roles for Travelocity Architecture, Sabre
Airline Solutions Architecture, and American Airlines. Ramon has been working with Solr and
Lucene technology for the last 2 years. Recently he worked on implementing Solr functions for
serving location-based content on travel mobile applications.

Yonik Seeley
CREATOR OF APACHE SOLR & CO-FOUNDER, LUCID IMAGINATION
www.lucidimagination.com
Yonik Seeley is the creator of Solr. He is an expert in distributed search systems architecture and
performance. Yonik has been a prolific Lucene/Solr committer, a member of the Lucene PMC, and
a member of the Apache Software Foundation. Yonik’s work experience includes CNET Networks,
BEA and Telcordia. He earned his M.S. in Computer Science from Stanford University.




  • 2. LUCENE REVOLUTION San Francisco 2011
    Welcome to San Francisco!
    We are excited to be bringing you the second Lucene Revolution event, following quickly on the success of our 2010 conference in Boston. In addition to all the great feedback we received after Boston, many people asked about bringing the conference to the West Coast – and here we are. It’s great to host the community here in our home state of California.
    There’s now no question: the revolution is in full swing, and Lucene and Solr are shaping the future of search. The diverse range of search technology and applications is without a doubt one of its greatest strengths. For the extended community and ecosystem of open source search, Lucene Revolution is an unmatched opportunity to learn, network, share experiences, and see how others have changed the world of search.
    Speakers here at the conference hail from companies large and small, from innovative startups and established companies, as well as from government, academia and non-profits. Even better, the range of experience and application interests of your fellow attendees should inspire you to seek out new ways to put search technology to work. We’ve allotted ample time in breaks to have formal and informal conversations. And be sure to join the Revolution social network at: http://lucene.crowdvine.com/. Keep an eye out at the Registration Desk for agenda changes and updates.
    One group you should definitely seek out here is the core group of developers and committers who are the heart and soul of the Apache Lucene/Solr project. You know them from the mailing lists; these are the people who do the hard work of making the code do its magic, resolving challenging technical and architectural issues that we all benefit from. Don’t just attend their roadmap panel and technical sessions; make sure you avail yourself of the opportunity to put faces to names, so that when you’re on the mailing lists, you’ll have more than a ‘to’ and a ‘from’ to go by.
    As the commercial entity for Lucene/Solr, we at Lucid Imagination are always looking for new ways to help make the most of open source search. Be sure to tell us what you like, what could be improved, and what topics should be covered in future events. Think about sharing your own successes with the community by speaking at the next Lucene Revolution.
    Let the conference staff, or anyone on the Lucid Imagination team, know if you have any questions, or if there’s anything you need.
    Onward to the revolution!
    Eric Gries, CEO
    Lucid Imagination
  • 3. Contents
      • Opening Letter
      • Contents
      • Timetable at a Glance
      • Agenda
      • About Lucid Imagination
      • About Our Sponsors
      • Training
      • Keynotes
      • Sessions–Day 1
      • Lightning Talks
      • Sessions–Day 2
      • Speaker Bios
      • Hotel, Maps & Transportation Info
    Lucene, Apache Lucene, Solr, Apache Solr, Hadoop, Apache Hadoop and other Apache projects mentioned are trademarks of The Apache Software Foundation.
  • 4. SUNDAY MAY 22
    16:00 – 18:00  REGISTRATION OPEN (Sandpebble Foyer outside Grand Peninsula Ballroom)
    MONDAY MAY 23
    8:00 – 9:00  TRAINING REGISTRATION OPEN
    9:00 – 17:00  Training Workshops/Day 1
      • Solr Application Development Workshop
      • Developing Search Applications with LucidWorks Enterprise
      • Lucene Application Development Workshop
      • Scaling Search with Solr and Big Data
    See registration desk in Sandpebble Foyer for room assignment.
    TUESDAY MAY 24
    8:00 – 9:00  TRAINING REGISTRATION OPEN
    9:00 – 17:00  Training Workshops/Day 2
      • Solr Application Development Workshop
      • Developing Search Applications with LucidWorks Enterprise
      • Lucene Application Development Workshop
      • Scaling Search with Solr and Big Data
    16:00 – 18:00  Ticket Pickup for Giants Game (advance tickets required). Tickets may be picked up at the Conference Registration Desk in the Sandpebble Foyer.
    18:00  Buses depart for Giants Game from front entrance of Hyatt Hotel
  • 5. WEDNESDAY MAY 25
    7:30 – 18:00  REGISTRATION OPEN
    7:30 – 8:30  Light Breakfast Available
    8:30 – 10:05  Welcome & Keynotes
      • Welcome: Eric Gries, Lucid Imagination
      • Keynotes: Marc Krellenstein, Lucid Imagination; Stephen Dunn, The Guardian News and Media
    10:05 – 10:35  BREAK
    10:35 – 11:25  Technical Track Sessions
    11:25 – 11:35  BREAK
    11:35 – 12:25  Technical Track Sessions
    12:25 – 13:30  LUNCH AND SPONSOR EXHIBITS
    13:30 – 14:20  Technical Track Sessions
    14:20 – 14:30  BREAK
    14:30 – 15:20  Technical Track Sessions
    15:20 – 15:50  BREAK
    15:50 – 16:40  Panel: “Stump the Chump”
    16:40 – 17:00  BREAK
    17:00 – 18:30  Lightning Talks
    18:30  REVOLUTION PARTY
    THURSDAY MAY 26
    7:45 – 8:45  Light Breakfast Available
    8:45 – 10:15  Keynote and Panel
      • Keynote: Stephen O’Grady, Redmonk
      • Panel: Committers Q&A, Lucene/Solr Roadmap
    10:15 – 10:45  BREAK
    10:45 – 11:35  Technical Track Sessions
    11:35 – 11:45  BREAK
    11:45 – 12:35  Technical Track Sessions
    12:35 – 13:45  LUNCH AND SPONSOR EXHIBITS
    13:45 – 14:35  Technical Track Sessions
    14:35 – 14:45  BREAK
    14:45 – 15:35  Technical Track Sessions
    15:35 – 15:45  BREAK
    15:45 – 16:35  Technical Track Sessions
    16:35 – 17:30  Panel: “Search for Tomorrow (RDBMS for Yesterday)”
    17:30  CONFERENCE ENDS
  • 6. LOGISTICS
      • REGISTRATION is in the Grand Peninsula Foyer
      • KEYNOTES and PANEL DISCUSSIONS are in Grand Peninsula Ballroom D
      • TRACK 1 is in Grand Peninsula Ballroom A/B/C
      • TRACK 2 is in Grand Peninsula Ballroom D
      • TRACK 3 is in Grand Peninsula Ballroom E/F/G
      • TRACK 4 is in Sand Pebble A/B/C
      • LUNCHES are in the Atrium (upstairs above the Ballroom)
      • THE REVOLUTION PARTY is in the Grand Peninsula Foyer
      • TRAINING CLASSES will be held in the Sandpebble Conference Rooms
      • TRAINING REGISTRATION is outside the Sandpebble Conference Rooms (please contact charelm@gmail.com if you are unsure which class you are in)
  • 9. As the world’s leading source of expertise in open source search technology and the commercial company for Apache Solr/Lucene, Lucid Imagination offers the products and services you need for cost-effective development and production deployment of cutting-edge search applications that lower your cost of growth. Thousands of organizations around the world have turned to the power of Apache Solr/Lucene open source technology to drive their cutting-edge search applications.
    LucidWorks: Enterprise Grade Solr/Lucene
    LucidWorks Enterprise is a flexible, cost-effective, scalable platform that simplifies development, tuning, configuration and deployment of Solr/Lucene open source search technology. It features:
    POWERFUL SEARCH
      • Complete Apache Solr 4.x Release: integrated and tested with powerful enhancements
      • Scalability: distributed search and indexing
      • Cloud-Ready: centrally managed search replication and configuration
      • REST API: simplifies integration
    SIMPLIFIED ADMINISTRATION
      • Easy-to-use Installer & Admin UI: streamlines startup and common configuration tasks
      • Data Connectors: for databases, file systems, Web sites, SharePoint and more
      • Multiple file types: MS Office, PDF, native XML format documents and more
      • Security: LDAP-aware, document-level, role-based, policy-driven
    ADVANCED USER EXPERIENCE
      • Enriched Query Parsing: more resilient interpretation of user input
      • Click Scoring: boosts results based on user behavior
      • User Alerts: automatic notification of new results
      • Integrated auto-complete and spellchecking
  • 10. Global Expertise: Training & 24x7 Services
    Lucid Imagination offers a deep bench of resources in search and open source, backed by unmatched experience with thousands of diverse search applications at the world’s largest companies.
    TRAINING
    A comprehensive selection of courses and classes for developers, system administrators, managers, and search application users on LucidWorks Enterprise, Solr and Lucene; instruction is offered in a variety of formats around the world.
    CONSULTING
    Our unique ExpertLink Advisory Services provide consultative guidance on design and optimization for search applications during development and production to ensure your Lucene/Solr implementations meet the requirements of your business.
    ENTERPRISE SUPPORT AND SUBSCRIPTIONS
    Lucid Imagination offers attractively priced subscriptions that deliver Solr/Lucene technology in an integrated, well-packaged format. Subscriptions combine stability, security, robust interfaces, and predictable release schedules with unmatched support resources within reach 24 x 7 x 365 across the globe.
  • 11. Platinum Sponsor: Basis Technology
    Basis Technology provides software solutions for multilingual text analytics, information retrieval, and name resolution. Our Rosette© Linguistics Platform is the text analysis engine behind many commercial and government search-based applications, adding language support to Lucene and Solr for better search precision and recall in English or 27 other languages.
    Starting with language identification in 55 languages, our high quality linguistic analysis seamlessly integrates into Lucene and Solr via a connector, enabling customizable tokenization and stemming/lemmatization for languages like Chinese, Japanese, Arabic, and Persian. Dictionary-based decompounding is available in German, Dutch, Danish, Swedish, Norwegian, and Korean. Entity extraction enriches search by adding auto-generated metadata and faceted navigation to results. Adding support for a new language to Solr is less than a day’s work.
    The Rosette Platform powers search, business intelligence, e-discovery, and other enterprise and government applications for customers worldwide including: Microsoft/Bing, Cisco, EMC, Endeca, Oracle, and Yahoo!
    www.basistech.com
  • 12. Exhibitors
    SALESFORCE.COM
    Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing company. We’re also one of the “Best Places to Work” (FORTUNE). Salesforce.com’s Search Team is strong and experienced, with deep architecture expertise. We’re dedicated to delivering the fastest, most reliable cloud-scale enterprise search in the industry. In addition to innovating around scalability and security, we strive to delight our end users with an original, intuitive user experience and relevancy that’s adaptive, robust, and deeply satisfying. If you share our passion for search and for solving tough problems, swing by our booth to chat.
    www.salesforce.com
    SEARCH TECHNOLOGIES
    Search Technologies is the leading independent provider of search engine integration and support services. Operating internationally, we help clients to gain business advantage using search. Our technical team of more than 80 experts is the most experienced group of search implementation professionals globally, and this mitigates risk for our customers. In short, we are the experts at fine-tuning search applications to deliver business benefits.
    www.searchtechnologies.com
    DOCUMILL
    Documill is an independent software vendor (ISV) enabling browser-based access to Microsoft Office and PDF documents and empowering high volume server-side content processing solutions. Documill Visual Search dramatically improves search user experience and discoverability of multi-page documents. Instant document previews and page-level search results improve document data mining experience and accuracy. With page-level bookmarking features, Documill Visual Search enables collaborative search, allowing users to take actions based on their findings, share results and syndicate relevant pages into new documents.
    www.documill.com
  • 13. Community Sponsors
    SEMATEXT
    Sematext is a software products and services company focused on Search & Analytics using Lucene, Solr, Nutch, Hadoop, HBase, Flume, Mahout, and other open-source technologies. Sematext also offers Lucene & Solr technical support subscriptions, consulting packages, and training. The company also runs the popular search-hadoop.com and search-lucene.com sites. Founded in 2007 in New York, Sematext is privately held and self-funded with a presence in North America and Europe. Sematext’s customers include The Library of Congress, Lockheed Martin, Simon & Schuster, Salesforce, NAVTEQ, Comcast, Cox Communications, ProQuest, Citysearch, Gilt Groupe, Autodesk, and many others.
    www.sematext.com
    EMC CORPORATION
    EMC Corporation is the world’s leading developer and provider of information infrastructure technology and solutions that enable organizations of all sizes to transform the way they compete and create value from their information. We can help you design, build, and manage flexible, scalable, and secure information infrastructures. And with these infrastructures, you’ll be able to intelligently and efficiently store, protect, and manage your information so that it can be made accessible, searchable, shareable, and, ultimately, actionable. In short, with an information infrastructure, you can avoid the potentially serious risks and reduce the significant costs associated with managing information, while fully exploiting its value for business advantage.
    www.emc.com
    SPRINGSOURCE, A DIVISION OF VMWARE, INC.
    SpringSource, a division of VMware, Inc., (NYSE: VMW), employs the open source leaders who created and drive innovation for Spring, the de facto standard programming model for enterprise Java applications, as well as the Java and web thought leaders within the Apache Tomcat, Apache HTTP Server, RabbitMQ, Hyperic, Groovy and Grails open source communities. SpringSource forges open source innovations to create lean and powerful technology that people love to use. From high productivity developer tools and framework to lightweight application server runtimes including data management solutions for the hardest enterprise and cloud scale problems, SpringSource provides solutions for tomorrow’s enterprise challenges.
    www.springsource.com
  • 14. MANNING PUBLICATIONS
    Manning Publications offers computer books for professionals: programmers, system administrators, designers, architects, managers and others. Manning’s focus is on computing titles at professional levels. We care about the quality of our books. Our books are designed without gimmicks. Their main goal is elegance and readability; we feel the two are often the same. Our covers are understated, decorated with pictures of worldwide regional dress habits of two hundred years ago. Many of our books come with online reader support: authors answer the questions of their readers in our Web-based Author Online discussion forums.
    www.manning.com
    DZONE
    DZone is a social linking and blogging network for the developer and IT communities. According to PC Magazine, “DZone is a developer’s dream: a vast network of user-submitted links to message boards, news, coding tricks, and more.” Launched in June, 2006, DZone is in Alexa’s top 3000 sites, surpassing established leaders like DevX, Sys-con, FTP Online and TheServerSide.com. DZone is the only vertically focused site regularly listed among the web’s largest social bookmarking sites. In its first year of operation DZone sent over 5 million visitors to other developer websites. Today, DZone has curated topic pages for Java, Solr/Lucene, Cloud Computing, PHP, Agile, Mobile, and much more.
    www.dzone.com
    TNR GLOBAL
    TNR Global is a systems design and integration company focused on enterprise search and cloud computing solutions. TNR develops scalable, fault-tolerant web-based search solutions built on the open source LAMP stack and utilizing Amazon Web Services and/or physical servers. TNR has over ten years of experience in web systems and enterprise search implementations, both proprietary and open source, and specializes in Lucene Solr and FAST ESP search applications. TNR Global builds solutions for: Vertical Search Engines, Publishing, Web Directories, News Sites, Information Portals, Web Catalogs, Education. We also work with web-based startups to build scalable services.
    www.tnrglobal.com
    UCHIDA SPECTRUM
    Uchida Spectrum, Inc. (USI) is a leader in the Japan search market. USI provides SMART/InSight, a search application that integrates and analyzes enterprise information. SMART/InSight is used by leading blue chips, like Canon and Moody’s. USI is working with Lucid Imagination as its Strategic Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support services. In 2011, USI expanded its offerings to Enterprise Search and Web Services/Ecommerce companies across Asia. USI now serves clients and partners in Japan, India, China and Singapore.
    www.spectrum.co.jp
  • 15. Scaling Search With Big Data And Solr
    Scaling Search with Big Data and Solr is a 2-day instructor-led, hands-on classroom training course delivered by instructors certified by Lucid in a shared classroom setting. The class is for Solr developers who want to know how to leverage the flexible search functionality of Apache Solr and the Big Data processing of Apache Hadoop to create the indexes for both general search and augmented data analytics. Lab exercises and real-world examples will be used to reinforce content.
    We’ll start with Hadoop from the ground up, and cover MapReduce, HDFS (the Hadoop Distributed File System), cluster management, “the shuffle,” etc., before continuing on to connecting it to Solr. We’ll look at common use cases for generating search indexes from big data, typical patterns for the data processing workflow, and how to make it all work reliably at scale. We will explore in depth an example of processing 1 billion records to create a faceted Solr search solution. You’ll learn how Solr can be used as a NoSQL solution, and how it compares to classic NoSQL projects such as Cassandra and HBase.
    The class will continue with techniques for scaling your Solr installation, how to identify bottlenecks in your Solr installation, how to monitor your installation, and how to determine resource usage. We’ll also cover various Solr architectures, their characteristics and use cases. We’ll examine how to apply these to make appropriate tradeoffs to effectively scale your Solr installation.
    THE COURSE COVERS
      • An overview of Hadoop.
      • Understanding MapReduce.
      • Principles of Hadoop development, operations & eco-system.
      • How to use Hadoop with Solr.
      • How to index large volumes of data.
      • How to effectively search large indexes.
      • Understanding NoSQL.
      • How to shard/federate/replicate your data for large indexes.
      • Understanding resource costs & tradeoffs for Solr features.
    PREREQUISITES
    Prospective students should be familiar with Solr, either through work experience with Solr or by having completed the Lucid Imagination Solr training course. It is assumed the student does not have prior Hadoop experience.
  • 16. Developing Search Applications With LucidWorks Enterprise
    Developing Search Applications with LucidWorks Enterprise is a 2-day instructor-led, hands-on classroom training course designed and developed by the engineers who developed LucidWorks Enterprise (LWE), and delivered by instructors certified by Lucid in a shared classroom setting. The objective of this course is to introduce LucidWorks Enterprise to users with no previous experience working with search applications. Through a combination of lectures and hands-on lab exercises you will learn how to get up and running with LucidWorks Enterprise, what the components of a search application are, and how to make your content searchable and findable in a search application built on LucidWorks Enterprise. There will be time for questions and discussion to enhance your learning experience.
    At the end of the course you will know what a search application is, and how to set up and use LucidWorks Enterprise to index and search your content. You will also learn about all of the features of LWE, such as highlighting, spell checking, and custom alerts, and how to use these features to build a satisfying search experience for end users who will search your content.
    THE COURSE COVERS
      • What a search application is and how to build one with LucidWorks Enterprise.
      • How to install and configure LWE.
      • How to make your content searchable and findable.
      • How to work with different data sources such as web pages, relational databases, and rich content files.
      • How to build queries to search for content in LWE.
      • Techniques and features in LWE that can be used to make results for end users more relevant.
      • Different ways to process search results returned by LWE.
    PREREQUISITES
    No programming skills are necessary; however, some technical background and familiarity with application development will be helpful. There will be labs accompanying the lectures that will require basic computer skills, including how to run a simple command from the command line. No previous experience with search applications is necessary.
  • 17. Solr Application Development Workshop
    Solr Application Development Workshop is a two-day hands-on training course designed and developed by the engineers who helped write the Apache Lucene/Solr code, and delivered by instructors certified by Lucid in a shared classroom setting. The workshop is targeted at developers who want to build applications with Apache Solr, the Lucene Search Server. You will learn how to set up and use Solr to index and search, how to analyze and solve common problems, and how to use optional Solr modules such as facets, spell check, and highlighting. Lab exercises and real-world examples will be used to reinforce content. There will be time for questions and discussion to enhance your learning experience.
    At the end of the course you will understand how to set up and use Solr to index and search, how to analyze and solve common problems, and how to use optional Solr modules such as facets, spell check, and highlighting.
    THE COURSE COVERS
      • Principles of search application development
      • Common search use cases and their application
      • How to make content searchable
      • Key Solr and Lucene concepts
      • Basics of indexing and searching using Solr
      • How to design and run a Solr application
      • Best practices for indexing, searching and performance
      • Techniques to analyze and resolve common search problems
      • How to leverage Solr’s optional modules including spell checking, highlighting, Data Import Handler, Tika Integration and other popular capabilities
      • Advanced topics in designing Solr apps and running a site
      • Solr operations and deployment tools and strategies
      • How to customize and extend Solr
    PREREQUISITES
    Some programming skill and experience with a modern programming language such as Java, PHP, Perl, Ruby, .NET, or any language that supports HTTP and/or XML.
  • 18. Lucene Application Development Workshop
    Lucene Application Development Workshop is a two-day instructor-led hands-on training workshop, written and led by the engineers who helped write the Apache Lucene/Solr code. The objective of this course is to provide you with real-life use cases and teach you how to apply Lucene to real business requirements. During the course you will learn to apply best practices in developing scalable, highly available and high performance search applications. There will be time for questions and discussion to enhance your learning experience.
    THE COURSE COVERS
      • Principles of search application development.
      • Common search use cases and their application.
      • How to make content searchable.
      • Key Lucene concepts.
      • Basics of indexing and searching with the Lucene APIs.
      • Best practices for indexing, searching and performance.
      • Analysis techniques for solving common search problems.
      • Lucene internals.
      • Lucene’s optional modules to enable spell checking, highlighting and other common search features.
    PREREQUISITES
    Basic Java programming skills
  • 19. The Once and Future History of Enterprise Search and Open Source
    MARC KRELLENSTEIN | LUCID IMAGINATION
    While it remains challenging to build best practice search applications, core search technology has become commoditized. Open source Lucene/Solr represents the best form of that commodity, as good as or better than any commercial search technology while also providing the cost, control and flexibility advantages of open source. In this talk, we’ll look at how past challenges in search were met and new ones evolved, and the place of Lucene/Solr in that evolution.
    From Publisher To Platform: How The Guardian Embraced the Internet using Content, Search, and Open Source
    STEPHEN DUNN | GUARDIAN NEWS AND MEDIA UK
    In 2009 The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications with The Guardian’s rich content. The content API, hosted on Solr instances on EC2, contains JSON representations of all Guardian articles back to 1999 (over 1 million articles), and is an increasingly complete representation of the output of the organization. The DataStore contains curated data sets for use in applications and visualizations. This talk will cover how The Guardian opened up its business, enriched it, and reached new markets with its Open Platform strategy. Stephen will cover the technical architecture, implementation of Solr (the key technology powering the platform), and how The Guardian has used it to embrace disruption in the media space, while finding new sources of revenue and innovation. Two years after the launch, Stephen will cover some of the lessons learned, and explain how The Guardian complements use of Solr with other open-source non-relational technology as its platform evolves.
    All Data Big and Small
    STEPHEN O’GRADY | REDMONK
    The last twenty-four months have seen a veritable explosion in discussion around what is commonly referred to as Big Data and the infrastructure technology employed to manage it. The wealth of available open source software means that businesses from any industry have easily accessible tools with which to tackle projects that would have been out of their reach just a few years prior. Less heralded, however, has been the fact that making data actually useful - whatever its size - remains a challenge. In this session we’ll explore the role of search in putting data - big and small - to work answering the important questions for businesses and society by reducing the friction between question and answer.
  • 20. Integrating Advanced Text Analytics into Solr
    STEVE KEARNS | BASIS TECHNOLOGY
    Text analytics provides a number of interesting analytic capabilities that can enhance enterprise search applications, though in practice it is not always obvious how these can be integrated effectively into Solr. This presentation will describe some of the practical ways that leading organizations are using text analytics by integrating them directly into Solr and their user interface to improve relevance, navigate results, and discover new information. The combination of Solr and quality text analytics can improve existing keyword search solutions, and enable new ways of discovering knowledge hidden in existing data.
    Finite State Automata in Lucene: Internals and Applications
    DAWID WEISS | POZNAN UNIVERSITY OF TECHNOLOGY, POLAND
    Finite state automata and transducers made it into Lucene fairly recently, but already show a very promising impact on search performance. This data structure is rarely exploited because it is commonly (and unfairly) associated with high complexity. During the talk, I will try to show that automata and transducers are in fact very simple, their construction can be very efficient (memory- and time-wise) and their field of applications very broad. This will be backed by an introduction to how FSTs are implemented in Lucene (construction and traversals) and practical use cases of where FSTs have been useful so far. If you’d like to see how to squeeze 150MB of text data into 1.8MB of compact data structure, this talk is for you.
    Case Study - Panasonic Europe Powered by Apache Solr
    DANIEL POTZINGER | AOE MEDIA GMBH
    In 2010 Panasonic made the decision to replace their legacy enterprise search tool and switched the search for all their European websites to an Apache Solr-based solution. Now their customers benefit from an incredibly fast and feature-rich solution that is much more than just a search and has become a valuable sales-driving tool for Panasonic. Features like relevancy manipulation, autosuggest, and contextual filtering for properties like color or product category were implemented under less-than-ideal circumstances, chiefly that there was no access to structured data. The search has been rolled out in close to 30 countries so far, which also puts Solr’s multilingual handling to the test.
Real-time Search at Yammer
BORIS ALEKSANDROVSKY | YAMMER, INC.
This talk will focus on the architecture, scalability concerns, performance bottlenecks, operational characteristics and lessons learned while designing and implementing Yammer’s distributed real-time search system. Yammer is an enterprise social network SaaS offering with over 100,000 networks (including 85% of the Fortune 100) and nearly 2 million users. The search system we developed scales well up to 1B messages and serves as a foundation for the knowledge base analysis services Yammer is developing.

Boosting Documents in Solr by Recency, Popularity and Personal Preferences
TIMOTHY POTTER | NATIONAL RENEWABLE ENERGY LABORATORY (NREL)
Attendees will come away from this presentation with a good understanding of, and access to source code for, boosting and/or filtering documents by recency, popularity, and personal preferences. My solution improves upon the common “recipe” based solution for boosting by document age. The framework also supports boosting documents by a popularity score, which is calculated and managed outside the index. I will present a few different ways to calculate popularity in a scalable manner. Lastly, my solution supports the concept of a personal document collection, where each user is only interested in a subset of the total number of documents in the index. My presentation will provide a good example of how to filter and/or boost results based on user preferences, which is a very common requirement of many Web applications.

Jazzed about Solr: People as a Search Problem
JOSHUA TUBERVILLE | EHARMONY
Search-oriented architectures are obvious approaches for web pages, emails, documents, and other text-based entities. Often with traditional structured data, text searching is “added on” to the traditional Boolean queries in relational stores. When Jazzed was initiated we wanted search to be front and center. When we evaluated Solr we realized we could take the opposite approach: “add on” Boolean components to textual searches. This hybrid query approach makes transitioning to flexible ranking easy and straightforward. In this talk we will cover:
• How we model semi-structured user data in Solr
• Indexing strategies and their tradeoffs
• Where in the Jazzed architecture Solr does and doesn’t fit
• What aspects of Solr we are using
• Future considerations
Heavy Committing: DocValues aka Column Stride Fields in Lucene 4.0
SIMON WILLNAUER | APACHE LUCENE PMC
Lucene 4.0 is on its way to delivering a tremendous number of new features and improvements. Besides Real-Time Search and Flexible Indexing, DocValues (aka Column Stride Fields) is one of the “next generation” features. DocValues enable Lucene to efficiently store and retrieve type-safe Document & Value pairs in a column-stride fashion, either entirely memory-resident with random access or disk-resident and iterator-based, without the need to un-invert fields. Its final goal is to provide independently updatable per-document storage for scoring, sorting or even filtering. This talk will introduce the current state of development, implementation details, its features and how DocValues have been integrated into Lucene’s Codec API for full extensibility.

Search, APIs, capability management and the Sensis journey
CRAIG REES | SENSIS
Earlier this year, Sensis launched its Business Search API, which allows publishers to develop local search propositions powered by the two million business listings contained in the Australian Yellow Pages® and White Pages® directories. This case study will explore Sensis’ strategic direction for search and explain how the framework and metrics by which search is managed at Sensis were used to define our search roadmap. Key architectural decisions, including our use of Solr and MongoDB, will be discussed, as well as our approach to real-time search tuning and quality management.

A Study of I/O and Virtualization Performance with a Search Engine based on an XML database and Lucene
ED BUECHE | EMC
Documentum xPlore provides an integrated search facility for the Documentum Content Server. The standalone search engine is based on EMC’s xDB (a native XML database) and Lucene. In this talk we will introduce xPlore and some of its key components and capabilities. These include aspects of a tight integration of Lucene with the XML database (xQuery translation and optimization into Lucene queries/APIs, as well as transactional update of Lucene). In addition, xPlore is being deployed aggressively into virtualized environments (both disk I/O and VM). We cover some performance results and tuning tips in these areas.
Four Pillars of Designing the Search Experience
TYLER TATE | TWIGKIT
Lucene and Solr provide many excellent tools for presenting information to users, but what makes some search user interfaces better than others? Should you aim for a rich, advanced UI or should you “just make it look like Google”? Through his work at TwigKit with blue-chip corporations, scientific institutes, and governments, Tyler has identified four guiding pillars of the search experience:
• User Expertise - Novices orienteer, experts teleport
• User Behaviour - Lookup, learn, and investigate
• Information Diversity - homogeneous vs. heterogeneous data
• Situational Context - factors from the surrounding environment
We’ll delve deep into each dimension and discuss how to achieve useful, usable, and beautiful search interfaces using design patterns including: autocomplete, faceted navigation, breadcrumbs, best bets, related searches, spelling suggestions, clickable metadata, result clustering, saved searches, data visualisation, and more.

Using Solr in Online Travel Shopping to Improve User Experience
ESTEBAN DONATO, SUDHAKARA KAREGOWDRA AND RAMON RESMA | TRAVELOCITY
In this talk we would like to present three different use cases of Solr in the travel industry. First, we will describe how we implemented faceted navigation for hotel shopping. Then, we will introduce how we implemented destination searching functionality like auto-complete and misspelling correction. Lastly, we will show you how we integrated Solr to provide better experiences to mobile users.

Solr @ eBay Kleinanzeigen
OLAF ZSCHIEDRICH | EBAY.DE
Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr features are utilized, and how Solr is configured and used in production. Recommended best practices will be profiled along with eBay Kleinanzeigen plans for future deployment of Solr.
Rapid Prototyping with Solr
ERIK HATCHER | LUCID IMAGINATION
Got data? Let’s make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips on adjusting Solr’s schema to match your needs better, and finally will discuss how to showcase your data in a flexible search user interface. We’ll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.

Search Analytics: What? Why? How?
OTIS GOSPODNETIC | SEMATEXT
You’ve indexed your data and people are searching it. But how do you know if they are happy with the results? How do you know if they are finding what they need? With search increasingly becoming the primary information access mechanism, knowing how your search is doing is not just a matter of mere curiosity, but often has direct business impact. In this talk we’ll discuss Search Analytics and how it can be used to answer questions like:
• Are too many users getting the dreaded “no matches” results?
• How deep into search results do people dig?
• Which hits are they clicking on, or what percentage of them don’t click on any hits?
• How much do they use the Did You Mean or Auto-Complete suggestions?
We’ll explore what specific Search Analytics reports tell us and what specific actions you should take based on those reports.
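
As a rough illustration of the kind of questions listed in the Search Analytics abstract, the sketch below computes a few summary figures from a search log. It is not taken from the talk: the `SearchEvent` record, its field names, and the `summarize` function are hypothetical, and a real log would need its own parsing step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchEvent:
    """One logged search. All field names here are hypothetical."""
    query: str
    num_hits: int
    clicked_ranks: List[int] = field(default_factory=list)  # 1-based result positions
    used_suggestion: bool = False  # e.g. a Did You Mean or Auto-Complete suggestion


def summarize(events: List[SearchEvent]) -> Dict[str, float]:
    """Compute simple rates that correspond to the questions in the abstract."""
    total = len(events)
    if total == 0:
        return {"zero_result_rate": 0.0, "no_click_rate": 0.0,
                "avg_deepest_click": 0.0, "suggestion_rate": 0.0}

    zero_results = sum(1 for e in events if e.num_hits == 0)
    with_hits = [e for e in events if e.num_hits > 0]
    # Searches that returned hits but where the user clicked nothing.
    no_clicks = sum(1 for e in with_hits if not e.clicked_ranks)
    # Deepest clicked position per search, as a proxy for how far users dig.
    deepest = [max(e.clicked_ranks) for e in with_hits if e.clicked_ranks]
    suggested = sum(1 for e in events if e.used_suggestion)

    return {
        "zero_result_rate": zero_results / total,
        "no_click_rate": no_clicks / len(with_hits) if with_hits else 0.0,
        "avg_deepest_click": sum(deepest) / len(deepest) if deepest else 0.0,
        "suggestion_rate": suggested / total,
    }


# Made-up sample data, for illustration only.
events = [
    SearchEvent("solr faceting", 120, [1, 3]),
    SearchEvent("lucene fst", 40),
    SearchEvent("slor", 0, used_suggestion=True),
    SearchEvent("docvalues", 15, [7]),
]
summary = summarize(events)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Rates like these are only a starting point; which thresholds should trigger action depends on the application and its users.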
“Stump The Chump”: Get On The Spot Solutions To Your Real Life Solr/Lucene Challenges
GRANT INGERSOLL | LUCID IMAGINATION
Got a tough problem with your Solr or Lucene application? Facing challenges that you’d like some advice on? Looking for new approaches to overcome a Lucene/Solr issue? Not sure how to get the results you expected? Don’t know where to get started? Then this session is for you. Now, you can get your questions answered live, in front of an audience of hundreds of Lucene Revolution attendees! Back again by popular demand, “Stump the Chump” at Lucene Revolution 2011 is hosted by PMC chairman and Lucid Imagination co-founder Grant Ingersoll. All you need to do is send in your questions to us here at info@lucenerevolution.org. You can ask anything you like, but consider topics in areas like:
• Data modelling
• Query parsing
• Tricky faceting
• Text analysis
• Scalability
Please describe in detail the challenge you have faced and any approach you have taken to solve the problem. Anything related to Solr/Lucene is fair game. Our MC will read the questions, and Grant will have to formulate a solution on the spot. A panel of judges will decide if he has provided an effective answer. Prizes will be awarded by the panel for the best question, and for those deemed to have “stumped the chump”.
Improve Relevance by Using Morphology and Named Entity Recognition
CHRISTOPH GOLLER, DIRECTOR, RESEARCH | INTRAFIND SOFTWARE AG
This talk will show how the relevance of search results can be improved by using morphology and named entity recognition. After briefly explaining the purpose of morphological analysis and of named entity recognition, we will analyze their potential advantages for search, faceting, and clustering of search results. Based on these ideas we will briefly sketch how to implement a morphological analyzer in Lucene and how to implement a natural language question answering system based on Lucene using named entity recognition. The talk will be accompanied by a live demo of these ideas.
BIO: Christoph Goller has more than 10 years of experience in the search industry. He got a Ph.D. in computer science from the Technical University of Munich, where he worked on several research projects on artificial intelligence, machine learning and neural networks. Christoph started his career at Lernout & Hauspie. Since 2002 he has been Director of Research at Intrafind Software AG (www.intrafind.de), a German company specializing in full-text search and text mining based on Lucene/Solr. Christoph has been a Lucene committer since 2004. He has accompanied dozens of commercial projects using Lucene and Solr. Christoph is the author of more than 15 scientific papers, frequently gives presentations on search-related topics and is responsible for partner training at Intrafind.

Scientific Data Search in the Pharmaceutical Industry with Solr
JEFFREY GUO, CEO | SEMTIFIC SOFTWARE, INC.
A tremendous amount of experimental information and scientific knowledge has been locked or lost in data silos in the form of semi-structured or unstructured data in today’s pharmaceutical industry. Out-of-the-box full-text search engines do not understand embedded scientific terms and objects and their relationships well enough to facilitate context-sensitive and relevant searches. This presentation will discuss a successful implementation at a major pharmaceutical company that utilizes Solr as its enterprise search platform and enhances it with chemistry (molecular entities and reactions) search capabilities. The scope of the document indexing process is expanded to cover embedded chemistry objects and terms of various types, such as common chemical names, corporate IDs, SMILES, and InChI from documents. Scientifically aware search based on query structure drawing or chemical terms is therefore enabled. Enterprise scientific search strategies and lessons learned will be discussed during the presentation.
Bio: Founder of Semtific Software, Inc., a company that provides products and services that streamline drug discovery workflow and enterprise search of scientific research data.
Using Lucene’s Test Framework
ROBERT MUIR | LUCID IMAGINATION
The Lucene/Solr community takes testing seriously: we have a suite of over 3500 tests to ensure software quality. Over time we accumulated some useful extensions to JUnit testing, and several people found themselves using our extensions for other projects. We released this “test framework” for the first time in Lucene 3.1, and this talk is a short summary of its feature list to hopefully encourage you to go check it out for yourself. Find out how you can:
• Improve test coverage for custom Lucene components
• Speed up your unit test suite by running tests in parallel
• Find resource leaks, localization or timezone-sensitive bugs in your application
• Use our extensions to make unit tests easier to write
Bio: Robert Muir, software engineer for Lucid Imagination, is a Lucene/Solr committer & PMC member.

Using Apache Solr and Active Directory to unify data access across Intranet, ERP and Filesystem Cluster
ROBERT WEIßGRAEBER, PROJECT DIRECTOR | LIGHTWERK
Solr is tightly linked into all available data and business intelligence sources in the enterprise: indexing the TYPO3 CMS-based Intranet, downloads, forms, handbooks, an Oxaion-based ERP database, and the file system cluster running Microsoft Distributed File System, using Tika for full-text content extraction. All data is connected via Active Directory servers into user-based fine-grained access control lists, which are evaluated in real-time and early-binding mode by Solr. A worldwide Solr cluster using different shards gives additional security for worldwide deployment, e.g. keeping confidential data inside the headquarters’ own data centers.
Bio: Robert Weißgraeber is Project Director at Lightwerk, primarily specializing in designing, planning and executing corporate portals.
Thousands of Indexes in the Cloud
SHANEAL MANEK, LEAD SEARCH ENGINEER | GREPLIN
Indexes at Greplin are strange - instead of having one giant index that is searched all the time and updated infrequently, there are thousands of relatively small indexes that are updated much more frequently than they are searched. These unorthodox requirements lead to an unorthodox architecture that uses techniques inspired by Zoie and Bobo. We will discuss techniques that allowed us to exploit the inherent shardability and access patterns of our data to build an extremely high throughput information retrieval architecture. We will also examine some of the challenges and opportunities presented by running Lucene on Amazon’s Elastic Compute Cloud.
Bio: Shaneal Manek is the lead search engineer at Greplin. He was previously the founder and CTO of Signpost.com, which built a geospatial search and recommendation engine on top of Lucene and Lisp.
Intuit’s Live Community
FLOYD MORGAN | INTUIT
TurboTax Live Community is a large-scale web application that uses user contribution and open source technology to help millions of TurboTax users complete their tax returns. Other benefits from Live Community include reduced support calls, highly effective advertising campaigns, usability engineering and, new this year, conversion prediction analytics. I will present how Solr/Lucene powers the many facets of TurboTax Live Community now and in the future.

Highly Relevant Search Result Ranking for Large Law Enforcement Information Sharing Systems
RONALD MAYER | FORENSIC LOGIC
Law enforcement data has many interesting complexities for search. Cross-agency searches are even more challenging because each agency has its own shorthand. Many different types of similarity between search clauses and documents should influence the ranking of results. For example, a search clause mentioning a “tall suspect” might want to include results with “6 foot 4 suspect”. Spatial clusters are important, as are temporal patterns. Different fields may be more or less important depending on the type of crime; for example, a victim’s race may matter more than a vehicle’s make in a sex crime but less in an auto theft. Also, documents may be related to each other in various ways that may also affect their ideal search ranking. Solr’s great flexibility in its analyzers, filters, synonyms, and boosting makes it an excellent tool for such diverse requirements. We’ve contributed a patch to Solr (#SOLR-2058) that helped further improve search result ranking for cases where a search for a suspect with a “red baseball cap, black leather jacket” is compared against many documents mentioning red caps, black caps, etc. This presentation will describe how we addressed some domain-specific challenges of our data.

Using Solr/Lucene/LWE for eCommerce
GRANT INGERSOLL | LUCID IMAGINATION
If your user can’t find it, they can’t buy it, right? In this talk, Apache Lucene and Solr committer Grant Ingersoll will discuss architecture, techniques and tips for successfully deploying search tools like Lucene, Solr and LucidWorks Enterprise in eCommerce environments.
Flexible Indexing in Lucene 4.0
UWE SCHINDLER | SD DATASOLUTIONS
Apache Lucene’s next major release, 4.0, will introduce lots of flexibility into indexing, but also fundamental changes to the well-known APIs: it features a new and consistent, 4-dimensional iteration API on top of a low-level, pluggable codec API giving applications full control over the postings data. Terms are now arbitrary opaque bytes, enabling users to store terms in any encoding, not necessarily UTF-8, natively in the index (e.g. numeric fields). Currently under development is a higher performance postings iteration API, enabling interesting codecs based on recent encoding algorithms to work effectively. Several codecs have already been created, including the default “standard” codec, which enables sizable RAM reduction for searchers, and a “pulsing” codec that inlines postings data directly into the terms dictionary, which provides a solid performance boost for primary key fields. A lot of new codecs are under development, like “PFOR”, “FOR”, “AFOR”, or “Simple64”. In this talk, Uwe presents an overview of all of these exciting changes, as well as several concrete, real-world examples of how applications can tap into these new features.

Transforming the House Hunting Experience: How Solr is Helping Trulia Reshape the Real Estate Industry
ALEXANDER KANARSKY | TRULIA
Trulia is a real estate search company that helps customers find homes for sale or to rent and provides them with information to help them make better decisions in the process. It is also a hub for real estate professionals to market their listings, view real estate data and promote their services. The presentation describes how Solr helped Trulia to transform the traditional real estate experience and make real estate data accessible and understandable to millions of users. It discusses approaches we took to achieve this by using custom-built distributed index management, indexing integration with Hadoop and geospatial search enhancements to Solr.
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platform
TREY GRAINGER | CAREERBUILDER
For CareerBuilder, a 1% deviance in search relevancy can mean millions of missed job opportunities for our users. When CareerBuilder moved to Solr from an expensive, proprietary search vendor, our top priorities were maintaining the quality of our search results and drastically improving our agility. This talk will describe how we addressed both needs. For search quality, we’ll cover some of our internal studies and resulting methods for dealing with multi-lingual content across dozens of languages, as well as customizing and experimenting with relevancy calculations. For platform agility, we’ll discuss CareerBuilder’s cloud-like search API framework, which seamlessly handles millions of searches an hour, processes hundreds of millions of documents, and is powered by hundreds of globally distributed servers. Come hear the results of our studies and some best practices for quality and performance. Learn how our framework has led to staggering improvements in both maintainability and technology innovation, allowing us to learn from our content, not just find it.

Handy Installation Tool “Anuenue” for Solr Cluster & Implementation of “Did you mean” Facility for Queries in Japanese
TAKAHIKO ITO | MIXI
mixi is one of the largest social networking services in Japan, providing various communication services for over 14M monthly active users. The latest internal mixi project is to replace the in-house search engine with Apache Solr. This session covers two topics: a simple packaging system for Solr that eases the installation process and daily operations, and the implementation of a “Did you mean” facility for Japanese queries using a log mining tool. These tools have been released as OSS projects.

Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
ANDRZEJ BIALECKI | LUCID IMAGINATION
This talk will explain what click-through events are and how to process them with LucidWorks Enterprise. This innovative technique puts powerful search and relevancy at your fingertips, at a fraction of the time and effort required to program them yourself with native Apache Solr. Andrzej will discuss and present how you can use LucidWorks Enterprise for:
• Click Scoring to automatically configure relevance for most popular results
• Simplified implementation of auto-complete and “did-you-mean” functionality
• Unsupervised feedback to automatically provide relevance improvement on every query
Using Solr to find the Right Person for the Right Job
LAURA KANG | THELADDERS
In this talk, we’ll describe how TheLadders.com uses Lucene/Solr to instantly recommend candidates to a recruiter when he/she posts a job on the recruiter site. Our matching algorithm scores candidates from our job seeker site based on the criteria and description of jobs and on job seekers’ resume and profile data. This helps recruiters quickly identify candidates that are right for the job and increases the chance of our job seekers getting hired. The talk covers an overview of our Solr architecture and a description of our matching algorithm. We’ll also discuss criteria for evaluating the algorithm, including an overview of our testing sessions and their format. Finally, we’ll demo the feature so you can see how it works in practice.

Using Solr For Enabling Highly Customized Sitewide Navigation
SHANTANU DEO | AT&T
The organization needed to enable a very customizable form of Global Navigation for the various types of users (based on their profile and other factors). This would normally have involved complex logic to figure out the appropriate set of links to show for a customer, and would have been a maintenance nightmare. Instead we approached the problem as a search problem. Coupled with a novel encoding scheme, we were able to solve the problem simply by searching on the customer’s profile groups and returning a coherent global navigation, using Solr to index the data. This has resulted in a solution that is very simple to understand and maintain and that will stand us in good stead in the future. The presentation is meant to be a description of using Solr to implement a real-world application.

Building Specialized Industry Applications Using Solr, And Migration From FAST ESP
RAHUL AGARWALLA | UCHIDA SPECTRUM INC.
Uchida Spectrum, Inc. is a leader in the Japan search market. USI provides SMART/InSight, a search application used by many Fortune 500 companies for specialized industry applications like R&D and quality assurance for manufacturing, claims and customer management, etc. Originally SMART/InSight was based on Microsoft FAST. This talk will review how SMART/InSight has migrated from FAST ESP to LucidWorks Enterprise, and how SMART/InSight incorporates virtual data integration, enterprise search, and the ability for users to have a unified way to navigate diverse data sources, analyze data more easily, and personalize results. Several real-world use cases will be profiled with demonstrations.
The Seven Deadly Sins of Solr
JAY HILL | LUCID IMAGINATION
Sloth. Greed. Pride. Lust. Envy. Gluttony. Wrath. Getting started with Solr can present some pitfalls and temptations, often turning into a trial and error process. (Confess - some or all of these may have been part of your development project.) Based on a broad swath of experience across Solr implementations running in some of the largest Fortune 500 companies as well as some of the smallest start-ups, this talk will cover common mistakes made by newbies and even veteran developers, and how to avoid them. You’ll learn how best to face the challenges that can occur either when starting out with a new Solr implementation, or in keeping up with the latest improvements and changes.

Advanced Search and Analytics in 20 Minutes
MARK DAVIS | KITENGA
Kitenga’s ZettaVox and ZettaSearch products support Solr and Lucene ecosystems at both the ingestion point and for the search user. In this talk, I will show how ZettaVox, our professional content mining platform on Hadoop, can be used to index content and rich metadata into a LucidWorks Enterprise installation. Being built on Hadoop, ZettaVox scales up by scaling out. I will then create an end-user search and analytics experience using our ZettaSearch solution that leverages the faceted metadata to enhance information discovery and analysis. All in about 20 minutes.

Building SaaS Solutions for Online Media Using Apache Solr
ALBERTO MIJARES | CANOO ENGINEERING AG
SaaS applications have the advantages of remote web deployment, which makes them instantly usable by potentially any consumer on the Internet, and of the cost reduction that a web-based deployment provides. In this talk the speaker explains the architecture of an innovative SaaS solution built for the Axel Springer media group (Switzerland). This application can remotely extract the content of multiple online newspaper articles, analyze and classify them, determine which articles are the most similar to a given one, and integrate the results back into the article to provide the user with a “related articles” feature. The core components of the analysis process are language-specific tools (used to filter the superfluous language terms) and semantic knowledge bases (like Wikipedia, used to enrich the indexed information with new context-specific terms, or to disambiguate the extracted terms). At a more technical level, the speaker will explain the criteria for selecting the emerging enterprise search framework Apache Solr as the platform and how it drastically reduced the development effort required.
Solr Performance: Key Innovations
YONIK SEELEY | LUCID IMAGINATION
Recent developments in Solr/Lucene have made significant contributions to distributed search processing, scalability, and throughput. In this talk, Yonik Seeley, creator of Solr, will survey key performance strategies for building search applications with Solr, and review innovations included in Solr 3.1, as well as forthcoming development work in Solr 4.0 and beyond.

Solr and Lucene at Etsy
GREGG DONOVAN | ETSY
Etsy is using Solr and Lucene to serve queries at a rate of more than 8 billion per year (and growing). In this case study, we will describe how Etsy has integrated Solr/Lucene into our continuous deployment infrastructure (see: http://codeascraft.etsy.com/2010/05/20/quantum-of-deployment/), allowing for Solr configuration, Java-based indexers, and query parsing logic to go from passing tests to production code in minutes. We’ll also discuss how we’re leveraging Solr’s new Geo-search to power both local item search and GeoIP-personalized location autosuggest. We’ll also share how we’ve extended Solr, adding personalized faceting and filtering as well as multi-currency sorting and filtering that accounts for real-time currency fluctuation (contributed in SOLR-2202). [Note that code will be open-sourced/contributed for both of these features.] We will share our real-time monitoring techniques, including how we track Solr replication, query, and GC times in Ganglia. Finally, we’ll discuss how we’ve used Hadoop-based user analytics to improve relevance and power data-driven spelling corrections, autocomplete suggestions, and related searches.
Lucene @ Yelp
SUDARSHAN GAIKAIWARI | YELP
This talk describes how Yelp uses Lucene to provide search services. It includes:
• Statistics of Yelp search usage
• Overview of Yelp search architecture: Yelp uses different services to provide searches for different types of data. Some are based on Lucene and some on Solr
• Deeper dive into business and review search. This is the most important search service at Yelp. We will cover:
  • Yelp’s implementation of a micro-sharded architecture and differences with Katta
  • Yelp extensions to Lucene to implement features such as filters, and performance comparison with Solr/Bobo
  • Yelp’s implementation of index replication
  • Various tricks used at Yelp to make the service faster

Using Solr Cloud to Tame an Index Explosion
JON GIFFORD | LOGGLY
We have hundreds of customers, each of whom may have dozens of shards. To manage this explosion of indexes, I’ll describe how we’re using Solr Cloud to manage every index - from creation, through migration from box to box, and finally destruction. I’ll describe some of the performance issues we had to deal with, especially with ZooKeeper.

Lots of Facets, Fast
ANNE VELING | BEYONDTREES
We created a web application for a well-known US newspaper: a maps-like zooming application on top of the 60,000 newspapers published since 1850, using Solr over the 28,000,000 articles to create an interactive heatmap. The out-of-the-box faceting solution was optimized by an order of magnitude using domain knowledge, which allowed us to create a great visual way of exploring trends in historical newspapers.
CPython Embedded in Solr - Search Solution for Python Lovers With the Speed of Native Java
ROMAN CHYLA | CERN
SPIRES is the biggest bibliographic database for High Energy Physics, arXiv is the biggest repository of full-text papers in High Energy Physics, and INSPIRE is the biggest digital library that merges the two. We must work with result sets bigger than 1 million for citation-related queries, and our partners from Astrophysics work with sets of 6 million; however, INSPIRE is written in Python. So how do we move several million result sets between the two systems fast? How do we take advantage of our special NLP processing pipeline written in Python? How do we join them? We do not use Jython. We do not use pipes. We do not embed Solr inside INSPIRE. We embed INSPIRE into Solr! The talk shows benefits and challenges of this surprisingly elegant solution.
  • 37. Rahul Agarwalla
    HEAD OF INTERNATIONAL BUSINESS, UCHIDA SPECTRUM INC
    www.spectrum.co.jp
    Rahul Agarwalla heads international business for Uchida Spectrum Inc, Japan. Previously he built and exited two content/technology ventures, including Matrix Information, the pioneer of digital content syndication in India. He has over 14 years of experience with various search technologies such as Verity, FAST ESP and Solr/Lucene.

    Boris Aleksandrovsky
    SEARCH ARCHITECT, YAMMER
    www.yammer.com
    Boris Aleksandrovsky works for Yammer, the Enterprise Social Network company, which is trying to bring the benefits of social media to enterprises by creating discoverable knowledge bases. He specializes in solving problems of search, machine learning and data analysis at large scale by employing distributed and scalable software architectures. Boris has almost completed his PhD in Computer Science and Neuroscience at the University of California at Irvine.

    Josh Berkus
    CORE TEAM, POSTGRESQL
    www.pgexperts.com
    Josh Berkus has been working as a database application consultant for 8 years. Josh primarily builds applications for the legal and HR industries and does performance tuning. He was also head of Sun Microsystems’ PostgreSQL support staff for 2 years and helped launch BI startup Greenplum.
  • 38. Ed Bueche
    DISTINGUISHED ENGINEER, EMC
    www.emc.com
    Ed Bueche is an EMC Distinguished Engineer and one of the architects of the Documentum xPlore search engine (part of EMC’s Information Intelligence Group). He has been with Documentum/EMC for 12+ years and has more than 23 years of experience in performance/development in the industry, including companies like AT&T Bell Labs and Sybase. At Documentum he worked to improve performance and scalability for all previous Documentum full-text integrations (Verity and FAST). Ed has been a regular speaker for over 11 years at the Documentum worldwide user conferences (in both America and Europe) as well as at EMC World.

    Andrzej Bialecki
    TECHNICAL ADVISOR, LUCID IMAGINATION
    www.lucidimagination.com
    Andrzej Bialecki, Apache Lucene PMC Member, also serves as project lead for Nutch, and as a committer in the Lucene-java, Nutch and Hadoop projects. He has broad expertise across domains as diverse as information retrieval, systems architecture, embedded systems, networking and business process/e-commerce modeling. He is also the author of the popular Luke index inspection utility.

    Roman Chyla
    RESEARCH FELLOW, CERN
    www.cern.ch
    Roman Chyla is a research fellow at CERN, Switzerland. He works in the INSPIRE team to build the biggest digital library for High Energy Physics. He is a developer and also an information specialist, and has presented at four conferences, two of them international: Knihovny soucasnosti 2006, CASLIN 2007, IKI 2009, CASLIN 2009.

    Mark Davis
    CTO, KITENGA, INC
    www.kitenga.com
    Mark Davis is Founder and CTO of Kitenga, Inc. Previously he served as Principal Engineer at Xerox PARC spin-out InXight (acquired by Business Objects), where he designed their enterprise product suite, and at Microsoft as a Program Manager for enterprise search and SharePoint. Mark spent nearly a decade as an academic researcher in the defense/intelligence community specializing in cross-language search and computational linguistics. He has extensive speaking experience in professional and academic forums.
  • 39. Shantanu Deo
    TECHNICAL DIRECTOR, AT&T
    www.att.com
    Shantanu Deo is a Technical Director at AT&T, in charge of their ecommerce CMS team. He is a patent holder and has presented and published his work at the INFORMS conference on Optimization. His interests include web technologies, optimization and, lately, mobile web communications. Shantanu holds a BS in Computer Engineering from the University of Poona, India, and MS degrees in Operations Research and Computer Science from Louisiana State University.

    Esteban Donato
    LEAD ARCHITECT, TRAVELOCITY
    www.travelocity.com
    Esteban Donato works as Lead Architect for Travelocity. He has worked as a Java Developer, Technical Leader and Architect for the last 10 years in different industries. Esteban has been working with Solr and Lucene technology for the last 2 years, implementing it in different projects. Esteban has given talks on Solr and data mining at Travelocity and at universities in Buenos Aires, Argentina.

    Gregg Donovan
    TECHNICAL LEAD SEARCH, ETSY
    www.etsy.com
    Gregg Donovan is currently Technical Lead, Search at Etsy.com, the world’s most vibrant handmade marketplace. He has worked extensively with Solr and Lucene at Etsy and, previously, at TheLadders.com. At Etsy, located in Brooklyn, NY, he leads the search engineering team as it tackles the challenges presented by a growing international marketplace with a half-million different sellers in 150 different countries selling tens of millions of items.

    Stephen Dunn
    HEAD OF TECHNOLOGY STRATEGY, GUARDIAN NEWS AND MEDIA UK
    www.theguardian.co.uk
    Stephen Dunn is Head of Technology Strategy for Guardian News and Media in the UK. He joined The Guardian in 1999, where he helps guide the technology strategy for its multiple award-winning network of web sites and services. His professional interests include open web technologies, digital identity and security. Prior to joining the Guardian, Stephen completed his PhD at the Center for Computational Neuroscience and Robotics at Sussex University, UK.
  • 40. Sudarshan Gaikaiwari
    SOFTWARE ENGINEER, YELP INC
    www.yelp.com
    Sudarshan Gaikaiwari is a software engineer working on Yelp’s search team. Prior to Yelp he worked on various information retrieval technologies at Symantec’s Data Loss Prevention group.

    Jon Gifford
    CO-FOUNDER, LOGGLY
    www.loggly.com
    Jon Gifford is the CTO and co-founder of Loggly, where he spends all day coercing Solr into playing nice with the cloud and with high-volume real-time data streams. An active user and frequent hacker of Lucene since 2004, he’s happy to let Solr take care of some of the hard work for a change. Prior to Loggly, he spent more than a decade working on search systems at Minimal Loop, Scout Labs, Technorati and LookSmart. He is concerned that his near-complete web-anonymity is under threat.

    Otis Gospodnetic
    FOUNDER, SEMATEXT
    www.sematext.com
    Otis Gospodnetic is a coauthor of Lucene in Action (1st and 2nd editions). He has been involved with Lucene since 2000 and Solr since 2006. He is also a member of the Nutch and Mahout development teams, as well as the Lucene Project Management Committee. Otis is an Apache Software Foundation member and the founder of Sematext, a software development and consulting company focused on Search & Analytics using open-source technologies like Lucene, Solr, Nutch, Hadoop, HBase, Flume, and more.
  • 41. Trey Grainger
    SEARCH TECHNOLOGY DEVELOPMENT TEAM LEAD, CAREERBUILDER
    www.careerbuilder.com
    Trey Grainger leads the Search Technology Development group at CareerBuilder.com. He introduced Solr to CareerBuilder and led the successful conversion away from the Microsoft FAST ESP platform. He has been with CareerBuilder for 4 years, and his search experience includes handling multi-lingual content across dozens of markets/languages, genetic-algorithm and user-group-based relevancy tuning, geo-spatial search and validation, and work on customized payload scoring models, data mining, clustering, and recommendations. He is responsible for architecting CareerBuilder’s cloud-like search API, which exposes search as a simple, dynamic, and powerful generic service abstracted away from a large, globally distributed architecture. Trey is also the founder and Chief Architect of Celiaccess.com, a gluten-free search engine and networking site.

    Eric Gries
    PRESIDENT AND CEO, LUCID IMAGINATION
    www.lucidimagination.com
    Eric Gries joined Lucid Imagination as President and CEO after spending more than 20 years in executive leadership roles, where he built high-growth technology-based businesses. Prior to joining the company, Eric was an Executive-in-Residence at Granite Ventures. Eric has served as CEO, general manager and vice president for companies in application development, systems management, networking, financial services and hardware systems, in both the U.S. and Europe. Prior to joining Granite Ventures, Eric led XACCT, a pioneering network mediation market leader, as its president and CEO. XACCT was acquired by Amdocs in 2004, at which time Eric joined Amdocs’ executive team as Senior Vice President. Earlier in his career, Eric served as general manager of Compuware’s Network and Systems Management division, and held product management, marketing, sales and engineering positions at companies such as ACI, Cullinet Software and DEC.

    Erik Hatcher
    TECHNICAL STAFF, LUCID IMAGINATION
    www.lucidimagination.com
    Erik Hatcher is the co-author of two books, Lucene in Action and Java Development with Ant. Erik has been an active member of the Lucene community: a leading Lucene and Solr committer, a member of the Lucene Project Management Committee, and a member of the Apache Software Foundation, as well as a frequent invited speaker at various industry events. Erik earned his B.S. in Computer Science from the University of Virginia, Charlottesville, VA.
  • 42. Jay Hill
    SENIOR SEARCH ARCHITECT, LUCID IMAGINATION
    www.lucidimagination.com
    Jay Hill has been building enterprise search applications since 2003, and has worked extensively with Autonomy IDOL, Lucene, and Solr. He is a certified Solr trainer, and is lead author for Lucid Imagination’s Solr training courses.

    Grant Ingersoll
    CO-FOUNDER, LUCID IMAGINATION
    www.lucidimagination.com
    Grant Ingersoll is a founder and member of the technical staff at Lucid Imagination. Grant’s programming interests include information retrieval, machine learning, text categorization, and extraction. Grant is a regularly featured speaker at ApacheCon and other industry events. He has been an active member of the Lucene community: a Lucene and Solr committer, co-founder of the Apache Mahout machine learning project, chairman of the Lucene Project Management Committee (PMC), and a Vice President at the Apache Software Foundation. He is also the co-author of Taming Text (Manning, forthcoming), covering open source tools for natural-language processing. Grant’s prior experience includes work at the Center for Natural Language Processing at Syracuse University in natural language processing and information retrieval. Grant earned his B.S. in Math and Computer Science from Amherst College and his M.S. in Computer Science from Syracuse University, NY.

    Takahiko Ito
    SOFTWARE ENGINEER, MIXI, INC
    www.mixi.jp
    Takahiko Ito received his Ph.D. in Engineering at Nara Institute of Science and Technology, specializing in graph mining. He was a specialist for Japanese and Asian language processing at Fast Search and Transfer prior to joining mixi, Inc as an R&D engineer. Selected papers include:
    • Masashi Shimbo, Takahiko Ito, Daichi Mochihashi, Yuji Matsumoto. On the Properties of von Neumann Kernels for Link Analysis. Machine Learning, 75:37-67, 2009.
    • Takahiko Ito, Masashi Shimbo, Taku Kudo, Yuji Matsumoto. Application of Kernels to Link Analysis. The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005.
  • 43. Alexander Kanarsky
    SENIOR SOFTWARE ENGINEER, TRULIA
    www.trulia.com
    Alexander Kanarsky is responsible for managing day-to-day operations of Trulia’s indexing and search infrastructure and oversees the search-related development there. Prior to Trulia he was a member of the core development team for Autonomy’s Digital Safe, the world’s largest private archive of electronic documents.

    Laura Kang
    TECHNICAL LEAD, SEARCH AND MATCHING, THELADDERS
    www.theladders.com
    Laura Kang holds a B.A. in computer science, mathematics, and economics from the University of California at Berkeley, and an M.S. and a Ph.D. in computational mechanism design from Harvard University. She has presented her work at several conferences, including the International Conference for Electronic Commerce and the ACM Conference on Electronic Commerce. Before joining TheLadders, she was a manager at a NYC technology startup. At TheLadders, she focuses on search and matching algorithms.

    Sudhakara Karegowdra
    PRINCIPAL ARCHITECT, TRAVELOCITY
    www.travelocity.com
    Sudhakara Karegowdra works as Principal Architect for Travelocity. He has worked as a Java Developer, Technical Leader and Architect for the last 14 years in different industries, 10 of those in the travel industry. Sudhakara has been working with Solr and Lucene technology for the last 3 years, implementing it in different projects. He has given talks on Solr at Travelocity.
  • 44. Steve Kearns
    ROSETTE PRODUCT MANAGER
    www.basistech.com
    Steve is the product manager for the Rosette Platform and is also the subject matter expert for the international compliance market within Basis Technology. Prior to Basis Technology, Steve worked at BBN Technologies on the Broadcast and Web Monitoring Systems, which capture and extract open-source intelligence from live television and internet news websites. He has experience in information visualization and distributed systems architecture, and received his MS in Information Technology and BS in Computer Information Systems from Bentley University. He also spoke at Apache Lucene EuroCon 2010 in Prague, on the topic of Building Multilingual Search Based Applications.

    Marc Krellenstein
    FOUNDER, LUCID IMAGINATION
    www.lucidimagination.com
    Marc Krellenstein is the founder of Lucid Imagination. Marc has 30 years’ experience in the computer industry, focusing for the last 20 years on information retrieval technology and applications. Marc was previously Chief Technology Officer and Vice President for Search and Discovery Technology at Elsevier, the scientific, technical and medical publishing division of Reed Elsevier. Prior to Elsevier, Marc was Chief Technology Officer and Senior Vice President of Engineering at Northern Light Technology, where he was the founding technologist and led the design and development of the Northern Light search service, including designing the data model, query interpretation, relevancy ranking, automatic document classification and patented technology for document clustering. Marc has an A.B. in philosophy from Cornell, an M.S. in computer science from the University of Wisconsin at Madison, and a Ph.D. in psychology (cognitive science) from the New School for Social Research, NY.

    Ronald Mayer
    CTO, FORENSIC LOGIC, INC.
    www.forensiclogic.com
    Ronald Mayer has spent his career with technology start-ups in a number of fields ranging from medical devices to digital video to law enforcement software. Ron has also been involved in Open Source for decades, with code that has been incorporated in the LAME MP3 library, the PostgreSQL database, and the PostGIS geospatial extension. His most recent speaking engagement was a presentation on a broader aspect of this system to the SD Forum’s Emerging Tech SIG, titled “Fighting Crime: Information Chokepoints & New Software Solutions”.
  • 45. Alberto Mijares
    CANOO ENGINEERING AG
    www.canoo.com
    Alberto Mijares is a software engineer with more than 10 years of experience. He is a Scrum Master and an agile practitioner. He has an extensive background in Web technologies and Java, and has participated in W3C activities related to the Semantic Web. His usual role is either leading projects or designing architectures for web applications. He started working at Canoo Engineering AG (Switzerland) in 2008 and speaks Spanish, English and German. He has a degree in Computer Engineering. He has given talks at Java- and Web-related conferences and user groups in Switzerland and Spain.

    Floyd Morgan
    INTUIT
    www.intuit.com
    Floyd is a Principal Software Engineer who works in the Central Technology Organization at Intuit, makers of TurboTax, Quickbooks, Quicken and Intuit Payroll, to name a few. Floyd has developed core features of the flagship TurboTax product line and recently co-founded Intuit’s newest social-driven technology, Live Community. Under Floyd’s direction, Live Community has gone from a small project to a widely adopted platform used by most Intuit products and services. Floyd earned his B.S. in Computer Science from San Diego State University.

    Stephen O’Grady
    CO-FOUNDER AND PRINCIPAL ANALYST, REDMONK
    www.redmonk.com
    Stephen O’Grady is the co-founder and Principal Analyst of RedMonk, a boutique industry analyst firm focused on developers. Founded in 2002, RedMonk provides strategic advisory services to some of the most successful technology firms in the world. Stephen’s focus is on infrastructure software such as programming languages, operating systems and databases, with a special focus on open source and big data. Before setting up RedMonk, Stephen worked as an analyst at Illuminata. Prior to joining Illuminata, Stephen served in various senior capacities with large systems integration firms like Keane and consultancies like Blue Hammock. Regularly cited in publications such as the New York Times, NPR, the Boston Globe, and the Wall Street Journal, and a popular speaker and moderator on the conference circuit, Stephen is well respected throughout the industry for his advice and opinions.
  • 46. Timothy Potter
    SENIOR ENGINEER, NATIONAL RENEWABLE ENERGY LABORATORY (NREL)
    www.nrel.gov
    Timothy is a highly skilled technologist with over 13 years of experience delivering innovative software solutions that encompass a wide range of technologies and business sectors. Currently, Mr. Potter is a Senior Engineer at the National Renewable Energy Laboratory (NREL), where he leads the effort to build a large-scale distributed platform for handling smart-grid-related energy data using Hadoop and NoSQL technologies. Prior to NREL, Timothy was the CTO for Viyya Technologies, where he developed a large-scale content recommendation system based on Solr, Mahout, and Hadoop running in the Amazon Cloud. As a Senior Software Engineer for the WebLogic Platform at BEA Systems, he was the chief inventor of several US patents that helped revolutionize J2EE-based enterprise application integration. His technical blog (http://thelabdude.blogspot.com/) is highly respected as a guide for other developers in the open-source Java community. Mr. Potter has a BS in Mathematics and a BA in Economics with honors (summa cum laude) from the University of Colorado.

    Daniel Potzinger
    AOE MEDIA GMBH
    www.aoemedia.de
    Daniel Potzinger has more than 10 years of web development experience under his belt. He is a skillful hand at developing clean solutions, with a particular love of elegant, easily maintained and reusable code. Daniel is always open to new projects and development methods, such as Agile software development. Since joining AOE media, Daniel has played “midwife” to more than 60 enterprise CMS projects for such renowned clients as congstar, Cisco WebEx, VMware and Panasonic: taking care of client requirements, directing the development and launching the results.
  • 47. Craig Rees
    SENSIS
    Sensis.com.au
    Craig Rees has been at Sensis since 2008. Craig heads up the content and search groups, which manage the search capabilities, platforms and operational teams that support the Yellow Pages® and White Pages® businesses. Craig is the author of the Sensis Content Strategy and the technology owner of the Sensis Business Search API. Prior to joining Sensis, Craig worked in digital strategy development and implementation roles in the United Kingdom with companies including BBC, Sky and Argos.

    Ramon Resma
    ARCHITECT, TRAVELOCITY
    www.travelocity.com
    Ramon Resma works as an Architect for Travelocity Mobile. He has over 22 years of experience in the travel industry and has worked in technical leadership roles for Travelocity Architecture, Sabre Airline Solutions Architecture, and American Airlines. Ramon has been working with Solr and Lucene technology for the last 2 years. Recently he worked on implementing Solr functions for serving location-based content on travel mobile applications.

    Yonik Seeley
    CREATOR OF APACHE SOLR & CO-FOUNDER, LUCID IMAGINATION
    www.lucidimagination.com
    Yonik Seeley is the creator of Solr. He is an expert in distributed search systems architecture and performance. Yonik has been a prolific Lucene/Solr committer, a member of the Lucene PMC, and a member of the Apache Software Foundation. Yonik’s work experience includes CNET Networks, BEA and Telcordia. He earned his M.S. in Computer Science from Stanford University.