Securing Solr Documents with ManifoldCF

How to Enforce Repository Authorization with Solr




What I Will Cover
§  What ManifoldCF does and the problem it is
    designed to solve
§  ManifoldCF’s way of mapping repository
    security to documents indexed by Solr/
    Lucene
§  A Q&A panel session describing real-world
    usage of the ManifoldCF security projection
    model




Who am I?
§  I am:
   •  Karl Wright (kwright@apache.org)
   •  Principal Software Engineer at Nokia, Inc.
   •  Formerly Principal Software Engineer at
      MetaCarta, Inc.
§  What I do:
   •  Work at Nokia on making location search better
   •  Designer and original implementer of
      ManifoldCF
   •  Author of ‘ManifoldCF in Action’
   •  Committer for ManifoldCF
   •  Other interests include musical composition,
      quantum mechanics, and evolutionary biology

Let’s search our repository using Solr!

§  But first, we have to get our repository
    documents indexed by Solr
§  And then… there’s another obstacle… VINNY




Who is this Vinny guy??
§  Chances are, you already know him
§  “Vinny” protects your organization’s content
§  “Vinny” prevents unauthorized users from
    seeing what they aren’t supposed to see
§  “Vinny” isn’t going to let you index his content
    unless you can control access in the same way




ManifoldCF to the Rescue!
§  Plug-in architecture allows connectors
    to easily be written, if they don’t exist
    already
§  Existing repository connectors for web,
    RSS, JDBC, CIFS (shared file
    system), SharePoint, Meridio, FileNet,
    LiveLink, Documentum, CMIS
§  Existing output connectors for Solr,
    GTS, and OpenSearchServer
§  Includes a user-facing UI, an API, and
    an Authority Service

Query Restriction Model




(From ManifoldCF in Action, Chapter 4. Reprinted with permission.)


How ManifoldCF Implements Query Restriction
§  Document access tokens are sent to the search
    index along with the document content
§  Separate bins for “allow” tokens, “deny” tokens
    – for “file”, multiple “folder”, and “share” levels
§  In practice, only “file” and “share” levels are
    needed
§  ManifoldCF Authority Service maps user names
    to a user’s access tokens
§  Solr SearchComponent or QParserPlugin
    communicates with the MCF Authority Service
    and performs the query modification

ManifoldCF Architecture




What does the Pull-Agent daemon do?
§  Pulls documents from various repositories,
    continuously or on a schedule, and hands them
    to the output search engine
§  Incremental – does as little work as possible
§  Also fetches and indexes each document’s
    access tokens




What does the Authority Service do?




Ok, what does the Authority Service REALLY do?
§  User names go in (user@domain)
§  Access tokens come out – for all active
    authority connections currently defined in that
    ManifoldCF instance
§  HTTP based, line-by-line output, with helpful
    hints:
curl http://localhost:8345/mcf-authority-service/UserACLs?username=foo@bar.com
UNREACHABLEAUTHORITY:The+Spanish+Inquisition
TOKEN:My+Authority:DEAD_AUTHORITY
AUTHORIZED:Null+authority
TOKEN:Null:foo%40bar.com
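
A minimal sketch of how a client might build the request and read the line-by-line response. It assumes every line has the form KEYWORD:value, as in the sample output above. The function names and the returned structure are hypothetical and are not part of ManifoldCF. Tokens are kept exactly as received, on the assumption that they must match the indexed values character for character; only the hint text is decoded for display.

```python
from urllib.parse import urlencode, unquote_plus

# Base URL copied from the sample command above; adjust for a real deployment.
BASE_URL = "http://localhost:8345/mcf-authority-service/UserACLs"


def build_user_acls_url(username, base_url=BASE_URL):
    """Return the request URL for one user, with the user name URL-encoded."""
    return base_url + "?" + urlencode({"username": username})


def parse_user_acls(body):
    """Split a response body into access tokens and status hints.

    Assumes each non-empty line is 'KEYWORD:value'. TOKEN lines carry access
    tokens; any other keyword is treated as a hint about an authority.
    """
    tokens, hints = [], []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, _, value = line.partition(":")
        if keyword == "TOKEN":
            tokens.append(value)  # kept raw (assumption, see lead-in)
        else:
            hints.append((keyword, unquote_plus(value)))
    return tokens, hints


sample = """UNREACHABLEAUTHORITY:The+Spanish+Inquisition
TOKEN:My+Authority:DEAD_AUTHORITY
AUTHORIZED:Null+authority
TOKEN:Null:foo%40bar.com
"""
tokens, hints = parse_user_acls(sample)
```

A caller would typically log the hints and pass only the tokens on to the query restriction step.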


What do you have to do to Solr to make this all work?
    §  Add fields to the schema to contain
        document access tokens
       •  A field for document-level “allow”
          tokens
       •  A field for document-level “deny” tokens
       •  A field for share-level “allow” tokens
       •  A field for share-level “deny” tokens
    §  Add something that authenticates a
        user and obtains a user name
    §  Add a SearchComponent or Query
        Parser to restrict incoming queries

The Solr component is NOT where the magic is…
§  Each access token returned by
    the Authority Service adds a
    clause to a BooleanQuery
§  It is rare for a user to have more
    than one hundred access tokens
    – except for Documentum!!
§  ManifoldCF in Action provides an
    example Solr SearchComponent
§  dist/solr-integration provides
    a Solr SearchComponent and
    QParserPlugin (MCF trunk)
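
To illustrate the "one clause per access token" idea, here is a rough sketch that builds the restriction as a Lucene-syntax query string. The real components build a BooleanQuery in Java inside Solr; this string form is only an illustration. The four field names and the public token value are assumptions, so substitute whatever your schema defines. The sketch also assumes the public token is written to an allow field only when that level has no tokens at all.

```python
# Hypothetical field names and public token; use whatever your schema defines.
LEVELS = [
    ("allow_token_share", "deny_token_share"),
    ("allow_token_document", "deny_token_document"),
]
PUBLIC_TOKEN = "__public__"


def quote(token):
    """Quote a token as a phrase so special characters are taken literally."""
    return '"' + token.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_restriction(user_tokens):
    """Return a query string admitting only documents the user may see."""
    parts = []
    for allow_field, deny_field in LEVELS:
        # Allow side: the public token or any one of the user's tokens matches.
        allowed = [PUBLIC_TOKEN] + list(user_tokens)
        any_allow = " OR ".join(f"{allow_field}:{quote(t)}" for t in allowed)
        parts.append(f"+({any_allow})")
        # Deny side: one prohibited clause per token, so DENY always wins.
        parts.extend(f"-{deny_field}:{quote(t)}" for t in user_tokens)
    return " ".join(parts)


print(build_restriction(["T2"]))
```

The resulting string could be supplied as a filter query alongside the user's own query, so the restriction does not affect relevance scoring. A user with no tokens ends up with only the public clauses.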

How are the four token types related?
§  Share and document levels computed
    independently; an included document must
    pass both
§  For each level, DENY tokens exclude and
    ALLOW tokens permit, but DENY tokens
    always win over ALLOW
§  Special meaning for no tokens at all at a level –
    neither ALLOW nor DENY tokens means “public” –
    handled by a default token in Solr
§  Active Directory does it exactly the same way,
    oddly enough, using SIDs for tokens

Example
Document       Share allow   Share deny   Doc allow    Doc deny
Look_at_me     (empty)       (empty)      (empty)      (empty)
Very_secret    (empty)       (empty)      (empty)      T1
Not_picky      (empty)       (empty)      T1, T2, T3   T4
Really_picky   (empty)       (empty)      T1           (empty)
Insane         T1, T2        T3           T3, T2       T1
Share_ctrl’d   T1, T2, T3    T4           (empty)      (empty)


§  “Not_picky” and “Share_ctrl’d” seen by the
    same people
§  “Very_secret” seen by nobody
§  “Insane” seen only by people who hold T2 but
    neither T1 nor T3
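
The rules and the example table can be checked with a small sketch. This is plain Python that mirrors the stated rules, not ManifoldCF or Solr code, and all function names are hypothetical. A level with no tokens at all is public; otherwise any matching DENY token excludes, and at least one ALLOW token must match.

```python
from itertools import combinations

TOKENS = ["T1", "T2", "T3", "T4"]

# (share allow, share deny, doc allow, doc deny), copied from the table above.
DOCS = {
    "Look_at_me":   (set(), set(), set(), set()),
    "Very_secret":  (set(), set(), set(), {"T1"}),
    "Not_picky":    (set(), set(), {"T1", "T2", "T3"}, {"T4"}),
    "Really_picky": (set(), set(), {"T1"}, set()),
    "Insane":       ({"T1", "T2"}, {"T3"}, {"T3", "T2"}, {"T1"}),
    "Share_ctrl’d": ({"T1", "T2", "T3"}, {"T4"}, set(), set()),
}


def level_passes(allow, deny, user_tokens):
    """Apply the per-level rule: empty level is public, DENY beats ALLOW."""
    if not allow and not deny:
        return True
    if deny & user_tokens:
        return False
    return bool(allow & user_tokens)


def can_see(name, user_tokens):
    """A document is visible only if both the share and doc levels pass."""
    share_allow, share_deny, doc_allow, doc_deny = DOCS[name]
    user_tokens = set(user_tokens)
    return (level_passes(share_allow, share_deny, user_tokens)
            and level_passes(doc_allow, doc_deny, user_tokens))


def viewers(name):
    """Return every combination of TOKENS whose holder can see the document."""
    subsets = (frozenset(c) for n in range(len(TOKENS) + 1)
               for c in combinations(TOKENS, n))
    return {s for s in subsets if can_see(name, s)}
```

Comparing viewers("Not_picky") with viewers("Share_ctrl’d"), or inspecting viewers("Very_secret"), reproduces the three observations listed under the table.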

What is still missing from the picture?
§  Well, getting documents and authorization info
    into Solr is covered…
§  Getting authorization information for a user is
    covered…
§  Modifying the search to enforce authorization is
    covered…
§  Authentication is NOT covered!
   •  ManifoldCF does not help you with this problem
      – yet
   •  Consider JAAS in Tomcat
   •  Apache web server’s mod_auth_kerb also works

Do you think these people care about security?




Wrap Up
§  ManifoldCF provides a great way to project
    repository security into Solr
§  ManifoldCF effectively converts repository
    security into an AD-like token model
§  As long as you can provide the authentication,
    MCF and Solr can provide the rest
§  Nobody ever expects the Spanish Inquisition




Our Panel Today
§  Karl Wright
§  Eric Pugh
§  Shinichiro Abe




Sources
§  ManifoldCF in Action
   •  http://www.manning.com/wright
   •  http://manifoldcfinaction.googlecode.com/svn/trunk/edition_1/security_example




Contacts
§  Shinichiro Abe
   •  shinichiro@apache.org
   •  http://www.rondhuit.com/apache-manifoldcf.html (in Japanese)
§  Eric Pugh
   •  epugh@opensourceconnections.com
§  Karl Wright
   •  kwright@apache.org
   •  http://manifoldcfinaction.blogspot.com



Securing Documents in Solr with Manifold CF - Karl Wright
