Angel Borroy
Tom Page
10th June 2020
(Re)Indexing
Large Repositories
Agenda
(Re)Indexing Large Repositories
• Alfresco SOLR in a Nutshell
• Indexing Process
• Indexing Scenarios
• When to Re-Index
• Deployment Alternatives
• Demo time without downtime!
• Benchmark Review
• Improvements in 1.4.2
• Future Improvements
• Recap
Alfresco SOLR
Alfresco SOLR in a Nutshell
SOLR 6 is used in Alfresco to perform two main processes:
• Indexing (or tracking) metadata, permissions and content from the Alfresco Repository
• Returning results from search queries supporting several syntaxes (AFTS, CMIS)
Indexing process: asynchronous
Searching process: synchronous, including permission checks; supported syntaxes: AFTS, CMIS
Eventual consistency
SOLR indexes the information after the database has committed the transaction, so there is a short period of time when not all documents are available in the SOLR Index. We call this eventual consistency, as SOLR will eventually catch up with the Repository.
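Because indexing is asynchronous, a client that creates a node and searches for it immediately may not find it yet. A minimal sketch of a client-side "wait until indexed" loop follows; `is_visible` is a hypothetical callback (for example, one that runs an AFTS query for the new node), and the timeout values are illustrative assumptions, not Alfresco defaults.

```python
import time
from typing import Callable


def wait_until_indexed(
    is_visible: Callable[[], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_visible() until it returns True or timeout_s elapses.

    Returns True if the node became searchable, False on timeout.
    `is_visible` is a hypothetical callback supplied by the caller.
    """
    deadline = clock() + timeout_s
    while True:
        if is_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; in production the defaults are used.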
Alfresco SOLR Storage
By default, two SOLR cores are created: one for live documents (alfresco) and one for removed documents (archive).
Each core includes the following storage folders:
• Default SOLR Index files in the solrhome/<core>/index folder
• Alfresco customized Content Store in the contentstore folder
• This folder includes a cached copy of Repository content and metadata
• Content Store will be removed in Search Services 2.0
These folders are populated by the Indexing Process.
Indexing process
● Each tracker is fired asynchronously according to a cron expression: alfresco.cron or alfresco.*.tracker.cron
● Transactions and ACL Change Sets are processed in batches of Nodes or ACLs
● Batches are split to be executed in parallel by Workers
● However, the Content Tracker retrieves text from content nodes one by one
● The Commit Tracker writes the changes from the different Trackers to the SOLR Index "eventually"
>> The Cascade Tracker does not run when indexing from scratch
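The batch-and-workers pattern above can be sketched as follows. This is an illustration, not Search Services code: Search Services is Java (the 2.0 trackers use a ForkJoinPool), and here a Python thread pool stands in for it. `index_batch` is a hypothetical function that indexes one batch of node IDs and returns how many it processed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_batches(items: Sequence[int], batch_size: int) -> List[Sequence[int]]:
    """Split a transaction's node IDs into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def index_in_parallel(
    node_ids: Sequence[int],
    batch_size: int,
    workers: int,
    index_batch: Callable[[Sequence[int]], int],
) -> int:
    """Index node batches concurrently; returns the total number of nodes indexed."""
    batches = split_into_batches(node_ids, batch_size)
    # Each batch is handed to a worker thread, mirroring "Batches are split to be
    # executed in parallel by Workers"; the pool bounds concurrency at `workers`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_batch, batches))
```

For example, `index_in_parallel(list(range(10)), 4, 2, len)` splits ten nodes into batches of 4, 4 and 2 and reports 10 nodes indexed.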
Indexing scenarios
1. When updating the repository using applications or bulk ingestion processes, transactions will include a long list of nodes to be indexed
2. When using Alfresco Share to create new content, there will be more transactions, but each transaction will include a short list of nodes to be indexed
3. When setting the permission level manually for every node in a hierarchy, ACL Change Sets will include a long list of ACLs to be indexed
4. When using the default Alfresco permissions design, ACL Change Sets will include a short list of ACLs to be indexed
5. When using complex document formats, the Transformation Service will require additional resources
6. When using large documents, the SOLR Index will require additional storage
Indexing scenarios
Controlling what to index
• Content can be excluded from the SOLR Index by configuration
solrcore.properties > alfresco.index.transformContent=false
https://docs.alfresco.com/search-community/concepts/solrcore-properties-file.html
Add this setting to the archive core by default!
• Some nodes can be excluded from the SOLR Index by using the Index Control aspect
cm:indexControl > cm:isIndexed :: false, neither metadata nor content is indexed
cm:indexControl > cm:isContentIndexed :: false, content is not indexed
https://docs.alfresco.com/community/concepts/admin-indexes.html
• Some properties can be excluded from the SOLR Index by design in the Content Model
<property>
  <index enabled="false"/>
</property>
https://docs.alfresco.com/community/references/dev-extension-points-content-model-define-and-deploy.html
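The node-level and core-level switches above combine as a small decision table. The sketch below models those rules only; it is not Alfresco code. It assumes that a node without the `cm:indexControl` aspect behaves as if both flags were true, and it does not model property-level exclusion via `<index enabled="false"/>`.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class IndexDecision:
    metadata: bool
    content: bool


def decide_indexing(node_props: Mapping[str, bool], transform_content: bool = True) -> IndexDecision:
    """Model which parts of a node end up in the SOLR Index.

    node_props: cm:indexControl values on the node (absent keys assumed True).
    transform_content: the core-level alfresco.index.transformContent setting.
    """
    # cm:isIndexed=false: neither metadata nor content is indexed
    if not node_props.get("cm:isIndexed", True):
        return IndexDecision(metadata=False, content=False)
    # Content is indexed only if both the core setting and the node flag allow it
    content = transform_content and node_props.get("cm:isContentIndexed", True)
    return IndexDecision(metadata=True, content=content)
```

With `transform_content=False` (the setting suggested for the archive core), every node keeps its metadata in the index but no content text is extracted.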
When to re-index
The re-indexing process can take from a few hours to a few days in large repositories.
Full re-index
• When upgrading to a major Search Services release, like 2.0
• When the SOLR Index has been corrupted for technical reasons
• When breaking changes are introduced in widely used custom Content Models
Partial re-index
• This process can also take some time, depending on the number of documents to be re-indexed, but it takes less time than a full re-index
• When incremental changes are introduced in a Content Model, a partial re-index can be triggered using the SOLR REST API
http://localhost:8983/solr/admin/cores?action=reindex&query=TYPE:person
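The partial re-index call is a plain HTTP GET, but the query must be URL-encoded once it contains spaces or quotes. A minimal sketch, assuming the endpoint and parameters shown in the URL above; depending on your security setup (shared secret or mTLS), the request may need extra headers or client certificates that are not shown here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_reindex_url(solr_base: str, query: str) -> str:
    """Build a partial re-index URL (action=reindex) for an arbitrary query."""
    params = urlencode({"action": "reindex", "query": query})
    return f"{solr_base.rstrip('/')}/admin/cores?{params}"


def trigger_reindex(solr_base: str, query: str, timeout_s: float = 30.0) -> str:
    """Send the re-index request and return the raw response body."""
    with urlopen(build_reindex_url(solr_base, query), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```

For example, `build_reindex_url("http://localhost:8983/solr", "TYPE:person")` yields `http://localhost:8983/solr/admin/cores?action=reindex&query=TYPE%3Aperson`.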
Deployment alternatives
https://docs.alfresco.com/sie/concepts/solr-shard-overview.html
https://docs.alfresco.com/sie/concepts/solr-replication.html
https://docs.alfresco.com/sie/tasks/solr-install.html
Installing alternatives
• Using the ZIP Distribution file
https://docs.alfresco.com/search-community/concepts/solr-install-config.html
• Using Docker or Docker Compose
https://github.com/Alfresco/SearchServices/tree/master/search-services
https://github.com/Alfresco/acs-community-deployment/tree/master/docker-compose
https://github.com/Alfresco/alfresco-docker-installer
• Using Kubernetes
https://github.com/Alfresco/acs-community-deployment/tree/master/helm/alfresco-content-services-community
Deployment for Re-Indexing
Deployment schema to minimize downtime during re-indexing
> When using a different SOLR version, configure the Alfresco Repository to use the new SOLR server *
> When using the same SOLR version, the index folder can be used directly
* Upgrading from SOLR 4 to SOLR 6 is not allowed when using Alfresco CE 6.2.0-ga (thanks for raising this @AFaust) >> SEARCH-2289
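Switching the Repository to the new SOLR server comes down to a few `alfresco-global.properties` entries (the demo also shows the Admin Web Console alternative). The sketch renders them; the property names `index.subsystem.name`, `solr.host`, `solr.port` and `solr.secureComms` are the ones we understand ACS 6.x to use, and the `"none"` default is only an example, so check both against the documentation for your version and security setup.

```python
def solr_switch_properties(host: str, port: int, secure_comms: str = "none") -> str:
    """Render alfresco-global.properties lines pointing the Repository at a SOLR 6 server.

    Property names are assumptions to verify for your ACS version;
    secure_comms should match your setup (for example "https" when using mTLS).
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    props = {
        "index.subsystem.name": "solr6",
        "solr.host": host,
        "solr.port": str(port),
        "solr.secureComms": secure_comms,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

Rendering the lines from code makes the cut-over reproducible: the same function can produce the "old" and "new" values for a quick rollback.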
Alfresco Repository Indexing Configuration
When configuring an Alfresco node to perform the re-indexing process, there are some services you can switch off, depending on your requirements:
• Scheduled Jobs can be disabled, as they will be run by the Alfresco instance in the live service
https://docs.alfresco.com/6.2/concepts/scheduled-jobs.html
• Some ACS features can be disabled
https://docs.alfresco.com/6.2/concepts/maincomponents-disable.html
• Additional subsystems (apart from Search and Transformation) can be disabled
https://docs.alfresco.com/6.2/concepts/subsystem-categories.html
• Activities
• Audit
• Email
• …
Don’t make a copy of your Alfresco Repository production configuration and press the start button!
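One way to follow that warning is to derive the re-indexing node's configuration from production explicitly, applying a reviewed set of overrides instead of copying the file as-is. The override keys below (`audit.enabled`, `email.server.enabled`, `email.inbound.enabled`, `activities.feed.notifier.enabled`) are assumed property names for the Audit, Email and Activities subsystems; verify each against your ACS version. Scheduled jobs are disabled per job as described in the linked documentation and are not covered here.

```python
from typing import Dict, Mapping

# Assumed property names; verify each against the documentation for your ACS version.
REINDEX_NODE_OVERRIDES: Dict[str, str] = {
    "audit.enabled": "false",
    "email.server.enabled": "false",
    "email.inbound.enabled": "false",
    "activities.feed.notifier.enabled": "false",
}


def reindex_node_properties(
    production: Mapping[str, str],
    overrides: Mapping[str, str] = REINDEX_NODE_OVERRIDES,
) -> Dict[str, str]:
    """Return a new property map for the re-indexing node; production is left untouched."""
    derived = dict(production)  # copy, so the production mapping is never mutated
    derived.update(overrides)
    return derived
```

Keeping the overrides in one reviewed mapping makes it obvious which services differ between the live node and the indexing node.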
Monitoring
Profiling
• Using VisualVM or YourKit Java Profiler for the JVMs (Repository, SOLR)
• Using the pg_stat_statements extension or some other DB tool
https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-6-profiling-with-docker/ba-p/295846
https://github.com/aborroy/alfresco-6-profiling
Monitoring
• Using Prometheus with Grafana (Repository, SOLR)
https://hub.alfresco.com/t5/alfresco-content-services-blog/monitoring-alfresco-solr-with-prometheus-and-grafana/ba-p/294157
https://github.com/aborroy/alfresco-solr-monitoring
Demo time without downtime!
• Live Docker Compose environment running with around 4,000 text documents indexed
• Using YourKit Java Profiler to monitor Repository performance
• Starting a new Search Services 2.0 server locally to index the repository
• Once the Search Services 2.0 index is up to date, change the Solr hostname value from the Admin Web Console or in alfresco-global.properties
Search Services 2.0 is not released yet!
http://127.0.0.1:8083/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
http://127.0.0.1:8983/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
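The two demo URLs query the old and the new server with the same query. A minimal sketch for deciding when the new index has caught up compares `response.numFound` (the standard Solr JSON field) from both; `rows=0` avoids transferring documents. The tolerance parameter is an illustrative assumption, since a live repository keeps changing while you compare.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base: str, query: str, core: str = "alfresco") -> str:
    """Build a count-only select URL for the given core."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base.rstrip('/')}/{core}/select?{params}"


def num_found(payload: str) -> int:
    """Extract response.numFound from a Solr JSON response body."""
    return int(json.loads(payload)["response"]["numFound"])


def fetch_num_found(base: str, query: str, timeout_s: float = 30.0) -> int:
    with urlopen(select_url(base, query), timeout=timeout_s) as resp:
        return num_found(resp.read().decode("utf-8"))


def caught_up(old_count: int, new_count: int, tolerance: int = 0) -> bool:
    """True when the new server holds at least as many matches as the old one (minus tolerance)."""
    return new_count >= old_count - tolerance
```

Usage: call `fetch_num_found` against each base URL (for example `http://127.0.0.1:8983/solr` and `http://127.0.0.1:8083/solr`) with `TEXT:[* TO *]`, then switch the Repository only when `caught_up` returns True.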
Benchmark Review
1 Billion Documents Review (2015)
• Review of the 1 billion document benchmark (November 2015)
• 10 repository nodes (Alfresco 5.1), 20 Solr 4 nodes (Alfresco Index Server)
• Indexed 1b documents in 5 days
How Alfresco powered a 1.2 Billion document deployment on Amazon Web Services
Improvements in 1.4.2
1.2 Billion Baseline Plan (2020)
• Customer-sponsored benchmark to see performance of system with their configuration
• Want 1.2b documents indexed into Search Services
• 20 instances, each containing a single shard (DB_ID_RANGE based sharding)
Performance considerations
• Bottlenecks
• Database (getChildAssocs)
• Transformers (when using large documents)
• Network (when using large metadata/content)
• Time spent processing data for other shards
Baseline Results
• Estimated completion in 21 days
DB_ID_RANGE Sharding
• Does not require specifying total number of shards in advance
• Index can continue to grow with repository
See https://docs.alfresco.com/search-enterprise/concepts/solr-shard-approaches.html
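DB_ID_RANGE sharding assigns each node to the shard whose ID range contains the node's database ID, so a new shard can be added for the next range as the repository grows. The sketch below models that routing; it is not Search Services code. In Search Services this is configured per shard with properties such as `shard.method=DB_ID_RANGE` and `shard.range`; check the linked documentation for the exact syntax. Treating ranges as half-open `[start, end)` is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ShardRange:
    """Hypothetical model of one shard's DB_ID range, treated as half-open [start, end)."""
    instance: int
    start: int
    end: int

    def owns(self, dbid: int) -> bool:
        return self.start <= dbid < self.end


def shard_for(dbid: int, shards: List[ShardRange]) -> Optional[int]:
    """Return the shard instance owning dbid, or None if no range covers it yet."""
    for shard in shards:
        if shard.owns(dbid):
            return shard.instance
    return None
```

For example, with shards covering `0-20,000,000` and `20,000,000-40,000,000`, ID 45,000,000 has no owner until a third shard for the next range is added; no existing shard needs to be re-indexed.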
Cascade Tracking
Time spent processing transactions for other shards
• With DB_ID_RANGE sharding, we know that only a range of transactions is relevant
• Skip transactions when using DB_ID_RANGE
• To support path queries, we sometimes need to update data on multiple shards from a single change
• Option to disable cascade tracking
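The idea behind skipping transactions can be illustrated by keeping only transactions that touch at least one node inside the shard's range. This is a simplified model, not the 1.4.2 implementation, which may decide from different transaction metadata; it also ignores cascade updates, where a path change on one shard must propagate to descendants stored on other shards.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record: its ID and the node DB IDs it changed."""
    txn_id: int
    node_ids: Tuple[int, ...]


def relevant_transactions(txns: Iterable[Txn], start: int, end: int) -> List[Txn]:
    """Keep only transactions touching at least one node in [start, end)."""
    return [t for t in txns if any(start <= n < end for n in t.node_ids)]
```

For a shard owning `[100, 200)`, a transaction that only changed node 5 is skipped, while one that changed nodes 99 and 100 is kept.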
Reduce Database Access and Network Usage
• Reduce amount of data requested
• Remove unused calls to getChildAssocs
• Compress communication where appropriate
• Add option to compress content transfer
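(Diagram: instead of asking "Please give me all metadata for the node", the tracker asks only for the fields it needs ("Please give me: X, Y, Z"); content such as "Lorem ipsum dolor sit amet, consectetur adipiscing elit..." is transferred compressed, e.g. 78 9c 05 c1 81 09 c0 30 08 04 c0 ...)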
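The example bytes in the diagram start with `78 9c`, which is how a zlib stream begins at the default compression level. The sketch below shows the effect on repetitive text with Python's `zlib`; it illustrates why compressing content transfer helps and does not claim that Search Services uses this exact codec or level.

```python
import zlib


def compress_payload(payload: bytes, level: int = -1) -> bytes:
    """zlib-compress a payload (level -1 selects zlib's default level)."""
    return zlib.compress(payload, level)


def decompress_payload(data: bytes) -> bytes:
    return zlib.decompress(data)


text = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 50).encode("utf-8")
packed = compress_payload(text)
# Header bytes, original size and compressed size
print(packed[:2].hex(), len(text), len(packed))
```

Text extracted from documents is usually highly compressible, so the saving in network transfer can outweigh the CPU cost of compressing on the Repository side; binary or already-compressed content benefits far less.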
Overview of Improvements in 1.4.2
• Search Services 1.4.2 (and Insight Engine 1.4.2)
• ACS Repository 6.2 Enterprise
• No ACS Community release includes this yet
• However, you can use an existing ACS installation with the jars from https://github.com/aborroy/solr-performance-services-repo
Reindex of 1.2b documents in 10 days (6 repo nodes, 20 search nodes)
Search Services 1.3.0: 150 documents/second
Search Services 1.4.2: 1200-3500 documents/second*
* Depending on the exact configuration (number of shards, size of documents, etc.)
(NB: not yet validated on the production system)
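Throughput figures translate directly into re-index durations. Assuming the quoted rates are aggregate across the deployment, 1.2 billion documents take about 92.6 days at 150 documents/second, about 11.6 days at 1,200 and about 4.0 days at 3,500; the observed 10 days corresponds to roughly 1,389 documents/second. A small helper for the same arithmetic, plus a throughput estimate from two samples of the indexed-document count (for example taken from monitoring):

```python
SECONDS_PER_DAY = 86_400


def eta_days(documents: int, docs_per_second: float) -> float:
    """Days needed to index `documents` at a constant aggregate rate."""
    if docs_per_second <= 0:
        raise ValueError("throughput must be positive")
    return documents / docs_per_second / SECONDS_PER_DAY


def throughput(count_before: int, count_after: int, elapsed_s: float) -> float:
    """Documents/second between two samples of the indexed document count."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return (count_after - count_before) / elapsed_s


for rate in (150, 1200, 3500):
    print(rate, round(eta_days(1_200_000_000, rate), 1))
```

Measuring throughput early in a small test run and feeding it into `eta_days` gives a rough completion estimate before committing to a production re-index.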
Future Improvements
Future Improvements - Coming in 2.0.0
• Schema Simplification
• Smaller index
• Removing Duplicate Fields
• Smaller communication
• Improved Trackers
• Less duplication with large transactions
• New tracker parallelism option
• Content Store Removal
• Reduced disk usage
• Less duplication
• Better usage of Solr optimisations
• Adds potential to use other Solr features
Improved Trackers - Testing
Scenario datasets
• 100,000 documents created with 100,000 transactions
• 100,000 documents created with 1 transaction
• Changing the path for 100,000 documents
• 200,000 ACLs created with 200,000 ACL change sets
Parameters investigated
• The existing *BatchSize parameters
• The new *MaxParallelism parameters
• These change the number of workers assigned to the tracker. They use a ForkJoinPool and can impact the resources available to other processes
Hotspot calculation
• Increasing the Transaction Batch Size for nodes and ACLs improves performance until the maximum useful value for your deployment is reached. Beyond that, you can keep increasing this batch size, but performance will not change
• Increasing the Node Batch Size can improve performance while it stays below the right value for your deployment. Beyond that, increasing this batch size penalises performance
• Increasing the maximum number of parallel threads improved performance until the maximum for our deployment was reached. However, in a real-world deployment it may be useful to use a lower number to avoid impacting other processes.
(Chart: Duration (ms) per test run #)
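Finding the hotspot amounts to timing the same test dataset under every combination of batch size and parallelism and keeping the fastest. A minimal harness sketch follows; `run` is a hypothetical callable that re-indexes your test dataset with the given settings (the real parameters live in solrcore.properties and are not named here). As the last bullet notes, the fastest setting is not always the right one for production if it starves other processes.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int]  # (batch_size, max_parallelism)


def sweep(
    run: Callable[[int, int], None],
    batch_sizes: Iterable[int],
    parallelisms: Iterable[int],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[Config, float]:
    """Time run(batch_size, parallelism) for every combination; returns milliseconds per config."""
    results: Dict[Config, float] = {}
    for batch_size, parallelism in itertools.product(batch_sizes, parallelisms):
        start = clock()
        run(batch_size, parallelism)
        results[(batch_size, parallelism)] = (clock() - start) * 1000.0
    return results


def best(results: Dict[Config, float]) -> Config:
    """Configuration with the lowest measured duration."""
    return min(results, key=results.get)
```

Running each configuration more than once and comparing medians makes the comparison less sensitive to caching and JVM warm-up effects.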
Content Store Removal
• Solr Content Store removal will reduce disk usage and simplify replication
(Diagram: the Solr Content Store and replication of the index across Solr nodes)
Recap
When to re-index
• When upgrading to major Search Services releases
How to re-index
• Running some small tests to check the performance of the indexing process before running it in production
• Indexing from scratch with the upgraded Repository
• Indexing in a parallel deployment
How to measure
• Profiling
• Monitoring
Thank you!
Relevant works
https://nathanmcminn.com/2017/01/11/alfresco-and-solr-search-reindexing-and-index-cluster-size/
https://www.slideshare.net/JosePortillo26/jose-portillo-dev-con-presentation-1138
https://www.slideshare.net/angelborroy/2019-dev-con115angelborroy
https://blog.xenit.eu/blog/ethias-sharding
https://hub.alfresco.com/t5/alfresco-content-services-blog/large-repository-upgrades/ba-p/287877
https://hub.alfresco.com/t5/alfresco-content-services-blog/scaling-search-with-db-id-range/ba-p/287900
https://www.alfresco.com/technical-whitepaper/alfresco-content-services-solr-deployment-options
https://www.alfresco.com/technical-whitepaper/alfresco-content-services-solr-deployment-example-aws
https://docs.alfresco.com/6.2/concepts/upgrade-path.html

 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 

Dernier (20)

WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 

(Re)Indexing Large Repositories in Alfresco

  • 1. (Re)Indexing Large Repositories
    Angel Borroy, Tom Page, 10th June 2020
  • 2. Agenda: (Re)Indexing Large Repositories
    • Alfresco SOLR in a Nutshell
    • Indexing Process
    • Indexing Scenarios
    • When to Re-Index
    • Deployment Alternatives
    • Demo time without downtime!
    • Benchmark Review
    • Improvements in 1.4.2
    • Future Improvements
    • Recap
    Alfresco SOLR
  • 3. Alfresco SOLR in a Nutshell
    SOLR 6 is used in Alfresco to perform two main processes:
    • Indexing (or tracking) metadata, permissions and content from the Alfresco Repository
    • Returning results for search queries, supporting several syntaxes (AFTS, CMIS)
    Indexing process: asynchronous
  • 4. Alfresco SOLR in a Nutshell
    Searching process: synchronous (query syntax: AFTS, CMIS; permission checks)
    Eventual consistency
    SOLR indexes the information after the database has committed the transaction, so for a short period of time not all documents are available in the SOLR index. We call this eventual consistency, because SOLR will eventually catch up with the Repository.
  • 5. Alfresco SOLR in a Nutshell: Alfresco SOLR Storage
    By default two SOLR cores are created: one for live documents (alfresco) and one for deleted documents (archive).
    Each core includes the following storage folders:
    • Default SOLR index files in the solrhome/<core>/index folder
    • Alfresco's customized Content Store in the contentstore folder
      • This folder includes a cached copy of Repository content and metadata
      • The Content Store will be removed in Search Services 2.0
    These folders are populated by the indexing process.
  • 6. Indexing process
    ● Each tracker is fired asynchronously according to a cron expression: alfresco.cron or alfresco.*.tracker.cron
    ● Transactions and ACL Change Sets are processed in batches of nodes or ACLs
    ● Batches are split so that Workers can execute them in parallel
    ● However, the Content Tracker retrieves text from content nodes one by one
    ● The Commit Tracker writes the changes from the different trackers to the SOLR index "eventually"
    >> The Cascade Tracker does not run when indexing from scratch
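The batching described above can be pictured with a small Python sketch. The real trackers are Java components of Search Services; here `index_node`, the batch size and the worker count are hypothetical stand-ins. The sketch only mirrors the idea of splitting a transaction's nodes into batches, letting a pool of workers process the batches in parallel, and collecting everything for a single commit step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_batches(node_ids: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split a transaction's node IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(node_ids[i:i + batch_size]) for i in range(0, len(node_ids), batch_size)]


def track_transaction(node_ids: Sequence[int],
                      index_node: Callable[[int], str],
                      batch_size: int = 100,
                      workers: int = 4) -> List[str]:
    """Index every node of one transaction, one batch per worker task."""
    batches = split_into_batches(node_ids, batch_size)

    def index_batch(batch: List[int]) -> List[str]:
        # Nodes inside a batch are handled sequentially by the same worker.
        return [index_node(node_id) for node_id in batch]

    documents: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the input batches.
        for batch_documents in pool.map(index_batch, batches):
            documents.extend(batch_documents)
    # A commit step would now write all documents to the index at once.
    return documents


docs = track_transaction(range(1, 251), lambda n: f"doc-{n}", batch_size=100, workers=3)
print(len(docs), docs[0], docs[-1])
```

A transaction with 250 nodes and a batch size of 100 yields three batches (100, 100 and 50 nodes), so at most three workers are busy even if more are available.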
  • 7. Indexing scenarios
    1. When updating the repository with applications or bulk ingestion processes, each transaction includes a long list of nodes to be indexed
    2. When creating new content with Alfresco Share, there are more transactions, but each transaction includes a short list of nodes to be indexed
    3. When manually setting the permission level for every node in a hierarchy, the ACL Change Sets include a long list of ACLs to be indexed
    4. When using the default Alfresco permissions design, the ACL Change Sets include a short list of ACLs to be indexed
    5. When using complex document formats, the Transformation Service requires additional resources
    6. When using large documents, the SOLR index requires additional storage
  • 8. Indexing scenarios: Controlling what to index
    • Content can be excluded from the SOLR index by configuration:
      solrcore.properties > alfresco.index.transformContent=false
      https://docs.alfresco.com/search-community/concepts/solrcore-properties-file.html
    • Individual nodes can be excluded from the SOLR index with the Index Control aspect:
      cm:indexControl > cm:isIndexed :: false (neither metadata nor content is indexed)
      cm:indexControl > cm:isContentIndexed :: false (content is not indexed)
      https://docs.alfresco.com/community/concepts/admin-indexes.html
    • Individual properties can be excluded from the SOLR index by design in the Content Model:
      <property> <index enabled="false"/> </property>
      https://docs.alfresco.com/community/references/dev-extension-points-content-model-define-and-deploy.html
    Add this setting to the archive core by default!
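The node-level and core-level switches combine as follows. This is a hypothetical Python model of the rules listed on the slide, not Search Services code: the property names come from the slide, while the dictionary representation and the `index_decision` helper are illustrative assumptions. Property-level exclusion in the Content Model is not modelled.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class IndexDecision:
    metadata: bool  # node properties go into the index
    content: bool   # extracted text goes into the index


def index_decision(properties: Mapping[str, bool],
                   transform_content: bool = True) -> IndexDecision:
    """Hypothetical model of the index-control rules.

    properties: cm:indexControl values of one node; a missing key means "indexed".
    transform_content: stands in for alfresco.index.transformContent (core-wide).
    """
    if not properties.get("cm:isIndexed", True):
        # cm:isIndexed = false: neither metadata nor content is indexed.
        return IndexDecision(metadata=False, content=False)
    # Content needs both the node-level flag and the core-level setting.
    content = properties.get("cm:isContentIndexed", True) and transform_content
    return IndexDecision(metadata=True, content=bool(content))


print(index_decision({}))
print(index_decision({"cm:isContentIndexed": False}))
print(index_decision({"cm:isIndexed": False}))
print(index_decision({}, transform_content=False))
```

The last call corresponds to a core such as archive configured with alfresco.index.transformContent=false: metadata stays searchable, but no text is extracted.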
  • 9. When to re-index
    In large repositories, the re-indexing process can take from a few hours to a few days.
    Full re-index
    • When upgrading to a major Search Services release, such as 2.0
    • When the SOLR index has been corrupted for technical reasons
    • When breaking changes are introduced in common custom Content Models
    Partial re-index
    • This process can also take some time, depending on the number of documents to be re-indexed, but it takes less time than a full re-index
    • When incremental changes are introduced in a Content Model, a partial re-index can be triggered with the SOLR REST API:
      http://localhost:8983/solr/admin/cores?action=reindex&query=TYPE:person
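The re-index request can be built programmatically so that queries containing spaces, quotes or colons are URL-encoded correctly. A minimal sketch using only the Python standard library; the `reindex_url` helper is hypothetical, and it sends only the two parameters that appear in the URL above.

```python
from urllib.parse import urlencode


def reindex_url(solr_base: str, query: str) -> str:
    """Build a partial re-index request for the nodes matching `query`."""
    params = urlencode({"action": "reindex", "query": query})
    return f"{solr_base.rstrip('/')}/admin/cores?{params}"


url = reindex_url("http://localhost:8983/solr", "TYPE:person")
print(url)
# Triggering it is a plain HTTP GET against a running SOLR,
# e.g. urllib.request.urlopen(url).
```

The colon in `TYPE:person` is sent as `%3A`, which the server decodes back to the same query string.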
  • 11. Installation alternatives
    • Using the ZIP distribution file
      https://docs.alfresco.com/search-community/concepts/solr-install-config.html
    • Using Docker or Docker Compose
      https://github.com/Alfresco/SearchServices/tree/master/search-services
      https://github.com/Alfresco/acs-community-deployment/tree/master/docker-compose
      https://github.com/Alfresco/alfresco-docker-installer
    • Using Kubernetes
      https://github.com/Alfresco/acs-community-deployment/tree/master/helm/alfresco-content-services-community
  • 12. Deployment for Re-Indexing
    Deployment schema to minimize downtime during re-indexing:
    > When using a different SOLR version, configure the Alfresco Repository to use the new SOLR server *
    > When using the same SOLR version, the index folder can be reused directly
    * Upgrading from SOLR 4 to SOLR 6 is not allowed when using Alfresco CE 6.2.0-ga (thanks for raising this @AFaust) >> SEARCH-2289
  • 13. Alfresco Repository Indexing Configuration
    When configuring an Alfresco node to perform the re-indexing process, you can switch off some services depending on your requirements:
    • Scheduled Jobs can be disabled, as they are run by the Alfresco instance in the live service
      https://docs.alfresco.com/6.2/concepts/scheduled-jobs.html
    • Some ACS features can be disabled
      https://docs.alfresco.com/6.2/concepts/maincomponents-disable.html
    • Additional subsystems (apart from Search or Transformation) can be disabled
      https://docs.alfresco.com/6.2/concepts/subsystem-categories.html
      • Activities
      • Audit
      • Email
      • …
    Don't make a copy of your Alfresco Repository production configuration and press the start button!
  • 14. Monitoring
    Profiling
    • Using VisualVM or YourKit Java Profiler for the JVMs (Repository, SOLR)
    • Using the pg_stat_statements extension or another DB tool
      https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-6-profiling-with-docker/ba-p/295846
      https://github.com/aborroy/alfresco-6-profiling
    Monitoring
    • Using Prometheus with Grafana (Repository, SOLR)
      https://hub.alfresco.com/t5/alfresco-content-services-blog/monitoring-alfresco-solr-with-prometheus-and-grafana/ba-p/294157
      https://github.com/aborroy/alfresco-solr-monitoring
  • 16. Demo time without downtime!
    • A live Docker Compose environment running with around 4,000 text documents indexed
    • YourKit Java Profiler is used to monitor Repository performance
    • A new Search Services 2.0 server is started locally to index the repository
    • Once the Search Services 2.0 index is up to date, change the Solr hostname value in the Admin Web Console or in alfresco-global.properties
    Search Services 2.0 is not released yet!
    http://127.0.0.1:8083/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
    http://127.0.0.1:8983/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
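Before switching the hostname, the two demo queries can be compared to see how far the new server has caught up. A minimal Python sketch, assuming the standard SOLR JSON select response (`response.numFound`); the helper names are hypothetical and the sample responses are illustrative values, not demo results. Note that the spaces in `TEXT:[* TO *]` must be encoded before the URL is sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_all_text_url(solr_base: str) -> str:
    """URL of the demo query, with TEXT:[* TO *] properly encoded."""
    params = urlencode({"indent": "on", "q": "TEXT:[* TO *]", "wt": "json"})
    return f"{solr_base.rstrip('/')}/solr/alfresco/select?{params}"


def fetch_json(url: str) -> dict:
    """GET a SOLR select URL and decode the JSON body (needs a running server)."""
    with urlopen(url) as response:
        return json.load(response)


def num_found(select_response: dict) -> int:
    """Hit count from a standard SOLR JSON select response."""
    return int(select_response["response"]["numFound"])


def catch_up_ratio(live: dict, candidate: dict) -> float:
    """Share of the live index's hits that the new index already returns."""
    live_hits = num_found(live)
    return 1.0 if live_hits == 0 else num_found(candidate) / live_hits


# Illustrative responses only; real ones would come from fetch_json(select_all_text_url(...)).
live_response = {"response": {"numFound": 4000, "start": 0, "docs": []}}
new_response = {"response": {"numFound": 3000, "start": 0, "docs": []}}
print(select_all_text_url("http://127.0.0.1:8983"))
print(f"{catch_up_ratio(live_response, new_response):.0%}")
```

Matching counts are a necessary but not sufficient signal; the new index could still be behind on updates to existing documents.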
  • 18. 1 Billion Documents Review (2015)
    • Review of the 1 billion document benchmark (November 2015)
    • 10 repository nodes (Alfresco 5.1), 20 Solr 4 nodes (Alfresco Index Server)
    • 1b documents indexed in 5 days
    How Alfresco powered a 1.2 Billion document deployment on Amazon Web Services
  • 20. 1.2 Billion Baseline Plan (2020)
    • Customer-sponsored benchmark to measure system performance with the customer's configuration
    • Goal: 1.2b documents indexed into Search Services
    • 20 instances, each containing a single shard (DB_ID_RANGE based sharding)
  • 21. Performance considerations
    • Bottlenecks
      • Database (getChildAssocs)
      • Transformers (when using large documents)
      • Network (when using large metadata/content)
      • Time spent processing data for other shards
  • 22. Baseline Results
    • Estimated completion in 21 days
  • 24. DB_ID_RANGE Sharding
    • Does not require specifying the total number of shards in advance
    • The index can continue to grow with the repository
    See https://docs.alfresco.com/search-enterprise/concepts/solr-shard-approaches.html
  • 27. Time spent processing transactions for other shards
    • With DB_ID_RANGE sharding, we know that only a range of transactions is relevant
      • Skip irrelevant transactions when using DB_ID_RANGE
    • To support path queries, a single change sometimes requires updating data on multiple shards
      • Option to disable cascade tracking
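The idea behind skipping transactions can be shown with a small Python sketch. It assumes each shard owns a contiguous range of node DB IDs written as `start-end`, like the `shard.range` values used with DB_ID_RANGE sharding; treating the range as half-open [start, end) is an assumption, as are the helper names and the `(tx_id, node_ids)` representation. Cross-shard cascade updates for path queries are deliberately ignored here.

```python
from typing import Iterable, List, Tuple


def parse_range(shard_range: str) -> Tuple[int, int]:
    """Parse a range such as "0-20000000" into (start, end)."""
    start, end = (int(part) for part in shard_range.split("-", 1))
    if start >= end:
        raise ValueError(f"empty range: {shard_range}")
    return start, end


def owns(node_id: int, shard_range: Tuple[int, int]) -> bool:
    """Assumption: the range is half-open, [start, end)."""
    start, end = shard_range
    return start <= node_id < end


def relevant_transactions(transactions: Iterable[Tuple[int, List[int]]],
                          shard_range: Tuple[int, int]) -> List[int]:
    """IDs of transactions that touch at least one node owned by this shard."""
    return [tx_id for tx_id, node_ids in transactions
            if any(owns(node_id, shard_range) for node_id in node_ids)]


shard_0 = parse_range("0-20000000")
txs = [(1, [5, 17]), (2, [20000001]), (3, [19999999, 20000000])]
print(relevant_transactions(txs, shard_0))
```

Transaction 2 touches only a node outside the range, so this shard never needs to fetch its node data from the Repository.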
  • 28. Reduce Database Access and Network Usage
    • Reduce the amount of data requested
      • Remove unused calls to getChildAssocs
    • Compress communication where appropriate
      • Add an option to compress content transfer
    Diagram: the request "Please give me all metadata for the node" becomes "Please give me: X, Y, Z"; content such as "Lorem ipsum dolor sit amet, consectetur adipiscing elit..." is sent compressed (78 9c 05 c1 81 09 c0 30 08 04 c0 ...)
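The byte sequence on the slide starts with `78 9c`, which is the header of a zlib stream at the default compression level. Which compression format Search Services actually uses for the transfer option is not stated here; this Python sketch only illustrates why compressing repetitive text before sending it reduces network usage.

```python
import zlib

# Repetitive text stands in for a node's extracted content or metadata payload.
payload = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 20).encode("utf-8")

compressed = zlib.compress(payload)  # default level: the stream starts with 78 9c
restored = zlib.decompress(compressed)

print(compressed[:2].hex(" "))  # header bytes, as on the slide
print(len(payload), "->", len(compressed), "bytes")
assert restored == payload  # lossless round trip
```

The trade-off is CPU time on both ends, which is why the slide frames compression as an option to use "where appropriate".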
  • 29. Overview of Improvements in 1.4.2
    • Search Services 1.4.2 (and Insight Engine 1.4.2)
    • ACS Repository 6.2 Enterprise
      • No ACS Community release contains this yet
      • However, you can use an existing ACS with the jars from https://github.com/aborroy/solr-performance-services-repo
    Re-index of 1.2b documents in 10 days (6 repo nodes, 20 search nodes)
    Search Services 1.3.0: 150 documents/second
    Search Services 1.4.2: 1200-3500 documents/second* (depending on the number of shards, size of documents, etc.)
    * Depending on exact configuration (NB: not yet validated on the production system)
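A quick sanity check relates these rates to wall-clock time. The slide does not say whether the rates are per shard or for the whole cluster; this Python sketch assumes a constant aggregate rate across the cluster. Under that assumption, 1.2b documents in 10 days corresponds to about 1,389 documents/second, which falls inside the quoted 1.4.2 range.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_index(documents: float, docs_per_second: float) -> float:
    """Days needed to index `documents` at a constant aggregate rate."""
    if docs_per_second <= 0:
        raise ValueError("rate must be positive")
    return documents / docs_per_second / SECONDS_PER_DAY


def required_rate(documents: float, days: float) -> float:
    """Aggregate documents/second needed to finish within `days`."""
    return documents / (days * SECONDS_PER_DAY)


for rate in (1200, 3500):
    print(f"{rate:>5} docs/s -> {days_to_index(1.2e9, rate):5.1f} days")
print(f"10 days needs {required_rate(1.2e9, 10):.0f} docs/s")
```

Real runs rarely sustain a constant rate (transformations, large transactions and path changes all vary), so this gives a lower bound on duration rather than a forecast.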
  • 31. Future Improvements - Coming in 2.0.0
    • Schema Simplification
      • Smaller index
      • Removing duplicate fields
      • Smaller communication
    • Improved Trackers
      • Less duplication with large transactions
      • New tracker parallelism option
    • Content Store Removal
      • Reduced disk usage
      • Less duplication
      • Better usage of Solr optimisations
      • Adds potential to use other Solr features
  • 32. Improved Trackers - Testing
    Scenario datasets
    • 100,000 documents created with 100,000 transactions
    • 100,000 documents created with 1 transaction
    • Changing the path for 100,000 documents
    • 200,000 ACLs created with 200,000 ACL change sets
    Parameters investigated
    • The existing *BatchSize parameters
    • The new *MaxParallelism parameters
      • These change the number of workers assigned to the tracker. They use a ForkJoinPool and can impact the resources available to other processes
  • 33. Improved Trackers - Testing: Hotspot calculation
    • Increasing the Transaction Batch Size for nodes and ACLs improves performance until the maximum useful value for your deployment is reached. Beyond that point, you can increase this batch size, but performance will not change
    • Increasing the Node Batch Size can improve performance while it is below the right value for your deployment. Beyond that point, you can increase this batch size, but performance will be penalised
    • Increasing the maximum number of parallel threads improved performance until the maximum for our deployment was reached. However, in a real-world deployment it may be useful to use a lower number to avoid impacting other processes.
    [Chart: Duration (ms)]
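The hotspot approach amounts to sweeping one parameter and keeping the smallest value after which durations stop improving. A hypothetical harness in Python: in practice `measure` would time a real indexing run (for example with `time.perf_counter`), while the two duration models below are invented for illustration and are not measurements from the benchmark. The 5% tolerance is also an arbitrary assumption.

```python
from typing import Callable, Iterable, Tuple


def find_hotspot(values: Iterable[int],
                 measure: Callable[[int], float],
                 tolerance: float = 0.05) -> Tuple[int, float]:
    """Return the smallest value whose duration is within `tolerance` of the best one."""
    results = sorted((value, measure(value)) for value in values)
    if not results:
        raise ValueError("no values to measure")
    best = min(duration for _, duration in results)
    # Prefer the smallest "good enough" setting: larger values add no speed-up
    # and, for parallelism, take resources from other processes.
    return next((v, d) for v, d in results if d <= best * (1 + tolerance))


def synthetic_parallelism(threads: int) -> float:
    # Invented model: speed-up stops once 8 threads saturate the machine.
    return 1000 / min(threads, 8)


def synthetic_node_batch(size: int) -> float:
    # Invented model: small batches pay per-request overhead, large ones are penalised.
    return 10_000 / size + size


print(find_hotspot([1, 2, 4, 8, 16, 32], synthetic_parallelism))
print(find_hotspot([25, 50, 100, 200, 400], synthetic_node_batch))
```

The first model behaves like the transaction batch size and thread count on the slide (a plateau), the second like the node batch size (a clear optimum).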
  • 34. The Solr Content Store: Content Store Removal
    • Removing the Solr Content Store will reduce disk usage and simplify replication
  • 35. The Solr Content Store: Content Store Removal
    • Removing the Solr Content Store will reduce disk usage and simplify replication
    [Diagram: Replication of index across Solr nodes]
  • 37. Recap
    When to re-index
    • When upgrading to major Search Services releases
    How to re-index
    • Run some small tests to verify the performance of the indexing process before running it in production
    • Index from scratch with the upgraded Repository
    • Index in a parallel deployment
    How to measure
    • Profiling
    • Monitoring