This document summarizes a presentation about Alfresco Search Services 2.0. Key points include:
- Solr was updated to remove the custom content store and leverage more built-in Solr features like replication and backups. This improved performance and reduced disk usage.
- New date fields were added that break dates down into individual components like year, month, day, etc. to enable more granular search queries.
- Asynchronous maintenance actions were introduced to schedule and retry tasks like reindexing, purging, and fixing index issues in the background.
- Security was enhanced with support for mutual TLS and storing passwords in JVM properties instead of plain-text files. Performance tracking and indexing controls were also introduced.
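As a sketch of how the new date-component fields could be used, the snippet below builds an AFTS query combining a standard date range with a month component field. Note that the `cm:created_month` field name is a hypothetical illustration; check the schema of your Search Services 2.0 installation for the actual component field names.

```javascript
// Build an AFTS query string that mixes a standard date-range clause with a
// date-component clause. The "cm:created_month" field name is hypothetical;
// Search Services 2.0 exposes date parts under installation-specific names.
function buildDateComponentQuery(year, month) {
  var parts = [
    "cm:created:[" + year + "-01-01 TO " + year + "-12-31]", // standard range
    "cm:created_month:" + month                              // hypothetical component field
  ];
  return parts.join(" AND ");
}

console.log(buildDateComponentQuery(2020, 6));
// → cm:created:[2020-01-01 TO 2020-12-31] AND cm:created_month:6
```

Component fields make queries like "everything created in June, any year" expressible without range arithmetic on full timestamps.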
This document discusses reindexing large repositories in Alfresco. It covers the Alfresco SOLR architecture, the indexing process, scenarios that require reindexing, alternatives for deployment during reindexing to minimize downtime, monitoring and profiling tools, and future improvements planned for Search Services 2.0 to optimize indexing performance. Benchmark results are presented showing improvements that reduced reindexing time for 1.2 billion documents from 21 days to 10 days.
The document discusses performance tuning of Alfresco. It covers JVM tuning including memory and garbage collection settings. It also discusses analyzing garbage collection logs and common problems. The document outlines different cache mechanisms in Alfresco including L1, L2 caches and Hazelcast caching. Tuning caches based on data change frequency and hit ratios is recommended. Finally, the document provides guidance on investigating performance issues by examining logs, threads, databases, storage and Alfresco/Solr configurations and settings.
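As an illustrative starting point for the JVM tuning discussed above (heap sizes and paths are assumptions, not recommendations — size to your own workload and measure), a typical set of options might look like:

```
# Illustrative JVM options for the Alfresco process (sizes are assumptions):
-Xms8g -Xmx8g                                     # fixed-size heap to avoid resize pauses
-XX:+UseG1GC                                      # G1 garbage collector
-Xlog:gc*:file=/opt/alfresco/gc.log:time,uptime   # GC logging (JDK 9+) for later analysis
```

The GC log produced by the last flag is the input for the garbage-collection analysis the document describes.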
The document provides an overview and best practices for tuning an Alfresco installation. It discusses disabling unused services, limiting group hierarchies, monitoring resources, optimizing Solr configuration, indexing processes, and query caching. General tips include separating custom configurations, testing backups and changes, and using support tools for troubleshooting performance issues.
Alfresco node lifecycle, services and zones (Sanket Mehta)
This presentation explains the details of an Alfresco node lifecycle (including which Alfresco database tables are affected by node operations such as creation and deletion). It also explains which case-sensitive Alfresco service should be used (nodeService vs. NodeService, searchService vs. SearchService) in order to maintain security in your application. Lastly, it covers zones in Alfresco (authentication-related zones and application-related zones).
How to migrate from Alfresco Search Services to Alfresco Search Enterprise (Angel Borroy López)
Presentation on how to move from the Alfresco Search Services product, based on Apache Solr, to the new Alfresco Search Enterprise, integrated with Elasticsearch and Amazon OpenSearch.
This is the session delivered during the Alfresco Developers Conference in Lisbon, January 2018. Learn everything you need to know to implement a proper backup and disaster recovery strategy, from a single-server installation with hundreds of documents to a large deployment with multiple nodes, layers, databases and millions of documents. What is the best approach for each case?
The document provides an overview and best practices for tuning an Alfresco installation for performance. It discusses disabling unused services, limiting folder hierarchies and group nesting, monitoring resources, tuning Solr indexes and caches, and using separate servers for specific tasks like indexing. General tips include testing changes thoroughly before deploying, adjusting sizing for increased usage, and following the standard performance methodology.
Features of Alfresco Search Services.
Features of Alfresco Search & Insight Engine.
Future plans for the product
---
DEMO GUIDE
[1] Queries: Share > Node Browser
ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'
SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')
[2] Queries: Share > JS Console
var ctxt = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var searchService = ctxt.getBean('SearchService', org.alfresco.service.cmr.search.SearchService);
var StoreRef = Packages.org.alfresco.service.cmr.repository.StoreRef;
var SearchService = Packages.org.alfresco.service.cmr.search.SearchService;
var resultSet =
  searchService.query(
    StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
    SearchService.LANGUAGE_FTS_ALFRESCO,
    "ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'");
logger.log(resultSet.getNodeRefs());
---
var ctxt = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var searchService = ctxt.getBean('SearchService', org.alfresco.service.cmr.search.SearchService);
var StoreRef = Packages.org.alfresco.service.cmr.repository.StoreRef;
var SearchService = Packages.org.alfresco.service.cmr.search.SearchService;
var resultSet =
  searchService.query(
    StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
    SearchService.LANGUAGE_CMIS_ALFRESCO,
    "SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')");
logger.log(resultSet.getNodeRefs());
---
var def =
{
query: "ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'",
language: "fts-alfresco"
};
var results = search.query(def);
logger.log(results);
[3] Queries: api-explorer
{
"query": {
"language": "afts",
"query": "ASPECT:\"cm:titled\" AND cm:title:\"*Sample\" AND TEXT:\"code\""
}
}
---
{
"query": {
"language": "cmis",
"query": "SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')"
}
}
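The JSON bodies above can also be built programmatically before POSTing them to the Search REST API. A minimal sketch (the endpoint path below is the default public Search API location; host and credentials are assumptions about a local installation):

```javascript
// Build the request body for the Alfresco Search REST API.
function buildSearchBody(language, query) {
  return JSON.stringify({ query: { language: language, query: query } });
}

var aftsBody = buildSearchBody("afts",
  "ASPECT:\"cm:titled\" AND cm:title:\"*Sample*\" AND TEXT:\"code\"");
var cmisBody = buildSearchBody("cmis",
  "SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')");

// POST either body (Content-Type: application/json, basic auth) to an
// assumed local endpoint:
//   http://localhost:8080/alfresco/api/-default-/public/search/versions/1/search
console.log(aftsBody);
```

This is the same request api-explorer issues under the hood when you submit the forms above.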
[4] Queries: CMIS Workbench > Groovy Console
rs = session.query("SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')", false)
for (res in rs) {
println(res.getPropertyValueById('cmis:objectId'))
}
[5] Queries: SOLR Web Console > (alfresco) > Query
/afts
ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'
---
/cmis
SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')
---
Infrastructure, use cases and performance considerations for an Enterprise Grade ECM implementation of up to 1B documents on AWS (Amazon Web Services EC2 and Aurora), based on the Alfresco Platform (http://www.alfresco.com), the leading Open Source Enterprise Content Management system.
This session will provide a guide to Alfresco truststores and keystores. Several live examples will be shown, including the replacement of existing cryptographic stores or certificates. Additionally, a troubleshooting configuration guide for mTLS communication will be provided.
Jose Portillo DevCon presentation 1138 (Jose Portillo)
This document discusses best practices for implementing Solr sharding in Alfresco. It defines what sharding is and explains that it involves splitting a single index into multiple parts or shards to improve search performance, distribute indexing load, and scale horizontally. The document outlines different types of sharding, considerations for the number of shards, high availability, backup procedures, and common configuration settings when using Solr sharding in Alfresco.
The objective of this article is to describe what to monitor in and around Alfresco in order to have a good understanding of how the applications are performing and to be aware of potential issues.
The document discusses best practices for upgrading to Alfresco 6 from a previous version. It recommends backing up the database and content store from the source Alfresco, identifying any customizations, installing the new Alfresco from scratch, restoring the backups, applying customizations, and patching the database in stages if needed through intermediate "halfway" Alfresco instances. It also covers identifying deprecated features, adapting custom code to be compatible with Alfresco 6, monitoring the new installation, and addressing potential issues.
Alfresco DevCon 2019 Performance Tools of the TradeLuis Colorado
Discover tips and tools that will help you to keep your Alfresco environment in shape. Most of the best tools are free or Open Source, and this presentation will guide you through the steps to improve the performance of your system.
In this session, we'll discuss architectural, design and tuning best practices for building rock solid and scalable Alfresco Solutions. We'll cover the typical use cases for highly scalable Alfresco solutions, like massive injection and high concurrency, also introducing 3.3 and 3.4 Transfer / Replication services for building complex high availability enterprise architectures.
Important work-arounds for making ASS multi-lingual (Axel Faust)
Slides from my Alfresco DevCon 2018 Lightning Talk (5 min, 15 s per main slide, auto-advancing) about the Alfresco Search Services product, its current limitations with regard to usage in an organisation with mixed user locales, and the work-arounds (as well as the long-term solution) for making it work nonetheless. The recording of the Lightning Talk session will be uploaded to the Alfresco YouTube channel in the coming days or weeks.
This document discusses backup and disaster recovery strategies for Alfresco. It recommends scheduling regular backups of the Solr and Lucene indexes, database, and file system. Full backups should be done periodically, with incremental backups in between. Backups can be cold (system offline), warm (some services offline), or hot (live system). Restores involve recovering the indexes, database, files and configuration. Planning includes defining recovery objectives for data loss and downtime.
Moving From Actions & Behaviors to Microservices (Jeff Potts)
My DevCon 2019 talk discusses how to make it easier to integrate Alfresco with other systems using an event-based approach. Two real world examples are discussed and demonstrated. The first is about reporting against Alfresco metadata. The second is about enriching metadata by running content through a Natural Language Processing (NLP) model. Both solutions work by listening to generic events generated by Alfresco and placed on an Apache Kafka queue. For the reporting example, the Spring Boot consumer subscribes to Kafka events, then fetches metadata via CMIS and indexes that into Elasticsearch. For the NLP example, a separate Spring Boot consumer subscribes to the same events, but in this case, fetches the content, extracts text using Apache Tika, runs the text through Apache OpenNLP, then writes back extracted entities to Alfresco via CMIS. These are relatively simple examples, but illustrate how a de-coupled, asynchronous, event-based approach can make integrating Alfresco with other systems easier.
This document provides an overview of Storage Foundation and Alfresco solutions. It discusses hardware storage concepts including drive types, interfaces, and RAID. It also covers Alfresco storage-related solutions such as the S3 connector, XAM connector, content store selector, and replication capabilities. Partnership solutions from Xenit, Star Storage, and community solutions are also mentioned. The document concludes with best practices around content store, indexes, logs, and backup/recovery.
The document discusses Alfresco security best practices. It covers topics such as hardening the network and operating system, implementing firewall rules, assessing vulnerabilities, and compliance with standards. Best practices for the Alfresco implementation include staying current with patches, enforcing strong permissions, and deleting content when it is removed. The document provides an overview of security considerations for the Alfresco architecture, mobile access, and other deployment aspects.
Alfresco DevCon 2019: Encryption at-rest and in-transit (Toni de la Fuente)
To guarantee data integrity and confidentiality in Alfresco, we need to implement authentication and encryption at-rest and in-transit. With the proliferation of microservices, orchestration platforms, complex service topologies and multiple programming languages, there is a demand for new ways to manage service-to-service communication, in some cases without the application needing to be aware. In addition, compliance requirements around encryption and authentication require new ways to handle them. This talk will review encryption at-rest solutions for ADBP and will also discuss solutions for encryption and authentication between services. It will serve as an introduction to service mesh and TLS/mTLS. We will see a demo of ACS running with Istio over EKS, along with tools like Weave Scope, Kiali, Jaeger, Grafana, Service Graph and Prometheus.
The document introduces the ELK stack, which consists of Elasticsearch, Logstash, Kibana, and Beats. Beats ship log and operational data to Elasticsearch. Logstash ingests, transforms, and sends data to Elasticsearch. Elasticsearch stores and indexes the data. Kibana allows users to visualize and interact with data stored in Elasticsearch. The document provides descriptions of each component and their roles. It also includes configuration examples and demonstrates how to access Elasticsearch via REST.
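As an illustrative sketch of how those ELK components fit together, a minimal Logstash pipeline might look like the following (the log path, grok pattern and Elasticsearch host are assumptions to adapt to your environment):

```
# Minimal Logstash pipeline: read a log file, parse each line, send to Elasticsearch.
input {
  file {
    path => "/var/log/alfresco/alfresco.log"   # assumed log location
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]          # assumed local Elasticsearch
    index => "alfresco-logs-%{+YYYY.MM.dd}"     # daily index, browsable from Kibana
  }
}
```

Beats can replace the `file` input when logs are shipped from remote hosts.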
Alfresco DevCon 2019 (Edinburgh)
"Transforming the Transformers" for Alfresco Content Services (ACS) 6.1 & beyond
https://community.alfresco.com/community/ecm/blog/2019/02/07/alfresco-transform-service-new-with-acs-61
Alfresco provides various content transformation options across the Digital Business Platform (DBP). In this talk, we will explore the new independently-scalable Alfresco Transform Service. This enables a new option for transforms to be asynchronously off-loaded by Alfresco Content Services (ACS).
https://devcon.alfresco.com/speaker/jan-vonka/
The document summarizes Jan Vonka's presentation on Alfresco's exciting new REST APIs. It provides an overview of the REST API architecture and components. It highlights many new features in the Content Services 5.2 and Process Services 1.6 APIs, including new endpoints, operations, and enhanced APIs for sites and people. It demonstrates using the APIs via Postman. It discusses the API documentation and upcoming features such as exposing more services and improvements to the REST framework.
This document discusses solutions for generating unique identifiers at high speeds. It compares auto-increment, UUID, hash, and Snowflake approaches. Snowflake is highlighted as able to generate up to 4 billion IDs per second while maintaining order, supporting distribution and sharding, and providing security benefits. The document outlines how Snowflake works by combining a timestamp, node ID determined via file, random number, IP address or ZooKeeper, and an increasing sequence number stored in Redis to generate the IDs at high speeds with strong ordering properties.
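The Snowflake scheme described above can be sketched in a few lines. This is a simplified illustration, not the production algorithm: the bit widths follow the commonly cited layout (41-bit timestamp, 10-bit node id, 12-bit sequence), while the epoch is an arbitrary assumed value.

```javascript
// Simplified Snowflake-style ID generator: 41-bit timestamp offset,
// 10-bit node id, 12-bit per-millisecond sequence. The epoch is an
// arbitrary illustrative choice, not Twitter's.
const EPOCH = 1600000000000n; // custom epoch in ms (assumption)

function makeGenerator(nodeId) {
  let lastTs = -1n;
  let seq = 0n;
  return function nextId() {
    let ts = BigInt(Date.now());
    if (ts === lastTs) {
      seq = (seq + 1n) & 4095n;            // wrap the 12-bit sequence
      if (seq === 0n) {                     // sequence exhausted this ms:
        while (BigInt(Date.now()) <= lastTs) { /* spin until next ms */ }
        ts = BigInt(Date.now());
      }
    } else {
      seq = 0n;                             // new millisecond, reset sequence
    }
    lastTs = ts;
    return ((ts - EPOCH) << 22n) | (BigInt(nodeId) << 12n) | seq;
  };
}

const nextId = makeGenerator(1);
console.log(nextId() < nextId()); // ids are strictly increasing
```

Because the timestamp occupies the high bits, IDs sort chronologically, which is the ordering property the document highlights.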
This document provides an introduction to Apache Solr, including:
- Solr is an open-source search engine and REST API built on Lucene for indexing and searching documents.
- Solr architecture includes nodes, cores, schemas, and concepts like SolrCloud which uses Zookeeper for coordination across collections and shards.
- Documents are indexed, queried, updated, and deleted via the REST API or client libraries. Queries support various types including range, date, boolean, and proximity queries.
- Installation and configuration of standalone Solr involves downloading, extracting, and running bin/solr scripts to start the server and create cores.
- Resources for learning more include tutorials, documentation, and integration options.
The document discusses new features and improvements in MySQL 5.6, including significant performance gains over MySQL 5.5. Key highlights include improved InnoDB performance through features like online DDL and buffer pool pre-loading, up to 151-234% performance gains on benchmarks. Other enhancements cover full-text search in InnoDB, NoSQL support through memcached integration, replication improvements with GTIDs and crash-safe slaves, and strengthened security with audit logging and password policies.
SOUG Day Oracle 21c New Security Features (Stefan Oehrli)
With the Innovation Release 21c, Oracle has introduced several new security features. These include small improvements that make DB operation more secure and easier, as well as completely new concepts like DB Nest, which introduces a new approach to implementing DB security in multitenant databases.
In this deck from the 2015 PBS Works User Group, Sarah Storms from Lockheed Martin presents: A New Multi-Level Security Initiative.
"Historically cyber security in HPC has been limited to detecting intrusions rather than designing security from the beginning in a holistic, layered approach to protect the system. SELinux has provided the needed framework to address cyber security issues for a decade, but the lack of an HPC and data analysis eco-system based on SELinux and the perception that the resulting configuration is “hard” to use has prevented SELinux configurations from being widely accepted. This presentation will discuss the eco-system that has been developed and certified, debunk the “hard” perception, and illustrate approaches for both government and commercial applications. The presentation includes discussions on SELinux architecture and features, Altair PBS Professional Queuing System, Scale-out Lustre Storage, Applications Performance on SELinux (Vectorization and Parallelization), Relational Databases, and Security Functions (Auditing and other Security Administration actions)."
Learn more: http://www.pbsworks.com/pbsug/2015/agenda.aspx
Watch the video presentation: https://www.youtube.com/watch?v=kBNKmGCg4ho
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...DataStax
This talk will review the advanced security features in DataStax Enterprise and discuss best practices for secure deployments. In particular, topics reviewed will cover: Authentication with Kerberos & LDAP/Active Directory, Role-based Authorization and LDAP role assignment, Auditing, Securing network communication, Encrypting data files and using the Key-Management Interoperability Protocol (KMIP) for secure off-host key management. The talk will also suggest strategies for addressing security needs not met directly by the built-in features of the database such as how to address applications that require Attribute Based Access Control (ABAC).
About the Speaker
Matt Kennedy Sr. Product Manager, DataStax
Matt Kennedy works at DataStax as the product manager for DataStax Enterprise Core. Matt has been a Cassandra user and occasional contributor since version 0.7 and was named a Cassandra MVP in 2013 shortly before joining DataStax. Unlike Cassandra, Matt is not partition tolerant.
The document summarizes an upcoming talk on Azure Site Recovery and business continuity by Janaka Rangama of Empired Ltd. on August 9th at Index Consultants in Melbourne. It provides details on the speaker, topic, date, location and includes links to recent Microsoft Azure announcements on new government datacenter regions, Azure Stack ordering, and Azure Batch Rendering in public preview.
The document provides an overview of enhancements in CICS Transaction Server V5.1 focused on improved scalability. Key areas addressed include greater use of 64-bit storage to relieve virtual storage constraints, improved threadsafe support to reduce TCB switching and increase workload capacity, and doubling the maximum task limit. Monitoring and statistics were also enhanced to provide deeper insight into performance and capacity to help optimize hardware and software configurations.
The document provides an overview of enhancements in CICS Transaction Server V5.1 to improve scalability. Key points discussed include:
1) CICS V5.1 includes improvements to horizontal and vertical scalability through enhancements such as improved support for threadsafe programming, greater use of 64-bit storage, and increased maximum task limits.
2) Specific scalability enhancements discussed include open transaction environment improvements to reduce TCB switching; virtual storage constraint relief to reduce pressure on below-the-line storage; and increased maximum task limits.
3) Instrumentation and monitoring enhancements provide additional performance metrics and statistics to help understand system load and potential bottlenecks.
The current trends to work in Agile and DevOps are challenging for database developers. Source control is a standard for non-database code but it’s a challenge for databases. This talk has an ambition to change that situation and help developers and DBA take over control of source code and data.
The document discusses benchmarking the performance of Apache Solr. It describes testing the indexing performance of SolrCloud clusters of varying sizes. The results show that indexing performance scales nearly linearly as nodes are added. It also discusses using the Solr Scale Toolkit, which is a set of tools for deploying, managing, and benchmarking SolrCloud clusters. Future work mentioned includes benchmarking mixed workloads and integrating chaos monkey tests.
The document discusses steps taken to optimize a Magento stack for performance, scalability, and high availability. Key changes included removing unnecessary modules, adding optimized community modules, improving database and caching performance, optimizing indexers, implementing a Redis cache backend, handling long tasks asynchronously, caching blocks and layouts efficiently, optimizing product and navigation blocks, adding cache locking, and deploying the infrastructure on an autoscaling architecture with services like Galera, Varnish, and Elasticsearch. The goal was to make the core lightweight, improve scaling capabilities, and ensure a self-healing and highly available Magento deployment.
Cloudflare and Drupal - fighting bots and traffic peaksŁukasz Klimek
This document discusses using Cloudflare to improve the performance, security, and reliability of Drupal websites. It outlines the problems Drupal sites often face like spam, traffic peaks, and complex infrastructure needs. Cloudflare is presented as a solution by providing a content delivery network, web application firewall, code optimizations and other features. The document reviews Cloudflare's specific capabilities and provides guidance on preparing a Drupal site for deployment with Cloudflare, including cache invalidation strategies and modules to integrate the two platforms. Areas for future work by the Drupal community are also identified.
KP Partners: DataStax and Analytics Implementation MethodologyDataStax Academy
Apache Cassandra is the leading distributed database in use at thousands of sites with the world’s most demanding scalability and availability requirements. Apache Spark is a distributed data analytics computing framework that has gained a lot of traction in processing large amounts of data in an efficient and user-friendly manner. The joining of both provides a powerful combination of real-time data collection with analytics. After a brief overview of Cassandra and Spark, this class will dive into various aspects of the integration.
The document discusses MySQL 5.6 replication features including:
- Multi-threaded replication which allows parallel application of transactions to different databases for increased slave throughput.
- Binary log group commit which increases master performance by committing multiple transactions as a group to the binary log.
- Optimized row-based replication which reduces binary log size and network bandwidth by only replicating changed row elements.
- Global transaction identifiers which simplify tracking replication across clusters and identifying the most up-to-date slave for failover.
- Crash-safe slaves which store replication metadata in tables, allowing automatic recovery of slaves and binary logs after failures.
The document discusses upcoming changes and new features in MySQL 5.7. Key points include:
- MySQL 5.7 development has focused on performance, scalability, security and refactoring code.
- New features include online DDL support for additional DDL statements, InnoDB support for spatial data types, and cost information added to EXPLAIN output.
- Benchmarks show MySQL 5.7 providing significantly higher performance than previous versions, with a peak of 645,000 queries/second on some workloads.
A duplicate (clone or snapshot) database is useful for a variety of purposes, most of which involve testing &
upgrade
• You can perform the following tasks in a duplicate database:
• Test backup and recovery procedures
• Test an upgrade to a new release of Oracle Database
• Test the effect of applications on database performance
• Create a standby database (Dataguard) with DG Broker
• Leverage on Transient Logical Standby to perform an upgrade
• Generate reports
Pythian is a global leader in database administration and consulting services. The document discusses the speaker's first 100 days of experience with an Oracle Exadata database machine. It provides an overview of Exadata components and features like Hybrid Columnar Compression and Smart Scan, which offloads processing from database servers to storage cells.
2. 22
Discovering the 2 in Search Services 2.0
Tech Talk Live
• Solr Core and Solr Schema
• Security, Performance and Precision
• Enterprise Enhancements
• One more thing...
• Q&A
14th October 2020
4. 4
Solr Content Store Removal
[Diagram: the ACS Repository (DB + Content Store) is indexed by Search Services 1.4, which keeps both a Solr Index and its own Content Store]
COMMUNITY
5. 5
Solr Content Store Removal
[Diagram: Search Services 1.4 keeps a Solr Index plus its own Content Store; in Search Services 2.0 the Solr-side Content Store is removed and only the Solr Index remains alongside the ACS Repository's DB and Content Store]
COMMUNITY
6. 6
Solr Content Store Removal Benefits
Removed custom code
9,311 lines of code removed
https://github.com/Alfresco/SearchServices/blob/master/search-services/alfresco-search/doc/architecture/solr-content-store-removal/00001-solr-content-store-removal.md
Helps leverage built-in Solr features
It's now possible to make use of built-in Solr features
(e.g. replication and backups)
Reduces I/O work
Particularly in systems with replication
Reduced disk usage
Search Services Version                1.4       2.0
Index Size (bytes per doc)             1         3,000
Content Store Size (bytes per doc)     40,000    0
COMMUNITY
7. 7
Solr Content Store Removal Reindex
• Moving data from the content store to the index requires a reindex
Reindexing with sharding: Demo later
For more information see:
https://github.com/aborroy/solr-sharding-reindex
For more information about reindexing see:
https://www.alfresco.com/events/webinars/tech-talk-live-reindexing-large-repositories
COMMUNITY
TTL #120
8. 8
Solr Content Store Removal Impact
● More efficient replication, as we're now using the default Solr mechanism
○ Docker Compose example available at https://github.com/aborroy/search-services-replication
● Now using atomic updates instead of removing and recreating documents
○ To achieve this we enabled the SOLR Transaction Log
● Review your backup and restore procedures, as the folder $SOLR_HOME/contentstore is no longer created

$ du -h /opt/alfresco-search-services/data/alfresco
4.7M ./index
8.5M ./tlog
4.0K ./snapshot_metadata
COMMUNITY
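One of the newly usable built-in features is Solr's replication handler backup; an illustrative request (host, core name, and backup location are assumptions for this sketch, not values from the deck):

```
http://localhost:8983/solr/alfresco/replication?command=backup&location=/opt/backups&numberToKeep=3
```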
FTSSTATUS
9. 9
Full information for a Document can still be recovered by using Solr queries:
Solr Content Store Removal Impact
http://127.0.0.1:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q=DBID:563
COMMUNITY
10. 10
New Destructured Date Fields
Solr schema simplification (solrhome/core/conf/schema.xml)
Improved storage of DATE fields
quarter
day_of_month
day_of_year
day_of_week
COMMUNITY
11. 11
New fields *_unit_of_time_* can be used to build queries
Get all the documents created in 2020
SOLR FTS
N.B. CMIS is also supported, but not for this example:
● cm:created is not supported, as the cm:auditable aspect is not exposed through the CMIS protocol
New Destructured Date Fields
COMMUNITY
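As a sketch, a direct Solr query for the "created in 2020" example might look like the following; the field name here is hypothetical, just following the *_unit_of_time_* pattern above (check the generated Solr schema for the actual name in your deployment):

```
http://localhost:8983/solr/alfresco/select?q=created_unit_of_time_year:2020
```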
12. 12
Asynchronous Actions and Maintenance
[Diagram: a Search Services Administrator schedules a Retry action (t1) into the Maintenance Queue; a Commit Tracker sits between the queue and the Index]
https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html
COMMUNITY
13. 13
Asynchronous Actions and Maintenance
[Diagram: the Administrator has now scheduled Retry (t1) and Reindex (t2) in the Maintenance Queue]
https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html
COMMUNITY
14. 14
Asynchronous Actions and Maintenance
[Diagram: the Maintenance Queue now holds Retry (t1), Reindex (t2) and Purge (t3)]
https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html
COMMUNITY
15. 15
Asynchronous Actions and Maintenance
[Diagram: the Maintenance Queue now holds Retry (t1), Reindex (t2), Purge (t3) and Fix (t4)]
https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html
COMMUNITY
16. 16
Asynchronous Actions and Maintenance
[Diagram: at t5 the Commit Tracker dequeues the scheduled work (Retry, Reindex, Purge, Fix) from the Maintenance Queue]
https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html
COMMUNITY
17. 17
Asynchronous Actions and Maintenance
[Diagram: at t6 the Commit Tracker performs the index management, applying the dequeued work to the Index]
https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html
COMMUNITY
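The maintenance actions above are queued through the Solr core admin endpoint; as an illustration (host, port and the core parameter follow the examples used elsewhere in this deck), scheduling a RETRY on the alfresco core:

```
http://localhost:8983/solr/admin/cores?action=RETRY&core=alfresco
```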
18. 18
The FIX tool finds transactions and ACL change sets which are mismatched between the DB and Solr
It adds them to be reindexed on the next maintenance cycle performed by the CommitTracker
FIX Tool
{
  "responseHeader": {
    "QTime": 1,
    "status": 0
  },
  "action": {
    "status": "scheduled",
    "txToReindex": [1, 2],
    "aclChangeSetToReindex": [3, 4]
  }
}
Old Response Shape
● “status” is always scheduled
● Only two error categories
● Each category contains the corresponding transaction identifiers
COMMUNITY
19. 19
{
  "responseHeader": {
    // As before
  },
  "action": {
    "dryRun": true,
    "status": "notScheduled",
    "txToReindex": {
      "txInIndexNotInDb": {
        "192": 282, // Tx 192 is associated to 282 nodes
        "827": 99   // Tx 827 is associated to 99 nodes
      },
      "duplicatedTxInIndex": {...},
      "missingTxInIndex": {...}
    },
    "aclChangeSetToReindex": {
      // Very similar to txToReindex, but for ACLs
    }
  }
}
FIX Tool New Features
● dryRun (defaults to true): If true, the output report is generated but no reindex work is scheduled.
● fromTxCommitTime: The lower bound (the minimum transaction commit time) of the target transactions that you want to check or fix.
● toTxCommitTime: The upper bound (the maximum transaction commit time) of the target transactions that you want to check or fix.
● maxScheduledTransactions: The maximum number of transactions that will be scheduled. The default is 500, but this can be overridden in solrcore.properties.
COMMUNITY
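A dry run is useful for sizing the reindex work before actually scheduling it. As a minimal sketch (not part of the product tooling), tallying the affected nodes from a report with the shape shown on this slide; the transaction ids and node counts are illustrative:

```python
import json

# Sample FIX dry-run response following the new (2.0) report shape shown
# on this slide; the transaction ids and node counts are illustrative.
response = json.loads("""
{
  "responseHeader": {"QTime": 1, "status": 0},
  "action": {
    "dryRun": true,
    "status": "notScheduled",
    "txToReindex": {
      "txInIndexNotInDb": {"192": 282, "827": 99},
      "duplicatedTxInIndex": {},
      "missingTxInIndex": {}
    }
  }
}
""")

def nodes_to_reindex(report):
    """Sum node counts across every error category under txToReindex."""
    categories = report["action"]["txToReindex"]
    return sum(n for category in categories.values() for n in category.values())

print(nodes_to_reindex(response))  # 381
```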
20. 20
Enable/Disable Indexing
Motivation: Disable indexing in order to cancel a huge maintenance load
• Enable / disable indexing on a specific core or on all master/standalone cores
• MetadataTracker, ContentTracker, CascadeTracker, AclTracker are affected
• CommitTracker, ModelTracker, ShardStatePublisher are not affected
• When disabled, some admin endpoints (e.g. PURGE, INDEX) won't execute
• When disabled, the FIX endpoint will be forced to run in dryRun mode
• If indexing is disabled in the middle of a tracking process, trackers will be set to rollback mode
• Commands are idempotent
• For more information see https://issues.alfresco.com/jira/browse/SEARCH-2330
Examples:
Disable indexing on all master/standalone cores
http://localhost:8983/solr/admin/cores?action=disable-indexing
Disable indexing on a specific (master or standalone) core
http://localhost:8983/solr/admin/cores?action=disable-indexing&core=alfresco
COMMUNITY
21. 21
FIX Tool Demo
Postman Collection containing the example requests used in the demo
https://www.getpostman.com/collections/4c2fbe407a0134729546
COMMUNITY
23. 23
● Communication between the Repository and SOLR (for searching and indexing) may be protected using the mTLS protocol with client authentication [1]
● A new password handling mechanism has been introduced as of ASS 2.0 / ACS 6.2.N [2]:
○ Switches from storing configuration in property files with passwords in plain text to JVM system properties
○ The old way of configuring should still work for backwards compatibility, but is discouraged for security reasons
[2] ACS 6.2.N is not released yet!
New mTLS Configuration
[1] https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-6-1-is-coming-with-mutual-tls-authentication-by-default/ba-p/287905
COMMUNITY
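An illustrative sketch of the new style, passing keystore passwords as JVM system properties at Solr startup; the property names and values here are examples, so check the Search Services installation docs for the exact names in your version:

```shell
# Example only: supply keystore passwords via JVM system properties
# instead of the plain-text *-passwords.properties files.
export JAVA_TOOL_OPTIONS="-Dssl-keystore.password=changeit -Dssl-truststore.password=changeit"
./bin/solr start
```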
24. 24
alfresco-ssl-generator: command-line tool to generate self-signed certificates (classic and current formats)
https://github.com/Alfresco/alfresco-ssl-generator
alfresco-solr-docker-mtls: sample configuration (repo using classic and solr using current)
https://github.com/aborroy/alfresco-solr-docker-mtls
Additional resources
Installing and configuring Search Services with mutual TLS using the
distribution zip
https://docs.alfresco.com/search-community/tasks/solr-install.html
Alfresco mTLS Configuration Deep Dive
https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-mtls-configuration-deep-dive/ba-p/296422
New mTLS Configuration
COMMUNITY
26. 26
Trackers Reworking
Increasing the Transaction Batch Size for nodes and ACLs improves performance until the optimal value for your deployment is reached; beyond that point, larger batch sizes bring no further performance change
alfresco.transactionDocsBatchSize (default 2000)
alfresco.changeSetAclsBatchSize (default 500)
Increasing the Node Batch Size can improve performance up to an optimal point for your deployment; beyond that point, larger batch sizes penalise performance
alfresco.nodeBatchSize (default 100)
alfresco.cascade.tracker.nodeBatchSize (default 10)
alfresco.contentUpdateBatchSize (default 2000)
alfresco.aclBatchSize (default 100)
Increasing the maximum number of parallel threads improves performance until the maximum for the deployment is reached
alfresco.metadata.tracker.maxParallelism (default 32)
alfresco.cascade.tracker.maxParallelism (default 32)
alfresco.content.tracker.maxParallelism (default 32)
alfresco.acl.tracker.maxParallelism (default 32)
[Chart: execution time vs. parameter size for the three parameter groups above, with hotspots marked]
solrcore.properties
COMMUNITY
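Putting the three groups together, a tuning pass in solrcore.properties might look like the sketch below; the values are illustrative starting points, not recommendations, and the optimal figures must be found by benchmarking your own deployment:

```
# Batch sizes: raise gradually from the defaults, measuring each step
alfresco.transactionDocsBatchSize=2000
alfresco.nodeBatchSize=200
# Parallelism: bounded by the CPU cores available to Solr
alfresco.metadata.tracker.maxParallelism=32
alfresco.content.tracker.maxParallelism=32
```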
27. 27
FTS operator = has changed behaviour in 2.0.0
● Detailed information is available in https://hub.alfresco.com/t5/alfresco-content-services-blog/exact-term-queries-in-search-services-2-0/ba-p/302200
● Thanks @AFaust for noticing this issue: https://issues.alfresco.com/jira/browse/SEARCH-2461
Exact Search
COMMUNITY
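For illustration, the two query forms in FTS (the field and value here are made up):

```
cm:title:budget     <- tokenized search: matches variants produced by the analyzer
=cm:title:budget    <- exact term search with the = operator, affected by the 2.0.0 change
```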
29. 29
In previous releases, the Shard State was communicated to the repository as part of the retrieval of information by the Metadata Tracker.
That could cause problems when the Metadata Tracker cycle takes a long time to execute.
A new Shard State Publisher tracker has been added to report the state to the repository on a regular basis.
The new configuration for this tracker includes the following property:
alfresco.nodestate.tracker.cron
If this property is not specified, the default cron is applied:
alfresco.cron=0/10 * * * * ? *
ShardState Tracker
solrcore.properties
ENTERPRISE
Sharding
30.
DB_ID_RANGE Sharding
• When a shard goes down, search can now be restored more quickly
For more details, see MNT-21591
[Diagram: two ACS nodes querying SOLR Shard 1 and SOLR Shard 2, both using DB_ID_RANGE, each shard with Replica 1 and Replica 2]
ACS (alfresco-global.properties):
search.solrShardRegistry.shardInstanceTimeoutInSeconds = 30
(Historically this should be set to more like 300 seconds)
InsightEngine (solrcore.properties):
alfresco.nodestate.tracker.cron=0/10 * * * * ? *
This should be more frequent than the value set in ACS
ENTERPRISE
Sharding
31.
Solr Sharding Reindex
When re-indexing a live Alfresco Repository that uses SOLR Sharding with
solr.useDynamicShardRegistration enabled, the new SOLR Shard Indexer services should be
configured with the Alfresco NodeState Tracker turned off.
With this approach, the SOLR Indexer services are not registered with the live Alfresco Repository as
available SOLR Shards, so the live system can operate normally.
Sharding Reindex (Demo)
https://github.com/aborroy/solr-sharding-reindex
This configuration uses two Docker Compose templates:
● living: an ACS server running 2 SOLR Shards configured with the DB_ID
method and Alfresco Search Services 1.4.3
● indexer: an indexer service running 2 SOLR Shards configured with the
DB_ID method and Alfresco Search Services 2.0.0.1
ENTERPRISE
Sharding
32.
● Improved SOLR JDBC support
● Added support for Excel and Tableau to Alfresco Search and Insight Engine using an ODBC driver
provided by a third-party company, CData
○ Download the driver from https://www.cdata.com/drivers/alfresco/
BI Tool Support
ENTERPRISE
BI Tools
33.
Improvements to SQL Support (JDBC & ODBC)
• Support for Date Functions in SELECT Clause
• Support for Date Functions in WHERE Clause
• Support for Date Functions in GROUP BY Clause
• Support for SQL avg(field) with multiple GROUP BY
• Support for Date Functions in ORDER BY Clause
• Support SQL TIMESTAMP format
• Support for CAST AS TIMESTAMP function
• Support for QUARTER function
• Support for DAYOFMONTH, DAYOFWEEK, DAYOFYEAR functions
• Support for TIMESTAMPADD(timeUnit, integer, datetime) function
ENTERPRISE
BI Tools
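A sketch of a query exercising several of these functions follows. The alfresco virtual table and the cm_created / cm_content_size field names are assumptions based on the default Insight Engine SQL schema; adjust them to your deployment.

```sql
-- Average content size per quarter for documents created since 2020
SELECT QUARTER(cm_created) AS q,
       AVG(cm_content_size) AS avg_size
FROM alfresco
WHERE cm_created > CAST('2020-01-01T00:00:00Z' AS TIMESTAMP)
GROUP BY QUARTER(cm_created)
ORDER BY q
```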
34.
JDBC Driver with DBVisualizer (Demo)
ENTERPRISE
BI Tools
>> A working JDBC client sample is available at https://github.com/aborroy/solr-jdbc-client
35.
CDATA ODBC installation
The driver is simple to install on your machine using the steps on the following page:
http://cdn.cdata.com/help/SJF/odbc/
Installation and setup is a simple two-step process, performed on the end user's machine:
1. Install the driver
2. Configure the ODBC data source
Configuration is fully documented by CData.
ENTERPRISE
BI Tools
36.
ODBC for Tableau
• Can connect to the relevant data source and display the results in a table drawn from the source.
• Results can be displayed by using the table directly or by entering a custom SQL query that returns exactly
what the user wants to see.
• Tableau consists of worksheets where views of the data can be built from fields and graphs.
• Each worksheet builds the results of one query through the use of the fields.
• Results can be visualised as pie charts, bar charts, stacked bar charts, continuous line graphs and many more.
• Results can be refined by applying filters within Tableau on the selected fields.
• Tableau can create dashboards that gather all of the related queries from the individual sheets in one place.
• Can preview the results on different devices such as desktop, tablet and more.
ENTERPRISE
BI Tools
37.
ODBC for Excel
• Simply start by doing a data dump into Excel.
• Connecting to the ODBC source works much as in Tableau: connect and view all the results from the
table, or provide a custom SQL query.
• Excel gives a preview of the results before displaying them on a separate sheet.
• The data can be filtered from the preview by clicking the 'Transform' button and then narrowing
it down to what you want.
• Native Excel functionality can be used on the chosen dataset without relying heavily on SQL, in
comparison to using Zeppelin.
ENTERPRISE
BI Tools
38.
Supported Stack
Server OS
• Linux (Red Hat Enterprise v7.6 x64)
• CentOS 7 x64
• Ubuntu 18.04
• SUSE 12.0 SP1 x64
• Windows Server 2012 R2 (x64)
• Windows Server 2016
Solr
• Solr 6.6.5
Java
• OpenJDK 11.0.8
• Oracle JDK 11.0.1
Alfresco Content Services
• Alfresco Enterprise Edition (ACS) 6.2
• Alfresco Community Edition 201911 GA
COMMUNITY
ENTERPRISE
Release notes
https://hub.alfresco.com/t5/alfresco-content-services-blog/search-services-2-0-0-release/ba-p/301308
39.
shared.properties
2.0.0.0
• Suggestable Properties and Cross Locale fields enabled by default
• This may have an impact on the SOLR index
• Spellcheck and Tokenisation work by default
2.0.0.1
• Settings changed back to commented out by default, as in previous versions
COMMUNITY
ENTERPRISE
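For reference, the settings in question look roughly like the fragment below. The property names come from the stock shared.properties shipped with Search Services; treat the exact list and values as an assumption. In 2.0.0.0 they shipped uncommented; 2.0.0.1 reverted them to commented out, as in earlier releases:

```properties
# Suggestable properties (commented out again by default in 2.0.0.1)
#alfresco.suggestable.property.0={http://www.alfresco.org/model/content/1.0}name
#alfresco.suggestable.property.1={http://www.alfresco.org/model/content/1.0}title

# Cross-locale datatypes, required for exact term search on those types
#alfresco.cross.locale.datatype.0={http://www.alfresco.org/model/dictionary/1.0}text
#alfresco.cross.locale.datatype.1={http://www.alfresco.org/model/dictionary/1.0}content
```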
42.
Index Checker Tool
https://github.com/AlfrescoLabs/index-checker
Simple report
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=false --run.fix.actions=false
Count SOLR documents = 814
Count DB nodes = 815
The database contains 2 nodes more than SOLR Index for {http://www.alfresco.org/model/content/1.0}category
SOLR indexed 1 nodes more than the existing in database for {http://www.alfresco.org/model/content/1.0}content
Count SOLR permissions = 58
Count DB permissions = 58
>> Available from Search Services 1.4.3
43.
Index Checker Tool
Detailed report
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=true --run.fix.actions=false
Count SOLR documents = 814
Count DB nodes = 815
The database contains 2 nodes more than SOLR Index for {http://www.alfresco.org/model/content/1.0}category
TYPE {http://www.alfresco.org/model/content/1.0}category: DbIds present in DB but missed in SOLR [212, 213]
SOLR indexed 1 nodes more than the existing in database for {http://www.alfresco.org/model/content/1.0}content
TYPE {http://www.alfresco.org/model/content/1.0}content: DbIds present in SOLR but missed in DB [584]
Count SOLR permissions = 58
Count DB permissions = 58
(Checks are performed in batches of 1,000 elements)
44.
Fix actions
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=true --run.fix.actions=true
Count SOLR documents = 814
Count DB nodes = 815
...
No Database Rows Were Harmed in the Fixing of This Solr Index
$ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=false --run.fix.actions=false
Count SOLR documents = 815
Count DB nodes = 815
Index Checker Tool
>> Watch the live demo at https://youtu.be/YU-WyNgCH2U