In this session, we'll discuss architectural, design and tuning best practices for building rock-solid, scalable Alfresco solutions. We'll cover the typical use cases for highly scalable Alfresco solutions, such as massive injection and high concurrency, and introduce the 3.3 and 3.4 Transfer/Replication services for building complex high-availability enterprise architectures.
1. Mike Farman
Product Manager, Alfresco
Peter Monks
Director, Professional Services, Alfresco
Derek Hulley
Senior Engineer, Alfresco
2. Many areas to consider...
• Core Repository
• Web-tier load balancing and caching
• Scale-up/scale out - horizontal vs. vertical
• Components tuning
• Replication strategies (3.4)
• Profiling and benchmarking
• ....
We're going to focus on the Core Repository
3. What happens when you create a node?
• Step 1: Begin transaction
• Step 2: Create node in DB
• Step 3: Write stream to disk
• Step 4: Update DB content URL
• Step 5: Begin commit
• Step 6: Transform (extract) text
• Step 7: Update index (props & content)
• Step 7a: Index fulltext (background)
• Step 8: Commit (transaction ID for index tracking)
• Step 9: Add to L2 cache
• Content indexing is automatically moved to the background if text extraction exceeds 20 ms
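The 20 ms rule on this slide can be read as a simple threshold decision. The Python sketch below is only an illustration of that reading: `index_in_transaction` and its parameters are hypothetical names, not Alfresco code, and the limit stands in for the `lucene.maxAtomicTransformationTime` property mentioned later in the deck.

```python
def index_in_transaction(estimated_extraction_ms: float,
                         max_atomic_transformation_ms: float = 20.0) -> bool:
    """Return True when content indexing would stay inside the transaction.

    Assumption: content whose text extraction is expected to exceed the
    limit is handed to background indexing instead.
    """
    return estimated_extraction_ms <= max_atomic_transformation_ms


fast_doc = index_in_transaction(5.0)          # stays in the transaction
slow_doc = index_in_transaction(50.0)         # moved to the background
forced = index_in_transaction(5.0, 0.0)       # a limit of 0 pushes everything to the background
```

Under this reading, lowering the limit trades index freshness for faster commits.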
4. What happens when you query for nodes?
• Steps 1-2: Query (Lucene) returns a results set
• Step 3: Batch pre-fetch
• Step 4: In cache
• Step 4a: DB fetch
• Step 5: Result set
• Step 6: Check permissions (limited by max permission checks and timeout)
• Step 7: Deliver results
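The permission check before results are delivered is bounded by a maximum number of checks and a timeout. The Python sketch below shows one way such a bounded filter could behave. It is not Alfresco's implementation: `filter_by_permission`, `can_read` and the default limits are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Iterable, List, Tuple


def filter_by_permission(
    node_ids: Iterable[str],
    can_read: Callable[[str], bool],
    max_checks: int = 1000,
    max_millis: float = 10_000.0,
) -> Tuple[List[str], bool]:
    """Return (visible_nodes, truncated).

    Evaluation stops once max_checks nodes have been checked or
    max_millis has elapsed; truncated is True when nodes were left unchecked.
    """
    visible: List[str] = []
    started = time.monotonic()
    for checks, node_id in enumerate(node_ids):
        elapsed_ms = (time.monotonic() - started) * 1000.0
        if checks >= max_checks or elapsed_ms > max_millis:
            return visible, True
        if can_read(node_id):
            visible.append(node_id)
    return visible, False
```

A truncated result means a user may see fewer hits than exist, which is why it is worth testing queries as ordinary users and not only as admin.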
5. What happens when you read a node's content?
• Step 1: Node request
• Step 2: Cached
• Step 3: DB lookup
• Step 4: Fetch content
• Step 5: Read stream response
6. Example Use Cases:
• UC01: Bulk Loading
• High batch throughput, ongoing
• e.g. scanning, archival solutions, systems of record
• Migration
• One-off migration to Alfresco from legacy system
• Then UC02...
• UC02: Enterprise Collaboration Platform
• Concurrent users, variety of interfaces
• e.g. Team/Project Collaboration, Document/Knowledge Management
7. Typical Characteristics
• Large number of documents and throughput
• Tens of thousands of documents injected per day, often during nightly hours
• Tens of millions of documents per year
• Low User concurrency
• 100-1000 users (read only access)
• Application profile – System of Record
• End users mostly search & read
• Document formats: PDF, TIFF, JPG (i.e. no full text indexing)
• Typically fixed metadata
• No or little version control
• Few to no rules, actions, workflows, content transformations
• Client Interfaces
• Share/Explorer or Custom e.g. Web Scripts, CMIS
• Typically little CIFS/WebDAV/FTP
8. Primary Objective is to Maximise Throughput
• Parallel processing
• Load nodes simultaneously
• Avoid unnecessary in-transaction processing
• In-transaction services often not required when loading
• e.g. Transformation, Indexing
• Disable unneeded services
• Many standard services are not required when loading
• Minimise network and file I/O operations
• Get source content as close to server storage as possible
• Always benchmark and tune...
• JVM, Network, Threads, DB Connections...
9. Architectural considerations
• Creation is CPU, memory, network intensive
• Always 64 bit
• Rule of thumb: Prefer scale up over scale out – simpler deployment and management
• Rule of thumb: get the content as close as possible to Alfresco
• Nature of the data set (i.e. batches) is KEY
• If batches are sequential -> minimize time-per-batch
• Scale up in CPU and memory
• If batches are parallelizable -> maximize number of batches processed
• Scale out with multi-threaded uploads
• Consider dedicated server(s) for ingestion
• Use production servers for migration use case and then reconfigure
• Design content storage around your data
• How can you get the source content as close as possible to repository content storage?
• Note: Avoid SPARC T-series and related processors
• Highly parallel but not suited for atomic heavy serial operations
10. Tuning best practices - JVM Tuning – Application Server
• JVM Tuning
• 64 bit
• Make NewSize as large as possible to avoid spill over to OldGen
• See http://wiki.alfresco.com/wiki/JVM_Tuning
• Application Server
• Pay attention to the machine capacity, i.e.
• Threads
• CPU Utilization
• I/O
Sample JVM Config: 64-bit, dual 2.6GHz Xeon / dual-core per CPU, 8GB RAM environment
-server
-Xss1M
-Xms2G
-Xmx3G
-XX:NewSize=1G
-XX:MaxPermSize=256M
12. Tuning best practices – I/O
• Network
• Alfresco to Database is Key
• Latency is key, e.g. 10 ms is the absolute max
• JDBC fetch size should be 150
• See BP-1_Alfresco_Environment_Validation_and_Day_Zero_Configuration
• Alfresco to storage (if remote)
• If possible, avoid it completely for file transfers - Stage content on local disks
• Use a dedicated network for storage e.g. Fibre channel
• Incoming to Alfresco – Typically not relevant for bulk loading use case
• Disk
• Lucene index operations are disk I/O intensive
• Fast read/writes i.e. local disk
• Avoid indexing if not required
• Avoid unnecessary content file copying
• Stage content on local disks
• Consider setting the cm:content property directly, e.g.
• contentUrl=store://mypath/mydocument.docx|mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document|size=51142|encoding=UTF-8|locale=en_GB_
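The cm:content value in the example above is a pipe-separated list of key=value fields. The Python sketch below assembles such a string for a file that is already staged in the content store. `build_content_property` is a hypothetical helper, and the field order and the trailing underscore in the locale simply copy the slide's example.

```python
def build_content_property(content_url: str, mimetype: str, size: int,
                           encoding: str = "UTF-8", locale: str = "en_GB_") -> str:
    """Assemble a content property string in the layout shown on the slide."""
    fields = [
        ("contentUrl", content_url),
        ("mimetype", mimetype),
        ("size", str(size)),
        ("encoding", encoding),
        ("locale", locale),
    ]
    return "|".join(f"{key}={value}" for key, value in fields)


# Reproduce the slide's example values.
docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
value = build_content_property("store://mypath/mydocument.docx", docx, 51142)
```

Setting the property this way records where the bytes already live, so the loader does not copy the file a second time.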
13. Tuning best practices - Database
• Connections – Relevant if you are loading concurrently
• See BP-1_Alfresco_Environment_Validation_and_Day_Zero_Configuration
• DB Indexes & Statistics
• Plan your batch loads to allow for periodic statistics maintenance
• Make sure the database hardware/software is sized appropriately, e.g.
• Log sizes, flush on transaction commit, cache tuning, lock management...
• Use of multiple physical volumes/RAID...
• All databases provide many options to optimise performance
• Get a DB administrator or partner involved
14. Tuning best practice - Repository Services
• Force background indexing
• alfresco-global.properties
• Everything: index.tracking.disableInTransactionIndexing=true
• Just Content: lucene.maxAtomicTransformationTime=0
• Is content indexing required at all?
• DoNotIndex aspect
• “Run As” system user to avoid permission checking
15. Tuning best practice - Repository Services
• Use an optimised custom bulk loader
• Process docs in batches - not 1 doc per transaction or 1 transaction for the entire content set
• Example: 100 documents per batch
• Use Foundation (Java) API if possible
• Design multi-threaded import code
• Partition your data set so you can use multiple threads loading in different areas
• Scale up CPU accordingly
• Consider direct APIs (e.g. the lower-case “nodeService” bean rather than the public “NodeService” bean)
• Public services are heavily wrapped with interceptors for transactions, auditing, permissions, multilingual translations, etc.
• Disable behaviours
• Rules evaluations, cm:auditable, versioning, quotas (system.usages.enabled=false)
• Use proper transaction demarcation
• Complete all operations on a node in a single transaction
• Batching – group multiple updates in a single transaction
• Avoid mixing reads and writes
• See session CS2-Repository_Internals for more details on API specifics
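The batching and multi-threading advice above can be sketched as follows. This is a language-neutral illustration in Python, not a working Alfresco loader, which would typically use the Foundation (Java) API. The names `batches`, `load_partition`, `bulk_load` and the `write_batch` callback are hypothetical; `write_batch` stands for "create every node of this batch inside one transaction".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def load_partition(docs: Sequence[str],
                   write_batch: Callable[[List[str]], None],
                   batch_size: int = 100) -> int:
    """Load one partition sequentially, one transaction per batch."""
    loaded = 0
    for batch in batches(docs, batch_size):
        write_batch(batch)  # hypothetical: all nodes of the batch in ONE transaction
        loaded += len(batch)
    return loaded


def bulk_load(partitions: Sequence[Sequence[str]],
              write_batch: Callable[[List[str]], None],
              threads: int = 4) -> int:
    """Load independent partitions in parallel; each partition is handled by a single worker thread."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda part: load_partition(part, write_batch), partitions))
```

Keeping each partition on its own thread means two threads never write into the same area of the repository, which is the point of partitioning the data set first.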
16. Tuning best practices – Repository Services
• Disable modified timestamp propagation to parent folders
• system.enableTimestampPropagation=false (default)
• Deleting large numbers of nodes
• Skip deleted items (archive) by adding the sys:temporary aspect to your content before deletion
• Partition your content within the repository
• Depends on read access requirements
• Consider partitioning when a space holds more than 2000 nodes, if users browse the space's children
Note: Performance is much improved in later releases (3.3.3, 3.4); test for your use case
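One simple way to keep every space under a child limit is to derive the target folder from a running document number. The Python sketch below is only one possible scheme; `folder_for`, the `bulk/` root and the zero-padded folder names are hypothetical, and a date-based or business-key layout may suit real read patterns better.

```python
def folder_for(sequence_no: int, max_children: int = 2000) -> str:
    """Map a running document number to a two-level folder path.

    Each leaf folder receives at most max_children documents and each
    parent folder holds at most max_children leaf folders.
    """
    if sequence_no < 0:
        raise ValueError("sequence_no must be non-negative")
    leaf = sequence_no // max_children
    parent = leaf // max_children
    return f"bulk/{parent:04d}/{leaf:06d}"


first = folder_for(0)              # bulk/0000/000000
rollover = folder_for(2000)        # bulk/0000/000001
next_parent = folder_for(4_000_000)  # bulk/0001/002000
```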
17. Scale Out Using Dedicated Bulk Load Server(s)
• Alfresco can support a non-clustered, injection-only tier
• Objective: Separate the input write process from the front-end read load
• Solution: Dedicated injection tier pointing to the same DB/content store(s) as the front-end servers. No need to cluster caches from this tier with the front end. Index properties and/or content in the background; indexes will catch up from DB transactions.
• Benefits: No cache update/invalidation overhead. Indexing does not block the loading process
18. Bulk load server(s) not clustered but share storage and DB; production servers will 'catch up' via index tracking
• Bulk Load Process (creates only): Bulk Load A, Bulk Load B
• Runtime Clients: Production A, Production B, Production C
• Each server: Tomcat, EHCache, Lucene Index
• Shared by all servers: Database (MySQL), Content Store
19. Load Server(s) Configuration Tips
• Bulk Load Server(s)
• To exclude server(s) from cluster:
• Do not set cluster name for bulk load servers in alfresco-global.properties
• alfresco.cluster.name=
• Force background indexing in the local alfresco-global.properties using:
• Everything:
• index.tracking.disableInTransactionIndexing=true
• Just Content:
• lucene.maxAtomicTransformationTime=0
• Note: The load process should perform creates only, no updates or reads
• Production Server(s)
• Ensure index tracking is enabled:
• index.tracking.cronExpression=0/5 * * * * ?
• index.recovery.mode=AUTO
20. Example: In-transaction vs. Background Indexing
• 10,000 docs, 1,000 folders
• 50 KB Word documents
• FTP with 10 sessions
• Laptop
• Foreground Indexing:
• 33 mins
• Background Indexing:
• 5 mins
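Converting the two timings above into load rates makes the difference easier to compare with other runs. The Python sketch below counts only the 10,000 documents (folders are ignored), and `docs_per_second` is a hypothetical helper.

```python
def docs_per_second(documents: int, minutes: float) -> float:
    """Average load rate for a run of the given duration."""
    return documents / (minutes * 60.0)


foreground = docs_per_second(10_000, 33)   # roughly 5 docs/s
background = docs_per_second(10_000, 5)    # roughly 33 docs/s
speedup = background / foreground          # 33 min / 5 min = 6.6x
```

These figures come from a single laptop run over FTP, so they indicate the direction of the effect and are not a benchmark.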
22. Requirements
• High (and potentially highly distributed) user concurrency
• 1,000s-10,000s of users (read & write)
• Medium/High number of documents
• 10,000-1 million+ documents
• 1000 document updates per day
• Complex enterprise content and permission models
• Multiple content models/Dynamic ACL
• Versioning and full text indexing on all documents
• Document types: Office, drawing, images
• Advanced content management
• Multiple rules and actions
• Heavy use of content transformations/workflow
• Interfaces (All)
• Share, WebDAV, CIFS ....
23. Architectural considerations
• Fully fledged platform deployment
• Need to consider maintenance window
• Scale out Share independently from Repo
• Front and intermediate Load balancer/Web Cache layers
• Read/write split and scheduled repository exclusion for maintenance
• Scale out transformation server
• Enterprise only: JOD OpenOffice subsystem
• Scale out and up infrastructure
• Cluster CIFS with DFS (Distributed File System)
• All HTTP based protocols scale seamlessly (SSP on port 7070)
• Balance multi-CPU (scale up) and multi-node clusters (scale out)
• Overhead of index tracking
24. Design best practices
• Distribute your content within the repository
• Otherwise search and retrieval performance degradation is likely
• Use versioning and indexing where appropriate, not just because it's there
• e.g. don't simply apply cm:versionable to the full cm:content
• Modelling
• Prefer aspects over types
• Remember aspects support inheritance as well
• Content Model indexing options
• Tune what you need to index
• Quotas (aka Usages)
• Might save your repo from content explosion but also have an overhead!
25. Tuning best practices – Note: Also see bulk load use case!
• RDBMS
• Number of connections much more important for this use case
• Formula: HTTP Worker Threads + 75 per cluster node
• For Tomcat defaults this is 275
• Cache Configuration
• L2 Cache: increase with RAM to include more objects in cache
• Use ehcache tracing tool to identify which caches have low hit ratios and increase them if you have available memory
• See http://wiki.alfresco.com/wiki/Repository_Cache_Configuration#Tracing_cache_sizes for details
• Alfresco Configuration optimization
• VFS thread pool tuning (default: <threadPool init="25" max="50" />)
• Tune ACLs and preload common searches (if needed)
system.acl.maxPermissionCheckTimeMillis=10000
system.acl.maxPermissionChecks=10000
Query via node browser as different users, not only admin
• Consider bulk loading large user bases (10,000s) into a single (un-clustered) node and then clustering
• Disable eager home folder creation
• home.folder.creation.eager=false in alfresco-global.properties
• Use multi-threaded and incremental LDAP sync once initial sync has been completed
• Differential sync is the default
• Lucene Tuning
• lucene.maxAtomicTransformationTime=20
• Monitor the network performance when adding nodes to a cluster
• Watch for ehcache waiting for the network via thread dumps
• Consider disabling some/all of the L2 caches
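The connection formula in the RDBMS bullet above is easy to check with a few lines of arithmetic. In the Python sketch below the helper names are hypothetical and the value 200 is Tomcat's default `maxThreads`. The cluster-wide total is an inference from "per cluster node" (every node opens its own pool against the same database), not something the slide states.

```python
TOMCAT_DEFAULT_MAX_THREADS = 200  # Tomcat's default HTTP connector maxThreads


def db_connections_per_node(http_worker_threads: int, headroom: int = 75) -> int:
    """Slide formula: HTTP worker threads + 75, for each cluster node."""
    return http_worker_threads + headroom


def db_connections_total(cluster_nodes: int, http_worker_threads: int) -> int:
    """Assumption: the database must accept the per-node pool once per node."""
    return cluster_nodes * db_connections_per_node(http_worker_threads)


per_node = db_connections_per_node(TOMCAT_DEFAULT_MAX_THREADS)   # 275, as on the slide
three_nodes = db_connections_total(3, TOMCAT_DEFAULT_MAX_THREADS)  # 825
```

The per-node figure is what would go into `db.pool.max`; the total is what the database server's own connection limit has to allow.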
26. Example Windows ECM Production Cluster Install
• Clients
• HTTP clients (e.g. Share) via HTTP Load Balancer
• CIFS via alfrescocifs, DFS Round Robin
• Active Directory: User/Group Sync, NTLM Authentication
• Application servers: alfappsrv01 (Tomcat 1) and alfappsrv02 (Tomcat 2), EHCache clustered
• Local alf_data on each server
• Lucene Index: d:\alf_store\lucene-indexes
• Content Store: d:\alf_store\contentstore
• Replicating Content Store (local & shared): in & outbound replication between the local content store and the shared content store on SAN
• Database: JDBC to oraclecluster, alfclustsrv01 (Oracle 1) and alfclustsrv02 (Oracle 2), failover between them, MSCS Cluster
• SAN
• Shared Content Store: sharedContentStore (alfdataDatastore)
• Oracle: Data (o:\oradata\alfresco), Control (o:\oradata\alfresco) & Logfiles (L:\oradata\alfresco)
• Oracle Backup (o:\flash_recovery_area)
• Lucene Index Backup (alfdataHold)
27. Replication (3.4) offers new deployment options
• Replication may be appropriate for specific contexts
• Provides selective replication of content between distinct Alfresco
repositories
• On demand or scheduled via Replication Jobs
• Reporting and Tracking of Replication Jobs
• Read and viewing performance: Content is served from a local server
28. For any system...
• Do not use the OOTB settings for the application server, database, Alfresco etc.; you must always tune for your use case
• Balance your resources
• Separate tiers for Database, Content, App Servers
• Indexes should always be on fast, local disk, e.g. not NFS mounts, USB drives etc.
• Run on a supported stack
• e.g. there are issues with JDK 1.6u10, so use JDK 1.6u20; use MySQL 5.1.39 or later
• Don't starve your database of connections:
• db.pool.max=XXX
• Use appropriate application server worker threads
• Configuration details are application server specific e.g. Tomcat: server.xml
• When clustering, use JGroups and Unicast
• Use the latest Alfresco version/service pack e.g.
• 3.3.3, 3.4
29. Things you should NOT change
• The database transaction isolation level
• Use defaults for all databases except MS SQLServer
• FYI. SQLServer should be:
• db.txn.isolation=4096
• ALTER DATABASE alfresco SET ALLOW_SNAPSHOT_ISOLATION ON;
• The ehcache default configuration i.e. Replicate async
• The Lucene indexing defaults, unless you know what you are doing and why!
• Note: Also do not do a full-index rebuild unless you know what was wrong in the first place!
• Use the index checker
33. Q/A & Feedback
• Any Questions?
• Share your experiences (good and bad) with us so we can all learn!
• Successful scaled up/out architectures
• Limitations, bottlenecks
• Use case parameters => Implementation => Results
• What worked, what didn't
Editor's notes
We won’t be going into details on how to setup clustering and the web tier
[Check with AH the background indexing stuff, i.e. is it indexing or extraction that exceeds 20 ms]
These are typical; specifics will obviously vary.
[Derek]
[Derek]
[PM – how does the custom loading fit into this??]