Building a highly scalable website requires understanding the core building blocks of your application environment. In this talk we dive into Jahia's core components to understand how they interact and how, by (1) respecting a few architectural practices and (2) fine-tuning Jahia components and the JVM, you will be able to build a highly scalable service.
2. Summary
Ecommerce is what we do
How we came to Jahia?
The performance bottlenecks
How we made our servers busy sleeping
How Jahia treats its users equally (well)
A (Jack)Rabbit can’t be slow unless you want it to.
Assumptions
Publication nodes are for publishing only
My promise
Won’t talk about anything already in the docs.
3. Ecommerce is what we do
The easy part
Metropolitan Transportation Pass
The challenge
200,000 passes / hour
55 passes / second
250 hits / second
Selfcare services
The challenge
30,000 users / minute
The hard part
Now do it with a CMS
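A quick back-of-the-envelope check shows the figures above are consistent with each other, assuming traffic is spread evenly over the hour (real peaks will be burstier) and that the 250 hits/second all come from pass sales:

```latex
\[
\frac{200\,000\ \text{passes}}{3\,600\ \text{s}} \approx 55.6\ \text{passes/s},
\qquad
\frac{250\ \text{hits/s}}{200\,000 / 3\,600\ \text{passes/s}} = 4.5\ \text{hits per pass},
\qquad
\frac{30\,000\ \text{users}}{60\ \text{s}} = 500\ \text{users/s}
\]
```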
6. How we made our servers busy sleeping
[Diagram: several browsers served by Jahia1 and Jahia2, which share the same Data]
7. Jahia Caching (1/3)
[Diagram: Jahia1 and Jahia2, each with its own EHCACHE, in front of the shared Data]
DiskStore policy, which is 20 times slower than the MemoryStore
Writing is slow
Writing to disk uses ObjectOutputStream, which is 18 times slower than a byte copy
Eviction is costly
Not only does EhCache write data to disk, it also needs to remove it from the memory store
Reads from disk require deserialization + disk reads, and both are synchronous
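To make the two paths concrete, here is a minimal Java sketch (JDK only, not EhCache code) contrasting what a disk-backed store has to do, serialize with ObjectOutputStream on write and deserialize on read, with the plain byte copy a memory store can rely on. The class and method names are hypothetical, the actual disk I/O is left out, and the timing in `main` is a naive measurement meant to show the trend, not to reproduce the 18x figure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration (not EhCache internals): two ways a cache can hold a value.
class StoreCostSketch {

    // DiskStore-style write path: object graph -> bytes through ObjectOutputStream.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // DiskStore-style read path: bytes -> object, executed synchronously by the caller.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Memory-store-style path: a raw byte copy, no object-graph traversal.
    static byte[] copy(byte[] data) {
        return Arrays.copyOf(data, data.length);
    }

    public static void main(String[] args) {
        String fragment = "<div>rendered fragment</div>".repeat(1_000);
        byte[] raw = fragment.getBytes(StandardCharsets.UTF_8);
        int rounds = 200;

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            deserialize(serialize(fragment));
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            copy(raw);
        }
        long t2 = System.nanoTime();

        // Naive timing: no JIT warm-up and no JMH, so read it as a trend, not a benchmark.
        System.out.printf("serialize+deserialize: %d us, byte copy: %d us%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000);
    }
}
```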
8. Jahia Caching (2/3)
If you have to use the DiskStore policy, then:
Use SSD drives
Deactivate the disk I/O scheduler (set the elevator to noop)
Limit disk access by mounting with noatime
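These settings live at the OS level (for example `/sys/block/<device>/queue/scheduler` and the mount options in `/etc/fstab`). As a hedged illustration in Java, the hypothetical helpers below parse the two text formats a deployment check could read back: the sysfs scheduler file, where the active scheduler is shown in brackets, and a `/proc/mounts` line, whose fourth field holds the comma-separated mount options. On recent multi-queue (blk-mq) kernels the noop scheduler is called `none`. The mount point used in `main` is an assumption.

```java
import java.util.Arrays;

// Hypothetical helpers that parse the two Linux text formats involved in the tuning above.
class DiskTuningCheck {

    // /sys/block/<dev>/queue/scheduler lists the available schedulers and
    // shows the active one in brackets, e.g. "noop deadline [cfq]".
    static String activeScheduler(String schedulerFileContent) {
        for (String token : schedulerFileContent.trim().split("\\s+")) {
            if (token.startsWith("[") && token.endsWith("]")) {
                return token.substring(1, token.length() - 1);
            }
        }
        return null; // no bracketed entry: format not recognised
    }

    // A /proc/mounts line is "device mountpoint fstype options dump pass";
    // options are comma-separated, e.g. "rw,noatime,data=ordered".
    static boolean mountedWithNoatime(String procMountsLine, String mountPoint) {
        String[] fields = procMountsLine.trim().split("\\s+");
        if (fields.length < 4 || !fields[1].equals(mountPoint)) {
            return false;
        }
        return Arrays.asList(fields[3].split(",")).contains("noatime");
    }

    public static void main(String[] args) {
        // Sample strings; on a real host the files would be read with java.nio.file.Files.
        System.out.println(activeScheduler("[noop] deadline cfq"));   // noop
        System.out.println(mountedWithNoatime(
                "/dev/sdb1 /var/jahia/cache ext4 rw,noatime,data=ordered 0 0",
                "/var/jahia/cache"));                                 // true
    }
}
```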
9. Jahia Caching (3/3)
MemoryStore Policy
Memory is faster than disk
Byte copy is faster than object serialization
Requires a larger JVM heap, and the GC does not like it
Memory is subject to GC
EhCache (commercial version) can use off-heap memory
Not subject to GC
Allocated outside the JVM heap
Managed by EhCache externally
No perfect solution?
The truth is elsewhere
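The off-heap idea can be illustrated with the JDK alone. The sketch below is a hypothetical, append-only, single-threaded store, not EhCache's commercial off-heap implementation: values are kept in a direct `ByteBuffer`, whose memory is allocated outside the Java heap, so the garbage collector does not move or copy those bytes; only the small key index stays on the heap. Reads are byte copies, with no deserialization. Eviction and compaction are deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical append-only off-heap store (not EhCache's implementation), single-threaded.
class OffHeapStore {
    // Native memory outside the Java heap: the GC does not move or copy these bytes.
    private final ByteBuffer arena;
    // On-heap index: key -> {offset, length} inside the arena.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStore(int capacityBytes) {
        this.arena = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Appends the bytes; returns false when the arena is full (a real store would evict).
    // Overwriting a key leaves the old bytes in place: there is no compaction here.
    boolean put(String key, byte[] value) {
        if (arena.remaining() < value.length) {
            return false;
        }
        int offset = arena.position();
        arena.put(value);
        index.put(key, new int[] {offset, value.length});
        return true;
    }

    // Reads are a byte copy back onto the heap, with no deserialization step.
    byte[] get(String key) {
        int[] slot = index.get(key);
        if (slot == null) {
            return null;
        }
        byte[] out = new byte[slot[1]];
        ByteBuffer view = arena.duplicate(); // same memory, independent position
        view.position(slot[0]);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1 << 20); // 1 MiB outside the heap
        store.put("home", "<html>home page</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("home"), StandardCharsets.UTF_8));
    }
}
```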
12. The problem we wanted to solve
What we wanted
Hot deployment
Zero downtime
Continuous deployment
No human intervention
What we got
Almost linear scalability
13. Green / Blue deployment
Zzzzz
[Diagram: a Green Line and a Blue Line, each running App1, App2 and App3]
Install Blue Line
Open Blue Gates
Close Green Gates
Wait for sessions to die
Install Green Line
And what if we did not have any sessions?
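The five steps above can be modelled as a small state machine. The Java sketch below is hypothetical (the class, the gate flags and the session counter are illustrative, not a load-balancer API): a line may only be reinstalled once its gate is closed and its last sticky session has ended, which is exactly the "wait for sessions to die" step.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model of the green/blue switch: one gate and one session count per line.
class BlueGreenSwitch {
    enum Line { GREEN, BLUE }

    private final Map<Line, Boolean> gateOpen = new EnumMap<>(Line.class);
    private final Map<Line, Integer> sessions = new EnumMap<>(Line.class);
    private final Map<Line, String> versions = new EnumMap<>(Line.class);

    BlueGreenSwitch(String initialVersion) {
        for (Line line : Line.values()) {
            gateOpen.put(line, line == Line.GREEN); // green serves traffic first
            sessions.put(line, 0);
            versions.put(line, initialVersion);
        }
    }

    // A line is drained once its gate is closed and its last sticky session has ended.
    boolean drained(Line line) {
        return !gateOpen.get(line) && sessions.get(line) == 0;
    }

    void install(Line line, String newVersion) {
        if (!drained(line)) {
            throw new IllegalStateException(line + " is not drained yet");
        }
        versions.put(line, newVersion);
    }

    void openGate(Line line)  { gateOpen.put(line, true); }
    void closeGate(Line line) { gateOpen.put(line, false); }

    // New sessions only start on an open line, then stick to it until they end.
    void startSession(Line line) {
        if (!gateOpen.get(line)) {
            throw new IllegalStateException(line + " gate is closed");
        }
        sessions.merge(line, 1, Integer::sum);
    }

    void endSession(Line line) {
        if (sessions.get(line) == 0) {
            throw new IllegalStateException("no session on " + line);
        }
        sessions.merge(line, -1, Integer::sum);
    }

    String version(Line line) { return versions.get(line); }

    public static void main(String[] args) {
        BlueGreenSwitch lb = new BlueGreenSwitch("v1");
        lb.startSession(Line.GREEN);               // a user is browsing on green
        lb.install(Line.BLUE, "v2");               // install blue line
        lb.openGate(Line.BLUE);                    // open blue gates
        lb.closeGate(Line.GREEN);                  // close green gates
        System.out.println(lb.drained(Line.GREEN)); // false: wait for sessions to die
        lb.endSession(Line.GREEN);
        lb.install(Line.GREEN, "v2");              // install green line
        System.out.println(lb.version(Line.GREEN)); // v2
    }
}
```

With stateless servers the session counter stays at zero, so a line is drained the moment its gate closes and the waiting step disappears.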
14. What we did: Jahia stateless
[Diagram: browsers behind a Load Balancer, with sessions held on Jahia1 and Jahia2]
Round robin does not guarantee equal session lifetime
80 lines of Java code inside a Jahia Filter
[Diagram: Load Balancer in front of Jahia1 and Jahia2, with the session carried by the browser]
Cookie-based session
Jahia stays stateless
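The filter itself is not shown, so the following is only a hypothetical Java sketch of the underlying idea, using the JDK's `javax.crypto.Mac`: the session payload travels in the cookie, signed with HMAC-SHA256 under a secret assumed to be shared by every Jahia node, so whichever node the load balancer picks can verify and read it without server-side session state. The cookie format (base64url payload, a dot, base64url signature) is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signed-cookie session codec (not the authors' Jahia filter).
class SignedCookieSession {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();
    private final byte[] secret; // assumed identical on every Jahia node

    SignedCookieSession(byte[] sharedSecret) {
        this.secret = sharedSecret.clone();
    }

    private byte[] hmac(byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cookie value (assumed format): base64url(payload) + "." + base64url(signature).
    String encode(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ENC.encodeToString(body) + "." + ENC.encodeToString(hmac(body));
    }

    // Returns the payload, or null if the cookie is malformed or has been tampered with.
    String decode(String cookieValue) {
        int dot = cookieValue.indexOf('.');
        if (dot < 0) {
            return null;
        }
        try {
            byte[] body = DEC.decode(cookieValue.substring(0, dot));
            byte[] sig = DEC.decode(cookieValue.substring(dot + 1));
            // MessageDigest.isEqual compares without short-circuiting on the first mismatch.
            if (!MessageDigest.isEqual(hmac(body), sig)) {
                return null;
            }
            return new String(body, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) { // invalid base64
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] shared = "change-me".getBytes(StandardCharsets.UTF_8); // assumed shared secret
        SignedCookieSession jahia1 = new SignedCookieSession(shared);
        SignedCookieSession jahia2 = new SignedCookieSession(shared);
        String cookie = jahia1.encode("cart=42;lang=fr"); // written by one node...
        System.out.println(jahia2.decode(cookie));       // ...read by the other: cart=42;lang=fr
    }
}
```

Signing protects integrity, not confidentiality: the client can read the payload, so sensitive data would also need encryption, and a real cookie should carry an expiry. Browsers typically cap a cookie at about 4 KB, which keeps such sessions small.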
17. JackRabbit versus RDBMS versus File System
[Diagram: features of a Content Repository compared with those of an RDBMS (SGBDR) and a File System]
Integrity
Hierarchy
Structure
CRUD
Queries
Transactions
Notifications
Versioning
ACL
Full Text Search
18. Performance tuning
Indexing is done asynchronously
Data is available after a short delay
Disabling it
If you don’t use search, comment out the SearchIndex element in workspace.xml
Avoid extra useless searches
For each node returned by Lucene, JackRabbit checks the ACE for that node
Set the property resultFetchSize accordingly
Defaults to 100
Size the Lucene-to-JackRabbit ID cache correctly
Set the property cacheSize to maximize cache hits
Cache statistics are available in RepositoryStatistics
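The reasoning behind resultFetchSize can be sketched with a hypothetical Java model (not Jackrabbit code): hits are pulled from the index in batches of `fetchSize`, and every fetched node goes through an access check before it can be returned. When a page only shows the first 10 results, the batch size, not the total number of hits, bounds the number of checks. The method name and the `IntPredicate` standing in for the ACE check are assumptions; cacheSize is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model (not Jackrabbit code) of lazy result fetching with per-node access checks.
class FetchSizeSketch {

    // Returns the first `wanted` readable hits. Hits are fetched in batches of `fetchSize`;
    // every fetched hit is access-checked, whether or not it ends up being displayed.
    static List<Integer> firstReadable(int totalHits, int fetchSize, int wanted,
                                       IntPredicate canRead) {
        List<Integer> page = new ArrayList<>();
        int next = 0;
        while (page.size() < wanted && next < totalHits) {
            int end = (int) Math.min((long) next + fetchSize, totalHits); // long math: no overflow
            for (int id = next; id < end; id++) {
                // canRead is evaluated first, so every fetched node pays the check.
                if (canRead.test(id) && page.size() < wanted) {
                    page.add(id);
                }
            }
            next = end;
        }
        return page;
    }

    public static void main(String[] args) {
        for (int fetchSize : new int[] {100, Integer.MAX_VALUE}) {
            int[] aclChecks = {0};
            List<Integer> page = firstReadable(5_000, fetchSize, 10, id -> {
                aclChecks[0]++; // stands in for the ACE check on each node
                return true;
            });
            System.out.println("fetchSize=" + fetchSize + ": " + page.size()
                    + " results, " + aclChecks[0] + " access checks");
        }
    }
}
```

With 5,000 hits and 10 results displayed, `main` reports 100 access checks for a fetch size of 100 and 5,000 when everything is fetched at once.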
19. Performance tuning
Avoid multiple I/Os on your DB
Set minRecordLength so that a single read/write does not exceed the DB unit of I/O
The unit of I/O defaults to 8 KB for PostgreSQL, Oracle and SQL Server
Avoid too many references to a single node
JackRabbit is node oriented
When links between nodes matter, we use a graph database
Tuple Visit(Page, User)
Tuple Like(Page, User)
For large node sets
Add extra levels
Paths quickly reduce the search domain
Speed up writes
Use usual transaction patterns
Write asynchronously whenever possible
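"Add extra levels" can be sketched as follows: rather than placing a very large set of nodes directly under one parent, a hypothetical Java helper derives two intermediate levels from a hash of the node name (MD5 here, an arbitrary choice used only for spreading), giving 256 x 256 buckets. With one million nodes that is roughly 15 children per bucket, and a lookup by name goes straight to its bucket. The root path and name are illustrative, and names are assumed to be valid JCR node names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: spreads a large flat set of nodes over two extra path levels.
class ShardedPath {

    static String pathFor(String root, String name) {
        byte[] digest;
        try {
            // MD5 is only used to spread names evenly, not for security.
            digest = MessageDigest.getInstance("MD5").digest(name.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide MD5
        }
        // Two levels of two hex digits each: 256 x 256 = 65,536 buckets.
        String level1 = String.format("%02x", digest[0] & 0xff);
        String level2 = String.format("%02x", digest[1] & 0xff);
        return root + "/" + level1 + "/" + level2 + "/" + name;
    }

    public static void main(String[] args) {
        // Same name, same path: a lookup goes straight to its bucket without scanning siblings.
        System.out.println(pathFor("/users", "alice"));
    }
}
```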