Luc Gorissen
Previous employers:
- KPN Research
- CMG Wireless Data Solutions
- OraVision
- Oracle
Focus:
- BPM and SOA Suite
luc.gorissen@amis.nl
+31 6 3622 4226
@LucGorissen
No, no, no
LinkedIn
The Challenge
Starting Point:
Our ACM/BPM implementation successfully supports our core business processes
Requirement:
We need to be able to search through case/process data of the last 7 years
We need:
An ACM/BPM archive where we can
search through data
of cases/processes of up to 7 years old
The Technology
Company: Elastic
Product: Elasticsearch
Promise:
... a highly scalable open-source full-text search and analytics engine.
It allows you to store, search, and analyze big volumes of data
quickly and in near real time.
Can it be done?
Topics
1. Elastic Product Stack
2. Basic Concepts
3. Use Case
4. Use Case Data
5. Use Case Evaluation
6. Recommendation
Elastic Product Stack
... a highly scalable open-source full-text search and analytics engine.
It allows you to store, search, and analyze big volumes of data
quickly and in near real time.
Features
• Full-Text Search
• Document-Oriented
• Near-Real-Time
• Horizontally Scalable
• Multi Tenant
• Schema-Free
• REST/JSON API
• Open Source – Apache 2 license
• On top of Apache Lucene
Elastic Product Stack
Product         Description
Elasticsearch   Search engine
Elastic Cloud   Elasticsearch cloud offering
Logstash        Data collection engine
Kibana          Analytics and visualization platform
Beats           Collect and ship data (network, infra, file, winlog)
Shield          Protect access to your data
Watcher         Alerts/notifications from changes in your data
Marvel          Monitor your Elasticsearch cluster
Elastic Product Stack
Maturity
• Complete product stack
• Cloud offering
• Modern technology around solid Apache Lucene core (1999)
• Clients: Ruby, Python, PHP, Perl, .NET, Java, JavaScript, etc.
• Apache Lucene release 6.0.1, May 27, 2016
• Elasticsearch release 2.3.3, May 18, 2016
• Oracle plans to replace Secure Enterprise Search with Elasticsearch in WebCenter products (OOW 2015)
• Support / community group / meet-ups / training
Basic concepts
Supports: availability, scalability, distribution
- Cluster
- Document (JSON)
- Index ABC: Shard 1, Shard 2
- Index ABC: Replica Shard 1, Replica Shard 2
- Distribute over nodes
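The shard and replica layout above maps onto two index settings. A minimal sketch, assuming an index named abc (index names must be lowercase) and a node on the default localhost:9200; the helper function names are illustrative only. The curl call is left commented out so the script also runs without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: index name "abc" and the helper names are assumptions.
ES_URL="${ES_URL:-localhost:9200}"

# JSON body for index creation: $1 = primary shards, $2 = replicas per primary.
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}' "$1" "$2"
}

# Shard copies the cluster spreads over its nodes: primaries * (1 + replicas).
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

BODY="$(index_settings 2 1)"
echo "$BODY"
echo "shard copies to distribute: $(total_shard_copies 2 1)"

# Against a running node, the index would be created with:
# curl -XPUT "$ES_URL/abc?pretty" -d "$BODY"
```

On a single-node development set-up the replica shards stay unassigned, so the cluster health is reported as yellow.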
Installation development set-up
Installation of Elasticsearch:
[developer@localhost bin]$ tar -xvf elasticsearch-2.3.2.tar.gz
[developer@localhost bin]$ pwd
/home/developer/elasticsearch/elasticsearch-2.3.2/bin
[developer@localhost bin]$ ./elasticsearch
Installation of Kibana (‘Analytics and visualization platform’):
[developer@localhost kibana]$ tar -xvf kibana-4.5.1-linux-x64.tar.gz
[developer@localhost config]$ vi kibana.yml
[developer@localhost bin]$ pwd
/home/developer/kibana/kibana-4.5.1-linux-x64/bin
[developer@localhost bin]$ ./kibana
Use Case: tweets AMISnl
Screen all tweets of AMISnl to see if action is required for the conference.
Case: TwitterSupport (input: tweets of AMISnl)
Activities:
- ScreenTweet (Office Management)
- CtoScreening (CTO)
- TweeterContacted (telemarketer)
- MarketingScreening (marketing)
Use case
Tweets:
733666488083750912
2016-05-20 14:31:36
RT @robbrecht: Orcas - Automatic deployment for the database https://t.co/4U6QSuROjf
@amisnl @OC_WIRE
733652455523811328
2016-05-20 13:35:50
RT @sai_penumuru: Learn something new from my session. #AMIS25 @oracleotn @oracleace
https://t.co/1gBagwgotD
733652388272312322
2016-05-20 13:35:34
RT @sai_penumuru: Join me on 2nd-3rd June 2016 for BEYOND THE HORIZON conference in
Netherlands. #AMIS25 @oracleace @oracleotn https://t.co…
733621946290671620
2016-05-20 11:34:36
NEWSFLASH! The official #AMIS25 app is now available. Search for 'AMIS 25' in your app
store and enjoy! https://t.co/iYOEGG6l90
In total: 3212 tweets
Use Case result:
data in JSON format
Transform to JSON
<caseActivityDefinition>
<applicationName>default</applicationName>
<completedDate>2016-05-19T06:29:13.910+02:00</completedDate>
<componentName>TwitterSupport</componentName>
<compositeDn>default/TwitterSupport!1.0*soa_33331876-7da2-4ba6-b28d-fec89397281e</compositeDn>
<compositeName>TwitterSupport</compositeName>
<compositeVersion>1.0</compositeVersion>
<definitionId>default/TwitterSupport!1.0/CtoScreeningProcess</definitionId>
<displayName>CtoScreeningProcess</displayName>
{
"caseActivityDefinition": {
"caseId": "100036",
"completedDate": "2016-05-23T09:39:03.111+02:00",
"definitionId": "default/TwitterSupport!1.0/CtoScreeningProcess",
"displayName": "CtoScreeningProcess",
"instanceId": "116187",
"name": "CtoScreeningProcess",
"nameSpace": "http://xmlns.amis.nl/TwitterSupport/CtoScreeningProcess",
"startDate": "2016-05-23T09:19:08.111+02:00"
}
}
Retrieve data from the ACM system with the
platform API. Retrieved data:
• CaseActivities
• CaseMileStones
• Comments
• CaseData
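The "Transform to JSON" step above can be sketched in a few lines of shell. This is not the transformation used for the slides, only an illustration: it assumes one flat `<name>value</name>` element per line, does no JSON escaping, and keeps every leaf element rather than a selected subset. The function name `xml_leaves_to_json` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: handles flat <name>value</name> lines; values are assumed to
# contain no double quotes or backslashes (no JSON escaping is done).
xml_leaves_to_json() {
  local root="$1" pairs
  # Turn each leaf element into "name":"value", then join the pairs with commas.
  pairs="$(sed -n 's|^ *<\([A-Za-z]*\)>\([^<]*\)</\1> *$|"\1":"\2"|p' | paste -sd, -)"
  printf '{"%s":{%s}}\n' "$root" "$pairs"
}

# The wrapper element lines are skipped; only the leaf elements are converted.
xml_leaves_to_json caseActivityDefinition <<'EOF'
<caseActivityDefinition>
<compositeVersion>1.0</compositeVersion>
<displayName>CtoScreeningProcess</displayName>
</caseActivityDefinition>
EOF
# prints {"caseActivityDefinition":{"compositeVersion":"1.0","displayName":"CtoScreeningProcess"}}
```

For real case data a proper XML parser is the safer choice; the sed approach breaks on nested elements, attributes, and multi-line values.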
Insert data into Elasticsearch
Insert milestone data into the Elasticsearch archive:
curl -XPUT 'localhost:9200/casemilestones/external/1?pretty' -d '
{
"caseMilestone": {
"caseId": "103242",
"state": "ATTAINED",
"name": "TweetScreenedMilestone",
"updatedDate": "2016-05-25T10:27:34.111+02:00"
}
}
'
In the URL, casemilestones is the index, external the type, and 1 the document ID. The request body is the milestone data in JSON.
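Once a milestone document is indexed, it can be found again by its case ID. A minimal sketch of a request body for the `_search` endpoint, using a match query on the `caseMilestone.caseId` field of the document above. The helper name `milestone_query` is illustrative, and the curl call is commented out so the script runs without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: helper name is an assumption; host defaults to localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Query DSL body: match documents whose caseMilestone.caseId equals $1.
milestone_query() {
  printf '{"query":{"match":{"caseMilestone.caseId":"%s"}}}' "$1"
}

QUERY="$(milestone_query 103242)"
echo "$QUERY"

# Against a running node:
# curl -XGET "$ES_URL/casemilestones/_search?pretty" -d "$QUERY"
```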
Results use case:
data into ElasticSearch
Totals - start:
[developer@localhost elasticsearch-2.3.2]$ curl 'localhost:9200/_cat/indices?v'
health status index          pri rep docs.count docs.deleted store.size pri.store.size
yellow open   caseactivities   5   1          0            0       650b           650b
yellow open   casemilestones   5   1          0            0       260b           260b
yellow open   casecomments     5   1          0            0       650b           650b
yellow open   casedata         5   1          0            0       650b           650b
Totals - end:
[developer@localhost elasticsearch-2.3.2]$ curl 'localhost:9200/_cat/indices?v'
health status index          pri rep docs.count docs.deleted store.size pri.store.size
yellow open   caseactivities   5   1       3693            0    929.4kb        929.4kb
yellow open   casemilestones   5   1      16060            0      1.5mb          1.5mb
yellow open   casecomments     5   1       7207            0    685.1kb        685.1kb
yellow open   casedata         5   1      16060            0      2.2mb          2.2mb
Timing:
# documents: 43020
Upload time: 9:57 min
Upload speed: ~72 docs / sec
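The timing figures can be checked against the document counts in the 'Totals - end' listing. A small worked calculation, using only numbers from the slides:

```shell
#!/usr/bin/env bash
# Document counts per index, taken from the 'Totals - end' listing.
docs=$(( 3693 + 16060 + 7207 + 16060 ))
# Upload time of 9:57 min expressed in seconds.
secs=$(( 9 * 60 + 57 ))
# awk handles the fractional division; LC_ALL=C keeps the decimal point.
rate="$(LC_ALL=C awk -v d="$docs" -v s="$secs" 'BEGIN { printf "%.1f", d / s }')"
echo "$docs documents in $secs s = $rate docs/sec"
# prints 43020 documents in 597 s = 72.1 docs/sec
```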
Office Documents
Especially for case management,
‘Office Documents’ are important.
Installation of plugin for indexing Office and PDF docs (Apache Tika):
[developer@localhost bin]$ pwd
/home/developer/elasticsearch/elasticsearch-2.3.2/bin
[developer@localhost bin]$ ./plugin install mapper-attachments
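With the plugin installed, a field can be mapped as type `attachment` and filled with the base64-encoded file content. A minimal sketch under stated assumptions: the index name casedocuments, the type document, the field name file, and the file report.pdf are all hypothetical, and the curl calls are commented out so the script runs without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: index "casedocuments", type "document" and field "file" are assumptions.
ES_URL="${ES_URL:-localhost:9200}"

# Mapping that marks field "file" as an attachment; $1 = mapping type name.
attachment_mapping() {
  printf '{"mappings":{"%s":{"properties":{"file":{"type":"attachment"}}}}}' "$1"
}

# Document body carrying the file content ($1 = path) as base64 on one line.
attachment_doc() {
  printf '{"file":"%s"}' "$(base64 < "$1" | tr -d '\n')"
}

# Demonstrate both helpers on a small temporary file.
sample="$(mktemp)"
printf 'hello' > "$sample"
attachment_mapping document; echo
attachment_doc "$sample"; echo
rm -f "$sample"

# Against a running node with the plugin installed (report.pdf is hypothetical):
# curl -XPUT "$ES_URL/casedocuments" -d "$(attachment_mapping document)"
# curl -XPUT "$ES_URL/casedocuments/document/1" -d "$(attachment_doc report.pdf)"
```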
‘Office Documents’
Supported document formats:
• HyperText Markup Language
• XML and derived formats
• Microsoft Office document formats
• OpenDocument Format
• Portable Document Format
• Electronic Publication Format
• Rich Text Format
• Compression and packaging formats
• Text formats
• Audio formats
• Image formats
• Video formats
• Java class files and archives
• The mbox format
Use Case Results
• Mature, enterprise-grade product
• Easy search, even ‘Office Documents’
• Basic analysis, more investigation required
• Carefully determine what info to put into Elasticsearch
– Audit trail? TaskQueryService? Other info?
• It is schema-free: easy transitions between Oracle releases
• You will find the caseIdentifier and anything related to the caseIdentifier
• Not an easy overview of case history
Recommendation
Back to ‘the challenge’:
An ACM/BPM archive where we can
search through data
of cases/processes of up to 7 years old
Aspects:
- TCO: License Costs
- TCO: Yet another technology
- DB versus Elasticsearch:
  - Schema-less JSON data store
  - No transactions
  - Near-real-time
- Document Management System / doc types
- Logstash jdbc plugin