2. About me
Name: Makito Hashiyama (@capyogu)
Age: 29
Role: Team manager / In charge of APIs for Rakuten Ichiba
Likes: GlassFish, Tomcat, KVS (memcached, Coherence), GAE, Android…
E-mail: makito.hashiyama@mail.rakuten.com, hashiyaman@gmail.com
3. What is Rakuten?
E-commerce and Internet company based in Tokyo, Japan
B2B2C e-commerce platform
(Map legend: Head Office / Regional Headquarters, E-Commerce, eBook, Travel, Other services & businesses, Rakuten Institute of Technology, Development center)
4. How We Are Using GlassFish
Usage
One of the core API services for Rakuten Ichiba
Requires high availability
Environment
Used in the production environment
GlassFish version 3.1.2.2
(Diagram: Client side <-> SOAP/REST <-> Our API on GlassFish clusters <-> External APIs, with Oracle Coherence as the session store)
5. Benefits gleaned from GlassFish
Reference implementation of Java EE
In 2007, only GlassFish supported JAX-WS as standard
It let us evaluate new features earlier
Easy to manage at low cost
We need to manage a huge cluster without stopping it
Cost saving (WebLogic -> GlassFish)
6. What Worked with GlassFish
Community support (creating patches)
https://java.net/jira/browse/GLASSFISH-5200
(When jvmRoute is used, the JSESSIONID cookie is not secure even over HTTPS)
https://java.net/jira/browse/GRIZZLY-1333
(NetworkAddressValidator fails when passed property substitution values)
We have contributed patches to the GlassFish community.
7. Improvements to handle huge traffic
Rakuten Super Sale
Biggest online sale in Japan
Many doorbuster deals (they cause a huge amount of traffic)
Performance bottleneck
External APIs called by our API slowed down
We needed to improve the system at peak time
(Diagram: External APIs slow down -> Our API slows down -> delay on the client side)
8. Improvements to handle huge traffic
(Diagram: Client side request -> task queue in GlassFish -> five worker threads -> External APIs; every external call is delayed, and the GlassFish CPU load was high)
9. Improvements to handle huge traffic
(Diagram: Client side request -> task queue in GlassFish -> fewer (three) worker threads -> External APIs)
(1) According to vmstat, the ‘run queue’ was very high
(2) We decreased the number of worker threads to keep the ‘run queue’ low
(3) As a result, latency increased but throughput improved
10. Improvements to handle huge traffic
As a result…
Our API could process over 12,000 transactions/minute
The result showed the high reliability and availability of GlassFish
11. Resolving issues & aiming for a higher goal
Some issues
An instance went down because the task queue was full
Unknown exception in server.log
org.glassfish.flashlight.impl.client.ReflectiveClientInvoker
java.lang.reflect.InvocationTargetException
CAUSE: java.lang.NullPointerException
id=101
target=org.glassfish.web.admin.monitor.HttpServiceStatsProvider@193e1fc
method=public void org.glassfish.web.admin.monitor.HttpServiceStatsProvider.connectionAcceptedEvent(java.lang.String,int,java.lang.String)
paramNames=[listenerName, connection, address]
probeIndices=[0, 1, 2]
useProbeArgs=true
hasComputedParams=false
Aiming for a higher goal
Speaker notes
First, let me introduce myself. My name is Makito Hashiyama. I have been working at Rakuten for 4 years as a web application engineer. Now I’m in charge of APIs for Rakuten Ichiba. Before I explain Rakuten Ichiba, let me introduce Rakuten.
Rakuten is an e-commerce and Internet company based in Tokyo, Japan. In America, we provide a few services such as Rakuten.com (formerly buy.com) and Rakuten LinkShare. Rakuten Ichiba is the biggest online shopping site in Japan. "Ichiba" means "marketplace". Merchants can open and operate their own online stores on Rakuten Ichiba.
We use GlassFish to provide APIs for Rakuten Ichiba. We run GlassFish version 3.1.2.2 in 3 clusters. One cluster has over 40 instances, and the total number of instances is over 100. Let me explain our API in more detail. Our API behaves like a service bus based on SOA: it calls more than 15 external APIs, mashes up their results, and provides them as a service to the client side. In addition, our API is stateful. It manages session information on behalf of the client side. Our API creates a unique key, and the client side only has to call our API with that key. Our API stores the session information in Oracle Coherence as a key-value pair under that key. (Of course, we could use something other than Oracle Coherence, but we also use it for another purpose.)
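The stateful design described above (a server-generated key, session data kept in a key-value store, the client holding only the key) can be sketched in a few lines. This is a minimal Python illustration under our own assumptions: a plain dictionary stands in for the Oracle Coherence cache, and `SessionStore`, `create_session`, `get` and `update` are hypothetical names, not the team's code or the Coherence API.

```python
import uuid
from typing import Any, Dict, Optional


class SessionStore:
    """Hypothetical session facade. A local dict stands in for a
    distributed key-value cache such as Oracle Coherence."""

    def __init__(self) -> None:
        self._cache: Dict[str, Dict[str, Any]] = {}

    def create_session(self, initial: Dict[str, Any]) -> str:
        # Generate an opaque unique key; the client only ever holds this key.
        key = uuid.uuid4().hex
        self._cache[key] = dict(initial)
        return key

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Look up the session state for a key; None if unknown.
        return self._cache.get(key)

    def update(self, key: str, **changes: Any) -> None:
        # Merge new state into an existing session.
        session = self._cache.get(key)
        if session is None:
            raise KeyError(f"unknown session key: {key}")
        session.update(changes)
```

In use, the API calls `create_session` on the first request, returns the key to the client, and on later requests resolves the key back to the stored state; the client never sees the session contents.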
These are the reasons why we chose GlassFish. There are 2 reasons. One is that GlassFish is the reference implementation of Java EE. In 2007, we wanted to evaluate new features of JAX-WS, and GlassFish was the only server that supported it as standard. The other reason is that it is easy to manage at low cost. We manage a huge cluster to support our e-commerce service 24/7. At first, we used WebLogic Server to manage it, but we shifted from WebLogic to GlassFish to reduce cost. GlassFish also has a powerful management console for handling huge clusters.
As our work for the community, we have created several patches, as you can see. The first patch relates to a JSESSIONID bug in GlassFish version 2. The fix was applied in the next GlassFish version after we created the patch. We have contributed these patches to the GlassFish community.
Now let me share our experience optimizing GlassFish under huge traffic. A few times a year we hold a big online sale called “Rakuten Super Sale”. You can buy many doorbuster deals at half price, including cars, houses and so on. At the beginning of the sale, shoppers pour in to buy these bargains, which causes a huge amount of traffic. During the sale, some external APIs could not handle that much traffic and slowed down, and our API was affected by their slowdown. As the fundamental solution, we asked the owners of these APIs to keep to their SLAs. But since there was no time for that, we decided to improve our GlassFish system ourselves so that shoppers could buy comfortably.
To solve the performance issue, we needed to understand the internal structure of GlassFish. GlassFish puts each incoming request into a task queue, and worker threads take requests from the queue and process them one by one. We also found that the CPU load of the GlassFish servers was high, so we assumed that the high CPU load had something to do with the performance issue.
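The request flow described above (request -> task queue -> worker threads) can be modeled as a bounded queue drained by a fixed number of threads. This is a minimal Python sketch under our own assumptions, not GlassFish or Grizzly code: `WorkerPool`, `submit`, `workers` and `max_queue_size` are hypothetical names. Rejecting work when the queue is full mirrors the "task queue was full" error listed on the issues slide.

```python
import logging
import queue
import threading
from typing import Callable, List


class WorkerPool:
    """Toy model: requests wait in a bounded task queue, and a fixed number
    of worker threads take them off and run them one at a time."""

    def __init__(self, workers: int, max_queue_size: int) -> None:
        # maxsize > 0 bounds the queue; put_nowait raises queue.Full beyond it.
        self.tasks: "queue.Queue[Callable[[], None]]" = queue.Queue(maxsize=max_queue_size)
        self._threads: List[threading.Thread] = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def _run(self) -> None:
        # Each worker handles exactly one task at a time, forever.
        while True:
            task = self.tasks.get()
            try:
                task()
            except Exception:
                # A failing request must not kill the worker thread.
                logging.exception("task failed")
            finally:
                self.tasks.task_done()

    def submit(self, task: Callable[[], None]) -> bool:
        """Enqueue a request; return False (reject it) if the queue is full."""
        try:
            self.tasks.put_nowait(task)
            return True
        except queue.Full:
            return False
```

In this model, `workers` is the knob that corresponds to the worker-thread count: more workers means more requests in flight at once, and therefore more threads competing for the CPU.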
We kept a “vmstat” log to find out more about the CPU load. According to the log, the ‘run queue’ was very high when the delay occurred. When the external APIs were delayed, there seemed to be too many worker threads for the available CPU capacity. We decreased the number of worker threads to see if that would keep the ‘run queue’ low. As a result, the ‘run queue’ decreased and the CPU load also decreased. The number of simultaneous requests to the external APIs also decreased, and they responded much faster. Because we decreased the number of worker threads, latency increased a little; however, throughput improved. This enabled GlassFish to process requests more efficiently.
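`vmstat` reports the number of runnable processes (running or waiting for CPU) in its `r` column. A common rule of thumb, which is an assumption here and not a threshold from the talk, is that an `r` value persistently above the number of CPU cores means threads are queuing for CPU time. The minimal Python sketch below extracts that column from `vmstat` output and flags such samples; `run_queue_samples` and `saturated` are hypothetical helper names.

```python
from typing import List, Optional


def run_queue_samples(vmstat_output: str) -> List[int]:
    """Extract the 'r' (runnable processes) column from `vmstat <delay>` output."""
    samples: List[int] = []
    r_index: Optional[int] = None
    for line in vmstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "r" in fields and "b" in fields:
            # Column-header line (vmstat may repeat it); remember where 'r' is.
            r_index = fields.index("r")
            continue
        if r_index is not None and fields[0].isdigit():
            samples.append(int(fields[r_index]))
    return samples


def saturated(samples: List[int], cpu_count: int) -> List[int]:
    """Return the indices of samples whose run queue exceeds the CPU count."""
    return [i for i, r in enumerate(samples) if r > cpu_count]
```

For example, one could capture the output of `vmstat 5 12` (twelve samples at 5-second intervals) during a slowdown and pass `os.cpu_count()` as `cpu_count`; a long run of flagged samples is the signal that fewer worker threads might help.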
As a result of this performance tuning, our API improved drastically. We achieved over 12,000 transactions/minute at the peak of the sale, and over 20,000 transactions in a load test. These results showed the reliability and availability of GlassFish.
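For scale, the peak-time figure converts to a per-second rate with simple arithmetic (only the 12,000 transactions/minute figure is used, since that is the one with a stated time unit):

```latex
\frac{12{,}000\ \text{transactions}}{1\ \text{min}}
  = \frac{12{,}000\ \text{transactions}}{60\ \text{s}}
  = 200\ \text{transactions/s}
```

So "over 12,000 transactions/minute" means the API sustained more than 200 transactions per second at the peak.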
We still have some issues with GlassFish. First, a GlassFish instance has gone down suddenly. We got an error message saying that the task queue was full, even though we had set a sufficiently large maximum task queue size. We also monitor the number of tasks in the queue, and it stays very low. Second, server.log contains unknown exceptions, as you can see. GlassFish seems to work well, but we would like to know what these errors mean. If anyone has information about these issues, please let us know. We will keep aiming higher and optimize GlassFish even further.
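One general point about the monitoring described above: a monitor that periodically samples the current queue depth can miss a short burst that fills the queue between two samples. This is an assumption offered for illustration, not a diagnosis of the GlassFish issue. A minimal Python sketch of the alternative, recording the high-water mark at enqueue time, is shown below; `HighWaterQueue` is a hypothetical class, not part of GlassFish or Grizzly.

```python
import queue


class HighWaterQueue(queue.Queue):
    """queue.Queue that remembers the largest depth it ever reached.

    Recording the maximum at enqueue time catches short bursts that a
    periodic sample of qsize() can miss.
    """

    def __init__(self, maxsize: int = 0) -> None:
        super().__init__(maxsize)
        self.high_water = 0

    def _put(self, item: object) -> None:
        # Queue.put() calls _put() while holding the queue's internal lock,
        # so reading len(self.queue) here is consistent.
        super()._put(item)
        self.high_water = max(self.high_water, len(self.queue))
```

After a burst has drained, `qsize()` reports 0 again while `high_water` still shows how deep the queue got.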