LINE is a messaging service with 200+ million active users. I will introduce why we feed 100+ billion daily messages into Kafka and how various systems, such as data sync, abuse detection, and analysis, depend on and leverage it. I will also introduce how we use dynamic tracing tools like SystemTap to inspect the broker's performance on a production system, which led me to fix KAFKA-4614.
Presented by Yuto Kawamura, LINE Corporation
Systems Track
4. LINE
— Messaging service
— 169 million active users¹ in countries with top market share like Japan, Taiwan and Thailand
— Many family services
  — News
  — Music
  — LIVE (Video streaming)

¹ As of June 2017. Sum of 4 countries: Japan, Taiwan, Thailand and Indonesia.
8. Cluster Scale
— 150+ billion messages /day
— 40+ TB incoming data /day
— 3.5+ million messages /sec at peak times
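(For reference, 150 billion messages per day works out to roughly 1.7 million messages per second on average, so the 3.5M+/sec peak is about twice the daily average.)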
9. Broker Servers
— CPU: Intel(R) Xeon(R) 2.20GHz x 40
— Memory: 256GiB
  — more memory, more caching (page cache)
— Network: 10Gbps
— Disk: HDD x 12, RAID 1+0
  — saves maintenance costs
— Number of servers: 30
10. New challenges - Being part of the infrastructure
— Higher traffic
— Multi-tenancy
— Requirement for delivery latency
— As a communication path with bot systems
— Much faster threat detection
12. KAFKA-4614 - Long GC pause harming broker performance, caused by mmap objects created for OffsetIndex
— Highlighted as a great improvement in Log Compaction's February 2017 edition²
— https://issues.apache.org/jira/browse/KAFKA-4614

² https://www.confluent.io/blog/log-compaction-highlights-in-the-apache-kafka-and-stream-processing-community-february-2017/
13. One day, we found response times of Produce requests getting unstable...
14. Looking into detailed system metrics...
— Found that a small amount of disk read was occurring during response time spikes.
— Interesting, because all our consumers are supposed to be caught up with the latest offset => all fetch requests should be served from the page cache.
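One way to sanity-check that assumption is Kafka's bundled consumer-group tool; a sketch below, where the broker address and group name are placeholders (older 0.10.x releases may also need the --new-consumer flag):

$ kafka-consumer-groups.sh --bootstrap-server BROKER:9092 \
    --describe --group CONSUMER_GROUP
# A LAG of ~0 on every partition means consumers fetch from the log tail,
# which should still be resident in the page cache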
15. Who is reading disk, and for what?
Tried reading the code, kept observing logs, periodically took jstack and jvisualvm snapshots... but no luck
16. Paradigm shift: Observing a lower layer - SystemTap
— A kernel-level dynamic tracing tool and scripting language
— Safe to run in production because of its low overhead
— If we ran strace or perf on production servers... !
17. Simple example: Counting syscalls:
$ stap -x PID -e '
global cnt
probe syscall.* {
  cnt[name] += 1
}
probe end {
  foreach (k in cnt)
    printf("%s called %d times\n", k, cnt[k])
}
'
^Cfcntl called 19 times
read called 3333 times
pselect6 called 1 times
sendto called 4929 times
...
24. Who's mmap?
[kafka]$ git grep mmap ...
class OffsetIndex ... {
  ...
  private[this] var mmap: MappedByteBuffer = {
    val newlyCreated = _file.createNewFile()
    val raf = new RandomAccessFile(_file, "rw")
    ...
    val idx = raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, len)
25. Who is calling munmap?
Thu Dec 22 17:21:27 2016,093: tid = 126123, device = sdb1, inode = -1, size = 4096
...
tid = 126123
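A probe along the following lines can produce a trace like the one above; this is a sketch rather than the exact script from the talk. It logs every block I/O request issued from the Kafka process with its thread id (ino is -1 for requests with no associated file inode, e.g. filesystem metadata reads):

$ stap -x KAFKA_PID -e '
probe ioblock.request {
  if (pid() != target()) next  # only requests issued by the target process
  printf("%s: tid = %d, device = %s, inode = %d, size = %d\n",
         ctime(gettimeofday_s()), tid(), devname, ino, size)
}
'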
26. Finding munmap caller
jstack and grep the thread id³
# hex(126123) = 0x1ecab
jstack KAFKA_PID | grep nid=0x1ecab
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007ff278d0c800 nid=0x1ecab in Object.wait() [0x00007ff17da11000]
.... a GC-related thread?

³ The nid=0xXXXX entry in jstack output tells the "n"ative thread id
27. Visiting Javadoc of MappedByteBuffer
https://docs.oracle.com/javase/8/docs/api/java/nio/MappedByteBuffer.html
> A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected.
28. How does munmap cause disk read?
The log cleaner thread deletes a log segment whose retention period has expired.
29. How does munmap cause disk read?
OffsetIndex calls File.delete() on its index file, but the file physically remains, as the living mmap still holds an open reference to it.
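This "deleted but still mapped" state can be spotted with standard tooling, for example with lsof, where the pid and path below are placeholders and irrelevant columns are elided:

$ lsof -p KAFKA_PID | grep deleted
java ... mem REG ... /data/kafka-logs/topic-0/00000000000000000000.index (deleted)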
30. How does munmap cause disk read?
The MappedByteBuffer becomes garbage, but it might not be collected by GC for a while, as it is placed in a region which still has many living objects.
31. How does munmap cause disk read?
While the MappedByteBuffer survives several GC attempts and several hours elapse, the entry which holds the meta info of the index file is evicted from the buffer cache.
32. How does munmap cause disk read?
Finally, GC collects the region which holds the disposed MappedByteBuffer and calls munmap(2) through the MappedByteBuffer's cleaner.
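To confirm which thread actually issues munmap(2), a SystemTap probe like this sketch (not the original script from the talk) logs each call together with its thread id, which can then be matched against jstack output as on slide 26:

$ stap -x KAFKA_PID -e '
probe syscall.munmap {
  if (pid() != target()) next  # only munmap calls from the target process
  printf("%s: tid = %d, length = %d\n",
         ctime(gettimeofday_s()), tid(), length)
}
'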
33. How does munmap cause disk read?
The kernel realizes that the final reference to the file has been destroyed and attempts to perform physical deletion of the index file.
34. How does munmap cause disk read?
The XFS driver attempts to look up the inode entry for the index file in its cache, but can't find it => reads it from disk.
35. Confirming...
grep 'Total time for which' kafkaServer-gc.log | # one-liner for summing up GC time
2017-01-11T01:43 = 317.8821
2017-01-11T01:44 = 302.1132
2017-01-11T01:45 = 950.5807 # << !!!
2017-01-11T01:46 = 344.9449
2017-01-11T01:47 = 328.936
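The aggregation elided from the one-liner above could look like the following sketch, which assumes the JDK 8 log format shown in the tip below (field 11 is the stopped time in seconds) and sums STW time per minute in milliseconds:

grep 'Total time for which' kafkaServer-gc.log |
  awk '{ ms[substr($1, 1, 16)] += $11 * 1000 }
       END { for (m in ms) printf "%s = %.4f\n", m, ms[m] }' |
  sort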
Tip: You can enable the very useful "STW duration" logging with the option:
-XX:+PrintGCApplicationStoppedTime
...
2017-08-03T20:15:27.413+0900: 12109287.163: Total time for which application threads were stopped: 0.0186989 seconds, Stopping threads took: 0.0000489 seconds
39. Conclusion
— LINE uses Kafka as part of its fundamental microservice infrastructure, and its usage is increasing weekly
— Introduced advanced techniques to achieve deeper observability into Kafka
— However, Kafka is amazingly stable and performant for most cases, even with the defaults
— Try out today's techniques just in case you run into complicated issues :p
— Since Kafka 0.10.2.0, the broker's response times have become much faster and more stable