This talk is about a team that invented its own network security protocol for a big Go project. Yes, we were sober and conscious, and we still did it!
First of all: why?!
Everybody knows the Go TLS stack is quite slow. Less commonly known: its memory footprint is far from optimal, too.
Yes, we do know we can link OpenSSL into our program and get it as performant as, say, NGINX.
But did it ever cross your mind that even OpenSSL is not fast enough for some corner cases? Say, you have to accept 1M new connections in 30 seconds…
The problem is: SSL is slow and CPU-intensive during the connection establishment phase.
On the inCaller project we ran into exactly this: to perform the actual tasks we needed only 32 CPU cores, which means a 4-server cluster. To accept new connections fast enough we needed 480 CPU cores, which means a 60-server cluster.
A 60-server cluster is about 15 times worse than a 4-server cluster, obviously.
Looking at this unpleasant math, we decided to build our own encryption and security protocol. And we succeeded!
What we did, how we did it, and what we got in the end: this is what my talk is about.
1. Build your own network security protocol and get away uncaught
Daniel Podolsky
ex-CTO inCaller.org
onokonem@gmail.com
skype: onokonem
telegram: @onokonem
2. Who am I
• 25 years in IT
• 20 years of sysadmin experience
• 10 years software development experience
• 5 years teamlead experience
3. What I will talk about
• inCaller: a sort of messenger
• a key requirement was to deliver a status update to each client often enough
• 1 minute was the absolute maximum
4. What I will talk about
• inCaller: a sort of messenger
• a key requirement was to deliver a status update to each client often enough
• 1 minute was the absolute maximum
• A lot of very private info was passed from client to server and back, so security was the second (or even the first) key point
5. What I will talk about
• 10 million clients was a business requirement
• The app never reached this number, but we paid to make it possible
• And we finally made it!
6. What I will talk about
• But first, we started with some tests of HTTPS on different kinds of servers
• Go itself: 250 new HTTPS connections per CPU core per second
• nginx (OpenSSL): about 500 new HTTPS connections per CPU core per second
7. What I will talk about
• These numbers are way too low!
• 10M clients
• connecting every 60 seconds
• 500 connects per second per CPU core
• We have to leave at least 50% of CPU time to the app itself
• We need 30% redundancy to handle random traffic spikes
• 867 cores required!
8. What I will talk about
• 867 cores means 108 servers with 8 cores each
• or 27 servers with 32 cores each
• But those servers are 20% more expensive than 108 small ones
• Not sexy at all
10. What I will talk about
• Funny: we need all this huge power just to accept the connection
• It would be great to extend an encrypted session beyond the border of a single TCP connection!..
11. What I will talk about
• Funny: we need all this huge power just to accept the connection
• It would be great to extend an encrypted session beyond the border of a single TCP connection!..
• So we need a custom network transfer security protocol
12. What I will talk about
• Funny: we need all this huge power just to accept the connection
• It would be great to extend an encrypted session beyond the border of a single TCP connection!..
• So we need a custom network transfer security protocol
• Are you convinced? We were
13. What I will talk about
• Funny: we need all this huge power just to accept the connection
• It would be great to extend an encrypted session beyond the border of a single TCP connection!..
• So we need a custom network transfer security protocol
• Are you convinced? We were, at the time
14. Let the fight begin!
Disclaimer: security is not an easy game!
Make sure you know what you are doing!
15. Let the fight begin!
• What does SSL provide to us?
• Traffic encryption: no room for sniffing
• Client identification: as long as the SSL session exists, we are dealing with the same client
• Server identification: no room for a MitM attack
• Replay attack protection: the session key is unique and sliding, so it is useless to re-send previously recorded traffic to the server or the client
• Note: SSL (and any other encryption) is not absolute protection. It is a way to make decryption take enough time and effort to be worthless.
16. Let the fight begin!
• SSL is the best we can get, so let's build on it
• As long as each client connects over SSL only once every 4 hours, we can handle the load
• AES-128 is great too, so it can be used to encrypt the non-SSL data
• It is a standard
• It is unbreakable (actually, breaking it is expensive enough to make it worthless)
• It is symmetric (the same key is used for encryption and decryption), so we need to keep the key safe
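As a sketch of such a symmetric AES-128 layer (the function names are mine; the actual inCaller code is not public), here is encryption and decryption in Go using CTR mode, which fits the scheme described later where a separate checksum detects bad decryption. A new design would normally prefer an authenticated mode such as AES-GCM instead:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// encryptAES128CTR encrypts plaintext with a 16-byte key, prepending a
// random IV to the ciphertext. Hypothetical helper for illustration.
func encryptAES128CTR(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 16-byte key selects AES-128
	if err != nil {
		return nil, err
	}
	out := make([]byte, aes.BlockSize+len(plaintext))
	iv := out[:aes.BlockSize]
	if _, err := io.ReadFull(rand.Reader, iv); err != nil {
		return nil, err
	}
	cipher.NewCTR(block, iv).XORKeyStream(out[aes.BlockSize:], plaintext)
	return out, nil
}

// decryptAES128CTR reverses encryptAES128CTR. Note that it will happily
// "decrypt" garbage: CTR mode alone gives no integrity check.
func decryptAES128CTR(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv, body := data[:aes.BlockSize], data[aes.BlockSize:]
	out := make([]byte, len(body))
	cipher.NewCTR(block, iv).XORKeyStream(out, body)
	return out, nil
}

func main() {
	key := []byte("0123456789abcdef") // 16 bytes → AES-128
	ct, _ := encryptAES128CTR(key, []byte("very private info"))
	pt, _ := decryptAES128CTR(key, ct)
	fmt.Printf("%s\n", pt) // prints "very private info"
}
```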
17. Let the fight begin!
• One key to rule them all
• We have a key stored on the server, and we pass it to the clients securely (over SSL)
• Clients send their data encrypted with this key
• The server sends its answers encrypted with the same key
18. Let the fight begin!
• Problem: one key means any client can decrypt the traffic of any other client.
• Unacceptable.
19. Let the fight begin!
• A unique key per client
• We create a key for the client during the SSL session
• The key is stored in a DB and passed to the client
• Clients send their data encrypted with this key, accompanied by the client ID, so the server can find the appropriate key
20. Let the fight begin!
• Problems:
• DB load could be quite high
• We do not want to pass a permanent client ID unencrypted
• Unacceptable
21. Let the fight begin!
• One key for the server and another one for the client. Hocus-pocus!
• We have a key on the server. It is never passed anywhere.
• We create a key for the client during the SSL session
• We concatenate the UserID, the UserEncryptionKey, and some random garbage
• A CRC32 checksum is calculated over the concatenation and attached to it
22. Let the fight begin!
• The result is encrypted with the server key and passed to the client as its ID, accompanied by the client key
• The client will encrypt its messages with the key
• The server will decrypt this tricky ID to get the decryption key
• The CRC is used to check that decryption was correct
• Yes, sadly, AES does not provide any way to check whether decryption was successful or not
• Yes, almost any garbage will be successfully “decrypted”; you have to check for damaged data yourself
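The packing and checking of the "tricky ID" described on the last two slides could be sketched like this (all names and the exact layout are my assumptions; in the real protocol the packed bytes would additionally be AES-encrypted with the server key before going to the client):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// buildTrickyID packs UserID + client key + garbage and appends a CRC32
// over the concatenation. In the real protocol the result would then be
// AES-encrypted with the server key.
func buildTrickyID(userID uint64, clientKey, garbage []byte) []byte {
	buf := new(bytes.Buffer)
	binary.Write(buf, binary.BigEndian, userID)
	buf.Write(clientKey)
	buf.Write(garbage)
	sum := crc32.ChecksumIEEE(buf.Bytes())
	binary.Write(buf, binary.BigEndian, sum)
	return buf.Bytes()
}

// parseTrickyID reverses the packing and uses the CRC to detect a bad
// decryption: AES will happily "decrypt" garbage, so the checksum is
// the only correctness signal.
func parseTrickyID(p []byte, keyLen int) (uint64, []byte, error) {
	if len(p) < 8+keyLen+4 {
		return 0, nil, errors.New("too short")
	}
	body, sumBytes := p[:len(p)-4], p[len(p)-4:]
	if crc32.ChecksumIEEE(body) != binary.BigEndian.Uint32(sumBytes) {
		return 0, nil, errors.New("checksum mismatch: wrong server key or damaged data")
	}
	userID := binary.BigEndian.Uint64(body[:8])
	clientKey := body[8 : 8+keyLen]
	return userID, clientKey, nil
}

func main() {
	id := buildTrickyID(42, []byte("0123456789abcdef"), []byte{0xde, 0xad})
	user, key, err := parseTrickyID(id, 16)
	fmt.Println(user, string(key), err) // 42 0123456789abcdef <nil>
}
```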
23. Let the fight begin!
• Minor problems:
• A static server key makes cracking it a bit more worthwhile: it is in use long enough, and it is the same for all clients
• A static server key makes the client ID static and thereby traceable, which is not good
• Yes, the “not decryptable” answer has to be passed unencrypted. No, there is nothing we can do about it.
24. Let the fight begin!
• Solution: let's make the server key sliding
• We generate a new server key every N hours
• We keep 2 keys at any time: the current one and the previous one
• We try the previous one in case the current one was not able to decrypt the tricky ID
• The client gets a new tricky ID every N hours over SSL
• Acceptable
25. Let the fight begin!
• So far so good
• Traffic encryption: provided by AES
• Client identification: no way to construct a forged tricky ID without the server key
• Server identification: no way to construct a forged server answer without the pre-shared key
26. Let the fight begin!
• But what about replay attack protection?
• Yes, any message could be recorded and replayed to the server by a MitM
• This could be harmless or harmful, depending on the nature of the message
• So we MUST consider it harmful!
• So we need some sort of session
• But we cannot involve SSL here, unfortunately: that would drive the load back to the original level
27. Let the fight begin!
• The Attach call
• The client sends a special Attach message. The Attach message is encrypted as described above, of course.
• The server generates a random SessionID, caches it in RAM, and returns it encrypted to the client
• Yes, you will need some sort of sticky sessions at this point: the same client has to reach the same server each time, because the SessionID is cached in RAM
• All the following client messages have to be accompanied by this particular SessionID, otherwise they will be ignored, an error will be produced, and the client will have to re-Attach
28. Let the fight begin!
• Problem: a replayed Attach message will disconnect the previous session.
• This could be used to mount a DoS attack.
• Unacceptable.
29. Let the fight begin!
• The AttachRequest call: another special message to generate a new SessionID
• On this call the server creates a new SessionID
• Caches it in RAM
• But the current SessionID remains untouched for now
• The server returns this new SessionID to the client
• And the client sends it back to the server with an Attach call
• There is no way to pass this loop without the current client encryption key!
• As soon as the new SessionID arrives with the Attach call, it replaces the old one.
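The AttachRequest/Attach two-step could be sketched like this (all names are my assumptions for illustration; a replayed AttachRequest cannot kick out the live session, because the current SessionID is only replaced by a successful Attach):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// sessionBroker models the two-step handshake: AttachRequest hands out
// a fresh SessionID but leaves the current one untouched; only a
// subsequent Attach carrying that fresh ID replaces it.
type sessionBroker struct {
	mu      sync.Mutex
	current map[uint64]string // userID -> active SessionID
	pending map[uint64]string // userID -> SessionID offered but not yet attached
}

func newSessionBroker() *sessionBroker {
	return &sessionBroker{current: map[uint64]string{}, pending: map[uint64]string{}}
}

// AttachRequest generates and caches a new SessionID in RAM while the
// old one keeps working; the ID goes back encrypted with the client key.
func (b *sessionBroker) AttachRequest(userID uint64) string {
	buf := make([]byte, 16)
	rand.Read(buf)
	id := hex.EncodeToString(buf)
	b.mu.Lock()
	b.pending[userID] = id
	b.mu.Unlock()
	return id
}

// Attach accepts only the SessionID issued by AttachRequest; on success
// it replaces the previous session.
func (b *sessionBroker) Attach(userID uint64, sessionID string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.pending[userID] != sessionID {
		return false
	}
	delete(b.pending, userID)
	b.current[userID] = sessionID
	return true
}

func main() {
	b := newSessionBroker()
	offered := b.AttachRequest(42)
	fmt.Println(b.Attach(42, "replayed-old-id")) // false
	fmt.Println(b.Attach(42, offered))           // true
}
```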
30. Let the fight begin!
• So far so good
• An AttachRequest message replay is harmless, as it does not change any data
• Well, it does, actually, but the time gap for the attack is small enough to be ignored
• An Attach message replay is impossible, as we need a new SessionID to make it.
31. Let the fight begin!
• Problem: messages inside one session can still be replayed
32. Let the fight begin!
• SessionCounter: each message is accompanied by a session counter
• On the Attach call, the client and the server both set it to 0
• On each message, the client increments the counter and sends it along with the message
• The server caches the new counter value
• Or drops the message in case the new counter is less than or equal to the cached one
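The server-side counter check could be sketched like this (names are my assumptions; note that requiring strictly increasing counters is exactly what rules out sending messages in parallel, as the next slide admits):

```go
package main

import (
	"fmt"
	"sync"
)

// counterGuard implements per-session replay protection: every message
// carries a counter, and the server accepts only strictly increasing
// values, dropping anything less than or equal to the cached one.
type counterGuard struct {
	mu   sync.Mutex
	last map[string]uint64 // SessionID -> highest counter seen
}

func (g *counterGuard) accept(sessionID string, counter uint64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if counter <= g.last[sessionID] {
		return false // replayed or out-of-order message: drop it
	}
	g.last[sessionID] = counter
	return true
}

func main() {
	g := &counterGuard{last: map[string]uint64{}}
	fmt.Println(g.accept("s1", 1)) // true
	fmt.Println(g.accept("s1", 2)) // true
	fmt.Println(g.accept("s1", 2)) // false: replay detected
}
```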
33. Let the fight begin!
• Problem: there is no way to send messages in parallel
• There is nothing we can do about it, so we decided to consider this acceptable.
34. Summary
• It is doable
• But quite expensive
• 2 months for 2 very well-paid engineers