SpringOne Platform 2016
Speaker: Brian Dunlap; Tech Lead, Southwest Airlines.
Distributed systems and fast data require new software patterns and implementation skills. Learn how Southwest Airlines uses Apache Geode, organizes team responsibilities, and approaches design tradeoffs. Drawing inspiration from real whiteboard conversations, we’ll explore:
-Common development pitfalls
-Environment capacity planning
-Streaming data patterns, like consumer checkpointing
-Support roles
-Production lessons learned
Every day, Apache Geode improves how Southwest Airlines schedules nearly 4,000 flights and serves over 500,000 passengers. It’s an essential component of Southwest’s ability to reduce flight delays and support future growth.
10. Southwest’s Network Operations Control integrates decision makers.
[Figure: Southwest Airlines route map showing destination cities]
23. What do you own? (core) <focus>
What do you need? (supporting) <simplify>
How long can you keep it? <intentional>
24. Adding is very easy.
Watch out for data that’s around for too long.
Do all of these data need to be in-memory?
Data at rest for a long time? (>365 days)
GEODE REGION SIZES
25. Determine if each subdomain should use Geode.
Don’t make an automatic decision.
Domain tradeoffs
26. Maybe it needs an entirely different home?
Domain tradeoffs
30. OLD -> NEW
NORMALIZED JOINS -> REGIONS FOR READS, REGIONS FOR AGGREGATES
BLOCKING THREADS -> ASYNC - AKKA / ACTORS
ACTIVE / PASSIVE -> ACTIVE / ACTIVE
MUTABLE STATE -> IMMUTABILITY / EVENT SOURCING
DATA CONVERGENCE
CRUD -> CQRS / DDD, EVENT DRIVEN
ServiceManagerHandlerImpl
33. We write immutable domain events into event regions.
Clients receive events using Geode CQs.
Clients checkpoint their position into separate regions.
Event regions expire messages.
checkpointing
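The checkpointing idea on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not Southwest's implementation and not the Geode API: `EventRegion`, `CheckpointingConsumer`, the sequence-number scheme, and the TTL value are all hypothetical names and assumptions, a plain dict stands in for the separate checkpoint region, and the push delivery that Geode CQs provide is replaced by a simple `poll()`.

```python
import time


class EventRegion:
    """In-memory stand-in for an event region whose entries expire (assumption)."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._events = {}      # sequence number -> (written_at, payload)
        self._next_seq = 1

    def append(self, payload, now=None):
        """Store an immutable event (a tuple here) and return its sequence number."""
        now = time.time() if now is None else now
        seq = self._next_seq
        self._next_seq += 1
        self._events[seq] = (now, payload)
        return seq

    def expire(self, now=None):
        """Drop events older than the TTL, mimicking region entry expiration."""
        now = time.time() if now is None else now
        cutoff = now - self.ttl_seconds
        self._events = {s: e for s, e in self._events.items() if e[0] > cutoff}

    def read_after(self, seq):
        """Return (sequence, payload) pairs newer than seq, in order."""
        return [(s, e[1]) for s, e in sorted(self._events.items()) if s > seq]


class CheckpointingConsumer:
    """Consumer that records its position in a separate checkpoint store."""

    def __init__(self, name, events, checkpoints):
        self.name = name
        self.events = events
        self.checkpoints = checkpoints   # consumer name -> last handled sequence
        self.handled = []

    def poll(self):
        last = self.checkpoints.get(self.name, 0)
        for seq, payload in self.events.read_after(last):
            self.handled.append(payload)        # process the event first
            self.checkpoints[self.name] = seq   # then record progress
```

Because the position lives outside the consumer, a restarted consumer with the same name resumes after the last event it handled instead of replaying the whole region. Writing the checkpoint after processing means an event may be handled twice after a crash, so handlers in this style are usually made idempotent.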
34. Akka Cluster manages Actor Singletons which coordinate parallel processing based on a logical groupId.
Backpressure is implemented through a competing consumer pattern. Take a look at Akka Streams!
All Geode replicate regions use distributed ack. We don’t want to converge. (some write wins)
coordination (*important concept)
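The backpressure point can be illustrated with a bounded queue shared by several workers. This is only a sketch of the competing consumer pattern using Python threads; it does not model Akka Cluster, Actor Singletons, or groupId coordination, and `run_competing_consumers`, `worker_count`, and `max_pending` are hypothetical names chosen for the example.

```python
import queue
import threading


def run_competing_consumers(items, worker_count=3, max_pending=2):
    """Process items with several workers competing for one bounded queue."""
    work = queue.Queue(maxsize=max_pending)   # bounded: put() blocks when full
    results = []
    lock = threading.Lock()
    stop = object()                           # sentinel telling a worker to exit

    def worker():
        while True:
            item = work.get()
            if item is stop:
                break
            with lock:
                results.append(item * 2)      # placeholder for real processing

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for item in items:
        work.put(item)    # producer waits here while max_pending items are queued
    for _ in threads:
        work.put(stop)    # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

The producer can never get more than `max_pending` items ahead of the workers, which is the backpressure. Whichever worker is free takes the next item, so completion order is not guaranteed.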
36. PUSH or PULL
How do we scale expensive read I/O?
Contain expensive reads
With CQRS view model builders, perform heavy state enriching “select *” once.
Push read updates vs. polling (Geode CQs)
Conflate triggering view model rebuild events
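Conflating rebuild triggers means that several triggers for the same view model collapse into a single rebuild. The sketch below shows one way to do that, assuming triggers are keyed (for example by a flight identifier); `ConflatingRebuildQueue` and the `rebuild` callback are hypothetical names, not part of Geode or of the talk.

```python
class ConflatingRebuildQueue:
    """Collect rebuild triggers and keep only the latest one per key."""

    def __init__(self, rebuild):
        self._rebuild = rebuild   # callback: (key, latest_event) -> view model
        self._pending = {}        # key -> latest trigger event

    def trigger(self, key, event):
        # A newer trigger for the same key replaces the older one.
        self._pending[key] = event

    def flush(self):
        """Run one rebuild per key that was triggered since the last flush."""
        pending, self._pending = self._pending, {}
        return {key: self._rebuild(key, event) for key, event in pending.items()}
```

If a flight receives three updates between flushes, the expensive enrichment runs once with the latest update rather than three times.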
37. Be careful with timeouts!
Be careful with alerts!
Be careful with joins!
Be careful with large values!
Be careful with old habits!
safety tips
40. Integrate Geode security with a directory
Tune JVM size and GC
Deploy and upgrade environments
Size and configure VMs
Support production events
Enable WAN Gateway Sender / Receivers
Load snapshots between environments
Automate starting and stopping clusters
Teach distributed concepts, like CAP
How do we share new distributed system responsibilities?
DBAs
UNIX
DEVs
Middleware
Release Management
Offshore Support
New Geode Team
DevOps
EARLIER IS BETTER
41. Learn to luv conversation tension.
When there’s tension, you’re on the right track!
46. Prefer less-shared disk I/O.
(local to a VM rack, or dedicated)
Prefer larger + fewer Geode nodes.
(4 larger nodes vs. 8 smaller ones)
Take advantage of availability zones (A
CONVERSATION LEADERSHIP ACROSS TEAMS
SHARED or SHARED LESS
What infrastructure supports Geode?
47. Know your memory (and GC) limits.
Watch out for slow heap growth that triggers continuous GC.
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=60
-Xloggc:/your/path/node-name.GC.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCCause
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=20
-XX:GCLogFileSize=5M
Check out GCViewer for GC log analysis.
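One rough way to reason about the slow heap growth warning is to compare old-generation occupancy after each collection against the 60% initiating fraction set above. The helper below is a hypothetical sketch: `fraction_above_threshold` is an invented name, and the sample values would have to be extracted from the GC log yourself (for instance with GCViewer). Exactly when CMS starts a cycle also depends on JVM heuristics, so treat the result as an indicator rather than a prediction.

```python
def fraction_above_threshold(post_gc_old_gen_mb, old_gen_capacity_mb,
                             initiating_fraction=0.60):
    """Share of post-GC samples whose old-gen occupancy exceeds the threshold."""
    if not post_gc_old_gen_mb:
        return 0.0
    threshold_mb = old_gen_capacity_mb * initiating_fraction
    above = sum(1 for mb in post_gc_old_gen_mb if mb > threshold_mb)
    return above / len(post_gc_old_gen_mb)
```

A value that creeps toward 1.0 over days suggests the live data set is settling above the initiating occupancy, which is the situation in which concurrent collections are likely to run nearly back to back.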
48. Essential tool for real-time decision optimization testing!
Helpful for QA performance and functional testing.
Wonderful Geode feature!
WAN Gateway
49. Optimization binary consumes PDX via C++ Native Client
Moving > 200 MB per optimization request
Be careful with refactoring PDX data types!
C++ Native Client