This presentation describes how the Nordstrom Data Lab built Recommendo, a product recommendations API and service, on AWS using DynamoDB and Elastic Beanstalk, with a Node.js backend. Key points:
- Recommendo provides product recommendations for Nordstrom's website and emails. It went live 105 days after the first commit and has served over 4 billion recommendations.
- DynamoDB handles storage, Elastic Beanstalk handles deployment, and Node.js powers the backend. This combination allowed a small team to build and deploy the service quickly.
- Initial performance was poor. After tuning, the service handles over 3 million API hits per day with an average latency of about 90ms on 5 servers that can auto-scale to 20.
- Lessons learned: true zero-downtime deployment is difficult to achieve, a single unhandled exception can bring the service down, and health checks need to actually verify something.
AWS and the Nordstrom Data Lab: Building a RESTful Product Recommendations API
1. AWS and the Nordstrom Data Lab
Presented by Jason Wilson & David Von Lehman
2. Recommendo Overview
• RESTful product recommendations API
• Live on nordstrom.com in November
• Serving emails, live in January
• Lives in the AWS cloud – Elastic Beanstalk, DynamoDB, Node.js
• Third-party recommendation vendors don’t tap into what is unique about Nordstrom or fashion
3. By the Numbers
• Over 4 billion recommendations served
• >3 million API hits per day
• 105 days between first commit and go-live (Aug 6 and Nov 19, respectively)
• 5 servers with auto-scaling to 20 (turns out we don’t need them)
• 90ms average request latency
5. How We Built It
• Continuous integration and deployment from the first week
• 90+ percent code coverage
• Fewer moving parts == less to monitor, fewer ways for things to go wrong
• Fully PaaS-based to minimize sysadmin responsibilities
• How can we support this ourselves without carrying pagers?
7. DynamoDB
• Fully managed NoSQL database-as-a-service
• Web API with SDK support for Python, Ruby, Node.js, .NET, and Java
• High-performance queries, backed by SSD
• Maintains predictable performance for data of any size through horizontal scale-out
• Automatic replication across 3 Availability Zones
• Need to understand data access patterns up front
• Pay for only what you use/need – both storage and read/write throughput
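The "understand data access patterns up front" point is easiest to see with a concrete key design. The sketch below assumes a hypothetical table keyed by source product ID (partition key) and recommendation rank (sort key), so fetching the top N recommendations for a product reads a single partition instead of scanning the table. It uses an in-memory stand-in rather than the real DynamoDB SDK; the attribute names and the class are illustrative, not Recommendo's actual schema. Against DynamoDB itself, `query` would correspond to a Query with a key condition on the partition key and a `Limit`.

```typescript
// Hypothetical item shape: one item per (source product, rank) pair.
interface RecItem {
  sourceProductId: string; // partition key (hypothetical attribute name)
  rank: number; // sort key (hypothetical attribute name), 1 = best
  recommendedProductId: string;
}

// In-memory stand-in for a table with a composite primary key.
class InMemoryRecTable {
  private partitions = new Map<string, RecItem[]>();

  // Like PutItem: replaces any existing item with the same partition and sort key.
  put(item: RecItem): void {
    const existing = this.partitions.get(item.sourceProductId) ?? [];
    const items = existing.filter((i) => i.rank !== item.rank);
    items.push(item);
    items.sort((a, b) => a.rank - b.rank); // keep items ordered by sort key
    this.partitions.set(item.sourceProductId, items);
  }

  // Like a Query on the partition key: reads one partition only and returns
  // items in ascending sort-key order, truncated to `limit`.
  query(sourceProductId: string, limit: number): RecItem[] {
    return (this.partitions.get(sourceProductId) ?? []).slice(0, limit);
  }
}

// The access pattern this key design serves: "top N recs for product X".
function topRecommendations(table: InMemoryRecTable, productId: string, n: number): string[] {
  return table.query(productId, n).map((i) => i.recommendedProductId);
}
```

Suppose the access pattern were instead "all products that recommend X". This key design could not serve it without a secondary index, which is why the patterns have to be known before the table is designed.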
8. Node.js
• JavaScript on the server atop the Google V8 engine
• Asynchronous event loop makes it ideal for real-time, data-intensive applications
• Vibrant open-source community around the excellent npm package manager (50K+ packages)
• Seeing increased adoption in enterprises including Wal-Mart, LinkedIn, PayPal, Dow Jones, Microsoft, New York Times
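Here is a small illustration of the asynchronous event loop. Two simulated I/O calls are started back to back, and neither blocks the single JavaScript thread, so the faster call completes first even though it was issued second. `fakeIo` is a hypothetical stand-in for a database or HTTP call, and `setTimeout` simulates its latency.

```typescript
// Simulated non-blocking I/O: resolves after `ms` milliseconds without
// blocking the JavaScript thread (hypothetical stand-in for a database call).
function fakeIo(label: string, ms: number, log: string[]): Promise<string> {
  log.push(`start ${label}`);
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(`done ${label}`);
      resolve(label);
    }, ms);
  });
}

// Both calls are issued before either completes; the event loop runs each
// timer callback when it fires, so the 10ms call finishes before the 30ms one.
async function runConcurrently(log: string[]): Promise<string[]> {
  return Promise.all([fakeIo("A", 30, log), fakeIo("B", 10, log)]);
}
```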
9. JavaScript – Learn to Love It
• No static type checking; errors don’t surface until runtime
• Not classical OO
• var keyword
• Callback hell
• Server debugging too hard
• But wait...
• Chrome and V8
• Dynamic can be your friend
• npm!
• express, async, mocha
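The "callback hell" complaint and the async library on the slide both concern the same problem: sequencing dependent asynchronous calls. The sketch below contrasts nested callbacks with native `async`/`await`, which later versions of Node.js added. This is a swapped-in technique, not how the slide's `async` library works. `getProduct` and `getRecsForCategory` are hypothetical data-access functions, and `promisify1` is a minimal hand-rolled helper; in real code, Node's `util.promisify` does the same job more generally.

```typescript
type Callback<T> = (err: Error | null, value?: T) => void;

interface Product {
  id: string;
  category: string;
}

// Hypothetical callback-style data access (setTimeout simulates I/O).
function getProduct(id: string, cb: Callback<Product>): void {
  setTimeout(() => {
    if (!id) cb(new Error("missing id"));
    else cb(null, { id, category: "shoes" });
  }, 0);
}

function getRecsForCategory(category: string, cb: Callback<string[]>): void {
  setTimeout(() => cb(null, [`${category}-1`, `${category}-2`]), 0);
}

// Callback style: each dependent step nests one level deeper.
function recsForProductCb(id: string, cb: Callback<string[]>): void {
  getProduct(id, (err, product) => {
    if (err || !product) {
      cb(err ?? new Error("no product"));
      return;
    }
    getRecsForCategory(product.category, (err2, recs) => {
      if (err2) {
        cb(err2);
        return;
      }
      cb(null, recs);
    });
  });
}

// Minimal promisify for single-argument callback functions.
function promisify1<A, T>(fn: (arg: A, cb: Callback<T>) => void): (arg: A) => Promise<T> {
  return (arg: A) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (err, value) => (err ? reject(err) : resolve(value as T)));
    });
}

// Same logic with async/await: flat control flow, and a failure anywhere
// rejects the returned promise much like a thrown exception.
async function recsForProduct(id: string): Promise<string[]> {
  const product = await promisify1(getProduct)(id);
  return promisify1(getRecsForCategory)(product.category);
}
```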
10. AWS Components
• EC2 – provides web-scale computing as a service
• ELB – Elastic Load Balancer; routes incoming traffic to EC2 instances and scales up to meet demand
• Auto-scaling group – a logical collection of EC2 instances behind an ELB
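To make the ELB bullet concrete, here is a simplified load balancer that rotates requests across instances and skips any that fail health checks. This is an illustrative assumption: real ELB routing algorithms vary by load balancer type and are not plain round-robin in every case. `Instance` and `RoundRobinBalancer` are hypothetical names.

```typescript
interface Instance {
  id: string;
  healthy: boolean; // as reported by health checks
}

// Simplified balancer: rotates through instances, skipping unhealthy ones.
class RoundRobinBalancer {
  private instances: Instance[];
  private next = 0;

  constructor(instances: Instance[]) {
    this.instances = instances;
  }

  route(): Instance {
    // Try each instance at most once per call.
    for (let tried = 0; tried < this.instances.length; tried++) {
      const candidate = this.instances[this.next];
      this.next = (this.next + 1) % this.instances.length;
      if (candidate.healthy) return candidate;
    }
    throw new Error("no healthy instances");
  }
}
```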
12. Elastic Beanstalk
• AWS PaaS – lightweight abstraction layer atop EC2/ELB with no additional charge beyond the underlying resources
• More transparent than Azure or Heroku
• Supports Java, .NET, Python, Node.js, PHP, and Ruby
• git push deployment
• Auto-scaling group with custom triggers and automatically applied configuration
• Possible to configure the AMI, including yum packages, environment variables, and more
• Supports custom AMIs
• Automated health checks
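The "custom triggers" bullet refers to threshold-based scaling rules. The sketch below evaluates one such rule: scale out when a metric exceeds an upper threshold, scale in when it drops below a lower one, and clamp the result to the group's minimum and maximum size. Elastic Beanstalk exposes comparable settings in its `aws:autoscaling:trigger` namespace (for example `UpperThreshold`, `LowerThreshold`, and `UpperBreachScaleIncrement`). The threshold values here are hypothetical. A real trigger also requires the breach to persist for a configured duration, which this single-sample sketch ignores.

```typescript
interface TriggerConfig {
  lowerThreshold: number; // scale in below this (hypothetical, e.g. CPU %)
  upperThreshold: number; // scale out above this
  lowerBreachScaleIncrement: number; // negative, e.g. -1
  upperBreachScaleIncrement: number; // positive, e.g. +1
  minSize: number;
  maxSize: number;
}

// New desired instance count after evaluating one metric sample.
function desiredCapacity(current: number, metric: number, cfg: TriggerConfig): number {
  let next = current;
  if (metric > cfg.upperThreshold) next += cfg.upperBreachScaleIncrement;
  else if (metric < cfg.lowerThreshold) next += cfg.lowerBreachScaleIncrement;
  // Never leave the group's configured bounds.
  return Math.min(cfg.maxSize, Math.max(cfg.minSize, next));
}

// Hypothetical thresholds; min/max match the "5 servers with auto-scaling to 20" fleet.
const cfg: TriggerConfig = {
  lowerThreshold: 20,
  upperThreshold: 70,
  lowerBreachScaleIncrement: -1,
  upperBreachScaleIncrement: 1,
  minSize: 5,
  maxSize: 20,
};
```

With these settings, a group at its minimum of 5 instances never scales in further, no matter how idle it is.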
13. Continuous Deployment
• Development: git push to dev branch → Jenkins CI unit tests → git push to EB
• Production: git pull dev → git checkout master → git merge dev → git push master → Jenkins CI unit tests → git push to EB (prod)
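The pipeline reduces to two rules: unit tests gate every deploy, and the branch determines the target environment (dev to the development environment, master to production). The sketch below models only that decision. `runPipeline`, the branch-to-environment map, and the environment names are hypothetical; the actual test run and `git push` to Elastic Beanstalk would be carried out by Jenkins, not by this code.

```typescript
type Environment = "dev" | "prod";

// Hypothetical branch-to-environment map mirroring the slide:
// the dev branch deploys to the dev environment, master to production.
const branchTargets = new Map<string, Environment>([
  ["dev", "dev"],
  ["master", "prod"],
]);

interface PipelineResult {
  deployed: boolean;
  environment?: Environment;
  reason: string;
}

// One CI run: unit tests gate the deploy; branches without a target are
// tested but never deployed.
function runPipeline(branch: string, runUnitTests: () => boolean): PipelineResult {
  if (!runUnitTests()) return { deployed: false, reason: "unit tests failed" };
  const environment = branchTargets.get(branch);
  if (environment === undefined) return { deployed: false, reason: `no environment for ${branch}` };
  return { deployed: true, environment, reason: "tests passed" };
}
```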
14. Performance Testing
• Initial performance was poor
• Disable DNS caching when load testing against ELB
• Pre-warm ELB for higher upfront throughput
• jmeter-ec2, Bees with Machine Guns
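At its core, a load test keeps a fixed number of requests in flight and records each one's latency; tools like jmeter-ec2 do this at much larger scale. The sketch below is a minimal, hypothetical version that works against any async `target`. In a real test, `target` would send an HTTP request to the ELB, and the slide's DNS-caching advice would apply to that HTTP client. ELB scales by adding nodes behind its DNS name, so a client pinned to one cached address keeps hitting the same node.

```typescript
// Fire `total` calls at `target`, keeping at most `concurrency` in flight,
// and record each successful call's latency in milliseconds.
async function loadTest(
  target: () => Promise<unknown>,
  total: number,
  concurrency: number,
): Promise<{ latenciesMs: number[]; errors: number; elapsedMs: number }> {
  const latenciesMs: number[] = [];
  let errors = 0;
  let issued = 0;
  const started = Date.now();

  // Each worker claims the next request until all have been issued. The check
  // and increment run synchronously, so workers never double-claim a request.
  async function worker(): Promise<void> {
    while (issued < total) {
      issued++;
      const t0 = Date.now();
      try {
        await target();
        latenciesMs.push(Date.now() - t0);
      } catch {
        errors++;
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, total) }, () => worker());
  await Promise.all(workers);
  return { latenciesMs, errors, elapsedMs: Date.now() - started };
}
```

Transactions per second, as charted on the next slide, can then be estimated as `latenciesMs.length / (elapsedMs / 1000)`.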
15. Early Perf Results – YIKES!
[Chart: transactions per second and response time (seconds)]
16. Performance Tuning
• New Relic, Nodetime
– Real-time performance monitoring of the Node runtime
• node-memwatch
– Evented inspection of the heap, GC events, leak events, and heap diffing
• SSH into instances
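Heap-inspection tools of the kind listed above flag a possible leak when heap usage keeps growing across successive garbage collections. The sketch below applies a similar heuristic to periodic heap samples: it reports a suspected leak when usage grew on each of the last `window` samples. This is an approximation written for illustration, not node-memwatch's API. `readHeap` is injected so the logic stays testable; in Node.js you could pass `() => process.memoryUsage().heapUsed`.

```typescript
// Flags a suspected leak when the heap grew on each of the last `window`
// consecutive samples (a heuristic, not a diagnosis).
function suspectLeak(heapSamples: number[], window: number): boolean {
  if (heapSamples.length < window + 1) return false;
  const recent = heapSamples.slice(-(window + 1));
  for (let i = 1; i < recent.length; i++) {
    if (recent[i] <= recent[i - 1]) return false;
  }
  return true;
}

// Samples `readHeap` every `intervalMs` and calls `onLeak` while growth persists.
// Returns a function that stops the sampling.
function watchHeap(
  readHeap: () => number,
  intervalMs: number,
  window: number,
  onLeak: (samples: number[]) => void,
): () => void {
  const samples: number[] = [];
  const timer = setInterval(() => {
    samples.push(readHeap());
    if (samples.length > 100) samples.shift(); // bound the watcher's own memory
    if (suspectLeak(samples, window)) onLeak(samples.slice());
  }, intervalMs);
  return () => clearInterval(timer);
}
```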
17. Real Performance
• Pleasantly surprised
• Average latency ~90ms
• Dynamo response times <10ms
• A handful of scale-up and scale-back events
• One outage due to bad exception handling
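An average such as ~90ms can hide a slow tail, so latency results are often reported with percentiles as well. The sketch below computes the mean and nearest-rank percentiles from latency samples. The function names are hypothetical, and the slide does not say which statistics the team tracked beyond the average.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(samplesMs: number[]): { mean: number; p50: number; p99: number } {
  const mean = samplesMs.reduce((sum, x) => sum + x, 0) / samplesMs.length;
  return { mean, p50: percentile(samplesMs, 50), p99: percentile(samplesMs, 99) };
}
```

For samples of 10, 20, 30, 40, and 1000 ms, the mean is 220 ms while the median is 30 ms. A single slow request skews the average far from the typical case.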
20. Lessons Learned / Pitfalls
• True zero-downtime deployment is difficult to achieve
• Thoroughly explore the Elastic Beanstalk configuration options
• Catch those errors – a rogue unhandled exception can bring it all down
• Health checks that actually do something
• Out-of-the-box monitoring is pretty good
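Two of these lessons translate directly into code. The sketch below wraps a request handler so that a thrown error becomes a 500 response instead of an unhandled exception. It also implements a health check that exercises a real dependency under a timeout rather than always answering OK. The names (`safe`, `healthCheck`, `HttpResult`) and the probe function are hypothetical. In a real service you would plug these into your web framework and point the environment's health check URL at the health endpoint. The wrapper only covers errors raised during request handling; an exception thrown from an unrelated callback can still crash the process.

```typescript
interface HttpResult {
  status: number;
  body: string;
}
type Handler = (path: string) => Promise<HttpResult>;

// Wraps a handler so any thrown error or rejected promise becomes a 500
// instead of escaping and taking the process down.
function safe(handler: Handler, logError: (e: unknown) => void): Handler {
  return async (path) => {
    try {
      return await handler(path);
    } catch (e) {
      logError(e);
      return { status: 500, body: "internal error" };
    }
  };
}

// Rejects if `p` does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    p.then(
      (v) => {
        clearTimeout(timer);
        resolve(v);
      },
      (e) => {
        clearTimeout(timer);
        reject(e);
      },
    );
  });
}

// A health check that exercises a real dependency (e.g. a known-key read
// against the datastore) instead of returning 200 unconditionally.
async function healthCheck(probe: () => Promise<void>, timeoutMs: number): Promise<HttpResult> {
  try {
    await withTimeout(probe(), timeoutMs);
    return { status: 200, body: "ok" };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { status: 503, body: `unhealthy: ${message}` };
  }
}
```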
25. Recommendo 2.0
• SKU-based recommendations – size!
• Truly personalized recs based on individual browse and purchase history
[Architecture diagram: DynamoDB, Batch Recs, Real-Time Refinery, Scorer, Ingester, Redis, Streams]
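The slide gives only the component names, so what follows is a speculative sketch of how a "real-time refinery" might adjust batch-computed recommendations. It drops items not stocked in the shopper's size (the SKU-level concern) and boosts categories the shopper browsed recently. Every field name, the additive boost, and the default weight are assumptions, not a description of Recommendo 2.0.

```typescript
interface Candidate {
  productId: string;
  batchScore: number; // precomputed offline (hypothetical "Batch Recs" output)
  category: string;
  sizesInStock: string[]; // SKU-level availability (hypothetical field)
}

interface ShopperContext {
  size?: string; // preferred size, if known
  recentCategories: string[]; // from the real-time browse stream (hypothetical)
}

// Hypothetical refinement: drop items not stocked in the shopper's size,
// add a fixed boost for recently browsed categories, then re-sort.
function refine(candidates: Candidate[], ctx: ShopperContext, boost = 0.5, n = 10): string[] {
  const browsed = new Set(ctx.recentCategories);
  return candidates
    .filter((c) => ctx.size === undefined || c.sizesInStock.indexOf(ctx.size) !== -1)
    .map((c) => ({ id: c.productId, score: c.batchScore + (browsed.has(c.category) ? boost : 0) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((c) => c.id);
}
```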
27. Wrap-Up
• Recommendo – initial success, now building upon what we have learned
• Node.js + DynamoDB + Elastic Beanstalk is a winning combination
• Possible to outperform an incumbent vendor solution in a competitive, differentiating capability
• Cloud and PaaS enable small teams to move quickly and deliver solid, production-caliber systems
• Incremental cost of “gold plating” steadily shrinking
• Your company benefits when the percentage of resources devoted to core competency is maximized
The ease of deployment is addictive... just a git push and you’re done.
DAVID: Many think that a similar revolution is about to happen for the enterprise.
DAVID: On-premise – a large crew with many specialists, lots of infrastructure to support, lots of regulations and redundancy, and energy required to prep and depart from the port; no cloud power.
IaaS – less power and fewer resources required, and a smaller crew. Still complicated to sail, though; it requires some specialization.
PaaS – the smallest crew yet, highly nimble; 2-3 people can just hoist the sails and get out to sea. Everyone can operate all aspects of the boat. That also means they are alone out there, so the team needs to be self-sufficient if something goes wrong.
The point is not that we need fewer people overall; rather, we need a whole fleet of nimble PaaS vessels. Shift resources toward the point of customer impact. Offload non-differentiating commodity infrastructure.
DAVID: Thanks to the cloud and code libraries (especially open source), the additional incremental effort required for industrial strength is shrinking:
• Logging – lots of great open-source libs
• Monitoring – CloudWatch, New Relic
• Redundancy – Elastic Load Balancer
• Scalability – Auto-Scaling groups
• High Availability – Dynamo replication
• Deployment Automation – git push
• Security – IAM, VPC, firewalls