6. Digitalization
- Yesterday, the Internet was a tool
- Today, digital technologies are changing everything: the way we communicate, work, learn, do business… the way we live
11.
- All this data will end up in some company's IT system, and that company will make money from it: "Big data is the new oil"
- It's not only about data: there will be new uses, new services… and new competitors!
- Sooner or later, every company will face the problems the web giants had to face
22.
- They measured everything:
  - Power efficiency of all hardware parts
  - Performance-to-power ratio, $ per transaction, etc.
  - Cost models of failures
- For them, commodity hardware is 3 to 12 times cheaper
- They started to design datacenters based only on commodity hardware
- They started to design applications distributed across thousands of unreliable machines
23. Small is beautiful, but…
- Having to deploy on many machines changes everything: you need to automate things
  - Web giants are the champions of infrastructure automation; that's why they became champions of the cloud
- Application resilience has to be completely redefined, since the hardware is unreliable and constantly fails
  - Resilience must be handled by software, especially for databases
25. NoSQL
- « Not Only SQL »
- To go beyond RDBMS limitations
- Google: BigTable
- Amazon: DynamoDB
- Facebook: Cassandra, sharded key-value MySQL
- LinkedIn: Voldemort
- etc.
26. The need for speed
- Amazon: 100 ms of degradation of latency = -1% of revenues
- Google: more than 500 ms in page load = -20% of page views
- Yahoo: more than 400 ms in page load = +5 to 9% of bounce
- Bing: more than 1 s in page load = -2.8% in ad revenues
(The blink of an eye is 300 ms)
… and availability
- Amazon: 1 min of unavailability = 50 K$ of revenue loss
Les géants du Web
27. New storage architectures and the CAP theorem
- « Consistency »: all users have the same version of the information
- « Availability »: users can access the system (read or write)
- « Partition tolerance »: the system continues to work in case of a network partition, i.e. when different nodes cannot communicate
Can pick only two!
- A is also related to response time: the more you look for consistency, the worse the latency will be
- RDBMS universe
- Large websites use "eventually consistent" datastores (NoSQL)
28. NoSQL
- A radically different approach to databases
  - Distributed storage, tolerating failure by replicating data
  - Consistency constraint is relaxed: eventual consistency
  - Focus is on availability and low response times (low latency)
  - Linear horizontal scalability
- Variety of data models
  - key/value
  - column-oriented
  - graph
29. Different sharding approaches
- Google
  - BigTable, with the distributed file system GFS
- Amazon
  - Famous paper about Dynamo, a key/value store organised as a replication ring with consistent hashing, and an original approach to eventual consistency
- Facebook
  - Cassandra, inspired by both BigTable and Dynamo
  - also: a specific design of sharded MySQL used as a key/value store
- …
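The ring with consistent hashing mentioned for Dynamo can be sketched in a few lines. This is a toy illustration, not Dynamo's or Cassandra's actual code: the class name `HashRing`, the use of MD5, the number of virtual nodes and the replication factor are all assumptions chosen for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 128-bit integer from MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring with virtual nodes and N-way replication."""

    def __init__(self, nodes, vnodes=64, replicas=3):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=64):
        # Each physical node owns several positions to spread the load.
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def preference_list(self, key):
        """Walk clockwise from the key's position and collect distinct nodes."""
        if not self._ring:
            return []
        distinct = {n for _, n in self._ring}
        wanted = min(self.replicas, len(distinct))
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        result = []
        while len(result) < wanted:
            node = self._ring[i % len(self._ring)][1]
            if node not in result:
                result.append(node)
            i += 1
        return result
```

The point of the design is that removing or adding a machine only moves the keys that sat next to it on the ring; every other key keeps the same primary node, which is what makes it practical on hardware that fails constantly.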
31. Exponential growth of capacities
CPU, memory, network bandwidth, storage… all of them have followed Moore's law
Source: http://strata.oreilly.com/2011/08/building-data-startups.html
33. Google paper : Map Reduce
Key principles
- Parallelize, distribute, and load-balance processing
- Fault-tolerant (hide failure of nodes during the processing)
- Co-location of processing and data
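The map / shuffle / reduce flow behind these principles can be sketched on a single machine. This is a minimal word-count illustration, assuming a thread pool stands in for the cluster; the function names are hypothetical, and fault tolerance and data co-location are deliberately left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit (word, 1) for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(chunks, workers=4):
    """Run the map tasks in parallel, group by key, then reduce in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        groups = shuffle(mapped)
        reduced = pool.map(lambda item: reduce_phase(*item), groups.items())
        return dict(reduced)
```

Calling `map_reduce(["the web giants", "the cloud", "web web"])` counts each word across the three chunks. In a real cluster each chunk would be processed on the node that already stores it, and a failed task would simply be rerun elsewhere.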
36. A new way of doing BI and data analytics
- Consider that all the data is valuable, and store everything: structured and unstructured data
- Don't force the data to match a predefined data model (tables and schema); instead use a "schema-on-read" approach
- Scale to petabytes of storage, at a low cost
  - Yahoo has a cluster of 42,000 nodes
- Don't move the data (ETL) to process it; instead move the processing to the data (MapReduce)
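A "schema-on-read" approach can be sketched as follows: raw records are stored untouched, and a schema is applied only when a given question is asked. The sample events, the field names and the `read_purchases` function are hypothetical, invented for this illustration.

```python
import json

# Raw events stored as-is, whatever their shape (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "alice", "action": "view", "page": "/home"}',
    '{"user": "bob", "action": "buy", "amount": "19.90", "currency": "EUR"}',
    'not json at all',
    '{"user": "alice", "action": "buy", "amount": 5}',
]


def read_purchases(raw_lines):
    """Apply a 'purchase' schema at read time; skip records that do not fit."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreadable for this question, but still kept in storage
        if not isinstance(record, dict) or record.get("action") != "buy":
            continue
        yield {
            "user": str(record.get("user", "unknown")),
            "amount": float(record.get("amount", 0)),
            "currency": record.get("currency"),  # None when the field is absent
        }
```

Nothing is rejected at write time, so a different question asked later (page views, for instance) can apply a different schema to the same raw data.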
38. Build vs. Buy
[Diagram: three tiers, from "Faster" at the top to "Cheaper" at the bottom]
- Strategic and innovative assets: unique, differentiating, perceived as a competitive advantage → SPECIFIC
- Common to all companies in a sector, perceived as an advantage for production → COMMERCIAL SOFTWARE PACKAGES
- Resources: common to all companies, perceived as a resource → BPO
39. They use and contribute massively to open source
- Facebook: MySQL, Cassandra, Thrift, Open Compute (open source hardware and datacenter design)…
- Google: Android, GWT, Chromium, Linux kernel…
  - and through their papers: GFS, MapReduce
- LinkedIn: Voldemort, Kafka, Zoie…
- Netflix: a huge list of software…
"I trust software I hacked myself"
40. A way to expose services of
applications, to be re-used by
others to build and enrich their
own services and applications
46. Be a platform from the beginning
Jeff Bezos's memo (2002)
1) All teams will expose their data and functionality through service
interfaces.
2) Teams must communicate with each other through these interfaces.
3) There will be no other form of interprocess communication allowed: no
direct linking, no direct reads of another team’s data store, no shared-memory
model, no back-doors whatsoever. The only communication
allowed is via service interface calls over the network.
4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub,
custom protocols — doesn’t matter. Bezos doesn’t care.
5) All service interfaces, without exception, must be designed from the
ground up to be externalizable. That is to say, the team must plan and
design to be able to expose the interface to developers in the outside world.
No exceptions.
6) Anyone who doesn’t do this will be fired.
7) Thank you; have a nice day!
47. Open API: why do it
- Leverage effect
  - enrich your service portfolio and business opportunities with many partners
- Do bigger things by using the « collective intelligence of the world »
- Create an ecosystem around you
- Improve quality
  - If you want your APIs to be used,
  - Companies around the world are looking at what you are doing: this puts pressure on you to improve
- Attract talented people
  - The best way to attract good developers: they will want to come and work with those who created these APIs
50. We try things. We celebrate our failures.
This is a company where it is absolutely OK
to try something that is very hard, have it not be
successful, take the learning and apply it to
something new
Eric Schmidt
former CEO of Google
Move fast and break things
Mark Zuckerberg
Facebook
Failure is totally OK.
As long as you fail fast
Marissa Mayer
Yahoo
52. The minimum viable product
is that version of a new product
which allows a team to collect the
maximum amount of validated
learning about customers with the
least effort
Eric Ries
pioneer of Lean Startup
61. How long would it take your
organization to deploy a change that involves
just one single line of code?
Mary Poppendieck
From Concept To Cash
62.
- 2 deployments per day
- A deployment somewhere in the datacenters every 11 seconds; at any moment, an average of 10,000 servers are being updated
- 10 deployments per day
63. Why deploy continuously ?
- Improve time to market
- Learn faster (and it needs metrics!)
[Diagram: feedback loop IDEAS → code fast → CODE → measure fast → DATA → learn fast → IDEAS]
64. Why deploy continuously ?
- Smaller changes = shorter time to recover
- You reduce risk by lowering the impact of problems
68. Tools and practices
- Continuous integration
- TDD: Test-Driven Development (automated unit testing)
- Code reviews
- Continuous code auditing (Sonar…)
- Functional test automation
- Strong non-functional tests (performance, availability…)
- Automated packaging and deployment, independent of the target environment
- Zero-downtime deployment
69. Feature flipping
- Pushing code to production != pushing a feature to production
- Enable or disable a new feature in production in seconds
- "Graceful degradation" during traffic peaks
- Can be used for A/B testing!
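Feature flipping can be sketched with a small in-memory flag registry. This is an illustrative toy, not any particular company's tool: the `FeatureFlags` class, its method names and the hash-based bucketing are assumptions made for the example.

```python
import hashlib


class FeatureFlags:
    """Toy feature-flag registry with percentage rollout (illustrative only)."""

    def __init__(self):
        self._rollout = {}  # feature name -> percentage of users (0..100)

    def set_rollout(self, feature, percent):
        """Flip a feature for a share of users without redeploying code."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def is_enabled(self, feature, user_id):
        """Unknown features are off; a user always lands in the same bucket."""
        percent = self._rollout.get(feature, 0)
        digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent
```

The code for a feature ships dark with its rollout at 0. Setting it to 100 turns the feature on for everyone, setting it back to 0 is the kill switch for graceful degradation, and an intermediate value gives a stable split of users that can serve as an A/B test.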
70. Datamodel evolution strategy example
[Diagram: the datamodel moves from version N to version N+1 while the application moves from V.1, through a hybrid V.1 + V.2 release, to V.2]
71. Dark Launch @ Facebook
We chose to simulate the impact of
many real users hitting many machines by
means of a “dark launch” period in which
Facebook pages would make connections to
the chat servers, query for presence
information and simulate message sends
without a single UI element drawn
on the page.
YES !
IT’S A LOAD TEST ON A PRODUCTION PLATFORM !
77. Netflix Hystrix
Hystrix is a latency and fault tolerance library
designed to isolate points of access to remote
systems, services and 3rd party libraries, stop
cascading failure and enable resilience in complex
distributed systems where failure is inevitable.
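The idea of stopping cascading failures can be illustrated with a circuit breaker, one of the patterns this kind of library provides. The sketch below is not Hystrix's API (Hystrix is a Java library); the class names, thresholds and injectable clock are assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Toy circuit breaker: fail fast after repeated failures, retry later."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback=None):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                return self._fail_fast(fallback)
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            if fallback is not None:
                return fallback()
            raise
        # Success: close the circuit and forget past failures.
        self._failures = 0
        self._opened_at = None
        return result

    def _fail_fast(self, fallback):
        if fallback is not None:
            return fallback()
        raise CircuitOpenError("circuit is open; call skipped")
```

While the circuit is open, the struggling remote system receives no traffic at all and callers get the fallback immediately, instead of piling up threads waiting on timeouts.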
79. In God we trust.
All others must bring data
W. Edwards Deming
80. Everyone must be able to experiment, learn and iterate.
Position, obedience and tradition should hold no power.
For innovation to flourish, measurement must rule.
Werner Vogels,
CTO of Amazon
81.
- They measure everything
  - usage
  - infrastructure, from datacenter to HDD power consumption
  - operational process efficiency
  - …
  - self-service restaurant queue length!
  - management practices (Google)
- Good ideas come from the field, from real data, because managers always have biases when they try to interpret situations
82. Best size for teams
http://www.qsm.com/process_improvement_01.html
89. If an idea is worth 1,
a well-executed idea is worth
$100… $1,000… $10,000,000!
90. Attract and hire the best
WHAT FACEBOOK EMPLOYEES EARN:
- Senior software engineer: $132,503
- Product manager: $130,143
- User interface engineer: $129,136
- Machine learning engineer: $123,379
- Engineering manager: $123,379
- Software engineer: $111,562
- Project manager: $98,302
- Operations engineer: $82,626
- Site reliability engineer: $80,413
- Software engineering intern: $74,700
- Account executive: $62,124
- Network engineer: $121,500
- Business development mgr: $115,000
Source: www.glassdoor.com/index.htm
They are also known for tough technical interviews, to hire only the best developers!
91. Develop the talents !
- Lots of training
- Code review / pair programming
- Mentoring
- Slack time dedicated to R&D or personal projects
- Hackathons
- Strong open source involvement