Web Giants Innovations

WEB GIANTS
Innovations, Practices, Culture

Mathieu DESPRIEE
mde@octo.com


© OCTO 2014

Soon in English!


Digitalization

Yesterday, the Internet was a tool

Today, digital technologies are changing everything: the way we communicate, work, learn, do business… the way we live

http://postscapes.com/internet-of-things-examples/

All this data will end up in the IT system of some company, and they will make money from it

“Big data is the new oil”

It’s not only about data: there will be new usages, new services… new competitors!

Sooner or later, every company will face the problems the web giants had to face

BIGGER

FASTER

BETTER


High-end machine / mainframe
Highly redundant hardware
Symmetric multi-processing
Lots of CPU, RAM, disk

“Commodity hardware”
x86 machines
Pizza box with few CPUs and disks
No hardware redundancy

They measured everything:
- Power efficiency of all hardware parts
- Performance-to-power ratio, $ per transaction, etc.
- Cost models of failures

For them: commodity hardware is 3 to 12 times cheaper
They started to design datacenters based only on commodity hardware
They started to design applications distributed over thousands of unreliable machines

Small is beautiful, but…

Web giants are the champions of infrastructure automation; that’s why they became champions of the cloud

Need to completely redefine application resilience, since the hardware is not reliable and constantly fails.

Having to deploy on many machines changes everything: you need to automate things

Resilience must be handled by software, especially for databases

SHARDING
NoSQL

NoSQL
- « Not Only SQL »
- To go beyond RDBMS limitations

Google: BigTable
Amazon: DynamoDB
Facebook: Cassandra, sharded key-value MySQL
LinkedIn: Voldemort
etc.

The need for speed

Amazon: 100 ms of latency degradation = -1% of revenues
Google: more than 500 ms in page load = -20% of page views
Yahoo: more than 400 ms in page load = +5 to 9% of bounce
Bing: more than 1 s in page load = -2.8% in ad revenues

… and availability

Amazon: 1 min of unavailability = 50 K$ of revenue loss

(The blink of an eye is 300 ms)


New storage architectures and the CAP theorem
« Consistency »
All users have the same version of information

« Availability »
Users can access the system (read or write)

« Partition tolerance »
The system continues to work in case of network partition, i.e. when different nodes cannot communicate

Can pick only two!

A is also related to response time. The more you look for consistency, the worse the latency will be.

RDBMS universe

Large websites use “eventually consistent” datastores (NoSQL)

NoSQL

A radically different approach to databases
- Distributed storage, tolerating failure by replicating data
- Consistency constraint is relaxed: eventual consistency
- Focus is on availability and low response times (low latency)
- Linear horizontal scalability

Variety of data models
- key/value
- column-oriented
- graph

Different sharding approaches

Google
- BigTable, with the distributed file system GFS

Amazon
- Famous paper about Dynamo, a key/value store organised as a replication ring with consistent hashing, and an original approach to eventual consistency

Facebook
- Cassandra, inspired by both BigTable and Dynamo
- also: a specific design of sharded MySQL used as a key/value store

…
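
To make the "ring of replication with consistent hashing" idea more concrete, here is a minimal Python sketch. It is not the code of Dynamo or Cassandra: the class name `HashRing`, the `vnodes` parameter and the use of MD5 for placement are assumptions chosen for illustration only.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map any string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node owns several points on the ring to spread the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def replicas(self, key, count=3):
        """Walk clockwise from the key and return up to `count` distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (_position(key), ""))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == count:
                    break
        return found
```

The property of interest is that removing a node only moves the keys that node owned; keys stored on other nodes stay where they are, which is what makes adding or losing machines cheap.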

BigData
Hadoop

Exponential growth of capacities
CPU, memory, network bandwidth, storage… all of them followed Moore’s law

Source :
http://strata.oreilly.com/2011/08/building-data-startups.html

[Chart: disk throughput in MB/s, 1990 to 2010, rising from 0.7 MB/s to 64 MB/s (Seagate Barracuda 7200.10). Other drives labelled on the chart: IBM DTTA 35010, Seagate Barracuda ATA IV]

Storage capacity: x 100’000
Throughput: x 91

We can store 100’000 times more data, but it takes 1000 times longer to read it!
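
The "1000 times longer" figure follows from the two growth factors on the slide. A short Python check, assuming that the time to read a full disk is simply capacity divided by throughput:

```python
# Growth factors taken from the slide (1990 to 2010).
capacity_growth = 100_000
throughput_growth = 91

# Time to read a whole disk = capacity / throughput,
# so the scan time grows by the ratio of the two factors.
scan_time_growth = capacity_growth / throughput_growth

print(round(scan_time_growth))  # 1099, i.e. roughly 1000 times longer
```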

Google paper: MapReduce

Key principles
- Parallelize, distribute, and load-balance processing
- Fault-tolerant (hide failure of nodes during the processing)
- Co-location of processing and data
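
As an illustration of the map / shuffle / reduce phases, here is a toy word count in Python. It runs on one machine with a thread pool standing in for the cluster, so it shows the programming model only, not fault tolerance or data locality. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word of one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk is mapped independently, which is what allows parallelism.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

For example, `word_count(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`.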

Overview of Hadoop architecture

[Diagram layers: Distributed Storage, Distributed Processing, Orchestration, Advanced processing, Querying, Integration w/ Information System, Monitoring and Management]

A new way of doing BI and data analytics

Consider that all the data is valuable, and store everything: structured and unstructured data

Scale to petabytes of storage, at a low cost
- Yahoo has a cluster of 42’000 nodes

Don’t force the data to match a predefined data model (tables and schema); instead use a “schema-on-read” approach

Don’t move the data (ETL) to process it; instead move the processing to the data (MapReduce)
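
A small Python sketch of what "schema-on-read" can look like: raw records are stored untouched, and a schema is applied only when a given analysis reads them. The sample data, the field names and the `read_with_schema` helper are all made up for illustration.

```python
import json

# Raw events are stored as-is, whatever their shape (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "country": "FR"}',
    '{"user": "bob", "amount": 7}',
    'not even json',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: project the wanted fields and cast them.

    `schema` maps a field name to a (cast_function, default_value) pair.
    Lines that cannot be parsed are skipped rather than rejected at write time.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        row = {}
        for field, (cast, default) in schema.items():
            try:
                row[field] = cast(record[field])
            except (KeyError, TypeError, ValueError):
                row[field] = default
        yield row


# One possible view of the data; another analysis could read other fields.
SALES_SCHEMA = {"user": (str, None), "amount": (float, 0.0)}
rows = list(read_with_schema(RAW_EVENTS, SALES_SCHEMA))
```

The same raw lines can later be read with a different schema (for instance one that keeps `country`) without rewriting what was stored.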

Build vs. Buy

From strategic and innovative assets (faster) down to resources (cheaper):
- SPECIFIC: unique, differentiating, perceived as a competitive advantage
- COMMERCIAL SOFTWARE PACKAGES: common to all companies in a sector, perceived as an advantage for production
- BPO: common to all companies, perceived as a resource

They use and contribute massively to open source

Facebook: MySQL, Cassandra, Thrift, Open Compute (open source hardware and datacenter design)…

Google: Android, GWT, Chromium, Linux kernel…
- through their papers: GFS, MapReduce

LinkedIn: Voldemort, Kafka, Zoie…

Netflix: a huge list of software…

I trust software I hacked myself

A way to expose services of
applications, to be re-used by
others to build and enrich their
own services and applications


http://www.programmableweb.com/

- They take advantage of innovation made by others (individuals or companies)
- Crowdsourced R&D!

Be a platform from the beginning
Jeff Bezos’s memo (2002)
1) All teams will expose their data and functionality through service
interfaces.
2) Teams must communicate with each other through these interfaces.
3) There will be no other form of interprocess communication allowed: no
direct linking, no direct reads of another team’s data store, no
shared-memory model, no back-doors whatsoever. The only communication
allowed is via service interface calls over the network.
4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub,
custom protocols — doesn’t matter. Bezos doesn’t care.
5) All service interfaces, without exception, must be designed from the
ground up to be externalizable. That is to say, the team must plan and
design to be able to expose the interface to developers in the outside world.
No exceptions.
6) Anyone who doesn’t do this will be fired.
7) Thank you; have a nice day!

Open API: why do it

Leverage effect
- enrich your service portfolio and business opportunities with many partners

Do bigger things by using the « collective intelligence of the world »

Create an ecosystem around you

Improve the quality
- If you want your APIs to be used,
- Companies of the world are looking at what you are doing → it brings pressure on you to improve

Attract talented people
- The best way to attract good developers: they will want to come and work with those who created these APIs

FASTER
One of the things we most value at
Facebook engineering is moving fast.


We try things. We celebrate our failures.

This is a company where it is absolutely OK to try something that is very hard, have it not be successful, take the learning and apply it to something new

Eric Schmidt
former CEO of Google

Move fast and break things

Mark Zuckerberg
Facebook

Failure is totally OK.

As long as you fail fast


Marissa Mayer
Yahoo

The minimum viable product
is that version of a new product
which allows a team to collect the
maximum amount of validated
learning about customers with the
least effort

Eric Ries
pioneer of Lean Startup


Short cycles to validate quickly each hypothesis

Lean Startup example


Multi-variant testing / Google analytics


How long would it take your
organization to deploy a change that involves
just one single line of code?

Mary Poppendieck
From Concept To Cash

2 deployments per day

A deployment somewhere in the datacenters every 11 seconds
At any moment, an average of 10’000 servers are being updated

10 deployments / day

Why deploy continuously ?

Improve Time To Market
Learn Faster (and it needs metrics!)

[Loop: IDEAS → CODE FAST → CODE → MEASURE FAST → DATA → LEARN FAST → IDEAS]

Why deploy continuously ?

Smaller changes = shorter time to recover
You reduce risk by lowering the impact of problems

DevOps

1. Infrastructure as Code
2. Continuous Delivery
3. Collaboration

Infra as Code : Industrialize and Automate everything

logstash
chef
puppet
vagrant
git
capistrano
OpenStack
test-driven infrastructure

Continuous Delivery : a pipeline to bring code to production


Tools and practices
- Continuous integration
- TDD - Test Driven Development (automated unit testing)
- Code reviews
- Continuous code auditing (sonar…)
- Functional test automation
- Strong non-functional tests (performance, availability…)
- Automated packaging and deployment, independent of target environment
- Zero downtime deployment

Feature flipping

- Push code to production != push a feature to production
- Enable / disable a new feature on production in seconds
- “Graceful degradation” during peaks of traffic
- Can be used for A/B testing!
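
A minimal Python sketch of feature flipping, assuming an in-memory flag store (a real one would live in a shared configuration service). The `FeatureFlags` class and the flag name are hypothetical. The percentage rollout shows how the same mechanism can serve A/B testing.

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    def __init__(self):
        self._rollout = {}  # flag name -> percentage of users (0..100)

    def set_rollout(self, name, percent):
        """Enable a feature for `percent` % of users; 0 disables it."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[name] = percent

    def is_enabled(self, name, user_id):
        percent = self._rollout.get(name, 0)  # unknown flags are off
        # Hash the flag and user together so each user always lands
        # in the same bucket for a given flag.
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


flags = FeatureFlags()
flags.set_rollout("new_chat", 50)  # code is deployed, feature shown to ~half the users

if flags.is_enabled("new_chat", user_id="user-42"):
    pass  # render the new feature
else:
    pass  # keep the existing behaviour
```

Setting the rollout back to 0 turns the feature off without a new deployment, which is also how a costly feature can be switched off during a traffic peak.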

Datamodel evolution strategy example
[Diagram: application V.1 on datamodel version N → hybrid application (V.1 + V.2) on datamodel versions N and N+1 → application V.2 on datamodel version N+1]
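
The hybrid step can be sketched in Python as a reader that accepts both datamodel versions and upgrades old records on the fly. The record layout (`name` in version N, `first_name` / `last_name` in version N+1) and the `schema_version` field are assumptions invented for this example.

```python
def read_user(record):
    """Hybrid reader: accepts datamodel version N and version N+1.

    Hypothetical layouts:
      version N   -> {"name": "Ada Lovelace"}
      version N+1 -> {"schema_version": 2, "first_name": ..., "last_name": ...}
    """
    if record.get("schema_version", 1) >= 2:
        return record  # already in the new model
    # Old record: convert it to the new model while reading.
    first, _, last = record["name"].partition(" ")
    return {"schema_version": 2, "first_name": first, "last_name": last}


old = {"name": "Ada Lovelace"}
new = {"schema_version": 2, "first_name": "Grace", "last_name": "Hopper"}

upgraded = read_user(old)
untouched = read_user(new)
```

Once every stored record has been rewritten in the new model, the branch handling version N can be deleted, which corresponds to the last step of the strategy.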

Dark Launch @ Facebook

We chose to simulate the impact of
many real users hitting many machines by
means of a “dark launch” period in which
Facebook pages would make connections to
the chat servers, query for presence
information and simulate message sends
without a single UI element drawn
on the page.

YES !
IT’S A LOAD TEST ON A PRODUCTION PLATFORM !

You build it,

You run it !


Shared tools that make interactions easier

Open the tools to the devs!

3. COLLABORATION
(culture, organisation…)

Netflix Hystrix

Hystrix is a latency and fault tolerance library
designed to isolate points of access to remote
systems, services and 3rd party libraries, stop
cascading failure and enable resilience in complex
distributed systems where failure is inevitable.
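
One pattern commonly associated with this kind of library is the circuit breaker. The Python sketch below is not Hystrix's API (Hystrix is a Java library); the `CircuitBreaker` class, its parameters and the fallback mechanism are assumptions meant only to show the idea of failing fast instead of letting failures cascade.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and no fallback was given."""


class CircuitBreaker:
    """Hypothetical circuit breaker around calls to a remote dependency."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds before a trial call is allowed
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open circuit: do not even try the remote call.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Otherwise: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures or self._opened_at is not None:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            if fallback is not None:
                return fallback()
            raise
        # Success: close the circuit and forget past failures.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_recommendations, user_id, fallback=lambda: [])`, where `fetch_recommendations` is a hypothetical function of the application.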

In God we trust.

All others must bring data

W. Edwards Deming


Everyone must be able to experiment, learn and iterate.

Position, obedience and tradition should hold no power.

For innovation to flourish, measurement must rule.

Werner Vogels,
CTO of Amazon

They measure everything
- usage
- infrastructure, from datacenter to HDD power consumption
- operational processes efficiency
- …
- self-service restaurant queue length!
- management practices (Google)

Good ideas come from the field, from real data, because managers always have biases when they try to interpret situations

Best size for teams


http://www.qsm.com/process_improvement_01.html

Use a component-oriented organization?

[Diagram: Features 1, 2, 4 and 5 cutting across component teams: Team Front, Team Back, Team middleware, Team framework]

Feature team = cross-functional team

Product Owner – UX designer – Developers – Testers – Ops

If an idea is worth $1,
a well-executed idea is worth
$100... $1’000... $10’000’000!

Attract and hire the best
WHAT FACEBOOK EMPLOYEES EARN:
Senior software engineer      $132,503
Product manager               $130,143
User interface engineer       $129,136
Machine learning engineer     $123,379
Engineering manager           $123,379
Network engineer              $121,500
Business development mgr      $115,000
Software engineer             $111,562
Project manager               $98,302
Operations engineer           $82,626
Site reliability engineer     $80,413
Software engineering intern   $74,700
Account executive             $62,124

Source: www.glassdoor.com/index.htm

They are also known to have tough technical interviews, to get only the best developers!

Develop the talents!
- Lots of training
- Code review / Pair programming
- Mentoring
- Slack time dedicated to R&D or personal projects
- Hackathons
- Strong open source involvement

THANK YOU!

To get these slides,
to get the book in French (for free),
to be notified when the book is available in English:

JUST SEND ME AN EMAIL!
mde@octo.com
