SMACK is an acronym for Spark, Mesos, Akka, Cassandra and Kafka. The title of the talk "provocatively" compares the technology stack for building Reactive applications with the one most commonly used in web development. The talk covers the basic concepts of Reactive programming, the conceptual differences this paradigm introduces compared with the "classic" approach to web programming, and some success stories from the use of these technologies.
3. ONCE UPON A TIME…
LAMP is an archetypal model of web
service stacks
Acronym of the names of its original open-
source components:
Linux Operating System
Apache Web Server
MySQL RDBMS
PHP Programming Language
5. ONCE UPON A TIME…
Specific solutions are required for
websites that serve large numbers of
requests, or provide high uptime
High-availability approaches may involve
multiple web and database servers,
combined with additional components to
distribute workload across multiple
servers
12. Responsive
o The system responds in a timely
manner if at all possible
Resilient
o The system stays responsive in
the face of failure
THE REACTIVE MANIFESTO
13. Message Driven
o Reactive Systems rely on asynchronous
message-passing to establish a boundary
between components that ensures loose
coupling, isolation and location
transparency
Elastic
o The system stays responsive under
varying workload reacting to changes in
the input rate by increasing or decreasing
the allocated resources
THE REACTIVE MANIFESTO
16. REACTIVE PROGRAMMING
Static or dynamic data streams can be
expressed with ease, and the execution
model tracks the dependencies inferred
between them, which allows changes to
propagate automatically through the
data flow
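As a rough illustration of automatic change propagation, here is a minimal sketch in plain Python. The `Cell` class and the `derived` helper are hypothetical names invented for this example, and the dependencies are declared explicitly rather than inferred, which a real reactive library would do for you.

```python
class Cell:
    """A value that notifies its dependents when it changes (hypothetical helper)."""

    def __init__(self, value=None):
        self._value = value
        self._subscribers = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        # Store the new value, then push the change to every dependent.
        self._value = value
        for notify in list(self._subscribers):
            notify()

    def subscribe(self, callback):
        self._subscribers.append(callback)


def derived(compute, *sources):
    """Build a cell that is recomputed whenever one of its sources changes."""
    out = Cell(compute(*(s.value for s in sources)))

    def recompute():
        out.set(compute(*(s.value for s in sources)))

    for source in sources:
        source.subscribe(recompute)
    return out


a = Cell(1)
b = Cell(2)
total = derived(lambda x, y: x + y, a, b)
a.set(10)  # total is recomputed without any explicit call
```

After `a.set(10)` the derived cell already holds the new sum: the caller never asks for a recalculation, the change flows through the declared dependency.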
17. REACTIVE PROGRAMMING
To show the real power of
Reactive, let's just say
that you want to have a
stream of "double click"
events
Difficult to manage in a
traditional imperative and
stateful fashion
4 lines of code using a
reactive approach
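The slide does not show the four lines themselves, so what follows is only a sketch of the idea in plain Python, not the code from the talk. It assumes clicks arrive as timestamps in milliseconds and that two clicks less than 250 ms apart belong to the same burst; `buffer_by_gap` and `double_clicks` are hypothetical names.

```python
def buffer_by_gap(events, max_gap_ms):
    """Group timestamps into bursts, closing a burst after a quiet gap."""
    burst = []
    for t in events:
        if burst and t - burst[-1] >= max_gap_ms:
            yield burst
            burst = []
        burst.append(t)
    if burst:
        yield burst


def double_clicks(clicks_ms, max_gap_ms=250):
    """Stream pipeline: buffer clicks, map to burst size, keep multi-clicks."""
    bursts = buffer_by_gap(clicks_ms, max_gap_ms)
    sizes = map(len, bursts)
    return filter(lambda n: n >= 2, sizes)


# Two quick clicks, one isolated click, then three quick clicks.
result = list(double_clicks([0, 100, 1000, 2000, 2100, 2150]))
```

The body of `double_clicks` reads as a declarative pipeline (buffer, map, filter) with no shared mutable state, which is the point the slide makes against the imperative, stateful version.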
19. Apache Mesos is an open-source project
to manage computer clusters originally
developed at the University of California,
Berkeley
Apache Mesos abstracts CPU, memory and
storage away from individual machines,
so that fault-tolerant and elastic
distributed systems can be built easily
and run effectively
APACHE MESOS
20. Mesos is built using the same principles as
the Linux kernel, only at a different level of
abstraction
The Mesos kernel runs on every machine
and provides applications (e.g., Hadoop,
Spark, Kafka, Elasticsearch) with APIs
for resource management and scheduling
across entire datacenter and cloud
environments
APACHE MESOS
21. Mesos Features
• Linear Scalability - scales to 10,000s of nodes
• High Availability - using Zookeeper
• Containers Support - Docker, AppC
• Two Level Scheduling
• HTTP APIs
• Web UI
• Cross Platform - Linux, Windows, OS X
• Resource Isolation
APACHE MESOS
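Two-level scheduling can be pictured as follows: the master decides which resources to offer to a framework, and the framework decides which tasks to place on the offers it receives. The sketch below is plain Python and does not use the Mesos API; `Offer`, `Task`, `master_offers` and `framework_schedule` are hypothetical names, and the first-fit placement is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def master_offers(agents):
    """Level 1: the master turns free agent resources into offers."""
    return [Offer(name, cpus, mem) for name, (cpus, mem) in agents.items()]


def framework_schedule(offers, pending):
    """Level 2: the framework picks which offered resources to use (first fit)."""
    launched = []
    remaining = list(pending)
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(remaining):
            if task.cpus <= cpus and task.mem_mb <= mem:
                launched.append((task.name, offer.agent))
                cpus -= task.cpus
                mem -= task.mem_mb
                remaining.remove(task)
    return launched, remaining


agents = {"agent-1": (2.0, 4096), "agent-2": (1.0, 1024)}
tasks = [Task("web", 1.0, 1024), Task("batch", 2.0, 2048), Task("cache", 1.0, 1024)]
launched, waiting = framework_schedule(master_offers(agents), tasks)
```

Here `web` and `cache` fit on `agent-1`, while `batch` stays pending until a large enough offer arrives. The master never needs to know what the tasks are, which is what lets several different frameworks share one cluster.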
23. Akka is an open-source toolkit and
runtime simplifying the construction of
concurrent and distributed applications
on the JVM
Akka supports multiple programming
models for concurrency, but it
emphasizes actor-based concurrency,
with inspiration drawn from Erlang
24. That model treats "actors" as the
universal primitives of concurrent
computation
In response to a message that it
receives, an actor can: make local
decisions, create more actors, send
more messages, and determine how to
respond to the next message received
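To make the model concrete, here is a minimal actor sketched with the Python standard library rather than with Akka itself. `CounterActor` and its `tell` method are hypothetical names chosen for this example: the actor owns its state, reads one message at a time from a mailbox, and is reachable only through messages.

```python
import queue
import threading


class CounterActor:
    """A minimal actor: private state, a mailbox, one message at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Fire-and-forget send: the sender never touches the actor's state."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1  # a local decision on private state
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)  # reply by sending a message
            elif message == "stop":
                break


actor = CounterActor()
for _ in range(3):
    actor.tell("increment")

replies = queue.Queue()
actor.tell("get", reply_to=replies)
count = replies.get(timeout=1)
actor.tell("stop")
```

Because messages are processed strictly in mailbox order by a single thread, the counter needs no lock even when many senders call `tell` concurrently.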
27. The Akka HTTP modules implement a full
server- and client-side HTTP stack on top of
akka-actor and akka-stream
It is not a web framework but rather a more
general toolkit for providing and consuming
HTTP-based services
Web frameworks like Play and Lagom both
use Akka internally
29. Apache Kafka is an open-source
stream processing platform maintained
by the Apache Software Foundation and
written in Scala and Java
The project aims to provide a unified,
high-throughput, low-latency platform
for handling real-time data feeds
KAFKA
30. Apache Kafka was originally developed
by LinkedIn, and was subsequently open
sourced in early 2011
Kafka stores messages that come from
arbitrarily many processes called
"producers". The data is organized into
"topics", and each topic can be split
into several "partitions"
KAFKA
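The relationship between producers, topics and partitions can be sketched in plain Python without a Kafka client. The `Topic` class below is a hypothetical in-memory stand-in: each partition is an append-only list, and a message key is hashed to choose the partition. CRC32 is used here only to keep the example deterministic; it is an assumption of this sketch, not the hash a real Kafka producer uses.

```python
import zlib


class Topic:
    """In-memory stand-in for a topic split into append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key, same partition: this preserves per-key ordering.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        """Append a message and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset):
        """Consumers read from a partition starting at a given offset."""
        return self.partitions[partition][offset:]


topic = Topic("page-views", num_partitions=3)
p1, o1 = topic.send("user-42", "home")
p2, o2 = topic.send("user-42", "cart")
```

Both messages for `user-42` land in the same partition at consecutive offsets, so a consumer reading that partition sees them in the order they were produced.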
33. KAFKA
What is Kafka good for?
It gets used for two broad classes of
application:
• Building real-time streaming data pipelines
that reliably get data between systems or
applications
• Building real-time streaming applications
that transform or react to the streams of
data
35. CASSANDRA
Apache Cassandra is an open-source
distributed NoSQL database
management system designed to handle
large amounts of data across many
commodity servers, providing high
availability with no single point of failure
Initially developed at Facebook to power
the inbox search feature
36. CASSANDRA
Main Features
• Decentralized (p2p architecture)
• Multi data center replication
• Scalability and fault-tolerance
• Tunable consistency
• SQL-like query language (CQL)
Cassandra is essentially a hybrid between a
key-value store (Amazon's Dynamo) and a
column-oriented store (Google's BigTable)
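"Tunable consistency" means that each read and write can choose how many replicas must acknowledge it. A small sketch of the usual rule of thumb, with hypothetical function names: when the number of write acknowledgements plus the number of read acknowledgements exceeds the replication factor, every read overlaps at least one replica holding the latest write.

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_acks, read_acks):
    """True when the read and write replica sets are guaranteed to overlap."""
    return write_acks + read_acks > replication_factor


rf = 3
strong = read_sees_latest_write(rf, quorum(rf), quorum(rf))  # quorum writes and reads
weak = read_sees_latest_write(rf, 1, 1)                      # single-replica writes and reads
```

With three replicas, quorum reads and writes (2 + 2 > 3) overlap, while single-replica reads and writes (1 + 1) trade that guarantee for lower latency and higher availability.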
38. Apache Spark is an open-source cluster-
computing framework originally developed at
the University of California, Berkeley's AMPLab
Apache Spark requires a cluster manager and
a distributed storage system
For cluster management, Spark supports
standalone (native Spark cluster), Hadoop
YARN, or Apache Mesos
40. For distributed storage, Spark can
interface with a wide variety of systems,
including HDFS (Hadoop Distributed File
System), Cassandra, OpenStack Swift,
Amazon S3, Kudu and others
Spark Streaming leverages Spark Core's
fast scheduling capability to perform
streaming analytics
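Spark Streaming processes a stream as a sequence of small batches. The sketch below imitates that idea in plain Python and does not use the Spark API: `micro_batches` and `word_counts` are hypothetical helpers, and the one-second batch interval is an assumption made for the example.

```python
from collections import Counter


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-size time windows."""
    batches = {}
    for ts, value in events:
        batches.setdefault(int(ts // batch_seconds), []).append(value)
    return [batches[k] for k in sorted(batches)]


def word_counts(batch):
    """The per-batch job: an ordinary batch computation run on each window."""
    return Counter(word for line in batch for word in line.split())


events = [(0.2, "spark kafka"), (0.9, "spark"), (1.4, "akka"), (2.1, "kafka kafka")]
results = [dict(word_counts(b)) for b in micro_batches(events, batch_seconds=1)]
```

Each window is handled by the same batch-style function, which is why quick scheduling of many small jobs matters for this style of streaming.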
41. Spark MLlib is a distributed machine
learning framework on top of Spark Core
Many common machine learning and
statistical algorithms have been
implemented and ship with MLlib,
which simplifies large-scale machine
learning pipelines