Thrift: Scalable Cross-Language Services Implementation

Mark Slee, Aditya Agarwal and Marc Kwiatkowski
Facebook, 156 University Ave, Palo Alto, CA
{mcslee,aditya,marc}@facebook.com

Abstract

Thrift is a software library and set of code-generation tools developed at Facebook to
expedite development and implementation of efficient and scalable backend services. Its
primary goal is to enable efficient and reliable communication across programming
languages by abstracting the portions of each language that tend to require the most
customization into a common library that is implemented in each language. Specifically,
Thrift allows developers to define datatypes and service interfaces in a single
language-neutral file and generate all the necessary code to build RPC clients and
servers.

This paper details the motivations and design choices we made in Thrift, as well as some
of the more interesting implementation details. It is not intended to be taken as
research, but rather it is an exposition on what we did and why.

1. Introduction

As Facebook’s traffic and network structure have scaled, the resource demands of many
operations on the site (e.g. search, ad selection and delivery, event logging) have
presented technical requirements drastically outside the scope of the LAMP framework.
In our implementation of these services, various programming languages have been
selected to optimize for the right combination of performance, ease and speed of
development, availability of existing libraries, etc. By and large, Facebook’s
engineering culture has tended towards choosing the best tools and implementations
available over standardizing on any one programming language and begrudgingly accepting
its inherent limitations.

Given this design choice, we were presented with the challenge of building a
transparent, high-performance bridge across many programming languages. We found that
most available solutions were either too limited, did not offer sufficient datatype
freedom, or suffered from subpar performance. [1]

The solution that we have implemented combines a language-neutral software stack
implemented across numerous programming languages and an associated code generation
engine that transforms a simple interface and data definition language into client and
server remote procedure call libraries. Choosing static code generation over a dynamic
system allows us to create validated code that can be run without the need for any
advanced introspective run-time type checking. It is also designed to be as simple as
possible for the developer, who can typically define all the necessary data structures
and interfaces for a complex service in a single short file.

Surprised that a robust open solution to these relatively common problems did not yet
exist, we committed early on to making the Thrift implementation open source.

In evaluating the challenges of cross-language interaction in a networked environment,
some key components were identified:

Types. A common type system must exist across programming languages without requiring
that the application developer use custom Thrift datatypes or write their own
serialization code. That is, a C++ programmer should be able to transparently exchange a
strongly typed STL map for a dynamic Python dictionary. Neither programmer should be
forced to write any code below the application layer to achieve this. Section 2 details
the Thrift type system.

Transport. Each language must have a common interface to bidirectional raw data
transport. The specifics of how a given transport is implemented should not matter to
the service developer. The same application code should be able to run against TCP
stream sockets, raw data in memory, or files on disk. Section 3 details the Thrift
Transport layer.

Protocol. Datatypes must have some way of using the Transport layer to encode and decode
themselves. Again, the application developer need not be concerned by this layer.
Whether the service uses an XML or binary protocol is immaterial to the application
code. All that matters is that the data can be read and written in a consistent,
deterministic manner. Section 4 details the Thrift Protocol layer.

Versioning. For robust services, the involved datatypes must provide a mechanism for
versioning themselves. Specifically, it should be possible to add or remove fields in an
object or alter the argument list of a function without any interruption in service (or,
worse yet, nasty segmentation faults). Section 5 details Thrift’s versioning system.

Processors. Finally, we generate code capable of processing data streams to accomplish
remote procedure calls. Section 6 details the generated code and TProcessor paradigm.

Section 7 discusses implementation details, and Section 8 describes our conclusions.

[1] See Appendix A for a discussion of alternative systems.

2. Types

The goal of the Thrift type system is to enable programmers to develop using completely
natively defined types, no matter what programming language they use. By design, the
Thrift type system does not introduce any special dynamic types or wrapper objects. It
also does not require that the developer write any code for object serialization or
transport. The Thrift IDL (Interface Definition Language) file is logically a way for
developers to annotate their data structures with the minimal amount of extra
information necessary to tell a code generator how to safely transport the objects
across languages.

2.1 Base Types

The type system rests upon a few base types. In considering which types to support, we
aimed for clarity and simplicity over abundance, focusing on the key types available in
all programming languages, omitting any niche types available only in specific
languages.

The base types supported by Thrift are:

 • bool A boolean value, true or false
 • byte A signed byte
 • i16 A 16-bit signed integer
 • i32 A 32-bit signed integer
 • i64 A 64-bit signed integer
 • double A 64-bit floating point number
 • string An encoding-agnostic text or binary string

Of particular note is the absence of unsigned integer types. Because these types have no
direct translation to native primitive types in many languages, the advantages they
afford are lost. Further, there is no way to prevent the application developer in a
language like Python from assigning a negative value to an integer variable, leading to
unpredictable behavior. From a design standpoint, we observed that unsigned integers
were very rarely, if ever, used for arithmetic purposes, but in practice were much more
often used as keys or identifiers. In this case, the sign is irrelevant. Signed integers
serve this same purpose and can be safely cast to their unsigned counterparts (most
commonly in C++) when absolutely necessary.

2.2 Structs

A Thrift struct defines a common object to be used across languages. A struct is
essentially equivalent to a class in object oriented programming languages. A struct has
a set of strongly typed fields, each with a unique name identifier. The basic syntax for
defining a Thrift struct looks very similar to a C struct definition. Fields may be
annotated with an integer field identifier (unique to the scope of that struct) and
optional default values. Field identifiers will be automatically assigned if omitted,
though they are strongly encouraged for versioning reasons discussed later.

2.3 Containers

Thrift containers are strongly typed containers that map to the most commonly used
containers in common programming languages. They are annotated using the C++ template
(or Java Generics) style. There are three types available:

 • list<type> An ordered list of elements. Translates directly into an STL vector, Java
   ArrayList, or native array in scripting languages. May contain duplicates.
 • set<type> An unordered set of unique elements. Translates into an STL set, Java
   HashSet, set in Python, or native dictionary in PHP/Ruby.
 • map<type1,type2> A map of strictly unique keys to values. Translates into an STL map,
   Java HashMap, PHP associative array, or Python/Ruby dictionary.

While defaults are provided, the type mappings are not explicitly fixed. Custom code
generator directives have been added to substitute custom types in destination languages
(e.g. hash map or Google’s sparse hash map can be used in C++). The only requirement is
that the custom types support all the necessary iteration primitives. Container elements
may be of any valid Thrift type, including other containers or structs.

struct Example {
  1:i32 number=10,
  2:i64 bigNumber,
  3:double decimals,
  4:string name="thrifty"
}

In the target language, each definition generates a type with two methods, read and
write, which perform serialization and transport of the objects using a Thrift TProtocol
object.

2.4 Exceptions

Exceptions are syntactically and functionally equivalent to structs except that they are
declared using the exception keyword instead of the struct keyword.

The generated objects inherit from an exception base class as appropriate in each target
programming language, in order to seamlessly integrate with native exception handling in
any given language. Again, the design emphasis is on making the code familiar to the
application developer.

2.5 Services

Services are defined using Thrift types. Definition of a service is semantically
equivalent to defining an interface (or a pure virtual abstract class) in object
oriented programming. The Thrift compiler generates fully functional client and server
stubs that implement the interface. Services are defined as follows:

service <name> {
  <returntype> <name>(<arguments>)
    [throws (<exceptions>)]
  ...
}

An example:

service StringCache {
  void set(1:i32 key, 2:string value),
  string get(1:i32 key) throws (1:KeyNotFound knf),
  void delete(1:i32 key)
}

Note that void is a valid type for a function return, in addition to all other defined
Thrift types. Additionally, an async modifier keyword may be added to a void function,
which will generate code that does not wait for a response from the server. Note that a
pure void function will return a response to the client which guarantees that the
operation has completed on the server side. With async method calls the client will only
be guaranteed that the request succeeded at the transport layer. (In many transport
scenarios this is inherently unreliable due to the Byzantine Generals’ Problem.
Therefore, application developers should take care only to use the async optimization in
cases where dropped method calls are acceptable or the transport is known to be
reliable.)

Also of note is the fact that argument lists and exception lists for functions are
implemented as Thrift structs. All three constructs are identical in both notation and
behavior.

3. Transport

The transport layer is used by the generated code to facilitate data transfer.

3.1 Interface

A key design choice in the implementation of Thrift was to decouple the transport layer
from the code generation layer. Though Thrift is typically used on top of the TCP/IP
stack with streaming
sockets as the base layer of communication, there was no compelling reason to build that
constraint into the system. The performance tradeoff incurred by an abstracted I/O layer
(roughly one virtual method lookup / function call per operation) was immaterial
compared to the cost of actual I/O operations (typically invoking system calls).

Fundamentally, generated Thrift code only needs to know how to read and write data. The
origin and destination of the data are irrelevant; it may be a socket, a segment of
shared memory, or a file on the local disk. The Thrift transport interface supports the
following methods:

 • open Opens the transport
 • close Closes the transport
 • isOpen Indicates whether the transport is open
 • read Reads from the transport
 • write Writes to the transport
 • flush Forces any pending writes

There are a few additional methods not documented here which are used to aid in batching
reads and optionally signaling the completion of a read or write operation from the
generated code.

In addition to the above TTransport interface, there is a TServerTransport interface
used to accept or create primitive transport objects. Its interface is as follows:

 • open Opens the transport
 • listen Begins listening for connections
 • accept Returns a new client transport
 • close Closes the transport

3.2 Implementation

The transport interface is designed for simple implementation in any programming
language. New transport mechanisms can be easily defined as needed by application
developers.

3.2.1 TSocket

The TSocket class is implemented across all target languages. It provides a common,
simple interface to a TCP/IP stream socket.

3.2.2 TFileTransport

The TFileTransport is an abstraction of an on-disk file to a data stream. It can be used
to write out a set of incoming Thrift requests to a file on disk. The on-disk data can
then be replayed from the log, either for post-processing or for reproduction and/or
simulation of past events.

3.2.3 Utilities

The Transport interface is designed to support easy extension using common OOP
techniques, such as composition. Some simple utilities include the TBufferedTransport,
which buffers the writes and reads on an underlying transport, the TFramedTransport,
which transmits data with frame size headers for chunking optimization or nonblocking
operation, and the TMemoryBuffer, which allows reading and writing directly from the
heap or stack memory owned by the process.

4. Protocol

A second major abstraction in Thrift is the separation of data structure from transport
representation. Thrift enforces a certain messaging structure when transporting data,
but it is agnostic to the protocol encoding in use. That is, it does not matter whether
data is encoded as XML, human-readable ASCII, or a dense binary format as long as the
data supports a fixed set of operations that allow it to be deterministically read and
written by generated code.

4.1 Interface

The Thrift Protocol interface is very straightforward. It fundamentally supports two
things: 1) bidirectional sequenced messaging, and 2) encoding of base types, containers,
and structs.

writeMessageBegin(name, type, seq)
writeMessageEnd()
writeStructBegin(name)
writeStructEnd()
writeFieldBegin(name, type, id)
writeFieldEnd()
writeFieldStop()
writeMapBegin(ktype, vtype, size)
writeMapEnd()
writeListBegin(etype, size)
writeListEnd()
writeSetBegin(etype, size)
writeSetEnd()
writeBool(bool)
writeByte(byte)
writeI16(i16)
writeI32(i32)
writeI64(i64)
writeDouble(double)
writeString(string)

name, type, seq = readMessageBegin()
                  readMessageEnd()
name            = readStructBegin()
                  readStructEnd()
name, type, id  = readFieldBegin()
                  readFieldEnd()
k, v, size      = readMapBegin()
                  readMapEnd()
etype, size     = readListBegin()
                  readListEnd()
etype, size     = readSetBegin()
                  readSetEnd()
bool            = readBool()
byte            = readByte()
i16             = readI16()
i32             = readI32()
i64             = readI64()
double          = readDouble()
string          = readString()

Note that every write function has exactly one read counterpart, with the exception of
writeFieldStop(). This is a special method that signals the end of a struct. The
procedure for reading a struct is to readFieldBegin() until the stop field is
encountered, and then to readStructEnd(). The generated code relies upon this call
sequence to ensure that everything written by a protocol encoder can be read by a
matching protocol decoder. Further note that this set of functions is by design more
robust than necessary. For example, writeStructEnd() is not strictly necessary, as the
end of a struct may be implied by the stop field. This method is a convenience for
verbose protocols in which it is cleaner to separate these calls (e.g. a closing
</struct> tag in XML).
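
The struct read procedure in Section 4.1 can be illustrated with a small C++ sketch. It
is not Thrift's implementation: ToyProtocol, the TType tag values and the byte layout
are assumptions made for this example, and the field name argument of writeFieldBegin
is dropped for brevity. It shows a writer emitting fields followed by a stop field, and
a reader looping on readFieldBegin() until the stop field, skipping any field it does
not recognize by using the type tag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Type tags invented for this sketch; they are not Thrift's wire constants.
enum class TType : uint8_t { STOP = 0, I32 = 1, STRING = 2 };

// Toy protocol that encodes into an in-memory byte buffer (hypothetical class).
class ToyProtocol {
 public:
  void writeStructBegin() {}  // struct begin/end carry no bytes in this encoding
  void writeStructEnd() {}
  void writeFieldBegin(TType type, int16_t id) {
    buf_.push_back(static_cast<uint8_t>(type));
    writeBE(static_cast<uint16_t>(id), 2);
  }
  void writeFieldStop() { buf_.push_back(static_cast<uint8_t>(TType::STOP)); }
  void writeI32(int32_t v) { writeBE(static_cast<uint32_t>(v), 4); }
  void writeString(const std::string& s) {
    writeI32(static_cast<int32_t>(s.size()));  // length prefix, then raw bytes
    buf_.insert(buf_.end(), s.begin(), s.end());
  }

  void readStructBegin() {}
  void readStructEnd() {}
  // Returns the field type; id is only updated for non-STOP fields.
  TType readFieldBegin(int16_t& id) {
    TType type = static_cast<TType>(readByte());
    if (type != TType::STOP) id = static_cast<int16_t>(readBE(2));
    return type;
  }
  int32_t readI32() { return static_cast<int32_t>(readBE(4)); }
  std::string readString() {
    int32_t n = readI32();
    if (n < 0 || static_cast<std::size_t>(n) > buf_.size() - pos_)
      throw std::runtime_error("bad string length");
    std::string s(buf_.begin() + pos_, buf_.begin() + pos_ + n);
    pos_ += static_cast<std::size_t>(n);
    return s;
  }
  // Consumes a value of the given type without interpreting it.
  void skip(TType type) {
    if (type == TType::I32) readI32();
    else if (type == TType::STRING) readString();
    else throw std::runtime_error("cannot skip unknown type");
  }

 private:
  void writeBE(uint32_t v, int bytes) {  // big-endian (network byte order)
    assert(bytes >= 1 && bytes <= 4);
    for (int i = bytes - 1; i >= 0; --i)
      buf_.push_back(static_cast<uint8_t>(v >> (8 * i)));
  }
  uint32_t readBE(int bytes) {
    uint32_t v = 0;
    for (int i = 0; i < bytes; ++i) v = (v << 8) | readByte();
    return v;
  }
  uint8_t readByte() {
    if (pos_ >= buf_.size()) throw std::runtime_error("buffer underflow");
    return buf_[pos_++];
  }
  std::vector<uint8_t> buf_;
  std::size_t pos_ = 0;
};

// Two fields of the paper's Example struct, using its field identifiers 1 and 4.
struct Example {
  int32_t number = 10;
  std::string name = "thrifty";
};

void writeExample(ToyProtocol& p, const Example& e) {
  p.writeStructBegin();
  p.writeFieldBegin(TType::I32, 1);
  p.writeI32(e.number);
  p.writeFieldBegin(TType::I32, 99);  // a field the reader below does not know
  p.writeI32(7);
  p.writeFieldBegin(TType::STRING, 4);
  p.writeString(e.name);
  p.writeFieldStop();
  p.writeStructEnd();
}

Example readExample(ToyProtocol& p) {
  Example e;
  int16_t id = 0;
  p.readStructBegin();
  for (;;) {
    TType type = p.readFieldBegin(id);
    if (type == TType::STOP) break;  // end of struct
    if (id == 1 && type == TType::I32) e.number = p.readI32();
    else if (id == 4 && type == TType::STRING) e.name = p.readString();
    else p.skip(type);  // unrecognized field: skip by type
  }
  p.readStructEnd();
  return e;
}
```

Because the reader never needs the total struct length, the same loop works whether the
bytes arrive all at once or as a stream.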
4.2 Structure

Thrift structures are designed to support encoding into a streaming protocol. The
implementation should never need to frame or compute the entire data length of a
structure prior to encoding it. This is critical to performance in many scenarios.
Consider a long list of relatively large strings. If the protocol interface required
reading or writing a list to be an atomic operation, then the implementation would need
to perform a linear pass over the entire list before encoding any data. However, if the
list can be written as iteration is performed, the corresponding read may begin in
parallel, theoretically offering an end-to-end speedup of (kN − C), where N is the size
of the list, k the cost factor associated with serializing a single element, and C is a
fixed offset for the delay between data being written and becoming available to read.

Similarly, structs do not encode their data lengths a priori. Instead, they are encoded
as a sequence of fields, with each field having a type specifier and a unique field
identifier. Note that the inclusion of type specifiers allows the protocol to be safely
parsed and decoded without any generated code or access to the original IDL file.
Structs are terminated by a field header with a special STOP type. Because all the basic
types can be read deterministically, all structs (even those containing other structs)
can be read deterministically. The Thrift protocol is self-delimiting without any
framing and regardless of the encoding format.

In situations where streaming is unnecessary or framing is advantageous, it can be very
simply added into the transport layer, using the TFramedTransport abstraction.

4.3 Implementation

Facebook has implemented and deployed a space-efficient binary protocol which is used by
most backend services. Essentially, it writes all data in a flat binary format. Integer
types are converted to network byte order, strings are prepended with their byte length,
and all message and field headers are written using the primitive integer serialization
constructs. String names for fields are omitted; when using generated code, field
identifiers are sufficient.

We decided against some extreme storage optimizations (e.g. packing small integers into
ASCII or using a 7-bit continuation format) for the sake of simplicity and clarity in
the code. These alterations can easily be made if and when we encounter a
performance-critical use case that demands them.

5. Versioning

Thrift is robust in the face of versioning and data definition changes. This is critical
to enable staged rollouts of changes to deployed services. The system must be able to
support reading of old data from log files, as well as requests from out-of-date clients
to new servers, and vice versa.

5.1 Field Identifiers

Versioning in Thrift is implemented via field identifiers. The field header for every
member of a struct in Thrift is encoded with a unique field identifier. The combination
of this field identifier and its type specifier is used to uniquely identify the field.
The Thrift definition language supports automatic assignment of field identifiers, but
it is good programming practice to always explicitly specify field identifiers.
Identifiers are specified as follows:

struct Example {
  1:i32 number=10,
  2:i64 bigNumber,
  3:double decimals,
  4:string name="thrifty"
}

To avoid conflicts between manually and automatically assigned identifiers, fields with
identifiers omitted are assigned identifiers decrementing from -1, and the language only
supports the manual assignment of positive identifiers.

When data is being deserialized, the generated code can use these identifiers to
properly identify the field and determine whether it aligns with a field in its
definition file. If a field identifier is not recognized, the generated code can use the
type specifier to skip the unknown field without any error. Again, this is possible due
to the fact that all datatypes are self delimiting.

Field identifiers can (and should) also be specified in function argument lists. In
fact, argument lists are not only represented as structs on the backend, but actually
share the same code in the compiler frontend. This allows for version-safe modification
of method parameters:

service StringCache {
  void set(1:i32 key, 2:string value),
  string get(1:i32 key) throws (1:KeyNotFound knf),
  void delete(1:i32 key)
}

The syntax for specifying field identifiers was chosen to echo their structure. Structs
can be thought of as a dictionary where the identifiers are keys, and the values are
strongly-typed named fields.

Field identifiers internally use the i16 Thrift type. Note, however, that the TProtocol
abstraction may encode identifiers in any format.

5.2 Isset

When an unexpected field is encountered, it can be safely ignored and discarded. When an
expected field is not found, there must be some way to signal to the developer that it
was not present. This is implemented via an inner isset structure inside the defined
objects. (Isset functionality is implicit with a null value in PHP, None in Python and
nil in Ruby.) Essentially, the inner isset object of each Thrift struct contains a
boolean value for each field which denotes whether or not that field is present in the
struct. When a reader receives a struct, it should check for a field being set before
operating directly on it.

class Example {
 public:
  Example() :
    number(10),
    bigNumber(0),
    decimals(0),
    name("thrifty") {}

  int32_t number;
  int64_t bigNumber;
  double decimals;
  std::string name;

  struct __isset {
    __isset() :
      number(false),
      bigNumber(false),
      decimals(false),
      name(false) {}
    bool number;
    bool bigNumber;
    bool decimals;
    bool name;
  } __isset;
...
}

5.3 Case Analysis

There are four cases in which version mismatches may occur.

1. Added field, old client, new server. In this case, the old client does not send the
   new field. The new server recognizes that the field is not set, and implements
   default behavior for out-of-date requests.
2. Removed field, old client, new server. In this case, the old client sends the removed
   field. The new server simply ignores it.
3. Added field, new client, old server. The new client sends a field that the old server
   does not recognize. The old server simply ignores it and processes as normal.
4. Removed field, new client, old server. This is the most dangerous case, as the old
   server is unlikely to have suitable default behavior implemented for the missing
   field. It is recommended that in this situation the new server be rolled out prior to
   the new clients.

5.4 Protocol/Transport Versioning

The TProtocol abstractions are also designed to give protocol implementations the
freedom to version themselves in whatever manner they see fit. Specifically, any
protocol implementation is free to send whatever it likes in the writeMessageBegin()
call. It is entirely up to the implementor how to handle versioning at the protocol
level. The key point is that protocol encoding changes are safely isolated from
interface definition version changes.

Note that the exact same is true of the TTransport interface. For example, if we wished
to add some new checksumming or error detection to the TFileTransport, we could simply
add a version header into the data it writes to the file in such a way that it would
still accept old log files without the given header.

6. RPC Implementation

6.1 TProcessor

The last core interface in the Thrift design is the TProcessor, perhaps the most simple
of the constructs. The interface is as

      TProtocol in
      TProtocol out
    class ServiceProcessor : TProcessor
      ServiceIf handler

ServiceHandler.cpp
  class ServiceHandler : virtual ServiceIf

TServer.cpp
  TServer(TProcessor processor,
          TServerTransport transport,
          TTransportFactory tfactory,
          TProtocolFactory pfactory)
  serve()

From the Thrift definition file, we generate the virtual service interface. A client
class is generated, which implements the interface and uses two TProtocol instances to
perform the I/O operations.

The generated processor implements the TProcessor interface. The generated code has all
the logic to handle RPC invocations via the process() call, and takes as a parameter an
instance of the service interface, as implemented by the application developer.

The user provides an implementation of the application interface in separate,
non-generated source code.

6.3 TServer

Finally, the Thrift core libraries provide a TServer abstraction. The TServer object
generally works as follows:

 • Use the TServerTransport to get a TTransport
 • Use the TTransportFactory to optionally convert the primitive transport into a
   suitable application transport (typically the TBufferedTransportFactory is used here)
 • Use the TProtocolFactory to create an input and output protocol for the TTransport
 • Invoke the process() method of the TProcessor object

The layers are appropriately separated such that the server code needs to know nothing
about any of the transports, encodings, or applications in play. The server encapsulates
the logic around connection handling, threading, etc. while the processor deals with
RPC. The only code written by the application developer lives in the definitional Thrift
file and the interface implementation.
follows:
                                                                         Facebook has deployed multiple TServer implementations, in-
interface TProcessor {                                                   cluding the single-threaded TSimpleServer, thread-per-connection
  bool process(TProtocol in, TProtocol out)                              TThreadedServer, and thread-pooling TThreadPoolServer.
    throws TException                                                    The TProcessor interface is very general by design. There is no
}                                                                        requirement that a TServer take a generated TProcessor object.
The key design idea here is that the complex systems we build can        Thrift allows the application developer to easily write any type of
fundamentally be broken down into agents or services that operate        server that operates on TProtocol objects (for instance, a server
on inputs and outputs. In most cases, there is actually just one input   could simply stream a certain type of object without any actual RPC
and output (an RPC client) that needs handling.                          method invocation).

6.2   Generated Code                                                     7. Implementation Details
When a service is defined, we generate a TProcessor instance              7.1     Target Languages
capable of handling RPC requests to that service, using a few
helpers. The fundamental structure (illustrated in pseudo-C++) is        Thrift currently supports five target languages: C++, Java, Python,
as follows:                                                              Ruby, and PHP. At Facebook, we have deployed servers predom-
                                                                         inantly in C++, Java, and Python. Thrift services implemented in
Service.thrift                                                           PHP have also been embedded into the Apache web server, provid-
 => Service.cpp                                                          ing transparent backend access to many of our frontend constructs
     interface ServiceIf                                                 using a THttpClient implementation of the TTransport inter-
     class ServiceClient : virtual ServiceIf                             face.
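As a rough illustration of the TProcessor contract from Section 6.1 and the
server loop from Section 6.3, the sketch below shows a server routine that
drives a processor without knowing anything about the application. It is
not the Thrift library API: Protocol, Processor, EchoProcessor and
serveConnection are hypothetical stand-ins, and the "protocol" is assumed
to be a simple in-memory queue of already-decoded messages.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-in for TProtocol: a queue of already-decoded messages.
struct Protocol {
  std::deque<std::string> messages;
};

// Mirrors the shape of the TProcessor interface: one call handles one request.
struct Processor {
  virtual ~Processor() = default;
  // Returns false when there is nothing left to read from `in`.
  virtual bool process(Protocol& in, Protocol& out) = 0;
};

// Example application logic: replies to every request with an "echo:" prefix.
struct EchoProcessor : Processor {
  bool process(Protocol& in, Protocol& out) override {
    if (in.messages.empty()) return false;
    out.messages.push_back("echo:" + in.messages.front());
    in.messages.pop_front();
    return true;
  }
};

// Server-side loop for a single connection: it knows nothing about the
// application and simply calls process() until the input is exhausted.
// Returns the number of requests handled.
int serveConnection(Processor& processor, Protocol& in, Protocol& out) {
  int handled = 0;
  while (processor.process(in, out)) ++handled;
  return handled;
}
```

Because serveConnection only depends on the abstract Processor, a
hand-written processor can be swapped in for a generated one without
touching the server code, which is the separation the paper describes.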
Though Thrift was explicitly designed to be much more efficient
and robust than typical web technologies, as we were designing
our XML-based REST web services API we noticed that Thrift
could be easily used to define our service interface. Though we
do not currently employ SOAP envelopes (in the authors’ opinions
there is already far too much repetitive enterprise Java software
to do that sort of thing), we were able to quickly extend Thrift to
generate XML Schema Definition files for our service, as well as
a framework for versioning different implementations of our web
service. Though public web services are admittedly tangential to
Thrift’s core use case and design, Thrift facilitated rapid iteration
and affords us the ability to quickly migrate our entire XML-based
web service onto a higher performance system should the need
arise.

7.2   Generated Structs
We made a conscious decision to make our generated structs as
transparent as possible. All fields are publicly accessible; there are
no set() and get() methods. Similarly, use of the isset object
is not enforced. We do not include any FieldNotSetException
construct. Developers have the option to use these fields to write
more robust code, but the system is robust to the developer ignoring
the isset construct entirely and will provide suitable default
behavior in all cases.

This choice was motivated by the desire to ease application
development. Our stated goal is not to make developers learn a rich new
library in their language of choice, but rather to generate code that
allows them to work with the constructs that are most familiar in
each language.

We also made the read() and write() methods of the generated
objects public so that the objects can be used outside of the context
of RPC clients and servers. Thrift is a useful tool simply for
generating objects that are easily serializable across programming
languages.

7.3   RPC Method Identification
Method calls in RPC are implemented by sending the method name
as a string. One issue with this approach is that longer method
names require more bandwidth. We experimented with using
fixed-size hashes to identify methods, but in the end concluded that
the savings were not worth the headaches incurred. Reliably dealing
with conflicts across versions of an interface definition file is
impossible without a meta-storage system (i.e. to generate
non-conflicting hashes for the current version of a file, we would have
to know about all conflicts that ever existed in any previous version
of the file).

We wanted to avoid too many unnecessary string comparisons
upon method invocation. To deal with this, we generate maps from
strings to function pointers, so that invocation is effectively
accomplished via a constant-time hash lookup in the common case. This
requires the use of a couple of interesting code constructs. Because
Java does not have function pointers, process functions are all
private member classes implementing a common interface.

private class ping implements ProcessFunction {
  public void process(int seqid,
                      TProtocol iprot,
                      TProtocol oprot)
    throws TException
  { ...}
}

HashMap<String,ProcessFunction> processMap_ =
  new HashMap<String,ProcessFunction>();

In C++, we use a relatively esoteric language construct: member
function pointers.

std::map<std::string,
  void (ExampleServiceProcessor::*)(int32_t,
  facebook::thrift::protocol::TProtocol*,
  facebook::thrift::protocol::TProtocol*)>
 processMap_;

Using these techniques, the cost of string processing is minimized,
and we reap the benefit of being able to easily debug corrupt
or misunderstood data by inspecting it for known string method
names.

7.4   Servers and Multithreading
Thrift services require basic multithreading to handle simultaneous
requests from multiple clients. For the Python and Java
implementations of Thrift server logic, the standard threading
libraries distributed with the languages provide adequate support.
For the C++ implementation, no standard multithread runtime
library exists. Specifically, robust, lightweight, and portable thread
manager and timer class implementations do not exist. We
investigated existing implementations, namely boost::thread,
boost::threadpool, ACE Thread Manager and ACE Timer.

While boost::thread [1] provides clean, lightweight and robust
implementations of multi-thread primitives (mutexes, conditions,
threads) it does not provide a thread manager or timer
implementation.

boost::threadpool [2] also looked promising but was not far
enough along for our purposes. We wanted to limit the dependency
on third-party libraries as much as possible. Because
boost::threadpool is not a pure template library and requires
runtime libraries and because it is not yet part of the official
Boost distribution we felt it was not ready for use in Thrift. As
boost::threadpool evolves and especially if it is added to the
Boost distribution we may reconsider our decision to not use it.

ACE has both a thread manager and timer class in addition to
multi-thread primitives. The biggest problem with ACE is that it is ACE.
Unlike Boost, ACE API quality is poor. Everything in ACE has
large numbers of dependencies on everything else in ACE, thus
forcing developers to throw out standard classes, such as STL
collections, in favor of ACE’s homebrewed implementations. In
addition, unlike Boost, ACE implementations demonstrate little
understanding of the power and pitfalls of C++ programming and take no
advantage of modern templating techniques to ensure compile time
safety and reasonable compiler error messages. For all these
reasons, ACE was rejected. Instead, we chose to implement our own
library, described in the following sections.

7.5   Thread Primitives
The Thrift thread libraries are implemented in the namespace
facebook::thrift::concurrency and have three components:

 • primitives
 • thread pool manager
 • timer manager

As mentioned above, we were hesitant to introduce any additional
dependencies to Thrift. We decided to use boost::shared_ptr
because it is so useful for multithreaded applications, it requires no
link-time or runtime libraries (i.e. it is a pure template library) and
it is due to become part of the C++0x standard.

We implement standard Mutex and Condition classes, and a
Monitor class. The latter is simply a combination of a mutex and
condition variable and is analogous to the Monitor implementation
provided for the Java Object class. This is also sometimes
referred to as a barrier. We provide a Synchronized guard class to
allow Java-like synchronized blocks. This is just a bit of syntactic
sugar, but, like its Java counterpart, clearly delimits critical
sections of code. Unlike its Java counterpart, we still have the ability
to programmatically lock, unlock, block, and signal monitors.

void run() {
  {Synchronized s(manager->monitor);
    if (manager->state == TimerManager::STARTING) {
      manager->state = TimerManager::STARTED;
      manager->monitor.notifyAll();
    }
  }
}

We again borrowed from Java the distinction between a thread and
a runnable class. A Thread is the actual schedulable object. The
Runnable is the logic to execute within the thread. The Thread
implementation deals with all the platform-specific thread creation
and destruction issues, while the Runnable implementation deals
with the application-specific per-thread logic. The benefit of this
approach is that developers can easily subclass the Runnable class
without pulling in platform-specific super-classes.

7.6   Thread, Runnable, and shared_ptr
We use boost::shared_ptr throughout the ThreadManager and
TimerManager implementations to guarantee cleanup of dead
objects that can be accessed by multiple threads. For Thread class
implementations, boost::shared_ptr usage requires particular
attention to make sure Thread objects are neither leaked nor
dereferenced prematurely while creating and shutting down threads.

Thread creation requires calling into a C library. (In our case the
POSIX thread library, libpthread, but the same would be true
for WIN32 threads). Typically, the OS makes few, if any,
guarantees about when ThreadMain, a C thread’s entry-point function,
will be called. Therefore, it is possible that our thread create call,
ThreadFactory::newThread(), could return to the caller well
before that time. To ensure that the returned Thread object is not
prematurely cleaned up if the caller gives up its reference prior to
the ThreadMain call, the Thread object makes a weak reference
to itself in its start method.

With the weak reference in hand the ThreadMain function can
attempt to get a strong reference before entering the Runnable::run
method of the Runnable object bound to the Thread. If no strong
references to the thread are obtained between exiting Thread::start
and entering ThreadMain, the weak reference returns null and the
function exits immediately.

The need for the Thread to make a weak reference to itself
has a significant impact on the API. Since references are
managed through the boost::shared_ptr templates, the Thread
object must have a reference to itself wrapped by the same
boost::shared_ptr envelope that is returned to the caller. This
necessitated the use of the factory pattern. ThreadFactory
creates the raw Thread object and a boost::shared_ptr wrapper,
and calls a private helper method of the class implementing
the Thread interface (in this case, PosixThread::weakRef)
to allow it to make a weak reference to itself through the
boost::shared_ptr envelope.

Thread and Runnable objects reference each other. A Runnable
object may need to know about the thread in which it is executing,
and a Thread, obviously, needs to know what Runnable object it
is hosting. This interdependency is further complicated because the
lifecycle of each object is independent of the other. An application
may create a set of Runnable objects to be reused in different
threads, or it may create and forget a Runnable object once a thread
has been created and started for it.

The Thread class takes a boost::shared_ptr reference to the
hosted Runnable object in its constructor, while the Runnable
class has an explicit thread method to allow explicit binding of
the hosted thread. ThreadFactory::newThread binds the objects
to each other.

7.7   ThreadManager
ThreadManager creates a pool of worker threads and allows
applications to schedule tasks for execution as free worker threads
become available. The ThreadManager does not implement
dynamic thread pool resizing, but provides primitives so that
applications can add and remove threads based on load. This approach was
chosen because implementing load metrics and thread pool size is
very application specific. For example, some applications may want
to adjust pool size based on a running average of work arrival rates
that are measured via polled samples. Others may simply wish to
react immediately to work-queue depth high and low water marks.
Rather than trying to create a complex API abstract enough to
capture these different approaches, we simply leave it up to the
particular application and provide the primitives to enact the desired
policy and sample current status.

7.8   TimerManager
TimerManager allows applications to schedule Runnable objects
for execution at some point in the future. Its specific task is to
allow applications to sample ThreadManager load at regular
intervals and make changes to the thread pool size based on application
policy. Of course, it can be used to generate any number of timer or
alarm events.

The default implementation of TimerManager uses a single thread
to execute expired Runnable objects. Thus, if a timer operation
needs to do a large amount of work and especially if it needs to do
blocking I/O, that should be done in a separate thread.

7.9   Nonblocking Operation
Though the Thrift transport interfaces map more directly to a
blocking I/O model, we have implemented a high performance
TNonBlockingServer in C++ based on libevent and the
TFramedTransport. We implemented this by moving all I/O into one
tight event loop using a state machine. Essentially, the event loop
reads framed requests into TMemoryBuffer objects. Once entire
requests are ready, they are dispatched to the TProcessor object
which can read directly from the data in memory.

7.10   Compiler
The Thrift compiler is implemented in C++ using standard lex/yacc
lexing and parsing. Though it could have been implemented with
fewer lines of code in another language (e.g. Python Lex-Yacc
(PLY) or ocamlyacc), using C++ forces explicit definition of the
language constructs. Strongly typing the parse tree elements
(debatably) makes the code more approachable for new developers.

Code generation is done using two passes. The first pass looks
only for include files and type definitions. Type definitions are not
checked during this phase, since they may depend upon include
files. All included files are sequentially scanned in a first pass. Once
the include tree has been resolved, a second pass over all files is
taken that inserts type definitions into the parse tree and raises an
error on any undefined types. The program is then generated against
the parse tree.
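The two-pass scheme described above can be illustrated with a small sketch.
This is not the Thrift compiler's actual code: TypeDef, SourceFile, FileSet,
collectIncludes and undefinedTypes are hypothetical names, files are
assumed to be already parsed into simple records, and the sketch does not
attempt to model ordering rules within a file. It only shows why type
references are checked after the include tree has been resolved.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of a parsed Thrift file.
struct TypeDef {
  std::string name;                     // name of the struct being defined
  std::vector<std::string> fieldTypes;  // types referenced by its fields
};
struct SourceFile {
  std::vector<std::string> includes;
  std::vector<TypeDef> types;
};
using FileSet = std::map<std::string, SourceFile>;

// Pass 1: walk the include tree depth-first so that every reachable file is
// recorded exactly once, included files first. Nothing is type-checked here.
void collectIncludes(const FileSet& files, const std::string& path,
                     std::set<std::string>& seen,
                     std::vector<std::string>& order) {
  if (!seen.insert(path).second) return;  // already visited
  auto it = files.find(path);
  if (it == files.end()) return;  // unknown include: skipped in this sketch
  for (const auto& inc : it->second.includes)
    collectIncludes(files, inc, seen, order);
  order.push_back(path);
}

// Pass 2: register every defined type name, then report each reference to a
// type that is neither a base type nor defined in any scanned file.
std::vector<std::string> undefinedTypes(const FileSet& files,
                                        const std::string& root) {
  std::set<std::string> seen;
  std::vector<std::string> order;
  collectIncludes(files, root, seen, order);

  std::set<std::string> known = {"bool", "byte", "i16", "i32",
                                 "i64", "double", "string"};
  for (const auto& path : order)
    for (const auto& type : files.at(path).types) known.insert(type.name);

  std::vector<std::string> errors;
  for (const auto& path : order)
    for (const auto& type : files.at(path).types)
      for (const auto& field : type.fieldTypes)
        if (known.count(field) == 0)
          errors.push_back(type.name + " -> " + field);
  return errors;
}
```

Calling undefinedTypes with a root file returns one entry per dangling
reference, formatted as "Struct -> MissingType"; an empty result means
every referenced type was found somewhere in the include tree.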
Due to inherent complexities and potential for circular
dependencies, we explicitly disallow forward declaration. Two Thrift structs
cannot each contain an instance of the other. (Since we do not
allow null struct instances in the generated C++ code, this would
actually be impossible.)

7.11   TFileTransport
The TFileTransport logs Thrift requests/structs by framing
incoming data with its length and writing it out to disk. Using a
framed on-disk format allows for better error checking and helps
with the processing of a finite number of discrete events. The
TFileWriterTransport uses a system of swapping in-memory
buffers to ensure good performance while logging large amounts
of data. A Thrift log file is split up into chunks of a specified size;
logged messages are not allowed to cross chunk boundaries. A
message that would cross a chunk boundary will cause padding to be
added until the end of the chunk, so that the first byte of the message
is aligned to the beginning of the next chunk. Partitioning the file
into chunks makes it possible to read and interpret data from a
particular point in the file.

8. Facebook Thrift Services
Thrift has been employed in a large number of applications at
Facebook, including search, logging, mobile, ads and the developer
platform. Two specific usages are discussed below.

8.1   Search
Thrift is used as the underlying protocol and transport layer for the
Facebook Search service. The multi-language code generation is
well suited for search because it allows for application development
in an efficient server side language (C++) and allows the Facebook
PHP-based web application to make calls to the search service
using Thrift PHP libraries. There is also a large variety of search
stats, deployment and testing functionality that is built on top of
generated Python code. Additionally, the Thrift log file format is
used as a redo log for providing real-time search index updates.
Thrift has allowed the search team to leverage each language for its
strengths and to develop code at a rapid pace.

8.2   Logging
The Thrift TFileTransport functionality is used for structured
logging. Each service function definition along with its parameters
can be considered to be a structured log entry identified by the
function name. This log can then be used for a variety of purposes,
including inline and offline processing, stats aggregation and as a
redo log.

9. Conclusions
Thrift has enabled Facebook to build scalable backend services
efficiently by enabling engineers to divide and conquer. Application
developers can focus on application code without worrying about
the sockets layer. We avoid duplicated work by writing buffering
and I/O logic in one place, rather than interspersing it in each
application.

Thrift has been employed in a wide variety of applications at
Facebook, including search, logging, mobile, ads, and the developer
platform. We have found that the marginal performance cost
incurred by an extra layer of software abstraction is far eclipsed by
the gains in developer efficiency and systems reliability.

A. Similar Systems
The following are software systems similar to Thrift. Each is
(very!) briefly described:

 • SOAP. XML-based. Designed for web services via HTTP,
   excessive XML parsing overhead.
 • CORBA. Relatively comprehensive, debatably overdesigned
   and heavyweight. Comparably cumbersome software
   installation.
 • COM. Embraced mainly in Windows client software. Not an
   entirely open solution.
 • Pillar. Lightweight and high-performance, but missing
   versioning and abstraction.
 • Protocol Buffers. Closed-source, owned by Google. Described
   in Sawzall paper.

Acknowledgments
Many thanks for feedback on Thrift (and extreme trial by fire) are
due to Martin Smith, Karl Voskuil and Yishan Wong.

Thrift is a successor to Pillar, a similar system developed by Adam
D’Angelo, first while at Caltech and continued later at Facebook.
Thrift simply would not have happened without Adam’s insights.

References
[1] Kempf, William, “Boost.Threads”,
    http://www.boost.org/doc/html/threads.html
[2] Henkel, Philipp, “threadpool”,
    http://threadpool.sourceforge.net
Contenu connexe

Tendances

JPT : A SIMPLE JAVA-PYTHON TRANSLATOR
JPT : A SIMPLE JAVA-PYTHON TRANSLATOR JPT : A SIMPLE JAVA-PYTHON TRANSLATOR
JPT : A SIMPLE JAVA-PYTHON TRANSLATOR caijjournal
 
Project_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalProject_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalJerin John
 
Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-Krishna Sai
 
.Net platform an understanding
.Net platform an understanding.Net platform an understanding
.Net platform an understandingBinu Bhasuran
 
Dot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part iDot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part iRakesh Joshi
 
Dot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part iDot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part iRakesh Joshi
 
Programing paradigm &amp; implementation
Programing paradigm &amp; implementationPrograming paradigm &amp; implementation
Programing paradigm &amp; implementationBilal Maqbool ツ
 
Rechkov. Lomonosov Report
Rechkov. Lomonosov ReportRechkov. Lomonosov Report
Rechkov. Lomonosov ReportAnton Rechkov
 
Is fortran still relevant comparing fortran with java and c++
Is fortran still relevant comparing fortran with java and c++Is fortran still relevant comparing fortran with java and c++
Is fortran still relevant comparing fortran with java and c++ijseajournal
 
.Net framework interview questions
.Net framework interview questions.Net framework interview questions
.Net framework interview questionsMir Majid
 
A Research Study of Data Collection and Analysis of Semantics of Programming ...
A Research Study of Data Collection and Analysis of Semantics of Programming ...A Research Study of Data Collection and Analysis of Semantics of Programming ...
A Research Study of Data Collection and Analysis of Semantics of Programming ...IRJET Journal
 
Introduction to c_sharp
Introduction to c_sharpIntroduction to c_sharp
Introduction to c_sharpHEM Sothon
 
Dotnet interview qa
Dotnet interview qaDotnet interview qa
Dotnet interview qaabcxyzqaz
 
Imperative programming
Imperative programmingImperative programming
Imperative programmingEdward Blurock
 
Comparative Study of programming Languages
Comparative Study of programming LanguagesComparative Study of programming Languages
Comparative Study of programming LanguagesIshan Monga
 
Presentation1
Presentation1Presentation1
Presentation1kpkcsc
 

Tendances (19)

JPT : A SIMPLE JAVA-PYTHON TRANSLATOR
JPT : A SIMPLE JAVA-PYTHON TRANSLATOR JPT : A SIMPLE JAVA-PYTHON TRANSLATOR
JPT : A SIMPLE JAVA-PYTHON TRANSLATOR
 
Project_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalProject_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_final
 
Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-
 
.Net platform an understanding
.Net platform an understanding.Net platform an understanding
.Net platform an understanding
 
Dot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part iDot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part i
 
Dot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part iDot net-interview-questions-and-answers part i
Dot net-interview-questions-and-answers part i
 
Programing paradigm &amp; implementation
Programing paradigm &amp; implementationPrograming paradigm &amp; implementation
Programing paradigm &amp; implementation
 
Net framework
Net frameworkNet framework
Net framework
 
Rechkov. Lomonosov Report
Rechkov. Lomonosov ReportRechkov. Lomonosov Report
Rechkov. Lomonosov Report
 
Is fortran still relevant comparing fortran with java and c++
Is fortran still relevant comparing fortran with java and c++Is fortran still relevant comparing fortran with java and c++
Is fortran still relevant comparing fortran with java and c++
 
.Net framework interview questions
.Net framework interview questions.Net framework interview questions
.Net framework interview questions
 
.Net slid
.Net slid.Net slid
.Net slid
 
A Research Study of Data Collection and Analysis of Semantics of Programming ...
A Research Study of Data Collection and Analysis of Semantics of Programming ...A Research Study of Data Collection and Analysis of Semantics of Programming ...
A Research Study of Data Collection and Analysis of Semantics of Programming ...
 
Introduction to c_sharp
Introduction to c_sharpIntroduction to c_sharp
Introduction to c_sharp
 
C Course Material0209
C Course Material0209C Course Material0209
C Course Material0209
 
Dotnet interview qa
Dotnet interview qaDotnet interview qa
Dotnet interview qa
 
Imperative programming
Imperative programmingImperative programming
Imperative programming
 
Comparative Study of programming Languages
Comparative Study of programming LanguagesComparative Study of programming Languages
Comparative Study of programming Languages
 
Presentation1
Presentation1Presentation1
Presentation1
 

thrift-20070401

Thrift: Scalable Cross-Language Services Implementation

Mark Slee, Aditya Agarwal and Marc Kwiatkowski
Facebook, 156 University Ave, Palo Alto, CA
{mcslee,aditya,marc}@facebook.com

Abstract

Thrift is a software library and set of code-generation tools developed at Facebook to expedite development and implementation of efficient and scalable backend services. Its primary goal is to enable efficient and reliable communication across programming languages by abstracting the portions of each language that tend to require the most customization into a common library that is implemented in each language. Specifically, Thrift allows developers to define datatypes and service interfaces in a single language-neutral file and generate all the necessary code to build RPC clients and servers.

This paper details the motivations and design choices we made in Thrift, as well as some of the more interesting implementation details. It is not intended to be taken as research, but rather it is an exposition on what we did and why.

1. Introduction

As Facebook's traffic and network structure have scaled, the resource demands of many operations on the site (e.g. search, ad selection and delivery, event logging) have presented technical requirements drastically outside the scope of the LAMP framework. In our implementation of these services, various programming languages have been selected to optimize for the right combination of performance, ease and speed of development, availability of existing libraries, etc. By and large, Facebook's engineering culture has tended towards choosing the best tools and implementations available over standardizing on any one programming language and begrudgingly accepting its inherent limitations.

Given this design choice, we were presented with the challenge of building a transparent, high-performance bridge across many programming languages. We found that most available solutions were either too limited, did not offer sufficient datatype freedom, or suffered from subpar performance.[1]

The solution that we have implemented combines a language-neutral software stack implemented across numerous programming languages and an associated code generation engine that transforms a simple interface and data definition language into client and server remote procedure call libraries. Choosing static code generation over a dynamic system allows us to create validated code that can be run without the need for any advanced introspective run-time type checking. It is also designed to be as simple as possible for the developer, who can typically define all the necessary data structures and interfaces for a complex service in a single short file.

Surprised that a robust open solution to these relatively common problems did not yet exist, we committed early on to making the Thrift implementation open source.

[1] See Appendix A for a discussion of alternative systems.

In evaluating the challenges of cross-language interaction in a networked environment, some key components were identified:

Types. A common type system must exist across programming languages without requiring that the application developer use custom Thrift datatypes or write their own serialization code. That is, a C++ programmer should be able to transparently exchange a strongly typed STL map for a dynamic Python dictionary. Neither programmer should be forced to write any code below the application layer to achieve this. Section 2 details the Thrift type system.

Transport. Each language must have a common interface to bidirectional raw data transport. The specifics of how a given transport is implemented should not matter to the service developer. The same application code should be able to run against TCP stream sockets, raw data in memory, or files on disk. Section 3 details the Thrift Transport layer.

Protocol. Datatypes must have some way of using the Transport layer to encode and decode themselves. Again, the application developer need not be concerned by this layer. Whether the service uses an XML or binary protocol is immaterial to the application code. All that matters is that the data can be read and written in a consistent, deterministic manner. Section 4 details the Thrift Protocol layer.

Versioning. For robust services, the involved datatypes must provide a mechanism for versioning themselves. Specifically, it should be possible to add or remove fields in an object or alter the argument list of a function without any interruption in service (or, worse yet, nasty segmentation faults). Section 5 details Thrift's versioning system.

Processors. Finally, we generate code capable of processing data streams to accomplish remote procedure calls. Section 6 details the generated code and TProcessor paradigm.

Section 7 discusses implementation details, and Section 8 describes our conclusions.

2. Types

The goal of the Thrift type system is to enable programmers to develop using completely natively defined types, no matter what programming language they use. By design, the Thrift type system does not introduce any special dynamic types or wrapper objects. It also does not require that the developer write any code for object serialization or transport. The Thrift IDL (Interface Definition Language) file is logically a way for developers to annotate their data structures with the minimal amount of extra information necessary to tell a code generator how to safely transport the objects across languages.

2.1 Base Types

The type system rests upon a few base types. In considering which types to support, we aimed for clarity and simplicity over abundance, focusing on the key types available in all programming languages and omitting any niche types available only in specific languages.

The base types supported by Thrift are:

• bool A boolean value, true or false
• byte A signed byte
• i16 A 16-bit signed integer
• i32 A 32-bit signed integer
• i64 A 64-bit signed integer
• double A 64-bit floating point number
• string An encoding-agnostic text or binary string

Of particular note is the absence of unsigned integer types. Because these types have no direct translation to native primitive types in many languages, the advantages they afford are lost. Further, there is no way to prevent the application developer in a language like Python from assigning a negative value to an integer variable, leading to unpredictable behavior. From a design standpoint, we observed that unsigned integers were very rarely, if ever, used for arithmetic purposes, but in practice were much more often used as keys or identifiers. In this case, the sign is irrelevant. Signed integers serve this same purpose and can be safely cast to their unsigned counterparts (most commonly in C++) when absolutely necessary.

2.2 Structs

A Thrift struct defines a common object to be used across languages. A struct is essentially equivalent to a class in object oriented programming languages. A struct has a set of strongly typed fields, each with a unique name identifier. The basic syntax for defining a Thrift struct looks very similar to a C struct definition. Fields may be annotated with an integer field identifier (unique to the scope of that struct) and optional default values. Field identifiers will be automatically assigned if omitted, though they are strongly encouraged for versioning reasons discussed later.

2.3 Containers

Thrift containers are strongly typed containers that map to the most commonly used containers in common programming languages. They are annotated using the C++ template (or Java Generics) style. There are three types available:

• list<type> An ordered list of elements. Translates directly into an STL vector, Java ArrayList, or native array in scripting languages. May contain duplicates.
• set<type> An unordered set of unique elements. Translates into an STL set, Java HashSet, set in Python, or native dictionary in PHP/Ruby.
• map<type1,type2> A map of strictly unique keys to values. Translates into an STL map, Java HashMap, PHP associative array, or Python/Ruby dictionary.

While defaults are provided, the type mappings are not explicitly fixed. Custom code generator directives have been added to substitute custom types in destination languages (e.g. hash map or Google's sparse hash map can be used in C++). The only requirement is that the custom types support all the necessary iteration primitives. Container elements may be of any valid Thrift type, including other containers or structs.

    struct Example {
      1:i32 number=10,
      2:i64 bigNumber,
      3:double decimals,
      4:string name="thrifty"
    }

In the target language, each definition generates a type with two methods, read and write, which perform serialization and transport of the objects using a Thrift TProtocol object.

2.4 Exceptions

Exceptions are syntactically and functionally equivalent to structs except that they are declared using the exception keyword instead of the struct keyword.

The generated objects inherit from an exception base class as appropriate in each target programming language, in order to seamlessly integrate with native exception handling in any given language. Again, the design emphasis is on making the code familiar to the application developer.

2.5 Services

Services are defined using Thrift types. Definition of a service is semantically equivalent to defining an interface (or a pure virtual abstract class) in object oriented programming. The Thrift compiler generates fully functional client and server stubs that implement the interface. Services are defined as follows:

    service <name> {
      <returntype> <name>(<arguments>)
        [throws (<exceptions>)]
      ...
    }

An example:

    service StringCache {
      void set(1:i32 key, 2:string value),
      string get(1:i32 key) throws (1:KeyNotFound knf),
      void delete(1:i32 key)
    }

Note that void is a valid type for a function return, in addition to all other defined Thrift types. Additionally, an async modifier keyword may be added to a void function, which will generate code that does not wait for a response from the server. Note that a pure void function will return a response to the client which guarantees that the operation has completed on the server side. With async method calls the client will only be guaranteed that the request succeeded at the transport layer. (In many transport scenarios this is inherently unreliable due to the Byzantine Generals' Problem. Therefore, application developers should take care only to use the async optimization in cases where dropped method calls are acceptable or the transport is known to be reliable.)

Also of note is the fact that argument lists and exception lists for functions are implemented as Thrift structs. All three constructs are identical in both notation and behavior.

3. Transport

The transport layer is used by the generated code to facilitate data transfer.

3.1 Interface

A key design choice in the implementation of Thrift was to decouple the transport layer from the code generation layer. Though Thrift is typically used on top of the TCP/IP stack with streaming
sockets as the base layer of communication, there was no compelling reason to build that constraint into the system. The performance tradeoff incurred by an abstracted I/O layer (roughly one virtual method lookup / function call per operation) was immaterial compared to the cost of actual I/O operations (typically invoking system calls).

Fundamentally, generated Thrift code only needs to know how to read and write data. The origin and destination of the data are irrelevant; it may be a socket, a segment of shared memory, or a file on the local disk. The Thrift transport interface supports the following methods:

• open Opens the transport
• close Closes the transport
• isOpen Indicates whether the transport is open
• read Reads from the transport
• write Writes to the transport
• flush Forces any pending writes

There are a few additional methods not documented here which are used to aid in batching reads and optionally signaling the completion of a read or write operation from the generated code.

In addition to the above TTransport interface, there is a TServerTransport interface used to accept or create primitive transport objects. Its interface is as follows:

• open Opens the transport
• listen Begins listening for connections
• accept Returns a new client transport
• close Closes the transport

3.2 Implementation

The transport interface is designed for simple implementation in any programming language. New transport mechanisms can be easily defined as needed by application developers.

3.2.1 TSocket

The TSocket class is implemented across all target languages. It provides a common, simple interface to a TCP/IP stream socket.

3.2.2 TFileTransport

The TFileTransport is an abstraction of an on-disk file to a data stream. It can be used to write out a set of incoming Thrift requests to a file on disk. The on-disk data can then be replayed from the log, either for post-processing or for reproduction and/or simulation of past events.

3.2.3 Utilities

The Transport interface is designed to support easy extension using common OOP techniques, such as composition. Some simple utilities include the TBufferedTransport, which buffers the writes and reads on an underlying transport, the TFramedTransport, which transmits data with frame size headers for chunking optimization or nonblocking operation, and the TMemoryBuffer, which allows reading and writing directly from the heap or stack memory owned by the process.

4. Protocol

A second major abstraction in Thrift is the separation of data structure from transport representation. Thrift enforces a certain messaging structure when transporting data, but it is agnostic to the protocol encoding in use. That is, it does not matter whether data is encoded as XML, human-readable ASCII, or a dense binary format as long as the data supports a fixed set of operations that allow it to be deterministically read and written by generated code.

4.1 Interface

The Thrift Protocol interface is very straightforward. It fundamentally supports two things: 1) bidirectional sequenced messaging, and 2) encoding of base types, containers, and structs.

    writeMessageBegin(name, type, seq)
    writeMessageEnd()
    writeStructBegin(name)
    writeStructEnd()
    writeFieldBegin(name, type, id)
    writeFieldEnd()
    writeFieldStop()
    writeMapBegin(ktype, vtype, size)
    writeMapEnd()
    writeListBegin(etype, size)
    writeListEnd()
    writeSetBegin(etype, size)
    writeSetEnd()
    writeBool(bool)
    writeByte(byte)
    writeI16(i16)
    writeI32(i32)
    writeI64(i64)
    writeDouble(double)
    writeString(string)

    name, type, seq = readMessageBegin()
                      readMessageEnd()
    name = readStructBegin()
           readStructEnd()
    name, type, id = readFieldBegin()
                     readFieldEnd()
    k, v, size = readMapBegin()
                 readMapEnd()
    etype, size = readListBegin()
                  readListEnd()
    etype, size = readSetBegin()
                  readSetEnd()
    bool = readBool()
    byte = readByte()
    i16 = readI16()
    i32 = readI32()
    i64 = readI64()
    double = readDouble()
    string = readString()

Note that every write function has exactly one read counterpart, with the exception of writeFieldStop(). This is a special method that signals the end of a struct. The procedure for reading a struct is to readFieldBegin() until the stop field is encountered, and then to readStructEnd(). The generated code relies upon this call sequence to ensure that everything written by a protocol encoder can be read by a matching protocol decoder. Further note that this set of functions is by design more robust than necessary. For example, writeStructEnd() is not strictly necessary, as the end of a struct may be implied by the stop field. This method is a convenience for verbose protocols in which it is cleaner to separate these calls (e.g. a closing </struct> tag in XML).
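The call sequence described for reading a struct (call readFieldBegin() until the stop field appears) can be illustrated with a small in-memory sketch. This is not Thrift's actual implementation: the ToyProtocol class, the FieldType tag values, and the two-field Example struct layout are assumptions made for illustration, and only a subset of the interface above is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type tags for this sketch; not Thrift's real wire constants.
enum class FieldType : std::uint8_t { Stop = 0, I32 = 1, String = 2 };

// Toy protocol over an in-memory buffer that stands in for a transport.
class ToyProtocol {
 public:
  void writeFieldBegin(FieldType type, std::int16_t id) {
    writeByte(static_cast<std::uint8_t>(type));
    const std::uint16_t u = static_cast<std::uint16_t>(id);
    writeByte(static_cast<std::uint8_t>(u >> 8));
    writeByte(static_cast<std::uint8_t>(u & 0xff));
  }
  void writeFieldStop() { writeByte(static_cast<std::uint8_t>(FieldType::Stop)); }
  void writeI32(std::int32_t v) {  // big-endian, i.e. network byte order
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int shift = 24; shift >= 0; shift -= 8)
      writeByte(static_cast<std::uint8_t>(u >> shift));
  }
  void writeString(const std::string& s) {  // length prefix, then bytes
    writeI32(static_cast<std::int32_t>(s.size()));
    buf_.insert(buf_.end(), s.begin(), s.end());
  }

  // Returns false once the stop field is reached.
  bool readFieldBegin(FieldType& type, std::int16_t& id) {
    type = static_cast<FieldType>(readByte());
    if (type == FieldType::Stop) return false;
    const std::uint16_t hi = readByte();
    const std::uint16_t lo = readByte();
    id = static_cast<std::int16_t>((hi << 8) | lo);
    return true;
  }
  std::int32_t readI32() {
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i) u = (u << 8) | readByte();
    return static_cast<std::int32_t>(u);
  }
  std::string readString() {
    const std::int32_t n = readI32();
    if (n < 0 || static_cast<std::size_t>(n) > buf_.size() - pos_)
      throw std::runtime_error("bad string length");
    std::string s(reinterpret_cast<const char*>(buf_.data()) + pos_,
                  static_cast<std::size_t>(n));
    pos_ += static_cast<std::size_t>(n);
    return s;
  }
  // Consumes a value of the given type without interpreting it.
  void skip(FieldType type) {
    switch (type) {
      case FieldType::I32: readI32(); break;
      case FieldType::String: readString(); break;
      default: throw std::runtime_error("unknown field type");
    }
  }

 private:
  void writeByte(std::uint8_t b) { buf_.push_back(b); }
  std::uint8_t readByte() {
    if (pos_ >= buf_.size()) throw std::runtime_error("read past end");
    return buf_[pos_++];
  }
  std::vector<std::uint8_t> buf_;
  std::size_t pos_ = 0;
};

struct Example {
  std::int32_t number = 0;
  std::string name;
};

void writeExample(ToyProtocol& p, const Example& e) {
  p.writeFieldBegin(FieldType::I32, 1);
  p.writeI32(e.number);
  p.writeFieldBegin(FieldType::String, 2);
  p.writeString(e.name);
  p.writeFieldStop();  // the one write call with no read counterpart
}

Example readExample(ToyProtocol& p) {
  Example e;
  FieldType type = FieldType::Stop;
  std::int16_t id = 0;
  while (p.readFieldBegin(type, id)) {  // loop until the stop field
    if (id == 1 && type == FieldType::I32) e.number = p.readI32();
    else if (id == 2 && type == FieldType::String) e.name = p.readString();
    else p.skip(type);  // unrecognized field: skip it using its type tag
  }
  return e;
}
```

Writing an Example with writeExample() and reading it back with readExample() on the same ToyProtocol should reproduce the original values. The reader never needs the struct's total length, only the per-field type tags and the stop marker.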
4.2 Structure

Thrift structures are designed to support encoding into a streaming protocol. The implementation should never need to frame or compute the entire data length of a structure prior to encoding it. This is critical to performance in many scenarios. Consider a long list of relatively large strings. If the protocol interface required reading or writing a list to be an atomic operation, then the implementation would need to perform a linear pass over the entire list before encoding any data. However, if the list can be written as iteration is performed, the corresponding read may begin in parallel, theoretically offering an end-to-end speedup of (kN − C), where N is the size of the list, k the cost factor associated with serializing a single element, and C is fixed offset for the delay between data being written and becoming available to read.

Similarly, structs do not encode their data lengths a priori. Instead, they are encoded as a sequence of fields, with each field having a type specifier and a unique field identifier. Note that the inclusion of type specifiers allows the protocol to be safely parsed and decoded without any generated code or access to the original IDL file. Structs are terminated by a field header with a special STOP type. Because all the basic types can be read deterministically, all structs (even those containing other structs) can be read deterministically. The Thrift protocol is self-delimiting without any framing and regardless of the encoding format.

In situations where streaming is unnecessary or framing is advantageous, it can be very simply added into the transport layer, using the TFramedTransport abstraction.

4.3 Implementation

Facebook has implemented and deployed a space-efficient binary protocol which is used by most backend services. Essentially, it writes all data in a flat binary format. Integer types are converted to network byte order, strings are prepended with their byte length, and all message and field headers are written using the primitive integer serialization constructs. String names for fields are omitted - when using generated code, field identifiers are sufficient.

We decided against some extreme storage optimizations (i.e. packing small integers into ASCII or using a 7-bit continuation format) for the sake of simplicity and clarity in the code. These alterations can easily be made if and when we encounter a performance-critical use case that demands them.

5. Versioning

Thrift is robust in the face of versioning and data definition changes. This is critical to enable staged rollouts of changes to deployed services. The system must be able to support reading of old data from log files, as well as requests from out-of-date clients to new servers, and vice versa.

5.1 Field Identifiers

Versioning in Thrift is implemented via field identifiers. The field header for every member of a struct in Thrift is encoded with a unique field identifier. The combination of this field identifier and its type specifier is used to uniquely identify the field. The Thrift definition language supports automatic assignment of field identifiers, but it is good programming practice to always explicitly specify field identifiers. Identifiers are specified as follows:

    struct Example {
      1:i32 number=10,
      2:i64 bigNumber,
      3:double decimals,
      4:string name="thrifty"
    }

To avoid conflicts between manually and automatically assigned identifiers, fields with identifiers omitted are assigned identifiers decrementing from -1, and the language only supports the manual assignment of positive identifiers.

When data is being deserialized, the generated code can use these identifiers to properly identify the field and determine whether it aligns with a field in its definition file. If a field identifier is not recognized, the generated code can use the type specifier to skip the unknown field without any error. Again, this is possible due to the fact that all datatypes are self delimiting.

Field identifiers can (and should) also be specified in function argument lists. In fact, argument lists are not only represented as structs on the backend, but actually share the same code in the compiler frontend. This allows for version-safe modification of method parameters

    service StringCache {
      void set(1:i32 key, 2:string value),
      string get(1:i32 key) throws (1:KeyNotFound knf),
      void delete(1:i32 key)
    }

The syntax for specifying field identifiers was chosen to echo their structure. Structs can be thought of as a dictionary where the identifiers are keys, and the values are strongly-typed named fields.

Field identifiers internally use the i16 Thrift type. Note, however, that the TProtocol abstraction may encode identifiers in any format.

5.2 Isset

When an unexpected field is encountered, it can be safely ignored and discarded. When an expected field is not found, there must be some way to signal to the developer that it was not present. This is implemented via an inner isset structure inside the defined objects. (Isset functionality is implicit with a null value in PHP, None in Python and nil in Ruby.) Essentially, the inner isset object of each Thrift struct contains a boolean value for each field which denotes whether or not that field is present in the struct. When a reader receives a struct, it should check for a field being set before operating directly on it.

    class Example {
     public:
      Example() :
        number(10),
        bigNumber(0),
        decimals(0),
        name("thrifty") {}

      int32_t number;
      int64_t bigNumber;
      double decimals;
      std::string name;

      struct __isset {
        __isset() :
          number(false),
          bigNumber(false),
          decimals(false),
          name(false) {}
        bool number;
        bool bigNumber;
        bool decimals;
        bool name;
      } __isset;
      ...
    }

5.3 Case Analysis

There are four cases in which version mismatches may occur.

1. Added field, old client, new server. In this case, the old client does not send the new field. The new server recognizes that the field is not set, and implements default behavior for out-of-date requests.
2. Removed field, old client, new server. In this case, the old client sends the removed field. The new server simply ignores it.
3. Added field, new client, old server. The new client sends a field that the old server does not recognize. The old server simply ignores it and processes as normal.
4. Removed field, new client, old server. This is the most dangerous case, as the old server is unlikely to have suitable default behavior implemented for the missing field. It is recommended that in this situation the new server be rolled out prior to the new clients.

5.4 Protocol/Transport Versioning

The TProtocol abstractions are also designed to give protocol implementations the freedom to version themselves in whatever manner they see fit. Specifically, any protocol implementation is free to send whatever it likes in the writeMessageBegin() call. It is entirely up to the implementor how to handle versioning at the protocol level. The key point is that protocol encoding changes are safely isolated from interface definition version changes.

Note that the exact same is true of the TTransport interface. For example, if we wished to add some new checksumming or error detection to the TFileTransport, we could simply add a version header into the data it writes to the file in such a way that it would still accept old log files without the given header.

6. RPC Implementation

6.1 TProcessor

The last core interface in the Thrift design is the TProcessor, perhaps the most simple of the constructs. The interface is as follows:

    interface TProcessor {
      bool process(TProtocol in, TProtocol out)
        throws TException
    }

The key design idea here is that the complex systems we build can fundamentally be broken down into agents or services that operate on inputs and outputs. In most cases, there is actually just one input and output (an RPC client) that needs handling.

6.2 Generated Code

When a service is defined, we generate a TProcessor instance capable of handling RPC requests to that service, using a few helpers. The fundamental structure (illustrated in pseudo-C++) is as follows:

    Service.thrift
     => Service.cpp
         interface ServiceIf
         class ServiceClient : virtual ServiceIf
           TProtocol in
           TProtocol out
         class ServiceProcessor : TProcessor
           ServiceIf handler

    ServiceHandler.cpp
         class ServiceHandler : virtual ServiceIf

    TServer.cpp
         TServer(TProcessor processor,
                 TServerTransport transport,
                 TTransportFactory tfactory,
                 TProtocolFactory pfactory)
         serve()

From the Thrift definition file, we generate the virtual service interface. A client class is generated, which implements the interface and uses two TProtocol instances to perform the I/O operations. The generated processor implements the TProcessor interface. The generated code has all the logic to handle RPC invocations via the process() call, and takes as a parameter an instance of the service interface, as implemented by the application developer.

The user provides an implementation of the application interface in separate, non-generated source code.

6.3 TServer

Finally, the Thrift core libraries provide a TServer abstraction. The TServer object generally works as follows.

• Use the TServerTransport to get a TTransport
• Use the TTransportFactory to optionally convert the primitive transport into a suitable application transport (typically the TBufferedTransportFactory is used here)
• Use the TProtocolFactory to create an input and output protocol for the TTransport
• Invoke the process() method of the TProcessor object

The layers are appropriately separated such that the server code needs to know nothing about any of the transports, encodings, or applications in play. The server encapsulates the logic around connection handling, threading, etc. while the processor deals with RPC. The only code written by the application developer lives in the definitional Thrift file and the interface implementation.

Facebook has deployed multiple TServer implementations, including the single-threaded TSimpleServer, thread-per-connection TThreadedServer, and thread-pooling TThreadPoolServer.

The TProcessor interface is very general by design. There is no requirement that a TServer take a generated TProcessor object. Thrift allows the application developer to easily write any type of server that operates on TProtocol objects (for instance, a server could simply stream a certain type of object without any actual RPC method invocation).

7. Implementation Details

7.1 Target Languages

Thrift currently supports five target languages: C++, Java, Python, Ruby, and PHP. At Facebook, we have deployed servers predominantly in C++, Java, and Python. Thrift services implemented in PHP have also been embedded into the Apache web server, providing transparent backend access to many of our frontend constructs using a THttpClient implementation of the TTransport interface.
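As a rough illustration of the division of labor described in Sections 6.1 through 6.3, the sketch below wires a hand-written handler to a processor and a single-connection serve loop. It is a simplification under stated assumptions: plain text streams stand in for TProtocol objects, the Calculator service and its add method are hypothetical, and the line-based request format is invented for this example rather than taken from Thrift.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Stand-ins for TProtocol (an assumption of this sketch): text streams
// carrying one request or reply per line.
using InProtocol = std::istream;
using OutProtocol = std::ostream;

// Mirrors the paper's TProcessor: handle one request, report whether to go on.
struct TProcessor {
  virtual ~TProcessor() = default;
  virtual bool process(InProtocol& in, OutProtocol& out) = 0;
};

// What a compiler might generate for a hypothetical Calculator service.
struct CalculatorIf {
  virtual ~CalculatorIf() = default;
  virtual std::int32_t add(std::int32_t a, std::int32_t b) = 0;
};

// "Generated" processor: decodes the call, invokes the handler, encodes a reply.
class CalculatorProcessor : public TProcessor {
 public:
  explicit CalculatorProcessor(CalculatorIf& handler) : handler_(handler) {}

  bool process(InProtocol& in, OutProtocol& out) override {
    std::string line;
    if (!std::getline(in, line)) return false;  // connection exhausted
    std::istringstream request(line);
    std::string method;
    request >> method;
    if (method == "add") {
      std::int32_t a = 0;
      std::int32_t b = 0;
      if (request >> a >> b) {
        out << "reply " << handler_.add(a, b) << '\n';
      } else {
        out << "exception bad_arguments\n";
      }
    } else {
      out << "exception unknown_method\n";
    }
    return true;
  }

 private:
  CalculatorIf& handler_;
};

// Application code: the only part written by hand.
class CalculatorHandler : public CalculatorIf {
 public:
  std::int32_t add(std::int32_t a, std::int32_t b) override { return a + b; }
};

// Minimal single-threaded loop for one already-accepted connection.
// Returns the number of requests handled.
int serve(TProcessor& processor, InProtocol& in, OutProtocol& out) {
  int handled = 0;
  while (processor.process(in, out)) ++handled;
  return handled;
}
```

The serve() function knows nothing about Calculator, and the handler knows nothing about encoding, which is the separation the TServer description aims for. A real server would additionally obtain the connection from a server transport and wrap it using the transport and protocol factories.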
Though Thrift was explicitly designed to be much more efficient and robust than typical web technologies, as we were designing our XML-based REST web services API we noticed that Thrift could be easily used to define our service interface. Though we do not currently employ SOAP envelopes (in the authors’ opinions there is already far too much repetitive enterprise Java software to do that sort of thing), we were able to quickly extend Thrift to generate XML Schema Definition files for our service, as well as a framework for versioning different implementations of our web service. Though public web services are admittedly tangential to Thrift’s core use case and design, Thrift facilitated rapid iteration and affords us the ability to quickly migrate our entire XML-based web service onto a higher performance system should the need arise.

7.2 Generated Structs

We made a conscious decision to make our generated structs as transparent as possible. All fields are publicly accessible; there are no set() and get() methods. Similarly, use of the isset object is not enforced. We do not include any FieldNotSetException construct. Developers have the option to use these fields to write more robust code, but the system is robust to the developer ignoring the isset construct entirely and will provide suitable default behavior in all cases.

This choice was motivated by the desire to ease application development. Our stated goal is not to make developers learn a rich new library in their language of choice, but rather to generate code that allows them to work with the constructs that are most familiar in each language.

We also made the read() and write() methods of the generated objects public so that the objects can be used outside of the context of RPC clients and servers. Thrift is a useful tool simply for generating objects that are easily serializable across programming languages.

7.3 RPC Method Identification

Method calls in RPC are implemented by sending the method name as a string. One issue with this approach is that longer method names require more bandwidth. We experimented with using fixed-size hashes to identify methods, but in the end concluded that the savings were not worth the headaches incurred. Reliably dealing with conflicts across versions of an interface definition file is impossible without a meta-storage system (i.e. to generate non-conflicting hashes for the current version of a file, we would have to know about all conflicts that ever existed in any previous version of the file).

We wanted to avoid too many unnecessary string comparisons upon method invocation. To deal with this, we generate maps from strings to function pointers, so that invocation is effectively accomplished via a constant-time hash lookup in the common case. This requires the use of a couple of interesting code constructs. Because Java does not have function pointers, process functions are all private member classes implementing a common interface.

    private class ping implements ProcessFunction {
      public void process(int seqid,
                          TProtocol iprot,
                          TProtocol oprot)
        throws TException
      { ...}
    }

    HashMap<String,ProcessFunction> processMap_ =
      new HashMap<String,ProcessFunction>();

In C++, we use a relatively esoteric language construct: member function pointers.

    std::map<std::string,
      void (ExampleServiceProcessor::*)(int32_t,
      facebook::thrift::protocol::TProtocol*,
      facebook::thrift::protocol::TProtocol*)>
     processMap_;

Using these techniques, the cost of string processing is minimized, and we reap the benefit of being able to easily debug corrupt or misunderstood data by inspecting it for known string method names.

7.4 Servers and Multithreading

Thrift services require basic multithreading to handle simultaneous requests from multiple clients. For the Python and Java implementations of Thrift server logic, the standard threading libraries distributed with the languages provide adequate support. For the C++ implementation, no standard multithread runtime library exists. Specifically, robust, lightweight, and portable thread manager and timer class implementations do not exist. We investigated existing implementations, namely boost::thread, boost::threadpool, ACE Thread Manager and ACE Timer.

While boost::threads[1] provides clean, lightweight and robust implementations of multi-thread primitives (mutexes, conditions, threads) it does not provide a thread manager or timer implementation.

boost::threadpool[2] also looked promising but was not far enough along for our purposes. We wanted to limit the dependency on third-party libraries as much as possible. Because boost::threadpool is not a pure template library and requires runtime libraries and because it is not yet part of the official Boost distribution we felt it was not ready for use in Thrift. As boost::threadpool evolves and especially if it is added to the Boost distribution we may reconsider our decision to not use it.

ACE has both a thread manager and timer class in addition to multi-thread primitives. The biggest problem with ACE is that it is ACE. Unlike Boost, ACE API quality is poor. Everything in ACE has large numbers of dependencies on everything else in ACE - thus forcing developers to throw out standard classes, such as STL collections, in favor of ACE’s homebrewed implementations. In addition, unlike Boost, ACE implementations demonstrate little understanding of the power and pitfalls of C++ programming and take no advantage of modern templating techniques to ensure compile time safety and reasonable compiler error messages. For all these reasons, ACE was rejected. Instead, we chose to implement our own library, described in the following sections.

7.5 Thread Primitives

The Thrift thread libraries are implemented in the namespace facebook::thrift::concurrency and have three components:

• primitives
• thread pool manager
• timer manager

As mentioned above, we were hesitant to introduce any additional dependencies on Thrift. We decided to use boost::shared_ptr because it is so useful for multithreaded applications, it requires no link-time or runtime libraries (i.e. it is a pure template library) and it is due to become part of the C++0x standard.

We implement standard Mutex and Condition classes, and a Monitor class. The latter is simply a combination of a mutex and
condition variable and is analogous to the Monitor implementation provided for the Java Object class. This is also sometimes referred to as a barrier. We provide a Synchronized guard class to allow Java-like synchronized blocks. This is just a bit of syntactic sugar, but, like its Java counterpart, clearly delimits critical sections of code. Unlike its Java counterpart, we still have the ability to programmatically lock, unlock, block, and signal monitors.

    void run() {
      {Synchronized s(manager->monitor);
        if (manager->state == TimerManager::STARTING) {
          manager->state = TimerManager::STARTED;
          manager->monitor.notifyAll();
        }
      }
    }

We again borrowed from Java the distinction between a thread and a runnable class. A Thread is the actual schedulable object. The Runnable is the logic to execute within the thread. The Thread implementation deals with all the platform-specific thread creation and destruction issues, while the Runnable implementation deals with the application-specific per-thread logic. The benefit of this approach is that developers can easily subclass the Runnable class without pulling in platform-specific super-classes.

7.6 Thread, Runnable, and shared_ptr

We use boost::shared_ptr throughout the ThreadManager and TimerManager implementations to guarantee cleanup of dead objects that can be accessed by multiple threads. For Thread class implementations, boost::shared_ptr usage requires particular attention to make sure Thread objects are neither leaked nor dereferenced prematurely while creating and shutting down threads.

Thread creation requires calling into a C library. (In our case the POSIX thread library, libpthread, but the same would be true for WIN32 threads.) Typically, the OS makes few, if any, guarantees about when ThreadMain, a C thread’s entry-point function, will be called. Therefore, it is possible that our thread create call, ThreadFactory::newThread(), could return to the caller well before that time. To ensure that the returned Thread object is not prematurely cleaned up if the caller gives up its reference prior to the ThreadMain call, the Thread object makes a weak reference to itself in its start method.

With the weak reference in hand the ThreadMain function can attempt to get a strong reference before entering the Runnable::run method of the Runnable object bound to the Thread. If no strong references to the thread are obtained between exiting Thread::start and entering ThreadMain, the weak reference returns null and the function exits immediately.

The need for the Thread to make a weak reference to itself has a significant impact on the API. Since references are managed through the boost::shared_ptr templates, the Thread object must have a reference to itself wrapped by the same boost::shared_ptr envelope that is returned to the caller. This necessitated the use of the factory pattern. ThreadFactory creates the raw Thread object and a boost::shared_ptr wrapper, and calls a private helper method of the class implementing the Thread interface (in this case, PosixThread::weakRef) to allow it to add a weak reference to itself through the boost::shared_ptr envelope.

Thread and Runnable objects reference each other. A Runnable object may need to know about the thread in which it is executing, and a Thread, obviously, needs to know what Runnable object it is hosting. This interdependency is further complicated because the lifecycle of each object is independent of the other. An application may create a set of Runnable objects to be reused in different threads, or it may create and forget a Runnable object once a thread has been created and started for it.

The Thread class takes a boost::shared_ptr reference to the hosted Runnable object in its constructor, while the Runnable class has an explicit thread method to allow explicit binding of the hosted thread. ThreadFactory::newThread binds the objects to each other.

7.7 ThreadManager

ThreadManager creates a pool of worker threads and allows applications to schedule tasks for execution as free worker threads become available. The ThreadManager does not implement dynamic thread pool resizing, but provides primitives so that applications can add and remove threads based on load. This approach was chosen because implementing load metrics and thread pool size is very application specific. For example, some applications may want to adjust pool size based on a running average of work arrival rates that are measured via polled samples. Others may simply wish to react immediately to work-queue depth high and low water marks. Rather than trying to create a complex API abstract enough to capture these different approaches, we simply leave it up to the particular application and provide the primitives to enact the desired policy and sample current status.

7.8 TimerManager

TimerManager allows applications to schedule Runnable objects for execution at some point in the future. Its specific task is to allow applications to sample ThreadManager load at regular intervals and make changes to the thread pool size based on application policy. Of course, it can be used to generate any number of timer or alarm events.

The default implementation of TimerManager uses a single thread to execute expired Runnable objects. Thus, if a timer operation needs to do a large amount of work and especially if it needs to do blocking I/O, that should be done in a separate thread.

7.9 Nonblocking Operation

Though the Thrift transport interfaces map more directly to a blocking I/O model, we have implemented a high performance TNonBlockingServer in C++ based on libevent and the TFramedTransport. We implemented this by moving all I/O into one tight event loop using a state machine. Essentially, the event loop reads framed requests into TMemoryBuffer objects. Once entire requests are ready, they are dispatched to the TProcessor object which can read directly from the data in memory.

7.10 Compiler

The Thrift compiler is implemented in C++ using standard lex/yacc lexing and parsing. Though it could have been implemented with fewer lines of code in another language (i.e. Python Lex-Yacc (PLY) or ocamlyacc), using C++ forces explicit definition of the language constructs. Strongly typing the parse tree elements (debatably) makes the code more approachable for new developers.

Code generation is done using two passes. The first pass looks only for include files and type definitions. Type definitions are not checked during this phase, since they may depend upon include files. All included files are sequentially scanned in a first pass. Once the include tree has been resolved, a second pass over all files is taken that inserts type definitions into the parse tree and raises an error on any undefined types. The program is then generated against the parse tree.
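The two-pass idea can be sketched in a few lines. The following is a loose analogy rather than the Thrift compiler's actual code: the Field, Struct, and SourceFile types and both function names are hypothetical, and parsing is assumed to have already produced these in-memory structures. The first function only gathers declared type names across the include tree; the second checks every field against that set once all names are known.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, already-parsed representation of definition files.
struct Field { std::string type; std::string name; };
struct Struct { std::string name; std::vector<Field> fields; };
struct SourceFile {
  std::vector<std::string> includes;
  std::vector<Struct> structs;
};
using FileSet = std::map<std::string, SourceFile>;  // path -> parsed file

// Pass 1: follow includes and record every declared type name, unchecked.
void collectTypes(const FileSet& files, const std::string& path,
                  std::set<std::string>& visited,
                  std::set<std::string>& types) {
  if (!visited.insert(path).second) return;  // already scanned
  const auto it = files.find(path);
  if (it == files.end()) return;  // missing include: ignored in this sketch
  for (const auto& inc : it->second.includes)
    collectTypes(files, inc, visited, types);
  for (const auto& s : it->second.structs) types.insert(s.name);
}

// Pass 2: with all names known, report fields whose type is undefined.
std::vector<std::string> undefinedTypes(const FileSet& files,
                                        const std::string& root) {
  static const std::set<std::string> base = {
      "bool", "byte", "i16", "i32", "i64", "double", "string"};
  std::set<std::string> visited;
  std::set<std::string> types;
  collectTypes(files, root, visited, types);

  std::vector<std::string> errors;
  for (const auto& path : visited) {
    const auto it = files.find(path);
    if (it == files.end()) continue;
    for (const auto& s : it->second.structs)
      for (const auto& f : s.fields)
        if (base.count(f.type) == 0 && types.count(f.type) == 0)
          errors.push_back(s.name + "." + f.name + ": " + f.type);
  }
  return errors;
}
```

Deferring the check to the second pass is what lets a struct in one file refer to a type declared in an included file, regardless of the order in which the files are scanned.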
Due to inherent complexities and potential for circular dependencies, we explicitly disallow forward declaration. Two Thrift structs cannot each contain an instance of the other. (Since we do not allow null struct instances in the generated C++ code, this would actually be impossible.)

7.11 TFileTransport

The TFileTransport logs Thrift requests/structs by framing incoming data with its length and writing it out to disk. Using a framed on-disk format allows for better error checking and helps with the processing of a finite number of discrete events. The TFileWriterTransport uses a system of swapping in-memory buffers to ensure good performance while logging large amounts of data. A Thrift log file is split up into chunks of a specified size; logged messages are not allowed to cross chunk boundaries. A message that would cross a chunk boundary will cause padding to be added until the end of the chunk, so that the first byte of the message is aligned to the beginning of the next chunk. Partitioning the file into chunks makes it possible to read and interpret data from a particular point in the file.

8. Facebook Thrift Services

Thrift has been employed in a large number of applications at Facebook, including search, logging, mobile, ads and the developer platform. Two specific usages are discussed below.

8.1 Search

Thrift is used as the underlying protocol and transport layer for the Facebook Search service. The multi-language code generation is well suited for search because it allows for application development in an efficient server side language (C++) and allows the Facebook PHP-based web application to make calls to the search service using Thrift PHP libraries. There is also a large variety of search stats, deployment and testing functionality that is built on top of generated Python code. Additionally, the Thrift log file format is used as a redo log for providing real-time search index updates. Thrift has allowed the search team to leverage each language for its strengths and to develop code at a rapid pace.

8.2 Logging

The Thrift TFileTransport functionality is used for structured logging. Each service function definition along with its parameters can be considered to be a structured log entry identified by the function name. This log can then be used for a variety of purposes, including inline and offline processing, stats aggregation and as a redo log.

9. Conclusions

Thrift has enabled Facebook to build scalable backend services efficiently by enabling engineers to divide and conquer. Application developers can focus on application code without worrying about the sockets layer. We avoid duplicated work by writing buffering and I/O logic in one place, rather than interspersing it in each application.

Thrift has been employed in a wide variety of applications at Facebook, including search, logging, mobile, ads, and the developer platform. We have found that the marginal performance cost incurred by an extra layer of software abstraction is far eclipsed by the gains in developer efficiency and systems reliability.

A. Similar Systems

The following are software systems similar to Thrift. Each is (very!) briefly described:

• SOAP. XML-based. Designed for web services via HTTP, excessive XML parsing overhead.
• CORBA. Relatively comprehensive, debatably overdesigned and heavyweight. Comparably cumbersome software installation.
• COM. Embraced mainly in Windows client software. Not an entirely open solution.
• Pillar. Lightweight and high-performance, but missing versioning and abstraction.
• Protocol Buffers. Closed-source, owned by Google. Described in Sawzall paper.

Acknowledgments

Many thanks for feedback on Thrift (and extreme trial by fire) are due to Martin Smith, Karl Voskuil and Yishan Wong.

Thrift is a successor to Pillar, a similar system developed by Adam D’Angelo, first while at Caltech and continued later at Facebook. Thrift simply would not have happened without Adam’s insights.

References

[1] Kempf, William, “Boost.Threads”, http://www.boost.org/doc/html/threads.html
[2] Henkel, Philipp, “threadpool”, http://threadpool.sourceforge.net