LinkedIn Infrastructure
QCON 2007

Jean-Luc Vaillant, CTO & Co-founder




                                © 2007 LinkedIn corp.
Disclaimer: I will NOT talk about...

• Communication System      • Advertising

• Network Update Feed       • Media Store

• Authentication            • Monitoring

• Security                  • Load Balancing

• Payment                   • etc

• Tracking




Quick Facts about LinkedIn

• Founded end of 2002, based in Mountain View

• Website opened to the public in May 2003

• Membership:

  • 16M members, > 1M new members per month

  • 50% membership outside the US

  • > 30% users active monthly

• Business:

  • Profitable since 2006

  • 180 employees (and yes, we’re hiring!)

Technical stack

• Java! (a bit of Ruby and C++)

• Unix: production is Solaris (mostly x86), development on OS X

• Oracle 10g for critical DBs, some MySQL

• Spring

• ActiveMQ

• Tomcat & Jetty

• Lucene


The “big idea”...

• Professional-centric social network


• Using relationships and degree distance to:


   • improve connect-ability


   • improve relevance


   • enhance sense of reputation


• “Your world is NOT flat”, relationships matter!




The “big idea” (cont)


   [Diagram: many pieces of Data are combined with the Graph of Relationships]

   More effective results!
Just a little problem...

• Typical graph functions:


   • visit all members within N degrees from me


   • visit all members that appeared within N degrees from me since timestamp


   • compute distance and paths between 2 members


• Graph computations do not perform very well in a relational DB :-(


• The only way to get the performance needed is to run the algorithms on data
  stored in RAM!
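
As an illustration of the first graph function listed above, here is a minimal sketch of "visit all members within N degrees from me" as a breadth-first search over an in-memory adjacency map. The function name `members_within`, the dict-of-sets layout and the toy graph are assumptions made for this example, not the engine's actual data structures.

```python
from collections import deque


def members_within(graph, start, max_degree):
    """Return {member: degree} for every member within max_degree hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        member = queue.popleft()
        if distance[member] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbor in graph.get(member, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[member] + 1
                queue.append(neighbor)
    del distance[start]  # the starting member is not part of its own network
    return distance


# Hypothetical toy graph: adjacency sets keyed by member id.
graph = {
    1: {2, 3},
    2: {1, 4},
    3: {1},
    4: {2, 5},
    5: {4},
}

network = members_within(graph, 1, 2)
```

Every step here is a hash lookup in RAM, which is the kind of access pattern a relational DB handles poorly once N grows past 1.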


Just a little problem...

  Q: how do we keep the RAM DB in sync at ALL times?

  DB table Connections:
    from_member_id    int
    to_member_id      int

  [Diagram: several Applications insert / delete connections in the DB;
   several Graph Engines each hold the graph in RAM.
   option 1: the Applications notify the Graph Engines.
   option 2: the Graph Engines are fed from the DB.]
Sync option 1

• Application updates DB and informs graph engines of the change...

   • direct RPC:
           db.addConnection(m1, m2);
           for (GraphEngine graphEngine : graphEngineList) {
               graphEngine.addConnection(m1, m2);
           }


   • reliable multicast: better, but the system is very sensitive to network
     topology, latency and availability

   • JMS topic with durable subscription
         • messaging system becomes a second point of failure
         • provisioning new clients becomes much more painful

• if the application “forgets” to send the second notification we’re in BIG trouble

• Fairly “typical” two-phase commit problem :-(


Sync option 2

• Application updates DB (as usual) and graph engines manage to replicate the
  changes made in the DB


• Provides isolation of concerns


• The technology exists: it’s called DB replication


• There is a catch: it’s proprietary, opaque (Oracle) and works DB to DB only


• We need “user space” replication...




Ideal “Databus”

• Does not miss ANY event

• Does not add any new point of failure

• Scales well

• Works over high latency networks

• Does not assume that clients have 100% uptime.
  Client must be able to catch up when it is back online.

• Easy to add new clients

• Easy to use on the application side

• Easy to set up on the DB side

First attempt... using timestamps


  Connections
    from_member_id    int
    to_member_id      int
    is_active         boolean
    last_modified     timestamp

  [Diagram: Application --insert / update connections--> DB;
   Graph Engine (RAM, keeps lastTimeStamp) --getChangesSince(timestamp)--> DB]

  trigger on insert/update sets timestamp = SysTimeStamp
Works pretty well...

• DB provides full ACID behavior, no additional transaction needed


• Adding trigger and column is straightforward


• Performance on DB not seriously affected


• Client clock is not a factor


• Fairly easy for client to connect and get the event stream




A big problem though...

• Timestamps (in Oracle) are set at the time of the call, NOT at the time of
  commit

  [Timeline: tx1 reads systimestamp at t1; tx2 reads systimestamp at t2;
   tx2 commits; tx1 commits]

• tx2 commits BEFORE tx1 but t2 > t1

• if sync occurs between the 2 commits, lastTimeStamp is t2 and tx1 is missed!

• Workaround (ugly): reread events since lastTimeStamp - n seconds

• Any locking and we can still miss an event
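
The missed event can be reproduced with a few lines of simulation. This is a sketch under simplifying assumptions: `TimestampedTable` and `poll` are hypothetical stand-ins for the DB table and the graph engine's sync loop, and timestamps are plain integers.

```python
class TimestampedTable:
    """Rows become visible at commit but carry the timestamp read earlier."""

    def __init__(self):
        self.committed = []  # list of (row, last_modified)

    def commit(self, row, last_modified):
        self.committed.append((row, last_modified))

    def get_changes_since(self, ts):
        changes = [r for r in self.committed if r[1] > ts]
        return sorted(changes, key=lambda r: r[1])


def poll(table, last_ts):
    """One sync pass: fetch newer rows and advance lastTimeStamp."""
    changes = table.get_changes_since(last_ts)
    if changes:
        last_ts = changes[-1][1]
    return [row for row, _ in changes], last_ts


table = TimestampedTable()
t1, t2 = 100, 101  # tx1 reads the clock first, tx2 second
last_ts = 0
seen = []

table.commit("tx2", t2)               # tx2 commits first
rows, last_ts = poll(table, last_ts)  # sync runs between the two commits
seen += rows
table.commit("tx1", t1)               # tx1 commits later with the older stamp
rows, last_ts = poll(table, last_ts)
seen += rows
# seen contains only "tx2": tx1 carries t1 < lastTimeStamp and is never returned
```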
Second attempt... ora_rowscn

• little-known feature introduced in Oracle 10g

• the ora_rowscn pseudo column contains the internal Oracle clock (SCN -
  System Change Number) at COMMIT time!

• by default, ora_rowscn has block-level granularity

• to get row-level granularity: create table T rowdependencies ...

• SCN always advances, never goes back!

• no performance impact (that we could tell)

• little catch: cannot be indexed!


Indexing ora_rowscn?

• add an indexed column: scn


• set default value for scn to “infinity”


• after commit ora_rowscn is set


• select * from T where scn > :1 AND ora_rowscn > :1


• every so often, update T set scn = ora_rowscn where scn = infinity
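
To make the interplay between the indexed scn column and ora_rowscn concrete, here is a small in-memory model of the steps above. It is a sketch, not Oracle behavior: the `Table` class and its `clock` counter are hypothetical, and it deliberately ignores that the indexer's own UPDATE is itself a commit.

```python
import math

INFINITY = math.inf  # stand-in for the "infinity" default of the scn column


class Table:
    """In-memory stand-in for a table with an indexed scn and an ora_rowscn."""

    def __init__(self):
        self.rows = []
        self.clock = 0  # models the SCN: advances on every commit

    def commit(self, payload):
        self.clock += 1
        self.rows.append(
            {"payload": payload, "scn": INFINITY, "ora_rowscn": self.clock}
        )

    def changes_since(self, last_scn):
        # select * from T where scn > :1 AND ora_rowscn > :1
        # The scn predicate is the indexable one; ora_rowscn filters exactly.
        return [
            r for r in self.rows
            if r["scn"] > last_scn and r["ora_rowscn"] > last_scn
        ]

    def run_indexer(self):
        # update T set scn = ora_rowscn where scn = infinity
        for r in self.rows:
            if r["scn"] == INFINITY:
                r["scn"] = r["ora_rowscn"]


t = Table()
t.commit("a")
t.commit("b")
t.run_indexer()  # "a" and "b" now have scn 1 and 2
t.commit("c")    # "c" still has scn = infinity
payloads = [r["payload"] for r in t.changes_since(1)]
```

Rows not yet touched by the indexer always pass the `scn > :1` test because of the infinity default, so a freshly committed row is picked up even before its scn is filled in.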




Using ora_rowscn (cont)

  Connections
    from_member_id    int
    to_member_id      int
    is_active         boolean
    scn               int (indexed)
    ora_rowscn        virtual

  [Diagram: Application --insert / update connections--> DB;
   Graph Engine (RAM, keeps lastScn) --getCxChangesSince(scn)--> DB;
   an "indexer" copies ora_rowscn to scn]
Supporting more tables/events
  trigger on insert/update allocates txn from sequence and inserts into TxLog

  Connections                           TxLog
    from_member_id    int                 txn           int
    to_member_id      int                 scn           int (indexed)
    is_active         boolean             ts            timestamp
    txn               int                 mask          int
                                          ora_rowscn    virtual

  [Diagram: Application --insert / update connections--> DB;
   Graph Engine (RAM, keeps lastScn) --getChangesSince(scn)--> DB]

  Now we have a single log for all the tables that we want to track
Many tables, many clients..

  [Diagram: Application --insert / update--> DB
   DB tables: TxLog, Connections, Profiles, Members, Groups, Jobs, ...
   Clients calling getChangesSince(table[],scn): Graph Engine, People Search,
   Jobs Search, Ask Your Network Search, Experts Search, ...]

    - adding a new “databusified” table is (fairly) easy
    - clients can track many tables at once
    - transactions involving many tables are supported
Ideal “Databus” ?

• Does not miss ANY event

• Does not add any new point of failure

• Scales well?

• Works over high latency networks

• Does not assume that clients have 100% uptime

• Easy to add new clients

• Easy to use on the application side

• Easy to set up on the DB side

Scaling?

  [Diagram: one DB queried directly by every client: 4 Graph Engines,
   2 People Search, 1 Jobs Search]

  Lots of SQL connections! Does not scale too well!

  Adding a lot of clients will eventually kill my databases
Scaling!
  [Diagram: DB --single DB thread (Wr)--> Relay: Event Buffer --(Rd)--> http server
   --> 4 Graph Engines, 2 People Search, 1 Jobs Search]

  HTTP stateless GETs...

  - only a few DB clients (for HA)
  - hundreds of clients load balanced over HTTP
  - relays can be cascaded (within reason)
  - clients do not talk to DB at all!
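
A minimal sketch of the relay idea: a bounded in-memory event buffer written by one DB-facing thread and read statelessly by clients that pass their own last SCN on each request. The `Relay` class, its capacity handling and the `LookupError` for clients that have fallen too far behind are assumptions made for illustration; the slides do not describe how the real relay handles that case, and the HTTP layer is omitted.

```python
from collections import deque


class Relay:
    """Bounded buffer of (scn, event) pairs, oldest first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.last_dropped_scn = 0  # highest SCN already evicted from the buffer

    def write(self, scn, event):
        # Called only by the single DB-facing thread, in increasing SCN order.
        if len(self.buffer) == self.capacity:
            self.last_dropped_scn = self.buffer.popleft()[0]
        self.buffer.append((scn, event))

    def get_changes_since(self, last_scn):
        # Stateless read: the relay keeps no per-client position.
        if last_scn < self.last_dropped_scn:
            raise LookupError("client is behind the buffer; it must catch up elsewhere")
        return [(scn, event) for scn, event in self.buffer if scn > last_scn]


relay = Relay(capacity=3)
relay.write(10, "a")
relay.write(20, "b")
relay.write(30, "c")
first_read = relay.get_changes_since(10)
relay.write(40, "d")  # evicts the event with SCN 10
second_read = relay.get_changes_since(10)
```

Because each request carries the client's position, any relay behind a load balancer can answer it, which is what makes plain HTTP GETs sufficient.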
Additional applications...

• Cache invalidation


  [Diagram: RPC --> Data Access Layer --> consistent?
     yes --> Master DB
     no  --> Reader (uses the Cache)
   Master DB (TxLog) --events--> Apply Thread --> Cache]

  Cache
    . can be large
    . infinite TTL

  Apply Thread
    loop:
       get updates from Master DB
       foreach (key,value) {
          cache.clear(key);
       }

  Reader
    result = cache.get(key);
    if result is null {
       result = db.get(key);
       if (result not null)
          cache.add(key,result);
    }
    return result;

  consistent = write operations or any subsequent read within N seconds...
  note: N can be dynamically computed and adjusted for latency
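
The Apply Thread and Reader pseudocode above can be sketched as a small runnable class. This is an illustrative, single-threaded model: `InvalidatingCache` is a hypothetical name, a dict stands in for the Master DB, and the event stream is assumed to be a list of (scn, key) pairs.

```python
class InvalidatingCache:
    def __init__(self, db):
        self.db = db      # dict standing in for the Master DB
        self.cache = {}   # no TTL: entries live until they are invalidated
        self.last_scn = 0

    def apply(self, events):
        """Apply Thread step: clear every key named in the change stream."""
        for scn, key in events:
            self.cache.pop(key, None)
            self.last_scn = scn

    def get(self, key):
        """Reader path: try the cache, fall back to the DB, then populate."""
        result = self.cache.get(key)
        if result is None:
            result = self.db.get(key)
            if result is not None:
                self.cache[key] = result
        return result


db = {"k": "v1"}
c = InvalidatingCache(db)
first = c.get("k")    # miss, loaded from the DB
db["k"] = "v2"        # write goes to the Master DB
stale = c.get("k")    # still the old value until the event arrives
c.apply([(7, "k")])   # the bus delivers the change
fresh = c.get("k")
```

The window in which `stale` is returned is why the slide routes writes, and reads shortly after a write, around the cache.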
Combination caching & replication

• Even better, if cache is empty, fill it up from a LOCAL replica


  [Diagram: RPC --> Data Access Layer --> consistent?
     yes --> Master DB
     no  --> Reader (uses the Cache, then the Replica DB)
   Master DB (TxLog) --events--> Apply Thread --> Replica DB and Cache]

  Cache
    . optional
    . can be large
    . infinite TTL

  Replica DB
    . keeps lastScn

  Apply Thread
    loop:
       get updates from Master DB
       foreach (key,value) {
          repdb.put(key,value);
          cache.set(key,value);
       }

  Reader
    result = cache.get(key);
    if result is null {
       result = repdb.get(key);
       if (result not null)
          cache.add(key,result);
    }
    return result;

  Some race conditions exist; details for dealing with them are omitted
Specialized indices

• bulk lookups: “which of these 800 contacts in my address book is a member?”
• spatial indices...
• partial indices for time relevant data (most recent, short lived data)




  [Diagram: DB (TxLog) --> Apply Thread --> Specialized Index (RAM)]
The overall view...

  [Diagram:
   Applications --JDBC--> DBs (with TxLog)
   Applications --RPC--> Graph Engines | Lucene Search Engines |
                         Caches / Replica DBs | Custom Indices
   each of these is fed by its own Apply Thread
   DBs (TxLog) --> Relays (capture) --> Databus Relays (mostly for HA)
               --HTTP--> Apply Threads]
Vendor lock-in?

• Our capture code is indeed Oracle specific but...


• All we need from the DB is


  • having a unique ID per transaction (txn). No ordering required, just uniqueness


  • having ONE table with a non-indexable “SCN”


  • serializable transactions


  • (optional) alert mechanism (cond.signal())




Databus summary

• it is NOT a messaging system, it only provides the last state of an object

• it is NOT a caching system or datastore, it provides changes "since a point in
  time", not "by key"

• it will provide 100% consistency between systems... eventually...

• no additional SPOF: if you commit to the DB, it will be available on the bus

• moves replication benefits out of the DBA land into the application space

• very limited support from the DBMS required

• plugging in the Databus from a SWE POV is as simple as wiring a Spring
  component and writing an event handler...

• allows for a full range of data management schemes if the data is mostly read

Back to the Graph Engine...



  [Diagram of one Graph Engine:
   RPC Server exposing
     getDistance(m1,m2)
     filterAndSort(m,degree,[mbrs,data])
     getNetwork(m,degree,ts)
   Connections Store and Groups Store, each with its own Visitor
   Aggregator combining the Visitors' results
   Storage Mgr --> Local Disk (scn)
   Databus feeding the stores]
Algorithms

• visitor pattern within a “Read Transaction”, standard Breadth First

  [Figure: network levels labeled 1, 2, 3]

• a little fancier for visitNewSince(TimeStamp ts) if you want to avoid traversing
  the graph twice

• use symmetry when possible

  [Figure: two searches of depth 2, one from each member, are much cheaper than
   one search of depth 4]
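
The symmetry trick can be sketched as a bidirectional breadth-first search: expand from both members and stop when the two frontiers meet. This is a generic textbook version, not the engine's code; the function name `distance`, the `max_degree` cutoff and the toy graph are assumptions, and the graph is assumed undirected (connections are mutual).

```python
def distance(graph, a, b, max_degree):
    """Hop count between a and b, or None if it exceeds max_degree."""
    if a == b:
        return 0
    seen = [{a: 0}, {b: 0}]   # per side: member -> depth from that side
    frontier = [{a}, {b}]
    depth = [0, 0]
    while frontier[0] and frontier[1] and depth[0] + depth[1] < max_degree:
        # Expand the smaller frontier so both searches stay shallow.
        side = 0 if len(frontier[0]) <= len(frontier[1]) else 1
        other = 1 - side
        depth[side] += 1
        next_frontier = set()
        best = None
        for member in frontier[side]:
            for neighbor in graph.get(member, ()):
                if neighbor in seen[side]:
                    continue
                seen[side][neighbor] = depth[side]
                next_frontier.add(neighbor)
                if neighbor in seen[other]:
                    total = depth[side] + seen[other][neighbor]
                    if best is None or total < best:
                        best = total
        if best is not None:
            return best  # the frontiers met on this level
        frontier[side] = next_frontier
    return None


# Hypothetical toy graph: a chain 1-2-3-4-5 plus member 6 connected to 1.
graph = {
    1: {2, 6},
    2: {1, 3},
    3: {2, 4},
    4: {3, 5},
    5: {4},
    6: {1},
}

d = distance(graph, 1, 5, 4)
```

With a branching factor of k, two depth-2 searches touch on the order of 2·k² members where a single depth-4 search touches on the order of k⁴.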
First challenge: Read Write Lock gotchas

• Wikipedia:
 A read/write lock pattern is a software design pattern that allows concurrent read access to an object but
 requires exclusive access for write operations. In this pattern, multiple readers can read the data in parallel but
 needs exclusive lock while writing the data. When writer is writing the data, readers will be blocked until writer is
 finished writing.


• Sounds like a great locking mechanism for any structure with a LOT of reads
  and a FEW tiny (serializable) writes...


• Graph index is one of these...


• So why do we observe:

   • CPU caps way under 100%

   • Graph RPC calls take much longer than expected

Read Write Lock gotchas

    Problem occurs when some readers take significant time...



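    [Timeline: requests R1, W, R2, R3 arrive in that order; bars of 400 ms,
     5 ms and 100 ms. The long-running reader R1 holds the read lock, the
     writer W queues behind it, and readers R2 and R3 queue behind W.]

    R2 is delayed by nearly 400 ms for a 5 ms call!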
Read Write Lock gotchas

• Solution: don’t use RW lock, use Copy On Write...

  [Diagram:
   Reader 1 sees: root --> 1  --> {2, 3, 4}
   Reader 2 sees: root --> 1' --> {2, 3, 4'},  4' --> 5
   Nodes 2 and 3 are shared by both versions.]

  Writer
    add node 5
    dup node 4 and 1
    switch root
    wait for previous readers to finish
    free nodes 1 and 4
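
The writer's steps can be sketched with immutable nodes and path copying. This is an illustrative model under stated assumptions: `Node`, `Index` and `add_child` are hypothetical names, a single writer is assumed, and Python's garbage collector plays the role of "wait for previous readers to finish, free nodes 1 and 4".

```python
class Node:
    __slots__ = ("value", "children")

    def __init__(self, value, children=()):
        self.value = value
        self.children = tuple(children)  # never mutated after construction


def add_child(node, path, new_value):
    """Return a new root with new_value added under the node reached by path.

    Only the nodes along the path are duplicated; all others are shared.
    """
    if not path:
        return Node(node.value, node.children + (Node(new_value),))
    index = path[0]
    copied = add_child(node.children[index], path[1:], new_value)
    children = node.children[:index] + (copied,) + node.children[index + 1:]
    return Node(node.value, children)


class Index:
    def __init__(self, root):
        self.root = root

    def snapshot(self):
        # A reader takes the root once and works on that version, lock-free.
        return self.root

    def add(self, path, new_value):
        # Single writer assumed; rebinding the attribute is the "switch root".
        self.root = add_child(self.root, path, new_value)


def values(node):
    """Pre-order list of the values reachable from node."""
    out = [node.value]
    for child in node.children:
        out.extend(values(child))
    return out


index = Index(Node(1, [Node(2), Node(3), Node(4)]))
before = index.snapshot()  # Reader 1's view
index.add([2], 5)          # add node 5 under node 4: dups nodes 4 and 1
after = index.snapshot()   # Reader 2's view
```

A slow reader holding `before` never blocks the writer or later readers, which removes the 400 ms stall of the read/write lock.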
Second challenge: one’s network size...

• Graph of connections is highly skewed: yet another power law...


• Heavy connectors “attract” each other which leads to a “black hole” effect


• Once you introduce distance > 1 the network size grows very fast !


• Compressing can flatten that curve considerably...


  [Figure: the network by degree (levels 1, 2, 3) goes through "compress and
   partition" and comes out as a smaller, partitioned network]

  Network artifact is now:
  - cacheable
  - transportable over the network
  - whole, per degree, per “chunk”
With caching

  [Diagram:
   Graph Computing Engine --compressed network (levels 1, 2, 3)--> Network Cache
   Network Cache: maintains compressed network of logged in users
   Search <--> Network Cache:
     A: send ids for filtering and distance
     B: ask for ids within distance + within id range]

• cache is maintained compressed
• compression is designed so visits, lookup and partial fetches are fast!
• network can be shipped to other systems for additional relevance
Summary

• Databus infrastructure useful for....

   • maintaining indices more efficiently than in a std relational DB

      • Graph of relationships beyond first degree

      • Text based indices (Lucene)

   • splitting data persistence from read queries

      • use DB mainly for storing data and maintaining TxLog

      • use RAM for processing/querying the data

   • keeping distributed caches in sync with Master DBs


Q&A





Contenu connexe

Tendances

Partitioning CCGrid 2012
Partitioning CCGrid 2012Partitioning CCGrid 2012
Partitioning CCGrid 2012Weiwei Chen
 
Performance? That's what version 2 is for!
Performance? That's what version 2 is for!Performance? That's what version 2 is for!
Performance? That's what version 2 is for!Eduard Tudenhoefner
 
Consolidated shared indexes in real time
Consolidated shared indexes in real timeConsolidated shared indexes in real time
Consolidated shared indexes in real timeJeff Mace
 
MapReduce Using Perl and Gearman
MapReduce Using Perl and GearmanMapReduce Using Perl and Gearman
MapReduce Using Perl and GearmanJamie Pitts
 
ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra
ApacheCon Europe 2012 - Real Time Big Data in practice with CassandraApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra
ApacheCon Europe 2012 - Real Time Big Data in practice with CassandraMichaël Figuière
 
Dbi Advanced Talk 200708
Dbi Advanced Talk 200708Dbi Advanced Talk 200708
Dbi Advanced Talk 200708oscon2007
 
Chapter9 network managment-3ed
Chapter9 network managment-3edChapter9 network managment-3ed
Chapter9 network managment-3edKhánh Ghẻ
 
Troubleshooting Dual-Protocol Networks and Systems by Scott Hogg at gogoNET L...
Troubleshooting Dual-Protocol Networks and Systems by Scott Hogg at gogoNET L...Troubleshooting Dual-Protocol Networks and Systems by Scott Hogg at gogoNET L...
Troubleshooting Dual-Protocol Networks and Systems by Scott Hogg at gogoNET L...gogo6
 
Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution ISSGC Summer School
 
DB2 V10 Migration Guidance
DB2 V10 Migration GuidanceDB2 V10 Migration Guidance
DB2 V10 Migration GuidanceCraig Mullins
 

Linked In Lessons Learned And Growth And Scalability

  • 1. Linked frastructure QCON 2007 Jean-Luc Vaillant, CTO & Co-founder © 2007 LinkedIn corp.
  • 2. Disclaimer: I will NOT talk about...
      • Communication System
      • Network Update Feed
      • Authentication
      • Security
      • Payment
      • Tracking
      • Advertising
      • Media Store
      • Monitoring
      • Load Balancing
      • etc
  • 3. Quick Facts about LinkedIn
      • Founded end of 2002, based in Mountain View
      • Website opened to the public in May 2003
      • Membership:
          • 16M members, > 1M new members per month
          • 50% membership outside the US
          • > 30% users active monthly
      • Business:
          • Profitable since 2006
          • 180 employees (and yes, we’re hiring!)
  • 4. Technical stack
      • Java! (a bit of Ruby and C++)
      • Unix: production is Solaris (mostly x86), development on OS X
      • Oracle 10g for critical DBs, some MySQL
      • Spring
      • ActiveMQ
      • Tomcat & Jetty
      • Lucene
  • 5. The “big idea”...
      • Professional centric social network
      • Using relationships and degree distance to:
          • improve connect-ability
          • improve relevance
          • enhance sense of reputation
      • “Your world is NOT flat”, relationships matter!
  • 6. The “big idea” (cont)
      [Diagram: "Data" combined with the "Graph of Relationships" gives "More effective results!"]
  • 7. Just a little problem...
      • Typical graph functions:
          • visit all members within N degrees from me
          • visit all members that appeared within N degrees from me since timestamp
          • compute distance and paths between 2 members
      • Graph computations do not perform very well in a relational DB :-(
      • Only way to get the performance needed is to run the algorithms on data stored in RAM!
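To make the "typical graph functions" concrete, here is a minimal sketch of two of them over an adjacency map held in RAM: a breadth-first walk of all members within N degrees, and the degree distance between two members. The class and method names (`ConnectionGraph`, `membersWithin`, `distance`) are hypothetical and are not taken from the talk; a production graph engine would likely use far more compact structures.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical in-memory connection graph; names are illustrative only. */
final class ConnectionGraph {
    private final Map<Integer, Set<Integer>> adjacency = new HashMap<>();

    /** Connections are symmetric, so the edge is stored in both directions. */
    void addConnection(int a, int b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Breadth-first search: maps each member within maxDegrees of start to its degree distance. */
    Map<Integer, Integer> membersWithin(int start, int maxDegrees) {
        Map<Integer, Integer> distance = new HashMap<>();
        Queue<Integer> frontier = new ArrayDeque<>();
        distance.put(start, 0);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int member = frontier.remove();
            int d = distance.get(member);
            if (d == maxDegrees) {
                continue; // do not expand past the requested degree
            }
            for (int next : adjacency.getOrDefault(member, Collections.emptySet())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, d + 1);
                    frontier.add(next);
                }
            }
        }
        distance.remove(start); // "me" is not part of my own network
        return distance;
    }

    /** Degree distance between two members, or -1 if not reachable within maxDegrees. */
    int distance(int from, int to, int maxDegrees) {
        if (from == to) {
            return 0;
        }
        return membersWithin(from, maxDegrees).getOrDefault(to, -1);
    }
}
```

Every step of the walk is a hash lookup in memory; the same traversal in a relational DB would need roughly one self-join or query per degree, which is the performance problem the slide points at.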
  • 8. Just a little problem...
      • Q: how do we keep the RAM DB in sync at ALL times?
      [Diagram: several Application nodes insert / delete connections in the DB (table Connections: from_member_id int, to_member_id int); several Graph Engine nodes each hold the graph in RAM. Arrows labeled "option 1" and "option 2" mark the two candidate sync paths.]
  • 9. Sync option 1
      • Application updates DB and informs graph engines of the change...
          • direct RPC:
                db.addConnection(m1,m2);
                foreach (graphEngine in GraphEngineList) {
                    graphEngine.addConnection(m1,m2);
                }
          • reliable multicast: better, but the system is very sensitive to network topology, latency and availability
          • JMS topic with durable subscription
              • messaging system becomes a second point of failure (POF)
              • provisioning new clients becomes much more painful
      • if application “forgets” to send the second notification we’re in BIG trouble
      • Fairly “typical” 2 phase commit problem :-(
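The weakness of the direct RPC variant can be sketched in a few lines: the DB write and the engine notifications are separate steps, so any engine that is unreachable at that moment silently diverges from the DB. Everything below (`DirectRpcSync`, `Engine`, the `reachable` flag) is a hypothetical stand-in for illustration, not code from the talk.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of sync option 1 (direct RPC); all names are illustrative. */
final class DirectRpcSync {
    /** Stand-in for a remote graph engine; 'reachable' simulates network availability. */
    static final class Engine {
        final Set<String> edges = new HashSet<>();
        boolean reachable = true;

        void addConnection(int m1, int m2) {
            if (!reachable) {
                throw new IllegalStateException("engine unreachable");
            }
            edges.add(m1 + "-" + m2);
        }
    }

    final Set<String> db = new HashSet<>(); // stand-in for the Connections table
    final List<Engine> engines = new ArrayList<>();

    /** Writes to the DB, then notifies each engine; returns how many engines missed the update. */
    int addConnection(int m1, int m2) {
        db.add(m1 + "-" + m2); // step 1: the DB write is already committed
        int missed = 0;
        for (Engine engine : engines) { // step 2: notify every engine
            try {
                engine.addConnection(m1, m2);
            } catch (IllegalStateException ex) {
                missed++; // the DB and this engine now disagree
            }
        }
        return missed;
    }
}
```

Nothing in this flow lets a missed engine catch up later, which is why the slide describes it as a two-phase commit problem.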
  • 10. Sync option 2
      • Application updates DB (as usual) and graph engines manage to replicate the changes made in the DB
      • Provides isolation of concerns
      • Technology exists, it’s called DB replication
      • There is a catch: it’s proprietary, opaque (Oracle) and works DB to DB only
      • We need a “user space” replication...
  • 11. Ideal “Databus” • Does not miss ANY event • Does not add any new point of failure • Scales well • Works over high latency networks • Does not assume that clients have 100% uptime. Client must be able to catch up when it is back online. • Easy to add new clients • Easy to use on the application side • Easy to set up on the DB side
  • 12. First attempt... using timestamps • Table Connections: from_member_id int, to_member_id int, is_active boolean, last_modified timestamp • Trigger on insert/update sets timestamp = SysTimeStamp • Application: insert / update connections in the DB • Graph Engine (RAM): keeps lastTimeStamp and calls getChangesSince(timestamp) on the DB
  • 13. Works pretty well... • DB provides full ACID behavior, no additional transaction needed • Adding trigger and column is straightforward • Performance on DB not seriously affected • Client clock is not a factor • Fairly easy for client to connect and get the event stream
  • 14. A big problem though... • Timestamps (in Oracle) are set at the time of the call, NOT at commit time • tx2 commits BEFORE tx1 but t2 > t1 • if sync occurs between the 2 commits, lastTimeStamp is t2 and tx1 is missed! • Workaround (ugly): reread events since lastTimeStamp - n seconds • Any locking and we can still miss an event • [Timeline: tx1 systimestamp t1, tx2 systimestamp t2, tx2 commit, tx1 commit]
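The interleaving on this slide can be replayed with a toy model. This is not Oracle code: `TimestampRace`, `Row` and `replay` are made-up names, timestamps are plain longs, and "commit" just means the row becomes visible to readers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the race: timestamps are stamped before commit, readers only see committed rows.
class TimestampRace {
    static final class Row {
        final String name;
        final long lastModified;
        Row(String name, long lastModified) { this.name = name; this.lastModified = lastModified; }
    }

    private final List<Row> committed = new ArrayList<>(); // rows visible to readers

    void commit(Row row) { committed.add(row); }

    // Stands in for: select ... where last_modified > :since
    List<Row> getChangesSince(long since) {
        List<Row> out = new ArrayList<>();
        for (Row r : committed) if (r.lastModified > since) out.add(r);
        return out;
    }

    // Replays the slide's interleaving and returns what the client has seen after two syncs.
    static List<String> replay() {
        TimestampRace db = new TimestampRace();
        List<String> seen = new ArrayList<>();
        long lastTimeStamp = 0;

        Row tx1 = new Row("tx1", 1); // trigger stamps t1, tx1 not yet committed
        Row tx2 = new Row("tx2", 2); // trigger stamps t2 > t1
        db.commit(tx2);              // tx2 commits first

        for (Row r : db.getChangesSince(lastTimeStamp)) { // sync between the two commits
            seen.add(r.name);
            lastTimeStamp = Math.max(lastTimeStamp, r.lastModified);
        }

        db.commit(tx1);              // tx1 commits, still carrying the older t1

        for (Row r : db.getChangesSince(lastTimeStamp)) { // t1 is not > t2: nothing comes back
            seen.add(r.name);
            lastTimeStamp = Math.max(lastTimeStamp, r.lastModified);
        }
        return seen;
    }
}
```

In this model `replay()` returns only `tx2`: the second sync asks for rows newer than t2 and tx1 never qualifies, which is the missed event the slide describes.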
  • 15. Second attempt... ora_rowscn • little known feature introduced in Oracle 10g • the ora_rowscn pseudo column contains the internal Oracle clock (SCN - System Change Number) at COMMIT time! • by default, ora_rowscn has block-level granularity • to get row-level granularity: create table T rowdependencies ... • SCN always advances, never goes back! • no performance impact (that we could tell) • little catch: cannot be indexed!
  • 16. Indexing ora_rowscn? • add an indexed column: scn • set default value for scn to “infinity” • after commit ora_rowscn is set • select * from T where scn > :1 AND ora_rowscn > :1 • every so often, update T set scn = ora_rowscn where scn = infinity
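The scn / ora_rowscn trick can be modelled in memory to see why the two-part predicate works. This is a sketch under stated assumptions, not Oracle behavior: `ScnIndexModel` is a made-up class, a `TreeMap` stands in for the index on `scn`, a counter stands in for the database SCN, only inserts are modelled, and any effect the indexer's own update has on ora_rowscn is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory model of the scn / ora_rowscn trick; all names are illustrative.
class ScnIndexModel {
    static final long INFINITY = Long.MAX_VALUE;

    static final class Row {
        final String key;
        long scn = INFINITY; // indexed column, default "infinity"
        long oraRowScn;      // stands in for the ora_rowscn pseudo column
        Row(String key) { this.key = key; }
    }

    private final TreeMap<Long, List<Row>> scnIndex = new TreeMap<>(); // index on scn
    private long clock = 0; // stands in for the database SCN

    // Commit an insert: the row gets its commit SCN, scn stays at infinity.
    Row commitInsert(String key) {
        Row row = new Row(key);
        row.oraRowScn = ++clock;
        scnIndex.computeIfAbsent(INFINITY, k -> new ArrayList<>()).add(row);
        return row;
    }

    // select * from T where scn > :1 AND ora_rowscn > :1
    List<String> changesSince(long since) {
        List<String> out = new ArrayList<>();
        for (List<Row> bucket : scnIndex.tailMap(since, false).values()) // index range scan on scn
            for (Row r : bucket)
                if (r.oraRowScn > since) out.add(r.key);                 // exact filter on ora_rowscn
        return out;
    }

    // update T set scn = ora_rowscn where scn = infinity
    void runIndexer() {
        List<Row> pending = scnIndex.remove(INFINITY);
        if (pending == null) return;
        for (Row r : pending) {
            r.scn = r.oraRowScn;
            scnIndex.computeIfAbsent(r.scn, k -> new ArrayList<>()).add(r);
        }
    }
}
```

The index narrows the scan to rows whose `scn` is above the client's position plus the not-yet-indexed rows sitting at "infinity"; the `ora_rowscn` comparison then discards the not-yet-indexed rows the client has already seen.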
  • 17. Using ora_rowscn (cont) • Table Connections: from_member_id int, to_member_id int, is_active boolean, scn int (indexed), ora_rowscn virtual • Application: insert / update connections in the DB • Graph Engine (RAM): keeps lastScn and calls getCxChangesSince(scn) on the DB • “indexer”: copies ora_rowscn to scn
  • 18. Supporting more tables/events • Trigger on insert/update allocates txn from sequence and inserts into TxLog • Table Connections: from_member_id int, to_member_id int, is_active boolean, txn int • Table TxLog: txn int, scn int (indexed), ts timestamp, mask int, ora_rowscn virtual • Application: insert / update connections in the DB • Graph Engine (RAM): keeps lastScn and calls getChangesSince(scn) on the DB • Now we have a single log for all the tables that we want to track
  • 19. Many tables, many clients... • Application: insert / update in the DB • DB tables: TxLog, Connections, Profiles, Members, Groups, Jobs, ... • Clients calling getChangesSince(table[],scn): Graph Engine, People Search, Jobs Search, Ask Your Network Search, Experts Search, ... • adding a new “databusified” table is (fairly) easy • clients can track many tables at once • transactions involving many tables are supported
  • 20. Ideal “Databus” ? • Does not miss ANY event • Does not add any new point of failure • Scales well? • Works over high latency networks • Does not assume all clients to have 100% uptime • Easy to add new clients • Easy to use on the application side • Easy to set up on the DB side
  • 21. Scaling? [Diagram: many Graph Engine, People Search and Jobs Search clients, all connected directly to the DB] • Lots of SQL connections! • Does not scale too well! • Adding a lot of clients will eventually kill my databases
  • 22. Scaling! [Diagram: DB → Relay (a single DB thread writes into an Event Buffer, an http server reads from it) → HTTP stateless GETs from the Graph Engine, People Search and Jobs Search clients] • only a few DB clients (for HA) • hundreds of clients load balanced over HTTP • relays can be cascaded (within reason) • clients do not talk to DB at all!
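A relay's event buffer might look roughly like the following: one writer appends events in scn order, and each stateless GET carries the client's last scn. The `RelayBuffer` class, its bounded capacity and the "return null when the client is too far behind" convention are assumptions made for this sketch; the talk does not describe the buffer's internals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical relay buffer; class and method names are not from the talk.
class RelayBuffer {
    static final class Event {
        final long scn;
        final String payload;
        Event(long scn, String payload) { this.scn = scn; this.payload = payload; }
    }

    private final ArrayDeque<Event> events = new ArrayDeque<>();
    private final int capacity;
    private long evictedUpTo = 0; // highest scn dropped from the buffer so far

    RelayBuffer(int capacity) { this.capacity = capacity; }

    // Called by the single writer thread that polls the DB in scn order.
    synchronized void append(Event e) {
        if (events.size() == capacity) evictedUpTo = events.removeFirst().scn;
        events.addLast(e);
    }

    // Backs a stateless GET: the client sends its lastScn on every request.
    // Returns null when the client is too far behind for this buffer.
    synchronized List<Event> getChangesSince(long lastScn) {
        if (lastScn < evictedUpTo) return null;
        List<Event> out = new ArrayList<>();
        for (Event e : events) if (e.scn > lastScn) out.add(e);
        return out;
    }
}
```

Because the client supplies its own position on every call, the relay keeps no per-client state, which is what makes load balancing hundreds of clients over HTTP straightforward.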
  • 23. Additional applications... • Cache invalidation
      [Diagram: RPC → Data Access Layer → consistent? yes: Master DB; no: Reader → Cache. Events from the Master DB TxLog reach the Apply Thread.]
      Cache: can be large, infinite TTL
      Apply Thread:
          loop:
              get updates from Master DB
              foreach (key,value) {
                  cache.clear(key);
              }
      Reader:
          result = cache.get(key);
          if result is null {
              result = db.get(key);
              if (result not null)
                  cache.add(key,result);
          }
          return result;
      consistent = write operations or any subsequent read within N seconds...
      note: N can be dynamically computed and adjusted for latency
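The Reader and Apply Thread pseudocode on this slide could be written in Java along these lines. It is a minimal sketch: `InvalidatingCache` is a made-up class, the master DB is a plain `Map`, and a batch of TxLog events is reduced to a list of changed keys.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: the master DB is a plain map, and TxLog events are a list of changed keys.
class InvalidatingCache {
    private final Map<String, String> masterDb;
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // entries never expire

    InvalidatingCache(Map<String, String> masterDb) { this.masterDb = masterDb; }

    // Reader path: consistent reads bypass the cache, all others read through it.
    String get(String key, boolean consistent) {
        if (consistent) return masterDb.get(key);
        String result = cache.get(key);
        if (result == null) {
            result = masterDb.get(key);
            if (result != null) cache.put(key, result);
        }
        return result;
    }

    // Apply-thread body for one batch of events: drop every key that changed.
    void apply(List<String> changedKeys) {
        for (String key : changedKeys) cache.remove(key);
    }
}
```

An infinite TTL is safe here only because every change to the master eventually arrives as an event that clears the stale entry.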
  • 24. Combination caching & replication • Even better, if cache is empty, fill it up from a LOCAL replica
      [Diagram: RPC → Data Access Layer → consistent? yes: Master DB; no: Reader → Cache, backed by the Replica DB (lastScn). Events from the Master DB TxLog reach the Apply Thread.]
      Cache: optional, can be large, infinite TTL
      Apply Thread:
          loop:
              get updates from Master DB
              foreach (key,value) {
                  repdb.put(key,value);
                  cache.set(key,value);
              }
      Reader:
          result = cache.get(key);
          if result is null {
              result = repdb.get(key);
              if (result not null)
                  cache.add(key,result);
          }
          return result;
      Some race conditions exist; details for dealing with them are omitted
  • 25. Specialized indices • bulk lookups: “which of these 800 contacts in my address book is a member?” • spatial indices... • partial indices for time relevant data (most recent, short lived data) • [Diagram: DB TxLog → Apply Thread → Specialized Index (RAM)]
  • 26. The overall view... [Diagram: Applications (RPC, JDBC) on top of Caches, Lucene Search Engines, Graph Engines, Custom Indices, Replica DBs and the DBs (TxLog); Apply Threads are fed over HTTP by the Databus, made of Relays (capture) and Relays (mostly for HA)]
  • 27. Vendor lock-in? • Our capture code is indeed Oracle specific but... • All we need from the DB is • having a unique ID per transaction (txn). No ordering required, just uniqueness • having ONE table with a non-indexable “SCN” • serializable transactions • (optional) alert mechanism (cond.signal())
  • 28. Databus summary • it is NOT a messaging system, it only provides the last state of an object • it is NOT a caching system or datastore, it provides changes “since a point in time”, not “by key” • it will provide 100% consistency between systems... eventually... • no additional SPOF: if you commit to the DB, it will be available on the bus • moves replication benefits out of the DBA land into the application space • very limited support from the DBMS required • plugging in the Databus from a SWE POV is as simple as wiring a Spring component and writing an event handler... • allows for a full range of data management schemes if the data is mostly read
  • 29. Back to the Graph Engine... [Diagram components: Databus (input), Connections Store, Groups Store, Storage Mgr (Local Disk, scn), Visitors, Aggregator, RPC Server] • RPC calls: getDistance(m1,m2) • filterAndSort(m,degree,[mbrs,data]) • getNetwork(m,degree,ts)
  • 30. Algorithms • visitor pattern within a “Read Transaction”, standard Breadth First • a little fancier for visitNewSince(TimeStamp ts) if you want to avoid traversing the graph twice • use symmetry when possible [Diagram: two searches reaching degrees 1-2, one around each member, are much cheaper than one search reaching degrees 1-4 around a single member]
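One reading of "use symmetry when possible" is that, because connections are mutual, a distance query can search from both members at once and stop where the two searches meet. The sketch below follows that reading; `DistanceFinder`, the adjacency-map layout and the `maxDegree` cut-off are assumptions, not the Graph Engine's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Bidirectional breadth-first search over a symmetric (undirected) connection graph.
class DistanceFinder {
    // Returns the degree distance between m1 and m2, or -1 if it exceeds maxDegree.
    static int getDistance(Map<Integer, List<Integer>> graph, int m1, int m2, int maxDegree) {
        if (m1 == m2) return 0;
        Map<Integer, Integer> distA = new HashMap<>(Map.of(m1, 0));
        Map<Integer, Integer> distB = new HashMap<>(Map.of(m2, 0));
        List<Integer> frontierA = List.of(m1);
        List<Integer> frontierB = List.of(m2);
        int depthA = 0, depthB = 0;
        while (!frontierA.isEmpty() && !frontierB.isEmpty() && depthA + depthB < maxDegree) {
            // Grow the smaller frontier by one full level; symmetry means either side is valid.
            boolean growA = frontierA.size() <= frontierB.size();
            List<Integer> frontier = growA ? frontierA : frontierB;
            Map<Integer, Integer> mine = growA ? distA : distB;
            Map<Integer, Integer> theirs = growA ? distB : distA;
            int depth = (growA ? depthA : depthB) + 1;
            List<Integer> next = new ArrayList<>();
            for (int member : frontier) {
                for (int neighbor : graph.getOrDefault(member, List.of())) {
                    Integer other = theirs.get(neighbor);
                    if (other != null) return depth + other; // the two searches met
                    if (mine.putIfAbsent(neighbor, depth) == null) next.add(neighbor);
                }
            }
            if (growA) { frontierA = next; depthA = depth; } else { frontierB = next; depthB = depth; }
        }
        return -1;
    }
}
```

Since network size grows very quickly with each extra degree, two shallow searches typically touch far fewer members than one deep search.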
  • 31. First challenge: Read Write Lock gotchas • Wikipedia: A read/write lock pattern is a software design pattern that allows concurrent read access to an object but requires exclusive access for write operations. In this pattern, multiple readers can read the data in parallel but needs exclusive lock while writing the data. When writer is writing the data, readers will be blocked until writer is finished writing. • Sounds like a great locking mechanism for any structure with a LOT of reads and a FEW tiny (serializable) writes... • Graph index is one of these... • So why do we observe: • CPU caps way under 100% • Graph RPC calls take much longer than expected
  • 32. Read Write Lock gotchas • Problem occurs when some readers take significant time... [Timeline: R1 (400 ms), W, R2 (5 ms), R3 (100 ms): W waits for R1 to finish, and R2 waits behind W] • R2 is delayed by nearly 400 ms for a 5 ms call!
  • 33. Read Write Lock gotchas • Solution: don’t use RW lock, use Copy On Write... [Diagram: Reader 1 holds the old root (1 → 2, 3, 4); Reader 2 holds the new root (1' → 2, 3, 4' → 5)] • Writer: • add node 5 • dup node 4 and 1 • switch root • wait for previous readers to finish • free nodes 1 and 4
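In Java the copy-on-write idea can be sketched with an `AtomicReference` to an immutable snapshot. `CopyOnWriteGraph` is a hypothetical class and a simplification of the slide: it copies the whole top-level map rather than only the nodes on the modified path, it assumes a single writer thread, and the "wait for previous readers, then free" steps are left to the garbage collector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class CopyOnWriteGraph {
    // Readers only ever see a fully built snapshot that is never mutated afterwards.
    private final AtomicReference<Map<Integer, List<Integer>>> root =
            new AtomicReference<>(Collections.emptyMap());

    // Reader: grab the current root once and use it for the whole "read transaction".
    Map<Integer, List<Integer>> snapshot() { return root.get(); }

    // Writer (assumed single-threaded): duplicate what changes, then switch the root.
    void addConnection(int m1, int m2) {
        Map<Integer, List<Integer>> old = root.get();
        Map<Integer, List<Integer>> copy = new HashMap<>(old); // shallow: untouched lists are shared
        copy.put(m1, append(old.get(m1), m2));
        copy.put(m2, append(old.get(m2), m1));
        root.set(Collections.unmodifiableMap(copy)); // readers holding the old root are unaffected
    }

    // Builds a new immutable list instead of modifying the one readers may be using.
    private static List<Integer> append(List<Integer> existing, int member) {
        List<Integer> list = existing == null ? new ArrayList<>() : new ArrayList<>(existing);
        list.add(member);
        return Collections.unmodifiableList(list);
    }
}
```

Readers never block and never wait on the writer, so a slow read can no longer delay the short reads queued behind a write.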
  • 34. Second challenge: one’s network size... • Graph of connections is highly skewed: yet another power law... • Heavy connectors “attract” each other which leads to a “black hole” effect • Once you introduce distance > 1 the network size grows very fast! • Compressing can flatten that curve considerably... [Diagram: network of degrees 1-3, compress and partition] • Network artifact is now: • cacheable • transportable over the network • whole, per degree, per “chunk”
  • 35. With caching [Diagram: Graph Computing Engine → compressed networks → Network Cache (maintains compressed network of logged in users) ↔ Search] • A: send ids for filtering and distance • B: ask for ids within distance + within id range • cache is maintained compressed • compression is designed so visits, lookup and partial fetches are fast! • network can be shipped to other systems for additional relevance
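The talk does not say how the network artifact is laid out or compressed. Purely as an illustration of "per degree" partitioning and the two cache requests (A: filter ids by distance, B: fetch ids within an id range), the sketch below assumes one sorted array of member ids per degree; `NetworkArtifact` and its methods are made-up names, and no actual compression is shown.

```java
import java.util.Arrays;

// Assumed layout: one sorted array of member ids per degree (index 0 = degree 1).
class NetworkArtifact {
    private final int[][] idsByDegree;

    NetworkArtifact(int[][] idsByDegree) {
        this.idsByDegree = new int[idsByDegree.length][];
        for (int i = 0; i < idsByDegree.length; i++) {
            this.idsByDegree[i] = idsByDegree[i].clone();
            Arrays.sort(this.idsByDegree[i]);
        }
    }

    // Request A: degree of the member in this network, or -1 if it is not in the network.
    int degreeOf(int memberId) {
        for (int d = 0; d < idsByDegree.length; d++)
            if (Arrays.binarySearch(idsByDegree[d], memberId) >= 0) return d + 1;
        return -1;
    }

    // Request B: ids at exactly this degree with fromId <= id < toId.
    int[] idsInRange(int degree, int fromId, int toId) {
        int[] ids = idsByDegree[degree - 1];
        return Arrays.copyOfRange(ids, lowerBound(ids, fromId), lowerBound(ids, toId));
    }

    // Index of the first element >= key.
    private static int lowerBound(int[] sorted, int key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Keeping ids sorted within each degree is one way lookups and partial fetches can stay fast without unpacking the whole network.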
  • 36. Summary • Databus infrastructure useful for... • maintaining indices more efficiently than in a std relational DB • Graph of relationships beyond first degree • Text based indices (Lucene) • splitting data persistence from read queries • use DB mainly for storing data and maintaining TxLog • use RAM for processing/querying the data • keeping distributed caches in sync with Master DBs
  • 37. Q&A