In the Oracle world you scaled vertically: bigger and bigger iron. Expensive, specialized hardware was necessary to power that scaling (SAN trays and heads), complex to configure, each with its own storage-appliance DSL.
MySQL on a large dedicated EC2 instance has its own problems: of its 8 cores, only 1 runs at 100% load during heavy query loads, and utilization of the 64GB of RAM it was given is horrible.
Modern applications are designed and built using agile methodologies. The focus is on time-to-market: quickly launching with a minimum viable product, then following with quick iterations. Modifying the schema of a large enough application that is in production is a nightmare, and the 'solution' of BLOB columns defeats the notion of being relational. In MySQL and PostgreSQL, indexes are brittle and require constant maintenance, or they break and we're back to table-scan performance (if the table is still usable at all).
A glaring bottleneck requires urgent measures. Think how much engineering effort goes into fixing and working around the problems of relational databases at Facebook, Yahoo, Amazon, Google, etc. Decent algorithms for caching logic (Russian-doll caching, timestamp-based expiration) only surfaced in recent years, and cache invalidation is usually done wrong. Wait: if you need to ditch the relational model to make your RDBMS-backed app scale… Lots of large-scale internet companies have that realization and start building their own NoSQL, mostly as in-house efforts and custom key-value stores. Google publishes white papers on GFS and MapReduce (2003-2004), HDFS/Hadoop follow (2005), the BigTable paper arrives in 2006, CouchDB shows up around 2005, Amazon is inspired to build Dynamo in 2007, and Cassandra is influenced by Dynamo in 2008. But running these systems is hard: managing TCP connection pools isn't trivial, evenly distributing data isn't trivial, and custom scripts are needed to manage data rebalancing (move and clear out) as cluster size changes, as sketched below. What happens when a cluster node goes down? Heartbeat? Where do reads and writes go?
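To see why evenly distributing data isn't trivial, consider what happens when a node is added to a cluster that shards keys by a naive hash-modulo: almost every key changes owner and must be moved. Below is a minimal, self-contained Python sketch (node names and key counts are invented for illustration) comparing that with consistent hashing, the ring-based approach the Dynamo lineage adopted:

    import hashlib
    from bisect import bisect

    def h(s: str) -> int:
        # stable hash of a string as a big integer
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def naive_shard(key: str, nodes: list) -> str:
        # key -> node by modulo: changing len(nodes) remaps almost everything
        return nodes[h(key) % len(nodes)]

    class ConsistentHashRing:
        def __init__(self, nodes, vnodes=64):
            # place each node at many pseudo-random points on a ring
            self.ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
            self.points = [p for p, _ in self.ring]

        def lookup(self, key: str) -> str:
            # the first ring point clockwise from the key's hash owns the key
            i = bisect(self.points, h(key)) % len(self.ring)
            return self.ring[i][1]

    keys = [f"user:{i}" for i in range(10000)]

    before = {k: naive_shard(k, ["A", "B", "C"]) for k in keys}
    after = {k: naive_shard(k, ["A", "B", "C", "D"]) for k in keys}
    moved = sum(before[k] != after[k] for k in keys)
    print(f"naive modulo: {moved / len(keys):.0%} of keys move when node D joins")

    ring3 = ConsistentHashRing(["A", "B", "C"])
    ring4 = ConsistentHashRing(["A", "B", "C", "D"])
    moved = sum(ring3.lookup(k) != ring4.lookup(k) for k in keys)
    print(f"consistent hashing: {moved / len(keys):.0%} of keys move")

With modulo sharding roughly three quarters of the keys move; with the ring only about a quarter do, which is why hand-rolled rebalancing scripts were such a pain point.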
Twitter starts off by sharding MySQL and hits difficulties quickly. It introduces its in-house graph database, FlockDB, to reduce load on MySQL, which is now used mostly as a key-value store (a big, complex, low-performance one), with heavy reliance on caching. In late 2013 Twitter reports a peak of 140K writes per second for the site, with a very large read/write ratio. Combining MySQL for storage and FlockDB for queries, they report a latency of 350ms for a new write (a tweet).
In China, Weibo, Alibaba, and Tencent follow a new trend in application design at scale, and a recent discussion with Pinterest showed a similar design approach.
* These companies use in-memory NoSQL on the front application tier, but also abstract the application logic from database choice and scale using a separate data-access middleware layer (a sketch follows below). It provides abstractions like ‘get list of friends’ and ‘get list of tweets’; decides what’s cached, what’s in the RDBMS, and what’s in NoSQL; and separates users into “high traffic” and “low traffic” groups with different infrastructure, which allows for different optimization patterns. The application does not have direct access to the databases.
* They discover that fast NoSQL is faster than RDBMS + caching. The separation is driven by the economics of DRAM-only fast NoSQL (Redis), which is expensive to scale.
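A minimal sketch of that middleware pattern, with dict-backed stand-ins for the real backends (all class names and routing rules here are invented for illustration; the real layers at these companies are far richer):

    class DictStore:
        """Stand-in for a real backend (Redis, a KV store, or an RDBMS)."""
        def __init__(self):
            self.data = {}
        def get(self, key):
            return self.data.get(key)
        def put(self, key, value):
            self.data[key] = value

    class DataAccessLayer:
        def __init__(self, cache, nosql, rdbms, high_traffic_users):
            self.cache, self.nosql, self.rdbms = cache, nosql, rdbms
            self.high_traffic = high_traffic_users

        def get_friends(self, user_id):
            key = ("friends", user_id)
            if user_id in self.high_traffic:
                # hot users live entirely in the fast in-memory tier
                return self.nosql.get(key)
            friends = self.cache.get(key)
            if friends is None:
                # cold path: fetch from the system of record, then cache
                friends = self.rdbms.get(key)
                self.cache.put(key, friends)
            return friends

    # usage: the application only ever calls the middleware
    dal = DataAccessLayer(DictStore(), DictStore(), DictStore(), {"celebrity"})
    dal.rdbms.put(("friends", "alice"), ["bob", "carol"])
    dal.nosql.put(("friends", "celebrity"), ["millions", "of", "fans"])
    print(dal.get_friends("alice"))      # served from the RDBMS, then cached
    print(dal.get_friends("celebrity"))  # served from the in-memory tier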
The travel industry was the first to have real-time pricing on inventory, as early as the 70s. Travel agents drove the creation of SABRE, with 350K agents using it, built on old mainframe computers. When web-based travel portals were created, suddenly everybody acted as their own travel agent. The same API supports all the travel portals, and they are charged for lookups due to limited bandwidth on the provider side (the airlines). The infrastructure is old, and the number of queries and bookings keeps increasing. Another problem is that, as opposed to agents, you don't necessarily know anything about travel portal users: most do not log in when they search, or even have a user account at the travel portal, so you need to track these users. Travel portals applied massive cache layers, but have consistency problems. How often do you try to reserve and find the seat is no longer available, or at a different price? How frustrating is that as a user: a bad user experience. Travel portals also compete on an even basis, and the user experience (better UIs) is about all they can offer as a differentiator. These snappier UIs add even more load onto the app, but to keep their profit margin they can't reflect those extra calls to the APIs they use.
* Moving to fast NoSQL allowed for removal of the caching layer and supported a much higher rate of queries per second.
* This is the technology stack that major advertising technology companies built to sustain the crushing load of aggregating the clicks and views from so many websites
* Individual retailers are now using this same tech stack, for the same reason: they wish to present a near real-time experience and include analytics-based results.
The modern scale-out architecture replaces the cache, database, and storage tiers with a single, straightforward system:
* less hardware to purchase and maintain
* less development time spent patching antique systems (or searching for 'solutions')
* the caching logic is removed entirely
* easier to administer (configure and monitor), and now affordable because of flash
If you don’t have a database that’s good at writes, you don’t even create an application that uses it. Key-value is what you need 99.999% of the time; a real query that needs to pull in actual analytics is the rare exception. Many web applications built around an RDBMS treat it as a key-value store, denormalizing and removing foreign-key constraints, as in the sketch below.
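A minimal sketch of that "RDBMS as key-value store" pattern, using sqlite3 from the Python standard library: one table, primary-key lookups only, the row body denormalized into a single document blob (table and column names are invented for illustration):

    import json, sqlite3

    db = sqlite3.connect(":memory:")
    # no foreign keys, no joins: just a primary key and an opaque document
    db.execute("CREATE TABLE kv (pk TEXT PRIMARY KEY, doc TEXT)")

    def put(pk, obj):
        db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (pk, json.dumps(obj)))

    def get(pk):
        row = db.execute("SELECT doc FROM kv WHERE pk = ?", (pk,)).fetchone()
        return json.loads(row[0]) if row else None

    # the friend list is embedded (denormalized) instead of living in a join table
    put("user:42", {"name": "Ada", "friends": ["user:7", "user:9"]})
    print(get("user:42"))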
Predictable low latency is one of Aerospike's core strengths. In the RTB world, where the entire time frame for user lookup, ad lookup, the decision whether to bid, and the potential auction war is 100-150ms, databases with unpredictable latency spikes cause whole opportunities to be lost.
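A toy illustration of that budget problem (the stage names, sleep times, and the 120ms window are invented for illustration, not taken from a real exchange):

    import time

    BID_WINDOW_MS = 120.0  # hypothetical end-to-end budget for one bid request

    def run_auction(stages):
        deadline = time.monotonic() + BID_WINDOW_MS / 1000
        for name, fn in stages:
            fn()
            if time.monotonic() > deadline:
                # one slow lookup and the whole opportunity is gone
                return f"no-bid: blew the budget during {name!r}"
        return "bid submitted"

    fast = lambda: time.sleep(0.005)   # 5ms: a well-behaved lookup
    spike = lambda: time.sleep(0.200)  # 200ms: an unpredictable latency spike

    print(run_auction([("user lookup", fast), ("ad lookup", fast)]))
    print(run_auction([("user lookup", spike), ("ad lookup", fast)]))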
Aerospike is a real distributed database. Clustering was built in from the very beginning and is core to the operation and performance of the database; it is not an after-the-fact, bolted-on feature. It is masterless, with replication. The smart client connects and learns the cluster topology, and it only needs a single IP address to do so. Records are identified by a RIPEMD-160 digest of the primary key, so index entries are always 20 bytes wide. The client knows the partition map: it writes to the master and replica partitions synchronously, knows which partition to read a record from, and knows where the replica is for failover. Stop writing sharding logic.
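A minimal Python sketch of that digest-based addressing. The 20-byte RIPEMD-160 digest and the 4096-partition space match Aerospike's documented design, but the exact bytes fed into the digest here are a simplification, not the client's real wire-level composition:

    import hashlib

    N_PARTITIONS = 4096  # Aerospike's fixed partition space

    def digest(set_name: str, key: str) -> bytes:
        # requires an OpenSSL build that still exposes ripemd160
        h = hashlib.new("ripemd160")
        h.update(set_name.encode() + key.encode())
        return h.digest()  # always 20 bytes, regardless of key length

    def partition_id(d: bytes) -> int:
        # 12 bits of the digest select one of 4096 partitions
        return int.from_bytes(d[:2], "little") % N_PARTITIONS

    d = digest("users", "user:42")
    print(len(d), partition_id(d))  # 20, and a stable partition in [0, 4096)

Because every key hashes to a fixed-width digest and a stable partition, the client can route each read or write itself, which is what makes hand-written sharding logic unnecessary.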
Scaling in DRAM only is simply not economical. SSDs are formatted and used to expand the memory space, accessed as raw devices with a direct access pattern. Indexes are kept in DRAM to save an extra IOP per read. Enterprise feature: fast restart (via shared memory). Sets (tables) are kept contiguous for efficient bulk reads and scans.
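As an illustration, a hypothetical aerospike.conf namespace fragment for this hybrid layout (namespace name, sizes, and device path are placeholders; check the documentation for your server version before copying):

    namespace demo {
        replication-factor 2
        memory-size 8G                 # DRAM holds the primary index
        storage-engine device {
            device /dev/nvme0n1        # raw SSD, no filesystem in the way
            write-block-size 128K
        }
    }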