Riak at Posterous

Posterous recently deployed Riak to serve as their content cache. In this talk, Julio Capote will cover why the engineering team chose Riak for the use case. He'll also share some details on the old post cache and its problems, what solutions they evaluated, and how they settled on Riak.

Riak at Posterous
Julio Capote

San Francisco Riak Meetup
1/18/2012

A/S/L?

• Julio Capote
• Backend Developer at Posterous
• @capotej

Posterous

• Allows anyone to create multiple private or public spaces (blogs)
• Around since 2008
• Millions of posts and users
• Tons of long-tail traffic

Some of the first posts are still being accessed today because of search engines.

How we store posts
• Original post body goes into MySQL
• Multiple variants are generated (nojs, mobile, etc.)
• Variants are expensive to generate (sanitizers, expanders)

Enter Variant Cache

• A generic read/write-through cache library
• Started with Memcache
• Moved to Redis

At the time, Redis's disk store looked promising, so we moved from Memcache to Redis.
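A minimal sketch of what a read/write-through cache of this kind might look like. The class name, the method names, and the in-memory dict standing in for Memcache or Redis are all assumptions for illustration; the talk does not show the library's actual API.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, str]  # (post_id, variant), e.g. (42, "mobile")


class VariantCache:
    """Hypothetical read/write-through cache in front of an expensive generator."""

    def __init__(self, generate: Callable[[int, str], str]) -> None:
        self._generate = generate         # expensive step: sanitizers, expanders
        self._store: Dict[Key, str] = {}  # stand-in for Memcache or Redis

    def read(self, post_id: int, variant: str) -> str:
        key = (post_id, variant)
        if key not in self._store:
            # Read-through: on a miss, generate once and populate the cache.
            self._store[key] = self._generate(post_id, variant)
        return self._store[key]

    def write(self, post_id: int, variant: str, body: str) -> None:
        # Write-through: the cached variant is replaced as soon as it changes.
        self._store[(post_id, variant)] = body


calls: List[Key] = []


def render(post_id: int, variant: str) -> str:
    """Made-up generator that records how often it is invoked."""
    calls.append((post_id, variant))
    return f"<{variant}>post {post_id}</{variant}>"


cache = VariantCache(render)
first = cache.read(1, "mobile")   # miss: render() runs
second = cache.read(1, "mobile")  # hit: render() is not called again
```

The point of the pattern is that callers only ever talk to the cache, so the backing store can be swapped without touching the code that renders posts.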

Redis is awesome, but
• Requires both keys and values to fit in memory
• Terrible disk store performance
• Even with 3 machines with 64 GB of RAM, we couldn't fit the entire working set
• Forced to set a TTL

Redis wasn't really designed to ever hit the disk.

What we wanted
• Key/Value store
• Disk backed
• Built in distribution
• Use fewer boxes to serve more users
• Consistent performance over raw performance

MySQL / HandlerSocket
The Good
• Great performance
• Can handle a huge number of rows
• Mature / Safe (at least the MySQL part)

MySQL / HandlerSocket
The Bad

• Sharding definitely not built in
• HandlerSocket is pretty much abandoned

No support going forward.

MongoDB

The Good
• Crazy fast
• Built in sharding support
• ...did I mention it was fast?

MongoDB

The Bad

• 30% standard deviation on fetch times (!)
• Would falsely acknowledge a write

This is probably tunable, but still
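One way to put a number on "consistent performance" is the relative standard deviation of fetch times. The slide does not say how the 30% figure was computed; the sketch below assumes it means standard deviation divided by mean, and the latency samples are made up for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def relative_stdev(latencies_ms: Sequence[float]) -> float:
    """Population standard deviation as a fraction of the mean."""
    return pstdev(latencies_ms) / mean(latencies_ms)


# Hypothetical fetch times in milliseconds, not measurements from the talk.
steady = [10.0, 10.0, 10.0, 10.0]
jittery = [5.0, 15.0, 5.0, 15.0]

steady_spread = relative_stdev(steady)    # 0.0
jittery_spread = relative_stdev(jittery)  # 0.5
```

Both samples have the same 10 ms average, which is why a mean alone would hide the difference between them.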

Riak + Bitcask
The Good
• Distributed by default
• Consistent and predictable performance
• Highly concurrent, no perf degradation
• Ops guy loves it!

Riak + Bitcask
The Bad
• Not crazy fast
• Stuck it behind memcache
• Still way faster than generating
• No multi get support

Reads and writes both go through memcache.
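A rough sketch of the arrangement described on this slide: a memcache tier in front of Riak, with generation as the last resort, and batch reads done one key at a time because there is no multi-get. Both tiers are plain dicts here, and every name is an assumption for illustration rather than the real client code.

```python
from typing import Callable, Dict, Iterable, List


class TieredCache:
    """Hypothetical two-tier cache: memcache in front of Riak."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self.memcache: Dict[str, str] = {}  # stand-in for the memcache tier
        self.riak: Dict[str, str] = {}      # stand-in for the Riak cluster
        self._generate = generate

    def get(self, key: str) -> str:
        value = self.memcache.get(key)
        if value is not None:
            return value                    # fastest path
        value = self.riak.get(key)
        if value is None:
            value = self._generate(key)     # slowest path: regenerate
            self.riak[key] = value
        self.memcache[key] = value          # warm the front tier
        return value

    def put(self, key: str, value: str) -> None:
        # Write through both tiers so neither serves a stale variant.
        self.riak[key] = value
        self.memcache[key] = value

    def get_many(self, keys: Iterable[str]) -> Dict[str, str]:
        # Without multi-get, a batch read is one lookup per key.
        return {key: self.get(key) for key in keys}


generated: List[str] = []


def render(key: str) -> str:
    """Made-up generator that records how often it is invoked."""
    generated.append(key)
    return f"rendered {key}"


cache = TieredCache(render)
cache.get("post:1:mobile")          # miss everywhere: generated, stored in both
cache.memcache.clear()              # simulate a memcache eviction or restart
value = cache.get("post:1:mobile")  # served from the Riak stand-in
```

Losing the memcache tier costs a slower read, not a regeneration, which matches the "still way faster than generating" point above.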

Riak in production

• Started using our 3-node cluster for the global production cache
• Accidentally turned off a node
• Keys rebalanced, site didn’t skip a beat
• No one even noticed till hours later
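To illustrate why losing a node need not disturb keys held by the others, here is a simplified consistent-hashing ring. It is not Riak's implementation (Riak also replicates each key to several nodes, which this sketch leaves out), and the node names and key format are invented.

```python
import bisect
import hashlib
from typing import List


def _hash(value: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hash ring with several points per node."""

    def __init__(self, nodes: List[str], points_per_node: int = 64) -> None:
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [point_hash for point_hash, _ in self._points]

    def owner(self, key: str) -> str:
        # The owner is the first point clockwise from the key's position.
        index = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
        return self._points[index][1]


keys = [f"post:{i}:mobile" for i in range(1000)]
before = HashRing(["riak1", "riak2", "riak3"])
after = HashRing(["riak1", "riak2"])  # riak3 switched off

# Only keys that lived on the missing node change owner.
moved = [key for key in keys if before.owner(key) != after.owner(key)]
```

Every key in `moved` was owned by `riak3` before the change; keys on the surviving nodes stay where they were.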

Stats

• 3 nodes
• 2600+ requests/second
• 300+ GB
• ~200 million keys
• 10 GB memcache/host
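Some back-of-the-envelope arithmetic on these figures, assuming decimal gigabytes and an even spread of load across nodes. The slide does not say whether the 300 GB counts replicas, so the per-key size is only a rough average.

```python
nodes = 3
requests_per_second = 2600
stored_bytes = 300 * 10**9            # "300+ GB", taken as a lower bound
keys = 200 * 10**6                    # "~200 million keys"
memcache_bytes_per_host = 10 * 10**9  # "10 GB memcache/host"

# Average stored size per key, in bytes.
bytes_per_key = stored_bytes / keys

# Requests each node handles if load is spread evenly.
requests_per_node = requests_per_second / nodes

# Share of the data set that fits in the memcache tier.
memcache_fraction = nodes * memcache_bytes_per_host / stored_bytes
```

Under these assumptions that works out to about 1.5 KB per key, roughly 870 requests per second per node, and a memcache tier holding about a tenth of the stored data.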

#Protips
• All nodes can serve all requests, so...
• Use a VIP (virtual IP), or...
• Pass all cluster nodes to the client driver (thanks @aphyr!)
• Use curb instead of net/http
• Use Keep-Alive
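The tips above can be sketched roughly as follows. The talk's stack is Ruby (curb, net/http); this uses Python's standard library to show the same two ideas: rotate across every node with failover, and reuse one persistent connection per node. The class names, bucket name, URL path, and port are assumptions, and the demo uses a fake transport so it runs without a cluster.

```python
import http.client
from itertools import cycle
from typing import Callable, Dict, List, Optional, Set

Transport = Callable[[str, str], Optional[bytes]]


class KeepAliveTransport:
    """Fetches over HTTP, reusing one persistent connection per node."""

    def __init__(self, port: int = 8098) -> None:  # port is an assumption
        self._port = port
        self._connections: Dict[str, http.client.HTTPConnection] = {}

    def __call__(self, node: str, key: str) -> Optional[bytes]:
        conn = self._connections.get(node)
        if conn is None:
            # Connects lazily on first use, then stays open (keep-alive).
            conn = http.client.HTTPConnection(node, self._port, timeout=2)
            self._connections[node] = conn
        try:
            conn.request("GET", f"/buckets/variants/keys/{key}")  # assumed path
            response = conn.getresponse()
            body = response.read()
        except http.client.HTTPException as error:
            conn.close()
            raise ConnectionError(str(error)) from error
        except OSError:
            conn.close()
            raise
        if response.status == 404:
            return None  # cache miss
        if response.status != 200:
            raise ConnectionError(f"{node} returned {response.status}")
        return body


class NodePool:
    """Round-robins over all nodes and fails over when one is unreachable."""

    def __init__(self, nodes: List[str], transport: Transport) -> None:
        self._nodes = list(nodes)
        self._next = cycle(self._nodes)
        self._transport = transport

    def get(self, key: str) -> Optional[bytes]:
        last_error: Optional[Exception] = None
        for _ in range(len(self._nodes)):
            node = next(self._next)
            try:
                return self._transport(node, key)
            except OSError as error:  # ConnectionError is a subclass
                last_error = error
        raise ConnectionError("all nodes failed") from last_error


# Demo with a fake transport: riak2 is "down", the others answer.
down: Set[str] = {"riak2"}


def fake_transport(node: str, key: str) -> Optional[bytes]:
    if node in down:
        raise ConnectionError(node)
    return f"{node}:{key}".encode()


pool = NodePool(["riak1", "riak2", "riak3"], fake_transport)
answers = [pool.get("a"), pool.get("b"), pool.get("c")]
```

Against a real cluster, the pool would be built with `NodePool(hosts, KeepAliveTransport())`, where `hosts` is the list of node hostnames.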

Thanks for listening!
Special thanks to
@twoism
@vincentchu
@kangchen
@argv0
@pharkmillups
@seancribbs
@aphyr
@jrecursive
