Come hear MySpace share its experiences using Microsoft technologies to run Web applications for the most visited site on the Web. MySpace discusses its best practices for a massively scalable, federated application environment, and how it matured its deployment processes. An open Q&A session lets you pick the brains of engineers from both MySpace and Microsoft.com.
The Megasite: Infrastructure for Internet Scale
1.
2. Aber Whitcomb – Chief Technology Officer
Jim Benedetto – Vice President of Technology
Allen Hurff – Vice President of Engineering
3. First Megasite
64+ MM Registered Users
38 MM Unique Users
260,000 New Registered Users Per Day
23 Trillion Page* Views/Month
50.2% Female / 49.8% Male
Primary Age Demo: 14-34
[Chart labels: 100K, 1M, 6M, 70 M, 185 M]
4. As of April 2007
185+ MM Registered Users
90 MM Unique Users
Demographics
50.2% Female / 49.8% Male
Primary Age Demo: 14-34
Internet Rank (page views in '000s)
#1  MySpace    43,723
#2  Yahoo      35,576
#3  MSN        13,672
#4  Google     12,476
#5  Facebook   12,179
#6  AOL        10,609
Source: comScore Media Metrix March - 2007
5. [Chart: monthly trend, Nov 2006 to Mar 2007, for MySpace, Yahoo, MSN, Google, eBay, and Facebook; vertical axis 0 to 50,000 (MM)]
Source: comScore Media Metrix April 2007
6. 350,000 new user registrations/day
1 Billion+ total images
Millions of new images/day
Millions of songs streamed/day
4.5 Million concurrent users
Localized and launched in 14 countries
Launched China and Latin America last week
7. 7 Datacenters
6000 Web Servers
250 Cache Servers (16 GB RAM)
650 Ad servers
250 DB Servers
400 Media Processing servers
7000 disks in SAN architecture
70,000 Mb/s bandwidth
35,000 Mb/s on CDN
8.
9. Typically used for caching MySpace user data.
Online status, hit counters, profiles, mail.
Provides a transparent client API for caching C# objects.
Clustering
Servers divided into "Groups" of one or more "Clusters".
Clusters keep themselves up to date.
Multiple load balancing schemes based on expected load.
Heavy write environment
Must scale past 20k redundant writes per second on a 15 server redundant cluster.
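The group/cluster layout and the redundant writes described on this slide can be illustrated with a minimal sketch. The slide does not say how keys are mapped to clusters or how replicas are kept in sync, so the CRC-modulo routing, the class names (CacheCluster, CacheGroup), and the in-memory dictionaries standing in for servers are all assumptions made for illustration.

```python
import zlib


class CacheCluster:
    """A cluster whose servers each hold a full copy of the data."""

    def __init__(self, name, server_count):
        self.name = name
        # Each server is modelled as an in-memory dict (assumption: real
        # servers would be remote processes reached over the network).
        self.servers = [dict() for _ in range(server_count)]

    def put(self, key, value):
        # Redundant write: every server in the cluster receives the update.
        for store in self.servers:
            store[key] = value
        return len(self.servers)

    def get(self, key):
        # Any replica can answer; pick one deterministically from the key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.servers)
        return self.servers[index].get(key)


class CacheGroup:
    """A group of clusters; each key is owned by exactly one cluster."""

    def __init__(self, clusters):
        self.clusters = clusters

    def cluster_for(self, key):
        # Hypothetical routing rule: CRC-32 of the key modulo cluster count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.clusters)
        return self.clusters[index]

    def put(self, key, value):
        return self.cluster_for(key).put(key, value)

    def get(self, key):
        return self.cluster_for(key).get(key)


group = CacheGroup([CacheCluster("a", 3), CacheCluster("b", 3)])
group.put("user:42:online", True)
```

In this sketch one logical write costs one store operation per server in the owning cluster, which is why write volume is the figure to watch on a redundant cluster.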
10. Platform for middle tier messaging.
Up to 100k request messages per second per server in prod.
Purely asynchronous: no thread blocking.
Concurrency and Coordination Runtime
Bulk message processing.
Custom unidirectional connection pooling.
Custom wire format.
Gzip compression for larger messages.
Data center aware.
Configurable components
[Diagram labels: Relay Client, Relay Service, IRelayComponents, Socket Server, Berkeley DB, Non-locking Memory Buckets, Fixed Alloc Shared Interlocked Int Storage for Hit Counters, Message Forwarding, Message Orchestration; nodes marked C and R]
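The "Gzip compression for larger messages" point can be sketched as a size threshold plus a one-byte flag in front of the body. The actual wire format is custom and not described in the slides, so the flag byte, the 1024-byte cutoff, and the function names below are hypothetical.

```python
import gzip

# Hypothetical cutoff in bytes; the slides do not give the real threshold.
COMPRESS_THRESHOLD = 1024


def encode_message(payload: bytes) -> bytes:
    """Prefix a 1-byte flag: 1 = gzip-compressed body, 0 = raw body."""
    if len(payload) >= COMPRESS_THRESHOLD:
        return b"\x01" + gzip.compress(payload)
    return b"\x00" + payload


def decode_message(frame: bytes) -> bytes:
    """Reverse encode_message, decompressing only when the flag says so."""
    flag, body = frame[0], frame[1:]
    if flag == 1:
        return gzip.decompress(body)
    return body
```

Small messages skip compression because the gzip header and the CPU time would cost more than they save; only payloads at or above the cutoff pay for it.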
11.
12. MySpace embraced Team Foundation Server and Team System during Beta 3
MySpace was also one of the early beta testers of devBiz's TeamPlain (now owned by Microsoft).
Team Foundation initially supported 32 MySpace developers and now supports 110 developers, on its way to over 230 developers
MySpace is able to branch and shelve more effectively with TFS and Team System
13. MySpace uses Team Foundation Server as a source repository for its .NET, C++, Flash, and ColdFusion codebases
MySpace uses TeamPlain for Product Managers and other non-development roles
14. MySpace is a member of the Strategic Design Review committee for the Team System suite
MySpace chose Team Test Edition, which reduced cost and kept its Quality Assurance staff on the same suite as the development teams
Using MSSCCI providers and customization of Team Foundation Server (including the upcoming K2 BlackPearl), MySpace was able to extend TFS with better workflow and defect tracking based on its specific needs
15.
16. Maintaining a consistent yet constantly changing code base and configs across thousands of servers proved very difficult
Code rolls began to take a very long time
CodeSpew – Code deployment and maintenance utility
Two tier application
Central management server – C#
Light agent on every production server – C#
Tightly integrated with Windows PowerShell
17. UDP out, TCP/IP in
Massively parallel – able to update hundreds of servers at a time.
File modifications are determined on a per server basis based on CRCs
Security model for code deployment authorization
Able to execute remote PowerShell scripts across server farm
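The per-server, CRC-based comparison can be sketched as follows. This is an illustration of the general technique, not CodeSpew itself: the function names and the in-memory {path: bytes} inputs are assumptions, and the slides do not say whether stale files are deleted, so the delete list is included only as a plausible extension.

```python
import zlib


def crc_manifest(files):
    """Map each path to the CRC-32 of its contents. `files` is {path: bytes}."""
    return {path: zlib.crc32(data) for path, data in files.items()}


def plan_update(master_manifest, server_manifest):
    """Return (paths to copy, paths to delete) for one server."""
    # Copy anything the server is missing or holds with a different CRC.
    to_copy = sorted(
        path for path, crc in master_manifest.items()
        if server_manifest.get(path) != crc
    )
    # Assumption: files absent from the master are considered stale.
    to_delete = sorted(
        path for path in server_manifest if path not in master_manifest
    )
    return to_copy, to_delete


master = crc_manifest({"web.config": b"v2", "bin/app.dll": b"same", "new.aspx": b"x"})
server = crc_manifest({"web.config": b"v1", "bin/app.dll": b"same", "old.aspx": b"y"})
to_copy, to_delete = plan_update(master, server)
```

Because each server is compared against the master independently, a server that is already up to date receives nothing, which keeps a roll across hundreds of machines proportional to what actually changed.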
18.
19. Images
1 Billion+ images
80 TB of space
150,000 req/s
8 Gigabits/sec
Videos
60 TB storage
15,000 concurrent streams
60,000 new videos/day
Music
25 Million songs
142 TB of space
250,000 concurrent streams
20. Millions of MP3, Video and Image Uploads Every Day
Ability to design custom encoding profiles (bitrate, width, height, letterbox, etc.) for a variety of deployment scenarios.
Job broker engine to maximize encoding resources and provide a level of QoS.
Abandonment of database connectivity in favor of a web service layer
XML based workflow definition to provide extensibility to the encoding engine.
Coded entirely in C#
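The encoding profiles and the job broker can be pictured with a small sketch. The production system is written in C# and its design is not shown in the slides; the EncodingProfile fields mirror the parameters listed above, while the JobBroker class, its priority scheme (lower number served first), and the profile names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingProfile:
    """One custom encoding profile (fields taken from the slide's list)."""
    name: str
    bitrate_kbps: int
    width: int
    height: int
    letterbox: bool = False


class JobBroker:
    """Hands out encoding jobs by priority, FIFO within a priority level."""

    def __init__(self):
        self._queue = []
        # Monotonic counter breaks ties so equal priorities stay in order.
        self._order = itertools.count()

    def submit(self, media_id, profile, priority):
        # Lower priority number = served sooner (an assumed QoS convention).
        heapq.heappush(self._queue, (priority, next(self._order), media_id, profile))

    def next_job(self):
        if not self._queue:
            return None
        _, _, media_id, profile = heapq.heappop(self._queue)
        return media_id, profile


web = EncodingProfile("web", bitrate_kbps=512, width=480, height=360)
mobile = EncodingProfile("mobile", bitrate_kbps=128, width=176, height=144, letterbox=True)

broker = JobBroker()
broker.submit("video-1", web, priority=5)
broker.submit("video-2", mobile, priority=1)
broker.submit("video-3", web, priority=5)
```

An idle encoder would call next_job() to pull work, so higher-priority uploads are encoded first without starving the order of arrival within a level.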
21. [Diagram labels: User Content, Upload (Any Application), FTP Server, Web Service Communication Layer, Job Broker, MediaProcessor, Filmstrip for Image Review, Thumbnails for Categorization, DFS 2.0, CDN]
22.
23. Provides an object-oriented file store
Scales linearly to near-infinite capacity on commodity hardware
High-throughput distribution architecture
Simple cross-platform storage API
Designed exclusively for long-tail content
[Chart axes: Accesses, Demand]
24. Custom high-performance event-driven web server core
Written in C++ as a shared library
Integrated content cache engine
Integrates with storage layer over HTTP
Capable of more than 1 Gbit/s throughput on a dual-processor host
Capable of tens of thousands of concurrent streams
25. DFS uses a generic "file pointer" data type for identifying files, allowing us to change URL formats and distribution mechanisms without altering data.
Compatible with traditional CDNs like Akamai
Can be scaled at any granularity, from single nodes to complete clusters
Provides a uniform method for developers to access any media content on MySpace
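The idea of a file pointer that is independent of URL format can be sketched as a small value type plus interchangeable resolver functions. The real pointer layout is not given in the slides: the fields (volume, file_id, media_type), the resolver names, and the example.com hostnames are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilePointer:
    """Stored identifier for a file; contains no URL (hypothetical fields)."""
    volume: int
    file_id: int
    media_type: str


def direct_url(ptr):
    # Serve straight from a storage node (made-up hostname scheme).
    return f"http://dfs{ptr.volume}.example.com/{ptr.media_type}/{ptr.file_id}"


def cdn_url(ptr):
    # Serve the same file through a CDN (made-up hostname scheme).
    return f"http://cdn.example.com/{ptr.media_type}/{ptr.volume}/{ptr.file_id}"


def resolve(ptr, resolver=direct_url):
    """Turn a stored pointer into a URL using whichever scheme is active."""
    return resolver(ptr)


photo = FilePointer(volume=7, file_id=12345, media_type="images")
```

Only the resolver changes when the URL format or distribution mechanism changes; the stored pointers stay as they are, which is the property the slide describes.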
26.
27. [Chart: Pages/Sec for 2005 Server, 2006 Server, and 2007 Server; axis 0 to 300]
28. Distribute MySpace servers over 3 geographically dispersed co-location sites
Maintain presence in Los Angeles
Add a Phoenix site for active/active configuration
Add a Seattle site for active/active/active with Site Failover capability
29. [Diagram labels: Users, Sledgehammer, Cache Engine, Business Logic, Server Accelerator Engine, DFS Cache Daemon, Storage Cluster]