1) MongoDB is used to collect analytics data from GitHub pages in real time: 10-15 million page views per day, stored across 13 servers.
2) Data is denormalized across multiple collections to optimize for space, RAM, and read performance while still supporting live queries.
3) As data volume grows, the data will need to be partitioned by time frame, by functionality, or across individual servers to handle the increased load.
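Point 3 mentions partitioning by time frame. A minimal sketch of one way that could look, assuming one collection per calendar month; the `collection_name_for` helper and the `resolutions` prefix are hypothetical names for illustration and do not come from the slides:

```ruby
require 'date'

# Hypothetical helper: map a hit's date to a per-month collection name,
# so that a whole month can later be archived or moved in one piece.
def collection_name_for(prefix, date)
  "#{prefix}.#{date.strftime('%Y.%m')}"
end

puts collection_name_for('resolutions', Date.new(2011, 3, 14))
# prints "resolutions.2011.03"
```

Keeping the time frame in the collection name means writes for the current month touch only one small, hot collection, while older months stay untouched.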
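19.
    require 'msgpack'

    class TrackService
      # Serialize the hit attributes and push them onto the queue.
      # @client and @queue are set up elsewhere (not shown on the slide).
      def record(attrs)
        message = MessagePack.pack(attrs)
        @client.set(@queue, message)
      end
    end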
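20.
    class TrackProcessor
      # Pull messages off the queue indefinitely.
      def run
        loop { process }
      end

      def process
        message = @client.get(@queue)
        record(message) if message # skip when the queue returns nothing
      end

      # Decode the message and hand the attributes to Hit.
      def record(message)
        attrs = MessagePack.unpack(message)
        Hit.record(attrs)
      end
    end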
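The producer/consumer flow on slides 19 and 20 can be tried without any external services. The sketch below is only an illustration under stated assumptions: `FakeQueueClient` is a hypothetical in-memory stand-in for the real queue client (which the slides do not show), `drain` is a hypothetical helper, and JSON replaces MessagePack so that only the standard library is needed.

```ruby
require 'json'

# Hypothetical in-memory stand-in for the queue client.
class FakeQueueClient
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Enqueue a message (mirrors the set call used by TrackService).
  def set(queue, message)
    @queues[queue].push(message)
  end

  # Dequeue the oldest message, or nil when the queue is empty.
  def get(queue)
    @queues[queue].shift
  end
end

# Consumer side: dequeue and decode until the queue is empty,
# returning the attribute hashes that would be handed to Hit.
def drain(client, queue)
  hits = []
  while (message = client.get(queue))
    hits << JSON.parse(message)
  end
  hits
end

client = FakeQueueClient.new
# Producer side: serialize the hit attributes and enqueue them.
client.set('hits', JSON.generate('site_id' => 'abc', 'path' => '/pricing'))
p drain(client, 'hits')
```

Putting a queue between the tracking endpoint and the database writes lets the endpoint respond quickly and lets the processor absorb bursts at its own pace.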
22.
    class Hit
      # Update the site totals, then fan out to each per-dimension recorder.
      def record
        site.atomic_update(site_updates)
        Resolution.record(self)
        Technology.record(self)
        Location.record(self)
        Referrer.record(self)
        Content.record(self)
        Search.record(self)
        Notification.record(self)
        View.record(self)
      end
    end
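23.
    class Resolution
      # Hit calls Resolution.record(hit), so record is a class method.
      # The slide's extra `end` suggests a dropped `class << self` line.
      class << self
        def record(hit)
          query  = {'_id' => "..."}
          update = {'$inc' => {}}
          # Bump one counter per observed screen/browser dimension.
          update['$inc']["sx.#{hit.screenx}"]  = 1
          update['$inc']["bx.#{hit.browserx}"] = 1
          update['$inc']["by.#{hit.browsery}"] = 1
          # Upsert: create the document on first hit, increment afterwards.
          collection(hit.created_on)
            .update(query, update, :upsert => true)
        end
      end
    end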
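To see what the upserted `$inc` on slide 23 accumulates, here is a toy in-memory version in plain Ruby. It is a sketch of the semantics only, not MongoDB itself: `resolution_update` and `apply_inc` are hypothetical helpers, the `'site1'` id is made up (the slide elides the real `_id`), and the screen and browser sizes are sample values.

```ruby
# Build the $inc document for one hit (field prefixes follow the slide).
def resolution_update(screenx, browserx, browsery)
  {
    '$inc' => {
      "sx.#{screenx}"  => 1,
      "bx.#{browserx}" => 1,
      "by.#{browsery}" => 1
    }
  }
end

# Toy in-memory equivalent of an upserted $inc: a missing document or
# counter starts at zero, and dotted keys address nested fields.
def apply_inc(store, id, update)
  doc = (store[id] ||= {})
  update['$inc'].each do |path, amount|
    *parents, leaf = path.split('.')
    target = parents.inject(doc) { |node, key| node[key] ||= {} }
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  doc
end

store = {}
apply_inc(store, 'site1', resolution_update(1440, 1280, 700))
apply_inc(store, 'site1', resolution_update(1440, 1024, 700))
p store
```

After the two sample hits, the single document holds a count of 2 under `sx.1440` and `by.700`, and a count of 1 each under `bx.1280` and `bx.1024`. This is the denormalized shape the summary describes: one document per key with a counter for every observed value, so a report is a single document read.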