This was my presentation to the Digital Identity Management workshop at the 2009 ACM Computer and Communications Security conference, in Chicago, IL. I delivered the presentation on November 13, 2009.
Mnikr - Digital Identity Management 2009
2. We developed a system to coalesce people’s distributed social network personas from across the Internet, then buy and sell shares of people in a stock market to determine reputation.
34. How Big Did It Get? Is 350,000 Big?
35. The Kings of Reputation Street: Time Between Trades; Highest-Valued Identities
Editor's notes
Hello, I’m Brendan O’Connor; this was joint work with John Linwood Griffin at Johns Hopkins.
So what does this mean?
So how do we get there? Well, stay tuned.
Many sites use reputation-- eBay, Amazon, etc. And many others could use it; we could automatically allow highly-reputed users to comment on news articles whereas unknowns would be moderated, for instance. Or any digital analogue of reputation in the physical world-- but unlike the “real world,” digital analogues are contextual; we aim to change that. Something else to consider: while we can make definitive statements about reputation where it's quantified (like on eBay), we can't do that with general reputation. eBay can describe sellers as disreputable, but a presentation yesterday couldn't identify the hosts it used for research, because they were specifically chosen for their bad reputation-- naming them would present a libel issue. These problems could be solved by a real way to discuss reputation.
So let’s quickly define a few things. We define a Service as being a web application, such as Twitter, Digg, YouTube, or hundreds of others that allows individual users to register and provides some sort of per-user value.
We define a User as an actual human being or autonomous entity capable of using a computer and registering for accounts on Services.
We define a Persona as being an account on one Service, identified by a locally-unique string usually called a Username or Nickname. A Service has hundreds or thousands of Personas; a User may have zero, one, or many Personas on a given Service.
So these three are all simple concepts, but Mnikr defines a new concept: the Identity, which represents one *virtual* user across multiple Personas, on multiple Services. For instance, if a user Bob has account Robert on LinkedIn, and ChunkyMonkey on MySpace, and we can find data to link these two accounts together, we say that they make up one Identity.
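As a minimal sketch of how these terms relate, the definitions above can be written as two small data types. The class and field names here are illustrative assumptions, not Mnikr's actual schema; the only data used is the Bob example from the notes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Persona:
    """One account on one Service, e.g. ("LinkedIn", "Robert")."""
    service: str
    username: str


@dataclass
class Identity:
    """One virtual user: a set of Personas believed to share an owner."""
    personas: set[Persona] = field(default_factory=set)

    def services(self) -> set[str]:
        # The Services on which this Identity has at least one Persona.
        return {p.service for p in self.personas}


# Bob's two linked accounts from the example above form one Identity.
bob = Identity({Persona("LinkedIn", "Robert"), Persona("MySpace", "ChunkyMonkey")})
print(sorted(bob.services()))
```

Note that the User (the actual human) never appears in the model: an Identity is only what can be inferred from linked Personas.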
So we’re going to speak today about who and what you are on the Internet. And that’s kind of confusing, since there’s no good definition as to where the “you” is on the Internet. There are a few bad contenders, though....
LinkedIn is great, for professionals. It tends to ignore academics, but worse from a “me” perspective, it’s a bad place for socialization: no one wants to be asked out for a beer over LinkedIn.
There are lots of sites that collect all your data-- but most of them aren’t social.
Dating sites are another online profile that’s supposed to be the “whole you”-- but you probably don’t have an existing commitment to them, since by their nature, you only hang out on them when you need their *ahem* services. (And yes, there really are car-based dating sites.)
Similarly, there’s your profile on sites where you sell things...
And sites where you buy things. Note that this person is listed as a “Top 500 Reviewer,” and a “Vine Voice.” How many people in this room know what a Vine Voice is? (Turns out it’s Amazon’s semi-pro reviewer program.) This is another problem: if people say things about us, how do we make sure those are intelligible?
This is the compromise some people come up with, something you might call the “Merit Badge” (like the Boy Scouts) approach. Everyone likes a shiny icon on their website, but what if they had 10? (I have 25 on mine, and several important ones are actually missing from that list.) Instant headache.
The first part of the Mnikr project, then, deals with the aforementioned merit badge problem: how to figure out who you are, who your friends are, and where all your bits and pieces are. Or to use our shiny new terms, how to combine Personas into Identities.
(This slide and the next taken from a stack by David Recordon.) XFN stands for XHTML Friends Network, and it’s a Microformat, meaning that it’s a lightweight addition to things already existing on a webpage. For instance, if you have a list of sites you’re using elsewhere....
Then you can simply add Rel tags to it, allowing computers to easily understand the connection between the current site and the linked one: “me” means that they’re owned by the same person, but you can use “friend,” “contact,” or a host of others to imply different relationships. This lets you build a full social graph, just by spidering the web.
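To make the rel-tag idea concrete, here is a small sketch that pulls XFN relationships out of a page using only Python's standard library. The sample HTML, the URLs in it, and the helper names (`RelLinkParser`, `xfn_links`) are made up for illustration.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, rel-values) pairs from <a> tags that carry a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href, rel = attr.get("href"), attr.get("rel")
        if href and rel:
            # rel is a space-separated list, e.g. rel="friend met"
            self.links.append((href, set(rel.lower().split())))


def xfn_links(html, relation):
    """Return the hrefs of all links whose rel attribute includes `relation`."""
    parser = RelLinkParser()
    parser.feed(html)
    return [href for href, rels in parser.links if relation in rels]


# Hypothetical profile-page fragment.
SAMPLE = """
<ul>
  <li><a href="https://twitter.com/example_user" rel="me">Twitter</a></li>
  <li><a href="https://example.org/alice" rel="friend met">Alice</a></li>
  <li><a href="https://example.org/about">About</a></li>
</ul>
"""

print(xfn_links(SAMPLE, "me"))
print(xfn_links(SAMPLE, "friend"))
```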
This lets us then build up Identities, consisting of collections of individual sites all linked together through XFN “me” links; similarly, we can find friends, by looking at each site for XFN “friend” or “contact” links. One difference is that “me” links need to be circular for us to recognize them, whereas “friend” links can have directionality. So to build an Identity, we start with one website-- usually a profile page at a Service. Then we scan it for “me” links, and add those URLs to a queue to be scanned; rinse, repeat, until we run out of “me” links, and have a list of scanned URLs; these URLs are then parsed into username/service pairs (like “USSJoin” on “Twitter”)-- these are the personas-- then stored together as one Identity.
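The queue-based scan described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Mnikr's code: the link data is a hand-written dictionary standing in for fetched pages, "circular" is interpreted as "the target page links back with its own “me” link", and the URL-to-persona rule (hostname as the Service, last path segment as the username) is a simplification, since real Services each have their own URL layout.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical data: each URL maps to the URLs it points at with rel="me".
ME_LINKS = {
    "https://twitter.com/example_user": {
        "https://photos.example/people/example_user",
        "https://old.example/u/abandoned",  # never links back
    },
    "https://photos.example/people/example_user": {
        "https://twitter.com/example_user",
        "https://links.example/user/example_user",
    },
    "https://links.example/user/example_user": {
        "https://photos.example/people/example_user",
    },
}


def to_persona(url):
    """Parse a profile URL into a (service, username) pair (simplified rule)."""
    parsed = urlparse(url)
    username = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    return (parsed.hostname, username)


def build_identity(start, me_links):
    """Follow reciprocated "me" links breadth-first from one profile page."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        for target in me_links.get(url, ()):
            # Only accept a "me" link if the target page links back.
            if target not in seen and url in me_links.get(target, ()):
                seen.add(target)
                queue.append(target)
    return {to_persona(u) for u in seen}


identity = build_identity("https://twitter.com/example_user", ME_LINKS)
print(sorted(identity))
```

The one-way link to `old.example` is dropped, which is the trade-off mentioned in the notes: some Identities end up less complete than they could be, in exchange for not merging pages on an unconfirmed claim.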
Spidering to get all this data, however, gets very out of control for a large number of sites; after 32 days of Mnikr, we were looking at more than 540,000 separate pages representing Persona pages for our users. So we leveraged someone really *good* at spidering: Google! Their Social Graph API allows us to get XFN data immediately, and it’s kept up to date. It’s also useful to anyone interested in what their sites expose about linked information; it has a great visualization tool for both “me” and “friend” links that I encourage you all to look at.
The issue with XFN is that the data isn’t perfect. Some people don’t use it; other people use it incorrectly. We have to tolerate some imperfections with the data stream; one way we do this is by using directionality in friend relationships. With “me” relationships, we make sure relationships are bidirectional; this means some Identities won’t be fully combined, but that’s acceptable.
So now, we have Identities and Personas galore, which was fun, and interesting, but isn’t really the crux of what we wanted to do. So now, we want to have people build reputations, using Mnikr.
PageRank, obviously, needs no explanation; it’s one of the core systems in the Google search engine. Windley and OpenPrivacy... (explain)
MIT Personas is an interesting attempt to show, rather than quantify, what a person is known for on the Internet. This actually came out after the camera-ready deadline, but I encourage everyone to go look at it-- mostly because their animations are impressive (it’s half an art installation). Nonetheless, this provides little that’s machine readable or comparable.
Accelerando is an interesting book chronicling the history of three generations of one family as they proceed through a technological singularity. From the perspective of their cat. :-) It’s a fantastic book with applications in many areas of computer science. One of the side ideas in it is the idea of reputation stock markets, with real money; it’s mentioned briefly, but we thought it was a neat idea that deserved to come to life. So now, here it is.
We decided to use a stock market to create this statement of reputation. We did this even though they *ahem* seem to have been getting a bad rap lately.
(Explain each) The Sybil attack is the attack where users can trivially make accounts that are able to influence the system-- so you essentially have a fake crowd that influences a system because it’s easy to create new accounts. We defeat this attack scheme because only users who themselves have had others invest can invest in others-- so while users can create new accounts, those accounts don’t influence reputations. The negative commentary issue bears some additional explanation.
Allowing people to make negative ratings lets interesting, and bad, things happen. There’s an MMORPG based on the Sims, called Sims Online. There’s been a new virtual crime spree, in which members of a highly-rated group extort new members to transfer virtual money to them, or they’ll flag them as bad people; there are enough of them that they can actually boot players from the universe. This is an interesting type of reputation attack, and it can exist in many types of reputation systems; the eBay Reputation lockouts mentioned during the week are similar. Therefore, we decided not to allow negative-based trades in Mnikr.
(In addition to what’s on the slide) Where did these numbers come from? Well, we wanted to have one person buying all possible shares of an Identity be able to reap some dividends from that purchase-- which decreases startup friction. But these numbers are fairly arbitrary, as is the 5-share cap; tweaking these numbers is an interesting future area. Conceptually, you can think of dividends as representing your good reputation “paying dividends,” of goodwill.
We didn’t want to use real money to test a stock market (as after the incidents in the real one, no one has any left) so we used a simple play money we called points. To bootstrap the system, we gave out codes worth 50 points to users; thus we could still limit the Sybil attack (since auto signups don’t get points) but get a broad range of investors.
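A toy model of these trading rules might look like the sketch below. Only three things come from the talk: the 50-point bootstrap codes, the 5-share cap, and the ban on negative trades. Everything else is an assumption made for illustration: the flat `SHARE_PRICE`, treating the cap as per buyer per Identity, and the `Market` class and its method names. Dividends are left out because their values were on the slide rather than in these notes.

```python
from collections import defaultdict

STARTING_POINTS = 50  # from the talk: each bootstrap code was worth 50 points
SHARE_CAP = 5         # from the talk; modeled here as per buyer, per Identity (assumption)
SHARE_PRICE = 10      # hypothetical flat price, not the real pricing


class Market:
    def __init__(self):
        self.points = defaultdict(int)  # account -> play-money balance
        self.holdings = defaultdict(lambda: defaultdict(int))  # buyer -> identity -> shares

    def redeem_code(self, account):
        """Credit an account that was handed a bootstrap code."""
        self.points[account] += STARTING_POINTS

    def buy(self, buyer, identity, shares):
        if shares <= 0:
            # No negative-based trades: you cannot bet against someone.
            raise ValueError("share count must be positive")
        if self.holdings[buyer][identity] + shares > SHARE_CAP:
            raise ValueError("share cap exceeded")
        cost = shares * SHARE_PRICE
        if self.points[buyer] < cost:
            # Auto-created accounts never received points, so they stop here.
            raise ValueError("not enough points")
        self.points[buyer] -= cost
        self.holdings[buyer][identity] += shares


market = Market()
market.redeem_code("alice")
market.buy("alice", "bob", 5)
print(market.points["alice"], market.holdings["alice"]["bob"])
```

In this model a freshly registered account with no code has zero points, so creating many of them does not move anyone's reputation.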
So we deployed the Mnikr system, for a period of 30 days, on the public Internet. Anyone could sign up and begin trading reputation shares. We spent some time making it pleasant enough to use that people might want to hang around, and stable enough that we could sleep occasionally. You can see friends, profiles, avatars, and action streams in these pictures. We pulled the Action Stream data in from the social networks listed, in real time, so while this picture just shows Tweets, we showed many different actions on many different networks, based on where the user was active. It was updated in real time.
So this is a graph of days (on the bottom) versus combined Identities (on the left). So you can see that we plateaued at about 350K Identities, which translated into about 1.5M Personas. Obviously, there are more than 350,000 social network members-- Oprah has nearly 2.6M followers on Twitter-- but we think this was an effect of our distribution; it turns out the entire Web 2.0 world may be a clique. Within that world, though, with distribution both at an O’Reilly Foo Camp and in several related groups, we found fairly wide interest. One point of interest: with those nearly 1.5M Personas, to display the action feeds shown on the previous slide, we were pulling 30-50 GB of Atom and RSS feeds daily, and storing a good chunk of that-- not to mention the processing required to deduplicate the feeds. It turns out that providing services like that-- like FriendFeed, for instance-- is actually somewhat complex. We decided to move to a different solution entirely, using JavaScript for the feeds instead, though that decreases our flexibility. Since this project, a new startup (SuperFeedr) has opened to solve this precise problem-- started by a friend of mine who heard me whining about it. :-)
On the left, you can see the time between first and last trades for different Identity IDs; some are zero, which means that users didn’t come back to make more, but a significant number *did* make many trades over time; this is consistent with many Web 2.0 services, where user retention is an issue. On the right, you can see the reputations that were highest after the 32 days; while there’s no “ground truth” in this, as we noted before, these people are all well-thought-of people on the Internet. (Explain each person.) This list continues to evolve over time as well.
Point out that Konstantinou (Teaching Fellow, Stanford) contacted *us* to talk about reputation
Metcalfe’s Law - Mnikr’s usefulness grows with the square of the number of Identities, not with the number of people using it.
Correlation - We found that yes, interesting people did seem to rise to the top; this has continued beyond the 30 days.