This talk was given on the 9th of August 2010 at the American Phytopathological Society's annual conference in Charlotte, North Carolina.
I talk about how the commoditisation of emerging tools on the web, such as the semantic web and scalable architectures, may affect the communication and practice of science.
1. Unveiling the web, making the implicit explicit; how new technologies will do your networking for you, and what you can do to take advantage of that.
Ian Mulvany
VP New Product Development Mendeley.com
http://www.flickr.com/photos/sminor/
2. Hi, I’m Ian. I started out in science studying astrophysics.
Then I worked as an editor for Springer.
While doing that job I got really interested in how the web could help with scientific communication.
That led me to Nature where I spent three years building web applications for scientists.
For the last 5 weeks I’ve been working with a great little startup called Mendeley.
3. Humans
Science Blogging/Tweeting/Social Communities
Public Academic
Machines
You are all familiar with social media tools like blogs, twitter and social networks.
They are great for connecting professionals, and for reaching out to the general public.
But in a way these tools are really just at the surface of the internet.
There are a lot of interesting emerging technologies beneath the surface that are starting to have an impact on science: semantic markup, the commoditisation of scalable web architectures, and easier-to-implement machine learning tools.
Today I’m going to talk about some of these tools.
4. The future is already here. It's just not very evenly distributed.
William Gibson - 1999
image flickr: fredarmitage
I love this quote, and what I want to do today is mainly show you what some other fields of science have been doing with some of these
technologies.
5. it required no brilliance for people to foresee the fabulous growth that awaited such industries as ... aircraft (in 1930) and television sets (in 1950). But the future then also included competitive dynamics that would decimate almost all of the companies entering those industries
Warren Buffett
But I do have a very important warning for you all.
This is Warren Buffett. He is one of the most successful investors of all time, and he doesn’t invest in internet companies.
This quote is taken from his annual letter to investors, and there he explains that the internet is a disruptive technology, and that makes it hard to
predict what is going to succeed.
Just like it was hard to predict who would do really well at the dawn of the aeroplane, or the TV.
6. Google Wave
image: flickr prgibbs
For the last year I was convinced that Google Wave was going to be the next big thing, and I told a lot of people that.
Last week Google stopped development work on Google Wave.
Oops.
So instead of telling you which tools you should use, I just want you to concentrate on how the tools I will talk about are making changes happen.
I don’t know if they will be the ones that will be around in five years, but I know things will be different.
8. images: wikimedia commons
I’m very glad to say that I flew in on something that looked like this.
Technology does mature, and when it matures all the complicated bits get abstracted away from you.
They get hidden.
You sit in your seat, and a few hours later you are somewhere else.
It’s like magic.
9. Ethernet
TCP/IP
HTTP
• server room http://www.flickr.com/photos/tuxstorm/
The internet is very complicated.
It’s a big distributed mess of cables and protocols.
10. But you don’t see that any more.
You just see some very easy to use interfaces.
All the complexity has been abstracted away and hidden from you.
In the last five years I bet everyone in this room has become a content creator on the internet.
All because it’s just so easy nowadays.
11. The Royal Society, 350 years old
Nature, 141 years old
Peer Review at Nature, 43 years old
Google, 12 years old
Facebook, 7 years old
Twitter, 4 years old
Mendeley, ~1.5 years old
image flickr: robbie73
So some of the tools that I’m going to tell you about today may not be very mature yet.
But that will happen.
Time goes really fast on the internet!
12. [Diagram: a social network in which a few nodes are labelled "person I know" and most are labelled "no idea"]
Some of the companies on the internet have started taking the content you have created.
The digital trails that you leave behind you every day.
And they have used that to recommend new friends for you.
And stuff for you to buy for all of these new friends of yours.
13. Why no
recommendation engine
for science, especially
multi-disciplinary
science?
I want a jet pack, but I also want a really good recommendation engine for science.
14. Bollen, J. et al., 2009. A principal component analysis of 39 scientific impact measures. Methods, 1-19.
doi/10.1371/journal.pone.0004803.g007
This shows how journals are related by the reading patterns of scientists.
Science is so richly interconnected, it’s a shame that we don’t have great recommendation engines yet.
(by the way, if you don’t like the impact factor, go and read this paper, it’s awesome, and Johan is a really great guy!)
15. Citations
time
Of course much of the rich interlinking comes from citations.
Citations link papers together, but there is a problem with these links.
You can never tell whether the link is a good link or ...
17. RDF
There is a way to turn links into relationships on the web.
It adds meaning to links.
It adds semantics to the web.
RDF is a popular way of doing this.
RDF stands for Resource Description Framework, but at its heart it’s just a way of adding information about what a connection means.
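To make that concrete, here is a minimal sketch in Python of what adding meaning to a link can look like. It models RDF-style statements as plain (subject, predicate, object) tuples rather than using a real RDF library, and the paper identifiers and the `ex:` predicate names are invented for illustration.

```python
# A plain citation only says "paper A links to paper B".
# An RDF-style statement is a triple: (subject, predicate, object).
# All identifiers below are hypothetical examples, not real vocabulary terms.
triples = [
    ("ex:paperA", "ex:cites", "ex:paperB"),
    ("ex:paperA", "ex:agreesWith", "ex:paperB"),
    ("ex:paperA", "ex:cites", "ex:paperC"),
    ("ex:paperA", "ex:disputes", "ex:paperC"),
]


def relationships(subject, obj, statements):
    """Return every predicate that connects subject to obj."""
    return sorted(p for s, p, o in statements if s == subject and o == obj)


# Now the link from A to C carries a meaning, not just a pointer.
print(relationships("ex:paperA", "ex:paperC", triples))
```

The point of the sketch is only that the predicate is data too, so a machine can ask what kind of link it is looking at.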
18. Semantic Web
Applications in
Neuromedicine
image: flickr fturmog
Researchers at Harvard Medical School and Massachusetts General Hospital are using RDF in Alzheimer’s research.
Their system is called SWAN.
19. Research Narrative
alternativeTo
inconsistent
consistent discusses
Research Research Research Research
Statement Statement Statement Statement
Every scientific paper is really a story.
It tells us about the nature of the world, and it draws on the works of other people to convince us that new claims about the world are true.
Using SWAN, the author of a paper adds context to each citation and statement in the paper.
The annotations let us know whether a claim is consistent with, inconsistent with, or an alternative to a claim made elsewhere.
It takes a lot of effort to mark up a paper like this. It’s expensive.
20. http://hypothesis.alzforum.org/swan/
But when you do it, you get an amazing overview of the literature.
You can use a machine to find the most controversial claims very quickly.
You can use that information to decide what experiment will shed the most light on our ignorance.
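As a rough illustration of how a machine might find the most controversial claims, the sketch below counts how many contesting relations point at each claim. It is not SWAN's actual data model or code: the relation names are taken from the slide, while the claim labels and the choice to treat "inconsistent" and "alternativeTo" as contesting are assumptions.

```python
from collections import Counter

# Hypothetical annotations: (claim, relation, other_claim).
# The relation names follow the slide; the claim labels are invented.
annotations = [
    ("claim1", "consistent", "claim2"),
    ("claim3", "inconsistent", "claim2"),
    ("claim4", "inconsistent", "claim2"),
    ("claim4", "alternativeTo", "claim1"),
    ("claim5", "discusses", "claim1"),
]

# Assumption: these two relations signal disagreement.
CONTESTED = {"inconsistent", "alternativeTo"}


def most_controversial(statements):
    """Rank claims by how many contesting relations point at them."""
    counts = Counter(
        target for _, relation, target in statements if relation in CONTESTED
    )
    return counts.most_common()


print(most_controversial(annotations))
```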
21. There is a growing number of sites and data silos that support RDF. This is the semantic web.
22. 2,300,000,000
Assertions in BioRDF
There are a huge number of statements about biological systems.
But what happens if you have plain vanilla HTML, or a naked CSV data set?
23. Let’s take an example from plant science.
On http://sbr.ipmpipe.org/cgi-bin/sbr/public.cgi you can get a map of the spread of soybean rust.
When you click on the link you get the information as an HTML table.
24. This is like much of the information on the web. Let’s have a look at the HTML.
25. This HTML is plain, without much explanatory markup.
27. And then we could use a tool like Yahoo Query Language (http://developer.yahoo.com/yql/) to filter the information on the table.
28. And we can create an RSS feed.
With a little effort in creating nice HTML, we can go from a plain piece of content to a filtered alerting service.
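The same idea, table in, filtered feed out, can be sketched without YQL using only the Python standard library. This is a stand-in for the YQL pipeline, not a reproduction of it. The sample table, its column names, the feed title and the `example.org` link are all made up; the real page's layout is not assumed here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up sample table; the real page's columns and values are not known here.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>County</th><th>Status</th></tr>
  <tr><td>Alabama</td><td>Baldwin</td><td>confirmed</td></tr>
  <tr><td>Georgia</td><td>Tift</td><td>scouted</td></tr>
  <tr><td>Florida</td><td>Gadsden</td><td>confirmed</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def table_to_records(html):
    """Turn the table into a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def records_to_rss(records, title, link):
    """Build a bare-bones RSS 2.0 document with one item per record."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for record in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{record['County']}, {record['State']}"
        ET.SubElement(item, "description").text = record["Status"]
    return ET.tostring(rss, encoding="unicode")


records = table_to_records(SAMPLE_HTML)
confirmed = [r for r in records if r["Status"] == "confirmed"]  # the filter step
feed = records_to_rss(confirmed, "Confirmed reports (sample data)", "http://example.org/feed")
print(feed)
```

The parser only works because the table has a header row and one value per cell, which is the practical meaning of "nice" markup.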
The web is soooo cool.
29. HTML
YQL
RSS JSON RDF
YQL takes input from HTML sources, and allows you to manipulate that input in interesting ways.
30. CSV HTML HTML
YQL
RSS JSON RDF
The entire conversion can be called at a single URL
YQL can also take data from CSV files or XML files on the web.
It can merge data.
The entire pipeline can be mapped onto one URL, making it transferable, open and very sharable.
YQL is a tool that has come out of the hacker community.
It has great potential for science.
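Merging data from two sources is the step that is easiest to show in a few lines. The sketch below joins two CSV sources on a shared column using plain Python rather than YQL. Both data sets, the column names and the helper names `read_csv` and `merge_on` are hypothetical.

```python
import csv
import io

# Two made-up CSV sources that share a "county" column.
REPORTS_CSV = "county,status\nBaldwin,confirmed\nTift,scouted\n"
WEATHER_CSV = "county,rainfall_mm\nBaldwin,12\nTift,3\n"


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on(key, left, right):
    """Inner join two lists of dicts on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


merged = merge_on("county", read_csv(REPORTS_CSV), read_csv(WEATHER_CSV))
print(merged)
```

A join like this only works when both sources name the shared column consistently, which is one reason to be careful about how data is published.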
Just remember, put your data on the web.
<div id="important">Be nice about how you put it there ;)</div>
31. Citizen Science
Ok, we have looked at how emerging tools can help us join data together.
How they can help us add meaning and insight to the literature.
And how they can be used to make it easier to put our data onto the web in interesting ways.
Another emerging trend is the way in which we can connect people to that data.
And by people, I mean EVERYONE !!
32. BOINC based science
> 2, 000, 000 people
> 5, 500, 000 CPUs
http://www.allprojectstats.com
Systems that analyse data on a user’s computer while the computer is in screen saver mode have been around for a long time.
SETI@home is the most famous.
They have been adopted by millions of people.
Millions of computers have been used for doing science at home.
But this is a somewhat passive way to engage people.
33. 10 000 sheep, Aaron Koblin, 2006
Tools like the Mechanical Turk (https://www.mturk.com/mturk/welcom) allow you to get people to do real world tasks for you.
Like drawing sheep (baaaa!).
35. The Galaxy Zoo project created an intuitive web interface that allowed members of the general public to classify galaxies from the Sloan Digital Sky
Survey.
They had a lot of galaxies that were too fuzzy for a computer to classify.
And they had too many for even a grad student to classify.
36. 1, 000, 000 galaxies
150, 000 people
50, 000, 000 classified
17 papers
In one year 150,000 people classified the one million fuzzy galaxies in the survey.
They did a lot of classification.
And Galaxy Zoo published a lot of papers as a result.
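The numbers on the slide imply that each galaxy was looked at many times, which is what makes crowd classifications usable. The sketch below works out the averages from the slide's figures and then shows one simple way to combine repeated votes, a majority vote. The majority rule, the labels and the sample votes are assumptions for illustration, not a description of how Galaxy Zoo combined its data.

```python
from collections import Counter

# Figures from the slide.
galaxies = 1_000_000
classifications = 50_000_000
volunteers = 150_000

per_galaxy = classifications / galaxies        # average votes per galaxy
per_volunteer = classifications / volunteers   # average votes per person


def consensus(votes):
    """Return the most common label and the fraction of votes it received."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Hypothetical votes for one galaxy; the labels are illustrative.
votes = ["spiral"] * 7 + ["elliptical"] * 3
print(per_galaxy, round(per_volunteer), consensus(votes))
```

On these figures each galaxy gets about 50 independent looks, and each volunteer contributes a few hundred classifications on average.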
37. Cooper, S. et al., 2010. Predicting protein structures with a
multiplayer online game. Nature, 466(7307), 756-760.
The Foldit project turned protein folding into a game.
You get more points if you get your protein into a lower energy state.
For many proteins this is too hard for computers to figure out.
After two years of people playing the game, they found structures for a bunch of proteins that were not known before.
38. The last two examples were examples of data analysis.
You can also get people to collect data for you.
The Great Backyard Bird Count gets bird watchers to count birds.
39. They can make the best survey of bird populations, all across the US.
40. NoiseTube turns a mobile phone into a sensing device for measuring noise pollution.
41. The noise profiles of a number of cities have been mapped out by people using this software in ambient mode.
As more people get more powerful phones, what they will be able to measure will only be limited by the ingenuity of those looking for data.
We can already use a phone to record sound, time, location, images and motion.
(Some phones can even be used to make phone calls)
42. image: flickr sybrenstuvel
But all of the things I’ve been talking about are not easy to do yet.
You need to really invest in building a platform, annotating your documents, or engaging with a community of people.
I believe that the tools that make these platforms possible will become easier to use.
The complexity will get abstracted away.
Tools will make it easy for us to engage people with our data, with each other, all helping science.
44. Mendeley Desktop
We have built a tool that works on your computer to help you manage your research library.
45. Manage
your research
papers
It’s really good (you should check it out at Mendeley.com). We want it to be the best tool that is possible for helping you.
(actually that’s my job, I’m in charge of making the product better, so let me know what you think at ian.mulvany@mendeley.com :P)
46. Mendeley aggregates research data in the cloud
But what is really cool is that we mirror your activity in the cloud.
We have a tool that is useful to you as an individual.
But when lots of you use it we can find out in real time what science is interesting!
47. By doing this, Mendeley makes science more
collaborative and transparent
We want to make it easy for everyone to find out what the experts think are the important papers.
48. Real-time data on 28m research papers:
Thomson Reuters’
Web of Knowledge
Mendeley after
16 months:
And we already have information on lots of papers.
49. We can tell you what kind of people are reading a paper, and where they are from.
50. And just like Amazon can recommend books to you based on your behaviour, and the behaviour of everyone else,
we have started making recommendations about research.
We are trying to make crowdsourced recommendations for science easy, and we have an API, so we are trying to make it easy for you too.
We have BIG ideas, and we are really excited.
Come and help us make science easier to do at mendeley.com, I’d love to see you there.
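For a feel of how recommendations based on everyone's behaviour can work, here is a minimal co-occurrence sketch: papers are suggested to a reader because other readers with overlapping libraries hold them. This is a generic illustration, not Mendeley's algorithm or API; the user names, paper identifiers and the scoring rule are all assumptions.

```python
from collections import Counter

# Hypothetical reading libraries: user -> set of paper identifiers.
libraries = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p4"},
    "carol": {"p2", "p4", "p5"},
    "dave": {"p6"},
}


def recommend(user, libs, top_n=3):
    """Suggest papers held by readers who share papers with `user`."""
    mine = libs[user]
    scores = Counter()
    for other, theirs in libs.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # readers with nothing in common tell us nothing
        for paper in theirs - mine:
            scores[paper] += overlap  # weight by how similar the reader is
    # Highest score first; ties broken by identifier so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [paper for paper, _ in ranked[:top_n]]


print(recommend("alice", libraries))
```

Note that a reader with no overlap with anyone gets no suggestions, which is why this kind of approach improves as more people use the tool.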
51. image: flickr daviddmuir
In the future, I don’t think you will be asking yourself “how” can you use tools and platforms like the ones I’ve been describing.
They will become easy to use, and easy to utilise.
You will be asking yourself “why” should you use these things.
So let’s look at the benefits.
52. Costs of research Source: Research
Information Network
This Research Information Network report from 2008 shows that a lot of time is spent looking for what to read.
And time is money.
If we can build a way for you to find what you need faster, we all save money :)
53. Huang, Y., Contractor, N. & Yao, Y., 2008. CI-KNOW: recommendation based on social networks. In Proceedings of the 2008 international conference on Digital government research. Digital Government Society of North America, pp. 27-33.
Lazer, D. et al., 2009. Social science. Computational social science. Science (New York, N.Y.), 323(5915),
If we can recommend people to each other as well as papers we can save on redundancy in research.
That’s what the tool that Huang and Contractor built can help you do.
It’s helped people in cancer research get their work done faster.
54. crystal eye:http://wwmm.ch.cam.ac.uk/crystaleye/
CrystalEye is a tool that extracts the crystallographic bond lengths reported in the literature.
You can compare your results with every other result.
If yours is very different, have you found something really interesting?
Or have you found an error?
By quickly being able to see the context of the information you have, you can more quickly understand it.
(http://wwmm.ch.cam.ac.uk/crystaleye/summary/acs/inocaj/2009/10/index.html)
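Comparing one result against every other result can be as simple as asking how far it sits from the spread of reported values. The sketch below uses a z-score for that. It is not how CrystalEye works internally; the bond lengths are made-up numbers, and the threshold of three standard deviations is an arbitrary assumption.

```python
from statistics import mean, stdev

# Made-up bond lengths in angstroms standing in for values from the literature.
reported = [1.52, 1.54, 1.53, 1.55, 1.54, 1.53, 1.52, 1.54]


def z_score(value, sample):
    """How many sample standard deviations `value` lies from the sample mean."""
    return (value - mean(sample)) / stdev(sample)


def looks_unusual(value, sample, threshold=3.0):
    """Flag a value that is far outside the spread of the reported values."""
    return abs(z_score(value, sample)) > threshold


print(looks_unusual(1.54, reported), looks_unusual(1.70, reported))
```

A flag like this cannot tell you whether you have a discovery or an error, only that the result deserves a second look.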
56. image: flickr matthewfield
We can make them into scientists.
Look at the last author on the foldit paper.
I wish I had a paper in Nature.
I wish I’d played that foldit game, don’t you?
57. DATA Collection Humans Academic Papers/
Analysis Annotation
Science Blogging/Tweeting/Social Communities
Reading Academic
Papers
Amateur Professional
Data Processing Data Mining/Linking
Machines
So you see, there are lots of ways to connect people.
58. The Future
I wanted to end with a few thoughts more about future trends.
The first one I want to talk about is that we are going to need to be more open about science.
59. GISTEMP
Global Temperature Anomaly
(and we match this)
slide from: clear climate code
When the Intergovernmental Panel on Climate Change reported their results ...
60. Motivation
xkcd.com
slide from: clear climate code
Lots of people said that it was a fix-up, that the data could not be reproduced, and that the old Fortran code that produced that graph could never be run.
61. Code Metrics
GISS ccc-gistemp
slide from: clear climate code
Indeed, the code was a mess; that’s the composition of the code on the left.
Some interested computer programmers (NOT SCIENTISTS, JUST NORMAL PEOPLE WHO WERE INTERESTED) rewrote the code in Python.
Sorry for shouting just there, but that’s so important. Not scientists, not the custodians of reproducibility.
And the reason is that you don’t get credit in science for rewriting code.
But these computer programmers thought it was an important enough issue, the potential destruction of mankind, and they were not looking for
scientific accreditation.
So they proved you could run the original code.
And they vastly improved it (that’s their code in the middle).
You can go and tell them how awesome you think they are over at http://clearclimatecode.org/
62. Independent Analyses
Graphic courtesy Zeke Hausfather
slide from: clear climate code
And here is the proof.
So if you make your data open, you also really have to make the methods and the code and all the nitty gritty open too.
Otherwise you steal away the context.
And we will forget.
And the knowledge that you know is so important.
Will be lost.
63. image: flickr doug88888
I think another interesting trend will be that the world will start talking to us.
London Bridge talks to us.
(Hi Tom, ~waves~).
65. image: flickr scottkinmartin
From botanicalls.com you can even get something to put into your plant pot that will make your plant talk to you.
66. King, R. D., Rowland, J., Oliver, S. G., Young, M.,
Aubrey, W., Byrne, E., Liakata, M., et al. (2009).
The automation of science. Science, 324(5923),
85-89. AAAS. Retrieved from http://
www.ncbi.nlm.nih.gov/pubmed/19342587
With all of this data available, machines like the one King et al. created will get more powerful.
You feed it data, and it doesn’t just analyze the data.
It creates hypotheses.
And they are correct.
Computers are going to start doing science.
I hope we can be friends.
67. Bradley W. Schenck
image: flickr simon
The last idea I have for you is a 3-d printer that can print itself.
It’s slow, but the internet used to be slow too.
In 1982 it would take 400 hours to transmit 1 song.
In 1990 it still took 1 hour.
Right now it takes a week to print all the bits you need to make another 3-d printer.
But imagine a future where you could email your lab to someone.
And they could print it.