16 February 2009 by Victor

A few days ago, William Gunn blogged about a fascinating idea for a paper recommendation engine and also described Mendeley’s role in it. His post then generated a lively discussion on FriendFeed.

Perhaps due to our relatively well-known affiliation with Last.fm, our idea for a research paper recommendation engine had always involved tags and collaborative filtering. But William brings up Pandora, another type of recommendation engine which doesn’t rely on critical mass, but on scoring music based on a certain set of dimensions.

So I was wondering, how feasible would such a human-scored recommendation engine be for research papers, and how could one do it? If one were to transplant the Pandora approach 1:1, one would have to find suitable dimensions on which to score papers – but what could those be? Epistemological position (e.g. positivist vs. constructivist), academic discipline, methods used? Or would you have to define a slightly different set of dimensions for each academic discipline? As opposed to music, where you can score tracks based on instrumentation, mood, tempo etc., I feel that it would be rather difficult to use this level of abstraction for research paper recommendations, but maybe I’m wrong.

Of course, you could think of tagging as a form of (binary) scoring, too, but without pre-defined dimensions. I thus remain convinced that tagging and collaborative filtering will be a very good starting point for our recommendation engine. However, William’s suggestion made me think of an additional possibility.

Here’s what we might do: We have been planning to gradually add “Paper Pages” to the Mendeley site over the next few weeks. There will be one page for every paper in our database, containing the metadata, the abstract (if possible/available), some usage statistics about the paper, links to the publisher’s page (if available), and (later on) commenting functionality. We were also thinking about crowdsourcing approaches to enable users to correct mistakes in the metadata or merge duplicates.

Incorporating William’s suggestion, we could also give users the option to explicitly link paper pages to each other, and then say “this paper is related to this other paper because ___”. Two papers sharing the same tag may implicitly suggest a relation, but it might also be a case of a homonym – the same tag meaning two completely different things in different disciplines. An explicit link would solve this problem.
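To make the difference concrete, here is a minimal Python sketch. It is not Mendeley's actual data model: the names `PaperLink`, `shared_tags` and `explicit_links_between`, the paper IDs, and the example tags are all made up for illustration. It contrasts the implicit signal (two papers sharing a tag) with an explicit, user-stated link that carries a reason.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperLink:
    """Hypothetical explicit link: 'source is related to target because <reason>'."""
    source_id: str
    target_id: str
    reason: str
    added_by: str


def shared_tags(tags_a, tags_b):
    """Implicit signal: tags two papers have in common (case-insensitive)."""
    return {t.lower() for t in tags_a} & {t.lower() for t in tags_b}


def explicit_links_between(links, paper_a, paper_b):
    """Return user-stated links connecting the two papers, in either direction."""
    pair = {paper_a, paper_b}
    return [link for link in links if {link.source_id, link.target_id} == pair]


# Invented example data: "network" is a homonym across the two disciplines.
tags = {
    "neuro-1": ["network", "cortex"],
    "cs-7": ["Network", "routing"],
    "neuro-2": ["cortex", "imaging"],
}
links = [
    PaperLink("neuro-2", "neuro-1", "follow-up using the same imaging method", "user42"),
]

# The shared tag suggests a relation that no user has actually stated.
print(shared_tags(tags["neuro-1"], tags["cs-7"]))
print(explicit_links_between(links, "neuro-1", "cs-7"))

# Here a user has stated the relation and the reason for it.
print(explicit_links_between(links, "neuro-1", "neuro-2"))
```

In this toy data the two "network" papers share a tag but have no explicit link, while the two neuroscience papers are connected by a link that says why.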

I didn’t have much time to fully think this through, and any further ideas would be appreciated!

  • February 16, 2009 at 3:05 pm Christina Pikas
    dimensions = facets, you have some already, maybe also research methods? for social sciences and humanities, maybe theory? dunno.
  • February 16, 2009 at 3:52 pm Björn Brembs
    I've always wondered why one would need so many different applications to do the same thing: handle scientific literature. One search engine to quickly find it, one or two other engines to find citations, publisher website with login or the FF "references wanted" room to get the papers, Endnote, RefMan, Mendeley etc. for inserting refs in a manuscript, CiteULike, Connotea and Mendeley for collaborating/sharing/bookmarking, etc.
  • February 16, 2009 at 3:55 pm Björn Brembs
    Ideally, there should be one place to find all papers, one place there to bookmark/share them and one way to take these bookmarked papers and add them to your manuscript as references. And of course, if you click on any reference in that paper, you get to the other paper. Mendeley for the first time has the potential to do all this with the power of crowdsourcing. Some of it would (still) be illegal, but it could work. One ring, er, place to rule them all! Great post, Victor!
  • February 16, 2009 at 4:03 pm Christina Pikas
    the finding part is actually tricky - different research areas/disciplines/domains have different information they need to determine if the article is relevant. So to have all the fields (like taxonomic classification, frequency range, or astronomical object number, subject age group) for every article in science doesn't work... you could look at lowest common denominator, and federated search does that, but it's really not where you want to go.
  • February 16, 2009 at 4:05 pm Christina Pikas
    besides, some amount of competition is healthy. WoS didn't really make any significant changes for years until Scopus came along... However, interoperability is key as are things like apis and publishing and importing/exporting.
  • February 16, 2009 at 4:15 pm Richard Akerman
    Recommender is tricky in science because there are many different domains. Is the most relevant paper to me as a computer scientist going to be the most relevant to you as a biologist? Amazon is making recommendations based on mass consumer behaviour - a single crowd. I'm not sure how well this will map to narrow scientific disciplines. I think you need something like "people in your field read this and also that". One thing I've been wondering about the b2x recommender is whether the user pool (presumably mostly undergrads doing course readings) will skew the recommendations.
  • February 16, 2009 at 4:17 pm Richard Akerman
    Also, since FF doesn't have a way to connect "is-related-to", here's a related discussion thread: http://friendfeed.com/e/3ab061d4-bc66-cfaf-756f-a2db1686161f/Could-this-be-the-Science-Social-Networking/
  • February 16, 2009 at 4:22 pm Christina Pikas
    right so following on from Richard - you might want related as in uses same algorithm, same species, same geographic location, ....these facets (geographic location) only appear in databases where they're relevant
  • February 16, 2009 at 4:33 pm Richard Akerman
    @Christina And following on from my comment, my question continues to be, what standard facets are there that would be high value, but couldn't be machine-extracted. That is, what value can humans add in terms of facets (that aren't already done at creation time - for example many articles already include keywords)? (I can certainly see them adding value in terms of rankings, ratings, comments, folksonomy tagging etc.)
  • February 16, 2009 at 5:00 pm Martin Fenner
    "Paper Pages" in Mendeley is a good step forward. Please consider integration with other services (e.g. Connotea, CiteULike, Researchblogging.org, Nature Blogs) for these pages.
  • February 16, 2009 at 5:15 pm Victor / Mendeley Team
    Martin: Yes, we'll do that! Richard: The concern for undergrads skewing collaborative filtering recommendation data and differing relevance for different disciplines was precisely the reason that we require academic position and discipline as part of our sign-up procedure - we felt that this was the minimum information that we needed (besides personal library contents) for making decent recommendations :-)
  • February 16, 2009 at 5:20 pm Victor / Mendeley Team
    I also forgot to say that we can use Mendeley Document Groups and tags in the same way as the "explicit links" described in my blog post: If a Mendeley user puts two papers into the same Document Group, or assigns the same tag to them, then they should both be relevant to a common topic (i.e. intra-personal tags don't have the homonym problem of inter-personal tags, and could thus be weighted more strongly)...
  • February 16, 2009 at 6:37 pm Richard Akerman
    I'm looking forward to seeing the "Paper Pages" implementation - I think it has great potential.
  • February 16, 2009 at 9:35 pm Mr. Gunn
    I left my comments on your blog, Victor, but basically I think the "paper pages" idea is spot on. The kind of facets I was thinking about are things like "was trained by", "is colleague of", "is rebuttal of", "was influenced by". These are the kind of things that a text-mining approach won't be able to pull out from the data itself, but once you've got a set of annotations, you might just find some correlations that allow you to infer these properties.
  • February 17, 2009 at 9:08 am Cameron Neylon
    Just a plea for lots of rich feeds off of these paper pages - I want to monitor and mashup data coming from multiple sources about my and other interesting papers.


8 Responses to “A human-scored research paper recommendation engine?”

  1. Andre Vellino Says:

    You should read the paper by Toine Bogers and Antal van den Bosch “Recommending Scientific Articles using CiteULike”, published in the proceedings of the Recommender Systems Conference 2008.

    http://portal.acm.org/citation.cfm?id=1454053

  2. Dr Shock Says:

    Just some quick thoughts for dimensions:
    keywords, discipline, citation index, other indexes?
    Kind regards Dr shock

  3. Mr. Gunn Says:

    Actually, Victor, that’s exactly what I was thinking about. Relatedness in terms of keyword extraction is useful, but being able to state, perhaps in a FOAF-like machine readable manner, this paper is related to this one because Author X did his PhD in Author Y’s lab, or this paper uses a similar approach as this other one in a related field, or this paper is a follow-up to the questions raised by this other paper.

    Types of features human annotation could add are things like:
    “was trained by”
    “is a colleague of”
    “is a rebuttal of”
    “is an extension of”
    “was first to publish”
    “was most influential to”

    Dates, authors, times cited, are all metadata that’s available, but with the explicit linkage you’re talking about, you could provide what I find most missing in a recommendation system – validation that the paper being recommended is truly an important one, not showing up just because of some keyword co-occurrence.

  4. Pedro Beltrao Says:

    Have a look at PubMed and how they suggest related articles. I don’t think you will have access to the references for each article but that could be used as a similarity signal as well. I think I would go with some form of clustering on keywords extracted from the abstract plus citation matching if you had access to it. These would serve as a baseline for article similarity that could then be personalized for each user according to likes/dislikes, tagging (co-similarity with tagged items) etc.

  5. Mr. Gunn Says:

    The PMRA paper is here: http://www.biomedcentral.com/1471-2105/8/423

  6. Alex Vozny Says:

    I think the best way to find the relation between two papers is the overlap of cited works within these papers.
    As a score for recommending the papers found in such a way, each paper’s own citation index can be used.

    Another way of recommendation is the one used at Last.fm – find the users with overlapping libraries.

    Overlap of tag clouds of two papers could be useful too. In contrast to music you can assign many more tags to a paper (describing methodology, materials used, theory vs expt, etc.) But people hardly add tags to all papers they keep (I have about 1000 items in my db and have tags only for several of them).

    Maybe making tagging a more automated step while saving the paper, instead of a manual revision of the library in your spare time, would reduce the barrier to tagging. For example, I’ve noticed that I started tagging bookmarks as soon as Firefox 3 allowed me to do it on the fly.
    With research papers it is harder to do, since you can’t tag effectively without having read the paper, but still I tend to put 2-3 keywords in the name of the file when I download the PDF (and now I’m going to stop doing this since at some point I will rename the files from within Mendeley). Maybe it’s a good idea to extract those keywords from the filename in Mendeley (but you have to teach users to have it as a habit).

  7. Obama to appear on the "Tonight Show" (Reuters) — But As For Me Says:

    [...] A human-scored research paper recommendation engine? | Mendeley Blog [...]

  8. Mark Says:

    You guys are way ahead of me… here I was thinking I was smart because I know what a Markov engine is LOL

    Good luck
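
As a rough illustration of the citation-overlap idea from comment 6 above, here is a small Python sketch. Everything in it is an assumption made for the example: the function names `citation_overlap` and `recommend`, the use of Jaccard overlap as the similarity measure, the `min_overlap` threshold, and the toy reference lists and citation counts. It finds papers whose reference lists overlap with a given paper and then orders them by how often they have been cited.

```python
def citation_overlap(refs_a, refs_b):
    """Jaccard overlap of two reference lists (0.0 if both are empty)."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(paper_id, references, times_cited, min_overlap=0.2):
    """Papers whose references overlap with paper_id's, most cited first."""
    candidates = [
        other
        for other in references
        if other != paper_id
        and citation_overlap(references[paper_id], references[other]) >= min_overlap
    ]
    # Rank the related papers by their own citation count (assumed metric).
    return sorted(candidates, key=lambda p: times_cited.get(p, 0), reverse=True)


# Invented example data: reference lists and citation counts per paper.
references = {
    "A": ["r1", "r2", "r3"],
    "B": ["r2", "r3", "r4"],
    "C": ["r3", "r2", "r1", "r5"],
    "D": ["r9"],
}
times_cited = {"A": 10, "B": 3, "C": 25, "D": 100}

print(recommend("A", references, times_cited))  # ['C', 'B']
```

Paper D is heavily cited but shares no references with A, so it is never recommended; the citation count only orders papers that already pass the overlap test.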