October 24, 2016

Open Access Week: How Am I Doing, Altmetrics?

This is one in a series of posts in celebration of Open Access Week (on Twitter: #oaweek, #openaccess, #OpenScience). To kick things off, we will go through an informal evaluation of Altmetrics and other indicators of research paper readership.

In this post, I will discuss some quick investigations I did using the Altmetric system (visually recognizable as the number inside the multicolored donut). Altmetrics go beyond conventional academic metrics, which are based solely on journal prestige or on the number of formal citations in academic papers (e.g. the h-index). I will also discuss how these metrics might be used to better understand the full impact of one's work.

The Altmetric donut and its diversity of input sources. The Altmetric score is based on how many interactions your content received from each source medium.
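The caption's idea, that the score reflects interactions counted per source, can be pictured as a weighted sum of mention counts. The sketch below is a simplification under stated assumptions: the source names and weights are hypothetical and chosen only for illustration. Altmetric's actual weighting uses its own values plus further adjustments that are not reproduced here.

```python
# Hypothetical per-source weights, for illustration only.
# These are NOT Altmetric's published weights.
HYPOTHETICAL_WEIGHTS = {
    "news": 3.0,
    "blog": 2.0,
    "twitter": 1.0,
    "facebook": 0.5,
}


def weighted_attention_score(mentions: dict, weights: dict) -> float:
    """Sum each source's mention count multiplied by that source's weight.

    Sources missing from `weights` contribute nothing to the total.
    """
    return sum(count * weights.get(source, 0.0) for source, count in mentions.items())


example_mentions = {"news": 1, "blog": 2, "twitter": 5, "facebook": 2, "forum": 4}
print(weighted_attention_score(example_mentions, HYPOTHETICAL_WEIGHTS))  # 13.0
```

The unweighted "forum" mentions show why two outputs with the same raw number of interactions can still end up with different scores.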

The first exercise I did was to acquire Altmetric donuts for journal articles and preprints for which I did not have such data. This includes venues such as arXiv, Stem Cells and Development, and Principles of Cloning II, which do not feature Altmetric donuts on their pages. Interestingly, the bioRxiv preprint server does, in addition to tracking .pdf download and abstract view counts.

Example of an Altmetric donut in context (top) and readership stats (bottom) from a recent Biology paper for which I am an author. 

Retrieving a donut and data summary from the Altmetric database is easy. You embed a few lines of code (see inset below) into an HTML document, and the donut and score appear where desired. While the donut is most useful for augmenting a publication list, in this case I simply created a test document for collating data across many papers.

// Formal journal article citation
Alicea, B., Murthy, S., Keaton, S.A., Cobbett, P., Cibelli, J.B., and Suhr, S.T.  Defining phenotypic respecification diversity using multiple cell lines and reprogramming regimens. Stem Cells and Development, 22(19), 2641-2654 (2013).
// Code for donut and database call; possible identifier attributes include:
// data-arxiv-id
// data-handle
// data-doi
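Because the embed markup itself did not survive in the inset, here is a minimal sketch of generating a test document that collates donuts for several papers, one per identifier attribute listed above. It assumes Altmetric's badge markup is a `div` with class `altmetric-embed`, a `data-badge-type` attribute, and one identifier attribute, loaded by Altmetric's embed script. The script URL is an assumption taken from Altmetric's badge documentation and should be verified; the identifiers below are placeholders, not the papers' real DOI or arXiv ID.

```python
from html import escape

# The three identifier attributes listed in the post.
ID_ATTRIBUTES = {"data-doi", "data-arxiv-id", "data-handle"}

# Assumption: the embed script location published in Altmetric's badge
# documentation at the time of writing; verify it before relying on it.
EMBED_SCRIPT = "https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js"


def donut_badge(id_attribute: str, identifier: str) -> str:
    """Return a placeholder <div> that the embed script can render as a donut."""
    if id_attribute not in ID_ATTRIBUTES:
        raise ValueError(f"unsupported identifier attribute: {id_attribute}")
    return (
        "<div class='altmetric-embed' data-badge-type='donut' "
        f"{id_attribute}='{escape(identifier, quote=True)}'></div>"
    )


def collate_page(papers: list) -> str:
    """Build one HTML test document holding a donut for each (attribute, id) pair."""
    badges = "\n".join(donut_badge(attr, ident) for attr, ident in papers)
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f"<script type='text/javascript' src='{EMBED_SCRIPT}'></script>\n"
        "</head><body>\n" + badges + "\n</body></html>"
    )


# Placeholder identifiers; substitute the real DOI, arXiv ID, or handle.
page = collate_page([("data-doi", "10.0000/placeholder"), ("data-arxiv-id", "0000.00000")])
print(page)
```

Saving `page` to an .html file and opening it in a browser should show one donut per entry, provided the identifiers are real and tracked.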

In context, the donut can provide useful information about how a given paper is diffusing through the academic internet. In the case of the Stem Cells and Development paper (see code), the paper has an Altmetric score of 9. While the journal website does not have Altmetric or download data, it does provide a DOI and select forward citations.

Examples of the Altmetric database entry (top) and the Journal website (bottom) for the Stem Cells and Development paper.

Similar data exist for a follow-up paper to the Stem Cells and Development paper: in this case, a preprint involving a specialized quantitative analysis (based on Signal Detection Theory) of the same data. For this paper, we have an arXiv identifier, which provides us with a donut and with statistics on the paper's relative popularity, compared against documents of similar age and against other similar documents in the Altmetric database.

A typical arXiv article page, in this case for an arXiv preprint related to the Stem Cells and Development paper.

This arXiv preprint comes with code for the analysis, which is posted to GitHub.

For this particular paper, there is an associated GitHub repository. Even for preprint repositories with Altmetric and readership data (such as bioRxiv), the integration of GitHub materials is rather poor, particularly when it comes to generating an Altmetric score. Alternatively, there is an opportunity for GitHub itself to fill this gap: user statistics linked back to the original paper would be much appreciated.

Altmetrics for the same arXiv preprint. We can access data on the sources of the Altmetric score, as well as the attention score in the context of all other tracked documents in the Altmetric database.
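Placing a score "in the context of all other tracked documents" amounts to a ranking. The post does not describe how Altmetric computes this context, so the sketch below uses a plain percentile rank (the share of documents scoring strictly lower), once against everything and once against a same-year cohort as a stand-in for "documents of similar age". All scores and years here are hypothetical.

```python
from bisect import bisect_left


def percentile_rank(score: float, comparison_scores: list) -> float:
    """Percentage of comparison documents scoring strictly below `score`."""
    if not comparison_scores:
        raise ValueError("need at least one comparison score")
    ordered = sorted(comparison_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def cohort_percentile(score: float, year: int, tracked: list) -> float:
    """Rank only against documents from the same publication year.

    `tracked` holds hypothetical (publication_year, score) pairs.
    """
    peers = [s for y, s in tracked if y == year]
    return percentile_rank(score, peers)


tracked = [(2013, 0), (2013, 1), (2013, 1), (2013, 2), (2013, 5),
           (2013, 9), (2013, 12), (2013, 30), (2015, 40), (2015, 55)]
print(percentile_rank(9, [s for _, s in tracked]))  # 50.0
print(cohort_percentile(9, 2013, tracked))          # 62.5
```

The same score ranks higher within its own cohort than against all documents, which is why age-matched comparisons can tell a different story than the raw ranking.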

We can also integrate readership data across sources to build a picture of how our academic work is being shared, consumed, and diffused. In this example, I will show how data from a blog analytics engine and Altmetric data can be combined. Research blogs are an up-and-coming area for Altmetric statistics capture. I have taken two blogrolls (Carnival of Evolution #46 and Carnival of Evolution #70), for which citable versions were posted to Figshare immediately after going live. My blogging platform (Blogger) has readership stats but no Altmetrics, while Figshare has Altmetrics and readership stats for the Figshare version only.

Altmetric data for two blogrolls cross-posted to Figshare, which provides both a DOI and an Altmetric donut. There is also view and download information for the Figshare version, which may or may not include people viewing such content on the blog site.

Let's look at the Figshare data first. Carnival #46 has an Altmetric score of 10 with 188 views and 58 downloads. By contrast, Carnival #70 has an Altmetric score of 6 with 331 views and 82 downloads. Clearly, there is some variation in direct engagement between the two datasets, and it is not proportional to the score: the carnival with the lower score received more views and more downloads.
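To make the comparison concrete, the Figshare figures quoted above can be turned into two simple ratios: downloads per view (how often a visit turns into a download) and views per Altmetric point. These ratios are an illustrative choice, not a standard Altmetric or Figshare measure.

```python
from dataclasses import dataclass


@dataclass
class FigshareStats:
    name: str
    altmetric_score: int
    views: int
    downloads: int

    @property
    def downloads_per_view(self) -> float:
        return self.downloads / self.views

    @property
    def views_per_score_point(self) -> float:
        return self.views / self.altmetric_score


# Figures quoted in the post for the Figshare versions.
carnivals = [
    FigshareStats("Carnival #46", 10, 188, 58),
    FigshareStats("Carnival #70", 6, 331, 82),
]

for c in carnivals:
    print(f"{c.name}: {c.downloads_per_view:.2f} downloads per view, "
          f"{c.views_per_score_point:.1f} views per score point")
# Carnival #46: 0.31 downloads per view, 18.8 views per score point
# Carnival #70: 0.25 downloads per view, 55.2 views per score point
```

Carnival #46 converts a somewhat larger share of views into downloads, while Carnival #70 draws roughly three times as many views per score point.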

Readership statistics for Carnival of Evolution #46 (top) and Carnival of Evolution #70 (bottom). Blog analytics only provides the number of "reads" on the home site since publication.

There is also little relationship between the number of Blogger reads and the Altmetric score (as the Altmetric score does not directly capture this number). Carnival #46 has 7928 reads over roughly 4 years and 7 months. Carnival #70 has 1602 reads over roughly 2 years and 7 months. 
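Because the two posts have been online for different lengths of time, raw read counts are hard to compare directly. A simple adjustment is to divide reads by months online, using the approximate durations given above. This assumes that reads accrue evenly over time, which is unlikely for blog posts, so the rates are only rough.

```python
def months_elapsed(years: int, months: int) -> int:
    return 12 * years + months


def reads_per_month(reads: int, years: int, months: int) -> float:
    """Average Blogger reads per month online (assumes an even accrual rate)."""
    return reads / months_elapsed(years, months)


# Reads, approximate time online, and Altmetric scores quoted in the post.
blogger = {
    "Carnival #46": (7928, 4, 7, 10),
    "Carnival #70": (1602, 2, 7, 6),
}

for name, (reads, years, months, score) in blogger.items():
    print(f"{name}: {reads_per_month(reads, years, months):.1f} reads/month "
          f"over {months_elapsed(years, months)} months (Altmetric score {score})")
# Carnival #46: 144.1 reads/month over 55 months (Altmetric score 10)
# Carnival #70: 51.7 reads/month over 31 months (Altmetric score 6)
```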

Even in cases where no Altmetric donut can be generated (such as for book chapters), there are still ways to evaluate an article's reach. In the case of Academia.edu, a new feature has been added that allows people to leave a comment when they interact with a document. This is a more qualitative assessment of engagement, but it also gives authors an idea of whether or not "reads" or "views" translate into more than just a passing glance.

Two consumers of a book chapter took time to express their gratitude. Other reasons can be quite interesting as well, particularly when they have to do with educational purposes.

I hope you have enjoyed this exercise. It is not meant to be an exhaustive discussion of the Altmetric evaluation system, nor is it the limit of what can be done with Altmetrics and other tools for tracking your work. While there is clearly more technical work to be done on this front, tools such as the Altmetric APIs are already available. The biggest challenge is building a social economy based on a variety of research outputs. The field is moving quite rapidly, so what I have shown here is likely to be just the beginning.
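As a starting point for scripting this kind of survey instead of building an HTML test page, here is a minimal sketch of looking up a score programmatically. It assumes Altmetric's public v1 details endpoints (`/v1/doi/<doi>` and `/v1/arxiv/<id>`), a JSON response with a `score` key, and an HTTP 404 when nothing is tracked. All three are assumptions to check against the current API documentation and terms of use. Only the standard library is used.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: Altmetric's public v1 details endpoints, e.g. /v1/doi/<doi> and
# /v1/arxiv/<id>. Check the current API documentation and terms before use.
API_BASE = "https://api.altmetric.com/v1"
SUPPORTED_ID_TYPES = {"doi", "arxiv"}


def details_url(id_type: str, identifier: str) -> str:
    """Build the lookup URL for one research output."""
    if id_type not in SUPPORTED_ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    return f"{API_BASE}/{id_type}/{urllib.parse.quote(identifier)}"


def extract_score(payload: dict) -> Optional[float]:
    """Return the attention score from a decoded response (assumed key: 'score')."""
    score = payload.get("score")
    return float(score) if score is not None else None


def fetch_score(id_type: str, identifier: str, timeout: float = 10.0) -> Optional[float]:
    """Query the API; treat HTTP 404 as 'no attention data tracked' (assumption)."""
    try:
        with urllib.request.urlopen(details_url(id_type, identifier), timeout=timeout) as resp:
            return extract_score(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

For example, `fetch_score("arxiv", "1607.05606")` would query the arXiv preprint cited elsewhere on this blog; the result depends on live data, and a `None` should be read as "nothing tracked", not as a score of zero.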

October 20, 2016

OpenWorm Blog: Announcing the OpenWorm Open House 2016

This content is cross-posted from the OpenWorm blog and will be updated periodically.

Hello Everybody!

We want to announce our first Open House for 2016, which will take place on October 25th from 10:30am to 4pm EDT (UTC-4) (check here for your timezone), so mark the date on your calendars! The event will be live streamed at this link.
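If you would rather work out your local time yourself, the announced times are given at a fixed UTC-4 offset, so converting them is simple offset arithmetic. This is a minimal sketch using fixed offsets only; for a named time zone with daylight-saving rules, Python's `zoneinfo` module (3.9+) would be the more robust choice. The UTC+5:30 example is purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# The announcement gives times at a fixed UTC-4 offset.
EVENT_TZ = timezone(timedelta(hours=-4))
start = datetime(2016, 10, 25, 10, 30, tzinfo=EVENT_TZ)
end = datetime(2016, 10, 25, 16, 0, tzinfo=EVENT_TZ)


def in_offset(moment: datetime, hours: float) -> datetime:
    """Express `moment` at a fixed UTC offset given in hours (e.g. 5.5 for UTC+5:30)."""
    return moment.astimezone(timezone(timedelta(hours=hours)))


print(start.astimezone(timezone.utc))  # 2016-10-25 14:30:00+00:00
print(in_offset(end, 5.5))             # 2016-10-26 01:30:00+05:30
```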

If you were waiting for an opportunity to look at the recent progress we’ve made across all the projects, this is your chance. During the event, many contributors will present a number of flash talks and various demos, so if you are interested in hearing the latest about PyOpenWorm, c302, Sibernetic, Geppetto, Analysis toolbox or anything else happening under our roof, don’t miss this opportunity!

Here is the schedule of events.

Streamed Online:

10:30 - 11:00: Welcome (Stephen Larson)

Flash talks
          11:00 - 11:05: Recent progress in OpenWorm (Stephen Larson)
          11:10 - 11:15: C. elegans nervous system simulation (Padraig Gleeson)
          11:20 - 11:25: C. elegans body simulation (Andrey Palyanov)
          11:30 - 11:35: OpenWorm Badge System (Chee-Wai Lee)
          11:40 - 11:45: DevoWorm Overview (Bradly Alicea)
          11:50 - 11:55: Neuroinformatics (Rick Gerkin)
          12:00 - 12:05: Geppetto (Matteo Cantarelli)
          12:10 - 12:15: Movement Validation (Michael Currie)
          12:20 - 12:25: WormSim (Giovanni Idili)

On social media channels
          12:30 - 1:30: Social media interactions & breakout signup

Streamed online (links to be added)
          1:30 - 3:00: Multiple track breakout sessions
                    Morphozoic Tutorial - Tom Portegys

On social media channels
          3:00 - 3:30: Wrap up & Social Media Networking

Oh and bring along your nerdy friends, the more the merrier!

Hope to see you there!

The OpenWorm team

September 23, 2016

Learning by Doing, Where Doing is Earning Badges

As part of the OpenWorm Foundation community committee (see previous post), we have been trying to find ways of engaging potential contributors within the context of the various projects. One type of activity is the Badge, a bite-sized [1] learning opportunity that we plan to use both as a certification of competency and as a concrete goal for the various projects. Badges are an emerging method in Educational Technology [2], and the OpenWorm Badge System is being spearheaded by Chee-Wai Lee. More details will be shared with the community by Chee-Wai in the form of a tutorial at the upcoming OpenWorm Open House.

An example of how semantic data on phenotypes can be extracted from the scientific literature. PICTURE: Tagxedo.com, BLOGPOST: Phenoscape blog

Each badge is designed to impart a specific skill. The OpenWorm badge system currently covers scientific topics (Muscle Model Builder, Hodgkin-Huxley) and research skills (Literature Mining). My contribution is the Literature Mining (LM) series. Literature mining is a technique used to organize the scientific literature, extract useful metadata (e.g. semantic data) from these sources, and identify secondary datasets for re-analysis [3]. Learning skills in Literature Mining will be useful to a wide range of badge earners, particularly those interested in Bioinformatics and Open Science research. These are skills used extensively in the DevoWorm project, and we are planning more badges on related topical areas in the future.
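As a taste of what "extracting metadata" can mean in practice, here is a minimal sketch of dictionary-based term extraction: count how often terms from a controlled vocabulary appear across abstracts, and flag which abstracts mention a given term (a first step toward finding candidate secondary datasets). The vocabulary and abstracts are hypothetical, and this is only the simplest form of literature mining, not the procedure taught in the LM badges.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary; a real project would draw on an ontology.
VOCABULARY = {"embryo", "lineage", "differentiation", "phenotype", "neuron"}


def tokenize(text: str) -> list:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(abstracts: list, vocabulary: set) -> Counter:
    """Count vocabulary terms across a collection of abstracts."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(tok for tok in tokenize(abstract) if tok in vocabulary)
    return counts


def documents_mentioning(abstracts: list, term: str) -> list:
    """Indices of abstracts that contain `term`."""
    return [i for i, a in enumerate(abstracts) if term in tokenize(a)]


abstracts = [
    "The embryo lineage was traced to the 350-cell stage.",
    "A neuron phenotype screen with lineage tracing.",
]
print(term_counts(abstracts, VOCABULARY))
print(documents_mentioning(abstracts, "lineage"))  # [0, 1]
```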

The first LM badge is focused on working with the scientific literature, while the second (LMII) badge introduces learners to open-access secondary datasets. The only prerequisite is that you must earn Badge I in order to earn Badge II. Both of these badges recently went live, and you may start working on them immediately.

Example of the badge curriculum for LMI. The badgelist system requires learners to complete each step one at a time, and then request feedback (if applicable) from the Admin (i.e. the instructor).
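The two rules described here, that steps are completed one at a time in order and that Badge I must be earned before Badge II, can be modelled as a small state machine. This is a hypothetical sketch of the rules only, not how the badgelist platform is implemented. Step names are placeholders; the count of seven per badge follows footnote [1].

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Badge:
    name: str
    requirements: List[str]
    prerequisite: Optional["Badge"] = None
    completed: List[str] = field(default_factory=list)

    def earned(self) -> bool:
        return len(self.completed) == len(self.requirements)

    def complete(self, step: str) -> None:
        """Record one step, enforcing the prerequisite badge and step order."""
        if self.prerequisite is not None and not self.prerequisite.earned():
            raise PermissionError(f"earn {self.prerequisite.name} before {self.name}")
        if self.earned():
            raise ValueError(f"{self.name} is already earned")
        expected = self.requirements[len(self.completed)]
        if step != expected:
            raise ValueError(f"next step is {expected!r}, not {step!r}")
        self.completed.append(step)


# Hypothetical step names; the real badges have seven requirements apiece.
lm1 = Badge("LMI", [f"LMI step {i}" for i in range(1, 8)])
lm2 = Badge("LMII", [f"LMII step {i}" for i in range(1, 8)], prerequisite=lm1)
```

Calling `lm2.complete("LMII step 1")` before all seven LMI steps are recorded raises `PermissionError`; once `lm1.earned()` is true, LMII steps are accepted in order.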

[1] Why not "byte-sized", you say? Well, the Literature Mining badges are almost byte-sized (seven requirements apiece), so you could say that we are headed in that direction!

[2] Ferdig, R. and Pytash, K. (2014).  There's a badge for that. Tech and Learning, February 26.

[3] For examples of how Literature Mining can be useful, please see the Nature site for news on literature mining research.

September 6, 2016

Now Announcing the OpenWorm Open House

OpenWorm Browser. Courtesy Christian Grove, WormBase and Caltech.

About two years ago, I announced the start of the DevoWorm project to the OpenWorm community. Now both OpenWorm and DevoWorm have grown up a bit, with the former (OpenWorm) now being a Foundation and the latter (DevoWorm) resulting in multiple publications. Now we will be celebrating all of the projects that make up the OpenWorm Foundation in an Open House format, taking place in cyberspace and tentatively scheduled for October.

Image courtesy Matteo Farinella: http://matteofarinella.com/Open-Worm. These posters are the outcome of an OpenWorm Kickstarter campaign several years ago.

The details of the schedule are still being worked out, but the format is to include both short, 5-minute talks (Ignite-style) and longer tutorials (45-60 minutes, plus questions). The short talks will highlight the various ongoing projects within OpenWorm, while the tutorials will focus on specific methods or procedures employed by the projects. If you happen to be a project leader or major contributor, I have probably already asked you for content. Interested in either contributing content or attending? Please let me know.

Dr. Stephen Larson (pre-PhD), discussing the connection between Lt. Data and C. elegans at Ignite San Diego.

I have also been involved in committee work for the OpenWorm foundation. One of the initiatives we are in the process of establishing is the OpenWorm badge system, which is being spearheaded by Dr. Chee-Wai Lee. Currently trendy in the online learning world, this is an experiment in open learning that provides micro-credentials to a global community. Badges are a great way to learn new skills, as well as a means to motivate people's contributions to different projects within OpenWorm. Currently, OpenWorm is offering tutorials on the Hodgkin-Huxley model, the Muscle Model builder, and the Muscle Model explorer. If there are any tutorials you would like to see us offer, or if you think there is a need for a particular skill to be highlighted, please let me know.

August 19, 2016

From Toy Models to Quantifying Mosaic Development

Time travel in the Terminator metaverse. COURTESY: Michael Talley.

Almost two years ago, Richard Gordon and I published a paper in the journal Biosystems called "Toy Models for Macroevolutionary Patterns and Trends" [1]. Now, almost exactly two years later [2], we have published a second paper (not quite a follow-up) called "Quantifying Mosaic Development: towards an evo-devo postmodern synthesis of the evolution of development via differentiation trees of embryos". While the title is quite long, the approach is best described as computational/statistical evolution of development (evo-devo).

Sketch of a generic differentiation tree, which figures prominently in our theoretical synthesis and analysis. COURTESY: Dr. Richard Gordon.

This paper is part of a special issue in the journal Biology called "Beyond the Modern Evolutionary Synthesis - what have we missed?" and a product of the DevoWorm project. The paper itself is a hybrid theoretical synthesis/research report, and introduces a variety of comparative statistical and computational techniques [3] that are used to analyze quantitative spatial and temporal datasets representing early embryogenesis. Part of this approach was previewed in our most recent public lecture to the OpenWorm Foundation.

The comparative data analysis involves investigations within and between two species from different parts of the tree of life: Caenorhabditis elegans (Nematode, invertebrate) and Ciona intestinalis (Tunicate, chordate). The main comparison involves different instances of early mosaic development, or a developmental process that is deterministic with respect to cellular fate. We also reference data from the regulative developing Axolotl (Amphibian, vertebrate) in one of the analyses. All of the analyses involve the reuse and analysis of secondary data, which is becoming an important part of the scientific process for many research groups.

One of the techniques featured in the paper is an information-theoretic technique called information isometry [4]. This method was developed within the DevoWorm group, and uses a mathematical representation called an isometric graph to visualize cell lineages organized in different ways (e.g. a lineage tree vs. a differentiation tree). This method is summarized and validated in our paper "Information Isometry Technique Reveals Organizational Features in Developmental Cell Lineages" [4]. Briefly, each level of the cell lineage is represented as an isoline, which contains points of a specific Hamming distance. Here, the Hamming distance measures how far the position of a particular cell differs between two alternative cell lineage orderings (the aforementioned lineage and differentiation trees).
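To illustrate the Hamming-distance idea, here is a minimal sketch under explicit assumptions that are not taken from [4]. We assume each cell is labelled in each tree by a binary path code from the root (0 = first daughter, 1 = second daughter), that both codes for a cell have the same length, and that cells are grouped by lineage level (code length), one group per isoline. The encoding in the actual paper may differ, and the codes below are made up for a four-cell toy example.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def isolines(lineage_codes: dict, differentiation_codes: dict) -> dict:
    """Group (cell, distance) pairs by lineage level, taken as the code length.

    Each code is a hypothetical binary path from the root (0 = first daughter,
    1 = second daughter), one per tree for the same cell.
    """
    levels = defaultdict(list)
    for cell, code in lineage_codes.items():
        distance = hamming(code, differentiation_codes[cell])
        levels[len(code)].append((cell, distance))
    return dict(levels)


# Toy four-cell example with made-up codes.
lineage = {"AB": "0", "P1": "1", "ABa": "00", "ABp": "01"}
differentiation = {"AB": "1", "P1": "0", "ABa": "11", "ABp": "10"}
print(isolines(lineage, differentiation))
# {1: [('AB', 1), ('P1', 1)], 2: [('ABa', 2), ('ABp', 2)]}
```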

An example of an isometric graph from Caenorhabditis elegans, taken from Figure 12 in [5]. The position of a point representing a cell is based on the depth of its node in the cell lineage. The positions of all points are rotated 45 degrees clockwise from a bottom-to-top differentiation tree (in this case) ordering, where the one-cell stage is at the bottom of the graph.
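The 45-degree clockwise rotation mentioned in the caption is a standard planar rotation. The sketch below assumes a simple coordinate scheme that the caption does not spell out: x is a cell's offset from the centre of its level, and y is its depth, with the one-cell stage at depth 0 at the bottom. It illustrates the rotation step only, not the full construction of an isometric graph.

```python
import math


def rotate_clockwise(x: float, y: float, degrees: float = 45.0) -> tuple:
    """Rotate the point (x, y) clockwise about the origin."""
    t = math.radians(degrees)
    return (x * math.cos(t) + y * math.sin(t), -x * math.sin(t) + y * math.cos(t))


def layout(levels: list) -> dict:
    """Place cells level by level, one-cell stage (depth 0) at the bottom.

    x is the cell's offset from the centre of its level; y is its depth.
    """
    points = {}
    for depth, cells in enumerate(levels):
        centre = (len(cells) - 1) / 2
        for i, cell in enumerate(cells):
            points[cell] = rotate_clockwise(i - centre, float(depth))
    return points


pts = layout([["P0"], ["AB", "P1"]])
print({cell: (round(x, 3), round(y, 3)) for cell, (x, y) in pts.items()})
```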

A final word on the new Biology paper as it relates to the use of references. Recently, I ran across a paper called "The Memory of Science: Inflation, Myopia, and the Knowledge Network" [6], which introduced me to the statistical definition of citation age. This inspired me to calculate the citation age of all journal references from three papers: Toy Models, Quantifying Mosaic Development, and a Nature Reviews Neuroscience paper from Bohil, Alicea (me), and Biocca, published in 2011. The review was used as an analytical control: as a review, it should contain papers that are older than the contemporary literature. Here are the age distributions for all three papers.
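In its simplest form, a reference's citation age is the citing paper's publication year minus the reference's publication year, and a distribution like the ones shown can be built by binning those ages. The sketch below makes that assumption and uses 5-year bins; both the bin width and the reference years are hypothetical choices, not the data behind the plots.

```python
from collections import Counter


def citation_ages(citing_year: int, reference_years: list) -> list:
    """Age of each reference = citing paper's year minus the reference's year."""
    ages = [citing_year - year for year in reference_years]
    if any(age < 0 for age in ages):
        raise ValueError("a reference is dated after the citing paper")
    return ages


def age_distribution(ages: list, bin_width: int = 5) -> dict:
    """Histogram of ages in bins of `bin_width` years, keyed by each bin's lower edge."""
    bins = Counter((age // bin_width) * bin_width for age in ages)
    return dict(sorted(bins.items()))


# Hypothetical reference years for a paper published in 2016.
ages = citation_ages(2016, [2015, 2014, 2010, 1992, 1969])
print(ages)                    # [1, 2, 6, 24, 47]
print(age_distribution(ages))  # {0: 2, 5: 1, 20: 1, 45: 1}
```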

Distribution of Citation Ages from "Toy Models for Macroevolutionary Patterns and Trends" (circa 2014).

Distribution of Citation Ages from "Quantifying Mosaic Development: Towards an Evo-Devo Postmodern Synthesis of the Evolution of Development Via Differentiation Trees of Embryos" (circa 2016).

Distribution of Citation Ages from "Virtual Reality in Neuroscience Research and Therapy" (circa 2011).

What is interesting here is that both "Toy Models" and "Quantifying Mosaic Development" show a long tail with respect to age, while the review article shows very little in terms of a distributional tail. While differences in topical literatures influence the result (the VR and associated perceptual literature is not that old, after all), it seems that the recurrent academic Terminators use the literature somewhat differently than most contemporary research papers do. While respect for history is somewhat author- and topic-dependent, it does seem to add an extra dimension to the research.
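"Long tail" is a visual judgement here. Two simple ways to put a number on it are the moment coefficient of skewness (positive when the distribution stretches toward older references) and the share of references older than some threshold. Both measures and the 20-year threshold are our illustrative choices, not something computed in the post, and the ages below are hypothetical.

```python
import statistics


def skewness(values: list) -> float:
    """Fisher-Pearson moment coefficient of skewness (population form)."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def tail_share(ages: list, threshold_years: int = 20) -> float:
    """Fraction of references older than `threshold_years`."""
    return sum(age > threshold_years for age in ages) / len(ages)


ages = [1, 2, 6, 24, 47]  # hypothetical citation ages
print(round(skewness(ages), 2), tail_share(ages))
```

Comparing these two numbers across the three papers would give a rough, reproducible check on the visual impression that the review article has the shortest tail.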

[1] The Toy Models paper was part of a Biosystems special issue called "Patterns in Evolution".

[2] This is a Terminator metaverse reference, in which the Terminator comes back every ten years to cause, effect, and/or stop Judgement Day.

[3] Gittleman, J.L. and Luh, H. (1992). On Comparing Comparative Methods. Annual Review of Ecology and Systematics, 23, 383-404.

[4] Alicea, B., Portegys, T.E., and Gordon, R. (2016). Information Isometry Technique Reveals Organizational Features in Developmental Cell Lineages. bioRxiv, doi:10.1101/062539

[5] Alicea, B. and Gordon, R. (2016). Quantifying Mosaic Development: Towards an Evo-Devo Postmodern Synthesis of the Evolution of Development Via Differentiation Trees of Embryos. Biology, 5(3), 33.

[6] Pan, R.K., Petersen, A.M., Pammolli, F., and Fortunato, S. (2016). The Memory of Science: Inflation, Myopia, and the Knowledge Network. arXiv, 1607.05606.