Today I am making good on an old promise to highlight more repositories for paleontological raw data. Previous posts in this series can be found here and here.
Full Disclosure: The statements about MorphoBank in the “Nitty Gritty” section were checked for accuracy by Maureen O’Leary (MorphoBank Project Director) and Seth Kaufman (software developer). The impressions are my own: I have submitted, published, and downloaded MorphoBank data.
Impressions: MorphoBank is not just a data repository, but also a collaborative tool to produce, edit, illustrate, and annotate morphological character matrices for phylogenetic analysis. It’s not a tree-making, tree-visualizing, or tree-using site; it’s a tree data site, and it’s built to accommodate large groups who are working on the same project. In service of this goal, each character or character state can be linked to an image or video of that character state, which can then be edited or annotated. MorphoBank is cloud-based, meaning you don’t have to concatenate dozens of files from collaborators to build a final matrix; everyone can see what you coded immediately (and why, if you load images). You can use MorphoBank just to create matrices, just as a media repository, or as a combination of the two.
There are lots of cool toys that make collaboration easier – these include an image annotation menu, comment screens (facilitating discussions of identity and homology), the ability to restrict which taxa/rows each person can edit (preventing people from accidentally coding/editing the wrong thing), the ability to merge many source matrices into one combined matrix, and the ability to keep all the relevant documents in one place. You can batch upload taxa, specimens, and media, which is convenient. The how-to’s are thorough and easy to use, and so is the site itself.
The matrix editor is quite user-friendly. You can create a matrix from scratch or import one from a TNT or Nexus file (and even if you don’t generate your matrix in MorphoBank, you can still host the data files there). The comment and annotation features make it easy to discuss morphology or calls on codings with colleagues, without having to get them on the phone or in person. You can also track changes if you think someone has messed up, and assign different levels of editing ability to prevent such things from happening in the first place. I like this feature a lot, because it enables students and volunteers (and colleagues) to participate in different roles. Another cool feature is that it records and displays how many cells each team member has scored, as well as the taxa, media, and citations they added. This makes it much clearer exactly who did what work in a collaboration.
MorphoBank is one of the best, if not the best, repositories for 2D images. The allowed file and image sizes are generous, and the built-in viewer enables you to zoom in/out and label/annotate your images. As a bone histologist, this is my preferred way to share my raw data – my dissertation involved over 700 large format images, and this was how I presented them to my committee. As a supplement to papers, MorphoBank is top notch: you can present detailed images at their original resolution without worrying about page sizes or file sizes, and you can show way more images than you could get into even the most generous of journals. Having a permalink (and coming soon: DOIs) for each image means those images can be cited later, so you get credit for all your work. Those permalinks can also be given to media outlets as part of your outreach or research promotion (yay broader impacts).
Overall, I have no big issues with MorphoBank, and from personal experience I can report that all the minor speedbumps I’ve experienced were quickly resolved by their excellent support team. However, there is one feature I’d like to see added that would integrate MorphoBank with other sites better: it would be nice to be able to link specimens and publications externally. For example, links from the vouchered specimens to their museum database pages, or bibliography publications to their journal page (both are features GenBank offers).
Bottom line: Grant writers should feel comfortable listing MorphoBank in their Data Management Plan because it’s safe, easy to use, and your reviewers will (should) have heard of it; reviewers should feel comfortable asking authors to post data to MorphoBank when appropriate, for the same reasons.
Postcranial skeleton of Sebecus icaeorhinus, MPEF/PV 1776. Image © 2012 Diego Pol, licensed under CC-BY-NC-ND. MorphoBank accession number M106695, accessed here.
MORPHOBANK: THE NITTY GRITTY
What it is: Both a collaborative tool and repository for the scientific data associated with peer-reviewed scientific publications. The focus is on data related to phylogenetic tree-building and the evolution of morphological phenotypes in general. It allows research teams to work on a single shared copy of a character matrix in real time over the Web. These data matrices can be linked to images of different anatomical characters/character states, or the images can stand on their own. “Morphology” is interpreted broadly – really, any type of phenomic data is welcome.
What it is, in their words: “…a web application for conducting phylogenetics or cladistics research on morphology. It enables teams of scientists who use anatomy to study the Tree of Life (phylogeny) to work over the web – in real time – and to do research they could not easily do using desktop programs alone.”
Who runs it: The MorphoBank Project, which has an executive committee that consists of academic researchers (including one student representative). The Project Director is Maureen O’Leary (Stony Brook University), and the Executive Committee is chaired by Nancy Simmons (American Museum of Natural History).
Who funds it: Currently: NSF (direct), with in-kind support from the American Museum of Natural History and Stony Brook University (as server hosts). Previously: American Museum of Natural History, NESCENT, NOAA (NA04OAR4700191), NSF (DBI-0743309, DEB-9903964 and EAR-0622359), San Diego Supercomputer Center, Stony Brook University.
Who uses it: Researchers who use or submit data; journals allow you to cite MorphoBank data. Project/media links could be used for media promotion and outreach as well.
Nasutoceratops titusi UMNH VP 16800. Image © Mark Loewen, licensed under CC0. MorphoBank accession number M307812, accessed here.
Cost to submit: Free.
Cost to access: Free.
Data and file types supported: Data related to systematics or morphology, including text, audio, images, video, and more.
- 2D images: JPEG, GIF, PNG, TIFF, and PSD are allowed, but a file in CMYK or with layers may not render properly.
- 3D images: Three-dimensional surfaces in STL and PLY format will be supported soon.
- Video: MPEG-4 (preferred), QuickTime, and Windows Media.
- Audio: MP3, AAC, AIFF, or WAV.
- Phylogenetics: The matrix editor/viewer accepts data in Nexus or TNT format. Files in other formats are stored in the Documents folder.
- Other: No PPT or PPTX.
File sizes allowed: Doesn’t say. In the past, my files have been limited to 40MB, but by emailing tech support I was able to request an increase in maximum file size. The image viewer sometimes has difficulty processing gigapixel images, but this can be fixed by resizing or emailing tech support.
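If you are preparing a large batch of files, the format and size notes above can be turned into a quick pre-upload check. The Python below is only a minimal sketch under some assumptions: the mapping from format names to file extensions is my own guess at how those formats usually appear on disk, the 40 MB figure is simply the limit mentioned above (which tech support can raise), and the function names are hypothetical. None of this reflects how MorphoBank itself validates uploads.

```python
from pathlib import Path

# Assumed extensions for the formats named above; this mapping is a guess,
# not something taken from MorphoBank documentation.
CATEGORIES = {
    "2D image": {".jpg", ".jpeg", ".gif", ".png", ".tif", ".tiff", ".psd"},
    "video": {".mp4", ".mov", ".wmv"},
    "audio": {".mp3", ".aac", ".aif", ".aiff", ".wav"},
    "matrix": {".nex", ".nxs", ".tnt"},
}
NOT_ACCEPTED = {".ppt", ".pptx"}

# The limit mentioned above; treat it as an adjustable assumption.
SIZE_LIMIT_MB = 40


def classify(filename):
    """Return the likely category for a file, based only on its extension."""
    ext = Path(filename).suffix.lower()
    if ext in NOT_ACCEPTED:
        return "not accepted"
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    # Anything else would presumably end up in the Documents folder.
    return "document"


def needs_size_increase(size_bytes, limit_mb=SIZE_LIMIT_MB):
    """True if a file is larger than the assumed limit."""
    return size_bytes > limit_mb * 1024 * 1024
```

For example, `classify("femur_section.TIF")` returns `"2D image"` and `classify("talk.pptx")` returns `"not accepted"`. Anything flagged by `needs_size_increase` is a candidate for resizing or for an email to tech support.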
Copyright status: Settings are currently available for Media (images, video, and the like). Your choice: CC0, CC BY, CC BY-NC, CC BY-NC-SA, CC BY-SA, CC BY-ND, CC BY-NC-ND. You can also post copyrighted media released for one-time use. Cool feature: option to upload your copyright permissions document.
Data available during peer review? Yes, if you set it up. Password protected. This is not streamlined into the journal submission process, and from personal experience, journal editors don’t pass on this info if it’s in the cover letter only. I make sure to include the login information in the body of the manuscript, so the reviewers can see it.
Allowed to post data from previous pubs? Yes, even publications published before MorphoBank existed, and they invite and encourage this practice. Example: http://morphobank.org/permalink/?P694
Skeleton of Vulpavus ovatus (AMNH 11498). Image © American Museum of Natural History, licensed under CC BY-NC-SA. MorphoBank accession number M150920, accessed here.
Accession numbers provided? Yes. Every project gets a unique project number and stable URL (permalink), and each image gets its own accession number (similar to GenBank), assigned as you upload them. You can also assign media to folios (subsets of media), and these also get unique stable URLs. DOIs for Projects, Media and Matrices will be available before the end of April 2014 and this will be retroactive (!!!).
Data goes live when: You choose to publish the project. Data can stay as an unpublished project forever, or you can publish it when the manuscript is accepted, when the embargo is lifted, when the paper is published, or any time after. You can choose to publish all or some of your files when the project is published.
Data is backed up? Yes, to tape at Stony Brook University and off-site mirror servers at the American Museum of Natural History.
Stats provided? Project views, project downloads, media views, media downloads, document downloads, data on team member efforts.
How to cite your data in your manuscript? Varies by journal, but some thoughts:
- Definitely cite the MorphoBank publication software. Currently, this is: O’Leary MA and SG Kaufman. 2012. MorphoBank 3.0: Web application for morphological phylogenetics and taxonomy. http://www.morphobank.org.
- You should additionally cite the peer-reviewed publication: O’Leary MA and SG Kaufman. 2011. MorphoBank: Phylophenomics in the “cloud”. Cladistics 27: 1-9.
- You should also cite your data. I recommend citing within text, something like:
Image data available on MorphoBank: http://morphobank.org/permalink/?P494 (but where ‘494’ is replaced with your own project number). I’ve also included lists of image accession numbers as tables (for larger projects, I think this is more appropriate for the SI).
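If you are citing several projects, the permalink pattern shown above is easy to generate rather than type by hand. This is a minimal sketch that assumes the URL pattern stays exactly as in the example (`http://morphobank.org/permalink/?P` followed by the project number); the function names are hypothetical, and the statement wording is just the suggestion above, not a journal requirement.

```python
def project_permalink(project_number):
    """Build a project permalink following the pattern shown above."""
    return f"http://morphobank.org/permalink/?P{int(project_number)}"


def data_statement(project_number):
    """Format an in-text data availability sentence for a manuscript."""
    return f"Image data available on MorphoBank: {project_permalink(project_number)}"
```

Calling `data_statement(494)` gives `Image data available on MorphoBank: http://morphobank.org/permalink/?P494`.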
How to cite data you download? Cite the permalink (or the DOI, once that feature goes live) and project number, and if you refer to a particular image, its accession number.
Can update after publication? No. Once the project is published, it cannot be modified.
Eryon (Houston Museum of Natural Science). Image by Daderot, licensed under CC0 1.0. MorphoBank accession number M326412, accessed here.
Exciting Future Developments: The MorphoBank group is about to release a version of the Matrix Viewer that will allow you to view published matrices on iPad and Android tablets (the previous, Flash-based viewer is being replaced with a new, HTML5-based viewer). In the future, they plan to do the same for the Matrix Editor as well. MorphoBank also talks to a new NSF-supported site called the Evolution Project (now in beta testing). The Evolution Project allows people who have matrices in MorphoBank to crowdsource their data collection. Interested people (e.g., students, volunteers) can score cells from images (!!!). It is designed to speed up morphological data collection. Also coming soon: the ability to view CT images within the MorphoBank environment, and support for continuous characters.
Benefits in a nutshell:
- Secure cloud storage of image and character matrix data.
- Real-time collaborative editing of phylogenetic matrices and their associated data in the cloud.
- Images illustrate exactly what you mean by a given character or character state. As the MB site says, “Seeing the images that document the basis for homology – a character state or a cell score – is enormously helpful to researchers during their research project.”
- Nigh-infinite choices for: number of images, copyright, image size and resolution.
- Batch uploads and batch edits to metadata allowed.
- The ability to label and otherwise annotate images (without altering the original image).
- The ability to zoom in when viewing large or high-resolution images.
Three recent paleo papers using it:
Evans SE, JR Groenke, MEH Jones, AH Turner, DW Krause. 2014. New material of Beelzebufo, a hyperossified frog (Amphibia: Anura) from the Late Cretaceous of Madagascar. PLoS ONE 9(1): e87236. MorphoBank data here.
O’Leary MA, et al. 2013. The placental mammal ancestor and the post K-Pg radiation of placentals. Science 339:662-667. MorphoBank data here.
Nesbitt SJ, PM Barrett, S Werning, CA Sidor, AJ Charig. 2013. The oldest dinosaur? A Middle Triassic dinosauriform from Tanzania. Biology Letters 9:5pp. MorphoBank data here.