Monday, February 18, 2019

The value of non-profit: Dryad and California Digital Library

Workshop: Accelerating Data Publication: new models for research institutions. Melbourne 8 Feb 19.

I attended a workshop the other day where John Chodacki, Director of the UC Curation Center at California Digital Library, pitched an idea for a data repository run by the non-profit organisation Dryad. Dryad accepts researchers' data for (as John mentions) an average processing fee of around $120 USD, which covers the cost of curation (checking the data files and data description in every submission) and of holding the data long-term to support a related publication. Here are the slides and Etherpad. Per Dryad's website, the charge per publication is calculated as a $120 USD processing fee plus a storage fee if a researcher submits over 20GB of data ($50 per 10GB). Most datasets come under the data cap, John says. This fee is waived, upon request, for researchers from lower-middle income countries (per World Bank definitions).
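To make the fee arithmetic concrete, here is a rough sketch in Python of the per-publication charge as described above. The function and constant names are my own, and two details are assumptions rather than anything stated by Dryad: that the $50 applies to each started 10GB block beyond the 20GB cap, and that a granted waiver covers the whole charge.

```python
import math

# Figures quoted in this post; all names here are illustrative only.
BASE_FEE_USD = 120       # processing fee per publication
FREE_STORAGE_GB = 20     # data cap covered by the processing fee
OVERAGE_FEE_USD = 50     # storage fee per block over the cap
OVERAGE_BLOCK_GB = 10    # size of one overage block


def dryad_charge_usd(size_gb: float, fee_waived: bool = False) -> int:
    """Estimate the charge for one data publication, in USD.

    Assumption: any started 10GB block over the cap is billed in full.
    Assumption: a granted waiver reduces the whole charge to zero.
    """
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    if fee_waived:
        return 0
    excess_gb = max(0.0, size_gb - FREE_STORAGE_GB)
    blocks = math.ceil(excess_gb / OVERAGE_BLOCK_GB)
    return BASE_FEE_USD + blocks * OVERAGE_FEE_USD


print(dryad_charge_usd(5))    # under the cap: 120
print(dryad_charge_usd(45))   # 25GB over the cap, three blocks: 270
```

Under these assumptions a typical dataset under the cap costs the flat $120, which matches John's comment that most datasets never attract a storage fee.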

John had another idea for how to use Dryad for his University of California (UC) researchers. UC had been considering building their own repository from scratch, but thought their funds might be better spent building on top of an existing platform where researchers were already depositing their data. Apparently 90,000 researchers (Slide 55) have already deposited data at Dryad, and Dryad provides data deposit for over 900 academic journals (Slide 55).

John's idea was to use Dryad as an institutional service, where the University could pay an annual fee and their researchers would then no longer need to pay a per-article data deposit fee. Talks with Dryad began, and they agreed to seek the support of other institutions to promote the new service. Fees were agreed on a sliding scale, where a big institution would pay more (about $13,000 USD per year) and a smaller institution would pay less (about $3,000 USD per year).
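One way to think about these numbers is as a break-even count: how many deposits per year would an institution's researchers need to make before the annual fee costs no more than paying per deposit? The sketch below is my own back-of-envelope calculation, not something presented at the workshop. It assumes every deposit would otherwise cost the flat $120 USD processing fee with no storage fees or waivers, and the function name is hypothetical.

```python
import math

PER_DEPOSIT_FEE_USD = 120  # flat processing fee quoted earlier in this post


def break_even_deposits(annual_fee_usd: float) -> int:
    """Smallest number of deposits per year at which the annual
    institutional fee is no more than the per-deposit fees it replaces.

    Assumption: each deposit would cost exactly PER_DEPOSIT_FEE_USD.
    """
    return math.ceil(annual_fee_usd / PER_DEPOSIT_FEE_USD)


print(break_even_deposits(13_000))  # big institution: 109
print(break_even_deposits(3_000))   # smaller institution: 25
```

This only compares direct fees. A real business case would also need to count things like staff time and the cost of running or retiring an existing repository.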

At the workshop I attended, there were representatives from a number of Australian universities and research organisations: RMIT, Deakin, Victoria University, CSIRO, University of Sydney, University of New South Wales, Queensland University of Technology, AARNet, Monash University, plus myself and my colleagues from ARDC, the Australian Research Data Commons. In a final quiz item, the attendees said they were interested in the new service, but that they were not the ones to sign off on that type of purchase. It raised the question of what the next steps were. How could Australian institutions contribute to interest in this new idea for a repository? This blog post was one of those next steps.

There were a number of obvious benefits:
- paying a non-profit community run organisation, rather than a commercial provider, so it should be a little cheaper
- community can drive what the system looks like, and the functions can be responsive to community needs
- using a system that already has a lot of researcher users (over 60,000), and journal integration.
- using a trusted system (certified with CoreTrustSeal recognised certification).

But there were some obvious challenges:
- if we were to use this system, how would we transition from our current system to the new one?
- would there be local support in Australia to help users, and help Institutions make a business case, and help researchers use the Dryad system?

There were a number of questions raised:
- where would the data be stored? In the US. But a copy could be kept locally.
- what about sensitive data, which has to be stored in Australia (or an equivalently privacy-protected jurisdiction, i.e. not the US, which has weaker privacy rules)? Dryad doesn't accept sensitive data, so where the data is stored should not be an issue.
- what size data can be sent to Dryad? As a file, up to 10GB; from a URL, up to 100GB.
- can we use specific data description types (vocabularies)? Should be fine, John said.
- can we get descriptions of data belonging to our Institutions, back into institutional systems? Contact Dryad for discussion.
- have CDL/Dryad talked to the Australian infrastructure provider QCIF, who manage the RedBox repository, or to Figshare, about how to create a win-win amongst the various repository providers? No, but there is enough room in the ecosystem to allow different repositories to service different users.
- were there any plans to have people in Australia to promote the services and educate institutions and users about the benefits of the system, like Figshare's sales team? Not at this stage.

Later that same day, I was at another event and spoke to Natalie Meyers, University of Notre Dame, who is an ambassador for the Center for Open Science and interested in having more integrations with the Open Science Framework (OSF). She wondered whether engaging a larger community to join a Dryad institutional service might create an incentive to enhance the Dryad API, making it more consumable and its endpoints more interoperable with other open science platforms like OSF. John, in response, says the new software development work at CDL/Dryad will provide exactly this type of opportunity by leveraging CDL's DaSH API (see API docs and github repo).

So, it is obvious that the community needs to keep talking about what to do next, what questions are still left unanswered, and how to sell the idea to the money people at your institution. That will require, at least in the beginning, glossy flyers, and ultimately a business case to assess the costs and benefits. See also the Dryad blogpost (May '18) announcing the partnership with CDL (follow up Oct '18 post).

If you have any questions, please post them in the comments section below and I will send them along to John (john.chodacki - at- ucop.edu) to answer.

NB: Where I work at ARDC, we have a policy of vendor neutrality, so this post should not be read as an assessment of the value of the CDL/Dryad service, but does seek to point out the many issues to be wrestled with in making a value assessment.
Thanks to Natalie and John for their feedback on this post.