Designing Future Wheat Resources

Grassroots and data coordination

Grassroots powered: The DFW Data Portal

With data-generative approaches increasingly common in modern plant science research, it is vital that the data and metadata they produce can be shared and reused. The Grassroots Infrastructure project aims to create a reusable package of software tools that helps users and developers access scientific data through information systems that can easily be interconnected. This means institutions and groups can deploy a simple, lightweight virtual machine, expose local data, connect any existing data services, and federate their Grassroots instance with others out of the box.

Within the DFW project, we manage two Grassroots installations: one at the Earlham Institute (EI) and one at the University of Bristol as part of the CerealsDB group.

Overview

Users

The DFW project provides numerous services and tools that academics, breeders and industry users can use to access and interrogate wheat datasets hosted through our institutional partners.

DFW Data Portal

For wheat researchers, we have a data portal which hosts a variety of community wheat datasets that are openly available as part of an existing publication, or through the Toronto agreement for unpublished work. This data warehouse provides a reliable storage area for wheat data provided by the community, alongside suitable descriptions of the project, study and data files.

Data analysis services

Whilst users are able to download our data in its entirety, many of our datasets are large, complex, and highly specialised. To help users access and analyse these data, we provide a number of dedicated DFW data services.

BLAST: We have a BLAST service if you wish to search for sequences of interest across a wide range of wheat sequence databases. 

Others

Seedstor Integration:

Polymarker: 

Scaffold: If you wish to get an entire named scaffold from a FASTA file, we have a Scaffold service based upon the Samtools software.
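To illustrate the kind of lookup involved, here is a minimal Python sketch that pulls one named record out of FASTA-formatted text. It is not the Grassroots implementation: the function names and the example scaffold names are hypothetical, and the service itself relies on Samtools (whose `samtools faidx <file> <name>` command performs a comparable indexed lookup) in a way this page does not describe.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record name: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            # Sequence lines are accumulated until the next header.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def get_scaffold(lines: Iterable[str], scaffold_name: str) -> str:
    """Return the full sequence of one named scaffold (hypothetical helper)."""
    sequences = read_fasta(lines)
    if scaffold_name not in sequences:
        raise KeyError(f"No scaffold named {scaffold_name!r} in the FASTA data")
    return sequences[scaffold_name]


# Illustrative data only; these scaffold names are made up.
example = [">scaffold_1 demo record", "ACGT", "TTGA", ">scaffold_2", "GGCC"]
sequence = get_scaffold(example, "scaffold_1")
```

This sketch reads the whole input into memory, which is fine for a demonstration but is exactly what an indexed tool avoids on large wheat assemblies.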

To make life as easy as possible for end-users, Grassroots can parse the output of certain services and set up the required values for other services, so that they can be run with a single click. For example, both the Scaffold and Polymarker services can be run directly from the results of our BLAST service in this way.
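The idea of feeding one service's results into another can be sketched as follows. This is an illustration under stated assumptions, not the Grassroots mechanism: the result layout (`hits`, `scaffold`, `database`) and the request layout (`service`, `parameters`, `scaffold_name`) are hypothetical names chosen for the example.

```python
from typing import Any, Dict, List


def scaffold_requests_from_blast(blast_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one Scaffold request per distinct scaffold hit in a BLAST-style result.

    All field names here are hypothetical and chosen for illustration.
    """
    requests: List[Dict[str, Any]] = []
    seen = set()
    for hit in blast_result.get("hits", []):
        scaffold = hit.get("scaffold")
        if scaffold is None or scaffold in seen:
            # Skip hits without a scaffold name and avoid duplicate requests.
            continue
        seen.add(scaffold)
        requests.append(
            {
                "service": "Scaffold",
                "parameters": {
                    "scaffold_name": scaffold,
                    "database": blast_result.get("database"),
                },
            }
        )
    return requests


# Made-up result used purely to exercise the function.
example_result = {
    "database": "example_db",
    "hits": [
        {"scaffold": "scaffold_7", "evalue": 1e-30},
        {"scaffold": "scaffold_7", "evalue": 1e-12},
        {"scaffold": "scaffold_9", "evalue": 1e-8},
    ],
}
prepared = scaffold_requests_from_blast(example_result)
```

Each prepared request carries everything the follow-on service needs, which is what lets a front end offer it as a single click.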

Developers

For developers, the Grassroots Infrastructure communicates using a controlled vocabulary of JSON messages, so any server or client that understands JSON can access and connect to the platform. A suite of software libraries is also provided to do as much of the heavy lifting as possible.
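As a rough illustration of JSON-based messaging in general, the sketch below builds and validates a message using only Python's standard library. The field names `service` and `parameters` are assumptions made for this example and are not taken from the Grassroots vocabulary, which is defined in the Grassroots documentation.

```python
import json
from typing import Any, Dict


# Hypothetical required fields; the real vocabulary is defined by Grassroots.
REQUIRED_FIELDS = {"service", "parameters"}


def build_message(service_name: str, parameters: Dict[str, Any]) -> str:
    """Serialise a request to JSON text (illustrative layout only)."""
    return json.dumps({"service": service_name, "parameters": parameters}, sort_keys=True)


def read_message(text: str) -> Dict[str, Any]:
    """Parse JSON text and check that the assumed required fields are present."""
    message = json.loads(text)
    if not isinstance(message, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"Message is missing fields: {sorted(missing)}")
    return message


# Round trip: what one side writes, any JSON-capable client or server can read.
wire_text = build_message("BLAST", {"query": "ACGTACGT"})
decoded = read_message(wire_text)
```

Because the exchange is plain JSON text, the same round trip works from any language with a JSON parser.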

Examples

Programmatic access

Developers can use our REST API to access information, or install a Grassroots instance and develop their own custom services and tools that integrate fully with our systems. Further details are available in the Grassroots documentation.
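A minimal sketch of posting a JSON payload to an HTTP endpoint with Python's standard library is shown below. The URL is a placeholder and the payload is an opaque example: neither is taken from the Grassroots REST API, whose actual endpoints and message format are described in the Grassroots documentation.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder address, not a real Grassroots endpoint.
EXAMPLE_URL = "https://example.org/grassroots"


def make_request(url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request touches no network; call send(prepared_request)
# only once the URL points at a real server.
prepared_request = make_request(EXAMPLE_URL, {"example_key": "example_value"})
```

Separating request construction from sending keeps the payload easy to inspect and test before anything goes over the network.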

Service hosting

The Grassroots systems run within CyVerse UK, a cloud platform for UK bioscience researchers, hosted at the Earlham Institute (EI). As part of DFW, CyVerse UK can host virtual machines for data analysis and collaborative work, much like those provided by Amazon Web Services or Microsoft Azure. These machines can be preconfigured with analysis tools, frameworks, and wheat datasets to make wheat data easier to access and analyse. If you would like to use CyVerse UK for your own research, please contact us.