by Howard Hughes Medical Institute
Overview of Spyglass. Credit: bioRxiv (2024). DOI: 10.1101/2024.01.25.577295
Loren Frank's HHMI lab at UCSF has pioneered an ambitious framework for sharing vast neuroscience datasets and complicated analysis methods, a step toward tipping the culture of science toward more effective and fruitful collaboration.
Science holds the promise of tackling some of the world's greatest problems, from taming new pandemics and climate change to figuring out why brain circuitry goes awry. But science is not without its own thorny problems. In the often-cutthroat race for new discoveries and for prestigious publications and awards, research teams may work in secret for years, collecting reams of data and conducting intricate analyses that are enormously difficult to verify.
Without access to the full research environment and data, "it's almost impossible for one group to replicate the work of others," says Howard Hughes Medical Institute (HHMI) Investigator Loren Frank at the University of California, San Francisco (UCSF).
Moreover, the fact that science "is a highly competitive, fractured profession is one of the things that slows science down," says Kristen Ratan, founder of Strategies for Open Science (Stratos), who is working with HHMI on strategies for data sharing.
That's especially problematic as researchers tackle harder and harder questions. Many of today's complex problems "are no longer solvable by one grad student working in isolation for four or five years," says Bodo Stern, chief of strategic initiatives at HHMI.
Publishing those results can also take years, and the basic format hasn't changed much in centuries. "The way we publish science is outdated," says Stern. "The format of an article today looks remarkably similar to that of a Nature article from 1869."
That's why there's now a growing push to change the very culture of science, swapping some of that fierce competition for friendly collaboration and promoting the widespread sharing of datasets and data analyses well in advance of publication. And while culture changes in science are notoriously difficult, a determined band of open science advocates has been making real progress.
Stratos, which Ratan founded in 2019, has been working with HHMI researchers like Frank to explore the idea of "collaboration hubs," which was pioneered at NASA to interpret the vast amounts of environmental and space data coming from satellites, telescopes, and many other sensors.
The US government and federal agencies have also signaled their support for open science and collaboration. Indeed, the Biden Administration's Office of Science and Technology Policy (OSTP) declared 2023 the Year of Open Science.
In a landmark 2022 memo, OSTP required agencies "to make publications and their supporting data resulting from federally funded research publicly available." Meanwhile, the National Institutes of Health has put in place an even stricter data sharing requirement. Says Ratan, "the policy landscape in the US has shifted extraordinarily."
Making data more accessible
Policy changes, however, are only part of the story. Scientists also need to step up not just to release their data, but also to devise ways to make those data and their analysis methods more accessible and comprehensible to would-be collaborators. Fortunately, there's now an ambitious new example of such a data sharing tool.
On January 26, 2024, after five years of nitty-gritty software engineering work, Frank's lab released a preprint on bioRxiv describing a new "data analysis framework for reproducible and shareable neuroscience research" that his team dubs "Spyglass."
In the Spyglass framework, all the data that Frank's lab collects from arrays of electrodes inserted into rat brain regions involved in behavior, learning, and imagination, along with detailed information on each animal's second-by-second behavior, are brought together and stored in a standardized format called Neurodata Without Borders (NWB).
Then, Spyglass provides software code (written in an open-source language, Python) that allows both sharing and analysis not just of the raw data, but also of the results from every step in what typically is a very complex analysis. As the preprint describes, "Spyglass also offers ready-to-use pipelines for analyzing behavior and electrophysiology data, as well as extensive documentation and tutorials for training new users."
Through a cloud-based data sharing hub that HHMI commissioned, Spyglass is available to anyone, with no need to understand NWB or download software. This is "a real leap in data sharing," says Ratan, with benefits not just for the neuroscience community but also for Frank's lab and its direct collaborators.
The link to this hub is available through the preprint, and the hub lets anyone with enough computing power conduct their own analyses, changing parameters and assessing the results for themselves. Using the new standardized approach to data collection and analysis, "we're doing things two to three times faster than we were before," Frank explains. Adds Stern, "it's a huge upfront investment that's now starting to pay off."
More information: Kyu Hyun Lee et al, Spyglass: a framework for reproducible and shareable neuroscience research, bioRxiv (2024). DOI: 10.1101/2024.01.25.577295
Journal information: bioRxiv, Nature
Provided by Howard Hughes Medical Institute