[Neurodebian-users] NIH Request for Information (RFI): Input on Development of Analysis Methods and Software for Big Data
Yaroslav Halchenko
debian at onerussian.com
Thu Aug 8 17:01:35 UTC 2013
----- Forwarded message from "Glanzman, Dennis (NIH/NIMH) [E]" <dglanzma at mail.nih.gov> -----
Date: Thu, 8 Aug 2013 15:13:03 +0000
From: "Glanzman, Dennis (NIH/NIMH) [E]" <dglanzma at mail.nih.gov>
To: "comp-neuro at neuroinf.org" <comp-neuro at neuroinf.org>, "connectionists at mailman.srv.cs.cmu.edu" <connectionists at mailman.srv.cs.cmu.edu>
Cc:
Subject: [Comp-neuro] Request for Information (RFI): Input on Development of Analysis Methods and Software for Big Data
Request for Information (RFI): Input on Development of Analysis Methods
and Software for Big Data
--------------------------------------------------------------------------
Notice Number: NOT-HG-13-014
Key Dates
Release Date: August 8, 2013
Response Due Date: September 6, 2013
Issued by
National Human Genome Research Institute ([1]NHGRI)
Purpose
This Request for Information (RFI) is to solicit comments and ideas for
the development of analysis methods and software tools, as part of the
overall Big Data to Knowledge (BD2K) Initiative. Specifically, this RFI
solicits input on needs for software and analysis methods related to data
compression/reduction, data visualization, data provenance, and data
wrangling.
Background
Biomedical research is becoming more data-intensive as researchers are
generating and using increasingly large, complex, and diverse datasets.
This era of 'Big Data' in biomedical research taxes the ability of many
researchers to release, locate, analyze, and interact with these data and
associated software due to the lack of tools, accessibility, and
training. In response to these new challenges in biomedical research, and
in response to the recommendations of the Data and Informatics Working
Group (DIWG) of the Advisory Committee to the NIH Director
([2]http://acd.od.nih.gov/diwg.htm), NIH has launched the trans-NIH Big
Data to Knowledge (BD2K) Initiative ([3]www.bd2k.nih.gov).
The long-term goal of the NIH BD2K Initiative is to support advances in
data science, other quantitative sciences, policy, and training that are
needed for the effective use of Big Data in biomedical research. (The
term "biomedical" is used here in the broadest sense to include
biological, biomedical, behavioral, social, environmental, and clinical
studies that relate to understanding health and disease.) The term 'Big
Data' refers to datasets that are increasingly large and complex and
that exceed the abilities of currently used approaches to manage and
analyze them. "Big Data" is also meant to capture the opportunities and
address the challenges facing all biomedical researchers in accessing,
managing, analyzing and integrating large datasets of diverse data types.
Such data types may include imaging, phenotypic, molecular (including
–omics), clinical, environmental, behavioral, and many other types of
biological and biomedical data. "Big Data" also includes data generated
for other purposes (e.g. social media, search histories, cell phone data)
when they are repurposed and applied to address health research
questions. Biomedical Big Data primarily emanate from three sources: (1)
a small number of groups that produce very large amounts of data, usually
as part of projects specifically funded to produce important resources for
use by the research community at large, or large collections of electronic
health records; (2) individual investigators who produce large datasets
for their own projects, which might nonetheless be broadly useful to the
research community at large; and (3) an even greater number of
investigators who each
produce small datasets whose value can be amplified by aggregating or
integrating them with other data.
One of the DIWG recommendations was to support the development,
implementation, evaluation, maintenance and dissemination of informatics
methods and applications. NIH supports a wide range of bioinformatics and
computational science through efforts such as the Biomedical Science and
Technology Initiative funding opportunities and through programs supported
by individual NIH institutes and centers. NIH is now considering
supporting the development of analytical methods and software tools and
will focus initially on four targeted areas to begin to address critical
current and emerging needs of the research community for using, managing,
and analyzing more complex and larger data sets: data
compression/reduction, visualization, provenance, and wrangling.
An NIH BD2K Working Group charged with exploring the development of
informatics methods and tools seeks input from the biomedical research
communities on the four targeted areas listed above to ensure that
research resources generated will have the highest impact and value to the
research community. NIH has determined that guidance is needed from the
broad scientific community in the following areas:
Data Compression/Reduction
While data compression is important in BD2K since it helps reduce resource
usage, most compression techniques involve trade-offs among various
factors, including the degree of compression, the amount of distortion
induced and the computational resources required to compress and
decompress the data.
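The trade-off described above can be made concrete with a small sketch.
It is illustrative only and not part of the NIH notice: it assumes
Python's standard zlib module as the lossless compressor, and the helper
names (lossless_sizes, quantize, dequantize) and the synthetic signal
are hypothetical. It compares compressed size across compression levels
and shows how a lossy quantization step trades distortion for size.

```python
import random
import struct
import zlib


def lossless_sizes(payload: bytes, levels=(1, 6, 9)) -> dict:
    """Return the compressed size in bytes for each zlib level."""
    sizes = {}
    for level in levels:
        packed = zlib.compress(payload, level)
        # Lossless: decompression must reproduce the input exactly.
        assert zlib.decompress(packed) == payload
        sizes[level] = len(packed)
    return sizes


def quantize(values, step: float):
    """Lossy step: map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(codes, step: float):
    """Approximate the original values from their quantization indices."""
    return [c * step for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    signal = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic data
    raw = struct.pack(f"{len(signal)}d", *signal)
    print("raw bytes:", len(raw))
    print("lossless sizes by level:", lossless_sizes(raw))

    step = 0.05  # larger step: smaller output, more distortion
    codes = quantize(signal, step)
    restored = dequantize(codes, step)
    max_error = max(abs(a - b) for a, b in zip(signal, restored))
    packed = zlib.compress(struct.pack(f"{len(codes)}h", *codes), 9)
    print("lossy bytes:", len(packed), "max distortion:", max_error)
```

Higher zlib levels usually cost more CPU time for a smaller output, and
the quantization error is bounded by half the chosen step.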
Data reduction aims to reduce the data volume more dramatically and, at
the same time, to reduce the complexity/dimensionality of data for easier
analysis. It usually involves processing and/or reorganization of data to
minimize redundancy, eliminate noise, and preserve signal and data
integrity.
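As a rough illustration of the reduction steps named above (and not a
method endorsed by the notice), the following sketch removes redundant
records and averages a noisy series in fixed-size bins. The function
names and the choice of bin averaging are assumptions made for the
example; real pipelines often use more sophisticated techniques.

```python
from statistics import fmean


def drop_duplicates(records):
    """Minimize redundancy: keep the first occurrence of each record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def bin_average(samples, bin_size: int):
    """Reduce volume and smooth noise by averaging consecutive bins."""
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    return [
        fmean(samples[i:i + bin_size])
        for i in range(0, len(samples), bin_size)
    ]


if __name__ == "__main__":
    rows = [(1, 2), (1, 2), (3, 4)]
    print(drop_duplicates(rows))
    print(bin_average([1, 3, 5, 7, 9], 2))
```

The last bin may hold fewer samples than the others, so it is averaged
over whatever samples remain.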
Data Visualization
Data visualization permits researchers to communicate information through
graphical and interactive means and enables them to explore and gain
insight/knowledge from the data. The challenge in the Big Data era lies in
interpreting complex, high-throughput data, especially in the context of
other relevant, but often orthogonal, data.
Data Provenance
Provenance of digital scientific data is useful for determining
attribution, identifying relationships between objects, tracking back
differences in similar results, guaranteeing the reliability of the data,
and allowing researchers to determine whether a particular dataset can be
used in their research (by providing lineage information about the data).
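One minimal way to picture lineage information is a record per dataset
that stores a content hash and the hashes of the datasets it was derived
from. The sketch below is an assumption-laden illustration, not a
provenance standard: the record fields, tool names, and helper functions
are hypothetical, and only Python's standard hashlib, json, and datetime
modules are used.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Identify a dataset by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


def make_record(data: bytes, tool: str, parents=()) -> dict:
    """Describe one dataset: what it is, what made it, what it came from."""
    return {
        "id": fingerprint(data),
        "tool": tool,  # hypothetical name of the producing step
        "parents": list(parents),  # ids of the datasets it was derived from
        "created": datetime.now(timezone.utc).isoformat(),
    }


def lineage(record_id: str, store: dict) -> list:
    """Return the record itself followed by its ancestors, nearest first."""
    seen, order, queue = set(), [], [record_id]
    while queue:
        current = queue.pop(0)
        if current in seen or current not in store:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(store[current]["parents"])
    return order


if __name__ == "__main__":
    raw = b"subject,score\n1,0.5\n"
    cleaned = b"subject,score\n1,0.50\n"
    store = {}
    first = make_record(raw, "instrument-export")
    store[first["id"]] = first
    second = make_record(cleaned, "normalize", parents=[first["id"]])
    store[second["id"]] = second
    print(json.dumps(second, indent=2))
    print(lineage(second["id"], store))
```

Because the identifier is derived from the content, two results that
differ can be traced back to the first step at which their hashes diverge.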
Data Wrangling
Data wrangling is a term applied to the conversion, formatting, and
mapping of data that enable researchers to more easily submit data to a
database and expose data to the internet, and that make data more easily
accessible and shareable. Researchers who generate datasets that, in
aggregate, become "Big Data" often find it difficult to submit data, even
when standards are well-established. Specialized informatics skills are
often needed, for example, to format data, apply metadata, fill gaps, use
ontologies, capture provenance, annotate features, and apply other
functions to reformat, manipulate, transform, or process data.
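A small sketch of the kind of work described above: two submissions use
different column names and units, and both are mapped onto one target
layout with gaps left explicit and the source attached as metadata. The
column names (subj, wt_lb, subject_id, weight_kg), the target layout, and
the helper functions are all hypothetical and chosen only for
illustration.

```python
import csv
import io

LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms


def wrangle(row: dict, source: str) -> dict:
    """Map one raw row onto the target layout: subject_id, weight_kg, source."""
    subject = row.get("subject_id") or row.get("subj") or None
    if row.get("weight_kg"):
        weight = float(row["weight_kg"])
    elif row.get("wt_lb"):
        weight = float(row["wt_lb"]) * LB_TO_KG  # convert units
    else:
        weight = None  # keep the gap explicit instead of guessing a value
    return {
        "subject_id": subject.strip() if subject else None,
        "weight_kg": round(weight, 2) if weight is not None else None,
        "source": source,  # minimal metadata about where the row came from
    }


def wrangle_csv(text: str, source: str) -> list:
    """Parse CSV text and convert every row to the target layout."""
    return [wrangle(row, source) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    lab_a = "subj,wt_lb\nS1,150\nS2,\n"
    lab_b = "subject_id,weight_kg\n S3 ,70.5\n"
    for record in wrangle_csv(lab_a, "lab_a") + wrangle_csv(lab_b, "lab_b"):
        print(record)
```

Keeping missing values as explicit gaps, rather than filling them
silently, leaves the decision about imputation to whoever uses the data.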
Information Requested
To maximize the impact of these valuable research resources and tools
(informatics methods and tools) and facilitate their use by scientists with
a broad range of expertise, we seek input from scientific and informatics
research and user communities in identifying and prioritizing needs and
gaps in the four focus areas outlined above.
Submitting a Response
All responses must be submitted via email to [4]BD2KSoftware at mail.nih.gov
by Friday, September 6, 2013. Please include the Notice number in the
subject line. Response to this RFI is voluntary. Responders are free to
address any or all of the categories listed above. The submitted
information will be reviewed by the NIH staff.
This request is for information and planning purposes only and should not
be construed as a solicitation or as an obligation on the part of the
Federal Government. The NIH does not intend to make any awards based on
responses to this RFI or to otherwise pay for the preparation of any
information submitted or for the Government's use of such information.
The NIH will use the information submitted in response to this RFI at its
discretion and will not provide comments to any responder's submission.
However, responses to the RFI may be reflected in future funding
opportunity announcements. The information provided will be analyzed and
may appear in reports. Respondents are advised that the Government is
under no obligation to acknowledge receipt of the information received or
provide feedback to respondents with respect to any information
submitted. No proprietary, classified, confidential, or sensitive
information should be included in your response. The Government reserves
the right to use any non-proprietary technical information in any
resultant solicitation(s).
Inquiries
Please direct all inquiries to:
Jennifer Couch, Ph.D.
National Cancer Institute
Telephone: 240-276-6210
Email: [5]Jennifer_Couch at nih.gov
Website: [6]http://bd2k.nih.gov/#sthash.i3bBBRHF.dpbs
--------------------------------------------------------------------------
[7]Weekly TOC for this Announcement
[8]NIH Funding Opportunities and Notices
--------------------------------------------------------------------------
[9]NIH Office of Extramural Research
[10]Department of Health and Human Services (HHS)
[11]USA.gov - Government Made Easy
NIH... Turning Discovery Into Health®
--------------------------------------------------------------------------------
Note: For help accessing PDF, RTF, MS Word, Excel, PowerPoint, Audio or Video
files, see [12]Help Downloading Files.
References
Visible links
1. http://www.nhgrii.nih.gov/
2. http://acd.od.nih.gov/diwg.htm
3. http://www.bd2k.nih.gov/
4. mailto:BD2KSoftware at mail.nih.gov
5. mailto:Jennifer_Couch at nih.gov
6. http://bd2k.nih.gov/#sthash.i3bBBRHF.dpbs
7. http://grants.nih.gov/grants/guide/WeeklyIndex.cfm?WeekEnding=08-09-13
8. http://grants.nih.gov/grants/guide/index.html
9. http://grants.nih.gov/grants/oer.htm
10. http://www.hhs.gov/
11. http://www.usa.gov/
12. http://grants.nih.gov/grants/edocs.htm
_______________________________________________
Comp-neuro mailing list
Comp-neuro at neuroinf.org
http://www.neuroinf.org/mailman/listinfo/comp-neuro
----- End forwarded message -----
--
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik