The Mathematical Data Science program is concerned with basic research in mathematics, probability, statistics, signal processing, machine learning, data engineering, and information theory. The program aims to develop rigorous mathematical and algorithmic answers to questions that are currently addressed using heuristics or non-principled approaches.
Application-driven research in mathematical data science is supported by the large-scale distributed decision-making thrust under the Computational Methods for Decision Making program.
Research Concentration Areas
- Big Data: In recent years, there has been an explosion in the amount of data generated and collected from various sensors. Analysis of these datasets brings numerous new challenges. In theory, more data is helpful because it improves inference; in practice, this benefit is hard to realize because computational resources are limited. Therefore, fundamental work is needed to address the tradeoffs between computation and accuracy/inference.
- Small Data: Of particular interest is the situation in which the dimensionality of the data is high but the number of observations is small. In these situations, it is necessary to impose constraints on the allowable solutions (for example, by penalizing the L1 norm to favor sparse solutions), and this has been the topic of intense research in statistics. Similarly, in Bayesian modeling, the focus has been on developing appropriate priors to account for the missing information. However, very little is known about the properties of such priors, and new research is needed to develop computationally efficient prior distributions.
- Representation, learning and inference: Interestingly, the human visual system encounters a high-dimensional, small-sample learning problem every day. Recent research suggests that our visual system is successful because it relies heavily on priors, and especially on contextual information. However, a principled mathematical approach for modeling contextual information during both learning and inference is still missing. An interdisciplinary approach that combines expertise from the mathematics, neuroscience, and computer science communities is encouraged.
- Complex Networks: Some of the most challenging datasets include complex networks such as social and biological/neural networks. These datasets typically display various types of non-linear, non-Gaussian, and/or non-stationary structure. Current models fall mostly into two groups: those that are fairly well understood but too simple to capture properties of real networks, and those that are more realistic but come without theoretical guarantees. Basic research in modeling dynamical properties of networks and determining causal effects and influences is needed in this area. Importantly, new computational algorithms should scale to large networks with performance guarantees.
- Multi-modal, multi-scale information integration: In addition to being large, current data sets are often heterogeneous and represent information at multiple scales in both space and time. A standard approach has been to develop a model that is specific to each modality and then to “fuse” information from the different models, often using ad hoc techniques. This approach has proved to be suboptimal, and fundamental research is required to develop a theory that can integrate multi-modal and multi-scale information in a principled and unified way.
- Decision Making Under Uncertainty: Mathematical modeling for supporting complex decision making often involves the study of risk. Mathematical representation of risk and uncertainty, and of their association with decision making, may necessitate inference of intents or high-level cognitive tasks. To this end, basic research is needed to formulate a rigorous foundation for a computational framework that can implement high-level reasoning, via quantitative or qualitative methods, and that can cope with newly acquired information and a myriad of variations in operational environments.
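As one illustration of the Small Data item above, the following is a minimal sketch of how an L1 constraint can make a high-dimensional, small-sample regression problem tractable. It is not a method prescribed by the program: the solver (plain coordinate descent), the function names `soft_threshold`, `objective`, and `lasso_cd`, the penalty weight `lam`, and the synthetic data are all assumptions chosen for illustration.

```python
import random

def soft_threshold(v, t):
    """Shrink v toward zero by t; this is the proximal operator of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def objective(X, y, w, lam):
    """Value of (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1."""
    n = len(X)
    sq = sum((y[i] - sum(xij * wj for xij, wj in zip(X[i], w))) ** 2
             for i in range(n))
    return sq / (2 * n) + lam * sum(abs(wj) for wj in w)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for the L1-penalized least-squares objective above."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, kept up to date as w changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = w_new
    return w

# Synthetic data (an assumption for illustration): 30 observations,
# 100 dimensions, and only the first 3 coefficients nonzero.
random.seed(0)
n, p = 30, 100
w_true = [2.0] * 3 + [0.0] * (p - 3)
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * w_true[j] for j in range(p)) + random.gauss(0.0, 0.1)
     for i in range(n)]

w_hat = lasso_cd(X, y, lam=0.2)
support = [j for j, wj in enumerate(w_hat) if wj != 0.0]
print(f"{len(support)} of {p} coefficients are nonzero")
```

With more unknowns than observations, unpenalized least squares has infinitely many solutions; the L1 penalty selects a sparse one by setting many coefficients exactly to zero. How to choose `lam`, and what guarantees hold for the result, are questions of the kind this concentration area raises.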
Research Challenges and Opportunities
- Tradeoffs between computation and accuracy/inference in big data
- Constraints/priors that should be used on allowable solutions when the number of observations is small
- Dynamical properties of networks and determination of causal effects
- Computational algorithms that scale to large networks with performance guarantees
How to Submit
For detailed application and submission information for this research topic, please refer to our broad agency announcement (BAA) No. N0001425SB001.
Contracts: All white papers and full proposals for contracts must be submitted through FedConnect; instructions are included in the BAA.
Grants: All white papers for grants must be submitted through FedConnect, and full proposals for grants must be submitted through grants.gov; instructions are included in the BAA.