Release Date: November 29th, 2023
Open Date: January 3rd, 2024
Due Date(s): February 21st, 2024
Close Date: February 21st, 2024
Topic No.


Acoustic Training Data Prioritization


Department of Defense
N/A


Type: SBIR
Phase: BOTH
Year: 2024


The Department of Defense (DOD) is seeking proposals for the topic of "Acoustic Training Data Prioritization" as part of the SBIR 24.1 BAA. The Navy is specifically interested in developing a tool that assesses training data for artificial intelligence or machine learning (AI/ML) algorithms used in detecting and tracking submarines. The current paradigm of using large sets of data for training AI/ML algorithms is costly and may not yield optimal results. The Navy is looking for a tool that can analyze acoustic data collected by undersea warfare systems and prioritize data so that the training set is diverse, representative, and as small as possible. The tool should reduce the amount of training data used while maintaining or improving performance, as measured by the Receiver Operating Characteristic (ROC) curve. Phase I involves developing a concept and demonstrating feasibility using unclassified data, while Phase II focuses on designing and delivering a prototype tool for testing and evaluation. The technology developed in this project has potential applications in various industries that rely on AI/ML training.




The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), 22 CFR Parts 120-130, which controls the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulation (EAR), 15 CFR Parts 730-774, which controls dual use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with the Announcement. Offerors are advised foreign nationals proposed to perform on this topic may be restricted due to the technical data under US Export Control Laws.


OBJECTIVE: Develop a tool for assessing training data with artificial intelligence or machine learning (AI/ML) algorithms that provides desired data prioritization results from current or new data for effective, complete, and precise training.


DESCRIPTION: Systems that detect and track submarines are migrating to AI/ML to improve the probability of detecting submarines and to limit the probability of false alerts. The current paradigm for training AI/ML is to use large sets of data. However, the cost associated with training AI/ML on large amounts of data is high and may not result in optimal training results.


There is not currently a commercial tool to assess how comprehensive a training set truly is, how much of the training data is effectively redundant, or whether some data over-represents unusual conditions. Additionally, there is not currently a tool that would enable researchers to determine a priori whether a newly collected data set would add useful diversity to the existing training data. This lack of tools for assessing training data for AI/ML algorithms results in a current state where all data is collected for training, resulting in possible excessive training costs as well as possible over-training to specific data which may not be representative of the full range of conditions in which the system will function during hostile tactical operations.
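As an illustration only, and not a requirement of this topic, the two assessments described above (how much of a training set is effectively redundant, and whether a newly collected data set would add useful diversity) could be approximated with nearest-neighbor distances in a feature space. The sketch below assumes each recording has already been reduced to a numeric feature vector (for example, normalized reverberation level and cluster count); the function names, the feature choice, and the `radius` threshold are hypothetical.

```python
import math


def nearest_distance(sample, reference):
    """Distance from one feature vector to its closest neighbor in `reference`."""
    return min(math.dist(sample, r) for r in reference)


def novelty_fraction(existing, candidate, radius):
    """Fraction of candidate samples farther than `radius` from every existing sample.

    A value near 0 suggests the new collection mostly repeats conditions
    already covered; a value near 1 suggests it adds new conditions.
    """
    if not existing or not candidate:
        raise ValueError("both data sets must be non-empty")
    novel = sum(1 for s in candidate if nearest_distance(s, existing) > radius)
    return novel / len(candidate)


def redundancy_fraction(existing, radius):
    """Fraction of existing samples that have another existing sample within `radius`."""
    if not existing:
        raise ValueError("data set must be non-empty")
    redundant = 0
    for i, s in enumerate(existing):
        others = existing[:i] + existing[i + 1:]
        if others and nearest_distance(s, others) <= radius:
            redundant += 1
    return redundant / len(existing)
```

The brute-force distance search is quadratic in the number of samples, so a practical tool would likely need an approximate nearest-neighbor index; the sketch is meant only to make the two questions concrete.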


The Navy seeks a tool for analysis of acoustic data collected by undersea warfare systems to enable selection of data that is diverse, representative, and as small as practical for training of AI/ML algorithms.
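One well-known way to choose a small but diverse subset is greedy farthest-point (k-center) selection. The sketch below is offered only as an example of the kind of selection the paragraph above describes, not as the Navy's method or a required approach. It assumes hypothetical per-recording feature vectors and a hypothetical `budget` for the subset size.

```python
import math


def select_diverse_subset(features, budget):
    """Greedy k-center selection: return indices of `budget` mutually distant samples.

    Starting from an arbitrary seed (index 0), each step adds the sample that
    is farthest from everything selected so far.
    """
    if not 0 < budget <= len(features):
        raise ValueError("budget must be between 1 and the number of samples")
    selected = [0]
    # nearest[i] holds the distance from sample i to its closest selected sample.
    nearest = [math.dist(f, features[0]) for f in features]
    while len(selected) < budget:
        candidate = max(range(len(features)), key=lambda i: nearest[i])
        selected.append(candidate)
        for i, f in enumerate(features):
            nearest[i] = min(nearest[i], math.dist(f, features[candidate]))
    return selected
```

Farthest-point selection favors coverage of the feature space and can over-select outliers, so a representative subset would likely need to balance it against how densely each region is populated.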

Acoustic data used for detection of submarines is collected on arrays of transducers, whether towed line receive arrays such as the Multi-Function Towed Array, or hull-mounted source/receiver arrays such as the 576-element AN/SQS-53C hull-mounted sonar array. The signals from the transducers are formed into beams representing the acoustic environment as a function of bearing at any given point in time. Key characteristics of data sets will include both meta-data (e.g., season, latitude and longitude, time of day) and attributes of the data (e.g., volume reverberation levels, numbers of “clusters” associated with reflectors such as bathymetric features, marine entities, surface ships, submarines, and wakes).

The tool must demonstrate training data prioritization technology that reduces the amount of training data used while allowing the AI/ML algorithm(s) to maintain or improve performance. System performance is determined by the Receiver Operating Characteristic (ROC) curve: recorded data is run through the system to determine the number of true positives achieved as a function of the number of false positives.
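For readers unfamiliar with the metric, the following minimal sketch shows how an ROC curve can be computed from detector output. It assumes each recorded example has a numeric detection score and a ground-truth label (1 for target present, 0 for absent); the function names are illustrative and not part of any Navy system. Rates are used instead of raw counts so that curves from different data sets can be compared.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs as the threshold is lowered."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sort by descending score so that lowering the threshold admits examples in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score so ties produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under an ROC curve given as (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Comparing the curve (or its area) for a model trained on all data against one trained on a prioritized subset, both evaluated on the same held-out recordings, is one way to check that performance has been maintained.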


Work produced in Phase II may become classified. Note: The prospective contractor(s) must be U.S. owned and operated with no foreign influence as defined by 32 U.S.C. § 2004.20 et seq., National Industrial Security Program Executive Agent and Operating Manual, unless acceptable mitigating procedures can be and have been implemented and approved by the Defense Counterintelligence and Security Agency (DCSA), formerly the Defense Security Service (DSS). The selected contractor must be able to acquire and maintain a secret level facility and Personnel Security Clearances. This will allow contractor personnel to perform on advanced phases of this project as set forth by DCSA and NAVSEA in order to gain access to classified information pertaining to the national defense of the United States and its allies; this will be an inherent requirement. The selected company will be required to safeguard classified material during the advanced phases of this contract IAW the National Industrial Security Program Operating Manual (NISPOM), which can be found at Title 32, Part 2004.20 of the Code of Federal Regulations. Reference: National Industrial Security Program Executive Agent and Operating Manual (NISP), 32 U.S.C. § 2004.20 et seq. (1993).


PHASE I: Develop a concept for an AI/ML training data prioritization tool that meets the requirements in the Description and demonstrate the feasibility of that concept using unclassified data obtained or created by the company. If the unclassified data is not acoustic data, then it must be clearly extensible to the acoustic data use case. Feasibility will be demonstrated through analysis and modeling. Demonstrate the ROC curve associated with training on all data and how the ROC curve is maintained or even improved when AI/ML is trained using the prioritized subset of all data.


The Phase I Option, if exercised, will include the initial design specifications and capabilities description to build a prototype solution in Phase II.


PHASE II: Design, develop, and deliver a prototype AI/ML training data prioritization tool for testing and evaluation based on the results of Phase I. Demonstrate that the prototype meets the requirements in the Description. The government will provide data sets used to train current AI/ML algorithms that are used in the AN/SQQ-89A(V)15 sonar system, and a MATLAB implementation of at least one such algorithm.


It is probable that the work under this effort will be classified under Phase II (see Description section for details).


PHASE III DUAL USE APPLICATIONS: Support the Navy in transitioning the technology to Navy use. The Navy will establish a contract vehicle to apply the training data prioritization technology to AN/SQQ-89A(V)15 in support of additional AI/ML algorithm development opportunities, not limited to Undersea Warfare systems.


Given the emerging importance of AI/ML in numerous major industry sectors, this technology can be used in many training areas. Training centers for the science and engineering professions would do well to incorporate the technology because the underlying data is ever-changing.



REFERENCES:

“AN/SQQ-89(V) Undersea Warfare / Anti-Submarine Warfare Combat System, updated 20 Sep 2021.”
“The Essential Guide to Quality Training Data for Machine Learning: What You Need to Know About Data Quality and Training the Machine.” Cloudfactory.
Rim of the Pacific (RIMPAC) international maritime exercise website, available 6 Apr 2023 at


KEYWORDS: Artificial intelligence or machine learning (AI/ML); training data for AI/ML algorithms; acoustic data; undersea warfare systems; data that is diverse and representative; Multi-Function Towed Array; AN/SQS-53C hull-mounted sonar