Summary
A Special Operations team met its objective by merging open-source imagery with synthetic data to train an object detection model for live video feeds. The team then deployed the model at the tactical edge, where it successfully identified the objects of interest on its first day in production.
Challenge: Lack of Data Obstructs Training for Object Detection Model
Every day, analysts working for US Special Operations Forces face a significant data-processing workload. They are tasked with locating objects of interest across thousands of images or hours of streaming video. Historically, the only way to process more data was to add more analysts, an approach that is difficult to scale within a practical timeframe.
To scale and accelerate the force’s capabilities, one Special Operations team looked to introduce machine learning (ML)—specifically, an object detection model deployed at the tactical edge.
Yet, this project faced a critical challenge: a lack of data to train their model. Publicly available data for the object of interest—shipping containers—was thin. Capturing new, original imagery of shipping containers with the necessary breadth and quality for training would have proven time-consuming, costly, and onerous. The team needed a different solution.
Applying Synthetic Data Through the Striveworks MLOps Platform
Initially, the team attempted to train a YOLO ("You Only Look Once") model on 461 open-source images gathered from the internet. YOLO is a popular, widely available object detection algorithm known for its speed and accuracy. However, the results were unworkable: the training dataset was simply too small and too narrow. The team needed a better model trained on more imagery, quickly and cost-effectively.
Fortunately, the team chose to work with Striveworks—a machine learning operations (MLOps) company grounded in open architecture and ecosystem partnerships across a range of data and technology providers. For this project, Striveworks saw a prime opportunity to fine-tune the object detection model using synthetic data from SensorOps.
SensorOps provides a range of technologies from the command center to the edge to help teams adapt at mission speed. The SensorOps TargetModeler platform allows users to design precision-tailored synthetic imagery using an intuitive interface on a device as lightweight as a gaming laptop.
To improve the existing training dataset for Striveworks, SensorOps quickly generated 1,200 new, fully annotated synthetic images of shipping containers across a range of fidelities and collection environments. This data was ideal for model training: it delivered imagery with the specific lighting, view angles, and levels of obscuration the data science team needed to iterate quickly and focus on the instances where the model struggled.
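SensorOps' TargetModeler interface is proprietary, so the sketch below is only a generic illustration of the underlying idea: sweeping a grid of collection conditions so that each combination drives one rendered, auto-annotated image. All parameter names and values here are hypothetical, not TargetModeler's actual settings.

```python
from itertools import product

# Hypothetical collection conditions a render sweep might cover.
lighting = ["dawn", "noon", "dusk", "moonlit"]
view_angles_deg = [0, 30, 60, 90, 120]
occlusion = ["none", "partial", "heavy"]

# Each combination would correspond to one rendered synthetic image.
render_specs = [
    {"lighting": l, "view_angle_deg": a, "occlusion": o}
    for l, a, o in product(lighting, view_angles_deg, occlusion)
]

print(len(render_specs))  # 4 * 5 * 3 = 60 distinct collection setups
```

Even this small grid yields 60 distinct setups, which is why a parameter-driven renderer can cover lighting and viewing conditions far faster than physical collection.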
Combining Real and Synthetic Data Achieves Target ML Performance
The project team easily merged the synthetic and open-source data, creating a fuller, more varied training dataset. By applying data augmentations in the Striveworks MLOps platform, the team further enriched and expanded the limited training data. This enabled them to fine-tune their YOLO model to a validation accuracy of up to 85%, more than sufficient for real-world applications.
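The Striveworks platform applies augmentations with the push of a button; how it does so internally is not described here. As a minimal, hypothetical sketch of the concept, the pure-Python example below applies brightness shifts and a horizontal flip to a toy grayscale "image" (a list of pixel rows), multiplying the effective size of a small dataset.

```python
def adjust_brightness(img, delta):
    """Shift every pixel by delta, clamped to the 0-255 range."""
    return [[max(0, min(255, px + delta)) for px in row] for row in img]

def horizontal_flip(img):
    """Mirror the image left-to-right."""
    return [list(reversed(row)) for row in img]

def augment(dataset, deltas=(-40, 40)):
    """Expand a dataset with flipped and brightness-shifted copies."""
    out = list(dataset)
    for img in dataset:
        out.append(horizontal_flip(img))
        for d in deltas:
            out.append(adjust_brightness(img, d))
    return out

# One toy 2x3 grayscale "image" standing in for real training imagery.
originals = [[[10, 120, 250], [0, 60, 200]]]
augmented = augment(originals)
print(len(augmented))  # 1 original + 1 flip + 2 brightness variants = 4
```

Real pipelines apply many more transforms (rotation, color jitter, cropping) to full-resolution imagery, but the principle is the same: each transform turns one labeled image into several.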
Result: Mission Success Through Synthetic Data and Open Architecture
Through their forward-thinking adoption of open-architecture technologies, the Special Operations team developed a repeatable process for circumventing crippling data shortfalls. Despite the initial lack of training data, the team successfully created and deployed its object detection model into a live training environment.
By leveraging AI ecosystem partners and novel technologies, the Special Operations team proved the efficacy of their ML model, the advantage of open architecture MLOps, and the value of synthetic data for mission-critical tasks.
Synthetic data is high-fidelity data that is artificially manufactured rather than derived from real-world objects or phenomena. For this project, SensorOps used its TargetModeler platform to produce 3D renderings of shipping containers in detailed, hyper-realistic representative 3D environments, with collection conditions matching anticipated mission needs (weather, sun angle, moon illumination, environment).

This process allowed SensorOps to produce photorealistic imagery with the varied collection angles and situational contexts needed for model training. Synthetic data allows for quicker, more cost-effective data generation and can produce images that would otherwise be too risky or costly to capture in the real world.

SensorOps TargetModeler's auto-labeling supports both object segmentation and bounding box annotations. While manual annotation often poses accuracy challenges, TargetModeler delivers pixel-perfect annotations, allowing teams to swiftly test and validate how annotation choices affect model training for specific use cases.
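YOLO-family models conventionally consume bounding-box labels as normalized (class, x_center, y_center, width, height) text lines. As a hedged sketch of how a pixel-space box, such as one emitted by an auto-labeling tool, maps to that convention (the function name and example values are illustrative):

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box into the
    normalized (class x_center y_center width height) YOLO label line."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Hypothetical 200x100-pixel container box on a 1280x720 frame, class 0.
print(to_yolo_label(0, (100, 50, 300, 150), 1280, 720))
# -> 0 0.156250 0.138889 0.156250 0.138889
```

Because every coordinate is normalized by the image dimensions, the same label file remains valid when imagery is resized during training.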
Data augmentation is the process of transforming available data to create a richer dataset for use in ML model training. Augmentations are especially useful when training data lacks variety or context. Striveworks users can apply a set of augmentations with the push of a button to adjust imagery's contrast, brightness, color, rotation, and other parameters. The result is broader training datasets that produce more robust and effective models.
Related Resources
The Biggest Challenges With Geospatial Analysis
Exploring the biggest challenges with geospatial analysis—including data volume, complexity, and resourcing—and how machine learning overcomes them.
Mining Tech Leader Adds Computer Vision to Generative AI Research Platform
Prospector—a leader in data acquisition and analytics—partners with Striveworks for CV and MLOps to power its mining intelligence platform, OpenMine.AI.