This is an example of how to run a pretrained Ultralytics YOLO model on Ouster data using the Ouster Python SDK, PyTorch, and OpenCV.
YOLO results displayed in OpenCV:
The Ouster SDK provides fast access to the 2D LidarScan data representation that streams from Ouster lidar sensors. The structured 2D imagery can be processed by any number of machine learning algorithms trained on camera images; in this example, the recently released YOLOv9.
This example runs YOLO twice per frame: once on the NEAR_IR data and once on the REFLECTIVITY data. Running both demonstrates that, depending on the scene, either ChanField can outperform the other. For example, indoors and at night the NEAR_IR data is not particularly useful because there is very little near-infrared light for the sensor to detect.
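Both channels live on the LidarScan as ordinary 2D NumPy arrays, so either one can be handed to an image model. A minimal sketch of reading them (scan is assumed to be a LidarScan pulled from a source, as shown later in this example):

from ouster.sdk.client import ChanField
nir = scan.field(ChanField.NEAR_IR)       # 2D (H x W) near-infrared image
ref = scan.field(ChanField.REFLECTIVITY)  # 2D (H x W) calibrated reflectivity image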
1. Install and import required libraries
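The dependencies can typically be installed with pip (package names assumed; check each project's installation instructions for your platform, especially for GPU-enabled PyTorch):

pip install ouster-sdk ultralytics opencv-python torch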
import argparse
from functools import partial
import numpy as np
import cv2
from ultralytics import YOLO
from ultralytics.engine.results import Results
import torch
from ouster.sdk.client import ChanField, LidarScan, ScanSource, destagger
from ouster.sdk import open_source
from ouster.sdk.client._utils import AutoExposure, BeamUniformityCorrector
from ouster.sdk.viz import SimpleViz
2. Define a main function to parse user input, load data, apply processing, and visualize
We use the open_source command to conveniently open a connection to a live sensor, or to a PCAP or OSF recording. The open_source command returns an Ouster ScanSource object, which emits an iterator of LidarScans streaming from a live sensor or a data recording.
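For example (the hostname and file path below are placeholders):

scans = open_source("os-122xxxxxxxxxx.local", sensor_idx=0)          # live sensor, hypothetical hostname
scans = open_source("my_recording.pcap", sensor_idx=0, cycle=True)   # loop over a PCAP recording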
The example first displays the YOLO inference results using OpenCV, and then switches to Ouster’s SimpleViz to see how the 2D results map directly to the 3D pointcloud.
if __name__ == '__main__':
    # parse the command arguments
    parser = argparse.ArgumentParser(prog='sdk yolo demo',
                                     description='Runs a minimal demo of yolo post-processing')
    parser.add_argument('source', type=str, help='Sensor hostname or path to a sensor PCAP or OSF file')
    args = parser.parse_args()

    # Example for displaying results with opencv
    scans = ScanIterator(open_source(args.source, sensor_idx=0, cycle=True), use_opencv=True)
    for i, scan in enumerate(scans):
        if i > 20:  # break after N frames
            break

    # Example for displaying results with SimpleViz
    scans = open_source(args.source, sensor_idx=0, cycle=True)
    meta = scans.metadata
    scans = ScanIterator(scans, use_opencv=False)
    SimpleViz(meta, rate=0).run(scans)
3. Define the ScanIterator class that applies the YOLO model
We’ll now dive into the ScanIterator class, which runs the YOLO inference and maintains some of the interfaces of the ScanSource class that are required by SimpleViz. The goal of the ScanIterator class is to run inference on each LidarScan before emitting the processed LidarScan. The ScanIterator class could be replaced with a simple for-loop in the main function if visualization with SimpleViz weren’t needed.
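For instance, the OpenCV-only path could be reduced to something like this (a minimal sketch; run_yolo is a hypothetical helper standing in for the inference code shown in _update below):

for scan in open_source(args.source, sensor_idx=0):
    run_yolo(scan)  # run inference and display, as _update does below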
class ScanIterator(ScanSource):
    if torch.cuda.is_available():
        DEVICE = "cuda"
    elif torch.backends.mps.is_available():
        DEVICE = "mps"
    else:
        DEVICE = "cpu"

    def __init__(self, scans: ScanSource, use_opencv=False):
        self._use_opencv = use_opencv
        self._metadata = scans.metadata
        # Load the pretrained YOLO model.
        # The example runs YOLO on both the near infrared and reflectivity channels, so we create two independent models
        self.model_yolo_nir = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)
        self.model_yolo_ref = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)
        # Define the classes to output results for.
        self.name_to_class = {}  # Make a reverse lookup for convenience
        for key, value in self.model_yolo_nir.names.items():
            self.name_to_class[value] = key
        self.classes_to_detect = [
            self.name_to_class['person'],
            self.name_to_class['car'],
            self.name_to_class['truck'],
            self.name_to_class['bus']
        ]
        # Post-process the NEAR_IR and REFLECTIVITY data to make it more camera-like using the
        # AutoExposure and BeamUniformityCorrector utility functions
        self.paired_list = [
            [ChanField.NEAR_IR, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_nir],
            [ChanField.REFLECTIVITY, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_ref]
        ]
        # Map the self._update function onto the scans iterator;
        # the iterator will now run self._update before emitting the modified scan
        self._scans = map(partial(self._update), scans)
ScanIterator must behave as an iterator, like the ScanSource, meaning it emits LidarScans. We define its __iter__ function to do this.
    # Return the scans iterator when instantiating the class
    def __iter__(self) -> LidarScan:
        return self._scans
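Because __iter__ simply hands back the mapped iterator, a ScanIterator can be consumed like any Python iterable. A minimal usage sketch (placeholder path):

for scan in ScanIterator(open_source("my_recording.pcap", sensor_idx=0), use_opencv=True):
    break  # each emitted scan has already been processed by _update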
Finally, we define the _update function, which actually runs inference on the NEAR_IR and REFLECTIVITY ChanFields individually. Note that in the ScanIterator __init__ function we mapped the _update function onto the underlying scans ScanSource iterator.
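The map pattern wraps the underlying iterator lazily, so each scan is processed only as it is consumed. A generic sketch of the same idea:

from functools import partial

def add_n(n, x):
    return x + n

wrapped = map(partial(add_n, 10), iter([1, 2, 3]))
print(next(wrapped))  # 11: the mapped function runs lazily, once per item consumed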
    def _update(self, scan: LidarScan) -> LidarScan:
        stacked_result_rgb = np.empty((scan.h * len(self.paired_list), scan.w, 3), np.uint8)
        for i, (field, ae, buc, model) in enumerate(self.paired_list):
            # Destagger the data to get a spatially coherent 2D image
            img = destagger(self._metadata, scan.field(field)).astype(np.float32)
            # Make the image more uniform and better exposed, similar to the camera data YOLO is trained on
            ae(img)
            buc(img, update_state=True)
            # Convert to 3-channel uint8 for YOLO inference
            img_rgb = np.repeat(np.uint8(np.clip(np.rint(img * 255), 0, 255))[..., np.newaxis], 3, axis=-1)
            # Run inference with the tracker module enabled so that instance IDs persist across frames
            results: Results = next(
                model.track(
                    [img_rgb],
                    stream=True,   # Reduce memory requirements for streaming
                    persist=True,  # Maintain tracks across sequential frames
                    conf=0.1,
                    # Force the inference to use full resolution. Must be a multiple of 32, which all Ouster lidar scans conveniently are.
                    # Note that YOLO performs best when the input image has pixels with a square aspect ratio. This is true
                    # when the OS0-128 is set to 512 horizontal resolution, the OS1-128 to 1024, and the OS2-128 to 2048.
                    imgsz=[img.shape[0], img.shape[1]],
                    classes=self.classes_to_detect
                )
            ).cpu()
            # Plot results. Masks don't display well when using SimpleViz
            img_rgb = results.plot(boxes=True, masks=self._use_opencv, line_width=1, font_size=3)
            if self._use_opencv:
                # Save stacked RGB images for opencv viewing
                stacked_result_rgb[i * scan.h:(i + 1) * scan.h, ...] = img_rgb
            else:
                # Overwrite the grayscale ChanField for visualization since SimpleViz cannot display RGB images
                scan.field(field)[:] = destagger(self._metadata, img_rgb[..., 0], inverse=True)
        # Display in the loop with opencv
        if self._use_opencv:
            cv2.imshow("results", stacked_result_rgb)
            cv2.waitKey(1)
        return scan
YOLO results as seen in SimpleViz. If you look carefully, you can see that the bottom 2D results image is also overlaid on the pointcloud view:
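The mapping to 3D can also be used programmatically. A hedged sketch of lifting a detection mask into Cartesian points with the SDK's XYZLut (meta, scan, and results as in the code above; masks follow the ultralytics Results layout and, because inference ran at full resolution, line up with the scan):

from ouster.sdk.client import XYZLut

xyzlut = XYZLut(meta)                # precomputed lookup: range image -> XYZ points
xyz = destagger(meta, xyzlut(scan))  # (H, W, 3) points aligned with the destaggered 2D image
if results.masks is not None:
    for mask in results.masks.data:  # one (H, W) mask per detected instance
        pts = xyz[mask.cpu().numpy().astype(bool)]
        print("approx. object centroid (m):", pts.mean(axis=0))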
After copying the code above into a yolo.py file, you can run the script from the command line:
python yolo.py [SENSOR_HOSTNAME | PATH_TO_PCAP]