Running YOLO on 2D LidarScans using the SDK

This is an example of how to run a pretrained Ultralytics YOLO model on Ouster data using the Ouster Python SDK, PyTorch, and OpenCV.

YOLO results displayed in opencv:


The Ouster SDK provides fast access to the 2D LidarScan data representation that streams from Ouster lidar sensors. The structured 2D imagery can be processed by any number of machine learning algorithms that are trained on camera images; in this example we use the recently released YOLOv9.

This example runs YOLO twice per frame, once on the NEAR_IR data and once on the REFLECTIVITY data. We run both to demonstrate that, depending on the scene, either ChanField can outperform the other. Specifically, indoors and at night, NEAR_IR data is not particularly useful because there is very little near-infrared light for the sensor to detect.

1. Install and import required libraries
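
The required packages can be installed with pip; exact package names depend on your environment, but a typical install is pip install ouster-sdk ultralytics opencv-python torch (plus matplotlib, which the later examples use for colorizing instance ids).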

import argparse
from functools import partial

import numpy as np
import cv2
from ultralytics import YOLO
from ultralytics.engine.results import Results
import torch

from ouster.sdk.client import ChanField, LidarScan, ScanSource, destagger
from ouster.sdk import open_source
from ouster.sdk.client._utils import AutoExposure, BeamUniformityCorrector
from ouster.sdk.viz import SimpleViz

2. Define a main function to parse user input, load data, apply processing, and visualize
We use the open_source command to conveniently open a connection to a live sensor or to a PCAP or OSF recording. open_source returns an Ouster ScanSource object, which can be iterated to yield LidarScans streaming from the live sensor or data recording.
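
As a minimal illustration of that behavior (the recording path here is just a placeholder), opening a source and grabbing a single scan looks roughly like this:

from ouster.sdk import open_source

# Open a live sensor by hostname, or a PCAP/OSF recording by path; sensor_idx selects one sensor
source = open_source("recording.pcap", sensor_idx=0)
metadata = source.metadata  # sensor intrinsics and data format, needed for destaggering and SimpleViz

for scan in source:
    # Each LidarScan is a fixed-size h x w frame of structured 2D imagery
    print(scan.frame_id, scan.h, scan.w)
    break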

The example first displays the YOLO inference results using opencv, and then switches to Ouster’s SimpleViz to see how the 2D results map directly to the 3D pointcloud.

if __name__ == '__main__':
    # parse the command arguments
    parser = argparse.ArgumentParser(prog='sdk yolo demo',
                                     description='Runs a minimal demo of yolo post-processing')
    parser.add_argument('source', type=str, help='Sensor hostname or path to a sensor PCAP or OSF file')
    args = parser.parse_args()

    # Example for displaying results with opencv
    scans = ScanIterator(open_source(args.source, sensor_idx=0, cycle=True), use_opencv=True)
    for i, scan in enumerate(scans):
        if i > 20:  # break after N frames
            break

    # Example for displaying results with SimpleViz
    scans = open_source(args.source, sensor_idx=0, cycle=True)
    meta = scans.metadata
    scans = ScanIterator(scans, use_opencv=False)
    SimpleViz(meta, rate=0).run(scans)

3. Define the ScanIterator class that applies the YOLO model
We’ll now dive into the ScanIterator class, which runs the YOLO inference and maintains some of the interfaces of the ScanSource class that SimpleViz requires. The goal of the ScanIterator class is to run inference on each LidarScan before emitting the processed LidarScan. The ScanIterator class could be replaced with a simple for-loop in the main function if visualization with SimpleViz weren’t needed.

class ScanIterator(ScanSource):

    if torch.cuda.is_available():
        DEVICE = "cuda"
    elif torch.backends.mps.is_available():
        DEVICE = "mps"
    else:
        DEVICE = "cpu"

    def __init__(self, scans: ScanSource, use_opencv=False):
        self._use_opencv = use_opencv
        self._metadata = scans.metadata

        # Load yolo pretrained model.
        # The example runs yolo on both near infrared and reflectivity channels so we create two independent models
        self.model_yolo_nir = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)
        self.model_yolo_ref = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)

        # Define classes to output results for.
        self.name_to_class = {}  # Make a reverse look up for convenience
        for key, value in self.model_yolo_nir.names.items():
            self.name_to_class[value] = key
        self.classes_to_detect = [
            self.name_to_class['person'],
            self.name_to_class['car'],
            self.name_to_class['truck'],
            self.name_to_class['bus']
        ]

        # Post-process the near_ir, and cal ref data to make it more camera-like using the
        # AutoExposure and BeamUniformityCorrector utility functions
        self.paired_list = [
            [ChanField.NEAR_IR, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_nir],
            [ChanField.REFLECTIVITY, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_ref]
        ]

        # Map the self._update function on to the scans iterator
        # the iterator will now run the self._update command before emitting the modified scan
        self._scans = map(partial(self._update), scans)  # Play on repeat

ScanIterator must behave as an iterator like the ScanSource, meaning it emits LidarScans. We define its __iter__ function to do this.

    # Return the scans iterator when instantiating the class
    def __iter__(self):
        return self._scans

Finally, we define the _update function, which actually runs inference on the NEAR_IR and REFLECTIVITY ChanFields individually. Note that in the ScanIterator __init__ function we mapped the _update function onto the underlying scans ScanSource iterator.

    def _update(self, scan: LidarScan) -> LidarScan:
        stacked_result_rgb = np.empty((scan.h*len(self.paired_list), scan.w, 3), np.uint8)
        for i, (field, ae, buc, model) in enumerate(self.paired_list):
            # Destagger the data to get a normal looking 3D image
            img = destagger(self._metadata, scan.field(field)).astype(np.float32)
            # Make the image more uniform and better exposed to make it similar to camera data YOLO is trained on
            ae(img)
            buc(img, update_state=True)

            # Convert to 3 channel uint8 for YOLO inference
            img_rgb = np.repeat(np.uint8(np.clip(np.rint(img*255), 0, 255))[..., np.newaxis], 3, axis=-1)

            # Run inference with the tracker module enabled so that instance ID's persist across frames
            results: Results = next(
                model.track(
                    [img_rgb],
                    stream=True,  # Reduce memory requirements for streaming
                    persist=True,  # Maintain tracks across sequential frames
                    conf=0.1,
                    # Force the inference to use full resolution. Must be multiple of 32, which all Ouster lidarscans conveniently are.
                    # Note that yolo performs best when the input image has pixels with square aspect ratio. This is true
                    # when the OS0-128 is set to 512 horizontal resolution, the OS1-128 is 1024, and the OS2-128 is 2048
                    imgsz=[img.shape[0], img.shape[1]],
                    classes=self.classes_to_detect
                )
            ).cpu()

            # Plot results. Masks don't display well when using SimpleViz
            img_rgb = results.plot(boxes=True, masks=self._use_opencv, line_width=1, font_size=3)
            if self._use_opencv:
                # Save stacked RGB images for opencv viewing
                stacked_result_rgb[i * scan.h:(i + 1) * scan.h, ...] = img_rgb
            else:
                # Overwrite grayscale Chanfield for visualization since SimpleViz cannot display RGB images
                scan.field(field)[:] = destagger(self._metadata, img_rgb[..., 0], inverse=True)

        # Display in the loop with opencv
        if self._use_opencv:
            cv2.imshow("results", stacked_result_rgb)
            cv2.waitKey(1)
        return scan

YOLO results as seen in SimpleViz. If you look carefully, you can see that the bottom 2D results image is also overlaid on the pointcloud view:

Copy the code above into a yolo.py file and you will be able to run the script from the command line:

python yolo.py [SENSOR_HOSTNAME | PATH_TO_PCAP]


UPDATE:

As of ouster-sdk version 0.12 you can use the LidarScan.add_field() functionality to visualize YOLO results as custom RGB fields in SimpleViz. Custom field functionality and visualization are also convenient tools for algorithm development:
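
In isolation, the add_field() call used in the revised script looks roughly like the sketch below; the attach_rgb_result helper and the field name are just illustrative, and the scan, metadata, and image come from your own pipeline:

from ouster.sdk.client import LidarScan, SensorInfo, destagger


def attach_rgb_result(scan: LidarScan, metadata: SensorInfo, img_rgb, name: str = "YOLO_RESULTS") -> None:
    # img_rgb is a destaggered (human-viewable) h x w x 3 uint8 image, so it is
    # re-staggered with inverse=True to match the raw column layout of the LidarScan
    scan.add_field(name, destagger(metadata, img_rgb, inverse=True))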

The revised script taking advantage of version 0.12 features is copied below:

import argparse
from functools import partial

import numpy as np
import cv2
from ultralytics import YOLO
from ultralytics.engine.results import Results
import torch

from ouster.sdk.client import ChanField, LidarScan, ScanSource, destagger, FieldClass
from ouster.sdk import open_source
from ouster.sdk.client._utils import AutoExposure, BeamUniformityCorrector
from ouster.sdk.viz import SimpleViz


class ScanIterator(ScanSource):

    if torch.cuda.is_available():
        DEVICE = "cuda"
    elif torch.backends.mps.is_available():
        DEVICE = "mps"
    else:
        DEVICE = "cpu"

    def __init__(self, scans: ScanSource, use_opencv=False):
        self._use_opencv = use_opencv
        self._metadata = scans.metadata

        # Load yolo pretrained model.
        # The example runs yolo on both near infrared and reflectivity channels so we create two independent models
        self.model_yolo_nir = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)
        self.model_yolo_ref = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)

        # Define classes to output results for.
        self.name_to_class = {}  # Make a reverse look up for convenience
        for key, value in self.model_yolo_nir.names.items():
            self.name_to_class[value] = key

        self.classes_to_detect = [
            self.name_to_class['person'],
            self.name_to_class['car'],
            self.name_to_class['truck'],
            self.name_to_class['bus']
        ]

        # Post-process the near_ir, and cal ref data to make it more camera-like using the
        # AutoExposure and BeamUniformityCorrector utility functions
        self.paired_list = [
            [ChanField.NEAR_IR, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_nir],
            [ChanField.REFLECTIVITY, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_ref]
        ]

        # Map the self._update function on to the scans iterator
        # the iterator will now run the self._update command before emitting the modified scan
        self._scans = map(partial(self._update), scans)  # Play on repeat

    # Return the scans iterator when instantiating the class
    def __iter__(self):
        return self._scans

    def _update(self, scan: LidarScan) -> LidarScan:
        stacked_result_rgb = np.empty((scan.h*len(self.paired_list), scan.w, 3), np.uint8)
        for i, (field, ae, buc, model) in enumerate(self.paired_list):
            # Destagger the data to get a normal looking 3D image
            img = destagger(self._metadata, scan.field(field)).astype(np.float32)
            # Make the image more uniform and better exposed to make it similar to camera data YOLO is trained on
            ae(img)
            buc(img, update_state=True)

            # Convert to 3 channel uint8 for YOLO inference
            img_rgb = np.repeat(np.uint8(np.clip(np.rint(img*255), 0, 255))[..., np.newaxis], 3, axis=-1)

            # Run inference with the tracker module enabled so that instance ID's persist across frames
            results: Results = next(
                model.track(
                    [img_rgb],
                    stream=True,  # Reduce memory requirements for streaming
                    persist=True,  # Maintain tracks across sequential frames
                    conf=0.1,
                    # Force the inference to use full resolution. Must be multiple of 32, which all Ouster lidarscans conveniently are.
                    # Note that yolo performs best when the input image has pixels with square aspect ratio. This is true
                    # when the OS0-128 is set to 512 horizontal resolution, the OS1-128 is 1024, and the OS2-128 is 2048
                    imgsz=[img.shape[0], img.shape[1]],
                    classes=self.classes_to_detect
                )
            ).cpu()

            # Plot results. Masks don't display well when using SimpleViz
            img_rgb = results.plot(boxes=True, masks=True, line_width=1, font_size=3)
            if self._use_opencv:
                # Save stacked RGB images for opencv viewing
                stacked_result_rgb[i * scan.h:(i + 1) * scan.h, ...] = img_rgb
            else:
                # Add a custom RGB results field to allow for displaying in SimpleViz
                scan.add_field(f"YOLO_{field}", destagger(self._metadata, img_rgb, inverse=True))

        # Display in the loop with opencv
        if self._use_opencv:
            cv2.imshow("results", stacked_result_rgb)
            cv2.waitKey(1)
        return scan


if __name__ == '__main__':
    # parse the command arguments
    parser = argparse.ArgumentParser(prog='sdk yolo demo',
                                     description='Runs a minimal demo of yolo post-processing')
    parser.add_argument('source', type=str, help='Sensor hostname or path to a sensor PCAP or OSF file')
    args = parser.parse_args()

    # Example for displaying results with opencv
    scans = ScanIterator(open_source(args.source, sensor_idx=0, cycle=True), use_opencv=True)
    for i, scan in enumerate(scans):
        if i > 20:  # break after N frames
            break

    # Example for displaying results with SimpleViz
    scans = ScanIterator(open_source(args.source, sensor_idx=0, cycle=True), use_opencv=False)
    SimpleViz(scans._metadata, rate=0).run(scans)

Thank you, this works great. Is there a way to access the pixels that lie within a mask and associate them back to the point cloud? It would be great to apply the classes to the points themselves.

Here’s a further revised script that adds a function to generate the filled instance and class images without the additional annotations.

It also contains examples of how to calculate the XYZ point cloud for each LidarScan, how to deal with the concepts of staggered and destaggered data, and how to pull out xyz and range data that corresponds to each instance id for further analysis.
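
The core of that per-instance lookup, distilled from the script below into a hedged sketch (median_xyz_per_instance is just an illustrative name; the mask image and instance ids come from the create_filled_masks step, and xyzlut is the callable created with XYZLut(metadata)):

import numpy as np
from ouster.sdk.client import ChanField, destagger


def median_xyz_per_instance(scan, metadata, xyzlut, instance_id_img, instance_ids):
    # instance_id_img is destaggered (image space), so the range and xyz data are destaggered too
    xyz_m = destagger(metadata, xyzlut(scan.field(ChanField.RANGE)))  # h x w x 3, meters
    range_mm = destagger(metadata, scan.field(ChanField.RANGE))       # h x w, millimeters
    valid = range_mm != 0                                             # ignore pixels with no lidar return
    medians = {}
    for iid in instance_ids:
        sel = (instance_id_img == iid) & valid
        if np.any(sel):
            medians[int(iid)] = np.median(xyz_m[sel, :], axis=0)      # median xyz of that object's points
    return medians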

import argparse
from functools import partial

import numpy as np
import cv2
from ultralytics import YOLO
from ultralytics.engine.results import Results
import torch

from ouster.sdk.client import ChanField, LidarScan, ScanSource, destagger, FieldClass, XYZLut
from ouster.sdk import open_source
from ouster.sdk.client._utils import AutoExposure, BeamUniformityCorrector
from ouster.sdk.viz import SimpleViz


class ScanIterator(ScanSource):

    if torch.cuda.is_available():
        DEVICE = "cuda"
    elif torch.backends.mps.is_available():
        DEVICE = "mps"
    else:
        DEVICE = "cpu"

    def __init__(self, scans: ScanSource, use_opencv=False):
        self._use_opencv = use_opencv
        self._metadata = scans.metadata

        # Since LidarScans are always fixed resolution imagery, we can create an efficient lookup table for
        # converting range data to XYZ point clouds
        self._xyzlut = XYZLut(self._metadata)

        # For nice viewing. Requires matplotlib
        self._generate_rgb_table()

        # Load yolo pretrained model.
        # The example runs yolo on both near infrared and reflectivity channels so we create two independent models
        self.model_yolo_nir = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)
        self.model_yolo_ref = YOLO("yolov9c-seg.pt").to(device=self.DEVICE)

        # Define classes to output results for.
        self.name_to_class = {}  # Make a reverse look up for convenience
        for key, value in self.model_yolo_nir.names.items():
            self.name_to_class[value] = key

        self.classes_to_detect = [
            self.name_to_class['person'],
            self.name_to_class['car'],
            self.name_to_class['truck'],
            self.name_to_class['bus']
        ]

        # Post-process the near_ir, and cal ref data to make it more camera-like using the
        # AutoExposure and BeamUniformityCorrector utility functions
        self.paired_list = [
            [ChanField.NEAR_IR, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_nir],
            [ChanField.REFLECTIVITY, AutoExposure(), BeamUniformityCorrector(), self.model_yolo_ref]
        ]

        # Map the self._update function on to the scans iterator
        # the iterator will now run the self._update command before emitting the modified scan
        self._scans = map(partial(self._update), scans)

    # Return the scans iterator when instantiating the class
    def __iter__(self):
        return self._scans

    def _generate_rgb_table(self):
        # This creates a lookup table for mapping the unsigned integer instance and class ids to floating point
        # RGB values in the range 0 to 1
        import matplotlib.pyplot as plt
        import matplotlib as mpl
        from matplotlib import cm
        # Make some colors for visualizing bounding boxes
        np.random.seed(0)
        N_COLORS = 256
        scalarMap = cm.ScalarMappable(norm=mpl.colors.Normalize(vmin=0, vmax=1.0), cmap=mpl.pyplot.get_cmap('hsv'))
        self._mono_to_rgb_lut = np.clip(0.25 + 0.75 * scalarMap.to_rgba(np.random.random_sample((N_COLORS)))[:, :3], 0, 1)
        self._mono_to_rgb_lut = self._mono_to_rgb_lut.astype(np.float32)

    def mono_to_rgb(self, mono_img, background_img=None):
        """
        Takes instance or class integer images and creates a floating point RGB image with rainbow colors and an
        optional background image.
        """
        assert(np.issubdtype(mono_img.dtype, np.integer))
        rgb = self._mono_to_rgb_lut[mono_img % self._mono_to_rgb_lut.shape[0], :]
        if background_img is not None:
            if background_img.shape[-1] == 3:
                rgb[mono_img == 0, :] = background_img[mono_img == 0, :]
            else:
                rgb[mono_img == 0, :] = background_img[mono_img == 0, np.newaxis]
        else:
            rgb[mono_img == 0, :] = 0
        return rgb

    def _update(self, scan: LidarScan) -> LidarScan:
        stacked_result_rgb = np.empty((scan.h*len(self.paired_list), scan.w, 3), np.uint8)
        for i, (field, ae, buc, model) in enumerate(self.paired_list):

            # Destagger the data to get a human-interpretable, camera-like image
            img_mono = destagger(self._metadata, scan.field(field)).astype(np.float32)
            # Make the image more uniform and better exposed to make it similar to camera data YOLO is trained on
            ae(img_mono)
            buc(img_mono, update_state=True)

            # Convert to 3 channel uint8 for YOLO inference
            img_rgb = np.repeat(np.uint8(np.clip(np.rint(img_mono*255), 0, 255))[..., np.newaxis], 3, axis=-1)

            # Run inference with the tracker module enabled so that instance ID's persist across frames
            results: Results = next(
                model.track(
                    [img_rgb],
                    stream=True,  # Reduce memory requirements for streaming
                    persist=True,  # Maintain tracks across sequential frames
                    conf=0.1,
                    # Force the inference to use full resolution. Must be multiple of 32, which all Ouster lidarscans conveniently are.
                    # Note that yolo performs best when the input image has pixels with square aspect ratio. This is true
                    # when the OS0-128 is set to 512 horizontal resolution, the OS1-128 is 1024, and the OS2-128 is 2048
                    imgsz=[img_rgb.shape[0], img_rgb.shape[1]],
                    classes=self.classes_to_detect
                )
            ).cpu()

            # Plot results using the ultralytics results plotting. You can skip this if you'd rather use the
            # create_filled_masks functionality
            img_rgb_with_results = results.plot(boxes=True, masks=True, line_width=1, font_size=3)
            if self._use_opencv:
                # Save stacked RGB images for opencv viewing
                stacked_result_rgb[i * scan.h:(i + 1) * scan.h, ...] = img_rgb_with_results
            else:
                # Add a custom RGB results field to allow for displaying in SimpleViz
                scan.add_field(f"YOLO_RESULTS_{field}", destagger(self._metadata, img_rgb_with_results, inverse=True))

                # Alternative method for generating filled mask instance and class images
                # CAREFUL: These images are destaggered - human viewable. Whereas the raw field data in a LidarScan
                # is staggered.
                instance_id_img, class_id_img, instance_ids, class_ids = self.create_filled_masks(results, scan)

                # Example: Get xyz and range data slices that correspond to each instance id
                xyz_meters = self._xyzlut(scan.field(ChanField.RANGE))  # Get the xyz pointcloud for the entire LidarScan
                range_mm = scan.field(ChanField.RANGE)

                # It's more intuitive to work in human-viewable image-space so we choose to destagger the xyz and range data
                xyz_meters = destagger(self._metadata, xyz_meters)
                range_mm = destagger(self._metadata, range_mm)
                valid = range_mm != 0  # Ignore non-detected points
                for instance_id in instance_ids:
                    data_slice = (instance_id_img == instance_id) & valid
                    xyz_slice = xyz_meters[data_slice, :]  # The xyz data corresponding to an instance id
                    range_slice_mm = range_mm[data_slice]  # The range data corresponding to an instance id
                    # Example: Calculate the median range and xyz location to each detected object
                    print(f"ID {instance_id}: {np.median(range_slice_mm)/1000:0.2f} m, {np.array2string(np.median(xyz_slice, axis=0), precision=2)} m")

                # Add the data to the LidarScan for visualization. Always re-stagger (inverse=True)
                # the data to put it in the correct columns of the LidarScan. SimpleViz destaggers the data for human
                # viewing.
                scan.add_field(f"INSTANCE_ID_{field}", destagger(self._metadata, instance_id_img, inverse=True))
                scan.add_field(f"CLASS_ID_{field}", destagger(self._metadata, class_id_img, inverse=True))

                scan.add_field(f"RGB_INSTANCE_ID_{field}", destagger(self._metadata, self.mono_to_rgb(instance_id_img, img_mono), inverse=True))


        # Display in the loop with opencv
        if self._use_opencv:
            cv2.imshow("results", stacked_result_rgb)
            cv2.waitKey(1)
        return scan

    def create_filled_masks(self, results: Results, scan: LidarScan):
        instance_ids = np.empty(0, np.uint32)  # Keep track of which instances are kept
        class_ids = np.empty(0, np.uint32)  # Keep track of which classes are kept
        if results.boxes.id is not None and results.masks is not None:
            mask_edges = results.masks.xy
            orig_instance_ids = np.uint32(results.boxes.id.int())
            orig_class_ids = np.uint32(results.boxes.cls.int())
            # opencv drawContours requires 3-channel float32 image. We'll convert back to uint32 at the end
            instance_id_img = np.zeros((scan.h, scan.w, 3), np.float32)
            # Process ids in reverse order to ensure older instances overwrite newer ones in case of overlap
            for edge, instance_id, class_id in zip(mask_edges[::-1], orig_instance_ids[::-1], orig_class_ids[::-1]):
                if len(edge) != 0:  # It is possible to have an instance with zero edge length. Error check this case
                    instance_id_img = cv2.drawContours(instance_id_img, [np.int32([edge])], -1, color=[np.float64(instance_id), 0, 0], thickness=-1)
                    instance_ids = np.append(instance_ids, instance_id)
                    class_ids = np.append(class_ids, class_id)
            instance_id_img = instance_id_img[..., 0].astype(np.uint32)  # Convert to 1-channel image

            # Remove any instance_ids that were fully overwritten by an overlapping mask
            in_bool = np.isin(instance_ids, instance_id_img)
            instance_ids = instance_ids[in_bool]
            class_ids = class_ids[in_bool]
        else:
            instance_id_img = np.zeros((scan.h, scan.w), np.uint32)

        # Last step make the class id image using a lookup table from instances to classes
        if instance_ids.size > 0:
            instance_to_class_lut = np.arange(0, np.max(instance_ids) + 1, dtype=np.uint32)
            instance_to_class_lut[instance_ids] = class_ids
            class_id_img = instance_to_class_lut[instance_id_img]
        else:
            class_id_img = np.zeros((scan.h, scan.w), np.uint32)

        return instance_id_img, class_id_img, instance_ids, class_ids


if __name__ == '__main__':
    # parse the command arguments
    parser = argparse.ArgumentParser(prog='sdk yolo demo',
                                     description='Runs a minimal demo of yolo post-processing')
    parser.add_argument('source', type=str, help='Sensor hostname or path to a sensor PCAP or OSF file')
    args = parser.parse_args()

    # Example for displaying results with opencv
    scans = ScanIterator(open_source(args.source, sensor_idx=0, cycle=True), use_opencv=True)
    for i, scan in enumerate(scans):
        if i > 10:  # break after N frames
            break

    # Example for displaying results with SimpleViz
    scans = ScanIterator(open_source(args.source, sensor_idx=0, cycle=True), use_opencv=False)
    SimpleViz(scans._metadata, rate=0).run(scans)

You should be able to view the 1-channel instance and class id masks in SimpleViz. Here you can see the INSTANCE_ID mask on top and then a colorized version with the near infrared imagery below it, which is also automatically mapped to the point cloud in the viewer:


Is it possible to get this updated for 0.15? A lot has changed in the SDK since this was posted.

Sure. Updated for Ouster SDK 0.15 and using YOLO11 as well. Tested on Python 3.12.

import argparse
from typing import List
import numpy as np
import cv2

from ultralytics import YOLO
from ultralytics.engine.results import Results
import torch

from ouster.sdk import open_source
from ouster.sdk.core import ChanField, LidarScan, ScanSource, destagger, FieldClass, XYZLut
from ouster.sdk.core._utils import AutoExposure, BeamUniformityCorrector
from ouster.sdk.viz import SimpleViz

class SourceIterator:

    if torch.cuda.is_available():
        DEVICE = "cuda"
    elif torch.backends.mps.is_available():
        DEVICE = "mps"
    else:
        DEVICE = "cpu"

    def __init__(self, source: ScanSource, use_opencv=False):
        self._source = source
        self._use_opencv = use_opencv
        self._sensor_info = source.sensor_info

        # Since LidarScans are always fixed resolution imagery, we can create an efficient lookup table for
        # converting range data to XYZ point clouds
        self._xyzlut = XYZLut(self._sensor_info[0])

        # For nice viewing. Requires matplotlib
        self._generate_rgb_table()

        # Post-process the near_ir, and cal ref data to make it more camera-like using the
        # AutoExposure and BeamUniformityCorrector utility functions
        # The example runs yolo on both near infrared and reflectivity channels so we create two independent models
        # Because each model holds some state because we are using the tracker functionality
        self.field_to_util = {}
        for field in [ChanField.NEAR_IR, ChanField.REFLECTIVITY]:
            self.field_to_util[field] = {
                "ae": AutoExposure(),
                "buc": BeamUniformityCorrector(),
                "model": YOLO("yolo11x-seg.pt").to(device=self.DEVICE) # This is the biggest YOLO11 model
            }

        # Define classes to output results for.
        self.name_to_class = {}  # Make a reverse look up for convenience
        for key, value in self.field_to_util[ChanField.NEAR_IR]["model"].names.items():
            self.name_to_class[value] = key

        self.classes_to_detect = [
            self.name_to_class['person'],
            self.name_to_class['car'],
            self.name_to_class['truck'],
            self.name_to_class['bus']
        ]


    @property
    def sensor_info(self):
        return self._sensor_info

    # Return the scans iterator when instantiating the class
    def __iter__(self):
        for scans in self._source:
            yield self.update(scans)

    def _generate_rgb_table(self):
        # This creates a lookup table for mapping the unsigned integer instance and class ids to floating point
        # RGB values in the range 0 to 1
        import matplotlib.pyplot as plt
        import matplotlib as mpl
        from matplotlib import cm
        # Make some colors for visualizing bounding boxes
        np.random.seed(0)
        N_COLORS = 256
        scalarMap = cm.ScalarMappable(norm=mpl.colors.Normalize(vmin=0, vmax=1.0), cmap=mpl.pyplot.get_cmap('hsv'))
        self._mono_to_rgb_lut = np.clip(0.25 + 0.75 * scalarMap.to_rgba(np.random.random_sample((N_COLORS)))[:, :3], 0, 1)
        self._mono_to_rgb_lut = self._mono_to_rgb_lut.astype(np.float32)

    def mono_to_rgb(self, mono_img, background_img=None):
        """
        Takes instance or class integer images and creates a floating point RGB image with rainbow colors and an
        optional background image.
        """
        assert(np.issubdtype(mono_img.dtype, np.integer))
        rgb = self._mono_to_rgb_lut[mono_img % self._mono_to_rgb_lut.shape[0], :]
        if background_img is not None:
            if background_img.shape[-1] == 3:
                rgb[mono_img == 0, :] = background_img[mono_img == 0, :]
            else:
                rgb[mono_img == 0, :] = background_img[mono_img == 0, np.newaxis]
        else:
            rgb[mono_img == 0, :] = 0
        return rgb

    def update(self, scans: List[LidarScan]) -> List[LidarScan]:
        scan = scans[0] # For a multi-sensor data set we would need to process all scans in the scans list in a loop
        stacked_result_rgb = np.empty((scan.h*len(self.field_to_util.keys()), scan.w, 3), np.uint8)
        for i, field in enumerate(self.field_to_util.keys()):

            # Destagger the data to get a human-interpretable, camera-like image
            img_mono = destagger(scan.sensor_info, scan.field(field)).astype(np.float32)
            # Make the image more uniform and better exposed to make it similar to camera data YOLO is trained on
            self.field_to_util[field]["ae"](img_mono)
            self.field_to_util[field]["buc"](img_mono, update_state=True)

            # Convert to 3 channel uint8 for YOLO inference
            img_rgb = np.repeat(np.uint8(np.clip(np.rint(img_mono*255), 0, 255))[..., np.newaxis], 3, axis=-1)
            # You can increase the internal image size beyond the input size for better results. Hence the factor of two
            imgsz = [img_rgb.shape[0] * 2, img_rgb.shape[1] * 2]  # Must be multiple of 32.

            # Run inference with the tracker module enabled so that instance ID's persist across frames
            results: Results = next(
                self.field_to_util[field]["model"].track(
                    [img_rgb],
                    stream=True,  # Reduce memory requirements for streaming
                    persist=True,  # Maintain tracks across sequential frames
                    conf=0.1,
                    # Force the inference to use full resolution. Must be multiple of 32, which all Ouster lidarscans conveniently are.
                    # Note that yolo performs best when the input image has pixels with square aspect ratio. This is true
                    # when the OS0-128 is set to 512 horizontal resolution, the OS1-128 is 1024, and the OS2-128 is 2048
                    imgsz=imgsz,
                    classes=self.classes_to_detect,
                    retina_masks=True,  # Higher resolution masks
                )
            ).cpu()

            # Plot results using the ultralytics results plotting. You can skip this if you'd rather use the
            # create_filled_masks functionality
            img_rgb_with_results = results.plot(boxes=True, masks=True, line_width=1, font_size=3)
            if self._use_opencv:
                # Save stacked RGB images for opencv viewing
                stacked_result_rgb[i * scan.h:(i + 1) * scan.h, ...] = img_rgb_with_results
            else:
                # Add a custom RGB results field to allow for displaying in SimpleViz
                # scan.add_field(f"YOLO_RESULTS_{field}", destagger(scan.sensor_info, img_rgb_with_results, inverse=True))

                # Alternative method for generating filled mask instance and class images
                # CAREFUL: These images are destaggered - human viewable. Whereas the raw field data in a LidarScan
                # is staggered.
                instance_id_img, class_id_img, instance_ids, class_ids = self._create_filled_masks(results, scan)

                # Example: Get xyz and range data slices that correspond to each instance id
                xyz_meters = self._xyzlut(scan.field(ChanField.RANGE))  # Get the xyz pointcloud for the entire LidarScan
                range_mm = scan.field(ChanField.RANGE)

                # It's more intuitive to work in human-viewable image-space so we choose to destagger the xyz and range data
                xyz_meters = destagger(scan.sensor_info, xyz_meters)
                range_mm = destagger(scan.sensor_info, range_mm)
                valid = range_mm != 0  # Ignore non-detected points
                for instance_id in instance_ids:
                    data_slice = (instance_id_img == instance_id) & valid
                    xyz_slice = xyz_meters[data_slice, :]  # The xyz data corresponding to an instance id
                    range_slice_mm = range_mm[data_slice]  # The range data corresponding to an instance id
                    # Example: Calculate the median range and xyz location to each detected object
                    print(f"ID {instance_id}: {np.median(range_slice_mm)/1000:0.2f} m, {np.array2string(np.median(xyz_slice, axis=0), precision=2)} m")

                # Add the data to the LidarScan for visualization. Always re-stagger (inverse=True)
                # the data to put it in the correct columns of the LidarScan. SimpleViz destaggers the data for human
                # viewing.
                scan.add_field(f"INSTANCE_ID_{field}", destagger(scan.sensor_info, instance_id_img, inverse=True))
                scan.add_field(f"CLASS_ID_{field}", destagger(scan.sensor_info, class_id_img, inverse=True))

                scan.add_field(f"RGB_INSTANCE_ID_{field}", destagger(scan.sensor_info, self.mono_to_rgb(instance_id_img, img_mono), inverse=True))

        # Display in the loop with opencv
        if self._use_opencv:
            cv2.imshow("results", stacked_result_rgb)
            cv2.waitKey(1)
        return scans

    def _create_filled_masks(self, results: Results, scan: LidarScan):
        instance_ids = np.empty(0, np.uint32)  # Keep track of which instances are kept
        class_ids = np.empty(0, np.uint32)  # Keep track of which classes are kept
        if results.boxes.id is not None and results.masks is not None:
            mask_edges = results.masks.xy
            orig_instance_ids = np.uint32(results.boxes.id.int())
            orig_class_ids = np.uint32(results.boxes.cls.int())
            # opencv drawContours requires 3-channel float32 image. We'll convert back to uint32 at the end
            instance_id_img = np.zeros((scan.h, scan.w, 3), np.float32)
            # Process ids in reverse order to ensure older instances overwrite newer ones in case of overlap
            for edge, instance_id, class_id in zip(mask_edges[::-1], orig_instance_ids[::-1], orig_class_ids[::-1]):
                if len(edge) != 0:  # It is possible to have an instance with zero edge length. Error check this case
                    instance_id_img = cv2.drawContours(instance_id_img, [np.int32([edge])], -1, color=[np.float64(instance_id), 0, 0], thickness=-1)
                    instance_ids = np.append(instance_ids, instance_id)
                    class_ids = np.append(class_ids, class_id)
            instance_id_img = instance_id_img[..., 0].astype(np.uint32)  # Convert to 1-channel image

            # Remove any instance_ids that were fully overwritten by an overlapping mask
            in_bool = np.isin(instance_ids, instance_id_img)
            instance_ids = instance_ids[in_bool]
            class_ids = class_ids[in_bool]
        else:
            instance_id_img = np.zeros((scan.h, scan.w), np.uint32)

        # Last step make the class id image using a lookup table from instances to classes
        if instance_ids.size > 0:
            instance_to_class_lut = np.arange(0, np.max(instance_ids) + 1, dtype=np.uint32)
            instance_to_class_lut[instance_ids] = class_ids
            class_id_img = instance_to_class_lut[instance_id_img]
        else:
            class_id_img = np.zeros((scan.h, scan.w), np.uint32)

        return instance_id_img, class_id_img, instance_ids, class_ids


if __name__ == '__main__':
    # parse the command arguments
    parser = argparse.ArgumentParser(prog='sdk yolo demo',
                                     description='Runs a minimal demo of yolo post-processing')
    parser.add_argument('source', type=str, help='Sensor hostname or path to a sensor PCAP or OSF file')
    args = parser.parse_args()
    # Example for displaying results with opencv
    source = SourceIterator(open_source(args.source), use_opencv=True)
    for i, scans in enumerate(source):
        if i > 10:  # break after N frames
            break

    # Example for displaying results with SimpleViz
    source = SourceIterator(open_source(args.source), use_opencv=False)
    SimpleViz(source.sensor_info, rate=0).run(source)

Hi, I tried using your latest script compatible with SDK 0.15, providing Urban_Drive.pcap from Ouster Studio as input. Although the model appears to produce some detections, nothing is displayed in SimpleViz.

In SimpleViz, you can use the ‘b’, ‘n’, and ‘m’ keys to cycle which fields are displayed in the 2D images and the 3D point cloud. You can press “SHIFT+?” to get a list of the full key mappings for SimpleViz.

If that wasn’t the issue, please post a screen grab of SimpleViz with the RGB_INSTANCE_ID field visible in the 2D view.


Did you solve your issue @manuelamarenghi, and if so can you comment what was wrong? Thanks!

Yes, this partially addresses the issue. However, I’m still unable to achieve the same level of accuracy in the results. I suspect this is due to differences in the input data: the .pcap file I’m using is generated from OSx-32, whereas yours is from OSx-128, which has a more square aspect ratio and a resolution closer to that of the dataset used to train YOLO.
Aside from fine-tuning the model on my specific input format, do you have any other recommendations?

You are correct that the resolution of the input imagery is an important distinction. I would not expect to get great results with a 32 channel resolution input, in the same way that running YOLO on a 32x32 pixel resolution camera image is unlikely to provide good performance.

Running with Square Aspect Ratio Data
One thing that should help is to repeat the rows before running inference so that the data at least has a square aspect ratio (this applies whether you have a 32 channel or 128 channel lidar). If you are running a 32 channel OS1 in 1024 mode, you need to duplicate rows by 4x to achieve a 128x1024 resolution image with “square” aspect ratio visual information. Other lidar modes and sensor types will require different row repeat factors - and in some cases column repeat factors - which can be determined by finding the ratio of a pixel’s horizontal and vertical angular subtense in a given lidar mode.

In your case for an OS1-32 with a 45° vertical FOV in 1024 HRES mode: (45°/32 channels) / (360° / 1024) = 4

As another example, an OS1-128 running in 512 HRES mode needs 2x column repetition: (45°/128 channels) / (360° / 512) = 0.5
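
A rough sketch of that row repetition for the OS1-32 case above (img_rgb stands in for the 3-channel image built in update(); a dummy frame is used here so the snippet runs on its own, and the field of view is an assumption you should replace with your sensor's value):

import numpy as np

# Stand-in for the destaggered 3-channel uint8 image built in update(); OS1-32 in 1024 mode
img_rgb = np.zeros((32, 1024, 3), np.uint8)

v_fov_deg = 45.0  # OS1 vertical field of view; use your sensor's value
rows, cols = img_rgb.shape[:2]
repeat_rows = round((v_fov_deg / rows) / (360.0 / cols))  # = 4 for an OS1-32 in 1024 mode

img_rgb_square = np.repeat(img_rgb, repeat_rows, axis=0)  # (32, 1024, 3) -> (128, 1024, 3)

# Pass img_rgb_square to model.track(). Any pixel coordinates YOLO returns (boxes, mask
# contours) are in the stretched image, so divide the row coordinates by repeat_rows
# before mapping results back onto the 32-row LidarScan.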


Thank you, this helped and I’m already getting better results. I’m using 32 channels in 1024 mode and have stretched my input to 128×1024.

This appears to be a popular thread and resource for people wishing to get started running inference on Ouster lidar data. Is it possible to get an update for 0.16.0? There are many breaking changes in this release. Thank you so much.

Sure. Updated for Ouster SDK 0.16, using YOLO11, and tested on Python 3.12. This has also been updated to support multi-sensor sources by processing each lidar in a multi-lidar recording with its own YOLO models.

Save to a file named yolo.py
Usage: python yolo.py [SENSOR_HOSTNAME | PATH_TO_PCAP_OSF_BAG]

import argparse
from typing import List
import numpy as np
import cv2

from ultralytics import YOLO
from ultralytics.engine.results import Results
import torch

from ouster.sdk import open_source
from ouster.sdk.core import (ChanField, LidarScan, LidarScanSet, ScanSource, destagger, XYZLut, SensorInfo,
    AutoExposure, BeamUniformityCorrector)
from ouster.sdk.viz import SimpleViz

class SourceIterator:

    if torch.cuda.is_available():
        DEVICE = "cuda"
    elif torch.backends.mps.is_available():
        DEVICE = "mps"
    else:
        DEVICE = "cpu"

    def __init__(self, source: ScanSource, use_opencv=False):
        self._source: ScanSource = source
        self._use_opencv = use_opencv
        # One sensor_info for every lidar in the source
        self._sensor_info: List[SensorInfo] = source.sensor_info

        # Since LidarScans are always fixed resolution imagery, we can create an efficient lookup table for
        # converting range data to XYZ point clouds. Make one lookup for each lidar in the source
        self._xyzluts = [XYZLut(info) for info in self._sensor_info]

        # For nice viewing. Requires matplotlib
        self._generate_rgb_table()

        # Post-process the near_ir, and cal ref data to make it more camera-like using the
        # AutoExposure and BeamUniformityCorrector utility functions
        # The example runs yolo on both near infrared and reflectivity channels. We create two independent models
        # because we are using the tracker functionality which holds some state. We then duplicate all of this information
        # for each lidar in the source so that each lidar has its own state
        self.field_to_util = []
        for _ in self._sensor_info:
            self.field_to_util.append({})
            for field in [ChanField.NEAR_IR, ChanField.REFLECTIVITY]:
                self.field_to_util[-1][field] = {
                    "ae": AutoExposure(),
                    "buc": BeamUniformityCorrector(),
                    "model": YOLO("yolo11x-seg.pt").to(device=self.DEVICE) # This is the biggest YOLO11 model
                }

        # Define classes to output results for.
        self.name_to_class = {}  # Make a reverse look up for convenience
        for key, value in self.field_to_util[0][ChanField.NEAR_IR]["model"].names.items():
            self.name_to_class[value] = key

        self.classes_to_detect = [
            self.name_to_class['person'],
            self.name_to_class['car'],
            self.name_to_class['truck'],
            self.name_to_class['bus']
        ]


    @property
    def sensor_info(self) -> List[SensorInfo]:
        return self._sensor_info

    # Return the scans iterator when instantiating the class
    def __iter__(self):
        for scans in self._source:
            yield self.update(scans)

    def _generate_rgb_table(self):
        # This creates a lookup table for mapping the unsigned integer instance and class ids to floating point
        # RGB values in the range 0 to 1
        import matplotlib.pyplot as plt
        import matplotlib as mpl
        from matplotlib import cm
        # Make some colors for visualizing bounding boxes
        np.random.seed(0)
        N_COLORS = 256
        scalarMap = cm.ScalarMappable(norm=mpl.colors.Normalize(vmin=0, vmax=1.0), cmap=mpl.pyplot.get_cmap('hsv'))
        self._mono_to_rgb_lut = np.clip(0.25 + 0.75 * scalarMap.to_rgba(np.random.random_sample((N_COLORS)))[:, :3], 0, 1)
        self._mono_to_rgb_lut = self._mono_to_rgb_lut.astype(np.float32)

    def mono_to_rgb(self, mono_img, background_img=None):
        """
        Takes instance or class integer images and creates a floating point RGB image with rainbow colors and an
        optional background image.
        """
        assert(np.issubdtype(mono_img.dtype, np.integer))
        rgb = self._mono_to_rgb_lut[mono_img % self._mono_to_rgb_lut.shape[0], :]
        if background_img is not None:
            if background_img.shape[-1] == 3:
                rgb[mono_img == 0, :] = background_img[mono_img == 0, :]
            else:
                rgb[mono_img == 0, :] = background_img[mono_img == 0, np.newaxis]
        else:
            rgb[mono_img == 0, :] = 0
        return rgb

    def update(self, scans: LidarScanSet) -> LidarScanSet:
        # Process each LidarScan in the LidarScanSet independently
        for ith_scan, scan in enumerate(scans):
            stacked_result_rgb = np.empty((scan.h*len(self.field_to_util[ith_scan].keys()), scan.w, 3), np.uint8)
            for i, field in enumerate(self.field_to_util[ith_scan].keys()):
                # Destagger the data to get a human-interpretable, camera-like image
                img_mono = destagger(scan.sensor_info, scan.field(field)).astype(np.float32)
                # Make the image more uniform and better exposed to make it similar to camera data YOLO is trained on
                self.field_to_util[ith_scan][field]["ae"](img_mono)
                self.field_to_util[ith_scan][field]["buc"](img_mono, update_state=True)

                # Convert to 3 channel uint8 for YOLO inference
                img_rgb = np.repeat(np.uint8(np.clip(np.rint(img_mono*255), 0, 255))[..., np.newaxis], 3, axis=-1)
                # You can increase the internal image size beyond the input size for better results. Hence the factor of two
                imgsz = [img_rgb.shape[0] * 2, img_rgb.shape[1] * 2]  # Must be multiple of 32.

                # Run inference with the tracker module enabled so that instance ID's persist across frames
                results: Results = next(
                    self.field_to_util[ith_scan][field]["model"].track(
                        [img_rgb],
                        stream=True,  # Reduce memory requirements for streaming
                        persist=True,  # Maintain tracks across sequential frames
                        conf=0.1,
                        # Force the inference to use full resolution. Must be multiple of 32, which all Ouster lidarscans conveniently are.
                        # Note that yolo performs best when the input image has pixels with square aspect ratio. This is true
                        # when the OS0-128 is set to 512 horizontal resolution, the OS1-128 is 1024, and the OS2-128 is 2048
                        imgsz=imgsz,
                        classes=self.classes_to_detect,
                        retina_masks=True,  # Higher resolution masks
                    )
                ).cpu()

                # Plot results using the ultralytics results plotting. You can skip this if you'd rather use the
                # create_filled_masks functionality
                img_rgb_with_results = results.plot(boxes=True, masks=True, line_width=1, font_size=3)
                if self._use_opencv:
                    # Save stacked RGB images for opencv viewing
                    stacked_result_rgb[i * scan.h:(i + 1) * scan.h, ...] = img_rgb_with_results
                else:
                    # Add a custom RGB results field to allow for displaying in SimpleViz
                    # scan.add_field(f"YOLO_RESULTS_{field}", destagger(scan.sensor_info, img_rgb_with_results, inverse=True))

                    # Alternative method for generating filled mask instance and class images
                    # CAREFUL: These images are destaggered - human viewable. Whereas the raw field data in a LidarScan
                    # is staggered.
                    instance_id_img, class_id_img, instance_ids, class_ids = self._create_filled_masks(results, scan)

                    # Example: Get xyz and range data slices that correspond to each instance id
                    xyz_meters = self._xyzluts[ith_scan](scan.field(ChanField.RANGE))  # Get the xyz pointcloud for the entire LidarScan
                    range_mm = scan.field(ChanField.RANGE)

                    # It's more intuitive to work in human-viewable image-space so we choose to destagger the xyz and range data
                    xyz_meters = destagger(scan.sensor_info, xyz_meters)
                    range_mm = destagger(scan.sensor_info, range_mm)
                    valid = range_mm != 0  # Ignore non-detected points
                    for instance_id in instance_ids:
                        data_slice = (instance_id_img == instance_id) & valid
                        xyz_slice = xyz_meters[data_slice, :]  # The xyz data corresponding to an instance id
                        range_slice_mm = range_mm[data_slice]  # The range data corresponding to an instance id
                        # Example: Calculate the median range and xyz location to each detected object
                        print(f"ID {instance_id}: {np.median(range_slice_mm)/1000:0.2f} m, {np.array2string(np.median(xyz_slice, axis=0), precision=2)} m")

                    # Add the data to the LidarScan for visualization. Always re-stagger (inverse=True)
                    # the data to put it in the correct columns of the LidarScan. SimpleViz destaggers the data for human
                    # viewing.
                    scan.add_field(f"INSTANCE_ID_{field}", destagger(scan.sensor_info, instance_id_img, inverse=True))
                    scan.add_field(f"CLASS_ID_{field}", destagger(scan.sensor_info, class_id_img, inverse=True))

                    scan.add_field(f"RGB_INSTANCE_ID_{field}", destagger(scan.sensor_info, self.mono_to_rgb(instance_id_img, img_mono), inverse=True))

            # Display in the loop with opencv
            if self._use_opencv:
                cv2.imshow("results", stacked_result_rgb)
                cv2.waitKey(1)
        return scans

    def _create_filled_masks(self, results: Results, scan: LidarScan):
        instance_ids = np.empty(0, np.uint32)  # Keep track of which instances are kept
        class_ids = np.empty(0, np.uint32)  # Keep track of which classes are kept
        if results.boxes.id is not None and results.masks is not None:
            mask_edges = results.masks.xy
            orig_instance_ids = np.uint32(results.boxes.id.int())
            orig_class_ids = np.uint32(results.boxes.cls.int())
            # opencv drawContours requires 3-channel float32 image. We'll convert back to uint32 at the end
            instance_id_img = np.zeros((scan.h, scan.w, 3), np.float32)
            # Process ids in reverse order to ensure older instances overwrite newer ones in case of overlap
            for edge, instance_id, class_id in zip(mask_edges[::-1], orig_instance_ids[::-1], orig_class_ids[::-1]):
                if len(edge) != 0:  # It is possible to have an instance with zero edge length. Error check this case
                    instance_id_img = cv2.drawContours(instance_id_img, [np.int32([edge])], -1, color=[np.float64(instance_id), 0, 0], thickness=-1)
                    instance_ids = np.append(instance_ids, instance_id)
                    class_ids = np.append(class_ids, class_id)
            instance_id_img = instance_id_img[..., 0].astype(np.uint32)  # Convert to 1-channel image

            # Remove any instance_ids that were fully overwritten by an overlapping mask
            in_bool = np.isin(instance_ids, instance_id_img)
            instance_ids = instance_ids[in_bool]
            class_ids = class_ids[in_bool]
        else:
            instance_id_img = np.zeros((scan.h, scan.w), np.uint32)

        # Last step make the class id image using a lookup table from instances to classes
        if instance_ids.size > 0:
            instance_to_class_lut = np.arange(0, np.max(instance_ids) + 1, dtype=np.uint32)
            instance_to_class_lut[instance_ids] = class_ids
            class_id_img = instance_to_class_lut[instance_id_img]
        else:
            class_id_img = np.zeros((scan.h, scan.w), np.uint32)

        return instance_id_img, class_id_img, instance_ids, class_ids


if __name__ == '__main__':
    # parse the command arguments
    parser = argparse.ArgumentParser(prog='sdk yolo demo',
                                     description='Runs a minimal demo of yolo post-processing')
    parser.add_argument('source', type=str, help='Sensor hostname or path to a sensor PCAP or OSF file')
    args = parser.parse_args()
    # Example for displaying results with opencv
    source = SourceIterator(open_source(args.source), use_opencv=True)
    for i, scans in enumerate(source):
        if i > 10:  # break after N frames
            break

    # Example for displaying results with SimpleViz
    source = SourceIterator(open_source(args.source), use_opencv=False)
    SimpleViz(source.sensor_info, rate=0).run(source)