The resolution of the reflectivity and IR images was 2048 x 128. When running inference at this resolution, the detection quality was not satisfactory, so I changed the resolution to 1024 x 128 by removing every second column, and the results have been great. However, I am now unable to project the segmented objects back into the point cloud to extract the range information of the detected objects.
Because of the downsampling step that you performed?
The reason the 1024x10 mode of the OS1 works better is that in this mode the lidar produces imagery whose pixels have a square aspect ratio, like a camera (on an OS1-128, for example, the ~45° vertical field of view over 128 rows gives roughly 0.35° per pixel vertically, which matches 360° / 1024 ≈ 0.35° per pixel horizontally).
A quick fix, which will run slower because of the higher-resolution images, is to duplicate the rows of the 128x2048 image to make a 256x2048 image instead of removing columns. This creates a square-aspect-ratio image just like deleting columns does, but doesn't throw away information. Though, again, it runs slower.
You can do this with:
img_256 = np.repeat(img, repeats=2, axis=0)  # duplicate each row: (128, 2048) -> (256, 2048)
Either way, once you run inference on an upsampled or downsampled image, to put everything back into 128x2048 space you will either remove every other row from the results imagery or duplicate every column.
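Concretely, a minimal sketch for the results imagery (results_img is an illustrative name for a 2-D NumPy array at the inference resolution):

# if you inferred on the 256x2048 (row-duplicated) image:
results_128x2048 = results_img[::2, :]
# if you inferred on the 128x1024 (column-dropped) image:
results_128x2048 = np.repeat(results_img, 2, axis=1)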
An alternative solution to downsampling the results imagery would be to manually scale the coordinates of the results data: multiply the bbox and mask x-positions by two if you've deleted columns, or divide all of the y-positions by two if you've added rows. Then apply the results data to the original 128x2048 image.
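With Ultralytics-style results, for example, a rough sketch for the column-dropped case (assuming results is an Ultralytics Results object and np is NumPy):

boxes_xywh = results.boxes.xywh.clone()
boxes_xywh[:, [0, 2]] *= 2  # scale center-x and width from 1024 back to 2048 columns
mask_polygons = [m * np.array([2.0, 1.0]) for m in results.masks.xy]  # scale mask polygon x-coordinates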
“An alternative solution to downsampling the results imagery would be to manually scale the coordinates of the results data: multiply the bbox and mask x-positions by two if you've deleted columns, or divide all of the y-positions by two if you've added rows. Then apply the results data to the original 128x2048 image.”
Does that mean the azimuth and elevation angle per pixel should be the same?
Is it possible to project the segmented masks from the reflectivity channel onto the range point cloud?
When you say project the segmented masks, do you mean associate the segmentation results with the appropriate pixels in the range channel?
Perhaps you want a visualization like this?
# Convert the normalized single-channel image to 8-bit RGB for YOLO inference
img_rgb = np.repeat(np.uint8(np.clip(np.rint(img * 255), 0, 255))[..., np.newaxis], 3, axis=-1)
print(img_rgb.shape)

# Run YOLO inference
results: Results = next(
    model.track(
        [img_rgb],
        stream=True,
        persist=True,
        conf=0.1,
        imgsz=[img_rgb.shape[0], img_rgb.shape[1]],
        classes=self.classes_to_detect
    )
).cpu()
track_ids = results.boxes.id.int().cpu().tolist()
masks = results.masks.xy
car_ids = list(track_ids)

# Scale bounding boxes back to the original resolution
boxes = results.boxes
if boxes is not None and len(boxes) > 0:
    print(f"the length of box: {len(boxes)}")
    boxes_xywh = boxes.xywh.clone()
    boxes_xywh[:, 0] *= 2  # scale xc (center x)
    boxes_xywh[:, 1] *= 1  # scale yc (center y)
    boxes_xywh[:, 2] *= 2  # scale w (width)
    boxes_xywh[:, 3] *= 1  # scale h (height)
    print(f"Reprojected Bounding Boxes (xywh):\n{boxes_xywh}")

# Upsample masks to the original resolution
masks = results.masks
if masks is not None:
    mask_tensor = masks.data.unsqueeze(1)  # add channel dimension
    upsampled_mask_tensor = torch.nn.functional.interpolate(
        mask_tensor,
        size=(resized_height, original_width),  # target size (H, W)
        mode="nearest"
    )
    upsampled_masks = upsampled_mask_tensor.squeeze(1).cpu().numpy()  # convert to NumPy
    print(f"Upsampled Mask Shape: {upsampled_masks.shape}")

# Reproject results onto the original LiDAR image
original_img_rgb = cv2.cvtColor(
    (cv2.resize(img, (original_width, resized_height)) * 255).astype(np.uint8),
    cv2.COLOR_GRAY2BGR
)
color = (0, 255, 0)
if masks is not None:
    for j in range(upsampled_masks.shape[0]):
        mask = upsampled_masks[j]
        colored_mask = np.zeros_like(original_img_rgb, dtype=np.uint8)
        colored_mask[mask > 0] = color  # green color for masks
        alpha = 0.3
        beta = 1
        original_img_rgb = cv2.addWeighted(colored_mask, alpha, original_img_rgb, beta, 0)

# for box in boxes_xywh:
#     xc, yc, w, h = map(int, box.tolist())
#     x1, y1 = xc - w // 2, yc - h // 2  # convert to top-left corner
#     x2, y2 = xc + w // 2, yc + h // 2  # convert to bottom-right corner
#     cv2.rectangle(original_img_rgb, (x1, y1), (x2, y2), (255, 0, 0), 2)  # blue bounding box

# Store the annotated image as a new field on the scan (stagger it back to the scan's raw column order)
scan.add_field(f"YOLO_{field}", destagger(self._metadata, original_img_rgb, inverse=True))

# Display the reprojected image
if self._use_opencv:
    cv2.imshow("Reprojected YOLO Results", original_img_rgb)
    cv2.waitKey(1)
return scan
@Samahu
I am able to project the YOLO inference on reflectivity into the 3D point cloud by adding scan.field(yolo_reflectivity). How can I isolate the point cloud of the car masks? I want to visualize only the points of the masked cars from the yolo_reflectivity channel.
Is it possible to project the detections from the reflectivity channel onto the range point cloud, perhaps similar to the image shared above, with masks on the detected objects?
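For reference, a minimal sketch of one way the car-masked points might be isolated, assuming metadata is the same SensorInfo used above (self._metadata), the annotated image was stored under the field name YOLO_REFLECTIVITY, and the car masks were blended in green as in the snippet above; because the field was staggered back with destagger(..., inverse=True), its pixels line up one-to-one with the XYZ points:

import numpy as np
from ouster.sdk import client

# XYZ points for the scan, shape (H, W, 3), in the same staggered pixel layout as the scan fields
xyzlut = client.XYZLut(metadata)
range_field = scan.field(client.ChanField.RANGE)
xyz = xyzlut(range_field)

# Pixels painted by the green YOLO overlay (heuristic: green channel dominates)
yolo_img = scan.field("YOLO_REFLECTIVITY")  # (H, W, 3) uint8
car_mask = (yolo_img[..., 1] > yolo_img[..., 0]) & (yolo_img[..., 1] > yolo_img[..., 2])

# Keep only masked pixels that have a valid (non-zero) range return
valid = car_mask & (range_field != 0)
car_points = xyz[valid]  # (N, 3) array of points on the masked cars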