Data Collection

The Rig

Camera Rig

Our rig is built to capture synchronised data across multiple modalities: stereo polarisation, i-ToF correlation, depth, and IMU. We rigidly mount two polarisation cameras (Lucid PHX050S-QC) forming a left-right stereo pair, an i-ToF camera (Lucid HLS003S-001) operating at 25 MHz, and a structured-light camera (RealSense D435i) for active IR stereo capture.

More details

All devices are connected by a hardware synchronisation wire, yielding time-aligned video capture at 10 fps. The left polarisation camera is the lead camera: it generates the genlock signal and defines the world reference frame. Synchronisation accuracy was validated with a flashlight and further confirmed by the good quality of stereo block-matching results. The focus of all sensors is set to infinity, the aperture to maximum, and the exposure is fixed manually at the beginning of each capture sequence. The extrinsics, intrinsics, and distortion coefficients of all four cameras are calibrated jointly with a graph-based bundle adjustment for improved multi-view calibration.



The data

Dataset_table

We collect a unique dataset of multi-modal captures in which each time point comprises (1) Polarisation: the raw stereo polarisation cameras produce 2448×2048 px stereo images; (2) i-ToF: 4-channel 640×480 px correlation images; (3) Depth: a structured-light capture of the scene yielding an 848×480 px depth image. In addition to the three main sensors, IMU data are recorded to further enable future research directions.

More details

Our dataset consists of more than 20k frames, totalling >80k images of indoor and outdoor sequences captured in challenging conditions, with no constraints on minimum or maximum scene range. We group these sequences into four scenes, which we name Kitchen, Station, Park and Facades.

Privacy protection

The scene imagery is collected in permitted public areas. We comply with local regulations and avoid releasing any localization information, including GPS and cartographic information. For privacy protection, we proactively attempt to avoid the capture of images that may contain personal information, such as human faces and license plates. If any dataset imagery is found not to comply with these privacy constraints, please contact the authors for image removal.

Data Format

Processed Data Format

Processed data have been pre-processed for direct use: the stereo pair has been undistorted and rectified, the depth maps (from structured light and colmap) have been projected into a reference camera frame (the left polarised sensor), the colmap depth scale has been corrected by fitting it to the structured-light depth, and, for convenience, we generate an RGB image from the polarised images (although this is not used in the paper).
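The colmap scale correction mentioned above can be sketched as a one-parameter fit. The helper below is hypothetical and not necessarily the paper's exact procedure: it assumes a single global scale and estimates it as the median ratio over pixels valid in both depth maps.

```python
import numpy as np

def fit_depth_scale(colmap_depth, sl_depth):
    # Hypothetical sketch: fit one scalar mapping the up-to-scale colmap
    # depth onto the metric structured-light depth, using the median ratio
    # over pixels that are valid (non-zero) in both maps.
    valid = (colmap_depth > 0) & (sl_depth > 0)
    return np.median(sl_depth[valid] / colmap_depth[valid])

# Toy example: colmap depth is twice the metric depth, with one invalid pixel
colmap = np.array([[2.0, 4.0], [0.0, 6.0]])
sl = np.array([[1.0, 2.0], [5.0, 3.0]])
scale = fit_depth_scale(colmap, sl)
```

A median is used here rather than a least-squares fit for robustness to outliers in either depth map.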

The following examples show how to load the data from each sensor:
Depth from {colmap,structured-light}:
The following example opens the depth from colmap, frame 0000000097.png, scene Kitchen, date 20201027-180941:

import cv2
image_path = "CroMo_dataset/kitchen/20201027-180941/colmap/sensor/data/0000000097.png"
# Read the 16-bit PNG unchanged and convert from millimetres to metres
depth = cv2.imread(image_path, cv2.IMREAD_UNCHANGED).astype(float) / 1000

The depth is a 2D float array of shape (512, 612) with values in metres; zero denotes an invalid or missing value.
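Once loaded, the metric depth map can be back-projected into a 3D point cloud given the camera intrinsics. The sketch below uses the standard pinhole model; fx, fy, cx, cy are illustrative placeholders, the real values come from the dataset's calibration files.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    # Back-project a metric depth map to a point cloud in the camera frame
    # using the pinhole model; invalid (zero-depth) pixels are dropped.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.dstack((x, y, depth)).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Toy 2x2 depth map with one invalid pixel; placeholder intrinsics
depth = np.array([[1.0, 2.0], [0.0, 4.0]])
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=306.0, cy=256.0)
```

The invalid pixel is removed, so `pts` contains one row of (x, y, z) per valid depth value.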

Correlation from i-tof:
The following example opens the correlation images from i-tof, frame 0000000097.png, scene Kitchen, date 20201027-180941:

import cv2, numpy as np
image_path = "CroMo_dataset/kitchen/20201027-180941/i-tof/sensor/data/0000000097.png"
correlation_input = cv2.imread(image_path, cv2.IMREAD_UNCHANGED).astype(float) / 255
# The four correlation images are stacked vertically in the PNG
h, w = correlation_input.shape
nh = h // 4
C_0 = correlation_input[0 * nh:1 * nh, :]
C_1 = correlation_input[1 * nh:2 * nh, :]
C_2 = correlation_input[2 * nh:3 * nh, :]
C_3 = correlation_input[3 * nh:4 * nh, :]
correlation = np.dstack((C_0, C_1, C_2, C_3))

The correlation is a 3D float array of shape (480, 640, 4); the last dimension holds the four correlation samples.
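As an illustration of what the four correlation channels encode, a wrapped depth estimate can be sketched with the standard 4-phase i-ToF formula. The sign and ordering convention below (sample k at phase offset k·π/2) is an assumption and may differ from the sensor's actual convention; the 25 MHz modulation frequency comes from the rig description.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light (m/s)
F_MOD = 25e6             # modulation frequency from the rig description (25 MHz)

def correlation_to_depth(correlation):
    # Standard 4-phase i-ToF depth sketch; assumes sample k corresponds to a
    # phase offset of k*pi/2 (sensor conventions vary, so treat as illustrative).
    c0, c1, c2, c3 = np.moveaxis(correlation, -1, 0)
    phase = np.mod(np.arctan2(c3 - c1, c0 - c2), 2 * np.pi)  # wrapped to [0, 2*pi)
    # Ambiguity range is C_LIGHT / (2 * F_MOD), i.e. about 6 m at 25 MHz
    return C_LIGHT * phase / (4 * np.pi * F_MOD)

# Synthetic sanity check: correlations for a target at 1.5 m
d_true = 1.5
phi = 4 * np.pi * F_MOD * d_true / C_LIGHT
corr = np.stack([np.cos(phi + k * np.pi / 2) * np.ones((2, 2)) for k in range(4)],
                axis=-1)
depth_est = correlation_to_depth(corr)
```

Distances beyond the ambiguity range wrap around, which is one motivation for learning depth from the raw correlations instead.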

Polarized data from polarized sensor:
The following example opens the polarized data from the left polarized sensor, frame 0000000097.png, scene Kitchen, date 20201027-180941:

import cv2, numpy as np
image_path = "CroMo_dataset/kitchen/20201027-180941/polarized/left/data/0000000097.png"
polarized_input = cv2.imread(image_path, cv2.IMREAD_UNCHANGED).astype(float) / 255
# The four polarisation angles are stacked vertically in the PNG
h, w, c = polarized_input.shape
nh = h // 4
pol_0 = polarized_input[0 * nh:1 * nh, :]
pol_45 = polarized_input[1 * nh:2 * nh, :]
pol_90 = polarized_input[2 * nh:3 * nh, :]
pol_135 = polarized_input[3 * nh:4 * nh, :]
polarized = np.dstack((pol_0, pol_45, pol_90, pol_135))

The polarized data is a 3D float array of shape (512, 612, 12); the last dimension holds the 4 polarisation angles for each of the 3 colour channels.
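From the four angle measurements, the usual linear-polarisation cues can be computed with the standard Stokes formulation. The sketch below assumes the angle-major channel layout produced above (channels 0-2 are RGB at 0°, 3-5 at 45°, 6-8 at 90°, 9-11 at 135°) and averages the colour channels per angle; this is an illustration, not the paper's processing.

```python
import numpy as np

def polarisation_cues(pol):
    # Standard Stokes-parameter sketch; assumes angle-major layout of the
    # (H, W, 12) array: RGB at 0 deg, then 45, 90, 135 deg.
    i0, i45, i90, i135 = (pol[..., 3 * k:3 * k + 3].mean(axis=-1) for k in range(4))
    s0 = 0.5 * (i0 + i45 + i90 + i135)                # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-8)  # degree of linear pol.
    aolp = 0.5 * np.arctan2(s2, s1)                       # angle of linear pol.
    return s0, dolp, aolp

# Synthetic check: fully polarised light at 30 degrees, via Malus-style model
alpha = np.pi / 6
vals = np.array([0.5 * (1 + np.cos(2 * (np.deg2rad(a) - alpha)))
                 for a in (0, 45, 90, 135)])
pol = np.ones((1, 1, 12)) * np.repeat(vals, 3)
s0, dolp, aolp = polarisation_cues(pol)
```

For the synthetic pixel above, the recovered degree of polarisation is 1 and the recovered angle is 30°, as constructed.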

RGB from polarized sensor:
The following example opens the rgb from the left polarized sensor, frame 0000000097.png, scene Kitchen, date 20201027-180941:

import cv2
image_path = "CroMo_dataset/kitchen/20201027-180941/rgb/left/data/0000000097.png"
rgb = cv2.imread(image_path)  # BGR channel order, as returned by OpenCV

The rgb is a 3D uint8 array of shape (512, 612, 3).
Note that the rig has no RGB sensor: this image is obtained by applying a demosaicing_CFA_Bayer_bilinear correction to the raw polarised data and then averaging the values.
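The averaging step can be sketched in numpy: given the (H, W, 12) polarized array loaded earlier, averaging the four angle images per colour channel yields an RGB-like image. This assumes the angle-major channel layout and uses hypothetical values for illustration.

```python
import numpy as np

# Stand-in for the (H, W, 12) polarized array (hypothetical values)
polarized = np.tile(np.arange(12, dtype=float), (4, 5, 1))

# Average over the four polarisation angles per colour channel
# (assumes channels are grouped angle-major: RGB at 0, 45, 90, 135 deg)
rgb = polarized.reshape(4, 5, 4, 3).mean(axis=2)
```

Averaging over angles approximates an unpolarised intensity, since the angle-dependent term cancels across the four samples.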

IMU from structured-light sensor:
The following example opens the imu from the structured-light sensor, date 20201027-180941:

import numpy as np
data_path = "CroMo_dataset/kitchen/20201027-180941/imu/sensor/data/log_accel.txt"
imu_raw = np.loadtxt(data_path)
# Build a dict mapping each timestamp (first column) to its measurement
data = {key: val for key, val in zip(imu_raw[:, 0], imu_raw[:, 1:])}
        
Note that the imu accelerometer (log_accel.txt) and gyroscope (log_gyro.txt) sample the scene at their own pace, using the same clock as the structured-light depth. This means interpolation is needed to align the data with the other frames. For reference, the mapping between timestamp and frame id is given in log_images.txt.
For convenience, we linearly interpolate the accelerometer and gyroscope at the timestamps of log_accel.txt, log_gyro.txt, and log_images.txt and provide the results in log_imu.txt.
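The interpolation described above can be sketched per channel with np.interp; the helper and data below are illustrative, not the exact procedure used to produce log_imu.txt.

```python
import numpy as np

def interpolate_imu(imu_t, imu_vals, frame_t):
    # Linearly interpolate each IMU channel at the frame timestamps
    # (illustrative sketch of the alignment described above).
    return np.stack([np.interp(frame_t, imu_t, imu_vals[:, c])
                     for c in range(imu_vals.shape[1])], axis=1)

# Synthetic 3-axis accelerometer sampled at t = 0, 1, 2 s
imu_t = np.array([0.0, 1.0, 2.0])
imu_vals = np.array([[0.0, 0.0, 9.8],
                     [1.0, 0.0, 9.8],
                     [2.0, 0.0, 9.8]])
aligned = interpolate_imu(imu_t, imu_vals, np.array([0.5, 1.5]))
```

This relies on the IMU and depth sharing one clock; timestamps outside the IMU's range would be clamped by np.interp rather than extrapolated.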

Raw Data Format

A description of the raw data format will be provided when the raw data are released.