Multi-camera setup on a roll

As a researcher, you can currently search the web quite easily and find several image datasets that suit your needs. There are large repositories with thousands of datasets containing detailed imagery of a wide range of objects and environments. However, the same cannot be said for the video equivalent, i.e. datasets of videos that all capture the subject or setting of interest. This does not mean that there is no demand for such datasets. On the contrary, adding the time dimension greatly increases the amount of captured information, which opens up new research possibilities.

The creation of video datasets is, however, an expensive endeavor. Instead of using one camera to take pictures of a static scene from different viewpoints, you need one camera for each viewpoint. Additionally, a mechanism that controls and synchronizes all the cameras is necessary. These two factors lead to expensive hardware requirements and camera setups that are hard to transport.

Videos from 2*2*9 PiCameras

Near the conclusion of his Ph.D., Dr. Ruben Verhack started designing a multi-camera setup with cost and ease of transport in mind. The decision was made to use Raspberry Pis with PiCameras, all connected to a central server over Ethernet. For robustness, 9 cameras are fixed onto a wooden panel in a 3-by-3 grid. To keep the setup flexible, 6 of those panels can be freely positioned around the subject, for a total of 54 cameras.
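
To make the layout arithmetic concrete, here is a minimal Python sketch that enumerates the camera positions described above: 6 panels, each holding a 3-by-3 grid. The `CameraSlot` class and the `p0-r0-c0` label scheme are hypothetical and serve only as illustration; they are not taken from the actual framework.

```python
from dataclasses import dataclass
from itertools import product

PANELS = 6          # panels that can be positioned around the subject
ROWS, COLS = 3, 3   # camera grid on each wooden panel


@dataclass(frozen=True)
class CameraSlot:
    """One camera position, identified by panel, row and column (hypothetical)."""
    panel: int
    row: int
    col: int

    @property
    def label(self) -> str:
        # Hypothetical naming scheme, e.g. "p2-r0-c1"
        return f"p{self.panel}-r{self.row}-c{self.col}"


def camera_slots(panels: int = PANELS) -> list[CameraSlot]:
    """List every camera position for the given number of panels."""
    return [
        CameraSlot(p, r, c)
        for p, r, c in product(range(panels), range(ROWS), range(COLS))
    ]


full_setup = camera_slots()           # all 6 panels
four_panels = camera_slots(panels=4)  # a 4-panel recording
print(len(full_setup), len(four_panels))  # prints "54 36"
```

With all 6 panels this gives 54 positions; restricting it to 4 panels gives 36.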

Recently, the framework to control and synchronize all PiCameras was completed by Julie Artois, with the help of the IDLab MEDIA team. The framework also postprocesses the output video datasets by applying color correction and camera calibration. The goal now is to produce interesting multi-view video datasets and apply them in the field of immersive light field video rendering. In the meantime, you can admire some of the results, captured with 4 of the 6 panels, on this webpage.

NeRF example output: this GIF demonstrates the view synthesis output of instant-ngp NeRF when 36 images taken by our camera setup are used as input.