Cascaded deep network systems with linked ensemble components for underwater fish detection in the wild
We propose a fish detection system based on deep network architectures to robustly detect and count fish objects under a variety of benthic background and illumination conditions. The algorithm consists of an ensemble of Region-based Convolutional Neural Networks that are linked in a cascade structure by Long Short-Term Memory networks. The proposed network is efficiently trained as all components are jointly trained by backpropagation. We train and test our system for a dataset of 18 videos taken in the wild. In our dataset, there are around 20 to 100 fish objects per frame with many fish objects having small pixel areas (less than 900 square pixels). From a series of experiments and ablation tests, the proposed system preserves detection accuracy despite multi-scale distortions, cropping and varying background environments. We present analysis that shows how object localization accuracy is increased by an automatic correction mechanism in the deep network's cascaded ensemble structure. The correction mechanism rectifies any errors in the predictions as information progresses through the network cascade. Our findings in this experiment regarding ensemble system architectures can be generalized to other object detection applications.