A combinatorial RGB and depth images CNN-based model for oil palm fruit bunch detection and heatmap localisation for a visual SLAM system