Modeling spatial layout for scene image understanding via a novel multiscale sum-product network