Adopting attention and cross-layer features for fine-grained representation