A multi-layer CNN-GRUSKIP model based on transformer for spatial-temporal traffic flow prediction