Text this: Hybrid spatio-temporal models for deepfake detection using facial movement analysis