A vision-based deep learning approach for non-contact vibration measurement using (2+1)D CNN and optical flow
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | en |
| Published: | Vibromechanika, 2025 |
| Subjects: | |
| Online Access: | https://eprints.ums.edu.my/id/eprint/45594/1/FULLTEXT.pdf https://eprints.ums.edu.my/id/eprint/45594/ https://doi.org/10.21595/jve.2025.25002 |
| Summary: | This paper introduces a proof-of-concept vision-based deep learning approach for vibration measurement, proposing a factorized (2+1)D Convolutional Neural Network (CNN) model to predict four vibration metrics: acceleration, velocity, displacement, and frequency, with a focus on rigid-body motion. Unlike conventional neural network models that primarily target frequency prediction alone, this approach enables the simultaneous estimation of all four metrics, offering a comprehensive and cost-effective alternative to traditional contact-based sensors such as accelerometers. The framework relies only on the visibility of a fiducial marker during training and eliminates the need for calibration in controlled settings, enhancing scalability across environments. A curated dataset was generated using a controlled experimental setup comprising a single object in a lab-scale environment and was augmented synthetically to increase frequency diversity. An optical flow-based preprocessing algorithm synchronized motion features in the recorded video inputs with the measured vibration labels, improving measurement accuracy. The proposed model achieved an average Mean Absolute Percentage Error (MAPE) of 7.51 %, with acceleration predictions exhibiting the lowest error at 4.84 % and displacement the highest at 8.80 % across varying brightness levels and object-camera distances. Techniques such as Region of Interest (ROI) cropping and multi-section frame extraction were implemented to reduce computational complexity while further enhancing accuracy. These results highlight the framework's potential for non-invasive vibration analysis, though its generalizability is limited by the single-object dataset. Future work will expand the dataset, integrate multi-sensor inputs, explore marker-less tracking methods, and enable real-time deployment for predictive maintenance and structural health monitoring. |
|---|
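Two quantitative ideas in the summary can be sketched without access to the paper's code: the (2+1)D factorization, which splits a full 3-D convolution into a spatial convolution followed by a temporal one (reducing weight count for typical kernel sizes), and the MAPE metric used to report error. The sketch below is a minimal illustration; the channel counts, kernel sizes, and sample values are assumptions, not taken from the paper.

```python
def conv3d_params(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Weight count of a full 3-D convolution (biases ignored)."""
    return c_in * c_out * kt * kh * kw

def conv2plus1d_params(c_in: int, c_out: int, kt: int, kh: int, kw: int,
                       c_mid: int) -> int:
    """Weight count after the (2+1)D split: a 1 x kh x kw spatial
    convolution into c_mid channels, then a kt x 1 x 1 temporal one."""
    return c_in * c_mid * kh * kw + c_mid * c_out * kt

def mape(predicted, measured) -> float:
    """Mean Absolute Percentage Error in percent; assumes no zero labels."""
    return 100.0 * sum(abs((m - p) / m)
                       for p, m in zip(predicted, measured)) / len(measured)

# Illustrative numbers (not from the paper): a 3x3x3 kernel over 64 channels.
print(conv3d_params(64, 64, 3, 3, 3))           # full 3-D conv: 110592 weights
print(conv2plus1d_params(64, 64, 3, 3, 3, 64))  # factorized:     49152 weights
print(round(mape([1.1, 1.8, 4.2], [1.0, 2.0, 4.0]), 2))  # 8.33
```

With the intermediate channel count chosen equal to the input width, the factorized block here uses fewer than half the weights of the full 3-D convolution, which is the usual motivation for the (2+1)D design.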
