Abstract
Video surveillance systems are a key tool for identifying anomalous events, but they still rely heavily on human analysis, which limits their efficiency. Current video anomaly detection models aim to detect such events automatically; however, most provide only a binary classification (normal or anomalous) and do not identify the specific type of anomaly. Although recent proposals address anomaly classification, they typically require analyzing the full video, making them unsuitable for online applications. In this work, we propose a modular framework for the joint detection and classification of anomalies, designed to operate on individual clips within continuous video streams. The architecture integrates interchangeable modules (feature extractor, detector, and classifier) and is adaptable to both offline and online scenarios. In particular, we introduce a multi-category classifier that processes only the clips flagged as anomalous, enabling efficient clip-level classification. Experiments on the UCF-Crime dataset validate the effectiveness of the framework, achieving 74.77% clip-level accuracy and 58.96% video-level accuracy, surpassing prior approaches and confirming its applicability in real-world surveillance environments.