The University of Central Florida invention is a real-time online system and method that can detect multiple activities occurring in long, untrimmed security videos. The invention uses a deep learning approach to process videos online at the clip level, drastically reducing the computation time for detecting activities. Because the method processes one clip at a time in an online fashion, it remains robust to activities of varying length. The methodology was tested on the VIRAT and MEVA (Multiview Extended Video with Activities) datasets, comprising more than 250 hours of video, and demonstrated effective performance in both processing speed and activity detection. The invention can process high-resolution security videos at 100 frames per second.
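The clip-level online pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the invention's implementation: `clip_model` stands in for the deep-learning detector, and merging consecutive same-label clips into variable-length segments is a hypothetical simplification of how clip-wise processing handles activities of varying length.

```python
from typing import Callable, Iterable, List, Tuple

def detect_online(frames: Iterable, clip_len: int,
                  clip_model: Callable) -> List[Tuple[int, int, str]]:
    """Process a frame stream one clip at a time, merging contiguous
    detections of the same activity into variable-length segments."""
    segments = []          # (start_frame, end_frame, label)
    clip, start = [], 0
    for i, frame in enumerate(frames):
        clip.append(frame)
        if len(clip) == clip_len:
            # clip_model returns the activity labels active in this clip
            for label in clip_model(clip):
                if segments and segments[-1][2] == label and segments[-1][1] == start:
                    # Same activity continues: extend the previous segment
                    s, _, lbl = segments[-1]
                    segments[-1] = (s, start + clip_len, lbl)
                else:
                    segments.append((start, start + clip_len, label))
            clip, start = [], i + 1
    return segments
```

Because each clip is processed as soon as it arrives, memory and latency stay constant regardless of how long the video runs, which is what makes the online formulation suitable for continuous security footage.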
The University of Central Florida invention provides a low-cost system that significantly reduces the cost of annotating videos. Video activity detection normally requires annotations at every frame, which drastically increases labeling cost. The UCF invention instead achieves strong performance using only a small number of carefully chosen annotations. An example application is preparing large-scale video datasets for video analysis tasks such as tracking, detection, and segmentation.
Technical Details: The invention’s Active Sparse Labeling (ASL) algorithm estimates the usefulness of each frame of a video, then suggests the frames and videos that can most improve dense video understanding tasks such as activity detection. Alongside the selection algorithm, the invention uses a Spatio-Temporal Weighted loss (STeW loss) to train video models on datasets with sparsely annotated frames. The invention works in two stages: first, it trains a deep-learning video model using very sparse frame annotations; then it uses the trained model to select additional frames for annotation based on their utility value. When tested on the public benchmark datasets UCF-101 and J-HMDB, with more than 400,000 frames, the invention reduced annotation cost by 90 percent and learned action detection in videos using only 10 percent of annotated video frames.
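The two-stage loop above can be illustrated with a toy sketch. Both `select_frames` (the active-selection step) and the `stew_weights` helper are hypothetical simplifications: in the invention, utility scores come from the trained deep model, and the STeW loss weights supervision spatio-temporally rather than by the 1-D temporal distance used here.

```python
def select_frames(utility, annotated, budget):
    """Stage two of the loop: rank unannotated frames by model-estimated
    utility and pick the top `budget` for the next annotation round."""
    candidates = [(u, i) for i, u in enumerate(utility) if i not in annotated]
    candidates.sort(reverse=True)
    return [i for _, i in candidates[:budget]]

def stew_weights(num_frames, annotated):
    """Hypothetical per-frame loss weights: supervision far from any
    annotated frame is down-weighted (a crude stand-in for STeW loss)."""
    return [1.0 / (1 + min(abs(i - a) for a in annotated))
            for i in range(num_frames)]
```

Alternating the two steps (train with `stew_weights`, annotate the frames returned by `select_frames`, retrain) is the active-learning cycle the disclosure describes.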
Partnering Opportunity: The research team is seeking partners for licensing, research collaboration, or both.
Stage of Development: Prototype available.
Are All Frames Equal? Active Sparse Labeling for Video Action Detection, 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
The University of Central Florida invention introduces a novel approach to person identification based on daily activities, addressing the limitations of traditional methods such as face recognition and gait analysis. Face recognition techniques, while advanced, often fail in real-world scenarios where facial features are not visible due to factors like long distances, environmental disturbances, occlusions (e.g., mask-wearing), and uncooperative subjects. Gait recognition, which analyzes walking patterns, also has limitations as individuals are not always walking in real-world situations.
This invention focuses on identifying individuals based on a wide range of daily activities, such as sitting, taking off a jacket, drinking water, and more. These activities provide unique behavioral cues that can be instrumental in identifying individuals even when facial information is unavailable. The approach leverages video analysis to process both biometric features (e.g., body shape, gait) and non-biometric features (e.g., clothing, background) from video data.
Technical Details: The UCF invention first receives an RGB video input containing a mix of biometric features (such as body shape and gait) and non-biometric features (such as clothing and background). The video is passed through a video encoder that extracts spatio-temporal features, which are divided into two streams: an "activity head" that analyzes the specific activities depicted in the video, and an "actor head" that distinguishes the biometric features of the individual from non-biometric appearance features.
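The two-stream structure can be sketched as follows. This is purely structural: the encoder and both heads are caller-supplied stubs here, whereas in the invention they are deep-network modules.

```python
class TwoHeadModel:
    """Sketch of the two-stream design: a shared video encoder feeds
    an 'activity head' and an 'actor head' on the same features."""

    def __init__(self, encoder, activity_head, actor_head):
        self.encoder = encoder
        self.activity_head = activity_head
        self.actor_head = actor_head

    def forward(self, video):
        feats = self.encoder(video)  # spatio-temporal features
        # Both heads consume the same shared representation
        return self.activity_head(feats), self.actor_head(feats)
```

Sharing one encoder lets the activity and identity tasks regularize each other during the joint training described below.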
To enhance identification accuracy, the invention employs a bias-less distillation process, where a silhouette version of the video is processed through a teacher network to distill unbiased biometric features back into the main model, filtering out appearance-related biases. Additionally, a bias-learning technique distorts the original video to obscure biometric features while keeping non-biometric features intact, allowing the system to learn and compensate for appearance biases during training. The method concludes with a joint training process that refines both activity and actor features, enabling the system to use these enhanced features for accurate identification by comparing them against a pre-existing gallery of identities.
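The joint objective described above might be combined as in the sketch below. The mean-squared distillation term and the `distill_weight` hyperparameter are assumptions for illustration, not the invention's exact losses.

```python
def distill_loss(student_feats, teacher_feats):
    """Mean-squared gap between RGB-stream (student) features and
    silhouette-teacher features: a stand-in for the bias-less
    distillation term that filters out appearance biases."""
    n = len(student_feats)
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / n

def joint_loss(id_loss, student_feats, teacher_feats, distill_weight=0.5):
    """Hypothetical joint objective: identification loss plus a weighted
    distillation term pulling student features toward the teacher's."""
    return id_loss + distill_weight * distill_loss(student_feats, teacher_feats)
```

At inference time, only the distilled main model is needed; identities are then retrieved by comparing its features against the pre-existing gallery, as the disclosure describes.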