Live Action Recognition with OpenVINO

Human action recognition identifies actions performed over time in a video. The list of actions in this notebook is extensive (400 in total) and covers Person Actions (e.g., drawing, drinking, laughing), Person-Person Actions (e.g., hugging, shaking hands), and Person-Object Actions (e.g., opening a present, mowing the lawn, playing "instrument"). You can find several parent/child groupings in the list of labels, such as braiding hair and brushing hair, salsa dancing and robot dancing, or playing violin and playing guitar. For more information about the labels and dataset, see "The Kinetics Human Action Video Dataset" research paper.

Binder
Binder is a free service, but the webcam will not work there, and performance on video files will be poor. For the best performance, we recommend installing and running the notebooks locally.

Notebook Contents

This notebook demonstrates live human action recognition with OpenVINO. We use the Action Recognition Models from Open Model Zoo, specifically the Encoder and Decoder from action-recognition-0001. Together, the two models form a sequence-to-sequence ("seq2seq") [1] system that identifies human activities from the Kinetics-400 dataset. The models use the Video Transformer approach with a ResNet34 encoder [2]. The notebook shows how to create this pipeline; a minimal sketch of the idea appears below.
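
The following sketch assembles such an encoder-decoder pipeline with the OpenVINO Runtime Python API. It is a minimal illustration, not the notebook's actual code: the model paths, the 224x224 input size, and the 16-frame sequence length are assumptions based on the action-recognition-0001 documentation.

```python
import collections

import cv2
import numpy as np
from openvino.runtime import Core

core = Core()
# Hypothetical paths: the IR files are the action-recognition-0001
# encoder and decoder from Open Model Zoo.
encoder = core.compile_model("model/action-recognition-0001-encoder.xml", "CPU")
decoder = core.compile_model("model/action-recognition-0001-decoder.xml", "CPU")

# The decoder consumes a sliding window of per-frame embeddings
# (16 frames, per the action-recognition-0001 documentation).
embeddings = collections.deque(maxlen=16)

cap = cv2.VideoCapture(0)  # 0 = default webcam; pass a file path for a video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Preprocess a frame: resize to the encoder input size, HWC -> NCHW.
    blob = cv2.resize(frame, (224, 224)).transpose(2, 0, 1)
    blob = blob[np.newaxis].astype(np.float32)
    # The encoder produces one embedding per frame.
    embeddings.append(encoder([blob])[encoder.output(0)])
    if len(embeddings) == embeddings.maxlen:
        # Stack the window into a (1, 16, 512) sequence and decode it.
        sequence = np.concatenate(list(embeddings), axis=0).reshape(1, 16, -1)
        logits = decoder([sequence])[decoder.output(0)]
        print("Predicted Kinetics-400 class id:", int(np.argmax(logits)))
cap.release()
```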

At the end of this notebook, you will see live inference results from your webcam. You can also upload a video file.

NOTE: To use the webcam, you must run this Jupyter notebook on a computer with a webcam. If you run the notebook on a server, the webcam will not work. However, you can still do inference on a video file in the final step, as shown in the snippet below.
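
Switching between the webcam and a video file only changes the capture source passed to OpenCV; the file name below is a placeholder:

```python
import cv2

# 0 selects the default webcam; on a machine without one (for example, a
# server), point OpenCV at a video file instead. "sample.mp4" is a placeholder.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    cap = cv2.VideoCapture("sample.mp4")
if not cap.isOpened():
    raise RuntimeError("Neither the webcam nor the fallback video could be opened.")
```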

[1] seq2seq: deep learning models that map a sequence of input items to a sequence of output items. In this case, the input is video frames and the output is a sequence of actions. The "seq2seq" system is composed of an encoder and a decoder. The encoder captures the "context" of the inputs, which the decoder then analyzes to produce the human action and its confidence (see the sketch after these notes).

[2] Video Transformer and ResNet34.
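
As a hypothetical post-processing step along these lines, the decoder's raw scores can be turned into an (action, confidence) pair with a softmax. The function name and stand-in values below are illustrative, not the notebook's actual code:

```python
import numpy as np

def top_action(logits, labels):
    """Map decoder logits to the most likely action and its confidence."""
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    idx = int(np.argmax(probs))
    return labels[idx], float(probs[idx])

# Illustrative usage with stand-in values; real logits come from the decoder,
# and the 400 label strings come from a Kinetics-400 label file.
labels = [f"action_{i}" for i in range(400)]
logits = np.random.randn(400).astype(np.float32)
action, confidence = top_action(logits, labels)
print(f"{action}: {confidence:.1%}")
```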

For more information about the pre-trained models, refer to the Intel and public model documentation, all included in the Open Model Zoo.

Installation Instructions

If you have not done so already, please follow the Installation Guide to install all required dependencies.

See Also