Tracking and Counting

Overview

Use YOLOv8 to track and count objects:

With a line, objects that cross it are counted in each direction (in and out).
With a region, the objects inside the selected area are counted.
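
The line-counting idea can be illustrated with a minimal sketch. It is not taken from predict.py or from the tracking libraries used here. It assumes each tracked object has a stable tracker ID and a center point, and it treats a crossing toward the positive side of the line as "in". That direction convention, and the names side_of_line and LineCounter, are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def side_of_line(start: Point, end: Point, p: Point) -> float:
    """2D cross product: positive on one side of start->end, negative on the other."""
    return (end[0] - start[0]) * (p[1] - start[1]) - (end[1] - start[1]) * (p[0] - start[0])


class LineCounter:
    """Counts tracked objects whose center moves from one side of a line to the other."""

    def __init__(self, start: Point, end: Point) -> None:
        self.start, self.end = start, end
        self.in_count = 0
        self.out_count = 0
        self._last_side: Dict[int, bool] = {}  # tracker_id -> was on positive side

    def update(self, tracker_id: int, center: Point) -> None:
        side = side_of_line(self.start, self.end, center)
        if side == 0:
            return  # exactly on the line: wait until the object is clearly on one side
        positive = side > 0
        previous = self._last_side.get(tracker_id)
        self._last_side[tracker_id] = positive
        if previous is None or previous == positive:
            return  # first sighting, or no crossing since the last frame
        if positive:
            self.in_count += 1   # assumed convention: negative -> positive is "in"
        else:
            self.out_count += 1


counter = LineCounter((150, 10), (150, 700))
counter.update(1, (200, 300))  # object 1 starts right of the line
counter.update(1, (100, 300))  # ... and moves to the left of it
print(counter.in_count, counter.out_count)
```

Keeping the last known side per tracker ID is what makes the tracker necessary: without stable IDs across frames, a crossing cannot be attributed to a single object.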

Install requirements

Install ecos-core first.

Install ByteTrack:

pip install git+https://github.com/ifzhang/ByteTrack.git
pip install loguru lap onemetric
pip install supervision==0.9.0

Install cython_bbox:

On Ubuntu:

pip install cython_bbox

On Windows:

pip install -e git+https://github.com/samson-wang/cython_bbox.git#egg=cython-bbox

Usage

python predict.py --opt "<PATH_TO_OPT_FILE>" --weight-path "<PATH_TO_WEIGHT_FILE>" --input-path "<PATH_TO_INPUT_VIDEO>" --output-path "<PATH_TO_OUTPUT_VIDEO>"

PATH_TO_OPT_FILE is the path to the opt.json file.
PATH_TO_WEIGHT_FILE is the path to the yolov8<version>.pt weight file.
PATH_TO_INPUT_VIDEO is the path to the input video.
PATH_TO_OUTPUT_VIDEO is the path to the output video.

Example with video:

python predict.py --opt opt.json --weight-path yolov8x.pt --input-path vehicle-counting.mp4 --output-path out.mp4

Example with camera:

python predict.py --opt opt.json --weight-path yolov8x.pt --is-camera --output-path out.mp4 --show
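
When the script is launched from another Python program, the command line can be assembled programmatically. The sketch below uses only the flags shown in the examples above. The helper name build_command is hypothetical, and the assumption that --is-camera replaces --input-path is inferred from the camera example, not confirmed by predict.py.

```python
import sys
from typing import List, Optional


def build_command(
    opt: str,
    weight_path: str,
    output_path: str,
    input_path: Optional[str] = None,
    is_camera: bool = False,
    show: bool = False,
) -> List[str]:
    """Build the argument list for predict.py from the flags documented above."""
    cmd = [sys.executable, "predict.py", "--opt", opt, "--weight-path", weight_path]
    if is_camera:
        cmd.append("--is-camera")  # assumed: camera mode needs no --input-path
    elif input_path is not None:
        cmd += ["--input-path", input_path]
    else:
        raise ValueError("provide input_path or set is_camera=True")
    cmd += ["--output-path", output_path]
    if show:
        cmd.append("--show")
    return cmd


cmd = build_command("opt.json", "yolov8x.pt", "out.mp4", input_path="vehicle-counting.mp4")
print(" ".join(cmd[1:]))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```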

The content of opt.json looks like this:

{
    "data": "data.yaml",
    "task": "detect",
    "imgsz": 640,
    "batch_size": 4,
    "epochs": 20,
    "version": "x",
    "save": false,
    "device": "0",
    "classes_filter": [0],
    "polygon_zone": [[150, 10], [150, 700]],
    "thickness": 2,
    "text_thickness": 2,
    "text_scale": 1,
    "camera_resolution": [1280, 720]
}

Where:

classes_filter: array of class indices to keep. With a default YOLO model, the indices follow the COCO classes. Example: 0 is person.
polygon_zone: array of points that defines the counting zone. To define a line, use [[x1, y1], [x2, y2]], where [x1, y1] and [x2, y2] are the coordinates of the start and end points. To define a region, list the points that outline it.
thickness, text_thickness, text_scale: values used to draw the lines and text.
camera_resolution: array containing the camera resolution as [width, height].
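
The way polygon_zone distinguishes a line from a region can be sketched as follows. This is an illustration under assumptions, not the logic in predict.py: it assumes exactly two points mean a line and three or more mean a region, and it uses a standard ray-casting test for region membership. The helper names zone_kind and point_in_region are hypothetical.

```python
import json
from typing import Sequence, Tuple

Point = Tuple[float, float]


def zone_kind(points: Sequence[Point]) -> str:
    """Classify polygon_zone: two points form a line, three or more form a region."""
    if len(points) == 2:
        return "line"
    if len(points) >= 3:
        return "region"
    raise ValueError("polygon_zone needs at least two points")


def point_in_region(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray going right from p crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Only the two keys used here are included; values match the example above.
opt = json.loads('{"classes_filter": [0], "polygon_zone": [[150, 10], [150, 700]]}')
print(zone_kind(opt["polygon_zone"]))  # line

# A hypothetical rectangular region for comparison.
region = [(100, 100), (500, 100), (500, 400), (100, 400)]
print(zone_kind(region), point_in_region((300, 200), region))
```

A region count would then be the number of tracked objects whose center point passes point_in_region in the current frame.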