The Skimovie dataset was created for the paper: Evaluation of Object Detection Models and Video Tracking in Skiing Videos. It includes videos and images, which display people on ski race tracks from several ski resorts. Camera systems are set up alongside the race tracks and film skiers and snowboarders going down the slope. All clearly visible people in the frame were manually annotated with bounding boxes, which capture their whole body but not their skis, poles or snowboard.
The dataset comprises 2718 bounding-box-annotated images, which were extracted from a total of 110 sequences. The sequences depict six different scenes or race tracks. The image resolution is 1280x720 pixels with 72 pixels per inch. The images depict scenarios that include a single skier, multiple skiers, people partially occluded, and skiers who crashed. Also, different weather conditions and surroundings are displayed depending on the sequence.
Annotations are available in the YOLO format. For each image, ground truths are saved in a corresponding text file with the same name as the image file. The format is as follows
<class>, <x>, <y>, <width>, <height>
x, y are the coordinates of the bounding box center and
width, height are the bounding box width and height.
Values are relative to the image size.
The dataset contains four fully annotated videos, which are annotated at each frame. Each person occurring in the video is annotated and given a tracking ID, even if they occur only briefly in the video. The videos feature seven unique annotated objects and have a combined total length of 74 seconds with a resolution of 1920 x 1080 pixels and 12.5 frames per second (fps). Each video depicts a different setting. Except for one video, all videos feature two annotated objects, where at least one object is in the frame for the entirety of the sequence.
Annotations are available in the MOT format. Ground truths are saved in a single text file and have the following format:
<frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>
Values describing the bounding box are absolute.
conf value acts as a flag whether the entry is to be considered.
A value of 0 means that this particular instance is ignored in the evaluation, while any other value can be used to mark it as active.
The values for
x, y, z are not relevant in our case and filled with -1.
The datasets are exclusively provided for scientific research purposes and as such cannot be used commercially or for any other purpose.
If any other purpose is intended, you may directly contact the originator of the dataset, Philip Steinkellner, or Assoc. Prof. DI Dr. Klaus Schoeffmann.
Besides, a reference must be made to the following publication  when this dataset is used in any academic and research reports:
Philip Steinkellner. Evaluation of Object Detection Models and Video Tracking in Skiing Videos. Universität Klagenfurt, Master Thesis, 2021.
The datasets are licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0, ) and is maintained by the Multimedia Systems Group of the Institute of Information Technology (ITEC) at Alpen-Adria Universität in Klagenfurt, Austria.
This license allows users of this dataset to copy, distribute, and transmit the work under the following conditions:
If you agree to above conditions, you are free to download:
 Philip Steinkellner. Evaluation of Object Detection Models and Video Tracking in Skiing Videos. Universität Klagenfurt, 2021.