If you ever get back into it, I bet you can 10x to 20x that speed with augmented labeling.
Once you have >100 frames labeled you can put a classifier in the loop and only have to label the % of frames it gets wrong.
I usually set up a 10x10 grid showing only samples the classifier assigned to a single class, then I mark the ones it got wrong as unlabeled and move on to the next batch. With an 80% accurate classifier, that's about 80 of the 100 samples labeled every 5 seconds or so.
And if you retrain the classifier regularly on the newly labeled samples you can improve its accuracy and the speed of labeling with it.
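For anyone who wants to try this, here is a minimal sketch of one round of that loop. It is not the commenter's actual tooling: the nearest-centroid classifier is just a stand-in for whatever model you use, and `true_label` and `simulated_review` are hypothetical placeholders for the human clicking through the grid.

```python
from collections import defaultdict

def train_centroids(labeled):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, defaultdict(int)
    for x, y in labeled:
        prev = sums.get(y, [0.0] * len(x))
        sums[y] = [s + v for s, v in zip(prev, x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def label_round(labeled, unlabeled, review, batch_size=100):
    """Retrain, show one single-class batch, keep what the reviewer accepts."""
    if not unlabeled:
        return labeled, unlabeled
    centroids = train_centroids(labeled)
    by_class = defaultdict(list)
    for i, x in enumerate(unlabeled):
        by_class[predict(centroids, x)].append(i)
    # Show the class with the most predictions, up to one grid's worth.
    cls, idxs = max(by_class.items(), key=lambda kv: len(kv[1]))
    batch = idxs[:batch_size]
    # review() returns the positions in the batch the human marked as wrong.
    wrong = set(review([unlabeled[i] for i in batch], cls))
    accepted = {i for pos, i in enumerate(batch) if pos not in wrong}
    new_labeled = labeled + [(unlabeled[i], cls) for i in sorted(accepted)]
    # Rejected samples stay unlabeled and come back in a later round.
    remaining = [x for i, x in enumerate(unlabeled) if i not in accepted]
    return new_labeled, remaining

def true_label(x):
    # Hypothetical ground truth, used only to simulate the human reviewer.
    return "left" if x[0] < 0.5 else "right"

def simulated_review(samples, cls):
    # Stand-in for the human: flag every sample the classifier got wrong.
    return [pos for pos, x in enumerate(samples) if true_label(x) != cls]
```

Calling `label_round` repeatedly retrains on everything accepted so far, which is the "retrain regularly" part. With a 100-sample batch and 80% accuracy, roughly 80 samples get accepted per round and the other 20 go back into the pool.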
A data scientist friend of mine had some success with Figure8 but I haven't used it myself.
Honestly I always roll my own, it's dead fast to throw a simple GUI together in tkinter and it makes it easy to integrate your own models and custom sample rendering/plotting.
That is, if you're doing simple discrete class labeling, as opposed to more complex labeling like box labeling for object detection, or text-to-speech labeling, for example.
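To give an idea of how little code a grid like that takes, here is a rough tkinter sketch. It is an assumption about what such a tool could look like, not anyone's actual labeling GUI: samples are rendered as plain text buttons, and the names `toggle`, `split_batch`, and `run_gui` are made up for the example. Real samples would need your own rendering (images, plots).

```python
def toggle(rejected, idx):
    """Flip whether sample idx is marked as wrong; return the new state."""
    rejected.symmetric_difference_update({idx})
    return idx in rejected

def split_batch(batch, rejected, cls):
    """Accepted samples get the predicted class; rejected ones stay unlabeled."""
    accepted = [(s, cls) for i, s in enumerate(batch) if i not in rejected]
    relabel = [s for i, s in enumerate(batch) if i in rejected]
    return accepted, relabel

def run_gui(batch, cls, cols=10):
    """Show the batch as a grid of buttons; click the ones that are wrong."""
    import tkinter as tk  # imported here so the helpers work without a display

    root = tk.Tk()
    root.title(f"Predicted class: {cls} (click the wrong ones)")
    rejected = set()

    def on_click(idx, button):
        marked = toggle(rejected, idx)
        # A sunken button means "the classifier got this one wrong".
        button.config(relief=tk.SUNKEN if marked else tk.RAISED)

    for idx, sample in enumerate(batch):
        button = tk.Button(root, text=str(sample), width=8)
        button.config(command=lambda i=idx, b=button: on_click(i, b))
        button.grid(row=idx // cols, column=idx % cols)

    rows = (len(batch) + cols - 1) // cols
    tk.Button(root, text="Next batch", command=root.destroy).grid(
        row=rows, column=0, columnspan=cols)
    root.mainloop()  # blocks until "Next batch" closes the window
    return split_batch(batch, rejected, cls)
```

Keeping `toggle` and `split_batch` separate from the window code means the bookkeeping can be reused or tested without opening a GUI.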
Thx for sharing the link to your work. Interesting idea. A few thoughts:
1) Congrats on the birth of your child. I imagine you have zero time now but as they get older, you start getting your time back. I went through this and now I can sneak in personal projects while the kids are in their activities, late at night, or early in the morning. Be aware that your body and stamina decline as you age. I am getting close to my mid 40s and I can feel it.
2) The previous poster presented an interesting idea about putting a basic classifier in the loop. The challenge is how do you know if the classifier gets it wrong. Confidence scores you get from the logits are extremely flaky. I think one solution is to use metric learning methods (contrastive loss instead of cross entropy). I have seen some papers that dance around this but have not seen anything fully baked from a scientific perspective.
3) Your task is an interesting action recognition task. You should seriously consider putting it on Kaggle or writing a paper on it (and releasing the dataset). An easy off-the-shelf model you could try on this data for a video classification task like this is possibly X3D. But there are a variety of other methods (I'm a researcher in the field).
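To make the contrast in point 2 concrete, here is a small sketch comparing a softmax confidence with a distance-to-prototype score. It is only an illustration under assumptions: the embeddings are assumed to come from some model trained with a metric learning loss, the class prototypes are assumed to be mean embeddings of already-labeled samples, and `max_dist` is a hypothetical threshold you would have to tune. It does not claim to solve the problem described above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_confidence(logits):
    """The usual 'confidence': the largest softmax probability."""
    return max(softmax(logits))

def prototype_score(embedding, prototypes):
    """Return (nearest label, Euclidean distance to that label's prototype)."""
    label = min(prototypes, key=lambda y: math.dist(embedding, prototypes[y]))
    return label, math.dist(embedding, prototypes[label])

def needs_review(embedding, prototypes, max_dist):
    """Send a sample to the human if it is far from every known class."""
    _, distance = prototype_score(embedding, prototypes)
    return distance > max_dist
```

One reason the softmax score is hard to trust is visible right here: scaling the logits from `[2, 1]` to `[20, 10]` pushes the confidence toward 1.0 without the ranking changing at all, whereas a distance to the nearest prototype at least says something about how far a sample is from anything seen so far.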
PS: congrats on the son!