And I want microphones, I want to hear the thing ring with that 2.3kHz note. I want to feel the 27Hz wiggle and the 196Hz thump. I want to get the Slow-Mo guys in there to place their camera, and watch the thing jump when a beam hits it.
The amount of energy in that thing just defies intuitive understanding from reading a paper; I have to use other senses.
You should look at LayoutLM models for a NER task. Your pipeline would then look like this:
- Identify the menu substructure (title, item list, ...)
- Classify each item with 2 labels.
The training process is not hard, but the data gathering / cleaning / labelling can take a while.
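To give an idea of what the data preparation and post-processing around the model might look like, here is a minimal sketch. It assumes you already have OCR words with pixel bounding boxes, and that the model emits one BIO tag per word. The `Word` class, the helper names and the tag names (`TITLE`, `ITEM_NAME`, `ITEM_PRICE`) are made up for illustration; pick whatever labels fit your menus. The one part that comes from LayoutLM itself is that boxes must be scaled to a 0-1000 grid before being fed to the model.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    box: tuple  # (x0, y0, x1, y1) in pixels, as returned by the OCR step


def normalize_box(box, page_width, page_height):
    """Scale a pixel box to the 0-1000 grid that LayoutLM expects."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


def group_entities(words, tags):
    """Merge BIO-tagged words into (label, text) spans."""
    spans = []
    label, parts = None, []

    def flush():
        # Close the span being built, if any.
        if parts:
            spans.append((label, " ".join(parts)))

    for word, tag in zip(words, tags):
        if tag == "O":
            flush()
            label, parts = None, []
        elif tag.startswith("B-") or tag[2:] != label:
            # A "B-" tag, or an "I-" tag with a different label, starts a new span.
            flush()
            label, parts = tag[2:], [word.text]
        else:
            # An "I-" tag that continues the current span.
            parts.append(word.text)
    flush()
    return spans


# Hypothetical page of 1000 x 500 pixels with four OCR words.
words = [
    Word("Starters", (100, 50, 300, 90)),
    Word("Tomato", (100, 120, 200, 150)),
    Word("soup", (210, 120, 280, 150)),
    Word("4.50", (800, 120, 880, 150)),
]
# Stand-in for the model's per-word predictions (hard-coded here).
tags = ["B-TITLE", "B-ITEM_NAME", "I-ITEM_NAME", "B-ITEM_PRICE"]

boxes = [normalize_box(w.box, 1000, 500) for w in words]
entities = group_entities(words, tags)
```

The labelled words and normalized boxes are what you would spend most of your time producing for training; `group_entities` is the kind of step you would run on the predictions to get back menu titles and items.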