IMO the RNN is overkill for this problem, compared to a simple and elegant algorithm called the "$1 unistroke recognizer". That one works beautifully even when trained with just a single sample of each gesture.
I hope $1 unistroke gets more recognition because it can be integrated in an afternoon into any project to add gesture recognition and make the UI more friendly.
It works quite reliably for Palm-style "Graffiti" text entry, as long as each letter is just a single stroke. The original paper also makes a great effort to be readable and understandable.
https://depts.washington.edu/acelab/proj/dollar/index.html
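To give a sense of how little code that afternoon involves, here is a minimal sketch of the pipeline in TypeScript. The real recognizer also runs a golden-section search over candidate rotations to fine-tune each match; that part is omitted here, and all names are mine, not the paper's.

    type Pt = { x: number; y: number };
    const N = 64; // resample count used in the paper

    const dist = (a: Pt, b: Pt) => Math.hypot(b.x - a.x, b.y - a.y);

    const centroid = (p: Pt[]): Pt => ({
      x: p.reduce((s, q) => s + q.x, 0) / p.length,
      y: p.reduce((s, q) => s + q.y, 0) / p.length,
    });

    // 1. Resample the stroke into N evenly spaced points.
    function resample(pts: Pt[], n = N): Pt[] {
      const src = pts.map((p) => ({ ...p }));
      let total = 0;
      for (let i = 1; i < src.length; i++) total += dist(src[i - 1], src[i]);
      const interval = total / (n - 1);
      const out: Pt[] = [src[0]];
      let acc = 0;
      for (let i = 1; i < src.length; i++) {
        const d = dist(src[i - 1], src[i]);
        if (acc + d >= interval) {
          const t = (interval - acc) / d;
          const q = {
            x: src[i - 1].x + t * (src[i].x - src[i - 1].x),
            y: src[i - 1].y + t * (src[i].y - src[i - 1].y),
          };
          out.push(q);
          src.splice(i, 0, q); // measure the next interval from the new point
          acc = 0;
        } else {
          acc += d;
        }
      }
      while (out.length < n) out.push(src[src.length - 1]); // float-rounding guard
      return out;
    }

    // 2. Rotate so the centroid-to-first-point line sits at angle zero.
    function rotateToZero(p: Pt[]): Pt[] {
      const c = centroid(p);
      const theta = Math.atan2(p[0].y - c.y, p[0].x - c.x);
      const cos = Math.cos(-theta);
      const sin = Math.sin(-theta);
      return p.map((q) => ({
        x: (q.x - c.x) * cos - (q.y - c.y) * sin + c.x,
        y: (q.x - c.x) * sin + (q.y - c.y) * cos + c.y,
      }));
    }

    // 3. Scale to a unit box, then translate the centroid to the origin.
    function normalize(p: Pt[]): Pt[] {
      const xs = p.map((q) => q.x);
      const ys = p.map((q) => q.y);
      const w = Math.max(...xs) - Math.min(...xs) || 1;
      const h = Math.max(...ys) - Math.min(...ys) || 1;
      const scaled = p.map((q) => ({ x: q.x / w, y: q.y / h }));
      const c = centroid(scaled);
      return scaled.map((q) => ({ x: q.x - c.x, y: q.y - c.y }));
    }

    type Template = { name: string; points: Pt[] };

    // 4. Nearest template by mean point-to-point distance wins.
    //    Templates must be preprocessed with the same pipeline.
    function recognize(stroke: Pt[], templates: Template[]) {
      const c = normalize(rotateToZero(resample(stroke)));
      let best = { name: "?", score: Infinity };
      for (const t of templates) {
        let s = 0;
        for (let i = 0; i < N; i++) s += dist(c[i], t.points[i]);
        if (s / N < best.score) best = { name: t.name, score: s / N };
      }
      return best;
    }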
A big issue with the $1 recognizer is that it requires strokes to be drawn in a specific way. For example, to draw a circle you need to go counterclockwise; if you go clockwise (as seems more natural to me), it gets recognized as a caret. This makes it not really usable in a free-drawing context where users are not aware of the details of your implementation.
But this is only a potential issue if you expect users to record their own gestures and then switch direction for some reason. If you are the one to define the gestures you can just preprocess them to allow multiple directions/orientations (or just record multiple yourself).
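If you go the preprocessing route, direction insensitivity can be as simple as registering each recorded gesture twice, once with the points reversed. A hypothetical helper, reusing the functions from the sketch above:

    // Register a gesture under one name in both drawing directions.
    function addBothDirections(name: string, raw: Pt[], templates: Template[]) {
      for (const pts of [raw, [...raw].reverse()]) {
        templates.push({ name, points: normalize(rotateToZero(resample(pts))) });
      }
    }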
This does not scale well when your drawing is more complicated. A simple example is a square, which can start at 4 corners and go in 2 directions, so you already need 8 samples; and it gets more complicated still, because some people draw a square with multiple strokes.
The other algos in the family are more robust to this, but after experimenting, an RNN or vision model does much better on the consistency side of things.
What I meant is to add both the clockwise and counterclockwise variants of the same gesture. Rotations are another matter: $1 unistroke can be made either sensitive or insensitive to gesture rotation, depending on what you want. Often you'd want to discern "7" from "L".
Uni-stroke is a much more elegant input method than multi-stroke. You can react to the user's gesture as soon as they lift the mouse button (or stylus or finger), without introducing some arbitrary delay. Users can learn and become very fast at using gestures. Multi-stroke, on the other hand, requires coordinating each stroke with the previous ones, and to me it doesn't justify its complexity. I admit I have a preference for software where users adapt and become proficient, while many products with wider audiences need to be more accessible to beginners. Different strokes...
Right, but for a square you have to add 8 samples, not 2, to handle the 4 starting points and 2 directions, and that still doesn't account for the users who multi-stroke.
> Different strokes...
I see what you did there :] I'm definitely in the reduce user burden camp.
People here testing out the example on this page and reporting errors seem to be missing the fact that this demo is "trained" on one example. The linked paper[0] goes into error rates, and they get better pretty quickly with a few more examples.
There's no "training"; it's more a matter of data sample matching, more akin to these new vector databases than to a neural network. You have to have gesture or point-cloud samples in the data set.
I played with this for a bit and found it too simple. If you don't draw the example shapes exactly, it confuses them. I recommend playing with "delete" versus "x" from the example shapes to see just how poorly this does. I could not get it to consistently differentiate between different drawing techniques.
This would certainly get you started for gesture interfaces, where drawing a shape the same way every time is expected. It would not be a good fit for the use case here of diagramming.
I implemented that in Objective-C when the iPhone was new-ish. It was a fun demo on a touch screen. It was surprising how well it worked for how simple it was.
https://github.com/ddunkin/dollar-touch
I have this deep-seated fear that NNs will be the death of the lessons learned from 1970-2010. After all, if you can use massive amounts of compute to materialize what seems to be a good-enough function approximator, why bother with advanced algorithms at all?
Obviously the reason we should is that approximators like the NNs have explainability issues and corner case unpredictability issues plus they are bad at real world complexity (which is why self driving efforts continue to struggle even when exposed to a narrow subset of the real world).
I think you're right on about explainability and unexpected handling of corner cases - but I think one of the lessons from GOFAI is that handcrafted algorithms might look good in a lab, but rarely handle real-world complexity well at all. Folks worked for decades to try to make systems that did even a tiny fraction of what chatgpt or SD do and basically all failed.
For safety stuff, justice-related decision-making, etc I think explainability is critical, but on the other hand for something like "match doodle to controlled vocabulary of shapes" (and tons of other very-simple-for-humans-but-annoyingly-hard-for-computers problems), why not just use the tiny model?
Maybe if we get really good at making ML models we can make models that invent comprehensible algorithms that solve complex problems and can be tweaked by hand. Maybe if we discover that a problem can be reasonably well solved by a very tiny model, that's a good indication that there is in fact a decent algorithm for solving that problem (and it's worth trying to find the human-comprehensible algorithm).
> However, if you’re anything like us, even a simple straight line drawn with a mouse or a trackpad can end up looking like a path trod by a tipsy squirrel. Don’t even get us started on circles and rectangles.
But who needs to draw shapes with their mouse in Canva? Years ago, Miro had a feature that converted your flailing attempts at drawing a star with a mouse into a geometrically precise star (or circle, or triangle, or whatever). I thought it was super cool, but then I never, ever needed to use it. I never need to do line drawing with my mouse: if I'm making diagrams, I just use pre-made shapes, which are faster. If I am making icons, I use a whole different process centered around Boolean operations and nudging points and the Pen tool—and I am probably using a dedicated program, like Illustrator, to do it. And if I am actually illustrating something (rarer these days than in times past) I have a tablet I will pull out. I am sure the tech here is cool, but what's the use case?
Canva is not a diagramming tool. It’s a visual design tool with a very different user base.
Their asset library is massive with millions, maybe tens of millions, of images including both photos and vector graphics.
One of the more annoying parts of the tool - in my limited experience - is searching through an endless library for simple shapes when I already know exactly what I want. Presumably this tool aims to solve that pain point.
Disclosure: worked there a few years ago.
Edit: I suspect (zero inside info) this use case is important because they want to be a competitive diagramming tool as well. However, they’ll be constrained in that they cannot fundamentally change the design experience for the other 99% of their current users.
Designers/marketers who don't learn keyboard shortcuts, for whom the comparison is "drawing the shape with my mouse" (quick) vs. "going through upwards of a half dozen menus to pick the right shape, place it, then resize it" (slower). Even if the shape is available w/o going to any menus, drawing the entire thing with your mouse using a single cursor is going to be faster than placing and resizing a bunch of icons, switching to the arrow feature and adding the arrows in.
Every designer I know (including myself) uses keyboard shortcuts like crazy. That's why Photoshop, Illustrator, Sketch, and Figma all have a very robust set of keyboard shortcuts. I assume marketers are the same—anyone who uses an application every day, really.
But I'm saying I don't know any designers who don't use hotkeys, and while that's anecdotal, I can't imagine they are more than a slice of a slice of Canva's target market.
I’m the author of perfect-freehand. It’s a good amount and I found the whole interaction really great: I get to tease library users on twitter to sponsor me, get a bunch of engagement, and give the company the opportunity to sponsor me to get a positive boost from a happy ending. No complaints.
Having my work used in a visible way by large companies does make me wish I’d productized it more. Maybe sometime I’ll release a refactored/improved version under a different license. However, I’ve also started a company around a different open source project of mine (tldraw.com) and if there’s a bag to be got, I’m sure I’ll get it there.
Nice work and thank you for replying. I’m new(ish) to open source and was asking because I wanted the perspective of someone more experienced. I don’t have much context for what level of work goes into making something like this, nor have I any clue what it feels like to know a big company is using my work. Your reply really helped to inform my understanding of these things and like you, I’d be quite pleased with a sponsorship of that amount.
"We developed a variation on the Ramer-Douglas-Peucker (RDP) algorithm, which is a curve simplification algorithm that reduces the number of points in a curve while preserving its important details. It achieves this by recursively removing points that deviate insignificantly from the simplified version of the curve."
This reminded me of an old side project, which others may be interested in. I applied Douglas-Peucker to Picasso for a talk at Strange Loop 2018:
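For anyone who hasn't seen it, the textbook version is short enough to show in full. Canva's variation isn't public, so this is just standard RDP: keep the endpoints, find the interior point farthest from the chord, and recurse only where the deviation exceeds a tolerance.

    type Pt = { x: number; y: number };

    function perpendicularDistance(p: Pt, a: Pt, b: Pt): number {
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const len = Math.hypot(dx, dy);
      if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
      // |cross product| / |chord length| = distance from p to line ab
      return Math.abs(dx * (a.y - p.y) - dy * (a.x - p.x)) / len;
    }

    function rdp(points: Pt[], epsilon: number): Pt[] {
      if (points.length < 3) return points;
      const a = points[0];
      const b = points[points.length - 1];
      let maxD = 0;
      let idx = 0;
      for (let i = 1; i < points.length - 1; i++) {
        const d = perpendicularDistance(points[i], a, b);
        if (d > maxD) { maxD = d; idx = i; }
      }
      if (maxD <= epsilon) return [a, b]; // everything in between deviates insignificantly
      const left = rdp(points.slice(0, idx + 1), epsilon);
      const right = rdp(points.slice(idx), epsilon);
      return [...left.slice(0, -1), ...right]; // avoid duplicating the split point
    }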
This makes me wonder how they pulled off something similar in Macromedia Flash (RIP) well over 20 years ago. I vividly remember being amazed by how it smoothed out curves when drawing freehand, with such limited processing power compared to today's CPUs.
Smoothing is a different operation where you are simplifying the bezier curve by removing redundant(ish) points. So if you draw an almost straight line, you may have created 100 control points, and then the software simplifies it down to 4 points.
For what it’s worth, perfect freehand (the library I wrote that they’re using for the digital ink) does not use Bézier curves at all to determine the position of points or create the “stroke” polygon that wraps the input points. Curves are only used to draw the line around that stroke polygon.
I tried to incorporate simplification into the perfect-freehand algorithm but eventually gave up, because I could not find any “stable” algorithm that would not cause the line to jump around as you drew / added points to the end. As far as I know, none exist. Most apps that use simplification solve this problem by simplifying the line only after you’ve finished drawing; however, for handwriting or artistic drawing this is especially bad, as it “undoes” legitimate decisions you’ve made about where to place detail. In addition, the perfect-freehand algorithm simulates pressure based on the density of points, so simplifying the line (i.e. removing points) without creating a corresponding pressure curve will cause a reduction in line variation, which is part of the fun of using it in the first place!
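For the curious, the pressure simulation described above boils down to something like the following. This is a simplified sketch of the idea, not the library's actual code: closely spaced points read as slow, deliberate movement and get more pressure.

    // Derive a pressure value per point from input-point spacing.
    function simulatePressure(points: { x: number; y: number }[], size = 16) {
      let pressure = 0.5;
      return points.map((p, i) => {
        if (i > 0) {
          const d = Math.hypot(p.x - points[i - 1].x, p.y - points[i - 1].y);
          const speed = Math.min(1, d / size);     // normalize spacing by brush size
          const target = 1 - speed;                // slower movement => denser points => more pressure
          pressure += (target - pressure) * 0.275; // ease toward the target to avoid jitter
        }
        return { ...p, pressure };
      });
    }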
I’d love to learn more about what the canva team has done here though. Freehand drawing is a fascinating problem space, even without the ml / shape recognition stuff.
I suspect it took mouse events and initially drew straight lines between them. That's necessary on 1990s hardware because drawing straight lines is fast, and you need it to be fast.
Then, when you are done drawing, it redraws the line, using the same points as before, but this time as input to a spline curve algorithm.
Drawing splines isn't much harder computationally, but notably, if you add one more point to the end of a spline curve, part of the line that you have already drawn changes. That in itself is very computationally heavy, since everything behind that point now needs to be redrawn - certainly not something you can be sure of doing at 60 fps!
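That two-phase approach is easy to sketch with today's canvas API (an assumed reconstruction, not Flash's actual implementation): cheap straight segments per input event while drawing, then one smooth pass through segment midpoints on release.

    const canvas = document.querySelector("canvas")!;
    const ctx = canvas.getContext("2d")!;
    const pts: { x: number; y: number }[] = [];

    canvas.addEventListener("pointermove", (e) => {
      if (e.buttons !== 1) return; // only while the primary button is held
      pts.push({ x: e.offsetX, y: e.offsetY });
      if (pts.length < 2) return;
      ctx.beginPath(); // fast path: one straight segment per event
      ctx.moveTo(pts[pts.length - 2].x, pts[pts.length - 2].y);
      ctx.lineTo(pts[pts.length - 1].x, pts[pts.length - 1].y);
      ctx.stroke();
    });

    canvas.addEventListener("pointerup", () => {
      if (pts.length > 2) {
        ctx.clearRect(0, 0, canvas.width, canvas.height); // single-stroke demo
        ctx.beginPath(); // smooth pass: quadratic curves through segment midpoints
        ctx.moveTo(pts[0].x, pts[0].y);
        for (let i = 1; i < pts.length - 1; i++) {
          const mx = (pts[i].x + pts[i + 1].x) / 2;
          const my = (pts[i].y + pts[i + 1].y) / 2;
          ctx.quadraticCurveTo(pts[i].x, pts[i].y, mx, my);
        }
        const last = pts[pts.length - 1];
        ctx.lineTo(last.x, last.y);
        ctx.stroke();
      }
      pts.length = 0;
    });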
Drawing lines is faster than you think! The perfect-freehand algorithm that I wrote / that canva is using here does not use splines but it does recompute the entire line on every frame. It’s fine at 60fps (and also fine at 120fps) up to a few thousand points on somewhat modern hardware before the paths get too slow to render. The algo itself is just blazing fast, in part because it does not rely on splines (which are much more complex).
For an svg-style approach to ink (as opposed to a raster / dab-style brush) there’s no other option than recomputing the whole line each time. As a bonus, you can adjust the properties after the fact very easily. (You can try that at perfectfreehand.com.)
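For anyone who wants to try it, the entry point is a single function; the option values below are just illustrative:

    import { getStroke } from "perfect-freehand";

    // Raw input samples as [x, y] pairs (per-point pressure is optional).
    const inputPoints: [number, number][] = [[0, 0], [4, 2], [10, 4], [20, 6]];

    const outline = getStroke(inputPoints, {
      size: 16,               // base stroke diameter
      thinning: 0.5,          // how strongly pressure thins the line
      smoothing: 0.5,
      streamline: 0.5,
      simulatePressure: true, // derive pressure from point spacing (mouse input)
    });
    // `outline` is an array of [x, y] points tracing the stroke's outer polygon;
    // fill it as an SVG path or a canvas Path2D.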
But on '90s hardware, you still have no compositing or back buffers, so if you want to 'move' previous bits of the line, you must redraw the background, which is the expensive bit.
I'm surely in the minority, but I oddly find myself enjoying the hand-drawn "shaky scribble" versions more than the "sleek vector graphic." I'm sure even my preference would be context dependent though, so even in my case it's a cool feature. But in a world of artificial perfection, there's something innately attractive in a genuine hand-drawn production.
If you implement a feature like this, please, make it optional and obvious when it’s enabled. It’s maddening when tools try to be too smart and don’t get it perfect (I have been guilty of this too)
There was a game called Scribblenauts that my kids loved, years before any of the recent ML/AI hype, and it was able to turn very rough scribbles into an amazing number of different objects. No idea how they did it, but even I was impressed - the kids thought it was magic.
It would be nice if this were open source :) Recently, various models have become quite small (this one is 250 kB, and some simple tasks have seen models of 50 kB or so, e.g. for fine-tuning large models). I am looking forward to when we can actually get back to small models for useful applications :)
They trained it to recognize nine predefined shapes?
Come on, if you're going to train a model, make it a generic smoother/DWIM for drawing shapes!
You will also get more "analog"/never-identical shapes, which will feel much more stylish in the way drums feel warmer than drum samples even when played by an expert at hitting the notes identically and on time.
The iPad drawing app Procreate has a smoothing tool that sounds kinda like what you're describing—you basically draw a line freehand, and then Procreate smooths it afterwards.
Most other drawing apps (like Clip Studio Paint, which is what I primarily use) have a comparable ability to smooth the lines as you're drawing by stabilizing the actual brush tool—basically slowing down the responsiveness of the brush to reduce jitter.
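That kind of stabilizer is commonly implemented as an exponential ease of the brush toward the raw cursor position. A generic sketch, not Clip Studio's actual code:

    // Ease the brush toward the raw cursor each input event.
    // Reset `brush` to the cursor position on pointerdown to avoid initial lag.
    let brush = { x: 0, y: 0 };

    function stabilize(raw: { x: number; y: number }, amount = 0.3) {
      // lower `amount` = heavier stabilization, laggier brush
      brush = {
        x: brush.x + (raw.x - brush.x) * amount,
        y: brush.y + (raw.y - brush.y) * amount,
      };
      return brush;
    }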
I agree, all the examples in TFA feel lifeless compared to the originals (except the circle). I could see the utility if they went for "proper" vector shapes, but here it feels like the worst of both worlds.
There's an odd feeling about the writing in this article. Maybe I'm seeing things but it does not feel like it's written or composed entirely by a person.
The engineers at ASML, TSMC, and others wake up every day and shoot lasers at droplets of molten tin to generate light with extremely short wavelengths, to make smaller and more performant chips.
And web developers wake up every day so that no one notices their work.
Seems like a recipe for fragility... The mixture of wavelengths will make optimizing other parts of the process very hard. And even keeping a consistent mix of wavelengths isn't easy.
More performant chips mean you can have more software abstraction and build things quickly. The increase in chip speed does not correspond to faster program execution but rather faster program authorship.
It's easier to train an army of web developers to build React applications than to teach them PHP + JS, Ruby + JS, etc. Those React developers can also (on average; many people are insanely productive in "uncool" languages) write applications more quickly.
For example, a company could write their app for macOS + Windows + Linux using native frameworks, or they could write their app once in JS + Electron.
A native app would certainly be much more performant, but that comes at the cost of being much more difficult to build, and most likely, Linux would not be supported at all.
What you've described is simply a tooling problem. We can (and should) have tooling that creates native, performant apps and is as easy to create with as React.
It's not against web devs in general, but this "give a man a hammer and every problem looks like a nail" approach to things.
Apps like Canva are unusable for me on my old PC. Many websites too.
I have only about 15 GB of mobile data, and some websites take like 20 megabytes of my monthly allowance without any fancy video/imagery.