Ship Shape (canva.dev)
327 points by SerCe on Nov 13, 2023 | 81 comments


IMO the RNN is overkill for this problem, compared to a simple and elegant algorithm called the "$1 unistroke recognizer". That one works beautifully even when trained with just a single sample of each gesture.

I hope $1 unistroke gets more recognition, because it can be integrated into any project in an afternoon to add gesture recognition and make the UI friendlier.

It works quite reliably for Palm-style "Graffiti" text entry, as long as each letter is just a single stroke. The original paper also makes a great effort to be readable and understandable.

https://depts.washington.edu/acelab/proj/dollar/index.html
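For a sense of how little code it takes, here's a minimal sketch of the core pipeline in TypeScript (resample, rotate to a canonical angle, scale, translate, nearest-template match). It omits the golden-section search over candidate rotations that the real $1 recognizer adds for robustness, and the names and point type are mine:

```ts
type Pt = { x: number; y: number };

const dist = (a: Pt, b: Pt) => Math.hypot(b.x - a.x, b.y - a.y);

const centroid = (pts: Pt[]): Pt => ({
  x: pts.reduce((s, p) => s + p.x, 0) / pts.length,
  y: pts.reduce((s, p) => s + p.y, 0) / pts.length,
});

// Resample the stroke to n evenly spaced points along its arc length,
// so fast and slow drawing produce the same representation.
function resample(pts: Pt[], n = 64): Pt[] {
  const total = pts.slice(1).reduce((s, p, i) => s + dist(pts[i], p), 0);
  const interval = total / (n - 1);
  const src = pts.slice();
  const out: Pt[] = [src[0]];
  let acc = 0;
  for (let i = 1; i < src.length; i++) {
    const d = dist(src[i - 1], src[i]);
    if (acc + d >= interval) {
      const t = (interval - acc) / d;
      const q: Pt = {
        x: src[i - 1].x + t * (src[i].x - src[i - 1].x),
        y: src[i - 1].y + t * (src[i].y - src[i - 1].y),
      };
      out.push(q);
      src.splice(i, 0, q); // continue measuring from the new point
      acc = 0;
    } else {
      acc += d;
    }
  }
  while (out.length < n) out.push(src[src.length - 1]); // float-rounding guard
  return out;
}

// Rotate so the angle from the centroid to the first point is zero.
// Skipping this step makes the recognizer rotation-sensitive, which is
// what you want if "7" and "L" must stay distinct.
function rotateToZero(pts: Pt[]): Pt[] {
  const c = centroid(pts);
  const theta = Math.atan2(pts[0].y - c.y, pts[0].x - c.x);
  const [cos, sin] = [Math.cos(-theta), Math.sin(-theta)];
  return pts.map((p) => ({
    x: (p.x - c.x) * cos - (p.y - c.y) * sin + c.x,
    y: (p.x - c.x) * sin + (p.y - c.y) * cos + c.y,
  }));
}

// Scale to a reference square and move the centroid to the origin.
function normalize(pts: Pt[], size = 250): Pt[] {
  const xs = pts.map((p) => p.x);
  const ys = pts.map((p) => p.y);
  const w = Math.max(...xs) - Math.min(...xs) || 1;
  const h = Math.max(...ys) - Math.min(...ys) || 1;
  const scaled = pts.map((p) => ({ x: (p.x * size) / w, y: (p.y * size) / h }));
  const c = centroid(scaled);
  return scaled.map((p) => ({ x: p.x - c.x, y: p.y - c.y }));
}

// Templates must be preprocessed with the same three steps. Classification
// is just the smallest mean point-to-point distance.
function recognize(stroke: Pt[], templates: Map<string, Pt[]>): string | null {
  const c = normalize(rotateToZero(resample(stroke)));
  let best: string | null = null;
  let bestD = Infinity;
  for (const [name, tmpl] of templates) {
    const d = c.reduce((s, p, i) => s + dist(p, tmpl[i]), 0) / c.length;
    if (d < bestD) [bestD, best] = [d, name];
  }
  return best;
}
```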


A big issue with the $1 recognizer is that it requires strokes to be drawn in a specific way. For example, to draw a circle you need to go counterclockwise; if you go clockwise (which seems more natural to me), it gets recognized as a caret. This makes it not really usable in a free-drawing context where the users are not aware of the details of your implementation.


But this is only a potential issue if you expect users to record their own gestures and then switch direction for some reason. If you are the one defining the gestures, you can just preprocess them to allow multiple directions/orientations (or just record multiple variants yourself).


Not an issue: just invert each recorded gesture and add it to the same symbol.
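Concretely, with any recognizer that stores templates as ordered point arrays, the inversion is just reversing point order. A sketch (the storage shape and names here are hypothetical):

```ts
type Pt = { x: number; y: number };

// Store both stroke directions under one gesture name, assuming templates
// are kept as ordered point arrays per gesture name.
function addBothDirections(
  name: string,
  points: Pt[],
  templates: Map<string, Pt[][]>
): void {
  const variants = templates.get(name) ?? [];
  // Reversed point order is the same shape drawn in the opposite direction.
  variants.push(points, [...points].reverse());
  templates.set(name, variants);
}
```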


This does not scale well when your drawing is more complicated. A simple example is a square, which can start in 4 places and go in 2 directions, so now you have 8 samples; and it gets more complicated still because some people use multiple strokes for the square.

The other algos in the family are more robust to this, but after experimenting, an RNN or vision model does much better on the consistency side of things.


What I meant is to add both the clockwise and counter-clockwise variants of the same gesture. Rotations are another matter: $1 unistroke can be made either sensitive or insensitive to gesture rotation, depending on what you want. Often you'd want to discern "7" from "L".

Uni-stroke is a much more elegant input method than multi-stroke. You can react to the user's gesture as soon as they lift the mouse button (or stylus or finger), without introducing some arbitrary delay. Users can learn and become very fast at using gestures. Multi-stroke, on the other hand, requires coordinating each stroke with the previous ones, and to me it doesn't justify its complexity. I admit I have a preference for software where users adapt and become proficient, while many products with a wider audience need to be more accessible to beginners. Different strokes...


Right, but for a square you have to add 8 samples, not 2, to handle the 4 starting points and 2 directions, and that still does not account for the users who multi-stroke.

> Different strokes...

I see what you did there :] I'm definitely in the reduce user burden camp.

https://quickdraw.withgoogle.com/ is a good baseline to start from for a more resilient gesture recognizer


it thinks everything I draw is a caret.


> ^^ ^^^^^^ ^^^^^^^^^ ^ ^^^^ ^^ ^ ^^^^^^

I don't understand what you're trying to say here.


People here testing out the example on this page and reporting errors seem to be missing the fact that this demo is "trained" on one example. The linked paper[0] goes into error rates, and they get better pretty quickly with a few more examples.

[0] https://faculty.washington.edu/wobbrock/pubs/uist-07.01.pdf, page 8


I've just tried it, and it's pretty bad, without training at least.

My rectangle is recognized as a caret, my zigzag as a curly bracket.

And it doesn't support drawing a shape in two strokes, like the arrow for example.


There's no "training"; it's more like data-sample matching, akin more to these new vector databases than to a neural network. You have to have gesture or point-cloud samples in the data set.


I played with this for a bit and found it too simple. If you don't draw the example shapes exactly, it confuses them. I recommend playing with "delete" versus "x" from the example shapes to see just how poorly this does. I could not get it to consistently differentiate between different drawing techniques.

This would certainly get you started for gesture interfaces, where drawing a shape the same way every time is expected. It would not be a good fit for the use case here of diagramming.


Agreed, it works really well for how simple it is!

We implemented it in ES6 as part of a uni project if anyone's interested: https://github.com/gurgunday/onedollar-unistroke-es6


I tried 4 different shapes (circle, rectangle, triangle and heart) and it always said "Ellipse with score ...".


From the README:

> By default, it recognizes three shapes: Arrow, Check mark, and Ellipse.

> You can add more templates by drawing them and clicking on the Add Template button.

It worked well for the three - except a clockwise circle wouldn't work, only a counter-clockwise one.


It works really well if all you are drawing is an ellipse ¯\_(ツ)_/¯. Could be a bug in the client or in your implementation of $1.


I implemented that in Objective-C when the iPhone was new-ish. It was a fun demo on a touch screen. It was surprising how well it worked for how simple it was. https://github.com/ddunkin/dollar-touch


it does not work as well.

I have this deep-seated fear that NNs will be the death of the lessons learned from 1970-2010. After all, if you can use massive amounts of compute to materialize what seems to be a good-enough function approximator, why bother with advanced algorithms at all?

Obviously the reason we should is that approximators like NNs have explainability issues and corner-case unpredictability issues, and they are bad at real-world complexity (which is why self-driving efforts continue to struggle even when exposed to a narrow subset of the real world).


I think you're right on about explainability and unexpected handling of corner cases - but I think one of the lessons from GOFAI is that handcrafted algorithms might look good in a lab but rarely handle real-world complexity well at all. Folks worked for decades to try to make systems that did even a tiny fraction of what ChatGPT or Stable Diffusion do, and basically all failed.

For safety stuff, justice-related decision-making, etc I think explainability is critical, but on the other hand for something like "match doodle to controlled vocabulary of shapes" (and tons of other very-simple-for-humans-but-annoyingly-hard-for-computers problems), why not just use the tiny model?

Maybe if we get really good at making ML models we can make models that invent comprehensible algorithms that solve complex problems and can be tweaked by hand. Maybe if we discover that a problem can be reasonably well solved by a very tiny model, that's a good indication that there is in fact a decent algorithm for solving that problem (and it's worth trying to find the human-comprehensible algorithm).


exactly


> However, if you’re anything like us, even a simple straight line drawn with a mouse or a trackpad can end up looking like a path trod by a tipsy squirrel. Don’t even get us started on circles and rectangles.

But who needs to draw shapes with their mouse in Canva? Years ago, Miro had a feature that converted your flailing attempts at drawing a star with a mouse into a geometrically precise star (or circle, or triangle, or whatever). I thought it was super cool, but then I never, ever needed to use it. I never need to do line drawing with my mouse: if I'm making diagrams, I just use pre-made shapes, which are faster. If I am making icons, I use a whole different process centered around Boolean operations and nudging points and the Pen tool—and I am probably using a dedicated program, like Illustrator, to do it. And if I am actually illustrating something (rarer these days than in times past) I have a tablet I will pull out. I am sure the tech here is cool, but what's the use case?


Canva is not a diagramming tool. It’s a visual design tool with a very different user base.

Their asset library is massive, with millions, maybe tens of millions, of images, including both photos and vector graphics.

One of the more annoying parts of the tool - in my limited experience - is searching through an endless library for simple shapes when I already know exactly what I want. Presumably this tool aims to solve that pain point.

Disclosure: worked there a few years ago.

Edit: I suspect (zero inside info) this use case is important because they want to be a competitive diagramming tool as well. However, they’ll be constrained in that they cannot fundamentally change the design experience for the other 99% of their current users.


> but what's the use case?

Designers/marketers who don't learn keyboard shortcuts, for whom the comparison is "drawing the shape with my mouse" (quick) vs. "going through upwards of a half dozen menus to pick the right shape, place it, then resize it" (slower). Even if the shape is available w/o going to any menus, drawing the entire thing with your mouse using a single cursor is going to be faster than placing and resizing a bunch of icons, switching to the arrow feature, and adding the arrows in.


Every designer I know (including myself) uses keyboard shortcuts like crazy. That's why Photoshop, Illustrator, Sketch, and Figma all have a very robust set of keyboard shortcuts. I assume marketers are the same—anyone who uses an application every day, really.


Which is why I specified those who don't :)


But I'm saying I don't know any designers who don't use hotkeys, and while that's anecdotal, I can't imagine they are more than a slice of a slice of Canva's target market.


The library Canva uses for drawing lines may be of interest: https://github.com/steveruizok/perfect-freehand


Doesn't look like Canva is a sponsor...



After being called out lol


Is 5K for this a good amount? It seems generous to me, but I’m also sure it’s only a drop in the bucket for Canva.


I’m the author of perfect-freehand. It’s a good amount and I found the whole interaction really great: I get to tease library users on twitter to sponsor me, get a bunch of engagement, and give the company the opportunity to sponsor me to get a positive boost from a happy ending. No complaints.

Having my work used in a visible way by large companies does make me wish I’d productized it more. Maybe sometime I’ll release a refactored/improved version under a different license. However, I’ve also started a company around a different open source project of mine (tldraw.com) and if there’s a bag to be got, I’m sure I’ll get it there.


Nice work and thank you for replying. I’m new(ish) to open source and was asking because I wanted the perspective of someone more experienced. I don’t have much context for what level of work goes into making something like this, nor have I any clue what it feels like to know a big company is using my work. Your reply really helped to inform my understanding of these things and like you, I’d be quite pleased with a sponsorship of that amount.


If I was the creator of the project, I'd be happy with it. Most companies would not have donated anything.


If this seems generous then it’s a good amount. That it’s only a drop in the bucket for Canva is not relevant.


"We developed a variation on the Ramer-Douglas-Peucker (RDP) algorithm, which is a curve simplification algorithm that reduces the number of points in a curve while preserving its important details. It achieves this by recursively removing points that deviate insignificantly from the simplified version of the curve."

This reminded me of an old side project, which others may be interested in. I applied Douglas-Peucker to Picasso for a talk at Strange Loop 2018:

Picasso's Bulls: Deconstructing his design process with Python https://rrherr.github.io/picasso/
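For anyone who hasn't seen it, the classic (unmodified) RDP recursion is only a few lines. A sketch of the textbook algorithm, not Canva's variant:

```ts
type Pt = [number, number];

// Perpendicular distance from point p to the line through a and b.
function perpDist(p: Pt, a: Pt, b: Pt): number {
  const [dx, dy] = [b[0] - a[0], b[1] - a[1]];
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / len;
}

// Ramer-Douglas-Peucker: if the farthest interior point deviates more than
// epsilon from the end-to-end line, keep it and recurse on both halves;
// otherwise the whole interior collapses to a straight segment.
function rdp(points: Pt[], epsilon: number): Pt[] {
  if (points.length < 3) return points;
  let maxD = 0;
  let idx = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpDist(points[i], points[0], points[points.length - 1]);
    if (d > maxD) {
      maxD = d;
      idx = i;
    }
  }
  if (maxD <= epsilon) return [points[0], points[points.length - 1]];
  return [
    ...rdp(points.slice(0, idx + 1), epsilon).slice(0, -1), // drop shared point
    ...rdp(points.slice(idx), epsilon),
  ];
}
```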


This makes me wonder how they pulled off something similar in Macromedia Flash (RIP) well over 20 years ago. I vividly remember being amazed by how it smoothed out curves when drawing freehand, with such limited processing power compared to today's CPUs.


LeCun et al. got 99%+ handwritten digit accuracy in 1995, which is pretty analogous to shape identification.

Having it run trivially and performantly in the browser is still an accomplishment. As always, the experience for the user is what counts.


It was core functionality for the Apple Newton in 1993, with a 20 MHz ARM processor.

https://en.wikipedia.org/wiki/MessagePad#User_interface


Smoothing is a different operation, where you are simplifying the Bézier curve by removing redundant(ish) points. So if you draw an almost straight line, you may have created 100 control points, and then the software simplifies it down to 4 points.


For what it’s worth, perfect freehand (the library I wrote that they’re using for the digital ink) does not use Bézier curves at all to determine the position of points or create the “stroke” polygon that wraps the input points. Curves are only used to draw the line around that stroke polygon.

I tried to incorporate simplification into the perfect freehand algorithm but eventually gave up because I could not find any “stable” algorithm that would not cause the line to jump around as you drew / added points to the end. As far as I know none exist. Most apps that use simplification solve this problem by simplifying the line only after you’ve finished drawing, however for handwriting or artistic drawing this is especially bad as it “undoes” legitimate decisions you’ve made about where to place detail. In addition, the perfect-freehand algorithm simulates pressure based on density of points, so simplifying the line (ie removing points) without creating a corresponding pressure curve will cause a reduction in line variation, which is part of the fun of using it in the first place!

I’d love to learn more about what the canva team has done here though. Freehand drawing is a fascinating problem space, even without the ml / shape recognition stuff.
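For anyone curious, basic usage per the README looks roughly like this (option values are illustrative, not recommendations):

```ts
import { getStroke } from "perfect-freehand";

// Input points are [x, y, pressure]; with a mouse there's no real pressure,
// so it can be simulated from point density instead.
const inputPoints = [
  [40, 40, 0.5],
  [60, 48, 0.5],
  [90, 52, 0.5],
  [130, 50, 0.5],
];

// getStroke returns the outline polygon of the stroke (not a centerline):
// an array of [x, y] points you fill as one closed path.
const outline = getStroke(inputPoints, {
  size: 16,               // base stroke width
  thinning: 0.5,          // how much pressure thins/thickens the line
  smoothing: 0.5,
  streamline: 0.5,
  simulatePressure: true, // derive pressure from how densely points arrive
});

// Convert the outline to an SVG path string for rendering.
const d =
  outline.length === 0
    ? ""
    : "M " +
      outline.map(([x, y]) => `${x.toFixed(2)},${y.toFixed(2)}`).join(" L ") +
      " Z";
```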


Irrelevant - you must use machine learning for everything now.


These GPUs aren't going to heat themselves!


I lol’d


I suspect it took mouse events and initially drew straight lines between them. That's necessary on 1990s hardware because drawing straight lines is fast, and you need to do it fast.

Then, when you are done drawing, it redraws the line, using the same points as before, but this time as input to a spline curve algorithm.

Drawing splines isn't much harder computationally, but notably if you add one more point to the end of a spline curve, then part of the line that you have already drawn changes. That in itself is very computationally heavy, since everything behind that line now needs to be redrawn - certainly not something you can be sure can be done at 60 fps!
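That two-phase approach is easy to prototype with today's canvas API. A minimal sketch of the structure being guessed at here (midpoint quadratics are a common smoothing trick, not Flash's actual method):

```ts
// Phase 1: cheap straight segments while the mouse is down.
// Phase 2: one smoothing pass when the stroke ends.
const canvas = document.querySelector("canvas")!;
const ctx = canvas.getContext("2d")!;
const pts: { x: number; y: number }[] = [];

canvas.addEventListener("mousemove", (e) => {
  if (e.buttons !== 1) return; // only while the primary button is held
  pts.push({ x: e.offsetX, y: e.offsetY });
  if (pts.length < 2) return;
  const [a, b] = [pts[pts.length - 2], pts[pts.length - 1]];
  ctx.beginPath();
  ctx.moveTo(a.x, a.y);
  ctx.lineTo(b.x, b.y); // fast incremental segment; nothing is redrawn
  ctx.stroke();
});

canvas.addEventListener("mouseup", () => {
  ctx.clearRect(0, 0, canvas.width, canvas.height); // redraw everything once
  if (pts.length > 2) {
    ctx.beginPath();
    ctx.moveTo(pts[0].x, pts[0].y);
    for (let i = 1; i < pts.length - 1; i++) {
      // Each raw point becomes a control point; the curve passes through midpoints.
      const mx = (pts[i].x + pts[i + 1].x) / 2;
      const my = (pts[i].y + pts[i + 1].y) / 2;
      ctx.quadraticCurveTo(pts[i].x, pts[i].y, mx, my);
    }
    ctx.lineTo(pts[pts.length - 1].x, pts[pts.length - 1].y);
    ctx.stroke();
  }
  pts.length = 0;
});
```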


Drawing lines is faster than you think! The perfect-freehand algorithm that I wrote / that canva is using here does not use splines but it does recompute the entire line on every frame. It’s fine at 60fps (and also fine at 120fps) up to a few thousand points on somewhat modern hardware before the paths get too slow to render. The algo itself is just blazing fast, in part because it does not rely on splines (which are much more complex).

For an svg-style approach to ink (as opposed to a raster / dab-style brush) there’s no other option than recomputing the whole line each time. As a bonus, you can adjust the properties after the fact very easily. (You can try that at perfectfreehand.com.)


But on '90s hardware you still have no compositing or back buffers, so if you want to 'move' previous bits of line, you must redraw the background, which is the expensive bit.


I don't deal with splines much, but when I do, I'm reticulating splines.


Great article, and very interesting work.

I'm surely in the minority, but I oddly find myself enjoying the hand-drawn "shaky scribble" versions more than the "sleek vector graphic." I'm sure even my preference would be context dependent though, so even in my case it's a cool feature. But in a world of artificial perfection, there's something innately attractive in a genuine hand-drawn production.


If you implement a feature like this, please make it optional and obvious when it’s enabled. It’s maddening when tools try to be too smart and don’t get it perfect (I have been guilty of this too).


There was a game called Scribblenauts that my kids loved years before any of the recent ML/AI hype, and it was able to turn very rough scribbles into an amazing number of different objects. No idea how they did it, but even I was impressed; the kids thought it was magic.

https://store.steampowered.com/app/218680/Scribblenauts_Unli...


I’ve played it — it truly is amazing. If I’m remembering correctly, it made it to iOS, too.


It would be nice if this were open source :) Recently, various models have become quite small (this one is 250 KB; other simple tasks have seen models of 50 KB or so for finetuning large models). I am looking forward to when we can actually get back to small models for useful applications :)


A pentagram and a sparkly star are not the same thing. Is this an example of underfitting?



They trained it to recognize nine predefined shapes?

Come on, if you're going to train a model, make it a generic smoother/DWIM for drawing shapes!

You will also get more "analog"/never-identical shapes, which will feel much more stylish, in the way real drums feel warmer than drum samples even when played by an expert who hits the notes identically and on time.


The iPad drawing app Procreate has a smoothing tool that sounds kinda like what you're describing—you basically draw a line freehand, and then Procreate smooths it afterwards.

Most other drawing apps (like Clip Studio Paint, which is what I primarily use) have a comparable ability to smooth the lines as you're drawing by stabilizing the actual brush tool—basically slowing down the responsiveness of the brush to reduce jitter.
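For what it's worth, the usual way to implement that kind of stabilizer is an exponential moving average that trails the real cursor; lower alpha means more smoothing and more lag. A sketch of the general technique, not Clip Studio Paint's actual implementation:

```ts
type Pt = { x: number; y: number };

// Returns a function that nudges a smoothed position toward each raw
// pointer sample, filtering out hand jitter at the cost of responsiveness.
function makeStabilizer(alpha = 0.25) {
  let last: Pt | null = null;
  return (raw: Pt): Pt => {
    last = last
      ? {
          x: last.x + alpha * (raw.x - last.x),
          y: last.y + alpha * (raw.y - last.y),
        }
      : raw;
    return last;
  };
}

// Usage: feed each pointer sample through it and draw the returned point.
const stabilize = makeStabilizer(0.2);
```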


I agree, all the examples in TFA feel lifeless compared to the originals (except the circle). I could see the utility if they went for "proper" vector shapes, but here it feels like the worst of both worlds.


There's an odd feeling about the writing in this article. Maybe I'm seeing things but it does not feel like it's written or composed entirely by a person.


followed the "Draw" link and played with the thing there, but didn't see a way to demonstrate this functionality? Is it a paid feature or something?


It triggers if you keep the mouse button down for a couple of seconds after finishing the stroke.


Same. After an hour of technical description of how they did it, I didn't find out WHERE / HOW we as users can actually use this feature!!


If the model is on the client side anyway, why not make it open source?


To keep a competitive advantage and get value out of your investment.


Still, someone could now reverse engineer it and make it open source.


How will this change with AI, though?


It already uses AI so I'm not sure what you mean.


The engineers at ASML, TSMC, and others wake up every day and shoot lasers at liquid lead to generate light with extremely short wavelengths, to make smaller and more performant chips.

And web developers wake up every day so that no one notices their work.


Nitpick: TSMC's EUV process uses lasers to vaporize tin, not lead, into an EUV-emitting plasma.


Does the tin itself lase? Is the radiation given off from the tin single-frequency and phase coherent?


No. And it doesn’t matter for their use case.


seems like a recipe for fragility... The mixtures of wavelengths will make optimizing other parts of the process very hard. And even keeping a consistent mix of wavelengths isn't easy.


Well, go apply here: https://www.cymer.com/careers/

Given the criticality of the tech, I'm sure the price will be met.


More performant chips mean you can have more software abstraction and build things quickly. The increase in chip speed does not correspond to faster program execution but rather faster program authorship.

It's easier to train an army of web developers to build React applications than to teach them PHP + JS, Ruby + JS, etc. Those React developers can also (on average; many people are insanely productive in "uncool" languages) write applications more quickly.

For example, a company could write their app for macOS + Windows + Linux using native frameworks, or they could write their app once in JS + Electron.

A native app would certainly be much more performant, but that comes at the cost of being much more difficult to build, and most likely, Linux would not be supported at all.


What you've described is simply a tooling problem. We can (and should) have tooling that creates native, performant apps and is as easy to create with as React.


I think you're right; we just aren't there yet.

My point was that these performance increases aren't simply going into the void, they're being transformed into productivity increases.


Faster or just cheaper web devs that won’t unionize?


Both, probably.

If you can teach more people how to write web apps, that skill becomes less valuable.


Wouldn't it be hilarious if some ASML/TSMC engineers used Canva internally? I bet it happens in some corner.


They certainly do, somewhere.

It's not against web devs in general, but this "give a man a hammer and every problem looks like a nail" approach to things. Apps like Canva are unusable for me on my old PC. Many websites too. I have only about 15 GB of mobile data, and some websites take like 20 megabytes of my monthly allowance without any fancy video/imagery.

Related article: https://www.theolognion.com/unreal-engine-5-is-meant-to-ridi...



