Hacker News new | past | comments | ask | show | jobs | submit login
"River" detection in text (dsp.stackexchange.com)
434 points by cpleppert on Feb 23, 2013 | hide | past | favorite | 91 comments



It's really impressive how their sophisticated image processing algorithms are only a few lines long. If I wrote an algorithm to detect rivers, it would be hundreds of lines long and less effective. These guys tackle it mathematically and it seems almost magical. I wish I could think that way.


I find this comment interesting given the recent (ok, continuous) talk about not using maths puzzles in programmer interviews. This is perhaps an interesting real-world case that demonstrates how a mathematical outlook leads to much cleaner code.

Ok, solving this kind of thing is rare, but still...


In some areas, programming is not the most important ability, but instead the understanding of the mathematics behind the phenomenon. This is the case in areas such as operations research, computer vision, and machine learning. You cannot assume that, to be a programmer, a person must know these subjects. On the other hand, creating a good algorithm in these mathematical subjects doesn't imply that you are a good programmer.


The secret sauce here is not the ability to solve "maths puzzles", but specialized domain knowledge.


That's exactly the wrong way round. You can acquire domain knowledge, you can have a domain expert assigned to you to work with you. What you can't acquire, on demand, is the ability to think in the ways that mathematics gives you. That requires extensive training and practice, and recognizing when obscure bits of theory are applied is something that doesn't come overnight.


Its made up of both actually. Somebody outside devops would consider pervasive use of sed, awk and other unix text processing utilities line noise. Yet such a thing would come naturally to that person working on it daily.

If you can't really get such stuff easily at the very first time does it mean you are a bad programmer?

Not quite.

The way I look at it, I would consider somebody good if he can read the documentation/theory etc and then come up solution to problems(read: can write programs). That way you know the guy can actually get some work done. It requires practice and time to get used to and comfortable to a new domain. That's natural friction you have to wear out. Regardless of whether its math, music, literature of whatever. Paying too much importance to factual stuff isn't of much use. What's more important is, can the person work his way out of the problem.

I checked out your bio, I understand you do math for a living and are obviously a little defensive when some one underplays the importance of your area of expertise. But the fact of the matter is though math is relevant, the areas in which its relevant are largely either rare or are already solved and presented to most programmers as libraries and frameworks. And if its not, simple analysis, a little reading and experimenting is sufficient to solve the problem at hand.

Your problem is not with programmers but with a term called abstraction. And fighting that is futile. The benefits vastly over weight any intellectual argument you can present against it.

EDIT: By the way after reading your bio, I have developed atmost respect for all the work you have done in your life.


Was a fine tangential comment, till you started an unsolicited diagnosis of Colin's hypothetical disease.

Downvoted.


Huh? What hypothetical disease?


He said:

    Your problem is not with programmers but with a term
    called abstraction. And fighting that is futile. The
    benefits vastly over weight any intellectual argument
    you can present against it.
Utterly bizarre, to be claiming that a pure mathematician has a problem with abstraction.


Why would that be a disease though? I thought he'd maybe edited his reply and said something else. Maybe I'm just tangled up in the misuse of a word though!

Side note, does the theory of juggling have a mathematical underpinning?


I have no idea why it would be a "disease", but he did seem to be diagnosing my attitude or "problem."

Don't care.

And yes, there is a great deal of mathematics underneath the structures of juggling patterns. Using them we predicted the existence of previously unknown juggling patterns, and have now used the math to create a proof that of all the juggling tricks of a certain type, we know them all (for some definition of "to know").


It was supposed to be a metaphor, intended to highlight that no such disease is present and nor was it necessary to go about trying to diagnose what "the problem" with Colin was.

diagnose <-> disease

It clearly did not work as intended.


Yes, juggling patterns can be expressed neatly by the use of fairly simple mathematics. Look up "siteswap" or "siteswap notation". (It's also closely related to the mathematics of bell-ringing or "chiming".)


This is an utterly bizarre comment, and I have absolutely no idea what to make of it. It seems mostly non-sequitur, and then gets things exactly the wrong way round. I really don't know how to respond - it's almost perverse.

However, there appears to be such a depth of misunderstanding, or non-understanding, I feel compelled to try ...

Having read, re-read, and re-re-read your comment, let me say this. You appear to completely misunderstand the point. I'm not talking about mathematical ideas that have been taken, used, programmed, and made available through libraries. I'm talking about recognizing problems and techniques when they turn up unexpectedly, and in disguise.

Many, many times in my 30 years as a working programmer have I found elegant solutions to previously intractable problems, purely because I recognized some sort of weird mathematical structure lurking underneath. It's my specialty to go into a situation where the domain experts have worked to solve a problem, sometimes for years, to have them explain the problem to me, and then to bring different insights and tools to bear on the problem. I'm not necessarily more clever, or more capable, I just have a different toolbox. And when the problem at hand is unusual, my toolbox is particularly powerful.

You said:

    Somebody outside devops would consider pervasive use of sed, awk
    and other unix text processing utilities line noise.  Yet such a
    thing would come naturally to that person working on it daily.
I have no idea what point you're trying to make here.

    If you can't really get such stuff easily at the very first time
    does it mean you are a bad programmer?  Not quite.
Of course it doesn't. Different programmers have different skill sets, and different experience. Someone who doesn't know databases or web programming may be a wonderfully productive programmer in another area of specialization.

I have no idea what relevance this has to my comment.

    The way I look at it, I would consider somebody good if he can read
    the documentation/theory etc and then come up solution to problems
    (read: can write programs). That way you know the guy can actually
    get some work done. It requires practice and time to get used to and
    comfortable to a new domain. That's natural friction you have to wear
    out. Regardless of whether its math, music, literature of whatever.
    Paying too much importance to factual stuff isn't of much use. What's
    more important is, can the person work his way out of the problem.
Perhaps. My point is that people often dismiss mathematics as useless to programmers because they can't see how this lemma or that theorem can ever be relevant. My point is that the methods of thought trained by the study of advanced mathematics has, in my experience and others', regularly led to deep insights in apparently unrelated problem domains. This ability is not something you can "just read up on." This isn't a case of reading documentation and/or existing theory - this is a case of seeing something as being a disguised example of something else. I've lost count of the number of times I've recognized a matching problem, or a topological separation, or an abstract vector space, or a group acting on a metric space, and that recognition has subsequently led to a new solution.

    I checked out your bio, I understand you do math for a living and are
    obviously a little defensive when some one underplays the importance
    of your area of expertise. But the fact of the matter is though math
    is relevant, the areas in which its relevant are largely either rare
    or are already solved and presented to most programmers as libraries
    and frameworks.  And if its not, simple analysis, a little reading and
    experimenting is sufficient to solve the problem at hand.
Then, with respect, your experience appears to be limited, and your opinions parochial.

    Your problem is not with programmers but with a term called abstraction.
    And fighting that is futile. The benefits vastly over weight any
    intellectual argument you can present against it.
This is just bizarre. I'm a pure mathematician - how can you suggest my problem is with abstraction? How can you possibly think I'm offering any argument against abstraction?


When I said 'your', I didn't mean 'you'. I mean't the problem with 'your argument'. I apologize if this has hurt you. Its like code review, I was intending to critique your argument, not you.

Yes, I agree my experience is limited. In fact now that you mention you have been a programmer for 30 years. I now realize I have much more to learn and understand, Its possible I have simply don't even know what you know and how deeply you understand things.

What advice would you give to some one like me.


  > I meant the problem with 'your argument'.
To recap the discussion:

==========

Chirono: I find this comment interesting given the recent ... talk about not using maths puzzles in programmer interviews. This is perhaps an interesting real-world case that demonstrates how a mathematical outlook leads to much cleaner code.

thomasz: The secret sauce here is not the ability to solve "maths puzzles", but specialized domain knowledge.

I replied: That's exactly the wrong way round. You can acquire domain knowledge, you can have a domain expert assigned to you to work with you. What you can't acquire, on demand, is the ability to think in the ways that mathematics gives you. That requires extensive training and practice, and recognizing when obscure bits of theory are applied is something that doesn't come overnight.

==========

So my argument is that while domain knowledge is a good thing, and (jacquesm eloquently argued) often necessary to solve a problem, it usually can be acquired as needed through reading, study, and discussion with local experts. The ability to recognize, let alone solve, problems that are actually obscure mathematics in disguise cannot be acquired on as "as needed" basis. This is why knowledge of advanced mathematics needs to be gained early, and if you don't have it by the time you are a practising, working programmer, you are unlikely ever to acquire it.

Not having acquired such skills does not make one a bad programmer. I know very, very few programmers who hold advanced degrees in mathematics, and yet the world is full of programmers who can "get the job done" efficiently and effectively. Equally, having skills in advanced mathematics does not, of itself, make one a better programmer (and certainly not a better person.) I know many PhDs in mathematics who simply cannot program effectively.

But it does make one a programmer with different skills, and those skills can enable one to solve problems others can't, and sometimes to produce solutions that are more elegant. Not always, not every problem requires such skills. But if it does, by the time you find the problem, it's too late to acquire them.

  > What advice would you give to some one like me.
I don't know what you like, what skills you already have, or what you want to end up doing. I studied math because it was fun, exciting, and I was good at it. Computers at the time didn't exist in the form they do now - maybe I would have done computing. Impossible to say.

So I can't give advice - there would be a real danger of simply trying to turn you into a clone of me, and that's a bad idea. I can tell you that understanding algorithms, time-complexity, calculus, vector spaces, and point-set topology have all been of direct and immediate use in my work.

And if you'd like a simple puzzle, here:

    Suppose the demand for a product falls linearly as a function
    of its price.  Show that if you price the product to maximise
    your profit, you will have less than half the possible market.
That turned up a few years ago in a discussion with a customer.


If you want a mathematician who can also program, that's fine. But you should be aware that that's what you actually want, and make that clear to potential applicants.

The majority of development roles do not need any more mathematical knowledge than any other job.


You don't know that. The point is that mathematical insights turn up and are useful far, far more often than people expect. The problem is that the insights usually go unnoticed because there's no one with the background or skills to recognize them.


For what it's worth (I've only taken one college level course on computer vision, plus some side projects) my experience with matlab is that it's surprisingly easy to put together a short solution once you've figured out what's actually needed. The matlab CV API is very broad and pretty well documented, so it's less about knowing the math and more about reading the manual. In that regard producing a concise solution to a CV problem is very much like producing a concise solution to most other problems in any language.


The MATLAB CV library is almost identical to OpenCV if you prefer C or C++.


Actually math and stats applications are very common, developers could really benefit from knowing more math, recognizing known problems and applying known solutions. People just do things the dumb way more often than not, and then spend months if not years fixing the mess they've created.


There is no panacea in image processing.

These methods work for this particular image, and give an idea about the possible solution, but the thresholding and morphological operations are not scale invariant, and very sensitive to light changes and noise.

Any algorithm that can detect rivers in text from any image, acquired by some device, such as mobile phone or a scanner, in any light circumstances, will run hundreds of lines long.


Sure, but the problem here is "we'd like to detect rivers in TeX, but it doesn't know about glyph shape which we know is important." They're talking about clean images only.


I wonder how hard it would be, given the detection algorithm, to fix this automatically. Seems like simulated annealing would be a pretty good fit, where you perturb the word spacing on each iteration and the energy function is the total river length. The fitness landscape is rich enough with good solutions that you should rarely need more than a few iterations to arrive at one.

Still, that's a pretty expensive fitness test, and I wonder if there's a more elegant and efficient approach than Monte Carlo methods.


Thanks for this. I just spent 30 minutes reading up on calculating pi with a simple Monte Carlo algo. Thinking about doing it d3.js...


It's an entertaining exercise, especially when you watch your computer chew away for several minutes before telling you that π is probably somewhere around 3.26.


Huh? You should get as close as 3.14 in less than a second if your random number source is decent.


I was wondering about this. Evidently the more random, the more accurate. Wacky!


Perhaps. I may be misremembering, or may have been using a particularly crappy RNG when I last did it. It was a long time ago.


How long ago? How much faster is current hardware?

If it was, say, a 4MHz machine, you both could be right.


Just for fun, and because I'd never done it before, I just threw together some Ruby to do this: https://gist.github.com/peterc/5019760

It gets to 3.14 within a second, about 30 seconds to 3.1415.. and Ruby is surely doing this in 100x the time C or Java would.

Update: Also did it for JavaScript - https://gist.github.com/peterc/5019804


You can optimize that JS code a lot by using the unit circle, limiting yourself to the first quadrant, and using multiplication instead of Math.pow(). Then you can make it branchless using a little bitwise trickery. The result is this: https://gist.github.com/osuushi/5022143 .

OK, so I was going to just do the unit circle thing. Then I got stuck in optimization mode. God I miss performance graphics programming.


It would have probably been on an 450 MHz G4, maybe 6-7 years ago, and if I had to guess, I'd say I probably did it in Python.


Calculating Pi with Monte Carlo is equivalent to calculating the Fibonacci series with the recursive formula.

It's a nice example but almost useless in practice.


I did it in Clojure once[1]. It's quite fun, I recommend it as well.


I've been seeing these all my life and for some reason never thought to ask if anyone else noticed them. I never found them particularly distracting and didn't know they were a sign of bad typography. Still, cool to see an article about them.


If you start to study typography at a relatively deep level, there will be two things that will absolutely kill you on a daily basis. One is rivers. The second is bad kerning on signs/displays.



Not a font-nerd but I'll bite. Is the 'N' in MADISON and SECOND upside down? (but correct in http://www.flickr.com/photos/ryjones/4046758633/)


You are correct. It drives me nuts when I walk around Second and Madison in Seattle; I would assume the workers were trolling when they did it, but I have pointed it out to plenty of co-workers that didn't see a problem.


> The second is bad kerning on signs/displays.

Relevant xkcd: http://xkcd.com/1015/.


Yeah bad keming is a real problem


Note to myself: Don't study typography.


Also do not visit the Wikipedia page on kerning. It taught me to recognise bad kerning :(


Same here. Did you tend to notice them out the corner of hour eye more than straight-on?


I think the vision mode that detects rivers is more of a "big-picture, perhriphery" type where the vision mode that you use for for reading is a focused, small area type. I think you will always catch them in the big picture, but may not notice when reading.


The peripheral vision is also blurry.

Rivers stand out more in blurred text, even in central vision.


Definitely. People typesetting books are supposed to get rid of them but it's a manual process. As I've written in another comment I'd typically "blur" my vision a bit and quickly scan all the pages of the book to track these down (on screen) and then remove them. Don't know what was the technique of other typesetters when it came down to finding these.

Some proofreaders would note them should they find / notice any but some wouldn't.

Now that I think of it, using OP's findings it would probably wouldn't really be too hard to write a "plugin" taking screenshots of Quark XPress / Adobe InDesign / LaTeX preview and showing "rivers" in a HUD on top of the screen-rendered pages. This would make chasing "rivers" quite easy.


Definitely cool. How about the reverse for an art project? Maybe the Constitution or Bill of Rights with rivers forming the statue of liberty.


Well, the reverse is already done in ASCII art isn't it?


There's lots of typographic art that doesn't use monospaced fonts.


Something similar: http://stackoverflow.com/questions/8479058/how-do-i-find-wal...

These sorts of image processing questions are incredibly entertaining imo (and I don't mean it in a bad way).


I was thinking about developer continuing education yesterday, and this underlines the need for some way of keeping sharp - without wasting time or direction.

I simply don't know what a Hough Transform is and rather than Wikipedia, I would rather a coursera course on image processing - it's to get things in context that matters.

As I get older i am not afraid of hard work, just afraid of exploring in every direction - the waste of time and effort is the problem, time and effort finding out what to learn rather than the practise of learning

Oh, just answered my own question - coursera!


TeX uses a dynamic programming algorithm to perform its advanced hyphenation, which allows the text to fill the page "beautifully": http://en.wikipedia.org/wiki/TeX#Hyphenation_and_justificati...

If TeX is already using dynamic programming in order to improve the visual appearance of the words, I imagine that the same can be done for the space between the words without having to resort to image processing of the TeX document rendered as PDF.


The thread at SO discusses this; both glyph positions and glyph shapes are important for a river to become noticeable.


Whether vanilla TeX supports this out of the box or not, the TeX technology stack (including MetaFont and friends) has intimate knowledge of the size and shape of all glyphs. It is also not random as to which types of letters accentuate or obfuscate the border of a river in text. It is hard for me to believe that one cannot use dynamic programming to minimize the occurrence of rivers in the final layout, in the same way that TeX can minimize ugly borders using sophisticated hyphenation algorithms.


I think a false positive or two is worth it for glyph based analysis.

You could even do some sensible analysis per glyph.

width of base: A

vs width of top: V


Is glyph width at base and top available to the layout algorithm? I don't know TeX that well.


I've always wondered if there exist any book that uses rivers as an encryption technique for hidden messages. Like the author has hidden an easter egg or something more exciting.


I love the fact that all three answers used totally different techniques and each found the answer.


This seems to me, like the least efficient method ever for dealing with the problem. Ok you want to find 'rivers' in text. To do so you have to turn that text into an image and then run it through your app, then go back and manually correct the problem? What was the point of the app again? I don't care if it took 1 line of code or 1000, this seems completely useless, when the anomaly is blatantly obvious when proofreading. Not to mention if you wanted to fix it dynamically, you'd end up with an image of a block of text instead of text, no big deal for print, but devastating for SEO.. "ok smart ass what would you recommend?" is that what your thinking? Well, since you asked; why not take an open source text editor and add an algorithm that stores each line of text in an array, find the index of all the spaces and compare it to the previous and next lines of text. If the index of spaces are relatively 0 between lines; you have a river. If the indexes increment or decrement; you have a river. Now add an extra space somewhere or move the last word on the last line to the next line. Whatever solution is most aesthetically pleasing. Now your rivers are getting fixed on the fly and you don't need to take a screenshot of a block of text to analyze it. Maybe I missed the point of the article, but i thing those of you praising their solution aren't taking into account what the problem actually is, and the fact that their app doesn't actually include a solution to the problem.


I love the variance in answers. You get several different sig-pro solutions and then at the bottom also a machine learning answer.

IMHO the machine learning solution could be better, as rivers are defined more aesthetically than functionally.


I'd like to have the tex, to translate the letters to blocks before doing image processing, but it's almost as good to simply convert it to a binary mask, and then "dilate" or "grow" the pixels. I think it makes it a simpler problem both in processing time and conceptually. Not sure about how I'd decide, identify what vertical lines to pick out, there are a lot of choices for that.


Next, someone should put that in LaTeX to automatically fix it rather than just detect it. (Or some sort of HTML/JS plugin, since using LaTeX again is somewhat disconnected from my immediate life goals).


I wouldn't hold my breath. This sounds like a huge, messy, heuristic coding effort to solve a rare, über cosmetic problem. TeX and typography in general seem to attract really obsessive folks, but this doesn't strike me as harboring a hidden ideal that's computationally well-defined and feasible the way other parts of TeX turned out to be. I guess we'll find out if somebody does it.


I like the idea that mistercow proposes above: use an algorithm to sample the space of possible typsettings and find one with less that some minimum amount of rivers or river length.


I get that, but here's the problem: there's no guarantee that any perturbation isn't going to produce more rivers. It's guess-and-check. For simulated annealing to work, you have to believe that large perturbations are more likely to produce large differences and small ones produce small differences. I am not convinced that property holds here, because a river is an optical phenomenon that doesn't directly relate to the spacing size, which is the one variable you're interested in manipulating. A small change in space might have a small effect on the river you noticed, but create an entirely new and worse one. Another problem with simulated annealing is that you want to start with large changes and work your way to small changes, but TeX has already optimized the spacing, so you would really rather start with small changes.

Your next problem is that TeX doesn't know what glyphs look like, just how much space they take up. If the "riverbanks" are made up of As and Vs, the river will be extremely pronounced. But it may not be noticeable at all if they're made up of Ms and Xs. Worse, it will also depend on the font in play. But TeX doesn't know what the glyphs look like while it's adjusting the whitespace, so you now need to introduce a feedback loop between the eject-page phase and the paragraph layout phase. I'm under the impression you'd be jumping over three or four phases to do that, because TeX lays out lines first, then paragraphs, then emits pages.

This is a huge amount of work. How big a problem are rivers?


I want to know if it can be done on HTML texts with JavaScript.


Really impressed by the short length and high effectiveness of the code. Thanks!


Why are they using bogus examples?


It's probably Lorem Ipsum, which is commonly used when you want to focus on layout rather than meaning. Since the text is meaningless it won't be distracting, yet it looks close enough to real to be useful. See http://en.wikipedia.org/wiki/Lorem_ipsum


If there are any typesetting geek on HN, any idea when this would be incorporated in TeX / LaTeX and other typesetting programs?

Oh the memories when I used to write (and typeset) books using LaTeX and Quark XPress: "rivers" had to be tracked down manually, by eye-ball searching. You'd basically "blur" your vision a bit and quickly scan through all the pages of the book. I didn't take that long and I don't write books anymore but I'd still be curious as to when that technique is going to be implemented (maybe in InDesign which I never used!?).


In Adobe InDesign, one chooses to set type using the Single-line Composer or the Paragraph Composer. The latter analyzes the entire paragraph and tweaks spacing and hyphenation to minimize rivers, ragged lines, orphans and widows.

So nowadays, I don't manually check each page for rivers anymore. When designing the page layout, I decide on the 'color' (the density of the paragraph) that works best, so that rivers are avoided, but I leave it at that.

I understand how river detection can be a fun mathematical puzzle, but it's moot. Adobe built the Multi-line Composer into InDesign since version 1.0 (1999). And in my experience, with every version, it gets better.

From MacWorld's review of InDesign 1.0:

"InDesign's text features will appeal to designers and production people tired of the drudgery of manual copyfitting and kerning. The Multi-line Composer feature can calculate hyphenation and justification settings by examining an entire paragraph (or as many lines as you specify), instead of just a single line, to create better-looking text. In the process, it notably reduces the amount of manual tweaking necessary and will be especially helpful if you're trying to avoid multiple word breaks in a design with awkward text wraps. Similarly, the program's Optical Kerning feature does its best to find optimal character spacing, even if you've mixed different type sizes and faces. "

http://www.macworld.com/article/1014955/k2.html

http://blog.paragonpress.net/2010/08/04/adobe-indesigns-para...


Woah. InDesign’s Paragraph Composer detects rivers? I didn’t know that. Now I wonder how it’s doing that.


This is massively distracting for me at certain times and I was always considered a "good, fast" reader. The only thing I detest more in a block of text than rivers, is justified alignment. Ick.


Good typography is about making the correct decisions regarding your text. If justification is making a passage hard to read then it's simply bad typography and doesn't mean justification is bad as a concept. Ragged-right isn't necessarily going to save you from rivers either although they are far less likely due to uniform spacing.


Oh, I know. I hate them as separate entities. I'm not even convinced necessarily that justified text causes more rivers - they make awkward gaps and ruin my pace, but I don't usually get distracted by rivers with justified text.


Justification without gaps works best with aggressive hyphenation. If the hyphenation isn't good, the text is full of surprises at line wraps. It's hard to do justification well as a result. It's usually restricted to newspapers with limited space these days.


> It's usually restricted to newspapers with limited space these days

That's false. Almost any printed publication uses justification and hyphenation by default: books, magazines, scientific papers. The big exception is websites.


And different languages - German, for example, has massively longer words, so the problem is worse over here.


They usually do cause more rivers simply because on many lines the spaces are significantly larger. Since you only would letterspace as an absolute last resort (and probably not even then), it tends to create more dangerous whitespace per line.


justified alignment has a time and a place (newspapers), but for the most part I agree, the differing line lengths make keeping your place far easier.


In general text in multiple columns; ragged-right columnised text is sometimes horrible to read.


Is this really a problem? In how many texts does this occur? It seems quite unlikely. Show me a book where this has happened.


You are unlikely to see many books where this has happened, because professionals try to make sure that it doesn't, and fiddle with things if it does.


I'm pretty sure I've seen it in plenty of books, and it's pretty distracting.


Probably the result of someone at the publisher not being willing to pay for good graphic design. My Dad is a graphic designer who typesets a lot of books, they really do go over every page fiddling and optimizing to make it look good.


I see it all the time on my Kindle at least (I don't read paper books anymore, due to nerve damage in my wrists). That said, I actually quite enjoy seeing 'rivers' in text, for whatever odd reason.


So do I. I'm rather annoyed at this attempt to engineer them away.


Ah, but remember that this sword cuts both ways! That is, these methods can also be used to ensure a given text is displayed with the maximum possible river amount and length.


OK, you've convinced me :-)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: