To the author: code samples as images are great for syntax highlighting, but I wanted to play with the examples and got stuck trying to copy the content.

(also expected tesseract to do a bit better than this:

  $ wl-paste -t image/png | tesseract -l eng - -
  Estimating resolution as 199
  const std = @import("std");
  const expect = std.testing.expect;
  
  const Point = struct {x: i32, y: i32};
  
  test "anonymous struct literal" {
  const pt: Point = .{
  x = 13,
  -y = 67,
  33
  try expect (pt.x
  try expect(pt.y
  
  13);
  67);

)




tesseract does well for me...

    const std = @import("std");
    const expect = std.testing.expect;

    const Point = struct {x: i32, y: i32};

    test "anonymous struct literal" {

    const pt: Point = .{
    .x = 13,
    .y = 67,
    };
    try expect(pt.x == 13);
    try expect(pt.y == 67);

The trick is to preprocess the image a little bit like so:

    ocr () 
    { 
        magick - -monochrome -negate - | tesseract stdin stdout 2> /dev/null
    }

Thank you!

Unfortunately I get the same kind of garbage around closing curly braces, closing parentheses, and dots with this magick filter. It seems to do slightly better with an extra `-resize 400%`, but it is still very far from as good as what you're getting. To be fair, the monochrome output is not pretty (it bleeds) when I inspect the result.

I wonder what's different? (ImageMagick-7.1.1.47-1.fc42.x86_64 and tesseract-5.5.0-5.fc42.x86_64 here, no config, language packs also from the distro.)
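
For anyone who wants to try the `-resize 400%` variant mentioned above, here is a minimal sketch. The function name `ocr_scaled`, the 400% default, and the position of `-resize` before `-monochrome` are my assumptions, not something either poster showed, and it may well not fix the brace/parenthesis garbage:

```shell
# Hypothetical variant of the ocr helper: upscale the image before binarizing.
# Reads an image on stdin and prints the recognized text on stdout.
# The optional first argument is the scale factor (assumed default: 400%).
ocr_scaled () {
    local scale="${1:-400%}"
    magick - -resize "$scale" -monochrome -negate - | tesseract stdin stdout 2> /dev/null
}
```

Usage would look like `wl-paste -t image/png | ocr_scaled`, or `ocr_scaled 300%` to try a different scale.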


Yeah, he definitely should have used a code block for the examples. To the author, if you are trying to preserve code formatting and syntax highlighting, there are JS packages that will take care of all of that and produce clean, copyable, well-rendered, accessible code formatting for you.


