Launch HN: Biodock (YC W21) – Better microscopy image analysis

onychomys · on March 3, 2021

I work in the research arm of a certain extremely famous hospital in a small town in the midwest, and once you guys get your histo package up and running, I'll definitely check it out. It's a crowded space, but we're always on the lookout for the next big thing.

I do wonder, though, about the wisdom of doing that sort of analysis in the cloud. Our projects routinely use several terabytes of images (we have about 150TB stored right now, most of which is full-slide images), and uploading them somewhere isn't just a simple fire-and-forget procedure. Cool analysis algorithms might not be enough to make up for the headache of having to wait for days on end for the uploads to reach the cloud.

mike210 · on March 3, 2021

Hi! Michael here - one of the cofounders. That sounds great and would love to chat! What kind of tissues are you looking at and what are you trying to achieve?

As for the size of data - it's definitely a tradeoff here. We're seeing more and more scientists already uploading their data to the cloud however. We even integrate with data stores like S3 already so you can hook into data you've already uploaded.

We're also looking to build out a pipeline builder - where you can process images in real-time, so this would be less painful, as well as a Dropbox-like mini tool so data would be uploaded as you acquire it.

skwb · on March 3, 2021

Bioinformatics PhD student here: any plans to move to histology field? Or are you focusing more on the R&D space for now?

mike210 · on March 3, 2021

Hey! What kind of histology? If you mean doing AI analysis of histology images for research/etc. Then 100% - that's actually of the modules we're looking to do and if you work with those kind of images I would love for you to reach out at michael at biodock dot ai. We're totally free for academics, and a module like that would be free for academics as well.

If you mean clinical diagnostics histology - we're going to hold off a bit on that, although we're discussing some early partnerships there.

ArtWomb · on March 5, 2021

I think this is terrific you are trying to tackle this problem. The issue with using "50 year old" techniques is a detriment to progress. Interface looks beautiful as well.

My insights are probably things you've already considered. There are as many quantitative microscopy techniques as there are lab teams in the wild. Everyone builds their own correlations depending on their research dive of interest. So I think even if you just solved that problem: cloud storage of images combined with manual human analysis software. You could get some fraction of the million or so researchers across the globe. For many UI use cases, simplicity is desirable. But in this space, I think feature richness is key (even if 80% of users only need 20% of counting techniques).

The other obvious point is that image analysis can be similar across domains. From biology, to high energy physics. Even places you may not dream of: digital archeology. Keeping things general as possible may not be in your original mission or expertise, but I think it could help discover other pockets of interest.

Last point is around "AI enhanced microscopy". I've seen state of the art techniques like this termed "Medical Vision". Sort of a combination of "precision medicine" and "computer vision". As you try to break from the past, it may be something to ascribe to your services. Plus it sounds cool and futuristic ;)

Best of luck and really rooting for this to succeed. Definitely a problem that needs to be solved. And ripe with potential. Great job!

mike210 · on March 5, 2021

Hi! First, your last sentence made us really happy - thanks for the kind words.

As for your points: 1. Definitely. We are trying to get to feature parity ASAP with traditional methods so that our AI modules can really shine. We're building out a library right now so that scientists can put together their own pipelines.

2. We're keeping everything open at this point, and definitely see the possibility that other domains will need structured image analysis like the kind that we are building.

3. Interesting - might be a better way to phrase what we are working on.

Thanks! If you're interested in chatting, would love to get in touch. Reach out at michael at biodock dot ai.

itamarst · on March 3, 2021

To be fair, you can get really very far with non-machine-learning automated techniques (I did good-enough algorithms for an in-situ fluorescent gene sequencing image processing pipeline at one job). I suspect any form of good automated processing, regardless of whether it's AI, would be welcome to biologists.

If you'd ever like to chat about the automation parts, I'd be interested in hearing how you're approaching it; the niche of "scientific computing, but repeatable" is quite different than traditional scientific software, and it seems like people are still in early stages of figuring out how to do it.

Would also be interested in hearing how you approach correctness. The best approach I've discovered is metamorphic testing. Basically you modify real inputs, and then ensure the output matches. E.g. you say, "OK, I have this finished algorithm that segments cells, `f(image) -> cells`. Now, if I double brightness on everything, I would expect the same results, so let's see what `f(brighter_image)` is, I would expect same output." Or like "If I merge nuclei-looking splotches that cross what original segmentation boundary was, that should result in fewer cells." Unfortunately only discovered this technique after I left the image processing job, so haven't had chance to try it.

mike210 · on March 3, 2021

Definitely! There are some applications where traditional methods are just simply good enough. However, when they aren't, it can be incredibly frustrating. From our conversations with scientists, this kind of data (3D, histology, difficult tissues, new assays) is increasing in volume.

As for correctness, we've only done mAP scores and traditional accuracy metrics so far to compare with other algorithms, but we also have our own internal metrics and a test set we're building out in-house to cover many edge cases, many of which cover some of the things you are talking about. One thing we're always trying to be sensitive of is fairness. We want to make sure that we're not biasing the test to our algorithm, which would make us look better than we are.

itamarst · on March 3, 2021

I guess when I say correctness, I mean "how do I know it _continues_ to be correct on data we've never seen before". That's where metamorphic testing can be valuable, because it lets you at least find incorrectness on real-world data that hasn't been hand-tagged.

mike210 · on March 3, 2021

Ah, yes. We're even looking to use some generative models in order to even do variations based on data and then compare that we do similarly well between cases.

I guess the point I was making was that we want to make sure we don't then use this generated or modified data in order to test other algorithms in the space and say we're better. Simply put, it would be unfair for us to make changes to perform better on a hurdle and then put other algorithms through those hurdles. But for internal use, it's definitely great!

mike210 · on March 3, 2021

Also would love to chat if you ping me about the automation part - would love to get some feedback there. michael at biodock dot ai

andy99 · on March 3, 2021

Very cool.

Do you have lots of unlabelled data, and if so, do you do any self-supervised pre-training?

Have you ever considered releasing the backbone weights for the pre-trained models you have? No idea if this would be possible without giving up core IP, but I know I'm personally dying for an alternative to Imagenet (COCO for you?) trained on a big dataset.

Are the images in your set diverse enough that you'd expect the backbone to be a good general feature extractor?

mike210 · on March 3, 2021

Hi Andy,

We do have lots of unlabelled data, and we're also labeling a large portion of it. We do transfer learn for all of the models we're training, and the first backbone we use is partially self-supervised. Seems to help in overall performance, but it's not a huge effect in our experience.

Maybe once we get a lot of models we can release the backbone weights for nuclear segmentation or at least a competition set of some data we've labeled. Some IP issues here though.

What kind of alternative are you looking for? Specifically one for cells, or just for biologics in general? I'm guessing you're trying to have a better base of weights to transfer off so you can train your own model?

I would say we have medium diversity in terms of images - I think unless you have a similar application right now, you'd be better off transferring off of Imagenet just due to the amount of labeled data.

andy99 · on March 3, 2021

Thanks for the reply! I'm actually working in a different domain but it seems to have a lot in common with yours - lots of unlabelled data, images that have nothing in common with Imagenet, in that they are all essentially of the same thing and we are looking for variations or features. We found that self-supervised pre-training (with various contrastive models) underperformed vs. starting with weights trained on Imagenet.

So a model that has been pretrained on something else, with enough variability to work as a feature extractor, but closer to the problem framing I mention would be of interest.

For state of the art computer vision stuff, most of the benchmarks use imagenet or similar datasets. But unfortunately I'm coming around to the realisation that those datasets are not representative of most real world problems (except general purpose scene / object recognition). So it becomes very challenging to pick out a potential technique to apply, and hope it transfers.

gkk · on March 4, 2021

Do I read it correctly that you're working with images with repeated pattern of instances of the same object on the image? I've been working with cell images, solving segmentation task - what biodock works on - and found interesting tricks to train models on vastly smaller number of labels than what you would think is possible with off-the-shelf models (e.g. Mask-RCNN or U-net + refinements).

mike210 · on March 3, 2021

Interesting - it would be great to chat and find out more. Maybe there are things we can learn about each other. Can you shoot me an email at michael at biodock dot ai?

svara · on March 3, 2021

Hi! We've been bootstrapping ariadne.ai [0] in the same space. Similar origin story, too - turning tools we built as grad students into products for the biomedical industry.

Looks like we're thinking along somewhat convergent lines, but with some interesting differences as well. Feel free to reach out to my HN username at ariadne.ai if you'd like to chat.

[0] https://ariadne.ai

mike210 · on March 3, 2021

Hey! Super cool. Will shoot you a message. Would love to chat and see what things we could talk about. Seems like you guys have built out some valuable assay analyses that we've heard people asking for, so congrats.