Launch HN: Biodock (YC W21) – Better microscopy image analysis
47 points by nurlybek on March 3, 2021 | 18 comments
Hi Hacker News! We’re Nurlybek and Michael, the cofounders of Biodock (http://www.biodock.ai/). We help scientists expedite microscopy image analysis.

Michael and I built Biodock because of the challenges we experienced with microscopy image analysis while we were at Stanford. As a Ph.D. student, I spent hours manually counting lipid droplets in microscope images of embryonic tissues. The frustration led me to try all kinds of software, and eventually to seek help from other scientists. Michael, a computer science student, was working in a lab just across from mine when he got my email asking for help. We got to chatting in a med school cafe and realized that we were both tackling the same issues with microscopy images.

Microscopy images are one of the most fundamental forms of data in biomedical research, from discovery all the way to clinical trials. They can show the expression of genes, the progression of disease, and the efficacy of treatments.

However, working with these images is also very frustrating, and we think a lot of that has to do with the tools currently available. To analyze their images, many scientists at top research institutions use software techniques invented 50 years ago, like thresholding and filtering. Some even spend their days manually drawing outlines around cells or regions of interest. Not only is this extremely frustrating, it also slows down the research cycle, meaning it takes a lot more time and money to create potentially lifesaving cures. Contrast these tools with the recent headway in deep learning, where applications like AlphaFold have led to incredible gains in what was previously possible.

Our goal is to bring these performance gains to research scientists. The current core module in Biodock is AI cell segmentation for fluorescent cells, based mostly on Mask R-CNN and U-Net architectures, and trained on thousands of cell images. Essentially, it identifies where each cell is and calculates important features like location, size, and fluorescent expression for each cell. This module performs around 40% more accurately than other software.
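To give a concrete (and much simplified) picture of the per-cell measurement step: assuming you already have a labeled instance mask from the segmentation model, one way to read out the features above is scikit-image's regionprops. A rough sketch, not our production code:

    from skimage.measure import regionprops

    def measure_cells(label_mask, fluorescence_image):
        """Location, size, and mean fluorescent intensity per segmented cell."""
        cells = []
        for region in regionprops(label_mask, intensity_image=fluorescence_image):
            cells.append({
                "cell_id": region.label,
                "centroid_yx": region.centroid,           # location
                "area_px": region.area,                   # size
                "mean_intensity": region.mean_intensity,  # fluorescent expression
            })
        return cells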

So how is this different from training deep learning models yourself? First, our pretrained modules are trained on a huge amount of data, which allows for great performance for all scientists without needing to label data or optimize training. Secondly, we’ve spent time carefully building our cloud architecture and algorithms for production, including a large cluster of GPUs. We even slice images into crops, process them in parallel, and stitch them together. We also have storage, data integrations, and visualizations built into the platform.
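For intuition, the crop/stitch step looks roughly like the sketch below (a toy stand-in model, no tile overlap handling, and not our actual pipeline):

    from concurrent.futures import ProcessPoolExecutor
    import numpy as np

    TILE = 1024  # illustrative tile size

    def run_model(tile):
        # stand-in for the real segmentation model: a simple intensity threshold
        return (tile > tile.mean()).astype(np.int32)

    def tiles(image):
        h, w = image.shape[:2]
        for y in range(0, h, TILE):
            for x in range(0, w, TILE):
                yield (y, x), image[y:y + TILE, x:x + TILE]

    def segment_tile(args):
        (y, x), tile = args
        return (y, x), run_model(tile)

    def segment_large_image(image):
        # process crops in parallel, then stitch results back into one mask
        stitched = np.zeros(image.shape[:2], dtype=np.int32)
        with ProcessPoolExecutor() as pool:
            for (y, x), mask in pool.map(segment_tile, tiles(image)):
                stitched[y:y + mask.shape[0], x:x + mask.shape[1]] = mask
        return stitched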

We know that AI cell segmentation addresses only a small fraction of microscopy analysis in the biomedical space, and we are launching several more modules soon, tackling some of the most difficult images in the space. So far, we’ve been able to generate different custom AI modules for diverse tissues and imaging modalities (fluorescence, brightfield, electron microscopy, histology). Eventually, we want to link other biological data analyses into the cloud including DNA sequences, proteomics, and flow cytometry, to power the 500K scientists and 3K companies in the US biotech and pharma space.

We would love to hear from you and get your feedback—especially if you've ever spent hours on image analysis!




I work in the research arm of a certain extremely famous hospital in a small town in the midwest, and once you guys get your histo package up and running, I'll definitely check it out. It's a crowded space, but we're always on the lookout for the next big thing.

I do wonder, though, about the wisdom of doing that sort of analysis in the cloud. Our projects routinely use several terabytes of images (we have about 150TB stored right now, most of which is full-slide images), and uploading them somewhere isn't just a simple fire-and-forget procedure. Cool analysis algorithms might not be enough to make up for the headache of having to wait for days on end for the uploads to reach the cloud.


Hi! Michael here - one of the cofounders. That sounds great and would love to chat! What kind of tissues are you looking at and what are you trying to achieve?

As for the size of the data - there's definitely a tradeoff here. However, we're seeing more and more scientists already uploading their data to the cloud, and we integrate with data stores like S3 so you can hook into data you've already uploaded.

We're also looking to build out a pipeline builder where you can process images in real time, which would make this less painful, as well as a Dropbox-like mini tool that uploads data as you acquire it.


Bioinformatics PhD student here: any plans to move into the histology field? Or are you focusing more on the R&D space for now?


Hey! What kind of histology? If you mean AI analysis of histology images for research, then 100% - that's actually one of the modules we're looking to build, and if you work with those kinds of images I would love for you to reach out at michael at biodock dot ai. We're totally free for academics, and a module like that would be free for academics as well.

If you mean clinical diagnostics histology - we're going to hold off a bit on that, although we're discussing some early partnerships there.


I think it's terrific that you are trying to tackle this problem. The reliance on "50 year old" techniques is a detriment to progress. The interface looks beautiful as well.

My insights are probably things you've already considered. There are as many quantitative microscopy techniques as there are lab teams in the wild. Everyone builds their own correlations depending on their research area of interest. So I think even if you just solved that one problem (cloud storage of images combined with manual human analysis software), you could win over some fraction of the million or so researchers across the globe. For many UI use cases, simplicity is desirable. But in this space, I think feature richness is key (even if 80% of users only need 20% of the counting techniques).

The other obvious point is that image analysis can be similar across domains, from biology to high-energy physics, and even places you may not dream of, like digital archeology. Keeping things as general as possible may not be in your original mission or expertise, but I think it could help you discover other pockets of interest.

Last point is around "AI enhanced microscopy". I've seen state-of-the-art techniques like this termed "Medical Vision", sort of a combination of "precision medicine" and "computer vision". As you try to break from the past, it may be a label worth applying to your services. Plus it sounds cool and futuristic ;)

Best of luck and really rooting for this to succeed. Definitely a problem that needs to be solved. And ripe with potential. Great job!


Hi! First, your last sentence made us really happy - thanks for the kind words.

As for your points: 1. Definitely. We are trying to get to feature parity ASAP with traditional methods so that our AI modules can really shine. We're building out a library right now so that scientists can put together their own pipelines.

2. We're keeping everything open at this point, and definitely see the possibility that other domains will need structured image analysis like the kind that we are building.

3. Interesting - might be a better way to phrase what we are working on.

Thanks! If you're interested in chatting, would love to get in touch. Reach out at michael at biodock dot ai.


To be fair, you can get very far with non-machine-learning automated techniques (I built good-enough algorithms for an in-situ fluorescent gene sequencing image processing pipeline at one job). I suspect any form of good automated processing, whether or not it's AI, would be welcome to biologists.

If you'd ever like to chat about the automation parts, I'd be interested in hearing how you're approaching it; the niche of "scientific computing, but repeatable" is quite different from traditional scientific software, and it seems like people are still in the early stages of figuring out how to do it.

Would also be interested in hearing how you approach correctness. The best approach I've discovered is metamorphic testing. Basically you modify real inputs and then ensure the output still matches expectations. E.g. you say, "OK, I have this finished algorithm that segments cells, `f(image) -> cells`. If I double the brightness of everything, I would expect the same results, so let's check that `f(brighter_image)` gives the same output." Or, "If I merge nuclei-looking splotches that cross what the original segmentation boundary was, that should result in fewer cells." Unfortunately I only discovered this technique after I left the image processing job, so I haven't had a chance to try it.
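In test form, the brightness example would look something like this (hypothetical `segment` function, float image assumed):

    import numpy as np

    def count_cells(label_mask):
        # number of distinct nonzero labels (0 = background)
        return len(np.unique(label_mask[label_mask > 0]))

    def test_brightness_invariance(segment, image):
        # metamorphic check: doubling brightness shouldn't change the cell count
        baseline = count_cells(segment(image))
        brighter = count_cells(segment(image.astype(np.float32) * 2.0))
        assert brighter == baseline  # in practice you'd probably allow a small tolerance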


Definitely! There are some applications where traditional methods are simply good enough. However, when they aren't, it can be incredibly frustrating. From our conversations with scientists, this kind of data (3D, histology, difficult tissues, new assays) is increasing in volume.

As for correctness, we've only done mAP scores and traditional accuracy metrics so far to compare with other algorithms, but we also have our own internal metrics and a test set we're building out in-house to cover edge cases, including some of the things you are talking about. One thing we're always trying to be sensitive to is fairness: we want to make sure that we're not biasing the test toward our algorithm, which would make us look better than we are.
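For anyone curious, the core of those comparisons is matching predicted masks to ground-truth masks by IoU, roughly like this simplified sketch (not our actual benchmark code):

    import numpy as np

    def iou(mask_a, mask_b):
        inter = np.logical_and(mask_a, mask_b).sum()
        union = np.logical_or(mask_a, mask_b).sum()
        return inter / union if union else 0.0

    def precision_at(pred_masks, gt_masks, threshold=0.5):
        # greedy matching: each ground-truth mask can be claimed at most once
        matched, true_pos = set(), 0
        for pred in pred_masks:
            best_j, best_iou = None, 0.0
            for j, gt in enumerate(gt_masks):
                if j in matched:
                    continue
                score = iou(pred, gt)
                if score > best_iou:
                    best_j, best_iou = j, score
            if best_j is not None and best_iou >= threshold:
                matched.add(best_j)
                true_pos += 1
        return true_pos / max(len(pred_masks), 1)

COCO-style mAP then averages this kind of score over a range of IoU thresholds.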


I guess when I say correctness, I mean "how do I know it _continues_ to be correct on data we've never seen before". That's where metamorphic testing can be valuable, because it lets you at least find incorrectness on real-world data that hasn't been hand-tagged.


Ah, yes. We're looking to use some generative models to create variations of the data and then check that we perform similarly well across cases.

I guess the point I was making was that we want to make sure we don't then use this generated or modified data to test other algorithms in the space and claim we're better. Simply put, it would be unfair for us to tune ourselves to clear a hurdle and then put other algorithms through that same hurdle. But for internal use, it's definitely great!


Also would love to chat if you ping me about the automation part - would love to get some feedback there. michael at biodock dot ai


Very cool.

Do you have lots of unlabelled data, and if so, do you do any self-supervised pre-training?

Have you ever considered releasing the backbone weights for the pre-trained models you have? No idea if this would be possible without giving up core IP, but I know I'm personally dying for an alternative to Imagenet (COCO for you?) trained on a big dataset.

Are the images in your set diverse enough that you'd expect the backbone to be a good general feature extractor?


Hi Andy,

We do have lots of unlabelled data, and we're also labeling a large portion of it. We do transfer learning for all of the models we're training, and the first backbone we use is partially self-supervised. It seems to help overall performance, but it's not a huge effect in our experience.

Maybe once we have a lot of models we can release the backbone weights for nuclear segmentation, or at least a competition set of some data we've labeled. There are some IP issues here, though.

What kind of alternative are you looking for? Specifically one for cells, or just for biologics in general? I'm guessing you're trying to have a better base of weights to transfer off so you can train your own model?

I would say we have medium diversity in terms of images - I think unless you have a similar application right now, you'd be better off transferring off of Imagenet just due to the amount of labeled data.
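For reference, transferring off ImageNet/COCO weights follows the standard torchvision fine-tuning recipe, roughly like this sketch (not our internal code):

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

    def build_finetunable_maskrcnn(num_classes=2):  # background + cell
        # start from pretrained weights, then replace the heads for your classes
        model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
        in_feats = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
        in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
        model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
        return model

You'd then fine-tune that on your labeled images with a normal training loop.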


Thanks for the reply! I'm actually working in a different domain but it seems to have a lot in common with yours - lots of unlabelled data, images that have nothing in common with Imagenet, in that they are all essentially of the same thing and we are looking for variations or features. We found that self-supervised pre-training (with various contrastive models) underperformed vs. starting with weights trained on Imagenet.

So a model that has been pretrained on something else, with enough variability to work as a feature extractor, but closer to the problem framing I mention would be of interest.

For state of the art computer vision stuff, most of the benchmarks use imagenet or similar datasets. But unfortunately I'm coming around to the realisation that those datasets are not representative of most real world problems (except general purpose scene / object recognition). So it becomes very challenging to pick out a potential technique to apply, and hope it transfers.


Do I read it correctly that you're working with images containing a repeated pattern of instances of the same object? I've been working with cell images, solving the segmentation task (what Biodock works on), and found interesting tricks to train models on a vastly smaller number of labels than you would think possible with off-the-shelf models (e.g. Mask R-CNN or U-Net + refinements).


Interesting - it would be great to chat and find out more. Maybe there are things we can learn about each other. Can you shoot me an email at michael at biodock dot ai?


Hi! We've been bootstrapping ariadne.ai [0] in the same space. Similar origin story, too - turning tools we built as grad students into products for the biomedical industry.

Looks like we're thinking along somewhat convergent lines, but with some interesting differences as well. Feel free to reach out to my HN username at ariadne.ai if you'd like to chat.

[0] https://ariadne.ai


Hey! Super cool. Will shoot you a message - would love to chat and compare notes. Seems like you've built out some valuable assay analyses that we've heard people asking for, so congrats.



