Just curious, could you expand on what about that process takes years?

a_bonobo · 2025-06-05T01:34:49

Bioinformatically, you could compare your protein with known proteins and infer function from there, but OP's paper is specifically for the use case where nothing in our databases looks like it.
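
To make the "compare with known proteins" step concrete, here is a toy sketch in Python. It is not how real annotation pipelines work (in practice you would run something like BLAST against a large database); it just scores shared 3-letter fragments between sequences. The function names, the sequences, and the labels are all invented for illustration.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        # A sequence shorter than k has no k-mers, so nothing can be compared.
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_hit(query: str, database: dict[str, str], k: int = 3) -> tuple[str, float]:
    """Return (label, score) for the database entry most similar to the query."""
    hits = ((name, jaccard(query, seq, k)) for name, seq in database.items())
    return max(hits, key=lambda hit: hit[1])


# Made-up toy sequences and labels, for illustration only.
TOY_DB = {
    "hypothetical kinase": "MGSSKSKPKDPSQRRRSLE",
    "hypothetical reductase": "MTTLAVVGAGSWGTALAQV",
}

if __name__ == "__main__":
    query = "MTTLAVIGAGSWGTALSQV"  # invented query sequence
    name, score = best_hit(query, TOY_DB)
    print(f"best hit: {name} (k-mer Jaccard = {score:.2f})")
```

If every entry scores near zero, you are in the situation described above: nothing in the database resembles your protein, so there is nothing to infer from.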

Time-wise it depends where in the process you start!

Do you know what your target protein even is? I've seen entire PhDs spent trying to purify a single protein - every protein is purified differently; there are dozens of methods, and some work and some don't. If you can purify it, you can run a barrage of tests on the protein - is it a kinase, how does it behave in different binding assays, etc. That gives you a fairly broad idea of which area of activity your protein sits in.

If you know what it is, you can clone its gene into a vector and put that into an expression host like E. coli. Then E. coli will hopefully express it. That's a few weeks/months of work, depending on how much you want to double-check.

You can then use fluorescent tags like GFP to show you where in the cell your protein is located. Is it in the cell wall? Is it in the nucleus? That might give you an indication of function. But you only have the location at this point.

If your protein is in an easily kept organism like mice, you can run knock-out experiments, where you use different approaches to either turn off or delete the gene that produces the protein. That takes a few months too - and chances are nothing in your phenotype will change once the gene is knocked out: protein-protein networks are resilient, and there might be another protein jumping in to do the job.

If you have an idea of what your protein does, you can confirm it using protein-protein binding studies - I think yeast two-hybrid is still very popular for this? It tests whether two specific proteins - your candidate and another protein - interact or bind.

None of those tests will tell you 'this is definitely a nicotinamide adenine dinucleotide binding protein'; every test (and there are many more!) will add one piece to the puzzle.

Edit: of course it gets extra-annoying when all these puzzle pieces contradict each other. In my past life I've done some work with the ndh genes that sit in plant chloroplast genomes and are lost in orchids and some other groups of plants (including my babies), so it's interesting to see what they actually do and why they can be lost. It's called ndh because it was initially named NADH-dehydrogenase-like, because by sequence it kind of looks like an NADH dehydrogenase.

There's a guy in Japan (Toshiharu Shikanai) who worked on it most of his career and worked out that it certainly is NOT an NADH dehydrogenase and is instead a ferredoxin (Fd)-dependent plastoquinone reductase. https://www.sciencedirect.com/science/article/pii/S000527281...

Knockout experiments with ndh are annoying because it seems to be only really important in stress conditions - under regular conditions our ndh- plants behaved the same.

Again, this is only one protein, and since it's in chloroplasts it's ultra-common - most likely one of the more abundant proteins on earth (it's not in algae either). And we still call it ndh even though it is a ferredoxin-plastoquinone reductase.