
There's a bunch of issues:

- Real, bespoke biomedical analysis is not trivial in effort, cost, or time. There are biomedical analysis systems-in-a-box (look at https://galaxyproject.org), but that's just canned analysis. To make real breakthroughs, you need rigorous analysis that requires years of experience to be able to perform.

- It's easier to get the money to collect the data than it is to effectively steward the data you collect. In a past life, I ran a biomedical research computing facility, and everyone got plenty of money for new sequencers, mass specs, and other fancy instruments. They got plenty of money for collecting all kinds of data. No one would ever add money to their grants to actually STORE the data. They would literally put the data on USB hard drives bought from Best Buy and leave them in file cabinets and on desks. There was absolutely nothing I could do about this, and so I quit.

- Research is balkanized to hell. Even though I ran the scientific computing for 20 research labs, each research lab was its own fiefdom. They could decide to obey or disobey my policies at will, since they controlled their own funding. You can imagine what happened when I proposed turning on quotas (~100TB per lab, to start!). Rather than work with my team to determine how to share resources, people would just jump off my high-speed facility, buy a shitty cheap JBOD from Dell for their analysis, and store their archives on shitty cheap USB hard drives from Best Buy. The funniest part was that if the hard drive failed, and the data couldn't be restored, in theory the principal investigators could get into real legal trouble. No one seemed to worry.

There are a few biomedical research institutes that "get" scientific data stewardship (Broad, Scripps), but for the most part biomedical research computing is a total clusterfuck, and I couldn't have gotten out of there fast enough for the way saner land of tech companies.




This describes the earth sciences about 5 yrs ago. People (around here at least) are seeing the light. A big driver of the change is the emphasis on data management and archiving that is coming from the NSF. Many research programs have to make data available post-publication or risk cutting off funding. Not sure if this is yet the case in the health sciences.


This has been a problem since at least 2006 or so in biomedical research.


I agree. A lot of labs do not want to hire good developers or data scientists. Or they do not have the money to hire them, even though they spend thousands on data collection.

Check the jobs here; most of them are postdoc level: https://www.biostars.org/t/Jobs/

Postdoc level means you get about 50k~80k, even in the Bay Area.

The situation is really bad.


I believe money is definitely a big part of it. If you have the skills needed to help manage and analyze "big" data (big as in too big to realistically handle in Excel, which is the limit of most biologists), you can easily earn much more somewhere else.


Partially. I worked with bioinformatics labs until recently. Career progression is limited as they treat a software engineer as a technician, and nothing more. They don't appreciate the value you bring unless you are publishing papers (certainly in the last two institutes I worked in).


I don't even think the level is at "do not want". I think it is at "do not know how", which is a much harder problem to solve.


I've had similar experiences. If you think quotas are bad, try mentioning storage chargebacks: suddenly it's a cheesy home NAS off eBay.

https://www.youtube.com/watch?v=N2zK3sAtr-4


Could it work as a not-for-profit organization something like Internet Archive or Wikipedia?


It sort of does already. Not-for-profit organizations like the San Diego Supercomputer Center act as Biomedical-Research-IT-As-A-Service providers, but there's so much competition for grant funding that if you can get away with doing things cheaper, you will.

The scariest part was that before I left, I built a highly scalable, long-term archive for scientific data on LTO tapes that would allow ridiculously cheap (basically the cost of LTO tapes) on-line and near-line storage. When I left, no one wanted to bother with paying for the upkeep of the hardware, people got bored with swapping tapes, and it eventually died. Oh well. Your tax dollars at work.


Thanks for your answer.

This is such an interesting (and kind of worrying) problem. It's one of those problems where the obvious solution only solves the obvious problem. As you have pointed out, there is a deeper problem hiding underneath, which you only know of if you know the field well enough.

I need to think about this some more to understand why that is and why an incentive structure can't be created.

You gave me a new perspective on things today, thanks for that.


I built in an incentive structure. Each grant includes a certain amount of "overhead" in addition to what the researcher requests, which keeps the lights on, pays for common infrastructure. I negotiated to fund a big portion of ongoing costs for data storage out of the overhead. (Capital costs were mostly paid out of funds from a large settlement, private grants, etc.) We found that when there were no limits to usage, researchers would just duplicate their data over and over with tiny changes, which was incredibly costly.
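
For what it's worth, here is a minimal sketch of how one might at least spot the exact copies. It is not the tooling the facility used; `file_digest` and `find_duplicates` are hypothetical names, and the approach (grouping files by SHA-256 digest) only catches byte-identical files. Copies with tiny changes would hash differently and would need chunk-level deduplication, which is not shown.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Group byte-identical files under root (hypothetical helper).

    Returns one list per set of identical files; unique files are omitted.
    """
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Hashing every file is slow on a large archive, so in practice one would probably group by file size first and only hash the files whose sizes collide.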

To solve this problem, I tried giving "monopoly money" to the professors, allowing them to trade data storage and cluster time for favors, analysis, and so on. For the researchers who didn't need as much storage, they could give their excess up. For those who gobbled up storage, they could "buy" the excess. It ended up failing because I didn't have backup from leadership to say "no" when I was asked to do things that were irresponsible:

Yes, you can buy a 2TB HDD for $100. No, that isn't the same as 2TB of storage on an enterprise-level storage array, clustered, with local mirroring, offsite tape backup, etc. No, I won't plug your 2TB USB HDD into my compute cluster.
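
The "monopoly money" scheme described above can be sketched roughly as a ledger of tradable storage credits. This is an illustration under assumptions, not the system that was actually built: the `StorageLedger` class, its method names, the lab names, and the whole-terabyte units are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StorageLedger:
    """Hypothetical ledger of tradable storage credits, in whole terabytes."""
    capacity_tb: int
    balances: Dict[str, int] = field(default_factory=dict)

    def allocate(self, lab: str, tb: int) -> None:
        """Grant a lab credits, never exceeding the facility's capacity."""
        if tb <= 0:
            raise ValueError("allocation must be positive")
        if sum(self.balances.values()) + tb > self.capacity_tb:
            raise ValueError("allocation exceeds facility capacity")
        self.balances[lab] = self.balances.get(lab, 0) + tb

    def transfer(self, giver: str, receiver: str, tb: int) -> None:
        """Move credits between labs; the giver cannot go below zero."""
        if tb <= 0:
            raise ValueError("transfer must be positive")
        if self.balances.get(giver, 0) < tb:
            raise ValueError(f"{giver} does not have {tb} TB to give")
        self.balances[giver] -= tb
        self.balances[receiver] = self.balances.get(receiver, 0) + tb


# Usage with made-up labs: one lab gives up excess, another "buys" it.
ledger = StorageLedger(capacity_tb=200)
ledger.allocate("lab_a", 100)
ledger.allocate("lab_b", 100)
ledger.transfer("lab_a", "lab_b", 30)
```

The total never grows through trading, which is the whole point: a lab that wants more has to get it from a lab that needs less. None of this helps if nobody is allowed to enforce the balances.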


Thanks again. I was wondering more about whom the incentive structure is for (i.e. someone other than the professors). Even assuming it's costly, could someone benefit enough that they would have no problem paying that money? Could it be built into the grant structure, etc.?



