Hacker News | gabeiscoding's comments

Nope. It was a Pilot and they are not sure about doing more exomes.

From my last chat with Brian Naughton (their lead informatics guy) about this, it sounds like they are planning on doing more sequencing in the future. But it could be whole genome, and it may be geared more towards research (you're selected based on your phenotype) than open to any customer.


Ahh the efficiency argument.

The trick is, academics often have excess manpower capacity in the form of grad students and post-docs. Even though personnel is usually one of the highest expenses on any given grant, they often don't look at ways to improve the efficiency of their research man-hours.

That's not a blanket rule, as we have definitely had success with the value proposition of research efficiency, but in general, a lot of things businesses adopt to improve project time (like Theory of Constraints project management, Mindset/Skillset/Toolset matching of personnel, etc.) are of no interest to academic researchers.


I disagree with you. If there were excess manpower, graduate students wouldn't be stressed out with overwhelming work. Obviously, there is a lot more work to go around and fewer bodies to give it to. Most of the research man-hours are spent trying to implement other people's research methods so you have a 'baseline.' A complete waste of time just to have one graph in the Results section of your publication. The height of research inefficiency is to replicate someone else's results and hope (fingers crossed) that you followed their 8-page paper (that took them 10 months to develop) meticulously. Academic researchers only care about results; it is the graduate students that need to be efficient. The efficiency software should be bought by the PIs for their graduate students.


After researching this field (biomedical R&D) a bit, I found that the mindset and workflow are mostly pre-computer. The relevant decision makers in the labs usually don't see a need to change anything because "it works" and "it's always done this way".


"It's always done this way" is the ultimate motivation of any startup. If everyone just accepted that, we wouldn't have any startups, probably wouldn't have any entrepreneurs, or a better world for that matter. The fitness function of the world would flatline.


If you're interested in Next Generation Sequencing (the new "technology" the OP referred to as replacing microarrays), I wrote a 3-part series on my blog:

"A Hitchhiker's Guide to Next Generation Sequencing"

Part 1: http://blog.goldenhelix.com/?p=423

Part 2: http://blog.goldenhelix.com/?p=490

Part 3: http://blog.goldenhelix.com/?p=510


I live in this field, as a computer scientist learning the biology, and trying to make a living with a bootstrapped company.

I wrote a post about why GATK, one of the most popular bioinformatics tools in Next Generation Sequencing, should not be put into a clinical pipeline:

http://blog.goldenhelix.com/?p=1534

In terms of your ideal software strategy, I can speak to that as well, as I am actually attempting to do almost exactly what you're suggesting. My team all have master's degrees in CS & Stats, with a focus on kick-ass CG visualization and UX.

We released a free genome browser (visualization of NGS data and public annotations) that reflects this:

http://www.goldenhelix.com/GenomeBrowse/

But you're right, selling software in this field is a very weird thing. It's almost B2B, but academics are not businesses, and their alternative is always to throw more post-doc manpower at the problem or slog it out with open source tools (which many do).

That said, we've been building our business (in Montana) over the last 10 years through the GWAS era selling statistical software, and we are looking optimistically toward the era of sequencing having a huge impact on health care.


> I wrote a post about why GATK - one of the most popular bioinformatic tools in Next Generation Sequencing should not be put into a clinical pipeline:

I've seen you link to your blog post a couple of times now, and I still think it's misleading. I do wonder whether your conflict of interest (selling competing software) has led you to come to a pretty unreasonable conclusion. (My conflict of interest is that I have a Broad affiliation, though I'm not a GATK developer.)

In your blog post, you received output from 23andMe. The GATK was part of the processing pipeline that they used. What you received from 23andMe indicated that you had a loss-of-function indel in a gene. However, it turned out upon re-analysis that the indel was not present in your genome; it was just present in the genome of someone else processed at the same time as you.

Somehow, the conclusion that you draw is that the GATK should not be used in a clinical pipeline. This is hugely problematic:

1) It's not clear that there were any errors made by the GATK. Someone at 23andMe said it was a GATK error, but the line between "user error" and "software error" can be blurred for advantage. It's open source, so can someone demonstrate where this bug was fixed, if it ever existed?

2) Now let's assume that there truly was a bug. Is it not the job of the entity using the software to check it to ensure quality? An appropriate suite of test data would surely have caught this error yielding the wrong output. Wouldn't it be as fair, if not more so, to say that 23andMe should not be used for clinical purposes since they don't do a good job of paying attention to their output?

Your blog post shows, for sure, a failure at 23andme. Depending on whether the erroneous output was purely due to 23andme or if the GATK had a bug in production code, your post shows an interesting system failure: an alignment of mistakes at 23andme and in the GATK. But I really don't think it remotely supports the argument that the GATK is unsuitable for use in a clinical sequencing pipeline.


On your first point, my post detailed that 23andMe confirmed it was a GATK bug that introduced the bogus variants and the bug was fixed in the next minor release of the software. There are comments on the post from members of 23andMe and the GATK team that go into more details as well.

On your second point: 23andMe had every incentive to pay attention to their output, and it is fair to say it's their responsibility for letting this slip through. But it's worth noting, in the context of the OP's rant, that 23andMe probably paid much more attention to their tools than most academics, who often treat alignment and variant calling as a black box that they trust works as advertised.

So what I actually argue in the post (and should have stated more clearly in my summary here) is that GATK is incentivised, as an academic research tool, to quickly advance its feature set at the cost of bugs being introduced (and hopefully squashed) along the way.

This "dev" state of a tool is inappropriate for a clinical pipeline, and the GATK team's answer to that is a "stable" branch of GATK that will be supported by their commercial software partner. Good stuff.

Finally, I actually have no conflict of interest here, as Golden Helix does not sell commercial secondary analysis tools (like CLC bio does). I wrote this from the perspective of someone who is a 23andMe consumer, as well as someone informed by giving recommendations on upstream tools to our users (and I might add, I would still recommend and use GATK for research use, with the caution to potentially forgo the latest release for a more stable one).

You know though, the conflict of interest dismissal is something I run into more than I would expect. I'm not sure if some commercial software vendor has acted in bad faith in our industry to deserve the cynicism, or if this is inherited by default from the "academic" vs "industry" ethos.


> So what I actually argue in the post (and should have stated more clearly in my summary here) was that GATK is incentivised, as an academic research tool, to quickly advance their set of features with the cost of bugs being introduced (and hopefully squashed) along the way.

Sure, I agree with that. And I would agree if you would say "Using bleeding-edge nightly builds of %s for production-level clinical work is a bad idea," whether %s was the GATK or the Linux kernel. I would be in such complete agreement that I wouldn't even feel compelled to respond to your posts if that's what you would say originally, rather than saying, "the GATK ... should not be put into a clinical pipeline". The former is accepted practice industry-wide; the latter reads like FUD and cannot be justified by one anecdote.

> You know though, the conflict of interest dismissal is something I run into more than I would expect.

Regarding conflict of interest, my point was in trying to understand your potential interests, while also disclosing my own so that you can see where I'm coming from. That's not a dismissal, it's a search for a more complete picture. Interested parties are often the most qualified commenters, anyway, but their conclusions merit review.

Hopefully people wouldn't dismiss my views because of my Broad connection, any more than they would dismiss yours if you sold a competing product.


The key is that 23andMe was not using bleeding-edge nightly builds but official "upgrade-recommended" releases.

GATK currently has no concept of a "stable" branch of their repo (Appistry is going to provide quarterly releases in the future, which is great).

The flag I am raising is that a "stable" release is needed before GATK gets integrated into a clinical pipeline. Because the Broad's reputation is so high, it is important to raise this flag, as otherwise researchers and even clinical bioinformaticians assume choosing the latest release of GATK for their black-box variant caller is as safe as an IT manager choosing IBM.


Good call. Much like an Ubuntu LTS, having stable freezes of the GATK (now that it's relatively mature) that only get bug fixes but no new (possibly bug-prone) features is a great idea.


Very cool, but I fired up an emacs session and hit Ctrl+N to start scrolling through a file.... doh!


They list a few caveats: http://git.chromium.org/gitweb/?p=chromiumos/platform/assets...

no ssh keys, no port forwarding, Chrome 19 + "Open as Window" for some shortcut keys, etc.


htop breaks it, too -- but it actually does a pretty good job considering how complex the output is.


I think the htop bug was fixed by https://gerrit.chromium.org/gerrit/21255. You should get it in the next version of Secure Shell.


Excellent book BTW, would recommend it to anyone interested in the history of not just computing but what we think of as UI/UX.

It's almost depressing to see the same problems we keep (re-)solving have had great minds beating on them for quite a while now :)


My Google offer experience: I turned down an offer from Google last spring. Long story, but my reasons can be summed up as having a more challenging opportunity in a leadership position at a small company, versus having to enter the "engineer" lotto where you don't know what team and project you get placed on.

Funny thing is, I had a call from one of their recruiters today. A couple of times previously, somebody contacted me by email, and I said "sure, call me" and didn't hear from them. But this guy was out of the blue, from a different office, and had a different approach to it.

The message was that things are different at Google and, at least from his office's perspective, they treat each recruitment uniquely.

My takeaway is that Google is a big company and you'll get different experiences depending on how you enter the HR process (college-grad applicant versus sought-after name). The OP in this case is doing a lot of generalization.


Very interesting and ambitious project.

A few notes when testing it:

- Uses PyQt/RPy and bundles its own R instance, but the bundled R is quite old, and people don't usually like having two entire R installs (twice the maintenance of their favorite packages, etc.)

- Why RPy and not RPy2?

- Why not have an R shell that can play with the data objects created?

- It's obviously very beta right now; some things just don't work, like setting font colors on R graphs

Good start and great to see a working windows installer of something with so many dependencies. From experience I know that's half the battle :)


Thanks for the feedback.

- The next version of Red-R will use RPy2. We are still benchmarking and improving RPy2 performance.

- Red-R has a widget called R-Executor, which is an R shell that can execute any R code and interact with the R session.

Creating and testing installers is a major issue. I have tested on Windows XP, Vista, and 7, both 32- and 64-bit. But as another comment mentioned, there are plenty of issues we haven't tested. Please report any errors to anup@red-r.org and we will try to fix them as quickly as possible.


I do for my team. As a small business that sells commercial software, I am not at all opposed to paying for good software.

I've been watching FogBugz for a while. Our previous solution was Trac + Subversion for about 5 years, and it served us very well. Before FogBugz added Kiln there wasn't enough of an advantage to making a switch. The other factor that won me over was EBS.

We transitioned to a self-hosted FogBugz+Kiln install a couple of months ago, and I've been very happy with it, especially the development velocity, flawless upgrade process, and customer support.


I got this exact same question in a Google on-site interview. It was one of those warm-ups :)

It does require someone to think like a C programmer: handling buffers directly, modifying in place, and using NUL terminators to end strings that may have memory allocated beyond the NUL. I think it's a great question, but I don't know how well it would go over with CS undergrads coming out with a Java-only background.

Should I look for an equivalent simplistic Java question or stick to my guns and require candidates to know C? (I know the answer for my own team, but I'm curious about others' thoughts.)

