Antha – A high-level language for biology

samuell · on Dec 5, 2014

Interestingly, it uses GoFlow [0] (a flow-based programming [1] library by Vladimir Sibiroff [2]) under the hood, and has a whole section on FBP at [3].

I was in fact playing with the idea of using it for bioinformatics processing about a year ago (see [4] for components and [5] for an example program) and thought it was a great idea :)

Refs:

[0] https://github.com/trustmaster/goflow

[1] http://www.jpaulmorrison.com/fbp

[2] http://twitter.com/sibiroff

[3] http://www.antha-lang.org/docs/concepts/flow-based-programmi...

[4] https://github.com/samuell/blow

[5] https://gist.github.com/samuell/6164115

xaa · on Dec 6, 2014

Fellow bioinformatician here. I find it amusing to see all these projects reinventing pipes and makefiles much more verbosely and without language independence.

The project in the OP, it goes without saying, will never be used by more than a dozen wet-lab biologists, and that's being generous. It may have some use in automation.

Wet-lab biologists have many other things to worry about, and do not have the incentive or technical knowledge to record all the parameters of their experiments in a complex computer system. If the incentive existed, this data would be sent in Excel spreadsheets to NCBI, and NCBI would turn it into some half-assed file format.

samuell · on Dec 6, 2014

I'm sure makefiles go a long way (many people meet their needs with them), but there are also a lot of use cases where they start to fall short, AFAIS.

Think, e.g., of a cross-validation setup combined with a parameter grid search (to find an optimal parameter combination for, say, building a support vector machine model), where certain reusable workflow components (training and prediction) are run for each parameter combination, for each fold in the cross-validation...

We are doing that kind of stuff, and couldn't really imagine any sane way to implement it in make, which made us go with Spotify's luigi, as documented at https://medium.com/@saml/loosely-coupled-tasks-in-luigi-work...

I could still imagine making this much easier with a lightweight system such as the "blow" library mentioned above.
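To make the shape of that workload concrete, here is a minimal Python sketch (not luigi code). It assumes hypothetical train() and predict() stand-ins for the reusable components and an illustrative SVM-style grid; the point is only that every parameter combination times every fold becomes its own train/predict run, which is the kind of fan-out that is awkward to write as static make rules.

```python
from itertools import product

# Hypothetical parameter grid for an SVM-like model (names and values are illustrative).
PARAM_GRID = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}


def kfold_indices(n_items, k):
    """Split range(n_items) into k (train, test) index pairs, assigning items round-robin."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def train(train_idx, params):
    # Hypothetical placeholder for a reusable "train" component; returns a fake model.
    return {"params": params, "n_train": len(train_idx)}


def predict(model, test_idx):
    # Hypothetical placeholder for a reusable "predict" component.
    return [0] * len(test_idx)


def run_grid_cv(n_items, k, grid):
    """Run train + predict once per (parameter combination, fold) pair."""
    keys = sorted(grid)
    results = []
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        for fold_no, (train_idx, test_idx) in enumerate(kfold_indices(n_items, k)):
            model = train(train_idx, params)
            preds = predict(model, test_idx)
            results.append((params, fold_no, len(preds)))
    return results
```

With the 3 × 2 grid above and k = 5, run_grid_cv(100, 5, PARAM_GRID) performs 30 train/predict runs; a workflow system has to track each of them as a separate task with its own inputs and outputs.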

xaa · on Dec 6, 2014

You are right that makefiles alone would be a bad fit for that situation. My approach would probably be to wrap the parameterizable parts in a script accepting arguments and run the combinations with GNU parallel, possibly with a makefile controlling the overall execution and dependencies.

It's less "clean", because you have to keep track of metadata like parameters in the output filename or similar. But the advantage is a huge increase in flexibility.
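A minimal Python sketch of that approach, assuming a hypothetical wrapper script ./train_predict.sh that accepts --key=value arguments and an --out path: one command is generated per parameter combination, the parameters are encoded in the output filename, and the commands run with bounded concurrency (similar in spirit to GNU parallel's -j option, but this is plain concurrent.futures, not parallel itself).

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def output_name(prefix, params):
    """Encode parameters in the output filename, e.g. 'svm_C=1.0_fold=2.tsv'."""
    parts = [f"{key}={params[key]}" for key in sorted(params)]
    return "_".join([prefix, *parts]) + ".tsv"


def parse_output_name(filename):
    """Recover the parameters from a name made by output_name (values come back as strings)."""
    stem = filename.rsplit(".", 1)[0]
    prefix, *parts = stem.split("_")
    return prefix, dict(part.split("=", 1) for part in parts)


def build_commands(script, prefix, grid):
    """One argument list per parameter combination, like a job list fed to GNU parallel."""
    keys = sorted(grid)
    commands = []
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        args = [f"--{key}={value}" for key, value in params.items()]
        commands.append([script, *args, "--out", output_name(prefix, params)])
    return commands


def run_all(commands, run, jobs=4):
    """Run each command with at most `jobs` in flight; results keep the input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run, commands))
```

In practice run would be something like lambda cmd: subprocess.run(cmd, check=True). The filename encoding breaks if a key or value contains "_" or "=", which is part of the bookkeeping cost of this less "clean" style.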

We tried Celery for a while as a job manager; it appears to have capabilities similar to Luigi's. It was slower, caused us to write a lot of ugly "bash-in-python" when calling non-Python programs, and broke the UNIX philosophy of independent programs each doing one thing, which made it harder to quickly test new combinations of components without writing a lot of Python code.

It also depends on your dataset size. We do a lot of machine learning on datasets that won't fit in RAM, which is perfect for the pipe/streaming model.
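A minimal sketch of why streaming suits data that won't fit in RAM, assuming tab-separated input where one (zero-indexed) column holds a number: Welford's online algorithm keeps only a running count, mean and sum of squared deviations, so memory use stays constant however long the input is.

```python
def streaming_stats(lines, column=0):
    """Count, mean and sample variance of one numeric column in O(1) memory (Welford's method)."""
    n, mean, m2 = 0, 0.0, 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x = float(line.split("\t")[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean, as Welford's update requires
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance
```

Any iterable of lines works, so streaming_stats(sys.stdin, column=2) can sit at the end of a shell pipe reading from, say, zcat (the column number is illustrative).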

wspeirs · on Dec 5, 2014

Your "blow" library looks pretty interesting as well. I'm not a biologist, but I have spoken to a few, and this type of thing could prove very helpful in the field.

samuell · on Dec 5, 2014

I think flow based programming principles can do wonders for workflow systems and composable analytics pipelines.

We have recently used Spotify's luigi workflow system a lot, and ran into some tricky problems with somewhat more complex workflows that were only solved after we incorporated some FBP principles (in/out ports and a separate network definition, to be exact).

I have documented some of it here: https://medium.com/@saml/loosely-coupled-tasks-in-luigi-work... (it's a fairly detailed and technical post, though, so I'm not sure how easy it is to follow).

At least in my experience so far, FBP principles really help to create the kind of modularity and composability that is needed in these kinds of applications.
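A minimal Python sketch of the two FBP ideas named above, in/out ports and a separate network definition, using threads connected by bounded queue.Queue objects. It is not GoFlow's or luigi's API; the component names (ReadSeqs, ReverseComplement, Collect) and the END marker are hypothetical. Each component only reads its in-port and writes its out-port, and run_network() alone decides the wiring, so components can be recombined without editing them.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker sent through each connection


class Component(threading.Thread):
    """A process with an in-port and an out-port; it knows nothing about its neighbours."""

    def __init__(self):
        super().__init__(daemon=True)
        self.in_port = None   # assigned by the network definition
        self.out_port = None  # assigned by the network definition


class ReadSeqs(Component):
    def __init__(self, seqs):
        super().__init__()
        self.seqs = seqs

    def run(self):
        for seq in self.seqs:
            self.out_port.put(seq)
        self.out_port.put(END)


class ReverseComplement(Component):
    PAIRS = str.maketrans("ACGT", "TGCA")

    def run(self):
        while (seq := self.in_port.get()) is not END:
            self.out_port.put(seq.translate(self.PAIRS)[::-1])
        self.out_port.put(END)


class Collect(Component):
    def __init__(self):
        super().__init__()
        self.results = []

    def run(self):
        while (item := self.in_port.get()) is not END:
            self.results.append(item)


def connect(upstream, downstream, capacity=16):
    """Wire one component's out-port to another's in-port through a bounded queue."""
    conn = queue.Queue(maxsize=capacity)
    upstream.out_port = conn
    downstream.in_port = conn


def run_network(seqs):
    # Network definition, kept separate from the component code.
    reader, revcomp, sink = ReadSeqs(seqs), ReverseComplement(), Collect()
    connect(reader, revcomp)
    connect(revcomp, sink)
    for proc in (reader, revcomp, sink):
        proc.start()
    for proc in (reader, revcomp, sink):
        proc.join()
    return sink.results
```

For example, run_network(["ACGT", "AACC"]) returns ["ACGT", "GGTT"]. Swapping in a different middle component only changes the network definition, not the reader or the sink.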

frisco · on Dec 6, 2014

Their website appears to be a fork of https://github.com/Polymer/docs with only some content changed, and the "antha" repo itself appears to be a thin fork of Go. Compare:

- https://github.com/antha-lang/antha/blob/master/ast/ast.go

- https://code.google.com/p/go/source/browse/src/go/ast/ast.go

...and the antha code as it stands doesn't really seem to do anything (other than be mostly Go)?

Not really sure what to make of that.

tshadwell · on Dec 6, 2014

I was looking into who funds this, and Synthace, who are listed in the footer, have this page:

http://www.synthace.com/investors/

Amusingly, the link for their main sponsor's website goes to: "file:///C:/Users/sward/Downloads/www.sofinnova.fr"

kinow · on Dec 5, 2014

Hi, just a heads-up: in Portuguese, Antha sounds a lot like anta (tapir). While we rarely talk about the animal, the word is used as slang for an idiot or a dumb person. I enjoy playing with bioinformatics tools, so I'll give it a try :) Thanks for sharing!

PT_2014 · on Dec 5, 2014

It would be interesting to see whether there are plans to integrate with BioKepler (http://www.biokepler.org/) or similar existing bioinformatics and life-science workflow systems.