Interestingly, it uses GoFlow [0] (a flow-based programming [1] library by Vladimir Sibiroff [2]) under the hood, and has a whole section on FBP at [3].
I was in fact playing with the idea of using it for bioinformatics processing about a year ago (see [4] for components and [5] for an example program) and thought it was a great idea :)
Fellow bioinformatician here. I find it amusing to see all these projects reinventing pipes and makefiles much more verbosely and without language independence.
The project in the OP, it goes without saying, will never be used by more than a dozen wet-lab biologists, and that's being generous. It may have some use in automation.
Wet-lab biologists have many other things to worry about, and do not have the incentive or technical knowledge to record all the parameters of their experiments in a complex computer system. If the incentive existed, this data would be sent in Excel spreadsheets to NCBI, and NCBI would turn it into some half-assed file format.
I'm sure makefiles go a long way (many people solve their needs with them), but as far as I can see there are also a lot of use cases where they start to fall short.
Think, e.g., of a cross-validation setup combined with a parameter grid search (to find an optimal parameter combination for, say, building a support vector machine model), where certain reusable workflow components (training and prediction) are run for each parameter combination, for each fold in the cross-validation ...
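To make the combinatorics concrete, here is a minimal Go sketch (not Luigi or GoFlow code) that expands k folds times a parameter grid into independent train/predict jobs. The parameter names C and gamma (typical for an RBF-kernel SVM) and their values are hypothetical placeholders.

```go
package main

import "fmt"

// Job is one hypothetical unit of work: train on every fold except
// TestFold, then predict on TestFold, using one parameter combination.
type Job struct {
	TestFold int
	C        float64 // hypothetical SVM cost parameter
	Gamma    float64 // hypothetical RBF kernel width
}

// gridJobs expands the cross product of folds and parameter values.
func gridJobs(folds int, cs, gammas []float64) []Job {
	var jobs []Job
	for f := 0; f < folds; f++ {
		for _, c := range cs {
			for _, g := range gammas {
				jobs = append(jobs, Job{TestFold: f, C: c, Gamma: g})
			}
		}
	}
	return jobs
}

// trainFolds returns the folds used for training when testFold is held out.
func trainFolds(folds, testFold int) []int {
	var out []int
	for f := 0; f < folds; f++ {
		if f != testFold {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	jobs := gridJobs(5, []float64{0.1, 1, 10}, []float64{0.01, 0.1, 1})
	fmt.Println(len(jobs), "train+predict runs") // 45 train+predict runs
	j := jobs[0]
	fmt.Printf("fold %d: train on %v, C=%g, gamma=%g\n",
		j.TestFold, trainFolds(5, j.TestFold), j.C, j.Gamma)
}
```

Even this small grid gives 45 training runs, each feeding its own prediction step, which is the kind of fan-out that is awkward to spell out by hand in a makefile.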
You are right that makefiles alone would be a bad fit for that situation. My approach would probably be to wrap the parameterizable parts in a script accepting arguments and run the combinations with GNU parallel, possibly controlling the overall execution and dependencies in a makefile.
It's less "clean", because you have to keep track of metadata like parameters in the output filename or similar. But the advantage is a huge increase in flexibility.
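One way to keep that metadata in the filename, sketched in Go. The `key-value` naming scheme is just an illustrative choice, and the sketch assumes the prefix, keys and values contain no `_`, keys contain no `-`, and the extension contains no `.`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// outName builds a file name that records its parameters, e.g.
// "svm_c-1_fold-2_gamma-0.1.model". Keys are sorted so the same
// parameters always produce the same name.
func outName(prefix, ext string, params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := []string{prefix}
	for _, k := range keys {
		parts = append(parts, k+"-"+params[k])
	}
	return strings.Join(parts, "_") + "." + ext
}

// parseName recovers the prefix and parameters from a name made by outName.
func parseName(name string) (string, map[string]string) {
	base := name
	if i := strings.LastIndex(base, "."); i >= 0 {
		base = base[:i] // strip the extension
	}
	fields := strings.Split(base, "_")
	params := make(map[string]string)
	for _, f := range fields[1:] {
		if k, v, ok := strings.Cut(f, "-"); ok {
			params[k] = v
		}
	}
	return fields[0], params
}

func main() {
	name := outName("svm", "model", map[string]string{"fold": "2", "c": "1", "gamma": "0.1"})
	fmt.Println(name) // svm_c-1_fold-2_gamma-0.1.model
	prefix, params := parseName(name)
	fmt.Println(prefix, params["gamma"]) // svm 0.1
}
```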
We tried Celery for a while as a job manager; it looks like it has similar capabilities to Luigi. It was slower, caused us to write a lot of ugly "bash-in-python" when calling non-Python programs, and broke the UNIX philosophy of having independent programs that each do one thing, making it harder to quickly test new combinations of components without writing a lot of Python code.
It also depends on your dataset size. We do a lot of machine learning on datasets that won't fit in RAM, which is perfect for the pipe/streaming model.
Your "blow" library looks pretty interesting as well. I'm not a biologist, but have spoken to a few... this type of thing could prove very helpful in the field.
I think flow based programming principles can do wonders for workflow systems and composable analytics pipelines.
We have recently used Spotify's Luigi workflow system a lot, and ran into some tricky problems with somewhat more complex workflows that were only solved after we incorporated a few FBP principles (in/out ports and a separate network definition, to be exact).
At least in my experience so far, FBP principles really help to create the kind of modularity and composability that is needed in these kinds of applications.
Their website appears to be a fork of https://github.com/Polymer/docs with just some content changed, and the "antha" repo itself appears to be a thin fork of go-lang. Compare:
Hi, just a heads up. In Portuguese, Antha sounds a lot like anta (tapir). While we rarely talk about the animal, the word is used as slang for "idiot" or "dumb". I enjoy playing with bioinformatics tools, so I'll give it a try :) Thanks for sharing!
It would be interesting to see whether there are plans to integrate with BioKepler (http://www.biokepler.org/) or similar existing bioinformatics and life-science workflow systems.
Refs:
[0] https://github.com/trustmaster/goflow
[1] http://www.jpaulmorrison.com/fbp
[2] http://twitter.com/sibiroff
[3] http://www.antha-lang.org/docs/concepts/flow-based-programmi...
[4] https://github.com/samuell/blow
[5] https://gist.github.com/samuell/6164115