
I recall a story where a friend was unable to publish a paper about an alternative he had written to a commercial tool that virtually everybody used, with roughly 10 times better performance. He open sourced it and all; it was extremely useful, but there was no new methodology, it was simply very well implemented.

At a talk of his it led to a very heated discussion where an older professor accused him of wasting government money on such nonsense.


Been there. A few years back I got a government scholarship for my PhD (which is still in progress, due to my follow-up work). I basically built the foundation upon which to establish a new field for my university and the region where I live. There are some professors who think that scholarship (and the little money it gave me) was wasted on me because I chose to build all of that from the ground up, instead of rushing through my PhD.

By the way, those of that opinion are all professors who wanted me in their labs, but I turned them down...


For every story like this, I believe there are many more in which the student simply writes their own implementation due to not-invented-here syndrome, or engineering as a form of procrastination.


If you talked to me about my PhD for a few minutes you would surely put me into your "had to reinvent the wheel for no reason" category.

Indeed, I wrote an analysis framework for my data (from a gaseous detector used for axion searches) [0] instead of using the existing framework used by my predecessor. However, things are always more complicated than they seem. Many of those students who rewrite stuff and are never talked about probably have their reasons!

In my case the existing framework [1] was a monster that had already been bent to work with the kind of data we have in the first place. On top of that, my detector had several additional features, which fit _even less_ into the existing framework. Making it work well would have been a hack and still a significant amount of work.

To be fair, when I started this I expected it to be less work than it ended up being. But that's the story of software development.

The advantages now are significant of course. I know the whole codebase. It does exactly what I want. I can extend it easily as I see fit.

That doesn't mean I didn't also partly procrastinate by writing software. Far from it. Hell, there was no reason to write a freaking plotting library (a sort of port of ggplot2 for Nim) [2]. But again, this means my thesis will have plots created natively using a TikZ backend while at the same time providing links to Vega-Lite versions of each and every plot (which of course will include the data for each plot!).

Finally, the most important point: A university / professor who only pays me for 20h a week does not get to tell me how I do my PhD.

[0]: https://github.com/Vindaar/TimepixAnalysis
[1]: https://ilcsoft.desy.de/portal/software_packages/marlintpc/
[2]: https://github.com/Vindaar/ggplotnim


I have certainly experienced similar things, in particular being accused of reinventing wheels. Flexibility and performance are two big reasons, but "it's fun" and "I want to understand X" also carry a lot of weight when we do this kind of "useless reinvention".


Maybe I wasn't clear enough. When I started my PhD, I was working on the leading edge; basically 3 people in my country knew what we were talking about (and I was one of them). It certainly wasn't NIH syndrome. Still, instead of "bailing out" onto the easy path (present a paper here, work with that professor on That Other Thing That Doesn't Interest Me, etc.), I chose to keep doing what I love.

End result so far? I'm quite respected, still one of the leading researchers in my country on my specific topic, but since I don't have a PhD (because of the aforementioned delays, and some grumpy professors actively pushing against me) I'm starting to lose access to grants and programs.

I'd still do it all again, but with a few tweaks here and there; you know, hindsight always helps.


Yes, it is easy to get sidetracked into writing software. Not that software is a bad thing, but research is something different.


Given the current publish or perish culture, I doubt any student serious about publication can afford to waste time.


It is obvious that we need good software; however, from the point of view of science the old professor may have a point. If you are receiving a grant, you're not being paid to write software, in the same way that an engineer is not paid to write novels. As useful as the software may be, the person in question should be spending their time on research (by definition, new subjects), not rewriting software that already exists.


> If you are receiving a grant, you're not being paid to write software

In my (albeit limited) experience, software is a pretty common deliverable from a grant, at least in computational biology. This has also been my experience with more alternative funding sources like CZI and DARPA.

Taken more broadly, I think there is a huge disconnect between what academics are paid to do and what takes most of their time. Review is unpaid. Grants are not dependent on which journal the results go into, but time could be saved by aiming lower. A salary can be paid from a research grant while the investigator still has to teach.


What if that piece of software increases research output across the entire field? Often, a good piece of scientific software advances research more than what you're calling "research."


Writing software is sometimes necessary to achieve the objectives of a grant (even though this is not necessarily explicit). It’s not writing software that’s a problem, it’s reinventing the wheel; you should not focus on “scientists should not write software”, because that is obviously far from the truth.

For a scientist, writing useful software is a good way to get exposure, build a reputation, and get citations. It's an opportunity to do a different kind of problem solving than usual. It's also a way of understanding how the software really works (which assumptions are built in, which methods are used, and how do they affect the results?). This does help improve the quality of subsequent results.

A grant typically (there are exceptions, of course) lists things that are going to be studied. How the studying is done is usually down to the people doing the work. It certainly isn't for grumpy old professors who hear a talk at a conference to judge.


Cold calling businesses is definitely not illegal in Germany if you got the contact info from their website.


And every German business (or business operating in Germany) is mandated by law to have contact information on its website.


The huge difference between spirituality and religion is that religion is dogmatic and systematically organised. In my opinion, freeing spirituality from the clutches of religion is the final step, not the other way around. So being spiritual but not religious is not something that inevitably leads to religion; if we do it right, there is no need for oppressive religion anymore.


Bayes' theorem is one of the most fundamental theorems in the history of mathematics. I have yet to work in a field where it doesn't have deeply fundamental applications. In many cases expert knowledge or heuristic rules serve as the prior.

Saying it is overrated is like saying the sun or air is overrated.
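
As a minimal sketch of the "expert knowledge as prior" point, here is a toy Bayesian update in Python; the scenario (a diagnostic test with an expert-supplied prior) and all numbers are purely illustrative:

    # Toy Bayesian update: a diagnostic test with an expert-supplied prior.
    # All numbers are illustrative, not from any real study.
    prior = 0.01            # expert's prior: P(condition present)
    sensitivity = 0.95      # P(positive test | condition present)
    false_positive = 0.05   # P(positive test | condition absent)

    # Law of total probability: P(positive test)
    p_positive = sensitivity * prior + false_positive * (1 - prior)

    # Bayes' theorem: P(condition present | positive test)
    posterior = sensitivity * prior / p_positive
    print(f"posterior = {posterior:.3f}")  # ~0.161, despite the accurate test

Even with a 95% sensitive test, the weak prior dominates the posterior, which is exactly why a sensible expert prior matters so much in practice.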


I 100% agree with you.

But hyperbole aside, OP also has a point. If we forget that estimating a probability has a cost in itself, we can be tempted to put more and more resources into ever more sophisticated methods of data collection and analysis to become more and more certain of our estimate. But if we remember that this process has a cost, sometimes it's more efficient to just add a margin of safety and move on with your life. Bayes' theorem is often used for resource allocation, but the process of optimizing resource allocation has a cost of its own.


There is a reason one learns about it in high school after all.


> by automating consumption

It's called the subscription economy for a reason. Automated consumption exists and is constantly growing. The money spent on subscriptions grew in the US by >50% from 2010 to 2015.


Testing out the demo:

    SELECT * FROM trips WHERE tip_amount > 500 ORDER BY tip_amount DESC

Very interesting :-)


Thanks for trying the live demo. That's a very interesting result indeed. Btw, we are working on applying SIMD operations to filter queries (the WHERE clause), which will speed up the runtime of queries like that considerably.
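
Very roughly, the idea is to evaluate the predicate over whole blocks of a column at once instead of row by row. The NumPy sketch below is only an analogy of that data-parallel pattern (a real engine would use SIMD intrinsics over columnar blocks), not actual engine code:

    import numpy as np

    # Illustrative only: vectorised evaluation of "tip_amount > 500".
    rng = np.random.default_rng(0)
    tip_amount = rng.exponential(scale=20.0, size=1_000_000)

    mask = tip_amount > 500               # one comparison over the whole column
    matching_rows = np.flatnonzero(mask)  # row indices that pass the filter
    print(len(matching_rows), "rows match")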


For some reason this query is taking too long to execute. Not sure if I missed something.


When I ran it, it took about 20s total.


Took almost a minute for me.


Some of those are absolutely monstrous tips!


From my very limited understanding, the Graphcore machines significantly outperform CUDA only in inference; in training, the improvements might not be sufficient to justify switching technologies.


With likely very dire results, yes, I think you should. If your mother's insurance rate goes up because you got one of these DNA tests for Christmas, she should be involved in the decision to publish this data in the first place.


In the US, Congress has passed a law that explicitly makes that specific practice illegal: https://en.m.wikipedia.org/wiki/Genetic_Information_Nondiscr...

It will be interesting to see what workarounds insurance companies come up with to circumvent the spirit of the law, and how well it can be enforced.


And George W. Bush, a Republican, signed this into law. I remember thinking that was strange at the time, because I would have thought insurance companies would want to be able to use DNA information, and the Republicans, being more of a "big business" party, would have supported that.

Also, I found out the last time this discussion came up on HN that the law prevents it from being used for regular insurance but does not apply to life insurance.


And life insurers could simply demand a DNA sample from you before underwriting a policy, just like they might demand a physical, so the whole "concern" is entirely moot.

Insurance is highly regulated, and insurance companies have specific legal ways to underwrite policies. The idea that life insurance companies are going to secretly use stolen data of uncertain provenance in their underwriting, instead of just making you submit a DNA sample, is, quite frankly, silly.


> What workarounds insurance companies come up with to ... will be interesting

If there's enough money to be made, I'm sure the Usual People will be persuaded to bend the law until it gives way.


But the problems where we apply human labour are vastly different from the ones where we apply machine labour. In (most) tasks where we apply human labour, a few errors are tolerated.


This seems irrelevant.

Neither humans nor ML make zero errors.

Ceteris paribus, if an ML algorithm makes fewer errors at a task with low error tolerance, you would use the algorithm instead of the human, no?


I would expect that might depend on what sort of errors each makes, no?


Since you've worked extensively on synthetic data, what's your "verdict" on the topic? People seem to be very divided about it.


I think it definitely has its uses. Is it an effective drop-in replacement for sensitive data in all scenarios? I never really got that impression. My biggest takeaway was that it is excellent for development and early refinement.

Having access to synthetic data like this would let you give lower level analysts or people with lower clearances data similar to the confidential data you're working with. That's a good way to reduce costs, while also developing skills that would have otherwise been difficult to develop without providing access to the data itself.

What I found is that it's a valuable tool for bringing something up to a state where it can be applied to real data and refined further. So, long story short: I think it certainly has its uses; however, those uses eventually lead back to using real data.

