So there's no obvious practical application (right now).
Also, the proton is not 4% smaller. Protons are obviously whatever size they are.
The discrepancy comes from the fact that there are two techniques for measuring the proton size. Each experiment does its thing, and then there's a way to interpret the results that tells you the size of the proton (look up proton form factors).
However, when you do the interpretations, which depend on some theoretical calculations, you get different results. Since nobody has found any issue with the experimental results themselves, the general thinking is that there are some additional interactions, stronger than expected, that need to be accounted for (there are some unknown quantities that allow this).
One of these interactions would only affect the muonic hydrogen measurement: because of the muon's mass, some interactions between muons and protons differ from those between electrons and protons, and those might be different from what was originally thought.
The other is a type of interaction that could affect both normal and muonic hydrogen. This new measurement shows that the interactions affecting both have to play an important role in understanding the discrepancy. There are other measurements trying to pin down this effect independently (not using hydrogen at all).
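To make the form-factor remark a bit more concrete: in scattering experiments, the charge radius is read off from the slope of the electric form factor at Q^2 = 0, via <r^2> = -6 dG_E/dQ^2. Here's a minimal sketch of that extraction with made-up data points (the numbers are purely illustrative, not real measurements):

```python
# Minimal sketch (illustrative numbers only): the proton charge radius is
# defined by the slope of the electric form factor at Q^2 = 0,
#   <r^2> = -6 * dG_E/dQ^2 |_(Q^2=0)   (natural units),
# so a scattering experiment fits G_E at low Q^2 and reads off that slope.
import numpy as np

hbar_c = 0.1973  # GeV*fm, converts GeV^-1 to fm

# Hypothetical low-Q^2 "measurements" of G_E (not real data)
Q2 = np.array([0.005, 0.010, 0.015, 0.020])       # GeV^2
GE = np.array([0.9849, 0.9698, 0.9547, 0.9396])   # dimensionless

# Fit G_E(Q^2) ~ 1 - Q^2 * <r^2> / 6 and extract the radius
slope, intercept = np.polyfit(Q2, GE, 1)
r2 = -6.0 * slope                  # <r^2> in GeV^-2
r_fm = np.sqrt(r2) * hbar_c        # radius in fm

print(f"extracted charge radius ~ {r_fm:.3f} fm")
```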
As you can imagine, the long-lived particles can have "any" lifetime. Since the lifetime is unknown, it makes sense to probe many lifetimes. The experiment suggested by Lubatti and his collaborators probes lifetimes (quoted as decay lengths, c*tau) on the order of 10^5 - 10^7 m. There aren't strong constraints in this range; such particles may show up as missing energy in CMS or ATLAS, or they may not. CMS, ATLAS, and LHCb are setting constraints at 10^-6 to 10^2 m. Interestingly, the different experiments are probing mostly different ranges.
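To see why different detector geometries end up probing different lifetime ranges, here's a minimal sketch of the probability that a particle decays inside a detector volume, given its proper decay length. The boost and the distances are placeholders I made up, not the actual experiment geometries:

```python
# Minimal sketch: probability that a long-lived particle decays inside a
# detector region spanning lab-frame distances [d1, d2] from the collision
# point.  The boost and distances below are illustrative placeholders only.
import math

def decay_prob(c_tau_m, beta_gamma, d1_m, d2_m):
    lab_decay_length = beta_gamma * c_tau_m   # mean decay length in the lab frame
    return math.exp(-d1_m / lab_decay_length) - math.exp(-d2_m / lab_decay_length)

beta_gamma = 3.0  # assumed typical boost
for c_tau in [1e-2, 1e0, 1e2, 1e4, 1e6]:   # proper decay lengths in meters
    p_near = decay_prob(c_tau, beta_gamma, 0.01, 1.0)     # near the interaction point
    p_far = decay_prob(c_tau, beta_gamma, 100.0, 120.0)   # a far, shielded detector
    print(f"c*tau = {c_tau:7.0e} m: P(near) = {p_near:.2e}, P(far) = {p_far:.2e}")
```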
To correct something in the grandparent (?) post, the LHC experiments do not know the incoming energy. Even though the beam energy is 6.5 TeV on 6.5 TeV, each collision only involves a fraction of that. We only know that the energy has to balance in the direction perpendicular to the beams. However, if a collision produces two invisible particles that balance each other, it would appear that there is no missing energy. In many models, pair production of new particles is preferred, so if the particles are long-lived, they can be hard to find.
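For concreteness, here's a minimal sketch of what "balancing in the perpendicular direction" means: missing transverse momentum is just the negative vector sum of the visible particles' transverse momenta (the particles below are toy numbers, not a real event):

```python
# Minimal sketch: missing transverse momentum (MET) is the negative vector
# sum of the transverse momenta of everything visible in the detector.
# Two invisible particles that balance each other leave this sum near zero.
import math

# Hypothetical visible particles: (pt [GeV], phi [rad])
visible = [(45.0, 0.3), (38.0, 2.9), (22.0, -1.7)]

px = -sum(pt * math.cos(phi) for pt, phi in visible)
py = -sum(pt * math.sin(phi) for pt, phi in visible)
met = math.hypot(px, py)

print(f"missing transverse momentum ~ {met:.1f} GeV")
```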
The argument in favor of an electron-positron collider is that there is much less "junk" in the detector from the hadron collision. So the trade-off is that you produce a lot fewer Higgs bosons, but, in principle, you are able to measure each Higgs much more precisely. Having a machine at 250 GeV puts us at the sweet spot for producing the Higgs with mass 125 GeV.
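The arithmetic behind the "sweet spot" is simple: the workhorse process at such a machine is e+e- -> ZH, so you need to run above the sum of the Z and Higgs masses, and the rate peaks not far above that threshold. Just for the numbers:

```python
# Rough arithmetic behind the "sweet spot": an e+e- Higgs factory relies on
# e+e- -> Z + H, so it must run above the sum of the two masses, and the
# cross section peaks not far above that threshold.
m_Z = 91.2    # GeV
m_H = 125.0   # GeV
print(f"e+e- -> ZH threshold ~ {m_Z + m_H:.0f} GeV, so ~250 GeV sits just above it")
```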
There is some discussion in the community about whether it truly is going to advance the field to build such a machine. It's not super clear whether the various proposed 250 GeV machines will improve on what will be done by HL-LHC. From the Chinese point of view, though, it absolutely is the right decision to build this machine on the way to a 100 TeV hadron collider, since they desperately need to build up some local expertise in constructing/operating a large collider.
Experiments like IceCube answer different questions than accelerator-based experiments. You couldn't, for example, measure the Higgs branching ratio to bottom quarks at IceCube. On the other hand, you can't measure high-energy neutrinos at the LHC. Both types of experiments have their place.
Totally hear you. But Yang's whole point was that accelerator measurements are not what we should be focusing on for the next "x" big new experiments. I tend to agree.
This is not a very straightforward question to answer. The short answer is that both CMS and ATLAS detected a couple thousand Higgs events.
The long answer depends on your definition of detected. As you probably know, the Higgs decays into many different combinations of particles. In some of these cases, we have very little chance of determining whether there was a Higgs in the event. In fact, only in H(iggs) -> ZZ -> 4l(eptons) do we stand a reasonably good chance of saying whether a specific event contains the Higgs. In this channel, CMS and ATLAS each observed ~20 events.
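For a sense of where a number like ~20 comes from, the back-of-envelope estimate is expected events = cross section x integrated luminosity x branching ratio x selection efficiency. The inputs below are rough Run 1 values from memory, not official numbers:

```python
# Back-of-envelope H -> ZZ -> 4-lepton yield per experiment.
# All inputs are rough, order-of-magnitude Run 1 values, not official numbers.
sigma_higgs = 20e3   # total Higgs production cross section, in fb (~20 pb)
lumi = 25.0          # integrated luminosity, in fb^-1
br_4l = 1.25e-4      # branching ratio for H -> ZZ -> 4 charged leptons (e, mu)
efficiency = 0.4     # assumed acceptance times selection efficiency

expected = sigma_higgs * lumi * br_4l * efficiency
print(f"expected H -> 4l events ~ {expected:.0f}")   # comes out around 20-25
```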
The next two most sensitive channels are H->WW->2l+2nu (neutrino) and H->2photon. In the most Higgs-enriched regions that have been constructed, the contribution from Higgs is ~10% in H->WW->2l+2nu and ~20% in H->2photon. So in these channels, we have very little chance to say whether a specific event contained a Higgs, but if we look at all the events, we see features that wouldn't be present if there weren't any Higgs bosons. The contributions from Higgs in these channels are a few hundred events in CMS and ATLAS.
Then there are even less sensitive channels, which means we have even less chance to say whether a specific event contains a Higgs. CMS and ATLAS probably detected a few hundred to a thousand or so events each in these channels.
I am a scientist, and I have seen a lot of terrible code. Most scientists have no formal training in computer science or coding. Many advisors don't place much value on having their grad students take such classes, though even a short language-specific introductory class would vastly improve their students' productivity.
I recently undertook a complete rewrite of our group's analysis software, which was written by our previous postdoc. It was ~30k lines of code in 2 files (one header, one source file), with pretty much every bad coding practice you can imagine. It was so complicated that the postdoc was essentially the only one who could make changes and add features.
The rewritten framework is only ~6k lines of code and replicates the exact same functionality. It's easy enough to use that, just by following some examples, the grad students have been able to implement studies in a couple of days that took weeks in the old framework. The holy grail is for it to be easy enough for the faculty to use, but that will probably take a dedicated tutorial.
My point is that following "best practices" may be overkill, but taking a thoughtful approach to the design of the software can vastly improve your productivity in the long run. Posts like the OP help scientists who write bad code defend poor practices. Any scientist worth his salt should support following good practices because it will always lead to better science.
I work in R&D for a large science services company. And, I'm often responsible for turning nifty research projects into marketable products. Because of this, I often take over a lot of code from scientists and academics. And, it's usually (i.e. always) pretty bad.
'Software engineers' get a bad rap for over-engineering code. And, I understand that. But, the opposite is so, so much worse. I see what you're describing every time I take over a project.
The worst characteristic, though, is the lack of version control. Usually these teams will have used email to exchange source files. They usually have directories full of 'version_X' sub-directories of different code. And, usually each member of the team will have different versions of the code.
The second worst characteristic I find is code that doesn't actually work unless it is placed exactly in the right directory of a now non-existent server. They send me code (in a zip file, of course), no instructions, no configuration. And, then I spend several days or even weeks just trying to get it to work the way that they said it worked back at their research 'demo' a year ago. 'It worked last year', they say. And, then imply that I'm some sort of hack because I can't understand what they're doing.
I'm a scientist who does lots of code. Most of my "projects" are 1000 lines or less (usually much less) to do a single function or calculation.
Last year I was pulled into my first larger-scale project (about 8 science coders at multiple institutions over 5 years). We were able to produce reasonable, readable code for each other on a file-by-file basis. But Version Control was the worst, worst part. Files emailed back and forth between subgroups that never made it into the tree, edits lost, we all had our own forked version at the end, essentially.
The most telling part was when I emailed both IT in my department and several professors (PIs) on the project, including those who taught "scientific programming", asking about setting up a source repository and whether one of them could host one. NONE of them had any clue what git, subversion, etc. even were, let alone where or how to set something up.
You could set up a private BitBucket repo and simply give them the link to a .zip download, while you would enter any code you receive into the repo. It might be unfair that you would have to do all the version control, but it's better than nothing...
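If you do end up being the de facto maintainer, even the ingest step can be semi-automated. A minimal sketch of that workflow, assuming you already have git set up locally with a remote (the zip name, repo path, and commit message are hypothetical placeholders):

```python
# Minimal sketch of the one-person workflow suggested above: unpack whatever
# the collaborators email you and commit it to the repo you maintain for them.
# The zip name, repo path, and commit message are hypothetical placeholders.
import subprocess
import zipfile

def commit_incoming(zip_path, repo_dir, message):
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(repo_dir)   # overwrite the working tree with the emailed version
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)

commit_incoming("analysis_v7_final2.zip", "/path/to/shared-repo",
                "Import code received by email")
```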
If you can say, what company? It sounds like a pretty interesting role: despite the frustration and difficulty of dealing with such code, turning it into something more generally useful/usable seems like it would be relatively fulfilling in the end.
>The second worst characteristic I find is code that doesn't actually work unless it is placed exactly in the right directory of a now non-existent server. They send me code (in a zip file, of course), no instructions, no configuration. And, then I spend several days or even weeks just trying to get it to work the way that they said it worked back at their research 'demo' a year ago. 'It worked last year', they say. And, then imply that I'm some sort of hack because I can't understand what they're doing.
As a graduate student who has had to deal with this kind of code, and finally joined together with another grad-student to fight back and make our software retargettable... I'm so, so sorry.
This is true in finance and other data-heavy fields as well. I've been shocked at the kinds of Excel sheets that, with a mess of spaghetti VB code written by someone long gone, factor into trades worth millions...sure, it "works"...but besides the very minor question of code elegance, who knows what optimization of returns could be made if the code weren't such a fright and a knowledgeable partner could tweak and experiment with it? Or if it were abstracted enough to be applied to the other kinds of trades the firm is making (but hell, what do I know, I'm not as rich as my hedge fund friends)?
What's particularly annoying is working with analysts who have a system of pasting SQL scripts from a (hand-labeled, hand-versioned) text file to perform the necessary data-munging/pivoting for in-house use...their SQL work is, to be fair, such a leap forward from however such bulk data work was being done previously that they take offense when I offer to help them automate it...as if their system of hand-pasting/executing scripts, then eyeballing the results for an hour to spot-check them, were inherently more reliable than a batch script with well-defined automated test parameters...What they fail to see is that it's not just about faster/better error-checking; it's about more flexible analysis and output. Once the process has been abstracted, instead of producing one "clean" giant database that is faceted along one dimension (time, perhaps), the script can loop through and spit out a variety of useful permutations, which would be impossible/insanity if you stick with the hand-tweaked process.
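Something like the loop below is what I mean by abstracting the process: the same aggregation, parameterized by dimension, run as a batch instead of hand-pasted (the database, table, and column names are made up for illustration):

```python
# Minimal sketch of the "loop over permutations" idea: one parameterized
# aggregation run across several dimensions in a batch, instead of a
# hand-pasted, hand-checked query.  All names here are made up.
import sqlite3

conn = sqlite3.connect("trades.db")

# Fixed whitelist of grouping dimensions (identifiers can't be bound as
# SQL parameters, so they come from this list, never from user input).
dimensions = ["trade_date", "desk", "instrument_type"]

for dim in dimensions:
    query = f"SELECT {dim}, COUNT(*) AS n, SUM(notional) AS total FROM trades GROUP BY {dim}"
    for row in conn.execute(query):
        print(dim, row)

conn.close()
```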
That's the problem I see with the OP...A scientist can recognize when something seems to work, but when it comes to the domain of programming and structure, "what works" may simply be "what seems to work better than what I did last time"...which is not a foolproof standard of evaluation.
I'm a programmer and I've worked with scientists (planetary geology). The code is usually pretty bad, but ignoring how "pretty" or maintainable it might be, from the outside it ran way too slow, used too much memory, and botched edge conditions. On the good side, the intentions were pretty clear and the mathematics were sound. So it was pretty easy to fix things up to handle needed data volume and deal with the missed edge cases. As long as I was brought in within a certain window of time it was easy indeed.
The real issue is not best practices, per se, but what passes for them in some rather large circles. Yosefk's "DriverController, ControllerManager, DriverManager, ManagerController, controlDriver ad infinitum" is a fine warning sign. Nothing there is named after anything in the problem domain, and that's a sure sign of trouble. It's a sign that the programmer thinks the problem domain is software engineering or computer science, but that's wrong.
I've always seen becoming intimate with the problem domain as an integral part of programming in the real world. I've succeeded to the extent that I have occasionally been asked to provide help outside of software, by top people. How can anyone do a good job providing software solutions otherwise?
The question is (from direct experience): how long did it take you, and what was the cost to your career in terms of papers you didn't publish, research you didn't do, etc.?
It took me far too long to realize that there's almost no reward for code quality in academia. Code rarely gets re-used. Of the small amount that does, result consistency is a higher priority than maintainability, except for the .0001% of projects that end up being maintained by a large, collaborative team. So if you're the sucker who spends 30% of his time cleaning up the old code, you're at a 30% disadvantage to the people on the team who will quite happily use your work to publish papers, get postdocs/professorships and succeed.
I'm being a little harsh, but not by much. Unless you're tenured faculty, publishing is job one. The same rule applies to startups: code quality doesn't matter until you're successful, and once you're successful, someone else will be maintaining the code. The costs of badness are externalized to those who will voluntarily bear the burden.
I think you've hit the nail on the head. Scientists are not there to create great software. They are there to create great science. For the small amount of software that does end up in a commercial product, it will probably be rewritten anyway, and probably by somebody who wasn't doing the research in the first place.
So it "saves time for research" in the sense that scientists don't check that the code component operates correctly? In that case, why bother with code at all? Just make up plausible output and no one will look any further.
Sorry, I wasn't clear: I meant that the reviewers aren't checking that the code is running correctly.
And yes, I'm sure scientists do a bang-up job testing their own code, just like they do a bang-up job validating their own experience, checking their own logic, and criticizing their own experiments.
But the whole point of science is not to trust yourself; it is to make what you did reproducible. To the extent that you seal off part of the process from this kind of review, you're not doing science, but something else.
I think it depends on whether it needs to be maintained over a period of time or if multiple people need to work on the codebase. If it's just being written for one paper then sure, just get it done as quickly as possible.
However, there's no reason not to follow some best practices. Using a VCS has pretty much no cost other than some initial learning curve, and the productivity benefits can be substantial. So I think the optimal speed is a balancing act between writing good code and writing code as fast as possible.
I hate "best practices", precisely because it implies there is one (and only one) "best" way to do something, and it's usually implied that there is only one tool that does things that way. That being said, I can see why "best practices" have come into being.
Like the article author, I too have worked on code created by physicists, mathematicians and yes, even electrical engineers. The article author is lucky; "bad" coding practices I've come across include:
- create a new directory, copy the files you want to change into the directory, then make new changes - that's version control! (nb - no, they didn't name anything to indicate which was the new "version").
- constructors with (I shit you not) 29 arguments, none of them defaulted. Of course, that was because it was converted from Matlab code where the original functions had 30 arguments . . .
- etc, etc, etc
I'll tell you what: give me your paper, and I'll implement the code from that much better than you ever could. Sure, I've had plenty of experience cleaning up other people's messes ("we've got this standalone RADAR sim written in Matlab; it should be quick and easy for you to convert it to C++ and make it interact with two other sims!"), which is precisely why I don't do it anymore. Or at least, I'll have a look and give you a better estimate than I used to, but I'll be honest and also quote you a much shorter time to re-write it from scratch.
> "taking a thoughtful approach to the design of the software can vastly improve your productivity in the long run"
I think taking a "thoughtful approach" is the key to a lot of different practices. "Best practice", as used by most people in many different crafts and arts, is a method for avoiding thinking about what it is you are trying to do.
The most effective kinds of "best practice" are the ones you mastered by making a lot of mistakes, not something you pulled out from a book or a class. It is naive to think you can substitute standards for personal mastery.
I've waded through a lot of legacy and current scientific code (and still do that sometimes).
The worst part (not taking into account the coding style per se) for me was the (sometimes) inability to reuse the code I've encountered or adapt it to other cases.
I think scientific advisors should make a point which goes something like "If you're serious about your work, you might find one day that someone else wants to use parts of your code, so take that into account when planning your program". In my experience, a lot of programs are written as quick-hack solutions, and then there is no time to rewrite them, they grow bigger and it just snowballs from there.
The way CS was taught to us (and we're a big university) was pretty bad. No coding style, no experience with CVS, nothing concerning planning before writing new code. In the end, a lot of people got the bare minimum amount of knowledge needed to code, and started doing research using that knowledge.
I agree, of course. I just think a scientist taking a more thoughtful approach > a scientist taking a sloppy approach > a "software engineer" taking an overly thoughtful approach. Because the latter could have written ~200K LOC spread across 5 directories, and you'd need a debugger to tell which piece of code calls which.
I think you're comparing apples to oranges, both here and repeatedly in your original article.
For one thing, you describe many "sins" that "software engineers" commit, but in reality code that was flawed in most of those ways would not even have passed review and made it into the VCS at a lot of software shops, nor would any serious undergrad CS or SE course advocate using those practices as indiscriminately as you seem to be suggesting.
For another thing, how many "scientists taking a sloppy approach" do you actually know who can successfully build the equivalent of a ~200K LOC project at all, even if those 200K lines were over-engineered, over-abstract code that could have been done in 50K or 100K lines by better developers? It's one thing to say a scientist writing a one-page script to run some data through an analysis library and chart the output can get by without much programming skill, but something else to suggest that the guy building the analysis library itself could.
It's not that a single scientist writes it, but rather that someone publishes a paper on something, with ugly code used to prove it, and then becomes a professor. Subsequent generations of graduate students are tasked with extending/improving this existing codebase until it is basically Cthulhu in C form. ;)
I recall reading a propulsion simulation's code developed in this way. "Written" in C++, initially by automated translation of the original Fortran code. Successive generations of graduate students had grafted on bits of stuff, but the core was basically translated Fortran, with a generous helping of cut-and-paste rather than methods for many things. (I don't mean this as an insult to Fortran: I've tremendous respect for its capabilities, and have read well-written code in that as well.)
The net result was that fixing bugs in the system was very challenging, as it was a very brittle black box. It was not Daily-WTF-worthy, but still very frightening. I'm very grateful I was not the one maintaining it. ;)
You must not have been in science or you'd have encountered the 200K LOC program, written in five programming languages (two of them obscure), which can only be compiled on the author's computer. Oh, and add 50K of C code from ancient versions of other projects (which could've been used as libraries) for undocumented reasons.
Though I have also had colleagues who were brilliant programmers.
This describes almost every published application I have ever tried to get running. It ends up being impossible to get the application working on anything other than the author's workstation.
I would alter your list to say that a competent software engineer working together with a scientist > a scientist taking a thoughtful approach > a sloppy scientist > someone who is neither a competent software engineer nor a thoughtful scientist.
From the article and your comment above, it sounds to me like you have had to work with a terrible programmer who ranted about best practices to cover for his incompetence. We've all worked with someone like that, even in software shops. Don't tar us all with that brush.
I think it's a pretty shoddy software engineer who writes more LOC than the scientist. Good code is concise, readable without comments, etc. That bad software engineers write bad code is no different from a bad scientist reasoning that the sun is cold because the temperature in January is below freezing.
What's really interesting here is comparing the two lists of problems the author gives.
On one hand, the problems are either product defects (crashes, missing files, etc.) or maintainability defects (globals, bad names, obscure clever libraries, etc.).
On the other hand, the problems the author mentions are basically things anathema to snowflake programmers (files spread all over, deep hierarchies, "grep-defeating techniques", etc.)
The academic's code scales vertically, because you can always (hah!) find some really bright researcher who is smart enough to grok the code and spend all the time in valgrind and whatnot to make it work. However, God help you if you can't find (or, more appropriately given the current academic culture, force) somebody to waste many hours of their lives fixing mudball code.
The other extreme scales horizontally, right? You have these many files, and deep hierarchies, and dynamic loading, but that's how a lot of people are used to doing it and that's what the tooling is designed to support. The big accomplishment of Java and C# isn't that it lets you get a 100x return from a 50x programmer, but that it lets you scale to having 50-100 programmers in a semi-reasonable way on a project.
In an ideal world, you have a small number of academics and engineers that communicate tightly and write good, compact, and clean code; in the real world, you want to pick tools that help you deal with the fact that it is hard to scale vertically.
EDIT:
At second read-through, I think the author just needs to use better tools. A good IDE makes code discovery much easier than mere grep, and helps solve a lot of other problems.
I do not understand the insistence of academics on using unfriendly tools.
> I do not understand the insistence of academics on using unfriendly tools.
My step father teaches doctorate business students. Until VERY recently he was running Corel Wordperfect simply because it was the first word processor he had installed. Never underestimate the potential stubbornness of smart people :)
Close to nothing would happen. The relevant quantity to understand is the stopping power (usually referred to by us physicists as dE/dx). Stopping power drops significantly as the energy of the charged particle increases, so at very large energies, very little of that energy is deposited over a given distance (in an absolute sense).
This principle is how some radiation therapies treat cancer: because the stopping-power curve for protons is well understood, a proton beam can be tuned to deposit all of its energy in a fairly small area.
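For the curious, here's a minimal sketch of the simplified Bethe stopping-power formula for a proton in water, with the density and shell corrections left out (so it's only roughly right). It shows dE/dx falling steeply with energy before flattening out near the "minimum ionizing" value:

```python
# Minimal sketch of the (simplified) Bethe stopping-power formula for a proton
# in water, ignoring the density and shell corrections.  It shows dE/dx
# falling steeply with kinetic energy, then flattening near minimum ionizing.
import math

K = 0.307075          # MeV cm^2 / mol
me_c2 = 0.511         # electron mass, MeV
M = 938.272           # proton mass, MeV
Z_over_A = 0.555      # for water, mol/g
I = 75e-6             # mean excitation energy of water, MeV

def dEdx(T_MeV):
    """Stopping power in MeV cm^2/g for a proton of kinetic energy T."""
    gamma = 1.0 + T_MeV / M
    beta2 = 1.0 - 1.0 / gamma**2
    bg2 = beta2 * gamma**2
    return K * Z_over_A / beta2 * (math.log(2 * me_c2 * bg2 / I) - beta2)

for T in [10, 100, 1000, 10000]:   # MeV
    print(f"T = {T:6d} MeV: dE/dx ~ {dEdx(T):5.1f} MeV cm^2/g")
```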
Aside/Rant: this question clearly can be answered by someone with expertise in the field. So why did people feel the need to speculate about it instead of just waiting for someone who knows how to answer it?
Actually, it does follow. If there are particles of the energy described in the original link, there are many more particles of lower energies also hitting the Earth's atmosphere, including energies accessible at the LHC. And none of these resulted in the Earth's destruction.
The same logic applies to the argument that any black holes or otherwise dangerous particles would simply zip by the Earth (stated in a sibling comment to yours). Particles of slightly lower momenta arrive in greater numbers, and these would produce the dangerous particle without giving it enough momentum to escape. Since the Earth has not been destroyed, nothing dangerous can be produced at energies low enough for us to reach.
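To put a number on "come in higher numbers": assuming a steeply falling power-law spectrum (differential flux roughly ~ E^-3 above the knee; the thresholds below are illustrative, not tied to any specific event), the integral flux above a lower energy dwarfs the flux above a higher one:

```python
# Minimal sketch, assuming a power-law cosmic-ray spectrum with differential
# flux ~ E^-3 (a rough value above the "knee"): how many more particles
# arrive above a lower energy threshold than above a higher one.
# The thresholds are illustrative, not tied to a specific observed event.
spectral_index = 3.0
E_low, E_high = 1e17, 1e20   # eV

# For a power law, the integral flux above E scales as E^(1 - spectral_index)
ratio = (E_low / E_high) ** (1.0 - spectral_index)
print(f"~{ratio:.0e} times more cosmic rays above {E_low:.0e} eV than above {E_high:.0e} eV")
```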
If I understand you correctly, your argument is based on the assumption that particle energies have some sort of continuous distribution. That's not necessarily true.
These off-base criticisms reflect the fact that the commenter did not read the whole paper and likely did not even finish the introduction (which ends on page 3), where he found his "serious" problems.
For example, the paper nowhere conflates consciousness and memory. Instead, the paper repeats the suggestion that a requirement of consciousness may be the ability to process a substantial amount of information. This idea is related to the idea of memory, but it's not really the same.
The paper is not about describing the biological experience of consciousness at all. This paper asks the question: "Is there some way to understand from the Hamiltonian and density matrix of our universe that it should contain consciousness?"
Tegmark then proposes some criteria that are probably necessary for consciousness to arise, presents metrics for those criteria, and calculates the metrics under various conditions. The paper is really describing a framework for how consciousness can be considered in the context of a physical representation. The calculations should be relatively straightforward to follow for anyone with a decent memory of their linear algebra class.
To put it another way: If you had a Hamiltonian and density matrix that described a universe that you thought contained consciousness, what kind of things would you calculate for that Hamiltonian and density matrix to try and find out if it did or didn't? This paper suggests some ways to think about this question.
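As a flavor of the linear algebra involved (this is a toy example of my own, not a calculation from the paper): build a density matrix for a small system, trace out part of it, and compute an entropy, which is the sort of information measure these criteria are built from:

```python
# Toy example (not taken from the paper) of the kind of linear algebra the
# calculations involve: build a two-qubit density matrix, trace out one qubit,
# and compute the von Neumann entropy of what's left -- a standard measure of
# how much information one subsystem carries about the other.
import numpy as np

# Maximally entangled two-qubit pure state (|00> + |11>) / sqrt(2)
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho = np.outer(psi, psi.conj())            # full 4x4 density matrix

# Partial trace over the second qubit -> 2x2 reduced density matrix
rho_A = np.einsum('ijkj->ik', rho.reshape(2, 2, 2, 2))

# Von Neumann entropy S = -Tr(rho log2 rho), from the eigenvalues
eigvals = np.linalg.eigvalsh(rho_A)
S = -sum(p * np.log2(p) for p in eigvals if p > 1e-12)
print(f"entanglement entropy of one qubit: {S:.3f} bits")   # 1.000 for this state
```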
So I don't understand why this comment spends so long talking about "observing consciousness."
Why do you want to limit by location when the data is already separated by cause? All of the Tesla fires were caused by collision.
If you compare (Fires caused by collision)/Vehicles on the road, then I agree Tesla does not compare favorably. In principle, you should compare Fires/collision in case Tesla drivers have an abnormally high collision rate. And statistically speaking, it's pretty hard to draw a conclusion when your sample size is 3 fires.
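To put the "sample size of 3" point in numbers, here's the counting error alone on a rate built from 3 events (ignoring every other source of uncertainty):

```python
# Quick sketch of why 3 events is a weak basis for a rate comparison:
# the Poisson counting error on N observed events is ~sqrt(N), so the
# relative uncertainty on any rate built from N = 3 is enormous.
import math

n_fires = 3
rel_uncertainty = math.sqrt(n_fires) / n_fires
print(f"relative counting uncertainty on a 3-event rate: ~{rel_uncertainty:.0%}")
```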
But the fact is there are still no fires attributed to electrical or mechanical problems.
And the points on both sides about injuries/deaths from fires are pretty much red herrings since the numbers are already extremely low. There are only 2 deaths per 100 fires caused by collisions.