“Now before every flight I always send out an email company-wide saying, ‘If anyone can think of any possible reason to hold off on launching, they should call me immediately on my cell phone or send me an email, whether their manager agrees with it or not.’ But that's sort of something I’ve always sent before every flight, but the 20th time I send that email it just seems like… you know, ’There’s Elon being paranoid again’, so maybe it doesn’t resonate with the same force. But I think now everyone at the company appreciates just how difficult it is to get rockets to orbit successfully and I think we’ll be stronger for it.”
"What really happened was typical I think in large bureaucratic organizations, and any big organization where you’re frankly trying to be a hero in doing your job. And NASA had two strikes against it from the start, which one of those is they were too successful. They had gotten by for a quarter of a century now and had never lost a single person going into space, which was considered a very hazardous thing to do. And they had rescued the Apollo 13 halfway to the moon when part of the vehicle blew up. Seemed like it was an impossible task, but they did it. … So it gives you a little bit of arrogance you shouldn’t have. And a huge amount of money [was] involved. But they hadn’t stumbled yet and they just pressed on. So you really had to quote “prove that it would fail” and nobody could do that." -McDonald
I just listened to a Freakonomics podcast about the person who raised a red flag and wanted to cancel the launch. Very interesting listen. The episode was "Failure Is Your Friend".
> ‘If anyone can think of any possible reason to hold off on launching, they should call me immediately on my cell phone or send me an email, whether their manager agrees with it or not.’
Two thoughts: if you need to send a message like that, the culture of your organization needs some work. Also, does he really think a person can send a message like that - when their manager does not agree - without torpedoing their career?
Sending a message like that is part of the work to ensure the culture of your organization allows it. Otherwise managers at all kinds of levels will bring with them their own ideas of whether or not this is acceptable, and act accordingly.
Besides, this is unusual for a company the size of SpaceX. In most companies of some size I've worked at, most people may have the CEO's e-mail. Maybe. But usually the one filtered by a PA. But most of them don't have the CEO's cellphone number.
> Also, does he really think a person can send a message like that - when their manager does not agree - without torpedoing their career?
If this is usually the case, does that not mean that all organizations need to work on their culture? So what is the issue with sending a message like that again?
Yeah. It is also the sort of thing NASA actually has in place - "anyone can escalate a safety issue!"
It is worth studying how that kind of thing has NOT fixed the issues at NASA, and how it did not prevent us from losing Columbia. You either have a culture that puts safety, engineering and problem solving ahead of politics and expediency in all cases, or you don't. There are absolutely not any shortcuts to that.
Actually it's only a problem if you're REALLY wrong. If you're just a little bit wrong, or your manager was only just barely right, it's likely that your manager will be reprimanded instead of you.
Further, I think half the point of the exercise is to help ensure that managers don't dismiss people's concerns. If you want everyone to take launch safety really seriously, broadcasting to everyone that you can go above as many bosses' heads as necessary to get something addressed should help keep everyone on their toes.
So I know very little about rockets, but here's my simplified understanding:
The liquid oxygen tank provides the liquid oxygen which fuels the rocket. They use helium to keep the tank pressurised as the liquid oxygen gets used up (why helium? it's light and unreactive). The helium is stored in its own tanks, which are secured by struts like in this picture: http://i.szoter.com/741dc2bcf5762a48.jpg (via reddit). The strut failed, causing a helium leak and a complicated series of events (including the helium tank bouncing around the liquid oxygen tank?), that basically resulted in the liquid oxygen tank being over-pressurised and exploding.
Bonus (very cool!) gif of the liquid oxygen tank during a previous launch (unfortunately cameras apparently weren't included in this launch): https://i.imgur.com/WRp2ujX.gif (also via reddit)
Some extra cool things:
(1) They narrowed down the source of the failure using "acoustic triangulation", which, I think, is essentially using sound sensors (accelerometers?) located at various locations to pinpoint the location of the failure in 3D space.
(2) The Dragon capsule could have been saved if it'd had the right software (which would deploy the parachutes). They'd already planned to do this, and will now have a software update for the next launch. Why hadn't they done this already? Because if the parachutes deploy accidentally, it could result in launch failure, so it's something they have to be careful about. But the capsule survived the explosion and they remained in contact with it until it was below the horizon.
Not exactly relevant but there is a great video with narration(1) from a camera inside a kerosene tank from a Saturn 1 Rocket during flight. Gives appreciation to the amount of thought put into the design of something as simple as a pressurized fuel tank.
> Bonus (very cool!) gif of the liquid oxygen tank during a previous launch (unfortunately cameras apparently weren't included in this launch)
My understanding is that they have them on every launch, but bandwidth constraints with the downlink mean they have to pick between the many cameras to monitor in realtime.
> (1) They narrowed down the source of the failure using "acoustic triangulation", which, I think, is essentially using sound sensors (accelerometers?) located at various locations to pinpoint the location of the failure in 3D space.
An accelerometer is a poor device for doing acoustic measurement. At best, it can measure how the surface the accel is attached to responds to the acoustic environment (and that's only if the accel has a very high frequency response and is conditioned and sampled appropriately). For this kind of work they probably wanted to measure dynamic pressures within the tank, which they would most likely do with microphones. Other than that you have exactly the right idea.
Well, it depends on their bandwidth. An accelerometer is essentially indistinguishable from a microphone that can also detect sound all the way down to "DC" (where 'sound' is applied force on the accelerometer itself rather than on a pressure plate that reacts to air pressure) -- if its sampling rate is fast enough and it is coupled to the structure (i.e. the coupling system also has enough bandwidth), it is an acoustic sensor.
Of course, a microphone was designed with high bandwidth in mind, while accelerometers probably value more precision around the DC input.
Not all accelerometers have usable response at DC. The accels I have on my platform for dynamics measurement (high structural vibration and flutter) are meant to be AC coupled, for example. At rest they may give some garbage reading, but when they're vibrating, they accurately measure how the structure is responding.
Your point remains though, if the accel has appropriate frequency response and is properly conditioned and sampled at a high enough rate, acoustic measurement is possible. What I was trying to say (maybe poorly) is that an accelerometer would not be my first choice for doing such work. But in the aftermath of an event like this, the data may be (and was, it sounds like) usable for such purposes.
As a final note, the 'big explosion' was the air force blowing up the falcon9. This is standard procedure to stop the rocket doing damage in further locations and to stop as much of the fuel as possible from hitting the ground (it's apparently nasty stuff).
For pedantry's sake: Falcon 9 blew itself up once an anomaly was detected. The Air Force kill command was not sent until dozens of minutes after the vehicle disintegrated.
The F9's fuel is just rocket-grade kerosene, nothing particularly dangerous or exotic. The thrusters on Dragon on the other hand use monopropellant which is very toxic, but Dragon survived and impacted the sea.
On the other hand, the oxidizer, LOX aka liquid oxygen, is a bit exotic and certainly dangerous in many ways. In general burning and dispersing as much of this and the rocket itself way up there is good, although of course the flight path is designed to be as safe as possible for failures.
It's not really that nasty in terms of toxicity, it's mostly just that you don't want the huge fireball and resulting pressure waves to happen anywhere near people or infrastructure.
It's amazing what profound effects the culture of a company can have on its results, and how that changes over time.
> He said the early team had "an extreme level of paranoia" because of the difficulty of learning how to design and launch rockets. But now, “the vast amount of people at the company today have only ever seen success… when you’ve only seen success, you don’t fear failure quite as much."
> Musk said the night before each launch, he sends a company-wide email asking employees to send him an email or call his cell phone if they have any reason to believe the rocket should not launch.
I think it's good such realities are being realized, then handled in a gentle (but firm) way.
I've done a lot of manufacturing with all sorts of metals. I am not sure one can blame a vendor for a grain structure problem. This is a testing failure. A failure to identify parts that don't pass tests.
The problem might very well be that the very tests a part must pass will weaken the part to the point that it is not usable or less reliable. In other words, some tests are destructive.
Here's one of many interesting pages that came up by searching for "how to test aluminum for grain structure".
I guess I am saying I hope a vendor isn't blamed for this when the reality of the matter might very well be that testing to 100% certainty is impossible.
But there's so much margin for error on these specific parts. Even if the test only ensures 60% strength, and weakens the part by 50%, that's still strong enough.
A part being mildly out of spec might not be a vendor problem. A part being 20% strength is absolutely a vendor problem.
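To put rough numbers on that margin argument, here is a minimal sketch. It assumes one reading of the comment above: the test verifies 60% of rated strength and then costs the part half of that, and the 20% figure is the fraction of rated strength the part actually has to carry. The function name and that interpretation are mine, not anything SpaceX has published.

```python
def post_test_strength_fraction(verified_fraction, test_damage_fraction):
    """Worst-case remaining strength, as a fraction of rated strength,
    after a test that verifies `verified_fraction` of rated strength
    and weakens the part by `test_damage_fraction`."""
    return verified_fraction * (1.0 - test_damage_fraction)


# Figures taken from the comments above; how they combine is an assumption.
remaining = post_test_strength_fraction(0.60, 0.50)  # 0.30 of rated strength
required = 0.20                                      # assumed fraction of rated strength needed
margin = remaining / required                        # about 1.5x

print(f"remaining: {remaining:.0%} of rated, margin over required: {margin:.2f}x")
```

Under those assumptions a tested part still has about 1.5x the strength it needs, which is the point being made: even a fairly harsh test leaves room.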
Why was this designed such that the failure of A SINGLE STRUT would be catastrophic?
I don't like to blame a vendor for what should be an engineering problem, whether this means design or testing.
The problem here --assuming it is as described-- is that someone designed a system with the assumption that none of these struts would fail. And, furthermore, executing on a design where the failure of ONE strut could cause a disaster.
Anyhow, that's what it looks like to me given what's been released.
Why was this designed such that the failure of A SINGLE XXX would be catastrophic?
There are plenty of things in a rocket that can fail that would take the entire vehicle with it. Structural mechanics is pretty well understood and loads are well predictable, so it seems perfectly reasonable to me to design with the assumption that a part with a 10x safety factor will not fail.
If you didn't, your rocket would never get to orbit anyway.
I completely agree on the responsibility for acceptance testing, but _someone_ is responsible for the grain structure of the material, after all that's one of the most important properties of metals aside from their constituent metal content, which is itself chosen for its influence over the microstructure of the alloy.
Yes, agreed. That said, I prefer to function from a mindset where I place fault in engineering first. What I mean by this is, I assume it is my responsibility to verify, as much as possible, that components and assemblies meet the required specifications. In other words, don't just engineer the parts. Take the time to engineer a "failure is not an option" process as well.
I suspect SpaceX's costs are going to, over time, increase significantly as they continue to learn that playing it loose isn't always possible in that business. They are famous for going for COTS in order to save money.
This is NOT a put-down. I think what they are doing is fantastic. It is obviously redefining aerospace. At the same time I am astounded that critical structural components are not 100% tested. That said, metallurgy isn't my area of expertise, which means my opinion here could be complete nonsense. Perhaps this particular failure mode can only be tested through destructive methods (sectioning?) which means you can never be 100% certain to be flying good metal.
Are SpaceX famous for COTS though? My impression was that they manufacture a ridiculous amount of stuff in-house for ... well pretty much this exact reason.
I'd argue that the likely result here is Elon Musk will continue his campaign of "hell with it, we'll build our own".
I've only been exposed to this in oil & gas, but there is a problem that non-destructive testing of drill stems does not test for all failure modes.
Edit: i.e. NDT'd pieces are not immune from failure (but other factors such as age and in-service time/rotations/re-thread/re-collar runs etc. can provide additional context to NDT results and should correlate more or less with actual failure patterns).
In a single-use part, non-destructive testing mightn't be a complete picture of how the part will perform.
Kind of sad to see the reddit group /r/spacex wasn't afforded access to the conference. I think they are probably one of the best moderated and one of the most intelligent groups of fans spacex has.
Acoustic triangulation of accelerometers in upper stage helped pinpoint the strut, and using only 0.893 seconds of data (unless the accels were in the part that kept transmitting). That means the streaming sample rate must be quite high!
assuming 330m/s for the speed of sound, you can get 1 inch of precision at a sample rate of just under 13kHz. With 3 channels of 12-bit accelerometer data that's ~ 58kBps with no compression which doesn't seem too out there. Of course you can get a pretty close approximation at much lower sample rates (/2, /4, /8, /16) using band-limited interpolation
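For anyone who wants to check that arithmetic, a quick sketch. The 330 m/s, 1 inch, 3 channels and 12 bits are the figures from the comment above; the function names are just for illustration.

```python
SPEED_OF_SOUND_M_S = 330.0  # figure assumed in the comment above
INCH_M = 0.0254             # one inch in metres


def sample_rate_for_resolution(speed_m_s, resolution_m):
    """Sample rate at which sound travels `resolution_m` per sample period."""
    return speed_m_s / resolution_m


def raw_data_rate_bytes_per_s(sample_rate_hz, channels, bits_per_sample):
    """Uncompressed data rate in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample / 8.0


rate_hz = sample_rate_for_resolution(SPEED_OF_SOUND_M_S, INCH_M)
data_rate = raw_data_rate_bytes_per_s(rate_hz, channels=3, bits_per_sample=12)

print(f"{rate_hz:.0f} Hz, {data_rate / 1000:.1f} kB/s")
```

That comes out at roughly 12,992 Hz and 58.5 kB/s, which matches the "just under 13kHz" and "~58kBps" above.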
helium is 972 m/s, aluminum 5100 m/s, steel 6100 m/s. depending on what they actually measure, correct triangulation sounds like a non-trivial problem; good job spacex for solving that. (unless your cad software does that for you :))
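To show what the easy version of the problem looks like, here is a toy sketch of locating a source from time differences of arrival. Everything in it is an assumption for illustration: a 2D layout, four hypothetical sensors, one uniform sound speed, and a brute-force grid search. It is not how SpaceX did it, and the mixed-media speeds listed above are exactly what this sketch ignores.

```python
import itertools
import math


def arrival_times(source, sensors, speed):
    """Travel time from the source to each sensor in a uniform medium."""
    return [math.dist(source, s) / speed for s in sensors]


def tdoa_residual(candidate, sensors, measured_times, speed):
    """Sum of squared errors in arrival-time differences relative to
    sensor 0, so the unknown emission time cancels out."""
    predicted = arrival_times(candidate, sensors, speed)
    err = 0.0
    for p, m in zip(predicted[1:], measured_times[1:]):
        err += ((p - predicted[0]) - (m - measured_times[0])) ** 2
    return err


def locate(sensors, measured_times, speed, extent, step):
    """Brute-force search over a square grid for the best-fitting source."""
    n = int(round(extent / step))
    best = None
    for i, j in itertools.product(range(n + 1), repeat=2):
        candidate = (i * step, j * step)
        r = tdoa_residual(candidate, sensors, measured_times, speed)
        if best is None or r < best[0]:
            best = (r, candidate)
    return best[1]


# Hypothetical setup: sensors at the corners of a 3 m square.
SPEED = 330.0
SENSORS = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.0, 3.0)]
TRUE_SOURCE = (1.2, 2.1)

# Simulated measurements with an arbitrary, unknown emission-time offset.
times = [t + 0.5 for t in arrival_times(TRUE_SOURCE, SENSORS, SPEED)]
estimate = locate(SENSORS, times, SPEED, extent=3.0, step=0.1)
print(estimate)
```

With noise-free data and a single medium the search recovers the source to within the grid step. Real data adds noise, several propagation paths (gas, liquid, metal) and 3D geometry, which is where it stops being trivial.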
snippet summary: prelim failure cause might have been strut holding up helium tank inside of oxygen tank failed well-below rated stress causing tank to release helium into oxygen tank > failure.
snippet:
"Preliminary conclusion is that a COPV (helium container) strut in the CRS-7 second stage failed at 3.2Gs.
A lot of data was analysed, it took only 0.893 seconds between first sign of trouble and end of data. Preliminary failure arose from a strut in the second stage liquid oxygen tanks that was holding down one composite helium bottle used to pressurize the stage. High pressure helium bottles are pressurized at 5500 psi, stored inside in LOX tank. Several helium bottles in upper stage. At ~3.2 g, one of those struts snapped and broke free inside the tank. Buoyancy increases in accordance with G-load. Released lots of helium into LOX tank. Data shows a drop in the helium pressure, then a rise in the helium pressure system. Quite confusing. As helium bottle broke free and pinched off manifold, restored the pressure but released enough helium to cause the LOX tank to fail. It was a really odd failure mode."
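On the "buoyancy increases in accordance with G-load" part: a bottle that is lighter than the LOX it displaces is pushed along the acceleration axis with a force proportional to the acceleration. A rough sketch, where the bottle volume and mass are made-up placeholder values (the snippet gives neither) and the LOX density is the approximate value near its boiling point:

```python
G0 = 9.80665          # standard gravity, m/s^2
LOX_DENSITY = 1141.0  # kg/m^3, approximate for liquid oxygen near boiling point


def net_buoyant_force_n(volume_m3, mass_kg, g_load, fluid_density=LOX_DENSITY):
    """Net force (buoyancy minus weight) on a submerged body under an
    acceleration of `g_load` times standard gravity."""
    return (fluid_density * volume_m3 - mass_kg) * g_load * G0


# Hypothetical bottle: these two numbers are placeholders, not SpaceX data.
HYPOTHETICAL_VOLUME_M3 = 0.1
HYPOTHETICAL_MASS_KG = 30.0

force_1g = net_buoyant_force_n(HYPOTHETICAL_VOLUME_M3, HYPOTHETICAL_MASS_KG, 1.0)
force_3_2g = net_buoyant_force_n(HYPOTHETICAL_VOLUME_M3, HYPOTHETICAL_MASS_KG, 3.2)

print(f"1 g: {force_1g:.0f} N, 3.2 g: {force_3_2g:.0f} N")
```

Whatever the real bottle dimensions are, the load on the strut at 3.2 g is 3.2 times what it is sitting on the pad, which is why the strut is loaded hardest late in the burn.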
Indeed, it appears that they had to test thousands of struts. Most passed but they found one that failed. Microscopic examination showed bad grain structure. They're made by a vendor so it would appear that vendor has a quality control problem.
The delays will be in setting up a QA process to test each individual strut part, as well as eventually testing all vendor-supplied parts. If anyone can do it, SpaceX can. No reason they can't design robot systems to test every part.
I worked for a while for a vendor that produced some critical components for ... let's say major entities. NASA, US military, Boeing, and so on. I was in electrical QA, next-to-last step before shipping. My job was to electrically scrutinize individual pieces (in some cases) or random samples from a batch (depending on how much the customer was paying) and compare their output to customer's spec.
If your stuff is mission critical, if anybody could potentially die, you really can't trust the parts from these vendors. A few of their smarter customers would repeat all of our tests and send back defective parts. The thing was, we had a lot of borderline stuff come through from bad production (poorly paid or poorly trained staff or defective tools or materials), and once some of these parts hit QA, they had a lot of expense sunk into them. The company didn't want to eat that cost, so there were a lot of arguments between myself and the general manager. I failed a lot of stuff that previous people in my position had let slide.
They also had a really stupid hockey-stick output graph each month. The beginning of the month was slow, we were all cleaning our work areas and retesting our test equipment, and then the last week of the month they'd try to produce 90% of their expected output for the month. Because of my reputation for rejecting stuff, he'd hover over my work area for the last day or two each month.
Given the size of the company I worked for, I have to assume this is not uncommon practice.
It was a heck of an experience, I finally got a better understanding for why so many things seem to break all the time.
I'm bookmarking your comment. It is just the slice-of-life that I want to show newbie engineers. Like some people say you need to spend a year or two in the service industry to learn empathy, I feel engineers likewise need to spend time in QA to learn what their ethics really are. QA is hard, and the pressure to pass is difficult to withstand; eventually you take the "my boss told me to do it" attitude or you learn to make ... Well, not enemies, but certainly rock the boat.
> I feel engineers likewise need to spend time in QA to learn what their ethics really are. QA is hard, and the pressure to pass is difficult to withstand
You really nailed it. The GM's position -- and he said this more than a few times -- was that the parts were designed with extra tolerances already, so if they were a little below spec it was OK.
Engineers have to keep that in mind when designing products: production knows there's a margin for error and they'll take that into consideration when deciding whether or not they can get away with shipping something.
(And the GM was a pretty OK guy, we got along fine otherwise. He in turn was just under a lot of pressure from further up the ladder to meet certain production goals.)
“Look at this. What do you see?” He nodded at Tony again.
“A laser weld, sir.”
“So it would appear. Your identification is quite understandable --- and quite wrong. I want you all to memorize this piece of work. Look well. Because it may easily be the most evil object you will ever encounter.”
They looked wildly impressed, but totally bewildered. He commanded their absolute silence and utmost attention.
“That,” he pointed for emphasis, his voice growing heavy with scorn, “is a falsified inspection record. Worse, it’s one of a series. A certain subcontractor... found its profit margin endangered by a high volume of its work being rejected... The welds passed the computer certification all right --- because it was the same damn good weld, replicated over and over again...”
He gathered his breath. “This is the most important thing I will ever say to you. The human mind is the ultimate testing device... There is nothing, nothing, nothing more important to me in the men and women I train than their absolute personal integrity. Whether you function as welders or inspectors, the laws of physics are implacable lie detectors. You may fool men. You will never fool the metal.”
I was really pleased that they found a strut in their current batch that would have failed in a similar way. That, for me, is key to the confidence on this root cause. If they had been unable to find one it would have remained "theoretically possible but we don't know how", now it is "if the strut is made improperly it can fail."
Dad worked at a firm that produced high-reliability capacitors (they went into the IBM System/360, etc) and they did 100% testing. IBM would do their own 100% testing upon receipt, and they would still find a few rejects each month. They investigated these of course, but never really found a cause. They marked it up to delayed yield problems from the production line.
The funny thing was Delphi was also a customer (the electronics subsidiary of General Motors). They wouldn't pay for more than normal statistical sampling. When he visited them, they had large bins of defective car radios that wouldn't turn-on, etc. To Delphi it was cheaper to run the production line flat-out and deal with the failures, rather than find problems earlier by inspection of received parts.
Great story, thaumaturgy! Does anyone know how much more rigorous Statistical Process Control [0] needs to be for aerospace than for consumer electronics (for example)? I wouldn't be surprised if 1/1000 failure is AOK for consumer hardware and catastrophic failure for aerospace (as we see with SpaceX).
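A back-of-the-envelope way to see why 1/1000 is a different animal in aerospace: if a vehicle depends on many parts that each must work, the per-part defect rate compounds. The sketch below assumes independent defects and that any one defective part is fatal; the part counts are hypothetical.

```python
def prob_at_least_one_defect(defect_rate, n_parts):
    """Probability that at least one of `n_parts` independent parts is defective."""
    return 1.0 - (1.0 - defect_rate) ** n_parts


def prob_sample_misses_defects(defect_rate, sample_size):
    """Probability that a random sample of `sample_size` parts contains
    no defective part, i.e. sampling inspection sees nothing wrong."""
    return (1.0 - defect_rate) ** sample_size


DEFECT_RATE = 1 / 1000  # the figure floated in the comment above

for n in (1, 100, 1000):  # hypothetical counts of single-point-of-failure parts
    print(n, round(prob_at_least_one_defect(DEFECT_RATE, n), 4))
```

One such part gives a 0.1% chance of a bad one, 100 parts about 9.5%, and 1000 parts about 63%. The second function makes the flip side visible: at that defect rate a sample of 100 parts comes up clean about 90% of the time, so sampling alone rarely notices.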
If you only do a few cycles under the yield point, fatigue isn't going to be an issue. If you ran 10^5 or more on a fatigue critical part, then I'd start to question it.
Those struts are also used in the stages that are about to be reusable, so my guess is that they're supposed to withstand the flight conditions multiple times. That would hint the equivalent tests should also not be destructive to the material.
The post said they're rated to 10,000lb and failed at 2000. Sounds like there's a lot of margin between the highest force expected and the rated level, so they could test somewhere between the two.
From what I understand they are designed for 10,000lb rated/certified for 6,000lb and only need to withstand 2,000lb. They found one/some struts in stock that failed at/below 2,000lb. Does anyone have some other interpretation? It seems like every place I read one of those numbers it's different.
I wonder if they'll be looking to recoup some losses from the strut manufacturer. Would be an interesting case. They can prove that the struts can fail well below certification, which should be worth something, but unless they can prove that this particular failure was due to the strut, or at least more likely than not, it would probably be difficult. Also might be unwise from a business perspective as it could make other (potential) suppliers nervous.
"And thus it was learnt that on the twentieth repetition, company-wide emails cease their function, having been seen and imprinted many times, as it was with the boy who cried wolf. But with the experience of failure comes wisdom and strength for the future of the civilization who wishes to become spacefaring." Elon 36:15
I can't help but agree with this. It's because governments have believers in their midst. It's like we're all part of the crazy club, who believe in a magical world, and so if someone else believes in the magical world too, we help them out, part of the club y'know? One of us, one of us. And the magical world clubs that are dotted around the world, we'll help them out as well. Meanwhile everyone in the real world, yeah forget them, they know nothing about our magical world.
Meanwhile everything they do or use or enjoy is based on the work of the real world...
In the US churches file taxes as not for profit organizations, this categorization is not exclusive to churches, so this implication that religions get some sort of exceptional tax break is really unfounded.
I'd also note that 'tribalism' is a fundamental human characteristic. Of course we tend to gravitate toward, socialize with, and give preference to those we perceive to be like us. Whether it's people who cheer for the same sportsing team, the same ethnic group, the same school, the same political party, it's all really isomorphic to those who share a religion. So I also find this idea that people who believe in magical worlds have some sort of advantage over those that don't, to be similarly without basis.
But perhaps I missed your point and you meant something else?
Elon at 36:15
http://nasawatch.com/archives/2015/07/spacex-releases.html