Debugging Software Deployments with Strace

birdyrooster · on Nov 25, 2019

Try out sysdig. It has many more features for domain-specific tracing using what they call “chisels”, and it can do things like inspect threads and flows attached to a container or Kubernetes resource.

Strace is still a very good tool, but also don’t forget about perf, which can monitor kernel threads, print details when a C symbol is seen in execution, and more.

gnufx · on Nov 25, 2019

Also, perf trace has lower overhead than strace. I think Brendan Gregg has numbers somewhere.

t34543 · on Nov 26, 2019

Didn’t know about perf - awesome. I love strace, it’s saved me many times.

rachelbythebay · on Nov 25, 2019

"Failed to open %s: %s", fn, strerror(errno)

Adapt to your own language. Don’t make your users hate you by having to resort to strace.

And I love me some strace. I just enjoy NOT needing it even more.
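For anyone adapting the format string above to Python, a minimal sketch might look like the following. The function name `read_config` and the choice to print to stderr and return `None` are assumptions made for illustration; the point is only that the message carries both the path and the OS error text.

```python
import sys


def read_config(path):
    """Return the file's text, or None after reporting which file failed and why."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except OSError as e:
        # Same information as the C format string: the file name and the
        # errno description (e.strerror is Python's counterpart to strerror(errno)).
        print(f"Failed to open {path}: {e.strerror}", file=sys.stderr)
        return None
```

With a message like this, a missing or unreadable file shows up in the logs with its path and reason, so nobody has to reach for strace to find out which `open` call failed.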

Shish2k · on Nov 26, 2019

Amen to that. I wonder how good the signal-to-noise ratio would be on a lint rule requiring that "error messages may not be static strings, you need at least one variable" :P

(Maybe not for strictly resource-constrained systems, but for something like Python, it just seems lazy to `raise Exception("Failed to connect")` without saying what you're trying to connect to or what the error is...)
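A rough sketch of what such a lint rule could look like in Python, using the standard `ast` module. The function name `find_static_raises` is hypothetical, and the heuristic is deliberately crude: it flags a `raise` only when the exception is constructed with nothing but plain string literals, so it says nothing about how noisy the rule would be on a real codebase.

```python
import ast


def find_static_raises(source):
    """Return line numbers of `raise X("literal")` calls whose arguments are all string literals."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Skip bare `raise` and `raise some_name` (no constructor call to inspect).
        if not isinstance(node, ast.Raise) or not isinstance(node.exc, ast.Call):
            continue
        args = node.exc.args
        # f-strings parse as ast.JoinedStr and "%" formatting as ast.BinOp,
        # so messages that interpolate a variable are not flagged.
        if args and all(
            isinstance(a, ast.Constant) and isinstance(a.value, str) for a in args
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    'raise Exception("Failed to connect")\n'
    'raise Exception(f"Failed to connect to {host}: {err}")\n'
)
print(find_static_raises(sample))  # [1]
```

Known gaps in this sketch: an f-string with no placeholders slips through, and exceptions built on one line and raised on another are not checked.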

eyberg · on Nov 25, 2019

We turned on strace-like functionality (e.g. https://github.com/nanovms/nanos/issues/844) for Nanos a long time ago to figure out what to work on for various applications.

Having said that, it's not something I advocate people do to fix production issues, especially if they didn't write the code they are looking to debug or fix.

downerending · on Nov 25, 2019

Not sure whether I understood your comment, but for me, strace has been an extremely useful tool for debugging code that I didn't write. Even in the case where source is available, it's often virtually impenetrable, and many times, seeing that a program is attempting a series of system calls that is foolish or guaranteed to fail can quickly provide insight into the problem.

pmoriarty · on Nov 25, 2019

Another very useful tool along these lines is sysdig. It can do pretty much everything in this article and more.