Try out sysdig. It has many more features for domain-specific tracing through what they call “chisels”, and it can do things like inspect threads and flows attached to a container or Kubernetes resource.
Strace is still a very good tool, but also don’t forget about perf, which can monitor kernel threads, print details when a C symbol is hit during execution, and more.
Amen to that. I wonder how good the signal-to-noise ratio would be on a lint rule to specify "error messages may not be static strings, you need at least one variable" :P
(Maybe not for strictly resource-constrained systems - but for something like Python, it just seems lazy to `raise Exception("Failed to connect")` without saying what you're trying to connect to or what the error is...)
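For what it's worth, a rough version of that lint rule is easy to sketch with Python's standard `ast` module. This is only an illustration, not an existing linter plugin: the function name `find_static_raises` and the `SAMPLE` snippet are made up here, and the rule is deliberately naive (it flags a `raise` only when every argument to the exception is a plain string literal, so f-strings and `%` formatting pass).

```python
import ast
from typing import List


def find_static_raises(source: str) -> List[int]:
    """Return line numbers of `raise X("literal")` statements
    whose message contains no variable part."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # Only look at `raise SomeError(...)`; skip bare `raise` and `raise err`.
        if not isinstance(node, ast.Raise) or not isinstance(node.exc, ast.Call):
            continue
        call = node.exc
        # Flag when there are arguments and all of them are plain string constants.
        # An f-string parses as ast.JoinedStr, so it is not flagged.
        if call.args and not call.keywords and all(
            isinstance(arg, ast.Constant) and isinstance(arg.value, str)
            for arg in call.args
        ):
            flagged.append(node.lineno)
    return sorted(flagged)


# Hypothetical code under inspection: one lazy message, one informative one.
SAMPLE = '''
def connect(host, port):
    raise Exception("Failed to connect")

def connect_verbose(host, port, err):
    raise ConnectionError(f"Failed to connect to {host}:{port}: {err}")
'''

print(find_static_raises(SAMPLE))  # → [3]
```

Only the first `raise` is reported. How noisy this gets on a real codebase is exactly the open question, since plenty of static messages are legitimate.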
Having said that, it's not something I advocate people do to fix production issues - especially if they didn't write the code they are looking to debug/fix.
Not sure whether I understood your comment, but for me, strace has been an extremely useful tool for debugging code that I didn't write. Even in the case where source is available, it's often virtually impenetrable, and many times, seeing that a program is attempting a series of system calls that is foolish or guaranteed to fail can quickly provide insight into the problem.