It's fascinating how differently languages approach the string formatting design space.
- Java's been trying to add f/t-strings, but its designers appear to be perfectionists to a fault, unable to accept anything that doesn't solve every single problem possible to imagine: [1].
- Go developers seem to have taken no more than 5 minutes considering the problem, then thoughtlessly discarded it: [2]. A position born from pure ignorance as far as I'm concerned.
- Python, on the other hand, has consistently put forth a balanced approach of discussing each new way of formatting strings for some time, deciding on a good enough implementation and going with it.
In the end, I find it hard to disagree with Python's approach. Its devs have been able to get value first from .format(), the best variant of sprintf, since 2008, then from f-strings since 2016, and now from t-strings.
> Go developers seem to have taken no more than 5 minutes considering the problem, then thoughtlessly discarded it: [2]. A position born from pure ignorance as far as I'm concerned
There are a million things in Go that could be described this way.
Looking at the various conversations involving string interpolation, this characterization is extremely unkind. They've clearly spent a lot more than 5 minutes thinking about this, including writing their own mini-proposals[1].
Are they wrong about this issue? I think they are. There is a big difference in ergonomics between string interpolation and something like fmt.Sprintf, and the performance cost of fmt.Sprintf is non-trivial as well. But I can't say they didn't put any thought into this.
As we've seen multiple times with Go generics and error handling before, their slow progress on correcting serious usability issues with the language stems from the same basic reason we see with recent Java features: they are just being quite perfectionist about it. And unlike Java, the Go team would not even release an experimental feature unless they feel quite good about it.
> There is a big difference in ergonomics between String interpolation and something like fmt.Sprintf
On the other hand, there’s a difference in localizability as well: the latter is localizable, the former isn’t. (It also worries me that I see no substantive discussion of localization in PEP 750.)
T-strings should also help localizability, you can now just retrieve them from a locale -> t-string mapping and they should Just Work. Or am I missing something?
First, and more importantly, a professional translator’s interface is a list of strings and some supplementary materials in, a list of strings out, both in communicating with clients and on the fast path of their own work (in CAT software like Across, Trados, etc.). (Protip: given baseline i18n competence like not concatenating sentences out of parts, the quality of the translation you get is largely determined by the supplementary materials; screenshot every part of your UI and your translator will be willing to kiss you.)
They do not need to see the code that fills in any placeholders, nor do they want to spend time and attention preserving it exactly as it’s been written—they just want to reorder the placeholders as the syntax of the language dictates. Arguably even %s vs %d is too much information. The ideal is {name} and {count}, and {1} and {2} are acceptable.
(By contrast, a very common issue is that like half of the sentence may need to change depending on the numeric value filled in for {count}, and two variants in English may map to anywhere between one and four after localization[1]. Template strings help with this not at all.)
Second—though this may be easier to fix—the two major ways to retrieve localized messages at runtime are integers in, strings out (most commercial tools out there) and strings in, strings out (GNU gettext), where the output strings are retrieved from some sort of data file that’s intentionally incapable of containing executable code (resource-only “MUI” DLLs, hashtables in GNU “MO”s, plain old text in Java “properties”, various kinds of XML, etc.).
Operating on pure data isn’t a strict necessity. It is largely a concession to the development process, which may not permit the string tables to go out to the localization contractors until principal QA is finished. The last thing you want is to insert engineering into the already-slow loop between the translators, the editors (or a translation agency employing both), and localization QA (hopefully in-house, using the actual live software to check the translations in context).
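To make the "strings in, strings out" model concrete, here is a minimal Python sketch. The catalog contents, the message key, and the `render` helper are all hypothetical, and the German string is only illustrative; the point is that the table is pure data with named placeholders the translator can reorder freely.

```python
# Hypothetical string table: pure data, no executable code.
# A real project would load this from a .mo, .properties, or XML file.
CATALOG = {
    "en": {"inbox_summary": "{name} has {count} new messages"},
    "de": {"inbox_summary": "{count} neue Nachrichten für {name}"},
}


def render(locale: str, key: str, **values: object) -> str:
    """Look up a message by key and fill in its named placeholders."""
    template = CATALOG[locale][key]
    # The translator decides where each named field goes in the sentence;
    # the calling code passes the same keyword arguments for every locale.
    return template.format(**values)


print(render("en", "inbox_summary", name="Ada", count=3))
print(render("de", "inbox_summary", name="Ada", count=3))
```

Note that this does nothing for the plural problem: a single English entry may still need several variants in another language.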
I just expect better from professional language designers. To me, the blindingly obvious follow up to the thought "We understand that people familiar with other languages would like to see string interpolation in Go." [1] is to research how said other languages have gone about implementing this and to present a brief summary of their findings. This is table stakes stuff.
Then there's "You can [get] a similar effect using fmt.Sprint, with custom functions for non-default formatting." [2]:
- Just the fact that "you can already do this" needs to be said should give the designers pause. Clearly you can't already do this if people are requesting a new feature. Indeed, this situation exactly mimics the story of Go's generics - after all, they do not let you do anything you couldn't do before, and yet they got added to Go. It's as if ergonomics matter, huh.
Another way to look at this: if fmt.Sprint is so good it should be used way more than fmt.Sprintf right? Should be easy to prove :)
- The argument crumbles under the load-bearing "similar effect". I already scratched the surface of why this is wrong in a sibling post: [3].
I suspect the reason for this shallow dismissal is the designers didn't go as far as to A/B test their proposal themselves, so their arguments are based on their gut feel instead of experience. That's the only way I can see someone would come up with the idea that fmt.Sprint and f-strings are similar enough. They actually are if all you do is imagine yourself writing the simplest case possible:
fmt.Sprint("This house is ", measurements(2.5), " tall")
f"This house is {measurements(2.5)} tall"
Similar enough, so long as you're willing to handwave away the need to match quotation marks and insert commas and don't spend time coding using both approaches. If you did, you'd find that writing brand new string formatting statements is much rarer than modifying existing ones. And that's where the meat of the differences is buried. Modifying f-strings is trivial, but making any changes to existing fmt.Sprint calls is painful.
P.S. Proposing syntax as noisy as:
fmt.Println("This house is \(measurements(2.5)) tall")
is just another sign the designers don't get it. The entire point is to reduce the amount of typing and visual noise.
Value types anyone? I have zero doubt it is tough to add and get right, esp. to retrofit, but it has been so many years that I have learned/discarded several new languages since Java... and they STILL aren't launched yet.
A format function that arbitrarily executes code from within a format string sounds like a complete nightmare. Log4j as an example.
The rejection's example shows how that arbitrary code within the string could instead be fixed functions outside of a string. Safer, easier for compilers and programmers; unless an 'eval' for strings is what was desired. (Offhand I've only seen eval in /scripted/ languages; go makes binaries.)
No, the format function doesn't "arbitrarily execute code."
An f/t string is syntax not runtime.
Instead of
"Hello " + subject + "!"
you write
f"Hello {subject}!"
That subject is simply a normal code expression, but one that occurs after the opening quote of the literal and before the ending quote of the literal.
And instead of
query(["SELECT * FROM account WHERE id = ", " AND active"], [id])
you write
query(t"SELECT * FROM account WHERE id = {id} AND active")
It's a way of writing string literals that if anything makes injection less likely.
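For what it's worth, here is a rough Python sketch of the "parts plus values" form of `query` shown above. The helper's signature, the table, and the use of sqlite3 are my assumptions for the sake of a runnable example, not anything from PEP 750; a t-string-based version would get the same two lists from the Template object instead of from the caller.

```python
import sqlite3


def query(conn, parts, values):
    """Join literal SQL parts with '?' placeholders and bind values separately.

    `parts` are the trusted literal fragments; `values` are the untrusted
    interpolated values. There must be exactly one more part than values.
    """
    if len(parts) != len(values) + 1:
        raise ValueError("expected exactly one more literal part than values")
    sql = "?".join(parts)
    return conn.execute(sql, values).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT, active INTEGER)")
conn.execute("INSERT INTO account VALUES ('42', 1)")

parts = ["SELECT * FROM account WHERE id = ", " AND active"]
rows_ok = query(conn, parts, ["42"])
# A hostile value stays a bound parameter; it is never spliced into the SQL.
rows_bad = query(conn, parts, ["42' OR '1'='1"])
print(rows_ok, rows_bad)
```

The injection attempt simply matches no rows, because the value never becomes part of the SQL text.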
I was talking about the rejected Golang proposal cited by the post I'm replying to, NOT Python's present PEP or any other string that might resolve magic variables (just not literally eval / exec functions!).
As far as I can tell from the linked proposal, it wouldn't have involved such evaluation either. It seems like it was intended to work fundamentally the same way as it currently does in Python: by analyzing the string literal ahead of time and translating into equivalent explicit formatting code, as syntactic sugar. There seem to have been many misunderstandings in the GitHub discussion.
The inline {variable} reference suffix format would be less confusing for situations that involve _many_ variables. Though I'm a bit more partial to this syntax with an immediately trailing %{variable} packet since my gut feeling is that special case would be cleaner in a parser.
Thanks for this example - it makes it clear it can be a mechanism for something like sqlc/typed sql (my go-to with python too, don't like orms) without a transpilation step or arguably awkward language API wrappers to the SQL. We'll need linters to prevent accidentally using `f` instead of `t` but I guess we needed that already anyways. Great to be able to see the actual cost in the DB without having to actually find the query for something like `typeddb.SelectActiveAccount(id)`. Good stuff.
The PEP says these return a new type `Template`, so you should be able to both type and/or duck type for these specifically and reject non-Template inputs.
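A minimal sketch of that kind of check, assuming Python 3.14's `string.templatelib.Template` is the type in question. The `require_template` name is made up, and the import is guarded so the snippet still runs on older interpreters, where it can only reject plain `str`.

```python
try:
    from string.templatelib import Template  # Python 3.14+ (PEP 750)
except ImportError:
    Template = None  # older interpreter: only the plain-str check applies


def require_template(query):
    """Hypothetical guard: refuse anything that is already a finished string."""
    if isinstance(query, str):
        # An f-string has already been flattened to str by the time we see it.
        raise TypeError('expected a t-string Template, got str (used f"..."?)')
    if Template is not None and not isinstance(query, Template):
        raise TypeError(f"expected Template, got {type(query).__name__}")
    return query


account_id = "42"
try:
    require_template(f"SELECT * FROM account WHERE id = {account_id}")
except TypeError as exc:
    print("rejected:", exc)
```

A linter can catch the `f`/`t` mix-up earlier, but a runtime check like this would at least fail loudly.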
In case there was confusion: Python's f-string functionality in particular is specific to string literals. The f prefix doesn't create a different data type; instead, the contents of the literal are parsed at compile time and the entire thing is rewritten into equivalent string concatenation code (although IIRC it uses dedicated bytecodes, in at least some versions).
The t-string proposal involves using new data types to abstract the concatenation and formatting process, but it's still a compile-time process - and the parts between the braces still involve code that executes first - and there's still no separate type for the overall t-string literal, and no way to end up eval'ing code from user-supplied data except by explicitly requesting to do so.
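The compile-time part is easy to see for f-strings with the standard `ast` module. This is just a small illustration (assuming Python 3.8 or later, where literal parts show up as `ast.Constant`); the variable names are made up.

```python
import ast

# Parse only; nothing is evaluated, so `subject` does not need to exist.
node = ast.parse('f"Hello {subject}!"', mode="eval").body

# The literal has already been split into constant text and an expression.
print(type(node).__name__)
kinds = [type(part).__name__ for part in node.values]
print(kinds)

# Braces inside runtime data are never re-parsed as code.
user_data = "{__import__('os').getcwd()}"
result = f"Hello {user_data}!"
print(result)
```

Only the braces written in the source literal are treated as expressions; the braces inside `user_data` come out as plain text.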
Python source code is translated into bytecode for a VM just like in Java or C#, and by default it's cached in .pyc files. It's only different in that you can ask to execute a source code file and the compilation happens automatically before the bytecode-interpretation.
`SyntaxError` is fundamentally different from other exceptions because it can occur during compilation, and only occurs at run-time if explicitly raised (or via explicit invocation of another code compilation, such as with `exec`/`eval`, or importing a module). This is also why you can't catch a `SyntaxError` caused by the invalid syntax of your own code, but only from such an explicit `raise` or a request to compile a source code string (see https://stackoverflow.com/questions/1856408 ).
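Here is a small sketch of the "explicit request to compile a source code string" case, using the built-in `compile`. The `try_compile` helper is hypothetical; the exact error message differs between Python versions, so the snippet does not rely on it.

```python
def try_compile(source: str) -> str:
    """Compile a source string and report whether the syntax was accepted."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError as exc:
        # Catchable here because compilation was requested explicitly at run time.
        return f"SyntaxError: {exc.msg}"
    return "ok"


# Valid f-string: compiles fine even though `subject` is never defined,
# because compiling does not evaluate the expression.
print(try_compile('f"Hello {subject}!"'))

# Broken expression inside the braces: rejected at compile time.
print(try_compile('f"Hello {subject +}!"'))
```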
My reply was to the parent post's SPECIFIC example of Golang's rejected feature request. Please go read that proposal.
It is NOT about the possibility of referencing existing / future (lazy / deferred evaluation) string literals from within the string, but about a format string that would literally evaluate arbitrary functions within a string.
The proposal doesn't say anything about executing code in user-supplied strings. It only talks about a string literal that is processed by the compiler (at which point no user-supplied string can be available).
On the other hand, the current solution offered by Go (fmt.Sprintf) is the one that supports a user-supplied format string. Admittedly, there is a limited amount of damage that could be done this way, but you can at the very least cause a program to panic.
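As an analogy only (this is Python, not a demonstration of Go's behaviour), runtime format strings have the same shape of problem: once the format itself is data, a hostile or sloppy format can make an otherwise fine call fail. The `render_unsafe` helper is hypothetical.

```python
def render_unsafe(user_format: str, value: object) -> str:
    # The format string itself is attacker-controlled data here.
    return user_format % (value,)


# A well-behaved format works as expected.
print(render_unsafe("%s items", "three"))

outcomes = []
for hostile in ("%d items", "%s and %s"):
    try:
        render_unsafe(hostile, "three")
        outcomes.append("ok")
    except TypeError as exc:
        # Wrong conversion type, or more conversions than arguments.
        outcomes.append(type(exc).__name__)
print(outcomes)
```

With an interpolated literal the shape of the format is fixed in the source, so this class of failure cannot be triggered by input.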
The reason for declining this feature[1] has nothing to do with what you stated. Ian Lance Taylor simply said: "This doesn't seem to have a big advantage over calling fmt.Sprintf" and "You can [get] a similar effect using fmt.Sprint". He conceded that there are performance advantages to string interpolation, but he doesn't believe there are any gains in usability over fmt.Sprintf/fmt.Sprint and as is usual with Go (compared to other languages), they're loath to add new features to the compiler[2].
No, it's exactly the opposite: f-strings are, roughly, eval (that is, unsanitary string concatenation that is presumptively an error in any nontrivial use), whereas t-strings are just an alternative expression syntax and do not even dereference their arguments.
Shooting any unsanitized input into your application is bad. Template strings don't make this worse. any_func(attacker_provided) is even worse than any_func(t"{attacker_provided}") since in the latter case you actually have reduced the attack surface to just strings.
t-strings are lazy, which is the point (escaping HTML, translating strings when you get preferred language headers, preparing SQL statements...).
Do Ruby strings already allow lazy processing?
I'm not talking about wrapping them in a block and passing the block (all languages can do that with lambdas) but about having a literal that eventually resolves to something when you use it.
That seems like the wrong pattern, maybe I'm missing something.
Ruby has lazy evaluation with a generic lazy enumeration facility, whether to produce string or any kind of object.
That is, I don't know what the actual behavior of the default string interpolation in Ruby is, but if profiling a codebase showed that some string generation would benefit from lazy evaluation, there is a path to do so. But in the general case, does it really matter? Chances are good that a string construction is not a big bottleneck.
Does Python lack such a generic lazy enumeration feature, or is it so painful to use that some syntactic sugar felt like a must-have? Genuine question here, I don't have any strong opinion on this t-string feature.
- string construction is a hot path, you don't want them to always be lazy, especially since any access is slow in python.
- having it using a string syntax is just very clean and easy to read. It's explicit and can be supported by good editor highlighting.
- it's easy to grep, analyse for, substitute, etc.
- you get one single unified API instead of thousands of variations. Translations API, log API and escaping API can all look the same, arguments are in the same shapes.
I understand that string generation can be a hot path, though I wouldn't take it as a given in general.
From what I understand here, the benefit in terms of performance is mainly due to partial application automatically handled by the interpreter. It's hard for me to gauge the actual pros and cons compared to Ruby, which can also leverage frozen strings, lambdas, and miscellaneous lazy evaluation facilities, for example. I'm not aware of anything close in PHP, to stay in the realm of popular interpreted languages. I haven't written any Lua for a long time, so no idea how it evolved on that matter.
D had a big blow up over string interpolation. Walter wanted something simple and the community wanted something more like these template ones from Python (at least from scanning the first little bit of the PEP). Walter eventually went with what the community wanted.
This led to the OpenD language fork (https://opendlang.org/index.html) which is led by some contributors who had other more general gripes with D. The fork is trying to merge in useful stuff from main D, while advancing the language. They have a Discord which unfortunately is the main source of info.
I promise, no trolling from me in this comment. I never understood the advantage of Python f-strings over printf-style format strings. I tried to Google for pros and cons and didn't find anything very satisfying. Can someone provide a brief list of pros and cons? To be clear, I can always do what I need to do with both, but I don't know f-strings nearly as well as printf-style, because of my experience with C programming.
Sure, here are the two Go/C-style formatting options:
fmt.Sprintf("This house is %s tall", measurements(2.5))
fmt.Sprint("This house is ", measurements(2.5), " tall")
And the Python f-string equivalent:
f"This house is {measurements(2.5)} tall"
The Sprintf version sucks because for every formatting argument, like "%s", we need to stop reading the string and look for the corresponding argument to the function. Not so bad for one argument but gets linearly worse.
Sprint is better in that regard, we can read from left to right without interruptions, but it is a pain to write due to all the punctuation, never mind refactor. For example, try adding a new variable between "This" and "house". With the f-string you just type {var} before "house" and you're done. With Sprint, you're now juggling quotation marks and commas. And that's just a simple addition of a new variable. Moving variables or substrings around is even worse.
Summing up, f-strings are substantially more ergonomic to use and since string formatting is so commonly done, this adds up quickly.
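If it helps to see all three styles in one language, here is a rough Python rendering of the same comparison. The `measurements` helper and the `color` variable are made up for illustration, and `"".join` stands in for Go's fmt.Sprint.

```python
def measurements(meters: float) -> str:
    """Hypothetical helper standing in for the one in the examples above."""
    return f"{meters} m"


# The same sentence, three ways.
printf_style = "This house is %s tall" % measurements(2.5)
concat_style = "".join(["This house is ", measurements(2.5), " tall"])
fstring_style = f"This house is {measurements(2.5)} tall"

# Now add a value between "This" and "house" in each style.
color = "red"
printf_edit = "This %s house is %s tall" % (color, measurements(2.5))
concat_edit = "".join(["This ", color, " house is ", measurements(2.5), " tall"])
fstring_edit = f"This {color} house is {measurements(2.5)} tall"

print(fstring_style)
print(fstring_edit)
```

The printf edit touches two places (the format and the argument list), the concatenation edit splits a literal and adds punctuation, and the f-string edit is a single insertion at the point where the value appears.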
You don't with f-strings because they're substituted eagerly. You could with the new t-strings proposed here because you can get at the individual parts.
Superficially f-strings remind you of PHP and everyone remembers how awful that was. But Python's implementation is leagues better and we also have better tooling (i.e. smart parsers) for handling f-strings.
[1]: https://news.ycombinator.com/item?id=40737095
[2]: https://github.com/golang/go/issues/34174#issuecomment-14509...