Hacker News

Is it just me for thinking that using format strings instead of some strongly typed interface with verbose names for this is not great?

I would take `format(year(), '/', month(), '/', day())` over ad-hoc format strings by various APIs.
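For illustration only, here is roughly what that style could look like in Java. The `Part` interface and the `year()`, `month()`, `day()` and `lit()` helpers are hypothetical, not an existing library API. Java won't turn a bare `'/'` into a `Part`, so literals go through `lit()`.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.function.Function;

public class TypedFormat {
    // Hypothetical: one piece of a formatted date, rendered from a LocalDate.
    interface Part extends Function<LocalDate, String> {}

    // Calendar year, e.g. 2022 (never the week-based year).
    static Part year() { return d -> String.valueOf(d.getYear()); }

    // Month of year, 1-12, zero-padded to two digits.
    static Part month() { return d -> String.format(Locale.ROOT, "%02d", d.getMonthValue()); }

    // Day of month, 1-31, zero-padded to two digits.
    static Part day() { return d -> String.format(Locale.ROOT, "%02d", d.getDayOfMonth()); }

    // Fixed text between fields.
    static Part lit(String text) { return d -> text; }

    // Concatenate the rendered parts in order.
    static String format(LocalDate date, Part... parts) {
        StringBuilder sb = new StringBuilder();
        for (Part p : parts) {
            sb.append(p.apply(date));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2022, 1, 4);
        System.out.println(format(d, year(), lit("/"), month(), lit("/"), day())); // 2022/01/04
    }
}
```

Every field is a named method, so there is no letter casing to get wrong.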

Reading the docs further this also stands out:

> For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits, as defined by Character.isDigit(char), will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that isn't all digits (for example, "-1"), is interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.
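A small Java sketch of that rule, assuming you pin the century window with `SimpleDateFormat.set2DigitYearStart` (otherwise the window depends on when the instance is created). Starting the window on Jan 1, 1917 reproduces the docs' 1997 example. The helper names are made up for the example:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class TwoDigitYearDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // "MM/dd/yy" parser whose 100-year window starts on Jan 1, 1917,
    // i.e. 80 years before the Jan 1, 1997 creation date in the quoted docs.
    static SimpleDateFormat windowedParser() {
        SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yy", Locale.US);
        fmt.setTimeZone(UTC);
        Calendar start = new GregorianCalendar(UTC, Locale.US);
        start.clear();
        start.set(1917, Calendar.JANUARY, 1);
        fmt.set2DigitYearStart(start.getTime());
        return fmt;
    }

    // Parse the text and return the resulting calendar year.
    static int parsedYear(String text) {
        try {
            Calendar cal = new GregorianCalendar(UTC, Locale.US);
            cal.setTime(windowedParser().parse(text));
            return cal.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException(text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsedYear("01/11/12"));  // 2012
        System.out.println(parsedYear("05/04/64"));  // 1964
        System.out.println(parsedYear("01/02/003")); // 3 (three digits: taken literally)
    }
}
```

Pinning the start date at least makes parsing deterministic instead of silently drifting as the years go by.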

I hope nobody uses this to parse historical data that has recently crossed the 80-year mark.



> Is it just me for thinking that using format strings instead of some strongly typed interface with verbose names for this is not great?

Yes, but it would not have helped with the bug in question. The parallel bug for a strongly typed interface would be "year()" returning this monstrosity, while "iso_year()" or some other poorly named variant returning the expected year.

No API is immune to footguns and bad design decisions.


I forgot how bad JS's date API was until I tried getting the year out of a date.

    var date = new Date()
    date.getYear() // 122
I plugged this into another object and couldn't understand why it was setting the date to 0122. Like, what significance does 122 years from 1900 even have?

Turns out I had to use getFullYear, which is of course perfectly intuitive.


I used the JS Date API for the first time a few months ago, and wrote something like

    date.getYear() + "-" + date.getMonth() + "-" + date.getDay()
in the hope of getting something like "2022-1-4".

The actual result for that date is "122-0-2" - not a single component of the date was what I expected.


date.toLocaleDateString("fr-CA") would have got you there, but yeah, agreed, 0 indexed months are whack.

I'm sure you've read the docs since then but just for the viewers at home:

s/getYear/getFullYear/

s/getDay/getDate/

getYear is old-fashioned: it returns years since 1900. getDay is the day of the week: 0 for Sunday, 1 for Monday, etc.
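Java's `java.util.Date` has the same long-deprecated getters, so the "122-0-2" surprise can be reproduced there too. A sketch using Jan 4, 2022, which was a Tuesday (the helper names are made up):

```java
import java.util.Date;

public class LegacyDateFields {
    // Jan 4, 2022 via the deprecated constructor: (year - 1900, 0-based month, day of month).
    @SuppressWarnings("deprecation")
    static Date sample() {
        return new Date(2022 - 1900, 0, 4);
    }

    // Mirrors the naive JS snippet: every component is "wrong".
    @SuppressWarnings("deprecation")
    static String naive(Date d) {
        // getYear: years since 1900; getMonth: 0-11; getDay: day of week (0 = Sunday)
        return d.getYear() + "-" + d.getMonth() + "-" + d.getDay();
    }

    // Corrected: add 1900, add 1, and use getDate for the day of the month.
    @SuppressWarnings("deprecation")
    static String fixed(Date d) {
        return (d.getYear() + 1900) + "-" + (d.getMonth() + 1) + "-" + d.getDate();
    }

    public static void main(String[] args) {
        Date d = sample();
        System.out.println(naive(d)); // 122-0-2
        System.out.println(fixed(d)); // 2022-1-4
    }
}
```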


> 0 indexed months are whack

Why, if everything else in JS is 0 indexed? Isn't consistency a good thing in a language? If they'd gone the other way, wouldn't there be people here complaining that 1 indexing is whack?


Well because days are returned 1-indexed and calendars in general are 1-indexed, but of course the internal 'getMonth' method isn't meant to be a calendar, it's meant to provide an array index to ["January", "February", ...] as an interface to the other methods of printing a human-readable form.
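A sketch of that idea using Java's legacy `Calendar`, whose `MONTH` field is also 0-based and indexes straight into `DateFormatSymbols.getMonths()`; `java.time` later switched to 1-based month numbers. The helper names are made up for the example:

```java
import java.text.DateFormatSymbols;
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class MonthIndex {
    // Legacy Calendar.MONTH is 0-based, so it is a direct index into the month-name array.
    static String legacyMonthName(int year, int zeroBasedMonth, int day) {
        Calendar cal = new GregorianCalendar(year, zeroBasedMonth, day);
        String[] names = new DateFormatSymbols(Locale.US).getMonths(); // "January", ..., "December", ""
        return names[cal.get(Calendar.MONTH)];
    }

    // java.time uses 1-based month numbers (and a Month enum for names).
    static int modernMonthNumber(LocalDate d) {
        return d.getMonthValue();
    }

    public static void main(String[] args) {
        System.out.println(legacyMonthName(2022, Calendar.JANUARY, 4));   // January
        System.out.println(modernMonthNumber(LocalDate.of(2022, 1, 4))); // 1
    }
}
```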



Well yes, as it's an extension of code written in the 1970s, when dates were most commonly referred to with 2-digit years.

Even when JavaScript was written (and likewise with things like Perl and PHP, and of course the underlying function dated back to the 70s), it was still quite common for people to write dates by hand as 6/6/96 for June 6th 1996, hence people would often write something like

    print day() + "/" + mon() + "/" + year()

If they wanted a 4-digit year, many people wrote

    print day() + "/" + mon() + "/19" + year()

After 2000 came round, many sites would claim the year was 19100; I still saw such sites well into the late 00s.

However, some people wrote

    print day() + "/" + mon() + "/" + (1900 + year())

Which would give the correct display.
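The difference is easy to see if you assume, as with `getYear`, that `year()` returns years since 1900. A Java sketch with hypothetical helper names:

```java
public class Y2kDisplay {
    // Buggy: hard-codes the century and appends years-since-1900 as text.
    static String withHardCodedCentury(int yearsSince1900) {
        return "19" + yearsSince1900;
    }

    // Correct: adds the 1900 offset arithmetically.
    static String withOffset(int yearsSince1900) {
        return String.valueOf(1900 + yearsSince1900);
    }

    public static void main(String[] args) {
        for (int y : new int[] {99, 100, 122}) {
            System.out.println(withHardCodedCentury(y) + " vs " + withOffset(y));
        }
        // 1999 vs 1999
        // 19100 vs 2000
        // 19122 vs 2022
    }
}
```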

Changing year() from returning 2 digits (which in the 80s and even the 90s was often preferable) to 4 digits would be a breaking change. Changing it to return "the last two digits" rather than "the number of years since 1900" would also be a breaking change. That's not something people like to do, hence things like "getFullYear" to return a 4-digit year (or 3 digits for years before 1000, etc.).


To be fair getYear has been deprecated for over 20 years (a bit after it was deprecated in Java for the same reason). The Y2K problem was a real thing in lots of places.


The JS Date API was cribbed wholesale from Java's java.util.Date, which was cribbed wholesale from POSIX (and thus Unix) date APIs, which were designed in the 1970s. But both cribbings preferred compatibility with the older API over fixing obvious warts like "year is counted from 1900, not from 1 BC" or "month is 0-based instead of 1-based."


It made more sense before 2000 of course. And they can’t really change it without breaking many things.


Back then it was quite common to only care about two-digit years, and by implication only years from the 20th century.

This kind of feature[1] led to many potential Y2K bugs.

[1] Which it shares with Perl and probably many other languages.


"Back then" for Javascript was 1995. The phrase "millennium bug" was already well established in the public mind by then.


date.getYear()

  99:  1999
  100: 2000
  …
  122: 2022


It's harder to make that bug. The common case is "year of era", so it is likely to be used for "year()".

On the other hand, the much less often used "year of week" would be named "year_of_week()", so it is clear to everyone that it's not likely what you want.


> It's harder to make that bug. The common case is "year of era", so it is likely to be used for "year()".

That would be the sane thing to do. But the same applies to `YYYY`: it should be used for "year of era". But it hasn't been and that's the problem here. For contrast in moment.js, YYYY and yyyy do what you expect and "week year" is GGGG or gggg.


> That would be the sane thing to do. But the same applies to `YYYY`: it should be used for "year of era".

Why? `yyyy` is simpler to type, so makes a lot more sense for "year of era".

> For contrast in moment.js, YYYY and yyyy do what you expect and "week year" is GGGG or gggg.

In LDML (which I assume is what SimpleDateFormat uses), the G field is already spoken for: it's the era name (BC/BCE and AD/CE).
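For reference, this is how those letters behave in Java's newer `DateTimeFormatter`, whose pattern letters are close to (but not identical to) SimpleDateFormat's: `G` is the era, `y` is year-of-era, and `u` is the proleptic year, which can be zero or negative. The `fmt` helper is just for the example. Proleptic year -3 comes out as 4 BC, matching the SimpleDateFormat docs quoted at the top.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class EraDemo {
    // 'G' = era, 'y' = year-of-era, 'u' = proleptic year (can be zero or negative).
    static String fmt(LocalDate d, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(d);
    }

    public static void main(String[] args) {
        LocalDate modern = LocalDate.of(2022, 1, 4);
        LocalDate ancient = LocalDate.of(-3, 1, 2); // proleptic year -3 is 4 BC
        System.out.println(fmt(modern, "G yyyy"));  // AD 2022
        System.out.println(fmt(ancient, "yyyy G")); // 0004 BC
        System.out.println(fmt(ancient, "uuuu"));   // -0003
    }
}
```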


> The parallel bug for a strongly typed interface would be "year()" returning this monstrosity, while "iso_year()" or some other poorly named variant returning the expected year.

No, the parallel for this would be called `week_year()`, or `iso_week_year()`, to be paired with an `iso_week()` (or `year_week()` since there really is no other formal week-year).


The point is there's nothing about YYYY vs yyyy telling you at a glance that there is a significant semantic difference in the result.

Imagine an API which had methods 'getYear()' and 'GET_YEAR()' where the latter returns the payroll week year, while the former returns the, you know, actual year.

That's what having YYYY produce the week year feels like - a blatant violation of the principle of least surprise.
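To make the surprise concrete, here is a Java sketch with `DateTimeFormatter` under US week rules (weeks start on Sunday, and week 1 is the week containing Jan 1). The date is arbitrary, chosen to sit in the last days of December:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class WeekYearDemo {
    static String fmt(LocalDate d, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(d);
    }

    public static void main(String[] args) {
        // Monday, Dec 30, 2019 falls in the week of Sun Dec 29 - Sat Jan 4,
        // which is week 1 of 2020 under US rules.
        LocalDate d = LocalDate.of(2019, 12, 30);
        System.out.println(fmt(d, "yyyy-MM-dd")); // 2019-12-30 (year of era)
        System.out.println(fmt(d, "YYYY-MM-dd")); // 2020-12-30 (week-based year)
    }
}
```

The two patterns agree for most of the year, which is exactly why the bug only shows up in late December or early January.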

Which is why your OP was saying this is like having an API where year() returns a surprising number that isn't the year.

You know, like JavaScript does.


> The point is there's nothing about YYYY vs yyyy telling you at a glance that there is a significant semantic difference in the result.

There's the fact that the entire LDML datetime grammar works like that.

Literally the next letter in your pattern will make it clear: M is the month, and m is the minute.

Which, granted, makes "y" designating the calendar year less than ideal, as you'd want to pair Y with M rather than y with M. Even more so as the 24-hour hour is H, so your standard ISO-8601 pattern is yyyy-MM-dd'T'HH:mm:ssXXX, which is... a bit of a mess casing-wise.
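On the ISO-8601 pattern: in Java's `DateTimeFormatter`, a literal `T` has to be quoted, and `XXX` is the letter group that prints an ISO-style offset (`Z` for UTC). A sketch comparing such a pattern with the built-in `ISO_OFFSET_DATE_TIME` constant; the `rejects` helper is made up for the example:

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class IsoPatternDemo {
    // Literal text such as the 'T' separator must be quoted; XXX prints "Z" or e.g. "+01:00".
    static final DateTimeFormatter ISO_LIKE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX", Locale.ROOT);

    // True if the pattern is rejected (an unquoted T is not a valid pattern letter).
    static boolean rejects(String pattern) {
        try {
            DateTimeFormatter.ofPattern(pattern);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        OffsetDateTime t = OffsetDateTime.of(2022, 1, 4, 14, 34, 56, 0, ZoneOffset.UTC);
        System.out.println(ISO_LIKE.format(t));                               // 2022-01-04T14:34:56Z
        System.out.println(DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(t)); // 2022-01-04T14:34:56Z
        System.out.println(rejects("yyyy-MM-ddTHH:mm:ssZ"));                  // true
    }
}
```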

> Imagine an API which had methods 'getYear()' and 'GET_YEAR()' where the latter returns the payroll week year, while the former returns the, you know, actual year.

A big difference is the LDML pattern-space is rather more limited, and casing is definitely an important component of it.

> Which is why your OP was saying this is like having an API where year() returns a surprising number that isn't the year.

> You know, like JavaScript does.

If you want to rag on JS's datetime API, which really is Java's: first, get in line; second, getYear is hardly the worst offender (that honor belongs to the paired glue-eaters that are getDay and getDate).


All true.

But there's a little bit of a difference between mm/MM being minutes/months, where at least as someone thinking about datetime formatting you likely have a concept that the concepts of minutes and months are plausible things 'm' might stand for, vs. yyyy/YYYY where unless you've come across it before, the idea that there might be an inbuilt concept of a 'week year' is likely not going to be something that occurs to you.

I suppose the fact that both 'yyyy/mm/dd' and 'YYYY/MM/DD' produce values that are obviously wrong most of the time should be enough of an incentive for developers to go and look up the codes. But that said, 'YYYY/MM/DD' produces what looks like the right answer for most of January...
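Both failure modes are easy to reproduce with Java's `DateTimeFormatter` (a sketch; the times are arbitrary). `mm` is minute-of-hour and `DD` is day-of-year, which happens to equal the day of the month in January:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class CaseMixupDemo {
    static String fmt(LocalDateTime t, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(t);
    }

    public static void main(String[] args) {
        LocalDateTime jan = LocalDateTime.of(2022, 1, 15, 9, 7);
        LocalDateTime feb = LocalDateTime.of(2022, 2, 15, 9, 7);
        System.out.println(fmt(jan, "yyyy/MM/dd")); // 2022/01/15 (intended)
        System.out.println(fmt(jan, "yyyy/mm/dd")); // 2022/07/15 (mm = minute of hour)
        System.out.println(fmt(jan, "YYYY/MM/DD")); // 2022/01/15 (looks right in January)
        System.out.println(fmt(feb, "YYYY/MM/DD")); // 2022/02/46 (day of year gives it away)
    }
}
```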


Sure, the issue is somewhat orthogonal. But I assume they wanted to keep the format string parsing minimal, hence the single letter format specifiers. Once you have a strongly typed API, you are no longer bound by this and you can have sensible names.


Format strings are intended to be configurable (possibly per user-interface language) and not necessarily hardcoded. A strongly-typed builder API might be useful, but if you need both programmatic specification and external configuration, format strings can fulfill both purposes, whereas only having a builder API doesn’t.
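A sketch of the configurable case in Java. The `PATTERNS` map is hypothetical and stands in for whatever resource bundle or settings file an application would actually load its per-language patterns from; the language tags and patterns are only examples:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;

public class ConfiguredFormats {
    // Hypothetical configuration keyed by UI language tag.
    static final Map<String, String> PATTERNS = Map.of(
            "en-US", "MM/dd/yyyy",
            "de-DE", "dd.MM.yyyy");

    // Look up the pattern for the language, falling back to an ISO-like default.
    static String formatFor(String languageTag, LocalDate date) {
        String pattern = PATTERNS.getOrDefault(languageTag, "yyyy-MM-dd");
        return DateTimeFormatter.ofPattern(pattern, Locale.forLanguageTag(languageTag)).format(date);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2022, 1, 4);
        System.out.println(formatFor("en-US", d)); // 01/04/2022
        System.out.println(formatFor("de-DE", d)); // 04.01.2022
        System.out.println(formatFor("fr-CA", d)); // 2022-01-04 (fallback)
    }
}
```

A builder-only API couldn't be driven from such a file without inventing a serialization format for it, which would end up being a format string anyway.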


The best part is your format language can be extensible to support things like inline JNDI lookups...


That's perhaps true, but there is no reason why we can't have both. There are many hard-coded format strings in practice.


Right, my intent was to explain that format-string APIs are there for good reasons and not an arbitrary choice.


`{year}-{month}-{date}` then.


More like `{numeric-year-no-leading-zeros}-{zero-padded-numeric-month}-{zero-padded-numeric-day-of-month}`. It’s virtually impossible to make it both succinct and unambiguous.


100% agreed with you. I do not understand why stringly typed date systems are still so prevalent.

When I write code that needs to output different formats, I always create well-named functions (perhaps verbose, but I don't care, as they're readable) such as this:

    import { format } from 'date-fns';

    /**
     * Formats a Date as "twelve hour time", for example 3:23. This is the "h:mm" format from date-fns.
     * @param {Date} date The date.
     * @returns {string} The formatted string.
     */
    export const formatAsTwelveHourTime = (date: Date): string => format(date, 'h:mm');
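The same wrapper idea in Java, for comparison. The class and constant names are made up; `h:mm` means the same thing in `DateTimeFormatter` as in date-fns here (hour 1-12, two-digit minute):

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public final class TimeFormats {
    private static final DateTimeFormatter TWELVE_HOUR_TIME =
            DateTimeFormatter.ofPattern("h:mm", Locale.ROOT);

    /**
     * Formats a time as "twelve hour time", e.g. 3:23 (no AM/PM marker).
     * Call sites never see the pattern letters.
     */
    public static String formatAsTwelveHourTime(LocalTime time) {
        return TWELVE_HOUR_TIME.format(time);
    }

    public static void main(String[] args) {
        System.out.println(formatAsTwelveHourTime(LocalTime.of(15, 23))); // 3:23
    }
}
```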


I agree. I really like how the `time` crate[0] in the Rust world handles this[1].

with your example:

> format_description!("[year]/[month]/[day]")

0: https://crates.io/crates/time

1: https://time-rs.github.io/book/api/format-description.html


This is easier to read but not descriptive enough, since there are different kinds of year, month, and day. A year can be 2 digits, 4 digits, a regular year, or a week year. A month can have a leading zero or not, or be a long or short name. A day could be Julian, have a leading zero or not, be the day of the week, etc.
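In Java's `DateTimeFormatter`, most of these distinctions are encoded in the letter and how many times it is repeated. A sketch printing a few variants for Jan 4, 2022 (a Tuesday) in `Locale.US`; the `fmt` helper is made up:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class PatternVariants {
    static String fmt(String pattern) {
        LocalDate d = LocalDate.of(2022, 1, 4); // a Tuesday
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(d);
    }

    public static void main(String[] args) {
        String[] patterns = {"yy", "yyyy", "YYYY", "M", "MM", "MMM", "MMMM", "d", "dd", "D", "EEE"};
        for (String p : patterns) {
            System.out.println(p + " -> " + fmt(p));
        }
        // yy -> 22, yyyy -> 2022, YYYY -> 2022, M -> 1, MM -> 01, MMM -> Jan,
        // MMMM -> January, d -> 4, dd -> 04, D -> 4, EEE -> Tue
    }
}
```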


That's when you can be more precise if you want: `[year padding:zero repr:full base:calendar sign:automatic]`. The format is also checked at compile time.


How long until that becomes Turing-complete ;)


+1 for strongly typed and compile-time checked, but I'm still not a fan of a sub-language within a string literal. The language already has syntax to structure things, why make a separate language within strings?


In the context of the original article addressing Java 8, in the java.time.format package, there's a DateTimeFormatterBuilder that removes the sub-language element of the issue. With those builder methods, you can construct the fields in the order you want, with whatever precision, with padding, etc.
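A sketch of that builder approach, assuming Java 8+ `java.time`. The calendar year (`ChronoField.YEAR`) and the ISO week-based year (`IsoFields.WEEK_BASED_YEAR`) are separate, named fields, so you can't get one by mis-casing a letter. The constant names are made up:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.IsoFields;
import java.util.Locale;

public class BuilderDemo {
    // Calendar year / month / day, each field named explicitly and zero-padded.
    static final DateTimeFormatter CALENDAR_DATE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .appendLiteral('/')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral('/')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .toFormatter(Locale.ROOT);

    // The ISO week-based year has to be asked for by name.
    static final DateTimeFormatter ISO_WEEK = new DateTimeFormatterBuilder()
            .appendValue(IsoFields.WEEK_BASED_YEAR, 4)
            .appendLiteral("-W")
            .appendValue(IsoFields.WEEK_OF_WEEK_BASED_YEAR, 2)
            .toFormatter(Locale.ROOT);

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2019, 12, 30);
        System.out.println(CALENDAR_DATE.format(d)); // 2019/12/30
        System.out.println(ISO_WEEK.format(d));      // 2020-W01
    }
}
```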


> The language already has syntax to structure things

I don’t think the language has anything suitable for these purposes. What did you have in mind?


I don't know about Rust. Several people responded that Java already has a builder interface for creating a format string. That is possibly more type erased than the Rust macro equivalent. In C++ I would use variadic templates, as I originally proposed in my root comment: `format(year(), '/', ...)`.


The format_description! macro uses the same syntax as the format_description::parse method. If you want to support user-provided formatting strings (which you do), then you need such a method already, and once you’ve got that, why do the macro differently?

That components have parameters makes the non-literal approach even less compelling. Take this, which can produce the likes of “2:34:56pm”:

  format_description!("[hour padding:none repr:12]:[minute]:[second][period case:lower]")
For reference, that is equivalent to this:

  use time::format_description::{FormatItem, component::Component, modifier::{Hour, Minute, Second, Period, Padding}};

  [
      FormatItem::Component(Component::Hour(Hour { padding: Padding::None, is_12_hour_clock: true })),
      FormatItem::Literal(b":"),
      FormatItem::Component(Component::Minute(Minute { padding: Padding::Zero })),
      FormatItem::Literal(b":"),
      FormatItem::Component(Component::Second(Second { padding: Padding::Zero })),
      FormatItem::Component(Component::Period(Period { is_uppercase: false, case_sensitive: true })),
  ]
You could easily provide a prettier DSL so that you could write something like this:

  use time::format_description::shorthand::{HOUR, MINUTE, SECOND, PERIOD, literal};
  [
      HOUR.padding_none().repr_12(),
      literal(":"),
      MINUTE,
      ":".into(),  // even this if you wanted
      SECOND,
      PERIOD.case_lower(),
  ]
This wouldn’t be awful in the absence of the parse method, but really, once you have that, the format_description macro is just what you want: compact, checked at compile time, and matching a runtime equivalent which can take user-provided format strings.

(Now, there are two or three changes I’d prefer to make to format_description’s syntax: I’d use = instead of :, since the two look generally similar but : occurs far more often in literal parts, so the distinct = would make it scan better; I think that escaping opening square brackets by doubling them, without requiring doubling for closing square brackets, was a particularly bad idea; and I’m mildly inclined to prefer {} to []. So I might end up with "{hour padding=none repr=12}:{minute}:{second}{period case=lower}".)
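For comparison, a rough Java equivalent of that format description, built with `DateTimeFormatterBuilder`. The lower-case am/pm comes from a custom text map, since the locale's default AM/PM text isn't guaranteed to be lower-case:

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;
import java.util.Map;

public class TwelveHourBuilder {
    // Roughly "[hour padding:none repr:12]:[minute]:[second][period case:lower]".
    static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.CLOCK_HOUR_OF_AMPM)                     // 1-12, no padding
            .appendLiteral(':')
            .appendValue(ChronoField.MINUTE_OF_HOUR, 2)                      // zero-padded
            .appendLiteral(':')
            .appendValue(ChronoField.SECOND_OF_MINUTE, 2)                    // zero-padded
            .appendText(ChronoField.AMPM_OF_DAY, Map.of(0L, "am", 1L, "pm")) // lower-case period
            .toFormatter(Locale.ROOT);

    public static void main(String[] args) {
        System.out.println(FMT.format(LocalTime.of(14, 34, 56))); // 2:34:56pm
    }
}
```

It is type-checked, but noticeably longer than the one-line format string.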


Even as a proponent of strong typing, I fail to see how a type system could possibly have helped in this case. Could you show an example of how you envision types being used here?

Verbose naming, on the other hand, would probably have made the situation clearer.


Have distinct types for each calendar (Gregorian, Julian, ISO week, Islamic, Jewish, Chinese, etc.) so it becomes obvious when you are mixing calendars.
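`java.time` already does something like this for some calendars: `ThaiBuddhistDate` and `MinguoDate` are distinct types from `LocalDate`, so mixing them up is a compile error rather than a wrong number. (The ISO week-based year, by contrast, is a field, `IsoFields.WEEK_BASED_YEAR`, on `LocalDate`, not a separate type.) A sketch with made-up helper names:

```java
import java.time.LocalDate;
import java.time.chrono.MinguoDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

public class CalendarTypes {
    // Year of the same day in the Thai Buddhist calendar (ISO year + 543).
    static int thaiYear(LocalDate iso) {
        ThaiBuddhistDate thai = ThaiBuddhistDate.from(iso);
        return thai.get(ChronoField.YEAR);
    }

    // Year of the same day in the Minguo (Republic of China) calendar (ISO year - 1911).
    static int minguoYear(LocalDate iso) {
        MinguoDate minguo = MinguoDate.from(iso);
        return minguo.get(ChronoField.YEAR);
    }

    public static void main(String[] args) {
        LocalDate iso = LocalDate.of(2022, 1, 4);
        System.out.println(iso.getYear());   // 2022
        System.out.println(thaiYear(iso));   // 2565
        System.out.println(minguoYear(iso)); // 111
        // LocalDate wrong = ThaiBuddhistDate.from(iso); // does not compile: different types
    }
}
```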



