In almost every database-backed application I've ever built someone, at some poi...

lisper · on April 5, 2021

> My current preferred strategy for dealing with this (at least for any table smaller than a few GBs) is to dump the entire table contents to a git repository on a schedule.

How does that solve the problem? All this would do is give you snapshots of the state of the DB at particular points in time.

The only way to know who changed what when is to keep track of those changes as they are made, and the best way to do that is to stick the changes in an eventlog table.

zikzak · on April 5, 2021

A developer joined a team I was on once. The boss was infamous for hiring anyone with a pulse and assigning a small client project. Sink or swim.

As the senior dev it was my job to provide info as needed and subtly assess the new, potential rockstar code ninjas.

Anyway, this new guy says "I store all my dates in sql as text in iso format so I can easily sort them".

com2kid · on April 6, 2021

That is... almost not horrible?

If he had said anything other than "in sql" it would be the correct answer.

"I store all dates in spreadsheets in ISO format"

"I prepend ISO dates to marketing campaign names so I can easily sort them in the horrible web frontends that doesn't have a sort by date option"

"I start all my blog entries with ISO dates..."

I mean, just... so close!

dragonwriter · on April 6, 2021

> "I store all dates in spreadsheets in ISO format"

As in SQL, the application’s native date format (which is usually a number with appropriate format) would be the natural choice and a YYYY-MM-DD string would be a distant second. ISO (presumably, 8601) format is too broad, and admits of a mix of formats that are not simply sortable (quick, which is first 2021-W14-2 or 2021-04-07?).

com2kid · on April 6, 2021

Eh, when people say ISO they mostly mean YYYY-MM-DD.

If a date is going to be string prepended to something else, or if the spreadsheet is going to be exported to CSV, or parsed by some tool, then storing dates in YYYY-MM-DD can be a very justifiable thing to do in a spreadsheet.

Also Excel likes to format dates using the system locale, which is, well, stupid. So I have to enter YYYY-MM-DD as a custom format anyway, or change the locale on the column to China. (Or Cherokee, who apparently also properly format their dates!)

(Also Excel is smart enough if you set a column as YYYY-MM-DD custom format, it'll let you use all the date based pivot table stuff as expected)

collyw · on April 6, 2021

It's actually a decent hiring process providing the new hires are made aware of it in advance.

zikzak · on April 6, 2021

He was a radical honestly type guy. It was all on the table and we had lots of success stories. We would give anyone a chance which let a lot of self taught, entry level people get work. I got a ton of experience mentoring people, as well (difficult people, too). Unfortunately, he got screwed over by a shotgun clause in a business agreement and lost the business that to an ex-partner. That's what led me to my current job, though, where I've been happy for over 10 years and it's much more stable.

goranmoomin · on April 6, 2021

> I store all my dates in sql as text in iso format so I can easily sort them

Strong SQLite vibes... (SQLite doesn't have a date format, it has builtin functions that accept ISO format text or unix timestamps)

calvinmorrison · on April 6, 2021

As a Unix person, aren't dates just number? Why is sort complex? Just parse input and render timestamps

yen223 · on April 6, 2021

It's pretty miserable when you're trying to look up a date, and all you get are numbers that requires a conversion step to be readable.

drchickensalad · on April 6, 2021

Assuming by number you mean epoch seconds/milliseconds, a number can store the instant in time, but not a TIMEZONE. Obviously you could do bit magic if you shift it over (so you don't mess up sorting), but that's true for any data and a sufficiently large number type.

adrianN · on April 6, 2021

Time zone information makes this a bit tricky.

zikzak · on April 6, 2021

That's a good point. We were talking MS SQL but yes, that would make way more sense.

beagle3 · on April 6, 2021

I did that in DB2 or Oracle (or both, don’t remember - it was 15 years ago and the project used both) because I needed millisecond time resolution and the database date time field did not support milliseconds.

Among the option of storing (date field, millisecond field) and (date as iso8601 text field with millisecond without time zone) I chose the latter, and I can’t tell for sure but it was almost surely the better choice.

hnarn · on April 6, 2021

Surely you can do both, so you can still perform sane sorting/indexing on "real" timestamps and then use the extra precision string timestamp for drilling down?

beagle3 · on April 6, 2021

Iso8601 sorts just as well.

I saw no benefit in storing two copies. That’s recipe for problems - e.g. “update” code that doesn’t update both, or somehow updates them differently (which one is right?). You can probably have t-sql or some “check” constraint to enforce equivalence, but ... why?

For most practical uses, an ISO8601 textual representation is as good, though a little less space efficient.

dragonwriter · on April 6, 2021

> Iso8601 sorts just as well.

No, it doesn’t, since all of the following are valid ISO 8601 representations of a date:

20210408 2021-04-07 2021-W14-1 2021W143 2021-097 2021098

(Now, consistently using any one of the ISO 8601 formats is sortable...)

beagle3 · on April 6, 2021

Point taken.

I always recommend and use the specific ISO8601 format

    YYYY-MM-DDThh:mm:ss.sssZ

(or drop the ".sss" if not needed), and be 100% consistent about it.

But you are right, I shouldn't refer to this as "use ISO8601" because it is only one of the numeous possible formats described by that spec.

lmm · on April 6, 2021

Sounds like someone who's realised how crappy most databases' time handling is.

dmingod666 · on April 6, 2021

I once saw an app that had a text column as date, then 3 other columns for day month and year so they could sort by date.

arethuza · on April 6, 2021

I remember one design I reviewed that wanted to use XML as something to be queried using simple text comparison (this was before some databases adding explicit support for XML).

dmingod666 · on April 7, 2021

Not XML, but, the classic 'never parse html with regex'

https://stackoverflow.com/questions/1732348/regex-match-open...

:)

chaps · on April 6, 2021

Meh, I don't see much wrong with this honestly. There are some cases where using text as the type is the only way. Bet you the junior dev worked with a lot of messy data in the past, where it wasn't always possible to do a direct datestr -> SQL date type.

Just point him in the right direction and move on.

zikzak · on April 6, 2021

Yeah, I did explain why that wasn't "how we did it", and tried not to sound like a know it all (I am definitely not this).

I am way less judgemental now and a lot of the comments here are making me rethink what I thought about him saying this. He had a lot of other issues and resigned after a couple of days so I guess we'll never know what he was capable of.

chaps · on April 6, 2021

Hah, amazing and thanks for sharing. I think it's mostly that you sounded like you were holding a petty grudge over some small technical thing that can make sense. Maybe he resigned because he felt overly judged?

zikzak · on April 7, 2021

No, he had a day drinking and absentee problem. My boss talked with him, offered him another chance, but he resigned. He really didn't like following requirements either but I think it was just a symptom of his overall problems. He was going through something and we were just a stop on his journey.

chaps · on April 8, 2021

Fair enough. Cheers.

qmmmur · on April 6, 2021

Why is this bad?

happythought · on April 6, 2021

Sql databases usually come with their own date types that are implemented with integers behind the curtains. They take up less space and are easier to sort than text fields.

EDIT: also I wouldn’t consider this egregious. If the Senior explains and the person was happy to learn something then that’s a good outcome. If they are stubborn about it, then that wouldn’t be great.

deathanatos · on April 6, 2021

But which very likely store less information than an ISO timestamp does. (The integers stored are usually POSIX timestamps.) So, you might have to store, e.g., timezone information elsewhere. (And that might also be a good thing. It might not. It depends. Without knowing more, the DB's type is probably the right choice.)

I've seen far worse things than this done to a SQL database though…

zachrose · on April 6, 2021

Except that ISO-8601 doesn’t include anything as useful as a political time zone, just an offset from UTC. That’d be good if UTC offset was useful for anything, but instead it just adds to the confusion.

zachrose · on April 7, 2021

A lot of this was really cleared up for me by Evan Czaplicki's documentation for the Elm time library: https://github.com/elm/time

chillfox · on April 6, 2021

UTC offset is way more useful than those 3-4 character timezone abbreviations.

Just have a look at the list from Wikipedia [1] and notice how many repeats there are.

As an example AMT could be either UTC+4 or UTC-4.

And this doesn't even includes the various local acronyms that people use around the world.

[1] https://en.wikipedia.org/wiki/List_of_time_zone_abbreviation...

hobs · on April 6, 2021

I didn't downvote you, but just for reference the time zone abbreviations you are supposed to use are in the tzdata db, and are listed https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

If you don't use these, you will inevitably be somewhat wrong due to leap issues, political changes in time, or the like.

hashhar · on April 6, 2021

Political zones are not the same thing as the abbreviated zones.

IANA time zone identifiers are fully-qualified and unambiguous like America/Los_Angeles or Asia/Kolkata.

chillfox · on April 6, 2021

Thanks, I have never heard the term “political zones” before, so I made an assumption.

golergka · on April 6, 2021

At least in Postgresql, it is standard practice to use timestamptz type, which saves timestamp together with a time zone. Not only it uses less space on disc, but you can do a lot of operations with it natively in database itself: sort in various ways, compare to NOW(), construct time intervals and even ensure data validity with checks.

singron · on April 6, 2021

Timestamptz doesn't store a time zone in postgres. It converts all timestamps to utc. It's a common misconception.

dragonwriter · on April 6, 2021

> But which very likely store less information than an ISO timestamp does.

If you are trying to store dates and not timestamps, that’s what you want, otherwise equality tests might not work right.

> (The integers stored are usually POSIX timestamps.)

For date types, if they exist distinctly, usually not.

Something1234 · on April 6, 2021

I'm going to guess because sql has rich datetime support. At least the majority of modern databases do.

dmingod666 · on April 6, 2021

You can then do things like greater than date or between X date and y date and get some use out of that column..

swiley · on April 6, 2021

ITT HN collectively learns what an "audit log" is.

harperlee · on April 6, 2021

That reminds me of https://www.enterpriseready.io/ where you can find a guide of which features, such as an audit log, you may not have though about when building your side project, B2C startup, etc. that will facilitate its uptake by companies.

For example, for audit logs: https://www.enterpriseready.io/features/audit-log/

antonvs · on April 6, 2021

I would imagine the functional programming advocates here already understand this. For the rest, audit logs are at least a baby step towards that. Progress!

acidbaseextract · on April 5, 2021

Completely agreed on the inevitability of that ask.

At risk of extreme overkill the other way, something like Debezium [1] doing change monitoring dumping into S3 might be a viable industrial strength approach. I haven't used it in production, but have been looking for appropriate time to try it.

[1] https://debezium.io/

Aqueous · on April 5, 2021

usually businesses use auditing tables to track this, usually by setting up triggers to insert a row into the auditing table any time a delete, update or insert occurs on the original

sedeki · on April 5, 2021

So each commit is exactly one change to a relevant table?

I guess you can't just plug this into a common system of rotating logs easily, as there might be several changes beteeen the log rotation.

Also, I guess you'd need a user friendlier interface to actually display who made the change from the git repo.

Anyway, interesting solution.

irrational · on April 5, 2021

So you commit to Git.

Sally makes a change to column 1 of record 1.

Billy makes a change to column 2 of record 1 a nanosecond later.

You commit to Git again.

Your boss wants to know who changed column 1 of record 1.

You report it was Billy.

Billy is fired.

simonw · on April 6, 2021

You report it could have been Sally or Billy

irrational · on April 6, 2021

That isn’t how this works. Based on OPs description, the fact that Sally made a change is gone since the last editor was Billy. It looks like Billy made the change to both column 1 and 2.

simonw · on April 6, 2021

You tell your boss:

"Just so you know, we only take a snapshot every X minutes... so there's a chance Billy wasn't the person who made that change. If you had told me you were planning to fire people as a result of this tracking system I might have spent more than five minutes hacking it together with a cronjob and a git repository."

TeeMassive · on April 6, 2021

Your boss let changes go through the production DB directly

Your boss decides to fire people as a solution

Your boss should be fired

irrational · on April 6, 2021

I’m assuming the change Sally made was an unethical or illegal change.

livvelupp · on April 6, 2021

Your boss doesn't micro manage

You should be taking care of this, not Billy

TeeMassive · on April 6, 2021

Firing a team member for a team mistake is micromanagement.

irrational · on April 6, 2021

I’m assuming Billy was fired because Sally made an unethical or illegal change.

antonvs · on April 6, 2021

Billy did nothing wrong!

simonw · on April 6, 2021

Since this is being picked apart a bit in the comments, I should clarify that I don't think this is the BEST solution to this problem. It's the solution that gives you 90% if the value with 5% of the work.

dragonwriter · on April 6, 2021

> It's the solution that gives you 90% if the value with 5% of the work.

It’s not, though. History tables don’t provide only 111% of the value of this, and don’t take 20× the effort. They have a deterministic relation to base tables (so generating the schema from that of base tables, and the code necesaary for maintenance, is a scriptable mechanical transform if you have the base table schema.)

OTOH, lots of places I've seen approach this by haphazard addition of some of what I think of as “audit theater” columns to base tables (usually some or all of “created”, “created_at”, “last_updated”, “updated_by”), so its also not the worst approach I’ve seen.

vishnugupta · on April 6, 2021

A running joke at Uber is that every single configuration management system will end up re-implementing a half ass version of Git.

bitwize · on April 6, 2021

Were their exact words "an ad hoc, informally specified, bug-ridden, slow implementation of half of Git"?

vishnugupta · on April 7, 2021

Heh something to that effect. I’m not at Uber anymore so can’t reference the article which makes this assertion. It is quite well written, starts off with a super simple version of configuration management system and describes how it progressively evolves to encompass 80% of Git features.

gjvnq · on April 6, 2021

Why not have a history table and use SQL triggers to insert the new data as JSON everytime someone changes something?

coolgeek · on April 7, 2021

Indeed. I have on multiple occasions written scripts to automatically generate the triggers and audit tables