Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In almost every database-backed application I've ever built someone, at some point, inevitable asks for the ability to see what changes were made when and by whom.

My current preferred strategy for dealing with this (at least for any table smaller than a few GBs) is to dump the entire table contents to a git repository on a schedule.

I've run this for a few toy projects and it seems to work really well. I'm ready to try it with something production-scale the next time the opportunity presents itself.



> My current preferred strategy for dealing with this (at least for any table smaller than a few GBs) is to dump the entire table contents to a git repository on a schedule.

How does that solve the problem? All this would do is give you snapshots of the state of the DB at particular points in time.

The only way to know who changed what when is to keep track of those changes as they are made, and the best way to do that is to stick the changes in an eventlog table.


A developer joined a team I was on once. The boss was infamous for hiring anyone with a pulse and assigning a small client project. Sink or swim.

As the senior dev it was my job to provide info as needed and subtly assess the new, potential rockstar code ninjas.

Anyway, this new guy says "I store all my dates in sql as text in iso format so I can easily sort them".


That is... almost not horrible?

If he had said anything other than "in sql" it would be the correct answer.

"I store all dates in spreadsheets in ISO format"

"I prepend ISO dates to marketing campaign names so I can easily sort them in the horrible web frontends that doesn't have a sort by date option"

"I start all my blog entries with ISO dates..."

I mean, just... so close!


> "I store all dates in spreadsheets in ISO format"

As in SQL, the application’s native date format (which is usually a number with appropriate format) would be the natural choice and a YYYY-MM-DD string would be a distant second. ISO (presumably, 8601) format is too broad, and admits of a mix of formats that are not simply sortable (quick, which is first 2021-W14-2 or 2021-04-07?).


Eh, when people say ISO they mostly mean YYYY-MM-DD.

If a date is going to be string prepended to something else, or if the spreadsheet is going to be exported to CSV, or parsed by some tool, then storing dates in YYYY-MM-DD can be a very justifiable thing to do in a spreadsheet.

Also Excel likes to format dates using the system locale, which is, well, stupid. So I have to enter YYYY-MM-DD as a custom format anyway, or change the locale on the column to China. (Or Cherokee, who apparently also properly format their dates!)

(Also Excel is smart enough if you set a column as YYYY-MM-DD custom format, it'll let you use all the date based pivot table stuff as expected)


It's actually a decent hiring process providing the new hires are made aware of it in advance.


He was a radical honestly type guy. It was all on the table and we had lots of success stories. We would give anyone a chance which let a lot of self taught, entry level people get work. I got a ton of experience mentoring people, as well (difficult people, too). Unfortunately, he got screwed over by a shotgun clause in a business agreement and lost the business that to an ex-partner. That's what led me to my current job, though, where I've been happy for over 10 years and it's much more stable.


> I store all my dates in sql as text in iso format so I can easily sort them

Strong SQLite vibes... (SQLite doesn't have a date format, it has builtin functions that accept ISO format text or unix timestamps)


As a Unix person, aren't dates just number? Why is sort complex? Just parse input and render timestamps


It's pretty miserable when you're trying to look up a date, and all you get are numbers that requires a conversion step to be readable.


Assuming by number you mean epoch seconds/milliseconds, a number can store the instant in time, but not a TIMEZONE. Obviously you could do bit magic if you shift it over (so you don't mess up sorting), but that's true for any data and a sufficiently large number type.


Time zone information makes this a bit tricky.


That's a good point. We were talking MS SQL but yes, that would make way more sense.


I did that in DB2 or Oracle (or both, don’t remember - it was 15 years ago and the project used both) because I needed millisecond time resolution and the database date time field did not support milliseconds.

Among the option of storing (date field, millisecond field) and (date as iso8601 text field with millisecond without time zone) I chose the latter, and I can’t tell for sure but it was almost surely the better choice.


Surely you can do both, so you can still perform sane sorting/indexing on "real" timestamps and then use the extra precision string timestamp for drilling down?


Iso8601 sorts just as well.

I saw no benefit in storing two copies. That’s recipe for problems - e.g. “update” code that doesn’t update both, or somehow updates them differently (which one is right?). You can probably have t-sql or some “check” constraint to enforce equivalence, but ... why?

For most practical uses, an ISO8601 textual representation is as good, though a little less space efficient.


> Iso8601 sorts just as well.

No, it doesn’t, since all of the following are valid ISO 8601 representations of a date:

20210408 2021-04-07 2021-W14-1 2021W143 2021-097 2021098

(Now, consistently using any one of the ISO 8601 formats is sortable...)


Point taken.

I always recommend and use the specific ISO8601 format

    YYYY-MM-DDThh:mm:ss.sssZ
(or drop the ".sss" if not needed), and be 100% consistent about it.

But you are right, I shouldn't refer to this as "use ISO8601" because it is only one of the numeous possible formats described by that spec.


Sounds like someone who's realised how crappy most databases' time handling is.


I once saw an app that had a text column as date, then 3 other columns for day month and year so they could sort by date.


I remember one design I reviewed that wanted to use XML as something to be queried using simple text comparison (this was before some databases adding explicit support for XML).


Not XML, but, the classic 'never parse html with regex'

https://stackoverflow.com/questions/1732348/regex-match-open...

:)


Meh, I don't see much wrong with this honestly. There are some cases where using text as the type is the only way. Bet you the junior dev worked with a lot of messy data in the past, where it wasn't always possible to do a direct datestr -> SQL date type.

Just point him in the right direction and move on.


Yeah, I did explain why that wasn't "how we did it", and tried not to sound like a know it all (I am definitely not this).

I am way less judgemental now and a lot of the comments here are making me rethink what I thought about him saying this. He had a lot of other issues and resigned after a couple of days so I guess we'll never know what he was capable of.


Hah, amazing and thanks for sharing. I think it's mostly that you sounded like you were holding a petty grudge over some small technical thing that can make sense. Maybe he resigned because he felt overly judged?


No, he had a day drinking and absentee problem. My boss talked with him, offered him another chance, but he resigned. He really didn't like following requirements either but I think it was just a symptom of his overall problems. He was going through something and we were just a stop on his journey.


Fair enough. Cheers.


Why is this bad?


Sql databases usually come with their own date types that are implemented with integers behind the curtains. They take up less space and are easier to sort than text fields.

EDIT: also I wouldn’t consider this egregious. If the Senior explains and the person was happy to learn something then that’s a good outcome. If they are stubborn about it, then that wouldn’t be great.


But which very likely store less information than an ISO timestamp does. (The integers stored are usually POSIX timestamps.) So, you might have to store, e.g., timezone information elsewhere. (And that might also be a good thing. It might not. It depends. Without knowing more, the DB's type is probably the right choice.)

I've seen far worse things than this done to a SQL database though…


Except that ISO-8601 doesn’t include anything as useful as a political time zone, just an offset from UTC. That’d be good if UTC offset was useful for anything, but instead it just adds to the confusion.


A lot of this was really cleared up for me by Evan Czaplicki's documentation for the Elm time library: https://github.com/elm/time


UTC offset is way more useful than those 3-4 character timezone abbreviations.

Just have a look at the list from Wikipedia [1] and notice how many repeats there are.

As an example AMT could be either UTC+4 or UTC-4.

And this doesn't even includes the various local acronyms that people use around the world.

[1] https://en.wikipedia.org/wiki/List_of_time_zone_abbreviation...


I didn't downvote you, but just for reference the time zone abbreviations you are supposed to use are in the tzdata db, and are listed https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

If you don't use these, you will inevitably be somewhat wrong due to leap issues, political changes in time, or the like.


Political zones are not the same thing as the abbreviated zones.

IANA time zone identifiers are fully-qualified and unambiguous like America/Los_Angeles or Asia/Kolkata.


Thanks, I have never heard the term “political zones” before, so I made an assumption.


At least in Postgresql, it is standard practice to use timestamptz type, which saves timestamp together with a time zone. Not only it uses less space on disc, but you can do a lot of operations with it natively in database itself: sort in various ways, compare to NOW(), construct time intervals and even ensure data validity with checks.


Timestamptz doesn't store a time zone in postgres. It converts all timestamps to utc. It's a common misconception.


> But which very likely store less information than an ISO timestamp does.

If you are trying to store dates and not timestamps, that’s what you want, otherwise equality tests might not work right.

> (The integers stored are usually POSIX timestamps.)

For date types, if they exist distinctly, usually not.


I'm going to guess because sql has rich datetime support. At least the majority of modern databases do.


You can then do things like greater than date or between X date and y date and get some use out of that column..


ITT HN collectively learns what an "audit log" is.


That reminds me of https://www.enterpriseready.io/ where you can find a guide of which features, such as an audit log, you may not have though about when building your side project, B2C startup, etc. that will facilitate its uptake by companies.

For example, for audit logs: https://www.enterpriseready.io/features/audit-log/


I would imagine the functional programming advocates here already understand this. For the rest, audit logs are at least a baby step towards that. Progress!


Completely agreed on the inevitability of that ask.

At risk of extreme overkill the other way, something like Debezium [1] doing change monitoring dumping into S3 might be a viable industrial strength approach. I haven't used it in production, but have been looking for appropriate time to try it.

[1] https://debezium.io/


usually businesses use auditing tables to track this, usually by setting up triggers to insert a row into the auditing table any time a delete, update or insert occurs on the original


So each commit is exactly one change to a relevant table?

I guess you can't just plug this into a common system of rotating logs easily, as there might be several changes beteeen the log rotation.

Also, I guess you'd need a user friendlier interface to actually display who made the change from the git repo.

Anyway, interesting solution.


So you commit to Git.

Sally makes a change to column 1 of record 1.

Billy makes a change to column 2 of record 1 a nanosecond later.

You commit to Git again.

Your boss wants to know who changed column 1 of record 1.

You report it was Billy.

Billy is fired.


You report it could have been Sally or Billy


That isn’t how this works. Based on OPs description, the fact that Sally made a change is gone since the last editor was Billy. It looks like Billy made the change to both column 1 and 2.


You tell your boss:

"Just so you know, we only take a snapshot every X minutes... so there's a chance Billy wasn't the person who made that change. If you had told me you were planning to fire people as a result of this tracking system I might have spent more than five minutes hacking it together with a cronjob and a git repository."


Your boss let changes go through the production DB directly

Your boss decides to fire people as a solution

Your boss should be fired


I’m assuming the change Sally made was an unethical or illegal change.


Your boss doesn't micro manage

You should be taking care of this, not Billy


Firing a team member for a team mistake is micromanagement.


I’m assuming Billy was fired because Sally made an unethical or illegal change.


Billy did nothing wrong!


Since this is being picked apart a bit in the comments, I should clarify that I don't think this is the BEST solution to this problem. It's the solution that gives you 90% if the value with 5% of the work.


> It's the solution that gives you 90% if the value with 5% of the work.

It’s not, though. History tables don’t provide only 111% of the value of this, and don’t take 20× the effort. They have a deterministic relation to base tables (so generating the schema from that of base tables, and the code necesaary for maintenance, is a scriptable mechanical transform if you have the base table schema.)

OTOH, lots of places I've seen approach this by haphazard addition of some of what I think of as “audit theater” columns to base tables (usually some or all of “created”, “created_at”, “last_updated”, “updated_by”), so its also not the worst approach I’ve seen.


A running joke at Uber is that every single configuration management system will end up re-implementing a half ass version of Git.


Were their exact words "an ad hoc, informally specified, bug-ridden, slow implementation of half of Git"?


Heh something to that effect. I’m not at Uber anymore so can’t reference the article which makes this assertion. It is quite well written, starts off with a super simple version of configuration management system and describes how it progressively evolves to encompass 80% of Git features.


Why not have a history table and use SQL triggers to insert the new data as JSON everytime someone changes something?


Indeed. I have on multiple occasions written scripts to automatically generate the triggers and audit tables




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: