Hacker News: emilburzo's comments

I just tested it on one of my nemeses: PDF bank statements. They're surprisingly tough to work with if you want to get clean, structured transaction data out of them.

The JSON extract actually looks pretty good: it seems to produce something usable in one shot, which is much better than all the other tools I've tried so far. I still need to check it more in depth, though.

Sharing here in case someone chimes in with "hey, doofus, $magic_project already solves this."


Camelot[1] worked very well for me with bank statements. Disclaimer: I'm one of the core contributors.

[1] https://github.com/camelot-dev/camelot


For 'zoned' extraction, Cermine[0] may be of use as a pre-processing step. Mileage may vary, as it's tailored towards papers.

[0]: http://cermine.ceon.pl/about.html


This is exactly what I did for my DIY solution; it's quite amazing how far you can get with some SMS parsing.

Initially I was still missing quite a lot of transactions, as there isn't an SMS for non-CC payments, but then I realized I could just grab the information from the 2FA confirmation, as it had all the important details.
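Here is a minimal Python sketch of the SMS-parsing idea. The message format below is made up, since every bank words its SMS differently. The regex, the field names and the `parse_sms` helper are all hypothetical. The same approach would apply to the 2FA confirmation messages:

```python
import re
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional

# Hypothetical SMS format, not taken from any real bank, e.g.:
# "Card payment of 123.45 RON at MEGA IMAGE on 05.03.2024 14:22."
SMS_PATTERN = re.compile(
    r"payment of (?P<amount>\d+(?:\.\d{2})?) (?P<currency>[A-Z]{3}) "
    r"at (?P<merchant>.+?) on (?P<when>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2})"
)


@dataclass
class Transaction:
    amount: Decimal  # Decimal avoids float rounding errors on money
    currency: str
    merchant: str
    when: datetime


def parse_sms(text: str) -> Optional[Transaction]:
    """Return a Transaction if the SMS matches the assumed format, else None."""
    match = SMS_PATTERN.search(text)
    if match is None:
        return None
    return Transaction(
        amount=Decimal(match["amount"]),
        currency=match["currency"],
        merchant=match["merchant"].strip(),
        when=datetime.strptime(match["when"], "%d.%m.%Y %H:%M"),
    )
```

Messages that don't match, such as OTP codes or marketing, return None and can simply be skipped.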

I'm not so good on the frontend, so I'm still using Redash to build charts, dashboards and alerts, but it's fine for keeping an overview.

Long-term, I'd still like to switch to parsing the bank's statement PDF, especially since there's an option to receive it automatically at the end of each month. But I'm dragging my feet on that one, because parsing tables from PDFs is surprisingly difficult. Even though the actual text is embedded in the PDF, getting it out in the correct layout has proven impossible so far. Or maybe I'm just missing some piece of the puzzle?
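One piece of that puzzle is often to stop relying on the library's text order and rebuild rows yourself from word coordinates. PyMuPDF (`fitz`), for instance, returns words from `page.get_text("words")` as tuples starting with (x0, y0, x1, y1, word).

The sketch below assumes you already have such tuples and clusters them into visual rows by vertical center. `group_into_rows` is a hypothetical helper, and the 3-point tolerance is an arbitrary assumption to tune per statement:

```python
from typing import Any, List, Sequence, Tuple

# (x0, y0, x1, y1, text, ...): the leading fields of each word tuple from
# PyMuPDF's page.get_text("words"); extra trailing fields are ignored.
Word = Tuple[Any, ...]


def group_into_rows(words: Sequence[Word], y_tolerance: float = 3.0) -> List[List[Word]]:
    """Cluster words whose vertical centers are within y_tolerance into rows.

    Rows come back top to bottom, and words inside each row left to right.
    """
    rows: List[List[Word]] = []
    row_anchor: List[float] = []  # vertical center of the first word in each row
    # Sorting by vertical center means each word only needs to be compared
    # with the row currently being built.
    for word in sorted(words, key=lambda w: ((w[1] + w[3]) / 2, w[0])):
        center = (word[1] + word[3]) / 2
        if rows and abs(center - row_anchor[-1]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
            row_anchor.append(center)
    return [sorted(row, key=lambda w: w[0]) for row in rows]


def rows_as_text(rows: List[List[Word]]) -> List[str]:
    """Join each row's words with single spaces (column positions are lost)."""
    return [" ".join(w[4] for w in row) for row in rows]
```

Joining with single spaces throws away column positions, so this alone doesn't handle empty cells. It only gets the reading order right.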


> without resorting to google maps history (which is great but has…obvious downsides)

Maybe relevant: they moved it to on-device now: https://9to5google.com/2023/12/12/google-location-history-ti...


Are there any plans to have a version without the battery? It looks exactly like what I've been looking for otherwise.

Also, what country are the orders shipped from? US?


Internally, we are debating whether to release a hackable DIY kit. Feel free to send a message to support@usetrmnl.com.

It's shipped from the USA.


Curious what your use case is without a battery. Currently you could keep it plugged in; are you wanting NFC power, etc.?


Nothing spectacular; I just want to have a display by the door that shows various things I'd like to check on before leaving, like: which windows are open, outside temperatures, etc.

I don't want a battery because:

- although every X months is quite ok, I don't want the hassle of remembering to charge it (first world problems, I know)

- but I also have a fear of leaving devices with a battery plugged in for a long time / having to monitor for battery swelling or other abnormalities

I already have a classic battery-powered display which shows temperature info from some sensors and it's really convenient, but annoying when the battery is dead right when you need the info. Even if that only happens every X months.


A GP in this thread linked to Inkycal, an RPi0W-based solution with no batteries:

https://github.com/aceinnolab/Inkycal


Previous discussion with more comments: https://news.ycombinator.com/item?id=42017771


Just wanted to give a shout out that the forecasts for this year have been so much more spot on compared to last year, happy to see the UI/UX will also get some love!


Romania is missing from the list of phone number countries on signup, not sure if on purpose or not.


Is any of the code public?

Or at least the tool(s) you use?

I have the same need, but it's surprisingly difficult to get it right, at least with the `camelot` or `fitz` Python packages.


No public code. This has been a long-running project for me. Last I touched it (in the pre-LLM world), it had turned into a real Rube Goldberg machine. Hard to imagine anyone else putting up with it.

The pipeline goes roughly like this:

PDF to text (using either a Python or a Java library). The text is then turned into a "header" structure, with dates and balances extracted via configuration-driven regexes, and a "body" structure containing the transactions. The transactions themselves go through an EBNF parser to extract the date(s), narration, amount, and balance if reported. The narration text gets run against a custom merchant database for payee lookup and categorization.

It is a painful problem! The code is Clojure, so there is not much of it, and there are high-abstraction libraries like Instaparse that make it easy to use grammars as primitives. And the Rube Goldberg machine has given me balance-validated data for the last several years from half a dozen financial providers.
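Here is a rough Python sketch of the same shape: header regexes from a per-bank config, a transaction-line parser, then balance validation. Note that it swaps a plain regex in for the EBNF/Instaparse grammar. The statement layout, patterns and helper names are all hypothetical:

```python
import re
from decimal import Decimal
from typing import Dict, List, NamedTuple

# Hypothetical per-bank configuration; real header patterns differ per provider.
BANK_CONFIG: Dict[str, str] = {
    "opening": r"Opening balance:\s*(?P<value>-?[\d,]+\.\d{2})",
    "closing": r"Closing balance:\s*(?P<value>-?[\d,]+\.\d{2})",
}

# A plain regex standing in for an EBNF grammar, for lines shaped like
# "<dd/mm/yyyy> <narration> <amount> <balance>".
TXN_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<narration>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})\s+(?P<balance>-?[\d,]+\.\d{2})$"
)


class Txn(NamedTuple):
    date: str
    narration: str
    amount: Decimal
    balance: Decimal


def to_decimal(text: str) -> Decimal:
    """Parse an amount like '3,000.00', dropping thousands separators."""
    return Decimal(text.replace(",", ""))


def parse_header(text: str, config: Dict[str, str]) -> Dict[str, Decimal]:
    """Extract every configured header field, failing loudly if one is missing."""
    header: Dict[str, Decimal] = {}
    for field, pattern in config.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"header field {field!r} not found")
        header[field] = to_decimal(match["value"])
    return header


def parse_body(text: str) -> List[Txn]:
    """Keep only lines that look like transactions."""
    txns: List[Txn] = []
    for line in text.splitlines():
        match = TXN_LINE.match(line.strip())
        if match:
            txns.append(Txn(match["date"], match["narration"],
                            to_decimal(match["amount"]), to_decimal(match["balance"])))
    return txns


def validate(header: Dict[str, Decimal], txns: List[Txn]) -> None:
    """Check each running balance and the closing balance against the amounts."""
    running = header["opening"]
    for txn in txns:
        running += txn.amount
        if running != txn.balance:
            raise ValueError(f"running balance mismatch at {txn}")
    if running != header["closing"]:
        raise ValueError("closing balance does not match the sum of transactions")
```

The balance check is what makes this kind of pipeline trustworthy. A dropped or mis-parsed line will usually break the running balance, so failures are loud rather than silent.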

I have been incorporating local LLMs, running on an RTX 3090, into some of my other workflows. I hope to find out over the summer whether they can help simplify parts of this one.


> like convert PDF bank statements into CSV transaction files

I've tried this recently and it's surprisingly difficult. Any pro-tips?

Extracting PDF tables while respecting cell positions seems almost impossible to do in a way that works in all cases (think borderless tables, whitespace cells, etc.).
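For borderless tables, one heuristic is to fix the column boundaries once (e.g. from the header row) and drop each word into the column containing its horizontal center. This is an assumption that won't work in every case. Empty cells then stay empty instead of shifting the following values left.

The word tuples are assumed to start with (x0, y0, x1, y1, text), as PyMuPDF's `page.get_text("words")` output does. They are also assumed to be already grouped into one visual row. `assign_columns` and the boundaries in the example are hypothetical:

```python
from bisect import bisect_right
from typing import Any, List, Sequence, Tuple

# (x0, y0, x1, y1, text, ...): the leading fields of each word tuple from
# PyMuPDF's page.get_text("words"); extra trailing fields are ignored.
Word = Tuple[Any, ...]


def assign_columns(row: Sequence[Word], column_starts: Sequence[float]) -> List[str]:
    """Place each word of one visual row into a fixed column.

    column_starts holds the left x edge of every column in increasing order
    (e.g. measured once from the header row). A column that receives no
    words stays an empty string, so blank cells do not shift later values.
    """
    cells: List[List[str]] = [[] for _ in column_starts]
    for word in sorted(row, key=lambda w: w[0]):
        x0, x1, text = word[0], word[2], word[4]
        center = (x0 + x1) / 2
        # Last column whose left edge is at or before the word's center;
        # words left of the first edge fall into column 0.
        index = max(bisect_right(column_starts, center) - 1, 0)
        cells[index].append(text)
    return [" ".join(parts) for parts in cells]
```

Values that straddle two columns (long descriptions, or right-aligned amounts under a narrow header) are where this breaks. That is why the boundaries can end up hand-tuned per bank.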


It is remarkably difficult and continues to provide a good example of the limitations of LLM based systems.

In my case, I used Perl and exploited the fact that, for a given bank, the statements are consistently formatted. Further, PDF OCR conversion behaves consistently on documents with the same formatting. With this combination, it is possible to extract the characters and numbers associated with transactions from the document, and then transform those extracted bundles of text into lines for a CSV file.

The caveat is that it works for only that bank, that "kind" of account (usually checking, credit card, or savings), and when using that specific document OCR tool. Within those constraints it is eminently reliable but utterly non-transferable to a general case.
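The same per-layout idea can be sketched in Python rather than Perl: one pattern per (bank, account kind), applied to OCR'd text lines, with matches written out as CSV. The bank name, the line layout and the `statement_to_csv` helper are hypothetical. Like the approach described, it deliberately refuses to handle layouts it wasn't written for:

```python
import csv
import io
import re
from typing import Dict, Iterable, Tuple

# Hypothetical registry: one pattern per (bank, account kind), mirroring the
# point that each extractor only works for one bank's layout and account type.
# This made-up layout separates columns with two or more spaces.
LAYOUTS: Dict[Tuple[str, str], re.Pattern] = {
    ("examplebank", "checking"): re.compile(
        r"^(?P<date>\d{2}-\d{2}-\d{4})\s{2,}(?P<description>.+?)\s{2,}"
        r"(?P<amount>-?\d+\.\d{2})$"
    ),
}


def statement_to_csv(lines: Iterable[str], bank: str, kind: str) -> str:
    """Convert OCR'd statement lines to CSV text using the bank-specific pattern.

    Raises KeyError for an unknown (bank, kind) pair instead of guessing.
    """
    pattern = LAYOUTS[(bank, kind)]
    out = io.StringIO()
    writer = csv.writer(out)  # quotes fields containing commas automatically
    writer.writerow(["date", "description", "amount"])
    for line in lines:
        match = pattern.match(line.rstrip())
        if match:  # headers, footers and page numbers simply don't match
            writer.writerow([match["date"], match["description"], match["amount"]])
    return out.getvalue()
```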


If you set up AWS Config for the organization (with an aggregator), you'll get an Athena-SQL-queryable inventory of all your resources across all organization accounts.

So finding out which account owns a resource can be as simple as, roughly: select accountId where arn = "x"
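The comment mentions querying the inventory with Athena. A closely related route, sketched here as an assumption, is AWS Config's own advanced query against the aggregator, which boto3 exposes as `select_aggregate_resource_config`. The aggregator name and the helper `find_owning_accounts` are hypothetical. The client is injected so the function can be tried without AWS credentials:

```python
import json
from typing import Any, Dict, List


def find_owning_accounts(client: Any, aggregator_name: str, arn: str) -> List[str]:
    """Return the account IDs an AWS Config aggregator reports for an ARN.

    client is expected to behave like boto3.client("config") in the account
    that owns the aggregator; normally a single account comes back.
    """
    if "'" in arn:
        raise ValueError("unexpected quote in ARN")
    # Config advanced queries have no FROM clause; string literals use single quotes.
    expression = f"SELECT accountId WHERE arn = '{arn}'"
    kwargs: Dict[str, str] = {
        "Expression": expression,
        "ConfigurationAggregatorName": aggregator_name,
    }
    accounts: List[str] = []
    while True:
        response = client.select_aggregate_resource_config(**kwargs)
        # Each entry in Results is a JSON string such as '{"accountId": "..."}'.
        for row in response.get("Results", []):
            accounts.append(json.loads(row)["accountId"])
        token = response.get("NextToken")
        if not token:
            return accounts
        kwargs["NextToken"] = token  # fetch the next page of results
```

With a real client, the call would look like `find_owning_accounts(boto3.client("config"), "my-org-aggregator", arn)`, run in the account and region where the aggregator lives.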


You can also do this with Steampipe.

It might not scale well beyond tens of accounts, though, depending on your query…


... how did I not know this existed.

That is exactly how we are set up, and I just spent so much time going account by account looking for a specific resource.

Thank you! I have long wondered why it didn't exist, and apparently it did...


Be aware that AWS Config is not free. https://aws.amazon.com/config/pricing/


Yeah, it's pretty nice feature-wise, but surprisingly expensive given that all it really does is run API calls in a loop and export to S3.

