
Cool, can you share your setup for Python and your current IDE?

Yes, sure: dev containers inside each project. That way the entire environment (debugger, all IDE plugins for linting, etc.) is standard across all devs, and the coding environment matches prod exactly.

IDE is Cursor.


https://github.com/MedUnes/go-kata-solutions suggests they intended to create the solutions too, but there doesn't seem to be any progress yet.


One of the things that made me think twice about self-hosting Postgres is securing the OS I host PG on. Any recommendations on where to start with that?


Can you get away without exposing it to the internet? Firewall it off altogether, or open it only to the address of a specific machine that needs access to it?


I tried this before, but since I often need to open a different browser even when a link comes from the same app, I ended up moving to https://github.com/will-stone/browserosaurus

Not to say you can't use both, though.


I've tried Airbyte, Sling, and dlt (besides building several tools from scratch).

My best bet for now would be dlt if you have a dedicated DE team, but Sling will get you a long way for moving data around your warehouse.
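
For anyone curious, this is roughly the shape of a minimal dlt pipeline (the resource, destination, and names below are made up just for illustration; dlt supports many destinations besides duckdb):

    import dlt

    # Any generator of dicts can act as a resource; this one is hypothetical.
    @dlt.resource(table_name="orders", primary_key="id", write_disposition="merge")
    def orders():
        yield {"id": 1, "status": "shipped"}
        yield {"id": 2, "status": "pending"}

    pipeline = dlt.pipeline(
        pipeline_name="orders_pipeline",
        destination="duckdb",      # swap for "bigquery", "snowflake", etc.
        dataset_name="raw",
    )
    print(pipeline.run(orders()))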


Hi, I've been looking for something like this! Do any of your customers have a success story migrating off BigQuery to your platform? And how do you compare to MotherDuck? (It looks like you built some of your stack on top of DuckDB.)


Yes, we've had many BigQuery / Snowflake converts. The reality is, most companies don't have 100 TB of data (which is what those platforms are optimized for). MotherDuck has a good post [0] on this:

> There were many thousands of customers who paid less than $10 a month for storage, which is half a terabyte. Among customers who were using the service heavily, the median data storage size was much less than 100 GB.

I'm a fan of what MotherDuck is doing. We're building something different (an opinionated, instant data stack), but yes, we both use DuckDB under the hood.

[0] https://motherduck.com/blog/big-data-is-dead/
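
For a sense of what "DuckDB under the hood" buys you at these data sizes, here's the kind of query you can run locally against a Parquet file, no load step needed (file name and columns are made up):

    import duckdb

    con = duckdb.connect("analytics.duckdb")
    # Query the Parquet file in place and keep results in a local database file.
    con.sql("""
        SELECT customer_id, sum(amount) AS total
        FROM read_parquet('events.parquet')
        GROUP BY customer_id
        ORDER BY total DESC
        LIMIT 10
    """).show()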


Has anyone tried comparing this with a Qwen VL based model? I've heard good things about its OCR performance compared to other self-hostable models, but I haven't really tried benchmarking it.


Yes, I'd like to see this repeated with any of the small VLMs like IBM Granite or the HF Smol models. Pretty much anything in the sub-7B range.
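
For reference, a rough sketch of an OCR-style pass with Qwen2-VL through Hugging Face transformers; the model id, prompt, and file name are just one possible setup and the preprocessing details may need adjusting for your transformers version:

    from PIL import Image
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model_id = "Qwen/Qwen2-VL-2B-Instruct"  # small, self-hostable variant
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    image = Image.open("scanned_page.png")  # placeholder input document
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Transcribe all text in this image verbatim."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, skipping the prompt.
    print(processor.batch_decode(
        output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0])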


Now you make me wonder if I could run this entirely inside PyScript.


I think you want something along the lines of DVC (github.com/iterative/dvc).
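
If it helps, DVC also has a small Python API for reading tracked files straight from a repo; the repo URL, path, and tag below are placeholders:

    import dvc.api

    # Read a DVC-tracked file from a given repo and revision (all placeholders).
    data = dvc.api.read(
        "data/train.csv",
        repo="https://github.com/example/my-ml-project",
        rev="v1.0",
    )
    print(data[:200])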


Looking at the syncer, it seems like it copies the whole table to CSV every time (?). Code: https://github.com/BemiHQ/BemiDB/blob/6d6689b392ce6192fe521a...

I can't imagine up to what scale you can keep doing this. Is there anything better we can do before resorting to Debezium to sync the data via CDC?

Edit: add code permalink
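
For context, a full-table re-sync is essentially this shape (not BemiDB's actual code; connection string and table name are placeholders):

    import psycopg2

    # Dump an entire table to CSV via COPY. This is the part that gets
    # expensive as the table grows, since every sync rewrites everything.
    conn = psycopg2.connect("postgresql://user:pass@localhost:5432/app")
    with conn, conn.cursor() as cur, open("orders.csv", "w") as f:
        cur.copy_expert("COPY orders TO STDOUT WITH (FORMAT csv, HEADER true)", f)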


Our initial approach was to implement periodic full table re-syncing. We're starting to work on CDC with logical replication for incremental syncing. Here is our roadmap https://github.com/BemiHQ/BemiDB#future-roadmap
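
For anyone unfamiliar, CDC with logical replication means publishing changes on the source database and consuming them from a replication slot instead of re-copying whole tables. The setup side looks roughly like this (names are placeholders, the source database needs wal_level=logical, and BemiDB's actual implementation may differ):

    import psycopg2

    conn = psycopg2.connect("postgresql://user:pass@localhost:5432/app")
    conn.autocommit = True  # run each statement in its own transaction
    with conn.cursor() as cur:
        # Publish changes for the tables to sync.
        cur.execute("CREATE PUBLICATION bemi_pub FOR TABLE orders, customers")
        # Create a logical replication slot using the built-in pgoutput plugin.
        cur.execute("SELECT pg_create_logical_replication_slot('bemi_slot', 'pgoutput')")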

