I really hope it can replace PgBouncer. We've spent way too much time getting PgBouncer running in ECS, mostly because we needed a separate PgBouncer service for each of the primary and replica RDS instances, and then a third ECS service running HAProxy in TCP mode to load balance between the PgBouncer services and provide a read-only endpoint spanning the primary and replica. Add issues like having to patch in HAProxy TCP keep-alive parameter support (it only lets you enable or disable keep-alive, leaving the parameters up to the OS, which you can't tune on ECS Fargate, while the AWS NLB has a 350-second TCP flow idle timeout - good luck with long-running queries), plus NLBs having horrible timing issues when handling target group changes... it was not a great experience.
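
For the curious, the HAProxy layer boiled down to roughly this kind of TCP-mode config; names, ports and timeouts here are illustrative, not our exact setup:

  # Rough sketch of the TCP-mode read-only endpoint (illustrative names/ports)
  defaults
      mode tcp
      timeout connect 5s
      timeout client  1h
      timeout server  1h

  frontend pg_readonly
      bind *:6432
      default_backend pgbouncers

  backend pgbouncers
      balance leastconn
      option tcp-check
      server pgbouncer_primary pgbouncer-primary.internal:6432 check
      server pgbouncer_replica pgbouncer-replica.internal:6432 check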


We also set up pgbouncer, but after testing we realized that running it in a container added extra latency, so we went with EC2 instances behind an ASG.

For load balancing we used ELBv1. We also tried NLB, but the 350-second timeout, plus the broken behavior of silently closing connections once that timeout hits, was a no-no. ELBv1 doesn't silently close connections, and its idle timeout can be set to 1h. We then configured pgbouncer to close idle connections after 55 minutes. ELBv1 also doesn't have the timing issues with health checks.
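
The relevant pgbouncer.ini bits looked roughly like this (exact values aside, the point is that pgbouncer closes idle connections before the ELB's 1h idle timeout kicks in):

  [pgbouncer]
  ; close idle connections at 55 minutes, just under the ELB's 1h idle timeout
  client_idle_timeout = 3300
  server_idle_timeout = 3300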

IMO NLB may be performant, but that's the only good thing about it; in every other category it is crap, and for this use case ELBv1 isn't a bottleneck.

If you plan to revise your setup, be aware that pgbouncer is single-threaded, so throwing multicore machines at it won't do much good. If CPU ends up being the bottleneck, it's better to use a large single-core instance and then add more instances. The only problem is that as you add more instances you increase the number of connections to the database, but you could run an additional process that monitors the ASG and dynamically adjusts the number of server-side connections per instance (which doesn't require restarting pgbouncer).
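
A rough sketch of what that monitoring process could look like (not production code; names, ports and the connection budget are made up, and it assumes you reach pgbouncer's admin console with psycopg2 and adjust the pool size with its SET command):

  # Rough sketch: watch the pgbouncer ASG and shrink/grow each instance's
  # per-database pool so the total stays under Postgres' max_connections.
  # ASG name, hosts and the budget below are illustrative only.
  import time
  import boto3
  import psycopg2

  ASG_NAME = "pgbouncer-asg"                      # hypothetical
  MAX_SERVER_CONNECTIONS = 400                    # budget across all pgbouncers
  PGBOUNCER_HOSTS = ["10.0.1.10", "10.0.1.11"]    # or discover them from the ASG

  def in_service_count(asg_client):
      groups = asg_client.describe_auto_scaling_groups(
          AutoScalingGroupNames=[ASG_NAME])["AutoScalingGroups"]
      return sum(1 for i in groups[0]["Instances"]
                 if i["LifecycleState"] == "InService")

  def set_pool_size(host, pool_size):
      # pgbouncer's admin console accepts SET for runtime settings,
      # so no restart is needed (as mentioned above).
      conn = psycopg2.connect(host=host, port=6432,
                              dbname="pgbouncer", user="pgbouncer")
      conn.autocommit = True  # the admin console doesn't do transactions
      with conn.cursor() as cur:
          cur.execute(f"SET default_pool_size = {int(pool_size)}")
      conn.close()

  asg = boto3.client("autoscaling")
  while True:
      instances = max(1, in_service_count(asg))
      per_instance = max(1, MAX_SERVER_CONNECTIONS // instances)
      for host in PGBOUNCER_HOSTS:
          set_pool_size(host, per_instance)
      time.sleep(60)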

BTW: since version 1.12.0 you can also run multiple pgbouncer processes listening on the same port, which can help make better use of extra cores if you choose multicore instances.
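
That's the so_reuseport setting, i.e. something like this in each process's pgbouncer.ini:

  ; pgbouncer 1.12+: allow several pgbouncer processes to bind the same listen port
  so_reuseport = 1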


I, too, am curious how it'll fare compared to DIY PgBouncer. Relatedly, I'm building something that solves a superset of these classes of struggles. I'd love to pick your brain, if you're interested! Do you mind shooting me a message? (email in profile)


Why did you not use ECS with EC2 instead of patching HAProxy? With the Amazon AMI it is pretty straightforward.


If I were to do it again, I'd consider either that or building it directly on EC2, preparing the AMI with Packer or something, and handling scaling with an autoscaling group.


Yes, for about a year now, both in the console (alacritty) and vscode, and I think it's great. Granted, I mostly ignore it, which is a good thing.

I frequently change the terminal emulator font size, and each size oddly changes the "aspect ratio" – for lack of a better term – of both Plex and any other monospace font. I find it weird, but it does sometimes help to have a somewhat narrower variant of Plex, especially for side-by-side terminals.


This looks like an interesting alternative to Gravitational Teleport.


Do you run into problems with Lambda's 5-minute maximum execution time for those kinds of operations? I'd like to do something similar to this for both RDS and DynamoDB, but the execution time will often surpass 5 minutes, meaning I'd have to run a Step Functions worker on EC2 or ECS. That opens up a whole bunch of complexity with managing the worker code and its deployment, which I'd rather avoid if possible.


With the current implementation, no problems hitting the limit. As mentioned in my comment below, our anonymization query would be the heaviest, but it's designed to be quick since we don't care about unique values for most data.

If we did, though, Fargate is a great solution for it, but you wouldn't be able to feed data back into the next step without some additional complexity. You could have the next step poll an SQS queue, an S3 object, a database entry, etc. for the next bit of data it needs, and just fail until it finds it; once the Fargate task (or whatever) has done its job and placed the data in your location of choice, the state machine can continue.
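
A rough sketch of that "fail until it shows up" step, assuming the hand-off goes through S3 (the bucket/key fields in the event are made up); Step Functions' Retry on the raised error does the waiting:

  # Rough sketch of a polling Lambda step: return the result if the Fargate
  # task has written it, otherwise raise so Step Functions retries later.
  import json
  import boto3
  from botocore.exceptions import ClientError

  s3 = boto3.client("s3")

  class ResultNotReady(Exception):
      pass

  def handler(event, context):
      bucket = event["result_bucket"]   # hypothetical input fields
      key = event["result_key"]
      try:
          obj = s3.get_object(Bucket=bucket, Key=key)
      except ClientError as err:
          if err.response["Error"]["Code"] == "NoSuchKey":
              # The Fargate task hasn't finished yet; let the retry policy wait
              raise ResultNotReady(f"{key} not in {bucket} yet")
          raise
      return json.loads(obj["Body"].read())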


I'd like to have metadata about the materials used in the lenses, specifically info about wetting agents. My go-to lens is Acuvue Moist, which has the "Lacreon" wetting agent, aka polyvinylpyrrolidone. When my lens store stopped carrying Moist, I had a hard time figuring out which other brand would have an equivalent gradual moisture release effect, given the jungle of proprietary material names. I'm not saying I'd like to see the chemical formula for each lens (although that would be nice too), but some physical parameters beyond brand name and type of lens would be useful.


That sounds interesting, but if you've been prescribed Acuvue Moist, by law you can only purchase Acuvue Moist. So, although this info would certainly be interesting (and maybe reduce some customer confusion), it wouldn't actually be actionable.


You're US-only, then?

Here in Europe I can buy any contact lens I'd like, but I'm still short on information as to which ones might be better, or why.


"JSON documents are stored relationally, not as a blob/jsonb."

How is the transformation from a structured document to a flat set of tables designed, akin to what an object-relational mapper would do?


The transformation is performed at the ToroDB layer. Each document is analyzed and several steps are performed:

- The document is received in BSON format (as per the MongoDB wire protocol) and transformed into a KVDocument. KVDocument is an internal implementation, an abstraction of the concrete representation of a JSON document (i.e., hierarchical, nested sets of key-value pairs).

- Then, the KVDocument is split by nesting level; the resulting pieces are called subdocuments.

- Each subdocument is further split into a subdocument type and the data. The subdocument type is basically an ordered set of the data types of that subdocument.

- Subdocuments are matched 1:1 to tables. If a table already exists for the given subdocument type, the subdocument is stored directly there; if not, a new table for that type is created on the fly. This means there is also a 1:1 mapping between attribute names (columns) and key names, which makes it very readable from a SQL user's perspective.

- There is a table called structure that is basically a representation of the JSON object but without the (scalar) data. Think of the JSON object reduced to its braces and square brackets, plus all the keys (or array entries) that contain objects. Per level, there is a key in this structure that contains the name of the table where the data for that object is stored. This table uses a jsonb field to store the structure, but note that there's no actual data in this jsonb field.

- Finally, there's a root table which matches each document to its structure. This is used because structures are frequently reused across many documents, and it's one of the biggest factors in significantly reducing the storage required compared to, for example, MongoDB, since the "common" information of a given "type of document" is stored only once.
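
To make the split more concrete, here is a much-simplified sketch (not ToroDB's actual code): scalars at each nesting level become one subdocument keyed by its type signature, and the structure keeps only the shape plus a table reference per level. The table naming and everything else is illustrative; the real implementation also handles arrays, type conflicts and existing tables.

  # Much-simplified sketch of the level-by-level split (illustrative only)
  def split(doc):
      """Return (subdocuments, structure) for a nested dict.

      Each subdocument is (type_signature, scalar_values); the signature
      decides which table ("subdocument type") it belongs to. The structure
      keeps only the shape, with a table reference per level and no scalar
      data, which is what the jsonb column would store.
      """
      subdocs = []

      def walk(node):
          scalars = {k: v for k, v in node.items() if not isinstance(v, dict)}
          signature = tuple(sorted((k, type(v).__name__) for k, v in scalars.items()))
          subdocs.append((signature, scalars))
          structure = {"_table": "t_" + str(abs(hash(signature)) % 10**6)}
          for key, value in node.items():
              if isinstance(value, dict):
                  structure[key] = walk(value)
          return structure

      return subdocs, walk(doc)

  subdocs, structure = split(
      {"name": "toro", "specs": {"weight": 900, "origin": {"country": "ES"}}})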

This information and more will be shortly added to the project's wiki. However, it's very easy to see if you run ToroDB and look at the created tables :)

Note: I'm one of the authors of ToroDB


Can you briefly explain the advantage of tearing apart the JSON document, rather than just storing it as JSONB and using the various JSONB functions/operators/indexes?


He did explain that - it can yield significant space savings where there is a common "type of document" because the field names do not need to be stored in each document instance, the way they would have to be in JSONB. Also, it means you can query these tables with normal Postgres query tools and they will actually make sense.


This is a pretty interesting approach, and it doesn't seem to be tied to any particular database implementation.

Would it be possible to factor this out into a JSON -> SQL translation function, which could then be used by various backends (effectively consuming the JSON and spitting out the CREATE TABLE and INSERT statements)?


Indeed.

It's more than a JSON-to-SQL translation, but it could definitely use various backends. It has some plpgsql Postgres code and some data types to speed up operations (saving some round trips to the database), but it wouldn't be hard to port it to other backends :)
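
As a toy illustration of what such a translation step could emit for one flat subdocument (identifier quoting, escaping and real type mapping are glossed over; table and column names are made up):

  # Toy illustration only: emit DDL/DML for a single flat subdocument
  SQL_TYPES = {"str": "text", "int": "bigint",
               "float": "double precision", "bool": "boolean"}

  def to_sql(table, row):
      cols = ", ".join(f"{k} {SQL_TYPES[type(v).__name__]}" for k, v in row.items())
      create = f"CREATE TABLE IF NOT EXISTS {table} ({cols});"
      names = ", ".join(row)
      values = ", ".join(repr(v) for v in row.values())
      insert = f"INSERT INTO {table} ({names}) VALUES ({values});"
      return create, insert

  print(*to_sql("subdoc_t1", {"name": "toro", "weight": 900}), sep="\n")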


Is it possible to quantify the savings of storage and IO?


There are some benchmarks in the following presentation: http://www.slideshare.net/8kdata/toro-db-pgconfeu2014 where ToroDB was presented (PostgreSQL Europe Conference, Madrid, 2014, 2014.pgconf.eu).


I'd be very interested to know more about this too. There is considerable overhead for each row in PostgreSQL (24 byte header, which must be aligned). Obviously you'll save a bit if there are repeated keys higher up, but you'll pay for the foreign key references to those rows in the children.


It is true that PostgreSQL has a high per-row overhead, but the storage savings are still significant. Please see the presentation pointed out earlier.


Can this be made to use HStore?


This is my question. How is the data actually broken down into tables/columns? I can't find a schema anywhere in the source, but maybe I just don't know where to look.

Also the code talks an awful lot about DB cursors, which indicates that this is not really taking advantage of either SQL or the relational model at all.


You might find Revenj (https://github.com/ngs-doo/revenj) interesting too. It maps a DSL schema to various Postgres objects and offers a 1:1 mapping between views in the DB and incoming/outgoing JSON.


zrail, it's difficult to find the schema and/or tables in the source code, as all of them are 100% dynamic. See my comments above about the internal workings of ToroDB. Or give it a try: run it and check how the data is laid out!

Regarding the use of cursors, they are absolutely necessary, as MongoDB has the concept of a session, and queries may request further results in subsequent packets. However, I don't see how this prevents taking advantage of the relational model. ToroDB definitely does that, as you can see if you look at the schema of the created tables.


I would also be very interested in how this kind of breakdown occurs. It's not uncommon that I have to store arbitrary structures, and if there is a better way than a blob, one that can be abstracted away by an app, that would be quite useful on its own, even without the MongoDB protocol.

