Hacker Newsnew | past | comments | ask | show | jobs | submit | karmaniverous's commentslogin

If you know, you know. But not everybody does.


Well, that's the nice thing about a library... when it works, all that effort happens under the hood. :)

By "regular ole" I presume you mean some flavor of RDBMS. Those have significant issues at scale that the newfangled platforms don't... but as you see there's a price to be paid at design time.

If I do my job right, you get to have your cake & eat it too!


I mean srsly why use computers at all right.


DynamoDB is a very crippled product what does it have to do with using computers.


What about ScyllaDB


Can't really comment have not ever used it. In general I'd err on the side of using ACID SQL DB over NoSQL unless you have some very specific use case that will highly benefit from NoSQL.


The key focus of EM is the implementation of a multi-entity data model along the lines of the Single Table Design Pattern.

Recall that, in DDB, your index has two parts: hash & range key. If you want to have many entities in the same table, then you need a way of distinguishing between different entities, and a way of locating an individual record. In your primary index, those account for your hash and range keys, respectively: the hash key is your entity differentiator, and the range key is your entity id (which may come from a different record property from one entity to another). If you follow the development of the article, you’ll see how this plays out with variously constructed keys across different indexes.

Now, forget EM sharding for a minute and let DDB manage your sharding. Say you launch your application with little data and a single shard. Over time your data scales & spills over onto additional shards. When you perform a search, DDB has no way of knowing which shards are relevant so it has to search ALL of them.

But from the application side, your data scaled over TIME. Therefore, if you know which shards were created when, you could limit a time-based search only to the shards that are relevant to the search parameters. And a LOT of searches involve a time window.

Within the context of EM, when I say a “shard”, I am talking about a unique hash key value like `user!1F`, where `user` is the entity type and `1F` is the shard key. These may or may not map to physical DDB shards, and the good news is that you don’t NEED to care… DDB will flex if you don’t.

EM has a lot of features that greatly streamline the dev experience when operating against a DDB table with a multi-entity data model. You don’t HAVE to use the sharding feature… it’s literally just a config item, everything else happens behind the scenes. But when you DO use it, EM splits a search across sharded data into MANY parallel searches, one per shard, then assembles the returns into a coherent result with a “page key” that is actually a compressed map of ALL the underlying page keys. You don’t have to care about THAT, either… just pass the compressed string back to EM and it will rehydrate the page keys & perform the next set of searches.

So you get to choose your own adventure… you can run every entity on a single “shard” or run in parallel. I’d just keep an eye out for any drop in performance at scale and add a shard bump when I see it.

Also worth noting: EM is actually platform-agnostic. There is a companion repo that contains the DDB-specific client. This is still a bit in flux btw so be kind lol. Anyway the point is that other platforms that don’t have AWS’ resource footprint may not handle sharding as well, and EM will be able to render effectively the same result.

Hope that answers your question!

P.S. Worth noting: in addition to searching across multiple SHARDS, an EM query can also search across multiple INDEXES. Say you want to query on “name” and you want to query both your firstName and lastName indexes with the same “name” value. With EM, this is a SINGLE query that returns a combined, paged, deduped, sorted result set. Handy.


This approach is not really compatible with the single-table design pattern, which has some significant advantages. The point where performance degrades due to the issues you mentioned would be a good point to start applying sharding.


With due respect, I think you've misunderstood the single-table design pattern.

Because you've introduced static hash keys ("user", "email", etc) you've had to manually partition which DDB should do for you automatically. And while you covered the partition size limit you're also likely to have write performance issues because you're not distributing writes to the "user" and "email" hash keys.

Single-table design should distribute writes and minimize roundtrips to the database. user#12345 as a hash key and range keys of 'User', 'Email#jo@email.com', 'Email#joe@email.com', etc achieve those goals. If you need to query and/or sort on a large number of attributes it's going to be easier, faster, and probably cheaper to stream data into Elasticsearch or similar to support those queries.


I'm the author, and thanks for asking! The article is really just background, submitted by a friend. Still building the demo & documentation, so my apologies for the confusion.

Entity Manager is a framework for defining, managing, and most importantly QUERYING an entity model with DynamoDB. It's actually platform-generic, so the DynamoDB-specific machinery is implemented at https://github.com/karmaniverous/entity-client-dynamodb

Entity Manager's most important feature is that it permits a simple, scheduled partition sharding configuration and then transparent, multi-index querying of data across shards with a very compact, fluent query API.

This resolves the biggest challenge of using DynamoDB at scale, which is that very large data sets MUST be sharded, and a given query can ONLY operate against a single shard. If you're querying on the basis of a related record, you won't know which shard your results will be on so you must query ALL shards.

Entity Manager reduces this to an effortless operation: once you've defined your sharding strategy for a given entity, you can forget sharding is even a thing.

For some more color on Entity Manager within the context of SQL vs NoSQL databases, please review this (much shorter!) article: https://karmanivero.us/projects/entity-manager/sql-vs-nosql


That makes me want to learn more. I will read this.

Like most been burned and frustrated by this sort of sharded DB and unperformant queries if you dont or cant use the partition key. Usually you need a second table amirite :) And seen others burned.

Especially when your intro is a Jira ticket learning on the job!


Very cool of you to say so!

I've actually been using the JS version of EM in production for over a year. It's been working flawlessly.

The TS version is a complete rewrite that factors in a BUNCH of lessons learned and is completely--maybe obsessively lol--type-safe. The query builder got a LOT of attention, and the fluent API reduces even complex queries to a super-compact, declarative coding experience.

I'm pushing a big update tonight and will then resume my focus on the demo & docs, basically the companion stuff to this one. Should be ready for use in a couple of weeks.

Thanks for the interest, it really means a lot to me!


thanks … that’s really helpful


I’m laying down a big bet on Bali! Let me share the evidence that has inspired me to take this leap. Maybe it will inspire you to jump in with me!


I don't know if this is unusual, but it's striking. From Polymarket:

ELECTION WINNER Trump: 64% Harris: 36%

POPULAR VOTE WINNER Harris: 64% Trump: 37%

Looks like it's gonna be a bumpy ride lol.


I like it, but I like PlantUML better. Shitty documentation but awesome UML support, and also supports GraphML and a bunch of smaller libs like C4.


Ok fair enough but didn't have the same ring as "best known for promoting his own design artifacts".


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: