
Thank you! Spot.io falls broadly into the same category as Vantage/Ternary/others, so the same answer as here applies: https://news.ycombinator.com/item?id=39183504.

In a sentence, these tools display the same cost data already available within AWS - where the maximum granularity is per-database or per-EC2-instance - whereas Dashdive can accurately attribute portions of usage on the same DB or instance to different features/customers.


Awesome!! We'll send you an email shortly :)


Haha, we've heard many a story like this one over the months!

An advisor suggested we offer a sort of "pricing heuristics" deliverable to sales as part of our product (e.g. "if a customer needs feature foo, add $bar to the price"), and your story makes me think he's onto something.


Thanks! It's on the roadmap :)


TL;DR - We treat the per-customer usage data as the ground truth, and the per-customer cost can vary based on parameters chosen by the Dashdive user and their preferred mental model.

At a minimum, every customer is assigned the cost resulting from the usage that is directly attributable to them. This "directly attributable" figure is obtained by "integrating" vCPU-seconds and RAM-seconds on a per-DB-query basis, per-API-invocation basis, or similar. For example, if Customer X used 5 vCPU seconds and 7 RAM-GiB-seconds due to queries over a period of 10 seconds on our RDS cluster with total capacity 1 vCPU and 2 GiB RAM, then they directly utilized 50% of vCPU capacity and [7 GiB-sec / (2 GiB * 10 sec)] = 35% of RAM capacity over that period.
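
For concreteness, here's a rough Python sketch of that arithmetic (the function name and numbers are just for illustration, not our actual code):

  # Sketch of the direct-attribution math described above,
  # using the Customer X numbers from the example.

  def direct_utilization(vcpu_seconds, ram_gib_seconds,
                         cluster_vcpus, cluster_ram_gib, window_seconds):
      """Fraction of cluster capacity a customer directly used over a window."""
      vcpu_frac = vcpu_seconds / (cluster_vcpus * window_seconds)
      ram_frac = ram_gib_seconds / (cluster_ram_gib * window_seconds)
      return vcpu_frac, ram_frac

  # Customer X: 5 vCPU-seconds and 7 GiB-seconds over a 10-second window
  # on a cluster with 1 vCPU and 2 GiB RAM.
  print(direct_utilization(5, 7, 1, 2, 10))  # -> (0.5, 0.35), i.e. 50% and 35%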

The question remains of how to distribute the cost of the un-utilized capacity over that period amongst the customers, perhaps distributing some portion to special "desired headroom" and "undesired headroom" values. As you mentioned, the answer is subjective and can vary between Dashdive users (or even over time for the same Dashdive user, e.g. a user decides they can reduce their desired headroom from 30% to 20%). The only sensible approach in our opinion is to make this configurable for the user, with sane and explicit defaults.

Let's go through both your examples to illustrate. In example 1, each of the 4 equally sized customers would be assigned only 12.5% of the total cost of the cluster. The dashboard would show that, by default, 30% headroom is desired, so out of the remaining 50% capacity, 30% would be marked as desired headroom, and 20% would be marked as undesired headroom. The user can override the desired headroom percentage. Although in our opinion it is most correct for all headroom to be treated as a fixed cost in multitenant scenarios, we would also provide the option to distribute both/either headroom types amongst all customers, either proportionally or equally.
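
To make the example-1 split concrete, here's a rough Python sketch (the names, the $100/month figure, and the "headroom as fixed cost" default are illustrative only):

  # Illustrative sketch of the example-1 split (not actual Dashdive code).
  # Four equal customers each directly use 12.5% of a cluster; 30% of
  # capacity is desired headroom, the remainder is undesired headroom.

  cluster_cost = 100.0                      # hypothetical monthly cost of the cluster
  direct_usage = {"a": 0.125, "b": 0.125, "c": 0.125, "d": 0.125}
  desired_headroom = 0.30                   # user-configurable; default shown above

  used = sum(direct_usage.values())         # 0.5
  undesired_headroom = 1.0 - used - desired_headroom   # ~0.2

  # Default behavior: each customer is charged only for directly attributed
  # usage; all headroom is reported as a shared fixed cost.
  per_customer_cost = {c: frac * cluster_cost for c, frac in direct_usage.items()}
  fixed_cost = (1.0 - used) * cluster_cost

  print(per_customer_cost)   # {'a': 12.5, 'b': 12.5, 'c': 12.5, 'd': 12.5}
  print(round(desired_headroom * cluster_cost),
        round(undesired_headroom * cluster_cost))   # 30 20 (desired vs. undesired, $)
  print(fixed_cost)          # 50.0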

For example 2, our model is not sophisticated enough to capture the nuance that losing only the larger customer would allow cost reduction. Assuming both customers used both databases (let's say they're replicas or shards), and 0% headroom, we would simply assign 40% of costs to the smaller customer and 60% of costs to the larger one. This is subtle, but the missed nuance is only important if $100/m is the finest-grained resource you can get. Otherwise, if you lose the 40% customer, you can switch to a 2x $60/m DB, for example.

This is a very astute callout! It has come up a couple times as a point of concern from prospective customers. Would be keen to hear if this diverges from your expectations at all.


Thank you for the very detailed response; it all sounds excellent, and it's great to see you've been thinking these things through. I don't really know what my expectations were, but this seems like a good combination of flexibility and opinions grounded in real-world experience. You're right that my second example is unlikely to occur at scale; the bigger you get, the more granular you can be in general, so I doubt that'll be an issue.


You're right; it's a bit inaccessible at the moment. We're planning to offer a more affordable tier in the next 1-2 months. A bit more context here: https://news.ycombinator.com/item?id=39178753#39186948.


Thanks! Yes, we're working on making the product more accessible. Right now, for every new customer, we have to manually provision and manage some additional infrastructure. We're worried we could quickly get overextended in both time and cost if we have to do this for lots of users in a free tier for example.

It's on our roadmap in the next 1-2 months to eliminate these manual steps and make these last parts of our infra multitenant. At that point, we plan to release a cheaper tier for individual devs.


Great point. We could definitely add features adjacent to usage-based pricing (UBP) - for example, a Stripe integration and user-defined rules to auto-calculate invoices based on per-customer cloud usage. Would that be useful?

However, it's not always possible to infer one's own costs from UBP events. In UBP products, the user defines what constitutes a "usage event" (e.g. "customer generates a PDF") and so these events can be quite disconnected from underlying cloud usage. In other words, there's nothing that prevents some "someone generated a PDF" events from incurring large amounts of EC2 usage while other "someone generated a PDF" events incur very little EC2 usage, depending on the input parameters to the workload. And in most UBP scenarios, this difference in underlying cloud usage from PDF generation to PDF generation is not taken into account; often all UBP events of a given type are billed at the same rate. In fact, we've seen this exact issue in the wild: namely, a company implementing UBP but still being unsure about profit margin because certain UBP event types had high variance in cloud usage per-event.
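
As a toy Python illustration of that variance problem (the event type and all numbers are invented):

  # Toy illustration (invented numbers): the same UBP event type can hide
  # very different underlying cloud costs, so a flat per-event price may
  # not track actual margin.
  from statistics import mean, stdev

  # (event_type, underlying EC2 cost in dollars) for individual billed events
  events = [
      ("pdf_generated", 0.002),   # small document
      ("pdf_generated", 0.002),
      ("pdf_generated", 0.150),   # huge document, heavy rendering
      ("pdf_generated", 0.004),
  ]

  flat_price = 0.01               # what the customer is billed per event
  costs = [cost for _, cost in events]

  print(f"mean cost/event: {mean(costs):.4f}, stdev: {stdev(costs):.4f}")
  print("per-event margin at flat price:",
        [round(flat_price - c, 3) for c in costs])  # one event is deeply negative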

One company is planning to use Dashdive's S3 storage data to charge their customers based on usage, so in some cases the data we collect can serve as a substitute for UBP.

I agree that it would be more convenient if we also offered user-defined UBP events. This way, we could be a single vendor for the folks that want both usage monitoring and usage-based billing, where the UBP events don't necessarily align super well with underlying cloud usage.


Excellent points. Now that I think about it, the two products are naturally offered together, because usage-based pricing should be based on underlying costs, and monitoring those costs is exactly what you provide. With better visibility into your costs, you could set usage-based prices and price tiers in a more principled manner.


You can use the usage and cost data Dashdive collects to identify cost spikes or ongoing inefficiencies (e.g. a particular feature is using more vCPU than should be necessary). But we won't do any automatic cost cutting for you (some products let you buy reserved instances or rightsize directly from within their app).


We actually use Kafka rather than Kinesis, although they're very similar. For writing to ClickHouse from Kafka, we use the ClickHouse Kafka sink connector: https://github.com/ClickHouse/clickhouse-kafka-connect.


We are actually trying something similar, possibly Kinesis + ClickHouse or Kafka + ClickHouse. Currently Kinesis seems easier to deal with, but there's no good integration or sink connector available to process records at scale from Kinesis into ClickHouse. Did you ever run into similar problems, where you had to process records at huge scale to insert into ClickHouse without much delay?

One more thing: Kinesis can have duplicates, while Kafka offers exactly-once delivery.


I'm not familiar with Kinesis's sink APIs, but yes I'd imagine you'll have to write your own connector from scratch.

To answer your question, though, no: in the Kafka connector, the frequency of inserts into ClickHouse is configurable relatively independently of the batch size, so you don't need massive scale for real-time CH inserts. To save you a couple of hours, here's an example config for the connector:

  # Snippet from connect-distributed.properties

  # Max bytes per batch: 1 GB
  fetch.max.bytes=1000000000
  consumer.fetch.max.bytes=1000000000
  max.partition.fetch.bytes=1000000000
  consumer.max.partition.fetch.bytes=1000000000

  # Max age per batch: 2 seconds
  fetch.max.wait.ms=2000
  consumer.fetch.max.wait.ms=2000

  # Max records per batch: 1 million
  max.poll.records=1000000
  consumer.max.poll.records=1000000

  # Min bytes per batch: 500 MB
  fetch.min.bytes=500000000
  consumer.fetch.min.bytes=500000000
You also might need to increase `message.max.bytes` on the broker/cluster side.

If you're still deciding, I'd recommend Kafka over Kinesis because (1) it's open source, so there are more options (e.g. self-hosting, Confluent, or AWS MSK), and (2) it has a much bigger community, meaning better support, more StackOverflow answers, a plug-and-play CH Kafka connector, etc.


Thanks, these configs are helpful.

