I made a custom client for the ChatGPT API so that I can template and chain together prompts to automate content generation. I only just finished the workflow feature for prompt chaining (where one output feeds into one or more prompts), but I personally think it's cool and has lots of applications.
A little rough around the edges, so probably not ready for a Show HN yet.
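For illustration only, here is a minimal sketch of the prompt-chaining idea (this is not the commenter's client; it assumes the official openai npm package and an OPENAI_API_KEY in the environment):

    // Rough sketch of prompt chaining: run one prompt, then fan its output
    // into one or more follow-up prompts. Not the commenter's implementation.
    import OpenAI from "openai";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    async function runPrompt(prompt: string): Promise<string> {
      const res = await client.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [{ role: "user", content: prompt }],
      });
      return res.choices[0].message.content ?? "";
    }

    async function chain(): Promise<string[]> {
      // Step 1: generate an outline from a templated prompt.
      const outline = await runPrompt("Write a 3-point outline about prompt chaining.");
      // Step 2: feed that single output into one or more downstream prompts.
      return Promise.all([
        runPrompt(`Expand this outline into a blog intro:\n${outline}`),
        runPrompt(`Turn this outline into three tweets:\n${outline}`),
      ]);
    }

    chain().then((outputs) => outputs.forEach((o) => console.log(o)));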
Supporting Quickwit and its query language is definitely feasible and would not require Quickwit to have SQL support. However, we've got SQL DBs on our roadmap at the moment, so it might be a while until we get to Quickwit.
You mentioned above that BigQuery reduces cost. I am surprised by that assertion, tbh. Can you point out ways in which Logflare uses it that makes it so (for ex, is it tiered-storage with a BQ front-end)?
How does Logflare's approach contrast with other entrants like axiom.co/99 who are leveraging blob stores (Cloudflare R2) for storage and serverless for querying for lower costs?
Multiple pluggable storage/query backends (like Clickhouse) is all good, but is there a default that Logflare is going to recommend / settle on?
Are there plans to go beyond just APM with Logflare (like metrics and traces, for instance)?
I guess, at some level, this product signals a move away from Postgres-for-everything stance?
> Can you point out ways in which Logflare uses it that makes it so (for ex, is it tiered-storage with a BQ front-end)?
After 3 months BigQuery storage ends up being about half the cost of object storage if you use partitioned tables and don't edit the data.
> How does Logflare's approach contrast with other entrants like axiom.co/99 who are leveraging blob stores (Cloudflare R2) for storage and serverless for querying for lower costs?
Haven't really looked at their arch but BigQuery kind of does that for us.
> Multiple pluggable storage/query backends (like Clickhouse) is all good, but is there a default that Logflare is going to recommend / settle on?
tbd
> Are there plans to go beyond just APM with Logflare (like metrics and traces, for instance)?
Yes. You can send any JSON payload to Logflare and it will simply handle it. Official OpenTelemetry support is coming, but it should just work if your library can send it over as JSON, and you can send it metrics too.
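For illustration, a minimal sketch of pushing such a payload over HTTP. The endpoint path, "source" query parameter, and header name below are assumptions rather than something stated in this thread, so check the Logflare docs for the exact ingestion API.

    // Rough sketch: send an arbitrary JSON event to Logflare over HTTP.
    // The URL, "source" query parameter, and X-API-KEY header are assumptions
    // from memory -- verify against the Logflare docs before using.
    const LOGFLARE_API_KEY = process.env.LOGFLARE_API_KEY!;     // hypothetical env vars
    const LOGFLARE_SOURCE_ID = process.env.LOGFLARE_SOURCE_ID!;

    async function sendToLogflare(event: Record<string, unknown>): Promise<void> {
      const res = await fetch(
        `https://api.logflare.app/logs?source=${LOGFLARE_SOURCE_ID}`, // assumed endpoint
        {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            "X-API-KEY": LOGFLARE_API_KEY, // assumed header name
          },
          body: JSON.stringify(event),
        },
      );
      if (!res.ok) throw new Error(`Logflare ingest failed: ${res.status}`);
    }

    // Any JSON shape works; metrics can ride along as ordinary fields.
    sendToLogflare({
      message: "checkout completed",
      metadata: { latency_ms: 42, region: "us-east-1" },
    }).catch(console.error);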
> I guess, at some level, this product signals a move away from Postgres-for-everything stance?
Postgres will last you a very long time usually but at some point with lots of this kind of data you'll really want to use an OLAP store.
With Supabase Wrappers you'll be able to easily access your analytics store from Postgres.
> After 3 months BigQuery storage ends up being about half the cost of object storage if you use partitioned tables and don't edit the data.
It sounds like it's just standard GCS Nearline pricing plus a BigQuery margin fee. Raw Nearline is cheaper to store, but has a per-GB access fee that'll hurt if you re-process old logs.
Interestingly, BigQuery's streaming read free tier of 300 TB/mo makes BigQuery a fun hack for storing old but still frequently read data, even if it's e.g. backup blobs.
BigQuery reduces storage costs: even before their recent pricing change, the cost per GB [0] is on par with and slightly lower than S3 storage costs [0], which we can use as an estimated market price for data storage. BigQuery makes money off querying costs, which we take steps to optimize and minimize with table partitioning, caching, and UI design, both client side and server side. Table partitioning also cuts per-GB storage costs in half by switching to long-term logical storage after 90 days.
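As a rough back-of-envelope illustration of that comparison (the list prices below are ballpark assumptions, not figures quoted in this thread; check current GCP and AWS pricing):

    // Back-of-envelope monthly storage cost for 1 TB of append-only logs.
    // Assumed ballpark list prices (not authoritative -- check current pricing):
    //   BigQuery active logical storage  ~$0.020 / GB / month
    //   BigQuery long-term storage       ~$0.010 / GB / month (partition untouched for 90 days)
    //   S3 Standard                      ~$0.023 / GB / month
    const GB = 1024;

    const bigQueryMonthly = (partitionAgeDays: number): number =>
      GB * (partitionAgeDays >= 90 ? 0.01 : 0.02);

    const s3Monthly = GB * 0.023;

    console.log(bigQueryMonthly(30).toFixed(2));  // "20.48" -- active partitions
    console.log(bigQueryMonthly(120).toFixed(2)); // "10.24" -- roughly half once long-term kicks in
    console.log(s3Monthly.toFixed(2));            // "23.55" -- S3 Standard for comparison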
Of course, using blob storage might result in comparable storage cost; however, relying on blob storage would likely increase querying costs (in terms of GET requests) as well as the complexity of querying across multiple stored objects/buckets, as opposed to relying on BQ to handle the querying.
In the long term, we would likely continue using BQ for our platform infrastructure, unless GCP changes their pricing in a way that adversely affects us. When it comes to self-hosting, it would of course depend on how much complexity one would like to take on, and out-sourcing the storage and querying management is a better option in most cases.
We would not rule out such features, but we consider them nice-to-haves that are very far down the priority list. At the moment we're mostly focused on improving integration with the Supabase products and platform. It is actually possible to log stack traces, and this is supported out of the box for certain integrations such as the Logflare Logger Backend [2].
Postgres without any extensions is not optimized for columnar storage, and would not be an optimal experience for such large-scale data storage and querying. It is also not advisable to use the same production datastore for both your application and your observability; it is better to keep them separate to avoid coupling. If one really wants to use the same Postgres server for everything, there are extensions that allow Postgres to work as a columnar store, such as citus[3] and timescaledb[4], and we have not ruled out supporting these extensions as Logflare backends.
Supabase-specific SDKs are still in the works. However, if you're using the Logflare service as is, there is a pino transport[0] available for sending events directly to Logflare.
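For anyone wanting a concrete starting point, a minimal sketch of wiring pino to that transport might look like the following; the option names (apiKey, sourceToken) are from memory, so check the pino-logflare README for the current API.

    // Rough sketch: forward pino logs to Logflare via pino-logflare.
    // Option names below are assumptions from memory -- see the README.
    import pino from "pino";
    import { createWriteStream } from "pino-logflare";

    const stream = createWriteStream({
      apiKey: process.env.LOGFLARE_API_KEY!,        // hypothetical env vars
      sourceToken: process.env.LOGFLARE_SOURCE_ID!,
    });

    const logger = pino({}, stream);

    // Structured fields end up as queryable metadata in Logflare.
    logger.info({ project: "acme", latency_ms: 87 }, "request completed");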
Hi, I'm one of the Logflare devs and I work on observability at Supabase.
Great question. To directly address some of the tools you mentioned:
- Logstash is the transport and transformation layer (along with Filebeat) in the Elastic stack, and it performs the same functions as Vector. It is out of scope for Logflare, which focuses on acting as a centralised server to point all your logging pipelines at.
- Kibana is the visualisation layer of the Elastic stack. For Supabase, this functionality is handled by the Supabase Studio, and its reporting capabilities will eventually grow to match APM services like Sentry.
- Splunk's core is not open source and is very much geared towards large-contract enterprise customers. Their main product also leans much more towards visualisation than bare log analysis.
When it comes to a logging server/service, you’d consider the following factors:
- Cost. Logging is quite expensive, and the way that Logflare leverages BQ (and in the future, other OLAP engines) cuts down storage costs greatly.
- Reliability. The last thing that you would want is for your application to take high load and go down, but you’re unable to debug it because the high traffic led to high log load and subsequently took down your o11y server. Logflare is built on the BEAM and can handle high loads without breaking a sweat. We’ve handled over 10x average load for ingestion spikes and Logflare just chugs along.
- Querying capabilities. Storing logs isn't enough; you need to effectively debug and aggregate your logs for insights. This incurs both querying costs and additional complexity, in the sense that your storage mechanism must be able to handle such complex queries without breaking the bank. Logflare optimises for this with table partitioning and caching to keep costs low. This allows Supabase to expose all logging data to users and let them perform joins and filters within the Logs Explorer to their hearts' content.
> - Reliability. The last thing that you would want is for your application to take high load and go down, but you’re unable to debug it because the high traffic led to high log load and subsequently took down your o11y server. Logflare is built on the BEAM and can handle high loads without breaking a sweat. We’ve handled over 10x average load for ingestion spikes and Logflare just chugs along.
Can logflare scale out into multiple containers / vms / machines? Is Supabase currently deploying something like autoscaling with kubernetes or something?
For the Logflare infra, we manage a cluster of around 6 VMs. Not a very complex setup, and no need for k8s as we have a monolith architecture. We also don't scale out horizontally much, since the cross-node chatter increases with each additional node.
Job title: PT/FT Contractor.
Job description: To work on specific features across the Supabase product stacks, which may vary depending on assigned team. This is purely software development contracting with clearly scoped features and requirements.
Salary range: US$50/h.
Qualifications or experience required: 3+ years of experience with Elixir. JS, Rust, Go, and other languages are a plus.
What the successful job applicant will be working on: Immediate capacity is being sought for the Logflare product, which involves LiveView improvements for the Logflare UI, data ingestion and querying improvements, and interesting feature development areas involving OpenTelemetry, Clickhouse, and BigQuery. At Supabase scale, we value high-performance code and innovative design that push the boundaries of our analytics infrastructure.
Send applications to elixir@supabase.io and include your GitHub and resume.