Hacker News | tcdent's comments

I appreciate the effort you put into mapping semantics so language constructs can be incorporated into this. You’re probably already seeing that the amount of terminology, how those terms interact with each other, and the way you need to model it have ballooned into a fairly complex system.

The fundamental breakthrough with LLMs is that they handle semantic mapping for you and can (albeit non-deterministically) interpret the meaning and relationships between concepts with a pretty high degree of accuracy, in context.

It just makes me wonder if you could dramatically simplify the schema and data modeling by incorporating more of these learnings.

I have a simple experiment along these lines that’s especially relevant given the advent of one-million-token context windows, although I don’t consider it a scientifically backed or production-ready concept, just an exploration: https://github.com/tcdent/wvf


Thanks for the careful read — the "schema is ballooning" observation is real and I've felt it building this. You're pointing at a genuine design tension.

My counter, qualified: deterministic consolidation is cheap and reproducible in a way LLM-in-the-loop consolidation isn't, at least today. Every think() invocation is essentially free (cosine + entity matching + SQL). If I put an LLM in the loop, the cost is O(N²) LLM calls per consolidation pass — for a 10k-memory database, that's thousands of dollars of inference per tick. So for v1 I'm trading off "better merge decisions" against "actually runs every 5 minutes without burning a budget."
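To make "cheap and reproducible" concrete, here's a minimal sketch of what a deterministic dedupe/consolidation pass looks like — names and threshold are illustrative only, not the actual YantrikDB internals:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def consolidation_pass(memories, threshold=0.92):
    # One deterministic pass: collapse near-duplicate memories by
    # embedding similarity. No LLM calls; cost is pure vector math.
    merged = []
    for mem in memories:
        dup = next((m for m in merged
                    if cosine(m["embedding"], mem["embedding"]) >= threshold),
                   None)
        if dup:
            dup["hits"] += 1  # reinforce the existing memory instead of storing a paraphrase
        else:
            merged.append({**mem, "hits": 1})
    return merged
```

The naive pairwise version is still O(N²) in similarity checks, but each check is nanoseconds of arithmetic rather than an LLM call, which is the whole trade.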

On 1M-context-windows: I think they push the "vector DB break point" out but don't remove it. Context stuffing still has recall-precision problems at scale (lost-in-the-middle, attention dilution on unrelated facts), and 1M tokens ≠ unbounded memory. At 10M memories no context window saves you.

wvf is interesting — just read through. The "append everything, let the model retrieve" approach is the complement of what I'm doing: you lean fully into LLM semantics, I try to do the lookup deterministically. Probably both are right for different workloads. Yours wins when you have unbounded compute + a small corpus; mine wins when you have bounded compute + a large corpus that needs grooming.

Starring wvf now. Curious if you're seeing meaningful quality differences between your approach and traditional retrieval at scale.


Appreciate the thoughtful reply.

Absolutely agree the deterministic, performance-oriented mindset is still essential for large workloads. Are you expecting that this supplements a traditional vector/semantic store, or that it supersedes it?

My focus has absolutely been on relatively small corpora, which is supported by the design forcing a subset of data to be included. There are intentionally no conventions for things like "we talked about how AI is transforming computing at 1AM"; instead it attempts to focus on "user believes AI is transforming computing", so hopefully there's less of the context poisoning that happens with current memory systems.

Haven't deployed WVF at any scale yet; just a casual experiment among many others.


Supplements, definitely — for a specific workload. General document retrieval at scale (millions of chunks, read-heavy, doc-search patterns) is well-served by existing vector stores; YantrikDB doesn't compete on throughput. Where it's meant to supersede is the narrow case of agent memory: small-to-medium corpus, write-heavy with paraphrases every turn, lives for the lifetime of an agent identity, nothing curating the input.

Your "user believes X" framing is exactly the episodic/semantic split cognitive psych has been calling this for decades. YantrikDB exposes it via memory_type ∈ {episodic, semantic, procedural}. Your intuition about context poisoning from over-specific episodic details lines up with how I've been thinking about it — "we talked about AI at 1am" is high-noise low-signal for future retrieval. The design bet is consolidation + decay should burn episodic into semantic over time, and episodic-only memories should fade faster.

What does WVF stand for? Curious what you've been experimenting with.


To me, the OP’s reply reeks of LLM, along with many others from them in this thread.

I would hope that their replies are from an actual person, knowing they’re interacting with people in a similar field as themselves, and asking for criticism from real people in the top comment.


It really doesn't bother me. The persistent flags about how people think something was AI generated are far noisier.

Technically all of my replies are from an LLM, too; they all went through transformer-backed STT.


Exactly this. It’s counterintuitive for most people, but the more complexity you add to the systems (the more organic they are), the more sustainably successful they become.

Everyone is looking for a simple solution, but simple solutions don't take into account human social dynamics.


Nah, taking the risk is even more fun when the thing you're modifying holds more value.

Chopping the fenders on a Porsche 911 to install a widebody kit does not have the same weight as rolling the seams on a Jeep Cherokee.


All things being equal, sure, but I personally am way more likely to mod the Cherokee than the Porsche

I'd say it's an even split. Half the Jeeps on the road and on the trails are modified. On the road maybe 1/10 of Porsches are modified, but on the track 90% are.

Big difference between bolt-ons vs deeper mods too.


I don't think you're wrong, but if you really want to re-think the approach, building an orchestration layer for Firecracker like every other company in the space is doing is probably not it.

Wonder what you are thinking of then?

This is a fantastic idea.

On a nonzero number of occasions I have priced the cost of running an inference server with a model that is actually usable and the annual cost is astronomical.


And they said turning Lead into Gold was just heresy.


5.5 min to train on a PDP-11? You mean to tell me we could have been doing this all along???


Yes. The Cray supercomputers from the 80s were crazy good matmul machines in particular. The quad-CPU Cray X-MP (1984) could sustain 800 MFLOPS to 1 GFLOPS, and with a 1 GB SSD, had enough compute power and bandwidth to train a 7-10M-parameter language model in about six months, and infer at 18-25 tok/sec.

A mid-90s Cray T3E could have handled GPT-2 124M, 24 years before OpenAI.

I also had a punch-card computer from 1965 learn XOR with backpropagation.

The hardware was never the bottleneck, the ideas were.
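For anyone curious, "backprop on XOR" needs nothing beyond 1960s-scale arithmetic. A tiny 2-2-1 sigmoid network in plain Python — hyperparameters are illustrative, and nets this small can occasionally stall in a local minimum depending on the random init:

```python
import math
import random

def train_xor(epochs=20000, lr=0.5, seed=1):
    # 2-2-1 network with sigmoid units, trained by plain online backprop.
    rng = random.Random(seed)
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 2 inputs + bias
    w_o = [rng.uniform(-1, 1) for _ in range(3)]                      # 2 hidden + bias
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

    def forward(x):
        h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        o = sig(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
        return h, o

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    start = mse()
    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x)
            d_o = (o - t) * o * (1 - o)               # output-unit delta
            d_h = [d_o * w_o[i] * h[i] * (1 - h[i])   # hidden-unit deltas
                   for i in range(2)]
            for i in range(2):
                w_o[i] -= lr * d_o * h[i]
            w_o[2] -= lr * d_o
            for i in range(2):
                for j in range(2):
                    w_h[i][j] -= lr * d_h[i] * x[j]
                w_h[i][2] -= lr * d_h[i]
    return start, mse()
```

Nine weights, a few multiplies per update — well within reach of decades-old hardware, which is the point.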


Post-quantum crypto is a good example of this. Lattice-based schemes were theorized in the 90s, but they took decades to actually reach production. The math existed, the hardware existed, and the ideas for making it work were just not there yet.


> The hardware was never the bottleneck, the ideas were.

For sure. Minsky and Papert really set us back.


They should have lived to see the results of the bitter lesson.


Minsky came close (d. 2016) -- although he may have had other interests later in life, if the Epstein file dumps are to be believed.


The replies lol.

"Yes" Proceeds to talk about AI.


DSPy is cool from an integrated perspective but as someone who extensively develops agents, there have been two phases to the workflow that prevented me from adopting it:

1. Up until about six months ago, prompt work was meticulous and somewhat tricky: modifying prompts by hand, incorporating terminology with very specific intent, observing edge cases, and essentially steering the LLM toward the intended outcome. This is what the industry commonly referred to as prompt engineering.

2. With the current state of SOTA models like Opus 4.6, the agent that is developing my applications alongside me often has a more intelligent and/or more generalized view of the system that we're creating.

We've reached a point in the industry where smaller models can accomplish tasks that were reserved for only the largest models. And now that we use the most intelligent models to create those systems, the feedback loop which was patterned by DSPy has essentially become adopted as part of my development workflow.

I can write an agent and a prompt as a first pass using an agentic coder, then continue to iterate on my prompts, based on the coding agent's observation of the agent's performance, until I arrive at satisfactory results. This is further supported by all of the documentation, specifications, data structures, and other I/O aspects of the application that the agent integrates, which the coding agent can take into account when constructing and evaluating agentic systems.

So DSPy was certainly onto something, but the level of abstraction, at least in my personal use case, has moved up a layer instead of being integrated into the actual system.


I think many people have the same experience! And that's the point I'm trying to make. There are patterns here that are worth adopting, whether or not you're using DSPy :)


Not worth it. It is a very significant performance hit.

With that said, people are trying to extend VRAM into system RAM or even NVMe storage, but as soon as you hit the PCI bus with the high bandwidth layers like KV cache, you eliminate a lot of the performance benefit that you get from having fast memory near the GPU die.


> With that said, people are trying to extend VRAM into system RAM or even NVMe storage

Only useful for prefill (given the usual discrete-GPU setup; iGPU/APU/unified memory is different and can basically be treated as VRAM-only, though a bit slower), since the PCIe bus becomes a severe bottleneck as soon as you offload more than a tiny fraction of the memory workload to system memory/NVMe. For decode, you're better off running entire layers (including expert layers) on the CPU, which local AI frameworks support out of the box. CPU-run layers can in turn offload model parameters/KV cache to storage as a last resort, but if you offload too much to storage (insufficient RAM cache), that overhead dominates and basically everything else becomes irrelevant.
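The bottleneck is easy to see with back-of-envelope numbers (all assumed, order-of-magnitude only): decode is roughly memory-bandwidth-bound, so a ceiling on tokens/sec is bandwidth divided by the bytes that must be streamed per token.

```python
GB = 1e9

def max_tok_per_sec(bandwidth_gb_s, bytes_per_token_gb):
    # Bandwidth-bound ceiling: each generated token must stream the
    # active weights (plus KV cache) through the memory interface once.
    return (bandwidth_gb_s * GB) / (bytes_per_token_gb * GB)

vram = max_tok_per_sec(1000, 14)  # ~1 TB/s HBM, ~14 GB of active weights (assumed)
pcie = max_tok_per_sec(32, 14)    # PCIe 4.0 x16, same bytes crossing the bus
```

Same model, ~70 tok/s ceiling with weights in VRAM versus ~2 tok/s once those bytes have to cross the PCIe bus every token.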

