Hacker News | dchu17's comments

Gave this a run and it was pretty intuitive. Good work!


Thanks!


Yes, this is a major problem I thought about. The makeshift solution here was to redact the “identifying information” in the press release. Even then, I benchmarked that GPT-5 could still match it back to the right ticker around 53% of the time. It does not seem to be able to recall the price of the stock in my benchmark, but to be honest I'm not entirely sure how trustworthy this benchmark is, and I may need to come up with a few more clever ways to validate it.
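Just to sketch what the redaction step might look like (the patterns and names here are hypothetical, and a real pipeline would need proper entity recognition rather than regexes):

```python
import re

# Hypothetical redaction pass: strip the obvious identifiers from a press
# release before showing it to the model. Real entity recognition would
# need more than regexes; this only illustrates the idea.
PATTERNS = [
    (re.compile(r"\(NASDAQ:\s*[A-Z]{1,5}\)"), "(TICKER REDACTED)"),
    (re.compile(r"\(NYSE:\s*[A-Z]{1,5}\)"), "(TICKER REDACTED)"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),
]

def redact(text, company_names=()):
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    for name in company_names:  # known company names supplied by the caller
        text = re.sub(re.escape(name), "[COMPANY]", text, flags=re.IGNORECASE)
    return text

release = "Acme Therapeutics (NASDAQ: ACME) announced on 2024-03-01 ..."
print(redact(release, company_names=["Acme Therapeutics"]))
# [COMPANY] (TICKER REDACTED) announced on [DATE] ...
```

The interesting part of the benchmark is that even after a pass like this, the model can often re-identify the company from the remaining text alone.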

One solution could be to get experts to write similar press releases so that the text itself is out of distribution. Alternatively, if an actual quant firm has internal models, they can just make sure there is a cutoff date on the pre-training data.

I'm curious, when you ran a quant fund, what was your approach?


We didn't use other LLMs. We built our own models, with a system designed to never leak future information: at any given timepoint, a model could only access the data the system allowed for that timepoint, gating off anything from the future. This means even training had to go through the same system.

You have to design it from the ground up with that approach. Just to give you an idea of how hard it is: when a company releases an earnings report, they can update it later with corrected information, so if you pull it at a later date you will leak future information into the past. Even basics like earnings need to be versioned by time.
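The versioning idea can be made concrete with a toy point-in-time store (the structure and names here are my own sketch, not the actual system): every value is saved along with the time it became *known*, and a query at time T can only see the latest version known on or before T.

```python
from bisect import bisect_right
from collections import defaultdict

class PointInTimeStore:
    """Toy point-in-time store: each value carries the timestamp at which
    it became known, and queries only see versions whose known-at time
    is <= the query's as-of time. Prevents corrected earnings from
    leaking into backtests of the past."""

    def __init__(self):
        # key -> sorted list of (known_at, value) versions
        self._versions = defaultdict(list)

    def put(self, key, known_at, value):
        self._versions[key].append((known_at, value))
        self._versions[key].sort()  # keep versions ordered by known-at time

    def get(self, key, as_of):
        versions = self._versions[key]
        # index of the latest version known on or before as_of
        idx = bisect_right(versions, (as_of, float("inf"))) - 1
        if idx < 0:
            return None  # nothing was known yet at this time
        return versions[idx][1]

store = PointInTimeStore()
store.put("ACME/Q1-EPS", known_at=20240115, value=1.10)  # original report
store.put("ACME/Q1-EPS", known_at=20240301, value=0.95)  # later correction

print(store.get("ACME/Q1-EPS", as_of=20240201))  # 1.1  (correction not yet known)
print(store.get("ACME/Q1-EPS", as_of=20240315))  # 0.95 (correction now visible)
```

A backtest that routes every data access through `get(..., as_of=simulated_now)` can never see the corrected figure before it was actually published.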

But you know, most people don't really care and think they have an edge, and who knows maybe they do. Only live trading will prove it.


Yes we know of a few. Honestly, it was pretty hard to even find a good catalyst calendar for this space.

I'll give it a read to learn more. Thanks for the note!


That's interesting. I'm curious, what kind of analyses did you run on the molecule and drug itself? Was it mostly reading papers/patents, or did your team do anything experimental?


Nothing experimental. Only using foundational models, and usually basic stuff that's more relevant to the indication we were looking at, i.e. does it cross the BBB, or potency models.


Our initial goal with this project actually wasn't trying to get an edge in terms of better evaluating information, but rather, we wanted to see if an LLM can perform similarly to a human analyst at a lower latency. The latency for the market to react to catalysts is actually surprisingly high in biotech (at least in some cases) compared to other domains so there may be some edge there.

Appreciate the comment though! I generally agree with your sentiment!


Hi, thanks for the comment! Just wanted to respond to some of the comments here:

>> First, your business model isn't really clear, as what you've described so far sounds more like a research project than a go-to-market premise.

This is not really a core component of our business; it was more just something cool that I built and wanted to share!

>> Computational pathology is a crowded market, and the main players all have two things in common: access to huge numbers of labeled whole-slide images, and workflows designed to handle such images. Without the former, your project sounds like a non-starter, and given the latter, the idea you've pitched doesn't seem like an advantage. Notably, some of the existing models even have open weights (e.g. Prov-GigaPath, CTransPath).

We have partnerships with a few labs to get access to a large amount of WSIs, both H&E and IHC, but our core business really isn't building workflow tools for pathologists at the moment.

>> Second, you've talked about using this approach to make diagnoses, but it's not clear exactly how this would be pitched as a market solution. The range of possible diagnoses is almost unlimited, so a useful model would need training data for everything (not possible). My understanding is that foundation models solve this problem by focusing on one or a few diagnoses in a restricted scope, e.g. prostate cancer in prostate core biopsies.

I agree that this isn't really a market solution in its current state (it isn't even close to accurate enough), but the beauty of this approach is its general-purpose nature: it can work not only across tissue types but also across different pathology tasks, like IHC scoring and cancer subtyping. The value of foundation models is precisely that tasks can generalize. Part of what made this super interesting to me was that general-purpose foundation models like GPT-5 can perform this super niche task at all! Obviously there are path-specific foundation models too, with their own ViT backbones, but it is pretty incredible that GPT-5 and Claude 4.5 can already perform at this level.

Yes, to the best of my knowledge most FDA-approved solutions are point solutions, but I am not yet convinced this is the best way to deploy solutions in the long term. For example, there will always be rare diseases without enough of a market for a specialized solution, and in those cases general-purpose models that can generalize to some degree may be crucial.


Thought about this too. I think there are two broad LLM capabilities that are currently tangled up in this eval:

1. Can an LLM navigate a slide effectively (i.e. find all relevant regions of interest)?

2. Given a region of interest, can an LLM make the correct assessment?

I need to come up with a better test here in general, but yep, I'm thinking about this.
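One way to untangle the two capabilities is to score them separately: navigation as recall over annotated regions of interest, and assessment as accuracy when the model is handed ground-truth regions directly. A minimal sketch (metric names and the example data are hypothetical):

```python
# Hypothetical split of the eval: score slide navigation and per-region
# assessment independently, so a failure in one doesn't mask the other.

def navigation_recall(found_regions, annotated_regions):
    """Fraction of annotated regions of interest the model actually visited."""
    if not annotated_regions:
        return 1.0
    hits = sum(1 for region in annotated_regions if region in found_regions)
    return hits / len(annotated_regions)

def assessment_accuracy(predictions, labels):
    """Accuracy when the model is given ground-truth regions directly."""
    if not labels:
        return 1.0
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)

# e.g. the model found 3 of 4 annotated ROIs ...
print(navigation_recall({"roi_1", "roi_2", "roi_4"},
                        ["roi_1", "roi_2", "roi_3", "roi_4"]))  # 0.75
# ... and classified 2 of 3 handed-in ROIs correctly
print(assessment_accuracy(["benign", "malignant", "benign"],
                          ["benign", "malignant", "malignant"]))
```

Reporting the two numbers side by side would show whether errors come from missing the right regions or from misreading them.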


I've been thinking a bit more about better ways to build the tooling around it. To be fully transparent, I don't know much about video compression, but I will read up on it.

I have been running into some problems with memory management here, as each later frame needs some degree of context from the previous frames (currently I just do something simple, like passing the previous frame and the first reference frame into context). Maybe I can look into video compression and see if there is any inspiration there.
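The "previous frame plus first reference frame" policy can be sketched as a tiny keyframe-style context buffer (my own toy version, not the actual implementation):

```python
from collections import deque

class FrameContext:
    """Toy context policy: always keep the first reference frame, plus a
    small sliding window of the most recent frames, dropping everything
    in between to bound memory."""

    def __init__(self, recent_window=1):
        self.reference = None                      # first frame, always kept
        self.recent = deque(maxlen=recent_window)  # sliding window of recent frames

    def push(self, frame):
        if self.reference is None:
            self.reference = frame
        else:
            self.recent.append(frame)  # deque silently evicts the oldest

    def context(self):
        # what would be passed to the model alongside the next frame
        kept = [self.reference] if self.reference is not None else []
        return kept + list(self.recent)

ctx = FrameContext(recent_window=1)
for frame in ["f0", "f1", "f2", "f3"]:
    ctx.push(frame)
print(ctx.context())  # ['f0', 'f3']
```

This is loosely analogous to keyframes plus deltas in video codecs, which is part of why the compression literature seems worth a look.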


Nope, I haven't. I can take a look and see if I can fit it in.


I think so. It feels like there is more to be squeezed out of just better prompts, but I was going to play around with fine-tuning Qwen3.


Fair enough. I wonder if fine-tuning over different modalities like IMC, H&E, etc. would help it generalize better across all of them.


Yeah, I think one of the interesting things would be to see how well it generalizes across tasks. The existence of pathology foundation models suggests there is certainly a degree of generalizability (at least across tissues), but I am not so sure yet about generalizability across different modalities (though there are some cool biomarker-prediction models).

