febed's comments | Hacker News

What was your prompt to get it to run the test suite and heal tests at every step? I didn’t see that mentioned in your write up. Also, any specific reason you went with Codex over Claude Code?


All of the prompts I used are in the article. The two most relevant to testing were:

  We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. [...]
And later:

  Configure GitHub Actions test.yml to run that on every commit, then commit and push
Good coding models don't need much of a push to get heavily into automated testing.
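A test.yml along those lines might look something like this (a sketch only — the Node version and npm scripts are assumptions based on it being a JavaScript port, not taken from the actual repo):

```yaml
# Hypothetical .github/workflows/test.yml — runs the suite on every push.
name: Test
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install
      - run: npm test
```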

I used Codex for a few reasons:

1. Claude was down on Sunday when I kicked off this project

2. Claude Code is my daily driver and I didn't want to burn through my token allowance on an experiment

3. I wanted to see how well the new GPT-5.2 could handle a long running project


For me (original author of JustHTML), it was enough to put the instructions on how to run tests in the AGENTS.md. It knows enough about coding to run tests by itself.
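For reference, an AGENTS.md entry for this can be as short as something like the following (the commands here are illustrative, not the project's actual ones):

```markdown
## Running tests

- Create a virtualenv and install dev dependencies: `pip install -e ".[test]"`
- Run the full suite: `pytest`
- Run a single test file: `pytest tests/test_parser.py`
```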


Can skills completely replace MCPs? For example, can a skill be configured to launch my local Python program in its own venv. I don’t want Claude to spend time spinning up a runtime


Skills only work if you have a code environment up and running and available for a coding agent to execute commands in.

You can absolutely have a skill that tells the coding agent how to use Python with your preferred virtual environment mechanism.

I ended up solving that in a slightly different way - I have a Claude hook that spots attempts to run "python" or "python3" and returns an error saying "use uv run instead".
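A sketch of that kind of hook, assuming Claude Code's PreToolUse hook behaviour (JSON payload on stdin, exit code 2 to block the call and feed stderr back to the model); the matching heuristic is deliberately rough and the registration details are illustrative:

```python
#!/usr/bin/env python3
# Hypothetical Claude Code PreToolUse hook: reject bare "python"/"python3"
# invocations and tell the agent to use "uv run" instead.
# Registered in .claude/settings.json under hooks -> PreToolUse with a
# "Bash" matcher pointing at this script (check the hooks docs for your
# version of Claude Code).
import json
import re
import sys

# Rough heuristic: "python" or "python3" at the start of the command or
# after a separator. Commands already routed through "uv" are allowed.
PATTERN = re.compile(r"(^|\s|;|&&|\|\|)\s*python3?\b")


def should_block(command: str) -> bool:
    """Return True if the shell command invokes bare python/python3."""
    if command.strip().startswith("uv "):
        return False
    return bool(PATTERN.search(command))


def main() -> None:
    raw = sys.stdin.read()
    if not raw.strip():
        return  # no hook payload; nothing to do
    command = json.loads(raw).get("tool_input", {}).get("command", "")
    if should_block(command):
        # Exit code 2 blocks the tool call; stderr is shown to the model.
        print("Use 'uv run' instead of calling python directly.",
              file=sys.stderr)
        sys.exit(2)


if __name__ == "__main__":
    main()
```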


Apart from sounding a bit stiff and formal, I was surprised at how good Gemini Live mode is in regional Indian languages.


The biggest issue with Windows 11 for me is the noticeable performance lag in basic apps like Notepad and File Explorer. Simple tasks, like opening files or navigating folders, feel sluggish, and I can visibly see windows rendering in slow motion. I’ve heard this might be due to Windows 11’s UI elements being redrawn over lower level UI. I’m considering switching to Linux as my daily driver.


Last I checked, DuckDB spatial didn’t support projections properly: it couldn’t load the CRS from a .prj file. This makes it useless for serious geospatial work.
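One partial workaround, if the .prj isn't picked up automatically, is to supply the CRS by hand: duckdb spatial's ST_Transform accepts explicit source/target CRS strings (EPSG codes or PROJ/WKT, so the .prj contents can be pasted in). A sketch, with made-up file and column names:

```sql
-- Sketch: reproject shapefile geometry with an explicitly stated CRS,
-- since the .prj file isn't read for us. Names are illustrative.
INSTALL spatial;
LOAD spatial;

SELECT ST_Transform(geom, 'EPSG:27700', 'EPSG:4326') AS geom_wgs84
FROM ST_Read('parcels.shp');
```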


Any workaround for those needing prescription glasses?


You can get prescription inserts from Vuzix, but they're pretty bad. If you need a prescription, and want to run AugmentOS, your best bet is to buy the Even Realities G1 instead.


now do telnet


LLMs make it really trivial to work with MermaidJS. Just yesterday I used it to sketch out some business logic. Seeing the whole flowchart like that helped me catch some corner cases.
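For instance, the kind of flowchart an LLM can sketch from a prose description of business logic looks like this (the nodes and labels here are made up):

```mermaid
flowchart TD
    A[Order received] --> B{In stock?}
    B -- yes --> C[Charge card]
    B -- no --> D[Create backorder]
    C --> E{Charge succeeded?}
    E -- yes --> F[Ship order]
    E -- no --> G[Notify customer]
```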


Which cheap LLMs are good for Mermaid, and to sketch graphs from code? I had to switch between the cheap LLMs and "tutor" them a bit to edit Mermaid in a Markdown file, although Claude seems perfect.


Would be interested to know what drawbacks you found with Dagster or Prefect.


Prefect is amazing. I built out an ETL pipeline system with it at my last job and would love to get it incorporated into the current one, but unfortunately we have a lot of legacy stuff in Airflow. Being able to debug locally was amazing, and the K8S integration is super clean.


Other guy said it right. These work and are fine, but you lose the legacy stuff. If you know your limits and where the eventual system will end up, it's great and probably better.

If you are building an expandable, long-term system and you want all the goodies baked in, choose Airflow.

Pretty much the same as any architecture choice. Ugly/hard often means control and features; pretty/easy means less of both.

On the surface the differences are not very noticeable, other than the learning curve of getting started.


Anyone compare this with BrowserUse ?

https://browser-use.com/


I think the use cases are slightly different between the two. The Playwright MCP depends on the MCP client (like Claude Desktop or Cursor) to provide the intelligence, while browser-use can "think" by itself. Plus, unless you use the vision mode, you are kind of restricted to the accessibility tree, which may not be present or well populated depending on the website you're visiting. This also means it won't work as well with tools like Cursor/Windsurf, since they don't really process images from MCPs right now.

I'm more in the camp of using claude computer-use/openai cua. I think they work better for most things, especially if you don't interact with hidden/obscured elements.

If you're interested in comparing these different services, you can try HyperPilot by Hyperbrowser at https://pilot.hyperbrowser.ai .

Disclaimer: I worked on Hyperpilot so I might be a bit biased.

