febed's comments | Hacker News

What was your prompt to get it to run the test suite and heal tests at every step? I didn’t see that mentioned in your write up. Also, any specific reason you went with Codex over Claude Code?


All of the prompts I used are in the article. The two most relevant to testing were:

  We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. [...]
And later:

  Configure GitHub Actions test.yml to run that on every commit, then commit and push
Good coding models don't need much of a push to get heavily into automated testing.
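A test.yml along those lines might look something like this (a sketch only — the Node version and npm scripts are assumptions based on it being a JavaScript port, not taken from the actual repo):

```yaml
# Hypothetical .github/workflows/test.yml — runs the suite on every push.
name: Test
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install
      - run: npm test
```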

I used Codex for a few reasons:

1. Claude was down on Sunday when I kicked off this project

2. Claude Code is my daily driver and I didn't want to burn through my token allowance on an experiment

3. I wanted to see how well the new GPT-5.2 could handle a long running project


For me (original author of JustHTML), it was enough to put the instructions on how to run tests in the AGENTS.md. It knows enough about coding to run tests by itself.
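For reference, an AGENTS.md entry for this can be as short as something like the following (the commands here are illustrative, not the project's actual ones):

```markdown
## Running tests

- Create a virtualenv and install dev dependencies: `pip install -e ".[test]"`
- Run the full suite: `pytest`
- Run a single test file: `pytest tests/test_parser.py`
```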


Can skills completely replace MCPs? For example, can a skill be configured to launch my local Python program in its own venv. I don’t want Claude to spend time spinning up a runtime


Skills only work if you have a code environment up and running and available for a coding agent to execute commands in.

You can absolutely have a skill that tells the coding agent how to use Python with your preferred virtual environment mechanism.

I ended up solving that in a slightly different way - I have a Claude hook that spots attempts to run "python" or "python3" and returns an error saying "use uv run instead".
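A sketch of that kind of hook, assuming Claude Code's PreToolUse hook behaviour (JSON payload on stdin, exit code 2 to block the call and feed stderr back to the model); the matching heuristic is deliberately rough and the registration details are illustrative:

```python
#!/usr/bin/env python3
# Hypothetical Claude Code PreToolUse hook: reject bare "python"/"python3"
# invocations and tell the agent to use "uv run" instead.
# Registered in .claude/settings.json under hooks -> PreToolUse with a
# "Bash" matcher pointing at this script (check the hooks docs for your
# version of Claude Code).
import json
import re
import sys

# Rough heuristic: "python" or "python3" at the start of the command or
# after a separator. Commands already routed through "uv" are allowed.
PATTERN = re.compile(r"(^|\s|;|&&|\|\|)\s*python3?\b")


def should_block(command: str) -> bool:
    """Return True if the shell command invokes bare python/python3."""
    if command.strip().startswith("uv "):
        return False
    return bool(PATTERN.search(command))


def main() -> None:
    raw = sys.stdin.read()
    if not raw.strip():
        return  # no hook payload; nothing to do
    command = json.loads(raw).get("tool_input", {}).get("command", "")
    if should_block(command):
        # Exit code 2 blocks the tool call; stderr is shown to the model.
        print("Use 'uv run' instead of calling python directly.",
              file=sys.stderr)
        sys.exit(2)


if __name__ == "__main__":
    main()
```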


Apart from sounding a bit stiff and formal, I was surprised at how good Gemini Live mode is in regional Indian languages.


The biggest issue with Windows 11 for me is the noticeable performance lag in basic apps like Notepad and File Explorer. Simple tasks, like opening files or navigating folders, feel sluggish, and I can visibly see windows rendering in slow motion. I’ve heard this might be due to Windows 11’s UI elements being redrawn over lower level UI. I’m considering switching to Linux as my daily driver.


Last I checked, DuckDB spatial didn’t support projections properly: it couldn’t load the CRS from a .prj file. This makes it useless for serious geospatial work.
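One partial workaround, if the .prj isn't picked up automatically, is to supply the CRS by hand: duckdb spatial's ST_Transform accepts explicit source/target CRS strings (EPSG codes or PROJ/WKT, so the .prj contents can be pasted in). A sketch, with made-up file and column names:

```sql
-- Sketch: reproject shapefile geometry with an explicitly stated CRS,
-- since the .prj file isn't read for us. Names are illustrative.
INSTALL spatial;
LOAD spatial;

SELECT ST_Transform(geom, 'EPSG:27700', 'EPSG:4326') AS geom_wgs84
FROM ST_Read('parcels.shp');
```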


Any workaround for those needing prescription glasses?


You can get prescription inserts from Vuzix, but they're pretty bad. If you need a prescription, and want to run AugmentOS, your best bet is to buy the Even Realities G1 instead.


now do telnet


LLMs make it really trivial to work with MermaidJS. Just yesterday I used it to sketch out some business logic. Seeing the whole flowchart like that helped me catch some corner cases.
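For instance, the kind of flowchart an LLM can sketch from a prose description of business logic looks like this (the nodes and labels here are made up):

```mermaid
flowchart TD
    A[Order received] --> B{In stock?}
    B -- yes --> C[Charge card]
    B -- no --> D[Create backorder]
    C --> E{Charge succeeded?}
    E -- yes --> F[Ship order]
    E -- no --> G[Notify customer]
```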


Which cheap LLMs are good for Mermaid, and to sketch graphs from code? I had to switch between the cheap LLMs and "tutor" them a bit to edit Mermaid in a Markdown file, although Claude seems perfect.


Would be interested to know what drawbacks you found with Dagster or Prefect.


Prefect is amazing. I built out an ETL pipeline system with it at my last job and would love to get it incorporated into the current one, but unfortunately we have a lot of legacy stuff in Airflow. Being able to debug locally was amazing, and the K8S integration is super clean.


Other guy said it right. These work and are fine, but you lose the legacy stuff. If you know your limits and where the eventual system will end up, it's great and probably better.

If you are building an expandable, long-term system and you want all the goodies baked in, choose Airflow.

Pretty much the same as any architecture choice. Ugly/hard often means control and features; pretty/easy means less of both.

On the surface the differences are not very noticeable, other than the learning curve of getting started.


Anyone compare this with BrowserUse ?

https://browser-use.com/


I think the use cases are slightly different between the two. The Playwright MCP depends on the MCP client (like Claude Desktop or Cursor) to provide the intelligence, while browser-use can "think" by itself. Plus, unless you use the vision mode, you are kind of restricted to the accessibility tree, which may not be present or well populated depending on the website you're visiting. This also means it won't work as well with tools like Cursor/Windsurf, since they don't really process images from MCPs right now.

I'm more in the camp of using claude computer-use/openai cua. I think they work better for most things, especially if you don't interact with hidden/obscured elements.

If you're interested in comparing these different services, you can try HyperPilot by Hyperbrowser at https://pilot.hyperbrowser.ai .

Disclaimer: I worked on Hyperpilot so I might be a bit biased.

