Hacker News: tamnd's comments

Possible, but currently I disable all large files, including videos.

For video downloading, I suggest wrapping yt-dlp. It's an awesome tool.


If there's more demand for that, maybe I will implement a more relaxed version.

Currently, all of that is broken. At one point, I had a traumatic experience: an archived HTML file kept redirecting to the live site even though all the content had already been rendered, so I ended up disabling JavaScript entirely.

Good news for you: here is the command to clone Apple Docs:

```bash
# Clone the pages under /documentation/ using the local Chrome binary;
# stdout and stderr are also appended to a log file via tee.
bin/kage clone https://developer.apple.com/documentation/ \
  --scope-prefix /documentation/ \
  --out /Users/apple/data/apple-docs \
  --chrome "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --max-pages 0 --max-depth 0 \
  --workers 3 --browser-pages 3 --asset-workers 6 \
  --render-timeout 60s --settle 2s --timeout 30s \
  2>&1 | tee -a /Users/apple/apple-docs.log
```

Adjust it to your needs :)

I smoke-tested it, and all the content and CSS work, but I stripped all the JS, so the sidebar won't work.

If you run into any problems, feel free to create new issues in the repo. It helps me prioritize and know what should be fixed.


This could be a nice code golf project. It only needs a webview, a ZIM reader, and a way to append data to an existing binary and read it back.

I did something like that a very long time ago (of course, I've since forgotten the details).
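The "append data to an existing binary and read it back" part can be sketched in a few lines of shell. This is only an illustration under assumptions: the trailer layout (payload, then a 16-digit zero-padded size, then the 8-byte magic KAGEZIM1) and the function names embed_payload and extract_payload are made up here and are not Kage's actual format. A real self-contained viewer would run the same read-back against its own executable at startup.

```bash
# Hypothetical trailer format (an assumption, not Kage's actual layout):
#   [original binary][payload][16-digit zero-padded payload size][8-byte magic]
MAGIC=KAGEZIM1

# embed_payload BINARY PAYLOAD OUTPUT
# Writes OUTPUT = BINARY + PAYLOAD + trailer describing the payload.
embed_payload() {
  size=$(wc -c < "$2" | tr -d ' ')
  cat "$1" "$2" > "$3" || return 1
  printf '%016d%s' "$size" "$MAGIC" >> "$3"
}

# extract_payload BUNDLE OUTPUT
# Checks the magic at the very end, then copies the payload back out.
extract_payload() {
  magic=$(tail -c 8 "$1")
  if [ "$magic" != "$MAGIC" ]; then
    echo "no embedded payload in $1" >&2
    return 1
  fi
  # Read the 16-digit size field and strip leading zeros so that shell
  # arithmetic does not treat the value as octal.
  size=$(tail -c 24 "$1" | head -c 16 | sed 's/^0*//')
  size=${size:-0}
  # The payload sits immediately before the 24-byte trailer.
  tail -c $((size + 24)) "$1" | head -c "$size" > "$2"
}
```

Putting the size and magic at the very end means the reader never has to parse the original executable; it only reads backwards from the end of the file. It is worth checking that your platform's loader and any code signing tolerate trailing data.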


For sharing, it's better to use the HTML folder or the ZIM format; Kage supports both.

I have a project for creating and archiving RSS feeds that keeps the full history from the moment the crawler starts. I need to clean it up a bit, and then I will open-source it soon.

Exactly. For downloading, Kage requires Chrome or Chromium. Running it inside Docker makes setup easier and keeps cleanup simple:

https://github.com/tamnd/kage/blob/main/Dockerfile

By the way, let me think about a way to enable this only when running inside Docker.


Docker is designed to be undetectable by default. The best way I have found is to manually set the env var IN_DOCKER=True in your Dockerfile, then check that no $DISPLAY is configured and that you're on Linux. Usually, if all (or most) of those are true, you can safely add --no-sandbox, --disable-setuid-sandbox, --disable-dev-shm-usage, and the other Docker-specific flags. That's what we do in https://github.com/ArchiveBox/ArchiveBox/blob/dev/Dockerfile...
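A minimal shell sketch of that heuristic, assuming your Dockerfile sets ENV IN_DOCKER=True. The function name chrome_sandbox_flags is hypothetical (not from Kage or ArchiveBox), and it takes the stricter reading of "all/most": every check must pass before any flag is emitted.

```bash
# Hypothetical helper: print Docker-specific Chrome flags only when the
# container explicitly opted in, no X display is set, and the OS is Linux.
chrome_sandbox_flags() {
  # 1. Explicit opt-in baked into the image (ENV IN_DOCKER=True).
  [ "${IN_DOCKER:-}" = "True" ] || return 0
  # 2. No display configured, i.e. a headless environment.
  [ -z "${DISPLAY:-}" ] || return 0
  # 3. Linux kernel.
  [ "$(uname -s)" = "Linux" ] || return 0
  # All checks passed: emit the flags on one line.
  printf '%s\n' "--no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage"
}
```

For example, chromium $(chrome_sandbox_flags) --headless --dump-dom https://example.com; the substitution is left unquoted on purpose so each flag becomes its own argument. Keeping an explicit opt-in, rather than trying to auto-detect Docker, follows from the point that Docker is hard to detect reliably.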

It should be fixed by https://github.com/tamnd/kage/pull/12

Thanks for the nice trick.


Cool approach.

But a compromise still lands on the host's kernel: Docker doesn't provide kernel isolation. (Well, it does on macOS, because Docker runs inside a Linux VM there, but that's a side effect.)

I wonder if a better solution would be to play with seccomp or Linux capabilities so that Chrome is sandboxed even in Docker. Not sure how this would work tbh.

I'm answering here to get ideas. I saw your fix on GitHub and your request for feedback; I'll try to review it and give it some thought once I find some time.


Making docs available offline was one of my main motivations for building this tool. I will try Apple Docs too.

I previously downloaded the Snowflake docs. It was tens or even hundreds of thousands of pages (I don't remember exactly), and the output ended up being very large.

By the way, I forgot to add zstd compression support to my ZIM reader/writer. I will implement that in the next version.
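For context on what zstd support touches in a reader: per my reading of the openZIM format spec (worth double-checking against the spec itself), the first byte of each cluster stores the compression type in its low 4 bits (0 or 1 = uncompressed, 4 = xz, 5 = zstd) and sets bit 0x10 when the cluster uses 64-bit offsets. The helper zim_cluster_compression below is hypothetical and only classifies a cluster at a given byte offset; finding that offset via the cluster pointer list, and the actual decompression, are not shown.

```bash
# Hypothetical helper: zim_cluster_compression FILE OFFSET
# Reads the cluster info byte at OFFSET and prints its compression type,
# followed by "extended-offsets" if bit 0x10 is set.
zim_cluster_compression() {
  # od prints the byte as an unsigned decimal; strip padding and newline.
  info=$(od -An -tu1 -j "$2" -N1 "$1" | tr -d ' \n')
  [ -n "$info" ] || { echo "no byte at offset $2" >&2; return 1; }
  case $((info & 15)) in
    0|1) comp=none ;;   # 0 = default, 1 = explicitly uncompressed
    4)   comp=xz ;;
    5)   comp=zstd ;;
    *)   comp=unsupported ;;
  esac
  if [ $((info & 16)) -ne 0 ]; then
    echo "$comp extended-offsets"
  else
    echo "$comp"
  fi
}
```

A reader that only handles xz would hit the zstd case for any archive written with zstd clusters, which is why both the reader and the writer need the extra branch.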


Kiwix has readers for almost every platform: Android, desktop, and iPhone. That's why I made Kage produce ZIM files.

The executable file is mostly for people who don't have Kiwix installed yet, or just want to run the archive directly.

