(1) Takes a URL and optional comment as input
(2) Saves the webpage it points to into a git repo (a simple curl should suffice for most websites)
(3) Inserts that URL, title of the page pointed-to by the URL and the optional comment into an org-mode file that lives in the root of the repo
The org-mode file is a highly-searchable and context-preserving database (I can add tags, create hierarchies, add links to and from other relevant (org-mode or not) files) in the most portable format ever: plain text.
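The three steps above can be sketched quickly. This is a minimal illustration, not a finished tool: it uses a naive regex for the title (a real script should handle encodings and malformed HTML), and the `bookmarks.org` filename is an assumption.

```python
import re

def extract_title(html: str) -> str:
    """Pull the <title> out of raw HTML; naive, but enough for a sketch."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else "(untitled)"

def org_entry(url: str, title: str, comment: str = "") -> str:
    """Format one bookmark as an org-mode heading using org link syntax."""
    entry = f"* [[{url}][{title}]]\n"
    if comment:
        entry += f"  {comment}\n"
    return entry

# Appending to the database file in the repo root (filename is an assumption):
# with open("bookmarks.org", "a") as f:
#     f.write(org_entry(url, extract_title(html), comment))
```

Fetching the page (step 2) and committing both the snapshot and the updated org file would wrap around this.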
I really don't need a web interface. Actually, if I later decide that I need one, I can build one easily on top of this basic system.
I really want to be able to use this across multiple devices: mainly my two computers, and an Android phone. Using git gives me a reliable protocol for syncing between multiple devices. I want it to be a smooth experience on my phone, which would probably require some sort of git-aware app. Something similar to the Android client for the pass password manager would be ideal.
I hear that git repos can be GPG-encrypted. Ideally, I'm able to serve all this off of a repo hosted on a VPS. I don't want to rely on Dropbox (I'm trying to transition away from it) for syncing.
> (2) Saves the webpage it points to into a git repo (a simple curl should suffice for most websites)
FWIW I've done something similar and lots of sites that use a lot of JS (and pretty much every single page webpage like twitter and FB) will not re-render correctly just because you have the files. It actually takes a lot of work to clone a webpage, the best solution I've found so far is to print a PDF from a headless chrome (but this has its own problems, like now you have to deal with a PDF).
Even generating the PDF is harder than it seems, at least if you've never done it before, because there are a lot of gotchas. For example, did you know that most websites provide a second stylesheet to be used while printing, which can make the page look only slightly off but still clearly broken? I didn't either.
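For reference, the headless-Chrome approach boils down to one invocation. A sketch of building it, assuming the binary is called `chromium` (the name varies by platform: `google-chrome`, `chrome`, etc.):

```python
def chrome_pdf_cmd(url: str, out_pdf: str, binary: str = "chromium") -> list:
    """Build the argv for printing a page to PDF with headless Chrome.

    The binary name is an assumption; adjust it for your platform.
    """
    return [
        binary,
        "--headless",                    # run without a visible window
        "--disable-gpu",                 # often needed on headless servers
        f"--print-to-pdf={out_pdf}",     # write the rendered page as a PDF
        url,
    ]

# e.g. subprocess.run(chrome_pdf_cmd("https://example.com", "page.pdf"))
```

Note this renders with the print stylesheet mentioned above, which is exactly where the "barely messed up but clearly broken" results come from.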
If the PDF format is not mandatory for you, you might be interested in SingleFile [1] (I'm the author) which you can run from the command line. It will interpret scripts and faithfully save a snapshot of a page in a single HTML file.
For many "modern" sites, it's really better to just take a screenshot and save the PNG.
Though there are still many sites that render just fine without JS. I've been trying out Brave Browser with JS disabled for some weeks now, and I was surprised how many sites are readable with JS disabled. And so much faster and less jumpy too.
Hmm... this wouldn't be too hard to write, and it sounds like an interesting weekend project.
Would you be interested in using WARC for the webpage though? This way, everything is captured in a single file and you aren't littering your repo with random files and images.
Aren't we in luck that the weekend is just coming up!
> Would you be interested in using WARC for the webpage though? This way, everything is captured in a single file and you aren't littering your repo with random files and images.
I didn't know about this. I've looked into it a bit, and it seems perfect.
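To show what WARC looks like on the inside: a single response record is just a version line, some named headers, a blank line, and the payload, separated by CRLFs. A hand-rolled sketch (real archivers such as `wget --warc-file` produce these for you; the record ID here is a freshly generated placeholder):

```python
import uuid
from datetime import datetime, timezone

def warc_response_record(target_uri: str, payload: bytes) -> bytes:
    """Build one minimal WARC/1.0 response record by hand.

    This only illustrates the shape of the format; a real archive
    also needs warcinfo/request records and proper HTTP payloads.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        b"WARC/1.0",
        b"WARC-Type: response",
        b"WARC-Target-URI: " + target_uri.encode(),
        b"WARC-Date: " + now.encode(),
        b"WARC-Record-ID: <urn:uuid:" + str(uuid.uuid4()).encode() + b">",
        b"Content-Type: application/http;msgtype=response",
        b"Content-Length: " + str(len(payload)).encode(),
    ]
    # Record = headers, blank line, payload block, two trailing CRLFs.
    return b"\r\n".join(headers) + b"\r\n\r\n" + payload + b"\r\n\r\n"
```

Since the whole record is one byte stream, it drops into a git repo as a single file, which is the appeal here.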
I'm not too concerned about saving webpages, I'm much more concerned about actually having a populated database of links. I only expect to need to use the saved page if the link breaks.
I can work on writing a simple elisp script (incidentally, I don't know very much elisp either, but that's something I'm willing to take time out to learn, because I expect to be using it a lot in the future), but I would need someone else to write the Android app.
> (2) Saves the webpage it points to into a git repo (a simple curl should suffice for most websites)
> (3) Inserts that URL, title of the page pointed-to by the URL and the optional comment into an org-mode file that lives in the root of the repo
If you're willing to change "git" to "version control", it should be pretty easy to implement that in Fossil. It doesn't require much to add an extension written in your language of choice if you're going to run it on your desktop. Plus you'd get the web interface for free if you decided to put it on a web server.
I just wrote a script that covers the first two points (though it creates a PDF rather than using a simple curl) and allows searching the database. Org-mode stuff could be added later.
github.com/websalt/bmark
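Searching a flat org-mode bookmark file doesn't really need a database layer at all. A sketch, assuming the one-bookmark-per-top-level-heading layout described earlier in the thread:

```python
def search_bookmarks(org_text: str, query: str) -> list:
    """Return org headings (one bookmark each) whose entry mentions the query.

    An entry is a top-level heading ("* ...") plus the indented lines
    under it; the match is a simple case-insensitive substring test.
    """
    hits, current = [], None
    for line in org_text.splitlines():
        if line.startswith("* "):
            current = line  # track which bookmark we're inside
        if current and query.lower() in line.lower():
            if current not in hits:
                hits.append(current)
    return hits
```

Since the whole file is plain text, `grep` would do much the same job from the command line; the function form just makes it easy to build on later.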