Author here. Thanks so much for the interest! It's funny, I've been hacking away at a major update that I plan to release in the next few days... Was assuming I'd stay under the radar until then. The update adds a ton of features and improves the UI quite a bit ;)
To respond to some of your ideas and feedback:
* I do plan to sell the service by some combination of charging to buy the native version and a monthly/yearly fee for the cloud version.
* By offering the native version I hope to assuage any privacy or legacy concerns -- all your data is on your machine (encrypted and backed up however you see fit). You'll even have access to a local API to extract or do whatever you want with it.
* One idea I've had is to offer a cloud version / native version combo. You would sync to the cloud only your bookmarked sites -- all the other indexed pages you visit would stay on the local version. This way you control what gets put up on the servers but can still have access to your links from all your devices. Thoughts?
* I'd also consider open sourcing it (it's built on Meteor and ElasticSearch) but really do need to get paid for my efforts (just had a baby) and am not familiar with all the ins/outs of open source based businesses. I'd love to hear ideas and advice!
* This has turned out to be quite a lot more difficult than I'd thought but I'm real happy with how things are coming along. Two words: ElasticSearch Rocks.
* Very embarrassed about the privacy policy link. Fixed now. ;)
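The cloud/native combo described in the list above could be sketched roughly as follows. This is only an illustration of the "sync bookmarked pages, keep everything else local" idea; the `Page` record, `select_for_cloud_sync`, and `cloud_payload` are hypothetical names, not part of Fetching's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record for one indexed page in the local store."""
    url: str
    text: str
    bookmarked: bool = False


def select_for_cloud_sync(pages):
    """Return only the pages the user explicitly bookmarked.

    Everything else stays in the local index and is never uploaded.
    """
    return [page for page in pages if page.bookmarked]


def cloud_payload(page):
    """Build the (assumed) upload payload for a single bookmarked page."""
    return {"url": page.url, "text": page.text}


if __name__ == "__main__":
    local_index = [
        Page("https://example.com/article", "full text of the article", bookmarked=True),
        Page("https://example.com/inbox", "private page text"),
    ]
    for page in select_for_cloud_sync(local_index):
        print(cloud_payload(page))
```

The point of filtering on the client side is that unbookmarked pages never leave the machine, so the user decides what reaches the servers.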
This looks great, I've wanted this for a long time. One idea: Maybe have an option to "backfill" based on your browser's history? Seems like a good way to give users instant gratification, and solves the problem of "I wish I'd started running this a year ago". In reality, I wish I had it for every page I've visited in the last 14 years.
Update: Another idea - maybe you could integrate with pinboard or delicious.com (if anyone still uses it) to backfill-index all links saved to those services. Maybe this could be a premium feature.
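A minimal sketch of what a history backfill might look like, assuming a copy of Chrome's `History` SQLite file with its `urls` table (`url`, `title`, `last_visit_time` in microseconds since 1601-01-01 UTC). The function names are made up for illustration, and how Fetching would actually ingest these rows is not something the thread specifies.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Chrome stores timestamps as microseconds since 1601-01-01 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chrome_time_to_datetime(microseconds):
    """Convert a Chrome history timestamp to an aware datetime."""
    return CHROME_EPOCH + timedelta(microseconds=microseconds)


def read_history(db_path, limit=1000):
    """Read the most recently visited URLs from a copy of the History file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT url, title, last_visit_time FROM urls "
            "ORDER BY last_visit_time DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    return [
        {"url": url, "title": title, "visited_at": chrome_time_to_datetime(ts)}
        for url, title, ts in rows
    ]
```

Working on a copy of the file matters because the browser usually keeps the original locked while it is running. Each returned URL would still have to be re-fetched to get its text, so pages behind a login or pages that have since changed would not backfill faithfully.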
I'm impressed with your work, I always wanted to have something like this, and was about to start coding it!
Here's my feedback: I do want the native version for privacy concerns, but I also want the syncing. Why not offer a program (or Docker container?) that I could put on my cloud of choice? That would be the real freedom. If people don't want to hassle with it, they will just pay your cloud offering.
I really value products that pay attention to this 'detail'.
This is exactly what I wished Workflowy or Thinkery would let me do. I love the mechanics of those two services, but I barely use them because I do not have control over my data. Thinkery basically turned into a bookmarking service for me because of that.
So yeah, that's what I would pay for in this case as well.
I'd love to support you in some way if you decide to open source this.
I'm otherwise a bit paranoid to let all the text of every homepage I visit be captured by a closed source plug in. Still - amazing job, and technically very, very impressive.
Hi, The local install option of this looks very interesting to me. I'd like to connect by email and give some feedback. Happy to buy as I'm looking for something that isn't online only.
I'm a regular user of Diigo, about 10K links, 500 different tags. Don't like it being cloud only.
My favourite feature is my ability to annotate a link (mostly highlighting text), so it effectively creates a chronological and topical feed of the exact sentences of what I want to remember from a link. It's kind of a self writing blog of what I read and experienced, complete with what stood out to me, and any notes I wanted to make.
I find I remember a point or a sentence from a link more than the link itself, and having a full text search of the words I remember highlighting and saving is incredibly powerful. I actually end up revisiting those links.
I have some experience with research and filing large databases of articles and images at a job in another life.
Very cool, and something I think would be great to have. To add to the questions, have you thought about a version that uses or can interface with owncloud (or something similar)? I think the cross-device capability would be great, but I would personally be more inclined to use it if I could keep the data on servers I control.
Hi, thanks for this incredibly useful tool. I was actually working on a similar Chrome plugin. Now I don't have to :)
Does it index existing bookmarks? It seems it's not doing that now. The reason I wanted to build this is that my bookmarks have grown too big, and I wanted a way to search them.
Please add this feature, and indexing of the history too if possible.
Quick bug report on the cloud extension in Chrome, probably the others as well. The inputted email address for login is case sensitive, and the initial registration converts any inputted email address to lower-case. It took me a good while to figure out why I couldn't login.
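The mismatch described above (registration lower-cases the address, login compares it as typed) is usually fixed by running both paths through the same normalization. A rough sketch, with a hypothetical in-memory `AccountStore` standing in for whatever the real service uses:

```python
def normalize_email(raw):
    """Apply the same canonical form at registration and at login."""
    return raw.strip().lower()


class AccountStore:
    """Hypothetical in-memory account store, for illustration only."""

    def __init__(self):
        self._accounts = {}

    def register(self, email, password_hash):
        self._accounts[normalize_email(email)] = password_hash

    def login(self, email, password_hash):
        stored = self._accounts.get(normalize_email(email))
        # Unknown accounts never match, even if the caller passes None.
        return stored is not None and stored == password_hash


if __name__ == "__main__":
    store = AccountStore()
    store.register("Someone@Example.com", "hash-of-password")
    print(store.login("someone@example.COM", "hash-of-password"))
```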
I am wondering how localhost works. When the Linux version comes out, will it support storing the database on a remote host? In other words, using my own virtual host as a server.
Good stuff, there's definitely a value in "local search", where local stands for "your own stuff". Pinboard can be used in a similar way (the paid version), but the difference is that's FTS only for pinned things.
Shameless plug, I'm attempting to do something similar specifically for science, but make those local results also available globally using a distributed network based on WebRTC. It's also a browser extension, which detects if you're on a page of a scientific article. If you are, it takes the body of the article and indexes it, by putting its contents into a DHT. You can then use the extension to search through this distributed network.
For those interested, the post back from June is available here: http://juretriglav.si/an-open-distributed-search-engine-for-... with the source code here: https://github.com/ScholarNinja/extension
The project will get a lot more love soon. It turned out to be a bit too early back then because WebRTC implementations were buggy (since fixed in Chrome, but at the time they caused, e.g., 100% CPU usage after a short while and gigabytes of memory used).
Anyway, best of luck making Fetching.io sustainable, flippyhead!
I've been told (by slingbox folks some time ago) that EFF argues that automatic updates are never (edit: generally not) a good idea. They can be used to add or remove functionality by court order.
Another scenario where automatic updates of a native app hurt users is when your company is purchased by a larger company who then shuts down the product. Please reconsider that feature for the localhost version.
BTW I'm just going by the green checkbox in the features comparison table to conclude that you have this ill-advised feature.
Sorry for latching on to that one thing, but it's important imho.
Other than that, this is something I've wished for many times, so great to see it becoming real. I loved clamprecht's suggestion of backfill from history -- that would be great!
This is the kind of product that would really benefit from having a clear business model up front. Free + some promise of charging in the future doesn't encourage me that it will be sustainable in its current form.
Without a clear alternative, the likely conclusion is that user data will be used for advertising some time in the future.
So this would save every page I visit, including when I'm logged in to private services (email, bank, etc), to somewhere in the cloud I can't control?
Is there any client-side encryption done? If so, where is the publicly auditable code? And how does the search work? Does it fetch everything and decrypt it for every single query?
The idea is very good, but this should not be done in the cloud, it has to be done locally, and potentially securely synchronized among different machines.
EDIT: Okay it's not mentioned on the landing page, but there is an option to use it locally. Cool!
EDIT2: hey downvoters, when I read "Your cloud data is visible only to you. You can optionally install fetching as an application on your computer." I assumed that the app was a client for the cloud service that was distinct from the web interface usable in a browser. This is a totally legit interpretation, especially when the next title is "It's accessible from anywhere". I don't see why it is wrong in that case to raise the privacy concerns that I mentioned. I cared enough to continue investigating and found on another page that the product can actually be used locally. I then edited my comment (maybe 4 or 5 minutes later) in accordance with that new knowledge. Knowing that the concerns I raised are still valid for users who would choose to use it with the cloud, what do your downvotes mean?
Very interesting. I can't get the extension to work on Safari, though it works fine on Chrome. On Safari it logs in, but the search doesn't work (typing "f <something>" just goes to Google to search for "f <something>" every time, and when I restart the browser, I'm logged out.) Twitter authentication is also busted (returns a 500 error).
When it works, it's fast, clean, and really well integrated into the workflow of my browsing, since I use the address bar to control basically everything.
If you can figure out the Safari issue, I'd happily pay a few bucks a month for the cloud version.
Quick edit: turns out the Safari extension is definitely indexing the browsing, just the keyword search shows issues. Restarting the browser also kills the authentication every time. Latest Safari on OS X 10.10, if it helps.
It seems like this functionality could be pretty easy to achieve (minus the "cloud" part) with any existing browser that caches to the local filesystem - just set the cache limits to "unlimited" so it'll continue accumulating pages, and let your OS's search function take care of the rest. If you want to be fancy and keep only the text, add a script that periodically runs to clear out images, CSS, JS, and other cached files you don't need.
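The periodic cleanup script mentioned above might look something like this. It is a sketch under a big assumption: that the cache is a plain directory whose files keep recognizable extensions (real browser caches often use opaque file names, in which case the type would have to come from stored headers instead). `prune_cache` and `KEEP_TYPES` are illustrative names.

```python
import mimetypes
from pathlib import Path

# MIME types worth keeping for full-text search; everything else recognizable is dropped.
KEEP_TYPES = {"text/html", "text/plain"}


def prune_cache(cache_dir, dry_run=True):
    """Delete cached files that are not HTML or plain text.

    Files whose type cannot be guessed from the name are left alone.
    Returns the list of paths that were (or, in a dry run, would be) removed.
    """
    removed = []
    # Materialize the listing first so deleting does not disturb the walk.
    for path in list(Path(cache_dir).rglob("*")):
        if not path.is_file():
            continue
        mime, _ = mimetypes.guess_type(path.name)
        if mime is None or mime in KEEP_TYPES:
            continue
        removed.append(path)
        if not dry_run:
            path.unlink()
    return removed
```

Defaulting to `dry_run=True` makes it easy to check what would be deleted before letting it loose from cron or a similar scheduler.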
One of the biggest problems I can see is with the increasing popularity of web apps that load as a single page and use JS to load/parse/display the data; only the browser can get the actual content in that case.
A browser addon could work. I've toyed around in firefox. You can access the rendered dom in addons and do whatever you want with it, including logging it to file. Here's an example of writing to file: https://github.com/prekageo/http-request-logger/blob/master/...
I (as have many others) considered building something like this in the past. What you say was exactly what put me off.
The difference in CPU time between downloading a page and rendering it (even virtually as with say PhantomJS) was sufficiently large that running in the browser (not centrally) seemed to be the only general purpose way and maintaining _browser_ extensions is... a pretty major job. I was looking into writing a daemon to externally monitor the browser and its cache (such as Chrome's Current Session file) when I left off.
Hopefully the use of server-side prerendering will catch on, be it through Node or other systems...
This is super cool and I've been thinking about building something like this for a long time. One thing to also consider: timing data. In my thinking about it, I thought that it would be massively useful to record both the exact time and order pages were visited in, and the other tabs open at that time. Lastly, check out AlchemyAPI to auto-suggest some tags / keywords based on page content to make recall easier.
Basically this + all of the above + a browsable timeline interface was my idea. Please take it and build it if you have the time / motivation, and I'll subscribe to your service (or help hack on it if you open-source it). The best competitor I've found so far is Pocket with the premium features (I'm a subscriber).
Good luck, and great work. I also love meteor and ES :)
Though it's not particularly practical, storing an event for every tab and window open, close, and URL change could lead to some very cool analytics after a long period of usage. I'd love to see visualizations of the number of tabs I have open over time, number of tabs per domain, etc.
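As a rough illustration of the analytics idea above, here is one way a log of tab events could be turned into "tabs open over time" and "tabs per domain". The event tuple format `(timestamp, kind, tab_id)` is an assumption made for this sketch; an actual extension would define its own schema.

```python
from collections import Counter
from urllib.parse import urlparse


def open_tab_counts(events):
    """Replay tab events and return (timestamp, open_tab_count) pairs.

    `events` is an iterable of (timestamp, kind, tab_id) tuples, where
    kind is "open" or "close". The format is assumed for this sketch.
    """
    open_tabs = set()
    series = []
    for timestamp, kind, tab_id in sorted(events):
        if kind == "open":
            open_tabs.add(tab_id)
        elif kind == "close":
            open_tabs.discard(tab_id)
        series.append((timestamp, len(open_tabs)))
    return series


def tabs_per_domain(open_urls):
    """Count currently open tabs by the host part of their URL."""
    return Counter(urlparse(url).netloc for url in open_urls)
```

The series returned by `open_tab_counts` can be fed straight into a plotting library for the kind of visualization described.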
um. I guess I retract my "closed source" comment and amend it to "not open source". 234M .app seems pretty heavyweight - do you really need all these node packages in the bundle?
If I'm running localhost, why do I need to create an account?
So this is sort of like the late Google Desktop? For a long time I have been annoyed that with the oodles of space and bandwidth we have nowadays, something as conceptually simple as "a searchable history of everything you look at online" doesn't really exist anymore.
What a delightful and extremely useful piece of software! I had a plan to build a smart bookmarking service (like pinboard on steroids), but I think this is what I was actually thinking of. Great work!
PS: If you ever decide to open source it I'd be happy to contribute.
Polipo can already store full text of every visited web page efficiently. For searches, we'd need a way to grep its compressed cache. Web UI optional. (Cache versioning would be a nice, separate feature.)
I also got an internal server error when creating a full (non-twitter, non-facebook) account, but it had actually registered me, and I could login with my details.
Wish a Windows version was incoming as well. Alas, I'll have to make do with Googling various phrases whenever I want to try and find a webpage I recall seeing and want to find again, but can't remember the name of.
for me, a privacy-respecting free and open-source product built to be accessible via tor browser (i.e. usable without the browser plugin) that accepts bitcoin donations could be a great alternative to similar functionality in e.g. the paid version of pinboard. replicating privacy-enhanced workflows from pinboard/evernote and the ilk can be frustrating at times, and trying to host my own bookmarks as a hidden service seems like overkill.
I just tried your extension on chrome. Works pretty neatly. I was wondering where do you store the index for full text search so that I can keep it in my dropbox and give it a 'cloud' like feature, where I can use the same dropbox location from a different device.