My concern with the IPFS claims of permanence, like "Where you can know for sure your pictures will be available forever" is that, AFAICT, files are only uploaded to the network once another party requests them.
An example:
    $ ipfs init
    $ echo 'agdkjagka' > test
    $ ipfs add test
    added QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h test
    $ rm -R ~/.ipfs
    $ ipfs init
    $ ipfs cat QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h
    Error: merkledag: not found
In English, `ipfs add` does not upload anything to the network. The only way a hash becomes distributed on the network is for another party to request that hash. Even then, I believe that the data only exists on that party's system. That is, if both you and the requester leave the network, the data is gone.
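For example, if some other node had fetched the hash before the `rm`, with something like the following run on a second machine while the first one was still online:

    $ ipfs cat QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h > /dev/null

then the blocks would have been copied to that second node and the final `ipfs cat` above would still succeed. Without that step, the content only ever exists on the machine that added it.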
Yeah, this is a common source of confusion, and the ipfs.pics tagline doesn't really help the situation. Personally I'd change it to "where you can ensure your pictures will be available forever".
You shouldn't read "permanent" as "eternal", but as "not temporary" --- in the same sense as "permanent address" or "permanent employment". HTTP is temporary in the sense that you're relying on a single server to actively continue serving a file for it to remain available. IPFS is permanent in the sense that, so long as the file exists somewhere in the network, it will remain available to everyone.
Edit: If you want to ensure your content remains available, you still need to host it somewhere (your own server, convincing/paying someone else to, etc). IPFS isn't so much about storage as it is about distribution.
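Concretely, "hosting it somewhere" can be as simple as running a daemon on a box you control and doing something like (the placeholder is whatever hash `ipfs add` gave you):

    $ ipfs pin add <hash-of-your-content>

which makes that node fetch the content and exempt it from garbage collection.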
IPFS is just a piece of the puzzle. Filecoin, a Bitcoin-like technology (but for files instead), is meant to combat this issue. Similar to how you pay Dropbox or AWS to host your files, you'll have IPFS hosts that you pay to rehost your content. Or you'll run your own public daemon on which you can "pin" your own content. In the end, people are not willing to give out disk space for free, therefore Filecoin will exist.
http://filecoin.io/ < built by the same guys behind IPFS and meant to be used together
> In the end, people are not willing to give out disk space for free, therefore Filecoin will exist.
Disk space is only one parameter of the equation, though. Bandwidth and uptime should be considered too in order to estimate the effective amount of resources a node adds to the network.
Given enough peers over time, neither bandwidth nor uptime should be an issue. If an incentive-based system like Filecoin were to take off, and there's a market to exchange that currency, then it may actually be a viable business just to set up farms that host content. Kind of like a hosting service, but with indirect customers.
What if you pay the network to host your blocks by hosting some other blocks from the cloud, at some m:n ratio? The cloud (some random peer) could periodically check your block inventory and make sure you were still holding up your end.
Really, this whole concept is similar to torrents. Either ratio-based (upload at least X% of what you download), or HnR-based (don't download something and then refuse to seed it).
This is true. However, things disappear from the web all the time. Imagine if everything on the internet that was accessed at least once a week remained available forever. While not perfect, it would be much better than the web is today.
IPFS doesn't download anything to new peers, unless the new peers ask for it. That way each node owner can control what's on their node.
But say if popular browsers by default used IPFS as a cache, that way if the original publisher goes away the content could live on, as long as the content is popular.
> But say if popular browsers by default used IPFS as a cache, that way if the original publisher goes away the content could live on, as long as the content is popular.
That is my main issue with the way IPFS is being marketed, as it were.
It is not a "Permanent Web" if the only content that is permanent is popular content. Old links will still break, because old links are generally less popular. Old websites will simply vanish, even ones that are otherwise maintained on the current web.
In particular, applications like this post itself, that are part backup part publishing, aren't great applications of IPFS because your images are just hosted off your home internet connection. Power outage? No data. ISP issue? No data. Hardware failure? Hope you had a real backup. Basically, why would I choose IPFS, which is in this case equivalent to self hosting, over flickr, instagram, etc?
Edit: I'd be remiss to not refer you to toomuchtodo's comment below. Were a service like the Internet Archive to take part in IPFS then it would help with some of my above concerns. However, it's not really IPFS that is making the permanence possible so much as the Internet Archive in that circumstance.
The permanence of a "site" in IPFS is intrinsically bound to the active participation of the social entities propelling the site's content.
So, were we to have IPFS support directly in the browser, every time you or I take a look at the pics in a thread, for example, we'd be contributing to the effort to keep things on the IPFS network, to our nearest local trusted peers, for as long as the subject is relevant.
So, your typical net forum, whose members are dedicated to the subject, would communicate about that subject as such, and in so doing perpetuate the network.
Yes, the IPFS web has to be tended. But so do your servers. Your servers will die in an instant if the ten or so people who maintain them die (extreme case), or for any one of a thousand different social reasons. In this case, though, the technology is aligned; the load of supporting an IPFS web is distributed among the people whose interest supports the subject, instead of resting on the centralized sysadmin-with-keys-of-godlike-power. This decentralization should be considered an attack on devops. IPFS means that establishing a distributed content delivery system capable of scaling to load no longer requires an admin. The user is the admin.
IPFS also allows IA to not just be a backup, but to help distribute the original content itself. There's no longer a distinction between origin hosts and backups.
>Old websites will simply vanish, even ones that are otherwise maintained on the current web.
Not really. Today, a website needs to be maintained by the original host or it goes away. If IPFS were used, the same site would need to be hosted by the original host or any other interested party.
If absolutely nobody else is interested enough to host it, the original host can continue to do so, and the site would be in the same situation as today's sites: hosted by one node.
>In particular, applications like this post itself, that are part backup part publishing, aren't great applications of IPFS because your images are just hosted off your home internet connection. Power outage? No data. ISP issue? No data. Hardware failure? Hope you had a real backup. Basically, why would I choose IPFS, which is in this case equivalent to self hosting, over flickr, instagram, etc?
While I haven't looked at the source code, I'm fairly certain ipfs.pics is uploading the photos to their servers as well. It's effectively a Flickr-type site using IPFS as the backend, with the added benefit that the photos may still be available somewhere else if their servers disappear.
Well, it doesn't matter so much how many hosts join the network. You still need to convince some members of the network to view your content at least once in order to distribute the data.
I suppose you could argue that content nobody values would justifiably vanish over time, but then it's not really a "Permanent Web."
Edit: Apparently I can't reply to your reply to this comment, but thanks for the link. I hadn't seen that.
I believe IPFS was partially intended to help the Internet Archive in that regard. They'll be the consumer of last resort for all objects, thereby bringing about the Permanent Web.
It'd be interesting to see a browser implement caching using something like IPFS. When a regular HTTP GET (or whatever, really) request is sent, the IPFS-enabled browser could look for a `Link` header with the `ipfs:` scheme and `rel="alternate"` in the response, and use that as an alternate place to look for the content. The Etag header could carry the hash, so the browser could tell on subsequent requests which hash it associates with the mutable URI. In the event of a 304 it'd look up the data in the IPFS cache – which may or may not actually be on disk. If not, it might still be a more efficient fetch than HTTP since there may be many peers; worst case scenario, the only peer is the HTTP server you made the request to in the first place.
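Something like this, say (the hash is a made-up placeholder):

    HTTP/1.1 200 OK
    Content-Type: image/jpeg
    Etag: "QmSomeContentHash"
    Link: <ipfs:QmSomeContentHash>; rel="alternate"

A browser that understands the `ipfs:` scheme could try the IPFS network on subsequent requests; any other browser would just ignore the extra header.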
I suppose `Content-Location` could be used as well, but I don't know how well browsers that don't understand the `ipfs:` scheme would react to that, although the spec doesn't preclude non-http schemes used in the URI.
It'd be an interesting experiment anyway, and could be a boon to adoption of a distributed system like IPFS.
Come to think of it, `Content-Location` is much more semantically appropriate than `Link <ipfs:hash>; rel="alternate"`; the latter is just a link, but the `Content-Location` header would tell you the canonical location of the requested content. For an IPFS enabled client, this would mean that if they want that specific content, they'd never even hit the HTTP server on subsequent requests, but dive straight into IPFS. That said, existing clients may get very confused by an unsupported scheme in that header. Presumably, that client should go `IDK lol` and go use the not-so-canonical URL instead, but I'd be surprised if they'd actually work like that.
It caches things locally (in ~/.ipfs/blocks), so a second system would have to request it for the data to exist on another node at all. However, my understanding is that if that second system left the network and you left the network, the data would still be lost.
You need a third party to request the data and not leave the network to keep the data around.
Given that either the third party reliably remains in the network (e.g. the Internet Archive) or you can consistently get new third parties to request and cache the data, it will remain in the network. The latter does not seem particularly reliable to me, however.
I think it's more of an issue with "marketing" or how IPFS is (was?) presented: It's not a magic web-in-the-sky for all the things -- but it does make it really easy to a) host stuff redundantly, and b) scale out distribution. So you could edit a static web page on your laptop, have a "post commit hook" (or other automagic system) that pulls/pushes published posts to two-three permanent servers -- these could be backed up as normal, or you could just spin up some VMs and have them "restore" from some "master list" of your own content (hashes).
Now as long as at least one device is up (and has the content), you can bring backups on-line easily. And as long as at least one server is connected to IPFS other nodes can get the content, and in theory, any spike in popularity will get distributed and cached "suitably".
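Such a hook could be a handful of lines. A rough sketch, where "pin1" and "pin2" stand in for whatever permanent servers you control (each running an ipfs daemon) and "public/" is wherever your generated site lives:

    #!/bin/sh
    # .git/hooks/post-commit -- publish the site and pin the result remotely
    hash=$(ipfs add -r -q public/ | tail -n 1)   # last hash printed is the root directory
    ssh pin1 "ipfs pin add $hash"
    ssh pin2 "ipfs pin add $hash"
    echo "published as /ipfs/$hash"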
An added bonus is that if you publish something like a controversial but popular political blog post/exposé, and some government throws you in a hole that officially doesn't exist -- your readers, if they're on IPFS, will maintain an active backup of your content by virtue of reading it.
This is a lot more convenient than someone having to explicitly spider it etc (although a combination would probably work/be good idea -- eg: an IPFS "dmoz.org" where authors could register content index-pointers for others to spider/download into their IPFS nodes -- and index for search).
I don't disagree on any particular points. When I first read about it and started playing with it I definitely felt like my expectations were set to something other than what IPFS actually provides.
That said, I think systems of this nature are worth pursuing and perhaps IPFS itself can be improved for more general purpose use cases. For my part, I think it'd be awesome to be able to write some html, css, make some images, `ipfs add ~/website` and then be able to link anyone my content and have reasonable guarantees of its existence for the rest of my life. I can host my own websites, but it's not a particularly enjoyable experience.
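That workflow would be roughly (output abbreviated; `<root-hash>` stands for whatever hash you actually get back):

    $ ipfs add -r ~/website
    added <root-hash> website

and then handing people a link through any gateway, e.g. https://ipfs.io/ipfs/<root-hash>/ or their own local one at http://localhost:8080/ipfs/<root-hash>/.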
> This is a lot more convenient than someone having to explicitly spider it etc (although a combination would probably work/be good idea -- eg: an IPFS "dmoz.org" where authors could register content index-pointers for others to spider/download into their IPFS nodes -- and index for search).
IIRC it's possible to follow announcements of new hashes on the network and retrieve them automatically. I picked this up from #ipfs on Freenode, I believe, so I'm not 100% sure about it. Doing that would make an IPFS search engine fairly robust (and interesting to build, actually).
ipfs dev here! This is indeed possible: you will be able to listen for announcements (provider messages) of hashes that are near your node's peer ID within the Kademlia metric space. To get a complete picture of all hashes on the network, you would need to ensure your nodes had reasonable coverage over a good portion of the keyspace (enough that the "K closest peers" calls for any hash would return at least one of your nodes).
I really want to build something like this, just haven't had the time to do so.
You don't need to do this with Freenet. When you insert data it is pushed to other nodes - a completed upload means other nodes have the data. You can turn off your node and the data is still available.
Did you try requesting the hash through one of the public gateway servers before the "rm -R" line? I'm guessing it might not be something the gateway servers are set up for doing ATM -- but maybe they should be, or a cluster of IPFS nodes could be set up for that?
I'm not sure if forcing a full download of the whole file is a waste of bandwidth, or a clever way to force the person adding a file to the cache to at least perform some effort, by "using" the same amount of incoming bandwidth as the cache nodes would have to spend on the back-end.
My initial thought was that such a system should allow "seeding" the cache by simply sending a HEAD or similar request...
Yeah, you can definitely force the gateways to cache content for you. Just make sure not to rely on that for keeping things available, they run a garbage collection every hour or so to clear up disk space.
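For example, fetching the hash from the example up top through the public gateway should do it:

    $ curl -s https://ipfs.io/ipfs/QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h > /dev/null

(assuming your node is still online so the gateway can find the blocks) -- but again, only until the next garbage collection run, unless the gateway operators pin it.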
Great idea and use case for IPFS. I'm not sure I want to run php just to host images though. I always have this 'oh... It's in PHP...' moment with things like this and I generally end up not wanting to play with or host them, either because I don't find php attractive as a language or because of the various security issues around hosting php applications.
EDIT: I'd encourage the HN community to look past the technical merits of the code, and focus on the idea presented (and even perhaps fork and contribute back an improved version).
As you suggest, looking past the code quality / implementation choices, I believe the idea is solid and it's a creative use of IPFS to attempt to 'solve' a real-world problem. I'm excited to see more projects like this.
He just needs to happen to have IPFS running. Any website can make a request to localhost via JavaScript, or even without it.
Scenario:
Attacker sends you a harmless-looking link to a page that contains some invisible JS that sends a request to localhost and pins kiddy porn on your IPFS node.
Attacker then sends the police to you.
The IPFS API isn't exposed like that, there is actually a whitelist for hashes that can access the API (the IPFS webui is in the whitelist by default), and hashes that are blocked will get a 403.
There's nothing wrong or insecure about choosing not to use prepared statements. There does seem to be a very conservative/cargo-cult streak amongst some tech people insisting that there are only pure ways to do things. For example, there is nothing wrong with building a SQL statement as a string and passing it to a database. If it's programmed correctly then it is exactly as secure as a prepared statement.
> If it's programmed correctly then it is exactly as secure as a prepared statement.
That's a very big "if" right there. The same could be said for nearly every security flaw ever—if it were programmed correctly, it wouldn't be a security flaw. The reason people harp on prepared statements is not because you can't be secure without them, but that it's much easier to be secure with them. One you have to think about all the time (do I need to escape, do I not need to escape), while the other is almost completely secure against SQL injections by default.
Same thing with using system() vs execve()[1]. One is a minefield of quoting issues, the other has none.
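To make the SQL point concrete, a sketch with PDO (table and column names are made up):

    // Building the query by hand: only as safe as your escaping discipline,
    // on every single code path that touches the string.
    $rows = $pdo->query("SELECT * FROM pictures WHERE owner = " . $pdo->quote($owner));

    // Prepared statement: the data is never interpreted as SQL, nothing to remember.
    $stmt = $pdo->prepare("SELECT * FROM pictures WHERE owner = ?");
    $stmt->execute([$owner]);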
Of course it was a bit of a generalization. However, I don't think it's harder to use prepared statements, and I do think you should always use them, even if you're absolutely sure that the data you get in isn't an injection. Not using them rings an alarm bell IMO.
IMHO PHP gets treated fairly. It's possible to write secure and well-designed programs in it, it's just very hard. And in fact I'd argue that half of the internet does not work. We live in a world where zero-days are ubiquitous.
I just think that languages like C, and especially C++, do not get their fair share of the laughter that PHP gets.
"It's 2015, there are not "security issues around hosting php applications" otherwise half the internet wouldn't work."
PHP adds complexity and therefore adds fragility and insecurity. Running PHP ipso facto adds insecurity to an environment.
There are many ways to do web programming that do not add server daemons or listening ports, or long running processes, or additional logins, passwords and management interfaces.
Would be nice if the repo came with a dockerfile where you could just "docker run -d" on your server and it would do everything automagically. You'd just need to point nginx at it.
The difference here is that the content you're opting in to rehosting is sitting in cleartext on your machine. So it's not blind, per se; you can delete (un-rehost) whatever you don't want to be there, or run a nudity detection algorithm (yes, those exist) to automatically prevent re-hosting of any porn, etc.
Similar to this, you could subscribe to a good-faith censor that removes known illegal images from what your IPFS node hosts (using known hashes, for example). Obviously it wouldn't catch everything, but I suspect it would remove any legal obligation on your part (IANAL).
A scheme like this should really have some sort of encryption or chunking wherein the image files wouldn't be reconstructible from any one person's machine. I like the idea of a P2P image hosting service, but I definitely don't want to feel responsible for hosting CP or other terrible garbage.
IANAL but I don't think that such schemes have been tested in court. Here in Germany you can get in trouble for just linking to bad content, so it seems to me that storing a chunk of it, however encrypted and inaccessible to you it might be, isn't going to be safe.
> With Freenet, it's very hard to even prove that you host pieces of any particular thing.
It's only hard if you don't know the content. Freenet uses Content Hash Keying, where the file is hashed and the hash is used as the encryption key (so only people who either know the hash or have the file can request it).
If you have a list of cleartext files or hashes, it's not very hard to check if those are hosted on your node.
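Roughly, the keying scheme looks like this (a sketch of the idea only, not Freenet's actual format):

    // Content Hash Keying, sketched in PHP: the key (and IV) are derived
    // from the plaintext itself, so anyone who already has the plaintext
    // can re-derive them and recognise the encrypted block on a node.
    $key   = hash('sha256', $plaintext, true);
    $iv    = substr(hash('sha256', 'iv' . $plaintext, true), 0, 16);
    $block = openssl_encrypt($plaintext, 'aes-256-ctr', $key, OPENSSL_RAW_DATA, $iv);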
This may be somewhat immature as far as its development goes, but IMHO it's the beginning of the future. My guess is that we'll be seeing almost complete server/client decentralization of many of the most popular apps (social / search / sharing) 5+ years from now.
The system only distributes requested objects. If no one requests spam objects, they won't be replicated or distributed, thereby not wasting resources (in theory).
Git's not that decentralized, though. You've got centralization at a technical level (everybody contacts the same server to push code) and at an organizational level (if you want to add code you need to be the repository owner or someone authorized by them).
That brings up an important point, though: you can make your system partially decentralized, and gain many of the benefits of a totally centralized system with a fraction of the complexity.
What if it was a decentralized system with authentication? Could a community of users host content internally, and add a users public-key (of sorts) to some global "allow" list?
There is a well known decentralized system that scales called email. It has proven difficult to keep clear of spam, but that isn't an issue on a closed/authenticated email network.
A Sybil attack can be seen as either a censorship problem, or a trust problem. As long as you are careful how you add new members, such a system wouldn't be at all vulnerable to a sybil attack.
Well designed decentralized systems scale better than their centralized counterparts (see bittorrent vs. rapidshare et al).
It's more difficult to keep them clear of spam, but hardly impossible. You can run spam detection on every client. You can require some obstacle to posting content (proof of work, or maybe payment of 10 or so microbitcoin). Most effectively, you can design your systems to allow for distributed moderation.
I bet strongly against that simply because large commercial players in the field center a significant part of their business around hoarding users' data.
Even if they somehow figured out how to scale decentralized systems and keep them reasonably spam-free, they wouldn't want them.
I've been looking into developing distributed client-side apps for IPFS, and it would be great if we could just tap into the native IPFS DHT rather than having to depend on a separate IP-based DHT for peer discovery in our apps.
A fully client side implementation of IPFS is in the works. Right now, you'll have to run the daemon on your server and interact with it, so not fully distributed until either the client-side implementation of IPFS is done (node-ipfs on Github/NPM) or the public gateway can accept calls from the JS api client that works in the browser too (node-ipfs-api on Github/NPM).
It looks like the PHP code is the front facing web code which then makes curl calls into the server.js code as you suspected. Presumably server.js is run using node which listens on port 8090 and exposes a few ipfs methods which just calls into the local ipfs binary on the system using exec calls.
IPFS is an interesting project and this is a pretty cool usage of it.
Yes but the argument is unidirectional. AGPL answers different needs than more permissive licenses.
Also it would be good to recognize the work done before shooting out "wrooooong license (for me)", especially when it still is licensed under a free software license.
I'd say yes. IMO there isn't a single thing you can do with an AGPL-licensed piece of SW that you cannot do with an MIT-licensed one; the other way around, quite a bit.
> Also it would be good to recognize the work done before shooting out "wrooooong license (for me)"
Sorry. Wasn't my intention, but I can clearly see it being read that way. I just wanted to point it out, as I personally don't like AGPL and wanted others to avoid the disappointment I had.
> IMO there isn't a single thing you can do with an AGPL-licensed piece of SW that you cannot do with an MIT-licensed one
Sure there is. You can prevent others from imposing restrictions. If someone takes your non-copyleft software and uses it to build something that forbids people from knowing what's in it, how it works, or what it does with their data, then you are helpless to stop it.
IMO nobody can impose restrictions to your MIT licensed code unless they can somehow threaten you and your users (think patents, law enforcement) in which case you have lost anyway.
They can reuse your MIT code and impose restrictions on their modified code, but we still have the free MIT code, which is way freer than AGPL, and we also have an additional proprietary option.
Edit: And if your worry is that they will use the code to create a commercial product with a slightly different file format etc., this is a standards issue, not an open source issue. Furthermore, if there isn't liberally licensed code available they will just create their own broken/different version.
Summarized: I am fairly convinced that for many companies the alternative to using BSD/GPL/LGPL/MIT code isn't using AGPL code but rather using a commercial solution or writing their own. Both of these more or less guarantee you get no patches back to your open source project. AGPL is then just a way to make a statement.
> IMO nobody can impose restrictions to your MIT licensed code
Not to you, but to your users. If you don't care about your users' well-being, well...
> use the code to create a commercial product
I wish people who oppose the AGPL didn't see mistreating their customers as the only way of being commercial. There are several examples of commercial AGPL software.
Yes, hiding the source code and forbidding them from copying the source code is mistreating them. If you are not doing anything wrong, then why are you hiding the source code? When software is naturally copyable, why are you denying them this natural right?
> Both of these more or less guarantee you get no patches back to your open source project
There are also plenty of examples of copylefted projects getting even more patches than corresponding non-copylefted projects. There's a popular kernel called Linux that is copylefted and gets tons of industry-backed patches. People have been using that bugbear for as long as copyleft has existed. Mailpile got that threat when it switched to AGPL-only and so far its contributions haven't slowed down.
> Not to you, but to your users. If you don't care about your users' well-being, well...
They can still use my MIT-licensed code, which gives my users more rights than AGPL ever will.
> Yes, hiding the source code and forbidding them from copying the source code is mistreating them.
Now you are arguing against closed-source, not liberal open source licenses.
> There's a popular kernel called Linux that is copylefted and gets tons of industry-backed patches.
Note that you can use Linux on your server without announcing it to everyone who accesses it. Also, another reason patches are submitted is that the industry hopes their patches will be mainlined so there will be less maintenance.
That said, GPL has worked well for the Linux kernel, possibly better than BSD would have. (I'm mostly arguing against AGPL.)
> Mailpile got that threat when it switched to AGPL-only and so far its contributions haven't slowed down.
> Now you are arguing against closed-source, not liberal open source licenses.
Non-copyleft licenses provide no protection against proprietary derivatives. If you have no objection to proprietary derivatives, then I think that's a weak stance to take. It means you don't care if your users can be given your software but without the rights that you intended them to have, because someone between you and your users can always take that right away.
While your users might come back to you to get the same software with the same rights you intended them to have, they might not be aware of this, or they might not want your software alone, but together with whatever else was built on top of it, which they cannot get under your original permissive terms anymore.