
I don’t really think Google’s plan is that weird. And it would be amazing for decentralized networks, archiving, and offline web apps. Google can’t just serve nyt.com — they can only serve a specific bundle of resources published and signed by nyt.com, which your browser verifies to be authentic and unmodified.



How does centralizing content on Google from multiple sources improve decentralization? The web is already decentralized. That's why it is a web.

AMP is a scourge. It's a bad idea being pushed by bad actors.


The current implementation of the AMP cache servers obviously doesn't help the decentralization.

I think what Spivak is saying, though, is right. If we could move from location addressing (DNS + IP) to content addressing, but not via the AMP cache servers, then in general anyone could serve any content on the web. Add signing on top of the content addressing, and you can also verify that the content really comes from the NYTimes, for example.
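
A minimal sketch of that idea in Python (the helper name and sample bytes are hypothetical, not any real spec): the address is derived from the content itself, so any host can serve the bytes and the client can verify them without trusting that host.

    import hashlib

    def content_address(data: bytes) -> str:
        # The address is just a hash of the bytes, so it doesn't matter who serves them.
        return "sha256-" + hashlib.sha256(data).hexdigest()

    article = b"<html>an article from nytimes.com</html>"
    addr = content_address(article)

    # A client that fetched `article` from any mirror can recompute the hash
    # and compare it against the address it asked for.
    assert content_address(article) == addr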

Also, I'd say that the internet (transports, piping, glue) is decentralized. The web is not. Nothing seems to work with anything else, and most web properties are fighting against each other rather than working together. Not at all like how the internet is built. The web is basically ~10 big silos right now, which would probably kill their API endpoints if they could.


I think this would require an entirely new user interface to make it abundantly clear that publisher and distributor are separate roles and can be separate entities.

I don't think this should be shoehorned into the URL bar or into some meta info that no one ever reads hidden behind some obscure icon.


Isn't it already the case though with CloudFlare and other CDNs serving most of the content? Very few people really get their content from the actual source server anymore.


That's a good point. I just feel that there is an important distinction to be made between purely technical distribution infrastructure like Cloudflare's and the sort of recontextualisation that happens when you publish a video on Youtube. I'm not quite sure where in between these two extremes AMP is positioned.


Thank you for this explanation. AMP has put a really bad taste in my mouth but what you describe here does have some interesting implications. Something to consider for sure.


Please fact-check me on this, but the ostensible initial justification for AMP wasn't decentralization, but speed. Businesses had started bloating up their websites with garbage trackers and other pointless marketing code that slowed performance to unbrowsable levels. Some websites would bring your browser close to freezing because of the bloat. So Google tried to formalize a small subset of technologies for publishers to use to allow for lightning-fast reading, in other words saving them from themselves. AMP might be best viewed as a technical attempt to solve a cultural problem: you could already achieve fast websites by being disciplined in the site you build; Google was just able to use its clout to force publishers to do it. As for what it’s morphed into, I’m not really a fan, because Google is trying to capitalize on it and publishers are trying various tricks to introduce bloat back into AMP anyway. The right answer might be just for Google to drop it and rank page speed for normal websites far higher than it already does.


> How does centralizing content on Google from multiple sources improve decentralization?

It actually makes perfect sense in Doublespeak. /s


They’re suggesting a web technology which would allow any website to host content for any other website, under the original site’s URL, as long as the bundle is signed by the original site. That could be quite interesting for a site like archive.org, as the URL bar could show the original URL.

But AMP is a much narrower technology; I’d imagine only Google would be able to impersonate other websites, so it’s essentially centralised, as you say. The generic idea would just be a distraction to push AMP.

Everything would be so much better if the original websites were not so overloaded with trackers, ads and banners, then there would be no need for these “accelerated” versions.


I see where you are going, but what if my website is updated? Is the archive at address _myurl_ invalidated, or is there a new address where it can be found? I am thinking of reproducible URLs for academic references or qualified procedures, for example, which might or might not matter in the intended use case.

Could there be net-neutrality-like questions in all this as well?


I think this is possible already, but should not override the displayed URL for the content.

Create a new “original URL” field or something.


Google is not a single server. Think of Google as a CDN.


So it's decentralized because Google has multiple servers? And here I was, thinking that Google runs everything from a single IBM mainframe.

What you're saying would be described as distributed... Not decentralized.


Seems to me like it's easy to forget there's a difference between those two.


+1. The way I think about it is that signed exchanges are basically a way of getting the benefits of a CDN without turning over the keys to your entire kingdom to a third party. Instead you just allow distribution of a single resource (perhaps a bundle), in a cryptographically verifiable way.

Stated another way, with a typical CDN setup the user has to trust their browser, the CDN, and the source. With signed exchanges we're back to the minimal requirement of trusting the browser and the source; the distributor isn't able to make modifications.
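
A rough conceptual sketch of that trust model in Python (not the actual Signed HTTP Exchanges wire format; assumes the `cryptography` package): the publisher signs the response bytes once, and the browser verifies them against the publisher's key no matter which distributor delivered them.

    from cryptography.hazmat.primitives.asymmetric import ed25519

    # The publisher (the "source") signs the resource once, offline.
    publisher_key = ed25519.Ed25519PrivateKey.generate()
    resource = b"HTTP/1.1 200 OK\r\n\r\n<html>article body</html>"
    signature = publisher_key.sign(resource)

    # Any distributor (a CDN, a cache, a mirror) can ship (resource, signature).
    # The browser only needs the publisher's public key to confirm the bytes
    # were not modified in transit; verify() raises InvalidSignature if they were.
    publisher_key.public_key().verify(signature, resource)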


It seems like there is a risk that an old version of a bundle will get served instead of a new one by an arbitrary host? Maybe the bundle should have a list of trusted mirrors?


There is a publisher-selected expiration date as part of the signed exchange, which the client inspects. The expiration also cannot be set to more than 7 days in the future at creation. This minimizes, but of course does not eliminate, this risk.
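
A hypothetical client-side freshness check along those lines (the field names are made up for illustration; the 7-day cap is the limit described above):

    from datetime import datetime, timedelta, timezone

    MAX_LIFETIME = timedelta(days=7)  # signatures cannot claim a longer validity

    def exchange_is_fresh(signed_at: datetime, expires_at: datetime) -> bool:
        # Reject exchanges whose publisher-chosen expiry has passed,
        # and exchanges claiming a lifetime longer than allowed.
        now = datetime.now(timezone.utc)
        return expires_at > now and (expires_at - signed_at) <= MAX_LIFETIME

    # A bundle signed yesterday and expiring in three days is still accepted.
    signed = datetime.now(timezone.utc) - timedelta(days=1)
    print(exchange_is_fresh(signed, signed + timedelta(days=4)))  # True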


It also makes signed exchanges completely unusable for delivering packages offline. (E.g. the USB stick scenario)

What a bummer.


Browsers could have a setting to optionally display the content anyway, along with a warning to the effect of "site X is trying to show an archive of site Y", similar to how we currently handle expired or self-signed SSL certificates.


Alternatively super short expiry times. It doesn't seem like it would be that concerning to have another site serving a bundle that was 5 minutes out of date. It doesn't seem like it should be too much load to be caching content every 5 minutes.


I could see some sort of alternative URL bar ("https://nyt.com/somearticle/ | served by https://somecdn.example.org/blah"), but complete replacement is far too dangerous and confusing in that it is completely hidden.


The New York Times surely already serves their pages through a CDN, silently, and with the CDN having the full technical capability to modify the pages arbitrarily. Signed exchange allows anyone to serve pages, without the ability to modify them in any way.

(Disclosure: I work for Google, speaking only for myself)


My objection is that it's no longer clear if you're dealing with content addressing or server addressing. If I see example.com in the URL bar, is it a server pointed from the DNS record example.com (a CDN that server tells me to visit), or am I seeing content from example.com? If I click a link and it doesn't load, is it because example.com is suddenly down, or has it been down this whole time? Is the example.com server slow, or is the cache slow? Am I seeing the most recent version of this content from example.com, or did the cache miss an update?


What if there was a `publisher://...` or `content-from://...` or `content://...` protocol, somehow? (Visible in the address bar, maybe with a different icon too, so one would know it wasn't normal https:.)

And by hovering, or one-clicking, a popup could show both the distributor's address (say, CloudFlare), and the content's/publisher's address (say, NyT)?


> a way of getting the benefits of a CDN without turning over the keys to your entire kingdom to a third party.

https://blog.cloudflare.com/keyless-ssl-the-nitty-gritty-tec... is a thing now.


The session key, which is given carte blanche by the TLS cert to sign whatever it wants under the domain, is still controlled by Cloudflare.

To put it simply, Cloudflare still controls the content. The proposal here would avoid that, by allowing Cloudflare to transmit only pre-signed content.


Your browser would have a secure tunnel to CloudFlare which is encrypted with their key. But that tunnel would then deliver a bundle of resources that your browser verifies separately, with a key CF doesn’t have.


The plan is bad because Google currently tracks all of your activity inside AMP-hosted pages, as they state in their support article.

Google controls the AMP project and the AMP library. They can start rewriting all links in AMP containers to Google’s AMP cache and track you across the entire internet, even when you are 50 clicks away from google.com.


While that's theoretically possible, the library can be inspected and does not do these things.


Could Google give specific persons different versions, or is that technically impossible?


Technically yes, but not very practically. The domain is cookieless, so it would be difficult to even identify a specific user, other than by IP. Also, the JavaScript resource is delivered from the cache with a 1-year expiry, which means that most of the time it's loaded it will be served from the browser cache rather than the web.


How is google.com cookieless?


The AMP javascript is served on the cdn.ampproject.org domain, not google.com.


It's very possible indeed.


They have the log files.


> the library can be inspected

Really? Could you publish how you are inspecting an unknown program to determine if it exhibits a specific behavior? There are a lot of computer scientists interested in your solution to the halting problem.

Joking aside, we already know from the halting problem[1] that you cannot determine in general whether a program will exhibit the simplest behavior: halting. Inspecting a program for more complex behaviors is almost always undecidable[2].

In this particular situation, where Google is serving an unknown JavaScript program, a look at the company's history and business model suggests that the probability they are using that JavaScript to track user behavior is very high.

[1] https://en.wikipedia.org/wiki/Halting_problem

[2] https://en.wikipedia.org/wiki/Undecidable_problem


By reading the source code?

    def divisors(n):
        for d in range(1, n):
            if n % d == 0:
                yield d

    n = 1
    while True:
        if n == sum(divisors(n)):
            break
        n += 2
    print(n)
I don’t know if this program halts. But I’m pretty sure it won’t steal my data and send it to third parties. Why? Because at no point does it read my data or communicate with third parties in any way: it would have to have those things programmed into it for that to be a possibility. At no point did I have to solve the halting problem to know this.

Also, if I execute a program and it does exhibit that behaviour, that’s a proof right there.

The same kind of analysis can be applied to Google’s scripts: look at what data they collect and where they push data to the outside world. If there are any undecidable problems along the way, then Google has no plausible deniability that some nefarious behaviour is possible. Now, whether that is a practical thing to do is another matter; but the halting problem is just a distraction.


> at no point does it read my data

Tracking doesn't require reading any of your data. All that is necessary is to trigger some kind of signal back to Google's servers on whatever user behavior they are interested in tracking.

> or communicate with third parties

Third parties like Google? Which is kind of the point?

> [example source code]

Of course you can generate examples that are trivial to inspect. Real world problems are far harder to understand. Source is minified/uglified/obfuscated, and "bad" behaviors might intermingle with legitimate actions.

Instead of speculating, here is Google's JS for AMP pages:

https://cdn.ampproject.org/v0.js

How much tracking does that library implement? What data does it exfiltrate from the user's browser back to Google? It obviously communicates with Google's servers; can you characterize if these communications are "good" or "bad"?

Even if you spent the time and effort to manually answer these questions, the javascript might change at any time. Unless you're willing to stop using all AMP pages every time Google changes their JS and you perform another manual inspection, you are going to need some sort of automated process that can inspect and characterize unknown programs. Which is where you will run into the halting problem.


Funny how people can literally "forget" that Google is a third party. Probably people at Google believe they are not third parties. Not even asking for trust, just assuming it. No other alternatives. A trust relationship by default.


> I don’t know if this program halts.

Be cool if you did ;)


If you didn't catch the joke: It is currently unknown whether there are any odd perfect numbers (and the program halts on encountering the first).

https://en.wikipedia.org/wiki/Perfect_number

https://oeis.org/A000396


> Could you publish how you are inspecting an unknown program to determine if it exhibits a specific behavior? There are a lot of computer scientists interested in your solution to the halting problem.

This has nothing to do with the halting problem, because that is concerned with all possible programs, not some specific program.

We obviously know if some programs halt.

    while True: pass
Is an infinite loop.

    X = 1
    Y = X + 2
Halts.

More complex behaviours can be even easier to check. Neither of my programs there makes network calls.


Publishers who use AMP were already allowing Google to track everything through either Analytics or Ads.

Likewise, AMP pages are mostly accessed from Google search, which is already tracked.


As a user I can choose to block GA, either through URL blocking or through legally mandated cookie choices in some regions (e.g. France). When served from Google I have no choice in the matter.


If you can block GA at the client, you can block google.com at the client, no?


Not if I want AMP pages. (I mean, I don’t, but there are presumably people who do.)


The AMP spec REQUIRES you to include a Google-controlled JavaScript URL with the AMP runtime. So technically the whole signing bit is a little moot, given that the JS could do whatever it wanted.


The same could be said of any CDN-hosted JavaScript library, for example jQuery. There is an open intent to implement support for publishers self-hosting the AMP library as well.


For most JS served by a CDN, you can (and should) use Subresource Integrity to verify the content. At least the last time I was involved in an AMP project, Google considered AMP to be an "evergreen" project and did not allow publishers to lock in to a specific version.
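
For reference, the integrity value SRI expects is just a base64-encoded digest of the exact file you pin; a quick sketch of computing one (assuming a local copy of the script, here hypothetically saved as v0.js):

    import base64
    import hashlib

    # Compute a Subresource Integrity value for a pinned copy of a script.
    # The result goes into <script src="..." integrity="sha384-..."> in the page.
    with open("v0.js", "rb") as f:
        digest = hashlib.sha384(f.read()).digest()

    print("sha384-" + base64.b64encode(digest).decode())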


Long term versions are now supported, so publishers can lock in a specific version.

Publisher-hosted copies are in the pipeline, as I referenced in the parent comment. My choice of verbiage was a bit confusing, it appears.


I don't think it's your wording that's confusing. You are contradicting the AMP documentation.

AMP's documentation seems to indicate that the LTS is stable only for one month (new features released via the same URL each month), and so is not compatible with SRI (see https://github.com/ampproject/amphtml/blob/master/contributi...)

You can specify a version (e.g., https://cdn.ampproject.org/rtv/somenum/v0.js), but the AMP validator complains about that.


> The same could be said of any CDN-hosted JavaScript library

Yes, and? What’s your point? It’s actually a security weakness to include third party JS. The whole thing runs on trust.


What's an open intent? Where is this documented?


AMP spec: https://amp.dev/documentation/guides-and-tutorials/learn/spe...

"AMP HTML documents MUST..."

"The AMP runtime is loaded via the mandatory <script src="https://cdn.ampproject.org/v0.js"></script> tag in the AMP document <head>."

Do a whois on ampproject.org:

"Registrant Organization: Google LLC Registrant State/Province: CA Registrant Country: US Admin Organization: Google LLC"

Note that jQuery, as mentioned in some GP comment, has no such requirement. Google AMP is quite unique in this regard. This is NOT some general CDN-type issue. Also... agreed, WTF is "open intent"?



Note "open", i.e., unresolved. Perhaps in a less positive light, "how to enable signed exchanges/AMP without controlling it".


Correct. Open as in not resolved yet, but intended to be resolved in the future.


You missed the required part.


That's not why Google (the corporation) wants this to happen. This is not about technical capabilities but about power.

They cannot be allowed to become the gatekeeper for the web.


They already are. The question is not how we're going to stop that from happening but how we are going to roll it back.


I agree, if we finally got a way to have working bundles on the web, that would be extremely useful. (And it would also restore some of the ability of browsers to work without an internet connection.)

It seems to me that a lot of the security concerns come from the requirement that pages served live and pages served from bundles be indistinguishable to the user - a requirement that really only makes sense if you're Google and want to make people trust your AMP cache more.

I'd be excited about an alternative proposal for bundles that explicitly distinguishes bundle use in the URL (and also uses a unique origin for all files of the bundle).


I believe the issue with this is that users already largely don't understand decorations in the URL, for example the difference between a lock and an extended validation certificate bubble. Educating users on what a bundle URL means technically may be exceedingly challenging.


In what ways is this different from or similar to "content-centric networking"?

https://m.youtube.com/watch?v=gqGEMQveoqg

(Google Tech Talk from Van Jacobson on CCN, many years ago)


AMP is happening and CCN is not.


Do you mean to say that is the only difference?


No, there are many differences but they don't matter. Since CCN is not economically feasible, none of the technical details matter.


Why is CCN not economically feasible?


The problem is ownership. Google is “stealing” or caching content for what they consider a better web.

I don’t support ads but I also don’t support Google serving a version of the page that steals money from content creators. So, therein lies the problem: choice.

I can imagine a future where AMP is ubiquitous and Google begins serving ads on AMP content. Luckily, companies have to make money, and AMP is not in most people’s or companies’ best interests.

If AMP were opt-in only, this would be much more ethically sound.


Signed exchanges guarantee that the content cannot be modified by the cache, which rules out things like ad injection.

Google has never injected ads into any cache-served AMP document (technically, if the publisher uses AdSense this is false, but that's not the point you are making).

It's difficult to follow what definition of theft is being suggested. The cache does not modify the document rendering; it's essentially a proxy. In a semantic sense, this is no different from your ISP or your WiFi router delivering the page.


It's completely moving away from the client/server model to something else.

Perhaps that's a great thing to do, but it's not something to do quietly.


Just hearing about this from the thread, I'm getting an IPFS vibe from it. It would be interesting to see that tech get more native integration with the browser from this idea.


How is it not weird that I see a domain name in the URL bar that has nothing to do with the domain I actually requested content from?


Why do they need a special extension though? What's wrong with DNS?


Signed exchanges are an extension to digital certificates, such as those used for TLS. This is independent of DNS.


Why would it be amazing for decentralized networks and offline web apps?


If I publish mycoolthing.com/thing, it could be mirrored over a P2P network as peer1.com/rehosted/mycoolthing.com/thing, peer2.com/rehosted/mycoolthing.com/thing, etc., in a way that would make it evident to end-users not familiar with the protocol that the content is from mycoolthing.com.


AMP is of course not P2P.


I think the point is that signed exchanges (https://developers.google.com/web/updates/2018/11/signed-exc...) could potentially be useful if separated from AMP and made into an actually secure thing. For example, the spec doesn't require specific Google-controlled JS URLs to be in the content.


Signed exchanges is actually a separate spec from AMP. The browser implements it independently. There is no requirement for AMP pages to use signed exchanges, nor for signed exchanges to be AMP.



