I see this as a remarkable answer to the problem of needing to view a cached version of the website.
For example, what if a URL were posted to Hacker News, but after the URL was a ?hasifyme=THEHASH, where THEHASH was the hash of the linked-to website?
This way, if the URL could not be loaded because the server load was too high, you could just forward the URL to Hashify.me and the cached plain text from the website would still be readable.
Boom, instant cache of the website content stored right in the URL!!!!
Actually tinyurl will shorten it but your browser may not allow the redirect. Chrome throws up an error page for security reasons: http://tinyurl.com/3maue6t
That's not possible because the base 64 encoding is longer than the plaintext. There are 4 characters of text in the URL for every 3 characters of text entered.
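The 4:3 ratio is easy to check for yourself. A minimal Python sketch (the helper name b64_length is made up for illustration, and the ratio is really 4 output characters per 3 input bytes, with padding):

```python
import base64
import math


def b64_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (4 chars per 3 bytes)."""
    return 4 * math.ceil(n_bytes / 3)


text = b"Hashify stores the document in the URL itself."
encoded = base64.b64encode(text)

# The encoded form is never shorter than the input.
print(len(text), len(encoded), b64_length(len(text)))
```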
If you're willing to jump through hoops and js-compress your content, you can deal with this issue, unless your compressor uses something like crc. That way, you can grow the plaintext faster than the b64 - though you'd still need to do clever things with the url.
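A rough sketch of the compress-then-encode idea, using Python's zlib as a stand-in for whatever JS compressor you'd actually pick (the function names and the sample text are assumptions for illustration, not anything Hashify does):

```python
import base64
import zlib


def encode_plain(text: str) -> str:
    """URL-safe Base64 of the raw UTF-8 text."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def encode_compressed(text: str) -> str:
    """Compress first, then URL-safe Base64 the compressed bytes."""
    packed = zlib.compress(text.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(packed).decode("ascii")


def decode_compressed(token: str) -> str:
    """Inverse of encode_compressed."""
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


# Repetitive text compresses well; short or random text may not shrink at all.
sample = "Hashify stores the whole document in the URL. " * 40
print(len(sample), len(encode_plain(sample)), len(encode_compressed(sample)))
```

For text this repetitive the compressed token ends up shorter than the plaintext, but that depends entirely on the content.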
bit.ly is the only URL shortening service to support cross-origin resource sharing, as far as I'm aware.
The _right_ thing to do would be to build a shortening service designed to handle URLs of arbitrary length. I'm not sure that I'm willing to take on that responsibility, though.
With a name like 'Hashify', I'm surprised they don't also offer the option of putting the content into the '#fragment' portion of the URL. Then, not even the hashify.net site would need to receive and decode the full URL; they'd just send down a small bit of constant Javascript that rebuilds the page from the #fragment.
The OP is saying that with a #fragment, the browser wouldn't even need to send the server the doc contents. (Since they aren't used anyway, there's obviously no use in doing so.)
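To illustrate: the fragment is split off by the client and is not part of what goes on the request line. A small Python sketch, assuming a hypothetical fragment-style Hashify URL (the site does not actually offer this form, and request_target is a made-up helper):

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Path plus query, i.e. what a client would put on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Hypothetical URL with the document carried in the fragment.
url = "http://hashify.me/#IyBIYXNoaWZ5"

print(request_target(url))      # what the server would be asked for
print(urlsplit(url).fragment)   # what stays in the browser
```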
I can't recall the last thing that made me giggle as much as realizing what they were doing. I can't see any time I'd really use this, but the audacity is inspiring.
Because "URI" stands for "Uniform Resource Identifier" (or URL for "Locator"), not "Resource". The intent is for it to be a pointer, not the value at the location. And, if you were to use the URI as the content (instead of chunking and shrinking it via Bitly), you'd be duplicating that content on every page that links to it. And in your browsing history, by merely viewing it.
edit: oooh, another thought: you're essentially uploading the content of the page to view it.
There's an interesting copyright question in there somewhere too. If the URL for my document is the document then sharing the link is infringing my copyright, or something.
Is this really the case? Bit.ly would still have to forward the user over to the hashify.me site, where the hash would be decoded server-side and the content would have to be sent back over the wire to the client. That's still eating the same amount of bandwidth on hashify.me, no?
Bit.ly has to store the entire document encoded in base64 as the URL of the destination in their database in order to return to users the value of the given bit.ly URL hash. In essence, yes, bit.ly is storing the entire document on their servers anytime anyone shortens a hashify.me link.
Think of it like the difference between the postal service letting you know there is a package that you can go pick up at the post office, and the postal service giving you a package at your home or work that cannot be opened until you go to the store to buy a box cutter, but you have to bring the package with you.
The first example is cheap, since you only receive a pointer or link to where the package is, but you have to do all the work to get it. The second is not cheap, since if the package was a bed from Ikea (for a random large example), the postal service (bit.ly) has to deliver the package to you, and then you have to go somewhere (hashify.me) while carrying that package in order to see what's inside.
Ok, that makes sense. I thought we were debating on whether or not bit.ly incurred ALL the load and hashify.me incurred NONE, but that doesn't seem to be the case.
Last week on a whim I whipped up a URL shortener that expires the forwarded URL after one week[1]. Using that plus hashify, you can essentially make expiring web pages.
"A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum"
No, Hashify is not really a hash function. Since a hash function takes large data to small data, it is implicitly not invertible. Hashify's URLs obviously are.
And the relationship is this: 'path' element of bit.ly URLs are keys in a key->value mapping (where the value is your target URL), and one of the best implementations of a key->value mapping is a hash-table. (At least, it's good for in-memory implementations... I suppose that on disk something a little more elaborate may be called for?)
Historically, the authors of Perl and Ruby (and WP tells me, Common Lisp?) decided to confuse the interface with the implementation, and use "hash" or "hash table" to refer to the mapping, and not every Perl hacker has a Computing Science degree, so now we live in a world of people who think that "hash" means the thing that bit.ly does for you.
In Common Lisp, it's not an interface for a mapping but really a hash table, as some internal details of the hash table implementation are exposed by the interface.
I hacked together an encrypted (AES-256) read/write "database" once with the bit.ly API as the persistence backend.
However, this site disappoints me, it doesn't seem to do anything other than what a data URL can do, except it's vulnerable to downtime because of a centralized website.
Edit: for those of you unfamiliar with what a data URL is: you can store an HTML or image document using a URL like data:text/html;base64,hashifystuffhere
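A minimal Python sketch of building such a data URL from an HTML string (make_data_url is a hypothetical helper, and the charset parameter is my own addition):

```python
import base64


def make_data_url(html: str) -> str:
    """Wrap an HTML document in a Base64 data: URL."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return "data:text/html;charset=utf-8;base64," + payload


page = "<h1>Hello</h1><p>No server required.</p>"
data_url = make_data_url(page)
print(data_url)
```

Pasting the printed URL into a browser's address bar should render the page, subject to whatever restrictions the browser places on data: URLs.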
This is pointless. It's impossible to create two pages that link to each other, for one. Also, as noted, most browsers won't allow URLs greater than 2k in size.
This research is a few years old but, hopefully, things should be even better by now: http://www.boutell.com/newfaq/misc/urllength.html .. Safari, Opera, and Firefox all go over 80k. I found another source for Chrome that says they "could not find any limits on Chrome and Safari".
Nonetheless, it's tricky because you have no idea if proxies in the middle will be able to cope, mobile clients, and all sorts of things.. so you're right in the sense that it's pointless (if you want it to be universally acceptable ;-)).
Clearly this is awesome. I'm curious as to what led you to build it? Understanding that you weren't solving a 'problem', but you've created something really compelling here.
A month ago, all the developers in the office spent two days working on interesting small projects, the idea being that at the end of that time we'd have a bunch of cool shippable features. Though it was encouraged to work on useful, sensible things, this was not a requirement.
Anyway, I felt great until 4pm on the Friday, when we presented our creations. Afterwards, I felt flat (as one often does after meeting a deadline or finishing a series of exams). I didn't want the excitement to end.
As I was walking home I had an idea. For some reason I wanted to share thoughts in 72pt Helvetica. I didn't want to broadcast them (I was melancholic after all), but I felt compelled to express them visually.
I began to think about how this might be done. The Web seemed like the obvious platform. I wondered whether it could be done without a database. I remembered something I had heard on a podcast about a site that allowed one to play musical notes on a computer keyboard, and would encode these in the URL for easy playback.
This seemed a lot more interesting than sharing my moody thoughts, and now that I had something cool to work on I no longer felt the need to do so anyway.
I think I spent 40 hours working on it that first weekend (yes, I was consumed). I truly believed that I could ship it before showering and leaving for work on Monday! Doing so would have been a mistake – I'm pleased that I spent several weeks ironing out the kinks and integrating with bit.ly and Twitter.
It is worth noting there's a difference in the base64.b64encode function, vs the default MIME Base64 codec on strings[2].
Per RFC2045[1]:
(Soft Line Breaks) The Quoted-Printable encoding
REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded
with the Quoted-Printable encoding, "soft" line breaks
Due to the insertion of these soft line breaks, the two encodings produce different output, as you can verify yourself:
import os
import base64
import unittest

class Base64Test(unittest.TestCase):
    def test_long_string_base64_decoding_and_encoding(self):
        byte_seq = os.urandom(500)
        # Python 2: str.encode('base64') applies the MIME codec,
        # which inserts newlines into long output.
        mime64_encoded = byte_seq.encode('base64')
        # The encoded forms differ...
        self.assertNotEqual(base64.b64encode(byte_seq), mime64_encoded)
        # ...but both decoders recover the same bytes.
        self.assertEqual(base64.b64decode(mime64_encoded),
                         mime64_encoded.decode('base64'))

if __name__ == '__main__':
    unittest.main()
Decoding, as you can see above, is fine. This makes a difference when encoding a really long string in an HTTP header.
GET http://hashify.me/IyBIYXNoaWZ5CgpIYXNoaWZ5IGRvZXMgbm90IHNvbH... HTTP/1.1
Host: hashify.me
Proxy-Connection: keep-alive
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.205 Safari/534.16
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
The following error was encountered:
Invalid Request
Some aspect of the HTTP Request is invalid. Possible problems:
Missing or unknown request method
Missing URL
Missing HTTP Identifier (HTTP/1.0)
Request is too large
Content-Length missing for POST or PUT requests
Illegal character in hostname; underscores are not allowed
I am just coding the same thing right now (began a week ago). Also had the idea to use bit.ly as shortener (because of its api) and make use of multiple shortened links to store the data. Right before looking at HN I was doing some research for a good js compressing algo.
On the one hand I am a bit disappointed (that I am too late), but on the other hand hashify.me is made far better than I could have made it. Great realisation.
It's interesting that you say that. For some reason I felt under immense pressure to get this out quickly, but I wrote that off as paranoia (should've heeded Kurt's warning).
Now that it _is_ out in the wild, I can't wait to see what people do with it slash build upon it.
Well, it really _was_ out before in many ways, but not in such a nice one (formatted HTML etc.). Btw, do you have a base64 online converter and read my code six days ago? Just "kidding".
Technically, if you squint hard enough, and assuming that Bit.ly always shortens the same URL to the same code (which on my brief testing it did when I tried the same URL from two different browsers), a one-way hashing function is being created from bit.ly shortened URLs -> the full "hashed" document. Technically. It certainly lacks most of the usual properties we associate with hash functions but the bare skeleton is sort of there. And there's no way to feasibly reverse that hash function algorithmically short of keeping the original input table and querying that, which is how Bit.ly of course works.
> Nice hack, though odd name given that no hashing occurs.
When I registered the domain I imagined that state would be stored in a Twitter-style hashbang. Thanks to `history.pushState` and `history.replaceState`, this hack is not required in modern browsers. :)
"Two-way hash function" is an oxymoron. The fundamental characteristic of a function that makes it a hash function is the inability to (feasibly) reverse it.
I agree with your final conclusion, that Base64 is an encoding, but it is also a hash function and hash functions can very well be two-way. In fact, the trivial hash function just maps data to itself, which is trivially reversible. However, we do want our cryptographic hash functions to be trapdoors to be of use.
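A small Python sketch of the distinction being argued here. The trivial_hash name is made up, and the hash(42) line relies on how Python hashes small integers:

```python
import base64
import hashlib


def trivial_hash(n: int) -> int:
    """Identity function: a valid (if useless) hash for integers."""
    return n


# Python's built-in hash() maps small non-negative ints to themselves.
small_int_hash = hash(42)

# Base64 is an invertible encoding: the input can always be recovered.
encoded = base64.b64encode(b"hello world")
decoded = base64.b64decode(encoded)

# SHA-1 maps any input to a fixed 20-byte digest, so it cannot be
# invertible in general: many inputs must share a digest.
digest_short = hashlib.sha1(b"a").digest()
digest_long = hashlib.sha1(b"a" * 10_000).digest()

print(small_int_hash, decoded, len(digest_short), len(digest_long))
```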
A trapdoor function is one that is easy to calculate in one direction, but difficult to calculate in the other direction without knowing a special bit of information that makes this calculation much easier.
What would a trapdoor value be for SHA1 ?
What I can agree is that I would like the cryptographic hash to be a one-way function, yes. But not trapdoor functions, please :-)
(and can you point to the definition of the hash functions that you described ? I'm curious).
> What would a trapdoor value be for SHA1 ?
Okay, you got me there. It's been long enough since I've actually used the term that I seemed to have remembered it being the same as one-way. Oops.
As for the parenthetical, I'm not sure I take your meaning properly. If you are asking where I learned that hash functions don't have to be one way it seems to be an odd question, but I just checked Wikipedia and it agrees with me, at least.
"Okay, you got me there." - it was not really intended as a trick - sorry for the wording.
As for the trivial hash - now I scrolled down the page on Wikipedia, indeed. I never thought of it this way. (That an identity function on an integer would deserve to be called a hash function :-)
The part that got me was the initial sentence about a hash function converting a "large, possibly variable amount of data" into a "small datum". As "large" and "small" are implied to be of different sizes, I glossed over the possibility of an identity function there.
Oh, I didn't think you intended to trick me. Perhaps some context that'd help you understand my state of mind when I wrote that: while I may be a dev and hacker professionally and as a hobby, I went to university in mathematics specialising in number theory. I'm expected to know silly things like the difference between "one way function" and "trapdoor function", so it's a touch off-putting when I forget.
I know what you mean. For me it was: "Hm. he wrote that with a good confidence. Which part of my knowledge is wrong or incomplete ?" :-) Kind of the same feeling when the significant other garbage-collects a pen you put on the table just a few minutes ago.
So it's a reimplementation of data URIs, except it depends on two different sites being up and responding to requests, so it lacks even the tiny amount of usefulness data URIs have?
I can't think of a single use case where you would go, "Ah ha! Hashify would work perfectly for this!"
One site; once you have the URL you don't necessarily need Hashify to decode it for you. Actually, it seems like Hashify is simple enough that it could be made to work offline using HTML5 without much work.
A similar technique is used already on the website http://www.wondersay.com
Here the URL path is the text to animate and the fragment hash stores the settings. Bitly is also used to hide the contents of the URL (and hence the messages).
This is clever, in that the entire content of the website is not stored in a database, but in external links. Obviously the biggest problem with this technique is having bots crawl your site, so Google's #! convention is used.
would be a good text "host" but needs clones, so that when it disappears in a few years I can still easily convert my urls back into the document therein. that's the one problem these text host sites have. they never last. this gets around this by hosting nothing, merely converting, but still.
and using bit.libya. i dont trust it.
isn't this also somewhat censorship resistant. since the hashify url without its bitly can be put anywhere on the web that is writable, thus making multiple copies available in a covert way.
This is going to break in cases where the request line grows above 8k-16k. Many browsers/proxies implement limits on headers/request lines, for good reasons.
This is quite clearly covered in the actual document!
> For longer documents, Hashify splits the contents into as many as 15 chunks. The chunks are then Base64-encoded and sent to bit.ly in a single request. The bit.ly hashes contained in the response are then "packed" into a URL such as http://hashify.me/unpack:gYi2Ie,g4fpte. Finally, this URL is itself shortened.
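A rough Python sketch of the chunk-and-pack scheme described in that quote. Everything here is an assumption for illustration: fake_shorten is a hypothetical in-memory stand-in for bit.ly (no real API is called), the chunk size is arbitrary, and the stored value is just the encoded chunk rather than a full hashify.me URL:

```python
import base64
from typing import Dict, List

MAX_CHUNKS = 15  # limit taken from the quoted document


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split the UTF-8 bytes into pieces and Base64-encode each piece."""
    data = text.encode("utf-8")
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if len(pieces) > MAX_CHUNKS:
        raise ValueError("document needs more than %d chunks" % MAX_CHUNKS)
    return [base64.b64encode(p).decode("ascii") for p in pieces]


def fake_shorten(encoded_chunk: str, store: Dict[str, str]) -> str:
    """Hypothetical shortener: remember the chunk and hand back a short key."""
    key = "k%d" % len(store)
    store[key] = encoded_chunk
    return key


def pack(keys: List[str]) -> str:
    """Pack the short keys into a single unpack: URL."""
    return "http://hashify.me/unpack:" + ",".join(keys)


def unpack(url: str, store: Dict[str, str]) -> str:
    """Look up each key, decode the chunks, and reassemble the document."""
    keys = url.split("unpack:", 1)[1].split(",")
    return b"".join(base64.b64decode(store[k]) for k in keys).decode("utf-8")


store: Dict[str, str] = {}
document = "# Hashify\n\n" + "A fairly long document. " * 20
keys = [fake_shorten(c, store) for c in split_into_chunks(document, 64)]
packed_url = pack(keys)
print(packed_url)
```

The bytes are joined before decoding so that a multi-byte character split across two chunks still survives the round trip.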
So basically, the URLs are files, and copying/pasting them is like copying/pasting encoded data. It is the same as data: urls actually, except maybe for browser security, which is pretty irrelevant for this anyway.
Actually, now I am wondering if an iframe src could be a data: url in browsers. If so, that could be interesting! Showing content without hitting the server. Probably not though, because of cross-domain security again. Any ideas?
What is the purpose of this? The URL is already pointing to the store - the actual site which hosts the page. Instead now we have a shortened URL which stores the document. So they just took away the distributed nature of the URL and put it all in one store (bitly).
This is sweet. It would actually be possible to create a database using bitly entirely in javascript. It would be read-only for clients and read/write for webservers. You could even make it ACID compliant. I might have a go.
I could see this as a very useful implementation for HTML5/Mobile Web sites.
Consider the user experience for the target site on a mobile platform. You have already loaded the site on your mobile device before even taking action, so when you click the link the response is much faster than requesting the site at the click.
That's a feature. You can encode an entire webpage, including JavaScript, and, yes, including alerts.
As long as they don't have user accounts or database access or such, XSS doesn't let an attacker do anything meaningful. It's not weak security, it's just how the site works.
Agreed! One can always link someone to a static HTML page with <script>alert('fu')</script> in the body, but no one would tag that "XSS".
Does hashify.me make it easier to send annoying alert messages to your friends? Sure. Annoying, but no more harmful than sending them to the static equivalent.
Fair enough. It's similar to embedding third-party gadgets for things like iGoogle. However, in practice that content is sanitized.
Note there are risks to hosting arbitrary Javascript beyond stealing cookies. For example, you can steal browser history, discover NAT IP addresses, scan intranet ports, etc. Here's a presentation by Jeremiah Grossman covering some of these attacks:
http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Gro...
Of course, attackers can host malicious content anywhere they control. I could just as easily send someone a bit.ly link to a malicious site I control.