That site is nice, but they also want to charge me close to $2000 for a list of sites using a single technology. I could definitely do this myself for much less.

Anyone want to build a free version with me?




To be honest, I didn't dig into that site deeply enough to notice that. I pulled this out of an email I was sent just yesterday.

Sure, I'd be interested in collaborating if there's not one out there already.

The spider part is easy: you just need a web client (e.g. Ruby's or Python's Mechanize, Java's HTTPClient, even just wget or curl) coupled with an HTML parser (Hpricot, Nokogiri, Tidy, etc., or even some basic regular expressions). One can readily hack something rough together in an hour or two. Gabriel might already have a lot of the data, and certainly has the code, from building DuckDuckGo, but he may have good reasons to keep that private.
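To make the "web client plus HTML parser" idea concrete, here is a rough sketch using only Python's standard library instead of the tools named above. The names LinkCollector, extract_links and fetch, and the User-Agent string, are made up for illustration. It leaves out everything a real spider needs (robots.txt, rate limiting, a crawl queue, error handling).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs found in <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        # Skip mailto:, javascript: and similar non-web schemes.
        if urlparse(absolute).scheme in ("http", "https"):
            self.links.append(absolute)


def extract_links(html, base_url):
    """Return the list of outgoing links in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch(url, timeout=10):
    """Download one page and return (response headers, decoded body)."""
    # The User-Agent value is a placeholder, not a real crawler name.
    request = Request(url, headers={"User-Agent": "toy-spider/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.headers, body
```

Calling fetch on a start URL and feeding the body to extract_links gives the next batch of URLs to visit; the headers returned by fetch are what a detection step would look at.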

The harder part, and the part that I wonder if builtwith is doing correctly, is the technology detection. Things like JavaScript libraries or CSS frameworks might be fairly easy to detect, but it is not trivial to reliably detect some of the server-side technologies. I recently put together a script to survey the operating system and web server in use at a large number of domains from Alexa's top million list (similar to what Netcraft does), and there are plenty of servers that make even that difficult, let alone determining whether a site is built with Ruby, Java or PHP. There are HTTP headers that could tell you, but not everyone sends them. There are certain signatures that give a pretty good clue, but those aren't always present and can be downright misleading. (I've seen sites that migrated from ASP to Java Servlets, for example, that kept .aspx URLs to avoid breaking links.)
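As a sketch of what header and signature checks could look like, the snippet below collects hints from the X-Powered-By header, session cookie names, script URLs and the URL extension. The function name guess_technologies and the hint tables are illustrative assumptions, not a complete or authoritative signature list, and this is not a claim about how builtwith does it. It returns the evidence for each guess rather than a verdict, because the hints can disagree.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Small illustrative tables; a real survey would need many more entries.
POWERED_BY_HINTS = {"php": "PHP", "asp.net": "ASP.NET", "servlet": "Java"}
COOKIE_HINTS = {"phpsessid": "PHP", "jsessionid": "Java",
                "asp.net_sessionid": "ASP.NET"}
SCRIPT_HINTS = {"jquery": "jQuery", "prototype": "Prototype",
                "mootools": "MooTools"}
EXTENSION_HINTS = {".php": "PHP", ".aspx": "ASP.NET", ".jsp": "Java"}


def guess_technologies(url, headers, body):
    """Return {technology: set of evidence labels} for one fetched page."""
    hints = defaultdict(set)
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    powered_by = lowered.get("x-powered-by", "").lower()
    for needle, tech in POWERED_BY_HINTS.items():
        if needle in powered_by:
            hints[tech].add("x-powered-by")

    cookies = lowered.get("set-cookie", "").lower()
    for needle, tech in COOKIE_HINTS.items():
        if needle in cookies:
            hints[tech].add("cookie")

    # Client-side libraries usually show up in <script src="..."> URLs.
    for src in re.findall(r'<script[^>]+src=["\']([^"\']+)', body, re.I):
        for needle, tech in SCRIPT_HINTS.items():
            if needle in src.lower():
                hints[tech].add("script-src")

    # Weakest hint: the extension may survive a platform migration.
    path = urlparse(url).path.lower()
    for extension, tech in EXTENSION_HINTS.items():
        if path.endswith(extension):
            hints[tech].add("url-extension")

    return dict(hints)
```

For a page at a .aspx URL that sets a JSESSIONID cookie, this reports both ASP.NET (from the extension) and Java (from the cookie), which is exactly the migrated-site case described above. Any serious version would have to weight the evidence instead of trusting a single signature.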

If I remember correctly, someone posted a JavaScript framework survey based on a similar spidering approach on HN a while back; you might be able to find it at searchyc.



