Well sorry but you don't have a clue what you're talking about.

I very much work in "big data": about 2 terabytes of new data come in every day and have to be ingested and processed by hundreds of jobs. The data needs to be queryable via an SQL-like language and analyzed by a dozen data scientists using R or MapReduce.

There isn't anything on the market today that has been proven to work in environments like this and has the tooling to back it up. Unless you want to prove everyone, e.g. Netflix, LinkedIn, Spotify, Apple, and Microsoft, wrong?




>There isn't anything on the market today that has been proven to work in environments like this and has the tooling to back it up.

Here you go:

http://kx.com/


>Well sorry but you don't have a clue what you're talking about.

From the Guidelines:

Be civil. Don't say things you wouldn't say in a face to face conversation.

When disagreeing, please reply to the argument instead of calling names.


I don't see how he broke the guidelines.


A few large organizations and a handful of scientists have big data. 99.9% of users do not.


The point here is that the same functionality could be implemented far more efficiently, even with shell scripts.

The idea of using standard UNIX tools for the showcase is a good one. Basically, it tells you that a modern filesystem is very good at storing chunks of read-only data (one doesn't need Java for that), with efficient caching and in-kernel procedures, and that using pthreads for jobs is wasteful, because context switching has its costs, etc.
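For illustration only (the file layout and field positions here are invented), a pipeline in the spirit of "standard UNIX tools" that counts event types across a day of tab-separated logs, no Hadoop required:

    # hypothetical layout: timestamp <tab> user_id <tab> event_type
    cat logs/2015-06-01/*.tsv \
      | cut -f3   \
      | sort      \
      | uniq -c   \
      | sort -rn  \
      > event_counts.txt

sort(1) spills to disk on its own when the input doesn't fit in memory, and the kernel's page cache handles the read-only input files, which is exactly the point above.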

To put it simply: by merely rewriting the basic functionality in, say, Erlang, one could get an implementation that is orders of magnitude more efficient.

The only selling point of Hadoop is that it exists (mature, stable, blah-blah). It also has one problem: Java. But as long as hardware is cheap and credit is easy, who cares?



