
wonderful!

Coincidentally, I read a paper [1] today, which has a nice video demo here [2], describing a very similar system but with only one type of cell and a simple rule. Surprisingly enough, the particles form cell-like structures and seem to "reproduce".

[1] http://www.nature.com/articles/srep37969

[2] https://www.youtube.com/watch?v=makaJpLvbow


Really nice! In your fitness function, wouldn't you prefer to normalize somehow for the line's length? I.e., longer lines (going between two far-apart pins) will cover more dark pixels and are more likely to be picked, even though they might also cover many bright pixels. I wonder whether the average darkness of the pixels along the line would work better? (A sketch of what I mean is below.)
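
A minimal Python sketch of that idea (the helper name, the sampling approach, and the 0-255 grayscale convention are my assumptions, not anything from the original project):

    import numpy as np

    def average_darkness(img, p0, p1, n_samples=200):
        """Score a candidate string by the mean darkness of the pixels it crosses,
        so long chords are not favoured merely for covering more pixels.
        img is a 2-D grayscale array (0 = black, 255 = white); p0/p1 are (row, col) pins."""
        rows = np.linspace(p0[0], p1[0], n_samples).round().astype(int)
        cols = np.linspace(p0[1], p1[1], n_samples).round().astype(int)
        darkness = 255 - img[rows, cols]   # darker pixels contribute more
        return darkness.mean()             # mean rather than sum -> length-normalized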


Normalizing would most likely only change the order of the chosen paths. I do feel there would be something to be gained from penalizing the crossing of light areas; I dare say this would increase the quality of finer details.



This is fun! I took the boys' average-weight data given in [1]:

months from conception, average weight (kilos)

(9, 3.25)
(15, 7.5)
(21, 9.97)
(33, 12.88)
(45, 14.97)
(69, 18.97)

and saw that it really looks like a logarithmic function. So I fitted y to log(x) and got y ≈ 7.4685*log(x) - 13, with R^2 = 0.9965 (thank you, R!). Extrapolating back, I found that at conception the fetus weighs -inf kilos. Not a surprising result, I must say. However, I was surprised to find that only after five months of gestation does the embryo reach the critical mass of 0 kilos. Science!
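
For anyone who wants to reproduce the fit, here is a minimal sketch in Python (numpy standing in for R; the six data points are the ones listed above):

    import numpy as np

    # Months from conception and average weight in kilos (data from [1] above).
    months = np.array([9, 15, 21, 33, 45, 69])
    weight = np.array([3.25, 7.5, 9.97, 12.88, 14.97, 18.97])

    # Least-squares fit of weight ~ a*log(months) + b.
    a, b = np.polyfit(np.log(months), weight, 1)
    print(f"weight ≈ {a:.4f} * log(months) + ({b:.4f})")

    # The fitted curve crosses zero weight at months = exp(-b/a),
    # which lands near the five-month mark mentioned above.
    print("zero-weight crossing at", round(np.exp(-b / a), 2), "months")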

[1] http://www.buzzle.com/articles/average-weight-for-children-b...


cool idea!

Could be useful for companies (like the one I work at) where employees tweet via the official company account, which is handled by marketing. If you don't already offer one, you may want a review panel for outgoing tweets, for which (I think) you could charge.


I don't believe Facebook set up this contest in order to find solutions to a problem they have. The biggest clue is that the graph given to contestants was a directed graph, unlike Facebook's social graph.

It seems very likely to me they used that problem in order to find good candidates for their data-group, which is exactly what this contest was all about. Either that, or they are trying to expand the social graph of Instagram :)


I'm pretty sure Facebook's social graph has been directed since they introduced the Subscribe To feature.


It's a very interesting field. Can insurance companies detect the early stages of Parkinson's when you call their call centers, and change your rates accordingly? Here's a related dissertation on detecting mental health conditions from voice recordings: http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-55... (I haven't read it yet, but have been meaning to for quite a while.)


How did it go for you? I finished in a disappointing 12th place...


Being a machine learning (and Kaggle competitions) novice, can you explain what it means that your feature selection didn't properly implement cross-validation?


What it means is that we did something stupid. I'm not sure how novice you are, so I'll take it from the top.

So, let's say you have a large training set - a large number of randomly chosen example essays, and a score for each essay. You want to develop a model that will predict/guess the scores of other essays in the future.

Your model is going to be evaluated against a set of essays - the test set - which you do not see until after the competition is over (at which point you may no longer make changes to your model).

Now, in an ideal world, we'd be able to take just one look at the training set and instantly learn the best way there is to predict similar essay scores in the future. But, in practice, that's not going to be the case. We, the designers of the ML approach, initially don't know how to build essay-grading systems. So we need a development process, where we think, then try a particular approach, then see how well it works. And we have to iterate on that process.

So, the key bit there is that we need to see whether a particular approach is working. We can't check the test set - we don't get access to that until the end of the competition. So, instead, we take the training set and partition it into two parts, let's call them A and B. Essays from both A and B are in the training set, as we've just said, hence we know the scores for the essays of both A and B.

We now train our model on essays from A, without it being allowed to look at essays from B. Then, afterwards, we have it make predictions/guesses of the scores of essays in B, and see how well it does.

This allows us to evaluate how well our modelling approach is doing.

In practice, we repeat this procedure for many different partitions of the training set, into many different As and Bs.

So, that's cross-validation.
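
As an illustration (not our actual code), here is a minimal scikit-learn sketch of that A/B procedure, with random placeholder data standing in for essay features and scores:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold

    # Placeholder data: 200 "essays", each with 50 numeric features, plus a score.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = rng.normal(size=200)

    errors = []
    for a_idx, b_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = Ridge().fit(X[a_idx], y[a_idx])    # train on partition A only
        preds = model.predict(X[b_idx])            # predict the held-out partition B
        errors.append(mean_squared_error(y[b_idx], preds))
    print("mean held-out error:", np.mean(errors))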

The key thing to realize is that if we use any information from B when training on A, it's not really a fair evaluation. The model we make from A will have been contaminated with information from B.

Which is simple enough.

The thing is, in practice, before doing any of this, people generally go through a conceptually separate process of deciding which features their model is going to use - 'feature selection'. Part of this process is intuitive, but there are also algorithms people use to help with it.

The wrong way to do feature selection is to run your feature-selection algorithms on the entire training set, and then afterwards train your model on one partition A of the training set and test it on the separate essays from B.

This is wrong, because the essays from B were seen in the feature-selection process. Sometimes that doesn't matter - if you are dealing with large data sizes, etc. But in the Kaggle essay competition, in particular because we had rather a large number of features (e.g. bag-of-words n-grams, part-of-speech n-grams) relative to the number of training examples, it certainly did matter.

As a result, we saw much higher scores on the training set than we would have gotten on the test set.

The right way to do things is to do any automatic feature selection only on the training partition of the training set, A. One way to do this is to build any algorithmic feature selection into your process after the point at which you partition into A and B, during cross-validation. But it's very easy, when hacking on a competition solution over a weekend, to accidentally put things in the wrong order in the codebase.
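
Again as an illustration rather than our actual code, scikit-learn's Pipeline makes it easy to keep the order right: because the feature-selection step sits inside the pipeline, it is re-fit on each training fold only, so the held-out fold never influences which features get picked:

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    # Placeholder data: many features relative to the number of examples,
    # which is exactly the regime where leaky feature selection bites.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 500))
    y = rng.normal(size=200)

    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_regression, k=20)),  # fit per fold
        ("model", Ridge()),
    ])
    print("mean CV score:", cross_val_score(pipe, X, y, cv=5).mean())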

For this reason, if you have limited training data, it's generally a good idea, at the very start of the process, to take some training data and put it somewhere completely separate until you think your algorithm is ready to go, and then use it as a final test. People often call this the validation set.


Not that novice, but your explanation was exquisite, thanks!


Different use cases. Leap Motion is for (extremely) short-distance manipulation, which probably gives it the higher resolution, while Kinect is for mid-distance manipulation, which works well for games. I wonder, though, why one would opt for a Leap Motion and not buy a real touch screen if they need to be that close to the screen anyway.


> I wonder, though, why one would opt for a Leap Motion and not buy a real touch screen if they need to be that close to the screen anyway.

A couple of reasons that came to mind:

1) Smudges! I tolerate them on my touch devices, but they really annoy me. Gesture-based manipulation takes you three-quarters of the way to a touch UI without the actual touching.

2) Gesture detection can occur in a plane different from the screen surface. A big reason we don't all use touch screens on our desks is the "gorilla arm" problem. Any interface that requires you to extend your arms to shoulder height must be limited to infrequent use because of fatigue. With a remote sensor like this, you wouldn't have to reach all the way to your screen.

You can see the benefits of this with a simple exercise. Reach forward at chest height to the point that your hand just barely passes the front edge of your keyboard and hold it for a 10 count. Now do the same exercise, but reach all the way to your screen. Hold that for a 10 count and the difference will become clear.


