
Sounds like an altogether easier problem, assuming you don't need to compensate for noise in your data somehow. Supposing you want to identify all distinct m×n×k-sized "elements": simply use some appropriate rolling hash [1] (i.e., a hash of a window that you can update in constant time as you slide the window) as a key mapping to a list of the "elements" you have seen so far with that hash, and only do a pointwise comparison (to check whether you have seen that exact pattern before) when the hashes match. Assuming you don't have too many distinct elements, this should give you performance close to linear in the size of your data.

[1] https://en.wikipedia.org/wiki/Rolling_hash
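A quick sketch of what that looks like in Python (1D for simplicity, using a Rabin–Karp-style polynomial hash; the function name and constants are just illustrative, and extending the constant-time update to an m×n×k window means rolling along each axis in turn):

    def distinct_windows(data, k, base=257, mod=(1 << 61) - 1):
        """Return the distinct length-k windows of `data` (a list of ints)."""
        if len(data) < k:
            return []
        seen = {}                    # hash -> windows already seen with that hash
        distinct = []
        top = pow(base, k - 1, mod)  # weight of the outgoing element
        h = 0
        for x in data[:k]:           # hash of the first window
            h = (h * base + x) % mod
        for i in range(len(data) - k + 1):
            window = data[i:i + k]
            bucket = seen.setdefault(h, [])
            # pointwise comparison only against windows sharing this hash
            if not any(w == window for w in bucket):
                bucket.append(window)
                distinct.append(window)
            if i + k < len(data):    # slide: drop data[i], add data[i+k], in O(1)
                h = ((h - data[i] * top) * base + data[i + k]) % mod
        return distinct

    print(distinct_windows([1, 2, 3, 1, 2, 3, 9], 3))
    # [[1, 2, 3], [2, 3, 1], [3, 1, 2], [2, 3, 9]]

The per-hash bucket is what keeps collisions from producing false matches: two different windows can share a hash, but the pointwise check catches that.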



Whew, now that was a Wikipedia binge. Thanks for the jumping-off point! Unfortunately, our data is VERY noisy. We can do some techniques to smooth the data, but the unfortunate part is that the things that matter in biology are under the diffraction limit of light, so what we want out of the data will inherently be noisy. Gestalt structures are less noisy, and these techniques can help there (think a fiber-optic readout as you go into the brain for surgery, so you know what region you are in), at least I think. Also, the "elements" of our data set are unknown, but there are likely very many of them, maybe in the 100s to 1000s, not 5 or 10. It turns out the brain is complicated, who knew?!

Still, thanks a ton for this info! I think it can really help with some computational bio work another lab here is doing (viral similarity in genes in your DNA).



