Hacker News | dsalaj's comments

This failed completely when I tried to search for songs that don't have a unique name. It seemingly just does a basic string search on other streaming platforms. Sad...


Deep state-space models (Deep SSMs) have shown capabilities for in-context learning on autoregressive tasks, similar to transformers. However, the architectural requirements and mechanisms enabling this in recurrent networks remain unclear. This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning.


Cool, thanks! It desperately needs a cropping feature for the original image, so I don't need other tools.


My interest in sound design and music production took a surprising turn and I now spend my weekends studying alien vocalization. More specifically, figuring out how to synthesize believable alien vocals and make them expressive and customizable enough to be a tool for anyone who needs that sort of thing for their creative work.

I am releasing everything for free and the first "instrument" and sample pack with previews/videos is already available here: https://neuromorph.gumroad.com/l/alien_vocalization_study_1


That’s… amazing. I love it.


In celebration of the newest Star Wars droid, Pip, featured in the upcoming series The Acolyte, Lucasfilm and Autodesk, a global leader in technology and software for Design and Make industries, are teaming up to launch a limited-time droid-design contest. Fans can go to autodesk.com/droid to submit their own designs for a chance to win special grand prizes, including a trip to and guided tour of the Lucasfilm headquarters in San Francisco, California, or The Walt Disney Studios Lot in Burbank, California, and more.

Adam Savage video: https://www.youtube.com/watch?v=LwQ-ssVd0ic


"Grown for centuries by indigenous farmers in rural Mexico, this incredibly rare corn can self-fertilise. In episode three of 'Planet Fix', we explore how this wonder crop could help tackle world hunger, and even end farming's toxic reliance on chemical fertilisers for good!"


What does self fertilization have to do with ending farming's toxic reliance on chemical fertilizers?


In this case "self fertilization" refers to fixing nitrogen, not "fertilizing" in the sense of reproduction.


Knowing exactly how much nitrogen they can fix is important. It's unlikely to be enough to take care of the nitrogen requirements of high yield corn.


In the video it said something about modulating the amount of nitrogen fixation based on the plant's needs. Could mean that it's able to scale it up if its needs are higher, but I imagine this is part of the breeding work they're doing.


I have been working on an entity matching solution for two years now, and I have decided to write down some of the lessons I picked up along the way. It turns out there are too many relevant details to cover in a single post, so I will cover the topic in multiple parts.

This first part is the high-level introduction, useful for project planning and architecture decisions that need to be made early in the development process. Any feedback is welcome, along with wishes for the follow-up parts if you have something specific that you would like to be covered.


Thank you for a very helpful writeup!

Do you have any materials on word embedding strategies past Word2Vec? BERT and beyond?

I am currently working on a recommendation engine for a large library - original idea being to find "similar" documents - the funding comes from a plagiarism checking project.

I was slightly surprised by how deceptively simple the widely cited winnowing paper is (https://dl.acm.org/doi/10.1145/872757.872770). The key idea is a simple mod reduction of hashed fingerprints.
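To make the fingerprinting idea concrete, here is a minimal sketch in Python. It assumes the linked paper is the winnowing paper by Schleimer, Wilkerson, and Aiken, which, as far as I recall, treats the "keep hashes that are 0 mod p" rule as the baseline and proposes keeping the minimum hash of each sliding window instead. All function names, the normalisation step, the truncated SHA-1 hash, and the Jaccard score are illustrative assumptions, not taken from the paper.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-gram of the text after dropping case and non-alphanumerics."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    hashes = []
    for i in range(len(cleaned) - k + 1):
        gram = cleaned[i:i + k]
        digest = hashlib.sha1(gram.encode("utf-8")).digest()
        # Assumption: 32 bits of SHA-1 stand in for a rolling hash.
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes


def mod_fingerprints(hashes, p):
    """Baseline selection: keep (position, hash) pairs whose hash is 0 mod p."""
    return {(i, h) for i, h in enumerate(hashes) if h % p == 0}


def winnow(hashes, w):
    """Window selection: keep the minimum hash of every window of w hashes.

    Ties are broken by taking the rightmost minimum in the window.
    """
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Offset of the rightmost occurrence of the minimum inside the window.
        offset = len(window) - 1 - window[::-1].index(smallest)
        selected.add((start + offset, smallest))
    return selected


def similarity(a, b, k=5, w=4):
    """Jaccard overlap of the winnowed hash values of two texts (illustrative)."""
    fa = {h for _, h in winnow(kgram_hashes(a, k), w)}
    fb = {h for _, h in winnow(kgram_hashes(b, k), w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

The practical difference is that the mod rule can leave arbitrarily long stretches of text without any fingerprint, while the window rule picks at least one hash in every run of w consecutive k-grams, so any shared passage of at least w + k - 1 normalised characters contributes a common fingerprint.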

My project's goal is to find phrase level similarities to assist researchers.

It seems k-grams, n-grams, tf-idf and even Word2Vec are not going to cut it. A "smarter" context-aware embedding is in order. My foray into training BERT from scratch was not very successful, since my corpora are not in English...

PS. As usual I spend most of the time on improving OCR quality and preprocessing corpora...


Achieving high matching accuracy really comes down to the details of your specific domain and dataset. Regarding the attempt to use large pre-trained language models to find semantically similar documents, which is what you are attempting now, maybe try Whisper or other multilingual models, and then fine-tune them on your dataset.

But a better bet might actually be looking into simpler embeddings and methods and attempting to improve them directly by including some domain knowledge in the method or the process. Again, it is hard to judge what might work better just by looking at the surface.

In case you really need to work with labeled datasets, set up a strong baseline, look into the active-learning methods and set up the loop, do a few iterations and try to predict if it will scale sufficiently fast to your target accuracy.


Thank you for this writeup. Having done some work on deduplication/matching systems, my experience (as with many things in data science) is that there are a lot of things to consider and there is no single best solution. Hopefully you are able to keep up with this series, because I think it will be very helpful to many people.


I understand that YouTube recommendations can be a very useful tool, but I definitely do not want to depend on any opaque recommendation system for my daily media consumption or whatever.

Given the existence of "Subscriptions" page I don't really understand the problem. Do you really want to consume regularly from the sources you didn't explicitly "verify" yourself i.e. subscribed to?

I simply use the Unhook browser plugin, which hides the Home, Trending, recommendations sidebar, comments, etc. for me, so I only have the Subscriptions page and the search bar. Makes the YouTube experience SO MUCH better.


>Given the existence of "Subscriptions" page I don't really understand the problem. Do you really want to consume regularly from the sources you didn't explicitly "verify" yourself i.e. subscribed to?

Of course I do! This is often how I find new things that I'm interested in, or just videos that I want to watch that aren't a part of channels I'm already subscribed to.


I cannot get this to work. Tried with both Firefox and Chrome, and turning off the ad blocker. I just get the "Entering world..." popup that just stays there. Occasionally I get the red popup with a cryptic "World is clean" message. I tried registering, I tried different worlds, even tried it on my phone, but it just doesn't start.... :(


Sorry about that. If you try again now you might have better luck...


Thanks for the great tips! I'll make sure to update the content with these points.

