
WinLychee · on Sept 24, 2023

Never forget https://news.ycombinator.com/item?id=8863

> 1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.

That said, right now the industry is going through some turmoil. We're coming off the high of low interest rates, and it's turning into a mighty hangover. Plus, we're trying to automate ourselves away with AI, and (working in) tech just isn't fun with Scrum/Agile/Meetings/Sprints/Bluh.

dang · on Sept 25, 2023

That comment has been unfairly misinterpreted and did not deserve to turn into a meme of dismissal. BrandonM was sincerely trying to help Drew with his YC application (that's what "app" meant on HN in 2007), and if you read the rest you can see that they had quite a nice exchange.

It's a hobbyhorse but I'm on a mission about this:

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

WinLychee · on Sept 25, 2023

That's helpful context, thanks. Unfortunately I cannot edit my original comment, but good to know.

WinLychee · on Sept 22, 2023

It's a problem with a long tail, and it very much depends on what objective you're optimizing for. In search at least, you aim for "good" and "better", but will never achieve "perfect". It's a pretty interesting space at the meeting point of software and data science. You don't necessarily need to read full books before diving in, but play around with "learning to rank" https://xgboost.readthedocs.io/en/latest/tutorials/learning_... and maybe check out https://www.microsoft.com/en-us/research/uploads/prod/2017/0... . Also https://www.tensorflow.org/recommenders/examples/basic_retri... .

WinLychee · on Sept 16, 2023

Different incentives. A publicly traded corporation with thousands of workers trying to grow in perpetuity has different goals than a community project. While the former has more resources, the latter is more mission-driven.

WinLychee · on Sept 14, 2023

If you have a task that is easy to split, make a Python script that runs on a subset of the task, split the work into N subsets, and write one output per process. Once they all complete, join the outputs together. Maybe https://docs.dask.org/en/stable/ is a good start if you want a framework. I don't think there's a consensus; it depends on the problem.
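A minimal sketch of the split / process / join idea above, using only the standard library. The squaring step is a stand-in for the real work, and the names `process_chunk`, `split`, `run` and the `part-NNNN.json` file naming are assumptions made for illustration.

```python
import json
import tempfile
from multiprocessing import Pool
from pathlib import Path


def process_chunk(args):
    """Worker: handle one subset and write its own output file."""
    chunk_id, items, out_dir = args
    results = [x * x for x in items]  # stand-in for the real per-item work
    out_path = Path(out_dir) / f"part-{chunk_id:04d}.json"
    out_path.write_text(json.dumps(results))
    return str(out_path)


def split(items, n):
    """Split items into n contiguous, nearly equal subsets."""
    k, r = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run(items, n_procs=4):
    """Fan out to n_procs workers, then join the per-process outputs."""
    with tempfile.TemporaryDirectory() as out_dir:
        jobs = [(i, chunk, out_dir) for i, chunk in enumerate(split(items, n_procs))]
        with Pool(n_procs) as pool:
            paths = pool.map(process_chunk, jobs)
        # Join step: read the part files back in chunk order.
        merged = []
        for p in sorted(paths):
            merged.extend(json.loads(Path(p).read_text()))
    return merged


if __name__ == "__main__":
    print(run(list(range(10))))
```

Writing one file per process keeps the workers independent, so a failed subset can be rerun on its own before the join.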

WinLychee · on Sept 11, 2023

The software support just isn't there. The drivers need work, the whole ecosystem is built on CUDA not OpenCL, etc. Not to say someone who tries super hard can't do it, e.g. https://github.com/DTolm/VkFFT .

WinLychee · on Sept 10, 2023

What's even better is that net zero isn't doing anything for _already emitted carbon_, or the already accumulated heat energy. We can get to net zero and average temperature is still going to keep rising.

WinLychee · on Sept 5, 2023

These vectors are lower-dimensional than traditional vectors though, aren't they? Vector embeddings are in the hundreds to low thousands of dimensions (roughly 128 to 1024), whereas TF-IDF has the same dimension as your vocabulary. It's also not just about being flat-out better, but about increasing the recall of queries, as you're grabbing content that doesn't contain the keywords directly, but is still relevant. You are also free to mix the two approaches together in one result set, which gives the best of both.
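A rough sketch of the two points above: a TF-IDF vector has one slot per vocabulary term, and a keyword score can be blended with an embedding score. The smoothed IDF formula, the `alpha` weight, and the three-dimensional "embeddings" are all made-up assumptions for illustration; real embeddings would come from a trained model.

```python
import math
from collections import Counter


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(text, idf):
    """Sparse-style vector: its length equals the vocabulary size."""
    tf = Counter(text.lower().split())
    return [tf[t] * idf[t] for t in sorted(idf)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_score(sparse_sim, dense_sim, alpha=0.5):
    """Blend keyword and embedding similarity; alpha is a hypothetical knob."""
    return alpha * sparse_sim + (1 - alpha) * dense_sim


DOCS = ["cheap flights to paris", "budget airfare france", "paris hotel deals"]
# Hypothetical dense embeddings, hand-written for the example.
DOC_EMB = [[0.8, 0.2, 0.1], [0.85, 0.15, 0.05], [0.1, 0.2, 0.9]]
QUERY, QUERY_EMB = "cheap flights", [0.9, 0.1, 0.0]

idf = build_idf(DOCS)
q_vec = tfidf_vector(QUERY, idf)
sparse = [cosine(q_vec, tfidf_vector(d, idf)) for d in DOCS]
dense = [cosine(QUERY_EMB, e) for e in DOC_EMB]
scores = [hybrid_score(s, d) for s, d in zip(sparse, dense)]
```

Here "budget airfare france" shares no keywords with the query, so its TF-IDF similarity is zero, yet the blended score still surfaces it through the dense side.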

pradn · on Sept 5, 2023

The problems with dimensionality certainly show up even with 256 dimensions. PCA-ing down to a few hundred dimensions is still a problem, and then you have to deal with PCA lossiness too!

mk67 · on Sept 5, 2023

Nobody used TF-IDF for vector lookups without applying a PCA first though.

WinLychee · on Sept 5, 2023

User/Item/Query embeddings are the most common. That way you can generate per-user recommendations, or search results for a given query (with personalization using side information). Video will be interesting, once we have video embeddings (maybe this exists already). It depends on the use-case but a few of your ideas are certainly possible. Generally I've seen them at a coarse rather than fine level, but I'm sure that's out there too.

This looks like a good overview if you want to read about it: https://recsysml.substack.com/p/two-tower-models-for-retriev...
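As a toy illustration of retrieval with user/item/query embeddings: score every item against a user (or query) vector with a dot product and keep the top few. The item names and vectors below are invented for the example; in a two-tower setup they would be produced by the trained towers.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, item_vecs, k=2):
    """Return the k item ids with the highest dot-product score."""
    scored = ((dot(query_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Hypothetical item embeddings (one vector per item).
ITEMS = {
    "cooking_video": [0.9, 0.1, 0.0],
    "guitar_lesson": [0.1, 0.9, 0.1],
    "knife_review": [0.8, 0.0, 0.3],
}

# Hypothetical user embedding; a query embedding would be used the same way.
user_vec = [1.0, 0.2, 0.1]
recommended = top_k(user_vec, ITEMS)
```

At real catalog sizes the exhaustive scan would usually be replaced by an approximate nearest-neighbor index, but the scoring idea is the same.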

WinLychee · on Aug 31, 2023

IMO there are many Americans who would work in IT/Tech if you paid them to do it, but the risk calculation doesn't currently make sense. If you're an adult making $25-35/hour in your current job, just meeting rent/utilities/obligations, it's hard to accept going back to school for several years to complete a Bachelor's degree, with zero guarantee of employment, but a definite guarantee of debt on the order of ~60K (taking a cheaper option). This is also true for those lower on the socioeconomic totem pole, whose parents are not going to pay for them to go to school. We've seen the result of making student loans widely available: there are many under-employed Americans in debt.

Numerically I agree with you: the debt load is worth the risk if you're specifically going for software/IT, but the risk is not zero.

TimPC · on Aug 31, 2023

The debt load is worth the risk for someone who's wanted to be an engineer all their life and has seen signs in their schooling that they are likely to be good at it. I think the debt load is very questionable for someone who is at the margin between being an engineer and not being an engineer, and has no indicators they would be good at it. Plenty of engineers (especially outside big cities) end up peaking at $80k/year, which isn't worth taking on a huge amount of debt for, especially if their alternative is earning $60k/year already.

rnk · on Aug 31, 2023

I don't know how many typical hourly workers are going to be able to be reasonably trained for it, but there are some. The vast demand for IT and tech workers should pull them in. But it doesn't. Cost is one reason, but a bigger reason is that it's hard, or they think they can't do it, or they are actively uninterested.

One way to see this is that a million people every year who are already in college choose not to study CS or IT-type things. They could do it as part of regular college but they don't. Then they get out and can't get a job.

WinLychee · on Aug 26, 2023

PHP has some excellent ideas that other languages can't replicate, while at the same time having terrible ideas that other languages don't have to think about. Overall a huge fan of modern PHP, thanks for this writeup.

pm · on Aug 27, 2023

Which excellent ideas does it have that other languages can't replicate?

WinLychee · on Aug 27, 2023

Perhaps more precisely: the de facto Apache-as-runtime + PHP model simplifies a ton of things. Namely, your request state is created and destroyed all within the context of a single process, and you don't have to reason about shared state with other in-flight requests (unless you explicitly choose to go this route). It makes some bad programming patterns workable, because your state doesn't linger over a long-running period. Deploys are also super fast: you just have to swap the application code on disk and it'll get picked up on the next request (in-flight requests will keep processing with the old version IIRC). It's productive if not necessarily pretty. Also it has a type system now!

As a related thought, a lot of the modern serverless stuff feels like it's reinventing the ideas of Apache + PHP, or perhaps CGI?