As someone who's spent a lot of years in an academic social science field and in analytics/modeling more generally, I feel strongly that most educated people have way too much trust in academic literature.
Outright fabricated data like this is rare and dumb; what's far more common is playing around with data cleaning, variable selection, and modeling assumptions until you get whatever result you want. I've seen it from grad students, tenured academics, and DS/MLEs in big tech: basically anywhere you mix high pressure and neurotic personalities with analytics work.
I'm very skeptical that this will succeed. As with other extremely hyped social media platforms (anyone remember Clubhouse?), it's already filled with influencers and e-celebs trying to get in early, and these people always poison the well with insincere, low-quality, monetization-oriented content. AFAIK, every successful social media platform grew more or less organically and only had these folks latch on later (usually degrading the platform when they did).
Clubhouse is a bad example, as its rise was pure luck: the pandemic created a perfect captive audience. No one outside of a16z thought it had staying power.
In Threads' case, there's some luck in Elon burning Twitter down, but at least this launch is intentional.
I agree, but I also think that, at this moment, the conditions are right. People are hungry for a Twitter alternative, and this could lead to a lot of sincere and authentic users joining Threads.
OTOH Twitter is actively imploding and is a toxic place for corporate comms, brands, marketing, etc. A Meta/FB/Instagram alternative is exactly what those entities want so they can rapidly exit the sinking Twitter/Elon ship. I think it's going to do well.
Having worked in analytics at various tech companies including FB, the reason is almost always just that app users have much higher usage rates and ad clickthrough rates. Of course, if you know basic stats you'll recognize this as a spurious correlation (social media addicts prefer apps; using an app doesn't make someone a social media addict), but somehow management never cares.
I find this an extremely odd and uninterpretable article for this reason. When people say that "x drives y" they usually mean that "x causes y", and inflation is by definition price hikes. So the article seems to be saying that "corporate profits cause price hikes", which is meaningless since the causation should run in the other direction (price changes cause profit changes).
The fact that labor costs (which are part of a company's profit calculation) are considered a separate "driver" makes this even more confusing.
There is a stop sign in front of my house, and probably around 95% of cars stop; the number of cars that completely ignore it (i.e. don't even slow down) is easily <1%. I think this is just more of an issue with local cultural norms.
> There is a stop sign in front of my house and probably around 95% of cars stop
There was some discussion among my neighbors about a sign near us, and we found that the key variable is the speed below which one considers a vehicle to be "stopped". Some folks felt that a 6 mph (10 kph) rolling stop at an empty intersection counted as a stop, while others felt that anything more than 1 mph (1.6 kph) meant the driver was running the stop sign. Since the majority of drivers fell somewhere in that range, the estimates of how many people stopped were wildly different.
This class of startup, "build domain-specific LLMs using your own data", is extremely crowded right now, but I am not optimistic about their future. For large companies, the actual modeling work for this is already easy for any ML team, thanks to existing FOSS work on things like PEFT and LoRA. The hard part is figuring out what data goes into the fine-tuning process and how to get that data into a usable form, but this is very business-specific and can't be automated as a SaaS offering.
For SMBs, the value would be in using the LLM to generate responses to customer Q&A/search queries. But these companies aren't going to integrate some external third-party service; they'll only use it if it's already baked into their CMS (WordPress/Shopify/Wix/etc.). I just don't see who the final consumer for this product would be.
> "build domain specific LLMS using your own data",
It seems to me that the vast majority of these people would be better off just doing semantic search with their documents chunked, run through an embeddings process, and stored in a vector database, with the search queries and results then run through an LLM at the final step to create an actual "answer". For applications where this is not practical, I agree that LoRA should be the next approach. I have a hard time believing that the future is everyone training their own domain specific LLMs from the ground up.
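As a rough illustration of that pipeline (a minimal sketch using sentence-transformers; the model name, documents, and final LLM call are all placeholders):

    # Embed chunks, retrieve by cosine similarity, then hand the top hits
    # to an LLM to compose the actual "answer".
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = ["Refunds are processed within 5 business days.",
            "Shipping is free on orders over $50.",
            "Support is available 9am-5pm ET."]
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    def retrieve(query, k=2):
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q  # cosine similarity (vectors are unit-norm)
        return [docs[i] for i in np.argsort(-scores)[:k]]

    context = "\n".join(retrieve("How long do refunds take?"))
    prompt = f"Answer using only this context:\n{context}\n\nQ: How long do refunds take?"
    # `prompt` then goes to whatever LLM you have for the final answering step.

A real deployment would swap the in-memory array for a proper vector database, but the shape of the thing is the same.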
I wholeheartedly agree with this. Vector databases are easily updatable, searchable by recency, and you can verify where the information came from. Training a custom frozen LLM for every company seems insane. Each company’s data is not that unique - it’s just the numbers that matter, for which you need a vector or traditional database.
Look at QLoRA. The adapters can be attached to all layers, allowing you to alter behavior with much less data than the original LoRA implementation. It seems to "stick" better.
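For anyone curious what "attached to all layers" looks like in practice, here's a sketch using Hugging Face's peft and bitsandbytes (the model name, hyperparameters, and LLaMA-style module names are illustrative, not a recipe):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb = BitsAndBytesConfig(
        load_in_4bit=True,                       # 4-bit base weights: the "Q" in QLoRA
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-30b", quantization_config=bnb, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    lora = LoraConfig(
        r=64, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
        # adapters on every linear projection, not just q/v as in early LoRA
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # a tiny fraction of the 30b total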
I just fine-tuned a ~30b parameter model on my 2x 3090s to check it out. It worked fantastically. I should be able to fine-tune up to 65b parameter models locally, but I wanted to get my dataset right on a smaller model before trying.
Are there any repos or steps you can point to for doing this? I'd love to try exactly what you describe. I've been attempting the same and have run into a lot of repos with broken dependencies.
I used https://github.com/artidoro/qlora, but there are quite a few others that likely work better. It was literally my first attempt at doing anything like this; it took the better part of an evening to work through CUDA/Python issues to get it training, and then ~20 hours to train.
> is extremely crowded right now but I am not optimistic about their future
Why not? Every larger "cloud" company seems to be randomly buying one of these at the moment so they can offer "AI", so some of them might get good acquisition deals. This is clearly one of those: a panic buy.
Most time series models assume you've deseasonalized your data in advance. Typically, seasonality is obvious to the human doing the modeling (e.g. sales spiking near Christmas), so it's usually preferable for the human to deseasonalize the data up front using a separate model that bakes in some of their knowledge of how the world works. Forcing the model to learn seasonal trends fully on its own adds another layer of estimation error.
Prophet is popular because it works off the shelf with non-deseasonalized data and mixed frequency data, which makes it great for quick forecasting exercises. But IMO it is never the ideal model if you have a lot of time and expertise to work with.
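For example, with statsmodels you can strip a known seasonal cycle first and model the remainder (a sketch; the series and period are made up):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import STL

    # Fake daily series with a weekly cycle (all numbers invented)
    idx = pd.date_range("2022-01-01", periods=365, freq="D")
    sales = pd.Series(100 + 10 * np.sin(2 * np.pi * np.arange(365) / 7)
                      + np.random.default_rng(0).normal(0, 2, 365), index=idx)

    res = STL(sales, period=7).fit()
    deseasonalized = sales - res.seasonal
    # Fit your forecasting model on `deseasonalized`, then add res.seasonal
    # (projected forward) back onto the forecasts.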
Honestly this makes the most sense to me. I guessed first before thinking it through, and if I were taking the test and it were timed, I might've just gone with the guess.
> I would bet that quite a few people just guess when they find this type of question too difficult.
Psych research is such garbage. Of course people are going to start guessing if the problems are boring and they can't solve them. They tried to account for this by comparing timings of only correct answers, but on a multiple-choice test, even random guessing will sometimes be correct (and fast!).
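A toy simulation shows how fast guesses leak into "correct answers only" timings (all numbers are made up; four answer options means a guesser is right ~25% of the time):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    guessing = rng.random(n) < 0.3                       # assume 30% of trials are guesses
    correct = np.where(guessing, rng.random(n) < 0.25,   # chance level on 4 options
                                 rng.random(n) < 0.80)   # genuine solvers
    seconds = np.where(guessing, rng.normal(5, 1, n),    # fast guesses
                                 rng.normal(30, 5, n))   # slow real attempts
    print(seconds[correct].mean())  # mean "correct" RT, dragged down by lucky guessers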
At work we were facing this dilemma. Our team is working on a model to detect fraud/scam messages; in production it needs to label ~500k messages a day at low cost. We wanted to train a basic GBT/BERT model to run locally, but we considered using GPT-4 as a label source instead of our usual human labelers.
For us, human labeling is surprisingly cheap; the main advantage of GPT-4 would be that it's much faster. Since scams are always changing, we could generate new labels regularly and continuously retrain our model.
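The setup we had in mind looked roughly like this (a sketch with the openai Python client; the prompt and labels are placeholders):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def gpt4_label(message: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "Label the message as 'scam' or 'legitimate'. "
                            "Reply with exactly one word."},
                {"role": "user", "content": message},
            ],
        )
        return resp.choices[0].message.content.strip().lower()

    # The labeled pairs would then train a cheap local GBT/BERT model
    # that actually serves the ~500k messages/day.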
In the end we didn't go down that route, there were several problems:
- GPT-4's accuracy wasn't as good as the human labelers'. I believe this is because scam messages are intentionally tricky and require a much more general understanding of the world than the datasets used in this article, which feature simpler labeling problems. Also, I don't trust that there was no funny business going on in generating the results for this blog, since there's a clear conflict of interest with the business that owns it.
- GPT-4 would be consistently fooled by certain types of scams, whereas human annotators work off a consensus procedure. This could probably be solved in the future when there's a larger pool of high-quality LLMs available to pool for consensus (see the voting sketch after this list).
- Concern that some PII would accidentally get sent to OpenAI; of course, nobody trusts those guys to treat our customers' data with any appropriate level of ethics.
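The voting step we use with human annotators is simple enough to extend to a pool of models (a sketch; the labels and fallback are illustrative):

    from collections import Counter

    def consensus(labels):
        # labels from independent annotators/LLMs, e.g. ["scam", "scam", "ham"]
        winner, count = Counter(labels).most_common(1)[0]
        return winner if count > len(labels) / 2 else "needs_review"

    print(consensus(["scam", "scam", "ham"]))    # -> scam
    print(consensus(["scam", "ham", "other"]))   # -> needs_review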
I wonder if the LLM could at least reliably label something as "smells funny", and then you could have human labelers work only on that smaller, refined batch. But like you said, PII is a concern. In any case, at the rate it's going, does anyone really expect that LLMs one or two years out will still have the same problems?
>> don't trust that there was no funny business going on in generating the results for this blog
All the datasets and labeling configs used for these experiments are available in our GitHub repo (https://github.com/refuel-ai/autolabel), as mentioned in the report. Hope these are useful!
Did you consider fine-tuning your own copy of GPT-4 that can handle scam messages better? I'm doing something similar with Azure OpenAI Services and a custom vector database to handle ham/spam labeling for some of my customer feedback APIs.