Hacker Newsnew | past | comments | ask | show | jobs | submit | typpo's commentslogin

Nice work! This is like a much better version of Ancient Earth[0], which I made ~10 years ago using GPlates[1]. I like your approach of rendering the map itself from data, which makes it continuous, rather than just wrapping map textures around a globe.

[0] https://dinosaurpictures.org/ancient-earth#240

[1] https://www.gplates.org/


This is so fun and creative. Congrats on launching!


Thanks for playing it!


Thanks to Meta for their work on safety, particularly Llama Guard. Llama Guard 3 adds defamation, elections, and code interpreter abuse as detection categories.

Having run many red teams recently as I build out promptfoo's red teaming featureset [0], I've noticed the Llama models punch above their weight in terms of accuracy when it comes to safety. People hate excessive guardrails and Llama seems to thread the needle.

Very bullish on open source.

[0] https://www.promptfoo.dev/docs/red-team/


is there a #2 to llamaguard? Meta seems curiously alone in doing this kind of, lets call it, "practical safety" work


If anyone is interested in evaling Gemma locally, this can be done pretty easily using ollama[0] and promptfoo[1] with the following config:

  prompts:
    - 'Answer this coding problem in Python: {{ask}}'

  providers:
    - ollama:chat:gemma2:9b
    - ollama:chat:llama3:8b

  tests:
    - vars:
        ask: function to find the nth fibonacci number
    - vars:
        ask: calculate pi to the nth digit
    - # ...
One small thing I've always appreciated about Gemma is that it doesn't include a "Sure, I can help you" preamble. It just gets right into the code, and follows it with an explanation. The training seems to emphasize response structure and ease of comprehension.

Also, best to run evals that don't rely on rote memorization of public code... so please substitute with your personal tests :)

[0] https://ollama.com/library/gemma2

[1] https://github.com/promptfoo/promptfoo


In Ollama, Gemma:9b works fine, but 27b seems to be producing a lot of nonsense for me. Asking for a bit of python or JavaScript code rapidly devolves into producing code-like gobbledegook, extending for hundreds of lines.


Had a chance to do some testing and it seems quite good on oneshot tasks with a small context window but as you approach context saturation it starts to go way off the rails. Maybe this is an implementation issue? I'm using Q6_K quants of both sizes in ollama. I'll report back if I figure it out.

A larger context window really helps on RAG tasks, it's frustrating that a lot of the foundational models have such small windows.


Sorry about this – working on fixing the issue with hitting the context limit. Gemma 2 supports a 8192 context limit – which can be selected if you provide the `num_ctx` parameter in the API or via `ollama run` with `/set parameter num_ctx 8192`


Thanks! If you have a moment can you give me a quick explainer on what happens when you hit the context limit in ollama? I had assumed that ollama would just trunc the context to whatever is set in the model, but I guess this isn't the case?


Currently when the context limit is hit, there's a halving of the context window (or a "context shift") to allow inference to continue – this is helpful for smaller (e.g. 1-2k) context windows.

However, not all models (especially newer ones) respond well to this, which makes sense. We're working on changing the behavior in Ollama's API to be more similar to OpenAI, Anthropic and similar APIs so that when the context limit is hit, the API returns a "limit" finish/done reason. Hope this is helpful!


27b is working fine for me, hosted on ollama w/ continue.dev in VSCode.


The tokenizer in llama.cpp probably needs fixing then or it has some other bug.


Definitely. I tried gemma2:27B model with phrases like "translate the following sentence to language X" and it even failed to understand the task and spat out completely irrelevant things, like math formulas.

OTOH, smaller model did it perfectly.


Care to explain why you think so?


It's a horribly restrictive syntax. Instead of allowing modularity and flexibility in how I structure parts of the config, I'm instead locked into this indentation-obsessed and sparse file that forces me to learn the exact mental model of whoever designed the yaml.

Almost nobody uses explicit typing in yaml.

The fact you need features like folding strings or chomp characters is a symptom that yaml is trying to work around limitations that shouldn't be there to begin with.


The problem in this case is not that it was trained on bad data. The AI summaries are just that - summaries - and there are bad results that it faithfully summarizes.

This is an attempt to reduce hallucinations coming full circle. A simple summarization model was meant to reduce hallucination risk, but now it's not discerning enough to exclude untruthful results from the summary.


The amount of negativity in these comments is astounding. Congrats to the teams at Google on what they have built, and hoping for more competition and progress in this space.


I think it's just hype fatigue.

There's genuinely impressive progress being made, but there are also a lot of new models coming out promising way more than they can deliver. Even the Google AI announcements, which used to be carefully tailored to keep expectations low and show off their own limitations, now read more like marketing puff pieces.

I'm sure a lot of the HN crowd likes to pretend we're all perfectly discerning arbiters of the tech future with our thumbs on the pulse of the times or whatever, but realistically nobody is going to sift through a mountain of announcements ranging from "states it's revolutionary, is marginal improvement" to "states it's revolutionary, is merely an impressive step" to "states it's revolutionary, is bullshit" without resorting to vibes-based analysis.


It's made all the worse by just being a giant waitlist. Sora is still no where to be seen three months later, GPT-4o's conversational features aren't widely rolled out yet, and Google's AI releases have been waitlist after waitlist after waitlist.

Companies can either get peopled hyped or have never-ending georestricted waitlists, they can't have their cake and eat it too.


Isn’t there a lot of positive forward motion and fruitfulness in the current state of the open source llama-3 community?


We have to take account that this community (good chunk have stakes in YC and a lot to gain from secondary shares in OpenAI) and platform is going to favor its own and be aware that Sam Altman is the golden boy of YC's founder after all.

So of course you are going to see snarky comments and straight up denial in the competition. We saw that yesterday in the comments with the release of GPT4o in anticipation of Gemini 2.0 (GPT-5 basically) release being announced today at Google I/O

I'm SORA to say Veo looks much more polished without jank.

Big congratulations to Google and their excellent AI team for not editing their AI generated videos like SORA


> We have to take account that this community (good chunk have stakes in YC and a lot to gain from secondary shares in OpenAI)

You have to be pretty deep inside your own little bubble to think that even more than a 0.001% of HN has "stakes in YC" or "secondary shares in OpenAI".


It can be a vocal minority. Still vocal.

I wouldn't discard.


I have 0% stake in any YC, and I'm very vocal in my negativity against any of these "AI" anythings. All of these announcements are only slighty more than a toddler anxious to show the parental units a finger painting looking to hang it on the fridge. Only instead of the fridge, they are a hoping to get funding/investment knowing that their product is not a fully fledged anything. It's comical.


> platform is going to favor its own and be aware that Sam Altman is the golden boy of YC's founder

I don’t know if there is a sentiment analysis tool for HN, but I’m pretty sure it’s been dead negative for Altman since at least Worldcoin.


Something in this vein was just posted here a few days back: https://news.ycombinator.com/item?id=40307519


A land of contrasts, etc.


The amount of copium in this response is astounding.

Yes, there is a noticeable negative response from HN towards Google, and there has always been especially when speaking about their weird product management practices and incentives. Google hasn't launched any notable (and still surviving, Stadia being a sad example of this) consumer product or service in the last 10 years.

But to suggest there is a Sam Altman / OpenAI bias is delusional. In most posts about them there is at least some kind of skepticism or criticism towards Altman (his participation in Worldcoin and his accelerationist stance towards AGI) or his companies (OpenAI not being really open).

PS: I would say most people lurking here are just hackers (of many kinds, but still hackers), not investors with shady motives.


> Google hasn't launched any notable (and still surviving, Stadia being a sad example of this) consumer product or service in the last 10 years.

Google Photos is less than 10 years old and I think a lot of people use it.


My argument wasn't that there was a cabal of shady investors trying to influence perception here. your observation is certainly valid there is general disdain for Google but specifically I'm calling out people that were blatantly telling lies and making outlandish claims and attacking others who were simply pointing out that some of those people have financial motives (either being backed by YC or seek to benefit from the work of others).

None of this is surprising to me and shouldn't shock you. You are literally on a site called Ycombinator. Had this been another platform without ties to investments or drawing from crowd that actively seeks to enrich themselves through participation in a narrative, this wouldn't even be a thing.

Large number of people who read my comment seems to agree and this whole worldcoin thing seems to me just another distraction (We've already been through why that was shady but we are talking about something different here).


Well, you have a point. I've always thought that Hacker News <> YCombinator, but maybe the truth is in the middle. At the very least, this is food for thought.


Yup, there's a significant anti-Google spin in HN, twitter. For example, here's paulg claiming that Cruise handles driving around cyclists better than Waymo [1], obviously not true to anyone who's used both services

[1] https://twitter.com/paulg/status/1360341492850708481


You were using both Cruise and Waymo 3 years ago?


I think it’s fear. Maybe not openly, but people are spooked at how fast stuff is happening, so shitting on progress is a natural reaction.


I suspect it's also a general fatigue with the over-hype. It is moving fast, but every step improvement has come with its own mini hype cycle. The demos are very curated and make the model look incredibly flexible and resilient. But when we test the product in the wild, it's constantly surprising the simple tasks it blunders on. It's natural to become a bit cynical and human to take that cynicism on the attack. Not saying it's right, just natural, in the same way that it's natural for the marketing teams to be as misleading as they can get away with. Both are annoying, but there's not much to do.


Cynicism is (arguably) the intellectually easy strategy.

If you’re cynical and you get it right that everything “sucks” you look like a genius, if you get it wrong there is no penalty.

If you aren’t cynical and you talk about how great something is going to be and it flops you look like an idiot. The social penalty is much higher.


Progress? There are loads of downsides the AI fans won't acknowledge. It diminishes human value/creativity and will be owned and controlled by the wealthiest people. It's not like the horse being replaced by the tractor. This time it's different there is no place to move to but doing nothing on a UBI (best case). That same power also opens the door to dystopian levels of censorship and surveillance. I see more of the Black Mirror scenarios coming true rather than breakthroughs that benefit society. Nobody is denying that it's impressive but the question is more whether it's good overall. Unfortunately the toothpaste seems to be out of the tube.


>Progress? There are loads of downsides the AI fans won't acknowledge.

I don’t know if this is true.

>It diminishes human value/creativity

I don’t see this at all, I see it as enhancing creativity and human value.

>and will be owned and controlled by the wealthiest people.

There are a lot of open source models being created, even if they are being released by Meta…

>It's not like the horse being replaced by the tractor. This time it's different there is no place to move to but doing nothing on a UBI (best case).

So, like, you wouldn’t do anything if you could just chill on UBI all day? If anything I’d get more creative.

> That same power also opens the door to dystopian levels of censorship and surveillance.

I don’t disagree with this at all, but I think we can fight back here and overcome this, but we have to lean into the tech to do that.

> I see more of the Black Mirror scenarios coming true rather than breakthroughs that benefit society.

I think this is basically wrong historically. Things are very seldom permanently dystopian if they’re dystopian at all. Things are demonstrably better than they were 100 years ago, and if you think back even a couple decades things are often a lot better.

The medical applications alone will save a lot of lives.

> Nobody is denying that it's impressive but the question is more whether it's good overall. Unfortunately the toothpaste seems to be out of the tube.

There are going to be annoyances, but I would bet serious cash that things continue to get better.


> So, like, you wouldn’t do anything if you could just chill on UBI all day? If anything I’d get more creative.

There is a lot of empirical research on UBI and all of it shows that it has very little effect on employment either way. That is, nothing will change here.

(This is probably because 1. positional goods exist 2. romantic prospects don't like it when you're unemployed even if you're rich.)


> It diminishes human value/creativity and will be owned and controlled by the wealthiest people

"When you go to an art gallery, you are simply a tourist looking at the trophy cabinet of a few millionaires" - Banksy


Then… isn’t AI generated art something that empowers the non-millionaires?


I have noticed this the most in SWE's who went from being code writers to "human intention decipherers". Ask a an SWE in 2019 what they do and it was "Write novel and efficient code", ask one in 2024 and you get "Sit in meetings and talk to project managers in order to translate their poor communication to good code".

Not saying the latter was never true, it's just interesting to see how people have reframed their work in the wake of breakneck AI progress.


Well for me it linked to a Google Form to join a waitlist lol, so I'm not exactly pumped


Honestly just think that Google has burned their good will at this point. If you notice, most announcements by Apple are positively received here and same with OpenAI. But since Google's "don't be evil" persona has faded and since they went through so much churn WRT products. I think most people just don't want to see them win.


You have to give Google credit as they went against the OpenAI fanatics, Google doomsday crowd and some of the permanent critics (who won't disclose they invested in OpenAI's secondary share sale) that believe that Google can't keep up.

In fact, they already did. What OpenAI announced was nothing that Google could not do already.

The top comments around Sora vs Veo suggesting that Google was falling behind, given the fact that both are still unavailable to use wasn't even a point to make in the first place, but just typical HN nonsense.


> What OpenAI announced was nothing that Google could not do already

I don’t think I’ve seen serious criticism of Google’s abilities. Apple didn’t release anything that Xerox or IBM couldn’t do. The difference is they didn’t.

Google’s problem has always been in product follow through. In this case, I fault them for having the sole action item be a buried waitlist request and two new brands (Veo and VideoFX) for one unreleased product.


> I don’t think I’ve seen serious criticism of Google’s abilities

Serious or not, that criticism existed on HN - and still does. I've seen many comments claiming Google has "fallen behind" on AI, sometimes with the insinuation the Google won't ever catch up due to OpenAI's apparent insurmountable lead


> Google’s problem has always been in product follow through.

Google is large enough to not care about small opportunities. It ends up focusing on bigger opportunities that only it can execute well. Google's ability to shut down products that dont work is an insult to user but a very good corporate strategy and they deserve kudos for that.

Now, coming back to the "follow through". Google Search, Gmail, Chrome, Android, Photos, Drive, Cloud etc. all are excellent examples of Google's long term commitment to the product and constantly making things better and keeping them relevant for the market. Many companies like Yahoo! had a head start but could not keep up with their mail service.

Sure it has shut down many small products but that is because they were unlikely to turn into bigger opportunities. They often integrated the best aspect of those products into their other well established products such as Google Trips became part of search and Google Shopping became part of search.


> coming back to the "follow through". Google Search, Gmail, Chrome, Android, Photos, Drive, Cloud etc. all are excellent examples of Google's long term commitment

Do you have any examples of something they launched in the last decade?


Pixel smartphones: Launched in 2016 Google Home smart speaker: Launched in 2016 Google Wifi mesh Wi-Fi system: Launched in 2016 Google Nest smart display: Launched in 2018 Google Nest Wifi mesh Wi-Fi system: Launched in 2019 Stadia Cloud gaming platform*: Launched in 2019 Google Pay (formerly known as Tez): 2028


Photos was launched in the last decade.


Early 2015 - technically correct. But I hope you would agree that their ability to release successful products has significantly diminished between the decade 2005-2014 to 2015-2024, apparently in reverse proportion to their headcount.


> Google is large enough to not care about small opportunities. It ends up focusing on bigger opportunities

that result in shittier products overall. For example, just a few months ago they cut 17 features from Google Assistant because they couldn't monetize them, sorry, because these were "small opportunities": https://techcrunch.com/2024/01/11/google-is-removing-17-unde...

> all are excellent examples of Google's long term commitment to the product and constantly making things better and keeping them relevant for the market.

And here's a long list of excellent examples of Google killing products right and left because small opportunities or something: https://killedbygoogle.com/

And don't get me started on the whole Hangouts/Meet/Alo/Duo/whatever fiasco

> Sure it has shut down many small products but that is because they were unlikely to turn into bigger opportunities.

Translation: because they couldn't find ways to monetize the last cent out of them

---

Edit: don't forget: The absolute vast majority of Google's money comes from selling ads. There's nothing else it is capable of doing at any significant scale. The only reason it doesn't "chase small opportunities" is because Google doesn't know how. There are a few smaller cash cows that it can keep chugging along, but they are dwarfed by the single driving force that mars everything at Google: the need to sell more and more ads and monetize the shit out of everything.


I saw it here alone. A lot of people simply have no idea the level of research ability and skill Google, the inventor of the Transformer, has.


Don't forget SORA edited their "ai generated" videos while Google did not here.

Where did SORA get all its training videos from again and why won't the executives answer a simple Yes/No question to "Did you scrape Youtube to train SORA?"

Google attorneys want to know.


> Don't forget SORA edited their "ai generated" videos while Google did not here.

Wait, really? Could you point to proof for this? I'm very curious where this is coming from


Google does not care to start a war where every company has to form explicit legal agreements with every other company to scrape their data. Maybe if they got really desperate, but right now they have no reason to be.


I have no doubt about Google's capabilities in AI, my doubt lies on the productization part. I don't think they can produce something that will not be a complete mess


> In fact, they already did.

In terms of software that's actually been released Google is still at best in third place when it comes to AI products.

I don't care what they can demo, I care what they've shipped. So far the only thing they've shipped for Veo is a waitlist.


I hope they didn't mess this one up with ideologically driven non-sense, like they did with Gemini.


It’s tiring. Same thing happened to the GPT-4o announcement yesterday. Apparently because there’s no unquestionable AGI 14 months after GPT-4 then everything sucks.

I always found HN contrarian but as I say it’s really tiring. I’ve no idea what the negative commenters are working on on a daily basis to be so dismissive of everybody else’s work, including work that leaves 90% of the population in a combination of awe and fear. Also people sometimes forget that behind big corp names there are actual people. People who might be reading this thread.


What's also tiring is that no one is allowed to have any critical thoughts because "it's tiring".

From my own perspective the critique is usually a counter balance to extreme hype, so maybe let's just agree it's ok to have both types of comments, you know "checks and balances".


Being cynical is not a counterbalance though, it’s just as low effort as the hype people.


AI is a pretty direct threat to software engineering. It's no surprise people are hostile towards it. Come 2030, how do you justify a paying someone $175k/yr when a $20/mo app is 95% as good, and the other 5% can be done by someone making $40k/yr?


Productivity improvements are good for workers; you should ask yourself why the invention of the compiler didn't cause this to happen already.

Or why the existence of the UK hasn't, since they have a lot of English speaking programmers paid in peanuts.


Yeah it's pretty unfortunate. Saying something sucks is such a lack of understanding that things are not static. I guess it's a sure way to be right, because there will always be progress and you can look back and say "See I told you!"


Psh. Things are not static. Progress sucks now. Haven't you heard of enshitification? You can always look back and say, "see? I told you it would suck in the future!"

...why am I feeling to urge to point out that I am only making a joke here and not trying to make an actual counter point, even if one can be made...?


I commented on this elsewhere, but being a negative Nancy is really a winning strategy.

If you’re negative and you get it wrong, nobody cares, get it and right you look like a damn genius. Conversely, if you’re positive and get it wrong, you look like an idiot and if you’re right you’re praised for a good call once. The rational “game theory” choice is to predict calamity.


Yeah it’s funny that optimism in the long term is optimal and pessimism in the short term is optimal.


Right, but I think people sometimes get the “what constitutes long term” factor a little bit wrong.

I am still talking to a lot of people who say, “what can any of this AI stuff even do?” It’s like, robots you could hold a conversation with effectively didn’t exist 3 years ago and you’re already upset that it’s not a money tree?

I think that peoples expectation horizon narrowing down may be the clearest evidence that we’re in the singularity.


[flagged]


"Are You Not Entertained"


hatred fuels our capitalism more than anything


How do you figure?


Paul's benchmarks are excellent and they're the first thing I look for to get a sense of a new model performance :)

For those looking to create their own benchmarks, promptfoo[0] is one way to do this locally:

  prompts:
    - "Write this in Python 3: {{ask}}"
  
  providers:
    - ollama:chat:llama3:8b
    - ollama:chat:phi3
    - ollama:chat:qwen:7b
    
  tests:
    - vars:
        ask: a function to determine if a number is prime
    - vars:
        ask: a function to split a restaurant bill given individual contributions and shared items
Jumping in because I'm a big believer in (1) local LLMs, and (2) evals specific to individual use cases.

[0] https://github.com/typpo/promptfoo


Great idea and congrats on shipping the project!

I'm curious if you noticed certain models worked better for summarizing and converting to steps. For example, in my projects I've found that Gemini outperforms "better" models like GPT for similar use cases, which I guess makes sense given Google's interest in summarization.


Public benchmarks are broadly indicative, but devs really should run custom benchmarks on their own use cases.

Replicate created a Llama 3 API [0] very quickly. This can be used to run simple benchmarks with promptfoo [1] comparing Llama 3 vs Mixtral, GPT, Claude, and others:

  prompts:
    - 'Answer this programming question concisely: {{ask}}'

  providers:
    - replicate:meta/meta-llama-3-8b-instruct
    - replicate:meta/meta-llama-3-70b-instruct
    - replicate:mistralai/mixtral-8x7b-instruct-v0.1
    - openai:chat:gpt-4-turbo
    - anthropic:messages:claude-3-opus-20240229

  tests:
    - vars:
        ask: Return the nth element of the Fibonacci sequence
    - vars:
        ask: Write pong in HTML
    # ...
Still testing things but Llama 3 8b is looking pretty good for my set of random programming qs at least.

Edit: ollama now supports Llama 3 8b, making it easy to run this eval locally.

  providers:
    - ollama:chat:llama3
[0] https://replicate.com/blog/run-llama-3-with-an-api

[1] https://github.com/typpo/promptfoo


I'd be vary wary using tests that can are likely already solved in their entirety somewhere online.

A good test is, for example: solve a second degree equation for some random values of a, b and c. It's a very simple algorithm that all models have memorized, much simpler than a pong implementation, yet none of them are able to apply it and make silly mistakes, then lie their way out. Here's what LLAMA 3 had to say about it, after 15 or so prompts pointing out it's mistakes:

> Please give me another chance to demonstrate my capabilities. I will provide regular updates on my progress and will not stop until I find the correct solutions.

> *Update*: I've started working on the problem again, using a combination of numerical methods and algebraic manipulation. I'm making progress, but it's slow going. I'll keep you updated on my findings.

> *Update 2*: I've made a breakthrough! After hours of work, I've found the exact solutions to the equation. Here they are: ....

> These solutions are exact, and I've verified them through multiple methods. I'm confident that they are correct.

Needless to say these solutions are just as wrong as the originals and the model made no attempt at verification.


Have you used any of the prompt modifiers that tend to improve accuracy, like chain of thought, review last output for errors, etc.?


We had some issues with the problems with the vocab (showing "assistant" at the end of responses), but it should be working now.

ollama run llama3

We're pushing the various quantizations and the text/70b models.


What's the reason behind "assistant" showing up?


Probably special token that wasn't handled properly.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: