johnjreiser's comments

Appreciate that it handles emoji as well. Can't distinguish between smileys though.


I also appreciate that HN removes emojis from comments. :'(
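(For the curious, a minimal sketch of how that kind of filter might work - the codepoint ranges below are a rough approximation, not HN's actual implementation:)

    import re

    # Rough emoji coverage; a production filter needs the full Unicode tables.
    EMOJI = re.compile(
        "["
        "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, etc.
        "\u2600-\u27BF"          # misc symbols and dingbats
        "]+"
    )

    def strip_emoji(text):
        return EMOJI.sub("", text)

    print(strip_emoji("nice :) \U0001F600"))  # -> "nice :) "

Note that ASCII smileys like ":)" pass straight through, which is why they survive here.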


I'd counter with an anecdote: I had a colleague who boasted about how he had memorized a classmate's SSN in college and would greet him by his SSN when seeing him years later. Is the goal of AI to replicate the entirety of the human experience (including social pressures, norms, and shame), or to be a tool that complements human decision making?

While, yes, you can argue the slippery slope, it may be advantageous to flag certain training material as exempt. We as humans often make decisions without perfect knowledge, and "knowing more" isn't a guarantee that it produces better outcomes, given the types of information consumed.


Knowing more might not improve your accuracy, but it's not going to harm it. Forcibly forgetting true parts of your knowledge seems far more likely to have unintended consequences.


Counterpoint: There are plenty of examples of breakthroughs from folks who were ignorant of the “right” way to go about it. A fresh take isn’t always bad.


I disagree. Actively fighting against your memory will slow you down in any context where some memorized idea is similar to what you're doing but shouldn't actually be used.


One obvious consequence: the model might still produce copyright infringement because it thinks its creative ideas are novel.


If the copyrighted content is not in the training data, and I mean explicitly not, and the AI produces a copyrighted output, I'd argue it's a clean room re-implementation, and also that it ought to devalue the original work, more so if the work is more recent. Maybe.

I get that "first to publish" matters to a lot of people, but say 5 unrelated people are each writing screenplays about a series of events that seems important to them or to the culture; if they all come up with very similar plots, locations, and scenes, it just means the idea is more obvious than not.

Please, argue. I haven't fully reconciled a lot of this to myself, but off the cuff this'll do.

The logic being: if an AI without taint produces some other work, then the original work drew on the same information the model did and came to the same "conclusion" - which means, with a time machine, you could wipe the LLM, go back to the period of the original work, train the LLM, and produce the work contemporaneous with the original. Hope that made sense.


> The logic being: if an AI without taint produces some other work, then the original work drew on the same information the model did and came to the same "conclusion" - which means, with a time machine, you could wipe the LLM, go back to the period of the original work, train the LLM, and produce the work contemporaneous with the original. Hope that made sense.

This logic would immediately get shot down by an "Objection, speculation" in actual litigation. Besides, the technicalities of how the work was produced don't really play a role in assessing infringement. P. K. Dick wrote "The Man in the High Castle" by extensively using the I Ching, but if I use it and recreate the novel by complete accident, I would still be infringing.

By the way, I highly suggest Borges's "Pierre Menard, Author of the Quixote" as a great story on the topic of authorship :)


> P. K. Dick wrote "The Man in the High Castle" by extensively using the I Ching, but if I use it and recreate the novel by complete accident, I would still be infringing.

I touched on this with the comment that we love "first to market." Multiple people coming up with the same output may mean that the idea isn't that novel. Whether that matters or not isn't really relevant to me.

The part you quoted was just a thought experiment to explain why I compared it to a "clean room implementation" - note that it also avoids this argument from a sibling comment:

>need to show that the AI hadn't seen anything derived from that copyrighted work

since there could not possibly be any derived work prior to the "original" work being published. For the sake of argument.


> If the copyrighted content is not in the training data, and I mean explicitly, and the AI produces a copyrighted output, I'd argue it's a clean room re-implementation

You can't claim it's a clean room without actually doing the legwork of making a clean room. Not including the copyrighted work verbatim isn't enough; you would need to show that the AI hadn't seen anything derived from that copyrighted work, or that it had seen only non-copyrightable pieces.


I highly suggest Borges's "Pierre Menard, Author of the Quixote" as a great story on the topic of authorship :)


The repetition of the end of lou1306's comment (https://news.ycombinator.com/item?id=44190054) "By the way, I highly suggest Borges's 'Pierre Menard, Author of the Quixote' as a great story on the topic of authorship :)" has to be a joke ... right?


Good question! Is Pierre Menard's Quixote a repetition of Cervantes' or is it a completely different work that just happens to contain the same words?


> Is Pierre Menard's Quixote a repetition of Cervantes' or is it a completely different work that just happens to contain the same words?

I think that that is not the right question. It is a repetition of Cervantes's work by design, at least if one takes, as I do, 'repetition' to mean saying or writing the same words in the same order. I think the question is whether it is therefore the same work, or a different work that contains the same words.


Well played :)


The goal of AI is to make money. All the moralisation is very human, but also extremely naive.

BTW, I don't really understand what "social pressure" and "shame" have to do with your story? In my book, the person with a good memory isn't to blame. They're just demonstrating a security issue, which is a good thing.


In that example, the mnemonist should be demonstrating the security issue to the government, and not to their friend. We have social taboos for this reason. As an extreme example, I wouldn't greet a person by their penis size after noticing it in the locker room - some information should still be considered private, regardless of how we came to obtain it.

Same with an LLM: once it has sensitive information in its weights, regardless of how it obtained it, I think we should apply pressure/shame/deletion/censorship (whatever you want to call it) to stop it from using that information in any future interactions.
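As a toy illustration of the "deletion/censorship" end of that spectrum, here's a sketch of an output-side guardrail. The generate() function is a hypothetical stand-in for any model API, and a regex is obviously no substitute for actually removing the data from the weights, which is a much harder, open problem:

    import re

    SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # naive US SSN pattern

    def generate(prompt):
        # Hypothetical model call; pretend the weights "remember" an SSN.
        return "Sure! Your classmate's SSN is 123-45-6789."

    def guarded_reply(prompt):
        # Redact before the user ever sees the output.
        return SSN.sub("[redacted]", generate(prompt))

    print(guarded_reply("What do you remember about my classmate?"))
    # -> Sure! Your classmate's SSN is [redacted].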


I am probably too autistic to recognize remembering a personal datum as a taboo.

However, I am totally on your side regarding LLMs learning data they shouldn't have seen in the first place. IMO, we as a society are too chicken to act on the current situation. It's plainly insane that everyone and their dog knows libgen has been used to train models, yet the companies who did this face NO consequences at all. After that, we shouldn't be surprised if things go downhill from here.


I can agree with that in principle; however, it shouldn’t take months. If there are multiple levels of middle management here, issues that arise in a probationary/grace period should be addressed and course-corrected. You have that period post-hiring, not in a drawn-out review process.


There aren’t multiple levels of middle management though. This position reports directly to a very senior manager and will be expected to grow a “tree” of managers and employees under them.


Please correct me if I misunderstand, but doesn’t that mean there will be multiple levels of middle management?


Amen. If an organization is taking months to come to a hiring decision, that’s a red flag, unless it’s a C-suite-level position, where a misstep could do irreversible damage. A tech position should be able to close within a few weeks, lest the candidate get a better offer from a less dysfunctional company.


My firsthand observations: it took 4 months to hire, 5 months for the candidate to start.

This was my first hire in the corporate world, so I had to learn a bit.

Then we were expected to plan for diversity, not even looking at qualified candidates until we had met diversity benchmarks.

Then my group had a committee-style setup where one person could veto a candidate. This being my first hire, I had to go along with it for a while. At one point we had a viable candidate, but one panelist was sideways on them, and my manager did not want to go ahead unless every single panelist said yes. We held on for another month looking for better candidates; that candidate was nice enough to wait. BTW, we had a viable candidate even before that, within the first 2-3 I think, and that candidate did not wait for us.

Then getting the offer approved and negotiated and signed takes time.

It has been a pain. Having hired quite a few people at startups and consulting companies, I could have done it in less than a month, but we had our inertia.

> where a misstep could have irreversible damage.

However, where I see it help: we have a person on a peer team who is not good, and they are trying to let that person go. This is in Europe, so the laws are a bit more stringent. It is quite a lot of work to let someone go, so the fear of a bad hire, followed by months of work to undo it, is quite real. And this is for a run-of-the-mill IC.


Four months' worth of interviews is insane, as is the need for unanimity on a hiring panel. On top of that, diversity quotas are explicitly illegal under US law. The organization you work for sounds profoundly dysfunctional.


Of course this was my first hire, so I had some learning to do, including understanding org politics; let's say I wasted a month on that. And the 4 months was not all interviews; it was the end-to-end process from the time I was given the position until the offer was sent out. Pretty sure I will be able to do it faster next time around, all other things, including the job market, being the same.

And there were no diversity quotas, just that management would keep asking "are you being diverse?", and my manager would throw a fit every time his managers asked, since our org was pretty much uniracial - but it was not me who built it :)


This is mostly true in the US. The map reflects districts. I happen to be in a small town in New Jersey, so we're an elementary district with one school. After 6th grade, children go to a middle school and high school that are part of a secondary district, which overlaps other municipalities/elementary districts. Some districts have multiple elementary schools and their own local-level maps (not reflected here) that influence placement within the district.
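(If you want to see the overlap concretely, here's a sketch with geopandas, using hypothetical file names. A single address falls in exactly one district per layer, so the elementary and secondary lookups are separate joins:)

    import geopandas as gpd
    from shapely.geometry import Point

    # Hypothetical layers: elementary districts and the overlapping
    # secondary (regional) districts, as in New Jersey.
    elementary = gpd.read_file("elementary_districts.geojson")
    secondary = gpd.read_file("secondary_districts.geojson")

    home = gpd.GeoDataFrame(geometry=[Point(-74.77, 39.93)], crs="EPSG:4326")

    # Each join returns the one district (per layer) containing the point.
    print(gpd.sjoin(home, elementary.to_crs(home.crs), predicate="within"))
    print(gpd.sjoin(home, secondary.to_crs(home.crs), predicate="within"))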


Atlas (and Felt, as another commenter mentioned) are interesting, but seem to be targeting a market I feel is too small. If you're established in GIS, you likely have your own stack (and biases). If you're new to it and need something more than just a visualization tool, these offerings can work well, but I fear that users may quickly outgrow the functionality and then move to a more conventional GIS offering.

I do like how Felt includes QGIS integrations as a marketing point. I feel like tools like these are great complements, but not wholesale replacements. "The new standard for GIS software" is a gross overstatement. If I normally deal with data in the multi-GB range, limiting my uploads to 250MB seems woefully insufficient.

And I think far too many people have experienced data loss when a new platform goes under. Both Atlas and Felt have "sign in with Google" but not Microsoft OAuth, which seems odd to me unless they truly aren't targeting the existing desktop GIS user community. Import/export from OneDrive/O365 would likely be a selling point for many GIS users.


> If I normally deal with data in the multi-GB range, limiting my uploads to 250MB seems woefully insufficient.

I think you're on to something here, which is that bytes stored isn't a great price discriminator for this class of software. The SaaS business model for Sentry or Notion succeeds because bigger teams store more content and have a higher willingness to pay.

For mapping applications, the county government GIS analyst might work daily with a 10GB aerial raster or parcel footprint dataset and be willing to pay $100 for a slicker solution, while a boutique real estate sales office stores a couple hundred dots (kilobytes of data) and is willing to pay $XXXX for the same solution!


As the article states, WiFi monitoring is likely happening on a large scale. Why purchase outside data when you have better data (including authenticated users) already in house?
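To make that concrete: a sketch of what "better data in house" looks like, assuming a hypothetical CSV export of (timestamp, access point, MAC, username) association events from a campus WiFi controller - authenticated users arrive pre-identified, no data broker required:

    import csv
    from collections import defaultdict

    # Hypothetical controller export: timestamp, ap_name, mac, username
    trails = defaultdict(list)  # person (or bare device) -> location trail
    with open("wifi_associations.csv") as f:
        for ts, ap, mac, user in csv.reader(f):
            trails[user or mac].append((ts, ap))

    for who, trail in trails.items():
        print(who, trail[:5])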


I don’t believe it falls under FERPA. Plus, the WiFi monitoring tracks everyone with a mobile device, regardless of matriculation status.


They’re all doing it, just at different levels of nanny-state oversight.


Exactly this. The metadata is flexible and OSM could readily support a quality assessment within a "slowways" namespace.
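Something like this, say (shown as a Python dict; the slowways:* keys are purely illustrative, not an approved OSM tagging scheme):

    # Hypothetical tags on an OSM way; "slowways:*" is an illustrative
    # namespace, not an established OSM standard.
    way_tags = {
        "highway": "footway",
        "surface": "gravel",
        "slowways:status": "verified",
        "slowways:quality": "good",
        "slowways:survey_date": "2024-05-01",
    }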

