It is my feeling that sentiment analysis has a ways to go. Here are a few of the comments I've made that the system has described as among my saltiest:
Oh, sorry, "nm" means "nanometer" to me, but of course nautical miles. (Score: -0.5. Comment is entirely taking responsibility for misinterpreting someone)
Well, if he was trying 1M combinations every 40 seconds, for $7 per hour, and he didn't need to use hundreds of dollars per hour of commute time, let's say 10 hours = $70. That's 900M combinations per hour, so 9B combinations in 10 hours. If he was trying combinations using upper-case characters, lower-case characters, numbers, and let's say 20 symbols, that's 82 possible combinations for each one. We'd expect him to find the password after exhausting half of the search set, so we want log base 82 of 18B. That suggests 5 characters. If he let's say just used lower-case characters and numbers, that's log base 36 of 18B, which suggests 7 characters. (Score: -0.32. Comment is 100% technical, with no meaningful sentiment.)
Sorry, I submitted this article earlier with the wrong link. (Score: -0.27. System appears to regard legitimate, largely bloodless apologies as salty.)
Note that the article is from approximately 20 years ago. (Score: -0.24. This is to some degree a critical comment, but it's a short, straightforward statement of fact.)
Probably too late to reply, but I mean things like :ets.method or :queue.whatever. (Score: -0.20. Probably it's cueing off the first phrase?)
Thanks for trying my app!
And thank you for taking the time to respond!
It does have a LONG way to go. These are valid criticisms. And all areas we are working to improve on with our next ML model. (Sad vs Negative, rating numbers as "salty",
This model is pretty simple. It's using TextBlob and looking for a combination of negative sentiment (not necessarily condescending) and subjectivity. Essentially hand built heuristics derived by weighting each word in the sentence. Not a great way to make predictions.
The model is FAR from great. But great from afar. For high level (overall user saltiness) it performs better.
The unlabeled dataset of this size presents some unique challenges but in our testing of our new model (based on SOTA BERT fine-tuning & a large labeled training dataset) the results look promising. I'm really looking forward to getting it deployed.
I am encouraged by the words of @pg who said "you can and should give users an insanely great experience with an early, incomplete, buggy product, if you make up the difference with attentiveness.
Can, perhaps, but should? Yes. Over-engaging with early users is not just a permissible technique for getting growth rolling. For most successful startups it's a necessary part of the feedback loop that makes the product good. Making a better mousetrap is not an atomic operation. Even if you start the way most successful startups have, by building something you yourself need, the first thing you build is never quite right. And except in domains with big penalties for making mistakes, it's often better not to aim for perfection initially."
No problem. By the way, another thing that seems like it maybe departs from the intuition of human-understanding of negative sentiment versus the machine scoring: I notice that almost all of the highly negatively rated comments that it's flagged -- both mine and others -- are relatively short.
I know that there are multi-paragraph laments about how dumb other people are or whatever on HN. In general, those strike me as seeming more salty than even deeply negative one-sentence putdowns. Like, sure, "Javascript is awful" is clearly negative sentiment. But spilling a few hundred words on the topic of "Javascript is awful" is surely more so?
The scoring heuristic could use some work; I've already encountered multiple "salty" comments along the lines of "That sounds awful", with a sympathetic tone, probably tagged because of the word "awful".
Agreed. It looks like they went through a dictionary and scored the words on a negativity/positivity axis, and then just took the mean of all the scoring words in a post.
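To make that guess concrete, here is a minimal sketch of dictionary-and-mean scoring. The lexicon values and the name mean_word_score are made up for illustration; this is not TextBlob's actual lexicon or code, just the general technique being described.

```python
import re

# Hypothetical word scores on a -1 (negative) .. +1 (positive) axis.
# These values are invented for the example, not taken from any real lexicon.
LEXICON = {
    "awful": -1.0,
    "terrible": -1.0,
    "horrific": -1.0,
    "nice": 0.6,
    "great": 0.8,
}


def mean_word_score(text, lexicon=LEXICON):
    """Average the scores of the words that appear in the lexicon.

    Words outside the lexicon are ignored, so a comment with no scoring
    words comes out as 0.0 (neutral).
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


sympathetic = mean_word_score("That sounds awful")
hostile = mean_word_score("Javascript is awful")
factual = mean_word_score("Note that the article is from approximately 20 years ago.")
```

Under these assumptions the sympathetic "That sounds awful" and the hostile "Javascript is awful" both come out at -1.0, because the only scoring word in each is "awful". That is the failure mode people are describing in this thread.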
I have written posts very much saltier than the ones scored as saltiest by this ranking algorithm, possibly because I didn't use inherently negative vocabulary to express a highly negative sentiment.
It's a fun party trick, but its usefulness is limited without semantic analysis or live-human scoring.
20.23% of my posts are rated as "salty". I wonder what percentage of scoring words are rated as negative.
The package used is a pretty popular one called TextBlob. It is nifty for working with unlabeled data like we have with the HackerNews dataset.
We really focused our definition of saltiness around being a combination of (subjective + negative) comments.
We reduced the impact of (objective + negative) as we feel that criticism, while at times painful, if presented objectively isn't necessarily salty.
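For readers wondering what "subjective + negative" could look like as a formula, here is a rough sketch. TextBlob reports a polarity in [-1, 1] and a subjectivity in [0, 1] for a piece of text; the function below takes those two numbers as plain arguments so it runs without the library. The weighting scheme, the name salt_score, and the objective_weight parameter are assumptions for illustration, not the formula the app actually uses.

```python
def salt_score(polarity, subjectivity, objective_weight=0.25):
    """Scale polarity by how subjective the comment is.

    polarity:         -1.0 (negative) .. +1.0 (positive)
    subjectivity:      0.0 (objective) .. 1.0 (subjective)
    objective_weight:  hypothetical share of the polarity that a fully
                       objective comment keeps (assumed value).
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be in [-1, 1]")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must be in [0, 1]")
    # Fully subjective comments keep their whole polarity; fully objective
    # ones keep only objective_weight of it.
    weight = objective_weight + (1.0 - objective_weight) * subjectivity
    return polarity * weight


subjective_negative = salt_score(-0.8, 1.0)  # opinionated and negative
objective_negative = salt_score(-0.8, 0.0)   # critical but factual
```

With these made-up weights, a fully subjective negative comment keeps its full -0.8, while an equally negative but objective one is damped to a quarter of that.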
We built this model fast (1 week) and have since iterated this week into developing a Fine Tuned BERT model that we are training over a much broader set of toxicity, demographic, and polarity features. The training set is much larger and higher quality so we are expecting a large jump in precision upon deployment.
I hope the app gave you some good chuckles as you went around though. It's hard to explain how excited I felt when I saw pg_is_a_butt at the top of my pandas data frame the first time I processed the data.
My saltiest comment was reportedly "If taxing price gougers seems stupid you're going to hate the pitchfork-toting mobs." I'm not sure how to work some kind of contextual analysis into it but I'm pretty sure it's something on your to-do list. Good on you for the creative idea and implementation and putting it out here for us to sprinkle salt on.
I agree. Would have been much easier too. Unfortunately, HN doesn't have downvotes. The current model does incorporate upvotes, but we're seeing a lot more success training with like-kind labeled datasets + BERT fine tuning.
Thank you for trying the app and hopefully V2 will leave you feeling like the system is more precise.
It certainly does; although it's possible they aren't stored independently and simply "cancel out" an upvote, so maybe that's what you meant.
The interface and the graphs are really nice; even though basing the data on votes alone would be less interesting in one sense, I think the rest of the site would still provide value even with that simpler metric.
(testing the tool with this comment, do not take this seriously - I think it's a neat experiment)
This is the worst thing I've ever seen.
I am appalled at how terrible the UI design is. My eyes are literally bleeding because it does not have enough contrast. I can't even see my "sweet" comments so I can inflate my massive ego.
I take personal offense that the site evaluated 9 of my comments as being salty. Maybe the site is just salty!
The developer should take personal responsibility for this tool by manually counting the distribution of colors in 400 bags of skittles!
Also, ordbajsare is right. Javascript is disgusting!
Here's some more words I think the algorithm dislikes: horrible, screwing, dreadful, idiot, stupid, retard
Yeah, I noticed that trying to use the back button sets you back to the front page, not the actual last user you looked at. Similarly, modifying the URL to hackersalt.com/usernametolookfor doesn't actually work, despite being what appears when you get a search result.
The first list is mostly people who I don't recognize, and assume are uncommon posters, where one or two bad comments put them on the "saltiest" list.
The other two top lists are more interesting, because while there are a lot of salty people in there I recognize, a bunch of people make the list solely based on the sheer number of comments they post. tptacek, of course, as well as literally all of the HN moderators.
Apparently in terms of the raw number of salty comments, I rank 217.
I would worry about my methodology if I saw dang, who basically only ever weighs in to politely tell people to be less salty, in the #2 spot on my "Total Salt Score" list.
Again, that list is hugely weighted by frequency of comment. Someone who posts more than almost anyone else will end up vastly higher on the list, even if the majority of their comments are not salty. People whose job it is to post on this forum, and particularly to step in where things get negative, are almost sure to rank on the list.
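A quick sketch of why a summed score behaves this way. The two comment histories below are invented numbers, not anyone's real scores; the point is only to compare a total against a per-comment mean.

```python
def total_salt(scores):
    """Sum of per-comment scores; more negative means saltier overall."""
    return sum(scores)


def mean_salt(scores):
    """Average per-comment score; 0.0 for an empty history."""
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical histories (scores are negative when salty):
prolific_polite = [-0.05] * 1000  # posts constantly, barely negative each time
occasional_salty = [-0.6] * 10    # posts rarely, quite negative each time

# Ranking by total puts the prolific poster far "saltier"...
prolific_ranks_saltier_by_total = total_salt(prolific_polite) < total_salt(occasional_salty)
# ...while ranking by mean reverses the order.
occasional_ranks_saltier_by_mean = mean_salt(occasional_salty) < mean_salt(prolific_polite)
```

Under these assumptions the prolific poster's total is roughly -50 against -6, even though each of their comments is about a twelfth as negative. A list built on totals mostly measures volume.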
And of course, if you click on him, you can see his "saltiest" responses to people. Some are arguably decently salty, though many are pretty polite too.
What's your definition of 'salty'? I think you can definitely be politely salty, and dang's comments are all saying (at least) 'don't do that' which seems to fit.
My understanding of the word lines up pretty well with the Merriam-Webster definition (https://www.merriam-webster.com/dictionary/salty): feeling or showing resentment towards a person or situation; bitter.
By that standard, just telling someone not to do something wouldn't be salty. Salty is telling someone not to do something because only an idiot would do that.
I assumed as much. I've played with what HN looks like with showdead: off from time to time, and I'm relatively happy to trust the moderators and leave it turned on. (I am both glad I have the option to view it though, and glad people who are dead-ed still have the ability to participate. I think HN's system here is quite nice.)
Yeah, does pg_is_a_butt@ not realize that most people cannot see his posts? You'd think he would figure it out after a while as nobody is replying to him.
I'm sort of proud to note that my salt score is -0.08, with my saltiest comment being:
The iPad and iPhone are especially dangerous when it comes to accidental downvoting. Separating or enlarging the arrows would help those of us with fat fingers.
However (and this will apparently add to my salt score ;)), I'm curious how comments like these get rated as expressing a negative sentiment:
Out of curiosity, how does Metro look to color blind people?
I used to get terribly sleepy in the afternoons; sometimes I'd go out to my car and take a 15 minute nap, even in the brutal Texas summer. Then I started taking vitamin D and went on a paleo diet, and now I almost never get tired in the afternoons. Nada. It's a great relief to not always be fighting to stay awake.
I'm nothing but salty and my score is -0.13. Garbage in garbage out.
Sentiment analysis is overall a dangerous way for social media players to punt around 'things we don't like', whatever that may be. You will end up shadowbanned on whatever social media platform you use, because you're not posting cat photos and other prosaic content that people trying to sell insurance policies, sugar water, and watches are okay with.
Thankfully, this is such an ill-posed problem, it will just be an embarrassing boondoggle.
I wonder if the ML has learned that words like "brutal", "fighting", "tired" or even "terribly" imply saltiness. If that's the case, this reply to you is going to be pretty darned salty.
I bet it's using a version of the basic sentiment scoring system that just takes the 'positivity or negativity' of each word and then sums it up to score the sentence. Some are better than others, but the most basic version is literally just a list of words with points, and the sum is the score.
I like the idea, but I'm very skeptical of the implementation. My saltiest comments per the tool are not remotely salty, and I'm sure I have many saltier comments. Turns out NLP is hard.
> I like the idea, but I'm very skeptical of the implementation. My saltiest comments per the tool are not remotely salty, and I'm sure I have many saltier comments. Turns out NLP is hard.
> FWIW, my salt score is -0.09 ;)
Same salt score, same opinion on the results. My top salty comment has a score of -1.00
> > > Unfortunately it's not a bottle. It's just a plastic cup with a lid and a straw. A really big plastic cup that tapers at the bottom so you can fit it in your vehicle's cupholder.
> > In USA even the cups have muffin-tops!?
> Yep. Horrific, isn't it?
It's a joke, it's not terribly on topic, and it's a bit glib. But salty? Nah.
Fourth most "salty" comment, at a score of -0.50:
> I like having a library that I can flip to when I'm bored and want to do something productive.
Seventh most salty, at -0.45, is literally just an explanation of the second argument in Javascript's parseInt function. Ninth is just an earnest recap of a conversation about earworms.
Meanwhile the saltiest comment I saw in a quick skim is in 12th place, with a score of -0.30. I was discussing House of Cards and the then recent scandal around Kevin Spacey, and someone was putting words in my mouth and claiming I said things that I didn't actually say or believe. I was quite salty in my response(s), but here the tool marked it as 70% less salty than joking about muffin topped cups.
Defining saltiest by counting salty words has some serious limitations. #2 on the "Total Overall Score" list, merricksb, only points out reposts, and politely points users to the original post. merricksb's only sin is using the words "discuss" and "discussed". Maybe those are considered salty because "discussed" sounds like "disgust"?
How is saltiness trained? Are there readily available sentiment analysis tools or did the author train it himself? Anyway, cool and entertaining idea :) PS: not just seeing the saltiest but also the sweetest comments of a person would be nice.
Direct links just go to the home page, even though that's the URL produced by putting your username into the search box.
Rather than say something salty about that, I'll just say what we see here is an excellent example of something that should be based on traditional pages rather than a single-page application. In fact, it seems to be an SPA emulating pages poorly, and I can't imagine why anyone would want to do that.
Thank you very much for your feedback Zak. Those are totally valid points. We will definitely continue to work on the app and fix the search function and share-ability of urls.
what turns me off about HN is how most of the conversations are argumentative and not productive discussion. I don't understand why the urge to correct or invalidate other peoples' ideas is so pervasive. Is it an appeal to control? Is it a subconscious ploy to be liked and accepted by finding a "weak idea" and attempting to destroy it, so that others will fear and revere you? Is it a yearning to be noticed and appreciated for how smart you are? I sense this layer of hostility lying about 1mm below the figurative skin of most of the commenters here. The moderation is strong here, so the hostility is usually cloaked under innocent sounding barbs like "what an odd thing to say" or "I'm baffled at <what you said>". No, you're not "baffled", you're shooting a barb at that person by trying to alienate them and making it sound like their comment was so bizarre and nonsensical that it left you "baffled" when really you just disagree with it so much you're signalling that you won't even attempt to consider it.
Another great one is "you're getting confused about ___". Nice casual drive-by insult on their intelligence.
I really wish the mods had just capped total users on HN, years ago. Maybe open it up (silently) once or twice a year to keep a minimum number of engaged commentary. But it's essentially the same thing as reddit was 5 years ago.
Amusingly, this seems to think I'm pretty unsalty, assuming I'm not misunderstanding something.
This is amusing from two perspectives:
1. Posting as openly female has been enough drama that the low score doesn't really jibe with how other people seem to perceive me.
2. I have a condition that causes unusually salty sweat. There are other people here with a more severe form of it who presumably are saltier than I am, but most of you people can't compete with my brine.
My guess is pg_is_a_butt gets enough salt from his username.
But this got me curious. Who are the sweetest users?
And what kind of topics generate most salt? This will be interesting because there has been a general trend on HN where people bemoan "HN hates X" with X being cryptocurrency, Tesla, new starts on Show HN.
Actually, username & story titles aren't included in the model. We may experiment with including them in the future but for now we left them out to avoid any undue bias.
Sweetest users is a good question. I'll have to build out that view some time.
It seems to be based on the individual words, not the meaning. "Grim" and "dystopian" are probably considered "salty" words. My lowest scoring comment was about dissonance in music, and presumably "dissonance" is a "salty" word too.
I'm seeing this "Firefox detected a potential security threat and did not continue to www.hackersalt.com. If you visit this site, attackers could try to steal information like your passwords, emails, or credit card details."
Fun to see “Things fall apart; the centre cannot hold; Mere anarchy is loosed upon the world” is ranked above “Robert Scobel is an ass”. The algorithm seems to know about tech c-list celebs AND to be able to recognize great literature.
I’d love to see a category like “productively salty” which would do something like an h index ranking. Your score is the highest number n such that you have n salty comments with at least n net score.
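Here is one way that ranking could be computed, as a sketch. The (salt_score, net_votes) input shape, the name productive_salt_index, and the -0.2 cutoff for what counts as "salty" are all assumptions; the site's real threshold isn't stated anywhere in this thread.

```python
def productive_salt_index(comments, salty_threshold=-0.2):
    """h-index over salty comments.

    comments:        iterable of (salt_score, net_votes) pairs (assumed shape)
    salty_threshold: hypothetical cutoff; a comment counts as salty when its
                     score is at or below this value.

    Returns the largest n such that n salty comments each have a net vote
    score of at least n.
    """
    votes = sorted(
        (net for score, net in comments if score <= salty_threshold),
        reverse=True,
    )
    n = 0
    for rank, net in enumerate(votes, start=1):
        if net >= rank:
            n = rank
        else:
            break
    return n


# Hypothetical comment history: (salt_score, net_votes)
history = [(-0.5, 10), (-0.3, 4), (-0.4, 3), (-0.25, 1), (0.2, 50)]
index = productive_salt_index(history)
```

In this made-up history the four salty comments have 10, 4, 3 and 1 net votes, so the index is 3: three salty comments with at least 3 votes each. The highly upvoted non-salty comment doesn't count.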
Sorry about that, and thank you for the feedback. I'll try to make sure it is more readable in the next version. Thanks for taking the time to check it out!
Your account is pretty new. We intentionally remove any accounts with less than ? number of comments to discourage people from attempting to set a record. Lol
a computer converted a human's action into a number and graphed it using sophisticated numerical algorithms and analysis methods. ergo it must be true.
Well done, this is a great comment! And I really disagree with it.
I actually think "salty" is a very deep misunderstanding of why people are writing such comments, and the attempt to politely call it negative by labeling it "salty" is ironically itself very negative and aggressive.
By making that timid, low-level aggression very blunt and calling it "nihilistic asshat", you are making a very good point. But I don't think they are actually nihilistic asshats. And I think even a true nihilist asshat is worthy of some respect.