Show HN: Open Source Bot That Summarizes Top Hacker News Stories Using GPT-3 (github.com/jiggy-ai)
71 points by wskish on Nov 26, 2022 | 32 comments
HN Summary is an open source bot which summarizes top stories on Hacker News and publishes the summaries to a Telegram channel.

Whenever a new story appears on the Hacker News API /topstories.json endpoint, this bot summarizes it (currently using OpenAI GPT-3 text-davinci-002) and sends the story title, summary, and url to the hn_summary channel on Telegram.
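
As a rough illustration of the "new story" check described above, here is a minimal sketch. It is not the bot's actual code: the function names and the in-memory `seen` set are assumptions made for this example, and only the public Hacker News API URL is real.

```python
import json
from urllib.request import urlopen

# Public Hacker News API endpoint listing the IDs of the current top stories.
TOP_STORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"


def fetch_top_story_ids(url: str = TOP_STORIES_URL) -> list[int]:
    """Download the current list of top story IDs (requires network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def find_new_stories(seen: set[int], current: list[int]) -> list[int]:
    """Return the IDs in `current` that are not yet in `seen`, in rank order.

    The new IDs are also added to `seen`, so each story is reported only once.
    (Hypothetical helper; a real bot would likely persist `seen` in a database.)
    """
    new_ids = [story_id for story_id in current if story_id not in seen]
    seen.update(new_ids)
    return new_ids
```

A polling loop would call `fetch_top_story_ids()` every few minutes and pass the result to `find_new_stories`, summarizing and posting only the IDs it returns.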

The purpose of this project is to help build intuition on the capabilities of the current generation of large language models while making a broader swath of top Hacker News content more easily accessible. It could also serve as a platform for experimentation with other language model capabilities such as semantic search.

Join the HN Summary channel on Telegram to see the bot in action and enjoy the story summaries.

https://t.me/hn_summary

There are a number of potential directions I am interested in exploring here, such as making the bot interactive to allow features like bookmarking urls, semantic search, and semantic filtering.

I am also thinking about interfaces other than Telegram. I started with Telegram since I had recently used it on another project and it seemed like the easiest way to start playing with it. What other interfaces would make sense here?



FYI, here is the current prompt prefix we prepend to the story text before sending to GPT-3. Any suggested alternatives? Will try them based on points here.

"Provide a detailed summary of the following web page, including what type of content it is (e.g. news article, essay, technical report, blog post, product documentation, content marketing, etc). If there is anything controversial please highlight the controversy. If there is something unique or clever, please highlight that as well:"
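
For anyone who wants to try alternative prefixes, here is a minimal sketch of how a prefix like this could be prepended to the story text. The `build_prompt` name and the blank-line separator are assumptions for illustration, not necessarily what the bot does.

```python
# The prompt prefix quoted above, copied verbatim.
PROMPT_PREFIX = (
    "Provide a detailed summary of the following web page, including what "
    "type of content it is (e.g. news article, essay, technical report, blog "
    "post, product documentation, content marketing, etc). If there is "
    "anything controversial please highlight the controversy. If there is "
    "something unique or clever, please highlight that as well:"
)


def build_prompt(story_text: str, prefix: str = PROMPT_PREFIX) -> str:
    """Prepend the instruction prefix to the extracted page text.

    Assumption: prefix and text are separated by a blank line; the real
    bot may format this differently.
    """
    return f"{prefix}\n\n{story_text.strip()}"
```

Swapping in a different `prefix` argument makes it easy to compare candidate prompts on the same story text.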


Browsing through the summaries, GPT-3 seems pretty good at capturing the “controversial” aspect of the stories, which is quite impressive.


yeah news articles really tend to highlight the "controversial" aspect. Unfortunately they are also the hardest ones to scrape...


I'm curious: is the "please" you added purely a communication habit or does it lead the model to provide better results?


It is perhaps unnecessary, and I don't have any data that would support that it makes things better, but I am basically guessing that in the training set of {mostly the entire internet} the responses to questions that include "please" might be ever so slightly higher quality! Perhaps it is just a superstition...


"An MP4 file first draft https://news.ycombinator.com/item?id=33741701 This is a blog post from the Twitter Help Center. It explains that the MP4 file format is not supported by Twitter. It recommends that users switch to a supported browser or enable JavaScript."

Declared with such authority!


haha yeah that's pretty funny. We need to figure out how to derate these models when they have no actual clue. And also how to scrape the source content even better...


> What other interfaces would make sense here?

RSS feed maybe?

I've heard people complain about RSS feeds that lack substance and basically just give you a link to the article so the site gets the page views.


I would like to see HN discussion summaries, i.e. what the HN comments section is talking about.


What would you like to see as a summary of the discussions? Sentiment analysis? Summarize each discussion as one sentence? Summarize the overall unique perspectives? Any more ideas?


Definitely, it would be great to figure out how to summarize the discussions and work them into the UX. Anyone please send PRs with ideas around how this might work!


Make a separate channel that processes HN posts after a delay of 8 to 24 hours, and maybe link to the first channel in its posts.


I thought about that idea for Twitter comments. I’m not sure if you could monetize that kind of product though.


It would be great if Hacker News could integrate this feature into the main site so people can read the summary without leaving the page.


yeah, one of the biggest issues on HN is that folks comment without reading the story article. Perhaps this sort of summary of the story article can help drive a higher quality engagement.


This could be done using a browser extension for desktop which fetches the already summarized text generated for the bot.

Or better yet, rather than costing you server hours, it would authenticate the user with OpenAI directly so each user uses their own account.


For anyone interested in seeing summaries without Telegram, you can now view the summaries for the corresponding front page here:

https://news.jiggy.ai


If only n-gate were still publishing their own weekly HN summaries. The humor was a much needed filter to keep me from getting too serious about things.


what was n-gate? I have a pet theory that humor will be the actual litmus test for AGI.



wow, good stuff; would be super entertaining to try to duplicate that with models


This is great


Doesn't text-davinci-002 have a text input limit? What happens to stories that exceed that limit?


We use the GPT2 tokenizer (which is apparently the same as the GPT3 tokenizer) to count the number of tokens in the extracted input text and brutally truncate the input text to keep it under the limit. There is a lot of room for more finesse in this area if anyone wants to help out. See the open issues on GitHub.
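
The count-then-truncate step could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's code: `truncate_to_token_limit` and `WhitespaceTokenizer` are hypothetical names, and the toy tokenizer only stands in for a real one so the example runs without extra dependencies. Any object with `encode` and `decode` methods, such as a Hugging Face GPT-2 tokenizer, should slot in the same way.

```python
from typing import Protocol


class Tokenizer(Protocol):
    """Minimal interface the truncation helper relies on."""

    def encode(self, text: str) -> list[int]: ...

    def decode(self, ids: list[int]) -> str: ...


def truncate_to_token_limit(text: str, tokenizer: Tokenizer, max_tokens: int) -> str:
    """Return `text` unchanged if it fits, else cut it to the first `max_tokens` tokens."""
    ids = tokenizer.encode(text)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[:max_tokens])


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer: one token per whitespace-separated word."""

    def __init__(self) -> None:
        self.vocab: list[str] = []
        self.index: dict[str, int] = {}

    def encode(self, text: str) -> list[int]:
        ids = []
        for word in text.split():
            if word not in self.index:
                self.index[word] = len(self.vocab)
                self.vocab.append(word)
            ids.append(self.index[word])
        return ids

    def decode(self, ids: list[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)
```

The limit is left as a parameter because the budget has to leave room for the prompt prefix and the generated summary as well as the story text.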


You can summarize them recursively: https://openai.com/blog/summarizing-books/
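
A minimal sketch of that idea: split the text into chunks, summarize each chunk, then summarize the joined summaries until the result fits in one call. The function names, the word-based chunking, and the `summarize` callback (which would wrap the actual model call) are all assumptions for illustration.

```python
from typing import Callable


def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_recursively(
    text: str, summarize: Callable[[str], str], max_words: int
) -> str:
    """Summarize `text`, recursing over chunk summaries when it is too long.

    `summarize` is a caller-supplied function (e.g. a wrapper around a
    language model call) that accepts at most `max_words` words of input.
    """
    if len(text.split()) <= max_words:
        return summarize(text)
    partial = [summarize(chunk) for chunk in split_into_chunks(text, max_words)]
    combined = " ".join(partial)
    # Guard against summaries that do not shrink the text, which would
    # otherwise recurse forever: fall back to a single truncated call.
    if len(combined.split()) >= len(text.split()):
        return summarize(" ".join(combined.split()[:max_words]))
    return summarize_recursively(combined, summarize, max_words)
```

Chunking by words keeps the sketch dependency-free; a real version would more likely chunk by tokens and try to break on paragraph boundaries.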


The summary is great! Now I don't have to see all those ads when I click into the story!


What is the current best option for self-hosted summarization?


With some additional coding on top, smaller models like GPT-2 and BERT can produce decent summaries, at a rate of a few summaries per minute depending on your target quality.


Here are a few recent outputs from the service:

Editing Binaries in DOS (2002) https://news.ycombinator.com/item?id=33737207 This is a technical report on editing binaries in DOS. The author describes how to use the debugger program DEBUG.EXE to edit binary files without requiring additional tools. The author provides two examples of how to edit a binary file. The first example edits a string of bytes in an executable file. The second example edits machine instructions to alter the behaviour of the program.

Elon Says He’ll Make His Own Phone If Apple and Google Deplatform Twitter https://news.ycombinator.com/item?id=33750616 This is a news article discussing a potential new phone by Elon Musk. The article discusses the controversy around whether or not Twitter should be banned from Apple and Google's app stores. It also discusses the possibility that Musk may make his own phone if Twitter is banned.

IRS warns taxpayers about new $600 threshold for third-party payment reporting https://news.ycombinator.com/item?id=33750598 This is a news article from CNBC discussing the IRS's new $600 threshold for receiving Form 1099-K for third-party payments. The article explains that the change applies to payments from third-party networks, such as Venmo or PayPal, for transactions such as part-time work, side jobs or selling goods. The article also explains that the IRS is reminding taxpayers that they may receive Form 1099-K for transactions they don't expect, such as reselling Taylor Swift tickets at a profit, for example. The article advises taxpayers that if they receive Form 1099-K for personal transactions, they should contact the issuer for a correction.


Just switched to the newly released davinci-003 model.


You can see comparisons of 002 and 003 here for current top HN stories:

https://news.jiggy.ai


I will fear AI when it produces consistently funny n-gate style summaries of Hacker News.


[deleted]



