
I have a system built on top of Calibre's "recipe" scripts which scans a set of RSS feeds every day at 3am for new articles, scrapes and cleans the full content where necessary, bundles them into .mobi ebooks, and sends them to my Kindle's email address. Amazon's network wirelessly delivers them overnight, and I wake up in the morning with a fresh batch of reading material. It's like a personalized newspaper subscription.
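The "scan the feeds for new articles" step might look roughly like the sketch below. It is not the Calibre recipe code described above: it uses only Python's standard library, and the names `new_articles`, `seen_links`, and `SAMPLE_FEED` are made up for illustration. Bundling into .mobi and emailing to the Kindle are left out.

```python
import xml.etree.ElementTree as ET


def new_articles(rss_xml, seen_links):
    """Return RSS items whose link is not yet in seen_links.

    seen_links is a set that is updated in place, so a second run
    over the same feed returns nothing new.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link or link in seen_links:
            continue  # skip items without a link and items already delivered
        seen_links.add(link)
        fresh.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
        })
    return fresh


# Hypothetical feed used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>https://example.com/1</link></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    seen = {"https://example.com/1"}  # pretend this one went out yesterday
    for article in new_articles(SAMPLE_FEED, seen):
        print(article["title"], article["link"])
```

In a daily job, the `seen_links` set would need to be persisted between runs (a text file or small database would do), and the feed XML would come from an HTTP fetch rather than a string constant.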

Similarly, I have a self-hosted instance of Tiny Tiny RSS set up with an array of custom scraping plugins to pull all the web comics I follow into one feed, which I consume with the Android client. I'd push this through my Kindle delivery system, but then I'd be stuck reading black-and-white versions of color comics.

Along the same lines, there are a few YouTube channels I subscribe to whose content can be enjoyed nearly as well in audio-only form. As a university student, I do a lot of walking most days to get from place to place, and I fill that time listening to audio content. The same server which runs my news- and comic-gathering systems also watches those YouTube channels, pulls down new videos, converts them to audio, and publishes the results as podcast feeds which I can subscribe to through Pocket Casts on my phone.
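For the "publish the results as podcast feeds" part, a minimal sketch of generating the feed could look like this. It assumes the audio files have already been downloaded, converted, and placed at public URLs; `build_podcast_feed` and the episode dictionary keys are hypothetical names, and the MP3 MIME type is an assumption about the conversion format.

```python
import xml.etree.ElementTree as ET


def build_podcast_feed(title, link, episodes):
    """Build an RSS 2.0 document with one <enclosure> per audio episode.

    episodes is a list of dicts with the (hypothetical) keys
    "title", "audio_url", and "size_bytes".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Audio-only versions of " + title
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["audio_url"]
        # Podcast clients locate the audio file through the enclosure element.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["audio_url"],
            length=str(ep["size_bytes"]),
            type="audio/mpeg",  # assumes the audio was converted to MP3
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_podcast_feed(
        "Demo channel",
        "https://example.com/demo",
        [{"title": "Episode 1",
          "audio_url": "https://example.com/demo/ep1.mp3",
          "size_bytes": 1234}],
    ))
```

Serving the resulting XML at a stable URL should be enough for a podcast client to subscribe to it.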



This is cool.

I wonder how effective it is these days, though. I'd say 70% of my RSS feeds are either truncated versions of the full article (with a 'click here to continue reading!' link) or just summaries.


That's what the scraper scripts are for. For each site that does this, I have a bit of code which visits the article URL and pulls out the full content.
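A per-site scraper of this kind can be quite small. The sketch below is only an illustration, not the actual scripts: it assumes the article body sits inside an element with a known `id`, assumes reasonably well-formed markup (every non-void tag is closed), and works on HTML that has already been fetched. `ArticleExtractor` and `extract_article` are invented names.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the text found inside the element with a given id."""

    # Tags that never have a closing tag, so they must not affect nesting depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0    # 0 means we are outside the container
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1          # nested element inside the container
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1           # entering the container

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1          # reaches 0 when the container closes

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_article(html, container_id):
    """Return the container's text, one line per text fragment."""
    parser = ArticleExtractor(container_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = ('<div id="nav">Menu</div>'
            '<div id="story"><h1>Title</h1><p>Full text here.</p></div>')
    print(extract_article(page, "story"))
```

Each site would need its own container id (or a different selection rule altogether), which is why these scripts tend to end up so site-specific.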


This gets re-implemented by so many people. (HN user megous https://news.ycombinator.com/item?id=13226170 Dec 2016.) It would be nice to find a way to share this work, but the variety of tools used often makes the customizations too custom (or maybe this is just the most common reason why creators feel there is no reason to share?).

Is anyone aware of any repositories where the customization required to obtain the important content is maintained by a community?


Most sites also display data in a much easier-to-scrape way if you identify as Googlebot.


That’s a great way to kill your page rank.


This is why I gave up on RSS. I wanted to read content quickly without having to navigate to a website.


I think your project is very, very cool, and I was thinking maybe we could work together on this. We created a very similar thing, but instead of scraping RSS, we scraped some well-structured pages like newspapers and blogs.

If you are interested in working together, please reach out to me!

You can check what we did here: https://eink.news


I used to use YouCast (https://github.com/I3arnon/YouCast), which lets a YouTube channel or playlist be consumed as a podcast. I used it with my iPhone podcast app to watch and listen to my favorite YouTube channels when I was offline.


Which YouTube channels are you referring to? The weather is getting nicer and my walks are getting longer, so I'd appreciate having something great to listen to (versus an audio version of a textbook).


Not OP, but some channels I would recommend that work as audio-only: (mostly science or thought-provoking)

CaspianReport

CrashCourse

ExtraCredits

Innuendo Studios

Kurzgesagt

SciShow (and the sub-channels: SciShow Space and SciShow Psych)

PBS Idea Channel (no new videos, but old content is good)

vlogbrothers

Wendover Productions


This method of downloading a website for later viewing is how Richard Stallman uses the web.

https://stallman.org/stallman-computing.html


I'm currently in the process of doing the exact same thing with my ttrss instance. I am using lxml. I wanted to create "pages" of comics sort of like the funnies, instead of just one big feed, since ttrss can do that already. Would you mind if I looked over your scripts? I'd love to see what your solution was.

I'm also developing something similar to your news scraper but with different end goals. I'd love to talk with you about your approaches in that domain as well.


Wouldn't happen to have this on GitHub, would you?


Been using my tt-rss instance and Fiery Feeds on my iPhone for a while now. Fiery Feeds does some magic to get the full articles in case the feeds only provide preview snippets.


Very interesting and smart system. Could you possibly share this code on GitHub?


Please share!


So simple!!! Time to build this for my train ride


github please?



