For both Reddit and HN there are complete dumps that you can download in well under a day. I have worked with both: the Reddit dump from pushshift is quite big (https://files.pushshift.io/reddit/ several TB?), while scraping HN completely from its API is much, much smaller, around a few GB if I remember correctly.
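For reference, a full HN crawl is basically just walking item IDs against the official Firebase-backed API. A minimal sketch (the maxitem/item endpoints are the real public API; the output file name and the strictly sequential loop are just illustrative — in practice you'd batch requests and handle retries):

    import json
    import urllib.request

    BASE = "https://hacker-news.firebaseio.com/v0"

    def get(path):
        # Every HN API endpoint returns plain JSON.
        with urllib.request.urlopen(f"{BASE}/{path}.json") as resp:
            return json.load(resp)

    # Highest item ID currently assigned; IDs below it are stories,
    # comments, jobs, polls, or deleted items (which come back as null).
    max_id = get("maxitem")

    with open("hn_items.jsonl", "w") as out:
        for item_id in range(1, max_id + 1):
            item = get(f"item/{item_id}")
            if item is not None:
                out.write(json.dumps(item) + "\n")

Done naively like this it takes a while, but the total volume really is only a few GB of JSON.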
But of course this is still bullshit and doesn't really use that data.