It seems to rely on the willingness of the company that owns the data to disclose its full data set to you. Currently, with things like GraphQL, we are moving in the opposite direction: the server only sends you the columns that are absolutely required to fill the fields in your GUI.
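To make that concrete, here's roughly what a GraphQL request looks like: the client names exactly the fields its UI renders, and nothing else comes back. (A minimal sketch in TypeScript; the endpoint and listing fields are invented for illustration, not anyone's real schema.)

    // The client asks only for the columns its UI will render.
    const query = `
      query SearchListings($city: String!) {
        listings(city: $city) {
          title
          pricePerNight
          thumbnailUrl
        }
      }
    `;

    async function searchListings(city: string) {
      const res = await fetch("https://example.com/graphql", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query, variables: { city } }),
      });
      const { data } = await res.json();
      // Only title, pricePerNight, and thumbnailUrl arrive; every other
      // column in the server's table stays on the server.
      return data.listings;
    }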
Since they used it as the example, I don't see any incentive for AirBnb to let random people on the internet download their full internal data tables. Quite the contrary: AirBnb will block you from accessing their servers if they believe you are scraping.
So this is a new way for users to toy around with the limited, incomplete data set that the website operator was willing to give them. But it won't empower users. What if AirBnb implements server-side pagination, so that your client never even receives the data for the cheapest apartment because it's on a different page?
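A minimal sketch of that failure mode (TypeScript, with a hypothetical paginated endpoint): any client-side sort can only rank the rows it was given.

    interface Listing { title: string; pricePerNight: number; }

    // Hypothetical: the server returns one page of rows, ordered its way.
    declare function fetchPage(page: number): Promise<Listing[]>;

    const pageOne = await fetchPage(1);
    // This finds the cheapest listing the client can *see*, not the
    // cheapest listing that exists; if that one lives on page 7,
    // no client-side sort or spreadsheet formula can surface it.
    const cheapestVisible = [...pageOne].sort(
      (a, b) => a.pricePerNight - b.pricePerNight,
    )[0];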
Tools like this would be perfect in theory for enhancing social networks like LinkedIn with export and batch-processing capabilities. But the company claiming ownership of your contacts will surely prevent you from actually getting a useful export.
Plus, there are cases where the data lives on a server because it's impractically large. For example, try using this to improve your Google search results. Downloading a 100-million-row spreadsheet as the first step?
You're absolutely right that limited data access and pagination exclude certain types of modifications.
So far, we've decided to defer thinking about that limitation and to focus first on other questions, like getting the spreadsheet interactions right. We're making new site adapters every week and finding that we can build lots of useful modifications for ourselves which work even with only one page of a paginated list. For one example, see my demo of modifying the HN front page [1], which I find useful even though it only loads the current front-page articles.
At some point, we're considering adding more features around fetching subsequent pages of a table (as explored in Sifter [2], which sorts an entire list of search results across pagination boundaries) or scraping each detail page from a table (as explored in Helena [3]).
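For a sense of what that page-fetching feature might do, here's a rough sketch (the endpoint and its empty-page termination are assumptions for illustration, not Sifter's actual mechanism):

    interface Row { title: string; price: number; }

    // Walk every page of a paginated list, then sort the combined rows.
    async function fetchAllPages(baseUrl: string): Promise<Row[]> {
      const rows: Row[] = [];
      for (let page = 1; ; page++) {
        const res = await fetch(`${baseUrl}?page=${page}`);
        const batch: Row[] = await res.json();
        if (batch.length === 0) break; // assume an empty page marks the end
        rows.push(...batch);
      }
      return rows;
    }

    // Sorting across pagination boundaries now works, because the client
    // holds the whole list rather than one server-chosen page.
    const all = await fetchAllPages("https://example.com/api/results");
    all.sort((a, b) => a.price - b.price);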
Tell the websites what is being done with the data/spreadsheet. If Hacker News is being filtered to exclude domains, or people are searching for all things LISP, the admins could use that information to change the website. Try making a sharing website (like the ones for Greasemonkey scripts): users post scripts and discuss what they're trying to do, and website admins can comment and post changes or scripts, etc.
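For instance, a shared script for the domain-filtering case could be as small as this (a sketch assuming HN's current markup, where each story is a tr.athing row and the domain sits in a span.sitestr):

    const blockedDomains = ["example.com"]; // user-chosen list

    document.querySelectorAll<HTMLElement>("tr.athing").forEach((row) => {
      const site = row.querySelector(".sitestr")?.textContent ?? "";
      if (blockedDomains.some((d) => site.endsWith(d))) {
        row.style.display = "none"; // hide stories from blocked domains
      }
    });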
I have another comment on this thread that discusses the difference between research and engineering. The goal of this project is not to improve your Google search results via the provided framework. That argument is a fine use of reductio ad absurdum, but it assumes a different premise than the one the paper is addressing. The paper is an inquiry into where we are building systems that could empower user modification but, for one reason or another, do not. I encourage you to read the Related Work section of the paper; you may pattern-match on other, more fleshed-out systems that demonstrate the end goal in a way you've seen before.