We use it in our CI, just before deployment, to compare the DB structure of what's being tested with the DB structure of our staging or production environment.
It's a last-minute check that has prevented a lot of mistakes.
It absolutely should, though you might want a specific, ordered set of checks: things like column order can differ between production and development due to data sizes, dropping and recreating dev instead of migrating it, etc.
None of those things is actually a problem, but they could give you false positives, so you might want some minor shuffling.
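The column-order caveat above can be handled by normalizing each schema snapshot before diffing. A minimal sketch in Python, with hypothetical table/column names (this is not the actual tool's implementation):

```python
def normalize(schema):
    """Sort columns within each table so ordering differences
    between environments don't produce false positives."""
    return {table: sorted(cols) for table, cols in schema.items()}

def schema_diff(env_a, env_b):
    """Return the names of tables whose normalized column sets differ."""
    a, b = normalize(env_a), normalize(env_b)
    return {t for t in set(a) | set(b) if a.get(t) != b.get(t)}

# Same columns, different order: no diff reported.
prod = {"users": ["id", "email", "name"]}
dev = {"users": ["name", "id", "email"]}
assert schema_diff(prod, dev) == set()

# A genuinely missing column is still caught.
dev2 = {"users": ["id", "name"]}
assert schema_diff(prod, dev2) == {"users"}
```

Sorting the columns is the "minor shuffling": it discards ordering, which is noise here, while still catching real structural drift.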
It's similar in the sense that it's a listing of APIs. But we're providing non-official APIs that work by automating / scraping websites. Our platform runs Headless Chrome instances behind the scenes.
Well, it's up for debate. We automate websites on behalf of our users (that is, logged in as them). Which means the site knows at all times who's doing what and can take action in case of abuse.
Also, we see more and more rulings indicating that scraping is in fact legal. Websites can block users according to their ToS, but they can't take legal action against them or us. Maybe.
In any case, our platform also provides the tools for anyone to automate any website (make them into an API). That part is just a developer tool.
>We automate websites on behalf of our users (that is, logged in as them). Which means the site knows at all time who's doing what and can take action in case of abuse.
So you break the ToS on your users' accounts, thereby risking their accounts and not yours... Even better.
>Also, we see more and more ruling indicating that scraping is in fact legal.
It doesn't matter if it is legal. What matters more is whether the service considers it a violation of an implicit agreement not to abuse its servers with rapid API requests. (Big props if you are already throttling.)
---
Like, your service is a great idea, but breaking ToS on your users' accounts is super no-bueno in my opinion. I scrape too, but I am always under the complete understanding that the service can ban my account or IP at any time.
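For the "big props if you are already throttling" point: a minimal client-side throttle can be sketched as a sliding-window limiter. This is an illustrative sketch, not how any particular service does it:

```python
import time

class Throttle:
    """Allow at most `rate` requests per `per` seconds.
    A sliding-window sketch, not a production rate limiter."""

    def __init__(self, rate, per):
        self.rate = rate
        self.per = per
        self.timestamps = []

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop request timestamps that have fallen out of the window.
        self.timestamps = [t for t in self.timestamps if now - t < self.per]
        if len(self.timestamps) < self.rate:
            self.timestamps.append(now)
            return True
        return False

t = Throttle(rate=2, per=1.0)
assert t.allow(now=0.0)
assert t.allow(now=0.1)
assert not t.allow(now=0.2)   # third request inside the window: blocked
assert t.allow(now=1.5)       # window has slid past; allowed again
```

The caller wraps every outgoing request in `t.allow()` and sleeps (or queues) when it returns `False`, which keeps the automated traffic closer to human pacing.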
Obviously, it's up to each site's specific ToS. Tons of sites explicitly call out scrapers and non-human/automated means of accessing the site. You might debate over definitions and intent, but ultimately it's up to the site owners when they say, "you know what? X _is_ against the ToS and we're just gonna ban anyone doing it"; users won't/don't have any recourse to argue their point.
For example, here are a few relevant parts for the top sites on Phantom Buster:
>These terms govern your collection of data from Facebook through automated means, such as through harvesting bots, robots, spiders, or scrapers ("Automated Data Collection"), as well as your use of that data. You will not engage in Automated Data Collection without Facebook's express written permission.
>We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent).
> In order to protect our members’ data and our website, we don't permit the use of any third party software, including "crawlers", bots, browser plug-ins, or browser extensions (also called "add-ons"), that scrapes, modifies the appearance of, or automates activity on LinkedIn’s website.
I'd give a read through each of the APIs offered and make sure that users know 1) your service has the potential to get accounts banned for use, and 2) since the service is on behalf of the user's accounts, it'll be their accounts getting banned if the websites ban anyone.
FWIW I wrote this comment while watching a bot (that I wrote) play a game on my behalf on a second monitor. :)
On the LinkedIn example, though: I think an interesting argument could be made that they should then also be blocking accessibility extensions/tools, since these (to some extent) modify and automate the UX.
I guess the question in the end is not the terms, it's enforcement. Clearly ToS don't cover all cases, and even though LinkedIn's ToS say "Thou shalt not scrape", the courts adjudicated differently. So what matters is: what is enforceable and actually enforced?
The issue of acting as an "agent" for the user is very important. I don't think the way this tool currently does it is OK, because getting users banned is a bad thing. Maybe there is a better way to set it up. Or maybe I'm wrong.
Wow, you are located in San Francisco. I would have guessed in India or somewhere else where you might not fear being sued.
>> "is in fact legal"
There is a big difference between "legal" and "a court decision". If a court rules in hiQ Labs' favor vs. LinkedIn, it doesn't automagically make scraping LinkedIn legal for you.
Not sure what your link proves. In reality, no one knows whether this statement is true, because addresses and wallets can be shared. There was a Bloomberg article with lots of interviews on this topic with people in the crypto business, and no one seemed to deny this: https://news.ycombinator.com/item?id=15877838
You'd probably find that 40% of US equities are owned by fewer than 1000 asset managers as well, so I'm not sure that this is all that relevant. If Vanguard decided to liquidate their holdings overnight there would be pandemonium just as if these people decided to liquidate their BTC, but it's not going to happen in either case.
We're developing a library[1] for this use case. The goal is to have the same simple API for multiple browsers. Right now it supports both Headless Chrome and PhantomJS.
We think we'll begin work on Firefox headless soon. PRs welcome :)
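The "same simple API for multiple browsers" idea is essentially an adapter pattern: one uniform front-end dispatching to interchangeable backends. A sketch with hypothetical backend classes standing in for the real drivers (not the library's actual API):

```python
class ChromeBackend:
    """Hypothetical stand-in for a Headless Chrome driver."""
    def fetch(self, url):
        return f"<html via Chrome: {url}>"

class PhantomBackend:
    """Hypothetical stand-in for a PhantomJS driver."""
    def fetch(self, url):
        return f"<html via PhantomJS: {url}>"

class Browser:
    """Uniform front-end; callers never touch a backend directly,
    so adding a Firefox backend later is just one more entry here."""
    _backends = {"chrome": ChromeBackend, "phantomjs": PhantomBackend}

    def __init__(self, engine="chrome"):
        self.backend = self._backends[engine]()

    def get(self, url):
        return self.backend.fetch(url)

assert "Chrome" in Browser("chrome").get("https://example.com")
assert "PhantomJS" in Browser("phantomjs").get("https://example.com")
```

Because only the registry knows about concrete engines, user code written against `Browser` keeps working unchanged when a new engine lands.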
They're a hassle-free way of getting the data. No need to worry about CORS, sessions, cookies, CSRF and other modern web stuff. Just simulate a human and you’re in.
Yeah, I used to work for a company doing similar things. It was more expensive that way, but there were tons of sites where you could only get the data from inside a proper browser.