Hacker News | davidsojevic's comments

The most recent project I’ve been working on is a tool for JSON query evaluation and debugging [0], inspired by how easy regex101 is to use.

I couldn’t find any that were as nice or as powerful to use for writing JSONPath queries, so instead of spending an hour crafting and testing them manually, I spent >40 hours building this tool to save myself half an hour.

[0]: https://jsonpath101.com/
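For readers unfamiliar with JSONPath, here's a toy evaluator for a tiny subset of the syntax, just to illustrate the kind of queries the tool helps you write and debug (this is my own sketch, not the site's implementation; real JSONPath supports far more, e.g. filters and slices):

```python
import json

def jsonpath_subset(query, data):
    """Evaluate a tiny JSONPath subset: '$', dot-separated names, and [*]."""
    results = [data]
    # Normalise "[*]" into a ".*" token, then walk token by token.
    for token in query.lstrip("$.").replace("[*]", ".*").split("."):
        next_results = []
        for node in results:
            if token == "*":
                # Wildcard: fan out over list items or dict values.
                values = node if isinstance(node, list) else list(node.values())
                next_results.extend(values)
            elif isinstance(node, dict) and token in node:
                next_results.append(node[token])
        results = next_results
    return results

doc = json.loads('{"store": {"book": [{"author": "Ada"}, {"author": "Bob"}]}}')
print(jsonpath_subset("$.store.book[*].author", doc))  # ['Ada', 'Bob']
```

Tools like the one above are useful precisely because real-world queries get much hairier than this subset.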


I've got a few things on my plate that I bounce between building for fun:

1. A point-and-click adventure game: making it with incredibly heavy technical constraints "just because"

2. A coding puzzle game of rapidly escalating difficulty

3. Part of #2 had me crafting some JSONPath queries, and I felt like there wasn't anything nice to build and test them with, so I built this tool for it (inspired by the amazing regex101): https://jsonpath101.com/

4. A website where I write about text-based browser games


I suspect part of the issue is that people are still using things like `acme.com` and `demo.com` as example domains in their documentation and tests instead of relying on `example.com`, which is reserved exactly for this purpose [0].

[0]: https://www.iana.org/domains/reserved
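To make the point concrete, a small sketch of a check that documentation URLs only use the IANA-reserved second-level domains (the helper name is mine, not from any standard tool):

```python
from urllib.parse import urlparse

# RFC 2606 / IANA reserve example.com, example.org, and example.net
# (plus the .test, .example, .invalid, and .localhost TLDs) for
# documentation, so links to them never hammer a real operator.
RESERVED = {"example.com", "example.org", "example.net"}

def safe_doc_url(url):
    host = urlparse(url).hostname or ""
    return host in RESERVED or host.endswith(tuple("." + d for d in RESERVED))

print(safe_doc_url("https://api.example.com/v1/users"))  # True
print(safe_doc_url("https://acme.com/v1/users"))         # False
```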


A small part. On my server AI bots outnumber real visitors 300 to one.

I don't mean that users are following the links to `acme.com` and `demo.com` type domains in documentation; I mean that bots are likely finding and following many links to them because of their widespread use in documentation.

If you search Google for `site:github.com "acme.com"`, you'll find numerous instances of the domain being used in contrived documentation links to show how URLs might be structured on an arbitrary domain, and in issues to demonstrate a fully qualified URL without revealing the actual domain people were using.

This means that numerous links point to non-existent paths on `acme.com`, purely because of how people use it in documentation and examples.


That is very possible.

But it is not necessary to see the results that are being described.

If sites like my tiny little browser game, with roughly 120 weekly unique users, are getting absolutely hammered by scraper bots (it was, last year, until I put the wiki behind a login wall; I still get a significant amount of bot traffic, it's just no longer enough to actually crash the game), then sites people actually know and consider important, like acme.com, are very likely getting massive deluges of traffic purely from first-order hits.


The article describes that a lot of the requests are for non-existent URLs. Do you observe the same?

Yes; I get a lot of requests, mostly for a small set of paths that look like attempts at finding exploitable surfaces: things like /auth/bind-session, /auth/check?jwt=, etc. (And those are just the ones that come up in the obvious error reports; when I go looking at the logs there are more.)
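For anyone curious what spotting this looks like, here's a rough sketch of tallying 404'd probe paths out of access-log lines (the log format and sample lines are illustrative, not the commenter's actual setup):

```python
import re
from collections import Counter

# Hypothetical common-log-format lines; real logs vary by server config.
log_lines = [
    '1.2.3.4 - - [10/May/2025] "GET /auth/bind-session HTTP/1.1" 404 153',
    '1.2.3.4 - - [10/May/2025] "GET /auth/check?jwt=x HTTP/1.1" 404 153',
    '5.6.7.8 - - [10/May/2025] "GET /wiki/Home HTTP/1.1" 200 4821',
    '1.2.3.4 - - [10/May/2025] "GET /auth/bind-session HTTP/1.1" 404 153',
]

# Capture the request path and the HTTP status code.
request_re = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')

probes = Counter()
for line in log_lines:
    m = request_re.search(line)
    if m and m.group(2) == "404":
        probes[m.group(1)] += 1

print(probes.most_common())  # most-hit non-existent paths first
```

Sorting 404s by frequency like this tends to surface the exploit-scanner paths immediately, since real users rarely hit the same missing URL twice.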

That's such an absolutely ludicrous thing to hear, in a "wtf are these people doing" kind of way. I can't imagine a non-social-media site generating enough new content that these bots need to be doing essentially continuous scraping. It's just gross to me that they're okay with that level of unsophisticated effort: doing the same thing over and over with zero gain.

Next to the massive amounts of energy they are burning in their own datacenters, they are burning up other datacenters as well. Plus all the extra energy used by every router, hub and switch in between.

How are you measuring this? Does your solution rely on user agent or device fingerprinting? Curious to know what tools are available today and how accurate they are.

I'm popular in Europe; there's no reason for people from Singapore, Russia, Brazil, and literally every other country in the world to all start visiting very old articles and comment permalinks en masse.

Having honeypot links is the only thing that helps, but I'm running into massive IP tables, slowing things down.

This is not what I want to do with my time. I can't afford the expensive specialised tools. I'm just a solo entrepreneur on a shoestring budget. I just want to improve the website for my 3k real users and 10k real daily guests, not for bots.
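The honeypot-link idea reads roughly like this as a sketch (the path and ban policy here are hypothetical, not the commenter's setup): a link no human should follow is disallowed in robots.txt, and anything that requests it anyway gets banned.

```python
# A hidden link disallowed in robots.txt; only misbehaving bots follow it.
HONEYPOT_PATHS = {"/internal/do-not-crawl"}
banned_ips = set()

def handle_request(ip, path):
    """Return an HTTP status: 403 for banned IPs or honeypot hits, else 200."""
    if ip in banned_ips:
        return 403
    if path in HONEYPOT_PATHS:
        banned_ips.add(ip)  # trap sprung: ban this client
        return 403
    return 200

assert handle_request("10.0.0.1", "/articles/42") == 200
assert handle_request("10.0.0.1", "/internal/do-not-crawl") == 403
assert handle_request("10.0.0.1", "/articles/42") == 403  # now banned
```

In practice you'd also expire bans after some time, otherwise the ban list grows without bound, which is exactly the "massive IP tables" problem described above.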


Where from? And quite frankly why? There are existing training data sets that are large enough for smaller models. Larger models have been focusing on data quality more than quantity. There's limited utility to further indiscriminate widespread scraping.

Tell that to the idiots doing the scraping.

Small site operators like us know very well that the utility they can get by scraping us is marginal at best. Based on their patterns of behavior, though, my best guess is that they've simply configured their bots to scrape absolutely everything, all the time, forever, as aggressively as possible, and treat any attempt to indicate "hey, this data isn't useful to you" as an adversarial signal that the site operator is trying to hide things from them that are their God-given right.


SerpApi | https://serpapi.com | Junior to Senior Fullstack Engineer (multiple positions) | Customer Success Engineer | Hiring Coordinator | Python/Ruby/PHP/JS/Rust/Kotlin/C#/Crystal/Nim/Elixir Developer Advocate positions | Based in Austin, TX but remote-first structure | Full-time | ONSITE or FULLY REMOTE | $150K - 180K a year 1099 for US, or local average + 20% for outside the US

SerpApi is the leading API to scrape and parse search engine results. We deeply support Google, Google Maps, Google Images, Bing, Baidu, and a lot more.

Our current stack is Ruby, Rails, MongoDB, and React.JS. We are looking for more Junior and Senior FullStack Engineers. We have an awesome work environment: we are a remote-first company (since before Covid!). We do continuous integration, continuous deployments, code reviews, code pairings, profit sharing, and most of our communication is async via GitHub.

We very strongly value transparency, do open books, have a public roadmap, and contribute to the EFF.

Apply at: https://serpapi.com/careers


Hey @davidsojevic. I am a full-stack software engineer with experience developing web applications in ReactJS and TypeScript, currently doing my master's degree at the University of Illinois Urbana-Champaign. I am also well-versed in Python and backend work through my experience with AI models and GenAI systems.

On to the main topic: our university is hosting a new venture challenge that you might benefit from. It is called the Cozad New Venture Program and is Illinois' largest venture creation program (https://tec.illinois.edu/programs/cozad). It is for startups, only open to current students, and winners can get investments of up to $60,000+. I am a current student there and would love to work with you as a full-stack software developer and represent SerpApi in that challenge, because I love your idea.

You can look at the quality of my work, and if you like it, you can hire me as a software developer at SerpApi. It is a win-win for both of us: I can bring both investment and skill, and get a job in return if I am worth it. Happy to talk about it more. Let's connect.

My email: agarwalkeshav8399@gmail.com
My portfolio: https://infamousbolt.github.io/keshav-portfolio




Hello David!

Would love to be considered for the Portuguese Developer Advocate position.

I built https://radar-de-precos.netlify.app using SerpApi and even wrote about it here https://dev.to/tamilchelvan/criando-uma-ferramenta-de-compar....

Portuguese-speaking (especially Brazilian) developers remain underserved by localized documentation and tutorials. Expanding this content and partnering with local dev communities could drive meaningful adoption and retention.

Thanks


I like the idea of it, though I think the latest update (around puzzle solvability?) may have broken the game, as I'm unable to begin in any browser I've tried:

    Uncaught TypeError: can't access property "puzzleLog", G is null


Fixed now! Sorry, bug fixing on the fly is trepidatious :')


I've been working on a "businesses for sale" aggregation/search engine that sources data from all of the major "business for sale" type platforms in Australia and de-duplicates listings, extracts data like revenue/profit/etc, and normalises it all for quick browsing.

I have a couple of family members and friends who are looking to buy businesses (separately), and it's been much more time-consuming than you'd expect just to browse through listings to determine if they're relevant to you or not.

The platforms seem to mostly follow the same format as real estate listings (as the brokers seemingly rely on the same software/data formats), with one big blob of freeform text that contains the various information that you'd ideally just be reading at a glance.

Add to that the fact that there are over 15 "business for sale" platforms in Australia with a minimum of 1,000 listings each, and at least 10 more with between 100 and 1,000 listings, and you can easily burn hours looking through them individually.

I'm currently covering 12 of the top 15 platforms (ranked by number of listings they contain) and I just tinker away once or twice a month, adding support for new ones.

I should probably release it and get some feedback at some point, but I suffer a bit from "it needs more polish before I let people other than my family and friends use it".
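For what it's worth, the de-duplication and extraction steps described above can be sketched roughly like this (the toy data, key-normalisation scheme, and field names are my own invention, not the actual implementation):

```python
import re

# Toy listings as they might appear across platforms: one freeform
# blurb per listing, with revenue buried somewhere in the text.
listings = [
    {"title": "Joe's Cafe - Sydney", "blurb": "Busy cafe. Revenue $450,000 p.a."},
    {"title": "JOE'S CAFE - SYDNEY", "blurb": "Annual revenue $450,000."},
    {"title": "Harbour Bakery", "blurb": "Turnover $1,200,000 per annum."},
]

def normalise_key(title):
    """Collapse case and punctuation so duplicates across platforms match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def extract_revenue(blurb):
    """Pull the first dollar figure out of a freeform listing blurb."""
    m = re.search(r"\$([\d,]+)", blurb)
    return int(m.group(1).replace(",", "")) if m else None

deduped = {}
for listing in listings:
    key = normalise_key(listing["title"])
    deduped.setdefault(key, extract_revenue(listing["blurb"]))

print(deduped)  # two unique businesses, revenue normalised to integers
```

Real listings would need fuzzier matching than exact normalised titles (typos, reordered words, different suburbs), but the shape of the pipeline is the same: normalise, key, merge.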


When a previous employer went bust, it was bought by the CEO after being listed by the administrator on an obscure website (ip-bid.com) which you have to make an account and log in to just to see the listings. There was only one bid. (I leave it to the reader to speculate on the utility of a bid website without public listings, but such listings might represent good value compared to those advertised more widely.) It may be worth checking company filings to see if there are equivalents used by insolvency administrators in Australia; I found out only by reading the administrator's "statement of proposals" on the filings website after the fact, as the sale wasn't advertised anywhere else as far as I can tell.


We have ASIC (Australian Securities & Investments Commission) that handles company registrations, notices, etc. and they maintain a register of insolvency notices and liquidations.

This can be a reasonable place to go to look for distressed businesses/assets too and I've considered using them as a source with my aggregation/search engine, though they don't really have the same type of information as a business for sale listing so they fall somewhat outside of the main type of results that I otherwise display.

Other reasonable places I've seen, though in incredibly low volumes (think 0-3 listings a month), are commercial auction houses/sites where they'll list a business for sale or the full assets of a business. The main issue is that they're so low volume that I'm not sure it's worth spending the time ingesting them this early on, while there are still many other larger listing sources.


Makes sense.

In my ex-employer's case, the sale was what's called a "pre-pack" sale. That means the sale was arranged and proposed before the administrators were appointed and the administration was noticed. So you would not have found out in time from the filings, only from ip-bid.com. I don't know if Australian law allows pre-packs.


If it’s anything like the US, the listings only represent a minority of what’s available for sale, and buyers are better off hiring a broker. If you could find a way to have the majority of sellers actually list things for sale online, I think you’d have a solid business.


In Australia, brokers need to be licenced real estate agents (even as a buyers agent in many states), so I suspect there's a relatively decent culture of people who are serious about selling their business going through brokers and resulting in listings on at least one of the major platforms.

There's just shy of 90,000 unique listings I'm tracking (i.e. after de-duplication) on these platforms.

On the traditional classifieds sites and things like Facebook groups focusing on these, there's a significantly smaller number of listings/ads for business sales (e.g. a couple of thousand).

I think the real hidden gems are among the many small business owners at or close to retirement age who haven't planned for a sale at all. For example, a family member nearing retirement age has a small business they're just intending to shut down because they "couldn't be bothered" selling it. I've heard people have had reasonable success just approaching local businesses like this with older owners, or asking accountants if they have any clients thinking of selling.


> brokers need to be licenced real estate agents (even as a buyers agent in many states), so I suspect there's a relatively decent culture of people who are serious about selling their business going through brokers and resulting in listings on at least one of the major platforms.

Works like this in the US too. Commercial brokers rely on their network rather than listing things on a market. Even most commercial real estate for sale in the US is unlisted. It’s a weird industry: there are listing sites, but they only reflect a minor percentage of what you’d find if you drove around looking at for-sale signs.


Living in Australia and interested in buying a business, I can attest to biz4sale-type sites being a real problem.


If you're interested in giving the tool a shot, feel free to shoot me an email (take my username and insert an @ after david and a .com at the end) and I'll happily give you access after I get it up somewhere publicly accessible -- possibly in the next couple of weeks.


email sent!


I've been working on and off on a client-side* SERP rank tracker: https://serpowl.com/

I wanted a simpler alternative to the self-hosted SerpBear tool that I could use and share, so this is the result.

It uses SerpApi (where I work) as the data source that actually executes the SERP scraping, because that's much too complex to do purely client-side, but 100% of the rank-tracking portion is client-side.

It's not fully complete and there are definitely rough edges, but because of the data source it supports a large number of search engines right off the bat.
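The rank-extraction step itself is conceptually simple. A sketch, assuming organic results shaped roughly like a SERP API's output (a list of dicts with "position" and "link"; the field names and sample data here are assumptions, not SerpOwl's actual code):

```python
from urllib.parse import urlparse

def rank_for(domain, organic_results):
    """Return the first SERP position whose link belongs to the tracked domain."""
    for result in organic_results:
        host = urlparse(result["link"]).hostname or ""
        if host == domain or host.endswith("." + domain):
            return result["position"]
    return None  # not ranked on this page of results

results = [
    {"position": 1, "link": "https://en.wikipedia.org/wiki/SERP"},
    {"position": 2, "link": "https://serpowl.com/"},
]
print(rank_for("serpowl.com", results))  # 2
```

Matching on the parsed hostname (rather than substring-searching the URL) avoids false positives like `serpowl.com.evil.example` or the domain appearing in a query string.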


I've been using SerpApi for a while; it's a great product. Checking this out now!


I work at SerpApi and we offer a Bing Search API[0] that is very easy to integrate with.

[0] https://serpapi.com/bing-search-api


SerpApi | https://serpapi.com | Junior to Senior Fullstack Engineer positions | Customer Success Engineer | and more... | Based in Austin, TX but remote-first structure | Full-time | FULLY REMOTE or ONSITE | $150K - 180K a year 1099 for US or local avg + 20% for outside the US

SerpApi is the leading API to scrape and parse search engine results. We deeply support Google, Google Maps, Google Images, Bing, Baidu, and a lot more.

We are still hiring - we have so much work and we're after great people to help out! I only started at SerpApi late last year myself and can't recommend it enough.

We do continuous integration, continuous deployments, code reviews, code pairings, profit sharing, and most of communication is async via GitHub.

Our current stack is Ruby, Rails, MongoDB, and React.JS. We are looking for more Junior and Senior FullStack Engineers. We have an awesome work environment: We are a remote first company (before Covid!).

We very strongly value transparency, do open books, have a public roadmap, and contribute to the EFF.

When you apply, please mention that you saw David's post on HackerNews.

Apply at: https://serpapi.com/careers

If you've previously applied and didn't make it through, please feel free to reapply if it's been a while and you think you would make a better fit than previously.

You can contact me at david at serpapi.com if it's been a few days and you think your application may have fallen through the cracks.

