Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I feel like a graph database is a solution to an issue I've faced (and, continue to face) and it may just be because that I haven't spun one up and tried or that the documentation/examples don't stick out. But could someone confirm my feeling? If my feeling is correct, I'd enjoy verifying it with EdgeDB or the like.

My example/requirement: I have a user wanting to find best-matching blog posts. Every post is tagged with a given category. There could be 100+ categories in the blog system and a blog post could be tagged with any number of these system categories. A user wants to see all posts tagged with "angular", "nestjs", "cypress" and "nx". The resulting list should return and be sorted by the best matches, to those of least relevance. So, posts that include all four tags should be up top and as the user browses down the results, there are posts with less matching tags.

What I've seen with SQL looks expensive, especially if you search with more and more tags. I may just not know what to search for though, re. SQL. Is there a query against a graph database that could accomplish this?



EdgeDB employee here. I couldn't have asked for a better question to demonstrate the power of subqueries! Here's how I'd do this in EdgeQL:

  with tag_names := {"angular", "nestjs", "cypress", "nx"},
  select BlogPost {
    title,
    tag_names := .tags.name,
    match_count := count((select .tags filter .name in tag_names))
  }
  order by .match_count desc;
Which would give you a result like this:

  [
    {
      title: 'All the frameworks!',
      tag_names: ['angular', 'nestjs', 'cypress', 'nx'],
      match_count: 4,
    },
    {
      title: 'Nest + Cypress',
      tag_names: ['nestjs', 'cypress'],
      match_count: 2,
    },
    {
      title: 'NX is cool',
      tag_names: ['nx'],
      match_count: 1,
    },
  ];


Do I understand this correctly in that if the list goes on, it will also show posts with unrelated tags in tag_names? As the filter is only applied to match_count?

So to only get blog posts with matching tags we would need to add a filter „match_count > 0“, right?

Update: I am very excited about EdgeDB :)


Yep, you understand correctly!

  with tag_names := {"angular", "nestjs", "cypress", "nx"},
  select BlogPost {
    title,
    tag_names := .tags.name,
    match_count := count((select .tags filter .name in tag_names))
  }
  filter .match_count > 0
  order by .match_count desc;


So as I read TheSpiciestDev's comment, he's complaining that making his query in PostgreSQL is slow. It looks like EdgeDB is a frontend to PostgreSQL; how will it help with TheSpiciestDev's problem?


The problem sounds like something that could be solved with a GIST index. EdgeDB doesn't yet have a way to specify the index type, though, mostly because we aren't sure what would be the best way to do it without things becoming too Postgres-specific in schemas.


What do you mean by "too Postgres-specific"? Will you be supporting other DBs behind the EdgeDB interface in the future?


This is not something we plan to do in the near future, but it’s also not outside the realm of possibility. We picked Postgres because of its power, quality and unparalleled extensibility, but we are also very careful to not leak any implementation details into our interfaces.


I'm curious how this squares up with what someone linked elsewhere: https://github.com/edgedb/edgedb/discussions/3403

> EdgeDB does not treat Postgres as a simple standard SQL store. The opposite is true. To realize the full potential of the graph-relational model and EdgeQL efficiently, we must squeeze every last bit of functionality out of PostgreSQL's implementation of SQL and its schema.

I don't see how this and what you're saying can both be true at the same time. Is EdgeDB tightly coupled to the implementation of PostgreSQL, or isn't it? Is there really a chance that EdgeDB could support other databases, or not really? I don't think there's anything wrong with the answers being "yes" and "no", respectively; that's actually what I'd expect. It would be more unusual to try to do this in an implementation-agnostic way.


I think what they mean by this is that EdgeDB's query language should not be coupled to Postgres, but EdgeDB itself should use Postgres specific SQL features to maximise performance - so you couldn't drop in MariaDB without changing code in EdgeDB, but in theory you could write another backend that takes the same queries and uses MariaDB or MongoDB or something custom under the hood.


Exactly, implementation coupling vs interface coupling.


I'm feeling very disappointed to see that you're using two space-separated words for "order by" in your language. Do "order" and "by" have meanings on their own in independence such that the meaning of "order by" arises naturally via their composition/conjunction? If not then surely it should be "orderby" in your language.

SQL definitely must be replaced, but the silly pseudo English syntax is one of the things we want to get rid of, not retain.


Do you have any timeline on when i can use this with rust?

Found this: https://github.com/edgedb/edgedb-rust


select posts.name as post, count(post_tags.id) as matches from posts,post_tags where post_tags.post=posts.id and post_tags.tag in ("angular","nestjs","cypress","nx") group by post_tags.post order by matches desc; Test data @ http://pratyeka.org/hn.sqlite3


Is it though? Simple IN with an ORDER BY on the same match will return the correct ranking. More info on ranking here https://www.postgresql.org/docs/current/textsearch-controls....


Am I right in reading this as the parent comment envisioning a “post” table and a “tag” table, and you’re suggest the “post” table just have a “tag” column?


I see just one table

> Every post is tagged with a given category. There could be 100+ categories in the blog system and a blog post could be tagged with any number of these system categories.

My point is, I don’t see SQL query as expensive for this kind of use case. There are easy and native ways to do it.

In case you would like a top notch performance, Redis might be a way to do it. Even a reverse-index would achieve great performance.


That's a standard many-to-many relationship that would normally be implemented by three tables:

    +---------+-------+---------+-------+-----------+
    | post_id | title | content | other | fields... |

    +--------+------+
    | tag_id | name |

    +------------+---------+--------+
    | tagging_id | post_id | tag_id |

But it seems like the core of the request is still something like:

    SELECT post_id, count(1) AS count
    FROM taggings
    WHERE tag_id IN (3, 8, 255)
    GROUP BY post_id
    ORDER BY count DESC
(off the top of my head; I haven't checked this for any kind of correctness)

And I don't see why that query suffers as you add tags...?

------------

EDIT responding to below [HN believes I am a problem user who should only be allowed to make so many comments per day]:

< that is pretty much what I meant by “I see just one table” as you don’t need any joins

Well, assuming you're doing this because a user is interacting with your site via some kind of web interface, you can set the interface up to deliver you tag_id values directly, but you'll still need to do a join with the posts table so you can present a list of posts back to the user instead of a list of internal post_id values.

So I guess

    SELECT t.post_id, count(1) AS count, p.title, p.url
    FROM taggings t JOIN posts p ON t.post_id = p.post_id
    ...


Confusing but that is pretty much what I meant by “I see just one table” as you don’t need any joins (atleast with the same design you outline)


I do these often with standard GraphQL queries, often over Postgres. Now I’m curious about the performance difference compared to an SQL ORDERBY or similar EdgeDB implementation!


Thanks for posting this. Kind of comment that adds value to the discussion by illustrating how a piece of tech can or cannot be useful.

I just happen to have a very similar requirement to yours and was also wondering.


Maybe I misunderstand what you are saying but it sounds pretty straight forward in SQL.


I'm currently investigating whether Redis Bloom [1] could be a good tool for similar requirement.

[1] https://github.com/RedisBloom/RedisBloom




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: