RethinkDB 1.15: Geospatial queries

CitizenKane · on Sept 25, 2014

I've made a comment to this effect before, but seriously do yourself a favor and check out Rethink. It's been great to work with, and has been 100% rock solid. The only complaint I have is that keeping up with the pace of change for everything that's going on has been tricky ;).

It's great and fun to build apps with and all of the guys at Rethink are incredibly helpful (even with my sometimes naive questions).

evmar · on Sept 25, 2014

I tried it out recently. I discovered that the Go client has some important issues. https://github.com/dancannon/gorethink/issues/125

I feel bad to bring this up because it is not the rethinkdb authors' fault (as this is a third-party client) but from this user's perspective it took me some debugging before I realized that rethinkdb wasn't necessarily slow.

_dancannon · on Sept 25, 2014

Yeah there was an unfortunate bug with how I was encoding values being sent to the database. I understand that it was frustrating and fixes should be coming soon.

That being said, this update brings a couple of great new features and I look forward to seeing what else comes.

troyk · on Sept 25, 2014

Thank you for making a Go driver, I've played with it a bit and the first thing that jumped out at me is the documents passed to and from the database use a special "gorethink" tag instead of the built-in "json" tags. I thought this was odd since rethinkdb is a JSON database.

_dancannon · on Sept 25, 2014

Yeah this is because I encode from json string > interface{} > final data structure. This is so that I can process any pseudo-types such as times.

I am now looking to do this whole process in a much simpler way in the next release which will solve the issue mentioned by evmar. I think it's also worth mentioning that both RethinkDB and my driver are not yet production ready but both projects are getting there.

elithrar · on Sept 25, 2014

Likely because some users may be consuming/outputting JSON in a different structure from how they are storing it.

neumino · on Sept 25, 2014

Thanks for all your work dancannon!

_dancannon · on Sept 25, 2014

Thanks! GJ on the latest release.

coffeemug · on Sept 25, 2014

Hey guys, slava @ rethink here. I'll be around all day on HN to answer questions.

We'll be doing a live webcast[1] today at 1:30pm PT showcasing geo features and some example apps you can build with them, would love for you to join us!

[1] http://www.meetup.com/RethinkDB-Bay-Area-Meetup-Group/events...

jimktrains2 · on Sept 25, 2014

Some of your documentation is 404ing.

How many dimensions are supported?

Also, does this support many projections, or just a few. I see "geoSystem: 'WGS84'" which isn't an EPSG identifier. Along those lines, are distance calculations done on a geodesic, a plane, or is that configurable (some projections are planar, others aren't).

Also, what does this mean? geometry.distance(geometry[, {geoSystem: 'WGS84', unit: 'm'}]) Does this do a correct interpolation of degrees to meters along the path (again based on the projection) or is there a fixed constant?

EDIT: Are there commands to export WKT or WKB?

coffeemug · on Sept 25, 2014

> Some of your documentation is 404ing.

Thanks, will fix momentarily! EDIT: fixed, thanks!

> How many dimensions are supported?

Two dimensions. The commands are designed primarily with Earth geometry in mind to help people build location-aware apps.

> Also, does this support many projections, or just a few.

It supports WGS84 (a commonly used ellipsoid model) or a unit sphere. Currently you can't use a plane. Check out http://rethinkdb.com/api/javascript/get_nearest/ for an example.

> Does this do a correct interpolation of degrees to meters along the path

The implementation is based on the S2 library (https://code.google.com/p/s2-geometry-library/) also used by PostGIS, which AFAIK does spherical linear interpolation.

> Are there commands to export WKT or WKB?

No, currently only GeoJSON (which you could later post-process and export to other formats with existing tools). This is a great idea, though, if there is more demand for it we'll definitely add other exporters.

EDIT: all great questions, I opened an issue in the doc repo (https://github.com/rethinkdb/docs/issues/521) -- we'll address all of these in the docs.

danielmewes · on Sept 25, 2014

> The implementation is based on the S2 library (https://code.google.com/p/s2-geometry-library/) also used by PostGIS, which AFAIK does spherical linear interpolation.

Note that RethinkDB actually doesn't use S2 for computing distances. S2 is used for computing intersections and for indexing though. (also as another poster already pointed out, S2 is not used by PostGIS)

jimktrains2 · on Sept 25, 2014

> Two dimensions. The commands are designed primarily with Earth geometry

Earth has elevations:)

coffeemug · on Sept 25, 2014

Yes, though most location-aware apps typically don't deal with those (yet :)) It would be a great addition, though.

jimktrains2 · on Sept 25, 2014

Very true:) I just found how you phrased it amusing:)

I don't mean to sound disparaging, for its use-case it's a great addition:)

TkTech · on Sept 25, 2014

Not the OP, but I noticed:

> Commands to create points, lines, polygons and circles

All of the links in that sentence 404.

coffeemug · on Sept 25, 2014

Thanks, fixed!

danielmewes · on Sept 25, 2014

> Also, does this support many projections, or just a few.

RethinkDB doesn't do any planar projections of spherical coordinates (if that's what you are asking for). There are many independent tools available to perform such projections if you need to draw a map or convert to a Euclidean coordinate system. As coffeemug noted, WGS84 is not a projection but just a reference ellipsoid used for distance calculations along earth's surface. (Edit: Ok, I guess that can be considered a projection as in projection from earth's actual geometry onto an ellipsoid. It's not a projection with respect to latitude and longitude though, if that makes sense.)

> Along those lines, are distance calculations done on a geodesic

Yes, along the geodesic between the two points on an oblate ellipsoid. We use the algorithm from Karney 2013, see http://charles.karney.info/biblio/papers/karney13-geod.pdf

> geometry.distance(geometry[, {geoSystem: 'WGS84', unit: 'm'}]) Does this do a correct interpolation of degrees to meters along the path (again based on the projection)

Yes, again using Karney's algorithm (but not by doing a projection first).

jimktrains2 · on Sept 25, 2014

> WGS84 is not a projection but just a reference ellipsoid used for distance calculations along earth's surface.

Fair enough, it only handles a single GCS and no PCSs.

WGS84 is more than just a datum, it is a GCS. It's also used for more than just distances; it defines a mean sea level used for elevation as well.

WGS84 is also used to mean EPSG 4326, hence why I asked like I did.

> There are many independent tools available to perform such projections if you need to draw a map or convert to a Euclidean coordinate system

Sure there are, but if I can't import EPSG 4269 (NAD83, used by the census), EPSG 3857 (Spherical Mercator), a state plane, or a UTM data source without first converting it, then I need to know that.

danielmewes · on Sept 25, 2014

Ok, got it. Guess I misrepresented what WGS84 is in my earlier post. What I should have said is that in RethinkDB we only mean a specific reference ellipsoid for computing distances between latitude/longitude coordinate pairs when referring to WGS84.

If you have data in other projections you will definitely have to convert it first.

On a side-note, in case it's of any help: There is a currently undocumented feature that allows you to specify an oblate reference ellipsoid other than the one from WGS 84 in ReQL. The way you do that is by passing an object `{a: ..., f: ...}` to the `geo_system` optional argument. The value of 'a' must be the major radius of the ellipsoid in meters, and 'f' must be the flattening of the ellipsoid. This feature is not heavily tested though and should be used with caution.
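
To make the two parameters concrete, here is a minimal sketch of how `a` and `f` describe an ellipsoid's shape. The WGS 84 values are the standard published constants; the helper name `semiMinorAxis` is purely illustrative and is not part of ReQL or any driver.

```javascript
// WGS 84 defining constants: major radius in meters and flattening.
// The object has the same {a, f} shape described in the post above.
const WGS84 = { a: 6378137.0, f: 1 / 298.257223563 };

// The polar (minor) radius implied by a and f: b = a * (1 - f).
// Illustrative helper only; not a ReQL function.
function semiMinorAxis({ a, f }) {
  return a * (1 - f);
}

// A perfect sphere is the special case f = 0, where b equals a.
const unitSphere = { a: 1, f: 0 };

console.log(semiMinorAxis(WGS84).toFixed(3));  // polar radius in meters
console.log(semiMinorAxis(unitSphere));        // 1
```

A flattening of 0 gives a sphere, and larger values squash the poles further, so any other oblate body can be described by swapping in its own `a` and `f`.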

jimktrains2 · on Sept 25, 2014

/me immediately looks up one of the ellipsoids for Mars.

fprotthetarball · on Sept 25, 2014

I'm seeing this on Debian wheezy. Looks like libssl0.9.8 isn't available -- only libssl1.0.0. I'm using this as my apt source:

    deb http://download.rethinkdb.com/apt lucid main

Is there a better wheezy package source?

  The following NEW packages will be installed:
    rethinkdb{b}
  0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
  Need to get 14.3 MB of archives. After unpacking 39.1 MB will be used.
  The following packages have unmet dependencies:
  rethinkdb : Depends: libssl0.9.8 (>= 0.9.8k-1) which is a virtual package.
  The following actions will resolve these dependencies:

     Keep the following packages at their current version:
  1)     rethinkdb [Not Installed]

coffeemug · on Sept 25, 2014

Sorry about that, it might take a bit of time to work out the kinks in the packaging (libssl is surprisingly finicky on different platforms). I have dispatched a message to the RethinkDB packaging genie (https://github.com/atnnn) and should be able to get an answer soon.

atnnn · on Sept 25, 2014

Hello. Etienne here. I did the packaging for this release.

It looks like the lucid package is no longer compatible with wheezy.

You may have better luck using precise, trusty or saucy instead of lucid in your sources.list

I will run some tests and update the docs on rethinkdb.com.

rev · on Sept 25, 2014

It's no longer compatible with jessie/sid either. precise and saucy versions depend on libprotobuf7 (wheezy has it too), trusty - on libprotobuf8 and jessie has libprotobuf9.

tieTYT · on Sept 25, 2014

I'm reading the faq^1 and I'm having trouble figuring out why I should use this over MongoDB (even after skimming the very detailed comparison page). From what I gather, it seems like they are pretty similar in features but rethinkdb is easier to administer. What am I missing?

Sorry to be so curt, but I've spent over 15 minutes researching and by now I feel like I should have a high-level understanding of the advantage of this database.

^1: http://rethinkdb.com/faq/

neumino · on Sept 25, 2014

A few things:

- It has efficient joins. So you can store a "many to many" relation in your database (like you would have done in a SQL database) -- so storing things related to a social network is way nicer/more efficient.

- It has changefeeds, so you can push notifications to clients when something changes in the database. You can build real-time applications. For example if your http server is NodeJS, plug SockJS for the browser, and RethinkDB for your database, and your whole stack is reactive.

- The query language is way nicer to use (it's embedded in the host language).

Those are the 3 main differences for a web developer I can think of off the top of my head.

tinco · on Sept 25, 2014

Hi, not really a question that has to do with decision making about which DB to use, just out of general curiosity:

There's a bunch of open source databases that have fancy features like this now, is there much sharing of code or implementation ideas between the projects? Did you look at the source of for example Mongo or PostGIS, or is it mainly implementing algorithms from papers?

coffeemug · on Sept 25, 2014

All the open source implementations (including Mongo, PostGIS, and Rethink) use Google's S2 library (https://code.google.com/p/s2-geometry-library/). Implementing algorithms from scratch would be a huge and extremely bug-prone effort (although tons of fun).

morganherlocker · on Sept 25, 2014

Actually, PostGIS does not use s2. I believe it uses its own modified R-Tree index. The main downside to this is that the entire index must be kept in memory, which is one of the primary motivations for libraries like s2. There are alternatives to s2 as well like tile-cover[1], a tile-based indexer that can run client-side. Most of the NoSQL DBs these days do use s2 though.

[1] https://github.com/mapbox/tile-cover

atombender · on Sept 26, 2014

I know nothing about s2, but PostGIS uses two external libraries for its calculations: Proj4 [1] and GEOS [2].

Proj4 is for working with projections. It comes with a handy command line tool that lets you easily perform chores like converting between coordinate systems/projections.

GEOS is a C++ library (with a C API on top) that's actually a port of a Java library called JTS. It's used for expressing structured geometry/geography primitives and performing calculations on them, as well as encoding/decoding them in the WKT/WKB (text and binary respectively) formats (the PostGIS "geometry" data type). All the geometry testing and manipulation functions, eg. ST_Intersects(), use GEOS.

GEOS actually comes with support for a spatial index called STR, or STRtree, but I believe it's not used by PostGIS. Instead, PostGIS uses Postgres' own GiST, which is a generalization of R-trees that performs better in a relational environment.

[1] http://www.remotesensing.org/proj

[2] http://geos.osgeo.org/

robrenaud · on Sept 25, 2014

I work on Google Maps and think s2 is wonderful. It's a mostly overlooked gem. What advantage does tile-cover have over s2?

morganherlocker · on Sept 25, 2014

3 main reasons:

- Runs in browser, for client-side indexing.

- Based off of tiles, instead of s2's 6-faced-cube thing. This allows it to be used with a lot of well established tools and mathematical formulas. Also much easier to reason about IMHO.

- Indexes points about twice as fast at the moment.

I also use s2, and it really is terrific. There are many cases where s2 is the superior choice (mostly due to its Hilbert curve indexing scheme).

coffeemug · on Sept 25, 2014

Ah, great info, thanks!

cwmma · on Sept 25, 2014

so it looks like you're doing a geohash for the geoindexing but not using a hilbert curve[1]? Did you guys look into more flexible structures like an rtree or some of the other stuff from libspatialindex [2] which might be a bit more flexible (e.g. for finding what polygon a point is in)

[1]http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-...

[2]http://libspatialindex.github.io/

danielmewes · on Sept 25, 2014

Hi cwmma. RethinkDB actually does use Hilbert curves at multiple scales for indexing geometry. The implementation for this mapping comes from Google's S2 library. You can find the details documented in Google's code: https://code.google.com/p/s2-geometry-library/source/browse/...

We looked into some other data structures before settling for our current approach. Using a space-filling curve turned out to be the least error-prone to implement given that we already have a highly optimized and well tested btree implementation in RethinkDB.

cwmma · on Sept 26, 2014

yeah my bad I was looking through the wrong source. Do you do anything special for point queries, e.g. what town is this point in, which geohashes tend not to be that great for (in my experience)

danielmewes · on Sept 26, 2014

Nothing special. We use S2 to compute a covering of a polygon (like a town) by a number of grid cells. If you insert a polygon (or line) into a geospatial index, it will actually be inserted multiple times, with one entry for each grid cell of its covering. Then when you query by a point (e.g. to answer said query, assuming you have a table with all the town polygons), we find any polygon that has a grid cell that intersects with that point and then do a post-filtering check using the actual detailed geometry of the candidate polygon. This is fairly efficient and can be extended to a lot of different cases, though I'm sure there are more efficient specialized data structures for some specific types of queries.
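
The insert-per-cell and post-filter steps described above can be sketched roughly as follows. This is not RethinkDB's implementation: it swaps S2 cells for a plain square grid over planar coordinates, uses the polygon's bounding box as a (much looser) covering, and keeps the index in a `Map` instead of a btree. `CELL`, `covering`, `contains` and `CoveringIndex` are all made-up names for illustration.

```javascript
const CELL = 0.25; // grid cell size in coordinate units; arbitrary for this sketch

const cellKey = (x, y) => `${Math.floor(x / CELL)}:${Math.floor(y / CELL)}`;

// Covering: every grid cell touched by the polygon's bounding box.
// S2 would compute a tighter covering; the bounding box keeps this short.
function covering(polygon) {
  const xs = polygon.map(([x]) => x);
  const ys = polygon.map(([, y]) => y);
  const keys = [];
  for (let i = Math.floor(Math.min(...xs) / CELL); i <= Math.floor(Math.max(...xs) / CELL); i++) {
    for (let j = Math.floor(Math.min(...ys) / CELL); j <= Math.floor(Math.max(...ys) / CELL); j++) {
      keys.push(`${i}:${j}`);
    }
  }
  return keys;
}

// Exact check: ray-casting point-in-polygon test on planar coordinates.
function contains(polygon, [px, py]) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    const crosses = (yi > py) !== (yj > py);
    if (crosses && px < ((xj - xi) * (py - yi)) / (yj - yi) + xi) inside = !inside;
  }
  return inside;
}

class CoveringIndex {
  constructor() {
    this.cells = new Map(); // cell key -> polygons whose covering includes it
  }
  // One index entry per covering cell, as described in the post above.
  insert(name, polygon) {
    for (const key of covering(polygon)) {
      if (!this.cells.has(key)) this.cells.set(key, []);
      this.cells.get(key).push({ name, polygon });
    }
  }
  // Look up the point's cell, then post-filter with the detailed geometry.
  query(point) {
    const candidates = this.cells.get(cellKey(point[0], point[1])) || [];
    return candidates.filter((c) => contains(c.polygon, point)).map((c) => c.name);
  }
}

const index = new CoveringIndex();
index.insert("Triangle Town", [[0, 0], [1, 0], [0, 1]]);
console.log(index.query([0.2, 0.2])); // inside the triangle
console.log(index.query([0.9, 0.9])); // same covering, but outside the triangle
```

The second query shows why the post-filter matters: the point lands in a cell of the covering, so the polygon is a candidate, but the detailed check rejects it.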

giulianob · on Sept 25, 2014

I really appreciate their transparency with their stability report ( http://rethinkdb.com/stability/ ). It's even linked right on the front page and shows what type of issues you can expect at scale.

estefan · on Sept 25, 2014

I wish every software project had a FAQ like this: http://rethinkdb.com/faq/

Animats · on Sept 26, 2014

MySQL has had geospatial extensions for years. They even work. The underlying table format is an R-tree, which, unfortunately, is only available for MyISAM. Point in rectangle is very efficient, and other queries that can be expressed as multiple point in rectangle tests are reasonably efficient. "Nearest" is not efficient, but for many purposes ("find nearest McDonalds"), generating a query rectangle for a reasonable driving distance, then sorting by distance, is effective.
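
As a rough illustration of the rectangle-then-sort approach, here is a minimal sketch in plain JavaScript rather than SQL. It assumes a spherical Earth with a mean radius of 6371 km and a circle that does not reach a pole or cross the antimeridian. The array filter stands in for the indexed point-in-rectangle query, and the function names and sample places are made up.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius; a spherical approximation

const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Great-circle (haversine) distance in meters between two {lat, lon} points.
function haversine(p, q) {
  const dLat = toRad(q.lat - p.lat);
  const dLon = toRad(q.lon - p.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(p.lat)) * Math.cos(toRad(q.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Latitude/longitude rectangle enclosing the circle of radiusM around center.
function queryRectangle(center, radiusM) {
  const angular = radiusM / EARTH_RADIUS_M; // radius as an angle in radians
  const dLat = toDeg(angular);
  const dLon = toDeg(Math.asin(Math.sin(angular) / Math.cos(toRad(center.lat))));
  return {
    minLat: center.lat - dLat, maxLat: center.lat + dLat,
    minLon: center.lon - dLon, maxLon: center.lon + dLon,
  };
}

// Step 1: cheap rectangle test (what the R-tree index would answer).
// Step 2: exact distance for the few survivors, then sort.
function nearest(places, center, radiusM) {
  const box = queryRectangle(center, radiusM);
  return places
    .filter((p) => p.lat >= box.minLat && p.lat <= box.maxLat &&
                   p.lon >= box.minLon && p.lon <= box.maxLon)
    .map((p) => ({ ...p, distance: haversine(center, p) }))
    .sort((a, b) => a.distance - b.distance);
}

// Hypothetical sample data.
const places = [
  { name: "B", lat: 37.80, lon: -122.40 },
  { name: "far away", lat: 40.71, lon: -74.00 },
  { name: "A", lat: 37.78, lon: -122.41 },
];
const hits = nearest(places, { lat: 37.77, lon: -122.42 }, 10000);
console.log(hits.map((p) => p.name)); // closest first
```

Points in the rectangle's corners can be slightly farther away than the radius, so an extra `distance <= radiusM` filter is needed if a strict cutoff matters.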

Gigablah · on Sept 26, 2014

MySQL didn't properly support point-in-polygon until 5.6 though (before that it used an MBR and you had to hack around it with a stored procedure)

altschuler · on Sept 25, 2014

What's with the red ball always rolling by in the RethinkDB videos? :)

jaz46 · on Sept 25, 2014

Joey from RethinkDB here. In our first release video(v1.5), one of our engineers rolled the ball across the office as a prank. The ball seems to roll, stop, then magically continue rolling again. Since then, it has become a tradition.

spb · on Sept 26, 2014

Does Rethink allow arbitrary keys? Looking at the serialization format, it looks like it has special $-surrounded keys, but I don't know if it also does some kind of transformation so I can actually use that kind of key name without it being recognized as a special key by Rethink.

coffeemug · on Sept 26, 2014

Yes, it allows arbitrary keys. The only one that's reserved by the database is `$reql_type$`, and it's used for serialization of data structures that don't have a native JSON representation (time, binary objects, geometry objects, etc.)

spb · on Sept 27, 2014

So if I try to define `$reql_type$`, it will throw an error? That means that I can't store arbitrary data / use user data as a key. Can't something be done about that at the driver / protocol level (eg. translating literal '$' to '$$' and back again)?
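
A minimal sketch of what such a translation could look like on the application or driver side. These helpers are hypothetical: they are not part of any RethinkDB driver, and the sketch assumes the server treats a doubled-up key like `$$reql_type$$` as an ordinary key. It only handles plain JSON values (objects, arrays, primitives).

```javascript
// Double every '$' on the way in, halve it on the way out.
const escapeKey = (key) => key.split("$").join("$$");
const unescapeKey = (key) => key.split("$$").join("$");

// Apply fn to every object key, recursing through arrays and nested objects.
function mapKeys(value, fn) {
  if (Array.isArray(value)) return value.map((v) => mapKeys(v, fn));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [fn(k), mapKeys(v, fn)])
    );
  }
  return value;
}

// User-supplied document that happens to use the reserved key name.
const userDoc = { "$reql_type$": "just user data", nested: [{ "price$": 3 }] };

const stored = mapKeys(userDoc, escapeKey);      // what would be sent to the database
const restored = mapKeys(stored, unescapeKey);   // what the application reads back

console.log(Object.keys(stored));
console.log(JSON.stringify(restored) === JSON.stringify(userDoc));
```

Because every literal `$` is doubled, an escaped key can never equal `$reql_type$` itself, and the mapping is reversible for any key.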

imslavko · on Sept 25, 2014

It is not clear from the changelog for me: do geospatial queries support "changes feed" introduced in recent releases of RethinkDB? i.e. can I get notification every time a new document within a range from a point appears in the table? (w/o filtering data in application level)

neumino · on Sept 25, 2014

Yes you can with a query like

    r.table("users").changes().filter(function(change) {
      return change("new_val")("location")
        .distance(r.point(<someLongitude>, <someLatitude>))
        .lt(<someDistance>)
    })

jchrisa · on Sept 25, 2014

Kudos for Rethink getting into geo. GeoCouch has been a huge adoption driver for public data applications.

I think the real killer mix is geo + sync (because then you can trivially share civic data). Maybe someday we'll see sync from Rethink?

coffeemug · on Sept 25, 2014

What do you mean by sync? Something like http://rethinkdb.com/docs/changefeeds, or do you mean something different?

apendleton · on Sept 25, 2014

Probably Couch-style master-master replication (and an accompanying protocol that allowed alternate implementations to sync in the same fashion with Couch, e.g., PouchDB).

jrobn · on Sept 25, 2014

Any plans for temporal support similar to what postgres does with range types and exclusion constraints? This would be a great addition to Rethink.

coffeemug · on Sept 25, 2014

Yes, this is definitely on the roadmap (though no ETA yet). Lots of people have asked for this, so it will almost certainly make it into RethinkDB some time next year.

goldenkey · on Sept 26, 2014

I've continually been impressed by the pace at which RethinkDB improves in both performance and developer usefulness. Keep it up!

cdnsteve · on Sept 26, 2014

Is anyone offering this in a managed capacity like DynamoDB type setup in AWS us-east? If so please let me know.