I've made a comment to this effect before, but seriously, do yourself a favor and check out Rethink. It's been great to work with, and has been 100% rock solid. The only complaint I have is that keeping up with the pace of change has been tricky ;).
It's great and fun to build apps with and all of the guys at Rethink are incredibly helpful (even with my sometimes naive questions).
I feel bad bringing this up because it is not the rethinkdb authors' fault (this is a third-party client), but from this user's perspective it took me some debugging before I realized that rethinkdb itself wasn't necessarily slow.
Yeah there was an unfortunate bug with how I was encoding values being sent to the database. I understand that it was frustrating and fixes should be coming soon.
That being said, this update brings a couple of great new features and I look forward to seeing what else comes.
Thank you for making a Go driver. I've played with it a bit and the first thing that jumped out at me is that the documents passed to and from the database use a special "gorethink" tag instead of the built-in "json" tags. I thought this was odd since rethinkdb is a JSON database.
Yeah, this is because I encode from json string > interface{} > final data structure. This is so that I can process any pseudo-types such as times.
I am now looking to do this whole process in a much simpler way in the next release, which will solve the issue mentioned by evmar. I think it's also worth mentioning that both RethinkDB and my driver are not yet production ready, but both projects are getting there.
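For anyone curious what that two-step decode looks like, here is a minimal sketch in Python rather than Go. It is not the driver's code: `parse_offset` and `convert_pseudotypes` are hypothetical names, only the TIME pseudo-type is handled, and it assumes times arrive as an object carrying `$reql_type$`, `epoch_time` and `timezone` fields.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_offset(offset):
    """Turn a "+HH:MM" / "-HH:MM" string into a tzinfo object."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))


def convert_pseudotypes(value):
    """Walk generic decoded JSON and replace TIME pseudo-types with datetimes."""
    if isinstance(value, list):
        return [convert_pseudotypes(v) for v in value]
    if isinstance(value, dict):
        if value.get("$reql_type$") == "TIME":
            tz = parse_offset(value.get("timezone", "+00:00"))
            return datetime.fromtimestamp(value["epoch_time"], tz)
        return {k: convert_pseudotypes(v) for k, v in value.items()}
    return value


raw = ('{"name": "ping", "at": {"$reql_type$": "TIME", '
       '"epoch_time": 0, "timezone": "+00:00"}}')
generic = json.loads(raw)           # step 1: JSON string -> generic structure
doc = convert_pseudotypes(generic)  # step 2: generic structure -> final values
```

The intermediate generic structure is what makes the special tag necessary in a design like this: the final mapping step no longer sees the raw JSON, so it cannot reuse the standard JSON decoding rules directly.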
Hey guys, slava @ rethink here. I'll be around all day on HN to answer questions.
We'll be doing a live webcast[1] today at 1:30pm PT showcasing geo features and some example apps you can build with them, would love for you to join us!
Also, does this support many projections, or just a few? I see "geoSystem: 'WGS84'" which isn't an EPSG identifier. Along those lines, are distance calculations done on a geodesic, a plane, or is that configurable (some projections are planar, others aren't)?
Also, what does this mean? geometry.distance(geometry[, {geoSystem: 'WGS84', unit: 'm'}]) Does this do a correct interpolation of degrees to meters along the path (again based on the projection) or is there a fixed constant?
No, currently only GeoJSON (which you could later post-process and export to other formats with existing tools). This is a great idea, though, if there is more demand for it we'll definitely add other exporters.
Note that RethinkDB actually doesn't use S2 for computing distances. S2 is used for computing intersections and for indexing though. (also as another poster already pointed out, S2 is not used by PostGIS)
> Also, does this support many projections, or just a few.
RethinkDB doesn't do any planar projections of spherical coordinates (if that's what you are asking for). There are many independent tools available to perform such projections if you need to draw a map or convert to a Euclidean coordinate system.
As coffemug noted, WGS84 is not a projection but just a reference ellipsoid used for distance calculations along earth's surface. (Edit: Ok, I guess that can be considered a projection as in projection from earth's actual geometry onto an ellipsoid. It's not a projection with respect to latitude and longitude though, if that makes sense.)
> Along those lines, are distance calculations done on a geodesic
> geometry.distance(geometry[, {geoSystem: 'WGS84', unit: 'm'}]) Does this do a correct interpolation of degrees to meters along the path (again based on the projection)
Yes, again using Karney's algorithm (but not by doing a projection first).
> WGS84 is not a projection but just a reference ellipsoid used for distance calculations along earth's surface.
Fair enough, it only handles a single GCS and no PCSs.
WGS84 is more than just a datum, it is a GCS. It's also used for more than just distances; it defines a mean sea level used for elevation as well.
WGS84 is also used to mean EPSG 4326, hence why I asked like I did.
> There are many independent tools available to perform such projections if you need to draw a map or convert to a Euclidean coordinate system
Sure there are, but if I can't import EPSG 4269 (NAD83, used by the census), EPSG 3857 (Spherical Mercator), a state plane, or a UTM data source without first converting it, then I need to know that.
Ok, got it.
Guess I misrepresented what WGS84 is in my earlier post.
What I should have said is that in RethinkDB we only mean a specific reference ellipsoid for computing distances between latitude/longitude coordinate pairs when referring to WGS84.
If you have data in other projections you will definitely have to convert it first.
On a side note, in case it's of any help: there is a currently undocumented feature that allows you to specify an oblate reference ellipsoid other than the one from WGS 84 in ReQL. The way you do that is by passing an object `{a: ..., f: ...}` to the `geo_system` optional argument. The value of 'a' must be the major radius of the ellipsoid in meters, and 'f' must be the flattening of the ellipsoid. This feature is not heavily tested though and should be used with caution.
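To make the two parameters concrete, here is a small Python sketch that builds an object of that shape from the WGS 84 defining constants and derives the polar radius from them. It only constructs the value; it does not call any driver, and the names `WGS84_A`, `WGS84_F` and `semi_minor_axis` are illustrative, not part of ReQL.

```python
# WGS 84 defining constants: semi-major axis in meters and flattening.
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563


def semi_minor_axis(a, f):
    """Polar radius b of an oblate ellipsoid, from b = a * (1 - f)."""
    return a * (1 - f)


# An object in the {a: ..., f: ...} shape described above.
custom_geo_system = {"a": WGS84_A, "f": WGS84_F}

# Roughly 6356752.3 m, about 21.4 km shorter than the equatorial radius.
b = semi_minor_axis(custom_geo_system["a"], custom_geo_system["f"])
```

A flattening of 0 would describe a perfect sphere, so `a` alone would then be the radius.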
I'm seeing this on Debian wheezy. Looks like libssl0.9.8 isn't available -- only libssl1.0.0. I'm using this as my apt source:
deb http://download.rethinkdb.com/apt lucid main
Is there a better wheezy package source?
The following NEW packages will be installed:
rethinkdb{b}
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 14.3 MB of archives. After unpacking 39.1 MB will be used.
The following packages have unmet dependencies:
rethinkdb : Depends: libssl0.9.8 (>= 0.9.8k-1) which is a virtual package.
The following actions will resolve these dependencies:
Keep the following packages at their current version:
1) rethinkdb [Not Installed]
Sorry about that, it might take a bit of time to work out the kinks in the packaging (libssl is surprisingly finicky on different platforms). I have dispatched a message to the RethinkDB packaging genie (https://github.com/atnnn) and should be able to get an answer soon.
It's no longer compatible with jessie/sid either. precise and saucy versions depend on libprotobuf7 (wheezy has it too), trusty - on libprotobuf8 and jessie has libprotobuf9.
I'm reading the faq^1 and I'm having trouble figuring out why I should use this over MongoDB (even after skimming the very detailed comparison page). From what I gather, it seems like they are pretty similar in features but rethinkdb is easier to administer. What am I missing?
Sorry to be so curt, but I've spent over 15 minutes researching and by now I feel like I should have a high-level understanding of the advantages of this database.
A few things:
- It has efficient joins. So you can store a "many to many" relation in your database (like you would have done in a SQL database) -- so storing things related to a social network is way nicer/more efficient.
- It has changefeeds, so you can push notifications to clients when something changes in the database. You can build real-time applications. For example, if your HTTP server is Node.js, plug in SockJS for the browser and RethinkDB for your database, and your whole stack is reactive.
- The query language is way nicer to use (it's embedded in the host language).
Those are the 3 main differences for a web developer that I can think of off the top of my head.
Hi, not really a question that has to do with decision making about which DB to use, just out of general curiosity:
There's a bunch of open source databases that have fancy features like this now, is there much sharing of code or implementation ideas between the projects? Did you look at the source of for example Mongo or PostGIS, or is it mainly implementing algorithms from papers?
All the open source implementations (including Mongo, PostGIS, and Rethink) use Google's S2 library (https://code.google.com/p/s2-geometry-library/). Implementing algorithms from scratch would be a huge and extremely bug-prone effort (although tons of fun).
Actually, PostGIS does not use s2. I believe it uses its own modified R-Tree index. The main downside to this is that the entire index must be kept in memory, which is one of the primary motivations for libraries like s2. There are alternatives to s2 as well, like tile-cover[1], a tile-based indexer that can run client-side. Most of the NoSQL DBs these days do use s2 though.
I know nothing about s2, but PostGIS uses two external libraries for its calculations: Proj4 [1] and GEOS [2].
Proj4 is for working with projections. It comes with a handy command line tool that lets you easily perform chores like converting between coordinate systems/projections.
GEOS is a C++ library (with a C API on top) that's actually a port of a Java library called JTS. It's used for expressing structured geometry/geography primitives and performing calculations on them, as well as encoding/decoding them in the WKT/WKB (text and binary respectively) formats (the PostGIS "geometry" data type). All the geometry testing and manipulation functions, eg. ST_Intersects(), use GEOS.
GEOS actually comes with support for a spatial index called STR, or STRtree, but I believe it's not used by PostGIS. Instead, PostGIS uses Postgres' own GiST, which is a generalization of R-trees that performs better in a relational environment.
- Based off of tiles, instead of s2's 6-faced-cube thing. This allows it to be used with a lot of well established tools and mathematical formulas. Also much easier to reason about IMHO.
- Indexes points about twice as fast at the moment.
I also use s2, and it really is terrific. There are many cases where s2 is the superior choice (mostly due to its Hilbert curve indexing scheme).
So it looks like you're doing a geohash for the geoindexing but not using a Hilbert curve[1]? Did you guys look into more flexible structures like an rtree or some of the other stuff from libspatialindex [2], which might be a bit more flexible (e.g. for finding what polygon a point is in)?
Hi cwmma.
RethinkDB actually does use Hilbert curves at multiple scales for indexing geometry. The implementation for this mapping comes from Google's S2 library. You can find the details documented in Google's code: https://code.google.com/p/s2-geometry-library/source/browse/...
We looked into some other data structures before settling for our current approach. Using a space-filling curve turned out to be the least error-prone to implement given that we already have a highly optimized and well tested btree implementation in RethinkDB.
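As a rough illustration of why a space-filling curve pairs well with an existing btree, the sketch below maps 2D grid cells to positions along a Hilbert curve and keeps them in a sorted list. This is a toy on a flat 4x4 grid, not S2's cell scheme and not RethinkDB's code: the sorted list merely stands in for the btree, and `hilbert_index` is the textbook coordinate-to-distance conversion.

```python
import bisect


def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


# A sorted list of (curve position, name) stands in for the btree.
index = []
for name, cell in [("a", (0, 0)), ("b", (3, 0)), ("c", (1, 1)), ("d", (2, 3))]:
    bisect.insort(index, (hilbert_index(4, *cell), name))

order = [name for _, name in index]  # entries in curve order
```

Because consecutive positions on the curve are always neighboring cells, a range scan over the one-dimensional keys tends to visit spatially nearby entries, which is what lets an ordinary ordered index do the work.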
Yeah, my bad, I was looking through the wrong source. Do you do anything special for point queries, e.g. what town is this point in, which geohashes tend not to be that great for (in my experience)?
Nothing special. We use S2 to compute a covering of a polygon (like a town) by a number of grid cells. If you insert a polygon (or line) into a geospatial index, it will actually be inserted multiple times, with one entry for each grid cell of its covering. Then when you query by a point (e.g. to answer said query, assuming you have a table with all the town polygons), we find any polygon that has a grid cell that intersects with that point and then do a post-filtering check using the actual detailed geometry of the candidate polygon. This is fairly efficient and can be extended to a lot of different cases, though I'm sure there are more efficient specialized data structures for some specific types of queries.
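The insert-per-cell and post-filter steps described above can be sketched in a few lines of Python. Everything here is a simplification: a flat square grid replaces S2 cells, the covering is just the cells under the polygon's bounding box, and the town names, coordinates and function names are made up for illustration.

```python
import math
from collections import defaultdict

CELL = 1.0  # grid cell size; an arbitrary choice for this sketch


def cell_of(x, y):
    """Grid cell containing the point (x, y)."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def covering(polygon):
    """Coarse covering: every grid cell under the polygon's bounding box."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x0, y0 = cell_of(min(xs), min(ys))
    x1, y1 = cell_of(max(xs), max(ys))
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]


def contains(polygon, x, y):
    """Ray-casting point-in-polygon test on the detailed geometry."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


towns = {
    "Squareville": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Triangleton": [(2, 0), (5, 0), (5, 3)],
}

# One index entry per covering cell, so a polygon is inserted multiple times.
index = defaultdict(set)
for name, polygon in towns.items():
    for cell in covering(polygon):
        index[cell].add(name)


def towns_at(x, y):
    """Look up candidates by cell, then post-filter with the real geometry."""
    candidates = index.get(cell_of(x, y), set())
    return sorted(n for n in candidates if contains(towns[n], x, y))
```

For example, the point (2.5, 2.5) falls in a cell that belongs to Triangleton's covering, so Triangleton is a candidate, but the post-filter rejects it because the point lies outside the triangle itself.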
I really appreciate their transparency with their stability report ( http://rethinkdb.com/stability/ ). It's even linked right on the front page and shows what type of issues you can expect at scale.
MySQL has had geospatial extensions for years. They even work. The underlying table format is an R-tree, which, unfortunately, is only available for MyISAM. Point in rectangle is very efficient, and other queries that can be expressed as multiple point in rectangle tests are reasonably efficient. "Nearest" is not efficient, but for many purposes ("find nearest McDonalds"), generating a query rectangle for a reasonable driving distance, then sorting by distance, is effective.
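The rectangle-then-sort approach to "nearest" looks roughly like this in Python. It is a sketch, not MySQL's behavior: the list comprehension stands in for an indexed point-in-rectangle query, distances use the haversine formula on a sphere of mean Earth radius, the sample places are invented, and the box math ignores the poles and the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a sphere, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest(points, lat, lon, radius_km):
    """Filter (name, lat, lon) tuples by a query rectangle, then sort by distance."""
    # Step 1: a rectangle that contains the circle of the given radius.
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / math.cos(math.radians(lat))
    in_box = [p for p in points
              if abs(p[1] - lat) <= dlat and abs(p[2] - lon) <= dlon]
    # Step 2: exact ordering only for the few candidates that survived.
    return sorted(in_box, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


places = [
    ("near", 37.78, -122.41),
    ("nearer", 37.775, -122.419),
    ("far", 40.71, -74.0),
]
result = [name for name, _, _ in nearest(places, 37.7749, -122.4194, 10)]
```

The rectangle can admit points slightly beyond the radius (in its corners), which is fine for a "closest few" query since the sort puts them last.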
Joey from RethinkDB here. In our first release video (v1.5), one of our engineers rolled the ball across the office as a prank. The ball seems to roll, stop, then magically continue rolling again. Since then, it has become a tradition.
Does Rethink allow arbitrary keys? Looking at the serialization format, it looks like it has special $-surrounded keys, but I don't know if it also does some kind of transformation so I can actually use that kind of key name without it being recognized as a special key by Rethink.
Yes, it allows arbitrary keys. The only one that's reserved by the database is `$reql_type$`, and it's used for serialization of data structures that don't have a native JSON representation (time, binary objects, geometry objects, etc.)
So if I try to define `$reql_type$`, it will throw an error? That means that I can't store arbitrary data / use user data as a key. Can't something be done about that at the driver / protocol level (eg. translating literal '$' to '$$' and back again)?
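To spell out the kind of translation being asked about, here is a purely hypothetical Python sketch of doubling '$' in user-supplied keys before sending a document and undoing it on the way back. Nothing in this thread says RethinkDB or any driver does this; `escape_key`, `unescape_key` and `escape_doc` are made-up names.

```python
RESERVED = "$reql_type$"


def escape_key(key):
    """Double every '$' so a user key can never equal the reserved key."""
    return key.replace("$", "$$")


def unescape_key(key):
    """Inverse of escape_key; valid only for keys produced by escape_key."""
    return key.replace("$$", "$")


def escape_doc(value):
    """Apply escape_key to every object key in a nested document."""
    if isinstance(value, list):
        return [escape_doc(v) for v in value]
    if isinstance(value, dict):
        return {escape_key(k): escape_doc(v) for k, v in value.items()}
    return value


user_doc = {"$reql_type$": "not a real pseudo-type", "price_in_$": 5}
stored = escape_doc(user_doc)
```

With this scheme the stored key becomes `$$reql_type$$`, which no longer collides with the reserved name, at the cost of every key containing '$' looking different to anyone who queries the data without the same translation layer.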
It is not clear to me from the changelog: do geospatial queries support the "changes" feed introduced in recent releases of RethinkDB? I.e., can I get a notification every time a new document within a range from a point appears in the table (w/o filtering data at the application level)?
Probably Couch-style master-master replication (and an accompanying protocol that allowed alternate implementations to sync in the same fashion with Couch, e.g., PouchDB).
Yes, this is definitely on the roadmap (though no ETA yet). Lots of people have asked for this, so it will almost certainly make it into RethinkDB some time next year.