
I've got a perhaps naïve question related to:

> In particular, I’d like to replace DataFrames with a new type that no longer occupies a strange intermediate position between matrices and relational tables.

Namely, why is an embedded SQLite database not used for all things tabular in languages like R/Julia/Foo? I was thinking about this as I was attempting to reconstruct a visualization in Racket using their (pretty good!) 2d plotting facilities and lamenting not having a tabular data structure.

SQLite is embeddable. It has fast in-memory database support. It can operate reasonably quickly on data that is stored to disk. It supports indexing. NULL values already exist to represent missingness. SQLite allows for callbacks to user-supplied functions, which I'd imagine could be created relatively easily in something like Julia.
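
To make those points concrete, here is a minimal sketch using Python's standard `sqlite3` module (Python is used only for illustration; the table `obs`, its columns, and the `safe_sqrt` function are made-up names, not anything from an existing library). It builds an in-memory table, adds an index, stores a NULL for a missing value, and registers a host-language function that SQL can call.

```python
import sqlite3

# In-memory database: nothing touches the disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (grp TEXT, value REAL)")

# None on the Python side is stored as NULL, i.e. a missing value.
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [("a", 1.0), ("a", None), ("b", 4.0), ("b", 9.0)],
)

# Index on the grouping column.
conn.execute("CREATE INDEX idx_obs_grp ON obs (grp)")


def safe_sqrt(x):
    """Hypothetical user-supplied function; passes missing values through."""
    return None if x is None else x ** 0.5


# Register the Python callback so SQL can call it by name.
conn.create_function("safe_sqrt", 1, safe_sqrt)

# COUNT(*) counts rows, COUNT(value) skips NULLs, AVG ignores NULLs.
rows = conn.execute(
    "SELECT grp, COUNT(*), COUNT(value), AVG(safe_sqrt(value)) "
    "FROM obs GROUP BY grp ORDER BY grp"
).fetchall()
print(rows)
```

Note that the aggregate functions skip NULLs silently, which may or may not match the missing-value semantics you expect from a data.frame.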

As a side benefit, it seems like a SQLite-oriented tabular data store could be extended, like dplyr has done, to support other databases.

When I think about the use cases I've encountered where I've found myself reaching for DataFrame or data.frame, I am struggling to think how a tightly integrated SQLite wouldn't work.

Are there Computer Science reasons why this is a silly idea? I know Pandas claims nominally better performance than SQLite in some circumstances, but then again SQLite has also recently seen some substantial performance gains.




I am definitely going to use embedded SQLite in some of my future prototypes. The only problem is that SQLite doesn't, to my knowledge, support all of the features I'd like Julia to support -- including columns that contain arrays, dictionaries and even complete tables. But, that said, it supports >90% of the features I'd be interested in.
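
For what it's worth, one common workaround (not native support) is to serialize such values into a text column on the way in and decode them on the way out. A rough sketch with Python's standard `sqlite3` and `json` modules follows; the declared column type `JSON` and the `samples` table are just labels chosen for this example, and the stored values remain opaque text as far as SQL itself is concerned.

```python
import json
import sqlite3

# Adapters turn Python lists/dicts into JSON text when binding parameters.
sqlite3.register_adapter(list, json.dumps)
sqlite3.register_adapter(dict, json.dumps)

# Converter decodes any column whose declared type is "JSON" (our own label).
sqlite3.register_converter("JSON", json.loads)

# PARSE_DECLTYPES tells sqlite3 to pick converters from declared column types.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE samples (name TEXT, payload JSON)")

conn.execute("INSERT INTO samples VALUES (?, ?)", ("x", [1, 2, 3]))
conn.execute("INSERT INTO samples VALUES (?, ?)", ("y", {"k": "v"}))

# Values come back as Python objects, but SQL cannot filter or join on
# their contents without further extensions.
result = conn.execute(
    "SELECT name, payload FROM samples ORDER BY name"
).fetchall()
print(result)
```

This covers storage and retrieval only; querying inside the arrays is exactly the part that would need something like an EXPLODE operation.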


Ah, that's a very succinct reason. Because my needs are somewhat ordinary, I often forget about the notion that columns might be anything other than vectors containing single values. I wonder if some of the more complex datatypes could be added to SQLite in the same sort of way that Spatialite has added geospatial geometries as column types. In fact, I wonder if some of those geospatial types might be abused to push past the 90% use case barrier.

Hmm... Now I'm wondering what level of extension to SQLite would be required to create a shared library that could be plugged in to any of the various languages that need dataframe functionality. DataLite?


I'm not sure. I think, for many users, SQLite is sufficient, which is why I'll be using it for prototype work in the future. Trying to add things like arrays seems like a lot of work, since you also need to add things like HQL's EXPLODE at the same time.


Check out Continuum's Blaze project for Python. It provides table and array abstractions for a variety of backends, including SQL solutions.


Racket comes with support for SQLite (the index of Scribble is built with help from SQLite, for example). That is, it is possible to make such a tabular data structure with reasonable effort.

It would be interesting to hear what kind of features such a tabular data structure should have. A post on the Racket mailing list is welcome.


Thank you for the example of SQLite being used with Scribble. I was playing a little bit with the sqlite interface, but using it rather naively to pass queries and return values.

Not quite sure I have the Racket chops to implement something like a data.frame abstraction over an in-memory SQLite database (or even dplyr style query construction). Maybe a project for after I get off Hacker News and finish up a couple of articles that have needed finishing...



