Pony ORM - Use Pure Python to Speak to Your Data

kozlovsky · on April 19, 2013

Hi, I'm Alexander Kozlovsky, one of Pony ORM authors. I'll try to answer some questions

> why such an obsession with "pure" Python?

With this statement we are trying to say Pony is not just to offer some syntax sugar. The end goal of Pony ORM is to understand the semantic of each Python generator which can be translated to database query and then do translation accordingly.

The direct benefit of this is that generator query are more high-level then resulted SQL, and can be much easier to understand and refactor. The simplest example is an automatic converting of attribute path traversing such as grade.student.group.department.name to a minimal set of SQL joins. Pony also can use attribute path in reverse direction, from “one” to “many”, such as department.groups.students.gpa, which in this case resulted in a multiset of all ‘gpa’ values for this department. Also you can see examples of query optimization in the documentation, for example when the subquery is converted to LEFT JOIN with GROUP BY, where it can give performance benefit.

There are more distant benefits of the semantic translation. In future, we plan to add support of denormalized database schemas. Such denormalization will take place on a physical level (i.e. in the database) while on the logical level (in Python) all queries and logic will remain the same. But first of all we are going to add the migration support.

Another future plan is to add support of MongoDB. This is a distant plan, but I think Pony architecture will allow this. After that, migration of PonyORM-based project from relational databases to noSQL and vice versa will be possible.

> An ORM like Django's offers a higher level of abstraction

I disagree with this. I think, any Django query can be written in PonyORM much more concisely, and there are easy-to-understand PonyORM queries which cannot be expressed in Django easily.

Also, Pony ORM completely eliminates N+1 select problem.

benatkin · on April 19, 2013

This has a split licensing model. http://ponyorm.com/license-and-pricing.html

bru · on April 19, 2013

>Free for non-commercial usage: Creative Commons Attribution-NonCommercial 3.0 License

Beware! CC licenses are not for software, and it actually makes it incompatible with GPL. Alexander and Alexey, please see http://wiki.creativecommons.org/Frequently_Asked_Questions#C...

Flimm · on April 19, 2013

The NonCommercial part makes it not open source, and not libre software. Both the OSI and the FSF clearly state that FLOSS software has to allow commercial use for it to qualify.

amalashkevich · on April 19, 2013

Thank you for pointing this out, will check this out

hackerboos · on April 19, 2013

Yeah...no.

kijin · on April 19, 2013

I'm not very familiar with the culture of the Python community so this might come across as offending, but why such an obsession with "pure" Python?

On one extreme, of course, there are spaghetti apps that mix PHP, HTML, CSS, JS, shell, and SQL in the same file. We all know and hate those apps. But is there any reason to jump to the other extreme and turn as much as possible into "pure" $LANGUAGE ?

As soon as your query gets moderately complicated, you're still littering Python with SQL keywords like order_by(), desc(), select(), and commit(). The GROUP BY section of the documentation just reads like a rough translation of SQL into Python, the only difference being the syntax. It's like watching a first-week ESL student try to construct sentences in English.

The documentation advertises that "Pony allows any programmer to write complex and effective queries against a database, even without being an expert in SQL." I don't think this is true for any ORM I've seen so far, whether in Python or in any other popular language. Beyond a certain level of complexity, you need to know SQL in order to write complex queries. But if you already know SQL, why translate long GROUP BY ... HAVING queries into Python only to have the ORM translate it back into SQL? Why are we trying so hard to avoid writing SQL? What are we going to do next? Write a library that translates pure Python into Lua scripts for your Redis server?

I like ORMs because they simplify frequent tasks, like grabbing a dozen items from the database and filtering them by a couple of columns. I also like them because they often come with caching and effective protections against SQL injection attacks. But I also think that purity is overrated. Both web apps and native apps are already a mixture of several different languages, both on the frontend and on the backend. Don't be afraid to add SQL to your belt, it's just another language.

(By the way, why is there an order_by() method and a separate orderby() method?)

ronnix · on April 19, 2013

> I'm not very familiar with the culture of the Python community so this might come across as offending, but why such an obsession with "pure" Python?

I think it refers to the implementation, meaning the library has no C modules and is written 100% in Python.

This means it should be able to run on all Python implementations, including Jython, IronPython, PyPy, Google App Engine...

amalashkevich · on April 19, 2013

I'm Alexey Malashkevich, one of the Pony ORM authors. For now Pony ORM runs on CPython only because there is no access to frames in other implementations. "Pure Python" means that you can write queries in term of objects using Python generators.

ronnix · on April 19, 2013

OK, thanks. But isn't it the point of all ORMs? Why the emphasis on pure?

amalashkevich · on April 19, 2013

Pure in this case means that you can use Python syntax in order to query a database. This way a database query looks identical to an iteration over a list of Python objects.

kijin · on April 19, 2013

I see. It actually makes sense to emphasize the purity in this case, since it almost looks like Pony is messing with the Python interpreter itself.

mbreese · on April 19, 2013

Ordinarily this is what "pure python" would mean. Since "pure python" libraries can run anywhere without compiling C code.

But in this case, they are referring to the fact that you don't have to write SQL and write your queries using Python syntax generators.

darksaints · on April 19, 2013

SQL is declarative, statically typed but with an incredible type-inference system, and does an amazing amount of optimization behind the scenes. If it weren't so domain-specific, it would be the holy grail of languages. It amazes me how many developer hours of effort has been put into this global quest to completely avoid SQL. It is starting to remind me of all the people I work with that would rather write 10k lines of VBA for some unmaintainable Excel spreadsheet than learn to write 300 lines of Python/Ruby/Lua/Whatever that can accomplish exactly the same task.

I can understand some hesitation to learn something new, especially when there are so many new things to learn and the value of learning them is not always apparent...but I would assume that the benefits of knowing SQL would be readily apparent. I'm not even a developer and I would feel like a quadriplegic without it.

romaniv · on April 19, 2013

It's not hesitance to learn something new. People avoid SQL for a variety of fairly rational reasons. For me, the biggest one is that most general-purpose programming languages encourage you to solve large problems by splitting them up into manageable chunks. SQL, on the other hand, encourages you to lump different problems together (for performance reasons).

It also has remarkably poor tools for code reuse and abstraction. Have you ever worked with SPs that dynamically generate SQL code? I see it daily. Not only that, it's code written by professional database administrators.

pwang · on April 20, 2013

> If it weren't so domain-specific, it would be the holy grail of languages.

That's relatively true of many languages. The reason SQL has been able to be optimized so heavily (and the reason why it's been adopted so broadly) is because it is so domain specific.

More specifically, it is optimized for expression set algebraic operations, and certain kinds of kernels to apply to resultant sets. It is most definitely NOT suited for most array or otherwise metrically-oriented data series, which is why you never see it used in HPC to express timeseries transformations or, say, 3D PDE diffusion simulations. For those applications, people have come up with different kinds of specialized languages which are rooted in an more appropriate data abstraction than merely set operations.

_david · on April 19, 2013

Right, there's only like a million of these already in existence in every language and I have yet to find one that is comprehensive enough that I don't just fall back to regular sql queries sooner or later.

And if you think about it, any library that was powerful enough for me to not write SQL anymore would basically have to rewrite the complete SQL spec using awkward hacks in a language that is totally unsuited for it.

And if you're writing an application of any complexity or size using an SQL database you need the full capacities of SQL. There are a lot of little features or quirks of SQL that these languges seem to overlook that you need to make your app perform.

Now I usually just write a couple functions like this:

    query("SELECT one, two FROM mytable",
        ["WHERE weight > %f AND name = '%s'", weight, name]);

That gives me SQL-injection protection, lets me save subqueries as strings and re-use them, and caching to if I need it.

And as a plus, I don't have continuously look up/relearn each little feature of SQL in this little language, or if I write it this way is it actually going to compile into the SQL I want? It's pretty much inevitable that you're going to need to know/control the SQL that's being written sooner or later (usually sooner), so save yourself the extra headache and just go SQL from the start.

kozlovsky · on April 19, 2013

One of the goals of Pony ORM is to reduce the necessity of using the specific syntax. For example, in Pony ORM you can write:

    (o for o in Order if o.date_shipped.year == 2012)

instead of something like:

    .filter(extract("year", Order.date_shipped) == 2012)

The goal is to use native Python syntax wherever possible

flyinRyan · on April 19, 2013

I know we're talking about python, but personally I advise against raw SQL because there is no type protection of any kind. In, e.g. C# I can use Linq to ensure I build a query that type checks and makes sense, even across joins.

ralph · on April 19, 2013

A "couple [of] functions"? What does the other do?

stcredzero · on April 19, 2013

[of] is optional in this case.

ralph · on April 20, 2013

It seems from others' replies that it's not optional. It certainly isn't in English. I put it in not to correct but to explain my interpretation of what was written to better frame my question.

cmsd2 · on April 19, 2013

could be an American/British English difference. I would consider omitting "of" incorrect in British English.

mbreese · on April 19, 2013

It's incorrect in American English too...

kozlovsky · on April 19, 2013

Hi, I answered about “pure Python” obsession in a separate comment ;)

> you're still littering Python with SQL keywords like order_by(), desc(), select(), and commit()

Well, I see them not as “SQL keywords” but as logical concepts

> The GROUP BY section of the documentation just reads like a rough translation of SQL into Python

Hmm, not sure I understand this. With PonyORM, GROUP BY section is generated automatically, and not present in Pony ORM queries. This eliminates the need to duplicate the same columns in SELECT and GROUP BY sections.

> Beyond a certain level of complexity, you need to know SQL in order to write complex queries.

I agree with this. But I hope PonyORM can lower entry barrier for new developers. Also, PonyORM allows developers to use pure SQL, if all they need from mapper is IdentityMap, caching, etc.

> By the way, why is there an order_by() method and a separate orderby() method?

Thanks for noticing this, it is a typo in documentation. The method was renamed from orderby() to order_by() recently.

kijin · on April 19, 2013

Here's an example from the documentation that just reads like a "rough translation of SQL into Python":

    select((item.order, item.order.total_price,
         sum(item.price * item.quantity))
         for item in OrderItem
         if item.order.total_price < sum(item.price * item.quantity))

kozlovsky · on April 19, 2013

Well, it doesn't seems that "rough" to me, because content of GROUP BY and HAVING sections is determined automatically and there is no need to manually join tables

dagw · on April 19, 2013

why such an obsession with "pure" Python?

One guess is that python is very popular among large non-developer communities, like GIS, Finance, and various groups doing numerical analysis. Many of these people aren't programmers and have no desire to become programs, they simply use python to automate tedious tasks and script the applications they use. If you tell them they have to learn a second language completely unrelated to the one they already know just to grab some data from a database, they'll probably push back. Tell them they can grab data from the database using the language they already know and they'll be much more interested in learning. Also the diagram editor is conceptually similar to many tools they might already be using and thus it might be easier for them to get a hang of it.

d0ugal · on April 19, 2013

I think you need to know SQL to use this. An ORM like Django's offers a higher level of abstraction and removes the need to fully understand SQL in many cases. This is closer to a direct translation.

A few possible reasons I can think of;

- People prefer to only work in one language - you don't need to make the full mental shift

- It works better/looks better in Python editors

- It should be safer and cleaner that mashing SQL strings together.

estebank · on April 19, 2013

A few other potential benefits:

- syntax highlighting - autocomplete - syntax check - query linting, finding common antipatterns - no SQL injections possible

nightski · on April 19, 2013

You are assuming the only purpose of doing this IS to generate SQL. In this library that may be the case, but the truth is with say Linq in .NET there are a variety of back ends (Linq to objects being very useful). Many sql operations are very useful when working with data, even outside of a database. None of this is new.

ulput · on April 19, 2013

Pure Python ensures that your software is as slow as possible.

This minimizes the likelihood of users mistakenly continuing to use it despite its low quality, and thus maximizes global welfare.

megaman821 · on April 19, 2013

I really like the style of this. I remember seeing LINQ years ago and wondering if something like it was possible in Python. I was disappointed when it looked like it couldn't be done, but generating an AST then translating to SQL is smart.

It seems from this thread not many people appreciate the parity of:

    list(o for o in Item if o.price>3)

when Item is a normal in-memory Python iterable, and:

    select(o for o in Item if o.price>3)

when Item is a row in a database.

After just helping someone get started that was new to Python and Django, it was weird helping them learn list comprehensions and generators and then having to teach such a divergent form of syntax to work with the Django ORM even though the a lot of the concepts should be similar.

d0mine · on April 19, 2013

you might mean "when Item is a table in a database" instead of "a row in a database". `o` would be the row.

bayesianhorse · on April 19, 2013

It will be very interesting to see, if this can be a commercial success, and a good example to learn from.

In competition with Django ORM and Sqlalchemy, both of which are open source and battle tested, they will have to do some cutting edge marketing to justify the cost "per process".

The idea is great. We might see something like this pop up in Sqlalchemy or Django real soon...

falsedan · on April 19, 2013

SQLAlchemy comes with SQLSoup: not as 'pure' but the abstraction overhead is low and the interface is very similar

luckystarr · on April 19, 2013

How does it translate generator expressions? Some kind of byte-code magic?

yangyang · on April 19, 2013

It appears to build an AST from the code object on the generator, and then translate to SQL.

https://github.com/ponyorm/pony/blob/orm/pony/orm/decompilin...

https://github.com/ponyorm/pony/blob/orm/pony/orm/asttransla...

amalashkevich · on April 19, 2013

Pony gets the code object from a generator, decompiles it to a Python AST and then translates it to a SQL abstract tree.

estebank · on April 19, 2013

This definitely reminds me of LINQ (never having used it).

As much as I dislike things like this that translate a language into another, but not quite right, as you have one syntax trying to mimic a completely different one, you do (or could) get the benefit of the possibility of autocompletion, syntax highlighting and lint checks.

I know that adding something like LINQ in Python is completely out of the question, as you'd have a special syntax case, and I agree with that.

For some time I've wondered how hard would it be to reverse traverse the Python AST to find literal strings that are being passed to a dbapi2 interface, and syntax highlight/lint/autocomplete the SQL on those strings. This gets there by bypassing the "Language is in a string" issue.

jimmaswell · on April 19, 2013

I've used LINQ very often and I can say it's been superb. I can't compare it to sql because I don't really know sql though.

taude · on April 19, 2013

You're pretty lucky that you haven't had to know SQL to have good luck.

Having used Linq2SQL for a few years, I don't like this style of coding. Often, to optimize a query, we've had to look at the generated SQL and then reverse engineer it back to Linq to get the SQL that we wanted. While certain queries were really easy in Linq2SQL, the more complex ones were extremely difficult. I had to still understand and write the underlying SQL, then I had to figure out how to "translate" that into the Linq2SQL. In the long run, all time savings were nullified. I see the same with a DSL-type of wrapper like this.

Additionally, almost all good backend devs speak SQL, I can take a Java-person and without the Linq2SQL stuff, he could understand our backend, work on queries, etc. The Linq just added another layer of complexity for someone to think about.

(Granted, there's a database layer that we could truly put a lot of that stuff in, if our queries got really complex)

jimmaswell · on April 19, 2013

I haven't used linq2SQL or anything, just LINQ on collections in .NET.

A team I'm a part of (language is C#) had to get rid of a library that handled SQL and do the SQL manually because the library was making a lot of inefficient calls. Sounds like a similar problem.

omaranto · on April 19, 2013

If you had that sort of experience, wouldn't it have been easier to write SQL instead of Linq?

mh- · on April 19, 2013

unfortunately the confusing licensing circumstances will probably hamper some of the interest in this.

i'm not opposed to commercial licensing, it's just that the existing combination of options doesn't offer clear legal guarantees.

if you go to the website, the split-licensing is apparent - yet has issues with, what seems like, improper use of Creative Commons licensing.

if you go to the github repo, the only license presented is AGPLv3.

are people who find the repo on github directly at risk of violating the license?

amalashkevich · on April 19, 2013

Thanks for pointing this out, we are working on the licensing part now. We are not going to put anyone at risk of the licence misuse. Eventually we are going to have a multiple licensing model allow using Pony ORM for free for non-commercial and open source projects, but in the same provide a commercial license as well.

> are people who find the repo on github directly at risk of violating the license?

Nope, why? Using it under AGPL is a valid option.

nbevans · on April 19, 2013

.NET has had this since 2006. LINQ to SQL and Entity Framework both support LINQ expressions. The problem is that more and more .NET devs are shunning it when working their relational database. There is an adage that if you're working with SQL then you should just be writing directly in SQL. Everything else is second rate.

JonoW · on April 19, 2013

As long as an ORM allows you to break out and right custom optimised SQL when needed, then I see no reason to write all your SQL by hand. ORMs are great for covering basic, repetitive queries, but there are times where it's not producing what you need. Of course you then loose the portability feature of ORMs, but I think this is acceptable. The biggest advantage for me isn't generation of SQL, it's mapping data back into your domain model/DTOs.

lloeki · on April 19, 2013

> Everything else is second rate.

Given the queries I had to code lately, I'll take ARel and ActiveRecord scopes composability any day instead of mashing SQL fragment strings together and hoping for the best.

aleyan · on April 19, 2013

From the sample on the front page

  SELECT "c"."id"
  FROM "Customer" "c"
    LEFT JOIN "Order" "order-1"
      ON "c"."id" = "order-1"."customer"
  GROUP BY "c"."id"
  HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000

is translated into

  select(c for c in Customer
           if sum(c.orders.price) > 1000)

How does the python version handle nulls? They are explicitly coalesced into 0s in SQL. In python sum([1,numpy.NaN]) evaluates to nan. I think this code will do something wrong.

I am not sure what flavor of SQL is beind used, but why are the table names and the field names surrounded by double quotes? They make the query look unnecesierily hairy.

Good job otherwise. Always glad to see people contributing to the community.

kozlovsky · on April 19, 2013

No, it is the other way, Python is translated to SQL, not vice versa ;)

According to SQL standard, aggregate functions must skip NULL values. The problem is with queries which return no rows. In this case, intuitive expected result of SUM aggregate function is zero, but, because of some quirk in SQL standard, SUM must return NULL if no rows were found.

Without coalesce function, result of SUM will be returned as None if no rows were found. Coalesce replaces this surprising None with zero.

aleyan · on April 19, 2013

Duh, of course. This is what happens when I don't read the explanation and just grab on to the first pieces of code I see.

Thanks.

tom_usher · on April 19, 2013

I've enjoyed using Peewee[1] as a lightweight ORM for Python - mainly for scripting data manipulation on legacy databases but it does the job nicely. How does Pony ORM compare?

[1] https://github.com/coleifer/peewee

kozlovsky · on April 19, 2013

I have not used Peewee, but at first sight the most visible difference is the query syntax - Peewee uses method chaining, whereas Pony ORM decompiles Python AST. I think that many queries can look simpler in Pony than in Pewee.

For example, if I want to retrieve all pairs of different users with the same name, in Pony this query would look as follow:

    select((u1, u2) for u1 in User for u2 in User if u1.name == u2.name and u1 != u2)

In Peewee, this query probably would look something like this (didn't test it):

    User2 = User.alias()
    User.select(User, User2).join(User2).where((User.name == User2.name) & (User.id != User2.id))

It’s probable the matter of taste, but I like Pony syntax better. Another example is taken from the Peewee doc:

    staff_users = User.select().where(is_staff=True)
    Tweet.select().where(~(Tweet.user << staff_users))

In Pony, this query would look as follow:

    select(t for t in Tweet if t.user not in (u for u in User if u.is_staff))

Also, this simple version is possible, don't know why it is not as simple in Peewee

    select(t for t in tweet if not t.user.is_staff)

Also, I don’t know if Peewee supports IdentityMap and optimistics transactions.

On there other side, it seems Peewee already has migration support, whereas in Pony migrations are not implemented yet, this is the next task.

agentgt · on April 19, 2013

Its so much easier to do this in something like Python, Scala, Ruby... or just about any language other than Java. I built my own ORM (but uses Immutable objects) in Java that uses a fluent DSL. Is it good... no it sucks compared to what Scala and even .NET offer.

But then I thought screw that... SQL is a pretty good language so why not just make writing real SQL easier: https://github.com/agentgt/jirm/tree/master/jirm-core#sql-pl...

cjauvin · on April 19, 2013

I have solved a similar problem in a similar way (https://github.com/cjauvin/little_pger) by creating a very thin wrapper just above psycopg2, allowing to assemble SQL queries with kwargs and plain data structures.

ludwigvan · on April 19, 2013

A little off topic, but the visual editor reminded me of http://dbpatterns.com/ https://github.com/fatiherikli/dbpatterns Maybe you could join forces.

amalashkevich · on April 19, 2013

Thank you, will check this out.

csears · on April 19, 2013

It looks very cool, and as others said, it immediately reminded me of LINQ. But I wonder if there is too much magic going on for Pony to ever become widely adopted by the Python community. Regardless, nice work!

tnuc · on April 19, 2013

What is the purpose of this?

If you set up the an SQL Alchemy properly you can achieve the same thing.

If you wish complain about django then yes this sort of fixes it but it may break more than it fixes.

SQL Alchemy is free and better.

kozlovsky · on April 19, 2013

SQL Alchemy is fundamentally based on a relational model and translates queries to SQL. Pony ORM is based on an entity-relationship model, which is logically independent from SQL. Currently Pony translates queries to SQL only, but in the future noSQL support is also planned.

coolsunglasses · on April 20, 2013

This looks really nice. Korma serves as an interesting halfway point for those that use Clojure.

http://sqlkorma.com/

Siecje · on April 19, 2013

What intrigues me is the Diagram Editor. Why else would someone use this over SQLAlchmey, Storm or even Django's ORM?

swah · on April 19, 2013

Ok, you use this and now you can't even use solutions from Bill Karwin book...

daGrevis · on April 19, 2013

This seems very close to what PHP does. The difference is that PHP doesn't have so many ways to express itself so it's all done with method-chaining.

In my opinion, this is the way to write the core of an ORM and it should allow to use this abstraction layer too! It's somewhat between SQL queries as strings and ORM-level, I think.

kijin · on April 19, 2013

PHP doesn't have a built-in ORM. Which framework/library are you talking about?

daGrevis · on April 19, 2013

Not anything super specific. Kohana, for example, has ORM that uses that-alike way to built queries. And, as far as I know, it's ORM is built on top of that. I think that ORMs should have something like that in them instead of SQL queries as strings.

est · on April 19, 2013

I think the selling point of Pony ORM is that you can write variable and literals instead of string.

Django:

    Item.objects.filter(price__gt=3)

Pony ORM:

    select(o for o in Item if o.price>3)

Peewee:

    Item.select().where(Item.price>3)

Kohana:

    $Item->where('price', '>', 3)->find();

You see, the __gt or '>' here is kind of stupid.