Hi, I'm Alexander Kozlovsky, one of Pony ORM authors.
I'll try to answer some questions
> why such an obsession with "pure" Python?
With this statement we are trying to say Pony is not just to offer some syntax sugar. The end goal of Pony ORM is to understand the semantic of each Python generator which can be translated to database query and then do translation accordingly.
The direct benefit of this is that generator query are more high-level then resulted SQL, and can be much easier to understand and refactor. The simplest example is an automatic converting of attribute path traversing such as grade.student.group.department.name to a minimal set of SQL joins. Pony also can use attribute path in reverse direction, from “one” to “many”, such as department.groups.students.gpa, which in this case resulted in a multiset of all ‘gpa’ values for this department. Also you can see examples of query optimization in the documentation, for example when the subquery is converted to LEFT JOIN with GROUP BY, where it can give performance benefit.
There are more distant benefits of the semantic translation. In future, we plan to add support of denormalized database schemas. Such denormalization will take place on a physical level (i.e. in the database) while on the logical level (in Python) all queries and logic will remain the same. But first of all we are going to add the migration support.
Another future plan is to add support of MongoDB. This is a distant plan, but I think Pony architecture will allow this. After that, migration of PonyORM-based project from relational databases to noSQL and vice versa will be possible.
> An ORM like Django's offers a higher level of abstraction
I disagree with this. I think, any Django query can be written in PonyORM much more concisely, and there are easy-to-understand PonyORM queries which cannot be expressed in Django easily.
The NonCommercial part makes it not open source, and not libre software. Both the OSI and the FSF clearly state that FLOSS software has to allow commercial use for it to qualify.
I'm not very familiar with the culture of the Python community so this might come across as offending, but why such an obsession with "pure" Python?
On one extreme, of course, there are spaghetti apps that mix PHP, HTML, CSS, JS, shell, and SQL in the same file. We all know and hate those apps. But is there any reason to jump to the other extreme and turn as much as possible into "pure" $LANGUAGE ?
As soon as your query gets moderately complicated, you're still littering Python with SQL keywords like order_by(), desc(), select(), and commit(). The GROUP BY section of the documentation just reads like a rough translation of SQL into Python, the only difference being the syntax. It's like watching a first-week ESL student try to construct sentences in English.
The documentation advertises that "Pony allows any programmer to write complex and effective queries against a database, even without being an expert in SQL." I don't think this is true for any ORM I've seen so far, whether in Python or in any other popular language. Beyond a certain level of complexity, you need to know SQL in order to write complex queries. But if you already know SQL, why translate long GROUP BY ... HAVING queries into Python only to have the ORM translate it back into SQL? Why are we trying so hard to avoid writing SQL? What are we going to do next? Write a library that translates pure Python into Lua scripts for your Redis server?
I like ORMs because they simplify frequent tasks, like grabbing a dozen items from the database and filtering them by a couple of columns. I also like them because they often come with caching and effective protections against SQL injection attacks. But I also think that purity is overrated. Both web apps and native apps are already a mixture of several different languages, both on the frontend and on the backend. Don't be afraid to add SQL to your belt, it's just another language.
(By the way, why is there an order_by() method and a separate orderby() method?)
I'm Alexey Malashkevich, one of the Pony ORM authors.
For now Pony ORM runs on CPython only because there is no access to frames in other implementations. "Pure Python" means that you can write queries in term of objects using Python generators.
Pure in this case means that you can use Python syntax in order to query a database. This way a database query looks identical to an iteration over a list of Python objects.
SQL is declarative, statically typed but with an incredible type-inference system, and does an amazing amount of optimization behind the scenes. If it weren't so domain-specific, it would be the holy grail of languages. It amazes me how many developer hours of effort has been put into this global quest to completely avoid SQL. It is starting to remind me of all the people I work with that would rather write 10k lines of VBA for some unmaintainable Excel spreadsheet than learn to write 300 lines of Python/Ruby/Lua/Whatever that can accomplish exactly the same task.
I can understand some hesitation to learn something new, especially when there are so many new things to learn and the value of learning them is not always apparent...but I would assume that the benefits of knowing SQL would be readily apparent. I'm not even a developer and I would feel like a quadriplegic without it.
It's not hesitance to learn something new. People avoid SQL for a variety of fairly rational reasons. For me, the biggest one is that most general-purpose programming languages encourage you to solve large problems by splitting them up into manageable chunks. SQL, on the other hand, encourages you to lump different problems together (for performance reasons).
It also has remarkably poor tools for code reuse and abstraction. Have you ever worked with SPs that dynamically generate SQL code? I see it daily. Not only that, it's code written by professional database administrators.
> If it weren't so domain-specific, it would be the holy grail of languages.
That's relatively true of many languages. The reason SQL has been able to be optimized so heavily (and the reason why it's been adopted so broadly) is because it is so domain specific.
More specifically, it is optimized for expression set algebraic operations, and certain kinds of kernels to apply to resultant sets. It is most definitely NOT suited for most array or otherwise metrically-oriented data series, which is why you never see it used in HPC to express timeseries transformations or, say, 3D PDE diffusion simulations. For those applications, people have come up with different kinds of specialized languages which are rooted in an more appropriate data abstraction than merely set operations.
Right, there's only like a million of these already in existence in every language and I have yet to find one that is comprehensive enough that I don't just fall back to regular sql queries sooner or later.
And if you think about it, any library that was powerful enough for me to not write SQL anymore would basically have to rewrite the complete SQL spec using awkward hacks in a language that is totally unsuited for it.
And if you're writing an application of any complexity or size using an SQL database you need the full capacities of SQL. There are a lot of little features or quirks of SQL that these languges seem to overlook that you need to make your app perform.
Now I usually just write a couple functions like this:
query("SELECT one, two FROM mytable",
["WHERE weight > %f AND name = '%s'", weight, name]);
That gives me SQL-injection protection, lets me save subqueries as strings and re-use them, and caching to if I need it.
And as a plus, I don't have continuously look up/relearn each little feature of SQL in this little language, or if I write it this way is it actually going to compile into the SQL I want? It's pretty much inevitable that you're going to need to know/control the SQL that's being written sooner or later (usually sooner), so save yourself the extra headache and just go SQL from the start.
I know we're talking about python, but personally I advise against raw SQL because there is no type protection of any kind. In, e.g. C# I can use Linq to ensure I build a query that type checks and makes sense, even across joins.
It seems from others' replies that it's not optional. It certainly isn't in English. I put it in not to correct but to explain my interpretation of what was written to better frame my question.
Hi, I answered about “pure Python” obsession in a separate comment ;)
> you're still littering Python with SQL keywords like order_by(), desc(), select(), and commit()
Well, I see them not as “SQL keywords” but as logical concepts
> The GROUP BY section of the documentation just reads like a rough translation of SQL into Python
Hmm, not sure I understand this. With PonyORM, GROUP BY section is generated automatically, and not present in Pony ORM queries. This eliminates the need to duplicate the same columns in SELECT and GROUP BY sections.
> Beyond a certain level of complexity, you need to know SQL in order to write complex queries.
I agree with this. But I hope PonyORM can lower entry barrier for new developers. Also, PonyORM allows developers to use pure SQL, if all they need from mapper is IdentityMap, caching, etc.
> By the way, why is there an order_by() method and a separate orderby() method?
Thanks for noticing this, it is a typo in documentation. The method was renamed from orderby() to order_by() recently.
Here's an example from the documentation that just reads like a "rough translation of SQL into Python":
select((item.order, item.order.total_price,
sum(item.price * item.quantity))
for item in OrderItem
if item.order.total_price < sum(item.price * item.quantity))
Well, it doesn't seems that "rough" to me, because content of GROUP BY and HAVING sections is determined automatically and there is no need to manually join tables
One guess is that python is very popular among large non-developer communities, like GIS, Finance, and various groups doing numerical analysis. Many of these people aren't programmers and have no desire to become programs, they simply use python to automate tedious tasks and script the applications they use. If you tell them they have to learn a second language completely unrelated to the one they already know just to grab some data from a database, they'll probably push back. Tell them they can grab data from the database using the language they already know and they'll be much more interested in learning. Also the diagram editor is conceptually similar to many tools they might already be using and thus it might be easier for them to get a hang of it.
I think you need to know SQL to use this. An ORM like Django's offers a higher level of abstraction and removes the need to fully understand SQL in many cases. This is closer to a direct translation.
A few possible reasons I can think of;
- People prefer to only work in one language - you don't need to make the full mental shift
- It works better/looks better in Python editors
- It should be safer and cleaner that mashing SQL strings together.
You are assuming the only purpose of doing this IS to generate SQL. In this library that may be the case, but the truth is with say Linq in .NET there are a variety of back ends (Linq to objects being very useful). Many sql operations are very useful when working with data, even outside of a database. None of this is new.
I really like the style of this. I remember seeing LINQ years ago and wondering if something like it was possible in Python. I was disappointed when it looked like it couldn't be done, but generating an AST then translating to SQL is smart.
It seems from this thread not many people appreciate the parity of:
list(o for o in Item if o.price>3)
when Item is a normal in-memory Python iterable, and:
select(o for o in Item if o.price>3)
when Item is a row in a database.
After just helping someone get started that was new to Python and Django, it was weird helping them learn list comprehensions and generators and then having to teach such a divergent form of syntax to work with the Django ORM even though the a lot of the concepts should be similar.
It will be very interesting to see, if this can be a commercial success, and a good example to learn from.
In competition with Django ORM and Sqlalchemy, both of which are open source and battle tested, they will have to do some cutting edge marketing to justify the cost "per process".
The idea is great. We might see something like this pop up in Sqlalchemy or Django real soon...
This definitely reminds me of LINQ (never having used it).
As much as I dislike things like this that translate a language into another, but not quite right, as you have one syntax trying to mimic a completely different one, you do (or could) get the benefit of the possibility of autocompletion, syntax highlighting and lint checks.
I know that adding something like LINQ in Python is completely out of the question, as you'd have a special syntax case, and I agree with that.
For some time I've wondered how hard would it be to reverse traverse the Python AST to find literal strings that are being passed to a dbapi2 interface, and syntax highlight/lint/autocomplete the SQL on those strings. This gets there by bypassing the "Language is in a string" issue.
You're pretty lucky that you haven't had to know SQL to have good luck.
Having used Linq2SQL for a few years, I don't like this style of coding. Often, to optimize a query, we've had to look at the generated SQL and then reverse engineer it back to Linq to get the SQL that we wanted. While certain queries were really easy in Linq2SQL, the more complex ones were extremely difficult. I had to still understand and write the underlying SQL, then I had to figure out how to "translate" that into the Linq2SQL. In the long run, all time savings were nullified. I see the same with a DSL-type of wrapper like this.
Additionally, almost all good backend devs speak SQL, I can take a Java-person and without the Linq2SQL stuff, he could understand our backend, work on queries, etc. The Linq just added another layer of complexity for someone to think about.
(Granted, there's a database layer that we could truly put a lot of that stuff in, if our queries got really complex)
I haven't used linq2SQL or anything, just LINQ on collections in .NET.
A team I'm a part of (language is C#) had to get rid of a library that handled SQL and do the SQL manually because the library was making a lot of inefficient calls. Sounds like a similar problem.
Thanks for pointing this out, we are working on the licensing part now. We are not going to put anyone at risk of the licence misuse. Eventually we are going to have a multiple licensing model allow using Pony ORM for free for non-commercial and open source projects, but in the same provide a commercial license as well.
> are people who find the repo on github directly at risk of violating the license?
.NET has had this since 2006. LINQ to SQL and Entity Framework both support LINQ expressions. The problem is that more and more .NET devs are shunning it when working their relational database. There is an adage that if you're working with SQL then you should just be writing directly in SQL. Everything else is second rate.
As long as an ORM allows you to break out and right custom optimised SQL when needed, then I see no reason to write all your SQL by hand. ORMs are great for covering basic, repetitive queries, but there are times where it's not producing what you need. Of course you then loose the portability feature of ORMs, but I think this is acceptable. The biggest advantage for me isn't generation of SQL, it's mapping data back into your domain model/DTOs.
Given the queries I had to code lately, I'll take ARel and ActiveRecord scopes composability any day instead of mashing SQL fragment strings together and hoping for the best.
SELECT "c"."id"
FROM "Customer" "c"
LEFT JOIN "Order" "order-1"
ON "c"."id" = "order-1"."customer"
GROUP BY "c"."id"
HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000
is translated into
select(c for c in Customer
if sum(c.orders.price) > 1000)
How does the python version handle nulls? They are explicitly coalesced into 0s in SQL. In python sum([1,numpy.NaN]) evaluates to nan. I think this code will do something wrong.
I am not sure what flavor of SQL is beind used, but why are the table names and the field names surrounded by double quotes? They make the query look unnecesierily hairy.
Good job otherwise. Always glad to see people contributing to the community.
No, it is the other way, Python is translated to SQL, not vice versa ;)
According to SQL standard, aggregate functions must skip NULL values. The problem is with queries which return no rows. In this case, intuitive expected result of SUM aggregate function is zero, but, because of some quirk in SQL standard, SUM must return NULL if no rows were found.
Without coalesce function, result of SUM will be returned as None if no rows were found. Coalesce replaces this surprising None with zero.
I've enjoyed using Peewee[1] as a lightweight ORM for Python - mainly for scripting data manipulation on legacy databases but it does the job nicely. How does Pony ORM compare?
I have not used Peewee, but at first sight the most visible difference is the query syntax - Peewee uses method chaining, whereas Pony ORM decompiles Python AST. I think that many queries can look simpler in Pony than in Pewee.
For example, if I want to retrieve all pairs of different users with the same name, in Pony this query would look as follow:
select((u1, u2) for u1 in User for u2 in User if u1.name == u2.name and u1 != u2)
In Peewee, this query probably would look something like this (didn't test it):
Its so much easier to do this in something like Python, Scala, Ruby... or just about any language other than Java. I built my own ORM (but uses Immutable objects) in Java that uses a fluent DSL. Is it good... no it sucks compared to what Scala and even .NET offer.
I have solved a similar problem in a similar way (https://github.com/cjauvin/little_pger) by creating a very thin wrapper just above psycopg2, allowing to assemble SQL queries with kwargs and plain data structures.
It looks very cool, and as others said, it immediately reminded me of LINQ. But I wonder if there is too much magic going on for Pony to ever become widely adopted by the Python community. Regardless, nice work!
SQL Alchemy is fundamentally based on a relational model and translates queries to SQL. Pony ORM is based on an entity-relationship model, which is logically independent from SQL. Currently Pony translates queries to SQL only, but in the future noSQL support is also planned.
This seems very close to what PHP does. The difference is that PHP doesn't have so many ways to express itself so it's all done with method-chaining.
In my opinion, this is the way to write the core of an ORM and it should allow to use this abstraction layer too! It's somewhat between SQL queries as strings and ORM-level, I think.
Not anything super specific. Kohana, for example, has ORM that uses that-alike way to built queries. And, as far as I know, it's ORM is built on top of that. I think that ORMs should have something like that in them instead of SQL queries as strings.
> why such an obsession with "pure" Python?
With this statement we are trying to say Pony is not just to offer some syntax sugar. The end goal of Pony ORM is to understand the semantic of each Python generator which can be translated to database query and then do translation accordingly.
The direct benefit of this is that generator query are more high-level then resulted SQL, and can be much easier to understand and refactor. The simplest example is an automatic converting of attribute path traversing such as grade.student.group.department.name to a minimal set of SQL joins. Pony also can use attribute path in reverse direction, from “one” to “many”, such as department.groups.students.gpa, which in this case resulted in a multiset of all ‘gpa’ values for this department. Also you can see examples of query optimization in the documentation, for example when the subquery is converted to LEFT JOIN with GROUP BY, where it can give performance benefit.
There are more distant benefits of the semantic translation. In future, we plan to add support of denormalized database schemas. Such denormalization will take place on a physical level (i.e. in the database) while on the logical level (in Python) all queries and logic will remain the same. But first of all we are going to add the migration support.
Another future plan is to add support of MongoDB. This is a distant plan, but I think Pony architecture will allow this. After that, migration of PonyORM-based project from relational databases to noSQL and vice versa will be possible.
> An ORM like Django's offers a higher level of abstraction
I disagree with this. I think, any Django query can be written in PonyORM much more concisely, and there are easy-to-understand PonyORM queries which cannot be expressed in Django easily.
Also, Pony ORM completely eliminates N+1 select problem.