I just got into data analysis recently (former software engineer) and tried out pandas vs polars. I like polars way more because it feels like SQL, but sane, and it's faster. It's clear about what it's trying to do. I never really got that from pandas.
I've been doing data analysis for decades, and stayed on R for a long time because Pandas was so bad.
People complain about R, but compared to the multitude of import lice and unergonomic APIs in Pandas, R always felt like living in the future.
Polars is a much, much saner API, but expressions are very clunky for doing basic computation. Or at least I can't find anything less clunky than pl.col("x") or pl.lit(2) where in R it's just x or 2.
Still, I'm using Python a ton more now that polars has enough steam for others to be able to understand the code.
R's data.table is still my favorite data frames API, over pandas, polars, and spark dataframes. Plotly has edged out ggplot2, but that took a long time.
IMO R is really slept on because it's limited to certain corners of academia, and that makes it seem scary and outdated to compsci folks. It's really a lovely language for data analysis.
> Or at least I can't find anything less clunky than pl.col("x") or pl.lit(2) where in R it's just x or 2.
In many cases you can pass a string or numeric literal to a Polars function instead of the pl.col (e.g. select()/group_by()).
Overall I agree it's less convenient than dplyr in the cases where pl.col is required, sure, but it's not terrible, and it has the benefit of making the code less ambiguous, which reduces bugs.
I think compsci people can appreciate R as a language itself, because it has really beautiful language features. I think programmers hate it, because it's so different and lispy, with features that they can't really appreciate when coming from a C-style OOP mindset.
Even then, the overhead of having an additional five characters per named variable is really unergonomic. I don't know of a way to get around it given Python's limited grammar and semantics without moving to something as Lispy as R.
The thing with Polars is it's really hard for me to get past the annoyance of having to do `pl.col("blah")` instead of `df.blah`. I find pandas easier for quick interactive work which is basically everything I do with it.
Their (polars) FOSS solution isn't at all neutered; imo that's a bit of an unfair criticism. Yeah, they are trying to make their distributed query engine for-profit, but as a user of the single-node solution, I haven't been pressured at all to use their cloud solution.
Sure, just wanted to give the perspective of a new person walking into this field. I'd agree, but I think there are a lot of data analysts that have never heard of polars.
It’s a bit of a hot take, but not wildly outlandish either.
Pandas supports so many use cases and is still more feature-rich than polars. But you always have the polars.DataFrame.to_pandas() function in your back pocket, so realistically you can always at least start with polars.