I just got into data analysis recently (former software engineer) and tried out ...

epistasis · 2025-03-07T22:22:24 1741386144

I've been doing data analysis for decades, and stayed on R for a long time because Pandas was so bad.

People complain about R, but compared to the multitude of import lice and unergonomic APIs in Pandas, R always felt like living in the future.

Polars is a much much more sane API, but expressions are very clunky for doing basic computation. Or at least I can't find anything less clunky than pl.col("x") or pl.literal(2) where in R it's just x or 2.

Still, I'm using Python a ton more now that polars has enough steam for others to be able to understand the code.

Centigonal · 2025-03-08T01:45:47 1741398347

R's data.table is still my favorite data frames API, over pandas, polars, and spark dataframes. Plotly has edged out ggplot2, but that took a long time.

IMO R is really slept on because it's limited to certain corners of academia, and that makes it seem scary and outdated to compsci folks. It's really a lovely language for data analysis.

minimaxir · 2025-03-07T23:03:44 1741388624

> Or at least I can't find anything less clunky than pl.col("x") or pl.literal(2) where in R it's just x or 2.

In many cases you can pass a string or numeric literal to a Polars function instead of the pl.col (e.g. select()/group_by()).

Overall I agree it's less convenient than in dplyr in the cases where pl.col is required, sure, but it's not terrible and has the benefit of making the code less ambiguous, which reduces bugs.

epistasis · 2025-03-08T02:03:25 1741399405

I think compsci people can appreciate R as a language itself, because it has really beautiful language features. I think programmers hate it, because it's so different and lispy, with features that they can't really appreciate when coming from a C-style OOP mindset.

theLiminator · 2025-03-07T23:14:58 1741389298

I think if that's too painful, you can introduce a convention of: `from polars import col as c, lit as l`

For anything production though, I just stick to pl.col and pl.lit as they're widely used.

minimaxir · 2025-03-07T23:19:33 1741389573

Coming from R, that introduces a different confusion problem as there, c() has a specific and common purpose. https://www.rdocumentation.org/packages/base/versions/3.6.2/...

epistasis · 2025-03-08T02:05:40 1741399540

Even then, the overhead of having an additional five characters per named variable is really unergonomic. I don't know of a way to get around it given Python's limited grammar and semantics without moving to something as Lispy as R.

orlp · 2025-03-08T14:59:13 1741445953

Two characters, if you do `from polars import col as c` you can simply write `c.foo`, assuming the column name is a valid Python identifier.

epistasis · 2025-03-08T15:46:26 1741448786

Oh that's very interesting, thanks!!

BrenBarn · 2025-03-08T07:52:17 1741420337

The thing with Polars is it's really hard for me to get past the annoyance of having to do `pl.col("blah")` instead of `df.blah`. I find pandas easier for quick interactive work which is basically everything I do with it.

ritchie46 · 2025-03-08T11:32:00 1741433520

from polars import col as C

C.blah

prometheon1 · 2025-03-08T12:24:59 1741436699

Thanks! I'm not sure if pl.col improved since the last time I looked at polars or if I was too lazy to find it, but pl.col (docs) look great!

minimaxir · 2025-03-07T22:39:50 1741387190

This may be a hot take, but there is now no reason to ever use pandas for new data analysis codebases. Polars is better in every way that matters.

latenightcoding · 2025-03-07T23:44:21 1741391061

pandas has been around for years and never tried to sell me a service.

theLiminator · 2025-03-08T02:00:50 1741399250

Their (polars) FOSS solution isn't at all neutered; imo that's a little bit of an unfair criticism. Yeah, they are trying to make their distributed query engine for-profit, but as a user of the single-node solution, I haven't been pressured at all to use their cloud solution.

melvinroest · 2025-03-07T22:44:09 1741387449

Sure, just wanted to give the perspective of a new person walking into this field. I'd agree, but I think there are a lot of data analysts that have never heard of polars.

Though, I guess they're not on this site :')

The-Ludwig · 2025-03-08T03:35:05 1741404905

Only thing I can think of is HDF5 support. That is currently stopping me from completely switching to polars.

comte7092 · 2025-03-07T22:50:16 1741387816

It’s a bit of a hot take, but not wildly outlandish either.

Pandas supports so many use cases and is still more feature rich than polars. But you always have the polars.DataFrame.to_pandas() function in your back pocket so realistically you can always at least start with polars.