
I just got into data analysis recently (former software engineer) and tried out pandas vs polars. I like polars way more because it feels like SQL but then sane, and it's faster. It's clear in what it tries to do. I didn't really have that with pandas.


I've been doing data analysis for decades, and stayed on R for a long time because Pandas was so bad.

People complain about R, but compared to the multitude of import lice and unergonomic APIs in Pandas, R always felt like living in the future.

Polars is a much much more sane API, but expressions are very clunky for doing basic computation. Or at least I can't find anything less clunky than pl.col("x") or pl.lit(2) where in R it's just x or 2.

Still, I'm using Python a ton more now that polars has enough steam for others to be able to understand the code.


R's data.table is still my favorite data frames API, over pandas, polars, and spark dataframes. Plotly has edged out ggplot2, but that took a long time.

IMO R is really slept on because it's limited to certain corners of academia, and that makes it seem scary and outdated to compsci folks. It's really a lovely language for data analysis.


> Or at least I can't find anything less clunky than pl.col("x") or pl.lit(2) where in R it's just x or 2.

In many cases you can pass a string or numeric literal to a Polars function instead of the pl.col (e.g. select()/group_by()).

Overall I agree it's less convenient than dplyr in the cases where pl.col is required, sure, but it's not terrible, and it has the benefit of making the code less ambiguous, which reduces bugs.


I think compsci people can appreciate R as a language itself, because it has really beautiful language features. I think programmers hate it, because it's so different and lispy, with features that they can't really appreciate when coming from a C-style OOP mindset.


I think if that's too painful, you can introduce a convention of: `from polars import col as c, lit as l`

For anything production though, I just stick to pl.col and pl.lit as it's widely used.


Coming from R, that introduces a different confusion problem as there, c() has a specific and common purpose. https://www.rdocumentation.org/packages/base/versions/3.6.2/...


Even then, the overhead of having an additional five characters per named variable is really unergonomic. I don't know of a way to get around it given Python's limited grammar and semantics without moving to something as Lispy as R.


Two characters, if you do `from polars import col as c` you can simply write `c.foo`, assuming the column name is a valid Python identifier.


Oh that's very interesting, thanks!!


The thing with Polars is it's really hard for me to get past the annoyance of having to do `pl.col("blah")` instead of `df.blah`. I find pandas easier for quick interactive work which is basically everything I do with it.


from polars import col as C

C.blah


Thanks! I'm not sure if pl.col improved since the last time I looked at polars or if I was too lazy to find it, but pl.col (docs) look great!


This may be a hot take, but there is now no reason to ever use pandas for new data analysis codebases. Polars is better in every way that matters.


pandas has been around for years and never tried to sell me a service.


Their (polars) FOSS solution isn't at all neutered; imo that's a little bit of an unfair criticism. Yeah, they are trying to make their distributed query engine for-profit, but as a user of the single-node solution, I haven't been pressured at all to use their cloud solution.


Sure, just wanted to give the perspective of a new person walking into this field. I'd agree, but I think there are a lot of data analysts that have never heard of polars.

Though, I guess they're not on this site :')


Only thing I can think of is HDF5 support. That is currently stopping me from completely switching to polars.


It’s a bit of a hot take, but not wildly outlandish either.

Pandas supports so many use cases and is still more feature rich than polars. But you always have the polars.DataFrame.to_pandas() function in your back pocket so realistically you can always at least start with polars.



