Are these execution engines internally using the Arrow columnar format, or are they just exposing Arrow as a client wire format? AFAIK Spark and Presto do not use Arrow as their execution columnar format, only for data sources/sinks.
You can configure Spark to use Arrow for passing data between the JVM and Python via spark.sql.execution.arrow.pyspark.enabled, but yes, Spark uses Java datatypes internally.
Nowadays most Spark users use either PySpark or Spark SQL. Spark is, in some sense, designed to mimic the Scala collections library. Rust is nice, but its interop with other open source big data libs (Hadoop/Zookeeper/HBase) can be painful.
I'd say start with an eager functional language such as OCaml, minus the "object" part. Haskell code is hard to debug because you can't rely on good old print debugging.
Monads themselves are never a silver bullet. Type systems and composability are the true power of FP, IMO.
Debugging declarative code is always hard, whatever the language. Haskell does let you use trace expressions that output values to stderr as they are evaluated; see Debug.Trace.
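To make that concrete, here is a minimal sketch (the function name and messages are made up for illustration): trace takes a message and a value, emits the message when the value is forced, and returns the value unchanged.

    import Debug.Trace (trace)

    -- Toy example: the trace message is emitted (to stderr) the first
    -- time the result of `step` is actually demanded.
    step :: Int -> Int
    step x = trace ("step called with " ++ show x) (x * 2)

    main :: IO ()
    main = print (sum (map step [1, 2, 3]))  -- prints 12, plus three trace lines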
This is no reason to avoid arguably the most popular (and state-of-the-art) functional language and implementation out there.
It took me way too long to come across Debug.Trace; if it were more widely known, I don't think we'd see so many "impossible to debug" complaints. It's great.
I think we can all agree that debugging languages with lazy evaluation is especially hard. To use Debug.Trace effectively you often need to add strictness, and that can be fundamentally at odds with using the full power of lazy evaluation.
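A small sketch of that interaction (a contrived example, not from any real codebase): the trace only fires when the traced value is actually forced, so under laziness you may need seq or bang patterns just to see your debug output at all, let alone in the order you expect.

    {-# LANGUAGE BangPatterns #-}
    import Debug.Trace (trace)

    -- Neither trace fires until the corresponding component is forced.
    lazyPair :: (Int, Int)
    lazyPair = (trace "left forced" 1, trace "right forced" 2)

    main :: IO ()
    main = do
      print (fst lazyPair)   -- forces the first component: "left forced" appears
      let !y = snd lazyPair  -- bang pattern forces the second component here
      print y                -- "right forced" has already been emitted by now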
Debug.Trace won't solve the space leaks lurking around every corner of your program, and it won't shave off the hundreds of megabytes to gigabytes of RAM the compiler needs to compile a non-trivial program with dependencies.
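For readers wondering what such a leak looks like, the textbook case is a lazy left fold whose accumulator piles up unevaluated thunks (the numbers here are arbitrary); the strict foldl' from Data.List forces the accumulator at each step and runs in constant space.

    import Data.List (foldl')

    -- Lazy foldl builds a chain of ten million (+) thunks before anything
    -- is added up, which can exhaust the heap (optimisation levels aside).
    leaky :: Integer
    leaky = foldl (+) 0 [1 .. 10000000]

    -- foldl' evaluates the running sum at each step, so memory stays flat.
    fine :: Integer
    fine = foldl' (+) 0 [1 .. 10000000]

    main :: IO ()
    main = print fine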