Are you afraid that you're going to "inherit" the issues the python ecosystem ha...

throwawaymaths · on May 2, 2023

Agreed. I think this effort is completely missing the real pain points that ML suffers from.

While Python the language is easy, and in many ways great for its original purpose as a teaching language, I'll note a few ways that Python ML suffers:

- pip hell. Really, having globally installed dependencies was great for the 90s and is terrible now that disk space is more or less a non-issue relative to dependencies. Venv/conda, which do sneaky things (e.g. with your shell), are super dangerous (https://twitter.com/garybernhardt/status/1653171980483575808), and a misstep can trash your system, especially when it has to deal with wheels that have system-level dependencies (looking at you, TensorFlow -- probably half of the reason why people moved to PyTorch). Poetry sounds nice. It's been a while since I've checked in with the Python ecosystem. Are ML people using that yet?

- Subpar deployment. Let's remember that containerization basically exists because Python does not have an ops story.

- Subpar integration with web. You are forced to either create a microservice or spin it up within Django (nobody really does this). Then you typically have to pull in a bunch of sidecar processes (Redis, Celery, etc.) just to get queuing of your web jobs correct.

- Poor concurrency. Sure, you can run your TensorFlow code in an awkward 'with' statement, but I think there are very few ML practitioners who could really explain to you what that 'with' is doing. The GPU is fundamentally an asynchronous entity. And god help you if you want to run and debug async Python.

- No distribution story. Sure, the big guys are able to spin up, e.g., Horovod, but it's not really a thing for someone with fewer resources who wants a few machines for a hot second, and again, god help you if something goes wrong and you need to debug it.
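
For readers wondering what a 'with' block of that kind does, here is a minimal sketch in plain Python, with no TensorFlow involved. It assumes the scope simply sets an ambient "current device" on entry and restores the previous one on exit. The names `device_scope`, `current_device`, `run_op`, and the device strings are hypothetical, made up for illustration; this is not TensorFlow's implementation.

```python
import contextlib
import contextvars

# Hypothetical ambient setting that ops consult to decide where to run.
_current_device = contextvars.ContextVar("current_device", default="cpu:0")


def current_device() -> str:
    """Return the device that is active in the current context."""
    return _current_device.get()


@contextlib.contextmanager
def device_scope(name: str):
    """Make `name` the active device inside the 'with' block only."""
    token = _current_device.set(name)   # runs on entry to the block
    try:
        yield name
    finally:
        _current_device.reset(token)    # runs on exit, even after an exception


def run_op(op_name: str) -> str:
    """Stand-in for an op: it only reports where it would be placed."""
    return f"{op_name} on {current_device()}"


results = [run_op("matmul")]
with device_scope("gpu:0"):
    results.append(run_op("matmul"))
    with device_scope("gpu:1"):         # scopes nest and unwind in order
        results.append(run_op("matmul"))
    results.append(run_op("matmul"))
results.append(run_op("matmul"))
```

In this sketch the scope only records placement. It says nothing about when the work actually completes, which is the part the bullet above is complaining about.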

Does Mojo solve any of these issues? From a cursory look, it looks like no.

cavisne · on May 3, 2023

I guess their argument is that those things suck because they hook into some C++ monstrosity (TensorFlow) or make RPCs to an external daemon (Redis, Horovod).

So rather than writing another Python wrapper over C++, they are making a new performant language that can call Python.

To me it makes sense, as torch is great and hard to compete with, but everything feeding into it is a mess today (data loading, distribution logic).

throwawaymaths · on May 4, 2023

> a new performant language that can call Python

Don't forget the control layer/data layer separation principle. Performance mostly matters only at the data layer, and I don't believe that Python ML really has a substantial problem there, aside from not having a real distribution story. So "having a more performant Python" doesn't really solve that much.

I'll tell you what could make the control layer better.

- no GIL

- better async primitives

- immutability of passed parameters

- better testing story

- better documentation story (Python is quite good at documentation, well, when Python devs actually do it, which they usually don't).

- project-local dependencies with no shenanigans
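
To illustrate the "immutability of passed parameters" item, here is a minimal sketch of the problem and of one opt-in workaround available in today's Python. It assumes the complaint is about callees silently mutating arguments. `normalize_in_place`, `TrainConfig`, and `halve_lr` are hypothetical names invented for this example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


def normalize_in_place(batch: list) -> list:
    """Mutates the caller's list: nothing in the signature warns about it."""
    total = sum(batch)
    for i, x in enumerate(batch):
        batch[i] = x / total
    return batch


@dataclass(frozen=True)
class TrainConfig:
    """Opt-in immutability: attribute assignment raises after construction."""
    learning_rate: float
    layers: tuple  # a tuple, so the container itself cannot be mutated either


def halve_lr(cfg: TrainConfig) -> TrainConfig:
    """Cannot modify `cfg`, so it returns a modified copy instead."""
    return replace(cfg, learning_rate=cfg.learning_rate / 2)


data = [1.0, 3.0]
normalize_in_place(data)        # the caller's `data` has been changed

cfg = TrainConfig(learning_rate=0.1, layers=(64, 64))
new_cfg = halve_lr(cfg)         # `cfg` itself is untouched

try:
    cfg.learning_rate = 1.0     # rejected by the frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The frozen dataclass only protects values that opt in; the language does nothing to stop `normalize_in_place` from rewriting a plain list, which is presumably the gap this item is about.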

itissid · on May 4, 2023

> control layer/data layer separation principle

Could you explain, or give references for, what exactly you mean by this? I've heard of separation of concerns, but is this a specific realization of that principle?

throwawaymaths · on May 4, 2023

Ugh. I misremembered terms. Look up control plane and data plane. This is well understood.

derbOac · on May 3, 2023

It might make the issues worse, actually, if you end up with a language that promises full support for Python as a subset but indefinitely fails to deliver it, and nevertheless encourages people to import Python libraries. Then it's like a wrapper around a wrapper around a dependency tangle...

This post has me quite interested in Mojo, and it has a lot of potential, but it's difficult for me to get too excited right now because so much of it doesn't actually exist yet. Nothing is open sourced, and from the docs it seems they don't even have classes implemented.

My experience with new languages is that the devil is often in the details, and the stuff that gets put off is sometimes where the sticking points are: where performance starts to decline relative to other languages, and where you start to run into dependency hell. It's not that I want Mojo to fail (quite the contrary, in fact), but it's so hard to know where it will end up this early in its development.

qumpis · on May 2, 2023

Poetry and pipenv seem to be quite popular among ML researchers, especially those who care about reproducibility.

int_19h · on May 3, 2023

For DS/ML, Conda is probably used more often than everything else combined.

freilanzer · on May 3, 2023

I use pyenv and poetry.

freilanzer · on May 3, 2023

> original purpose as a teaching language

The original purpose was to act as a glue language for C++.

throwawaymaths · on May 4, 2023

Fair. I confused its origin story with "what it was actually used for in its first decade"/why it became popular. Apologies.