Yes. Let’s say you want certain features of a voice sample. You need to do that feature engineering every time before you send it to the model. Doesn’t it make sense to do it in C++ or Rust? This is already done today. So if you’re already doing parts of the feature engineering in Rust, why not continue?
Yeah it’s not reasonable right now because Python has the best ecosystem. But that will not always be the case!
I can’t tell exactly what you mean, but I think you’re conflating two levels of abstraction here. C++ (or Rust) and Python already work in harmony to make training efficient.
1. In TensorFlow and similar frameworks, the Python runtime is used to compose highly optimized operations into a trainable graph.
2. C++ is used to implement those highly optimized ops. If you have some novel feature engineering and need better throughput than a pure Python op can give you, you implement the most general viable C++ (or Rust) op and then use that wrapped op from Python.
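That division of labor can be sketched without any framework at all. Here NumPy plays the role of the "wrapped C++ op" (its kernels are compiled C), and the function names are illustrative, not from any real codebase: the pure-Python version does every multiply-add in the interpreter, while the NumPy version just composes two compiled ops.

```python
import numpy as np

def frame_energy_pure_python(samples, frame_size):
    # Per-frame energy with Python loops: every multiply-add
    # is executed by the interpreter, one at a time.
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(x * x for x in frame))
    return energies

def frame_energy_numpy(samples, frame_size):
    # Same computation composed from NumPy ops (reshape, multiply,
    # sum), each of which dispatches to a compiled C loop.
    n_frames = len(samples) // frame_size
    frames = np.asarray(samples[: n_frames * frame_size]).reshape(n_frames, frame_size)
    return (frames * frames).sum(axis=1)

signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # toy "voice" samples
print(frame_energy_pure_python(signal, 3))
print(frame_energy_numpy(signal, 3))
```

Both return per-frame energies of about 0.14 and 0.77 for this toy signal; the point is that the Python layer only describes *what* to compute, and the native layer does the per-sample work.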
This is how large companies scale machine learning in general, and it applies to all ops, not just feature-engineering-specific ones.
There is no way Instagram is using a pure-Python image processing library to prep images for their porn detection models. That would cost too much money and take far too much time. Instead they almost certainly wrap some C++ in some Python and move on to more important things.
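In practice that wrapping usually means calling a C/C++-backed library (Pillow, OpenCV) from Python. As a dependency-free sketch of the same division of labor, here is nearest-neighbor downsampling done two ways; the function names are made up for illustration, and the NumPy strided slice stands in for the compiled library call:

```python
import numpy as np

def downsample_python(img, factor):
    # Nested Python loops: every pixel access goes through the
    # interpreter, which is what you pay for with a pure-Python lib.
    h, w = len(img), len(img[0])
    return [[img[y * factor][x * factor] for x in range(w // factor)]
            for y in range(h // factor)]

def downsample_numpy(img, factor):
    # One strided slice: a single C-level gather over the whole
    # image, with no per-pixel Python work.
    return np.asarray(img)[::factor, ::factor]

image = [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]]
print(downsample_python(image, 2))   # [[0, 2], [8, 10]]
print(downsample_numpy(image, 2))
```

Both produce the same 2x2 result; at Instagram scale the gap between the two approaches is the difference between a viable pipeline and a money fire.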
I know. That’s how we do it too. But don’t you see any benefit in doing it all in Rust instead of Python wrappers + C++? Especially for handling large data like voice, if there were a good ecosystem and toolbox in place?
Maybe, but then we’re no longer making an argument about performance, which is what I was responding to in your initial claim that “everything counts” and that numpy shuffle is slow. That’s a straw-man argument with zero bearing on actual engineering decisions.