Yes. Let’s say you want certain features of a voice sample. You need to do that feature engineering every time before you send it to the model. Doesn’t it make sense to do it in C++ or Rust? This is already done today. So if you’re already doing parts of the feature engineering in Rust, why not continue?
Yeah it’s not reasonable right now because Python has the best ecosystem. But that will not always be the case!
I can’t tell exactly what you mean, but I think you’re conflating two levels of abstraction here. C++ (or Rust) and Python already work in harmony to make training efficient.
1. In TensorFlow and similar frameworks, the Python runtime is used to compose highly optimized operations into a trainable graph.
2. C++ is used to implement those highly optimized ops. If you have some novel feature engineering and need better throughput than a pure Python op can give you, you implement the most general viable C++ (or Rust) op and then use that wrapped op from Python.
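That division of labor can be sketched without any framework at all. Here NumPy plays the role of the "wrapped C++ op" (its kernels are compiled C), and the function names are illustrative, not from any real codebase: the pure-Python version does every multiply-add in the interpreter, while the NumPy version just composes two compiled ops.

```python
import numpy as np

def frame_energy_pure_python(samples, frame_size):
    # Per-frame energy with Python loops: every multiply-add
    # is executed by the interpreter, one at a time.
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(x * x for x in frame))
    return energies

def frame_energy_numpy(samples, frame_size):
    # Same computation composed from NumPy ops (reshape, multiply,
    # sum), each of which dispatches to a compiled C loop.
    n_frames = len(samples) // frame_size
    frames = np.asarray(samples[: n_frames * frame_size]).reshape(n_frames, frame_size)
    return (frames * frames).sum(axis=1)

signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # toy "voice" samples
print(frame_energy_pure_python(signal, 3))
print(frame_energy_numpy(signal, 3))
```

Both return per-frame energies of about 0.14 and 0.77 for this toy signal; the point is that the Python layer only describes *what* to compute, and the native layer does the per-sample work.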
This is how large companies scale machine learning in general, and it applies to all ops, not just feature-engineering-specific ones.
There is no way Instagram is using a pure-Python image processing library to prep images for their porn detection models. That would cost too much money and take far too much time. Instead they almost certainly wrap some C++ in some Python and move on to more important things.
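In practice that wrapping usually means calling a C/C++-backed library (Pillow, OpenCV) from Python. As a dependency-free sketch of the same division of labor, here is nearest-neighbor downsampling done two ways; the function names are made up for illustration, and the NumPy strided slice stands in for the compiled library call:

```python
import numpy as np

def downsample_python(img, factor):
    # Nested Python loops: every pixel access goes through the
    # interpreter, which is what you pay for with a pure-Python lib.
    h, w = len(img), len(img[0])
    return [[img[y * factor][x * factor] for x in range(w // factor)]
            for y in range(h // factor)]

def downsample_numpy(img, factor):
    # One strided slice: a single C-level gather over the whole
    # image, with no per-pixel Python work.
    return np.asarray(img)[::factor, ::factor]

image = [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]]
print(downsample_python(image, 2))   # [[0, 2], [8, 10]]
print(downsample_numpy(image, 2))
```

Both produce the same 2x2 result; at Instagram scale the gap between the two approaches is the difference between a viable pipeline and a money fire.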
I know. That’s how we do it too. But don’t you see any benefit in doing it all in Rust instead of Python wrappers + C++? Especially for handling large data like voice, if there were a good ecosystem and toolbox in place?
Maybe, but then we’re no longer making an argument about performance, which is what I was responding to in your initial claim that “everything counts” and that numpy shuffle is slow. That’s a straw-man argument with zero bearing on actual engineering decisions.