When using low-precision formats like float8 you usually have to upcast the activations to BF16 before normalising. So the normalisation layers account for a proportionally larger share of the compute as you move to lower precision. Replacing these layers would help reduce the compute cost significantly.
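A minimal sketch of the upcast-then-normalise pattern, assuming a recent PyTorch (2.1+) where torch.float8_e4m3fn exists; fp8_layer_norm is just an illustrative helper name, not anyone's actual API:

    import torch
    import torch.nn.functional as F

    def fp8_layer_norm(x_fp8, weight, bias):
        # Normalisation wants the range/precision of a wider type,
        # so upcast the float8 activations to bfloat16 first.
        x = x_fp8.to(torch.bfloat16)
        y = F.layer_norm(x, x.shape[-1:], weight, bias)
        # Cast back down so the next matmul can run in float8 again.
        return y.to(torch.float8_e4m3fn)

    x = torch.randn(4, 1024, dtype=torch.bfloat16).to(torch.float8_e4m3fn)
    w = torch.ones(1024, dtype=torch.bfloat16)
    b = torch.zeros(1024, dtype=torch.bfloat16)
    out = fp8_layer_norm(x, w, b)

The matmuls stay in float8, but the norm itself runs in BF16, which is why its relative cost grows as everything else gets cheaper.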
That's mostly because Julia questions get answered on its Discourse or Slack. The sharp decline is due to an automatic cross-post bot that stopped working.
No one bothered fixing it, in large part because Discourse is the main place of discussion, as far as I know.
Even languages like Python and JavaScript, which are huge, show a decline after 2022, which suggests ChatGPT is probably responsible. It would be better to have some other measure imo.
It measures the proportion of questions for that language out of all languages. So, if there is a general decline in Stack Overflow questions, it's already accounted for in the metric.
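A toy illustration with made-up numbers of why a site-wide drop cancels out:

    # Hypothetical question counts before and after a site-wide decline.
    julia_before, total_before = 2_000, 1_000_000
    julia_after,  total_after  = 1_000,   500_000   # everything halves

    share_before = julia_before / total_before   # 0.002
    share_after  = julia_after  / total_after    # 0.002

    # The share only moves if Julia declines faster (or slower) than the site overall.
    print(share_before == share_after)           # True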
Can you use this to launch an Intel VM on Apple Silicon and vice versa? I'm interested in doing this so I can compile C++ applications for different architectures on macOS. Do you know of any other "easy" methods?
You can do this without virtualization/emulation: pass '-arch x86_64' or '-arch arm64' to clang. Or both, for a universal binary. And on Apple Silicon, you can run and test both thanks to Rosetta.
Tensors in deep learning are not the same thing as the tensors physicists use - blame the DL community for overloading the name :). DL tensors are just N-dimensional arrays of data, and there is no concept of covariance and contravariance of the dimensions. You could think of DL tensors as Cartesian tensors, except they don't need to conform to the transformation laws that physics tensors do.
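For example (plain NumPy, nothing specific to any DL framework):

    import numpy as np

    # A "tensor" in the DL sense: just a 3-dimensional array of numbers.
    t = np.random.rand(2, 3, 4)

    # The axes are interchangeable labels, not covariant/contravariant indices:
    # you can permute or reshape them freely, and no transformation law
    # (change of basis, metric, etc.) is implied or enforced.
    t2 = t.transpose(2, 0, 1)   # shape (4, 2, 3)
    t3 = t.reshape(6, 4)        # shape (6, 4)
    print(t.shape, t2.shape, t3.shape)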
Warp is great - I use it as my daily terminal. The best features are being able to edit commands, having the output chunked into blocks, and AI-generated commands at your fingertips.
Another implementation is still unlikely to have the exact same bugs. A rewrite in Rust especially will force the code to be structured differently (Rust is very opinionated about that).
The spec is big enough that the team won't be able to just write the exact same implementation from memory.
I don't disagree that it's sufficient, but ideally different people would implement the spec. If you have a particular mental model or understanding of a part of the spec that doesn't match what the spec actually says, that misunderstanding is likely to carry over unchanged into a second implementation.