I think the point is that a monad is a useful concept _purely_ because of what it _allows_ you to do, and _not_ because of anything syntactical. Those rules you're glossing over there, the commutative squares, are precisely what lets us have powerful intuitions about these objects. The type signatures matter a lot less. If, for example, you don't have functoriality (which fails for `std::vector`, for instance, since `std::vector<bool>` is special-cased), you lose the ability to reason powerfully about abstract algorithms.
Thus, explaining the syntax and where the type variables go is explaining the thing about monads least relevant to their power and importance. It's certainly easy to showcase both the syntax and the list and maybe monads; that's part of the "monad tutorial fallacy". Gaining intuition for how to think about monads _in general_ is a lot harder and requires practice. Like, yes, list and maybe are "containers", but is `(->) t` a container? Is `IO`? How do these compose, if at all? What is this about "effect" semantics, "I thought monads were just burritos/containers"? etc. These are the hard questions, both conceptually and pedagogically. Yes, you need to know the syntax to use monads in any given programming language, but knowing what scabbard your knife fits in doesn't give you the skill of using the knife :)
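To make the `(->) t` point concrete, here's a minimal sketch, wrapped in a `Reader` newtype purely so the instances can be written out by hand (GHC already ships the instance for the bare function type):

```haskell
-- "A function out of t" is a perfectly good monad, even though nothing
-- about it looks like a container.
newtype Reader t a = Reader { runReader :: t -> a }

instance Functor (Reader t) where
  fmap f (Reader g) = Reader (f . g)

instance Applicative (Reader t) where
  pure x                = Reader (const x)
  Reader f <*> Reader g = Reader (\t -> f t (g t))

instance Monad (Reader t) where
  Reader g >>= k = Reader (\t -> runReader (k (g t)) t)

-- Two "computations" that both read the same shared environment:
example :: Reader Int Int
example = do
  x <- Reader (+ 1)   -- not a value in a box; a function of the environment
  y <- Reader (* 2)
  pure (x + y)

-- runReader example 10 == 31
```

It satisfies all the laws, and the useful intuition isn't "container" at all; the "effect" here is reading from a shared environment.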
This is an excellent summary of these techniques :) I like that every single one comes with an example implementation, with shape comments on the tensors. Thanks Stephen!
It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (https://openxla.org/xla/operation_semantics), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).
Thank you for the answer! You see, up until now I had never appreciated that a GPU does more than matmuls... And that first reference, what a find :-)
Edit: And btw, another question I'd had before was what the difference is between a tensor core and a GPU, and based on your answer, my speculative answer would be that the tensor core is the part inside the GPU that actually does the matmuls.
All of them are vectors of embedded representations of tokens. In a transformer, you want to compute the inner product between a query (the token that is doing the attending) and the key (the token that is being attended to). An inductive bias we have is that the neural network's performance will be better if this inner product depends on the relative distance between the query token's position and the key token's position. We thus encode each one with positional information, in such a way that (for RoPE at least) the inner product depends only on the distance between these tokens, and not their absolute positions in the input sentence.
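To put that RoPE property in symbols (a sketch; here $q$ and $k$ are the query and key vectors, $m$ and $n$ their positions, and $R_m$ the block-diagonal rotation RoPE applies at position $m$, each 2D block rotating by $m\theta_i$):

$$\langle R_m q,\; R_n k \rangle \;=\; q^\top R_m^\top R_n\, k \;=\; q^\top R_{n-m}\, k,$$

which depends only on $n - m$, because rotations are orthogonal ($R_m^\top = R_{-m}$) and compose additively ($R_{-m} R_n = R_{n-m}$).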
No. In the common use of the word fine-tuning, one is in the supervised learning scenario. One has an input prompt, and an output sentence. One teaches the model to say that output in response to that prompt. In the reinforcement learning scenario, one has a prompt, and a way of rewarding the model for different outputs. One can have, for instance, a reward model that assigns a reward to a given model output. One could also have a pairwise reward model, where the learner is sampled twice on that prompt (with different RNG seeds), and the reward model gives a reward based on which of the two samples is better. You could also have humans give these pointwise or pairwise rewards.
In essence, one is not telling the model "This. This is what you should output next time." but rather "I liked this reply. Have a cookie." The behaviors that you can learn in RL are more subtle, but you get a lot less information per step. That's because, in a causal language modeling objective, when I tell you "For the prompt X, you should output exactly Y[0...m)", you get a gradient for P(Y[0] | X), another one for P(Y[1] | X Y[0..1)), another for P(Y[2] | X Y[0..2)), another for P(Y[3] | X Y[0..3)), and so on. It's much more step-by-step guidance than the sequence-level reward you get in the RL framework. In RL, I'd give you a cookie for P(Y | X). What part of Y made me give you that cookie? Was there even such a part? Was it perhaps some internal representation that made everything in Y better? That's for the model to learn.
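Written out (a sketch; I'm using a plain REINFORCE-style objective here, whereas RLHF pipelines in practice typically use PPO-like variants, but the credit-assignment contrast is the same), the supervised objective factorizes into one gradient term per token:

$$\nabla_\theta \log P_\theta(Y \mid X) \;=\; \sum_{i=0}^{m-1} \nabla_\theta \log P_\theta\big(Y[i] \,\big|\, X,\, Y[0..i)\big),$$

while in the RL setting the whole sampled sequence shares a single scalar reward:

$$\nabla_\theta\, \mathbb{E}_{Y \sim P_\theta(\cdot \mid X)}\big[R(X, Y)\big] \;=\; \mathbb{E}_{Y \sim P_\theta(\cdot \mid X)}\big[R(X, Y)\; \nabla_\theta \log P_\theta(Y \mid X)\big],$$

so which parts of Y earned the cookie is left entirely to credit assignment inside the model.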
One wrinkle is that it is now common to fine-tune on previously derived RL datasets, with the tested inputs and preferred sample outputs as the training data.
"Almost nobody except X does Y." and "Z does Y, with Z != X" are consistent. Is your disagreement entirely due to the (possibly nil) distinction between "Almost nobody except X does Y" and "Almost nobody does Y -- Except X", which is how the article is worded? If so, at least one native English speaker will disagree.
The poster you're replying to seems to be putting more emphasis on how the "Wrong." parent was worded. There was no need to be that confrontational when adding a point of information ("Z does Y, with Z != X") that is most likely consistent with the thread title and with the article content.
I don't see a link between that Bloomberg article and these kidnappings happening mostly to married men cheating. Nor is it the case that "love motels" are primarily used for cheating. We have plenty of those in Argentina. As the Bloomberg article indicates, they serve a social need because of the traditional multigenerational family homes people live in.
And you'll hardly find it in any other article, due to the fact that cheaters will lie to protect their identity if they ever come across a survey or interview. Love motels being used for cheating is a well-known thing among Brazilians, yet it's still not easy to find in writing. Also, BR motels are different from the Argentinian ones (at least the ones that I saw in Buenos Aires).
The list was to give some sort of reference to the statement "they arrange dates on empty or far away places, late night and don't communicate where they're going" posted above.
I think k8sToGo's point is that she never "joined Alphabet"; Alphabet did not exist when she joined Larry and Sergey. She joined concurrently with Google's founding as a company; she probably joined when the search engine was still named BackRub. Alphabet would not be created until nearly two decades later.
* There's almost always a simple geometric intuition, and low-dimensional intuition can get you quite far even in high dimensional cases.
* You can surprisingly often get by with closing your eyes and saying "my problem is linear" three times. See: All of neural networks.
* Linear problems have practically all nice properties you could ever ask of any function.
These facts have made linear algebra by far the best bang-for-the-buck mathematics topic I've studied in my life. Close behind is asymptotic analysis.
That seems like a weird use of the word. By that notion, every convicted criminal is a victim, because the court imposed a sentence that they, presumably, do not like. This removes practically all meaning from the word "victim". In common parlance, suffering the well-understood consequences of one's own actions does not make one a victim.
> This removes practically all meaning from the word "victim".
No, it removes the personal judgement from the word "victim," where we decide whether a particular victim deserves their punishment as a preliminary to discussing their situation.
-----
edit: i.e. where sympathetic people who are killed are victims, and unsympathetic people who are killed are merely "being portrayed as victims". It's an attempt to distract from the material facts of a situation with arguments about language.
edit 2: In the spirit of tripling down, I'm also giving this discussion more credit than it deserves. This is about someone saying that the word "cancel" implies the word "victim." So here's the implied argument afaict.
1) Using the word "cancel" to refer to an imposition on your work means that you're implying you're a "victim."
2) A "victim" is someone who is undeserving of what has happened to them.
3) This person is deserving of what has happened to them.
But your original reply didn't say "he didn't call himself a victim"; it tried to argue why he was a victim. The gp to this reply was pointing out how absurd your definition of victim is. It's not making an argument about whether or not he called himself a victim; it's making an argument about what you yourself defined as a victim.
I've made that argument, and you can accept it or not. The larger point is that accusing people of playing victims is always a distraction from actual argument, and the entire thread is evidence of that. It started with irrelevance and ended nowhere.
If a court ignored the law and imposed a criminal sentence merely because it didn't like the defendant, then that would be an injustice and you would call the convicted person a victim.