It's good to think of dy/dx as (d/dx)y. It is also possible to make some sense of dy/dx as an actual quotient. Here's one very hand-wavy way of looking at it.
Let ε be something very small, and define the difference operator d so that (df)(x) = f(x + ε) - f(x). Here x stands for the identity function, so dx is just ε, and dy is the corresponding small change in y. Usually we don't want to handle the functions dx and dy by themselves, because they are so small, and their exact values depend on ε. But when we divide dy by dx we get something that is no longer ε-sized, and doesn't (in a limit sense) depend on the value of ε.
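To make this concrete, here is a small numerical sketch. The names `d`, `identity` and `ratio` are my own, and sin at x = 1 is just an arbitrary test case. It is only meant to illustrate that dy shrinks with ε while dy/dx settles down.

```python
import math

def d(f, eps):
    """Difference operator: returns the function x -> f(x + eps) - f(x)."""
    return lambda x: f(x + eps) - f(x)

def identity(x):
    # "x" viewed as a function, so that d(identity, eps) plays the role of dx
    return x

def ratio(f, x, eps):
    """(dy)(x) / (dx)(x) for y = f, using the difference operator above."""
    return d(f, eps)(x) / d(identity, eps)(x)

if __name__ == "__main__":
    for eps in (1e-1, 1e-3, 1e-5):
        dy = d(math.sin, eps)(1.0)       # epsilon-sized, changes with eps
        q = ratio(math.sin, 1.0, eps)    # approaches cos(1) as eps shrinks
        print(eps, dy, q, math.cos(1.0))
```

The dy column keeps changing as ε shrinks, while the quotient column moves toward cos(1), which is the point of dividing.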
And why think this way? When I learned the chain rule dy/dx = dy/du * du/dx I was told that even though the du's appear to cancel out, this is just abuse of notation and basically a meaningless coincidence. I understand that the teachers just wanted students to be careful; they don't want people "simplifying" dx/dy to x/y. However, I was never really satisfied with this explanation. I finally realized that by thinking about it using the difference operator above, it is not a meaningless coincidence: the du's actually do, in a sense, cancel out.
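Here is a rough numerical check of that cancellation. The functions u = x^2 and y = sin(u), the point x = 0.7 and the step size are all arbitrary choices of mine, and the sketch assumes du is not zero (otherwise dy/du is undefined).

```python
import math

eps = 1e-6   # the "very small" step; an arbitrary choice
x = 0.7      # arbitrary sample point

def u_of_x(x):
    # inner function, chosen only for illustration
    return x * x

def y_of_u(u):
    # outer function, chosen only for illustration
    return math.sin(u)

# Finite differences produced by the same step eps in x.
dx = (x + eps) - x
du = u_of_x(x + eps) - u_of_x(x)
dy = y_of_u(u_of_x(x + eps)) - y_of_u(u_of_x(x))

lhs = dy / dx                  # dy/dx computed directly
rhs = (dy / du) * (du / dx)    # the du's are ordinary numbers and cancel
exact = math.cos(x * x) * 2 * x  # derivative of sin(x^2), for comparison

if __name__ == "__main__":
    print(lhs, rhs, exact)
```

For any fixed ε the two sides agree up to floating-point rounding, because du is an actual number being divided and multiplied. The hand-wavy part is only the claim that each quotient approaches the corresponding derivative as ε shrinks.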