We added it to the docs. The downside of the OpenAI-compat endpoint is that we have to design the API twice: once for our own API, and then again through the OpenAI-compat layer. That sometimes slows down shipping certain features, especially where the two diverge at all.
BTW, I have noticed that when tested outside GCP, the OpenAI-compat endpoint has significantly lower latency for most requests than the genai library. Vertex AI is better than both.
> If you want to disable thinking, you can set the reasoning effort to "none".
For other APIs, you can set the thinking token budget to 0, which also works.
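A minimal sketch of the two ways to disable thinking described above. To keep it self-contained (no API key or network call), the requests are shown as plain payload dicts rather than live client calls; the model name is a placeholder, and the parameter names (`reasoning_effort`, `thinking_budget`) should be double-checked against the docs for your endpoint version.

```python
# Two ways to disable thinking, shown as request payloads only.
# No network call is made; model name and field names are assumptions
# to illustrate the shape of each request.

# 1) OpenAI-compat endpoint: set reasoning effort to "none".
openai_compat_request = {
    "model": "gemini-2.5-flash",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "reasoning_effort": "none",  # disables thinking via the compat layer
}

# 2) Native API (e.g. the genai library): set the thinking
#    token budget to 0, which also turns thinking off.
native_request = {
    "model": "gemini-2.5-flash",
    "contents": "Hello",
    "config": {"thinking_config": {"thinking_budget": 0}},
}

print(openai_compat_request["reasoning_effort"])
print(native_request["config"]["thinking_config"]["thinking_budget"])
```

Either form should produce a response with no thinking tokens; which one applies depends on whether you go through the compat layer or the native API.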