
From the linked docs:

> If you want to disable thinking, you can set the reasoning effort to "none".

For other APIs, you can set the thinking tokens to 0 and that also works.
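A minimal sketch of the two request shapes this thread describes. The field names (`reasoning_effort` on the OpenAI-compat endpoint, `thinkingBudget` in the native generation config) follow the linked docs; the model name and message contents here are just placeholders:

```python
# Sketch: two ways to disable thinking, per the docs quoted above.

# 1. OpenAI-compat endpoint: set reasoning effort to "none".
openai_compat_payload = {
    "model": "gemini-2.5-flash",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "reasoning_effort": "none",   # disables thinking
}

# 2. Native API: set the thinking-token budget to 0.
native_payload = {
    "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 0}  # zero thinking tokens
    },
}

print(openai_compat_payload["reasoning_effort"])
print(native_payload["generationConfig"]["thinkingConfig"]["thinkingBudget"])
```

Either payload would then be POSTed to the corresponding endpoint (or passed through the matching client library) as usual.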



Wow, thanks, I did not know that.


We added it to the docs. The downside of the OAI compat endpoint is that we have to design the API twice: once for our own API, and then again through the OAI compat layer, which sometimes makes it slower to ship certain features, especially if we diverge at all.


Thanks, yes, that makes sense.

BTW, I have noticed that, when tested outside GCP, the OpenAI-compat endpoint has significantly lower latency for most requests than the genai library. Vertex AI is better than both.

Any idea why or if that will change?
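For anyone wanting to reproduce this comparison, one simple approach is to time the same prompt through each client and compare medians. A minimal sketch (the actual client calls are placeholders you would fill in with your OpenAI-compat, genai, or Vertex AI request):

```python
import time
import statistics

def measure_latency(call, n=5):
    """Run `call()` n times and return the median wall-clock latency in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # e.g. a lambda wrapping one request to a given endpoint
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Usage sketch (clients/endpoints are assumptions, not real calls here):
# print(measure_latency(lambda: openai_client.chat.completions.create(...)))
# print(measure_latency(lambda: genai_client.models.generate_content(...)))
```

Using the median rather than the mean keeps one slow cold-start request from skewing the comparison.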



