I still don't really understand what Vertex AI is.
If you can ignore Vertex most of the complaints here are solved - the non-Vertex APIs have easy to use API keys, a great debugging tool (https://aistudio.google.com), a well documented HTTP API and good client libraries too.
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
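A minimal sketch of that switch with the openai Python package; only the API key and base_url change (the base URL is from the linked docs, the model name is just an example):

    from openai import OpenAI

    client = OpenAI(
        api_key="GEMINI_API_KEY",  # a Gemini API key, not an OpenAI one
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )

    response = client.chat.completions.create(
        model="gemini-2.0-flash",  # example model name
        messages=[{"role": "user", "content": "Explain Vertex AI in one sentence."}],
    )
    print(response.choices[0].message.content)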
It's a way for you to have your AI billing under the same invoice as all of your other cloud purchases. If you're a startup this is a dumb feature; if you work at a $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers
> $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers
What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers?
Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.
If they replaced the leetcode interviews with department-wide meetings and email-chain take-home tasks, I could make hay and really shine with a series of "no, nothing from this side"s, FYIs, and back-burners.
Google sounds like a fun place to work, run it up the flagpole and see if you can move the needle before the next hard stop for me.
It's also useful at a startup: I just start using it with zero effort.
For an external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops to get it set up and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.
That `vertexai=True` does the trick - you can use the same code without this option, and you will not be using "Vertex".
Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
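For reference, a minimal sketch of that toggle with the genai SDK (project and location values here are placeholders):

    from google import genai

    # Non-Vertex: the Gemini API with a plain API key.
    client = genai.Client(api_key="GEMINI_API_KEY")

    # Vertex: same SDK and the same calls, but routed through Vertex AI.
    # Auth comes from ADC (e.g. a service account) instead of an API key.
    client = genai.Client(vertexai=True, project="my-project", location="us-central1")

    response = client.models.generate_content(
        model="gemini-2.0-flash", contents="Hello"
    )
    print(response.text)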
For me, the main aspect of "using Vertex", as in this example, is the fact that the Startup AI Cloud Credits ($350K) are only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days to me, with Google now pushing their Enterprise Grade ML Ops platform, but all in all I am grateful for their generosity and for the great Gemini model.
I don't think a service account vs an API key would improve performance in any meaningful way. I doubt the AI endpoint is authenticating the API key against a central database on every request; it will most certainly be cached against a service key in the same AZ, or whatever GCP calls it.
A service account file and an API key have similar security risks if provided the way you are using them. Google recommends using ADC, and it's actually an org policy recommendation to disable SA files.
ADC (Application Default Credentials) is a specification for finding credentials (1. look here, 2. look there, etc.), not an alternative kind of credential. Using ADC one can e.g. find an SA file.
As a replacement for SA files one can have e.g. user accounts using SA impersonation, external identity providers, or run on a GCP VM or GKE and use the built-in identities.
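In Python that lookup is a single call; a minimal sketch using the google-auth package:

    # ADC in practice: google.auth.default() walks the documented search order -
    # the GOOGLE_APPLICATION_CREDENTIALS env var, gcloud's local ADC file, and
    # finally the built-in identity of a GCP VM / GKE workload.
    import google.auth

    credentials, project_id = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    print(type(credentials), project_id)  # concrete type depends on what was found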
Google Cloud Console's billing view for Vertex is so poor. I'm trying to figure out how much I spent on which models and I still cannot for the life of me figure it out. I'm assuming the only way to do it is to use the Gemini billing assistant chatbot, but that requires me to turn on another API permission.
I still don't understand the distinction between Gemini and Vertex AI apis. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem but it's only created more confusion, for me at least.
I couldn't have said it better. My billing friends are working to address some of these concerns along with the Vertex team. We are planning to address this issue; please stay tuned, and we will come back to this thread to announce when we can.
In fact, I would love to learn more - if you are interested, please DM me (@chrischo_pm on X).
100% this. We actually use OpenRouter (and pay their surcharge) with Gemini 2.5 Pro just because we can actually control spend via spend limits on keys (A++ feature) and prepaid credit.
Gemini’s is no better. Their data can be up to 24h stale and you can’t set hard caps on API keys. The best you can do is email notification billing alerts, which they acknowledge can be hours late.
Only problem is that the genai API at https://ai.google.dev is far less reliable and can be problematic for production use cases. Right around the time Gemini 2.0 launched, it was down for days on end without any communication. They are putting a lot of effort into improving it, but it's much less reliable than OpenAI, which matters for production. They can also reject your request based on overall system load (not your individual limits), which is very unpredictable. They advertise 2,000 requests per minute; when I tried several weeks ago, I couldn't even get 500 per minute.
Please ping me if you run into any production issues - I will raise them right away with the team. We have massive at-scale products operating on AI Studio, so we are set up to ensure stability.
OpenAI compatible API is missing important parameters, for example I don't think there is a way to disable flash 2 thinking with it.
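For what it's worth, the native genai SDK does expose this; a minimal sketch, assuming a model where the thinking_budget knob applies:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="GEMINI_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # example model name
        contents="Quick answer: what is 12 * 7?",
        config=types.GenerateContentConfig(
            # thinking_budget=0 disables thinking on models that allow it
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        ),
    )
    print(response.text)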
Vertex AI is for gRPC, service auth, and region control (amongst other things): ensuring data remains in a specific region, allowing you to auth with the instance service account, and slightly better latency and TTFT.
I find Google's service auth SO hard to figure out. I've been meaning to sort out deploying to Cloud Run via a service account for several years now but it just doesn't fit in my brain well enough for me to make the switch.
simonw, 'Google's service auth SO hard to figure out' - absolutely hear you. We're taking this feedback on auth complexity seriously. We have a new Vertex express mode in Preview (https://cloud.google.com/vertex-ai/generative-ai/docs/start/... , not ready for primetime yet!) that you can sign up for to get a free tier and an API key right away.
We are improving the experience, again if you would like to give feedback, please DM me on @chrischo_pm on X.
If you're on Cloud Run it should just work automatically.
For deploying, on GitHub I just use a dedicated service account for CI/CD and put the JSON payload in an environment secret, like an API key. The only extra thing is that you need to copy it to the filesystem for some things to work, usually to a file named google_application_credentials.json.
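The copy step is tiny; a sketch assuming the JSON payload is exposed as a hypothetical GCP_SA_KEY environment secret:

    # CI helper: materialize the service-account JSON from an env secret so
    # tools that expect a credentials *file* can find it via the standard env var.
    import os
    from pathlib import Path

    path = Path("google_application_credentials.json")
    path.write_text(os.environ["GCP_SA_KEY"])  # hypothetical secret name
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path.resolve())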
If you use Cloud Build you shouldn't need to do anything.
You should consider setting up Workload Identity Federation and authenticating to Google Cloud using your GitHub runner's OIDC token. Google Cloud will "trust" the token and allow you to impersonate service accounts. No static keys!
Yes it does. We deploy Firebase and a bunch of other GCP things from GitHub Actions and there are zero API keys or JSON credentials anywhere.
Everything is service accounts and workload identity federation, with restrictions such as only letting main branch in specific repo to use it (so no problem with unreviewed PRs getting production access).
Edit: if you have a specific error or issue where this doesn't work for you, and can share the code, I can have a look.
No thank you, there is zero benefit to migrating and no risk in using credentials the way I do.
How do you sign a Firebase custom auth token with workload identity federation? How about a pre-signed storage URL? Off the top of my head, I think those were two things that didn't work.
First, regarding "zero benefit" and "no risk". I disagree. The risk and benefit might be low, and not worth the change for you. But it is absolutely not zero.
You have a JSON key file which you can't know how many people have. The person who created the key, downloaded it and then stored it as a GitHub secret - did they download it to /dev/shm? Did some npm/brew install script steal it from their downloads folder? Any of the GitHub repo owners can get hold of it. Depending on whether you use GitHub environments/deployments and have set it up properly, so can anyone with write access to the repo. Do you pin all your dependencies, reusable workflows etc., or can a compromise of someone else's repo steal your secrets?
With workload identity auth, there is no key. Each access obtains a short-lived token. Only workflows on the main branch can get it. Every run will have audit logs, and so will every action taken by that token. Risk of compromise is much lower, but even more importantly, if compromised I'll be able to know exactly when and how, and what malicious actions were taken.
Maybe this is paranoid to you and not worth it. That's fine. But it's not "no risk", and it is worth to me to protect personal data of our users.
---
As for your question, the first step is just to run https://github.com/google-github-actions/auth with an identity provider configured in your GCP project, restricted to your GitHub repo or org.
This will create application default credentials that most GCP tools and libraries will just work with, as if you were running things locally after "gcloud auth login".
You could post on Reddit asking for help and someone is likely to provide answers, an explanation, probably even some code or bash commands to illustrate.
And even if you don't ask, there are many examples. But I feel ya. The right example to fit your need is hard to find.
yeah bro just one more principal bro authenticate each one with SAML or OIDC or Google Signin bro set the permissions for each one make sure your service account has permissions aiplatform.models.get and aiplatform.models.list bro or make a custom role and attach the role to the principal to parcel the permission
It's not complicated in the context of huge enterprise applications, but for most people trying to use Google's LLMs, it's much more confusing than using an API key. The parent commenter is probably using an AWS secret key.
And FWIW this is basically what google encourages you to do with firebase (with the admin service account credential as a secret key).
We added it to the docs. The downside of the OAI-compat endpoint is that we have to design the API twice: once for our API, then once through the OAI compat layer, which sometimes makes it slower to ship certain features, especially if we diverge at all.
BTW, I have noticed that when tested outside GCP, the OpenAI-compat endpoint has significantly lower latency for most requests (vs using the genai library). Vertex AI is better than both.
When I used the OpenAI-compatible stuff, my API calls just didn't work at all. I switched back to direct HTTP calls, which seems to be the only thing that works…
I got Claude to write me an auth layer using only Python's http.client and cryptography. One shot, no problem; now I can get a token from the service key any time, I just have to track expiration. Annoying that they don't follow the industry standard though.
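For the curious, the standard OAuth2 JWT-bearer exchange is small enough to sketch with just http.client and cryptography (this is the documented flow, not necessarily the parent's exact code):

    # Trade a service-account JSON key for a short-lived access token,
    # stdlib + cryptography only. Sketch; not production-hardened.
    import base64, json, time, http.client, urllib.parse
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def b64url(data: bytes) -> bytes:
        return base64.urlsafe_b64encode(data).rstrip(b"=")

    def get_access_token(key_path: str, scope: str) -> dict:
        key = json.load(open(key_path))
        now = int(time.time())
        header = b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
        claims = b64url(json.dumps({
            "iss": key["client_email"],
            "scope": scope,
            "aud": key["token_uri"],  # https://oauth2.googleapis.com/token
            "iat": now,
            "exp": now + 3600,        # track this: the token expires
        }).encode())
        signing_input = header + b"." + claims
        private_key = serialization.load_pem_private_key(
            key["private_key"].encode(), password=None)
        signature = private_key.sign(
            signing_input, padding.PKCS1v15(), hashes.SHA256())
        assertion = (signing_input + b"." + b64url(signature)).decode()

        conn = http.client.HTTPSConnection("oauth2.googleapis.com")
        conn.request("POST", "/token", urllib.parse.urlencode({
            "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
            "assertion": assertion,
        }), {"Content-Type": "application/x-www-form-urlencoded"})
        return json.loads(conn.getresponse().read())  # {"access_token": ..., "expires_in": ...}

    # token = get_access_token("service-account.json",
    #                          "https://www.googleapis.com/auth/cloud-platform")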
There are so many variations out there that the AI gets majorly confused. As a matter of fact, the Google OAuth part is the one thing that Gemini 2.5 Pro can't code.
Maybe you should just read the docs and use the examples there. I have used all kinds of GCP services for many years and auth is not remotely complicated imo.
simonw, good points. The Vertex vs. non-Vertex Gemini API (via AI Studio at aistudio.google.com) could use more clarity.
For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.
However, if you anticipate your use case growing and scaling 10-1000x in production, Vertex would be a worthwhile investment.
I think you are talking about generativeai vs. vertexai vs. genai sdk.
And you are watching us evolve over time to do better.
A couple of clarifications:
1. Going forward we only recommend using the genai SDK.
2. Subtle API differences - this is a bit harder to articulate, but we are working to improve this. Please DM me at @chrischo_pm if you would like to discuss further :)
No idea what any of those SDK names mean. But sure enough, searching will bring up all three of them for different combinations of search terms, and none of them will point to "going forward we only recommend using <a random name that is indistinguishable from other names>".
Oh, and some of these SDKs (and docs) do have a way to use this functionality without the SDKs, but others don't. Because there are only 4 languages in the world, and everyone should be happy using them.
I think you can strongly influence which SDK your customers use by keeping the Python, Typescript, and Curl examples in the documentation up to date and uniformly use what you consider the ‘best’ SDK in the examples.
Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.
One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.
EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.
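On the grounding point: the genai SDK does wire Google Search in as a tool; a minimal sketch, assuming the current types.Tool / types.GoogleSearch names:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="GEMINI_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # example model name
        contents="What changed in the latest Gemini API release?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())]
        ),
    )
    print(response.text)  # grounding metadata rides along on response.candidates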
Indeed. Though the billing dashboard feels like an over-engineered April Fools' joke compared to Anthropic's or OpenAI's. And it takes too long to update with usage. I understand they tacked it onto GCP, but if they're making those devs work 60 hours a week, can we get a nicer, real-time dashboard out of it at least?
Wait until you see how to check Bedrock usage in AWS.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
Except that the OpenAI-compatible endpoint isn't actually compatible. It doesn't support string enum values for function calls and throws a confusing error. Vertex at least has better error messages. My solution: just use text completions, emulate the tool call support client side, validate the responses against the schema, and retry on failure. It rarely has to retry and always works the 2nd time, even without feedback.
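A rough sketch of that client-side emulation, using the jsonschema package for validation (the schema and names here are illustrative, not the parent's actual code):

    # Emulate tool calling: ask for JSON matching a schema, validate, retry.
    import json
    import jsonschema  # pip install jsonschema

    TOOL_SCHEMA = {
        "type": "object",
        "properties": {
            "tool": {"type": "string", "enum": ["search", "calculator"]},
            "arguments": {"type": "object"},
        },
        "required": ["tool", "arguments"],
    }

    def call_with_tools(complete, prompt, max_retries=2):
        """`complete` is any plain text-completion callable (illustrative)."""
        instructions = (
            f"{prompt}\n\nRespond with ONLY a JSON object matching this schema:\n"
            f"{json.dumps(TOOL_SCHEMA)}"
        )
        for _ in range(max_retries + 1):
            raw = complete(instructions)
            try:
                parsed = json.loads(raw)
                jsonschema.validate(parsed, TOOL_SCHEMA)
                return parsed
            except (json.JSONDecodeError, jsonschema.ValidationError):
                continue  # retry; per the parent, the 2nd attempt almost always works
        raise RuntimeError("model never produced schema-valid output")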
Vertex AI is essentially equivalent to Azure OpenAI - enterprise-ready, with HIPAA/SOC2 compliance and data-privacy guarantees.
FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.
This is correct, "When you activate a Cloud Billing account, all use of Gemini API and Google AI Studio is a "Paid Service" with respect to how Google Uses Your Data, even when using Services that are offered free of charge, such as Google AI Studio and unpaid quota of Gemini API."
There are a few conditions that take precedence over having-billing-enabled and will cause AI Studio to train on your data. This is why I personally use Vertex
I actually use the non-Vertex HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straightforward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
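The core of that approach is only a few lines; a simplified sketch (the real implementation is in the linked repo):

    # Stream a Gemini response and print text as it arrives, without
    # buffering the whole JSON array the endpoint returns.
    import json
    import urllib.request
    import ijson

    url = (
        "https://generativelanguage.googleapis.com/v1beta/models/"
        "gemini-2.0-flash:streamGenerateContent?key=GEMINI_API_KEY"
    )
    body = json.dumps({"contents": [{"parts": [{"text": "Tell me a joke"}]}]}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})

    with urllib.request.urlopen(req) as response:
        # ijson yields each text part as soon as it has been parsed.
        for text in ijson.items(response, "item.candidates.item.content.parts.item.text"):
            print(text, end="", flush=True)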
Anthropic have the same OpenAI-compatible endpoint feature now as well: https://docs.anthropic.com/en/api/openai-sdk