I still don't really understand what Vertex AI is.
If you can ignore Vertex most of the complaints here are solved - the non-Vertex APIs have easy to use API keys, a great debugging tool (https://aistudio.google.com), a well documented HTTP API and good client libraries too.
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
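A minimal sketch of that switch with the openai Python package; only the API key and base_url change (the base URL is from the linked docs, the model name is just an example):

    from openai import OpenAI

    client = OpenAI(
        api_key="GEMINI_API_KEY",  # a Gemini API key, not an OpenAI one
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )

    response = client.chat.completions.create(
        model="gemini-2.0-flash",  # example model name
        messages=[{"role": "user", "content": "Explain Vertex AI in one sentence."}],
    )
    print(response.choices[0].message.content)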
It's a way for you to have your AI billing under the same invoice as all of your other cloud purchases. If you're a startup this is a dumb feature; if you work at a $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers
> $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers
What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers?
Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.
If they replaced the leetcode interviews with department-wide meetings and email-chain take-home tasks, I could make hay and really shine with a series of "no, nothing from this side"s, FYIs, and back-burners.
Google sounds like a fun place to work, run it up the flagpole and see if you can move the needle before the next hard stop for me.
It's also useful at a startup: I just start using it with zero effort.
For an external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops to get it set up and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.
That `vertexai=True` does the trick - you can use the same code without this option, and you will not be using "Vertex".
Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
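For reference, a minimal sketch of that toggle with the genai SDK (project and location values here are placeholders):

    from google import genai

    # Non-Vertex: the Gemini API with a plain API key.
    client = genai.Client(api_key="GEMINI_API_KEY")

    # Vertex: same SDK and the same calls, but routed through Vertex AI.
    # Auth comes from ADC (e.g. a service account) instead of an API key.
    client = genai.Client(vertexai=True, project="my-project", location="us-central1")

    response = client.models.generate_content(
        model="gemini-2.0-flash", contents="Hello"
    )
    print(response.text)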
For me, the main aspect of "using Vertex", as in this example, is the fact that the Startup AI Cloud Credits ($350K) are only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days to me, with Google now pushing their Enterprise Grade ML Ops platform, but all in all I am grateful for their generosity and for the great Gemini model.
I don't think a service account vs an API key would improve performance in any meaningful way. I doubt the AI endpoint is authenticating the API key against a central database on every request; it will most certainly be cached against a service key in the same AZ, or whatever GCP calls it.
A service account file and an API key have similar security risks if provided the way you are using them. Google recommends using ADC, and it's actually an org policy recommendation to disable SA files.
ADC (Application Default Credentials) is a specification for finding credentials (1. look here, 2. look there, etc.), not an alternative kind of credential. Using ADC one can e.g. find an SA file.
As a replacement for SA files one can have e.g. user accounts using SA impersonation, external identity providers, or run on a GCP VM or GKE and use the built-in identities.
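In Python that lookup is a single call; a minimal sketch using the google-auth package:

    # ADC in practice: google.auth.default() walks the documented search order -
    # the GOOGLE_APPLICATION_CREDENTIALS env var, gcloud's local ADC file, and
    # finally the built-in identity of a GCP VM / GKE workload.
    import google.auth

    credentials, project_id = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    print(type(credentials), project_id)  # concrete type depends on what was found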
Google Cloud Console's billing view for Vertex is so poor. I'm trying to figure out how much I spent on which models and I still cannot for the life of me figure it out. I'm assuming the only way to do it is to use the Gemini billing assistant chatbot, but that requires me to turn on another API permission.
I still don't understand the distinction between Gemini and Vertex AI apis. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem but it's only created more confusion, for me at least.
I couldn't have said it better. My billing friends are working to address some of these concerns along with the Vertex team. We are planning to address this issue; please stay tuned, and we will come back to this thread to announce when we can.
In fact, I would love to learn more - if you are interested, please DM me (@chrischo_pm on X).
100% this. We actually use OpenRouter (and pay their surcharge) with Gemini 2.5 Pro just because we can actually control spend via spend limits on keys (A++ feature) and prepaid credit.
Gemini’s is no better. Their data can be up to 24h stale and you can’t set hard caps on API keys. The best you can do is email notification billing alerts, which they acknowledge can be hours late.
Only problem is that the genai API at https://ai.google.dev is far less reliable and can be problematic for production use cases. Right around the time Gemini 2.0 launched, it was down for days on end without any communication. They are putting a lot of effort into improving it, but it's much less reliable than OpenAI, which matters for production. They can also reject your request based on overall system load (not your individual limits), which is very unpredictable. They advertise 2,000 requests per minute; when I tried several weeks ago, I couldn't even get 500 per minute.
Please ping me if you run into any production issues - I will raise them right away with the team. We have massive at-scale products operating on AI Studio, so we are set up to ensure stability.
OpenAI compatible API is missing important parameters, for example I don't think there is a way to disable flash 2 thinking with it.
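For what it's worth, the native genai SDK does expose this; a minimal sketch, assuming a model where the thinking_budget knob applies:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="GEMINI_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # example model name
        contents="Quick answer: what is 12 * 7?",
        config=types.GenerateContentConfig(
            # thinking_budget=0 disables thinking on models that allow it
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        ),
    )
    print(response.text)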
Vertex AI is for gRPC, service auth, and region control (amongst other things): ensuring data remains in a specific region, allowing you to auth with the instance service account, and slightly better latency and TTFT.
I find Google's service auth SO hard to figure out. I've been meaning to sort out deploying to Cloud Run via a service account for several years now but it just doesn't fit in my brain well enough for me to make the switch.
simonw, 'Google's service auth SO hard to figure out' - absolutely hear you. We're taking this feedback on auth complexity seriously. We have a new Vertex express mode in Preview (https://cloud.google.com/vertex-ai/generative-ai/docs/start/... , not ready for primetime yet!) that you can sign up for to get a free tier and an API key right away.
We are improving the experience, again if you would like to give feedback, please DM me on @chrischo_pm on X.
If you're on Cloud Run it should just work automatically.
For deploying, on GitHub I just use a dedicated service account for CI/CD and put the JSON payload in an environment secret, like an API key. The only extra thing is that you need to copy it to the filesystem for some things to work, usually to a file named google_application_credentials.json.
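The copy step is tiny; a sketch assuming the JSON payload is exposed as a hypothetical GCP_SA_KEY environment secret:

    # CI helper: materialize the service-account JSON from an env secret so
    # tools that expect a credentials *file* can find it via the standard env var.
    import os
    from pathlib import Path

    path = Path("google_application_credentials.json")
    path.write_text(os.environ["GCP_SA_KEY"])  # hypothetical secret name
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path.resolve())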
If you use Cloud Build you shouldn't need to do anything.
You should consider setting up Workload Identity Federation and authenticating to Google Cloud using your GitHub runner's OIDC token. Google Cloud will "trust" the token and allow you to impersonate service accounts. No static keys!
Yes it does. We deploy Firebase and a bunch of other GCP things from GitHub Actions and there are zero API keys or JSON credentials anywhere.
Everything is service accounts and workload identity federation, with restrictions such as only letting main branch in specific repo to use it (so no problem with unreviewed PRs getting production access).
Edit: if you have a specific error or issue where this doesn't work for you, and can share the code, I can have a look.
No thank you, there is zero benefit to migrating and no risk in using credentials the way I do.
How do you sign a Firebase custom auth token with workload identity federation? How about a pre-signed storage URL? Off the top of my head, I think those were two things that didn't work.
First, regarding "zero benefit" and "no risk". I disagree. The risk and benefit might be low, and not worth the change for you. But it is absolutely not zero.
You have a JSON key file which you can't know how many people have. The person who created the key, downloaded it and then stored it as a GitHub secret - did they download it to /dev/shm? Did some npm/brew install script steal it from their downloads folder? Any of the GitHub repo owners can get hold of it. Depending on whether you use GitHub environments/deployments and have set it up properly, so can anyone with write access to the repo. Do you pin all your dependencies, reusable workflows etc., or can a compromise of someone else's repo steal your secrets?
With workload identity auth, there is no key. Each access obtains a short-lived token. Only workflows on the main branch can get it. Every run will have audit logs, and so will every action taken by that token. Risk of compromise is much lower, but even more importantly, if compromised I'll be able to know exactly when and how, and what malicious actions were taken.
Maybe this is paranoid to you and not worth it. That's fine. But it's not "no risk", and it is worth to me to protect personal data of our users.
---
As for your question, the first step is just to run https://github.com/google-github-actions/auth with an identity provider configured in your GCP project, restricted to your GitHub repo or org.
This will create application default credentials that most GCP tools and libraries will just work with, as if you were running things locally after "gcloud auth login".
You could post on Reddit asking for help and someone is likely to provide answers, an explanation, probably even some code or bash commands to illustrate.
And even if you don't ask, there are many examples. But I feel ya. The right example to fit your need is hard to find.
yeah bro just one more principal bro authenticate each one with SAML or OIDC or Google Signin bro set the permissions for each one make sure your service account has permissions aiplatform.models.get and aiplatform.models.list bro or make a custom role and attach the role to the principal to parcel the permission
It's not complicated in the context of huge enterprise applications, but for most people trying to use Google's LLMs, it's much more confusing than using an API key. The parent commenter is probably using an AWS secret key.
And FWIW this is basically what google encourages you to do with firebase (with the admin service account credential as a secret key).
We added it to the docs. The downside of the OAI-compat endpoint is that we have to design the API twice: once for our API, then once through the OAI compat layer, which sometimes makes it slower to ship certain features, especially if we diverge at all.
BTW, I have noticed that when tested outside GCP, the OpenAI-compat endpoint has significantly lower latency for most requests (vs using the genai library). Vertex AI is better than both.
When I used the OpenAI-compatible stuff, my API calls just didn't work at all. I switched back to direct HTTP calls, which seems to be the only thing that works…
I got Claude to write me an auth layer using only Python's http.client and cryptography. One shot, no problem; now I can get a token from the service key any time, I just have to track expiration. Annoying that they don't follow the industry standard though.
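For the curious, the standard OAuth2 JWT-bearer exchange is small enough to sketch with just http.client and cryptography (this is the documented flow, not necessarily the parent's exact code):

    # Trade a service-account JSON key for a short-lived access token,
    # stdlib + cryptography only. Sketch; not production-hardened.
    import base64, json, time, http.client, urllib.parse
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def b64url(data: bytes) -> bytes:
        return base64.urlsafe_b64encode(data).rstrip(b"=")

    def get_access_token(key_path: str, scope: str) -> dict:
        key = json.load(open(key_path))
        now = int(time.time())
        header = b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
        claims = b64url(json.dumps({
            "iss": key["client_email"],
            "scope": scope,
            "aud": key["token_uri"],  # https://oauth2.googleapis.com/token
            "iat": now,
            "exp": now + 3600,        # track this: the token expires
        }).encode())
        signing_input = header + b"." + claims
        private_key = serialization.load_pem_private_key(
            key["private_key"].encode(), password=None)
        signature = private_key.sign(
            signing_input, padding.PKCS1v15(), hashes.SHA256())
        assertion = (signing_input + b"." + b64url(signature)).decode()

        conn = http.client.HTTPSConnection("oauth2.googleapis.com")
        conn.request("POST", "/token", urllib.parse.urlencode({
            "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
            "assertion": assertion,
        }), {"Content-Type": "application/x-www-form-urlencoded"})
        return json.loads(conn.getresponse().read())  # {"access_token": ..., "expires_in": ...}

    # token = get_access_token("service-account.json",
    #                          "https://www.googleapis.com/auth/cloud-platform")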
There are so many variations out there that the AI gets majorly confused. As a matter of fact, the Google OAuth part is the one thing that Gemini 2.5 Pro can't code.
Maybe you should just read the docs and use the examples there. I have used all kinds of GCP services for many years and auth is not remotely complicated imo.
simonw, good points. The Vertex vs. non-Vertex Gemini API (via AI Studio at aistudio.google.com) could use more clarity.
For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.
However, if you anticipate your use case growing and scaling 10-1000x in production, Vertex would be a worthwhile investment.
I think you are talking about generativeai vs. vertexai vs. genai sdk.
And you are watching us evolve over time to do better.
A couple of clarifications:
1. Going forward we only recommend using the genai SDK.
2. Subtle API differences - this is a bit harder to articulate, but we are working to improve this. Please DM me at @chrischo_pm if you would like to discuss further :)
No idea what any of those SDK names mean. But sure enough, searching will bring up all three of them for different combinations of search terms, and none of them will point to "going forward we only recommend using <a random name that is indistinguishable from other names>".
Oh, and some of these SDKs (and docs) do have a way to use this functionality without the SDKs, but others don't. Because there are only 4 languages in the world, and everyone should be happy using them.
I think you can strongly influence which SDK your customers use by keeping the Python, Typescript, and Curl examples in the documentation up to date and uniformly use what you consider the ‘best’ SDK in the examples.
Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.
One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.
EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.
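On the grounding point: the genai SDK does wire Google Search in as a tool; a minimal sketch, assuming the current types.Tool / types.GoogleSearch names:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="GEMINI_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # example model name
        contents="What changed in the latest Gemini API release?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())]
        ),
    )
    print(response.text)  # grounding metadata rides along on response.candidates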
Indeed. Though the billing dashboard feels like an over-engineered April Fools' joke compared to Anthropic's or OpenAI's. And it takes too long to update with usage. I understand they tacked it onto GCP, but if they're making those devs work 60 hours a week, can we get a nicer, real-time dashboard out of it at least?
Wait until you see how to check Bedrock usage in AWS.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
Except that the OpenAI-compatible endpoint isn't actually compatible. It doesn't support string enum values for function calls and throws a confusing error. Vertex at least has better error messages. My solution: just use text completions, emulate the tool call support client side, validate the responses against the schema, and retry on failure. It rarely has to retry and always works the 2nd time, even without feedback.
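A rough sketch of that client-side emulation, using the jsonschema package for validation (the schema and names here are illustrative, not the parent's actual code):

    # Emulate tool calling: ask for JSON matching a schema, validate, retry.
    import json
    import jsonschema  # pip install jsonschema

    TOOL_SCHEMA = {
        "type": "object",
        "properties": {
            "tool": {"type": "string", "enum": ["search", "calculator"]},
            "arguments": {"type": "object"},
        },
        "required": ["tool", "arguments"],
    }

    def call_with_tools(complete, prompt, max_retries=2):
        """`complete` is any plain text-completion callable (illustrative)."""
        instructions = (
            f"{prompt}\n\nRespond with ONLY a JSON object matching this schema:\n"
            f"{json.dumps(TOOL_SCHEMA)}"
        )
        for _ in range(max_retries + 1):
            raw = complete(instructions)
            try:
                parsed = json.loads(raw)
                jsonschema.validate(parsed, TOOL_SCHEMA)
                return parsed
            except (json.JSONDecodeError, jsonschema.ValidationError):
                continue  # retry; per the parent, the 2nd attempt almost always works
        raise RuntimeError("model never produced schema-valid output")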
Vertex AI is essentially equivalent to Azure OpenAI - enterprise-ready, with HIPAA/SOC2 compliance and data-privacy guarantees.
FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.
This is correct, "When you activate a Cloud Billing account, all use of Gemini API and Google AI Studio is a "Paid Service" with respect to how Google Uses Your Data, even when using Services that are offered free of charge, such as Google AI Studio and unpaid quota of Gemini API."
There are a few conditions that take precedence over having-billing-enabled and will cause AI Studio to train on your data. This is why I personally use Vertex
I actually use the non-Vertex HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straightforward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
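The core of that approach is only a few lines; a simplified sketch (the real implementation is in the linked repo):

    # Stream a Gemini response and print text as it arrives, without
    # buffering the whole JSON array the endpoint returns.
    import json
    import urllib.request
    import ijson

    url = (
        "https://generativelanguage.googleapis.com/v1beta/models/"
        "gemini-2.0-flash:streamGenerateContent?key=GEMINI_API_KEY"
    )
    body = json.dumps({"contents": [{"parts": [{"text": "Tell me a joke"}]}]}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})

    with urllib.request.urlopen(req) as response:
        # ijson yields each text part as soon as it has been parsed.
        for text in ijson.items(response, "item.candidates.item.content.parts.item.text"):
            print(text, end="", flush=True)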
Anthropic have the same OpenAI-compatible endpoint feature now as well: https://docs.anthropic.com/en/api/openai-sdk