To be honest I've always found the model of a frontend being able to write data into a database highly suspect, even with security rules.
Unlike a backend, where the rules for validation and security are visible and part of the specification, Firebase's security rules are something one can easily forget: they're a separate process, and they have to be reevaluated as part of every new feature developed.
Yeah, I've never understood how this concept can work for most applications. In everything I build I always need to do something with the input before writing it to a database. Just security rules are not enough.
What kind of apps are people building where you don't need backend logic?
I think I missed where writing to the database precludes backend logic. Databases have triggers and integrity rules, but beyond that, why can't logic execute after data is written to a database?
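To make that concrete, here's a minimal sketch of "logic after the write" using a Firestore trigger (the `comments` collection and the `containsProfanity` check are made-up placeholders):

```typescript
// Sketch: moderation logic that runs *after* a client writes to Firestore.
// The "comments" collection and containsProfanity() are hypothetical.
import { onDocumentCreated } from "firebase-functions/v2/firestore";

export const moderateComment = onDocumentCreated(
  "comments/{commentId}",
  async (event) => {
    const snap = event.data;
    if (!snap) return;

    const text: string = snap.get("text") ?? "";
    if (containsProfanity(text)) {
      // Quarantine the document instead of serving it.
      await snap.ref.update({ hidden: true, flaggedAt: Date.now() });
    }
  }
);

// Stand-in for a real moderation check.
function containsProfanity(text: string): boolean {
  return /badword/i.test(text);
}
```

The catch, as the reply below points out, is the window between the write and the trigger firing.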
Because once it is written to the database, it can be output somewhere before you execute your logic, e.g. explicit language, child porn, etc. You generally want to check for that BEFORE you write the data.
You're saying it's impossible to have public write access to a table without also providing public read access?
"it can be output somewhere before you execute your logic" is a design choice that is orthogonal from whether you execute your logic before or after input into the database.
First of all, most database records couldn't fit child porn, unless it was somehow encoded across thousands of records, in which case you couldn't realize it was child porn until after you've stored 99% of it.
Sure though, by putting "child porn" in a sentence, you can make anything seem bad. Tell me this, would you rather your application middleware was in the "copying child porn" business? ;-)
Actually, the more I think about it, the crazier this seems. You're going to store all the "child porn" you receive in RAM until you've validated that it is child porn?
I don’t get your tone or why you seem shocked that binary data can be stored in a database. Postgres and MySQL both have column sizes for binary data that can hold gigabytes.
Second, you generally need to hold the entire image in RAM to create the perceptual hash needed to check that the image is/isn’t child porn.
> I don’t get your tone or why you seem shocked that binary data can be stored in a database. Postgres and MySQL both have column sizes for binary data that can hold gigabytes.
My tone is shocked, because what you're describing seems totally removed from any system I've seen, and I've implemented a ton of systems. For performance reasons, you want to stream large uploads to storage (web servers, like nginx, are typically configured to do this even before the request is sent to any application logic). You invariably want to store UGC data that conforms to your schema, even if you're going to reject it for content. There's a whole process for contesting, reviewing and reversing decisions that requires the data be in persistent storage.
I think you misunderstood what I said. Yes, Postgres, MySQL and a variety of other databases have column sizes for binary data that can hold gigabytes. What I wouldn't agree with is that most database records can hold gigabytes, binary or otherwise. Heck, most database records aren't populated from UGC sources, let alone UGC sources where child porn is a risk.
But okay, let's assume, for argument's sake, most database records are happily accepting 4TB large objects, and you're accepting up to 4TB uploads (where Postgres's large objects max out). Do all your web & application servers have 4TB of memory? What if you're processing more than one request at once? Do you have N*4TB of memory?
At least all the systems I've implemented that receive data from users enforce limits on request sizes, and with the exception of file uploads, which are typically directly streamed to the filesystem before processing, those limits tend to be quite small, often less than a kilobyte. Maybe someone could write some really terse child porn prose and compress it down to fit in that space, but pretty much any image would have to be spread across many records. By design, almost any child porn received would be put in persistent storage before being identified as such.
> Second, you generally need to hold the entire image in RAM to create the perceptual hash needed to check that the image is/isn’t child porn.
This is one of many reasons that you generally want to stream file uploads to storage before performing analysis. Otherwise you're incredibly vulnerable to a DoS attack on your active memory resources. Even without a DoS attack, you're harming performance by unnecessarily evicting pages that could be used for caching/buffering for bytes that won't be served at least until you've finished receiving all the file's data.
[Note: Many media encodings tend to store neighbouring pixels together, so you can, conceptually, compute a perceptual hash progressively, without loading the entire file into active memory, which is often desirable, particularly with video content.]
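A toy sketch of that idea for an uncompressed grayscale raster whose rows arrive in order (real formats like JPEG complicate this considerably; the class here is invented for illustration): an 8x8 average hash accumulated row by row in constant memory.

```typescript
// Toy sketch: an 8x8 average-hash computed progressively as grayscale
// rows stream in, using O(1) memory instead of buffering the whole image.
// Assumes width/height are known from the header and rows arrive in order.
class StreamingAverageHash {
  private sums = new Float64Array(64);
  private counts = new Uint32Array(64);
  private y = 0;

  constructor(private width: number, private height: number) {}

  pushRow(row: Uint8Array): void {
    const br = Math.min(7, Math.floor((this.y * 8) / this.height));
    for (let x = 0; x < this.width; x++) {
      const bc = Math.min(7, Math.floor((x * 8) / this.width));
      const i = br * 8 + bc;
      this.sums[i] += row[x];
      this.counts[i]++;
    }
    this.y++;
  }

  digest(): bigint {
    const means = Array.from(this.sums, (s, i) => s / (this.counts[i] || 1));
    const global = means.reduce((a, b) => a + b, 0) / 64;
    // One bit per block: brighter than the global mean or not.
    return means.reduce((h, m) => (h << 1n) | (m > global ? 1n : 0n), 0n);
  }
}
```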
Thought about it some more... this whole scenario makes sense in only the narrowest of contexts. Very few applications directly serve UGC to the public, and a lot of applications are B2B. You're authenticated, and there's a link to your employer (or you if you're self-employed). Uploaded data isn't made visible to the public. Services are often limited to a legal jurisdiction. If you want to upload your unencrypted child porn to a record in Google's Firebase database, you go ahead. The feds could use some easy cases.
There's little point in not writing it to disk; the idea of holding it in RAM vs writing a file to disk is moot. You've got to handle it, and the best way of handling that kind of thing at scale is to write it to a temporary file on disk and then have a queue process work over the files doing the analysis.
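Something like this, sketched in Node (the queue and `enqueueForModeration` are placeholders for whatever you actually use):

```typescript
// Sketch: stream an upload straight to a temp file, then hand it to a
// queue worker for analysis. enqueueForModeration() is hypothetical.
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { randomUUID } from "node:crypto";
import type { IncomingMessage } from "node:http";

async function receiveUpload(req: IncomingMessage): Promise<string> {
  const path = join(tmpdir(), randomUUID());
  // Nothing buffered in RAM beyond stream chunks; backpressure applies.
  await pipeline(req, createWriteStream(path));
  await enqueueForModeration(path); // a worker pool scans the file later
  return path;
}

declare function enqueueForModeration(path: string): Promise<void>;
```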
No serious authority is going to hang you for having illegal UGC in storage while you process it. Heck, you can even allow stuff to go straight to publicly accessible if you have robust mechanisms for matching and reporting. The authorities won't take a hard line against a platform which is open to the public as long as it has the right mitigations in place. And they won't immediately blame you unless you act as a safe haven.
A sensible architectural pattern for binary UGC upload data would plan to put it in object storage and then deal with it from there.
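For instance, hand the client a short-lived signed URL so the bytes never transit your app servers or database (bucket and object names here are made up):

```typescript
// Sketch: signed upload URL so binary UGC goes straight to object storage.
import { Storage } from "@google-cloud/storage";

async function signedUploadUrl(objectName: string): Promise<string> {
  const [url] = await new Storage()
    .bucket("ugc-uploads")
    .file(objectName)
    .getSignedUrl({
      version: "v4",
      action: "write",
      expires: Date.now() + 15 * 60 * 1000, // valid for 15 minutes
      contentType: "application/octet-stream",
    });
  return url; // client PUTs the file here; a worker scans it afterwards
}
```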
I have never in my life written a "child porn validator" that restricts files uploaded by users to "non child porn". This sounds nontrivial and futile (every bad file can also be stored as a zip file with a password). It sounds like an example of a "think of the children" fallacy.
I also find the Firebase model weird (though I haven't used it yet), but not for the child porn reasons.
Writing directly to Firebase is rarely done past the MVP stage. Normally it's the reading which is done directly from the client. Generally writes are bounced through Cloud Functions or a traditional server of some form. Some also "fan out" data, where a user has a private area to write to (say a list of tweets) and then they get "fanned out" to followers' timelines via an async backend process which does any verification / cleansing as needed.
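Roughly like this (a sketch with firebase-functions v2 and the Admin SDK; collection names and `sanitize()` are illustrative):

```typescript
// Sketch of fan-out: users write only to their own area; a backend
// trigger cleanses the data and copies it into followers' timelines.
import { onDocumentCreated } from "firebase-functions/v2/firestore";
import { getFirestore, DocumentData } from "firebase-admin/firestore";

export const fanOutPost = onDocumentCreated(
  "users/{uid}/posts/{postId}",
  async (event) => {
    const post = event.data?.data();
    if (!post) return;

    const db = getFirestore();
    const clean = sanitize(post); // verification / cleansing happens here
    const followers = await db
      .collection(`users/${event.params.uid}/followers`)
      .get();

    // Real code would chunk this: batches cap out at 500 writes.
    const batch = db.batch();
    for (const f of followers.docs) {
      batch.set(
        db.doc(`users/${f.id}/timeline/${event.params.postId}`),
        clean
      );
    }
    await batch.commit();
  }
);

declare function sanitize(post: DocumentData): DocumentData;
```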
context: I have a near-100% naive perspective. Mobile dev who's built out something approximating Perplexity on Supabase. I have to use edge functions for e.g. CORS, but by and large, logic is all in the app.
Probably because the client is in Flutter, and thus multiplatform & web in one, I see manipulating the input on both the client and server as code duplication, and error-prone.
I think if I was writing separate native apps, I'd push everything through edge functions, approximating your point: better to have that sensitive logic of what exactly is committed to the DB in one place.
Our experience has been very different. Our Firebase security rules are locked down tight, so any new properties or collections need to be added explicitly for a new feature to work — it can't be "forgotten". Doing so requires editing the security rules file, which immediately invites strict scrutiny of the changed rules during code review.
This is much better than trying to figure out what are the security-critical bits in a potentially large request handler server-side. It also lets you do a full audit much more easily if needed.
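For the curious, the pattern looks something like this (collection and field names invented; the point is the `hasOnly` allowlist fails closed):

```
// firestore.rules sketch: writes are rejected unless every field is
// explicitly allowlisted, so new properties force an edit to this file.
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /profiles/{uid} {
      allow read: if true;
      allow write: if request.auth != null
        && request.auth.uid == uid
        && request.resource.data.keys().hasOnly(['displayName', 'bio']);
    }
  }
}
```

A stray field written by frontend code alone never makes it in without a rules change, and that diff is what gets the scrutiny in review.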
Are you suggesting that it's essentially too easy for a dev to just set and forget? That's a pretty interesting viewpoint. Not sure how any BaaS could solve that human factor.
Say you add a super_secret_internal_notes field. If you're writing a traditional backend, some human would need to explicitly add that to a list of publicly available fields somewhere (well, hopefully). For systems like Firebase, it's far too easy to have this field be created by frontend code that's just treating this as another piece of data in a nested part of a payload. But this can also happen on any system, if you have any JSON blob whose implicit schema can be added to by frontend development alone.
IMO implicit schema updates on any system should be consolidated and lifted to an easily emailed report - a security manager/CSO/CTO should be able to see all the super_secret_internal_notes as they're added across the org, and be able to immediately rectify security policies (perhaps even in a staging environment).
(Also, while tongue in cheek, the way that the intro to a part of Firebase's training materials https://www.youtube.com/watch?v=eMa0hsHqfHU implicitly centers security as part of the launch process, not something ongoing, is indicative of how pervasive the issue is - and not at all something that's restricted to Firebase!)
Generally agreed on improved audit logs of some form helping.
Re training materials, this is one of the mitigations we launched to attempt to pull security to front of mind. I do not really think this is a Firebase problem, I think average developers (or average business leaders) just don't, in general, think much about security. As a result, Firebase materials have a triple burden - they need to get you to think about security, they need to get you to disrupt the most "productive" flow to write rules, and they need to get you to consistently revisit your rules throughout development. This is a lot to get into someone's head.
For all the awesomeness of Firebase's databases, they're both ripe footgun territory (Realtime Database specifically). Our original goal was to make the easiest database to get up and running with, which I think we did, but that initial ease comes with costs down the road which may or may not be worth it, that's a decision for the consumer.
You could do away with the model of the frontend writing to the DB and ask customers to implement a small backend with a serverless component like AWS Lambda or Google Cloud Functions.
Barring that, perhaps Firestore could introduce the concept of a "lightweight database function hook" akin to Cloudflare workers that runs in the lifecycle of a DB request, thus formalizing the security requirements specific to the business requirement and causing the development organization to allocate resources to its upkeep.
So while a security rule usually gets tested very lightly, you'd see far more testing in a code component like the one I'm suggesting.
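To be clear about what I'm imagining (this API is purely hypothetical; Firestore has no blocking write hooks today):

```typescript
// Purely hypothetical sketch of a "lightweight database function hook"
// running inside the write lifecycle, Cloudflare-Workers style.
export default {
  async beforeWrite(
    change: { path: string; data: Record<string, unknown> },
    ctx: { auth?: { uid: string } }
  ): Promise<"allow" | "deny"> {
    if (!ctx.auth) return "deny";
    const title = change.data.title;
    if (typeof title !== "string" || title.length > 200) {
      return "deny"; // business-specific validation lives in testable code
    }
    return "allow";
  },
};
```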
> Barring that, perhaps Firestore could introduce the concept of a "lightweight database function hook" akin to Cloudflare workers that runs in the lifecycle of a DB request, thus formalizing the security requirements specific to the business requirement and causing the development organization to allocate resources to its upkeep.
I think it's more that there's more surface area to forget when humans are handling so many concerns, and since the rules usually aren't the part that's changed the most, they're a likely candidate for being "pushed out of the buffer" (of the human).
In a more typical model, backend devs focus more on security, while not needing to know the frontend, and vice versa.
The concept with Firebase DBs is flawed IMO. I never got the point of directly accessing a DB from the frontend, or allowing that even with security rules; it just seems like it would cause problems.