One compromise that I can recommend is hosting the email server yourself for incoming mail, but using an external SMTP relay such as SendGrid or SMTP2GO to send outgoing email. That way you don't have to worry about your emails being blocked due to some misconfigured setting on your end. Switching SMTP relay providers is easy, so vendor lock-in is not an issue.
Also, those services are mostly intended for broadcasting mass spam mails for better or worse, so for personal use their free tiers are almost too good to be true.
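To make the outgoing half concrete, here is a minimal sketch of submitting a message through a relay with Python's standard `smtplib`. The host name, port, and credentials are placeholders I made up, not any particular provider's settings; check your provider's documentation for the real values.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg: EmailMessage, host: str, port: int,
                   username: str, password: str) -> None:
    """Submit msg through an external SMTP relay over STARTTLS.

    host, port, username and password are assumptions supplied by the
    caller; they come from whichever relay provider you signed up with.
    """
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()                  # upgrade the connection to TLS
        smtp.login(username, password)   # authenticate against the relay
        smtp.send_message(msg)


# Building a message needs no network access:
msg = build_message("me@example.org", "you@example.com", "Hello", "Sent via a relay.")
# Sending would look like this (hypothetical host and credentials):
# send_via_relay(msg, "smtp.relay.example", 587, "user", "secret")
```

Since the relay is only a host, a port, and a login, switching providers amounts to changing those four values.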
If you have data that's so 'dirty' that you can't decide on the filtering rules in advance (or based on only historic data), then what you have is garbage, not data. Therefore, we could call the art of shaping this into meaningful stories garbage science.
Tell me in a comment you have never worked with business data in your life.
Business data is full of minor inconsistencies which are not obvious until you sit in front of it. Products are sold by different units. Reporting ranges and aggregates are slightly different. Subsidiaries use categories which are close but not exactly identical.
There is generally plenty of massaging to do before you can get the information you need.
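A tiny sketch of what that massaging tends to look like. The unit table, the category aliases, and the row layout are all invented for illustration; in real data you usually discover these mappings only after looking at the rows.

```python
from decimal import Decimal

# Hypothetical conversion table: how many single items one sales unit contains.
ITEMS_PER_UNIT = {"each": Decimal(1), "dozen": Decimal(12), "case": Decimal(24)}

# Hypothetical aliases: subsidiaries use close but not identical category names.
CATEGORY_ALIASES = {"soft drinks": "beverages", "drinks": "beverages"}


def normalize_row(row: dict) -> dict:
    """Convert one sales row to single items and a canonical category."""
    unit = row["unit"].strip().lower()
    category = row["category"].strip().lower()
    return {
        "product": row["product"],
        "quantity_each": Decimal(str(row["quantity"])) * ITEMS_PER_UNIT[unit],
        "category": CATEGORY_ALIASES.get(category, category),
    }


def total_by_category(rows: list) -> dict:
    """Sum normalized quantities per canonical category."""
    totals = {}
    for row in map(normalize_row, rows):
        totals[row["category"]] = totals.get(row["category"], Decimal(0)) + row["quantity_each"]
    return totals


rows = [
    {"product": "cola", "quantity": 2, "unit": "Dozen", "category": "Soft Drinks"},
    {"product": "water", "quantity": 1, "unit": "case", "category": "Beverages "},
]
print(total_by_category(rows))
```

An unknown unit raises a `KeyError` here on purpose: a row you cannot convert is something you want to look at, not silently drop.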
Here is my script gfm-preview [1], which I think is pretty cool since it implements an HTTP server in 50 lines of shell script by (ab)using netcat. What it does is start an HTTP server that serves a rendered preview of a Markdown document, using GitHub's API to render GitHub Flavored Markdown. The page updates automatically when the document changes, using fswatch and HTTP long polling!
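For anyone unfamiliar with long polling, here is a rough Python sketch of the idea. This is not how gfm-preview itself is written (that is shell plus netcat and fswatch); the file name, port, and URL scheme below are assumptions, and plain mtime polling stands in for fswatch.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DOC_PATH = "README.md"  # hypothetical document being previewed


def wait_for_change(path, last_mtime, timeout=30.0, interval=0.2):
    """Block until path's mtime differs from last_mtime or the timeout expires.

    Returns the current mtime in nanoseconds either way.
    """
    deadline = time.monotonic() + timeout
    while True:
        mtime = os.stat(path).st_mtime_ns
        if mtime != last_mtime or time.monotonic() >= deadline:
            return mtime
        time.sleep(interval)


class LongPollHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed URL scheme: the client sends the last mtime it saw,
        # e.g. GET /1700000000000000000, and re-requests after each reply.
        try:
            last_seen = int(self.path.lstrip("/") or 0)
        except ValueError:
            last_seen = 0
        # The response is held open until the file changes or the timeout hits.
        body = str(wait_for_change(DOC_PATH, last_seen)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the long-polling server until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), LongPollHandler).serve_forever()
```

Calling `serve()` starts it. The browser side is a loop that requests the URL with the last value it received and reloads the preview whenever the returned value is different.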
Dhall was inspired by Nix, and created by a longtime Nixer.
Also, as someone who likes simple FP but is not some sophisticated type-level programmer, Dhall looks really complicated and verbose to me. The Nickel approach, where authors of libraries in the language (the equivalent of things like nixpkgs.lib) can pepper their functions with type annotations while users get what looks and feels like a simple configuration language, seems like a better fit for the domain.
I can't imagine getting all of the PHP developers I support comfortable editing the equivalent of shell.nix in Dhall themselves. Nix files I can ask them to edit without taking up too much of their time or focus.
Have we established that a US NGO is accepting "CSAM" hashes from China or that they are cooperating with them at all? That seems unlikely, and Apple hasn't yet announced how they're going to scan phones in China. Wouldn't China just demand full scanning capabilities for anything on the phone outright, since you don't have any protection from that in China?
> Have we established that a US NGO is accepting "CSAM" hashes from China or that they are cooperating with them at all?
I believe Apple's intention is to accept hashes from all governments, not just one US organization. One of their ineffectual concessions to the criticism was to require that two governments provide the same hash before they'd start using it.
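As I understand that rule, it boils down to a threshold on how many independent sources list a hash. A toy sketch, with made-up source names and hash strings, and certainly not Apple's actual implementation:

```python
from collections import Counter


def hashes_with_multiple_sources(sources, minimum=2):
    """Return the hashes submitted by at least `minimum` distinct sources.

    sources maps a (hypothetical) source name to the hashes it submitted.
    """
    counts = Counter()
    for hashes in sources.values():
        counts.update(set(hashes))  # each source counts once per hash
    return {h for h, n in counts.items() if n >= minimum}


submitted = {
    "source_a": ["h1", "h2"],
    "source_b": ["h2", "h3"],
    "source_c": ["h3", "h3"],
}
print(hashes_with_multiple_sources(submitted))
```

Here `h1` is dropped because only one source submitted it, while `h2` and `h3` each have two sources behind them. The check only counts sources; it says nothing about whether they are actually independent of each other.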
China can definitely find a state government in need of a cash injection to help push the hash of a certain uninteresting square where nothing happened into the db.
Sure, but Apple receives far less backlash if the system is applied to all phones and under the guise of "save the children". This would allow Apple to accommodate any nation state's image scanning requirements, which guarantees their continued operation in said markets.
The main announcement was that Apple was getting hashes from NCMEC, but they also listed ICMEC and have said "and other groups". Much like the source database for the image hashes, the list of sources is opaque and covered by vague statements.