Show HN: Minibatch – Python stream processing

miraculixx · on Jan 31, 2020

Scenario: you really have a streaming use case, but you don't want to deal with the complexity and overhead of some of the other streaming frameworks like Apache Spark or Flink.

If so, minibatch is for you.

tudelo · on Jan 31, 2020

Do you have a list of features? For example, which windowing types are implemented? What about triggering?

miraculixx · on Feb 2, 2020

good feedback, updated the README

woile · on Jan 31, 2020

Interesting. I've started a project with a friend to build a simple toolkit for stream processing: https://github.com/python-streaming/dam

I'm happy to see the streaming ecosystem getting bigger in Python; Java has a lot of market share.

AndrewKemendo · on Jan 31, 2020

kafka-python already does native Python streaming. Why would I use this instead?

ewhauser421 · on Jan 31, 2020

Why not Apache Beam? It doesn’t require Spark or Flink

miraculixx · on Jan 31, 2020

Thanks for the question. Beam has a more complex execution model and AFAIK also needs some executor environment like Spark to really parallelize workloads. Given a MongoDB instance that all producers and consumers can attach to, minibatch runs anywhere.

ewhauser421 · on Jan 31, 2020

(Beam has a DirectRunner which is just in memory)

dennisy · on Feb 1, 2020

Do you have any scale or performance metrics?

miraculixx · on Feb 2, 2020

No benchmarks yet, unfortunately. While it should scale O(n) in the number of streams, there will be limits to scaling a single stream, because the stream processing functions are executed synchronously by default (pending enhancement).
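
To illustrate the single-stream limitation in general terms (this is a standard-library sketch, not minibatch's implementation; `process_window` and the sample windows are hypothetical): when windows are processed synchronously, each one waits for the previous one, whereas a worker pool lets I/O-bound work overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_window(window):
    # Stand-in for a user's stream processing function (hypothetical).
    time.sleep(0.05)  # simulate I/O-bound work
    return sum(window)


windows = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Synchronous: each window waits for the previous one to finish.
start = time.perf_counter()
sync_results = [process_window(w) for w in windows]
sync_elapsed = time.perf_counter() - start

# Pooled: windows are processed concurrently; map() preserves input order.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled_results = list(pool.map(process_window, windows))
pooled_elapsed = time.perf_counter() - start

print(f"sync: {sync_elapsed:.2f}s, pooled: {pooled_elapsed:.2f}s")
```

Both runs produce the same results; only the elapsed time differs. Whether a pool helps in practice depends on whether the processing function is I/O-bound and safe to run concurrently.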

kstrauser · on Jan 31, 2020

It lost me at "for humans". I'm not even kidding: it's trite, demeaning to every similar project (with the implication that everything else is obviously for weirdos, not like you and me, amirite?), and shows a kind of naivete (everything else is overly complicated, lol, and I have no idea why that would be!).

It was momentarily cute the first time I saw it but that faded quickly. By the 100th "for humans" project, it had become a distinct code smell.

dang · on Jan 31, 2020

Ok, we've taken humans out of the title above. Let's focus on interesting things about the project now.

thiagomgd · on Jan 31, 2020

Agree! I just assume everything posted here is "for humans". So... what's the differentiator here?

acrisci · on Jan 31, 2020

I disagree. To me this term is a meaningful way to express the values of a project. It means that the abstractions will be at the highest level possible to provide the result. Whenever the library can make a decision for you, it will attempt to do so. It is like you are dealing with a human intelligence that is always guessing what you want and trying to give it to you. When I see "for humans", I expect that the API will behave like a human. This is not always desirable. Sometimes the highest level of abstraction is the wrong level of abstraction. Sometimes you need to be making all those decisions yourself, and you don't want the library to ever guess what your intention is. The human-like APIs tend to work wonderfully until you need to optimize, have security concerns, or figure out you're actually doing something novel. Then you jump down a layer of abstraction and everything fits wonderfully again. Sometimes you really are just dealing with a machine and telling the machine what to do, and whatever humanity is brought to the task is simply a distraction.

kstrauser · on Jan 31, 2020

You say you disagree, but then go on to spell out my position quite nicely. "For humans" implies a certain level of abstraction that quite often tends to be too abstract in exactly the way you describe. "We don't bother you with all the little details that the others make you handle!" frequently ignores the fact that those others make you specify all that stuff for a reason.

For example: "Do you want this database to be AP or CP?" "Don't pester me with the nitty gritty! I just want to store data." A "for humans" database that quietly made the choice for you would be very bad news if it later turns out to have chosen poorly for your own workload.

The first "for humans" thing I saw was Python's Requests library, and I think it earned the title. Having just built a web scraper on top of raw httplib a year earlier, I would have killed to have had Requests available. It's a great example of a project with a decent track record of mostly setting good defaults and letting devs concentrate on their parts of their projects. Since then, I have seen very few "for humans" projects that weren't so abstract as to become almost unusable. I mean, you could call MS Paint "Photoshop for humans", but that doesn't make it so.

acrisci · on Jan 31, 2020

Oh, OK. I thought you were making the point that "for humans" is a sort of meaningless marketing term, but I see you understand the nuances of this. Sometimes it is good to be "for humans", but you are saying that, in your experience, most people who make the attempt fail.

The classic example, I think, is Microsoft Access (databases for humans), which is great until it can't do the one thing you need, and then it doesn't work anymore. And everyone needs a different one thing.

kstrauser · on Jan 31, 2020

Ah, got it! I think that's a great example.

pryelluw · on Jan 31, 2020

There's another thread right now on HN's front page asking about online toxicity. Since you've decided to openly mock this project and its maintainers, I've decided it's fair to remind you about your humanity.

Why do projects insist on labeling themselves as "for humans"? That makes no sense!

Libraries have historically been terse and hard to understand. It's why Stack Overflow exists. It allows those with more experience to help others.

Labeling a library as "for humans" means that special consideration was made to make it easier to grasp. That might hide some of the complexity. It could also limit the scope of the library's abilities. But it puts people and the UX first.

Is this the correct approach? We should have simple and complex libraries. For every "for humans" library there should be a more complex one. And vice versa.

Be human. Empathetic. Quick to praise. Slow to condemn. Imagine it was you announcing a similar library. How would you feel to be openly ridiculed in such a forum?

kstrauser · on Jan 31, 2020

I didn't ridicule anyone. I did criticize what I see as an obnoxious trend in software labeling and I stand by that. But if you re-read my post, I didn't say one thing bad about the project itself, and didn't mention the author at all. My comment was entirely scoped to the "for humans" description.

I emphatically insist that it's absolutely OK to lambast industry trends, and that's what I did here. I'm not sure how you inferred ridicule from that.

strgcmc · on Jan 31, 2020

I think to most reasonable readers of your text, the "it" that you reference multiple times isn't particularly clear (do you mean: it = the trend, it = the project, it = the maintainers?), and also you literally said it "shows a kind of naivete."

Shows naivete on the part of what/who? Surely a trend cannot be naive on its own, so any reasonable reading of this sentence means that you are calling the project and/or maintainer naive. How is that not (a mild form of) ridicule?

kstrauser · on Jan 31, 2020

I'll be honest: it feels to me here that you really want to find bad intentions in my post. Well, there weren't any. But yes, since you brought it up, I do think it shows a little naivete on the part of authors who label their projects "for humans" unless they can make a very strong case for why they're abstracting away the details that other projects require. One possibility is that previous projects have been needlessly complicated (in which case I honestly see "for humans" as kind of an insult to those projects, as though they weren't for humans). Another is that the author doesn't understand why those projects are more complex, which means that new users are about to learn some interesting lessons about leaky abstractions.

But more fundamentally, I'm not sure you noticed that the project's author was the one who wrote this post. The reason you do that on Hacker News is to demonstrate something you've been working on to get community feedback. Sometimes that feedback may be unpleasant, but as long as it was delivered with good intentions (as mine was), that's absolutely crucial. If OP thought they were going to get nothing but positive "this is the greatest thing ever!" replies, then that would be a radically higher level of naivete than anything I said would have implied. I don't think that's the case.

I know I'm not the only person who feels this way about "for humans". Any time the subject comes up, lots - by my estimation the majority - of opinions lean the same way. This project is still very new. Is it kinder to say something up front to let them know they may be turning away potential users, or to not say a word and maybe see the project gain less popularity than it might otherwise deserve? Again, nothing I said was insulting.

And more to the point, it was sincere. The author was implicitly asking for sincere feedback about their project, and I gave it. You said I've ridiculed them, but note that they haven't piped up to agree with you. I kind of think you're offended on the behalf of someone who isn't.

(And OP, if you're reading this, nothing I said was intended as an insult. But really, drop the "for humans". It's not a good look.)

pryelluw · on Jan 31, 2020

I really appreciate this sincere long-form response. You make a great point about intentions. They are sometimes very hard to put into text.

kstrauser · on Jan 31, 2020

You’re very welcome! Have a great weekend. :)

proc0 · on Jan 31, 2020

People with depth of knowledge that can use advanced tools are also human!

miraculixx · on Feb 2, 2020

OP here. Labeling the project "for humans" is meant to express that a user doesn't need to know a lot about streaming and its technical complexities to make use of the library. Essentially, knowing how to create a stream and writing a consumer function is enough.
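
As a rough, library-independent illustration of "create a stream and write a consumer function" (the names `ToyStream`, `append`, and `consume` are hypothetical and are not minibatch's actual API; the count-based window is also an assumption):

```python
from typing import Callable, List


class ToyStream:
    """Hypothetical in-memory stream; not minibatch's real API."""

    def __init__(self, window_size: int,
                 consumer: Callable[[List[dict]], None]):
        self.window_size = window_size
        self.consumer = consumer
        self._buffer: List[dict] = []

    def append(self, message: dict) -> None:
        # Buffer the message; emit a window once enough have arrived.
        self._buffer.append(message)
        if len(self._buffer) >= self.window_size:
            window = self._buffer[:self.window_size]
            self._buffer = self._buffer[self.window_size:]
            self.consumer(window)


windows_seen: List[List[dict]] = []


def consume(window: List[dict]) -> None:
    # The consumer function is the only logic the user has to write.
    windows_seen.append(window)


stream = ToyStream(window_size=2, consumer=consume)
for i in range(5):
    stream.append({"value": i})

print(windows_seen)
```

Here the consumer is called once per full window of two messages, and the fifth message stays buffered until another one arrives. A real library would also persist the buffer and handle time-based windows, which this sketch leaves out.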

throwaway55554 · on Jan 31, 2020

I agree! Anything that uses "Modern" does the same for me as well.