Specifically, I've discovered how to 'trick' mainstream cloud storage and mainstream query engines, using mainstream table formats, into reading parallel arrays that are stored outside the table, without a classic join, and treating them as new columns or as schema evolution. It'll work on Spark, BigQuery, etc.
Table1 = {"col1": [1, 2, 3]}
Table2 = {"epiphany": [1, 1, 1]}
# Walk both tables in lockstep: row i of Table2 lines up with row i of Table1.
for i, r in enumerate(Table1["col1"]):
    print(r, Table2["epiphany"][i])
He's really happy he found this (Edit: actually, it seems Chang She talked about this at a conference in 2024 while discussing the Lance data format[1]@12:00, calling it "the fourth way") and will present it at a conference.
Seriously, this is not what big data does today. Distributed query engines don't have the primitives to zip through two tables and treat them as column groups of the same wider logical table. There's a new kid on the block called LanceDB that has some of the same features but is aiming for different use-cases. My trick retrofits vertical partitioning into mainstream data lake stuff. It's generic and works on the tech stack my company uses but would also work on all the mainstream alternative stacks. Slightly slower on AWS. But anyway. I guess HN just wants to see an industrial track paper.
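To make the "column groups of the same wider logical table" idea concrete, here is a toy in-memory sketch. It is not the actual trick (that isn't spelled out here) and it says nothing about how Spark or BigQuery would do it; `zip_column_groups` and `rows` are hypothetical helper names, and the only assumption is that both groups share the same row order and row count.

```python
from typing import Dict, Iterator, List

Columns = Dict[str, List[int]]


def zip_column_groups(*groups: Columns) -> Columns:
    """Combine column groups positionally into one wider logical table.

    Hypothetical helper: no join key is used, rows are matched purely by
    position, so every column must have the same length.
    """
    wide: Columns = {}
    n_rows = None
    for group in groups:
        for name, values in group.items():
            if n_rows is None:
                n_rows = len(values)
            elif len(values) != n_rows:
                raise ValueError(f"column {name!r} has {len(values)} rows, expected {n_rows}")
            if name in wide:
                raise ValueError(f"duplicate column name {name!r}")
            wide[name] = values
    return wide


def rows(table: Columns) -> Iterator[Dict[str, int]]:
    """Iterate the wide table row by row, zipping all columns in lockstep."""
    names = list(table)
    for values in zip(*(table[name] for name in names)):
        yield dict(zip(names, values))


table1 = {"col1": [1, 2, 3]}
table2 = {"epiphany": [1, 1, 1]}

wide = zip_column_groups(table1, table2)
for row in rows(wide):
    print(row)
```

The length check is the whole contract: positional zipping only makes sense if both column groups were written in the same row order, which is exactly the thing a join would otherwise have to establish with a key.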
The Rust version of this is "turn .iter() into .par_iter()."
It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.
> to this I sigh in chrome.
I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.
Concurrency is easy by default. The hard part is when you are trying to be clever.
You write concurrent code in Rust pretty much in the same way as you would write it in OpenMP, but with some extra syntax. Rust catches some mistakes automatically, but it also forces you to do some extra work. For example, you often have to wrap shared data in Arc when you convert single-threaded code to use multiple threads. And some common patterns are not easily available due to the limited ownership model. For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
> For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
This would be a good candidate for a specialised container that internally used unsafe. Well, thread id at least; since the user of an API doesn't provide it, you could mark the API safe, since you wouldn't have to worry about incorrect inputs.
Loop iteration would be an input to the API, so you'd mark the API unsafe.
I watched GNOME evolve to the point where your calendar notifies you of meetings 10 minutes prior and has a join button which runs the appropriate application. I also get notifications for new emails and Slack messages. Last week I was pleasantly surprised by
> why some people are incapable of changing their point of view
I've thought about this and the conclusion was:
What you believe you know makes you what you currently are. You can't just believe in a contradictory position. You could believe that you have been proven wrong, which would then change your belief.
Changing your point of view, looking at things from the vantage of someone else with different life experiences and the resulting belief systems would be dishonest at best, and claiming that you are capable of changing your beliefs on a whim is like being able to rip your arm off.
You can, at best, adapt your own belief to encompass theirs with caveats or simply not care about your truths.
With a simple array of unsigned int and bit operations like 20 years ago. It could solve a lot of puzzles within microseconds. Later I realized rules 1, 2, 5, 6 are pretty much the same.
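For anyone who hasn't seen the approach, a rough sketch of "an array of unsigned ints plus bit operations" might look like the following. The puzzle and its rules aren't named here, so this assumes a grid puzzle where each cell can hold one of the values 1 to 9; `FULL`, `eliminate`, `is_solved` and `solved_value` are made-up names for illustration, not the original code.

```python
# Assumption: each cell is one integer whose bits 0..8 mark which of the
# values 1..9 are still possible for that cell.
FULL = 0b111111111


def eliminate(mask: int, value: int) -> int:
    """Clear the bit for `value`, i.e. rule that candidate out."""
    return mask & ~(1 << (value - 1))


def is_solved(mask: int) -> bool:
    """True when exactly one candidate bit is left."""
    return mask != 0 and mask & (mask - 1) == 0


def solved_value(mask: int) -> int:
    """Value of a solved cell: position of its single set bit, 1-based."""
    return mask.bit_length()


# One unit (say, a row) of nine cells, all candidates open.
cells = [FULL] * 9

# Fix cell 0 to the value 5, then remove 5 from every other cell in the unit.
cells[0] = 1 << (5 - 1)
for i in range(1, 9):
    cells[i] = eliminate(cells[i], 5)

print(solved_value(cells[0]), bin(cells[1]))
```

Every rule check becomes a handful of AND/OR operations on small integers, which is presumably why this kind of solver runs so fast.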