Hacker News: nurettin's comments

You mean you discovered parallel arrays?

Specifically, I've discovered how to 'trick' mainstream cloud storage and query engines, using mainstream table formats, into reading parallel arrays that are stored outside the table without a classic join, and treating them as new columns or as schema evolution. It'll work on Spark, BigQuery, etc.

What's a good place to see parallel arrays defined? I have no data lake experience, but I know how relational databases work.

I mean,

    Table1 = {"col1": [1,2,3]}
    Table2 = {"epiphany": [1,1,1]}
    for i, r in enumerate(Table1["col1"]):
      print(r, Table2["epiphany"][i])

He's really happy he found this (Edit: actually, it seems Chang She talked about this at a 2024 conference while discussing the Lance data format [1] at 12:00, calling it "the fourth way") and will present it at a conference.

[1] https://youtu.be/9O2pfXkCDmU?si=IheQl6rAiB852elv


Seriously, this is not what big data does today. Distributed query engines don't have the primitives to zip through two tables and treat them as column groups of the same wider logical table. There's a new kid on the block called LanceDB that has some of the same features but is aiming for different use-cases. My trick retrofits vertical partitioning into mainstream data lake stuff. It's generic and works on the tech stack my company uses but would also work on all the mainstream alternative stacks. Slightly slower on AWS. But anyway. I guess HN just wants to see an industrial track paper.

That code is for in-memory data, right? I see no storage access.

What is really happening? Are these streamed off two servers and zipped into one? Is this just columnar storage or something else?
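A minimal plain-Python sketch of the positional "zip through two tables" idea discussed in this thread, this time with the data on disk. It assumes two column groups of one logical table are stored as separate files whose rows are in the same order with equal counts. The file names and the `write_column_group`/`read_wide` helpers are hypothetical, CSV stands in for a real columnar format, and this is not the author's Spark/BigQuery technique, which the thread doesn't describe; it only shows pairing row i with row i instead of joining on a key.

```python
import csv
import os
import tempfile
from itertools import zip_longest

_MISSING = object()  # sentinel: "this file ran out of rows"


def write_column_group(path, header, rows):
    """Store one column group (a subset of the logical table's columns)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_wide(base_path, extra_path):
    """Yield rows of the wider logical table by zipping two column groups.

    No join key: row i of one file is paired with row i of the other,
    so both files must hold the same rows in the same order.
    """
    with open(base_path, newline="") as a, open(extra_path, newline="") as b:
        base, extra = csv.reader(a), csv.reader(b)
        yield next(base) + next(extra)  # combined header
        for left, right in zip_longest(base, extra, fillvalue=_MISSING):
            if left is _MISSING or right is _MISSING:
                raise ValueError("column groups have different row counts")
            yield left + right


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        base = os.path.join(d, "table1.csv")
        extra = os.path.join(d, "epiphany.csv")
        write_column_group(base, ["col1"], [[1], [2], [3]])
        write_column_group(extra, ["epiphany"], [[1], [1], [1]])
        for row in read_wide(base, extra):
            print(row)  # csv hands values back as strings
```

The row-count check matters: without a join key, a missing or reordered row silently misaligns every value after it.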


OT: The time between releasing a free Rubik's Cube program on the Play Store and receiving a cease & desist has always impressed me.

My 10-year-old has been building this website with Google Sites for a year now; he collects interesting/fun/functional links.

https://awebsite.space


Forgot that it needs www

https://www.awebsite.space/


Yes! GCC/OpenMP in general solved a lot of the problems that the article conveniently leaves out.

Then we have the anecdote "They failed at Firefox layout in C++ twice, then did it in Rust." To this I sigh in chrome.


The Rust version of this is "turn .iter() into .par_iter()."

It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.

> to this I sigh in chrome.

I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.


And the C++ version is to add std::execution::par_unseq as a parameter to the ranges algorithm.

This has the same drawbacks as "#pragma omp for".

The hard part isn't splitting loop iterations between threads, but doing so _safely_.

Proving that an arbitrary loop's iterations are split in a memory-safe way is an NP-hard problem in C and C++, but it's the default behavior in Rust.
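A rough Python analogue of a parallel loop whose iterations are safe to split, because each one only reads its own input and returns a new value. This is a swapped-in technique (`concurrent.futures.ThreadPoolExecutor`), not OpenMP, Rayon or C++ execution policies, and the `score` and `parallel_map` names are hypothetical. Under CPython's GIL, threads won't speed up pure-Python CPU-bound work, so the sketch shows the shape of a safe parallel loop, not a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def score(x):
    # Reads only its own input and returns a fresh value: no shared
    # mutable state, so iterations may run in any order on any thread.
    return x * x + 1


def parallel_map(func, items, workers=4):
    """Parallel version of [func(x) for x in items].

    Executor.map yields results in input order, so the output matches
    the sequential loop it replaces.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


# The risky version would be a loop body like `total += x` on a shared
# variable: that read-modify-write can interleave between threads.
# Keeping per-iteration results separate and combining them afterwards
# avoids the problem.
if __name__ == "__main__":
    squares = parallel_map(score, range(5))
    print(squares, sum(squares))
```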


Well, if you are accessing global data with ranges, you are doing it wrong.

Naturally, nothing in C++ prevents someone from doing that, which is why PVS-Studio, Sonar and co. exist.

Just like some things aren't prevented by Rust itself, but by Clippy.


Concurrency is easy by default. The hard part is when you are trying to be clever.

You write concurrent code in Rust pretty much in the same way as you would write it in OpenMP, but with some extra syntax. Rust catches some mistakes automatically, but it also forces you to do some extra work. For example, you often have to wrap shared data in Arc when you convert single-threaded code to use multiple threads. And some common patterns are not easily available due to the limited ownership model. For example, you can't get mutable references to items in a shared container by thread id or loop iteration.


> For example, you can't get mutable references to items in a shared container by thread id or loop iteration.

This would be a good candidate for a specialised container that internally used unsafe. Well, for thread id at least: since the user of the API doesn't provide it, you could mark the API safe, as you wouldn't have to worry about incorrect inputs.

Loop iteration would be an input to the API, so you'd mark the API unsafe.
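A minimal Python sketch of the thread-id variant, assuming a hypothetical `PerThreadSlots` container: the slot is chosen from the calling thread's identity, so callers never supply an index that could point at another thread's slot. Python has no `unsafe`, and its standard `threading.local` already covers this use; the class only makes the "caller can't pick the index" property explicit.

```python
import threading


class PerThreadSlots:
    """Hypothetical container giving each thread its own mutable slot.

    mine() takes no index: the slot is picked from the calling thread's
    identity, so a thread cannot ask for another thread's slot. An API
    taking a caller-supplied index (e.g. a loop iteration) could not
    promise that.
    """

    def __init__(self, factory):
        self._factory = factory
        self._slots = {}
        self._lock = threading.Lock()  # guards the shared dict itself

    def mine(self):
        tid = threading.get_ident()
        with self._lock:
            if tid not in self._slots:
                self._slots[tid] = self._factory()
            return self._slots[tid]

    def all_slots(self):
        # Call after joining the workers. Note: Python may reuse a thread
        # id once a thread exits, so a later thread can inherit a slot.
        with self._lock:
            return list(self._slots.values())


def worker(slots, n):
    bucket = slots.mine()  # no other live thread writes to this slot
    for i in range(n):
        bucket.append(i)


if __name__ == "__main__":
    slots = PerThreadSlots(list)
    threads = [threading.Thread(target=worker, args=(slots, 100)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(len(s) for s in slots.all_slots()))
```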


There’s split_at_mut to avoid writing unsafe yourself in this case.
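`split_at_mut` hands out mutable slices that are guaranteed not to overlap. Python has no borrow checker, but the same partitioning idea can be sketched with `memoryview` slices over one `bytearray`, each given to a single worker. The `split_disjoint` and `fill` helpers are hypothetical, and the non-overlap here holds only by construction, not because anything enforces it.

```python
from concurrent.futures import ThreadPoolExecutor


def split_disjoint(buf, parts):
    """Cut a writable buffer into non-overlapping memoryview chunks.

    Loosely mirrors split_at_mut: several mutable views into one buffer
    that do not overlap, so each can go to a different worker.
    `parts` must be >= 1.
    """
    view = memoryview(buf)
    step = max(1, -(-len(view) // parts))  # ceiling division
    return [view[i:i + step] for i in range(0, len(view), step)]


def fill(chunk, value):
    # Writes go through the view into the shared buffer, but only into
    # this worker's own range.
    for i in range(len(chunk)):
        chunk[i] = value


if __name__ == "__main__":
    buf = bytearray(10)
    chunks = split_disjoint(buf, 3)  # lengths 4, 4, 2
    with ThreadPoolExecutor() as pool:
        list(pool.map(fill, chunks, range(1, len(chunks) + 1)))
    print(list(buf))  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
```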

AFAIK it does all styling and layout on the main thread and offloads drawing instructions to other threads (CompositorTileWorker), and it works fine?

That does sound like Chrome has also either failed to make styling multithreaded in C++ (or hasn't attempted it), while it was achieved in Rust?

It has pretty graphics.

Satellite connections don't work because the Iranian government is broadcasting gibberish (jamming), which causes them to drop.

I watched GNOME evolve to the point where your calendar notifies you of meetings 10 minutes beforehand and has a join button that launches the appropriate application. I also get notifications for new emails and Slack messages. Last week I was pleasantly surprised by

    snap install tmnationsforever

We are in a good place.

> why some people are incapable of changing their point of view

I've thought about this and the conclusion was:

What you believe you know makes you what you currently are. You can't just believe in a contradictory position. You could believe that you have been proven wrong, which would then change your belief.

Changing your point of view, i.e. looking at things from the vantage point of someone else with different life experiences and the belief systems that result from them, would be dishonest at best, and claiming that you can change your beliefs on a whim is like claiming you can rip your own arm off.

You can, at best, adapt your own belief to encompass theirs with caveats or simply not care about your truths.


I remember implementing some of these

https://www.stolaf.edu/people/hansonr/sudoku/12rules.htm

With a simple array of unsigned ints and bit operations, about 20 years ago. It could solve a lot of puzzles within microseconds. Later I realized rules 1, 2, 5 and 6 are pretty much the same.
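A small Python sketch of the bitmask idea, assuming a 9x9 grid with 0 for empty cells: each digit is one bit, the digits used in a cell's row, column and box are OR-ed together, and the complement gives the remaining candidates. Only the "single candidate" deduction is implemented; it doesn't follow the numbering of the linked page's rules, Python ints stand in for unsigned int, and the helper names are hypothetical.

```python
ALL = (1 << 9) - 1  # bits 0..8 stand for digits 1..9


def bit(d):
    return 1 << (d - 1)


def candidates(grid, r, c):
    """Bitmask of digits still allowed at (r, c); 0 marks an empty cell."""
    used = 0
    for i in range(9):
        if grid[r][i]:
            used |= bit(grid[r][i])  # row
        if grid[i][c]:
            used |= bit(grid[i][c])  # column
    br, bc = 3 * (r // 3), 3 * (c // 3)
    for i in range(br, br + 3):
        for j in range(bc, bc + 3):
            if grid[i][j]:
                used |= bit(grid[i][j])  # 3x3 box
    return ALL & ~used


def fill_single_candidates(grid):
    """Fill cells that have exactly one candidate, until nothing changes.

    Returns the number of cells filled; harder puzzles need more rules
    or backtracking on top of this.
    """
    filled = 0
    progress = True
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if grid[r][c]:
                    continue
                mask = candidates(grid, r, c)
                if mask and (mask & (mask - 1)) == 0:  # exactly one bit set
                    grid[r][c] = mask.bit_length()  # bit d-1 -> digit d
                    filled += 1
                    progress = True
    return filled
```

`mask & (mask - 1)` clears the lowest set bit, so it is zero exactly when one candidate remains; that check and the OR-ing of row, column and box masks are cheap integer operations.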


Why not just

    blocks(Rows, Blocks), maplist(all_distinct, Blocks), maplist(label, Rows)
