Hacker News | anst's comments

Time to go AMD; poor old me on an Intel MacBook Air 2018 (zsh: exec format error, Darwin Kernel Version 22.2.0, macOS Ventura 13.1).


You need to upgrade to zsh 5.9+ or run `sh -c ./llamafile`. See the Gotchas section of the README.


Many thanks! Incredibly versatile implementation.


Definitely. The Invincible would fit as well: https://en.wikipedia.org/wiki/The_Invincible Big time for Lem.


On my old Mac: Python=133.5 s, Numba=2.61 s (parallel prange in count_primes), Taichi=1.8 s (on ti.cpu; it fails with Metal).
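For reference, a minimal pure-Python sketch of what a count_primes benchmark like this typically looks like (this is my reconstruction, not the article's code; the Numba variant would just add `@numba.njit(parallel=True)` and replace `range` with `numba.prange`):

```python
def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes(limit: int) -> int:
    """Count primes strictly below `limit`."""
    return sum(1 for k in range(limit) if is_prime(k))

print(count_primes(100))  # 25 primes below 100
```

The outer loop is embarrassingly parallel, which is why Numba's prange gives such a large speedup over plain CPython.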


In a winner-takes-all finite world a tipping point may never happen...


Yeah, stir up some mysterious ideas and maybe someone sees a meaning in there. Quite surprised by this kind of vague magical thinking coming from LessWrong (I thought they were rationalists or something).


I was surprised too, but it appears that the post has negative votes, so I take it that it wasn't quite rational enough for the rest of the community.


Python + Jupyter is OK, but pandas actually reads everything at once, doesn't it? 100 MB is no problem, but bigger files could result in heavy swapping pressure.
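One standard way around reading everything at once is pandas' `chunksize` option, which turns `read_csv` into an iterator of DataFrames so only one chunk is in memory at a time. A minimal sketch (the in-memory CSV is just a hypothetical stand-in for a big file):

```python
import io
import pandas as pd

# In-memory stand-in for a large CSV file (hypothetical data).
big_csv = io.StringIO("value\n" + "1\n" * 10)

# With chunksize, read_csv yields DataFrames of at most 3 rows each,
# so the whole file never has to fit in memory at once.
total = 0
for chunk in pd.read_csv(big_csv, chunksize=3):
    total += int(chunk["value"].sum())

print(total)  # 10 rows of value 1 -> 10
```

Aggregations that can be computed chunk-by-chunk (sums, counts, filters) work well this way; operations that need the whole table (sorts, joins) are where you'd reach for something like Spark instead.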


I definitely agree that with this amount of data, you should move to a more programmatic way to handle it... pandas or R.

Keep in mind that pandas (and probably also R?) internally uses optimized structures based on numpy. So a 10 GB csv, depending on the content, might end up with a much smaller memory footprint inside pandas.

If you have a 10 GB csv, I think you will be happy working with pandas locally even on a laptop. If you go to csv files with tens of GB, a cloud VM with corresponding memory might serve you well. If you need to handle big-data-scale csvs (hundreds of GB or even >TB), a scalable parallel solution like Spark will be your thing. Before you scale up, however, maybe your task allows you to pre-filter the data and reduce the amount by orders of magnitude... often, thinking the problem through reduces the amount of metal one needs to throw at the problem...


A really interesting collection for teaching Data Science. But the names "WorkshopScipy" and "scientific computation" suggest something like scipy.org, whereas this repo seems to have different interests.


The workshop was focused on using tools in the scipy.org ecosystem. If you take a peek at the slides and code, you'll see that I primarily stick to scipy stuff.


Stranger Things.


What "maths" is Keras? Or scikit-learn? For what it's worth, to understand the scikit-learn docs/tutorials I'd say you'll need Probability, Linear Algebra, Multivariate Calculus and, yeah, Stats. Not necessarily at a PhD level, but still. And the more you understand maths, the farther you can get in AI/ML.


These libraries leave most of the actual day-to-day work to ETL. ETL happens to be highly data- and problem-dependent, so it can't be easily automated or reused. For this reason I think the best asset for being a good applied ML person is a solid programming background. You should have a working knowledge of statistics and linear algebra, but the most useful skill really is being able to write good code. It's different for research, of course.


Those are ML and AI frameworks that use a tremendous amount of mathematics under the hood, but you can also reliably treat them as blackbox learning systems. Understanding the model generation procedure and setup is often unneeded. And many tools will help direct you toward which algorithm makes the most sense for your data, and even hold competitions to figure out which actually works best. I agree, it's a little disappointing, but admittedly it doesn't take a PhD to do this stuff anymore.
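The blackbox point in a nutshell: the entire fit/predict workflow can be used without touching the underlying maths. A toy sketch with a made-up one-feature dataset (not from any real benchmark):

```python
from sklearn.linear_model import LogisticRegression

# Tiny hypothetical dataset: one feature, binary label.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

# fit() hides the optimization, regularization, and linear algebra;
# the user only sees the estimator interface.
clf = LogisticRegression().fit(X, y)
preds = clf.predict([[0.5], [2.5]])
print(list(preds))  # [0, 1]
```

Whether that opacity is a feature or a bug is exactly the debate in this thread: it lowers the entry barrier, but debugging a misbehaving model still tends to require the maths you skipped.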

It is important to note that just because you can do all the stuff a PhD scientist might regularly do doesn't mean that someone will hire you for it. In that case you might need to have a PhD in mathematics, computer science or a related field. But that is more a consequence of competition and long-term talent investment than of the practice of ML/AI itself.


Competition (labor supply side) and ultimate success of current ML approaches.

As the market starts to overheat, it seems there will be a labor shortage (good-quality workers will be scarce) and we'll have to make simple tools for simpletons. But this is all a huge "if". Eventually the market will contract a lot, and slack labor market conditions will have companies hiring them PhDs.


It's not just competition: a clear understanding of what happens under the hood will make you a better user of the tool.


Free will arising from quantum indeterminacy — well, that's an idea of Roger Penrose. The bad news is that even he didn't manage to convince others, and the theory never really took off. In other words, we are not yet ready to understand such a link.

However, there is the Strong Free Will Theorem of Conway and Kochen. Essentially, they prove that "if humans have free will, then elementary particles already have their own small share of this valuable commodity", see http://www.ams.org/notices/200902/rtx090200226p.pdf

