Ask HN: Tutorials written with heavy dependencies
38 points by bilekas on Dec 29, 2022 | 50 comments
I am quite stubborn in a lot of ways, but one in particular: when I'm guiding a team member through something, I like to explain the inner workings so that they come away with a high-level understanding of how things work.

In the last couple of years, I've been dipping my toes into other areas in my 'hobby' time, wanting to know how the things I use and like actually work.

A great example is machine learning: an immediate Google search gets you as far as 'install these 10 libs, then write this'.

When you dig into the open source of those libraries, they're overwhelming, and the documentation is never focused on the underlying functionality, which I personally am giddy to learn.

I find myself resorting to trial and error, which I hate, because the wheel has already been invented. Maybe I'm missing a resource.

It feels like these tutorials are just tutorials for libraries.

I know the source code IS the workings, but is there a resource other than source code that I'm missing?



In many fields you're expected to work at some higher abstraction level above "how things really are working", in which case all the tutorials will use the primitives of that abstraction level.

In this case, if you want to understand how stuff works, you should explicitly look for things that are not labeled "tutorials" - often textbooks will be a decent example, covering the principles and theory behind these abstraction layers which you'll then use in practice.

Like, in ML there are books that work through a basic implementation of all the algorithms using just the matrix multiplication primitives of MATLAB or numpy, and that works well as a learning exercise, but in practice everyone would rather use a highly optimized (but thus more complicated and less understandable) library maintained by others.

Similarly, in cryptography, there are textbooks that will walk you through an implementation of the core algorithms, but again, a tutorial teaching how to do things in practice would not (and definitely should not) cover rolling your own implementations of cryptographic operations, but rather describe how to use a properly verified library.


Which is kind of a shame, to be honest. I would dare say that even x86 assembly is less complicated to understand than many of these libraries, where time is spent trying to memorize their semantics rather than understanding how data is modified and how systems communicate with each other.

It is true that a lot of platform-specific nastiness and complicated math lives in some of these libraries, and a user doesn't need to learn it. But for those who want to learn what's happening under the hood, the code should not needlessly obscure it through complicated syntax, poor interface design, academic naming practices, and big, unwieldy spaghetti classes.


idk man, I already have a couple of hobbies and one serious domain of expertise; I'm just tryna get work done and get paid here. Might be a shame to you, but I'm so grateful I don't have to dig through a compiler manual or whatever to do my job.


Agreed, with the only caveat being that there is a chance the current favorable market conditions for programmers disappear.

There might come a day when companies decide that the value of slapping a few potentially buggy libraries together is not worth $xxx,xxx per year. I for one kind of hope that companies return to buying/contracting well-defined finished products from small external studios rather than throwing money at W2 developers like us. I think the software quality would be higher, and the risk of our jobs being outsourced lower, whether rightfully so or not.


> In many fields you're expected to work at some higher abstraction level above "how things really are working", in which case all the tutorials will use the primitives of that abstraction level.

Great point, I used ML only as an example, GraphQL was another that came up in comments.

Books are a good resource, as is college literature, for example, but from an approachability point of view, without knowing the 'basics' they're hard to consume.


Can you link some examples? Appreciate it.


This is one of the reasons I still use LiveCode despite middling performance, near-zero library support, and a language spec that was reasonable in 2000: it has zero dependencies. You can:

    Go to the web site
    Download a single installer
    Run that to produce a single file executable
    Run the app
    Create a new project (you get a window for free)
    Drag a text box from the tool palette onto the window
    Switch to the arrow tool
    Click the text box
    Type "Hello world"
    Save the project to disk
    Select Standalone Application Settings on the File menu
    Check the boxes to build for MacOS, Windows, and Linux
    Select Save As Standalone Application on the File menu
...and you get single-file executables for three platforms. That was maybe 13 steps from nothing to multiplatform Hello World. It stuns me that other environments/languages make it harder than that.

Many years back I used to do demos for LiveCode at trade shows where I would build a stopwatch timer while holding my breath.

These days it would be much better in so many ways to be working in Python. But the lack of an environment like LiveCode is a major pain point.

https://livecode.com in case anyone is curious.


What stops me from trying LiveCode is that I don't see a way to buy a non-subscription version (would also have to not be significantly hobbled).

All the pricing plans are either grossly expensive (meaning I'd have to already be skilled with LiveCode and have at least one product profitable enough to offset the subscription), or too expensive for what I'd get (too limited a feature set, and everything disables if you stop subscribing…including any created apps).

Things that make me almost want to try it: it looks like HyperCard (script-wise) and a bit like FutureBASIC II (including the Program Generator), you can edit a running program, and it deploys to various platforms.


You probably already know this, but: the most affordable option is $10/month for a single platform. It seems perfectly reasonable for trying out the platform. If the license fees for serious development are a non-starter, then it's probably not worth trying out. But:

There was an open source version at one point that wasn't economically viable, so they discontinued it. In the announcement here: https://livecode.org they say to email their support team if $10/month is too much for you.


Thank you, for the pointer and the followup.


Hey, you're welcome. LC has warts, but when I just want to bang out some code to transform this into that, with a few buttons/dropdowns to play with the way it gets transformed, it's my go-to.

If there's something else better for that purpose I'd love to know about it -- probably would have to be python-based given all the libraries available for it at this point.


I had forgotten there's an effort to maintain the open source version. I don't know how well-maintained it is, but the transition only happened in 2021, so it's probably not too out of date even if nothing has happened since. So how does free sound? http://www.openxtalk.org


I see some active discussions on the org's forum. Thanks!


I remember attending a meetup once that was supposed to be a tutorial on GraphQL. I was barely familiar with GraphQL when I went in, and I left almost exactly the same.

The tutorial was actually all about a couple of JS libraries you could use as GraphQL abstractions. I am a Swift programmer, and I was a lot more interested in the actual GraphQL interface, which was barely mentioned.

A lot of that stuff going around...


You’ve probably figured it out by now, but for others who may be in a similar position; GraphQL is a specification (with various implementations) and you can read up on the spec here: https://spec.graphql.org/


Can you explain it like I'm 5?


Explain the spec, or the tech? Well, the spec basically exists because some people at Facebook noticed that it's more efficient for the client to request exactly the data it needs than to have the server send everything and make the client transform it. Yes, REST can work, but once you have more complex data structures, it becomes harder.

For example with a todo list app, if you want a user's todos that are complete, your request might look like

    /api/todos/$userId/$completionStatus
Now say you want the todo titles, completion status, and description, but not a field like subtitle. You could send the entire todo and let the client extract the necessary fields, but why? That's extra data sent over the wire and extra computation on the client, for basically no real reason. So maybe in your API you do something like

    /api/todos/$userId/$completionStatus?fields=title,completionStatus,description
Congratulations, you've reinvented GraphQL.
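For a concrete flavour of that, the equivalent request in GraphQL might look like the query below. The schema, field names, and argument names here are illustrative assumptions, not from any real API; the point is that the client names exactly the fields it wants and nothing else is sent.

```graphql
query {
  todos(userId: "42", completed: true) {
    title
    completed
    description
    # no subtitle requested, so the server never sends it
  }
}
```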


GraphQL is a spec, not tech. An implementation of it (Apollo, for example) is probably what you're after. GraphQL can sit in front of a multitude of services that support queries following the GraphQL spec, much like SQL: many databases support SQL (in various forms).


Raison d'être: https://graphql.org/

It assumes you’ve done some API design, maintenance, and/or usage, in order to grasp the benefits.


Please... I've already come to believe that GraphQL is black magic. I use it daily.

You do raise another great example with GraphQL & SPARQL... I gave up trying to find any low-level documentation on it because it was so abstract compared to anything I knew before, so I just learnt the triples and felt comfortable querying 'enough'. I use a library now, https://github.com/giacomociti/iride, because I know the owner and can bug him. Other than that it was a black box.


The source code of the black box is open, you can read it. And its tests.


OP, an analogy for your question/concern:

You want to add 2 numbers.

So, I hand you a calculator.

That calculator has about 1000 dependencies.

I didn't hand you petroleum oil and copper wire, and say "Oh, first, do all these prerequisite processes to manufacture the inputs you'll use to build a calculator. Next, build a calculator. Ok, now you can add two numbers together"

(We could even go a step further back: I hand you some steel components, a forge, and some people (labor and engineers), and give you a tutorial on how to build a foundry to create the parts for an oil rig, and components for bulldozers, which you then manufacture and use to mine copper ore, which you then process into wire... and so forth.)

Do you want to:

- build a foundry to build tools for mining

- mine minerals (iron for tools to make other tools with, oil for plastic, copper ore for wires, etc.)

- build a factory for making calculators

- manufacture a calculator

Or do you want to

- Take this calculator, and add 2 + 2?

Perhaps you'd rather build the tools. Or perhaps you'd rather use the tools to solve a business problem.

Personally, I'm the developer who prefers to solve the business problem, as business strategy & product management interests me more than hardcore science/engineering.


Factorio addicts might disagree


Machine learning is an enormous field, and if you are after an explanation/exploration from the ground up, then Kevin Murphy's Probabilistic Machine Learning: An Introduction is very good, but a bit of a tome.

If instead, you want to focus on neural networks, I found Michael Nielsen's Neural Networks and Deep Learning an excellent resource for implementing them from first principles (available at http://neuralnetworksanddeeplearning.com/).
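In that first-principles spirit, a single logistic neuron trained by gradient descent fits in a dozen lines of numpy. This is a minimal sketch, not from the book; the AND task, learning rate, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

# Truth table for AND: the kind of linearly separable task a lone neuron can learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.zeros(2), 0.0
for _ in range(5000):
    p = sigmoid(X @ w + b)            # forward pass
    grad = p - y                      # dLoss/dz for the cross-entropy loss
    w -= 0.5 * (X.T @ grad) / len(y)  # gradient-descent update for the weights
    b -= 0.5 * grad.mean()            # and for the bias

print((sigmoid(X @ w + b) > 0.5).astype(int))  # → [0 0 0 1]
```

Everything a framework hides (the forward pass, the gradient, the update rule) is visible here, which is exactly what makes resources like Nielsen's book useful.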


This may not be a sexy suggestion, but one example I can think of is Tom Mitchell's Machine Learning book. It gives the bare bones of what machine learning is (albeit missing some of the newer architectures), and it's a great overview of how to build these architectures from scratch.


> Machine Learning: A multistrategy approach added to cart

Thanks


I can certainly relate. I used to follow ML tutorials and such near the end of high school, but I just could not shake the feeling that I didn't know a lot of what was going on.

It's been 2.5 years into college, and yeah, I did slack off at times (college standards here are not particularly rigorous), and I am nowhere near capable of reasoning about numerical-computation considerations, statistical methods, hardware-level architecture for scientific computation, etc. There's so much that goes into it; there are so many layers, components, theories, etc. that you get lost very quickly.

In the end, all you can really do is:

1. Grab a statistics book and get into ML theoretically.

2. Learn about numerical computation. I particularly enjoyed the Handbook of Floating-Point Arithmetic, though I never really finished it.

3. Libraries contain a lot of optimizations specific to the underlying architecture; in fact, I remember someone giving a numpy demonstration who hit an error that came from Windows itself :) You will see a lot of such exceptions in the code too... I don't know what to recommend here really. Just read more?

4. Documentation can be wrong at times, or fail to mention assumptions, or might not exist at all. Software never saw the massive adoption of rigorous frameworks that many other disciplines did, and soon got surrounded by business needs and customer complaints. But if you can somehow get insight into the philosophy and the values people put into their code, it's a huge help, imo. Books provide that at times; for example, I was struggling with the SYCL specification, so I grabbed "Mastering DPC++", though it also assumes a bit of experience.


For ML not in Python: maybe these videos will help [0]; the datasets are here [1].

The author walks through the basics (linear regression, naive Bayes, etc.) using Julia. The parameters and output are better explained than in the Python equivalents I have found.

[0] https://www.youtube.com/playlist?list=PLhQ2JMBcfAsi76O13sJzk...

[1] https://github.com/fabfabi/julia4ta_tutorials/tree/master/Se...


In general, most documentation is infamously:

1. outdated, and filled with deprecated syntax or abandoned bugs

2. version dependent, and thus pointless to read or write

3. platform dependent, and thus also falls under point #1 or #2 with time

4. poorly written, as most rarely read/update the documentation

While tools like Doxygen attempt to fill the reference holes, in general a lot of effort goes into creating unit tests/examples of how software should be integrated.

The experience can be unpleasant if you are new to a large library. The major downside of open-source projects is usually the dismissive RTFM attitude, as many people are not there to provide "free" support but rather to solve/share their own use cases. If you can show a unit test that highlights a specific issue, then there may be mutual interest from project members... but you need to prove you are not asking people to search Google for you.

In general, if you are at the "trial-and-error" stage, then looking at another active, well-documented project that uses the same key library in a similar use case to your own will often be faster (example: searching deprecated kernel API calls in utilities for compatibility details).

Another point I will mention is thinking about the long-term sustainability choices for a project. In general, splitting dependencies into small piped/IPC/RPC/MPI/AMQP utilities is wise. That way, when someone unwisely permutes a library API, as they often do for various reasons (rarely good ones for a shared object), the affected area needing maintenance is minimized (i.e., the next person only needs to read 3 or 4 familiarly structured documents to securely refactor the module).
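As a toy illustration of that splitting (the function name and field names are invented for this sketch), one such small utility might read CSV on stdin and emit JSON lines on stdout, so the next tool in the pipe depends only on a simple text format rather than on any shared library:

```python
#!/usr/bin/env python3
# A minimal piped utility: CSV rows in, one JSON object per line out.
import csv
import json
import sys

def csv_to_jsonl(src, dst):
    reader = csv.DictReader(src)        # first CSV row is treated as the header
    for row in reader:
        dst.write(json.dumps(row) + "\n")

if __name__ == "__main__":
    csv_to_jsonl(sys.stdin, sys.stdout)
```

Chained with ordinary shell pipes, each stage stays small enough to read and replace independently.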

If you are more interested in the algorithmic side, then an optimized library is probably the wrong place to start. Rather, pull the published paper(s) for the algorithm and look at the published history (the datasets and ROC curves especially detail what to expect). Prototyping languages like Python/Julia/Octave/MATLAB are often the languages of choice in this area.

Best of luck =)


There is a source you may be missing: the human factor. Try talking to people involved with the project and ask them for any insights they're willing to share.


Absolutely, 100%. I go to any meetups or conferences around my area, and I have friends with a similar mentality, trying to figure it out together.

I maybe should have rephrased the question.


> is there a resource other than source code I'm missing ?

First, you should understand the math needed to implement the algorithms; then you should learn the algorithms. You may "get" the code, but you will never understand it if you don't understand the mathematical objects it represents.

Simple example: linear regression. We can compute the solution without iterative improvement, unlike in neural networks. Why? How do we actually code that? What is the pseudo-inverse we are computing doing there?

From there, what's the relationship between that and Newton's method for numerical optimisation? Why does alternating least squares even work?

The code will _never_ explain the underlying mathematics, it can only represent them.
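The closed-form solution mentioned above can be sketched in a few lines of numpy. This is a minimal illustration, with made-up toy data and noise level:

```python
import numpy as np

# Toy data from a known line y = 2x + 1, plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 2.0 * x[:, 0] + 1.0 + 0.1 * rng.normal(size=100)

# Design matrix with a bias column of ones.
A = np.hstack([x, np.ones((100, 1))])

# No iterative optimisation needed: the least-squares solution is
# w = A^+ y, where A^+ is the Moore-Penrose pseudo-inverse.
w = np.linalg.pinv(A) @ y
print(w)  # approximately [2.0, 1.0]
```

The code is one line of `pinv`, but knowing why the pseudo-inverse minimises the squared error is exactly the mathematics the line itself cannot explain.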

I am not really sure what you are expecting from such a vast field. I recommend reading The Elements of Statistical Learning, or Bishop's book, or Murphy's Probabilistic ML.


I think this is partially the flip side of open source as a semi-commercial activity.

What I mean is: nothing in this world is free, and when you make something, you need some closed loop to make it good and to retain people (consumers and developers) around your project.

The next step in this logic is a strong tie between tutorial and library, so that the tutorial in reality works as an ad for you.

I must admit it is possible to rise above this primitive scheme, but for that you must be magnitudes better than the competitors in your niche.


Fast.ai courses are great resources for learning how modern ML/DL models work under the hood:

https://course.fast.ai/


What about Data Science from Scratch, Deep Learning from Scratch, and similar books, where they just use numpy or something and do everything mostly from first principles?


Nice options. As a hobbyist just dipping my toes in, I like not paying for anything before I know whether I like it or not. That said, I did buy a book on PLCs while I was just starting, because I had one to play with: Fundamentals of Programmable Logic Controllers. It felt like playing a video game before googling the answers.

Loved every min.


For ML tutorials, I feel like I do see quite a lot written in pure numpy. A fun thing I've been doing with ChatGPT is asking it to show me starter code for some topic. Usually it shows code that imports sklearn, and then I follow up asking it to rewrite everything without those dependencies. It's worked out pretty well so far.


You could patch together some of the library tutorials that relate back to the first tutorial that encompasses them all:

"Okay, so you've installed the 10 libraries and seen how powerful they can be together; here are individual things you can do per library used."


95% of programming is plumbing.


Plumbing is fine but knowing how water flows is more important.


Perhaps this is a meaningful distinction between "software engineering" and "programming".


Better learn the other 5% because AI may be coming for the programmer-plumbers' jobs real soon.


> A great example is the machine learning: An immediate google gets you as far as 'install these 10x libs' then write this.

This is because ML mostly uses Python, and Python is an absolute clusterfuck of dependency hell. https://xkcd.com/1987/


I was an HPC system admin for years, and I still avoid Python because of this, and because of having to manage 2-3 different versions for everyone who never updated their scripts. I hated it, and to this day I do my best to write my own support routines in my programs to minimize pulling in anything but the most common libraries or packages.


You're lucky not to have dealt much with the JS world, especially Node, though React etc. are affected too.

That's real hell. I have not seen a single real project without a few totally outdated libraries that change everything if you ever decide to update.


> ML mostly uses Python

That's maybe a symptom and not a cause. I preferred R (or even MATLAB) for data-intensive work, but I don't know them as well, because all the tutorials are in Python and there are no low-level ones.


How do you get started with Python dependencies? I've been trying to play with some scientific code and never managed to get it to run or install its dependencies.


Read about virtualenv, or one of its analogs.

Yes, the answer is that you run your scientific projects in a chroot-like environment with its own Python and libraries, and connect to the outside world via HTTP/JSON or, in some cases, CSV files (pipes).

To run with pipes, you can use a third-party user-space program which calls the different Python envs (fortunately that's easy: just find the interpreter inside the venv directory tree and use it directly, instead of plain python) and feeds them data.

And yes, sometimes this becomes a real pain, because we are humans, not machines, and it's easy to make a mistake when working with two such similar envs as typical Python setups.


You could also share data via a database, but I think it's not always good to add (usually) heavyweight DB libraries to a venv; pipes are lightweight, and CSV/JSON/YAML are not heavy either.


If you are not using Windows you can use:

pip: https://pypi.org/project/pip/

pipenv: https://pipenv.pypa.io/en/latest/

poetry: https://python-poetry.org/

Which are basically the same thing.
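A minimal sketch of the usual workflow with Python's built-in venv module (the directory name .venv and the numpy install are arbitrary examples):

```shell
# Each project gets its own isolated environment, so dependency
# versions never collide across projects. Assumes a POSIX shell
# and Python 3 on PATH.
python3 -m venv .venv                # create the environment
. .venv/bin/activate                 # activate it
pip install numpy                    # installs into .venv only, not system-wide
python -c "import numpy; print(numpy.__version__)"
deactivate                           # leave the environment
```

pipenv and poetry wrap this same idea with lockfiles and project metadata on top.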



