
Lately, I've been studying machine learning from point zero, with a focus on time series analysis. Two months in, I've already completed a course on Python and a book on Pandas. Several hundred hours later, in the fourth chapter of a book I paid for on deep learning and time series analysis, they finally gave me the most important piece of information I needed: there is no evidence that deep learning works better than traditional statistical analysis using classical methods like SARIMA and ETS. Sure, that's great if you're an academic interested in theory and hoping for a breakthrough, but the rest of us doing applied work should stick with the classical methods.

I was going to write a lot here but I'll keep it short.

What I discovered is that everything I want to do can best be done in PostgreSQL. It's one thing to do data analysis in a Python notebook and another to do it in an environment that works dynamically on a server. My first plan was to do the heavy lifting in Python with Numpy, Pandas, and machine learning, and have the node server -- instead of Django; if I'm learning a new web framework it will be Phoenix -- execute the Python scripts through stdin / stdout. Since starting, I've learned that I don't need machine learning and that I can do the calculations inside PostgreSQL, sometimes orders of magnitude faster than in Python.
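To give a flavor of what I mean, here's a minimal sketch of doing the aggregation in the database instead of round-tripping through Pandas. The readings(ts, value) table is hypothetical:

    -- Daily summary statistics computed entirely in PostgreSQL,
    -- the kind of resampling I used to reach for Pandas for.
    SELECT date_trunc('day', ts) AS day,
           avg(value)            AS daily_mean,
           stddev_samp(value)    AS daily_stddev
    FROM readings
    GROUP BY 1
    ORDER BY 1;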

I'm using TimescaleDB, which provides the time_bucket() function for PostgreSQL and, with chunking, should scale very well. First I tried to integrate it with Prisma in node, but that proved far too difficult and convoluted. I reverted to TypeORM in node, and it was extremely easy to run all the boilerplate that initializes the TimescaleDB extension inside migrations; it would probably be just as easy in another framework like Phoenix with Ecto. Sometimes I use SQL queries in a string literal, and other times I use the query builder for more dynamic interaction with the database and to leverage TypeORM's features beyond being just a connection manager.
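The migration boilerplate is really just a couple of raw SQL statements, with time_bucket() used in queries afterwards. A sketch, with a hypothetical metrics(ts, device_id, value) table:

    -- Run once in a migration: enable the extension and convert
    -- the table into a hypertable chunked on its time column.
    CREATE EXTENSION IF NOT EXISTS timescaledb;
    SELECT create_hypertable('metrics', 'ts');

    -- time_bucket() then groups rows into fixed-width intervals,
    -- e.g. hourly averages per device.
    SELECT time_bucket(INTERVAL '1 hour', ts) AS bucket,
           device_id,
           avg(value) AS hourly_avg
    FROM metrics
    GROUP BY bucket, device_id
    ORDER BY bucket;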

What I discovered (and interestingly, someone shared a popular link to a blog post on the subject just yesterday[0]) is that for most time series analysis, Pandas isn't required and perhaps isn't the fastest solution either. Grokking window functions was a little difficult until I found this lecture on YouTube, Postgres Window Magic[1]. Leveraging and understanding window functions in SQL is probably the most important skill to have.
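As an example of why, here's a trailing seven-row moving average, the sort of thing I would previously have used Pandas' rolling() for (same hypothetical readings table as above):

    -- Average over the current row and the six before it,
    -- computed with a window function instead of Pandas.
    SELECT ts,
           value,
           avg(value) OVER (ORDER BY ts
                            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
             AS moving_avg
    FROM readings
    ORDER BY ts;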

I don't need Python and Pandas for time series analysis. With TimescaleDB and some deeper knowledge of PostgreSQL, I can do time series analysis on the same infrastructure I've been using for the past several years.

[0] https://hakibenita.com/sql-for-data-analysis

[1] https://www.youtube.com/watch?v=D8Q4n6YXdpk



Yeah, I didn't fully understand the problem of time series forecasting. Looks like I'll probably need C or Python bindings / a bridge to do the computation.
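If it helps, PL/Python is one such bridge: it runs Python inside Postgres itself. A minimal sketch, assuming the plpython3u extension is available (it ships with most Postgres distributions but has to be installed explicitly); py_mean is a purely illustrative name:

    -- Define a Python function callable from SQL; 'vals' arrives
    -- as a Python list.
    CREATE EXTENSION IF NOT EXISTS plpython3u;
    CREATE FUNCTION py_mean(vals double precision[])
    RETURNS double precision AS $$
        return sum(vals) / len(vals)
    $$ LANGUAGE plpython3u;

    -- Usage: SELECT py_mean(array_agg(value)) FROM readings;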


> Since starting, I've learned that I don't need machine learning and that I can do the calculations inside PostgreSQL, sometimes orders of magnitude faster than in Python.

Remember that you're in a unique position where you know both the ML application and the specialized PostgreSQL to implement it.

The market is paying big bucks for people who have either of those skills. If you're making less than $300k/yr (at the very least), move on now ;)


SQL devs make >300k/yr..?

Also, what does "knowing ML" really even mean?


I prefer 'esoteric' programming languages too; as languages, they're better. But experience has taught me that the ecosystem matters way more for shipping stuff, so I stick mostly to mainstream ones.



