No. All the textbooks know that polynomials of high degree are *numerically* dan...

heisenzombie · 2025-04-22T21:53:45 1745358825

He seems reasonably explicit about this:

""

This means that when using polynomial features, the data must be normalized to lie in an interval. It can be done using min-max scaling, computing empirical quantiles, or passing the feature through a sigmoid. But we should avoid the use of polynomials on raw un-normalized features.

""

constantcrying · 2025-04-22T22:22:07 1745360527

No.

This paragraph has nothing to do with numerics. It is about the fact that continuous functions can not be approximated globally by polynomials. So you need to restrict to intervals for reasons of mathematical theory. This is totally unrelated to the numerical issues, which are nowhere even acknowledged.

vient · 2025-04-23T01:02:41 1745370161

But what's the point in acknowledging numerical issues outside of [-1,1] if polynomials do not even work there, as author explicitly notes?

constantcrying · 2025-04-23T06:05:26 1745388326

All polynomials "work" globally. That some polynomials form an orthonormal basis over certain intervals is essentially irrelevant.

The author does not address the single most important reason why high degrees of polynomials are dangerous. Which is pretty insane to be honest, obviously to be honest you have to at least mention why people tell you to be cautious about high degree polynomials AND point out why your examples circumvent the problem. Anything else is extremely dishonest and misleading.

sheepscreek · 2025-04-23T02:17:46 1745374666

Is there a mathematical proof or basis to back what you’re saying?

constantcrying · 2025-04-23T06:06:51 1745388411

That polynomials do not approximate continuous functions globally??

That is pretty obvious. Consider that every polynomial grows arbitrarily large, but not every continuous function does.

gthompson512 · 2025-04-23T20:14:23 1745439263

There is a simple corollary to Stone-Weierstrass that extends to infinite intervals, but requires the use of rational functions.

toxigun · 2025-04-23T09:01:36 1745398896

Actually they aren't. You never compute high powers of the argument when working with specialized bases.

You use the recursive formula that both the Bernstein basis and the orthogonal polynomial bases are endowed with. This is implemented in numpy, so you don't have to do anything yourself. Just call, for example, np.polynomial.legendre.legvander to get the features for the Legendre basis.

And a basis orthogonal over [-1,1] is easily made orthogonal over arbitrary interval. Take p_i to be the i-th legendre polynomial, then the basis composed of q_i(x)=p_i(2(x-a)/(b-a)-1) is orthogonal over [a,b]. Each q_i is itself a polynomial of degree i, but you never use its coefficients explicitly.

There is an entire library for computing with polynomial apptoximants of functions over arbitrary intervala using orthogonal polynomials - Chebfun. The entire scientific and spectral differential equations community knows there are no numerical issues working with high degree polynomials over arbitrary intervals.

The ML community just hasn't caught up.

constantcrying · 2025-04-23T09:28:32 1745400512

>The entire scientific and spectral differential equations community knows there are no numerical issues working with high degree polynomials over arbitrary intervals.

This is totally wrong. Of course there are enormous numerical problems with high degree polynomials. Computing large powers of large numbers is enormously unstable and needs to be avoided under basically all circumstances, that is what makes dealing with those polynomials difficult and cautioning people against this is obvious correct.

What you described are the ways to deal with those problems. But this isn't what the article does. My problem with the article is the following:

- Does not mention the most important issue with high degree polynomials, namely their numerical instability.

- Gives examples, but does not explain why they circumvent the numerical problems at all. The most important choice made (the interval of 0 to 1) is portrayed as essentially arbitrary, not an absolute necessity.

- Concludes that there are no problems with high degree polynomials, based on the fact that the experiments worked. Not based on the fact that the actual issues were circumvented, leaving the reader with a totally wrong impression of the issue.

This is terrible scholarship and makes for a terrible article. Not acknowledging the issue is a terrible thing to do and not explain why seemingly arbitrary choices are extremely important is another terrible thing to do. The whole article fails at actually portraying the issue at all.

To be clear, I am not saying that this approximation does not work or that with appropriate scaling and the right polynomials these issues can't be mostly circumvented. Or that high degree polynomials "in general" are incalculable. I am saying that this article completely fails to say what the issue is and why the examples work.

toxigun · 2025-04-23T12:02:04 1745409724

I believe the author assumes that it's clear to the reader that there is a distinction between how a mathematical object is defined, and how it's computationally used. A polynomial can be defined as a power series, but it's not how they are computationally used. In this sense, the author was mistaken.

But it's not that the problems are "circumvented", in the sense that it's a kind of a hack or a patch, but they are solved, in the sense that there is a systematic way to correctly compute with polynomials.

constantcrying · 2025-04-23T12:35:00 1745411700

>But it's not that the problems are "circumvented", in the sense that it's a kind of a hack or a patch, but they are solved, in the sense that there is a systematic way to correctly compute with polynomials.

But the author does not even acknowledge that there is a wrong way. He claims that it is just a "MYTH" that polynomials have huge problems.

Read the article, nowhere does the author acknowledge that these numerical issues exist at all. Nowhere does the author talk about why the specific methods he uses work or even acknowledge that had he used a naive approach and written the polynomials as a power series over a large interval everything would have fallen apart.

Surely the "MYTH" that high degree polynomials are dangerous is not a myth. Concrete examples where naive approaches fail are trivially easy to find.

Again. I think you are missing the point entirely. I am not saying that these Fourier type projections are "wrong" or "bad" or "numerically unstable" or that you can't write an article about those or that high degree polynomials are impossible to work with.

I am saying that if you claim that it is a "MYTH" that high degree polynomials are dangerous, you should mention why people think that is the case and why your method works. Everything else seems totally disingenuous.

sevensor · 2025-04-23T13:39:27 1745415567

> Computing large powers of large numbers is enormously unstable and needs to be avoided under basically all circumstances, that is what makes dealing with those polynomials difficult and cautioning people against this is obvious correct

But we don’t compute large powers of large numbers? Chebyshev is on [-1, 1]. Your large powers go to zero. And your coefficients almost always decrease as the degree goes up. Then to top it off, you usually compute the sum of your terms in descending order, so that the biggest ones are added last.

constantcrying · 2025-04-23T13:55:38 1745416538

Could you please read the post before commenting?

"To be clear, I am not saying that this approximation does not work or that with appropriate scaling and the right polynomials these issues can't be mostly circumvented. Or that high degree polynomials "in general" are incalculable. I am saying that this article completely fails to say what the issue is and why the examples work."

hervature · 2025-04-23T00:33:36 1745368416

Neural network training is harder when the input range is allowed to deviate from [-1, 1]. The only reason why it sometimes works for neural networks is because the first layer has a chance to normalize it.

jansan · 2025-04-23T05:40:07 1745386807

If you have a set of points, why can't you just map them to an interval [0, 1] before fitting the polynomial?