How to compare floats in Python (davidamos.dev)
137 points by EntICOnc on March 30, 2022 | hide | past | favorite | 86 comments



Either people are doing very strange things with floating point numbers or this is generally pretty poor advice. You should not generally try to ham-fistedly pretend that approximate floating point operations are producing exact results. You should not generally be looking for exact equality of floating point values except in the case where you have a justified reason to believe that a floating point computation would arrive at an exact result. You should not generally be adding approximate equality terms to your directional comparisons because all that does is switch the precise point at which you are dividing the input space, except now it is at a location that isn't consistent with the opposite test and isn't specified explicitly in the program text. You also shouldn't treat exact floating point values as if they are approximate—floating point is not approximate, some floating point operations are (conditionally) approximate. A value of exactly 1.0, say from the result of min(1.0, f), is not going to arbitrarily change to a different value.

Decimal is also not a solution to floating point approximation. Decimal has all of the same rounding properties that binary floating point has, and is actually a bit less well behaved. The differences are that Decimal round-trips decimal text representations exactly, and that Decimal supports an arbitrarily large, user-configurable precision.

There are cases where one does need to check for convergence to arbitrary values, but these are probably not what you are doing, and if they are you should probably know more about the precise numerical guarantees that you have.


Decimal representations have one really minor advantage: they behave more the way people expect with respect to which numbers are exactly representable especially in the size ranges of every-day numbers, since they exhibit mostly the same behavior as calculators do.

Plenty of people find that 0.2 not being exactly representable in binary floating point is not intuitive.
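A quick illustration of the mismatch between binary floats and calculator intuition:

```python
from decimal import Decimal

# 0.2 has no finite binary expansion, so the float literal is already rounded:
print(0.1 + 0.2 == 0.3)   # False
print(f"{0.2:.20f}")      # 0.20000000000000001110

# In decimal floating point, 0.2 is exact, matching calculator behavior:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```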


IEEE-754 allows "decimal floats". Basically, instead of 2^exponent, it's 10^exponent. `decimal32`[0], `decimal64`[1], and `decimal128`[2] are defined. But I'm not aware of any popular system that implements them.

[0]: https://en.wikipedia.org/wiki/Decimal32_floating-point_forma...

[1]: https://en.wikipedia.org/wiki/Decimal64_floating-point_forma...

[2]: https://en.wikipedia.org/wiki/Decimal128_floating-point_form...


> Decimal has all of the same rounding properties that binary floating point has

This is not true.

> Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem

> Both binary and decimal floating point are implemented in terms of published standards. While the built-in float type exposes only a modest portion of its capabilities, the decimal module exposes all required parts of the standard. When needed, the programmer has full control over rounding and signal handling. This includes an option to enforce exact arithmetic by using exceptions to block any inexact operations.

https://docs.python.org/3/library/decimal.html


I think they mean in the abstract. With a fixed precision, you have round-off error as long as you only have a fixed number of digits, and thus have all the attendant issues in computations with them.


> Decimal is also not a solution to floating point approximation.

Then what is the solution?

As someone who's worked in finance, where floating point approximation led to being chased by auditors over a 1-cent error that somehow crept in over millions of operations, what other choices are there?

Floating point introduces the worst sort of error. If you are expecting exact results, floating point gets it right with almost no effort 99.999% of the time. The remaining 0.001% of the time it gets it wrong, seemingly at random. A heisenbug, in other words. But if you force programmers to use integers, all those missing fractions of cents suddenly become glaringly obvious.

Decimals / fixed point / integers don't "solve" the approximation problem. They just make it plain, forcing programmers to think about it, code for it, and test for it. If you care about what happens to every last bit, then there is no other solution.

Or to put it another way: if you don't particularly care what happens to the least significant bit, then fine, use floating point. Problems happen because people really do care, but are seduced by floating point's 99.999% success rate and use it anyway. Hint: if you are comparing for the equality of two numbers computed in two different ways, you absolutely care.
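A sketch of the contrast described above, accumulating a tenth of a dollar a million times (the exact drift digits depend on the rounding path, but the drift itself is guaranteed):

```python
# Float quietly drifts; integer cents stay exact, so any discrepancy
# is impossible to miss rather than hidden in the last bits.
total_float = 0.0
total_cents = 0
for _ in range(1_000_000):
    total_float += 0.10
    total_cents += 10  # work in cents

print(total_float)        # 100000.00000133288 (or similar drift)
print(total_cents / 100)  # 100000.0
```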


I should have specified ‘decimal floating point’ for clarity, since that is what Python's Decimal is. I agree that non-floating representations often are the solution, depending on what problem you have.


>You should not generally be looking for exact equality of floating point values except in the case where you have a justified reason to believe that a floating point computation would arrive at an exact result.

I mean, yes, but there are plenty of cases where a result would be equal from an analytic perspective, but the implementation never realizes this due to floating point arithmetic.

>You also shouldn't treat exact floating point values as if they are approximate—floating point is not approximate, some floating point operations are (conditionally) approximate. A value of exactly 1.0, say from the result of min(1.0, f), is not going to arbitrarily change to a different value.

You're not wrong, but I think you also need to be careful here in how you communicate this. I've encountered plenty of situations where low-level details of floating point arithmetic mattered because it was only (in a sense) approximate, and I've even gotten non-deterministic behavior out of what should have been a deterministic program due to floating point issues.


I agree that you need to be cautious asserting that you know the exact value of a floating point value, because it isn't a property you get by default.

Generally, though, if it isn't fairly obvious that equality should be preserved, and if you don't have exactness as a requirement, then you probably shouldn't be building your solution around equality checks at all.


> You also shouldn't treat exact floating point values as if they are approximate—floating point is not approximate, some floating point operations are (conditionally) approximate.

Depending on your domain, they usually are. Perhaps your floating point number represents a reading from a sensor which has much less precision than your floating point type.

It would be more correct to model all of your approximate variables with error bars (and perhaps even error distributions) but in practice, treating the closest floating point number as a point estimate and doing all your comparisons to within some reasonable tolerance may be an acceptable approach.
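In Python terms, that might look like comparing with an explicit, domain-derived absolute tolerance (the ±0.5 sensor accuracy here is a made-up spec for illustration):

```python
import math

# Two readings from a sensor accurate to +/- 0.5 units: compare with an
# absolute tolerance derived from the sensor spec, not a default epsilon.
reading_a = 20.3
reading_b = 20.6

print(math.isclose(reading_a, reading_b, abs_tol=0.5))  # True: same within sensor accuracy
print(math.isclose(reading_a, reading_b))               # False with the default rel_tol=1e-09
```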


This is a completely legitimate concern but it isn't about floating point. You could be getting integer readings from your sensor and you would have exactly the same concern. In such cases you need to provide tolerances in your code that correspond to the tolerances of your measurement. You might need to implement hysteresis, you might need to take multiple samples and track moving averages, you might need to take empirical measurements of the variance you expect and suppress readings below a threshold. It is very rare however for the solution to just be to say two values compare equal if they are within a billionth part from the other.


But why would you need to compare such measurements for equality with an arbitrary tolerance? If your algorithm is doing something like producing a yes/no answer from some measured data then it should be obvious what tolerance to use according to the domain. I would code that explicitly using inequalities instead of depending on some mystery default epsilon.


You’re right here, I think, but it's a hard problem in practice, I've noticed. Hard in the engineering sense, because you’re sometimes writing generic library-level code that needs a domain-specific notion of error bars and tolerances. So you need to parametrize everything, sometimes across multiple dimensions. It becomes messy quickly.

I must admit putting a mystery epsilon in my code here and there for that reason. Admittedly the wrong thing to do.


FWIW, I thought the advice in the article was generally correct (at least I couldn't spot anything I knew to be wrong). I agree that, if you actually need exactness and can spare the extra CPU cycles, floats are not your best bet (e.g. banking, accounting). But for the use case:

> There are cases where one does need to check for convergence to arbitrary values [...]

the advice in the article doesn't strike me as wrong per se. If you're implementing a numerical algorithm for some reason and you want to check that it's well behaved, adding a couple of tests where you check for approximate equality of some results as detailed in the article seems fine to me. (Of course, ideally you would additionally also formally prove the numerical stability of that algorithm.)

I guess the article is maybe missing one crucial piece of advice, namely that floating point computations are not guaranteed to be well-behaved. If you don't know much about floating point computations and just read this article, you might be tempted to think that every computation involving floats should lead to a result that is "close" to the "real" result. But that's not true, numerical algorithms can be poorly behaved and error terms can multiply, leading to catastrophically wrong results.
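A one-line example of how individually well-rounded steps can still produce a catastrophically wrong result:

```python
# The true value of (1e16 + 1) - 1e16 is 1.0, but at magnitude 1e16 adjacent
# doubles are 2.0 apart, so 1e16 + 1.0 rounds back to 1e16 (round half to even)
# and the subtraction yields 0.0: a 100% relative error from exact operations.
x = (1e16 + 1.0) - 1e16
print(x)  # 0.0
```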


> I agree that, if you actually need exactness and can spare the extra CPU cycles, floats are not your best bet (e.g. banking, accounting).

You are misunderstanding me, this is not what I was saying at all. I think much of the time when people feel like they should use floats they can actually use floats, yet only in very rare, niche scenarios should math.isclose be used.

Using exact equality comparisons with floating point numbers, while fairly uncommon, seems a lot more frequently the right thing to do than math.isclose: for example, sorting algorithms, hashmaps and caches, checking for equality between structs, sentinel values.

Even in most cases where approximate equality is wanted, like maybe you are snapping a position to another position, or maybe you are checking that your new audio processing algorithm is producing noise below a noise floor, even then I would hesitate to use math.isclose, because it still probably isn't a good fit for the domain, particularly with it having an awkward default relative error term.


> Using exact equality comparisons with floating point numbers, while fairly uncommon, seems a lot more frequently the right thing to do than math.isclose: for example, sorting algorithms, hashmaps and caches, checking for equality between structs, sentinel values.

I mean, math.isclose() doesn't even form an equivalence relation, so of course it doesn't make sense to base struct or hash value equality off of it.

I don't think anyone was suggesting to use approximate equality as a form of actual equality. The only reasonable equality for floats is exact equality, yes. However, it's also an equality which is useless for calculation.

> I think much of the time when people feel like they should use floats they can actually use floats

I disagree, many LOB applications deal almost exclusively with discrete settings, and in such cases floats are generally not appropriate. Money, for example, should generally not be represented as a float, unless you're doing e.g. financial mathematics on it, estimating return rates and so on (and then with the understanding that these calculations won't be exact).


I think most people trying to losslessly represent money or similar discrete quantities already have the intuition that you shouldn't use binary floating point, and those that don't are not well served by telling them to math.isclose it, which is at best a fragile bandaid.


I sadly don't think the first part of your sentence is true, from what I've seen, but yes, definitely agree about the second part.


What a strange comment to say "A value of exactly 1.0, say from the result of min(1.0,f), is not going to arbitrarily change to a different value." Christ man, floating point arithmetic runs on computers; it's definitely deterministic, and no one is saying it's indeterminate, lol, including the author. It's not like we're off in LSD space where numbers change meaning from moment to moment; it's only that information is lost in each operation, leading to accumulated error, such that exact comparisons usually do not express what you mean to compare in computer programs. That's all, and I don't think the OP is saying anything else.

Other than that I'm not sure what you're adding. The point is that people write unclear code because they do not understand floats, and because few people really understand round-off error, they should as a rule compare with np.isclose unless, as you say, they know for a fact they expect a specific floating point number (although the times that happens are far and away the minority in actual code).


In spirit you are right.

Alas, floating point arithmetic is not necessarily deterministic in the sense that running the same Python or C code twice is guaranteed to give you the same results.

First, your compiler is often allowed to do weird things. (Like compile the same code in different ways.) And there's some weirdness with the state of flags that might be different from iteration to iteration.

See https://randomascii.wordpress.com/2013/07/16/floating-point-... for some more information.


Yeah, pretty much the only case I can think of when you'd want to check floating points for equality is for bug hunting e.g. if you calculate a set of disjoint probabilities you might want to verify that they add up to 1.
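For instance, in Python, even this sanity check needs an approximate comparison:

```python
import math

# Ten disjoint outcomes of probability 0.1 each: the float sum misses 1.0
# exactly, but an approximate check still catches genuinely broken inputs.
probs = [0.1] * 10
print(sum(probs))                     # 0.9999999999999999
print(sum(probs) == 1.0)              # False
print(math.isclose(sum(probs), 1.0))  # True
```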


Adding floating point numbers is dangerous, too.


Can you explain? In the case that all floating point numbers have the same sign (in this example, probabilities), I thought addition was well behaved.

I was taught that you can only get into trouble when you e.g. subtract two approximately equal values from each other.


Alas, addition isn't well-behaved.

Eg when you want to add a list of (non-negative) numbers, it makes a difference what order you add them in. The usual recommendation is to add them in ascending order.
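A mixed-sign example makes the order dependence especially stark (with non-negative inputs the effect is smaller but still real); math.fsum computes a correctly rounded sum regardless of order:

```python
import math

xs = [1.0, 1e16, -1e16]

print(sum(xs))            # 0.0 -- the 1.0 is absorbed by 1e16 before the cancellation
print(sum(reversed(xs)))  # 1.0 -- the big terms cancel first
print(math.fsum(xs))      # 1.0 -- exactly rounded, order-independent
```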


It's not associative, yes, but it should still obey the property that the relative error terms add instead of multiplying, no?

(i.e. in the parent's case of "checking that, if you add a bunch of disjoint probabilities, they're at most 1", doing that with approximate equality seems fine to me?)


Floating point numbers are a common source of issues in large open world games. Because of how a float is stored, there is more precision for values closer to the "origin" of the world (0, 0, 0). Which means there is less precision for very far away things. This can result in things like "z-fighting"[0], which I'm sure most gamers have seen, and also issues with physics.

One solution to address some of the precision issues (not necessarily z-fighting, as it operates in a different coordinate system) is to dynamically re-adjust the player's world origin as they move throughout the world. This way, they are always getting the highest float precision for things near to them.

0. https://en.wikipedia.org/wiki/Z-fighting
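Python's math.ulp makes the precision falloff easy to see: the gap between adjacent doubles grows with magnitude, which is exactly why far-from-origin coordinates get jittery:

```python
import math

# One "unit in the last place" (the gap to the next representable double)
# at increasing coordinate magnitudes: spatial resolution degrades with
# distance from the origin.
for pos in (1.0, 1_000.0, 1_000_000.0, 1e9):
    print(pos, math.ulp(pos))
```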


Is there a reason that game developers don't use fixed-point math for Cartesian coordinates?

A 64 bit integer and a 64 bit float both chop up your coordinate system into the same number of points, but with the integer, those points are equally spaced which is the behaviour you'd expect from a Cartesian coordinate system (based on the symmetry group of translational invariance).

And even a 32-bit integer is still fine enough resolution to support four kilometres at one-micrometre resolution. With 64 bits per axis you can represent the entire solar system, out to Neptune's orbit, at roughly half-micrometre resolution, while maintaining equal resolution at any location, and exact distance calculations between any points no matter how close or how far.


Having asked this myself once, and tried to write it: it is hilariously slow to render. Graphics cards are float crunchers. Changing one's frame of reference is not trivial but isn't impossible, and it is much faster.


The rendering can be done relative to the camera position though, can't it?

So for the graphics you just subtract all world coordinates from the camera coordinate, and cast the result to float; for the game physics and AI, you work directly in fixed point.


That is typically done with matrix transformations, which all end up in floating-point space anyway. Having to do integer-to-float transforms for everything to get you there is bad news.


You don't need to transform every vertex of the 3D model though. If you're rendering an astronaut on mars, you just feed the graphics engine the relative position of the astronaut and the camera. The detailed rendering of the astronaut's eyebrows can all be done natively in floating point once you've calculated that offset.


I mean...maybe. I am not up on it enough to say, though my intuitive answer is "it's not that simple." But that's just not how any existing stuff works, too. If you want to work with the ton of middleware, etc. that already exists, you work the way Unreal (or Unity, etc.) do.


Rendering is usually done with floating point on graphics cards, but I don't know if this is a requirement.


I've heard that the Star Citizen engine devs had to change all of their math operations in CryEngine from single to double precision to add support for seamless large worlds (the player origin hack has limits...) Don't even want to imagine how awful of a nightmare that would have been.


I believe Unreal Engine is doing the same thing for their next major release to support large open worlds.


A former AAA game developer visualized this exact sort of glitch that a player sees due to mantissa imprecision.

https://www.youtube.com/watch?v=qYdcynW94vM


This manifests as the "Far Lands" in Java editions of Minecraft. Locations far from the origin run into the precision limitations of floats, resulting in jittering and other rendering oddities.

https://www.youtube.com/watch?v=crAa9-5tPEI


Not just in games, but also the real world. I've had issues implementing robotic mapping code coming from the naive use of GPS coordinates, for which the solution was the same as you just said: dynamically adjusting the origin.


This is not a very good article, and I would not recommend it beyond simple use cases. The problem is that there is no single right way; it depends on the use case and the magnitude of the numbers you're comparing. See e.g. https://bitbashing.io/comparing-floats.html for a better reference.

The fundamental difficulty of comparing floats is that the format ensures a near constant number of digit of precision regardless of the scale. This is very useful for most calculations because it means you can calculate without worrying too much about the amplitude of your numbers. But it means that the smallest representable difference between two numbers gets bigger as numbers get bigger: that's why they are called floating point.

That's why using a tolerance, etc. is not so reliable: the tolerance will depend on the magnitude of the numbers, even with those simple tricks. In particular, it is important to understand that epsilon is only "correct" around 1, i.e. a + eps != a is only true if a is close to 1. More precisely, epsilon is the smallest number such that 1 + eps != 1.
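This is easy to demonstrate with sys.float_info.epsilon:

```python
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next double, ~2.22e-16

print(1.0 + eps != 1.0)    # True: eps is resolvable at magnitude 1
print(1e16 + eps == 1e16)  # True: at magnitude 1e16 the same eps vanishes
```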


I mean, he's using relative tolerance, not absolute. Most common arithmetic operations should be well behaved w.r.t. relative tolerance, unless you're using denormalised numbers.


relative tolerance is not always enough, especially near 0. This classic article gives a nice overview of absolute, relative, ulp-based and other methods for comparison of float numbers: https://randomascii.wordpress.com/2012/02/25/comparing-float...


> especially near 0

Yes, these are denormalised numbers. They aren't as well behaved.


I abuse `Decimal` all the time in python. I usually make a little helper with a very short name like

  def d(value: Any) -> Decimal: ...
Then use that everywhere I expect a float (or just anywhere). Decimal's constructor is so forgiving allowing strings, ints, floats, other decimals it is so convenient to use. Of course there is the perf penalty but I always think the precision is worth the tradeoff.


wouldn't just

    from decimal import Decimal as d
work as well?


Sorry, I am doing stuff in the function... here is one of the old ones from code I can share

    def dec(value, prec=4):
        """Return the given value as a decimal rounded to the given precision."""
        if value is None:
            return Decimal(0)
        
        value = Decimal(value)
        
        ret = Decimal(str(round(value, prec)))
        if ret.is_zero():
            # this avoids stuff like Decimal('0E-8')
            return Decimal(0)
        return ret


Decimal has the same fundamental representation issues that float does, albeit with greater configuration.


It does, but those issues manifest in ways humans have been trained to handle.

"Round to nearest even" for binary floating point is weird to anyone who doesn't have a numerics background. "Round 0.5 up" is normal to humans because that's what most have been taught.

Future programming languages should probably default to Decimal Floating Point and allow people to opt-in to binary floating point on request.


> It does, but those issues manifest themselves in ways that humans have been trained to operate.

Not really. If you do computations like "convert Fahrenheit to Celsius" or "pay 6 days of interest at this APR" or a million other things, you run into the same basic faulty assumptions as ever
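For instance, with Python's Decimal at its default 28-digit context:

```python
from decimal import Decimal

# Decimal round-trips decimal literals, but division still rounds: 1/3 has
# no finite decimal expansion, so the usual inexact-arithmetic surprises remain.
third = Decimal(1) / Decimal(3)
print(third)           # 0.3333333333333333333333333333 (28 significant digits)
print(third * 3 == 1)  # False
```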


Rounding to even makes sense for human-facing decimals as well.

That's what they do in accounting, I think? At least Wikipedia tells me it's also called banker's rounding.

Incidentally, it's also what Python's Decimal does by default. See https://docs.python.org/3/library/decimal.html#rounding-mode...


This is true for all languages, not just Python.

Unfortunately C++ STL does not offer anything close to isclose (boost does however). Neither does Java. Float.compare is just == in disguise.

But while going through the docs I stumbled upon this gem:

> If f1 and f2 both represent Float.NaN, then the equals method returns true, even though Float.NaN==Float.NaN has the value false.

https://docs.oracle.com/javase/7/docs/api/java/lang/Float.ht...


> This is true for all languages, not just Python.

Not quite: APL does comparisons with a configurable relative tolerance, which is nonzero by default. My article [0] about comparing one number to many quickly in this system starts with a discussion of the reasoning and mathematics there. It brings a lot of implementation difficulty, particularly with hash tables. It's possible to build a hash on tolerant doubles by making two hashes per lookup, and even complex numbers with four per lookup, but no one's figured out how to deal with entries that contain multiple numbers yet.

Nonetheless I would say it does make it easier to write working programs overall. But not by much: I've been working on an APL derivative and initially left out comparison tolerance as there were some specifics I was unsure of. A year or so later I noticed I hardly ever needed it even for numerical work and dropped any plans to add it to the language.

[0] https://www.dyalog.com/blog/2018/11/tolerated-comparison-par...


> It's possible to build a hash on tolerant doubles by making two hashes per lookup, and even complex numbers with four per lookup, but no one's figured out how to deal with entries that contain multiple numbers yet.

I'm curious what obstacle you have in mind for "multiple numbers".

The problem is equivalent to a distance query on a data structure, i.e. enumerate everything within a distance tol of a query point x in an appropriate distance metric. There are many data structures for this problem (it's a basic building block in computational geometry). But I assume the hashing solution you have in mind is what we usually call a hash grid in gamedev and graphics, which works well when tol is fixed for the lifetime of the data structure. [1] Namely, you define a grid with cell size tol and map a point x to the index cell(x) of its containing cell, which is just a rounding operation. Then you can use a hash table keyed by cell(x) to store the contents of an unbounded, sparse grid. To perform a distance query from x you need to check not just x's cell but some of the immediately neighboring cells and then filter all of those candidates based on the pairwise distance. [2] [3]

This approach works with any distance metric (including the usual suspects L^2, L^1 and L^inf) and in any dimension d although the worst-case number of cells to check in the hash is 2^d, so the curse of dimensionality is present and in high-dimensional spaces another data structure not based on rectilinear partitioning would be preferable. But hashing does work with "multiple numbers" if by "multiple numbers" you mean a vector (with not too many components) where tolerance is defined by a distance metric.

[1] Actually, hash grids work well any time the query radius is on roughly the same scale as the cell size. But if the query radius is a lot smaller than the cell size then the grid is too coarse to perform adequate pre-filtering. And if the query radius is larger than the cell size you have to check a bigger neighborhood of cells (i.e. more hash lookups) to enumerate all the candidates.

[2] Depending on the read vs write ratio, you can actually flip this around by storing each point in multiple cells so that queries only need one hash table lookup at the expense of slightly less precise pre-filtering.

[3] Instead of doing multiple hash table lookups, you can also have each occupied cell store pointers to neighboring cells (null for unoccupied neighbors), which replaces hash table lookups for the neighbors with plain memory loads. A variation on this trick is to instead store bit flags to avoid doing hash table lookups for neighboring cells which are known to be empty; this takes up far less memory.
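A minimal Python sketch of the hash grid described above (class and method names are illustrative, not from any particular library):

```python
import math
from collections import defaultdict
from itertools import product

class HashGrid:
    """Sparse uniform grid over R^d, keyed by integer cell coordinates.

    Assumes the query tolerance is at most the cell size, so a query only
    inspects the containing cell and its immediate neighbors (naively 3^d
    cells; the 2^d figure above comes from a refinement that picks neighbors
    based on which half of the cell the query point falls in).
    """
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, p):
        # Round a point down to the index of its containing cell.
        return tuple(math.floor(c / self.cell_size) for c in p)

    def insert(self, p):
        self.cells[self._cell(p)].append(p)

    def query(self, p, tol):
        # Enumerate candidates from the 3^d neighborhood, then filter by
        # exact pairwise distance.
        cx = self._cell(p)
        out = []
        for offset in product((-1, 0, 1), repeat=len(p)):
            cell = tuple(c + o for c, o in zip(cx, offset))
            for q in self.cells.get(cell, ()):
                if math.dist(p, q) <= tol:
                    out.append(q)
        return out

g = HashGrid(1.0)
g.insert((0.5, 0.5))
print(g.query((0.4, 0.4), 0.5))  # [(0.5, 0.5)]
```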


> But while going through the docs I stumbled upon this Gem

You make it sound like that's crazy or something but the goal of .compare is to be used as a "comparator" to provide an ordering of all possible values in the implementation of various data structures or sorting algorithms, while the goal of the various operators is to support programmatic logic.


We need a little bit of background for this. In Java the == operator checks if two variables are pointing to the same object. The .equals() method checks for actual value equality.

  new String("Hi")==new String("Hi"); // False
  new String("Hi").equals(new String("Hi")); // True
Except for primitives like ints and floats where == does value comparison.

  1==1;// True
For NaN the value comparison is false. Even then .equals returns true

  Float.NaN==Float.NaN; // False
  new Float(0f/0f).equals(new Float(0f/0f)); // True!!!


> Unfortunately C++ STL does not offer anything close to isclose

maybe because you can trivially write it yourself with whatever precision you want:

   #include <cmath>   // for fabs
   #include <cfloat>  // for FLT_EPSILON

   float a = 0.1;
   float b = 0.2;
  
   // now test for equality to within half of machine epsilon
   // (FLT_EPSILON is the gap between 1.0f and the next
   // representable float, not the smallest value float can hold)
 
   bool equal = fabs (0.3 - (a + b)) < (FLT_EPSILON/2);
Alternatively:

   bool isclose (float v1, float v2, float prec = FLT_EPSILON/2) { 
        return fabs (v1 - v2) < prec;
   }
         
Variations are left to the reader. There are a few optimizations one can do but this version conveys the basic point: you get to specify the semantics of isclose()


I never noticed math.isclose() and have often done the naive a - b < eps

There is also (for rolling your own fn)

   import sys
   sys.float_info.epsilon
although I can't immediately think of an advantage over the module.


Why do you consider that naive?

Also, I wouldn't use epsilon, because that is a very small number and is related to representation of numbers. I would instead use a larger "tolerance" value, in `if |a-b| < tol ...` because that indicates the expected accuracy of your algorithm (unless you're really working with 1e-80 tolerance?). But TFA explains that.

But back to point (a) is that not reasonable? (If not, I've got some code to change... :)


Naive because it's just the easiest way. (Also, right, I forgot abs.)

I imagine the downside with a chosen "epsilon" (which I used as an alias for any tolerance) is that it can be truncated on some machines, plus this equation on its own doesn't do relative accuracy.

On a LEGO EV3, Python's float maxes out around 1e38 with 7 significant digits (i.e. single precision).

I've never run into problems, which is why I keep using it, too.


See issue 1580 for a description of the change that displays floats using the shortest decimal expansion that round-trips:

https://bugs.python.org/issue1580

I thought there was a PEP for it, but evidently not.

Although this is generally an improvement, it can contribute to confusion.


I remember having to write something similar for Lua, because there was no math.approximately() function and I was dealing with networking floats and comparing values over the wire to predicted movement in 2D space.

https://github.com/Planimeter/grid-sdk/blob/master/engine/sh...

https://en.wikipedia.org/wiki/Machine_epsilon

https://pubs.opengroup.org/onlinepubs/009696899/basedefs/flo...


I have another problem with ==. I have some code that needs to know if two (nested) tuples of Any are equivalent. I'd like to test (tupleA == tupleB), but that doesn't work if NaN appears anywhere in tuples because (NaN != NaN).

This is because == is an overloaded concept. Python has chosen to define == in the way that makes the most mathematical sense in the case of NaN.

Maybe there's a metaprogramming way to override how == is interpreted for floats?

  with float_equality_test(math.isclose):
    assert (0.1 + 0.2) == 0.3

  # This still allows (+0.0 == -0.0):
  def my_float_eq(a: float, b: float) -> bool:
    return (math.isnan(a) and math.isnan(b)) or (a == b)

  with float_equality_test(my_float_eq):
    assert (1, 2.0, math.nan) == (1, 2.0, math.nan)


If you use your own dataclasses (from dataclassy) instead of tuples, you can override this behaviour.

Making NaNs compare unequal to themselves was a hack dating back to the time modern floats were first introduced.


Not mentioned in the article but numpy.isclose() works for 0.0 case unlike math.isclose().


An explanation for the atol difference between numpy and math, by the author of math.isclose():

https://github.com/numpy/numpy/issues/10161#issuecomment-350...

plus a ton of flogging/benchmarking common libraries (numpy, scipy, scikit-learn) and discussion about np.isclose()

... and also this gem:

  import numpy as np
  
  a = 1
  b = 10
  rtol = 1
  atol = 0
  np.isclose(a, b, rtol=rtol, atol=atol)  # True
  np.isclose(b, a, rtol=rtol, atol=atol)  # False
(For the record:)

  import math
  
  math.isclose(a, b, rel_tol=rtol, abs_tol=atol)  # True
  math.isclose(b, a, rel_tol=rtol, abs_tol=atol)  # True


Could you explain what you mean the 0.0 case? math.isclose() has an abs_tol parameter for, among other things, handling comparisons to 0.


The Julia equivalent does a good job explaining the 0.0 case: https://docs.julialang.org/en/v1/base/math/#Base.isapprox


I assume “out of the box” is the distinction.


I think floats should be treated like noisy analog values.

If you are doing something that requires you to reason about their precision beyond deciding how many bits is enough, you should probably not be using floats, because then you've added some more reasoning the next guy has to understand.

There is a good reason to compare float equality though: when there's no calculation involved and you're basically just comparing object identity.

Checking if a float timestamp has changed by comparing it to an old copy of exactly the same variable is fine.

They're like analog values that can be exactly stored and copied.
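A sketch of that distinction: storing and copying a float is exact; it is arithmetic that introduces rounding:

```python
import time

stamp = time.time()
snapshot = stamp           # bit-for-bit copy, no computation involved
print(snapshot == stamp)   # True, always: exact equality is safe here

print(0.1 + 0.2 == 0.3)    # False: arithmetic rounded the result
```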


I know the answer is "yes" because the majority tends to have a point.. but are there good reasons, beyond backwards compatibility, for languages to be using IEEE 754 floating point arithmetic nowadays rather than just storing decimals "precisely" (to a specific degree of resolution)? Or are any new languages eschewing IEEE 754 entirely? (I'm aware of BigDecimal, etc. but these still seem to be treated as a bonus feature rather than 'the way'.)


They should have convenient hardware support (vector extensions for example). I'm sure you could design floating-point decimals that work nicely with vector extensions (which usually do cover integers), but it would be a significant project.

If your language is going to plug into the numerical computing stack (BLAS+LAPACK then all the fun stuff built up on that) then it'll need to talk binary floats.

All the annoying stuff that numericists understand but would like to not mess around with, like rounding directions and denormals, are handled nicely with 754 floats.

These are all sort of backward compatibility/legacy issues in the sense that they are based on decisions made in the past, but the hardware and libraries aren't going anywhere I bet!

Also, note that IEEE 754 does define a decimal interchange format. I bet they aren't handled as nicely in hardware, though.


In Julia at least, there are packages that provide alternatives to Bigfloats:

Logarithmic numbers: more range, less precision https://github.com/cjdoris/LogarithmicNumbers.jl

Double arithmetic: stitch 2 floats together https://github.com/JuliaMath/DoubleFloats.jl

Of course, if you are calling BLAS/LAPACK, you are constrained to use floats, but the recommendation on DoubleFloats is clear: if you know your algorithms, use the increased precision only in the parts that matter.


Floats have an enormous range, with fixed relative precision. Even a single-precision float can store numbers up to about 3.4e38.

Now of course you pay for that by losing absolute precision, but chances are that if you’re working with numbers like 1e20 you don’t much care about anything after the decimal point.
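As a quick check of that trade-off (math.ulp needs Python 3.9+): the gap between adjacent doubles near 1e20 is over sixteen thousand, so sub-integer detail is long gone at that magnitude:

```python
import math

x = 1e20
print(math.ulp(x))   # 16384.0: gap to the next representable double
print(x + 1.0 == x)  # True: adding 1.0 changes nothing at this magnitude
```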


Wouldn't you want to default to rational numbers, instead of decimals, if you don't care too much about performance?


I'm not comfortable they recommend using an epsilon (isclose()). That's begging for mysterious bugs when the numbers are near the threshold. Better to just not compare floats for equality except when you're confident they can be exactly equal. Usually, that doesn't make sense, any more than comparing measurements of weight for equality.
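For example, two inputs that differ by only about 1e-9 land on opposite sides of math.isclose's cutoff, which is exactly the near-threshold surprise described above:

```python
import math

# The result flips discontinuously right at the tolerance boundary:
print(math.isclose(1.0, 1.0 + 9e-10, rel_tol=1e-9))  # True
print(math.isclose(1.0, 1.0 + 2e-9, rel_tol=1e-9))   # False
```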


For another breakdown of the details of floating-point precision, I recommend qntm's "0.1 + 0.2 returns 0.30000000000000004":

https://qntm.org/notpointthree


This is a good introductory article, but I would not recommend it for more advanced use cases. Just because two numbers are "close" does not mean that they are "equal". If truly precise floating point math is needed, symbolic representations are a much better way to do things.


Python supports rational numbers just fine, too.
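For the record, fractions.Fraction in the standard library does exact rational arithmetic, so == is meaningful, and it can also reveal the exact binary value a float literal actually stores:

```python
from fractions import Fraction

print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True, exactly
print(Fraction(0.1))  # 3602879701896397/36028797018963968, the double behind 0.1
```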


> In other words, 0.1 gets converted from base 10 to base 2.

This isn’t really a (helpful) explanation, but it’s ample setup for: oh I see your problem, you’re converting from base 10 to base 10.


Step one:

You don't. At least, not exact numbers. You almost always have to round the numbers up, or just approximate the value.

So, while

  0.1 + 0.2 != 0.3
this is always correct:

  0.1 + 0.2 < 0.4


How do you determine the fudge factor?

Eg if you compare (x + 0.1) + (x + 0.2) < (2*x + 0.4) that goes wrong for big enough x.


Use the round function in that case.


How is that going to help? What precision do you want to round to?

You know that at big enough values, floating point numbers are farther apart than integers?
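To make that concrete: beyond 2^53, the spacing between adjacent doubles exceeds 1, so no choice of rounding precision can recover differences that were never stored:

```python
x = 2.0 ** 53             # 9007199254740992.0, where the spacing becomes 2.0
print(x + 1.0 == x)       # True: 1.0 falls below the spacing and is lost
print(round(x + 0.4, 1))  # 9007199254740992.0: round() cannot help here
```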


once again I am glad I am not a Python programmer.

What is this: "math.isclose()"???

In computing floating point numbers cannot be compared for equality. It is obvious why.

So "math.isclose()" is for people who cannot use "<" or ">" operators?

Golly. Do any programmers need it explained to them why floating point numbers are not precise and equality does not apply?

Decimal types are an abomination. The Fraction type blows my mind. Why? It is the sort of nonsense C++ programmers loved to do in the twentieth century, and we all (?) learnt from that and do not do such opaque things any more.

This is all very elementary mathematics, and a language is wasting its time making easy things easy. Programmers must learn the easy things.


> So "math.isclose()" is for people who cannot use "<" or ">" operators?

No, it's a shortcut for `abs(a-b)<tolerance`. Is `math.sin` for people who cannot implement Taylor expansions themselves?


You missed my point

"math.isClose()" is making easy things easy. Not easier, a little bit harder, and it is hiding what is actually happening. Like the C++ nonsense I mentioned that did all that (and I was a sinner in those days)

"math.sin(..)" makes a hard thing easy, which is a good thing.

Do you see what I mean? "math.isclose()" increases the cognitive load for no benefit.


> Do you see what I mean? "math.isclose()" increases the cognitive load for no benefit.

That seems to me to be a matter of perspective. It doesn't increase cognitive load for me (appropriate name, meaning is clear, checking the implementation, that's how I would've done it anyway, etc.) so I guess opinions differ.

from your earlier comment:

> Golly. Do any programmers need it explained to them why floating point numbers are not precise and equality does not apply?

I think the answer is _yes_. Actually, all programmers need that explained to them. Especially python programmers, as python is used by a lot of beginning programmers. So I don't see the problem here.



