How to implement float hashing with approximate equality





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}







4















Let's say we have the following python class (the problem exists in java just the same with equals and hashCode)



class Temperature:
def __init__(self, degrees):
self.degrees = degrees


where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that




  • compares floats up to an epsilon difference instead of direct equality testing,

  • and honors the contract that a == b implies hash(a) == hash(b).


def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
return # What goes here?


The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.



Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?



Update: Now I understand that this type of equality testing for floats eliminates the transitivity of == and equals. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?










share|improve this question









New contributor




CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





















  • The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..

    – Neil
    5 hours ago






  • 2





    why is the question has Java's tag?

    – Laiv
    4 hours ago








  • 4





    About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

    – J. Fabian Meier
    3 hours ago






  • 1





    @Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

    – Matthieu M.
    1 hour ago













  • @MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.

    – Neil
    25 mins ago


















4















Let's say we have the following python class (the problem exists in java just the same with equals and hashCode)



class Temperature:
def __init__(self, degrees):
self.degrees = degrees


where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that




  • compares floats up to an epsilon difference instead of direct equality testing,

  • and honors the contract that a == b implies hash(a) == hash(b).


def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
return # What goes here?


The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.



Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?



Update: Now I understand that this type of equality testing for floats eliminates the transitivity of == and equals. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?










share|improve this question









New contributor




CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





















  • The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..

    – Neil
    5 hours ago






  • 2





    why is the question has Java's tag?

    – Laiv
    4 hours ago








  • 4





    About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

    – J. Fabian Meier
    3 hours ago






  • 1





    @Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

    – Matthieu M.
    1 hour ago













  • @MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.

    – Neil
    25 mins ago














4












4








4


2






Let's say we have the following python class (the problem exists in java just the same with equals and hashCode)



class Temperature:
def __init__(self, degrees):
self.degrees = degrees


where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that




  • compares floats up to an epsilon difference instead of direct equality testing,

  • and honors the contract that a == b implies hash(a) == hash(b).


def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
return # What goes here?


The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.



Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?



Update: Now I understand that this type of equality testing for floats eliminates the transitivity of == and equals. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?










share|improve this question









New contributor




CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












Let's say we have the following python class (the problem exists in java just the same with equals and hashCode)



class Temperature:
def __init__(self, degrees):
self.degrees = degrees


where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that




  • compares floats up to an epsilon difference instead of direct equality testing,

  • and honors the contract that a == b implies hash(a) == hash(b).


def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
return # What goes here?


The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.



Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?



Update: Now I understand that this type of equality testing for floats eliminates the transitivity of == and equals. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?







java python hashing floating-point






share|improve this question









New contributor




CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 14 mins ago









Solomon Ucko

11717




11717






New contributor




CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 12 hours ago









CQQLCQQL

1244




1244




New contributor




CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






CQQL is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.













  • The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..

    – Neil
    5 hours ago






  • 2





    why is the question has Java's tag?

    – Laiv
    4 hours ago








  • 4





    About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

    – J. Fabian Meier
    3 hours ago






  • 1





    @Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

    – Matthieu M.
    1 hour ago













  • @MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.

    – Neil
    25 mins ago



















  • The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..

    – Neil
    5 hours ago






  • 2





    why is the question has Java's tag?

    – Laiv
    4 hours ago








  • 4





    About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

    – J. Fabian Meier
    3 hours ago






  • 1





    @Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

    – Matthieu M.
    1 hour ago













  • @MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.

    – Neil
    25 mins ago

















The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..

– Neil
5 hours ago





The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..

– Neil
5 hours ago




2




2





why is the question has Java's tag?

– Laiv
4 hours ago







why is the question has Java's tag?

– Laiv
4 hours ago






4




4





About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

– J. Fabian Meier
3 hours ago





About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

– J. Fabian Meier
3 hours ago




1




1





@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

– Matthieu M.
1 hour ago







@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

– Matthieu M.
1 hour ago















@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.

– Neil
25 mins ago





@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.

– Neil
25 mins ago










2 Answers
2






active

oldest

votes


















14















implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.



While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.



So I strongly recommend you don't do that.



Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)






share|improve this answer
























  • interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

    – Christophe
    5 hours ago



















6














Good Luck



You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.



Example:



Assume that each point hashes to its own unique hash value.



As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.





  1. For each two points within epsilon of each other that do not share the same hash value.




    • Adjust the hashing scheme so that these two points hash to the same value.



  2. Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.


There are a few cases where this will not hold true:




  • Positive/Negative Infinity

  • NaN

  • A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

  • perhaps a few other format specific instances


However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.



Outcome



Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).



Or the epsilon is such that only exact matches are permitted.



Granular



You could of course go for a granular approach instead.



Under this approach you define exact buckets down to a particular resolution. ie:



[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...


Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.



Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.






share|improve this answer



















  • 1





    This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.

    – Doc Brown
    6 hours ago











  • I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

    – Neil
    5 hours ago






  • 2





    @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

    – J. Fabian Meier
    3 hours ago












Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "131"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






CQQL is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f391100%2fhow-to-implement-float-hashing-with-approximate-equality%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









14















implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.



While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.



So I strongly recommend you don't do that.



Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)






share|improve this answer
























  • interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

    – Christophe
    5 hours ago
















14















implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.



While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.



So I strongly recommend you don't do that.



Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)






share|improve this answer
























  • interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

    – Christophe
    5 hours ago














14












14








14








implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.



While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.



So I strongly recommend you don't do that.



Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)






share|improve this answer














implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.



While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.



So I strongly recommend you don't do that.



Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)







share|improve this answer












share|improve this answer



share|improve this answer










answered 5 hours ago









Sebastian RedlSebastian Redl

11.5k63741




11.5k63741













  • interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

    – Christophe
    5 hours ago



















  • interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

    – Christophe
    5 hours ago

















interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
5 hours ago





interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
5 hours ago













6














Good Luck



You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.



Example:



Assume that each point hashes to its own unique hash value.



As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.





  1. For each two points within epsilon of each other that do not share the same hash value.




    • Adjust the hashing scheme so that these two points hash to the same value.



  2. Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.


There are a few cases where this will not hold true:




  • Positive/Negative Infinity

  • NaN

  • A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

  • perhaps a few other format specific instances


However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.



Outcome



Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).



Or the epsilon is such that only exact matches are permitted.



Granular



You could of course go for a granular approach instead.



Under this approach you define exact buckets down to a particular resolution. ie:



[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...


Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.



Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.






share|improve this answer



















  • 1





    This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.

    – Doc Brown
    6 hours ago











  • I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

    – Neil
    5 hours ago






  • 2





    @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

    – J. Fabian Meier
    3 hours ago
















6














Good Luck



You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.



Example:



Assume that each point hashes to its own unique hash value.



As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.





  1. For each two points within epsilon of each other that do not share the same hash value.




    • Adjust the hashing scheme so that these two points hash to the same value.



  2. Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.


There are a few cases where this will not hold true:




  • Positive/Negative Infinity

  • NaN

  • A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

  • perhaps a few other format specific instances


However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.



Outcome



Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).



Or the epsilon is such that only exact matches are permitted.



Granular



You could of course go for a granular approach instead.



Under this approach you define exact buckets down to a particular resolution. ie:



[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...


Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.



Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.






share|improve this answer



















  • 1





    This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.

    – Doc Brown
    6 hours ago











  • I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

    – Neil
    5 hours ago






  • 2





    @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

    – J. Fabian Meier
    3 hours ago














6












6








6







Good Luck



You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.



Example:



Assume that each point hashes to its own unique hash value.



As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.





  1. For each two points within epsilon of each other that do not share the same hash value.




    • Adjust the hashing scheme so that these two points hash to the same value.



  2. Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.


There are a few cases where this will not hold true:




  • Positive/Negative Infinity

  • NaN

  • A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

  • perhaps a few other format specific instances


However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.



Outcome



Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).



Or the epsilon is such that only exact matches are permitted.



Granular



You could of course go for a granular approach instead.



Under this approach you define exact buckets down to a particular resolution. ie:



[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...


Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.



Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.






share|improve this answer













Good Luck



You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.



Example:



Assume that each point hashes to its own unique hash value.



As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.





  1. For each two points within epsilon of each other that do not share the same hash value.




    • Adjust the hashing scheme so that these two points hash to the same value.



  2. Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.


There are a few cases where this will not hold true:




  • Positive/Negative Infinity

  • NaN

  • A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

  • perhaps a few other format specific instances


However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.



Outcome



Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).



Or the epsilon is such that only exact matches are permitted.



Granular



You could of course go for a granular approach instead.



Under this approach you define exact buckets down to a particular resolution. ie:



[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...


Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.



Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.







share|improve this answer












share|improve this answer



share|improve this answer










answered 10 hours ago









Kain0_0Kain0_0

4,620419




4,620419








  • 1





    This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.

    – Doc Brown
    6 hours ago











  • I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

    – Neil
    5 hours ago






  • 2





    @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

    – J. Fabian Meier
    3 hours ago














  • 1





    This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.

    – Doc Brown
    6 hours ago











  • I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

    – Neil
    5 hours ago






  • 2





    @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

    – J. Fabian Meier
    3 hours ago








1




1





This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.

– Doc Brown
6 hours ago





This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.

– Doc Brown
6 hours ago













I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
5 hours ago





I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
5 hours ago




2




2





@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
3 hours ago





@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
3 hours ago










CQQL is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















CQQL is a new contributor. Be nice, and check out our Code of Conduct.













CQQL is a new contributor. Be nice, and check out our Code of Conduct.












CQQL is a new contributor. Be nice, and check out our Code of Conduct.
















Thanks for contributing an answer to Software Engineering Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f391100%2fhow-to-implement-float-hashing-with-approximate-equality%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Ponta tanko

Tantalo (mitologio)

Erzsébet Schaár