How to implement float hashing with approximate equality
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}
Let's say we have the following python class (the problem exists in java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
New contributor
|
show 1 more comment
Let's say we have the following python class (the problem exists in java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
New contributor
The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..
– Neil
5 hours ago
2
why is the question has Java's tag?
– Laiv
4 hours ago
4
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
3 hours ago
1
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
1 hour ago
@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.
– Neil
25 mins ago
|
show 1 more comment
Let's say we have the following python class (the problem exists in java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
New contributor
Let's say we have the following python class (the problem exists in java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
java python hashing floating-point
New contributor
New contributor
edited 14 mins ago
Solomon Ucko
11717
11717
New contributor
asked 12 hours ago
CQQLCQQL
1244
1244
New contributor
New contributor
The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..
– Neil
5 hours ago
2
why is the question has Java's tag?
– Laiv
4 hours ago
4
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
3 hours ago
1
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
1 hour ago
@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.
– Neil
25 mins ago
|
show 1 more comment
The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..
– Neil
5 hours ago
2
why is the question has Java's tag?
– Laiv
4 hours ago
4
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
3 hours ago
1
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
1 hour ago
@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.
– Neil
25 mins ago
The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..
– Neil
5 hours ago
The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..
– Neil
5 hours ago
2
2
why is the question has Java's tag?
– Laiv
4 hours ago
why is the question has Java's tag?
– Laiv
4 hours ago
4
4
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
3 hours ago
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
3 hours ago
1
1
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
1 hour ago
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
1 hour ago
@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.
– Neil
25 mins ago
@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.
– Neil
25 mins ago
|
show 1 more comment
2 Answers
2
active
oldest
votes
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
5 hours ago
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
1
This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.
– Doc Brown
6 hours ago
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
5 hours ago
2
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
3 hours ago
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "131"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
CQQL is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f391100%2fhow-to-implement-float-hashing-with-approximate-equality%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
5 hours ago
add a comment |
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
5 hours ago
add a comment |
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
answered 5 hours ago
Sebastian RedlSebastian Redl
11.5k63741
11.5k63741
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
5 hours ago
add a comment |
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
5 hours ago
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
5 hours ago
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
5 hours ago
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
1
This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.
– Doc Brown
6 hours ago
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
5 hours ago
2
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
3 hours ago
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
1
This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.
– Doc Brown
6 hours ago
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
5 hours ago
2
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
3 hours ago
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
answered 10 hours ago
Kain0_0Kain0_0
4,620419
4,620419
1
This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.
– Doc Brown
6 hours ago
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
5 hours ago
2
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
3 hours ago
add a comment |
1
This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.
– Doc Brown
6 hours ago
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
5 hours ago
2
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
3 hours ago
1
1
This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.
– Doc Brown
6 hours ago
This could become a good answer, when you remove the "not possible" nonsense from the first paragraph and extend the last paragraph by the small part which is missing: the idea of testing the neighboured buckets as well.
– Doc Brown
6 hours ago
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
5 hours ago
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
5 hours ago
2
2
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
3 hours ago
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
3 hours ago
add a comment |
CQQL is a new contributor. Be nice, and check out our Code of Conduct.
CQQL is a new contributor. Be nice, and check out our Code of Conduct.
CQQL is a new contributor. Be nice, and check out our Code of Conduct.
CQQL is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Software Engineering Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f391100%2fhow-to-implement-float-hashing-with-approximate-equality%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
The standard hashCode/equals contract says that if two instances are considered equal, then their hash codes must match. You violate this using a float, since its hash will change. If you're willing to allow a tolerance to be "fixed" (say, rounded to the nearest thousandths digit), then the hash code of a rounded degree would match the hash code of a rounded degree in another instance.. Just an idea..
– Neil
5 hours ago
2
why is the question has Java's tag?
– Laiv
4 hours ago
4
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
3 hours ago
1
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
1 hour ago
@MatthieuM. If that's possible, that's obviously ideal. Though if you had to round to the nearest 0.32%, then that's difficult to represent properly using a mere integer.
– Neil
25 mins ago