Discussion:
[squeak-dev] Message>>#= & Message>>#hash
Eliot Miranda
2018-11-19 17:32:17 UTC
Permalink
Hi All,

In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal. But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison. This is, to say the least, annoying. Any objections to
implementing comparing in Message to match VisualWorks?

_,,,^..^,,,_
best, Eliot
David T. Lewis
2018-11-19 17:51:17 UTC
Permalink
Post by Eliot Miranda
Hi All,
In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal. But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison. This is, to say the least, annoying. Any objections to
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)

Is the lookkupClass instance variable relevant for comparisons? I am
guessing not, since we already have #analogousCodeTo: for that type of
c
Eliot Miranda
2018-11-20 00:16:51 UTC
Permalink
Hi David,
Post by David T. Lewis
Post by Eliot Miranda
Hi All,
In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal. But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison. This is, to say the least, annoying. Any objections to
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)
Is the lookupClass instance variable relevant for comparisons? I am
guessing not, since we already have #analogousCodeTo: for that type of
comparison.
For me it is relevant. Two messages with different lookupClasses, e.g. one
with nil and one with a specific class, represent different messages, one a
normal send one a super send. So my changes in waiting include lookupClass
in both hash and =. I don't think it makes much difference, but the
incompatibility with VisualWorks, while regrettable, feels correct to me.

Dave
_,,,^..^,,,_
best, Eliot
David T. Lewis
2018-11-20 00:32:24 UTC
Permalink
Post by Eliot Miranda
Hi David,
Post by David T. Lewis
Post by Eliot Miranda
Hi All,
In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal. But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison. This is, to say the least, annoying. Any objections to
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)
Is the lookupClass instance variable relevant for comparisons? I am
guessing not, since we already have #analogousCodeTo: for that type of
comparison.
For me it is relevant. Two messages with different lookupClasses, e.g. one
with nil and one with a specific class, represent different messages, one a
normal send one a super send. So my changes in waiting include lookupClass
in both hash and =. I don't think it makes much difference, but the
incompatibility with VisualWorks, while regrettable, feels correct to me.
That feels correct to me also. But implementation details and bikeshedding
aside, I would be happy if "Message allInstances asSet asArray" could answer
a reasonably small collection of different-looking things. So +1 to the
change, with or without consideration of lookupClass.

Dave
Marcus Denker
2018-11-20 09:16:47 UTC
Permalink
Post by Eliot Miranda
Hi David,
Post by Eliot Miranda
Hi All,
In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal. But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison. This is, to say the least, annoying. Any objections to
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)
Is the lookupClass instance variable relevant for comparisons? I am
guessing not, since we already have #analogousCodeTo: for that type of
comparison.
For me it is relevant. Two messages with different lookupClasses, e.g. one with nil and one with a specific class, represent different messages, one a normal send one a super send. So my changes in waiting include lookupClass in both hash and =. I don't think it makes much difference, but the incompatibility with VisualWorks, while regrettable, feels correct to me.
To me this looks like a good change, yes.

Marcus
Levente Uzonyi
2018-11-20 10:10:20 UTC
Permalink
Post by Eliot Miranda
Hi David,
Post by Eliot Miranda
Hi All,
     In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal.  But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison.  This is, to say the least, annoying.  Any objections to
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)
Is the lookupClass instance variable relevant for comparisons? I am
guessing not, since we already have #analogousCodeTo: for that type of
comparison.
For me it is relevant.  Two messages with different lookupClasses, e.g. one with nil and one with a specific class, represent different messages, one a normal send one a super send.  So my changes in
waiting include lookupClass in both hash and =.  I don't think it makes much difference, but the incompatibility with VisualWorks, while regrettable, feels correct to me.
What about Behavior >> #hash? It uses the name of the class, which is fine
as long as you don't rename your classes while you store them in hashed
collections.

Levente
Post by Eliot Miranda
Dave
_,,,^..^,,,_
best, Eliot
Chris Muller
2018-11-20 17:05:24 UTC
Permalink
It sounds good, but an Inbox submission would go a long way to
understanding exactly what you mean esp. re: understanding this
special condition with the lookupClass being nil... and whether it
deserves a first-class name or not...
Post by Levente Uzonyi
Post by Eliot Miranda
Hi David,
Post by Eliot Miranda
Hi All,
In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal. But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison. This is, to say the least, annoying. Any objections to
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)
Is the lookupClass instance variable relevant for comparisons? I am
guessing not, since we already have #analogousCodeTo: for that type of
comparison.
For me it is relevant. Two messages with different lookupClasses, e.g. one with nil and one with a specific class, represent different messages, one a normal send one a super send. So my changes in
waiting include lookupClass in both hash and =. I don't think it makes much difference, but the incompatibility with VisualWorks, while regrettable, feels correct to me.
What about Behavior >> #hash? It uses the name of the class, which is fine
as long as you don't rename your classes while you store them in hashed
collections.
Levente
Post by Eliot Miranda
Dave
Chris Cunningham
2018-11-20 18:55:49 UTC
Permalink
It is interesting. In base Squeak, the only time that lookupClass is set
is, well, never(!). There is a setter method, but it isn't called is base
Squeak. It is checked a couple of times, but unless there is hidden
machinery in the VM that sets it, it isn't actually ever used.

Unless someone decided to use it in their code.

-cbc
Post by Chris Muller
It sounds good, but an Inbox submission would go a long way to
understanding exactly what you mean esp. re: understanding this
special condition with the lookupClass being nil... and whether it
deserves a first-class name or not...
Post by Levente Uzonyi
Post by Eliot Miranda
Hi David,
Post by Eliot Miranda
Hi All,
In VisualWorks Message implements #= & #hash naturally;
two messages
Post by Levente Uzonyi
Post by Eliot Miranda
Post by Eliot Miranda
whose selectors and arguments are #= are also equal. But in
Cuis, Squeak
Post by Levente Uzonyi
Post by Eliot Miranda
Post by Eliot Miranda
and Pharo Message inherits #= and #hash from Object, i.e. uses
identity
Post by Levente Uzonyi
Post by Eliot Miranda
Post by Eliot Miranda
comparison. This is, to say the least, annoying. Any
objections to
Post by Levente Uzonyi
Post by Eliot Miranda
Post by Eliot Miranda
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)
Is the lookupClass instance variable relevant for comparisons? I
am
Post by Levente Uzonyi
Post by Eliot Miranda
guessing not, since we already have #analogousCodeTo: for that
type of
Post by Levente Uzonyi
Post by Eliot Miranda
comparison.
For me it is relevant. Two messages with different lookupClasses,
e.g. one with nil and one with a specific class, represent different
messages, one a normal send one a super send. So my changes in
Post by Levente Uzonyi
Post by Eliot Miranda
waiting include lookupClass in both hash and =. I don't think it
makes much difference, but the incompatibility with VisualWorks, while
regrettable, feels correct to me.
Post by Levente Uzonyi
What about Behavior >> #hash? It uses the name of the class, which is
fine
Post by Levente Uzonyi
as long as you don't rename your classes while you store them in hashed
collections.
Levente
Post by Eliot Miranda
Dave
_,,,^..^,,,_
best, Eliot
Levente Uzonyi
2018-11-20 22:33:17 UTC
Permalink
To make things more clear, the current implementation of Behavior >> #hash
has two negative side effects:
- behaviors stored in collections relying on the hash value (e.g. Set,
Dictionary) will have to be rehashed whenever a behavior is renamed
- objects using Behavior >> #hash to implement their own #hash, like what
Eliot just did to Message will suffer from the same issue. So Sets and
Dictionaries holding those kind of objects will have to be rehashed as
well upon the rename of the behavior.

My questions related to this:
- why does Behavior >> #hash rely on the name instead of identity?
- do we want to fix those issues mentioned above or do we just say that
one should not rename classes and expect things to work?

Levente
Post by Eliot Miranda
Hi David,
Post by Eliot Miranda
Hi All,
     In VisualWorks Message implements #= & #hash naturally; two
messages
Post by Eliot Miranda
whose selectors and arguments are #= are also equal.  But in Cuis,
Squeak
Post by Eliot Miranda
and Pharo Message inherits #= and #hash from Object, i.e. uses
identity
Post by Eliot Miranda
comparison.  This is, to say the least, annoying.  Any objections
to
Post by Eliot Miranda
implementing comparing in Message to match VisualWorks?
That sounds like an obviously good thing to do :-)
Is the lookupClass instance variable relevant for comparisons? I am
guessing not, since we already have #analogousCodeTo: for that type of
comparison.
For me it is relevant.  Two messages with different lookupClasses, e.g. one
with nil and one with a specific class, represent different messages, one a
normal send one a super send.  So my changes in
waiting include lookupClass in both hash and =.  I don't think it makes
much difference, but the incompatibility with VisualWorks, while
regrettable, feels correct to me.
What about Behavior >> #hash? It uses the name of the class, which is fine as
long as you don't rename your classes while you store them in hashed
collections.
Levente
Post by Eliot Miranda
Dave
_,,,^..^,,,_
best, Eliot
Bob Arning
2018-11-20 23:56:54 UTC
Permalink
Post by Levente Uzonyi
- why does Behavior >> #hash rely on the name instead of identity?
FWIW, I found this...

'From Squeak3.9alpha of 4 July 2005 [latest update: #6726] on 16
February 2006 at 7:30:27 pm'
"Change Set:        6727updateFromMC
Date:            16 February 2006
Author:            Marcus Denker

-Put back in the simplified Decompiler>>#decompileBlock:
- deprecated #who
- CompiledMethod>>#methodNode now sets Class and Selector for non-installed
  methods to Object/ CompiledMethod>>#defaultSelector
- refactored ContextPart to not call #who
- refactored ProcessBrowserPlus to not use #who
- refactored ProcessBrowser to not use #who
- simplified CompiledMethod>>#defaultSelector
- String>>#hash now uses identityHasch as initial hash (needed
forBehavior>>#hash)
- moved numArgs: to Symbol, faster
- addBehavior hash
- simplify CompiledMethod: #defaultSelector, #equivalentTo:, #methodNode
- add CompiledMethod>>#selector:
- add iVarselector to MethodProperties, accessors
- make MethodProperties compact again."
Chris Muller
2018-11-21 02:27:14 UTC
Permalink
Post by Levente Uzonyi
To make things more clear, the current implementation of Behavior >> #hash
- behaviors stored in collections relying on the hash value (e.g. Set,
Dictionary) will have to be rehashed whenever a behavior is renamed
- objects using Behavior >> #hash to implement their own #hash, like what
Eliot just did to Message will suffer from the same issue. So Sets and
Dictionaries holding those kind of objects will have to be rehashed as
well upon the rename of the behavior.
- why does Behavior >> #hash rely on the name instead of identity?
If you mean #identityHash, then its because involving an unstable
value in a #hash calculation is never a good idea. #identityHash can
be different for the same class between two different images, or if
the class was ever becomed or reloaded into a new image, etc.
Post by Levente Uzonyi
- do we want to fix those issues mentioned above or do we just say that
one should not rename classes and expect things to work?
Neither. We just say that when one renames a class to rehash a
Levente Uzonyi
2018-11-21 12:31:28 UTC
Permalink
Post by Chris Muller
Post by Levente Uzonyi
To make things more clear, the current implementation of Behavior >> #hash
- behaviors stored in collections relying on the hash value (e.g. Set,
Dictionary) will have to be rehashed whenever a behavior is renamed
- objects using Behavior >> #hash to implement their own #hash, like what
Eliot just did to Message will suffer from the same issue. So Sets and
Dictionaries holding those kind of objects will have to be rehashed as
well upon the rename of the behavior.
- why does Behavior >> #hash rely on the name instead of identity?
If you mean #identityHash, then its because involving an unstable
value in a #hash calculation is never a good idea. #identityHash can
be different for the same class between two different images, or if
the class was ever becomed or reloaded into a new image, etc.
Is there an actual user of that feature?

Bob found out that #hash had been changed during the developement of
Squeak 3.9. Therefore this issue is not present in Cuis (forked from 3.7).
And I just checked Pharo and found that Behavior >> #hash had been removed
from there.
So, I suggest we remove it as well unless there's a really good reason to
keep it.
Post by Chris Muller
Post by Levente Uzonyi
- do we want to fix those issues mentioned above or do we just say that
one should not rename classes and expect things to work?
Neither. We just say that when one renames a class to rehash all
relevant HashedCollections.
That's "one should not rename classes and expect
Bob Arning
2018-11-21 13:25:40 UTC
Permalink
I'm neutral on this, but here is a bit more (and from earlier it would seem)


'From Squeak3.7beta of ''1 April 2004'' [latest update: #5963] on 22
June 2004 at 8:51:57 pm'
"Change Set: BehaviorHashEnh v1.2
Date:            22 June 2004, 16.02.2006
Author:            Stephan Rudlof, md
md: added a line to the poscript to uncompactify the MethodProperties
class. We want to add an instVar for the selector.
Improves the default Object>>hash forBehaviors by
installingBehavior>>hash. String>>hash has been changed a little to
avoid infinite recursion (without changing its semantics).
All is done in the postscript.

Important
-----------
This is a special changeset: Do not export and import this changeset
again after importing it the first time! Then the methods are not
installed alone in the postscript anymore, leading to serious problems!
-----------

Rationale: Object>>hash calling ProtoObject>>identityHash gives poor
results forBehaviors. Therefore a newBehavior>>hash using Symbol>>hash
or String>>hash (the latter slightly changed to avoide infinite
recursion) will be installed.

Consequences:
- It speeds up Set/Dictionary operations withBehaviors a lot (see below).
- The main consequence for other objects asBehaviors seems to be a
changed hash if they use
    self species hash
as a start value for computing their hash.
But AFAICS this doesn't hurt, since in most cases (non meta classes as
species) it maps to Symbol>>hash, which is fast.
Post by Levente Uzonyi
Post by Chris Muller
Post by Levente Uzonyi
To make things more clear, the current implementation of Behavior >> #hash
- behaviors stored in collections relying on the hash value (e.g. Set,
Dictionary) will have to be rehashed whenever a behavior is renamed
- objects using Behavior >> #hash to implement their own #hash, like what
Eliot just did to Message will suffer from the same issue. So Sets and
Dictionaries holding those kind of objects will have to be rehashed as
well upon the rename of the behavior.
- why does Behavior >> #hash rely on the name instead of identity?
If you mean #identityHash, then its because involving an unstable
value in a #hash calculation is never a good idea. #identityHash can
be different for the same class between two different images, or if
the class was ever becomed or reloaded into a new image, etc.
Is there an actual user of that feature?
Bob found out that #hash had been changed during the developement of
Squeak 3.9. Therefore this issue is not present in Cuis (forked from
3.7). And I just checked Pharo and found that Behavior >> #hash had
been removed from there.
So, I suggest we remove it as well unless there's a really good reason
to keep it.
Post by Chris Muller
Post by Levente Uzonyi
- do we want to fix those issues mentioned above or do we just say that
one should not rename classes and expect things to work?
Neither.  We just say that when one renames a class to rehash all
relevant HashedCollections.
That's "one should not rename classes and expect things to work",
isn't it?
Levente
Post by Chris Muller
- Chris
Levente Uzonyi
2018-11-21 23:23:09 UTC
Permalink
Nice find. So the reason for the change was to improve performance, but
now that change actually makes things slower.
We have 22-bit identity hashes and #hashMultiply is quicker too, so a new
hash function based on those could be 6x-10x quicker.

Levente
Post by Bob Arning
I'm neutral on this, but here is a bit more (and from earlier it would seem)
'From Squeak3.7beta of ''1 April 2004'' [latest update: #5963] on 22 June 2004 at 8:51:57 pm'
"Change Set:        BehaviorHashEnh v1.2
Date:            22 June 2004, 16.02.2006
Author:            Stephan Rudlof, md
md: added a line to the poscript to uncompactify the MethodProperties class. We want to add an instVar for the selector.
Improves the default Object>>hash for Behaviors by installing Behavior>>hash. String>>hash has been changed a little to avoid infinite recursion (without changing its semantics).
All is done in the postscript.
Important
-----------
This is a special changeset: Do not export and import this changeset again after importing it the first time! Then the methods are not installed alone in the postscript anymore, leading to serious
problems!
-----------
Rationale: Object>>hash calling ProtoObject>>identityHash gives poor results for Behaviors. Therefore a new Behavior>>hash using Symbol>>hash or String>>hash (the latter slightly changed to avoide
infinite recursion) will be installed.
- It speeds up Set/Dictionary operations with Behaviors a lot (see below).
- The main consequence for other objects as Behaviors seems to be a changed hash if they use
    self species hash
as a start value for computing their hash.
But AFAICS this doesn't hurt, since in most cases (non meta classes as species) it maps to Symbol>>hash, which is fast.
To make things more clear, the current implementation of Behavior >> #hash
- behaviors stored in collections relying on the hash value (e.g. Set,
Dictionary) will have to be rehashed whenever a behavior is renamed
- objects using Behavior >> #hash to implement their own #hash, like what
Eliot just did to Message will suffer from the same issue. So Sets and
Dictionaries holding those kind of objects will have to be rehashed as
well upon the rename of the behavior.
- why does Behavior >> #hash rely on the name instead of identity?
If you mean #identityHash, then its because involving an unstable
value in a #hash calculation is never a good idea.  #identityHash can
be different for the same class between two different images, or if
the class was ever becomed or reloaded into a new image, etc.
Is there an actual user of that feature?
Bob found out that #hash had been changed during the developement of Squeak 3.9. Therefore this issue is not present in Cuis (forked from 3.7). And I just checked Pharo and found that
Behavior >> #hash had been removed from there.
So, I suggest we remove it as well unless there's a really good reason to keep it.
- do we want to fix those issues mentioned above or do we just say that
one should not rename classes and expect things to work?
Neither.  We just say that when one renames a class to rehash all
relevant HashedCollections.
That's "one should not rename classes and expect things to work", isn't it?
Levente
- Chris
Chris Muller
2018-11-22 05:49:08 UTC
Permalink
Hi Levente,
Post by Levente Uzonyi
Post by Chris Muller
Post by Levente Uzonyi
- why does Behavior >> #hash rely on the name instead of identity?
If you mean #identityHash, then its because involving an unstable
value in a #hash calculation is never a good idea. #identityHash can
be different for the same class between two different images, or if
the class was ever becomed or reloaded into a new image, etc.
Is there an actual user of that feature?
Several actually.

If your primary motivation is improving performance then we could
cache the hash value, possibly even straight in the CompiledMethod
itself (hard-coded).

But we _must not_ destroy stable and deterministic hashes. It would
be a huge blow to existing users and even future potential
possibilities of future implementations of things. Anything that
would try to share #hash calculations between images... Possibly
Croquet. It's easy to imagine some things!

MagmaDictionary is just a Dictionary except the internal array grows
on disk providing size scalability while sparing RAM. The #hash of
the objects put into it are calculated by the clients -- it must be
consistent between users.

source.squeak.org uses a MagmaDictionary for the revision history. It
was the best way to introduce the capability without invading
Monticello's code at all.
Post by Levente Uzonyi
Bob found out that #hash had been changed during the developement of
Squeak 3.9. Therefore this issue is not present in Cuis (forked from 3.7).
And I just checked Pharo and found that Behavior >> #hash had been removed
from there.
I think that's a big mistake. There's no need for us to cross the
line of a stable hash t
Chris Muller
2018-11-22 05:57:13 UTC
Permalink
Post by Bob Arning
That's "one should not rename classes and expect things to work", isn't
it?
Not really, since that is already true anyway. If you rename a class,
all references to t

Chris Muller
2018-11-20 19:43:59 UTC
Permalink
Wait, Message is the one used internally by the system. It doesn't
have #= and #hash because it didn't need it.

*MessageSend* is the one for external use, and it already defines #= and #hash.
Post by Eliot Miranda
Hi All,
In VisualWorks Message implements #= & #hash naturally; two messages whose selectors and arguments are #= are also equal. But in Cuis, Squeak and Pharo Message inherits #= and #hash from Object, i.e. uses identity comparison. This is, to say the least, annoying. Any objections to implementing comparing in Message to match V
Eliot Miranda
2018-11-20 21:59:15 UTC
Permalink
Hi Juan,
Hi Eliot,
(below)
Hi All,
In VisualWorks Message implements #= & #hash naturally; two messages
whose selectors and arguments are #= are also equal. But in Cuis, Squeak
and Pharo Message inherits #= and #hash from Object, i.e. uses identity
comparison. This is, to say the least, annoying. Any objections to
implementing comparing in Message to match VisualWorks?
_,,,^..^,,,_
best, Eliot
= aMessage
"Any object is equal to itself"
self == aMessage ifTrue: [ ^ true ].
self class == aMessage class ifFalse: [ ^false ].
selector = aMessage selector ifFalse: [ ^false ].
lookupClass = aMessage lookupClass ifFalse: [ ^false ].
^args = aMessage arguments
this last line should be

^args literalEqual: aMessage arguments

and this is very important :-)
hash
"Hash is reimplemented because = is implemented."
^selector hash bitXor: args hash
Do they look ok? (I considered lookupClass for #=, but not for hash, as
collisions because of that seem unlikely, and in any case, grouping them
that way could be useful).
Cheers,
--
Juan Vuletichwww.cuis-smalltalk.orghttps://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Devhttps://github.com/jvuletichhttps://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich
--
_,,,^..^,,,_
best, Eliot
Juan Vuletich
2018-11-21 21:31:58 UTC
Permalink
Post by Eliot Miranda
Hi Juan,
this last line should be
^args literalEqual: aMessage arguments
and this is very important :-)
--
_,,,^..^,,,_
best, Eliot
Thanks Eliot. Fix is now at Cuis GitHub repo.

Cheers,
--
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich
Loading...