Discussion:
The Trunk: Collections-mt.593.mcz
c***@source.squeak.org
0000-01-20 22:04:55 UTC
Permalink
Marcel Taeumel uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-mt.593.mcz

==================== Summary ====================

Name: Collections-mt.593
Author: mt
Time: 15 January 2015, 5:14:16.888 pm
UUID: 0316ee3c-b9ec-f845-9474-8df35a929d8b
Ancestors: Collections-mt.592

New ordered dictionary added, which keeps track of the insertion order.

=============== Diff against Collections-mt.592 ===============

Item was added:
+ Dictionary subclass: #OrderedDictionary
+ instanceVariableNames: 'order lastIndex'
+ classVariableNames: ''
+ poolDictionaries: ''
+ category: 'Collections-Sequenceable'!
+
+ !OrderedDictionary commentStamp: '<historical>' prior: 0!
+ I am an ordered dictionary. I have an additional index (called 'order') to keep track of the insertion order of my associations.
+
+ The read access is not affected by the additional index.
+
+ Storing new data updates the index in O(1) for new keys. Keys that are already present involve actions in O(n) to update the insertion order.
+
+ The growth operation compacts the index and takes O(n) additional time.!

Item was added:
+ ----- Method: OrderedDictionary>>add: (in category 'adding') -----
+ add: anAssociation
+
+ | oldLastIndex oldCapacity |
+ oldLastIndex := lastIndex.
+ oldCapacity := self capacity.
+
+ super add: anAssociation.
+
+ (lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [
+ | index |
+ "The association was already present or we grew. We need to update the order."
+ index := self scanOrderFor: anAssociation key.
+ index = lastIndex ifFalse: [
+ lastIndex := lastIndex + 1.
+ index := self scanOrderFor: anAssociation key.
+ order at: lastIndex put: (order at: index).
+ order at: index put: nil.
+ lastIndex = order size ifTrue: [self fixEmptySlots]]].
+
+ ^ anAssociation!

Item was added:
+ ----- Method: OrderedDictionary>>associationsDo: (in category 'enumerating') -----
+ associationsDo: aBlock
+ "Iterate over the order instead of the internal array."
+
+ lastIndex = 0 ifTrue: [^ self].
+ 1 to: lastIndex do: [:index |
+ (order at: index) ifNotNil: [:element |
+ aBlock value: element]].!

Item was added:
+ ----- Method: OrderedDictionary>>at:put: (in category 'accessing') -----
+ at: key put: anObject
+
+ | oldLastIndex oldCapacity |
+ oldLastIndex := lastIndex.
+ oldCapacity := self capacity.
+
+ super at: key put: anObject.
+
+ (lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [
+ | index |
+ "The association was already present or we grew. We need to update the order."
+ index := self scanOrderFor: key.
+ index = lastIndex ifFalse: [
+ lastIndex := lastIndex + 1.
+ order at: lastIndex put: (order at: index).
+ order at: index put: nil.
+ lastIndex = order size ifTrue: [self fixEmptySlots]]].
+
+ ^ anObject!

Item was added:
+ ----- Method: OrderedDictionary>>atNewIndex:put: (in category 'private') -----
+ atNewIndex: index put: anObject
+
+ lastIndex := lastIndex + 1.
+ order at: lastIndex put: anObject.
+
+ super atNewIndex: index put: anObject.!

Item was added:
+ ----- Method: OrderedDictionary>>fillOrderFrom: (in category 'private') -----
+ fillOrderFrom: anArray
+
+ | arraySize |
+ arraySize := lastIndex.
+ lastIndex := 0.
+ 1 to: arraySize do: [:index |
+ (anArray at: index) ifNotNil: [:object |
+ lastIndex := lastIndex + 1.
+ order at: lastIndex put: object]].!

Item was added:
+ ----- Method: OrderedDictionary>>fixEmptySlots (in category 'private') -----
+ fixEmptySlots
+ "Remove all nil slots in the order index to avoid overflow."
+
+ self fillOrderFrom: order.!

Item was added:
+ ----- Method: OrderedDictionary>>growTo: (in category 'private') -----
+ growTo: anInteger
+
+ | oldOrder |
+ super growTo: anInteger.
+ oldOrder := order.
+ order := self class arrayType new: anInteger.
+ self fillOrderFrom: oldOrder.!

Item was added:
+ ----- Method: OrderedDictionary>>initialize: (in category 'private') -----
+ initialize: n
+
+ super initialize: n.
+ order := self class arrayType new: n.
+ lastIndex := 0.!

Item was added:
+ ----- Method: OrderedDictionary>>removeKey:ifAbsent: (in category 'removing') -----
+ removeKey: key ifAbsent: aBlock
+
+ order at: (self scanOrderFor: key) put: nil.
+ ^ super removeKey: key ifAbsent: aBlock!

Item was added:
+ ----- Method: OrderedDictionary>>scanOrderFor: (in category 'private') -----
+ scanOrderFor: anObject
+
+ 1 to: lastIndex do: [:index |
+ | element |
+ ((element := order at: index) notNil and: [anObject = element key])
+ ifTrue: [^ index]].
+
+ lastIndex = order size
+ ifTrue: [self errorNoFreeSpace]
+ ifFalse: [^ self order at: lastIndex + 1 "nil"].!
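For readers less at home in Smalltalk, the scheme in the diff above (a hash dictionary plus a parallel 'order' array that records insertion order, with nil holes left by removals and a compaction step on growth) can be sketched in Python. This is an illustrative sketch, not a translation of the Squeak code; all names are invented here.

```python
class OrderArrayDict:
    """Sketch of the Collections-mt.593 approach: a hash dict plus a
    parallel 'order' list recording insertion order, with None holes
    left by removals and compacted when the list fills up."""

    def __init__(self, capacity=8):
        self._dict = {}
        self._order = [None] * capacity  # key slots, in insertion order
        self._last = 0                   # one past the last used slot

    def __setitem__(self, key, value):
        if key not in self._dict:        # new key: O(1) append to the order
            if self._last == len(self._order):
                self._compact_or_grow()
            self._order[self._last] = key
            self._last += 1
        self._dict[key] = value

    def __delitem__(self, key):
        del self._dict[key]
        i = self._order.index(key)       # O(n) scan, like scanOrderFor:
        self._order[i] = None            # leave a hole

    def _compact_or_grow(self):
        # Analogue of fixEmptySlots / growTo: squeeze out the holes,
        # and double the slot array only if there were none.
        keys = [k for k in self._order if k is not None]
        if len(keys) == len(self._order):
            self._order.extend([None] * len(self._order))
        self._order[:len(keys)] = keys
        self._order[len(keys):] = [None] * (len(self._order) - len(keys))
        self._last = len(keys)

    def items(self):                     # iterate in insertion order
        return [(k, self._dict[k]) for k in self._order if k is not None]
```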
Marcel Taeumel
2015-01-15 16:15:34 UTC
Permalink
Well, it was just a little programming exercise for me. Let's discuss whether
you like it or not. ;) I did need such a container several times in the
past. Maybe there is a reason why Squeak did not have it? :)

Best,
Marcel



--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4799754.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.
Levente Uzonyi
2015-01-15 19:22:19 UTC
Permalink
I find the OrderedDictionary implementation similar to the one in Pharo,
and therefore sub-optimal. The order variable is only used to iterate
over the associations in insertion/update order.
Such behavior can be achieved with a LinkedDictionary implementation, which
provides O(1) average time for all lookup-based operations, and can also
support different behaviors (order by last access instead of order by
insertion/update) within the same time constraints.

When I implement such linked dictionaries (see LRUCache, or OCompletition
for examples), I usually link the values, because that way I can use any
dictionary. For a general purpose linked dictionary, the links can be
added to the associations to hide the details from the user.
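The value-linking pattern Levente describes (put the links on the values so that any dictionary can be used underneath) might look like this in Python; the class and method names are illustrative, not taken from his code:

```python
class Link:
    """A value wrapper carrying the list links; the table itself stays
    a plain dict, so lookup remains O(1) average time."""
    __slots__ = ('key', 'value', 'prev', 'next')

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None


class LinkedDict:
    def __init__(self):
        self._table = {}                 # key -> Link
        self._first = self._last = None  # ends of the insertion-order list

    def __setitem__(self, key, value):
        link = self._table.get(key)
        if link is not None:             # update in place; order unchanged
            link.value = value
            return
        link = Link(key, value)          # new key: append to the list
        self._table[key] = link
        if self._last is None:
            self._first = self._last = link
        else:
            link.prev = self._last
            self._last.next = link
            self._last = link

    def __getitem__(self, key):
        return self._table[key].value

    def keys(self):                      # iterate in insertion order
        link, out = self._first, []
        while link is not None:
            out.append(link.key)
            link = link.next
        return out
```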

Other than this, I found the following issues with the code after a quick
look:
- #scanOrderFor: sends #order, which is not implemented.
- #add: evaluates [index := self scanOrderFor: anAssociation key.] twice.
The second evaluation seems to be unnecessary.
- The class comment doesn't state everywhere that the big-O notation
applies to time, not space.

Can you give me some examples when/where you needed an ordered dictionary?

Levente
Post by Marcel Taeumel
Well, it was just a little programming exercise for me. Let's discuss whether
you like it or not. ;) I did need such a container several times in the
past. Maybe there is a reason why Squeak did not have it? :)
Best,
Marcel
Marcel Taeumel
2015-01-16 08:39:55 UTC
Permalink
Hi Levente,

thank you for the quick code review. I did some fixes. :)

For the LinkedDictionary: When do you establish the desired order
(insertion, lexical, ... someBlock) for the associations?

Use cases are similar to OrderedCollection. But instead of accessing items
via an index (someItems at: 5), you can access them with a more descriptive key
(e.g., someItems at: #foobar). I will look for a more concrete one.

To be better comparable with OrderedCollection, there might be an additional
protocol to support #atFirst and others... Hmmm...

Here are more references to the concept itself:
https://www.python.org/dev/peps/pep-0372/
https://docs.python.org/3/library/collections.html#collections.OrderedDict

Best,
Marcel



--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4799843.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.
Bert Freudenberg
2015-01-16 11:13:00 UTC
Permalink
Post by Marcel Taeumel
https://www.python.org/dev/peps/pep-0372/
Interesting. This states that the order is never updated: when an existing element is overwritten, it is not moved to the end. Only new keys are appended. This allows for a more efficient implementation (such as suggested by Levente).

I think an OrderedDictionary would most often be used to preserve the original order of keys. Assigning a new value should not move the item to the end.
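Bert's reading matches Python's actual behavior, which is easy to check with the standard library's collections.OrderedDict:

```python
from collections import OrderedDict

# An OrderedDict keeps keys in insertion order. Overwriting an
# existing key updates the value but does NOT move the key.
d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['a'] = 99                 # overwrite: 'a' keeps its position
print(list(d))              # ['a', 'b']

# Reordering is a separate, explicit operation:
d.move_to_end('a')
print(list(d))              # ['b', 'a']
```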

- Bert -
Marcel Taeumel
2015-01-16 13:26:11 UTC
Permalink
Hmm... you're right. It seems that in C#, it is just an overwrite, too, which
could imply no update of the order.

http://msdn.microsoft.com/en-us/library/system.collections.specialized.ordereddictionary.add%28v=vs.100%29.aspx

I cannot verify that right now. However, this would really make the
implementation simpler and maybe more flexible.

Still, I have to look into this linking of values. Did not get it yet... :)

Best,
Marcel



--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4799882.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.
Levente Uzonyi
2015-01-16 14:07:30 UTC
Permalink
Post by Marcel Taeumel
Hmm... you're right. It seems that in C#, it is just an overwrite, too. Which
could imply no update of the order.
http://msdn.microsoft.com/en-us/library/system.collections.specialized.ordereddictionary.add%28v=vs.100%29.aspx
I cannot verify that right now. However, this would really make the
implementation simpler and maybe more flexible.
Still, I have to look into this linking of values. Did not get it yet... :)
I've uploaded a quick implementation, which uses the insertion order.
http://leves.web.elte.hu/squeak/LinkedDictionary-ul.1.mcz

Levente
Post by Marcel Taeumel
Best,
Marcel
Tobias Pape
2015-01-16 15:01:24 UTC
Permalink
Post by Bert Freudenberg
Post by Marcel Taeumel
https://www.python.org/dev/peps/pep-0372/
Interesting. This states that the order is never updated: when an existing element is overwritten, it is not moved to the end. Only new keys are appended. This allows for a more efficient implementation (such as suggested by Levente).
I think an OrderedDictionary would most often be used to preserve the original order of keys. Assigning a new value should not move the item to the end.
ACK. See other mail.
Best
-Tobias
Post by Bert Freudenberg
- Bert -
Eliot Miranda
2015-01-16 15:21:37 UTC
Permalink
Post by Bert Freudenberg
Post by Marcel Taeumel
https://www.python.org/dev/peps/pep-0372/
Interesting. This states that the order is never updated: when an existing element is overwritten, it is not moved to the end. Only new keys are appended. This allows for a more efficient implementation (such as suggested by Levente).
I think an OrderedDictionary would most often be used to preserve the original order of keys. Assigning a new value should not move the item to the end.
+1
Levente Uzonyi
2015-01-16 13:31:02 UTC
Permalink
Post by Marcel Taeumel
Hi Levente,
thank you for the quick code review. I did some fixes. :)
For the LinkedDictionary: When do you establish the desired order
(insertion, lexical, ... someBlock) for the associations?
I'm not sure if I understand this question right, so I'll try to do my best
to give a meaningful answer.

If the goal is to keep the insertion order, then #atNewIndex:put: is the
right place.
If you want it to reflect the order of modifications (insert + update),
then #add: and #at:put: have to be modified too.
If you want the order to reflect the order of accesses, then all other
methods which access the value (or the key?) have to be changed as well.

For the list implementation, I always prefer the circular doubly-linked
list with a head element. (Not sure if this is the right term in English.
It means a separate element with no value, just the next and the previous
links. This allows the code to be simpler, because the list always has
at least one element.) This list provides O(1) time removal, and insertion
next to the head (or any other known element). This is why any kind of
ordering can be implemented with O(1) extra time, as long as the elements
are moved to the front.
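The sentinel-based circular list Levente describes can be sketched in Python, here wired for access order (every read or update relinks the node); all names are illustrative:

```python
class Node:
    __slots__ = ('key', 'value', 'prev', 'next')

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = self     # circular by default


class AccessOrderedDict:
    """Circular doubly-linked list with a head sentinel: the list is
    never empty, so unlink and insert need no special cases, and each
    is O(1)."""

    def __init__(self):
        self._head = Node()              # sentinel: no key, no value
        self._table = {}                 # key -> Node

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _insert_before_head(self, node):
        # Splicing next to the sentinel = appending at the list's back.
        tail = self._head.prev
        tail.next = node
        node.prev = tail
        node.next = self._head
        self._head.prev = node

    def __setitem__(self, key, value):
        node = self._table.get(key)
        if node is None:
            node = Node(key, value)
            self._table[key] = node
        else:
            node.value = value
            self._unlink(node)           # access order: refresh position
        self._insert_before_head(node)

    def __getitem__(self, key):          # reads also refresh the order
        node = self._table[key]
        self._unlink(node)
        self._insert_before_head(node)
        return node.value

    def keys(self):                      # least recently used first
        out, node = [], self._head.next
        while node is not self._head:
            out.append(node.key)
            node = node.next
        return out
```

This is the same organization Java's LinkedHashMap uses in its access-order mode.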

The only drawback this dictionary implementation has is that the
associations added with #add: can't be used, unless they are from the
right class (the one which has the two links).

In Java this collection is called LinkedHashMap:
http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html
This supports two kinds of ordering: access order and insertion order.

Levente
Post by Marcel Taeumel
Use cases are similar to OrderedCollection. But instead of accessing items
via index (someItems at: 5), you can access it with a more descriptive key
(e.g., someItems at: #foobar). I will look for a more concrete one.
To be better comparable with OrderedCollection, there might be an additional
protocol to support #atFirst and others... Hmmm...
https://www.python.org/dev/peps/pep-0372/
https://docs.python.org/3/library/collections.html#collections.OrderedDict
Best,
Marcel
Marcel Taeumel
2015-01-16 13:54:20 UTC
Permalink
Ah, okay. I was curious about the performance of iterating over an array vs.
the linked list:

a := Array new: 1000.
l := LinkedList new.

1000 timesRepeat: [l add: (StackLink with: nil)].

[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"

[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"

I expected the opposite! :-O Why is that so?

Best,
Marcel



--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4799893.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.
Eliot Miranda
2015-01-16 15:36:56 UTC
Permalink
Hi Marcel,
Post by Marcel Taeumel
Ah, okay. I was curious about the performance of iterating over an array vs.
a := Array new: 1000.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: nil)].
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
I expected the opposite! :-O Why is that so?
Look at the implementations of do:.

In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.

In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
Post by Marcel Taeumel
Best,
Marcel
Bert Freudenberg
2015-01-16 15:53:11 UTC
Permalink
Post by Eliot Miranda
Hi Marcel,
Post by Marcel Taeumel
Ah, okay. I was curious about the performance of iterating over an array vs.
a := Array new: 1000.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: nil)].
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
I expected the opposite! :-O Why is that so?
Look at the implementations of do:.
In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
Making it all the more surprising that the linked list is twice as fast.

My guess would be that the two inst var reads are way more efficient than the at: primitive.

- Bert -
Eliot Miranda
2015-01-16 16:27:53 UTC
Permalink
Post by Bert Freudenberg
Post by Eliot Miranda
Hi Marcel,
Post by Marcel Taeumel
Ah, okay. I was curious about the performance of iterating over an array vs.
a := Array new: 1000.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: nil)].
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
I expected the opposite! :-O Why is that so?
Look at the implementations of do:.
In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
Making it all the more surprising that the linked list is twice as fast.
Ugh. I was blindsided. I saw the numbers as times not rates. Marcel, I'm sorry. OK indeed the difference would be the cost of the bounds check in each array access. Interesting to see at what point the poorer memory locality of the list makes it worse. Difficult to measure given the locality is decided by history and the GC.
Post by Bert Freudenberg
My guess would be that the two inst var reads are way more efficient than the at: primitive.
- Bert -
Marcel Taeumel
2015-01-16 16:28:47 UTC
Permalink
Okay, the Array >> #do: comes down to Number's:

to: stop do: aBlock
"Normally compiled in-line, and therefore not overridable.
Evaluate aBlock for each element of the interval (self to: stop by: 1)."
| nextValue |
nextValue := self.
[nextValue <= stop]
whileTrue:
[aBlock value: nextValue.
nextValue := nextValue + 1]

And the LinkedList >> #do: is:

do: aBlock
| aLink |
aLink := firstLink.
[aLink == nil] whileFalse:
[aBlock value: aLink.
aLink := aLink nextLink]

So Array iteration has the increment by 1 in each iteration. Is that what makes
the difference? I thought that #to:do: was optimized, as it does not appear on
the stack... :)

Best,
Marcel



--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4799947.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.
Eliot Miranda
2015-01-16 16:39:34 UTC
Permalink
Hi Marcel,
Post by Marcel Taeumel
to: stop do: aBlock
"Normally compiled in-line, and therefore not overridable.
Evaluate aBlock for each element of the interval (self to: stop by: 1)."
| nextValue |
nextValue := self.
[nextValue <= stop]
[aBlock value: nextValue.
nextValue := nextValue + 1]
do: aBlock
| aLink |
aLink := firstLink.
[aBlock value: aLink.
aLink := aLink nextLink]
So Array iteration has the incrementation of 1 in each iteration. That makes
the difference? I thought that #to:do: is optimized as it does not appear on
the stack... :)
Ghhh, it's too early :-). Only just now drinking some tea. The Array implementation is compiled to an inlined while loop, so all the overhead is in each application of #at:, which does bounds checking every time.

I'll profile this with the VMProfiler in a few minutes. This is great news for Sista because this is some of the overhead it will eliminate.
Post by Marcel Taeumel
Best,
Marcel
Bert Freudenberg
2015-01-16 16:50:55 UTC
Permalink
Post by Marcel Taeumel
to: stop do: aBlock
"Normally compiled in-line, and therefore not overridable.
Evaluate aBlock for each element of the interval (self to: stop by: 1)."
| nextValue |
nextValue := self.
[nextValue <= stop]
[aBlock value: nextValue.
nextValue := nextValue + 1]
No, the to:do: is inlined. Look at the bytecodes ...

do: aBlock
1 to: self size do:
[:index | aBlock value: (self at: index)]
Post by Marcel Taeumel
do: aBlock
| aLink |
aLink := firstLink.
[aBlock value: aLink.
aLink := aLink nextLink]
So Array iteration has the incrementation of 1 in each iteration. That makes
the difference? I thought that #to:do: is optimized as it does not appear on
the stack... :)
The difference is that #at: is much more expensive than #nextLink.

Interestingly, on the interpreter VM the Array version is ever so slightly faster.

- Bert -
Levente Uzonyi
2015-01-16 16:55:23 UTC
Permalink
Post by Eliot Miranda
Post by Bert Freudenberg
Post by Eliot Miranda
Hi Marcel,
Post by Marcel Taeumel
Ah, okay. I was curious about the performance of iterating over an array vs.
a := Array new: 1000.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: nil)].
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
I expected the opposite! :-O Why is that so?
Look at the implementations of do:.
In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
Making it all the more surprising that the linked list is twice as fast.
Ugh. I was blindsided. I saw the numbers as times not rates. Marcel, I'm sorry. OK indeed the difference would be the cost of the bounds check in each array access. Interesting to see at what point the poorer memory locality of the list makes it worse. Difficult to measure given the locality is decided by history and the GC.
In general the locality of the links is worse. In the example they are
allocated almost next to each other.
A slightly better example would be this:

| buffer |
buffer := Array new: 1000. "This will hold the filler objects."
#(1024 2048 4096 8192 16384 32768 65536) collect: [ :fillerSize |
| list |
list := LinkedList new.
1 to: 1000 do: [ :index |
list add: (StackLink with: nil).
buffer at: index put: (Array new: fillerSize) ].
Smalltalk garbageCollect.
fillerSize -> [ list do: [ :ea | 1 + 2 ] ] bench ]

{ "gap between nodes (x4 bytes)" -> "runs"
1024->'78,000 per second.'.
2048->'78,600 per second.'.
4096->'77,500 per second.'.
8192->'73,300 per second.'.
16384->'70,200 per second.'.
32768->'66,400 per second.'.
65536->'56,400 per second.'}

(The result is '45,700 per second.' for Arrays)

The results depend on the CPU (and probably on the OS too), but the
numbers are still pretty good. A longer list would probably cause more
problems for the caches.
The same for 10k elements in the list:

{
1024->'3,770 per second.'.
2048->'4,780 per second.'.
4096->'4,220 per second.'.
8192->'2,480 per second.'.
16384->'2,110 per second.'}

(The result is '4,380 per second.' for Arrays)

So the speed of a list depends on the cache locality of its elements. The
performance is still comparable with that of arrays in practical cases, but
it may be faster or slower depending on the cache locality.

Levente
Post by Eliot Miranda
Post by Bert Freudenberg
My guess would be that the two inst var reads are way more efficient than the at: primitive.
- Bert -
Tobias Pape
2015-01-16 17:30:12 UTC
Permalink
Post by Eliot Miranda
Hi Marcel,
[snip]

These kind of discussions is why I love squeak-dev :)
I always lern.

Best
-Tobias
Tobias Pape
2015-01-16 17:31:55 UTC
Permalink
Post by Tobias Pape
Post by Eliot Miranda
Hi Marcel,
[snip]
These kind of discussions is why I love squeak-dev :)
I always lern.
(save spelling, apparently -.-')
Post by Tobias Pape
Best
-Tobias
Bert Freudenberg
2015-01-16 17:45:50 UTC
Permalink
Post by Tobias Pape
Post by Tobias Pape
Post by Eliot Miranda
Hi Marcel,
[snip]
These kind of discussions is why I love squeak-dev :)
I always lern.
(save spelling, apparently -.-')
That was perfect Germanish ;)

- Bert -
Chris Muller
2015-01-16 19:49:34 UTC
Permalink
Post by Levente Uzonyi
In general the locality of the links are worse. In the example they are
allocated almost next to each other.
| buffer |
buffer := Array new: 1000. "This will hold the filler objects."
#(1024 2048 4096 8192 16384 32768 65536) collect: [ :fillerSize |
| list |
list := LinkedList new.
1 to: 1000 do: [ :index |
list add: (StackLink with: nil).
buffer at: index put: (Array new: fillerSize) ].
Smalltalk garbageCollect.
fillerSize -> [ list do: [ :ea | 1 + 2 ] ] bench ]
{ "gap between nodes (x4 bytes)" -> "runs"
1024->'78,000 per second.'.
2048->'78,600 per second.'.
4096->'77,500 per second.'.
8192->'73,300 per second.'.
16384->'70,200 per second.'. 32768->'66,400 per second.'.
65536->'56,400 per second.'}
(The result is '45,700 per second.' for Arrays)
The results depend on the CPU (and probably on the OS too), but the numbers
are still pretty good. Probably a longer list would cause more problem for
the caches.
{
1024->'3,770 per second.'.
2048->'4,780 per second.'.
4096->'4,220 per second.'.
8192->'2,480 per second.'.
16384->'2,110 per second.'}
(The result is '4,380 per second.' for Arrays)
So the speed of a list depends on the cache locality of its elements. The
performance is still comparable with arrays for practical cases, but may be
both faster and slower depending on the cache locality.
Geez you guys /seriously/ rock. I never thought down to that level of
(CPU) cache locality.

You've got me feeling like I should go back to some
performance-critical code which is enumerating Array's and see about
converting them to LinkedLists.. too bad I don't have time to do
that!
Eliot Miranda
2015-01-16 20:06:35 UTC
Permalink
Post by Eliot Miranda
Post by Levente Uzonyi
In general the locality of the links are worse. In the example they are
allocated almost next to each other.
| buffer |
buffer := Array new: 1000. "This will hold the filler objects."
#(1024 2048 4096 8192 16384 32768 65536) collect: [ :fillerSize |
| list |
list := LinkedList new.
1 to: 1000 do: [ :index |
list add: (StackLink with: nil).
buffer at: index put: (Array new: fillerSize) ].
Smalltalk garbageCollect.
fillerSize -> [ list do: [ :ea | 1 + 2 ] ] bench ]
{ "gap between nodes (x4 bytes)" -> "runs"
1024->'78,000 per second.'.
2048->'78,600 per second.'.
4096->'77,500 per second.'.
8192->'73,300 per second.'.
16384->'70,200 per second.'. 32768->'66,400 per second.'.
65536->'56,400 per second.'}
(The result is '45,700 per second.' for Arrays)
The results depend on the CPU (and probably on the OS too), but the
numbers
Post by Levente Uzonyi
are still pretty good. Probably a longer list would cause more problem
for
Post by Levente Uzonyi
the caches.
{
1024->'3,770 per second.'.
2048->'4,780 per second.'.
4096->'4,220 per second.'.
8192->'2,480 per second.'.
16384->'2,110 per second.'}
(The result is '4,380 per second.' for Arrays)
So the speed of a list depends on the cache locality of its elements. The
performance is still comparable with arrays for practical cases, but may
be
Post by Levente Uzonyi
both faster and slower depending on the cache locality.
Geez you guys /seriously/ rock. I never thought down to that level of
(CPU) cache locality.
You've got me feeling like I should go back to some
performance-critical code which is enumerating Array's and see about
converting them to LinkedLists.. too bad I don't have time to do
that!
Please, folks, unless you're really pressed don't bother. We should have
Sista deployed this year and that will take care of this nicely.
--
best,
Eliot
Paul DeBruicker
2015-01-16 19:36:00 UTC
Permalink
Post by Eliot Miranda
Ugh. I was blindsided. I saw the numbers as times not rates. Marcel,
I'm sorry. OK indeed the difference would be the cost of the bounds check
in each array access. Interesting to see at what point the poorer memory
locality of the list makes it worse. Difficult to measure given the
locality is decided by history and the GC.
With Spur could you use object pinning to make a dense linked list? And then
use that for arrays that need to be fast?



--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4800035.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.
Eliot Miranda
2015-01-16 19:47:55 UTC
Permalink
Hi Paul,
Post by Paul DeBruicker
Post by Eliot Miranda
Ugh. I was blindsided. I saw the numbers as times not rates. Marcel,
I'm sorry. OK indeed the difference would be the cost of the bounds
check
Post by Eliot Miranda
in each array access. Interesting to see at what point the poorer memory
locality of the list makes it worse. Difficult to measure given the
locality is decided by history and the GC.
With Spur could you use object pinning to make a dense linked list? And then
use that for arrays that need to be fast?
It might make some difference, but I doubt it'll work well. There is code
to cluster pinned objects in the segment or segments that contain pinned
objects. But it's not going to work well; if the object is in new space
then it is copied to old space and forwarded, plus some scanning of stack
pages. That's a lot of overhead. If it's in old space anyway, the GC just
sets the bit.

Pinning is really there for objects to be passed through the FFI, or for
space that needs to be referred to from machine code, such as conditional
branch counters in Sista. It's not there to control locality; hence it is
unoptimised. But that's an interesting idea :-).

A more general solution would be to have some sort of reorganisation in the
VM. Currently the compaction algorithm used with global GC scrambles
things. It would be nice if the incremental global compactor did not
scramble, and it's my intent to at least avoid scrambling, if not undo it,
in the incremental global GC (which has yet to be started).

--
best,
Eliot
Eliot Miranda
2015-01-16 15:55:27 UTC
Permalink
Hi All,
Post by Eliot Miranda
Hi Marcel,
Post by Marcel Taeumel
Ah, okay. I was curious about the performance of iterating over an array vs.
a := Array new: 1000.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: nil)].
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
I expected the opposite! :-O Why is that so?
Look at the implementations of do:.
In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
This is interesting. (Marcel, Chris, forgive me; I'm presuming; please don't take this personally.) Marcel above appears to lack an intuition about the structure of Array vs LinkedList. And in developing a hash algorithm for a 32-bit subset of Floats a few weeks ago, Chris appeared to lack an intuition about Floats being boxed, assuming they were value types, not containers.

As a VM implementer I carry around a clear picture (literally, I am a visual thinker) of objects in my head. Those pictures are key to my approach to design and optimization.

I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source, it is difficult to develop a good picture or mental model. For me, I read the blue book first, which is replete with pictures.

I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs. I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.

Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework using eg Jun, render it to a bitmap and include it in the class comment.

A more elaborate system could of course include the sample graph and render it dynamically, that would allow exploration.

Either approach would make an interesting project, yes?
Post by Eliot Miranda
Post by Marcel Taeumel
Best,
Marcel
--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4799893.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.
Eliot Miranda
2015-01-16 16:00:36 UTC
Post by Eliot Miranda
Hi All,
Post by Eliot Miranda
Hi Marcel,
Post by Marcel Taeumel
Ah, okay. I was curious about the performance of iterating over an array vs. a linked list:
a := Array new: 1000.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: nil)].
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
I expected the opposite! :-O Why is that so?
Look at the implementations of do:.
In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
This is interesting. (Marcel, Chris, forgive me; I'm presuming; please don't take this personally). Marcel above appears to lack an intuition about the structure of Array vs LinkedList. And in developing a hash algorithm for a 32-bit subset of Floats a few weeks ago, Chris appeared to lack an intuition about Floats being boxed, assuming they were value types, not containers.
As a VM implementer I carry around a clear picture (literally, I am a visual thinker) of objects in my head. Those pictures are key to my approach to design and optimization.
and lest anyone think I have no problem developing these pictures: elsewhere in the thread Levente discussed the linked hash map organization of OrderedDictionary. It took me the few minutes in which I drafted an erroneous (and unsent) email to develop the picture that made me realize Levente was right about lookup being O(1); I didn't have the picture of a link in my head when I started writing.
Post by Eliot Miranda
I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source it is difficult to develop a good picture or mental model. For me, I read the blue book first, which is replete with pictures.
I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs. I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.
Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework using eg Jun, render it to a bitmap and include it in the class comment.
A more elaborate system could of course include the sample graph and render it dynamically, that would allow exploration.
Either approach would make an interesting project, yes?
Post by Eliot Miranda
Post by Marcel Taeumel
Best,
Marcel
Marcel Taeumel
2015-01-16 16:16:56 UTC
Ah, I just thought of an array as being some contiguous memory area. Mea
culpa. :)

@ Levente: Thanks for the implementation of LinkedDictionary! Now I see
what you meant. The links are still an addition to the array of Dictionary.
I misunderstood that part.
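For readers who, like me, had trouble forming the picture: a hypothetical sketch of the idea (not Levente's actual code). The associations still live in the ordinary Dictionary hash array, so lookup stays O(1); each one merely carries an extra pointer forming an insertion-order chain.

```smalltalk
"Hypothetical: an association that also remembers its successor in insertion order."
Association subclass: #SketchLinkedAssociation
	instanceVariableNames: 'nextOrdered'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Collections-Sketch'

"Ordered enumeration then just walks the chain from a remembered head,
independently of where the hash function placed each association:

do: aBlock
	| assoc |
	assoc := headAssociation.
	[assoc == nil] whileFalse:
		[aBlock value: assoc value.
		 assoc := assoc nextOrdered]"
```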

Best,
Marcel



Eliot Miranda
2015-01-16 16:33:55 UTC
Post by Marcel Taeumel
Ah, I just thought of an array as being some contiguous memory area. Mea
That's exactly what it is. I've made a fool of myself replying to email before I've properly woken up :-/
Post by Marcel Taeumel
@ Levente: Thanks for the implementation of LinkedDictionary! Now I see
what you meant. The links are still an addition to the array of Dictionary.
I misunderstood that part.
Best,
Marcel
Chris Muller
2015-01-16 18:41:55 UTC
Post by Eliot Miranda
I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source it is difficult to develop a good picture or mental model. For me, I read the blue book first, which is replete with pictures.
I like to envision the physical model in my mind. I look at the class
definitions to identify the object links (for pointer objects) and
then the methods which modify those vars.

Once I have that picture of the physical model in my head, I think
about that shape model with each node "suited up" with its behaviors.
Knowing the nodes' underlying links to other objects even helps me
evaluate the consistency and nuance of the API.
Post by Eliot Miranda
I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs. I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.
For years, I have been /totally/ interested in Alexandre's project,
not just to represent in-memory models but also large Magma models,
visually. Roassal might struggle to perform with millions of nodes, I
don't know, but I know Craig used Walrus to render a large model.
Unfortunately I just haven't had the time to dig into these projects.
Post by Eliot Miranda
Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework using eg Jun, render it to a bitmap and include it in the class comment.
Squeak's Text model actually supports the inclusion of any Morph
embedded as a single character. Maui uses that for its "documents", and
it works very well. Seeing animated, fully functioning Morphs
as characters in a document is kinda cool.
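A minimal sketch of what that looks like (from memory, so treat the details as assumptions; the anchor character and TextAnchor attribute may vary between Squeak versions):

```smalltalk
"Embed a Morph as a single character of a Text via a TextAnchor attribute."
| text |
text := Text
	string: (String with: (Character value: 1))  "placeholder glyph for the morph"
	attribute: (TextAnchor new anchoredMorph: EllipseMorph new).
"Putting this text into a TextMorph displays the ellipse inline as if it
were a character."
(TextMorph new newContents: text) openInWorld
```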
Post by Eliot Miranda
A more elaborate system could of course include the sample graph and render it dynamically, that would allow exploration.
Either approach would make an interesting project, yes?
Post by c***@source.squeak.org
Best,
Marcel
Craig Latta
2015-01-16 21:23:58 UTC
Hi Chris--
Post by Chris Muller
For years, I have been /totally/ interested in Alexandre's project,
not just to represent in-memory models but large Magma models,
visually. Roassal might struggle to perform with millions of nodes, I
don't know, but I know Craig used Walrus to render a large model.
Yes, Walrus is fantastic. There's still nothing else that does
interactive hyperbolic trees, as far as I know.


-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)
Trygve Reenskaug
2015-01-17 11:06:23 UTC
Eliot,
I suggest using BabySRI to help students internalize the difference
between a class-oriented and an object-oriented mindset. My message "A
puzzle for the New Year" of 02.01.2015 12:01 has an example.
--Trygve
Post by Eliot Miranda
Hi All,
+++
This is interesting. (Marcel, Chris, forgive me; I'm presuming; please don't take this personally). Marcel above appears to lack an intuition about the structure of Array vs LinkedList. And in developing a hash algorithm for a 32-bit subset of Floats a few weeks ago, Chris appeared to lack an intuition about Floats being boxed, assuming they were value types, not containers.
As a VM implementer I carry around a clear picture (literally, I am a visual thinker) of objects in my head. Those pictures are key to my approach to design and optimization.
I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source it is difficult to develop a good picture or mental model. For me, I read the blue book first, which is replete with pictures.
I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs. I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.
Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework using eg Jun, render it to a bitmap and include it in the class comment.
A more elaborate system could of course include the sample graph and render it dynamically, that would allow exploration.
Either approach would make an interesting project, yes?
--
The essence of object orientation is that
objects collaborate to achieve a goal.
The class hierarchy is in the nature of a comment;
it has no run time significance.
---
Trygve Reenskaug    mailto: ***@ifi.uio.no
Morgedalsvn. 5A     http://folk.uio.no/trygver/
N-0378 Oslo         http://fullOO.info
Norway              Tel: (+47) 22 49 57 27
Eliot Miranda
2015-01-16 21:39:40 UTC
Hi Marcel,

On Fri, Jan 16, 2015 at 5:54 AM, Marcel Taeumel <
Post by Marcel Taeumel
Ah, okay. I was curious about the performance of iterating over an array vs. a linked list:
a := Array new: 1000.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: nil)].
[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
I expected the opposite! :-O Why is that so?
Best,
Marcel
OK, here are the profiling results, all on a .2GHz Core i7 MBP (and these
numbers are noisy; there's a lot else going on on my machine). First,
using Cog and a slightly modified benchmark that uses the elements of the
collections, so that the benchmark is "real" and each block does
the same amount of work:

| a l |
a := Array new: 1000 withAll: 1 -> 1.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
{ [| t | t := 0. a do: [:ea | t := t + ea key]. t] bench.
  [| t | t := 0. l do: [:ea | t := t + ea element]. t] bench }
#('51,900 per second.' '70,400 per second.')

Now the same code on Spur:
#('57,100 per second.' '63,300 per second.')

at: is faster, but (alas) LinkedList>>do: is slower because in Spur #==
requires a read barrier to check for possible forwarders, so in "[aLink ==
nil] whileFalse:" Spur has to fetch a field from aLink to ensure it is not
forwarded.

At the VM level we can see that Object>>at: is expensive, but the block
overhead dominates. First Cog

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.v3/Fast.app/Contents/MacOS/Squeak
1/16/2015
eden size: 2,097,152 stack pages: 160 code size: 1,048,576

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench

gc prior. clear prior.
5.004 seconds; sampling frequency 1471 hz
7340 samples in the VM (7363 samples in the entire program) 99.69% of total

7302 samples in generated vm code 99.48% of entire vm (99.17% of total)
38 samples in vanilla vm code 0.52% of entire vm ( 0.52% of total)

% of generated vm code (% of total) (samples) (cumulative)
30.33% (30.08%) BlockClosure>>value: (2215) (30.33%)
18.47% (18.32%) Object>>at: (1349) (48.81%)
16.83% (16.69%) UndefinedObject>>DoIt (1229) (65.64%)
14.43% (14.31%) SequenceableCollection>>do: (1054) (80.07%)
9.57% ( 9.49%) LookupKey>>key (699) (89.65%)
6.35% ( 6.30%) SmallInteger>>+ (464) (96.00%)
3.75% ( 3.72%) PIC at: (274) (99.75%)
0.07% ( 0.07%) BlockClosure>>value (5) (99.82%)
0.05% ( 0.05%) BlockClosure>>bench (4) (99.88%)
0.04% ( 0.04%) PIC size (3) (99.92%)
0.04% ( 0.04%) ceClosureCopyTrampoline (3) (99.96%)
0.03% ( 0.03%) Time class>>mi...condClockValue (2) (99.99%)
0.01% ( 0.01%) ArrayedCollection>>size (1) (100.0%)


| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench

gc prior. clear prior.
5.001 seconds; sampling frequency 1452 hz
7240 samples in the VM (7260 samples in the entire program) 99.72% of total

7186 samples in generated vm code 99.25% of entire vm (98.98% of total)
54 samples in vanilla vm code 0.75% of entire vm ( 0.74% of total)

% of generated vm code (% of total) (samples) (cumulative)
29.22% (28.93%) BlockClosure>>value: (2100) (29.22%)
26.32% (26.05%) UndefinedObject>>DoIt (1891) (55.54%)
22.92% (22.69%) LinkedList>>do: (1647) (78.46%)
8.14% ( 8.06%) SmallInteger>>+ (585) (86.60%)
6.76% ( 6.69%) StackLink>>element (486) (93.36%)
6.46% ( 6.39%) Link>>nextLink (464) (99.82%)
0.08% ( 0.08%) Time class>>...ndClockValue (6) (99.90%)
0.04% ( 0.04%) BlockClosure>>bench (3) (99.94%)
0.03% ( 0.03%) BlockClosure>>value (2) (99.97%)
0.01% ( 0.01%) ceClosureCopyTrampoline (1) (99.99%)
0.01% ( 0.01%) ceCreateNew...Trampoline (1) (100.0%)


and then Spur:

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak
1/16/2015
eden size: 4,101,312 stack pages: 160 code size: 1,048,576

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench

gc prior. clear prior.
5.002 seconds; sampling frequency 1471 hz
7340 samples in the VM (7359 samples in the entire program) 99.74% of total

7330 samples in generated vm code 99.86% of entire vm (99.61% of total)
10 samples in vanilla vm code 0.14% of entire vm ( 0.14% of total)

% of generated vm code (% of total) (samples) (cumulative)
33.06% (32.93%) BlockClosure>>value: (2423) (33.06%)
18.66% (18.59%) UndefinedObject>>DoIt (1368) (51.72%)
17.97% (17.90%) SequenceableCollection>>do: (1317) (69.69%)
13.33% (13.28%) Object>>at: (977) (83.02%)
8.84% ( 8.81%) SmallInteger>>+ (648) (91.86%)
4.90% ( 4.88%) PIC at: (359) (96.75%)
3.08% ( 3.07%) LookupKey>>key (226) (99.84%)
0.05% ( 0.05%) ceSmallBlockContext (4) (99.89%)
0.04% ( 0.04%) BlockClosure>>value (3) (99.93%)
0.03% ( 0.03%) Time class>>mi...condClockValue (2) (99.96%)
0.01% ( 0.01%) ArrayedCollection>>size (1) (99.97%)
0.01% ( 0.01%) BlockClosure>>bench (1) (99.99%)
0.01% ( 0.01%) EventSensor>>eventTickler (1) (100.0%)

| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench

gc prior. clear prior.
5.002 seconds; sampling frequency 1464 hz
7315 samples in the VM (7321 samples in the entire program) 99.92% of total

7307 samples in generated vm code 99.89% of entire vm (99.81% of total)
8 samples in vanilla vm code 0.11% of entire vm ( 0.11% of total)

% of generated vm code (% of total) (samples) (cumulative)
27.33% (27.28%) UndefinedObject>>DoIt (1997) (27.33%)
27.11% (27.06%) LinkedList>>do: (1981) (54.44%)
24.39% (24.34%) BlockClosure>>value: (1782) (78.83%)
11.15% (11.13%) SmallInteger>>+ (815) (89.98%)
5.30% ( 5.29%) Link>>nextLink (387) (95.28%)
4.50% ( 4.49%) StackLink>>element (329) (99.78%)
0.12% ( 0.12%) BlockClosure>>bench (9) (99.90%)
0.07% ( 0.07%) Time class>>...ndClockValue (5) (99.97%)
0.01% ( 0.01%) ceSmallBlockContext (1) (99.99%)
0.01% ( 0.01%) BlockClosure>>value (1) (100.0%)


So to try and make sense of this we can say that the cost of at: vs the
cost of block evaluation is, in Cog
1349 + 274 / 2215 ~= 0.73

and in Spur
977 + 359 / 2423 ~= 0.55

at: being much faster in Spur because the object representation makes it
much simpler to do the type analysis (since Object>>at: works on any
indexable class) and bounds check.


and approximately the difference between the Array and the LinkedList comes
down to the relative costs of at: & SequenceableCollection>>#do: vs
nextLink & LinkedList>>#do:, in Cog

1349 + 274 + 1054 / (1647 + 464) ~= 1.27

and in Spur

977 + 359 / (1981 + 387) ~= 0.56


Now Sista will make a big difference because it won't just eliminate the
bounds checking overhead in at:, it'll also inline the block, so it'll be
much more like

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. 1 to: 1000 do: [:i| t := t + (a at: i) key]. t] bench

Cog: 72,700 per second.
Spur: 87,300 per second.

but faster still because the bounds check in at: is completely eliminated.
--
best,
Eliot