Magma back end to SqueakSource (was: Re: Build.squeak.org and squeaksource.com in danger (was Re: [Box-Admins] Disk space usage on box3))
David T. Lewis
2013-12-08 05:54:22 UTC
I thought about it 4 years ago and made an object model to take care
of it. The problem with zip-files is how wasteful they are. When
someone changes one single method of the Morphic package, the other 1K
definitions (however many there are) are duplicated in a new zip (mcz)
file. By contrast, the object model refers to the same canonicalized
MCDefinition instances across Versions, adding only one new
MCDefinition to the bulk of the model in that example.
The result is that the redundant Magma-backed copy of
source.squeak.org consumes less than 1/4th the space of the original
file-based version. A Magma-backed copy of squeaksource.com would
consume about 12GB of space.
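To make the space argument concrete, here is a minimal Python sketch. It is not Magma or Monticello code: the `DefinitionStore` class, its method names, and the use of a SHA-1 digest as the canonical key are all assumptions made for illustration. It contrasts keeping a full copy of every definition per version (as each mcz file does) with keeping each distinct definition once and letting versions refer to it.

```python
import hashlib


class DefinitionStore:
    """Hypothetical store that keeps one canonical copy of each definition."""

    def __init__(self):
        self.definitions = {}  # digest -> definition source text
        self.versions = {}     # version name -> list of digests

    def add_version(self, name, definition_sources):
        """Record a version as a list of references to canonical definitions."""
        digests = []
        for source in definition_sources:
            digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
            # Only a definition that has never been seen before is stored.
            self.definitions.setdefault(digest, source)
            digests.append(digest)
        self.versions[name] = digests

    def stored_definition_count(self):
        """Number of distinct definitions actually held by the store."""
        return len(self.definitions)


# A package with 1000 definitions, then a second version changing one method.
v1 = [f"method{i} ^ {i}" for i in range(1000)]
v2 = list(v1)
v2[0] = "method0 ^ 42"

store = DefinitionStore()
store.add_version("Morphic-v1", v1)
store.add_version("Morphic-v2", v2)

snapshot_copies = len(v1) + len(v2)                 # two full snapshots
canonical_copies = store.stored_definition_count()  # shared object model
print(snapshot_copies, canonical_copies)
```

In this toy case the second version adds exactly one stored definition, while a snapshot-per-version layout stores all 1000 again.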
PS -- For interest, I just kicked off a bulk-load of the entire
squeaksource.com repository into Magma to see how much space it will
take.
I changed the subject line, and am moving the discussion over to squeak-dev
because I think it may be of more general interest.

I'm quite interested to know how that turns out. Entirely aside from
disk space concerns, the approach you are describing makes a lot of
sense to me, and the squeaksource.com archive provides a fairly large
data set to try it out.

Bob Arning has been doing some really interesting things with a Seaside
browser for exploring the change set records of earlier Squeak development.
Meanwhile you (Chris) are doing equally interesting work to enable a
Monticello browser to browse through the historical record of source.squeak.org,
backed by a Magma-enabled image currently running on box4.squeak.org:8888.

I have a vague, hand-waving notion that these two should be related, and
that if I wanted to figure out how some method in e.g. ObjectMemory came
into being, it would be really convenient if I could explore its change
history to see various things that Eliot or Tim or I might have done in
recent years in the Monticello repositories, and continue back in time
through the change set update stream to see how and why Dan Ingalls might
have originally implemented it in Squeak 1.x.

Is there some way in which the change set based update stream from earlier
Squeak could also be captured in the Magma back end, similar to what you
are doing with the Monticello packages?

Dave
Chris Muller
2013-12-09 02:57:45 UTC
That's very cool, but it's not the zips we're talking about. We're
talking about the artifacts produced by Jenkins, which are Squeak
images. I wrote the release process to produce a series of artifacts
with different names. Because nothing gets replaced/overwritten, disk
usage is unbounded.
The rate of untamed growth is too fast.
The result is that the redundant Magma-backed copy of
source.squeak.org consumes less than 1/4th the space of the original
file-based version. A Magma-backed copy of squeaksource.com would
consume about 12GB of space.
For now, though, Frank has the ability and responsibility to trim up
the jenkins stuff. Thanks Frank.
I nuked the ReleaseSqueakTrunk target directory, freeing up 5.9 GB of
space. (We should always consider the target/ directory of every job
as being evanescent. Feel free to nuke these at any time. Every job
should be written such that it can reconstitute its working
environment in target/.)
Automating something (builds) shouldn't introduce a new manual
process. There should be something to keep the size reasonable so we
don't have to remember to do it manually every month.
Frank Shearar
2013-12-09 03:02:33 UTC
Post by Chris Muller
That's very cool, but it's not the zips we're talking about. We're
talking about the artifacts produced by Jenkins, which are Squeak
images. I wrote the release process to produce a series of artifacts
with different names. Because nothing gets replaced/overwritten, disk
usage is unbounded.
The rate of untamed growth is too fast.
I have no idea what you mean by that. I don't consider ~30MB of new
data to be "untamed growth", nor do I consider the accumulation of a
mere 6 GB over, what, 600 builds? to be "too fast".
Post by Chris Muller
The result is that the redundant Magma-backed copy of
source.squeak.org consumes less than 1/4th the space of the original
file-based version. A Magma-backed copy of squeaksource.com would
consume about 12GB of space.
For now, though, Frank has the ability and responsibility to trim up
the jenkins stuff. Thanks Frank.
I nuked the ReleaseSqueakTrunk target directory, freeing up 5.9 GB of
space. (We should always consider the target/ directory of every job
as being evanescent. Feel free to nuke these at any time. Every job
should be written such that it can reconstitute its working
environment in target/.)
Automating something (builds) shouldn't introduce a new manual
process. There should be something to keep the size reasonable so we
don't have to remember to do it manually every month.
Er. I explicitly decided to produce versioned artifacts that a human
would assess for quality. I stand by that decision. You _should_ be
able to easily go and say "Well, version N is crap, but N - 2 is just
fine. Let's just go with that."

What's crazy is that we don't have graphs that we can use to easily
track disk usage, nor alerting mechanisms other than Ken to warn of
impending doom. Munin and Icinga would do fine, with Apache/nginx to
serve up status pages. If only someone with sufficient time had the
energy to put in the graft necessary to set up these standard sysadmin
tools.
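
Short of a full Munin or Icinga setup, even a small scheduled check would give some warning. The following is a minimal sketch using only Python's standard library. The function names and the 80% threshold are assumptions, not anything currently configured on box3.

```python
import shutil


def usage_percent(path="/"):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(path="/", warn_at=80.0, percent=None):
    """Return a warning string if usage is at or above `warn_at`, else None.

    `percent` can be supplied directly, which makes the check easy to test.
    """
    if percent is None:
        percent = usage_percent(path)
    if percent >= warn_at:
        return f"WARNING: {path} is {percent:.1f}% full (threshold {warn_at:.0f}%)"
    return None


if __name__ == "__main__":
    message = check_disk("/")
    print(message or "disk usage OK")
```

Run from cron, the printed warning could be mailed to the box admins. The percentage may differ slightly from what `df` reports, because `df` accounts for reserved blocks.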

frank
David T. Lewis
2013-12-09 04:18:21 UTC
Post by Frank Shearar
Er. I explicitly decided to produce versioned artifacts that a human
would assess for quality. I stand by that decision. You _should_ be
able to easily go and say "Well, version N is crap, but N - 2 is just
fine. Let's just go with that."
I want to ask specifically with respect to the SqueakTrunk job, which is
set up to do this:

Take a base image (currently 4.5-12565), update it, archive the result.
Run the entire suite of in-image tests.

For that job, do we want to archive all build artifacts, or is it instead
sufficient to discard all but the last successful/stable artifact to save
disk space?

I would like to make the following changes to the SqueakTrunk job
configuration:

1) Archive just the TrunkImage.zip, not all of TrunkImage.*

2) Enable the "Discard all but the last successful/stable artifact to save
disk space" option.

3) Enable the "Abort the build if it's stuck" option, with a timeout
of 30 minutes.

Is it OK to do this for the SqueakTrunk job?

Dave
Frank Shearar
2013-12-09 16:18:45 UTC
Post by David T. Lewis
Post by Frank Shearar
Er. I explicitly decided to produce versioned artifacts that a human
would assess for quality. I stand by that decision. You _should_ be
able to easily go and say "Well, version N is crap, but N - 2 is just
fine. Let's just go with that."
I want to ask specifically with respect to the SqueakTrunk job, which is
set up to do this:
Take a base image (currently 4.5-12565), update it, archive the result.
Run the entire suite of in-image tests.
For that job, do we want to archive all build artifacts, or is it instead
sufficient to discard all but the last successful/stable artifact to save
disk space?
I would like to make the following changes to the SqueakTrunk job
configuration:
1) Archive just the TrunkImage.zip, not all of TrunkImage.*
I did this over the weekend. But I just noticed that I didn't
configure Jenkins to actually use the new zipfile. It does now.
Post by David T. Lewis
2) Enable the "Discard all but the last successful/stable artifact to save
disk space" option.
I specifically don't want to archive only the last successful build
because that makes it harder for us to debug problems. Imagine that
build 101 works, 102 doesn't, and 103 does. It turns out that 103
"succeeds" because of a completely bogus reason, like a bug in the CI
scripts. We need to see the last known good state, 101. But alas, it's
gone.
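
The scenario above can be sketched in a few lines of Python. This is only an illustration of the two retention policies under discussion, not Jenkins code. The build list, the status strings, and both function names are made up.

```python
def keep_last_successful(builds):
    """Keep only the most recent build whose status is 'success'."""
    for number, status in reversed(builds):
        if status == "success":
            return [number]
    return []


def keep_last_n(builds, n):
    """Keep the `n` most recent builds regardless of status."""
    if n <= 0:
        return []
    return [number for number, _ in builds[-n:]]


# Build 103 reports success, but (in the scenario above) for a bogus reason.
builds = [(100, "success"), (101, "success"), (102, "failure"), (103, "success")]

print(keep_last_successful(builds))  # build 101 is not retained
print(keep_last_n(builds, 3))        # build 101 is retained
```

Keeping a bounded window of recent builds preserves the last known good state while still capping disk usage.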

This is not what caused the disk usage problem.
Post by David T. Lewis
3) Enable the "Abort the build if it's stuck" option, with a timeout
of 30 minutes.
That's fine, but wouldn't have helped any time the build hung in the
past, because those hung processes were orphaned anyway. We have used
this feature, in fact. We don't anymore because you end up having two
separate places to define timeouts. Eventually it _will_ be useful,
once we teach Squeak to fail, rather than hang.
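
For illustration, a build-level timeout can be sketched with Python's `subprocess` module. This is not how the Jenkins option or the CI scripts are implemented. The helper name and the stand-in commands are assumptions.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run `command`; return its exit code, or None if it exceeded the timeout."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child process before raising.
        return None
    return completed.returncode


# Stand-ins for a normal build step and a hung one.
quick = [sys.executable, "-c", "print('tests passed')"]
hung = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_with_timeout(quick, timeout_seconds=10))
print(run_with_timeout(hung, timeout_seconds=1))
```

Note that the timeout only terminates the direct child. Any process that the child itself started is left running, which matches the orphaned-process problem described above.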
Post by David T. Lewis
Is it OK to do this for the SqueakTrunk job?
frank
Chris Muller
2013-12-09 04:20:55 UTC
Post by Frank Shearar
Post by Chris Muller
That's very cool, but it's not the zips we're talking about. We're
talking about the artifacts produced by Jenkins, which are Squeak
images. I wrote the release process to produce a series of artifacts
with different names. Because nothing gets replaced/overwritten, disk
usage is unbounded.
The rate of untamed growth is too fast.
I have no idea what you mean by that. I don't consider ~30MB of new
data to be "untamed growth", nor do I consider the accumulation of a
mere 6 GB over, what, 600 builds? to be "too fast".
Ken should not have to keep reminding us about space. We don't need
to keep 600 builds, and I respect Ken's concern about proactive
management.
Post by Frank Shearar
Post by Chris Muller
The result is that the redundant Magma-backed copy of
source.squeak.org consumes less than 1/4th the space of the original
file-based version. A Magma-backed copy of squeaksource.com would
consume about 12GB of space.
For now, though, Frank has the ability and responsibility to trim up
the jenkins stuff. Thanks Frank.
I nuked the ReleaseSqueakTrunk target directory, freeing up 5.9 GB of
space. (We should always consider the target/ directory of every job
as being evanescent. Feel free to nuke these at any time. Every job
should be written such that it can reconstitute its working
environment in target/.)
Automating something (builds) shouldn't introduce a new manual
process. There should be something to keep the size reasonable so we
don't have to remember to do it manually every month.
Er. I explicitly decided to produce versioned artifacts that a human
would assess for quality. I stand by that decision. You _should_ be
able to easily go and say "Well, version N is crap, but N - 2 is just
fine. Let's just go with that."
Of course, so may we discuss something somewhere _in-between_ 600 and
none? Are the last 10 builds enough? 50?
Post by Frank Shearar
What's crazy is that we don't have graphs that we can use to easily
track disk usage, nor alerting mechanisms other than Ken to warn of
impending doom. Munin and Icinga would do fine, with Apache/nginx to
serve up status pages. If only someone with sufficient time had the
energy to put in the graft necessary to set up these standard sysadmin
tools.
Space monitoring and alerts are required for any 24/7 system
regardless of the level of sustainability of processes running on
the machine. They're both important, but since we don't have alerts,
sustainability is all the more important.

You're now sharing box3 with SqueakSource. The best approach is for
everyone with access to the box to keep their resource impact
reasonable enough that it doesn't affect service or require alert
notices from others.

Thanks again.
Frank Shearar
2013-12-09 16:22:44 UTC
Post by Chris Muller
Post by Frank Shearar
Post by Chris Muller
That's very cool, but it's not the zips we're talking about. We're
talking about the artifacts produced by Jenkins, which are Squeak
images. I wrote the release process to produce a series of artifacts
with different names. Because nothing gets replaced/overwritten, disk
usage is unbounded.
The rate of untamed growth is too fast.
I have no idea what you mean by that. I don't consider ~30MB of new
data to be "untamed growth", nor do I consider the accumulation of a
mere 6 GB over, what, 600 builds? to be "too fast".
Ken should not have to keep reminding us about space. We don't need
to keep 600 builds, and I respect Ken's concern about proactive
management.
No, Ken absolutely should _not_ have to keep reminding us about
anything. But you are not suggesting a solution. The solution is this:
more people should be doing more to keep an eye on the box.
Post by Chris Muller
Post by Frank Shearar
Post by Chris Muller
The result is that the redundant Magma-backed copy of
source.squeak.org consumes less than 1/4th the space of the original
file-based version. A Magma-backed copy of squeaksource.com would
consume about 12GB of space.
For now, though, Frank has the ability and responsibility to trim up
the jenkins stuff. Thanks Frank.
I nuked the ReleaseSqueakTrunk target directory, freeing up 5.9 GB of
space. (We should always consider the target/ directory of every job
as being evanescent. Feel free to nuke these at any time. Every job
should be written such that it can reconstitute its working
environment in target/.)
Automating something (builds) shouldn't introduce a new manual
process. There should be something to keep the size reasonable so we
don't have to remember to do it manually every month.
Er. I explicitly decided to produce versioned artifacts that a human
would assess for quality. I stand by that decision. You _should_ be
able to easily go and say "Well, version N is crap, but N - 2 is just
fine. Let's just go with that."
Of course, so may we discuss something somewhere _in-between_ 600 and
none? Are the last 10 builds enough? 50?
I am more than happy to hear suggestions on how to do that. One
suggestion is this: someone periodically takes the latest
ReleaseSqueakTrunk artefact, tries it out, blesses it by pushing it to
ftp.squeak.org, and then tells us. Some kind soul throws away the old
potential release candidates.
Post by Chris Muller
Post by Frank Shearar
What's crazy is that we don't have graphs that we can use to easily
track disk usage, nor alerting mechanisms other than Ken to warn of
impending doom. Munin and Icinga would do fine, with Apache/nginx to
serve up status pages. If only someone with sufficient time had the
energy to put in the graft necessary to set up these standard sysadmin
tools.
Space monitoring and alerts are required for any 24/7 system
regardless of the level of sustainability of processes running on
the machine. They're both important, but since we don't have alerts,
sustainability is all the more important.
We don't have a 24/7 system, because those require operational staff.

Seriously, the big problem here is not that ReleaseSqueakTrunk did
what I told it to. The big problem here is that no one except Ken
cares enough about box3 to monitor it.

frank
Chris Muller
2013-12-09 05:47:48 UTC
Magma itself supports persisting change-sets via the following API:

==============
"file-out a ChangeSet"
mySession commit: [ mySession codeBase fileOutChangeSet:
(ChangeSorter changeSetNamed: 'myChangeSet') ]

"Load a ChangeSet"
mySession codeBase fileInChangeSetNamed: 'myChangeSet'

"browse a change-set before filing it in"
mySession codeBase browseChangeSetNamed: 'myChangeSet'

"Answer a collection of all changeSet names in the codeBase."
mySession codeBase changeSetNames

"Install all the changeSets in the codeBase immediately."
mySession codeBase installChangeSets
===============

But I didn't do anything to integrate methods stored in change-sets
this way into the new MC History function.

It seems like it would be a neat thing to do, though, if someone had the time.
Ken Causey
2013-12-10 02:02:59 UTC
I want to respond to some comments made in this thread.

First I want to admit that my posting on Friday was a bit shrill. I was
getting frustrated that this was my third or fourth post on this
increasingly problematic issue and little if any action had been taken.
Further, there seemed a real possibility that over the next few days,
possibly over the weekend when I had little time to provide assistance,
the file system for the server which hosts both build.squeak.org
and squeaksource.com would fill up. I have seen increases of more than 1%
per 24 hours on that server in the past.
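
As a rough illustration of why that growth rate is worrying, here is a small Python sketch of the arithmetic. The 90% starting figure is a hypothetical example, not a measurement from box3, and the function name is made up.

```python
def days_until_full(used_percent, growth_percent_per_day):
    """Estimate days until the file system is full, assuming linear growth.

    Returns None when usage is not growing.
    """
    if growth_percent_per_day <= 0:
        return None
    return (100.0 - used_percent) / growth_percent_per_day


# Hypothetical: a disk that is 90% full and growing by 1% of capacity per day.
print(days_until_full(90.0, 1.0))
```

At a steady 1% per day, every 10 percentage points of free space buys roughly ten days.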

Thanks to Frank the immediate issue has been addressed and hopefully we
have a couple of weeks of breathing time now to consider how best to
avoid the issue in the future.

There has been some discussion regarding my admittedly somewhat extreme
comments regarding squeaksource.com. One thing that has been mentioned
is the idea that 'disk space is cheap'. I think that is easy to say and
true in general, but I'm not sure it is true in this specific case. I
will admit to possibly over-estimating the 'cost' but... Keep in mind
that we have no direct control over the configuration of either
box3.squeak.org or box4.squeak.org. These were contributed to us by
Gandi.net at the request of the Software Freedom Conservancy. Neither I
nor anyone else in our community has any access to modify the server
configuration and do things like add disk space. At best we have to go
through Software Freedom Conservancy for this. They don't have a lot of
time to spare for such issues themselves. Further, I don't think we should
make assumptions that Gandi.net is going to be willing to donate more
resources. I'm not sure it is even easy to throw money at the issue
given the fact that we are using donated resources. But then, I may
just be unreasonably pessimistic about this.

Someone kindly thanked me and gave the impression that I was the only
one that 'cared' enough to monitor the servers for such issues. Thanks
but don't assign me too much altruism or think that I'm so interested.
The minor amount of daily server checking I do is largely habit for me
and is an easy way for me to trigger a few endorphins and feel like I
have in some way contributed for the day.

To be honest my interest in Squeak and the community has been waning for
some time and is quite low at this point. Don't assume I'm going to
continue to do what little I do indefinitely. Someone else must step up
to take responsibility for the Squeak servers.

Ken
karl ramberg
2013-12-10 23:02:11 UTC
Hi Ken
Thank you for all the times you helped out. I do not have much spare time
at the moment, so I cannot be of much help maintaining stuff.

It seems like interest in Squeak has peaked and is now on a downward slope.
But just wait another 20 years. It will come back!

Best regards,
Karl