Buildbot proposal: combine portwatcher and portbuilder

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Buildbot proposal: combine portwatcher and portbuilder

Ryan Schmidt-24
The current buildbot setup has a number of problems that I believe could be solved by combining the currently separate portwatcher and portbuilder schedulers into a single ports scheduler.

I am not suggesting that we return to the behavior of the ports scheduler on the old macOS forge buildbot system in which a single build would build all the specified ports. We will keep the current method of building only one port (and its dependencies) per build.

The problems I want to solve are the following:

1. Currently, portwatcher is responsible for updating a copy of mpbb, MacPorts base and a ports tree that it shares with portbuilder. Having portbuilder maintain its own copy would waste a lot of time. If someone makes one commit that changes 100 ports and then no further commits occur for hours, we only want to update mpbb, MacPorts base and the ports tree once, not 100 times. But the fact that it's shared means that portwatcher must (and is configured to) wait for all triggered portbuilder builds to finish before it processes the next commit. This works fine, unless the buildmaster is stopped while portwatcher builds are pending. This has happened several times when the servers lost power during a power outage. (The servers are on a UPS, but the UPS does not provide as much instantaneous power as I expected, so if the servers are busy building, they draw more power than the UPS can instantaneously provide and the servers shut down immediately. I might remove the buildworker machines from the UPS and leave only the buildmaster, modem and router on it.) When buildmaster comes back online, it sees the portbuilder build that was in progress and starts it again, but it also sees the portwatcher build that was in progress and starts it again. Now we have a portwatcher running (updating mpbb, updating MacPorts, updating the ports tree, and updating its portindex) while portbuilder is trying to install a port. The portbuilder can fail if it is trying to install ports at the moment that portwatcher is updating the index (see https://trac.macports.org/ticket/53587).

2. If a single portwatcher build "X" triggers many portbuilder builds, and while those portbuilder builds are in progress another commit comes in that would affect those ports, it don't notice until all portbuilder builds triggered by "X" are finished. This can waste time building ports that are already superseded by newer versions or revisions. An extreme example of this is if we were to force a portwatcher build for all ports (which we might want to do when a new version of macOS is released). mpbb, MacPorts base and the ports tree would be updated once, and then it would schedule a portbuilder for each port that had not yet been built. Building all ports will take weeks. During that time, a commit may come through that updates a port to a new version. But if the build of the old version of the port was still pending at that time, the buildbot will build the old version, because it can't update the ports tree until the current portwatcher build is done waiting for its triggered portbuilder builds.

3. When there are portwatcher builds pending, we have no idea how many portbuilder builds are pending. It may say there are e.g. 3 portbuilder builds pending, but the pending portwatchers could trigger any number of additional portbuilders.

An objection to this proposal was that buildbot 0.8 does not have the capability to dynamically create scheduler steps at runtime. But that's not required and that's not what I'm proposing.

Buildbot has the ability to call a function for each step to determine if that step should run, by specifying the doStepIf property. I recently started using this feature in portwatcher to skip the two trigger steps if there are no ports in the port list:

https://github.com/macports/macports-infrastructure/commit/18135d6c75698f88b48698473c9364063fb6fba9

Here is an example of what that looks like when it runs:

https://build.macports.org/builders/ports-10.13_x86_64-watcher/builds/3989

The only port that was committed there had already been built so it was excluded from the port list, leaving the list empty, so the portbuilder trigger step was skipped (to save time) and the mirror trigger step was skipped (to prevent it from printing an error that no ports were specified). The skipped steps are still shown in the web interface, but if that's not desired, they can be hidden by also using the hideStepIf property.

So my proposed combined ports scheduler would still contain all of the steps of the current portwatcher and portbuilder schedulers, but each build would still conceptually "be" either a portwatcher or a portbuilder, and for each build, the steps that don't relate to that conceptual function would be skipped and hidden.

Buildbot has the ability to associate custom properties with a build. We use this to set a "portname" property when we trigger portbuilder builds. portwatcher builds don't have that property. (They have a "portlist" property if they were forced from the web interface, and always have a "fullportlist" property which is the portlist plus the computed list of ports that were modified by the commit.) So if a build has the "portname" property, it is a portbuilder, otherwise it is a portwatcher. We can then write some simple functions:

    def is_portbuilder(step):
        return step.hasProperty('portname')

    def is_portwatcher(step):
        return not step.hasProperty('portname')

    def is_skipped(result, step):
        return (result == buildbot.status.results.SKIPPED)

The steps of the combined ports scheduler would then be:

* update mpbb (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* clean (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* selfupdate MacPorts (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* sync ports tree (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* get the list of subports (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* trigger mirror scheduler (doStepIf=is_portwatcher, hideStepIf=is_skipped, waitForFinish=True)
* trigger ports scheduler for each subport (doStepIf=is_portwatcher, hideStepIf=True, waitForFinish=False)

* install dependencies (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* install port (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* gather archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* upload archives to buildmaster (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* deploy archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* clean (doStepIf=is_portbuilder, hideStepIf=is_skipped)

(I'm advocating here for always hiding the ports trigger step. I don't find its output in the waterfall to be very useful and it can take up a lot of space if a lot of builds were triggered.)

This should solve problem (1). By having portwatcher and portbuilder tasks in the same queue, they can't run simultaneously so they can't cause the problems that happen when they run simultaneously.

Solving (2) and (3) requires an additional step. Buildbot allows you to define a nextBuild property when you create the builder, and pass it a function that determines which of the pending builds should go next.

http://docs.buildbot.net/0.8.12/manual/cfg-builders.html#builder-configuration

We would write a function that looks through the pending builds in the order in which they were scheduled, and picks the first one that is a portwatcher (the first one that doesn't have a "portname" property). If none are portwatchers, it picks the first* one. (* This is where we would later improve the situation to pick the next port in the correct dependency order, but I have another email about that.)

This solves (2) because now when a new commit comes in, a new "portwatcher" gets added to the end of the ports scheduler's queue, but it will be the next build picked immediately after the current "portbuilder" finishes, no matter how many other "portbuilders" are still pending. That will update the ports tree, so any pending builds for ports that were subsequently updated will build the now-current version, not the old version.

It also solves (3) because now if 100 "portbuilder" builds are pending, we don't have to wait until all of them are finished before we know how many builds all the pending "portwatcher" builds will schedule; we only have to wait until the current "portbuilder" finishes.

One drawback, depending on how you look at it, is that we would no longer be able to send a single combined email for all of the failed builds of a single commit, or of a single forced build. We would have to send an individual email for each failed build. Personally, I would prefer that, as the subject line of the email would make clear which port failed to build, rather than requiring me to open it to see what happened.

If we give the new combined scheduler a new name like "ports", that will invalidate all old links to build logs. That's not a deal-breaker for me, but since we often paste build log URLs into tickets, it would be nice to keep them alive if we can. Maybe defining empty portbuilder and portwatcher schedulers would be enough.

Reply | Threaded
Open this post in threaded view
|

Re: Buildbot proposal: combine portwatcher and portbuilder

Mojca Miklavec-2
Dear Ryan,

Conceptually I like the approach a lot.

*However*, this will generate an incomprehensible amount of emails
when a new build slave gets added or when some revbumps are done etc.
In particular from the 10.5/ppc machine. The individual emails will be
a lot more helpful, but the amount of them ...

This sounds totally hacky, but one way around that I can see is to
write a special-purpose mailer. Either a job on the buildbot or a
special external script that iterates through all the jobs from the
last hour from all the builders and creates build failure summaries
for each individual developer (author/committer/maintainer). Maybe
this could be a build job on the builder master, but I don't know how
tricky it would get to do it.

How would you handle duplicate entries in the build list? Dependency
order could also become semi-obsolete, but that's probably a price to
be payed.

I tried to play with buildNext, but I couldn't figure out how to read
properties of the build request.

Mojca

On 11 March 2018 at 10:25, Ryan Schmidt wrote:

> The current buildbot setup has a number of problems that I believe could be solved by combining the currently separate portwatcher and portbuilder schedulers into a single ports scheduler.
>
> I am not suggesting that we return to the behavior of the ports scheduler on the old macOS forge buildbot system in which a single build would build all the specified ports. We will keep the current method of building only one port (and its dependencies) per build.
>
> The problems I want to solve are the following:
>
> 1. Currently, portwatcher is responsible for updating a copy of mpbb, MacPorts base and a ports tree that it shares with portbuilder. Having portbuilder maintain its own copy would waste a lot of time. If someone makes one commit that changes 100 ports and then no further commits occur for hours, we only want to update mpbb, MacPorts base and the ports tree once, not 100 times. But the fact that it's shared means that portwatcher must (and is configured to) wait for all triggered portbuilder builds to finish before it processes the next commit. This works fine, unless the buildmaster is stopped while portwatcher builds are pending. This has happened several times when the servers lost power during a power outage. (The servers are on a UPS, but the UPS does not provide as much instantaneous power as I expected, so if the servers are busy building, they draw more power than the UPS can instantaneously provide and the servers shut down immediately. I might remove the buildworker machines from the UPS and leave only the buildmaster, modem and router on it.) When buildmaster comes back online, it sees the portbuilder build that was in progress and starts it again, but it also sees the portwatcher build that was in progress and starts it again. Now we have a portwatcher running (updating mpbb, updating MacPorts, updating the ports tree, and updating its portindex) while portbuilder is trying to install a port. The portbuilder can fail if it is trying to install ports at the moment that portwatcher is updating the index (see https://trac.macports.org/ticket/53587).
>
> 2. If a single portwatcher build "X" triggers many portbuilder builds, and while those portbuilder builds are in progress another commit comes in that would affect those ports, it don't notice until all portbuilder builds triggered by "X" are finished. This can waste time building ports that are already superseded by newer versions or revisions. An extreme example of this is if we were to force a portwatcher build for all ports (which we might want to do when a new version of macOS is released). mpbb, MacPorts base and the ports tree would be updated once, and then it would schedule a portbuilder for each port that had not yet been built. Building all ports will take weeks. During that time, a commit may come through that updates a port to a new version. But if the build of the old version of the port was still pending at that time, the buildbot will build the old version, because it can't update the ports tree until the current portwatcher build is done waiting for its triggered portbuilder builds.
>
> 3. When there are portwatcher builds pending, we have no idea how many portbuilder builds are pending. It may say there are e.g. 3 portbuilder builds pending, but the pending portwatchers could trigger any number of additional portbuilders.
>
> An objection to this proposal was that buildbot 0.8 does not have the capability to dynamically create scheduler steps at runtime. But that's not required and that's not what I'm proposing.
>
> Buildbot has the ability to call a function for each step to determine if that step should run, by specifying the doStepIf property. I recently started using this feature in portwatcher to skip the two trigger steps if there are no ports in the port list:
>
> https://github.com/macports/macports-infrastructure/commit/18135d6c75698f88b48698473c9364063fb6fba9
>
> Here is an example of what that looks like when it runs:
>
> https://build.macports.org/builders/ports-10.13_x86_64-watcher/builds/3989
>
> The only port that was committed there had already been built so it was excluded from the port list, leaving the list empty, so the portbuilder trigger step was skipped (to save time) and the mirror trigger step was skipped (to prevent it from printing an error that no ports were specified). The skipped steps are still shown in the web interface, but if that's not desired, they can be hidden by also using the hideStepIf property.
>
> So my proposed combined ports scheduler would still contain all of the steps of the current portwatcher and portbuilder schedulers, but each build would still conceptually "be" either a portwatcher or a portbuilder, and for each build, the steps that don't relate to that conceptual function would be skipped and hidden.
>
> Buildbot has the ability to associate custom properties with a build. We use this to set a "portname" property when we trigger portbuilder builds. portwatcher builds don't have that property. (They have a "portlist" property if they were forced from the web interface, and always have a "fullportlist" property which is the portlist plus the computed list of ports that were modified by the commit.) So if a build has the "portname" property, it is a portbuilder, otherwise it is a portwatcher. We can then write some simple functions:
>
>     def is_portbuilder(step):
>         return step.hasProperty('portname')
>
>     def is_portwatcher(step):
>         return not step.hasProperty('portname')
>
>     def is_skipped(result, step):
>         return (result == buildbot.status.results.SKIPPED)
>
> The steps of the combined ports scheduler would then be:
>
> * update mpbb (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * clean (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * selfupdate MacPorts (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * sync ports tree (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * get the list of subports (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * trigger mirror scheduler (doStepIf=is_portwatcher, hideStepIf=is_skipped, waitForFinish=True)
> * trigger ports scheduler for each subport (doStepIf=is_portwatcher, hideStepIf=True, waitForFinish=False)
>
> * install dependencies (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * install port (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * gather archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * upload archives to buildmaster (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * deploy archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * clean (doStepIf=is_portbuilder, hideStepIf=is_skipped)
>
> (I'm advocating here for always hiding the ports trigger step. I don't find its output in the waterfall to be very useful and it can take up a lot of space if a lot of builds were triggered.)
>
> This should solve problem (1). By having portwatcher and portbuilder tasks in the same queue, they can't run simultaneously so they can't cause the problems that happen when they run simultaneously.
>
> Solving (2) and (3) requires an additional step. Buildbot allows you to define a nextBuild property when you create the builder, and pass it a function that determines which of the pending builds should go next.
>
> http://docs.buildbot.net/0.8.12/manual/cfg-builders.html#builder-configuration
>
> We would write a function that looks through the pending builds in the order in which they were scheduled, and picks the first one that is a portwatcher (the first one that doesn't have a "portname" property). If none are portwatchers, it picks the first* one. (* This is where we would later improve the situation to pick the next port in the correct dependency order, but I have another email about that.)
>
> This solves (2) because now when a new commit comes in, a new "portwatcher" gets added to the end of the ports scheduler's queue, but it will be the next build picked immediately after the current "portbuilder" finishes, no matter how many other "portbuilders" are still pending. That will update the ports tree, so any pending builds for ports that were subsequently updated will build the now-current version, not the old version.
>
> It also solves (3) because now if 100 "portbuilder" builds are pending, we don't have to wait until all of them are finished before we know how many builds all the pending "portwatcher" builds will schedule; we only have to wait until the current "portbuilder" finishes.
>
> One drawback, depending on how you look at it, is that we would no longer be able to send a single combined email for all of the failed builds of a single commit, or of a single forced build. We would have to send an individual email for each failed build. Personally, I would prefer that, as the subject line of the email would make clear which port failed to build, rather than requiring me to open it to see what happened.
>
> If we give the new combined scheduler a new name like "ports", that will invalidate all old links to build logs. That's not a deal-breaker for me, but since we often paste build log URLs into tickets, it would be nice to keep them alive if we can. Maybe defining empty portbuilder and portwatcher schedulers would be enough.
>
Reply | Threaded
Open this post in threaded view
|

Re: Buildbot proposal: combine portwatcher and portbuilder

Rainer Müller-4
In reply to this post by Ryan Schmidt-24
On 2018-03-11 10:25, Ryan Schmidt wrote:
> The current buildbot setup has a number of problems that I believe could be solved by combining the currently separate portwatcher and portbuilder schedulers into a single ports scheduler.
>
> I am not suggesting that we return to the behavior of the ports scheduler on the old macOS forge buildbot system in which a single build would build all the specified ports. We will keep the current method of building only one port (and its dependencies) per build.
>
> The problems I want to solve are the following:
>
> 1. Currently, portwatcher is responsible for updating a copy of mpbb, MacPorts base and a ports tree that it shares with portbuilder. Having portbuilder maintain its own copy would waste a lot of time. If someone makes one commit that changes 100 ports and then no further commits occur for hours, we only want to update mpbb, MacPorts base and the ports tree once, not 100 times. But the fact that it's shared means that portwatcher must (and is configured to) wait for all triggered portbuilder builds to finish before it processes the next commit. This works fine, unless the buildmaster is stopped while portwatcher builds are pending. This has happened several times when the servers lost power during a power outage. (The servers are on a UPS, but the UPS does not provide as much instantaneous power as I expected, so if the servers are busy building, they draw more power than the UPS can instantaneously provide and the servers shut down immediately. I might remove the buildworker machines from the UPS and leave only the buildmaster, modem and router on it.) When buildmaster comes back online, it sees the portbuilder build that was in progress and starts it again, but it also sees the portwatcher build that was in progress and starts it again. Now we have a portwatcher running (updating mpbb, updating MacPorts, updating the ports tree, and updating its portindex) while portbuilder is trying to install a port. The portbuilder can fail if it is trying to install ports at the moment that portwatcher is updating the index (see https://trac.macports.org/ticket/53587).

Do the steps for selfupdate/sync really hurt that much that cannot just
run them on every portbuilder run? Looking at the portwatcher build you
linked as an example, these steps only took a few seconds in total. Why
can we not just move these steps to the portbuilder?

At the moment, portwatcher and portbuilder are sharing resources, but
buildbot assumes each builder is isolated and that leads to these
problems when resuming builds. I guess we should not do that...

> 2. If a single portwatcher build "X" triggers many portbuilder builds, and while those portbuilder builds are in progress another commit comes in that would affect those ports, it don't notice until all portbuilder builds triggered by "X" are finished. This can waste time building ports that are already superseded by newer versions or revisions. An extreme example of this is if we were to force a portwatcher build for all ports (which we might want to do when a new version of macOS is released). mpbb, MacPorts base and the ports tree would be updated once, and then it would schedule a portbuilder for each port that had not yet been built. Building all ports will take weeks. During that time, a commit may come through that updates a port to a new version. But if the build of the old version of the port was still pending at that time, the buildbot will build the old version, because it can't update the ports tree until the current portwatcher build is done waiting for its triggered portbuilder builds.

To me it seems like this is only an issue for forced builds, not for the
builds scheduled by commits.

So maybe for this use case the force scheduler should be on a level one
higher in the hierarchy. Then the force scheduler would get a list of
ports and for each schedule a portwatcher with only one port name (or
use some other way of partitioning).

> 3. When there are portwatcher builds pending, we have no idea how many portbuilder builds are pending. It may say there are e.g. 3 portbuilder builds pending, but the pending portwatchers could trigger any number of additional portbuilders.

Why does it matter how many portbuilder builds will be scheduled later?
I do not see the problem...?


Overall, my immediate thought was that with the current portwatcher, we
could just not wait for the triggered builds to finish. That seems to
solve (1) and (2), but also lose the ability to send summary emails.
Although I did not think enough about this whether it would really work.

Am I missing something why we would definitely need to merge portwatcher
and portbuilder?

> An objection to this proposal was that buildbot 0.8 does not have the capability to dynamically create scheduler steps at runtime. But that's not required and that's not what I'm proposing.
>
> Buildbot has the ability to call a function for each step to determine if that step should run, by specifying the doStepIf property. I recently started using this feature in portwatcher to skip the two trigger steps if there are no ports in the port list:
>
> https://github.com/macports/macports-infrastructure/commit/18135d6c75698f88b48698473c9364063fb6fba9
>
> Here is an example of what that looks like when it runs:
>
> https://build.macports.org/builders/ports-10.13_x86_64-watcher/builds/3989
>
> The only port that was committed there had already been built so it was excluded from the port list, leaving the list empty, so the portbuilder trigger step was skipped (to save time) and the mirror trigger step was skipped (to prevent it from printing an error that no ports were specified). The skipped steps are still shown in the web interface, but if that's not desired, they can be hidden by also using the hideStepIf property.

I noticed this and I like this change. +1

> So my proposed combined ports scheduler would still contain all of the steps of the current portwatcher and portbuilder schedulers, but each build would still conceptually "be" either a portwatcher or a portbuilder, and for each build, the steps that don't relate to that conceptual function would be skipped and hidden.

> [...]

It sounds like a complete hack to use doStepIf this way...

> This should solve problem (1). By having portwatcher and portbuilder tasks in the same queue, they can't run simultaneously so they can't cause the problems that happen when they run simultaneously.
>
> Solving (2) and (3) requires an additional step. Buildbot allows you to define a nextBuild property when you create the builder, and pass it a function that determines which of the pending builds should go next.
>
> http://docs.buildbot.net/0.8.12/manual/cfg-builders.html#builder-configuration
>
> We would write a function that looks through the pending builds in the order in which they were scheduled, and picks the first one that is a portwatcher (the first one that doesn't have a "portname" property). If none are portwatchers, it picks the first* one. (* This is where we would later improve the situation to pick the next port in the correct dependency order, but I have another email about that.)
>
> This solves (2) because now when a new commit comes in, a new "portwatcher" gets added to the end of the ports scheduler's queue, but it will be the next build picked immediately after the current "portbuilder" finishes, no matter how many other "portbuilders" are still pending. That will update the ports tree, so any pending builds for ports that were subsequently updated will build the now-current version, not the old version.

But the old builds would be left in the queue and follow after the
portwatcher, so we still have multiple builds in the queue for the same
port...

As the builds were scheduled by an earlier commit (which buildbot keeps
in the "sourcestamp"), the attribution for notifications might be
difficult. I am afraid it could lead to reports of a build failure for
the wrong commit unless you manage to cancel the pending builds.

Buildbot would normally solve this by merging build requests. If a new
build request gets put into the queue, the old one is discarded. The
problem is that we cannot merge portwatcher builds, because we would
only be able to do that as soon as we know the list of ports. However,
we can only determine that once the portwatcher build is executed, so it
is no option for us.

https://docs.buildbot.net/0.8.12/manual/cfg-builders.html#merging-build-requests

If the portwatcher triggered the portbuilder and then exited, we could
look into merging the portbuilder jobs.

> It also solves (3) because now if 100 "portbuilder" builds are pending, we don't have to wait until all of them are finished before we know how many builds all the pending "portwatcher" builds will schedule; we only have to wait until the current "portbuilder" finishes.

> One drawback, depending on how you look at it, is that we would no longer be able to send a single combined email for all of the failed builds of a single commit, or of a single forced build. We would have to send an individual email for each failed build. Personally, I would prefer that, as the subject line of the email would make clear which port failed to build, rather than requiring me to open it to see what happened.

Didn't we invest a lot of effort to find a way to only send a single
email instead of sending one per build...?

> If we give the new combined scheduler a new name like "ports", that will invalidate all old links to build logs. That's not a deal-breaker for me, but since we often paste build log URLs into tickets, it would be nice to keep them alive if we can. Maybe defining empty portbuilder and portwatcher schedulers would be enough.

I would not care about the logs at all. They are only interesting for a
limited time and only for failed builds. It should be no problem to just
delete old logs.

Rainer
Reply | Threaded
Open this post in threaded view
|

Re: Buildbot proposal: combine portwatcher and portbuilder

Ryan Schmidt-24
In reply to this post by Mojca Miklavec-2

On Mar 11, 2018, at 12:32, Mojca Miklavec wrote:

> Dear Ryan,
>
> Conceptually I like the approach a lot.
>
> *However*, this will generate an incomprehensible amount of emails
> when a new build slave gets added or when some revbumps are done etc.
> In particular from the 10.5/ppc machine. The individual emails will be
> a lot more helpful, but the amount of them ...

Well you'll only get emails for failed builds, of course. Wouldn't you want to know if your ports failed to build on a new version of macOS?

Certainly build failure reports from old OS versions that you already know your ports will fail on are annoying, and we should fix that in base.


> This sounds totally hacky, but one way around that I can see is to
> write a special-purpose mailer. Either a job on the buildbot or a
> special external script that iterates through all the jobs from the
> last hour from all the builders and creates build failure summaries
> for each individual developer (author/committer/maintainer). Maybe
> this could be a build job on the builder master, but I don't know how
> tricky it would get to do it.

Well we can talk about using hourly summary emails instead of individual-build emails or whole-batch emails, but I don't see it as related to my proposal.

> How would you handle duplicate entries in the build list?

The first implementation would not alter how we handle duplicate entries. We currently decide in portwatcher whether to schedule a build of a port, based on whether we have already built that port at that time. Later, I would want to implement https://trac.macports.org/ticket/55078 to cause any duplicate jobs to exit early. Later still, we might look into removing duplicate jobs from the queue at the beginning of a build, assuming buildbot gives us an API to do that.

> Dependency
> order could also become semi-obsolete, but that's probably a price to
> be payed.

Scheduling a batch of builds in dependency order is still an important goal, unaffected by my proposed changes.

> I tried to play with buildNext, but I couldn't figure out how to read
> properties of the build request.

I saw your question to the buildbot mailing list. I haven't tried to use nextBuild yet, but I found an example from the webkit project:

https://trac.webkit.org/changeset/117508/webkit/trunk/Tools/BuildSlaveSupport

And here's somebody else's nextBuild function:

https://gitlab.com/lede/buildbot/blob/8911485994d498e02302c3d474166848b0d894e0/phase1/master.cfg#L249

Reply | Threaded
Open this post in threaded view
|

Re: Buildbot proposal: combine portwatcher and portbuilder

Ryan Schmidt-24
In reply to this post by Rainer Müller-4

On Mar 11, 2018, at 17:48, Rainer Müller wrote:

> On 2018-03-11 10:25, Ryan Schmidt wrote:
>> The current buildbot setup has a number of problems that I believe could be solved by combining the currently separate portwatcher and portbuilder schedulers into a single ports scheduler.
>>
>> I am not suggesting that we return to the behavior of the ports scheduler on the old macOS forge buildbot system in which a single build would build all the specified ports. We will keep the current method of building only one port (and its dependencies) per build.
>>
>> The problems I want to solve are the following:
>>
>> 1. Currently, portwatcher is responsible for updating a copy of mpbb, MacPorts base and a ports tree that it shares with portbuilder. Having portbuilder maintain its own copy would waste a lot of time. If someone makes one commit that changes 100 ports and then no further commits occur for hours, we only want to update mpbb, MacPorts base and the ports tree once, not 100 times. But the fact that it's shared means that portwatcher must (and is configured to) wait for all triggered portbuilder builds to finish before it processes the next commit. This works fine, unless the buildmaster is stopped while portwatcher builds are pending. This has happened several times when the servers lost power during a power outage. (The servers are on a UPS, but the UPS does not provide as much instantaneous power as I expected, so if the servers are busy building, they draw more power than the UPS can instantaneously provide and the servers shut down immediately. I might remove the buildworker machines from the UPS and leave only the buildmaster, modem and router on it.) When buildmaster comes back online, it sees the portbuilder build that was in progress and starts it again, but it also sees the portwatcher build that was in progress and starts it again. Now we have a portwatcher running (updating mpbb, updating MacPorts, updating the ports tree, and updating its portindex) while portbuilder is trying to install a port. The portbuilder can fail if it is trying to install ports at the moment that portwatcher is updating the index (see https://trac.macports.org/ticket/53587).
>
> Do the steps for selfupdate/sync really hurt that much that cannot just
> run them on every portbuilder run? Looking at the portwatcher build you
> linked as an example, these steps only took a few seconds in total. Why
> can we not just move these steps to the portbuilder?

I stand by my proposal. I want to update the ports tree clone on the buildbot workers only when it changes. I don't want to waste cycles updating a tree that has not changed.

> At the moment, portwatcher and portbuilder are sharing resources, but
> buildbot assumes each builder is isolated and that leads to these
> problems when resuming builds. I guess we should not do that...

If you want a separate portwatcher and portbuilder that each have their own separate copy of mpbb, MacPorts, the ports tree, and the portindex, not only is that more disk space used, but then a simple single port update commit will cause the buildworkers to have to do twice as much updating, which will take longer than what we're doing now. I want to optimize, not de-optimize! :)

>> 2. If a single portwatcher build "X" triggers many portbuilder builds, and while those portbuilder builds are in progress another commit comes in that would affect those ports, it don't notice until all portbuilder builds triggered by "X" are finished. This can waste time building ports that are already superseded by newer versions or revisions. An extreme example of this is if we were to force a portwatcher build for all ports (which we might want to do when a new version of macOS is released). mpbb, MacPorts base and the ports tree would be updated once, and then it would schedule a portbuilder for each port that had not yet been built. Building all ports will take weeks. During that time, a commit may come through that updates a port to a new version. But if the build of the old version of the port was still pending at that time, the buildbot will build the old version, because it can't update the ports tree until the current portwatcher build is done waiting for its triggered portbuilder builds.
>
> To me it seems like this is only an issue for forced builds, not for the
> builds scheduled by commits.

No, it is relevant for all situations that cause lots of ports to build. My extreme example was forcing a build of all ports, but the same problem would happen if you committed a change to many ports, such as a commit that occurred in the past to remove $Id$ lines from all ports, or commits that will happen in the future to batch-add GitHub maintainer handles and add file size to checksums.

> So maybe for this use case the force scheduler should be on a level one
> higher in the hierarchy. Then the force scheduler would get a list of
> ports and for each schedule a portwatcher with only one port name (or
> use some other way of partitioning).

I don't see any need to do this.

>> 3. When there are portwatcher builds pending, we have no idea how many portbuilder builds are pending. It may say there are e.g. 3 portbuilder builds pending, but the pending portwatchers could trigger any number of additional portbuilders.
>
> Why does it matter how many portbuilder builds will be scheduled later?
> I do not see the problem...?

I want some kind of indication of how much work the buildbot has left to do. We've already had to turn off the feature that estimates how much time the current build will take, since we don't know that, since each port takes a different amount of time to build. The only piece of information we have left that might provide some kind of answer is how many builds are pending, and our current setup doesn't give us that information either. My proposal provides this information quicker than our current setup.

> Overall, my immediate thought was that with the current portwatcher, we
> could just not wait for the triggered builds to finish. That seems to
> solve (1) and (2), but also lose the ability to send summary emails.
> Although I did not think enough about this whether it would really work.
>
> Am I missing something why we would definitely need to merge portwatcher
> and portbuilder?

I would rather ask you: am I missing something why we want them to be separate? The waterfall is a logfile with a fancy interface. I want to `tail -f` that logfile and see the list of events happening on the ports queue. I don't want to have to switch between two different columns of output.

The other problem I wanted to fix, that I forgot to mention, is that the waterfall is too wide. I want one column and one queue for each port builder, not two. Especially since the waterfall is about to get wider still when we add more jobs to it.


>> An objection to this proposal was that buildbot 0.8 does not have the capability to dynamically create scheduler steps at runtime. But that's not required and that's not what I'm proposing.
>>
>> Buildbot has the ability to call a function for each step to determine if that step should run, by specifying the doStepIf property. I recently started using this feature in portwatcher to skip the two trigger steps if there are no ports in the port list:
>>
>> https://github.com/macports/macports-infrastructure/commit/18135d6c75698f88b48698473c9364063fb6fba9
>>
>> Here is an example of what that looks like when it runs:
>>
>> https://build.macports.org/builders/ports-10.13_x86_64-watcher/builds/3989
>>
>> The only port that was committed there had already been built so it was excluded from the port list, leaving the list empty, so the portbuilder trigger step was skipped (to save time) and the mirror trigger step was skipped (to prevent it from printing an error that no ports were specified). The skipped steps are still shown in the web interface, but if that's not desired, they can be hidden by also using the hideStepIf property.
>
> I noticed this and I like this change. +1
>
>> So my proposed combined ports scheduler would still contain all of the steps of the current portwatcher and portbuilder schedulers, but each build would still conceptually "be" either a portwatcher or a portbuilder, and for each build, the steps that don't relate to that conceptual function would be skipped and hidden.
>
>> [...]
>
> It sounds like a complete hack to use doStepIf this way...

You could call our entire current buildbot setup a hack. Buildbot was intended to continuously build new versions of a single piece of software, not what we're doing. That didn't stop us from doing it...

>> This should solve problem (1). By having portwatcher and portbuilder tasks in the same queue, they can't run simultaneously so they can't cause the problems that happen when they run simultaneously.
>>
>> Solving (2) and (3) requires an additional step. Buildbot allows you to define a nextBuild property when you create the builder, and pass it a function that determines which of the pending builds should go next.
>>
>> http://docs.buildbot.net/0.8.12/manual/cfg-builders.html#builder-configuration
>>
>> We would write a function that looks through the pending builds in the order in which they were scheduled, and picks the first one that is a portwatcher (the first one that doesn't have a "portname" property). If none are portwatchers, it picks the first* one. (* This is where we would later improve the situation to pick the next port in the correct dependency order, but I have another email about that.)
>>
>> This solves (2) because now when a new commit comes in, a new "portwatcher" gets added to the end of the ports scheduler's queue, but it will be the next build picked immediately after the current "portbuilder" finishes, no matter how many other "portbuilders" are still pending. That will update the ports tree, so any pending builds for ports that were subsequently updated will build the now-current version, not the old version.
>
> But the old builds would be left in the queue and follow after the
> portwatcher, so we still have multiple builds in the queue for the same
> port...

As I said in my reply to Mojca, we would implement https://trac.macports.org/ticket/55078 to make those duplicate builds exit quickly, and later maybe find a way to remove pending duplicate builds entirely.

> As the builds were scheduled by an earlier commit (which buildbot keeps
> in the "sourcestamp"), the attribution for notifications might be
> difficult. I am afraid it could lead to reports of a build failure for
> the wrong commit unless you manage to cancel the pending builds.

I have not seen build emails in awhile because of the email delivery problem I haven't investigated yet, but I'm pretty sure we already have this "problem", if you really think it is a problem. When we first implemented the new buildbot, we always built the latest version of a port, not the version that was committed.* So if a developer commits, in rapid succession, r5 updating a port to version 1.4, r6 adding a library dependency to it, and r7 increasing its revision, and this all occurs while the buildbot is busy building r4, then, when the build for r4 finishes and portwatcher gets around to processing r5, it will actually update the git clone to r7 before scheduling the portbuilder. Assuming that succeeds, when portwatcher processes r6, it will realize that the port has already been built and exit without scheduling any portbuilders, and the same will happen again for r7. On the other hand, the build of r7 that was scheduled for r5 might still fail for whatever reason. In the email that gets sent about that, if we're not already doing so, we might want to point out if we built a different commit than the one that triggered the build.

*Larry fixed that in https://github.com/macports/macports-infrastructure/commit/588e94a585c46532585d571957634b73ce42253c, but this ended up wasting a lot of time building ports that had already been superseded. See https://trac.macports.org/ticket/54648. I reverted the fix in https://github.com/macports/macports-infrastructure/commit/f02854986a62259279dccf21b4d2bf3bed939762.


> Buildbot would normally solve this by merging build requests. If a new
> build request gets put into the queue, the old one is discarded. The
> problem is that we cannot merge portwatcher builds, because we would
> only be able to do that as soon as we know the list of ports. However,
> we can only determine that once the portwatcher build is executed, so it
> is no option for us.
>
> https://docs.buildbot.net/0.8.12/manual/cfg-builders.html#merging-build-requests
>
> If the portwatcher triggered the portbuilder and then exited, we could
> look into merging the portbuilder jobs.

Hmm. Well, I don't know about that. I'm still hopeful that buildbot has an API we can use to inspect and cancel upcoming jobs. This feels like a minor optimization that can be done later. It would be good to do, but implementing https://trac.macports.org/ticket/55078 would get us most of the way there.


>> It also solves (3) because now if 100 "portbuilder" builds are pending, we don't have to wait until all of them are finished before we know how many builds all the pending "portwatcher" builds will schedule; we only have to wait until the current "portbuilder" finishes.
>
>> One drawback, depending on how you look at it, is that we would no longer be able to send a single combined email for all of the failed builds of a single commit, or of a single forced build. We would have to send an individual email for each failed build. Personally, I would prefer that, as the subject line of the email would make clear which port failed to build, rather than requiring me to open it to see what happened.
>
> Didn't we invest a lot of effort to find a way to only send a single
> email instead of sending one per build...?

Honestly, I didn't find the combined emails helpful. As I recall, if I made a commit that scheduled many port builds, and one of them failed to build, the email that was sent contained a link to every port's log, with no indication of which of them failed; I'm not going to click 100 or 50 or 10 log links to find the one that I was meant to look at.

Of course, we could improve the content of the emails and make it more useful.

Thinking about it further, there's no reason why we can't send a single email for each batch of builds. We can store the build logs grouped by a batch id. We probably already have some property that could be used as the batch id, such as the id of the portwatcher build that triggered the builds. Since we'll be triggering the builds in a predictable order, we know which build is the last one. We can flag the last build of the batch with a unique property. When a build has that property, that means it should gather all the logs from that batch and send it out.

But personally I would rather receive individual emails, as soon as a build fails. If we schedule a build of all ports, do you want to know that your port failed to build right when it happened, so that you can maybe fix it quickly and thus prevent some dependent ports from failing to build as a result, or do you want to wait weeks for all the other ports to finish first before you're notified?


>> If we give the new combined scheduler a new name like "ports", that will invalidate all old links to build logs. That's not a deal-breaker for me, but since we often paste build log URLs into tickets, it would be nice to keep them alive if we can. Maybe defining empty portbuilder and portwatcher schedulers would be enough.
>
> I would not care about the logs at all. They are only interesting for a
> limited time and only for failed builds. It should be no problem to just
> delete old logs.

I care, because as I said I often paste build or log URLs into tickets, both MacPorts tickets and upstream tickets.

We could keep the combined scheduler named "portbuilder" to keep the old logs; we don't care about the portwatcher logs.

Or we could name the new scheduler "ports" but keep the logs and last build id of the portbuilder scheduler, and use a web server rule to fix up old portbuilder build URLs.


Reply | Threaded
Open this post in threaded view
|

Re: Buildbot proposal: combine portwatcher and portbuilder

Mojca Miklavec-2
Dear Ryan,

Please take a look at the discussion summary we just had at the meeting:

https://trac.macports.org/wiki/Meetings/MacPortsMeeting2018/BuildbotRestructuring

Regarding emails: we figured out that it makes no sense to make our
design decisions based on how annoying it would be to write emails. It
will probably be easier to write a separate set of scripts to send
individual emails separately.

Regarding the width of the waterfall: buildbot one solves that, the
waterfall is much leaner there :)

I have another request: could you please package buildbot 1.1 in a
Portfile? Ideally those who have buildbot installed now should get the
port replaced_by buildbot-0.8 and those who install it from scratch
should get buildbot version 1.0, but I'm not sure if MacPorts
currently support such a migration scheme. It would make more sense to
have buildbot 1.1 named "buildbot" and the old one named
"buildbot-0.8" though. Perhaps the new one should be called buildbot-1
after all, at least for a while, I don't know.

Mojca
Reply | Threaded
Open this post in threaded view
|

Re: Buildbot proposal: combine portwatcher and portbuilder

Ryan Schmidt-24

On Mar 13, 2018, at 09:50, Mojca Miklavec wrote:

> Dear Ryan,
>
> Please take a look at the discussion summary we just had at the meeting:
>
> https://trac.macports.org/wiki/Meetings/MacPortsMeeting2018/BuildbotRestructuring


So...

> Ryan's proposal
>
> Ryan asked us to get rid of the separate portwatcher & portbuilder jobs and re-configure the remaining job to interleave the two types of actions. As we understood it, this was to solve the following problem:
>
> • A commit for portA comes in
> • portwatcher for this commit schedules a portbuilder job for portA
> • This build takes a long time
> • While the build is still running, a new commit for the same port arrives, which queues a portwatcher job
> • Another commit for the same port arrives, which queues another portwatcher job
> • When the portbuilder job finishes, a useless build is scheduled for portA.

Well, one of the several problems it would solve.

> He proposed the following to solve this:
>
>  +---------------+
>  |    commit1    |
>  +---------------+
>  |    prepare    |
>  |   resources   |
>  +---------------+
>  |   scheduler1  |
>  +---------------+
>  +---------------+
>  |   port1 dep1  |
>  +---------------+
>  +---------------+
>  |   port1 dep2  |
>  +---------------+
>  +---------------+
>  |    commit2    |
>  +---------------+
>  |    prepare    |
>  |   resources   |
>  +---------------+
>  |   scheduler2  |
>  +---------------+
>  +---------------+
>  |   port1 dep2  |
>  +---------------+
>  +---------------+
>  |     port1     |
>  +---------------+
>  +---------------+
>  |   port2 dep1  |
>  +---------------+
>  +---------------+
>  |   port2 dep2  |
>  +---------------+
>  +---------------+
>  |     port2     |
>  +---------------+

If those boxes for "port2 dep1", "port2 dep2, "port2" are meant to all be separate builds, then that's not what I proposed. I proposed that each build in the combined scheduler would either perform the steps of the current portwatcher, or the steps of the current portbuilder. So what you've shown as separate "port2 dep1" and "port2 dep2" builds or steps, I was proposing would be our current "install deps of port2" portbuilder step, to be followed in the same build by an "install port2" step.


> Unfortunately that introduces the problem with the prepared shared resources (like the portindex, mpbb checkout, ports tree), because we really cannot change the portstree while we have build scheduled in a dependency order that was computed from the old ports tree. This would cause seemingly random and hard-to-debug problems if a follow-up commit changes dependencies of ports.

I don't think this causes any problems; on the contrary, it solves them. If on April 1 I schedule a build of all ports, and it takes a month to finish them all, I don't want it to spend the rest of the month building ports as they were on April 1. I want it to build ports as they are at the time that it gets to the build. If a commit comes in on April 2, I want that to be reflected in the previously-scheduled but still pending builds.

For a specific example on my proposed setup, let's say I:

* force a build of zlib @1.2.11, libpng @1.6.34_1, pngpp @0.2.5_2, and pigz @2.4. (I only specify the port names of course, but those are the versions and revisions that are current at the time I force the build.) This schedules build b1, which "is" a "portwatcher".
* b1(portwatcher) updates the shared resources, and schedules four builds: b2(portbuilder:zlib), b3(portbuilder:libpng), b4(portbuilder:pngpp), b5(portbuilder:pigz).
* b2(portbuilder:zlib) installs dependencies of zlib, installs zlib @1.2.11, cleans up.
* b3(portbuilder:libpng) installs dependencies of libpng, installs libpng @1.6.34_1, cleans up.
* While libpng was building, commit abcd came in that updates zlib to the long-awaited version 2.0! It also revbumps all dependents because zlib 2 has a new library version. This schedules b6(portwatcher) at the end of the queue.
* b6(portwatcher) is selected to run next, because it is a portwatcher and has precedence over all portbuilders. It updates shared resources, schedules b7(portbuilder:zlib) and lots of other builds for the dependents that got revbumped, which includes libpng and pigz.
* b4(portbuilder:pngpp) installs dependencies of pngpp, which include libpng and zlib. Since the shared resources have already been updated by b6(portwatcher), this step of this build is where zlib @2.0 and the revbumped libpng @1.6.34_2 end up getting built. It then installs pngpp @0.2.5_2 and cleans up.
* b5(portbuilder:pigz) installs dependencies of pigz, installs pigz @2.4_1, cleans up. This build was scheduled before commit abcd came in, yet thanks to the shared resources being updated by b6(portwatcher) earlier, it is able to build the latest version of the port available now, and not waste time building an already-superseded version.
* b7(portbuilder:zlib) notices that zlib @2.0 has already been installed and exits early.


> If we were to wait for the steps planned by the first scheduler to finish, we would end up in the same situation as outlined at the beginning, which was our initial approach, but only works with buildbot >= 0.9.
>
> To avoid the problem outlined by Ryan, we could also just enable build merging on the portwatcher, which would avoid the spurious builds.

I'm aware of the build merging feature but haven't played with it much. I don't know if it would work for that here. We can try it. In our current setup, where portwatcher waits for portbuilders and lots of portwatchers could be pending, merging them could be helpful, if it works.

But I want to continue to pursue my idea, since it would solve more problems than just merging portwatcher builds would.


> Regarding emails: we figured out that it makes no sense to make our
> design decisions based on how annoying it would be to write emails. It
> will probably be easier to write a separate set of scripts to send
> individual emails separately.
>
> Regarding the width of the waterfall: buildbot one solves that, the
> waterfall is much leaner there :)

...since it removes all the information I want to see...


> I have another request: could you please package buildbot 1.1 in a
> Portfile? Ideally those who have buildbot installed now should get the
> port replaced_by buildbot-0.8 and those who install it from scratch
> should get buildbot version 1.0, but I'm not sure if MacPorts
> currently support such a migration scheme. It would make more sense to
> have buildbot 1.1 named "buildbot" and the old one named
> "buildbot-0.8" though. Perhaps the new one should be called buildbot-1
> after all, at least for a while, I don't know.

Yes I should do that.

https://trac.macports.org/ticket/53006