Re: [MacPorts] #60590: Macports mirrors are down?

Clemens Lang-2
On Thu, Jun 04, 2020 at 02:38:20PM -0400, Herby G wrote:
> Anything to be done about having such a single point of failure like
> this in MacPorts infra?

Unfortunately, running our own rsync server costs money. Ever since
Apple stopped sponsoring our server hardware a couple of years ago, we
have relied on the University of Erlangen-Nuremberg to run our main
mirror for us. This main mirror syncs all its contents from a private
mirror run by Ryan Schmidt, which is where the buildbots put the files.

This has served us well so far, and the downtime of this mirror would
not be a big issue if it weren't for rsync – for all other attempts to
download from a mirror, MacPorts will automatically fall back to other
mirrors.

We could extend the code that updates MacPorts itself and the ports tree
[1,2] to retry at some of the other mirrors (although this may have
undesired effects if you sync with a mirror that happens to have an
older state than what you have locally), or switch to a different
source, e.g. git [3], or an HTTP server, preferably using some diff
mechanism to avoid downloading the entire thing from scratch every time.

None of those have been implemented yet, because it would require quite
a bit of work to get all the corner cases right (e.g. we do care about
the timestamps of the synced files). If you want to help out, that would
be welcome.
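
As a rough illustration of the "retry at other mirrors" idea and the stale-mirror concern mentioned above, here is a minimal Python sketch. It is not MacPorts code (MacPorts is written in Tcl); the function name and the `remote_timestamp` and `do_sync` callbacks are hypothetical placeholders for whatever would query and sync a mirror.

```python
from typing import Callable, Iterable, Optional


def sync_with_fallback(
    mirrors: Iterable[str],
    remote_timestamp: Callable[[str], float],
    do_sync: Callable[[str], None],
    local_timestamp: float,
) -> Optional[str]:
    """Try each mirror in order; return the mirror that was used, or None.

    remote_timestamp and do_sync are hypothetical callbacks supplied by
    the caller; they are expected to raise OSError on network failure.
    """
    for mirror in mirrors:
        try:
            # Skip mirrors holding an older state than the local tree,
            # so that a fallback never rolls the ports tree back.
            if remote_timestamp(mirror) < local_timestamp:
                continue
            do_sync(mirror)
            return mirror
        except OSError:
            # Mirror unreachable or sync failed: move on to the next one.
            continue
    return None
```

The staleness check is the part that needs care: without it, falling back to a lagging mirror could silently replace newer local files with older ones.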

[1] https://github.com/macports/macports-base/blob/master/src/macports1.0/selfupdate.tcl#L69-L84
[2] https://github.com/macports/macports-base/blob/master/src/macports1.0/macports.tcl#L2663-L2694
[3] https://trac.macports.org/wiki/howto/SyncingWithGit

HTH,
--
Clemens

Re: [MacPorts] #60590: Macports mirrors are down?

Christopher Chavez
On 6/5/2020 12:32 PM, Clemens Lang wrote:
> or switch to a different source, e.g. git
As a user, I've thought it would be great if MacPorts used git syncing
by default. I have used git-over-https since 2016 when I was constrained
to ports 80/443 by a university campus network. I had found it also
avoided needlessly writing hundreds of MB to the SSD when updating the
ports tree. If GitHub outages are a concern, then I would think falling
back to a mirror hosted by GitLab/SourceForge/Bitbucket/etc. should be
possible.

Christopher A. Chavez

Re: [MacPorts] #60590: Macports mirrors are down?

Saagar Jha
Though it’s not the default, you can actually tell MacPorts to use Git to sync your ports, at least; see the third part of section 2.3.3 in the guide.

Saagar Jha



Re: [MacPorts] #60590: Macports mirrors are down?

Ralph Seichter-3
* Saagar Jha:

> Though it’s not the default, you can actually tell MacPorts to use Git
> to sync your ports [...]

I had a glimpse at the guide. Am I correct to assume that it would be
possible to use a shallow Git clone?

-Ralph

Re: [MacPorts] #60590: Macports mirrors are down?

Clemens Lang-2
Hi,

On Fri, Jun 05, 2020 at 06:05:41PM -0500, Christopher Chavez wrote:
> As a user, I've thought it would be great if MacPorts used git syncing
> by default. I have used git-over-https since 2016 when I was
> constrained to ports 80/443 by a university campus network. I had
> found it also avoided needlessly writing hundreds of MB to the SSD
> when updating the ports tree. If GitHub outages are a concern, then I
> would think falling back to a mirror hosted by
> GitLab/SourceForge/Bitbucket/etc. should be possible.

The reason why we haven't done this is that you would no longer get a
prebuilt PortIndex. You would have to generate the PortIndex locally,
which costs CPU time and is noticeably slower than downloading a
matching PortIndex from rsync.

We might eventually figure out a way to provide prebuilt portindexes via
http when syncing from Git, but that needs to be set up on the server
side, implemented on the client side and thoroughly tested.

--
Clemens

Re: [MacPorts] #60590: Macports mirrors are down?

Ruben Di Battista
I'm probably saying something completely wrong, but can't you leverage the CI system and buildbots to create a GitHub release that includes the PortIndex and whatever else is needed, and then download it from GitHub's servers via HTTP? What am I missing here?




Re: [MacPorts] #60590: Macports mirrors are down?

Clemens Lang-2
Hi,

On Sat, Jun 06, 2020 at 12:57:19PM +0200, Ruben Di Battista wrote:
> I'm probably saying something completely wrong, but can't you leverage
> the CI system and buildbots to create a Github release that includes
> the PortIndex and whatsoever and then download it from Github servers
> via http? What am I missing here?

The ports tree changes multiple times a day. We currently build a new
PortIndex every few hours, and the way we sync things (PortIndex
together with the ports tree), we automatically ensure that the
PortIndex matches the files you have on disk. Additionally, we sync the
file modification time, so even if the PortIndex does not match exactly,
we can easily compute on the client side which ports have not been
indexed yet.

This is not as simple with Git. Git sets the mtime of modified files to
the time at which you checked them out, not the time when they were
committed. The portindex command, however, compares the PortIndex
mtime with the mtimes of the portfiles.
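
To make the mtime comparison concrete, here is a small Python sketch of the idea. It is only an illustration, not the portindex implementation; the function name and the assumption that the index file sits at the top of the tree are mine.

```python
from pathlib import Path
from typing import List


def portfiles_newer_than_index(tree: Path, index_name: str = "PortIndex") -> List[Path]:
    """Return every Portfile under `tree` modified after the index file.

    Assumes (for illustration) that the index lives at tree/index_name.
    """
    index_mtime = (tree / index_name).stat().st_mtime
    return sorted(
        p for p in tree.rglob("Portfile")
        if p.stat().st_mtime > index_mtime
    )
```

With rsync, mtimes are preserved, so this list contains only ports that really changed after indexing. After a Git checkout, the mtimes reflect checkout time instead, so the same comparison no longer says anything about commit order.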

The unsolved problems are:
1. Do we really want to create a new GitHub release every few hours?
2. How does MacPorts know which ports are correctly indexed in a
   downloaded PortIndex, and which ones were added afterwards and thus
   need to be re-indexed locally?
3. How do we ensure efficient transfer of the PortIndex? The PortIndex
   is currently ~15M, and rsync efficiently copies only the delta for us.
   If we host a prebuilt PortIndex on GitHub, how do we avoid
   downloading the entire PortIndex on every sync?

I guess the solution for (2) could be to include the Git hash that was
indexed, compute the modified files between that hash and HEAD, touch
them, and re-run portindex.
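
The approach for (2) could look roughly like the following Python sketch. It assumes the downloaded PortIndex comes with the Git commit hash it was built from (that metadata does not exist today), and both function names are hypothetical. Only standard `git diff --name-only` and `git -C` are used.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def changed_since(repo: Path, indexed_commit: str) -> List[str]:
    """Paths that differ between the indexed commit and HEAD."""
    out = subprocess.run(
        ["git", "-C", str(repo), "diff", "--name-only", indexed_commit, "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def touch_existing(repo: Path, rel_paths: List[str]) -> List[Path]:
    """Bump the mtime of each changed file that still exists.

    Files deleted since the indexed commit are skipped.
    """
    touched = []
    for rel in rel_paths:
        path = repo / rel
        if path.is_file():
            os.utime(path, None)  # set atime and mtime to the current time
            touched.append(path)
    return touched
```

After `touch_existing(repo, changed_since(repo, commit))`, the touched files are newer than the downloaded PortIndex, so a subsequent portindex run would pick them up.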

I don't have a good suggestion for (3) yet. It might be possible to
generate daily PortIndex diffs, download all of those between the time
of the last update and now, and apply them in order. Such a mechanism
would have to work with any local modifications users might have in
their tree, though. As soon as the combined size of the diffs is larger
than the entire PortIndex, we could fall back to a full download.
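
The decision logic for such a daily-diff scheme could be sketched like this in Python. Everything here is hypothetical: the function name, the idea that diff sizes are known up front (for example from a manifest), and the rule that a missing day forces a full download.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def plan_index_update(
    last_sync: date,
    today: date,
    diff_sizes: Dict[date, int],
    full_size: int,
) -> Tuple[str, List[date]]:
    """Decide between applying daily diffs in order and a full download.

    diff_sizes maps each day to the size in bytes of that day's diff.
    """
    days: List[date] = []
    day = last_sync + timedelta(days=1)
    while day <= today:
        if day not in diff_sizes:
            # A gap in the chain means the diffs cannot be applied in order.
            return ("full", [])
        days.append(day)
        day += timedelta(days=1)
    # Diffs only pay off while they are not larger than the full index.
    if sum(diff_sizes[d] for d in days) > full_size:
        return ("full", [])
    return ("diffs", days)
```

This does not address the harder part raised above, namely applying diffs to an index that contains entries for locally modified ports.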

Technically, all of this is doable. However, somebody actually needs to
sit down and implement it.

--
Clemens

Re: [MacPorts] #60590: Macports mirrors are down?

ryandesign2

On Jun 6, 2020, at 09:04, Clemens Lang wrote:

(stuff about distributing PortIndex via git)

We used to commit the PortIndex into the Subversion repository every hour if anything had changed. Here's the last time we did that, 10 years ago:

https://trac.macports.org/changeset/68632

We stopped doing it after that for many reasons: Having generated files in version control is an antipattern. Seeing automated commit messages potentially every hour is annoying. Seeing them in the commit log is annoying. And it doesn't account for the fact that the portindex can have different contents on different os.major/os.arch combinations, which is why we now generate the portindex separately for each os.major/os.arch combination in mprsyncup and MacPorts knows how to get the index appropriate for the user's system. We currently have 14 different portindexes:

PortIndex_darwin_8_i386
PortIndex_darwin_8_powerpc
PortIndex_darwin_9_i386
PortIndex_darwin_9_powerpc
PortIndex_darwin_10_i386
PortIndex_darwin_11_i386
PortIndex_darwin_12_i386
PortIndex_darwin_13_i386
PortIndex_darwin_14_i386
PortIndex_darwin_15_i386
PortIndex_darwin_16_i386
PortIndex_darwin_17_i386
PortIndex_darwin_18_i386
PortIndex_darwin_19_i386
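
The naming pattern in the list above can be expressed as a one-line Python helper. This is only an illustration inferred from the 14 names shown; the function and parameter names are my own and it is not how MacPorts itself selects the index.

```python
def portindex_name(os_major: int, os_arch: str, os_platform: str = "darwin") -> str:
    """Build an index name following the pattern PortIndex_<platform>_<major>_<arch>."""
    return f"PortIndex_{os_platform}_{os_major}_{os_arch}"
```

For example, `portindex_name(19, "i386")` yields the last name in the list.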



Re: [MacPorts] #60590: Macports mirrors are down?

Clemens Lang-2
Hi Ryan,

On Sun, Jun 07, 2020 at 05:24:23PM -0500, Ryan Schmidt wrote:
> (stuff about distributing PortIndex via git)

I don't think that was ever suggested. There was a suggestion to somehow
distribute PortIndexes related to Git versions, but never directly in
the Git history.

> We used to commit the PortIndex into the subversion repository every
> hour if anything had changed. Here's the last time we did that, 10
> years ago:

This antipattern only gets worse with Git, because you always get the
entire history and cannot get rid of it ever again. I see the conversion
from SVN -> Git as the last step where we ever had the chance to rewrite
our repository history.

That doesn't really mean we can't use Git's version history in relation
to the PortIndex, though. We could build the various platform-specific
PortIndexes once a day, and serve them indexed with a Git commit ID,
which would give users (ideally in an automated fashion) an idea of what
state the PortIndex represents, and which ports need to be (re-)indexed
locally.

I've checked, and my PortIndex is 1.5M when gzipped. This is the range
where I would say we could get away with only serving a PortIndex in its
entirety once a day, have port(1) download the latest one, and re-index
locally any ports that (a) aren't in that PortIndex, or (b) have been
committed after it was built.
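
Conditions (a) and (b) amount to a small set computation, sketched here in Python. The function name and inputs are hypothetical; in practice the "changed after index" set would have to come from something like a Git diff against the commit ID served with the PortIndex.

```python
from typing import Iterable, Set


def ports_to_reindex(
    local_ports: Iterable[str],
    indexed_ports: Iterable[str],
    changed_after_index: Iterable[str],
) -> Set[str]:
    """Ports in the local tree that need to be (re-)indexed locally."""
    local = set(local_ports)
    missing = local - set(indexed_ports)          # (a) not in the downloaded PortIndex
    changed = local & set(changed_after_index)    # (b) committed after the index was built
    return missing | changed
```

Ports that were changed upstream but are not present in the local tree are ignored, since there is nothing local to index for them.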

WDYT?

--
Clemens

Re: [MacPorts] #60590: Macports mirrors are down?

ryandesign2

On Jun 8, 2020, at 17:24, Clemens Lang wrote:

> Hi Ryan,
>
> On Sun, Jun 07, 2020 at 05:24:23PM -0500, Ryan Schmidt wrote:
>> (stuff about distributing PortIndex via git)
>
> I don't think that was ever suggested. There was a suggestion to somehow
> distribute PortIndexes related to Git versions, but never directly in
> the Git history.
>
>> We used to commit the PortIndex into the subversion repository every
>> hour if anything had changed. Here's the last time we did that, 10
>> years ago:
>
> This antipattern only gets worse with Git, because you always get the
> entire history and cannot get rid of it ever again. I see the conversion
> from SVN -> Git as the last step where we ever had the chance to rewrite
> our repository history.
>
> That doesn't really mean we can't use Git's version history in relation
> to the PortIndex, though. We could build the various platform-specific
> PortIndexes once a day, and serve them indexed with a Git commit ID,
> which would give users (ideally in an automated fashion) an idea of what
> state the PortIndex represents, and which ports need to be (re-)indexed
> locally.
>
> I've checked, and my PortIndex is 1.5M when gzipped. This is the range
> where I would say we could get away with only serving a PortIndex in its
> entirety once a day, have port(1) download the latest one, and re-index
> any ports locally that aren't (a) in that PortIndex, or (b) have been
> committed after that.
>
> WDYT?

I'm glad we're thinking about ways that we might be able to move away from rsync and towards https to avoid the types of problems that various users keep having with rsync due to network restrictions, and so that we might be able to leverage our CDN to make it faster and more resilient against temporary server outages. But I don't think I can devote the necessary time and thought to this problem to provide any useful input right now.