Sorry, intended to send this to the dev list...
> Begin forwarded message:
> From: Craig Treleaven <[hidden email]>
> Subject: Re: Call for designers for our ports website
> Date: June 12, 2020 at 2:31:52 PM EDT
> To: Mojca Miklavec <[hidden email]>
>> On Jun 12, 2020, at 1:24 PM, Mojca Miklavec <[hidden email]> wrote:
>> Dear MacPorters,
>> As part of a GSOC project Arjun has been working on great new features
>> for our web application with information about ports.
>> The application from last year has been deployed at
>> while the new testing site is temporarily located at
>> The website already looks nice, but if we had some talented designers
>> among our users willing to help us go one step beyond what we have
>> right now, we would be extremely grateful for either just some advice
>> or potentially some more extensive help. There are a lot of minor
>> tweaks that could be done, but neither of us is a designer, and I'm
>> not able to give any competent advice about how to best improve the
>> Here are some concrete examples of subpages:
>> - http://macports.silentfox.tech/port/root6/
>> - http://macports.silentfox.tech/port/gnuplot/stats/?days=365&days_ago=0
>> - http://macports.silentfox.tech/search/?installed_file=&q=root&name=on
>> Thank you very much in advance,
> Not a designer, but…
> Re "Port Installations by month" 
> In the example referenced above, a new version of gnuplot was apparently made available in March 2020. At first glance, the chart makes it look like the number of installations of this port jumped from about 82 in Feb. 2020 to 130 in March 2020, then fell back to about 110 in April 2020. This, however, is a distortion introduced by our weekly submissions being summarized into monthly buckets. I believe we should be reporting the _percentage_ of installations by version rather than the raw numbers. Using this example, about 94% of reporting systems were on version 5.2.7 in Feb. 2020. In March, 69% of submissions identified version 5.2.7 and 27% version 5.2.8. In April 2020, the share of submissions reporting version 5.2.7 was down to 36% and 5.2.8 was up to 59%. I believe this more clearly communicates the degree to which reporters have updated to the most recently released version.
> If we want to show the number of installations by month irrespective of version (and I think that is useful information), we should use the current version of the chart “Installations by month” 
>  http://macports.silentfox.tech/port/gnuplot/stats/?days=365&days_ago=0
>  https://ports.macports.org/port/gnuplot/stats?days=365&days_ago=0
Thank you. You make a valid point regarding the possible distortion due to weekly submissions being bundled to calculate monthly charts.
But what we are seeing here is a known issue with the query that calculates this chart.
The current query has a limitation. Let’s say we receive two submissions from a user within one month. If one has port version X.1 and the other has the upgraded version X.2, the query counts that user as using both versions rather than just the latest one. This is the cause of the sudden jump in Mar 2020. The problem affects only the "versions vs month" chart and should be fixed soon. All other charts, including "installations by month", display accurate information (https://ports.macports.org/port/gnuplot/stats?days=365).
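To make the limitation concrete, here is a minimal sketch in plain Python. It is not the actual query, which isn't shown in this thread. The record shape `(system_id, submitted_on, version)` and both function names are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def naive_counts(submissions):
    """Count distinct (system, version) pairs per month.

    This mimics the behaviour described above: a system that upgrades
    mid-month is counted once for each version it reported.
    """
    seen = {(s, d.year, d.month, v) for s, d, v in submissions}
    counts = defaultdict(Counter)
    for _, year, month, version in seen:
        counts[(year, month)][version] += 1
    return counts


def latest_only_counts(submissions):
    """Count each system once per month, using its latest submission."""
    latest = {}
    for system_id, day, version in submissions:
        key = (system_id, day.year, day.month)
        if key not in latest or day > latest[key][0]:
            latest[key] = (day, version)
    counts = defaultdict(Counter)
    for (_, year, month), (_, version) in latest.items():
        counts[(year, month)][version] += 1
    return counts


# Hypothetical data: system "a" upgrades during March, system "b" does not.
subs = [
    ("a", date(2020, 3, 2), "5.2.7"),
    ("a", date(2020, 3, 23), "5.2.8"),
    ("b", date(2020, 3, 9), "5.2.7"),
]
naive = naive_counts(subs)[(2020, 3)]
fixed = latest_only_counts(subs)[(2020, 3)]
```

With this made-up data, the naive count reports three installations for two systems, while the latest-only count reports two.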
Thank you for the percentage suggestion. I am just wondering about the right way to display that information graphically.
I was trying to combine "installations by month" and "versions by month", but it turns out they work better as separate charts.
The following is a quick mockup of how versions over time might be reported:
To display percentages, I don’t think we need the ‘count distinct’. Suppose only a single system is reporting that it uses a particular port and that port is updated during the month. Suppose further that the first two reports in the month from that single system say it is using version 1.0 and the last two say it has version 1.1 installed. Given the way we collect stats, I think it would be accurate to report usage as 50% for each of the versions of the port for that month. Over the course of the month, that was what was reported.
Either that or only use the last report for the month and have the date displayed as the last day of each month.
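The two options can be compared with a small sketch in plain Python. The `(system_id, reported_on, version)` record shape and the function names are assumptions for illustration only, not anything taken from the web app.

```python
from collections import Counter, defaultdict
from datetime import date


def _to_percent(per_month):
    """Turn per-month version counts into per-month percentages."""
    return {
        month: {v: 100.0 * n / sum(c.values()) for v, n in c.items()}
        for month, c in per_month.items()
    }


def share_of_all_reports(reports):
    """Option 1: every weekly report counts, with no 'count distinct'."""
    per_month = defaultdict(Counter)
    for _, day, version in reports:
        per_month[(day.year, day.month)][version] += 1
    return _to_percent(per_month)


def share_of_last_reports(reports):
    """Option 2: only each system's last report in the month counts."""
    latest = {}
    for system_id, day, version in reports:
        key = (system_id, day.year, day.month)
        if key not in latest or day > latest[key][0]:
            latest[key] = (day, version)
    per_month = defaultdict(Counter)
    for (_, year, month), (_, version) in latest.items():
        per_month[(year, month)][version] += 1
    return _to_percent(per_month)


# The single-system example from the paragraph above: two reports on
# version 1.0, then two on version 1.1 (dates are made up).
reports = [
    ("sys1", date(2020, 6, 1), "1.0"),
    ("sys1", date(2020, 6, 8), "1.0"),
    ("sys1", date(2020, 6, 15), "1.1"),
    ("sys1", date(2020, 6, 22), "1.1"),
]
all_share = share_of_all_reports(reports)[(2020, 6)]
last_share = share_of_last_reports(reports)[(2020, 6)]
```

For this example, the first option gives 50% for each version and the second gives 100% for version 1.1. The first describes what was reported over the month, while the second describes the state at month end.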
Thinking about it a little more, I would suggest using an area chart something like this:
Using areas rather than stacked bars emphasizes the trends in the data (presenting it as changing uniformly throughout the month). In this chart, it highlights how the installed base of a port migrates to a new version over time (with some holdouts).
Conversely, I would suggest that the “Port installations by month” chart should be a bar chart rather than an area chart.
Note that I wasn’t trying to suggest that we need a table of numeric percentages alongside the graphical representation. I think the table will be redundant with a good chart.
Also, a quibble on the word “users”. More than a few of our users administer several systems that are all reporting statistics*. We know how many systems are submitting stats but we really don’t know the number of individual users that represents.
Thanks for ‘sweating the details’ on this project!
* In fact, if MacPorts is installed in more than one prefix on a system, couldn’t each prefix be submitting statistics independently?
On 2020-6-14 05:38 , Craig Treleaven wrote:
> Also, a quibble on the word “users”. More than a few of our users
> administer several systems that are all reporting statistics*. We know
> how many systems are submitting stats but we really don’t know the
> number of individual users that represents.
I guess something like "installations" would be more accurate.
> * In fact, if MacPorts is installed in more than one prefix on a system,
> couldn’t each prefix be submitting statistics independently?
The mpstats LaunchDaemon has a fixed label, so it can only be loaded
once per machine.