11_x86 build - build failures due to 'no space on device'


Christopher Jones
Hi,

I am seeing some large builds fail due to drive space issues

https://build.macports.org/builders/ports-11_x86_64-builder/builds/21466/steps/install-port/logs/stdio
error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/install_name_tool: can't write output file: bazel-out/host/bin/tensorflow/python/_pywrap_tfcompile.so.XXXXXX (No space left on device)
Could the space on the drive for this builder be reviewed, and purged or increased a bit?

cheers Chris


Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator
On Feb 11, 2021, at 04:21, Christopher Jones wrote:

> I am seeing some large builds fail due to drive space issues
>
> https://build.macports.org/builders/ports-11_x86_64-builder/builds/21466/steps/install-port/logs/stdio
> error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/install_name_tool: can't write output file: bazel-out/host/bin/tensorflow/python/_pywrap_tfcompile.so.XXXXXX (No space left on device)
> Could the space on the drive for this build be reviewed and purged / increased a bit ?

Sigh... not again...

Josh reported the same thing one month ago on the infrastructure list. At the time, I discovered that Xcode had created two separate copies of a CoreSimulator directory, each containing cache files for simulating iOS, iPadOS, watchOS and tvOS environments, none of which we need or asked for, and each copy occupied 9.2GB. At the time, I deleted one of them, leaving us with 25GB free space. I have now deleted the second one, leaving us with 11GB free space and additional space will be freed once the currently running build of py37-tensorflow finishes. But it seems likely that Xcode will silently recreate this enormous cache at some point in the future.
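Before deleting a cache like this it can help to measure it. A minimal Python sketch that totals the apparent size of the regular files under a directory; the CoreSimulator path in it is an assumption for illustration only, since the messages do not say exactly where the copies live, and apparent size can differ somewhat from what `du` reports in allocated blocks.

```python
import os
import stat
from pathlib import Path


def directory_size_bytes(root: str) -> int:
    """Total apparent size of regular files under `root`.

    Symlinks are neither followed nor counted, so nothing is counted twice.
    Entries that disappear or cannot be read during the walk are skipped.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # removed or unreadable while walking
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def as_gb(n_bytes: int) -> str:
    """Format bytes as decimal gigabytes (1GB = 10**9 bytes)."""
    return f"{n_bytes / 1e9:.1f}GB"


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the simulator caches actually are.
    cache = Path("/Library/Developer/CoreSimulator")
    if cache.is_dir():
        print(cache, as_gb(directory_size_bytes(str(cache))))
```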



Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator
On Feb 11, 2021, at 18:45, Ryan Schmidt wrote:

> Sigh... not again...
>
> Josh reported the same thing one month ago on the infrastructure list. At the time, I discovered that Xcode had created two separate copies of a CoreSimulator directory, each containing cache files for simulating iOS, iPadOS, watchOS and tvOS environments, none of which we need or asked for, and each copy occupied 9.2GB. At the time, I deleted one of them, leaving us with 25GB free space. I have now deleted the second one, leaving us with 11GB free space and additional space will be freed once the currently running build of py37-tensorflow finishes. But it seems likely that Xcode will silently recreate this enormous cache at some point in the future.

Looks like the copy of the CoreSimulator directory that I had deleted on January 18 got recreated on January 25. I have deleted it again. We are now at 20GB free.


Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator
On Feb 11, 2021, at 19:19, Ryan Schmidt wrote:

> Looks like the copy of the CoreSimulator directory that I had deleted on January 18 got recreated on January 25. I have deleted it again. We are now at 20GB free.

Apple helpfully recreated the CoreSimulator directory again today. I'm not going to delete it again unless we have a plan for how to prevent it from being recreated, to avoid unnecessary wear and tear on the SSD, since we've already destroyed three SSDs in these servers. We now have 16GB free. I have another change planned to mpbb to free up some more space.


Re: 11_x86 build - build failures due to 'no space on device'

Christopher Jones


> On 13 Feb 2021, at 9:06 am, Ryan Schmidt wrote:
>
> Apple helpfully recreated the CoreSimulator directory again today. I'm not going to delete it again unless we have a plan for how to prevent it from being recreated, to avoid unnecessary wear and tear on the SSD, since we've already destroyed three SSDs in these servers. 16 GB free. I have another change planned to mpbb to free up some more space.

Just a thought, but maybe if you delete it and create an empty stub file in its place, that will block whatever is recreating it?
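This idea relies on the fact that nothing can create a directory at a path already occupied by a regular file. A minimal Python sketch of the trick, demonstrated on a throwaway temporary directory; the function names are hypothetical, and it says nothing about how Xcode itself would react to finding a file where it expects a directory.

```python
import os
import shutil


def replace_dir_with_stub(path: str) -> None:
    """Delete the directory at `path` and leave an empty regular file there.

    Any later attempt to create a directory with that name (os.mkdir,
    `mkdir -p`, and so on) then fails instead of silently succeeding.
    """
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    # "x" mode creates the file and refuses to overwrite anything already there.
    with open(path, "x"):
        pass


def try_recreate(path: str) -> bool:
    """Simulate a tool recreating its cache tree; True if that succeeded."""
    try:
        os.makedirs(os.path.join(path, "Caches"), exist_ok=True)
        return True
    except OSError:  # FileExistsError / NotADirectoryError because of the stub
        return False
```

Unlike a permissions-based block, a stub file also stops a process running as root, since the failure comes from the path type rather than from mode bits.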

Chris

Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator


On Feb 13, 2021, at 04:21, Chris Jones wrote:

> On 13 Feb 2021, at 9:06 am, Ryan Schmidt wrote:
>
>> Apple helpfully recreated the CoreSimulator directory again today. I'm not going to delete it again unless we have a plan for how to prevent it from being recreated, to avoid unnecessary wear and tear on the SSD, since we've already destroyed three SSDs in these servers. 16 GB free. I have another change planned to mpbb to free up some more space.
>
> Just a though but maybe if you delete it and create a stub empty file in its place, will block whatever is recreating it ?

Maybe, but I don't want to do experiments on the buildbot machines. I assume Xcode is creating and populating this directory because it needs it for something, and I guess that preventing its creation by brute force might cause Xcode to fail. It would be nice to know why Xcode thinks it should create this, and to figure out how to tell Xcode not to do it, ideally how to tell Xcode never to do anything at all with simulators. This would be useful for older systems as well. The gigantic CoreSimulator directory is a new problem in Big Sur, but inexplicable warnings and errors about simulators in Xcode build logs have been an annoyance for years. We only build macOS software in MacPorts, so we never have any need for simulators.



Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator
On Feb 11, 2021, at 04:21, Christopher Jones wrote:

> I am seeing some large builds fail due to drive space issues
>
> https://build.macports.org/builders/ports-11_x86_64-builder/builds/21466/steps/install-port/logs/stdio
> error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/install_name_tool: can't write output file: bazel-out/host/bin/tensorflow/python/_pywrap_tfcompile.so.XXXXXX (No space left on device)
> Could the space on the drive for this build be reviewed and purged / increased a bit ?
>
> cheers Chris

I did make an additional change to mpbb to delete more unneeded ports:

https://trac.macports.org/ticket/57464#comment:12

And after updating to macOS 11.2.3 and Xcode 12.4 I deleted the simulator directories again and I don't think they've come back.

We have 26GB free at the moment. So let's let the next builds of py-tensorflow* run without interrupting them and see if they complete now.
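Because a build that runs out of space fails late and wastes builder time, one option is to check free space before starting a large build. A minimal Python sketch using the standard library's `shutil.disk_usage`; the 20GB threshold and the function names are assumptions for illustration, not something mpbb is known to do.

```python
import shutil


def free_gb(path: str = "/") -> float:
    """Free space available on the volume containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def enough_space_for_build(required_gb: float, path: str = "/") -> bool:
    """True if at least `required_gb` is free on the volume holding `path`."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    needed = 20.0  # hypothetical threshold in GB; not a measured requirement
    have = free_gb("/")
    if enough_space_for_build(needed):
        print(f"{have:.1f}GB free, OK to start the build")
    else:
        print(f"only {have:.1f}GB free, {needed:.0f}GB wanted; skipping the build")
```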


Re: 11_x86 build - build failures due to 'no space on device'

Christopher Jones
Hi,

Great, thanks. As it happens, I’ve been working on the Bazel builds a bit and discovered an option to disable local build caching. To quote the manual:

--[no]use_action_cache

This option is enabled by default. If disabled, Bazel will not use its local action cache. Disabling the local action cache saves memory and disk space for clean builds, but will make incremental builds slower.

Incremental builds are not a concern for MacPorts: builds always start afresh with an empty cache, so leaving this option enabled does not help us at all and just uses resources. I will therefore turn it off.
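For illustration, a minimal Python sketch of assembling a Bazel command line with the local action cache disabled. `--nouse_action_cache` is the negated form of the `--[no]use_action_cache` option quoted above; the helper name and target label are placeholders, and this is not how the MacPorts Portfiles actually pass options to Bazel.

```python
from __future__ import annotations

import shlex
from typing import Sequence


def bazel_build_argv(targets: Sequence[str], use_action_cache: bool = False,
                     extra_opts: Sequence[str] = ()) -> list[str]:
    """Build an argv list for `bazel build`.

    Bazel boolean options are negated with a "no" prefix, so disabling the
    local action cache is spelled --nouse_action_cache.
    """
    cache_flag = "--use_action_cache" if use_action_cache else "--nouse_action_cache"
    return ["bazel", "build", cache_flag, *extra_opts, *targets]


if __name__ == "__main__":
    # Hypothetical target label; pass the list to subprocess.run() to execute it.
    print(shlex.join(bazel_build_argv(["//example:target"])))
```

Returning an argv list rather than a shell string avoids quoting problems when the list is handed to `subprocess.run`.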

Chris


On 23 Mar 2021, at 4:04 pm, Ryan Schmidt wrote:

> I did make an additional change to mpbb to delete more unneeded ports:
>
> https://trac.macports.org/ticket/57464#comment:12
>
> And after updating to macOS 11.2.3 and Xcode 12.4 I deleted the simulator directories again and I don't think they've come back.
>
> We have 26GB free at the moment. So let's let the next builds of py-tensorflow* run without interrupting them and see if they complete now.




Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator
On Mar 23, 2021, at 11:23, Christopher Jones wrote:

> As it happens I’ve been working on the bazel builds a bit, and discovered an option to disable local build caching. To quote the manual
>
> --[no]use_action_cache
>
> This option is enabled by default. If disabled, Bazel will not use its local action cache. Disabling the local action cache saves memory and disk space for clean builds, but will make incremental builds slower.
>
> As the ‘incremental’ build option is not an issue for MacPorts, as builds are always started afresh with an empty cache, enabling this does not help us at all, and just uses resources. So I will turn this off.

Thanks, that sounds promising.


Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator


On Mar 23, 2021, at 11:04, Ryan Schmidt wrote:

> I did make an additional change to mpbb to delete more unneeded ports:
>
> https://trac.macports.org/ticket/57464#comment:12

This change uninstalled more ports than I intended, so many ports had to be reinstalled frequently, which slowed down builds and created a backlog. I fixed the problem, which means more ports will now remain installed, taking a little more disk space again.

https://trac.macports.org/ticket/57464#comment:16


> And after updating to macOS 11.2.3 and Xcode 12.4 I deleted the simulator directories again and I don't think they've come back.
>
> We have 26GB free at the moment. So let's let the next builds of py-tensorflow* run without interrupting them and see if they complete now.

Both copies of the simulator caches came back again. I filed FB9072613 with Apple about this. We were down to 3GB free disk space which isn't a good place to be. I deleted the caches again and marked the dyld directories chmod 000. Let's see if that prevents Xcode from recreating the caches, hopefully without causing error messages that cause builds to fail. We now have 21GB free.
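The chmod 000 approach strips every read, write and search permission bit from a directory. A minimal Python sketch, exercised on a temporary directory; the function names are hypothetical and the messages do not name the exact dyld directories. One caveat: permission bits are not enforced against root, so this only blocks recreation if whatever writes the caches does not run as root.

```python
import os
import stat


def lock_directory(path: str) -> int:
    """Create `path` if needed and remove all permission bits (like `chmod 000`).

    Returns the previous permission bits so they can be restored later.
    """
    os.makedirs(path, exist_ok=True)
    previous = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, 0o000)
    return previous


def unlock_directory(path: str, mode: int = 0o755) -> None:
    """Restore permissions, e.g. before inspecting or deleting the directory."""
    os.chmod(path, mode)
```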



Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator


On Apr 8, 2021, at 07:25, Ryan Schmidt wrote:

> We now have 21GB free.

31GB free after rebooting. For some reason fseventsd was taking 14GB of memory. The VM only has 8GB of real memory so a lot of swap was being used. This is probably also the explanation for why that builder has gotten a larger backlog of builds lately.


Re: 11_x86 build - build failures due to 'no space on device'

Nils Breunese
Ryan Schmidt wrote:

> Both copies of the simulator caches came back again. I filed FB9072613 with Apple about this. We were down to 3GB free disk space which isn't a good place to be. I deleted the caches again and marked the dyld directories chmod 000. Let's see if that prevents Xcode from recreating the caches, hopefully without causing error messages that cause builds to fail. We now have 21GB free.

I recently learned that ‘xcrun simctl delete unavailable’ and ‘xcrun simctl delete all’ exist. Those commands can be used to reclaim disk space in use by Xcode simulators. Not sure if they could be of use in this scenario.

Nils.

Re: 11_x86 build - build failures due to 'no space on device'

Jason Liu
> For some reason fseventsd was taking 14GB of memory.

Not really surprising, for a machine being used as a builder. The constant creation and deletion of files when compiling software is bound to be generating a ton of file system events.

-- 
Jason Liu


On Thu, Apr 8, 2021 at 1:58 PM Ryan Schmidt wrote:

> On Apr 8, 2021, at 07:25, Ryan Schmidt wrote:
>
>> We now have 21GB free.
>
> 31GB free after rebooting. For some reason fseventsd was taking 14GB of memory. The VM only has 8GB of real memory so a lot of swap was being used. This is probably also the explanation for why that builder has gotten a larger backlog of builds lately.


Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator
On Apr 8, 2021, at 13:05, Nils Breunese wrote:

> I recently learned that 'xcrun simctl delete unavailable’ and 'xcrun simctl delete all’ exist. Those commands can be used to reclaim disk space in use by Xcode simulators. Not sure if they could be of use in this scenario.

I haven't tried that. I not only want to delete them; I also want to prevent them from being recreated in the future, and I don't know whether those commands would accomplish that. If my earlier attempt of making the dyld directories unreadable and unwritable does not work, I may look into your suggestion.


Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator
On Apr 8, 2021, at 16:04, Jason Liu wrote:

>> For some reason fseventsd was taking 14GB of memory.
>
> Not really surprising, for a machine being used as a builder. The constant creation and deletion of files when compiling software is bound to be generating a ton of file system events.

It's completely surprising and unacceptable for an OS background process to suddenly take 14 *GB* of memory, and Apple will be receiving a bug report from me about it.

Being a build machine is not relevant. None of the other build machines I run for MacPorts, from 10.5 through 10.15, have ever exhibited this problem. Only the macOS 11 machine has, and then not all the time. Right now, the macOS 11 machine is busy building and its fseventsd process is only taking 13 *MB* of memory.

I believe I have observed this increased fseventsd memory usage on macOS 11 before, when I started a Time Machine backup. I don't want to start a backup now because, if it causes the problem again, it might affect the currently running builds. When the currently queued builds are finished, or in a few days, I'll take the builder offline and do a backup, and if the issue surfaces again, I'll file the bug report.


Re: 11_x86 build - build failures due to 'no space on device'

Christopher Jones


> On 9 Apr 2021, at 7:20 pm, Ryan Schmidt wrote:
>
> On Apr 8, 2021, at 13:05, Nils Breunese wrote:
>
>> I recently learned that 'xcrun simctl delete unavailable’ and 'xcrun simctl delete all’ exist. Those commands can be used to reclaim disk space in use by Xcode simulators. Not sure if they could be of use in this scenario.
>
> I haven't tried that. I not only want to delete them, I also want them not to get recreated in the future. I don't know whether those commands would accomplish that. If my preceding attempt of making the dyld directories unreadable and unwritable does not work, I may look into your suggestion.

Maybe these commands could be run automatically and periodically as part of the buildbot setup?
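This suggestion could be sketched as a small Python wrapper that a scheduled buildbot step (or a launchd job) might call. The wrapper, its `dry_run` default and any scheduling around it are assumptions; only the two `xcrun simctl delete` invocations come from the thread, and the thread does not establish whether they remove the large caches in question or stop them from returning.

```python
import shutil
import subprocess
from typing import List

# The two scopes mentioned in the thread: `xcrun simctl delete unavailable`
# and `xcrun simctl delete all`.
VALID_SCOPES = ("unavailable", "all")


def simctl_delete(scope: str = "unavailable", dry_run: bool = True) -> List[str]:
    """Build, and unless dry_run is set, run `xcrun simctl delete <scope>`.

    Returns the argv that was (or would have been) executed. Raises
    ValueError for an unknown scope, FileNotFoundError if xcrun is missing,
    and subprocess.CalledProcessError if the command exits non-zero.
    """
    if scope not in VALID_SCOPES:
        raise ValueError(f"unsupported scope {scope!r}; expected one of {VALID_SCOPES}")
    argv = ["xcrun", "simctl", "delete", scope]
    if not dry_run:
        if shutil.which("xcrun") is None:
            raise FileNotFoundError("xcrun not found; macOS developer tools are required")
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # Dry run by default: only show what a scheduled cleanup step would execute.
    print(" ".join(simctl_delete("unavailable")))
```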


Re: 11_x86 build - build failures due to 'no space on device'

ryandesign2
Administrator


On Apr 9, 2021, at 14:10, Chris Jones wrote:

> On 9 Apr 2021, at 7:20 pm, Ryan Schmidt wrote:
>
>> On Apr 8, 2021, at 13:05, Nils Breunese wrote:
>>
>>> I recently learned that 'xcrun simctl delete unavailable’ and 'xcrun simctl delete all’ exist. Those commands can be used to reclaim disk space in use by Xcode simulators. Not sure if they could be of use in this scenario.
>>
>> I haven't tried that. I not only want to delete them, I also want them not to get recreated in the future. I don't know whether those commands would accomplish that. If my preceding attempt of making the dyld directories unreadable and unwritable does not work, I may look into your suggestion.
>
> Maybe these commands could be run automatically, periodically, as part of the buildbot setup ?

I don't want the wear and tear on the SSDs resulting from constantly re-writing 18.5 GB of unwanted data. I want to prevent Xcode from doing that. I am hopeful that the steps I have already taken will prevent that. If not, I will investigate other possibilities.