Everyone has experienced the abrupt freeze that sometimes happens in VRChat when an avatar loads, often lasting multiple seconds. I have been doing some testing recently, and I believe it is primarily caused by audio assets being decompressed on the main thread.
The default load type for an audio asset in Unity is "Decompress on Load". For VRChat, this means the game has to fully decompress the audio asset as soon as an avatar loads, and keep it in memory so that it can be played at any time. The problem is that the decompression happens on the main thread, so everything else that runs on the main thread is stalled until the decompression has finished. With longer, larger audio files this can cause a considerably long freeze when an avatar loads in.
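To illustrate the stall described above, here is a minimal Python sketch of a toy frame loop. It is not Unity or VRChat code: `fake_decode` and `longest_frame` are made-up names, and the "decode" is just a sleep standing in for decompression. It only shows that the same amount of work produces one very long frame when it runs inline on the loop thread, and no long frame when it is handed to a worker thread.

```python
import threading
import time


def fake_decode(seconds):
    """Stand-in for decompressing one audio clip; sleeps instead of decoding."""
    time.sleep(seconds)


def longest_frame(decode_seconds, in_background, frames=30, frame_work=0.005):
    """Run a toy frame loop and return the longest frame time in seconds.

    The avatar "loads" on frame 5. In the blocking case the decode runs
    inline on the loop thread; otherwise it is handed to a worker thread.
    """
    worst = 0.0
    worker = None
    for frame in range(frames):
        start = time.perf_counter()
        if frame == 5:
            if in_background:
                worker = threading.Thread(target=fake_decode, args=(decode_seconds,))
                worker.start()
            else:
                fake_decode(decode_seconds)
        time.sleep(frame_work)  # the rest of the frame's work
        worst = max(worst, time.perf_counter() - start)
    if worker is not None:
        worker.join()
    return worst


blocking = longest_frame(0.3, in_background=False)
background = longest_frame(0.3, in_background=True)
print(f"decode on main thread: longest frame {blocking * 1000:.0f} ms")
print(f"decode on worker:      longest frame {background * 1000:.0f} ms")
```

In the blocking case the longest frame is at least as long as the whole decode, which is the freeze you feel in game.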
Unity has a built-in option for any audio asset called "Load in Background", which according to Unity's documentation loads the audio asset on a separate thread instead of the main one. This setting, however, currently doesn't work in VRChat and actually causes your audio source to not function at all. I'm also not sure it would be the best option, because if an audio source requests to play the asset before it has been decompressed, the request is queued and has to wait until decompression is finished. That would not really be ideal for the load-in sounds that are present on many avatars.
Another option is to change the audio asset Load Type to "Compressed in Memory". This load type currently works as expected in VRChat. Loading the audio no longer stalls the main thread, because the audio stays compressed and is decompressed on the mixer thread as it plays. While this fixes the freezing problem, it seems like it could create more CPU overhead in scenarios where many audio assets are being decompressed at the same time. Large scale testing would be needed to see if this setting is viable for VRChat.
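For a rough sense of the memory side of this trade, here is a back-of-the-envelope sketch in Python. The numbers are assumptions for illustration, not measurements from VRChat: it assumes decompressed audio is held as 16-bit stereo PCM at 44.1 kHz, and that the compressed clip averages 128 kbps. `pcm_bytes` and `compressed_bytes` are made-up helper names.

```python
def pcm_bytes(seconds, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Size of a fully decompressed clip, assuming 16-bit stereo PCM."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


def compressed_bytes(seconds, bitrate_kbps=128):
    """Size of the same clip kept compressed, assuming a constant bitrate."""
    return int(seconds * bitrate_kbps * 1000 / 8)


clip_seconds = 180  # a three minute song on an avatar
decoded = pcm_bytes(clip_seconds)
packed = compressed_bytes(clip_seconds)
print(f"decompressed: {decoded / 1_000_000:.1f} MB")
print(f"compressed:   {packed / 1_000_000:.1f} MB")
print(f"ratio:        {decoded / packed:.1f}x")
```

Under these assumptions the decompressed clip is roughly ten times larger, and the cost of keeping it compressed is the decode work that has to happen every time it plays.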
There is a third load type called "Streaming" that seems similar to Compressed in Memory, but there is not a ton of info on the wiki and this option does not currently work in VRChat. It looks like it would increase disk activity, since it incrementally reads the audio from the disk and decodes it on the fly on the streaming thread. I would guess it has similar CPU usage to Compressed in Memory. I'm not familiar with this option, so I'm unsure about any other aspects of it.
So if this issue is looked into, I think there are three ways to go about fixing it:
Solution A)
Force Load in Background on for any audio asset found on an avatar. To avoid unwanted pauses for load-in audio, maybe a small script could be added to an audio source to tell the SDK that the user intends that sound to play immediately upon load, so that Load in Background is not forced on for the audio asset used in that audio source. A problem I foresee with this is that it would probably retroactively break synchronization of load-in audio with animations on old avatars, which isn't great. It would also add another layer of complexity to some aspects of avatar creation, which may not be desirable.
Solution B)
Force all audio assets found on an avatar to have a load type of Compressed in Memory. This has the upside of not retroactively breaking anything for anyone, which is good. It also may reduce the overall memory usage of audio assets. However, this could potentially end up adding a significant amount of CPU overhead if many sounds are being decompressed simultaneously. As stated earlier, large scale testing would need to be done to measure the overall impact on CPU usage.
Solution C)
Force Streaming load type for all audio assets found on an avatar. As I said earlier, I do not know much about how this load type works. In theory it seems like it could be a solution. Might be worth testing, at least.
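To make the three options easier to compare, here is a small Python sketch of what each one would force onto an avatar's audio clips. Everything in it is hypothetical: `ClipSettings`, `plays_on_load` and `apply_solution` are names I made up for illustration, and none of this is a real Unity or VRChat SDK API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical model of one avatar audio clip's import settings."""
    name: str
    load_type: str = "Decompress on Load"  # Unity's default load type
    load_in_background: bool = False
    plays_on_load: bool = False  # hypothetical marker script from Solution A


def apply_solution(clip, solution):
    """Return the settings a hypothetical SDK pass would force for `clip`."""
    if solution == "A":
        # Background-load everything except clips marked as load-in sounds.
        return replace(clip, load_in_background=not clip.plays_on_load)
    if solution == "B":
        return replace(clip, load_type="Compressed in Memory")
    if solution == "C":
        return replace(clip, load_type="Streaming")
    raise ValueError(f"unknown solution: {solution!r}")


clips = [ClipSettings("intro_jingle", plays_on_load=True), ClipSettings("long_song")]
for solution in "ABC":
    for clip in clips:
        forced = apply_solution(clip, solution)
        print(solution, forced.name, forced.load_type, forced.load_in_background)
```

Note that only Solution A needs the extra marker from the avatar creator. B and C apply the same setting to every clip.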
I can personally vouch for Solution B. I have been using the Compressed in Memory load type for audio assets on an avatar with a lot of audio, and it has completely removed the freeze upon load. I also haven't noticed any weirdness with the audio playback; it seems to work as it should. Before switching to this load type, the avatar in question would cause a chunky ~4 second freeze upon loading on my system.
A quick note before I finish: I was in a very active public instance recently, and with everyone joining and swapping avatars I probably spent roughly 15-20% of my time frozen. Definitely not an ideal situation.
This may be an oversimplification, but I am just trying to suggest what I believe are potential solutions to a pretty nasty problem that everyone experiences. I would like to see these solutions explored at least (if they haven't been already), even if the final decision is made to keep audio asset loading as-is.
-------TL;DR-------
Audio asset decompression is the primary factor causing freezes upon loading avatars. Please change the way audio assets are loaded to help mitigate this!