Mix submission requirements for mastering
If you are about to submit audio for mastering, please read the bullet points below.
Session Mixes Submission Requirements:
• All audio files should be lossless, at 32 bit or 32 bit float and the same sample rate as the original project (24 bit if this is not possible).
• Use interleaved or non-interleaved .wav or .aif files for maximum compatibility.
• Aim for a maximum peak between -6 and -3dBFS where possible.
• No master bus processing or limiting for perceived loudness. Supply an additional master bus version if this is vital to the mix sound balance overall.
• Supply any ‘limited’ version used in the mix sign off process if this is different to the main mix master.
• Leave a minimum 3 second gap before and after the audio on the file. Do not hard-edit or top/tail.
• Do not apply fade ins/outs; supply a separate faded version which can be ‘fade matched’ after mastering.
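To illustrate the peak-level bullet above, here is a minimal Python sketch (not from the book) that checks whether a bounce’s peak sits in the requested -6 to -3dBFS window. The function names and sample list are hypothetical; it assumes audio as floating point samples normalised to ±1.0.

```python
import math

def peak_dbfs(samples):
    """Peak sample level in dBFS for floating point samples in [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)

def in_submission_window(samples, lo=-6.0, hi=-3.0):
    """True if the peak sits inside the requested -6 to -3 dBFS window."""
    return lo <= peak_dbfs(samples) <= hi

print(round(peak_dbfs([0.0, 0.5, -0.25]), 2))  # -6.02, just below the window
```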
If you’re interested in why these requirements are important, please read on…
The following discussion is taken from my Audio Mastering book ‘Separating the science from fiction’, chapter 4, ‘Looking to the mix’.
Available at www.routledge.com/9781032359021
Resolution:
In this modern digital construct, most mix projects are completed in a DAW and likely to be running at 64 bit in the mix bus. Even if the mix is external in analogue, or a combination of analogue inserts and digital pathing, the audio will eventually come back to a DAW via its analogue to digital converters (ADCs). This means the higher the mix export resolution from the DAW, the more of the original mix quality there is to master with. But equally it is important to comprehend the source system. A DAW project will be playing at the fixed sample rate selected in the initial creation of a project. This is likely to be set based on the interfacing and available digital signal processing (DSP). Remember, doubling the sample rate doubles the DSP load, i.e., halves the number of potential plugins and the track count available.
In this regard, it is best to request the mix engineer supply the mix at the same sample rate as the project setup and not ‘upsample’ from the rate they have been working at during the project’s construction. This avoids potential errors, as some DAW project plugins do not perform correctly when adjusted in rate and can deliver incorrect outcomes without being reloaded as a new insert. This is aside from the computer’s resources being at least halved by upsampling, leading to DSP overloads and disk read errors as resources run low. This is different to upsampling a file that is already rendered, as another set of conditions apply and the quality and method of the conversion used are crucial. Hence this is best left as a decision during the mastering process and not the mix. Equally, rendering at a higher sample rate in the bounce settings is not going to capture any more data; it is just resampled at the output, which again could cause errors. I will cover these parameters in later chapters. For most post-mastering output formats now, aside from compact disc (CD) audio, the delivery will be at 48kHz or 96kHz, making it better for the mix engineer to work at these scalable rates at the start of a project where possible. These formats scale to video media and HD audio ingestion without conversion. It is better to scale down to CD quality from the outcome with the highest quality sample rate conversion software. Some mastering engineers prefer to do this in the analogue transfer path, sending at one rate and recording back in on a separate system with its own independent clock. This scaling should not take place in the DAW mix bounce. If the engineer has worked at 44.1kHz, it is best to stick with that, but I suggest they think about more appropriate rates at the start of future projects.
Higher rates are possible at 192kHz or faster, looking towards Digital eXtreme Definition (DXD) processing systems such as Pyramix. The best quality clocks, and sometimes dedicated hardware, are required to run at these rates. But just because a piece of equipment says it can do something does not mean it can do it well all the time. Rigorous testing is needed to be sure there are no negatives to the intended positive of a higher resolution, which in principle is always a good thing. There have been many transitions from lower resolution systems to higher in the past, just in my time working as an audio engineer. Testing and development will eventually make these transitions happen again in the future.
In terms of bit depth, the higher the better. The bigger the bit depth, the more of the original master bus of the DAW will be captured. As most are now working at 64 bit, 32 bit floating point is ideal as a standard for export in this case. This formatting means the capture will contain all of the mix bus, and the file will also retain any overshoot above 0dBFS. To put this in perspective, 24 bit at 6dB per bit gives a possible dynamic range of 144dB and 16 bit gives 96dB, but 32 bit float has 1528dB of dynamic range, of which over 700dB is above 0dBFS. It is almost impossible to clip when bouncing the mix for mastering. If a 32 bit file looks clipped, a normalise function will readjust the peak back to 0dBFS and you will see there is no clipping present in the file.
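As a quick check of the figures above, the ~6dB-per-bit rule can be worked through in a few lines of Python (the function name is mine, not the book’s; the exact figure per bit is 20·log₁₀(2) ≈ 6.02dB):

```python
import math

def fixed_point_range_db(bits, db_per_bit=6.0):
    """Dynamic range of a fixed-point word, using the ~6 dB-per-bit rule of thumb."""
    return bits * db_per_bit

print(fixed_point_range_db(16))  # 96.0
print(fixed_point_range_db(24))  # 144.0

# Using the exact per-bit figure instead of the rule of thumb:
print(round(24 * 20 * math.log10(2), 1))  # 144.5
```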
If the source is digital but the mix is analogue, the converter used in the transitions between domains governs the capture bit depth. For the majority of commercial ADCs, this will be 24 bit, which is a 144dB dynamic range. It is more than we can hear, but care should be taken to use all the available bits. If aiming for a peak between -3dBFS and -6dBFS, all the possible dynamic range will be recorded while also avoiding potential clipping at 0dBFS. Some meters are not the best at showing clip level, and some converters start to soft clip at -3dBFS if they have a limit function. Remember that recording at a lower level and normalising in the DAW does not put back in what was never captured, and recording at 32 bit with a 24 bit converter is still only 24 bits in a 32 bit carrier. The DAW bit depth does not govern the captured path, the ADC does. But changing the recorded file with processing is when the higher bit depth comes into play. Even in a simple reduction of volume, it means all the original bits are still intact as there is dynamic range left to turn down into. If the DAW mix bus resolution was only 24 bits, the amount the audio was reduced at peak would be cut off at the bottom of the resolution. This is called truncation and should be avoided. In a modern DAW context, the mix bus is working at 64 bit, meaning a recorded 24 bit audio file could be reduced by 144dB, rendered at 32 bit float in an internal bounce and turned back up 144dB, and there would be no loss in quality from the original file as the headroom is easily available. To avoid losing any of this quality gained from the mix, it should be exported at 32 bit float. The way to deal with bit depth reduction to formats such as CD audio is discussed in Chapter 16, ‘Restrictions of delivery formats’.
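The ‘reduce by 144dB and restore’ thought experiment above can be sketched in Python. To keep the arithmetic exact, this sketch (my own, not from the book) uses a power-of-two gain of 2⁻²⁴ (about -144.5dB, close to the chapter’s figure), since scaling by a power of two only changes a float’s exponent; `to_float32` stands in for a 32 bit float bounce:

```python
import struct

def to_float32(x):
    """Round a Python float to 32-bit float precision (simulating a 32 bit float render)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# A full-scale positive 24 bit sample code, normalised to [-1, 1).
sample = 8_388_607 / 8_388_608

# Attenuate by 2**-24 (~ -144.5 dB), "bounce" at 32 bit float, then boost back up.
attenuated = to_float32(sample * 2 ** -24)
restored = to_float32(attenuated * 2 ** 24)

print(restored == sample)  # True: the headroom absorbed the move with no loss
```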
An aspect worth exploring at this point regarding resolution and truncation is offline normalisation. If you are recording a file into a normal digital system from analogue, the converter will be working at 24 bit. If the recorded level peak was -12dBFS, with each bit at 6dB, the file is effectively 22 bit; recorded at -24dBFS peak it is 20 bit; at -48dBFS peak, 16 bit; and so on. The capture level is critical to the quality. This simply cannot be put back in by normalising the files to 0dBFS because it was not recorded – there is nothing there. In applying normalisation to the file, the bottom of the floor is raised, and in doing so quantisation distortion is created between what was its floor and the new floor. With the 16 bit captured example, this is a significant amount and clearly audible. This adds a brittle sound to the audio in the same way truncation does when the bit depth is reduced. This is why normalising offline is the wrong approach when trying to maintain quality. It can be very interesting creatively in sound design to destroy or digitise a source in a similar way to a bit crusher, but this is hardly the sound that would be wanted on any part of our master. But if using the fader to turn up the audio, or the object-based volume to apply an increase, our DAW is working at the much higher resolution of the mix bus, and this quantisation distortion is minimised.
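The quantisation distortion created by offline normalisation can be demonstrated numerically. This sketch (my own, assuming idealised rounding and a test sine) captures the signal 48dB too low into a 16 bit word, normalises it back up, and compares the resulting signal-to-noise ratio against a healthy capture:

```python
import math

def quantize(x, bits=16):
    """Round to the nearest code of a signed fixed-point word (step = 2**-(bits-1))."""
    step = 2.0 ** -(bits - 1)
    return round(x / step) * step

n = 1000
sine = [0.999 * math.sin(2 * math.pi * 7 * i / n) for i in range(n)]

# Captured 48 dB (8 bits) too low, then normalised back up offline.
low = [quantize(s * 2 ** -8) for s in sine]
normalised = [v * 2 ** 8 for v in low]
full = [quantize(s) for s in sine]  # captured at a healthy level

def snr_db(ref, test):
    sig = sum(r * r for r in ref)
    err = sum((r - t) ** 2 for r, t in zip(ref, test))
    return 10 * math.log10(sig / err)

print(round(snr_db(sine, full)))        # ~16 bit quality
print(round(snr_db(sine, normalised)))  # ~8 bit quality: the lost bits never come back
```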
Whilst discussing resolution, it is important to mention Direct Stream Digital (DSD), which is another method of encoding from analogue to digital. DSD is a one bit system; in a simple sense, each sample stored can only have two states, which are read as go up or go down. With a super fast sample rate of 2.8224 MHz, 64 times the rate of CD audio, each time a new sample is taken it just goes up or down in amplitude relative to the last. In many ways it can be thought of as a redrawing of analogue in the digital domain. But the super fast sample rate means the recording cannot be processed by normal DSP and requires conversion to DXD at 24 or 32 bit, whose lower sample rate of 8 times CD audio is 352.8kHz. Higher sample rates come with their own potential problems, which I discuss in Chapter 10, ‘The transfer path’, and I discuss the practicalities of DSD and DXD use in Chapter 16, ‘Restrictions of delivery formats’.
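The rates quoted above are simple multiples of CD audio, which a few lines of arithmetic confirm (the per-channel bit rates are my addition, derived from 1 bit per DSD64 sample versus 24 bits per DXD sample):

```python
CD_RATE = 44_100  # Hz, CD audio sample rate

dsd64_rate = 64 * CD_RATE  # DSD64 sample rate
dxd_rate = 8 * CD_RATE     # DXD sample rate

print(dsd64_rate)  # 2822400 Hz = 2.8224 MHz
print(dxd_rate)    # 352800 Hz = 352.8 kHz

# Per-channel data rates: 1 bit per DSD64 sample vs 24 bits per DXD sample.
dsd64_bits_per_sec = 1 * dsd64_rate   # 2.8224 Mbit/s
dxd24_bits_per_sec = 24 * dxd_rate    # 8.4672 Mbit/s
```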
Top and tail:
If a mix is submitted with a fade in and/or fade out, the relative audio level is not constant during those fade transitions. Examining this from a processing point of view, if the audio fades out, it will fall through the threshold of any dynamic control. If this is a downward process, the likely outcome is a period of hold created in the fade. The material above the threshold will be reduced in gain; when it falls below the threshold, the compression stops, meaning the fade’s fall in level is slowed until the gain reduction amount is passed, and the fade then continues. The fade will now have a hold period inserted in it due to the compression reducing the amplitude and then not reducing past the threshold. The same principle can be applied to static effects such as EQ, as the amount of compound EQ will change relative to the transition level during the fade.
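The hold effect described above can be sketched with a static gain computer – a simplification of a real compressor that ignores attack and release ballistics (the threshold and ratio values are arbitrary examples, not from the book):

```python
def compressor_out_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static downward-compressor gain computer (no attack/release modelled)."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# A linear fade out from -10 dB to -40 dB in 1 dB steps.
fade_in_db = [-10.0 - i for i in range(31)]
fade_out_db = [compressor_out_db(level) for level in fade_in_db]

# Above the threshold the fade falls at 1/ratio of its intended rate (a "hold"),
# then resumes its full rate once it drops below the threshold.
above = fade_out_db[1] - fade_out_db[0]    # -0.25 dB per step
below = fade_out_db[-1] - fade_out_db[-2]  # -1.0 dB per step
print(above, below)
```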
To avoid this, it is best to request the client send a version with and without fades. A master can be made and the fade applied mirroring the client’s original musical intent. This sounds more effective on long reprise fade types, as the whole sound of mastering is maintained while the fade out transitions. This is in contrast to processing with the fade in place, as the impact of the mastering change gets less as the fade goes through transition.
Just as a straightforward observation, the fade would transition through the threshold of the dynamic tool, meaning the processing timing imposed would change. The start-of-fade dynamics would be correct, but by the tail end there is no processing. This is not the desired outcome for a fade where the processing of the RMS content should remain consistent. Equally, if there was an expander enhancing the width in the Sides (S) as the fade reduced, the track would become narrower.
In every regard, it is better to have a mixed version without any main fades; these are always better applied post processing. This does not apply to breakdown ends where the instruments are turning off in turn as part of the music’s closure. But it applies when the mix engineer is fading the master bus to achieve a fade, or applies a post fade to the file to give the client the outcome they intended musically. This is fine for the artist’s ‘limited’ reference, but the fadeless version is best for the mastering chain. It can be faded to match the artist reference afterwards.
It is also worth noting that if the mix engineer can leave an audio header at the start and end of the bounce, they will have captured any possible broadband noise present in the system. This can be used for noise prints if any noise reduction is required, but mainly it means the top and tail of the actual audio will not be cropped when supplying the mix. Remember that when mastering, the level relative to the noise floor could increase by over 20dB. Hence the tails of the audio, which sounded fine, can be cut off where overly zealous editing has been committed to the mix file. The end of the tail will be stepped, but if a lead in/out ‘blank’ is there, and the file is 32 bit float, the audio’s natural fade will be retained below the original audible spectrum.
This potential pull-up is also where noise can come from, especially in the tail of a fade. Sometimes a little noise reduction in the final transition goes a long way to smoothing this outcome post mastering while creating the fade. Noise prints in this regard are not always required, as a dynamic frequency-dependent expander like Waves X-Noise, or an advanced algorithmic tool like Cedar Audio Auto Dehiss, will work very well. This is clearly the best broadband noise reduction I have ever found. For more intrusive sounds such as static or broadband electrical noise (guitar amp buzz), the TC Electronic System 6000 Backdrop noise print algorithm can be very effective. Equally, all the Cedar tools are top of their class in any form of removal of unwanted sonic material.
Source dynamics:
The more overly compressed the source, the less range there is to set the trigger level for any dynamics control. As an example, the mix engineer might make a ‘loud’ version for the artist to listen to the mix in context with other material, because louder sounds better, but equally the artist will be influenced by the loudness difference between their mix and commercial audio in their playlist. Hence this ‘loud’ version works well in helping the artist understand the context and sign off the sound balance in this more realistic, comparable context. But this is not the source required for mastering, though in experience it can be helpful for our context to have a copy, because sometimes our ideal master is less pushed in level than this ‘limited’ version. If everyone has been listening to this limited ‘loud’ mix and our master is delivered quieter, then without a conversation to explain why, an incorrect comparison can be made – louder will sound better in principle.
Primarily from a mastering point of view, if our source is an ‘unlimited’ version, the decision to apply a peak reduction at the start of the chain is now our choice rather than being dictated to us. This is a good outcome; remember that dynamics work from a threshold, and the more its range is restricted, the harder it is to target the required aspect. Less experienced mix engineers may apply master bus processing to make the mix sound more like the outcome they want. This is not a bad thing in principle, but the point of a mix is that you have access to the individual tracks; if something is not compressed enough, compression should be applied to the individual sound or the group bus, i.e., a drum or vocal bus. It is ill-considered to apply it to the master bus. It can seem effective at face value, but this is just skirting over the issues within the mix. If they have applied a multiband and it is triggering in the bass end with a wide variance in gain reduction, they should instead be targeting this element in the individual channels where the sub/bass is present, unmasking the rest of the mix which was over-processed by the master bus multiband. This is apart from the more effective trigger achieved by targeting the original source tracks, alongside better evaluation of the envelope required.
Sometimes a mix is so reliant on the master bus processing that it is pointless to request a mix without it, because the majority of the sound balance is happening on the master bus. In this case, you just have to go with what you have. By undoing the master bus processing, the mix will no longer be coherent, especially in the context of the sound the artist and label have signed off on. There is a big difference between skimming a few peaks off and applying a multiband compressor to your mix bus.
This should not stop mix engineers from using a bus compressor on their master bus if they want to ‘push’ their mix into it. But from experience, a lot of engineers get out of this habit or deliver a ‘clean’ mix alongside, without any processing on the master bus. The mastering engineer in principle should have a better compressor, or one similar to the one used, but with a different perspective to achieve the ‘glue’ being sought by this type of mix bus compression.
Parts:
It is best to request instrumental mix masters alongside the main mixes. One of the core uses for music in the modern era is its role in conjunction with video. This could be a television show, film or advertisement. All this falls into a category called synchronisation of music (sync). This is often directed or managed by a music supervisor on behalf of the media/film production team. Making sure instrumentals have been considered can save the client hassle, time and money down the line, but obviously the creation of extra mix versions is an additional use of time that the mix engineer should be charging for.
The master supplied into this post-production scenario is no different from the other outcomes considered. The music would be ‘mixed’ in ‘post’ with any dialogue, sound effects and so on, often at a much wider dynamic range than the music itself. Especially in a film context, there is no rationale for this to differ from the music the consumer would digest – the music supervisors have selected it because it sounds good: mastered! However, it may require editing to better fit the visual, or just the music without any vocals in sections. An instrumental master is vital to be able to remove the vocals in this post scenario where the producer will want to create differing sequenced events to the original. It is very helpful if these two masters are sample aligned to make that edit process seamless for the post engineers, even if this means the ‘top’ of the file has ‘digital black’ where the vocal intro may have been. Digital black means there is no signal in that part of the audio file. This term can also be used where video has no content.
Having a sample accurate instrumental master also makes for the easy creation of ‘clean’ versions where any potentially offending lyrical content can be easily removed without fuss and a new ‘clean master’ rendered post mastering. The artist may want to supply clean versions edited with new lyrics or effects in the gaps. Though I have found over the years, for the many differing radio/TV outcomes, context is key. It is usually highly dependent on the target audience and demographic. For example, I have had to edit out the words ‘drunk’ and ‘damn’ before for radio play. Equally, editing down solos and breaks for radio is not uncommon, and having the instrumental without any vocal overlaps has made this easier to edit as the music transitions between sections. You could say ‘why not go back to the mix’ – cost first, time second, and third, recalling all these differing paths from mix to mastering means some of it might not recall as the original, making it sound different. Simply put, just have an instrumental. It saves a lot of time and hassle in any scenario.
The mastering engineer’s remit in my view is to have the foresight and awareness of potential issues for a given project. Being on top of any potential future scenarios that may come to pass will put you as an engineer in good standing with your clients. Even in the simplest sense, this means making the client aware they may need instrumental mixes, even if these are not processed at the time of mastering. It is a lot easier to recall a mastering transfer path (something done all the time) than to try to pull a mix project back months later on a different version of the DAW, with older plugins outdated or unavailable, in an updated studio where access time can be limited.
The final aspect where an instrumental version can be useful in the mastering process is the simple separation of the vocals. A straightforward sum/difference matrix will polarity cancel the vocal out of the main mix. This render summed with the instrumental is the same as the main mix, but take care to make sure the polarity remains true to the original. Now any targeted processing required on the vocal or the mix can be applied without affecting the other. I find this can be helpful, though in truth, I would prefer to work with the mix engineer on revisions to get the mix right from the start rather than being this intrusive to the mix balance.
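The sum/difference idea above reduces to simple sample-wise subtraction when the mix and instrumental are sample aligned and at identical gain and polarity. A minimal sketch (my own, using dyadic sample values so the floating point arithmetic is exact):

```python
def derive_vocal(mix, instrumental):
    """Polarity-invert the instrumental against the mix, leaving the vocal.

    Assumes both renders are sample aligned and at identical gain/polarity.
    """
    return [m - i for m, i in zip(mix, instrumental)]

# Dyadic (power-of-two based) values chosen so the arithmetic is exact.
vocal = [0.125, -0.25, 0.5]
instrumental = [0.25, 0.125, -0.125]
mix = [v + i for v, i in zip(vocal, instrumental)]

isolated = derive_vocal(mix, instrumental)
rebuilt = [i + v for i, v in zip(instrumental, isolated)]
print(isolated == vocal and rebuilt == mix)  # True
```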
The next logical step in this approach to changing the mix balance would be to ask for stems.