SOUND POST FAQ: The "Bullets"
I have no idea what you don't know -- although I can make a guess based on questions students often ask. So this guide tries to convey the gist of what you need to know to get a decent mix on your LMU film project. If you've taken the intro Sound Design class, much of this will be familiar.
Since you're probably working under time pressure, I'll try to give you the basic facts in short little "bullets." Details are provided later (just click on the links.)
THESE ARE THE BASIC QUESTIONS WE TRY TO ANSWER:
Plus a question you are unlikely to ask (because it involves some older
analog recorders):
HOW CAN I DO A PULLDOWN FOR SOUND RECORDED ON A DIGITAL RECORDER LIKE THE FOSTEX FR-2?
(New, simpler method for Pro Tools 7.3)
STEP BY STEP PROCEDURE
1. Create a Pro Tools session that is set up for 16 bit, AIFF, 48kHz files.
Make sure that Setups>Disk Allocation is set to the Audio Drive that you
are working from. Do NOT create this session on the Desktop.
2. Import and Convert the FR-2 files from your flash media.
Go to File>Import Audio. (The shortcut is SHIFT-SPLAT-I, where "splat" is
the Command key.)
Navigate to your FR-2 files, select them, and choose to CONVERT them.
(Not “add” them.) Conversion will probably be necessary anyway since the flash
recorders use .wav format and we use AIFF for Final Cut and Pro Tools.
3. Sample Rate Convert on Import.
When you choose “CONVERT”, the files are listed in the Regions
to Import box and the “APPLY SRC” option becomes available. Check the box
for APPLY SRC. Set Conversion Quality to Better or Best.
From the “Source Sample Rate” pulldown menu choose: 48kHz (PullUp/Down.)
From the submenu choose: 47.952 .1% Down (FILM to NTSC).
Click “Done” and import the files to your regions list. You will be prompted
to choose a Destination Folder. Usually this will default to the Audio Files
folder of your Pro Tools session. Be sure to double-check where these
files end up. (You may elect to create a convenient new folder, like “My Project
PulledDownAudio.”)
4. You now have new 48kHz imported files that are “stretched” slightly slower
to correspond with the telecine pull-down, ready for import into Final Cut
Pro.
Should my files be stereo or mono?
In many cases your production sound may be mono sound recorded on two tracks
of a stereo recorder -- in other words, both tracks are absolutely identical.
In that case, your audio just takes up twice as much storage space, ties up
an extra track in FCP, and potentially complicates your FCP editing, all for
no reason.
Problem is, when you import and convert the files from your flash media,
Pro Tools can't tell if your files are stereo or mono, so it "assumes" they
are stereo and imports both the left and right tracks. If you know for sure
that your dailies are in mono (this is where good sound reports become important)
you can toss out either the .L or .R version of the files right after the
import and convert step.
In other cases your production sound person may have recorded different
sound on the two tracks (like a boom mic on track 1 and a lavalier on track
2.) In that case you want to keep both.
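If the sound report is missing or unclear, one way to confirm that a two-track file really is dual mono before tossing out a side is to compare the two channels sample by sample. The sketch below is only an illustration: it assumes a 16-bit stereo .wav file straight off the flash recorder, the function name is hypothetical, and it relies on Python's standard "wave" module rather than anything in Pro Tools.

```python
import struct
import wave


def channels_identical(path):
    """Return True if the left and right channels of a 16-bit stereo WAV match exactly."""
    with wave.open(path, "rb") as wav:
        # Assumption: 16-bit stereo. Other formats are rejected rather than guessed at.
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo file")
        frames = wav.readframes(wav.getnframes())
    # 16-bit WAV data is little-endian signed, interleaved L, R, L, R, ...
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return samples[0::2] == samples[1::2]
```

A True result means either side can be discarded; False means the tracks differ (a boom and a lavalier, for example) and both should be kept.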
Just before I did the Import, I noticed that the import window was showing
my files as being 47.952 kHz. I thought the flash recorder was set up for
48kHz – what gives?
You’re not crazy – and the Pro Tools documentation is a bit vague on what
is going on here -- but it seems as if the pulldown conversion proceeds as
if your original 48 kHz files were going to be played back at 47.952.
In this way you end up with true 48 kHz files that are nonetheless stretched
in length.
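The arithmetic behind this can be sketched as follows. This is an assumption about what the conversion amounts to numerically, not a description of Pro Tools internals: the source samples are treated as if they belonged to a 47.952 kHz file and are then resampled to a true 48 kHz file. The function names are hypothetical.

```python
SESSION_RATE = 48000         # Hz, the true rate of the converted file
ASSUMED_SOURCE_RATE = 47952  # Hz, the pulled-down rate shown in the import window


def converted_sample_count(source_samples):
    """Samples in the new 48 kHz file when the source is treated as 47.952 kHz."""
    return round(source_samples * SESSION_RATE / ASSUMED_SOURCE_RATE)


def duration_seconds(samples, rate=SESSION_RATE):
    """Playing time of a file with this many samples at the given rate."""
    return samples / rate
```

For a ten-minute take (28,800,000 samples at 48 kHz) the converted file comes out about 28,800 samples longer, which at 48 kHz is the extra 0.6 second needed to match the telecined picture.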
WHY ARE WE GOING TO ALL THIS TROUBLE WITH FILE CONVERSION?
If you're shooting on digital video none of this applies, because you either
have single-system sound with picture and sound already in sync, or you have
double-system sound but both picture and sound are recorded at the same frame
rates. But if you're shooting film, digitizing the film via telecine, and
completing your post on a non-linear system, the film you shot on the set
-- the "real time" that you recorded -- is going to get slowed down slightly
during the telecine process. If you filmed for exactly ten minutes on the
set, that footage, after being telecined, will play back slightly slower and
last 10 minutes and 6/10ths of a second. The difference is enough to make dialog
obviously out of sync -- if you sync up the clapper at the start of the take,
the sound will drift noticeably out of sync after just a minute or two.
To avoid this problem, you need to "stretch" the recording, slowing it down
by a corresponding amount, which happens to be .1%. This is the difference
between a theoretical video frame rate of 30 fps and the actual NTSC frame
rate of 29.97 fps.
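To see how quickly the drift builds up, here is a minimal sketch of the numbers quoted above. It assumes the rounded .1% figure and a 29.97 fps video frame rate; the function names are made up for illustration.

```python
PULLDOWN = 0.001   # the .1% slowdown introduced by the telecine
VIDEO_FPS = 29.97  # NTSC frame rate


def telecined_duration(real_seconds):
    """Length of the footage after the telecine slows it by .1%."""
    return real_seconds * (1 + PULLDOWN)


def drift_in_frames(real_seconds):
    """How far un-stretched sound has run ahead of picture, in video frames."""
    drift_seconds = telecined_duration(real_seconds) - real_seconds
    return drift_seconds * VIDEO_FPS
```

With these figures, drift_in_frames(60) comes out to roughly 1.8 frames and drift_in_frames(600) to roughly 18 frames, which is why sync that looks fine at the clapper falls apart a minute or two into the take.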
The problem is that the Fostex FR-2 files want to play back in “real time.”
We need to stretch those files so that they correspond to the "video time"
of the telecined picture.
If you were strictly working within Pro Tools, this would be relatively
simple. Pro Tools defaults to using the Mac's clock reference to determine
the playback speed of all audio files. That is, the session itself "assumes"
a rigid speed based on the sample rate you choose for your session -- in
this case, 48kHz. For instance, if you bring a 44.1 kHz file into the session
without doing a file conversion, it will play back at the wrong speed --
it will occupy less space on the Pro Tools timeline, and play back faster
than its original speed. So one way to achieve a pulldown would be to select
a file and go to the Audio Menu of the Regions List, choose Export Selected
as Files >Sample Rate>48kHz (Pull Up/Down)>48.048. You would then
save these "sample rate tweaked" versions of the files into a different folder.
Now if these files were played back at their correct sample rate of 48.048
they would be exactly the same length as before. But if you import these files
directly into Pro Tools without converting them to "48kHz exactly" during
the import, then they will end up sounding subtly slower and taking up more
space on the timeline -- in other words, they'll be stretched to the proper
pulled down length.
However, Final Cut takes more of a QuickTime Player approach to sound playback,
essentially looking at the file and saying, "So -- you're a 48.048 file and
if played at that speed should last exactly ten minutes. Okay -- we'll adjust
accordingly so that you end up lasting exactly ten minutes." So FCP expects
to deal with files of different sample rates in the same session and adjusts
on the fly. Impressive, but in this case it complicates matters since we want
files that will be stretched to work in FCP.
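The difference between the two playback behaviors comes down to which sample rate is used to turn a sample count into a running time. The sketch below is a simplified model under that assumption, not a description of how either program is actually written; both function names are hypothetical.

```python
def timeline_seconds(sample_count, session_rate=48000):
    """Pro Tools-style playback: every sample is clocked at the session rate."""
    return sample_count / session_rate


def header_seconds(sample_count, header_rate):
    """FCP-style playback: the rate stamped in the file decides the running time."""
    return sample_count / header_rate
```

A ten-second 44.1 kHz file holds 441,000 samples, so timeline_seconds(441000) gives 9.1875 seconds: shorter and faster, as described above. A ten-minute file stamped 48.048 kHz holds 28,828,800 samples; clocked at the session rate it lasts 600.6 seconds (stretched), but played at its stamped rate it lasts exactly 600 seconds (not stretched).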
By using the new and improved SRC options of Pro Tools 7.3, we can end up
with stretched files that really are at 48 kHz, and with a lot fewer steps
in the process.
HOW CAN I DIGITIZE PRODUCTION SOUND FROM NAGRA TAPES?
(The “Old School” method)
This is no longer a widely supported procedure at LMU, so if you elect to
work with a Nagra, see a RECA instructor/engineer to get checked out on the
transfer cart procedures. A transfer cart is available through RECA studio
support on the second floor of the SFTV building.
RESOLVING THE NAGRA TO 59.94 HZ.
If you're shooting on digital video, probably none of this applies, because
you have single-system sound with picture and sound already in sync and both
are already compatible with video frame rates. But if you're shooting film,
digitizing the film via telecine, and completing your post on a non-linear
system, the film you shot on the set -- the "real time" that you recorded
-- is going to get slowed down slightly during the telecine process. If you
filmed for exactly ten minutes on the set, that footage, after being telecined,
will play back slightly slower and last 10 minutes and 6/10ths of a second.
The difference is enough to make dialog obviously out of sync -- if you sync
up the clapper at the start of the take, the sound will drift noticeably out
of sync after just a minute or two.
To avoid this problem, you need to "stretch" the Nagra recording, slowing
it down by a corresponding amount, which happens to be .1%. This is the difference
between a theoretical video frame rate of 30 fps and the actual NTSC frame
rate of 29.97 fps. Likewise, it's the difference between the Nagra's original
sync pulse reference of 60 Hz and 59.94 Hz.
During playback, the Nagra "expects" to see an external reference of 60
Hz to compare to its pre-recorded reference of 60 Hz. It will then make any
tiny adjustments in playback speed that are needed to stay "in perfect pitch"
with this external reference. When we use the 59.94 sync reference box instead,
the Nagra will seem to be playing back a sync pulse that is subtly "sharper"
in pitch than is desired so the correction circuitry will slow the Nagra playback
down very slightly so that it is once again playing "in perfect pitch."
This is best done during the digitizing itself so that the resulting AIFF
files will already be "stretched" to the correct length.
Remember that none of this resolving takes place with a Nagra model 4.2
unless it is playing back with the Selector Switch lever all the way down
in the "speaker icon" mode. (The Nagra will play back audio in the "first
click" position but will not resolve in that mode.) Also note that resolving
and pulldown are not the same thing. If you failed to resolve a Nagra transfer,
doing a pulldown through file conversion tricks could still result in sound
that drifts in sync because the Nagra was essentially "freewheeling" during
playback and not doing any subtle ongoing speed corrections. The pulldown
is an overall speed adjustment that affects an entire take; resolving involves
internal incremental adjustments during a take.
Nagra transfer procedure checklist.
Trouble-shooting the transfer.
I'm playing back the Nagra tape, but I'm not getting any sound into the Mackie or Pro Tools.
First, confirm that you have the Nagra's "Line & Playback" knob turned up so that you're seeing appropriate levels on the Nagra itself. Look on the right hand side of the Nagra and confirm that the output jacks have not been disconnected. On the Mackie, check the black Mute/Alt button above the level knob that controls the Nagra signal to Pro Tools. Make sure it is "down" and not up. If it's already down, check that the Source button is in the correct position (the button second from the top should be "down.") If all that is correct, re-check to see that the Nagra input channel level knob is at the detent "12:00" position and that the Main Mix or Master Output to Pro Tools knob is set likewise. Confirm that the separate Headphone/Speaker level is not turned all the way down. If you still don't see level on the Mackie Mixer or hear sound through the monitor, get some RECA tech support.
I see and hear good sound levels at the Mackie but nothing shows up on the Pro Tools audio meters.
Make sure that your audio track is RECORD ARMED (no levels will read on the meters unless the "R" button is red.) This is the most likely culprit. If the track is already armed and ready to record and you still see no audio on the meters, go to Setups>Hardware Setup and confirm that your input source is "analog" and not "S/PDIF." (S/PDIF is for a digital-to-digital transfer from DAT.)
I'm hearing a loud hum/buzz in the monitor.
Check the three banana plugs on the right hand side of the Nagra to confirm that all are connected correctly. Do not proceed with the transfer unless the problem is corrected; seek tech support.
I'm hearing distortion on my production recordings.
If you're listening to the output of the Mackie mixer, try switching the headphones and monitoring the output of the Nagra directly. (You may need a stereo-mono adapter for the headphones.) If you still hear distortion, it's undoubtedly a problem in the original recording. If the sound from the Nagra is fine but the Mackie output is not, check to see that the mixer input slice and master output are set to the usual detent position of 12:00 and that the loud parts of your dialog are not causing the "+22" clipping indicator on the Mackie's meter to light. If the clip light is coming on, or if most of your levels are in the "yellow" range on the LEDs, try adjusting the Nagra "Line & Playback" knob to a lower setting.
The pilot flag on the Nagra is not indicating sync.
Check the connector to the external 59.94 sync box. And make sure the Nagra playback selector switch is fully down.
I went to back up my session folder but I don't see my audio files in the Audio Files folder.
You may not have had your Disk Allocation set correctly. In that case, Pro Tools has created an identically named session folder on one of the other drives, and that should contain your audio.
About reference levels and how they line up (and when to fudge the settings.)
Why is a -10 dB tone on the Nagra supposed to line up to "-10" on the Mackie, but read "-20" in Pro Tools?
While each device uses decibels on its meters, each device uses "0" dB on the scale to mean something different. On the Nagra, "0" represents a very strong peak -- but one that we can go slightly beyond. We can go 4 dB higher than that with a slight edge of distortion. We can actually go a bit higher still, but the tape itself will become saturated and introduce bad distortion.
The Mackie uses "0" to represent "good and strong" but with quite a bit of safety margin beyond that point before we begin to distort. (About +20 dB above and beyond.) Pro Tools is a digital recorder and uses "0" to mean the absolute maximum. Digital clipping occurs beyond that point. Another complication is that the Mackie mixer output is a pro line level output and the input to Pro Tools is a consumer line level and we don't want to overload the Pro Tools inputs. (Hence the slightly wimpy level on the Mackie meters.)
To deal with these different devices and scales, we have to choose a smart reference point, and "headroom" is the key factor. The Nagra uses a reference level of -10 dB to indicate "good and strong" -- with about 14 dB of headroom or safety margin above that point. We can line that tone up to approximately "-10" on the Mackie and our loudest sounds should only go up to about +4 or so on the Mackie, which is well within the headroom of the Mackie itself. So far, so good. The next step is to choose a point of reference on our digital recording. Logically, we might choose -14 dBFS for our Pro Tools setting. This would work out fine as long as our Nagra recordings stayed in an ideal range.
But if we've botched our Nagra recording slightly and over-recorded the tape, we could end up with peaks somewhat higher, maybe 18 dB higher than our Nagra reference level. This would mean that loud sounds that are already somewhat distorted on the original tape would go into digital clipping on Pro Tools and end up sounding still worse. So we allow that extra four dB of safety margin to help prevent digital clipping. (Since digital is a low-noise medium, it's better to play safe and let the levels be a little on the low side rather than risk digital clipping.)
When should we make an exception to the recommended line up procedure?
Despite our best efforts to build in enough safety margin, it's possible that you might have extremely loud recordings where your peaks might still go into digital clipping. In that case, back off the level on the Nagra's "Line & Playback" knob.
More likely is a case where your production sound is under-recorded at a consistently low level. In other words, your Nagra line up tone is fine but the production sound itself is on the wimpy side relative to the tone. How will you know? If you follow the recommended line up procedure and watch the Nagra modulometer, you should find that the peaks for well-recorded dialog tracks will end up bouncing around more or less straight up -- from about "-10" to "-6." A shout or door slam might deflect to "0" or slightly beyond. On the Pro Tools meters, this might translate to peaks in the -14 to -10 range. [If you're thinking that this seems a little "hot", especially since we used -20 dBFS as a reference, the difference is largely due to the faster response time of the Pro Tools meters compared to the Nagra.]
On the other hand, if your loudest production sound is peaking well below -10 dB on the Nagra and you're seeing similarly low levels in Pro Tools, you might wish to do a slight boost during this transfer stage. Since the Mackie Mixer and Pro Tools are aligned so that "0" on the Mackie produces approximately "-18" on Pro Tools, it's best to do any fudging with the Nagra "Line & Playback" knob. Try to do this in an intelligent, non-arbitrary way. For instance, you may decide that a 4 dB boost overall is desirable. Go back to your Nagra line up tone and reset the Nagra playback so that the tone reads "-6" on the Nagra instead of the usual "-10." This should produce a reading of "-14" in Pro Tools. Make a note of this correction factor on the sound roll's sound report. This could prove handy if you need to go back and retransfer. And keep an eye on the loudest sounds during the transfer to make sure that they do not digitally clip.
About achieving pulldown through file conversion.
See the section on pulling down files recorded on the digital flash media
recorders and follow the same basic procedure.
About redoing a bad transfer and getting FCP to automatically replace the files.
There have been instances when someone has inadvertently introduced noise or distortion during the transfer process. For instance, you might forget to close the mike pots on the Nagra or you might have left some of the other inputs turned up on the Mackie -- this could introduce system noise/hiss. Or the EQ settings could have been altered from their neutral (12:00) position, giving your audio an unnecessary "tweak." In any case, suppose you've been editing away in FCP and only later discover your mistake. The original Nagra recordings are fine, but the sound in your workstation is awful. Can it be fixed? And can you avoid having to re-sync every piece of audio in your completed cut?
Yes, but you have to follow some steps. First, re-transfer the audio making sure this time to exercise proper quality control. (When in doubt, listen to the original Nagra tapes straight from the Nagra and compare.)
Next you have to fool FCP into accepting these new files as the original transfer. But first you've got to ensure that the new files are in exact sync with the originals. And this may not be as simple as aligning the beginning of the first slate marker. If you transferred an entire Nagra roll into a single file, remember that between each production take there will be a subtle interruption of the Nagra sync pulse. This can result in a slight "bobble" in speed as the Nagra adjusts and resolves to the new burst of sync pulse. So, you may have to readjust each production take slightly to accommodate. This may sound like a huge chore, but in fact it can go pretty quickly.
This is all best done in a Pro Tools session where you can easily make edits of 1/4 frame accuracy or less.
First, re-transfer/digitize the sound roll. Next, bring in the original NG audiofile -- this will serve as your sync template. Put the new improved audio in a track below the original and examine the waveforms. Line up the first slate marker of the new audio to match the original as closely as possible. Trim the beginning of the new audiofile so that it begins exactly where the original does on the timeline. Proceed down to the next slate marker and see if any sync adjustment needs to be made to get the new audio to line up. If so, edit the new audio to realign the waveforms. Continue the process to the very end of the audiofile. Trim the end of the new audiofile so it exactly matches the length of the original. (Note: If you have less audio than you need, you can cut & paste a little roomtone from somewhere else.)
Now highlight all the regions of this new improved audio you've created. Go to the Edit Menu and select Consolidate Selection. This will create a continuous audiofile with no edits that should match your original exactly in terms of length and sync.
After this you just need to do a bit of file management. Give the new file(s) exactly the same name as the originals. Move the original files from the folder where FCP "expects" to find them and set them aside. Replace the originals with the new improved versions. When you re-open your FCP session FCP should reference these new files and your session will look and sound the same -- only better!
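The file management step can also be scripted. The sketch below is a hedged illustration using only Python's standard library; the function name and folder layout are hypothetical, and it assumes you have already made the new file match the original in length and sync as described above.

```python
import shutil
from pathlib import Path


def swap_in_retransfer(original, replacement, set_aside_dir):
    """Move `original` out of the way and put `replacement` in its place under the same name."""
    original = Path(original)
    replacement = Path(replacement)
    set_aside_dir = Path(set_aside_dir)
    set_aside_dir.mkdir(parents=True, exist_ok=True)
    # Keep the bad transfer rather than deleting it, in case you need to compare later.
    shutil.move(str(original), str(set_aside_dir / original.name))
    # The new file takes over the exact path and name that FCP expects to find.
    shutil.copy2(str(replacement), str(original))
    return original
```

Do the swap while the FCP project is closed, and keep the set-aside folder somewhere FCP will not go looking for media.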
WHAT KIND OF BAD PRODUCTION SOUND CAN I FIX IN POST?
You can't "undo" distortion in a bad recording.
If your record levels were set too high, or if you overloaded a mic input or some other amplifier stage, it's too late to fix it. That's why it's important to closely monitor the sound during production and fix those problems that need fixing when they need fixing.
But before you give up, make sure the problem is actually a bad recording -- you want to be sure that any distortion you're hearing is on the original production recordings and not the result of some error further down the audio road. For instance, if the distortion was created during the transfer from original recording to digital editing system, you might solve the problem by re-transferring the sound. When in doubt, identify a take that sounds bad on your editing system and go back to the original production recording and listen to the original tape on a good monitor system.
It's even possible that the distortion you're hearing exists only in the monitor chain that you're listening through. You might just have a dirty connection on the headphone jack, so try gently wiggling the connector.
Some tips for distortion trouble shooting:
Does everything sound bad, including music tracks and sound effects? In that case, it's likely to be a monitoring problem -- a bad pair of headphones, an overloaded amp, some dirty contacts in the audio connectors, etc.
Does the entire dialog take sound bad, or just the loudest portion of the take? (If just the loudest dialog is distorted, this makes it somewhat more likely that it was just a production sound error.)
Do things like voice slates and reference tone beeps also sound distorted? If you can eliminate monitoring problems as a culprit, this situation makes it more likely that the problem may be due to a bad transfer/digitizing process.
Do the digital sound levels frequently "hit zero" on the distorted takes? Does the waveform show signs of clipping? If your production sound was recorded and loaded digitally then the problem is likely to be in the original recording. If the production sound was recorded on an analog deck like a Nagra, it's likely that your sound transfer/digitizing was made at too high a level. Listen to the original sound roll; you may just need to re-digitize the problem takes. (You can avoid re-editing by aligning the waveforms and trimming any re-digitized sound files so that they exactly match the existing files. Then give them the same file names and FCP should open the new improved files to recreate your edits.)
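The last check, looking for signs of clipping in the waveform, can be roughed out in code. This is a sketch under stated assumptions: the samples are already available as a list of signed 16-bit integers, and a run of three or more samples pinned at full scale is treated as a clipped "flat top." The three-sample threshold and the function name are arbitrary choices for illustration, not a standard.

```python
FULL_SCALE_16_BIT = 32767


def count_clipped_runs(samples, min_run=3, ceiling=FULL_SCALE_16_BIT):
    """Count runs of at least `min_run` consecutive samples pinned at full scale."""
    runs = 0
    run_length = 0
    for value in samples:
        if abs(value) >= ceiling:
            run_length += 1
            if run_length == min_run:
                runs += 1  # count each flat-topped stretch only once
        else:
            run_length = 0
    return runs
```

A result of zero does not prove the take is clean (analog distortion recorded at a lower level will not show up this way), but a high count on the distorted takes points toward a transfer made at too high a level.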
You can't make a "distant, unclear" recording sound close up and clear -- but you might help it a bit.
Try this: use the EQ capabilities of your editing system to do a little "frequency enhancement." (Final Cut Pro calls all its EQ functions "filters"; not the best choice of terms, but close enough.) Using the 3 Band Equalizer, try reducing frequencies below about 80 Hz by around 5 dB. Now try boosting the midrange around 2000 Hz by about 5 dB. You'll have to experiment a bit, but often a combination like this will improve the overall clarity of the dialog.
It's best to be a bit conservative with EQ -- don't overdo it.
Something like the above works best when the sound is only slightly off mic. If the mike was more than seven feet away from the actor...well...
One of the most important things about production sound: you have to get the mike in close enough so that you get a crisp clear recording without a lot of background noise or reverberance that can "muddy" the sound. So please -- when working with video cameras, don't rely on the built-in mikes mounted on the camera. And when working with double-system recorders, try to plan your shots so that at least some of the coverage will allow for decent mike placement.
Remember that it's easy in the mix to make a good clear recording sound "gritty" or "dirty" -- you can add ambience tracks or reverb or you can EQ the track to reduce the high and midrange frequencies that convey clarity. But it's very tough to try to clean up a bad recording.
So that's enough for the sadder-but-wiser routine; if you're reading this section, you already regret some of your production problems.
You can reduce certain types of noise (but don't expect miracles.)
Here's the basic rule that will tell you if your particular noise problem is fixable: the noise you don't want has to be in a frequency range that is different from the sounds you do want. For instance, you could get rid of quite a bit of low pitched rumble in a recording of a parakeet chirping because the chirps you want are high pitched and the noise is low pitched. That makes it a good candidate for an EQ fix. Try setting the 3 Band EQ to drastically reduce the gain on the low frequencies you select. Start off with the low frequency slider set to around 100 Hz and experiment with raising it to higher frequencies. In the case of this example, you might be able to reduce everything below 800 Hz without affecting the sound of the chirps; so you'd get rid of a lot of noisy rumble without cutting into the higher pitched chirps.
Dialog recordings are trickier to fix because dialog contains a much broader, richer range of frequencies than a simple bird chirp. However, the basic EQ approach described above can help reduce certain types of rumble such as wind noise or very distant traffic. Even some types of air conditioner noise can be lessened. But bear in mind that you have to be much more conservative in your approach -- chances are you should confine your filtering to frequencies below 100 Hz. (Anything much above that begins to cut into the voice range.)
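The principle of cutting the lows while leaving the highs alone can be shown with a very simple filter. The sketch below is a textbook first-order high-pass filter written in plain Python. It is not what the FCP 3 Band Equalizer does internally, and its slope is much gentler than a drastic EQ cut; the function name and default sample rate are assumptions for illustration.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate=48000):
    """First-order high-pass filter: attenuates content below `cutoff_hz`."""
    if not samples:
        return []
    rc = 1.0 / (2 * math.pi * cutoff_hz)  # time constant for the chosen cutoff
    dt = 1.0 / sample_rate                # time between samples
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Each output follows sample-to-sample changes, which favors high frequencies.
        out.append(alpha * (out[i - 1] + samples[i] - samples[i - 1]))
    return out
```

With a 100 Hz cutoff, a 20 Hz rumble is cut to roughly a fifth of its original level while a 1000 Hz tone passes almost untouched, which is the same trade-off you are listening for when you move the low frequency slider.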
Noise that isn't confined to a narrow frequency range is a lot trickier to reduce. Expanders and other types of noise-reduction software can be applied to some of these problem noises. At the time of this writing, Final Cut Pro has expanded its sound processing functions and you may be able to do quite a bit of fix-it work within FCP. Soundtrack Pro has a fairly good noise reduction processor as well. (As with most processing, it's best to be conservative with it -- listen critically to make sure that you're not creating weird processing artifacts in voices, etc.)
For more exotic tweaking you might have to export selected sound files to Pro Tools to take advantage of some more exotic plug-ins. And I can't give you a quick lesson in using them in just a paragraph or two so you may have to get someone to help you with it.
But to summarize and illustrate the basic rule about noise reduction, I'll list a few likely suspects:
Fairly easy to fix:
Somewhat tricky:
Difficult to impossible:
If the original recording is poor quality you either have to live with it, find an alternate take (sometimes you can use the sound from a different take and "cheat" it over a different shot), or record a new version of the line. If it's a fairly brief line -- especially if it occurs offscreen -- you might get together with the actor and record this "wild" (without seeing the picture.) For more extensive replacement you'll need to arrange some type of ADR session. In either case, try to get a decent perspective match with the other dialog in the scene -- in other words, if the original recording was done in a carpeted room with the window drapes closed and the mike fairly close, don't record the new lines in a reverberant room with a marble floor. Do not try to match background ambience in this new recording; just because your original sound was recorded next to a busy freeway is no reason to record the new lines that way. (You'll use dialog editing tricks to fill in some matching ambience for the final mix.)
HOW DO I MAKE EDITED SCENES SOUND SMOOTHER?
In nature, continuous sounds don't change instantaneously; rivers don't suddenly quit flowing, rain doesn't stop on a dime, wind doesn't cut off as if someone has thrown a switch. But all these things can happen when you edit various takes together into a supposedly continuous scene -- and our ears are trained to notice these weird discontinuities and wonder about them. And wouldn't you rather have the audience wondering what's going to happen next in your story and not distracted by these technical "bumps" in your soundtrack?
So how do you smooth the bumps? Sound editors rely on variations of just a few basic tricks:
1) "Hide" the change in ambience by making it happen during another strong sound that distracts the listener; this drowns out or "masks" the change.
2) Make the change gradual rather than instantaneous by creating a long fade-in or fade-out instead of a simple cut.
3) Run some continuous effects ambience underneath the entire scene; this can help mask subtle changes when you cut between different angles, especially if the continuous effects ambience is more "complicated" than the production ambience.
Example: You shot your "restaurant scene" in a converted warehouse and the room tone varies from cut to cut. But when you add some sound effects of background voices, dish clatter, and P.A. music those minor bumps aren't noticeable.
A bit more detail about "Trick #1." For subtle presence bumps, often it's enough to re-think the position of the edit point. Picture editors tend to cut the sound at the same point as a picture cut. When no sound except ambience is occurring at that point, the bump is noticeable. But if the ambience from shot A is continued past the picture cut to shot B, and the edit point occurs just prior to a strong modulation such as the first line of dialog in shot B, the strong modulation will tend to mask the slight change in ambience. (The sound for the two angles is placed on different units, but the same masking principle applies if they were cut together that way on a single worktrack.)
Here's what the original worktrack might look like if it was simply split onto two different tracks at the original edit point:
Since they're from two different takes (12/1 and 12B/2) there's probably a presence mis-match at the cut.
Instead, if we extend the presence from the first angle up to the point where the strong sound of the incoming dialog would mask the shift in presence, we might end up with something like this:
You can see that we had to "steal" some presence from elsewhere in take 12/1 in order to extend it far enough. In the above case it took two different regions to "stretch" the presence that far.
Here's a variation on "Trick #2" in which a crossfade was created to ease a transition between two different ambiences.
Those crossfades might end up looking something like this:
Sometimes you need to mask the incoming or outgoing fade by having it occur while there's a strong modulation of dialog to "hide" the fact that the ambience is changing.
About "Trick #3" -- sometimes you run into cases where you have a particular "problem angle" with a distinctive ambient noise that cuts in and out in a jarring way. Then you may need to have the ambience extended from the beginning of the scene to the end so that there is at least a consistent ambient tone throughout. But remember that ambience is additive and that the dialog from that angle already has the noisy ambience tied to it. So don't just create a separate continuous piece of that background noise on a spare track, because when that noisy dialog occurs the ambience will still bump in and out. Instead, extend the ambience on the same track in which the dialog occurs, essentially "filling in the gaps."
In the first example, the background noise will still bump when Bob's angles cut in and out:
In the second case, the dialog should sound smooth and consistent (as long as Mary's lines are reasonably clean.)
STRATEGIES FOR WORKING WITH FINAL CUT PRO & PRO TOOLS
This section assumes that you have a good working knowledge of Pro Tools. So it isn't a step-by-step "how to" guide; it's just an outline of general strategy for attacking various problems.
GETTING YOUR FINAL CUT PICTURE INTO PRO TOOLS
There are two basic choices for getting picture from FCP and viewing it
on the Pro Tools workstations:
Option A – DV CODEC
This hi-res approach is recommended for the best interchangeability between
LMU sound studios that use different types of video playback. (If you're working
on a single screen Pro Tools system, like a laptop, or a slower system that
is having trouble delivering smooth video playback, or just need to keep
the movie file size smaller, see Option B below.)
From Final Cut Pro export the FCP project as a "self-contained Quicktime
Movie." The settings should be:
Format: DV NTSC 48kHz
Size: 720x480
Quality: "Medium" (actually, it'll be high quality for the DV NTSC format)
Frame Rate: 29.97
It's recommended that you choose the “audio & video” option to embed
your worktrack audio in the Quicktime movie.*
Settings for sound should be:
Format: Uncompressed
Sample rate: 48 kHz
Sample size: 16 (that's the bit depth)
Channels: either 1 or 2 depending on whether you export mono or stereo.
Since this is usually just guide audio, choosing MONO is simpler.
The DV format creates a very large file -- approximately 3.5 megabytes per
second for a Quicktime movie without sound (or about 210 meg per minute.)
Checking the "sound" option requires a rather small increase of 5 meg per
minute. So you can see that a 20 minute project will require a file larger
than 4 gigabytes. That alone is not necessarily a problem. Be aware,
though, that for a very complex Pro Tools session the added data stream required
for this high res format could put a little more strain on the "digital pipeline"
than is necessary.
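As a rough check on those numbers, here is the arithmetic sketched in a few lines of Python. The data rates are just the approximate figures quoted above (not exact codec specs), and the function name is made up for this example.

```python
# Rough size estimate for a DV NTSC Quicktime export.
# Rates are the approximate figures quoted in this guide, not exact codec specs.
DV_VIDEO_MB_PER_MIN = 210   # about 3.5 MB per second of picture
AUDIO_MB_PER_MIN = 5        # embedded 16 bit, 48 kHz mono guide track

def dv_movie_size_gb(minutes, with_audio=True):
    """Return the approximate movie file size in gigabytes (1 GB = 1000 MB here)."""
    mb_per_min = DV_VIDEO_MB_PER_MIN + (AUDIO_MB_PER_MIN if with_audio else 0)
    return minutes * mb_per_min / 1000

# A 20 minute project with guide audio comes out a bit over 4 GB.
print(round(dv_movie_size_gb(20), 1))
```

Plug in your own running time to see whether the movie will fit comfortably on the drive you plan to carry between studios.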
Once you have your 720x480 Quicktime Movie you’ll find that it takes up
most of a Mac monitor screen, so in a Pro Tools room equipped with two screens
you can just drag the movie to the second screen. (If you are in one of the
Pro Tools rooms equipped with an NTSC monitor, so much the better. Just remember
to select the “Play DV Movie Out Firewire” option.)
[If you are working on your own “single screen” system, the DV movie will
hog too much of the screen space for convenience. In Pro Tools 7.3, you can
resize the movie window. In earlier versions you have to use Quicktime Pro
to resize the movie display, save it in the new size, then reimport the movie
into Pro Tools.]
Option B - Quicktime Compression
Export a somewhat more compressed Quicktime Movie format. At about 1 megabyte
per second, this will result in a file less than one-third the size of the
DV format.
Recommended settings:
Format: Motion JPEG A
Frame size--320x240; Frame rate -- 29.97; Audio -- None or Mono 48kHz*.;
Quality -- Medium to High. (Options for key framing and internet streaming
should be unchecked.)
This format will give you picture quality that is reasonably good for sound
editing purposes.
Understand that if you view this in its "normal" size, it's a pretty small
picture, suitable for viewing on a single Mac screen that must also display
the Pro Tools window. (Handy if you're working on a laptop.) This image size
would not be very good for cutting detailed sync effects, but will work for
cutting dialog when the worktrack is clearly in sync and no alternate take
"cheating" or ADR editing is needed.
When there is a second Mac monitor available you can view this same movie
in a "double size" mode.
Again, in Pro Tools 7.3 you can simply re-size the movie window from within
Pro Tools.
Note: At the time of this writing, there are a number of Pro Tools rooms
(studios C, D, K, N & O, P, and Q) that have, in addition to two Mac screens,
small NTSC TVs hooked up to firewire DV converters that send the movie to
the TVs (or projectors.) But they only work with the DV NTSC format described
in Option A, and in Pro Tools you must remember to go to the Movie Menu and
choose to "Play DV Movie Out Firewire Port."
If you find yourself in a room like Foley Studio P and your movie isn't
in the DV format, you must open your existing movie file in Quicktime Pro
and export it as a self-contained Quicktime Movie. When you do this, you
must choose the correct DV NTSC codec for this new alternate copy of the
movie.
SOUND - GETTING YOUR FCP SOUND INTO PRO TOOLS
OPTION 1 - EXPORTING AIFF FILES
Ideally, all the finicky editing of the production dialog tracks will be
done in Final Cut. The advantage is that the editor has easy access to all
the complete audio takes -- very handy when you're trying to find matching
ambience to plug "holes", extending incomplete words, etc. One disadvantage
is that, as of this writing, FCP is still just a little clunky when it comes
to making fussy edits. The other is that the filmmaker doing the picture editing
may not have good dialog editing skills. But assuming that they do, the procedure
is to:
• Edit the dialog tracks carefully, removing all undesired glitches, filling
all ambience "holes" with matching ambience, and smoothing presence "bumps"
within scenes. Remember that isolated lines that are to be ADR'd will often
require matching ambience to make them believable as occurring within the
scene.
• Move undesired sound or dialog which is to be looped to a separate "X
Dialog" track which can be muted when you do your dialog pre-dub.
• "Checker-board" the various takes across different tracks to facilitate
level or EQ adjustments where needed.
• If feasible, set appropriate levels so that the angles within a scene
flow smoothly.
Next you will export each individual track as a single audio file. For instance,
mute all tracks except for A1. Go to File>Export>Using Quicktime Conversion.
From "Format", choose AIFF. Under "Options" make sure to export the track
as a 16 bit 48 kHz mono file. Repeat the process for each track you wish to
export. Make sure there are head & tail pops on each track and that they
align correctly with your picture.
Now it's just a question of importing those tracks into a Pro Tools session,
making sure they are in sync with the Quicktime movie for the session, and
refining your dialog mix within Pro Tools. NOTE: This procedure is very much
like what Pro Tools calls a "bounce", so any volume graphing and processing
performed in FCP will be reflected in the AIFF files.
OPTION 2 - OMF EXPORT/IMPORT TO PRO TOOLS
This is actually the more common situation: a picture editor hands the project
off to a soundperson to finesse the dialog editing as well as do the effects
work and mixing.
If you must do extensive dialog editing in Pro Tools you will probably want
to try the OMF file exporting procedure. The object is to create a Pro Tools
session that mirrors the way audio regions are laid out in the Final Cut Pro
tracks. The OMF file (Open Media Framework Interchange format) is just a means
to “translate” media from one software program to another.
From Final Cut Pro, the editor must choose: File>Export>Audio to OMF.
There are then some choices to make: 16 bit, 48 kHz sampling rate, include
Crossfade Transitions. In the newest versions of FCP there are two other important
options to check: Volume Automation and Panning. This means that volume adjustments
made in FCP will be mirrored by similar volume automation in Pro Tools, preserving
any hard work done in FCP, but still giving you the chance to fine-tune it
or delete it and start from scratch. Lastly there is an item called handle
length. This is very important for dialog work; the handle length determines
how much you will be able to extend the heads or tails of audio regions.
This is essential for repairing clipped off words or smoothing ambience transitions.
FCP defaults to one second handles; five seconds is probably a better minimum
length.
Once you have an OMF file you must go to a Pro Tools station that is equipped
with Digitranslator software. (Many of the LE stations do not have this option,
but some do – so check the studio capabilities list.) Create a Pro Tools session
set up for 48 kHz AIFF files and a timecode start time that matches your
FCP session. Then go to File>Import Session Data and open the OMF file.
The Import Session Data window will appear. Select the tracks you wish to
"Import as New Tracks". Under Audio Media Options, be sure to select "Copy
from Source Media" rather than "Refer to Source Media." This should result
in a "stand alone" Pro Tools session folder containing all the audio you need.
(Otherwise you are confined to working on a system which has the Digitranslator
option installed and can just refer to the OMF media and extract the audio
on the fly.)
OPTION 3 (You won't like this one...)
Well, there are many variations on “Option 3.” In the early days of digital
editing, picture editors sometimes saved drive space by using lower quality
audio on their systems. Or there were times when the production sound passed
through various stages of doubtful analog transfer during the “loading” process,
so that it was often a good idea to go back to the original sound rolls and
“re-assemble” the sound from the pristine original. There were various clunky
ways to do this.
One method involved time-code slaving; the software looked at the original
time-code stamp of each region in the session and controlled an external playback
machine (often a time-code DAT) that would roll to the correct position, and
then Pro Tools would go into record for each portion of sound on the timeline.
Someone would have to babysit this process in order to pop new sound rolls
into the machine as needed.
Pretty crazy, isn’t it? The solution was obvious: make sure that the sound
used by the picture editor was of high enough quality to use for the final
mix, then use the same media throughout. But it took years for that to become
common.
Hopefully you will not have to use this procedure but it may be the only
option if, for instance, you must use audio from a back-up recorder.
1. Export audio either as OMF or AIFF files as described in OPTIONS 1 and 2 above.
2. If you had to use the AIFF file approach, you have an extra step, because
you’ll need to distinguish where region cuts occur. Using a printed EDL, carve
the files into regions, noting the take I.D.s.
3. Mod match all the needed takes by locating some obvious sync points (“p”
or “t” sounds in dialog, or door slams, etc.) and nudging the new audio to
line up with the original. Then trim the new regions to fit the original region
lengths. This gives you the equivalent of an "Auto Assemble" -- only
it's not automatic. The advantage here is that you have access to the entire
audio take, not just limited "handles." (This approach may seem like a major
chore, but it's been routinely done on many feature films. Of course there
are software programs that can help by comparing waveforms and doing some
of the mod matching automatically. You could experiment with a program like
Titan or Vocalign, but really, if it’s just a few takes it’s probably easiest
to do this manually. You’ll be surprised how fast it can go once you get
used to it.)
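To give a feel for what "comparing waveforms" means, here is a minimal Python sketch that slides a short clip from the worktrack along a longer take and reports the offset where the two line up best. It is only an illustration of the idea using plain lists of sample values; it is not how Titan or Vocalign actually work, and the function and variable names are made up for this example.

```python
def best_sync_offset(take, clip):
    """Return the sample offset in `take` where `clip` matches best.

    Each candidate offset is scored by the sum of squared differences
    between the clip and the same-length slice of the take
    (lower is better).
    """
    best_offset, best_score = 0, float("inf")
    for offset in range(len(take) - len(clip) + 1):
        score = sum((take[offset + i] - c) ** 2 for i, c in enumerate(clip))
        if score < best_score:
            best_offset, best_score = offset, score
    return best_offset

# Toy example: the matching transient starts 5 samples into the new take.
worktrack_clip = [0.0, 0.9, -0.7, 0.3]
new_take = [0.0, 0.0, 0.01, 0.0, -0.01, 0.0, 0.9, -0.7, 0.3, 0.0, 0.0]
print(best_sync_offset(new_take, worktrack_clip))
```

With 48 kHz audio, dividing the offset by 48000 gives the nudge in seconds. Real audio is far too long for this brute-force loop, which is one reason dedicated software exists.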
Historical note: the above is analogous to the method used by sound editors
who worked in the days of magnetic film. The picture editor’s worktrack may
have gone through a fair amount of physical wear and tear during editing,
so it was common to replace that track with fresh mag film transfers. The
original was locked into a synchronizer equipped with sound heads (basically
a round sprocketed shaft with magnetic heads on little swivels) and the sound
editor would go through, cut by cut, and find the corresponding words in the
new transfer, carefully line it up in sync with the worktrack, and
create the new edited version from those new transfers.
A variation on the “mod match” approach has been done for several recent
Spielberg films (such as “Catch Me If You Can” and “The Terminal”) because
his editor is one of the few still cutting on film. In that case, an analog
to digital transfer had to be made of the worktrack and an EDL used to identify
where the various sound takes occurred. Much of the re-syncing was done with
Titan software but rechecking and finessing was still required.
SOUND - GETTING PRO TOOLS SOUND INTO FCP
DIALOG PRE-DUBBING
You can create a dialog pre-dub bounce at 48 kHz and import it back to FCP
for further mixing with effects and music within FCP, or you may continue
to do all the post sound in Pro Tools to create a final DME mix that can then
be imported into FCP.
If you're going to do more effects and foley work and final mixing in Pro
Tools, I'd strongly recommend that you do a dialog pre-dub rather than try
to handle all the tracks in one large complex session. For one thing, you're
liable to use a number of plug-ins for cleaning up your dialog and that could
tie up some DSP power that could limit your choices later on for your effects
and music work.
A dialog pre-dub should contain your principal sync dialog and any principal
ADR with your main actors. It should not contain Group ADR or walla. (You
may wish to do a separate pre-dub of those elements, or you could leave them
as separate tracks to carry in the final mix.)
In creating the dialog pre-dub you should concentrate on getting a smooth
and believable balance from angle to angle within a scene, and good relative
levels on scene transitions. This is the step where you should concentrate
on minimizing background noise, getting extra clarity on "problem tracks"
that could benefit from EQ, fixing mis-matches of perspective, and so forth.
The audience shouldn't have to strain to hear any key lines of dialog; they
shouldn't have their ears pinned back by jarringly loud lines either.
You may want to use some gentle compression where needed during the pre-dubbing.
Remember, though, that in the final mix you'll be combining the dialog with
new elements of music and effects -- so the level adjustments you're making
in the pre-dub aren't necessarily carved in stone. You'll undoubtedly have
to do some level riding and tweaking on the pre-dub itself during the final
mix, so you may want to be a bit conservative during the pre-dubbing phase.
(Example: It's easy to add more reverb to the pre-dubbed voices in the final
mix, but to "undo" excessive reverb you created in the pre-dub would involve
going back to the original dialog tracks and starting from scratch.)
Note: If you're going to do the rest of your post in Pro Tools, you may
wish to bring a few effects ambience background tracks into your dialog pre-dub
session. DO NOT mix these into your dialog pre-dub -- keep them separate until
your final mix. But by having them available to play against the dialog tracks,
you can sometimes tell if some ambience bumps in your dialog tracks will
be "fixed" by the addition of a bit of background effects. In fact, some
feature mixers like to have background ambience pre-dubs done and available
during their dialog pre-dubs for this very reason.
FX PREDUBBING FOR IMPORT INTO FINAL CUT
The method I would like to see more students experiment with is for the
FCP editors to hand off the FX editing and pre-dubbing to Pro Tools users
(RECA majors) who would then create pre-dubs of props, FX backgrounds and
hard FX for importing into FCP. (The easiest option being to create bounces
as 48 kHz AIFF files -- either mono or interleaved stereo.) The FCP editors
would then be responsible for combining that with their own production dialog
tracks for the final mix.
After all, the person editing the picture is most intimately familiar with
the production sound and has all the material in FCP to do the bulk of the
dialog editing.
Alternatively, you could create a dialog pre-dub in FCP for import into
a Pro Tools session. The Pro Tools session would then contain the FCP dialog
pre-dub and all other effects and music. Then the final mixing could be done
in Pro Tools to create a bounce of the final stereo DME for reimport into
FCP.
But it seems that Production students who are lucky/smart enough to get
RECA majors involved with their sound tend to hand off the dialog work as
well.
In which we first wind it back to the RECA 250 level and deal with such
mysteries as:
What should my two track stereo mix levels
look like on the Final Cut or Pro Tools meters?
That’s a perfectly reasonable question – and you can search the web and
professional references in vain for a straightforward answer. We’ll start
with the short version and then work our way into the details.
Typical dialog scenes should peak in the neighborhood of your reference
level. By “peak” we mean the highest reading for the louder words in a sentence
or longer speech. By “neighborhood”, let’s say 3-7 dB below reference level
for normal or subdued conversation. A bigger-than-life Voice Over or strong
in-your-face speech should peak right at or a few dB above reference level.
In mixing for film or TV, “dialog is king” – meaning: set your dialog levels
first and adjust music and effects relative to the dialog. Remember that,
although you may know the words by heart, the audience usually hears them
just once, so please err on the side of keeping the dialog clear and understandable.
How much louder can you go with music and effects? That depends on whether
you’d like a mix that might be suitable for TV broadcast, which is a goal
worth striving for. Broadcast TV has limited headroom compared to theatrical
release. Figure on maximum peaks of 8-10 dB louder than reference level, maybe
12 dB louder if you want to push your luck. (The same headroom applies to
video formats like Beta SP, which is still a preferred format for many festival
submissions.)
What reference level do I use?
In the U.S., as specified by the Society of Motion Picture & Television
Engineers (SMPTE), professional film and TV broadcast mixers use a digital
reference level of –20 dBFS. And we’d like to do our work as professionally
as possible, right? So at LMU we have recently adopted this same standard.
This has several repercussions for us. For a theatrical release on 35mm
with a Dolby digital soundtrack, a –20 dBFS reference level means that the
loudest sounds can go almost four times louder (and remember, reference
level is already “loud and clear.”) But people mixing for NTSC broadcast
have to be careful that they don’t actually use a lot of that available headroom.
Their maximum levels shouldn’t exceed –12 dBFS to –8 dBFS, depending on their
network specs.
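The headroom arithmetic in the paragraph above can be sketched in Python. The "twice as loud for every 10 dB" figure is a common rule of thumb for perceived loudness, assumed here for illustration; it is not a precise measurement, and the function names are made up.

```python
FULL_SCALE_DBFS = 0.0  # the digital ceiling; nothing can go above this

def headroom_db(reference_dbfs, ceiling_dbfs=FULL_SCALE_DBFS):
    """Headroom is the distance from reference level up to the chosen ceiling."""
    return ceiling_dbfs - reference_dbfs

def approx_loudness_ratio(db_change):
    """Rule of thumb (an assumption): every +10 dB sounds roughly twice as loud."""
    return 2 ** (db_change / 10)

theatrical = headroom_db(-20)                        # up to digital full scale
broadcast_low = headroom_db(-20, ceiling_dbfs=-12)   # conservative network spec
broadcast_high = headroom_db(-20, ceiling_dbfs=-8)   # more generous network spec

print(theatrical, approx_loudness_ratio(theatrical))
print(broadcast_low, broadcast_high)
```

So the theatrical mixer has 20 dB to play with, while the broadcast mixer is working with only 8 to 12 dB above the same reference.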
LMU production specs call for a mix that meets “broadcast standards.” We’d
like your projects to play well if broadcast (a number of LMU films have played
on PBS, for instance.) And broadcast-friendly mixes also tend to play better
in home theater environments, or off a TV in a producer’s office.
The problem is, asking a typical student filmmaker not to push their loudest
peak levels beyond –12 to –8 dBFS is like handing them the keys to a Ferrari
and asking them not to drive above 65mph. But that’s how the pros do it, and
that’s what we’re expecting from you.
I already did a mix at home on my own computer,
and I used a –12 dBFS reference level in Final Cut Pro. Do I have to re-mix?
Not necessarily. If you did a proper mix with respect to a –12 dBFS level,
all you really have to do is take your existing mix and lower it by 8 dB. The easiest
way to do that is with the master fader in the mixer window. Since you only
had 12 dB of headroom to work with before, your loudest sounds should “automatically”
fall into the recommended headroom for NTSC broadcast.
After adjusting the master fader, double-check the result by playing it
back in a room properly calibrated for a –20 dBFS reference level.
(If that seems too simple, rest assured that I have done similar level tweaks
for delivery to European broadcasters, who use a slightly different reference
level of –18 dBFS. If you think about it, reference level simply represents
“good and strong”, so as long as we don’t exceed the headroom of our target
media there is a bit of leeway as to the exact number where that level might
be defined. The important thing is that our mix has a consistent and
intelligent relationship to that reference level – only then can we get away
with adjusting it up and down the scale if needed.)
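If you want to see the offset spelled out, here is a small Python sketch of the same arithmetic. The function names are hypothetical; the only facts used are the reference levels mentioned above.

```python
def fader_offset_db(old_ref_dbfs, new_ref_dbfs):
    """How far to move the master fader so the mix sits around the new reference."""
    return new_ref_dbfs - old_ref_dbfs

def shifted_level(level_dbfs, offset_db):
    """Where a given meter reading ends up after the fader move."""
    return level_dbfs + offset_db

offset = fader_offset_db(-12, -20)      # home mix at -12, LMU standard at -20
print(offset)                           # the master fader move, in dB
print(shifted_level(-12, offset))       # old reference-level dialog
print(shifted_level(0, offset))         # the loudest possible peak in the old mix

# The European delivery case mentioned above: -20 up to -18.
print(fader_offset_db(-20, -18))
```

Notice that even a peak that hit digital full scale in the old mix lands at –8 dBFS afterward, which is why the loudest sounds fall inside the broadcast range without any further work.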
So how should my two-track stereo mix levels
relate to a –20 dBFS reference?
Simple enough math:
SOUND             PEAK LEVELS IN dBFS
STRONG DIALOG     -20 to -18 dBFS
MEDIUM DIALOG     -25 to -23 dBFS
SOFT DIALOG       -27 to -25 dBFS*
*If you want your dialog mix to be “TV safe”, the networks really seem to
prefer that even subtle dialog passages have some peaks no more than 7 dB
below reference, which would be about –27 dBFS on this scale. (TV dialog mixes
tend to be more compressed than feature films.)
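Since these figures are really just offsets from reference level, the table above can be regenerated for any reference. The Python sketch below assumes the offsets read off the table (strong dialog 0 to 2 dB above reference, and so on); the names are made up for illustration.

```python
# Offsets from reference level (in dB), read off the table above.
DIALOG_OFFSETS_DB = {
    "strong": (0, 2),
    "medium": (-5, -3),
    "soft": (-7, -5),
}

def dialog_peak_targets(reference_dbfs=-20):
    """Return ballpark (low, high) peak targets in dBFS for each dialog type."""
    return {
        kind: (reference_dbfs + low, reference_dbfs + high)
        for kind, (low, high) in DIALOG_OFFSETS_DB.items()
    }

for kind, (low, high) in dialog_peak_targets(-20).items():
    print(f"{kind:>6} dialog: {low} to {high} dBFS")
```

Calling dialog_peak_targets(-12) gives the equivalent ballpark figures for a mix built around the older –12 dBFS reference.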
CHART: FINAL CUT PRO METERING - IF USING A -20 dBFS REFERENCE (SMPTE STANDARD)
Based on a -20 dBFS reference level, with dialog panned to the center of
a stereo mix, a “broadcast-friendly” dynamic range, and meters with an instantaneous
rise time.
To summarize:
Now that we’ve over-simplified all this down to a graph and bullet points,
let’s stress the following: this does not mean that every word of dialog needs
to hit some kind of magic number. These are just ballpark figures for the
peaks of representative lines of dialog. Some may go higher. A few may go
lower.
So if it’s so simple, why make it complicated?
THE DETAILS
How do I meter dialog for a 5.1 surround mix?
Dialog meter levels were given assuming a 2 track stereo mix. If you’re
doing a true 5.1 mix and your principal dialog plays in just the center channel
(and your meters are looking at just that one channel), then add 3 dB to the
figures given. Why? Because if two speakers are each playing the same dialog,
the channels add together for a 3 dB increase in listening level. If you’re
relying on just one channel to play the sound, you have to make up for the
difference to get the same loudness.
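Where does the 3 dB come from? Here is a minimal Python sketch, assuming the sound from the two speakers adds up as acoustic power in the room (a simplification; the exact result depends on the room and where you sit). The function names are made up.

```python
import math

def db_for_power_ratio(ratio):
    """Convert a ratio of acoustic power into decibels."""
    return 10 * math.log10(ratio)

def center_channel_target(stereo_target_dbfs, makeup_db=3):
    """Meter target for dialog playing from the center speaker alone."""
    return stereo_target_dbfs + makeup_db

print(round(db_for_power_ratio(2), 2))   # two speakers versus one
print(center_channel_target(-20))        # strong dialog, center channel only
```

Doubling the power works out to just over 3 dB, which is where the rounded figure comes from.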
Why do different meters show different levels?
Different audio meters have different ballistics, meaning they react differently
to quick changes in sound level. VU meters are especially tricky to use for
judging dialog because they’re designed to react quite slowly – they have
a 300 millisecond rise time (reaction time) to fully deflect. So a quick transient
peak like a drum hit might barely move the VU needle, even though it is good
and loud.
So most professionals mixing for digital media no longer rely much on VU
meters. Instead they use some variation of peak metering, where the meters
react very quickly to changes in sound.
One of the nagging details is: just how fast do the meters respond? Many
of these software meters are instantaneous. That is, they have a 0 millisecond
rise time. So even one sample of a waveform – 1/48000 of a second –
can deflect the meter to its maximum value.
The standard software meters for Final Cut Pro and Pro Tools are instantaneous
peak meters.
These are really handy to avoid distortion from digital clipping. The thing
is, you’re unlikely to hear distortion that brief. You’d have to pile
up more than just a couple clipped samples in a row to hear a problem. In
fact, a very quick peak can go close to the max on the meter and not be perceived
as all that loud – our senses take a little time to react to loudness. A sustained
peak can seem much louder than a very brief peak (even if the sustained peak
reads lower on the meter.) So an instantaneous peak meter isn’t always a
reliable measurement of what seems loud to our ears. And you should
mix a little more “aggressively” when you are using instantaneous meters,
otherwise you are likely to set your levels on the low side.
Complicating all this number-crunching is that some professional peak meters
are designed not to be instantaneous, but to have a slightly slower reaction
time. (A delay of just 10 milliseconds can result in readings that are lower
by 2-3 dB.) Recent research suggests that in many ways these are preferable
for measuring things like dialog peaks, since they tend to “ignore” some of
the very briefest transients and give a more realistic measurement.
Another type of meter provides an alternate “RMS” display that represents
more of an average level. RMS readings will tend to read at least 10
dB lower than instantaneous peak readings. And still other meters display
both peak and RMS readings simultaneously.
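To see how far apart peak and RMS readings can be, here is a small Python sketch that measures both on a made-up test signal: one second of silence containing a 10 millisecond burst of 1 kHz tone. Averaging over the whole second is a simplification (real RMS meters use a shorter window), and the function names are hypothetical.

```python
import math

def to_dbfs(value):
    """Convert a linear level (1.0 = digital full scale) to dBFS."""
    return 20 * math.log10(value)

def peak_and_rms_dbfs(samples):
    """Return (instantaneous peak, RMS) of a list of samples, both in dBFS."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)

# One second at 48 kHz: silence except for a 10 ms burst of 1 kHz tone
# at half of full scale.
RATE = 48000
samples = [0.0] * RATE
for n in range(480):
    samples[n] = 0.5 * math.sin(2 * math.pi * 1000 * n / RATE)

peak_db, rms_db = peak_and_rms_dbfs(samples)
print(round(peak_db, 1), round(rms_db, 1))
```

The instantaneous peak reads about 6 dB below full scale, while the averaged reading is more than 20 dB lower still. That gap is why a very brief peak can look hot on the meter without sounding loud.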
Another detail: if you are mixing in 5.1 surround, where dialog principally
plays in only one channel – the center speaker – dialog levels should read
about 3 dB hotter than the figures given above. Why? Because with two track
stereo, the dialog comes out of two speakers, at equal loudness. With dialog
coming from only one speaker we have to boost it somewhat to make up for the
difference. (The same rule applies if you are creating a mono predub
and checking levels on a mono master fader.)
So how do I judge loudness in my mix?
You have very good loudness detectors on either side of your head. Seriously
– your ears and common sense play a big role in all this. Meters are used
to get you in the ballpark – then you have to listen and use good judgment.
You also have to have your speaker playback volume set appropriately (follow
the recommended settings in various rooms and use the “LMU Level Check” audio
file when in doubt.)
How do I set mix levels by ear?
The first step is to set your monitor listening levels correctly so that
a level that looks "good and strong" on the meters will sound "good and strong."
In some of the Pro Tools rooms the playback levels have been calibrated; use
the monitor level markings on the Mackie mixers so that the loudness you
hear at the workstation will correspond to the loudness in a properly set-up
theater. (As a rule, the rooms are set up so that you'll get a proper listening
level by turning the mixer volume knobs to Unity Gain (12 o'clock).)
Many times you'll be working with headphones or on home systems that have
not been expertly calibrated. But you can set workable "ballpark" levels using
the sample files provided. Just go to the desktop folder called MixLevelCheck.
(For easier use in Final Cut Pro, this is an "AIFF 48 kHz" stereo interleaved
version. Pro Tools users will have to convert this into two mono files during
import.)
Bring the file called "-20 dBFS_LMU_LevelCheck" into your session. It's
a stereo file, so put it into two tracks panned left and right. (For Pro
Tools, create a stereo track in which to place the file.)
Play the file and adjust your headphone/speaker level accordingly. There's
a voice-over that explains how things should sound. Basically you start off
by playing some voice samples that help give you a feel for how typical dialog
might play: a softer normal voice, then a bigger-than-life "Movie Voice Over"
voice. The voice that introduces each sample actually gives you a good guide
for a moderately strong voice level. To give you an appreciation of what "loud"
should sound like, there is also some fairly loud music and an explosion sound
effect. Lastly, there are a few samples to give you an idea of what
would qualify as "subtle" or right at the edge of audibility for TV broadcast.
Now that you have a good listening level that corresponds to a good meter
level, you should be able to apply common sense to your mix -- if something
sounds too loud, you should probably lower it; if it sounds too soft,
you should probably raise it.
Remember to check your monitor levels at the beginning of every mixing session.
Doesn't a higher or lower reference level mean my movie will sound louder or fainter overall?
Not necessarily, because playback levels are based around the reference
level. In other words, we adjust the playback volume so that sounds recorded
at reference level will sound "good and strong." You may have noticed that
when you play a typical dialog scene from DVD instead of a VHS tape, you tend
to turn your TV volume up a bit higher. That's because DVD uses a lower reference
level. You may also have noticed that once you do so, when the big sound
effects and music scenes come along, they tend to sound pretty loud. Because
the flip side of having a lower reference level is that you have more headroom,
more leeway for your loudest sounds to go louder.
Here's one way to think of it: for DVD, we record most of our mix at a lower
level, with our average "strong" levels hovering around -20 dBFS. As a result,
we have more safety margin for our loudest sounds because with that lower
"average" level the loudest sounds can be 20 dB louder rather than just 12
dB louder.
How do I transfer my mix from digital to an analog format like Beta SP tape?
If you use a reference level of -20 dBFS in Final Cut Pro and you are transferring
to VHS Hi-Fi or Beta SP, play your reference tone and set the input of the
Beta deck so that its audio meters read "0" VU. Then locate one of the loudest
portions of your mix and double-check the levels on the Beta deck's meters.
On Beta SP you would expect the meters to tip "into the red" a bit. That's
fine; that's what you'd expect. What you don't want is for the meters to
slam into the max and stay there for extended periods.
Note: VHS linear audio tracks only have about 8 dB of headroom, so if your
mix tends to be on the loud side overall you may want to adjust your reference
level lower so that it reads "-4" on the VHS deck. Or, if your film is on
the quiet side you may not have to adjust at all. You may have a peak or two
that will distort a bit on the linear tracks, but these days most VHS decks
play from the Hi-Fi tracks as their default anyway.
It's a good idea to spot check the tape by playing back a few of the louder
scenes.
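If you want to work out the line-up numbers rather than guess, here is a rough sketch. The function name is hypothetical, and it assumes you already know how far your loudest peaks sit above the reference tone and how much headroom the deck has above "0" VU. The 8 dB figure for VHS linear tracks comes from the note above; the 12 dB peak figure assumes a mix that follows the recommended levels in this guide.

```python
def tone_lineup_vu(peak_above_reference_db: float, deck_headroom_db: float) -> float:
    """VU reading to line the reference tone up to so the loudest peaks just fit.

    Returns 0.0 when the peaks already fit, otherwise a negative VU value.
    """
    return min(deck_headroom_db - peak_above_reference_db, 0.0)


# Loud mix (peaks 12 dB over reference) going to VHS linear tracks (8 dB headroom).
print(tone_lineup_vu(peak_above_reference_db=12.0, deck_headroom_db=8.0))   # -4.0

# Quiet mix (peaks only 6 dB over reference): no adjustment needed.
print(tone_lineup_vu(peak_above_reference_db=6.0, deck_headroom_db=8.0))    # 0.0
```

This is only arithmetic; it is no substitute for spot checking the tape, because analog meters and tape saturation do not behave as neatly as the numbers suggest.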
Do I need to remix for different formats like DVD or Digi Beta?
The short answer is: not necessarily. If you use a -20 dBFS reference level
and mix as suggested, your mix will fit into the available headroom of DV
tape, DVD, and VHS Hi-Fi. The result should make your average levels more
or less the same as most DVDs, although your loud scenes won't be as loud
as most recent commercial DVDs because you'll have some "unused headroom"
left over.
Why not boost the mix level by 8 dB to use some of that headroom? Because
by keeping your “average strong levels” in the neighborhood of –20 dBFS you
will have better compatibility with most commercial DVDs. In fact, when you
author the DVD in DVD Studio Pro, you have the option of setting what is called
a “dialnorm” value, which is the average dialog level. Based on feature film
dialog, the default level is set to –27 dBFS which should be appropriate for
your sound mix. (Remember that peak levels tend to read higher than average
levels.)
Now if you want to fully exploit the greater dynamic range of DVD you should
reopen your original mix session and selectively raise the biggest, loudest
elements to take advantage of the increased headroom. (This approach might
be good for a suspense film, where you are exploiting the shock effect of
going from a quiet scene to a loud one, or using big music "stings" for emphasis.)
Another possible approach: depending on your film, you might just "ride
the level" on your completed mix file. Start by creating a FCP session which
contains your picture and completed sound mix file. Now go in and selectively
raise portions of the mix where you want the whole mix to go louder. (The
disadvantage of doing this to a completed stereo mix file rather than going
into the original mix session is that you don't have separate control of the
sound elements. For instance, if you want to raise a music cue you'll also
be raising ambiences and sound effects as well. So this approach is not recommended
where you have dialog content.)
All of the above is the "short answer." The long answer gets complicated.
When you're mixing on a simple workstation, the resulting mix is a kind of
"prosumer compromise" to begin with, because you're creating a simple two
track stereo mix.
The slang term for this is "Lo Ro", meaning "Left Only, Right Only". This
is the format for standard audio CDs, and most people watch stereo television
shows or home video in the Lo Ro mode.
But film mixes are not created in Left Only, Right Only stereo. Even the
earliest stereo films added a separate center channel, and from the 70's to
the mid 90's most stereo films were mixed in a four channel format: Left,
Center, Right, and a single mono Surround channel. This is called "Dolby Stereo."
The trick to this format is that it's actually stored on just two tracks.
It takes a special encoding and decoding process to derive four separate channels
from the two tracks, so to distinguish it from regular stereo it's called
"Lt Rt" for "Left Total, Right Total." Most VHS tapes of popular films have
Lt Rt tracks, and stereo TV is broadcast in this format, so if you have a
Dolby Pro Logic amp and the right speaker setup, you can listen in this mode
at home.
Digital film release formats, including DVD, support "5.1" stereo which
improves on the Dolby Stereo format by giving us a stereo or "split" surround
and also a dedicated channel for very low frequencies (an LFE or "subwoofer"
channel.)
The problem is: there's no practical way to mix for these formats unless
you are listening in a room that has the correct number of speakers, properly
tweaked. As of this writing, your basic Final Cut Pro or Pro Tools LE workstation
is a "Lo Ro" environment.
How big a compromise is this? Well, good film mixers try to double-check
their 5.1 mixes against a "worst case scenario" and ideally create actual
re-mixes for formats with less headroom, fewer channels, etc. (This is called
"down-mixing".) For lower budget projects this is sometimes handled just
by doing "compromise transfers", taking the 5.1 mix and tweaking it only
slightly.
With a student project you have the opposite problem: you're taking a Lo
Ro downmix that will probably play mostly in Lo Ro settings (like an agent
or producer watching it on TV) and hoping it will play well in potentially
"upmix" settings like a festival theater. For a theatrical showing from Beta
SP or DV tape, your film will probably be run through a Dolby Stereo decoder.
That means that the decoder will be trying to derive 4 channels from your
two track mix. The basic rules for the decoder are:
So anything you panned dead center -- like dialog -- should end up playing
in the center channel. Anything that "leans" left or right -- like stereo
music -- should play in the left and right speakers only. Now in theory nothing
should go into the surround channels, but in practice you sometimes end up
with "magic surrounds" -- a bit of your stereo recordings may bleed into the
surround channel because of complicated phase relationships due to the original
stereo mike placements. (This happens a lot with professional mixes done
in this format, too. Sometimes it's a pleasing effect, sometimes not. Just
one of the trade-offs on the encoding/decoding process.)
Also be prepared to have your stereo mix sound "narrower" or somewhat more
"mono-ized." (Another case of reality vs theory.)
Note: To avoid having Lo Ro mixes decoded as if they are Dolby Stereo, DVDs
contain a little "flag" in the metadata that "tells" a Dolby Pro Logic decoder
whether the mix is Lt Rt or not.
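To get a feel for why center-panned dialog lands in the center channel and why out-of-phase material can leak into the surrounds, here is a very simplified sketch of a passive two-into-four matrix decode. This is an assumption-laden illustration, not the actual Dolby process: a real Pro Logic decoder adds active steering logic plus filtering and delay on the surround channel, none of which is modelled here, and the function name is made up for this example.

```python
import math


def passive_matrix_decode(lt: float, rt: float) -> dict:
    """Derive four channels from one Lt/Rt sample pair (passive sketch only)."""
    k = 1 / math.sqrt(2)  # -3 dB scaling on the sum and difference signals
    return {
        "left": lt,
        "right": rt,
        "center": k * (lt + rt),    # what the two tracks have in common
        "surround": k * (lt - rt),  # what differs between the two tracks
    }


# Dialog panned dead center: identical on both tracks, so nothing in the surround.
print(passive_matrix_decode(0.5, 0.5))

# Completely out-of-phase material: cancels in the center, goes to the surround.
print(passive_matrix_decode(0.5, -0.5))

# Typical stereo recording: mostly similar tracks, with a small difference
# signal that turns up as a "magic surround."
print(passive_matrix_decode(0.5, 0.3))
```

In this passive version a sound that is only on the left track also leaks into the center and surround at a reduced level; keeping it in the left speaker alone is the job of the steering logic in a real decoder.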
I want my movie to be as loud as possible -- how can I do that?
It's true that by using only 12 dB of the total 20 dB of headroom available,
the loudest sounds in your mix won't play quite as loud as the loudest scenes
in the latest Hollywood action movie -- but judging from surveys done of audience
complaints to theater owners, that's not necessarily a bad thing. (I'm not
necessarily trying to talk you out of your goal of a "loud movie", but let's
kick around the repercussions first.)
Before around 1993, movies were released to theaters in formats that only
had around 9 dB of headroom in four channels. This meant that the very loudest
scenes in a movie might reach around 103 dB SPL.
Now trust me: audiences in the 80's weren't complaining that movies like
"Top Gun" would be better if only they were louder.
But digital sound formats came along, and movies released with digital sound
have more headroom, so the loudest scenes can reach around 118 dB SPL. And
a difference of about 10 dB will strike most people as being twice as loud,
so this is a pretty dramatic difference.
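To put a rough number on that difference, here is a small sketch based on the "10 dB sounds about twice as loud" rule of thumb. The function name is invented for this example, and the rule itself is only an approximation of how people perceive loudness.

```python
def loudness_ratio(delta_db: float) -> float:
    """Rule of thumb: every +10 dB is heard as roughly twice as loud."""
    return 2 ** (delta_db / 10)


pre_digital_peak_spl = 103.0  # loudest scenes before digital release formats
digital_peak_spl = 118.0      # loudest scenes with digital sound

delta = digital_peak_spl - pre_digital_peak_spl
print(f"{delta:.0f} dB louder, or roughly {loudness_ratio(delta):.1f}x as loud")
```

By this estimate the 15 dB gap between the two peak levels works out to nearly three times as loud.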
There's nothing necessarily wrong with this picture so far: that extra headroom
was meant to be used for truly cataclysmic, awesome events, like "Death Stars
blowing up" and so forth. The problem is that Hollywood historically isn't
known for self-restraint -- just look at old posters hailing every cheesy
B movie as "colossal, stupendous, tremendous!" So you have half the producers
in town urging their mixers to make every two-bit "car crash effect" or music
cue louder, right up to the max.
It's a little "needy", isn't it?
But if that's your goal, the obvious solution is to use the SMPTE standard
of -20 dBFS as your reference level, then "mix to the max." Be aware though,
that a format like Beta SP won't "hold" those loud passages without serious
distortion -- you can't pour 20 ounces of water into a 12 ounce glass. So
if you're transferring your mix to Beta SP you should line up your reference
tone to "-8 VU" instead of "0 VU" -- which tends to defeat your purpose, as
your loudest sounds coming off the Beta deck will play just as loud as anyone
else's. (In fact your average dialog scenes will now end up sounding on the
faint side.)
If you output to DV tape then the medium can “hold” your loudest sounds
and you should carefully label it as having a reference level of -20 dBFS
and note where the maximum peaks fall. For LMU screenings, the playback level
will not be customized to your mix choices so if you have mixed unwisely (for
instance, if you have mixed typical dialog scenes much hotter than recommended)
it may be necessary to stop the tape before your project finishes screening,
just to spare the audience major discomfort.
The best solution would be to create a mix for alternate venues: one that
is “TV friendly” (LMU specs) and another that is “Theater friendly.” You could
use DVD as your release format, or if you have a big enough budget, have
a Dolby Digital 35mm print made. (And if you lean toward that approach, you
should look into doing a full 5.1 mix. That gets pretty involved, and is
way beyond the scope of this little handbook.)
Or you can content yourself with knowing that, even with Lo Ro playback
and the recommended mix levels, if the theater is properly adjusted your
loudest peaks should hit around 100 dB SPL.
HOW CAN I MAKE THE MIX SOUND BETTER?
Listen over good headphones or speakers, at an appropriate volume.
Seems like a pretty obvious statement. But what does "good" mean? And what is "appropriate"?
Basically, you want headphones/speakers that are accurate or have a "flat" frequency response. Avoid headphones that brag about "Mega Bass" and such, because they are designed to artificially pump up low frequencies. If you base all your mixing judgements on what you hear in the "Mega Bass Mode", you'll probably be disappointed when you hear the results played on properly designed speakers.
(Just as a personal aside, for headphones I've found that the Sony MDR-7506 types tend to provide a fair approximation of what you will hear on a good dubbing stage. They aren't cheap at around $90, but might prove to be a good investment.)
As for appropriate listening level, you should ideally listen in a room that has playback levels calibrated to match those of a good theater. That's not always possible, but you can use a reference tone or a sample soundfile to get your monitor level in the ballpark. (More about this topic.)
Note: If you're confident that your mix will be heard primarily in television viewing or home theater situations rather than a large theater, you'll probably want to cheat your monitor level down so that it more closely matches the typical listening situation. Most professionals mixing for television set their monitor levels about 6 dB lower than the standard level for theatrical film. (That translates to about a 79 dB sound pressure level instead of an 85 dB sound pressure level.)
Set dialog levels first, then music and effects.
This rule may not seem obvious at first, but suppose you were to set your main title music level before listening to anything else -- and you tended to push it toward the maximum level. Now suppose that same music cue continues and overlaps into your first dialog scene. You would probably end up pushing the dialog hard just to make it heard over the music. Then when the music ends and the dialog continues, you'd find yourself with an artificially "pumped up" dialog scene. If you follow that with a scene at normal levels, it will seem somehow out of balance, and you might be tempted to raise it as well, creating an entire mix that is slightly off-kilter.
Dialog in a sense serves as our point of reference for what is "normal loudness" in a mix. There's a certain range of loudness that we expect to hear given a certain image size on the screen: a big close-up sounds one way, a person standing in a wide shot sounds another. What I'm suggesting is that you get the dialog in the ballpark first and then adjust other elements relative to that.
For dialog scenes, avoid selecting music cues that clash with the dialog.
Sometimes music cues can create some unwelcome competition for your actors and the performances you are trying to impress on the audience. Just playing the music lower isn't always the answer; at times it's the nature of the music cue itself that is the problem. For one thing, some types of music are simply "designed" to play big -- Beethoven's 9th Symphony just doesn't sound right when it's pushed to the background.
Or sometimes the music has so much energy in the mid-range frequencies that it tends to drown out the dialog. This is due to a quirk of our hearing called "frequency masking." It's easier for our brains to sort out two equally loud sounds if they are in very different pitches. When the sounds are in the same range it's easier for one to "drown out" the other. Mid-range frequencies are especially important when it comes to understanding speech, so as a rule if you're selecting or composing music for a dialog scene it's best to avoid having lots of instruments working busily in the mid-range.
Sheer "busy-ness" is also a factor; if the music is in a fast tempo and highly melodic, part of our brain is trying to follow the complex melody line and this tends to compete with the dialog for our attention.
And for a stereo mix, since dialog typically plays from the center speaker, music that contains a lot of information assigned "dead center" tends to fight the dialog more than music which has a wider stereo spread.
The above are just general observations and you can obviously think of many exceptions to them. Just be aware that there are sometimes technical reasons why a particular music cue tends to muddy your dialog track.
When you have ambience mismatches in your dialog tracks, attack the problem with dialog editing tricks first.
I've noticed that students will often approach an ambience mismatch from one shot to another by making level or EQ adjustments. This may work to improve the offending "bump" on the cut -- but what is it doing to the dialog itself? Will it play at the proper level? Will it sound artificially "tweaked"?
It's usually best to address these problems through editing first. You can eliminate many problems just by good editing, and at least minimize others so you won't have to resort to major processing in the mix. Often this will result in more natural-sounding tracks than if you resorted strictly to mixing/processing tricks.
Remember that the audience is hearing the dialog for the first time; you know the words by heart, but they may need help understanding them.
So obvious it hardly needs saying, right? And yet you can probably think of more than one recent film where you couldn't quite make out a line and felt frustrated. (This is especially deadly in a comedy, when the audience really wants to get the joke.)
It's understandable that after spending months going back and forth over the editing, filmmakers can recite the entire film in a sleep-deprived dream-state -- and often do. And it's understandable that it's easy to lose perspective in the mix.
But smart filmmakers develop a knack for asking themselves such dumb questions as: can I understand what the actor is saying? If the answer is no, they look to fix it.
Learn to listen critically for elements that are out of balance or "stick out."
This is really the tough one, and I have no stunning insight to offer as to how you develop this ability, other than to listen to what is considered a good professional mix and then compare it to some not-so-good student film mixes. (You can also learn a lot listening to good student mixes and not so good major studio films.)
Foley in particular often needs careful adjustment in order to blend in with the rest of the track. Footsteps, for instance -- how do they sound in one mix as compared to the other? If you know something isn't quite right, can you pinpoint the difference? Are they louder? Too soft? Brighter, more trebly? Too bassy? Or maybe the scene is set in a big empty room and the foley sounds too "dry" and lacks reverb.
I'll risk a big generalization here: it seems that in a lot of student mixes foley tends to be mixed a bit too loud and "upfront." It may be a natural inclination -- "I went to all this trouble to record this stuff, I want to hear it" -- but the result can sound very unnatural. So in addition to playing the foley tracks lower, you may want to experiment with rolling off some of the high frequencies and adding a touch of reverb (when the space of the scene is appropriate.) This can help create a more realistic perspective to the sound.
Ambiences, too, are often mixed a bit on the hot side. Again, for a dialog scene, the dialog itself should give you a pretty good point of reference for giving other elements the appropriate weight.
Other times it's as simple as imagining yourself standing at the same location, right next to the camera, and imagining how something would sound.
If it's not broken, don't fix it. Avoid running your tracks through compressors, equalizers, etc, unless it's called for.
You can really get yourself in trouble by doubling and tripling processors and running the sound through a lot of "roundabout routes." Sometimes you can even have one processor trying to "undo" the processing of another. (For instance, if you've run sound through an expander in order to reduce some background noise, then do heavy compression on it, you can end up bringing a lot of the background noise back up.) And if there's still a problem with the overall sound, it just makes it that much harder to pinpoint the cause because you've introduced so many new variables into the signal path.
So please don't apply a processor simply by rote; approach the tracks case by case and decide when some special treatment is actually needed.
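The expander-then-compressor trap mentioned above can be shown with some simple level arithmetic. This is a sketch only: the thresholds, ratios, makeup gain, and the dialog and noise levels are all hypothetical numbers chosen for illustration, and the functions model idealized static gain curves rather than any particular plug-in.

```python
def downward_expander(level_db: float, threshold_db: float = -50.0,
                      ratio: float = 2.0) -> float:
    """Push anything below the threshold further down (2:1 here)."""
    if level_db >= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio


def compressor(level_db: float, threshold_db: float = -30.0,
               ratio: float = 4.0, makeup_db: float = 7.5) -> float:
    """Squeeze anything above the threshold, then add makeup gain to everything."""
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


dialog_db, noise_db = -20.0, -60.0  # hypothetical starting levels in dBFS

# Step 1: the expander leaves the dialog alone and drops the noise by 10 dB.
dialog_expanded = downward_expander(dialog_db)
noise_expanded = downward_expander(noise_db)

# Step 2: heavy compression with makeup gain lifts the noise most of the way back.
dialog_final = compressor(dialog_expanded)
noise_final = compressor(noise_expanded)

print(f"after expander:   dialog {dialog_expanded:.1f}, noise {noise_expanded:.1f}")
print(f"after compressor: dialog {dialog_final:.1f}, noise {noise_final:.1f}")
```

With these made-up settings the dialog ends up where it started, but the gap between dialog and noise shrinks from 50 dB after the expander to 42.5 dB after the compressor, which is nearly back to the original 40 dB.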
When you have to do processing, periodically use the bypass feature to compare the "new improved" version to the original. (It's easy to overdo the processing.)
This is closely related to the above. The bypass switch is a great tool for judging whether your "fix" is actually an improvement or whether you're better off leaving the sound alone.
Audiences respond to change. Provide some contrasts to keep your mix dynamic.
Successful entertainers understand this need for dynamics; musicians may do a series of increasingly intense songs and build to a sort of crescendo -- then follow that with a slow ballad to take the energy down before building it up again for the finale. By contrast, a strategy of trying to sustain one long peak may backfire and result in a kind of plateau. Even though in theory you've kept the intensity level constant, the audience feels let down.
Some films lend themselves better than others to big contrasts in sound -- suspense or action films, for instance. But even a relatively quiet dialog-centered film can benefit from little sonic changes-of-pace. Hopefully you've scripted and designed your film to create those opportunities.
Sure it helps if your sound is technically polished -- meaning that the recordings aren't unintentionally distorted or noisy, that the dialog sounds clear and natural, etc. But to really have an effective soundtrack, it's important that the sound conveys information that the audience cares about. If the dialog is hackneyed and irrelevant, the audience won't care if they can't make out a few words -- why should they? If the music doesn't take us anywhere interesting, it's really more of an annoyance than anything else. And if sound effects are merely used to cover the basics of "seeing a car, hearing a car" then a filmmaker really isn't working with the full potential of the medium.
One trick to get beyond the basic "two dimensional" treatment of sound and to really engage an audience, to make the sound matter to them, is to create story situations where the sound matters to the protagonist. Your main character should have a "stake" in the sounds going on both in the story and the film. We need to see the character actually listening to something and reacting to what they hear. And we need to hear it as well (although not necessarily at the same time as the character hears it.)
Suspense films provide obvious examples: a muffled conversation heard through a wall, the squeak of a footstep on a staircase, a doorknob being turned. Or: the deep creaks of metal under stress, sonar beeps, and the concussions of approaching depth charges.
It needn't be so melodramatic. You could imagine a comedy about a petty office rivalry where the sound of a Xerox machine might have great significance to one of the characters. The point is this: the soundtrack is not just a given, some minimum-daily-requirement of technical competence, but must be integral to the story. Oddly enough, an audience will overlook some lack of technical polish if they're really engaged and moved by the film.
Your job as a filmmaker is to find new and interesting ways to do that.
Copyright © 2003, 2004, 2005, 2007 by Rodger Pardee