copyright 2012, Jump Cut: A Review of Contemporary Media
Jump Cut, No. 54, fall 2012

Truth in the mix: Frederick Wiseman’s
construction of the observational microphone

by Giovanna Chesler

A version of this essay was first published in German in Frederick Wiseman, Kino des Sozialen (Ed. Eva Hohenberger, Vorwerk 8, 2009, pp. 139-155)

As he explores the connections between technological developments and documentary practice and form in Claiming the Real and Claiming the Real II, Brian Winston acknowledges the new threat to documentary posed by digital photographic manipulations and CGI:

“The diffusion of this technology…is taking decades. Nevertheless, it is not hard to imagine that, by the end of this process, every documentarist will have to hand, in their video-editing laptop, the wherewithal for complete fakery. Technology, by finally and irrevocably dissolving the connection between the image and the imaged, must therefore have a significant potential impact on the documentary film. The camera’s capacity to capture the real will not be erased by this, but a far greater sophistication on the part of the audience will be needed to determine documentary’s authenticity.” (2008, 9)

For me, this attention to digital image manipulation as the ultimate manipulative practice evidences a foregrounding of the documentary image in truth debates. Yet the soundtracks of documentary films, carefully constructed, invisibly edited, and easily manipulated, provide a bed of truth upon which documentarians stake their claims and construct their narratives. Soundtracks and their relationship to the image must be considered as a primary method for ‘fakery’ in documentary film. Yet, to attend to the manipulations afforded by sound editing, we need not condemn documentary practice as a whole. Indeed I shall rely upon Winston’s arguments from Claiming the Real to save our cherished practice.

Here I turn my attention to the mix of observational cinema, and to a filmmaker brilliantly emblematic of this mode of documentary, Frederick Wiseman. I shall explore the use of sound in two of his films: High School (1969), Wiseman’s portrait of the institutionalized power dynamics between students, teachers and parents of Northeast High School, a school situated in a middle-class neighborhood in Philadelphia, and Domestic Violence (2001), a study of the women and children’s domestic violence shelter The Spring in Tampa Bay, Florida. These films stand as prime examples of observational cinema, but I connect them here because sound carries striking thematic significance in both. I highlight the sound editing practices Wiseman employs, while attending to his use of music and volume in his explorations of gender and power within the institutions observed by these films.

“Mommy if you just be quiet”: dialogue editing for continuity

G R Levin: “During the shooting, you do the sound, and somebody else shoots. You’re the one, I take it, who chooses the scenes to shoot?”

Frederick Wiseman: “Right. And I work out signals with whomever I’m working with so that he knows the way I want to get the shot, and we talk very carefully both before, during and after the shooting about those stylistic things I like.” (Levin 1971, 318)

In Wiseman, we find a filmmaker who is both editor and sound recordist. At the moment of recording, Wiseman, tethered to the audio recording equipment, focuses on the word and determines what images the cinematographer captures based on the audio. The microphone becomes a laser pointer, illuminating places of interest within a living scene, calling upon the cinematographer to follow its line toward the object worth filming. In this moment of listening and recording, we must imagine Wiseman as editor as well, already considering the moments where the voice will be selected for the finished piece.

Wiseman’s filmmaking style falls within the tradition of observational documentary, often called Direct Cinema. This style of documentary production became popular with the advent of wireless synchronized sound recording technology and the movement away from 35mm toward 16mm cameras at the beginning of the 1960s. Makers employing this mode abandoned voice-over narration and evidentiary editing, privileging the spontaneous observation of lived experience (Nichols 2001, 110). The primacy of speech, recorded in the moment, altered the construction of documentaries:

“Synchronized sound affected editing style. The silent-film editing tradition, under which footage was fragmented and then reassembled, creating ‘film time,’ began to lose its feasibility and value. With speech, ‘real time’ reasserted itself.” (Barnouw 1993, 251)

In addition to structuring time, voice-over narration often provides the glue holding together documentary films. Though heavy handed and authorial, voice-over narration allows the documentary editor to connect disparate images and moments. In their commitment to representing ‘real time,’ observational makers are challenged to present scenes that convey a reality of time as lived without the straightforwardness of an authorial voice. To achieve the construction of cohesive scenes, observational makers turned to the tradition of continuity established during the Silent Era (Ruoff 1993, 28). In Cross Cultural Filmmaking: A Handbook for Making Documentary and Ethnographic Films and Videos, authors Barbash and Taylor translate the rules of continuity editing for a documentary audience:

“watch out for potential jump cuts…try to overlap action from shot to shot…try to match spatial relationships, screen position, and eyeline from shot to shot…try to match lighting conditions…try to maintain a consistent screen direction…resort to cutaways to avoid a jump cut if need be…last, but not least, don’t follow these prescriptions to the letter!” (1997, 389-397)

The style of observation in documentary challenges the film editor to create a story with a strong arc and to build scenes that have continuity and resolution which move the story arc forward.

In his earlier work, Wiseman began with approximately 90,000 feet of film, or 42 hours of raw material, and condensed this into 3,000 feet, or 1.5 hours, for the finished film (Wiseman in Atkins, 1976, 35 & 47). Presently, his films run between two-and-a-half and four-and-a-half hours. According to his own calculations, he uses about 3% of what he shoots in his final film (Grant 2006, xi). In the introduction to the published transcripts in 5 Films by Frederick Wiseman, Wiseman notes his challenges as an editor and the relationships between condensing time in documentary film and methods of fiction filmmaking:

“For me the making of a documentary film is in some ways the reverse of making a fiction film. With fiction, the idea for the film is transformed into a script by the imagination and work of the writer and/or director, which obviously precedes the shooting of the film. In my documentaries the reverse is true: The film is finished when, after editing, I have found its “script.” If a film of mine works, it does so because the verbal and pictorial elements have been fused into dramatic structure. This is the result of the compression, condensation, reduction, and analysis that constitute the editing process for me.” (Grant 2006, xi)

Though not explicit in his method, I believe that Wiseman’s emphasis on the “script” of the documentary suggests an approach to editing that begins with sound. I term this method of documentary editing sound-up construction and believe it to be popular amongst documentary editors, though not so named as a method. Authors of popular handbooks on documentary filmmaking advise makers to produce transcripts of recorded dialogue and create paper edits from those transcripts as a principal step in documentary film editing (Chapman 2007, Barbash and Taylor 1997, Rabiger 2004). In paper edits, sentences are strung together, time is condensed and sense is made of lived moments through the words connected on the page.[1] I do not wish to argue that editors of observational documentaries, including Wiseman, rely only upon the written word to construct their films. Rather, I wish to point to the role of sound as an architectural foundation of observational documentaries. For instance, later in the 5 Films introduction, Wiseman connects the use of dialogue with the other elements he considers as an editor, but again, dialogue is primary:

“My work as editor, like that of the writer of a fiction film, is to try to figure out what is going on in the sequence I am watching on the editing machine. What is the significance of the words people use, the relevance of tone or changes of tone, pauses, interruptions, verbal associations, the movement of eyes, hands, and legs?” (xi)

As the base of the documentary edit, words on paper can be moved with ease, but in the mix, rearranging of words happens in degrees. Technically, it is quite difficult to bring sound from one location to another location, as all sound recording carries with it a spatial signature, or mark of the location in which it was recorded (Altman 1988). But within one location during one episode of filming, sound proves malleable. Words, phrases, sentences and paragraphs may be moved around, allowing for a compression of time and an arrangement, or rearrangement, of a conversation. Wiseman’s sequences, which may range in length from two to ten minutes, typically build from medium close ups which privilege the person speaking at the time. This framing renders the speaker in a space that may not be understood until a wide shot establishes the other listeners and speakers in the room. Typically, the camera follows the conversation, panning between speakers, maintaining medium close ups on each. Dialogue editing and, thereby, the condensing of time, occurs when the medium close up is disrupted by a cutaway.

In High School, Wiseman uses the cutaway to maintain logical dialogue progression in the conversation between Dr. Allen and the student Michael, who fights for his principles by contesting the detention he has been given. A cutaway to Dr. Allen’s ring offers an opportunity to condense part of a conversation. In another moment, the technique of condensing time occurs between two edits as Michael inexplicably sits, and then returns to standing. The seated moment is identifiable through an analysis of the camera height and Dr. Allen’s eyeline match. However, the continuity in this conversation between Dr. Allen and Michael, and the pace of the dialogue editing, do not explicitly reveal that time has passed between these shots or that they have been rearranged and brought together sequentially. The words have been rearranged, yet they still convey the meaning Wiseman initially observed in the sequence. In the interests of time and dramatic action, Wiseman has made an editorial decision to link two separate moments.

As in High School, the cutaway in Domestic Violence serves to condense time and conversation. Cutaways function so that stories can be trimmed and shaped, while the meaning of the stories themselves and the impact of the teller’s experience remain intact. Additionally, cutaways serve to surprise the viewer in ways fitting with the themes of the film. In the second half of Domestic Violence, Wiseman gives more time for stories to unfold and the camera lingers on the seemingly irrelevant twists and turns in the telling of a story (indeed, these twists are never irrelevant; they demonstrate the way victims mask the worst parts of their experience). Eventually, the long take is disrupted as the image cuts to another woman sitting silently next to the speaker who has been commanding the floor of the session. Wiseman then cuts to another woman, also sitting silently nearby.

Through this style of editing and sound construction, Wiseman doubles, then triples, then quadruples the silent bodies within the ‘safe’ space of the group, noting that most have yet to speak. These silent women vary in their listening styles: some look away, others look toward. Some react while others are distantly listening to their own memories. In these segments, carefully over the course of the film, Wiseman expresses the repetitive silencing endured by victims of domestic violence. The cutaway, placed within an extended story and on top of the voice of a woman telling her story, both hides sound edits and, importantly, attends to the fact that most within this space are not able to speak…yet.

Wiseman’s techniques in sound editing preserve continuity when the image may suggest a break. During a group meeting on brainwashing techniques in Domestic Violence, an older woman with a whispery voice speaks to the ways in which she was silenced:

“He had my son and my daughter telling me, if you just shut your mouth. Mommy if you just be quiet. Mommy, don’t say anything. Mommy if you just stop. If you just be quiet you won’t get hit. But they don’t understand. Even if you’re quiet you’re going to get hit.”

The speech starts with a wide shot demonstrating the woman’s position in the room and the logics of space in this meeting. However, her mouth movement and voice are not in synch in this wide shot. The next cut is to someone else listening with a baby, and then to a medium close up on the speaker. Now her lips and words move together. The technique of replacing dialogue upon the moving mouth of a social actor typically appears in documentary during a wide shot where the mouth movement cannot be read with specificity. Dialogue replacement may also occur when the camera’s lens privileges a character’s cheek, hiding the social actor’s mouth. Though the mouth is turned and lips may not be read, the cheek’s movement suggests that words are articulated.

This is a common practice in editing narrative films: a turned cheek and a mouth blurred in a wide shot provide opportunities for automated dialogue replacement (ADR). Here we see Wiseman use the technique to cover the speaker when the camera was likely focused somewhere else or was being reloaded. In some cases, he uses this technique to change a character’s line of action, eliding a tangential story. This serves to condense time and space, particularly for Wiseman, who has a 30:1 shooting ratio.

Even with logical dialogue progression, how are these jumps in time (replaced dialogue and images) masked in the mind of the viewer? Arguably, the illusion of ‘real-time’ as discussed by Barnouw begins this process. As explored, constructing real-time depends upon dialogue editing that follows logical conversational progressions. Yet these edits must be concealed, and as makers borrowing from classical Hollywood traditions, observational documentarians work to hide the apparatus whenever possible. Voice-over narration, which aligns with the apparatus as it “becomes a ‘voice on high,’ …a voice which speaks from a position of superior knowledge, and which superimposes itself ‘on top’ of the diegesis” (Silverman 1988, 48), is discarded. Proximity between the viewer and the text becomes possible through the absence of narration. Wiseman has described this relationship, stating that he avoids narration so that “there’s no separation between the audience watching the film and the events in the film” (Wiseman in Atkins 1976, 43). In Representing Reality, Bill Nichols suggests proximity between the film text and the audience within this style of documentary film:

“Even in observational films like Primary, Jane or A Married Couple, and especially in works by Fred Wiseman like Hospital, High School, or Model, the strong sense of an indexical bond between what occurred in front of the camera and its historical referent draws us not only into the details of the everyday but also into the formulation of a perspective on these institutional domains of the real. We process the documentary not only as a series of highly authentic sounds and images that bear the palpable trace of how people act in the historical world, but as serial steps in the formation of a distinct, textually specific way of seeing or thinking.” (1991, 29)

By veiling the apparatus and relying on continuity, observational documentaries allow for the indexicality Nichols describes.[2] The seamlessness of the soundtrack builds a foundation for this indexicality: time represented as a flow of words, which progress logically, without (audible) breaks.

“Simple Simon Says”: ambience and music as narrator

Wiseman’s films are striking in that they do not build from the meaning translated through words alone, but through the suggestions and guidance provided by other components of his soundtracks beyond the voice. Though Wiseman elides voice-over narration in his films, his use of music in High School particularly, and the structuring of ambient tones in Domestic Violence, present a version of a narrator’s voice, guiding the development of the thesis of the film, and commenting on the actions of the films’ participants.

Ambient sound, typically picked up through an omnidirectional microphone, captures the whole of a sonic environment without privileging a specific sound source in a scene. These ambiences defy logics of listening practice as all sounds within a space are captured within a 360 degree area. Domestic Violence is structured through the placement of ambient tones which develop and change during the interludes between sequences. The film begins with the loud, nearly deafening roars of city highways in Tampa, Florida. Like Jacques Tati who used traffic patterns and deafening vehicular sounds in Mon Oncle to comment upon de-humanizing effects of modernity, Wiseman relies on a score of mechanical roars to open the film and establish a powerfully uncomfortable, dare I say patriarchal, tone that controls this community. In this ambient track, mass movements of air, represented by the tremendous low frequencies of traffic, chillingly contrast with the immobile, glassy office buildings that rise like masculine pillars of society and economy. Through a sound dissolve, the traffic sounds dissipate to the swoosh of a lone vehicle, a cop car, gliding through the greener, quieter space of a poor suburban neighborhood. The cop car takes the viewer to a house where a woman has been hit.

Soon thereafter, Wiseman returns to the violence of traffic sounds on city streets as he leads us into another scene where cops respond to a call related to domestic violence. In this scene, we meet a woman covered in blood who explains her injuries from within her darkened apartment. Indeed, the blood saturates her clothes and drips down her legs. An emergency worker gives her gauze to plug up her wounds, and she puts this gauze into her mouth and screams from behind it as she is moved from the apartment to the ambulance. As the voice of a victim traumatized by domestic violence, hers can become part of the urban ambience only when attenuated. Indeed, attending to the attenuation of victims’ voices within public space is the point of Domestic Violence.

Wiseman uses the ambient sound of The Spring, a shelter for women and children who are victims of domestic violence, to provide contrast with the exterior environment. Within The Spring, words drive the meaning as women tell of trauma and abuse during intake meetings with counselors, and as shocking statistics are shared.[3] But ambient spaces of the quiet hallways, sprinkled with children’s laughter, dictate the contrasting tone of this space from the world outside. The Spring’s corridors bring women together as they enter with bags of clothes and car seats loaded with children. At times these interstitials are noisier than at other moments, particularly as the children run through or as a group of elder women receive a tour of the facility. But always, voices within the corridors are clearly differentiated. Volume is expressed, not out of anger and not to control, but rather for glee or agreement.

By generating these shifts in volume association within the walls of The Spring, Wiseman dilutes the connection between volume and power / violence, as he had established at the opening of the film. Now volume builds to represent community, safety and recovery. In this vein, at the close of Domestic Violence, the sound of traffic returns. Here, traffic sounds are placed in the space of night and are merely swooshes of tires on empty streets. The ambience beyond the shelter cannot overpower the viewer as it once did, for it has been unpacked and interrogated.

In High School, diegetic music (music that appears to emanate from within the scene itself) functions as ambience, but has been arranged and orchestrated by Wiseman, as editor, to express the themes of the film and to build its thesis. Diegetic music in High School can rarely be read as such and at face value, save for a lesson on percussive instruments, and an assembly where boys, dragging as cheerleaders, move on stage to tunes played by a school marching band. Even in the classroom where a teacher plays a recording of Simon and Garfunkel’s “The Dangling Conversation” to her class, the music shifts from playing ‘diegetically’ to playing non-diegetically.

As the teacher begins to play the tape, the spatial signature of the room drops out of the recording altogether. The song plays while lacking the mark of its origination in the classroom. The montage edited with this song focuses on the tape itself, the teacher’s response (insecurity and pleasure in this teaching moment), and singles on the students, evidencing their interest and boredom. The music drifts into images in the hallway (a student leaning against the wall, someone dragging a box) and finally fades into another disciplinary moment in Dr. Allen’s office. As the music floats across space (lacking the mark of space in the recording itself) and motivates this montage, the song associates with the author of the film text and performs as a narrator of these images.

Similarly, at the start of High School, a car approaches the neighborhood surrounding the school, then the school itself. This introduction to the setting for the film is paired with Otis Redding’s tune “(Sittin’ on) The Dock of the Bay.” Later, music dictates the movements of girls in their gym class as the poppy song called “Simon Says” fills the soundtrack. In these cases, the music is masked as diegetic, seemingly emanating from within the scene itself (the car speakers and the speakers within the gymnasium, respectively). However, these songs indeed belong outside of the moment and have been placed by the editor of the film.

In her discussion of narrative film music, Claudia Gorbman considers the ways in which music is received by the viewer and the meaning music imparts based on its application in the film text. I argue that the use of music in observational documentary, like the use of continuity editing and dialogue editing, builds from traditions in narrative fiction film, and I cite Gorbman’s work here based on the connections outlined above between the formal traditions. Wiseman’s use of music in the opening of the film and in the gym sequence, like Fellini in Gorbman’s study, “deliberately blurs the line between diegetic and non-diegetic components of his filmic discourse” (1980, 197-8). Gorbman calls this questionable site of musical source “extradiegetic” and, further, “metadiegetic,” placing the music within a character’s memory or internal, subjective space.

“In reading music as metadiegetic or not, the issue is not its truth/falsity value—for music is not representational, and as such, cannot lie—but rather its connection to a secondary narrator at all. Although the question of “point-of-view music” demands rigorous analysis, we may agree that a metadiegetic reading depends on justification by narrative context and on other specific cinematic conventions.” (198)

The High School sequence in which the bodies of young women swing, move, and stretch to the beat of a tune whose lyrics ironically dictate order and obedience deserves attention. In his discussion of High School in 5 Films, Grant describes this scene:

“After questioning students on the telephone about their hall passes, the teacher looks through the window in a door as the pop song “Simple Simon” fades in on the soundtrack. The song’s authoritative commands—similar to much of what we hear throughout High School –reinforce the film’s overall view of the education process as impersonal and authoritarian. The film cuts from the teacher in the hallway peering through the window of the door into the gym, where girls in uniform are exercising to the song, and again we are encouraged to make a connection that the teacher was looking at this particular activity in the gym. Encouraged by both image and sound, viewers have often imputed a predatory sexuality to the teacher in the hallway, reading the camera’s emphasis on the girls’ buttocks as a point-of-view shot from the teacher’s perspective.” (2006, 6-7)

The teacher’s look through a window certainly frames the sequence as his point-of-view, as Grant suggests. But Wiseman does not return to this predatory observer during the extended sequence, which would solidify this connection through the shot-reverse shot visual structure heretofore employed in High School. Rather, the camera remains in the gym, further developing fragmented images of the girls’ bodies. The visual focus of the sequence builds around the girls’ midsections as the camera privileges their behinds and waists moving to the words “do it when Simon says, and you will never be out.” In a moment where the source of this music is called into question, so too are the point-of-view of this sequence and the presence of a second narrator, as described by Gorbman above.

Wiseman pays particular attention to the gendered subject in High School. Throughout the ‘day,’ students learn about proper gendered attire, reproduction, and sexed body parts via aural instruction. A teacher reprimands a girl for the inappropriately short length of the girl’s prom skirt. A group of female students are taught how to walk and carry their bodies, working around and against their awkward figures. These gender lessons appear through synch sound sequences, and incorporate the visual and aural structure we saw with Dr. Allen’s scenes. The teacher speaks, the students listen, and time is compressed through edits in the sound track, covered by close ups, to produce conversations with continuity. But in the Simon Says sequence, the visual structure of High School drastically shifts. There is pleasure to be had in the upbeat tempo of the music and in the humor of watching bodies without heads swinging to the beat. For, by removing the head, and particularly, by fragmenting the body, Wiseman removes the potential rendering of these girls’ subjectivity. The commanding music track, occupying all of the sonic space of this scene, presents both a heavy authorial voice articulating the loss of power by a group of students that is particular to gender, and simultaneously a male gaze deriving pleasure through this control.[4]

Kaja Silverman’s work on the female voice in the narrative fiction film illuminates conventions of sound in Classical Hollywood cinema which restrict female characters by distancing them from the filmmaking apparatus. Through multiple techniques, including desynchronization and silence, female characters are differentiated from male characters, and pushed to a recessed space, inside of the story of the film. Silverman argues that within this recessed space, female characters are separated from the author of the film text. They cannot function as narrators or give voice to the direction of the film (54). In Into the Vortex (2006), Britta Sjogren calls these strict gender-based assignments of the apparatus into question. She explores how the female voice in certain works of 1940s film noir reveals difference and contradiction. In High School, by fragmenting young female bodies and removing the possibility for speech (by removing their heads in the frame), Wiseman creates a contradiction: he aligns the viewer with the principal who looks into the scene while trolling the halls, yet his awareness of patriarchal, gendered proscriptions is clearly apparent throughout the text. While one may consider that this view of the objectified girls represents the gaze of the documentary filmmaker, I wish to conclude that through this scene Wiseman reflexively acknowledges the qualities of cinema that allow for this type of gendered objectification. More relevantly, this gaze is underscored and dictated by the soundtrack itself as the musical beat allows for this disruption in the otherwise consistent style of the film up to this point.

“I happen to be a gynecologist and get paid to do it”: volume and space

I wish to extend a discussion of the alignment of male characters with the apparatus by considering volume of voice. By volume, I mean, quite literally, the amplitude of the sound wave carrying the voice of male characters onto the soundtrack of the film. In moments, male voices in High School are reflexively aligned with the apparatus. We see this in the Simon Says sequence, where silencing consciously negates female subjects. By selecting these moments, Wiseman, as author, attends to a gendered difference in aural space.

Wiseman places two of the loudest moments in High School side by side, connecting them technically, but also thematically, as they both address constructions of masculinity. The first of the two outbursts in the sound mix occurs in the school assembly as a group of boys dragging as cheerleaders take the stage. A quiet moment precedes this scene as a teacher announces a scheduled discussion around Martin Luther King Jr.’s assassination. Wiseman emphasizes the racist undertones at the school by cutting away immediately to an auditorium bursting with revelry. In this moment, white male students wearing pigtailed wigs and tight fitting cashmere sweaters bedecked with sparkle pasties take to a stage, dismissing the call for dialogue and mourning. The sound track is full of high, mid and low frequencies of great amplitude as the boys prance and dance in front of their audience. A mass of sound fills the screening space as the performance continues, in what may be read as derogatory dragging. This gendered and racialized disruption is manifest through distortions in the sound track pushing at the boundaries of volume and audio control.

In the scene that follows, the sound of boys performing gender again fills the soundtrack. I wish to focus here on the scene wherein a visiting gynecologist speaks to an auditorium of boys using excerpts from Grant’s 5 Films transcript:

“[Cut to Medium Shot of gynecologist at microphone in front of auditorium filled with boys.]

Gynecologist: … [taking a written question.] ‘Is it possible to impregnate a girl by rubbing the surface of the vagina?’ With what? Your nose? No. [Laughter and applause from boys.] And I might add, this brings up one other good point. This brings up one other darn good point. Virginity is a state of mind. By that, I have seen several girls who have been physiologically, or by physical examination, virgins; the hymen, the mucous membrane covering the so-called, the cherry—it’s called the cherry because it produces red fluid when it’s busted—is intact. I have seen girls whose hymens were so small that I couldn’t pass a finger through them. [laughter] in fact, I once saw a girl—I happen to be a gynecologist and get paid to do it. [laughter and applause from boys]…” (82-3)

Upon the first joke, “With what? Your nose?”, the camera pans from the medium shot of the gynecologist to the audience. It takes some time for the boys to settle down aurally, though visibly we only see subtle movements in the crowd and boys getting up to hand in their questions scrawled on paper. Their building excitement is fostered by the performance of the excitable gynecologist. He prepares to deliver the word “cherry” seemingly knowing that he is breaking from script, and pronouncing a colloquial word often not allowed in this space. As he delivers this word, “cherry,” his expression shifts. He raises one eyebrow and twists his mouth with sly amusement. He pauses. As he delivers the next line, the tone of the gynecologist further shifts and he creates pauses between words for dramatic effect: “It’s called the cherry because it produces red fluid when it’s (pause) busted.” He carefully articulates this word, ‘busted,’ accentuating the letter “b.” This word, like “cherry,” is another departure from his professional, dry script. After the cherry / busted build-up, the boys have already begun to buzz. Buzzing continues as the gynecologist describes how he puts his fingers inside of a female patient to examine her.

Finally, upon the delivery of his punchline, “I happen to be a gynecologist and get paid to do it,” the room explodes with laughter and applause. The boys’ raucous response is communicated to the viewer solely through the audio track. We cannot see the boys, as the camera stays on the gynecologist. But this outburst represents one of the loudest moments in the film. It requires that the doctor then deliver his next few lines over laughs and one-liners from the boys. He attempts to return the room to order by performing a dry delivery of pre-scripted material. In fact, only when Wiseman cuts from the medium shot of the gynecologist to close-up shots of the boys listening does the room return to order. This moment, like Dr. Allen’s sequences, masks a sound edit and the removal of time from the moment in situ.

Volume and gender are central in Wiseman’s Domestic Violence. As I discussed earlier, Wiseman utilizes volume in ambient space to comment upon violence and recovery from violence. However, the volume of voice, particularly of the voices of those who have been abused, receives significant attention in the course of the film’s narrative arc. During numerous sessions and intake meetings, victims of domestic violence speak at length about their experiences. Often, as with most women traumatized by domestic violence, their speeches are delivered with suppressed emotion and numbness.[5] They present flat, affectless descriptions of being pushed into walls, punched, controlled, intimidated, yelled at and raped. Counselors have space and time to tell the women about cycles of domestic violence: just as their silence was taught to them through intimidation, so must they learn to speak here at The Spring. And through these lessons, feeling returns to their voices. Near the end of the film, one woman exclaims: “We were talking about this this morning. (yelling now) We’re allowed to use the tel-e-phone! We’re allowed to talk!”

Technically, Wiseman seems to employ two methods of sound recording in these group sessions. The recording of the women who have been abused and are residents at the shelter carries with it a spatial signature, which suggests that a boom mic or hand-operated shotgun microphone was used for audio pickup. At times, one recognizes a dip in volume as the microphone travels toward a new speaker (moving from off-mic to on-mic).

However, the voice of the counselor who leads the discussion is on-mic at all times and lacks a spatial signature, suggesting that she was recorded with a lavaliere microphone or a planted directional microphone on a separate channel. Because of this difference in recording technique, the quality of her voice varies drastically from the quality of the voices of the other women in the room. Most strikingly, when she speaks, Wiseman does not cut to an image of her speaking. Rather, we only hear her and see the women in the room listening to what she says. As Sjogren demonstrates, the female voice is not always contained and controlled by the apparatus, particularly when it is a disembodied voice-off and/or voice-over (2006). In applying Sjogren to this observational documentary, the disembodied voice-off of the counselor, devoid of spatial signature, positions her as an authorial voice more closely aligned with the apparatus. Interestingly, this counselor is also a woman, and her association with the technology of documentary production, in this instance, does not connect her with a masculine space of domination. Rather, her voice, resounding with spacelessness and on-mic delivery, becomes the message that must be repeated so that others may heal and so that the experiences of these particular women, and other abused women, may be understood, thereby suggesting Wiseman’s feminist apparatus.

Music masked as diegetic, dialogue rearranged and reconstructed, jump cuts supported by logical flows of words, and ambient sound emphasized and placed: these constitute manipulations common in the work of Wiseman and of the observational documentarians who study and work to emulate his techniques. Frederick Wiseman, considered by many the master craftsman of observational cinema, openly acknowledges the manipulations inherent in this mode of documentary production. However, Winston notes that while claiming subjective construction, Wiseman speaks from both sides of his mouth, simultaneously arguing that he is editing with purpose but insisting the viewer find meaning in the text for themselves (2008, 163).

“The claim now was that it is the film-maker’s subjectivity that is being objectively recorded. In this, though, direct cinema films remain evidence of something – the film-maker’s ‘witness’.” (2008, 164)

For Winston, this denouncement of objectivity by the direct-cinema filmmaker, made while simultaneously claiming that the film reflects the filmmaker’s point of view, is a “profound contradiction” that cannot undo the films’ signing as objective, realistic evidence (163-165). Winston looks to a moment in Robert Drew’s Primary where synch was faked as a moment when audiences were made to consider the film as scientific evidence, not construction and mediation (151).

In the tradition of direct cinema that continues today, this moment, and this technique, is indeed primary. Wiseman’s films continue to rely on and benefit from the invisibility and malleability of the audio track. As such, sound editing provides the foundation in several documentary styles, be it voice-over narration enveloping and guiding an entire film, or the invisible sound edits in observational works linking sentences that never were, nor belonged, together. In practice, at the stage of post-production, documentary filmmakers employ transcribers to visualize the word. We highlight, cut and paste, and rearrange these typed words into scintillating dialogue and cohesive interviews. Further, by arranging selections of music of the period and place, accentuating and arranging ambient tones, and adjusting and managing volume, Wiseman asserts his critique of a particular institution and his recognition of social trauma. I do not condemn Wiseman and other documentary makers who have built upon his techniques for taking such liberties. In Bill Nichols’ terms, these are storytellers “representing reality.” Like Winston, I see that selection, arrangement and manipulation are a necessary part of the craft. Winston charges audiences to “embrace an understanding of the inevitable mediations of the film-making process” (2008, 289). As filmmakers, we must also acknowledge how sound editing, and the manipulations therein, are central to the documentary process, and look to Wiseman’s sound editing practice as a guide that has influenced much of documentary production, from the lessons we witnessed early on in High School through his later films.



1. While foregrounding this method of documentary editing, Rabiger warns against relying on the written word entirely: “I should warn that using transcripts too literally also has some dangers. It can lead you to place too much emphasis on words and thus to making a speech-driven film.” (2004, 417)

2. In Invisible Storytellers, Sarah Kozloff deems observational documentary films engrossing and engaging because of the absence of voice-over narration. But she stipulates:

“In film, then, while there are major differences between having the camera capture an action and having a narrator describe that action, the ideal of blissful communion between the viewer and some untouched, untainted reality presented by a completely neutral mechanism is an illusion.” (14)

3. During a tour of the facility, the guide indicates that one in three American women is abused during a relationship, and that

“the F.B.I., which is not a feminist organization, statistics suggest that it is more like one in two women will be physically abused.”

4. As may be assumed, I rely here on Laura Mulvey’s discussion of the male gaze, which derives pleasure from objectifying the female body.

5. Many studies of post-traumatic stress disorder in chronically traumatized people, from victims of prolonged captivity to women affected by domestic violence, speak to the inhibition of emotion. For brevity’s sake, I suggest these: Herman 2004 and Elkin, Newman, Carter and Zaslav 1999.

6. A documentary editor with whom I work calls this sound edit a “Franken-byte.”


Altman, Rick. “Material Heterogeneity of Recorded Sound.” Sound Theory / Sound Practice, [TELL CITY] Indiana: Routledge, 1988, 13-31.

Atkins, Thomas R. Frederick Wiseman. New York: Monarch Press, Simon & Schuster, Inc., 1976.

Barbash and Taylor. Cross-Cultural Filmmaking: A Handbook for Making Documentary and Ethnographic Films and Videos. Berkeley: University of California Press, 1997.

Barnouw, Erik. Documentary: A History of the Non-Fiction Film. New York: Oxford University Press, 1993.

Benson & Anderson. Reality Fictions: The Films of Frederick Wiseman. 2nd Edition. Carbondale: Southern Illinois University Press, 2002.

Chapman, Jane. Documentary in Practice: Filmmakers and Production Choices. Cambridge: Polity Press, 2007.

Elkin, G. David, Emily Newman, Cameron S. Carter, and Mark Zaslav. “Post-Traumatic Stress Disorder.” Introduction to Clinical Psychiatry. Stamford: Appleton & Lange, 1999, 101-112.

Gorbman, Claudia. “Narrative Film Music.” Yale French Studies. No. 60, Cinema/Sound, 1980, 183-203.

Grant, Barry Keith. 5 Films by Frederick Wiseman: Titicut Follies, High School, Welfare, High School II, Public Housing. Berkeley: University of California Press, 2006.

Herman, Judith. “From Trauma and Recovery: The Aftermath of Violence from Domestic Violence to Political Terror.” Violence in War and Peace: An Anthology. Ed. Nancy Scheper-Hughes and Philippe I. Bourgois. Malden: Blackwell Publishing, 2004, 368-371.

Kozloff, Sarah. Invisible Storytellers: Voice-over Narration in American Fiction Film. Berkeley: University of California Press, 1988.

Levin, G. Roy. “Frederick Wiseman.” Documentary Explorations. Garden City: Doubleday & Co., Inc., 1971.

Mulvey, Laura. “Visual Pleasure and Narrative Cinema.” Screen. Vol. 16, No. 3, Autumn 1975, 6-18.

Nichols, Bill. Introduction to Documentary. Bloomington: Indiana University Press, 2001.

Nichols, Bill. Representing Reality: Issues and Concepts in Documentary. Bloomington: Indiana University Press, 1991.

Rabiger, Michael. Directing the Documentary: Fourth Edition. Burlington: Focal Press, 2004.

Ruoff, Jeffrey. “Conventions of Sound in Documentary.” Cinema Journal. Vol. 32, No. 3, Spring 1993, 24-40.

Silverman, Kaja. The Acoustic Mirror: The Female Voice in Psychoanalysis and Cinema. Bloomington: Indiana University Press, 1988.

Sjogren, Britta. Into the Vortex: Female Voice and Paradox in Film. Urbana and Chicago: University of Illinois Press, 2006.

Winston, Brian. Claiming the Real: The Documentary Film Revisited. London: British Film Institute, BFI Publishing, 1995.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.