Multimodal Interaction Analysis (Final Project)

During this seminar, we looked at data from different perspectives by not treating oral discourse as the primary channel of communication, which echoes Ochs’s (1979) point that “transcription is a selective process reflecting theoretical goals and definitions” (p. 44). Conveying the different modalities in an interaction is complicated and messy. When choosing to do so, however, the researcher must select the most salient modes of communication, knowing that this selection implicitly claims that those modes are more important than others.

For the final project, I wanted to investigate how postural configurations display alignment between interlocutors in order to construct specific “social identities” (Ochs, 1993). According to Ochs, individuals in interaction perform certain social acts or verbally display certain social stances in order to construct a specific social identity. (I go into more detail about this here.)

Objective: Background the oral discourse between interlocutors in a conversation and place at the forefront the combination of the interlocutors’ postural configurations, body movements, and gaze in the maintenance of the interactive space, or F-formation (Kendon, 1990).

The data: For this project, I recorded an hour-long exercise class at a gym which included a coach (Beth), her two clients (Pam and Gale), and me as an observer (Nick). Of the one-hour recording, I analyzed a 3.5-minute segment. As a point of comparison, transcribing the oral discourse for this segment took 45 minutes due to the number of interlocutors (four). However, when I took the different modalities into account, it took me 12 additional hours to transcribe.

Thus, a transcript which would normally look like this:


Figure 1. Oral transcription

looks like this:


Figure 2. Multimodal transcription

Methodology: To record the hour-long class, I used two GoPro H5s (video-recording) and one Zoom H5 (audio-recording). Figure 3 illustrates the gym layout and the location of each recording device.


Figure 3. Gym schematic

Taking into account the large open space of the gym, I set up the two video-recording devices on opposite sides of the gym and an audio-recording device in the center. Video-recording device 1 was set to wide-angle lens mode while video-recording device 2 was set to a narrow mode. The dotted lines represent the areas within the frames of each camera. I placed the audio-recording device in the center to maximize audio recording redundancy. For my analysis, I relied on video-recording device 2 because the participants faced that direction.

Typically, at the beginning of class, members congregate in front of the whiteboard to read the workout of the day. Thus, I oriented both cameras to include the whiteboard and the space in front of it. When the first two clients arrived, they did as I anticipated and walked up to the whiteboard to read the workout while discussing it. When the clients began their warm-up prior to the workout, they remained in the frame of both cameras as well as in proximity to the audio-recorder. However, during the workout, the audio recording was not consistent because the clients moved farther from one another and went on runs outside the gym.

To develop the transcript, I first transcribed the oral discourse in Microsoft Word because doing so would make it easier to transcribe the corresponding gaze and gestures later. After I finished typing the transcription, I created a horizontal “scroll-like” document in Adobe Photoshop. In the scroll, I included six rows. The first four rows depicted the interactive contributions of each interlocutor. The fifth row included illustrations of the changing F-formation system, and the sixth row contained ambient influences.

Transcription convention: As mentioned above, my objective was to background oral discourse in order to focus on other modes of communicating. To do so, I used different colors to represent direction of gaze and gestures.


Figure 4. Transcription convention

Each participant was assigned a color. For example, purple is Pam (for ease, I replaced each participant’s name with one that begins with the first letter of the assigned color). Within each row, there are three sub-rows for discourse, gaze, and gestures. To illustrate gaze, I transcribed the color of the recipient of the gaze. In Figure 4, Pam’s gaze is depicted by a solid gray line, which represents Gale. She then re-orients her gaze to the whiteboard, which is represented by a black outline. For the third sub-row, for simplicity (elsewhere, I’ve seen gestures drawn), I annotated the movement in plain language and then included a colored bar to indicate the duration of that movement. In the next row, Gale’s, we can see that while Pam is speaking to her, Gale’s interactive contribution is primarily gaze.
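To make the row and sub-row layout concrete, here is a minimal sketch of how one participant’s row could be represented as data. This is only an illustration, not how the transcript was actually built (that was done by hand in Photoshop). The class names, the `gaze_recipient_at` helper, and all timings are hypothetical; only the participants, colors, and sub-rows come from the convention described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One annotated stretch of time within a sub-row."""
    start: float  # seconds from the start of the segment (invented values)
    end: float
    value: str    # words spoken, gaze recipient, or plain-language gesture note


@dataclass
class ParticipantRow:
    """A participant's row with its three sub-rows."""
    name: str
    color: str
    discourse: List[Span] = field(default_factory=list)
    gaze: List[Span] = field(default_factory=list)
    gestures: List[Span] = field(default_factory=list)

    def gaze_recipient_at(self, t: float) -> Optional[str]:
        """Return who or what this participant is looking at at time t."""
        for span in self.gaze:
            if span.start <= t < span.end:
                return span.value
        return None  # no gaze annotated at this moment


# Pam looks at Gale, then re-orients to the whiteboard (timings are invented).
pam = ParticipantRow(
    name="Pam",
    color="purple",
    gaze=[Span(0.0, 2.5, "Gale"), Span(2.5, 4.0, "whiteboard")],
)

print(pam.gaze_recipient_at(1.0))  # Gale
print(pam.gaze_recipient_at(3.0))  # whiteboard
```

Storing each sub-row as a list of timed spans mirrors the colored bars in the scroll, where the length of a bar indicates how long a gaze or movement lasts.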

The fifth row illustrates the F-formation systems (Kendon, 1990). Kendon defines F-formation as the “maintenance and control” of an “o-space” (p. 81), the interactive space between participants. The F-formation system refers to the behaviors that participants in an interaction exhibit in order to maintain such F-formations. Figure 5 depicts a basic F-formation.


Figure 5. The F-formation system

The o-space refers to the interactive space maintained and controlled by the participants. It is the space between participants in an interaction. The p-space is the space allocated for the participants of the interaction to move around in. The r-space serves as a “buffer” for the protected o-space and accounts for outside influences as well as a space for incoming participants. Taking these three spaces into consideration, I developed my own illustration of the F-formation for the transcript, exemplified in Figure 6 below.


Figure 6. New F-formation system

While the F-formation system in Figure 5 accounts for the bodily configurations of each interlocutor (their placement in the p-space), my version in Figure 6 accounts for the varieties of interactive contributions. In other words, in Figure 6, Gale (represented by the gray moon icon) and I (represented in black) are actively maintaining the interactive space through either oral discourse or body language. This active engagement is represented by the overlapping directional triangles. Meanwhile, Beth (blue) and Pam (purple) passively maintain the space with fewer modalities, i.e. solely gaze, which is illustrated by the red circle.
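The active/passive distinction can be sketched as a simple rule. This is my own simplification for illustration: the function name, the modality labels, and the “none” fallback are hypothetical, and the modalities assigned to each person in the example are invented rather than read off the transcript.

```python
from typing import Set


def engagement(modalities: Set[str]) -> str:
    """Classify how a participant maintains the interactive space at one moment."""
    # Assumption: oral discourse or body language counts as active maintenance;
    # gaze on its own counts as passive maintenance.
    active_modalities = {"oral discourse", "body language"}
    if modalities & active_modalities:
        return "active"
    if "gaze" in modalities:
        return "passive"
    return "none"  # hypothetical fallback for no observed contribution


# Invented example inputs, loosely following the roles described for Figure 6.
print(engagement({"oral discourse", "gaze"}))  # active
print(engagement({"gaze"}))                    # passive
```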

Putting all this together, we have Figure 7.


Figure 7. Putting it together

According to F1, Pam’s orientation is shifting from Gale to Nick, which also corresponds with the gaze transcription in her row. At the same time, Gale is moving her attention from Pam to Nick. Beth’s orientation shifts between Nick and Gale while Nick maintains his attention on Beth, as her speech is directed toward him. In addition to the interactive activity represented in red, the bodily configuration in the p-space is shifting, which is represented by the overlaid shades of each individual’s icon. In F2, Nick is depicted moving toward the o-space. F2 captures the formation of a new F-formation system, which then shifts immediately into F3 to depict the active engagement of two interlocutors, Nick and Gale.
