Silvan Tang’s Thesis Paper for the NYU Music Tech Master’s Program

Emotional Perception and Listener Preference:
Comparing MIDI-Sampled Strings and Studio-Recorded String Quartets in Pop Music Production

  • This thesis investigates how listeners perceive emotion and realism in string ensemble music produced through two contrasting methods: studio-recorded string quartet performance and advanced MIDI-sampled emulation. Building on the rapid expansion of digital production technologies in contemporary pop and cinematic-pop music, the study examines whether MIDI-generated strings, enhanced through velocity, modulation, and expression control, can reproduce or surpass the emotional authenticity of live recordings. A mixed-methods experimental design was employed, incorporating 24 stimuli across two emotional categories (joy and sorrow), each presented in four production versions: one live recording and three progressively expressive MIDI variants. Nineteen participants from a music-technology background completed blind emotional-perception ratings, focused listening tasks, and post-reveal reflections, supplemented by qualitative interviews. Results indicate that while studio recordings consistently achieved the highest ratings in naturalness and emotional intensity, certain MIDI versions, particularly those emphasizing modulation shaping, elicited comparable or even greater listener engagement depending on register and phrasing context. Participants with stronger production backgrounds demonstrated heightened sensitivity to timbral detail, articulation, and spatial cues, whereas general listeners responded primarily to global contour and emotional clarity rather than production source. The study concludes that although contemporary MIDI technologies can closely approximate expressive string performance, live recordings still maintain advantages in micro-timbral variation, phrasing authenticity, and perceived emotional depth. These findings provide insight into the aesthetic boundaries of digital orchestration, inform future improvements in expressive MIDI mapping, and offer implications for hybrid workflows in modern music production.

  • This research explored how listeners perceive emotional authenticity in, and form preferences between, MIDI-sampled strings and studio-recorded string quartets across two emotional contexts, sadness and joy. Combining quantitative ratings with qualitative interviews, the study aimed to understand how expressive control, musical structure, and production method interact to shape emotional perception in contemporary string production.

    Contrary to the original hypothesis, the findings revealed no single, universal answer to whether one format, MIDI or recorded, is superior to the other or can replace it. Instead, the results demonstrated an open-ended and condition-dependent relationship, where emotional perception and preference are influenced by a network of factors, including production technique, note design, expressive range, listener expectation, and prior experience. These interacting variables created unexpected biases and diverse perceptual outcomes that defy a simple binary conclusion.

    In this research, the sad category, featuring long legato phrases and narrower melodic motion, showed that MIDI strings with Velocity + Modulation + Expression could convey emotionally rich, authentic, and nuanced performances, closely approximating the depth of live recordings. In contrast, the joy category, characterized by shorter and more rhythmically dynamic notes, revealed a technical limitation: because the notes were so brief, expression and modulation automation often had insufficient time to take effect, resulting in performances that felt less vivid or naturally articulated. This structural constraint partly explains why recorded versions consistently outperformed MIDI versions in the joy group, where live players could still express micro-level dynamics and articulation details that MIDI systems could not fully reproduce in such short timeframes.
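The timing constraint described above can be illustrated with a minimal sketch. It assumes a simple linear ramp of a MIDI continuous-controller value (for example CC11, expression) that starts at note onset. The tempo, ramp time, and controller values are hypothetical illustration values, not measurements or settings from this study, and the function names are invented for this example.

```python
def beats_to_seconds(beats: float, bpm: float) -> float:
    """Convert a note length in beats to seconds at the given tempo."""
    return beats * 60.0 / bpm


def ramp_fraction_reached(note_duration_s: float, ramp_time_s: float) -> float:
    """Fraction (0.0 to 1.0) of a linear controller ramp completed before the note ends."""
    if ramp_time_s <= 0:
        return 1.0  # an instantaneous ramp is always complete
    return max(0.0, min(1.0, note_duration_s / ramp_time_s))


def cc_value_at_note_end(start_cc: int, target_cc: int,
                         note_duration_s: float, ramp_time_s: float) -> int:
    """Controller value reached at note end, clamped to the MIDI range 0..127."""
    fraction = ramp_fraction_reached(note_duration_s, ramp_time_s)
    value = round(start_cc + (target_cc - start_cc) * fraction)
    return max(0, min(127, value))


# Hypothetical settings: 120 BPM, a 0.5 s swell from CC value 40 to 120.
BPM = 120.0
RAMP_TIME_S = 0.5

long_note_s = beats_to_seconds(4.0, BPM)    # whole note: 2.0 s
short_note_s = beats_to_seconds(0.25, BPM)  # sixteenth note: 0.125 s

print(cc_value_at_note_end(40, 120, long_note_s, RAMP_TIME_S))   # long legato note
print(cc_value_at_note_end(40, 120, short_note_s, RAMP_TIME_S))  # short rhythmic note
```

Under these assumed values, the long note lasts four times the ramp length, so the swell completes, while the sixteenth note ends after only a quarter of the ramp. This is one way to picture why automation on short notes may have little audible effect.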

    These results collectively emphasize that emotional realism depends on the interplay between expressive control and musical context, rather than production format alone. Recorded strings maintain an advantage in timbral complexity, vibrato variation, and spontaneous articulation, while expressive MIDI programming offers flexibility and precision that can achieve convincing emotion under suitable structural conditions, particularly for slower, legato-based material.