
From the Machine: Realtime Networked Notation

The possibility of instantiating realtime compositional intelligence in machines holds the most radically transformative promise for a paradigmatic shift in music in the years ahead.

Written By

Joseph Branciforte

Last week, we looked at algorithms in acoustic music and the possibility of employing realtime computation to create works that combine pre-composition, generativity, chance operations, and live data input. This week, I will share some techniques and software tools I’ve developed that make possible what might be called an interactive score. By interactive score, I mean a score that is continuously updatable during performance in response to a variety of realtime inputs. Such input might be provided from any number of digitized sources: software user interface, hardware controllers, audio signals, video stream, light sensors, data matrices, or mobile apps; the fundamental requirement is that the score be able to react to input instantaneously, continuously translating fluctuations in input data into musical notation that is intelligible to performers.

THE ALGORITHMIC/ACOUSTIC DIVIDE

It turns out that this last requirement has historically been quite elusive. As early as the 1950s, composers were turning to computer algorithms to generate and process compositional data. The resultant information could either be translated into traditional music notation for acoustic performance (in the early days, completely by hand; in later years, by rendering the algorithm’s output as MIDI data and importing it into a software notation editor) or realized as an electronic composition. Electronic realizations emerged as perhaps the more popular approach, for several reasons. First, by using electronically generated sounds, composers gained the ability to precisely control and automate the timbre, dynamics, and spatialization of sound materials through digital means. Second, and perhaps more importantly, by jettisoning human performers—and thus the need for traditional musical notation—composers were able to reduce the temporal distance between a musical idea and its sonic realization. One could now audition the output of a complex algorithmic process instantly, rather than undertake the laborious transcription process required to translate data output into musical notation. Thus, the bottleneck between algorithmic idea and sonic realization was reduced, fundamentally, to the speed of one’s CPU.

As computation speeds increased, the algorithmic paradigm was extended to include new performative and improvisational possibilities. By the mid-1970s, with the advent of realtime computing, composers began to create algorithms that included not only sophisticated compositional architectures, but also permitted continuous manipulation and interaction during performance. To take a simple example: instead of designing an algorithm that harmonizes a pre-written melody according to 18th-century counterpoint rules, one could now improvise a melody during performance and have the algorithm intelligently harmonize it in realtime. If multiple harmonizations could satisfy the voice-leading constraints, the computer might use chance procedures to choose among them, producing a harmonically indeterminate—yet, perhaps, melodically determinate—musical passage.

This is just one basic example of combining live performance input with musically intelligent realtime computation; more complex and compositionally innovative applications can easily be imagined. What is notable with even a simple example like our realtime harmonizer, however, is the degree to which such a process resists neat distinctions such as “composition”/“performance”/“improvisation” or “fixed”/“indeterminate.” It is all of these at once, it is each of these to varying degrees, and yet it is also something entirely new. Realtime computation and machine intelligence signal a new era in music composition and performance, one in which novel philosophical questions might be raised and answered. I would argue that the possibility of instantiating realtime compositional intelligence in machines holds the most radically transformative promise for a paradigmatic shift in music in the years ahead.

All of this, of course, has historically involved a bit of a trade-off: composers who wished to explore such realtime compositional possibilities were forced to limit themselves to electronic and virtual sound sources. For those who found it preferable to continue to work exclusively with acoustic instruments—whether for their complex yet identifiable spectra, their rich histories in music composition and performance, or the interpretative subtleties of human performers—computer algorithms offered an elaborate pre-compositional device, but nothing more.[1]

BRIDGING THE GAP

This chasm between algorithmic music realized electronically (where sophisticated manipulation of tempi, textural density, dynamics, orchestration, and form could be achieved during performance) and algorithmic music realized acoustically (where algorithmic techniques were only to be employed pre-compositionally to inscribe a fixed work) is something that has frustrated and fascinated me for years. As a student of algorithmic composition, I often wished that I could achieve the same enlarged sense of compositional possibility offered by electronically realized systems—including generativity, stochasticity, and performative plasticity—using traditional instruments and human performers.

This, it seemed, hinged upon a digital platform for realtime notation: a software-based score that could accept abstract musical information (such as rhythmic values, pitch data, and dynamics) as input and convert it into a readable measure of notation. The notational mechanism must also be continuously updatable: it must allow a composer’s live data input to change the notation of subsequent measures during performance. Here it must strike a balance between temporal interactivity for the composer and readability for the performer, since most performers are accustomed to reading at least a few notes ahead in the score. Lastly, the platform must be able to synchronize notational outputs for two or more performers, allowing an ensemble to be coordinated rhythmically.

Fortunately, technologies do now exist—some commercially available and others that can be realized as custom software—that satisfy each of these notational requirements.

I have chosen to develop work in Cycling ’74’s Max/MSP environment, for several reasons. First, Max supports realtime data input and output, which provides the possibility of transcending the merely pre-compositional use of algorithms. Second, two third-party notation objects—bach.score[2] and MaxScore[3]—have recently been developed for the Max environment, which allow numerical data to be translated into traditional (as well as more experimental forms of) musical notation. For years, notation remained a glaring inadequacy in Max, as the native objects provide nothing beyond the most basic notational support. Third, Max has several objects designed to facilitate communication among computers on a local network; although most of these objects are low-level in their implementation, they can be coaxed into forming a lightweight, low-latency, and relatively intelligent computer network with some elaboration.

REALTIME INTERACTIVE NOTATION: UNDER THE HOOD

Let’s take a look at the basic mechanics of interactive notation using the bach.score object instantiated in Max/MSP. (For those unfamiliar with the Max/MSP programming environment, I will attempt to sufficiently summarize/contextualize the operations involved so that this process can be understood in more general terms.) bach.score is a user-interface object that can be used to display and edit musical notation. While not quite as robust as commercial notation software such as Sibelius or Finale, it features many of the same operations: manual note entry with keyboard and mouse, clef and instrument name display, rhythmic and tuplet notation, accidentals and microtones, score text, MIDI playback, and more. However, bach.score’s most powerful feature is its ability to accept formatted text messages to control almost every aspect of its operation in realtime.

To take a basic example, if we wanted to display the first four notes of an ascending C major arpeggio as quarter notes in 4/4 (with quarter note = 60 BPM) in Sibelius, we would first have to set the tempo and time signature manually, then enter the pitches using the keyboard and mouse. With bach.score, we could simply send a line of text to accomplish all of this in a single message:

(( ((4 4) (60)) (1/4 (C4)) (1/4 (E4)) (1/4 (G4)) (1/4 (C5)) ))

example 1:

And if we wanted to display the first eight notes of an ascending C major scale as eighth notes:

(( ((4 4) (60)) (1/8 (C4)) (1/8 (D4)) (1/8 (E4)) (1/8 (F4)) (1/8 (G4)) (1/8 (A4)) (1/8 (B4)) (1/8 (C5)) ))

example 2:

Text strings are sent in a format called a Lisp-like linked list (llll, for short). This format uses nested brackets to express data hierarchically, in a branching tree-like structure. This turns out to be a powerful metaphor for expressing the hierarchy of a score, which bach.score organizes in the following way:

voices > measures > rhythmic durations > chords > notes/rests > note metadata (dynamics, etc.)

The question might be raised: why learn an arcane new text format and be forced to type long strings of hierarchically arranged numbers and brackets to achieve something that might be accomplished by an experienced Finale user in 20 seconds? The answer is that we now have a method of controlling a score algorithmically. The process of formatting messages for bach.score can be simplified by creating utility scripts that translate between the language of the composer (“ascending”; “subdivision”; “F major”) and that of the machine. This allows us to control increasingly abstract compositional properties in powerful ways.
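
By way of illustration, here is a minimal sketch, written in Python rather than Max, of the kind of translation such a utility script performs; the chord table and function names are my own, but the output follows the bracketed format shown above:

# Minimal sketch of a "composer vocabulary -> bach-style message" utility.
# The chord table and function names are illustrative, not the actual scripts
# used in the patches described here.
CHORDS = {"major": [0, 4, 7, 12], "minor": [0, 3, 7, 12]}   # semitones above the root
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(midi):
    """Convert a MIDI note number to a name like C4 (60 = middle C)."""
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

def arpeggio_measure(root_midi, chord="major", duration="1/4", tempo=60, meter=(4, 4)):
    """Build one measure of an ascending arpeggio in the bracketed format above."""
    pitches = [root_midi + step for step in CHORDS[chord]]
    notes = " ".join(f"({duration} ({note_name(p)}))" for p in pitches)
    return f"(( (({meter[0]} {meter[1]}) ({tempo})) {notes} ))"

print(arpeggio_measure(60))
# -> (( ((4 4) (60)) (1/4 (C4)) (1/4 (E4)) (1/4 (G4)) (1/4 (C5)) ))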

Let’s expand upon our arpeggio example above, and build an algorithm that allows us to change the arpeggio’s root note, the chord type (and corresponding key signature), the rhythmic subdivision used, and the arpeggio’s direction (ascending, descending, or random note order).

example 3:
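
For readers who prefer text to patch cords, the logic behind this patch can be roughly sketched as follows (again in Python, with parameter names and a chord dictionary of my own invention; the actual algorithm is built from Max objects):

import random

# Illustrative interval sets; key-signature handling is omitted for brevity.
CHORD_TYPES = {"major": [0, 4, 7, 12], "minor": [0, 3, 7, 12], "diminished": [0, 3, 6, 12]}

def arpeggio(root_midi, chord_type="major", subdivision="1/8", direction="ascending"):
    """One pass through the arpeggio as (duration, MIDI pitch) pairs."""
    pitches = [root_midi + step for step in CHORD_TYPES[chord_type]]
    if direction == "descending":
        pitches.reverse()
    elif direction == "random":
        random.shuffle(pitches)
    return [(subdivision, p) for p in pitches]

# arpeggio(65, "minor", "1/16", "random") -> an F minor arpeggio in sixteenth
# notes, in random note order; a formatting step like the one sketched earlier
# would convert these pairs into a bach.score message.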

Let’s add a second voice to create a simple canonic texture. The bottom voice is derived from the first by semitonal transposition and rhythmic rotation, both of which are exposed as variables.

example 4:
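
One way of reading “rhythmic rotation” (rotating the durations against the pitch sequence) can be sketched like this, using the (duration, pitch) pairs from the previous example:

def canon_voice(notes, transpose=0, rotate=0):
    """Derive a second voice from a list of (duration, pitch) pairs:
    transpose every pitch by a number of semitones, and rotate the
    durations against the pitch sequence by a number of positions."""
    durations = [d for d, _ in notes]
    pitches = [p + transpose for _, p in notes]
    durations = durations[rotate:] + durations[:rotate]   # rhythmic rotation
    return list(zip(durations, pitches))

# e.g. canon_voice([("1/8", 60), ("1/4", 64), ("1/8", 67)], transpose=-12, rotate=1)
# -> [("1/4", 48), ("1/8", 52), ("1/8", 55)]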

To add some rhythmic variety, we might also add a control that allows us to specify the probability of rest for each note. Finally, let’s add basic MIDI playback capabilities so we can audition the results as we modify musical parameters.

example 5:
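
The rest-probability control amounts to a weighted coin toss for each note; a minimal sketch, again assuming (duration, pitch) pairs:

import random

def with_rests(notes, rest_probability=0.25):
    """Independently replace each note with a rest (pitch None) with the given probability."""
    return [(dur, None if random.random() < rest_probability else pitch)
            for dur, pitch in notes]

# A pitch of None would be rendered as a rest of the same duration when the
# measure is formatted for the score; MIDI playback would simply skip it.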

While our one-measure canonic arpeggiator leaves a lot to be desired compositionally, it gives an indication of the sorts of processes that can be employed once we begin thinking algorithmically. (In the next post, we will explore more sophisticated examples of algorithmic harmony, voice-leading, and orchestration.) It is important to keep in mind that unlike similar operations for transposition, inversion, and rotation in a program like Sibelius, the functions we have created here will respond to realtime input. This means that our canonic function could be used to process incoming MIDI data from a keyboard or a pitch-tracked violin, creating a realtime accompaniment that is canonically related to the input stream.

PRACTICAL CONSIDERATIONS: RHYTHMIC COORDINATION AND REALTIME SIGHT-READING

Before going any further with our discussions of algorithmic compositional techniques, we should return to more practical considerations related to a realtime score’s performability. Even if we are able to generate satisfactory musical results using algorithmic processes, how will we display the notation to a group of musicians in a way that allows them to perform together in a coordinated manner? Is there a way to establish a musical pulse that can be synced across multiple computers/mobile devices? And if we display notation to performers precisely at the instant it is being generated, will they be able to react in time to perform the score accurately? Should we, instead, generate material in advance and provide a notational pre-display, so that an upcoming bar can be viewed before having to perform it? If so, how far in advance?

I will share my own solutions to these problems—and the thinking that led me to them—below. I should stress, however, that a multiplicity of answers are no doubt possible, each of which might lead to novel musical results.

I’ve addressed the question of basic rhythmic coordination by stealing a page from Sibelius’s/Finale’s book: a vertical cursor that steps through the score at the tempo indicated. By programming the cursor to advance according to a quantized rhythmic grid (usually either quarter or eighth note), one can visually indicate both the basic pulse and the current position in the score. While this initially seemed a perfectly effective and minimal solution, rehearsal and concert experience has indicated that it is good practice to also have a large numerical counter to display the current beat. (This is helpful for those 13/4 measures with 11 beats of rest.)

example 6:
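
In the patch itself the cursor is driven by a metro-style clock; the following schematic sketch (with illustrative names and values) conveys the same idea of stepping through the score on a quantized grid while displaying the current beat:

import time

def run_cursor(tempo_bpm=60, beats_per_measure=4, measures=2):
    """Advance a cursor on a quarter-note grid, printing the current measure
    and beat (standing in for the on-screen cursor and beat counter)."""
    beat_duration = 60.0 / tempo_bpm
    for measure in range(1, measures + 1):
        for beat in range(1, beats_per_measure + 1):
            print(f"measure {measure} | beat {beat}")
            time.sleep(beat_duration)   # wait one quantized step before advancing

run_cursor()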

With a “conductor cursor” in place to indicate metric pulse and current score position, we turn to the question of how best to synchronize multiple devices (e.g. laptops, tablets) so that each musician’s cursor can be displayed at precisely the same position across devices. This is a critical question, as deviations in the range of a few milliseconds across devices can undermine an ensemble’s rhythmic precision and derail any collective sense of groove. In addition to synchronizing cursor positions, communication among devices will likely be needed to pipe score data (e.g. notes/rests, time signatures, dynamics, expression markings) from a central computer—where the master score is being generated and compiled—to performers’ devices as individual parts.

Max/MSP has several objects that provide communication across a network, including udpsend and udpreceive, jit.net.send and jit.net.recv, and a suite of Java classes that use the mxj object as a host—each of these has its advantages and drawbacks. The udpsend and udpreceive objects allow Max messages to be sent to another device on a network by specifying its IP address; they provide the fastest transfer speeds and are therefore perhaps the most commonly used. The downside is that UDP provides no error-checking: there is no guarantee that data packets will reach their destination, or that they will be received in the correct order. jit.net.send and jit.net.recv are very similar in their Max/MSP implementation, but use the TCP/IP transfer protocol, which does provide error-checking; the tradeoff is slightly slower delivery times. The mxj-based objects provide useful functionality such as the ability to query one’s own IP address (net.local) and multicasting (net.multi.send and net.multi.recv), but require Java to be installed on performers’ machines—something which, experience has shown, cannot always be assumed.

I have chosen to use jit.net.send and jit.net.recv exclusively in all of my recent work. The slight tradeoff in speed is offset by the reliability they provide during performance. The UDP objects might work flawlessly for 30 minutes and then drop a data packet, causing the conductor cursor to skip a beat or a blank measure to be unintentionally displayed. This is, of course, unacceptable in a critical performance situation. To counteract the slightly slower performance of jit.net.send and jit.net.recv (and to further increase network reliability), I have also chosen to use wired Ethernet connections between devices via a 16-port Ethernet switch.[4]
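
To make the master-to-performer data flow concrete, here is a schematic sketch using ordinary Python sockets in place of jit.net.send; the address and port are placeholders for a device on the local network, and the newline-delimited framing is simply one convenient choice:

import socket

# Schematic only: plain TCP standing in for the Max networking objects, to show
# the shape of the master-to-performer data flow with ordered, guaranteed delivery.
PERFORMER = ("192.168.1.21", 7400)   # placeholder address of one performer's device

def send_measure(message):
    """Send one formatted measure from the central computer to a performer's device."""
    with socket.create_connection(PERFORMER) as connection:
        connection.sendall(message.encode("utf-8") + b"\n")   # newline-delimited messages

# send_measure("(( ((4 4) (60)) (1/4 (C4)) (1/4 (E4)) (1/4 (G4)) (1/4 (C5)) ))")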

Lastly, we come to the question of how much notational pre-display to provide musicians for sight-reading purposes. We must bear in mind that the algorithmic paradigm makes possible an indeterminate compositional output, so it is entirely possible that musicians will be sight-reading music during performance that they have not previously seen or rehearsed together. Notational pre-display might provide musicians with information about the most efficient fingerings for the current measure, alert them to an upcoming change in playing technique or a cue from a fellow musician, or allow them to ration their attention more effectively over several measures. In fact, it is not uncommon for musicians to glance several systems ahead, or even quickly scan an entire page, to gather information about upcoming events or gain some sense of the musical composition as a whole. The drawback to providing an entire page of pre-generated material, from a composer’s point of view, is that it limits one’s ability to interact with a composition in realtime. If twenty measures of music have been pre-generated, for instance, and a composer wishes to suddenly alter the piece’s orchestration or dynamics, he/she must wait for those twenty measures to pass before the orchestrational or dynamic change takes effect. In this way, we can note an inherent tension between a performer’s desire to read ahead and a composer’s desire to exert realtime control over the score.

Since it was the very ability to exert realtime control over the score which attracted me to networked notation in the first place, I’ve typically opted to keep the notational pre-display to a bare minimum in my realtime works. I’ve found that a single measure of pre-display is usually a good compromise between realtime control for the composer and readability for the performer. (Providing the performer with one measure of pre-display does prohibit certain realtime compositional possibilities that are of interest to me, such as a looping function that allows the last x measures heard during performance to be repeated on a composer’s command.) Depending on tempo and musical material, less than a measure of pre-display might be feasible; this necessitates updating data in a measure as it is being performed, however, which runs the risk of being visually distracting to a performer.

An added benefit of limiting pre-display to one measure is that a performer need only see two measures at any given time: the current measure and the following measure. This has led to the development of what I call an “A/B” notational format, an endless repeat structure comprising two measures. Before the start of the piece, the first two measures are pre-generated and displayed. As the piece begins, the cursor moves through measure 1; when it reaches the first beat of measure 2, measure 3 is pre-generated and replaces measure 1. When the cursor reaches the first beat of measure 3, measure 4 is pre-generated and replaces measure 2, and so on. In this way, a performer can always see two full bars of music (the current bar and the following bar) at the downbeat of any given measure. This system also keeps the notational footprint small and consistent on a performer’s screen, allowing for their part to be zoomed to a comfortable size for reading, or for the inclusion of other instruments’ parts to facilitate ensemble coordination.

example 7:
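
The replacement logic of the A/B format can be summarized in a few lines; generate_measure here stands in for whatever algorithmic process is producing the music, and the function names are my own shorthand rather than part of the patch:

def ab_display(generate_measure, total_measures):
    """Simulate the A/B format: at the downbeat of measure n, the slot that
    held measure n-1 is overwritten with measure n+1, so two full bars
    (the current bar and the following bar) are always visible."""
    slots = [generate_measure(1), generate_measure(2)]   # pre-generated before the piece begins
    for n in range(1, total_measures + 1):
        if n >= 2 and n + 1 <= total_measures:
            slots[(n - 2) % 2] = generate_measure(n + 1)   # measure n+1 replaces measure n-1
        print(f"downbeat of measure {n}: display shows {slots}")

ab_display(lambda n: f"m{n}", total_measures=5)
# downbeat of measure 1: display shows ['m1', 'm2']
# downbeat of measure 2: display shows ['m3', 'm2']
# downbeat of measure 3: display shows ['m3', 'm4']
# ...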

SO IT’S POSSIBLE… NOW WHAT?

Given this realtime notational bridge from the realm of computation to the realm of instrumental performance, a whole new world of compositional possibilities begins to emerge. In addition to traditional notation, non-standard notational forms such as graphical, gestural, or text-based scores can all be incorporated into a realtime networked environment. Within the realm of traditional notation, composers can begin to explore non-fixed, performable approaches to orchestration, dynamics, harmony, and even spatialization in the context of an acoustic ensemble. Next week, we will look at some of these possibilities more closely, discussing a range of techniques for controlling higher-order compositional parameters, from the linear to the more abstract.



1. Notable exceptions to this include the use of mechanical devices and robotics to operate acoustic instruments through digital means (popular examples: Yamaha Disklavier, Pat Metheny’s Orchestrion Project, Squarepusher’s Music for Robots, etc.). The technique of score following—which uses audio analysis to correlate acoustic instruments’ input to a position in a pre-composed score—should perhaps also be mentioned here. Score following provides for the compositional integration of electronic sound sources and DSP into acoustic music performance; since it fundamentally concerns itself with a pre-composed score, however, it cannot be said to provide a truly interactive compositional platform.


2. Freely available through the bach project website.


3. Info and license available at the MaxScore website.


4. A wired Ethernet connection is not strictly necessary for all networked notation applications. If precise timing of events is not compositionally required, a higher-latency wireless network can yield perfectly acceptable results. Moreover, recent technologies such as Ableton Link make possible wireless rhythmic synchronization among networked devices, with impressive perceptual precision. Ableton Link does not, however, allow for the transfer of composer-defined data packets, an essential function for the master/slave data architecture employed in my own work. At the time of this writing, I have not found a wireless solution for transferring data packets that yields acceptable (or even consistent) rhythmic latencies for musical performance.