App usage tips

Video upload

To upload a video, drag and drop a video file into the dashed box, or click on the box and select a file. Once the upload finishes, the upload box is replaced by the video, ready for playback. Hovering over the video brings up the playback tools. If the video does not appear, the upload has failed.

Uploading a video is not required. You can still enter any desired timestamps and generate music. This is useful if your video file is too large to upload: you can generate and download the music, then use the video editing software of your choice to add the music to the video.

The advantage of uploading the video is that you can preview the music together with the video playback in the web app, and you have the option of automatically merging the video and audio and downloading the merged file directly. This is the fastest route from your original video to a finished product with merged audio and video.

Style templates and markers

The style dropdown list contains a selection of pre-set style templates to choose from. Each style will result in a composition with a particular number of sections. For example, the first section might be an introduction, or different instruments might play in different sections.

The “Markers” are used to set the time points where transitions between sections occur. If you have loaded a video, you can scroll through the video to select these time points at key points in the video. Alternatively, you can extract these time points from the video in any other video player or video editing software.

When a video is loaded, the initial marker is automatically set to zero (the beginning of the video), and the final marker is set to equal the length of the video. This will generate music that matches the length of the video. This can be changed if desired, so as to create music with a length that differs from the total length of the video.

All or only some of the markers may be specified. If any markers are left blank, their times are chosen to divide the unspecified interval equally. If no ending marker is specified, a default section length of 30 seconds is used. Each style also has a minimum section length, typically around 4 to 6 seconds; slower tempos usually require a longer minimum. If a section is shorter than the minimum, its length is automatically increased to the minimum.
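
The marker rules above can be illustrated with a short Python sketch. This is not the app's actual code: the function name fill_markers, the use of None for a blank marker, and the handling of a blank initial marker (treated as zero) are assumptions made for illustration. The minimum section length adjustment is left out.

```python
from typing import List, Optional

DEFAULT_SECTION_LENGTH = 30.0  # seconds, the default stated in the text


def fill_markers(markers: List[Optional[float]]) -> List[float]:
    """Fill blank markers (None) under the assumptions stated above."""
    filled = list(markers)
    if not filled:
        return []
    # Assumption: a blank initial marker means the start (0 seconds).
    if filled[0] is None:
        filled[0] = 0.0
    # Index of the last marker that was actually specified.
    last_known = max(i for i, m in enumerate(filled) if m is not None)
    # Blank markers between two specified ones divide the interval equally.
    prev = 0
    for i in range(1, last_known + 1):
        if filled[i] is not None:
            step = (filled[i] - filled[prev]) / (i - prev)
            for j in range(prev + 1, i):
                filled[j] = filled[prev] + step * (j - prev)
            prev = i
    # Assumption: with no ending marker, each remaining section lasts 30 s.
    for i in range(last_known + 1, len(filled)):
        filled[i] = filled[i - 1] + DEFAULT_SECTION_LENGTH
    return filled
```

For example, fill_markers([0, None, None, 30]) spreads the two blank markers evenly between 0 and 30 seconds, and fill_markers([None, 12.0, None]) places the missing final marker 30 seconds after the 12-second marker.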

A section can be excluded from the composition by setting its length to zero, in other words, by setting two successive markers to the same timestamp. This is useful when applying a longer style to a short video. It also makes it possible to generate an intro or outro in isolation.
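
As a small illustration of the zero-length rule, the following Python sketch lists which sections remain for a given set of markers. The function name active_sections and the tuple layout are hypothetical, chosen only for this example.

```python
from typing import List, Tuple


def active_sections(markers: List[float]) -> List[Tuple[int, float, float]]:
    """Return (section_index, start, end) for every section with nonzero length."""
    sections = []
    for index, (start, end) in enumerate(zip(markers, markers[1:])):
        if end > start:  # equal successive markers mean the section is skipped
            sections.append((index, start, end))
    return sections
```

With a four-marker style, the markers [0, 8, 8, 8] keep only the first section (an intro in isolation), while [0, 0, 20, 20] keeps only the middle section.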

Some styles are designed for voice-over videos with a mix of narrated and music-only sections. These styles alternate between sections where the music “ducks” below the narration and sections where the music comes to the foreground. The number of sections can be customized by setting some markers to the same value, as described in the previous paragraph.

Composition and playback

After you click the “Compose!” button, an animation appears to indicate that the composition is in progress. A typical composition of several minutes should take about 10 seconds to generate.

Before the algorithm creates a composition, it must choose a structure that places the section boundaries at the selected markers. It does this by altering the tempo of the music slightly so that the markers align with reasonable points in the music. This usually can’t be done perfectly, so the transition points in the music will sometimes deviate slightly (usually by less than one second) from the chosen markers. If a particularly important time point isn’t aligned well enough, the situation may be improved by changing the less important markers slightly and generating a new composition.
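
To give a feel for the idea, here is a simplified Python sketch of tempo fitting. It is not the algorithm used by the app, which is not described in detail here. It assumes that a “reasonable point in the music” is a bar line, that the tempo may change by at most 5 percent, and that a simple search over candidate tempos is good enough. The function name best_tempo and all of its parameters are hypothetical.

```python
from typing import List, Tuple


def best_tempo(markers: List[float], nominal_bpm: float,
               beats_per_bar: int = 4, max_change: float = 0.05,
               steps: int = 200) -> Tuple[float, float]:
    """Search tempos near nominal_bpm; return (bpm, worst deviation in seconds)."""
    start = markers[0]
    best = (nominal_bpm, float("inf"))
    for k in range(steps + 1):
        # Candidate tempo within +/- max_change of the nominal tempo.
        bpm = nominal_bpm * (1 - max_change + 2 * max_change * k / steps)
        bar = beats_per_bar * 60.0 / bpm  # bar duration in seconds
        # Largest distance from any marker to its nearest bar line.
        worst = max(abs((m - start) - round((m - start) / bar) * bar)
                    for m in markers)
        if worst < best[1]:
            best = (bpm, worst)
    return best
```

In this sketch the deviation can never exceed half a bar, which is consistent with the sub-second deviations mentioned above at moderate tempos. Moving a less important marker changes which tempo wins the search, and that is why nudging markers can improve the alignment of the important ones.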

When the composition is complete, a MIDI player appears, with a “piano roll” visualization of the composition. Each rectangle in the visualization represents a note in the piece, with the pitch represented by the vertical position, and the time and duration represented horizontally. When you click the play button, the composition is played using a sound font synthesizer. This allows real-time audio synthesis, so you can quickly hear the music and check its alignment with the video.

Some instruments sound better than others with the sound font synthesis. Orchestral instruments tend to sound worse, while electronic and percussion instruments tend to sound better. Don’t be discouraged if the sound font synthesis doesn’t sound good: better results can be obtained using the “Rendered audio” download.

Downloading the outputs

  • MIDI file:

    Downloading a MIDI file is useful if you wish to use the generated composition as a starting point for further adjustments. For example, the MIDI file can be edited in a digital audio workstation or imported into music notation software (e.g. Finale, Sibelius, MuseScore). Downloading the MIDI file also lets you apply your own audio rendering and synthesis process. For example, you can render the file using your own audio sample libraries, and apply filters and effects.

  • Score:

    For users who are familiar with reading music, downloading the score can be useful for looking at the structure of the composition. It also allows human musicians to play one or more of the parts.

  • Rendered audio:

    This generates a rendered audio file using audio samples. This option generally produces higher-quality output than the preview playback in the app. Note that the audio file includes a one-second delay at the beginning and a two-second tail at the end, so the output file is three seconds longer than the composition itself. The rendering process usually takes about 20 seconds.

  • Video with new audio:

    If you have uploaded a video, this option automatically merges the new audio and video, providing a new video file with any original audio removed. If the audio has already been rendered by downloading the “Rendered audio” option, this should be quite fast. If not, the audio must be rendered before it is combined with the video, which takes about 20 seconds.

  • Video with mixed audio:

    If you have uploaded a video, this option automatically merges the new audio and video, providing a new video file that mixes the new audio with any original audio. As with the “new audio” option, if the composition has not previously been rendered, the audio is rendered before it is mixed with the existing audio, which takes about 20 seconds. The volume of the new audio is slightly reduced before mixing. For more control over the audio mix, consider choosing the “Rendered audio” option, then using third-party video editing software to mix the new audio with the video.
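
If you download the rendered audio and combine it with the video yourself, the extra three seconds need to be accounted for. The Python sketch below shows the arithmetic and builds a command line for the third-party tool ffmpeg, as one example of external software. It rests on several assumptions: that the one-second delay at the beginning should be skipped so that the music starts together with the video, that ffmpeg is the tool of choice, and that the file names are placeholders. The function names are hypothetical, and this does not describe how the app performs its own merge.

```python
from typing import List

LEAD_IN_SECONDS = 1.0  # delay at the start of the rendered file (from the text)
TAIL_SECONDS = 2.0     # tail at the end of the rendered file (from the text)


def rendered_duration(composition_seconds: float) -> float:
    """Length of the rendered audio file for a given composition length."""
    return LEAD_IN_SECONDS + composition_seconds + TAIL_SECONDS


def ffmpeg_replace_audio(video: str, audio: str, output: str,
                         skip_lead_in: bool = True) -> List[str]:
    """Build an ffmpeg command that swaps the video's audio for the music."""
    command = ["ffmpeg", "-i", video]
    if skip_lead_in:
        # Assumption: skip the lead-in so the music starts with the video.
        command += ["-ss", str(LEAD_IN_SECONDS)]
    command += [
        "-i", audio,
        "-map", "0:v:0",   # picture from the first input (the video)
        "-map", "1:a:0",   # sound from the second input (the music)
        "-c:v", "copy",    # keep the video stream as-is, without re-encoding
        "-shortest",       # stop at the end of the shorter stream
        output,
    ]
    return command
```

A 60-second composition therefore yields a 63-second audio file. The returned list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg is installed. Check the result against your own video, since the right offset depends on how the lead-in was rendered.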