Adding Cue Points to Wav files in C
Wave files have the ability to contain Cue Points, a.k.a. Markers, a.k.a. Sync Points. Cue points are locations in the audio data which can be used to create loops, or non-sequential playlists. The FMOD Event API (for the playback of events created with the FMOD Designer application) has a neat feature where it will trigger a callback function when playing Wave data and reaching a cue point. However, unlike the lower level FMOD Ex API, there is no way to programmatically add cue points to an event or a sound definition - the cue points must be present in the wave files before they are added to a Sound Bank in FMOD Designer.
For a recent project I needed to make use of these cue point callbacks in an FMOD Designer project. I was disappointed to realise that Pro Tools doesn’t have the option of embedding its markers in exported wave files as cue points. So I had a look around for software that does this, and it seems the list is pretty short. On the Windows side there is Sound Forge and on Mac there is Triumph. Both are large, full-featured audio packages, and not cheap, and I didn’t feel it was much of a value proposition when I only wanted to use one tiny feature. So I decided just to write some code to do it myself.
I sometimes jump around between Mac & Windows on interactive audio projects, so I decided to write the code in plain, portable C so that I could compile and use it on both platforms. Given that I was going that far, I decided just to make the code completely platform agnostic, which raised a few challenges but made for an interesting side project for a couple of days! In this post I’ll talk through the internals of a wave file and the code I wrote to add cue points. The full code is available on GitHub here.
The Wave File Format
The Wave file format’s roots go back to the IFF format developed by Electronic Arts back in the Amiga days (anyone remember Deluxe Paint?). The Wave format is an implementation of the RIFF file structure, which is Microsoft’s version of IFF. The difference between the two is that in IFF files bytes are laid out in big-endian format (which was the native format of the Amiga’s CPU), whereas RIFF files are laid out in little-endian format (which was, and is, the format of PCs’ Intel CPUs). The notion of big-endian vs little-endian is central to the platform-agnostic nature of the code shown below, so if you are not clear on these concepts there are good explanations around the web and on Wikipedia.
In RIFF files, the data is stored in chunks. A chunk consists of 2 parts, a header section and a data section. The header specifies the type of data in the chunk, and the number of bytes in the data section. The data section can then contain any data in any format. If an application reading a RIFF file encounters a chunk type that it doesn’t recognise, it can ignore it by jumping over the data section and moving on to the next chunk.
Specifically, in a chunk header the Chunk ID consists of 4 ASCII characters and the data size is specified by a 4 byte (32bit) unsigned integer, in little-endian format. The data section can be a variable number of bytes, as specified in the header. Chunks in RIFF files must be 2-byte aligned, i.e. each chunk must start on an even numbered byte. Therefore if the number of bytes in the data section is odd, a padding byte must be appended to the end of the data. However, it is important to note that if there is a padding byte present, it is *not* included in the data size value in the chunk header.
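The header layout and the padding rule described above can be illustrated with a short sketch. The names `ParsedChunkHeader`, `parseChunkHeader` and `chunkSizeOnDisk` are hypothetical helpers made up for this illustration; they are not part of the code discussed later in the post.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a RIFF chunk header (illustration only). */
typedef struct {
    char     id[5];     /* 4 ASCII characters plus a terminating '\0' */
    uint32_t dataSize;  /* size of the data section, excluding any padding byte */
} ParsedChunkHeader;

/* Decode the 8 raw header bytes: 4 ID characters, then a little-endian uint32. */
static ParsedChunkHeader parseChunkHeader(const unsigned char raw[8])
{
    ParsedChunkHeader header;
    memcpy(header.id, raw, 4);
    header.id[4] = '\0';
    header.dataSize = (uint32_t)raw[4]
                    | ((uint32_t)raw[5] << 8)
                    | ((uint32_t)raw[6] << 16)
                    | ((uint32_t)raw[7] << 24);
    return header;
}

/* Bytes the chunk occupies in the file: header + data + optional padding byte. */
static uint64_t chunkSizeOnDisk(uint32_t dataSize)
{
    return 8u + (uint64_t)dataSize + (dataSize % 2u);
}
```

For example, a chunk whose header reports 5 bytes of data occupies 14 bytes in the file (8 for the header, 5 for the data and 1 for padding), while the data size field itself still reads 5.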
The ‘root’ chunk in a RIFF file has the chunk ID of “RIFF”, and the data size value equal to the file size minus the 8 bytes for the chunk header. The data section of the RIFF chunk starts with 4 bytes of ASCII describing the contents of the file, followed by a series of other chunks appropriate to that file type.
Wave files in particular begin with the 'root’ RIFF Chunk, and the 4 bytes at the beginning of the root chunk’s data section are the characters “WAVE”. There are numerous further types of chunk that can be contained within a wave file, but at a minimum there should be a format chunk describing the format of the sample data, and a data chunk containing the actual audio samples.
The format chunk has the id “fmt ” (note the space as the 4th character). Its data section contains a compression code, indicating the compression used in the sample data. A value of 1 indicates uncompressed LPCM data. A list of common compression codes can be found here. Along with the compression code, the data section also includes the number of channels, sample rate, average bytes per second, bytes per frame and bit depth. Some of the more complex compressed audio formats may need additional information for decoding, so after the 'standard’ values in the format chunk, there can be an optional variable-length section of further data. If the data size in the format chunk’s header is greater than 16 (the number of bytes required for the 'standard’ format data), then there will be dataSize - 16 bytes of additional data at the end of the format chunk. If the number of extra bytes is odd, there will be a padding byte added to the end of the chunk.
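For uncompressed LPCM, the bytes per frame and average bytes per second values follow directly from the other format fields. The sketch below shows that arithmetic; the function names are hypothetical and it assumes each sample is stored in a whole number of bytes.

```c
#include <stdint.h>

/* Bytes per frame (one sample for every channel) for uncompressed LPCM. */
static uint16_t lpcmBlockAlign(uint16_t numberOfChannels, uint16_t bitsPerSample)
{
    /* Round the sample size up to whole bytes, e.g. 20-bit samples use 3 bytes. */
    return (uint16_t)(numberOfChannels * ((bitsPerSample + 7u) / 8u));
}

/* Average bytes per second is simply frames per second times bytes per frame. */
static uint32_t lpcmAverageBytesPerSecond(uint32_t sampleRate, uint16_t blockAlign)
{
    return sampleRate * blockAlign;
}
```

For 16-bit stereo audio at 44100 Hz this gives 4 bytes per frame and 176400 bytes per second.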
The data chunk contains the actual audio sample data. This chunk has the ID of “data”, and the chunk’s data section contains the samples in the format specified in the format chunk. Again, a padding byte will be added to the end of the data chunk if the length of the sample data is odd.
Cue Points are stored in a Cue chunk. The cue chunk has the ID “cue ” (again, note the trailing space). Its data section consists of a number indicating how many cues it contains followed by a series of Cue Point 'sub chunks’. Each cue point has a unique identifier; a play order position that can be used with a separate Playlist chunk to create non-sequential playback options; a data chunk ID, which is the id of the chunk where the sample data for this cue point resides; a chunk start value which is used if the sample data is not in a standard “data” chunk; a block start value which indicates how many bytes of compressed data must be read before sample values can be decompressed; and a frame offset value which is the actual position of the cue point - if the audio is mono then 1 frame contains 1 sample; in stereo 1 frame contains 1 sample for the left channel and one for the right, and so on.
The code we will look at below shows adding cue points to LPCM wave files only (for clarity’s sake) so the values for each cue point are simple:
- Cue Point ID: the index of the cue point
- Play Order Position: 0 (i.e. no playlist)
- Data Chunk ID: “data”
- Chunk Start: 0 (we are using a standard 'data’ chunk)
- Block Start: 0 (uncompressed sample data needs no pre-reading)
- Frame Offset: the position of our cue point.
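Because the frame offset counts frames rather than bytes or seconds, a marker expressed as a time has to be converted first. Here is a minimal sketch of that conversion, assuming the marker time is given in milliseconds; both helper names are hypothetical and are not used elsewhere in this post.

```c
#include <stdint.h>

/* Hypothetical helper: convert a marker time in milliseconds to a frame offset.
   Assumes the result fits in 32 bits, as a cue point's frame offset must. */
static uint32_t frameOffsetFromMilliseconds(uint32_t milliseconds, uint32_t sampleRate)
{
    /* Widen to 64 bits so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)milliseconds * sampleRate) / 1000u);
}

/* Byte position of a frame within the data chunk's sample data (LPCM only). */
static uint64_t byteOffsetOfFrame(uint32_t frameOffset, uint16_t blockAlign)
{
    return (uint64_t)frameOffset * blockAlign;
}
```

A marker at 1.5 seconds in a 44100 Hz file sits at frame 66150, whatever the channel count. With 4 bytes per frame, that frame starts 264600 bytes into the sample data.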
There are numerous other types of chunk that can be found in a wave file, some standard, some standard-ish and some non-standard - as is to be expected with a media format as old as this one. Some other chunks are useful, such as the Label and Note chunks which can associate text with cues, and others are just weird, such as the “JUNK” chunk I’ve seen in a few files (I think this comes from Pro Tools). But as we will do in the code below, programs can just ignore any chunks that they don’t recognise, which also means application-specific metadata can be added to wave files without breaking compatibility.
Representing Chunks in Code
The code we will look at below will read a wave file, a file containing marker locations, and write out a new wave file containing cue points for each marker. As we go through the code to read and write wave files there will be some chunks that we will want to inspect and manipulate and others that we don’t care about. For simplicity and readable code, we start by defining structs to represent the chunks we will be manipulating:
typedef struct {
char chunkID[4]; // Must be "RIFF"
char dataSize[4];
char riffType[4]; // Must be "WAVE"
} WaveHeader;
typedef struct {
char chunkID[4]; // String: must be "fmt "
char chunkDataSize[4];
char compressionCode[2];
char numberOfChannels[2];
char sampleRate[4];
char averageBytesPerSecond[4];
char blockAlign[2];
char significantBitsPerSample[2];
} FormatChunk;
// CuePoint is defined before CueChunk, because CueChunk refers to it
typedef struct {
char cuePointID[4];
char playOrderPosition[4];
char dataChunkID[4];
char chunkStart[4];
char blockStart[4];
char frameOffset[4];
} CuePoint;
typedef struct {
char chunkID[4]; // String: Must be "cue "
char chunkDataSize[4];
char cuePointsCount[4];
CuePoint *cuePoints;
} CueChunk;
Chunks that we will not manipulate directly can just be copied over from the source wave file into the output file. Therefore for these 'generic’ chunks, we can just keep a note of their location in the source file with a struct like this:
typedef struct {
long startOffset; // in bytes
long size; // in bytes
} ChunkLocation;
Endianness
As noted above, all the integer data in a Wave file is in little-endian format. Since Windows PCs and current Mac computers all use Intel-derived CPUs, they all store data in memory as little endian. However, as I was writing this code to be portable, I decided to go all the way, and not assume that the platform running the code would be little endian. You will see that all the fields in the struct definitions for chunks above are just char (i.e. byte) arrays. Whenever we read data into one of these structs from a file or write one of these structs to a file, the data must explicitly be in little endian format. When we want to actually manipulate the data from one of these structs we will explicitly create a host-CPU endian integer variable by way of a function that will convert little-endian bytes to a host endian integer if necessary. Therefore in the code you will see these four functions:
uint32_t littleEndianBytesToUInt32(char littleEndianBytes[4]);
void uint32ToLittleEndianBytes(uint32_t uInt32Value,
char out_LittleEndianBytes[4]);
uint16_t littleEndianBytesToUInt16(char littleEndianBytes[2]);
void uint16ToLittleEndianBytes(uint16_t uInt16Value,
char out_LittleEndianBytes[2]);
The implementation is just some simple byte-reordering, which I won’t go into here, but which is included in the full source code.
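For readers who want something concrete, here is one possible way to write these four functions. This is a sketch and not necessarily how the full source does it: it builds the values with shifts, so it never needs to know the host's byte order.

```c
#include <stdint.h>

uint32_t littleEndianBytesToUInt32(char littleEndianBytes[4])
{
    /* Go via unsigned char so bytes >= 0x80 are not sign-extended. */
    const unsigned char *b = (const unsigned char *)littleEndianBytes;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

void uint32ToLittleEndianBytes(uint32_t uInt32Value, char out_LittleEndianBytes[4])
{
    unsigned char *out = (unsigned char *)out_LittleEndianBytes;
    out[0] = (unsigned char)(uInt32Value & 0xFF);          /* least significant byte first */
    out[1] = (unsigned char)((uInt32Value >> 8) & 0xFF);
    out[2] = (unsigned char)((uInt32Value >> 16) & 0xFF);
    out[3] = (unsigned char)((uInt32Value >> 24) & 0xFF);  /* most significant byte last */
}

uint16_t littleEndianBytesToUInt16(char littleEndianBytes[2])
{
    const unsigned char *b = (const unsigned char *)littleEndianBytes;
    return (uint16_t)(b[0] | (b[1] << 8));
}

void uint16ToLittleEndianBytes(uint16_t uInt16Value, char out_LittleEndianBytes[2])
{
    unsigned char *out = (unsigned char *)out_LittleEndianBytes;
    out[0] = (unsigned char)(uInt16Value & 0xFF);
    out[1] = (unsigned char)((uInt16Value >> 8) & 0xFF);
}
```

Because the shifts operate on values rather than on memory layout, the same code gives the same file bytes on little-endian and big-endian hosts. Writing 0x12345678 always produces the bytes 78 56 34 12.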
Reading Wave Files
Onto the main course: code. First up is opening our input wave file and checking it really is a wave file:
FILE *inputFile = fopen(inFilePath, "rb");
WaveHeader *waveHeader = (WaveHeader *)malloc(sizeof(WaveHeader));
fread(waveHeader, sizeof(WaveHeader), 1, inputFile);
if (strncmp(&(waveHeader->chunkID[0]), "RIFF", 4) != 0)
{
fprintf(stderr, "Input file is not a RIFF file\n");
goto CleanUpAndExit;
}
if (strncmp(&(waveHeader->riffType[0]), "WAVE", 4) != 0)
{
fprintf(stderr, "Input file is not a WAVE file\n");
goto CleanUpAndExit;
}
In the above, and throughout the rest of the code samples, I’ll skip over error checking for brevity.
Next we will read through the remainder of the input file and identify the different chunks that it contains, by reading the Chunk ID of the next chunk we encounter:
while (1)
{
char nextChunkID[4];
fread(&nextChunkID[0], sizeof(nextChunkID), 1, inputFile);
if (feof(inputFile))
{
break;
}
We are interested in format chunks, data chunks and any existing cue chunks. By convention format chunks should be the first chunk, but you never know. When we find a format chunk, we can check we have uncompressed audio:
if (strncmp(&nextChunkID[0], "fmt ", 4) == 0)
{
// Elsewhere declare FormatChunk *formatChunk
formatChunk = (FormatChunk *)malloc(sizeof(FormatChunk));
// Skip back to the start of the chunk
fseek(inputFile, -4, SEEK_CUR);
fread(formatChunk, sizeof(FormatChunk), 1, inputFile);
if (littleEndianBytesToUInt16(formatChunk->compressionCode) != 1)
{
fprintf(stderr, "Compressed audio formats are not supported\n");
goto CleanUpAndExit;
}
Although the extra format information at the end of a format chunk isn’t required for uncompressed audio, some applications may stick some data in there anyway. So we check for that and any padding byte that may come after it:
uint32_t extraFormatBytesCount =
littleEndianBytesToUInt32(formatChunk->chunkDataSize) - 16;
if (extraFormatBytesCount > 0)
{
formatChunkExtraBytes.startOffset = ftell(inputFile);
formatChunkExtraBytes.size = extraFormatBytesCount;
fseek(inputFile, extraFormatBytesCount, SEEK_CUR);
if (extraFormatBytesCount % 2 != 0)
{
fseek(inputFile, 1, SEEK_CUR);
}
}
}
When we find a data chunk, we just keep track of its position:
else if (strncmp(&nextChunkID[0], "data", 4) == 0)
{
// Declare elsewhere ChunkLocation dataChunkLocation
dataChunkLocation.startOffset = ftell(inputFile) - sizeof(nextChunkID);
char sampleDataSizeBytes[4];
fread(sampleDataSizeBytes, sizeof(char), 4, inputFile);
uint32_t sampleDataSize = littleEndianBytesToUInt32(sampleDataSizeBytes);
dataChunkLocation.size = sizeof(nextChunkID)
+ sizeof(sampleDataSizeBytes)
+ sampleDataSize;
// Skip to the end of the chunk.
fseek(inputFile, sampleDataSize, SEEK_CUR);
if (sampleDataSize % 2 != 0)
{
fseek(inputFile, 1, SEEK_CUR);
}
}
If we find an existing cue chunk, i.e.:
else if (strncmp(&nextChunkID[0], "cue ", 4) == 0)
{
...
}
We could go ahead and stick its details into a CueChunk struct if we want to try to merge existing cue points with the ones we will add, but for now we will just ignore it. For any other chunks that we find we will just keep a note of their location in the input file.
else
{
// Declared elsewhere:
// const int maxOtherChunks = 256 or whatever;
// int otherChunksCount = 0;
// ChunkLocation otherChunkLocations[maxOtherChunks] = {{0}};
otherChunkLocations[otherChunksCount].startOffset = ftell(inputFile)
- sizeof(nextChunkID);
char chunkDataSizeBytes[4] = {0};
fread(chunkDataSizeBytes, sizeof(char), 4, inputFile);
uint32_t chunkDataSize = littleEndianBytesToUInt32(chunkDataSizeBytes);
otherChunkLocations[otherChunksCount].size = sizeof(nextChunkID)
+ sizeof(chunkDataSizeBytes)
+ chunkDataSize;
// Skip over the chunk's data, and any padding byte
fseek(inputFile, chunkDataSize, SEEK_CUR);
if (chunkDataSize % 2 != 0)
{
fseek(inputFile, 1, SEEK_CUR);
}
otherChunksCount++;
}
Assuming our input file contained at least a format chunk and a data chunk, we can go ahead and read in the cue point locations. For this, I assumed that the locations would be presented in a plain text file with one location per line (as this is reasonably easy to generate from Pro Tools’ “Export Session Info” feature). However, portability becomes an issue here again, with different platforms using different end-of-line characters in plain text files. OS X and Unix use the newline character ’\n’, whereas Windows uses a carriage return character followed by a newline character ’\r\n’. For completeness’ sake we will also accommodate the classic Mac’s case, which uses a single carriage return character ’\r’.
To read in the cue point locations we can loop through each character in the file, adding any numeric characters to a string. When we hit the end of the line, we convert the string to an integer and start over:
FILE *markersFile = fopen(markerFilePath, "rb");
uint32_t cueLocations[1000] = {0};
int cueLocationsCount = 0;
while (!feof(markersFile))
{
char cueLocationString[11] = {0};
// Max Value for a 32 bit int is 4,294,967,295,
// i.e. 10 numeric digits, so char[11] should be
// enough storage for all the digits in a line
// plus a terminator (\0).
int charIndex = 0;
// Loop through each line in the markers file
while (1)
{
int nextChar = fgetc(markersFile); // fgetc returns an int, not a char
if (feof(markersFile))
{
cueLocationString[charIndex] = '\0';
break;
}
// check for end of line
if (nextChar == '\r')
{
// This is a Classic Mac line ending '\r'
// or the start of a Windows line ending '\r\n'
// If this is the start of a '\r\n', gobble up
//the '\n' too
int peekAheadChar = fgetc(markersFile); // must be an int to compare reliably against EOF
if ((peekAheadChar != EOF) && (peekAheadChar != '\n'))
{
ungetc(peekAheadChar, markersFile);
}
cueLocationString[charIndex] = '\0';
break;
}
if (nextChar == '\n')
{
// This is a Unix/ OS X line ending '\n'
cueLocationString[charIndex] = '\0';
break;
}
if ( (nextChar == '0') || (nextChar == '1') ||(nextChar == '2')
||(nextChar == '3') ||(nextChar == '4') ||(nextChar == '5')
||(nextChar == '6') ||(nextChar == '7') ||(nextChar == '8')
||(nextChar == '9'))
{
// This is a regular numeric character, if there are less than
// 10 digits in the cueLocationString, add this character.
// More than 10 digits is too much for a 32bit unsigned
// integer, so ignore this character and spin through the loop
// until we hit EOL or EOF
if (charIndex < 10)
{
cueLocationString[charIndex] = nextChar;
charIndex++;
}
}
}
// Convert the digits from the line to a uint32 and add to cueLocations
if (strlen(cueLocationString) > 0)
{
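// Use strtoul rather than strtol: long is only 32 bits on some platforms
// (e.g. Windows), which is too small for the upper half of the uint32 range
unsigned long cueLocation_ULong = strtoul(cueLocationString, NULL, 10);
if (cueLocation_ULong < UINT32_MAX)
{
cueLocations[cueLocationsCount] = (uint32_t)cueLocation_ULong;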
cueLocationsCount++;
}
}
}
We then create cue point structs for each location and add them to a cue chunk.
// Create CuePointStructs for each cue location
CuePoint *cuePoints = malloc(sizeof(CuePoint) * cueLocationsCount);
for (uint32_t i = 0; i < cueLocationsCount; i++)
{
uint32ToLittleEndianBytes(i + 1, cuePoints[i].cuePointID);
uint32ToLittleEndianBytes(0, cuePoints[i].playOrderPosition);
cuePoints[i].dataChunkID[0] = 'd';
cuePoints[i].dataChunkID[1] = 'a';
cuePoints[i].dataChunkID[2] = 't';
cuePoints[i].dataChunkID[3] = 'a';
uint32ToLittleEndianBytes(0, cuePoints[i].chunkStart);
uint32ToLittleEndianBytes(0, cuePoints[i].blockStart);
uint32ToLittleEndianBytes(cueLocations[i], cuePoints[i].frameOffset);
}
// Populate a CueChunk Struct
CueChunk cueChunk;
cueChunk.chunkID[0] = 'c';
cueChunk.chunkID[1] = 'u';
cueChunk.chunkID[2] = 'e';
cueChunk.chunkID[3] = ' ';
uint32ToLittleEndianBytes(4 + (sizeof(CuePoint) * cueLocationsCount),
cueChunk.chunkDataSize);
uint32ToLittleEndianBytes(cueLocationsCount,
cueChunk.cuePointsCount);
cueChunk.cuePoints = cuePoints;
We are now ready to write out the new wave file. As the length of the file is specified in the root RIFF chunk’s header, we need to calculate the length up front:
FILE *outputFile = fopen(outFilePath, "wb");
// Update the file header chunk to have the new data size
uint32_t fileDataSize = 0;
fileDataSize += 4; // the 4 bytes for the Riff Type "WAVE"
fileDataSize += sizeof(FormatChunk);
fileDataSize += formatChunkExtraBytes.size;
if (formatChunkExtraBytes.size % 2 != 0)
{
fileDataSize++; // Padding byte for 2byte alignment
}
fileDataSize += dataChunkLocation.size;
if (dataChunkLocation.size % 2 != 0)
{
fileDataSize++;
}
for (int i = 0; i < otherChunksCount; i++)
{
fileDataSize += otherChunkLocations[i].size;
if (otherChunkLocations[i].size % 2 != 0)
{
fileDataSize ++;
}
}
fileDataSize += 4; // 4 bytes for CueChunk ID "cue "
fileDataSize += 4; // UInt32 for CueChunk.chunkDataSize
fileDataSize += 4; // UInt32 for CueChunk.cuePointsCount
fileDataSize += (sizeof(CuePoint) * cueLocationsCount);
uint32ToLittleEndianBytes(fileDataSize, waveHeader->dataSize);
// Write out the header to the new file
fwrite(waveHeader, sizeof(*waveHeader), 1, outputFile);
We next write all our chunks out to the output file. To keep the code simple and readable, we use a little helper function to copy chunks we haven’t modified from the input file to the output file:
int writeChunkLocationFromInputFileToOutputFile(ChunkLocation chunk,
FILE *inputFile,
FILE *outputFile);
This function just copies over chunk data from the input file in 1MB pieces. You can see the implementation in the full source code listing. Although chunks can appear in any order in a wave file, it makes sense to put all the metadata chunks first and the data chunk at the end - as this means a program can get all the information it needs to start playback before the whole file is loaded - useful if the file is large and being streamed over a network. So we next write out the format chunk, the cue chunk and any other chunks we came across in the input file, and finally the data chunk.
// Write out the format chunk
fwrite(formatChunk, sizeof(FormatChunk), 1, outputFile);
if (formatChunkExtraBytes.size > 0)
{
writeChunkLocationFromInputFileToOutputFile(formatChunkExtraBytes,
inputFile,
outputFile);
if (formatChunkExtraBytes.size % 2 != 0)
{
fwrite("\0", sizeof(char), 1, outputFile);
}
}
// Write out the start of new Cue Chunk: chunkID, dataSize and cuePointsCount
size_t writeSize = sizeof(cueChunk.chunkID)
+ sizeof(cueChunk.chunkDataSize)
+ sizeof(cueChunk.cuePointsCount);
fwrite(&cueChunk, writeSize, 1, outputFile);
// Write out the Cue Points
uint32_t cuePointsCount =
littleEndianBytesToUInt32(cueChunk.cuePointsCount);
for (uint32_t i = 0; i < cuePointsCount; i++)
{
fwrite(&(cuePoints[i]), sizeof(CuePoint), 1, outputFile);
}
// Write out the other chunks from the input file
for (int i = 0; i < otherChunksCount; i++)
{
writeChunkLocationFromInputFileToOutputFile(otherChunkLocations[i],
inputFile,
outputFile);
if (otherChunkLocations[i].size % 2 != 0)
{
fwrite("\0", sizeof(char), 1, outputFile);
}
}
// Write out the data chunk
writeChunkLocationFromInputFileToOutputFile(dataChunkLocation,
inputFile,
outputFile);
if (dataChunkLocation.size % 2 != 0)
{
fwrite("\0", sizeof(char), 1, outputFile);
}
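The chunk-copying helper used above is not shown in this post, so here is a minimal sketch of how such a function might look. It follows the description given earlier (copying in pieces of up to 1MB), but the return convention of 0 for success and -1 for failure is an assumption, and the real implementation in the full source may differ.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    long startOffset; // in bytes
    long size;        // in bytes
} ChunkLocation;

// Sketch only. Returns 0 on success, -1 on failure (assumed convention).
int writeChunkLocationFromInputFileToOutputFile(ChunkLocation chunk,
                                                FILE *inputFile,
                                                FILE *outputFile)
{
    const size_t maxPieceSize = 1024 * 1024; // copy in pieces of up to 1MB

    if (chunk.size < 0 || fseek(inputFile, chunk.startOffset, SEEK_SET) != 0)
    {
        return -1;
    }
    if (chunk.size == 0)
    {
        return 0; // nothing to copy
    }

    char *buffer = malloc(maxPieceSize);
    if (buffer == NULL)
    {
        return -1;
    }

    long remaining = chunk.size;
    int result = 0;
    while (remaining > 0)
    {
        // The last piece may be smaller than the buffer
        size_t pieceSize = (remaining > (long)maxPieceSize) ? maxPieceSize
                                                            : (size_t)remaining;
        if (fread(buffer, 1, pieceSize, inputFile) != pieceSize
            || fwrite(buffer, 1, pieceSize, outputFile) != pieceSize)
        {
            result = -1; // short read or short write
            break;
        }
        remaining -= (long)pieceSize;
    }

    free(buffer);
    return result;
}
```

Copying through a fixed-size buffer keeps memory use bounded however large the data chunk is, which matters for long recordings.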
And that’s it, we now have a wave file with embedded cue points that we can use for FMOD or any other such application. The full source code is available here (public domain). You could wrap it up inside a GUI, use it in an audio app, or just compile it and run it from the command line.



