TTS Project Journal

The Living LVS

← 1: Macroscopic Machinery
→ 3: The Voice Sample Library

Within the program, an LVS is a struct, a variable that holds other variables. Each struct of the same type will hold the same types of variables, which are given the same variable names.

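#include <cstdint>	// uint32_t

struct LVS {
	uint32_t sdata;	// size of the sample data
	char cpres;	// compression flags
	char bhav;	// behavior flags
	enc encod;	// encoding flags (enc is defined elsewhere; treat it as a char)
	uint32_t nsamp;	// number of samples
	uint32_t rate;	// resolution, typically the sampling rate
	char* data;	// the sample data itself
};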

This code defines the LVS data type to be a struct consisting of several chars and uint32_ts, as well as an enc and a char*. A char, in name, is a variable used to store a typographical character. However, in C++ a char is guaranteed to take up exactly one byte of space. The size of an int, on the other hand, varies from system to system: the language only promises that it is at least 16 bits wide. Using uint32_ts and chars is a way of controlling the data's size, which will be important for reading and writing this data, among other things. For present purposes, treat enc as a char.
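The size guarantees above can be checked at compile time. This is only a sketch: bit_width_of is a hypothetical helper of mine, not part of the TTS code, and it assumes a C++11 compiler on a platform that provides uint32_t.

```cpp
#include <climits>  // CHAR_BIT, the number of bits in a byte
#include <cstdint>  // std::uint32_t

// sizeof counts in units of char, so sizeof(char) is 1 by definition.
static_assert(sizeof(char) == 1, "a char is always one byte");

// uint32_t, where the platform provides it, is exactly 32 bits wide.
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");

// An int only has to be at least 16 bits; its real width is up to the compiler.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int is at least 16 bits");

// Hypothetical helper: the width of a type in bits on this platform.
template <typename T>
constexpr unsigned bit_width_of() {
	return static_cast<unsigned>(sizeof(T) * CHAR_BIT);
}
```

If any of those static_asserts failed, the program would refuse to compile, which is a much better time to find out than when a file reads back wrong.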

data is a char*, or pointer to a character. This pointer is a placeholder for an array, or chain of data, that will be created in its name. The data in question is the sample data, the digital audio that the container was meant to hold. A plain array needs its length fixed before the program is even compiled, but in many situations we won't know how long our sample data's going to be until the program is running, hence the pointer. As for the use of a character, a character is again reliably a byte in length, and single-byte sample precision is the lowest supported. You can read "char* data" as, "placeholder for a string of bytes named data."
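As a rough illustration of why the pointer helps, here is one way a buffer of unknown length could be created at run time. make_sample_buffer is a hypothetical name, and this is not necessarily how the TTS allocates its data.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memset

// Hypothetical helper: reserve `length` bytes for sample data at run time.
// The caller owns the result and must release it with delete[].
char* make_sample_buffer(std::size_t length) {
	char* data = new char[length];
	std::memset(data, 0, length);  // start from zeros rather than garbage
	return data;
}
```

The length can come from anywhere, such as a file header read a moment earlier, which is exactly what a fixed-size array can't accommodate.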

We are all one with nature; there is no end to me and beginning to you. Your computer's memory also has boundary issues, and since this data is stored in memory we have to keep track of where it ends. For single variables this isn't a problem (the count is always going to be 1), but for arrays you need to keep track of the length or your computer will molest nature in an attempt to make a crude semblance of human speech. This task of keeping the size of your data is relegated to sdata.
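A small sketch of what keeping track of the end buys you. MiniLVS is a trimmed-down stand-in for the real struct, byte_at is a hypothetical helper, and I am assuming here that sdata counts bytes.

```cpp
#include <cstdint>  // std::uint32_t

// Trimmed-down stand-in for the LVS struct: only the two fields needed here.
struct MiniLVS {
	std::uint32_t sdata;  // size of data (assumed here to be in bytes)
	char* data;           // the bytes themselves
};

// Hypothetical helper: read byte i, or return `fallback` when i is past the end.
char byte_at(const MiniLVS& lvs, std::uint32_t i, char fallback) {
	return (i < lvs.sdata) ? lvs.data[i] : fallback;
}
```

Without the comparison against sdata, asking for byte 3 of a 3-byte buffer would happily read whatever lives next door in memory.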

cpres, bhav, and encod are flag container variables, which are provided to store information on the compression, behavior, and encoding of the LVS's data. See, in boolean logic, you assign a number to a state of being. 0 is the state of inexistence, or falsehood, and 1 is the state of existence, or truth. If you want to keep track of whether you've turned a setting on or off you store this state in a flag. A collection of flags reads as a checklist of features that describe how the program is to handle the data.

You can store each of these 1s or 0s in a separate flag variable, which allows for very straightforward assignment. But that's no fun, not to mention wasteful. With a flag container, instead of assigning each truth value to a variable, you assign it to a bit of that variable, so that bits of data can not only be used in concert, for counting with normal binary numbers, but also for representing truth states individually. Changing or reading a bit to the exclusion of the other bits of the variable requires the use of bitwise operators. You can only directly operate on whole variables. You cannot select their bits as if by name. Instead you use masks, which are values that effectively select bits by having every bit you don't want to change or read on the other variable set to an inert value (0 when combining with OR, 1 when combining with AND). The practice of using bitwise operations on a variable and a mask to modify it at the per-bit level is known as "bit twiddling," and by doing it you can pack 8 truth values into a single byte. It is definitely a love it or hate it kind of thing.
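Here is a minimal sketch of masks in action. The flag names and bit positions are made up for illustration, since the real assignments inside cpres, bhav, and encod aren't listed here, and I use unsigned char rather than char to keep sign bits out of the picture.

```cpp
// Hypothetical flag masks: each has exactly one bit set.
const unsigned char FLAG_LOOPED   = 1u << 0;  // 0000 0001
const unsigned char FLAG_REVERSED = 1u << 1;  // 0000 0010
const unsigned char FLAG_MUTED    = 1u << 2;  // 0000 0100

// OR turns on the masked bit; the mask's 0 bits leave everything else alone.
unsigned char set_flag(unsigned char flags, unsigned char mask) {
	return static_cast<unsigned char>(flags | mask);
}

// AND with the inverted mask turns the bit off; the 1 bits leave the rest alone.
unsigned char clear_flag(unsigned char flags, unsigned char mask) {
	return static_cast<unsigned char>(flags & static_cast<unsigned char>(~mask));
}

// AND with the mask isolates the bit so it can be read on its own.
bool has_flag(unsigned char flags, unsigned char mask) {
	return (flags & mask) != 0;
}
```

Starting from 0, setting FLAG_REVERSED and then FLAG_LOOPED leaves the byte holding 3 (0000 0011), and clearing FLAG_REVERSED brings it back down to 1.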

nsamp and rate refer to the number of samples, or measurements of amplitude, taken in total and to the resolution, typically the sampling rate, of the audio. For discrete time signals, one stores the measured amplitude, or reference volume, as it varies over time. The more frequently you take measurements of, or sample, the audio, the more accurate your copy of the signal will be. You not only have to record data at a sampling rate, but you also have to send each sample through your speakers at a playback rate. When that rate differs from the sampling rate, as with a 45 RPM record played at 33⅓ RPM, you get a sped-up or slowed-down version of the audio.
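The arithmetic behind this is short enough to sketch. Both function names are hypothetical, and both assume a nonzero rate measured in samples per second.

```cpp
#include <cstdint>  // std::uint32_t

// Length of the audio in seconds when played back at its own sampling rate.
double duration_seconds(std::uint32_t nsamp, std::uint32_t rate) {
	return static_cast<double>(nsamp) / rate;
}

// How fast the audio runs when the playback rate differs from the sampling
// rate: 1.0 is normal speed, below 1.0 is slowed down, above 1.0 is sped up.
double speed_factor(double playback_rate, double sampling_rate) {
	return playback_rate / sampling_rate;
}
```

For example, 88200 samples at a rate of 44100 last 2 seconds, and playing them back at 22050 gives a speed factor of 0.5. The record analogy works the same way: a 45 played at 33⅓ runs at roughly 0.74 of its intended speed.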

Discrete time signals, however, are not the only type of signals I plan to allow the program to utilize. They are merely what is available now. I also plan to include discrete frequency signals, distinguished by a behavioral flag, which store the frequency content of a signal at a single moment in time. In this context, the resolution of the signal is the difference in frequency (in Hz), rather than in time, between neighboring samples, and each sample represents the amplitude of that frequency.
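Since this format doesn't exist yet, the following is only a guess at how the resolution would be used. frequency_of_sample is a hypothetical name, and it assumes the first sample sits at 0 Hz, which the eventual design may not do.

```cpp
#include <cstdint>  // std::uint32_t

// Frequency, in Hz, represented by sample `index` of a discrete frequency
// signal whose neighboring samples are `resolution_hz` apart.
// Assumption: sample 0 corresponds to 0 Hz.
double frequency_of_sample(std::uint32_t index, double resolution_hz) {
	return index * resolution_hz;
}
```

With a resolution of 10 Hz, sample 44 would hold the amplitude at 440 Hz.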

These variables make up the functionality and structure of an LVS, the native sample format, and currently the only format the TTS can process. Until the implementation, somewhere down the road, of paratextual descriptions of a language's phonetics, a unique correspondence between text and sound must be maintained, and the capabilities of the program in terms of the audio produced are exactly the features of the LVS.

~LCK, 5/10/2010

All content © Casady Roy Kemper (a.k.a. Loki Clock) and protected by the Digital Millennium Copyright Act and the Berne Convention, unless otherwise stated or unless alternative authorship is indicated without explicit accompanying copyright claims on the part of Loki Clock.