TTS Project Journal

The Living LCV

← 3: The Voice Sample Library
→ 5: Output

The LCV, or Loki Clock Voice, is the packaged voice file that you install directly for use with the finished TTS.

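struct LCV {
	char vrsn;      // format version number
	uint64_t sdata; // see The Living LVS
	char bhav;      // see The Living LVS
	uint32_t keys;  // number of samples stored in clips
	char namelen;   // used during allocation and loading

	// Data
	char* name;
	std::vector<vosamp> clips; // the samples, one vosamp per entry

	// Internal use
	int namestack;
};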

name and vrsn (the version number) are self-explanatory, as sdata and bhav should be if you have read The Living LVS. namelen and namestack only come into play during the particulars of allocation and loading, so they won't be discussed here.

vosamp is a union. Like a struct, a union declares several members of different types, except that only one of them is in use at a time, so the union takes up only the space of its largest member rather than the total space of all its members.

A union such as vosamp is used here as a means of obtaining type plurality. It lets a single data container hold multiple types, in this case multiple audio formats, under the same name without violating the strong typing of the language.

union vosamp { //LCV's supported types
	LVS dlvs;
};
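As a rough illustration of how a union gives type plurality, here is a minimal sketch. PcmClip, ToneClip, Sample, Kind and Tagged are hypothetical stand-ins invented for this example; they are not the real LVS or vosamp. The sketch also assumes a tag stored next to the union to record which member is in use, which the LCV itself may handle differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sample formats, standing in for LVS and future types.
struct PcmClip  { std::uint32_t rate; std::uint32_t frames; };
struct ToneClip { double freq; double secs; double gain; };

// Plays the role of vosamp: one name, several possible types.
union Sample {
    PcmClip  pcm;
    ToneClip tone;
};

// The union is at least as large as its largest member,
// but smaller than the sum of all members.
static_assert(sizeof(Sample) >= sizeof(ToneClip), "fits largest member");
static_assert(sizeof(Sample) < sizeof(PcmClip) + sizeof(ToneClip),
              "members share storage");

// Assumption: a tag records which member is currently in use.
enum class Kind : char { Pcm, Tone };
struct Tagged { Kind kind; Sample data; };

Tagged make_pcm(std::uint32_t rate, std::uint32_t frames) {
    assert(rate > 0);
    Tagged t;
    t.kind = Kind::Pcm;
    t.data.pcm = PcmClip{rate, frames};
    return t;
}

Tagged make_tone(double freq, double secs) {
    Tagged t;
    t.kind = Kind::Tone;
    t.data.tone = ToneClip{freq, secs, 1.0};
    return t;
}

// Reads only the member named by the tag.
double duration(const Tagged& t) {
    if (t.kind == Kind::Pcm)
        return static_cast<double>(t.data.pcm.frames) / t.data.pcm.rate;
    return t.data.tone.secs;
}
```

Both make_pcm and make_tone return the same Tagged type, so the two formats can sit side by side in one container, and duration works on either.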

This type plurality can be achieved in a number of ways, most obviously through C++'s class polymorphism. I'm still experimenting with achieving plurality through polymorphism, but every attempt so far, this one included, has been painful and fruitless. This is mainly due to the loss of type information that occurs when an object is passed into a function through a pointer or reference to its base class. The end result has been the exact same restrictions as with unions, plus new ones imposed on the structure of the data types used, with prettier syntax as the sole benefit.
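One way to read the complaint about lost type information is sketched below. The class names (Sample, PcmSample, ToneSample) and both functions are hypothetical and written only for this illustration; they are not taken from the project's code.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical common base class for every sample format.
struct Sample {
    virtual ~Sample() = default;
    virtual double seconds() const = 0;
};

struct PcmSample : Sample {
    unsigned rate, frames;
    PcmSample(unsigned r, unsigned f) : rate(r), frames(f) { assert(r > 0); }
    double seconds() const override {
        return static_cast<double>(frames) / rate;
    }
};

struct ToneSample : Sample {
    double freq, secs;
    ToneSample(double f, double s) : freq(f), secs(s) {}
    double seconds() const override { return secs; }
};

// Through a base pointer only the virtual interface is visible,
// which is enough as long as every format can answer the question.
double total_seconds(const std::vector<std::unique_ptr<Sample>>& clips) {
    double sum = 0.0;
    for (const auto& c : clips) sum += c->seconds();
    return sum;
}

// Anything format-specific needs the concrete type back, so the
// function has to test for it at run time, much like checking a tag.
unsigned rate_of(const Sample& s) {
    if (const auto* p = dynamic_cast<const PcmSample*>(&s)) return p->rate;
    return 0;  // not a PCM sample: no sample rate to report
}
```

The run-time check in rate_of is the same "which type is this really?" question a union forces you to answer, while every format now also has to derive from Sample.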

A vector is like an array of data, except that you can add or remove items from it flexibly. The angle-bracket syntax of its type definition is quite irregular, for reasons better explained in a C++ course. This is the aforementioned container that, given a mechanism for type plurality, can hold all our sample types. clips is, of course, that vector, and keys stores the number of samples within clips. You can get this number from clips itself, but keeping a copy of it keeps the header data independent of the sample data.
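The relationship between keys and clips can be sketched as follows. Clip and Voice are hypothetical, trimmed-down stand-ins for vosamp and LCV, and the helper functions are assumptions made for the example rather than part of the real loader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Clip { std::uint32_t frames; };  // hypothetical stand-in for vosamp

struct Voice {                          // hypothetical, trimmed-down LCV
    std::uint32_t keys = 0;             // header copy of the sample count
    std::vector<Clip> clips;            // the samples themselves
};

// Add a sample and refresh the header count.
void add_clip(Voice& v, Clip c) {
    v.clips.push_back(c);
    v.keys = static_cast<std::uint32_t>(v.clips.size());
}

// Remove the most recent sample, if any, and refresh the header count.
void remove_last(Voice& v) {
    if (!v.clips.empty()) v.clips.pop_back();
    v.keys = static_cast<std::uint32_t>(v.clips.size());
}

// A header can be written or read on its own; this checks that it
// still agrees with the sample data it describes.
bool header_matches(const Voice& v) {
    return v.keys == v.clips.size();
}
```

Because keys is a plain integer in the header, code that only reads the header can learn how many samples to expect without touching the vector.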

Setting aside the technical details, there's the big picture of what we have assembled. The LCV is a container of audio samples. These samples can be of potentially disparate formats, so that, with future support, you can use any audio resource in your possession to assemble a voice, even if you can't convert them all to the same format, and even if one of these formats is not in fact a sample at all. The latter possibility can be explored to provide very abstract representations of the language's aural features, welded to concrete representations through sampling. At the point that this, or the LVS's goal of discrete frequency data, is achieved, the TTS will transcend its stated qualification as a sample-concatenation-based speech synthesizer. It will be the best. I swear it to you all.

~LCK, 5/22/2010


All content © Casady Roy Kemper (a.k.a. Loki Clock) and protected by the Digital Millennium Copyright Act and the Berne Convention, unless otherwise stated or unless alternative authorship is indicated without explicit accompanying copyright claims on the part of Loki Clock.