TTS Project Journal

Output

← 4: The Living LCV
→ 6: The Voice Library

Although this is a journal, I have provided little information on the current state of the project. When I've gone in depth about certain sections, they had all been finished, or at least refined to the point where I could give an exact description of their contents. This entry is different. This entry is about DATA: firstly the data the program has already produced, and secondly the structures of this data. This entry is also about my misadventures with writing the map library.

I have made this package of files (7z, zip) for you to download and inspect. For the full experience, you will need a hex editor that displays textual and decimal representations of hexadecimal numbers, and an audio program that can import raw data, such as Audacity.

The Audacity dialogue for opening one of these files is under "File > Import > Raw Data...." Except for bob.lcv, which for some reason I can't recall is big-endian, all data from the package is little-endian 16-bit signed PCM. This means that amplitudes of -1.0 to 1.0 are represented by -32,768 to 32,767, the minimum and maximum values of a signed integer with 16 bits of precision. If you're wondering where 32,768 went, one of the bit combinations of a signed integer is stolen to represent 0. The byte offset should always be 0, and the sample rate 44100, but Audacity will often assume otherwise. Though you aren't likely to see Audacity provide native support for my custom audio format any time soon, raw data is always well-formed.
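To make the sample format concrete, here is a minimal sketch of the conversion, not the project's actual code. The function names are made up for illustration, and scaling by 32,768 with clamping is only one common way to land on the -32,768 to 32,767 range (it needs C++17 for std::clamp).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: map an amplitude in [-1.0, 1.0] onto the signed
// 16-bit range. Scaling by 32768 sends -1.0 to -32768; the clamp keeps
// +1.0 (and anything louder) at 32767.
std::int16_t to_pcm16(double amplitude) {
    long scaled = std::lround(amplitude * 32768.0);
    scaled = std::clamp(scaled, -32768L, 32767L);
    return static_cast<std::int16_t>(scaled);
}

// Append one sample to a byte buffer, low byte first (little-endian).
void append_le16(std::vector<std::uint8_t>& out, std::int16_t sample) {
    auto bits = static_cast<std::uint16_t>(sample);
    out.push_back(static_cast<std::uint8_t>(bits & 0xFF));
    out.push_back(static_cast<std::uint8_t>(bits >> 8));
}

// Read one little-endian sample back from two consecutive bytes.
std::int16_t read_le16(const std::uint8_t* p) {
    auto bits = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    return static_cast<std::int16_t>(bits);
}
```

A big-endian file like bob.lcv would simply have the two bytes of each sample in the opposite order.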

I generated a 1-second, 500Hz sine wave and exported this signal in this format. By March 12th of this year, I had produced the first LVS by attaching a header block to the data. At the beginning of the waveform of sample.lvs, you can see a scratchy figure. This is the realization of the header data as audio samples.
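For reference, a test signal like this one can be sketched in a few lines. This is an illustration rather than the tool actually used; make_sine and its peak parameter (0.8 of full scale) are assumptions, since the entry does not say how loud the original tone was.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical generator: `seconds` of a sine tone as 16-bit samples.
// `peak` is the fraction of full scale to use; 0.8 is an arbitrary choice.
std::vector<std::int16_t> make_sine(double frequency_hz, double seconds,
                                    int sample_rate = 44100,
                                    double peak = 0.8) {
    const double two_pi = 6.283185307179586;
    auto count = static_cast<std::size_t>(seconds * sample_rate);
    std::vector<std::int16_t> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        // Phase advances by 2*pi*f/rate radians per sample.
        double phase = two_pi * frequency_hz * static_cast<double>(n) / sample_rate;
        samples[n] = static_cast<std::int16_t>(
            std::lround(peak * 32767.0 * std::sin(phase)));
    }
    return samples;
}
```

Calling make_sine(500.0, 1.0) yields 44,100 samples, which is 88,200 bytes once written out as 16-bit PCM.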

In a hex editor, you can see that the first three bytes (bytes 0-2, or 1 1⁄2 samples) are the ASCII letters "LCV." The following 15 bytes correspond to the LVS struct's variables, so that bytes 3-6 are the byte size of the signal, byte 7 is the compression info, etc.
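The layout just described can be read back with something like the sketch below. Only the three letters, the size field and the compression byte come from the description above; the struct name, the little-endian reading of the size field, and the treatment of bytes 8-17 as an opaque block are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t kMagicSize  = 3;   // bytes 0-2: ASCII letters
constexpr std::size_t kFieldBytes = 15;  // bytes 3-17: struct variables
constexpr std::size_t kHeaderSize = kMagicSize + kFieldBytes;

// Hypothetical view of the header block; not the project's real struct.
struct HeaderSketch {
    std::array<char, 3> magic;          // bytes 0-2
    std::uint32_t signal_bytes;         // bytes 3-6, assumed little-endian
    std::uint8_t compression;           // byte 7
    std::array<std::uint8_t, 10> rest;  // bytes 8-17, meaning not covered here
};

// Returns nothing if the buffer is too short to hold a header.
std::optional<HeaderSketch> parse_header(const std::vector<std::uint8_t>& file) {
    if (file.size() < kHeaderSize) return std::nullopt;
    HeaderSketch h{};
    for (std::size_t i = 0; i < kMagicSize; ++i) {
        h.magic[i] = static_cast<char>(file[i]);
    }
    h.signal_bytes = static_cast<std::uint32_t>(file[3])
                   | static_cast<std::uint32_t>(file[4]) << 8
                   | static_cast<std::uint32_t>(file[5]) << 16
                   | static_cast<std::uint32_t>(file[6]) << 24;
    h.compression = file[7];
    std::copy(file.begin() + 8, file.begin() + kHeaderSize, h.rest.begin());
    return h;
}
```

The sketch deliberately reports the letters without validating them, so it can be pointed at any file from the package.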

Soon I had the ability to store these LVSs in LCVs, and by March 29th I had added the ability to name the LCV. I created one named "Bob" as a test, consisting of three copies of sample.lvs, perhaps slightly modified from the standalone version. As audio, it appears as a 500Hz sine wave punctuated by two scribbles, following which its phase is reset. At the beginning is another scribble, twice their length, which the hex editor reveals to be the header blocks of an LCV and LVS. If you select one of these header blocks in sample form and loop it, you will hear a most unpleasant sound. I've attempted to make two short songs out of them, and both times Audacity crashed.

Over the weeks since, I've been working on the third major library of the TTS, the map library, and on a Unicode support library. The map library is used for the creation of a binary sort tree that holds each string along with the index, in the associated LCV's clips vector, of the sample to convert that string to.

I set up the rudimentary operations for a dumb tree in a solid week or so of work, providing recursive functions for adding a new node wherever a spot was found available and for deleting all the nodes on the tree. A proper binary tree places new nodes by sort order. If you open up to the middle of a dictionary looking for "ogre" and you see the entry "only" on top, you know that you have to look earlier in the dictionary. You have eliminated the entire latter half of the dictionary. Now imagine going to each halfway point of the remaining pages and doing the same thing until you find the right one. A binary tree's structure allows the computer to know how to get to each next halfway point, containing their addresses in the same place as its entries. But if the dictionary is not in alphabetical order, or if you don't know the order of letters in the alphabet, you can't exploit this structure for ballpark elimination.
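The dictionary analogy translates into code fairly directly. What follows is a minimal sketch of a sorted binary tree, not the map library itself; Node, tree_insert and tree_find are hypothetical names, and std::string's built-in ordering stands in for whatever comparison the real tree uses.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Each node holds an entry plus the "addresses" of the two halves below it.
struct Node {
    std::string key;
    std::size_t clip_index = 0;
    std::unique_ptr<Node> left;   // keys that sort before this one
    std::unique_ptr<Node> right;  // keys that sort after this one
};

// Recursive insert by sort order. Returns false if the key already exists.
bool tree_insert(std::unique_ptr<Node>& root, std::string key,
                 std::size_t clip_index) {
    if (!root) {
        root = std::make_unique<Node>();
        root->key = std::move(key);
        root->clip_index = clip_index;
        return true;
    }
    if (key < root->key) return tree_insert(root->left, std::move(key), clip_index);
    if (root->key < key) return tree_insert(root->right, std::move(key), clip_index);
    return false;
}

// Each comparison discards one whole side of the remaining tree.
std::optional<std::size_t> tree_find(const Node* node, const std::string& key) {
    while (node) {
        if (key < node->key)      node = node->left.get();
        else if (node->key < key) node = node->right.get();
        else                      return node->clip_index;
    }
    return std::nullopt;
}
```

Looking up "ogre" in a tree whose root is "only" goes left immediately, which is the "look earlier in the dictionary" step.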

So, you have to enumerate the letters, so that you can articulate the concept of their order. And what better way to do this than to use their Unicode codepoints? The purpose of Unicode is to do just that - provide a standard enumeration of characters. If you've ever written a Unicode library you know that doing so is joyless and painful. It took a massive detour away from anything I find interesting about coding. It is as fun as making a handcopy of a 40-page newspaper style guide. I died of doldrum.
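To show what ordering by codepoint involves, here is a stripped-down UTF-8 decoder and a comparison built on it. It is a sketch under loose assumptions, not my Unicode library: decode_utf8 and codepoint_less are illustrative names, malformed input is replaced with U+FFFD, and overlong encodings and surrogates are not rejected.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into codepoints, substituting U+FFFD for bad bytes.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        auto lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        char32_t cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        if (i + extra >= bytes.size()) {  // sequence runs off the end
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            auto c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append six payload bits
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Order two UTF-8 strings by their codepoint sequences.
bool codepoint_less(const std::string& a, const std::string& b) {
    return decode_utf8(a) < decode_utf8(b);
}
```

For well-formed UTF-8, comparing the raw bytes as unsigned values gives the same order as comparing codepoints, so a tree of UTF-8 strings can often skip the decoding step when sorting.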

Midway through, however, I already had enough done to develop the tree to a smart form that held and searched sorted UTF-8 strings. And when I was done it turned out that if I'd tried to extend the binary tree just a little more, to being also able to support UTF-16 or UTF-32 strings, I would have discovered quite early on that C++ polymorphism is a terrible, useless trap. I had never tended towards using it much at all, mainly because I was doing just fine with other features, but also because it never worked when I did. But another situation came up where I needed type plurality. Due to the slightly overcomplicated syntax of unions, I figured I would experiment with polymorphism to accomplish this within the idiom of C++. I was wrong. Not only can you not accomplish type plurality with polymorphism, you cannot accomplish anything more than an automated copy and paste of variable names into the descriptions of other classes. This is because passing an object polymorphically runs it through a dehumanizing sieve that causes it to drop all of its unique features and reduce to an object of the base class.
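The "sieve" described above resembles what C++ calls object slicing, which happens when a derived object is copied into a base-class value. Whether that is exactly what went wrong in the map library is an assumption; the sketch below uses hypothetical class names purely to show the effect.

```cpp
#include <string>

// Hypothetical base and derived types, for illustration only.
struct Key {
    virtual ~Key() = default;
    virtual std::string encoding() const { return "unknown"; }
};

struct Utf8Key : Key {
    std::string encoding() const override { return "UTF-8"; }
};

// Taking the parameter by value copies only the Key part of the argument,
// so the override is gone by the time encoding() is called.
std::string describe_by_value(Key k) { return k.encoding(); }

// Taking a reference leaves the original object alone, so the call
// dispatches to the derived override.
std::string describe_by_reference(const Key& k) { return k.encoding(); }
```

Given a Utf8Key, describe_by_value returns "unknown" while describe_by_reference returns "UTF-8"; the same loss occurs when derived objects are stored by value in a container of the base type.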

So I closed my editor and stepped off the project for a couple of days, and then rewrote the map library twice on the 3rd. Having written something else in the meantime, I had needed plurality again, and chose to use polymorphism again. I'd outgrown polymorphism much faster this time, so I had just finished a conversion and took the opportunity to write down the steps I take each time, to speed things up in the future. They provided some assistance in converting the map library, but in this situation the direct conversion was causing the mechanisms to bloat. Referencing the subobjects' functions from the container needed wrappers upon wrappers, and opcodes and craziness were cropping up. I decided that to do this properly I needed to abandon the structure of the original classes and design the structures from scratch. Surprisingly, after this speedy second redesign I was able to keep the functions of the library virtually (har har) unchanged, because the variables in the subclasses are the same in name, just not in type.

On May 27th came the final test, in which I constructed a program that created the nodes of a binary tree for the strings "a", "s", "z", and "x" and gave them the numbers 0-3 respectively, which associated them with the 4 samples of test.lcv, the LCV I'd been working with since around the 7th of April. After constructing the map and loading the LCV, the program takes a single-character user input and searches the map for a match. If it finds one, it takes the number, then takes the LVS at that index, copies its data, and saves that data to a new file. Thus, text became speech. In a manner of speaking.
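The shape of that test can be sketched as follows. This is not the test program: std::map stands in for my own tree, the Clip type and function names are invented for the example, and loading the LCV and writing the output file are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Clip = std::vector<std::int16_t>;  // stand-in for one LVS's sample data

// The four test strings and the clip indices they were given.
std::map<std::string, std::size_t> make_test_map() {
    return {{"a", 0}, {"s", 1}, {"z", 2}, {"x", 3}};
}

// Look the input up, then copy out the clip stored at the matching index.
// Returns nothing if the string is unmapped or the index is out of range.
std::optional<Clip> clip_for(const std::map<std::string, std::size_t>& map,
                             const std::vector<Clip>& clips,
                             const std::string& input) {
    auto it = map.find(input);
    if (it == map.end() || it->second >= clips.size()) return std::nullopt;
    return clips[it->second];
}
```

The returned copy is what would then be written to the new file as raw PCM.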

The four possible outputs of the program, which are identical to the four inputs used to create the LCV, are included in the file package.

~LCK, 5/29/2010


All content © Casady Roy Kemper (a.k.a. Loki Clock) and protected by the Digital Millennium Copyright Act and the Berne Convention, unless otherwise stated or unless alternative authorship is indicated without explicit accompanying copyright claims on the part of Loki Clock.