Workflow for processing texts

Here is the beast:

[Figure: Workflow for analysis of texts]

I first segment the text into short chunks in Elan—segmentation at this stage is based more on breath groups than anything grammatical (although with poetry I can hear the meter, so I segment line-by-line). Then my language helper transcribes the files orthographically in ElanCheck, simple software that I wrote for that purpose. (I can quickly see whether he missed any annotations entirely with eaf-blanks.xsl.) I then check his transcriptions with ElanCheck. We go back and forth until he writes it properly and I hear it properly.
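The blank-annotation check can be sketched in Python. This is not eaf-blanks.xsl itself, whose contents are not shown here; it is a minimal illustration that assumes the usual .eaf layout, in which each ALIGNABLE_ANNOTATION element holds its text in an ANNOTATION_VALUE child. The sample data and the function name blank_annotations are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written .eaf fragment (hypothetical data, for illustration only).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="orthography">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first line</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE></ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def blank_annotations(eaf_xml):
    """Return the IDs of annotations whose value is empty or only whitespace."""
    root = ET.fromstring(eaf_xml)
    blanks = []
    # Only time-aligned annotations are checked in this sketch.
    for annotation in root.iter("ALIGNABLE_ANNOTATION"):
        value = annotation.findtext("ANNOTATION_VALUE", default="")
        if not value.strip():
            blanks.append(annotation.get("ANNOTATION_ID"))
    return blanks


print(blank_annotations(SAMPLE_EAF))  # ['a2']
```

Reporting the annotation IDs, rather than just a count, makes it easier to find the missed segments again in Elan.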

New computer users may not have a good sense of how whitespace is supposed to work (multiple spaces, line breaks, etc.), so I have a simple XSL transformation, normalize-whitespace.xsl, that normalizes the whitespace in an Elan file (.eaf).
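As an illustration of what such a normalization does, here is a minimal Python sketch. It is not the stylesheet itself; it assumes that only the text inside ANNOTATION_VALUE elements needs cleaning, and that "normalizing" means trimming the ends and collapsing every run of spaces, tabs, and line breaks to a single space. The sample data and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with doubled spaces and a stray line break.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="orthography">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>  two   spaces
and a break </ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def normalize_whitespace(eaf_xml):
    """Return the document with whitespace normalized in every annotation value."""
    root = ET.fromstring(eaf_xml)
    for value in root.iter("ANNOTATION_VALUE"):
        if value.text:
            # str.split() with no argument splits on any whitespace run
            # and drops leading and trailing whitespace.
            value.text = " ".join(value.text.split())
    return ET.tostring(root, encoding="unicode")


cleaned = ET.fromstring(normalize_whitespace(SAMPLE_EAF))
print(cleaned.find(".//ANNOTATION_VALUE").text)  # two spaces and a break
```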

Language Explorer does not accept Elan files as input, so I need to convert the Elan file (.eaf) to plain text. I do this with an XSL transformation that I wrote for the purpose, text-from-eaf.xsl. I can then add the output to Language Explorer as a new text simply by pasting it into the appropriate window.
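The conversion to plain text might look roughly like the Python sketch below. It is an illustration under several assumptions, not a port of text-from-eaf.xsl: the transcription is assumed to live on one named tier, the annotations are assumed to be stored in reading order, and the output is assumed to be one annotation per line. The tier names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical file with a transcription tier and an unrelated notes tier.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="orthography">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first line</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second line</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="notes">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>background noise</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def text_from_eaf(eaf_xml, tier_id):
    """Return the non-empty annotation values of one tier, one per line."""
    root = ET.fromstring(eaf_xml)
    lines = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for value in tier.iter("ANNOTATION_VALUE"):
            text = (value.text or "").strip()
            if text:
                lines.append(text)
    return "\n".join(lines)


print(text_from_eaf(SAMPLE_EAF, "orthography"))
```

The printed result is plain text that can be pasted into another program; annotations on other tiers are ignored.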

In the course of glossing the text, however, new errors become evident. It would be possible to fix these in the Elan file and re-import the text, but it is easier to make the changes in Language Explorer. Language Explorer has an XML export format with the extension .flextext. Given that file, I can update the Elan transcription with update-eaf-from-flextext.xsl.

Thus, I can move an Elan .eaf file into Language Explorer, and a Language Explorer .flextext file into Elan.

Before I can do morphological analysis on a text, I need to switch the baseline of the text in Language Explorer from the orthography to the phonetic transcription. This is discussed in Changing the baseline writing system of a text in Language Explorer.

Finally, the free translations are something of an addendum to the procedure. They are not crucial to any other stage of the process, but they need to be there. Rather than have my language helper grapple with Language Explorer, I wrote the program FreeTranslator to make things easier. FreeTranslator accepts FlexText files (.flextext) with single interlinear texts. The easiest way to get these files from Language Explorer is to export them one time into a single .flextext file, and then use split-flextext.xsl to split them into individual files. (I can quickly see whether my language helper missed any translations with translation-blanks.xsl.)
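The splitting step and the missing-translation check can both be sketched in Python. Neither function is a port of split-flextext.xsl or translation-blanks.xsl. Both assume a FlexText layout in which a document element contains interlinear-text elements, and each phrase carries its free translation as a direct child of the form item type="gls" with a lang attribute. Check these element names against your own export. The sample data, the function names, and the default language code are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical export holding two short texts.
SAMPLE_FLEXTEXT = """<document>
  <interlinear-text>
    <item type="title" lang="en">Text A</item>
    <paragraphs><paragraph><phrases>
      <phrase>
        <item type="segnum" lang="en">1</item>
        <item type="gls" lang="en">He went home.</item>
      </phrase>
      <phrase>
        <item type="segnum" lang="en">2</item>
      </phrase>
    </phrases></paragraph></paragraphs>
  </interlinear-text>
  <interlinear-text>
    <item type="title" lang="en">Text B</item>
    <paragraphs><paragraph><phrases>
      <phrase>
        <item type="segnum" lang="en">1</item>
        <item type="gls" lang="en"></item>
      </phrase>
    </phrases></paragraph></paragraphs>
  </interlinear-text>
</document>"""


def split_flextext(flextext_xml):
    """Return one XML string per text, each wrapped in its own document element."""
    root = ET.fromstring(flextext_xml)
    parts = []
    for text in root.findall("interlinear-text"):
        doc = ET.Element(root.tag, root.attrib)
        doc.append(copy.deepcopy(text))
        parts.append(ET.tostring(doc, encoding="unicode"))
    return parts


def untranslated_phrases(flextext_xml, lang="en"):
    """Return (title, segment number) pairs for phrases lacking a free translation."""
    root = ET.fromstring(flextext_xml)
    missing = []
    for text in root.iter("interlinear-text"):
        title = text.findtext("item[@type='title']", default="")
        for phrase in text.iter("phrase"):
            glosses = [item.text or "" for item in phrase.findall("item")
                       if item.get("type") == "gls" and item.get("lang") == lang]
            # A phrase counts as untranslated if it has no gloss item at all
            # or only empty ones.
            if not any(gloss.strip() for gloss in glosses):
                segnum = phrase.findtext("item[@type='segnum']", default="")
                missing.append((title, segnum))
    return missing


print(len(split_flextext(SAMPLE_FLEXTEXT)))   # 2
print(untranslated_phrases(SAMPLE_FLEXTEXT))  # [('Text A', '2'), ('Text B', '1')]
```

In practice each string returned by split_flextext would be written to its own .flextext file.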

To go the other way, getting the translations from those .flextext files back into Language Explorer, I use merge-translation.xsl. It copies all of the translations from one .flextext file into another. So whenever I want to enter the Dari translations, I export all of my texts into a single FlexText file, copy in the translations with merge-translation.xsl, and then re-import that file into Language Explorer.
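A merge of this kind could be sketched as follows. How merge-translation.xsl actually pairs up phrases is not described here, so this Python illustration makes a loud simplifying assumption: both files contain the same phrases in the same order, and phrases are matched purely by position. It also assumes that free translations are item elements with type="gls" directly under each phrase. The language code "prs", the sample data, and the function name are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical file that came back with translations filled in.
TRANSLATED = """<document><interlinear-text><paragraphs><paragraph><phrases>
  <phrase><item type="segnum" lang="en">1</item><item type="gls" lang="prs">translation one</item></phrase>
  <phrase><item type="segnum" lang="en">2</item><item type="gls" lang="prs">translation two</item></phrase>
</phrases></paragraph></paragraphs></interlinear-text></document>"""

# Hypothetical fresh export of the same text, with one stale translation.
EXPORTED = """<document><interlinear-text><paragraphs><paragraph><phrases>
  <phrase><item type="segnum" lang="en">1</item><item type="gls" lang="prs">old draft</item></phrase>
  <phrase><item type="segnum" lang="en">2</item></phrase>
</phrases></paragraph></paragraphs></interlinear-text></document>"""


def is_gloss(item, lang):
    """True if the item is a free translation in the given language."""
    return item.get("type") == "gls" and item.get("lang") == lang


def merge_translations(source_xml, target_xml, lang):
    """Copy free translations from source into target, matching phrases by position."""
    source = ET.fromstring(source_xml)
    target = ET.fromstring(target_xml)
    source_phrases = list(source.iter("phrase"))
    target_phrases = list(target.iter("phrase"))
    if len(source_phrases) != len(target_phrases):
        # Positional matching is only safe when the phrase counts agree.
        raise ValueError("phrase counts differ; refusing to merge by position")
    for src, tgt in zip(source_phrases, target_phrases):
        new_items = [i for i in src.findall("item") if is_gloss(i, lang)]
        if not new_items:
            continue  # nothing to copy, so keep whatever the target already has
        for old in [i for i in tgt.findall("item") if is_gloss(i, lang)]:
            tgt.remove(old)
        for item in new_items:
            tgt.append(copy.deepcopy(item))
    return ET.tostring(target, encoding="unicode")


merged = ET.fromstring(merge_translations(TRANSLATED, EXPORTED, "prs"))
print([i.text for i in merged.iter("item") if is_gloss(i, "prs")])
# ['translation one', 'translation two']
```

Refusing to merge when the phrase counts differ is a deliberate choice in this sketch: if the text was re-segmented after export, matching by position would silently attach translations to the wrong phrases.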

Nettlesome problems & their solutions

All contents copyright © 2017 Adam Baker, except where otherwise noted.