Preprocessing the corpus allows you to remove words you don't want counted in the totals (like "the," "of," etc.), convert all the words to lower case, and perform other cleanup steps that make the text data easier to analyze.
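
For readers who want to see what these steps amount to, here is a minimal Python sketch of the two operations named above: lowercasing and removing unwanted words. It is an illustration only, not the program's own code. The function name `preprocess` and the contents of `STOP_WORDS` are assumptions made for this example.

```python
# Hypothetical stop-word list for illustration; the program's actual list may differ.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it on whitespace, and drop stop words."""
    words = text.lower().split()
    return [word for word in words if word not in stop_words]

original = "The History of the Decline and Fall of the Roman Empire"
processed = preprocess(original)

print(processed)  # ['history', 'decline', 'fall', 'roman', 'empire']
print(original)   # the original text is left unchanged
```

The function returns a new list rather than changing its input, so the original text stays available, much as the original corpus does.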


To preprocess a corpus, you must first have created a corpus. Then, choose Preprocess corpus from the Text menu on the menu bar of the Console window. The following window will appear.

From the Source Corpus pull-down menu, choose the corpus to be processed. Four Actions are checked by default. Uncheck any of them to perform only one or two processes at a time.

The processed corpus is saved by default, and its name appears in the Save Corpus As: field. You can enter a different name. The original corpus will remain intact.

(Note: When creating a frequency list, bar chart, or word cloud, choose the appropriate Source Corpus from the pull-down menu, because the program defaults to the first corpus in the list.)

What now?

The processed corpus can be viewed and used to create word counts and word clouds, and generally to perform text analysis.
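
As a rough illustration of what a word count over a processed corpus involves, the following Python sketch tallies words with the standard library's `collections.Counter`. The word list and the `word_counts` helper are hypothetical and are not part of the program.

```python
from collections import Counter

# Hypothetical processed corpus: already lowercased, with unwanted words removed.
processed_words = ["whale", "sea", "ship", "whale", "captain", "sea", "whale"]

def word_counts(words, top_n=2):
    """Return the top_n most frequent words with their counts."""
    return Counter(words).most_common(top_n)

print(word_counts(processed_words))  # [('whale', 3), ('sea', 2)]
```

A frequency list like this is also the kind of data a bar chart or word cloud is drawn from.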