Phrase Frequency Counter Advanced Non-English Text
Use of the program to count phrases in non-English text
Phrase Frequency Counter Advanced may be used with text in most (but not all) European languages, including German, French, Italian, Spanish and Portuguese; in fact, with any language whose characters can be encoded using ISO 8859-1, a subset of Windows-1252. (For more details see Scannable Files and Languages Supported.) Some European languages (such as Polish and Czech) and all non-European languages (such as Arabic and Hebrew) are not supported.
The screenshot below shows the result of counting phrases in a 1.04 MB HTML page in French (using the French words-to-ignore file, so that phrases consisting entirely of words-to-ignore are excluded):
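To check whether a particular text falls within the ISO 8859-1 range described above, one can simply try to encode it. The following is a minimal sketch in Python, not part of the program; the function name `is_latin1_encodable` and the sample strings are illustrative assumptions.

```python
def is_latin1_encodable(text: str) -> bool:
    """Return True if every character in `text` exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Illustrative sample strings (not taken from the program's documentation).
samples = {
    "German": "Größe und Maß",
    "French": "Où est le café ?",
    "Polish": "Łódź",
    "Hebrew": "שלום",
}

results = {lang: is_latin1_encodable(s) for lang, s in samples.items()}
for lang, ok in results.items():
    print(f"{lang}: {'encodable' if ok else 'not encodable'} in ISO 8859-1")
```

A text that fails this check contains at least one character outside ISO 8859-1. Since Windows-1252 includes a few extra characters, passing this check is a slightly stricter test than the program may actually require.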
Here is an example of the output when scanning German text for phrases with exactly 4 words occurring at least twice:
When we select the option Remove words to ignore from phrases and recount we obtain "condensed" phrases:
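As a rough illustration of what "condensing" might involve, the sketch below removes ignored words from each counted phrase and merges the counts of phrases that become identical. This is an assumption about the general idea, not the program's actual algorithm; the function `condense`, the phrase counts and the ignore list are all hypothetical.

```python
from collections import Counter


def condense(phrase_counts, ignore):
    """Drop ignored words from each phrase and merge counts of
    phrases that become identical after the removal."""
    condensed = Counter()
    for phrase, count in phrase_counts.items():
        kept = [w for w in phrase.split() if w.lower() not in ignore]
        if kept:  # phrases made up only of ignored words disappear
            condensed[" ".join(kept)] += count
    return condensed


# Hypothetical counts and ignore list, for illustration only.
counts = Counter({
    "das haus am see": 3,
    "ein haus am see": 2,
    "in der und die": 4,
})
ignore = {"das", "ein", "am", "in", "der", "und", "die"}

condensed = condense(counts, ignore)
for phrase, count in condensed.most_common():
    print(count, phrase)
```

Here the two four-word phrases collapse into the single condensed phrase "haus see" with their counts added together, and the phrase consisting only of ignored words is dropped.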
Here is an example of the output when scanning Portuguese text (the phrases are ordered alphabetically):
Note that before scanning text in a given language, one must normally load the words-to-ignore file for that language (via the Settings window). The program provides such files for English, German, French, Spanish, Portuguese and Italian. When processing a folder that contains files in more than one of these languages (or when successively scanning single files in different languages), one can use the words-to-ignore file that covers all six languages.
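The effect of a combined words-to-ignore list can be sketched as follows. This is a minimal illustration, not the program's code. It assumes a one-word-per-line list format (the actual file format is not described here); the function names and the tiny word lists are hypothetical stand-ins.

```python
def parse_ignore_list(text):
    """Parse an ignore list with one word per line (assumed format)
    into a set of lowercase words, skipping blank lines."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def all_ignored(phrase, ignore):
    """Return True if every word of the phrase is in the ignore set."""
    return all(word.lower() in ignore for word in phrase.split())


# Tiny stand-ins for the real per-language lists.
english = "the\nof\nand\n"
french = "le\nla\nde\net\n"

# A combined list is simply the union of the per-language lists.
combined = parse_ignore_list(english) | parse_ignore_list(french)

phrases = ["of the", "de la", "la tour Eiffel", "the tower"]
kept = [p for p in phrases if not all_ignored(p, combined)]
print(kept)
```

With the combined set, phrases consisting entirely of ignored words are excluded in either language, while phrases containing at least one other word are kept.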