Hermetic Word (and Phrase) Frequency Counter
A Customizable Multiple-File Word and Phrase Counting Program for Windows
Download of this program has been temporarily suspended
Hermetic Word Frequency Counter Advanced Version scans one or more MS Word DOCX files, text files or text-like files (including HTML and XML files encoded in ANSI or UTF-8) and counts the number of occurrences of each distinct word across all files together, optionally ignoring common words such as "the" and "this". It is thus also a multiple-file word-search program. You can specify exactly what counts as a word (e.g., words with or without hyphens or numerals). The words and phrases found can be listed alphabetically or by frequency, with rank and frequency count displayed for each.
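The counting described above can be illustrated with a short Python sketch. This is not the program's own code: the list of common words, the definition of a word (letters with optional internal hyphens) and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical list of common words; the program's actual list is not given here.
COMMON_WORDS = {"the", "this", "a", "and", "of"}

# Assumed definition of a word: letters, optionally joined by internal hyphens.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def count_words(texts, ignore_common=True):
    """Count occurrences of each word in all texts together."""
    counts = Counter()
    for text in texts:
        for word in WORD_PATTERN.findall(text.lower()):
            if ignore_common and word in COMMON_WORDS:
                continue
            counts[word] += 1
    return counts


def by_frequency(counts):
    """Return (rank, word, count) tuples, most frequent first; ties are alphabetical."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, n) for rank, (word, n) in enumerate(ordered, start=1)]


def alphabetically(counts):
    """Return (word, count) pairs in alphabetical order."""
    return sorted(counts.items())
```

Passing the text of several files as `texts` gives a single combined count, which is the multiple-file behavior described above.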
In theory there is no limit on the size of an input file or the number of words in it, but in practice, because of the processing time needed, there is a limit of about 10 MB on text files (and text-like files such as XML and HTML files). There is also a limit of about 10 MB on the amount of text in an MS Word DOCX file (though a DOCX file can be larger than this if it contains many images). For a DOCX file, only words in the body of the document are counted, not words in footnotes or endnotes.
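To illustrate why body text can be read separately from footnotes and endnotes: a DOCX file is a ZIP archive in which the body is stored in the part `word/document.xml`, while footnotes and endnotes are stored in their own parts. The Python sketch below reads only the body part. It is a minimal illustration, not the program's method, and the function name `docx_body_text` is hypothetical; it does not handle special cases such as text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_body_text(source):
    """Return the body text of a DOCX file, one line per paragraph.

    Only word/document.xml is read. Footnotes and endnotes are held in
    separate parts (word/footnotes.xml, word/endnotes.xml) and are not opened.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more <w:t> elements.
        pieces = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

The returned string can then be treated like the content of an ordinary text file.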
The Advanced Version does everything that the basic version does, including support for UTF-8 encoded text. The section below details the additional functionality of the Advanced Version: mainly the ability to count words in multiple files, the ability to count phrases as well as words, and the ability to count occurrences of a word or phrase that matches a specified pattern (so it is also a multiple-file search program). The user manual for the basic version should therefore be read in conjunction with this page.
This software counts words and phrases in MS Word DOCX files (but not Word DOC files) and in text and text-like files (including HTML and XML files). It does not act directly on binary files (other than DOCX files) such as PDF files; such files can be processed if they can be converted to DOCX files or to text files (see Scannable Files in the user manual for the basic version).
Click here for a screenshot showing the output when relative frequencies (instead of absolute frequencies) are calculated.
Here is a screenshot showing the result of counting all phrases of 4 to 8 words in a DOCX file of size 21.46 KB containing 13.70 KB of actual text:
This software has many different uses. One example is searching for words and phrases in news stories. You can download multiple pages from the web and then search through them for such terms as “economic recovery”, “chinese stocks”, “air traffic controller strike” and “IMF payment”. Searches can return the names of the files in which the target phrases occur, as explained at Report Formats (see also Sorting Documents by the Number of Occurrences of a Word or Phrase).
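The news-story example can be sketched in Python as follows. This only illustrates the idea of searching several documents for target phrases and listing the files in order of number of occurrences; the function names, the case-insensitive matching and the `*.txt` file pattern are assumptions made for the example, not a description of the program's report formats.

```python
import re
from pathlib import Path


def count_phrase(text, phrase):
    """Count case-insensitive occurrences of a phrase, allowing any whitespace between words."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def search_documents(documents, phrases):
    """Map each phrase to a list of (file name, count), highest count first.

    `documents` maps a file name to that file's text. Files in which the
    phrase does not occur are left out of the list.
    """
    report = {}
    for phrase in phrases:
        hits = [(name, count_phrase(text, phrase)) for name, text in documents.items()]
        hits = [(name, n) for name, n in hits if n > 0]
        hits.sort(key=lambda hit: (-hit[1], hit[0]))
        report[phrase] = hits
    return report


def load_folder(folder, pattern="*.txt"):
    """Read every matching file in a folder into a {file name: text} mapping."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(folder).glob(pattern))
    }
```

For example, `search_documents(load_folder("news"), ["economic recovery", "IMF payment"])` would list, for each phrase, the files in a hypothetical `news` folder that contain it.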
The ability to:
And, new in Version 26.07, the ability to filter found phrases so as to display only phrases containing specified words.
Most users will need just a few of these abilities, so only the relevant parts of the user manual need be consulted.
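The phrase filter introduced in Version 26.07 can be pictured with the following Python sketch. The function name and the `require_all` switch are hypothetical; whether the program keeps phrases containing any or all of the specified words is not stated here, so both options are shown.

```python
def filter_phrases(phrase_counts, specified_words, require_all=False):
    """Keep only the phrases that contain the specified words.

    By default a phrase is kept if it contains at least one specified word;
    with require_all=True it must contain every one of them. Matching is
    case-insensitive and on whole words. Both choices are assumptions.
    """
    wanted = {word.lower() for word in specified_words}
    kept = {}
    for phrase, count in phrase_counts.items():
        present = wanted & set(phrase.lower().split())
        matches = (present == wanted) if require_all else bool(present)
        if matches:
            kept[phrase] = count
    return kept
```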
The "Convert plural English words to singular" checkbox and the "Ignore words with fewer than (or more than) N occurrences" checkboxes are applicable only when counting words, not when counting phrases.
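The effect of the occurrence thresholds on a word count can be sketched in Python as below. The function and parameter names are hypothetical, and the sketch assumes that "fewer than N" and "more than N" exclude words strictly below or strictly above the given number.

```python
def ignore_by_occurrences(word_counts, fewer_than=None, more_than=None):
    """Drop words whose count is below `fewer_than` or above `more_than`.

    Either limit may be left as None to disable it.
    """
    kept = {}
    for word, n in word_counts.items():
        if fewer_than is not None and n < fewer_than:
            continue
        if more_than is not None and n > more_than:
            continue
        kept[word] = n
    return kept
```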
The default for "Phrase must not go beyond ..." is everything except 'comma' and 'end of line'. If your document consists of phrases separated by commas then you should check the box for 'comma'. Such text should not be mixed with text in which an end-of-line, period or double quote terminates a phrase.
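The idea of a phrase not going beyond a terminator can be sketched in Python as follows. The text is split at the chosen terminators, and every run of consecutive words within a piece is counted as a phrase. The set of terminators shown (only the four named on this page), the default selection and the definition of a word are assumptions; the program may offer other terminators as well.

```python
import re
from collections import Counter

# Assumed terminator characters, keyed by the names used on this page.
TERMINATORS = {
    "period": ".",
    "double quote": '"',
    "comma": ",",
    "end of line": "\n",
}


def count_phrases(text, min_words, max_words, stop_at=("period", "double quote")):
    """Count every phrase of min_words to max_words words that does not cross a terminator."""
    chars = "".join(re.escape(TERMINATORS[name]) for name in stop_at)
    segments = re.split(f"[{chars}]", text) if chars else [text]
    counts = Counter()
    for segment in segments:
        # Assumed definition of a word: letters, digits, apostrophes and hyphens.
        words = re.findall(r"[A-Za-z0-9'-]+", segment.lower())
        for size in range(min_words, max_words + 1):
            for start in range(len(words) - size + 1):
                counts[" ".join(words[start:start + size])] += 1
    return counts
```

With `stop_at=("comma",)`, the text `red apple, green pear, red apple` yields "red apple" twice and "green pear" once, and no phrase such as "apple green" that spans a comma.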