Hermetic Word (and Phrase) Frequency Counter
A Customizable Multiple-File Word and Phrase Counting Program for Windows
Hermetic Word Frequency Counter Advanced Version scans one or many MS Word DOCX files or text or text-like files — including HTML and XML files encoded via ANSI or UTF-8 — and counts the number of occurrences of the different words in all files together (optionally ignoring common words such as the and this). It is thus also a multiple-file word-search program. It is possible to specify exactly what counts as a word (e.g., words with or without hyphens or numerals). The words and phrases found can be listed alphabetically or by frequency, with rank and frequency count displayed for each.
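The kind of counting described above can be illustrated with a minimal Python sketch. It is not the program's actual implementation: the stop-word list `COMMON_WORDS`, the regular expression `WORD_PATTERN` (which here allows hyphenated words) and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list and word definition, chosen for illustration only.
COMMON_WORDS = {"the", "this", "a", "and", "of"}
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")  # letters, hyphens allowed

def count_words(texts, ignore_common=True):
    """Count word occurrences in all texts together (case-insensitive)."""
    counts = Counter()
    for text in texts:
        for word in WORD_PATTERN.findall(text.lower()):
            if ignore_common and word in COMMON_WORDS:
                continue
            counts[word] += 1
    return counts

def ranked(counts):
    """Return (rank, word, frequency) rows, most frequent first, ties alphabetical."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]

if __name__ == "__main__":
    sample = ["The well-known cat and the cat", "This cat"]
    for row in ranked(count_words(sample)):
        print(row)
```

Changing `WORD_PATTERN` is one way to model the choice of what counts as a word, for example by dropping the hyphen clause or by admitting numerals.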
This software comes in two versions: Hermetic Word Frequency Counter (WFC) and Hermetic Word Frequency Counter Advanced Version (WFCA). These are two separate programs. The main difference is that WFC counts words only in single DOCX, text and text-like files (including HTML and XML files), whereas WFCA counts words and phrases in multiple files (in multiple folders) in a single operation. If you need to count words in only one file at a time, then WFC is what you need. (Click on this link for the WFC page.) If you have many files, wish to count phrases, or need more options and functionality for words and phrases, then you need WFCA (so read on).
Theoretically there is no limit on the size of an input file or the number of words in it, but in practice (due to processing time needed) there is a limit of about 10 Mb on text files (and text-like files such as XML and HTML files). There is also a limit of about 10 Mb on the amount of text in an MS Word DOCX file (though a DOCX file can be larger than this if it contains many images). For a DOCX file, only words in the body of the document are counted, not words in footnotes or endnotes.
The Advanced Version does everything that the basic version does, including support for UTF-8 encoded text. The section below details the additional functionality of the Advanced Version: mainly the ability to count words in multiple files, the ability to count phrases as well as words, and the ability to count occurrences of a word or a phrase which matches a specified pattern (so it is also a multiple-file search program). The user manual for the basic version should therefore be read in conjunction with this page.
This software counts words and phrases in MS Word DOCX files (but not Word DOC files) and in text and text-like files (including HTML and XML files). It does not act directly on binary files (other than DOCX files) such as PDF files; such files can be processed if they can be converted to DOCX files or to text files (see Scannable Files in the user manual for the basic version).
To open a single file to count words or phrases, select the Single File option and click on the Single File button. To count words or phrases in multiple files in a particular folder, select the Folder option and click on the Folder button. After setting the operation parameters click on the appropriate Count button. Below is a screenshot showing the results of counting words in all .txt and .DOCX files (except for files ending in _E.DOCX) in a folder:
Click here for a screenshot showing the output when relative frequencies (instead of absolute frequencies) are calculated.
Here is another screenshot when 3-word phrases (in the same set of files) are counted.
Here is a screenshot showing the result of counting all words in 8 Word DOCX files (sizes ranging from 14 Kb to 1400 Kb) with common words ignored:
Here is a screenshot showing the result of counting all phrases of 4 to 6 words which occur at least 3 times in a Word DOCX file of size 43.37 Kb containing 40.20 Kb of text:
This software has many different uses. One example is for searching for words and phrases in news stories. You can download multiple pages from the web then search through them for such terms as “economic recovery”, “chinese stocks”, “air traffic controller strike” and “IMF payment”. Searches can return the names of the files in which the target phrases occur, as explained at Report Formats (see also Sorting Documents by the Number of Occurrences of a Word or Phrase). There are, of course, many other possible uses for this software.
Differences from the Basic Version
The following are some (but not all) features of the Advanced Version (WFCA) which are not present in the basic version (WFC):
The ability to:
- count not just all words in a file but also all phrases (within bounds of phrase length).
- scan not just one file but all files in a folder, and optionally in all subfolders of that folder, and to return a single report on the frequencies of words and phrases in all files scanned.
- specify not only a list of words to be ignored (such as common words in a natural language) but also specify a list of words and phrases which are to be counted (or searched for).
- count words or phrases matching a given pattern.
- ignore words matching a given pattern.
- display relative frequency of occurrence as well as absolute frequency.
- display, for each word or phrase found when scanning multiple files, the files in which it occurs, and how many times.
- order words or phrases according to the number of files in a set of files in which those words or phrases occur.
- include or exclude files of certain types.
- generate an Excel-readable file containing a table of frequencies of words and phrases vs the files in which they occur.
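Several of the abilities listed above (counting phrases within a length range, reporting the files in which each phrase occurs, and producing a table of phrases versus files) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is a dictionary of file names to text rather than real files, and the sketch makes no attempt to stop phrases at punctuation.

```python
import csv
import io
import re
from collections import Counter, defaultdict

def phrases(text, min_len, max_len):
    """Yield every run of min_len..max_len consecutive words in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def count_by_file(documents, min_len=2, max_len=2):
    """Map each phrase to a Counter of {file name: occurrences}."""
    table = defaultdict(Counter)
    for name, text in documents.items():
        for phrase in phrases(text, min_len, max_len):
            table[phrase][name] += 1
    return table

def to_csv(table, file_names):
    """Render a phrase-by-file frequency table as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["phrase", "total", "files"] + list(file_names))
    # Highest total first; ties broken alphabetically.
    rows = sorted(table.items(),
                  key=lambda kv: (-sum(kv[1].values()), kv[0]))
    for phrase, per_file in rows:
        total = sum(per_file.values())
        writer.writerow([phrase, total, len(per_file)]
                        + [per_file.get(name, 0) for name in file_names])
    return out.getvalue()

if __name__ == "__main__":
    docs = {"a.txt": "economic recovery is slow",
            "b.txt": "Economic recovery; economic recovery!"}
    print(to_csv(count_by_file(docs), ["a.txt", "b.txt"]))
```

CSV is used here simply because it is a plain-text format that Excel can open; the format the program itself writes is not specified on this page.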
Most users will need just a few of these abilities, so only the relevant parts of the user manual need be consulted.
The 'Settings' Panel
Here is what the 'Settings' panel looks like in the Advanced Version:
Note that Ignore words and Ignore common words apply only when counting words, not when counting phrases. There is a good reason for excluding words-to-ignore and words-less-than-N-characters when counting words, but there is an equally good reason for not excluding them from phrases found (unless the phrase consists entirely of them): if they were excluded then (1) the program would not return the actual phrases found, and (2) different phrases might be conflated, so the program would give an incorrect report of the frequency of those phrases.
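The conflation problem can be seen in a small Python sketch. The ignore list `IGNORED` and both function names are hypothetical, and the code only illustrates the reasoning above rather than the program's own logic.

```python
from collections import Counter

IGNORED = {"the", "of", "in", "a"}  # hypothetical words-to-ignore list

def count_phrases(words, n, ignored=IGNORED):
    """Count n-word phrases, dropping only phrases made up entirely of ignored words."""
    counts = Counter()
    for i in range(len(words) - n + 1):
        phrase = words[i:i + n]
        if all(word in ignored for word in phrase):
            continue
        counts[" ".join(phrase)] += 1
    return counts

def count_phrases_stripped(words, n, ignored=IGNORED):
    """Naive alternative: delete ignored words first, then count phrases."""
    kept = [word for word in words if word not in ignored]
    return count_phrases(kept, n, ignored=set())

if __name__ == "__main__":
    words = "man of the house and man in the house".split()
    # Keeps the real phrases "man of the house" and "man in the house" apart.
    print(count_phrases(words, 4))
    # Reports "man house" twice, a phrase that never occurs in the text.
    print(count_phrases_stripped(words, 2))
```

With the ignored words stripped first, "man of the house" and "man in the house" both collapse to "man house", which is exactly the kind of incorrect report described above.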
The default for Phrase must not go beyond ... is everything except 'comma'. If, however, your document contains
then you would want to uncheck the box for 'end of line' and check the box for 'comma'. Note that text as in this example should not be mixed with text where an end-of-line, period or double quote terminates a phrase.
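As a rough illustration of what a phrase boundary means, the Python sketch below splits the text at the selected boundary characters and forms phrases only within each piece. The `BOUNDARIES` mapping and the function name are assumptions; the real Settings panel may offer other choices and may behave differently in detail.

```python
import re

# Hypothetical mapping from boundary names to characters.
BOUNDARIES = {"end of line": "\n", "period": ".",
              "double quote": '"', "comma": ","}

def bounded_phrases(text, n, stop_at=("end of line", "period", "double quote")):
    """Return n-word phrases that do not cross any of the selected boundaries."""
    if stop_at:
        splitter = "[" + "".join(re.escape(BOUNDARIES[name])
                                 for name in stop_at) + "]"
        segments = re.split(splitter, text)
    else:
        segments = [text]  # no boundaries selected: treat the text as one piece
    found = []
    for segment in segments:
        # Unselected boundary characters are simply skipped between words.
        words = re.findall(r"[A-Za-z0-9'-]+", segment)
        for i in range(len(words) - n + 1):
            found.append(" ".join(words[i:i + n]))
    return found

if __name__ == "__main__":
    text = "red apple, green pear\nblue plum"
    print(bounded_phrases(text, 2))                     # comma unchecked
    print(bounded_phrases(text, 2, stop_at=("comma",)))  # only comma checked
```

With the default selection the phrase "apple green" is found across the comma but nothing spans the line break; with only 'comma' selected the opposite holds, and "pear blue" is found across the line break.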
User Manual for Hermetic Word Frequency Counter Advanced Version
As stated above, the Advanced Version does everything that the basic version does, so the following sections of the user manual for the basic version apply also to the Advanced Version.
Trial version: A copy of the Hermetic Word Frequency Counter Advanced Version installation program can be downloaded for the purpose of evaluation. Click on the following link for further information:
Download Hermetic Word Frequency Counter Advanced ...
Price and ordering: A single-user license for the fully-functional software is available for a period of 3 months, 1 year or with no time limit (a 'perpetual' license). Prices for each type of license are given at Purchase a User License. An activation key is required in order to make the trial version permanently fully functional, and can be obtained immediately after (or soon after) your purchase.
Refund: A refund will be provided promptly up to 30 days after purchase if the software does not perform satisfactorily.
Updates: Purchasers of a user license for this software are entitled to an update to any later version at no additional cost.
Upgrading from the basic version:
Purchasers of a perpetual user license for Hermetic Word Frequency Counter may upgrade to a perpetual user license for the Advanced Version by paying $26.45, €23.45 or £19.95 (excluding any sales tax). To purchase the upgrade click on one of the links below. Note that this is available only if a perpetual single-user license for Hermetic Word Frequency Counter has already been purchased.