Hermetic Word Frequency Counter Advanced Version
Count All Phrases

This section of the user manual explains what the "Count all phrases" button does. The previous section (Count All Words or Count Specified Words and Phrases) explains what the "Count words/phrases" button does.

A phrase is a sequence of two or more words (separated by spaces). Any moderately sized section of text contains a huge number of phrases, so when all phrases are counted a (user-specified) limit has to be placed on the maximum number of words in a phrase. All phrases must be checked to ascertain how often they occur. Usually only the phrases which occur more than once are of interest.
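To make the definition concrete, here is a minimal Python sketch that lists every phrase of a given length range in a short text. It is an illustration only, not the program's own code; the function name `phrases` and the assumption that words are separated purely by whitespace are hypothetical.

```python
def phrases(text, min_words=2, max_words=3):
    """Yield every run of min_words..max_words consecutive words in text."""
    words = text.split()  # assumption: words are separated by whitespace only
    for n in range(min_words, max_words + 1):
        # Slide a window of n words across the text.
        for start in range(len(words) - n + 1):
            yield " ".join(words[start:start + n])


sample = "all hands on board now"
found = list(phrases(sample, 2, 3))
print(found)
```

Even this five-word sample yields seven phrases of two or three words, which is why a limit on phrase length is needed for real documents.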

As noted previously, the main screen has two buttons for counting words and phrases. If you wish to (a) count all words (not phrases) in one or more files or (b) count only specified words or phrases then click on the Count words/phrases button; what happens then is explained in the previous section.

If you wish to count all phrases in one or more files then click on the Count all phrases button. This brings up a panel whose content depends on the settings, specifically, on (a) whether there is a specification of words to ignore and (b) whether there is a specification of words and phrases to count. The usual case is where you wish to count all phrases except for those which consist entirely of words given in a words-to-ignore file. In this case clicking on the Count all phrases button brings up a panel such as this:

You can then specify the lengths of the shortest and longest phrases that you are interested in, and the minimum number of times they should occur in order to be included in the results. Clicking on the Count phrases button will then count all such phrases except those consisting entirely of words-to-ignore.
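The three settings on this panel, together with the words-to-ignore rule, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the program's actual implementation: the function `count_phrases`, its parameter names, and the case-insensitive, whitespace-based word splitting are all hypothetical.

```python
from collections import Counter


def count_phrases(text, min_words, max_words, min_occurrences, ignore=()):
    """Count phrases of min_words..max_words words occurring at least min_occurrences times."""
    words = text.lower().split()  # assumption: case-insensitive, whitespace-separated
    ignore = {w.lower() for w in ignore}
    counts = Counter()
    for n in range(min_words, max_words + 1):
        for start in range(len(words) - n + 1):
            window = words[start:start + n]
            # Skip phrases consisting entirely of words-to-ignore.
            if ignore and all(w in ignore for w in window):
                continue
            counts[" ".join(window)] += 1
    # Keep only phrases that reach the minimum number of occurrences.
    return {p: c for p, c in counts.items() if c >= min_occurrences}


text = "the cat sat on the mat and the cat sat on the rug"
result = count_phrases(text, 2, 3, 2, ignore={"on", "the", "and"})
print(result)
```

In this sample 'on the' occurs twice but is left out because both of its words are in the ignore set, whereas 'sat on the' is kept because 'sat' is not.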

The Exclude redundant subphrases checkbox has an important effect. Suppose the phrase 'All hands on board now' occurs in the document, once or more, and the program is to count phrases of 3 to 5 words. If this checkbox is not checked then all subphrases of 3 to 5 words (such as 'All hands on' and 'hands on board now') will also be counted. If the checkbox is checked then a count will appear only for the phrase 'All hands on board now'.
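The effect of this checkbox can be sketched in Python. The manual does not state the exact rule the program uses, so this sketch assumes that a phrase is redundant when it is contained in a longer counted phrase that has the same count; the function names `count_ngrams`, `is_subphrase` and `drop_redundant` are hypothetical.

```python
from collections import Counter


def count_ngrams(words, min_words, max_words):
    """Count every run of min_words..max_words consecutive words."""
    counts = Counter()
    for n in range(min_words, max_words + 1):
        for start in range(len(words) - n + 1):
            counts[tuple(words[start:start + n])] += 1
    return counts


def is_subphrase(short, long):
    """True if the word tuple `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def drop_redundant(counts):
    """Assumed rule: drop a phrase if a longer phrase with the same count contains it."""
    kept = {}
    for phrase, count in counts.items():
        redundant = any(
            len(other) > len(phrase)
            and other_count == count
            and is_subphrase(phrase, other)
            for other, other_count in counts.items()
        )
        if not redundant:
            kept[" ".join(phrase)] = count
    return kept


words = "All hands on board now".split()
all_counts = count_ngrams(words, 3, 5)
print(len(all_counts))             # number of distinct phrases of 3 to 5 words
print(drop_redundant(all_counts))  # only the longest phrase survives
```

With the checkbox's behavior modeled this way, the six phrases of 3 to 5 words in 'All hands on board now' reduce to the single five-word phrase.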

As when counting words/phrases, results can be displayed in various formats by selecting one from the Format drop-down list. If the file source is a folder then the most detailed option is "word file-list (+freq)". A less detailed format, which gives the number of files in which a word or phrase occurs, is "word freq. no.files". If you have not already selected it, this panel gives you an opportunity to do so.

Other display formats which show the files in which words/phrases occur are described at Display Formats.

Larger values for the length of the longest phrase increase the processing time. The value for Minimum number of occurrences does not affect the processing time.
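A rough way to see why the longest phrase length matters is to count how many candidate phrase positions a straightforward sliding-window approach would have to examine. The sketch below assumes such an approach (the program's actual algorithm is not described in this manual), and the function name `candidate_windows` is hypothetical.

```python
def candidate_windows(total_words, min_words, max_words):
    """Number of phrase positions examined for phrase lengths min_words..max_words."""
    # A text of total_words words contains (total_words - n + 1) runs of n words.
    return sum(max(total_words - n + 1, 0) for n in range(min_words, max_words + 1))


# Compare different "longest phrase" settings for a 10,000-word text.
for longest in (3, 6, 10):
    print(longest, candidate_windows(10_000, 3, longest))
```

Under this assumption the work grows with each additional phrase length, while the minimum number of occurrences is only a filter applied to counts that have already been collected.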

If you uncheck the Exclude phrases consisting only of words to ignore checkbox, or if you have not specified any words-to-ignore, then the results will often contain phrases which are of no interest.

If you have specified a list of words or phrases to be counted (either in a count-only words/phrases file or in the Extra count-only words/phrases textbox in the Settings window) then clicking on the Count all phrases button brings up a similar panel which advises you that the operation will ignore the specification of count-only words/phrases (since you wish to count all phrases, not just particular phrases).

Here is an example of the output of a count-all-phrases operation on a 2.81 MB docx file whose 230.86 KB of text contains 331 phrases of from 3 to 6 words and 769 instances of those phrases:

To count all phrases which match a certain pattern see Counting Words and Phrases with Pattern-Matching.
