Further Voyages to the Land of The Hapax

In this section of what seems to be an endless essay on the theme of the “hapax legomena” and 1960s experimental music (in particular, the ONCE Festival and the works of Robert Ashley, with a special focus on Ashley’s 1972 composition “In Sara, Mencken, Christ, and Beethoven There Were Men and Women,” based on a book of the same name by the mysterious John Barton Wolgamot), I will try to focus in on the politics of the hapaxic. The main section will be a walk-through of an essay, chosen at random, from the scholarly literature on “hapax legomena.”

Then I will turn to some stray thoughts I had after completing last week’s essay, which, I hope, will set the stage adequately for future researches.

Our text, the first hit on a JSTOR search of “hapax legomena,” is an essay by Frederick E. Greenspahn: “The Number and Distribution of Hapax Legomena in Biblical Hebrew” in Vetus Testamentum, Vol. 30, Fasc. 1 (Jan. 1980).

About Frederick E. Greenspahn or Vetus Testamentum I know absolutely nothing. (By an accident of birth and education, I am fluent in Biblical Hebrew, but I don’t think that is either here or there.)

Greenspahn’s concern is to intervene in debates regarding words that appear only once in Jewish holy writ. The Hebrew Bible represents––it stands to reason––only a small slice of the total set of literary works produced in ancient Israel; but because only this small slice survives, philologists must necessarily engage in a certain amount of speculative or probabilistic thinking.

“Many words and phenomena,” Greenspahn writes, “which are rare in the extant corpus may have been widespread in antiquity while there were no doubt others in use which are simply unknown to modern scholarship.” Additionally, there may well have been words in use at the time of the Old Testament’s writing that the text’s writer “never had occasion to use.”

Nowhere do these problems become clearer than in the case of “hapax legomena.”

According to conventional wisdom, Greenspahn reports, “an inordinately large sector of the Biblical Hebrew vocabulary” is made up of “words which occur a single time only…”

The conclusion follows: “although these words may not have been rare in antiquity, their limited attestation in extant texts makes them particularly obscure for us.”

The cumulative effect of this profusion of “hapax legomena,” (here Greenspahn quotes one G.R. Driver) is that “an unduly high proportion of words is exceedingly rare so that their precise sense can hardly be determined.” This strikes us as interesting, because it introduces the theme of “indeterminacy”––also a key concern of the experimental aesthetics we are hoping to make sense of.

Greenspahn seeks to counter this consensus by means of modern methods of studying the word frequency and distribution––the “statistical stylistics”––of the Hebrew Bible’s “hapax legomena.”

To study these “hapax legomena” requires, first, that the author settle on a list of such words, which is no simple matter. Borrowing a distinction from a certain Casanowicz (a distinction that seems like it might be useful to bookmark), Greenspahn differentiates words that merely occur only once, on the one hand, from a purer form of “absolute hapax legomena,” on the other, which “are either absolutely new coinages of roots or can not be derived in their formation or in their specific meaning from other occurring stems.”

There is only limited agreement among scholars, Greenspahn cautions, as to the number or identity of biblical “hapax legomena.” “While there is no consensus as to the precise size of the Bible’s vocabulary,” he continues, “the widely used Lexicon contains some fifty-seven hundred separate entries.” C. Rabin finds 2440 “hapax legomena,” i.e. about two-fifths of that vocabulary, whereas Greenspahn’s count yields 1501, just over one-fourth.
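The fractions quoted here are easy enough to check. What follows is a minimal sketch, assuming the round figure of 5,700 lexicon entries as the denominator (the quotation says only “some fifty-seven hundred,” so the percentages are approximate). The variable names are mine, not Greenspahn’s.

```python
# Approximate size of the lexicon, per the quotation ("some fifty-seven hundred").
LEXICON_ENTRIES = 5700

# Counts of hapax legomena as reported in the text above.
counts = {"Rabin": 2440, "Greenspahn": 1501}

# Share of the vocabulary that each count represents.
shares = {name: n / LEXICON_ENTRIES for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of the lexicon")
```

On these assumptions Rabin’s count comes to roughly 43 percent of the lexicon and Greenspahn’s to roughly 26 percent, which squares with “about two-fifths” and “just over one-fourth.”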

To compare the rate of “hapax legomena” within the Hebrew Bible, one must first establish how many nonce words occur, and with what frequency, in other literatures.

“The question of word frequency within a given text,” Greenspahn writes, “has been considered in some detail by statisticians interested in finding quantitative measures of style.”

Several factors affect the specific proportion of a work’s vocabulary which occurs only once: the nature of the language (as a highly inflected language will necessarily have a greater variety of forms); style (“some authors tend to use a more erudite vocabulary and thus a greater proportion of rare words than do others”); and length of text being studied (“a small text provides little opportunity for words to recur and contains therefore a large proportion of rare words; as the length increases, however, there is a growing probability that these words will be repeated”).
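To make the measure concrete, here is a minimal sketch of how one might compute the proportion of a text’s vocabulary that occurs only once. It is my own toy illustration, not Greenspahn’s procedure: the function name `hapax_share` is invented, the tokenizer is deliberately crude (lowercased runs of unaccented Latin letters and apostrophes), and every distinct spelling counts as a separate word, which is precisely why a highly inflected language would score higher.

```python
import re
from collections import Counter


def hapax_share(text: str) -> float:
    """Return the fraction of distinct words in `text` that occur exactly once."""
    # Crude tokenization (an assumption): lowercased runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    frequencies = Counter(words)
    if not frequencies:
        return 0.0
    hapaxes = [word for word, count in frequencies.items() if count == 1]
    return len(hapaxes) / len(frequencies)


# Two invented samples: the second simply gives the same words room to recur.
short_sample = "the cat sat on the mat"
longer_sample = short_sample + " and the cat sat on the mat again and again"

print(hapax_share(short_sample))
print(hapax_share(longer_sample))
```

In the short sample four of the five distinct words appear only once; in the longer one every word recurs and the share drops to zero. The example is artificial, but it shows the length effect the quotation describes.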

Greenspahn affirms the strange point that we reported in the first installment of this essay (in regard to the excellently named Zipf’s Law): “hapaxes” are always the most common subset of any text’s words. While the exact percentages of “hapax legomena” vary, “nonetheless it has been incontrovertibly demonstrated on both theoretical and empirical grounds that when the words in a text are arranged according to the frequency with which they occur, the hapax legomena always comprise the largest group, ranging usually between two and three fifths of the vocabulary, followed by dislegomena and so on…” This general range “remains remarkably constant regardless of the nature of the linguistic material.”

Greenspahn continues:

Among literary works, for example, Corneille’s L’Illusion Comique contains forty-four per cent while The Captain’s Daughter by Pushkin has fifty per cent hapax legomena and Shakespeare’s plays As You Like It and Julius Caesar fifty-nine and fifty-five per cent respectively. Similar results can be established from the examination of ancient documents. The Greek text of Mark’s gospel includes forty-seven per cent, while Paul’s epistles are composed of forty-three per cent hapax legomena. A study of Plautus’ vocabulary discovered sixty-four per cent, while there are fifty-seven per cent in each of two of Seneca’s consolations.

Heterogeneous and non-literary samples yield similar results. An analysis of newspaper English discerned fifty per cent hapax legomena, while a selection of English telephone conversations contained thirty-seven per cent and French conversations thirty-three per cent.

The takeaway from all this, then, is that not only is the scholarly consensus about the high frequency of “hapax legomena” in the Hebrew Bible sort of wrong, it is, in fact, exactly wrong: the Bible is a uniquely low-“hapax” text.

Why? It could be that something about biblical Hebrew disfavors the hapaxic. More interesting, to me at least, is the possibility of “obscured homographs”: “(The Bible’s) antiquity and complex history may have obscured the identity of several homographs, words which although spelled the same are etymologically distinct.”

All of this points to the need to revise the conventional wisdom regarding scholarly controversies about the authenticity of certain Biblical nonce words. Again, we would want to flag this part of the discussion as particularly philosophically rich, a section to which we would want to return. “That some words appear only once,” Greenspahn insists, “is inherent in the way language is used and reflects the kind of random processes which underlie word selection and distribution.”

Still, the question of the low “hapax” frequency of the Hebrew Bible remains.

It could be that the length of the text is decisive (Greenspahn cites E. Ullendorff’s observation that the Bible contains some 300,000 words). It could be that “hapax legomena” tend to pop up in certain sub-forms that occur in only parts of the Bible: for example, lists of “species of forbidden animals or the kinds of decoration installed in a royal palace.” (This note would bring to mind the list or litany form of John Barton Wolgamot’s In Sara, Mencken and Ashley’s adaptation). “This is supported for the biblical material,” Greenspahn writes, “by the fact that of the twenty-eight verses which contain more than one ‘hapax legomenon’ apiece, fourteen include lists which by their very nature treat a particular topic more exhaustively than would normally be the case…”

The best way to determine if the Hebrew Bible is actually uniquely “hapax”-lite, at the level of statistical distribution, is the “chi-square test,” which measures the observed distribution against a hypothetical randomization. (This seems to me to be a philosophically pregnant and intellectual-historically meaningful point as well: as Ian Hacking points out, there is a history to the idea of “randomness,” and, in fact, our modern ideas of “randomness” and “intentionality” are quite new.)
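For readers who, like me, need reminding what a chi-square test does: it compares observed counts with the counts one would expect if the items were scattered at random. A minimal sketch of the statistic follows. I am assuming that “random” means each book receives “hapax legomena” in proportion to its length, which may not be exactly how Greenspahn set up his test. The three “books” and all their numbers are invented for illustration and are not Greenspahn’s data, and the function name is my own.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical figures, NOT taken from Greenspahn's article.
book_lengths = {"Book A": 10_000, "Book B": 30_000, "Book C": 60_000}  # in words
observed_hapaxes = {"Book A": 25, "Book B": 30, "Book C": 45}

total_words = sum(book_lengths.values())
total_hapaxes = sum(observed_hapaxes.values())

# Assumed null hypothesis: hapaxes fall in each book in proportion to its length.
expected_hapaxes = {
    book: total_hapaxes * length / total_words
    for book, length in book_lengths.items()
}

statistic = chi_square_statistic(
    [observed_hapaxes[book] for book in book_lengths],
    [expected_hapaxes[book] for book in book_lengths],
)
degrees_of_freedom = len(book_lengths) - 1
print(f"chi-square = {statistic:.2f} with {degrees_of_freedom} degrees of freedom")
```

With these invented numbers the statistic comes to 26.25, well above the conventional 5 percent critical value for two degrees of freedom (about 5.99). One would conclude that the imaginary hapaxes are not evenly spread: Book A holds far more of them than its length predicts, Book C fewer.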

Greenspahn’s conclusion is that Job, Song of Songs, Isaiah, Proverbs, Nahum, Lamentations, and Habakkuk are unusually loaded with “hapax legomena”; those with a significant lack of absolute “hapax legomena” are: Chronicles, Kings, Joshua, Exodus, and Samuel.

This suggests that the more “poetic” the book, the higher the likelihood that it will contain “hapax legomena.” That poetry contains a greater concentration of “hapax legomena,” Greenspahn observes, is “hardly surprising”:

Poetry requires a much broader vocabulary for its imagery and descriptions than does prose writing; as such, it is bound to contain a higher proportion of rare words. Semitic poetry, with its emphasis on parallelism, carries this need still further; the demands of such a medium force the poet to reach deeper into his vocabulary to find a synonym than might otherwise be the case.


In the comments section of the most recent installment of this essay I made a few notes that I would like to situate against Greenspahn’s article.

First, a note on the “hapaxic” as a form of critique of the standardizing pressures of the “American system of production.”

A passage from Thorstein Veblen’s Theory of Business Enterprise (1904):

The materials and moving forces of industry are undergoing a like reduction to staple kinds, styles, grades, and gauge. Even such forces as would seem at first sight not to lend themselves to standardization, either in their production or their use, are subjected to uniform scales of measurement; as, e.g., water-power, steam, electricity, and human labor. The latter is perhaps the least amenable to standardization, but, for all that, it is bargained for, delivered, and turned to account on schedules of time, speed, and intensity which are continually sought to be reduced to a more precise measurement and a more sweeping uniformity.

The like is true of the finished product. Modern consumers in great part supply their wants with commodities that conform to certain staple specifications of size, weight, and grade. The consumer…. can to an appreciable degree specify his needs and his consumption in the notation of the standard gauge. As regards the mass of civilized mankind, the idiosyncracies of the individual consumers are required to conform to the uniform gradations imposed upon consumable goods by the comprehensive mechanical processes of industry. ‘Local color,’ it is said, is falling into abeyance in modern life, and where it is still found it tends to assert itself in units of the standard gauge (11-12).

This Veblen quote leads us to consider the question: is a hapax the polar opposite of a cliché, and what would it mean if that were true?

And thinking about clichés reminds us, again, of one possible, additional antonym: the term “haecceity,” which we snuck into the title of last week’s post.

“Haecceity” is a philosophical term of art associated with the medieval philosopher Duns Scotus, the American pragmatist Charles S. Peirce, and the great synthesizer of Scotus and Peirce, Gilles Deleuze.

As the Stanford Encyclopedia of Philosophy explains:

First proposed by John Duns Scotus (1266–1308), a haecceity is a non-qualitative property responsible for individuation and identity. As understood by Scotus, a haecceity is not a bare particular in the sense of something underlying qualities. It is, rather, a non-qualitative property of a substance or thing: it is a “thisness” (a haecceitas, from the Latin haec, meaning “this”) as opposed to a “whatness” (a quidditas, from the Latin quid, meaning “what”). Furthermore, substances, on the sort of metaphysics defended by Scotus, are basically collections of tightly unified properties, all but one of them qualitative; the one non-qualitative property is the haecceity. In contrast to more modern accounts of the problem of individuation, Scotus holds that the haecceity explains more than just the distinction of one substance from another. According to Scotus, the fact that individual substances cannot be instantiated — are indivisible or incommunicable, as Scotus puts it — also requires explaining. In short a haecceity is supposed to explain individuality.

This discussion leads us to a few quotes–extremely germane to the question of haecceity and hapaxicity––from John Rajchman’s The Deleuze Connections:

Classical philosophy was directed against superstition and error; and no doubt there is still need for such an orientation of thought. But philosophy would confront other problems… In the nineteenth century Deleuze thinks there emerges a new problem––(…) stupidity…Flaubert helps introduce the problem into literature: Bouvard and Pecuchet expose the stupidity of the encyclopedia whose image still haunted Hegel. More generally, Deleuze finds a sort of parallel movement in modern works of art: the great struggle to free sensation of aisthesis from clichés or mere ‘probabilities’ and discover the mad change of singularities. For, as Deleuze puts it in his analysis of postwar cinema, we live in a civilization not of the image, but of clichés, in which the whole question is precisely to extract a genuine image (10-11).

Thus, in contrast to ‘particularity,’ Deleuze talks of ‘singularity,’ and in contrast to ‘generality,’ he talks of an indeterminate ‘plane of composition’ in which singularities would coalesce or come together. He finds an example of the first in the notion of ‘haecceity’ in Duns Scotus, while Spinoza’s notion of ‘Substance’ affords an example of the second; and in Deleuze’s logical universe, there thus exists, as it were, something ‘smaller’ than the most specified individual, ‘larger’ than the most general category… A ‘singularity’ is thus not an instance or instantiation of anything––it is not particularity or uniqueness. As Deleuze puts it, its individuation is not a specification… But this not-fitting-in-a-class, this ‘indefiniteness’… is not a logical deficiency or incoherence, but, rather, as with what Peirce called ‘firstness,’ it is a kind of power or chance, a ‘freshness’ of what has not yet been made definite by habit or law (54-55).

All of this is no doubt a remarkably indirect way to set a table (it is not very different from the way I set an actual table, which is why I am not often asked to set tables).

Nevertheless, I think that, step by errant step, we are gradually approaching something useful.