Piotrowski M. Natural Language Processing for Historical Texts

Springer Nature Switzerland, 2022. — 154 p. — ISBN 978-3-031-01018-7, 978-3-031-02146-6 (eBook)
This book is about natural language processing (NLP) for historical texts. For the purpose of NLP, historical texts may be defined as texts written in historical languages. Defining the term historical language is actually harder than it may seem at first. One of the definitions for historical given by the New Oxford American Dictionary is “belonging to the past, not the present.” Latin, Ancient Greek, or Biblical Hebrew would clearly fit that description. Old English, Old German, or Old French are obviously also historical languages, as texts written in these languages are not understandable to today’s speakers of English, German, or French.
The problem is, however, that present and past are moving targets: today’s present will be tomorrow’s past. Historical linguists have defined language stages with conventional start and end dates; for example, Old English is said to have been used up to 1150, Middle English from 1150 to 1470, and modern English since about 1500. Should we thus consider only English texts from before 1500 as “historical”? Certainly not. The language of Shakespeare’s (1564–1616) plays also clearly differs from today’s language, and while a modern English speaker can understand a large portion of it, a glossary is required for many words that have fallen out of use. Abraham Lincoln’s Gettysburg Address, given on November 19, 1863, is even more recent, but even though it is relatively easy to understand for educated modern speakers of English, it differs notably in style from modern English texts: today, nobody would say “four score and seven years” to mean 87.
In their preface to a special issue of the journal Traitement Automatique des Langues, Denooz and Rosmorduc suggest that “the best delimitation of ancient languages might be in terms of scholarly community” (Denooz and Rosmorduc, 2009, p. 13). Even though somewhat self-referential, this can be a useful definition, as these communities (e.g., historical linguists, medievalists, or Egyptologists) will both use and inform NLP for historical languages.
We can see that it is not easy to define what historical languages are, even though it is obvious that they exist. However, for natural language processing as an engineering discipline, perhaps we do not need a scholarly definition. Instead, we could look at where standard NLP tools and techniques, which are aimed at current languages, run into problems when applied to older texts. This may give us a number of features that characterize historical languages and thus define the requirements for NLP for historical languages.