- Computerised Phonetic Fund
of the Russian language.
The project started more than 10 years
ago. It is carried out in cooperation with the
University of Ruhr (Bochum, Germany). The
Computerised Phonetic Fund is conceived and
developed as a collection of three related
components:
a) acoustic material,
b) software tools for its processing and analysis
and
c) the results of this analysis.
The team also regularly publishes the Bulletin of
the Phonetic Fund in order to distribute
information on the Fund's new elements and
achievements and to present these results in
written form. All elements of the Fund are
described using the method of
Christian Sappok (University of Ruhr). As
prescribed by the general project, the Computerised
Fund is to contain a collection of all forms
and significant units of the Russian language, taking
into account all its variants and dialects.
- In particular, the "Phonotheque
of sound units" is a part of
the Phonetic Fund of the Russian language. It is
composed of syllables, words and texts. The
syllables included in the Phonotheque are all
possible combinations of consonants with vowels.
Moreover, this set of syllables presents all
possible variants of stressed vowel
realisation and all possible variants of "strong"
(i.e., followed by a vowel) realisation of
consonants in the Russian language. The words are
presented in three lists. The first contains the
words that are pronounced differently in
Moscow and in St. Petersburg; the second is
the minimum vocabulary for learning Russian as a
foreign language; the third consists of the 2300
most frequent words of the Russian language. The
textual part of the Phonotheque is a text
compiled so that it covers all the most frequent
sound sequences of Russian. It contains
frequent words which, in turn, contain the 200 most
frequent open syllables. This text is presented
in two forms: as a discourse (monologue form) and
as a conversation (dialogue form). The
Phonotheque is available in Bochum as a
magnetic recording, as a digitised copy,
and in transcribed form. The transcription
was made with the UDAR system (see the
description).
- Acoustic databases
in the format of the "Speech Corpus"
system (see the description).
Databases of this type are designed for
storing phonetically representative sound
material. The unit of storage and description in
this case is the syllable. The description
includes the orthographic representation of the
syllable in the text, an ideal phonemic (broad)
and phonetic (narrow) transcription, a real
transcription, statistical characteristics,
peculiarities of the syllable's realisation
connected with its prosodic position, and the name
of the sound file in which the syllable is stored.
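The syllable description above could be modelled roughly as follows. This is only a minimal sketch: the class name, the field names and the `by_sound_file` helper are hypothetical and are not taken from the "Speech Corpus" system itself, whose actual storage layout is not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyllableRecord:
    """One syllable entry; all field names are illustrative assumptions."""
    orthographic_form: str      # the syllable as written in the text
    ideal_phonemic: str         # ideal broad (phonemic) transcription
    ideal_phonetic: str         # ideal narrow (phonetic) transcription
    real_transcription: str     # transcription of the actual realisation
    prosodic_notes: str         # peculiarities tied to the prosodic position
    sound_file: str             # name of the file in which the syllable is stored
    statistics: Dict[str, float] = field(default_factory=dict)


def by_sound_file(records: List[SyllableRecord]) -> Dict[str, List[SyllableRecord]]:
    """Group syllable records by the sound file that stores them."""
    groups: Dict[str, List[SyllableRecord]] = {}
    for rec in records:
        groups.setdefault(rec.sound_file, []).append(rec)
    return groups
```

Keeping the statistical characteristics in a free-form mapping is a design choice made here only because the text does not say which statistics are stored.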
Part of the sound material from the Phonetic
Fund of the Russian Language is presented in this
format, namely the above-mentioned phonetically
representative text, composed of the 200 most
frequently occurring Russian syllables in all
possible rhythmic positions. This
text has been
recorded from four Russian speakers (2 male and 2
female), representing the Moscow and St.
Petersburg pronunciation standards, and also from
several foreign speakers (Bulgarian, Finnish,
American English, Korean, etc.). The recordings of
the foreign speakers demonstrate phonetic interference.
The sound material is recorded in 16-bit raw PCM
format at a 20 kHz sample rate.
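Because raw PCM files carry no header, the sample rate and sample width have to be supplied by whoever reads them. The sketch below wraps such a file in a WAV container using Python's standard `wave` module. Only the 16-bit width and the 20 kHz rate come from the text; mono audio and signed little-endian samples are assumptions, and the function name `raw_to_wav` is hypothetical.

```python
import wave

SAMPLE_RATE_HZ = 20_000   # stated in the text
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples, stated in the text
CHANNELS = 1              # assumption: mono recordings


def raw_to_wav(raw_path, wav_path, sample_rate=SAMPLE_RATE_HZ):
    """Copy headerless PCM samples into a WAV file; return duration in seconds.

    Assumes signed little-endian samples, which is what WAV expects for
    16-bit audio. The source does not state the byte order.
    """
    with open(raw_path, "rb") as src:
        pcm = src.read()
    # Drop a trailing odd byte so that only whole 16-bit samples are written.
    pcm = pcm[: len(pcm) - (len(pcm) % SAMPLE_WIDTH_BYTES)]
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(CHANNELS)
        dst.setsampwidth(SAMPLE_WIDTH_BYTES)
        dst.setframerate(sample_rate)
        dst.writeframes(pcm)
    return len(pcm) / (SAMPLE_WIDTH_BYTES * CHANNELS * sample_rate)
```

If a converted file plays back as noise, the byte order or channel assumption is the first thing to check.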
- Sound archives
(acoustic databases produced from the old sound
recording collections of the Institute of
Russian Literature (the so-called Pushkinsky Dom); in
these old collections the recordings were
made on magnetic tapes, wax cylinders,
discs, etc.; now they are presented in
computerised form):
These acoustic databases are managed by the "Sound Archive"
system (see the description). A database unit
consists of 20 fields. The first 11 fields correspond
to the archival attribution of the original sound
material (the archival number, time and genre of the
recording, place of the recording, name and age of
the performer and his/her nationality, etc.). The
remaining 9 fields are used for the storage of the
following information:
the sonogram of the sound signal;
the text of the recording in standard
orthography;
its transcription;
a translation of the text into another
language, or the complete text of the recording if the
database contains only a fragment of it;
expert's comments;
dialectological comment;
musicological comment;
phonetic comment;
general comment.
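The 20-field unit described above might be sketched as follows. All class and field names are hypothetical, not the "Sound Archive" system's own. The archival attribution is kept as a mapping because only some of its 11 fields are named in the description.

```python
from dataclasses import dataclass, field, fields
from typing import Dict

ARCHIVAL_FIELD_COUNT = 11  # stated above; only some of these fields are named
CONTENT_FIELD_COUNT = 9    # the nine fields listed above


@dataclass
class ArchiveContent:
    """The nine non-archival fields; names are illustrative assumptions."""
    sonogram: bytes = b""
    text: str = ""                      # text in standard orthography
    transcription: str = ""
    translation_or_full_text: str = ""
    expert_comment: str = ""
    dialectological_comment: str = ""
    musicological_comment: str = ""
    phonetic_comment: str = ""
    general_comment: str = ""


@dataclass
class ArchiveUnit:
    """One database unit: archival attribution plus the content fields."""
    attribution: Dict[str, str] = field(default_factory=dict)
    content: ArchiveContent = field(default_factory=ArchiveContent)

    def field_count(self) -> int:
        """Number of attribution entries plus content fields."""
        return len(self.attribution) + len(fields(self.content))
```

A fully populated unit would report `field_count() == 20`, which gives a simple completeness check when importing old catalogue data.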
Text files are presented in the Windows ANSI (Text
Only) format; sound files are saved in the 16-bit raw
PCM format at a 16 kHz sample rate.
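A small sketch of how such files might be read follows. "Windows ANSI" names whatever code page the Windows locale uses, so treating the text files as Windows-1251 (`cp1251`, the Cyrillic code page) is an assumption, as is mono audio. The function names are hypothetical.

```python
import os

TEXT_ENCODING = "cp1251"   # assumption: Cyrillic Windows ANSI code page
SAMPLE_RATE_HZ = 16_000    # stated in the text
BYTES_PER_SAMPLE = 2       # 16-bit samples, stated in the text
CHANNELS = 1               # assumption: mono recordings


def read_archive_text(path, encoding=TEXT_ENCODING):
    """Read a text file saved in a Windows ANSI code page as a Python string."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def pcm_duration_seconds(path, sample_rate=SAMPLE_RATE_HZ):
    """Estimate the duration of a headerless PCM file from its size alone."""
    size = os.path.getsize(path)
    return size / (BYTES_PER_SAMPLE * CHANNELS * sample_rate)
```

If the decoded text shows the wrong letters, the file was probably saved under a different code page and the `encoding` argument should be changed accordingly.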