Linguistic resources of the
Department of Phonetics and Methods of Teaching Foreign Languages, St. Petersburg State University


  1. Computerised Phonetic Fund of the Russian language.
    The project started more than 10 years ago. It is carried out in cooperation with Ruhr University (Bochum, Germany). The Computerised Phonetic Fund is conceived and developed as a collection of three related components:
    a) acoustic material,
    b) software tools for its processing and analysis and
    c) the results of this analysis.
    The team also regularly publishes the Bulletin of the Phonetic Fund to announce the Fund’s new elements and achievements and to present these results in written form. All elements of the Fund are described using the method of Christian Sappok (Ruhr University). Under the general project, the Computerised Fund is intended to hold a collection of all forms and significant units of the Russian language, taking into account all its variants and dialects.
  2. In particular, the "Phonotheque of sound units" is a part of the Phonetic Fund of the Russian language. It is composed of syllables, words and texts. The syllables included in the Phonotheque are all possible combinations of consonants with vowels. Moreover, this set of syllables presents all possible variants of stressed vowel realisation and all possible variants of "strong" (i.e., followed by a vowel) realisation of consonants in the Russian language. The words are presented in three lists. The first contains words that are pronounced differently in Moscow and in St. Petersburg; the second is the minimum dictionary for learning Russian as a foreign language; the third consists of the 2300 most frequent words of the Russian language. The textual part of the Phonotheque is a text compiled so that it covers all the sound sequences most frequent in Russian. It contains frequent words which, in turn, contain the 200 most frequent open syllables. This text is presented in two forms: as a discourse (monologue form) and as a conversation (dialogue form). The Phonotheque is available in Bochum as a magnetic recording, as a digitised copy, and in transcribed form. The transcription was made with the UDAR system (see the description).
  3. Acoustic databases in the format of the "Speech Corpus" system (see the description).
    Databases of this type are designed for storing phonetically representative sound material. The unit of storage and description in this case is the syllable. The description includes a graphical representation of the syllable in the text, an ideal phonemic (broad) and phonetic (narrow) transcription, a real transcription, statistical characteristics, peculiarities of the syllable's realisation connected with its prosodic position, and the name of the sound file in which the syllable is stored.
    A part of the sound material from the Phonetic Fund of the Russian language is presented in this format, namely the above-mentioned phonetically representative text, composed of the 200 most frequently occurring Russian syllables in all possible rhythmic positions. The Russian phonetically representative text has been recorded from four Russian speakers (2 male and 2 female), representing the Moscow and St. Petersburg pronunciation standards, and also from several foreign speakers (Bulgarian, Finnish, American English, Korean, etc.). The recordings of the foreign speakers demonstrate phonetic interference.
    The sound material is recorded in 16-bit raw PCM format at a 20 kHz sample rate.
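The syllable-level description and the raw sound format described above can be sketched in Python as follows. This is only an illustration, not the actual schema of the "Speech Corpus" system: the class name, field names and helper functions are hypothetical, and the byte order (little-endian) and channel count (mono) are assumptions, since the text specifies only 16-bit raw PCM at 20 kHz.

```python
import array
import sys
from dataclasses import dataclass, field

SAMPLE_RATE_HZ = 20_000   # stated in the text
BYTES_PER_SAMPLE = 2      # 16-bit samples


@dataclass
class SyllableRecord:
    """One storage/description unit; all field names are hypothetical."""
    orthography: str          # graphical representation in the text
    ideal_phonemic: str       # ideal broad transcription
    ideal_phonetic: str       # ideal narrow transcription
    real_transcription: str   # transcription of the actual realisation
    statistics: dict = field(default_factory=dict)  # statistical characteristics
    prosodic_notes: str = ""  # peculiarities tied to the prosodic position
    sound_file: str = ""      # name of the file holding the syllable


def decode_raw_pcm16(data: bytes, little_endian: bool = True) -> array.array:
    """Decode headerless signed 16-bit PCM bytes (assumed mono) into samples."""
    if len(data) % BYTES_PER_SAMPLE:
        raise ValueError("byte count is not a multiple of the sample size")
    samples = array.array("h")
    if samples.itemsize != BYTES_PER_SAMPLE:
        raise RuntimeError("platform 'short' is not 16 bits wide")
    samples.frombytes(data)
    # Swap bytes when the file's byte order differs from the machine's.
    if (sys.byteorder == "little") != little_endian:
        samples.byteswap()
    return samples


def duration_seconds(samples: array.array, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Length of the signal in seconds at the given sample rate."""
    return len(samples) / rate_hz
```

Because raw PCM carries no header, the sample rate and sample width have to be supplied by the caller. The file named in a record's `sound_file` field would be read in binary mode and its bytes passed to `decode_raw_pcm16`.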
  4. Sound archives: acoustic databases produced from the old sound recording collections of the Institute of Russian Literature (the so-called Pushkinsky Dom). In these old collections the recordings were made on magnetic tapes, wax cylinders, discs, etc.; they are now presented in computerised form:
  • "Zhirmunsky’s collection" of old recordings of the folklore of the so-called "Russian Germans" – Germans who lived in the Volga region since the XVI century. The recordings were made in Russia in the 1920s and 1930s.
  • "Tales of the Russian North" and "Poetic Folklore of the Russian North (lamentations)"
    In the dialects of these outlying regions (Pechora, Arkhangel’sk, etc.) one can find traces of very ancient states of the Russian language.

These acoustic databases are managed by the "Sound Archive" system (see the description). A database unit consists of 20 fields. The first 11 fields hold the archival attribution of the original sound material (the archival number, time and genre of the recording, place of the recording, name and age of the performer and his/her nationality, etc.). The remaining 9 fields store the following information:
– the sonogram of the sound signal;
– the text of the recording, in standard orthography;
– its transcription;
– the translation of the text into another language or a complete text of the recording, if the database contains only its fragment;
– experts’ comments;
– dialectological comment;
– musicological comment;
– phonetic comment;
– general comment.
Text files are presented in the Windows ANSI (Text Only) format; sound files are saved in 16-bit raw PCM format at a 16 kHz sample rate.
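The 20-field database unit described above could be modelled roughly as follows. This is a sketch only, not the "Sound Archive" system's real layout: all names are hypothetical, the 11 archival fields are kept as a free mapping because the text names only some of them, and storing the sonogram as raw bytes is an assumption.

```python
from dataclasses import dataclass, field, fields

ARCHIVAL_FIELD_COUNT = 11  # stated in the text; not all names are listed there


@dataclass
class ArchiveRecord:
    """Sketch of one 20-field database unit; names are hypothetical."""
    # The 11 archival attribution fields, kept as a name -> value mapping
    # because the text names only some of them.
    attribution: dict = field(default_factory=dict)
    # The 9 content fields, in the order listed in the text.
    sonogram: bytes = b""                 # assumed to be stored as raw bytes
    text: str = ""                        # standard orthography
    transcription: str = ""
    translation_or_full_text: str = ""
    experts_comments: str = ""
    dialectological_comment: str = ""
    musicological_comment: str = ""
    phonetic_comment: str = ""
    general_comment: str = ""


def content_field_names() -> list:
    """Names of the content fields, i.e. everything except the attribution."""
    return [f.name for f in fields(ArchiveRecord) if f.name != "attribution"]


def total_field_count() -> int:
    """Archival fields plus content fields."""
    return ARCHIVAL_FIELD_COUNT + len(content_field_names())
```

Keeping the attribution separate from the content fields mirrors the split the text makes between archival metadata and the material added during computerisation.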

  5. Database of Russian allophones
    The database contains about 3300 units: allophones of Russian consonants and vowels. These are all the allophones possible in the Russian language. The allophones were selected from words and logatoms produced by a professional speaker with the so-called "normative St. Petersburg pronunciation". The formation principle of this database is described in:
    Skrelin P. 1997. Concatenative Russian Speech Synthesis: Sound Database Formation Principles. In: "SPECOM'97" Proceedings. Cluj-Napoca.
    The description of each allophone includes its position with regard to stress and its right and left context.
    The allophones are used in speech synthesis systems developed by the team. Experiments showed that reducing the number of allophones to 2000 units does not affect the quality of the synthesised speech.
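One way to picture how an allophone's description (stress position, left and right context) can serve as a retrieval key is the sketch below. It is an assumption for illustration, not the team's synthesis method: the class names, the stress labels and the exact-match lookup are all hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class AllophoneKey(NamedTuple):
    """Description of an allophone; field names and labels are hypothetical."""
    phoneme: str
    left_context: str
    right_context: str
    stress_position: str  # hypothetical labels, e.g. "stressed" / "unstressed"


class AllophoneInventory:
    """Maps allophone descriptions to stored sound units."""

    def __init__(self) -> None:
        self._units: Dict[AllophoneKey, str] = {}

    def add(self, key: AllophoneKey, sound_file: str) -> None:
        """Register the sound file that holds the allophone described by key."""
        self._units[key] = sound_file

    def find(self, key: AllophoneKey) -> Optional[str]:
        """Return the sound file for an exact match, or None if absent."""
        return self._units.get(key)

    def __len__(self) -> int:
        return len(self._units)
```

A synthesiser built on such an inventory would look up one unit per target sound, using the neighbouring sounds and the stress position of the syllable as the key.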