- RUSSICON Reference
Corpus
The corpus contains more than 150,000,000 word
occurrences. It is based on a wide representation
of texts of the following types: Russian
literature (including Russian XX Century
Literature Corpus, described below), criticism,
philosophy, religion, newspapers, memoirs, law,
business documents, computer documentation,
historical documents, protocols, translations,
folklore (songs, jokes, Internet/CD literature,
etc.), "underground" literature, etc. Among
the authors are about 500 Russian
writers well known in the XVII-XX
centuries and approximately 200 famous Russian
philosophers, theologians, critics,
politicians and memoirists. The corpus files are
prepared in text, HTML and SGML formats.
Conversion to SGML was done by means of special
conversion utilities and with the help of
SoftQuad SGML Publishing Suite. Currently, only a
part of the whole corpus (approximately 5 million
word occurrences) is linguistically encoded (tagged).
The texts for this so-called linguistically
encoded corpus were selected in such a way
that every author is represented by at least 4000-5000
words' worth of text fragments. Every word of the
linguistically encoded corpus
corresponds to an entry in one or several
Russicon dictionaries. In the course of developing
the Russicon reference corpora, the team has
accumulated an extensive glossary consisting of
more than 500,000 lemmas, but only about 200,000
of them have been processed so far, i.e.
analysed, reviewed and included in
dictionaries. Processing of the glossary is
ongoing.
- RUSSICON Russian XX Century
Literature Corpus
The corpus contains about 5,000,000 word
occurrences. It consists of more than 10,000
texts by 400 eminent Russian writers (prose writers,
poets and critics), among them:
Adamovich G., Abramov F., Aksenov V.,
Andreev L., Annenskij I., Anninskij L., Antokolskij
P., Aleshkovskij Y., Akhmadulina B., Akhmatova A.,
Aldanov M., Amfiteatrov A., Averchenko A., Astafev
V., Aseev N., Ajtmatov C., Babel I.,
Bagritskij E., Bakhtin M., Balmont K., Bek
A., Belov V., Belyj A., Berggolts O.,
Berberova N., Berkovskij N., Blok A., Bobrov S.,
Brodskij (Brodsky) I., Bryusov V., Bulgakov M.,
Bunin I., Burlyuk D., Bykov V., Bitov A., Chekhov
A., Chukovskaya L., Chukovskij K., Chernyj Sasha,
Daniel Yu., Dombrovskij Y., Dovlatov S.,
Efremov I., Ehrdman N., Ehrenburg I., Esenin S.,
Evtushenko E., Fadeev A., Fedin K., Forsh O.,
Galich A., Gazdanov G., Gajdar A., Gershenzon M.,
Gilyarovskij V., Ginzburg L., Gippius Z.,
Gorbanevskaya N., Gorenshtejn F., Gorkij M.,
Gorodetskij S., Granin D., Grebenshchikov B.,
Grigorev O., Grin A., Grossman V., Gumilev
N., Ilf I., Iskander F., Ivanov G., Ivanov
Vyach., Ivanov Vs., Kamenskij V., Kataev V.,
Kaverin V., Kazakevich E., Kazakov Y., Kharitonov
Y., Kharms D., Khlebnikov V., Khodasevich V., Kim
A., Klyuev N., Kononov H., Kopelev L., Korzhavin
N., Kozhevnikov P., Krivulin V., Kruchenykh A.,
Kublanovskij Y., Kushner A., Kuzmin M., Leonov L.,
Limonov E., Lipkin S., Lipatov V., Lotman Y.,
Lozinskij M., Lugovskoj V., Lunts L., Makanin V.,
Makarenko A., Maksimov V., Mandelshtam O.,
Marshak S., Mariengof A., Mikhalkov S., Morits Y.,
Mother Maria, Mayakovskij V., Merezhkovskij D.,
Mezhirov A., Mejlakh M., Nabokov V., Nagibin Y.,
Narbut V., Nekrasov V., Nilus S., Nosov N.,
Novikov-Priboj A., Odoevtseva I., Olesha Y.,
Okhapkin O., Okudzhava B., Olejnikov N., Oseev N.,
Ostrovskij N., Panova V., Panteleev L., Pasternak
B., Paustovskij K., Petrov Y., Pikul V.,
Pilnyak B., Petrushevskaya L., Platonov A.,
Polevoj B., Popov V., Popov Y., Prigov D.,
Pristavkin A., Prishvin M., Pulatov T., Rasputin
B., Radzinskij E., Remizov A., Roshchin M.,
Rozanov V., Rozhdestvenskij R., Rozov V., Rubtsov
N., Rubinshtejn L., Rybakov A., Samojlov D.,
Sapgir G., Sevela E., Severyanin I., Selvinskij
I., Semenov Y., Serafimovich A., Shaginyan M.,
Shalamov V., Shatrov M., Shvarts E., Shefner V.,
Shinkarev V., Shklovskij V., Shmelev I., Shukshin
V., Sholokhov M., Simonov K., Slavkin V.,
Slutskij B., Sokolov Sasha, Sokolov-Mikitov I.,
Sologub F., Soloukhin V., Sorokin V.,
Solzhenitsyn A., Sosnora V., Strugatskij A. &
B., Svetlov M., Tarkovskij A., Teffi, Terts A. /Sinyavskij
A./, Tikhonov N., Tolstoj A., Tolstoj L., Tolstaya
T., Tryapkin N., Tsvetaeva M., Tynyanov Y., Trenev
K., Trifonov Y., Tvardovskij A., etc.
The texts are encoded in HTML and SGML. The corpus
is prepared for distribution on CD-ROM as an
anthology, compiled by S. Yablonskij, with
several dictionaries and presentations of authors
and their works.