RuthLemm Demo

This is a demonstration of RuthLemm, a BART-based transformer lemmatizer for the Old Belarusian (Ruthenian) language. It accepts either raw text or data in the CoNLL-U format used by Universal Dependencies.

How to Use:

  1. Lemmatize String: Enter any text in the text box. The tool will tokenize it, lemmatize each word, and return the result. This mode does not use morphological information.
  2. Lemmatize CoNLL-U: Paste your CoNLL-U data into the text box or upload a .conllu file.
    • Tick the "Use Morphology" checkbox to have the lemmatizer use the morphological features in your data, which can improve accuracy; leave it unticked to lemmatize from the word forms alone.
    • The output will be the same CoNLL-U data with the LEMMA column updated. You can copy the result or download it as a file.
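
For readers unfamiliar with CoNLL-U, the sketch below illustrates what "the LEMMA column updated" means. It is not RuthLemm's code: update_lemmas and toy_lemmatize are hypothetical names, and toy_lemmatize is a stand-in that merely lowercases the form where the real model would predict a lemma. Passing UPOS and FEATS as the morphological hint is also an assumption made for illustration. The only facts relied on are those of the format itself: ten tab-separated columns per token line, LEMMA as the third column, comment lines starting with "#", and blank lines between sentences.

```python
from typing import Callable, Optional

# CoNLL-U column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
ID, FORM, LEMMA, UPOS, FEATS = 0, 1, 2, 3, 5


def update_lemmas(
    conllu: str,
    lemmatize: Callable[[str, Optional[str]], str],
    use_morphology: bool = True,
) -> str:
    """Return the same CoNLL-U text with only the LEMMA column rewritten."""
    out = []
    for line in conllu.split("\n"):
        # Comments and sentence-separating blank lines pass through unchanged.
        if not line.strip() or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        # Leave malformed lines, multiword ranges (1-2) and empty nodes (1.1) alone.
        if len(cols) != 10 or "-" in cols[ID] or "." in cols[ID]:
            out.append(line)
            continue
        # Assumption: the morphological hint is UPOS plus FEATS.
        morph = f"{cols[UPOS]}|{cols[FEATS]}" if use_morphology else None
        cols[LEMMA] = lemmatize(cols[FORM], morph)
        out.append("\t".join(cols))
    return "\n".join(out)


def toy_lemmatize(form: str, morph: Optional[str]) -> str:
    # Hypothetical placeholder: a real lemmatizer would run the model here.
    return form.lower()


# Made-up two-token sample, used only to show the column layout.
sample = (
    "# sent_id = demo-1\n"
    "1\tКнязь\t_\tNOUN\t_\tCase=Nom|Number=Sing\t0\troot\t_\t_\n"
    "2\tВеликий\t_\tADJ\t_\tCase=Nom|Number=Sing\t1\tamod\t_\t_\n"
)
result = update_lemmas(sample, toy_lemmatize)
print(result)
```

Every column other than LEMMA, along with comments and sentence boundaries, comes back unchanged, so the output can be fed to any tool that reads CoNLL-U.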