Efnil Research Institute for Linguistics, Hungarian Academy of Sciences Dictionary Browser
Cut Board

Language pairs: hu-en | en-hu | hu-sl | sl-hu | hu-lt | lt-hu | fr-nl | nl-fr

Sets the minimum ratio f(T)/f(S), on the assumption that there is a natural constraint on the frequency ratio of translation equivalents.
It should be set so as to filter out false positives: for rarely used source-language (SL) lemmata, the alignment algorithm may assign high translation probabilities to incorrect lemma pairs when the target-language (TL) lemma is frequent in the corpus and both members of the pair recurrently show up in aligned units.
Overall minimum frequency of both the source and the target words. It should be at least 5 so that there is a sufficient amount of data to estimate the translation probability from the parallel corpus.
Translation probability is an estimate of how likely the translation candidate is to be correct. The overall minimum translation probability can be set here.
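As a rough illustration of the overall minimum frequency and minimum translation probability described above, the following Python sketch keeps a candidate pair only when both lemma frequencies and p(T|S) reach their thresholds. The `Candidate` record, the function name `passes_overall_filters`, the sample pairs, and the probability threshold of 0.3 are all hypothetical; only the minimum frequency of 5 comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A translation candidate (hypothetical record, not the browser's own format)."""
    source: str
    target: str
    f_source: int  # f(S): corpus frequency of the source lemma
    f_target: int  # f(T): corpus frequency of the target lemma
    p: float       # p(T|S): estimated translation probability


def passes_overall_filters(c: Candidate, min_freq: int = 5, min_prob: float = 0.3) -> bool:
    """Return True if both lemmata are frequent enough and p(T|S) is high enough.

    min_freq=5 follows the text; min_prob=0.3 is an arbitrary illustrative value.
    """
    frequent_enough = c.f_source >= min_freq and c.f_target >= min_freq
    return frequent_enough and c.p >= min_prob


# Made-up sample data, for illustration only.
candidates = [
    Candidate("ház", "house", 120, 150, 0.82),  # passes both filters
    Candidate("ház", "home", 120, 300, 0.10),   # probability too low
    Candidate("kunyhó", "hut", 3, 40, 0.90),    # source lemma too rare
]
kept = [c for c in candidates if passes_overall_filters(c)]
```

With these sample values only the first pair survives: the second fails the probability threshold, and the third fails the frequency threshold despite its high estimated probability.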
f1(S) ∈ [, ] → min p(T|S) =
f2(S) ∈ [, ] → min p(T|S) =
f3(S) ∈ [, ] → min p(T|S) =
Different translation probability thresholds can be set as a function of source lemma frequency to increase the coverage of the dictionaries. This filtering heuristic is based on the observation that useful translation candidates may be assigned to more frequent source lemmata even at lower translation probabilities.
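The frequency-dependent thresholds can be sketched in Python as follows. This is only one possible reading of the three "f(S) ∈ [, ] → min p(T|S) =" rows: each row is treated as a closed interval of source lemma frequencies with its own minimum probability, and the overall minimum is used when no interval matches. The names `min_prob_for`, `keep_pair`, and `EXAMPLE_BANDS`, as well as every numeric value, are assumptions made for illustration and are not taken from the browser.

```python
from typing import List, Tuple

# (low, high, min_prob): for low <= f(S) <= high, require p(T|S) >= min_prob.
Band = Tuple[int, int, float]


def min_prob_for(f_source: int, bands: List[Band], default_min_prob: float) -> float:
    """Return the probability threshold that applies to a source lemma frequency."""
    for low, high, min_prob in bands:
        if low <= f_source <= high:
            return min_prob
    # No band covers this frequency: fall back to the overall minimum.
    return default_min_prob


def keep_pair(f_source: int, p: float, bands: List[Band], default_min_prob: float) -> bool:
    """Keep a candidate pair if p(T|S) reaches the threshold for its frequency band."""
    return p >= min_prob_for(f_source, bands, default_min_prob)


# Hypothetical bands: the more frequent the source lemma, the lower the threshold.
EXAMPLE_BANDS: List[Band] = [
    (5, 99, 0.5),
    (100, 999, 0.2),
    (1000, 10**9, 0.05),
]
```

Under these made-up bands, a candidate with p(T|S) = 0.3 is rejected for a source lemma seen 50 times but kept for one seen 500 times, which is the coverage gain the heuristic aims for.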

22545 pairs total