These results compare the baseline model, which combines a standard word vector with a token-level character vector obtained by passing the characters of a word through a BiLSTM to the meta model. The meta model separates word and character features as separate views and then combines them. Unfortunately, the meta runs had a patience value of 6 so many of the runs were cut short. Usually, it takes longer for the character view to converge.

bg_btb

ca_ancora

cs_fictree

de_gsd

en_ewt

es_ancora

eu_bdt

fa_seraji

fi_ftb

ga_idt

he_htb

it_isdt

ja_gsd

ko_gsd

la_ittb

la_proiel

nl_alpino

pt_gsd

zh_gsd