WALS RoBERTa Sets 136zip Fix

The fix aims for a better mapping between WALS linguistic features and RoBERTa's tokenization layers.

The issue stems from a discrepancy in vocabulary size and compression handling between the WALS "Sets" configuration and the strict expectations of the Hugging Face RoBERTa tokenizer.
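One way to check for the kind of mismatch described above is to inspect the archive before handing it to the tokenizer. The following is a minimal sketch using only the Python standard library, not the actual fix. It assumes the "Sets" package is an ordinary zip archive containing the usual RoBERTa tokenizer files (`vocab.json`, `merges.txt`) plus a `config.json` with a `vocab_size` field. The function name `inspect_archive` and the report keys are hypothetical, and the real layout of the archive may differ.

```python
import io
import json
import zipfile

# File names conventionally used by Hugging Face RoBERTa checkpoints.
# That the WALS "Sets" archive follows this layout is an assumption.
REQUIRED_FILES = ("vocab.json", "merges.txt", "config.json")

# Compression methods that every zip reader is expected to support.
COMMON_METHODS = (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)


def inspect_archive(archive):
    """Report missing files, unusual compression, and vocab size mismatches.

    `archive` may be a file path or a binary file-like object.
    """
    report = {
        "missing": [],
        "nonstandard_compression": [],
        "vocab_entries": None,
        "config_vocab_size": None,
        "size_mismatch": None,
    }
    with zipfile.ZipFile(archive) as zf:
        # Index members by base name so files in a subfolder are still found.
        members = {
            info.filename.rsplit("/", 1)[-1]: info
            for info in zf.infolist()
            if not info.is_dir()
        }
        for name in REQUIRED_FILES:
            info = members.get(name)
            if info is None:
                report["missing"].append(name)
            elif info.compress_type not in COMMON_METHODS:
                report["nonstandard_compression"].append(name)

        # Only parse JSON files that are present and readable with
        # the common compression methods.
        readable = lambda n: (
            n in members and n not in report["nonstandard_compression"]
        )
        if readable("vocab.json"):
            vocab = json.loads(zf.read(members["vocab.json"]))
            report["vocab_entries"] = len(vocab)
        if readable("config.json"):
            config = json.loads(zf.read(members["config.json"]))
            report["config_vocab_size"] = config.get("vocab_size")

    if (report["vocab_entries"] is not None
            and report["config_vocab_size"] is not None):
        report["size_mismatch"] = (
            report["vocab_entries"] != report["config_vocab_size"]
        )
    return report
```

Calling `inspect_archive("path/to/archive.zip")` (a placeholder path) returns a dictionary. A `size_mismatch` of `True` or a non-empty `nonstandard_compression` list would point at the discrepancy described above, while `None` for `size_mismatch` means one of the two sizes could not be read.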