Concept "NGram Tokenizer"



HasElasticConceptType Tokenizer

  • Useful for querying languages that don't use spaces between words, or that have long compound words, such as German.
  • It usually makes sense to set min_gram and max_gram to the same value. The shorter the grams, the more documents will match, but the lower the quality of the matches; the longer the grams, the more specific the matches. A tri-gram (length 3) is a good place to start.
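To make the trade-off above concrete, here is a minimal Python sketch of how a character n-gram tokenizer slides over text. This is an illustration of the general technique, not Elasticsearch's implementation (the real tokenizer also supports options such as `token_chars` for filtering which characters belong in a gram):

```python
def ngrams(text, min_gram=3, max_gram=3):
    """Emit every substring of length min_gram..max_gram,
    sliding one character at a time (like a character n-gram tokenizer)."""
    out = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            out.append(text[i:i + n])
    return out

print(ngrams("Quick"))        # trigrams only: ['Qui', 'uic', 'ick']
print(ngrams("Quick", 2, 3))  # bigrams and trigrams: many more, less specific grams
```

Note how widening the `min_gram`/`max_gram` range produces more (and shorter) grams, which match more documents but less precisely, matching the guidance above.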