Concept "NGram Tokenizer"

From SMW CindyKate - Main

Content

HasElasticConceptType Tokenizer https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

  • Useful for querying languages that don't use spaces, or that have long compound words, like German.
  • It usually makes sense to set min_gram and max_gram to the same value. The smaller the length, the more documents will match but the lower the quality of the matches. The longer the length, the more specific the matches. A tri-gram (length 3) is a good place to start.
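The sliding-window behaviour described above can be sketched in plain Python. This is a simplified illustration of how a character n-gram tokenizer emits tokens, not the Elasticsearch implementation; the function name and defaults are chosen here for the example, with min_gram and max_gram set equal per the advice above.

```python
def ngram_tokenize(text, min_gram=3, max_gram=3):
    """Emit all character n-grams of length min_gram..max_gram,
    sliding one character at a time over the input text."""
    tokens = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            tokens.append(text[i:i + n])
    return tokens

# Tri-grams of the German compound word "Haustür" (house door):
print(ngram_tokenize("Haustür"))
# ['Hau', 'aus', 'ust', 'stü', 'tür']
```

With min_gram and max_gram both 3, a query for "Tür" can match the tri-gram "tür" inside "Haustür" (after lowercasing), which is why tri-grams are a reasonable starting point for compound-heavy languages.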