Concept "NGram Tokenizer"

Content

HasElasticConceptType: Tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html)

  • Useful for querying languages that do not use spaces between words, or languages with long compound words, such as German.
  • It usually makes sense to set min_gram and max_gram to the same value. The smaller the length, the more documents will match, but the lower the quality of the matches; the longer the length, the more specific the matches. A tri-gram (length 3) is a good place to start (see the configuration sketch after this list).
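A minimal sketch of an index using a tri-gram tokenizer, with min_gram and max_gram both set to 3 as recommended above. The index name "products", the field name "title", the tokenizer/analyzer names, and the local cluster at http://localhost:9200 are assumptions for illustration; the analysis settings themselves (type "ngram", min_gram, max_gram, token_chars) follow the Elastic reference linked above.

  # Sketch: create an index with an ngram (tri-gram) tokenizer and test it.
  # Assumes a local Elasticsearch cluster at http://localhost:9200 and the
  # hypothetical index name "products".
  import requests

  settings = {
      "settings": {
          "analysis": {
              "tokenizer": {
                  "trigram_tokenizer": {        # hypothetical tokenizer name
                      "type": "ngram",
                      "min_gram": 3,
                      "max_gram": 3,
                      "token_chars": ["letter", "digit"],
                  }
              },
              "analyzer": {
                  "trigram_analyzer": {         # hypothetical analyzer name
                      "type": "custom",
                      "tokenizer": "trigram_tokenizer",
                  }
              },
          }
      },
      "mappings": {
          "properties": {
              "title": {"type": "text", "analyzer": "trigram_analyzer"}
          }
      },
  }

  # Create the index with the ngram analysis chain.
  resp = requests.put("http://localhost:9200/products", json=settings)
  resp.raise_for_status()

  # Inspect how a long German compound word is split into tri-grams.
  analyze = requests.post(
      "http://localhost:9200/products/_analyze",
      json={"analyzer": "trigram_analyzer", "text": "Lebensversicherung"},
  )
  print([t["token"] for t in analyze.json()["tokens"]])

The _analyze call prints the overlapping tri-grams ("leb", "ebe", "ben", ...), which is what lets a query such as "versicherung" match inside the compound word without whitespace-based tokenization.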