UseCase "Debug relevance problems: why do documents (not) match and how do they (not) rank highly?"



Debugging query matching

Query parsing: examining the underlying query strategy

  • How does a given Query DSL query translate into a strategy for matching specific terms against specific fields?
  • Which terms are searched for in which fields, and how are the matches boosted?

See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html

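# Validate the query and explain how it is rewritten:
POST dataspects-main-subjects-index-3/_doc/_validate/query?explain=true
# Alternatively, run the search and explain the score of each hit:
# POST dataspects-main-subjects-index-3/_doc/_search?explain=true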
{
  "query": {
    "multi_match": {
      "query": "Search for this",
      "fields": [
        "HasEntityType^5",
        "HasEntityTitle^10",
        "HasEntityBlurb",
        "HasEntityKeywords",
        "HasEntityContent"
      ]
    }
  }
}
The "explanation" value in the response, wrapped here by field for readability:

"explanation": "+(
  (HasEntityBlurb:se HasEntityBlurb:sea HasEntityBlurb:sear HasEntityBlurb:searc HasEntityBlurb:search HasEntityBlurb:ea HasEntityBlurb:ear HasEntityBlurb:earc HasEntityBlurb:earch HasEntityBlurb:ar HasEntityBlurb:arc HasEntityBlurb:arch HasEntityBlurb:rc HasEntityBlurb:rch HasEntityBlurb:ch HasEntityBlurb:fo HasEntityBlurb:for HasEntityBlurb:or HasEntityBlurb:th HasEntityBlurb:thi HasEntityBlurb:this HasEntityBlurb:hi HasEntityBlurb:his HasEntityBlurb:is) |
  (HasEntityTitle:se HasEntityTitle:sea HasEntityTitle:sear HasEntityTitle:searc HasEntityTitle:search HasEntityTitle:ea HasEntityTitle:ear HasEntityTitle:earc HasEntityTitle:earch HasEntityTitle:ar HasEntityTitle:arc HasEntityTitle:arch HasEntityTitle:rc HasEntityTitle:rch HasEntityTitle:ch HasEntityTitle:fo HasEntityTitle:for HasEntityTitle:or HasEntityTitle:th HasEntityTitle:thi HasEntityTitle:this HasEntityTitle:hi HasEntityTitle:his HasEntityTitle:is)^10.0 |
  (HasEntityKeywords:se HasEntityKeywords:sea HasEntityKeywords:sear HasEntityKeywords:searc HasEntityKeywords:search HasEntityKeywords:ea HasEntityKeywords:ear HasEntityKeywords:earc HasEntityKeywords:earch HasEntityKeywords:ar HasEntityKeywords:arc HasEntityKeywords:arch HasEntityKeywords:rc HasEntityKeywords:rch HasEntityKeywords:ch HasEntityKeywords:fo HasEntityKeywords:for HasEntityKeywords:or HasEntityKeywords:th HasEntityKeywords:thi HasEntityKeywords:this HasEntityKeywords:hi HasEntityKeywords:his HasEntityKeywords:is) |
  (HasEntityContent:se HasEntityContent:sea HasEntityContent:sear HasEntityContent:searc HasEntityContent:search HasEntityContent:ea HasEntityContent:ear HasEntityContent:earc HasEntityContent:earch HasEntityContent:ar HasEntityContent:arc HasEntityContent:arch HasEntityContent:rc HasEntityContent:rch HasEntityContent:ch HasEntityContent:fo HasEntityContent:for HasEntityContent:or HasEntityContent:th HasEntityContent:thi HasEntityContent:this HasEntityContent:hi HasEntityContent:his HasEntityContent:is) |
  (HasEntityType:Search for this)^5.0
) #DocValuesFieldExistsQuery [field=_primary_term]"
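
The explanation above suggests that the four text fields are analyzed into n-grams, while HasEntityType receives the query text unanalyzed. The following Python is a minimal sketch that reproduces the term list for one field. It assumes lowercasing, splitting on whitespace, and n-grams of at least 2 characters; these settings are inferred from the output, not read from the index mapping, and `word_ngrams` is a hypothetical helper, not part of Elasticsearch.

```python
def word_ngrams(text, min_gram=2):
    """Return every substring of at least min_gram characters for each word.

    Assumption: this mimics an n-gram analyzer with lowercasing and no
    effective upper length limit for these short words. The real index
    settings may differ.
    """
    terms = []
    for word in text.lower().split():
        for start in range(len(word)):
            for end in range(start + min_gram, len(word) + 1):
                terms.append(word[start:end])
    return terms


terms = word_ngrams("Search for this")
# Render the terms the way they appear in the explanation for one field.
print(" ".join(f"HasEntityTitle:{t}" for t in terms))
```

If the printed terms differ from the explanation returned by the cluster, the field is probably analyzed differently than assumed here, which is itself a useful debugging hint.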

Analysis

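Analysis is the process of creating tokens from the query text and from the document text. A document can only match if the tokens produced for the query also occur among the tokens indexed for the document.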

See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html

POST dataspects-main-subjects-index-3/_analyze
{
  "analyzer": "standard",
  "text": "Search for this"
}

The response, converted from JSON to YAML with https://www.json2yaml.com/ for readability:

---
tokens:
- token: search
  start_offset: 0
  end_offset: 6
  type: "<ALPHANUM>"
  position: 0
- token: for
  start_offset: 7
  end_offset: 10
  type: "<ALPHANUM>"
  position: 1
- token: this
  start_offset: 11
  end_offset: 15
  type: "<ALPHANUM>"
  position: 2
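
To see why a document does or does not match, it can help to compare the tokens produced for the query with the tokens produced for the document text. The following Python is a minimal sketch, assuming both `_analyze` responses have already been loaded as dicts. `token_set` and `shared_tokens` are hypothetical helpers, and the document response is made up for illustration.

```python
def token_set(analyze_response):
    """Collect the token strings from an _analyze response dict."""
    return {entry["token"] for entry in analyze_response["tokens"]}


def shared_tokens(query_response, document_response):
    """Return the tokens that occur in both responses, sorted alphabetically."""
    return sorted(token_set(query_response) & token_set(document_response))


# Response for "Search for this" with the standard analyzer, as shown above.
query_response = {
    "tokens": [
        {"token": "search", "start_offset": 0, "end_offset": 6,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "for", "start_offset": 7, "end_offset": 10,
         "type": "<ALPHANUM>", "position": 1},
        {"token": "this", "start_offset": 11, "end_offset": 15,
         "type": "<ALPHANUM>", "position": 2},
    ]
}

# Hypothetical response for a document field; only the token strings matter here.
document_response = {"tokens": [{"token": "search"}, {"token": "engine"}]}

print(shared_tokens(query_response, document_response))  # ['search']
```

An empty result would indicate that the query-time and index-time tokens have nothing in common, so the document cannot match on that field.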

IsCarriedOutBy SearchEngineer