understand-index-request-cache
kiranprakash154 committed Oct 2, 2024
1 parent 0647700 commit 2e6fac6
Showing 8 changed files with 177 additions and 0 deletions.
24 changes: 24 additions & 0 deletions _community_members/awskiran.md
@@ -0,0 +1,24 @@
---
name: Kiran Prakash
short_name: awskiran
photo: '/assets/media/community/members/awskiran.png'
title: 'OpenSearch Community Member: Kiran Prakash'
primary_title: Kiran Prakash
breadcrumbs:
  icon: community
  items:
    - title: Community
      url: /community/index.html
    - title: Members
      url: /community/members/index.html
    - title: "Kiran Prakash's Profile"
      url: '/community/members/kiran-prakash.html'
github: kiranprakash154
linkedin: kp154
job_title_and_company: 'Software Engineer, Amazon Web Services'
personas:
- author
permalink: '/community/members/kiran-prakash.html'
---

Kiran Prakash is a Software Engineer at AWS working on the OpenSearch Project.
153 changes: 153 additions & 0 deletions _posts/2024-10-01-understanding-index-request-cache.markdown
@@ -0,0 +1,153 @@
---
layout: post
title: "Behind the scenes with OpenSearch: Understanding Index request cache through code"
authors:
- awskiran
- kkhatua
date: 2024-10-01
categories:
- technical-posts
- search
excerpt:
meta_keywords: OpenSearch cluster, OpenSearch caching, index request cache, search performance optimization, search latency
meta_description:
---
Speed and efficiency are essential to search users. OpenSearch achieves them using a variety of mechanisms, one of the most important being the [index request cache](https://opensearch.org/docs/latest/search-plugins/caching/request-cache/). This blog post describes the inner workings of this cache, breaking down how it functions at the code level to optimize query performance.

## What is the index request cache?

![Index-Request-Cache](/assets/media/blog-images/2024-10-01-understanding-index-request-cache/cache_location.png){:class="img-centered"}
The index request cache speeds up search queries in OpenSearch by caching query results at the shard level. This approach is particularly effective for queries targeting specific indexes or index patterns, improving response times and system efficiency.

The cache automatically clears entries when data changes, ensuring only up-to-date information is returned.

## Caching policy

Not all searches are eligible for caching in the index request cache. By default, only search requests with `size=0` are cached, that is, requests that return only metadata such as the total number of hits.

The following requests are ineligible for caching:

* **Non-deterministic requests:** Searches that involve functions such as `Math.random()` or relative times such as `now` or `new Date()`.

* **Scroll and Profile requests**
* **DFS Query Then Fetch requests:** With the `dfs_query_then_fetch` search type (DFS stands for distributed frequency search), results depend on both the index content and overridden term statistics, so cached scores would be inaccurate whenever the statistics differ (for example, because of shard updates).

You can enable caching for individual search requests by setting the `request_cache` query parameter to `true`:

```json
GET /students/_search?request_cache=true
{
  "query": {
    "match": {
      "name": "doe john"
    }
  }
}
```
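
The eligibility rules in this section can be summarized as a single predicate. The following Python sketch is a simplified model, not the OpenSearch implementation: the `SearchRequest` class and all of its field names are hypothetical stand-ins for the request properties discussed above, and index-level or cluster-level cache settings are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRequest:
    """Hypothetical, simplified view of a search request."""
    size: int = 10
    request_cache: Optional[bool] = None  # value of ?request_cache=..., if given
    is_scroll: bool = False
    is_profile: bool = False
    search_type: str = "query_then_fetch"
    is_deterministic: bool = True  # False if the query uses now, Math.random(), etc.


def can_cache(request: SearchRequest) -> bool:
    # Requests that are never cached, regardless of the request_cache parameter.
    if request.is_scroll or request.is_profile:
        return False
    if request.search_type == "dfs_query_then_fetch":
        return False
    if not request.is_deterministic:
        return False
    # An explicit request_cache parameter overrides the default policy.
    if request.request_cache is not None:
        return request.request_cache
    # Default policy: only size=0 requests are cached.
    return request.size == 0
```

In this model, `can_cache(SearchRequest(size=10, request_cache=True))` is eligible, while the same request with `is_scroll=True` is not.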

## Understanding cache entries

Every cache entry is a key-value pair of the form **Key → BytesReference**.

A [Key](https://github.com/opensearch-project/OpenSearch/blob/4199bc2726235456e5b5422eaf4e836f25c2c5ed/server/src/main/java/org/opensearch/indices/IndicesRequestCache.java#L346) comprises three entities:

1. **CacheEntity** - An [IndexShardCacheEntity](https://github.com/opensearch-project/OpenSearch/blob/4199bc2726235456e5b5422eaf4e836f25c2c5ed/server/src/main/java/org/opensearch/indices/IndicesService.java#L1866C24-L1866C45) that holds an [IndexShard](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/shard/IndexShard.java) (think of this as the main reference that relates a key to the shard it belongs to).
2. **ReaderCacheKeyId** - A unique reference to the current state of the shard. When the state changes (that is, when documents are added, deleted, or updated and the shard is refreshed), this reference changes.

3. **BytesReference** - The actual search query in bytes format.

These three components together ensure that each Key uniquely identifies a specific query targeting a particular shard, while also confirming that the shard’s state is current, preventing the retrieval of stale data.
![Key](/assets/media/blog-images/2024-10-01-understanding-index-request-cache/what_is_key.png){:class="img-centered"}
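
To see why all three components are needed, the key can be modeled as an immutable value object. This is a minimal Python sketch rather than the Java class: the `Key` field names, the shard identifier strings, and the reader IDs are illustrative assumptions, and a plain dictionary stands in for the cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Illustrative model of a request cache key (hypothetical field names)."""
    shard_id: str             # stands in for the CacheEntity / IndexShard reference
    reader_cache_key_id: str  # identifies the current state of the shard
    query_bytes: bytes        # the serialized search request


cache = {}

# Cache a result for a query against shard 0 in its current state.
query = b'{"size":0,"query":{"match_all":{}}}'
cache[Key("students[0]", "reader-1", query)] = b"cached response"

# Same query, same shard, same reader: the lookup hits.
hit = cache.get(Key("students[0]", "reader-1", query))

# After a refresh, the shard has a new reader ID, so the old entry is never matched.
miss_after_refresh = cache.get(Key("students[0]", "reader-2", query))

# The same query against a different shard is a separate entry.
miss_other_shard = cache.get(Key("students[1]", "reader-1", query))
```

Because the reader ID is part of the key, a stale entry can never be returned by accident. It simply stops matching and waits to be removed.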

## Storing entries into the cache

Every cacheable query calls [getOrCompute](https://github.com/opensearch-project/OpenSearch/blob/4199bc2726235456e5b5422eaf4e836f25c2c5ed/server/src/main/java/org/opensearch/indices/IndicesRequestCache.java#L223), which either returns the precomputed value from the cache or computes the result and caches it.

```
function getOrCompute(CacheEntity, DirectoryReader, cacheKey) {
    // Step 1: Get the current state identifier of the shard
    readerCacheKeyId = DirectoryReader.getDelegatingCacheKey().getId()
    // Step 2: Create a unique key for the cache entry
    key = new Key(CacheEntity, cacheKey, readerCacheKeyId)
    // Step 3: Return the cached result, or let cacheLoader compute and cache it
    value = cache.computeIfAbsent(key, cacheLoader)
    // Step 4: If the result was computed (not retrieved from the cache), register a cleanup listener
    if (cacheLoader.isLoaded()) {
        cleanupKey = new CleanupKey(CacheEntity, readerCacheKeyId)
        OpenSearchDirectoryReader.addReaderCloseListener(DirectoryReader, cleanupKey)
    }
    // Step 5: Return the cached or computed result
    return value
}
```
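
The same flow can be exercised in a small, runnable model. The Python sketch below is a simplification under stated assumptions: `RequestCache`, `Reader`, and the `loader` callback are hypothetical stand-ins, a plain dictionary replaces the real cache, hit and miss statistics are omitted, and registering the close listener only once per shard and reader state is a choice made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

CacheKey = Tuple[str, str, bytes]   # (shard ID, reader ID, query bytes)
CleanupKey = Tuple[str, str]        # (shard ID, reader ID)


class Reader:
    """Hypothetical stand-in for a point-in-time reader with close listeners."""

    def __init__(self, reader_id: str) -> None:
        self.reader_id = reader_id
        self._close_listeners: List[Callable[[], None]] = []

    def add_close_listener(self, listener: Callable[[], None]) -> None:
        self._close_listeners.append(listener)

    def close(self) -> None:
        for listener in self._close_listeners:
            listener()


class RequestCache:
    def __init__(self) -> None:
        self.entries: Dict[CacheKey, bytes] = {}
        self.keys_to_clean: set = set()
        self._registered: set = set()

    def get_or_compute(self, shard_id: str, reader: Reader, query: bytes,
                       loader: Callable[[], bytes]) -> bytes:
        key = (shard_id, reader.reader_id, query)
        if key in self.entries:
            return self.entries[key]          # cache hit: loader is not called
        value = loader()                      # cache miss: compute and store
        self.entries[key] = value
        cleanup_key = (shard_id, reader.reader_id)
        if cleanup_key not in self._registered:
            # Register the listener once per shard and reader state.
            self._registered.add(cleanup_key)
            reader.add_close_listener(lambda: self.keys_to_clean.add(cleanup_key))
        return value


calls = []


def run_query() -> bytes:
    calls.append(1)
    return b"result"


cache = RequestCache()
reader = Reader("reader-1")
first = cache.get_or_compute("students[0]", reader, b"q", run_query)
second = cache.get_or_compute("students[0]", reader, b"q", run_query)
reader.close()  # stages ("students[0]", "reader-1") for cleanup
```

The second call returns the stored value without running the query again, and closing the reader only records a cleanup key. The entry itself is still in `cache.entries` at that point.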

## Cache invalidation

An IndexReader is a point-in-time view of an index. Any operation that changes the contents of an index creates a new IndexReader and closes the old one. All cache entries created with the old IndexReader are then stale and need to be cleaned up.

### CleanupKey

When an IndexReader is closed, the corresponding `cleanupKey` is added to a set called `keysToClean`.

![CleanupKey](/assets/media/blog-images/2024-10-01-understanding-index-request-cache/key_and_cleanupkey.png){:class="img-centered"}

The third entity in the Key class, **BytesReference**, is not used in CleanupKey because it represents the search query itself, which is not needed to determine which entries must be cleaned up. The CleanupKey only identifies the shard and reader state whose entries are stale, regardless of which queries produced them.

A cache entry can become invalid due to these operations:

#### Refresh or merge

A refresh or merge operation creates a new IndexReader and closes the old one, which makes the entries tied to the old reader stale.

#### Cache Clear API

```json
POST /my-index/_cache/clear?request=true
```

The API call invalidates all the request cache entries for the index.

![Cache-Clear](/assets/media/blog-images/2024-10-01-understanding-index-request-cache/keys_to_clean_insert.png){:class="img-centered"}

In summary, invalidating an IndexReader or explicitly clearing the cache adds the corresponding CleanupKeys to a collection called `keysToClean`.
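
Both paths end by staging CleanupKeys rather than touching cache entries directly. A minimal Python sketch of the two paths follows. The function names are hypothetical, and representing an index-wide clear as a cleanup key with no reader ID is an assumption made for illustration.

```python
from typing import Optional, Set, Tuple

CleanupKey = Tuple[str, Optional[str]]  # (shard ID, reader ID or None)

keys_to_clean: Set[CleanupKey] = set()


def on_reader_closed(shard_id: str, reader_id: str) -> None:
    """Refresh or merge path: the old reader closes, so stage its entries."""
    keys_to_clean.add((shard_id, reader_id))


def on_cache_clear(shard_id: str) -> None:
    """Clear cache API path: stage every entry of the shard (no reader ID)."""
    keys_to_clean.add((shard_id, None))


on_reader_closed("my-index[0]", "reader-1")   # a refresh replaced reader-1
on_reader_closed("my-index[0]", "reader-1")   # duplicates collapse in the set
on_cache_clear("my-index[1]")                 # POST /my-index/_cache/clear?request=true
```

Using a set means that repeated notifications for the same shard and reader state are staged only once.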

## Cache cleanup

OpenSearch has a background job called [CacheCleaner](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/indices/IndicesService.java#L1678) that runs every minute in a separate thread.

This job calls the [cleanCache](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/indices/IndicesRequestCache.java#L698) method, which iterates over every entry in the cache, maps each Key to a CleanupKey, and deletes the entry if that CleanupKey is held in `keysToClean`.

```
function cleanCache() {
    // Step 1: Initialize sets for keys to clean
    currentKeysToClean = new Set()
    currentFullClean = new Set()
    // Step 2: Process the list of keys that need to be cleaned
    for each cleanupKey in keysToClean {
        keysToClean.remove(cleanupKey)
        if (shard is closed or cacheClearAPI called) {
            currentFullClean.add(cleanupKey.entity.getCacheIdentity())
        } else {
            currentKeysToClean.add(cleanupKey)
        }
    }
    // Step 3: Process the cache and remove identified keys
    for each key in cache.keys() {
        if (currentFullClean.contains(key.entity.getCacheIdentity()) or
            currentKeysToClean.contains(new CleanupKey(key.entity, key.readerCacheKeyId))) {
            cache.remove(key)
        }
    }
    // Step 4: Refresh the cache
    cache.refresh()
}
```

![Cache-Clear](/assets/media/blog-images/2024-10-01-understanding-index-request-cache/keys_to_clean_delete_and_fetch.png){:class="img-centered"}
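
The cleanup pass can also be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: cache keys are tuples of shard ID, reader ID, and query bytes, and a cleanup key whose reader ID is `None` is a hypothetical marker meaning "remove everything for this shard". The final `cache.refresh()` step is omitted because a plain dictionary has no equivalent.

```python
from typing import Dict, Optional, Set, Tuple

CacheKey = Tuple[str, str, bytes]
CleanupKey = Tuple[str, Optional[str]]


def clean_cache(cache: Dict[CacheKey, bytes], keys_to_clean: Set[CleanupKey]) -> int:
    """Remove stale entries and return how many were deleted."""
    full_clean: Set[str] = set()           # shards to wipe completely
    reader_clean: Set[CleanupKey] = set()  # specific (shard, reader) states

    # Step 1: drain the staged cleanup keys and classify them.
    while keys_to_clean:
        shard_id, reader_id = keys_to_clean.pop()
        if reader_id is None:
            full_clean.add(shard_id)
        else:
            reader_clean.add((shard_id, reader_id))

    # Step 2: scan the cache once and collect matching entries.
    stale = [key for key in cache
             if key[0] in full_clean or (key[0], key[1]) in reader_clean]

    # Step 3: delete the collected entries.
    for key in stale:
        del cache[key]
    return len(stale)


cache = {
    ("s0", "reader-1", b"q1"): b"a",   # stale: reader-1 was closed
    ("s0", "reader-2", b"q1"): b"b",   # current reader, kept
    ("s1", "reader-7", b"q2"): b"c",   # shard s1 was cleared
}
staged = {("s0", "reader-1"), ("s1", None)}
removed = clean_cache(cache, staged)
```

Collecting the stale keys before deleting them avoids mutating the dictionary while iterating over it. The scan touches every cache entry once, which is why this work is done in a periodic background pass rather than on every reader close.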

## Wrapping up

The Index Request Cache plays a crucial role in OpenSearch’s efficiency. By understanding how it works, you can optimize performance and tune your configurations with greater confidence.

OpenSearch thrives on community contributions. If you have ideas or see opportunities for improvement, consider contributing. Your input can help shape the future of this search technology.
Binary file added assets/media/community/members/awskiran.png