<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="Asciidoctor 2.0.20">
<title>Redis Enterprise Developer Observability Playbook</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700">
<link rel="stylesheet" href="./asciidoctor.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body class="article toc2 toc-left">
<div id="header">
<h1>Redis Enterprise Developer Observability Playbook</h1>
<div class="details">
<span id="revnumber">version 1.0</span>
</div>
<div id="toc" class="toc2">
<div id="toctitle">Table of Contents</div>
<ul class="sectlevel1">
<li><a href="#introduction">1. Introduction</a></li>
<li><a href="#core-cluster-resource-monitoring">2. Core cluster resource monitoring</a>
<ul class="sectlevel2">
<li><a href="#memory">2.1. Memory</a></li>
<li><a href="#cpu">2.2. CPU</a></li>
<li><a href="#connections">2.3. Connections</a></li>
<li><a href="#synchronization">2.4. Synchronization</a></li>
</ul>
</li>
<li><a href="#database-performance-indicators">3. Database performance indicators</a>
<ul class="sectlevel2">
<li><a href="#latency">3.1. Latency</a></li>
<li><a href="#cache-hit-rate">3.2. Cache hit rate</a></li>
<li><a href="#key-eviction-rate">3.3. Key eviction rate</a></li>
</ul>
</li>
<li><a href="#proxy-performance">4. Proxy Performance</a>
<ul class="sectlevel3">
<li><a href="#proxy-policies">4.1. Proxy Policies</a></li>
</ul>
</li>
<li><a href="#data-access-anti-patterns">5. Data access anti-patterns</a>
<ul class="sectlevel2">
<li><a href="#slow-operations">5.1. Slow operations</a></li>
<li><a href="#hot-keys">5.2. Hot keys</a></li>
<li><a href="#large-keys">5.3. Large keys</a></li>
</ul>
</li>
<li><a href="#alerting">6. Alerting</a>
<ul class="sectlevel2">
<li><a href="#configuring-prometheus">6.1. Configuring Prometheus</a></li>
<li><a href="#list-of-alerts">6.2. List of alerts</a></li>
</ul>
</li>
<li><a href="#appendix-a-grafana-dashboards">7. Appendix A: Grafana Dashboards</a>
<ul class="sectlevel2">
<li><a href="#software">7.1. Software</a></li>
<li><a href="#workflow">7.2. Workflow</a></li>
<li><a href="#cloud">7.3. Cloud</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div id="content">
<div class="sect1">
<h2 id="introduction"><a class="anchor" href="#introduction"></a>1. Introduction</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This document provides monitoring guidance for developers running applications
that connect to Redis Enterprise. In particular, this guide focuses on the systems
and resources that are most likely to impact the performance of your application.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/node_summary.png" alt="node summary">
</div>
<div class="title">Figure 1. Dashboard showing relevant statistics for a Node</div>
</div>
<div class="paragraph">
<p>To effectively monitor a Redis Enterprise cluster you need to observe
core cluster resources and key database performance indicators.</p>
</div>
<div class="paragraph">
<p>Core cluster resources include:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="#memory">Memory utilization</a></p>
</li>
<li>
<p><a href="#cpu">CPU utilization</a></p>
</li>
<li>
<p><a href="#connections">Database connections</a></p>
</li>
<li>
<p><a href="#network-ingress-egress">Network traffic</a></p>
</li>
<li>
<p><a href="#synchronization">Synchronization</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Key database performance indicators include:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="#latency">Latency</a></p>
</li>
<li>
<p><a href="#cache-hit-rate">Cache hit rate</a></p>
</li>
<li>
<p><a href="#key-eviction-rate">Key eviction rate</a></p>
</li>
<li>
<p><a href="#proxy-performance">Proxy Performance</a></p>
</li>
</ul>
</div>
<div class="imageblock">
<div class="content">
<img src="images/cluster_overview.png" alt="cluster overview">
</div>
<div class="title">Figure 2. Dashboard showing an overview of cluster metrics</div>
</div>
<div class="paragraph">
<p>In addition to manually monitoring these resources and indicators, we recommend <a href="#alerting">setting up alerts</a>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="core-cluster-resource-monitoring"><a class="anchor" href="#core-cluster-resource-monitoring"></a>2. Core cluster resource monitoring</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="memory"><a class="anchor" href="#memory"></a>2.1. Memory</h3>
<div class="paragraph">
<p>Every Redis Enterprise database has a maximum configured memory limit to ensure isolation
in a multi-database cluster.</p>
</div>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Metric name</strong></th>
<th class="tableblock halign-left valign-top"><strong>Definition</strong></th>
<th class="tableblock halign-left valign-top"><strong>Unit</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Memory usage percentage</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Percentage of used memory relative to the configured memory limit for a given database</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Percentage</p></td>
</tr>
</tbody>
</table>
<div class="imageblock">
<div class="content">
<img src="images/playbook_used-memory.png" alt="playbook used memory">
</div>
<div class="title">Figure 3. Dashboard displaying high-level cluster metrics - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/cluster_dashboard_v9-11.json">Cluster Dashboard</a></div>
</div>
<div class="sect3">
<h4 id="thresholds"><a class="anchor" href="#thresholds"></a>2.1.1. Thresholds</h4>
<div class="paragraph">
<p>The appropriate memory threshold depends on how the application is using Redis.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="#caching-workloads">Caching workloads</a>, which permit Redis to evict keys, can safely use 100% of available memory.</p>
</li>
<li>
<p><a href="#non-caching-workloads">Non-caching workloads</a> do not permit key eviction and should be closely monitored as soon as memory usage reaches 80%.</p>
</li>
</ul>
</div>
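<div class="paragraph">
<p>The two thresholds above can be sketched as a small classification helper. This is a minimal illustration, not part of any Redis tooling: the function names, the status labels, and the <code>eviction_enabled</code> flag are assumptions made for this example, and the byte counts are expected to come from whatever metrics source you already use.</p>
</div>

```python
def memory_usage_percent(used_bytes: int, limit_bytes: int) -> float:
    """Used memory as a percentage of the configured database memory limit."""
    if not limit_bytes > 0:
        raise ValueError("limit_bytes must be positive")
    return 100.0 * used_bytes / limit_bytes


def memory_status(used_bytes: int, limit_bytes: int, eviction_enabled: bool,
                  review_percent: float = 80.0) -> str:
    """Classify memory usage; the returned labels are hypothetical names."""
    pct = memory_usage_percent(used_bytes, limit_bytes)
    if eviction_enabled:
        # Caching workload: 100% is acceptable, but latency, cache hit ratio,
        # and evicted keys should be watched once memory is full.
        return "monitor-cache-kpis" if pct >= 100.0 else "ok"
    if pct >= 100.0:
        # Non-caching workload with no eviction: writes are no longer accepted.
        return "writes-rejected"
    if pct >= review_percent:
        # Non-caching workload at or above the 80% review threshold.
        return "review-growth"
    return "ok"


if __name__ == "__main__":
    print(memory_status(8 * 1024**3, 10 * 1024**3, eviction_enabled=False))
```

<div class="paragraph">
<p>Keeping the workload type as an explicit input reflects the point of this section: the same memory percentage calls for different responses depending on whether key eviction is permitted.</p>
</div>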
</div>
<div class="sect3">
<h4 id="caching-workloads"><a class="anchor" href="#caching-workloads"></a>2.1.2. Caching workloads</h4>
<div class="paragraph">
<p>For applications using Redis solely as a cache, you can safely let the memory usage
reach 100% as long as you have an <a href="https://redis.io/blog/cache-eviction-strategies/">eviction policy</a> in place. This will ensure
that Redis can evict keys while continuing to accept new writes.</p>
</div>
<div class="paragraph">
<p><strong>NB</strong> Eviction increases write command latency: when memory usage is at 100%, Redis has to clean up memory and objects before accepting a new write in order to prevent an out-of-memory (OOM) condition.</p>
</div>
<div class="paragraph">
<p>While your Redis database is using 100% of available memory in a caching context,
it’s still important to monitor performance. The key performance indicators include:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Latency</p>
</li>
<li>
<p>Cache hit ratio</p>
</li>
<li>
<p>Evicted keys</p>
</li>
</ul>
</div>
<div class="sect4">
<h5 id="read-latency"><a class="anchor" href="#read-latency"></a>Read latency</h5>
<div class="paragraph">
<p><strong>Latency</strong> has two important definitions, depending on context:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>In the context of Redis itself, latency is <strong>the time it takes for Redis
to respond to a request</strong>. See <a href="#latency">Latency</a> for a broader discussion of this metric.</p>
</li>
<li>
<p>In the context of your application, latency is <strong>the time it takes for the application
to process a request</strong>. This will include the time it takes to execute both reads and writes
to Redis, as well as calls to other databases and services. Note that it’s possible for
Redis to report low latency while the application is experiencing high latency.
This may indicate a low cache hit ratio, ultimately caused by insufficient memory.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>You need to monitor both application-level and Redis-level latency to diagnose
caching performance issues in production.</p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="cache-hit-ratio-and-eviction"><a class="anchor" href="#cache-hit-ratio-and-eviction"></a>2.1.3. Cache hit ratio and eviction</h4>
<div class="paragraph">
<p><strong>Cache hit ratio</strong> is the percentage of read requests that Redis serves successfully.
<strong>Eviction rate</strong> is the rate at which Redis evicts keys from the cache. These metrics
are often inversely correlated: a high eviction rate may cause a low cache hit ratio.</p>
</div>
<div class="paragraph">
<p>If the Redis server is empty, the hit ratio will be 0%. As the application runs and fills the cache,
the hit ratio will increase.</p>
</div>
<div class="paragraph">
<p><strong>When the entire cached working set fits in memory</strong>, then the cache hit ratio will reach close to 100%
while the percent of used memory will remain below 100%.</p>
</div>
<div class="paragraph">
<p><strong>When the working set cannot fit in memory</strong>, the eviction policy will start to evict keys.
The greater the rate of key eviction, the lower the cache hit ratio.</p>
</div>
<div class="paragraph">
<p>In both cases, keys may be manually invalidated by the application or evicted through
the use of TTLs (time-to-live) and an eviction policy.</p>
</div>
<div class="paragraph">
<p>The ideal cache hit ratio depends on the application, but generally, the ratio should be greater than 50%.
Low hit ratios coupled with high numbers of object evictions may indicate that your cache is too small.
This can cause thrashing on the application side, a scenario where the cache is constantly being invalidated.</p>
</div>
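<div class="paragraph">
<p>As a rough sketch of the relationship described above, the hit ratio can be computed from hit and miss counters and combined with the eviction rate. The function names and the <code>eviction_rate_limit</code> parameter are hypothetical; the source does not prescribe an eviction threshold, so that value is an input you would have to choose for your own workload.</p>
</div>

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Percentage of read requests served from the cache."""
    total = hits + misses
    if total == 0:
        # An empty or idle cache has served no reads; report 0%.
        return 0.0
    return 100.0 * hits / total


def cache_may_be_too_small(hit_ratio_percent: float,
                           evicted_per_sec: float,
                           min_ratio_percent: float = 50.0,
                           eviction_rate_limit: float = 0.0) -> bool:
    """True when a low hit ratio coincides with ongoing evictions.

    min_ratio_percent follows the 50% guideline in the text;
    eviction_rate_limit is an assumed, workload-specific value.
    """
    low_ratio = min_ratio_percent > hit_ratio_percent
    evicting = evicted_per_sec > eviction_rate_limit
    return low_ratio and evicting


if __name__ == "__main__":
    ratio = cache_hit_ratio(hits=400, misses=600)
    print(ratio, cache_may_be_too_small(ratio, evicted_per_sec=120.0))
```

<div class="paragraph">
<p>Requiring both conditions avoids flagging a cold cache, where the hit ratio is low simply because the cache has not filled yet and nothing is being evicted.</p>
</div>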
<div class="paragraph">
<p>The upshot here is that when your Redis database is using 100% of available memory, you need
to measure the rate of
<a href="https://redis.io/docs/latest/operate/rs/references/metrics/database-operations/#evicted-objectssec">key evictions</a>.</p>
</div>
<div class="paragraph">
<p>An acceptable rate of key evictions depends on the total number of keys in the database
and the measure of application-level latency. If application latency is high,
check to see that key evictions have not increased.</p>
</div>
</div>
<div class="sect3">
<h4 id="eviction-policies"><a class="anchor" href="#eviction-policies"></a>2.1.4. Eviction Policies</h4>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Name</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Description</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">noeviction</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">New values aren’t saved when the memory limit is reached. When a database uses replication, this applies to the primary database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">allkeys-lru</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Keeps most recently used keys; removes least recently used (LRU) keys</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">allkeys-lfu</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Keeps frequently used keys; removes least frequently used (LFU) keys</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">volatile-lru</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Removes least recently used keys with the expire field set to true.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">volatile-lfu</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Removes least frequently used keys with the expire field set to true.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">allkeys-random</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Randomly removes keys to make space for the new data added.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">volatile-random</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Randomly removes keys with expire field set to true.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">volatile-ttl</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Removes keys with expire field set to true and the shortest remaining time-to-live (TTL) value.</p></td>
</tr>
</tbody>
</table>
</div>
<div class="sect3">
<h4 id="eviction-policy-guidelines"><a class="anchor" href="#eviction-policy-guidelines"></a>2.1.5. Eviction policy guidelines</h4>
<div class="ulist">
<ul>
<li>
<p>Use the allkeys-lru policy when you expect a power-law distribution in the popularity of your requests. That is, you expect a subset of elements will be accessed far more often than the rest. This is a good pick if you are unsure.</p>
</li>
<li>
<p>Use the allkeys-random policy if you have cyclic access, where all the keys are scanned continuously, or when you expect the access distribution to be uniform.</p>
</li>
<li>
<p>Use the volatile-ttl policy if you want to give Redis hints about which keys are good candidates for expiration by setting different TTL values when you create your cache objects.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The volatile-lru and volatile-random policies are mainly useful when you want to use a single instance both for caching and for a set of persistent keys. However, it is usually a better idea to run two Redis instances to solve such a problem.</p>
</div>
<div class="paragraph">
<p><strong>NB</strong> Setting an expire value on a key costs memory, so a policy like allkeys-lru is more memory efficient: a key does not need an expire configuration to be evicted under memory pressure.</p>
</div>
</div>
<div class="sect3">
<h4 id="non-caching-workloads"><a class="anchor" href="#non-caching-workloads"></a>2.1.6. Non-caching workloads</h4>
<div class="paragraph">
<p>If no eviction policy is enabled, then Redis will stop accepting writes once memory reaches 100%.
Therefore, for non-caching workloads, we recommend that you configure an alert at 80% memory usage.
Once your database reaches this 80% threshold, you should closely review the rate of memory usage growth.</p>
</div>
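<div class="paragraph">
<p>One way to review the rate of memory growth once the 80% threshold is crossed is a simple linear extrapolation. The sketch below assumes growth is roughly constant and that you can supply a bytes-per-hour growth figure from your own metrics; the function name and parameters are hypothetical.</p>
</div>

```python
from typing import Optional


def hours_until_full(used_bytes: float, limit_bytes: float,
                     growth_bytes_per_hour: float) -> Optional[float]:
    """Estimate the hours left before used memory reaches the limit.

    Returns None when memory is flat or shrinking, because a linear
    projection never reaches the limit in that case.
    """
    if not growth_bytes_per_hour > 0:
        return None
    remaining = max(limit_bytes - used_bytes, 0)
    return remaining / growth_bytes_per_hour


if __name__ == "__main__":
    gib = 1024**3
    # 8 GiB used of a 10 GiB limit, growing by 0.25 GiB per hour.
    print(hours_until_full(8 * gib, 10 * gib, 0.25 * gib))
```

<div class="paragraph">
<p>A linear estimate is deliberately crude. It is meant to indicate how urgently the memory limit needs attention, not to predict the exact moment writes will stop.</p>
</div>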
</div>
<div class="sect3">
<h4 id="troubleshooting"><a class="anchor" href="#troubleshooting"></a>2.1.7. Troubleshooting</h4>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Issue</strong></th>
<th class="tableblock halign-left valign-top"><strong>Possible causes</strong></th>
<th class="tableblock halign-left valign-top"><strong>Remediation</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Redis memory usage has reached 100%</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This may indicate an insufficient Redis memory limit for your application’s workload</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">For non-caching workloads (where eviction is unacceptable),
immediately increase the memory limit for the database.
You can accomplish this through the Redis Enterprise console or its API.
Alternatively, you can contact Redis support to assist.</p>
<p class="tableblock">For caching workloads, you need to monitor performance closely.
Confirm that you have an <a href="https://redis.io/docs/latest/operate/rs/databases/memory-performance/eviction-policy/">eviction policy</a>
in place.
If your application’s performance starts to degrade, you may need to increase the memory limit,
as described above.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Redis has stopped accepting writes</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Memory is at 100% and no eviction policy is in place</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Increase the database’s total amount of memory.
If this is for a caching workload, consider enabling
an <a href="https://redis.io/docs/latest/operate/rs/databases/memory-performance/eviction-policy/">eviction policy</a>.</p>
<p class="tableblock">In addition, you may want to determine whether the application can set a reasonable TTL (time-to-live) on some or all
of the data being written to Redis.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Cache hit ratio is steadily decreasing</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The application’s working set size may be steadily increasing.</p>
<p class="tableblock">Alternatively, the application may be misconfigured (e.g., generating
more than one unique cache key per cached item).</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">If the working set size is increasing, consider increasing the memory limit for the database.
If the application is misconfigured, review the application’s cache key generation logic.</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect2">
<h3 id="cpu"><a class="anchor" href="#cpu"></a>2.2. CPU</h3>
<div class="paragraph">
<p>Redis Enterprise provides several CPU metrics:</p>
</div>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Metric name</strong></th>
<th class="tableblock halign-left valign-top"><strong>Definition</strong></th>
<th class="tableblock halign-left valign-top"><strong>Unit</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Shard CPU</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CPU time portion spent by database shards</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Percentage, up to 100% per shard</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Proxy CPU</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CPU time portion spent by the cluster’s proxy(s)</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Percentage, up to 100% per proxy thread</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Node CPU (User and System)</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CPU time portion spent by all user-space and kernel-level processes</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Percentage, up to 100% per node CPU</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>To understand CPU metrics, it’s worth recalling how a Redis Enterprise cluster is organized.
A cluster consists of one or more nodes. Each node is a VM (or cloud compute instance) or
a bare-metal server.</p>
</div>
<div class="paragraph">
<p>A database is a set of processes, known as shards, deployed across the nodes of a cluster.</p>
</div>
<div class="paragraph">
<p>In the dashboard, shard CPU is the CPU utilization of the processes that make up the database.
When diagnosing performance issues, start by looking at shard CPU.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/playbook_database-cpu-shard.png" alt="playbook database cpu shard">
</div>
<div class="title">Figure 4. Dashboard displaying CPU usage - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/database_dashboard_v9-11.json">Database Dashboard</a></div>
</div>
<div class="sect3">
<h4 id="thresholds-2"><a class="anchor" href="#thresholds-2"></a>2.2.1. Thresholds</h4>
<div class="paragraph">
<p>In general, we define high CPU as any CPU utilization above 80% of total capacity.</p>
</div>
<div class="paragraph">
<p>Shard CPU should remain below 80%. Shards are single-threaded, so a shard CPU of 100% means that the shard is fully utilized.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/playbook_proxy-cpu-usage.png" alt="playbook proxy cpu usage">
</div>
<div class="title">Figure 5. Display showing Proxy CPU usage - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/proxy_dashboard_v9-11.json">Proxy Dashboard</a></div>
</div>
<div class="paragraph">
<p>Proxy CPU should remain below 80% of total capacity.
The proxy is a multi-threaded process that handles client connections and forwards requests to the appropriate shard.
Because the total number of proxy threads is configurable, the proxy CPU may exceed 100%.
A proxy configured with 6 threads can reach 600% CPU utilization, so in this case,
keeping utilization below 80% means keeping the total proxy CPU usage below 480%.</p>
</div>
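<div class="paragraph">
<p>The arithmetic in the preceding paragraph generalizes to any thread count. The following sketch computes the alert threshold from the number of proxy threads; the function names are illustrative only, and the thread count is assumed to be known from your cluster configuration.</p>
</div>

```python
def proxy_cpu_alert_threshold(threads: int, target_percent: float = 80.0) -> float:
    """CPU percentage at which a proxy with `threads` threads is 'high'.

    Each thread contributes up to 100%, so total capacity is threads * 100.
    """
    if not threads > 0:
        raise ValueError("threads must be positive")
    capacity_percent = threads * 100
    return capacity_percent * target_percent / 100


def proxy_cpu_is_high(observed_percent: float, threads: int) -> bool:
    """True when observed proxy CPU exceeds 80% of total capacity."""
    return observed_percent > proxy_cpu_alert_threshold(threads)


if __name__ == "__main__":
    # A proxy with 6 threads has a capacity of 600%, so the threshold is 480%.
    print(proxy_cpu_alert_threshold(6))
```

<div class="paragraph">
<p>The same calculation applies to node CPU if you substitute the number of cores for the number of threads.</p>
</div>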
<div class="imageblock">
<div class="content">
<img src="images/node_cpu.png" alt="node cpu">
</div>
<div class="title">Figure 6. Dashboard displaying an ensemble of Node CPU usage data - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/node_dashboard_v9-11.json">Node Dashboard</a></div>
</div>
<div class="paragraph">
<p>Node CPU should also remain below 80% of total capacity. As with the proxy, the node CPU is variable depending
on the CPU capacity of the node. You will need to calibrate your alerting based on the number of cores in your nodes.</p>
</div>
</div>
<div class="sect3">
<h4 id="troubleshooting-2"><a class="anchor" href="#troubleshooting-2"></a>2.2.2. Troubleshooting</h4>
<div class="paragraph">
<p>High CPU utilization has multiple possible causes. Common causes include an under-provisioned cluster,
excess inefficient Redis operations, and hot master shards.</p>
</div>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Issue</strong></th>
<th class="tableblock halign-left valign-top"><strong>Possible causes</strong></th>
<th class="tableblock halign-left valign-top"><strong>Remediation</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">High CPU utilization across all shards of a database</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This usually indicates that the database is under-provisioned in terms of number of shards.
A secondary cause may be that the application is running too many inefficient Redis operations.
You can detect slow Redis operations by enabling the slow log in the Redis Enterprise UI.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">First, rule out inefficient Redis operations as the cause of the high CPU utilization.
See <a href="#slow-operations">Slow operations</a> for details on this.
If inefficient Redis operations are not the cause, then increase the number of shards in the database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">High CPU utilization on a single shard, with the remaining shards having low CPU utilization</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This usually indicates a master shard with at least one hot key.
Hot keys are keys that are accessed extremely frequently (e.g., more than 1000 times per second).</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Hot key issues generally cannot be resolved by increasing the number of shards.
To resolve this issue, see <a href="#hot-keys">Hot keys</a>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">High Proxy CPU</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">There are several possible causes of high proxy CPU.
First, review the behavior of connections to the database.
Frequent cycling of connections, especially when TLS is enabled, can cause high proxy CPU utilization.
This is especially true when you see more than 100 connections per second per thread.
Such behavior is almost always a sign of a misbehaving application.</p>
<p class="tableblock">Second, review the total number of operations per second against the cluster.
If you see more than 50k operations per second per thread, you may need to increase the number of proxy threads.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">In the case of high connection cycling, review the application’s connection behavior.</p>
<p class="tableblock">In the case of high operations per second, <a href="https://redis.io/docs/latest/operate/rs/references/cli-utilities/rladmin/tune/#tune-proxy">increase the number of proxy threads</a>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">High Node CPU</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">You will typically detect high shard or proxy CPU utilization before you detect high node CPU utilization.
Use the remediation steps above to address high shard and proxy CPU utilization.
In spite of this, if you see high node CPU utilization, you may need to increase the number of nodes in the cluster.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Consider increasing the number of nodes in the cluster and then rebalancing the shards across the new nodes.
This is a complex operation and should be done with the help of Redis support.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">High System CPU</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Most of the issues above will reflect user-space CPU utilization.
However, if you see high system CPU utilization, this may indicate a problem at the network or storage level.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Review network bytes in and network bytes out to rule out any unexpected spikes in network traffic.
You may need to perform some deeper network diagnostics to identify the cause of the high system CPU utilization.
For example, with high rates of packet loss, you may need to review network configurations or even the network hardware.</p></td>
</tr>
</tbody>
</table>
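<div class="paragraph">
<p>The per-thread figures given for high proxy CPU in the table above (more than 100 new connections per second per thread, and more than 50k operations per second per thread) can be checked with a short helper. This is a sketch under stated assumptions: the function name, the finding labels, and the input rates are hypothetical, and the rates are expected to come from your own monitoring data.</p>
</div>

```python
from typing import List

# Per-thread guideline values taken from the troubleshooting table.
MAX_NEW_CONNECTIONS_PER_SEC_PER_THREAD = 100
MAX_OPS_PER_SEC_PER_THREAD = 50_000


def diagnose_proxy_load(new_connections_per_sec: float,
                        ops_per_sec: float,
                        threads: int) -> List[str]:
    """Return hypothetical finding labels for a proxy under high CPU."""
    if not threads > 0:
        raise ValueError("threads must be positive")
    findings = []
    if new_connections_per_sec / threads > MAX_NEW_CONNECTIONS_PER_SEC_PER_THREAD:
        # Frequent connection cycling: review the application's behavior.
        findings.append("connection-cycling")
    if ops_per_sec / threads > MAX_OPS_PER_SEC_PER_THREAD:
        # Sustained throughput above the guideline: consider more threads.
        findings.append("add-proxy-threads")
    return findings


if __name__ == "__main__":
    print(diagnose_proxy_load(new_connections_per_sec=900,
                              ops_per_sec=100_000, threads=6))
```

<div class="paragraph">
<p>Both checks run independently because the two causes can occur at the same time and call for different remediations.</p>
</div>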
</div>
</div>
<div class="sect2">
<h3 id="connections"><a class="anchor" href="#connections"></a>2.3. Connections</h3>
<div class="paragraph">
<p>The Redis Enterprise database dashboard indicates the total number of connections to the database.</p>
</div>
<div class="paragraph">
<p>This connection count metric should be monitored with both a minimum and maximum number of connections in mind.
Based on the number of application instances connecting to Redis (and whether your application uses connection pooling),
you should have a rough idea of the minimum and maximum number of connections you expect to see for any given database.
This number should remain relatively constant over time.</p>
</div>
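<div class="paragraph">
<p>As a rough illustration of the bounds described above, the following sketch derives an expected connection range from the number of application instances and their pool sizes, then classifies an observed count. The names <code>ConnectionBounds</code>, <code>expected_bounds</code>, and <code>classify</code>, and the per-instance pool sizes, are hypothetical and not part of Redis Enterprise; adapt the arithmetic to how your application actually opens connections.</p>
</div>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionBounds:
    """Expected minimum and maximum connection counts for one database."""
    minimum: int
    maximum: int


def expected_bounds(app_instances: int, pool_min: int, pool_max: int) -> ConnectionBounds:
    # Assumption: every application instance keeps between pool_min and
    # pool_max connections open to the database.
    return ConnectionBounds(app_instances * pool_min, app_instances * pool_max)


def classify(count: int, bounds: ConnectionBounds) -> str:
    """Compare an observed connection count against the expected range."""
    if count < bounds.minimum:
        return "below_expected"
    if count > bounds.maximum:
        return "above_expected"
    return "ok"


bounds = expected_bounds(app_instances=10, pool_min=2, pool_max=8)
print(bounds, classify(5, bounds), classify(50, bounds), classify(200, bounds))
```

<div class="paragraph">
<p>A count below the range points toward the connectivity problems listed in the troubleshooting table, while a count above it points toward a leak or misconfigured pool.</p>
</div>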
<div class="sect3">
<h4 id="troubleshooting-3"><a class="anchor" href="#troubleshooting-3"></a>2.3.1. Troubleshooting</h4>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Issue</strong></th>
<th class="tableblock halign-left valign-top"><strong>Possible causes</strong></th>
<th class="tableblock halign-left valign-top"><strong>Remediation</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Fewer connections to Redis than expected</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The application may not be connecting to the correct Redis database.
There may be a network partition between the application and the Redis database.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Confirm that the application can successfully connect to Redis.
This may require consulting the application logs or the application’s connection configuration.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Connection count continues to grow over time</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Your application may not be releasing connections.
The most common cause of such a connection leak is a manually implemented
connection pool or a connection pool that is not properly configured.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Review the application’s connection configuration.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Erratic connection counts (e.g., spikes and drops)</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Application misbehavior (thundering herds, connection cycling) or networking issues</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Review the application logs and network traffic to determine the cause of the erratic connection counts.</p></td>
</tr>
</tbody>
</table>
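<div class="paragraph">
<p>One simple way to flag a connection count that continues to grow is to compare the average of the earliest samples with the average of the most recent ones. This is only a sketch: the function name <code>is_growing</code>, the window size, and the 10% tolerance are illustrative assumptions, and the samples are assumed to be connection counts collected at regular intervals.</p>
</div>

```python
from statistics import mean
from typing import Sequence


def is_growing(samples: Sequence[int], window: int = 3, tolerance: float = 0.10) -> bool:
    """Return True when the latest window averages more than `tolerance`
    (as a fraction) above the earliest window."""
    if len(samples) < 2 * window:
        return False  # not enough data to compare two full windows
    baseline = mean(samples[:window])
    recent = mean(samples[-window:])
    if baseline == 0:
        return recent > 0
    return (recent - baseline) / baseline > tolerance


print(is_growing([100, 101, 99, 120, 140, 160]))  # steadily rising counts
print(is_growing([100, 102, 98, 101, 99, 100]))   # stable counts
```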
<div class="imageblock">
<div class="content">
<img src="images/playbook_database-used-connections.png" alt="playbook database used connections">
</div>
<div class="title">Figure 7. Dashboard displaying connections - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/database_dashboard_v9-11.json">Database Dashboard</a></div>
</div>
</div>
<div class="sect3">
<h4 id="network-ingress-egress"><a class="anchor" href="#network-ingress-egress"></a>2.3.2. Network ingress / egress</h4>
<div class="paragraph">
<p>The network ingress / egress panel shows the amount of data being sent to and received from the database.
Large spikes in network traffic can indicate that the cluster is under-provisioned or that
the application is reading and/or writing unusually large keys. A correlation between high network traffic
and high CPU utilization may indicate a large key scenario.</p>
</div>
<div class="sect4">
<h5 id="unbalanced-database-endpoint"><a class="anchor" href="#unbalanced-database-endpoint"></a>Unbalanced database endpoint</h5>
<div class="paragraph">
<p>One possible cause of elevated network traffic is that the database endpoint is not located on the same node as the master shards. In addition to added network latency, if data plane internode encryption is enabled, CPU consumption can increase as well.</p>
</div>
<div class="paragraph">
<p>One solution is to use the optimal shard placement and proxy policy to ensure endpoints are collocated on nodes hosting master shards. If you need to restore balance (e.g., after a node failure), you can manually fail over shards with the rladmin CLI tool.</p>
</div>
<div class="paragraph">
<p>Extreme network traffic utilization may approach the limits of the underlying network infrastructure.
In this case, the only remediation is to add nodes to the cluster and scale the database’s shards across them.</p>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="synchronization"><a class="anchor" href="#synchronization"></a>2.4. Synchronization</h3>
<div class="paragraph">
<p>In Redis Enterprise, geographically-distributed synchronization is based on CRDT technology.
The Redis Enterprise implementation of CRDT is called an Active-Active database (formerly known as CRDB).
With Active-Active databases, applications can read and write to the same data set from different geographical locations seamlessly and with low latency, without changing the way the application connects to the database.</p>
</div>
<div class="paragraph">
<p>An Active-Active architecture is a data resiliency architecture that distributes the database information over multiple data centers via independent and geographically distributed clusters and nodes.
It is a network of separate processing nodes, each having access to a common replicated database such that all nodes can participate in a common application ensuring local low latency with each region being able to run in isolation.</p>
</div>
<div class="paragraph">
<p>To achieve consistency between participating clusters, Redis Active-Active synchronization uses a process called the syncer.</p>
</div>
<div class="paragraph">
<p>The syncer keeps a replication backlog, which stores changes to the dataset that the syncer sends to other participating clusters.
The syncer uses partial syncs to keep replicas up to date with changes, or a full sync in the event a replica or primary is lost.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/playbook_network-connectivity.png" alt="playbook network connectivity">
</div>
<div class="title">Figure 8. Dashboard displaying connection metrics between zones - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/synchronization_dashboard_v9-11.json">Synchronization Dashboard</a></div>
</div>
<div class="paragraph">
<p>CRDT provides three fundamental benefits over other geo-distributed solutions:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>It offers local latency on read and write operations, regardless of the number of geo-replicated regions and their distance from each other.</p>
</li>
<li>
<p>It enables seamless conflict resolution (“conflict-free”) for simple and complex data types like those of Redis core.</p>
</li>
<li>
<p>Even if most of the geo-replicated regions in a CRDT database (for example, 3 out of 5) are down, the remaining geo-replicated regions are uninterrupted and can continue to handle read and write operations, ensuring business continuity.</p>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="database-performance-indicators"><a class="anchor" href="#database-performance-indicators"></a>3. Database performance indicators</h2>
<div class="sectionbody">
<div class="paragraph">
<p>There are several key performance indicators that report your database’s performance against your application’s workload:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Latency</p>
</li>
<li>
<p>Cache hit rate</p>
</li>
<li>
<p>Key eviction rate</p>
</li>
</ul>
</div>
<div class="sect2">
<h3 id="latency"><a class="anchor" href="#latency"></a>3.1. Latency</h3>
<div class="paragraph">
<p>Latency is <strong>the time it takes for Redis to respond to a request</strong>.
Redis Enterprise measures latency from the first byte received by the proxy to the last byte sent in the command’s response.</p>
</div>
<div class="paragraph">
<p>An adequately provisioned Redis database running efficient Redis operations will report an average latency below 1 millisecond. In fact, it’s common to measure
latency in terms of microseconds. Customers regularly achieve, and sometimes require, average latencies of 400-600
microseconds.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/playbook_database-cluster-latency.png" alt="playbook database cluster latency">
</div>
<div class="title">Figure 9. Dashboard display of latency metrics - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/database_dashboard_v9-11.json">Database Dashboard</a></div>
</div>
<div class="paragraph">
<p>The metrics distinguish between read and write latency. Understanding whether high latency is due
to reads or writes can help you to isolate the underlying issue.</p>
</div>
<div class="paragraph">
<p>Note that these latency metrics do not include network round trip time or application-level serialization,
which is why it’s essential to measure request latency at the application, as well.</p>
</div>
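<div class="paragraph">
<p>To complement the database-side metrics, request latency can also be recorded in the application. The sketch below times any callable and keeps the samples in microseconds; <code>LatencyRecorder</code> is a hypothetical helper, not part of any Redis client library, and in practice the callable would wrap a call made through your Redis client.</p>
</div>

```python
import time
from statistics import mean
from typing import Callable, List, TypeVar

T = TypeVar("T")


class LatencyRecorder:
    """Collects wall-clock latency samples, in microseconds, for arbitrary calls."""

    def __init__(self) -> None:
        self.samples_us: List[float] = []

    def timed(self, operation: Callable[[], T]) -> T:
        # Time the full call as the application sees it, which includes
        # network round trips and any client-side serialization.
        start = time.perf_counter()
        try:
            return operation()
        finally:
            self.samples_us.append((time.perf_counter() - start) * 1_000_000)

    def average_us(self) -> float:
        return mean(self.samples_us) if self.samples_us else 0.0


recorder = LatencyRecorder()
# Stand-in for a real request, e.g. a GET issued through your Redis client.
result = recorder.timed(lambda: sum(range(1000)))
print(result, f"{recorder.average_us():.1f} us")
```

<div class="paragraph">
<p>Comparing this application-side average with the latency reported by the dashboard helps separate time spent inside Redis from time spent on the network or in the client.</p>
</div>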
<div class="imageblock">
<div class="content">
<img src="images/latency_spike.png" alt="latency spike">
</div>
<div class="title">Figure 10. Display showing a noticeable spike in latency</div>
</div>
<div class="sect3">
<h4 id="troubleshooting-4"><a class="anchor" href="#troubleshooting-4"></a>3.1.1. Troubleshooting</h4>
<div class="paragraph">
<p>Here are some possible causes of high database latency. Note that high database latency is just one possible
cause of high application latency. Application latency can be caused by a variety of factors, including
a low <a href="#cache-hit-rate">cache hit rate</a>, a high rate of <a href="#key-eviction-rate">evictions</a>, or a
<a href="#network-ingress-egress">networking issue</a>.</p>
</div>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Issue</strong></th>
<th class="tableblock halign-left valign-top"><strong>Possible causes</strong></th>
<th class="tableblock halign-left valign-top"><strong>Remediation</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Slow database operations</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Confirm that there are no excessive slow operations in the <a href="#slow-operations">Redis slow log</a>.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">If possible, reduce the number of slow operations being sent to the database.
If this is not possible, consider increasing the number of shards in the database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Increased traffic to the database</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Review the <a href="#network-ingress-egress">network traffic</a> and the database operations per second chart
to determine if increased traffic is causing the latency.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">If the database is underprovisioned due to increased traffic, consider increasing the number of shards in the database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Insufficient CPU</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Check to see if the <a href="#cpu">CPU utilization</a> is increasing.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Confirm that <a href="#slow-operations">slow operations</a> are not causing the high CPU utilization.
If the high CPU utilization is due to increased load, consider adding shards to the database.</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect2">
<h3 id="cache-hit-rate"><a class="anchor" href="#cache-hit-rate"></a>3.2. Cache hit rate</h3>
<div class="paragraph">
<p><strong>Cache hit rate</strong> is the percentage of all read operations that return a response.<sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnotedef_1" title="View footnote.">1</a>]</sup>
When an application tries to read a key that exists, this is known as a <strong>cache hit</strong>.
Alternatively, when an application tries to read a key that does not exist, this is known as a <strong>cache miss</strong>.</p>
</div>
<div class="paragraph">
<p>For <a href="#caching-workloads">caching workloads</a>, the cache hit rate should generally be above 50%, although
the exact ideal cache hit rate can vary greatly depending on the application and depending on whether the cache
is already populated.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/playbook_cache-hit.png" alt="playbook cache hit">
</div>
<div class="title">Figure 11. Dashboard showing the cache hit ratio along with read/write misses - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/database_dashboard_v9-11.json">Database Dashboard</a></div>
</div>
<div class="paragraph">
<p>Note: Redis Enterprise actually reports four different cache hit / miss metrics.
These are defined as follows:</p>
</div>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Metric name</strong></th>
<th class="tableblock halign-left valign-top"><strong>Definition</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">bdb_read_hits</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The number of successful read operations</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">bdb_read_misses</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The number of read operations returning null</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">bdb_write_hits</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The number of write operations against existing keys</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">bdb_write_misses</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The number of write operations that create new keys</p></td>
</tr>
</tbody>
</table>
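<div class="paragraph">
<p>Given the definitions above, a read cache hit rate can be derived from <code>bdb_read_hits</code> and <code>bdb_read_misses</code>. The following is a minimal sketch that assumes both values are sampled over the same interval; the function name <code>cache_hit_rate</code> is illustrative, and your dashboard may compute the ratio differently.</p>
</div>

```python
from typing import Optional


def cache_hit_rate(read_hits: float, read_misses: float) -> Optional[float]:
    """Return the read hit rate as a percentage, or None when there were no reads."""
    total_reads = read_hits + read_misses
    if total_reads == 0:
        return None  # the rate is undefined without any read traffic
    return 100.0 * read_hits / total_reads


print(cache_hit_rate(read_hits=75, read_misses=25))
print(cache_hit_rate(read_hits=0, read_misses=0))
```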
<div class="sect3">
<h4 id="troubleshooting-5"><a class="anchor" href="#troubleshooting-5"></a>3.2.1. Troubleshooting</h4>
<div class="paragraph">
<p>Cache hit rate is usually only relevant for caching workloads. See <a href="#cache-hit-ratio-and-eviction">Cache hit ratio and eviction</a>
for tips on troubleshooting cache hit rate.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="key-eviction-rate"><a class="anchor" href="#key-eviction-rate"></a>3.3. Key eviction rate</h3>
<div class="paragraph">
<p>The <strong>key eviction rate</strong> is the rate at which objects are being evicted from the database.
If an <a href="https://redis.io/docs/latest/operate/rs/databases/memory-performance/eviction-policy/">eviction policy</a> is in place
for a database, eviction will begin once the database approaches its max memory capacity.</p>
</div>
<div class="paragraph">
<p>A high or increasing rate of evictions will negatively affect database latency, especially
if the rate of necessary key evictions exceeds the rate of new key insertions.</p>
</div>
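<div class="paragraph">
<p>The comparison between eviction and insertion rates can be sketched as follows. This assumes you have two samples of a cumulative counter taken a known number of seconds apart; if your metrics source already reports per-second rates, skip the first function. Both function names are hypothetical.</p>
</div>

```python
def rate_per_second(previous_total: int, current_total: int, interval_seconds: float) -> float:
    """Convert two cumulative counter samples into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Clamp at zero so that a counter reset does not yield a negative rate.
    return max(current_total - previous_total, 0) / interval_seconds


def evictions_outpace_inserts(eviction_rate: float, insert_rate: float) -> bool:
    """True when keys are being evicted faster than new keys are inserted."""
    return eviction_rate > insert_rate


evictions = rate_per_second(1_000, 1_600, 60)  # 600 evictions in 60 seconds
inserts = rate_per_second(5_000, 5_240, 60)    # 240 new keys in 60 seconds
print(evictions, inserts, evictions_outpace_inserts(evictions, inserts))
```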
<div class="paragraph">
<p>See <a href="#cache-hit-ratio-and-eviction">Cache hit ratio and eviction</a> for a discussion of key eviction and its relationship with memory usage.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/playbook_eviction-expiration.png" alt="playbook eviction expiration">
</div>
<div class="title">Figure 12. Dashboard displaying object evictions - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/software/classic/database_dashboard_v9-11.json">Database Dashboard</a></div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="proxy-performance"><a class="anchor" href="#proxy-performance"></a>4. Proxy Performance</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Redis Enterprise Software (RS) provides high-performance data access through a proxy process that manages and optimizes access to shards within the RS cluster. Each node contains a single proxy process. Each proxy can be active and take incoming traffic or it can be passive and wait for failovers.</p>
</div>
<div class="sect3">
<h4 id="proxy-policies"><a class="anchor" href="#proxy-policies"></a>4.1. Proxy Policies</h4>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Policy</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Description</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Single</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">There is only a single proxy that is bound to the database. This is the default database configuration and preferable in most use cases.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">All Master Shards</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">There are multiple proxies that are bound to the database, one on each node that hosts a database master shard. This mode fits most use cases that require multiple proxies.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">All Nodes</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">There are multiple proxies that are bound to the database, one on each node in the cluster, regardless of whether or not there is a shard from this database on the node. This mode should be used only in special cases, such as using a load balancer.</p></td>
</tr>
</tbody>
</table>
<div class="imageblock">
<div class="content">
<img src="images/proxy-thread-dashboard.png" alt="proxy thread dashboard">
</div>
<div class="title">Figure 13. Dashboard displaying proxy thread activity - <a href="https://github.com/redis-field-engineering/redis-enterprise-observability/blob/main/grafana/dashboards/grafana_v9-11/cloud/basic/redis-cloud-proxy-dashboard_v9-11.json">Proxy Thread Dashboard</a></div>
</div>
<div class="paragraph">
<p>When needed, you can use the <code>rladmin tune proxy</code> command to increase the number of proxy threads so that the proxy can use more CPU cores.
However, cores used by the proxy are not available to Redis, so take into account the number of Redis nodes on the host and the total number of available cores.</p>
</div>
<div class="paragraph">
<p>To set a new number of proxy threads, pass the following parameters to the command:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>&lt;id|all&gt; - you can either tune a specific proxy by its id, or all proxies.</p>
</li>
<li>
<p>&lt;mode&gt; - determines whether or not the proxy can automatically adjust the number of threads depending on load.</p>
</li>
<li>
<p>&lt;threads&gt; and &lt;max_threads&gt; - determine the initial number of threads created on startup, and the maximum number of threads allowed.</p>
</li>
<li>
<p>&lt;scale_threshold&gt; - determines the CPU utilization threshold that triggers spawning new threads. This CPU utilization level needs to be maintained for at least scale_duration seconds before automatic scaling is performed.</p>
</li>
</ul>
</div>
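<div class="paragraph">
<p>The parameters above can be assembled into a command line programmatically. The sketch below only builds the argument list and does not run anything. It assumes each option is passed as a name followed by its value after the proxy target, so confirm the exact syntax against the <code>rladmin tune proxy</code> reference for your version before using it. The function name <code>build_tune_proxy_command</code> is hypothetical.</p>
</div>

```python
from typing import List, Optional, Union


def build_tune_proxy_command(
    target: Union[int, str] = "all",
    mode: Optional[str] = None,
    threads: Optional[int] = None,
    max_threads: Optional[int] = None,
    scale_threshold: Optional[int] = None,
) -> List[str]:
    """Build (but do not execute) an `rladmin tune proxy` argument list."""
    if target != "all" and not isinstance(target, int):
        raise ValueError("target must be a proxy id or 'all'")
    command = ["rladmin", "tune", "proxy", str(target)]
    options = (
        ("mode", mode),
        ("threads", threads),
        ("max_threads", max_threads),
        ("scale_threshold", scale_threshold),
    )
    # Assumption: options are given as "<name> <value>" pairs; unset ones are omitted.
    for name, value in options:
        if value is not None:
            command += [name, str(value)]
    return command


print(" ".join(build_tune_proxy_command("all", threads=8, max_threads=8)))
```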
<div class="paragraph">
<p>The following table indicates ideal proxy thread counts for the specified environments.</p>
</div>
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Total Cores</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Redis on RAM (ROR)</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Redis on Flash (ROF)</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">4</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">3</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">3</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">8</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">5</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">3</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">12</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">8</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">4</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">16</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">10</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">5</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">32</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">24</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">10</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">64/96</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">32</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">20</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">128</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">32</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">32</p></td>
</tr>
</tbody>
</table>
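<div class="paragraph">
<p>For scripting, the table above can be encoded as a simple lookup. This sketch returns a value only for core counts that appear in the table and returns <code>None</code> otherwise, because the table does not say how to interpolate between rows. The names <code>PROXY_THREADS</code> and <code>recommended_proxy_threads</code> are illustrative.</p>
</div>

```python
from typing import Dict, Optional, Tuple

# total cores -> (threads for Redis, threads for Redis on Flash), copied from the table.
PROXY_THREADS: Dict[int, Tuple[int, int]] = {
    1: (1, 1),
    4: (3, 3),
    8: (5, 3),
    12: (8, 4),
    16: (10, 5),
    32: (24, 10),
    64: (32, 20),
    96: (32, 20),
    128: (32, 32),
}


def recommended_proxy_threads(total_cores: int, flash: bool = False) -> Optional[int]:
    """Look up the table value for an exact core count; None if the count is not listed."""
    row = PROXY_THREADS.get(total_cores)
    if row is None:
        return None
    return row[1] if flash else row[0]


print(recommended_proxy_threads(16), recommended_proxy_threads(16, flash=True))
```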
</div>
</div>
</div>
<div class="sect1">
<h2 id="data-access-anti-patterns"><a class="anchor" href="#data-access-anti-patterns"></a>5. Data access anti-patterns</h2>
<div class="sectionbody">
<div class="paragraph">
<p>There are three data access patterns that can limit the performance of your Redis database:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Slow operations</p>
</li>
<li>
<p>Hot keys</p>
</li>
<li>
<p>Large keys</p>
</li>
</ul>
</div>
<div class="paragraph">