Added: Optimized ToLowerInvariant Conversions for .NET 7 and LowerInvariantStringHash. Improved StringHash #3

Merged
merged 10 commits into from
Nov 16, 2023
1 change: 1 addition & 0 deletions .github/workflows/build-and-publish.yml
Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@ jobs:
- ubuntu-latest
- macos-13
targetFramework:
- net8.0
- net7.0
- net6.0
- net5.0
86 changes: 85 additions & 1 deletion docs/Extensions/StringExtensions.md
@@ -36,17 +36,64 @@ Counts the number of occurrences of a given character in a target `string` insta

!!! warning "SIMD method currently restricted to .NET 7+. PRs for backports are welcome."

!!! warning "Will produce different hashes depending on runtime or CPU."

!!! info "Optimised for File Paths specifically"

```csharp
public static nuint GetHashCodeFast(this string text)
public static unsafe nuint GetHashCodeFast(this ReadOnlySpan<char> text)
```

Faster hash code for strings. Unlike the runtime's default string hash, it is not randomized between application runs.

Use this method if and only if 'Denial of Service' attacks are not a concern
(i.e. never used for free-form user input), or are otherwise mitigated.

This method only guarantees a consistent hash within the current process. The result may differ across machines,
runtimes, or library versions, because the implementation prioritises speed above all else.

### GetHashCodeLowerFast

!!! warning "SIMD method currently restricted to .NET 7+. PRs for backports are welcome."

!!! warning "Will produce different hashes depending on runtime or CPU."

!!! info "Optimised for File Paths specifically"

```csharp
public static nuint GetHashCodeLowerFast(this string text)
public static unsafe nuint GetHashCodeLowerFast(this ReadOnlySpan<char> text)
```

Faster hash code for strings, computed as if the input were lower case (invariant casing). It is not randomized between application runs.

Use this method if and only if 'Denial of Service' attacks are not a concern
(i.e. never used for free-form user input), or are otherwise mitigated.

This method only guarantees a consistent hash within the current process. The result may differ across machines,
runtimes, or library versions, because the implementation prioritises speed above all else.
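
The property that matters for callers is that differently cased spellings of the same path produce the same hash. The sketch below illustrates that property with a plain FNV-1a loop. `LowerHashSketch` and `HashLowerInvariant` are hypothetical names, and the algorithm is a stand-in, not the one this library uses.

```csharp
using System;

public static class LowerHashSketch
{
    // Stand-in only: FNV-1a over UTF-16 code units, lower-casing each char first.
    // This is NOT the algorithm behind GetHashCodeLowerFast; it only shows the idea.
    public static uint HashLowerInvariant(ReadOnlySpan<char> text)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        uint hash = offsetBasis;
        unchecked
        {
            foreach (char c in text)
            {
                hash ^= char.ToLowerInvariant(c); // fold case before mixing
                hash *= prime;                    // wraps on overflow by design
            }
        }

        return hash;
    }
}
```

With this sketch, `LowerHashSketch.HashLowerInvariant("Mods/FILE.bin")` and `LowerHashSketch.HashLowerInvariant("mods/file.BIN")` return the same value, which is what makes a lower-case hash usable as a key for case-insensitive path lookups.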

### ToLowerInvariantFast

```csharp
public static string ToLowerInvariantFast(this string text)
public static unsafe void ToLowerInvariantFast(this ReadOnlySpan<char> text, Span<char> target)
```

Converts the given string to lower case (invariant casing) using the fastest possible implementation.
This method is optimized for performance but currently has limitations for short non-ASCII inputs.
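
One common way to build a conversion like this is an ASCII fast path that hands over to the runtime as soon as a non-ASCII character appears. The sketch below is written under that assumption and is not taken from the library. `AsciiLowerSketch` and `ToLowerAsciiOrFallback` are hypothetical names, and the scalar loop stands in for whatever vectorised code the real method uses.

```csharp
using System;

public static class AsciiLowerSketch
{
    // Hypothetical helper: lower-cases ASCII directly, defers to the runtime otherwise.
    public static void ToLowerAsciiOrFallback(ReadOnlySpan<char> source, Span<char> target)
    {
        if (target.Length < source.Length)
            throw new ArgumentException("Target is too small.", nameof(target));

        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (c > 0x7F)
            {
                // Non-ASCII input: abandon the fast path and redo the whole
                // conversion with the runtime's invariant-culture implementation.
                source.ToLowerInvariant(target);
                return;
            }

            // For 'A'..'Z', setting bit 0x20 yields the lower-case letter.
            target[i] = (c >= 'A' && c <= 'Z') ? (char)(c | 0x20) : c;
        }
    }
}
```

In a design like this, a non-ASCII character pays for the abandoned fast-path work plus the full runtime conversion, which is one plausible reason for short non-ASCII inputs to be slower than calling the runtime directly.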

### ToUpperInvariantFast

```csharp
public static string ToUpperInvariantFast(this string text)
public static unsafe void ToUpperInvariantFast(this ReadOnlySpan<char> text, Span<char> target)
```

Converts the given string to upper case (invariant casing) using the fastest possible implementation.
This method is optimized for performance but currently has limitations for short non-ASCII inputs.

## Usage

### Get Reference to First Element in String
@@ -77,4 +124,41 @@ int count = text.Count(targetChar);
```csharp
string text = "Hello, world!";
nuint fastHashCode = text.GetHashCodeFast();
```

### Get Lower Case Hash Code

```csharp
string text = "Hello, World!";
nuint lowerCaseHashCode = text.GetHashCodeLowerFast();
```

### Convert String to Lower Case Invariant Fast

```csharp
string text = "Hello, WORLD!";
string lowerInvariant = text.ToLowerInvariantFast(); // hello, world!
```

### Convert String to Upper Case Invariant Fast

```csharp
string text = "hello, world!";
string upperInvariant = text.ToUpperInvariantFast(); // HELLO, WORLD!
```

### Convert ReadOnlySpan to Lower Case Invariant Fast

```csharp
string text = "Hello, WORLD!";
Span<char> target = stackalloc char[text.Length]; // Careful with string length!
text.AsSpan().ToLowerInvariantFast(target); // hello, world! (on stack)
```
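
The "careful with string length" note matters because `stackalloc` with an unbounded length can overflow the stack. A minimal sketch of the usual guard follows: use the stack for short inputs and rent from `ArrayPool<char>` for longer ones. `BoundedStackSketch`, `ToLowerWithBoundedStack`, and the 256-character threshold are assumptions made for illustration, and the runtime's `ToLowerInvariant` stands in for `ToLowerInvariantFast` so the block compiles without the library.

```csharp
using System;
using System.Buffers;

public static class BoundedStackSketch
{
    // Hypothetical threshold; pick a value that suits your own stack budget.
    private const int MaxStackChars = 256;

    public static string ToLowerWithBoundedStack(string text)
    {
        char[]? rented = null;

        // Short inputs use the stack; longer ones borrow a pooled array.
        Span<char> target = text.Length <= MaxStackChars
            ? stackalloc char[MaxStackChars]
            : (rented = ArrayPool<char>.Shared.Rent(text.Length));

        try
        {
            // Both buffers may be larger than needed, so trim to the exact length.
            target = target.Slice(0, text.Length);

            // Stand-in call; with the library referenced this would be
            // text.AsSpan().ToLowerInvariantFast(target).
            text.AsSpan().ToLowerInvariant(target);
            return target.ToString();
        }
        finally
        {
            if (rented != null)
                ArrayPool<char>.Shared.Return(rented);
        }
    }
}
```

The same guard applies unchanged to the upper-case variant.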

### Convert ReadOnlySpan to Upper Case Invariant Fast

```csharp
string text = "hello, world!";
Span<char> target = stackalloc char[text.Length]; // Careful with string length!
text.AsSpan().ToUpperInvariantFast(target); // HELLO, WORLD! (on stack)
```
109 changes: 109 additions & 0 deletions src/Reloaded.Memory.Benchmarks/Benchmarks/StringChangeCaseAscii.cs
@@ -0,0 +1,109 @@
using System.Diagnostics.CodeAnalysis;
using BenchmarkDotNet.Attributes;
using Reloaded.Memory.Benchmarks.Framework;
using Reloaded.Memory.Benchmarks.Utilities;
using Reloaded.Memory.Extensions;

namespace Reloaded.Memory.Benchmarks.Benchmarks;

[MinColumn]
[MaxColumn]
[MedianColumn]
[DisassemblyDiagnoser(printInstructionAddresses: true)]
[BenchmarkInfo("String Change Case (ASCII Only)", "Measures the performance of changing the case of an ASCII string using invariant rules.", Categories.Performance)]
[SuppressMessage("ReSharper", "RedundantAssignment")]
public class StringChangeCaseAsciiBenchmark
{
    private static readonly Random _random = new();
    private const int ItemCount = 10000;

    [Params(4, 12, 32, 64)] public int CharacterCount { get; set; }

    public string[] Input { get; set; } = null!;

    [GlobalSetup]
    public void Setup()
    {
        Input = new string[ItemCount];

        for (var x = 0; x < ItemCount; x++)
            Input[x] = StringGenerators.RandomStringAsciiMixedCase(CharacterCount);
    }

    [Benchmark]
    public nuint ToLowerInvariantFast_Custom()
    {
        nuint result = 0;
        var maxLen = Input.Length / 4;
        Span<char> outBuf = stackalloc char[CharacterCount];

        // unroll
        for (var x = 0; x < maxLen; x += 4)
        {
            Input.DangerousGetReferenceAt(x).AsSpan().ToLowerInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 1).AsSpan().ToLowerInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 2).AsSpan().ToLowerInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 3).AsSpan().ToLowerInvariantFast(outBuf);
        }

        return result;
    }

    [Benchmark]
    public nuint ToLowerInvariant_Runtime()
    {
        nuint result = 0;
        var maxLen = Input.Length / 4;
        Span<char> outBuf = stackalloc char[CharacterCount];

        // unroll
        for (var x = 0; x < maxLen; x += 4)
        {
            Input.DangerousGetReferenceAt(x).AsSpan().ToLowerInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 1).AsSpan().ToLowerInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 2).AsSpan().ToLowerInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 3).AsSpan().ToLowerInvariant(outBuf);
        }

        return result;
    }

    [Benchmark]
    public nuint ToUpperInvariantFast_Custom()
    {
        nuint result = 0;
        var maxLen = Input.Length / 4;
        Span<char> outBuf = stackalloc char[CharacterCount];

        // unroll
        for (var x = 0; x < maxLen; x += 4)
        {
            Input.DangerousGetReferenceAt(x).AsSpan().ToUpperInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 1).AsSpan().ToUpperInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 2).AsSpan().ToUpperInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 3).AsSpan().ToUpperInvariantFast(outBuf);
        }

        return result;
    }

    [Benchmark]
    public nuint ToUpperInvariant_Runtime()
    {
        nuint result = 0;
        var maxLen = Input.Length / 4;
        Span<char> outBuf = stackalloc char[CharacterCount];

        // unroll
        for (var x = 0; x < maxLen; x += 4)
        {
            Input.DangerousGetReferenceAt(x).AsSpan().ToUpperInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 1).AsSpan().ToUpperInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 2).AsSpan().ToUpperInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 3).AsSpan().ToUpperInvariant(outBuf);
        }

        return result;
    }
}

@@ -0,0 +1,71 @@
using System.Diagnostics.CodeAnalysis;
using BenchmarkDotNet.Attributes;
using Reloaded.Memory.Benchmarks.Framework;
using Reloaded.Memory.Benchmarks.Utilities;
using Reloaded.Memory.Extensions;

namespace Reloaded.Memory.Benchmarks.Benchmarks;

[MinColumn]
[MaxColumn]
[MedianColumn]
[DisassemblyDiagnoser(printInstructionAddresses: true)]
[BenchmarkInfo("String Change Case (Unicode Only)", "Measures the overhead of a failed accelerated change case.", Categories.Performance)]
[SuppressMessage("ReSharper", "RedundantAssignment")]
public class StringChangeCaseUnicodeOnlyBenchmark
{
    private static readonly Random _random = new();
    private const int ItemCount = 10000;

    [Params(4, 12, 32, 64)] public int CharacterCount { get; set; }

    public string[] Input { get; set; } = null!;

    [GlobalSetup]
    public void Setup()
    {
        Input = new string[ItemCount];

        for (var x = 0; x < ItemCount; x++)
            Input[x] = StringGenerators.RandomStringOfProblematicCharacters(CharacterCount);
    }

    [Benchmark]
    public nuint ToLowerInvariantFast_Custom()
    {
        nuint result = 0;
        var maxLen = Input.Length / 4;
        Span<char> outBuf = stackalloc char[CharacterCount];

        // unroll
        for (var x = 0; x < maxLen; x += 4)
        {
            Input.DangerousGetReferenceAt(x).AsSpan().ToLowerInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 1).AsSpan().ToLowerInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 2).AsSpan().ToLowerInvariantFast(outBuf);
            Input.DangerousGetReferenceAt(x + 3).AsSpan().ToLowerInvariantFast(outBuf);
        }

        return result;
    }

    [Benchmark]
    public nuint ToLowerInvariant_Runtime()
    {
        nuint result = 0;
        var maxLen = Input.Length / 4;
        Span<char> outBuf = stackalloc char[CharacterCount];

        // unroll
        for (var x = 0; x < maxLen; x += 4)
        {
            Input.DangerousGetReferenceAt(x).AsSpan().ToLowerInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 1).AsSpan().ToLowerInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 2).AsSpan().ToLowerInvariant(outBuf);
            Input.DangerousGetReferenceAt(x + 3).AsSpan().ToLowerInvariant(outBuf);
        }

        return result;
    }
}
