Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44361: [C#][Integration] Include .NET in Flight integration tests #44377

Merged
merged 8 commits into from
Oct 15, 2024

Conversation

adamreeve
Copy link
Contributor

@adamreeve adamreeve commented Oct 11, 2024

Rationale for this change

See #44361. This allows testing compatibility of the .NET Flight implementation with other Flight implementations.

What changes are included in this PR?

Are these changes tested?

These changes are tests.

Are there any user-facing changes?

No

@@ -64,7 +64,7 @@ protected virtual void Dispose(bool disposing)
{
if (!_disposed)
{
_flightDataStream.Dispose();
_flightDataStream?.Dispose();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't required for any of the included tests, but before disabling the primitive_no_batches test, it would crash here with a NullReferenceException due to the writer being disposed before the stream was created.

With this fix, writing doesn't crash, but the data isn't found when trying to retrieve it.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Oct 11, 2024
@CurtHagenlocher
Copy link
Contributor

This is awesome; I didn't even know this test gap existed.

By the way, I have a proposed fix for #38045 at https://github.com/CurtHagenlocher/arrow/tree/dev/curth/FlightDictionaries which I didn't feel great about committing due to lack of test coverage. It'll be interesting to see whether it works.

@adamreeve
Copy link
Contributor Author

There was a failure running the new tests in CI. Surprisingly it's a C++ and C# test, which was the only combination I tested locally... I'll try to figure out what's gone wrong here:

   FAILED TEST: binary_view C++ producing,  C# consuming
  <class 'RuntimeError'>: Command failed: /arrow/csharp/artifacts/Apache.Arrow.Flight.IntegrationTest/Debug/net8.0/Apache.Arrow.Flight.IntegrationTest client --port 37771 --path /tmp/arrow-integration-be07w72w/generated_binary_view.json
  With output:
  --------------
  Unhandled exception: Grpc.Core.RpcException: Status(StatusCode="NotFound", Detail="Could not find flight./tmp/arrow-integration-be07w72w/generated_binary_view.json")
     at Apache.Arrow.Flight.Client.FlightClient.<>c.<<GetInfo>b__10_0>d.MoveNext() in /arrow/csharp/src/Apache.Arrow.Flight/Client/FlightClient.cs:line 90
  --- End of stack trace from previous location ---
     at Apache.Arrow.Flight.IntegrationTest.JsonTestScenario.RunClient() in /arrow/csharp/test/Apache.Arrow.Flight.IntegrationTest/JsonTestScenario.cs:line 73
     at Apache.Arrow.Flight.IntegrationTest.FlightClientCommand.Execute() in /arrow/csharp/test/Apache.Arrow.Flight.IntegrationTest/FlightClientCommand.cs:line 49
     at Apache.Arrow.Flight.IntegrationTest.Program.<>c.<<Main>b__0_0>d.MoveNext() in /arrow/csharp/test/Apache.Arrow.Flight.IntegrationTest/Program.cs:line 50
  --- End of stack trace from previous location ---
     at System.CommandLine.Invocation.AnonymousCommandHandler.InvokeAsync(InvocationContext context)
     at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass17_0.<<UseParseErrorReporting>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass12_0.<<UseHelp>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<<UseVersionOption>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass19_0.<<UseTypoCorrections>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__18_0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseParseDirective>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__5_0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass8_0.<<UseExceptionHandler>b__0>d.MoveNext()

@adamreeve
Copy link
Contributor Author

By the way, I have a proposed fix for #38045 at https://github.com/CurtHagenlocher/arrow/tree/dev/curth/FlightDictionaries which I didn't feel great about committing due to lack of test coverage. It'll be interesting to see whether it works.

Cool, it would be nice to fix that!

@adamreeve
Copy link
Contributor Author

There was a race condition where the C# client could try to request the data before the server stored it here:

I could reliably reproduce it by adding a small sleep in the C++ server before that line.

Reading to the end of the response stream in the client seems to fix it.

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great!

I'll give some people more familiar with the infrastructure some time to comment before committing.

@CurtHagenlocher CurtHagenlocher merged commit 1dcd145 into apache:main Oct 15, 2024
11 checks passed
@CurtHagenlocher CurtHagenlocher removed the awaiting committer review Awaiting committer review label Oct 15, 2024
@adamreeve adamreeve deleted the csharp-flight-integration branch October 15, 2024 20:50
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 1dcd145.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@alamb
Copy link
Contributor

alamb commented Oct 20, 2024

Hi @adamreeve -- thank you for adding this test. This test unfortunately fails in arrow-rs -- see apache/arrow-rs#6577

Do you have any hints about how we can potentially fix this?

Here is how the job is run: https://github.com/apache/arrow-rs/blob/1666a4d7b237370c167e2b90b45d1354e28f500a/.github/workflows/integration.yml#L50-L114

Here is a specific message from the logs https://github.com/apache/arrow-rs/actions/runs/11409436304/job/31749661931

 ======================================================================
  Testing file null
  Traceback (most recent call last):
    File "/__w/arrow-rs/arrow-rs/dev/archery/archery/integration/util.py", line 136, in run_cmd
      output = subprocess.check_output(cmd, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/envs/arrow/lib/python3.11/subprocess.py", line 466, in check_output
      return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/envs/arrow/lib/python3.11/subprocess.py", line 571, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/__w/arrow-rs/arrow-rs/csharp/artifacts/Apache.Arrow.Flight.IntegrationTest/Debug/net8.0/Apache.Arrow.Flight.IntegrationTest', 'client', '--port', '46599', '--path', '/tmp/arrow-integration-2gthhuy4/generated_null.json']' returned non-zero exit status 1.
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/__w/arrow-rs/arrow-rs/dev/archery/archery/integration/runner.py", line 435, in _run_flight_test_case
      consumer.flight_request(port, **client_args)
    File "/__w/arrow-rs/arrow-rs/dev/archery/archery/integration/tester_csharp.py", line 218, in flight_request
      run_cmd(cmd)
    File "/__w/arrow-rs/arrow-rs/dev/archery/archery/integration/util.py", line 145, in run_cmd
      raise RuntimeError(sio.getvalue())
  RuntimeError: Command failed: /__w/arrow-rs/arrow-rs/csharp/artifacts/Apache.Arrow.Flight.IntegrationTest/Debug/net8.0/Apache.Arrow.Flight.IntegrationTest client --port 46599 --path /tmp/arrow-integration-2gthhuy4/generated_null.json
  With output:
  --------------
  Unhandled exception: Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.ArgumentException: IPv4 address 0.0.0.0 and IPv6 address ::0 are unspecified addresses that cannot be used as a target address. (Parameter 'hostName')")
   ---> System.ArgumentException: IPv4 address 0.0.0.0 and IPv6 address ::0 are unspecified addresses that cannot be used as a target address. (Parameter 'hostName')
     at System.Net.Dns.GetHostEntryOrAddressesCoreAsync(String hostName, Boolean justReturnParsedIp, Boolean throwOnIIPAny, Boolean justAddresses, AddressFamily family, CancellationToken cancellationToken)
     at System.Net.Dns.GetHostAddressesAsync(String hostNameOrAddress, AddressFamily family, CancellationToken cancellationToken)
     at System.Net.Sockets.SocketAsyncEventArgs.DnsConnectAsync(DnsEndPoint endPoint, SocketType socketType, ProtocolType protocolType)
     at System.Net.Sockets.Socket.ConnectAsync(SocketAsyncEventArgs e, Boolean userSocket, Boolean saeaCancelable)
     at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ConnectAsync(Socket socket)
     at System.Net.Sockets.Socket.ConnectAsync(EndPoint remoteEP, CancellationToken cancellationToken)
     at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.OnConnect(Socket socket, DnsEndPoint endpoint, CancellationToken cancellationToken)
     at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt)
     --- End of inner exception stack trace ---
     at Apache.Arrow.Flight.Internal.RecordBatchReaderImplementation.ReadSchemaAsync(CancellationToken cancellationToken) in /__w/arrow-rs/arrow-rs/csharp/src/Apache.Arrow.Flight/Internal/RecordBatchReaderImplementation.cs:line 74
     at Apache.Arrow.Flight.Internal.RecordBatchReaderImplementation.ReadNextRecordBatchAsync(CancellationToken cancellationToken) in /__w/arrow-rs/arrow-rs/csharp/src/Apache.Arrow.Flight/Internal/RecordBatchReaderImplementation.cs:line 118
     at Apache.Arrow.Flight.FlightRecordBatchStreamReader.MoveNext(CancellationToken cancellationToken) in /__w/arrow-rs/arrow-rs/csharp/src/Apache.Arrow.Flight/FlightRecordBatchStreamReader.cs:line 64
     at Apache.Arrow.Flight.IntegrationTest.Scenarios.JsonTestScenario.ConsumeFlightLocation(FlightClient client, FlightTicket ticket, RecordBatch[] batches) in /__w/arrow-rs/arrow-rs/csharp/test/Apache.Arrow.Flight.IntegrationTest/Scenarios/JsonTestScenario.cs:line 160
     at Apache.Arrow.Flight.IntegrationTest.Scenarios.JsonTestScenario.RunClient(Int32 serverPort) in /__w/arrow-rs/arrow-rs/csharp/test/Apache.Arrow.Flight.IntegrationTest/Scenarios/JsonTestScenario.cs:line 109
     at Apache.Arrow.Flight.IntegrationTest.FlightClientCommand.Execute() in /__w/arrow-rs/arrow-rs/csharp/test/Apache.Arrow.Flight.IntegrationTest/FlightClientCommand.cs:line 45
     at Apache.Arrow.Flight.IntegrationTest.Program.<>c.<<Main>b__0_0>d.MoveNext() in /__w/arrow-rs/arrow-rs/csharp/test/Apache.Arrow.Flight.IntegrationTest/Program.cs:line 50
  --- End of stack trace from previous location ---
     at System.CommandLine.Invocation.AnonymousCommandHandler.InvokeAsync(InvocationContext context)
     at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass17_0.<<UseParseErrorReporting>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass12_0.<<UseHelp>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<<UseVersionOption>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass19_0.<<UseTypoCorrections>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__18_0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseParseDirective>b__0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__5_0>d.MoveNext()
  --- End of stack trace from previous location ---
     at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass8_0.<<UseExceptionHandler>b__0>d.MoveNext()
  
  --------------

@adamreeve
Copy link
Contributor Author

Hi @alamb, sorry about this, it looks like in apache/arrow the Flight integration tests are only run with Rust as a client/consumer and not as a server/producer, so this wasn't caught in the CI here.

I can reproduce this locally and can see that when the C# client makes a get_info request to the Rust server, the Rust server sends a location like "grpc+tcp://0.0.0.0:38119". In comparison, the C++ server for example returns locations that use the ipv4 loopback address, eg. "grpc+tcp://127.0.0.1:43861".

This looks like a bug in the Rust integration test server to me, as 0.0.0.0 is valid to use as an IP address when binding to a port as a server, but as the .NET exception says, this is an "unspecified address that cannot be used as a target address". The Rust server should specify a public address that clients can connect to, or return an empty list of locations to specify that the current connection can be reused (from the Flight RPC docs).

I'm not sure why other clients don't have a problem with this though. It seems like the C++ client is quite happy to connect to a "0.0.0.0" address for example, which is a little surprising to me, so maybe I'm misunderstanding this.

@kou
Copy link
Member

kou commented Oct 21, 2024

it looks like in apache/arrow the Flight integration tests are only run with Rust as a client/consumer and not as a server/producer, so this wasn't caught in the CI here.

Oh, sorry. It's a bug of #44099. Could you open an issue for this?

@adamreeve
Copy link
Contributor Author

This change in arrow-rs fixes the tests for me:

--- a/arrow-integration-testing/src/flight_server_scenarios/integration_test.rs
+++ b/arrow-integration-testing/src/flight_server_scenarios/integration_test.rs
@@ -48,9 +48,10 @@ type Result<T = (), E = Error> = std::result::Result<T, E>;
 /// Run a scenario that tests integration testing.
 pub async fn scenario_setup(port: u16) -> Result {
     let addr = super::listen_on(port).await?;
+    let resolved_port = addr.port();
 
     let service = FlightServiceImpl {
-        server_location: format!("grpc+tcp://{addr}"),
+        server_location: format!("grpc+tcp://localhost:{resolved_port}"),
         ..Default::default()
     };
     let svc = FlightServiceServer::new(service);

But I'm not 100% sure this isn't something the C# client should handle, given the other clients seem OK with it.

@adamreeve
Copy link
Contributor Author

Oh, sorry. It's a bug of #44099. Could you open an issue for this?

Sure, I've raised #44479

@alamb
Copy link
Contributor

alamb commented Oct 21, 2024

This change in arrow-rs fixes the tests for me:

Thank you 🙏 @adamreeve

I made a PR to try out your proposed workaround in arrow-rs: apache/arrow-rs#6611

@CurtHagenlocher
Copy link
Contributor

I'm relatively unfamiliar with Flight in general (to say nothing of any given implementation) but I have to wonder whether this reflects a product bug in the C# code. If I connect to a server and ask it for information and it tells me "find that information at "grpc+tcp://0.0.0.0:38119", then my assumption would be that I'm meant to connect to an endpoint on the same server that I originally connected to.

@adamreeve
Copy link
Contributor Author

The Flight docs specify other ways to reuse the same server location though, either using an empty list of locations or the arrow-flight-reuse-connection: URI scheme. I guess you might want to reuse the same host with a different port, but this isn't how 0.0.0.0 is interpreted by other implementations, at least the Rust and C++ ones I've looked at. They don't have any special handling of 0.0.0.0 that I could find so it appears that that IP address is just interpreted as the localhost/loopback device by the underlying GRPC library or lower level networking library.

From looking into this a bit more, it seems like using 0.0.0.0 to refer to localhost/loopback when used as a target IP is non-standard but not that uncommon, and the C# System.Net library is just more strict about following specifications. Eg. see discussions at https://serverfault.com/questions/706137/why-does-0-0-0-0-resolve-to-the-loopback-device and https://superuser.com/questions/949428/whats-the-difference-between-127-0-0-1-and-0-0-0-0.

Given that this would be fixed in either the Rust or C# integration test code without changing the Flight libraries, I guess it's not too consequential which approach is used, and I'm happy to update the C# integration test client to handle this. But the integration test code may be used by users as an example of what to do, so it seems like changing the Rust server might be best to set a good example of what to do to provide a more compatible server.

@alamb
Copy link
Contributor

alamb commented Oct 23, 2024

so it seems like changing the Rust server might be best to set a good example of what to do to provide a more compatible server.

This PR proposes to do that: apache/arrow-rs#6611

However, @tustvold is still helping me get the incantation just right (see apache/arrow-rs#6611 (comment))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants