Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: codec-sv2 crate #1129

Open
rrybarczyk opened this issue Aug 21, 2024 · 23 comments
Open

Refactor: codec-sv2 crate #1129

rrybarczyk opened this issue Aug 21, 2024 · 23 comments
Assignees
Labels
codec-sv2 protocols Lowest level protocol logic refactor Implies refactoring code

Comments

@rrybarczyk
Copy link
Collaborator

rrybarczyk commented Aug 21, 2024

Background

This task is an outcome of the protocols Rust docs issues tracked in #845.

While documenting protocols::v2::codec-sv2 in #1012, areas of potential code debt were identified. This issue servers as a place to list out these items to then be addressed in an organized manner. The initial Rust documentation effort was an immediate action, while a refactoring (which implies breaking API changes) is not so urgent priority, so for now we should leave this in the backlog for an appropriate moment in the future.

Note: This issue will replace the #873 discussion.

Identified Potential Code Debt

Error Variant Naming

Suggestions for codec_sv2::error::Error and ::CError refactor:

  • Instead of importing at the top of the file:
    • Use noise_sv2 crate directly in the Error variants like AeadError(noise_sv2::AeadError) and NoiseSv2Error(noise_sv2::Error).
    • Use framing_sv2 crate directly in the Error variant like FramingSv2Error(framing_sv2::Error).
  • There are two definitions of the framing_sv2::Error variant: FramingError and FramingSv2Error, remove one of these variant and handle any downstream effects.
  • Rename (applies to both the codec_sv2::Error and codec_sv2::CError enums):
  • AeadError -> Either keep as is or do:NoiseSv2Aead or NoiseAead
  • BinarySv2Error -> BinarySv2or Binary
  • FramingSv2Error/FramingError -> FramingSv2 or Framing
  • NoiseSv2Error -> Noise
  • Alphabetize the error variants, the match arms of the implementations for the Error enums, and the order of the impl From<external-error> for Error

Uniform Spelling

  • We are using both HandShake and Handshake. Everything should be Handshake with a lowercase "s" (this is how it is defined in The Noise Protocol Framework) . This impacts:
    • codec_sv2:::
      • lib::State::HandShake -> lib::State::Handshake
      • error::Error::NotInHandShake -> error::Error::NotInHandshake
      • error::CError::NotInHandShake -> error::CError::NotInHandshake
    • framing_sv2::framing2:: (there may be more in framing_sv2, but these are what I have noticed so far):
      • HandShakeFrame -> HandshakeFrame
      • EitherFrame::HandShake -> EitherFrame::Handshake
  • We have decoder::WithNoise::decode_noise_frame and then encoder::NoiseEncoder::encode. Would it make sense to make these methods uniform like decoder::WithNoise::decode and encoder::NoiseEncoder::encode?
  • We have decoder::WithNoise and decoder::WithoutNoise. Then we have encoder::NoiseEncoder and encoder::Encoder. Should we make it decoder::WithNoise, decoder::WithoutNoise, encoder::WithNoise, and encoder::WithoutNoise? Or decoder::NoiseDecoder, decoder::Decoder, encoder::NoiseEncoder and encoder::Encoder?

Imports

  • Small thing, but in codec_sv2::lib, I am not a big fan of importing framing_sv2::framing2::handshake_message_to_frame as h2f. I think we should import framing_sv2::framing2 then use framing2::handshake_message_to_frame directly. The upside is it is clearer to the reader, the downside is it is more verbose.
  • use core::cmp is imported in codec_sv2::error with the #[cfg(test)] declarator, but when running cargo test I get the following warning saying it is not used. Should it be removed?
warning: unused import: `core::cmp`
 --> v2/codec-sv2/src/error.rs:8:5
  |
8 | use core::cmp;
  |     ^^^^^^^^^
  • In codec_sv2::decoder, some imports can be combined into one line. For example, use binary_sv2::{GetSize, Serialize}.
  • In codec_sv2::encoder and ::decoder, if the with_buffer_pool feature flag is disabled, we default Buffer to be buffer_sv2::BufferFromSystemMemory which is imposed in the import statement as use buffer_sv2::{Buffer as IsBuffer, BufferFromSystemMemory as Buffer};. To make this more readable, we should explicitly create the type in the same way that we do when the with_buffer_pool feature flag is enabled:
 #[cfg(not(feature = "with_buffer_pool"))]
  type Buffer = BufferFromSystemMemory;

 #[cfg(feature = "with_buffer_pool")]
  type Buffer = BufferPool<BufferFromSystemMemory>;

Exports

Use of Result types

The following methods use core::result::Result but should use codec_sv2::Result (unless there is a reason I am not seeing that requires using core::result::Result):

  • State::step_0
  • State::step_1
  • State::step_2

Uniform Struct Fields

  • decoder::WithNoise and encoder::WithoutNoise have some shared fields, these fields should appear first and be listed in the same order in each struct.

Misc

  • Should hard coded default buffer sizes be defined and constants?
  • error module imports use core::cmp under the #[cfg(test)] but it is unused.
  • In the decoder.rs module the types StandardEitherFrame and StandardSv2Frame are defined, however these frames can represent an encoded OR decoded Sv2 frame. I think we should move them out of the decoder module into a more appropriate place.
  • In the encoder.rs module, we have type Slice = buffer_sv2::Slice; with the #[cfg(feature = "with_buffer_pool")] decorator. Why do we need this type if it is simply assigned to buffer_sv2::Slice? Could we just directly use buffer_sv2::Slice?
  • In encoder.rs, we have the type Slice = Vec<u8> commented out. Can this be completely removed?
  • new methods should always come first (exception is if default is present, which it is not for this crate) in an impl. See impl<T: Serialize + GetSize> Encoder<T> in encoder.
  • Consider creating a struct type to hold the message, message type, extension type, and channel message required to create a new StandardSv2Frame/StandardEitherFrame. Right now all being passed into the from_message method. Could be fine, but if there are other instances where we see the need to use all these variables together, its own struct may be useful.
  • State logic exists in lib.rs. It was not immediately apparent to me to look there when trying to locate Starte related code. Consider moving this logic to its own state.rs or handshake_state.rs module.
  • State has two different impl blocks that can be combined into a single impl block.
@rrybarczyk rrybarczyk added refactor Implies refactoring code protocols Lowest level protocol logic codec-sv2 labels Aug 21, 2024
@rrybarczyk rrybarczyk self-assigned this Aug 21, 2024
@plebhash
Copy link
Collaborator

we should seriously consider merging codec_sv2 + framing_sv2 crates

#873 (comment)


perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

@Fi3
Copy link
Collaborator

Fi3 commented Sep 16, 2024

we should seriously consider merging codec_sv2 + framing_sv2 crates

#873 (comment)

perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

@rrybarczyk
Copy link
Collaborator Author

we should seriously consider merging codec_sv2 + framing_sv2 crates
#873 (comment)
perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

These crates are closely linked. To my understanding, you never use one with out the other. For this reason, we should combine them. Currently with them separate, you constantly have to cross reference between these two crates.

Unless there is a reason to not combine these two crates, we should plan on this refactor. @Fi3, is there any reason not to combine that crates that you know of?

@Fi3
Copy link
Collaborator

Fi3 commented Sep 16, 2024

we should seriously consider merging codec_sv2 + framing_sv2 crates
#873 (comment)
perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

These crates are closely linked. To my understanding, you never use one with out the other. For this reason, we should combine them. Currently with them separate, you constantly have to cross reference between these two crates.

Unless there is a reason to not combine these two crates, we should plan on this refactor. @Fi3, is there any reason not to combine that crates that you know of?

I just don't see an advantage that justify the work of putting them together. What is this advantage? Why we should spend time doing that?

@rrybarczyk
Copy link
Collaborator Author

we should seriously consider merging codec_sv2 + framing_sv2 crates
#873 (comment)
perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

These crates are closely linked. To my understanding, you never use one with out the other. For this reason, we should combine them. Currently with them separate, you constantly have to cross reference between these two crates.
Unless there is a reason to not combine these two crates, we should plan on this refactor. @Fi3, is there any reason not to combine that crates that you know of?

I just don't see an advantage that justify the work of putting them together. What is this advantage? Why we should spend time doing that?

Now is the time where we are really scrutinizing these crates to solidify them moving forward. Can you help me understand by giving me an example of when one of these crates would be used without the other?

@plebhash
Copy link
Collaborator

plebhash commented Sep 16, 2024

We need to make sure the low-level APIs are lean and easy to use. Right now, they are extremely convoluted. There's no way to understand how to use one single crate without looking for references in multiple other crates.

Maybe this is a blindspot for you @Fi3 , because the APIs are fresh in your head and you can navigate them easily (since you wrote most of them).

But for us, even while just documenting these crates, it has been extremely challenging to understand how everything is meant to be used together.

Our mission here is to make sure the codebase can be easily used by the entire ecosystem, not just one single pool.

@Fi3
Copy link
Collaborator

Fi3 commented Sep 16, 2024

IMHO is lot easier to make a crate that export everything and is well documented rather then refactor all the crates

@Fi3
Copy link
Collaborator

Fi3 commented Sep 16, 2024

@Fi3
Copy link
Collaborator

Fi3 commented Sep 16, 2024

also low level crates are not supposed to be used but we should use an higher level library designed to be easy to use ecc ecc. The low level crates should be used only in special occasion where you really need them, for example for ffi

@rrybarczyk
Copy link
Collaborator Author

Ok, I will take a look at that extension and dig deeper into the code. Thank you.

@Fi3
Copy link
Collaborator

Fi3 commented Sep 16, 2024

ext is just an example main point is

IMHO is lot easier to make a crate that export everything and is well documented rather then refactor all the crates

@jbesraa
Copy link
Contributor

jbesraa commented Sep 17, 2024

I dont have a good input about merging/not merging. I will only share that as a new dev in the team, whenever I saw a crate under protocols folder, I immediately thought that there is a protocol specification for the crate, living in sv2-specs. Now I know this is not the case, but I do think it is miss-leading.

@Fi3
Copy link
Collaborator

Fi3 commented Sep 17, 2024

having several low level crates have the main advantage of keeping the lib as small as possible and easier to test and to review. framing-sv2 is very small and I agree that could be merged, I would put it in binary-sv2 not in codec. I also think that is more than fine like that, and that we shouldn't waste our time and energy in merging it.

@plebhash
Copy link
Collaborator

I can definitely entertain @Fi3's argument about effort. Whenever we start planning the scope how we will mitigate technical debt, I fully agree that it is very important for us to evaluate the trade-offs of all proposed changes.

And for sure, there will absolutely be cases where the benefits will not justify the efforts.

I'm not saying this will be the case here. I don't have a clear perspective around that yet, especially since we still have not fully studied the entire scope of protocols.

To be honest, this discussion is still somewhat premature (which is my fault, since I brought up this topic). After we cover 100% protocols crates, we will be in a much better position to have these debates.

What I said about potentially merging framing_sv2 + codec_sv2 was just a random thought that occurred to me while I reviewed #1040.

It's actually quite interesting that @Fi3 said that it would probably be better to merge framing_sv2 into binary_sv2 instead of codec_sv2. I will tag @Shourya742 here so he is aware, but with a careful warning that his mindset while reading this should be to simply to entertain this possibility while he documents and reviews binary_sv2/no-serde.

For now, at least in my mind, all options are on the table. The main point is that it is very important that we do our best to learn protocols crates and really become experts at them, just like @Fi3 is. And maybe at the end of that journey, we could even come to the conclusion that everything is fine the way it is.

Or maybe not. Time will tell. 🧘

@plebhash
Copy link
Collaborator

plebhash commented Sep 17, 2024

on the top issue description @rrybarczyk suggested:

Uniform Spelling

  • ...
  • We have decoder::WithNoise and decoder::WithoutNoise. Then we have encoder::NoiseEncoder and encoder::Encoder. Should we make it decoder::WithNoise, decoder::WithoutNoise, encoder::WithNoise, and encoder::WithoutNoise? Or decoder::NoiseDecoder, decoder::Decoder, encoder::NoiseEncoder and encoder::Encoder?

I would suggest the following naming:

  • encoder.rs:
#[cfg(feature = "noise_sv2")]
pub type Encoder<T> = EncryptedEncoder<T>;

#[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear
pub type Encoder<T> = PlainEncoder<T>;

pub struct EncryptedEncoder<T> {
    noise_buffer: Buffer,
    sv2_buffer: Buffer,
    frame: PhantomData<T>,
}

pub struct PlainEncoder<T> {
    buffer: Vec<u8>,
    frame: PhantomData<T>,
}
  • decoder.rs:
#[cfg(feature = "noise_sv2")]
pub type Decoder<B, T> = EncryptedDecoder<B, T>;

#[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear
pub type Decoder<B, T> = PlainDecoder<B, T>;

pub struct EncryptedDecoder<B, T> {
    frame: PhantomData<T>,
    missing_noise_b: usize,
    noise_buffer: B,
    sv2_buffer: B,
}

struct PlainDecoder<B, T> {
    frame: PhantomData<T>,
    missing_b: usize,
    buffer: B,
}

It's not necessarily the final code organization, just some pseudo-rust to convey the idea around the keywords:

  • Encrypted vs Plain (which depends on noise_sv2 feature flag)
  • Encoder vs Decoder

@rrybarczyk
Copy link
Collaborator Author

on the top issue description @rrybarczyk suggested:

Uniform Spelling

  • ...
  • We have decoder::WithNoise and decoder::WithoutNoise. Then we have encoder::NoiseEncoder and encoder::Encoder. Should we make it decoder::WithNoise, decoder::WithoutNoise, encoder::WithNoise, and encoder::WithoutNoise? Or decoder::NoiseDecoder, decoder::Decoder, encoder::NoiseEncoder and encoder::Encoder?

I would suggest the following naming:

  • encoder.rs:
#[cfg(feature = "noise_sv2")]
pub type Encoder<T> = EncryptedEncoder<T>;

#[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear
pub type Encoder<T> = PlainEncoder<T>;

pub struct EncryptedEncoder<T> {
    noise_buffer: Buffer,
    sv2_buffer: Buffer,
    frame: PhantomData<T>,
}

pub struct PlainEncoder<T> {
    buffer: Vec<u8>,
    frame: PhantomData<T>,
}
  • decoder.rs:
#[cfg(feature = "noise_sv2")]
pub type Decoder<B, T> = EncryptedDecoder<B, T>;

#[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear
pub type Decoder<B, T> = PlainDecoder<B, T>;

pub struct EncryptedDecoder<B, T> {
    frame: PhantomData<T>,
    missing_noise_b: usize,
    noise_buffer: B,
    sv2_buffer: B,
}

struct PlainDecoder<B, T> {
    frame: PhantomData<T>,
    missing_b: usize,
    buffer: B,
}

It's not necessarily the final code organization, just some pseudo-rust to convey the idea around the keywords:

  • Encrypted vs Plain (which depends on noise_sv2 feature flag)
  • Encoder vs Decoder

I am not sold on Plain. We have been using the word "standard" to express unencrypted logic (correct me if I am wrong). So for the unencrypted, I prefer StandardEncoder/StandardDecoder, or just simply Encoder/Decoder.

For the encrypted, we I do like EncryptedEncoder/EncryptedDecoder.

@rrybarczyk
Copy link
Collaborator Author

  • Why do we have WithoutNoise::default and WithNoise::default if they just call Self::new()?
  • When should State::not_initialized be used?

@plebhash
Copy link
Collaborator

plebhash commented Oct 9, 2024

We have been using the word "standard" to express unencrypted logic (correct me if I am wrong).

That's not the case. The Standard* aliases are related to Buffer. There's even a StandardNoiseDecoder, which as the name implies, is encrypted. I asked some questions related to these aliases here #873 (comment)

tbh I'm not sold on these aliases... maybe we can do something better, but that's a different discussion (hopefully after buffer_sv2 has been properly documented).

Anyways, if Plain* is not good, we can do either Encoder / Decoder or UnencryptedEncoder / UnencryptedDecoder.

@rrybarczyk
Copy link
Collaborator Author

We have been using the word "standard" to express unencrypted logic (correct me if I am wrong).

That's not the case. The Standard* aliases are related to Buffer. There's even a StandardNoiseDecoder, which as the name implies, is encrypted. I asked some questions related to these aliases here #873 (comment)

tbh I'm not sold on these aliases... maybe we can do something better, but that's a different discussion (hopefully after buffer_sv2 has been properly documented).

Anyways, if Plain* is not good, we can do either Encoder / Decoder or UnencryptedEncoder / UnencryptedDecoder.

Ok, lets do that: Encoder/Decoder and EncryptedEncoder/EncryptedDecoder.

@jbesraa
Copy link
Contributor

jbesraa commented Oct 10, 2024

Wouldn't NoiseEncoder and NoiseDecoder be more consistent with the current naming in the projects?

@rrybarczyk
Copy link
Collaborator Author

Wouldn't NoiseEncoder and NoiseDecoder be more consistent with the current naming in the projects?

I do like that. Thoughts @plebhash?

@Shourya742
Copy link
Contributor

#1040 (comment)

@rrybarczyk
Copy link
Collaborator Author

rrybarczyk commented Oct 14, 2024

When running the unencrypted and encrypted examples, the following warnings occur:

cargo run --example unencrypted
warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:31:11
   |
31 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration
   = note: `#[warn(unexpected_cfgs)]` on by default

warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:43:11
   |
43 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration

warning: trait `Variable` is never used
  --> v2/binary-sv2/no-serde-sv2/codec/src/codec/mod.rs:37:11
   |
37 | pub trait Variable {
   |           ^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: trait `IntoOwned` is never used
 --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:7:7
  |
7 | trait IntoOwned {
  |       ^^^^^^^^^

warning: `binary_codec_sv2` (lib) generated 4 warnings
warning: `binary_codec_sv2` (lib) generated 4 warnings (4 duplicates)
   Compiling codec_sv2 v1.2.1 (/Users/rachelrybarczyk/Development/StratumV2/Dev/stratum/protocols/v2/codec-sv2)
    Finished `dev` profile [optimized + debuginfo] target(s) in 0.61s
cargo run --example encrypted --features noise_sv2
warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:31:11
   |
31 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration
   = note: `#[warn(unexpected_cfgs)]` on by default

warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:43:11
   |
43 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration

warning: trait `Variable` is never used
  --> v2/binary-sv2/no-serde-sv2/codec/src/codec/mod.rs:37:11
   |
37 | pub trait Variable {
   |           ^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: trait `IntoOwned` is never used
 --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:7:7
  |
7 | trait IntoOwned {
  |       ^^^^^^^^^

warning: `binary_codec_sv2` (lib) generated 4 warnings
warning: `binary_codec_sv2` (lib) generated 4 warnings (4 duplicates)
warning: methods `into_aesg` and `into_chacha` are never used
  --> v2/noise-sv2/src/cipher_state.rs:26:8
   |
7  | pub trait CipherState<Cipher_: AeadCipher>
   |           ----------- methods in this trait
...
26 |     fn into_aesg(mut self) -> Option<Cipher<Aes256Gcm>> {
   |        ^^^^^^^^^
...
33 |     fn into_chacha(mut self) -> Option<Cipher<ChaCha20Poly1305>> {
   |        ^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: associated items `name`, `hkdf_3`, and `ecdh` are never used
   --> v2/noise-sv2/src/handshake.rs:10:8
    |
9   | pub trait HandshakeOp<Cipher: AeadCipher>: CipherState<Cipher> {
    |           ----------- associated items in this trait
10  |     fn name(&self) -> String;
    |        ^^^^
...
69  |     fn hkdf_3(
    |        ^^^^^^
...
109 |     fn ecdh(private: &[u8], public: &[u8]) -> [u8; 32] {
    |        ^^^^

warning: `noise_sv2` (lib) generated 2 warnings
    Finished `dev` profile [optimized + debuginfo] target(s) in 0.03s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codec-sv2 protocols Lowest level protocol logic refactor Implies refactoring code
Projects
None yet
Development

No branches or pull requests

5 participants