IO Abstraction: Introducing PathId & FileSystem (Phase 1) #3219

hs-apotell · 2022-09-09T01:42:31Z

IO Abstraction: Introducing PathId & FileSystem (Phase 1)

NOTE: This is only the first phase of the large goal of IO abstraction.

Big picture goal is to abstract IO operations out of Surelog and into a class implementation that can be overriden by client application. This facilitates implementation of other more extensive features (like support for UNC paths, portale cache, multi-source root directories, cache archives, etc). Adding those directly to Surelog is not ideal since in many cases this features pull in other thrid-party dependecnies that might not have the same licensing terms as Surelog (and so leave it to the client/application to decide).

Introducing two new classes - PathId & FileSystem

PathId - New class parallel to SymbolId that represent path-like objects for IO operations. Each instance can represent a directory or a file but the internal representation doesn't necessarily have to be a path. For instance, in case of compressed archive of sources, it can be a string representation of the archive + the offset. The only enforcement is that the representation needs to be printable i.e. should be convertible to a string (since thee representation is stored in SymbolTable). For the advanced users. this limitation can very easily be overcome and is left as an exercise.

FileSystem - This reroutes access to native file system or in special advanced cases can restrict access to the file system. All operations of PathId are meant to be routed thru' this class. This commit contains only the bare-bones of the implementation (to keep the rest of the change review-able). Follow up commit will add the real meat to this implementation.

Most of this change is about switching from SymbolId to use of PathId where applicable. Some specific ones that may be of interest to the reviewer.

Restrict use of std::filesystem to minimal scope (functions where possible). Do NOT pass std::filesystem::path across function boundaries. Use PathId instead.
Do NOT cache unpinned strings, especially the ones that represent paths. Add them to the SymbolTable and resolve it on restore.
PreprocessFile::m_fileId used to represent both a file path (when processing a file) or a macro name (when processing a macro). This was a problem since now the path needs to be represented using PathId but the macro name is represented using SymbolId. So, added two independent variables PreprocessFile::m_macroId & PreprocessFile::m_fileId.

NOTE: This is only the first phase of the large goal of IO abstraction. Big picture goal is to abstract IO operations out of Surelog and into a class implementation that can be overriden by client application. This facilitates implementation of other more extensive features (like support for UNC paths, portale cache, multi-source root directories, cache archives, etc). Adding those directly to Surelog is not ideal since in many cases this features pull in other thrid-party dependecnies that might not have the same licensing terms as Surelog (and so leave it to the client/application to decide). Introducing two new classes - PathId & FileSystem PathId - New class parallel to SymbolId that represent path-like objects for IO operations. Each instance can represent a directory or a file but the internal representation doesn't necessarily have to be a _path_. For instance, in case of compressed archive of sources, it can be a string representation of the archive + the offset. The only enforcement is that the representation needs to be *printable* i.e. should be convertible to a string (since thee representation is stored in SymbolTable). For the advanced users. this limitation can very easily be overcome and is left as an exercise. FileSystem - This reroutes access to native file system or in special advanced cases can restrict access to the file system. All operations of PathId are meant to be routed thru' this class. This commit contains only the bare-bones of the implementation (to keep the rest of the change review-able). Follow up commit will add the real meat to this implementation. Most of this change is about switching from SymbolId to use of PathId where applicable. Some specific ones that may be of interest to the reviewer. * Restrict use of std::filesystem to minimal scope (functions where possible). Do NOT pass std::filesystem::path across function boundaries. Use PathId instead. * Do NOT cache unpinned strings, especially the ones that represent paths. Add them to the SymbolTable and resolve it on restore. * PreprocessFile::m_fileId used to represent both a file path (when processing a file) or a macro name (when processing a macro). This was a problem since now the path needs to be represented using PathId but the macro name is represented using SymbolId. So, added two independent variables PreprocessFile::m_macroId & PreprocessFile::m_fileId.

hs-apotell · 2022-09-09T01:43:12Z

@alaindargelas This PR pairs with chipsalliance/UHDM#793

hs-apotell · 2022-09-09T01:47:25Z

@alaindargelas @hzeller Since this is only the first phase of the bigger change, the attached files should help provide larger picture of the end goal.

FileSystem.h.txt
FileSystem.cpp.txt

A follow up phase 2 will introduce these files and all relevant changes to go with it across the project.

hzeller · 2022-09-09T16:16:11Z

There are absolutely no unit-tests that verify the new path-ID behavior, nor are there any extension to existing unit tests to verify corner cases.
Please for the next itereration, add unit also provide unit tests.

hzeller

A first round of review comments.

Unfortunately, this one is already merged, but before any other pull request can be added in this series, we first need to work out all the details here.

In general, this is too much and not reviewable, you really need to break independent changes to what they are

DesignComponent: change return value of name() from const std::string& to std::string can be a separate PR. In particular it would be then easy to discuss why this is not std::string_view and work towards what it is
PathId should start out as a no-op: have a typedef SymboldID to PathId, then change everything that needs to be a path from SymbolId to Pathid. Huge text change, but easy to review, no actual change.
All new classes need a description of what they are. Also naming: FileSystem sounds like a poor name for what it does, or it is not explained enough.
All new classes need unit tests.
Why does something that is called SometingId have a reference to the thing that supplies these IDs ? This sounds like a convenience hack.

Here are a first round of comments that we should address before the next phase of the filesystem series.

Whenever you work on changes, please make sure to break them up in simple steps

big changes only in a way that they are No-ops first, e.g. with the typedef of SymboldId -> NodeId.
introducing new classes one at a time with proper unit tests
Wire new classes up as needed in separate changes.

include/Surelog/Common/PathId.h

hzeller · 2022-09-09T16:19:29Z

include/Surelog/Common/PathId.h

+    return *this;
+  }
+
+  const SymbolTable *getSymbolTable() const { return m_symbolTable; }


It is unclear why a path-ID needs to know about the SymbolTable. It sounds like a leaky abstraction or 'convenience' member.
Ideally, this should be removed. Anything called 'ID' should not not carry more context around. Or it needs to be clearly explained in the PathId class description.

It's intentional. It makes the implementation of FileSystem API lot easier and elegant. Please check the full implementation of FileSystem in files attached here

I did bring up this design decision in #3054 and we had some back and forth before it went silent. For SymbolId this might not be as necessary since the use of it is restricted to Surelog code. But for PathId, it is meant to be accessed and used in application code. Putting the overhead of managing the owner SymbolTable on client application seemed unnecessary.

leaking abstraction often makes implementation easier, but it doesn't mean it is the right thing.

I'll have a look at the CC file, but it currently smells like a convenience hack.

include/Surelog/Common/SymbolId.h

include/Surelog/Common/FileSystem.h

hzeller · 2022-09-09T16:32:47Z

include/Surelog/Common/FileSystem.h

+  virtual const std::filesystem::path &getCwd();
+  virtual PathId getCwd(SymbolTable *const symbolTable);
+
+  virtual PathId copy(PathId id, SymbolTable *const toSymbolTable);


Given that this is a class called FileSystem, a function called copy() copies files around, right ?

If not, this needs a different name, and also, it needs a description.

Also, don't make the parameter const, it makes absolutely no difference (as you const the value, not the point location), and thus is just confusing.

Given that this is a class called FileSystem, a function called copy() copies files around, right ?

Good point. I didn't think along those lines. Do you have a suggestion of what this function should be renamed to. The purpose is to copy a PathId from one SymbolTable into another (giving the FileSystem a chance for any bookkeeping it might need to do).

In which contexts is this used ? Maybe duplicatePath() could be a good name ?
What are the scenarios where this will be used ?

Location takes in a PathId object but the owner of this object could be whoever. When an error is recorded with the ErrorContainer, it copies the contained PathId into its own SymbolTable.

In some ways, the name isn't so bad. FileSystem is responsible for everything PathId and so copy with PathId as parameter can be a potential overloaded function.

PathId copy(PathId srcId, SymbolTable *destinationSourceTable); // Copies srcId into destinationSymbolTable bool copy(PathId srcId, PathId dstId); // Copies files from location srcId to dstId

include/Surelog/Common/FileSystem.h

include/Surelog/Common/NodeId.h

include/Surelog/Common/PathId.h

include/Surelog/Design/DesignComponent.h

hs-apotell · 2022-09-09T17:35:20Z

There are absolutely no unit-tests that verify the new path-ID behavior, nor are there any extension to existing unit tests to verify corner cases. Please for the next itereration, add unit also provide unit tests.

PathId is a data container class not intended to have any logic. There are few operator functions that might be worth some test cases that I can add.

As for the FileSystem itself, yes, I do have uni tests covering most file operations and example to illustrate In-memory file system, portable cache and multiple source directories. These will be part of the follow up Phase-3 PR.

hs-apotell · 2022-09-09T18:45:16Z

To be honest, this isn't one night effort. This was a group effort from over three months. We did a number of design iterations for deciding how we wanted to implement some of the requested features in our tool. The change was organic but the effort paid off. I am cherry-picking from a much larger change in an effort to get this back into public repository.

Frankly, I thought that's what this change was - a big text change and doesn't change anything in terms of logic. No test case broke and no test log really change in any significant meaningful manner. All this change was doing is introducing PathId class in the codebase and fixing all the fallouts from it.

PathId should start out as a no-op: have a typedef SymboldID to PathId, then change everything that needs to be a path from SymbolId to Pathid. Huge text change, but easy to review, no actual change.

That approach wouldn't have caught all the places that needed attention. I did do a few PR before this one for the changes that I could easily pull out from the larger effort. But breaking this into any smaller would have meant redoing (and verifying) a lot of work that was already completed.

DesignComponent: change return value of name() from const std::string& to std::string can be a separate PR. In particular it would be then easy to discuss why this is not std::string_view and work towards what it is

And I wouldn't have known that such a change would be needed unless I had the PathId change in place.

All new classes need a description of what they are. Also naming: FileSystem sounds like a poor name for what it does, or it is not explained enough.

Sure. Will do so.

All new classes need unit tests.

PathId meant to be a data class and so didn't see much value in adding tests. There are few operator functions (comparison operators) that might demand such and so will add those in a follow up PR.

Why does something that is called SometingId have a reference to the thing that supplies these IDs ? This sounds like a convenience hack.

I have already replied back to your comment about this on the review. We can discuss this further there. IMO, PathId containing the owner SymbolTable makes sense to keep the API simpler. May be PathId shouldn't be called PathId.

Whenever you work on changes, please make sure to break them up in simple steps
big changes only in a way that they are No-ops first, e.g. with the typedef of SymboldId -> NodeId.
introducing new classes one at a time with proper unit tests
Wire new classes up as needed in separate changes.

This is good rule book to follow when you have a concrete design in front of you to implement. Much of our effort is organic and so I try my best to contribute changes back. It's not always that easy.

All that said, I am happy to iterate on specific PR if you feel that needs more work and needs to be broken up. Unfortunately. given time constraints, it might not always be possible but will make best effort.

…rnings * Fixed build warnings and also updated the workflow to match local settings so the warnings should now be reported on CI as well. * Add unit tests for PathId * Some documentation & decorative changes

hs-apotell · 2022-09-09T19:28:35Z

Follow up PR #3223

…rnings * Fixed build warnings and also updated the workflow to match local settings so the warnings should now be reported on CI as well. * Add unit tests for PathId * Some documentation & decorative changes

hzeller · 2022-09-09T20:55:25Z

To be honest, this isn't one night effort. This was a group effort from over three months. We did a number of design iterations for deciding how we wanted to implement some of the requested features in our tool. The change was organic but the effort paid off. I am cherry-picking from a much larger change in an effort to get this back into public repository.

Frankly, I thought that's what this change was - a big text change and doesn't change anything in terms of logic. No test case broke and no test log really change in any significant meaningful manner. All this change was doing is introducing PathId class in the codebase and fixing all the fallouts from it.

PathId should start out as a no-op: have a typedef SymboldID to PathId, then change everything that needs to be a path from SymbolId to Pathid. Huge text change, but easy to review, no actual change.

That approach wouldn't have caught all the places that needed attention. I did do a few PR before this one for the changes that I could easily pull out from the larger effort. But breaking this into any smaller would have meant redoing (and verifying) a lot of work that was already completed.

But that is how code reviews and pull requests work.

Sure, you might not know beforehand what changes, that is why you work on a full change first until you're happy with it. The work then to make it reviewable is to break it up into the parts.
So now that you know what parts need to change to PathId, you then can retroactively do a change where everything is text-changed but made a typedef. etc.

Find all the changes independent from the rest, but that you also need for everything coming together, that can have their own review. Such as the const std::string& to std::string return value.

Yes, it requires a little bit more work to prepare a pull request, but that way it is not just a code-dump, but a well thought-out sequence of design decisions that the reviewers can follow.
I usually start out with the big local change (let's call it the creative phase), but then for review I re-do the exact same changes but break them out into separate commits.
That process often discovers that there are things that I don't have to change anymore.
Yes, it might take an hour of work, but it also introduces clarity and it is needed to convey ideas to fellow engineers.

This code review contains 116 changed files! With a mix of functionality addition, text changes, return value changes...
That many files should only really be with no-op starting with a typedef (such as SymbolId -> FileId where needed), or changing a single return value type throughout and have that an easy-to-review thing. So, large changes should only be text changes. Sometimes it requires a little 'dance': do a bulk change of something, then do some functionality change, do a bulk-change of some things turned back. And that is ok.

Any functionality change should be as minimal as possible (a few files and all the corresponding unit tests). Followed by a separate PR or at least commit that wires it up.

DesignComponent: change return value of name() from const std::string& to std::string can be a separate PR. In particular it would be then easy to discuss why this is not std::string_view and work towards what it is

And I wouldn't have known that such a change would be needed unless I had the PathId change in place.

That is the point: you first develop things and find out what all is needed. That is for sure.
But that is not the pull request. It is your responsibility to then break these up into independent, coherent changes before sending them as review. It also gives you another chance to reflect on why each change is needed (sometimes some change needed earlier in a larger effort turns out to not be needed anymore).

All that said, I am happy to iterate on specific PR if you feel that needs more work and needs to be broken up. Unfortunately. given time constraints, it might not always be possible but will make best effort.

Please do. I am not saying that it is simple. And often it requires to re-do the change manually into multiple commits.
The burden you put on Alain, me or others who have to review it is much higher; so you're transferring your time constraint to multiple times that effort to the reviewers. You see that Alain often just gives up on the review and merges, hoping for the best.

Issue #3221, #3219: Follow up and fixing warnings

hs-apotell · 2022-09-10T07:04:11Z

Taking the PathId conversation to discussion board #3228

hs-apotell requested review from hzeller and alaindargelas September 9, 2022 01:42

Merge branch 'master' into pathid

54896f5

alaindargelas merged commit 40766f4 into chipsalliance:master Sep 9, 2022

hzeller reviewed Sep 9, 2022

View reviewed changes

hs-apotell mentioned this pull request Sep 9, 2022

Issue #3221, #3219: Follow up and fixing warnings #3223

Merged

hs-apotell deleted the pathid branch September 9, 2022 19:28

alaindargelas added a commit that referenced this pull request Sep 10, 2022

Merge pull request #3223 from hs-apotell/pathid-p2

d6804f8

Issue #3221, #3219: Follow up and fixing warnings

alaindargelas mentioned this pull request Sep 10, 2022

Package build broken #3227

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IO Abstraction: Introducing PathId & FileSystem (Phase 1) #3219

IO Abstraction: Introducing PathId & FileSystem (Phase 1) #3219

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hzeller commented Sep 9, 2022

hzeller left a comment

hzeller Sep 9, 2022

hs-apotell Sep 9, 2022

hzeller Sep 9, 2022

hzeller Sep 9, 2022

hs-apotell Sep 9, 2022

hzeller Sep 9, 2022

hs-apotell Sep 9, 2022

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hzeller commented Sep 9, 2022

hs-apotell commented Sep 10, 2022

IO Abstraction: Introducing PathId & FileSystem (Phase 1) #3219

IO Abstraction: Introducing PathId & FileSystem (Phase 1) #3219

Conversation

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hzeller commented Sep 9, 2022

hzeller left a comment

Choose a reason for hiding this comment

hzeller Sep 9, 2022

Choose a reason for hiding this comment

hs-apotell Sep 9, 2022

Choose a reason for hiding this comment

hzeller Sep 9, 2022

Choose a reason for hiding this comment

hzeller Sep 9, 2022

Choose a reason for hiding this comment

hs-apotell Sep 9, 2022

Choose a reason for hiding this comment

hzeller Sep 9, 2022

Choose a reason for hiding this comment

hs-apotell Sep 9, 2022

Choose a reason for hiding this comment

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hs-apotell commented Sep 9, 2022

hzeller commented Sep 9, 2022

hs-apotell commented Sep 10, 2022