
End of Phase 2 Technical Report

richardgilham edited this page on May 6, 2020.

It's Tuesday 7th April 2020 and Fab has just reached the end of Phase 2, a few days later than planned due to the sudden changes brought about by the COVID-19 pandemic. Even so, the Fab team are happy to report that we have been able to implement what was planned. This is a short report outlining how things went.

Recap: The aims of Phase 2

We are still fairly early in Fab's development, so our aims for Phase 2 were to prototype three core pieces of functionality:

  • A process for descending the source tree of files and gathering information into a database
  • The structures we would use to represent "Transformations" (steps in the build process)
  • A queue system for executing tasks (rather than a DAG-based scheduler)

How did it go?

Initially we thought we would quickly prototype each of these areas in isolation, producing quick standalone test examples showing how each of them could work. However, once work got started we realised that such a fragmented approach would actually create more work when it came time to bring the different ideas together, especially since the ideas overlap to some extent. So instead we chose to work in the same shared codebase, accepting that some clashes, and some adaptation to each other's contributions, would be required.

The source tree descent

This area came together before the others and laid much of the groundwork for the structure of the application and the testing around it. It consists of a descent through the files making up the target source tree, using the visitor pattern, which runs each source file through an "Analyser". The Analyser examines the source (Fortran only for now) to find out what it contains and which other modules it depends on, and stores the results in a database. The descent also calculates the Adler32 hash of every file in the source tree and saves it in the database; this will become important later, when Fab needs to determine which files have changed for incremental builds.
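The descent can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Fab's code: the names `descend`, `SourceVisitor` and `FortranAnalyser`, and the table layout, are hypothetical; the database is assumed to be SQLite; and the "analysis" is a simplified regular-expression scan for `module` and `use` statements that ignores comments, continuation lines and submodules. The visitor here is a plain callable that the walk hands each file to.

```python
import os
import re
import sqlite3
import zlib
from pathlib import Path
from typing import Callable

# Hypothetical schema: one hash per file, plus modules defined/used per file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_hash (filename TEXT PRIMARY KEY, adler32 INTEGER);
CREATE TABLE IF NOT EXISTS fortran_unit (filename TEXT, unit TEXT);
CREATE TABLE IF NOT EXISTS fortran_dependency (filename TEXT, depends_on TEXT);
"""

# Simplified patterns: "module foo" lines and "use [, intrinsic ::] bar" lines.
MODULE_RE = re.compile(r"^\s*module\s+(\w+)\s*$", re.IGNORECASE | re.MULTILINE)
USE_RE = re.compile(r"^\s*use\b\s*(?:,\s*\w+\s*::)?\s*(\w+)", re.IGNORECASE | re.MULTILINE)


def file_adler32(path: Path) -> int:
    """Adler-32 checksum of a file's bytes, read in chunks."""
    value = 1  # Adler-32 initial value, the same as zlib's default
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            value = zlib.adler32(chunk, value)
    return value


class FortranAnalyser:
    """Records which modules a Fortran file defines and which it uses."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db

    def __call__(self, path: Path) -> None:
        text = path.read_text(errors="replace")
        for unit in MODULE_RE.findall(text):
            self.db.execute("INSERT INTO fortran_unit VALUES (?, ?)",
                            (str(path), unit.lower()))
        for dep in USE_RE.findall(text):
            self.db.execute("INSERT INTO fortran_dependency VALUES (?, ?)",
                            (str(path), dep.lower()))


class SourceVisitor:
    """Visitor applied to every file: store its hash, then analyse Fortran."""

    FORTRAN_SUFFIXES = {".f", ".f90"}

    def __init__(self, db: sqlite3.Connection, analyser: Callable[[Path], None]) -> None:
        self.db = db
        self.analyser = analyser

    def __call__(self, path: Path) -> None:
        self.db.execute("INSERT OR REPLACE INTO file_hash VALUES (?, ?)",
                        (str(path), file_adler32(path)))
        if path.suffix.lower() in self.FORTRAN_SUFFIXES:
            self.analyser(path)


def descend(root: Path, visitor: Callable[[Path], None]) -> None:
    """Walk the tree in a deterministic order, visiting every file."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place controls os.walk's traversal order
        for name in sorted(filenames):
            visitor(Path(dirpath) / name)
```

Storing the hash next to the path means a later run can recompute `file_adler32` and compare it with the stored value to decide whether a file has changed.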

Transformations/Tasks

Building on the above work, a series of classes was created to act as the build steps. We originally intended to call these "Transformations", but realised that it made sense for the "Analyser" used by the source tree descent to inherit from these classes too. Since analysis doesn't really "transform" anything (it needs an input file but produces no output other than updates to the database), and since we expect that future subclasses may not fit that name either, the class was renamed to "Task". A Task can report which file(s) it depends on and which file(s) it produces (even before it runs), and has a "run" method that is called to perform its actions. Tasks were created for pre-processing (of Fortran, using cpp), compilation and linking (again, of Fortran), along with the supporting logic, which means Fab can already perform a simple end-to-end build. We realised early on that some tasks, such as pre-processing, need to happen before analysis, so the descent was changed to cope with this.
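A rough sketch of the Task idea, assuming an abstract base class with `prerequisites` and `products` properties and a `run` method. These names, the `CommandTask` helper, the workspace layout and the compiler flags are illustrative assumptions rather than Fab's actual interface. The point is that each Task can state its inputs and outputs before it runs, so one task's products can become the next task's prerequisites.

```python
import subprocess
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class Task(ABC):
    """A build step that can describe its inputs and outputs before running."""

    @property
    @abstractmethod
    def prerequisites(self) -> List[Path]:
        """Files this task depends on."""

    @property
    @abstractmethod
    def products(self) -> List[Path]:
        """Files this task will produce, known before run() is called."""

    @abstractmethod
    def run(self) -> None:
        """Perform the task's action."""


class CommandTask(Task):
    """Hypothetical helper for tasks that shell out to an external tool."""

    @abstractmethod
    def command(self) -> List[str]:
        """The command line this task will execute."""

    def run(self) -> None:
        subprocess.run(self.command(), check=True)


class PreProcessTask(CommandTask):
    def __init__(self, source: Path, workspace: Path) -> None:
        self.source, self.workspace = source, workspace

    @property
    def prerequisites(self) -> List[Path]:
        return [self.source]

    @property
    def products(self) -> List[Path]:
        return [self.workspace / (self.source.stem + ".f90")]

    def command(self) -> List[str]:
        # -P drops line markers; traditional mode is commonly used for Fortran.
        return ["cpp", "-traditional-cpp", "-P", str(self.source), str(self.products[0])]


class CompileTask(CommandTask):
    def __init__(self, source: Path, workspace: Path, compiler: str = "gfortran") -> None:
        self.source, self.workspace, self.compiler = source, workspace, compiler

    @property
    def prerequisites(self) -> List[Path]:
        return [self.source]

    @property
    def products(self) -> List[Path]:
        # Only the object file; .mod files are left out for simplicity.
        return [self.workspace / (self.source.stem + ".o")]

    def command(self) -> List[str]:
        # -J tells gfortran where to write .mod files.
        return [self.compiler, "-c", str(self.source), "-J", str(self.workspace),
                "-o", str(self.products[0])]


class LinkTask(CommandTask):
    def __init__(self, objects: List[Path], executable: Path, compiler: str = "gfortran") -> None:
        self.objects, self.executable, self.compiler = objects, executable, compiler

    @property
    def prerequisites(self) -> List[Path]:
        return list(self.objects)

    @property
    def products(self) -> List[Path]:
        return [self.executable]

    def command(self) -> List[str]:
        return [self.compiler, *map(str, self.objects), "-o", str(self.executable)]
```

Because `products` is available without running anything, a caller can chain `PreProcessTask`, `CompileTask` and `LinkTask` purely from their declared files and only then decide what actually needs to run.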

Queue System

Alongside this work, a queue manager and worker setup was written using Python's multiprocessing module. It creates a dynamic queue whose workers sit idle until tasks are added, then run them. The queue is designed to accept Tasks and has mechanisms for holding back the main application while the queue empties (for example, to wait for all database-population tasks to complete before compilation starts). The original intention was to use the queue for all source tree descent tasks as well as all compilation and linking tasks. However, we found that the multiprocessing module would not readily allow parallelism where an active database connection was involved, so for now the queue is not used in the descent. We hope to resolve this during Phase 3.
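A minimal sketch of a worker queue along these lines, using `multiprocessing.JoinableQueue`, whose `join()` blocks until every queued item has been marked done with `task_done()`. Whether Fab uses `JoinableQueue` specifically is an assumption; `QueueManager` and `SquareTask` are hypothetical stand-ins, `None` is used as a shutdown sentinel, and error handling is omitted (a task that raises would kill its worker).

```python
import multiprocessing as mp
from typing import List


class SquareTask:
    """Hypothetical stand-in for a Task: picklable, with a run() method."""

    def __init__(self, value: int) -> None:
        self.value = value

    def run(self) -> int:
        return self.value * self.value


def worker(tasks, results) -> None:
    """Sit idle on the queue, running each task as it arrives."""
    while True:
        task = tasks.get()  # blocks while the queue is empty
        try:
            if task is None:  # sentinel: shut this worker down
                return
            results.put(task.run())
        finally:
            tasks.task_done()  # lets tasks.join() track completion


class QueueManager:
    """Owns a JoinableQueue and a fixed pool of worker processes."""

    def __init__(self, n_workers: int = 2) -> None:
        self.tasks = mp.JoinableQueue()
        self.results = mp.Queue()
        self.workers = [mp.Process(target=worker, args=(self.tasks, self.results))
                        for _ in range(n_workers)]
        for proc in self.workers:
            proc.start()

    def add(self, task) -> None:
        self.tasks.put(task)

    def wait(self) -> None:
        """Hold back the caller until every queued task has been run."""
        self.tasks.join()

    def collect(self, count: int) -> List[int]:
        """Fetch `count` results, in completion order rather than submission order."""
        return [self.results.get() for _ in range(count)]

    def shutdown(self) -> None:
        for _ in self.workers:
            self.tasks.put(None)
        for proc in self.workers:
            proc.join()


if __name__ == "__main__":
    manager = QueueManager(n_workers=2)
    for n in range(5):
        manager.add(SquareTask(n))
    manager.wait()  # e.g. all analysis finished before compilation starts
    print(sorted(manager.collect(5)))  # [0, 1, 4, 9, 16]
    manager.shutdown()
```

Tasks reach the workers by pickling, and an open database connection such as an `sqlite3.Connection` cannot be pickled, which is one plausible reason a shared connection does not fit easily into this model.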

What else?

While exploring the main aims above, various other parts of the system, and our processes around it, also started to come together:

  • As we came to appreciate the benefits of pytest and unit testing, we used its coverage reporting to check how much of our code the tests exercise, and to push that coverage as high as possible.
  • Since the database will play such an important role in the final application, an additional tool, fab-dump, was created to make it easy to dump the database contents when debugging.
  • We adopted mypy, a third-party static type checker that enforces stricter type checking; it proved very useful in avoiding mistakes.
  • As we learnt to make better use of GitHub Actions, we built a fairly comprehensive CI workflow that runs mypy, flake8, pytest and our system tests.
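A tool like fab-dump could be as simple as the following sketch, which assumes an SQLite database file and prints each table's name, column names and rows. The function name `dump_database` and the output format are hypothetical; fab-dump's actual behaviour is not described in this report.

```python
import sqlite3
import sys
from typing import Optional, TextIO


def dump_database(db_path: str, out: Optional[TextIO] = None) -> None:
    """Print every table in an SQLite database: a header, column names, rows."""
    out = out if out is not None else sys.stdout
    connection = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        for table in tables:
            cursor = connection.execute(f'SELECT * FROM "{table}"')
            columns = [desc[0] for desc in cursor.description]
            out.write(f"== {table} ==\n")
            out.write(", ".join(columns) + "\n")
            for row in cursor:
                out.write(", ".join(str(value) for value in row) + "\n")
    finally:
        connection.close()
```

A command-line wrapper would then only need to call `dump_database(sys.argv[1])`.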