
feat: add reusable workflow to build openedx images #1

Merged · 28 commits · Sep 4, 2024

Conversation

mariajgrimaldi
Member

@mariajgrimaldi mariajgrimaldi commented Jul 29, 2024

Description

Add a reusable workflow to build Open edX images based on the Picasso V2 pipeline definition. The main goal of this implementation is to prove the feasibility of adopting GitHub Actions to replace Jenkins in our current build process. For more details on the team's stance on the potential migration, please see this post.

Implementation details

For this implementation, we first translated each step in the current Picasso V2 pipeline to its counterpart in GitHub Actions. Then, we refactored them into smaller steps where possible for easier management. Here's an overview of how each stage maps to the new steps:

Jenkins pipeline stages

  1. Log in to DockerHub
  2. Clone strains repository
  3. Get global variables
    3.1 Set strain name for TVM use
    3.2 Set tutor version for TVM use
  4. Configure TVM project
    4.1 Install TVM
    4.2 Initialize TVM project
    4.3 Copy strain into project folder
    4.4 Generate tutor environment
  5. Execute extra commands
    5.1 Update plugin index
    5.2 List plugins
    5.3 Enable distro
    5.4 Run syntax validation
    5.5 Run distro run-extra-commands
    5.6 Generate tutor environment
  6. Build service
  7. Push service

GitHub steps

  1. Log in to DockerHub
  2. Check out strains repository
  3. Set TVM project name and version in the GitHub environment
  4. Install TVM
  5. Configure TVM project
    5.1 Install TVM
    5.2 Initialize TVM project
    5.3 Copy strain into project folder
  6. Enable distro plugin
  7. Execute extra commands (run-extra-commands)
  8. Prepare Docker if building an MFE (optional)
  9. Build service image with no cache
  10. Push service image to DockerHub
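
The GitHub steps above can be sketched as a reusable (`workflow_call`) workflow skeleton. This is an illustrative outline only, not the actual `picasso/build.yml` — step names, action versions, and the elided TVM steps are assumptions:

```yaml
name: Picasso V2 (sketch)
on:
  workflow_call:
    inputs:
      STRAIN_REPOSITORY:
        required: true
        type: string
      STRAIN_REPOSITORY_BRANCH:
        required: true
        type: string
      SERVICE:
        required: true
        type: string
    secrets:
      DOCKERHUB_USERNAME:
        required: true
      DOCKERHUB_PASSWORD:
        required: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}
      - name: Checkout strains repository
        uses: actions/checkout@v4
        with:
          repository: ${{ inputs.STRAIN_REPOSITORY }}
          ref: ${{ inputs.STRAIN_REPOSITORY_BRANCH }}
      # ... TVM setup, distro plugin enablement, and extra commands go here ...
      - name: Build and push service image
        run: |
          tutor images build ${{ inputs.SERVICE }} --no-cache
          tutor images push ${{ inputs.SERVICE }}
```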

Other design details to take into account:

How to test

  1. Create a caller workflow that uses this reusable (called) workflow. For reference, see the implementation details: the build.yml workflow calls the picasso/build.yml workflow and sets the configuration needed to run correctly.
  2. To manually trigger the ednx-strains/.github/build.yml workflow, go to Actions > Build Open edX strain and fill in the necessary configuration, or use the defaults. I suggest using the MJG/redwood-test-image strain branch to avoid overriding the current image on DockerHub.
  3. Press Run workflow.
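
The steps above assume a caller workflow roughly like the following. The repository path, workflow ref, and input values are illustrative; see the actual build.yml in ednx-strains for the real configuration:

```yaml
name: Build Open edX strain
on:
  workflow_dispatch:
    inputs:
      strain-branch:
        description: 'Branch of the strains repository to build'
        default: 'master'

jobs:
  build:
    # The path and ref of the reusable workflow are hypothetical here
    uses: eduNEXT/picasso/.github/workflows/build.yml@main
    with:
      STRAIN_REPOSITORY: eduNEXT/ednx-strains
      STRAIN_REPOSITORY_BRANCH: ${{ inputs.strain-branch }}
      SERVICE: mfe
    secrets: inherit
```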

(Screenshot: the "Build Open edX strain" workflow dispatch form in the Actions tab.)

See a successful build here: https://github.com/eduNEXT/ednx-strains/actions/runs/10208033113/job/28243841700

@mariajgrimaldi changed the title from "feat: add first version of picasso builder action" to "feat: add reusable workflow to build openedx images" on Aug 1, 2024
@mariajgrimaldi mariajgrimaldi marked this pull request as ready for review August 2, 2024 01:16
@mariajgrimaldi mariajgrimaldi requested a review from a team August 2, 2024 01:17
with:
repository: ${{ inputs.STRAIN_REPOSITORY }}
ref: ${{ inputs.STRAIN_REPOSITORY_BRANCH }}
token: ${{ secrets.REPO_DEPLOYMENT_KEY }}
Member Author

@mariajgrimaldi mariajgrimaldi Aug 2, 2024


[comment to reviewers]: the default value in the ednx-strains workflow is the default GitHub token (see here), which is unnecessary since the workflow is used in the same repository. Instead, the token should be a personal access token from a service account, or an SSH key also belonging to a service account, so the workflow can check out other repositories: https://github.com/actions/checkout?tab=readme-ov-file#usage

Which one do you recommend?

Member Author


I mentioned checking out other repositories to keep backward compatibility, but if we decide to stick with this implementation, we should revisit whether there's a use case for it.


Are we expecting that the other organizations interested in using this workflow will all have their own strains repo?

Member Author


Not necessarily. Unless teams need to change defaults from the caller workflow (e.g. https://github.com/eduNEXT/ednx-strains/blob/master/.github/workflows/build.yml#L39-L42), they shouldn't need to.

Member Author

@mariajgrimaldi mariajgrimaldi Aug 7, 2024


We still need to figure out whether to configure a token or SSH key. Our current setup uses an SSH key associated with a service account. So, I'll implement this same approach in another commit.
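
With actions/checkout, the SSH-key variant mentioned here would look roughly like this. The secret name is a placeholder; `ssh-key` is a documented input of actions/checkout:

```yaml
- name: Checkout strains repository
  uses: actions/checkout@v4
  with:
    repository: ${{ inputs.STRAIN_REPOSITORY }}
    ref: ${{ inputs.STRAIN_REPOSITORY_BRANCH }}
    # SSH private key whose public half is registered as a deploy key
    # on the strains repository (secret name is hypothetical)
    ssh-key: ${{ secrets.SSH_PRIVATE_KEY }}
```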

Member Author


Here's the commit: a43ca13. I created another strains repo (private) to test whether the workflow checks it out correctly: https://github.com/eduNEXT/strains

@@ -0,0 +1,161 @@
name: Picasso V2
Member Author


[comment to reviewers]: this is a private repository, so only repositories from the edunext organization can use the workflow; see this documentation for more details. According to our previous discussions, the plan is for this workflow to also be used by other organizations, which means this repository should be public.

Is that feasible for this repository? Considering that the Jenkins pipeline implementation is private as well.


At first glance, I don't see any major issues with making this repo public. I would like to hear from those who worked on the Jenkins implementation to see if this was previously discussed, and whether there was any specific reason for keeping it private. It might simply be that they didn't see a need for it to be public, since Jenkins allows access to the service for as many people as needed without requiring access to the script repository (?)

Member Author

@mariajgrimaldi mariajgrimaldi Aug 7, 2024


I'm sorry, I should've added this context to my previous comment, but I forgot. I created the dedalo-scripts repository and committed the first changes. At the time, I didn't know whether it would be a long-term solution or what implementations would be hosted there, so I used private as the default visibility setting. Now, I'd argue against that decision. The dedalo-scripts repository looks like a team-specific solution, but it shouldn't be, mainly because it solves the tech-wide problem of how to build Open edX images.

Knowing this, this repo should be public regardless of the decisions made about the visibility of dedalo-scripts.

Comment on lines 5 to 31
inputs:
STRAIN_REPOSITORY:
description: 'The repository for strains to checkout. It should follow the format: ORGANIZATION/REPO'
required: true
type: string
STRAIN_REPOSITORY_BRANCH:
description: 'The branch of the repository to checkout'
required: true
type: string
STRAIN_PATH:
description: 'The path within the repository for strains'
required: true
type: string
SERVICE:
description: 'The service name to build'
required: true
type: string
secrets:
DOCKERHUB_USERNAME:
description: 'DockerHub username for login'
required: true
DOCKERHUB_PASSWORD:
description: 'DockerHub password for login'
required: true
REPO_DEPLOYMENT_KEY:
description: 'Deployment key for repository checkout'
required: true
Member Author


[comment to reviewers]: What other types of configurations do you find useful?

Comment on lines +115 to +127
run: |
. .tvm/bin/activate

major_version=$(pip show tutor | grep Version | awk '{print $2}' | cut -d'.' -f1)
distro_version=$(
git ls-remote --tags https://github.com/eduNEXT/tutor-contrib-edunext-distro \
| grep -E "$major_version\\.[0-9]+\\.[0-9]+$" \
| tail -n 1 \
| awk -F'/' '{print $NF}' \
)

pip install git+https://github.com/eduNEXT/tutor-contrib-edunext-distro@$distro_version
tutor plugins enable distro
Member Author

@mariajgrimaldi mariajgrimaldi Aug 2, 2024


[comment to reviewers]: I removed these lines from the original script since we're not using the plugin index to install distro:

tutor plugins update
tutor plugins list

Please let me know if you think otherwise.
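
As a side note, the tag-selection pipeline in the quoted step can be exercised in isolation by feeding it a fixed tag list instead of the live `git ls-remote` output. The tag names below are made up; note that, like `ls-remote`, this relies on the tag list being sorted so `tail -n 1` picks the newest match:

```shell
# Pick the newest distro tag matching the installed tutor major version.
# Same pipeline as the workflow step, but fed a hardcoded tag list.
major_version=17
tags='refs/tags/v16.1.0
refs/tags/17.0.0
refs/tags/17.0.1
refs/tags/18.0.0'
distro_version=$(printf '%s\n' "$tags" \
  | grep -E "$major_version\.[0-9]+\.[0-9]+$" \
  | tail -n 1 \
  | awk -F'/' '{print $NF}')
echo "$distro_version"   # prints 17.0.1
```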

shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}/${{ env.TUTOR_APP_NAME }}
run: |
. .tvm/bin/activate
Member Author


[comment to reviewers]: Which environment variables does this script set? I think we can store those in the Github Environment, so there's no need to activate the tvm environment before running each step.


I agree that we could find a workaround to avoid activating the environment in each step, but I think we can leave it as an improvement for another PR

Comment on lines +140 to +142
echo "[worker.oci]" > buildkit.toml
echo "max-parallelism = 2" >> buildkit.toml
docker buildx create --use --node=max2cpu --driver=docker-container --config=./buildkit.toml
Member Author


[note to author]: add a comment explaining why we're doing this.


Does this connection error with the parallel build also occur in the Jenkins pipeline? Could this potentially slow down build times or impact us in other ways?

Member Author


I asked internally about the Jenkins agents' configuration but am still waiting for an answer, so I'll ask again. I'll post here once I know more.

And yes, limiting parallelism will slow down the building process compared to the default buildkit configuration. Here's a successful build with this buildkit config, which takes about 20 minutes. I'll compare this time with our current building time for MFEs in Jenkins.

I was also thinking about taking this to the Tutor Users' Group meeting because it seems like it hasn't been addressed upstream yet.


@magajh magajh Aug 6, 2024


Thanks! Taking this to the Tutor users' group seems like a good idea. If it's something that can be resolved in the near future, and we can help push it forward, then I don't think it should be a major concern for us.

Member Author


I agree with you that this is not a blocker for the MVP. I opened this issue so we can work on it later: #2
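
For reference, the quoted snippet caps BuildKit's layer-build parallelism. A standalone sketch of what it produces follows; the motivation (fitting within the resources of a standard GitHub runner) is my assumption, pending the explanatory comment mentioned above:

```shell
# Reproduce the BuildKit config the workflow writes: cap concurrent
# layer builds at 2 (presumably to fit GitHub runner resources).
echo "[worker.oci]" > buildkit.toml
echo "max-parallelism = 2" >> buildkit.toml

# Registering the builder requires a Docker daemon, so it's shown
# here as a comment:
# docker buildx create --use --node=max2cpu --driver=docker-container --config=./buildkit.toml
cat buildkit.toml
```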

run: |
. .tvm/bin/activate
tutor config save
tutor images build $SERVICE --no-cache
Member Author


[note to author]: I don't think we need the --no-cache argument.


Could you explain why you think we wouldn't need this? Especially considering that Picasso builds the images with the same --no-cache argument, and we want to compare the build times as fairly and accurately as possible.

Member Author


Of course. Our current Jenkins setup uses dynamically provisioned agents, each with a maximum number of concurrent builds. As I understand it, --no-cache prevents one of those concurrent builds from using Docker cache layers from another concurrent build. This wouldn't happen in this setup, since no concurrent builds exist in the same GH runner. Also, each time a runner is provisioned, it starts from scratch without any form of cache, including from previous builds. Therefore, this setup would behave the same with or without the --no-cache flag.

Member Author


@Alec4r @MaferMazu, please correct me if I'm wrong. Thanks!

Member


We can decide after we get some numbers and run some tests with the MVP, but I agree with you so far.


@magajh magajh left a comment


Thanks for this @mariajgrimaldi. We could make other adjustments later if we need to, but I think this is good enough for an MVP.

This seems to be functional enough to show to the other teams and get their feedback on whether this new implementation is useful and easy to adapt to their use cases.

@mariajgrimaldi
Member Author

@magajh, thank you for your feedback! I've published this PR in our discussion post, so other teams can also review the solution.

@Alec4r
Member

Alec4r commented Aug 8, 2024

@mariajgrimaldi I was thinking that maybe instead of creating a workflow for building Docker images, it might be more convenient to do it as a GitHub Action. This way, each team or company could create their own workflow using the action, which would be much more flexible.

Additionally, this would allow us to easily consider the future of pushing to different Docker registries. Centralizing everything in a GitHub Action would give us the flexibility to adapt to different needs in the future without having to modify multiple workflows.

What do you think? Do you think we could evaluate the possibility of doing it as an action instead of a workflow?

@Squirrel18
Member

Hi, @mariajgrimaldi. I don't have any specific comments on the implementation; however, I do have some general comments.

  1. I think at some point it would be worth exploring the option of writing these workflows or actions in JS, as it would give us more flexibility and extensibility, and we could also add unit or integration tests.
  2. I understand that initially, this workflow will be used in the strains repository, but we can reuse it in any other repository we want, right?
  3. Why are we going to use TVM? I don't think it's necessary to keep using TVM, at least for this version of Picasso, since each build will be in an isolated environment, unless we want to use some cache, but I don't think TVM will benefit us much in this case.
  4. Is there any requirement to make this action public? IMO, we don't need to make it public, as it is for internal use only.

@mariajgrimaldi
Member Author

mariajgrimaldi commented Aug 9, 2024

Thank you, @Alec4r, for your comments!

I was thinking that maybe instead of creating a workflow for building Docker images, it might be more convenient to do it as a GitHub Action. This way, each team or company could create their own workflow using the action, which would be much more flexible.

Thank you for the suggestion! Implementing this as a composite action was my first approach; it looked something like this, but then I changed it to a reusable workflow.

I found debugging difficult when using it as an action. All the workflow steps happen in a single stage of the caller workflow, making it harder to review the logs. Here's an example of the output when using a composite action: https://github.com/eduNEXT/ednx-strains/actions/runs/10172503969/job/28135183468. A reusable workflow doesn't entirely solve this problem, since there are still a lot of logs to go through when debugging, but it's more organized because every step is logged independently.

Also, when implementing these bundles of instructions, we're assuming a couple of things, such as that they run on Ubuntu. How can we ensure this requirement is met when using an action?

In any case, I'm not against implementing it as an action, but I'd like to better understand why that option would be preferable. So I opened a new issue so we can discuss it further: #4. Changing this implementation to a composite action is not much work, so we can keep the conversation open.

Additionally, this would allow us to easily consider the future of pushing to different Docker registries. Centralizing everything in a GitHub Action would give us the flexibility to adapt to different needs in the future without having to modify multiple workflows.

This is enough to use a reusable workflow: https://github.com/eduNEXT/ednx-strains/blob/master/.github/workflows/build.yml#L30-L42. The reusable workflow is called within a job, unlike actions, which are called as steps, but the mechanics are very similar. Therefore, we shouldn't have to modify multiple workflows, just our centralized definition.

@mariajgrimaldi
Member Author

mariajgrimaldi commented Aug 9, 2024

Thank you @Squirrel18, for the feedback!

  1. I think at some point it would be worth exploring the option of writing these workflows or actions using JS, as it will allow us more flexibility, extensibility, and we can also add unit or integration tests.

Thanks for the suggestion. I opened a new issue to discuss this further: #5

  2. I understand that initially, this workflow will be used in the strains repository, but we can reuse it in any other repository we want, right?

Yes! Our main goal with this workflow is that it can be easily reused from other repositories by calling it from any workflow, e.g.: https://github.com/eduNEXT/ednx-strains/blob/master/.github/workflows/build.yml#L30-L42

  3. Why are we going to use TVM? I don't think it's necessary to keep using TVM, at least for this version of Picasso, since each build will be in an isolated environment, unless we want to use some cache, but I don't think TVM will benefit us much in this case.

You're right. We don't need TVM since the environment is already isolated.

  4. Is there any requirement to make this action public? IMO, we don't need to make it public, as it is for internal use only.

This would work if and only if "internal" means within the edunext organization only: https://docs.github.com/en/actions/creating-actions/sharing-actions-and-workflows-from-your-private-repository#about-github-actions-access-to-private-repositories. I'm not sure how we can bypass that constraint so other orgs can use it.

@Squirrel18
Member

Thanks for the answers, @mariajgrimaldi

This would work if and only if "internal" means within the edunext organization only: https://docs.github.com/en/actions/creating-actions/sharing-actions-and-workflows-from-your-private-repository#about-github-actions-access-to-private-repositories. I'm not sure how we can bypass that constraint so other orgs can use it.

I meant making it an open source project. In my opinion, there is no need to make it open source, as each vendor has its own build system, and there is not much benefit in making it open source in terms of reputation.

@mariajgrimaldi
Member Author

mariajgrimaldi commented Sep 3, 2024

Hi all, @Alec4r, @Squirrel18, and @magajh. Thanks for your comments, I really appreciate them. I'll merge this tomorrow morning (EST) as the first workflow version so we can start working on all the improvements discussed here and in other channels. Thanks again.

@mariajgrimaldi mariajgrimaldi merged commit 35e750b into main Sep 4, 2024