Initial push #45

Draft
wants to merge 3 commits into
base: main


patrickingraham

No description provided.

Member

@tribeiro tribeiro left a comment


I made a few inline comments, and I also have a general one. I think the title is broader than the scope we intend for this document, or maybe I misunderstood the context.

My impression is that this was supposed to handle the user environment + script queue access to a set of tssw stack packages (ts_observatory_control, ts_standardscripts, ts_externalscripts, etc.). But the title, introduction, and some of the content seem to suggest a much broader context (e.g. CSC versioning). I think we need to align these expectations.

As you will see in my comments below, I propose a title change. Also I think we could call this "Observing Environment" which consists of the environment used by observers in nublado and the ScriptQueue. This allows us to differentiate CSCs from OE. I made a suggestion for the first paragraph of the introduction based on this proposal.


This command will move your current ``~/notebooks/.user_setups`` file to your home directory with a timestamp, then create a new file which points to the base environment.
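
For illustration only, the behavior described in this line could be sketched as below. This is a minimal sketch, not the command in the PR: the function name `reset_user_setups`, the timestamp format, and the `base_contents` argument are all assumptions introduced here.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def reset_user_setups(setups_file: Path, backup_dir: Path, base_contents: str) -> Optional[Path]:
    """Move an existing setups file into backup_dir with a timestamp suffix,
    then write a fresh file. Returns the backup path, or None if nothing existed.

    Hypothetical helper: the real command and file contents are not shown in the PR.
    """
    backup = None
    if setups_file.exists():
        # Assumed timestamp format; chosen so that backups sort chronologically.
        stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
        backup = backup_dir / f"{setups_file.name}.{stamp}"
        shutil.move(str(setups_file), str(backup))
    setups_file.parent.mkdir(parents=True, exist_ok=True)
    # base_contents stands in for whatever points the user at the base environment.
    setups_file.write_text(base_contents)
    return backup
```

A call might look like `reset_user_setups(Path.home() / "notebooks" / ".user_setups", Path.home(), contents)`, where `contents` is whatever text points at the base environment.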

Use of the deploy branch
Member


Let's touch base about this.


The base environment is defined by a list of tags, or commit hashes, representing the packages which are deemed to be stable.
Note that this list needs to be maintained daily, and can only be updated by the production environment maintainers.
The list itself is stored in ``/opt/obs_user/base_environment.yaml`` (FIXME-better way to do this? CSV file with package name and branch (or commit hash?)).
Member


This information is already in the cycle build. Instead of having two sources of truth, we should probably use the same file as the cycle build. We could either clone the repository here or copy the one file with the information we want, which is cycle/cycle.env.
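
As a rough sketch of what reusing that file could involve, the snippet below reads version pins from env-style text. It assumes the file consists of simple `KEY=value` lines with optional `#` comments; the actual layout of cycle/cycle.env is not shown in this thread, and `parse_env_text` is a hypothetical helper name.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=value lines into a dict, skipping blanks and comments.

    Assumption: the pin file uses plain KEY=value lines. Anything else is ignored.
    """
    versions = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        versions[key.strip()] = value.strip().strip("\"'")
    return versions
```

Reading from a single file this way would keep the cycle build as the only source of truth, at the cost of tying the observing environment to that file's format.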

Author


I think I addressed this above?

The default for each package should be a tagged commit that has been merged to the deploy branch.
However, in certain cases it may be a specific commit of the deploy branch, specifically if bug-fixes have been applied that are not yet incorporated into the main branch.

Question: How do we update this when a new cycle build occurs? Just part of a procedure? FIXME
Member


We could have a service in Kubernetes that does this: basically an app that updates the source file any time the cycle build is updated.

Author


Hmm, maybe? I'd be a little worried that a "bot" could overwrite a file that might have had something special written into it by a user. One can also imagine that if you made a new revision and broke something, you would then make that broken version your base environment.
Need to chew on that, I guess.


On-sky testing then rolling back a CSC
Member


I am confused; I thought this was only for the user environment and the ScriptQueue, not for CSCs.

Author


True, but I was thinking of a use-case, which may just be completely separate, where we deploy a new CSC for people to try, but then give instructions on how they can roll back if they hit a bug, without a 2am phone call.
It's sort of related to "resetting back to the base environment" -- but more for CSCs.

.. _Production-Environment-Package-Management:

#########################################
Production Environment Package Management
Member


How about?

Observing Environment Package Management

Author


I didn't like my title either :D
So long as it implies the summit, TTS, and BTS, I think it's fine... and I think your suggestion does that.


.. This section should provide a brief, top-level description of the procedure's purpose and utilization. Consider including the expected user and when the procedure will be performed.

This page explains how package management is controlled on production environments, specifically the summit.
Member


Looking at this from an "Observing Environment Package Management" point of view, consider the following update to this paragraph.

This page explains how package management is controlled on the Observing Environment, specifically the summit.
The Observing Environment is mainly composed of the ScriptQueue and the users' nublado instances, which are responsible for driving the observatory operations.
(new paragraph?)
Even though the Observing Environment is initially built and deployed alongside the rest of the Observatory Control System components (e.g. CSCs), it needs to follow a slightly different management approach to support on-the-fly updates.
Nominally, the CSC versions and supporting software packages deployed on the summit are managed by the `cycle build <https://ts-cycle-build.lsst.io/>`_.
Nevertheless, the procedure to deploy hot-fixes for CSCs (e.g. create an alpha tag on the package, update the cycle build, and redeploy) is not suitable for the Observing Environment, which requires a much larger set of packages and, therefore, longer build times.
These patches to the Observing Environment need to be rapidly rolled out to the summit and must be simultaneously available to the ScriptQueue CSC as well as the observer's notebook environments.
Most importantly, when testing new software on the summit, which may not be entirely stable, it is critical to have a mechanism to immediately roll back all packages to a designated stable version.
We call this the base observing environment.

Author


Used most of this... but changed a few things.

These fixes must be simultaneously available to the ScriptQueue CSC as well as the observer's notebook environments.
Most importantly, when testing new software on the summit, which may not be entirely stable, it is critical to have a mechanism to immediately roll back all packages to a designated stable version.
We call this the **base environment**.
Note that this version may include bug fixes and/or new functionalities that are not included in the previously released cycle build.
Member


I think, ideally, the "base environment" would match a revision of the cycle build. This means we have a unified way to control what we deploy at the summit. So the procedure to update the "base environment" would basically be to create a new revision. Now, this doesn't mean we have to build a new version of the SQ and nublado to get it deployed.

Author


"basically create a new revision" --> What you really mean is: update a local version of cycle.env, then have some sort of process that matches the NFS-mounted packages to that file, yes?
I think for clarity we really should refer to cycle.env as something else, though...
Also, we should strip out all the packages (e.g. CSCs) that would be no-ops, as they'll just add confusion.
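
To make the "process that matches the packages to that file" idea concrete, here is a minimal sketch of just the comparison step. It assumes the pinned and deployed versions are already available as plain dictionaries; `plan_updates`, the `managed` set, and the version strings in the example are hypothetical, and how versions are actually read from the NFS mount is left out.

```python
def plan_updates(pinned: dict, deployed: dict, managed: set) -> dict:
    """Return {package: (deployed_version, pinned_version)} for managed
    packages whose deployed version differs from the pinned one.

    Packages outside `managed` (e.g. CSCs) are ignored, so they act as no-ops.
    """
    changes = {}
    for name in sorted(managed):
        if name not in pinned:
            continue  # nothing pinned for this package; leave it alone
        want = pinned[name]
        have = deployed.get(name)  # None if the package is not checked out yet
        if have != want:
            changes[name] = (have, want)
    return changes


# Hypothetical inputs: the version strings and "some_csc" are made up.
pinned = {"ts_observatory_control": "v1.0.0", "ts_standardscripts": "v2.0.0", "some_csc": "v3.0.0"}
deployed = {"ts_observatory_control": "v1.0.0", "ts_standardscripts": "tickets/DM-00000"}
managed = {"ts_observatory_control", "ts_standardscripts"}
print(plan_updates(pinned, deployed, managed))
```

Restricting the comparison to an explicit `managed` set is one way to strip out the entries that would be no-ops without editing the pin file itself.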

@patrickingraham patrickingraham force-pushed the tickets/DM-36686 branch 2 times, most recently from 027a6d2 to 17b083c Compare October 25, 2022 04:48