Support for updating disks #2496

Open
MSSedusch opened this issue Feb 7, 2024 · 5 comments

@MSSedusch
Contributor

Is your feature request related to a problem? Please describe.
When you want to update e.g. the IOPS or MBps settings of a disk, BOSH creates a new disk with the new properties and copies the data from the old disk to the new one. This can take a long time when you have multiple TiB of data stored on the disk. Most cloud providers support updating an existing disk so it might not be necessary to migrate the data to a completely new disk.

One example, as stated above, is changing the IOPS and throughput values of a disk (supported by, e.g., AWS and Azure):
cloudfoundry/bosh-aws-cpi-release#137

Another example is migrating an Azure Standard HDD disk to Premium, which Azure supports directly, or changing from gp2 to gp3 on AWS.

Describe the solution you'd like
Create a new update_disk_iaas_specific method in this block

if use_iaas_native_disk_resize?(old_disk, new_disk)

update_disk_iaas_specific would try to call update_disk in the cloud specific CPI (e.g. BOSH Azure CPI). If update_disk exists, it would check if the update is possible and raise Bosh::Clouds::NotSupported if the change is not possible without a migration.
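The fallback described above could look roughly like the following. This is only a minimal sketch, not the director's actual code: `NotSupported` is a local stand-in for `Bosh::Clouds::NotSupported`, `FakeCpi` is a hypothetical test double, and the `update_disk` argument list and the `enabled:` flag are assumptions made for illustration.

```ruby
# Stand-in for Bosh::Clouds::NotSupported so the sketch runs without the BOSH gems.
class NotSupported < StandardError; end

# Hypothetical CPI double: it can only change 'iops' and 'mbps' in place.
class FakeCpi
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Assumed signature; a real CPI may take different arguments.
  def update_disk(disk_cid, new_size, cloud_properties)
    @calls << [:update_disk, disk_cid, new_size]
    unsupported = cloud_properties.keys - %w[iops mbps]
    unless unsupported.empty?
      raise NotSupported, "cannot change #{unsupported.join(', ')} without a migration"
    end
    disk_cid
  end
end

# Hypothetical director-side helper: try the native update first and fall
# back to the existing create-and-copy migration when it is not possible.
def update_disk_iaas_specific(cpi, disk_cid, new_size, cloud_properties, enabled:)
  # Feature flag off, or the CPI does not implement update_disk: migrate.
  return :migrate unless enabled && cpi.respond_to?(:update_disk)

  cpi.update_disk(disk_cid, new_size, cloud_properties)
  :updated_in_place
rescue NotSupported
  # The CPI knows update_disk but cannot apply this particular change.
  :migrate
end
```

With this shape, a CPI that does not implement `update_disk` and a CPI that rejects a specific change both end up on the existing migration path, so the current behaviour stays the default.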

In addition, add a new configuration parameter, similar to enable_cpi_resize_disk, to enable this new behaviour; it would be disabled by default.

Describe alternatives you've considered
The alternative is to manually update the disks to the new parameters and somehow modify the BOSH state in the database, but that is very error-prone.

Additional context

I am happy to create a draft PR for this feature in the BOSH director and to implement it in the BOSH Azure CPI.

@jpalermo
Member

Sorry for the delayed response.

Yeah, I can't think of any other ways of making this work. It's unfortunately a change with several touch points, although I haven't looked at them too closely yet.

  • Director will need the new functionality, and will have to figure out if it's something the CPI can do. Hopefully it's mostly just copy/paste from the resize disk though.
  • Agent might need to be involved in the process, possibly unmounting the disk before the changes can happen. But I'd expect that functionality to already exist unless it's somehow tightly coupled to "resize_disk"
  • Then of course any CPIs that want to support it will need to be updated.

At least none of the pieces seem like they'll have to have major changes to support this.

@MSSedusch
Contributor Author

@jpalermo I started to work on this a few weeks ago and plan to test it in the next couple of weeks. The Azure feature itself is not yet available, so there is no urgency at the moment. But if you could have a look at whether what I have done so far looks good, that would be highly appreciated :)

https://github.com/cloudfoundry/bosh/compare/main...MSSedusch:bosh:update_disk?expand=1

@stefanator12

@MSSedusch did you manage to test the feature?

@MSSedusch
Contributor Author

Not yet. We are also waiting for a new feature on the Microsoft side; we need it at least for Pv1 -> Pv2.

@s4heid
Contributor

s4heid commented Aug 27, 2024

I have picked up the work @MSSedusch started and completed the implementation of the new update_disk method for the director. This method has been added to the bosh-cpi-ruby gem and implemented in the bosh-azure-cpi. There are several pull requests associated with this work item awaiting review:

I managed to conduct a few manual tests on a BOSH Director deployed on Azure, built from a development release incorporating my changes. I tested using a zookeeper deployment, modifying disk size and properties, and found that the update scenarios were successful with the IaaS native disk update feature. Importantly, in all these scenarios there was no create_disk call; instead, the existing disk was updated in place.

  • Scenario 1: Change Disk Type from Standard_LRS to Premium_LRS

    request.body:

    PATCH {"sku":{"name":"Premium_LRS"}}
  • Scenario 2: Change Disk Type from Premium_LRS to PremiumV2_LRS and set cloud properties: iops: 4000, mbps: 200

    request.body:

    PATCH {"properties":{"diskIOPSReadWrite":4000,"diskMBpsReadWrite":200},"sku":{"name":"PremiumV2_LRS"}}
  • Scenario 3: With Disk Type PremiumV2_LRS change the cloud properties to iops: 5000, mbps: 300 and disk size to 12 GiB.

    request.body:

    PATCH {"properties":{"diskSizeGB":12,"diskIOPSReadWrite":5000,"diskMBpsReadWrite":300},"sku":{"name":"PremiumV2_LRS"}}
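For illustration, the three request bodies above can be reproduced with a small helper. This is a sketch only and not the bosh-azure-cpi implementation: `disk_patch_body` and its keyword arguments are hypothetical names, and only the JSON keys are taken from the requests shown in the scenarios.

```ruby
require 'json'

# Hypothetical helper: builds the JSON body of a disk PATCH request from the
# values that changed. Anything left as nil is omitted from the body.
def disk_patch_body(sku: nil, size_gib: nil, iops: nil, mbps: nil)
  properties = {}
  properties['diskSizeGB'] = size_gib if size_gib
  properties['diskIOPSReadWrite'] = iops if iops
  properties['diskMBpsReadWrite'] = mbps if mbps

  body = {}
  body['properties'] = properties unless properties.empty?
  body['sku'] = { 'name' => sku } if sku
  JSON.generate(body)
end

# Scenario 1: only the disk type changes.
puts disk_patch_body(sku: 'Premium_LRS')
# Scenario 3: size, IOPS and throughput change on a PremiumV2_LRS disk.
puts disk_patch_body(sku: 'PremiumV2_LRS', size_gib: 12, iops: 5000, mbps: 300)
```

Omitting unchanged values keeps the PATCH limited to what actually needs updating, which matches the shape of the bodies logged in the scenarios.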

I will be out of office for the next 10 days. During this period, responses to any review might be delayed. However, I eagerly anticipate your feedback and will address it as soon as possible.

Labels
None yet
Projects
Status: Waiting for Changes | Open for Contribution
Development

No branches or pull requests

4 participants