Buffered io merge into master #44

Open
wants to merge 19 commits into base: master

@Jo-stfc Jo-stfc commented May 17, 2023

No description provided.

snafus and others added 17 commits April 5, 2022 11:41
* Buffer implementation for XrdCeph

* Better error return code values

* Add timing into BufferIO

* Add timing into BufferSimple

* Utils code area

* Update raw data access and copy

* Adding Extents

* ReadV simple logic

* Add to own files the readV implementations

* Add to own files the readV implementations; cmake updated

* Logging improvements and write buffer updates

* Add IOadapter with blocking aio access

* Use IOadapter with blocking aio access

* Small logging update

* Reduce logging information; fix timing to ms

* Reduce logging information;

* Reduced logging, and better use of aggregated metrics

* comment clean and typo fixes

* Remove unnecessary file close

* Additional logging in case of problems

* Additional logging in case of problems

* allow option for buffering with IO or AIO buffer

Co-authored-by: james <[email protected]>
Co-authored-by: root <[email protected]>
* variable rpm name

* Update xrootd-ceph.spec.in

* Update makesrpm.sh

* Update makesrpm.sh
* Buffer implementation for XrdCeph

* Better error return code values

* Add timing into BufferIO

* Add timing into BufferSimple

* Utils code area

* Update raw data access and copy

* Adding Extents

* ReadV simple logic

* Add to own files the readV implementations

* Add to own files the readV implementations; cmake updated

* Logging improvements and write buffer updates

* Add IOadapter with blocking aio access

* Use IOadapter with blocking aio access

* Small logging update

* Reduce logging information; fix timing to ms

* Reduce logging information;

* Reduced logging, and better use of aggregated metrics

* comment clean and typo fixes

* Remove unnecessary file close

* Additional logging in case of problems

* Additional logging in case of problems

* allow option for buffering with IO or AIO buffer

* fix conflicts

* Allow for finite retries on EBUSY, else fail with EIO.

A read or write from the buffer can return EBUSY because of an underlying issue.
If -EBUSY is then returned out of XrdCeph, it can trigger a large number of retries.
In that case it is better to flag the transfer as failed and let it be retried properly.
The code allows 5 retries with a 1 s sleep between them. If these do not succeed (which is possible),
an -EIO error is returned to xrootd.
Other errors are not affected.
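
The retry policy described in this commit message could be sketched roughly as follows. This is an illustration only, not the XrdCeph code: `retry_on_ebusy` and its parameters are hypothetical names, and whether the real code counts the first attempt among the 5 is not stated here, so the sketch assumes one initial attempt plus up to 5 retries.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>
#include <sys/types.h>

// Hypothetical helper (not the XrdCeph implementation): run `op`; while it
// returns -EBUSY, sleep for `delay` and retry, up to `max_retries` times.
// If the operation is still busy afterwards, report -EIO. Any other result
// (success or a different error) is passed through unchanged.
inline ssize_t retry_on_ebusy(const std::function<ssize_t()>& op,
                              int max_retries = 5,
                              std::chrono::milliseconds delay =
                                  std::chrono::milliseconds(1000)) {
  assert(max_retries >= 0);
  ssize_t rc = op();
  for (int attempt = 0; rc == -EBUSY && attempt < max_retries; ++attempt) {
    std::this_thread::sleep_for(delay);
    rc = op();
  }
  return (rc == -EBUSY) ? -EIO : rc;
}
```

Mapping a persistent -EBUSY to -EIO at this single point keeps the caller from seeing a code that it would itself retry many times.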

* Better summary stats output for CephIOAdapterRaw

* Comment out a comment

Co-authored-by: james <[email protected]>
Co-authored-by: root <[email protected]>
* variable rpm name (#17)

* variable rpm name

* Update xrootd-ceph.spec.in

* Update makesrpm.sh

* Update makesrpm.sh

* Master cephnamelib (#16)

* Allow ceph.namelib to take params and apply translation to full path

* Reduce logging

Remove extraneous logging messages

* simplify parsing of namelib and added a log line for any remapped file

Co-authored-by: James <[email protected]>

* XRD-22 Fix ensuring the correct filename is passed to the CephFile instance. (#24)

A regression in previous commit meant that the filename was not correctly passed
to the CephFile instance. This fix ensures that the filename is set correctly.

Co-authored-by: james <[email protected]>

* re-introduce variable names to spec input (#27)

Co-authored-by: Jo-stfc <[email protected]>
Co-authored-by: James <[email protected]>
Reduced printouts. Only summary stats now produced, rather than the logging per read.

Co-authored-by: James Walder <[email protected]>
* XRD-12 Add timestamp information for ceph logging methods

Update the logwrapper method to print out the current timestamp in the initial section of output.

* Return permission denied on write attempt on existing file with EXCL set (#31)

Co-authored-by: James Walder <[email protected]>

* disable posc (#30)

posc is disabled for proxies, but not for a unified setup. XrdCeph does not support the posc flag as it misinterprets objects as folders

Co-authored-by: James Walder <[email protected]>
Co-authored-by: Jo-stfc <[email protected]>
* Add multiple buffer support for reads in case of simultaneous threads reading the same file.

* Further refinements to the simultaneous file reads code

 - Ensure all relevant read / write methods will create a buffer if needed
 - Validity check on close that a buffer was actually created (or bypass code if not)
 - Bugfix in case of odd read sizes combined with multi/split buffer reads (critical)
 - Clean-up of comments included for development

* Enhanced logging for cluster metrics and readV layer improvements (#35)

  - dumpCLusterInfo to check on the rados connection info
  - extra logging in a delete to give info on delete times
  - update the basic readV algorithm to do a simple bulk request

Co-authored-by: James Walder <[email protected]>

* Add time taken to unlink a file in the logging message

  - Logging an unlink now includes the time taken, in cases of (un)successful deletes
  - Remove some extraneous comments

* - Fix issue with buffer passthrough read
 - Add maximum number of simultaneous buffers for a given file
Once a given number of opens have been made against the same file, don't
create a large buffer; create only a 1 MiB buffer for each new file.
This should avoid issues with small paged reads, although normally the
passthrough mode would be expected to trigger for each read.
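
A minimal sketch of the buffer-size cap described above, under stated assumptions: `choose_buffer_size`, `existing_buffers`, `max_buffers` and `configured_size` are hypothetical names, and the real code may count opens differently.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMiB = 1024 * 1024;

// Hypothetical policy sketch: while fewer than `max_buffers` buffers exist
// for a file, hand out the configured (large) buffer size; once the limit
// is reached, fall back to a small 1 MiB buffer.
inline std::size_t choose_buffer_size(std::size_t existing_buffers,
                                      std::size_t max_buffers,
                                      std::size_t configured_size) {
  assert(configured_size > 0);
  return (existing_buffers < max_buffers) ? configured_size : kMiB;
}
```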

* Additional statistics on buffered reading added.

 - Will report bytes read from ceph, bytes read but bypassed the cache, and the cache hit fraction
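
For illustration, the counters listed above could be held in a small structure like the one below. The field names and the definition of the hit fraction (reads served from the buffer divided by all reads) are assumptions; the commit message does not say how the fraction is computed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical statistics holder; names and the hit-fraction definition
// are assumptions made for this sketch.
struct ReadStats {
  std::uint64_t bytes_from_ceph = 0;   // bytes fetched from Ceph to fill the buffer
  std::uint64_t bytes_bypassed = 0;    // bytes read directly, bypassing the buffer
  std::uint64_t reads_total = 0;       // all read requests seen
  std::uint64_t reads_from_cache = 0;  // requests served from the buffer

  // Fraction of reads served from the buffer; 0 when nothing was read,
  // which also avoids a division by zero.
  double cache_hit_fraction() const {
    assert(reads_from_cache <= reads_total);
    return reads_total == 0
               ? 0.0
               : static_cast<double>(reads_from_cache) /
                     static_cast<double>(reads_total);
  }
};
```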

---------

Co-authored-by: James Walder <[email protected]>
…40)

* Bug fix for writes with bufferedIO when extending over buffer range.
 - Fix for case where multiple writes to the buffer are needed for a given xrd write request
 - Previously threw an error; now will correctly perform the multiple writes as required.
 - Set the Simple Data buffer capacity to the input size, rather than the capacity of the vector, which could be larger.
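
The multiple-write case can be pictured with the sketch below, which splits one write request into pieces that each fit a buffer window. It assumes windows aligned to multiples of the buffer capacity, which may not match the actual buffer placement; `WriteChunk` and `split_write` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct WriteChunk {
  std::size_t offset;  // file offset where this piece starts
  std::size_t length;  // number of bytes in this piece
};

// Hypothetical sketch: split a write of `length` bytes at `offset` into
// pieces that never cross a buffer window of `capacity` bytes.
inline std::vector<WriteChunk> split_write(std::size_t offset,
                                           std::size_t length,
                                           std::size_t capacity) {
  assert(capacity > 0);
  std::vector<WriteChunk> chunks;
  while (length > 0) {
    // Space left in the window that contains `offset`.
    const std::size_t room = capacity - (offset % capacity);
    const std::size_t n = std::min(length, room);
    chunks.push_back({offset, n});
    offset += n;
    length -= n;
  }
  return chunks;
}
```

For example, with an 8-byte capacity a 10-byte write at offset 6 becomes two pieces: 2 bytes at offset 6 and 8 bytes at offset 8.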

---------

Co-authored-by: James Walder <[email protected]>
* test

* fix merge conflict

* extra bracket

* misplaced bracket

* StatLS only takes pool name from section of object path before first colon ':'
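
As a rough illustration of taking the pool name from the part of the path before the first colon (the function name is hypothetical, and any further sanitising done by the real code is not shown):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the part of `path` before the first ':'.
// If there is no colon, the whole string is returned unchanged.
inline std::string pool_name_from_path(const std::string& path) {
  return path.substr(0, path.find(':'));
}
```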

* Tidy reporting of pool name to ignore some extraneous characters

* Add XrdSys/XrdSysPlatform.h to get MAXPATHLEN

* Bug fix for writes with bufferedIO when extending over buffer range.  (#40) (#41)

* Bug fix for writes with bufferedIO when extending over buffer range.
 - Fix for case where multiple writes to the buffer are needed for a given xrd write request
 - Previously threw an error; now will correctly perform the multiple writes as required.
 - Set the Simple Data buffer capacity to the input size, rather than the capacity of the vector, which could be larger.

---------

Co-authored-by: snafus <[email protected]>
Co-authored-by: James Walder <[email protected]>

---------

Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: snafus <[email protected]>
Co-authored-by: James Walder <[email protected]>
* variable rpm name (#17)

* variable rpm name

* Update xrootd-ceph.spec.in

* Update makesrpm.sh

* Update makesrpm.sh

* Master cephnamelib (#16)

* Allow ceph.namelib to take params and apply translation to full path

* Reduce logging

Remove extraneous logging messages

* simplify parsing of namelib and added a log line for any remapped file

Co-authored-by: James <[email protected]>

* XRD-22 Fix ensuring the correct filename is passed to the CephFile instance. (#24)

A regression in previous commit meant that the filename was not correctly passed
to the CephFile instance. This fix ensures that the filename is set correctly.

Co-authored-by: james <[email protected]>

* XRD-12 Add timestamp information for ceph logging methods

Update the logwrapper method to print out the current timestamp in the initial section of output.

* re-introduce variable names to spec input (#27)

* Return permission denied on write attempt on existing file with EXCL set (#31)

Co-authored-by: James Walder <[email protected]>

* disable posc (#30)

posc is disabled for proxies, but not for a unified setup. XrdCeph does not support the posc flag as it misinterprets objects as folders

* Disk space reporting (#36)

* Provide XrdCephOss::StatLS and ceph_posix_stat_pool to enable disk space reporting. Responds to the 'xrdfs query space' command as requested by ALICE VO

* Remove ts() timestamp function and unnecessary #defines

* Read ceph.poolnames setting from XRootD config to specify reportable pools.

* Support 'xrdfs spaceinfo' via Stat() method returning XrdOssOK for stat'ing 'pool:'

* Tidy up tracing of Stat* calls

* Remove unwanted method isPathReportablePool

* Add comments for need to support stat-ing '/'

* Return -ENOMEM if malloc fails

* Return -ENOMEM if malloc fails

* Rename disk space reporting config item to ceph.reportingpools and log if the list of names is not present. Report if the ceph_posix_stat_pool call to get the amount of used space fails

* Sanitize incoming pool name and allow for MonALISA format

* Optional tracing of Stat* incoming paths and response. Remove double logging of ceph.reporting pools.

* Check that sanitized pool name is not marked invalid

* Use ceph namelib translation at Oss level by copying translateFileName logic from Posix level. More error checking if stat can't find pool name.

* Remove superfluous comments

* Ensure tracing of path arguments to Stat() and StatLS(). Add Doxygen-style comments to changed methods

* Make source tarball only as minimum output

* Add make-src-tar.sh to additionally place required source tarball in '--output' destination

* Change back usedSpace to totalSpace in ceph_posix_statfs

* feat: improve (vector) read implementation (#37)

Try to avoid usage of libradosstriper for readv operations
since it may impact performance significantly. To do so we explicitly
determine the objects that constitute a file and read from them using
rados only. Reads are async.

To do these async reads conveniently we introduce a class for handling
multiple async read requests.

* Initial implementation of ReadV at the XrdOss level

* Correct the signature of ReadV to XrdCephOssFile

* feat: do not use libradosstriper for readv operation

* feat: use atomic operations for readv requests

This should be the most efficient way of handling multiple read ops.

* feat: use nonstriper reads for pread requests

* feat: use nonstriper reads for read operations also

To do so we do complete refactoring: bulkAioRead class moved to a
separate file, and its features extended. Namely, it can do reads
from files, not only objects, now.

* feat: print warning message if waiting for aio reads from ceph takes long

This is useful for debugging the causes of read(v) request failures.

* Added some comments

* fix: use size_t for start_block

We can use "%zx" in sprintf, so let's unify the types of variables in
the function. This will also allow us to extend limitations on the
file size.
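
A sketch of formatting an object name from a `size_t` block number with `%zx`. The suffix layout used here (a dot followed by 16 zero-padded hex digits) is an assumption about how striped objects are named, and `object_name` is a hypothetical helper rather than the function changed in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: build "<file_name>.<16 hex digits>" for the given
// object number. "%zx" matches std::size_t, so no casts are needed.
inline std::string object_name(const std::string& file_name,
                               std::size_t object_no) {
  char suffix[32];
  std::snprintf(suffix, sizeof(suffix), ".%016zx", object_no);
  return file_name + suffix;
}
```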

* feat: refactor BulkAioRead::read method, suggested during review

1. Rename end_block to last_block
2. Move variable definitions closer to their usage
3. Use 'std::min' instead of 'if' for chunk_len determination
4. Use more efficient chunk_start calculation

* feat: add options to allow one to switch to standard read mechanisms

This may be useful for testing.

* feat: rename block_size to object_size in BulkAioRead

New name better describes reality, since we are talking about the size
of ceph objects.

* feat: rename wait_for_complete to submit_and_wait_for_complete

New name describes this function better.

* feat: use more meaningful names for variables that loop over operations map

op_data should describe the contents of the variables better.

* feat: move type definitions into the class

* feat: added comments with method's description

* feat: remove unnecessary semicolons

* feat: convert wait_for_complete method from void to int

This allows several improvements. Here we change the key of the
operations map to the object number instead of the full object name.

* fix: fixed comment

* fix: fixed comments

* feat: refactor bulkAioRead class

Pointers were dropped from objectReadOperation and ceph_bufferlist objects.
The objects are moved to appropriate classes to simplify memory management
and usage.

* feat: take into account completion's return value

We can retrieve return code from completion and get meaningful status
of the whole operation with this value.

* feat: allow reading of sparse file

Since we do not really expect sparse files, we use a fallback mechanism:
if a read(v) failed with -ENOENT exit status, then just resubmit it using
striper-based functions.
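
The fallback can be sketched as below. The two callables stand in for the non-striper and striper read paths; they and `read_with_fallback` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <sys/types.h>

// Hypothetical sketch: try the non-striper read first and resubmit through
// the striper-based path only when the first attempt reports -ENOENT
// (for example a missing object in a sparse file). Every other result,
// including other errors, is returned as is.
inline ssize_t read_with_fallback(
    const std::function<ssize_t()>& nonstriper_read,
    const std::function<ssize_t()>& striper_read) {
  const ssize_t rc = nonstriper_read();
  return (rc == -ENOENT) ? striper_read() : rc;
}
```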

* lint: remove trailing whitespaces

* feat: use meaningful names for read(v) functions

The name now indicates whether read(v)s are striper or non-striper
ones.

* feat: fallback to striper-based read if number of stripes > 1

Just in case, such files should not be present in our production setup

* feat: allow zero-sized reads

In principle, this is a correct request, so we should support it.

* fix: make sure we do not delete completion objects until submitted operation is completed

This is done to prevent some nasty side-effects, e.g. writing to a deleted buffer.

* fix: remove move constructor from bulkAioRead

We do not use it.

* fix: handle failure to allocate completion

Completion allocation can fail; we should take that into account.

* feat: use file reference to construct readOp objects

There is no need to extract (and then copy) the file name and object size
from the file reference to construct a read object; we can use the file
reference directly.

* feat: replace conversion operator with explicit method

Implicit conversion was making code less readable.

* feat: remove call to is_complete() in completion wrapper destructor

There is no need to check for completion, we can call wait_for_complete
multiple times.

* feat: put warning threshold to config file

It is better to have this value as configurable instead of hardcoded.

* fix: initialize return code variable in ReadOpData

* Added comment

* feat: add comment for future optimization.

We should use `aio_cancel` to cancel all pending read operations in future.

* fix: remove vim's swp file

Committed by accident

* feat: improve logging

Add file descriptor to sparse file's logging, fix typos.

* fix: minor fixes

Remove unnecessary include, move variable declaration closer to the
usage, fix spelling in the comment.

* feat: BulkAioRead::read method refactoring

Refactoring was made to increase (hopefully) readability.

* fix: better wording for comment

* feat: BulkAioRead::read -- change loop exit condition

We can exit when `to_read == 0`. This allows us to drop the `end_block`
variable.
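
A sketch of a read loop that exits when `to_read` reaches zero, splitting a byte range into per-object chunks. It assumes one stripe per file, so objects are laid out back to back; `ObjectChunk` and `split_read` are hypothetical names and this is not the actual `BulkAioRead::read` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ObjectChunk {
  std::size_t object_no;    // index of the Ceph object holding this piece
  std::size_t chunk_start;  // offset inside that object
  std::size_t chunk_len;    // bytes to read from that object
};

// Hypothetical sketch: map [offset, offset + to_read) onto fixed-size
// objects. The loop ends when nothing is left to read, so no separate
// end/last block variable is needed. A zero-sized read yields no chunks.
inline std::vector<ObjectChunk> split_read(std::size_t offset,
                                           std::size_t to_read,
                                           std::size_t object_size) {
  assert(object_size > 0);
  std::vector<ObjectChunk> chunks;
  while (to_read > 0) {
    const std::size_t object_no = offset / object_size;
    const std::size_t chunk_start = offset % object_size;
    const std::size_t chunk_len =
        std::min(to_read, object_size - chunk_start);
    chunks.push_back({object_no, chunk_start, chunk_len});
    offset += chunk_len;
    to_read -= chunk_len;
  }
  return chunks;
}
```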

* fix: add call to `clear` after getting results

This is to allow clients to use the same readOp object for future
operations.

---------

Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: Alexander Rogovskiy <[email protected]>

* duplicate struct definition

* move struct definition to headers

* use bufferedIO version of path

* remove MAXPATHLEN redefinition

---------

Co-authored-by: snafus <[email protected]>
Co-authored-by: James <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: alex-rg <[email protected]>
Co-authored-by: Alexander Rogovskiy <[email protected]>
* Add capability for buffer io raw to use striperless reads

* Add capability for buffer io raw to use striperless reads

* Add a maybe striper for reading in ceph posix

* Use striperless reads when bypassing the buffer
Remove verbose logging for case when cache is bypassed, as the read size is at least the size of the buffer.
* catch division by 0 in CephIOAdapterRaw.cc, increase granularity to nanoseconds
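
A minimal sketch of guarding a rate calculation against a zero elapsed time while keeping nanosecond granularity. The function name and the choice of MiB/s as the unit are assumptions, not taken from CephIOAdapterRaw.cc.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: average throughput in MiB/s. Returns 0 when no time
// was recorded, instead of dividing by zero.
inline double throughput_mib_per_s(std::uint64_t bytes,
                                   std::chrono::nanoseconds elapsed) {
  if (elapsed.count() <= 0) return 0.0;
  const double seconds = std::chrono::duration<double>(elapsed).count();
  return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / seconds;
}
```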

* long to unsigned long long
explicit typecasting
return the read's return value when an error occurs during the read
Jo-stfc and others added 2 commits January 10, 2024 15:00
* get stripeunit and object size from xattr of first stripe
use striper.layout.object_size, not striper.size as that is the size of the whole object
get the striper layout info on file open
use min of return code of object striper layout metadata
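
One way to picture the parsing and the "min of return code" step is sketched below. It assumes the layout attributes hold decimal text, which is an assumption about the stored format; the helper names and the `StriperLayout` struct are hypothetical, and fetching the xattr itself is left out. `strtoull` is used so the sketch stays within C++14.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse a positive decimal attribute value.
// Returns 0 on success or -EINVAL on malformed input.
inline int parse_layout_value(const std::string& xattr_value,
                              unsigned long long& out) {
  if (xattr_value.empty() ||
      !std::isdigit(static_cast<unsigned char>(xattr_value[0]))) {
    return -EINVAL;
  }
  errno = 0;
  char* end = nullptr;
  const unsigned long long v = std::strtoull(xattr_value.c_str(), &end, 10);
  if (errno != 0 || *end != '\0' || v == 0) return -EINVAL;
  out = v;
  return 0;
}

struct StriperLayout {
  unsigned long long stripe_unit = 0;
  unsigned long long object_size = 0;
};

// Parse both values and combine the results with std::min: the outcome is
// 0 only if both succeeded, otherwise the (negative) error code.
inline int parse_layout(const std::string& stripe_unit_attr,
                        const std::string& object_size_attr,
                        StriperLayout& layout) {
  const int rc_unit = parse_layout_value(stripe_unit_attr, layout.stripe_unit);
  const int rc_size = parse_layout_value(object_size_attr, layout.object_size);
  return std::min(rc_unit, rc_size);
}
```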

* use striper.layout.object_size, not striper.size as that is the size of the whole object

* improvements from review

---------

Co-authored-by: root <[email protected]>
* clean garbage from rados read

* static alloc

* static alloc

* static alloc needs manual null

* comments and warning for nondefault params

* add filename in log

* add filename in log

* code review changes

* c++14 compatibility fixes

---------

Co-authored-by: root <[email protected]>
Co-authored-by: root <[email protected]>

2 participants