Commit 196c98f (1 parent: d6ffe1f). Showing 3 changed files with 62 additions and 42 deletions.
@@ -0,0 +1,61 @@
# FTIR File Schema

This documentation provides an overview of the schema used in FTIR files. At the moment, it is limited to Bruker instruments, as that is what we have experience with. All of the ingest processes rely on reading the binary instrument files directly, as that allows us to capture all of the metadata, which an exported file type would not preserve. The ingest uses the 'brukeropus' package developed by [joshduran](https://github.com/joshduran/brukeropus), which handles much of the heavy lifting.

## File Structure

Bruker FTIR files contain metadata that can be accessed as 'parameters' and 'reference parameters'. These are extracted and stored as-is in the file metadata of the Parquet file.
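As a rough illustration of the storage step, the sketch below flattens two parameter mappings into the byte-string key/value pairs that Parquet file-level metadata expects. The function name `encode_parquet_metadata`, the `param.`/`rf_param.` prefixes, the JSON encoding, and the sample values are all assumptions made for this example; the actual ingest code may store the values differently.

```python
import json


def encode_parquet_metadata(params, rf_params):
    """Flatten two parameter mappings into bytes key/value pairs.

    Parquet file-level metadata is a flat map of strings, so each value
    is JSON-encoded and each key is prefixed to keep the two groups apart.
    (Prefixes and encoding are illustrative choices, not the ingest's.)
    """
    metadata = {}
    for prefix, group in (("param", params), ("rf_param", rf_params)):
        for key, value in group.items():
            # default=str covers values json cannot serialize natively,
            # such as datetime objects.
            encoded = json.dumps(value, default=str).encode()
            metadata[f"{prefix}.{key}".encode()] = encoded
    return metadata


# Made-up sample values standing in for dict(opus_file.params) and
# dict(opus_file.rf_params).
example = encode_parquet_metadata({"dxu": "WN", "npt": 1024}, {"npt": 1024})
print(example)
```

With pyarrow, a dictionary like this could then be attached to a table via `Table.replace_schema_metadata` before writing.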

### Data

A Bruker file can contain multiple sets of data, for instance 'Reflectance', 'Sample Spectrum', and 'Reference Spectrum', which represent the different parts of the measurement that make up a test. We consider three primary quantities to be of most interest: 'Reflectance', 'Transmittance', and 'Absorbance'. Consequently, we treat these as the main content of the output Parquet file. However, we also try to extract the remaining spectra when appropriate. Because each spectrum has its own set of 'x-values', we must interpolate the other spectra onto the x-values of the primary quantity. This does result in some modification and loss of data; however, we view this as an acceptable tradeoff for including more information in the Parquet file. The original files are retained should we want to go back to them. All of the primary signals, and consequently the interpolated ones, are given on the basis of wavelength, as that is the quantity we primarily work with in our laboratory, but the conversion to wavenumber or frequency is trivial.
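The resampling and unit conversion described above can be sketched in plain Python as follows. This is a minimal illustration, not the ingest implementation: the function names are hypothetical, linear interpolation is assumed, and clamping out-of-range points to the end values is one possible choice among several (it mirrors the default behaviour of `numpy.interp`).

```python
from bisect import bisect_left


def wavenumber_to_wavelength_um(wavenumbers_cm):
    """Convert wavenumbers (cm^-1) to wavelengths (micrometres)."""
    return [1.0e4 / wn for wn in wavenumbers_cm]


def interpolate_onto(x_new, x_old, y_old):
    """Linearly interpolate (x_old, y_old) at each point of x_new.

    x_old must be sorted ascending. Points outside its range are
    clamped to the nearest end value (an assumption of this sketch).
    """
    result = []
    for x in x_new:
        i = bisect_left(x_old, x)
        if i == 0:
            result.append(y_old[0])
        elif i == len(x_old):
            result.append(y_old[-1])
        else:
            x0, x1 = x_old[i - 1], x_old[i]
            y0, y1 = y_old[i - 1], y_old[i]
            result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return result


# Made-up values: a primary spectrum grid and a coarser secondary spectrum.
primary_wl = [2.0, 2.5, 3.0]
secondary_wl = [2.0, 3.0]
secondary_y = [10.0, 20.0]
resampled = interpolate_onto(primary_wl, secondary_wl, secondary_y)

# Ascending wavenumbers give descending wavelengths, so a converted axis
# (and its y-values) would need reversing before interpolation.
wavelengths = wavenumber_to_wavelength_um([5000.0, 4000.0])
```

Here the secondary spectrum gains a point at 2.5 um that was never measured, which is the kind of modification the tradeoff above refers to.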

The datetime of the measurement is taken from the datetime of one of the primary spectra. The other spectra have different datetimes, as they are acquired at slightly different moments; however, these are dropped from the resulting Parquet file as they are thought to be of limited utility.
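One way to pick that single datetime is sketched below. The helper `primary_datetime`, the priority order in `PRIMARY_KEYS`, and the stand-in file object are assumptions for illustration only; the source does not say which primary spectrum takes precedence when more than one is present.

```python
from datetime import datetime
from types import SimpleNamespace

# Assumed priority order: reflectance, transmittance, absorbance.
PRIMARY_KEYS = ("r", "t", "a")


def primary_datetime(opus_file, data_keys):
    """Return (key, datetime) for the first primary spectrum present."""
    for key in PRIMARY_KEYS:
        if key in data_keys:
            return key, getattr(opus_file, key).datetime
    raise ValueError("no primary spectrum (r, t, or a) found in file")


# Stand-in for an OPUSFile with made-up timestamps, for illustration only.
fake_file = SimpleNamespace(
    r=SimpleNamespace(datetime=datetime(2024, 1, 1, 12, 0, 5)),
    sm=SimpleNamespace(datetime=datetime(2024, 1, 1, 12, 0, 3)),
)
key, measured_at = primary_datetime(fake_file, ["sm", "r"])
```

The datetime of the 'sm' spectrum is simply never read, matching the decision to drop the secondary datetimes.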

# Brukeropus package

Below is some sample code with explanations, for reference when using the brukeropus package. Defer to the package's own documentation where it differs, but these are some common commands that may be useful.

```
from brukeropus import read_opus

path = 'sample.0'  # Hypothetical path; point this at your own OPUS file

opus_file = read_opus(path)   # Returns an OPUSFile object
opus_file.print_parameters()  # Pretty-prints all metadata in the file to the console
print(opus_file.data_keys)    # Prints the list of data keys in the file

# General parameter metadata
dict(opus_file.params)
# Reference parameter metadata
dict(opus_file.rf_params)

# Data
opus_file.all_data_keys    # List of all data keys in the file, e.g. ['rf', 'r', 'sm']
opus_file.r.label          # Label of the reflectance spectrum
opus_file.r.x              # x-axis data in the units it was saved in (query the units with opus_file.r.dxu)
opus_file.r.wl             # x-axis data for the reflectance spectrum as wavelength (um)
opus_file.r.y              # y-axis data for the reflectance spectrum
opus_file.r.datetime       # Date and time of the measurement
opus_file.iter_all_data()  # Generator that yields all data in the file, i.e. iterates through all data keys

# Data keys and their meanings:
#   sm:   Single-channel sample spectra
#   rf:   Single-channel reference spectra
#   igsm: Sample interferogram
#   igrf: Reference interferogram
#   phsm: Sample phase
#   phrf: Reference phase
#   a:    Absorbance
#   t:    Transmittance
#   r:    Reflectance
#   km:   Kubelka-Munk
#   tr:   Trace (Intensity over Time)
#   gcig: gc File (Series of Interferograms)
#   gcsc: gc File (Series of Spectra)
#   ra:   Raman
#   e:    Emission
#   pw:   Power
#   logr: log(Reflectance)
#   atr:  ATR
#   pas:  Photoacoustic

# Conversions between parameter names can be found in:
from brukeropus.file.constants import PARAM_LABELS
```
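For quick inspection, the key list above can be kept as a small lookup table. The sketch below is standalone and does not call brukeropus; `DATA_KEY_LABELS` and `describe_keys` are hypothetical names introduced here, and the input list simply mimics what `all_data_keys` might return.

```python
# Data keys and descriptions, copied from the list above.
DATA_KEY_LABELS = {
    "sm": "Single-channel sample spectra",
    "rf": "Single-channel reference spectra",
    "igsm": "Sample interferogram",
    "igrf": "Reference interferogram",
    "phsm": "Sample phase",
    "phrf": "Reference phase",
    "a": "Absorbance",
    "t": "Transmittance",
    "r": "Reflectance",
    "km": "Kubelka-Munk",
    "tr": "Trace (Intensity over Time)",
    "gcig": "gc File (Series of Interferograms)",
    "gcsc": "gc File (Series of Spectra)",
    "ra": "Raman",
    "e": "Emission",
    "pw": "Power",
    "logr": "log(Reflectance)",
    "atr": "ATR",
    "pas": "Photoacoustic",
}


def describe_keys(keys):
    """Map each data key to its description, preserving input order."""
    return {key: DATA_KEY_LABELS.get(key, "Unknown data key") for key in keys}


# Example input mimicking opus_file.all_data_keys.
described = describe_keys(["rf", "r", "sm"])
for key, label in described.items():
    print(f"{key}: {label}")
```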