Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to read only the Header Unit and not the Data Unit? #30

Open
airnandez opened this issue Nov 20, 2017 · 10 comments
Open

Comments

@airnandez
Copy link
Contributor

airnandez commented Nov 20, 2017

I'm evaluating astrogo/fitsio for scanning the header units of hundreds of thousands of FITS files. I'm not interested in the contents of the data units.

I noticed that the method DecodeHDU reads not only the Header Unit but also its associated Data Unit.

This behavior is inconvenient for my use case. The structure of each one of the 800+K FITS files I need to scan is basically:

  • Each file has a single HDU
  • Size of the Header Unit: 14400 bytes
  • Size of the Data Unit: 18187200 bytes
  • Total file size: 18+ MB

Is there any way to tell astrogo/fitsio not to consume (i.e. read from disk) the data unit (i.e. 18MB in this particular case) but only the header unit (i.e. 14 KB in this particular case)? Given the number and the structure of the files I need to scan, that would greatly help.

@sbinet
Copy link
Contributor

sbinet commented Nov 20, 2017

right now it doesn't seem to be advisable with the current decoder (ie: fitsio.streamDecoder).

to implement this, we'd need to implement the fitsio.seekDecoder that can go back to where the data (associated with the header) was when somebody actually requests the data.
one also needs to slightly modify the imageHDU and other HDUs to handle the case where data is "lazily" loaded (or loaded just-in-time.)

I'll try to implement this on my way back to Clermont-Fd (tuesday night).
unless you beat me to it :)

sbinet added a commit that referenced this issue Nov 21, 2017
@sbinet
Copy link
Contributor

sbinet commented Nov 21, 2017

I have just pushed a branch lazy-image-loader that tries to address this, at least for the Image HDU.
could you try it out and report back?

I think I must try harder to provide a better API/interface for this though...

@airnandez
Copy link
Contributor Author

I tested branch lazy-image-loader and the result for me is the same: DecodeHDU() still requires the underlying io.Reader to read the full file contents.

I tested that by using the Reader below:

type UnbufferedReader struct {
	f     *os.File
	count int64
}

// NewUnbufferedReader returns a new UnbufferedReader object
func NewUnbufferedReader(f *os.File) *UnbufferedReader {
	return &UnbufferedReader{
		f: f,
	}
}

// Read implements io.Reader
func (ur *UnbufferedReader) Read(p []byte) (int, error) {
	count, err := ur.f.Read(p)
	if err == nil {
		ur.count += int64(count)
	}
	return count, err
}

// Close implements io.Closer
func (ur *UnbufferedReader) Close() error {
	fmt.Printf("Close: [%s] total bytes read: %d\n", ur.f.Name(), ur.count)
	return ur.f.Close()
}

I call fitsio.Open() as follows:

...
file, err := os.Open(path)
if err != nil {
     return err
}

f, err := fitsio.Open(NewUnbufferedReader(file))

Passing to fitsio.Open an analogous io.ReadSeeker also implies reading the underlying file entirely.

@sbinet
Copy link
Contributor

sbinet commented Nov 23, 2017

Ok... It occurred to me I only implemented it for image hdu.
Do you have other hdu in there?

(In any case, I'll try with your reader impls. On Monday. I am on holidays, starting now.)

@airnandez
Copy link
Contributor Author

airnandez commented Nov 23, 2017

The file I'm testing with is actually an image and fitsio correctly detects that, which I know by printing the type of the header. There is a single HDU in that file.

Below is an excerpt of the FITS cards of my test file:

SIMPLE = true
BITPIX = 16
NAXIS = 2
NAXIS1 = 2144
NAXIS2 = 4241
EXTEND = false
BUNIT = ADU
BSCALE = 1
BZERO = 32768
BLANK = -32768
TIMESYS = UTC
...

@sbinet
Copy link
Contributor

sbinet commented Nov 29, 2017

ok. I've found the underlying issue.
the DecodeHDU method does correctly detect the io.Seeker and does correctly,lazily decode the Data unit.
and after the first HDU, the number of bytes read does correspond to the number of bytes that makes up a header (14400).
it's when the decoder tries to go to the next HDU: the attempt at parsing the header iterates over all the data...

let me fix that.

@sbinet
Copy link
Contributor

sbinet commented Nov 29, 2017

ok. done.

I am still not very happy with the way I have implemented the lazy loading.
I am pondering on whether I shouldn't split the implementation of the various HDU into a HU and a DU part (HU is already factored out as Header. we could do the same for the DU part)
so the actual loading of the DU part could happen when doing du := hdu.Data().(*fitsio.Table)

@airnandez
Copy link
Contributor Author

I just tested with branch lazy-image-loader. When passing an io.Seeker as the argument to fitsio.Open(), DecodeHDU() correctly consumes the part of the FITS file which contains the header unit and does not consume the data unit.

However, when using an io.Reader the whole HDU is consumed even if I'm only interested in retrieving the header. I suspect this is intentional.

In both cases, this different behavior deserves to be documented, so that as a user I understand the tradeoffs of using io.Seeker vs. io.Reader for consuming a FITS file.

@airnandez
Copy link
Contributor Author

Regarding the API, it would be clearer indeed to separate the Header part from the Data part so that one can retrieve them separately. Something like:

f := fitsio.Open(...)
hdr := f.HDU(0).Header()

// actually load the data unit of this HDU
data := f.HDU(0).Data()

@sbinet
Copy link
Contributor

sbinet commented Nov 29, 2017

Wrt the API: yes, that's what I was thinking as well.

Wrt the doc: yep, that also needs to be properly documented.

Wrt the io.Reader it's indeed the intended behavior: an io.Reader can only consume blocks of data, no seeking can be achieved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants