The parcel format

philipl edited this page Feb 19, 2014 · 17 revisions

Put simply, a parcel is a tarball (usually gzip compressed) that contains some defined metadata in addition to all the files being deployed in the parcel. If the metadata is ignored, there is no functional difference between the parcel and an equivalent tarball.

The metadata is what allows Cloudera Manager to manage the parcel.

What do we mean by "a parcel is a tarball"?

Strictly speaking, a tarball is just a tar archive that contains a bunch of files. There are no rules about the internal organisation of the files or what files are present or absent. A parcel is the same way: if we ignore the requirements around the metadata, there are no other requirements imposed on how you should organise the files in your parcel. If you already build tarballs for your service or application, you can probably turn them into parcels simply by adding the necessary metadata.
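As a rough illustration of "tarball plus metadata", the sketch below packs an existing directory into a gzip-compressed archive and adds one metadata file. The top-level `NAME-VERSION` directory, the `meta/` location, the `.parcel` file name and the two-field `parcel.json` are assumptions made for the example, not the authoritative layout; consult the metadata pages listed below for the real requirements.

```python
import io
import json
import tarfile
from pathlib import Path


def build_parcel(src_dir: Path, name: str, version: str, out_dir: Path) -> Path:
    """Pack src_dir into a gzip tarball and add a placeholder metadata file."""
    root = f"{name}-{version}"              # assumed top-level directory name
    out_path = out_dir / f"{root}.parcel"   # assumed output file name
    # Placeholder metadata only; the real parcel.json has more fields.
    meta = json.dumps({"name": name, "version": version}, indent=2).encode()

    with tarfile.open(str(out_path), "w:gz") as tar:
        # The application files go in unchanged, as in any plain tarball.
        tar.add(str(src_dir), arcname=root)
        # The metadata is simply one more member of the archive.
        info = tarfile.TarInfo(f"{root}/meta/parcel.json")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))
    return out_path
```

Because the application files are added untouched, the result can still be unpacked with any ordinary tar tool.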

How a parcel interacts with the rest of the system

A parcel simply contains a set of files, and Cloudera Manager deliberately places these out of the way, so their mere presence should not affect a running system beyond taking up space. Parcels get an opportunity to interact with the system when they are activated, and there are two mechanisms for this.

  • Registering alternatives with the OS
      • When creating a parcel, you can provide a list of alternatives entries that should be created when the parcel is active. These entries can establish symlinks in system-wide or otherwise well-known locations so that binaries or other files can easily be discovered.
      • e.g. Pig in the CDH parcel is exposed to end-users in this way.
  • Setting and extending environment variables for CM managed processes
      • When CM starts a process that has been configured to consume a parcel, the CM Agent will source a shell script provided by the parcel. At this time, the script can set or modify environment variables that affect the process being started.
      • e.g. The LZO parcel will extend $HADOOP_CLASSPATH so that MapReduce can find the codec.
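
The second mechanism amounts to appending to a path-style variable before the process starts. The parcel's own script is a shell script; the Python below only models the effect on the environment, and the jar path used in the usage note is hypothetical.

```python
def extend_path_var(env, var, entry, sep=":"):
    """Return a copy of env with entry appended to the path-style variable var."""
    updated = dict(env)                 # leave the caller's mapping untouched
    current = updated.get(var, "")
    # Avoid a leading separator when the variable was unset or empty.
    updated[var] = f"{current}{sep}{entry}" if current else entry
    return updated
```

For example, `extend_path_var(env, "HADOOP_CLASSPATH", "/opt/parcels/EXAMPLE_LZO/lib/hadoop-lzo.jar")` keeps any existing classpath entries and adds the codec jar at the end.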

The metadata

  • [parcel.json](The parcel.json file)
  • [Environment Script](The Parcel Defines Script)
  • [alternatives.json](The alternatives.json file)
  • [permissions.json](The permissions.json file)
  • release-notes.txt

Compression

As parcels are mechanically tarballs, there are a variety of potential compression formats available. However, Cloudera Manager only supports uncompressed and gzip compressed parcels.
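
A build script can check this constraint before publishing a parcel. The following is a minimal sketch, not how Cloudera Manager itself validates parcels: it looks for the gzip magic bytes and otherwise insists that the file opens as an uncompressed tar archive. The function name is made up for the example.

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parcel_compression(path: str) -> str:
    """Return "gzip" or "none"; raise ValueError for any other format."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    # "r:gz" and "r:" disable tarfile's auto-detection, so bzip2 or xz
    # archives are rejected instead of being silently accepted.
    mode = "r:gz" if is_gzip else "r:"
    try:
        with tarfile.open(path, mode) as tar:
            tar.getmembers()  # walk the archive to catch truncated files
    except tarfile.TarError as exc:
        raise ValueError(f"{path}: not an uncompressed or gzip tarball") from exc
    return "gzip" if is_gzip else "none"
```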

Cloudera Manager Assumptions

Cloudera Manager makes a set of assumptions about parcels which are not inherent in the format, but have to do with the actual application/service contained within it. If these assumptions cannot be respected, the application/service may not work correctly when deployed by Cloudera Manager.

  • The parcel is immutable
      • Configuring a service or running that service from a parcel should NEVER modify the contents of the unpacked parcel. As switching to a new parcel version involves switching to a new directory location, any changes made within the old parcel would be lost.
  • The parcel is relocatable
      • Cloudera Manager allows users to choose where, on the filesystem, parcels are deployed for use. If the parcel contents require that they be run from a fixed absolute path, then the parcel may not work at all, or may only work when CM is configured to use that exact location as the parcel deployment directory. As implied by immutability, relocatability cannot be achieved by editing files inside the parcel. The location of the parcel will be provided to the parcel environment script (see the metadata list above) so that it can be known at runtime. If the contents of the parcel do require knowledge of their absolute location, they should use the environment to obtain this information.
  • The service should be able to read its configuration from a configurable external location
      • As is generally implied by the previous assumptions, the service configuration must be externally provided. A good example is Hadoop itself, where an environment variable or command line argument is used to specify the location of a directory containing all the config files.
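
Taken together, these assumptions suggest a launcher that learns both locations from its environment rather than from hard-coded paths. The sketch below assumes two hypothetical variables: `MYAPP_PARCEL_ROOT`, which the parcel's environment script would export, and `MYAPP_CONF_DIR`, which points at externally provided configuration. Neither name, nor the `bin/myapp` and `myapp.conf` paths, comes from Cloudera Manager.

```python
import os
from pathlib import Path


def resolve_paths(env=os.environ):
    """Locate the binary and config file without assuming a fixed install path."""
    parcel_root = Path(env["MYAPP_PARCEL_ROOT"])  # hypothetical variable name
    conf_dir = Path(env["MYAPP_CONF_DIR"])        # hypothetical variable name
    # Configuration inside the parcel would be lost on upgrade and would
    # tempt the service to modify an immutable directory.
    if conf_dir == parcel_root or parcel_root in conf_dir.parents:
        raise ValueError("configuration must live outside the parcel")
    return parcel_root / "bin" / "myapp", conf_dir / "myapp.conf"
```

Nothing here writes into the parcel directory, so the same parcel works wherever it is deployed.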

Pre-Cloudera Manager 5 parcels (aka: schema_version == 0)

Parcels were first introduced in Cloudera Manager 4.5, and it was theoretically possible for third parties to build parcels from day one. However, this was never formally documented, and without CSDs, the set of things you could use parcels for was quite limited.

The documentation you are reading was written to cover parcels as they are defined and used in Cloudera Manager 5.0. There are some differences in the metadata format, primarily to remove unnecessary and confusing elements which were present in the "version 0" format. Consequently, parcels that conform to this documentation and the associated validation tool will not be usable by Cloudera Manager 4.x, although old parcels are usable by CM5.

Technical differences

Removed fields

  • minPrevVersion
  • maxPrevVersion

These two fields were never used in CM, but had to be present in parcel.json for a valid parcel. To reduce confusion, they have been removed from the schema.
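
If you maintain an older parcel.json, dropping these fields can be scripted. This is a minimal sketch that assumes both fields sit at the top level of the JSON document; it does not handle any other differences between the two schema versions.

```python
import json

REMOVED_FIELDS = ("minPrevVersion", "maxPrevVersion")


def drop_removed_fields(parcel_json_text: str) -> str:
    """Return parcel.json text without the two fields removed in CM 5."""
    meta = json.loads(parcel_json_text)
    for field in REMOVED_FIELDS:
        meta.pop(field, None)  # fine if the field is already absent
    return json.dumps(meta, indent=2)
```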

Alternatives handling

The one significant functional difference was the introduction of the alternatives.json file as a replacement for an alternatives script. In the older format, alternatives were handled by a script in the parcel that CM would run. This required the parcel author to use the update-alternatives command correctly, and it allowed the script to do things unrelated to alternatives. That could lead to problems down the line, especially if a user disabled alternatives handling while the script also did things a program required in order to run.

In contrast, the alternatives.json file simply contains entries that correspond to the arguments passed to the update-alternatives command. Migration should require nothing more than transferring your current update-alternatives calls to new entries in this file.
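
To make the correspondence concrete, the sketch below turns one entry into the argument list of `update-alternatives --install <link> <name> <path> <priority>`. The entry field names used here (`destination`, `source`, `priority`) are assumptions for illustration; check the alternatives.json page for the actual schema.

```python
import posixpath


def to_update_alternatives_args(name, entry, parcel_root):
    """Build the update-alternatives call that one entry stands for."""
    return [
        "update-alternatives", "--install",
        entry["destination"],                          # <link>: the system-wide symlink
        name,                                          # <name>: the alternatives group
        posixpath.join(parcel_root, entry["source"]),  # <path>: file inside the parcel
        str(entry["priority"]),                        # <priority>
    ]
```

Read in reverse, the same mapping shows how to migrate: each existing `--install` call becomes one entry whose values are the link, the path relative to the parcel root, and the priority.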