Why you should use parcels

Aravind Selvan edited this page Aug 18, 2017 · 4 revisions

As [previously discussed](Parcels: What and Why?), we designed parcels to address challenges that we, at Cloudera, faced when distributing CDH across a cluster and upgrading it in a coordinated way. We also needed to account for additional services (such as Impala, which wasn't initially part of CDH) and plugins (such as LZO).

These challenges apply equally to any other services or plugins that you need to deploy across a cluster that's running CDH and managed by Cloudera Manager, so it's worth considering whether parcels can help you too.

Distributing a set of bits to many machines is boring but non-trivial

Copying bits around is not exciting, but it's very painful if you don't have any tools that automate the process. There are a lot of tools out there, but that also means that not everyone uses the same ones. Some of your users may be using Puppet, Chef, Satellite server, or even a terrifying contraption built out of parallel SSH and some shell scripts. Under such circumstances, it's impossible to provide a turn-key way for your bits to get distributed as part of their existing infrastructure. You can provide some documentation on how to integrate with various tools, but the only way to provide an end-to-end solution is to write it yourself.

We went through that exercise at Cloudera, and parcels were the result. With that work done, we can offer parcels as a mechanism for everyone else to use when deploying bits to a CDH cluster, regardless of what other systems, if any, the user might have in place.

Once you provide a parcel, Cloudera Manager can handle distribution of that parcel, as well as upgrading to future versions, or eventually removing it.

What you can distribute

There are three primary content types that parcels can effectively address and, unsurprisingly, all three are present in various Cloudera parcels.

  • Binaries and support files for services that should run on a managed cluster (e.g. HDFS, Impala, Spark)
  • Plugins/Extensions for services (e.g. LZO codec, Hive UDFs, HBase co-processors)
  • Client tools that interact with services (e.g. Sqoop, Pig)

If you are providing one or more of these to a CDH cluster, parcels are a good fit for you.

In the case of plugins and extensions, parcels are especially compelling, as they avoid the need for the end user to do manual configuration to pick up your components. As a best practice, and especially if the end user is already using parcels for CDH, you should not place these components into the lib directories of the services you are extending. It is much safer to place your files in private directories and extend the appropriate environment variables to let the service pick them up. Parcels allow you to do this automatically, rather than having to ask the user to customize those variables by hand.
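
The "private directory plus environment variable" approach can be sketched as follows. This is only a minimal illustration in Python of the general idea, not a description of how Cloudera Manager or a parcel does it; the helper name `extend_path_var`, the variable `HADOOP_CLASSPATH`, and the directory `/opt/example-plugin/lib/*` are all assumptions chosen for the example.

```python
import os


def extend_path_var(env, name, private_dir, sep=os.pathsep):
    """Return a copy of env with private_dir appended to the path-like variable `name`.

    Existing entries are preserved, and the directory is not added twice.
    The input mapping is left unmodified.
    """
    updated = dict(env)
    # Split the current value into its entries, ignoring empty segments.
    entries = [entry for entry in updated.get(name, "").split(sep) if entry]
    if private_dir not in entries:
        entries.append(private_dir)
    updated[name] = sep.join(entries)
    return updated


if __name__ == "__main__":
    # Hypothetical values: the service's existing classpath and a plugin
    # directory that lives outside the service's own lib directory.
    env = {"HADOOP_CLASSPATH": "/etc/hadoop/conf"}
    env = extend_path_var(env, "HADOOP_CLASSPATH", "/opt/example-plugin/lib/*")
    print(env["HADOOP_CLASSPATH"])
```

The point of the design is that nothing is copied into the service's own directories, so upgrading or removing the plugin never touches files that belong to the service.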

Parcels work well with CSDs

This is probably not a surprising statement, but if you already intend to provide a CSD to allow Cloudera Manager (CM) to manage a service, parcels are the best way to provide the binaries and other files that make up your service. The CSD can specify a parcel repository automatically, and once in place, users can trivially add your service to existing clusters and even create brand new ones with your service included from the get-go. Without parcels, the distribution exercise must take place outside of CM, and out-of-band.
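
As a rough sketch of what "the CSD can specify a parcel repository" might look like, the Python snippet below builds a JSON fragment of the kind a service descriptor could carry. The field names `parcel`, `repoUrl`, and `requiredTags` are assumptions that should be verified against the service descriptor reference, and the helper `parcel_section`, the URL, and the tag are placeholders invented for the example.

```python
import json


def parcel_section(repo_url, required_tags):
    """Build a parcel fragment for a hypothetical service descriptor.

    Field names are assumptions; check them against the CSD reference.
    """
    if not repo_url:
        raise ValueError("repo_url must not be empty")
    return {
        "parcel": {
            "repoUrl": repo_url,
            # Tags that a parcel would need to provide for this service.
            "requiredTags": list(required_tags),
        }
    }


if __name__ == "__main__":
    # Placeholder repository URL and tag, for illustration only.
    fragment = parcel_section("http://example.com/parcels/latest/", ["example-service"])
    print(json.dumps(fragment, indent=2))
```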

Note: It is not required that the Cloudera Manager server host be part of a managed cluster and have an agent installed. Although you initially copy the CSD file to the Cloudera Manager server, the parcel for the add-on service will not be installed on the Cloudera Manager server host unless the host is managed by Cloudera Manager.