Ganglia Wish List

Ng Zhi An edited this page Jul 27, 2014 · 1 revision

The following is a list of features that we would like to implement in future versions of Ganglia. If you are interested in helping out, please ping us on IRC in #ganglia on irc.freenode.net or email us at [email protected] (subscription required).

The original discussion thread which started the wish list can be found here: http://www.mail-archive.com/[email protected]/msg03070.html

Ganglia To Do

  • Integration of gexec/authd
  • Expand gstat nodelist parameter query options (e.g. return all hosts with <10% iowait)
  • Native Windows Port
  • Integrated configuration management
  • Stable libganglia interface
  • Bindings to libganglia for scripting languages (to be used mainly by modules)
  • CLI to easily browse and download Python DSO metric plugins hosted at GitHub -- let's call it GPAN (Ganglia Plugin Archive Network) :-)
  • Quickstart guide in Wiki

GMond To Do

  • Implement a Perl module interface - Develop a Gmond module that will allow metric modules written in Perl to be plugged into Gmond (see the Python interface module). In progress, see: http://www.mail-archive.com/[email protected]/msg05655.html
  • Implement a PHP module interface - Develop a Gmond module that will allow metric modules written in PHP to be plugged into Gmond (see the Python interface module). In progress, see: http://www.mail-archive.com/[email protected]/msg05664.html
  • Implement a Ruby module interface - Develop a Gmond module that will allow metric modules written in Ruby to be plugged into Gmond (see the Python interface module).
  • Implement a .NET module interface - Develop a Gmond module that will allow metric modules written in Visual Basic or C# to be plugged into Gmond (focused on running Ganglia on Windows, where scripting support is poor).
  • Implement a Lua module interface - Develop a Gmond module that will allow metric modules written in Lua to be plugged into Gmond (see the Python interface module).
  • Implement a Tcl module interface - Develop a Gmond module that will allow metric modules written in Tcl to be plugged into Gmond (see the Python interface module).
  • Metric packing: Allow a UDP packet to contain multiple metrics (using the usual XDR stream decoding), up to the size of a UDP packet. This would help reduce the overhead of sending many metric updates concurrently. It also preserves the current gmond behaviour of sending a metric update in a single UDP packet.
  • Support for counters (metrics with +ve slope). This shouldn't require much work (from memory: make sure the slope-type information is preserved and patch gmetad to create RRD files with the correct options). Currently Ganglia doesn't actually support custom counter metrics, which is an awkward limitation.
  • Switch Gmond to a non-blocking IO model. If there is a large number of metric updates, gmond must process them "quickly" or they will be lost. If this happens whilst gmond is sending XML data to gmetad there may be a delay, increasing the risk of metric update messages being lost. Switching to a non-blocking IO model would allow gmond to respond preferentially to the incoming UDP messages.
  • Add more memory metrics such as slab, buffers, dirty, writeback, cache_clean (= cached - (dirty + writeback)), mapped, free
  • All metric modules should be changed to report their metrics in their base units (i.e. bytes, not kilobytes or megabytes), and all unit conversions should happen on the display side. For the sake of historical compatibility there should be a way to toggle this behavior in grandfathered non-conforming metric modules, e.g. modmem.so
  • Allow hosts to report their own hostname instead of depending on gmond's own reverse resolution.
  • Send-only version of Gmond written in Python (suggested by vvuksan)
  • Abstract the collection code from gmond so that it can take metrics from external sources. Currently, the code in trunk supports metrics sent via hsflowd (http://host-sflow.sourceforge.net/) using the sFlow (http://sflow.org/) protocol. This could be further extended to support:
    • collectd (http://collectd.org/)
    • collectl (http://collectl.sourceforge.net/)
    • StatsD (https://github.com/etsy/statsd)
  • Implement a DTrace (http://en.wikipedia.org/wiki/DTrace) plugin for gmond on OSes that support it (Solaris, Mac OS X, etc.)
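
As a rough illustration of the metric packing idea above, here is a minimal Python sketch. It assumes a made-up record layout (an XDR-style string for the name followed by an 8-byte double), not gmond's real wire format, and the names `encode_metric`, `pack_datagrams`, `decode_datagram` and the 1472-byte payload limit are hypothetical. Because each record is self-delimiting, records can simply be concatenated and decoded as a stream, and a datagram holding a single record still decodes the same way.

```python
import struct

# Assumption: 1500-byte Ethernet MTU minus IPv4 (20) and UDP (8) headers.
MAX_PAYLOAD = 1472


def xdr_string(text):
    """Encode a string XDR-style: 4-byte big-endian length, data, zero padding to 4 bytes."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def encode_metric(name, value):
    """Hypothetical record layout (name, double value); NOT gmond's real wire format."""
    return xdr_string(name) + struct.pack(">d", value)


def pack_datagrams(records, max_payload=MAX_PAYLOAD):
    """Greedily concatenate encoded records into as few datagrams as possible."""
    datagrams, current = [], b""
    for record in records:
        if len(record) > max_payload:
            raise ValueError("record larger than one datagram")
        if current and len(current) + len(record) > max_payload:
            datagrams.append(current)
            current = b""
        current += record
    if current:
        datagrams.append(current)
    return datagrams


def decode_datagram(payload):
    """Walk the payload as a stream, yielding (name, value) until it is exhausted."""
    offset = 0
    while offset < len(payload):
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        name = payload[offset:offset + length].decode("utf-8")
        offset += length + (4 - length % 4) % 4  # skip the data and its padding
        (value,) = struct.unpack_from(">d", payload, offset)
        offset += 8
        yield name, value
```

A sender would call `pack_datagrams` on the pending updates and pass each result to `socket.sendto`; a receiver would iterate over `decode_datagram` for every packet it reads.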

GMetad To Do

  • Support pluggable Gmetad modules (done for the Python version of Gmetad)
  • Rewrite Gmetad on top of APR
  • Support for new RRDTool which allows graphs to have dynamic sizes
  • Gilad's stacked graphs: http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=187
  • Better support for bigger, less frequent updates - One packet every 20 seconds per host for all data?
  • Multi-PB disk limit
  • Better on-disk RRD performance (rrdcached addresses this well! See https://sourceforge.net/apps/trac/ganglia/wiki/rrdcached_integration)
  • Allow for some sort of in-memory RRD (never written to disk) as an alternative storage for very extreme cases.
  • Name RRD directories after a UUID generated from the client gmond's MAC address (or something else?), so that renaming hosts or updating DNS or hosts files doesn't result in the history for the physical gmond client being lost.
  • Interface stats in bits? Self-awareness of interface capability, to allow % utilization stats for the network.
  • Something like a unique per-gmond instance identifier, to help with multi-homing and DNS issues, so that the IP address is no longer the index key. There was discussion of this under the subject "Overriding hostname" on the Ganglia-general list.
  • Give some metrics priority and have them updated more frequently in their RRDs than others.
  • Support for different sets of RRAs to support different densities of data per metric.
  • Let users manage different IO-bound pools for their metrics (for extreme cases, one based on tmpfs), so that metrics can be tied to the kind of storage IO capability needed for their update frequency.
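
One possible shape for the UUID-based RRD directory naming suggested above, sketched in Python. This is only an illustration under stated assumptions: the namespace string "ganglia.example" and the function name `rrd_directory_name` are hypothetical, and nothing here reflects how gmetad actually names directories. A name-based (version 5) UUID is used so that the same MAC address always maps to the same directory, regardless of hostname or DNS changes.

```python
import re
import uuid

# Hypothetical namespace; any fixed UUID would do as long as it never changes.
GANGLIA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ganglia.example")


def rrd_directory_name(mac_address):
    """Return a stable directory name derived from a MAC address.

    Separators and letter case are normalised first, so "00:1A:2B:3C:4D:5E"
    and "00-1a-2b-3c-4d-5e" yield the same name.
    """
    digits = re.sub(r"[^0-9a-fA-F]", "", mac_address).lower()
    if len(digits) != 12:
        raise ValueError("expected a 48-bit MAC address, got %r" % mac_address)
    return str(uuid.uuid5(GANGLIA_NAMESPACE, digits))
```

Keying on the MAC address has its own trade-offs (replaced network cards, hosts with several interfaces), which is presumably why the item above leaves "something else?" open.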

Web interface

  • GWeb 2.0 (new Ganglia Web Interface)

  • Numerous custom graph enhancements (Alex Balk, Timothy Witham, others)

  • Web frontend face lift

  • Mouse-over result graphs: the default cluster view should use text-only per-host squares, because loading 1700 little graphs chews up too much of the browser.

  • Better icons. The current highly-compressed JPEG files for the icons look horrible! Line-art perhaps suffers worst from JPEG compression artifacts. Could we not use either PNGs or (preferably) SVG?

  • Add an option to allow switching to SVG in-line RRDtool graphs. This should be pretty easy to add as a config option. I think support for SVG in current browsers is now "good enough". A half-way modern version of RRDTool can generate SVG versions of the graphs, which should look much better.

  • Have some standard way of describing custom graphs. There currently isn't a standard way of producing custom graphs; "custom" here means adding support for host-specific and cluster-specific graphs, and also some framework for describing those custom graphs. I have a solution that (at least) has the merit of both existing and working. Perhaps it isn't ideal, but the Ganglia web front-end should provide at least some standard hooks, if not an actual framework.

  • Have the option to switch off displaying all the single-metric graphs. If you have ~300 metrics, the little graphs at the page bottom are all but useless. They slow down the loading of the page without adding much insight. (Partially done)

  • Fix the pie-chart-generating code. The current pie-chart code is a bit ugly and can plot things incorrectly under certain circumstances. There must be some nicer graph plotting packages out there...

  • Allow labeling hosts by arbitrary static metrics, e.g. IP or MAC address instead of hostname
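
For the SVG graph option mentioned in the list above, the switch could come down to passing a different `--imgformat` value to `rrdtool graph`. The Python sketch below only builds the command line; the function name `rrd_graph_command`, the `use_svg` flag and the data source name "sum" are assumptions made for illustration, and the real web front-end is not written in Python.

```python
def rrd_graph_command(rrd_path, out_path, use_svg=False, ds_name="sum"):
    """Build an `rrdtool graph` argument list for a single-metric graph.

    `use_svg` stands in for a hypothetical config option; `ds_name` is an
    assumed data source name. Paths containing ":" would need escaping
    before being placed in the DEF argument.
    """
    image_format = "SVG" if use_svg else "PNG"
    return [
        "rrdtool", "graph", out_path,
        "--imgformat", image_format,
        "--start", "-1h",
        "DEF:metric=%s:%s:AVERAGE" % (rrd_path, ds_name),
        "LINE1:metric#0000FF:value",
    ]
```

The resulting list can be handed to `subprocess.run`, with the page embedding the SVG output in place of the PNG image when the option is on.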