Puppet and KairosDB - kairosdb Module

The fatdragon/kairosdb Puppet module installs KairosDB and manages multiple KairosDB instances running on the same host.

It is an open source module, hosted on GitHub and published under the Apache 2.0 license. Speaking of licensing, it is important to mention that, depending on the use_highcharts parameter (explained below), the module may also install the highcharts script, which is a commercial product, so you should check its licensing options. A free license is offered for non-commercial use. The module is also published on Puppet Forge.

Why Another Module?

Although the module solves a few more issues that I'll explain below, the main reason I created it, and its main advantage, is that it can manage multiple KairosDB instances running on the same host.

To explain why this is so important to me, and how it might be useful for you, I'll describe my use case. I'm using KairosDB with Cassandra for a few different purposes:

  • For storing Forex prices;
  • For storing sports betting odds (prices);
  • For storing application metrics in my infrastructure.

All of these have something in common: time series data. But they are different enough from each other that I wanted a clear separation (different databases, different replication rules, etc.). To accomplish that I needed multiple KairosDB instances. Still, I didn't want to dedicate separate machines, since the instances are unlikely to all be heavily loaded at the same time and can comfortably live together on the same host.

The module also resolves another issue. At the time of this writing, the out-of-the-box KairosDB installation from the deb package comes with a poor service script (init.d), which makes the service difficult to manage. The issue is described here. This module patches the service script (if the patch_initd parameter is set to true) and resolves the issue.

Finally, I have to be honest and admit one risk introduced by the module: as far as I know, KairosDB does not officially support deployments with multiple instances running on the same host. You can ask the KairosDB team for their opinion on the subject, if you want. But one thing is for sure: the multi-instance deployment configured by the module does work in my production environment without problems.

Implementation

There is one thing about the implementation you should be aware of. When you install KairosDB in the usual way, you get one kairosdb service that runs one (and only one) KairosDB instance. By contrast, the module installs multiple services, one for every KairosDB instance. For example, if you have two instances configured (myinst01 and myinst02), you'll also have two services (kairosdb-myinst01 and kairosdb-myinst02). Every service can be managed separately. In fact, the instances are completely isolated from one another: besides using different services, they also use different cache directories, different log directories, different PID files, etc.

Existing Deployments

I'll start with a recommendation:

If you stick with this recommendation, you can skip the rest of this section.

If you already have KairosDB installed on the server, you should be careful when introducing this module. Your installation probably uses the kairosdb service and, as mentioned above, this module expects a deployment with multiple services. In fact, if the module finds a kairosdb service on the managed machine, it will stop and uninstall it. For this reason I recommend reinstalling KairosDB, which is an easy task since a KairosDB installation is pretty much stateless - the data is stored elsewhere, and the new KairosDB installation will simply continue using the same data.

Another thing you should be aware of, if you have an existing KairosDB installation and you don't want to let the module reinstall it, is that on its first run the module will copy the existing configuration files and scripts from your installation directory and use them as templates for all instances created in the future. For this reason, before running the module for the first time, you should ensure that your installation directory contains unchanged, default configuration files and scripts.

Prerequisites

Prior to installing KairosDB you should have Java installed and JAVA_HOME configured.
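
One way to satisfy this prerequisite from Puppet itself is sketched below. It is only an illustration and not part of the kairosdb module: the package name openjdk-7-jdk, the JAVA_HOME path and the /etc/profile.d/java_home.sh file are assumptions for an Ubuntu host and will differ on other systems. How JAVA_HOME has to be exposed so that the KairosDB service actually sees it also depends on your system, so treat the profile script as a placeholder.

```puppet
# Assumed package name for an Ubuntu host; adjust to your distribution.
package { 'openjdk-7-jdk':
  ensure => installed,
}

# Assumed location; point JAVA_HOME at wherever your JDK actually lives.
file { '/etc/profile.d/java_home.sh':
  ensure  => file,
  mode    => '0644',
  content => "export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64\n",
  require => Package['openjdk-7-jdk'],
}

# Make sure Java is in place before the KairosDB instance is set up.
kairosdb::instance { 'kdb01':
  properties_set => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
  },
  require        => File['/etc/profile.d/java_home.sh'],
}
```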

Configuring the Module

Configuring the module is actually very simple. Before starting with the examples, let's assume that the Cassandra cluster we'll use consists of two nodes: 192.168.1.1 and 192.168.1.2.

Single Instance, Minimal Configuration Example

A minimal configuration example based on the Cassandra datastore:

kairosdb::instance { 'kdb01':
  properties_set => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
  },
}

That's it. This will install and configure one KairosDB instance (named 'kdb01') with default configuration options. You should be able to access the instance on the standard KairosDB ports (HTTP REST 8080 and telnet 4242). You can also access the standard web interface at http://servername:8080, of course.

Two Instances, Minimal Configuration Example

In a similar way you can configure two instances on the same host, both using the same Cassandra datastore:

kairosdb::instance { 'kdb01':
  properties_set => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
  },
}

kairosdb::instance { 'kdb02':
  telnetserver_port => 4243,
  jetty_port        => 8081,
  properties_set    => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
  },
}

In this case, however, we had to specify ports for the second instance (the two instances cannot both use the default ports).

If you've read everything carefully, and if you are a fast-thinking kind of guy, you've noticed that this example is actually useless. We are running two KairosDB instances on the same host, and they target the same database, so there is no benefit at all. Congratulations if you noticed that! The next examples will be more useful.

Two Instances with Different Cassandra Keyspace

We can configure two KairosDB instances to use the same Cassandra cluster, but different keyspaces in the following way:

kairosdb::instance { 'kdb01':
  properties_set => {
    'kairosdb.datastore.cassandra.host_list'          => '192.168.1.1:9160, 192.168.1.2:9160',
    'kairosdb.datastore.cassandra.keyspace'           => 'myimportantdata',
    'kairosdb.datastore.cassandra.replication_factor' => 2,
  },
}
kairosdb::instance { 'kdb02':
  telnetserver_port => 4243,
  jetty_port        => 8081,
  properties_set    => {
    'kairosdb.datastore.cassandra.host_list'          => '192.168.1.1:9160, 192.168.1.2:9160',
    'kairosdb.datastore.cassandra.keyspace'           => 'mynotimportantdata',
    'kairosdb.datastore.cassandra.replication_factor' => 1,
  },
}

In this case we've used different keyspaces with different replication factors. We can also set other configuration parameters this way, of course.

Here I want to note two things:

  • If keyspace is not specified (as in previous examples), the default 'kairosdb' keyspace is used;
  • If two or more instances target the same datastore, the datastore configuration has to be consistent among them. If they use different datastores, you can freely change the datastore configuration options.

Besides using different keyspaces we could also use different Cassandra clusters, of course.
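
As a sketch of that last point, the two instances below point at two separate Cassandra clusters. The second cluster's addresses (192.168.2.1 and 192.168.2.2) are made up for illustration; only the first pair comes from the assumption stated earlier.

```puppet
# Instance backed by the first Cassandra cluster.
kairosdb::instance { 'kdb01':
  properties_set => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
  },
}

# Instance backed by a second, hypothetical Cassandra cluster.
kairosdb::instance { 'kdb02':
  telnetserver_port => 4243,
  jetty_port        => 8081,
  properties_set    => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.2.1:9160, 192.168.2.2:9160',
  },
}
```

Because the datastores are different here, each instance is free to use its own keyspace and replication settings.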

Test Instance Example

Here I'll cover another use case that demonstrates a few more things. Let's assume that we have a KairosDB host with a production KairosDB instance based on Cassandra. For testing purposes we want to add another KairosDB instance that will be used occasionally. The testing instance will use the H2 datastore. We can accomplish this with:

kairosdb::instance { 'production':
  properties_set => {
    'kairosdb.datastore.cassandra.host_list'          => '192.168.1.1:9160, 192.168.1.2:9160',
  },
}
kairosdb::instance { 'test':
  telnetserver_port => 4243,
  jetty_port        => 8081,
  datastore         => 'h2',
}

After the testing is finished, we can turn off the testing instance in the following way:

kairosdb::instance { 'test':
  service_enable    => false,
  service_ensure    => 'stopped',
  telnetserver_port => 4243,
  jetty_port        => 8081,
  datastore         => 'h2',
}

Or we can completely remove the testing instance by setting:

kairosdb::instance { 'test':
  ensure => 'absent',
}

This example showed several features:

  • KairosDB instances can use different datastores (Cassandra and H2 in this example);
  • Instances can easily be added, managed and removed.

Host-Level Configuration

Although the most important configuration options are available at the instance level, some options (for example, the KairosDB version used) are available only at the host level. Host-level configuration is managed by the kairosdb class. Here's an example:

class { 'kairosdb':
  version => '1.1.1-1',
}

Note that none of the examples (except for the last one) declared this class. This is possible because, if the class is not specified, the module automatically includes it with default parameter values. This means that all the previous examples would work in exactly the same way if we had also specified the following:

class { 'kairosdb': }

On the other hand, if you declare the kairosdb class without any instance (without including any kairosdb::instance resource), KairosDB will be installed but no instance will be created. The server will be ready for creating instances, but until they are actually added no KairosDB service will run on it.
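
Putting the two levels together, a node manifest might look like the sketch below. The version value is the one from the example above. Placing the class declaration before the instances is a precaution rather than a documented requirement of the module: in Puppet, a resource-like class declaration generally has to be evaluated before anything else includes the same class, and the text above suggests that the instance resource includes kairosdb on its own.

```puppet
# Host-level settings, declared once per node.
class { 'kairosdb':
  version => '1.1.1-1',
}

# Instance-level settings; the instance relies on the class declared above.
kairosdb::instance { 'kdb01':
  properties_set => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
  },
}
```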

Key Points

The key points you should remember from these examples are:

  • Instances can be added / managed / removed very easily;
  • Every instance can be configured separately, so we can create any deployment we can think of.

Classes and Resources

The module exposes one class (kairosdb) and one resource (kairosdb::instance). There are a few more resources inside, but they are for internal use only, and you should not use them directly.

In this section I'll cover the available parameters, but before I do I'll introduce some terminology. Not only in this module but in general, there are two kinds of parameters:

  • Executive - These are parameters that change things in the system based on their values. For example, in some modules, when you specify a log directory location, the module will actually create this directory and store log files there. In this case we are pretty much free to choose the parameter value. The value has to follow some predefined rules (for example, if a directory is expected we have to provide a valid file system path), but we are free to pick the actual directory.
  • Informative - These are parameters that don't change things in the system; instead they simply inform the module about the state of the system. For example, in some other modules, when you specify a log directory location, the module will not create this directory (that's not its purpose), but will instead look for this directory and try to read log files from it. In such cases we cannot freely choose the value; we have to provide one that corresponds to the actual state of the system (the existing log directory in our example).

This terminology will help us better understand the parameters. For each parameter described below I'll also mark its type as e (executive) or i (informative).

kairosdb Class

The purpose of the class has already been explained above, so here I'll go through its parameters. For clarity I'll split them into a few groups. The first group contains the package-related parameters:

Parameter | Default Value | Type | Description
manage_package | true | e | Whether to manage the package (install KairosDB) or not.
version | '1.1.1-1' | e | Version of KairosDB that will be installed. It is used only if manage_package is set to true. The default value is the latest version of KairosDB at the time of this writing, but you should not rely on it to always point to the latest version available. It is hard-coded, and there is no 'latest' option. You should check which versions are available on GitHub and pick the one you want to install.
package_name | 'kairosdb' | i | Package name. It is used only if manage_package is set to true. It should match the actual package name, so most likely you'll want to leave this unchanged.
package_ensure | 'present' | e | The value that will be used for the ensure parameter of the package resource. It is used only if manage_package is set to true.

The second group are file system parameters:

Parameter | Default Value | Type | Description
install_dir | '/opt/kairosdb' | i | KairosDB installation directory.
conf_base | '/etc/kairosdb' | e | Base directory which will contain configurations of all the instances running on the host. Every instance will have its own subdirectory in this base directory.
tmpdir | '/tmp' | e | Temporary directory where KairosDB query cache will be stored.

Note that the values of these parameters have to be specified without a trailing slash.
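
For illustration, here is how the two executive parameters might be overridden. The directories /srv/kairosdb/conf and /var/tmp are arbitrary choices for this sketch; install_dir is informative, so it is left at its default to match wherever the package actually installs KairosDB.

```puppet
class { 'kairosdb':
  # Executive: base directory for per-instance configuration
  # subdirectories (note: no trailing slash).
  conf_base => '/srv/kairosdb/conf',
  # Executive: query cache directories will be placed under this directory.
  tmpdir    => '/var/tmp',
}
```

With these values, and following the rules described in this post, an instance named kdb01 would keep its configuration under /srv/kairosdb/conf/kdb01.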

The third group contains parameters related to the aforementioned init.d script patch. These parameters are used only on Debian family OSes (Debian and Ubuntu). On RedHat family OSes (RedHat and CentOS) they are ignored.

Parameter | Default Value | Type | Description
patch_initd | true | e | Whether to implement the patch or not. Until people from KairosDB resolve the issue, I recommend using this patch (leaving default true).
init_functions | '/lib/lsb/init-functions' | i | Location of init-functions file on the system. You can check if you want but the default value should be correct on Ubuntu.

The last group contains parameters related to visualization in the KairosDB web interface. As you may read here, KairosDB supports two visualization JavaScript libraries: a free one and the commercial highcharts.

By default the module (like KairosDB itself) will install the free library, but if you want to use highcharts, you can enable it with the following parameters:

Parameter | Default Value | Type | Description
use_highcharts | false | e | Whether to use highcharts or not.
highcharts_acknowledge | undef | e | Acknowledgement that you are aware that highcharts is a commercial product. This parameter is a bit strange, so I'll explain it below.

Let's explain the strange highcharts_acknowledge parameter. Embedding commercial products in an open source solution is generally a bad idea, since someone can start using the commercial product without being aware that doing so violates its license. But since KairosDB itself includes highcharts in some way, I've decided that the module should support it too. However, if you tell the module to install highcharts (by setting use_highcharts to true), you'll get warnings on every puppet agent run, telling you that you are using a commercial product. The only way to stop these annoying warnings is to acknowledge that you are aware of this fact, which you do by setting the highcharts_acknowledge parameter as follows:

class { 'kairosdb':
  use_highcharts         => true,
  highcharts_acknowledge => 'I am aware that highcharts.js is a commercial product, and that an appropriate license is needed for its usage.',
}

No matter how strange it may look, you should simply copy this exact text and the warnings will stop. It's an ugly, long line, I know, but it's what I came up with.

kairosdb::instance Resource

Again, the purpose of this resource is obvious from the examples above, so here I'll simply enumerate and explain available parameters.

The first, invisible parameter to explain is name (the Puppet resource name). Obviously it represents the name of the KairosDB instance that will be managed. Be aware that the same name is used for creating some directories, so don't go crazy with special characters. Try to use a single-word name that starts with a letter and contains only letters, underscores and numbers (and dashes if you have to).
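
If you generate instance names from data, a small wrapper can enforce this naming advice before the instance is declared. The wrapper below is purely hypothetical (the name profile::kairosdb_instance and its pattern are not part of the module), and it forwards only the two port parameters to keep the sketch short.

```puppet
# Hypothetical wrapper, not part of the kairosdb module.
define profile::kairosdb_instance (
  $telnetserver_port = 4242,
  $jetty_port        = 8080,
) {
  # Accept names that start with a letter and continue with letters,
  # digits, underscores or dashes.
  if $name !~ /^[A-Za-z][A-Za-z0-9_-]*$/ {
    fail("Unsuitable KairosDB instance name: '${name}'")
  }

  kairosdb::instance { $name:
    telnetserver_port => $telnetserver_port,
    jetty_port        => $jetty_port,
  }
}

# Usage: passes the check and declares the kdb01 instance.
profile::kairosdb_instance { 'kdb01': }
```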

Here I won't group the parameters; instead I'll put all of them in a single table:

Parameter | Default Value | Type | Description
manage_service | true | e | Whether to manage service (ensuring / enabling) or not.
service_ensure | 'running' | e | The value that will be used for ensure parameter of service resource. It will be used only if manage_service is set to true.
service_enable | true | e | The value that will be used for enable parameter of service resource. It will be used only if manage_service is set to true.
telnetserver_port | 4242 | e | Telnet port that will be used by the instance.
jetty_port | 8080 | e | Jetty (HTTP REST) port that will be used by the instance.
datastore | 'cassandra' | e | The datastore that will be used by the instance. Acceptable values are 'cassandra', 'h2' and 'remote'.
properties_set | undef | e | Hash of key/value pairs that will be set in kairosdb.properties file.
properties_remove | undef | e | Array of keys that should be removed from kairosdb.properties file.

I need to explain the last two parameters in more detail. Obviously they are related to the kairosdb.properties file, and you can check the original documentation for more details about the available options. It is important to understand that we can use the properties_set and properties_remove parameters to define almost all possible configuration entries. The only kairosdb.properties entries that you should not specify this way are:

  • kairosdb.service.datastore - it will be set by the module based on specified datastore value. Actually, datastore parameter is nothing more than shorthand (easier) way for you to specify kairosdb.service.datastore;
  • kairosdb.jetty.port - it will be set by the module based on the specified jetty_port parameter. I've exposed this parameter separately because I expect that it will be changed often (and to remind you that different ports need to be specified for different instances);
  • kairosdb.telnetserver.port - it will be set by the module based on specified telnetserver_port parameter. The same explanation as in the previous case stands here;
  • kairosdb.query_cache.cache_dir - it will be set by the module based on tmpdir parameter of kairosdb class and the instance name. For example, if tmpdir value is /tmp, and the instance is named inst1, KairosDB query cache directory will be /tmp/kairos_cache_inst1.
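
To make the rule concrete, the sketch below sets the datastore and the ports through the dedicated parameters and keeps properties_set for everything else. The instance name, the port numbers and the 'metrics' keyspace are arbitrary values chosen for this illustration.

```puppet
kairosdb::instance { 'inst1':
  # Use the dedicated parameters instead of putting
  # kairosdb.service.datastore, kairosdb.jetty.port or
  # kairosdb.telnetserver.port into properties_set.
  datastore         => 'cassandra',
  jetty_port        => 8082,
  telnetserver_port => 4244,
  # Any other kairosdb.properties entry can go here.
  properties_set    => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
    'kairosdb.datastore.cassandra.keyspace'  => 'metrics',
  },
}
```

With the default tmpdir of /tmp, the query cache of this instance would end up in /tmp/kairos_cache_inst1, following the naming scheme described above.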

Since it is not included in any of the examples above, here is a properties_remove example. You'll want to use this parameter when you don't want to set a value for a configuration option, but rather remove the option from the configuration completely. For example, here's how you can remove the kairosdb.jetty.ssl.port configuration option:

kairosdb::instance { 'kdb01':
  properties_set    => {
    'kairosdb.datastore.cassandra.host_list' => '192.168.1.1:9160, 192.168.1.2:9160',
  },
  properties_remove => ['kairosdb.jetty.ssl.port'],
}

Instance Removal

One of the examples above already showed how to remove a KairosDB instance by setting ensure => 'absent'. Here I'll explain what happens to the data in that case. In general, setting ensure => 'absent' does not remove the datastore used by the instance. To be more precise, a Cassandra keyspace is never removed, but if you've used the H2 datastore, it might (or might not) be removed, depending on the location of the database. By default the H2 database is stored in the same directory as the instance's configuration data - in [CONF_BASE]/[INSTANCE_NAME] (e.g. /etc/kairosdb/myinstance). Since setting ensure => 'absent' removes this directory completely, the data is removed too. But if you've changed the actual data location (by setting kairosdb.datastore.h2.database_path), the data won't be removed.
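
If you want the H2 data of a test instance to survive removal of the instance, you could move the database out of the configuration directory, as in the sketch below. The path /var/lib/kairosdb-h2/test is a hypothetical location chosen for this example; it has to be writable by the user KairosDB runs as, which this sketch does not manage.

```puppet
kairosdb::instance { 'test':
  telnetserver_port => 4243,
  jetty_port        => 8081,
  datastore         => 'h2',
  properties_set    => {
    # Hypothetical location outside [CONF_BASE]/[INSTANCE_NAME].
    'kairosdb.datastore.h2.database_path' => '/var/lib/kairosdb-h2/test',
  },
}
```

Setting ensure => 'absent' later removes the instance's configuration directory but, per the explanation above, the database stored at the custom path is left alone.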

Feedback Needed!!!

Guys, this is my fourth Puppet module, and it has had about two thousand downloads in less than a week, yet I still haven't received any feedback. I've started to think that some bot is doing all these downloads. Don't be so quiet, please! You can post comments / questions here or on GitHub. Please do so! If you have nothing else to say, then a simple "Hi" will suffice. I need to know that real living people are on the other side. Thanks!