Leverage Lambda to manage Scaling Activities and Spot Instances

The use-case

A group of friends asked me for help as they were in need of a SysAdmin for their startup. Their web application, mostly API based, integrates a video editing tool, which allows users to upload videos, sounds, images etc. which they can then cut, edit and render. The major requirement for efficient production was to leverage a GPU. Historically, they were using OVH as their provider, but OVH's offering for GPU servers starts at 2K euros per month. Not really in their budget.

So, of course, I told them to go for AWS instead, where they could have GPU instances and pay only for the time they need them. After a few weeks of work, they had the necessary automation in place to have "Workers" running on GPU instances created whenever the SQS queue was growing. With CloudWatch and Scaling Policies, AWS was starting on-demand GPU instances. From 600 USD per month for a G2.2xlarge to just a few USD a day, savings were already significant. But as I was working on that with them, I wanted to go even further and use Spot Instances. For the GPU instances, it is a potential 75% saving on the compute-hour. For production as for development, it is a significant saving.


AWS Elastic Transcoder doesn't fit our use-case as more advanced stuff happens on the GPU side. Price-wise, we also calculated that Elastic Transcoder is not worth it when encoding more than 250h of HD content.

The problem

With the CloudFormation templates all ready, we simply duplicated the AutoScaling group and Launch Config for the GPU instances. We now had 2 ASG with the same set of metrics and alarms, one configured with an On-Demand Launch-Config and another with a Spot Launch-Config. But how could we decide which ASG should scale up first when the queue grows in messages and we need a GPU to compute results faster ? I could not find an easy answer with integrated dispatch within the AWS AutoScaling or CloudWatch services.

Possible solutions

Online we could find SpotInst, a company that manages the AutoScaling groups for you: whenever a scaling operation is necessary, it handles the "Spot or On-demand ?" question (at least, that's what I understand from the documentation). Of course, SpotInst offers a lot more services integrated with that, but I personally found it a little bit of an overkill for our use-case.

That is where the integration with CloudWatch and SNS, paired with Lambda as a consumer of our SNS topic, comes in and does it all for us with what I have called the "SpotManager".

The idea

As you probably already guessed, the Spot Manager is my Lambda function which will decide which of the AutoScaling groups should trigger a scale-up activity. Here is an overview of the workflow :
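Reduced to its core, the decision can be sketched as a price comparison. This is not the published function: the names below are hypothetical, and the rule itself (prefer Spot while the recent average Spot price stays under the bid set in the Spot Launch Configuration) is an assumption about the simplest sensible policy.

```python
def choose_asg_to_scale(avg_spot_price, max_bid, spot_asg, ondemand_asg):
    """Pick the ASG whose scale-up policy should be triggered.

    avg_spot_price: recent average Spot price in USD/hour, or None if unknown.
    max_bid: the SpotPrice set in the Spot Launch Configuration, in USD/hour.
    spot_asg / ondemand_asg: names of the two AutoScaling groups.
    """
    # Prefer Spot only when the market price is known and below our bid.
    # Otherwise the Spot request would likely stay unfulfilled, so we fall
    # back to the On-Demand group to get a worker started.
    if avg_spot_price is not None and avg_spot_price < max_bid:
        return spot_asg
    return ondemand_asg
```

The On-Demand group is the fallback on purpose: a job waiting in the queue costs more than the price difference for a few hours.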


How to?

For this solution, we will need:

  1. Our 2 Autoscaling Groups
    1. Identical Launch Configuration apart from the SpotPrice
    2. Scale up policy configured with no alarm
    3. Scale down policy configured with CW Alarm
  2. SNS Topic to get messages / notifications from CloudWatch
  3. CloudWatch alarms on the Queue
    1. Alarm to raise "Jobs to do" signal
    2. Alarm to raise "No jobs anymore" signal
  4. "SpotManager" Lambda Function

In terms of pricing, the EC2 side of things is purely hour-compute maths, so refer to the EC2 Spot pricing and EC2 On-Demand pricing. Good news : SNS delivery to Lambda costs 0 USD as referenced here, and for CloudWatch we count ~0.4 USD per month. The Lambda pricing depends on how long the function runs. Here, it might take up to 1 second per invocation so, per month, you probably won't go over the free tier.

Total cost : less than 1 USD per month per pair of ASG (Compute pricing excluded).

1 - The AutoScalingGroups

A. The Launch configurations

To simplify all the steps, I have published here a CloudFormation template that will create 2 autoscaling groups, as explained earlier, with identical Launch Configurations except that one has the property SpotPrice set.

B. Scale up policy

Here, also in the Cloudformation template provided, we create a scale-up policy : when triggered, this will add 1 instance to the AutoScaling Group by raising the value of the "Desired Capacity". With the on-demand ASG, nothing fancy will happen if you trigger it: the EC2 service will kick off a new instance according to the ASG properties and the Launch Configuration. Now, if you do the same for testing with the ASG configured for Spot Instances, you will notice that first, in the "Spot Instances" section of the Dashboard, the spot request is being evaluated: a Spot request is sent with the Max bid you are willing to pay for that instance type. If the current spot market allows it, this spot request will be granted and an instance will be created in your account.
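Triggering such a policy by hand (or from a Lambda function) is a single API call. Here is a minimal sketch using the boto3 AutoScaling client's execute_policy call; the wrapper name and the choice to honor the cooldown are my assumptions, and the client is passed in so the function can be tried against a stub.

```python
def trigger_scale_up(autoscaling_client, asg_name, policy_name):
    """Execute a scaling policy on one ASG.

    autoscaling_client: a boto3 client created with boto3.client("autoscaling")
    asg_name / policy_name: physical names of the group and of its policy
    """
    autoscaling_client.execute_policy(
        AutoScalingGroupName=asg_name,
        PolicyName=policy_name,
        # Assumption: respect the cooldown so that repeated alarm
        # notifications do not stack up scale-up activities.
        HonorCooldown=True,
    )
```

With the Spot ASG, the same call only raises the desired capacity; the Spot request still has to be granted before an instance shows up.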

C. Scale down policy

As we need machines for jobs coming in, we are also able to tell when we don't need any compute resources anymore. Depending on how you analyze the queue length, you should be able to determine pretty easily when there are no more messages for your workers to consume. Therefore, I have linked the Scale Down policy to the SQS Alarm. The good thing about an alarm is that the same alarm can trigger multiple actions. So here, as we have 2 ASG we want to treat the same way regarding scale-down, we instruct the alarm to trigger both ASGs' scale-down policies.


The alarm can trigger multiple actions, but remember that you need to configure a scaling policy on each individual ASG for it to work.


For the rest of the blog, I am going to work with 2 ASG. If you haven't already, create those with a minimum at 0, maximum at 1 and desired capacity at 0. No need to pay before we get to the real thing.

2 - The SNS Topic

SNS is probably one of the oldest services in AWS and doesn't stop growing in features. As a key component of the AWS eco-system, it makes it extremely easy to integrate other services as consumers of the different topics we can have. Here we go in our AWS Console :


Via the cli ?

aws sns create-topic --name mediaworkerqueuealarmsstatus
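The same thing from Python, as a small sketch around the boto3 SNS client's create_topic call (the wrapper name is hypothetical; the client is passed in as a parameter):

```python
def create_alarm_topic(sns_client, name="mediaworkerqueuealarmsstatus"):
    """Create the SNS topic (or reuse it if it already exists) and return its ARN.

    sns_client: a boto3 client created with boto3.client("sns")
    """
    # create_topic is idempotent: calling it again with the same name
    # returns the ARN of the existing topic.
    response = sns_client.create_topic(Name=name)
    return response["TopicArn"]
```

Keep the returned ARN at hand: it is what the CloudWatch alarm needs as its action.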

That was easy, right ? Let's continue.

3 - Cloudwatch and Alarms

For our demo, I have already created a queue called "demoqueue". From here, we have different metrics to work with. Here, I am going to use ApproximateNumberOfMessagesVisible. This number will stack up as long as messages reside in the queue without being consumed.


Remember that the metrics for the SQS service are updated only every 5 minutes. If for any reason you have to get the jobs started faster than CW can notify you, you will have to find a different way to trigger that alarm.

3A - Alarm "There are jobs to process!"

The new Cloudwatch dashboard released just recently makes it even easier to browse the Metrics and create a new alarm.

  1. Identify the metrics

On the CloudWatch dashboard, click on Alarms. There, click on the top button "Create alarm". The different metrics available appear by category. Here, we want to configure the SQS metrics.

  2. Configure the threshold
  3. Check the alarm summary
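If you prefer to script the three steps above, the alarm boils down to one set of parameters for the CloudWatch put_metric_alarm call. The sketch below only builds those parameters; the alarm name, the statistic, the threshold of 1 message and the single 5 minute evaluation period are assumptions to adapt to your queue.

```python
def jobs_to_do_alarm(queue_name, topic_arn, threshold=1):
    """Build the keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": queue_name + "-jobs-to-do",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 300,  # SQS metrics are only refreshed every 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The alarm notifies the SNS topic, which in turn invokes the Lambda.
        "AlarmActions": [topic_arn],
    }
```

Usage would look like `boto3.client("cloudwatch").put_metric_alarm(**jobs_to_do_alarm("demoqueue", topic_arn))`.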

3B - Alarm "Chill, no more jobs"

For that alarm, we are going to follow the same steps as for the previous alarm, but, we are going to use a different metric and configure a different action. Both our ASG have a scale-down policy. So, let's create that alarm.

    • Identify the metric
    • Configure the threshold
    • Set the alarm actions

4 - The SpotManager function

As explained earlier, I create about everything via CloudFormation, which allows me to leverage tags to identify my resources quickly and easily. That said, the function I share with you today is made to work in any region; the only thing you might have to implement to suit your use-case is how to identify the ASG.

The code

As usual, the code for the lambda function can be found here, on my github account. Be aware that this function is zipped with different files because I separated each different "core" function to be re-usable in different cases.

However, I have tried to get the best rating from pylint (^^) and to document each function's params/return, each of those named with, I hope, self-explanatory names.


The code shared here is really specific to working with CloudFormation templates and my use-case. I use SQS where you might simply use EC2 metrics, or any kind of metrics. Adapt the code to figure out the action to trigger.


This is the Python file that is going to look at each AZ where you have a subnet. For each of those, it is going to retrieve the average spot price over the past hour for the instance type you want to have.


You could have 3 subnets in 2 AZs within a 5 AZs region, so you actually can run instances within those 2 AZ only, hence why the script takes a VPC ID as parameter.
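To illustrate the averaging part, here is a minimal sketch working on the "SpotPriceHistory" list returned by the EC2 describe_spot_price_history call. It is not the file from the repository; the function name is hypothetical, and filtering on the AZs of your VPC subnets is left to the caller.

```python
from collections import defaultdict


def average_spot_price_by_az(price_history):
    """Average the Spot price per Availability Zone.

    price_history: list of dicts shaped like the "SpotPriceHistory" entries
    returned by EC2, each with "AvailabilityZone" and "SpotPrice" (a string).
    """
    prices = defaultdict(list)
    for entry in price_history:
        # EC2 returns the price as a string, e.g. "0.250000"
        prices[entry["AvailabilityZone"]].append(float(entry["SpotPrice"]))
    return {az: sum(values) / len(values) for az, values in prices.items()}
```

The history itself would come from something like `ec2.describe_spot_price_history(InstanceTypes=["g2.2xlarge"], StartTime=one_hour_ago, EndTime=now)["SpotPriceHistory"]`, with `ec2` being a boto3 EC2 client.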


Here is the CloudFormation parser that will read all information from the stackname you created the two ASG with. Those functions are mostly wrappers around existing boto3 ones to make it easier to get right to the information we are looking for. In our case, we are going to:

  1. Assume the stack we are looking for our ASG in could be nested. Therefore, we look in the given stack name and we find our ASG by their Logical ID as expressed in the template (ie: asgGPU)
  2. Once the ASG Physical ID names are found (ie: mystack-asgGPU-546540984) we can retrieve any sort of information
    1. Instances in the group ?
    2. Scaling Policies ?
    3. Any, but that's all we are looking for here ;)

Of course, we could have looked for the ScalingPolicy physical Ids right away from the CloudFormation template, but just in case you misconfigured / mislinked the ASG and the ScalingPolicy (the policy is not there ?), this helps us verify that that's not the case and our ScalingPolicy is linked to the right ASG.
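The nested-stack lookup described above can be sketched as a small recursive function around the CloudFormation describe_stack_resources call. This is an illustration with a hypothetical function name, not the parser from the repository, and it ignores the fact that this API call returns at most 100 resources per stack.

```python
def find_physical_id(cfn_client, stack_name, logical_id):
    """Return the physical ID of a logical resource, searching nested stacks.

    cfn_client: a boto3 client created with boto3.client("cloudformation")
    Returns None when the logical ID is found nowhere.
    """
    resources = cfn_client.describe_stack_resources(
        StackName=stack_name
    )["StackResources"]

    # First look at the resources declared directly in this stack.
    for resource in resources:
        if resource["LogicalResourceId"] == logical_id:
            return resource["PhysicalResourceId"]

    # Then walk into nested stacks: their physical ID is the stack ID,
    # which describe_stack_resources accepts as StackName.
    for resource in resources:
        if resource["ResourceType"] == "AWS::CloudFormation::Stack":
            found = find_physical_id(
                cfn_client, resource["PhysicalResourceId"], logical_id
            )
            if found is not None:
                return found
    return None
```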


This is the central script from which all the others are going to be executed. Originally, this function was called directly by an Invoke API call providing most of the variables to the function. In the repository, you will find a file named spotmanager_sns.py which is the adaptation of the code to our use-case. The main difference is that we assume the topic name is a combination of the stack name (AWS::StackName) and other variables. That way we simply know which stack it runs for and we can find out the rest.

So here is the algorithm.
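For reference, this is roughly what the SNS flavour has to unpack before doing anything else. The sketch relies on the standard shape of an SNS event delivered to Lambda and of a CloudWatch alarm notification; the function name is hypothetical, and how the stack name is then derived from the topic name depends on your own naming scheme, so it is left out.

```python
import json


def parse_alarm_event(event):
    """Extract what we need from an SNS-delivered CloudWatch alarm.

    event: the dict Lambda receives when it is subscribed to an SNS topic.
    """
    sns = event["Records"][0]["Sns"]
    # CloudWatch publishes the alarm details as a JSON string.
    message = json.loads(sns["Message"])
    return {
        # The topic name is the last element of the topic ARN.
        "topic_name": sns["TopicArn"].split(":")[-1],
        "alarm_name": message["AlarmName"],
        "state": message["NewStateValue"],  # "ALARM", "OK" or "INSUFFICIENT_DATA"
    }
```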



Any Pull Request to make it a better function is welcome :)

The IAM Role

As for every lambda function, I create an IAM role to control in detail every access of each individual function to the resources. Therefore, here are the different statements I have set in my policy


Do not forget the AWS Managed policies AWSLambdaBasicExecutionRole and AWSLambdaExecute so you will have the logs in CloudWatch logs.

AutoScaling Statement

"Sid": "Stmt1476739808000",
"Effect": "Allow",
"Action": [
  "Resource": [

EC2 Statement

There are a few EC2 calls we need to authorize. Here, all those calls will help me identify the subnets, the Spot pricing and other information necessary to decide what to do next.

 "Sid": "Stmt1476739866000",
 "Effect": "Allow",
 "Action": [
 "Resource": [

CloudFormation statement

As explained, the scripts I have written call the API of CloudFormation to get information about the stack resources etc. This allows me to identify the ASG I want to scale-up.

"Sid": "Stmt1476740536000",
"Effect": "Allow",
"Action": [
"Resource": [

The full policy can be found here

How could we make it more secure ?

I built this function to work for all my stacks, hence why the resources are "all" (*). But if there is a risk that the function could go rogue or be exploited, we could do something very simple in our CloudFormation stack :

  • Create the policies and the role as described earlier specifying the resources as we created them in the CF Template.
  • Create the lambda function with the stack (requires having bundled the function as a zip file)

A bit of extra work for extra security. Just keep in mind that Lambda costs you a little bit for the code storage. But that is probably negligible compared to the financial risks of letting the function go rogue ?

Protect your CloudFormation sensitive values and secure them with KMS and DynamoDB

The use-case

CloudFormation is probably one of my favorite AWS services. It helps hundreds of people today with the deployment of all the architecture resources required for their applications to run on AWS : Instances, Databases etc.

I use CloudFormation all the time, as soon as there is a piece of architecture that I can use in multiple places, in different environments, this becomes my default deployment method (even just to deploy a couple VMs..).

Some of those resources require particular care : some of the parameters or some of the values have to be kept secret and as little human readable as possible.

Today I want to share with you a thought and the process I have decided to go with for those delicate resources that we might want to secure as much as possible, removing the human factor from the process. As part of the use-case, I want a fully-automated solution which I can re-use anytime and which guarantees that those values are never the same from one stack to another, as well as secured, both in the recovery sense and in the security sense.

Different approaches

Here, I am going to work with a very simple use-case : for my application, I need an RDS DB. That resource requires a password to get created and for consumers to connect to it afterwards.

1 - Generate from the CLI and hide the values

In our CloudFormation templates, we have the ability to set "NoEcho" on some of the parameters, so after creation we can't read the value. I find this really useful when I have some settings I have access to as an elevated administrator of the Cloud which I don't want others to be aware of (ie: the Zone ID of a Route53 managed domain). Generally speaking, those are values I could set a default for in the templates (assuming access to the templates is as restricted as the default value you want to keep from others) and which, once we describe the stack, won't be displayed in clear-text.

Pros :

  • Very easy to implement (NoEcho on the parameter)
  • You can set default values and restrict template access (for non authorized users, deny "cloudformation:GetTemplate")

Cons :

  • Values are known by authorized users and show up in clear text without an additional level of restriction
  • If you have set a default value, you could forget to change it
  • The values exist only in the templates, and stack updates might not affect the resources we wanted to update.

2 - Leverage Lambda, KMS and DynamoDB

With CloudFormation, we can create CustomResource(s) and point them to a Lambda function. This Lambda function will execute some code and get treated as any other resource in the Stack. Now, in our use-case where we have to set a Master password for our RDS Instance, we want that password to be different for every stack we create and to be stored somewhere so we can retrieve it, but as we store it, it has to be protected so it won't be human readable.

That is true for a password, but you could extend that function to generate, encrypt and store any sort of information. You might just want the randomness, or the encryption, or both. It's up to you.

Pros :

  • Every stack resource that needs some random value will get a new value every time
  • Each random value will be encrypted with an AWS managed encryption key (using AWS KMS), and DynamoDB will store it region-wise.
  • We can back up our DynamoDB table to an S3 bucket (most likely encrypted as well, with a different key) for recovery (and leverage S3 replication to back up globally).

Cons :

  • Can look like overkill for not so much (I honestly had that thought at first)
  • Requires a good understanding of how CloudFormation and Lambda custom resources work together.

At the end of the day, the cost of that solution is probably around 1 USD per month, for the KMS key + the Lambda function + DynamoDB storage. So, it is a neutral argument, unless you end up with a bazillion of stacks and stored resources. If you think that would be your case, see the At Scale section.

3 - Use S3 bucket

Update on : 2016-11-07 in response to Harold L. Spencer. Thanks Harold for that proposal ;)

Here, instead of going with DynamoDB to backup the passwords etc, Harold asked if it would be better to use S3 to store the passwords : as the stack is created, we would create the same kind of record in a file, which we would encrypt and store into a S3 bucket (itself encrypted). So, here is what I see as pros and cons:


Pros :

  • No need for DynamoDB, so it potentially removes the Capacity Units for reads and writes, which are more expensive than S3 Get/Put
  • S3 has a multi-region replication mechanism, so we can save our data where we need it
  • S3 has a versioning system, so we could version each new configuration if need be.


Cons :

  • No query capabilities in S3, so to find the file you are looking for, its key needs to be unique and you already need to know what that key is.
  • Depending on the file layout in the S3 bucket or on the payload file, the parsing necessary could make it more difficult to update / delete the file
  • Even with versioning, you might not be able to determine what went wrong if you corrupted the file (or at least it is as complicated as with DynamoDB)

At this point, I would agree that using S3 for storage could be a viable and even cheaper solution. However, as said in the At scale section, here is why I think the two are much alike:

  • For both DynamoDB and S3, you have to make a KMS call to encrypt and decrypt the payload that is going to be stored. Regardless of the scale, you call KMS the same way in both cases.
  • In this very particular use-case, the chances that the DynamoDB table read requirements go higher than the free-tier (25 Units) are extremely low.
  • With the right combination of automation, you can back up a DynamoDB table to one (or more) S3 bucket(s) as easily as you would an S3 bucket with replication.

How to ?

So, at this point, I have decided to use the second method for all the RDS resources I will create with CloudFormation. Here is what we need to do:

  1. Create a DynamoDB table (per region) we are going to use to store our different stacks passwords into
  2. Create a KMS key (0.53$ per month, so ..)
  3. Create the Lambda functions
    1. Create a Lambda function to generate, encrypt and store the password in DynamoDB
    2. Create a Lambda function to decrypt the key for both CloudFormation and any Invoke capable resource
  4. Create the cloudformation resources in our stack to generate all of the above

Here is a very simple diagram of the workflow our CloudFormation stack is going to go through to create our RDS resources.


1 - The DynamoDB table

Why DynamoDB ? Well, because it is very simple to use and very cheap for our use-case. Not to mention, you won't even go over the free-tier. First of all, DynamoDB is a NoSQL service that you can use directly via API calls as long as the consumer has permissions to write to / read from it. Very simple : we are going to create a table with a primary key and a sort key (to ensure we aren't doing anything stupid). The DynamoDB table structure is open to discussion. Please comment if you have suggestions :)

Create the table - Dashboard

In your Dashboard, go to the DynamoDB service. There, start to create a new table.

Create the table

With CloudFormation, I use extensively the "Env" tag to be able to identify all other resources via mappings etc. To create my table, I decided to pair the stack name (which is unique in the region, granted) and this env value. That way, it sort of ensures that I am not overwriting a key by mistake: otherwise, instead of creating a new item, the function would update the existing one and you could lose the information.

There, we are going to use only very little of the writes and reads. Therefore, there is no need to go with the default values of 5 capacity units for reads and writes. Wait for the table to be created (should only take a minute really). Make sure all settings look good.
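For those who would rather script it than click, the table described above fits in one create_table call. The sketch below only builds its parameters; the attribute names "stackname" and "env" are assumptions of mine, pick whatever matches your own functions.

```python
def password_table_definition(table_name):
    """Build the keyword arguments for the DynamoDB create_table call."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "stackname", "AttributeType": "S"},
            {"AttributeName": "env", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "stackname", "KeyType": "HASH"},  # primary key
            {"AttributeName": "env", "KeyType": "RANGE"},  # sort key
        ],
        # 1 read / 1 write unit is plenty for occasional stack creations.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 1,
            "WriteCapacityUnits": 1,
        },
    }
```

Usage would look like `boto3.client("dynamodb").create_table(**password_table_definition("mypasswordtablename"))`.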

Table is created. Ready to go.

2 - The KMS Key

DynamoDB doesn't come with a native encryption solution, and furthermore, all data is potentially cleartext to a certain extent. So, prior to storing our password, we are going to leverage KMS to encrypt it. The good thing is : KMS probably has less risk of losing your key than you have of losing your USB key or tape, for the old-school.

Create the KMS key - Dashboard

In IAM, select the right region and create a new key.


I have decided to name my key so that I have a very simple way to identify what each key does. Maybe something to exploit to mislead the enemy ? ^^ Make sure you get KMS to manage it for you.


The administrators of the key are the users / roles who can revoke (delete) a key or change its configuration. Choose the users very carefully. Here, I select my user as the only administrator of the key.


Now, just as for the admins, I select which users can use the key to encrypt / decrypt data with it. For now, I only select my user. Later on, we will grant those user rights to our IAM role for the lambda functions.


Final validation of the IAM policy that is for the key itself. This is a key policy, check it twice !


Click on Finish to complete the key creation.


There ! Your key has been created and we can start using it. Note the KeyID somewhere or remember how to come back here, we will need that key later.

3 - The Lambda functionS

AWS Lambda .. What an awesome service, right ? Write some code, store it, call it when you need it, with no additional pain. So, this is where you discover that I am a Python developer, and as such, all my Lambda functions are done in Python. A bit of history : I started with one of the first versions of boto 2. And it was nice, but once I tasted some of boto3 and its documentation .. this is where the sweetness comes ;) boto3 really makes it super easy for us to talk to AWS.

So, as for the code, you will be able to find it on gists / github.com in links as we go through that script.

3A - Generate, encrypt and store the password

The lambda function's role

The lambda function runs assuming an IAM role. Here we need a couple rights:

  • Write-only to the DynamoDB table we created earlier
  • Use the KMS key we created earlier

And that's it. Remember, in AWS as in general, the fewer privileges you give to a function, the lower the risk of exposure if someone gains access to it.

Start with going in IAM again, in the roles this time. Now here, click on create a new role


I usually prefix the role with the roletype. Here, lambda as this role will be used by the Lambda function. Now, cfEncrypt tells me this is the role we will use for the encryption function.


In the AWS Services roles, select AWS Lambda. This is what's called the trust policy. It simply exposes that for this role, IAM will allow API calls from the lambda functions.


We are going to select 2 AWS managed policies as AWS preconfigured those for general purpose. Those policies are necessary for Lambda to create the logs files and other reports. If you find those too permissive, feel free to change them. Beware that you have to know all the details around CW and Lambda functions logging.


Here, final step. We are good to create the role :)


Now that we have the baseline for our Lambda function to have the appropriate powers, we still have to create a policy so it will be allowed to write (or read, for the decrypt function) to our DynamoDB table. The policy should be as follows :

     "Version": "2012-10-17",
     "Statement": [
             "Sid": "Stmt1478169389000",
             "Effect": "Allow",
             "Action": [
             "Resource": [

So, via the Dashboard again, here is a simple run-through of how to create the policy properly. Go to the IAM service, then to the policies section, then click on the "Create policy" button. On the next screen, select "Policy Generator"


The policy generator is a very simple and efficient tool to help you build the JSON policy if you aren't familiar with / used to writing and reading JSON IAM policies.

In the service dropdown, select "Amazon DynamoDB", then select the "PutItem" Action. In the Resource ARN field, use the DynamoDB ARN of your table. This is the ultimate way to be sure that the policy won't allow any other action against any other table.
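What the generator produces for that single statement can also be built in a few lines. This is a sketch of a policy limited to PutItem on one table, as selected above; the function name is hypothetical and the Sid is left out since IAM does not require one.

```python
import json


def put_item_policy(table_arn):
    """Return an IAM policy document allowing only PutItem on one table."""
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                # One explicit table ARN, never "*"
                "Resource": [table_arn],
            }
        ],
    }
    return json.dumps(document, indent=2)
```

For the decrypt function, the same shape with "dynamodb:GetItem" as the action gives the read-only counterpart.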


Once you've clicked you will see a first statement has been created for the policy. For now, we don't need any other statement for the policy, so now click on "Next step"


The last step before creation is to review the JSON and the policy name / description. Once you have named your policy and description, click on "Create policy"


At this point, we simply have to attach the policy to our existing role. So back to the roles in the IAM dashboard, select the "lambdaCfEncrypt" role. In the role description page, select "Attach policy". You are taken to a new page where you can select the policy to attach:


Create the function

So, first of all we want to generate a password that complies with our security policy and works for our backend. In my use-case, it is a MySQL DB so I go for letters (lower and upper case), numbers and a special character. Once the password is generated (0.05ms later ..) we are going to call KMS and use our key to encrypt the password text (add another 20ms). Then, we write the base64 of the whole thing to our DynamoDB table with all the attributes necessary to make sure it is unique.
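Those three steps can be sketched as follows. This is not the published function: the function names, the set of special characters (chosen to avoid the characters RDS rejects in a master password) and the item attribute names are assumptions. The KMS and DynamoDB boto3 clients are passed in as parameters.

```python
import base64
import secrets
import string

SPECIALS = "!#%&*+-_"  # assumption: avoids '/', '@', '"' and spaces


def generate_password(length=20):
    """Random password with lower case, upper case, digit and special char."""
    if length < 4:
        raise ValueError("length must be at least 4")
    alphabet = string.ascii_letters + string.digits + SPECIALS
    while True:
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until every character class is present at least once.
        if (any(c.islower() for c in password)
                and any(c.isupper() for c in password)
                and any(c.isdigit() for c in password)
                and any(c in SPECIALS for c in password)):
            return password


def encrypt_and_store(kms_client, dynamodb_client, key_id, table_name,
                      stack_name, env, password):
    """Encrypt the password with KMS and store its base64 form in DynamoDB."""
    blob = kms_client.encrypt(
        KeyId=key_id, Plaintext=password.encode("utf-8")
    )["CiphertextBlob"]
    encoded = base64.b64encode(blob).decode("ascii")
    dynamodb_client.put_item(
        TableName=table_name,
        Item={
            "stackname": {"S": stack_name},  # hypothetical attribute names
            "env": {"S": env},
            "passwordbase64": {"S": encoded},
        },
    )
    return encoded
```

Only the encrypted, base64-encoded value ever reaches the table; the clear-text password stays in the memory of the function.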

In this part, I am going to do it only via the Dashboard so it stays user friendly.

In the Lambda dashboard, go to create function. Skip the blueprint selection by clicking on the next step right away in the top left corner.


As Lambda is an event triggered service, you can define a trigger to execute the Lambda function. Here, we don't need to configure a specific trigger as our Lambda function is only going to be called by CloudFormation.


"Oh oh .. lots of settings here" - Don't panic ! We have already prepared everything necessary for this step. Here, we give our function a lovely name, a meaningful description and the code. To make the tutorial user-friendly, I have selected "Upload a Zip file" just so everything fits within the page, but you will use the code here and copy-paste it inline.


Here we are, ready to create the lambda function :)


3B - Decrypt the password

The other lambda function's role

As you guessed, here we are going to create a role that is just like the previous one, but instead, we are granting read-only access to the DynamoDB table and decrypt rights on the KMS Key.

To create the lambdaCfDecrypt role, follow exactly the same steps as described for the "Encrypt" role.

     "Version": "2012-10-17",
     "Statement": [
             "Sid": "Stmt1478169389000",
             "Effect": "Allow",
             "Action": [
             "Resource": [

Once you've clicked you will see a first statement has been created for the policy. For now, we don't need any other statement for the policy, so now click on "Next step"


The last step before creation is to review the JSON and the policy name / description. Once you have named your policy and description, click on "Create policy"


Create the function

To create the function, follow the exact same steps as for the cfRdsPasswordGenerate function. The code is in this gist, so you can put that inline.


Do not forget to change the role of the function to the lambdaCfDecrypt role.
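As an illustration of the reverse path (the actual code is the one in the gist), here is a minimal sketch: read the item, decode the base64 and ask KMS to decrypt it. The function name and the attribute names "stackname", "env" and "passwordbase64" are assumptions and must match whatever the generator function wrote.

```python
import base64


def fetch_password(kms_client, dynamodb_client, table_name, stack_name, env):
    """Read the encrypted password from DynamoDB and decrypt it with KMS."""
    response = dynamodb_client.get_item(
        TableName=table_name,
        Key={"stackname": {"S": stack_name}, "env": {"S": env}},
    )
    item = response.get("Item")
    if item is None:
        # get_item returns no "Item" key when nothing matches.
        raise LookupError("no password stored for %s / %s" % (stack_name, env))
    blob = base64.b64decode(item["passwordbase64"]["S"])
    # With a symmetric key, KMS finds the key from the ciphertext itself.
    plaintext = kms_client.decrypt(CiphertextBlob=blob)["Plaintext"]
    return plaintext.decode("utf-8")
```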

4 - Put it all together with CloudFormation

Now that we have created our Lambda functions and tested them, it is time to get our CloudFormation running. As for the Lambda functions, you can find the full CloudFormation template on my Github account or here.

So, you might know all the AWS::EC2::Instance resource attributes, but do you know the custom resource ? This is the special one through which we are going to call our Lambda function with parameters, and which is going to generate and capture the values we want. Here is a very simple snippet of the two resources that we want to get through our Lambda functions (those go in the Resources object of your template).

"lambdaDBPassword": {
  "Type": "AWS::CloudFormation::CustomResource",
  "Version": "1.0",
  "Properties": {
    "ServiceToken": "arn:aws:lambda:eu-west-1:account_id:function:cfGeneratePassword",
    "KeyId": "arn:aws:kms:eu-west-1:account_id:key/key_id",
    "PasswordLength": "20",
    "TableName": "mypasswordtablename",
    "Env": {
      "Ref": "Environment"
    },
    "StackName": {
      "Ref": "AWS::StackName"
    }
  }
},
"lambdaGetDBPassword": {
  "Type": "AWS::CloudFormation::CustomResource",
  "DependsOn": "lambdaDBPassword",
  "Version": "1.0",
  "Properties": {
    "ServiceToken": "arn:aws:lambda:eu-west-1:account_id:function:cfGetPassword",
    "TableName": "mypasswordtablename",
    "Env": {
      "Ref": "Environment"
    },
    "StackName": {
      "Ref": "AWS::StackName"
    }
  }
}

If the resource creation succeeded, how can we get the password out of the lambdaGetDBPassword resource ?

Below is a very very small snippet of a RDSInstance resource for which I voluntarily kept only the DB password attribute. Here, we first ensure that the lambda custom resource worked and could be created successfully, using the "DependsOn" attribute. Then, for the password, we simply have to get the password out of it, using the function "Fn::GetAtt".

The attribute name is the one that in the code we previously set in the "Data" object of the response.

"rdsDB": {
  "Type": "AWS::RDS::DBInstance",
  "DependsOn": "lambdaGetDBPassword",
  "Properties": {
    "MasterUserPassword": {
      "Fn::GetAtt": [

I had the question : why have 2 separate functions used by CloudFormation instead of having the generator function return the cleartext directly ?

Well, using only one function might save you around 10 seconds in the stack creation, but I find that using a separate function to decrypt is a very nice way to be sure, at creation time, that the "reverse" process of getting and decrypting the password works as expected. That way, you know that you can reuse that function in different places again and again and keep the logic very simple.
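Coming back to the "Data" object mentioned earlier: it is part of the JSON document that a custom resource function has to send back to CloudFormation. Here is a minimal sketch of building that document; the function name and the "Password" key are hypothetical, use the attribute name your own function sets.

```python
import json


def build_cfn_response(event, physical_id, data, status="SUCCESS", reason=""):
    """Build the JSON body a custom resource sends back to CloudFormation.

    event: the event CloudFormation passed to the Lambda function.
    data: dict whose keys become the attributes readable with Fn::GetAtt.
    """
    return json.dumps({
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,  # should be filled in when the status is "FAILED"
        "PhysicalResourceId": physical_id,
        # The three identifiers below must be echoed back from the request.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    })
```

The body is then sent with an HTTP PUT to the pre-signed URL found in `event["ResponseURL"]`. With `data={"Password": "..."}`, the value is read from the template through Fn::GetAtt on the custom resource with the attribute name "Password".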


This is a very simple example of all the possibilities Lambda and CloudFormation offer us. I hope this will help you in your journey to AWS and automation.

At scale

When we created the DynamoDB table, as you can see, we set the read and write capacity units to 1, because this table will potentially be used only when we create a new stack for dev/test and our resources need a password. But tomorrow you might find yourself in a position where you have 100 RDS DBs and, for each individual DB, tens of consumers which call our Lambda function when they initialize themselves. Lambda won't be our limitation here, but DynamoDB might be. In this case, you might want to look at the table metrics, and maybe raise the read capacity so you can have more consumers potentially reading all at the same time without being throttled.

Also, it is worth mentioning that KMS has a cost per call. So again, depending on the kind of resources that need to decrypt the information with the key, you might have to make sure that your resources ask for a decrypt only when it is necessary (the free tier ends at 20k requests globally, then it goes to 0.03$ per 10k requests).
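To put numbers on that, here is the arithmetic as a tiny helper, using the figures quoted above as defaults (check the current KMS pricing page before relying on them; the function name is mine):

```python
def kms_monthly_request_cost(requests, free_requests=20000, price_per_10k=0.03):
    """Estimated monthly KMS request cost in USD.

    requests: total KMS API calls in the month, all keys included.
    """
    # Only the calls above the free tier are billed.
    billable = max(0, requests - free_requests)
    return billable / 10000 * price_per_10k
```

For example, 120k decrypt calls in a month leave 100k billable requests, that is 10 blocks of 10k at 0.03$, so about 0.30$. The fixed monthly price of the key itself comes on top of that.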

Edited on 2016-11-07

The 3rd option, 3 - Use S3 bucket, could give us an alternative, but, at risks : if the number of calls you have to make to KMS to get and decrypt the payload becomes a struggle for your bill, if you are super confident in your ability to write S3 bucket policies and your VPC network configuration, you could have the payload non-encrypted in the bucket, leverage VPC Endpoint to S3, and have the instances / resources that need the information and get it in clear-text.

At your own risk if you happen to store the unprotected root password of your black-box.
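If you go down that road, the bucket policy is where everything is decided. Here is a minimal sketch that builds a policy denying object reads unless they come through a given VPC endpoint, using the aws:sourceVpce condition key. The bucket name and endpoint ID are placeholders, and a real policy would most likely need more statements than this one.

```python
import json


def vpce_only_bucket_policy(bucket_name, vpc_endpoint_id):
    """Deny object reads unless the request arrives through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadsOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::%s/*" % bucket_name,
                "Condition": {
                    "StringNotEquals": {"aws:sourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Placeholder values, replace with your own bucket and endpoint.
policy = vpce_only_bucket_policy("my-secrets-bucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```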

A fresh start

Hi everyone. I have decided to move away from Wordpress.com, whom I thank for hosting my previous blog for the past 2 years. But, as I am moving forward with my journey to AWS and cloud in general, I wanted to start using more of the AWS'someness.

A while ago, I started a blog using Nikola, which I am using again today to generate all the future blog articles and guides/how-to's and host it directly in S3.

Where are the old posts ?

Some of the very old articles I have are still in my Github account, therefore I will simply republish them, probably in the archives part.

For the most recent AWS or Eucalyptus articles, they will simply be re-published here very soon :)


Hi everyone,

It has been a while since I last wrote a blog post, but today I wanted to share some recent experience with my favorite public cloud and its GPU instances offering. I know that many people have probably done it, and done it way better than I did, especially on Ubuntu. But as I am not Ubuntu's #1 fan and much prefer CentOS, I wanted to share with you my steps and results using FFMPEG and the NVIDIA codecs to leverage the GPU for video encoding and manipulation.

In the near future, I will take a look at the transcoder service in AWS, but as it doesn't meet the requirements for the entire pipeline of my videos' lifecycle, I have yet to determine how to leverage that service.

So, originally I wanted to use the Amazon Linux image with the drivers installed (the one with the nice NVIDIA icon). But I faced many more problems with it than I thought I would. Therefore, I decided to go with a basic minimal install of CentOS 7 and take it from there. And here we go !


As I just mentioned, for this tutorial I use the official CentOS 7 image available on the AWS Marketplace. This is a minimal install, so at first I recommend installing your favorite packages as well as some of the packages coming from the Base group.

yum upgrade -y; yum install wget git -y; reboot

Also, to install the NVIDIA drivers, you will have to remove the "nouveau" driver / module from the machine.

vi /etc/default/grub   # or emacs, your call

On the GRUB_CMDLINE_LINUX line, add rd.driver.blacklist=nouveau nouveau.modeset=0

GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200 rd.driver.blacklist=nouveau nouveau.modeset=0"
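If you bake this into an image build rather than editing by hand, the same change can be scripted. Below is a small sketch that appends the two arguments to the GRUB_CMDLINE_LINUX line only when they are missing, so it is safe to run several times. The function name is mine, and it works on the file content as a string so you decide how to read and write /etc/default/grub.

```python
import re

EXTRA_ARGS = ("rd.driver.blacklist=nouveau", "nouveau.modeset=0")


def add_kernel_args(grub_text, extra_args=EXTRA_ARGS):
    """Append missing kernel arguments to the GRUB_CMDLINE_LINUX line."""

    def patch(match):
        current = match.group(2).split()
        for arg in extra_args:
            if arg not in current:
                current.append(arg)
        return '%s"%s"' % (match.group(1), " ".join(current))

    # Only the exact GRUB_CMDLINE_LINUX= key is touched, not GRUB_CMDLINE_LINUX_DEFAULT.
    return re.sub(r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"', patch, grub_text, flags=re.M)


sample = 'GRUB_TIMEOUT=1\nGRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200"\n'
print(add_kernel_args(sample))
```

You still need to regenerate the grub configuration and reboot afterwards.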

Now, generate the new grub configuration

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Now, reboot the machine.

Install CUDA

CUDA is not required to encode with the latest h264_nvenc encoder available in FFMPEG. However, CUDA seems to be required to leverage the NVIDIA API "NVResize", which resizes videos using the GPU. Though, as I am not an expert, there might already be an option to do so with the nvenc encoder in the latest FFMPEG version (research to be continued). IT IS IMPORTANT THAT YOU START WITH CUDA BEFORE INSTALLING THE NVIDIA DRIVERS !!!!

So, let's get our hands on the latest CUDA SDK : https://developer.nvidia.com/cuda-downloads

There, select Linux -> x86_64 -> CentOS -> 7 -> run file (I strongly advise going for the run file : I had many more problems with the repos).

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run

From your AWS EC2 Instance, this won't take long. Grab a tea :) Now we have it, execute the run file as root

sudo chmod +x cuda_7.5.18_linux.run
sudo ./cuda_7.5.18_linux.run

Accept the terms, then the options I took were :

  • Install CUDA
  • Install the Libraries
  • Do not install the samples

The CUDA installer will install some NVIDIA drivers. At the end of the process (if successful), you should be able to enable the nvidia kernel modules :

sudo modprobe nvidia
sudo modprobe nvidia_uvm

So far so good, with lsmod you should see those enabled.
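If you want to automate that check (in a bootstrap script for your workers, for instance), here is a small sketch that does what lsmod does : it reads the module names from the content of /proc/modules. The function names and the sample content are mine.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules content."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(proc_modules_text, required=("nvidia", "nvidia_uvm")):
    """Return the required modules that are not loaded, in the order given."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


# Hypothetical content, on the instance you would read /proc/modules instead.
SAMPLE = """nvidia_uvm 671744 0 - Live 0xffffffffa0a2c000 (POE)
nvidia 10563584 1 nvidia_uvm, Live 0xffffffffa0000000 (POE)
xfs 939714 1 - Live 0xffffffffa0123000
"""
print(missing_modules(SAMPLE))
```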

The CUDA utils

Thanks to this guide, I realized that an extra step was needed : building a small CUDA utility library which helps FFMPEG communicate with CUDA. A pretty straightforward step.

wget http://developer.download.nvidia.com/compute/redist/ffmpeg/1511-patch/cudautils.zip
unzip cudautils.zip
cd cudautils
make

All set for now, moving on.

The Right NVIDIA drivers

To save you lots of trouble, let's just say that after 24h of different annoying non-verbose errors, I figured something was wrong with the drivers delivered by the CUDA installer (v.352.79). So, now, let's get the latest NVIDIA drivers. You can always get the latest from the NVIDIA website : http://www.nvidia.com/download/driverResults.aspx/106780/en-us

Those are the latest (as of Aug. 2016, v.367.44). Make sure you download the Linux x64 version for the GRID K520. On your instance :

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/367.44/NVIDIA-Linux-x86_64-367.44.run
chmod +x NVIDIA-Linux-x86_64-367.44.run
sudo ./NVIDIA-Linux-x86_64-367.44.run

Accept the terms, and acknowledge that this installer will install new drivers and uninstall the old ones. I did not pick yes for the 32-bit compatibility drivers, and went for DKMS. Once the driver install is finished, I strongly suggest rebooting and then, as before, checking that the kernel modules are enabled and working (use lsmod).

The nvEncodeAPI.h

You will need this header file in your library to compile FFMPEG with --enable-nvenc and use the encoder. To get it, you will need to subscribe to the NVIDIA developer program and get the Video_Codec_SDK_7.0.1. I could have made this available for all, but I will leave you to accept the terms and conditions and get your hands on it yourself. Once you have it, upload it (via SFTP most likely) to your instance, unzip the file, and locate the nvEncodeAPI.h file. Keep it in your back-pocket, we will need it soon.

Compile FFMPEG

We are now arriving at the final step. As the guide referenced earlier mentions, a few steps are required prior to compiling FFMPEG :

  • get the right ffmpeg version
  • get the patch to enable the nvresize

For my own FFMPEG, I needed some additional plugins. Here is the script I used to install those. Note the exit 1 just before FFMPEG. This was a fail safe to avoid forgetting some of the little details that follow. Use the script for the packages which are not in the repos (ie: for x264).

Prepare your compilation folders

I personally like to put extra, self-compiled packages in /opt as they are easy to find. But feel free to do as you prefer. For the following steps, I will be doing all the work as root (I know, I know ..) in /opt (if you used my script so far, skip the folders creation).

mkdir -p /opt/ffmpeg_sources
mkdir -p /opt/ffmpeg_build

Now, we can go ahead with building all the dependencies. The shell script I wrote covers those parts.

Download the right FFMPEG

cd /opt/ffmpeg_sources
wget http://developer.download.nvidia.com/compute/redist/ffmpeg/1511-patch/ffmpeg_NVIDIA_gpu_acceleration.patch
git clone git://source.ffmpeg.org/ffmpeg.git
cd ffmpeg
git reset --hard b83c849e8797fbb972ebd7f2919e0f085061f37f
git apply --reject --whitespace=fix ../ffmpeg_NVIDIA_gpu_acceleration.patch

From this point, you can't use ./configure with --enable-nvresize and --enable-nvenc: we are missing the libraries.


Simply copy the header file (the nvEncodeAPI.h we kept earlier) into /opt/ffmpeg_build/include

cudautils

Go back to your cudautils folder. I did the quick and dirty, yet working, cp * /opt/ffmpeg_build/include and cp * /opt/ffmpeg_build/lib. You could instead put just the .a in the lib folder and the .h in the include folder.

Configure

cd /opt/ffmpeg_sources/ffmpeg/
export PKG_CONFIG_PATH="/opt/ffmpeg_build/lib/pkgconfig"
./configure --prefix="/opt/ffmpeg_build" \
             --extra-cflags="-I/opt/ffmpeg_build/include" \
             --extra-ldflags="-L/opt/ffmpeg_build/lib" \
             --bindir="/opt/bin" \
             --pkg-config-flags="--static" \
             --enable-gpl \
             --enable-libfaac \
             --enable-libmp3lame \
             --enable-libtheora \
             --enable-libvorbis \
             --enable-libvpx \
             --enable-libx264 \
             --enable-nonfree \
             --enable-nvenc \
             --enable-nvresize \
             --enable-libfribidi

If the configure succeeded, let's compile

make -j
make install

Now, one sneaky library is still missing to run it 100%, and you need to link it into /usr/lib64 :

sudo ln -s $BUILD_DIR/ffmpeg_build/lib/libfaac.so.0 /usr/lib64/
export PATH=$PATH:/opt/bin
ffmpeg -version
ffmpeg -encoders | grep nv

If FFMPEG doesn't run at this point, something went wrong before.

Test FFMPEG with NVENC and NVResize

Well, for this part, I followed the basic demo and test commands that this PDF guide suggested. To see the CPU and GPU usage, I had htop and nvidia-smi (watch -d -n1 nvidia-smi) running side-by-side in my TMUX session. Then, in a third pane of my TMUX, I ran the different commands, such as :

ffmpeg -y -i Eucalyptus-Ansible-Deploy.mp4 -filter_complex nvresize=5:s=hd1080\|hd720\|hd480\|wvga\|cif:readback=0[out0][out1][out2][out3][out4] -map [out0] -acodec copy -vcodec nvenc -b:v 5M out0nv.mkv -map [out1] -acodec copy -vcodec nvenc -b:v 4M out1nv.mkv -map [out2] -acodec copy -vcodec nvenc -b:v 3M out2nv.mkv -map [out3] -acodec copy -vcodec nvenc -b:v 2M out3nv.mkv -map [out4] -acodec copy -vcodec nvenc -b:v 1M out4nv.mkv

ffmpeg -y -i Eucalyptus-Ansible-Deploy.mp4 -acodec copy -vcodec nvenc -b:v 2M /var/tmp/test_parallel.mp4
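That first command is long and easy to get wrong by hand, so if your workers generate it, something like the following sketch can help. It only rebuilds the same arguments as the command above (nvresize filter, one -map per output, nvenc as video codec); the function name and the output file prefix are mine. Passing the result as a list to subprocess also avoids having to escape the | characters for the shell.

```python
def nvresize_command(source, sizes, bitrates, prefix="out"):
    """Build the ffmpeg argument list for one input resized into several outputs."""
    if len(sizes) != len(bitrates):
        raise ValueError("need exactly one bitrate per output size")
    labels = ["[out%d]" % index for index in range(len(sizes))]
    graph = "nvresize=%d:s=%s:readback=0%s" % (len(sizes), "|".join(sizes), "".join(labels))
    command = ["ffmpeg", "-y", "-i", source, "-filter_complex", graph]
    for index, (label, bitrate) in enumerate(zip(labels, bitrates)):
        command += ["-map", label, "-acodec", "copy", "-vcodec", "nvenc",
                    "-b:v", bitrate, "%s%dnv.mkv" % (prefix, index)]
    return command


command = nvresize_command("Eucalyptus-Ansible-Deploy.mp4",
                           ["hd1080", "hd720", "hd480"], ["5M", "4M", "3M"])
print(" ".join(command))
# To actually run it on the GPU instance: subprocess.check_call(command)
```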

Tada ! I hope this guide has helped you guys in your different experimentations with AWS, and that you will enjoy playing around with it.


Do not forget to stop/terminate those instances. A potential 600USD bill will be waiting for you for each GPU instance left running. Still less than traditional IT but still .. :P

Eucalyptus VPC with Midokura


In 2014, VPC became the default networking mode in AWS, letting the EC2 Classic networking mode go. VPC is a great way to manage and control the network environment in which the AWS resources will run. It also gives full control in the case of a hybrid cloud, or at least when extending your own IT, with a lot of ways to interconnect the two.

A lot of new features came out from this but most importantly, VPC would provide the ability for everyone to have backend applications running in Private. No public traffic, no access to and from the internet unless wanted. A keystone for AWS to promote the Public cloud as a safe place.


Midokura is an SDN software which is used to manage routing between instances, to the internet, security groups etc. The super cool thing about Midokura is its capacity to be highly available and scalable over time. Of course, being originally a networking guy, I also find it super cool to have BGP capability.


Here is what my architecture looks like :


VLAN 1001 is here for Eucalyptus components communication. VLAN 1010 is here for Midonet components communication including Cassandra and Zookeeper. VLAN 1002 is our BGP zone. VLAN 0 is basically the ifaces default network which is not relevant here. We will use it only for packages download etc.

From here, we need :

  • UFS (CentOS 6)
  • CLC (CentOS 6)
  • CC/SC (CentOS 6)
  • NC(s) (CentOS 6)
  • Cassandra/Zookeeper (CentOS 7)
  • BGP Server / Router

Of course, you could put everything on the same L2/L3 network. But you are not a bad person, are you ? ;) For the Eucalyptus components we will be on CentOS 6.6, and on CentOS 7 for the others. Some of the Eucalyptus components will have a midonet component, depending on the service they will be running. Today we will do a single-cluster deployment, but nothing would change in that regard with more clusters.

But before going any further, we should sit and understand how Midokura and Eucalyptus will work together. So, Midokura is here as an SDN provider. Eucalyptus is here to understand the VPC / EC2 etc. API calls and pilot the other components to provide the resources. Now, what is VPC ? VPC stands for Virtual Private Cloud. Technically, what does it mean ? You will be able to create a delimited zone (define the perimeter), specifying the different networks (subnets) in which instances will be running. By default, those instances will have a private IP address only, and no internet access unless you specifically give it to them.

In a classical environment, that would correspond to having different routers (L3) connected to different switches responsible for traffic isolation (L2). Here, this is exactly what Midokura will do for us. Midokura will create virtual routers and switches which will be used by Eucalyptus to place resources.

How Eucalyptus and Midokura work together - Logical view

I am on my Eucalyptus cloud. No VPC has been created. At this time, I have no resources created. Eucalyptus will have created 1 router on midokura, called "eucart". This router is the "top-upstream" router, to which new routers will be attached. For our explanation, we will call this router EUCART.

So now, when I create a new VPC, I do it with

euca-create-vpc CIDR

This in the system will create a new router, which we will call RouterA. This routerA will be responsible for the communication of instances between subnets. But at this time, I have no subnets. So, let's create two :

euca-create-subnet -c VPC_ID -i CIDR -z AZ
euca-create-subnet -c VPCA -i -z euca-az-01
euca-create-subnet -c VPCA -i -z euca-az-01
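To make the VPC / subnet relationship a bit more concrete, here is a small sketch using Python's ipaddress module. The CIDR values are my own examples (pick whatever private range suits you), and VPCA / euca-az-01 are simply the names used above. It carves two /24 subnets out of a /16 VPC and prints the matching commands.

```python
import ipaddress
from itertools import islice

# Example CIDR for the VPC, purely illustrative.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Take the first two /24 networks contained in the VPC range.
subnet_a, subnet_b = islice(vpc.subnets(new_prefix=24), 2)

for subnet in (subnet_a, subnet_b):
    # A subnet must sit inside the VPC CIDR and must not overlap its siblings.
    assert subnet.subnet_of(vpc)
    print("euca-create-subnet -c VPCA -i %s -z euca-az-01" % subnet)
```

Each of those subnets is what ends up as a virtual switch behind RouterA.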

Now I have two different subnets. If I had to represent the network we have just created we would have :


Of course, if I had multiple VPCs, we would simply have duplicates of the VPCA group, and more switches if we had more subnets.

As in AWS, you can have an "internet gateway", which is simply a router which will allow all instances to have public internet access, as soon as they have an Elastic IP (EIP) associated with them.

So much for the logical mechanisms. Let's attack the technical install. Brace yourself, it is going to take a while.

Eucalyptus & Midokura - Install

First we are going to begin with Eucalyptus. I won't dig too much into the steps, which you can find here, for the packages install. As said, in this deployment we are going to have : 1 CLC, 1 UFS, 1 CC/SC and 1 NC. But before initializing the cloud, we are going to create a few VLANs to separate our traffic.

Servers Network configuration

vconfig add <iface> <vlan number>
vconfig add em2 1000
ifconfig em2.1000 192.168.1.XXX/26 # change the IP for each component

For the NC


Don't assign the IP on the VLAN iface

vconfig add em2 1000
ifconfig em2.1000 up
brctl addbr br0
brctl addif br0 em2.1000

Eucalyptus network Configuration

From here, make sure all components can talk to each other. Then, for our CLC, UFS and SC we are going to specify that they have to use the VLAN iface in eucalyptus.conf

vi /etc/eucalyptus/eucalyptus.conf
CLOUD_OPTS="--bind-addr=192.168.1.XX" # here, use your machine's IP as set previously
euca_conf --initialize

While it is initializing, we are going to add on our CLC and NC a new VLAN, as well as on the Cassandra and Zookeeper machines. For those, I will use VLAN 1001

vconfig add em2 1001
# here I am putting my machines in a different subnet than for VLAN 1000 | change for each machine
ifconfig em2.1001 up

Cassandra - Zookeeper

Alright, let's move on to the Cassandra / Zookeeper machine. In this deployment I will have only 1 machine to host the cluster, but of course for production the minimum recommended is 3, to have better scalability and redundancy.


File datastax.repo

# DataStax (Apache Cassandra)
[datastax]
name = DataStax Repo for Apache Cassandra
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
gpgkey = https://rpm.datastax.com/rpm/repo_key

File midonet.repo


name=MidoNet OpenStack Integration

name=MidoNet 3rd Party Tools and Libraries

Packages install

yum install zookeeper zkdump cassandra21 java-1.7.0-openjdk-headless

Once it is installed, we need to configure the runtime environment for those applications. First, Zookeeper.

Zookeeper configuration

mkdir -p /usr/java/default/bin/
ln -s /usr/lib/jvm/jre-1.7.0-openjdk/bin/java /usr/java/default/bin/java
mkdir /var/lib/zookeeper/data
chmod 777 /var/lib/zookeeper/data

We also need to add a configuration line in zoo.cfg

File /etc/zookeeper/zoo.cfg

server.1=<VLAN 1001 IP>:2888:3888

We can now indicate which server ID we are and start the service:

echo 1 > /var/lib/zookeeper/data/myid
systemctl start zookeeper.service
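With a single machine, one server.1 line is enough. For the 3 nodes recommended in production, every node gets the full list of servers in zoo.cfg, and its own number in the myid file. A small sketch to generate those lines, with placeholder IP addresses and a function name of my own :

```python
def zookeeper_server_lines(hosts, peer_port=2888, election_port=3888):
    """Return the server.N lines to put in zoo.cfg for the given ensemble."""
    return ["server.%d=%s:%d:%d" % (index, host, peer_port, election_port)
            for index, host in enumerate(hosts, start=1)]


# Placeholder addresses, use the VLAN 1001 IPs of your own machines.
ensemble = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
for line in zookeeper_server_lines(ensemble):
    print(line)
```

The number after "server." is the value each machine must write into its own /var/lib/zookeeper/data/myid.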

Cassandra configuration

Cassandra is a bit easier to install and configure. Simply replace a few values into the configuration files

File /etc/cassandra/cassandra.yaml

cluster_name: 'midonet'
rpc_address: <VLAN 1001 IP>
seeds: <VLAN 1001 IP>
listen_address: <VLAN 1001 IP>

Now we clean it up and start the service:

rm -rf /var/lib/cassandra/data/system/
systemctl restart cassandra.service

And this is it :) Of course, don't forget to open the ports on your FW and / or local machine.

Eucalyptus configuration

In the meantime, our cloud has been initialized. Now simply register the services as indicated in the documentation - register components. Don't forget to use the VLAN 1000 IP address for registration.

For all components, don't forget to change the VNET_MODE value to "VPCMIDO" instead of "MANAGED-NOVLAN" which will indicate to the components that their configuration must fit VPCMIDO requirements.

So, from here the Cassandra and Zookeeper will allow us to have midonet-API and midolman installed. Midonet-API is here to be the endpoint against which the Eucalyptus components will do API calls to create new routers and switches, as well as configure security groups (L4 filtering). Midolman is here to connect the different systems together and make the networking possible. You MUST have a midolman on each NC and CLC. Midonet-API is only to be installed on the CLC.

To have the API working, for now in Eucalyptus 4.1.0 we have to (sadly) have it installed on the CLC and the CLC only (that is the sad thing). Here our CLC will act as the "Midonet Gateway", this EUCART router I was talking about previously.

Let's do the install : (of course, here you will also need the midonet.repo we used before).

yum install tomcat midolman midonet-api python-midonetclient

Tomcat will act as the server for the API (basically). Unfortunately, the port used by Eucalyptus to talk to the API has been hardcoded :'( to 8080. So before going any further, we need to move the Eucalyptus service which listens on that port to a different one, on Eucalyptus itself:

$> euca-modify-property -p www.http_port=8081
8081 was 8080

If you don't make this change, your API will never be available.

Now that the packages are installed and port 8080/TCP is free, we must configure Tomcat itself to serve the midonet API. Add a new file into /etc/tomcat/Catalina/localhost named "midonet-api.xml"

File /etc/tomcat/Catalina/localhost/midonet-api.xml


Good, so now we need to configure Midonet API to get connected to our Zookeeper server. Go into /usr/share/midonet-api/WEB-INF and edit the file web.xml

              <param-value>http://<CLC VLAN 1001 IP>:8080/midonet-api</param-value>

 # [...]
 <!-- Specify the class path of the auth service -->
              # old value is for Keystone OS
              # new value ->

              <!-- comma separated list of Zookeeper nodes(host:port) -->
              <param-value><ZOOKEEPER VLAN 1001 IP>:2181</param-value>

Alright, now we can start tomcat which will enable the midonet-api. To verify, you can simply do a curl call on the entry point

curl <CLC VLAN 1001 IP>:8080/midonet-api/


We can configure midolman. The good thing about the midolman configuration, is that you can use the same configuration across all nodes. Once more, we simply have to change a few parameters to use our Cassandra / Zookeeper server. Edit /etc/midolman/

[zookeeper]
#zookeeper_hosts =,,
zookeeper_hosts = <ZOOKEEPER VLAN 1001 IP>:2181
session_timeout = 30000
midolman_root_key = /midonet/v1
session_gracetime = 30000

[cassandra]
#servers =,,
servers = <CASSANDRA VLAN 1001 IP>:9042
# DO CHANGE THIS, recommended value is 3
replication_factor = 1
cluster = midonet

This is it, nothing else to configure.


You need to have iproute with netns support installed. To verify, simply try "ip netns list". If you end up with an error, you need to install the iproute netns package.

Now that we are done with the config files, we can start the services. For Midolman, there is no default init.d script installed. So, here it is :

File /etc/init.d/midolman

#!/bin/sh
#
# midolman      Start up the midolman virtual network controller daemon
#
# chkconfig: 2345 80 20
#
### BEGIN INIT INFO
# Provides: midolman
# Required-Start: $network
# Required-Stop: $network
# Description:  Midolman is the virtual network controller for MidoNet.
# Short-Description: start and stop midolman
### END INIT INFO
#
# Midolman's backwards compatibility script to forward requests to upstart.
# Based on Ubuntu's /lib/init/upstart-job

set -e

# Name of the Upstart job to drive (assumed here to be "midolman")
JOB="midolman"

if [ -z "$1" ]; then
    echo "Usage: $0 COMMAND" 1>&2
    exit 1
fi

COMMAND="$1"
ECHO=echo

case $COMMAND in
status)
    status_output=`status "$JOB"`
    echo $status_output
    echo $status_output | grep -q running
    ;;
start|stop)
    if status "$JOB" 2>/dev/null | grep -q ' start/'; then
        RUNNING=1
    fi
    if [ -z "$RUNNING" ] && [ "$COMMAND" = "stop" ]; then
        exit 0
    elif [ -n "$RUNNING" ] && [ "$COMMAND" = "start" ]; then
        exit 0
    elif [ -n "$DISABLED" ] && [ "$COMMAND" = "start" ]; then
        exit 0
    fi
    $COMMAND "$JOB"
    ;;
restart)
    if status "$JOB" 2>/dev/null | grep -q ' start/'; then
        RUNNING=1
    fi
    if [ -n "$RUNNING" ] ; then
        stop "$JOB"
    fi

    # If the job is disabled and is not currently running, the job is
    # not restarted. However, if the job is disabled but has been forced into the
    # running state, we *do* stop and restart it since this is expected behaviour
    # for the admin who forced the start.

    if [ -n "$DISABLED" ] && [ -z "$RUNNING" ]; then
        exit 0
    fi
    start "$JOB"
    ;;
reload|force-reload)
    reload "$JOB"
    ;;
*)
    $ECHO "$COMMAND is not a supported operation for Upstart jobs." 1>&2
    exit 1
    ;;
esac

Once you have installed and configured midolman on every component, we need to configure midonet so that it knows about all our cloud components. Here we will simply call them "hosts" (the terminology is very important).

Back on our CLC, let's add a midonetrc file so we don't have to specify the IP address every time

api_url = http://<CLC VLAN 1001 IP>:8080/midonet-api
username = admin
password = admin
project_id = admin

Here, the credentials are not important and won't work. So anytime, to get onto midonet-cli, use the option "-A"

Before we get any further, there are 2 new packages which MUST be installed on the CLC : eucanetd and nginx. Explanations later.

yum install eucanetd nginx -y

We are halfway there. I know, it sounds like quite a lot. But in fact, it is not that much. We now need to configure the Eucalyptus network configuration. This, as for EDGE, is done using a JSON template. Pay attention, a mistake will cause you headaches for a long time.

  "Mode": "VPCMIDO",
  "Mido": {
    "EucanetdHost": "odc-c-30.prc.eucalyptus-systems.com",
    "GatewayHost": "odc-c-30.prc.eucalyptus-systems.com",
    "GatewayIP": "",
    "GatewayInterface": "em1.1002",
    "PublicNetworkCidr": "",
    "PublicGatewayIP": ""

So here, what does it mean ?

  • InstanceDnsServers : The list of nameservers. Nothing unexpected
  • Mode : VPCMIDO : Indicates to the cloud that VPC is enabled
  • PublicIps : List of ranges and / or subnets of Public IPs which can be used by the instances
  • Mido : This is the most important object !!
  • EucanetdHost: String() which points onto the server which runs the eucanetd binary and midonetAPI
  • GatewayHost: String() which points onto the server which runs the midonet GW. As said for now the GW and EucanetdHost must be the same machine.
  • GatewayIP : String() which indicates which IP will be used by the router EUCART. Here, you must use an IP address which DOESN'T EXIST yet !!!
  • GatewayInterface : The IFACE which is used for the GatewayIP. Here, I had created a dedicated VLAN for it, vlan 1002.
  • PublicNetworkCidr: String() which is the network / subnet for all your public IPs. Here in my example, I am using a /16 and defined only a /24 for my cloud public IPs. It is because I can have multiple Clouds in this /16 which each will use a different range of IPs
  • PublicGatewayIP : String() which points to our BGP router.
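Since a mistake in that JSON costs a lot of time, I find it worth running a quick sanity check before pushing it to the cloud. The sketch below only checks the rules described in this post (the mode, the Mido keys, the same machine for both hosts, and DNS names rather than IPs for the hosts). It is not an official validator, and the sample values are placeholders.

```python
import ipaddress
import json

REQUIRED_MIDO_KEYS = ("EucanetdHost", "GatewayHost", "GatewayIP",
                      "GatewayInterface", "PublicNetworkCidr", "PublicGatewayIP")


def looks_like_ip(value):
    """Return True when the value parses as an IP address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def check_network_config(raw_json):
    """Return the list of problems found in a VPCMIDO network JSON document."""
    config = json.loads(raw_json)
    problems = []
    if config.get("Mode") != "VPCMIDO":
        problems.append("Mode must be VPCMIDO")
    mido = config.get("Mido", {})
    for key in REQUIRED_MIDO_KEYS:
        if not mido.get(key):
            problems.append("Mido.%s is missing or empty" % key)
    if mido.get("EucanetdHost") != mido.get("GatewayHost"):
        problems.append("EucanetdHost and GatewayHost must be the same machine")
    for key in ("EucanetdHost", "GatewayHost"):
        if mido.get(key) and looks_like_ip(mido[key]):
            problems.append("Mido.%s must be a DNS name, not an IP" % key)
    return problems


# Placeholder values only.
sample = json.dumps({
    "Mode": "VPCMIDO",
    "Mido": {
        "EucanetdHost": "clc.example.com",
        "GatewayHost": "clc.example.com",
        "GatewayIP": "198.51.100.10",
        "GatewayInterface": "em1.1002",
        "PublicNetworkCidr": "198.51.100.0/24",
        "PublicGatewayIP": "198.51.100.1",
    },
})
print(check_network_config(sample))
```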


Don't forget that the GatewayInterface must be an interface WITHOUT an IP address set

For now in 4.1.0, as VPC is a tech preview, many configurations and topologies are not yet supported. So for now, you must keep the MidonetGW on the CLC and have the GatewayHost and EucanetdHost pointing to the CLC's DNS name. And this MUST be a DNS name, otherwise the net namespaces won't be created correctly.

Also, as we speak of DNS : if the hostname can end up being resolved over the VLANs we created, you MUST add an entry in your local hosts file so that your hostname resolves to the VLAN 1001 IP.

Alright, at this point we can have instances created into subnets, but they won't be able to get connected to external networks. We need to setup BGP and configure midonet for that.

Get onto the BGP server. Here, we are only going to create 1 VLAN, which we will use for the public addresses of instances. Here, we are going to use and our BGP router will use as we indicated into the JSON previously.

vconfig add em1 1002
ifconfig em1.1002 up

To have it working, it is very easy : (originally I followed this tutorial)

yum install quagga
setsebool -P zebra_write_config 1

Now the vty config itself and BGP are really simple :

File /etc/quagga/bgpd.conf

! -*- bgp -*-
! BGPd sample configuratin file
! $Id: bgpd.conf.sample,v 1.1 2002/12/13 20:15:29 paul Exp $
hostname bgpd
password zebra
!enable password please-set-at-here

router bgp 66000
bgp router-id
neighbor remote-as 66742
neighbor remote-as 66743

log file bgpd.log

Here we can see that the server will get BGP information from 2 "neighbors" with unique IDs. We will later be able to have 1 peer per midonet GW, which will be used by the system to reach networks.

To simplify : the BGP server is waiting for information coming from other BGP servers. Those BGP servers will be our MidonetGW. Our MidonetGW will then announce themselves to the server saying "hi, I am server ID XXXX, and I know the route to YY networks". Once the announcement is done on the root BGP router, all traffic going to it to reach our instances EIP will be sent onto our MidonetGW.
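If BGP is new to you, the toy model below may help to picture it. It is in no way the BGP protocol itself (no sessions, no path attributes), just the idea explained above : peers announce prefixes, and the upstream router then picks the most specific announcement to decide where to send traffic. The AS numbers are the ones from bgpd.conf, the prefixes are placeholders and the class is my own invention.

```python
import ipaddress


class ToyBgpRouter:
    """Upstream router which learns prefixes announced by its peers."""

    def __init__(self, local_as):
        self.local_as = local_as
        self.routes = {}  # network -> AS number of the peer which announced it

    def announce(self, peer_as, prefix):
        """A peer says: 'I know the route to this network'."""
        self.routes[ipaddress.ip_network(prefix)] = peer_as

    def next_hop_as(self, address):
        """Return the AS to forward to, using the longest matching prefix."""
        ip = ipaddress.ip_address(address)
        matches = [network for network in self.routes if ip in network]
        if not matches:
            return None
        best = max(matches, key=lambda network: network.prefixlen)
        return self.routes[best]


root = ToyBgpRouter(local_as=66000)
root.announce(66742, "203.0.113.0/24")  # our MidonetGW announces the EIP range
print(root.next_hop_as("203.0.113.10"))
print(root.next_hop_as("198.51.100.1"))
```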

Here is the zebra config.

cat /etc/quagga/zebra.conf
! Zebra configuration saved from vty
!   2015/03/05 13:14:09
hostname Router
password zebra
enable password zebra
log file /var/log/quagga/quagga.log
interface eth0
ipv6 nd suppress-ra
interface eth0.1002
ipv6 nd suppress-ra
interface eth1
ipv6 nd suppress-ra
interface lo
line vty

Alright, almost finished ! Back on our CLC, we need to configure the EUCART router.

# we list routers
midonet> router list
router router0 name eucart state up

# we list the ports on the router
midonet> router router0 list port
port port0 device router0 state up mac ac:ca:ba:b0:df:d8 address net peer bridge0:port0
port port1 device router0 state up mac ac:ca:ba:09:b9:47 address net

# At this point, we know that we must configure port1 as it has the GWIpAddress we have set in the JSON earlier. Check if there is any BGP configuration done on it

midonet> router router0 port port1 list bgp
# nothing, it is normal we have not set anything yet
# first we need to add the BGP peering configuration

midonet> router router0 port port1 add bgp local-AS 66742 peer-AS 66000 peer
# here, note that the values are the same as in bgpd.conf . Our router is ID 66742 where the root BGP is 66000

# Now, we need to indicate that we are the routing device to our public IPs

router router0 port port1 bgp bgp0 add route net
# in my JSON config I have used only 240 addresses, but those addresses fit into this subnet
# ok, at this point, things can work just fine. Last step is to indicate on which port the BGP has to be
# to do so, we need to spot the CLC VLAN 1002 interface

midonet> host list
host host0 name odc-c-33.prc.eucalyptus-systems.com alive true
host host1 name odc-c-30.prc.eucalyptus-systems.com alive true

# Now we create a tunnel zone for our hosts
tunnel-zone create name euca-mido type gre
# Add the hosts
tunnel-zone list
tunnel-zone tzone0 add member host host0 address A.B.C.D
tunnel-zone tzone0 add member host host1 address X.Y.Z.0

# here my GW is host1

midonet> host host1 list interface
iface midonet host_id host1 status 0 addresses [] mac 9a:3a:bd:6d:89:c2 mtu 1500 type Virtual endpoint DATAPATH
iface lo host_id host1 status 3 addresses [u'', u'0:0:0:0:0:0:0:1'] mac 00:00:00:00:00:00 mtu 65536 type Virtual endpoint LOCALHOST
iface em2.1001 host_id host1 status 3 addresses [u'', u'fe80:0:0:0:baac:6fff:fe8c:e96d'] mac b8:ac:6f:8c:e9:6d mtu 1500 type Virtual endpoint UNKNOWN
iface em1.1002 host_id host1 status 3 addresses [u'fe80:0:0:0:baac:6fff:fe8c:e96c'] mac b8:ac:6f:8c:e9:6c mtu 1500 type Virtual endpoint UNKNOWN
iface em2.1000 host_id host1 status 3 addresses [u'', u'fe80:0:0:0:baac:6fff:fe8c:e96d'] mac b8:ac:6f:8c:e9:6d mtu 1500 type Virtual endpoint UNKNOWN
iface em1 host_id host1 status 3 addresses [u'', u'fe80:0:0:0:baac:6fff:fe8c:e96c'] mac b8:ac:6f:8c:e9:6c mtu 1500 type Physical endpoint PHYSICAL
iface em2 host_id host1 status 3 addresses [u'', u'fe80:0:0:0:baac:6fff:fe8c:e96d'] mac b8:ac:6f:8c:e9:6d mtu 1500 type Physical endpoint PHYSICAL

# iface em1.1002 -> no IP address, good. We can now bind the router onto it.
midonet> host host1 add binding port router0:port1 interface em1.1002

On your CLC, you should see new interfaces being created, called "mbgp_X". This is a good sign. It means that your BGP processes are running and announcing information. Let's check on the upstream BGP router that it has learned those routes.


Hello, this is Quagga (version
Copyright 1996-2005 Kunihiro Ishiguro, et al.

router# show ip bgp
BGP table version is 0, local router ID is
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete

Network          Next Hop            Metric LocPrf Weight Path
*                  0         32768 i
*           0         0 66742 i

Total number of prefixes 2

Here we can see that router 66742 has announced that it knows the route to the subnet

Now on our cloud, if we get an EIP on the instances, we will be able to reach those instances and / or host services accessible from - potentially - anywhere.

CEPH & Eucalyptus for Block Storage

Today I did my first install of CEPH, which I used as backend system for the Elastic Block Storage (EBS) in Eucalyptus. The advantage of CEPH is that it is a distributed system which is going to provide you replication (persistence for data), redundancy (we are using a pool of resources, not a single target) and scalability : the easiest way to add capacity is to add nodes which will be used for storage, and those nodes can be simple machines.

I won't go too deep into the CEPH installation. The official documentation is quite good for anyone to get up to speed quickly. I personally had no trouble using CentOS 7 (el7). Also, I won't go too deep into the Eucalyptus installation, I will simply share with you the CEPH config files and my NC configuration files, which contain some values not mentioned in the Eucalyptus docs.

I will simply spend some time configuring a non-admin user in CEPH, which I will use for my Eucalyptus cloud. Back on your ceph admin node, simply have :

Create the pools

In CEPH, there is a default pool called 'rbd' (pool 0). I don't like to use the default values and settings when I deploy components I can tune / adapt to my use-case. So here, I am going to create 2 pools : 1 to store the EBS volumes and 1 to store the EBS snapshots

ceph osd pool create eucavols 64
ceph osd pool create eucasnaps 64

That's about all we have to do to create the pools.

The number after the pool name (64 in my example) is the number of placement groups (PGs) for the pool. The right value depends on how many OSDs you have and on the replication factor you want. For a dev or test cluster, small numbers will do. For larger deployments, refer to the CEPH placement group planning documentation. A power of 2 is always best (save yourself CPU cycles ;) )
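To make the sizing above a bit more concrete, here is a minimal Python sketch of the commonly quoted rule of thumb: aim for roughly 100 PGs per OSD, divide by the number of replicas, and round up to the next power of two. The function name and the default of 100 PGs per OSD are my assumptions for illustration; treat the result as a starting point for the whole cluster (to be shared between your pools), not as an official CEPH formula for your setup.

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Rule-of-thumb PG count, rounded up to a power of two.

    pgs_per_osd=100 is an assumed target, not a value from this post.
    """
    if num_osds < 1 or replicas < 1:
        raise ValueError("num_osds and replicas must be positive")
    target = (num_osds * pgs_per_osd) / replicas
    power = 1
    # Double until we reach or exceed the target.
    while power < target:
        power *= 2
    return power


if __name__ == "__main__":
    for osds, size in [(3, 3), (10, 3), (40, 2)]:
        print(f"{osds} OSDs, {size} replicas -> {suggested_pg_count(osds, size)} PGs")
```

For a small test cluster you can still pick a lower value by hand, as I did with 64.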

Create a CEPH user for our pools

When you installed your CEPH cluster, you did every basic activity with the CEPH administrator keys and credentials. Just like you tell people not to develop as root, you don't give software (here, Eucalyptus) too much power over your cluster. So, without further ado, we are going to create a CEPH user called "eucalyptus", which will only have read access on the monitors, and read/write/execute access on the eucavols and eucasnaps pools (the command below also grants the same access on the default rbd pool).

ceph auth get-or-create client.eucalyptus mon 'allow r' osd 'allow rwx pool=rbd, allow rwx pool=eucavols, allow rwx pool=eucasnaps, allow x' \
     -o ceph.client.eucalyptus.keyring

Running that command on the monitor as the CEPH admin creates the eucalyptus user and writes its key to the ceph.client.eucalyptus.keyring file. Copy that keyring file to all your NCs and to the SC (by default into /etc/ceph/; otherwise, use the paths configured below).

The ceph.conf on the SC and NC

When Eucalyptus first implemented CEPH storage, it was against a fairly "old" version of CEPH, and it expects some parameters that the CEPH installation scripts do not generate by default. Here is what the ceph.conf file has to look like on the NC and the SC.

fsid = ef66f3c8-2cbe-4195-8fbc-bc2b14ba6d69
public_network =
cluster_network =
mon_initial_members = nc-0
mon_host =
# mon addr is not by default in the configuration. Do not forget to add it as follows:
# if you have multiple, simply list them with a ',' as separator
mon addr =
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

Eucalyptus NC configuration

In /etc/eucalyptus/eucalyptus.conf, edit the following values accordingly. Make sure the eucalyptus Linux user/group has read access to both the keyring file and the ceph.conf file.

CEPH_USER_NAME=eucalyptus # BE CAREFUL - THIS IS NOT THE EUCALYPTUS LINUX ACCOUNT but the CEPH client you created previously.
CEPH_KEYRING_PATH=/var/lib/eucalyptus/ceph-config/ceph.client.eucalyptus.keyring # my key file path
CEPH_CONFIG_PATH=/var/lib/eucalyptus/ceph-config/ceph.conf # ceph config path
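Since a keyring or ceph.conf that the eucalyptus Linux user cannot read is an easy mistake to make here, a quick check can save some debugging. The sketch below is only an illustration: the function names are mine, and it assumes the file uses simple KEY=value lines with optional trailing comments, as in the snippet above.

```python
import os


def read_ceph_settings(conf_path):
    """Return the CEPH_* key/value pairs found in a KEY=value config file."""
    settings = {}
    with open(conf_path) as handle:
        for raw in handle:
            # Drop trailing comments, surrounding spaces and a stray ';'.
            line = raw.split("#", 1)[0].strip().rstrip(";")
            key, sep, value = line.partition("=")
            if sep and key.startswith("CEPH_"):
                settings[key.strip()] = value.strip().strip("\"'")
    return settings


def unreadable_files(settings, keys=("CEPH_KEYRING_PATH", "CEPH_CONFIG_PATH")):
    """List the configured paths that the current user cannot read."""
    return [
        settings[key]
        for key in keys
        if key in settings and not os.access(settings[key], os.R_OK)
    ]
```

Run it as the eucalyptus Linux user (for example through sudo -u eucalyptus) and call unreadable_files(read_ceph_settings("/etc/eucalyptus/eucalyptus.conf")); an empty list means both files are readable by that user.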

Here we go. Now, to confirm this works as expected, run a new instance, create a new volume and run

euca-attach-volume -i <instance_id> -d <device_path> <volume_id>   # device path: for example /dev/sdz

I know it might look too easy to be true, but this is it! You have successfully configured Eucalyptus to use CEPH as the storage backend for your EBS volumes. Enjoy the IOps and the space savings of CEPH ;)

Instance-Store vs EBS-backed

What is that ?

On Eucalyptus, and also on Amazon Web Services (AWS), instances can be backed in two ways: instance-store and EBS. Each one has its advantages and drawbacks; here we are going to see what they are, and how to use each type most efficiently.

If you've just set up a new Eucalyptus cloud using the fast-start (http://www.eucalyptus.com/eucalyptus-cloud/get-started/try/faststart), you will have followed the instructions and finally arrived at this point: run a VM. So here we are.


Instance-store instances are usually the first type of VM you will run on your cloud. They are very easy to get thanks to eustore (http://www.eucalyptus.com/docs/eucalyptus/3.1/ug/eustore-browse-install.html): instance-store EMIs are stored on Walrus (S3 on AWS). These EMIs are very light, with a minimal setup and package set, which is why you can find images of 500 MB for a complete CentOS system. An instance-store EMI has 3 components:

  • Kernel Image
  • Ramdisk Image
  • System Image

The kernel and the ramdisk are there to match your hypervisor (Xen / KVM), and they work the same way whichever one you use. Then we have the system image: this is your OS image. It is stored compressed and is expanded to a fixed size once it runs in your cloud. As this EMI is packaged in S3, it is very easy to have it available in all your regions.


Advantages:

  • Really fast to run:
    • The image is a package downloaded from the Walrus server, so your VM will start within a minute (depending on your network speed).
  • Light:
    • It is a small package, so it does not use a lot of disk space on your Walrus.
  • On-demand ready:
    • With eustore, you have access to a lot of official EMIs as well as community ones.
  • The ephemeral root device usually has better I/O than EBS, and its disk space is not billed


Drawbacks:

  • Volatile:
    • If an NC crashes, all of its instances are lost (along with the data stored on their primary disk)
  • Harder to customize and to create new EMIs from a running instance, as its disk is deleted on shutdown and there is no on-the-fly image creation
  • Cannot be stopped: on stop, the instance always goes to terminated

bfEBS Instances

As you can see on the picture, bfEBS instances can run on any node, like instance-store ones, but they boot from an EBS volume on the Storage Controller instead of the hard drive of the Node Controller. This makes our VM dependent on the IOps limits of that volume. It is also a useful feature: you can create instances whose disks have different IOps according to the VM usage, with low IOps for front-end VMs and high IOps for databases or file servers.


Advantages:

  • Fault tolerant
    • If the NC crashes, the instance can be restarted on another NC.
  • Different IOps levels
  • Root device can be changed dynamically
  • Volumes can be snapshotted
  • On-the-fly AMI creation


Drawbacks:

  • On Eucalyptus, this may take more time in overlay/das/clvm modes (no Dell/EMC2/NetApp SAN)
  • This uses disk space, which is billed on AWS

Log with RSyslog

Logs are the most useful files a developer or a sysadmin has to check that everything is running properly, or to track down a bug in an application. With one or two servers, they are not very difficult to manage, but when you have more than ten servers, logging in to each one takes a long time. That is where rsyslog helps: with it, you can send all your logs (or only those matching your criteria) to a remote server, which receives them and gives you a single place to collect the logs from all your servers.

There is a very simple design for rsyslog :


As you can imagine, this is useful to have on your cloud, where you can have VMs built from scratch as well as VMs created by an auto-scaling group. To avoid searching through every log to find out which server it came from, we are also going to create templates that write each server's files to a separate directory named after the server. In this part we will log everything to files, but you can also store the logs in a database and use web apps to get a user-friendly view of them.

So first, go to the machine you want to use as the log server. Here I'm using Debian 7, and the config is the same on RedHat. We're going to modify /etc/rsyslog.conf, which is the main configuration file of rsyslog.

First, keep in mind that the syslog server is also a syslog client of itself, so we can comment out part of the default configuration and, at the same time, activate the TCP listener.

#  /etc/rsyslog.conf    Configuration file for rsyslog.
#                       For more information see
#                       /usr/share/doc/rsyslog-doc/html/rsyslog_conf.html
#### MODULES ####

$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog   # provides kernel logging support
#$ModLoad immark  # provides --MARK-- message capability

# provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514
# provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 514
# Use traditional timestamp format.
# To enable high precision timestamps, comment out the following line.
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# Set the default permissions for all log files.
$FileOwner root
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022
# Where to place spool and state files
$WorkDirectory /var/spool/rsyslog
# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf
#### RULES ####
# First some standard log files.  Log by facility.
auth,authpriv.*                        /var/log/auth.log
#*.*;auth,authpriv.none         -/var/log/syslog
#cron.*                         /var/log/cron.log
daemon.*                       -/var/log/daemon.log
kern.*                         -/var/log/kern.log
lpr.*                          -/var/log/lpr.log
mail.*                         -/var/log/mail.log
user.*                         -/var/log/user.log
# Logging for the mail system.  Split it up so that
# it is easy to write scripts to parse these files.
mail.info                      -/var/log/mail.info
mail.warn                      -/var/log/mail.warn
mail.err                       /var/log/mail.err
# Logging for INN news system.
news.crit                      /var/log/news/news.crit
news.err                       /var/log/news/news.err
news.notice                    -/var/log/news/news.notice
# Some "catch-all" log files.
#       auth,authpriv.none;\
#       news.none;mail.none     -/var/log/debug
#       auth,authpriv.none;\
#       cron,daemon.none;\
#       mail,news.none          -/var/log/messages
# Emergencies are sent to everybody logged in.
*.emerg                                :omusrmsg:*

# I like to have messages displayed on the console, but only on a virtual
# console I usually leave idle.
*.=notice;*.=warn       /dev/tty8
# The named pipe /dev/xconsole is for the `xconsole' utility.  To use it,
# you must invoke `xconsole' with the `-file' option:
#    $ xconsole -file /dev/xconsole [...]
# NOTE: adjust the list below, or you'll go crazy if you have a reasonably
#      busy site..
#       news.err;\
#       *.=debug;*.=info;\
#       *.=notice;*.=warn       |/dev/xconsole
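Once rsyslog has been restarted with the TCP listener enabled, a quick way to check it from another machine is to push a test line to port 514. The Python sketch below is only an illustration under a few assumptions: the function names and the "cloud-test" tag are mine, the message uses the classic "<PRI>tag: message" form without timestamp or hostname (which rsyslog should fill in on reception), and the priority is computed as facility * 8 + severity.

```python
import socket

FACILITY_USER = 1    # "user" facility in the syslog protocol
SEVERITY_NOTICE = 5  # "notice" severity


def build_syslog_line(message, tag="cloud-test",
                      facility=FACILITY_USER, severity=SEVERITY_NOTICE):
    """Build one newline-terminated syslog line: <PRI>tag: message."""
    priority = facility * 8 + severity
    return f"<{priority}>{tag}: {message}\n".encode("utf-8")


def send_test_message(server, message, port=514, timeout=5.0):
    """Open a TCP connection to the log server and send a single line."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_syslog_line(message))
```

Calling send_test_message("<your log server>", "hello from a client") with the rules above should make the line show up in /var/log/user.log on the log server, since it is sent with the user facility.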

DNS - Bind9/Named [Part 5]

Forwarding is also one of the very interesting capabilities of bind. Imagine we have somewhere.net hosted by the primary DNS ns1, and a really big zone "tutorials" which is held by a secondary DNS ns2. When we query "test.tutorials.somewhere.net" on the master server (which does not host that zone in its configuration files), we'd like ns1 to ask ns2.somewhere.net. There, ns1 will act as a forwarder.

To do so, we are going to tell the ns1 server that "tutorials" is held by ns2. So in our db.somewhere.net we have :

; db for somewhere.net
$TTL    86400
@            IN    SOA       ns1.somewhere.net. root.somewhere.net. (
                             1         ; Serial
                             604800         ; Refresh
                             86400         ; Retry
                             2419200         ; Expire
                             86400 )       ; Negative Cache TTL
; Here we define the nameservers of the domain.
@            IN    NS        ns1.somewhere.net.
@            IN    NS        ns2.somewhere.net.
;Here we set the MX records for our domain
@            IN    MX    10  smtp.somewhere.net.
; Now we set the IP of the nameservers - Use yours
ns1          IN    A
ns2          IN    A
; Now, we set some zones
www          IN    A
smtp         IN    A

Great. We also have to add this zone as a forwarded one in our named.conf.local

zone "tutorials.somewhere.net" {
     type forward;
     forward only;
     forwarders {; };
};

DNS - Bind9/Named [Part 4]

Ok, now we have a DNS server with all queries logged and some ACL management. But what if this server fails? To provide high availability, we're going to set up a master-slave DNS system. Of course, do not forget to add the second DNS server to your clients' configuration.

So on the master server, we are going to create a key that we will use to authenticate updates :

dnssec-keygen -a hmac-md5 -b 256 -n host somewhere.net

Once the key is generated, get the key.

cat <yourkey>.private | grep Key

We are going to create a new file in which we will add our server set : ns-lan.conf

key "LAN-TRANSFER" {
     algorithm hmac-md5;
     secret "<your key here>";
};
server <master server IP> {
     keys { LAN-TRANSFER; };
};
server <slave server IP> {
     keys { LAN-TRANSFER; };
};

Ok, so now bind is able to identify each server and the key assigned to it. Copy the key and the configuration file to your slave server. I will assume that your secondary server has the same original named.conf.local and db.somewhere.net. Let's look at the local config file and edit some parts

zone "somewhere.net" {
     type slave;
     file "/etc/bind/db.somewhere.net";
     masters {; };
     allow-notify {; };
};

Our server is now serving the DNS zone as a slave. If you make an update on the master, the slave server will be notified about it and will update itself. (Do not forget to give bind write access on /etc/bind.) Testing is very easy: edit the zone file on your master server and check in the logs that the slave received the notification. After that, stop the master server and try to ping (or use dig, which is more advanced) to see if the slave server is able to give you the correct answer.
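If you prefer a scripted check over reading the logs, one option is to compare the SOA serial returned by each server: the slave only pulls a new copy of the zone when the serial on the master has increased, so remember to bump it when you edit the zone. The Python sketch below is a minimal illustration, assuming dig is installed on the machine running it; the function names are mine, and the server addresses you pass in are your own.

```python
import subprocess


def parse_soa_serial(dig_short_output):
    """Extract the serial from `dig +short SOA` output.

    The short form is: mname rname serial refresh retry expire minimum
    """
    fields = dig_short_output.split()
    if len(fields) < 3:
        raise ValueError("no SOA record in dig output")
    return int(fields[2])


def soa_serial(zone, server):
    """Ask one name server for the SOA serial of a zone, using dig."""
    result = subprocess.run(
        ["dig", "+short", "SOA", zone, f"@{server}"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_soa_serial(result.stdout)


def in_sync(zone, master, slave):
    """True when master and slave report the same SOA serial."""
    return soa_serial(zone, master) == soa_serial(zone, slave)
```

For example, in_sync("somewhere.net", "<master IP>", "<slave IP>") should return True shortly after the slave has processed the notification.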

For a "master-master" configuration, add on your masters the instruction :

also-notify { sec-master-ip; };