07 Sep

Reconfiguring Mesos Agents (Slaves) with new resources

Problem:

You want to add new resources to a Mesos Agent, for example to open new ports or to restrict the number of CPUs. When you restart the Mesos Agent, it fails with an error such as “Failed to perform recovery: Incompatible slave info detected.”

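By default (as of Mesos 0.23), Mesos Agents recover their state with the “strict” flag set to true, and with strict=true any and all recovery errors are considered fatal. Changing the resources changes the agent's slave info, which then no longer matches the info checkpointed by the previous run, so recovery fails.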

Recovery is worth having: if a Mesos Agent restarts, things resume from a known state.

Solution:

If the state of the Mesos Agent does not matter, you can either restart the agent with the “strict” flag set to false, or clear its state and start fresh, which also kills any running Docker containers. For the latter, run:

# the commands below assume systemd; adapt them to your system
$ systemctl stop mesos-slave
# update the resources
$ vi /etc/mesos-slave/resources
# clean up the previous state (/tmp/mesos is the default work_dir)
$ rm -rf /tmp/mesos/meta/
# restart Docker too, which stops any running containers
$ systemctl restart docker
# start mesos-slave and watch the logs for errors
$ systemctl start mesos-slave && journalctl -u mesos-slave.service -f -a
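
The resources edit can also be scripted. The sketch below assumes the packaging convention implied by the path above, where each file in /etc/mesos-slave/ holds the value of the agent flag it is named after (so the `resources` file becomes `--resources=...`). The function name `set_agent_flag` and the example resource string are illustrative, not part of Mesos. Note that by default Mesos offers ports 31000-32000, so a fixed host port such as 8080 has to be added explicitly before Marathon can assign it.

```shell
#!/bin/sh
# Hypothetical helper: write VALUE into <conf_dir>/<flag>, assuming the init
# wrapper turns each file in that directory into --<flag>=<file contents>.
# conf_dir defaults to /etc/mesos-slave and can be overridden (e.g. for testing).
set_agent_flag() {
  flag=$1
  value=$2
  conf_dir=${3:-/etc/mesos-slave}
  # printf avoids echo's inconsistent handling of backslashes and options
  printf '%s\n' "$value" > "$conf_dir/$flag"
}

# Example (as root, with the agent stopped): 2 CPUs, default ports plus 8080
#   set_agent_flag resources 'cpus:2;ports:[31000-32000,8080-8080]'
```

Run it between the `systemctl stop` and `systemctl start` steps, in place of the manual `vi` edit.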

05 Sep

Kitematic – Simple Cleanup Script for Docker

I had been wondering when this problem would hit my local box on OS X, and I thought I would share one solution in case it is useful to anyone searching for it. The problem started when I was building a new Docker image and got a “no space left on device” error.

Solution:

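# open a shell on the Docker Machine VM (named "dev" here)
$ docker-machine ssh dev
# inside the VM: remove every untagged ("<none>") image
$ docker images | tail -n +2 | awk '$1 == "<none>" {print $3}' | xargs docker rmi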
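
If there are no untagged images, the one-liner above still runs `docker rmi` with no arguments, which fails with a usage error. A slightly more defensive sketch follows; the function names are hypothetical, and the filter is the same awk expression, with `NR > 1` taking the place of `tail -n +2` to skip the header row. Docker can also list a close equivalent of these images itself with `docker images -q -f dangling=true`.

```shell
#!/bin/sh
# Read `docker images` output on stdin and print the IMAGE ID of every image
# whose REPOSITORY column is "<none>" (untagged). NR > 1 skips the header row.
untagged_image_ids() {
  awk 'NR > 1 && $1 == "<none>" { print $3 }'
}

# Remove untagged images, but only call `docker rmi` when there is something
# to remove, so it is never invoked without arguments.
cleanup_untagged_images() {
  ids=$(docker images | untagged_image_ids)
  if [ -z "$ids" ]; then
    echo "No untagged images found."
    return 0
  fi
  # $ids is left unquoted on purpose: each ID becomes a separate argument.
  docker rmi $ids
}
```

Inside the VM, `docker images | untagged_image_ids` previews what would be deleted, and `cleanup_untagged_images` removes it.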

13 Feb

Monitoring Docker containers with cAdvisor from Marathon

Problem:

You want to monitor the resource utilisation of Docker containers in a Mesos cluster, for example to decide how much CPU and memory to give each container or to understand when to scale up or down.

Solution:

cAdvisor is an easy-to-use monitoring tool for Docker containers. It is itself packaged as a Docker image (google/cadvisor), ready to run on each Mesos slave.

With Marathon and Mesos, it is very easy to deploy a cAdvisor agent on each slave. Marathon lets you define placement constraints: the hostname:UNIQUE constraint used here ensures that no two cAdvisor instances land on the same slave, which spreads them evenly across the cluster.

Below is the body of the HTTP POST request to Marathon's /v2/apps endpoint that deploys cAdvisor. Set “instances” to the number of slaves so that each slave runs one copy.

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "google/cadvisor:latest"
    },
    "volumes": [
      {
        "containerPath": "/rootfs",
        "hostPath": "/",
        "mode": "RO"
      },
      {
        "containerPath": "/var/run",
        "hostPath": "/var/run",
        "mode": "RW"
      },
      {
        "containerPath": "/sys",
        "hostPath": "/sys",
        "mode": "RO"
      },
      {
        "containerPath": "/var/lib/docker",
        "hostPath": "/var/lib/docker",
        "mode": "RO"
      },
      {
        "containerPath": "/cgroup",
        "hostPath": "/cgroup",
        "mode": "RO"
      }
    ],
    "network": "BRIDGE",
    "portMappings": [
      { "containerPort": 8080, "hostPort": 8080, "protocol": "tcp" }
    ]
  },
  "id": "cadvisor",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "constraints": [
    [
      "hostname",
      "UNIQUE"
    ]
  ],
  "ports": [
    8080
  ]
}
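
One way to send this request is with curl. The sketch assumes the JSON above is saved as `cadvisor.json` and that you supply Marathon's base URL; the file name, the example URL, and the function name `deploy_marathon_app` are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: POST a Marathon app definition (a JSON file) to
# <marathon_url>/v2/apps, the Marathon REST endpoint that creates apps.
# --fail makes curl exit non-zero when Marathon answers with an HTTP error.
deploy_marathon_app() {
  marathon_url=$1
  app_json=$2
  curl --fail --silent --show-error \
    -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @"$app_json" \
    "${marathon_url%/}/v2/apps"   # strip a trailing slash before appending
}
```

For example: `deploy_marathon_app http://marathon.example.com:8080 cadvisor.json`.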

If, as in my case, the Mesos slaves run on CentOS 7, you also need to open port 8080 in the firewall:

$ firewall-cmd --zone=public --add-port=8080/tcp --permanent
$ firewall-cmd --reload

UPDATE: Also make sure IP forwarding is enabled when you run Docker in BRIDGE mode:

$ sysctl -w net.ipv4.ip_forward=1
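
`sysctl -w` only changes the running kernel, so the setting is lost on reboot. One way to persist it on a systemd-based system such as CentOS 7 is a drop-in file under /etc/sysctl.d/, which is applied at boot. The file name `90-ip-forward.conf` and the function `persist_ip_forward` are illustrative choices.

```shell
#!/bin/sh
# Write a sysctl drop-in that enables IPv4 forwarding, then apply it now.
# conf_dir defaults to /etc/sysctl.d and can be overridden (e.g. for testing).
persist_ip_forward() {
  conf_dir=${1:-/etc/sysctl.d}
  conf_file="$conf_dir/90-ip-forward.conf"
  printf 'net.ipv4.ip_forward = 1\n' > "$conf_file"
  # Load the settings from that file into the running kernel
  sysctl -p "$conf_file"
}
```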

Once the deployment is complete, the cAdvisor UI is available on port 8080 of every slave running an instance (http://<slave-host>:8080/).

[Screenshot: the cAdvisor web UI]
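
To confirm that every slave is actually serving the UI, a quick loop over the slave host names can help. This is a sketch: the host names are placeholders you supply, `check_cadvisor` is a hypothetical name, and the check only verifies that port 8080 answers HTTP successfully.

```shell
#!/bin/sh
# Hypothetical check: report whether cAdvisor answers on port 8080 of each host.
# Returns non-zero if any host is down. Usage: check_cadvisor host1 host2 ...
check_cadvisor() {
  result=0
  for host in "$@"; do
    # -f: fail on HTTP errors, -L: follow redirects to the UI page,
    # --max-time: do not hang on unreachable hosts
    if curl -fsL --max-time 5 -o /dev/null "http://$host:8080/"; then
      echo "$host: up"
    else
      echo "$host: DOWN"
      result=1
    fi
  done
  return $result
}
```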