07
Sep

Reconfiguring Mesos Agents (Slaves) with new resources


Problem:

You want to add new resources to a Mesos Agent. Maybe you want to open new ports or restrict the number of CPUs, etc. When restarting the Mesos Agent you get an error like “Failed to perform recovery: Incompatible slave info detected.”

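By default, a Mesos Agent (as of Mesos 0.23) tries to recover its previous state on startup with the “strict” flag enabled. If strict=true, any and all recovery errors are considered fatal, and that includes agent info (such as resources) that no longer matches the checkpointed state.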
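As a rough sketch of how the flag could be turned off: the agent accepts --strict=false on its command line. The helper below assumes the packaged setup where each file in /etc/mesos-slave is turned into a --<name>=<contents> flag (the same mechanism the resources file relies on). The function name set_agent_flag is hypothetical, and you should check how your own init scripts build the agent's flags before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: store one agent flag as a file, the way /etc/mesos-slave/resources is used.
# ASSUMPTION: the init wrapper turns each file <name> into --<name>=<contents>.
# set_agent_flag is a hypothetical helper, not part of Mesos.
set_agent_flag() {
  local conf_dir="$1" name="$2" value="$3"
  mkdir -p "$conf_dir"
  # The file name is the flag name; the file content is the flag value.
  printf '%s\n' "$value" > "$conf_dir/$name"
}

# Usage (as root), followed by an agent restart:
#   set_agent_flag /etc/mesos-slave strict false
#   systemctl restart mesos-slave
```

With strict disabled, recovery errors are no longer treated as fatal, so only use this when the previous state really does not matter.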

Recovery is a nice thing to have, and it’s comforting to know that if a Mesos Agent restarts, things resume from a known state.

Solution:

When the state of the Mesos Agent does not matter, there are two ways to solve the problem: restart the Mesos Agent with the “strict” flag set to false, or clear the state and start fresh, which also means killing any running Docker processes. To achieve the latter you can issue:

# the script below works with systemd; adapt it to your system
$ systemctl stop mesos-slave
# update resources
$ vi /etc/mesos-slave/resources
# clean up any previous state (path assumes the agent's work_dir is /tmp/mesos)
$ rm -rf /tmp/mesos/meta/
# restart docker process too
$ systemctl restart docker
# start mesos-slave and watch for any errors in the logs
$ systemctl start mesos-slave && journalctl -u mesos-slave.service -f -a
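
For the "update resources" step, here is a minimal sketch of what the resources file could contain. It uses the Mesos resource string format (name:value pairs separated by semicolons, with ranges in square brackets). The helper name render_resources and all the numbers are hypothetical, so substitute the CPU count, memory (in MB) and port range that fit your machine.

```shell
#!/usr/bin/env bash
# Sketch: build a Mesos resources string from a few values.
# render_resources is a hypothetical helper; the values passed in are examples only.
render_resources() {
  local cpus="$1" mem_mb="$2" port_lo="$3" port_hi="$4"
  # Scalars are name:value, ranges are name:[low-high], entries are joined with ';'.
  printf 'cpus:%s;mem:%s;ports:[%s-%s]\n' "$cpus" "$mem_mb" "$port_lo" "$port_hi"
}

# Usage (as root), instead of editing the file by hand:
#   render_resources 4 8192 31000 32000 > /etc/mesos-slave/resources
```

Generating the file this way rather than with vi makes the change easy to repeat across several agents.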

Resources:

Mesos Slave Recovery
Mesos Configuration
Mesos Attributes and Resources
