I was able to get swarm to register mesos workers today.
Rather than running swarm externally, I ran my swarm setup on the mesos cluster itself, so that I wouldn't have to worry about external IPs, etc. So:
1) SSH into the mesos master
On my first attempt, I pointed swarm at a mesos master @ 0.0.0.0:5050, but it turns out that doesn't work.
So, on the running mesos cluster, I first ran netstat to find the address that port 5050 was bound to:
tcp 0 0 10.1.4.62:5050 0.0.0.0:* LISTEN 3071/mesos-master
From there, I plugged that address into the command template from:
https://github.com/docker/swarm/blob/master/cluster/mesos/README.md
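If you'd rather not eyeball the netstat output, a small shell function can pull out the listening address for you. This is a sketch that assumes the netstat column layout shown above (local address in column 4, state in column 6); the name `mesos_master_addr` is my own, not part of mesos or swarm.

```shell
# Print the first local address listening on PORT (default 5050), reading
# `netstat -tln` / `netstat -tlnp` output on stdin.
# Column 4 is the local address, column 6 is the socket state.
mesos_master_addr() {
  awk -v port="${1:-5050}" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $4; exit }'
}

# Usage on the mesos master:
#   netstat -tln | mesos_master_addr
# Fed the netstat line above, this prints 10.1.4.62:5050
```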
2) ip-10-0-5-63 opt # docker run -e SWARM_MESOS_USER=root -p 2375:2375 -p 3375:3375 swarm manage -c mesos-experimental --cluster-opt mesos.address=0.0.0.0 --cluster-opt mesos.port=3375 10.1.4.62:5050
time="2017-07-17T20:47:02Z" level=warning msg="WARNING: the mesos driver is currently experimental, use at your own risks"
ERROR: logging before flag.Parse: I0717 20:47:02.138308 1 scheduler.go:323] Initializing mesos scheduler driver
ERROR: logging before flag.Parse: I0717 20:47:02.138490 1 scheduler.go:792] Starting the scheduler driver...
ERROR: logging before flag.Parse: I0717 20:47:02.138626 1 http_transporter.go:407] listening on 0.0.0.0 port 3375
ERROR: logging before flag.Parse: I0717 20:47:02.138720 1 scheduler.go:809] Mesos scheduler driver started with PID=scheduler(1)@172.17.0.2:3375
ERROR: logging before flag.Parse: I0717 20:47:02.147828 1 scheduler.go:374] New master master@10.0.5.63:5050 detected
ERROR: logging before flag.Parse: I0717 20:47:02.147864 1 scheduler.go:435] No credentials were provided. Attempting to register scheduler without authentication.
ERROR: logging before flag.Parse: I0717 20:47:02.150182 1 scheduler.go:535] Framework registered with ID=735b9761-22db-42bb-82b3-c811b164542b-0006
time="2017-07-17T20:47:02Z" level=info msg="Listening for HTTP" addr=":2375" proto=tcp
time="2017-07-17T20:47:02Z" level=error msg="Cannot connect to the Docker daemon at tcp://10.0.2.213:2375. Is the docker daemon running?"
time="2017-07-17T20:47:02Z" level=error msg="Cannot connect to the Docker daemon at tcp://10.0.0.149:2375. Is the docker daemon running?"
At that point, it seemed to work, but I got:
3) No available offers! :(
docker -H tcp://0.0.0.0:2375 info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: swarm/1.2.8
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint, whitelist
Offers: 0
Plugins:
Volume:
Network:
Swarm:
NodeID:
Is Manager: false
Node Address:
Security Options:
Kernel Version: 4.7.3-coreos-r3
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: 055c1bf98028
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
WARNING: No kernel memory limit support
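Zero offers, together with the two "Cannot connect to the Docker daemon" errors in the manager log, is the clue. One quick way to confirm that an agent isn't exposing its daemon is to probe TCP port 2375 directly. This is a minimal sketch, assuming bash (for its `/dev/tcp` redirection) and coreutils `timeout`; it only checks that the port accepts connections, not that Docker is what answers. The function names and the `DOCKER_TCP_PORT` / `PROBE_TIMEOUT` variables are hypothetical.

```shell
# Return 0 if HOST accepts a TCP connection on PORT
# (default: $DOCKER_TCP_PORT, else 2375, the Docker API port swarm dials).
check_docker_tcp() {
  timeout "${PROBE_TIMEOUT:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' probe \
    "$1" "${2:-${DOCKER_TCP_PORT:-2375}}" 2>/dev/null
}

# Print one line per agent saying whether its Docker TCP port is reachable.
report_docker_tcp() {
  local h
  for h in "$@"; do
    if check_docker_tcp "$h"; then
      echo "$h: reachable"
    else
      echo "$h: NOT reachable"
    fi
  done
}

# Usage with the agents from the manager's errors:
#   report_docker_tcp 10.0.2.213 10.0.0.149
```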
4) The missing sauce? It turns out there is no guarantee that every node in a cluster exposes docker through an external port. To fix this, you actually have to CREATE a unit file that exposes the docker daemon. For example, on a CoreOS cluster, you have to do this:
https://coreos.com/os/docs/latest/customizing-docker.html
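For reference, here is roughly what that looks like, based on my reading of the "enable the remote API on a new socket" section of that CoreOS page; double-check it against the current docs before relying on it. The `write_docker_tcp_socket` helper is hypothetical: it just writes the socket unit into a directory (by default the usual systemd one) so you can inspect it before activating it.

```shell
# Write a systemd socket unit that has docker listen on TCP 2375.
# DIR defaults to /etc/systemd/system (writing there needs root).
# Note: the plain-TCP Docker API is unauthenticated; keep 2375
# restricted to the cluster network (firewall / security groups).
write_docker_tcp_socket() {
  local dir="${1:-/etc/systemd/system}"
  mkdir -p "$dir"
  cat > "$dir/docker-tcp.socket" <<'EOF'
[Unit]
Description=Docker Socket for the API

[Socket]
ListenStream=2375
BindIPv6Only=both
Service=docker.service

[Install]
WantedBy=sockets.target
EOF
}

# Then, as root on every agent (not executed by this snippet):
#   systemctl enable docker-tcp.socket
#   systemctl stop docker
#   systemctl start docker-tcp.socket
#   systemctl start docker
```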
Once the docker daemon is exposed, you can actually see offers:
Offers: 1
Offer: 735b9761-22db-42bb-82b3-c811b164542b-O515
└ ports: 1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000
└ disk: 34.74 GiB
└ cpus: 4
└ mem: 13.69 GiB
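If you want to script that check instead of reading `docker info` by eye, something like the following works against the output format shown above. It's a sketch: `swarm_offer_count` and `swarm_has_offers` are hypothetical names, and `SWARM_HOST` is an assumed variable that defaults to the tcp://0.0.0.0:2375 address used earlier.

```shell
# Print the "Offers:" count from `docker info` output on stdin (0 if absent).
# Only the exact "Offers" key matches, so "Offer: <id>" lines are ignored.
swarm_offer_count() {
  awk -F': *' '$1 ~ /^ *Offers$/ { print $2; found = 1; exit }
               END { if (!found) print 0 }'
}

# Succeed only if the swarm manager currently holds at least one Mesos offer.
swarm_has_offers() {
  local n
  n=$(docker -H "${SWARM_HOST:-tcp://0.0.0.0:2375}" info 2>/dev/null | swarm_offer_count)
  [ "${n:-0}" -gt 0 ]
}

# Usage:
#   swarm_has_offers && echo "got offers" || echo "no offers yet"
```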
So, TL;DR:
The final command I ran was:
docker run -p 2375:2375 -p 3375:3375 -e SWARM_MESOS_USER=root swarm manage -c mesos-experimental --cluster-opt mesos.address=0.0.0.0 --cluster-opt mesos.port=3375 10.0.5.63:5050
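If you end up restarting the manager a lot, the same command can be wrapped in a small function. This sketch only parameterizes that command: the Mesos master address is the argument, SWARM_MESOS_USER defaults to root (without a user, container launches fail with a chown error), and DRY_RUN=1 prints the command instead of running it. The function name and the DRY_RUN switch are my own additions.

```shell
# Start the swarm manager against a Mesos master given as IP:PORT.
# Set DRY_RUN=1 to print the docker command instead of executing it.
start_swarm_manager() {
  local master="$1" user
  if [ -z "$master" ]; then
    echo "usage: start_swarm_manager MASTER_IP:5050" >&2
    return 2
  fi
  # Mesos needs a user to chown the task sandbox; root is what worked here.
  user="${SWARM_MESOS_USER:-root}"
  set -- docker run -p 2375:2375 -p 3375:3375 \
    -e SWARM_MESOS_USER="$user" \
    swarm manage -c mesos-experimental \
    --cluster-opt mesos.address=0.0.0.0 \
    --cluster-opt mesos.port=3375 \
    "$master"
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# Usage:
#   start_swarm_manager 10.0.5.63:5050
```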
Setting SWARM_MESOS_USER=root was necessary; otherwise you get:
time="2017-07-17T23:29:04Z" level=error msg="HTTP error: Failed to launch container: Failed to create container: Failed to chown: Failed to get user information for '
': Success; Abnormal executor termination: unknown container : please verify your SWARM_MESOS_USER is correctly set" status=500
And finally, if it's really working, you can see your docker containers being monitored on the DC/OS dashboard :) . A good way to test this is:
docker run -c 1 -it progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
Followed by a quick glance at the dashboard:
In this case, the stress test I ran caused a bump (from 1 -> 2) in the number of shares being used.
Also notice the -c 1 (CPU shares) in that command. It turns out that argument HAS TO BE PRESENT when running docker commands through DC/OS, because a resource has to be requested.
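Since forgetting -c 1 is easy, one option is a tiny wrapper that adds it when no CPU request is present. This is a hypothetical helper, not something swarm or DC/OS provides; -c is docker run's short form of --cpu-shares. The check is a simple heuristic that scans every argument, so a -c meant for the container's own command would also count. `SWARM_HOST` is the same assumed variable as before, defaulting to tcp://0.0.0.0:2375.

```shell
# Run a container through the swarm manager, prepending "-c 1" when no
# CPU-shares option was given. DRY_RUN=1 prints the command instead.
swarm_run() {
  local host="${SWARM_HOST:-tcp://0.0.0.0:2375}" has_cpu="" a
  for a in "$@"; do
    case "$a" in
      -c|--cpu-shares|--cpu-shares=*) has_cpu=1 ;;
    esac
  done
  if [ -z "$has_cpu" ]; then set -- -c 1 "$@"; fi
  if [ -n "$DRY_RUN" ]; then
    echo docker -H "$host" run "$*"
  else
    docker -H "$host" run "$@"
  fi
}

# Usage (ends up as: docker -H tcp://0.0.0.0:2375 run -c 1 -it progrium/stress ...):
#   swarm_run -it progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
```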
