How to properly run a container with containerd's ctr using --uidmap/gidmap and --net-host option

I'm running a container with ctr. In addition to using user namespaces to map the user within the container (root) to another user on the host, I want to make the host's networking available to the container. For this, I'm using the --net-host option. My test is based on a very simple container:
$ cat Dockerfile
FROM alpine
ENTRYPOINT ["/bin/sh"]
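For reference, one way to get such an image into containerd is to build it with Docker and import the tarball; a sketch (the tag matches the ctr command below):
docker build -t docker.io/library/test:latest .
docker save docker.io/library/test:latest -o test.tar
sudo ctr images import test.tar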
I try it with
sudo ctr run --rm --uidmap "0:1000:999" --gidmap "0:1000:999" --net-host docker.io/library/test:latest test
which gives me the following error
ctr: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"sysfs\\\" to rootfs \\\"/run/containerd/io.containerd.runtime.v2.task/default/test/rootfs\\\" at \\\"/sys\\\" caused \\\"operation not permitted\\\"\"": unknown
Everything works fine if I either
remove the --net-host flag or
remove the --uidmap/--gidmap arguments
I tried adding the user with host uid=1000 to the netdev group, but I still get the same error.
Do I maybe need to use network namespaces?
EDIT:
Meanwhile I found out that it's an issue within runc. If I use user namespaces by adding the following to the config.json
"linux": {
"uidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 999
}
],
"gidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 999
}
],
and additionally do not use a network namespace, which means leaving out the entry
{
"type": "network"
},
within the "namespaces" section, I got the following error from runc:
$ sudo runc run test
WARN[0000] exit status 1
ERRO[0000] container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"sysfs\\\" to rootfs \\\"/vagrant/test/rootfs\\\" at \\\"/sys\\\" caused \\\"operation not permitted\\\"\""
container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"sysfs\\\" to rootfs \\\"/vagrant/test/rootfs\\\" at \\\"/sys\\\" caused \\\"operation not permitted\\\"\""

I finally found the answer in this issue in runc. It's basically a kernel restriction: a process whose user namespace does not own the network namespace does not have the CAP_SYS_ADMIN capability over it, and without that capability it can't mount sysfs. Since the host user that the container's root user is mapped to did not create the host network namespace, it does not have CAP_SYS_ADMIN there.
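The restriction can be reproduced outside of runc with util-linux's unshare; a sketch (exact error messages may vary by kernel version):
# new user + mount namespaces, but staying in the host's network namespace:
unshare --user --map-root-user --mount sh -c 'mount -t sysfs sysfs /mnt'
# fails with "permission denied": the user namespace does not own the net namespace
# adding a new network namespace makes the same mount succeed:
unshare --user --map-root-user --mount --net sh -c 'mount -t sysfs sysfs /mnt && echo mounted'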
From the discussion in the runc issue, I do see the following options for now:
remove mounting of sysfs.
Within the config.json that runc uses, remove the following section within "mounts":
{
    "destination": "/sys",
    "type": "sysfs",
    "source": "sysfs",
    "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
    ]
},
In my case, I also couldn't mount /etc/resolv.conf. After removing those two mounts, the container ran fine and had host network access. This does not work with ctr, though. (A jq sketch for removing both mounts follows after this list of options.)
set up a bridge from the host network namespace to the network namespace of the container (see here and slirp4netns).
use Docker or Podman if possible; they seem to use slirp4netns for this purpose. There is an old moby issue that also might be interesting.
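For the first option, a hedged one-liner to strip both mounts from the runc config.json (assuming jq is available and config.json is in the current bundle directory):
jq 'del(.mounts[] | select(.destination == "/sys" or .destination == "/etc/resolv.conf"))' config.json > config.json.new \
  && mv config.json.new config.json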

Related

Unable to run a health check for mysql image on AWS ECS task

I have a cluster on AWS and it has a task that should run two containers. One of the containers, called app, depends on my db container, which is a mysql image, and should run only once db is healthy.
I have managed to write a healthcheck in a docker compose file with the command
["CMD","mysqladmin","ping","-h","localhost"].
However, I'm unable to run the same command in my AWS container; the container is always unhealthy when I put this command in my task definition.
I have replaced CMD with CMD-SHELL with the same result. I have also tried the generic healthcheck command advised by AWS, [ "CMD-SHELL", "curl -f http://localhost/ || exit 1" ],
but since the mysql image does not have curl installed I get the same result again.
What am I doing wrong?
I'm using exactly the same command (CMD) in my test deployment and it works. This makes me think that you may be interpreting the unhealthy state as the health check command not working, but there are a number of other reasons for that state.
For example, during my tests the container wasn't able to start because I was using EFS for persistent storage, and every time I updated the task the files were not deleted, so the container failed with Unable to lock ./ibdata1.
There's apparently a second issue anyway: CMD and CMD-SHELL (which are part of the Docker syntax and unrelated to the OS) require different syntax. CMD needs a list, while CMD-SHELL takes a single string as the second item of the array.
Basically, if you try this in compose, it works:
["CMD","mysqladmin","ping","-h","localhost"]
this does not:
["CMD-SHELL","mysqladmin","ping","-h","localhost"]
and it should be instead:
["CMD-SHELL","mysqladmin ping -h localhost"]
So, to answer the question: make sure to create a log group in CloudWatch, set "logConfiguration" in the container definition, and check the logs; more than likely there's something wrong inside that container. Also make sure to use the right syntax in the container definition, either this example
"healthCheck": {
"retries": 3,
"command": ["CMD","mysqladmin","ping","-h","localhost"],
"timeout": 5,
"interval": 5,
"startPeriod": null
},
or this one, provided a shell is available in the container:
"healthCheck": {
"retries": 3,
"command": ["CMD-SHELL","mysqladmin ping -h localhost"],
"timeout": 5,
"interval": 5,
"startPeriod": null
},
In your logs you should find a lot of records like
[Note] Access denied for user 'root'@'localhost' (using password: NO)
That is the healthcheck running. It fails because we didn't set any password, but as the mysqladmin docs state:
The return status from mysqladmin is 0 if the server is running, 1 if it is not. This is 0 even in case of an error such as Access denied, because this means that the server is running but refused the connection, which is different from the server not running.
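You can see that behavior locally; a minimal sketch with Docker (the container name hc-test is arbitrary):
docker run -d --name hc-test -e MYSQL_ROOT_PASSWORD=root mysql:5.7
sleep 30    # give the server time to initialize
docker exec hc-test mysqladmin ping -h localhost
echo $?     # prints 0: the server is running, even if access is denied
docker rm -f hc-test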

How to deploy MySQL docker image on AWS ECS?

I have trouble deploying a MySQL image on AWS ECS Fargate.
The CloudFormation script that I have is this (don't mind the syntax; I am using the Python lib Troposphere to manage CloudFormation templates):
TaskDefinition(
    'WordpressDatabaseTaskDefinition',
    RequiresCompatibilities=['FARGATE'],
    Cpu='512',
    Memory='2048',
    NetworkMode='awsvpc',
    ContainerDefinitions=[
        ContainerDefinition(
            Name='WordpressDatabaseContainer',
            Image='mysql:5.7',
            Environment=[
                Environment(Name='MYSQL_ROOT_PASSWORD', Value='root'),
                Environment(Name='MYSQL_DATABASE', Value='wpdb'),
                Environment(Name='MYSQL_USER', Value='root'),
                Environment(Name='MYSQL_PASSWORD', Value='root'),
            ],
            PortMappings=[
                PortMapping(
                    ContainerPort=3306
                )
            ]
        )
    ]
)
The deployment succeeds. I can even see that the task runs for a few seconds until its state changes to STOPPED.
The only thing that i can see is:
Stopped reason Essential container in task exited
Exit Code 1
On localhost it works like a charm. What am I doing wrong here? At least, are there ways to debug this?
With AWS ECS, if the task is stopping, it may be failing a health check, which is causing the container to restart. What port is the DB container mapped to, and can you check the container logs to see what is happening when it starts and then stops? Also, check the logs in ECS under the service or task. Post them here so I can take a look.
So, I found my mistake.
THE VERY FIRST THING YOU DO is test the Docker container on localhost and see if you can reproduce the issue. In my case, the MySQL Docker container crashed on a local machine with the exact same environment too. I was able to inspect the logs and found out that it failed to create the "root" user. Simply changing the user and password made everything work, even on ECS.
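For reference, a minimal local reproduction sketch (as far as I know, the mysql image's init script fails when MYSQL_USER is set to root, because that account already exists):
docker run --rm -e MYSQL_ROOT_PASSWORD=root -e MYSQL_DATABASE=wpdb \
  -e MYSQL_USER=root -e MYSQL_PASSWORD=root mysql:5.7
# fails during initialization: MYSQL_USER="root" clashes with the built-in root account
docker run --rm -e MYSQL_ROOT_PASSWORD=root -e MYSQL_DATABASE=wpdb \
  -e MYSQL_USER=wordpressuser -e MYSQL_PASSWORD=wordpressuserpassword mysql:5.7
# initializes and keeps running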
This is the complete stack to have a mysql docker image running on AWS ECS FARGATE:
self.wordpress_database_task = TaskDefinition(
    'WordpressDatabaseTaskDefinition',
    RequiresCompatibilities=['FARGATE'],
    Cpu='512',
    Memory='2048',
    NetworkMode='awsvpc',
    # If your tasks are using the Fargate launch type, the host and sourcePath parameters are not supported.
    Volumes=[
        Volume(
            Name='MySqlVolume',
            DockerVolumeConfiguration=DockerVolumeConfiguration(
                Scope='shared',
                Autoprovision=True
            )
        )
    ],
    ContainerDefinitions=[
        ContainerDefinition(
            Name='WordpressDatabaseContainer',
            Image='mysql:5.7',
            Environment=[
                Environment(Name='MYSQL_ROOT_PASSWORD', Value='root'),
                Environment(Name='MYSQL_DATABASE', Value='wpdb'),
                Environment(Name='MYSQL_USER', Value='wordpressuser'),
                Environment(Name='MYSQL_PASSWORD', Value='wordpressuserpassword'),
            ],
            PortMappings=[
                PortMapping(
                    ContainerPort=3306
                )
            ]
        )
    ]
)
self.wordpress_database_service = Service(
    'WordpressDatabaseService',
    Cluster=Ref(self.ecs_cluster),
    DesiredCount=1,
    TaskDefinition=Ref(self.wordpress_database_task),
    LaunchType='FARGATE',
    NetworkConfiguration=NetworkConfiguration(
        AwsvpcConfiguration=AwsvpcConfiguration(
            Subnets=[Ref(sub) for sub in VpcFormation().public_subnets],
            AssignPublicIp='ENABLED',
            SecurityGroups=[Ref(self.security_group)]
        )
    ),
)
Note the AssignPublicIp='ENABLED' option, which lets you connect to the database remotely.
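Remote access also requires the security group referenced above to allow inbound MySQL traffic; a hedged sketch with the AWS CLI (the group ID and CIDR are placeholders):
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 3306 \
    --cidr 198.51.100.7/32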
After the stack completed, I was able to successfully connect with the command:
mysql -uwordpressuser -pwordpressuserpassword -h18.202.31.123
That's it :)

How to display the content of the volume?

How can I display the content of an OpenShift volume (the files in it, the total space used, etc.)?
The only information I've managed to find in the docs is to oc rsh into a running pod and use ls, which of course is not a viable solution if no pod using the volume is running and none can be started because of some issue with the volume...
For the moment there's no "volume file explorer" or similar interface in OpenShift.
Currently you always need to attach the volume to a running pod and list the files within.
If you're using glusterfs (and are a cluster/storage admin), all volumes are also mounted inside the storage pods, so you can get a complete overview from within the storage pods.
I don't know whether these approaches fit your situation, but I'll list the options I know of.
As far as I remember, if the pod can be created from a Docker image, you can run it without starting the application, like this:
oc run tmp-pod --image=your-docker-registry.default.svc/yourapplication -- tail -f /dev/null
If you are using a PersistentVolume (PV/PVC pair) for your volume, you can inspect it after temporarily mounting the PV to a temporary pod as follows.
oc run tmp-pod --image=registry.access.redhat.com/rhel7 -- tail -f /dev/null
oc set volume dc/tmp-pod --add -t pvc --name=new-registry --claim-name=new-registry --mount-path=/mountpath
You can see the volume contents mounted by the above configuration via tmp-pod, and you can simply remove the temporary pod after checking.
I hope it helps you.
The solution proposed by @Daein Park to display the PersistentVolume (PV/PVC pair) content was not working for me. The command oc run tmp-pod does not create a dc (DeploymentConfig), and it seems impossible to set a volume on a bare pod.
My solution was to use the following command:
oc run tmp-pod --image=dummy --restart=Never --overrides='{"spec":{"containers":[{"command":["tail","-f","/dev/null"],"image":"registry.access.redhat.com/rhel7","name":"tmp-pod","volumeMounts":[{"mountPath":"/mountpath","name":"volume"}]}],"volumes":[{"name":"volume","persistentVolumeClaim":{"claimName":"pv-clain"}}]}}'
NOTE: --image=dummy is only provided to make the oc run command happy; the image field is overridden in the JSON anyway.
Finally, to list the content of the mounted volume:
oc rsh tmp-pod ls /mountpath
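To also answer the "total space used" part of the question, the same pod can run du, and the pod can be deleted afterwards (a sketch):
oc rsh tmp-pod du -sh /mountpath    # total space used on the volume
oc delete pod tmp-pod               # remove the temporary pod when done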
As the JSON content is not easy to read on the command line, here is what is provided to the --overrides parameter:
{
    "spec": {
        "containers": [{
            "command": ["tail", "-f", "/dev/null"],
            "image": "registry.access.redhat.com/rhel7",
            "name": "tmp-pod",
            "volumeMounts": [{
                "mountPath": "/mountpath",
                "name": "volume"
            }]
        }],
        "volumes": [{
            "name": "volume",
            "persistentVolumeClaim": {
                "claimName": "pv-clain"
            }
        }]
    }
}

How to run several IPFS nodes on a single machine?

For testing, I want to be able to run several IPFS nodes on a single machine.
This is the scenario:
I am building small services on top of the IPFS core library, following the Making your own IPFS service guide. When I try to put client and server on the same machine (note that each of them will create its own IPFS node), I get the following:
panic: cannot acquire lock: Lock FcntlFlock of /Users/long/.ipfs/repo.lock failed: resource temporarily unavailable
Usually, when you start with IPFS, you will use ipfs init, which will create a new node. The default data and config for that node are stored at ~/.ipfs. Here is how you can create a second node and configure it so it can run beside your default node.
1. Create a new node
For a new node you have to use ipfs init again. Use for instance the following:
IPFS_PATH=~/.ipfs2 ipfs init
This will create a new node at ~/.ipfs2 (not using the default path).
2. Change Address Configs
As both of your nodes now bind to the same ports, you need to change the port configuration so both nodes can run side by side. For this, open ~/.ipfs2/config and find Addresses:
"Addresses": {
"API": "/ip4/127.0.0.1/tcp/5001",
"Gateway": "/ip4/127.0.0.1/tcp/8080",
"Swarm": [
"/ip4/0.0.0.0/tcp/4001",
"/ip6/::/tcp/4001"
]
}
Change it, for example, to the following:
"Addresses": {
"API": "/ip4/127.0.0.1/tcp/5002",
"Gateway": "/ip4/127.0.0.1/tcp/8081",
"Swarm": [
"/ip4/0.0.0.0/tcp/4002",
"/ip6/::/tcp/4002"
]
}
With this, you should be able to run both nodes, .ipfs and .ipfs2, on a single machine.
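Instead of editing the file by hand, the same changes should also be possible with the ipfs config command; a sketch, assuming the second repo lives at ~/.ipfs2:
IPFS_PATH=~/.ipfs2 ipfs config Addresses.API /ip4/127.0.0.1/tcp/5002
IPFS_PATH=~/.ipfs2 ipfs config Addresses.Gateway /ip4/127.0.0.1/tcp/8081
IPFS_PATH=~/.ipfs2 ipfs config --json Addresses.Swarm '["/ip4/0.0.0.0/tcp/4002","/ip6/::/tcp/4002"]'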
Notes:
Whenever you use .ipfs2, you need to set the env variable IPFS_PATH=~/.ipfs2
In your example you need to change either your client or server node from ~/.ipfs to ~/.ipfs2
You can also start the daemon on the second node using IPFS_PATH=~/.ipfs2 ipfs daemon & (see the sketch below)
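Putting the notes together, a sketch of running both daemons and verifying that each has its own peer ID:
ipfs daemon &
IPFS_PATH=~/.ipfs2 ipfs daemon &
# each daemon should report a different peer ID:
ipfs id -f="<id>\n"
IPFS_PATH=~/.ipfs2 ipfs id -f="<id>\n"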
Hello, I used .ipfs2 as described and ran the two daemons at the same time. I can indeed open localhost:5001/webui, but the second one, localhost:5002/webui, shows an error, as shown in the attachment.
Here are some ways I've used to create multiple nodes/peer IDs.
I use Windows 10.
1st node go-ipfs (latest version)
2nd node Siderus Orion ipfs (connects to the Orion node, not local) -- https://orion.siderus.io/
Use VirtualBox to run a minimal ubuntu installation. (You can set up as many as you want)
Repeat the process and you have 4 nodes or as many as you want.
https://discuss.ipfs.io/t/ipfs-manager-download-install-manage-debug-your-ipfs-node/3534 is another GUI that installs IPFS and lets you manage all ipfs commands without the command line. The author released it just a few days ago and it looks well worth a look.
Disclaimer: I am not a coder or computer professional, just a huge fan of IPFS! I hope we can raise awareness and change the world.

packer ssh_private_key_file is invalid

I am trying to use the OpenStack builder in Packer to clone an instance. So far I have developed this script:
{
    "variables": {
    },
    "description": "This will create the baked vm images for any environment from dev to prod.",
    "builders": [
        {
            "type": "openstack",
            "identity_endpoint": "http://192.168.10.10:5000/v3",
            "tenant_name": "admin",
            "domain_name": "Default",
            "username": "admin",
            "password": "****************",
            "region": "RegionOne",
            "image_name": "cirros",
            "flavor": "m1.tiny",
            "insecure": "true",
            "source_image": "0f9b69ee-4e9f-4807-a7c4-6a58355c37b1",
            "communicator": "ssh",
            "ssh_keypair_name": "******************",
            "ssh_private_key_file": "~/.ssh/id_rsa",
            "ssh_username": "root"
        }
    ],
    "provisioners": [
        {
            "type": "shell",
            "inline": [
                "sleep 60"
            ]
        }
    ]
}
But upon running the script using packer build script.json I get the following error:
User:packer User$ packer build script.json
openstack output will be in this color.
1 error(s) occurred:
* ssh_private_key_file is invalid: stat ~/.ssh/id_rsa: no such file or directory
My id_rsa is a file starting and ending with:
-----BEGIN RSA PRIVATE KEY-----
key
-----END RSA PRIVATE KEY-----
which I thought meant it was a PEM-related file, so I found this weird. I made a pastebin of my PACKER_LOG: http://pastebin.com/sgUPRkGs
Initial analysis tells me that the only error is a missing Packer config file. Upon googling this, the top results tell me that if Packer doesn't find one it falls back to defaults. Is this why it is not working?
Any help would be of great assistance. Apparently there are similar problems on the GitHub issues page (https://github.com/mitchellh/packer/issues), but I don't understand some of the posted solutions or whether they apply to me.
I've tried to be as informative as I can. Happy to provide any information where I can!!
Thank you.
* ssh_private_key_file is invalid: stat ~/.ssh/id_rsa: no such file or directory
The "~" character isn't special to the operating system. It's only special to shells and certain other programs which choose to interpret it as referring to your home directory.
It appears that the Packer OpenStack builder doesn't treat "~" as special, and it's looking for a key file with the literal pathname "~/.ssh/id_rsa". It's failing because it can't find a key file with that literal pathname.
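You can see the difference in a shell (illustrative only):
echo ~/.ssh/id_rsa      # the shell expands ~ to /home/you/.ssh/id_rsa
stat '~/.ssh/id_rsa'    # quoted: the literal path is passed through and fails with "No such file or directory"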
Update the ssh_private_key_file entry to list the actual pathname to the key file:
"ssh_private_key_file": "/home/someuser/.ssh/id_rsa",
Of course, you should also make sure that the key file actually exists at the location that you specify.
Have to leave a post here as this just bit me. I was using a variable with ~/.ssh/id_rsa and then changed it to use the full path; when I did, I had a space at the end of the variable value being passed in from the command line via a Makefile, which was causing this error. Hope this saves someone some time.
Kenster's answer got you past your initial question, but from your comment it sounds like you were still stuck.
Per my reply to your comment, Packer doesn't seem to support supplying a passphrase, but you CAN tell it to ask the running SSH agent for a decrypted key if the correct passphrase was supplied when the key was loaded. This should allow you to use Packer to build with a passphrase-protected SSH key as long as you've loaded it into the SSH agent before attempting the build.
https://www.packer.io/docs/templates/communicator.html#ssh_agent_auth
The SSH communicator connects to the host via SSH. If you have an SSH
agent configured on the host running Packer, and SSH agent
authentication is enabled in the communicator config, Packer will
automatically forward the SSH agent to the remote host.
The SSH communicator has the following options:
ssh_agent_auth (boolean) - If true, the local SSH agent will be used
to authenticate connections to the remote host. Defaults to false.
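Putting that together, a minimal sketch (the key path is an example; ssh_agent_auth replaces ssh_private_key_file in the builder config):
# load the passphrase-protected key into a running agent first:
eval "$(ssh-agent -s)"
ssh-add /home/someuser/.ssh/id_rsa    # prompts for the passphrase once
# then, in the builder section of script.json, enable agent auth instead of a key file:
#   "ssh_agent_auth": true
# (and remove "ssh_private_key_file")
packer build script.json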