Container from scratch
12 September 2018
We are doing to be building a docker from scratch, and I do not mean a Dockerfile from scratch, I litterally mean the container. Getting there not nearly as crazy as I thought it would be and involves using namespaces, which are the building blocks to making an isolated environment. Having seen some really cool examples of building a container from scratch, I thought would run through it myself to get a better understanding. Some great resources on this that I took a look at are from Liz Rice and Julian Friedman.
Before getting into the code, namespaces are tools to seperate things on a system. There are 7 linux namespaces and we are going to use 3 of them here: Mount, Process ID, and UTS
- Mount: Isolated filesystem, not able to modify files on our hosts filesystem
- Pid: Process will appear as a familiar Pid 1, and not be able to see the hosts Pids
- UTS: Container will have its own hostname, and not be able to modify the hosts hostname
There are 4 others:
- Network
- Interprocess Communication
- User ID
- Contrl Group
You can read more about here: https://en.wikipedia.org/wiki/Linux_namespaces
Now to get start started on the code! We are going to need a ubuntu machine and a spare filesystem. For this demo I’m going to use a DigitalOcean droplet and cool container tool (lxc) to create us an extra filesystem.
The first thing that we will need is a container that can run a command. So we can start off
with a golang program that takes input simmilar to docker run /bin/bash
, but ours will
look like go run main.go run bash
use golang to run a command
package main
import (
"fmt"
"os"
"os/exec"
)
func main() {
switch os.Args[1] {
case "run":
run()
default:
panic("invalid command!")
}
}
func run() {
fmt.Printf("running %v\n", os.Args[2:])
cmd := exec.Command(os.Args[2], os.Args[3:]...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.stderr = os.Stderr
must(cmd.Run())
}
func must(err err) {
if err !=nil {
panic(err)
}
}
The above code takes arguments starting with run
and passes them to exec.Command
, adding in stdin/stdout/stderr.
So far this looks the same as any program
that is making a call out to the system (although with arbitrarty user input).
Running the above with go run main.go run /bin/bash
, we have a container running a shell - but it can see everything.
Although it cannot see my paused vim running from the previous shell. Exiting out and we can again
The next step is to add in some namespaces, here we can use cmd.SysProcAttr
to
tell the command to use its own namespaces
stargin with namespaces
cmd.SysProcAttr = &syscall.SysProcAttr{
Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID
}
Including syscall.CLONE_NEWUTS
is for the UTS namespace, and syscall.CLONE_NEWPID
is
for the PID namespace. Adding this alone won’t give us the namespaces in our /bin/bash
invocation because we need to run our command from the forked exec. So what
we will do is run a command that forks our current process with the namesapces, and in
that command we run our actual command as a child:
func main() {
switch os.Args[1] {
case "run":
run()
case "child":
child()
default:
panic("invalid command!")
}
}
func run() {
cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{
Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
}
must(cmd.Run())
}
func child() {
fmt.Printf("running %v as PID %d\n", os.Args[2:], os.Getpid())
cmd := exec.Command(os.Args[2], os.Args[3:]...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
must(cmd.Run())
}
Running again with go run main.go /bin/bash
give us the ability to isolate
our hostname, beacuse of: syscall.CLONE_NEWUTS
Our PID will not actaully show PID 1 yet even though we have syscall.CLONE_NEWPID
, because ps also looks
at ls /proc
for running processes. We will be able to get to this while doing the next step.
Next up is the filesystem namespace. What we will have to do is, in our child process, change our root directory and current diretory to the new filesystem before running the command. If you do not have an extra filessytem sitting around, you can use cool container tool lxc to create one:
apt-get install lxc
sudo lxc-create -t ubuntu -n container
The above command will create a ubuntu filesystem, in the /var/lib/lxc/container/rootfs directory.
Otherwise, this is what we need to do is add:
must(syscall.Chroot("mynewfs"))
must(os.Chdir("/"))
to our child function:
func child () {
fmt.Printf("running %v as PID %d\n", os.Args[2:], os.Getpid())
cmd := exec.Command(os.Args[2], os.Args[3:]...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.stderr = os.Stderr
must(syscall.Chroot("/var/lib/lxc/container/mynewfs"))
must(os.Chdir("/"))
must(syscall.Mount("proc", "proc", "proc", 0, ""))
must(cmd.Run())
}
We also need to call must(syscall.Mount("proc", "proc", "proc", 0, ""))
, this
is due to proc needing to be mounted in a new filesystem (which it is in our case).
An alternative would be to run mount -t proc proc /proc
from the filesystem from a shell, but regardless
it needs be run
Now if we hop into out container with go run main.go run /bin/bash
, we have what
looks like a container!
Running ps
will show us as process 1
Running hostname
will show our containers hostname
Running ls
will show our containers filesystem
While there is much more than this to get docker working as well as it does, this covers the basics of getting our own namespaces to execute inside! If you want to try out more, there are a few namespaces we did not go through, including cgroups - which are namespaces that do the really cool part of controlling resources like cpu and memory allocation.
The full code to get everything running:
package main
import (
"fmt"
"os"
"os/exec"
"syscall"
)
func main() {
switch os.Args[1] {
case "run":
run()
case "child":
child()
default:
panic("invalid command!")
}
}
func run() {
cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{
Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
}
must(cmd.Run())
}
func child() {
fmt.Printf("running %v as PID %d\n", os.Args[2:], os.Getpid())
cmd := exec.Command(os.Args[2], os.Args[3:]...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
must(syscall.Chroot("/var/lib/lxc/container/rootfs"))
must(os.Chdir("/"))
must(syscall.Mount("proc", "proc", "proc", 0, ""))
must(cmd.Run())
}
func must(err error) {
if err != nil {
panic(err)
}
}