I have been working on a project lately that unites two things that deeply intrigue my geeky side – digital forensics and Common Lisp. A friend had a laptop whose hard drive had been reformatted – and the only backup of the data was no good. So I got to do a little research, and at the end of the day, was victorious!
For those who find themselves doing forensics without expensive software, I would highly recommend Brian Carrier’s book, File System Forensic Analysis. In it, he provides a wonderfully detailed description of the major filesystems, including NTFS. This was something I’d never even thought about learning, so it made an excellent challenge.
It seems that NTFS puts the MFT (Master File Table) in a predictable location on the hard drive. Each entry describes a file (or directory, or attributes for some other file, or an index, etc…), and is a fixed size – 1024 bytes on typical volumes. When a new filesystem is created over the old one, the old MFT is not erased – the new filesystem just starts over at the beginning of the table and works its way forward…
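That scan is simple enough to sketch. Here's a minimal version in Common Lisp – not my actual recovery code, just an illustration under the assumptions above (a raw image file, 1024-byte records, and the "FILE" signature at the start of each in-use record; the path is hypothetical):

```lisp
;; Scan a raw disk image for MFT-entry candidates by checking for the
;; ASCII signature "FILE" at the start of each 1024-byte record.
(defun scan-for-mft-entries (path)
  "Return a list of byte offsets in PATH where an MFT record signature appears."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((record (make-array 1024 :element-type '(unsigned-byte 8)))
          (hits '()))
      (loop for offset from 0 by 1024
            while (= (read-sequence record in) 1024)
            when (and (= (aref record 0) (char-code #\F))
                      (= (aref record 1) (char-code #\I))
                      (= (aref record 2) (char-code #\L))
                      (= (aref record 3) (char-code #\E)))
              do (push offset hits))
      (nreverse hits))))
```

On a reformatted drive you'd point this at the unallocated-cluster dump rather than the whole image, but the idea is the same.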
This means that, until you add enough new files, the old MFT entries are still accessible (they just sit in unallocated clusters that are likely to be reused in the future). So I broke out The Sleuth Kit and started playing. Now, I’m not a TSK expert, but I got stuck. It’s easy to extract unallocated clusters and identify MFT entries in those clusters – what’s hard is to pull the files (or what’s left of them) out. No TSK tool seems to do that – they all assume that you are working with the existing filesystem, so you cannot say “I know it’s bogus, but pretend that this block contains an MFT entry, and extract what you can find of that file.” If you know how, please comment!
For me, though, after diligent searching, I could find nothing – so I set to work writing something. I started in C, but realized that would take too long (probably days). So, I pulled out Practical Common Lisp and started learning how to work with binary data structures. I am absolutely amazed at how straightforward it is! For example, there’s no masking and shifting to pull out nybbles – just (ldb (byte 4 0) idex), which takes the four bits starting at the least significant bit of the variable ‘idex’.
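To make the comparison concrete, here is both nybbles of a byte pulled out with LDB – (byte size position) names a field of SIZE bits starting POSITION bits from the least significant end, so there's no literal masking or shifting anywhere:

```lisp
;; Extract the low and high nybbles of #xA7 with byte specifiers.
;; (byte 4 0) = bits 0-3, (byte 4 4) = bits 4-7.
(let ((idex #xA7))
  (list (ldb (byte 4 0) idex)    ; low nybble  => 7
        (ldb (byte 4 4) idex)))  ; high nybble => 10 (#xA)
```

The C equivalent would be `idex & 0xF` and `(idex >> 4) & 0xF` – workable, but the Lisp version says what it means.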
So my high-level coding bliss proceeded until I had two programs written in about 200 lines of Lisp: (1) an MFT entry analyzer that identifies the filename for each MFT record found by its signature (the string “FILE” in the first four bytes of the record), and (2) a parser for the $DATA attributes of the files selected for recovery, which generates the right series of dd commands for extracting the fragments of each file. And it works great!
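The fiddly part of step (2) is the run list packed inside a non-resident $DATA attribute. A sketch of decoding it, under my reading of the on-disk format (each run begins with a header byte whose low nybble gives the byte-width of the cluster-count field and whose high nybble gives the byte-width of the signed, relative starting-cluster field; a zero header terminates the list) – again an illustration, not my recovery code:

```lisp
;; Decode an NTFS run list into (start-cluster . cluster-count) pairs.
;; BYTES is a vector of (unsigned-byte 8) taken from a $DATA attribute.
(defun decode-run-list (bytes)
  (let ((pos 0) (prev-start 0) (runs '()))
    (flet ((next-int (n)              ; read N bytes, little-endian
             (let ((val 0))
               (dotimes (i n val)
                 (setf (ldb (byte 8 (* 8 i)) val) (aref bytes pos))
                 (incf pos)))))
      (loop
        (let ((header (aref bytes pos)))
          (when (zerop header) (return (nreverse runs)))
          (incf pos)
          (let* ((len-size (ldb (byte 4 0) header))
                 (off-size (ldb (byte 4 4) header))
                 (count    (next-int len-size))
                 (raw-off  (next-int off-size))
                 ;; the offset field is signed (two's complement),
                 ;; relative to the previous run's start
                 (off (if (and (plusp off-size)
                               (logbitp (1- (* 8 off-size)) raw-off))
                          (- raw-off (ash 1 (* 8 off-size)))
                          raw-off)))
            (incf prev-start off)
            (push (cons prev-start count) runs)))))))
```

Each resulting pair then turns into one dd line along the lines of `dd if=image.dd of=frag bs=<cluster-size> skip=<start-cluster> count=<cluster-count>`, and concatenating the fragments rebuilds the file.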
Now, it’s not publishable code (Here’s proof!), but it retrieved a couple hundred MB of data from this drive, including fragmented JPEGs and MP3s of substantial size. And, being Common Lisp, it’ll be fairly extensible… if I do a little refactoring and code-golf, that is. After hex-editing all over this 60 GB drive image, I’m getting more and more interested in writing a more comprehensive package for manipulating drive images. Discoverability is so low, and though Autopsy has a GUI of sorts, it’s not really that great for the more unusual forensics work – i.e., the stuff that people seem to be coming to me about ;-).