Appendix A - Inline Assembly Programming Tips
This appendix points out some common-sense considerations when writing inline assembly code.
Here are some DON'Ts:
- DON'T USE ASSEMBLY CODE unless you are just trying to learn assembly language. If you must use it, use it where it makes the most sense, such as in loops, memory management, FPU (floating-point) code, or compiler development.
- DON'T write entire programs using inline assembly code.
- DON'T try to over-optimize your assembly code. Get it to work first, and then tweak it. The truth is that you quickly reach a point of diminishing returns when optimizing any code. Sometimes it's just not worth the extra effort; let it go. When you do optimize, focus on loops, eliminating branches, and so forth.
- DON'T count nanoseconds and instruction timings, because pipelining, prefetch queuing, branch prediction, multi-core processors, parallel processing, threads, and decoding stalls will muck up your best efforts. Generally speaking, your code will be too small and the processor too fast for cycle counting to make much difference. Don't waste your time.
- DON'T erase any code before you are done testing it. Comment it out so that if you need to put it back, all you have to do is uncomment it.
Here are some DOs if you are serious about assembly programming:
- DO update the NASM assembler with the latest release from the NASM project. The NASM website address is in Appendix C. In the iwbdev/bin subdirectory, rename the current nasmw.exe to nasmw_old.exe (or something similar), then copy the latest nasm.exe into that subdirectory and rename it nasmw.exe.
- DO get your hands on all the Intel IA-32 Software Developer's Manuals. They are free and come in PDF format. The Intel website address is in Appendix C.
- DO download a disassembler. Two good ones are OllyDbg and IDA Pro (free version). Their websites are listed in Appendix C. Learn to use one or both of them. IDA Pro is the industry standard, and you don't need the $1,000 version; the free version works just fine.
- DO get a copy of the Win32 API help file. It is old, but extremely useful. The Win32 API website is listed in Appendix C. It comes in the old Windows XP help file format, so on Windows Vista and Windows 7 you may have to download Microsoft's help file reader add-on. The MS Help File Reader website is listed in Appendix C.
- DO join the NASM forum. The forum website address is in Appendix C.
- DO join the IWBASIC forum. The forum website address is in Appendix C.
- Most assemblers are command-line only, but MASM32 and GoAsm have nice IDEs. However, one of the best ways to learn assembly language programming is to write inline assembly using IWBASIC as a platform. This eliminates much of the setup and overhead, and IWBASIC has a first-rate IDE, which NASM lacks.
Here are some quick assembly coding hints:
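- To clear a register, use the XOR or SUB instruction: XOR EAX, EAX or SUB EAX, EAX. Each encodes in 2 bytes, and modern processors recognize both as zeroing idioms that do not depend on the register's previous value. Avoid MOV EAX, 0, which encodes in 5 bytes; use it only when you must preserve the flags, because XOR and SUB modify them.
- To place a 1 in a register, a compact way is the following pair of instructions:
    XOR EAX, EAX
    INC EAX        ; 3 bytes in 32-bit code versus 5 for MOV EAX, 1 (smaller, though not necessarily faster on modern processors)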
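The size difference between these idioms can be checked against the standard IA-32 encodings. The C sketch below is only a lookup table: it records the bytes NASM emits for each instruction in 32-bit code and reports their lengths; it does not execute anything. The names encoded_length and print_encodings are illustrative, not part of any library.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One instruction: its NASM mnemonic and its IA-32 encoding in 32-bit code. */
struct encoding {
    const char *mnemonic;
    unsigned char bytes[5];
    size_t length;
};

static const struct encoding encodings[] = {
    {"XOR EAX, EAX", {0x31, 0xC0}, 2},
    {"SUB EAX, EAX", {0x29, 0xC0}, 2},
    {"MOV EAX, 0",   {0xB8, 0x00, 0x00, 0x00, 0x00}, 5},
    {"INC EAX",      {0x40}, 1}, /* this one-byte form exists only in 32-bit code */
    {"MOV EAX, 1",   {0xB8, 0x01, 0x00, 0x00, 0x00}, 5},
};

#define ENCODING_COUNT (sizeof encodings / sizeof encodings[0])

/* Returns the encoded length of the named instruction, or 0 if it is not listed. */
size_t encoded_length(const char *mnemonic)
{
    for (size_t i = 0; i < ENCODING_COUNT; i++)
        if (strcmp(encodings[i].mnemonic, mnemonic) == 0)
            return encodings[i].length;
    return 0;
}

/* Prints each instruction with its bytes in hex and its total length. */
void print_encodings(void)
{
    for (size_t i = 0; i < ENCODING_COUNT; i++) {
        printf("%-14s", encodings[i].mnemonic);
        for (size_t j = 0; j < encodings[i].length; j++)
            printf(" %02X", encodings[i].bytes[j]);
        printf("  (%zu bytes)\n", encodings[i].length);
    }
}
```

Calling print_encodings() lists XOR EAX, EAX as 31 C0 (2 bytes) and MOV EAX, 0 as B8 00 00 00 00 (5 bytes). The table documents size only; fewer bytes does not automatically mean faster execution.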
- Reuse a register as both the address and the destination where the instruction allows it. Example: MOV EAX, [EAX] loads the value stored at the address in EAX back into EAX without using another register.
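In C terms, MOV EAX, [EAX] is pointer chasing: the register holding an address is overwritten with the value stored at that address. A minimal sketch, assuming a hypothetical linked-list node whose first field is the link, so that in a 32-bit build each step is a candidate for exactly that instruction (whether the compiler actually emits it depends on the compiler and its options):

```c
#include <stddef.h>

/* Hypothetical list node: the link is the first field (offset 0). */
struct node {
    struct node *next;
    int value;
};

/* Follows up to `steps` links starting at p and returns where it stops.
   Each step replaces the pointer with the value stored at the address it
   holds, the C equivalent of MOV EAX, [EAX] when next is at offset 0. */
struct node *advance(struct node *p, unsigned steps)
{
    while (steps-- > 0 && p != NULL)
        p = p->next;
    return p;
}
```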
- Use local variables in preference to global variables wherever possible. Locals live on the stack, which is accessed constantly and is therefore usually already in the cache, so they tend to cause fewer cache misses than global variables.
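- ADD EAX, [ESI] and ADD [ESI], EAX are not interchangeable: the first leaves the sum in EAX, while the second writes it back to memory. The memory-destination form is a read-modify-write and is generally no faster than the register-destination form, so choose the form based on where you need the result. Example:
    ADD EAX, [ESI]   ; sum stays in EAX (load + add)
    ADD [ESI], EAX   ; sum is stored at [ESI] (load + add + store)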
- Use 32-bit integers (DWORDs) for loop counters and other repetitive tasks. The IA-32 processor and its instruction set are optimized to process 32-bit integers more efficiently than 16-bit (WORD) or 8-bit (BYTE) integers; in 32-bit code, for example, every 16-bit operation needs an extra operand-size prefix byte.
- Pass values from one assembly procedure to another in registers rather than in memory or on the stack. This saves the store and load instructions that memory or stack passing requires.
- Don't pass large chunks of data such as strings and arrays on the stack. Pass a pointer (address) to them instead.
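The cost difference is easiest to see with a large structure. In the C sketch below, struct record is a hypothetical 4 KB type: passing it by value forces the caller to copy all 4096 bytes (onto the stack under the usual 32-bit calling conventions), while passing a pointer moves only the address. The function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical 4 KB record: 1024 DWORDs. */
struct record {
    uint32_t samples[1024];
};

/* By value: the caller copies the whole structure before the call. */
uint32_t sum_by_value(struct record r)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < 1024; i++)
        total += r.samples[i];
    return total;
}

/* By pointer: only the address is passed; const documents that the
   callee does not modify the caller's data. */
uint32_t sum_by_pointer(const struct record *r)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < 1024; i++)
        total += r->samples[i];
    return total;
}
```

Both functions return the same sum; the pointer version simply avoids the 4 KB copy on every call.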
- Use MOVZX to load a 16-bit or 8-bit value into a 32-bit register, rather than clearing the register with XOR and then moving the value into AX or AL. The XOR approach takes two instructions, and writing only part of a register can cause a partial-register stall in the pipeline on some processors. MOVZX fills all 32 bits, zeroing the upper ones, as it moves the data into the register.
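The operation MOVZX performs corresponds to widening an unsigned byte or word in C, and compilers typically emit MOVZX for it. The sketch below also shows the signed counterpart, which compilers typically implement with MOVSX; the distinction matters because zero-extension gives the wrong answer for negative values. The function names are illustrative.

```c
#include <stdint.h>

/* Zero-extension of a BYTE: typically compiled to MOVZX. */
uint32_t widen_byte(uint8_t b)
{
    return (uint32_t)b; /* upper 24 bits are always 0 */
}

/* Zero-extension of a WORD: also typically MOVZX. */
uint32_t widen_word(uint16_t w)
{
    return (uint32_t)w; /* upper 16 bits are always 0 */
}

/* Sign-extension of a signed BYTE: typically MOVSX. */
int32_t widen_signed_byte(int8_t b)
{
    return (int32_t)b; /* upper 24 bits copy the sign bit */
}
```

For a byte holding 0xFF, zero-extension gives 255, while sign-extension of the same bits gives -1 (0xFFFFFFFF).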
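- Use ADD and SUB when you want speed, and INC and DEC when optimizing for size. INC and DEC leave the carry flag unchanged, which can cost extra time on some processors, but in 32-bit code INC EAX is 1 byte versus 3 bytes for ADD EAX, 1.
- Avoid using XCHG with a memory operand: it carries an implicit LOCK prefix that locks the memory location and acts as a full memory barrier, which can be extremely time-consuming. XCHG between two registers is not locked.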
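When all you need is a swap, you do not need XCHG's locking. The C11 sketch below contrasts an ordinary swap through a temporary, which compilers typically compile to plain MOV instructions, with atomic_exchange from <stdatomic.h>, which GCC and Clang typically implement on x86 with XCHG against memory, implicit LOCK included. The function names are illustrative; reserve the locked form for data that other threads access at the same time.

```c
#include <stdatomic.h>

/* Ordinary swap through a temporary: no bus or cache lock involved. */
void swap_plain(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Atomic exchange: stores new_value and returns the old contents in one
   indivisible step; on x86 this is typically a locked XCHG. */
int exchange_locked(atomic_int *slot, int new_value)
{
    return atomic_exchange(slot, new_value);
}
```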
- Align all critical code and data. The programs we write are small, so on systems with 1 GB or more of RAM the padding costs next to nothing, and it doesn't hurt to align code and data on 16- or 32-byte boundaries.
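Alignment is easy to request and to verify. The C11 sketch below declares a hypothetical 32-byte-aligned lookup table with alignas and checks the boundary by masking the low bits of the address; in standalone NASM source code, the ALIGN directive plays the same role. The names table, is_aligned, and aligned_table are illustrative.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical lookup table placed on a 32-byte boundary. */
static alignas(32) uint32_t table[64];

/* Returns nonzero if p lies on an n-byte boundary; n must be a power of two,
   so n - 1 is a mask of the address bits that must be zero. */
int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Gives callers read-only access to the aligned table. */
const uint32_t *aligned_table(void)
{
    return table;
}
```

Any address on a 32-byte boundary is also on a 16-byte boundary, so is_aligned(aligned_table(), 16) holds as well.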