Understand the stack Part 2

by Jeremy Gordon

This file is intended for those interested in 32 bit assembler programming, in particular for Windows.

In Part 2 you will find information which is not essential reading for assembler programmers but may be of interest.

Part 2 (not essential):-
Stack is in virtual address space
The stack at start-up: contents
The stack at start-up: amount
Enlargement of the stack at run-time
Permitted usable stack area
Using the stack to keep data streams
The stack in multi-thread applications
The stack frame and local data
Addressing the local data
Accessing parameters from the stack
Use of the stack in Windows callback procedures

Part 1 (essential):-
Features and advantages of the stack
Practical uses for the stack
The stack pointer: ESP register
PUSHing data onto, and POPing data from, the stack
Preserving register values in functions
Preserving data in memory
Move data around without using registers
Reverse data order
How CALL and RET use the stack
Importance of stack equilibrium
Using the stack to pass parameters

Stack is in virtual address space

The value in ESP is a virtual address. If it is, for example, 64FE3Ch at start-up, this is not an address in real physical memory. To obtain the actual physical memory address the system needs to convert (or "map") 64FE3Ch according to its own internal records. For example, this address might in reality be 2FE3Ch in physical memory. A virtual address is therefore just a convenient representation of a position in memory. It is often said that each application runs in its own virtual address space. In theory the whole range of 32-bit addresses (zero to 4GB) is available to each application. In practice this is not the case, but it is true that each application running on the system may use the same range of virtual addresses. There is no conflict between them because the system knows which application is addressing memory at any one time and can therefore point the application to the correct place in physical memory. So at any one time it is likely that several applications have the same value in ESP, but that value will point to a different part of physical memory for each of them.

The stack at start-up: contents

In Windows the main thread is allocated its own stack area by the system when it loads. The system itself uses this thread and the stack area for its own purposes prior to calling the program's starting address. You can see this in the debugger. Start your program up to the starting address and look at the value of ESP. Now open an inspector window for the value of ESP. Now you might expect it to be at the bottom of the memory area, but it is not. If you scroll the inspector to the bottom of the memory area (scroll to the highest address) you see that there has already been a lot of activity in the stack where the system has prepared for its call to the program's starting address. Of interest is that (in W98 anyway) the last value on the stack before the application was called is a return address in Kernel32.dll. This indicates that a function within Kernel32.dll called the application. Because of this return address it is possible to use a simple RET to end a process, rather than calling ExitProcess. Of course this only works if the stack is in equilibrium so that code execution continues in the caller function in Kernel32.dll.
A little further down the stack we can see the filename of the application, and much further down we can see that the address of the system's own exception handler for the application's main thread has been put on the stack. These things all show that the application's own stack area (and own thread) is used by the system to prepare for a call to the application.
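
As a minimal sketch of ending a process with RET (GoAsm syntax; the START label and the use made of EBX are assumptions for illustration only), every PUSH is matched by a POP so that ESP is back at its start-up value when RET executes:

```asm
CODE SECTION
START:                  ;program's starting address (label name assumed)
PUSH EBX                ;ESP is now 4 bytes below its start-up value
;                       ;main program code here, free to use EBX
POP EBX                 ;ESP is back at its start-up value
RET                     ;[ESP] holds the return address in Kernel32.dll
```

If the POP were left out, RET would take the saved value of EBX as its return address instead of the one in Kernel32.dll.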

The stack at start-up: amount

In Windows, when memory is reserved for the use of an application a range of virtual addresses is allocated by the system. This allocation reserves those addresses for the application's use: if the application asks for more memory, the same addresses will not be given out again. No physical memory is actually used until the memory is committed. At that point the virtual addresses which have been allocated are mapped to the area or areas of physical memory the system has available.
Obviously for this arrangement to work, the system needs to know the maximum size of contiguous memory which may have to be committed. This is then the allocated range of addresses.
When reserving memory for stack use the same applies. On an application's start-up the system needs to know how much memory to allocate for the stack, and how much to commit in the first instance. These two amounts are contained within the executable at +48h and +4Ch respectively in the optional header (to understand exactly where this is in the executable, you need to know the PE file format). As we see below they apply not only to the application's main thread but also to new threads made by the application.
Most linkers use defaults of 1MB and 4K (the normal page size) for these values respectively. With GoLink you can alter the defaults using the /stacksize and /stackinit switches respectively (see the GoLink manual for how to use these).
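
The two values can also be read at run-time from the copy of the executable's headers held in memory. The following is only a sketch in GoAsm syntax: GetStackSizes is a hypothetical name, and the code assumes a 32-bit PE file, in which the dword at +3Ch from the start of the file gives the offset of the "PE" signature and the optional header begins 18h bytes after that signature.

```asm
CODE SECTION
GetStackSizes:          ;hypothetical helper: returns allocation in EAX, commit in EDX
PUSH 0                  ;zero = the executable which made this process
CALL GetModuleHandleA   ;EAX = address at which the executable is loaded
MOV EDX,[EAX+3Ch]       ;offset of the "PE" signature from the start of the file
ADD EDX,EAX             ;EDX = address of the "PE" signature
ADD EDX,18h             ;skip 4-byte signature and 14h-byte file header
;                       ;EDX now points to the optional header
MOV EAX,[EDX+48h]       ;amount of memory allocated for the stack
MOV EDX,[EDX+4Ch]       ;amount committed in the first instance
RET
```

With the usual linker defaults this would be expected to return 100000h in EAX and 1000h in EDX.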

Enlargement of the stack at run-time

The system uses exception handling to sense whether an application is attempting to read or write outside the committed stack area. Provided (in W9x) the attempt is within the permitted usable stack area, new memory will be committed as required. Even if an attempt is made to enlarge the stack beyond the allocated area, under NT (but not W9x) the system will try to allocate further memory, but this will not be possible if the virtual addresses then required have already been allocated to other memory areas.

Permitted usable stack area

The stack is not considered suitable for keeping large amounts of data and this view is enforced by Windows by its exception mechanism. In W9x the permitted usable stack area is between the current ESP and the next page boundary plus the page size. For example if ESP is 64FE3Ch, then the next page boundary is 64F000h and the extra page (which is usually set at 4K by the system) takes you to 64E000h:-

64D000h to 64DFFFh    4K page unavailable
64E000h to 64EFFFh    4K page available
64F000h to 64FFFFh    4K page available    ← ESP (64FE3Ch) is here

So if ESP is 64FE3Ch you will find that the instruction

MOV D[ESP-1E40h],0

will cause an exception, because the actual point on the stack being addressed here is 64DFFCh which area is unavailable because it has not been committed by the system.
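
The lower limit of the permitted area described above can be worked out from ESP itself. This is only a sketch of the arithmetic in GoAsm syntax: it assumes the usual 4K page size, and the values in the comments assume ESP is 64FE3Ch as in the example.

```asm
MOV EAX,ESP             ;for example 64FE3Ch
AND EAX,0FFFFF000h      ;round down to the page boundary: 64F000h
SUB EAX,1000h           ;allow one further page: 64E000h
;                       ;EAX now holds the lowest address in the permitted usable area
```

The address 64DFFCh used in the failing instruction is 4 bytes below this limit.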
And you can't get round this by moving ESP either. In W9x the system allows you to move ESP only up to the next page boundary plus the page size, less four bytes. For example, if ESP is 64FE3Ch a single instruction will only be permitted to move ESP by 1E38h (in decimal this is 7736 bytes). This means that the instruction

SUB ESP,1E38h

causes ESP to become 64E004h and is permitted. But the instruction

SUB ESP,1E3Ch

will cause an exception. The 4-byte difference in the position which triggers the exception suggests that there are two different protection mechanisms at work here.

From the above it might appear that the size of data which might be put on the stack is limited to 4K, but this is not true. There are two ways to avoid these exceptions and thereby to use the stack for larger data areas.
The first way is to move and use ESP incrementally. This will ensure that the system commits the memory progressively as intended. The following code safely creates an area of 40K bytes on the stack:-

MOV ECX,10
L0:
SUB ESP,1000h
MOV D[ESP],0
LOOP L0

Here the system is made to commit ten 4K chunks of stack memory. ESP then ends up at the top of this stack area. This will not be particularly quick since the system has to commit memory ten times. A quicker method is to instruct the system to commit a larger than usual area of memory for the stack when the application is loaded. With GoLink you can do this using the /stackinit switch. For example:-

/stackinit 0A000

will ensure that 40K of memory is committed on the stack at start-up. You will then be able safely to move ESP using the instruction:-

SUB ESP,0A000h

giving you 40K of memory on the stack to play with.

Using the stack to keep data streams

Provided precautions are taken, the stack can be used to hold a fairly large stream of data. The things to remember are:-

Always restore ESP to equilibrium when you have finished with the stack.
Never write to [ESP] unless you have subtracted at least 4 bytes from the original value of ESP because that holds the return address for the procedure. Never write to [ESP+n] unless a sufficient number of bytes have been subtracted from ESP to avoid overwriting other important data.
If you do not move ESP to the top of the data area then you need to write the data in the reverse direction, that is to say to progressively decreasing addresses. This can be done in various ways; the most convenient is probably to set the direction flag using STD and then to use the MOVS instruction, for example:-
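```
MOV ECX,8000            ;number of dwords to write
MOV EDI,ESP
SUB EDI,4               ;start just below [ESP] (the return address)
;ESI is assumed to point to the last dword of the source data here,
;since ESI as well as EDI is decremented while the direction flag is set
STD                     ;set direction flag
REP MOVSD               ;move ECX dwords from [ESI] to [EDI]
CLD                     ;clear direction flag
```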
This code writes 8,000 dwords to the stack. Note how SUB EDI,4 avoids the write over [ESP] which holds the return address for the procedure. There is no problem with enlargement of memory since the write is incremental, so the system properly creates new 4K memory areas as it needs to.
If you do move ESP to the top of the data area you must take the precautions referred to in permitted usable stack area. Having done this you can write to the stack in a forward direction.
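
The forward-direction alternative might look like the following sketch (GoAsm syntax). It assumes that ESI already points to 40K of source data, that the direction flag is clear, and that the stack has only the default 4K committed so that ESP must be moved one page at a time.

```asm
MOV ECX,10              ;ten pages of 4K each
L0:
SUB ESP,1000h           ;move ESP down one page at a time
MOV D[ESP],0            ;touch the page so that the system commits it
LOOP L0
MOV EDI,ESP             ;EDI = top (lowest address) of the 40K area
MOV ECX,2800h           ;0A000h bytes = 2800h dwords
REP MOVSD               ;write forwards from [ESI] to [EDI]
;                       ;work with the data from [ESP] upwards here
ADD ESP,0A000h          ;restore ESP to equilibrium
```

Because ESP itself has been moved to the top of the data, later PUSHes and CALLs go below the data and cannot overwrite it.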

The stack in multi-thread applications

Each thread in your application has its own registers and its own stack. That is to say, when the system gives processor-time to a thread, it will switch to the register context for the thread. This holds all the values of the registers when processor-time was last removed from the thread. Since the registers include ESP, its value will also be correctly switched so that the correct area of physical memory will be used by the thread as its stack. The result is that a thread can rely on the fact that it can use its stack as a discrete area of memory which will not be interfered with by other threads. You can see this in the debugger. You can see that the ESP always changes substantially when execution changes from thread to thread.

When a thread starts it is allocated its stack area. As a practical example, it was found under W98 that the stack of the main thread of an application ran from 64FE3Ch (downwards) and when a new thread was made its stack ran from 75FF9Ch (downwards). In another test, when six new threads were made their stacks started at 19DEF9Ch, 1AFFF9Ch, 1C1FF9Ch, 1D3FF9Ch, 1E5FF9Ch and 1F7FF9Ch respectively. Here you can see that the system is separating the virtual address of each stack area by 128KB more than the default 1MB area. This is probably to allow room for the system's own use of the stack and also some extra leeway. Changing the allocation stack size to 200000h (2MB) using the /stacksize switch and then creating six new threads resulted in the stack areas being separated by 128KB more than 2MB.
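
A new thread of the kind used in such tests can be made with the API CreateThread. The following is a sketch in GoAsm syntax; MakeThread, ThreadProc and ThreadId are hypothetical names. Passing zero as the stack size asks the system to use the sizes recorded in the executable.

```asm
DATA SECTION
ThreadId DD 0           ;receives the new thread's identifier

CODE SECTION
MakeThread:             ;hypothetical procedure name
PUSH ADDR ThreadId      ;lpThreadId
PUSH 0                  ;dwCreationFlags: run at once
PUSH 0                  ;lpParameter: nothing passed to the thread
PUSH ADDR ThreadProc    ;lpStartAddress
PUSH 0                  ;dwStackSize: zero = use the sizes in the executable
PUSH 0                  ;lpThreadAttributes: default security
CALL CreateThread       ;EAX = thread handle, or zero on failure
RET

ThreadProc:             ;runs on the new thread's own stack
XOR EAX,EAX             ;thread exit code
RET 4                   ;return, removing the single parameter
```

Stopping in ThreadProc in the debugger and comparing ESP with its value in the main thread shows the separate stack area.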

The stack frame and local data

A stack frame is a discrete area of the stack which holds a return address of a function and data used by that function without risk of overwrite because the value of ESP has been decreased. The data kept in a stack frame is called "local data". That's because it is intended only for use within the stack frame concerned and is not intended to be addressed by the program generally. Let's take this simple example:-

PROCEDURE1:
SUB ESP,20h                ;make space on stack for local data
;                          ;use local data area
CALL PROCEDURE2
;                          ;return from PROCEDURE2
;                          ;continue to use local data
ADD ESP,20h                ;restore ESP to equilibrium
RET

and

PROCEDURE2:
PUSH EAX,EBX,ECX
;                          ;carry out various calculations
POP ECX,EBX,EAX
RET

Here the stack frame is created using the SUB ESP,20h instruction. This decreases the value of ESP by 32 bytes creating space on the stack for 8 dwords. Now because ESP has been moved, whatever happens in PROCEDURE2 will not overwrite these 8 dwords. Let's check this visually assuming that ESP is 64FE38h at the start of PROCEDURE1:-

64FE08h              holds value in ECX inserted by PROCEDURE2
64FE0Ch              holds value in EBX inserted by PROCEDURE2
64FE10h              holds value in EAX inserted by PROCEDURE2
64FE14h              holds return address from PROCEDURE2   ← ESP here at start of PROCEDURE2
64FE18h to 64FE34h   8 dwords for local data (PROCEDURE1's stack frame)
64FE38h              holds return address from PROCEDURE1

Addressing the local data

Note: this is automated in GoAsm using FRAME..ENDF and in MASM using PROC..ENDP.
Since ESP points to the top of the local data area you can address the data using ESP. So in the example above the first dword of local data would be available at [ESP] immediately after the SUB ESP,20h. But using ESP to keep track of the local data on the stack can be difficult because ESP will move on each CALL or PUSH within the procedure. For this reason the EBP register tends to be used for this purpose instead. This is usually set early in the stack frame to the bottom of the local data and it will not be changed until execution leaves the stack frame. In this way you can be confident that the local data can always be addressed using a particular offset from EBP.
So the code for a typical stack frame now looks like this:-

TypicalStackFrame:
PUSH EBP          ;save the value of ebp which will be altered    }
MOV EBP,ESP       ;give current value of stack pointer to ebp     } "prologue"
SUB ESP,0Ch       ;make space for local data                      }
;                 ;POINT "X"
;
;                 ;code within the procedure
;
MOV ESP,EBP       ;restore the stack pointer to previous value    }
POP EBP           ;restore the value of ebp                       } "epilogue"
RET               ;return to caller adjusting the stack pointer   }

Here we have moved the stack pointer by 12 bytes. At point "X" the stack by reference to EBP actually looks like this:-

ebp-10h   the next push will go here
ebp-0Ch   holds space for local data   ← ESP here at point "X"
ebp-8h    holds space for local data
ebp-4h    holds space for local data
ebp       holds saved value of ebp
ebp+4h    holds return address from TypicalStackFrame

Now throughout the stack frame, whatever happens to ESP the local data will always be accessible at [EBP-4h], [EBP-8h] and [EBP-0Ch].
Note how ESP is restored to equilibrium automatically by the use of MOV ESP,EBP just before returning to the caller.
You don't have to use EBP for this purpose; any register will do. But EBP is traditionally used for this and your code will be more understandable to others if you stick to it.
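
As a small worked sketch of local data addressed through EBP (GoAsm syntax; SumLocals is a hypothetical procedure written only for illustration), note that the PUSH inside the procedure moves ESP but does not disturb the offsets from EBP:

```asm
SumLocals:              ;hypothetical procedure
PUSH EBP                ;save the value of ebp
MOV EBP,ESP             ;ebp now marks the bottom of the local data
SUB ESP,8               ;make space for two dwords of local data
MOV D[EBP-4],5          ;first local dword
MOV D[EBP-8],7          ;second local dword
PUSH EBX                ;ESP moves here, EBP does not
MOV EBX,[EBP-4]         ;local data still at the same offsets
ADD EBX,[EBP-8]         ;EBX = 12
MOV EAX,EBX             ;return the result in EAX
POP EBX                 ;restore EBX
MOV ESP,EBP             ;restore the stack pointer
POP EBP                 ;restore the value of ebp
RET                     ;return with EAX = 12
```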

Accessing parameters from the stack

We have already seen how to pass parameters on the stack to other procedures. Now we are going to see how to use parameters passed to procedures in your own code. Basically these parameters are further down the stack so they will not be overwritten under any normal circumstances. For this reason it is not necessary to retrieve and save them at all. Upon entry to a procedure ESP will point to the return address of the procedure (inserted by CALL). So the parameters will be at [ESP+4h], [ESP+8h], [ESP+0Ch] and so on, depending on how many parameters there are. But it may be difficult to keep track of exactly where the parameters are using ESP because it will change upon the next PUSH or CALL. So again you can use EBP to point to the parameters.
Suppose you have the prologue code:-

PUSH EBP          ;save the value of ebp which will be altered    }
MOV EBP,ESP       ;give current value of stack pointer to ebp     } "prologue"
SUB ESP,0Ch       ;make space for local data                      }

When ESP is copied to EBP it is 4 bytes lower than it was on entry to the procedure (this is because of the first "PUSH EBP"). Therefore the parameters can now be accessed using [EBP+8h], [EBP+0Ch], [EBP+10h] and so on, depending on how many parameters there are.
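
The following sketch (GoAsm syntax) shows both sides of such a call. Caller and AddParams are hypothetical names, and the procedure is assumed to remove its own parameters with RET 8, in the manner of the Windows APIs.

```asm
Caller:                 ;hypothetical calling code
PUSH 3                  ;second parameter
PUSH 4                  ;first parameter (pushed last)
CALL AddParams          ;on return EAX = 7 and ESP is back in equilibrium
RET

AddParams:              ;hypothetical procedure taking two dword parameters
PUSH EBP                ;save the value of ebp
MOV EBP,ESP             ;[EBP] = saved ebp, [EBP+4h] = return address
MOV EAX,[EBP+8h]        ;first parameter (4)
ADD EAX,[EBP+0Ch]       ;add second parameter (3)
POP EBP                 ;no local data was made, so ESP needs no restoring
RET 8                   ;return, removing the 8 bytes of parameters
```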

Use of the stack in Windows callback procedures

The two techniques just dealt with (making space for local data and addressing parameters) are required in Windows callback procedures. The callback procedure most frequently found in Windows programs is the window procedure. It is to this procedure that Windows sends "messages" and Windows expects the correct reply. What is happening here is that Windows calls the window procedure using the program's own thread. This usually happens while the program is in the message loop, either waiting for a return from the API GetMessage or whilst executing the API DispatchMessage.
Luckily in GoAsm you can use FRAME..ENDF to retrieve the parameters sent by Windows and to address them by name. You can also easily make local data areas addressable by name, and you can preserve registers and restore the stack to equilibrium automatically too. See the GoAsm manual for a full description of how to do this or go back to understanding the stack (part 1).
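
For comparison, a window procedure written out by hand, without FRAME..ENDF, might look like the following sketch in GoAsm syntax. The name WndProc and the use made of the local dword are assumptions for illustration; the four parameters are those which Windows sends to every window procedure.

```asm
WndProc:                ;hypothetical window procedure
PUSH EBP                ;save the value of ebp
MOV EBP,ESP             ;parameters now at [EBP+8h] to [EBP+14h]
SUB ESP,4               ;one dword of local data at [EBP-4h]
PUSH EBX,EDI,ESI        ;preserve the registers which Windows expects unchanged
MOV EAX,[EBP+0Ch]       ;the message (uMsg)
MOV [EBP-4h],EAX        ;keep a copy in local data
;                       ;act on particular messages here
PUSH [EBP+14h]          ;lParam
PUSH [EBP+10h]          ;wParam
PUSH [EBP+0Ch]          ;uMsg
PUSH [EBP+8h]           ;hWnd
CALL DefWindowProcA     ;let Windows deal with the message, reply in EAX
POP ESI,EDI,EBX         ;restore registers in reverse order
MOV ESP,EBP             ;restore the stack pointer
POP EBP                 ;restore the value of ebp
RET 10h                 ;return, removing the four parameters
```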