Win32 API tips and quick reference guide

For those still writing Win32 API programs, here are some great excerpts from my Win32 gurus Jeffrey Richter and David Kruglinski (dedicated to Kruglinski, since he is no longer with us).

Error handling:
Internally, when a Windows function detects an error, it uses a mechanism called thread-local storage to associate the appropriate error-code number with the calling thread. 

In Microsoft Visual Studio 6.0, Microsoft's debugger supports a useful feature: you can configure the Watch window to always show you the thread's last error code number and the English text description of the error. This is done by selecting a row in the Watch window and typing "@err,hr".

Kernel Objects:
Kernel objects are owned by the kernel, not by a process (although handles to them are process-relative). In other words, if your process calls a function that creates a kernel object and then your process terminates, the kernel object is not necessarily destroyed.

When a process is initialized, the system allocates a handle table for it. This handle table is used only for kernel objects; it is simply an array of data structures. Each structure contains a pointer to a kernel object, an access mask, and some flags.

When a process first initializes, its handle table is empty. Then when a thread in the process calls a function that creates a kernel object, such as CreateFileMapping, the kernel allocates a block of memory for the object and initializes it; the kernel then scans the process’s handle table for an empty entry.

When your process terminates, the system automatically scans the process’s handle table. If the table has any valid entries (objects that you didn’t close before terminating), the system closes these object handles for you. If the usage count of any of these objects goes to zero, the kernel destroys the object.

Sharing Kernel Objects Across Process Boundaries:
Object handle inheritance can be used only when processes have a parent-child relationship. In this scenario, one or more kernel object handles are available to the parent process, and the parent decides to spawn a child process, giving the child access to the parent’s kernel objects.

To create an inheritable handle, the parent process must allocate and initialize a SECURITY_ATTRIBUTES structure and pass the structure’s address to the specific Create function.

Be aware that object handle inheritance applies only at the time the child process is spawned. If the parent process were to create any new kernel objects with inheritable handles, an already-running child process would not inherit these new handles.

By far the most common way for a child process to determine the handle value of the kernel object that it’s expecting is to have the handle value passed as a command-line argument to the child process. The child process’s initialization code parses the command line (usually by calling sscanf) and extracts the handle value.
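As a sketch of that round trip, here is the idea in portable C (make_cmdline and parse_cmdline are hypothetical helper names, and the integer stands in for an inheritable HANDLE value on Windows):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Parent side: embed the inheritable handle's value in the child's
   command line as an integer token. */
void make_cmdline(char *buf, size_t cb, uintptr_t hInherited) {
   snprintf(buf, cb, "Child.exe %" PRIuPTR, hInherited);
}

/* Child side: skip the program name, parse the token, and recover
   the handle value (which the child would cast back to a HANDLE). */
uintptr_t parse_cmdline(const char *pszCmdLine) {
   uintptr_t h = 0;
   sscanf(pszCmdLine, "%*s %" SCNuPTR, &h);
   return h;
}
```

On Windows the parent would pass the built string to CreateProcess as the child's command line; the handle value is only meaningful in the child because the child inherited a copy of the handle-table entry.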

You can use other forms of interprocess communication to transfer an inherited kernel object handle value from the parent process into the child process. One technique is for the parent to wait for the child to complete initialization (using the WaitForInputIdle function discussed in Chapter 9); then the parent can send or post a message to a window created by a thread in the child process.

Another technique is for the parent process to add an environment variable to its environment block. The variable’s name would be something that the child process knows to look for, and the variable’s value would be the handle value of the kernel object to be inherited. Then when the parent spawns the child process, the child process inherits the parent’s environment variables and can easily call GetEnvironmentVariable to obtain the inherited object’s handle value. This approach is excellent if the child process is going to spawn another child process, because the environment variables can be inherited again.
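A minimal sketch of the environment-variable technique, assuming a made-up variable name MYAPP_INHERITED_HANDLE (the name and the wrapper functions are illustrative, not from the book):

```cpp
#include <windows.h>
#include <tchar.h>
#include <stdio.h>

// Parent side: publish the inheritable handle's value in an
// environment variable before spawning the child.
void PublishHandle(HANDLE hInherited) {
   TCHAR szValue[32];
   _stprintf(szValue, TEXT("%p"), hInherited);
   SetEnvironmentVariable(TEXT("MYAPP_INHERITED_HANDLE"), szValue);
   // ... now call CreateProcess; by default the child inherits
   // the parent's environment block, including this variable.
}

// Child side: read the variable back and convert it to a HANDLE.
HANDLE RecoverHandle(void) {
   TCHAR szValue[32];
   void *pv = NULL;
   if (GetEnvironmentVariable(TEXT("MYAPP_INHERITED_HANDLE"),
         szValue, _countof(szValue)) > 0) {
      _stscanf(szValue, TEXT("%p"), &pv);
   }
   return (HANDLE) pv;
}
```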

Changing a Handle’s Flags:
You might at times want to control which child processes inherit kernel object handles. To alter the inheritance flag of a kernel object handle, you can call the SetHandleInformation function.
SetHandleInformation(hObj, HANDLE_FLAG_INHERIT, 0);                    // Turn off the inherit flag
SetHandleInformation(hObj, HANDLE_FLAG_INHERIT, HANDLE_FLAG_INHERIT);  // Turn on the inherit flag

Named Objects:
The second method available for sharing kernel objects across process boundaries is to name the objects. Many, though not all, kernel objects can be named. For example, all of the following functions create named kernel objects:
e.g., CreateMutex/OpenMutex, CreateEvent/OpenEvent, CreateSemaphore/OpenSemaphore, CreateFileMapping/OpenFileMapping.

Terminal Server Name Spaces
Note that Terminal Server changes the above scenario a little bit. A Terminal Server machine will have multiple name spaces for kernel objects. There is one global name space, which is used by kernel objects that are meant to be accessible by any and all client sessions. This name space is mostly used by services. In addition, each client session has its own name space. This keeps two or more sessions that are running the same application from trampling over each other: one session cannot access another session's objects even though the objects share the same name. On a machine without Terminal Server, services and applications share the same kernel object name space as described above; this is not true on a Terminal Server machine.

A service’s named kernel objects always go in the global name space. By default, in Terminal Server, an application’s named kernel object goes in the session’s name space. However, it is possible to force the named object to go into the global name space by prefixing the name with “Global\”, as in the example below:

HANDLE h = CreateEvent(NULL, FALSE, FALSE, "Global\\MyName");

You can also explicitly state that you want a kernel object to go in the session’s name space by prefixing the name with “Local\”, as in

HANDLE h = CreateEvent(NULL, FALSE, FALSE, "Local\\MyName");

Duplicating Object Handles

The last technique for sharing kernel objects across process boundaries requires the use of the DuplicateHandle function:
This function takes an entry in one process's handle table and makes a copy of the entry in another process's handle table. DuplicateHandle takes several parameters but is actually quite straightforward. The most general usage of the DuplicateHandle function involves three different processes that are running in the system: a source process, a target process, and a catalyst process that calls DuplicateHandle.
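A hedged sketch of that three-process scenario: a catalyst process that already holds handles to both the source and target processes copies one of the source's handle-table entries into the target. The wrapper name ShareObjectWithTarget is invented for illustration:

```cpp
#include <windows.h>

// hProcessSource/hProcessTarget are assumed to have been obtained
// earlier (e.g., via OpenProcess). hObjInSource is a handle value
// that is valid only inside the source process.
BOOL ShareObjectWithTarget(HANDLE hProcessSource, HANDLE hProcessTarget,
                           HANDLE hObjInSource, HANDLE *phObjInTarget) {
   return DuplicateHandle(
      hProcessSource,    // handle table to copy from
      hObjInSource,      // which entry to copy
      hProcessTarget,    // handle table to copy to
      phObjInTarget,     // receives the new handle value
                         //  (meaningful only in the target process!)
      0,                 // desired access (ignored below)
      FALSE,             // new handle is not inheritable
      DUPLICATE_SAME_ACCESS);
}
```

Note that the value written to *phObjInTarget still has to be communicated to the target process somehow (for example, with a window message), since it indexes the target's handle table, not the catalyst's.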


A process is usually defined as an instance of a running program and consists of two components:
>A kernel object that the operating system uses to manage the process. The kernel object is also where the system keeps statistical information about the process.
>An address space that contains all the executable or DLL module's code and data. It also contains dynamic memory allocations such as thread stacks and heap allocations.

Processes are inert. For a process to accomplish anything, it must have a thread that runs in its context; this thread is responsible for executing the code contained in the process’s address space. In fact, a single process might contain several threads, all of them executing code “simultaneously” in the process’s address space. To do this, each thread has its own set of CPU registers and its own stack. Each process has at least one thread that executes code in the process’s address space. If there were no threads executing code in the process’s address space, there would be no reason for the process to continue to exist, and the system would automatically destroy the process and its address space.

The operating system doesn't actually call the entry-point function you write. Instead, it calls a C/C++ run-time startup function. This function initializes the C/C++ run-time library so that you can call functions such as malloc and free. It also ensures that any global and static C++ objects that you have declared are constructed properly before your code executes. Examples: WinMainCRTStartup, wWinMainCRTStartup, mainCRTStartup, wmainCRTStartup.

You can find the code for the four startup functions in the CRt0.c file. Each startup function:
>Retrieves a pointer to the new process's full command line.
>Retrieves a pointer to the new process's environment variables.
>Initializes the C/C++ run time's global variables. Your code can access these variables if you include StdLib.h. The variables are listed in Table 4-1.
>Initializes the heap used by the C run-time memory allocation functions (malloc and calloc) and other low-level input/output routines.
>Calls constructors for all global and static C++ class objects.
After your entry-point function returns, the startup function:
>Calls any functions registered by calls to the _onexit function.
>Calls destructors for all global and static C++ class objects.
>Calls the operating system's ExitProcess function, passing it nMainRetVal. This causes the operating system to kill your process and set its exit code.

The actual value of (w)WinMain’s hinstExe parameter is the base memory address where the system loaded the executable file’s image into the process’s address space. For example, if the system opens the executable file and loads its contents at address 0x00400000, (w)WinMain’s hinstExe parameter has a value of 0x00400000.
The GetModuleHandle function returns the handle/base address where an executable or DLL file is loaded in the process's address space:

HINSTANCE GetModuleHandle(PCTSTR pszModule);

As for (w)WinMain's hinstExePrev parameter: it was used in 16-bit Windows and remains a parameter to (w)WinMain solely to ease porting of 16-bit Windows applications. You should never reference this parameter inside your code.

When a process kernel object is created, the system assigns the object a unique identifier; no other process kernel object in the system will have the same ID number. The same is true for thread kernel objects. When a thread kernel object is created, the object is assigned a unique, system-wide ID number. Process IDs and thread IDs share the same number pool. This means that it is impossible for a process and a thread to have the same ID. In addition, an object is never assigned an ID of 0. Before CreateProcess returns, it fills the dwProcessId and dwThreadId members of the PROCESS_INFORMATION structure with these IDs. IDs simply make it easy for you to identify the processes and threads in the system.

If your application uses IDs to track processes and threads, you must be aware that the system reuses process and thread IDs immediately. For example, let's say that when a process is created, the system allocates a process object and assigns it the ID value 122. If a new process object is created, the system doesn't assign the same ID number. However, if the first process object is freed, the system might assign 122 to the next process object created. If you need to refer to a process or thread over time, you should define a more persistent mechanism to communicate: kernel objects, window handles, and so forth.

Terminating a Process
A process can be terminated in four ways:
>The primary thread’s entry-point function returns. (This is highly recommended.)
>One thread in the process calls the ExitProcess function. (Avoid this method.)
>A thread in another process calls the TerminateProcess function. (Avoid this method.)
>All the threads in the process just die on their own. (This hardly ever happens.)

VOID ExitProcess(UINT fuExitCode);
This function terminates the process and sets the exit code of the process to fuExitCode. ExitProcess doesn’t return a value because the process has terminated. If you include any code following the call to ExitProcess, that code will never execute.

Note that calling ExitProcess or ExitThread causes a process or thread to die while inside a function. As far as the operating system is concerned, this is fine and all of the process's or thread's operating system resources will be cleaned up perfectly. However, a C/C++ application should avoid calling these functions because the C/C++ run time might not be able to clean up properly. Examine the following code:
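The listing itself didn't survive here; a sketch in the spirit of Richter's example (CSomeObj is a made-up class name) would be:

```cpp
#include <windows.h>
#include <stdio.h>

class CSomeObj {
public:
   CSomeObj()  { printf("Constructing\n"); }
   ~CSomeObj() { printf("Destructing\n"); }
};

CSomeObj g_GlobalObj;   // constructed by the CRT startup code

int WINAPI WinMain(HINSTANCE, HINSTANCE, PSTR, int) {
   CSomeObj LocalObj;
   ExitProcess(0);   // The process dies right here.

   // The C/C++ run time never regains control, so the destructors
   // for g_GlobalObj and LocalObj never run and "Destructing" is
   // never printed. Returning from WinMain instead would run both.
   return 0;
}
```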

BOOL TerminateProcess(HANDLE hProcess, UINT fuExitCode);

This function is different from ExitProcess in one major way: any thread can call TerminateProcess to terminate another process or its own process.

Running Detached Child Processes:
Sometimes the parent process doesn't need to communicate with the new process or doesn't require it to complete its work before the parent process continues. This is how Explorer works: after Explorer creates a new process for the user, it doesn't care whether that process continues to live or whether the user terminates it.
BOOL fSuccess = CreateProcess(…, &pi);
if (fSuccess) {

   // Allow the system to destroy the process & thread kernel
   // objects as soon as the child process terminates.
   CloseHandle(pi.hThread);
   CloseHandle(pi.hProcess);
}

You often need to treat a group of processes as a single entity; Windows 2000 offers the job kernel object for exactly this purpose.

We discussed earlier how a process actually consists of two components: a process kernel object and an address space. Similarly, a thread consists of two components:
>A kernel object that the operating system uses to manage the thread. The kernel object is also where the system keeps statistical information about the thread.
>A thread stack that maintains all the function parameters and local variables required as the thread executes code.

Threads are always created in the context of some process and live their entire life within that process. What this really means is that the thread executes code within its process’s address space and manipulates data within its process’s address space. So if you have two or more threads running in the context of a single process, the threads share a single address space. The threads can execute the same code and manipulate the same data. Threads can also share kernel object handles because the handle table exists for each process, not each thread.

processes use a lot more system resources than threads do. The reason for this is the address space. Creating a virtual address space for a process requires a lot of system resources. A lot of record keeping takes place in the system, and this requires a lot of memory. Also, since .exe and .dll files get loaded into an address space, file resources are required as well. A thread, on the other hand, uses significantly fewer system resources. In fact, a thread has just a kernel object and a stack; little record keeping is involved, and little memory is required.

Every thread must have an entry-point function where it begins execution. We already discussed this entry-point function for your primary thread: main, wmain, WinMain, or wWinMain. If you want to create a secondary thread in your process, it must also have an entry-point function, which should look something like this:

DWORD WINAPI ThreadFunc(PVOID pvParam);

 When you call CreateThread, passing a value other than 0 causes the function to reserve and commit all storage for the thread’s stack. Since all the storage is committed up front, the thread is guaranteed to have the specified amount of stack storage available. The amount of reserved space is either the amount specified by the /STACK linker switch or the value of cbStack, whichever is larger. The amount of storage committed matches the value you passed for cbStack. If you pass 0 to the cbStack parameter, CreateThread reserves a region and commits the amount of storage indicated by the /STACK linker switch information embedded in the .exe file by the linker.

The fdwCreate parameter specifies additional flags that control the creation of the thread. It can be one of two values. If the value is 0, the thread is schedulable immediately after it is created. If the value is CREATE_SUSPENDED, the system fully creates and initializes the thread but suspends the thread so that it is not schedulable

Thread internals:
This object has an initial usage count of 2. (The thread kernel object is not destroyed until the thread stops running and the handle returned from CreateThread is closed.) Other properties of the thread’s kernel object are also initialized: the suspension count is set to 1, the exit code is set to STILL_ACTIVE (0x103), and the object is set to the nonsignaled state. 

Once the kernel object has been created, the system allocates memory, which is used for the thread’s stack. This memory is allocated from the process’s address space since threads don’t have an address space of their own. The system then writes two values to the upper end of the new thread’s stack. (Thread stacks always build from high memory addresses to low memory addresses.) The first value written to the stack is the value of the pvParam parameter that you passed to CreateThread. Immediately below it is the pfnStartAddr value that you also passed to CreateThread.

The C/C++ run-time library variables and functions that have problems in multithreaded environments include errno, _doserrno, strtok, _wcstok, strerror, _strerror, tmpnam, tmpfile, asctime, _wasctime, gmtime, _ecvt, and _fcvt, to name just a few.

For multithreaded C and C++ programs to work properly, a data structure must be created and associated with each thread that uses C/C++ run-time library functions. Then, when you make C/C++ run-time library calls, those functions must know to look in the calling thread’s data block so that no other thread is adversely affected.

Note that the _beginthreadex function exists only in the multithreaded versions of the C/C++ run-time library. If you are linking to a single-thread run-time library, you get an “unresolved external symbol” error reported from the linker.

Here are the important things to note about _beginthreadex:
>Each thread gets its very own tiddata memory block allocated from the C/C++ run-time library's heap. (The tiddata structure is in the Visual C++ source code, in the Mtdll.h file.) Just for fun, I'll reproduce the structure in Figure 6-2.
>The address of the thread function passed to _beginthreadex is saved in the tiddata memory block. The parameter to be passed to this function is also saved in this data block.
>_beginthreadex does call CreateThread internally since this is the only way that the operating system knows how to create a new thread.
>When CreateThread is called, it is told to start executing the new thread with a function called _threadstartex, not pfnStartAddr. Also, note that the parameter passed to the thread function is the address of the tiddata structure, not pvParam.
>If all goes well, the thread handle is returned just like CreateThread. If any operation fails, NULL is returned.

Here are the important things to note about _threadstartex:
>The new thread begins executing with BaseThreadStart (in Kernel32.dll) and then jumps to _threadstartex.
>_threadstartex is passed the address to this new thread’s tiddata block as its only parameter.
>TlsSetValue is an operating system function that associates a value with the calling thread. This is called Thread Local Storage (TLS) and is discussed in Chapter 21. The _threadstartex function associates the tiddata block with the new thread.
>An SEH frame is placed around the desired thread function. This frame handles many things related to the run-time library: for example, run-time errors (such as throwing C++ exceptions that are not caught) and the C/C++ run-time library's signal function. This is critically important. If you were to create a thread using CreateThread and then call the C/C++ run-time library's signal function, the function would not work correctly.
>The desired thread function is called and passed the desired parameter. Recall that the address of the function and the parameter were saved in the tiddata block by _beginthreadex.

Microsoft’s Visual C++ team realizes that developers like to call ExitThread anyway, so they have made this possible without forcing your application to leak memory. If you really want to forcibly kill your thread, you can have it call _endthreadex (instead of ExitThread) to free the thread’s tiddata block and then exit. Still, I discourage you from calling _endthreadex.

Oops, I Called CreateThread Instead of _beginthreadex by Mistake:
First, the C/C++ run-time function attempts to get the address of the thread’s data block (by calling TlsGetValue). If NULL is returned as the address of the tiddata block, the calling thread doesn’t have a tiddata block associated with it. At this point, the C/C++ run-time function allocates and initializes a tiddata block for the calling thread right on the spot. The block is then associated with the thread (via TlsSetValue) and this block stays with the thread for as long as the thread continues to run. The C/C++ run-time function can now use the thread’s tiddata block, and so can any C/C++ run-time functions that are called in the future.

This, of course, is fantastic because your thread runs without a hitch (almost). Well, actually there are a few problems. First, if the thread uses the C/C++ run-time library’s signal function, the entire process terminates because the structured exception handling frame has not been prepared. Second, if the thread terminates without calling _endthreadex, the data block cannot be destroyed and a memory leak occurs. (And who would call _endthreadex for a thread created with CreateThread?)

HANDLE GetCurrentProcess();
HANDLE GetCurrentThread();
Both of these functions return a pseudo-handle to the calling thread’s process or thread kernel object. These functions do not create new handles in the calling process’s handle table. Also, calling these functions has no effect on the usage count of the process or thread kernel object. If you call CloseHandle, passing a pseudo-handle as the parameter, CloseHandle simply ignores the call and returns FALSE. 

Converting a Pseudo-Handle to a Real Handle
Sometimes you might need to acquire a real handle to a thread instead of a pseudo-handle. By “real,” I mean a handle that unambiguously identifies a unique thread.

The idea is to have the parent thread pass to the child thread a thread handle that identifies the parent thread. However, the parent thread passes a pseudo-handle, not a real handle. When the child thread begins executing, it passes the pseudo-handle to the GetThreadTimes function, which causes the child thread to get its own CPU times, not the parent thread's CPU times. This happens because a thread pseudo-handle is a handle to the current thread, that is, a handle to whichever thread is making the function call.

To fix this code, we must turn the pseudo-handle into a real handle; the DuplicateHandle function does the job.
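A sketch of the fix, mirroring the book's approach (ParentThread is an illustrative name):

```cpp
#include <windows.h>

DWORD WINAPI ParentThread(PVOID pvParam) {
   HANDLE hThreadParent;

   // Turn the pseudo-handle returned by GetCurrentThread into a real,
   // unambiguous handle that another thread can safely use.
   DuplicateHandle(
      GetCurrentProcess(),   // handle table to copy from
      GetCurrentThread(),    // pseudo-handle: "whoever is calling"
      GetCurrentProcess(),   // handle table to copy to (same process)
      &hThreadParent,        // receives the real thread handle
      0, FALSE, DUPLICATE_SAME_ACCESS);

   // hThreadParent now always identifies *this* thread, so a child
   // thread given this value can pass it to GetThreadTimes and get
   // the parent's times. The child should CloseHandle it when done.
   return 0;
}
```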

Suspending and Resuming a Thread
After the thread is fully initialized, CreateProcess or CreateThread checks to see whether you’ve passed the CREATE_SUSPENDED flag. If you have, the functions return and the new thread is left in the suspended state. If you have not, the function decrements the thread’s suspend count to 0. When a thread’s suspend count is 0, the thread is schedulable unless it is waiting for something else to happen (such as keyboard input).

Creating a thread in the suspended state allows you to alter the thread’s environment (such as priority, discussed later in the chapter) before the thread has a chance to execute any code. Once you alter the thread’s environment, you must make the thread schedulable. You do this by calling ResumeThread and passing it the thread handle returned by the call to CreateThread (or the thread handle from the structure pointed to by the ppiProcInfo parameter passed to CreateProcess):

DWORD ResumeThread(HANDLE hThread);
If ResumeThread is successful, it returns the thread's previous suspend count; otherwise, it returns 0xFFFFFFFF.

In real life, an application must be careful when it calls SuspendThread because you have no idea what the thread might be doing when you attempt to suspend it. If the thread is suspended while allocating memory from a heap, for example, it will still own the heap's lock, and any other thread that tries to access that heap will also be blocked until the first thread is resumed.

Suspending and Resuming a Process
The concept of suspending or resuming a process doesn't exist in Windows, since processes are never scheduled CPU time; only threads are. So we have to write our own mechanism, typically by suspending every thread in the process.

A thread can also tell the system that it does not want to be schedulable for a certain amount of time. This is accomplished by calling Sleep:

VOID Sleep(DWORD dwMilliseconds);
This function causes the thread to suspend itself until dwMilliseconds have elapsed.

Switching to Another Thread
The system offers a function called SwitchToThread that allows another schedulable thread to run if one exists:
BOOL SwitchToThread();

Calling SwitchToThread is similar to calling Sleep and passing it a timeout of 0 milliseconds. The difference is that SwitchToThread allows lower-priority threads to execute. Sleep reschedules the calling thread immediately even if lower-priority threads are being starved.

A Thread’s Execution Times
Sometimes you want to time how long it takes a thread to perform a particular task. GetThreadTimes does exactly that:

BOOL GetThreadTimes(
   HANDLE hThread,
   PFILETIME pftCreationTime,
   PFILETIME pftExitTime,
   PFILETIME pftKernelTime,
   PFILETIME pftUserTime);

By default, Windows 2000 uses soft affinity when assigning threads to processors, but you can control which CPUs are allowed to run certain threads. This is called hard affinity.

The system determines how many CPUs are available in the machine at boot time. An application can query the number of CPUs on the machine by calling GetSystemInfo (discussed in Chapter 14). By default, any thread can be scheduled to any of these CPUs. To limit threads in a single process to run on a subset of the available CPUs, you can call SetProcessAffinityMask:

BOOL SetProcessAffinityMask(
   HANDLE hProcess,
   DWORD_PTR dwProcessAffinityMask);

The first parameter, hProcess, indicates which process to affect. The second parameter, dwProcessAffinityMask, is a bitmask indicating which CPUs the threads can run on. For example, passing 0x00000005 means that threads in this process can run on CPU 0 and CPU 2 but not on CPU 1 and CPUs 3 through 31.
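Since the mask is an ordinary bitmask, decoding it is one shift and an AND; CpuAllowed below is a hypothetical helper illustrating the 0x00000005 example:

```c
#include <stdint.h>

/* Return nonzero if the affinity mask allows threads to run on the
   given zero-based CPU number (bit n of the mask == CPU n). */
int CpuAllowed(uint32_t dwAffinityMask, int nCpu) {
   return (dwAffinityMask & (1u << nCpu)) != 0;
}
```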

Thread Synchronization in User Mode:
Atomic Access: The Interlocked Family of Functions:

long g_x = 0;

DWORD WINAPI ThreadFunc1(PVOID pvParam) {
   InterlockedExchangeAdd(&g_x, 1);
   return(0);
}

InterlockedExchange and InterlockedExchangePointer atomically replace the current value whose address is passed in the first parameter with a value passed in the second parameter. For a 32-bit application, both functions replace a 32-bit value with another 32-bit value. But for a 64-bit application, InterlockedExchange replaces a 32-bit value while InterlockedExchangePointer replaces a 64-bit value. Both functions return the original value. InterlockedExchange is extremely useful when you implement a spinlock:

// Global variable indicating whether a shared resource is in use or not
BOOL g_fResourceInUse = FALSE;

void Func1() {
   // Wait to access the resource.
   while (InterlockedExchange(&g_fResourceInUse, TRUE) == TRUE)
      Sleep(0);

   // Access the resource.

   // We no longer need to access the resource.
   InterlockedExchange(&g_fResourceInUse, FALSE);
}

You should avoid using spinlocks on single-CPU machines. If a thread is spinning, it's wasting precious CPU time. Spinlocks are useful on multiprocessor machines because one thread can spin while the other thread runs on another CPU.

The while loop spins repeatedly, changing the value in g_fResourceInUse to TRUE and checking its previous value to see if it was TRUE. If the value was previously FALSE, the resource was not in use but the calling thread just set it to in-use and exits the loop. If the previous value was TRUE, the resource was in use by another thread and the while loop continues to spin.
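The same acquire/release pattern can be written portably with C11 atomics, where atomic_exchange plays the role of InterlockedExchange (a sketch, not the Win32 code):

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool g_fInUse = false;   /* false == resource free */

/* Try once to acquire the lock; returns true on success.
   A spinning caller would loop until this returns true. */
bool TryAcquire(void) {
   /* atomic_exchange stores true and returns the previous value:
      if the previous value was false, we just took the lock. */
   return atomic_exchange(&g_fInUse, true) == false;
}

void Release(void) {
   atomic_store(&g_fInUse, false);
}
```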

LONG InterlockedIncrement(PLONG plAddend);
LONG InterlockedDecrement(PLONG plAddend);

InterlockedExchangeAdd replaces both of these older functions. The new function can add or subtract any value; the old functions are limited to adding or subtracting 1.

Cache Lines
If you want to build a high-performance application that runs on multiprocessor machines, you must be aware of CPU cache lines. When a CPU reads a byte from memory, it does not just fetch the single byte; it fetches enough bytes to fill a cache line. Cache lines consist of 32 or 64 bytes (depending on the CPU) and are always aligned on 32-byte or 64-byte boundaries. Cache lines exist to improve performance. Usually, an application manipulates a set of adjacent bytes. If these bytes are in the cache, the CPU does not have to access the memory bus, which requires much more time.

What all of this means is that you should group your application’s data together in cache line-size chunks and on cache-line boundaries. The goal is to make sure that different CPUs access different memory addresses separated by at least a cache line boundary. Also, you should separate your read-only data (or infrequently read data) from read-write data. And you should group together pieces of data that are accessed around the same time.

Here is an example of a poorly designed data structure:

struct CUSTINFO {
   DWORD    dwCustomerID;     // Mostly read-only
   int      nBalanceDue;      // Read-write
   char     szName[100];      // Mostly read-only
   FILETIME ftLastOrderDate;  // Read-write
};
Here is an improved version of this structure:

// Determine the cache line size for the host CPU.
#ifdef _X86_
#define CACHE_ALIGN  32
#endif
#ifdef _ALPHA_
#define CACHE_ALIGN  64
#endif
#ifdef _IA64_
#define CACHE_ALIGN  ??
#endif

#define CACHE_PAD(Name, BytesSoFar) \
   BYTE Name[CACHE_ALIGN - ((BytesSoFar) % CACHE_ALIGN)]

struct CUSTINFO {
   DWORD    dwCustomerID;     // Mostly read-only
   char     szName[100];      // Mostly read-only

   // Force the following members to be in a different cache line.
   CACHE_PAD(bPad1, sizeof(DWORD) + 100);

   int      nBalanceDue;      // Read-write
   FILETIME ftLastOrderDate;  // Read-write

   // Force the following structure to be in a different cache line.
   CACHE_PAD(bPad2, sizeof(int) + sizeof(FILETIME));
};
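The padding arithmetic can be checked mechanically. Below, the Windows types are replaced with portable stand-ins (an assumption for illustration only); with CACHE_ALIGN = 32, the read-write members end up starting exactly on a 32-byte boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD;
typedef uint8_t  BYTE;
typedef struct { DWORD dwLowDateTime, dwHighDateTime; } FILETIME;

#define CACHE_ALIGN 32
#define CACHE_PAD(Name, BytesSoFar) \
   BYTE Name[CACHE_ALIGN - ((BytesSoFar) % CACHE_ALIGN)]

struct CUSTINFO {
   DWORD    dwCustomerID;                    /* bytes 0..3    */
   char     szName[100];                     /* bytes 4..103  */
   CACHE_PAD(bPad1, sizeof(DWORD) + 100);    /* 24 pad bytes  */
   int      nBalanceDue;                     /* starts at 128 */
   FILETIME ftLastOrderDate;
   CACHE_PAD(bPad2, sizeof(int) + sizeof(FILETIME));
};
```

So the read-write group begins at offset 128 (a multiple of 32), and the trailing pad rounds the whole structure up to a cache-line multiple, keeping the next array element's read-only fields off this element's read-write line.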

A technique to avoid (polling):
In this technique, one thread synchronizes itself with the completion of a task in another thread by continuously polling the state of a variable that is shared by or accessible to multiple threads. The following code fragment illustrates this:

volatile BOOL g_fFinishedCalculation = FALSE;

int WINAPI WinMain(…) {
   CreateThread(…, RecalcFunc, …);

   // Wait for the recalculation to complete.
   while (!g_fFinishedCalculation)
      ;  // spin, burning CPU time
}

DWORD WINAPI RecalcFunc(PVOID pvParam) {
   // Perform the recalculation.

   g_fFinishedCalculation = TRUE;
   return(0);
}

As you can see, the primary thread (executing WinMain) doesn’t put itself to sleep when it needs to synchronize itself with the completion of the RecalcFunc function. Because the primary thread does not sleep, it is continuously scheduled CPU time by the operating system. This takes precious time cycles away from other threads.

Another problem with the polling method used in the previous code fragment is that the BOOL variable g_fFinishedCalculation might never be set to TRUE. This can happen if the primary thread has a higher priority than the thread executing the RecalcFunc function. In this case, the system never assigns any time slices to the RecalcFunc thread.
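The usual cure for both problems is to wait on a kernel object instead of a flag. A hedged sketch using an event object (event details belong to the kernel-mode synchronization discussion) might look like:

```cpp
#include <windows.h>

HANDLE g_hCalcDone;   // manual-reset event, created nonsignaled

DWORD WINAPI RecalcFunc2(PVOID pvParam) {
   // Perform the recalculation...

   SetEvent(g_hCalcDone);   // signal completion instead of setting a flag
   return 0;
}

void WaitForRecalc(void) {
   g_hCalcDone = CreateEvent(NULL, TRUE, FALSE, NULL);
   HANDLE hThread = CreateThread(NULL, 0, RecalcFunc2, NULL, 0, NULL);

   // The primary thread sleeps here, consuming no CPU time, until the
   // worker signals the event; priorities can no longer starve anyone.
   WaitForSingleObject(g_hCalcDone, INFINITE);

   CloseHandle(hThread);
   CloseHandle(g_hCalcDone);
}
```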

Critical Sections

If you have multiple resources that are always used together, you can place them all in a single lavatory (to borrow Richter's bathroom analogy): create just one CRITICAL_SECTION structure to guard them all.

If you have multiple resources that are not always used together (for example, threads 1 and 2 access one resource and threads 1 and 3 access another resource), you should create a separate lavatory, or CRITICAL_SECTION structure, for each resource.

Now, wherever you have code that touches a resource, you must place a call to EnterCriticalSection, passing it the address of the CRITICAL_SECTION structure that identifies the resource.

The hardest thing to remember is that any code you write that touches a shared resource must be wrapped inside EnterCriticalSection and LeaveCriticalSection functions. If you forget to wrap your code in just one place, the shared resource will be subject to corruption.
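In practice the wrapping looks like this (g_cs and g_nIndex are illustrative names):

```cpp
#include <windows.h>

int              g_nIndex = 0;   // shared resource (example)
CRITICAL_SECTION g_cs;           // guards every access to g_nIndex

// Call once, from one thread, before any thread touches g_nIndex.
void InitSharedState(void) {
   InitializeCriticalSection(&g_cs);
}

DWORD WINAPI WorkerThread(PVOID pvParam) {
   EnterCriticalSection(&g_cs);
   // Every touch of the shared resource, everywhere, is wrapped:
   g_nIndex++;
   LeaveCriticalSection(&g_cs);
   return 0;
}

// Call once, after all threads are done with g_nIndex.
void CleanupSharedState(void) {
   DeleteCriticalSection(&g_cs);
}
```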

If EnterCriticalSection places a thread in a wait state, the thread might not be scheduled again for a long time. In fact, in a poorly written application, the thread might never be scheduled CPU time again. If this happens, the thread is said to be starved.

You can use this function instead of EnterCriticalSection:

BOOL TryEnterCriticalSection(PCRITICAL_SECTION pcs);

TryEnterCriticalSection never allows the calling thread to enter a wait state. Instead, its return value indicates whether the calling thread was able to gain access to the resource. So if TryEnterCriticalSection sees that the resource is being accessed by another thread, it returns FALSE. In all other cases, it returns TRUE.

With this function, a thread can quickly check to see if it can access a certain shared resource and, if not, continue doing something else instead of waiting. If TryEnterCriticalSection does return TRUE, the CRITICAL_SECTION’s member variables have been updated to reflect that the thread is accessing the resource. Therefore, every call to TryEnterCriticalSection that returns TRUE must be matched with a call to LeaveCriticalSection.

Critical Sections and Spinlocks
When a thread attempts to enter a critical section owned by another thread, the calling thread is placed immediately into a wait state. This means that the thread must transition from user mode to kernel mode (about 1000 CPU cycles). This transition is very expensive. On a multiprocessor machine, the thread that currently owns the resource might execute on a different processor and might relinquish control of the resource shortly. In fact, the thread that owns the resource might release it before the other thread has completed executing its transition into kernel mode. If this happens, a lot of CPU time is wasted.

To improve the performance of critical sections, Microsoft has incorporated spinlocks into them. So when EnterCriticalSection is called, it loops using a spinlock to try to acquire the resource some number of times. Only if all the attempts fail does the thread transition to kernel mode to enter a wait state.
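The spin-then-wait idea can be sketched in portable C11 atomics. This is an illustrative toy lock, not the actual EnterCriticalSection implementation; the SPINLOCK type and Spin_* names are invented for the sketch, and where a real critical section would fall back to a kernel-mode wait, this sketch simply reports failure after the spin count is exhausted:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spin lock: try to grab the flag up to dwSpinCount+1 times. */
typedef struct { atomic_flag locked; } SPINLOCK;

static void Spin_Init(SPINLOCK *p) { atomic_flag_clear(&p->locked); }

static bool Spin_TryEnter(SPINLOCK *p, unsigned dwSpinCount) {
   for (unsigned i = 0; i <= dwSpinCount; i++) {
      if (!atomic_flag_test_and_set_explicit(&p->locked, memory_order_acquire))
         return true;   /* acquired without any kernel-mode transition */
   }
   return false;        /* a real critical section would block here */
}

static void Spin_Leave(SPINLOCK *p) {
   atomic_flag_clear_explicit(&p->locked, memory_order_release);
}
```

On a multiprocessor machine this cheap retry loop often acquires the lock before a kernel-mode wait would even finish its transition, which is exactly the optimization the spin count buys you.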

To use a spinlock with a critical section, you should initialize the critical section by calling this function:

BOOL InitializeCriticalSectionAndSpinCount(
   PCRITICAL_SECTION pcs,
   DWORD dwSpinCount);

As in InitializeCriticalSection, the first parameter of InitializeCriticalSectionAndSpinCount is the address of the critical section structure. But in the second parameter, dwSpinCount, you pass the number of times you want the spinlock loop to iterate as it tries to acquire the resource before making the thread wait.

Useful Tips and Techniques
Use One CRITICAL_SECTION Variable per Shared Resource
If you have several unrelated data structures in your application, you should create a CRITICAL_SECTION variable for each data structure. This is better than having a single CRITICAL_SECTION structure that guards access to all shared resources.

Access Multiple Resources Simultaneously

Assuming two CRITICAL_SECTION variables, say g_csResourceA and g_csResourceB, consider two threads that acquire them in opposite order:

DWORD WINAPI ThreadFunc(PVOID pvParam) {
   EnterCriticalSection(&g_csResourceA);   // Resource A first…
   EnterCriticalSection(&g_csResourceB);   // …then resource B.
   …
}

DWORD WINAPI OtherThreadFunc(PVOID pvParam) {
   EnterCriticalSection(&g_csResourceB);   // Resource B first…
   EnterCriticalSection(&g_csResourceA);   // …then resource A.
   …
}

Because the threads request the critical sections in opposite order, deadlock is likely to occur. To solve the problem, you must always request access to the resources in exactly the same order. Notice that order does not matter when you call LeaveCriticalSection because this function never causes a thread to enter a wait state.
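One common way to enforce a consistent order is to always lock by ascending address. The sketch below is a portable illustration with an invented LOCK type and an EnterBoth helper (not a Windows API); the point is only the address comparison:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in lock: real code would use CRITICAL_SECTIONs or mutexes. */
typedef struct { bool held; } LOCK;

static void Lock(LOCK *p)   { p->held = true;  }
static void Unlock(LOCK *p) { p->held = false; }

/* Acquire two locks in a globally consistent order (lowest address
   first) and return the one acquired first so callers can verify it. */
static LOCK *EnterBoth(LOCK *a, LOCK *b) {
   LOCK *first  = (a < b) ? a : b;
   LOCK *second = (a < b) ? b : a;
   Lock(first);
   Lock(second);
   return first;
}

static void LeaveBoth(LOCK *a, LOCK *b) {
   /* Release order never causes a wait, so any order is fine. */
   Unlock(a);
   Unlock(b);
}
```

Every thread that needs both resources goes through EnterBoth, so no two threads can ever hold the locks in opposite order.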

Thread Synchronization in Kernel Mode:

While user-mode thread synchronization mechanisms offer great performance, they do have limitations, and for many applications they simply do not work. For example, the interlocked family of functions operates only on single values and never places a thread into a wait state. You can use critical sections to place a thread in a wait state, but you can use them only to synchronize threads contained within a single process. Also, you can easily get into deadlock situations with critical sections because you cannot specify a timeout value while waiting to enter the critical section.

Kernel objects such as processes and threads can be said to be in a signaled or nonsignaled state. The toggling of this state is determined by rules that Microsoft has created for each object. For example, process kernel objects are always created in the nonsignaled state. When the process terminates, the operating system automatically makes the process kernel object signaled. Once a process kernel object is signaled, it remains that way forever; its state never changes back to nonsignaled.

A process kernel object is nonsignaled while the process is running, and it becomes signaled when the process terminates.

Threads are not schedulable when the objects they are waiting for are nonsignaled (the flag is lowered). However, as soon as the object becomes signaled (the flag goes up), the thread sees the flag, becomes schedulable, and shortly resumes execution.

Wait Functions:

Wait functions cause a thread to voluntarily place itself into a wait state until a specific kernel object becomes signaled. By far the most common of these functions is WaitForSingleObject:

DWORD WaitForSingleObject(
   HANDLE hObject, 
   DWORD dwMilliseconds);

 The function below, WaitForMultipleObjects, is similar to WaitForSingleObject except that it allows the calling thread to check the signaled state of several kernel objects simultaneously:

DWORD WaitForMultipleObjects(
   DWORD dwCount, 
   CONST HANDLE* phObjects,
   BOOL fWaitAll, 
   DWORD dwMilliseconds); 
HANDLE h[3] = { hProcess1, hProcess2, hProcess3 };
DWORD dw = WaitForMultipleObjects(3, h, FALSE, 5000);
switch (dw) {
   case WAIT_FAILED:
      // Bad call to function (invalid handle?)
      break;

   case WAIT_TIMEOUT:
      // None of the objects became signaled within 5000 milliseconds.
      break;

   case WAIT_OBJECT_0 + 0:
      // The process identified by h[0] (hProcess1) terminated.
      break;

   case WAIT_OBJECT_0 + 1:
      // The process identified by h[1] (hProcess2) terminated.
      break;

   case WAIT_OBJECT_0 + 2:
      // The process identified by h[2] (hProcess3) terminated.
      break;
}

Successful Wait Side Effects:
For some kernel objects, a successful call to WaitForSingleObject or WaitForMultipleObjects actually alters the state of the object. A successful call is one in which the function sees that the object was signaled and returns a value relative to WAIT_OBJECT_0. A call is unsuccessful if the function returns WAIT_TIMEOUT or WAIT_FAILED. Objects never have their state altered for unsuccessful calls.

When an object has its state altered, I call this a successful wait side effect. This side effect is applied to auto-reset event kernel objects.

Event Kernel Objects
Events signal that an operation has completed. There are two different types of event objects: manual-reset events and auto-reset events. When a manual-reset event is signaled, all threads waiting on the event become schedulable. When an auto-reset event is signaled, only one of the threads waiting on the event becomes schedulable.

HANDLE CreateEvent(
   PSECURITY_ATTRIBUTES psa,
   BOOL fManualReset, 
   BOOL fInitialState, 
   PCTSTR pszName);

The fManualReset parameter is a Boolean value that tells the system whether to create a manual-reset event (TRUE) or an auto-reset event (FALSE). The fInitialState parameter indicates whether the event should be initialized to signaled (TRUE) or nonsignaled (FALSE). After the system creates the event object, CreateEvent returns the process-relative handle to the event object. Threads in other processes can gain access to the object by calling CreateEvent using the same value passed in the pszName parameter; by using inheritance; by using the DuplicateHandle function; or by calling OpenEvent, specifying a name in the pszName parameter that matches the name specified in the call to CreateEvent:

HANDLE OpenEvent(
   DWORD fdwAccess, 
   BOOL fInherit, 
   PCTSTR pszName);

As always, you should call the CloseHandle function when you no longer require the event kernel object.
Once an event is created, you control its state directly. When you call SetEvent, you change the event to the signaled state:

BOOL SetEvent(HANDLE hEvent);

When you call ResetEvent, you change the event to the nonsignaled state:

BOOL ResetEvent(HANDLE hEvent);

Microsoft has defined a successful wait side effect rule for an auto-reset event: an auto-reset event is automatically reset to the nonsignaled state when a thread successfully waits on the object. This is how auto-reset events got their name. It is usually unnecessary to call ResetEvent for an auto-reset event because the system automatically resets the event.
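The manual-reset versus auto-reset rules can be sketched in portable C. This is a model of the state rules only, not the real kernel object: the EVENT type and Event_* names are invented, and Event_TryWait models a zero-timeout wait (it succeeds only if the event is already signaled rather than blocking):

```c
#include <stdbool.h>

/* Model of Win32 event state rules. */
typedef struct {
   bool fManualReset;
   bool fSignaled;
} EVENT;

static void Event_Create(EVENT *e, bool fManualReset, bool fInitialState) {
   e->fManualReset = fManualReset;
   e->fSignaled = fInitialState;
}

static void Event_Set(EVENT *e)   { e->fSignaled = true;  }
static void Event_Reset(EVENT *e) { e->fSignaled = false; }

/* Succeeds only if the event is signaled (a real wait would block). */
static bool Event_TryWait(EVENT *e) {
   if (!e->fSignaled)
      return false;
   if (!e->fManualReset)
      e->fSignaled = false;   /* the successful wait side effect */
   return true;
}
```

Note how a manual-reset event stays signaled after a successful wait (so all waiters get through), while an auto-reset event lets exactly one wait succeed before flipping back to nonsignaled.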

// Create a global handle to a manual-reset, nonsignaled event.
HANDLE g_hEvent;

int WINAPI WinMain(…) {

   // Create the manual-reset, nonsignaled event.
   g_hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

   // Spawn 3 new threads.
   HANDLE hThread[3];
   unsigned dwThreadID;
   hThread[0] = (HANDLE) _beginthreadex(NULL, 0, WordCount, NULL, 0, &dwThreadID);
   hThread[1] = (HANDLE) _beginthreadex(NULL, 0, SpellCheck, NULL, 0, &dwThreadID);
   hThread[2] = (HANDLE) _beginthreadex(NULL, 0, GrammarCheck, NULL, 0, &dwThreadID);


   // Allow all 3 threads to access the memory.
   SetEvent(g_hEvent);
}

DWORD WINAPI WordCount(PVOID pvParam) {

   // Wait until the file’s data is in memory.
   WaitForSingleObject(g_hEvent, INFINITE);

   // Access the memory block.
   …
   return(0);
}

DWORD WINAPI SpellCheck(PVOID pvParam) {

   // Wait until the file’s data is in memory.
   WaitForSingleObject(g_hEvent, INFINITE);

   // Access the memory block.
   …
   return(0);
}

DWORD WINAPI GrammarCheck(PVOID pvParam) {

   // Wait until the file’s data is in memory.
   WaitForSingleObject(g_hEvent, INFINITE);

   // Access the memory block.
   …
   return(0);
}

When this process starts, it creates a manual-reset, nonsignaled event and saves the handle in a global variable. This makes it easy for other threads in this process to access the same event object. Now three threads are spawned. These threads wait until a file’s contents are read into memory, and then each thread accesses the data: one thread does a word count, another runs the spelling checker, and the third runs the grammar checker. The code for these three thread functions starts out identically: each thread calls WaitForSingleObject, which suspends the thread until the file’s contents have been read into memory by the primary thread.

Once the primary thread has the data ready, it calls SetEvent, which signals the event. At this point, the system makes all three secondary threads schedulable; they all get CPU time and access the memory block. Notice that all three threads will access the memory in a read-only fashion. This is the only reason why all three threads can run simultaneously. Also note that if the machine has multiple CPUs on it, all of these threads can truly execute simultaneously, getting a lot of work done in a short amount of time.

If you use an auto-reset event instead of a manual-reset event, the application behaves quite differently. The system allows only one secondary thread to become schedulable after the primary thread calls SetEvent. Again, there is no guarantee as to which thread the system will make schedulable. The remaining two secondary threads will continue to wait.

The thread that becomes schedulable has exclusive access to the memory block. Let’s rewrite the thread functions so that each function calls SetEvent (just like the WinMain function does) just before returning. The thread functions now look like this:

DWORD WINAPI WordCount(PVOID pvParam) {

   // Wait until the file’s data is in memory.
   WaitForSingleObject(g_hEvent, INFINITE);

   // Access the memory block.
   …

   // Allow another waiting thread to access the data.
   SetEvent(g_hEvent);
   return(0);
}

DWORD WINAPI SpellCheck(PVOID pvParam) {

   // Wait until the file’s data is in memory.
   WaitForSingleObject(g_hEvent, INFINITE);

   // Access the memory block.
   …

   // Allow another waiting thread to access the data.
   SetEvent(g_hEvent);
   return(0);
}

 When a thread has finished its exclusive pass over the data, it calls SetEvent, which allows the system to make one of the two waiting threads schedulable. Again, we don’t know which thread the system will choose, but this thread will have its own exclusive pass over the memory block. When this thread is done, it will call SetEvent as well, causing the third and last thread to get its exclusive pass over the memory block

Semaphore Kernel Objects
HANDLE hsem = CreateSemaphore(NULL, 0, 5, NULL);
This creates a semaphore with a maximum resource count of 5, but initially 0 resources are available. Since the current resource count is initialized to 0, the semaphore is nonsignaled. Any threads that wait on the semaphore are therefore placed in a wait state.

A thread gains access to a resource by calling a wait function, passing the handle of the semaphore guarding the resource. Internally, the wait function checks the semaphore’s current resource count and if its value is greater than 0 (the semaphore is signaled), the counter is decremented by 1 and the calling thread remains schedulable.

A thread increments a semaphore’s current resource count by calling ReleaseSemaphore:

BOOL ReleaseSemaphore(
   HANDLE hsem, 
   LONG lReleaseCount, 
   PLONG plPreviousCount);

This function simply adds the value in lReleaseCount to the semaphore’s current resource count.
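These counting rules can be sketched in portable C. The SEMAPHORE struct and Sem_* names are invented for the sketch (this is not the Win32 API); Sem_TryWait models a zero-timeout wait, and Sem_Release follows the Win32 rule that a release which would push the count past the maximum fails and changes nothing:

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the semaphore counting rules. */
typedef struct {
   long lCurrent;   /* signaled while > 0 */
   long lMaximum;
} SEMAPHORE;

static bool Sem_TryWait(SEMAPHORE *s) {
   if (s->lCurrent <= 0)
      return false;   /* nonsignaled: a real wait would block */
   s->lCurrent--;     /* the successful wait side effect */
   return true;
}

static bool Sem_Release(SEMAPHORE *s, long lReleaseCount, long *plPrevious) {
   if (s->lCurrent + lReleaseCount > s->lMaximum)
      return false;   /* would exceed the maximum: fail, change nothing */
   if (plPrevious != NULL)
      *plPrevious = s->lCurrent;
   s->lCurrent += lReleaseCount;
   return true;
}
```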

Mutex Kernel Objects
Mutex kernel objects ensure that a thread has mutual exclusive access to a single resource. In fact, this is how the mutex got its name. A mutex object contains a usage count, a thread ID, and a recursion counter. Mutexes behave identically to critical sections, but mutexes are kernel objects, while critical sections are user-mode objects. This means that mutexes are slower than critical sections. But it also means that threads in different processes can access a single mutex, and it means that a thread can specify a timeout value while waiting to gain access to a resource.

The thread ID identifies which thread in the system currently owns the mutex, and the recursion counter indicates the number of times that this thread owns the mutex. Mutexes have many uses and are among the most frequently used kernel objects. Typically, they are used to guard a block of memory that is accessed by multiple threads. If multiple threads were to access the memory block simultaneously, the data in the block would be corrupted. Mutexes ensure that any thread accessing the memory block has exclusive access to the block so that the integrity of the data is maintained.

HANDLE CreateMutex(
   PSECURITY_ATTRIBUTES psa, 
   BOOL fInitialOwner,
   PCTSTR pszName);

The fInitialOwner parameter controls the initial state of the mutex. If you pass FALSE (the usual case), both the mutex object’s thread ID and recursion counter are set to 0. This means that the mutex is unowned and is therefore signaled.

If you pass TRUE for fInitialOwner, the object’s thread ID is set to the calling thread’s ID and the recursion counter is set to 1. Since the thread ID is nonzero, the mutex is initially nonsignaled.

For mutexes, there is one special exception to the normal kernel object signaled/nonsignaled rules. Let’s say that a thread attempts to wait on a nonsignaled mutex object. In this case, the thread is usually placed in a wait state. However, the system checks to see whether the thread attempting to acquire the mutex has the same thread ID as recorded inside the mutex object. If the thread IDs match, the system allows the thread to remain schedulable, even though the mutex was nonsignaled. We don’t see this “exceptional” behavior applied to any other kernel object anywhere in the system. Every time a thread successfully waits on a mutex, the object’s recursion counter is incremented. The only way the recursion counter can have a value greater than 1 is if the thread waits on the same mutex multiple times, taking advantage of this rule exception.
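The thread-ID exception and recursion counter can be sketched in portable C. The MUTEX struct and Mutex_* names are invented (this is a model of the rules, not the kernel's implementation), and the caller passes a thread ID explicitly instead of the system reading it from the current thread:

```c
#include <stdbool.h>

/* Model of mutex ownership: owner thread ID plus recursion counter. */
typedef struct {
   unsigned long dwOwnerThreadId;   /* 0 means unowned (signaled) */
   long lRecursionCount;
} MUTEX;

static bool Mutex_TryWait(MUTEX *m, unsigned long dwThreadId) {
   if (m->dwOwnerThreadId == 0) {            /* signaled: take ownership */
      m->dwOwnerThreadId = dwThreadId;
      m->lRecursionCount = 1;
      return true;
   }
   if (m->dwOwnerThreadId == dwThreadId) {   /* the special exception */
      m->lRecursionCount++;
      return true;
   }
   return false;                             /* another owner: would block */
}

static bool Mutex_Release(MUTEX *m, unsigned long dwThreadId) {
   if (m->dwOwnerThreadId != dwThreadId)
      return false;                          /* only the owner may release */
   if (--m->lRecursionCount == 0)
      m->dwOwnerThreadId = 0;                /* back to signaled */
   return true;
}
```

Note that the owning thread must call Mutex_Release once per successful wait before the mutex becomes signaled again, which mirrors how ReleaseMutex interacts with the recursion counter.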

Once a thread has successfully waited on a mutex, the thread knows that it has exclusive access to the protected resource. Any other threads that attempt to gain access to the resource (by waiting on the same mutex) are placed in a wait state. When the thread that currently has access to the resource no longer needs its access, it must release the mutex by calling the ReleaseMutex function:

BOOL ReleaseMutex(HANDLE hMutex);


Windows memory management:
Every process is given its very own virtual address space. For 32-bit processes, this address space is 4 GB, since a 32-bit pointer can have any value from 0x00000000 through 0xFFFFFFFF. For 64-bit processes, this address space is 16 EB (exabytes), since a 64-bit pointer can have any value from 0x00000000’00000000 through 0xFFFFFFFF’FFFFFFFF.
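The sizes quoted above follow directly from pointer width: a w-bit pointer can name 2^w distinct byte addresses. A quick sanity check of the arithmetic:

```c
#include <stdint.h>

/* Size of a flat address space, in gigabytes, for a given pointer width.
   2^w bytes divided by 2^30 bytes-per-GB is 2^(w-30) GB. */
static uint64_t AddressSpaceGB(unsigned pointerBits) {
   return 1ULL << (pointerBits - 30);
}
```

A 32-bit pointer gives 4 GB; a 64-bit pointer gives 2^34 GB, which is 16 EB.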

How a Virtual Address Space Is Partitioned
>NULL-Pointer Assignment (accessing it causes an access violation)
>DOS/16-bit Windows Application Compatibility
>User-Mode (process-private space)
>64-KB Off-Limits
>Shared Memory-Mapped File (MMF) (1 GB, Windows 98 only)
>Kernel-Mode (this partition is where the operating system’s code resides)

Regions in an Address Space
When a process is created and given its address space, the bulk of this usable address space is free, or unallocated. To use portions of this address space, you must allocate regions within it by calling VirtualAlloc. The act of allocating a region is called reserving.

When your program’s algorithms no longer need to access a reserved region of address space, the region should be freed. This process is called releasing the region of address space and is accomplished by calling the VirtualFree function.

To use a reserved region of address space, you must allocate physical storage and then map this storage to the reserved region. This process is called committing physical storage. Physical storage is always committed in pages. To commit physical storage to a reserved region, you again call the VirtualAlloc function.

The Importance of Data Alignment
CPUs operate most efficiently when they access properly aligned data. Data is aligned when the memory address of the data modulo the data’s size is 0. For example, a WORD value should always start on an address that is evenly divisible by 2, a DWORD value should always start on an address that is evenly divisible by 4, and so on. When the CPU attempts to read a data value that is not properly aligned, the CPU will do one of two things: it will either raise an exception, or it will perform multiple aligned memory accesses in order to read the full misaligned data value.
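The alignment test, and a portable way to read misaligned data safely, can be sketched like this. The function names are invented for the sketch; the memcpy approach is a portable analog of what the __unaligned keyword asks the compiler to do, letting it emit whatever byte-wise or aligned accesses the CPU needs:

```c
#include <stdint.h>
#include <string.h>

/* Data is aligned when (address % sizeof(data)) == 0. */
static int IsAligned(const void *pv, size_t size) {
   return ((uintptr_t) pv % size) == 0;
}

/* Read a 32-bit value from a possibly misaligned address without
   dereferencing a misaligned pointer. */
static uint32_t ReadUnalignedDword(const void *pv) {
   uint32_t dw;
   memcpy(&dw, pv, sizeof(dw));
   return dw;
}
```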

Here is some code that accesses misaligned data:

VOID SomeFunc(PVOID pvDataBuffer) { 
   // The first byte in the buffer is some byte of information
   char c = * (PBYTE) pvDataBuffer;       

   // Increment past the first byte in the buffer 
   pvDataBuffer = (PVOID)((PBYTE) pvDataBuffer + 1); 

   // Bytes 2-5 contain a double-word value
   DWORD dw = * (DWORD *) pvDataBuffer; 

   // The line above raises a data misalignment exception on the Alpha
   …
}

The correct code should be as follows:

   // Bytes 2-5 contain a double-word value
   DWORD dw = * (__unaligned DWORD *) pvDataBuffer;

   // The line above causes the compiler to generate additional 
   // instructions so that several aligned data accesses are performed 
   // to read the DWORD. 
   // Note that a data misalignment exception is not raised.

Using Virtual Memory in Your Own Applications:
Windows offers three mechanisms for manipulating memory:
>Virtual memory, which is best for managing large arrays of objects or structures
>Memory-mapped files, which are best for managing large streams of data (usually from files) and for sharing data between multiple processes running on a single machine
>Heaps, which are best for managing large numbers of small objects

Memory-Mapped Files
Like virtual memory, memory-mapped files allow you to reserve a region of address space and commit physical storage to the region. The difference is that the physical storage comes from a file that is already on the disk instead of the system’s paging file. Once the file has been mapped, you can access it as if the whole file were loaded in memory.

Memory-mapped files are used for three different purposes:

The system uses memory-mapped files to load and execute .exe and DLL files. This greatly conserves both paging file space and the time required for an application to begin executing.

You can use memory-mapped files to access a data file on disk. This shelters you from performing file I/O operations on the file and from buffering the file’s contents.

You can use memory-mapped files to allow multiple processes running on the same machine to share data with each other. Windows does offer other methods for communicating data among processes, but these other methods are implemented using memory-mapped files, making memory-mapped files the most efficient way for multiple processes on a single machine to communicate with one another.

Memory-Mapped Executables and DLLs
When a thread calls CreateProcess, the system performs the following steps:

>The system locates the .exe file specified in the call to CreateProcess. If the .exe file cannot be found, the process is not created and CreateProcess returns FALSE.

>The system creates a new process kernel object.

>The system creates a private address space for this new process.

>The system reserves a region of address space large enough to contain the .exe file. The desired location of this region is specified inside the .exe file itself. By default, an .exe file’s base address is 0x00400000 (this address might be different for a 64-bit application running on 64-bit Windows 2000). However, you can override this when you create your application’s .exe file by using the linker’s /BASE option when you link your application.

>The system notes that the physical storage backing the reserved region is in the .exe file on disk instead of the system’s paging file.

Sharing Static Data Across Multiple Instances of an Executable or a DLL
The fact that global and static data is not shared by multiple mappings of the same .exe or DLL is a safe default. However, on some occasions it is useful and convenient for multiple mappings of an .exe to share a single instance of a variable:

#pragma data_seg("Shared")
volatile LONG g_lApplicationInstances = 0;
#pragma data_seg()

// Tell the linker to make the Shared section 
// readable, writable, and shared.
#pragma comment(linker, "/Section:Shared,RWS")

Using Memory-Mapped Files
To use a memory-mapped file, you must perform three steps:
>Create or open a file kernel object that identifies the file on disk that you want to use as a memory-mapped file.
>Create a file-mapping kernel object that tells the system the size of the file and how you intend to access the file.
>Tell the system to map all or part of the file-mapping object into your process’s address space.

When you are finished using the memory-mapped file, you must perform three steps to clean up:
>Tell the system to unmap the file-mapping kernel object from your process’s address space.
>Close the file-mapping kernel object.
>Close the file kernel object.

//Creating or Opening a File Kernel Object
HANDLE hFile = CreateFile(…); 
//Must tell the system how much physical storage the file-mapping object requires
HANDLE hFileMapping = CreateFileMapping(hFile, …);
//Mapping the File’s Data into the Process’s Address Space
PVOID pvFile = MapViewOfFile(hFileMapping, …);

// Use the memory-mapped file.

//Unmapping the File’s Data from the Process’s Address Space
UnmapViewOfFile(pvFile);
CloseHandle(hFileMapping);
CloseHandle(hFile);
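The same open, map, use, unmap, close sequence can be shown as a compilable POSIX analog, where open and mmap play the roles of CreateFile and CreateFileMapping + MapViewOfFile. The function name CopyFirstBytes is invented for the sketch; this illustrates the sequence of steps, not the Win32 calls themselves:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file and copy its first n bytes out; returns 0 on success. */
static int CopyFirstBytes(const char *path, char *out, size_t n) {
   int fd = open(path, O_RDONLY);                /* step 1: file object */
   if (fd == -1)
      return -1;

   void *pv = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, 0);  /* steps 2+3 */
   if (pv == MAP_FAILED) {
      close(fd);
      return -1;
   }

   memcpy(out, pv, n);                           /* use the mapped view */

   munmap(pv, n);                                /* unmap the view */
   close(fd);                                    /* close the file object */
   return 0;
}
```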

Using Memory-Mapped Files to Share Data Among Processes
This data sharing is accomplished by having two or more processes map views of the same file-mapping object, which means they are sharing the same pages of physical storage. As a result, when one process writes to data in a view of a shared file-mapping object, the other processes see the change instantly in their views. Note that for multiple processes to share a single file-mapping object, all processes must use exactly the same name for the file-mapping object; a second process opens the existing object by name with OpenFileMapping.

A Process’s Default Heap (used by most Windows functions; access to it is serialized, so only one thread can allocate or free at a time while the others wait)
When a process initializes, the system creates a heap in the process’s address space. This heap is called the process’s default heap. By default, this heap’s region of address space is 1 MB in size. 

A single process can have several heaps at once. These heaps can be created and destroyed during the lifetime of the process. The default heap, however, is created before the process begins execution and is destroyed automatically when the process terminates. You cannot destroy the process’s default heap. Each heap is identified with its own heap handle, and all of the heap functions that allocate and free blocks within a heap require this heap handle as a parameter.

You can obtain the handle to your process’s default heap by calling GetProcessHeap: HANDLE GetProcessHeap();

Reasons to Create Additional Heaps
In addition to the process’s default heap, you can create additional heaps in your process’s address space. You would want to create additional heaps in your own applications for the following reasons:

>Component protection
>More efficient memory management
>Local access
>Avoiding thread synchronization overhead
>Quick Free

Component protection:

Imagine that your application needs to process two components: a linked list of NODE structures and a binary tree of BRANCH structures. If the NODEs and the BRANCHes are stored together in a single heap, the two kinds of objects are intermixed in memory. Now let’s say that a bug in the linked-list code causes the 8 bytes after NODE 1 to be accidentally overwritten, which in turn causes the data in BRANCH 3 to be corrupted. When the code in BinTree.cpp later attempts to traverse the binary tree, it will probably fail because of this memory corruption. Of course, this will lead you to believe that there is a bug in your binary-tree code when in fact the bug exists in the linked-list code. Because the different types of objects are mixed together in a single heap, tracking down and isolating bugs becomes significantly more difficult.

More efficient memory management:
Heaps can be managed more efficiently by allocating objects of the same size within them. For example, let’s say that every NODE structure requires 24 bytes and every BRANCH structure requires 32 bytes. All of these objects are allocated from a single heap. Figure 18-2 shows a fully occupied single heap with several NODE and BRANCH objects allocated within it. If NODE 2 and NODE 4 are freed, memory in the heap becomes fragmented. If you then attempt to allocate a BRANCH structure, the allocation will fail even though 48 bytes are available and a BRANCH needs only 32 bytes.

If each heap consisted only of objects that were the same size, freeing an object would guarantee that another object would fit perfectly into the freed object’s space.

Avoiding Thread Synchronization Overhead:
As I’ll explain shortly, heaps are serialized by default so that there is no chance of data corruption if multiple threads attempt to access the heap at the same time. However, the heap functions must execute additional code in order to keep the heap thread-safe. If you are performing lots of heap allocations, executing this additional code can really add up, taking a toll on your application’s performance. When you create a new heap, you can tell the system that only one thread will access the heap and therefore the additional code will not execute. However, be careful: you are now taking on the responsibility of keeping the heap thread-safe. The system will not be looking out for you.

Quick Free:
Finally, using a dedicated heap for some data structures allows you to free the entire heap without having to free each memory block explicitly within the heap.
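The quick-free idea can be sketched with a simple arena allocator in portable C. The ARENA type and Arena_* names are invented (this is an illustration of the technique, not HeapCreate/HeapDestroy): blocks are handed out by bumping a cursor, and the destroy call releases every block in one shot instead of freeing them one by one:

```c
#include <stdlib.h>
#include <stddef.h>

/* A dedicated allocation region that can be freed all at once. */
typedef struct {
   unsigned char *base;
   size_t size;
   size_t used;
} ARENA;

static int Arena_Create(ARENA *a, size_t size) {
   a->base = (unsigned char *) malloc(size);
   a->size = size;
   a->used = 0;
   return a->base != NULL;
}

static void *Arena_Alloc(ARENA *a, size_t bytes) {
   bytes = (bytes + 7) & ~(size_t) 7;   /* keep blocks 8-byte aligned */
   if (a->used + bytes > a->size)
      return NULL;                      /* arena exhausted */
   void *p = a->base + a->used;
   a->used += bytes;
   return p;
}

static void Arena_Destroy(ARENA *a) {   /* frees every block at once */
   free(a->base);
   a->base = NULL;
   a->used = a->size = 0;
}
```

This is the same trade-off as a dedicated heap: individual blocks are never freed, but tearing down the whole data structure costs one call.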

How to Create an Additional Heap
You can create additional heaps in your process by having a thread call HeapCreate:

HANDLE HeapCreate(
   DWORD fdwOptions, 
   SIZE_T dwInitialSize,
   SIZE_T dwMaximumSize);

Allocating a block of memory from a heap is simply a matter of calling HeapAlloc:

PVOID HeapAlloc(
   HANDLE hHeap,
   DWORD fdwFlags,
   SIZE_T dwBytes);

Resizing a memory block is accomplished by calling the HeapReAlloc function:

PVOID HeapReAlloc(
   HANDLE hHeap, 
   DWORD fdwFlags,
   PVOID pvMem, 
   SIZE_T dwBytes);

After a memory block has been allocated, the HeapSize function can be called to retrieve the actual size of the block:

SIZE_T HeapSize(
   HANDLE hHeap,
   DWORD fdwFlags, 
   LPCVOID pvMem);

When you no longer need the memory block, you can free it by calling HeapFree:

BOOL HeapFree(
   HANDLE hHeap,
   DWORD fdwFlags, 
   PVOID pvMem);

If your application no longer needs a heap that it created, you can destroy the heap by calling HeapDestroy:

BOOL HeapDestroy(HANDLE hHeap);

Using Heaps with C++
class CSomeClass {
private:
   // Statics shared by all instances of CSomeClass.
   static HANDLE s_hHeap;
   static UINT s_uNumAllocsInHeap;
public:
   void* operator new (size_t size);
   void operator delete (void* p);
};

HANDLE CSomeClass::s_hHeap = NULL;
UINT CSomeClass::s_uNumAllocsInHeap = 0;

void* CSomeClass::operator new (size_t size) {
   if (s_hHeap == NULL) {
      // Heap does not exist; create it.
      s_hHeap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
      if (s_hHeap == NULL)
         return(NULL);
   }

   // The heap exists for CSomeClass objects.
   void* p = HeapAlloc(s_hHeap, 0, size);

   if (p != NULL) {
      // Memory was allocated successfully; increment
      // the count of CSomeClass objects in the heap.
      s_uNumAllocsInHeap++;
   }

   // Return the address of the allocated CSomeClass object.
   return(p);
}

void CSomeClass::operator delete (void* p) {
   if (HeapFree(s_hHeap, 0, p)) {
      // Object was deleted successfully.
      s_uNumAllocsInHeap--;
   }

   if (s_uNumAllocsInHeap == 0) {
      // If there are no more objects in the heap,
      // destroy the heap.
      if (HeapDestroy(s_hHeap)) {
         // Set the heap handle to NULL so that the new operator
         // will know to create a new heap if a new CSomeClass
         // object is created.
         s_hHeap = NULL;
      }
   }
}
Dynamic link libraries:
It is often easier to create a DLL than to create an application because a DLL usually consists of a set of autonomous functions that any application can use. There is usually no support code for processing message loops or creating windows within DLLs. A DLL is simply a set of source code modules, with each module containing a set of functions that an application (executable file) or another DLL will call. 

Before an application (or another DLL) can call functions in a DLL, the DLL’s file image must be mapped into the calling process’s address space. You can do this using one of two methods: implicit load-time linking or explicit run-time linking.

When a module offers a function that allocates memory, the module must also offer a function that frees that memory. Don’t forget that functions in other modules might not even be written in C/C++ and therefore might not use malloc and free for memory allocations. Be careful not to make these assumptions in your code.

Building the DLL Module
Building a DLL requires the following steps:

You must first create a header file, which contains the function prototypes, structures, and symbols that you want to export from the DLL. This header file is included by all of your DLL’s source code modules to help build the DLL. As you’ll see later, this same header file is required when you build an executable module (or modules) that uses the functions and variables contained in your DLL.

You create the C/C++ source code module (or modules) that implements the functions and variables that you want in the DLL module. Since these source code modules are not required to build an executable module, the DLL company’s source code can remain a company secret.

Building the DLL module causes the compiler to process each source code module, producing an .obj module (one .obj module per source code module).

After all of the .obj modules are created, the linker combines the contents of all the .obj modules and produces a single DLL image file. This image file (or module) contains all the binary code and global/static data variables for the DLL. This file is required in order to execute the executable module.

If the linker detects that the DLL’s source code module exports at least one function or variable, the linker also produces a single .lib file. This .lib file is small because it contains no functions or variables. It simply lists all the exported function and variable symbol names. This file is required in order to build the executable module.

Once you build the DLL module, you can build the executable module. These steps are

In all of the source modules that reference functions, variables, data structures, or symbols, you must include the header file created by the DLL developer.

You create the C/C++ source code module (or modules) that implements the functions and variables that you want in the executable module. The code can, of course, reference functions and variables defined in the DLL’s header file.

Building the executable module causes the compiler to process each source code module, producing an .obj module (one .obj module per source code module).

After all of the .obj modules are created, the linker combines the contents of all the .obj modules and produces a single executable image file. This image file (or module) contains all the binary code and global/static data variables for the executable. The executable module also contains an import section that lists all the DLL module names required by this executable. (See Chapter 17 for more on sections.) In addition, for each DLL name listed, the section indicates which function and variable symbols are referenced by the executable’s binary code. The operating system loader parses the import section, as you’ll see in a moment.

Once the DLL and the executable modules are built, a process can execute. When you attempt to run the executable module, the operating system’s loader performs the following steps:

The loader creates a virtual address space for the new process. The executable module is mapped into the new process’s address space. The loader parses the executable module’s import section. For every DLL name listed in the section, the loader locates the DLL module on the user’s system and maps that DLL into the process’s address space. Note that since a DLL module can import functions and variables from another DLL module, a DLL module might have its own import section. To fully initialize a process, the loader parses every module’s import section and maps all required DLL modules into the process’s address space. As you can see, initializing a process can be time consuming.

 DLL Advanced Techniques

Explicit DLL Module Loading and Symbol Linking
At any time, a thread in the process can decide to map a DLL into the process’s address space by calling one of these two functions:

HINSTANCE LoadLibrary(PCTSTR pszDLLPathName);
HINSTANCE LoadLibraryEx(PCTSTR pszDLLPathName, HANDLE hFile, DWORD dwFlags);

When the threads in the process no longer want to reference symbols in a DLL, you can explicitly unload the DLL from the process’s address space by calling this function:

BOOL FreeLibrary(HINSTANCE hinstDll);

A thread can determine whether a DLL is already mapped into its process’s address space by calling the GetModuleHandle function:

HINSTANCE GetModuleHandle(PCTSTR pszModuleName);

Explicitly Linking to an Exported Symbol:

Once a DLL module has been explicitly loaded, the thread must get the address of the symbol that it wants to reference by calling this function:

FARPROC GetProcAddress(HINSTANCE hinstDll, PCSTR pszSymbolName);

 DllMain and the C/C++ Run-Time Library:
When your DLL file image is mapped into a process’s address space, the system actually calls this _DllMainCRTStartup function instead of your DllMain function. The _DllMainCRTStartup function initializes the C/C++ run-time library and ensures that any global or static C++ objects are constructed when _DllMainCRTStartup receives the DLL_PROCESS_ATTACH notification. After any C/C++ run-time initialization has been performed, the _DllMainCRTStartup function calls your DllMain function.

Delay-Loading a DLL:
A delay-load DLL is a DLL that is implicitly linked but not actually loaded until your code attempts to reference a symbol contained within the DLL.


The /Lib switch tells the linker to embed a special function, __delayLoadHelper, into your executable. The second switch tells the linker the following things:

Remove MyDll.dll from the executable module’s import section so that the operating system loader does not implicitly load the DLL when the process initializes.

Embed a new Delay Import section (called .didata) in the executable indicating which functions are being imported from MyDll.dll.

Resolve calls to the delay-loaded functions by having calls jump to the __delayLoadHelper function.

When the application runs, a call to a delay-loaded function actually calls the __delayLoadHelper function instead. This function references the special Delay Import section and knows to call LoadLibrary followed by GetProcAddress. Once the address of the delay-loaded function is obtained, __delayLoadHelper fixes up calls to that function so future calls go directly to the delay-loaded function.

Rebasing Modules
Every executable and DLL module has a preferred base address, which identifies the ideal memory address where the module should get mapped into a process’s address space. When you build an executable module, the linker sets the module’s preferred base address to 0x00400000. For a DLL module, the linker sets a preferred base address of 0x10000000. 

OK, now let’s say that you’re designing an application that requires two DLLs. By default, the linker sets the .exe module’s preferred base address to 0x00400000 and the linker sets the preferred base address for both DLLs to 0x10000000. If you attempt to run the .exe, the loader creates the virtual address space and maps the .exe module at the 0x00400000 memory address. Then the loader maps the first DLL to the 0x10000000 memory address. But now, when the loader attempts to map the second DLL into the process’s address space, it can’t possibly map it at the module’s preferred base address. It must relocate the DLL module, placing it somewhere else.

MOV   [0x10014540], 5

Relocating an executable (or DLL) module is an absolutely horrible process, and you should take measures to avoid it. Let’s see why. Suppose that the loader relocates the second DLL to address 0x20000000. In that case, the instruction that changes the g_x variable to 5 must itself change to

MOV   [0x20014540], 5

There are two major drawbacks when a module cannot load at its preferred base address:
The loader has to iterate through the relocation section and modify a lot of the module’s code. This produces a major performance hit and can really hurt an application’s initialization time.

As the loader writes to the module’s code pages, the system’s copy-on-write mechanism forces these pages to be backed by the system’s paging file.

You now understand the importance of the preferred base address. So if you have multiple modules that you’re loading into a single address space, you must set different preferred base addresses for each module. Microsoft Visual Studio’s Project Settings dialog box makes this easy. All you do is select the Link tab and then select the Output category. In the Base Address field, which is blank by default, you enter a number. In the following figure, I’ve set my DLL module’s base address to 0x20000000.

When you execute Rebase, passing it a set of image file names, it does the following:
It simulates creating a process’s address space.

It opens all of the modules that would normally be loaded into this address space. It thus gets the preferred base address and size of each module.

It simulates relocating the modules in the simulated address space so that none of the modules overlap.

For each relocated module, it parses the module’s relocation section and modifies the code in the module file on disk.

It updates the header of each relocated module to reflect the new preferred base address.

Binding Modules
Rebasing is very important and greatly improves the performance of the entire system. However, you can do even more to improve performance. Let’s say that you have properly rebased all of your application’s modules. Recall from Chapter 19 our discussion about how the loader looks up the address of all the imported symbols. The loader writes the symbol’s virtual address into the executable module’s import section. This allows references to the imported symbols to actually get to the correct memory location.

When you execute Bind, passing it an image name, it does the following:

It opens the specified image file’s import section.

For every DLL listed in the import section, it opens the DLL file and looks in its header to determine its preferred base address.

It looks up each imported symbol in the DLL’s export section.

It takes the RVA of the symbol and adds to it the module’s preferred base address. It writes the resulting expected virtual address of the imported symbol to the image file’s import section.

It adds some additional information to the image file’s import section. This information includes the name of all DLL modules that the image is bound to and the timestamp of those modules.
Thread-Local Storage

The C/C++ run-time library uses TLS. Because the library was designed years before multithreaded applications became common, most functions in the library are intended for use with single-threaded applications. The strtok function is an excellent example. The first time an application calls strtok, it passes the address of a string, and the function saves that address in its own static variable. When you make future calls to strtok, passing NULL, the function refers to the saved string address.

In a multithreaded environment, one thread might call strtok, and then, before it can make another call, another thread might also call strtok. In this case, the second thread causes strtok to overwrite its static variable with a new address without the first thread’s knowledge. The first thread’s future calls to strtok use the second thread’s string, which can lead to all kinds of bugs that are difficult to find and to fix.

To address this problem, the C/C++ run-time library uses TLS. Each thread is assigned its own string pointer that is reserved for use by the strtok function. Other C/C++ run-time functions that require the same treatment include asctime and gmtime.

Dynamic TLS
An application takes advantage of dynamic TLS by calling a set of four functions. These functions are actually most often used by DLLs.
The figure shows a single set of in-use flags for each process running in the system. Each flag is set to either FREE or INUSE, indicating whether the TLS slot is in use. Microsoft guarantees that at least TLS_MINIMUM_AVAILABLE bit flags are available. By the way, TLS_MINIMUM_AVAILABLE is defined as 64 in WinNT.h. Windows 2000 has expanded this flag array to allow more than 1000 TLS slots! This should be more than enough slots for any application.

To use dynamic TLS, you must first call TlsAlloc:

DWORD TlsAlloc();

This function instructs the system to scan the bit flags in the process and locate a FREE flag. The system then changes the flag from FREE to INUSE, and TlsAlloc returns the index of the flag in the bit array.

When a thread is created, an array of TLS_MINIMUM_AVAILABLE PVOID values is allocated, initialized to 0, and associated with the thread by the system. As Figure 21-1 shows, each thread gets its own array and each PVOID in the array can store any value.

To place a value in a thread’s array, you call the TlsSetValue function:

BOOL TlsSetValue(DWORD dwTlsIndex, PVOID pvTlsValue);

This function puts a PVOID value, identified by the pvTlsValue parameter, into the thread’s array at the index identified by the dwTlsIndex parameter. The value of pvTlsValue is associated with the thread making the call to TlsSetValue. If the call is successful, TRUE is returned.

To retrieve a value from a thread’s array, you call TlsGetValue:

PVOID TlsGetValue(DWORD dwTlsIndex);

When you come to a point in your process where you no longer need to reserve a TLS slot among all threads, you should call TlsFree:

BOOL TlsFree(DWORD dwTlsIndex);

This function simply tells the system that this slot no longer needs to be reserved. The INUSE flag in the process’s bit flags array is set to FREE again and might be allocated in the future if a thread later calls TlsAlloc. TlsFree returns TRUE if the function is successful. Attempting to free a slot that was not allocated results in an error.

Static TLS
Like dynamic TLS, static TLS associates data with a thread. However, static TLS is much easier to use in your code because you don’t have to call any functions to take advantage of it.

Let’s say that you want to associate a start time with every thread created by your application. All you do is declare the start-time variable as follows:

__declspec(thread) DWORD gt_dwStartTime = 0;

Structured Exception Handling: Termination Handlers

SEH really consists of two main capabilities: termination handling and exception handling. We’ll discuss termination handlers in this chapter and exception handling in the next chapter.

A termination handler guarantees that a block of code (the termination handler) will be called and executed regardless of how another section of code (the guarded body) is exited. The syntax (using the Microsoft Visual C++ compiler) for a termination handler is as follows:

__try {
   // Guarded body
}
__finally {
   // Termination handler
}

The __try and __finally keywords delineate the two sections of the termination handler. In the code fragment above, the operating system and the compiler work together to guarantee that the __finally block code in the termination handler will be executed no matter how the guarded body is exited. Regardless of whether you put a return, a goto, or even a call to longjmp in the guarded body, the termination handler will be called.

Understanding Termination Handlers by Example

So far we have explicitly identified two scenarios that force the finally block to be executed:

>Normal flow of control from the try block into the finally block
>Local unwind: premature exit from the try block (goto, longjmp, continue, break, return, and so on) forcing control to the finally block
>A third scenario, a global unwind, occurred without explicit identification as such in the Funcfurter1 function we saw earlier in the chapter. Inside the try block of this function was a call to the Funcinator function. If the Funcinator function caused a memory access violation, a global unwind caused Funcfurter1’s finally block to execute.

>The use of the __leave keyword in the try block causes a jump to the end of the try block. You can think of it as jumping to the try block’s closing brace. Because the flow of control will exit naturally from the try block and enter the finally block, no overhead is incurred. However, it was necessary to introduce a new Boolean variable, fFunctionOk, to indicate the success or failure of the function.

Exception Handlers and Software Exceptions
When a hardware or software exception is raised, the operating system offers your application the opportunity to see what type of exception was raised and allows the application to handle the exception itself. Here is the syntax for an exception handler:

__try {
   // Guarded body
}
__except (exception filter) {
   // Exception handler
}

Notice the _ _except keyword. Whenever you create a try block, it must be followed by either a finally block or an except block. A try block can’t have both a finally block and an except block, and a try block can’t have multiple finally or except blocks. However, it is possible to nest try-finally blocks inside try-except blocks and vice versa.

Understanding Exception Filters and Exception Handlers by Example
In Funcmeister2, an instruction inside the try block calls for the attempt to divide 5 by 0. The CPU will catch this event and raise a hardware exception. When this exception is raised, the system will locate the beginning of the except block and evaluate the exception filter expression, an expression that must evaluate to one of the following three identifiers as defined in the Windows’ Excpt.h file.

Identifier                      Defined As
EXCEPTION_EXECUTE_HANDLER       1
EXCEPTION_CONTINUE_SEARCH       0
EXCEPTION_CONTINUE_EXECUTION    -1

In Funcmeister2, the exception filter expression evaluates to EXCEPTION_EXECUTE_HANDLER. This value basically says to the system, “I recognize the exception. That is, I had a feeling that this exception might occur some time, and I’ve written some code to deal with it that I’d like to execute now.” At this point, the system performs a global unwind (discussed later in this chapter) and then execution jumps to the code inside the except block (the exception handler code). After the code in the except block has executed, the system considers the exception to be handled and allows your application to continue executing. This mechanism allows Windows applications to trap errors, handle them, and continue running without the user ever knowing that the error happened.

When an exception filter evaluates to EXCEPTION_EXECUTE_HANDLER, the system must perform a global unwind. The global unwind causes all of the outstanding try-finally blocks that started executing below the try-except block that handles the exception to resume execution.

Often an exception filter must analyze the situation before it can determine which value to return. For example, your handler might know what to do if a divide by 0 exception occurs, but it might not know how to handle a memory access exception. The exception filter has the responsibility for examining the situation and returning the appropriate value.

Software Exceptions
So far, we have been discussing hardware exceptions in which the CPU catches an event and raises an exception. It is also possible for your code to forcibly raise an exception. This is another way for a function to indicate failure to its caller.

VOID RaiseException(
   DWORD dwExceptionCode,
   DWORD dwExceptionFlags,
   DWORD nNumberOfArguments,
   CONST ULONG_PTR *pArguments);

Windows Messaging:
A Thread’s Message Queue:
One of the main goals of Windows is to offer a robust environment for all the applications running. To meet this goal, each thread must run in an environment in which it believes that it is the only thread running. More specifically, each thread must have message queues that are totally unaffected by other threads. In addition, each thread must have a simulated environment that allows the thread to maintain its own notion of keyboard focus, window activation, mouse capture, and so on.

When a thread is first created, the system assumes that the thread will not be used for any user interface-related tasks. This reduces the system resources required by the thread. However, as soon as the thread calls a graphical UI-related function (such as checking its message queue or creating a window), the system automatically allocates some additional resources for the thread so that it can perform its UI-related tasks. Specifically, the system allocates a THREADINFO structure and associates this data structure with the thread.

This THREADINFO structure contains a set of member variables that are used to make the thread think that it is running in its very own environment. The THREADINFO structure is an internal, undocumented data structure that identifies the thread’s posted-message queue, send-message queue, reply-message queue, virtualized-input queue, and wake flags, as well as a number of variables that are used for the thread’s local input state. Figure 26-1 illustrates how THREADINFO structures are associated with three threads.

Posting Messages to a Thread’s Message Queue:
Messages are placed in a thread’s posted-message queue by calling the PostMessage function:

BOOL PostMessage(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam);

When a thread calls this function, the system determines which thread created the window identified by the hwnd parameter.

PostMessage returns immediately after posting the message; the calling thread has no idea whether the posted message was processed by the specified window’s window procedure.

A message can also be placed in a thread’s posted-message queue by calling PostThreadMessage:

BOOL PostThreadMessage(DWORD dwThreadId, UINT uMsg, WPARAM wParam, LPARAM lParam);

The last function that posts a message to a thread’s queue is PostQuitMessage:

VOID PostQuitMessage(int nExitCode);

You call this function in order to terminate a thread’s message loop. Calling PostQuitMessage is similar to calling

PostThreadMessage(GetCurrentThreadId(), WM_QUIT, nExitCode, 0); 

However, PostQuitMessage doesn’t really post a message to any of the THREADINFO structure’s queues. Internally PostQuitMessage just turns on the QS_QUIT wake flag (which I’ll discuss later) and sets the nExitCode member of the THREADINFO structure. 

Sending Messages to a Window (Interthread Communication):

Window messages can be sent directly to a window procedure by using the SendMessage function:

LRESULT SendMessage(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam);

The window procedure will process the message. Only after the message has been processed will SendMessage return to the caller.

LRESULT SendMessageTimeout(
   HWND hwnd,
   UINT uMsg,
   WPARAM wParam,
   LPARAM lParam,
   UINT fuFlags,
   UINT uTimeout,
   PDWORD_PTR pdwResult);

The SendMessageTimeout function allows you to specify the maximum amount of time you are willing to wait for another thread to reply to your message. The first four parameters are the same parameters that you pass to SendMessage. For the fuFlags parameter, you can pass SMTO_NORMAL (defined as 0), SMTO_ABORTIFHUNG, SMTO_BLOCK, SMTO_NOTIMEOUTIFNOTHUNG, or a combination of these flags.

BOOL SendMessageCallback(
   HWND hwnd,
   UINT uMsg,
   WPARAM wParam,
   LPARAM lParam,
   SENDASYNCPROC pfnResultCallBack,
   ULONG_PTR dwData);

Again, the first four parameters are the same as those used by the SendMessage function. When a thread calls SendMessageCallback, the function sends the message off to the receiving thread’s send-message queue and immediately returns so that your thread can continue processing. When the receiving thread has finished processing the message, a message is posted to the sending thread’s reply-message queue. Later, the system notifies your thread of the reply by calling a function that you write using the following prototype:

VOID CALLBACK ResultCallBack(HWND hwnd, UINT uMsg, ULONG_PTR dwData, LRESULT lResult);

You must pass the address of this function as the pfnResultCallBack parameter of SendMessageCallback.

The third function that can help send interthread messages is SendNotifyMessage:

BOOL SendNotifyMessage(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam);

SendNotifyMessage places a message in the send-message queue of the receiving thread and returns to the calling thread immediately. This should sound familiar because it is exactly what the PostMessage function does. However, SendNotifyMessage differs from PostMessage in two ways.

First, if SendNotifyMessage sends a message to a window created by another thread, the sent message has higher priority than posted messages placed in the receiving thread’s queue. In other words, messages that the SendNotifyMessage function places in a queue are always retrieved before messages that the PostMessage function posts to a queue.

Second, when you are sending a message to a window created by the calling thread, SendNotifyMessage works exactly like the SendMessage function: SendNotifyMessage doesn’t return until the message has been processed.

Waking a Thread
When a thread calls GetMessage or WaitMessage and there are no messages for the thread or windows created by the thread, the system can suspend the thread so that it is not scheduled any CPU time. However, when a message is posted or sent to the thread, the system sets a wake flag indicating that the thread should now be scheduled CPU time to process the message. Under normal circumstances, the user is not typing or moving the mouse and no messages are being sent to any of the windows. This means that most of the threads in the system are not being scheduled any CPU time.

The Algorithm for Extracting Messages from a Thread’s Queue:
When a thread calls GetMessage or PeekMessage, the system must examine the state of the thread’s queue status flags and determine which message should be processed. Figure 26-2 and the following steps illustrate how the system determines which message the thread should process next.

>If the QS_SENDMESSAGE flag is turned on, the system sends the message to the proper window procedure. Both the GetMessage and PeekMessage functions handle this processing internally and do not return to the thread after the window procedure has processed the message; instead, these functions sit and wait for another message to process.

>If messages are in the thread’s posted-message queue, GetMessage and PeekMessage fill the MSG structure passed to these functions, and then the functions return. The thread’s message loop usually calls DispatchMessage at this point to have the message processed by the appropriate window procedure.

>If the QS_QUIT flag is turned on, GetMessage and PeekMessage return a WM_QUIT message (where the wParam parameter is the specified exit code) and reset the QS_QUIT flag.

>If messages are in the thread’s virtualized input queue, GetMessage and PeekMessage return the hardware input message.

>If the QS_PAINT flag is turned on, GetMessage and PeekMessage return a WM_PAINT message for the proper window.

>If the QS_TIMER flag is turned on, GetMessage and PeekMessage return a WM_TIMER message.

Sending Data with Messages to Another Process:
In this section, we’ll examine how the system transfers data between processes using window messages. Some window messages specify the address of a block of memory in their lParam parameter. For example, the WM_SETTEXT message uses the lParam parameter as a pointer to a zero-terminated string that identifies the new text for the window. Consider the following call:

SendMessage(FindWindow(NULL, "Calculator"), WM_SETTEXT,
   0, (LPARAM) "A Test Caption");

The system looks specifically for the WM_SETTEXT message and handles it differently from the way it handles most other messages. When you call SendMessage, the code in the function checks whether you are trying to send a WM_SETTEXT message. If you are, it packs the zero-terminated string from your address space into a memory-mapped file that it is going to share with the other process. Then it sends the message to the thread in the other process. When the receiving thread is ready to process the WM_SETTEXT message, it determines the location, in its own address space, of the shared memory-mapped file that contains a copy of the new window text. The lParam parameter is initialized to point to this address, and the WM_SETTEXT message is dispatched to the appropriate window procedure. After the message is processed, the memory-mapped file is destroyed.

Well, all this is fine and good if you are sending messages that the system is aware of. But what if you create your own (WM_USER + x) message that you want to send from one process to a window in another? The system will not know that you want it to use memory-mapped files and to update pointers when sending. However, Microsoft has created a special window message, WM_COPYDATA, for exactly this purpose:

SendMessage(hwndReceiver, WM_COPYDATA, (WPARAM) hwndSender, (LPARAM) &cds);

COPYDATASTRUCT is a structure defined in WinUser.h, and it looks like this:

typedef struct tagCOPYDATASTRUCT {
   ULONG_PTR dwData;
   DWORD cbData;
   PVOID lpData;
} COPYDATASTRUCT;

 When SendMessage sees that you are sending a WM_COPYDATA message, it creates a memory-mapped file cbData bytes in size and copies the data from your address space to the memory-mapped file. It then sends the message to the destination window. When the receiving window procedure processes this message, the lParam parameter points to a COPYDATASTRUCT that exists in the address space of the receiving process. The lpData member of this structure points to the view of the shared memory-mapped file in the receiving process’s address space.

The WM_COPYDATA message is an incredible device that could save many developers hours of time when trying to solve interprocess communication problems. It’s a shame it’s not used more frequently.

How a Single-Threaded Program Processes Messages: GetMessage and PeekMessage
All the programs so far in this book have been single-threaded, which means that your code has only one path of execution. With ClassWizard’s help, you’ve written handler functions for various Windows messages and you’ve written OnDraw code that is called in response to the WM_PAINT message. It might seem as though Windows magically calls your handler when the message floats in, but it doesn’t work that way. Deep inside the MFC code (which is linked to your program) are instructions that look something like this: 

MSG message;
while (::GetMessage(&message, NULL, 0, 0)) {
   ::TranslateMessage(&message);
   ::DispatchMessage(&message);
}

Windows determines which messages belong to your program, and the GetMessage function returns when a message needs to be processed. If no messages are posted, your program is suspended and other programs can run. When a message eventually arrives, your program “wakes up.” The TranslateMessage function translates WM_KEYDOWN messages into WM_CHAR messages containing ASCII characters, and the DispatchMessage function passes control (via the window class) to the MFC message pump, which calls your function via the message map. When your handler is finished, it returns to the MFC code, which eventually causes DispatchMessage to return. 

Yielding Control
What would happen if one of your handler functions was a pig and chewed up 10 seconds of CPU time? Back in the 16-bit days, that would have hung up the whole computer for the duration. Only cursor tracking and a few other interrupt-based tasks would have run. With Win32, multitasking got a whole lot better. Other applications can run because of preemptive multitasking: Windows simply interrupts your pig function when it needs to. However, even in Win32, your program would be locked out for 10 seconds. It couldn’t process any messages because DispatchMessage doesn’t return until the pig returns.

There is a way around this problem, however, which works with both Win16 and Win32. You simply train your pig function to be polite and yield control once in a while by inserting the following instructions inside the pig’s main loop: 

MSG message;
if (::PeekMessage(&message, NULL, 0, 0, PM_REMOVE)) {
   ::TranslateMessage(&message);
   ::DispatchMessage(&message);
}

The PeekMessage function works like GetMessage, except that it returns immediately even if no message has arrived for your program. In that case, the pig keeps on chewing. If there is a message, however, the pig pauses, the handler is called, and the pig starts up again after the handler exits.