O(N^2) in CreateProcess

It showed that 99% of the CPU time in the main unit_tests process was inside CreateProcess, and 98.4% of the samples were in a single function. In one trace that I grabbed I found that more than 95% of the samples in my test process were in just seventeen instructions in MiCopyToCfgBitMap, which is tough to do without an O(N^2) algorithm.

My first attempt at investigating was to grab the sample counts and addresses from the ETW trace, grab the disassembly of MiCopyToCfgBitMap from livekd, write a script to merge them, and then analyze the annotated disassembly. That gave me the CFG entry counts.
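For illustration, here is a minimal sketch of what a merge script along those lines might look like. It is not the script I used: the input formats are assumptions (samples as (address, count) pairs, and disassembly lines that start with an 8 or 16 digit hex address, optionally split by a backtick the way kernel debuggers print them), and the names annotate_disassembly and ADDRESS_RE are made up for the example.

```python
import re
from collections import Counter

# Assumed line format: a leading hex address such as "fffff801`00001000"
# (16 digits with an optional backtick) or "00401000" (8 digits).
ADDRESS_RE = re.compile(r"^\s*([0-9a-fA-F]{8}(?:`?[0-9a-fA-F]{8})?)\s")


def annotate_disassembly(samples, disassembly_lines):
    """Prefix each disassembly line with its sample count and share of samples.

    samples: iterable of (address, count) pairs, e.g. exported from a trace.
    disassembly_lines: iterable of text lines, one instruction per line.
    """
    counts = Counter()
    for address, count in samples:
        counts[address] += count
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input

    annotated = []
    for line in disassembly_lines:
        match = ADDRESS_RE.match(line)
        hits = 0
        if match:
            address = int(match.group(1).replace("`", ""), 16)
            hits = counts.get(address, 0)
        annotated.append(f"{hits:8d} {100.0 * hits / total:5.1f}%  {line}")
    return annotated
```

Sorting the annotated lines by the first column, or just scanning for the large numbers, makes a hot loop of a handful of instructions stand out immediately.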

I then compiled seventeen different variants (using /MP for parallel compilation), using this command to verify how many CFG entries I was getting:

Finally I measured CreateProcess time of each version with a simple test harness.
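A harness for this kind of measurement can be very small. The sketch below is a stand-in rather than my actual harness: it uses Python's subprocess.Popen, which is implemented on top of CreateProcess on Windows, so the numbers include some Python overhead, and the function names time_process_launch and growth_exponent are hypothetical. The second helper estimates the exponent k in time ~ N^k from two measurements; a value near 2 is what an O(N^2) algorithm would produce.

```python
import math
import statistics
import subprocess
import sys
import time


def time_process_launch(command, runs=5):
    """Return the median time in seconds that Popen takes to start `command`.

    Only the launch call is timed; each child is then waited on so that
    processes do not pile up between runs.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        child = subprocess.Popen(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
        child.wait()
    return statistics.median(durations)


def growth_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


if __name__ == "__main__":
    # Placeholder command: in practice this would be one of the test variants.
    elapsed = time_process_launch([sys.executable, "-c", "pass"])
    print(f"median launch time: {elapsed * 1000:.2f} ms")
```

Running the launch timer against each variant and feeding pairs of results to growth_exponent gives a quick check on whether the cost is growing linearly or quadratically with the number of CFG entries.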

Source: randomascii.wordpress.com