Silver's Weblog - Tag - 'NVidia'
NVidia in Bug-Fixing Shocker
You may remember my previous post about NVidia's (then) latest drivers having some issues, the most serious being the leaking of kernel-space objects.
Since then, I'd upgraded to 91.47 to no avail, but today I upgraded to 93.71 and it is no longer leaking them!
Start NVidia control panel:
lkd> !object fffffadfb4fbc060
Object: fffffadfb4fbc060  Type: (fffffadfb5ab86c0) Process
    ObjectHeader: fffffadfb4fbc030
    HandleCount: 2  PointerCount: 87
Close it again:
lkd> !object fffffadfb4fbc060
fffffadfb4fbc060: Not a valid object (ObjectType invalid)
W00t.
Permalink | Author: Silver | Tags: NVidia | Posted: 11:40PM on Sunday, 03 December, 2006 | Comments: 0
NVidia: 1 point for, 427 against
Last Sunday, I decided to start playing Oblivion again. Turns out I hadn't uninstalled it from last time, so that was easy. Except for the video playback problem - this was a problem originally, clearly hadn't fixed itself, and causes all videos (opening sequence, main menu background, etc.) to appear as static images of apparently uninitialised memory. Which is fun.
Begin Oblivion Video Fixing Sequence, Take 2 (I'd tried before when I first got the game).
- Download and install latest NVidia drivers for Windows XP 64bit (91.31).
- Run Oblivion, and find the videos work!
Well that was shockingly easy. That is the one and only point for NVidia, though.
The first and most obvious downside to updating my NVidia drivers was the new "NVidia Control Panel", which does a good job of not quite matching Explorer in every way. It's got a UI consistency you could only otherwise have found on 1995 shareware, too. Horrible.
The looks of the new control panel 'thing' are not the only problems it has, oooh no. If you're running as a Standard User Account (LUA), as everyone does (right?), it fails to save any of the application-specific 3D settings. Better than that, it looks as if they were saved when they weren't! (The Apply button disappears in a fit of horrible UI design and there's no error message at all.)
Then there is the "NVidia Display Driver Service" (nvsvc64.exe), which sits in the background doing (apparently) nothing except leaking. It was leaking Paged Pool, Non-paged Pool, Commit and Handles earlier, although currently it only seems to be leaking Non-paged Pool and Handles. The 3 memory values were leaking at a combined rate of (approximately) 1.8MB/hour, and the handles at (approximately) 1700/hour. Yummy.
Finally, we come to the actual driver itself. The main deal. Which leaks entire processes through a really bizarre bug.
For this to make sense, I'll explain a few simple facts about the Windows Kernel:
- It has an Object Manager that tracks all objects in kernel-space and user-space.
- All objects have a "Handle Count" and "Pointer Count" - the former is for (obviously) any open handles to the object, which is mostly for user-space code, and the latter is for kernel code that simply has a pointer (it's a reference counter).
- When both counts reach zero, non-permanent (i.e. most) objects are removed and cleaned up.
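The three facts above can be sketched as a toy model in Python. This is only an illustration, not how the kernel is implemented: the class name `KernelObject` and its methods are made up for the example, and it assumes that every open handle also contributes one pointer reference.

```python
class KernelObject:
    """Toy stand-in for an object tracked by the Object Manager (hypothetical)."""

    def __init__(self, name, permanent=False):
        self.name = name
        self.permanent = permanent
        self.handle_count = 0
        self.pointer_count = 0
        self.deleted = False

    def open_handle(self):
        # Assumption: a handle is also counted as a pointer reference.
        self.handle_count += 1
        self.pointer_count += 1

    def close_handle(self):
        self.handle_count -= 1
        self.pointer_count -= 1
        self._maybe_delete()

    def reference(self):
        # Kernel code taking a plain pointer to the object.
        self.pointer_count += 1

    def dereference(self):
        self.pointer_count -= 1
        self._maybe_delete()

    def _maybe_delete(self):
        # Non-permanent objects go away once both counts hit zero.
        if (not self.permanent
                and self.handle_count == 0
                and self.pointer_count == 0):
            self.deleted = True


proc = KernelObject("nvcplui.exe")
proc.open_handle()   # user-space opens a handle
proc.reference()     # kernel code takes a pointer
proc.close_handle()  # user-space is done with it
print(proc.deleted)  # False: the kernel reference is still outstanding
proc.dereference()
print(proc.deleted)  # True: both counts are now zero
```

The point to take away is that closing every handle is not enough on its own; the object only goes away once the last pointer reference is dropped as well.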
When you start a new process, naturally there enters into existence a kernel "Process" object (along with all the shenanigans that go with that). I started the NVidia Control Panel for this test.
lkd> !process fffffadfb2fe2750 1
PROCESS fffffadfb2fe2750
    SessionId: 0  Cid: 14c4  Peb: 7fffffd4000  ParentCid: 0230
    DirBase: 9546c000  ObjectTable: fffffa80009c0580  HandleCount: 189.
    Image: nvcplui.exe
    VadRoot fffffadfb1996b30 Vads 202 Clone 0 Private 4739. Modified 240. Locked 0.
    DeviceMap fffffa800249dc10
    Token                             fffffa80077cbcf0
    ElapsedTime                       00:00:48.515
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         1287904
    QuotaPoolUsage[NonPagedPool]      16720
    Working Set Sizes (now,min,max)   (8402, 50, 345) (33608KB, 200KB, 1380KB)
    PeakWorkingSetSize                8698
    VirtualSize                       657 Mb
    PeakVirtualSize                   658 Mb
    PageFaultCount                    15774
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      5247

lkd> !object fffffadfb2fe2750
Object: fffffadfb2fe2750  Type: (fffffadfb5ab86c0) Process
    ObjectHeader: fffffadfb2fe2720
    HandleCount: 2  PointerCount: 74
Most of the above is not too important, but the Image: and two counts from !object are - notice it starts with 2 handles and 74 pointers (2 of which will be the 2 handles). These are all constant while I look around the control panel. Then I go to the "Adjust image settings with preview" page, which has a real live 3D animation. Big mistake! Only moments after going to it, the object has:
HandleCount: 2 PointerCount: 2029
And it keeps going up, even after switching to another view! It was going up at something like 1000 pointers/second, although I don't have timestamps for my debugging log. By the time I closed the application, it was:
HandleCount: 0 PointerCount: 21516
Notice that there are no handles - nothing in user-space cares about it any more. There are still over 21,000 pointers to it in kernel-space, though. Or so the Object Manager is led to believe. One last look at the process object in detail gives:
lkd> !process fffffadfb2fe2750 1
PROCESS fffffadfb2fe2750
    SessionId: 0  Cid: 14c4  Peb: 7fffffd4000  ParentCid: 0230
    DirBase: 9546c000  ObjectTable: 00000000  HandleCount: 0.
    Image: nvcplui.exe
    VadRoot 0000000000000000 Vads 0 Clone 0 Private 253. Modified 769. Locked 0.
    DeviceMap fffffa800249dc10
    Token                             fffffa80077cbcf0
    ElapsedTime                       00:01:31.953
    UserTime                          00:00:11.593
    KernelTime                        00:00:02.734
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)   (6, 50, 345) (24KB, 200KB, 1380KB)
    PeakWorkingSetSize                11123
    VirtualSize                       80 Mb
    PeakVirtualSize                   670 Mb
    PageFaultCount                    27942
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      0
Interesting points on this are that the ObjectTable, VadRoot and CommitCharge are now all zero. This means that the process' virtual address space has been cleaned up entirely. The process is not even in the session process table (list of processes for the logged in session), although it is in the overall kernel process table (which nothing in user-space can see - Task Manager can't see it).
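If you save the debugger text to a log, the "all three are zero" check can be automated. The following is a rough Python sketch: the function names `parse_fields` and `address_space_torn_down` are invented for the example, the regular expressions assume the field layout seen in the two dumps in this post, and the sample strings are abbreviated excerpts of those dumps.

```python
import re

def parse_fields(dump):
    """Pull ObjectTable, VadRoot and CommitCharge out of !process text."""
    patterns = {
        "ObjectTable": r"ObjectTable:\s*([0-9a-fA-F]+)",
        "VadRoot": r"VadRoot\s+([0-9a-fA-F]+)",
        "CommitCharge": r"CommitCharge\s+(\d+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, dump)
        if match is None:
            raise ValueError(f"field {name} not found")
        # The two addresses are printed in hex, CommitCharge in decimal.
        base = 10 if name == "CommitCharge" else 16
        fields[name] = int(match.group(1), base)
    return fields

def address_space_torn_down(dump):
    """True when all three fields are zero, as in the second dump."""
    return all(value == 0 for value in parse_fields(dump).values())

# Abbreviated excerpts of the two dumps in this post.
running = (
    "DirBase: 9546c000 ObjectTable: fffffa80009c0580 HandleCount: 189.\n"
    "VadRoot fffffadfb1996b30 Vads 202 Clone 0 Private 4739.\n"
    "CommitCharge 5247\n"
)
closed = (
    "DirBase: 9546c000 ObjectTable: 00000000 HandleCount: 0.\n"
    "VadRoot 0000000000000000 Vads 0 Clone 0 Private 253.\n"
    "CommitCharge 0\n"
)

print(address_space_torn_down(running))  # False
print(address_space_torn_down(closed))   # True
```

A process that passes this check but still shows up under !object with a non-zero PointerCount is a candidate for the kind of leak described here; the check by itself says nothing about who holds the references.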
So what's happened? Almost certainly, a driver (most likely the NVidia one, since this only happens with applications that use 3D acceleration and only since the driver upgrade) is adding a reference to the process it is handling but never releasing it, thus leaking hundreds of reference counts (there are unlikely to be any actual leaked pointers). A few of my Oblivion processes have PointerCounts of over 3 million.
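To make the suspected bug concrete, here is a small Python simulation of a balanced versus an unbalanced reference pattern. It is a guess at the shape of the problem, not the driver's actual code: the function `simulate` is hypothetical, and "one reference per rendered frame" is an assumption made purely for illustration.

```python
def simulate(frames, release_each_frame):
    """Return (handle_count, pointer_count) after the application exits.

    Hypothetical model: the driver takes one reference on the process
    object per rendered frame; a well-behaved driver drops it again.
    """
    handle_count = 2   # two open handles, as in the !object output
    pointer_count = 2  # only the references backing those handles
    for _ in range(frames):
        pointer_count += 1      # driver references the process
        if release_each_frame:
            pointer_count -= 1  # matching dereference
    # The application closes: both handles go, each dropping a reference.
    pointer_count -= handle_count
    handle_count = 0
    return handle_count, pointer_count

print(simulate(1000, release_each_frame=True))   # (0, 0): object can be freed
print(simulate(1000, release_each_frame=False))  # (0, 1000): object is stuck
```

In the unbalanced case the handle count reaches zero just as it should, which is why nothing in user-space notices; only the pointer count gives the game away.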
Excellent work, NVidia. You've managed to leak in such a special way that no-one will even notice. Except me and my wonderful friend windbg.
Permalink | Author: Silver | Tags: NVidia | Posted: 03:30AM on Friday, 01 September, 2006 | Comments: 0
Powered by the Content Parser System, copyright 2002 - 2007 James G. Ross.