A key advantage of Intel HD graphics is its built-in threading support. Cryptic had been aware of the need to efficiently thread important processing tasks for quite a while.
“We started threading systems with City of Villains,” Esser said. “Since then, we have improved upon that effort. Our primary breakdown is to use a thread for game logic, which sometimes goes out to all available cores, and a thread for game rendering. It’s split fairly evenly; we get a good boost. Some of our other systems are data parallel; the animation system is a good example. The software occlusion system we used also splits into any number of cores.”
Cryptic anticipates that the data model will scale to n cores. “For code complexity reasons, we’re going to go data parallel,” Esser continued. “Your rendering has to be referenced to a single core in DirectX* 9. There is a single thread going into DirectX, so if we have to queue up 500 things to draw, they queue up in a single thread.
“Newer versions of DirectX have a feature that lets you do that queuing in multiple threads. With that, we could make our rendering thread more data parallel, and it would work across more CPUs. The more we move onto the GPU, the less that’s going on in our DirectX thread. And that’s the way things are moving. The more on the GPU, the fewer DirectX calls.”
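The data-parallel split Esser describes for systems such as animation can be sketched roughly as follows. This is a minimal illustration in standard C++, not Cryptic's code: the `AnimState` type and the `updateAnimationsParallel` function are hypothetical names, and a production engine would more likely use a persistent job system than spawn threads every frame.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-character animation state (not from Cryptic's engine).
struct AnimState {
    float time = 0.0f;   // seconds into the current clip
    float speed = 1.0f;  // playback rate
};

// Advance every animation by dt. The array is cut into contiguous chunks,
// one per worker thread. Each thread writes only to its own slice, so no
// locking is needed and the same code scales to any number of cores.
void updateAnimationsParallel(std::vector<AnimState>& anims, float dt,
                              unsigned workerCount) {
    if (anims.empty()) return;
    workerCount = std::max(1u, workerCount);
    const std::size_t chunk = (anims.size() + workerCount - 1) / workerCount;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < anims.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, anims.size());
        workers.emplace_back([&anims, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                anims[i].time += anims[i].speed * dt;
            }
        });
    }
    // Wait for every slice to finish before the frame continues.
    for (std::thread& t : workers) t.join();
}
```

A caller might pass `std::thread::hardware_concurrency()` as `workerCount`; the function treats a value of 0 as one worker.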
Ivan Sulic, marketing communications manager at Cryptic, thinks that working closely with Intel was critical in making the development of an intriguing online platform title such as Star Trek Online possible. “From my perspective, it’s good to have development partners like Intel,” Sulic said. “The PC is the largest install base in the world, and we want to support as many graphics chipsets on the PC as we can. So adding support for Intel HD graphics was a no-brainer because each generation has been faster than the last. A good percentage of our users have Intel chips; and if the users have it, you better support it if you want more customers.”
About The Author :
Garret Romaine is a long-time journalist and technical writer hailing from Portland, Oregon. He started in the gaming world as a beta tester for Epic MegaGames in the 1990s, working on titles such as Unreal, One Must Fall 2097, and Jazz Jackrabbit 2. Later, he contributed to ESCMAG.com, specializing in real-time strategy games such as Dune II and Command & Conquer. Currently, Garret provides insights on the latest technology through RHM Inc.
Tips For Optimizing For Intel HD Graphics :
The latest generation of Intel integrated graphics is called Intel® HD Graphics, introduced in 2010 with the Westmere family of processors in the CPU socket. As you would expect, the new integrated graphics cores provide more capabilities and better performance, but only when optimized correctly.
Over the course of several projects, Intel software engineers have blocked out a set of general guidelines to follow as best practices. The following suggestions, tips, and tricks can help you produce smooth, eye-pleasing frames that ensure your end-users have a great experience with interactive 3D graphics applications on Intel HD graphics.
1. Efficiently Keep Compute Units Busy to Avoid Stalling the Pipeline
First, avoid any scenario where the CPU can stall waiting for the command (CMD) buffer to empty. For example, the Intel® graphics driver stores asynchronous D3D calls in the CMD buffer. If the application issues CopyResource/CopySubresourceRegion, that call is queued in the CMD buffer. If the application then tries to Map() the resource that was the target of the Copy call, the CMD buffer gets flushed, and the CPU waits until the copy completes. The application (CPU) will then begin accessing the resource while the graphics engine (GE) sits idle. This effect is known as stuttering. To avoid stuttering, implement frame rate smoothing. The best solution is to call CopyResource at Frame N, which executes at Frame N+1. The copy should be finished when the application is processing Frame N+2. Also, put some time between locks by synchronizing on Frame N-2.
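The N-2 pattern can be illustrated with a small CPU-only simulation. This is a sketch under stated assumptions, not driver or D3D code: the `ReadbackRing` class and its methods are hypothetical, `issueCopy` stands in for CopyResource into a staging resource, and `readOldest` stands in for Map() on the staging resource that was filled two frames earlier.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// CPU-only stand-in for a ring of three staging resources (hypothetical).
class ReadbackRing {
    struct Slot {
        std::uint64_t frame;  // frame on which the copy was issued
        int value;            // payload standing in for the copied data
        bool valid;           // false until the slot has been written once
    };

public:
    static constexpr std::uint64_t kLatency = 2;         // read Frame N-2
    static constexpr std::size_t kSlots = kLatency + 1;  // three slots

    // Frame N: queue a copy into this frame's slot (stands in for CopyResource).
    void issueCopy(std::uint64_t frame, int value) {
        slots_[frame % kSlots] = Slot{frame, value, true};
    }

    // Frame N: read the copy issued at Frame N-2 (stands in for Map()).
    // Returns an empty optional during the first two frames.
    std::optional<int> readOldest(std::uint64_t frame) const {
        if (frame < kLatency) return std::nullopt;
        const Slot& slot = slots_[(frame - kLatency) % kSlots];
        if (!slot.valid || slot.frame != frame - kLatency) return std::nullopt;
        return slot.value;
    }

private:
    std::array<Slot, kSlots> slots_{};  // value-initialized: all slots invalid
};
```

Because the slot read on Frame N is never the slot written on Frame N, the CPU only touches a resource whose copy was queued two frames ago, which is the situation the tip aims for.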
2. Optimize Shader Performance
For best execution-unit (EU) utilization, minimize register usage. Large shaders affect performance when register usage is limited, so mask alpha when not needed. Minimize the use of transcendentals, such as LOG, POW, and EXP. Space out must-have operations where possible. Also, pre-load shaders to avoid mid-scene compiles and mid-scene texture changes. Finally, minimize geometry shader usage, and experiment with texture sampling calls.
3. Minimize Runtime and Driver Overhead
For DirectX* 9, the Intel graphics driver optimizes the most frequently used constants. Avoid global constants, if possible. Also, limit dynamically indexed constants (C[a0], C[r]) in DirectX 9/10. In DirectX 10, when a constant changes, the complete buffer gets updated. So group cbuffers by frequency of updates and organize cbuffers based on feature scaling. Pack data into float4 boundaries. Ideally, use large batches (that is, >200–1000 primitives). Minimize the number of state changes between batches, and optimize the number of draw calls per frame. The higher the number of draw calls, the more likely your code will be CPU-limited. If small batches are needed, use instancing for higher performance.
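The benefit of grouping cbuffers by update frequency can be estimated with a short sketch. The C++ structs below are hypothetical CPU-side mirrors of shader constant buffers (the member names are illustrative only), and the two helper functions simply count bytes under the assumption stated above: changing any constant re-uploads the whole buffer that contains it.

```cpp
#include <cstddef>

struct Float4   { float x, y, z, w; };  // 16 bytes, one float4 register
struct Float4x4 { Float4 rows[4]; };    // 64 bytes

// Updated once per frame.
struct alignas(16) PerFrameConstants {
    Float4x4 viewProjection;
    Float4   lightDirAndIntensity;  // xyz = direction, w = intensity (packed)
};

// Updated once per draw call.
struct alignas(16) PerObjectConstants {
    Float4x4 world;
    Float4   tint;
};

// Anti-pattern for comparison: everything in a single buffer.
struct alignas(16) MonolithicConstants {
    PerFrameConstants  frame;
    PerObjectConstants object;
};

// Each mirror should end on a float4 (16-byte) boundary.
static_assert(sizeof(PerFrameConstants) % 16 == 0, "pad to float4");
static_assert(sizeof(PerObjectConstants) % 16 == 0, "pad to float4");

// Bytes uploaded per frame when every object update rewrites the big buffer.
constexpr std::size_t bytesUploadedMonolithic(std::size_t objectCount) {
    return objectCount * sizeof(MonolithicConstants);
}

// Bytes uploaded per frame when per-frame data is written once and only the
// small per-object buffer is rewritten for each object.
constexpr std::size_t bytesUploadedSplit(std::size_t objectCount) {
    return sizeof(PerFrameConstants) + objectCount * sizeof(PerObjectConstants);
}
```

With these layouts, the split arrangement uploads roughly half as many bytes per frame once more than a handful of objects are drawn, because the per-frame matrix and light data are no longer resent with every object.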
4. Scale Visual Effects for Performance
Full-screen visual effects are heavy on computes, so watch for per-pixel postprocessing that requires multiple passes. Balance visual quality with speed by reducing the complexity of the shaders and/or the number of passes. Optimize effects such as glow/bloom, depth of field, motion blur, high dynamic range (HDR) tone mapping, heat distortion, atmospheric effects, and dynamic ambient occlusion.
5. Skip Computes That Don’t Render
Reduce the level of detail (LOD) resolution for distant objects. Reject objects outside the view frustum by doing a visibility check. Cull objects using occlusion query for complex scenes. Also, maximize use of Hi-Z and Early-Z, and render front to back, if possible.
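The visibility check and front-to-back ordering can be sketched as follows. This is a simplified illustration, assuming bounding spheres and a frustum given as inward-facing planes; the `Object`, `Plane`, and `visibleFrontToBack` names are hypothetical and not part of any Intel or DirectX API.

```cpp
#include <algorithm>
#include <vector>

struct Vec3   { float x, y, z; };
struct Plane  { Vec3 normal; float d; };  // inside when dot(normal, p) + d >= 0
struct Object { int id; Vec3 center; float radius; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

float distanceSquared(const Vec3& a, const Vec3& b) {
    const Vec3 d{a.x - b.x, a.y - b.y, a.z - b.z};
    return dot(d, d);
}

// A bounding sphere is rejected as soon as it lies fully behind any plane.
bool sphereInFrustum(const Object& o, const std::vector<Plane>& planes) {
    for (const Plane& p : planes) {
        if (dot(p.normal, o.center) + p.d < -o.radius) return false;
    }
    return true;
}

// Keep only potentially visible objects, nearest first, so that early depth
// rejection can discard hidden pixels of the farther objects.
std::vector<Object> visibleFrontToBack(const std::vector<Object>& objects,
                                       const std::vector<Plane>& frustum,
                                       const Vec3& eye) {
    std::vector<Object> visible;
    for (const Object& o : objects) {
        if (sphereInFrustum(o, frustum)) visible.push_back(o);
    }
    std::sort(visible.begin(), visible.end(),
              [&eye](const Object& a, const Object& b) {
                  return distanceSquared(a.center, eye) <
                         distanceSquared(b.center, eye);
              });
    return visible;
}
```

The sphere test is conservative: an object near a frustum corner may be kept even though it is just outside, which costs a little extra drawing but never removes something visible.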
6. Optimize Pixel/Texel Operations
Minimize use of multiple render targets (MRT). Avoid proprietary texture formats or formats outside of the DirectX specification. Balance texture load instructions with arithmetic instructions, if possible. Reduce the number of texture fetches for low-fidelity modes, minimize the use of large textures, and use compressed textures with mip-maps. Implement shadows as a scalable feature, and clear the color, stencil, and Z-buffers in the same API call.
Sample & Credit :
This article is from Intel Visual Adrenaline magazine, issue 7.