
[Other] T&I Engine: Traversal and Intersection Engine for Hardware Accelerated Ray Traci

Original poster
Posted on 2011-12-28 10:24:54
T&I Engine: Traversal and Intersection Engine for Hardware Accelerated Ray Tracing

Jae-Ho Nah, Jeong-Soo Park, Chanmin Park†, Jin-Woo Kim, Yun-Hye Jung, Woo-Chan Park‡, Tack-Don Han

Yonsei University, Korea; †Samsung Electronics, Korea; ‡Sejong University, Korea

Abstract

Ray tracing naturally supports high-quality global illumination effects, but it is computationally costly. Traversal and intersection operations dominate the computation of ray tracing. To accelerate these two operations, we propose a hardware architecture integrating three novel approaches. First, we present an ordered depth-first layout and a traversal architecture using this layout to reduce the required memory bandwidth. Second, we propose a three-phase ray-triangle intersection architecture that takes advantage of early exit. Third, we propose a latency hiding architecture defined as the ray accumulation unit. Cycle-accurate simulation results indicate our architecture can achieve interactive distributed ray tracing.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Ray tracing

Keywords: ray tracing, ray tracing hardware, global illumination

1 Introduction

Ray tracing [Whitted 1980; Cook et al. 1984] is the most commonly used algorithm for photorealistic rendering. Ray tracing generates a more realistic image than rasterization does, but it requires tremendous computational power for traversal and ray-primitive intersections. For this reason, it has been used for offline rendering for most of the last decade.

For real-time ray tracing, many approaches utilizing CPUs, GPUs, or custom hardware have recently been studied. These approaches do not yet provide sufficient performance to process the 1 G rays/s needed for real-time distributed ray tracing [Govindaraju et al. 2008].

Most performance bottlenecks in ray tracing are in traversal and intersection tests [Benthin 2006]. Traversal is the process of searching an acceleration structure (AS), such as a kd-tree or bounding volume hierarchy (BVH), to find a small subset of the primitives to test against the ray. A ray-primitive intersection test determines the visibility of primitives found during the traversal.

We believe a dedicated hardware unit for traversal and the intersection test is a suitable solution for real-time distributed ray tracing. In this paper, we present a custom hardware architecture called the T&I (traversal and intersection) engine. This architecture can be integrated with existing programmable shaders, as with raster operations pipelines (ROPs) or texture mapping units. It also comprises three novel approaches that are applicable to the traversal and intersection test processes.

First, an ordered depth-first layout (ODFL) and its traversal architecture are presented. The ODFL is an enhancement of an eight-byte kd-tree node layout [Pharr and Humphreys 2010]. It places the child node that has a larger surface area than its sibling adjacent to its parent to improve parent-child locality. We apply this layout to our traversal architecture to effectively reduce the miss rate of the traversal cache. The ODFL can also be easily applied to other CPU or GPU ray tracers. This concept was previously announced in the extended abstract [Nah et al. 2010].
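The layout rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hardware node format from the paper: the `Node` and `FlatNode` classes, the `far_child` index, and the `adjacent_is_left` flag are hypothetical names introduced here. The only idea taken from the text is that the child with the larger surface area is written immediately after its parent in a depth-first array.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def surface_area(box: Box) -> float:
    """Surface area of an axis-aligned box given as (min corner, max corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


@dataclass
class Node:
    """Pointer-based kd-tree node (hypothetical input format)."""
    name: str
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None


@dataclass
class FlatNode:
    """Array entry: the adjacent child sits at index + 1 (hypothetical format)."""
    name: str
    far_child: int = -1            # index of the non-adjacent child, -1 for a leaf
    adjacent_is_left: bool = True  # which child was placed next to the parent


def odfl_flatten(root: Node) -> List[FlatNode]:
    """Depth-first flattening that visits the larger-surface-area child first."""
    flat: List[FlatNode] = []

    def visit(node: Node) -> int:
        index = len(flat)
        flat.append(FlatNode(node.name))
        if node.left is None or node.right is None:
            return index  # leaf node: nothing more to place
        left_first = surface_area(node.left.box) >= surface_area(node.right.box)
        near, far = (node.left, node.right) if left_first else (node.right, node.left)
        flat[index].adjacent_is_left = left_first
        visit(near)                           # lands at index + 1
        flat[index].far_child = visit(far)    # placed after the whole near subtree
        return index

    visit(root)
    return flat


# Small example: the right child is larger at both inner nodes.
example = Node("root", ((0, 0, 0), (10, 1, 1)),
               left=Node("L", ((0, 0, 0), (3, 1, 1))),
               right=Node("R", ((3, 0, 0), (10, 1, 1)),
                          left=Node("RL", ((3, 0, 0), (4, 1, 1))),
                          right=Node("RR", ((4, 0, 0), (10, 1, 1)))))
example_order = [n.name for n in odfl_flatten(example)]
# example_order == ['root', 'R', 'RR', 'RL', 'L']
```

A ray is more likely to enter the child with the larger surface area, so placing that child next to its parent raises the chance that both nodes share a cache line.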

Second, we propose a three-phase intersection test unit, which divides the intersection test stage into three phases. Phase 1 is the ray-plane test, Phase 2 is the barycentric coordinate test, and Phase 3 is the final hit point calculation. This configuration avoids further computation and memory requests for triangles that are identified as misses in either Phase 1 or Phase 2. Phases 1 and 2 are performed in a common module because they use roughly the same arithmetic operations.
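The three phases can be illustrated with a plain software sketch. It assumes a textbook plane-then-barycentric formulation; the function name `intersect_three_phase`, its return convention, and the exact split of arithmetic between phases are assumptions made for illustration and may differ from the hardware unit.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_three_phase(orig: Vec, direction: Vec, v0: Vec, v1: Vec, v2: Vec,
                          t_max: float = float("inf"), eps: float = 1e-9
                          ) -> Tuple[int, Optional[Tuple[float, float, float, Vec]]]:
    """Return (phase at which the test ended, hit record or None).

    The hit record is (t, u, v, hit_point) with P = v0 + u*(v1-v0) + v*(v2-v0).
    """
    e1, e2 = sub(v1, v0), sub(v2, v0)
    n = cross(e1, e2)

    # Phase 1: ray-plane test. Exit early if the ray is parallel to the
    # triangle plane or the plane is hit outside the valid distance range.
    denom = dot(n, direction)
    if abs(denom) < eps:
        return 1, None
    t = dot(n, sub(v0, orig)) / denom
    if t <= eps or t >= t_max:
        return 1, None

    # Phase 2: barycentric coordinate test on the plane hit position.
    w = (orig[0] + t * direction[0] - v0[0],
         orig[1] + t * direction[1] - v0[1],
         orig[2] + t * direction[2] - v0[2])
    nn = dot(n, n)
    u = dot(cross(w, e2), n) / nn
    v = dot(cross(e1, w), n) / nn
    if u < 0.0 or v < 0.0 or u + v > 1.0:
        return 2, None

    # Phase 3: final hit point calculation, reached only by real hits.
    hit_point = (orig[0] + t * direction[0],
                 orig[1] + t * direction[1],
                 orig[2] + t * direction[2])
    return 3, (t, u, v, hit_point)
```

For the unit triangle `(0,0,0)`, `(1,0,0)`, `(0,1,0)`, a ray travelling along the x axis above the triangle ends in Phase 1, a downward ray that hits the plane at `(2,2,0)` ends in Phase 2, and only a ray that lands inside the triangle pays for Phase 3.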

Third, a ray accumulation unit is proposed for hiding memory latency. This unit manages memory requests and accumulates rays that induce a cache miss. While the missed block is being fetched, other rays can perform their operations. When the missed block arrives, the accumulated rays are flushed to the pipeline.
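The behaviour described above can be modelled with a small bookkeeping class. This is a minimal sketch under simplifying assumptions: the class name `RayAccumulationUnit`, its methods, and the unbounded buffers are hypothetical, and cache eviction and timing are not modelled.

```python
from typing import Dict, List, Set


class RayAccumulationUnit:
    """Toy model: park rays that miss the cache until their block arrives."""

    def __init__(self) -> None:
        self.resident: Set[int] = set()          # blocks currently in the cache
        self.waiting: Dict[int, List[int]] = {}  # missed block -> parked ray ids
        self.requests_issued: List[int] = []     # memory requests, in issue order

    def access(self, ray_id: int, block: int) -> bool:
        """Return True on a cache hit; on a miss, park the ray and return False."""
        if block in self.resident:
            return True
        if block not in self.waiting:
            # First miss on this block: issue exactly one memory request.
            self.waiting[block] = []
            self.requests_issued.append(block)
        # Later misses on the same block only accumulate the ray.
        self.waiting[block].append(ray_id)
        return False

    def fill(self, block: int) -> List[int]:
        """The block arrived: mark it resident and flush its parked rays."""
        self.resident.add(block)
        return self.waiting.pop(block, [])


rau = RayAccumulationUnit()
rau.access(0, 7)          # miss, request for block 7 issued
rau.access(1, 7)          # miss on the same block, no second request
rau.access(2, 9)          # miss on another block
flushed = rau.fill(7)     # rays 0 and 1 return to the pipeline together
```

Rays that miss on the same block share a single memory request, and the pipeline keeps accepting other rays while that request is outstanding.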

We verify the performance of our architecture with a cycle-accurate simulator and evaluate resource requirements and performance. We also perform a simulation with three types of rays that have different coherence. The proposed architecture achieves 44-1188 Mrays/s ray tracing performance at 500 MHz on a 65 nm process.

The remainder of this paper is structured as follows. Section 2 describes related work. Section 3 gives an overview of the proposed architecture. In Sections 4 to 6, we cover the details of our three approaches (a traversal unit with the ODFL, a three-phase intersection test unit, and a ray accumulation unit). In Section 7, we describe the experimental results of the proposed architecture simulation. Finally, we conclude the paper in Section 8.

2 Related Work

2.1 Dedicated ray tracing hardware

SaarCOR [Schmittler et al. 2004] is a ray tracing pipeline that consists of a ray generation/shading unit, a 4-wide SIMD traversal unit, a list unit, a transformation unit, and an intersection test unit. Woop et al. [2005] presented the programmable RPU architecture, which performs ray generation, shading, and intersection tests with programmable shaders. For dynamic scenes, D-RPU [Woop et al. 2006a; Woop 2007] adds a node update unit [Woop et al. 2006b] that RPU lacks. RTE [Davidovic et al. 2011] is an optimized version of D-RPU that uses tail recursive shaders with treelets.



For the full text, please download the attachment.

2#
Posted on 2012-2-6 23:31:09
Passing by again...

3#
Posted on 2012-2-28 23:22:01
The late 蝶随风舞 once said: press CTRL+A now and then and you will find the world has another side.

4#
Posted on 2012-3-10 23:24:22
Bumping this.

5#
Posted on 2012-3-11 23:24:53
Really nice, I saved the whole thing.

6#
Posted on 2012-3-14 23:25:46
I'll add my support too.

7#
Posted on 2012-3-24 23:25:26
Learned a lot from this.

tc    


8#
Posted on 2012-4-18 23:18:04
A featured, highlighted thread like this one especially deserves plenty of bumps.

9#
Posted on 2012-5-17 23:28:23
To be honest, I rarely use the things the original poster describes!

tc    


10#
Posted on 2012-6-3 23:24:33
You've posted so many that I don't know which one to reply to, haha.