[Other] Heterogeneous Particle-based Simulation

1# (original poster), posted 2012-1-4 17:11:08
1 Introduction

Particle-based simulations have been used to simulate granular materials,
fluids, and rigid bodies. To achieve realistic behavior, a large number of
particles has to be simulated. Particle-based simulations are well suited to
GPUs because the computation for each particle is almost the same (i.e., the
granularity of the computation is uniform over the particles), which is
preferable for GPUs with a wide SIMD architecture. However, particle-based
simulation on the GPU has mostly been restricted to particles of identical
size [Harada et al. 2007]. This is because the work granularity is non-uniform
if there are particles with different radii, which leads to inefficient use of
the GPU. Heterogeneous CPU/GPU architectures, such as AMD Fusion® APUs, can run
this kind of simulation efficiently by using the CPU and the GPU at the same
time. On a PC with a CPU and a discrete GPU, whenever a computation is
dispatched to the GPU, the data has to be sent over the PCI Express® bus,
which introduces latency. Heterogeneous architectures, in contrast, have a CPU
and a GPU on the same die with tightly coupled shared memory, so the same
memory space can be accessed from the GPU and the CPU without any copying,
which facilitates tight collaboration between the two processors. In this
paper, we describe a particle-based simulation with particles of various sizes
running on a heterogeneous architecture, dispatching work to the GPU and the
CPU according to its granularity and processing it on both simultaneously.

2 Method

The simulation we developed maximizes the use of all the available resources
of a heterogeneous architecture by performing computation concurrently on both
the CPU and the GPU. The simulation uses one CPU thread for dispatching work to
the GPU (the GPU control thread) and multiple CPU threads for computation (CPU
computation threads), whereas an application using only the GPU uses a single
CPU thread. The target architecture was an AMD A-Series APU with four CPU cores
and a GPU. Our implementation for this architecture used the GPU, a GPU control
thread, and three CPU computation threads. For simplicity, we first describe a
simulation using the GPU and two CPU threads (a GPU control thread and a CPU
computation thread). Then we describe how to scale the simulation to the GPU, a
GPU control thread, and multiple CPU computation threads.

e-mail: takahiro.harada@amd.com

[Figure 2 diagram. Labels recovered from the image: GPU, CPU, build
acceleration structure, SS collision, S integration, L integration, LL
collision, synchronization, position, velocity, grid, force, LS collision.]

Figure 2: A step of the simulation using two CPU threads.
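
As a rough illustration of the thread layout described above, here is a minimal Python sketch with one control thread and several computation threads. It is not the authors' implementation: the GPU is replaced by an ordinary function (`gpu_work`), and the names `run_step`, `gpu_control`, `cpu_worker` and the round-robin split of CPU tasks are all assumptions made for the example.

```python
import threading

def run_step(gpu_work, cpu_tasks, num_cpu_threads=3):
    """Run one step with a GPU control thread and several CPU computation threads."""
    results = {"gpu": None, "cpu": [None] * len(cpu_tasks)}

    def gpu_control():
        # Hypothetical stand-in for dispatching kernels to the GPU and waiting on them.
        results["gpu"] = gpu_work()

    def cpu_worker(worker_id):
        # Assumed static round-robin split: worker w takes tasks w, w + n, w + 2n, ...
        for i in range(worker_id, len(cpu_tasks), num_cpu_threads):
            results["cpu"][i] = cpu_tasks[i]()

    threads = [threading.Thread(target=gpu_control)]
    threads += [threading.Thread(target=cpu_worker, args=(w,))
                for w in range(num_cpu_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # joining all threads serves as the end-of-step synchronization
    return results

# Usage: the "GPU" sums a range while the CPU workers square four numbers.
out = run_step(lambda: sum(range(1000)),
               [lambda r=r: r * r for r in (1, 2, 3, 4)])
```

Each worker writes only to its own slots of the result list, so the sketch needs no locking; a real implementation would synchronize on shared particle buffers instead.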

2.1 Simulation using the GPU, a GPU control thread and a CPU computation thread

A simulation with particles of various sizes, as shown in Fig. 1, is a coupling
of two simulations: a simulation with identical-sized particles (the small
particles, colored blue) and a simulation with varying-sized particles (the
large particles, colored red and green). If the interaction between large and
small particles is not considered, the simulation of small particles has a
uniform work granularity, so it is well suited to the GPU. The CPU, on the
other hand, is a better choice for the large particles because the granularity
of their simulation is not uniform. Therefore, the small-particle and
large-particle simulations are performed on the GPU and the CPU, respectively,
and the two run concurrently.
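
The split into two simulations can be sketched as a simple partition by radius. This is only an illustration under assumptions: the `Particle` class, the `split_by_radius` helper, and the tolerance test are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass
class Particle:
    x: float
    y: float
    radius: float

def split_by_radius(particles, small_radius, tol=1e-9):
    """Separate the identical-sized (small) particles from the varying-sized ones."""
    small, large = [], []
    for p in particles:
        # Particles matching the common small radius go to the uniform (GPU) set;
        # everything else goes to the non-uniform (CPU) set.
        if abs(p.radius - small_radius) <= tol:
            small.append(p)
        else:
            large.append(p)
    return small, large

particles = [
    Particle(0.0, 0.0, 0.1),
    Particle(1.0, 0.0, 0.1),
    Particle(2.0, 1.0, 0.5),
    Particle(3.0, 2.0, 0.8),
]
small, large = split_by_radius(particles, small_radius=0.1)
```

Here `small` would be handed to the GPU simulation and `large` to the CPU simulation.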

A simulation step consists of three stages: building an acceleration structure,
collision, and integration. Brute-force collision computation is prohibitively
expensive when there are a large number of particles, so an acceleration
structure is built to make collision detection more efficient. In the collision
stage, colliding particles are searched for and repulsion forces are
calculated. The integration stage updates particle velocities and positions.
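
The three stages can be sketched for identical-sized particles in 2D as follows. The excerpt does not specify the force model or the integrator, so the linear spring repulsion, the semi-implicit Euler update, the dictionary-based uniform grid, and all function and parameter names below are assumptions for illustration only.

```python
import math
from collections import defaultdict

def build_grid(positions, cell_size):
    """Stage 1: bin each particle index into a uniform-grid cell."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
    return grid

def collide(positions, radius, grid, cell_size, stiffness=100.0):
    """Stage 2: search the 3x3 block of neighboring cells, accumulate repulsion."""
    forces = [[0.0, 0.0] for _ in positions]
    for i, (x, y) in enumerate(positions):
        cx, cy = math.floor(x / cell_size), math.floor(y / cell_size)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j == i:
                        continue
                    rx, ry = x - positions[j][0], y - positions[j][1]
                    dist = math.hypot(rx, ry)
                    overlap = 2.0 * radius - dist
                    if overlap > 0.0 and dist > 0.0:
                        # Assumed linear spring pushing particle i away from j.
                        forces[i][0] += stiffness * overlap * rx / dist
                        forces[i][1] += stiffness * overlap * ry / dist
    return forces

def integrate(positions, velocities, forces, dt, mass=1.0):
    """Stage 3: update velocities, then positions with the new velocities."""
    for i in range(len(positions)):
        velocities[i][0] += dt * forces[i][0] / mass
        velocities[i][1] += dt * forces[i][1] / mass
        positions[i][0] += dt * velocities[i][0]
        positions[i][1] += dt * velocities[i][1]

# Usage: two overlapping particles and one isolated particle.
radius, cell_size = 0.1, 0.2  # cell size equal to the particle diameter
positions = [[0.0, 0.0], [0.15, 0.0], [1.0, 1.0]]
velocities = [[0.0, 0.0] for _ in positions]
grid = build_grid(positions, cell_size)
forces = collide(positions, radius, grid, cell_size)
integrate(positions, velocities, forces, dt=0.01)
```

With the cell size equal to the particle diameter, any colliding pair lies in the same or an adjacent cell, which is why the 3x3 search is sufficient in this sketch.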

For a coupled simulation as shown in Fig. 1, we also have to handle the
collision between large and small particles (LS collision). LS collision is
performed by searching for the small particles that collide with each large
particle and accumulating the forces on both the small and the large particles.
To make the search efficient, we can reuse the data structure built for
small-small (SS) collision, a uniform grid. For each large particle, a bounding
box in the coordinate system of the uniform grid is calculated, and small
particles are looked up in the grid cells that overlap the bounding box. The
work granularity for each large particle depends on its size, because the
number of overlapping cells depends on the size of its bounding box. Therefore
it is more efficient to perform LS collision on the CPU computation thread.
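
The LS search described above might look like the following sketch, which reuses a uniform grid of small particles and walks the cells overlapped by each large particle's bounding box. Several details are assumptions rather than the paper's method: the box is padded by the small radius so that every touching small particle is found, the repulsion is a linear spring, and the names `ls_collision`, `build_grid`, and `cells_visited` are hypothetical.

```python
import math
from collections import defaultdict

def build_grid(small_positions, cell_size):
    """Uniform grid over the small particles (the structure built for SS collision)."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(small_positions):
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
    return grid

def ls_collision(large, small_positions, small_radius, grid, cell_size,
                 stiffness=100.0):
    """Accumulate large-small forces; also report the cells scanned per large particle."""
    small_forces = [[0.0, 0.0] for _ in small_positions]
    large_forces = [[0.0, 0.0] for _ in large]
    cells_visited = []
    for k, (lx, ly, lr) in enumerate(large):
        reach = lr + small_radius  # assumed padding of the bounding box
        # Bounding box of the large particle in grid (cell index) coordinates.
        x0 = math.floor((lx - reach) / cell_size)
        x1 = math.floor((lx + reach) / cell_size)
        y0 = math.floor((ly - reach) / cell_size)
        y1 = math.floor((ly + reach) / cell_size)
        cells_visited.append((x1 - x0 + 1) * (y1 - y0 + 1))
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                for j in grid.get((cx, cy), ()):
                    rx = small_positions[j][0] - lx
                    ry = small_positions[j][1] - ly
                    dist = math.hypot(rx, ry)
                    overlap = reach - dist
                    if overlap > 0.0 and dist > 0.0:
                        fx = stiffness * overlap * rx / dist
                        fy = stiffness * overlap * ry / dist
                        # Equal and opposite forces on the small and large particle.
                        small_forces[j][0] += fx
                        small_forces[j][1] += fy
                        large_forces[k][0] -= fx
                        large_forces[k][1] -= fy
    return small_forces, large_forces, cells_visited

# Usage: two large particles (x, y, radius) and two small particles.
cell_size, small_radius = 0.25, 0.125
small_positions = [(0.55, 0.0), (3.0, 3.0)]
large = [(0.0, 0.0, 0.5), (3.0, 2.0, 1.0)]
grid = build_grid(small_positions, cell_size)
small_forces, large_forces, cells_visited = ls_collision(
    large, small_positions, small_radius, grid, cell_size)
```

In this example the particle of radius 0.5 scans 36 cells while the particle of radius 1.0 scans 100, which illustrates why the per-particle work is non-uniform and a poor fit for wide SIMD execution.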

tc, 3#, posted 2012-2-22 23:27:40
The original poster's collection is really thorough!

   

4#, posted 2012-3-20 23:18:45
Grabbing the sofa (first-reply spot), no explanation needed.

5#, posted 2012-4-19 23:26:42
Water... the source of life... pouring it on...

tc, 6#, posted 2012-5-27 23:20:24
Whatever the moderator says, listen; whatever a friend posts, bump.

7#, posted 2012-6-2 23:27:15
Interesting! Learned something!

8#, posted 2012-8-6 00:28:30
Answering 天帅's call: bump.

9#, posted 2012-8-7 08:50:40
SketchUp learning and tips
Chrysler applies 3D technology to transmission production
Even with 3D maps a license is still required: Apple's intent is clear
Janus Award debuts at the 2012 Dalian Design Festival
"3D movie technology" advances research on cellular phagocytosis

10#, posted 2012-8-30 00:51:17
Taking another look, and bumping the original poster again.
