🥭 CUDA (C++) Monte Carlo Computation and Analysis of the Electromagnetic Stratton-Chu Vector Diffraction Integral

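1. Run and analyze the computations on an NVIDIA V100 GPU.
2. Compute the Stratton-Chu vector diffraction integral, and use the Monte Carlo method to analyze the interaction between particles and the electromagnetic field in a focused laser field.
3. Express the vector diffraction integral of a parabolic mirror in surface-integral form.
4. Solve oscillatory integrals with the Chebyshev differentiation matrix method.
5. Use a Lorentz-force leapfrog algorithm to solve for the trajectory of a charged particle colliding with a laser pulse.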

🏈 Brief

🍁 CUDA Monte Carlo

🍪 Language Breakdown

🍇 CUDA Tensor Computation

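NVIDIA Tensor Cores are specialized for mixed-precision general matrix multiplication (GEMM): the GEMM input matrices use a lower precision, while the GEMM output matrix uses a higher precision. Mixed-precision training and inference are key techniques for accelerating neural network training and inference.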

$$
D = \begin{pmatrix}
A_{0,0} & A_{0,1} & A_{0,2} & A_{0,3} \\
A_{1,0} & A_{1,1} & A_{1,2} & A_{1,3} \\
A_{2,0} & A_{2,1} & A_{2,2} & A_{2,3} \\
A_{3,0} & A_{3,1} & A_{3,2} & A_{3,3}
\end{pmatrix}
\begin{pmatrix}
B_{0,0} & B_{0,1} & B_{0,2} & B_{0,3} \\
B_{1,0} & B_{1,1} & B_{1,2} & B_{1,3} \\
B_{2,0} & B_{2,1} & B_{2,2} & B_{2,3} \\
B_{3,0} & B_{3,1} & B_{3,2} & B_{3,3}
\end{pmatrix}
+ \begin{pmatrix}
C_{0,0} & C_{0,1} & C_{0,2} & C_{0,3} \\
C_{1,0} & C_{1,1} & C_{1,2} & C_{1,3} \\
C_{2,0} & C_{2,1} & C_{2,2} & C_{2,3} \\
C_{3,0} & C_{3,1} & C_{3,2} & C_{3,3}
\end{pmatrix}
$$
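The 4×4 operation D = AB + C can be illustrated on the CPU. The following is only a minimal host-side C++ sketch, not Tensor Core code: `float` stands in for the low-precision inputs and `double` for the higher-precision accumulator and output (an assumption made for portability; real Tensor Cores use formats such as FP16 inputs with FP32 accumulation). The names `MatIn`, `MatOut`, and `mma4x4` are hypothetical.

```cpp
#include <array>

// Assumption: float plays the role of the low-precision input type and
// double the role of the higher-precision accumulator/output type.
using MatIn  = std::array<std::array<float, 4>, 4>;
using MatOut = std::array<std::array<double, 4>, 4>;

// Computes D = A * B + C for 4x4 matrices.
// Every product and every sum is carried out in the wider type.
MatOut mma4x4(const MatIn& A, const MatIn& B, const MatOut& C) {
    MatOut D{};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            double acc = C[i][j];  // start from the accumulator input
            for (int k = 0; k < 4; ++k) {
                acc += static_cast<double>(A[i][k]) * static_cast<double>(B[k][j]);
            }
            D[i][j] = acc;
        }
    }
    return D;
}
```

Widening each operand before the multiply is what keeps rounding error from building up in the low-precision type, which is the point of the mixed-precision design described above.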

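Because NVIDIA Tensor Cores are designed specifically for GEMM, GEMM throughput on Tensor Cores is much higher than what can be achieved on NVIDIA CUDA Cores, which are better suited to more general parallel programming.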

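NVIDIA CUDA lets users program Tensor Core GEMM computations at the warp level. Although each Tensor Core can only perform matrix multiplications of certain specific small sizes for each data type, a large GEMM can be split into multiple small GEMMs whose results are accumulated.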

$$
A = \begin{bmatrix}
A_{1,1}^{d_{bm} \times d_{bk}} & A_{1,2}^{d_{bm} \times d_{bk}} & \cdots & A_{1, k/d_{bk}}^{d_{bm} \times d_{bk}} \\
A_{2,1}^{d_{bm} \times d_{bk}} & A_{2,2}^{d_{bm} \times d_{bk}} & \cdots & A_{2, k/d_{bk}}^{d_{bm} \times d_{bk}} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m/d_{bm}, 1}^{d_{bm} \times d_{bk}} & A_{m/d_{bm}, 2}^{d_{bm} \times d_{bk}} & \cdots & A_{m/d_{bm}, k/d_{bk}}^{d_{bm} \times d_{bk}}
\end{bmatrix}
$$

$$
B = \begin{bmatrix}
B_{1,1}^{d_{bk} \times d_{bn}} & B_{1,2}^{d_{bk} \times d_{bn}} & \cdots & B_{1, n/d_{bn}}^{d_{bk} \times d_{bn}} \\
B_{2,1}^{d_{bk} \times d_{bn}} & B_{2,2}^{d_{bk} \times d_{bn}} & \cdots & B_{2, n/d_{bn}}^{d_{bk} \times d_{bn}} \\
\vdots & \vdots & \ddots & \vdots \\
B_{k/d_{bk}, 1}^{d_{bk} \times d_{bn}} & B_{k/d_{bk}, 2}^{d_{bk} \times d_{bn}} & \cdots & B_{k/d_{bk}, n/d_{bn}}^{d_{bk} \times d_{bn}}
\end{bmatrix}
$$

$$
C = \begin{bmatrix}
C_{1,1}^{d_{bm} \times d_{bn}} & C_{1,2}^{d_{bm} \times d_{bn}} & \cdots & C_{1, n/d_{bn}}^{d_{bm} \times d_{bn}} \\
C_{2,1}^{d_{bm} \times d_{bn}} & C_{2,2}^{d_{bm} \times d_{bn}} & \cdots & C_{2, n/d_{bn}}^{d_{bm} \times d_{bn}} \\
\vdots & \vdots & \ddots & \vdots \\
C_{m/d_{bm}, 1}^{d_{bm} \times d_{bn}} & C_{m/d_{bm}, 2}^{d_{bm} \times d_{bn}} & \cdots & C_{m/d_{bm}, n/d_{bn}}^{d_{bm} \times d_{bn}}
\end{bmatrix}
$$

$$
D = \begin{bmatrix}
D_{1,1}^{d_{bm} \times d_{bn}} & D_{1,2}^{d_{bm} \times d_{bn}} & \cdots & D_{1, n/d_{bn}}^{d_{bm} \times d_{bn}} \\
D_{2,1}^{d_{bm} \times d_{bn}} & D_{2,2}^{d_{bm} \times d_{bn}} & \cdots & D_{2, n/d_{bn}}^{d_{bm} \times d_{bn}} \\
\vdots & \vdots & \ddots & \vdots \\
D_{m/d_{bm}, 1}^{d_{bm} \times d_{bn}} & D_{m/d_{bm}, 2}^{d_{bm} \times d_{bn}} & \cdots & D_{m/d_{bm}, n/d_{bn}}^{d_{bm} \times d_{bn}}
\end{bmatrix}
$$
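To make the block partition concrete, the sketch below counts how many blocks each dimension is split into and how many small GEMMs the decomposition needs in total. It is an illustration under assumptions: `TileGrid`, `ceil_div`, and `make_tile_grid` are hypothetical names, and rounding up for dimensions that are not multiples of the block size (which implies zero padding) goes beyond the partition above, which assumes exact divisibility.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks along each GEMM dimension for an (m x k) * (k x n) product.
struct TileGrid {
    std::size_t rows;   // block rows of A and D:    ceil(m / d_bm)
    std::size_t cols;   // block columns of B and D: ceil(n / d_bn)
    std::size_t depth;  // blocks along k:           ceil(k / d_bk)

    // Each block of D needs `depth` small GEMMs, and D has rows * cols blocks.
    std::size_t small_gemms() const { return rows * cols * depth; }
};

// Integer division rounded up. Assumption: a partial trailing block is padded.
inline std::size_t ceil_div(std::size_t a, std::size_t b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

TileGrid make_tile_grid(std::size_t m, std::size_t n, std::size_t k,
                        std::size_t d_bm, std::size_t d_bn, std::size_t d_bk) {
    return TileGrid{ceil_div(m, d_bm), ceil_div(n, d_bn), ceil_div(k, d_bk)};
}
```

For example, `make_tile_grid(64, 32, 48, 16, 16, 16)` gives a 4 × 2 grid of output blocks with depth 3, that is, 24 small GEMMs.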

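Each small block of D is computed as a sum of several small GEMMs that are accumulated. For square tiles with $d_{bm} = d_{bn} = d_{bk} = d$: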

$$
D_{i_m, i_n}^{d \times d} = \sum_{i_k=1}^{k/d} A_{i_m, i_k}^{d \times d} B_{i_k, i_n}^{d \times d}
$$

Here we focus on the matrix multiplication part of the GEMM operation and set C = 0.
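The tile-wise accumulation with C = 0 can be sketched as a plain CPU reference. This is a minimal illustration in host-side C++, not a Tensor Core kernel: the function name `tiled_gemm`, the row-major layout, and the requirement that m, n, and k are multiples of the tile size d are all assumptions made for the example. On a GPU, the innermost d × d product would be the part handed to a warp-level Tensor Core operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns D = A * B with C = 0, computed tile by tile.
// A is m x k, B is k x n, D is m x n, all row-major.
// Assumption: m, n and k are multiples of the square tile size d.
std::vector<float> tiled_gemm(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t m, std::size_t n, std::size_t k,
                              std::size_t d) {
    assert(d > 0 && m % d == 0 && n % d == 0 && k % d == 0);
    assert(A.size() == m * k && B.size() == k * n);

    std::vector<float> D(m * n, 0.0f);  // C = 0, so every tile starts at zero

    for (std::size_t bm = 0; bm < m / d; ++bm) {          // block row of D
        for (std::size_t bn = 0; bn < n / d; ++bn) {      // block column of D
            for (std::size_t bk = 0; bk < k / d; ++bk) {  // sum over k-blocks
                // One small d x d GEMM, accumulated into tile (bm, bn) of D.
                for (std::size_t i = 0; i < d; ++i) {
                    for (std::size_t j = 0; j < d; ++j) {
                        float acc = 0.0f;
                        for (std::size_t p = 0; p < d; ++p) {
                            acc += A[(bm * d + i) * k + (bk * d + p)] *
                                   B[(bk * d + p) * n + (bn * d + j)];
                        }
                        D[(bm * d + i) * n + (bn * d + j)] += acc;
                    }
                }
            }
        }
    }
    return D;
}
```

Calling `tiled_gemm(A, B, 4, 4, 4, 2)` with B set to the 4 × 4 identity returns a copy of A, which is a quick sanity check of the index arithmetic.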
