StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true > Struct Template Reference
Persistent Stream-K tile partitioner derived struct.
#include <streamk_gemm_tile_partitioner.hpp>
Inherits ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType >.
Public Member Functions | |
| StreamKTilePartitioner_v2 (ck_tile::index_t m, ck_tile::index_t n, ck_tile::index_t k, ck_tile::index_t grid) | |
| CK_TILE_HOST auto | grid_size () const noexcept -> dim3 |
| Calculates the launching grid size for the Stream-K kernel. In the Persistent case, no extra workgroups are allocated for the data parallel section, making the grid size num_cu * occupancy. | |
| CK_TILE_HOST_DEVICE index_t | get_dp_tiles_per_cta () const noexcept |
| Returns the total number of DP tiles per workgroup. | |
| CK_TILE_HOST_DEVICE index_t | get_extra_dp_tiles () const noexcept |
| Returns the total number of DP tiles left over when dp_tiles_ is not evenly divisible by grid_. | |
| Public Member Functions inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType > | |
| StreamKTilePartitionerBase (index_t m, index_t n, index_t k, index_t grid) | |
| CK_TILE_HOST_DEVICE index_t | get_partials_buffer_size (index_t acc_element_bytes) const noexcept |
| Calculates the total space needed for the partials buffer. | |
| CK_TILE_HOST_DEVICE index_t | get_flags_buffer_size () const noexcept |
| Calculates the total space needed for the flags buffer. | |
| CK_TILE_DEVICE void | get_iter_boundaries (index_t &iter_start, index_t &iter_end, index_t cta_idx) const noexcept |
| Calculates the start and end iteration given the cta_idx. | |
| CK_TILE_DEVICE index_t | get_tile_index (index_t iter_start) const noexcept |
| Calculates the 1D tile index in the C tensor for a workgroup. | |
| CK_TILE_DEVICE void | get_tile_boundaries (index_t &tile_iter_start, index_t &tile_iter_end, index_t tile_idx) const noexcept |
| Calculates the starting and ending tile boundaries for the given 1D tile index. | |
| CK_TILE_DEVICE auto | get_output_tile_index (index_t tile_idx) const noexcept -> tuple< index_t, index_t > |
| Calculates the workgroup's 2D tile index in the C tensor given the 1D tile index. | |
| CK_TILE_HOST_DEVICE index_t | get_workspace_size (index_t acc_element_bytes) const noexcept |
| Calculates the total space needed for the partials and flags buffers. | |
| CK_TILE_HOST_DEVICE index_t | get_num_tiles () const noexcept |
| Returns the number of macro tiles in the C tensor. | |
| CK_TILE_HOST_DEVICE index_t | get_grid () const noexcept |
| Returns the maximum number of active workgroups; this is assumed to be the number of CUs * occupancy. | |
| CK_TILE_HOST_DEVICE index_t | get_dp_tiles () const noexcept |
| Returns the number of tiles in the C tensor that will use the data-parallel (DP) approach. | |
| CK_TILE_HOST_DEVICE index_t | get_sk_tiles () const noexcept |
| Returns the number of tiles in the C tensor that will use the Stream-K approach. | |
| CK_TILE_HOST_DEVICE index_t | get_sk_ctas () const noexcept |
| Returns the number of workgroups that participate in the Stream-K portion of the work, i.e., the workgroups that process the sk_tiles_. | |
| CK_TILE_HOST_DEVICE index_t | get_total_sk_iters () const noexcept |
| Returns the total number of Stream-K iterations. | |
| CK_TILE_HOST_DEVICE index_t | get_iters_per_tile () const noexcept |
| Returns the total number of iterations per tile in the C tensor. In other words, this is the total number of macro tiles along the K dimension of A and B. | |
| CK_TILE_HOST_DEVICE index_t | get_iters_per_sk_cta () const noexcept |
| Returns the total number of Stream-K iterations for each sk_cta. This is the lower bound (i.e., all sk_ctas_ are guaranteed to perform at least this many iterations). | |
| CK_TILE_HOST_DEVICE index_t | get_extra_iters () const noexcept |
| Returns the remainder resulting from total_sk_iters_ divided by sk_ctas_. When this is non-zero, the first extra_iters_ sk_ctas_ will get one additional iteration assigned to them; such work groups will perform (iters_per_sk_cta_ + 1) iterations. | |
| CK_TILE_HOST_DEVICE index_t | get_total_dp_iters () const noexcept |
| Returns the total number of DP iterations. | |
| CK_TILE_HOST_DEVICE index_t | get_n () const noexcept |
| Returns the n dimension for the GEMM problem. | |
| CK_TILE_HOST index_t | estimate_num_wgs_per_tile () const noexcept |
| Returns an estimate of the number of workgroups writing to the same macro tile in C. | |
Static Public Attributes | |
| static constexpr bool | PERSISTENT = true |
| Static Public Attributes inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType > | |
| static constexpr index_t | MPerBlock |
| static constexpr index_t | NPerBlock |
| static constexpr index_t | KPerBlock |
| static constexpr StreamKReductionStrategy | ReductionStrategy |
Protected Attributes | |
| index_t | dp_tiles_per_cta_ |
| index_t | extra_dp_tiles_ |
| Protected Attributes inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType > | |
| index_t | num_tiles_ |
| index_t | grid_ |
| index_t | dp_tiles_ |
Additional Inherited Members | |
| Public Types inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType > | |
| using | BlockGemmShape |
| Static Public Member Functions inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType > | |
| static CK_TILE_DEVICE index_t | get_local_iter (index_t iter_start, index_t tile_iter_start) noexcept |
| Calculates the workgroup's starting iteration that is local to a tile. | |
| static CK_TILE_DEVICE index_t | get_local_iter_end (index_t tile_iter_start, index_t iter_end, index_t tile_iter_end) noexcept |
| Calculates the workgroup's non-inclusive end iteration that is local to a tile. | |
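The relationship between a workgroup's global iteration range and the tile-local helpers listed above can be sketched in plain C++. This is a host-side model under stated assumptions, not the ck_tile implementation: it assumes that global iterations are numbered contiguously tile by tile, so tile `t` owns iterations `[t * iters_per_tile, (t + 1) * iters_per_tile)`. The free functions `tile_index_of`, `tile_boundaries`, `local_iter`, and `local_iter_end` are hypothetical stand-ins for the similarly named members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck_tile::index_t

// Half-open range [start, end) of global iterations belonging to one tile.
struct TileRange
{
    index_t start;
    index_t end;
};

// Assumed model of get_tile_index: the tile that owns global iteration iter_start.
inline index_t tile_index_of(index_t iter_start, index_t iters_per_tile)
{
    assert(iters_per_tile > 0);
    return iter_start / iters_per_tile;
}

// Assumed model of get_tile_boundaries: the global iteration range of tile tile_idx.
inline TileRange tile_boundaries(index_t tile_idx, index_t iters_per_tile)
{
    return {tile_idx * iters_per_tile, (tile_idx + 1) * iters_per_tile};
}

// Assumed model of get_local_iter: the workgroup's first iteration, relative to the tile start.
inline index_t local_iter(index_t iter_start, index_t tile_iter_start)
{
    return iter_start - tile_iter_start;
}

// Assumed model of get_local_iter_end: the workgroup's non-inclusive end iteration,
// clamped to the tile and expressed relative to the tile start.
inline index_t local_iter_end(index_t tile_iter_start, index_t iter_end, index_t tile_iter_end)
{
    return std::min(iter_end, tile_iter_end) - tile_iter_start;
}
```

For example, with 4 iterations per tile, a workgroup assigned global iterations [6, 11) first works on tile 1 (local iterations [2, 4)) and then on tile 2 (local iterations [0, 3)).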
Detailed Description
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
struct ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >
Persistent Stream-K tile partitioner derived struct.
This partitioner is responsible for mapping workgroups to tiles in the C tensor for the Stream-K algorithm when using a Persistent approach where no extra workgroups are allocated for data parallel.
- Template Parameters
- BlockGemmShapeType A class providing basic GEMM parameters.
- ReductionStrategyType An enum that defines the reduction strategy for the results in the C tensor.
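The inherited accessors get_iters_per_sk_cta() and get_extra_iters() describe how Stream-K iterations are divided among workgroups: each workgroup gets the quotient, and the first extra_iters_ workgroups get one more. A minimal host-side sketch of that split follows. It is an illustration, not the library's code: `sk_iter_range` is a hypothetical helper, and it assumes Stream-K iterations are numbered from 0 and handed out contiguously in workgroup order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck_tile::index_t

// Half-open range [start, end) of Stream-K iterations for one workgroup.
struct IterRange
{
    index_t start;
    index_t end;
};

// Hypothetical helper: iteration range of workgroup cta_idx when total_sk_iters
// iterations are shared by sk_ctas workgroups.
inline IterRange sk_iter_range(index_t cta_idx, index_t total_sk_iters, index_t sk_ctas)
{
    assert(sk_ctas > 0 && cta_idx >= 0 && cta_idx < sk_ctas);

    const index_t iters_per_sk_cta = total_sk_iters / sk_ctas; // lower bound per workgroup
    const index_t extra_iters      = total_sk_iters % sk_ctas; // remainder to spread out

    // Workgroups with a smaller index that received an extra iteration shift our start.
    const index_t extra_before = std::min(cta_idx, extra_iters);

    const index_t start = cta_idx * iters_per_sk_cta + extra_before;
    const index_t end   = start + iters_per_sk_cta + (cta_idx < extra_iters ? 1 : 0);
    return {start, end};
}
```

With 10 iterations and 4 workgroups, the quotient is 2 and the remainder is 2, so workgroups 0 and 1 each perform 3 iterations and workgroups 2 and 3 each perform 2.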
Constructor & Destructor Documentation
◆ StreamKTilePartitioner_v2()
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::StreamKTilePartitioner_v2 (ck_tile::index_t m, ck_tile::index_t n, ck_tile::index_t k, ck_tile::index_t grid)
Member Function Documentation
◆ get_dp_tiles_per_cta()
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
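CK_TILE_HOST_DEVICE index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::get_dp_tiles_per_cta () const noexcept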
Returns the total number of DP tiles per workgroup.
◆ get_extra_dp_tiles()
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
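CK_TILE_HOST_DEVICE index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::get_extra_dp_tiles () const noexcept
Returns the total number of DP tiles left over when dp_tiles_ is not evenly divisible by grid_.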
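The two DP accessors can be read as the quotient and remainder of dividing the DP tiles among the persistent workgroups. The sketch below models that arithmetic in plain C++. It is not taken from the ck_tile source: `split_dp_tiles` and `dp_tiles_for_cta` are hypothetical names, the quotient formula for dp_tiles_per_cta_ is inferred from the description of get_extra_dp_tiles(), and giving the leftover tiles to the lowest-indexed workgroups is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck_tile::index_t

// Quotient and remainder of distributing DP tiles over the grid.
struct DpSplit
{
    index_t dp_tiles_per_cta; // tiles every workgroup processes
    index_t extra_dp_tiles;   // tiles left over when the division is not even
};

// Hypothetical helper mirroring what dp_tiles_per_cta_ and extra_dp_tiles_ appear to hold.
inline DpSplit split_dp_tiles(index_t dp_tiles, index_t grid)
{
    assert(grid > 0 && dp_tiles >= 0);
    return {dp_tiles / grid, dp_tiles % grid};
}

// Assumption: the first extra_dp_tiles workgroups each take one leftover tile.
inline index_t dp_tiles_for_cta(const DpSplit& split, index_t cta_idx)
{
    return split.dp_tiles_per_cta + (cta_idx < split.extra_dp_tiles ? 1 : 0);
}
```

For 10 DP tiles on a grid of 4 workgroups, this gives 2 tiles per workgroup with 2 left over, so under the stated assumption workgroups 0 and 1 process 3 tiles and workgroups 2 and 3 process 2.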
◆ grid_size()
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
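CK_TILE_HOST auto ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::grid_size () const noexcept -> dim3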
Calculates the launching grid size for the Stream-K kernel. In the Persistent case, no extra workgroups are allocated for the data parallel section, making the grid size num_cu * occupancy.
- Returns
- dim3 The launching grid size for the kernel.
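As a rough illustration of the persistent grid size, the sketch below computes num_cu * occupancy on the host. It assumes a one-dimensional launch (y = z = 1), which the description above does not state explicitly. `Dim3` is a stand-in for the HIP dim3 type so the example compiles without the HIP headers, and `persistent_grid_size` is a hypothetical helper, not a ck_tile function.

```cpp
#include <cassert>

// Stand-in for HIP's dim3 so the sketch builds without HIP headers.
struct Dim3
{
    unsigned x;
    unsigned y;
    unsigned z;
};

// Hypothetical helper: one workgroup slot per CU per unit of occupancy,
// with no extra workgroups added for the data-parallel section.
inline Dim3 persistent_grid_size(unsigned num_cu, unsigned occupancy)
{
    assert(num_cu > 0 && occupancy > 0);
    return {num_cu * occupancy, 1u, 1u}; // 1D launch is an assumption
}
```

For a device with 120 CUs and an occupancy of 2, this sketch yields a grid of 240 workgroups.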
Member Data Documentation
◆ dp_tiles_per_cta_
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
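index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::dp_tiles_per_cta_ (protected)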
◆ extra_dp_tiles_
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
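index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::extra_dp_tiles_ (protected)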
◆ PERSISTENT
template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
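static constexpr bool ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::PERSISTENT = true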