Composable Kernel: ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true > Struct Template Reference

Persistent Stream-K tile partitioner derived struct.

#include <streamk_gemm_tile_partitioner.hpp>

Inheritance diagram for ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >:
ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType >

Public Member Functions

 StreamKTilePartitioner_v2 (ck_tile::index_t m, ck_tile::index_t n, ck_tile::index_t k, ck_tile::index_t grid)
CK_TILE_HOST auto grid_size () const noexcept -> dim3
 Calculates the launching grid size for the Stream-K kernel. In the Persistent case, no extra workgroups are allocated for the data parallel section, making the grid size num_cu * occupancy.
CK_TILE_HOST_DEVICE index_t get_dp_tiles_per_cta () const noexcept
 Returns the total number of DP tiles per workgroup.
CK_TILE_HOST_DEVICE index_t get_extra_dp_tiles () const noexcept
 Returns the total number of DP tiles left over when dp_tiles_ is not evenly divisible by grid_.
Public Member Functions inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType >
 StreamKTilePartitionerBase (index_t m, index_t n, index_t k, index_t grid)
CK_TILE_HOST_DEVICE index_t get_partials_buffer_size (index_t acc_element_bytes) const noexcept
 Calculates the total space needed for the partials buffer.
CK_TILE_HOST_DEVICE index_t get_flags_buffer_size () const noexcept
 Calculates the total space needed for the flags buffer.
CK_TILE_DEVICE void get_iter_boundaries (index_t &iter_start, index_t &iter_end, index_t cta_idx) const noexcept
 Calculates the start and end iteration given the cta_idx.
CK_TILE_DEVICE index_t get_tile_index (index_t iter_start) const noexcept
 Calculates the 1D tile index in the C tensor for a workgroup.
CK_TILE_DEVICE void get_tile_boundaries (index_t &tile_iter_start, index_t &tile_iter_end, index_t tile_idx) const noexcept
 Calculates the starting and ending tile boundaries for the given 1D tile index.
CK_TILE_DEVICE auto get_output_tile_index (index_t tile_idx) const noexcept -> tuple< index_t, index_t >
 Calculates the workgroup's 2D tile index in the C tensor given the 1D tile index.
CK_TILE_HOST_DEVICE index_t get_workspace_size (index_t acc_element_bytes) const noexcept
 Calculates the total space needed for the partials and flags buffers.
CK_TILE_HOST_DEVICE index_t get_num_tiles () const noexcept
 Returns the number of macro tiles in the C tensor.
CK_TILE_HOST_DEVICE index_t get_grid () const noexcept
 Returns the maximum number of active workgroups; this is assumed to be the number of CUs * occupancy.
CK_TILE_HOST_DEVICE index_t get_dp_tiles () const noexcept
 Returns the number of tiles in the C tensor that will use the data-parallel (DP) approach.
CK_TILE_HOST_DEVICE index_t get_sk_tiles () const noexcept
 Returns the number of tiles in the C tensor that will use the Stream-K approach.
CK_TILE_HOST_DEVICE index_t get_sk_ctas () const noexcept
 Returns the number of workgroups that will participate in Stream-K in the sk_tiles_.
CK_TILE_HOST_DEVICE index_t get_total_sk_iters () const noexcept
 Returns the total number of Stream-K iterations.
CK_TILE_HOST_DEVICE index_t get_iters_per_tile () const noexcept
 Returns the total number of iterations per tile in the C tensor. In other words, this is the total number of macro tiles along the K dimension of A and B.
CK_TILE_HOST_DEVICE index_t get_iters_per_sk_cta () const noexcept
 Returns the total number of Stream-K iterations for each sk_cta. This is the lower bound (i.e., all sk_ctas_ are guaranteed to perform at least this many iterations).
CK_TILE_HOST_DEVICE index_t get_extra_iters () const noexcept
 Returns the remainder resulting from total_sk_iters_ divided by sk_ctas_. When this is non-zero, the first extra_iters_ sk_ctas_ will get one additional iteration assigned to them; such work groups will perform (iters_per_sk_cta_ + 1) iterations.
CK_TILE_HOST_DEVICE index_t get_total_dp_iters () const noexcept
 Returns the total number of DP iterations.
CK_TILE_HOST_DEVICE index_t get_n () const noexcept
 Returns the n dimension for the GEMM problem.
CK_TILE_HOST index_t estimate_num_wgs_per_tile () const noexcept
 Returns an estimate of the number of workgroups writing to the same macro tile in C.
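
The descriptions of get_iters_per_sk_cta() and get_extra_iters() imply an even split of Stream-K iterations with the remainder handed to the first workgroups. The following is a minimal standalone sketch of that arithmetic, not the library's implementation: the helper name sk_iter_range, the IterRange struct, and the index_t alias are hypothetical, and the real get_iter_boundaries() may apply additional offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck_tile::index_t (assumption)

// Half-open iteration range [start, end) assigned to one Stream-K workgroup.
struct IterRange
{
    index_t start;
    index_t end;
};

// Hypothetical helper: splits total_sk_iters across sk_ctas so that the first
// (total_sk_iters % sk_ctas) workgroups each take one additional iteration.
inline IterRange sk_iter_range(index_t total_sk_iters, index_t sk_ctas, index_t cta_idx)
{
    assert(sk_ctas > 0 && cta_idx >= 0 && cta_idx < sk_ctas);
    const index_t iters_per_sk_cta = total_sk_iters / sk_ctas; // lower bound per workgroup
    const index_t extra_iters      = total_sk_iters % sk_ctas; // remainder to distribute
    // Every earlier workgroup with an extra iteration shifts this start by one.
    const index_t start = cta_idx * iters_per_sk_cta + std::min(cta_idx, extra_iters);
    const index_t count = iters_per_sk_cta + (cta_idx < extra_iters ? 1 : 0);
    return {start, start + count};
}
```

With 10 total iterations and 4 workgroups, this sketch yields the ranges [0, 3), [3, 6), [6, 8), and [8, 10): the first two workgroups perform (iters_per_sk_cta_ + 1) iterations and the rest perform iters_per_sk_cta_.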

Static Public Attributes

static constexpr bool PERSISTENT = true
Static Public Attributes inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType >
static constexpr index_t MPerBlock
static constexpr index_t NPerBlock
static constexpr index_t KPerBlock
static constexpr StreamKReductionStrategy ReductionStrategy

Protected Attributes

index_t dp_tiles_per_cta_
index_t extra_dp_tiles_
Protected Attributes inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType >
index_t num_tiles_
index_t grid_
index_t dp_tiles_

Additional Inherited Members

Public Types inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType >
using BlockGemmShape
Static Public Member Functions inherited from ck_tile::StreamKTilePartitionerBase< BlockGemmShapeType, ReductionStrategyType >
static CK_TILE_DEVICE index_t get_local_iter (index_t iter_start, index_t tile_iter_start) noexcept
 Calculates the workgroup's starting iteration that is local to a tile.
static CK_TILE_DEVICE index_t get_local_iter_end (index_t tile_iter_start, index_t iter_end, index_t tile_iter_end) noexcept
 Calculates the workgroup's non-inclusive end iteration that is local to a tile.
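
As an illustration of what "local to a tile" could mean for get_local_iter() and get_local_iter_end(), here is a minimal sketch that converts global iteration indices into offsets from the tile's first iteration. The function names, the index_t alias, and the clamping with std::min are assumptions made for this example; the library's own definitions may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck_tile::index_t (assumption)

// Hypothetical: the workgroup's first iteration, as an offset into the tile.
inline index_t local_iter(index_t iter_start, index_t tile_iter_start)
{
    assert(iter_start >= tile_iter_start);
    return iter_start - tile_iter_start;
}

// Hypothetical: the workgroup's non-inclusive last iteration within the tile.
// The workgroup's range is clamped so it never runs past the end of the tile.
inline index_t local_iter_end(index_t tile_iter_start, index_t iter_end, index_t tile_iter_end)
{
    assert(tile_iter_start <= tile_iter_end);
    return std::min(iter_end, tile_iter_end) - tile_iter_start;
}
```

For a tile covering global iterations [8, 12) and a workgroup assigned [10, 14), this sketch gives a local range of [2, 4): the workgroup handles the last two iterations of that tile and continues with the next tile afterwards.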

Detailed Description

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
struct ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >

Persistent Stream-K tile partitioner derived struct.

This partitioner maps workgroups to tiles in the C tensor for the Stream-K algorithm when using the Persistent approach, in which no extra workgroups are allocated for the data-parallel (DP) section.

Template Parameters
BlockGemmShapeType: A class providing basic GEMM parameters.
ReductionStrategyType: An enum that defines the reduction strategy for the results in the C tensor.

Constructor & Destructor Documentation

◆ StreamKTilePartitioner_v2()

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::StreamKTilePartitioner_v2 ( ck_tile::index_t m,
ck_tile::index_t n,
ck_tile::index_t k,
ck_tile::index_t grid )

Member Function Documentation

◆ get_dp_tiles_per_cta()

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
CK_TILE_HOST_DEVICE index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::get_dp_tiles_per_cta () const noexcept

Returns the total number of DP tiles per workgroup.

◆ get_extra_dp_tiles()

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
CK_TILE_HOST_DEVICE index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::get_extra_dp_tiles () const noexcept

Returns the total number of DP tiles left over when dp_tiles_ is not evenly divisible by grid_.
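
The two DP getters read as the quotient and remainder of dp_tiles_ divided by grid_. A minimal sketch under that assumption follows; the DpSplit struct, the split_dp_tiles helper, and the index_t alias are hypothetical names for illustration and are not part of ck_tile.

```cpp
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck_tile::index_t (assumption)

// Mirrors the two protected members documented for the persistent partitioner.
struct DpSplit
{
    index_t dp_tiles_per_cta; // DP tiles every workgroup processes
    index_t extra_dp_tiles;   // DP tiles left over after the even split
};

// Hypothetical helper: assumes the even split is integer division by the grid
// size and the leftover is the remainder of that division.
inline DpSplit split_dp_tiles(index_t dp_tiles, index_t grid)
{
    assert(grid > 0 && dp_tiles >= 0);
    return {dp_tiles / grid, dp_tiles % grid};
}
```

For example, 10 DP tiles on a grid of 4 workgroups would give 2 tiles per workgroup with 2 tiles left over. How the leftover tiles are assigned to workgroups is not stated in this reference.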

◆ grid_size()

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
CK_TILE_HOST auto ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::grid_size () const noexcept -> dim3

Calculates the launching grid size for the Stream-K kernel. In the Persistent case, no extra workgroups are allocated for the data parallel section, making the grid size num_cu * occupancy.

Returns
dim3 The launching grid size for the kernel.
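
To make the num_cu * occupancy relationship concrete, here is a minimal host-side sketch. The Dim3 struct is a stand-in for dim3, persistent_grid_size is a hypothetical helper, and the one-dimensional layout (y and z set to 1) is an assumption; in practice num_cu and occupancy would be queried from the device runtime rather than passed as literals.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the runtime's dim3 type (assumption for this sketch).
struct Dim3
{
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Hypothetical helper: in the persistent case the grid is sized to the number
// of workgroups that can be resident at once, with no extra DP workgroups.
inline Dim3 persistent_grid_size(std::uint32_t num_cu, std::uint32_t occupancy)
{
    assert(num_cu > 0 && occupancy > 0);
    return {num_cu * occupancy, 1u, 1u};
}
```

With 104 compute units and an occupancy of 2, the sketch produces a grid of 208 workgroups, which is the value that would be passed as the grid argument of the partitioner's constructor.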

Member Data Documentation

◆ dp_tiles_per_cta_

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::dp_tiles_per_cta_
protected

◆ extra_dp_tiles_

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
index_t ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::extra_dp_tiles_
protected

◆ PERSISTENT

template<typename BlockGemmShapeType, StreamKReductionStrategy ReductionStrategyType>
bool ck_tile::StreamKTilePartitioner_v2< BlockGemmShapeType, ReductionStrategyType, true >::PERSISTENT = true
static constexpr
