vllm.utils.cpu_triton_utils ¶
Contains replacement functions to fallback Triton usages in CPU backend
_copy_and_expand_eagle_inputs_kernel_impl(target_token_ids_ptr, target_positions_ptr, next_token_ids_ptr, out_input_ids_ptr, out_positions_ptr, out_is_rejected_token_mask_ptr, out_is_masked_token_mask_ptr, out_new_token_indices_ptr, out_hidden_state_mapping_ptr, query_start_loc_ptr, query_end_loc_ptr, padding_token_id, parallel_drafting_token_id, total_input_tokens, num_padding_slots_per_request, shift_input_ids, BLOCK_SIZE_TOKENS=None, BLOCK_SIZE_REQS=None) ¶
Adapter between Triton kernel call convention and C++ implementation.
The Triton kernel uses '_ptr' suffixed parameter names and compile-time constants (BLOCK_SIZE_TOKENS, BLOCK_SIZE_REQS) which are not needed by the C++ implementation. C++ reads token id tensors as int64_t*. Output tensors that are int32 need copy-back after C++ writes int64.