Reference architecture · Direct device movement

Peer-to-Peer PCIe Data Paths

Create direct local or remote transfer paths between compatible acquisition, FPGA, GPU, NIC and NVMe endpoints to reduce CPU intervention and avoid unnecessary staging through host memory.

Primary path: PCIe endpoint to endpoint

Host role: configuration and orchestration

Typical stages: DAQ · FPGA · GPU · NVMe

Prerequisite: end-to-end P2P support


Reference topology: endpoint-to-endpoint DMA

Acquisition (sensor / frame source): produces high-rate buffers.

Processing (FPGA or GPU): filters, transforms or infers.

Storage / sink (NVMe or downstream device): records or forwards the result.

Legend: control plane · PCIe data plane

Operating model

How the architecture works

The sequence describes system behaviour rather than a product feature list.

Register

The application, SISCI layer or device-driver stack maps the source and destination resources and registers the requester or peer-access relationship required for the transfer.

Route

The PCIe hierarchy forwards transactions along an allowed peer route. Depending on topology and ACS/IOMMU policy, the path may stay below a common switch or be routed upstream through the root complex, but host DRAM is not used as the intermediate payload buffer.

Orchestrate

Host software schedules buffers, completion events, error handling and fallback paths.

System definition

Reference topology and architectural boundaries

Ownership, data movement, software responsibility and the limits of the pattern are defined separately.

System-level view: endpoint-to-endpoint DMA through the same acquisition, processing and storage / sink stages

Qualification required · Application-aware design

Data path

Device-to-device

Payload movement can occur directly between local or remote PCIe devices without a host-memory staging copy where the complete platform supports peer-to-peer transactions.

Control path

Host coordinated

Configuration, buffer ownership and completion handling remain software responsibilities.

Compatibility boundary

Entire PCIe path

Peer-to-peer support is optional in PCIe platforms and cannot be assumed: the endpoints, drivers or APIs, host bridge, switches, ACS settings and IOMMU policy must all permit the required transaction path.

Fallback model

Host-staged transfer

A validated fallback is required when direct P2P is unavailable or disabled.
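The fallback rule can be expressed as a small selection function. This is a minimal sketch: `PathQualification` and `select_transfer_mode` are hypothetical names, and each boolean stands for a qualification result that would come from the checks listed under the engineering criteria, not from any real driver API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathQualification:
    """Hypothetical record of what has been verified for one source/destination pair."""
    endpoints_permit_p2p: bool  # devices, drivers or APIs support peer addressing
    fabric_permits_p2p: bool    # host bridge and switches route the peer transaction
    policy_permits_p2p: bool    # ACS settings and IOMMU policy allow the path
    direct_validated: bool      # direct mode passed rate and integrity tests
    staged_validated: bool      # host-staged fallback passed the same tests


def select_transfer_mode(q: PathQualification, p2p_enabled: bool = True) -> str:
    """Prefer the direct peer path; otherwise use the validated host-staged path."""
    direct_ok = all((
        p2p_enabled,
        q.endpoints_permit_p2p,
        q.fabric_permits_p2p,
        q.policy_permits_p2p,
        q.direct_validated,
    ))
    if direct_ok:
        return "direct"
    if q.staged_validated:
        return "host-staged"
    # Neither path is qualified: refuse to run rather than move data unverified.
    raise RuntimeError("no validated transfer path for this endpoint pair")
```

Treating "P2P disabled" and "P2P not permitted by the platform" the same way keeps one code path for both cases, which is what makes the fallback testable.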

Engineering criteria

Design decisions that determine whether the architecture will work

These items must be resolved for the actual hosts, endpoints, operating systems, topology and workload.

Endpoint capability

Confirm that the source and destination devices, drivers or APIs support peer addressing, the required requester/target roles and the intended transfer direction.

Topology

Map the exact root ports, switches and NUMA placement; review ACS redirection, upstream routing and whether the platform exposes a supported local or remote P2P path.
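On Linux, one way to start the topology map is to compare the sysfs paths of two endpoints: `os.path.realpath` on `/sys/bus/pci/devices/<BDF>` yields the chain of host bridge, root port and switch ports above the device, and the deepest element both chains share shows where their hierarchies meet. The sketch below is a heuristic under that assumption. The helper names and example addresses are hypothetical, and it does not read ACS state, so a route classified as below a switch may still be redirected upstream.

```python
import os
from typing import List, Optional


def resolve_sysfs_path(bdf: str) -> str:
    """Resolve a PCI address such as '0000:05:00.0' to its sysfs hierarchy path (Linux)."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", bdf))


def pci_chain(sysfs_path: str) -> List[str]:
    """Host-bridge element followed by each port or function down to the device."""
    parts = [p for p in sysfs_path.split("/") if p]
    for i, part in enumerate(parts):
        if part.startswith("pci"):  # e.g. 'pci0000:00', the host bridge (domain:bus)
            return parts[i:]
    raise ValueError(f"not a PCI sysfs device path: {sysfs_path}")


def classify_route(path_a: str, path_b: str) -> str:
    """Heuristic: where do the two devices' PCIe hierarchies meet?"""
    a, b = pci_chain(path_a), pci_chain(path_b)
    deepest: Optional[int] = None  # index of the deepest element both chains share
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            break
        deepest = i
    if deepest is None:
        return "separate host bridges"
    if deepest >= 2:
        # Index 0 is the host bridge and index 1 the root port, so a shared element
        # deeper than that is a switch port that both devices sit below.
        return "below common switch"
    return "through root complex"
```

On a Linux host the inputs would come from calls such as `resolve_sysfs_path("0000:03:00.0")`, where the address is a placeholder for a real endpoint.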

Addressing and isolation

Define BAR visibility, IOMMU policy, security boundaries and any required peer-memory modules.
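Part of the isolation review can be scripted on Linux, where each PCI device behind an active IOMMU has an `iommu_group` symlink in sysfs. Devices in the same group cannot be isolated from each other by the kernel, commonly because ACS is not available on the path between them; devices in separate groups below a shared switch typically indicate ACS isolation, which can redirect peer requests upstream. This sketch only reports group membership. The function names are hypothetical, and the `sysfs_root` parameter exists so the logic can be exercised against a test directory.

```python
import os
from typing import Optional

SYSFS_PCI = "/sys/bus/pci/devices"


def iommu_group(bdf: str, sysfs_root: str = SYSFS_PCI) -> Optional[int]:
    """IOMMU group number of a PCI device, or None if the kernel exposes none."""
    link = os.path.join(sysfs_root, bdf, "iommu_group")
    if not os.path.islink(link):
        return None
    # The link points at .../kernel/iommu_groups/<N>; the last component is the group.
    return int(os.path.basename(os.readlink(link)))


def share_iommu_group(bdf_a: str, bdf_b: str, sysfs_root: str = SYSFS_PCI) -> Optional[bool]:
    """True if both devices are in one group, False if isolated, None if unknown."""
    a = iommu_group(bdf_a, sysfs_root)
    b = iommu_group(bdf_b, sysfs_root)
    if a is None or b is None:
        return None
    return a == b
```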

Buffer lifecycle

Specify allocation, pinning, ownership, queue depth, completion, backpressure and recovery from partial transfers.
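The buffer rules can be made explicit with a small ownership tracker. This sketch is illustrative only: `BufferPool` and `Owner` are hypothetical names, the pool size stands in for queue depth, an empty pool signals backpressure to the source, and a failed completion hands the buffer back to the producer side for resubmission instead of silently freeing it. Pinning and DMA mapping are driver-specific and are not modelled.

```python
from collections import deque
from enum import Enum, auto
from typing import Deque, Dict, Optional


class Owner(Enum):
    FREE = auto()       # available to the source endpoint
    SOURCE = auto()     # owned by the producer: being filled or awaiting (re)submission
    IN_FLIGHT = auto()  # peer transfer to the destination is outstanding


class BufferPool:
    """Tracks ownership of a fixed set of DMA buffers; the pool size bounds queue depth."""

    def __init__(self, count: int) -> None:
        self.owner: Dict[int, Owner] = {i: Owner.FREE for i in range(count)}
        self.free: Deque[int] = deque(range(count))

    def acquire(self) -> Optional[int]:
        """Hand a free buffer to the source; None means apply backpressure."""
        if not self.free:
            return None
        idx = self.free.popleft()
        self.owner[idx] = Owner.SOURCE
        return idx

    def submit(self, idx: int) -> None:
        """The source has filled the buffer; start the transfer to the destination."""
        self._expect(idx, Owner.SOURCE)
        self.owner[idx] = Owner.IN_FLIGHT

    def complete(self, idx: int, ok: bool) -> None:
        """Handle a completion event for one buffer."""
        self._expect(idx, Owner.IN_FLIGHT)
        if ok:
            self.owner[idx] = Owner.FREE
            self.free.append(idx)
        else:
            # Failed or partial transfer: return the data to the producer side so
            # the caller can resubmit it, for example on the host-staged path.
            self.owner[idx] = Owner.SOURCE

    def in_flight(self) -> int:
        return sum(1 for o in self.owner.values() if o is Owner.IN_FLIGHT)

    def _expect(self, idx: int, state: Owner) -> None:
        current = self.owner.get(idx)
        if current is not state:
            raise RuntimeError(f"buffer {idx} is {current}, expected {state}")
```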

Performance proof

Measure sustained rate, latency, jitter, CPU utilization and data integrity with the actual endpoint combination.
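A measurement run can be reduced to most of these quantities with standard-library Python. This is a minimal sketch: it assumes fixed-size transfers and per-transfer latencies collected by the application, and it defines jitter as the population standard deviation of latency, which is only one of several common definitions. CPU utilization has to come from the host's own monitoring and is not computed here, and the CRC-32 comparison is a quick integrity check rather than proof of identical payloads.

```python
import statistics
import zlib
from typing import Dict, Sequence


def summarize_run(bytes_per_transfer: int, latencies_us: Sequence[float],
                  wall_time_s: float) -> Dict[str, float]:
    """Summarize one run of fixed-size transfers measured on the real endpoint pair."""
    n = len(latencies_us)
    if n == 0:
        raise ValueError("no transfers recorded")
    ordered = sorted(latencies_us)
    rank = -(-99 * n // 100)  # nearest-rank 99th percentile: ceil(0.99 * n)
    return {
        "rate_GBps": n * bytes_per_transfer / wall_time_s / 1e9,  # 1 GB = 1e9 bytes
        "mean_us": statistics.fmean(ordered),
        "p99_us": ordered[rank - 1],
        "jitter_us": statistics.pstdev(ordered),  # population standard deviation
    }


def payload_intact(sent: bytes, received: bytes) -> bool:
    """Cheap integrity check: equal length and matching CRC-32 of both copies."""
    return len(sent) == len(received) and zlib.crc32(sent) == zlib.crc32(received)
```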

Implementation layers

Building blocks used to realize the architecture

The final system combines compatible hardware, software, application logic and validation—not one standalone product.

Building block

Source endpoint

Acquisition card, FPGA, network adapter or another DMA-capable producer.

Building block

Processing endpoint

GPU, FPGA or accelerator with a supported peer-memory interface.

Building block

Destination endpoint

NVMe, NIC, frame buffer or another accelerator or processing stage.

Building block

PCIe fabric

Compatible root complex, switch and lane topology.

Building block

Driver integration

Resource mapping, requester registration, DMA queues, synchronization and completion handling.

Building block

System control

Health monitoring, backpressure, fallback and data-integrity checks.

Selection boundary

Where this architecture fits

Use the pattern when its ownership and data-movement model match the engineering requirement.

Deployment fit

Use this architecture

A high-rate pipeline loses performance or CPU budget because payloads are copied through host memory between stages.

Architecture boundary

Use another architecture

The endpoints or platform do not expose a supported P2P path, or isolation policy requires host staging.

Deployment fit

Typical deployments

DAQ-to-GPU, FPGA-to-GPU, frame-grabber-to-NVMe or FPGA-to-NVMe, accelerator pipelines and high-speed recording.

Project definition

Inputs required before system configuration

Architecture selection starts with the actual platform, traffic, software and recovery requirements.

Endpoint pairings: exact model, firmware, driver and supported DMA directions for each stage.

Topology map: root ports, switches, lane width, generation, ACS and IOMMU configuration.

Data profile: buffer size, rate, queue depth, latency and backpressure behaviour.

Fallback and recovery: host-staged mode, failed-transfer handling, device reset and application restart.
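The topology map and the data profile can be cross-checked with simple arithmetic. The sketch below uses the PCIe per-lane signalling rates and line encodings (8b/10b for Gen1 and Gen2, 128b/130b for Gen3 to Gen5) and treats the slowest hop as the limit for a peer route. Transaction-layer overhead is folded into a single `protocol_efficiency` factor whose default of 0.85 is an assumption, not a measured value; the function names are hypothetical.

```python
from typing import Iterable, Tuple

# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def link_GBps(gen: int, lanes: int) -> float:
    """Per-direction bandwidth after line encoding, before TLP/DLLP protocol overhead."""
    gt_per_s, encoding = PCIE_GEN[gen]
    return gt_per_s * encoding * lanes / 8  # Gbit/s -> GB/s


def path_GBps(hops: Iterable[Tuple[int, int]]) -> float:
    """A peer route is bounded by its slowest (generation, lanes) hop."""
    return min(link_GBps(gen, lanes) for gen, lanes in hops)


def required_GBps(buffer_bytes: int, buffers_per_s: float) -> float:
    """Payload rate implied by the data profile."""
    return buffer_bytes * buffers_per_s / 1e9


def headroom(hops: Iterable[Tuple[int, int]], buffer_bytes: int,
             buffers_per_s: float, protocol_efficiency: float = 0.85) -> float:
    # protocol_efficiency is an assumed placeholder for TLP header, flow-control and
    # payload-size effects; replace it with a value measured on the real endpoints.
    usable = path_GBps(hops) * protocol_efficiency
    return usable / required_GBps(buffer_bytes, buffers_per_s)
```

For example, 4 MiB buffers at 1,000 buffers per second need about 4.19 GB/s, against roughly 15.75 GB/s of encoded bandwidth on a Gen4 x8 hop. A headroom close to 1 means the budget should be confirmed by measurement before the topology is fixed.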

Define the topology before selecting the components.

Primionics can review the root-complex model, endpoint inventory, lane and bandwidth budget, software path, operating-system support and qualification requirements for the complete PCIe system.
