14) RoCEv2 Development on Windows with the Mellanox ConnectX Network Adapter

I.  Environment Preparation

1. Hardware and Driver Installation

Confirm that the network card firmware supports RoCEv2 (supported by default).

Install the latest Mellanox WinOF-2 driver (including NDK driver).

Install Mellanox Firmware Tools (MFT) for firmware management.

2. Development Tool Installation

Visual Studio 2019/2022 (with C++17 support required).

Install the Mellanox Windows Software Development Kit (SDK).

This includes header files (mlx4_win.h, mlx5_win.h, etc.), static library files (.lib), and dynamic link libraries (.dll).

3. Network Configuration

Enable RoCEv2 mode: Configure through the Mellanox driver configuration tool.

Configure the switch to support PFC and ECN (to ensure a lossless network).

Set the Windows firewall to allow RoCEv2 traffic (UDP port 4791).

II. Core Development Process

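1. RDMA Initialization

#include <infiniband/verbs.h> // ibv_* verbs API (libibverbs); verify that your SDK provides this header

// Initialize device list
int num_devices = 0;
ibv_device** dev_list = ibv_get_device_list(&num_devices); // nullptr or num_devices == 0 means no RDMA device
ibv_context* context = ibv_open_device(dev_list[0]); // Select the first device
ibv_free_device_list(dev_list); // The list is no longer needed once the device is open

// Each call below returns nullptr on failure and must be checked before the result is used.

// Allocate Protection Domain (PD)
ibv_pd* pd = ibv_alloc_pd(context);

// Create Completion Queue (CQ)
ibv_cq* cq = ibv_create_cq(context, CQ_DEPTH, nullptr, nullptr, 0);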

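2. Configure Queue Pair (QP)

ibv_qp_init_attr qp_init_attr = {};
qp_init_attr.qp_type = IBV_QPT_RC; // Reliable Connected, matching the RTR/RTS handshake below; RoCEv2 is not limited to Unreliable Datagram
qp_init_attr.send_cq = cq;
qp_init_attr.recv_cq = cq;
qp_init_attr.cap.max_send_wr = MAX_WR;
qp_init_attr.cap.max_recv_wr = MAX_WR;
qp_init_attr.cap.max_send_sge = 1; // At least one scatter/gather entry per WR
qp_init_attr.cap.max_recv_sge = 1;
ibv_qp* qp = ibv_create_qp(pd, &qp_init_attr); // nullptr on failure

// Transition QP state to INIT
ibv_qp_attr qp_attr = {};
qp_attr.qp_state = IBV_QPS_INIT;
qp_attr.pkey_index = 0;
qp_attr.port_num = PORT_NUM; // Physical port number
qp_attr.qp_access_flags = IBV_ACCESS_REMOTE_READ; // Remote operations the peer may perform on this QP
ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | IBV_QP_PORT | IBV_QP_ACCESS_FLAGS); // Returns 0 on success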

3. Memory Registration

// Register memory buffer (returns nullptr on failure)
ibv_mr* mr = ibv_reg_mr(pd, buffer, buffer_size,
                        IBV_ACCESS_LOCAL_WRITE |
                        IBV_ACCESS_REMOTE_READ);

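4. Connection Management

// Exchange QP information out of band (custom protocol required, e.g. over a TCP socket).
// RoCE addresses peers by GID rather than LID, so the GID is exchanged instead.
struct QPInfo {
    ibv_gid  gid; // Local value from ibv_query_gid(context, PORT_NUM, GID_INDEX, &gid)
    uint32_t qpn; // qp->qp_num
    uint32_t psn; // Initial packet sequence number (24 bits)
} local_info, remote_info;

// Transition QP state to RTR (Ready to Receive)
qp_attr = {};
qp_attr.qp_state = IBV_QPS_RTR;
qp_attr.path_mtu = IBV_MTU_1024; // Fits a 1500-byte Ethernet MTU; raise only if jumbo frames are enabled end to end
qp_attr.dest_qp_num = remote_info.qpn;
qp_attr.rq_psn = remote_info.psn;
qp_attr.max_dest_rd_atomic = 1;
qp_attr.min_rnr_timer = 12;
qp_attr.ah_attr.is_global = 1; // RoCE packets always carry global (GRH) addressing
qp_attr.ah_attr.port_num = PORT_NUM;
qp_attr.ah_attr.grh.dgid = remote_info.gid;
qp_attr.ah_attr.grh.sgid_index = GID_INDEX; // Placeholder like PORT_NUM: index of a RoCEv2 GID on the local port
qp_attr.ah_attr.grh.hop_limit = 64; // RoCEv2 is routable; choose a limit that fits the topology
ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU | IBV_QP_DEST_QPN |
              IBV_QP_RQ_PSN | IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER);

// Transition to RTS (Ready to Send)
qp_attr = {};
qp_attr.qp_state = IBV_QPS_RTS;
qp_attr.sq_psn = local_info.psn;
qp_attr.timeout = 14; // Example retry and timeout values; tune for the deployment
qp_attr.retry_cnt = 7;
qp_attr.rnr_retry = 7;
qp_attr.max_rd_atomic = 1;
ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT |
              IBV_QP_RNR_RETRY | IBV_QP_SQ_PSN | IBV_QP_MAX_QP_RD_ATOMIC);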
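The out-of-band exchange of QP information needs a wire format that both peers agree on. The following is a minimal sketch of one possible encoding, not a format defined by any Mellanox SDK: the struct WireQPInfo, the 24-byte layout, and the helper names are all assumptions made for illustration. It writes the QPN and PSN in network byte order so that the result does not depend on the host's endianness.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wire record: 16-byte GID, then QPN and PSN as big-endian 32-bit values.
struct WireQPInfo {
    std::array<std::uint8_t, 16> gid{};  // raw GID bytes, copied as-is
    std::uint32_t qpn = 0;               // queue pair number (24 significant bits)
    std::uint32_t psn = 0;               // initial packet sequence number (24 significant bits)
};

constexpr std::size_t kWireSize = 16 + 4 + 4;

// Store a 32-bit value in big-endian (network) order.
inline void put_be32(std::uint8_t* out, std::uint32_t v) {
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Load a 32-bit big-endian value.
inline std::uint32_t get_be32(const std::uint8_t* in) {
    return (std::uint32_t{in[0]} << 24) | (std::uint32_t{in[1]} << 16) |
           (std::uint32_t{in[2]} << 8) | std::uint32_t{in[3]};
}

inline std::array<std::uint8_t, kWireSize> encode(const WireQPInfo& info) {
    assert(info.qpn <= 0xFFFFFF && info.psn <= 0xFFFFFF);  // both fields are 24-bit values
    std::array<std::uint8_t, kWireSize> buf{};
    for (std::size_t i = 0; i < 16; ++i) buf[i] = info.gid[i];
    put_be32(buf.data() + 16, info.qpn);
    put_be32(buf.data() + 20, info.psn);
    return buf;
}

inline WireQPInfo decode(const std::array<std::uint8_t, kWireSize>& buf) {
    WireQPInfo info;
    for (std::size_t i = 0; i < 16; ++i) info.gid[i] = buf[i];
    info.qpn = get_be32(buf.data() + 16);
    info.psn = get_be32(buf.data() + 20);
    return info;
}
```

Each side would send the buffer returned by encode() over its side channel and pass the bytes it receives to decode() before filling in the RTR attributes.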

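5. Data Transfer

// Construct send request (the peer must already have posted a receive with ibv_post_recv)
ibv_sge sge = {};
sge.addr = (uintptr_t)buffer;
sge.length = data_len;
sge.lkey = mr->lkey;

ibv_send_wr wr = {};
wr.wr_id = 0x1234; // Custom identifier, returned in the completion
wr.opcode = IBV_WR_SEND;
wr.sg_list = &sge;
wr.num_sge = 1;
wr.send_flags = IBV_SEND_SIGNALED; // Request a completion entry for this WR

ibv_send_wr* bad_wr = nullptr;
if (ibv_post_send(qp, &wr, &bad_wr) != 0) {
    // Posting failed; bad_wr points at the WR that was rejected
}

// Poll Completion Queue
ibv_wc wc = {};
int ret;
do {
    ret = ibv_poll_cq(cq, 1, &wc); // Number of completions polled, negative on error
} while (ret == 0);
if (ret < 0 || wc.status != IBV_WC_SUCCESS) {
    // Polling failed or the WR completed in error; wc.status holds the reason
}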
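A polling loop that waits only for a non-zero return spins forever if no completion ever arrives. One way to bound it is sketched below. The helper poll_until is hypothetical and not part of any verbs library; its poll_once argument is assumed to wrap a call such as ibv_poll_cq(cq, 1, &wc) and to follow the same return convention (0 for an empty queue, positive for completions, negative for an error).

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Calls poll_once() repeatedly until it returns a non-zero value or the
// timeout expires. Returns the non-zero value, or std::nullopt on timeout.
template <typename PollFn>
std::optional<int> poll_until(PollFn&& poll_once, std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    do {
        const int ret = poll_once();
        if (ret != 0) return ret;  // completion(s) found, or an error code
    } while (std::chrono::steady_clock::now() < deadline);
    return std::nullopt;           // queue stayed empty until the deadline
}
```

A caller would pass a lambda that polls the real completion queue, then inspect the work completion status only when the returned value is positive.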

III. Key Optimization Points

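1. Zero-Copy Technology

Register the application's own buffers with ibv_reg_mr so that the adapter can DMA directly to and from user-space memory, without an intermediate copy.

Note that the IBV_ACCESS_ZERO_BASED flag only changes how a region is addressed (as offsets from zero); it is not what makes a transfer zero-copy.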

2. Batch Operations

Chain multiple WRs through the next pointer and submit them with a single ibv_post_send call.

This reduces per-call overhead, such as doorbell writes to the adapter; the verbs data path already bypasses the kernel.
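Batching works because ibv_send_wr carries a next field, so one ibv_post_send call can take the head of a linked list of WRs. The sketch below shows only the linking step and uses a hypothetical stand-in struct, SendWr, reduced to the fields that matter here. Its signaled member stands in for setting IBV_SEND_SIGNALED in send_flags, and requesting a completion only for the last WR assumes the QP was created without signaling every WR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ibv_send_wr, reduced to the fields used in this sketch.
struct SendWr {
    std::uint64_t wr_id = 0;
    SendWr* next = nullptr;   // next WR in the chain; nullptr terminates it
    bool signaled = false;    // stands in for IBV_SEND_SIGNALED in send_flags
};

// Links all elements of wrs into one chain, marks only the last one as
// signaled, and returns the head. The vector must stay alive and unmodified
// until the chain has been posted, because the chain points into it.
inline SendWr* chain(std::vector<SendWr>& wrs) {
    assert(!wrs.empty());
    for (std::size_t i = 0; i < wrs.size(); ++i) {
        const bool last = (i + 1 == wrs.size());
        wrs[i].next = last ? nullptr : &wrs[i + 1];
        wrs[i].signaled = last;
    }
    return &wrs.front();
}
```

With the real structure, the returned head would be the second argument of a single ibv_post_send call, and one completion would then account for the whole batch.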

3. Asynchronous Event Handling

Bind Windows IOCP with the completion queue.

Use ibv_get_async_event to listen for hardware events.

IV. Debugging and Testing Tools

1. Performance Testing

Use the mlx5_win_perf tool to test throughput and latency.

Validate bandwidth with custom Benchmark tools.

2. Protocol Analysis

Install the RoCEv2 parsing plugin for Wireshark.

Filter on udp.port == 4791 to inspect data packets.
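When a capture has to be checked programmatically, it helps to know what follows the UDP header. In RoCEv2 the UDP payload starts with the 12-byte InfiniBand Base Transport Header (BTH). The sketch below decodes a few of its fields; the struct Bth and the function parse_bth are names chosen for this illustration, and the input is assumed to point at the first byte of the UDP payload of a packet sent to port 4791.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::size_t kBthSize = 12;  // the Base Transport Header is 12 bytes long

// Subset of BTH fields that is usually enough to follow a flow in a capture.
struct Bth {
    std::uint8_t opcode;    // transport type and operation
    std::uint16_t pkey;     // partition key
    std::uint32_t dest_qp;  // destination queue pair number (24 bits)
    bool ack_req;           // acknowledge-request bit
    std::uint32_t psn;      // packet sequence number (24 bits)
};

// Parses the BTH at the start of a RoCEv2 UDP payload.
// Returns std::nullopt if the payload is too short to contain one.
inline std::optional<Bth> parse_bth(const std::uint8_t* p, std::size_t len) {
    assert(p != nullptr || len == 0);
    if (len < kBthSize) return std::nullopt;
    Bth h{};
    h.opcode = p[0];
    h.pkey = static_cast<std::uint16_t>((p[2] << 8) | p[3]);
    h.dest_qp = (std::uint32_t{p[5]} << 16) | (std::uint32_t{p[6]} << 8) | std::uint32_t{p[7]};
    h.ack_req = (p[8] & 0x80) != 0;
    h.psn = (std::uint32_t{p[9]} << 16) | (std::uint32_t{p[10]} << 8) | std::uint32_t{p[11]};
    return h;
}
```

Comparing dest_qp and psn against the values exchanged during connection setup is a quick way to confirm that the captured traffic belongs to the QP under test.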

3. Mellanox Diagnostic Tools

Run mlx_fw_checker to verify firmware status.

Use mlxlink to check physical link quality.

V. Precautions

1. Windows-Specific Behaviors

Programs need to be run with administrative privileges.

Some APIs require dynamic invocation via MLX5_WIN.dll.

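2. Compatibility Issues

Define an explicit byte order (network byte order is the usual choice) for the connection data exchanged out of band with Linux endpoints.

Verify that the MTU matches on both endpoints and on the switches between them (a 4096-byte RoCE MTU is recommended, which requires an Ethernet MTU large enough to carry that payload plus the packet headers).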
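The RoCE path MTU is one of the InfiniBand sizes 256, 512, 1024, 2048, or 4096 bytes, and it has to fit inside the Ethernet MTU together with the protocol headers. The sketch below picks the largest size that fits. The function largest_ib_mtu is hypothetical, and the 80-byte overhead is an assumption of this example (IPv6 header, UDP header, BTH, one RETH, and the ICRC), chosen to be conservative; pass a different value if your traffic differs.

```cpp
#include <cassert>
#include <cstdint>

// Assumed per-packet overhead carried inside the Ethernet payload:
// IPv6 (40) + UDP (8) + BTH (12) + RETH (16) + ICRC (4) = 80 bytes.
constexpr std::uint32_t kAssumedOverhead = 40 + 8 + 12 + 16 + 4;

// Returns the largest InfiniBand path MTU whose packets fit in eth_mtu,
// or 0 if even the smallest size (256) does not fit.
inline std::uint32_t largest_ib_mtu(std::uint32_t eth_mtu,
                                    std::uint32_t overhead = kAssumedOverhead) {
    assert(eth_mtu <= 65535);  // sanity bound on the Ethernet MTU
    constexpr std::uint32_t sizes[] = {4096, 2048, 1024, 512, 256};
    for (std::uint32_t s : sizes) {
        if (eth_mtu >= s + overhead) return s;
    }
    return 0;
}
```

Under these assumptions a standard 1500-byte Ethernet MTU yields 1024, while 4096 needs an Ethernet MTU of at least 4176 bytes. Both endpoints should use the smaller of their two results.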

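3. Security Mechanisms

Use the RDMA CM (Connection Manager Abstraction, CMA) to establish connections, and restrict what a peer may do through memory-region access flags and remote keys.

Use IPsec to encrypt RoCEv2 traffic (hardware support required).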

VI. Reference Resources

1. Official Documentation:

Mellanox WinOF-2 User Manual

RDMA Aware Networks Programming User Manual

2. Sample Code:

windows_examples branch on the Mellanox GitHub repository.

Windows Direct Access samples on MSDN.

3. Community Support:

Mellanox Developer Forum

Windows Hardware Dev Center

By following this plan, gradual integration of RoCEv2 functionality can be achieved. It is recommended to start with a simple PingPong test program and gradually expand to a complete application.