Installation
Breaking Change Notice
After v0.11.2, vLLM-MetaX moved its _C and _moe_C kernels into a separate package named mcoplib.
mcoplib is open-sourced at MetaX-mcoplib and maintains its own release cycle. Each vllm-metax release relies on a corresponding version of mcoplib; check the Release Page for the matching version.
The csrc folder is still kept in this repo for development convenience, but there is no guarantee that its code is always in sync with mcoplib. Both performance and correctness may differ from mcoplib.
To build and use the vllm-metax csrc, you need to set:
in both the build and runtime environments.
Always use mcoplib in production.
Requirements
- OS: Linux
- Python: 3.10 -- 3.12
- Hardware: MetaX C-series
- SDK: MACA-SDK
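As a quick sanity check before building, the Python requirement above can be verified from the shell. This is a minimal sketch: `python_version_supported` is a hypothetical helper name, not part of vllm-metax, and it only compares the major.minor version against the 3.10 to 3.12 range listed above.

```shell
# Hypothetical helper (not part of vllm-metax): succeed only if the given
# Python version, or the python3 found on PATH when no argument is passed,
# falls in the supported 3.10 -- 3.12 range.
python_version_supported() {
    ver="${1:-}"
    if [ -z "$ver" ]; then
        ver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    fi
    major="${ver%%.*}"      # text before the first dot
    minor="${ver#*.}"       # text after the first dot
    minor="${minor%%.*}"    # drop any patch component such as ".4"
    [ "$major" -eq 3 ] && [ "$minor" -ge 10 ] && [ "$minor" -le 12 ]
}
```

Calling `python_version_supported` with no argument checks the interpreter on PATH; passing a version string such as `3.11` checks that value instead.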
Build from source
Prepare environment
# setup MACA path
export MACA_PATH="/opt/maca"
# cu-bridge
export CUCC_PATH="${MACA_PATH}/tools/cu-bridge"
export CUDA_PATH="${HOME}/cu-bridge/CUDA_DIR"
export CUCC_CMAKE_ENTRY=2
# update PATH
export PATH=${MACA_PATH}/mxgpu_llvm/bin:${MACA_PATH}/bin:${CUCC_PATH}/tools:${CUCC_PATH}/bin:${PATH}
export LD_LIBRARY_PATH=${MACA_PATH}/lib:${MACA_PATH}/ompi/lib:${MACA_PATH}/mxgpu_llvm/lib:${LD_LIBRARY_PATH}
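Before building, it may help to confirm that the directories referenced by the exports above actually exist. The sketch below assumes the same layout as those exports (`bin`, `lib`, `mxgpu_llvm/bin`, and `tools/cu-bridge` under `MACA_PATH`); `check_maca_env` is a hypothetical helper name, not a tool shipped with the MACA SDK.

```shell
# Hypothetical helper: report which of the directories used in the
# exports above are missing under MACA_PATH (default: /opt/maca).
check_maca_env() {
    maca="${MACA_PATH:-/opt/maca}"
    missing=0
    for dir in "$maca/bin" "$maca/lib" "$maca/mxgpu_llvm/bin" "$maca/tools/cu-bridge"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected directory was found.
    [ "$missing" -eq 0 ]
}
```

Running `check_maca_env` after the exports prints one `missing:` line per absent directory and returns a non-zero status if any are absent.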
Note
If using pip, all build and installation steps assume the corresponding Docker images, which are listed on the QuickStart page. Pass the --no-build-isolation flag throughout the package build, because the build needs the requirements that are pre-installed in the released Docker image.
Note
uv does not rely on any packages pre-installed in the Docker image; it installs all dependencies from scratch into a virtual environment.
UV installation guide
We recommend installing uv with pip (this is optional):
Then create the virtual environment with Python 3.10 or above:
Then activate the virtual environment:
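The three uv steps above (install, create, activate) can be sketched as a single shell function. This is an illustration only: `setup_uv_venv` is a hypothetical name, and the `.venv` directory and `3.10` defaults are assumptions chosen to match the requirements, not values taken from the original guide.

```shell
# Hypothetical helper: install uv if needed, create a virtual environment,
# and activate it in the current shell.
#   $1: venv directory (assumed default: .venv)
#   $2: Python version (assumed default: 3.10)
setup_uv_venv() {
    venv_dir="${1:-.venv}"
    py_version="${2:-3.10}"
    # Install uv with pip only when it is not already available.
    if ! command -v uv >/dev/null 2>&1; then
        python3 -m pip install uv || return 1
    fi
    # Create the environment with the requested interpreter version.
    uv venv --python "$py_version" "$venv_dir" || return 1
    # Activate it; this must run in the current shell, not a subshell.
    . "$venv_dir/bin/activate"
}
```

For example, `setup_uv_venv .venv 3.11` creates and activates `.venv` with Python 3.11. Because activation changes the current shell, the function has to be defined and called in the interactive session rather than run as a separate script.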
You need to manually configure the MetaX PyPI repository so that MACA-related dependencies can be downloaded during installation.
Build vllm-metax plugin
Clone the vllm-metax project:
Build the plugin:
Build vllm
Warning
It's not recommended to install vllm directly from PyPI with:
This installs a CUDA build of vllm, which carries many CUDA-related dependencies and kernel files along with the bundled vllm_flash_attn and triton_kernels. These can trigger spurious runtime errors in some checks. (For example, preconditions that should only be detected on CUDA may mistakenly be applied to the MACA backend in this kind of vllm.)
Clone the vllm project:
Build with empty device:
To build vLLM using a local uv environment
About isolation
--no-build-isolation is optional; we add it to speed up installation. Without it, uv still tries to download CUDA-related packages during the build even when VLLM_TARGET_DEVICE=empty is set, which may take a long time.
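The empty-device build described above can be sketched as follows, for both the pip and uv paths. This is an assumption-laden illustration: `build_vllm_empty` and the `DRY_RUN` switch are hypothetical, the source directory name `vllm` assumes a default clone, and the exact install command in the original guide may differ (for example, an editable install or a pinned vllm version).

```shell
# Hypothetical helper: install vllm from a local checkout with
# VLLM_TARGET_DEVICE=empty so that no CUDA kernels are built.
#   $1: path to the vllm checkout (assumed default: vllm)
#   $2: installer, "pip" or "uv" (assumed default: pip)
# Set DRY_RUN=1 to print the command instead of running it.
build_vllm_empty() {
    src_dir="${1:-vllm}"
    installer="${2:-pip}"
    case "$installer" in
        pip) set -- pip install -v . --no-build-isolation ;;
        uv)  set -- uv pip install -v . --no-build-isolation ;;
        *)   echo "unknown installer: $installer" >&2; return 2 ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
        echo "(cd $src_dir && VLLM_TARGET_DEVICE=empty $*)"
        return 0
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$src_dir" && VLLM_TARGET_DEVICE=empty "$@")
}
```

Running `DRY_RUN=1 build_vllm_empty vllm uv` first is a cheap way to review the exact command before starting a long build. Since --no-build-isolation is optional, it can be dropped from the `set --` lines if an isolated build is preferred.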
Post build (Significant)