Installation

Breaking Change Notice

After v0.11.2, vLLM-MetaX moved its _C and _moe_C kernels into a separate package named mcoplib.

mcoplib is open-sourced at MetaX-mcoplib and maintains its own release cycle. Each vllm-metax release relies on a corresponding version of mcoplib; check the Release Page for the matching version.

The csrc folder is still kept in this repo for development convenience, but there is no guarantee that its code stays in sync with mcoplib. Both performance and correctness may differ from mcoplib.

To build and use the vllm-metax csrc, you need to set:

export USE_PRECOMPILED_KERNEL=0

in both the build and runtime environments.

Always use mcoplib in production.
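Because the variable has to be present at both build time and run time, it can help to confirm its value before each step. The following is a minimal sketch, not part of vllm-metax: the helper name kernel_source is hypothetical, and treating an unset variable as "use mcoplib" is an assumption inferred from the notice above.

```shell
# Hypothetical helper: report which kernel source the current environment selects.
# Assumption: any value other than 0 (including unset) means the precompiled mcoplib kernels.
kernel_source() {
    if [ "${USE_PRECOMPILED_KERNEL:-1}" = "0" ]; then
        echo "csrc"
    else
        echo "mcoplib"
    fi
}

echo "kernel source: $(kernel_source)"
```

Running the same check in the shell that builds the plugin and in the shell that later launches vllm shows whether the two environments agree.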

Requirements

OS: Linux
Python: 3.10 -- 3.12
Hardware: MetaX C-series
SDK: MACA-SDK
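The Python requirement can be checked before starting a build. This is a minimal sketch; the helper name python_version_supported is hypothetical, and it only compares version numbers against the 3.10 to 3.12 range listed above.

```shell
# Hypothetical helper: succeed if a "major.minor[.patch]" version string is within 3.10 to 3.12.
python_version_supported() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" = "3" ] && [ "$minor" -ge 10 ] && [ "$minor" -le 12 ]
}

# Example: check the interpreter on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
    version=$(python3 -c 'import platform; print(platform.python_version())')
    if python_version_supported "$version"; then
        echo "Python $version is supported"
    else
        echo "Python $version is outside 3.10 to 3.12"
    fi
fi
```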

Build from source

Prepare environment

# setup MACA path
export MACA_PATH="/opt/maca"

# cu-bridge
export CUCC_PATH="${MACA_PATH}/tools/cu-bridge"
export CUDA_PATH="${HOME}/cu-bridge/CUDA_DIR"
export CUCC_CMAKE_ENTRY=2

# update PATH
export PATH=${MACA_PATH}/mxgpu_llvm/bin:${MACA_PATH}/bin:${CUCC_PATH}/tools:${CUCC_PATH}/bin:${PATH}
export LD_LIBRARY_PATH=${MACA_PATH}/lib:${MACA_PATH}/ompi/lib:${MACA_PATH}/mxgpu_llvm/lib:${LD_LIBRARY_PATH}
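A wrong MACA_PATH usually shows up later as a confusing compiler or linker error, so a quick directory check after exporting the variables above may save time. This is a sketch under assumptions: the helper name missing_dirs is hypothetical, and the directories checked are simply the ones referenced in the exports above.

```shell
# Hypothetical helper: print every directory from the argument list that does not exist.
missing_dirs() {
    for dir in "$@"; do
        [ -d "$dir" ] || echo "$dir"
    done
}

# Assumes MACA_PATH and CUCC_PATH were exported as shown above; the defaults are only fallbacks.
MACA_PATH="${MACA_PATH:-/opt/maca}"
CUCC_PATH="${CUCC_PATH:-${MACA_PATH}/tools/cu-bridge}"

missing=$(missing_dirs "${MACA_PATH}/bin" "${MACA_PATH}/lib" "${MACA_PATH}/mxgpu_llvm/bin" "${CUCC_PATH}/bin")
if [ -n "$missing" ]; then
    echo "These directories are missing; check the MACA SDK installation:"
    echo "$missing"
else
    echo "MACA directories found"
fi
```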

PIP

Note

If you use pip, all build and installation steps assume the corresponding Docker image, which you can find on the QuickStart page. Pass the --no-build-isolation flag throughout the build, because the build relies on the requirements pre-installed in the released Docker image.

UV

Note

uv does not rely on any packages pre-installed in the Docker image; it installs all dependencies from scratch in a virtual environment.

UV installation guide

We recommend installing uv with pip, although this is not required:

pip install uv

Then create the virtual environment with a supported Python version (3.10 to 3.12):

uv venv /opt/venv --python python3.10

And activate the virtual environment:

source /opt/venv/bin/activate

You need to manually configure the MetaX PyPI repository so that MACA-related dependencies can be downloaded during installation:

export UV_EXTRA_INDEX_URL=https://repos.metax-tech.com/r/maca-pypi/simple
export UV_INDEX_STRATEGY=unsafe-best-match

Optional: Change the default PyPI mirror

You can set the Aliyun PyPI mirror as the default index to speed up downloads of packages unrelated to MetaX:

export UV_INDEX_URL=https://mirrors.aliyun.com/pypi/simple
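uv reads these index settings from the environment, so a typo or a missing export only becomes visible when a MACA package fails to resolve. The sketch below prints the effective values; the helper name uv_index_summary is hypothetical. As far as we understand uv's behavior, unsafe-best-match lets it pick the best matching version across all configured indexes instead of stopping at the first index that provides the package.

```shell
# Hypothetical helper: summarize the index-related variables that uv reads from the environment.
uv_index_summary() {
    echo "default index: ${UV_INDEX_URL:-uv default}"
    echo "extra index:   ${UV_EXTRA_INDEX_URL:-not set}"
    echo "strategy:      ${UV_INDEX_STRATEGY:-uv default}"
}

uv_index_summary
```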

Build vllm-metax plugin

Clone vllm-metax project:

git clone --branch releases/v0.21.0 https://github.com/MetaX-MACA/vLLM-metax
cd vLLM-metax

Build the plugin:

PIP

python use_existing_metax.py
pip install -r requirements/build.txt
pip install . --no-build-isolation

Additional installation options

If you want to develop vllm-metax, install it in editable mode instead.

pip install -v -e . --no-build-isolation

Optionally, build a portable wheel which you can then install elsewhere.

python -m build -w -n
pip install dist/*.whl

UV

uv pip install -r requirements/build.txt
uv pip install .

Additional installation options

If you want to develop vllm-metax, install it in editable mode instead.

uv pip install -v -e .

Optionally, build a portable wheel which you can then install elsewhere.

uv build --wheel
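Both wheel-building commands above are expected to write their output to the dist/ directory. The following sketch lists what was produced; the helper name list_wheels is hypothetical, and the dist/ location is an assumption based on the default behavior of python -m build and uv build.

```shell
# Hypothetical helper: print every wheel in a directory (default: dist).
# Returns non-zero when no wheel is found.
list_wheels() {
    dir=${1:-dist}
    found=1
    for f in "$dir"/*.whl; do
        # An unmatched glob stays literal, so test that the file really exists.
        if [ -f "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    return "$found"
}

if list_wheels dist; then
    echo "wheel(s) listed above can be installed elsewhere"
else
    echo "no wheel found in dist/"
fi
```

With uv, a wheel found this way can then be installed with uv pip install followed by the wheel path, mirroring the pip install dist/*.whl step shown for pip.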

Build vllm

Warning

It is not recommended to install vllm directly from PyPI with:

uv pip install vllm==X.Y.Z

This installs a CUDA build of vllm, which brings in many CUDA-related dependencies and kernel files along with the bundled vllm_flash_attn and triton_kernels. These can cause spurious runtime errors in some checks. (For example, preconditions that should only be evaluated on CUDA may be applied to the MACA backend by mistake.)
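To see whether an environment already contains a CUDA build, one rough heuristic is to look for CUDA runtime packages in the installed package list. This is a sketch, not an official check: the helper name cuda_packages is hypothetical, the "nvidia-" name prefix is an assumption about how CUDA dependencies are usually packaged, and the sample input uses made-up versions.

```shell
# Hypothetical helper: read "name==version" lines on stdin and print the ones
# that look like CUDA runtime packages.
# Assumption: CUDA builds pull in packages whose names start with "nvidia-".
cuda_packages() {
    grep -i '^nvidia-' || true
}

# Example with made-up sample input; in practice pipe in the output of `uv pip freeze`.
printf '%s\n' 'numpy==0.0.0' 'nvidia-cublas-cu12==0.0.0' | cuda_packages
```

An empty result does not prove the environment is clean; it only means no package matched the assumed prefix.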

Clone vllm project:

git clone --depth 1 --branch releases/v0.21.0 https://github.com/vllm-project/vllm
cd vllm

Build with empty device:

PIP

To build vLLM using an existing PyTorch installation

python use_existing_torch.py
pip install -r requirements/build.txt
VLLM_TARGET_DEVICE=empty pip install . --no-build-isolation

UV

To build vLLM using a local uv environment

VLLM_TARGET_DEVICE=empty uv pip install . --no-build-isolation

About isolation

--no-build-isolation is optional; we add it to speed up installation. Without it, uv still tries to download CUDA-related packages during the build even if you set VLLM_TARGET_DEVICE=empty, which may take a long time.

Post build (Significant)

PIP

Note

None (no post-build step is required with pip).

UV

Note

vllm-metax currently supports building only with numpy<2, but building vllm upgrades numpy to numpy>2. Restore a compatible version manually with:

uv pip install 'numpy<2'
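After the downgrade, it is worth confirming that the active environment really resolves to a 1.x release. A minimal sketch, assuming python3 on PATH is the interpreter of the virtual environment; the helper name numpy_is_1x is hypothetical.

```shell
# Hypothetical helper: succeed if a numpy version string has a major version below 2.
numpy_is_1x() {
    major=${1%%.*}
    [ "$major" -lt 2 ] 2>/dev/null
}

# Query the installed numpy; the result is empty if python3 or numpy is unavailable.
version=$(python3 -c 'import numpy; print(numpy.__version__)' 2>/dev/null || true)
if [ -z "$version" ]; then
    echo "numpy is not importable in this environment"
elif numpy_is_1x "$version"; then
    echo "numpy $version is compatible"
else
    echo "numpy $version is too new; run: uv pip install 'numpy<2'"
fi
```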