How to add a TpuLang interface

Question

This question explains how to complete a "Complete TpuLang Interfaces" issue.
It walks through adding a TpuLang interface step by step.

zhang · Answer

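1. Pick a TpuLang interface task, for example the Conv interface

2. Read the interface definition and parameter descriptions

Taking conv_v2 as an example, the interface is defined as follows: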

    def conv_v2(tensor_i,
                weight,
                bias=None,
                stride=None,
                dilation=None,
                pad=None,
                group=1,
                input_zp=None,
                weight_zp=None,
                out_dtype=None,
                out_name=None):
        pass  # body omitted here

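Parameters

* tensor_i: Tensor. The input tensor, 4-D in NCHW layout.
* weight: Tensor. The convolution kernel tensor, 4-D in [oc, ic, kh, kw] layout, where oc is the number of output channels, ic is the number of input channels, kh is kernel_h, and kw is kernel_w.
* bias: Tensor. The bias tensor. None means no bias; otherwise its shape must be [1, oc, 1, 1].
* dilation: List[int]. The dilation size. None means [1, 1]; otherwise the list must have length 2, ordered [height, width].
* pad: List[int]. The padding size. None means [0, 0, 0, 0]; otherwise the list must have length 4, ordered [top, bottom, left, right].
* stride: List[int]. The stride size. None means [1, 1]; otherwise the list must have length 2, ordered [height, width].
* group: int. The number of groups in the convolution layer. When ic = oc = group, the convolution is a depthwise conv.
* input_zp: List[int] or int. The input zero point. None means 0; a list must have length ic.
* weight_zp: List[int] or int. The kernel zero point. None means 0; a list must have length ic, where ic is the number of input channels.
* out_dtype: string or None. The data type of the output tensor. When the input tensor is float16/float32, None means the output type matches the input; otherwise None means int32. Allowed values: int32/uint32/float32/float16.
* out_name: string or None. The name of the output tensor. When None, a name is generated internally.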

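Return value

Returns a Tensor whose data type is determined by out_dtype.

The conv interface takes 11 arguments, each described in detail in the parameter descriptions.
tensor_i, weight, and bias are input tensors; the rest are parameters.
The return value is a tensor whose data type and name are determined by out_dtype and out_name respectively.
Of the 11 arguments, only tensor_i and weight are required. The others may be omitted, and their defaults are given in the parameter descriptions.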
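To make the documented defaults and length constraints concrete, here is a minimal sketch that applies them to plain Python lists. The helper name normalize_conv_args is hypothetical and is not part of TpuLang; it only mirrors the rules stated in the parameter descriptions.

```python
from typing import List, Optional, Sequence, Tuple


def normalize_conv_args(
    input_shape: Sequence[int],
    weight_shape: Sequence[int],
    stride: Optional[List[int]] = None,
    dilation: Optional[List[int]] = None,
    pad: Optional[List[int]] = None,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: apply the documented defaults and length checks."""
    # tensor_i is 4-D NCHW and weight is 4-D [oc, ic, kh, kw].
    if len(input_shape) != 4 or len(weight_shape) != 4:
        raise ValueError("tensor_i and weight must both be 4-D")
    # None selects the documented default value.
    stride = [1, 1] if stride is None else list(stride)
    dilation = [1, 1] if dilation is None else list(dilation)
    pad = [0, 0, 0, 0] if pad is None else list(pad)
    if len(stride) != 2 or len(dilation) != 2:
        raise ValueError("stride and dilation need exactly 2 entries")
    if len(pad) != 4:
        raise ValueError("pad needs exactly 4 entries")
    return stride, dilation, pad


defaults = normalize_conv_args([1, 3, 32, 32], [12, 3, 3, 3])
```

Calling it with only the two required shapes returns the three default lists, which matches the statement that everything except tensor_i and weight can be left out.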

3. Look at the Top-layer conv definition

Open TopOps.td and find the arguments and results definitions for conv:

  let arguments = (ins
    AnyTensor:$input,
    AnyTensor:$filter,
    AnyTensorOrNone:$bias,
    I64ArrayAttr:$kernel_shape,
    I64ArrayAttr:$strides,
    I64ArrayAttr:$pads, // top,left,bottom,right
    DefaultValuedAttr<I64Attr, "1">:$group,
    OptionalAttr<I64ArrayAttr>:$dilations,
    OptionalAttr<I64ArrayAttr>:$inserts,
    DefaultValuedAttr<BoolAttr, "false">:$do_relu,
    DefaultValuedAttr<F64Attr, "-1.0">:$relu_limit
  );

  let results = (outs AnyTensor:$output);

The op takes three tensor inputs, where bias is either a tensor or none.
kernel_shape, strides, pads, dilations, and inserts are I64ArrayAttr, i.e. int64 arrays.
group is I64Attr, i.e. int64.
do_relu is BoolAttr, i.e. bool.
relu_limit is F64Attr, i.e. float64.
The output is a single tensor.
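The attribute types above can be written down as a small lookup table and used to sanity-check a params dict before it is handed to the op. This is only an illustrative sketch: CONV_ATTRS, matches, and missing_required are hypothetical names, and treating the three plain I64ArrayAttr entries as the required ones is an assumption read off the listing.

```python
# Attribute name -> (attribute type, required?), read from the conv listing.
# Attributes wrapped in DefaultValuedAttr or OptionalAttr are treated as optional.
CONV_ATTRS = {
    "kernel_shape": ("I64ArrayAttr", True),
    "strides": ("I64ArrayAttr", True),
    "pads": ("I64ArrayAttr", True),
    "group": ("I64Attr", False),
    "dilations": ("I64ArrayAttr", False),
    "inserts": ("I64ArrayAttr", False),
    "do_relu": ("BoolAttr", False),
    "relu_limit": ("F64Attr", False),
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def matches(kind, value):
    """Return True if a plain Python value fits the named attribute type."""
    if kind == "I64ArrayAttr":
        return isinstance(value, (list, tuple)) and all(_is_int(v) for v in value)
    if kind == "I64Attr":
        return _is_int(value)
    if kind == "BoolAttr":
        return isinstance(value, bool)
    if kind == "F64Attr":
        return isinstance(value, float)
    return False


def missing_required(params):
    """List required attribute names that are absent from params."""
    return sorted(name for name, (_, required) in CONV_ATTRS.items()
                  if required and name not in params)
```

For example, missing_required({"kernel_shape": [3, 3], "strides": [1, 1]}) reports that pads still has to be supplied.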

4. Add the conv_v2 interface

Add the interface in tpu-mlir/python/transform/TpuLang.py.

1) Add the interface definition

def conv_v2(input: Tensor,
        weight: Tensor,
        bias: Tensor = None,
        stride: List[int] = None,
        dilation: List[int] = None,
        pad: List[int] = None,
        group=1,
        input_zp: Union[int, List[int]] = None,
        weight_zp: Union[int, List[int]] = None,
        out_dtype: str = None,
        out_name: str = None):

The parameters match the interface definition, with type annotations added to constrain the inputs.

2) Set default parameters

    dilation = [1, 1] if dilation is None else dilation
    stride = [1, 1] if stride is None else stride
    pad = [0, 0, 0, 0] if pad is None else pad

Any argument passed as None is replaced by its default value.

3) Convert the parameters to attrs

    attr = {
        "kernel_shape": ArrayAttr(weight.shape[2:]),
        "strides": ArrayAttr(stride),
        "dilations": ArrayAttr(dilation),
        "pads": ArrayAttr(pad),
        "do_relu": Attr(False, "bool"),
        "group": Attr(group)
    }

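The interface has to convert its parameters into a form that tpu-mlir can recognize, which is what ArrayAttr and Attr do. For both, the first argument is the data and the second is the type; the default type is "int64".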
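The (data, type) convention can be illustrated with a pair of toy stand-ins. These are not tpu-mlir's Attr and ArrayAttr, whose internals are not shown here; ToyAttr, toy_attr, and toy_array_attr are hypothetical names that only mimic the calling convention described above, including the "int64" default.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class ToyAttr:
    """Toy stand-in: a value tagged with a type name."""
    data: Any
    data_type: str
    is_array: bool = False


def toy_attr(data: Any, data_type: str = "int64") -> ToyAttr:
    # Scalar attribute; the type defaults to "int64" as described in the text.
    return ToyAttr(data, data_type)


def toy_array_attr(data: Sequence[Any], data_type: str = "int64") -> ToyAttr:
    # Array attribute; the data is copied into a tuple so it cannot be mutated.
    return ToyAttr(tuple(data), data_type, is_array=True)


toy_params = {
    "strides": toy_array_attr([2, 2]),
    "do_relu": toy_attr(False, "bool"),
    "group": toy_attr(1),
}
```

Only do_relu needs an explicit type here; the integer-valued entries rely on the default.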

4) Set quantization parameters

    input.quantization(zero_point=input_zp)
    weight.quantization(zero_point=weight_zp)

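This sets the quantization parameters (scale, zero_point) of the input or output tensors. Here only zero_point is set, on input and weight.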
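Since input_zp and weight_zp accept None, an int, or a list of length ic, one way to reason about them is to expand every form into a per-channel list. The sketch below does that with a hypothetical helper, per_channel_zero_point; whether TpuLang performs such an expansion internally is not stated here, so treat it purely as an illustration of the documented rules.

```python
from typing import List, Optional, Union


def per_channel_zero_point(zp: Union[int, List[int], None], ic: int) -> List[int]:
    """Hypothetical helper: expand a zero point into one value per input channel."""
    if zp is None:
        # None is documented to mean a zero point of 0.
        return [0] * ic
    if isinstance(zp, int):
        # A single int applies to every channel.
        return [zp] * ic
    zp = list(zp)
    if len(zp) != ic:
        raise ValueError(f"zero point list must have length ic={ic}, got {len(zp)}")
    return zp
```

With ic = 3, the values 5 and -8 used in the int8 test case would expand to [5, 5, 5] and [-8, -8, -8].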

5) Create the output tensor

    o_dtype = "int32"
    if out_dtype is not None:
        o_dtype = out_dtype
    elif input.dtype == "float32" or input.dtype == "float16":
        o_dtype = input.dtype
    def _shape_inference():
        kh_ext = dilation[0] * (weight.shape[2] - 1) + 1
        kw_ext = dilation[1] * (weight.shape[3] - 1) + 1
        oh = (input.shape[2] + pad[0] + pad[1] - kh_ext) // stride[0] + 1
        ow = (input.shape[3] + pad[2] + pad[3] - kw_ext) // stride[1] + 1
        return [input.shape[0], weight.shape[0], oh, ow]
    output = Tensor(_shape_inference(), dtype=o_dtype, name=out_name)

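The output tensor is created from the parameters. At this point the interface has to perform type inference and shape inference on the input tensors itself (the explicit type_inference and shape_inference may be removed in the future).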
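The inference rules can be checked by hand on plain shape lists. The sketch below restates the same formulas as standalone functions (conv_output_shape and conv_output_dtype are hypothetical names, not TpuLang functions) and evaluates them on shapes taken from the unit test later in this answer.

```python
def conv_output_shape(input_shape, weight_shape, stride, dilation, pad):
    """Same arithmetic as _shape_inference, on plain lists."""
    # Effective kernel extent once dilation is taken into account.
    kh_ext = dilation[0] * (weight_shape[2] - 1) + 1
    kw_ext = dilation[1] * (weight_shape[3] - 1) + 1
    # pad is indexed as [top, bottom, left, right].
    oh = (input_shape[2] + pad[0] + pad[1] - kh_ext) // stride[0] + 1
    ow = (input_shape[3] + pad[2] + pad[3] - kw_ext) // stride[1] + 1
    return [input_shape[0], weight_shape[0], oh, ow]


def conv_output_dtype(input_dtype, out_dtype=None):
    """Same rule as the o_dtype selection: explicit value, else float passthrough, else int32."""
    if out_dtype is not None:
        return out_dtype
    if input_dtype in ("float32", "float16"):
        return input_dtype
    return "int32"


# 1x1 kernel, unit stride, no padding: spatial size is unchanged.
print(conv_output_shape([1, 3, 28, 28], [12, 3, 1, 1], [1, 1], [1, 1], [0, 0, 0, 0]))
# -> [1, 12, 28, 28]

# 3x3 kernel, stride 2, padding 1: (32 + 1 + 1 - 3) // 2 + 1 = 16.
print(conv_output_shape([1, 3, 32, 32], [12, 3, 3, 3], [2, 2], [1, 1], [1, 1, 1, 1]))
# -> [1, 12, 16, 16]

print(conv_output_dtype("int8"))     # -> int32
print(conv_output_dtype("float16"))  # -> float16
```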

6) Register the op and return the output

    inputs = [input, weight, bias]
    TpuLang.insert_op(Top.ConvOp, inputs=inputs, outputs=[output], params=attr)
    return output

Given op_name, inputs, outputs, and params, this registers the op with TpuLang.
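Conceptually, registering an op means appending a record of (op_name, inputs, outputs, params) to the graph being built. The toy recorder below shows that idea only; ToyGraph is a hypothetical class and makes no claim about how TpuLang.insert_op is actually implemented.

```python
class ToyGraph:
    """Toy op recorder; NOT TpuLang's implementation."""

    def __init__(self):
        self.operators = []

    def insert_op(self, op_name, inputs, outputs, params=None):
        # Store a copy of each argument so later mutation does not change the record.
        record = {
            "op_name": op_name,
            "inputs": list(inputs),
            "outputs": list(outputs),
            "params": dict(params or {}),
        }
        self.operators.append(record)
        return record


graph = ToyGraph()
# Strings stand in for tensors; None marks the absent optional bias.
graph.insert_op("top.Conv", inputs=["x", "w", None], outputs=["y"], params={"group": 1})
```

An absent bias is kept as None in the inputs list here, mirroring how conv_v2 passes [input, weight, bias] with bias possibly None.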

5. Add a unit test for conv_v2

1) Register the unit test

class TPULANG_IR_TESTER(object):
    # This class is built for testing single operator transform.
    def __init__(self):
        self.test_function = {
            #############################
            # TpuLang Test Case, Alphabetically
            #############################
            "Conv2d": self.test_Conv2d,
            "HModel": self.test_Model,
        }

Register the Conv2d test in self.test_function.
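The registration dict maps a case name to a bound method, so a runner can look a case up by name and call it. The sketch below shows that dispatch pattern with a hypothetical ToyTester class; how test_tpulang.py actually drives the dict is an assumption here.

```python
class ToyTester:
    """Toy dispatcher mirroring the name -> method registration pattern."""

    def __init__(self):
        self.test_function = {
            "Conv2d": self.test_conv2d,
        }

    def test_conv2d(self, case_name):
        # A real test would build tensors and compile; this only reports the call.
        return f"ran {case_name}"

    def run(self, case_name):
        # An unregistered name fails loudly instead of being silently skipped.
        if case_name not in self.test_function:
            raise KeyError(f"unknown test case: {case_name}")
        return self.test_function[case_name](case_name)


result = ToyTester().run("Conv2d")
```

A test that is defined but never added to the dict cannot be selected by name, which is why the registration step comes first.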

2) Create the test case

def test_Conv2d(self, case_name):
    """Conv 2D"""
    @tpulang
    def _test_convolution(
        input_shape:List[int], kernel_shape:List[int], stride:List[int]=[1,1],
        dilation:List[int]=[1,1], pad:List[int]=None, group=1, dtype="float32",
        zp:List[int]=[None,None]
    ):
        x_data = rand_data(input_shape, dtype)
        x = tpul.Tensor(dtype=dtype, shape=input_shape, data=x_data)
        conv = self.conv_op(x, kernel_shape, stride, pad, group=group, dilation=dilation, zp=zp, dtype=dtype)
        tpul.compile(case_name, [x], [conv], False, 2)

    _test_convolution([1, 3, 28, 28], [12, 3, 1, 1], group=3)
    _test_convolution([1, 3, 32, 32], [12, 3, 3, 3], stride=[2,2], pad=[1,1,1,1])
    _test_convolution([1, 3, 32, 32], [12, 3, 3, 3], stride=[2,2], pad=[1,1,1,1], dtype="int8", zp=[5, -8])

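The steps for creating a new test case are:

* Choose the parameters of _test_convolution to suit what is being tested, covering as many parameters as possible.
* Create the input tensor: x = tpul.Tensor(...)
* Call the interface function: conv = self.conv_op(...). conv_op wraps the conv_v2 interface.
* Call the compile command: tpul.compile(name, inputs, outputs, ...)

conv_op is defined as follows: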

def conv_op(self, x, kshape, stride, pad=None, group=1, dilation=[1,1], zp=[None, None], dtype="float32"):
    oc = kshape[0]
    weight = self.coeff_tensor(kshape, dtype)
    # float32 inputs keep float32 outputs; everything else accumulates in int32
    out_dtype = dtype if dtype == 'float32' else 'int32'
    bias = self.coeff_tensor(oc, out_dtype)
    conv = tpul.conv_v2(x, weight, bias=bias, stride=stride, pad=pad,
                        dilation=dilation, group=group, input_zp=zp[0],
                        weight_zp=zp[1], out_dtype=out_dtype)
    # return the output tensor so the caller can pass it to tpul.compile
    return conv

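It wraps the conv_v2 interface and creates the weight and bias coeff tensors.

3) Run the test command to test the conv_v2 interface

Enter the docker environment
source envsetup.sh
Run the command test_tpulang.py --case Conv2d

The output is as follows: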

test_tpulang.py Conv2d

no found SET_CHIP_NAME environment value, set bm1684x as default
Test: Conv2d
Save mlir file: Conv2d_0_origin.mlir
[Running]: tpuc-opt --init --shape-infer --canonicalize --mark-FLOPs --save-weight --mlir-print-debuginfo Conv2d_0_origin.mlir -o Conv2d_0.mlir 
[Success]: tpuc-opt --init --shape-infer --canonicalize --mark-FLOPs --save-weight --mlir-print-debuginfo Conv2d_0_origin.mlir -o Conv2d_0.mlir 
Mlir file generated:Conv2d_0.mlir
[CMD]: model_runner.py --input Conv2d_0_in_f32.npz --model Conv2d_0.mlir --output Conv2d_0_top_outputs.npz
Save mlir file: Conv2d_1_origin.mlir
[Running]: tpuc-opt --init --shape-infer --canonicalize --mark-FLOPs --save-weight --mlir-print-debuginfo Conv2d_1_origin.mlir -o Conv2d_1.mlir 
[Success]: tpuc-opt --init --shape-infer --canonicalize --mark-FLOPs --save-weight --mlir-print-debuginfo Conv2d_1_origin.mlir -o Conv2d_1.mlir 
Mlir file generated:Conv2d_1.mlir
[CMD]: model_runner.py --input Conv2d_1_in_f32.npz --model Conv2d_1.mlir --output Conv2d_1_top_outputs.npz
Save mlir file: Conv2d_2_origin.mlir
[Running]: tpuc-opt --init --shape-infer --canonicalize --mark-FLOPs --save-weight --mlir-print-debuginfo Conv2d_2_origin.mlir -o Conv2d_2.mlir 
[Success]: tpuc-opt --init --shape-infer --canonicalize --mark-FLOPs --save-weight --mlir-print-debuginfo Conv2d_2_origin.mlir -o Conv2d_2.mlir 
Mlir file generated:Conv2d_2.mlir
[CMD]: model_runner.py --input Conv2d_2_in_f32.npz --model Conv2d_2.mlir --output Conv2d_2_top_outputs.npz
====== TEST Conv2d Success ======