Abstract: We propose MoBox, a low-cost solution for semi-supervised video object segmentation that requires only bounding boxes as manual annotations for training. Built upon a mature semi-supervised ...