Paper review: MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context
Date: 2022-02-05
Tags: segmentation, mobile, neural network
MOSAIC uses an asymmetric encoder-decoder architecture with multiple lateral connections and a spatial pyramid pooling module on top of a backbone.
Contextual Feature Pyramid
A spatial pyramid pooling (SPP) module collects contextual information at scales ranging from global to regional. Each bin in the pyramid accumulates regional information, while the global branch generates a representation of the whole scene.
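To make the pooling step concrete, here is a minimal pure-Python sketch of pyramid pooling on a single-channel feature map. It is not the paper's implementation: the function names and the bin sizes `(1, 2, 4)` are assumptions chosen for illustration, and the bin boundaries follow the common adaptive-average-pooling rule.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid.

    Bin i covers rows floor(i*h/bins) .. ceil((i+1)*h/bins) - 1, so bins
    may overlap by one row/column when the size is not divisible by `bins`.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, ceil_div((i + 1) * h, bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ceil_div((j + 1) * w, bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(fmap, bin_sizes=(1, 2, 4)):
    """Return one pooled grid per pyramid level (bin sizes are hypothetical).

    bins == 1 plays the role of the global branch: a single value
    summarising the whole map. Larger bin counts keep regional detail.
    """
    return {b: adaptive_avg_pool(fmap, b) for b in bin_sizes}


# Toy 4x4 feature map with values 0..15.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
levels = pyramid_pool(fmap)
print(levels[1])  # global branch -> [[7.5]]
print(levels[2])  # regional bins -> [[2.5, 4.5], [10.5, 12.5]]
```

In a real network each level would typically be projected with a convolution and upsampled back to the input resolution before being merged; those steps are left out here.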
To further diversify the contextual features, the paper proposes a multi-kernel group convolution: different groups use different kernel sizes, which keeps the computational cost down.
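The cost argument can be checked with a quick weight count. The sketch below is only an illustration under stated assumptions: the channel counts (64 in, 64 out), the kernel sizes `(1, 3, 5, 7)`, and both function names are hypothetical and not taken from the paper; biases are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with `groups` groups (no bias)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * (c_out // groups) * k * k * groups


def multi_kernel_group_conv_params(c_in, c_out, kernel_sizes):
    """Weight count when each group uses its own kernel size.

    One group per entry in `kernel_sizes`; every group sees
    c_in / G input channels and produces c_out / G output channels.
    """
    groups = len(kernel_sizes)
    assert c_in % groups == 0 and c_out % groups == 0
    per_group = (c_in // groups) * (c_out // groups)
    return sum(per_group * k * k for k in kernel_sizes)


# Hypothetical sizes: 64 -> 64 channels, four groups with kernels 1, 3, 5, 7.
print(conv_params(64, 64, 3))                                 # dense 3x3 -> 36864
print(conv_params(64, 64, 7))                                 # dense 7x7 -> 200704
print(multi_kernel_group_conv_params(64, 64, (1, 3, 5, 7)))   # -> 21504
```

With these toy numbers the mixed-kernel grouped layer covers kernel sizes up to 7x7 while using fewer weights than a single dense 3x3 layer, because each group only connects a quarter of the input channels to a quarter of the output channels.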
Hybrid Decoder
A few skip connections from early feature layers feed a lightweight decoder that uses a hybrid merging style, combining concatenation merge blocks and summation merge blocks.
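The difference between the two merge styles can be sketched on toy data. This is a simplified illustration, not the paper's decoder: features are plain nested lists shaped `[C][H][W]`, the helper names are hypothetical, and the convolutions that would normally follow each merge are omitted.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested list."""
    out = []
    for channel in feat:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def concat_merge(a, b):
    """Concatenation merge: stack channels, so C_out = C_a + C_b."""
    return a + b


def sum_merge(a, b):
    """Summation merge: element-wise add, so C_out = C_a = C_b."""
    assert len(a) == len(b), "summation needs matching channel counts"
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


# Toy inputs: a deep 1x1 feature and a 2x2 skip feature, one channel each.
deep = [[[1]]]
skip = [[[10, 20], [30, 40]]]
up = upsample2x(deep)             # [[[1, 1], [1, 1]]]

print(len(concat_merge(up, skip)))  # 2 channels after concatenation
print(sum_merge(up, skip))          # [[[11, 21], [31, 41]]]
```

Concatenation keeps both inputs intact but widens the tensor that the next layer must process, while summation keeps the channel count fixed and is cheaper; mixing the two is one way to trade accuracy against cost in a lightweight decoder.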