Article
Flexible Quantization for Efficient Convolutional Neural Networks
Publication date:
05/2024
Publisher:
MDPI
Journal:
Electronics
ISSN:
2079-9292
Language:
English
Resource type:
Published article
Subject classification:
Abstract
This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce a method called non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm achieves compression levels equivalent to 2 bits with no accuracy loss, and even levels equivalent to ∼1.58 bits with a performance loss of only ∼0.6%.
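The abstract describes the idea but not the implementation. As a rough illustration of decoupling quantization levels from bit-width, the following Python sketch quantizes a weight tensor to an arbitrary number of levels; the quantile-based level selection and the names nuuq_sketch and num_levels are assumptions made here for illustration, not the paper's actual method. With 3 levels, the effective storage cost is log2(3) ≈ 1.58 bits per weight, which matches the ∼1.58-bit figure quoted above.

```python
import numpy as np

def nuuq_sketch(weights: np.ndarray, num_levels: int = 3):
    """Quantize `weights` to `num_levels` values drawn from the data
    (quantiles here, a stand-in for the paper's level-selection step).
    The effective bit-width is log2(num_levels), which need not be an
    integer -- e.g. 3 levels ~ 1.58 bits per weight."""
    # Pick num_levels representative values from the weight distribution.
    qs = np.linspace(0.0, 1.0, num_levels + 2)[1:-1]  # interior quantiles
    levels = np.quantile(weights, qs)
    # Map every weight to the index of its nearest level; the integer
    # code is what a fixed-point implementation would store and process.
    idx = np.abs(weights[..., None] - levels).argmin(axis=-1)
    return levels[idx], idx, np.log2(num_levels)  # dequantized, codes, eff. bits

# Example: a small conv-layer weight tensor compressed to ~1.58 bits/weight.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q, codes, bits = nuuq_sketch(w, num_levels=3)
print(f"effective bits per weight: {bits:.2f}")
```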
Associated files
License
Identifiers
Collections
Articulos (SEDE CENTRAL)
Articulos de SEDE CENTRAL
Citation
Zacchigna, Federico Giordano; Lew, Sergio Eduardo; Lutenberg, Ariel. Flexible Quantization for Efficient Convolutional Neural Networks. Electronics 13(10), 1-16, May 2024. MDPI.