About the Authors
Preface

Part I Overview
1 Introduction
  1.1 History and Applications
  1.2 Pitfalls of High-Accuracy DNNs/CNNs
    1.2.1 Compute and Energy Bottleneck
    1.2.2 Sparsity Considerations
  1.3 Chapter Summary
2 Overview of Convolutional Neural Networks
  2.1 Deep Neural Network Architecture
  2.2 Convolutional Neural Network Architecture
  2.3 Popular CNN Models
  2.4 Popular CNN Datasets
  2.5 CNN Processing Hardware
  2.6 Chapter Summary

Part II Compressive Coding for CNNs
3 Contemporary Advances in Compressive Coding for CNNs
  3.1 Background of Compressive Coding
  3.2 Compressive Coding for CNNs
  3.3 Lossy Compression for CNNs
  3.4 Lossless Compression for CNNs
  3.5 Recent Advancements in Compressive Coding for CNNs
  3.6 Chapter Summary
4 Lossless Input Feature Map Compression
  4.1 Two-Step Input Feature Map Compression Technique
  4.2 Evaluation
  4.3 Chapter Summary
5 Arithmetic Coding and Decoding for 5-Bit CNN Weights
  5.1 Architecture and Design Overview
  5.2 Algorithm Overview
  5.3 Weight Decoding Algorithm
  5.4 Encoding and Decoding Examples
  5.5 Evaluation Methodology
  5.6 Evaluation Results
  5.7 Chapter Summary

Part III Dense CNN Accelerators
6 Contemporary Dense CNN Accelerators
  6.1 Background on Dense CNN Accelerators
  6.2 Representation of the CNN Weights and Feature Maps in Dense Format
  6.3 Popular Architectures for Dense CNN Accelerators
  6.4 Recent Advancements in Dense CNN Accelerators
  6.5 Chapter Summary
7 iMAC: Image-to-Column and General Matrix Multiplication-Based Dense CNN Accelerator
  7.1 Background and Motivation
  7.2 Architecture
  7.3 Implementation
  7.4 Chapter Summary
8 NeuroMAX: A Dense CNN Accelerator
  8.1 Related Work
  8.2 Log Mapping
  8.3 Hardware Architecture
  8.4 Data Flow and Processing
  8.5 Implementation and Results
  8.6 Chapter Summary

Part IV Sparse CNN Accelerators
9 Contemporary Sparse CNN Accelerators
  9.1 Background of Sparsity in CNN Models
  9.2 Background of Sparse CNN Accelerators
  9.3 Recent Advancements in Sparse CNN Accelerators
  9.4 Chapter Summary
10 CNN Accelerator for In Situ Decompression and Convolution of Sparse Input Feature Maps
  10.1 Overview
  10.2 Hardware Design Overview
  10.3 Design Optimization Techniques Utilized in the Hardware Accelerator
  10.4 FPGA Implementation
  10.5 Evaluation Results
  10.6 Chapter Summary
11 Sparse-PE: A Sparse CNN Accelerator
  11.1 Related Work
  11.2 Sparse-PE
  11.3 Implementation and Results
  11.4 Chapter Summary
12 Phantom: A High-Performance Computational Core for Sparse CNNs
  12.1 Related Work
  12.2 Phantom
  12.3 Phantom-2D
  12.4 Experiments and Results
  12.5 Chapter Summary

Part V HW/SW Co-Design and Co-Scheduling for CNN Acceleration
13 State-of-the-Art in HW/SW Co-Design and Co-Scheduling for CNN Acceleration
  13.1 HW/SW Co-Design
  13.2 HW/SW Co-Scheduling
  13.3 Chapter Summary
14 Hardware/Software Co-Design for CNN Acceleration
  14.1 Background of iMAC Accelerator
  14.2 Software Partition for iMAC Accelerator
  14.3 Experimental Evaluations
  14.4 Chapter Summary
15 CPU-Accelerator Co-Scheduling for CNN Acceleration
  15.1 Background and Preliminaries
  15.2 CNN Acceleration with CPU-Accelerator Co-Scheduling
  15.3 Experimental Results
  15.4 Chapter Summary
16 Conclusions

References
Index