基于FPGA的神经网络加速器_张渤钰.pdf

1.1 Background

CNN
RNN
FPGA
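Computation and parameter counts have grown sharply
FPGA: flexible, low-cost, efficient
Heterogeneous platforms adapt to the network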

1.2 Current Status

1.2.1 CNN

CNN
ReLU
ResNet
Transformer
- Attention is all you need
EfficientNet
- Resolution
- Depth
- Width

1.2.2 RNN

Elman Network
Jordan Network
LSTM
- Forget Gate
- Input Gate
- Output Gate
Peephole
- Forget Gate
- Memory
Coupled LSTM
- Robustness
- Competitive
Attention
- Attention is all you need
- Dynamic
- Flexible
Residual Connection
- Depth improvement
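
The three LSTM gates listed above (forget, input, output) can be sketched as a single time step. This is a minimal scalar illustration, not the notes' or the thesis's implementation: the function name `lstm_step`, the gate keys "f", "i", "o", "g", and the scalar weights are all assumptions made for readability. A real layer would use weight matrices and vectors.

```python
import math

def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w, b):
    """One scalar LSTM time step (hypothetical helper).

    w maps a gate name to a pair (input weight, recurrent weight);
    b maps a gate name to its bias.
    """
    def pre(gate):
        w_x, w_h = w[gate]
        return w_x * x + w_h * h_prev + b[gate]

    f = sigmoid(pre("f"))      # forget gate: how much old memory to keep
    i = sigmoid(pre("i"))      # input gate: how much new candidate to write
    o = sigmoid(pre("o"))      # output gate: how much memory to expose
    g = math.tanh(pre("g"))    # candidate memory content

    c = f * c_prev + i * g     # updated cell (memory) state
    h = o * math.tanh(c)       # updated hidden state
    return h, c
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step.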

1.2.3.1 CNN Acceleration

Quantization
Pruning
Searching
Depthwise Separable Convolution
Winograd Transform
- Maps the computation to smaller matrices
- A specific linear transform to a smaller tensor
Parallel
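
The Winograd item above can be illustrated with the smallest standard case, F(2,3): two outputs of a 3-tap 1D filter computed with 4 multiplications instead of 6. This is a sketch of the general technique under that assumption; the function names are hypothetical, and the notes do not say which tile size the thesis uses.

```python
def conv1d_direct(d, g):
    """Reference: two outputs of a 3-tap filter over 4 inputs (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]

def conv1d_winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs with 4 multiplications."""
    # Filter transform; in an accelerator this can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform uses only additions and subtractions.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform combines the four products.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

Trading multiplications for additions is attractive on FPGAs because multipliers (DSP blocks) are usually the scarcer resource.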

1.2.3.2 LSTM Acceleration

Quantization
Pruning
Parallel
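
Quantization and pruning appear in both acceleration lists above. The sketch below shows one common form of each: signed fixed-point quantization with saturation, and magnitude-based pruning. The bit widths, the threshold, and the function names are assumptions for illustration; the notes do not record the exact schemes used.

```python
def quantize_fixed(x, frac_bits=4, total_bits=8):
    """Map a real value to a signed fixed-point integer with saturation."""
    scale = 1 << frac_bits                    # 2 ** frac_bits
    q_min = -(1 << (total_bits - 1))          # e.g. -128 for 8 bits
    q_max = (1 << (total_bits - 1)) - 1       # e.g. 127 for 8 bits
    q = int(round(x * scale))                 # Python rounds exact halves to even
    return max(q_min, min(q_max, q))          # clamp out-of-range values

def dequantize_fixed(q, frac_bits=4):
    """Recover the real value represented by a fixed-point integer."""
    return q / (1 << frac_bits)

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```

For example, with 4 fractional bits the value 1.3 is stored as the integer 21 and read back as 1.3125, so the quantization error here is 0.0125.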

1.3 Contributions

Hardware acceleration supporting both CNN and LSTM
Register Transfer Level
Data chunking
Data Reassembling
Less wasted CPU time

A hardware design that improves DSP efficiency through parallelization
A hardware accelerator design that unrolls more dimensions and adds parallel acceleration, with HDL-level optimization
A way to accelerate data chunking and data-stream rearrangement with less wasted CPU time
Portable design
FPGA control
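
Data chunking and unrolling, as named in the contributions above, can be sketched in software with a tiled matrix multiply. This only illustrates the general idea: the tile size, the function name, and the choice of matrix multiplication as the workload are assumptions, and the actual design is written at register transfer level, not in Python.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), processing one tile-sized chunk at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Outer loops walk over chunks; each chunk is small enough to sit in on-chip buffers.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Inner loops have small fixed bounds; in hardware they would be
                # unrolled into parallel multiply-accumulate units.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The result is identical to an untiled multiply; only the order in which data is touched changes, which is what allows each chunk to be transferred once and reused.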

2.1 Neural Network Principles

2.1.1 CNN

Convolution Layer
Activation Layer
Pooling Layer
Fully Connected Layer
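
The four layer types listed above can be chained into a tiny forward pass. This is a minimal single-channel sketch using nested lists; the function names, the 2x2 pooling window, and the choice of ReLU as the activation are assumptions for illustration. As in most deep learning frameworks, the "convolution" here is implemented as cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation of one channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Activation layer: clamp negative values to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Pooling layer: maximum over non-overlapping 2x2 windows."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def fully_connected(fmap, weights, bias):
    """Fully connected layer: flatten, then one dot product per output neuron."""
    flat = [v for row in fmap for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) + b
            for w_row, b in zip(weights, bias)]
```

A forward pass is then `fully_connected(max_pool2x2(relu(conv2d(image, kernel))), weights, bias)`.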

2.1.2 RNN

Memory
Inheritance

Exploration

Course Project

1.1 Background

1.2 Current Status

1.2.1 CNN

1.2.2 RNN

1.2.3.1 CNN Acceleration

1.2.3.2 LSTM Acceleration

1.3 Contributions

2.1 Neural Network Principles

2.1.1 CNN

2.1.2 RNN

2.2.1 Fixed-Point Quantization

2.2.2 Platforms

2.2.3 Path

3.1 Data Storage

3.2 Optimization

3.2.1 Data Transfer

3.2.2 Command Control

4.1 Loop Unrolling & Loop Parallelization

4.2 Data chunking

4.3 Pipeline

4.3.1 On-chip Storage

4.3.2 Quantization Module

4.3.3 Pooling

1.2 Research Status

1.3 In This Paper

2.1 Smartphone application

2.2 ML-based

2.3 Deep Learning Based

3.1 Dataset

3.2 1D CNN

3.3 Multi-channel CNN

4.1 RNN

4.2 LSTM

4.3 1D CNN-LSTM

4.4 ConvLSTM

5.2 Mobile Deployment

1.2 History

2.1 CT images Mechanism

2.2 Techniques

3 Noise Learning

4. Task-related Denoising

5. Doctor Action inspired training

5.4.7 Discussion

1 Intro

2.3 Variable CNN

2.4 Dynamic Filter Network

3. VCNN deblur

3.3 Frame align

3.4.1 Feature Fusion

3.4.2 Feature Rebuild

4.2 Self-adapting time & space CNN

4.3 Enhanced VCNN near frames align

4.4 Dynamic Filter Network Based Self-adapting Time-ordered Feature Fusion

4.5 SATOCNN based video deblur

1.1.2 Background

1.2.1 Former Research

2.2 Facial expression recognition

3 ResNet

3.1.3 Light-weight

3.2 Pre-process and training

4.1.1 Meaningful Perturbation

4.1.2 Score-CAM

1.2.2 Image Pre-processing Algorithms

1.4 CNN on FPGA

2.2 Filter

2.3 Image Enhancement

2.4.1 Characters Adjust

2.4.2 Characters Split

2.5 Lenet-5

3.2 Image taking

3.3 Character Recognition

4.2 Normal Lighting Pre-Process

4.3 High Lighting Pre-Process

4.3 Poor Lighting Pre-Process

5.2 Address Generating

5.3 Pipeline design

5.4 Communication

5.5 Calculation