基于FPGA的神经网络加速器_张渤钰.pdf
1.1 背景
- CNN
- RNN
- FPGA
- 计算量 参数量剧增
- FPGA灵活 低成本 高效
- 异构平台 适应网络
1.2 现状
1.2.1 CNN
- CNN
- ReLU
- ResNet
- Transformer
- Attention is all you need
- EffiecientNet
- Resolution
- Depth
- Width
1.2.2 RNN
-
Elman Network
-
Jordan Network
-
LSTM
- Forget Gate
- Input Gate
- Output Gate
-
Peephole
- Forget Gate
- Memory
-
Coupled LSTM
- Robustness
- Competitive
-
Attention
- Attention is all you need
- Dynamic
- Flexible
-
Residual Connection
- Depth improvement
1.2.3.1 CNN Acceleration
- Quantification
- Pruning 剪枝
- Searching
- Depth-First Detachable
- Winograd Transition
- To smaller matrix
- Specific linear transition to a smaller tensor
- Parallel
1.2.3.2 LSTM Acceleration
- Quantification
- Pruning
- Parallel
1.3 Contributions
- CNN & LSTM supported hardware acceleration
- Register Transfer Level
- Data chunking
- Data Reassembling
- Less CPU time waste
- a hardware design improved DSP efficiency by paralleling
- a hardware accelerator design with more dimension unfolding and parallel acceleration with HDL optimization/
- a way to accelerate data chunking and data stream re-arranging with less CPU time wasting
- portale design
- FPGA controlling
2.1 神经网络原理
2.1.1 CNN
- Convolution Layer
- Activator Layer
- Pooling Layer
- Full Connection Layer
2.1.2 RNN
- Memory
- Inheritance
2.2.1 Fixed-Point Quantification
- 8 bit
- 16 bit
- less DSP consumption
- ReLU
2.2.2 Platforms
FPGA
- Logic
- DSP
- IO
- Memory
- System
2.2.3 Path
HLS
- Redundancy
- Poor portability HDL
- Flexible
- Precise
- Portable
- Optimizible
3.1 Data Storage
- SD Card
- External
- FAT
- application layer
- C library to RW
- SD Data to DDR with address RW by ARM
- Flash
- System
- Bootloader
3.2 Optimization
3.2.1 Data Transfering
- AXI Bus
- AXI DMA Controller
3.2.2 Command Control
- Flexiblity
- Software control rather than HDL design
- 32-bit
- Registers-control
- AXI-lite
4.1 Loop unfolding & Loop parellel
- Convolution cores parelleling
- More Dimension
- Line buffer
4.2 Data chunking
- padding
4.3 Pipeline
4.3.1 On-chip Storage
- Two buffers, ping-pong buffer
- split into half, for writing and storing
4.3.2 Quantification Module
- Const optimization
4.3.3 Pooling
- Downscaling
- line buffer
- configurable shift register
基于移动终端的人体姿态检测技术研究与实现_袁环宇.pdf
1.2 研究现状
- 计算机视觉
- 传感器数据
- CNN
- CRNN
- KNN
1.3 In paper
- Multi-channel CNN
- CNN-LSTM
- ConvLSTM
2.1 Smartphone application
- Acceleration sensor
- Gyroscope
2.2 ML-based
- SVM
- Naive Bayers
- Decision Tree
2.3 Deep Learning Based
- CNN
- RNN
- LSTM
3.1 Dataset
- Android app
- Sampling
- Filter
3.2 1D CNN
- Keras
- Hyper parameters config
3.3 Multi-channel CNN
- split - merge
4.1 RNN
- Low accuracy
- 梯度爆炸
- 梯度消失
4.2 LSTM
- Slightly low accuracy
- Less layers
- More parameters
- Hyper-parameters config
4.3 1D CNN-LSTM
- Convolution as frontend
- LSTM as branch prediction and classification
- Connection
- Dimesion conflict
- Reshape layer
4.4 ConvLSTM
Convolutional LSTM
- LSTM - Time
- CNN - Partial Spacial
5.2 Mobile Deployment
- Tensorflow Lite
- Compression
- Quantification
- Android System Sensor API
基于机器学习的低剂量CT图像降噪方法研究_陈柯成.pdf
1.2 History
sin
function filter- iteration refaction
- deep learning post-process
- black box
- information loss
- dataset limitation
- no interaction with further medical dianogistics
2.1 CT images Mechanism
- Pseudo-image
- Quantium noise
- Poisson Distribution
- Mixed Poisson Gauss Distribution
2.2 Techniques
- CNN
- Encoder-Decoder
- Adversarial Network
3 Noise Learning
- Paired dataset
- Variational Auto Encoder, GAN
- VAR Noise parsing
- GAN Noise parsing
4. Tasks-related Denoising
- PnP framework
- Lesion-Inspired
- LIDnet
- Co-training based
5. Doctor Action inspired training
- RIDnet
- GCN
- non-particial graph
- get non-particial info
- Context GCN
- 3 dimesion GCN
5.4.7 Discussion
基于可变形卷积网络的视频去模糊算法_段风志.pdf
1 Intro
- Image deblur
- Transformer
- Uformer
- Video deblur
- Variable Filter
- STFAN
- EDVR
2.3 Variable CNN
- Enhance
- Larger attention area
- More sampling positions available
2.4 Dynamic Filter Network
- Self-adapting
- DFnet layer
3. VCNN deblur
- near frame clear pixels
- infomation
3.3 Frame align
- Light flow
- Feature align
- Multi-dimesion VCNN hidden feature align
3.4.1 Feature Fusion
- Channel Dimesion - Time-orderd Feature Fusion
3.4.2 Feature Rebuild
- Reverse Convoltion layer
- Upscaling
4.2 Self-adapting time & space CNN
- using clear pixels fusing with blur pixels to dig out time & space info
- Dynamic Filter Network
4.3 Enhanced VCNN near frames align
- more free dimensions
- stable
4.4 Dynamic Filter Network Based Self-adapting Time-ordered Fetaure Fusion
- Pixel-level Feature Fusion
- Inspired by KPN
- Using self-adating feature of Dynamic Filter Network
4.5 SATOCNN based video deblur
AWESOME but worse than examples
基于轻量化卷积神经网络的表情识别与可视化分析_王聪荣.pdf
1.1.2 Background
- CNN
- DBN, Deep Belief Net
- DAE, Deep Autoencoder
- RNN
- GAN
1.2.1 Former Research
- SVM based L2-SVM
- Level-Related CNN
- Deeper layers
- Wider width
- FaceNet2ExpNet
- Freeze Parameters
- Minimize Loss Function
- More details learned than hinting
- SBN-CNN
2.2 Facial expression recognition
- Pre-process
- lighting and position unification
- Face detection
- Feature extraction and classification
- Face detection and alignment
- MTCNN
- CenterFace
- Retinaface
3 ResNet
- Depthwise Convoltion
- Pointwise Convoltion
3.1.3 Light-weight
- Down sampling
- Smaller convoltion core
- Global pooling
- Fewer parameters
- Less calculation
3.2 Pre-process and training
- FER2013 Dataset
- CUDA 11.0 RTX 3090
4.1.1 Meaningful Perturbation
- Adam optimization
- Simple
- Efficient
- Less dependency of Hyper-parameter config
4.1.2 Score-CAM
- Forwarding-spread
- Less noise
- Less calculation
- Robustness
基于FPGA和Lenet-5的字符识别系统设计与实现_程永龙.pdf
1.2.2 Image pre-process algos
- Image filter
- Image enhancement
- Characters split
1.4 CNN on FPGA
- HLS translate to HDL
- PS+PL
- HDL
- Low on board RAM available
- DDR slow speed
2.2 Filter
- 均值滤波
- smooth and blur
- 中值滤波
- Gauss Filter
- 小波变换滤波
2.3 Image Enhancement
- HE algo
- average
- 偏微分方程
2.4.1 Characters Adjust
- Rotation Adjust
- Lean Adjust
- 切向矫正
2.4.2 Characters Split
- 二值化
- 投影法
2.5 Lenet-5
- Convolution
- Single character
- Full connection
- Structure
- Convolution
- Pooling
- Convolution
- Pooling
- Full connection
- ReLU
3.2 Image taking
- Camera
- Camera link
- LVDS
3.3 Character Recognition
- FPGA
- Block RAM
- Registers
- RAM Address
- Address Generate
- Storage & Pipeline
- Efficiency
- Speed
- Communication
- Signals
- Enabler
- Calculation
- Parallel
- Convolution
- Pooling
4.2 Normal Lighting Pre-Process
- 二值化
4.3 High Lighting Pre-Process
- 图像增强 图像二值化 膨胀腐蚀 字符分割
4.3 Poor Lighting Pre-Process
- 二值化
5.2 Address Generating
- Head address anchoring
5.3 Pipeline design
- Register
- Signal control
- Parallel
5.4 Communication
- Port
- Signal
- Enabler
5.5 Calculation
- Convolution
- Pooling