基于FPGA的神经网络加速器_张渤钰.pdf

1.1 背景

  • CNN
  • RNN
  • FPGA
  • 计算量 参数量剧增
  • FPGA灵活 低成本 高效
  • 异构平台 适应网络

1.2 现状

1.2.1 CNN

  • CNN
  • ReLU
  • ResNet
  • Transformer
    • Attention is all you need
  • EffiecientNet
    • Resolution
    • Depth
    • Width

1.2.2 RNN

  • Elman Network

  • Jordan Network

  • LSTM

    • Forget Gate
    • Input Gate
    • Output Gate
  • Peephole

    • Forget Gate
    • Memory
  • Coupled LSTM

    • Robustness
    • Competitive
  • Attention

    • Attention is all you need
    • Dynamic
    • Flexible
  • Residual Connection

    • Depth improvement

1.2.3.1 CNN Acceleration

  • Quantification
  • Pruning 剪枝
  • Searching
  • Depth-First Detachable
  • Winograd Transition
    • To smaller matrix
    • Specific linear transition to a smaller tensor
  • Parallel

1.2.3.2 LSTM Acceleration

  • Quantification
  • Pruning
  • Parallel

1.3 Contributions

  • CNN & LSTM supported hardware acceleration
  • Register Transfer Level
  • Data chunking
  • Data Reassembling
  • Less CPU time waste
  1. a hardware design improved DSP efficiency by paralleling
  2. a hardware accelerator design with more dimension unfolding and parallel acceleration with HDL optimization/
  3. a way to accelerate data chunking and data stream re-arranging with less CPU time wasting
  4. portale design
  5. FPGA controlling

2.1 神经网络原理

2.1.1 CNN

  • Convolution Layer
  • Activator Layer
  • Pooling Layer
  • Full Connection Layer

2.1.2 RNN

  • Memory
  • Inheritance

2.2.1 Fixed-Point Quantification

  • 8 bit
  • 16 bit
  • less DSP consumption
  • ReLU

2.2.2 Platforms

FPGA

  • Logic
  • DSP
  • IO
  • Memory
  • System

2.2.3 Path

HLS

  • Redundancy
  • Poor portability HDL
  • Flexible
  • Precise
  • Portable
  • Optimizible

3.1 Data Storage

  • SD Card
    • External
    • FAT
    • application layer
    • C library to RW
    • SD Data to DDR with address RW by ARM
  • Flash
    • System
    • Bootloader

3.2 Optimization

3.2.1 Data Transfering

  • AXI Bus
  • AXI DMA Controller

3.2.2 Command Control

  • Flexiblity
  • Software control rather than HDL design
  • 32-bit
  • Registers-control
  • AXI-lite

4.1 Loop unfolding & Loop parellel

  • Convolution cores parelleling
  • More Dimension
  • Line buffer

4.2 Data chunking

  • padding

4.3 Pipeline

4.3.1 On-chip Storage

  • Two buffers, ping-pong buffer
    • split into half, for writing and storing

4.3.2 Quantification Module

  • Const optimization

4.3.3 Pooling

  • Downscaling
  • line buffer
  • configurable shift register

基于移动终端的人体姿态检测技术研究与实现_袁环宇.pdf

1.2 研究现状

  • 计算机视觉
  • 传感器数据
  • CNN
  • CRNN
  • KNN

1.3 In paper

  • Multi-channel CNN
  • CNN-LSTM
  • ConvLSTM

2.1 Smartphone application

  • Acceleration sensor
  • Gyroscope

2.2 ML-based

  • SVM
  • Naive Bayers
  • Decision Tree

2.3 Deep Learning Based

  • CNN
  • RNN
  • LSTM

3.1 Dataset

  • Android app
  • Sampling
  • Filter

3.2 1D CNN

  • Keras
  • Hyper parameters config

3.3 Multi-channel CNN

  • split - merge

4.1 RNN

  • Low accuracy
  • 梯度爆炸
  • 梯度消失

4.2 LSTM

  • Slightly low accuracy
  • Less layers
  • More parameters
  • Hyper-parameters config

4.3 1D CNN-LSTM

  • Convolution as frontend
  • LSTM as branch prediction and classification
  • Connection
    • Dimesion conflict
    • Reshape layer

4.4 ConvLSTM

Convolutional LSTM

  • LSTM - Time
  • CNN - Partial Spacial

5.2 Mobile Deployment

  • Tensorflow Lite
  • Compression
  • Quantification
  • Android System Sensor API

基于机器学习的低剂量CT图像降噪方法研究_陈柯成.pdf

1.2 History

  • sin function filter
  • iteration refaction
  • deep learning post-process
    • black box
    • information loss
    • dataset limitation
    • no interaction with further medical dianogistics

2.1 CT images Mechanism

  • Pseudo-image
  • Quantium noise
  • Poisson Distribution
  • Mixed Poisson Gauss Distribution

2.2 Techniques

  • CNN
    • Encoder-Decoder
  • Adversarial Network

3 Noise Learning

  • Paired dataset
  • Variational Auto Encoder, GAN
  • VAR Noise parsing
  • GAN Noise parsing
  • PnP framework
    • Lesion-Inspired
    • LIDnet
  • Co-training based

5. Doctor Action inspired training

  • RIDnet
  • GCN
    • non-particial graph
    • get non-particial info
  • Context GCN
  • 3 dimesion GCN

5.4.7 Discussion

基于可变形卷积网络的视频去模糊算法_段风志.pdf

1 Intro

  • Image deblur
    • Transformer
    • Uformer
  • Video deblur
    • Variable Filter
    • STFAN
    • EDVR

2.3 Variable CNN

  • Enhance
  • Larger attention area
  • More sampling positions available

2.4 Dynamic Filter Network

  • Self-adapting
  • DFnet layer

3. VCNN deblur

  • near frame clear pixels
  • infomation

3.3 Frame align

3.4.1 Feature Fusion

  • Channel Dimesion - Time-orderd Feature Fusion

3.4.2 Feature Rebuild

  • Reverse Convoltion layer
  • Upscaling

4.2 Self-adapting time & space CNN

  • using clear pixels fusing with blur pixels to dig out time & space info
  • Dynamic Filter Network

4.3 Enhanced VCNN near frames align

  • more free dimensions
  • stable

4.4 Dynamic Filter Network Based Self-adapting Time-ordered Fetaure Fusion

  • Pixel-level Feature Fusion
  • Inspired by KPN
  • Using self-adating feature of Dynamic Filter Network

4.5 SATOCNN based video deblur

AWESOME but worse than examples

基于轻量化卷积神经网络的表情识别与可视化分析_王聪荣.pdf

1.1.2 Background

  • CNN
  • DBN, Deep Belief Net
  • DAE, Deep Autoencoder
  • RNN
  • GAN

1.2.1 Former Research

  • SVM based L2-SVM
  • Level-Related CNN
    • Deeper layers
    • Wider width
  • FaceNet2ExpNet
    • Freeze Parameters
    • Minimize Loss Function
    • More details learned than hinting
  • SBN-CNN

2.2 Facial expression recognition

  • Pre-process
    • lighting and position unification
  • Face detection
  • Feature extraction and classification
  • Face detection and alignment
    • MTCNN
    • CenterFace
    • Retinaface

3 ResNet

  • Depthwise Convoltion
  • Pointwise Convoltion

3.1.3 Light-weight

  • Down sampling
  • Smaller convoltion core
  • Global pooling
  • Fewer parameters
  • Less calculation

3.2 Pre-process and training

  • FER2013 Dataset
  • CUDA 11.0 RTX 3090

4.1.1 Meaningful Perturbation

  • Adam optimization
  • Simple
  • Efficient
  • Less dependency of Hyper-parameter config

4.1.2 Score-CAM

  • Forwarding-spread
  • Less noise
  • Less calculation
  • Robustness

基于FPGA和Lenet-5的字符识别系统设计与实现_程永龙.pdf

1.2.2 Image pre-process algos

  • Image filter
  • Image enhancement
  • Characters split

1.4 CNN on FPGA

  • HLS translate to HDL
  • PS+PL
  • HDL
  • Low on board RAM available
  • DDR slow speed

2.2 Filter

  • 均值滤波
    • smooth and blur
  • 中值滤波
  • Gauss Filter
  • 小波变换滤波

2.3 Image Enhancement

  • HE algo
    • average
  • 偏微分方程

2.4.1 Characters Adjust

  • Rotation Adjust
  • Lean Adjust
  • 切向矫正

2.4.2 Characters Split

  • 二值化
  • 投影法

2.5 Lenet-5

  • Convolution
  • Single character
    • Full connection
  • Structure
    • Convolution
    • Pooling
    • Convolution
    • Pooling
    • Full connection
    • ReLU

3.2 Image taking

  • Camera
  • Camera link
  • LVDS

3.3 Character Recognition

  • FPGA
    • Block RAM
    • Registers
  • RAM Address
    • Address Generate
  • Storage & Pipeline
    • Efficiency
    • Speed
  • Communication
    • Signals
    • Enabler
  • Calculation
    • Parallel
    • Convolution
    • Pooling

4.2 Normal Lighting Pre-Process

  • 二值化

4.3 High Lighting Pre-Process

  • 图像增强 图像二值化 膨胀腐蚀 字符分割

4.3 Poor Lighting Pre-Process

  • 二值化

5.2 Address Generating

  • Head address anchoring

5.3 Pipeline design

  • Register
  • Signal control
  • Parallel

5.4 Communication

  • Port
  • Signal
  • Enabler

5.5 Calculation

  • Convolution
  • Pooling

基于卷积神经网络的多模光纤图像重构研究_龚志宝.pdf