from 東大1S情報α:

  • Rather than relying only on “matrix and vector multiplication,” this kind of neural network also applies a Spatial Filter.
    • A filter (kernel) that converts a small vector into a scalar is applied at each position of a larger vector (the original image), which yields a slightly smaller vector.
      • It’s kind of like that (blu3mo).
      • This operation is indeed similar to “convolution”.
    • By doing this, it extracts necessary features from the image and eliminates unnecessary information.
      • The filter values that decide what gets extracted are themselves learned during training.
  • In addition, it is common to perform a process called Pooling after convolution.
    • This is a simple process that reduces the resolution of the image.
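
The sliding-filter operation described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function name `apply_filter`, the 4x4 image, and the hand-written vertical-edge kernel are all made up for the example (a real CNN would learn the kernel values), and the stride is 1 with no padding. Strictly speaking the code computes cross-correlation (the kernel is not flipped), which is the operation deep-learning frameworks usually call convolution.

```python
def apply_filter(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Reduce one small patch of the image to a single scalar.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = apply_filter(image, kernel)
for row in feature_map:
    print(row)
```

The 4x4 input becomes a 3x3 output (the "slightly smaller vector"), and the only nonzero responses sit in the column where the edge is, which is the sense in which a filter keeps a feature and discards the rest.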

from #udacity_intro_to_deep_learning_with_pytorch:

  • The key is the Spatial Filter explained in the section on Contour Detection.

  • A CNN learns the filters automatically.

  • The depth of a convolutional layer's output equals its number of kernels, so a single layer learns multiple filters automatically.

    • A certain filter may be able to detect dog ears.
    • Another filter may detect dog eyes.
  • Sometimes a pooling layer is inserted in between.

    • This is to reduce the size while preserving as much information as possible.
    • There are methods such as taking the average or taking the maximum value.
      • The maximum value method is suitable for image detection as it emphasizes distinctive parts.
  • A CNN increases the depth of the image while decreasing the width and height.

    • The first layer has a depth of only 3 (in the case of RGB), while its width and height are comparatively large.
    • The Convolutional Layer increases the depth, while the Pooling Layer decreases the width and height.
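
To make the pooling and shape-change points above concrete, here is a small plain-Python sketch. The helper names (`pool2x2`, `conv_shape`, `pool_shape`), the 4x4 feature map, the 32x32 RGB input, and the kernel counts 16 and 32 are illustrative assumptions, not values from the course. The shape arithmetic assumes stride-1 convolutions padded so that width and height are preserved, and 2x2 pooling with stride 2.

```python
def pool2x2(feature_map, reduce):
    """Non-overlapping 2x2 pooling; `reduce` maps the 4 window values to 1."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            ]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map with a few strong activations.
feature_map = [
    [1, 2, 0, 0],
    [3, 8, 0, 4],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]

max_pooled = pool2x2(feature_map, max)                          # keeps the peak of each window
avg_pooled = pool2x2(feature_map, lambda w: sum(w) / len(w))    # smooths each window
print(max_pooled)
print(avg_pooled)


def conv_shape(shape, num_kernels):
    """Assumed size-preserving convolution: depth becomes the number of kernels."""
    depth, height, width = shape
    return (num_kernels, height, width)


def pool_shape(shape):
    """Assumed 2x2 pooling with stride 2: width and height are halved."""
    depth, height, width = shape
    return (depth, height // 2, width // 2)


# Track (depth, height, width) through two conv + pool stages.
shape = (3, 32, 32)  # RGB input
shapes = [shape]
for num_kernels in (16, 32):
    shape = conv_shape(shape, num_kernels)
    shapes.append(shape)
    shape = pool_shape(shape)
    shapes.append(shape)

for s in shapes:
    print(s)
```

In the top-left window, max pooling keeps the strong activation 8 while averaging dilutes it to 3.5, which matches the note that the maximum emphasizes distinctive parts. The shape trace goes from (3, 32, 32) to (32, 8, 8): depth grows at each convolution, and width and height shrink at each pooling step.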