from 東大1S情報α:
- Instead of just performing “matrix and vector multiplication,” this kind of neural network also applies a Spatial Filter.
- By applying a filter (kernel) that converts a small vector into a scalar at each position of a larger vector (the original image), a slightly smaller vector is obtained.
- It’s kind of like that (blu3mo).
- This operation is indeed similar to “convolution”.
- By doing this, it extracts necessary features from the image and eliminates unnecessary information.
- The filters that perform this extraction are themselves learned during training.
- In addition, it is common to perform a process called Pooling after convolution.
- This is a simple process that reduces the resolution of the image.
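The sliding-filter operation described above can be sketched in plain Python. This is only an illustration under stated assumptions, not code from the lecture: the function name `apply_filter`, the 4×4 image, and the hand-picked kernel are all hypothetical, and in a real CNN the kernel values would be learned rather than written by hand.

```python
def apply_filter(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Both arguments are lists of lists of numbers. Each output value is the
    sum of element-wise products between the kernel and the image patch
    under it, so every small patch is reduced to a single scalar.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1   # output is slightly smaller than the input
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 grayscale image: dark everywhere except the right column.
image = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

# Hand-picked 3x3 kernel that responds where brightness rises left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = apply_filter(image, kernel)
print(feature_map)  # [[0, 3], [0, 3]]
```

The 4×4 input becomes a 2×2 output, and the response is nonzero only where the patch covers the vertical edge, which is the sense in which the filter keeps a feature and discards the rest.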
from #udacity_intro_to_deep_learning_with_pytorch:
- The key is the Spatial Filter explained in the section on Contour Detection.
- A CNN automatically learns the filters.
- A convolutional layer automatically generates multiple filters, one for each of its kernels.
  - A certain filter may be able to detect dog ears.
  - Another filter may detect dog eyes.
- Sometimes a pooling layer is inserted in between.
  - This is to reduce the size while preserving as much information as possible.
  - There are methods such as taking the average or taking the maximum value.
  - The maximum value method is suitable for image detection as it emphasizes distinctive parts.
- A CNN increases the depth of the image while decreasing its width and height.
  - The input to the first layer has a depth of only 3 (in the case of RGB), while its width and height are much larger.
  - The Convolutional Layer increases the depth, while the Pooling Layer decreases the width and height.
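The pooling methods and the depth/width/height trade-off above can be sketched in plain Python. The helper names (`pool2x2`, `mean`, `trace_shapes`), the sample feature map, and the channel counts 16/32/64 are assumptions made for illustration, not values from the course. The shape trace also assumes that each convolution is padded so it keeps width and height, and that each pooling layer halves them.

```python
def pool2x2(image, reduce_fn):
    """Downsample by applying `reduce_fn` to each non-overlapping 2x2 block."""
    return [
        [
            reduce_fn([image[i][j], image[i][j + 1],
                       image[i + 1][j], image[i + 1][j + 1]])
            for j in range(0, len(image[0]) - 1, 2)
        ]
        for i in range(0, len(image) - 1, 2)
    ]


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map with one strong response (the 9).
feature_map = [
    [0, 0, 1, 0],
    [0, 9, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 4],
]

max_pooled = pool2x2(feature_map, max)   # keeps the strongest response per block
avg_pooled = pool2x2(feature_map, mean)  # smooths each block instead
print(max_pooled)  # [[9, 1], [2, 4]]
print(avg_pooled)  # [[2.25, 0.25], [2.0, 1.0]]


def trace_shapes(depth, height, width, conv_depths):
    """Return (depth, height, width) for the input and after each conv + pool stage.

    Assumption of this sketch: the convolution only changes the depth (one
    output channel per filter) and the 2x2 pooling only halves height and width.
    """
    shapes = [(depth, height, width)]
    for out_depth in conv_depths:
        depth = out_depth                         # convolutional layer
        height, width = height // 2, width // 2   # pooling layer
        shapes.append((depth, height, width))
    return shapes


shapes = trace_shapes(3, 32, 32, [16, 32, 64])
print(shapes)  # [(3, 32, 32), (16, 16, 16), (32, 8, 8), (64, 4, 4)]
```

In the top-left block, max pooling keeps the 9 while averaging dilutes it to 2.25, which is why the maximum is said to emphasize distinctive parts. The shape trace shows the RGB input of depth 3 turning into a deep but spatially small volume.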