The convolutional layer uses a kernel (filter) to perform the convolution operation. The kernel slides across the image horizontally and vertically, moving by the stride at each step, until the entire image has been scanned.
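To make the sliding-kernel idea concrete, here is a minimal sketch in plain Python. The function name `conv2d`, the sample `image`, and the sample `kernel` are illustrative assumptions, not taken from any library. It assumes no padding and, as most deep learning frameworks do, it does not flip the kernel (strictly speaking, that is cross-correlation).

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    # Number of kernel positions that fit vertically and horizontally.
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Multiply the kernel element-wise with the patch under it, then sum.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 "image" and 2x2 kernel, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

print(conv2d(image, kernel, stride=1))  # 3x3 feature map
print(conv2d(image, kernel, stride=2))  # 2x2 feature map
```

A larger stride makes the kernel skip positions, so the same 4x4 input yields a 3x3 map at stride 1 but only a 2x2 map at stride 2.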
The pooling layer reduces the spatial dimensions of the feature map, which in turn lowers the computing power needed to process the data. The two common variants are max pooling and average pooling.
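The difference between the two pooling variants can be sketched as follows. The helper `pool2d`, its parameters, and the sample `feature_map` are hypothetical names and values used only for illustration; the sketch assumes a square window and no padding.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample `feature_map` with max or average pooling (no padding)."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current pooling window.
            window = [
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        output.append(row)
    return output


# Hypothetical 4x4 feature map, chosen only for illustration.
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

max_pooled = pool2d(feature_map, mode="max")      # keeps the largest value per window
avg_pooled = pool2d(feature_map, mode="average")  # keeps the mean value per window
print(max_pooled)
print(avg_pooled)
```

With a 2x2 window and stride 2, the 4x4 input shrinks to 2x2, a fourfold reduction in the number of values passed to later layers.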
The fully connected (FC) layer works with flattened inputs, i.e., every input is connected to every neuron. The flattened vector is passed through a few FC layers, each of which typically computes a weighted sum of its inputs plus a bias and then applies an activation function.
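As a rough sketch of what "flattened" and "fully connected" mean in practice, the snippet below flattens a small 2D map and feeds it to one FC layer. The names `flatten` and `dense`, and all weights and biases, are made-up illustrative values rather than a trained model; the activation function is omitted for brevity.

```python
def flatten(feature_map):
    """Turn a 2D feature map into a 1D vector, row by row."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """One fully connected layer: every input feeds every neuron.

    `weights[k]` holds the weights of neuron k, one per input value.
    """
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical pooled output and hand-picked parameters for two neurons.
pooled = [[7, 8], [9, 6]]
flat = flatten(pooled)
weights = [
    [1, 0, 0, 1],   # neuron 0
    [0, 1, -1, 0],  # neuron 1
]
biases = [0, 1]

fc_out = dense(flat, weights, biases)
print(flat)    # the 2x2 map as a vector of length 4
print(fc_out)  # one value per neuron
```

Because each neuron has one weight per input, the parameter count grows with input size times neuron count, which is one reason pooling usually precedes the FC layers.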
1. LeNet Architecture: Simple and modest, ideal for teaching CNN basics. It is one of the earliest and best-known CNN designs and can recognize handwritten digits, making it an excellent "first CNN".
2. AlexNet Architecture: Among the first convolutional networks to use graphics processing units (GPUs) to speed up training. ReLU (Rectified Linear Unit), a non-linear activation function, is applied after each convolutional layer.
3. VGGNet Architecture: Because of its adaptability, the VGG CNN model serves as a good baseline for a variety of computer vision applications, including object detection.
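The ReLU activation mentioned for AlexNet is simple enough to sketch in a few lines: it keeps positive values and replaces negative ones with zero. The function name `relu` and the sample values are illustrative assumptions.

```python
def relu(values):
    """Rectified Linear Unit: max(0, v) applied to each value."""
    return [max(0, v) for v in values]


# Hypothetical convolution outputs, chosen only for illustration.
activations = relu([-4, 2, 0, 7])
print(activations)
```

Since negative responses are zeroed while positive ones pass through unchanged, the function is non-linear yet very cheap to compute.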