CNN Architecture

Convolutional Neural Networks (CNN, or ConvNet) are a type of multi-layer neural network that discern visual patterns from pixel images.

[{"selector":"#anim-3c71cedd-5e37-45a4-9458-915483eceebd","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-e077127a-4196-4a50-82b7-c171dfc3f349","keyframes":{"transform":["translate3d(0px, -856.04974%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-4960e83c-2a6f-4752-a38f-e042fad843fe","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-9733868a-ba66-4925-bf62-77370882e131","keyframes":{"transform":["translate3d(0px, -511.53048%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

Typical CNN Architecture

[{"selector":"#anim-976d4500-e077-4d91-8c71-f16b18e43660","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-55b22d1b-ceec-49c5-9f8e-823a20958233","keyframes":{"transform":["translate3d(0px, -438.45204%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-023ae1f1-5c9f-4977-8624-b335a365a2d0","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-84c6bd9c-01c4-45fc-883f-92fd8a1831ed","keyframes":{"transform":["translate3d(0px, -940.81189%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

Includes a Kernel/Filter component that performs convolution operations. Until an entire image is scanned, kernel make horizontal/vertical adjustments as per stride rate.

[{"selector":"#anim-db6a980a-92b3-4b99-bb92-9399590dee6f","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-d552f5e4-0bfa-48d1-8f8a-8e6752196ab9","keyframes":{"transform":["translate3d(0px, -376.72295%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-245ad38c-0d3f-4264-a2c0-373d0efdd007","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-75b48fe4-1f8e-4091-9cd3-45f1eeb890a0","keyframes":{"transform":["translate3d(0px, -538.34125%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

It reduces dimensionality, as well as reduces computing power requirements for processing the data. Pooling can be classified as maximum pooling or average pooling.

[{"selector":"#anim-fc643e1b-bf93-4afc-9d39-e2607e5eb9c2","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-734f8667-992e-4cf0-9254-1c71d9e41a61","keyframes":{"transform":["translate3d(0px, -375.45714%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-758f121a-0bbe-4553-8576-10c0e1870c92","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-a99f7bd1-3cf1-4c7b-8e4d-58cbf393d4c3","keyframes":{"transform":["translate3d(0px, -1092.06364%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

It works with flattened inputs i.e., each input is coupled to every neuron. Flattened vector is then sent through a few FC layers, where mathematical functional operations are performed.

[{"selector":"#anim-f7c1f220-212a-4855-ab22-144e695783d6","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-250a65d2-78ad-4052-8570-d87fe46c6051","keyframes":{"transform":["translate3d(0px, -376.72295%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-449133c9-9bd2-4529-a59c-5ba9c171970f","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-118fec58-9f1f-43a6-a29e-7f7a160ac41a","keyframes":{"transform":["translate3d(0px, -549.60888%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

1. LeNet Architecture Simple and modest, ideal for teaching CNN basics. It is the most commonly used CNN design and is capable of recognizing handwritten digits, making it an excellent "first CNN".

[{"selector":"#anim-f1ba1f05-2af3-44b2-99e3-68f2d07628c7","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-89cede7b-0e06-48a6-abea-6b236f6459b7","keyframes":{"transform":["translate3d(0px, -302.96299%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-4c395c5f-9231-4cb3-a477-fb481f347907","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-d6dccfe9-7fbc-40f9-9848-a3c3652d76a6","keyframes":{"transform":["translate3d(0px, -529.89058%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

2. AlexNet Architecture First convolutional network to utilize graphics processing units (GPUs). ReLU, a non-linear activation function, is used in each convolutional layer (Rectified Linear Unit).

[{"selector":"#anim-dac9027d-0a85-4719-af5a-d971bfadd72f","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-4afb7971-ef6e-417f-96fe-e14f4e8cebc9","keyframes":{"transform":["translate3d(0px, -299.62964%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

3. VGGNet Architecture The VGG CNN model serves as a good baseline for a variety of applications in computer vision, including object detection, because of its adaptability.

[{"selector":"#anim-0e0a1220-6850-4ce1-9442-a6b2dc815099","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-f93f6b83-11d6-4068-96df-d8e37352503e","keyframes":{"transform":["translate3d(0px, -328.92516%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] Learn more

Find out how CNN Architecture can benefit you.

[{"selector":"#anim-8ef122d8-08d4-4676-ba84-378f14563e2e","keyframes":{"opacity":[0,1]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] [{"selector":"#anim-e9ab37c8-6e83-454d-9a21-60ccbd1e6bef","keyframes":{"transform":["translate3d(0px, -219.0417%, 0)","translate3d(0px, 0px, 0)"]},"delay":0,"duration":600,"easing":"cubic-bezier(0.2, 0.6, 0.0, 1)","fill":"both"}] SWIPE UP

CNN Architecture

A Detailed Explanation

Convolutional Neural Networks (CNN, or ConvNet) are a type of multi-layer neural network that discern visual patterns from pixel images.

Introduction to CNN

Typical CNN Architecture

It consists of 3 layers, which make up its building blocks.

Includes a Kernel/Filter component that performs convolution operations. Until an entire image is scanned, kernel make horizontal/vertical adjustments as per stride rate.

1. Convolutional Layer (CONV)

It reduces dimensionality, as well as reduces computing power requirements for processing the data. Pooling can be classified as maximum pooling or average pooling.

2. Pooling Layer (POOL)

It works with flattened inputs i.e., each input is coupled to every neuron. Flattened vector is then sent through a few FC layers, where mathematical functional operations are performed.

3. Fully Connected Layer (FC)

1. LeNet Architecture Simple and modest, ideal for teaching CNN basics. It is the most commonly used CNN design and is capable of recognizing handwritten digits, making it an excellent "first CNN".

Popular CNN Architectures

2. AlexNet Architecture First convolutional network to utilize graphics processing units (GPUs). ReLU, a non-linear activation function, is used in each convolutional layer (Rectified Linear Unit).

3. VGGNet Architecture The VGG CNN model serves as a good baseline for a variety of applications in computer vision, including object detection, because of its adaptability.

Find out how CNN Architecture can benefit you.