Biology, asked by Aadill69591, 10 months ago

Derive a suitable expression for local gradient of a neuron j which is present in the hidden layer of the neural network.

Answers

Answered by muhammedashique383

Answer:

Artificial neural networks (ANNs) are a powerful class of models used for nonlinear regression and classification tasks, motivated by biological neural computation. The general idea behind ANNs is straightforward: map some input onto a desired target value using a distributed cascade of nonlinear transformations (see Figure 1). However, for many, myself included, the learning algorithm used to train ANNs can be difficult to get your head around at first. In this post I give a step-by-step walk-through of the derivation of the gradient descent learning algorithm commonly used to train ANNs (a.k.a. the backpropagation algorithm) and try to provide some high-level insight into the computations being performed during learning.

An ANN consists of an input layer, an output layer, and any number (including zero) of hidden layers situated between the input and output layers. Figure 1 diagrams an ANN with a single hidden layer. The feed-forward computations performed by the ANN are as follows. The signals from the input layer a_i are multiplied by a set of fully-connected weights w_{ij} connecting the input layer to the hidden layer. These weighted signals are then summed and combined with a bias b_j (not displayed in the graphical model in Figure 1). This calculation forms the pre-activation signal z_j = b_j + \sum_i a_i w_{ij} for the hidden layer. The pre-activation signal is then transformed by the hidden layer activation function g_j to form the feed-forward activation signal a_j leaving the hidden layer. In a similar fashion, the hidden layer activation signals a_j are multiplied by the weights w_{jk} connecting the hidden layer to the output layer, a bias b_k is added, and the resulting signal is transformed by the output activation function g_k to form the network output a_k. The output is then compared to a desired target t_k and the error between the two is calculated.
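Collecting these steps, the forward pass described above can be summarized in two lines (z_k below denotes the output-layer pre-activation, an assumed symbol following the same pattern as z_j, since the text does not name it explicitly):

z_j = b_j + \sum_i a_i w_{ij}, \qquad a_j = g_j(z_j)

z_k = b_k + \sum_j a_j w_{jk}, \qquad a_k = g_k(z_k)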

Training a neural network involves determining the set of parameters \theta = \{\mathbf{W},\mathbf{b}\} that minimize the errors that the network makes. A common choice for the error function is half the sum of the squared differences between the target values t_k and the network outputs a_k:

E = \frac{1}{2} \sum_{k \in K}(a_k - t_k)^2

Equation (1)
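As a first step toward the gradients derived below, differentiating Equation (1) with respect to a single output activation a_k gives:

\frac{\partial E}{\partial a_k} = a_k - t_k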

This problem can be solved using gradient descent, which requires determining \frac{\partial E}{\partial \theta} for all \theta in the model. Note that, in general, there are two sets of parameters: those parameters that are associated with the output layer (i.e. \theta_k = \{w_{jk}, b_k\}), and thus directly affect the network output error; and the remaining parameters that are associated with the hidden layer(s), and thus affect the output error indirectly.
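By the chain rule, the gradients for these two sets of parameters factor differently. The sketch below (again writing z_k for the output-layer pre-activation, by analogy with z_j) shows where the "local gradient" terms \delta_k = \partial E/\partial z_k and \delta_j = \partial E/\partial z_j come from:

\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial z_k} \frac{\partial z_k}{\partial w_{jk}} = \delta_k \, a_j \qquad\qquad \frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial z_j} \frac{\partial z_j}{\partial w_{ij}} = \delta_j \, a_i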

Before we begin, let’s define the notation that will be used in the remainder of the derivation. Please refer to Figure 1 for any clarification.

z_j: input (pre-activation) to node j in layer l

g_j: activation function for node j in layer l (applied to z_j)

a_j = g_j(z_j): output/activation of node j in layer l

w_{ij}: weight connecting node i in layer (l-1) to node j in layer l

b_j: bias for unit j in layer l

t_k: target value for node k in the output layer
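With this notation in place, here is a sketch of the expression the question asks for. The local gradient of a node is its error derivative with respect to the node's pre-activation, \delta_j = \partial E / \partial z_j. For an output node, the chain rule applied to Equation (1) gives \delta_k = (a_k - t_k) \, g_k'(z_k). A hidden node j influences the error only through the output-layer pre-activations z_k (each of which receives the term a_j w_{jk}), so summing its contributions over all output nodes yields the hidden-layer local gradient:

\delta_j = \frac{\partial E}{\partial z_j} = \sum_{k \in K} \frac{\partial E}{\partial z_k} \frac{\partial z_k}{\partial a_j} \frac{\partial a_j}{\partial z_j} = g_j'(z_j) \sum_{k \in K} \delta_k \, w_{jk}

The hidden-layer parameter gradients then follow as \frac{\partial E}{\partial w_{ij}} = a_i \, \delta_j and \frac{\partial E}{\partial b_j} = \delta_j, which is why \delta_j is called the local gradient: it is the only error information node j needs in order to update its incoming weights and bias.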
