You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly, then levels off. Based on this, which of the following conclusions seems most plausible?
Answers
Answer:
α = 0.3 is an effective choice of learning rate.
Explanation:
Among the given options there are three possible conclusions -
a. Rather than use the current value of α, it would be more promising to try a larger value of α (say α = 1.0).
b. Rather than use the current value of α, it would be more promising to try a smaller value of α (say α = 0.1).
c. α = 0.3 is an effective choice of learning rate.
Hence,
Gradient descent = 15 iterations
α = 0.3
J(θ) = decreases quickly, then levels off
Thus,
α = 0.3 is an effective choice of learning rate (option c): we want gradient descent to converge quickly to the minimum, and a cost that falls sharply and then flattens out is exactly what fast convergence looks like, so the current setting of α is a good one.
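To make the picture concrete, here is a minimal sketch in Python on an assumed toy cost J(θ) = 2θ² (not the quiz's actual hypothesis or data), showing the "decreases quickly, then levels off" curve that a well-chosen α produces:

```python
# Minimal sketch: gradient descent on an assumed toy cost
# J(theta) = 2 * theta**2, whose gradient is 4 * theta.
def gradient_descent(alpha, theta=1.0, iters=15):
    costs = []
    for _ in range(iters):
        theta -= alpha * 4 * theta    # theta := theta - alpha * dJ/dtheta
        costs.append(2 * theta ** 2)  # record J(theta) after each iteration
    return costs

# With alpha = 0.3 the cost drops sharply, then flattens near 0:
print([round(c, 6) for c in gradient_descent(0.3)[:5]])
# -> [0.08, 0.0032, 0.000128, 5e-06, 0.0]
```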
Answer:
Explanation:
You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly, then levels off. Based on this, the most plausible conclusion among the given options is the following - α = 0.3 is an effective choice of learning rate.
Since we want gradient descent to converge quickly to the minimum, the current setting of α can be considered a good one.
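For contrast, the same toy cost (again, an assumed example rather than the quiz's actual model) shows why the other two options are less plausible: a larger α (say 1.0) makes J(θ) increase over time because each update overshoots the minimum, while a smaller α (say 0.1) still converges but more slowly:

```python
# Same assumed toy cost J(theta) = 2 * theta**2 as in the sketch above.
def costs_for(alpha, theta=1.0, iters=3):
    out = []
    for _ in range(iters):
        theta -= alpha * 4 * theta    # theta := theta - alpha * dJ/dtheta
        out.append(2 * theta ** 2)    # J(theta) after the update
    return out

print(costs_for(1.0))  # -> [18.0, 162.0, 1458.0]   J(theta) increases (diverges)
print(costs_for(0.1))  # -> [0.72, 0.2592, 0.093312] converges, but slowly
print(costs_for(0.3))  # -> [0.08, 0.0032, 0.000128] converges fastest
```

In practice, plotting J(θ) against the iteration number for a few candidate values of α and keeping the one whose curve decreases fastest without ever rising is the usual way to pick the learning rate.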