A non-linear model can be transformed into a linear model as (the formula)
Answers
You don't transform the model. You transform the data, in order to fit a linear model.
Note that the term "linear model" is ambiguous. It actually means a model that is linear in the coefficients to be estimated. It has the general form
y = b0 + b1*f1(x1,x2,...) + b2*f2(x1,x2,...) + b3*f3(x1,x2,...) +...
where the xi are predictor variables, the fi are arbitrary functions (possibly the identity function, so that f(x) = x), and the bi are the coefficients. The model is linear because no coefficient appears inside any of the functions.
So y = b0 + b1 * sin(x) certainly does not describe a linear relationship between x and y, but it is linear in the coefficients b0 and b1. In contrast, y = b0 + sin(b1*x) is not a linear model, because it is not linear in b1.
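To make this concrete, here is a minimal sketch of fitting y = b0 + b1 * sin(x) by ordinary least squares. Since the model is linear in b0 and b1, we can treat z = sin(x) as an ordinary predictor and use the closed-form simple-regression formulas. The data and the true values b0 = 2, b1 = 3 are assumptions made up for this illustration:

```python
import math

# Hypothetical noiseless data from y = 2 + 3*sin(x) (the values 2 and 3
# are assumptions for this sketch).
xs = [0.1 * i for i in range(50)]
ys = [2.0 + 3.0 * math.sin(x) for x in xs]

# The model is linear in the coefficients, so z = sin(x) can be treated
# as an ordinary predictor in simple linear regression.
zs = [math.sin(x) for x in xs]
n = len(zs)
z_bar = sum(zs) / n
y_bar = sum(ys) / n
b1 = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys)) / \
     sum((z - z_bar) ** 2 for z in zs)
b0 = y_bar - b1 * z_bar

print(b0, b1)  # recovers 2 and 3 on this noiseless data
```

Note that only the predictor is transformed here; no transformation of y is involved, which is exactly why the fit stays a linear-model problem.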
To fit a model (i.e. to find the values of the coefficients) we need some assumption about the probability distribution of y (conditional on the values of the bi). We can then identify the coefficient values that maximize the likelihood of the observed data. The math simplifies considerably when the assumed conditional probability model is the normal distribution; in this case there exist efficient algorithms that find the unique solution.
If the probability model is different, the solution may not be simple to find and may not be unique. However, as long as the unknown coefficients still enter the model linearly, quite efficient algorithms exist to arrive at a good solution. In earlier times, when computers were rare, such problems could not be solved, so one was forced to assume a conditional normal distribution for y in order to fit the model at all. This assumption might be unreasonable for a given response variable, but reasonable for an appropriately transformed one. Fitting the model to the transformed data could therefore be done with the far less calculation-intensive procedures.
The "problem" with this approach is that a transformation of the response changes the meaning of the coefficients. Usually the coefficient values are not readily interpretable when a transformed response is used, which can be a high price to pay. The log-transformation is one nice exception: it turns the additive effects of the coefficients in the model into multiplicative effects on the response, and the coefficients remain interpretable.
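The multiplicative interpretation can be sketched as follows. Assume (for illustration only) noiseless data with the structure y = 5 * 1.2^x, fit log(y) = b0 + b1*x by ordinary least squares, and exponentiate the coefficients:

```python
import math

# Hypothetical data with a multiplicative structure: y = 5 * 1.2**x,
# i.e. each unit increase in x multiplies y by 1.2 (values assumed
# for this sketch).
xs = list(range(1, 21))
ys = [5.0 * 1.2 ** x for x in xs]

# Fit the linear model log(y) = b0 + b1*x by ordinary least squares.
ls = [math.log(y) for y in ys]
n = len(xs)
x_bar = sum(xs) / n
l_bar = sum(ls) / n
b1 = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, ls)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = l_bar - b1 * x_bar

# On the original scale the additive coefficients act multiplicatively:
baseline = math.exp(b0)  # fitted y at x = 0
factor   = math.exp(b1)  # multiplicative change in y per unit of x
```

Here exp(b1) recovers the factor 1.2, so a coefficient on the log scale still has a clear meaning ("y changes by 20% per unit of x"), which is why the log-transformation is the friendly exception.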
It is sometimes better not to transform the response values but to let the model fit a transformed expectation (mean) instead. This can be achieved with the link function of a generalized linear model, which also allows one to specify a particular variance structure and probability model for the response.
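As a sketch of how such a fit works under the hood, here is a minimal iteratively reweighted least squares (IRLS) loop for a GLM with a log link and Poisson-style variance (variance equal to the mean), one predictor plus an intercept. The data and the true coefficients (0.5 and 0.3) are assumptions invented for this example; in practice one would use a library such as statsmodels or R's glm rather than hand-rolling the loop:

```python
import math

# Hypothetical data following log(E[y]) = 0.5 + 0.3*x exactly
# (coefficients assumed for this sketch).
xs = [0.2 * i for i in range(30)]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]

# Start from the intercept-only fit.
b0, b1 = math.log(sum(ys) / len(ys)), 0.0
for _ in range(50):
    # Fitted means on the response scale (inverse of the log link).
    mus = [math.exp(b0 + b1 * x) for x in xs]
    # Working response and weights for a log link with Poisson variance.
    zs = [(b0 + b1 * x) + (y - mu) / mu for x, y, mu in zip(xs, ys, mus)]
    ws = mus
    # Solve the 2x2 weighted least-squares normal equations directly.
    sw   = sum(ws)
    swx  = sum(w * x for w, x in zip(ws, xs))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swz  = sum(w * z for w, z in zip(ws, zs))
    swxz = sum(w * x * z for w, x, z in zip(ws, xs, zs))
    det = sw * swxx - swx * swx
    b0 = (swxx * swz - swx * swxz) / det
    b1 = (sw * swxz - swx * swz) / det
```

The key difference from the log-transform approach above: here the model describes log(E[y]), not E[log(y)], so y itself is never transformed and the coefficients keep their multiplicative interpretation on the original scale of the response.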
Hope it helps