**The math is all OK; computation is the nightmare.**

Although the theory of model fitting is quite straightforward, the
implementation can be cumbersome. This is mostly due to the fact that we
have limited precision computers and limited amounts of time. Even
though all the computation is done in 8-byte floats, limited resources
and sometimes the obnoxious shape of the likelihood landscape (or
χ^{2}-landscape) can make life for the fitter difficult.
Another common bear on the road is the (near-)degeneracy in the model
with respect to the data. It sometimes looks more like an art or a
craft than like science

For diagnostic (and debugging) purposes in the iterative fitters there
is the `verbose`

option, which prints increasingly more information for
higher values.

It is also a good idea to visually inspect the results of the fit. All
fitters have a method

`fitter.plotResult( xdata, ydata, model )`

It is more easily called by turning on the plot option in the fit method:

`fitter.fit( ydata, plot=True )`

Here are some guidelines that might help to get usefull results.

The independent variable(s) (x) should be nice numbers, ie. roughly of
order 1. Mostly the fitter obtains solutions by manipulating a matrix
consisting of a direct product of the partial derivatives of the model
to each of its parameters. If the elements of this matrix wildly vary in
size, loss of precision is quickly attained.

E.g. a polynomial model of order 3 with an independent variable which
has values from 1 to 100, will have a matrix with values ranging between
1 and 10^{12}. Mathematically this is all OK, computation ...

The Fitter software can not and does not scale its inputs in any way. It
takes it all at face value. It is upto the user to present the Fitters
with usable data.

The dependent variable (y) has less constraints. Still there is a silent
assumption in the algorithms that the amount of noise in the data is of
the order 1. This is only of importance for the stopping of
iterative fitters. An iterative fitter stops when the change in the cost
function from one iteration to the next is less than the tolerance. When
the cost function is small to begin with, due to small y values, the
fitter might stop too early. Or too late when y values are large.
There is no way to do it right in all imaginable cases.

To redress this condition you can either use weights
or set the tolerance attribute to adapt the stopping
criterion to your problem, or scale the dependent variable such that
the noise level attains a more usefull value.

Check whether χ^{2} is of the order of the
number of datapoints.

Sometimes the model is degenerate, meaning that 2 (or more) of its parameters are essentially measuring the same thing. Trying to fit data using Fitter to such a model results in a singular matrix. In general it is better to use models which are not degenerated.

It can happen that a runtime warning is thrown during the run of a fitter (or of NestedSampler). In most cases the results are still OK. The program sailed undamaged pass the obstacle. Although some extra precautions should be taken with the results, the warnings can almost always be ignored.

When a non-linear fitter searches for a minimum, it migh happen that for
almost all values of the parameters χ^{2} does not have a
gradient. That is when the parameters move away from the sought minimum,
the χ^{2} does not or hardly change. Mostly this can be the
case with models of periodic functions or when you are far from the
global minimum. Only at a small area in the parameters space around the
minimum there is a gradient to move along.

A similar and sometimes simultaneous condition happens when
the landscape is multimodal, i.e. it has more than one minimum. Of
course only one minimum is the lowest, the global minimum. And that one
is the one we want to find. In general fitters don't have a global view
on the χ^{2} landscape. They fall for the first minimum they
encounter.

If you can in some other way localise the area where your parameters
should be found, feed them to the system as initial parameters.
Otherwise you have to use the AmoebaFitter in the annealing mode or even
do exhaustive search. Both strategies take a lot more time.

Sometimes a nonlinear fitter produces a division-by-zero warning. When the fitter just continues and produces sensible results, the warnings can be ignored. They are caused by a noise scale equal to 0, when calculating the loglikelihood. As +inf is larger than any other number the fitter cannot have a minimum there.

The model is (almost) degenerate under the data it is presented to. In general this signals that the model is not the best one for the data at hand. Use a simpler model.

The size of the Simplex in the AmoebaFitter is by default 1. When your x-data is very much larger (or smaller) than 1 and you set your startValues accordingly, you might want to adapt the size of the simplex too. In extreme cases the 1 vanishes in the precision of the startValues and the simplex is frozen in some dimension(s). Use AmoebaFitter.fit( data, size=simplexsize ).

This can only happen when the data fit the
model perfectly. And as true perfection is not given to us it often is
the result of something else e.g. digitization of the data: the data is
only perfect after they have been digitized.
This digitization gives each data point
a noise of 1/sqrt(12), the standard deviation of a uniform distribution.
The actual position of each datapoint can be anywhere between two
digitization levels (presumably at distance 1).
This noise can be added to the internal χ^{2} with
setChiSquared( N/12 ).

When you think that the standard deviations on the parameters are
ridiculously large, please calculate the confidence region on the model fit
(use fitter.monteCarloError()). This 1-σ confidence interval is obtained
directly from the standard deviation and its covariance matrix. Due to strong
covariance between the parameters, the standard deviations can be large
while the confidence region is still acceptable.

The confidence region is actually is better indicator for how well the model
performs than the standard deviations on the parameters.

Standard deviations are proportional to the noise scale divided by the square root of the number of points. Nestedsampler is doing that too. However, if the noise scale is fixed during the run of NestedSampler, the standard deviations on the parameters is as much off as the true noise scale differs from the provided noise scale.

Even worse, if you set the scale of the error distribution to a fixed value while the residuals are significantly smaller, their contributions to the likelihood are all similarly small.

Sometimes it looks like NestedSampler cannot find the maximum in the likelihood
landscape. The path to the likelihood mountain is lost in the n-dim space
spanned by the parameters. A tighter prior might help. Or setting the attribute
minimumIterations to something large. By default it is 100. Enlarging it, gives
NestedSampler more opportunities to explore the parameter space.

Ensuring that the prior space contains the expected values for the posterior
parameter values, can also help.

The standard deviations of the parameters are calculated as the
square root of chisq multiplied by the diagonal elements of the inverse
Hessian matrix at the chisq minimum, divided by the degrees of freedom.
The chisq landscape at that point is a stationary minimum, curving up in
all directions. The shape of the minimum is approximated by the inverse
Hessian (also known as Covariance Matrix), upto the 2nd power in the
Taylor expansion of the chisq function.

When you have set limits on the parameters and the unconstraint minimum
of chisq would be outside the limits, then the constraint minimum of
chisq is at one of the limits. This constraint minimum is not a
stationary point any more so the assumptions above about a neat valley
in the chisq landscape, curving up to all sides are not met.
What the resulting calculations will give is anyones guess.