Science, asked by vivan3447, 1 year ago

Comparative analysis of evolutionary algorithms for a dataset using MATLAB

Answers

Answered by mrunalinividya

Array CGH data consist of the log-ratios of normalized intensities from disease vs control samples, indexed by the physical location of the probes on the genome. The goal is to identify regions of concentrated high or low log-ratios. In general, these regions of interest can be very small; some microdeletions may contain only a single probe. Because attempting to identify such small regions can result in too many false positives, information from consecutive probes is used to identify larger regions with more confidence.

The first analytical methods were simple yet intuitive and often effective, involving smoothing of the ratio profiles and applying a reasonable threshold to determine if the average ratio over a potential region signified an amplification or a deletion. For instance, a moving average was used to process the ratios, and a ‘normal versus normal’ hybridization was used to compute a threshold level (Pollack et al., 2002). In another study, a simple maximum likelihood method was used to fit a mixture of three Gaussian distributions corresponding to gain, loss and normal regions (Hodgson et al., 2001).
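The smoothing-and-threshold idea can be sketched in a few lines of Python. This is only an illustration, not the procedure of any cited study: the function names, the window of 5 probes and the cutoff of k = 3 standard deviations of the smoothed normal-versus-normal profile are all assumptions made for the example.

```python
from statistics import mean, pstdev

def moving_average(log_ratios, window=5):
    """Centered moving average; the window is truncated at the ends of the profile."""
    half = window // 2
    return [mean(log_ratios[max(0, i - half): i + half + 1])
            for i in range(len(log_ratios))]

def call_probes(test, normal, window=5, k=3.0):
    """Label each probe +1 (gain), -1 (loss) or 0 (no call)."""
    # Assumed rule: the threshold is k standard deviations of the
    # smoothed normal-versus-normal profile.
    threshold = k * pstdev(moving_average(normal, window))
    calls = []
    for value in moving_average(test, window):
        calls.append(1 if value > threshold else -1 if value < -threshold else 0)
    return calls
```

Here `test` and `normal` are lists of log-ratios ordered by genomic position. A larger window suppresses more noise but blurs the boundaries of short regions, which is the trade-off the later methods try to improve on.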

Broadly, there are two estimation problems. One is to infer the number and statistical significance of the alterations; the other is to locate their boundaries accurately. The many available methods differ in the ways in which each part is modeled and the two are combined. In general, the formulation of a model-based method presumes a sequence of piecewise constant segments as a function of various parameters such as the number of breakpoints, their locations and the mean/variance of the distributions for each segment. Then the maximization of a function, typically a log-likelihood, is used to estimate the model parameters from the data. In the likelihood, a penalty term for the number of segments is often included to avoid too fine a partition, which tends to increase the likelihood. Models differ in their distributional assumptions and the incorporation of penalty terms.
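As a concrete instance of this formulation, assume Gaussian noise with a common variance, so that maximizing the log-likelihood of a piecewise constant model amounts to minimizing the sum of squared errors around the segment means. Adding a fixed penalty per segment then gives a criterion that can be minimized exactly by dynamic programming. The sketch below is a generic illustration under those assumptions; the function name `segment` and the simple per-segment penalty are choices made here, not the model of any particular cited method.

```python
def segment(log_ratios, penalty):
    """Return breakpoints minimizing squared error + penalty * (number of segments)."""
    n = len(log_ratios)
    # Prefix sums give the squared error of any segment [s, t) in constant time.
    c1 = [0.0] * (n + 1)
    c2 = [0.0] * (n + 1)
    for i, v in enumerate(log_ratios):
        c1[i + 1] = c1[i] + v
        c2[i + 1] = c2[i] + v * v

    def sse(s, t):
        total = c1[t] - c1[s]
        return (c2[t] - c2[s]) - total * total / (t - s)

    best = [0.0] + [float("inf")] * n   # best[t]: minimal penalized cost of the first t probes
    last = [0] * (n + 1)                # last[t]: start of the final segment in that optimum
    for t in range(1, n + 1):
        for s in range(t):
            cost = best[s] + sse(s, t) + penalty
            if cost < best[t]:
                best[t], last[t] = cost, s

    # Walk back through `last` to recover the breakpoints.
    breaks, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            breaks.append(t)
    return sorted(breaks)
```

Each returned index is the first probe of a new segment. With a penalty of zero every probe becomes its own segment, which is the overly fine partition that the penalty term is there to prevent; a very large penalty returns no breakpoints at all.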

Subsequently, more complicated methods for denoising and estimating the spatial dependence were derived. Genomic amplifications and deletions are assumed to cover multiple probes in general, and an effective incorporation of this spatial structure is a key component in any algorithm. For instance, a quantile smoothing method based on the minimization of errors in L1 norm (sum of absolute errors) rather than L2 norm (sum of squared errors) is shown to give sharper boundaries between segments (Eilers and de Menezes, 2005). Another promising smoothing algorithm is denoising by wavelets (Hsu et al., 2005), a nonparametric technique that appears to handle abrupt changes in the profiles well. A simpler and more common approach is based on robust locally weighted regression for smoothing scatterplots (lowess), introduced in Cleveland (1979). It has been used in other work, for example Beheshti et al. (2003).
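The effect of the error norm on boundaries can be seen with a much simpler stand-in. Within a window, the mean is the value that minimizes the sum of squared errors and the median is the value that minimizes the sum of absolute errors. The sketch below compares a running mean with a running median on a noise-free step. It is not the quantile smoothing procedure of the cited paper, and the helper name `running_stat` and the window of 5 are assumptions for illustration.

```python
from statistics import mean, median

def running_stat(values, window, stat):
    """Apply `stat` over a centered window, truncated at the ends of the profile."""
    half = window // 2
    return [stat(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

# Noise-free step: 10 probes at 0 followed by 10 probes at 1.
profile = [0.0] * 10 + [1.0] * 10

mean_smoothed = running_stat(profile, 5, mean)      # L2 within each window
median_smoothed = running_stat(profile, 5, median)  # L1 within each window
```

On this profile the running median reproduces the step exactly, while the running mean spreads the jump over several probes on either side of the boundary.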

In Olshen and Venkatraman (2002) and Olshen et al. (2004) the binary segmentation method (Sen and Srivastava, 1975) is modified to allow splits into either two or three segments. In this algorithm, termed Circular Binary Segmentation (CBS), the maximum of a likelihood ratio statistic is used recursively to detect narrower segments of aberration. In Jong et al. (2003, 2004), a genetic search algorithm is used to maximize a likelihood with a penalty term containing the number of breakpoints. In Hupe et al. (2004), a more complex likelihood function with weights determined adaptively is used to solve the estimation problem locally based on data smoothed by the Adaptive Weights Smoothing procedure (Polzehl and Spokoiny, 2000). In Picard et al. (2005), a likelihood method with a different penalty function for the number of segments is used to avoid underestimating that number. It is pointed out that a distribution assumption can have an important
