runmed package:stats R Documentation(latin1) _R_u_n_n_i_n_g _M_e_d_i_a_n_s - _R_o_b_u_s_t _S_c_a_t_t_e_r _P_l_o_t _S_m_o_o_t_h_i_n_g _D_e_s_c_r_i_p_t_i_o_n: Compute running medians of odd span. This is the 'most robust' scatter plot smoothing possible. For efficiency (and historical reason), you can use one of two different algorithms giving identical results. _U_s_a_g_e: runmed(x, k, endrule = c("median", "keep", "constant"), algorithm = NULL, print.level = 0) _A_r_g_u_m_e_n_t_s: x: numeric vector, the 'dependent' variable to be smoothed. k: integer width of median window; must be odd. Turlach had a default of 'k <- 1 + 2 * min((n-1)%/% 2, ceiling(0.1*n))'. Use 'k = 3' for 'minimal' robust smoothing eliminating isolated outliers. endrule: character string indicating how the values at the beginning and the end (of the data) should be treated. '"_k_e_e_p"' keeps the first and last k2 values at both ends, where k2 is the half-bandwidth 'k2 = k %/% 2', i.e., 'y[j] = x[j]' for j = 1,..,k2 and (n-k2+1),..,n; '"_c_o_n_s_t_a_n_t"' copies 'median(y[1:k2])' to the first values and analogously for the last ones making the smoothed ends _constant_; '"_m_e_d_i_a_n"' the default, smooths the ends by using symmetrical medians of subsequently smaller bandwidth, but for the very first and last value where Tukey's robust end-point rule is applied, see 'smoothEnds'. algorithm: character string (partially matching '"Turlach"' or '"Stuetzle"') or the default 'NULL', specifying which algorithm should be applied. The default choice depends on 'n = length(x)' and 'k' where '"Turlach"' will be used for larger problems. print.level: integer, indicating verboseness of algorithm; should rarely be changed by average users. _D_e_t_a_i_l_s: Apart from the end values, the result 'y = runmed(x, k)' simply has 'y[j] = median(x[(j-k2):(j+k2)])' (k = 2*k2+1), computed very efficiently. The two algorithms are internally entirely different: "_T_u_r_l_a_c_h" is the Härdle-Steiger algorithm (see Ref.) as implemented by Berwin Turlach. A tree algorithm is used, ensuring performance O(n * log(k)) where 'n <- length(x)' which is asymptotically optimal. "_S_t_u_e_t_z_l_e" is the (older) Stuetzle-Friedman implementation which makes use of median _updating_ when one observation enters and one leaves the smoothing window. While this performs as O(n * k) which is slower asymptotically, it is considerably faster for small k or n. _V_a_l_u_e: vector of smoothed values of the same length as 'x' with an 'attr'ibute 'k' containing (the 'oddified') 'k'. _A_u_t_h_o_r(_s): Martin Maechler maechler@stat.math.ethz.ch, based on Fortran code from Werner Stuetzle and S-plus and C code from Berwin Turlach. _R_e_f_e_r_e_n_c_e_s: Haerdle, W. and Steiger, W. (1995) [Algorithm AS 296] Optimal median smoothing, _Applied Statistics_ *44*, 258-264. Jerome H. Friedman and Werner Stuetzle (1982) _Smoothing of Scatterplots_; Report, Dep. Statistics, Stanford U., Project Orion 003. Martin Maechler (2003) Fast Running Medians: Finite Sample and Asymptotic Optimality; working paper available from the author. _S_e_e _A_l_s_o: 'smoothEnds' which implements Tukey's end point rule and is called by default from 'runmed(*, endrule = "median")'. 'smooth' uses running medians of 3 for its compound smoothers. _E_x_a_m_p_l_e_s: require(graphics) utils::example(nhtemp) myNHT <- as.vector(nhtemp) myNHT[20] <- 2 * nhtemp[20] plot(myNHT, type="b", ylim = c(48,60), main = "Running Medians Example") lines(runmed(myNHT, 7), col = "red") ## special: multiple y values for one x plot(cars, main = "'cars' data and runmed(dist, 3)") lines(cars, col = "light gray", type = "c") with(cars, lines(speed, runmed(dist, k = 3), col = 2)) ## nice quadratic with a few outliers y <- ys <- (-20:20)^2 y [c(1,10,21,41)] <- c(150, 30, 400, 450) all(y == runmed(y, 1)) # 1-neighbourhood <==> interpolation plot(y) ## lines(y, lwd=.1, col="light gray") lines(lowess(seq(y),y, f = .3), col = "brown") lines(runmed(y, 7), lwd=2, col = "blue") lines(runmed(y,11), lwd=2, col = "red") ## Lowess is not robust y <- ys ; y[21] <- 6666 ; x <- seq(y) col <- c("black", "brown","blue") plot(y, col=col[1]) lines(lowess(x,y, f = .3), col = col[2]) lines(runmed(y, 7), lwd=2, col = col[3]) legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"), xjust = 1, col = col, lty = c(0, 1,1), pch = c(1,NA,NA))