Truncated Mean

Before getting to the trimmed mean (also called the truncated mean), let's take a quick look at the most common terms used in descriptive and summary statistics (there are many others):

Mean: the arithmetic average of the dataset (the sum of the values divided by the number of values)

Median: the middle value of the sorted dataset

Mode: the value with the highest frequency (the one that occurs most often)

Range: the difference between the largest and smallest values of the dataset

Let's say, for example, we have a dataset:

A = {2, 5, 7, 11, 14, 17, 17, 2, 18, 2, 20, 24, 25, 27, 32}

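So, if we calculate these using base R functions, it will be something like this. Note that base R has no function for the statistical mode (mode() reports the storage type of an object, here "numeric"), so the mode is read off a frequency table instead, and that range() returns the minimum and maximum rather than their difference:

> A <- c(2, 5, 7, 11, 14, 17, 17, 2, 18, 2, 20, 24, 25, 27, 32)
> mean(A)
[1] 14.86667
> median(A)
[1] 17
> names(which.max(table(A)))
[1] "2"
> range(A)
[1] 2 32
> diff(range(A))
[1] 30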

So, what is the trimmed mean?

Like the mean and the median, the truncated mean or trimmed mean is a measure of central tendency. It calculates the average after discarding a fixed percentage of the samples at the extreme high and low ends (where outliers tend to sit). It is a robust statistical method.

So, let's say we have a dataset (source: Wikipedia):

X = { 92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41 }

As you can see, most of the values lie between about 10 and 101, while −40 and 1053 sit far from the rest. So, let's see what we get if we calculate the mean vs. the trimmed mean (with 5% trimmed from each end):

> X <- c(92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40, 101, 86, 85, 15, 89, 89, 28, -5, 41)
> mean(X)
[1] 101.5
> mean(X, trim = 0.05)
[1] 56.5

So, what does this trim value mean, and how do you choose it?

If you choose a 20% trim (trim = 0.2) on a dataset of N = 20, then 20% of 20 = 4, so the four lowest values and the four highest values are removed from the dataset; that is, 20% is cut from the low end and 20% from the high end.
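As a rough sketch of the arithmetic above, the number of values dropped from each end can be computed as floor(N * trim), which is how base R's mean() treats its trim argument. The helper name n_trimmed is made up for this illustration.

```r
# Hypothetical helper (not part of base R): how many observations are
# dropped from EACH end for a sample of size n and a given trim fraction.
n_trimmed <- function(n, trim) {
  floor(n * trim)
}

n_trimmed(20, 0.2)   # 4 values removed from each end, 12 kept
n_trimmed(20, 0.05)  # 1 value removed from each end, 18 kept
n_trimmed(15, 0.1)   # 1 value removed from each end (1.5 is rounded down)
```

Because the count is rounded down, a small trim on a small sample may remove nothing at all.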

Let's consider a dataset:

M = {20, 33, 22, 36, 38, 45, 67, 39, 12, 15, 19, 28, 10, 44, 56, 63, 72, 30, 42, 48}

First, let's sort the dataset in ascending order:

> sort(M)
 [1] 10 12 15 19 20 22 28 30 33 36 38 39 42 44 45 48 56 63 67 72

If we now trim it by 20%, the first four values and the last four values are removed (shown in brackets here):

[10 12 15 19] 20 22 28 30 33 36 38 39 42 44 45 48 [56 63 67 72]

So the trimmed dataset is:

M_trimmed = {20, 22, 28, 30, 33, 36, 38, 39, 42, 44, 45, 48}
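The manual trimming above can be checked against R's built-in trim argument. This is a small sketch using base R only; the variable names k, M_sorted and M_trimmed are chosen for the example.

```r
M <- c(20, 33, 22, 36, 38, 45, 67, 39, 12, 15,
       19, 28, 10, 44, 56, 63, 72, 30, 42, 48)

trim <- 0.2
k <- floor(length(M) * trim)   # number of values to drop per end (4 here)

# Sort, then drop the k lowest and the k highest values
M_sorted  <- sort(M)
M_trimmed <- M_sorted[(k + 1):(length(M) - k)]

M_trimmed              # the 12 values that remain
mean(M_trimmed)        # mean of the remaining values
mean(M, trim = trim)   # built-in trimmed mean; should agree with the line above
```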

People commonly choose a 20% trim, but you can choose another trim value depending on the distribution of your data.

What are the advantages of the truncated mean?

a. As it is less sensitive to outliers, it gives a reasonable estimate of central tendency when the sample distribution is skewed or uneven.

b. The standard error of the trimmed mean is less affected by outliers.
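To make point (a) concrete, here is a small illustration with the dataset M from earlier. The contaminated copy M_outlier, in which the largest value is replaced by 7200, is invented purely for this example.

```r
M <- c(20, 33, 22, 36, 38, 45, 67, 39, 12, 15,
       19, 28, 10, 44, 56, 63, 72, 30, 42, 48)

# Hypothetical contamination: replace the largest value (72) with 7200
M_outlier <- M
M_outlier[which.max(M_outlier)] <- 7200

# The ordinary mean is pulled strongly towards the outlier
mean(M)
mean(M_outlier)

# The 20% trimmed mean is unchanged, because the outlier falls
# among the four highest values that are discarded anyway
mean(M, trim = 0.2)
mean(M_outlier, trim = 0.2)
```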

Which statistical tests can you use with the truncated mean?

You can use trimmed means instead of means in a t-test. However, the calculation of the standard error differs from the traditional t-test formula, because the values are no longer independent after trimming. An adjusted standard error calculation for the trimmed mean was originally proposed by Karen Yuen in 1974, and it involves "winsorization". In winsorization, instead of removing observations as in trimming, we replace each extreme value with the nearest value that was not trimmed. So for dataset M, the four lowest values become 20, the four highest become 48, and the 20% winsorized sample will be:

M_winsorized = {20, 20, 20, 20, 20, 22, 28, 30, 33, 36, 38, 39, 42, 44, 45, 48, 48, 48, 48, 48}
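A minimal base R sketch of winsorization and of the standard error it feeds into. The helper winsorize is written for this illustration (it is not a base R function). The standard error formula s_w / ((1 - 2 * tr) * sqrt(n)), with s_w the standard deviation of the winsorized sample, is the usual textbook form and is an assumption here, not something spelled out in this post.

```r
M <- c(20, 33, 22, 36, 38, 45, 67, 39, 12, 15,
       19, 28, 10, 44, 56, 63, 72, 30, 42, 48)

# Illustrative helper: pull the k lowest values up to the (k+1)-th smallest
# value and the k highest values down to the (k+1)-th largest value.
winsorize <- function(x, tr = 0.2) {
  n  <- length(x)
  k  <- floor(n * tr)
  xs <- sort(x)
  lower <- xs[k + 1]
  upper <- xs[n - k]
  pmin(pmax(x, lower), upper)
}

M_w <- winsorize(M, tr = 0.2)
sort(M_w)   # the four lowest values become 20, the four highest become 48

# Standard error of the 20% trimmed mean, based on the winsorized sample
tr <- 0.2
se_tm <- sd(M_w) / ((1 - 2 * tr) * sqrt(length(M)))
se_tm
```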

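Yuen's approach (these functions are not part of base R; they come from Rand Wilcox's collection of robust statistics functions for R):

Trimmed mean with confidence interval: trimci(x, tr = 0.2, alpha = 0.05)

Yuen t-test for 2 independent groups: yuen(x, y, tr = 0.2)

Yuen t-test for 2 dependent groups: yuend(x, y, tr = 0.2)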
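If those functions are not available, a confidence interval for a trimmed mean can be sketched in base R along the following lines. This assumes the usual construction (trimmed mean plus or minus a t quantile times the winsorized standard error, with n - 2k - 1 degrees of freedom, where k is the number of values trimmed per end). The function name trimmed_ci is hypothetical, and the result is not guaranteed to match trimci() digit for digit.

```r
# Hypothetical helper: confidence interval for a trimmed mean (base R only)
trimmed_ci <- function(x, tr = 0.2, alpha = 0.05) {
  n  <- length(x)
  k  <- floor(n * tr)                      # values trimmed from each end
  xs <- sort(x)
  xw <- pmin(pmax(x, xs[k + 1]), xs[n - k])  # winsorized sample
  se <- sd(xw) / ((1 - 2 * tr) * sqrt(n))    # winsorized standard error
  df <- n - 2 * k - 1                        # degrees of freedom
  tm <- mean(x, trim = tr)                   # trimmed mean
  crit <- qt(1 - alpha / 2, df)
  c(estimate = tm, lower = tm - crit * se, upper = tm + crit * se)
}

M <- c(20, 33, 22, 36, 38, 45, 67, 39, 12, 15,
       19, 28, 10, 44, 56, 63, 72, 30, 42, 48)

trimmed_ci(M, tr = 0.2, alpha = 0.05)
```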

Further Reading:

  1. Trimmed Mean: https://garstats.wordpress.com/2017/11/28/trimmed-means/
  2. Karen K. Yuen. The two-sample trimmed t for unequal population variances, Biometrika, Volume 61, Issue 1, 1 April 1974, Pages 165–170, https://doi.org/10.1093/biomet/61.1.165
  3. Trimmed mean: https://medium.com/@HollyEmblem/when-to-use-a-trimmed-mean-fd6aab347e46
