Beyond the Mean: Navigating the Nuances of Central Tendency in Forensic Data Analysis

From the boardroom to the courtroom, there are ample examples of the misuse of statistics despite accurate calculations. Graphical and numerical descriptive statistics are wonderful tools to summarize large data sets, to condense huge amounts of information into concise charts and single values. However, the values reported must be interpreted correctly and used appropriately. It is essential to understand the underlying data, how the data was obtained, and what you are trying to substantiate in order to determine the appropriate statistical measure. Forensic accountants are frequently asked to gather data and use this information to report and answer all sorts of financial questions. Without the proper knowledge and understanding, the misuse of statistics can lead to inaccurate conclusions, ambiguous reasoning, and embarrassment in the courtroom.

Descriptive statistics involve graphical and numerical methods to describe, organize, and summarize data. Four general characteristics are used to describe a data set:

  1. The shape of the distribution: for example, is the data skewed or symmetric?
  2. The center, or central tendency, of the distribution.
  3. The variability, or dispersion, of the data: for example, is the data compact or spread out?
  4. Outliers: observations that are very far away from the rest.

Measures of central tendency are the most common descriptive measures. They are single values, computed from the data, that convey some idea of the center of the data, where the majority of the data is located. The most common measure of central tendency is the sample mean, denoted x̄ (read "x-bar"). This is the sum of the observations divided by the total number of observations. Unfortunately, many people use the words average and mean synonymously. The sample mean is only one kind of average; there are many others.
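As a minimal sketch of this definition, the sample mean can be computed directly as the sum of the observations divided by their count. The invoice amounts below are hypothetical values made up for illustration; they are not drawn from any real engagement.

```python
from statistics import mean

# Hypothetical invoice amounts in dollars (illustrative only, not real data)
invoices = [120, 135, 150, 160, 185]

# Sample mean: sum of the observations divided by the number of observations
x_bar = sum(invoices) / len(invoices)

# The standard library's mean() gives the same value
assert x_bar == mean(invoices)
print(x_bar)  # 150.0
```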

The sample mean is very sensitive to outlying values. Outliers pull the sample mean in their direction and can make it a misleading measure of central tendency. For this reason, the sample median, often denoted x̃ (read "x-tilde"), is frequently used instead. The sample median is simply the middle value when the data is arranged in order from smallest to largest; when the number of observations is even, it is the mean of the two middle values. This descriptive statistic divides the data in half, is very easy to calculate, and is not sensitive to outlying values. The sample median is often used to measure central tendency in data sets involving income or home prices. In both of these cases there are often a few very large observations, far away from the rest of the data, that would pull the sample mean in their direction. In these cases, the sample median is a better measure of the center of the data.
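The effect of a single outlier can be illustrated with a short sketch. The incomes below are hypothetical figures chosen only to make the contrast visible: adding one very large value moves the mean dramatically while the median barely changes.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars (illustrative only, not real data)
incomes = [42_000, 48_000, 51_000, 55_000, 60_000]

# The same data with one very large observation added
with_outlier = incomes + [1_000_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")      # mean:   51,200 -> 209,333
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")  # median: 51,000 -> 53,000
```

With the outlier included, the mean is larger than five of the six observations, so it no longer describes a typical income in this sample.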

The relative position of the sample mean and the sample median can suggest the shape of the distribution. For example, if the sample mean is much larger than the sample median, this suggests that the distribution is skewed to the right. Conversely, if the sample mean is much smaller than the sample median, this suggests that the distribution is skewed to the left. If the two values are approximately the same, this suggests that the distribution is approximately symmetric.
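One way to put a number on this comparison is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. The sketch below uses it only as a rough screen. The function name suggest_shape, the threshold of 0.5, and the sample data are assumptions made for this illustration; the threshold in particular is arbitrary and is not a standard cutoff.

```python
from statistics import mean, median, stdev

def suggest_shape(data, threshold=0.5):
    """Suggest a distribution shape from the gap between mean and median.

    Uses Pearson's second skewness coefficient, 3 * (mean - median) / s.
    The threshold is an arbitrary choice for this sketch. The data must
    contain at least two values that are not all identical, otherwise the
    standard deviation is undefined or zero.
    """
    coefficient = 3 * (mean(data) - median(data)) / stdev(data)
    if coefficient > threshold:
        return "possibly skewed right"
    if coefficient < -threshold:
        return "possibly skewed left"
    return "approximately symmetric"

# Hypothetical data sets (illustrative only)
symmetric = [10, 20, 30, 40, 50]
right_skewed = [10, 12, 14, 16, 18, 200]

print(suggest_shape(symmetric))     # approximately symmetric
print(suggest_shape(right_skewed))  # possibly skewed right
```

A result like this is only a suggestion. A histogram or other plot of the data remains the more reliable way to judge shape.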

There are other measures of central tendency, for example, a trimmed mean, the mode, the geometric mean, the harmonic mean, or even just the arithmetic mean of the smallest and largest observations (sometimes called the midrange). To compute a trimmed mean, we trim a certain percentage of the data from both sides of the distribution, presumably possible outliers, and compute the sample mean of the remaining data. This statistic is less sensitive to outliers, but it does disregard some of the data. The mode is the observation that occurs most often, or with the greatest frequency. Theoretically, this is a measure of central tendency, because we would expect the value that occurs most often to be near the center of the distribution. However, it is possible for a data set to have no mode, or more than one mode.
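These alternatives can be compared side by side in a short sketch. The helpers trimmed_mean and midrange are hypothetical functions written for this illustration, and the amounts are made-up values; the mode, geometric mean, and harmonic mean come from Python's standard statistics module (version 3.8 or later).

```python
from statistics import mean, multimode, geometric_mean, harmonic_mean

def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number of values cut from each side
    return mean(ordered[k:len(ordered) - k])

def midrange(data):
    """Arithmetic mean of the smallest and largest observations."""
    return (min(data) + max(data)) / 2

# Hypothetical transaction amounts (illustrative only, not real data)
amounts = [2, 4, 4, 5, 6, 7, 8, 9, 10, 95]

print(mean(amounts))               # 15
print(trimmed_mean(amounts, 0.1))  # 6.625
print(multimode(amounts))          # [4]
print(midrange(amounts))           # 48.5

# Both of these require strictly positive data
print(geometric_mean(amounts))
print(harmonic_mean(amounts))

# multimode() returns every value tied for the highest frequency
print(multimode([1, 1, 2, 2]))     # [1, 2]
```

Note how the single large value of 95 drags the midrange far from the bulk of the data, while trimming 10 percent from each side removes it entirely. When every value occurs equally often, multimode() returns all of them, which corresponds to the "no mode" case described above.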

There are many descriptive statistics to choose from when measuring central tendency. The sample mean and the sample median are the most common, but the most appropriate measure is usually dictated by the sample data, or the underlying population. For statistical inference, the sample mean is usually the most appropriate measure. However, if very little is known, or assumed, about the underlying population, then the sample median is often used.

Software packages routinely report several measures of central tendency. But it is important to know more about the data and the underlying population, and to understand how each statistic is computed, in order to use the most robust, appropriate measure in each case.

If you are considering litigation and anticipate examination of large data sets which will be the basis of a compelling complaint, you need experts at summarizing data and statistical analysis. Alternatively, if the opposition is presenting an inaccurate report based on spurious statistical analysis, you need experts to identify this and to professionally explain the inconsistent arguments. North American Forensic Accounting has the knowledge, experience, and skills to evaluate data and use all the information to present precise, detailed conclusions. Call us today to learn how we can help you.