Data Collection & Presentation

FORMULATION OF FREQUENCY TABLE FOR UNGROUPED DATA

Ungrouped data refers to a dataset in which the individual observations or values are presented without any grouping or classification into intervals. This type of data does not have predefined class intervals and may consist of discrete values or continuous measurements. Analyzing ungrouped data involves performing certain key steps to organize and summarize the information effectively.

The first step in handling ungrouped data is to create a tally sheet. A tally sheet is a simple method of recording the frequency or count of each unique value or observation in the dataset. It involves drawing a vertical line for every occurrence of a particular value and grouping these lines in sets of five. Once all the observations are tallied, the number of lines for each value is counted to determine its frequency.

After completing the tally sheet, the next step is to construct a frequency table. A frequency table provides a systematic summary of the data by presenting the values and their corresponding frequencies. It consists of two columns: one for the distinct values in the dataset and another for their respective frequencies. The values are listed in ascending or descending order, and their frequencies are recorded beside them.

The frequency table for ungrouped data offers valuable insights into the distribution and occurrence of individual values within the dataset. It allows for a clearer understanding of the data by highlighting the most frequent or recurring values and identifying any outliers or unique observations. Additionally, the frequency table can serve as a foundation for further statistical analysis or visualization, aiding in the interpretation and communication of the dataset’s characteristics.

Example: The following are the scores of thirty (30) students of SS 1 in an economics test.

2, 4, 8, 8, 2, 6, 6, 8, 2, 4

8, 0, 8, 6, 0, 10, 2, 2, 0, 10

4, 6, 0, 10, 2, 2, 6, 6, 4, 2

Scores    Tally        Frequency
0         ||||         4
2         ||||| |||    8
4         ||||         4
6         ||||| |      6
8         |||||        5
10        |||          3
Total                  30
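The tallying procedure can also be sketched in Python. This is only an illustration and not part of the original lesson: it assumes the thirty scores above are stored in a list called scores (a name chosen here), and it uses collections.Counter from the standard library to count each distinct value.

```python
from collections import Counter

# The thirty test scores from the example above.
scores = [
    2, 4, 8, 8, 2, 6, 6, 8, 2, 4,
    8, 0, 8, 6, 0, 10, 2, 2, 0, 10,
    4, 6, 0, 10, 2, 2, 6, 6, 4, 2,
]

# Count how many times each distinct score occurs (the tally step).
counts = Counter(scores)

# List the distinct scores in ascending order beside their frequencies.
frequency_table = [(score, counts[score]) for score in sorted(counts)]

print("Score  Frequency")
for score, frequency in frequency_table:
    print(f"{score:>5}  {frequency:>9}")
print(f"Total  {sum(counts.values()):>9}")
```

Checking that the frequencies add up to the number of observations (30 here) is a quick way to catch a tallying mistake.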

 MEASURES OF CENTRAL TENDENCY

  1. Mean
  2. Median
  3. Mode

Measures of central tendency are values that show the point around which a set of data tends to cluster. They are also called measures of location, and they give the middle, center, or average of a set of data. They include the mean, the median, and the mode.

THE MEAN

The mean, also known as the arithmetic mean, is a fundamental measure of central tendency used to describe the typical or average value in a set of data. It is calculated by summing up all the values in a series and then dividing that sum by the total number of observations.

The mean is widely employed in various fields of study, such as statistics, mathematics, economics, and social sciences, due to its simplicity and interpretability. It provides a concise representation of the data set, offering a single value that summarizes the overall trend or central position of the observations.

To compute the mean, each individual value in the data set is added together to obtain their sum. This sum is then divided by the total number of observations, resulting in the mean value. By dividing the sum by the number of observations, the mean balances out the contribution of each value, making it a fair representation of the entire data set.

The mean is commonly used for both discrete and continuous data, making it applicable to a wide range of scenarios. For discrete data, such as the number of siblings a person has, the mean need not be a whole number: a group can average 2.4 siblings even though no one has 2.4 siblings. For continuous data, like the heights of individuals, the mean is likewise an average that may carry decimal places.

By utilizing the mean, researchers, analysts, and decision-makers gain valuable insights into the central tendency of the data. It allows for comparisons between different data sets or subgroups, facilitating the identification of patterns, trends, or deviations from the norm. Furthermore, the mean serves as a foundation for various statistical analyses, including hypothesis testing, regression modeling, and inferential statistics.

While the mean is a powerful descriptive statistic, it can be influenced by outliers or extreme values in the data set, leading to potential distortions. In such cases, other measures of central tendency, such as the median or mode, may provide a more robust representation of the data. Nevertheless, the mean remains a widely utilized and fundamental measure, providing a quick and intuitive understanding of the average value within a given data set.

TYPES OF MEAN

  1. The Arithmetic Mean
  2. The Geometric Mean
  3. The Quadratic Mean

When it comes to calculating means, there are several types that serve different purposes and are applicable in various contexts. Here, we will explore three commonly used types of means: the arithmetic mean, the geometric mean, and the quadratic mean.

1. The Arithmetic Mean:
The arithmetic mean, also known as the average, is the most widely used type of mean. It is calculated by summing up all the values in a dataset and then dividing the sum by the total number of observations. The arithmetic mean provides a measure of the central tendency and is often used to describe the typical value in a set of data. It is useful for both discrete and continuous data and provides a straightforward representation of the data set.

2. The Geometric Mean:
The geometric mean is primarily used when dealing with positive values, such as rates of growth, investment returns, or ratios. It is calculated by taking the nth root of the product of n values. The geometric mean is useful when examining multiplicative relationships between variables, as it captures the proportional changes between values. It is commonly used in finance, biology, and other fields where relative changes and growth rates are of interest.

3. The Quadratic Mean (Root Mean Square):
The quadratic mean, also known as the root mean square (RMS), is frequently employed in areas involving squared quantities, such as signal processing, physics, and engineering. It is calculated by taking the square root of the average of the squared values in a dataset. The quadratic mean is particularly useful when dealing with fluctuating values or analyzing the magnitude of fluctuations. For example, it is utilized in measuring the effectiveness of soundproofing materials or calculating the average power in an electrical circuit.

Each type of mean has its own unique characteristics and applications. The arithmetic mean provides a general overview of the data set, the geometric mean emphasizes relative changes and ratios, while the quadratic mean focuses on squared quantities and fluctuations. Understanding the appropriate use of each mean is crucial for accurate analysis and interpretation of data in specific domains.
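The three definitions above can be compared side by side in a short Python sketch. The function names and the sample values are assumptions made for this illustration. Each function follows the verbal definition given in the text.

```python
import math

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def geometric_mean(values):
    """nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))

def quadratic_mean(values):
    """Square root of the average of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

sample = [2, 8]  # hypothetical data chosen for easy arithmetic
print(arithmetic_mean(sample))  # (2 + 8) / 2
print(geometric_mean(sample))   # square root of 2 * 8
print(quadratic_mean(sample))   # square root of (4 + 64) / 2
```

For positive values, the geometric mean never exceeds the arithmetic mean, and the arithmetic mean never exceeds the quadratic mean. The three agree only when all the values are equal.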

Example

Calculate the arithmetic mean of the following scores of eight students in an economics test. The scores are: 14, 18, 24, 16, 30, 12, 20, and 10.

Solution

Add up the scores

14+18+24+16+30+12+20+10 = 144

Number of observations (students) = 8

Arithmetic Mean = Sum of observations / Number of observations

= 144 / 8 = 18
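The same calculation can be checked with a few lines of Python. This is a minimal sketch, and the variable names are chosen here for illustration.

```python
# Scores of the eight students from the example above.
scores = [14, 18, 24, 16, 30, 12, 20, 10]

total = sum(scores)   # sum of observations
count = len(scores)   # number of observations
mean = total / count  # arithmetic mean

print(total, count, mean)
```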

5 ADVANTAGES OF THE MEAN

The mean, also known as the arithmetic mean, possesses several advantages that contribute to its widespread use and popularity as a measure of central tendency.

1. Easy to derive or calculate:
One of the primary advantages of the mean is its simplicity in calculation. It involves summing up all the values in a dataset and dividing the sum by the total number of observations. This straightforward calculation makes it accessible to individuals with varying levels of mathematical proficiency and allows for quick and efficient computations.

2. Easy to interpret:
The mean offers a straightforward interpretation. It represents the average value or the typical value within a dataset. This makes it intuitive for individuals to understand and relate to. For example, if the mean age of a group is 30 years, it indicates that, on average, the individuals in the group have an age of 30 years.

3. Best-known average:
The mean is perhaps the most widely recognized and commonly used measure of central tendency. Its ubiquity in statistical analysis, research, and everyday applications has led to its familiarity among researchers, practitioners, and the general public. Its popularity ensures that it is widely understood and accepted as a valid measure.

4. Determinate exact value:
The mean always works out to a single, exact value for a given dataset. The mode, by contrast, may not be unique, and for grouped data the median and mode can only be estimated. This determinate value facilitates precise calculations and allows for direct comparison and analysis of datasets. The mean provides a specific numerical representation of the dataset’s central position.

5. Provides a good measure of comparison:
The mean is particularly useful when comparing different datasets or subgroups. It provides a common metric that allows for meaningful comparisons between various groups or categories. For instance, the mean salaries of employees in different departments can be compared to evaluate disparities or identify areas of concern.

By leveraging these advantages, the mean becomes a powerful tool for summarizing and analyzing data. Its ease of calculation, interpretability, established prominence, determinate value, and comparability contribute to its extensive utilization across a wide range of disciplines and applications. However, it is important to consider the limitations and potential biases associated with the mean, such as its sensitivity to outliers and skewed distributions, and to supplement its analysis with other measures of central tendency when necessary.

5 DISADVANTAGES OF THE MEAN

While the mean, or arithmetic mean, has several advantages, it is essential to acknowledge its limitations and potential disadvantages.

1. Difficult to determine without calculation:
Unlike other measures of central tendency, such as the median or mode, which can be identified by inspection or sorting, the mean requires calculation. This can be a disadvantage when working with large datasets or in situations where manual calculations may be time-consuming or prone to error.

2. Some facts may be concealed:
The mean can sometimes mask important information or variations within a dataset. It provides a single value that represents the average, but it does not reveal the underlying distribution or patterns. Extreme values, outliers, or skewed distributions may have a significant impact on the mean, causing it to deviate from the typical values in the dataset.

3. Cannot be obtained graphically:
Unlike the median, which can be read from a graphical representation such as a box plot, the mean cannot be read directly from a graph. It has to be calculated from the individual values in the dataset, which limits the ability to quickly visualize or estimate it from a graphical representation.

4. Difficulty with incorrect or missing values:
When calculating the mean, the inclusion of incorrect or missing values can lead to distorted results. A single erroneous or outlier value can significantly affect the mean, causing it to deviate from the true representation of the dataset. Moreover, missing values can introduce challenges in determining the mean since it requires the availability of complete data.

5. Potential for distorted results:
The mean is sensitive to extreme values or outliers, which can skew the result. In datasets with skewed distributions or data that does not follow a normal distribution, the mean may not accurately represent the central tendency. It can be significantly influenced by a few extreme values and may not reflect the majority of observations in the dataset.

To overcome the limitations of the mean, it is advisable to consider complementary measures of central tendency, such as the median or mode, and to examine the distribution and characteristics of the data. Using multiple measures provides a more comprehensive understanding of the dataset and helps mitigate the potential distortions introduced by outliers or skewed data. Additionally, graphical representations and exploratory data analysis techniques can enhance the interpretation of data beyond relying solely on the mean.
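The sensitivity to extreme values can be illustrated with a small Python sketch. The numbers are hypothetical and were chosen only to make the effect easy to see. The functions mean and median come from the standard library statistics module.

```python
from statistics import mean, median

typical = [10, 12, 14, 16, 18]  # hypothetical values with no outlier
with_outlier = typical + [200]  # the same values plus one extreme value

# Without the outlier the two measures agree.
print(mean(typical), median(typical))

# One extreme value pulls the mean far above most of the data,
# while the median barely moves.
print(mean(with_outlier), median(with_outlier))
```

In this sketch a single value of 200 moves the mean from 14 to 45, while the median only moves from 14 to 15.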

THE MEDIAN

The median is a measure of central tendency that represents the middle value of a dataset when the values are arranged in ascending or descending order. It is an alternative to the mean and provides valuable insights, particularly when dealing with ungrouped data.

To calculate the median, the first step is to arrange the values in either ascending or descending order. Once the values are sorted, the middle value is selected as the median. In cases where the dataset has an odd number of observations, there will be a single middle value, making it the exact median. However, if the dataset has an even number of observations, the median is obtained by taking the average of the two middle values.

Example 1:

Calculate the median of the following scores: 12, 8, 15, 9, 3, 7, and 1

Solution

Step 1: First arrange the scores in order

1, 3, 7, 8, 9, 12, and 15

Step 2: There are 7 observations, so the middle number in the set is in the 4th position

Step 3: Median = value in the 4th position = 8

Example 2:

Find the median of this set of numbers: 36, 42, 10, 15, 9, 32, 16, and 12.

Solution

Step 1: Arrange in order: 9, 10, 12, 15, 16, 32, 36, and 42

Step 2: Number of observations = 8

Step 3: Median = average of the values in the 4th and 5th positions

= (15 + 16) / 2 = 31 / 2 = 15.5
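The procedure used in Examples 1 and 2 can be sketched as a small Python function. The function name and structure are assumptions made for this illustration: sort the values, then take the middle one, or the average of the two middle ones when the count is even.

```python
def median(values):
    """Return the middle value, averaging the two middle values if the count is even."""
    ordered = sorted(values)  # Step 1: arrange in order
    n = len(ordered)          # Step 2: number of observations
    if n == 0:
        raise ValueError("median of empty data")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: a single middle value
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

example_1 = [12, 8, 15, 9, 3, 7, 1]
example_2 = [36, 42, 10, 15, 9, 32, 16, 12]
print(median(example_1))
print(median(example_2))
```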

Example 3

The following are the scores of 20 students in an Economics test. What is the median mark?

5, 10, 2, 9, 5, 3, 4, 6, 1, 3

2, 3, 6, 1, 3, 3, 2, 3, 4, 3

Solution

Marks     Tally        Frequency
1         ||           2
2         |||          3
3         ||||| |      6
4         ||           2
5         |||          3
6         ||           2
9         |            1
10        |            1
Total                  20

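The total frequency is 20, so the median is the average of the values in the 10th and 11th positions. Counting through the frequencies, both of these values are 3.

Median = (10th + 11th) / 2 = (3 + 3) / 2 = 6 / 2 = 3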
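Finding the median from a frequency table amounts to counting through the cumulative frequencies until the required position is reached. A minimal Python sketch is given here, using the frequencies from the table in Example 3. The function names are assumptions made for this illustration.

```python
# Frequency table from Example 3 as (mark, frequency) pairs in ascending order.
table = [(1, 2), (2, 3), (3, 6), (4, 2), (5, 3), (6, 2), (9, 1), (10, 1)]

def value_at_position(table, position):
    """Return the value at a 1-based position by accumulating frequencies."""
    cumulative = 0
    for value, frequency in table:
        cumulative += frequency
        if position <= cumulative:
            return value
    raise ValueError("position exceeds total frequency")

def median_from_table(table):
    """Median of the data summarized by a sorted frequency table."""
    total = sum(frequency for _, frequency in table)
    if total % 2 == 1:
        return value_at_position(table, (total + 1) // 2)
    # Even total: average the two middle positions.
    lower = value_at_position(table, total // 2)
    upper = value_at_position(table, total // 2 + 1)
    return (lower + upper) / 2

print(median_from_table(table))
```

With a total frequency of 20, the function looks up the 10th and 11th positions, both of which fall on the mark 3.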

4 ADVANTAGES OF THE MEDIAN

The median, as a measure of central tendency, possesses several advantages that make it valuable in data analysis and interpretation.

1. Easy to determine with little or no calculations:
Unlike the mean, which requires summing up all values and dividing by the total count, the median can be determined simply by arranging the values in order and identifying the middle value. This makes it a convenient measure that can be obtained quickly, especially when working with small or moderate-sized datasets.

2. Easy to understand and compute:
The concept of the median is straightforward and intuitive. It represents the middle value in a dataset when arranged in order. This simplicity makes it easily interpretable and communicable to individuals with varying levels of statistical knowledge. The median provides a clear representation of the typical value within the dataset.

3. Does not use all values in the distribution:
The median is robust to outliers or extreme values since it does not consider all values in the distribution. It only relies on the position of the middle value(s). This characteristic makes it particularly useful when dealing with skewed distributions or datasets with potential outliers. The median is not influenced by extreme values in the tails of the distribution, providing a more robust measure of central tendency.

4. Gives a clear idea of the distribution:
The median provides valuable insights into the overall distribution of the dataset. By identifying the middle value, it divides the data into two equal halves, allowing for a clear understanding of the central position of the values. For symmetric distributions, the median is located at the center, providing a representative measure. In skewed distributions, the median stays close to the bulk of the data, and comparing it with the mean indicates the direction of the skewness.

The advantages of the median make it a useful tool in various scenarios. Its ease of determination, simplicity, robustness to outliers, and ability to portray the distribution contribute to its effectiveness in analyzing data, particularly in cases where the mean may be influenced by extreme values or when the underlying distribution is skewed or unknown. However, it’s important to note that the median does not capture the full information present in the dataset, and in some situations, complementary measures such as the mean or mode may be necessary for a comprehensive analysis.

DISADVANTAGES OF THE MEDIAN

While the median possesses several advantages, it is important to acknowledge its limitations and potential disadvantages.

1. Not useful in further statistical calculations:
One of the primary drawbacks of the median is that it does not lend itself well to further statistical calculations. Unlike the mean, which has properties that allow for various mathematical operations and statistical analyses, the median is not as amenable to manipulation in subsequent calculations. This limitation restricts its applicability in certain statistical procedures and can hinder more advanced data analysis.

2. Ignores very large or small values:
The median only considers the middle value(s) in a dataset and disregards the actual magnitude of the values. While this can be an advantage in terms of reducing the influence of extreme values or outliers, it also means that the median fails to account for the specific values that are extremely high or low. Consequently, the median may not provide a comprehensive representation of the full range of values in the dataset.

3. Does not represent a true average of the data set:
Unlike the mean, which calculates the sum of all values divided by the total count, the median does not provide a true average. The median represents the middle value, or the average of the middle two values in the case of an even number of observations. As a result, the median may not accurately reflect the central tendency or the typical value of the entire dataset. It does not take into account the values above or below the middle, potentially leading to a less precise estimation of the central position.

It is important to consider these disadvantages when deciding on the appropriate measure of central tendency for a given analysis. While the median is valuable in certain scenarios, such as when dealing with skewed distributions or data with outliers, its limitations in further statistical calculations and its inability to represent a true average should be taken into account. Depending on the specific requirements of the analysis and the nature of the data, complementary measures like the mean or mode may need to be employed to provide a more comprehensive understanding of the dataset.

THE MODE

The mode is a statistical measure that represents the most frequently occurring value or values in a dataset. In other words, it is the number or value with the highest frequency of occurrence. The mode provides insights into the observation that is the most popular or common within the dataset.

Calculating the mode can be done by forming a frequency table for the distribution. A frequency table organizes the data by listing each distinct value and its corresponding frequency, which represents the number of times that value appears in the dataset. By examining the frequency table, it becomes apparent which value or values have the highest frequency, indicating the mode.

The mode offers several important characteristics and applications:

1. Identifying the most popular observation: The mode allows us to determine the value or values that occur most frequently in a dataset. It provides insight into what is considered typical or widely observed within the data.

2. Categorical and discrete data: The mode is particularly useful when working with categorical or discrete data, where values are grouped into distinct categories or countable units. It helps to identify the most common category or value within these datasets.

3. Simplifies complex data: In datasets with a wide range of values and distributions, the mode can simplify the data by highlighting the most prevalent observation. It condenses the information by focusing on the value(s) that occur with the highest frequency, aiding in the interpretation and communication of the dataset’s characteristics.

4. Multiple modes: It is possible for a dataset to have multiple modes, where two or more values occur with the same highest frequency. In such cases, the dataset is considered multimodal, and each mode contributes to the understanding of the dataset’s characteristics. This is especially useful when analyzing data with distinct groups or subgroups.

The mode provides a simple and intuitive measure of central tendency that highlights the most frequently recurring value(s) in a dataset. By forming a frequency table, the mode can be easily determined, providing valuable information about the most common observation or category within the data. It is particularly useful for categorical or discrete data and simplifies complex datasets by focusing on the prevailing values.

Example

Using the frequency distribution of Example 3 above, the mode is 3 because it has the highest frequency, six (6).
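Reading the mode off a frequency table can be sketched in Python as follows. The function name is an assumption made for this illustration, and the second table is hypothetical, included only to show a dataset with more than one mode.

```python
def modes(table):
    """Return every value that shares the highest frequency."""
    highest = max(frequency for _, frequency in table)
    return [value for value, frequency in table if frequency == highest]

# Frequency table from Example 3 as (mark, frequency) pairs.
example_3 = [(1, 2), (2, 3), (3, 6), (4, 2), (5, 3), (6, 2), (9, 1), (10, 1)]
print(modes(example_3))

# Hypothetical table in which two values tie for the highest frequency.
two_peaks = [(1, 4), (2, 1), (3, 4)]
print(modes(two_peaks))
```

Returning a list rather than a single value keeps the multimodal case explicit, since a tie for the highest frequency yields more than one mode.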

MERITS

The mode, as a measure of central tendency, possesses several merits that make it valuable in data analysis and interpretation.

1. Easily understood:
One of the primary advantages of the mode is its simplicity and intuitive interpretation. The concept of the mode, representing the most frequently occurring value or values in a dataset, is easy to understand for individuals with varying levels of statistical knowledge. It provides a clear representation of the most common observation or category within the data, making it accessible and communicable.

2. Unaffected by extreme values:
Unlike the mean, which can be greatly influenced by extreme values or outliers, the mode is not affected by these atypical observations. It focuses solely on the values with the highest frequency, disregarding the magnitude of other values. This characteristic makes the mode a robust measure of central tendency, especially in datasets with significant outliers or skewed distributions.

3. Easy to calculate from the graph:
In certain cases, the mode can be easily determined from a graphical representation, such as a histogram or bar chart. By visually observing the peak or highest bar on the graph, one can identify the mode as the corresponding value or category. This ability to determine the mode from a graph provides a quick and visual method of calculating it, which can be particularly advantageous when dealing with large datasets.

4. Easy to determine:
Similar to calculating the mode from a graph, determining the mode directly from the dataset is also relatively simple. By organizing the data into a frequency table and identifying the value or values with the highest frequency, the mode can be easily determined. This ease of determination facilitates quick analysis and interpretation of the data, especially in cases where a quick overview of the most common observation is desired.

These merits of the mode make it a useful measure of central tendency in various scenarios. Its ease of understanding, robustness to extreme values, simplicity of calculation from a graph, and straightforward determination contribute to its effectiveness in summarizing and interpreting data. However, it is important to note that the mode may not always provide a complete representation of the entire dataset and may require the consideration of additional measures, such as the mean or median, for a comprehensive analysis.

DEMERITS

While the mode possesses several merits, it is important to recognize its limitations and potential drawbacks.

1. Poor average:
One of the main demerits of the mode is that it can sometimes be a poor representation of the average value within a dataset. Unlike the mean, which takes into account all values in the distribution, the mode only considers the most frequently occurring value(s). As a result, the mode may not accurately reflect the typical value or provide a comprehensive summary of the entire dataset. It may not account for variations or outliers present in the data.

2. Difficulty computing multiple modes:
In cases where a dataset has more than one mode, meaning that multiple values occur with the same highest frequency, the mode becomes more challenging to compute. While it is straightforward to determine a single mode, handling multiple modes requires additional consideration and analysis. The presence of multiple modes can complicate the interpretation of the dataset and may require additional statistical techniques to account for the multimodality.

3. Limited usefulness in further statistical calculations:
Unlike other measures of central tendency, such as the mean or median, the mode has limited applicability in further statistical calculations. It does not possess the mathematical properties or versatility that allow for various statistical operations or analyses. This limitation restricts its utility in certain statistical procedures and can hinder more advanced data analysis, especially when more precise estimates or computations are required.

4. Not considering all values in the distribution:
Another demerit of the mode is that it disregards all values in the dataset except for the most frequently occurring one(s). This exclusion of other values can lead to a loss of information. The mode does not provide a comprehensive overview of the full range or distribution of values, potentially overlooking important insights or variations within the dataset.

It is important to be aware of these demerits when deciding on the appropriate measure of central tendency for a particular analysis. While the mode can be useful in certain scenarios, such as identifying the most common observation or category, its limitations in representing the average, handling multiple modes, limited usefulness in further calculations, and not considering all values should be considered. Depending on the specific requirements of the analysis and the nature of the data, complementary measures like the mean or median may need to be employed to provide a more comprehensive understanding of the dataset.
