To calculate the median value in Python:
- Import the statistics module.
- Call the statistics.median() function on a list of numbers.
For example, let’s calculate the median of a list of numbers:
import statistics
numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)
print(med)
Output:
4
The median value is a common way to measure the “centrality” of a dataset.
If you are looking for a quick answer, the example above will do. But to learn what the median really is, why it is useful, and how to find it, read on.
This is a comprehensive guide to finding the median in Python.
What Is the Median Value in Maths
The median is the middle value of a dataset once its values are sorted.
If you have a sorted list of 3 numbers, the median is the second number because it sits in the middle.
But if you have a list of 4 values, there is no single “middle value”. For an even-sized dataset, the median is the average of the two middle values.
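As a quick illustration of the odd and even cases, here is a minimal sketch using the standard library's statistics.median(). The variable names odd_sized and even_sized are made up for this example, and the inputs are deliberately unsorted to show that the sorting happens for you.

```python
import statistics

odd_sized = [3, 1, 2]       # sorted: [1, 2, 3], middle value is 2
even_sized = [4, 1, 3, 2]   # sorted: [1, 2, 3, 4], middle values are 2 and 3

print(statistics.median(odd_sized))   # 2
print(statistics.median(even_sized))  # 2.5
```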
Why and When Is Median Value Useful
When dealing with statistics, you usually want to have a single number that describes the nature of a dataset.
Think about your school grades for example. Instead of seeing the dozens of grades, you want to know the average (the mean).
Usually, measuring the “centrality” of a dataset means calculating the mean value. But if you have a skewed distribution, the mean value can be unintuitive.
Let’s say you drive to your nearby shopping mall 7 times. Usually, the drive takes around 10 minutes. But one day a traffic jam makes it last 2 hours.
Here is a list of driving times to the mall:
[9, 120, 10, 9, 10, 10, 10]
Now if you take the average of this list, you get ~25 minutes. But how well does this number really describe your trip?
Pretty badly.
As you can see, most of the time the trip takes around 10 minutes.
To better describe the driving time, you should use the median instead. To calculate the median, you need to sort the driving times first:
[9, 9, 10, 10, 10, 10, 120]
Then you can choose the middle value, which in this case is 10 minutes. 10 minutes describes your typical trip length way better than 25, right?
The usefulness of calculating the median, in this case, is that the unusually high value of 120 does not matter.
In short, use the median when the average gives a misleading picture of the center of a dataset, for example when a few extreme values skew it.
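The driving-time comparison can be reproduced in a few lines. This is a small sketch that assumes the statistics module from the standard library; the names driving_times, mean_time, and median_time are chosen only for this example.

```python
import statistics

# Driving times to the mall in minutes, including the 2-hour traffic jam.
driving_times = [9, 120, 10, 9, 10, 10, 10]

mean_time = statistics.mean(driving_times)
median_time = statistics.median(driving_times)

print(round(mean_time, 1))  # 25.4
print(median_time)          # 10
```

The single 120-minute trip pulls the mean up to about 25 minutes, while the median stays at 10.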
How to Calculate the Median Value in Python
In Python, you can either create a function that calculates the median or use existing functionality.
How to Implement Median Function in Python
If you want to implement the median function, you need to understand the procedure of finding the median.
The median function works such that it:
- Takes a dataset as input.
- Sorts the dataset.
- Checks if the dataset is odd/even in length.
- If the dataset is odd in length, the function picks the mid-value and returns it.
- If the dataset is even, the function picks the two mid values, calculates the average, and returns the result.
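Here is how it looks in code:
def median(data):
    # Sort a copy so the original data stays untouched.
    sorted_data = sorted(data)
    data_len = len(sorted_data)
    # Index of the middle value (odd length) or the lower of the two middle values (even length).
    middle = (data_len - 1) // 2
    # The check must be on the length of the data, not on the middle index.
    if data_len % 2:
        return sorted_data[middle]
    else:
        return (sorted_data[middle] + sorted_data[middle + 1]) / 2.0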
Example usage:
numbers = [1, 2, 3, 4, 5, 6, 7]
med = median(numbers)
print(med)
Output:
4
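A single odd-sized example does not exercise the even-length branch, so it can help to compare a hand-written median against the standard library on several inputs. The sketch below is one possible way to do that. The helper name median_sketch is hypothetical, and raising ValueError for empty input is an assumption of this example rather than something the steps above specify.

```python
import random
import statistics


def median_sketch(data):
    """Hand-written median following the steps above (hypothetical helper)."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n == 0:
        # Assumption: treat an empty dataset as an error.
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2:
        # Odd length: return the single middle value.
        return sorted_data[mid]
    # Even length: average the two middle values.
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


# Compare against statistics.median on odd- and even-length inputs.
random.seed(0)  # fixed seed so the check is repeatable
for size in range(1, 20):
    sample = [random.randint(-100, 100) for _ in range(size)]
    assert median_sketch(sample) == statistics.median(sample)

print(median_sketch([1, 2, 3, 4]))  # 2.5
print(median_sketch([5, 3, 1]))     # 3
```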
Now, this is a valid approach if you need to write the median function yourself. But for common maths operations, you should use a built-in function to save time and headaches.
Let’s next take a look at how to calculate the median with a built-in function in Python.
How to Use a Built-In Median Function in Python
In Python, there is a standard-library module called statistics. This module contains useful mathematical tools for data science and statistics.
One of the most useful functions in this module is median().
As the name suggests, this function calculates the median of a given dataset.
To use the median function from the statistics module, remember to import it into your project.
Here is an example of calculating the median for a bunch of numbers:
import statistics
numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)
print(med)
Result:
4
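The example above uses an odd-sized, already sorted list. As a hedged sketch of a few other cases, the snippet below shows how the statistics module handles an unsorted, even-sized list, along with its related median_low() and median_high() functions and the error raised for empty data. The variable name values is made up for this example.

```python
import statistics

values = [7, 1, 5, 3]  # sorted: [1, 3, 5, 7]

print(statistics.median(values))       # 4.0 (average of 3 and 5)
print(statistics.median_low(values))   # 3 (lower of the two middle values)
print(statistics.median_high(values))  # 5 (higher of the two middle values)

# An empty dataset has no median, so the module raises StatisticsError.
try:
    statistics.median([])
except statistics.StatisticsError as err:
    print("empty data:", err)
```

median_low() and median_high() can be handy when the result has to be an actual value from the dataset rather than an average.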
Conclusion
Today you learned how to calculate the median value in Python.
To recap, the median value is a way to measure the centrality of a dataset. The median is useful when the average doesn’t properly describe the dataset and gives misleading results.
To calculate the median in Python, use the built-in median() function from the statistics module.
import statistics
numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)
Thanks for reading. Happy coding!