Find Median in a Data Stream
The median is the middle value in a sorted list of integers. For lists of even length, there is no middle value, so the median is the mean of the two middle values.
For example:
For arr = [1,2,3], the median is 2.
For arr = [1,2], the median is (1 + 2) / 2 = 1.5
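The definition above can be checked directly by sorting a list and reading off the middle. This is only a reference sketch for the definition, not the streaming solution; the helper name median_of is made up for illustration.

```python
from typing import Sequence


def median_of(values: Sequence[int]) -> float:
    """Return the median of a non-empty sequence by sorting it first."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: a single middle value exists.
        return float(ordered[mid])
    # Even length: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 2, 3]))  # 2.0
print(median_of([1, 2]))     # 1.5
```

Sorting on every query costs O(n log n), which is what the streaming data structure is meant to avoid.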
Implement the MedianFinder class:
MedianFinder() initializes the MedianFinder object.
void addNum(int num) adds the integer num from the data stream to the data structure.
double findMedian() returns the median of all elements so far.
Example 1:
Input:
["MedianFinder", "addNum", "1", "findMedian", "addNum", "3", "findMedian", "addNum", "2", "findMedian"]
Output:
[null, null, 1.0, null, 2.0, null, 2.0]
Explanation:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.findMedian(); // return 1.0
medianFinder.addNum(3); // arr = [1, 3]
medianFinder.findMedian(); // return 2.0
medianFinder.addNum(2); // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0
Constraints:
-100,000 <= num <= 100,000
findMedian will only be called after adding at least one integer to the data structure.
Solution
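Picture the numbers seen so far as a sorted list and split it into two halves. Keep the lower half in a max-heap and the upper half in a min-heap. If the two heaps stay balanced in size (the lower half holds either as many elements as the upper half or exactly one more), the median is always available at the heap tops: the top of the max-heap when the total count is odd, or the mean of the two tops when it is even.
To maintain this balance, rebalance the heaps after each addNum. Because each call adds a single number, at most one element has to move from one heap to the other. Each addNum therefore costs O(log n), and findMedian costs O(1).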
Code
Python's heapq module is a convenient way to implement a heap (priority queue).
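heapq only provides a min-heap, so a max-heap is usually emulated by storing negated keys; the solution code does this with (-num, num) pairs. A minimal sketch of the trick, with variable names chosen only for illustration:

```python
import heapq

values = [3, 1, 2]

# Min-heap: the smallest value sits at index 0.
min_heap = []
for v in values:
    heapq.heappush(min_heap, v)

# Max-heap emulation: push -v, so the smallest stored key
# corresponds to the largest original value.
max_heap = []
for v in values:
    heapq.heappush(max_heap, -v)

smallest = min_heap[0]
largest = -max_heap[0]
print(smallest, largest)  # 1 3
```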
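import heapq


class MedianFinder:
    def __init__(self):
        # Each entry is a (key, value) pair; heapq orders entries by key.
        # first:  max-heap of the smaller half, key = -value.
        # second: min-heap of the larger half, key = value.
        # Both heaps start with one sentinel whose key (100001) exceeds every
        # real key, so it never reaches the top while real entries exist.
        # This relies on the constraint -100,000 <= num <= 100,000.
        self.first = [(100001, -100001)]
        self.second = [(100001, 100001)]

    def addNum(self, num: int) -> None:
        first_max = self.first[0][1]
        # Insert num into the half it belongs to.
        if num <= first_max:
            heapq.heappush(self.first, (-num, num))
        else:
            heapq.heappush(self.second, (num, num))
        # Rebalance so that len(first) is len(second) or len(second) + 1.
        if len(self.first) > len(self.second) + 1:
            temp = heapq.heappop(self.first)
            heapq.heappush(self.second, (-temp[0], temp[1]))
        if len(self.second) > len(self.first):
            temp = heapq.heappop(self.second)
            heapq.heappush(self.first, (-temp[0], temp[1]))

    def findMedian(self) -> float:
        if len(self.first) == len(self.second):
            # Even count: mean of the two middle values.
            return (self.first[0][1] + self.second[0][1]) / 2
        # Odd count: first holds the extra element, which is the median.
        return float(self.first[0][1])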
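The sentinel entries depend on the stated bounds on num. As a hedged alternative, the sketch below drops the sentinels and stores plain negated integers in the lower heap. The class name MedianFinderNoSentinel and the attributes lower and upper are hypothetical. It routes every number through the lower heap, which is simpler to reason about but does slightly more heap work per call than the single-move rebalancing above. The loop at the end cross-checks it against statistics.median on a random stream.

```python
import heapq
import random
import statistics


class MedianFinderNoSentinel:
    """Hypothetical variant of the two-heap idea without sentinel entries."""

    def __init__(self):
        self.lower = []  # max-heap of the smaller half, stored as negated values
        self.upper = []  # min-heap of the larger half

    def addNum(self, num: int) -> None:
        # Push into lower, then hand the largest element of lower to upper,
        # which keeps every element of lower <= every element of upper.
        heapq.heappush(self.lower, -num)
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        # Restore len(lower) == len(upper) or len(lower) == len(upper) + 1.
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def findMedian(self) -> float:
        if len(self.lower) > len(self.upper):
            return float(-self.lower[0])
        return (-self.lower[0] + self.upper[0]) / 2


# Cross-check against the standard library on a random stream.
random.seed(0)
finder = MedianFinderNoSentinel()
seen = []
for _ in range(1000):
    num = random.randint(-100_000, 100_000)
    finder.addNum(num)
    seen.append(num)
    assert finder.findMedian() == statistics.median(seen)
```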