classSolution { public: vector<int> intersect(vector<int>& nums1, vector<int>& nums2){ sort(nums1.begin(), nums1.end()); sort(nums2.begin(), nums2.end()); int m = nums1.size(), n = nums2.size(); auto &&s = m < n ? nums1 : nums2; auto &&l = m < n ? nums2 : nums1; vector<int> res; auto it = l.begin(); for (int x : s) { it = lower_bound(it, l.end(), x); if (it == l.end()) { break; } elseif (*it == x) { res.push_back(x); ++it; } } return res; } };
if m << n, O(min(m,n)log(max(m,n))) is close to O(logn) while O(m+n) is close to O(n)
follow up #3
If only nums2 cannot fit in memory, put all elements of nums1 into a HashMap, read chunks of array that fit into the memory, and record the intersections.
If both nums1 and nums2 are so huge that neither fit into the memory, sort them individually (external sort), then read 2 elements from each array at a time in memory, record intersections. (similar to merge)
classSolution { public: vector<int> intersect(vector<int>& nums1, vector<int>& nums2){ sort(nums1.begin(), nums1.end()); sort(nums2.begin(), nums2.end()); int m = nums1.size(), n = nums2.size(); vector<int> res; for (int i = 0, j = 0; i < m && j < n;) { if (nums1[i] == nums2[j]) { res.push_back(nums1[i]); ++i; ++j; } elseif (nums1[i] < nums2[j]) { ++i; } else { ++j; } } return res; } };
another approach is to build index for sorted nums2, if nums2 is [1,2,3,4,5,6,7,8,9,10], we can build index [1:address, 3:address, 5:address, 7:address, 9:address], and put the indices in memory, each time we search the indices first and locate the chunk address in disk, then load the chunk in memory and do another round binary search of the chunk. the size of chunk must fit in the memory (indices are built based on this). because there might be some dup numbers, we can maintain a hashmap (or bloomfilter) in memory to tell how many dup numbers has already been searched (i.e., nums2 has only 4 number 5s but nums1 has 6 number 5s then the last 2 searches for number 5 will be aborted).