`arr1.filter()` says "apply a filtering function to every value in arr1". This is O(n) because as the size of arr1 (n) grows larger, the work grows at the same rate (linearly).
`arr2.includes()` says "check if value is in arr2", and most languages implement this method as a linear search (esp. if arr2 is an array). This is also O(m) because as the size of arr2 (m) grows larger, the work (linear search) grows at the same rate.
Taken together, the algorithm is doing "for every value of arr1, apply a filter function which linearly searches for the value in arr2". If arr1 is size 5 (n), and arr2 is size 10 (m), then the total work being done is 5*10 (n*m). Thus the overall performance is O(n*m), though it is commonly simplified by assuming both sizes are the same: O(n*n), or O(n^2).
This is the same as using nested for loops, just more elegant (and functional):
intersection = list()
for i in arr1:
    for j in arr2:
        if i == j:
            intersection.append(i)
return intersection
---- SOLUTION ----
One approach is to convert one list into a Set, O(n) because you have to traverse the list once. Then iterate the second list and check if it exists in the Set. Checking if something exists in a Set is constant time, O(1), so you only have the cost of traversing the second list once O(m).
The overall performance is O(n + m * 1), which becomes O(2n) if we assume both lists are the same size. That is usually simplified to O(n), as we're usually only concerned with the growth behavior, e.g. "how does it change as the input gets bigger?", and not with directly comparing, say, two algorithms of O(3n) and O(5n + 1).
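A minimal Python sketch of that approach (the function name is mine, just for illustration):

```python
def intersection(arr1, arr2):
    # Build a set from arr2: O(m), one-time cost.
    seen = set(arr2)
    # Each membership check against a set is O(1) on average,
    # so this single pass over arr1 is O(n).
    return [x for x in arr1 if x in seen]
```

For example, `intersection([1, 2, 3, 4], [3, 4, 5])` returns `[3, 4]`.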
There's still a lot of unhandled edge cases and other issues that need addressing in the solution described above, and there are other valid solutions as well. The goal is only to be better than O(n^2), not optimal.
No problem, even if you had it in school it takes some practice thinking this way to become fluent.
A fun exercise to give reality to the theoretical is to run some benchmarks for your solution, providing inputs of size 10, 100, 1K, 10K, 100K, 1M, etc. and seeing how the performance changes. You can also (with some benchmarking tools) look at the memory usage as well.
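Here's one rough way to set up such a benchmark in Python (the sizes, timing method, and function names are just one possible setup):

```python
import random
import time

def intersect_naive(arr1, arr2):
    # O(n*m): each "x in arr2" is a linear search.
    return [x for x in arr1 if x in arr2]

def intersect_set(arr1, arr2):
    # O(n + m): build a set once, then O(1) average lookups.
    seen = set(arr2)
    return [x for x in arr1 if x in seen]

for size in (10, 100, 1_000, 10_000):
    a = random.sample(range(size * 10), size)
    b = random.sample(range(size * 10), size)
    start = time.perf_counter()
    intersect_set(a, b)
    elapsed = time.perf_counter() - start
    print(f"n={size}: {elapsed:.6f}s")
```

Swapping `intersect_set` for `intersect_naive` in the timed section makes the difference in growth rates visible as the sizes climb.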
When talking about algorithm performance, there are usually both time and space (memory or disk) costs to consider.
One thing you'll notice is that most solutions, even the O(n^2) one, are "fast enough" at small sizes that it doesn't matter what approach you take. In those situations, readability/clarity usually wins, as with your proposed solution.
If I knew my lists would never get larger than 100 elements (or even 1K), I would totally do what you did. The secret is in knowing the problem you're solving.