The current implementation fetches the data from all partitions and then runs the match. The data is fetched to wherever the QueryProcessor is running (which can be a partition or the client).
We now run the join collocated in each partition whenever that is possible, which is when the join condition is on the routing value. The client will receive the already filtered data.
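
To illustrate the difference, here is a minimal, self-contained sketch in Python. It is not the actual QueryProcessor code: the partition model (plain lists), the modulo routing function, and all names (`route`, `load`, `client_side_join`, `collocated_join`, the `customers`/`orders` tables) are assumptions made up for this example. It only shows why joining on the routing value lets each partition join its own rows, and how that changes the number of rows sent back.

```python
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical cluster size for this sketch


def route(routing_value: int) -> int:
    """Assumed routing rule: rows with equal routing values share a partition."""
    return routing_value % NUM_PARTITIONS


def load(rows, routing_field):
    """Distribute rows across partitions by their routing field."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        partitions[route(row[routing_field])].append(row)
    return partitions


def hash_join(left, right, left_key, right_key):
    """Plain in-memory equi-join, used both at the client and inside a partition."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [(l, r) for l in left for r in index.get(l[left_key], [])]


def client_side_join(left_parts, right_parts, left_key, right_key):
    """Old behaviour: pull every row from every partition, then match."""
    left = [row for part in left_parts for row in part]
    right = [row for part in right_parts for row in part]
    transferred = len(left) + len(right)  # all rows cross the wire
    return hash_join(left, right, left_key, right_key), transferred


def collocated_join(left_parts, right_parts, left_key, right_key):
    """New behaviour: each partition joins its own rows and returns only matches.

    Only valid when both tables are routed by the join key, so matching rows
    are guaranteed to live in the same partition.
    """
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(hash_join(left_part, right_part, left_key, right_key))
    return result, len(result)  # only joined rows cross the wire


customers = [{"customer_id": i, "name": f"c{i}"} for i in range(6)]
orders = [
    {"order_id": 100 + i, "customer_id": cid} for i, cid in enumerate([0, 0, 4])
]

# Both tables use customer_id as the routing value, which is also the join key.
customer_parts = load(customers, "customer_id")
order_parts = load(orders, "customer_id")

baseline, fetched_rows = client_side_join(
    customer_parts, order_parts, "customer_id", "customer_id"
)
local, returned_rows = collocated_join(
    customer_parts, order_parts, "customer_id", "customer_id"
)
```

In this toy setup both strategies produce the same joined pairs, but the client-side variant moves all 9 source rows while the collocated variant returns only the 3 matching pairs. If the join condition were on a field other than the routing value, matching rows could sit in different partitions and the per-partition join would miss them, which is why the collocated path is limited to that case.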