Tutorial :Find duplicates between arrays



Question:

Assume you are given two arrays of integers of constant length which is 3, and you are always sure that two elements of the given two arrray will have same values.

so assume array A has three values: a, b, c. and array B has three values: d, e, f.

we are sure that two of the values will be same. we are asked to put these four different values in an array of size 4, such that output array C, should have in indices 1 and 2 the same values from arrays A and B. and at indices 0 and 3 it should have the different values of array A and B. i have implemented it, but really not satisfied with this solution... does anyone has better solution idea? except the one that would put my counters in array... :)

int[] a = { 1, 201, 354 };  int[] b = { 404, 201, 354 };    int[] c = new int[4];    for (int i = 0; i < c.Length; i++)  {      Console.WriteLine(c[i]);  }  


Solution:1

I'm sorry, I re-read more closely and I think this is what you want. Please advise. :)

int[] same = a.Intersect(b).ToArray(); ;  int[] diff = a.Union(b).Except(same).ToArray();  int[] c = new int[] { diff[0], same[0], same[1], diff[1] };  


Solution:2

What you are looking for is just a set of the two arrays (set contains every element once at most). The solution in c++:

#include <set>    int main () {      int a[] = { 1,2,3 };      int b[] = { 4,2,3 };        std::set<int> s(a, a+3);      s.insert(b, b+3);  }  


Solution:3

If you have LINQ at your disposal, the following code will suffice:

int[] c = a.Union(b).ToArray();  

Union checks for duplicates, so no further checking is necessary:

Returns: An System.Collections.Generic.IEnumerable that contains the elements from both input sequences, excluding duplicates.


Solution:4

Replace

// IRQ. 20100211. Deleted unncessary code  

with

var c = a.Concat(b).Distinct().ToArray();  

Update:

New one:

var same = a.Intersect(b);  var c = a.Except(same).Concat(same).Concat(b.Except(same)).ToArray();  

or these

var c = a.Except(b).Concat(a.Intersect(b)).Concat(b.Except(a));  var c = a.Except(b).Concat(a).Concat(b).Distinct();  


Solution:5

Here a cool solution in C(++)

int a[3], b[3]; /* the two arrays */  int c[4]; /* target */  int s=0, t=0, k;  int i;  for (i=0;i<3;i++) { k = a[i]-b[i]; s += k; t += k*(a[i]+b[i]); }    /* At this point s is the difference of the two distinct elements     and t is the difference of their squares, i.e. s = x - y and t = x^2 - y^2     because (x-y)(x+y) = x^2-yx+yx-y^2 = x^2-y^2     Because the two elements are distinct, s != 0 and we can easily divide t     by s to get (x + y), from which then we have     s == x - y     t == x + y     i.e. x = (s+t)/2 and y=(t-s)/2 */    t /= s;  int x = (s + t) / 2;  int y = (t - s) / 2;    /* Now x, y are the distinct elements, x from array a and y from array b */  /* Fill in the results */  c[0] = x;  c[3] = y;  /* If a[0] is non-shared, then a[1] must be the first shared element; otherwise a[0] */  c[1] = (a[0] == x ? a[1] : a[0]);  /* If a[2] is non-shared, then a[1] must be the last shared element; otherwise a[2] */  c[2] = (a[2] == x ? a[1] : a[2]);  

Example: a = {1, 3, 5}, b = {3, 5, 2}

s = (1-3)+(3-5)+(5-2) = -2-2+3 = -1  t = (1-3)*(1+3)+(3-5)*(3+5)+(5-2)*(5+2) = -8-16+21 = -3  t / s = 3  x = (-1 + 3) / 2 = 1  y = (3 - (-1)) / 2 = 2  c[0] = 1  c[3] = 2  c[1] = 3  c[2] = 5  

so c gets the value {1,3,5,2}, as desired!

For fun, here a compacter version:

/* Declarations */  int a[3], b[3], c[4];  int s = 0, t = 0, k, i;    /* Actual algorithm */  for (i = 0; i < 3; i++) { s += (k = a[i]-b[i]); t += k * (a[i]+b[i]); }  t /= s;  c[0] = (s + t) >> 1;  c[3] = (t - s) >> 1;  c[1] = (a[0] == x ? a[1] : a[0]);  c[2] = (a[2] == x ? a[1] : a[2]);  

Note that coolly enough if the problem is generalized so that n-1 elements are shared and there is one unique element in both arrays, this is an O(n) algorithm, whereas set intersection and/or union based algorithms in general are O(n log n) :)


Solution:6

Instead of counter1, counter2, counter3:

counter[3];  

A lot of things get easier from there. You can refer to everything in loops, to start with.


Solution:7

I'm pretty sure I don't understand the question.

You say:

we are sure that two of the values will be same. we are asked to put these four different values

Which four different values are you referring to? The two that are the same? Because that's what the word "these" refers to.

Do you mean: Take the 4 unique values and put them into an array?

So that:

1, 2, 3
2, 3, 4

Becomes:

1, 2, 3, 4?

That's easy:

int[] c = a.Concat(b).Distinct().ToArray();  

This uses the Linq extension methods of .NET 3.5. If you're not on .NET 3.5, you can do this:

Dictionary<int, int> c1 = new Dictionary<int, int>();  foreach (var x in a)      c1[x] = 1;  foreach (var x in b)      c1[x] = 1;  int[] c = new List<int>(c1.Keys).ToArray();  

Now, if you need the order to be this:

  1. The first value that only occured once
  2. The first value that occured twice
  3. The second value that occured twice
  4. The second value that only occured once

Then I'm afraid I don't have a one-liner for you, will have to think about it some more.

Might I ask what the context is? Why this requirement?


Solution:8

I came up with this as a first draft, but I think it can need some improvement. It also doesn't fulfill the requirement of having the duplicates at position 1 and 2 and the unique numbers at 0 and 3 in the array. I thought I'd post it anyway, so you can get an idea of how this can look like:

  int[] a = { 1, 201, 354 };    int[] b = { 404, 201, 354 };      int[] c = new int[ 4 ];      // Start by just copying over one of the arrays completely.    a.CopyTo( c, 0 );      // Loop through b and compare each number against each    // each number in a.    foreach( int i in b )    {      // Assume that you're not dealing with a duplicate      bool found = true;      foreach( int j in a )      {        // If you find a duplicate, set found to false        if( i == j )        {          found = false;        }                 }      // If you haven't found a duplicate this is the      // number you want - add it to the array.      if (found == true)      {        c[3] = i;      }    }  


Solution:9

bool isUsed[6]={true, true, true, true, true, true};    int values[6];    int valuesCount = 0;    int i,j;    for( i = 0 ; i < 3 ; i++)  {     bool goodValue = true;     for ( j = 0; j < valuesCount; j++)     {         if(values[j] == a[i])         {             isUsed[j] = false;             goodValue = false;             break;         }     }     if(goodValue)     {         values[valuesCount++]=a[i];     }  }    //same for b[i]    for( i = 0 ; i < valuesCount; i++)  {      if( isUsed[i] ) printf("%i ", values[i]);  }  


Solution:10

This part

    if (a[0] == b[0]) { counter0++; }      if (a[0] == b[1]) { counter0++; }      if (a[0] == b[2]) { counter0++; }        if (a[1] == b[0]) { counter1++; }      if (a[1] == b[1]) { counter1++; }      if (a[1] == b[2]) { counter1++; }        if (a[2] == b[0]) { counter2++; }      if (a[2] == b[1]) { counter2++; }      if (a[2] == b[2]) { counter2++; }  

Could probably be rewritten as

for (i=0; i<3; i++)  {      for (j=0; j<3; j++)      {           switch(i)           {                case 0:                if (a[i] == b[j]) { counter0++; }                break;                case 1:                if (a[i] == b[j]) { counter1++; }                break;                case 2:                if (a[i] == b[j]) { counter2++; }                break;            }       }  }  

The other part with the other counters should be written similarly. Then you could maybe refactor this into a separate method and just pass the arrays and counters to it.

Another option could be LINQ, but I'm not sure exactly how to write something like this.

(Haven't tried compiling this, but is the idea clear?)

UPDATE: If you could put the counters in an array, this might work:

for (i=0; i<3; i++)  {      for (j=0; j<3; j++)      {          if (a[i] == b[j]) { counters[i]++; }         }  }  


Solution:11

I am trying to give a short answer. However it assumes that input will be correct.

int c1, c2, i;  c1 = a[0] == b[0] ? 0                    : (a[0] == b[1] ? 1 : 2); // index of a[0] in array 'b'  c2 = a[1] == b[0] ? 0                    : (a[1] == b[1] ? 1 : 2); // index of a[1] in array 'b'    for(i=0; i<2; i++)      Console.WriteLine(a[i]);  Console.WriteLine(b[3-c1-c2]); // looks quite hacky but it is actually element of 'b' not in array 'a'  


Solution:12

Here's some simple code, but it assumes that the values in a and b are always positive.

int[] a = { 1, 201, 354 };  int[] b = { 404, 201, 354 };    int[] c = { -1, -1, -1, -1};    for(int i = 0; i < 3; i++){      int notfound = 1;      for(int j = 0; j < 3; j++){          if(b[j] == -1) continue;            if(a[i] == b[j]){              b[j] = -1;              if(c[1] == -1)                  c[1] = a[i];              else                  c[2] = a[i];              notfound = 0;              break;          }      }      if(notfound)          c[0] = a[i];  }  int k = 0;  while(b[k++] == -1);  c[3] = b[k];  

I haven't tested it, but hopefully you get the idea. This uses very little extra space (just the space for notfound, which could be made a boolean, and the index variables) and should be quite fast.


Solution:13

int[] a = { 204, 534, 1 };  int[] b = { 204, 534, 401 };  int[] c = new int[4];    int x = 3, y = 3, k = 1;  for(int i=0; i<3; i++)      for(int j=0; j<3; j++)          if (a[i] == b[j]) {              c[k++] = a[i];              x -= i;              y -= j;              break;          }  c[0] = a[x];  c[3] = b[y];  


Solution:14

Sapph provided an answer that is about as clean as it gets, but here is one if performance is extremely important. The .NET array bounds checking will probably add some overhead, but in C this compiles down to 64 instructions with no branches.

int[] a = { 204, 534, 1 };  int[] b = { 204, 534, 401 };  int[] c = new int[4];    // pick the value from a that is not in b for c[0]  // a[0] not in b is implied by a[1] in b and a[2] in b  int a1_not_in_b = Convert.ToInt32(a[1] != b[0] & a[1] != b[1] & a[1] != b[2]);  int a2_not_in_b = Convert.ToInt32(a[2] != b[0] & a[2] != b[1] & a[2] != b[2]);    // bitfield of 2 bit values equivalent to the array {0,1,2,0,1}  int idxs = 0 | 1 << 2 | 2 << 4 | 0 << 6 | 1 << 8;  // if a[1] not in b start at 1, if a[2] not in b start at 2, else start at 0  idxs >>= 2*a1_not_in_b | 4*a2_not_in_b;  c[0] = a[(idxs >> 0) & 3];  c[1] = a[(idxs >> 2) & 3];  c[2] = a[(idxs >> 4) & 3];    // pick the value from b that is not in a  // b[0] not in a is implied by b[1] in a and b[2] in a  int b1_not_in_a = Convert.ToInt32(a[0] != b[1] & a[1] != b[1] & a[2] != b[1]);  int b2_not_in_a = Convert.ToInt32(a[0] != b[2] & a[1] != b[2] & a[2] != b[2]);  c[3] = b[b1_not_in_a | 2*b2_not_in_a];  


Solution:15

Faster?

using System;  using System.Linq;  using sw = System.Diagnostics.Stopwatch;  class Program  {      static void Main()      {          int[] a = new int[] { 1, 2, 3 },  // try: a={1,2,2} b={2,2,3}                b = new int[] { 4, 2, 3 }, c = new int[4];          sw sw = sw.StartNew();          for (int i = 5000000; i > 0; i--) { dssd1(a, b, c); dssd1(b, a, c); }          Console.Write(sw.ElapsedMilliseconds);          Console.Read();      }        static void dssd0(int[] a, int[] b, int[] c)               // 6710 ms.      {          int[] s = a.Intersect(b).ToArray();        // same          int[] d = a.Union(b).Except(s).ToArray();  // diff          c[0] = d[0]; c[1] = s[0]; c[2] = s[1]; c[3] = d[1];      }        static void dssd1(int[] a, int[] b, int[] c)               //   61 ms.      {          if (a[0] != b[0] && a[0] != b[1] && a[0] != b[2])          { c[0] = a[0]; c[1] = a[1]; c[2] = a[2]; goto L0; }          if (a[1] != b[0] && a[1] != b[1] && a[1] != b[2])          { c[0] = a[1]; c[1] = a[0]; c[2] = a[2]; goto L0; }          c[0] = a[2]; c[1] = a[0]; c[2] = a[1];      L0: if (b[0] != c[1] && b[0] != c[2]) { c[3] = b[0]; return; }          if (b[1] != c[1] && b[1] != c[2]) { c[3] = b[1]; return; }          c[3] = b[2];      }  }  

Fastest?

    L0: c[3] = b[0] != c[1] && b[0] != c[2] ? b[0] :           //   49 ms.          b[1] != c[1] && b[1] != c[2] ? b[1] : b[2];  


Solution:16

How about this?

private static int[] FindDuplicates(int[] arrA,int[] arrB)      {          var aList=new List<int>();          Array.Sort(arrA);          Array.Sort(arrB);              for(int i=0;i<arrA.Length;i++)          {             if(arrB.Contains(arrA[i]))             {                        aList.Add(arrA[i]);             }            }          return aList.ToArray();        }  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »