Tutorial :Finding the maximum subsequence binary sets that have an equal number of 1s and 0s


I found the following problem on the internet, and would like to know how I would go about solving it:

You are given an array ' containing 0s and 1s. Find O(n) time and O(1) space algorithm to find the maximum sub sequence which has equal number of 1s and 0s.


  1. 10101010 - The longest sub sequence that satisfies the problem is the input itself
  2. 1101000 - The longest sub sequence that satisfies the problem is 110100



I have to completely rephrase my answer. (If you had upvoted the earlier version, well, you were tricked!)

Lets sum up the easy case again, to get it out of the way:

Find the longest prefix of the bit-string containing an equal number of 1s and 0s of the array.

This is trivial: A simple counter is needed, counting how many more 1s we have than 0s, and iterating the bitstring while maintaining this. The position where this counter becomes zero for the last time is the end of the longest sought prefix. O(N) time, O(1) space. (I'm completely convinced by now that this is what the original problem asked for. )

Now lets switch to the more difficult version of the problem: we no longer require subsequences to be prefixes - they can start anywhere.

After some back and forth thought, I thought there might be no linear algorithm for this. For example, consider the prefix "111111111111111111...". Every single 1 of those may be the start of the longest subsequence, there is no candidate subsequence start position that dominates (i.e. always gives better solutions than) any other position, so we can't throw away any of them (O(N) space) and at any step, we must be able to select the best start (which has an equal number of 1s and 0s to the current position) out of linearly many candidates, in O(1) time. It turns out this is doable, and easily doable too, since we can select the candidate based on the running sum of 1s (+1) and 0s (-1), this has at most size N, and we can store the first position we reach each sum in 2N cells - see pmod's answer below (yellowfog's comments and geometric insight too).

Failing to spot this trick, I had replaced a fast but wrong with a slow but sure algorithm, (since correct algorithms are preferable to wrong ones!):

  • Build an array A with the accumulated number of 1s from the start to that position, e.g. if the bitstring is "001001001", then the array would be [0, 0, 1, 1, 1, 2, 2, 2, 3]. Using this, we can test in O(1) whether the subsequence (i,j), inclusive, is valid: isValid(i, j) = (j - i + 1 == 2 * (A[j] - A[i - 1]), i.e. it is valid if its length is double the amount of 1s in it. For example, the subsequence (3,6) is valid because 6 - 3 + 1 == 2 * A[6] - A[2] = 4.
  • Plain old double loop:

    maxSubsLength = 0 for i = 1 to N - 1 for j = i + 1 to N if isValid(i, j) ... #maintain maxSubsLength end end

This can be sped up a bit using some branch-and-bound by skipping i/j sequences which are shorter than the current maxSubsLength, but asymptotically this is still O(n^2). Slow, but with a big plus on its side: correct!


Strictly speaking, the answer is that no such algorithm exists because the language of strings consisting of an equal number of zeros and ones is not regular.

Of course everyone ignores that fact that storing an integer of magnitude n is O(log n) in space and treats it as O(1) in space. :-) Pretty much all big-O's, including time ones, are full of (or rather empty of) missing log n factors, or equivalently, they assume n is bounded by the size of a machine word, which means you're really looking at a finite problem and everything is O(1).


New solution: Suppose we have for n-bit input bit-array 2*n-size array to keep position of bit. So, the size of array element must have enough size to keep maximum position number. For 256 input bit array, it's needed 256x2 array of bytes (byte is enough to keep 255 - the maximum position).

Moving from the first position of bit-array we put the position into array starting from the middle of array (index is n) using a rule:

1. Increment the position if we passed "1" bit and decrement when passed "0" bit

2. When meet already initialized array element - don't change it and remember the difference between positions (current minus taken from array element) - this is a size of local maximum sequence.

3. Every time we meet local maximum compare it with the global maximum and update if the latter is less.

For example: bit sequence is 0,0,0,1,0,1

   initial array index is n     set arr[n] = 0 (position)       bit 0 -> index--     set arr[n-1] = 1        bit 0 -> index--     set arr[n-2] = 2       bit 0 -> index--     set arr[n-3] = 3       bit 1 -> index++     arr[n-2] already contains 2 -> thus, local max seq is [3,2] becomes abs. maximum        will not overwrite arr[n-2]       bit 0 -> index--     arr[n-3] already contains 3 -> thus, local max seq is [4,3] is not abs. maximum       bit 1 -> index++     arr[n-2] already contains 2 -> thus, local max seq is [5,2] is abs. max  

Thus, we passing through the whole bit array only once. Does this solves the task?

input:      n - number of bits      a[n] - input bit-array    track_pos[2*n] = {0,};  ind = n;  /* start from position 1 since zero has    meaning track_pos[x] is not initialized */  for (i = 1; i < n+1; i++) {      if (track_pos[ind]) {          seq_size = i - track_pos[ind];          if (glob_seq_size < seq_size) {              /* store as interm. result */              glob_seq_size = seq_size;              glob_pos_from = track_pos[ind];              glob_pos_to   = i;          }      } else {          track_pos[ind] = i;      }        if (a[i-1])          ind++;      else          ind--;  }    output:      glob_seq_size - length of maximum sequence      glob_pos_from - start position of max sequence      glob_pos_to   - end position of max sequence  


In this thread ( http://discuss.techinterview.org/default.asp?interview.11.792102.31 ), poster A.F. has given an algorithm that runs in O(n) time and uses O(sqrt(n log n)) bits.


brute force: start with maximum length of the array to count the o's and l's. if o eqals l, you are finished. else reduce search length by 1 and do the algorithm for all subsequences of the reduced length (that is maximium length minus reduced length) and so on. stop when the subtraction is 0.


As was pointed out by user "R..", there is no solution, strictly speaking, unless you ignore the "log n" space complexity. In the following, I will consider that the array length fits in a machine register (e.g. a 64-bit word) and that a machine register has size O(1).

The important point to notice is that if there are more 1's than 0's, then the maximum subsequence that you are looking for necessarily includes all the 0's, and that many 1's. So here the algorithm:

Notations: the array has length n, indices are counted from 0 to n-1.

  1. First pass: count the number of 1's (c1) and 0's (c0). If c1 = c0 then your maximal subsequence is the entire array (end of algorithm). Otherwise, let d be the digit which appears the less often (d = 0 if c0 < c1, otherwise d = 1).
  2. Compute m = min(c0, c1) * 2. This is the size of the subsequence you are looking for.
  3. Second pass: scan the array to find the index j of the first occurrence of d.
  4. Compute k = max(j, n - m). The subsequence starts at index k and has length m.

Note that there could be several solutions (several subsequences of maximal length which match the criterion).

In plain words: assuming that there are more 1's than 0's, then I consider the smallest subsequence which contains all the 0's. By definition, that subsequence is surrounded by bunches of 1's. So I just grab enough 1's from the sides.

Edit: as was pointed out, this does not work... The "important point" is actually wrong.


Try something like this:

/* bit(n) is a macro that returns the nth bit, 0 or 1. len is number of bits */  int c[2] = {0,0};  int d, i, a, b, p;  for(i=0; i<len; i++) c[bit(i)]++;  d = c[1] < c[0];  if (c[d] == 0) return; /* all bits identical; fail */  for(i=0; bit(i)!=d; i++);  a = b = i;  for(p=0; i<len; i++) {    p += 2*bit(i)-1;    if (!p) b = i;  }  if (a == b) { /* account for case where we need bits before the first d */    b = len - 1;    a -= abs(p);  }  printf("maximal subsequence consists of bits %d through %d\n", a, b);  

Completely untested but modulo stupid mistakes it should work. Based on my reply to Thomas's answer which failed in certain cases.


New Solution: Space complexity of O(1) and time complexity O(n^2)

        int iStart = 0, iEnd = 0;          int[] arrInput = { 1, 0, 1, 1, 1,0,0,1,0,1,0,0 };            for (int i = 0; i < arrInput.Length; i++)          {              int iCurrEndIndex = i;              int iSum = 0;              for (int j = i; j < arrInput.Length; j++)              {                                      iSum = (arrInput[j] == 1) ? iSum+1 : iSum-1;                  if (iSum == 0)                  {                      iCurrEndIndex = j;                  }                }              if ((iEnd - iStart) < (iCurrEndIndex - i))              {                  iEnd = iCurrEndIndex;                  iStart = i;              }          }  


I am not sure whether the array you are referring is int array of 0's and 1's or bitarray??

If its about bitarray, here is my approach:

int isEvenBitCount(int n)  {      //n ... //Decimal equivalent of the input binary sequence      int cnt1 = 0, cnt0 = 0;      while(n){          if(n&0x01) { printf("1 "); cnt1++;}          else { printf("0 "); cnt0++; }          n = n>>1;      }      printf("\n");      return cnt0 == cnt1;  }    int main()  {      int i = 40, j = 25, k = 35;        isEvenBitCount(i)?printf("-->Yes\n"):printf("-->No\n");      isEvenBitCount(j)?printf("-->Yes\n"):printf("-->No\n");      isEvenBitCount(k)?printf("-->Yes\n"):printf("-->No\n");  }  

with use of bitwise operations the time complexity is almost O(1) also.

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »