Suppose that you are given a string. Write a function to find the first nonrepeated character in that string. Here’s an example: suppose you are given the string “interview”. The first nonrepeated character in that string is ‘n’, because ‘i’ appears twice in the string. And the first nonrepeated character for “racecar” is ‘e’ – which is the first and only nonrepeated character. Explain the efficiency of your algorithm.
Finding a solution for this problem is quite easy – but the hard part is finding an efficient solution. The very first solution that you may come up with is this: starting from the beginning of the string, for each character in the string, compare it to all of the other characters in the string. If you find another character that matches the current character then you know that you have found a repeat of that character. As soon as you find a character that has no matches, then you have found your first nonrepeated character. And that solution works just fine. But, the question is: how efficient is this solution – in other words, what is the time order of this solution? Remember that time order means the same thing as Big O.
The time order of our first solution
Well, in the worst case we would have to go through the entire string to find that the very last character is the first non repeated character (like in the string “xxyyz”) or that there is no non-repeated character (like in the string “toot”). This means that for a string that has n characters, we would have to do n comparisons for each character, giving us n*n which is n2. In terms of Big O Notation, this means that the algorithm will be O(n2). For very large strings, it’s very likely that this algorithm will do close to n2 comparisons if not n2. This is because there’s a higher probability of there being more repeated characters in larger strings when compared to smaller strings.
How can we improve our first solution?
But, there must be an algorithm that has a better time order than O(n2), because this solution is just too simple – especially if asked in an interview setting. So, rather than asking ourselves how we can make the algorithm more efficient, let’s instead ask what specifically we can improve in the algorithm we currently have. Well, what were we doing in this algorithm, and why was it O(n2) ? One factor of the “n2” (or basically half of the time efficiency of the algorithm – which would be “n” in this case) is going through each and every character in the string to see if it is nonrepeated. Because that nonrepeated character can even be at the very end of the string, improving this part of the algorithm to improve efficiency does not seem possible. But, the other half of the algorithm is searching the entire string in order to find matches for each character. Now, that sounds like it can be made more efficient. By improving the search of this algorithm, we can improve the overall efficiency as well.
Data structures allow for more efficient searching
Any time you want to improve the search efficiency for a set of data, it’s always a good idea to put that data inside a data structure – because a data structure is meant for more efficient searching and organization of data. The search portion in our current algorithm accounts for O(n) time. How can we improve that time so that it’s less than O(n)? Well, what data structures are less than O(n)? There’s binary trees, which can be searched in O(log(n)). And there’s also arrays and hash tables – which both have constant time element lookup, meaning that the time to lookup something does not depend on the size of the input. Since that sounds promising, let’s try to use either an array or a hash table since they are good data structures to help with faster searches. This will hopefully reduce our time order complexity.
Using a hash table or an array to find the first nonrepeated character in a string
If we use an array or a hashtable, the question is what will we store in the index for the array, or the key for the hashtable? Well, because we need to be able to search the data structure by character, the most obvious choice for the key or index is the character. Because array indices have to be numeric, we could just convert a given character to an integer and then use it as an index. This means for ASCII strings we would have 128 possible values, and for Unicode strings we would have 65,536 possible values. But, what do we actually store inside the data structures? Because the problem asks us to find the first nonrepeated character, then why don’t we just store the number of times each character appears in the string. So, all we have to do is scan the string once and we can get a count for each of the characters that would tell us how many times each one appears in the string.
And once we have a count for all of the characters, then all we have to do is scan the data structure for the very first “1” that we encounter in the data structure. This would give us the first non-repeated character!
What’s the Big-O Notation of the new solution to finding the first nonrepeated character?
But, is the new algorithm actually an improvement over what we previously had? Well, let’s break it down: we scan the entire string once to build the data structure. And then, once we have built the data structure, we scan it to find the ‘1’ – which tells us that this is the first nonrepeated character. This means that in the worst case, there will be 2 ‘operations’ on each character in a given string, which would be 2n for a string of length n. In Big-O Notation, this is O(n), which is a lot more efficient than our previous algorithm which was O(n2).
Hash tables versus arrays
Now the question is whether we should use an array or a hashtable as our data structure? Let’s examine the differences between the two. Hashtables do have a higher lookup overhead when compared to arrays, so that’s one advantage of arrays. But, the biggest difference is in the memory that each data structure would require. A hash table would only need to store the characters that actually exist in the input string – so if a string contains the characters “abcdef”, then a hashtable would only need to store the characters in the string “abcdef”. An array, on the other hand, would need an element for every single possible value of a character. This is because with an array you can not skip indices – so we can not have an array with just 2 elements, where one index is 10 and one index is 99. That’s impossible – you would have to have elements at index 0 and 1 for a 2 element array. And, remember that we have to store the numeric values of the characters in the index position for this problem. This means that if we have a Unicode string, we would need to have 65,536 elements (assuming a 16 bit Unicode encoding) in our array to account for every possible character that could be in the string – whether or not it is actually in the string. This is because we simply do not know what is going to be in the string that’s being passed in – but with an array we have to be ready to accept all possible values. But for an ASCII string, an array would really only need 128 elements, because there are only 128 possible ASCII values.
Because of the memory requirements, arrays would be better when there are not many possible character values (like in an ASCII string) and when the strings are long (because hash tables have a higher lookup overhead than arrays). But, hashtables are much more efficient when strings are smaller in size or when there are lot of possible character values.
Let’s use a hashtable, just because most people are very familiar with arrays, but not so much with hashtables. Before we write the actual code, let’s write out the pseudocode. Here’s what it would look like:
Create a hash table data structure that will store the count of each character Then, for each character in the string: If there is no value for that character in the hashtable then insert a '1' Otherwise, if there is a value for that character then increment the value by 1 After the first part is done, for each character in the input string, look it up in the hashtable, if there is a '1', then return that character because it will be the first nonrepeating character in the string. If none of the characters in the string have a count of 1, then return null, meaning there are no nonrepeated characters in the input string.
Example of using the Hashtable in Java:
If we implement the pseudocode above in Java this is what it would look like:
// returns the first nonrepeated Character in a string public static Character findFirstNonRepeated( String input) { // create a new hashtable: Hashtable hashChar = new Hashtable( ); int j, strLength; Character chr; Integer intgr; strLength = input.length( ); for (j =0; j < strLength; j++) { chr = new Character(input.charAt( j ) ); intgr = (Integer) hashChar.get(chr); if (intgr == null) { hashChar.put(chr, new Integer(1)); } else { hashChar.put(chr, new Integer(intgr.intValue( ) + 1) ); } } /* go through hashtable and search for the first nonrepeated char: */ for(j = 0; j < length; j++) { chr = new Character(input.charAt(j)); if (((Integer) hashChar.get(chr)).intValue( ) == 1) return chr; } /* this only returns if the loop above doesn't find a nonrepeated character. */ return null; } }
We can actually improve the code above even more. Note that every time we put something into the hashtable, we are creating an object of type Integer. Why not just create 2 Object values which are also flags that represent 'one time', and 'more than one time' that will tell us how many times a given character appears in the string? This way, we don't have to create a new object every time we insert into the hashtable, which consumes a lot of memory unnecessarily. Here is what the code would look like after changing it to use flags instead of creating new objects:
// returns the first nonrepeated Character in a string public static Character findFirstNonRepeated( String input) { // create a new hashtable: Hashtable hashChar = new Hashtable( ); int j, strLength; Character chr; Object oneTime = new Object( ); Object twoTimes = new Object( ); strLength = input.length( ); for (j =0; j < strLength; j++) { chr = new Character(input.charAt( j ) ); Object o = hashChar.get(chr); /*if there is no entry for that particular character, then insert the oneTime flag: */ if (o == null) { hashChar.put(chr, oneTime ); } /* */ else if (o == oneTime) { hashChar.put(chr, twoTimes); } } /* go through hashtable and search for the first nonrepeated char: */ for(j = 0; j < length; j++) { chr = new Character(input.charAt(j)); if ( hashChar.get(c) == oneTime) return chr; } /* this only returns null if the loop above doesn't find a nonrepeated character in the hashtable */ return null; } }
And finally, we have our final solution. The lesson here is that using data structures can really help in making a solution to a programming problem more efficient - and is something you should always keep in mind.
21 thoughts on “Find First Nonrepeated Character”