Common Characters

A Microsoft Coding Competition Problem

Posted by Matthew Roach on September 24, 2016

Microsoft was on campus this week. After a couple pints at Clark Hall for the networking event I decided too test my skills at the coding competition that evening. Long story short, my team did not win the $150.00 (per person) prize. It was a humbling experience to say the least.

Clearly my coding under pressure is sub-par; so in an effort to salvage any remaining pride I'm going to do the questions one at a time when school work isn't too heavy.

The first problem, a.k.a. the one that I failed to complete, actually wasn't too difficult. Note to self: scrap paper is key... you can't solve these problems in your head. Anyway, this first problem took around 45 minutes to complete and my solution is provided below. Try it for yourself first, if you have what it takes.

The Problem

Given a set of lines each containing multiple random strings, your task for each line is to find the characters that are common to all of its strings. In other words, you should output one string for each line that contains all of the distinct characters that the given strings on that line have in common.

Characters include [A-Z, a-z, 0-9]. Characters with different cases (e.g. 'a' vs 'A') count as distinct characters.

The order of characters in the output string should appear as follows:

  • In order of numbers 0-9.
  • In order of uppercase characters A-Z.
  • In order of lowercase characters a-z.

Input definition

Input files for this problem will contain an arbitrary number of lines.

The first line will contain an integer, m, denoting the number of tests in the input file.

Each subsequent line will contain multiple space-separated strings.

Output definition

Your output should contain m lines, one corresponding to each test case given in the input.

Each line should have one string (which can be empty/a blank line) containing all of the distinct characters that the input strings have in common, listed in the order described above.

Example input

2
9x34209asAb32sD840D9258 02934x2DoebAaab90CDEe348290
aabcwdefghkij ksdfopweiraa wieourslakaaier asjoirwekjaa

Example output

023489ADabx
aeikw

The Solution

The first step in solving this problem is handling the input. The problem states that the first line of input contains an integer and the remaining lines contain multiple random strings. I first split the input into a list called "data", like so:

[ ['2'],['9x34209asAb32sD840D9258', '02934x2DoebAaab90CDEe348290'],
['aabcwdefghkij', 'ksdfopweiraa', 'wieourslakaaier', 'asjoirwekjaa'] ]
I also set the first input line to the variable "count". Remember, this value tells us how many lines of random strings there are.

data=[]
with open('practiceInput.txt') as infile:
    with open('out.txt', 'w') as outfile:
        for line in infile:
            data.append(line.strip().split(' '))

        tmp = data[0]
        count = int(tmp[0])

So, using the problem input as an example, the program will loop twice, once for each line. I first created a list called "temp", which splits the "data" list (minus the first integer value) into a list of characters, like so:

[ ['9','x','3','4','2','0','9','a','s','A','b','3','2','s','D','8','4','0','D','9','2','5','8'] ['0','2','9','3','4','x','2','D','o','e','b','A','a','a','b','9','0','C','D','E','e','3','4','8','2','9','0'] ]
I then set the first sub-list of "temp" (the first string of each line given in the input) to the variable "same".

for i in range(count):
	temp = [] 
    for j in range(len(data[i+1])):
        temp.append(list(data[i+1][j]))
              
    same = temp[0]

The general idea of my algorithm is as follows:

  • Store the first string of the current line in "same" (which is a list of characters)
  • Next, starting with the second string, loop through the remaining strings of the current line (line 1).
  • Then loop through the individual characters of that string (line 3). Remember, each string is a list of characters.
  • If the current character is in the "same" list, then we append it to a temporary list called "OK" (line 4 and 5).
  • Once we've finished looping through the current string, we set the "same" list equal to the "OK" list. Now the "same" list is a list of characters that are in both the first and second string, which then gets compared to the third string and so on until the "same" list contains only characters that are in every string of the input line.

for j in range(1,len(temp)):
    OK = []
    for k in range(len(temp[j])):
        if temp[j][k] in same:
            OK.append(temp[j][k])

    same = OK

OK, so the hard work is over now. Recall that the question states that the output must be numbers first, then uppercase characters, then lowercase. Here we just apply some built in functions to sort our list of characters, then convert that list to a string to write to the output file.

result = ''.join(sorted(set(same)))
outfile.write(result)
outfile.write("\n")

Code

#!/usr/bin/python

data=[]
with open('practiceInput.txt') as infile:
    with open('out.txt', 'w') as outfile:
        for line in infile:
            data.append(line.strip().split(' '))
        
        tmp = data[0]
        count = int(tmp[0])

        #Loop through each line of the input
        for i in range(count):
            #temp hold the current line, a 
			#list of lists of characters
			temp = [] 
            for j in range(len(data[i+1])):
                temp.append(list(data[i+1][j]))
            #same is initialized to the list of characters 
            #from the first string of the current line
            same = temp[0]

            #Loop through each word on the current 
			#line of input that's not the first word
            for j in range(1,len(temp)):
                OK = []
                for k in range(len(temp[j])):
                    if temp[j][k] in same:
                        OK.append(temp[j][k])

                same = OK

            result = ''.join(sorted(set(same)))
            outfile.write(result)
            outfile.write("\n")