A Greedy Algorithm

Throwback Thursday: An Algorithms I Original

Posted by Matthew Roach on October 6, 2016

This program was originally written in October of 2015 for an Algorithms assignment.

The Problem

This program reads an input file, SpaceStation.txt, which contains data about the start and end times of different projects. Also the number of projects and S and F. The goal is to select a minimum-size set of projects so that at all times from S to F, at least one research project is in progress. The selectProjects function solves this part of the problem.

Each project has an associated cost equal to its length, and the costs of the projects selected must each be charged to one of two accounts. The total costs charged to each account must be as close as possible.

Eg. A totalCost of 100, account1 charged 50 and account2 charged 50 is optimal.

The divideCosts function solves this part of the problem.

The Solution

Select Projects

This function finds a minimum set of projects that spans from a start time (S) to end time (F).
Paramaters:
 1. lis: A list of projects
 2. lastEnd: The last end time of our result list, starts off as S
How I did this was by creating a temporary list of all the possible values that could be selected, then chose the one with the largest end time value. If we got to the point where the temp list contained a project that ended after the endTime, I selected the one with the least weight.

Divide Costs

This function divides the list of projects into two groups so that the cost of both groups are as close to equal as possible.
Parameters:
 1. Takes a list of projects as a paramater
The idea here it to calculate K (totalCost/2) and see if there is a subset in projects that sums to K. If this is true, we determine what projects sum to K and set these as group 1. Group 2 are the projects that sum to totalCost - k. The data is then written to output.txt. If there isn't a subset that sums to K, we try again with K-1.

Code

#!/usr/bin/python

import operator


#Empty list to hold the start and end time of each project
lis = []

#Open the input file
f = open("SpaceStation.txt")

#Holds the number of available projects
numProjects = 0

#Holds the start and end time (the range in which we want to fill with projects)
startTime = 0
endTime = 0


#Store the input to an appropriate variable or data structure
for i, line in enumerate(f):
    if i == 0: # line 1 holds S and F
        startTime, endTime = line.split()
        startTime = int(startTime)
        endTime = int(endTime)
    elif i == 1: # line 2 holds the number of projects
        numProjects = int(line)
    else: # line > 2 holds the projects start and end times
        (key, val1, val2) = line.split()
        lis.append([val1, val2])


#Convert the each element of the list to an integer so that
#we can sort it by numerical value.
for i in lis:
    i[0] = int(i[0])
    i[1] = int(i[1])

#Make a copy of lis, call it sortedLis. Sorted by the project start time
sortedLis = list(lis)
sortedLis.sort(key=operator.itemgetter(0))

#Holds the projects that we select as an optimal solution.
result = []


def selectProjects(lis, lastEnd):
    """
    Finds a minimum set of projects that spans from a start time (S) to end time (F)
    Paramaters:
        1. lis: A lis of projects
        2. lastEnd: The last end time of our result list, start off as S
    """
    big = 0 #big will store the largest end time value in our temp list
    temp = [] #Holds potential projects we need to choose from

    
    if lastEnd >= endTime or lis == []: #We're done, return the result set 
        return result
    else:
        i=0
        #While start time is before the last chosen end time
        while lis[i][0] <= lastEnd:
            #append project i to the temp list
            temp.append(lis[i])
            if lis[i] == lis[-1]:
                break
            i+=1

        #Sort temp and lis by end time
        temp.sort(key=operator.itemgetter(1))
        lis.sort(key=operator.itemgetter(1))
        #If the last value ends after endTime then we will make our last choice now.
        #We do this by removing projects that end before endTime and then select the
        #project with the least weight.
        if temp[-1][1] >= endTime:
            sizeOfTemp = len(temp)
            j = 0
            #remove projects that finish before endTime from temp and lis
            while j < sizeOfTemp:
                if temp[j][1] < endTime:
                    temp.remove(temp[j])
                    lis.remove(lis[j])
                    sizeOfTemp-=1
                else:
                    j+=1
            #Choose the project in temp with the smallest weight
            #small is initially set to endTime - startTime. It can be set to any large value really.
            small = endTime - startTime
            for p in temp:
                if p[1] - p[0] < small:
                    small = p[1] - p[0]
                    choose = p
        else:
            #Re-sort back to sorted by start time. This resolves ties with end times
            #and chooses the least weighted project (hence the >= big, opposed to just >).
            temp.sort(key=operator.itemgetter(0))
            lis.sort(key=operator.itemgetter(0))
            for i in range(len(temp)):  
                if temp[i][1] >= big:
                    big = temp[i][1]
                    choose = temp[i]
       

        #Append the chosen project to the result list 
        result.append(choose)

        #Remove the already considered options
        if temp != []:
            for i in temp:
                lis.remove(i)

    selectProjects(lis, result[-1][1])


def divideCosts(projects):
    """
    This function divides the list of projects into two groups so that the cost
    of both groups are as close to equal as possible. Parameters:
        1. Takes a list of projects as a paramater
    """
    #Holds the costs of each project
    costList = []

    #Find the cost of each project, append to costList
    for i in range(len(projects)):
        cost = projects[i][1] - projects[i][0]
        costList.append(cost)

    #Calculate the total Cost
    totalCost = 0
    for i in costList:
        totalCost += i

    #Value we want the two costs to be as close to as possible
    K = totalCost / 2

    #We first check if the total cost can be split into K and totalCost - K
    #If not we check K-1 and (totalCost - (K-1)), and so on..
    while K != 0:
        #If K is a subset sum of costList
        if subsetSum(costList, K) == True:
            #DEBUGGING
            #print str(K) + " and " + str(totalCost - K)

            L = subsetList(costList, K) #L holds the set of costs of group 1
            M = subsetList(costList, totalCost - K) #M holds the set of costs of group 2
            
            group1 = []  #Holds the start-end times of the projects of group 1
            group2 = []  #Holds the start-end times of the projects of group 2

            #Append start-end time values to group 1 and group 2 by comparing the
            #differnce of each set of times in projects with the values in L and M.
            #If they match, append to the list.
            for i in projects:
                for j in L:
                    if i[1] - i[0] == j  and i not in group1:
                        group1.append(i)

            for i in projects:
                for j in M:
                    if i[1] - i[0] == j and i not in group2:
                        group2.append(i)
                        
                        
            group1ProjectNums = [] #Holds the project # of all the projects in group 1
            group2ProjectNums = [] #Holds the project # of all the projects in group 2

            #Get the project numbers for group 1 as indicated by the first value of
            #each row in the input file.
            for i in group1:
                for j in range(len(lis)):
                    if i == lis[j]:
                        group1ProjectNums.append(j+1)

            #Get the project numbers for group 2 as indicated by the first value of
            #each row in the input file.
            for i in group2:
                for j in range(len(lis)):
                    if i == lis[j]:
                        group2ProjectNums.append(j+1)    


            #Sort group1ProjectNums and group2ProjectNums 
            group1ProjectNums.sort()    
            group2ProjectNums.sort()

            #selected projects is equal to the group 1 projects plus the group 2 projects
            selectedProjects = group1ProjectNums + group2ProjectNums
            #Sort the list of selected projects
            selectedProjects.sort()     

            #Print Results to output file, output.txt
            #Open output.txt in write mode
            out=open('output.txt', 'w+')
            #Write results to file
            out.write("Selected Projects: " + str(selectedProjects))
            out.write("\n")
            out.write("\nGroup 1 Projects: " + str(group1ProjectNums) + ". Total Time: " + str(K))
            out.write("\nGroup 2 Projects: " + str(group2ProjectNums) + ". Total Time: " + str(totalCost - K))
            break
        K-=1



"""
HELPER FUCNTIONS
"""

#Given a list and a target, this function returns True or False, whether
#there is a subset of lis that sums to target
def subsetSum(lis, target):
    if target == 0:
        return True

    for i in range(len(lis)):
        if subsetSum(lis[:i] + lis[i+1:], target - lis[i]):
            return True
    return False

#Given a list and a target, this function returns a subset of lis that sums to target.
def subsetList(lis, target):
    subset = [] #Holds the result subset
    for i, x in enumerate(lis):
        #If subsetSum returns true for the reduced list, and a target of target minus the 
        #value we removed from lis, then append that value(x) because it is in the solution.
        if subsetSum(lis[i+1:], target - x) == True:
          subset.append(x)
          target -= x
    return subset


"""
Run Program
"""
#Call selectProjects with parameters sortedLis (sorted by start time) and startTime (20)
selectProjects(sortedLis, startTime)

#Call divideCosts with the projects solution list to calculate the total weights
#and which projects should be assigned to groups 1 and 2
divideCosts(result)

The Input

The input tells us:
 -> we need projects running from time 20 to time 100 (line 1)
 -> there are 40 projects to choose from (line 2)
 -> project number, project start time, project end time (lines 3 -42)

20	100
40
1	58	74
2	12	18
3	75	85
4	48	61
5	93	111
6	84	97
7	1	3
8	58	68
9	43	62
10	40	51
11	28	47
12	6	14
13	59	69
14	47	54
15	18	32
16	9	18
17	90	95
18	87	93
19	35	53
20	38	58
21	78	86
22	18	19
23	97	116
24	57	60
25	50	52
26	57	58
27	53	60
28	49	54
29	76	94
30	15	28
31	34	42
32	86	99
33	82	93
34	70	73
35	54	74
36	51	58
37	23	40
38	70	86
39	49	58
40	22	25

				

Program Output

Selected Projects: [1, 5, 9, 11, 15, 32, 38]

Group 1 Projects: [5, 9, 11]. Total Time: 56
Group 2 Projects: [1, 15, 32, 38]. Total Time: 59
				

The output looks good, but we can quickly check it.

ProjectStart TimeEnd Time
151832
112847
94362
15874
387086
328699
593111