By the end of this assignment you should have:
become fluent with the mechanics of class definitions in Python, including constructors and other special methods
practised implementing a class, given its interface
practised designing the interface for a class, given the class name and its general purpose
practised identifying suitable classes, given the description of a problem in English
faced decisions about which classes should be responsible for what, and made reasonable choices
faced decisions about what data structure to use to implement a class, weighed some options, and made reasonable choices
recorded important facts about an implementation decision using representation invariants
learned how to define a subclass of another class
used an ADT to solve a problem, and appreciated the benefit of not having to know how it is implemented
Consider what happens when a delivery company like FedEx or Purolator receives a plane load of parcels at Pearson Airport to be delivered by truck to locations all over southern Ontario. They have to schedule the parcels for delivery by assigning parcels to trucks and determining the route each truck will take to make its deliveries. Depending on how well these decisions are made, trucks may be well-packed and have short, efficient routes, or trucks may not be fully filled and may have to visit many far-flung locations.
For this assignment, you will write code to try out different algorithms to perform parcel scheduling and compare their performance.
Please read through the following section carefully. Your Python code must accurately model all of the details described here.
The parcel delivery domain
In this experiment, each parcel has a source and a destination, which are the name of the city the parcel came from, and where it must be delivered to, respectively. We will not deal with specific addresses within cities. Each parcel also has a volume, which is a positive integer.
Each truck can store multiple parcels, but as a volume capacity, also a positive integer. (Each truck has its own volume capacity.) The total sum of the volumes of the parcel on a truck cannot exceed its volume capacity. Each truck also has a route, which is a list of city names that it is scheduled to travel through.
Parcel IDs are unique, that is, no two parcels can have the same ID. Truck IDs are also unique.
In this experiment, there is a special city that all parcels and trucks start from. We’ll refer to this city as the depot. (So you can imagine all parcels have been shipped from their source city to the depot.) Our algorithms will schedule delivery of parcels from the depot to their destinations.
At the beginning of the experiment, all trucks are in the depot, are empty, and have no other cities on their route. Routes will be determined during the experiment.
We will use a very simple algorithm for choosing a truck’s route, given the parcels that it has been assigned to deliver: Every time it is decided that a parcel will go onto a truck, if the parcel’s destination is not already on that truck’s route, the destination is added to the end of the route.
You will implement two different algorithms for choosing which parcel goes on which truck.
(1) Random algorithm
The random algorithm will go through the parcels in random order. For each parcel, it will schedule it onto a randomly chosen truck (from among those trucks that have capacity to add that parcel). Because of this randomness, each time you run your random algorithm on a given problem, it may generate a different solution.
(2) Greedy algorithm
The greedy algorithm tries to be more strategic. Like the random algorithm, it processes parcels one at a time, and picks the “best” truck it can for each parcel, but without looking ahead to possible consequences of the choice (that’s why we call it “greedy”).
The greedy algorithm has configurable features: the order in which parcels are considered, and how a truck is chosen for each parcel, described below.
There are four possible orders that the algorithm could use to process the parcels:
In order by parcel volume, either smallest to largest (non-decreasing) or largest to smallest (non-increasing).
In order by parcel destination, either smallest to largest (non-decreasing) or largest to smallest (non-increasing). Since destinations are strings, larger and smaller is determined by comparing strings (city names) alphabetically.
Ties are broken using the order in which the parcels are read in from our data file (see below).
When the greedy algorithm processes a parcel, it must choose which truck to assign it to. The algorithm does the following to compute the eligible trucks:
It only considers trucks that have enough unused volume to add the parcel.
Among these trucks, if there are any that already have the parcel’s destination on their route, only those trucks are considered. Otherwise, all trucks that have enough unused volume are considered.
Given the eligible trucks, the algorithm can be configured one of two ways to make a choice:
choose the eligible truck with the most available space, or
choose the eligible truck with the least available space
Ties are broken using the order in which the trucks are read in from our data file. If there is no eligible truck, then the parcel is not scheduled onto any truck.
Observations about the Greedy Algorithm
Since there are four options for parcel priority and two options for truck choice, our greedy algorithm can be configured eight different ways in total.
Notice that there is no randomness in the greedy algorithm; it is completely “deterministic”. This means that no matter how many times you run your greedy algorithm on a given problem, it will always generate the same solution.
Putting it all together
For this assignment, your program will create an experiment by reading in some parcel, truck, and configuration data from a set of files. Your code will be able to apply the random algorithm or any of the variations of the greedy algorithm to a scheduling problem, and then report on statistics of interest, such as the average distance travelled by the trucks. This will allow the delivery company to compare the algorithms and decide which is best, given the company’s priorities.
Download this tar file containing a bunch of starter files for this assignment, and extract its contents into assignments/a1.
The module that drives the whole program is experiment. It contains class SchedulingExperiment. A SchedulingExperiment can be created based on a dictionary that specifies the location of necessary data files as well as an algorithm configuration. Then the experiment can be run and statistics can be generated and (optionally) reported. To test your code, we will construct instances of SchedulingExperiment, call its methods, and examine the statistics that it returns.
The experiment module contains a main block that sets up and runs a single experiment. It can be used as a quick and dirty check to see if your experiment code can run without errors. For fuller testing, you should add unit tests to a1_test.py and run that.
An experiment can be run in verbose mode. In that case, it should print step-by-step details regarding the scheduling algorithm as it runs. You may find this helpful for debugging. The specific verbose output doesn’t matter, as we will never test your code in verbose mode. You are also free to change the name of the configuration file that is read in the main block of module experiment, because we will not test your code by running module experiment, just by importing it and using class SchedulingExperiment. Running the module would cause the main block to execute, whereas importing it will not.
Format of the Configuration Dictionary
The configuration dictionary stores the configuration of the scheduling experiment as a set of key-value pairs:
depot_location: the name of the city where each parcels is, and from which it must be delivered to its destination.
parcel_file: the name of a file containing parcel data
truck_file: the name of a file containing truck data
map_file: the name of a file containing distance data
algorithm: either ‘random’ or ‘greedy’. If ‘random’, none of the remaining keys are required in the configuration file (and if they are present, they will be ignored).
parcel_priority: either ‘volume’ or ‘destination’
parcel_order: either ‘non-decreasing’ or ‘non-increasing’
truck_order: either ‘non-decreasing’ or ‘non-increasing’
verbose: either ‘true’ or ‘false’
Format of the Data Files
The parcel file has one parcel per line, with this format: <parcel_id, <source, <destination, <parcel_volume
The truck file has one truck per line, with this format: <truck_id, <truck_volume
The map file has one distance fact per line, with this format: <city1, <city2, <distance
It is possible, for instance due to one-way traffic restrictions on some roads, that this file contains one distance from city A to city B, and a different distance from city B to city A or no distance from city B to city A.
The starter files include an example of each of these data files, as well as a program called generator.py that can generate new, random, parcel and truck files. You may find this helpful for testing your code. Note the places in generator.py that allow you to control what it generates.
The next part of this handout guides you through the code as you work on the different parts of this assignment.
Task 0: Prepare
In the first weeks of classes, you will learn material that you need for the various tasks in this assignment. But you can prepare for the assignment right away by doing the following:
Read this handout to learn about the problem domain and what you will do for the assignment.
Ensure that you have a precise understanding of the random and greedy algorithms.
Read the sample configuration file and understand its format and meaning.
Learn how to use the generate.py module: Read the code and set the appropriate values in order to generate a very tiny dataset with 10 parcels and 2 trucks.
On paper, enact the greedy algorithm on that tiny dataset. Keep track of just enough information to generate the statistics required (see Task 6). You can look up inter-city distances in map-data.txt.
Clear up any questions you have about the assignment by asking in class or on the course forum.
Clear up any questions you have about Python as well.
Talk to other students in lecture and lab, and think about who you might like to partner with.
Task 1: Distance Map
knowing how to design and implement a class (week 2).
In file distance_map.py, use the Class Design Recipe to define a class called DistanceMap that lets you store and look up the distance between any two cities. Your program will use an instance of this class to store the information from the map data file. Your class should be able to store one distance from city A to city B and a different distance from B to A if that is what the file says. (Perhaps these are different due to one-way roads, for instance.) If the file only specifies a distance in one direction, your Distance Map should consider the distance in the other direction to be unknown.
The precise interface for this class is up to you, as is the choice of data structure. For something so simple, there are surprisingly many choices. Just pick something reasonable, and be sure to define representation invariants that document any important facts about your data structure.
Task 2: Modelling the Domain
knowing how to design and implement a class (week 2).
knowing how to identify appropriate classes from a problem description (week 2).
Your next task is to design and implement the classes necessary to represent the entities in the experiment. At the very least, you must create a class for a parcel and one for a truck. Any further classes are up to you.
Start each class simply, by focusing on the data you know it must store and any operations that you are certain in must provide. Very likely, as you start to use these classes in later steps, you will add new operations or make other changes. This is appropriate.
Do all of your work for this task in the file domain.py.
Task 3: Priority Queue
knowing how to implement a class (week 2).
understanding inheritance (week 3).
In the greedy scheduling algorithm, you will need to consider the parcels one a time. While we could use a Python list, it has no way to give us items in order according to our priority (by volume in non-decreasing order, for example).
Instead we will use something called a PriorityQueue. It is a kind of container that allows you to add and remove items. But a PriorityQueue removes items in a very specific order: it always removes the item with the highest priority. We define what the priority is to be when we construct the PriorityQueue.
A PriorityQueue supports the following actions:
determine whether the priority queue is empty
insert an item with a given priority
remove the item with the highest priority
Look at container.py, which contains the Container class, as well as a new, partially-complete PriorityQueue class. PriorityQueue is a child class of Container, which means that it inherits attributes and methods from Container. We will cover inheritance in Week 3 of the course.
You must complete the add() method according to its docstring. (Notice that we’re using a sorted list to implement this class; in later courses, you’ll learn about a much more efficient implementation called heaps.)
Task 4: The Scheduling Algorithms
Requires: same as Task 3
In the file scheduler.py, we have given you a class called Scheduler. It is an abstract class: the schedule method is not implemented, so no instances of the class should be constructed. Its purpose is to define what any sort of scheduler must be able to do.
Your task is to implement two subclasses of Scheduler called RandomScheduler and GreedyScheduler, which implement the scheduling algorithms defined above. This is the heart of the code.
Your code in this module will make use of the various classes you have completed in the earlier steps.
Task 5: Experiment
Requires: same as Task 3
You are ready to write the code that runs the whole experiment! Take a look at the file experiment.py, which contains a skeleton for class SchedulingExperiment. Implement the class. Here are some tips:
The constructor must look in the config dictionary for the names of all the relevant data files, read those data files, and construct the appropriate kind of scheduler.
No other class may read any input from a file or from the user. If it does, you may fail all of our test cases.
We will test your code by constructing and running a SchedulingExperiment and inspecting the dictionary that is returned. The format of any output you generate when report or verbose is True does not matter.
Method run() is very simple, given all the other code you already have. This method is supposed to return a dictionary full of statistics, but for now, it may return an empty dictionary.
Your experiment should run now, but you will not have correct statistics.
Task 6: Statistics
Requires: same as Task 3
Now revise your run() method to return a dictionary of statistics, with the following keys:
fleet: the number of trucks in the fleet
unused_trucks: the number of unused trucks (trucks that have no parcels)
avg_distance: among the used trucks, the average distance they will have to travel to follow their scheduled route
avg_fullness: among the used trucks, their average fullness. The fullness of a truck is the percentage of its volume that is taken by parcels
unused_space: among the used trucks, their total unused space. The unused space of a truck is the amount (not percentage) of its volume that is not taken by parcels
unscheduled: the number of unschedule parcels (parcels that did not fit onto any truck)
Of course, there are many other interesting statistics you could report, but we had to stop somewhere!
You can assume there is always at least one parcel and one truck, and that at least one truck will be used. You may also assume that the map file contains every distance that you will need in order to compute the statistics.
Extra modules we are providing
There are several modules that we are providing for you:
generator: Generates random parcel and truck data and writes each to a file.
explore: Compares all algorithms on a single problem.
a1_test: Contains a sample configuration dictionary from which it generates 6 unittest test cases — one for each statistic that your code should generate. (Yes, this code creates unittest code. Very cool.) You are not required to do anything with this module, but this is the place where you should put your tests of the overall SchedulingExperiment class. To add a new test case, just follow the pattern that we’ve shown.
Take some time to polish up. This step will improve your mark, but it also feels so good. Here are some things you can do:
Pay attention to any violations of the “PEP8” Python style guidelines that PyCharm points out. Fix them!
In each module, run python_ta.check_all(config=’.pylintrc’) to check for flaws. Note: .pylintrc is a customization file for python_ta, provided in the starter files. It allows you to import more modules than would normally be allowed by python_ta. Make sure it’s in the same folder as your code when you run python_ta.check_all so that it is found.
Check your docstrings to make sure they are precise and complete and that they follow the conventions of the Function Design Recipe and the Class Design Recipe.
Read through and polish your internal comments.
Remove any code you added just for debugging, such as print statements.
Remove any pass statement where you have added the necessary code.
Remove the word “TODO” wherever you have completed the task.