Give an algorithm that computes a fair driving schedule for all people in a carpool over $d$ days

Some people agree to carpool, but they want to make sure that any
carpool arrangement is fair and doesn’t overload any single person
with too much driving. Some scheme is required because none of them
goes to work every day, and so the subset of them in the car varies
from day to day.

Let the people be labeled in a set $S = \{p_1,…,p_k\}$. We say that
the total driving obligation of $p_j$ over a set of days is the
expected number of times that $p_j$ would have driven, had a driver
been chosen uniformly at random from among the people going to work
each day. That is, suppose the carpool plan lasts for $d$ days, and on
the $i^{th}$ day a subset $S_i \subseteq S$ of the people go to work.
Then the total driving obligation $\delta_j$ for $p_j$ can be written
as $\delta_j = \sum_{i:p_j\in S_i} \frac{1}{|S_i|}$. Ideally, we’d
like to require that $p_j$ drives at most $\delta_j$ times, but
$\delta_j$ may not be an integer.

A driving schedule is a choice of a driver for each day$-$a sequence
$p_{i_1}, p_{i_2},…,p_{i_d}$ with $p_{i_t}\in S_t-$and that a fair
driving schedule
is one in which $p_j$ is chosen as a driver on at
most $\lceil \delta_j \rceil$ days.

  1. Prove that for any sequence of sets $S_1,…,S_d$, there exists a fair driving schedule.

  2. Give an algorithm to compute a fair driving schedule that is polynomial in $k$ and $d$.

I have a few questions here. My first question has to do with $\delta_j$. Namely, what does the sum truly represent? I don’t really understand the notation $i:p_j\in S_i$. Are we summing over all $i$ values, all values in $S_i$, or what?

I’m also confused about number 1. A sequence of sets $S_1,…,S_d$ represents a set of people $S_1$ to $S_d$ through $d$ days. We want to show that there is a fair driving schedule for this set, so we need to show that $p_i$ drives on at most $\lceil \delta_i \rceil$ days for all people $p_i$. However, we’re not even told how many people there are in each set, so how can we determine what $\delta_i$ should be?

Again, this confusion may stem from my misunderstanding of the notation in the summation, so if someone could clear that up I would greatly appreciate it.

I’m not really sure how we can efficiently construct an algorithm that computes a fair driving schedule. Among which ideas should I look for insight here? I was thinking this was somewhat reminiscent of network flow problems, but I’m not entirely sure.

Solutions Collecting From Web of "Give an algorithm that computes a fair driving schedule for all people in a carpool over $d$ days"

There should be always a fair driving schedule since you can model this problem as a flow network. The source is connected with all persons and the capacity of an edge from the source to a person $p_j$ is $\lceil\delta_j\rceil$. The next step is to connect every person with each of his working days. The capacity of each of this edges is $1$. The last step is to connect each day with the sink with capacity $1$.

Applying the max-flow min-cut theorem to our flow network, we can see that the maximal flow of our network is equal to the number of days, since the minimal cut splits our network into the sink and all other nodes.

Explanation: Without loss of generality, we can assume that each person works at least one day and there are no days without workers. Therefore we exclude all cuts running through an edge which connects a person with a day. The cut which splits our network in source and all other nodes is not minimal since
$$\sum_j\lceil\delta_j\rceil\ge\sum_j\delta_j= \text{number of days}.$$ The mixed cases are excluded with the combination of the abovementioned arguments.

In conclusion, the maximal flow in out network can be retranslated to our original problem in the following way: If the flow from the node corresponding to person $p$ to the node corresponding to day $t$ is $1$, than $p$ is the driver on $t$.

Note that there are algorithms to determine the maximal flow like the Ford-Fulkerson-Algorithm.

It seems to me that you could describe this situation as a bipartite graph with the one partition representing the people and the other partition representing the days with an edge between them if that person rides on that day. You could apply a greedy style algorithm matching edges to a day vertex such that each day vertex is adjacent to at most one matched edge (ignoring that person vertices may be adjacent to multiple). You could for each day pick the person whose current ‘filled’ obligation is least.

Using the graph that I described, as I understand, if $N(p_i) = \{S_{j_n}\}_n$, then $\delta_i = \sum_{k=1}^{n} \frac{1}{N(S_{j_n})}$, that is to say, each time that a person rides in the carpool they share the responsibility of driving with each other person in the car. We want to avoid any one person having to drive more than their fair share of the time.

example carpool

In my example, $N(p_1) = \{S_1, S_3, S_4, S_6\}$ and $\delta_1 = \frac{1}{3} + \frac{1}{2} + \frac{1}{3} + \frac{1}{3} = 1.5$ so at the sixth day, $p_1$ would feel it fair for him/her to have driven two days. Notice also for day 5, there is only one driver available, so $p_4$ automatically feels it is fair that they had to drive then.

The first algorithm that comes to my mind (which is likely not the best) is to look at day one’s riders and pick the first person that comes to mind whose filled obligation is least. Do the same for day 2 and so on. If at any point you will have accidentally put someone over what they feel is fair, then on a previous day that they are currently designated to drive, have someone else whose current filled obligation is least drive instead.

Notice that in my algorithm, it would initially assign $p_1$ to drive on day one and then $p_2$ to drive on day two, but when we reach day 3 it would attempt to assign driving duty to $p_1$ again, putting him at driving two days while $p_1$ would feel it unfair as he expects to have only had to drive $\frac{1}{3}+\frac{1}{2}$ days, so we backtrack to day 1 and pick someone different to drive then instead, namely $p_3$.

Someone better than me could come along and give us an idea of how good or bad this algorithm is, or provide a counter example for when it would fail. My intuition tells me that it should work, though I’m only starting to learn this subject myself. My main point for posting was to hopefully help give an idea how to interpret the variables.