In this paper, we analyze, theoretically and empirically, the performance of linear relaxations in Ad Network optimization, an essential component of online marketing. Online marketing is a form of marketing and advertising that uses the Internet to deliver promotional messages to consumers. The economic importance of this task is apparent: online marketing revenue has grown quickly since the mid-nineties, with a compound annual growth rate of 20 %. In the first semester of 2014, this market achieved a revenue of 23.1 billion dollars (USD) in the USA alone, a growth of 15.1 % over the first semester of 2013 [1]. Revenue comes from search-related marketing (39 %), banner displaying (28 %), mobile advertising (23 %), and other activities (10 %).
Online advertising companies usually follow either an online or an offline model. Both models assume the existence of a broker that has contracts with websites. These websites have spaces where banners can be displayed. The online model works by real-time bidding: advertisers compete for specific user profiles in auctions, normally Vickrey auctions [2], in which the winner pays the second-largest bid. In the offline model, advertisers establish contracts with a broker, from now on called the Ad Network. Advertisers create campaigns specifying a set of ads, a pricing model, a budget, a minimal number of impressions (an impression corresponds to the display of an ad to a user), and time restrictions (that is, how long the campaign will be available and its starting time). The Ad Network decides how to distribute these ads to the users so as to maximize its revenue while respecting the conditions specified by the advertisers. This paper focuses only on the offline model.
There are several pricing models [3, 4]; the most used ones are cost per impression (CPI), cost per action (CPA), and cost per click (CPC). In the CPI model, the advertiser pays for a number of impressions of the campaign. In the CPA model, the advertiser pays for specific user actions, e.g., filling in a form or buying a product in the advertiser's store. In the CPC model, the advertiser pays when the user actually clicks on the advertisement. In 2014, advertisers paid 34 % of online advertising transactions on a cost-per-impression basis, 65 % on customer performance (e.g., cost per click or cost per acquisition), and 1 % on hybrids of impression and performance methods. CPC's market share has grown each year since its introduction, eclipsing CPI to dominate two-thirds of all online advertising pricing methods [5]. This paper focuses on the CPC pricing model.
The offline business model is, in essence, a sequential decision process: the Ad Network must decide which campaign to display to each user at a specific time, given campaign budgets, values that campaigns pay per click, time constraints of the campaigns, and the relationship between campaigns and user profiles. Ad Network decisions are evaluated based on some utility function; for example, expected revenue.
This sequential decision process can be modeled as a Markov decision process (MDP), as noticed by some authors [6].
The solution of this MDP yields the policy for the Ad Network, indicating the best decision for each possible combination of user profile, budget, and time constraints. However, this approach is computationally intractable even for small problems, as the state space grows exponentially with the number of state variables of the problem.
One way to avoid the curse of dimensionality in Ad Network optimization is to convert this decision process into a simpler, relaxed problem: instead of deciding which campaign to allocate for each user profile at each time step, one selects only the number of impressions of each campaign in a given interval of time. Some well-known formulations of this relaxed problem rely on linear programming (LP) and have produced good results [6, 7].
In this paper, we contribute an explicit MDP formulation for the Ad Network problem and compare it with the LP formulation for the relaxed problem. We have expanded the results of a previous paper [8] to present a detailed analysis of the behavior of the two formulations. In our analysis, we build scenarios that clearly show the loss of performance resulting from the use of the LP formulation when compared with the MDP formulation. However, we show that the LP results are indeed close to the results obtained with MDP models when relatively large budgets are assigned to campaigns. Thus, the performance loss of the LP formulation drops when the campaign budgets reach realistic sizes. Finally, we also propose new heuristics to improve the use of solutions obtained by LP relaxation.
The remainder of this paper is organized as follows. “Problem definition” section formalizes the problem of Ad Network optimization. In “Ad Networks as a Markov decision process” section, we formulate the problem as an MDP, and in “A linear programming relaxation” section, we formulate the problem as a relaxed problem to be solved by LP. “Methods” section builds cases that are unfavorable for the LP formulation. “Results and discussion” section describes the experiments that allow us to highlight and discuss the differences between the solutions for the MDP and LP models. “Conclusions” section concludes the paper.
Problem definition
Ad Networks promote the distribution of ads to websites [9]. Advertisers create ads, grouped in campaigns, and publishers are websites that own spaces for the display of ads. Campaigns are designed by advertisers.
The campaigns processed by the Ad Network are described by the campaign set \(\mathcal {C}\). A campaign \(k \in \mathcal {C}\) is defined by a tuple \(\langle B_{k}, S_{k}, L_{k}, cc_{k}\rangle\), where \(B_{k}\) is the budget of campaign k in number of clicks, \(S_{k}\) is the starting time of the campaign, \(L_{k}\) is the lifetime of the campaign, and \(cc_{k}\) is the monetary value that the campaign pays per click. Campaigns can be active or inactive, and only active campaigns can be chosen by the Ad Network. A campaign is active at a specific time t if \(S_{k} \leq t < S_{k}+L_{k}\) and its remaining budget is larger than zero.
Advertisers contract an Ad Network to display the ads of their campaigns on websites, providing the Ad Network with a set of campaigns. It is also assumed that these contracts are established before the distribution of ads begins. Figure 1 depicts the flow of ad distribution in online marketing.
Every time a user requests a page in a website (step 1 in Fig. 1), the website requests an ad to be displayed (step 2). Users are characterized by their profiles, and these are known by the Ad Network. The Ad Network decides which campaign to allocate to the received request, and an ad of the selected campaign is sent to the website (step 3). Then, an impression is made, i.e., the ad is displayed as a banner to the user (step 4), who may or may not click on the ad (step 5).
This sequential process can be formalized as follows. At each time t, there is a probability that a request is received by the Ad Network; that is, a probability of a user requesting a page in a site in the Ad Network's inventory. We assume that the requests follow a Bernoulli distribution with success probability \(P_{\text{req}}\). This modeling decision is justified because the Bernoulli distribution is well suited to encode the arrival of random requests from a large unknown population [10].
Users are classified into different profiles, and the set of possible user profiles is denoted by \(\mathcal {G}\). A probability distribution \(P_{\mathcal {G}}: \mathcal {G} \rightarrow [0,1]\) yields the probability that a user belongs to a user profile i.
Once the campaign k is selected, one of its ads is displayed to the user with profile i in a banner inside a page in a website.
The user clicks on this ad with probability CTR(i,k), where CTR stands for click-through rate. That is, CTR \(: \mathcal {G} \times \mathcal {C} \rightarrow [0,1]\) is the probability of a click given a user profile and a campaign. In real problems, CTR values are typically on the order of \(10^{-4}\) [6]. One click generates a revenue equal to \(cc_{k}\); a percentage of this amount goes to the website and the remaining revenue stays with the Ad Network.
The goal of the Ad Network is to choose which campaign to allocate to each request, while maximizing a utility function. We assume the Ad Network to be interested in maximizing expected revenue.
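The request process described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator used in our experiments: the function name `simulate_step`, the dictionary layouts for \(P_{\mathcal {G}}\) and CTR, and the `choose_campaign` callback are hypothetical. Profile 0 stands for "no request" and campaign 0 for "no allocation".

```python
import random

def simulate_step(rng, p_req, profile_probs, ctr, choose_campaign):
    """Simulate one time step and return (profile, campaign, clicked).

    profile_probs: {profile i: P_G(i)}; ctr: {(i, k): CTR(i, k)} (assumed layouts).
    choose_campaign: callback mapping a profile to a campaign index (0 = none).
    """
    if rng.random() >= p_req:                 # Bernoulli(P_req): no request arrived
        return 0, 0, False
    profiles = list(profile_probs)
    weights = [profile_probs[i] for i in profiles]
    profile = rng.choices(profiles, weights=weights)[0]   # draw the profile from P_G
    k = choose_campaign(profile)              # decision taken by the Ad Network
    if k == 0:
        return profile, 0, False
    clicked = rng.random() < ctr[(profile, k)]            # click with probability CTR(i, k)
    return profile, k, clicked

# Hypothetical setting: two profiles, one campaign, always allocate campaign 1.
rng = random.Random(42)
outcome = simulate_step(rng, 0.9, {1: 0.5, 2: 0.5},
                        {(1, 1): 1e-4, (2, 1): 2e-4}, lambda i: 1)
```

Repeating this step over the horizon and accumulating \(cc_{k}\) for every click gives a Monte Carlo estimate of the revenue of a given allocation rule.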
Ad Networks as a Markov decision process
We now formulate the Ad Network problem as an MDP. The formulation is based on our previous work on Ad Network optimization [8, 11].
A finite discrete-time fully observable MDP is a tuple \(\langle \mathcal {S},\mathcal {A}, \mathcal {D}, \mathcal {T}, \mathcal {R}\rangle \) [12], where:

\(\mathcal {S}\) is a finite set of fully observable states of the process;

\(\mathcal {A}\) is a finite set of all the possible actions to be executed at each state; \(\mathcal {A}(s,t)\) denotes the set of valid actions at instant t when the system is in state \(s \in \mathcal {S}\);

\(\mathcal {D}\) is a finite sequence of natural numbers that correspond to decision epochs, in which the actions should be chosen and performed;

\(\mathcal {T}:\mathcal {S}\times \mathcal {A}\times \mathcal {S}\times \mathcal {D}\rightarrow [0,1]\) is a transition function that specifies the probability \(\mathcal {T}(s,a,s',t)\) that the system moves to state \(s'\) when action a is executed in state s at time t;

\(\mathcal {R}:\mathcal {S}\times \mathcal {A}\times \mathcal {S}\times \mathcal {D}\rightarrow \mathbb {R}\) is a reward function that produces a finite numerical value \(r = \mathcal {R}(s,a,s',t)\) when the system goes from state s to state \(s'\) as a result of applying action a at time t.
An MDP agent is continuously in a cycle of perception and action (Fig. 2): at each time t the agent observes the state \(s \in \mathcal {S}\) and decides which action \(a \in \mathcal {A}(s,t)\) to perform; the execution of this action causes the transition to a new state \(s'\) according to the transition probability function \(\mathcal {T}\), and the agent receives a reward r. This cycle is repeated until a stopping criterion is met; for example, until there are no more valid decision epochs.
It is important to notice that this system is not deterministic. Given a transition function \(\mathcal {T}\), when the same action is performed in the same state and at the same instant of time, the system may move to different states.
To solve an MDP is to find a policy that maximizes the expected accumulated reward. A non-stationary deterministic policy \(\pi :\mathcal {S}\times \mathcal {D} \rightarrow \mathcal {A}\) specifies which action \(a \in \mathcal {A}\) will be executed at each state \(s \in \mathcal {S}\) and at time \(t \in \mathcal {D}\), \(\mathcal {D} = \{0,1,\ldots,\tau-1\}\).
The expected total reward of a policy π starting at time t at state \(s\in \mathcal {S}\) is defined as:
$$ V^{\pi}(s,t) = \mathrm{E}\left[\left.\sum_{i=t}^{\tau-1}\mathcal{R}(s_{i},\pi(s_{i},i), s_{i+1},i)\,\right|\,s_{t}=s,\pi\right]. $$
((1))
The value function \(V^{*}\) of an optimal policy can be defined recursively for any state \(s\in \mathcal {S}\) and time \(t<\tau\) by:
$$ V^{*}(s,t) = \max_{a\in\mathcal{A}(s,t)}\left\{ \sum_{s' \in \mathcal{S}} \mathcal{T}(s,a,s',t)\left[\mathcal{R}(s,a,s',t) + V^{*}(s',t+1)\right]\right\}, $$
((2))
where \(\mathcal {A}(s,t)\) is the subset of \(\mathcal {A}\) which contains the possible actions to be applied in state s at time t, and \(V^{*}(s,\tau)=0\) for any state \(s\in \mathcal {S}\) [13].
Given the optimal value function \(V^{*}(\cdot)\), an optimal policy can be chosen for any state \(s\in \mathcal {S}\) and time \(t<\tau\) by:
$$ \pi^{*}(s,t) = \arg\max_{a\in\mathcal{A}(s,t)}\left\{ \sum_{s' \in \mathcal{S}} \mathcal{T}(s,a,s',t)\left[\mathcal{R}(s,a,s',t) + V^{*}(s',t+1)\right]\right\}. $$
The intuition behind the expressions above is exploited by the value iteration algorithm [14], described in Algorithm 1.
Once the optimal policy \(\pi^{*}\) is available, it is applied at every decision epoch: the agent observes the state s and the instant of time t and executes the action defined by the optimal policy, \(a=\pi^{*}(s,t)\).
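For a finite horizon, the recursion for the optimal value function can be evaluated backwards in time, starting from \(V^{*}(s,\tau)=0\). The sketch below is a generic illustration and not a transcription of Algorithm 1: the function name `backward_induction` and the callback signatures are assumptions, and each action is scored by the standard expectation of immediate reward plus future value over next states. The toy problem (one campaign with a budget of one click, a request at every step, CTR of 0.5, and a unit reward per click) is hypothetical.

```python
def backward_induction(states, actions, transition, reward, horizon):
    """Finite-horizon value iteration.

    actions(s, t)          -> iterable of valid actions A(s, t)
    transition(s, a, t)    -> dict {next_state: probability}
    reward(s, a, s2, t)    -> float
    Returns (V, policy), both indexed by (state, time).
    """
    V = {(s, horizon): 0.0 for s in states}       # boundary condition V*(s, tau) = 0
    policy = {}
    for t in range(horizon - 1, -1, -1):          # sweep backwards in time
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions(s, t):
                q = sum(p * (reward(s, a, s2, t) + V[(s2, t + 1)])
                        for s2, p in transition(s, a, t).items())
                if q > best_q:
                    best_a, best_q = a, q
            V[(s, t)] = best_q
            policy[(s, t)] = best_a
    return V, policy

# Toy problem: the state is the remaining budget (0 or 1) of a single campaign.
toy_states = [0, 1]
toy_actions = lambda s, t: [0, 1] if s > 0 else [0]
toy_transition = lambda s, a, t: {s - 1: 0.5, s: 0.5} if a == 1 else {s: 1.0}
toy_reward = lambda s, a, s2, t: 1.0 if a == 1 and s2 == s - 1 else 0.0

V, policy = backward_induction(toy_states, toy_actions, toy_transition, toy_reward, 2)
print(V[(1, 0)], policy[(1, 0)])  # 0.75 1
```

The table `V` has one entry per state and decision epoch, which is exactly what makes this approach impractical once the state space grows.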
For the problem of Ad Networks, the network observes its current state, given by the configuration of the campaigns and the user profile, and then the optimal policy defines the campaign to be displayed on the website.
We now model the Ad Network problem as an MDP by specifying its states, actions, transitions, and rewards.
States
The state is modeled as:
$$s = \left[ B_{1}, B_{2}, \dots, B_{|\mathcal{C}|}, G \right], $$
where \(B_{i}\) is the remaining budget of campaign i and \(G \in \mathcal {G} \cup \{0\}\) is the user profile that is generating a request. When the variable G is equal to 0, there is no request to attend to. For example, with 5 campaigns and 3 user profiles, a state could be:
$$\underbrace{[\overbrace{10, 3, 4, 2, 3}^{\text{Campaign information}}, \overbrace{3}^{\text{Request information}}]}_{\text{State}}. $$
Here, campaign 1 can afford 10 clicks, campaign 2 can afford 3 clicks, and so on. The request information contains the information of which user profile has generated a request; in this example, user profile i=3 has generated the request. From this state, possible next states are: [9,3,4,2,3,G], [10,2,4,2,3,G], [10,3,3,2,3,G], [10,3,4,1,3,G], [10,3,4,2,2,G], and [10,3,4,2,3,G], where G can be any user profile in \(\mathcal {G}\) or even 0, if there are no requests in the next time step.
Actions
An action allocates an ad from a campaign in the set \(\mathcal {C}\) to a request from a user profile in the set \(\mathcal {G}\). Given our problem definition, the set of actions can be defined by \(\mathcal {A}=\{0,1,\ldots,|\mathcal {C}|\}\) and an action is simply an integer k. If k>0, then k is the campaign index, \(k \in \{1,2, \dots, |\mathcal {C}|\}\). If k=0, then the Ad Network does not allocate any campaign to the user request.
Recall that campaigns can be active or inactive; hence, at any time t only a subset of actions \(\mathcal {A}(s,t)\) is available, consisting of action 0 plus all k>0 such that \(S_{k} \leq t < S_{k}+L_{k}\) and \(B_{k}>0\).
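As an illustration, the set \(\mathcal {A}(s,t)\) can be computed directly from the remaining budgets and the campaign time windows. The function name `valid_actions` and the use of parallel lists for budgets, starting times, and lifetimes are assumptions made for this sketch; the numbers in the example are hypothetical.

```python
def valid_actions(budgets, starts, lifetimes, t):
    """Return A(s, t): action 0 plus every campaign k (1-indexed) active at time t."""
    actions = [0]                                   # "allocate nothing" is always valid
    for k, (b, s_k, l_k) in enumerate(zip(budgets, starts, lifetimes), start=1):
        if s_k <= t < s_k + l_k and b > 0:          # inside the time window, budget left
            actions.append(k)
    return actions

# Three campaigns: campaign 2 has exhausted its budget, campaign 3 starts at t = 5.
print(valid_actions([10, 0, 4], [0, 0, 5], [10, 10, 3], t=3))  # [0, 1]
print(valid_actions([10, 0, 4], [0, 0, 5], [10, 10, 3], t=6))  # [0, 1, 3]
```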
Transitions
For all actions a and all states s and \(s'\), the function \(\mathcal {T}\) must satisfy the following requirements: \(0 \leq \mathcal {T}(s,a,s',t) \leq 1\), and \(\sum _{s' \in \mathcal {S}}\mathcal {T}(s,a,s',t) = 1\).
The variable G in the state does not depend on the previous state. The component \(B'_{k}\) of the next state depends only on the previous \(B_{k}\) and on the occurrence of click events. Given \(s=[B_{1},B_{2},\ldots,B_{|\mathcal{C}|},G]\) and \(s'=[B'_{1},B'_{2},\ldots,B'_{|\mathcal{C}|},G']\), the transition function \(\mathcal {T}\) is:
$$ \mathcal{T}(s,a,s',t) = P_{t}(G') \times \prod_{k\in\mathcal{C}}P(B'_{k}\mid B_{k},a,G), $$
((3))
where \(P(B'_{k}\mid B_{k},a,G)\) is equal to:
$$ \left\{ \begin{array}{ll} 1 & \text{if}~ B'_{k}=B_{k}~\text{and}~ (a \neq k ~\text{or}~ G=0~\text{or}~B_{k}=0), \\ \text{CTR}(G,k) & \text{if}~ B'_{k}=B_{k}-1 ~\text{and}~ (a = k ~\text{and}~ G>0 ~\text{and}~ B_{k}>0), \\ 1-\text{CTR}(G,k) & \text{if}~B'_{k}=B_{k} ~\text{and}~ (a = k ~\text{and}~ G>0 ~\text{and}~ B_{k}>0), \\ 0 &\text{otherwise,} \end{array}\right. $$
((4))
and
$$ P_{t}(G') =\left\{ \begin{array}{ll} 1-P_{\text{req}} & \text{if}~G' = 0, \\ P_{\text{req}}\times P_{\mathcal{G}}(G') & \text{if}~ G'\in\mathcal{G},\\ \end{array}\right. $$
((5))
where \(P_{\text{req}}\) is the probability that a request is received by the Ad Network, and \(P_{\mathcal {G}}\) is the probability of a user being of a given user profile. Note that the transition function \(\mathcal {T}\) is time-invariant.
As an example, consider the problem in which \(P_{\text{req}}=0.9\), \(\mathcal {G}=\{1,2\}\), \(B_{1}=B_{2}\geq 2\), \(P_{\mathcal {G}}(G) = \frac {1}{|\mathcal {G}|}=0.5\), and \(\text{CTR}(i,k)=i\times k\times 10^{-4}\). From the state \(s=[B_{1},B_{2},1]\), there is the possibility of 12 future states. Figure 3 illustrates some examples; therein, if a=1 then

\(P1 = P_{\text {req}} \times P_{\mathcal {G}} \times (1 - \text {CTR}(1,1)) \times 1 = 0.9 \times 0.5 \times (1-1\times 1\times 10^{-4})\times 1\),

\(P2 = P_{\text {req}} \times P_{\mathcal {G}} \times \text {CTR}(1,1) \times 1 = 0.9\times 0.5 \times (1\times 1\times 10^{-4})\times 1\),

\(P3 = (1-P_{\text {req}}) \times \text {CTR}(1,1) \times 1 = (1-0.9) \times (1\times 1\times 10^{-4})\times 1\),

\(P4 = P_{\text {req}} \times P_{\mathcal {G}} \times (1 - \text {CTR}(1,1)) \times 0 = 0.9\times 0.5 \times (1-1\times 1\times 10^{-4})\times 0\).
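The transition probabilities of this example can be checked numerically. The sketch below enumerates the next states reachable under one fixed action, following Eqs. (3) to (5); the function name `transition`, the tuple encoding of states, and the choice \(B_{1}=B_{2}=2\) are assumptions made for illustration.

```python
from itertools import product

def transition(state, a, p_req, profile_probs, ctr):
    """Return {next_state: probability} for state (B_1, ..., B_n, G) and action a."""
    *budgets, g = state
    # P_t(G'): the next request does not depend on the current state.
    next_profiles = {0: 1.0 - p_req}
    for i, p in profile_probs.items():
        next_profiles[i] = p_req * p
    # Only the allocated campaign can lose one unit of budget (on a click).
    if a > 0 and g > 0 and budgets[a - 1] > 0:
        after_click = list(budgets)
        after_click[a - 1] -= 1
        budget_outcomes = {tuple(after_click): ctr(g, a),
                           tuple(budgets): 1.0 - ctr(g, a)}
    else:
        budget_outcomes = {tuple(budgets): 1.0}
    return {b + (g2,): pb * pg
            for (b, pb), (g2, pg) in product(budget_outcomes.items(),
                                             next_profiles.items())}

T = transition((2, 2, 1), a=1, p_req=0.9, profile_probs={1: 0.5, 2: 0.5},
               ctr=lambda i, k: i * k * 1e-4)
print(T[(2, 2, 1)])           # P1: no click, next request from profile 1
print(T[(1, 2, 1)])           # P2: click, next request from profile 1
print(T[(1, 2, 0)])           # P3: click, no request at the next step
print(T.get((2, 1, 1), 0.0))  # P4: campaign 2 cannot change when a = 1
```

Under action a=1 only six next states have nonzero probability, and their probabilities sum to one, as required of \(\mathcal {T}\).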
Rewards
In our model, we assume that the reward does not vary over time and that it is independent of the next state. Thus, the reward function is \(\mathcal {R}:\mathcal {S}\times \mathcal {A}\rightarrow \mathbb {R}\). In our problem, we have:
$$ \mathcal{R}(s,a) =\left\{ \begin{array}{ll} cc_{k}\times \text{CTR}(G,k) & \text{if}~a=k>0,~G>0~\text{and}~B_{k}>0, \\ 0 & \text{otherwise,}\\ \end{array}\right. $$
((6))
where \(cc_{k}\) and CTR(G,k) were defined in “Problem definition” section and specify, respectively, the CPC for campaign k and the CTR for campaign k and user profile G. The intuition behind the reward function is that it represents the local revenue after choosing to display an ad from campaign k.
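A direct transcription of Eq. (6) is given below as a small sketch. The function name `expected_reward`, the tuple encoding of the state, and the numeric values are assumptions chosen only for illustration.

```python
def expected_reward(state, a, cpc, ctr):
    """R(s, a): cc_k * CTR(G, k) when campaign k = a is shown to a real request."""
    *budgets, g = state
    if a > 0 and g > 0 and budgets[a - 1] > 0:
        return cpc[a - 1] * ctr(g, a)
    return 0.0                      # no allocation, no request, or exhausted budget

ctr = lambda i, k: i * k * 1e-4     # hypothetical CTR table
cpc = [0.5, 2.0]                    # hypothetical values paid per click
r = expected_reward((10, 3, 2), a=2, cpc=cpc, ctr=ctr)   # 2.0 * CTR(2, 2)
```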
A linear programming relaxation
Here, we formulate the Ad Network problem as an LP relaxation. LP focuses on maximization or minimization of a linear function over a polyhedron [15]. In canonical form, we must find
$$\begin{array}{ll} \max & \quad c^{T}x \\ s.t. & \quad Ax \leq b, \quad x \geq 0, \end{array} $$
where c and b are vectors, A is a matrix, and x is a vector of variables. There are several algorithms to solve an LP problem, even strongly polynomial-time algorithms [16]. The simplex method is the most commonly used [17]; despite its worst-case exponential time, this method is on average very efficient [18].
In the Ad Network relaxation, we are interested in discovering the number of ad displays to be allocated for each campaign in a given interval of time. The description that follows is based on previous efforts [6, 7] with minor modifications^{1}.
Let \(\mathcal {I}\) be the sorted list obtained from the set \(\{S_{k}\}\cup \{S_{k}+L_{k}\}\); that is, the ordered list of starting and ending times of all campaigns. Let \(\mathcal {J}_{j}\) be the j-th (right-open) interval defined by the campaign time constraints, with \(\mathcal {J}\) denoting the set of these intervals. Define \(\mathbb {T}_{j}\) to be the length of the interval j.
For example, in Fig. 4, we have three campaigns, with their starting times and ending times, defining five intervals. Let \(E_{k}=S_{k}+L_{k}\). In this example, we have \(\mathcal {I} = \{S_{2},S_{3},S_{1},S_{3}+L_{3},S_{2}+L_{2},S_{1}+L_{1}\}\); then \(\mathcal {J}_{1}= [S_{2},S_{3}[\), \(\mathcal {J}_{2}= [S_{3},S_{1}[\), \(\mathcal {J}_{3}= [S_{1},E_{3}[\), \(\mathcal {J}_{4}= [E_{3},E_{2}[\), \(\mathcal {J}_{5}= [E_{2},E_{1}[\), and \(\mathbb {T}_{1}=S_{3}-S_{2}\), \(\mathbb {T}_{2}=S_{1}-S_{3}\), \(\mathbb {T}_{3}=E_{3}-S_{1}\), \(\mathbb {T}_{4}=E_{2}-E_{3}\), \(\mathbb {T}_{5}=E_{1}-E_{2}\).
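The construction of the intervals can be sketched as follows. The function name `build_intervals` is hypothetical, and the starting times and lifetimes are invented so that the breakpoints appear in the same order as in the example above (\(S_{2}<S_{3}<S_{1}<E_{3}<E_{2}<E_{1}\)).

```python
def build_intervals(starts, lifetimes):
    """Return the sorted breakpoints, the right-open intervals [lo, hi), and their lengths."""
    ends = {s + l for s, l in zip(starts, lifetimes)}
    points = sorted(set(starts) | ends)               # the list I
    intervals = list(zip(points[:-1], points[1:]))    # consecutive breakpoints
    lengths = [hi - lo for lo, hi in intervals]       # T_j for each interval j
    return points, intervals, lengths

# Campaign 1: S=4, L=8; campaign 2: S=0, L=10; campaign 3: S=2, L=5.
points, intervals, lengths = build_intervals([4, 0, 2], [8, 10, 5])
print(points)     # [0, 2, 4, 7, 10, 12]
print(intervals)  # [(0, 2), (2, 4), (4, 7), (7, 10), (10, 12)]
print(lengths)    # [2, 2, 3, 3, 2]
```

If the campaign windows leave a gap, this sketch also returns the interval covering the gap, in which no campaign is active.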
We can state the LP approach to Ad Network optimization problem as follows:
$$\begin{array}{@{}rcl@{}} \max \quad \sum_{j \in \mathcal{J}} \sum_{i \in \mathcal{G}} \sum_{k \in \mathcal{C}} cc_{k} \times \text{CTR}(i,k) \times x_{j,i,k} \end{array} $$
((7))
$$\begin{array}{@{}rcl@{}} &s.t.& \sum_{k \in \mathcal{C}_{j}} x_{j,i,k} \leq P_{\text{req}} \times P_{\mathcal{G}}(i) \times \mathbb{T}_{j}, \forall i \in \mathcal{G}, \forall j \in \mathcal{J} \end{array} $$
((8))
$$\begin{array}{@{}rcl@{}} && \sum_{i \in \mathcal{G}} \sum_{j \in \mathcal{J}} \text{CTR}(i,k) \times x_{j,i,k} \leq B_{k}, \forall k \in {\mathcal{C}} \end{array} $$
((9))
$$\begin{array}{@{}rcl@{}} && x_{j,i,k} \geq 0, \forall j \in \mathcal{J}, \forall k \in \mathcal{C}, \forall i \in \mathcal{G} \end{array} $$
((10))
Variable \(x_{j,i,k}\) indicates how many ads from campaign k should be displayed to users with user profile i in interval j. The objective function maximizes the total expected revenue of the Ad Network. The first set of constraints ensures that the solution does not exceed the expected number of requests for each user profile i in interval j. The second set of constraints ensures that the expected number of clicks for each campaign does not exceed its budget. The last set of constraints ensures that the solution is non-negative and therefore feasible for real problems; without it, the solution could contain allocations with negative values of \(x_{j,i,k}\). Clearly, \(x_{j,i,k}\) should be an integer because it is not possible to allocate a fraction of an ad, but we ignore (relax) this for now. Table 1 summarizes the list of symbols that we use.
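To make the structure of the LP concrete, the sketch below assembles the objective vector and the constraints (8) and (9) in the canonical form \(\max c^{T}x\), \(Ax \leq b\), \(x \geq 0\). It only builds the data and does not solve the problem. The function name `build_lp`, the dictionary layout of campaigns, the 0-based campaign index, and the rule that a variable exists only when the campaign is active throughout the interval (our reading of \(\mathcal {C}_{j}\)) are assumptions.

```python
def build_lp(intervals, profiles, campaigns, p_req, profile_probs, ctr):
    """Assemble (var_index, c, A, b) for max c^T x subject to A x <= b, x >= 0."""
    var_index = {}                                   # (j, i, k) -> column number
    for j, (lo, hi) in enumerate(intervals):
        for i in profiles:
            for k, camp in enumerate(campaigns):
                if camp["start"] <= lo and hi <= camp["start"] + camp["lifetime"]:
                    var_index[(j, i, k)] = len(var_index)
    n = len(var_index)
    c = [0.0] * n
    for (j, i, k), col in var_index.items():         # objective (7)
        c[col] = campaigns[k]["cpc"] * ctr(i, k)
    A, b = [], []
    for j, (lo, hi) in enumerate(intervals):         # request constraints (8)
        for i in profiles:
            row = [0.0] * n
            for (j2, i2, k), col in var_index.items():
                if j2 == j and i2 == i:
                    row[col] = 1.0
            A.append(row)
            b.append(p_req * profile_probs[i] * (hi - lo))
    for k, camp in enumerate(campaigns):             # budget constraints (9)
        row = [0.0] * n
        for (j, i, k2), col in var_index.items():
            if k2 == k:
                row[col] = ctr(i, k)
        A.append(row)
        b.append(float(camp["budget"]))
    return var_index, c, A, b

# Hypothetical instance: one interval of length 10, one profile, one campaign.
campaigns = [{"budget": 2, "start": 0, "lifetime": 10, "cpc": 1.0}]
var_index, c, A, b = build_lp([(0, 10)], [1], campaigns, 0.5, {1: 1.0},
                              lambda i, k: 0.1)
print(c, A, b)  # [0.1] [[1.0], [0.1]] [5.0, 2.0]
```

These arrays can be passed to any LP solver. With SciPy's `linprog`, for instance, the objective would have to be negated because that routine minimizes; the non-negativity constraint (10) matches its default variable bounds.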
Policies for setting the ad to be displayed from the LP solution
Note that the LP solution indicates how many ads from campaigns should be shown to each user profile at each interval, but it does not provide any clue on how to apply this solution. Girgin et al. [6] proposed two ways to use the solution of this LP problem:

1. The highest LP policy (HLP), \(\pi_{\text{LP}}(i,j)\), selects, for user profile i in interval j, the campaign
$$\pi_{\text{LP}}(i,j) = \arg\max_{k}\; x_{j,i,k}/\sum_{k'} x_{j,i,k'}. $$

2. The stochastic LP policy (SLP) selects a campaign stochastically with respect to the “probabilities”
$$x_{j,i,k}/\sum_{k'} x_{j,i,k'}. $$
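Both policies can be sketched in a few lines once the LP solution is stored as a dictionary. The function names `hlp` and `slp`, the dictionary layout `{(j, i, k): value}`, and the allocation values below are assumptions for illustration; returning `None` when nothing was allocated to a pair (interval, profile) is also our choice, since the description above does not cover that case.

```python
import random

def _allocations(x, j, i):
    """Positive LP allocations of each campaign for interval j and profile i."""
    return {k: v for (j2, i2, k), v in x.items() if j2 == j and i2 == i and v > 0}

def hlp(x, j, i):
    """Highest LP policy: the campaign with the largest allocation."""
    alloc = _allocations(x, j, i)
    return max(alloc, key=alloc.get) if alloc else None

def slp(x, j, i, rng=random):
    """Stochastic LP policy: sample a campaign proportionally to its allocation."""
    alloc = _allocations(x, j, i)
    if not alloc:
        return None
    ks = list(alloc)
    return rng.choices(ks, weights=[alloc[k] for k in ks])[0]

# Hypothetical LP solution for one interval, two profiles, two campaigns.
x = {(0, 1, 1): 30.0, (0, 1, 2): 10.0, (0, 2, 2): 5.0}
print(hlp(x, 0, 1))  # 1
print(hlp(x, 0, 2))  # 2
```

Since the normalizing sum is the same for every campaign, the campaign chosen by HLP is simply the one with the largest \(x_{j,i,k}\).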
Complexity of the MDP and the LP formulations
Here, we compare the complexity of the MDP formulation and the LP relaxation.
In the LP formulation, if constraint (10), \(x_{j,i,k}\geq 0\), is not considered, the number of constraints is of order \(O(|\mathcal {J}| \times |\mathcal {G}| + |\mathcal {C}|)\). But by definition \(1 \leq |\mathcal {J}| \leq 2 \times |\mathcal {C}|\). Then, the number of constraints is of order \(O(|\mathcal {G}||\mathcal {C}|)\), while the number of variables is of order \(O(|\mathcal {G}||\mathcal {C}|^{2})\) for the same reason.
On the other hand, in the MDP formulation, the size of the policy to be found is equal to \(|\mathcal {S} \times \{0,1,2,\dots,\tau-1\}|\), and \(|\mathcal {S}| = (|\mathcal {G}| +1) \times \prod _{k \in \mathcal {C}} (B_{k} +1)\). If we consider \(B_{\text {min}} = \min _{k \in \mathcal {C}}\{B_{k}\}\), it follows that \(|\mathcal {S}| \geq (|\mathcal {G}| +1) \times (B_{\text {min}}+1)^{|\mathcal {C}|}\). This makes the MDP solution intractable even for small problems because of its memory requirements. In real settings, there are hundreds of campaigns with budgets of thousands of clicks.
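The growth of the state space is easy to quantify. The sketch below evaluates the expression for the number of states and its lower bound; the function names are hypothetical, and the first instance reuses the budgets of the state example given in the “States” section (5 campaigns, 3 user profiles).

```python
from math import prod

def state_space_size(num_profiles, budgets):
    """|S| = (|G| + 1) * prod_k (B_k + 1)."""
    return (num_profiles + 1) * prod(b + 1 for b in budgets)

def state_space_lower_bound(num_profiles, budgets):
    """(|G| + 1) * (B_min + 1) ** |C|."""
    return (num_profiles + 1) * (min(budgets) + 1) ** len(budgets)

print(state_space_size(3, [10, 3, 4, 2, 3]))         # 10560
print(state_space_lower_bound(3, [10, 3, 4, 2, 3]))  # 972
# Ten campaigns with a budget of 100 clicks each already give 4 * 101**10 states.
print(state_space_size(3, [100] * 10))
```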
Thus, despite the fact that the MDP formulation is a more faithful scheme, the LP formulation is computationally much more attractive. However, the LP formulation only indicates how many campaigns should be allocated in a given time interval, leaving the actual action to auxiliary schemes (for instance, HLP and SLP).