# Gene clusters as intersections of powers of paths

*Journal of the Brazilian Computer Society*
**volume 18**, pages 129–136 (2012)

## Abstract

There are various definitions of a gene cluster determined by two genomes, and various methods for finding these clusters. However, there is little work on characterizing configurations of genes that are eligible to be a cluster according to a given definition. For example, given a set of genes in a genome, is it always possible to find two genomes such that their intersection is exactly this cluster? In one version of this problem, we use graph theory to reformulate it as follows: Given a graph *G* with *n* vertices, do there exist two *θ*-powers of paths *G*_{S}=(*V*_{S},*E*_{S}) and *G*_{T}=(*V*_{T},*E*_{T}) such that *G*_{S}∩*G*_{T} contains *G* as an induced subgraph? In this work, we divide the problem into two cases, depending on whether or not *G* is an induced subgraph of *G*_{S} or *G*_{T}. We show an \(\mathcal{O}(n^{2})\) time algorithm that generates the smallest *θ*-powers of paths *G*_{S} and *G*_{T} (with respect to *θ* and the number of vertices) that contain *G* as an induced subgraph. Finally, we discuss the problem when *G* is an induced subgraph neither of *G*_{S} nor of *G*_{T}, and we present a method for finding the smallest power of a path when graph *G* is a cycle *C*_{n}.

## Introduction

Due to recent research on genetic mapping, a large amount of information is available and stored in the databases of various research centers around the world. Processing these data in order to obtain relevant biological conclusions is one of the challenges in biology. One way to structure these data is the comparison of genomes, i.e., the search for similarities and differences between two or more organisms. This paper addresses a problem in this area: given a set of genes in a genome, called a *cluster*, is it always possible to find two genomes such that their intersection is exactly this cluster? First, we present the model of Adam et al. [1] and Sankoff and Xu [8], which will be used in this paper.

A *marker* is a gene with a known location on a chromosome. Let *V*_{X} be the set of *n* markers in the genome *X*. These markers are partitioned among a number of total orders called *chromosomes*. For markers *g* and *h* in *V*_{X} on the same chromosome in *X*, let *gh*∈*E*_{X} if the number of genes intervening between *g* and *h* in *X* is less than *θ*, where *θ*≥1 is a fixed *neighborhood parameter*. We call *G*_{X}=(*V*_{X},*E*_{X}) a *θ-adjacency graph* if its edges are determined by a neighborhood parameter *θ*.
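The definition above can be illustrated with a minimal sketch. The input format (a list of chromosomes, each a list of marker names) and the function name `theta_adjacency_edges` are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations


def theta_adjacency_edges(chromosomes, theta):
    """Edge set of a theta-adjacency graph.

    chromosomes: list of lists; each inner list is one chromosome,
    i.e. a total order of marker names (hypothetical input format).
    """
    edges = set()
    for chromosome in chromosomes:
        for i, j in combinations(range(len(chromosome)), 2):
            intervening = j - i - 1  # genes strictly between the two markers
            if intervening < theta:
                edges.add(frozenset((chromosome[i], chromosome[j])))
    return edges


# A toy genome with two chromosomes.
genome = [["a", "b", "c", "d"], ["e", "f"]]
edges_theta_1 = theta_adjacency_edges(genome, 1)
edges_theta_2 = theta_adjacency_edges(genome, 2)
```

With *θ*=1 only consecutive markers are joined; with *θ*=2 markers separated by one intervening gene are joined as well. Markers on different chromosomes are never adjacent.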

Consider the *θ*-adjacency graphs *G*_{S}=(*V*_{S},*E*_{S}) and *G*_{T}=(*V*_{T},*E*_{T}) with a non-empty set of vertices in common, *V*_{ST}=*V*_{S}∩*V*_{T}. We say that a subset *V*⊆*V*_{ST} is a *generalized adjacency cluster* if it consists of the vertices of a maximal connected subgraph of *G*_{ST}=(*V*_{ST},*E*_{S}∩*E*_{T}). We call *G*=*G*_{ST}[*V*] the subgraph induced by set *V*.
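As a hedged illustration of this definition, the sketch below computes the generalized adjacency clusters as the connected components of the graph on the common vertices whose edges are shared by both genomes. The function name and the representation of edges as `frozenset` pairs are assumptions for this example only.

```python
def generalized_adjacency_clusters(vertices_s, edges_s, vertices_t, edges_t):
    """Connected components of G_ST = (V_S ∩ V_T, E_S ∩ E_T).

    Edges are frozensets of two vertex names (illustrative convention).
    """
    common = set(vertices_s) & set(vertices_t)
    neighbours = {v: set() for v in common}
    for edge in edges_s & edges_t:
        u, w = tuple(edge)
        neighbours[u].add(w)
        neighbours[w].add(u)

    clusters, seen = [], set()
    for start in sorted(common):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search from `start`
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbours[v] - component)
        seen |= component
        clusters.append(component)
    return clusters


def _e(*pairs):
    return {frozenset(p) for p in pairs}


# Genome S orders the markers a, b, c, d; genome T orders them a, b, d, c (theta = 1).
clusters = generalized_adjacency_clusters(
    "abcd", _e("ab", "bc", "cd"),
    "abcd", _e("ab", "bd", "dc"),
)
```

In this toy example the shared edges are *ab* and *cd*, so the clusters are {*a*,*b*} and {*c*,*d*}.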

Let *G*=(*V*(*G*),*E*(*G*)) be a graph with vertex set *V*(*G*) and edge set *E*(*G*), such that |*V*(*G*)|=*n*. Let *v*, \(\bar{v} \in V(G)\). The *distance* between vertices *v* and \(\bar{v}\), denoted by \(d_{G}(v,\bar{v})\), is the number of edges in a shortest path between *v* and \(\bar{v}\) in *G*. A *path* between two vertices *v*_{0} and *v*_{t} of graph *G* is a sequence of vertices *v*_{0},*v*_{1},…,*v*_{t} such that *v*_{i}*v*_{i+1} is an edge of *G*, 0≤*i*≤*t*−1. Let *P*_{n} be a graph that is a path with *n* vertices. A *θ-power of a path* \(P_{n_{\theta}}\), denoted by \(P^{\theta}_{n_{\theta}}\), *θ*>0, is the graph such that \(V(P^{\theta}_{n_{\theta}}) = V(P_{n_{\theta}})\) and \(E(P_{n_{\theta}}^{\theta}) = \{v\bar{v} : d_{P_{n_{\theta}}}(v, \bar{v})\leq \theta \ \mathrm{with} \ v, \bar{v} \in V(P_{n_{\theta}}^{\theta})\}\). For brevity, we denote the power of a path \(P^{\theta}_{n_{\theta}}\) by *P*^{θ}. A chromosome with *n*_{θ} markers in a *θ*-adjacency graph is analogous to a power of a path \(P^{\theta}_{n_{\theta}}\). Now, the central question of this work can be reformulated as follows:
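To make the notion of a *θ*-power of a path concrete, here is a minimal sketch that builds its edge set and checks whether a placement of the vertices of a graph along the path realizes that graph as an induced subgraph. The names `power_of_path_edges` and `is_induced_in_power`, the dictionary-based placement, and the small example graph are hypothetical choices for illustration.

```python
def power_of_path_edges(n, theta):
    """Edges of the theta-power of the path on vertices 1..n:
    i and j are adjacent iff 1 <= |i - j| <= theta."""
    return {(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, min(i + theta, n) + 1)}


def is_induced_in_power(graph_edges, position, theta):
    """True iff placing each vertex v at slot position[v] of the path
    reproduces exactly the adjacencies of the graph.

    graph_edges: set of frozensets; position: dict vertex -> distinct int.
    """
    vertices = list(position)
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            u, w = vertices[a], vertices[b]
            adjacent_in_graph = frozenset((u, w)) in graph_edges
            adjacent_in_power = abs(position[u] - position[w]) <= theta
            if adjacent_in_graph != adjacent_in_power:
                return False
    return True


# Hypothetical graph: a triangle v1 v2 v3 followed by the path v3 v4 v5.
example_edges = {frozenset(e) for e in
                 [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5")]}
with_gap = {"v1": 1, "v2": 2, "v3": 3, "v4": 5, "v5": 6}
without_gap = {"v1": 1, "v2": 2, "v3": 3, "v4": 4, "v5": 5}
```

For *θ*=2, the placement `with_gap` leaves one unused slot between *v*_{3} and *v*_{4}, which keeps *v*_{2} and *v*_{4} non-adjacent; the placement `without_gap` would create the unwanted edge *v*_{2}*v*_{4}.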

### Question 1

Given a connected graph *G*, do there exist *G*_{S} and *G*_{T}, two *θ*-powers of paths *P*_{S} and *P*_{T}, whose intersection contains *G* as an induced subgraph?

If the answer is yes, we are also interested in finding the minimum value of the power *θ* and of the number of vertices *n*_{θ} for these two *θ*-powers of paths.

In order to contribute to this challenging problem, we divide our study into two cases, depending on whether or not *G* is an induced subgraph of *G*_{S} or *G*_{T}. First, we give some definitions. We say that *G* is a *unit interval graph* if there exists a family *I* of intervals (*a*,*b*) on the real line such that each *v*∈*V*(*G*) can be put in a one-to-one correspondence with (*a*_{v},*b*_{v})∈*I*; the intervals in *I* are of the same length; and \(v\bar{v}\) is an edge of *E*(*G*) if, and only if, \((a_{v}, b_{v}) \cap (a_{\bar{v}}, b_{\bar{v}}) \neq \emptyset\). This family of intervals is called an *interval model* for *G*. Lin et al. [6] and Soulignac [9] present a proof that the class of proper interval graphs is precisely the class of unit interval graphs. There exist linear-time recognition algorithms for unit interval graphs, for example those of Figueiredo et al. [4] and Corneil et al. [3].
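The interval model can be sketched in a few lines. The example below assumes open intervals of a common length given by their left endpoints, so two intervals intersect exactly when their left endpoints differ by less than that length; the function name and the sample model are illustrative only.

```python
def unit_interval_graph(left_endpoints, length=1.0):
    """Edges of the graph defined by open intervals (a_v, a_v + length).

    left_endpoints: dict vertex -> a_v (hypothetical input format).
    Two open intervals of equal length intersect iff |a_u - a_w| < length.
    """
    names = sorted(left_endpoints)
    edges = set()
    for i, u in enumerate(names):
        for w in names[i + 1:]:
            if abs(left_endpoints[u] - left_endpoints[w]) < length:
                edges.add(frozenset((u, w)))
    return edges


model = {"a": 0.0, "b": 0.5, "c": 1.2}
model_edges = unit_interval_graph(model)
```

Here the intervals of *a* and *b* overlap, as do those of *b* and *c*, while those of *a* and *c* are disjoint, so the model represents a path on three vertices.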

Brandstädt et al. [2] and Lin et al. [5] proved independently the following structural property:

### Theorem 1

*A graph* *G* *is an induced subgraph of a power of a path if*, *and only if*, *G* *is a unit interval graph*.

Thus, given a unit interval graph *G* with *n* vertices, there exists a *θ*-power of a path \(P_{n_{\theta}}\) that contains *G* as an induced subgraph. But the proofs of the structural characterization given by Theorem 1 [2, 5] do not lead to an algorithm that constructs *G*_{S} and *G*_{T} for Question 1 with minimum values of the power *θ* and of the number of vertices *n*_{θ}.

In [6], the authors show an \(\mathcal{O}(n)\) time algorithm that adds new intervals to a proper interval model *I* of a connected graph *G*, constructing an extended model *I*′ containing *I*. This extended model *I*′ gives an implicit representation of a power of a path for every proper interval graph *G*, but the number of inserted intervals, or the size of the power *θ*, is not guaranteed to be minimum. The authors also remark that any explicit representation would require \(\mathcal{O}(n^{2})\) steps.

We present in this work an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph *G*, an explicit representation of the smallest *θ*-power of a path, *G*_{S} (with respect to *θ* and to the number of vertices), that contains *G* as an induced subgraph. Next, we construct *G*_{T}, a *θ*-power of a path with the same number of vertices as *G*_{S}, such that the intersection *G*_{S}∩*G*_{T} contains *G* as an induced subgraph.

This paper is organized as follows. In Sects. 2 and 3, we present the algorithm and we prove its correctness and complexity. In Sect. 4, we discuss the problem when *G* is an induced subgraph neither of *G*_{S} nor of *G*_{T}, and we present a method for finding the smallest power of a path when graph *G* is a cycle *C*_{n}.

## The algorithm

Our result is based on the ordering of the vertex set of *G*, given by Algorithm *Recognize* [3], which satisfies the property proved by Roberts in [7]:

### Property 2

*A graph* *G* *is a unit interval graph if and only if there is an order* < *on vertices such that for all vertices* *v*, *the closed neighborhood of* *v* *is a set of consecutive vertices with respect to the order* <.
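Property 2 can be tested directly for a given ordering. The sketch below is a brute-force check, not Algorithm *Recognize* of [3]; the function name and input conventions are assumptions for illustration.

```python
from itertools import permutations


def satisfies_property_2(order, edges):
    """True iff every closed neighbourhood occupies consecutive positions
    of `order`. Edges are frozensets of two vertex names."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        closed = {pos[v]} | {pos[w] for w in order if frozenset((v, w)) in edges}
        if max(closed) - min(closed) + 1 != len(closed):
            return False
    return True


path_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}
claw_edges = {frozenset(("c", leaf)) for leaf in "xyz"}

good_order = satisfies_property_2(["a", "b", "c"], path_edges)
bad_order = satisfies_property_2(["a", "c", "b"], path_edges)
claw_has_order = any(satisfies_property_2(list(p), claw_edges)
                     for p in permutations("cxyz"))
```

For the path *a*, *b*, *c* the natural order works, while the order *a*, *c*, *b* separates *a* from its neighbor *b*. For the claw, no ordering satisfies the property, which agrees with the claw not being a unit interval graph.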

Since all powers of paths are unit interval graphs, we can insert the vertices of *V*(*G*) in the vertex set of a power of a path \(P^{\theta}_{n_{\theta}}\) until this power of a path contains *G* as an induced subgraph.

This construction is done by Algorithm *CPP* as follows. First, let *v*_{1}<*v*_{2}<⋯<*v*_{n} be an ordering of *V*(*G*) given by Algorithm *Recognize* [3]. We take *θ*_{0} to be the number of vertices of the maximal clique that contains *v*_{1}, minus one, and we insert the vertices of this clique in \(P^{\theta_{0}}\). Algorithm *CPP* constructs a sequence of powers of paths \(P^{\theta_{0}} \subset P^{\theta_{1}} \subset \cdots \subset P^{\theta_{l-1}} \subset P^{\theta_{l}}\) such that *θ*_{i}=*θ*_{i−1}+1.

Let *v* be the first vertex non-adjacent to *v*_{1} in the order on *V*(*G*). If *v* is adjacent to *v*_{2}, Algorithm *CPP* must place *v* at the vertex of \(P^{\theta_{0}}\) that is at distance *θ*_{0}+1 from vertex *v*_{1} in \(P^{\theta_{0}}\). Similarly, if *v* is not adjacent to *v*_{t}, but is adjacent to *v*_{t+1}, Algorithm *CPP* must place *v* at the vertex of \(P^{\theta_{0}}\) that is at distance *θ*_{0}+1 from vertex *v*_{t} in \(P^{\theta_{0}}\). This is done by inserting *t*−1 vertices between the vertex of largest index adjacent to *v*_{1} and *v* in \(P^{\theta_{0}}\). Now, suppose that there exist at least two vertices *v*, \(\bar{v}\) that are not adjacent to *v*_{1} and are adjacent to *v*_{2}. Let \(\bar{v}\) be the second vertex of this set. In order to minimize the number of vertices of \(P^{\theta_{0}}\), vertex \(\bar{v}\) must be placed at the vertex of \(P^{\theta_{0}}\) at distance *θ*_{0}+2 from vertex *v*_{1} in \(P^{\theta_{0}}\). Then Algorithm *CPP* must call Procedure *SHIFT* to increase *θ*_{0} to *θ*_{1}:=*θ*_{0}+1 because of the edge \(\bar{v}v_{2}\). On the other hand, this increase adds several edges to \(P^{\theta_{0}}\) that are not in *E*(*G*). Thus, Procedure *SHIFT* adjusts the power of a path \(P^{\theta_{0}}\) to the new *θ*_{1} by inserting vertices in \(P^{\theta_{0}}\) in order to preserve the adjacencies and non-adjacencies between vertices of *G*, and generates a new \(P^{\theta_{1}}\). Algorithm *CPP* proceeds until all vertices of *V*(*G*) are included in \(P_{n_{\theta}}^{\theta}\), a smallest power of a path with respect to *θ* and *n*_{θ}.

Before describing Algorithm *CPP*, we borrow some definitions from [3]. Given an ordering of *V*(*G*) returned by Algorithm *Recognize* [3], order_{G}(*v*) is the position of vertex *v* in this ordering; \(\xi_{G}(v) = \max \{\mathrm{order}_{G}(\bar{v}) : \bar{v} \in N_{G}[v]\}\) and \(\eta_{G}(v) = \min \{\mathrm{order}_{G}(\bar{v}) : \bar{v} \in N_{G}[v]\}\), where *N*_{G}[*v*]={*w*∈*V*(*G*):*vw*∈*E*(*G*)}∪{*v*}. Let *v*∈*V*(*G*) and *u*∈*V*(*P*^{θ}). We refer to \(\mathrm{order}_{P^{\theta}}(v)\) as the position of vertex *v* in the ordering of the vertex set of *P*^{θ}, i.e., \(\mathrm{order}_{P^{\theta}}(v)= i\) if *u*_{i}=*v* in *P*^{θ}. We denote \(\xi_{P^{\theta}}(u)= \max \{\mathrm{order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\) and \(\eta_{P^{\theta}}(u) = \min \{\mathrm{order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\).
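The quantities *ξ*_{G} and *η*_{G} are straightforward to compute once an ordering is fixed. The sketch below uses 1-based positions to match the text; the function name `xi_eta` and the small example graph (a triangle followed by a path) are hypothetical and serve only as an illustration.

```python
def xi_eta(order, edges):
    """Return (xi, eta): for each vertex, the largest and smallest
    1-based position among its closed neighbourhood in `order`."""
    pos = {v: i + 1 for i, v in enumerate(order)}
    xi, eta = {}, {}
    for v in order:
        closed = [pos[v]] + [pos[w] for w in order if frozenset((v, w)) in edges]
        xi[v] = max(closed)
        eta[v] = min(closed)
    return xi, eta


order = ["v1", "v2", "v3", "v4", "v5"]
edges = {frozenset(e) for e in
         [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5")]}
xi, eta = xi_eta(order, edges)
theta_0 = xi["v1"] - 1  # size of the maximal clique containing v1, minus one
```

In this example the maximal clique containing *v*_{1} is {*v*_{1},*v*_{2},*v*_{3}}, so *ξ*_{G}(*v*_{1})=3 and the starting power is *θ*_{0}=2.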

Next, we present Algorithm *CPP* and Procedure *SHIFT*.

### Algorithm

*CONSTRUCTING_POWER_OF_PATH(CPP)*

Procedure *SHIFT* receives as input a smallest power of a path *P*^{θ} that contains *G*[*v*_{1},…,*v*_{l−1}], *ξ*_{G}(*v*_{1})+1≤*l*≤*n*, as an induced subgraph. Power *P*^{θ} contains the last vertex *v*_{l} inserted by Algorithm *CPP*. Vertex *v*_{l} triggers Procedure *SHIFT* because *v*_{l} is not adjacent to some vertex *v*_{l−t} in *P*^{θ}, but *v*_{l−t}*v*_{l}∈*E*(*G*).

### Procedure

*SHIFT*

Algorithm *CPP* returns \(P_{n_{\theta}}^{\theta}\), the smallest power of a path (with respect to *θ* and *n*_{θ}) that contains the unit interval graph *G* as an induced subgraph. We construct two powers of paths, *G*_{T}=(*V*_{T},*E*_{T}) and *G*_{S}=(*V*_{S},*E*_{S}), from \(P_{n_{\theta}}^{\theta}\) as follows. First, \(V_{T}= V_{S} = V(P^{\theta}_{n_{\theta}})\). Then, the vertices of *V*_{T} that are not in *V* receive labels different from those of the vertices in \(V(P_{n_{\theta}}^{\theta})\).

We show an example of a unit interval graph *G* in Fig. 1. For this graph *G*, Algorithm *CPP* returns *G*_{S}, the 2-power of path *P*_{S}=*v*_{1},*v*_{2},*v*_{3},0,*v*_{4},*v*_{5}. Then, *G*_{T} is a 2-power of path *P*_{T}=*v*_{1},*v*_{2},*v*_{3},*v*_{b},*v*_{4},*v*_{5}.
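The relabeling step can be checked on the two paths of this example. The sketch below assumes the intersection is taken as (*V*_{S}∩*V*_{T}, *E*_{S}∩*E*_{T}), as in the definition of *G*_{ST}; the helper names and the string labels `"0"` and `"vb"` for the filler vertices are illustrative.

```python
def power_edges(path, theta):
    """Edges of the theta-power of a path given as a list of labels."""
    return {frozenset((path[i], path[j]))
            for i in range(len(path))
            for j in range(i + 1, min(i + theta, len(path) - 1) + 1)}


def intersect_powers(path_s, path_t, theta):
    """Common vertices and common edges of two theta-powers of paths."""
    common_vertices = set(path_s) & set(path_t)
    common_edges = power_edges(path_s, theta) & power_edges(path_t, theta)
    return common_vertices, common_edges


path_s = ["v1", "v2", "v3", "0", "v4", "v5"]
path_t = ["v1", "v2", "v3", "vb", "v4", "v5"]
common_vertices, common_edges = intersect_powers(path_s, path_t, 2)
```

Because the filler vertex carries a different label in each path, it drops out of the intersection together with all of its incident edges, and only the adjacencies among *v*_{1},…,*v*_{5} remain.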

## Proofs

In this section, we present the proofs of correctness of the Procedure *SHIFT* (Lemma 1) and Algorithm *CPP* (Theorem 4).

### Lemma 1

*Let* *P*^{θ} *be a smallest power of a path that contains* *G*_{l−1}=*G*[*v*_{1},…,*v*_{l−1}] *as an induced subgraph*, *with respect to the ordering* *v*_{1}<⋯<*v*_{l−1}. *Let* *v*_{l}∈*V*(*G*) *be the next vertex inserted in* *P*^{θ} *and let* \(v_{l-t-1}v_{l} \not\in E(G)\), \(v_{l-t}v_{l} \in E(G)\) *and* \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l})= \theta+1\). *Then*, *the output of Procedure* SHIFT, *the power of a path* *P*^{θ+1}, *is a smallest power of a path that contains* *G*_{l}=*G*[*v*_{1},…,*v*_{l−1},*v*_{l}] *as an induced subgraph*, *with respect to the ordering* *v*_{1}<⋯<*v*_{l−1}<*v*_{l}.

### Proof

Since \(v_{l-t-1}v_{l} \not\in E(G)\), *v*_{l−t}*v*_{l}∈*E*(*G*) and \(\theta+1= d_{P_{n_{\theta}}}(v_{l-t}, v_{l})\), Procedure *SHIFT* must increase the power *θ* by one unit (Step 1). But the increase of *θ* to *θ*+1 creates several adjacencies in *P*^{θ} between pairs of vertices of the set {*v*_{1},…,*v*_{l}} that are non-adjacent in *G*. In order to preserve the adjacencies and non-adjacencies between vertices of *G* in *P*^{θ}, Procedure *SHIFT* is forced to insert one vertex between the vertex that received *v*_{l−t−1} in *P*^{θ} and its consecutive vertex in *P*^{θ}. Again, counting in descending order from vertex *v*_{l−t−1}, the adjacencies are violated in each “block” of *θ* vertices in *P*^{θ}. So, the procedure must insert one vertex for every *θ*+1 vertices in descending order, from vertex *v*_{l−t−1} in *P*^{θ}. We observe that the set formed by the initial vertices of *V*(*P*^{θ}) has cardinality less than or equal to *θ*+1, because when dividing \(\mathrm{order}_{P^{\theta}}(v_{l-t-1})\) by *θ*+1 the remainder is greater than or equal to 1 and less than or equal to *θ*+1.

In each step, the procedure inserts the smallest number of vertices necessary to guarantee that the power of a path *P*^{θ+1}, created by Procedure *SHIFT*, contains *G*_{l}=*G*[*v*_{1},…,*v*_{l}] as an induced subgraph. So, the power *θ*+1 and the number of inserted vertices are minimum and, consequently, *P*^{θ+1} is a smallest power of a path that contains *G*_{l}=*G*[*v*_{1},…,*v*_{l}] as an induced subgraph. □

First, we prove that Algorithm *CPP* correctly returns a smallest power of a path according to the ordering given by Algorithm *Recognize* [3].

### Lemma 2

*Let* *G* *be a connected unit interval graph*. *Algorithm* CPP *generates the smallest power of a path* \(P_{n_{\theta}}^{\theta}\), *with respect to* *θ* *and* *n*_{θ}, *that contains* *G* *as an induced subgraph according to the ordering* *v*_{1}<⋯<*v*_{n} *given by the input of* CPP.

### Proof

Algorithm *CPP* constructs a sequence of powers of paths \(P^{\theta_{0}} \subseteq P^{\theta_{1}} \subseteq \cdots \subseteq P^{\theta}\), where *θ*_{i}=*θ*_{i−1}+1. This is done by successively adding, in each \(P^{\theta_{i}}\), vertices of *G* following the input ordering, preserving the adjacencies and non-adjacencies between vertices of *G* and minimizing *θ* and *n*_{θ}. Initially, the power of a path \(P^{\theta_{0}}\) receives the maximal clique containing *v*_{1}, i.e., \(V(P^{\theta_{0}}) =\{u_{1}, \ldots, u_{\xi_{P^{\theta_{0}}}(v_{1})}\}\) and *θ*_{0}=*ξ*_{G}(*v*_{1})−1. This is the smallest power of a path that contains \(G[v_{1}, \ldots, v_{\xi_{G}(v_{1})}]\) as an induced subgraph.

Suppose that the first *l*−1 vertices, i.e., {*v*_{1},…,*v*_{l−1}}, have already been inserted by Algorithm *CPP* in the power of a path *P*^{θ}, i.e., *P*^{θ} is the smallest power of a path, with respect to *θ* and *n*_{θ}, that contains *G*[*v*_{1},…,*v*_{l−1}] as an induced subgraph. Let *v*_{l}∈*V*(*G*) be the next vertex to be inserted by Algorithm *CPP* in *P*^{θ}. Suppose that *v*_{l} is adjacent, in *P*^{θ}, to {*v*_{l−t},…,*v*_{l−1}}. Vertex *v*_{l} must be inserted in *P*^{θ} between positions \(\xi_{P^{\theta}}(v_{l-t-1})+1\) and \(\xi_{P^{\theta}}(v_{l-t})\) so that *G*[*v*_{1},…,*v*_{l}] is an induced subgraph of *P*^{θ}. Then, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta+1\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). We consider two cases with respect to the adjacencies of *v*_{l} in *G*. From now on, we refer to Figs. 2, 3 and 4, where dashed lines represent adjacencies.

Case 1: If *t*=*θ*, then after insertion of *v*_{l}, \(\theta + 1 \leq d_{P_{n_{\theta}}}(v_{l-t-1},v_{l})\), because the set {*v*_{l−t},…,*v*_{l−1}} has *t*=*θ* elements (see Fig. 2). In order to minimize *θ* and *n*_{θ}, Algorithm *CPP* must insert *v*_{l} in the vertex consecutive to *v*_{l−1} in the power of a path *P*^{θ}, and as a consequence \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta + 1\). Indeed, since *v*_{l−1} is adjacent to *v*_{l−t} in *P*^{θ}, by hypothesis, *v*_{l−1} was inserted in *P*^{θ} such that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) \leq \theta\), and *v*_{l} was inserted in the vertex consecutive to *v*_{l−1} in *P*^{θ}, so the claim holds. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\), Algorithm *CPP* inserted *v*_{l} without changing *θ*, the number of vertices of *P*^{θ} became *n*_{θ}+1, and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta+1\), Algorithm *CPP* called Procedure *SHIFT* and, by Lemma 1, we conclude the proof.

Case 2: If 1<*t*<*θ*, Algorithm *CPP* must insert *v*_{l} in *P*^{θ} such that \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta + 1\), so that *v*_{l−t−1} and *v*_{l} are not adjacent. We observe the position of *v*_{l−1} in *P*^{θ}. If *v*_{l−1} is not adjacent to *v*_{l−t−1} in *P*^{θ} (see Fig. 3), in order to minimize the number of vertices of *P*^{θ}, Algorithm *CPP* inserts *v*_{l} in the vertex consecutive to *v*_{l−1}. By hypothesis, vertex *v*_{l−1} was inserted in *P*^{θ} so that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1})\leq \theta\). Then, if \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) < \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). Thus *v*_{l} was inserted in *P*^{θ} without changing *θ*, the number of vertices of *P*^{θ} became *n*_{θ}+1, and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) = \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta + 1\), Procedure *SHIFT* was called and, by Lemma 1, we conclude the proof.

If *v*_{l−1} is adjacent to *v*_{l−t−1} in *P*^{θ} (see Fig. 4), the position of *v*_{l−t−1} in *P*^{θ} is between *l*−*t*+1 and \(\xi_{P^{\theta}}(v_{l-t-1})\), including them. Again, in order to minimize the number of vertices of *P*^{θ}, vertex *v*_{l} is inserted \((\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm{order}_{P^{\theta}}(v_{l-1}))\) vertices after vertex *v*_{l−1} in *P*^{θ}. Thus, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta + 1\).

Since \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) < d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). So, *v*_{l} was inserted in *P*^{θ} without changing *θ*, and the number of vertices of *P*^{θ} became \(n_{\theta}+(\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm{order}_{P^{\theta}}(v_{l-1}))+1\). This insertion was minimum, because \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\).

This concludes the proof of Lemma 2. □

In order to show that Algorithm *CPP* returns the smallest power of a path containing *G* as an induced subgraph, we present two results for a given power of a path *P*^{σ} containing *G* as an induced subgraph. First, we recall some notation from [3]. Given a unit interval graph *G* and a unit interval model *I* associated with its vertices {*v*_{1},*v*_{2},…,*v*_{n}}, we recall that the interval associated with vertex *v* is (*a*_{v},*b*_{v}). We say that *v*_{1},*v*_{2},…,*v*_{n} is a *natural labeling* for the vertices of *G* if \(a_{v_{i}} \leq a_{v_{i+1}}\), for each 1≤*i*≤*n*−1. The ordering *v*_{1}<*v*_{2}<…<*v*_{n} is a *natural ordering* if *v*_{1},*v*_{2},…,*v*_{n} is a natural labeling for *V*(*G*). A vertex is a *left anchor* if it can receive the label *v*_{1} in some natural labeling for *V*(*G*). Consider the model *I*′ obtained by mirroring a unit interval model *I* (that is, replacing each interval (*a*,*b*) by (−*b*,−*a*)). Model *I*′ is also a valid unit interval model for *G*, so the rightmost interval in *I* is also a left anchor.

In the next results, we show properties of the ordering of *V*(*G*) induced by a natural ordering that is generated by the subscripts of a natural labeling of a power of a path.

### Lemma 3

*Let* \(P_{n_{\sigma}}^{\sigma}\) *be a power of a path that contains* *G* *as an induced subgraph*. *The ordering of the vertices of* *V*(*G*) *induced by a natural ordering of the vertices of* *V*(*P*^{σ}) *satisfies Property* 2.

### Proof

Suppose that this ordering of *V*(*G*) does not satisfy Property 2. Then, there exist three vertices *v*_{r},*v*_{s},*v*_{t}∈*V*(*G*) such that *v*_{r}<*v*_{s}<*v*_{t} with \(v_{r}v_{s}\not\in E(G)\) and *v*_{r}*v*_{t}∈*E*(*G*). It follows that *v*_{r}*v*_{t}∈*E*(*G*)⊂*E*(*P*^{σ}). Therefore, \(1\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{t})|\leq \sigma\). Since *v*_{r}<*v*_{s}<*v*_{t} in *V*(*P*^{σ}), we have \(|\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{s})|\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{t})|\), and then \(1\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{s})|\leq \sigma\). Consequently, *v*_{r}*v*_{s}∈*E*(*P*^{σ}) and \(v_{r}v_{s} \not\in E(G)\), i.e., \(P_{n_{\sigma}}^{\sigma}\) does not contain *G* as an induced subgraph, a contradiction. □

Vertices \(v, \overline{v} \in V(G)\) are *indistinguishable vertices* (*twin vertices*) if \(N_{G}[v] = N_{G}[\overline{v}]\). The next result states that it is possible to exchange the positions of indistinguishable vertices of *V*(*G*) in a natural ordering of *V*(*P*^{σ}).

### Lemma 4

*Let* *v*, \(\overline{v} \in V(G)\) *be such that* \(N_{G}[v] = N_{G}[\overline{v}]\), *with* *v*=*u*_{i} *and* \(\overline{v} = u_{j}\) *in* *V*(*P*^{σ}). *If we exchange the positions of vertices* *v* *and* \(\overline{v}\) *in* *P*^{σ}, *i*.*e*., *v*=*u*_{j} *and* \(\overline{v} = u_{i}\), *graph* *G* *will still be an induced subgraph of* *P*^{σ}.

### Proof

Without loss of generality, suppose *i*<*j*. By Lemma 3, the ordering of *V*(*G*) induced by a natural ordering of *V*(*P*^{σ}) satisfies Property 2. So, \(N_{G}[v]=\{v_{\eta_{G}(v)},\ldots, v_{\xi_{G}(v)}\}\) and \(N_{G}[\overline{v}]=\{v_{\eta_{G}(\overline{v})}, \ldots,v_{\xi_{G}(\overline{v})}\}\). Since \(N[v] = N[\overline{v}]\), we have \(\xi_{G}(\overline{v}) =\xi_{G}(v)\) and \(\eta_{G}(\overline{v}) = \eta_{G}(v)\). Then, \(v_{\xi_{G}(v)} =v_{\xi_{G}(\overline{v})}\), \(v_{\eta_{G}(v)} = v_{\eta_{G}(\overline{v})}\), \(v_{\xi_{G}(v)+1}=v_{\xi_{G}(\overline{v})+1}\) and \(v_{\eta_{G}(v)-1} = v_{\eta_{G}(\overline{v})-1}\). Thus, by changing the positions of vertices *v* and \(\overline{v}\) in *P*^{σ}, we have \(\mathrm {order}_{P^{\sigma}}(\overline{v}) - \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(\overline{v})-1}) \geq \sigma +1\); then edge \(\overline{v}v_{\eta_{G}(\overline{v})-1} \not\in E(P^{\sigma})\). Also, for any *v*′∈*V*(*G*) with \(\mathrm {order}_{G}(v') < \mathrm {order}_{G}(v_{\eta_{G}(\overline{v})-1})\), edge \(\overline{v}v' \not\in E(P^{\sigma})\). Similarly, \(\mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(v)+1}) - \mathrm {order}_{P^{\sigma}}(v)\geq \sigma + 1\), i.e., edge \(vv_{\xi_{G}(v)+1} \not\in E(P^{\sigma})\) and also, for any *v*′∈*V*(*G*) with \(\mathrm {order}_{G}(v_{\xi_{G}(v)+1}) < \mathrm {order}_{G}(v')\), edge \(vv' \not\in E(P^{\sigma})\).

Analogously, \(\sigma \geq \mathrm {order}_{P^{\sigma}}(\overline{v})- \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(v)})\), i.e., edge \(\overline{v}v_{\eta_{G}(\overline{v})} \in E(P^{\sigma})\) and, for any *v*′∈*V*(*G*) with \(\mathrm {order}_{G}(v_{\eta_{G}(\overline{v})}) < \mathrm {order}_{G}(v') < \mathrm {order}_{G}(\overline{v})\), edge \(\overline{v}v' \in E(P^{\sigma})\). Similarly \(\sigma \geq \mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(\overline{v})}) - \mathrm {order}_{P^{\sigma}}(v)\), i.e., edge \(vv_{\xi_{G}(v)} \in E(P^{\sigma})\) and, for any *v*′∈*V*(*G*) with \(\mathrm {order}_{G}(v) < \mathrm {order}_{G}(v') <\mathrm {order}_{G}(v_{\xi_{G}(v)})\), edge *vv*′∈*E*(*P*^{σ}). □
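Indistinguishable vertices, as used in Lemma 4, can be found by grouping vertices by their closed neighborhoods. The following is a minimal sketch; the function name `twin_classes` and the small example graph are assumptions for illustration.

```python
def twin_classes(vertices, edges):
    """Groups of two or more vertices sharing the same closed neighbourhood.

    vertices: list of names; edges: set of frozensets of two names.
    """
    by_neighbourhood = {}
    for v in vertices:
        closed = frozenset({v} | {w for w in vertices if frozenset((v, w)) in edges})
        by_neighbourhood.setdefault(closed, []).append(v)
    return [group for group in by_neighbourhood.values() if len(group) > 1]


# Triangle a, b, c with a pendant vertex d attached to c.
twin_edges = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
twins = twin_classes(["a", "b", "c", "d"], twin_edges)
```

Here *a* and *b* have the same closed neighborhood {*a*,*b*,*c*}, so exchanging their positions in any placement along a power of a path leaves every adjacency unchanged.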

In what follows, we write *v*_{i}<_{B}*v*_{j} if order_{G}(*v*_{i})<order_{G}(*v*_{j}) in the ordering of *V*(*G*) given by Algorithm *Recognize* [3]. Before proving Theorem 4, we need two results.

### Theorem 3

(Theorem 2.2 [3])

*Let* *I* *be a unit interval model of a unit interval graph* *G* *with natural labeling* *v*_{1},…,*v*_{n}. *Then*, *for all vertices* \(\bar{v},\ v \in V(G)\), *if* \(a_{\bar{v}} < a_{v}\) *but* \(v <_{B} \bar{v}\), *we have* \(N_{G}[v]=N_{G}[\bar{v}]\).

As a consequence of Theorem 2.3 of [3], we have the following result.

### Lemma 5

([3])

*Let* \(v'_{1} <_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\) *be an ordering of* *V*(*G*) *given by Algorithm* Recognize [3] *for a unit interval graph* *G*. *Given a natural labeling* *v*_{1},…,*v*_{n}, *then* \(N_{G}[v'_{1}]=N_{G}[v_{1}]\) *or* \(N_{G}[v'_{1}]=N_{G}[v_{n}]\).

Finally, the correctness of Algorithm *CPP* is given by the theorem below.

### Theorem 4

*Let* *G* *be a unit interval graph*. *Algorithm* CPP *returns the smallest power of a path* \(P^{\theta}_{n_{\theta}}\), *with respect to* *θ* *and* *n*_{θ}, *that contains* *G* *as an induced subgraph*.

### Proof

Let \(P^{\sigma}_{n_{\sigma}}\) be the smallest power of a path that contains *G* as an induced subgraph. Let \(\overline{u}_{1} < \cdots < \overline{u}_{n_{\sigma}}\) be a natural ordering of *V*(*P*^{σ}) and let \(\overline{v}_{1} < \cdots < \overline{v}_{n}\) be the ordering of *V*(*G*) induced by the natural ordering of *V*(*P*^{σ}). Clearly, \(\overline{v}_{1}, \ldots, \overline{v}_{n}\) is a natural labeling of *V*(*G*). Let *I* be a family of intervals for this labeling of *V*(*G*), such that each *v*∈*V*(*G*) is associated with (*a*_{v},*b*_{v})∈*I*.

If we prove that \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) is equal to \(v'_{1} <_{B}v'_{2} <_{B} \cdots <_{B} v'_{n}\) up to indistinguishable vertices, we have *θ*=*σ* and *n*_{θ}=*n*_{σ}. In fact, since *P*^{σ} is the smallest power of a path that contains *G* as an induced subgraph, *σ*≤*θ* and *n*_{σ}≤*n*_{θ}. On the other hand, by Lemma 2, the power of a path *P*^{θ} returned by Algorithm *CPP* is the smallest power of a path that contains *G* as an induced subgraph with respect to the ordering \(v'_{1}<_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\). So, if this ordering is equal to \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) up to indistinguishable vertices, then, by Lemma 4, *P*^{σ} contains *G* as an induced subgraph with respect to the ordering \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\). Then, by the minimality of *θ* and *n*_{θ} with respect to \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\), we have *σ*≥*θ* and *n*_{σ}≥*n*_{θ}.

First, suppose that the left anchor \(\overline{v}_{1}\) is equal to \(v'_{1}\). Suppose, for the sake of contradiction, that there exist \(v, \ \tilde{v} \in V(G)\) such that \(v < \tilde{v}\), \(\tilde{v} <_{B} v\) and \(N_{G}[v] \neq N_{G}[\tilde{v}]\). Since \(v < \tilde{v}\), we have \(a_{v} \leq a_{\tilde{v}}\). If \(a_{v} = a_{\tilde{v}}\), since all intervals of *I* have the same length, we have \(b_{v} = b_{\tilde{v}}\) and hence \(N_{G}[v] =N_{G}[\tilde{v}]\), a contradiction to the hypothesis. If \(a_{v} < a_{\tilde{v}}\), since \(\tilde{v} <_{B} v\), by Theorem 3, \(N_{G}[v] = N_{G}[\tilde{v}]\), again a contradiction to the hypothesis. Thus, for every pair of vertices \(v, \tilde{v} \in V(G)\) such that \(v < \tilde{v}\) and \(\tilde{v} <_{B} v\), we have \(N_{G}[v] = N_{G}[\tilde{v}]\). Consequently, *σ*=*θ* and *n*_{σ}=*n*_{θ}.

Now, suppose that the left anchor \(\overline{v}_{1}\) is different from \(v'_{1}\). By Lemma 5, either \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\) or \(N_{G}[\overline{v}_{n}] =N_{G}[v'_{1}]\). If \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\), by Lemma 4, we can change the positions of these vertices in *V*(*P*^{σ}), i.e., \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(v'_{1})}=\overline{v}_{1}\) and \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(\overline{v}_{1})} = v'_{1}\) and *G* will still be an induced subgraph of *P*^{σ}. After this change \(v'_{1} < \overline{v}_{2} < \cdots <\overline{v}_{1} < \cdots < \overline{v}_{n}\) is the new ordering of *V*(*G*) induced by the ordering of *V*(*P*^{σ}). We repeat the same argument used in the previous case, where \(\overline{v}_{1}\) is equal to \(v'_{1}\) and we conclude the proof. If \(N_{G}[\overline{v}_{n}] = N_{G}[v'_{1}]\), since \(\overline{v}_{n}\) is the left anchor of the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} <\cdots < \overline{v}_{1}\) of *V*(*G*) induced by the natural ordering \(\overline{u}_{n_{\sigma}} < \cdots < \overline{u}_{1}\) of *V*(*P*^{σ}) then, we can repeat the previous argument for the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} < \cdots < \overline{v}_{1}\) and so we conclude the proof. □

Algorithm *CPP* analyzes each vertex of *G* a single time, in the ordering returned by Algorithm *Recognize* [3]. In the worst case, Algorithm *CPP* calls Procedure *SHIFT* once for each vertex *v*_{l}∈*V*(*G*). Since for each vertex *v*_{l} Procedure *SHIFT* analyzes the set of vertices of *G*_{l} at most once, the complexity of Algorithm *CPP* is \(\mathcal{O}(n^{2})\).

## *G* is not an induced subgraph of *G*_{S} and *G*_{T}

If we relax the constraint that *G* must be an induced subgraph of *G*_{S} or *G*_{T}, then even for unit interval graphs it is possible to find two powers of paths, whose intersection contains *G* as an induced subgraph, that are smaller than the answer given by Algorithm *CPP*. See an example in Fig. 5.

If graph *G* is a unit interval graph, then *G* contains no induced Claw (Fig. 6), *S*_{3} (Fig. 7), \(\overline{S}_{3}\) (Fig. 8) or Cycle (*C*_{n}), *n*≥4. If *G* is a cycle *C*_{n}, *n*≥4, then the smallest *θ*-powers of paths, *G*_{S} and *G*_{T}, such that *G*_{S}∩*G*_{T} contains *C*_{n} as an induced subgraph can be obtained as follows. First, we construct *G*_{S}: for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), *u*_{2j−1}:=*v*_{j}; and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), *u*_{2j}:=*v*_{n+1−j}. Now, we construct *G*_{T}: for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), *w*_{2j−1}:=*v*_{j+1}; and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), *w*_{2j}:=*v*_{k}, where \(k = (n+2-j) \operatorname{mod} n\). See an example when *G* is a *C*_{6} in Fig. 9.
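The construction for cycles can be sketched and checked numerically. The code below follows the index formulas given above, with one assumption stated explicitly: when \((n+2-j) \operatorname{mod} n\) evaluates to 0, the index is read as *n*, so that every label lies in 1,…,*n*. Vertex *v*_{i} is represented by the integer *i*, and all function names are illustrative.

```python
def cycle_paths(n):
    """Paths P_S and P_T for the cycle C_n (vertex v_i is the integer i)."""
    path_s = [None] * (n + 1)  # index 0 unused, to mirror 1-based subscripts
    path_t = [None] * (n + 1)
    for j in range(1, (n + 1) // 2 + 1):      # 1 <= j <= ceil(n/2)
        path_s[2 * j - 1] = j
        path_t[2 * j - 1] = j + 1
    for j in range(1, n // 2 + 1):            # 1 <= j <= floor(n/2)
        path_s[2 * j] = n + 1 - j
        k = (n + 2 - j) % n
        path_t[2 * j] = k if k != 0 else n    # assumption: index 0 is read as n
    return path_s[1:], path_t[1:]


def square_of_path_edges(path):
    """Edges of the 2-power of a path given as a list of labels."""
    return {frozenset((path[i], path[j]))
            for i in range(len(path))
            for j in range(i + 1, min(i + 2, len(path) - 1) + 1)}


def intersection_is_cycle(n):
    """Check that the two 2-powers intersect exactly in the edges of C_n."""
    path_s, path_t = cycle_paths(n)
    cycle = {frozenset((i, i % n + 1)) for i in range(1, n + 1)}
    shared = square_of_path_edges(path_s) & square_of_path_edges(path_t)
    return shared == cycle


paths_for_c6 = cycle_paths(6)
```

For *n*=6 this yields *P*_{S}=*v*_{1},*v*_{6},*v*_{2},*v*_{5},*v*_{3},*v*_{4} and *P*_{T}=*v*_{2},*v*_{1},*v*_{3},*v*_{6},*v*_{4},*v*_{5}. Each 2-power contains three chords in addition to the cycle edges, and the two sets of chords are disjoint, so the intersection is exactly *C*_{6}.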

### Theorem 5

*Let* *G*_{S} *and* *G*_{T} *be* 2-*powers of paths with n vertices constructed by the previous method*. *Then* *G*_{S}∩*G*_{T} *is* *C*_{n}, *n*≥4.

### Proof

Let *G*_{S} be the 2-power of path *P*_{S}=*u*_{1},…,*u*_{n}, and let *G*_{T} be the 2-power of path *P*_{T}=*w*_{1},…,*w*_{n} constructed by the previous method. Since the distance between consecutive vertices of *G* in *P*_{S} (resp. *P*_{T}) is less than or equal to 2, *G*_{S} (resp. *G*_{T}) contains *G* as a subgraph.

For each *v*_{i}∈*C*_{n}, \(i \in \{2, \ldots, \lceil\frac{n}{2}\rceil,\lceil\frac{n}{2}\rceil +2, \ldots, n\}\), and 3≤*j*≤*n*−2, if *u*_{j}=*v*_{i} with *j* odd then *w*_{j−2}=*v*_{i}; if *j* is even, we have *w*_{j+2}=*v*_{i}.

Now let *v*_{i}∈*C*_{n}. If *u*_{j}=*v*_{i}, 3≤*j*≤*n*−2, with *j* odd (resp. even), then *w*_{j−2}=*v*_{i} (resp. *w*_{j+2}=*v*_{i}), and its neighbors in *P*_{S} are *u*_{j−1}=*v*_{k}=*w*_{j−1+2} (resp. *u*_{j−1}=*v*_{k−1}=*w*_{j−1−2}) and *u*_{j+1}=*v*_{k−1}=*w*_{j+1+2} (resp. *u*_{j+1}=*v*_{k}=*w*_{j+1−2}). We conclude that \(d_{P_{S}}(v_{i},v_{k})= d_{P_{S}}(v_{i}, v_{k-1}) = 1\), \(d_{P_{T}}(v_{i}, v_{k}) = 3\) and \(d_{P_{T}}(v_{i}, v_{k-1}) = 5\), i.e., *v*_{i}*v*_{k}, *v*_{i}*v*_{k−1}∈*E*(*G*_{S}) and \(v_{i}v_{k},v_{i}v_{k-1} \notin E(G_{T})\). Hence, \(v_{i}v_{k},v_{i}v_{k-1} \notin G_{S} \cap G_{T}\). □
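The index bookkeeping used in the proof (a vertex at position *j* of *P*_{S}, 3≤*j*≤*n*−2, sits at position *j*−2 of *P*_{T} when *j* is odd and at position *j*+2 when *j* is even) can be spot-checked numerically. This is only an illustrative sketch: `cycle_paths` and `check_shift` are hypothetical names, vertices are the integers 1,…,*n*, and a remainder of 0 in \(k = (n+2-j) \bmod n\) is assumed to denote *v*_{n}.

```python
def cycle_paths(n):
    """Paths P_S and P_T of the construction, as lists of indices 1..n."""
    p_s, p_t = [0] * n, [0] * n
    for j in range(1, (n + 1) // 2 + 1):       # odd positions 2j-1
        p_s[2 * j - 2] = j                     # u_{2j-1} := v_j
        p_t[2 * j - 2] = j + 1                 # w_{2j-1} := v_{j+1}
    for j in range(1, n // 2 + 1):             # even positions 2j
        p_s[2 * j - 1] = n + 1 - j             # u_{2j} := v_{n+1-j}
        k = (n + 2 - j) % n
        p_t[2 * j - 1] = k if k != 0 else n    # assumption: remainder 0 denotes v_n
    return p_s, p_t


def check_shift(n):
    """True if every u_j with 3 <= j <= n-2 sits at w_{j-2} (j odd) or w_{j+2} (j even)."""
    p_s, p_t = cycle_paths(n)
    pos_t = {v: idx + 1 for idx, v in enumerate(p_t)}   # 1-based position in P_T
    for j in range(3, n - 1):                           # j = 3, ..., n-2
        expected = j - 2 if j % 2 == 1 else j + 2
        if pos_t[p_s[j - 1]] != expected:
            return False
    return True


print(all(check_shift(n) for n in range(4, 21)))
```

The check covers only the position shift between the two paths; it does not by itself re-derive the distance values stated in the proof.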

## Conclusion

In this work, we developed an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph *G*, an explicit representation of the smallest *θ*-power of a path *G*_{S} (with respect to *θ* and to the number of vertices) that contains *G* as an induced subgraph. We also construct *G*_{T}, a *θ*-power of a path with the same number of vertices as *G*_{S}, such that the intersection *G*_{S}∩*G*_{T} contains *G* as an induced subgraph.

We remark that *θ* can be greater than or equal to the size of a maximum clique of the graph *G*, *ω*(*G*). Fig. 10 shows an example where *ω*(*G*)=4 and Algorithm *CPP* returns *θ*=5; in general, the difference between *θ* and *ω*(*G*) can be greater than 1.

In the case where graph *G* is not an induced subgraph of *G*_{S} and *G*_{T}, we show a method that generates *G*_{S} and *G*_{T}, 2-powers of paths with *n* vertices, whose intersection is *C*_{n}, *n*≥4.

As future work, we intend to investigate this problem for other classes of graphs. We remark that all remaining forbidden induced subgraphs of unit interval graphs (Figs. 6, 7 and 8) have answer YES to Question 1.

For a Claw graph, we see that *G*_{S} is the 2-power of path *P*_{S}=*v*_{2},*a*,*v*_{1},*v*_{3},*v*_{4}; and *G*_{T} is the 2-power of path *P*_{T}=*v*_{3},*b*,*v*_{1},*v*_{2},*v*_{4}. For a 3-sun graph, we find that *G*_{S} and *G*_{T} are 4-powers of paths *P*_{S}=*v*_{5},*a*,*b*,*v*_{4},*v*_{6},*v*_{3},*v*_{1},*v*_{2} and *P*_{T}=*v*_{1},*v*_{6},*x*,*v*_{5},*v*_{2},*v*_{4},*y*,*v*_{3}, respectively. For a Net graph, we see that *G*_{S} and *G*_{T} are 2-powers of paths *P*_{S}=*v*_{4},*v*_{2},*v*_{1},*v*_{5},*v*_{3},*a*,*v*_{6} and *P*_{T}=*v*_{4},*b*,*v*_{1},*v*_{5},*v*_{3},*v*_{2},*v*_{6}, respectively.
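The three pairs of paths listed above can be intersected directly. The sketch below is illustrative only: the helper names `power_edges`, `common_graph` and `degree_sequence` are hypothetical, and the vertex labelling of Figs. 6, 7 and 8 is not reproduced here, so the code compares edge counts and degree sequences as a sanity check rather than testing isomorphism against the figures.

```python
from collections import Counter
from itertools import combinations


def power_edges(path, theta):
    """Edges of the theta-power of a path: pairs at distance <= theta along it."""
    return {
        frozenset((path[a], path[b]))
        for a, b in combinations(range(len(path)), 2)
        if b - a <= theta
    }


def common_graph(path_s, path_t, theta):
    """Edges present in both theta-powers.

    The auxiliary vertices (a, b, x, y) each occur in only one of the two
    paths, so no common edge can contain them.
    """
    return power_edges(path_s, theta) & power_edges(path_t, theta)


def degree_sequence(edges):
    """Sorted degrees of the vertices that appear in at least one edge."""
    degrees = Counter(v for e in edges for v in e)
    return sorted(degrees.values())


# Paths copied from the text; theta is the power stated for each case.
CASES = {
    "claw": ("v2 a v1 v3 v4", "v3 b v1 v2 v4", 2),
    "3-sun": ("v5 a b v4 v6 v3 v1 v2", "v1 v6 x v5 v2 v4 y v3", 4),
    "net": ("v4 v2 v1 v5 v3 a v6", "v4 b v1 v5 v3 v2 v6", 2),
}

results = {
    name: common_graph(s.split(), t.split(), theta)
    for name, (s, t, theta) in CASES.items()
}

for name, edges in results.items():
    print(name, len(edges), degree_sequence(edges))
```

The output gives 3 edges with degrees `[1, 1, 1, 3]` for the claw, 9 edges with degrees `[2, 2, 2, 4, 4, 4]` for the 3-sun, and 6 edges with degrees `[1, 1, 1, 3, 3, 3]` for the net, which matches the edge counts and degree sequences of those three graphs.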

## References

1. Adam Z, Choi V, Sankoff D, Zhu Q (2008) Generalized gene adjacencies, graph bandwidth and clusters in yeast evolution. In: Lecture Notes in Bioinformatics, vol 4983, pp 134–145
2. Brandstädt A, Hundt C, Mancini F, Wagner P (2010) Rooted directed path graphs are leaf powers. Discrete Math 310:897–910
3. Corneil DG, Kim H, Natarajan S, Olariu S, Sprague A (1995) Simple linear time recognition of unit interval graphs. Inf Process Lett 55:99–104
4. Figueiredo CMH, Meidanis J, Mello CP (1995) A linear-time algorithm for proper interval graph recognition. Inf Process Lett 56:179–184
5. Lin MC, Rautenbach D, Soulignac FJ, Szwarcfiter JL (2011) Powers of cycles, powers of paths, and distance graph. Discrete Appl Math 159:621–627
6. Lin MC, Soulignac FJ, Szwarcfiter JL (2009) Short models for unit interval graphs. Electron Notes Discrete Math 35:247–255
7. Roberts FS (1968) Representations of indifference relations. Stanford University, Stanford
8. Sankoff D, Xu X (2008) Tests for gene clusters satisfying the generalized criterion. Lect Notes Comput Sci 5167:152–160
9. Soulignac FJ (2010) On proper and helly circular-arc graphs. Universidad de Buenos Aires, Buenos Aires

## Acknowledgements

This research was supported by CNPq and FAPERJ.

We are very grateful to Professor Jayme Szwarcfiter for introducing us to the paper [5] at the very beginning of our work and for fruitful discussions on this topic. We are also thankful to the anonymous referees for their careful reading and valuable contributions.

## Rights and permissions

**Open Access**
This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Costa, V., Dantas, S., Sankoff, D. *et al.* Gene clusters as intersections of powers of paths. *J Braz Comput Soc* **18**, 129–136 (2012). https://doi.org/10.1007/s13173-012-0064-8

DOI: https://doi.org/10.1007/s13173-012-0064-8

### Keywords

- Power of a path
- Unit interval graph
- Genome
- Gene clusters