Open Access

Gene clusters as intersections of powers of paths

Journal of the Brazilian Computer Society 2012, 18:64

https://doi.org/10.1007/s13173-012-0064-8

Received: 31 January 2012

Accepted: 1 February 2012

Published: 23 February 2012

Abstract

There are various definitions of a gene cluster determined by two genomes, and various methods for finding these clusters. However, there is little work on characterizing the configurations of genes that are eligible to be a cluster according to a given definition. For example, given a set of genes in a genome, is it always possible to find two genomes such that their intersection is exactly this cluster? In one version of this problem, we use graph theory to reformulate it as follows: given a graph G with n vertices, do there exist two θ-powers of paths \(G_{S}=(V_{S},E_{S})\) and \(G_{T}=(V_{T},E_{T})\) such that \(G_{S} \cap G_{T}\) contains G as an induced subgraph? In this work, we divide the problem into two cases, depending on whether or not G is an induced subgraph of \(G_{S}\) or \(G_{T}\). We show an \(\mathcal{O}(n^{2})\) time algorithm that generates the smallest θ-powers of paths \(G_{S}\) and \(G_{T}\) (with respect to θ and the number of vertices) that contain G as an induced subgraph. Finally, we discuss the problem when G is an induced subgraph neither of \(G_{S}\) nor of \(G_{T}\), and we present a method of finding the smallest power of a path when graph G is a cycle \(C_{n}\).

Keywords

Power of a path; Unit interval graph; Genome; Gene clusters

1 Introduction

Due to recent research on genetic mapping, a large amount of information is available and stored in the databases of research centers around the world. Processing these data in order to obtain relevant biological conclusions is one of the challenges in biology. One way to structure these data is the comparison of genomes, i.e., the search for similarities and differences between two or more organisms. This paper addresses a problem in this area: given a set of genes in a genome, called a cluster, is it always possible to find two genomes such that their intersection is exactly this cluster? First, we show the modeling presented by Adam et al. [1] and Sankoff and Xu [8], which will be used in this paper.

A marker is a gene with a known location on a chromosome. Let \(V_{X}\) be the set of n markers in the genome X. These markers are partitioned among a number of total orders called chromosomes. For markers g and h in \(V_{X}\) on the same chromosome in X, let \(gh \in E_{X}\) if the number of genes intervening between g and h in X is less than θ, where θ≥1 is a fixed neighborhood parameter. We call \(G_{X}=(V_{X},E_{X})\) a θ-adjacency graph if its edges are determined by a neighborhood parameter θ.

Consider the θ-adjacency graphs \(G_{S}=(V_{S},E_{S})\) and \(G_{T}=(V_{T},E_{T})\) with a non-empty set of common vertices \(V_{ST}=V_{S} \cap V_{T}\). We say that a subset \(V \subseteq V_{ST}\) is a generalized adjacency cluster if it consists of the vertices of a maximal connected subgraph of \(G_{ST}=(V_{ST},E_{S} \cap E_{T})\). We denote by \(G=G_{ST}[V]\) the subgraph induced by the set V.
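The two definitions above can be illustrated with a minimal Python sketch. It assumes that a genome is given as a list of chromosomes, each a list of distinct markers in chromosome order; the function names and the toy genomes are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def theta_adjacency_edges(chromosomes, theta):
    """Edges gh of a theta-adjacency graph.

    `chromosomes` is a list of lists of markers, each list in chromosome order.
    """
    edges = set()
    for chrom in chromosomes:
        for i, j in combinations(range(len(chrom)), 2):
            # Exactly j - i - 1 markers lie strictly between positions i and j.
            if j - i - 1 < theta:
                edges.add(frozenset((chrom[i], chrom[j])))
    return edges


def generalized_adjacency_clusters(genome_s, genome_t, theta):
    """Vertex sets of the connected components of G_ST = (V_S & V_T, E_S & E_T)."""
    v_st = {g for c in genome_s for g in c} & {g for c in genome_t for g in c}
    e_st = theta_adjacency_edges(genome_s, theta) & theta_adjacency_edges(genome_t, theta)
    neighbors = {v: set() for v in v_st}
    for a, b in map(tuple, e_st):
        neighbors[a].add(b)
        neighbors[b].add(a)
    clusters, seen = [], set()
    for start in sorted(v_st):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search from `start`
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(neighbors[v] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy genomes (hypothetical): S has one chromosome, T has two.
GENOME_S = [["a", "b", "c", "d", "e"]]
GENOME_T = [["b", "a", "x", "c"], ["e", "d"]]
```

With θ=1 the toy genomes share only the adjacencies ab and de, so the clusters are {a,b}, {c} and {d,e}; with θ=2 the edge ac also becomes common and the first two clusters merge.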

Let G=(V(G),E(G)) be a graph with vertex set V(G) and edge set E(G), such that |V(G)|=n. Let v, \(\bar{v} \in V(G)\). The distance between vertices v and \(\bar{v}\), denoted by \(d_{G}(v,\bar{v})\), is the number of edges in a shortest path between v and \(\bar{v}\) in G. A path between two vertices \(v_{0}\) and \(v_{t}\) of graph G is a sequence of vertices \(v_{0},v_{1},\ldots,v_{t}\) such that \(v_{i}v_{i+1}\) is an edge of G, \(0 \leq i \leq t-1\). Let \(P_{n}\) be a graph that is a path with n vertices. A θ-power of a path \(P_{n_{\theta}}\), denoted by \(P^{\theta}_{n_{\theta}}\), θ>0, is the graph such that \(V(P^{\theta}_{n_{\theta}}) = V(P_{n_{\theta}})\) and \(E(P_{n_{\theta}}^{\theta}) = \{v\bar{v} : d_{P_{n_{\theta}}}(v, \bar{v})\leq \theta \ \mathrm{with} \ v, \bar{v} \in V(P_{n_{\theta}}^{\theta})\}\). For brevity, we denote the power of a path \(P^{\theta}_{n_{\theta}}\) by \(P^{\theta}\). The definition of a chromosome with \(n_{\theta}\) markers in a θ-adjacency graph is similar to a power of a path \(P^{\theta}_{n_{\theta}}\). Now, the central question of this work can be reformulated as follows:

Question 1

([2, 5])

Given a connected graph G, do there exist \(G_{S}\) and \(G_{T}\), two θ-powers of paths \(P_{S}\) and \(P_{T}\), whose intersection contains G as an induced subgraph?

If the answer is yes, we are also interested in finding the minimum value of the power θ and of the number of vertices \(n_{\theta}\) for these two θ-powers of paths.

In order to contribute to this challenging problem, we divide our study into two cases, depending on whether or not G is an induced subgraph of \(G_{S}\) or \(G_{T}\). First, we give some definitions. We say that G is a unit interval graph if there exists a family I of intervals (a,b) on the real line such that each \(v \in V(G)\) can be put in a one-to-one correspondence with \((a_{v},b_{v}) \in I\); the intervals in I are of the same length; and \(v\bar{v}\) is an edge of E(G) if, and only if, \((a_{v}, b_{v}) \cap (a_{\bar{v}}, b_{\bar{v}}) \neq \emptyset\). This family of intervals is called an interval model for G. Lin et al. [6] and Soulignac [9] present a proof that the class of proper interval graphs is precisely the class of unit interval graphs. There exist linear-time recognition algorithms for unit interval graphs, for example those of Figueiredo et al. [4] and Corneil et al. [3].
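As a small illustration of the interval-model definition, the Python sketch below builds the graph of a family of open intervals of equal length and checks that every power of a path admits such a model. The model used for the power of a path (vertex i receives the interval (i, i+θ+1/2)) is one convenient choice made here for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def unit_interval_graph(left_endpoints, length):
    """Edges of the graph whose vertex v is the open interval (a_v, a_v + length)."""
    edges = set()
    for (u, a), (v, b) in combinations(sorted(left_endpoints.items()), 2):
        # Two open intervals of equal length intersect iff their left
        # endpoints differ by less than that length.
        if abs(a - b) < length:
            edges.add(frozenset((u, v)))
    return edges


def power_of_path_edges(n, theta):
    """Edges of the theta-power of the path on vertices 1, ..., n."""
    return {frozenset((i, j))
            for i, j in combinations(range(1, n + 1), 2) if j - i <= theta}


def power_of_path_model(n, theta):
    """One unit interval model of the theta-power of a path (illustrative choice)."""
    left = {i: Fraction(i) for i in range(1, n + 1)}
    return left, Fraction(2 * theta + 1, 2)
```

For integers i and j, the condition |i−j| < θ+1/2 holds exactly when |i−j| ≤ θ, so the two edge sets coincide.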

Brandstädt et al. [2] and Lin et al. [5] proved independently the following structural property:

Theorem 1

([2, 5])

A graph G is an induced subgraph of a power of a path if, and only if, G is a unit interval graph.

Thus, given a unit interval graph G with n vertices, there exists a θ-power of a path \(P_{n_{\theta}}\) that contains G as an induced subgraph. But the proofs of the structural characterization given by Theorem 1 [2, 5] do not lead to an algorithm that constructs \(G_{S}\) and \(G_{T}\) for Question 1 with minimum value of the power θ and of the number of vertices \(n_{\theta}\).

In [6], the authors show an \(\mathcal{O}(n)\) time algorithm that includes new intervals into a proper interval model I of a connected graph G, constructing an extended model I′ containing I. This extended model I′ gives an implicit representation of a power of a path for every proper interval graph G, but neither the number of inserted intervals nor the size of the power θ is necessarily minimum. The authors also remark that any explicit representation would require \(\mathcal{O}(n^{2})\) steps.

We present in this work an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph G, an explicit representation of the smallest θ-power of a path, \(G_{S}\) (with respect to θ and to the number of vertices), that contains G as an induced subgraph. Next, we construct \(G_{T}\), a θ-power of a path with the same number of vertices as \(G_{S}\), such that the intersection \(G_{S} \cap G_{T}\) contains G as an induced subgraph.

This paper is organized as follows. In Sects. 2 and 3, we present the algorithm and we prove its correctness and complexity. In Sect. 4, we discuss the problem when G is an induced subgraph neither of \(G_{S}\) nor of \(G_{T}\) and we present a method of finding the smallest power of a path when graph G is a cycle \(C_{n}\).

2 The algorithm

Our result is based on the ordering of the vertex set of G, given by Algorithm Recognize [3], which satisfies the property proved by Roberts in [7]:

Property 2

A graph G is a unit interval graph if and only if there is an order < on its vertices such that, for every vertex v, the closed neighborhood of v is a set of consecutive vertices with respect to the order <.
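Property 2 is easy to test for a given ordering. The following Python sketch (the function name is hypothetical) checks whether every closed neighborhood occupies consecutive positions in the order.

```python
def satisfies_property_2(order, edges):
    """True iff every closed neighborhood is consecutive in `order`.

    `order` is a list of vertices; `edges` is an iterable of vertex pairs.
    """
    pos = {v: i for i, v in enumerate(order)}
    closed = {v: {pos[v]} for v in order}  # positions of N[v], including v
    for a, b in edges:
        closed[a].add(pos[b])
        closed[b].add(pos[a])
    # A set of integers is an interval iff its span equals its size.
    return all(max(s) - min(s) + 1 == len(s) for s in closed.values())
```

For the path a, b, c the order a<b<c satisfies the property while a<c<b does not; for the claw, no ordering of its four vertices satisfies it, in agreement with the claw not being a unit interval graph.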

Since all powers of paths are unit interval graphs, we can insert the vertices of V(G) in the vertex set of a power of a path \(P^{\theta}_{n_{\theta}}\) until this power of a path contains G as an induced subgraph.

This construction is done by Algorithm CPP as follows. First, let \(v_{1}<v_{2}<\cdots<v_{n}\) be an ordering of V(G) given by Algorithm Recognize [3]. We take \(\theta_{0}\) to be the number of vertices of the maximal clique that contains \(v_{1}\), minus one, and we insert the vertices of this clique in \(P^{\theta_{0}}\). Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subset P^{\theta_{1}} \subset \cdots \subset P^{\theta_{l-1}} \subset P^{\theta_{l}}\) such that \(\theta_{i}=\theta_{i-1}+1\).

Let v be the first vertex non-adjacent to \(v_{1}\) in the order on V(G). If v is adjacent to \(v_{2}\), Algorithm CPP must insert v in the vertex of \(P^{\theta_{0}}\) that is at distance \(\theta_{0}+1\) from vertex \(v_{1}\) in \(P^{\theta_{0}}\). Similarly, if v is not adjacent to \(v_{t}\), but is adjacent to \(v_{t+1}\), Algorithm CPP must insert v in the vertex of \(P^{\theta_{0}}\) that is at distance \(\theta_{0}+1\) from vertex \(v_{t}\) in \(P^{\theta_{0}}\). This is done by inserting t−1 vertices between the vertex of largest index adjacent to \(v_{1}\) and v in \(P^{\theta_{0}}\). Now, suppose that there exist at least two vertices v, \(\bar{v}\) that are not adjacent to \(v_{1}\) and are adjacent to \(v_{2}\). Let \(\bar{v}\) be the second vertex of this set. In order to minimize the number of vertices of \(P^{\theta_{0}}\), vertex \(\bar{v}\) must be a vertex of \(P^{\theta_{0}}\) at distance \(\theta_{0}+2\) from vertex \(v_{1}\) in \(P^{\theta_{0}}\). Then Algorithm CPP must call Procedure SHIFT to increase \(\theta_{0}\) to \(\theta_{1}:=\theta_{0}+1\) because of the edge \(\bar{v}v_{2}\). On the other hand, this increase adds several edges to \(P^{\theta_{0}}\) which are not in E(G). Thus, Procedure SHIFT adjusts the power of a path \(P^{\theta_{0}}\) for the new \(\theta_{1}\), by inserting vertices in \(P^{\theta_{0}}\) in order to preserve the adjacencies and non-adjacencies between vertices of G, and generates a new \(P^{\theta_{1}}\). Algorithm CPP proceeds until all vertices of V(G) are included in \(P_{n_{\theta}}^{\theta}\), a smallest power of a path with respect to θ and \(n_{\theta}\).
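For very small graphs, the target of this construction can be checked independently by exhaustive search. The Python sketch below is not Algorithm CPP: it is a brute-force search, exponential in the worst case, that finds the smallest θ and then the shortest path length for which a graph with a fixed vertex ordering is an induced subgraph of a θ-power of a path. The function name and the `max_theta` bound are assumptions introduced for illustration.

```python
from itertools import combinations


def smallest_power_for_ordering(n, edges, max_theta):
    """Brute-force search (not Algorithm CPP).

    Vertices are 0, ..., n-1 (n >= 2) and must appear in this order along the
    path. Returns (theta, path_length, positions) for the smallest theta and
    then the smallest path length, or None if nothing is found up to max_theta.
    """
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(range(n), 2))
    for theta in range(1, max_theta + 1):
        # Gaps larger than theta + 1 never help, so a minimal path has at most
        # n * (theta + 1) vertices.
        for length in range(n, n * (theta + 1) + 1):
            # In a shortest path the first and last positions are occupied.
            for inner in combinations(range(1, length - 1), n - 2):
                positions = (0, *inner, length - 1)
                if all((positions[j] - positions[i] <= theta)
                       == (frozenset((i, j)) in edge_set) for i, j in pairs):
                    return theta, length, positions
    return None


# Graph on 5 ordered vertices: a triangle 0,1,2 followed by the path 2-3-4.
EXAMPLE_EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
```

For `EXAMPLE_EDGES` the search returns θ=2 on a path with 6 vertices, with one empty position between the third and fourth vertices of the graph.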

Before describing Algorithm CPP, we borrow some definitions from [3]. Given an ordering of V(G) returned by Algorithm Recognize [3], \(\mathrm{order}_{G}(v)\) is the position of vertex v in this ordering; \(\xi_{G}(v) = \max \{\mathrm{order}_{G}(\bar{v}) : \bar{v} \in N_{G}[v]\}\) and \(\eta_{G}(v) = \min \{\mathrm{order}_{G}(\bar{v}) : \bar{v} \in N_{G}[v]\}\), where \(N_{G}[v]=\{w \in V(G) : vw \in E(G)\} \cup \{v\}\). Let \(v \in V(G)\) and \(u \in V(P^{\theta})\). We refer to \(\mathrm{order}_{P^{\theta}}(v)\) as the position of vertex v in the ordering of the vertex set of \(P^{\theta}\), i.e., \(\mathrm{order}_{P^{\theta}}(v)= i\) if \(u_{i}=v\) in \(P^{\theta}\). We denote \(\xi_{P^{\theta}}(u)= \max \{\mathrm{order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\) and \(\eta_{P^{\theta}}(u) = \min \{\mathrm{order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\).

Next, we present Algorithm CPP and Procedure SHIFT.

Algorithm

CONSTRUCTING_POWER_OF_PATH(CPP)

Procedure SHIFT receives as input a smallest power of a path \(P^{\theta}\) that contains \(G[v_{1},\ldots,v_{l-1}]\), \(\xi_{G}(v_{1})+1 \leq l \leq n\), as an induced subgraph. Power \(P^{\theta}\) contains the last vertex \(v_{l}\) inserted by Algorithm CPP. Vertex \(v_{l}\) triggers Procedure SHIFT because \(v_{l}\) is not adjacent to some vertex \(v_{l-t}\) in \(P^{\theta}\), but \(v_{l-t}v_{l} \in E(G)\).

Procedure

SHIFT

Algorithm CPP returns \(P_{n_{\theta}}^{\theta}\), the smallest power of a path (with respect to θ and \(n_{\theta}\)) that contains the unit interval graph G as an induced subgraph. We construct two powers of paths, \(G_{T}=(V_{T},E_{T})\) and \(G_{S}=(V_{S},E_{S})\), from \(P_{n_{\theta}}^{\theta}\) as follows. First, \(V_{T}= V_{S} = V(P^{\theta}_{n_{\theta}})\). Then, the vertices of \(V_{T}\) that are not in V receive labels different from those of the vertices in \(V(P_{n_{\theta}}^{\theta})\).

We show an example of a unit interval graph G in Fig. 1. For this graph G, Algorithm CPP returns \(G_{S}\), the 2-power of the path \(P_{S}=v_{1},v_{2},v_{3},0,v_{4},v_{5}\). Then, \(G_{T}\) is the 2-power of the path \(P_{T}=v_{1},v_{2},v_{3},v_{b},v_{4},v_{5}\).
Fig. 1

Algorithm CPP returns the 2-power of the path \(P_{6}=v_{1},v_{2},v_{3},0,v_{4},v_{5}\) for unit interval graph G
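The example of Fig. 1 can be replayed with a few lines of Python. The sketch below builds the 2-powers of the two paths quoted in the text and intersects their edge sets; the strings "0" and "vb" stand for the two differently labeled extra vertices, and the function name is hypothetical.

```python
from itertools import combinations


def power_of_path_edges(path, theta):
    """Edges of the theta-power of the path whose vertices are listed in `path`."""
    return {frozenset((path[i], path[j]))
            for i, j in combinations(range(len(path)), 2) if j - i <= theta}


P_S = ["v1", "v2", "v3", "0", "v4", "v5"]
P_T = ["v1", "v2", "v3", "vb", "v4", "v5"]

# Edges present in both 2-powers; edges touching "0" or "vb" drop out
# because those labels occur in only one of the two graphs.
COMMON_EDGES = power_of_path_edges(P_S, 2) & power_of_path_edges(P_T, 2)
COMMON_VERTICES = set(P_S) & set(P_T)
```

The common edges are \(v_{1}v_{2}\), \(v_{1}v_{3}\), \(v_{2}v_{3}\), \(v_{3}v_{4}\) and \(v_{4}v_{5}\), i.e., the subgraph that the 2-power of \(P_{S}\) induces on \(v_{1},\ldots,v_{5}\).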

3 Proofs

In this section, we present the proofs of correctness of the Procedure SHIFT (Lemma 1) and Algorithm CPP (Theorem 4).

Lemma 1

Let \(P^{\theta}\) be a smallest power of a path that contains \(G_{l-1}=G[v_{1},\ldots,v_{l-1}]\) as an induced subgraph, with respect to the ordering \(v_{1}<\cdots<v_{l-1}\). Let \(v_{l} \in V(G)\) be the next vertex inserted in \(P^{\theta}\), with \(v_{l-t-1}v_{l} \not\in E(G)\), \(v_{l-t}v_{l} \in E(G)\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l})= \theta+1\). Then, the output of Procedure SHIFT, the power of a path \(P^{\theta+1}\), is a smallest power of a path that contains \(G_{l}=G[v_{1},\ldots,v_{l-1},v_{l}]\) as an induced subgraph, with respect to the ordering \(v_{1}<\cdots<v_{l-1}<v_{l}\).

Proof

Since \(v_{l-t-1}v_{l} \not\in E(G)\), \(v_{l-t}v_{l} \in E(G)\) and \(\theta+1= d_{P_{n_{\theta}}}(v_{l-t}, v_{l})\), Procedure SHIFT must increase the power θ by one unit (Step 1). But the increase of θ to θ+1 creates several adjacencies in \(P^{\theta}\) between pairs of vertices of the set \(\{v_{1},\ldots,v_{l}\}\) that are non-adjacent in G. In order to preserve the adjacencies and non-adjacencies between vertices of G in \(P^{\theta}\), Procedure SHIFT is forced to insert one vertex between the vertex that received \(v_{l-t-1}\) in \(P^{\theta}\) and its consecutive vertex in \(P^{\theta}\). Again, counting in descending order from vertex \(v_{l-t-1}\), the adjacencies are violated in each “block” of θ vertices in \(P^{\theta}\). So, the procedure must insert one vertex for every θ+1 vertices in descending order, starting from vertex \(v_{l-t-1}\) in \(P^{\theta}\). We observe that the set formed by the initial vertices of \(V(P^{\theta})\) has cardinality less than or equal to θ+1, because when dividing \(\mathrm{order}_{P^{\theta}}(v_{l-t-1})\) by θ+1 the remainder is greater than or equal to 1 and less than or equal to θ+1.

In each step, the procedure inserts the smallest number of vertices necessary to guarantee that the power of a path \(P^{\theta+1}\), created by Procedure SHIFT, contains \(G_{l}=G[v_{1},\ldots,v_{l}]\) as an induced subgraph. So, the power θ+1 and the number of inserted vertices are minimum and, consequently, \(P^{\theta+1}\) is a smallest power of a path that contains \(G_{l}=G[v_{1},\ldots,v_{l}]\) as an induced subgraph. □

Next, we prove that Algorithm CPP correctly returns a smallest power of a path according to the ordering given by Algorithm Recognize [3].

Lemma 2

Let G be a connected unit interval graph. Algorithm CPP generates the smallest power of a path \(P_{n_{\theta}}^{\theta}\), with respect to θ and \(n_{\theta}\), that contains G as an induced subgraph according to the ordering \(v_{1}<\cdots<v_{n}\) given by the input of CPP.

Proof

Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subseteq P^{\theta_{1}} \subseteq \cdots \subseteq P^{\theta}\), where \(\theta_{i}=\theta_{i-1}+1\). This is done by successively adding, in each \(P^{\theta_{i}}\), vertices of G following the input ordering, preserving the adjacencies and non-adjacencies between vertices of G and minimizing θ and \(n_{\theta}\). Initially, the power of a path \(P^{\theta_{0}}\) receives the maximal clique containing \(v_{1}\), i.e., \(V(P^{\theta_{0}}) =\{u_{1}, \ldots, u_{\xi_{P^{\theta_{0}}}(v_{1})}\}\) and \(\theta_{0}=\xi_{G}(v_{1})-1\). This is the smallest power of a path that contains \(G[v_{1}, \ldots, v_{\xi_{G}(v_{1})}]\) as an induced subgraph.

Suppose that the first l−1 vertices, i.e., \(\{v_{1},\ldots,v_{l-1}\}\), have already been inserted by Algorithm CPP in the power of a path \(P^{\theta}\), i.e., \(P^{\theta}\) is the smallest power of a path, with respect to θ and \(n_{\theta}\), that contains \(G[v_{1},\ldots,v_{l-1}]\) as an induced subgraph. Let \(v_{l} \in V(G)\) be the next vertex to be inserted by Algorithm CPP in \(P^{\theta}\). Suppose that \(v_{l}\) is adjacent, in \(P^{\theta}\), to \(\{v_{l-t},\ldots,v_{l-1}\}\). Vertex \(v_{l}\) must be inserted in \(P^{\theta}\) between positions \(\xi_{P^{\theta}}(v_{l-t-1})+1\) and \(\xi_{P^{\theta}}(v_{l-t})\) so that \(G[v_{1},\ldots,v_{l}]\) is an induced subgraph of \(P^{\theta}\). Then, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta+1\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). We consider two cases with respect to the adjacencies of \(v_{l}\) in G. From now on, we refer to Figs. 2, 3 and 4, where dashed lines represent adjacencies.
Fig. 2

Bracket indicates possible positions of \(v_{l}\)

Fig. 3

Bracket indicates possible positions of \(v_{l-1}\)

Fig. 4

Bracket indicates possible positions of vertices \(v_{l-1}\) and \(v_{l}\)

Case 1: If t=θ, then after insertion of \(v_{l}\), \(\theta + 1 \leq d_{P_{n_{\theta}}}(v_{l-t-1},v_{l})\), because the set \(\{v_{l-t},\ldots,v_{l-1}\}\) has t=θ elements (see Fig. 2). In order to minimize θ and \(n_{\theta}\), Algorithm CPP must insert \(v_{l}\) in the vertex consecutive to \(v_{l-1}\) in the power of a path \(P^{\theta}\), and as a consequence \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta + 1\). Indeed, since \(v_{l-1}\) is adjacent to \(v_{l-t}\) in \(P^{\theta}\), by hypothesis \(v_{l-1}\) was inserted in \(P^{\theta}\) such that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) \leq \theta\), and \(v_{l}\) was inserted in the vertex consecutive to \(v_{l-1}\) in \(P^{\theta}\); hence the claim holds. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\), Algorithm CPP inserted \(v_{l}\) without changing θ, the number of vertices of \(P^{\theta}\) became \(n_{\theta}+1\), and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta+1\), Algorithm CPP called Procedure SHIFT and, by Lemma 1, we conclude the proof.

Case 2: If 1<t<θ, Algorithm CPP must insert \(v_{l}\) in \(P^{\theta}\) such that \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta + 1\), so that \(v_{l-t-1}\) and \(v_{l}\) are not adjacent. We observe the position of \(v_{l-1}\) in \(P^{\theta}\). If \(v_{l-1}\) is not adjacent to \(v_{l-t-1}\) in \(P^{\theta}\) (see Fig. 3), in order to minimize the number of vertices of \(P^{\theta}\), Algorithm CPP inserts \(v_{l}\) in the vertex consecutive to \(v_{l-1}\). By hypothesis, vertex \(v_{l-1}\) was inserted in \(P^{\theta}\) so that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1})\leq \theta\). Then, if \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) < \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). Thus \(v_{l}\) was inserted in \(P^{\theta}\) without changing θ, the number of vertices of \(P^{\theta}\) became \(n_{\theta}+1\), and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) = \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta + 1\), Procedure SHIFT was called and, by Lemma 1, we conclude the proof.

If \(v_{l-1}\) is adjacent to \(v_{l-t-1}\) in \(P^{\theta}\) (see Fig. 4), the position of \(v_{l-t-1}\) in \(P^{\theta}\) is between l−t+1 and \(\xi_{P^{\theta}}(v_{l-t-1})\), inclusive. Again, in order to minimize the number of vertices of \(P^{\theta}\), vertex \(v_{l}\) is inserted \((\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm{order}_{P^{\theta}}(v_{l-1}))\) vertices after vertex \(v_{l-1}\) in \(P^{\theta}\). Thus, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\). Since \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) < d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). So, \(v_{l}\) was inserted in \(P^{\theta}\) without changing θ, and the number of vertices of \(P^{\theta}\) became \(n_{\theta}+(\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm{order}_{P^{\theta}}(v_{l-1}))+1\). This insertion was minimum, because \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\).

This concludes the proof of Lemma 2. □

In order to show that Algorithm CPP returns the smallest power of a path containing G as an induced subgraph, we present two results about a given power of a path \(P^{\sigma}\) containing G as an induced subgraph. First, we give some notation from [3]. Given a unit interval graph G and a unit interval model I associated to its vertices \(\{v_{1},v_{2},\ldots,v_{n}\}\), we recall that the interval associated to vertex v is \((a_{v},b_{v})\). We say that \(v_{1},v_{2},\ldots,v_{n}\) is a natural labeling for the vertices of G if \(a_{v_{i}} \leq a_{v_{i+1}}\) for each \(1 \leq i \leq n-1\). The ordering \(v_{1}<v_{2}<\cdots<v_{n}\) is a natural ordering if \(v_{1},v_{2},\ldots,v_{n}\) is a natural labeling for V(G). A vertex is a left anchor if it can receive the label \(v_{1}\) in some natural labeling for V(G). Consider the model I′ obtained by mirroring a unit interval model I (that is, replacing each interval (a,b) by (−b,−a)). Model I′ is also a valid unit interval model for G, so the rightmost interval in I is also a left anchor.

In the next results, we show properties of the ordering of V(G) induced by a natural ordering that is generated by the subscripts of a natural labeling of a power of a path.

Lemma 3

Let \(P_{n_{\sigma}}^{\sigma}\) be a power of a path that contains G as an induced subgraph. The ordering of the vertices of V(G) induced by a natural ordering of the vertices of \(V(P^{\sigma})\) satisfies Property 2.

Proof

Suppose that this ordering of V(G) does not satisfy Property 2. Then, there exist three vertices \(v_{r},v_{s},v_{t} \in V(G)\) such that \(v_{r}<v_{s}<v_{t}\) with \(v_{r}v_{s}\not\in E(G)\) and \(v_{r}v_{t} \in E(G)\) (the case \(v_{s}v_{t}\not\in E(G)\) is symmetric). It follows that \(v_{r}v_{t} \in E(G) \subseteq E(P^{\sigma})\). Therefore, \(1\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{t})|\leq \sigma\). Since \(v_{r}<v_{s}<v_{t}\) in \(V(P^{\sigma})\), we have \(|\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{s})|\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{t})|\), and then \(1\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{s})|\leq \sigma\). Consequently, \(v_{r}v_{s} \in E(P^{\sigma})\) and \(v_{r}v_{s} \not\in E(G)\), i.e., \(P_{n_{\sigma}}^{\sigma}\) does not contain G as an induced subgraph, a contradiction. □

Vertices \(v, \overline{v} \in V(G)\) are indistinguishable vertices (twin vertices) if \(N_{G}[v] = N_{G}[\overline{v}]\). The next result states that it is possible to exchange the positions of indistinguishable vertices of V(G) in a natural ordering of \(V(P^{\sigma})\).

Lemma 4

Let v, \(\overline{v} \in V(G)\) be such that \(N_{G}[v] = N_{G}[\overline{v}]\), with \(v=u_{i}\) and \(\overline{v} = u_{j}\) in \(V(P^{\sigma})\). If we exchange the positions of vertices v and \(\overline{v}\) in \(P^{\sigma}\), i.e., \(v=u_{j}\) and \(\overline{v} = u_{i}\), graph G will still be an induced subgraph of \(P^{\sigma}\).

Proof

Without loss of generality, suppose i<j. By Lemma 3, the ordering of V(G) induced by a natural ordering of \(V(P^{\sigma})\) satisfies Property 2. So, \(N_{G}[v]=\{v_{\eta_{G}(v)},\ldots, v_{\xi_{G}(v)}\}\) and \(N_{G}[\overline{v}]=\{v_{\eta_{G}(\overline{v})}, \ldots,v_{\xi_{G}(\overline{v})}\}\). Since \(N_{G}[v] = N_{G}[\overline{v}]\), we have \(\xi_{G}(\overline{v}) =\xi_{G}(v)\) and \(\eta_{G}(\overline{v}) = \eta_{G}(v)\). Then, \(v_{\xi_{G}(v)} =v_{\xi_{G}(\overline{v})}\), \(v_{\eta_{G}(v)} = v_{\eta_{G}(\overline{v})}\), \(v_{\xi_{G}(v)+1}=v_{\xi_{G}(\overline{v})+1}\) and \(v_{\eta_{G}(v)-1} = v_{\eta_{G}(\overline{v})-1}\). Thus, by exchanging the positions of vertices v and \(\overline{v}\) in \(P^{\sigma}\), we have \(\mathrm{order}_{P^{\sigma}}(\overline{v}) - \mathrm{order}_{P^{\sigma}}(v_{\eta_{G}(\overline{v})-1}) \geq \sigma +1\); then edge \(\overline{v}v_{\eta_{G}(\overline{v})-1} \not\in E(P^{\sigma})\). Also, for any \(v' \in V(G)\) with \(\mathrm{order}_{G}(v') < \mathrm{order}_{G}(v_{\eta_{G}(\overline{v})-1})\), edge \(\overline{v}v' \not\in E(P^{\sigma})\). Similarly, \(\mathrm{order}_{P^{\sigma}}(v_{\xi_{G}(v)+1}) - \mathrm{order}_{P^{\sigma}}(v)\geq \sigma + 1\), i.e., edge \(vv_{\xi_{G}(v)+1} \not\in E(P^{\sigma})\) and also, for any \(v' \in V(G)\) with \(\mathrm{order}_{G}(v_{\xi_{G}(v)+1}) < \mathrm{order}_{G}(v')\), edge \(vv' \not\in E(P^{\sigma})\).

Analogously, \(\sigma \geq \mathrm{order}_{P^{\sigma}}(\overline{v})- \mathrm{order}_{P^{\sigma}}(v_{\eta_{G}(v)})\), i.e., edge \(\overline{v}v_{\eta_{G}(\overline{v})} \in E(P^{\sigma})\) and, for any \(v' \in V(G)\) with \(\mathrm{order}_{G}(v_{\eta_{G}(\overline{v})}) < \mathrm{order}_{G}(v') < \mathrm{order}_{G}(\overline{v})\), edge \(\overline{v}v' \in E(P^{\sigma})\). Similarly, \(\sigma \geq \mathrm{order}_{P^{\sigma}}(v_{\xi_{G}(\overline{v})}) - \mathrm{order}_{P^{\sigma}}(v)\), i.e., edge \(vv_{\xi_{G}(v)} \in E(P^{\sigma})\) and, for any \(v' \in V(G)\) with \(\mathrm{order}_{G}(v) < \mathrm{order}_{G}(v') <\mathrm{order}_{G}(v_{\xi_{G}(v)})\), edge \(vv' \in E(P^{\sigma})\). □

In what follows, we write \(v_{i} <_{B} v_{j}\) if \(\mathrm{order}_{G}(v_{i})<\mathrm{order}_{G}(v_{j})\) in the ordering of V(G) given by Algorithm Recognize [3]. Before proving Theorem 4, we need two results.

Theorem 3

(Theorem 2.2 [3])

Let I be a unit interval model of a unit interval graph G with natural labeling \(v_{1},\ldots,v_{n}\). Then, for all vertices \(\bar{v},\ v \in V(G)\), if \(a_{\bar{v}} < a_{v}\) but \(v <_{B} \bar{v}\), we have \(N_{G}[v]=N_{G}[\bar{v}]\).

As a consequence of Theorem 2.3 of [3], we have the following result.

Lemma 5

([3])

Let \(v'_{1} <_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\) be an ordering of V(G) given by Algorithm Recognize [3] for a unit interval graph G. Given a natural labeling \(v_{1},\ldots,v_{n}\), then \(N_{G}[v'_{1}]=N_{G}[v_{1}]\) or \(N_{G}[v'_{1}]=N_{G}[v_{n}]\).

Finally, the correctness of Algorithm CPP is given by the theorem below.

Theorem 4

Let G be a unit interval graph. Algorithm CPP returns the smallest power of a path \(P^{\theta}_{n_{\theta}}\), with respect to θ and \(n_{\theta}\), that contains G as an induced subgraph.

Proof

Let \(P^{\sigma}_{n_{\sigma}}\) be the smallest power of a path that contains G as an induced subgraph. Let \(\overline{u}_{1} < \cdots < \overline{u}_{n_{\sigma}}\) be a natural ordering of \(V(P^{\sigma})\) and let \(\overline{v}_{1} < \cdots < \overline{v}_{n}\) be the ordering of V(G) induced by the natural ordering of \(V(P^{\sigma})\). Clearly, \(\overline{v}_{1}, \ldots, \overline{v}_{n}\) is a natural labeling of V(G). Let I be a family of intervals for this labeling of V(G), such that each \(v \in V(G)\) is associated to \((a_{v},b_{v}) \in I\).

If we prove that \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) is equal to \(v'_{1} <_{B}v'_{2} <_{B} \cdots <_{B} v'_{n}\) up to indistinguishable vertices, we have θ=σ and \(n_{\theta}=n_{\sigma}\). In fact, since \(P^{\sigma}\) is the smallest power of a path that contains G as an induced subgraph, \(\sigma \leq \theta\) and \(n_{\sigma} \leq n_{\theta}\). On the other hand, by Lemma 2, the power of a path \(P^{\theta}\) returned by Algorithm CPP is the smallest power of a path that contains G as an induced subgraph with respect to the ordering \(v'_{1}<_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\). So, if this ordering is equal to \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\), up to indistinguishable vertices, by Lemma 4, \(P^{\sigma}\) contains G as an induced subgraph with respect to the ordering \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\). Then, by minimality of θ and \(n_{\theta}\) with respect to \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\), we have \(\sigma \geq \theta\) and \(n_{\sigma} \geq n_{\theta}\).

First, suppose that the left anchor \(\overline{v}_{1}\) is equal to \(v'_{1}\). Suppose, for the sake of contradiction, that there exist \(v, \ \tilde{v} \in V(G)\) such that \(v < \tilde{v}\), \(\tilde{v} <_{B} v\) and \(N_{G}[v] \neq N_{G}[\tilde{v}]\). Since \(v < \tilde{v}\), we have \(a_{v} \leq a_{\tilde{v}}\). If \(a_{v} = a_{\tilde{v}}\), since all intervals of I have the same length, we have \(b_{v} = b_{\tilde{v}}\) and hence \(N_{G}[v] =N_{G}[\tilde{v}]\), a contradiction to the hypothesis. If \(a_{v} < a_{\tilde{v}}\), since \(\tilde{v} <_{B} v\), by Theorem 3 we have \(N_{G}[v] = N_{G}[\tilde{v}]\), a contradiction to the hypothesis. Thus, for every pair of vertices \(v, \tilde{v} \in V(G)\) such that \(v < \tilde{v}\) and \(\tilde{v} <_{B} v\), we have \(N_{G}[v] = N_{G}[\tilde{v}]\). Consequently, we have σ=θ and \(n_{\sigma}=n_{\theta}\).

Now, suppose that the left anchor \(\overline{v}_{1}\) is different from \(v'_{1}\). By Lemma 5, either \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\) or \(N_{G}[\overline{v}_{n}] =N_{G}[v'_{1}]\). If \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\), by Lemma 4, we can exchange the positions of these vertices in \(V(P^{\sigma})\), i.e., \(\overline{u}_{\mathrm{order}_{P^{\sigma}}(v'_{1})}=\overline{v}_{1}\) and \(\overline{u}_{\mathrm{order}_{P^{\sigma}}(\overline{v}_{1})} = v'_{1}\), and G will still be an induced subgraph of \(P^{\sigma}\). After this exchange, \(v'_{1} < \overline{v}_{2} < \cdots <\overline{v}_{1} < \cdots < \overline{v}_{n}\) is the new ordering of V(G) induced by the ordering of \(V(P^{\sigma})\). We repeat the argument used in the previous case, where \(\overline{v}_{1}\) is equal to \(v'_{1}\), and we conclude the proof. If \(N_{G}[\overline{v}_{n}] = N_{G}[v'_{1}]\), since \(\overline{v}_{n}\) is the left anchor of the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} <\cdots < \overline{v}_{1}\) of V(G) induced by the natural ordering \(\overline{u}_{n_{\sigma}} < \cdots < \overline{u}_{1}\) of \(V(P^{\sigma})\), we can repeat the previous argument for the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} < \cdots < \overline{v}_{1}\), and so we conclude the proof. □

Algorithm CPP analyzes each vertex of G a single time, in the ordering returned by Algorithm Recognize [3]. In the worst case, Algorithm CPP calls Procedure SHIFT once for each vertex \(v_{l} \in V(G)\). Since for each vertex \(v_{l}\) Procedure SHIFT analyzes the set of vertices of \(G_{l}\) at most once, the complexity of Algorithm CPP is \(\mathcal{O}(n^{2})\).

4 G is not an induced subgraph of \(G_{S}\) and \(G_{T}\)

If we relax the constraint that G must be an induced subgraph of \(G_{S}\) or \(G_{T}\), then even for unit interval graphs it is possible to find two powers of paths, whose intersection contains G as an induced subgraph, smaller than the answer given by Algorithm CPP. See an example in Fig. 5.
Fig. 5

Graph G is not an induced subgraph of \(G_{S}\) and \(G_{T}\)

If graph G is a unit interval graph, then G contains no induced Claw (Fig. 6), \(S_{3}\) (Fig. 7), \(\overline{S}_{3}\) (Fig. 8) or cycle \(C_{n}\), n≥4. If G is a cycle \(C_{n}\), n≥4, then the smallest θ-powers of paths, \(G_{S}\) and \(G_{T}\), such that \(G_{S} \cap G_{T}\) contains \(C_{n}\) as an induced subgraph can be obtained as follows. First, we construct \(G_{S}\): for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), \(u_{2j-1}:=v_{j}\); and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), \(u_{2j}:=v_{n+1-j}\). Now, we construct \(G_{T}\): for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), \(w_{2j-1}:=v_{j+1}\); and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), \(w_{2j}:=v_{k}\), where \(k = (n+2-j) \operatorname{mod} n\) and the residue 0 is read as n. See an example when G is a \(C_{6}\) in Fig. 9.
Fig. 6

Claw

Fig. 7

3-sun (S3)

Fig. 8

Net (\(\overline{S}_{3}\))

Fig. 9

Graph \(G=C_{6}\) and the respective \(G_{S}\) and \(G_{T}\), 2-powers of paths with 6 vertices
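The construction for cycles can be checked mechanically for small n. The Python sketch below follows the index formulas given above and intersects the two 2-powers. The function names are hypothetical, and reading the residue k=0 as the vertex \(v_{n}\) is an assumption made so that every position of \(P_{T}\) receives a vertex of the cycle.

```python
from itertools import combinations


def cycle_paths(n):
    """Paths P_S and P_T for the cycle C_n (n >= 4), as lists of indices 1..n."""
    u = [None] * (n + 1)  # 1-based positions; entry 0 is unused
    w = [None] * (n + 1)
    for j in range(1, (n + 1) // 2 + 1):   # 1 <= j <= ceil(n/2)
        u[2 * j - 1] = j
        w[2 * j - 1] = j + 1
    for j in range(1, n // 2 + 1):         # 1 <= j <= floor(n/2)
        u[2 * j] = n + 1 - j
        k = (n + 2 - j) % n
        w[2 * j] = k if k != 0 else n      # assumption: residue 0 denotes v_n
    return u[1:], w[1:]


def power_of_path_edges(path, theta):
    """Edges of the theta-power of the path whose vertices are listed in `path`."""
    return {frozenset((path[i], path[j]))
            for i, j in combinations(range(len(path)), 2) if j - i <= theta}


def cycle_edges(n):
    """Edges v_i v_{i+1} of C_n, with indices taken cyclically."""
    return {frozenset((i, i % n + 1)) for i in range(1, n + 1)}


def intersection_is_cycle(n):
    p_s, p_t = cycle_paths(n)
    return power_of_path_edges(p_s, 2) & power_of_path_edges(p_t, 2) == cycle_edges(n)
```

For n=6 the two paths are \(v_{1},v_{6},v_{2},v_{5},v_{3},v_{4}\) and \(v_{2},v_{1},v_{3},v_{6},v_{4},v_{5}\); note that \(P_{T}\) is \(P_{S}\) with every index shifted by one along the cycle.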

Theorem 5

Let \(G_{S}\) and \(G_{T}\) be 2-powers of paths with n vertices constructed by the previous method. Then \(G_{S} \cap G_{T}\) is \(C_{n}\), n≥4.

Proof

Let \(G_{S}\) be the 2-power of the path \(P_{S}=u_{1},\ldots,u_{n}\), and let \(G_{T}\) be the 2-power of the path \(P_{T}=w_{1},\ldots,w_{n}\) constructed by the previous method. Since the distance in \(P_{S}\) (resp. \(P_{T}\)) between consecutive vertices of G is less than or equal to 2, \(G_{S}\) (resp. \(G_{T}\)) contains G as a subgraph.

For each \(v_{i} \in C_{n}\), \(i \in \{2, \ldots, \lceil\frac{n}{2}\rceil,\lceil\frac{n}{2}\rceil +2, \ldots, n\}\), and \(3 \leq j \leq n-2\), if \(u_{j}=v_{i}\) with j odd then \(w_{j-2}=v_{i}\); if j is even, we have \(w_{j+2}=v_{i}\).

Now, let \(v_{i} \in C_{n}\). If \(u_{j}=v_{i}\), \(3 \leq j \leq n-2\), with j odd (resp. even), then \(w_{j-2}=v_{i}\) (resp. \(w_{j+2}=v_{i}\)), and its neighbors in \(P_{S}\) are \(u_{j-1}=v_{k}=w_{(j-1)+2}\) (resp. \(u_{j-1}=v_{k-1}=w_{(j-1)-2}\)) and \(u_{j+1}=v_{k-1}=w_{(j+1)+2}\) (resp. \(u_{j+1}=v_{k}=w_{(j+1)-2}\)). We conclude that \(d_{P_{S}}(v_{i},v_{k})= d_{P_{S}}(v_{i}, v_{k-1}) = 1\), \(d_{P_{T}}(v_{i}, v_{k}) = 3\) and \(d_{P_{T}}(v_{i}, v_{k-1}) = 5\), i.e., \(v_{i}v_{k},v_{i}v_{k-1} \in E(G_{S})\) and \(v_{i}v_{k},v_{i}v_{k-1} \not \in E(G_{T})\). Hence, \(v_{i}v_{k},v_{i}v_{k-1} \not \in E(G_{S} \cap G_{T})\). □

5 Conclusion

In this work, we developed an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph G, an explicit representation of the smallest θ-power of a path \(G_{S}\) (with respect to θ and to the number of vertices) that contains G as an induced subgraph. We construct \(G_{T}\), a θ-power of a path with the same number of vertices as \(G_{S}\), such that the intersection \(G_{S} \cap G_{T}\) contains G as an induced subgraph.

We remark that θ can be greater than or equal to the size of a maximum clique of the graph G, ω(G). We present in Fig. 10 an example where G has ω(G)=4 and Algorithm CPP returns θ=5; in general, the difference between θ and ω(G) can be greater than 1.
Fig. 10

Graph G with n=10 and ω(G)=4 and the output returned by Algorithm CPP: \(P^{\theta}_{n_{\theta}}\) with \(n_{\theta}=21\) and θ=5

In case graph G is not an induced subgraph of \(G_{S}\) and \(G_{T}\), we show a method that generates \(G_{S}\) and \(G_{T}\), 2-powers of paths with n vertices, whose intersection is \(C_{n}\), n≥4.

As future work, we intend to investigate this problem for other classes of graphs. We remark that all remaining forbidden induced subgraphs of unit interval graphs (Figs. 6, 7 and 8) have answer YES to Question 1.

For a Claw graph, we see that \(G_{S}\) is the 2-power of the path \(P_{S}=v_{2},a,v_{1},v_{3},v_{4}\), and \(G_{T}\) is the 2-power of the path \(P_{T}=v_{3},b,v_{1},v_{2},v_{4}\). For a 3-sun graph, we find that \(G_{S}\) and \(G_{T}\) are the 4-powers of the paths \(P_{S}=v_{5},a,b,v_{4},v_{6},v_{3},v_{1},v_{2}\) and \(P_{T}=v_{1},v_{6},x,v_{5},v_{2},v_{4},y,v_{3}\), respectively. For a Net graph, we see that \(G_{S}\) and \(G_{T}\) are the 2-powers of the paths \(P_{S}=v_{4},v_{2},v_{1},v_{5},v_{3},a,v_{6}\) and \(P_{T}=v_{4},b,v_{1},v_{5},v_{3},v_{2},v_{6}\), respectively.
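These three constructions can be verified directly from the paths listed above. The Python sketch below intersects the stated powers of paths and reports the degree sequence of each intersection; the function names are hypothetical, and the vertex labels are exactly those of the text.

```python
from itertools import combinations


def power_of_path_edges(path, theta):
    """Edges of the theta-power of the path whose vertices are listed in `path`."""
    return {frozenset((path[i], path[j]))
            for i, j in combinations(range(len(path)), 2) if j - i <= theta}


def intersection_edges(p_s, p_t, theta):
    """Edges common to the theta-powers of the paths p_s and p_t."""
    return power_of_path_edges(p_s, theta) & power_of_path_edges(p_t, theta)


def degree_sequence(edges):
    """Sorted degrees of the vertices that appear in `edges`."""
    deg = {}
    for e in edges:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values())


CLAW = intersection_edges(["v2", "a", "v1", "v3", "v4"],
                          ["v3", "b", "v1", "v2", "v4"], 2)
SUN = intersection_edges(["v5", "a", "b", "v4", "v6", "v3", "v1", "v2"],
                         ["v1", "v6", "x", "v5", "v2", "v4", "y", "v3"], 4)
NET = intersection_edges(["v4", "v2", "v1", "v5", "v3", "a", "v6"],
                         ["v4", "b", "v1", "v5", "v3", "v2", "v6"], 2)
```

The intersections have 3, 9 and 6 edges, respectively: a claw centered at \(v_{1}\); a 3-sun with triangle \(v_{2},v_{4},v_{6}\); and a net with triangle \(v_{1},v_{3},v_{5}\) and pendant edges \(v_{1}v_{4}\), \(v_{5}v_{2}\) and \(v_{3}v_{6}\).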

Declarations

Acknowledgements

This research was supported by CNPq and FAPERJ.

We are very grateful to Professor Jayme Szwarcfiter for having presented the paper [5] to us at the very beginning of our work and for fruitful discussions on this topic. We are also thankful to the anonymous referees for their careful reading and valuable contributions.

Authors’ Affiliations

(1) Instituto de Matemática e Estatística, Universidade Federal Fluminense
(2) Department of Mathematics and Statistics, University of Ottawa
(3) Department of Statistics, University of Toronto

References

  1. Adam Z, Choi V, Sankoff D, Zhu Q (2008) Generalized gene adjacencies, graph bandwidth and clusters in yeast evolution. In: Lecture Notes in Bioinformatics, vol 4983, pp 134–145
  2. Brandstädt A, Hundt C, Mancini F, Wagner P (2010) Rooted directed path graphs are leaf powers. Discrete Math 310:897–910
  3. Corneil DG, Kim H, Natarajan S, Olariu S, Sprague A (1995) Simple linear time recognition of unit interval graphs. Inf Process Lett 55:99–104
  4. Figueiredo CMH, Meidanis J, Mello CP (1995) A linear-time algorithm for proper interval graph recognition. Inf Process Lett 56:179–184
  5. Lin MC, Rautenbach D, Soulignac FJ, Szwarcfiter JL (2011) Powers of cycles, powers of paths, and distance graph. Discrete Appl Math 159:621–627
  6. Lin MC, Soulignac FJ, Szwarcfiter JL (2009) Short models for unit interval graphs. Electron Notes Discrete Math 35:247–255
  7. Roberts FS (1968) Representations of indifference relations. Stanford University, Stanford
  8. Sankoff D, Xu X (2008) Tests for gene clusters satisfying the generalized criterion. Lect Notes Comput Sci 5167:152–160
  9. Soulignac FJ (2010) On proper and helly circular-arc graphs. Universidad de Buenos Aires, Buenos Aires

Copyright

© The Brazilian Computer Society 2012