Volume 18 Supplement 2

GraphCliques

Gene clusters as intersections of powers of paths

Abstract

There are various definitions of a gene cluster determined by two genomes and methods for finding these clusters. However, there is little work on characterizing configurations of genes that are eligible to be a cluster according to a given definition. For example, given a set of genes in a genome, is it always possible to find two genomes such that their intersection is exactly this cluster? In one version of this problem, we use graph theory to reformulate it as follows: Given a graph \(G\) with \(n\) vertices, do there exist two θ-powers of paths \(G_S=(V_S,E_S)\) and \(G_T=(V_T,E_T)\) such that \(G_S \cap G_T\) contains \(G\) as an induced subgraph? In this work, we divide the problem into two cases, depending on whether or not \(G\) is an induced subgraph of \(G_S\) or \(G_T\). We show an \(\mathcal{O}(n^{2})\) time algorithm that generates the smallest θ-powers of paths \(G_S\) and \(G_T\) (with respect to θ and the number of vertices) that contain \(G\) as an induced subgraph. Finally, we discuss the problem when \(G\) is an induced subgraph neither of \(G_S\) nor of \(G_T\), and we present a method of finding the smallest power of a path when graph \(G\) is a cycle \(C_n\).

Introduction

Recent research on genetic mapping has produced a large amount of information, stored in databases of research centers around the world. Processing these data to obtain relevant biological conclusions is one of the challenges in biology. One way to structure these data is by comparing genomes, i.e., searching for similarities and differences between two or more organisms. This paper addresses a problem in this area: given a set of genes in a genome, called a cluster, is it always possible to find two genomes such that their intersection is exactly this cluster? First, we present the model of Adam et al. [1] and Sankoff and Xu [8], which is used in this paper.

A marker is a gene with a known location on a chromosome. Let \(V_X\) be the set of \(n\) markers in the genome \(X\). These markers are partitioned among a number of total orders called chromosomes. For markers \(g\) and \(h\) in \(V_X\) on the same chromosome in \(X\), let \(gh \in E_X\) if the number of genes intervening between \(g\) and \(h\) in \(X\) is less than θ, where θ≥1 is a fixed neighborhood parameter. We call \(G_X=(V_X,E_X)\) a θ-adjacency graph if its edges are determined by a neighborhood parameter θ.

Consider the θ-adjacency graphs \(G_S=(V_S,E_S)\) and \(G_T=(V_T,E_T)\) with a non-null set of vertices in common \(V_{ST}=V_S \cap V_T\). We say that a subset \(V \subseteq V_{ST}\) is a generalized adjacency cluster if it consists of the vertices of a maximal connected subgraph of \(G_{ST}=(V_{ST},E_S \cap E_T)\). We call \(G=G_{ST}[V]\) the subgraph induced by set \(V\).

Let \(G=(V(G),E(G))\) be a graph with vertex set \(V(G)\) and edge set \(E(G)\), such that \(|V(G)|=n\). Let \(v, \bar{v} \in V(G)\). The distance between vertices \(v\) and \(\bar{v}\), denoted by \(d_{G}(v,\bar{v})\), is the number of edges in a shortest path between \(v\) and \(\bar{v}\) in \(G\). A path between two vertices \(v_0\) and \(v_t\) of graph \(G\) is a sequence of vertices \(v_0,v_1,\ldots,v_t\) such that \(v_iv_{i+1}\) is an edge of \(G\), \(0\leq i\leq t-1\). Let \(P_n\) be a graph that is a path with \(n\) vertices. A θ-power of a path \(P_{n_{\theta}}\), denoted by \(P^{\theta}_{n_{\theta}}\), θ>0, is the graph such that \(V(P^{\theta}_{n_{\theta}}) = V(P_{n_{\theta}})\) and \(E(P_{n_{\theta}}^{\theta}) = \{v\bar{v} : d_{P_{n_{\theta}}}(v, \bar{v})\leq \theta \ \mathrm{with} \ v, \bar{v} \in V(P_{n_{\theta}}^{\theta})\}\). To simplify notation, we denote the power of a path \(P^{\theta}_{n_{\theta}}\) by \(P^{\theta}\). The definition of a chromosome with \(n_{\theta}\) markers in a θ-adjacency graph is similar to a power of a path \(P^{\theta}_{n_{\theta}}\). Now, the central question of this work can be reformulated as follows:
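The definition of a θ-power of a path can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `power_of_path` and the labels `u1`, …, `u5` are assumptions introduced here, not part of the paper. Two vertices are joined exactly when their positions on the path differ by at most θ.

```python
from itertools import combinations

def power_of_path(labels, theta):
    """Edge set of the theta-power of the path whose vertices, in order, are `labels`.

    Hypothetical helper: two vertices are adjacent when their distance on the
    underlying path (difference of positions) is at most theta.
    """
    position = {v: i for i, v in enumerate(labels)}
    return {
        frozenset((u, v))
        for u, v in combinations(labels, 2)
        if abs(position[u] - position[v]) <= theta
    }

# The 2-power of a path with 5 vertices.
edges = power_of_path(["u1", "u2", "u3", "u4", "u5"], 2)
print(sorted(tuple(sorted(e)) for e in edges))
```

With θ=2 and five vertices, the edges are the four pairs at distance 1 and the three pairs at distance 2.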

Question 1

([2, 5])

Given a connected graph \(G\), do there exist \(G_S\) and \(G_T\), two θ-powers of paths \(P_S\) and \(P_T\), whose intersection contains \(G\) as an induced subgraph?

If the answer is yes, we are also interested in finding the minimum value of the power θ and of the number of vertices \(n_{\theta}\) for these two θ-powers of paths.

In order to contribute to this challenging problem, we divide our study into two cases, depending on whether or not \(G\) is an induced subgraph of \(G_S\) or \(G_T\). First, we give some definitions. We say that \(G\) is a unit interval graph if there exists a family \(I\) of intervals \((a,b)\) on the real line such that each \(v \in V(G)\) can be put in a one-to-one correspondence with \((a_v,b_v) \in I\); the intervals in \(I\) are of the same length; and \(v\bar{v}\) is an edge of \(E(G)\) if, and only if, \((a_{v}, b_{v}) \cap (a_{\bar{v}}, b_{\bar{v}}) \neq \emptyset\). This family of intervals is called an interval model for \(G\). Lin et al. [6] and Soulignac [9] present a proof that the class of proper interval graphs is precisely the class of unit interval graphs. There exist linear-time recognition algorithms for unit interval graphs, for example those of Figueiredo et al. [4] and Corneil et al. [3].
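As a small illustration of the definition, the following sketch builds the graph of a unit interval model. The function name `unit_interval_graph` and the left endpoints in `model` are made up for this example. Since all intervals are open and have the same length, two of them intersect exactly when their left endpoints differ by less than that length.

```python
from itertools import combinations

def unit_interval_graph(left_endpoints, length=1.0):
    """Edges of the graph whose vertices are open intervals (a_v, a_v + length).

    `left_endpoints` maps each vertex to a_v (hypothetical input format).
    """
    edges = set()
    for u, v in combinations(left_endpoints, 2):
        # Equal-length open intervals intersect iff |a_u - a_v| < length.
        if abs(left_endpoints[u] - left_endpoints[v]) < length:
            edges.add(frozenset((u, v)))
    return edges

# A hypothetical unit interval model on five vertices.
model = {"v1": 0.0, "v2": 0.4, "v3": 0.8, "v4": 1.5, "v5": 2.2}
model_edges = unit_interval_graph(model)
print(sorted(tuple(sorted(e)) for e in model_edges))
```

For this model the graph is a triangle on v1, v2, v3 followed by the path v3, v4, v5.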

Brandstädt et al. [2] and Lin et al. [5] proved independently the following structural property:

Theorem 1

([2, 5])

A graph \(G\) is an induced subgraph of a power of a path if, and only if, \(G\) is a unit interval graph.

Thus, given a unit interval graph \(G\) with \(n\) vertices, there exists a θ-power of a path \(P_{n_{\theta}}\) that contains \(G\) as an induced subgraph. However, the proofs of the structural characterization given by Theorem 1 [2, 5] do not lead to an algorithm that constructs \(G_S\) and \(G_T\) for Question 1 with minimum values of the power θ and of the number of vertices \(n_{\theta}\).

In [6], the authors show an \(\mathcal{O}(n)\) time algorithm that inserts new intervals into a proper interval model \(I\) of a connected graph \(G\), constructing an extended model \(I'\) containing \(I\). This extended model \(I'\) gives an implicit representation of a power of a path for every proper interval graph \(G\), but neither the number of inserted intervals nor the power θ is guaranteed to be minimum. The authors also remark that any explicit representation would require \(\mathcal{O}(n^{2})\) steps.

We present in this work an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph \(G\), an explicit representation of the smallest θ-power of a path, \(G_S\) (with respect to θ and to the number of vertices), that contains \(G\) as an induced subgraph. Next, we construct \(G_T\), a θ-power of a path with the same number of vertices as \(G_S\), such that the intersection \(G_S \cap G_T\) contains \(G\) as an induced subgraph.

This paper is organized as follows. In Sects. 2 and 3, we present the algorithm and we prove its correctness and complexity. In Sect. 4, we discuss the problem when \(G\) is an induced subgraph neither of \(G_S\) nor of \(G_T\), and we present a method of finding the smallest power of a path when graph \(G\) is a cycle \(C_n\).

The algorithm

Our result is based on the ordering of the vertex set of G, given by Algorithm Recognize [3], which satisfies the property proved by Roberts in [7]:

Property 2

A graph \(G\) is a unit interval graph if and only if there is an order < on vertices such that for all vertices \(v\), the closed neighborhood of \(v\) is a set of consecutive vertices with respect to the order <.
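Property 2 is easy to test for a given ordering. The sketch below is an assumption-laden illustration (the function name and the edge representation as `frozenset` pairs are chosen here, and it is not Algorithm Recognize from [3]): it checks that every closed neighborhood occupies a block of consecutive positions.

```python
from itertools import permutations

def has_consecutive_neighborhoods(order, edges):
    """Check Property 2 for one ordering: each closed neighborhood N[v]
    must occupy consecutive positions in `order`."""
    position = {v: i for i, v in enumerate(order)}
    for v in order:
        closed = {position[v]}
        for e in edges:
            if v in e:
                (w,) = e - {v}          # the other endpoint of the edge
                closed.add(position[w])
        if max(closed) - min(closed) + 1 != len(closed):
            return False
    return True

# Triangle v1 v2 v3 followed by the path v3 v4 v5 (a unit interval graph).
g = {frozenset(p) for p in [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
                            ("v3", "v4"), ("v4", "v5")]}
print(has_consecutive_neighborhoods(["v1", "v2", "v3", "v4", "v5"], g))

# A claw with center c: no ordering satisfies the property.
claw = {frozenset(("c", x)) for x in "abd"}
print(any(has_consecutive_neighborhoods(p, claw) for p in permutations("abcd")))
```

The brute-force search over all orderings is only meant for tiny examples such as the claw; the linear-time recognition algorithms cited above avoid it.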

Since all powers of paths are unit interval graphs, we can insert the vertices of V(G) in the vertex set of a power of a path \(P^{\theta}_{n_{\theta}}\) until this power of a path contains G as an induced subgraph.

This construction is done by Algorithm CPP as follows. First, let \(v_1<v_2<\cdots<v_n\) be an ordering of \(V(G)\) given by Algorithm Recognize [3]. We take \(\theta_0\) to be the number of vertices of the maximal clique that contains \(v_1\), minus one, and we insert the vertices of this clique in \(P^{\theta_{0}}\). Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subset P^{\theta_{1}} \subset \cdots \subset P^{\theta_{l-1}} \subset P^{\theta_{l}}\) such that \(\theta_i=\theta_{i-1}+1\).

Let \(v\) be the first vertex non-adjacent to \(v_1\) in the order on \(V(G)\). If \(v\) is adjacent to \(v_2\), Algorithm CPP must insert \(v\) in the vertex of \(P^{\theta_{0}}\) that is at distance \(\theta_0+1\) from vertex \(v_1\) in \(P^{\theta_{0}}\). Similarly, if \(v\) is not adjacent to \(v_t\), but is adjacent to \(v_{t+1}\), Algorithm CPP must insert \(v\) in the vertex of \(P^{\theta_{0}}\) that is at distance \(\theta_0+1\) from vertex \(v_t\) in \(P^{\theta_{0}}\). This is done by inserting \(t-1\) vertices between the vertex of largest index adjacent to \(v_1\) and \(v\) in \(P^{\theta_{0}}\). Now, suppose that there exist at least two vertices \(v\), \(\bar{v}\) that are not adjacent to \(v_1\) and are adjacent to \(v_2\). Let \(\bar{v}\) be the second vertex of this set. In order to minimize the number of vertices of \(P^{\theta_{0}}\), vertex \(\bar{v}\) must be a vertex of \(P^{\theta_{0}}\) at distance \(\theta_0+2\) from vertex \(v_1\) in \(P^{\theta_{0}}\). Then Algorithm CPP must call Procedure SHIFT to increase \(\theta_0\) to \(\theta_1:=\theta_0+1\) because of the edge \(\bar{v}v_{2}\). On the other hand, this increase adds several edges in \(P^{\theta_{0}}\) which are not in \(E(G)\). Thus, Procedure SHIFT adjusts the power of a path \(P^{\theta_{0}}\) for the new \(\theta_1\) by inserting vertices in \(P^{\theta_{0}}\), in order to preserve the adjacencies and non-adjacencies between vertices of \(G\), and generates a new \(P^{\theta_{1}}\). Algorithm CPP proceeds until all vertices of \(V(G)\) are included in \(P_{n_{\theta}}^{\theta}\), a smallest power of a path with respect to θ and \(n_{\theta}\).

Before describing Algorithm CPP, we borrow some definitions from [3]. Given an ordering of \(V(G)\) returned by Algorithm Recognize [3], \(\mathrm{order}_{G}(v)\) is the position of vertex \(v\) in this ordering; \(\xi_{G}(v) = \textrm{max} \{\mathrm {order}_{G}(\overline{v}) :\bar{v} \in N_{G}[v]\}\) and \(\eta_{G}(v) = \textrm{min} \{\mathrm {order}_{G}(\bar{v}): \bar{v} \in N_{G}[v]\}\), where \(N_G[v]=\{w \in V(G) : vw \in E(G)\} \cup \{v\}\). Let \(v \in V(G)\) and \(u \in V(P^{\theta})\). We refer to \(\mathrm {order}_{P^{\theta}}(v)\) as the position of vertex \(v\) in the ordering of the vertex set of \(P^{\theta}\), i.e., \(\mathrm {order}_{P^{\theta}}(v)= i\) if \(u_i=v\) in \(P^{\theta}\). We denote \(\xi_{P^{\theta}}(u)= \textrm{max} \{\mathrm {order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\) and \(\eta_{P^{\theta}}(u) = \textrm{min} \{\mathrm {order}_{P^{\theta}}(\bar{u}) :\bar{u} \in N_{P^{\theta}}[u]\}\).
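The quantities \(\xi_G\) and \(\eta_G\) can be computed directly from an ordering and an edge list. The following is a minimal sketch with hypothetical names (`xi_eta`, 1-based positions, edges as `frozenset` pairs); the example graph is a triangle on v1, v2, v3 followed by the path v3, v4, v5, chosen for illustration.

```python
def xi_eta(order, edges):
    """Return (xi, eta): for each vertex, the largest and smallest 1-based
    position among the vertices of its closed neighborhood."""
    position = {v: i + 1 for i, v in enumerate(order)}
    xi, eta = {}, {}
    for v in order:
        closed = [position[v]]
        closed += [position[w] for e in edges if v in e for w in e if w != v]
        xi[v], eta[v] = max(closed), min(closed)
    return xi, eta

order = ["v1", "v2", "v3", "v4", "v5"]
g = {frozenset(p) for p in [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
                            ("v3", "v4"), ("v4", "v5")]}
xi, eta = xi_eta(order, g)
theta_0 = xi["v1"] - 1   # size of the clique starting at v1, minus one
print(xi, eta, theta_0)
```

Here \(\xi_G(v_1)=3\), so the initial power used by the construction described above would be \(\theta_0=2\).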

Next, we present Algorithm CPP and Procedure SHIFT.

Algorithm CONSTRUCTING_POWER_OF_PATH (CPP)

[Figure a: pseudocode of Algorithm CPP]

Procedure SHIFT receives as input a smallest power of a path \(P^{\theta}\) that contains \(G[v_1,\ldots,v_{l-1}]\), \(\xi_G(v_1)+1\leq l\leq n\), as an induced subgraph. Power \(P^{\theta}\) also contains the last vertex \(v_l\) inserted by Algorithm CPP. Vertex \(v_l\) triggers Procedure SHIFT because \(v_l\) is not adjacent to some vertex \(v_{l-t}\) in \(P^{\theta}\), but \(v_{l-t}v_l \in E(G)\).

Procedure SHIFT

[Figure b: pseudocode of Procedure SHIFT]

Algorithm CPP returns \(P_{n_{\theta}}^{\theta}\), the smallest power of a path (with respect to θ and \(n_{\theta}\)) that contains the unit interval graph \(G\) as an induced subgraph. We construct two powers of paths, \(G_T=(V_T,E_T)\) and \(G_S=(V_S,E_S)\), from \(P_{n_{\theta}}^{\theta}\) as follows. First, \(V_{T}= V_{S} = V(P^{\theta}_{n_{\theta}})\). Then, the vertices of \(V_T\) which are not in \(V\) receive labels different from those of the vertices in \(V(P_{n_{\theta}}^{\theta})\).

We show an example of a unit interval graph \(G\) in Fig. 1. For this graph \(G\), Algorithm CPP returns \(G_S\), the 2-power of path \(P_S=v_1,v_2,v_3,0,v_4,v_5\). Then, \(G_T\) is the 2-power of path \(P_T=v_1,v_2,v_3,v_b,v_4,v_5\).

Fig. 1 Algorithm CPP returns the 2-power of path \(P_6=v_1,v_2,v_3,0,v_4,v_5\) for unit interval graph \(G\)
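The relabeling step can be checked on the two paths reported for Fig. 1. In the sketch below, the helper `power_of_path` and the string labels `"0"` and `"vb"` for the two filler vertices are assumptions made for illustration; the edge set of \(G\) itself is not restated in the text, so the code only prints what the two paths imply. Because the filler vertices carry different labels, they drop out of the intersection.

```python
from itertools import combinations

def power_of_path(labels, theta):
    """Edge set of the theta-power of the path listed in `labels` (hypothetical helper)."""
    position = {v: i for i, v in enumerate(labels)}
    return {frozenset((u, v)) for u, v in combinations(labels, 2)
            if abs(position[u] - position[v]) <= theta}

p_s = ["v1", "v2", "v3", "0", "v4", "v5"]    # filler vertex labeled "0" in G_S
p_t = ["v1", "v2", "v3", "vb", "v4", "v5"]   # same position, different label in G_T

e_s = power_of_path(p_s, 2)
e_t = power_of_path(p_t, 2)

common_vertices = set(p_s) & set(p_t)   # V_ST
common_edges = e_s & e_t                # E_S intersected with E_T
print(sorted(common_vertices))
print(sorted("".join(sorted(e)) for e in common_edges))
```

The common edges form a triangle on v1, v2, v3 followed by the path v3, v4, v5, with no edge touching either filler vertex.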

Proofs

In this section, we present the proofs of correctness of the Procedure SHIFT (Lemma 1) and Algorithm CPP (Theorem 4).

Lemma 1

Let \(P^{\theta}\) be a smallest power of a path that contains \(G_{l-1}=G[v_1,\ldots,v_{l-1}]\) as an induced subgraph, with respect to the ordering \(v_1<\cdots<v_{l-1}\). Let \(v_l \in V(G)\) be the next vertex inserted in \(P^{\theta}\), with \(v_{l-t-1}v_{l} \not\in E(G), \ v_{l-t}v_{l} \in E(G)\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l})= \theta+1\). Then, the output of Procedure SHIFT, the power of a path \(P^{\theta+1}\), is a smallest power of a path that contains \(G_l=G[v_1,\ldots,v_{l-1},v_l]\) as an induced subgraph, with respect to the ordering \(v_1<\cdots<v_{l-1}<v_l\).

Proof

Since \(v_{l-t-1}v_{l} \not\in E(G)\), \(v_{l-t}v_{l} \in E(G)\) and \(\theta+1= d_{P_{n_{\theta}}}(v_{l-t}, v_{l})\), Procedure SHIFT must increase the power θ by one unit (Step 1). But the increase of θ to θ+1 creates several adjacencies in \(P^{\theta}\) between pairs of vertices of the set \(\{v_1,\ldots,v_l\}\) that are non-adjacent in \(G\). In order to preserve the adjacencies and non-adjacencies between vertices of \(G\) in \(P^{\theta}\), Procedure SHIFT is forced to insert one vertex between the vertex that received \(v_{l-t-1}\) in \(P^{\theta}\) and its consecutive vertex in \(P^{\theta}\). Again, counting in descending order from vertex \(v_{l-t-1}\), the adjacencies are violated in each "block" of θ vertices in \(P^{\theta}\). So, the procedure must insert one vertex for every θ+1 vertices in descending order, starting from vertex \(v_{l-t-1}\) in \(P^{\theta}\). We observe that the set formed by the initial vertices of \(V(P^{\theta})\) has cardinality less than or equal to θ+1, because when dividing \(\mathrm {order}_{P^{\theta}}(v_{l-t-1})\) by θ+1 the remainder is greater than or equal to 1 and less than or equal to θ+1.

In each step, the procedure inserts the smallest number of vertices necessary to guarantee that the power of a path \(P^{\theta+1}\), created by Procedure SHIFT, contains \(G_l[v_1,\ldots,v_l]\) as an induced subgraph. So, the power θ+1 and the number of inserted vertices are minimum and, consequently, \(P^{\theta+1}\) is a smallest power of a path that contains \(G_l[v_1,\ldots,v_l]\) as an induced subgraph. □

First, we prove that Algorithm CPP correctly returns a smallest power of a path according to the ordering given by Algorithm Recognize [3].

Lemma 2

Let \(G\) be a connected unit interval graph. Algorithm CPP generates the smallest power of a path \(P_{n_{\theta}}^{\theta}\), with respect to θ and \(n_{\theta}\), that contains \(G\) as an induced subgraph according to the ordering \(v_1<\cdots<v_n\) given by the input of CPP.

Proof

Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subseteq P^{\theta_{1}} \subseteq \cdots \subseteq P^{\theta}\), where \(\theta_i=\theta_{i-1}+1\). This is done by successively adding, in each \(P^{\theta_{i}}\), vertices of \(G\) following the input ordering, preserving the adjacencies and non-adjacencies between vertices of \(G\) and minimizing θ and \(n_{\theta}\). Initially, the power of a path \(P^{\theta_{0}}\) receives the maximal clique containing \(v_1\), i.e., \(V(P^{\theta_{0}}) =\{u_{1}, \ldots, u_{\xi_{P^{\theta_{0}}}(v_{1})}\}\) and \(\theta_0=\xi_G(v_1)-1\). This is the smallest power of a path that contains \(G[v_{1}, \ldots, v_{\xi_{G}(v_{1})}]\) as an induced subgraph.

Suppose that the first \(l-1\) vertices, i.e., \(\{v_1,\ldots,v_{l-1}\}\), have already been inserted by Algorithm CPP in the power of a path \(P^{\theta}\), i.e., \(P^{\theta}\) is the smallest power of a path, with respect to θ and \(n_{\theta}\), that contains \(G[v_1,\ldots,v_{l-1}]\) as an induced subgraph. Let \(v_l \in V(G)\) be the next vertex to be inserted by Algorithm CPP in \(P^{\theta}\). Suppose that \(v_l\) is adjacent, in \(P^{\theta}\), to \(\{v_{l-t},\ldots,v_{l-1}\}\). Vertex \(v_l\) must be inserted in \(P^{\theta}\) between positions \(\xi_{P^{\theta}}(v_{l-t-1})+1\) and \(\xi_{P^{\theta}}(v_{l-t})\) so that \(G[v_1,\ldots,v_l]\) is an induced subgraph of \(P^{\theta}\). Then, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta+1\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). We consider two cases with respect to the adjacencies of \(v_l\) in \(G\). From now on, we refer to Figs. 2, 3 and 4, where dashed lines represent adjacencies.

Fig. 2 Bracket indicates possible positions of \(v_l\)

Fig. 3 Bracket indicates possible positions of \(v_{l-1}\)

Fig. 4 Bracket indicates possible positions of vertices \(v_{l-1}\) and \(v_l\)

Case 1: If t=θ, then after insertion of \(v_l\), \(\theta + 1 \leq d_{P_{n_{\theta}}}(v_{l-t-1},v_{l})\), because the set \(\{v_{l-t},\ldots,v_{l-1}\}\) has t=θ elements (see Fig. 2). In order to minimize θ and \(n_{\theta}\), Algorithm CPP must insert \(v_l\) in the vertex consecutive to \(v_{l-1}\) in the power of a path \(P^{\theta}\), and as a consequence \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta + 1\). Indeed, since \(v_{l-1}\) is adjacent to \(v_{l-t}\) in \(P^{\theta}\), by hypothesis \(v_{l-1}\) was inserted in \(P^{\theta}\) such that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) \leq \theta\), and \(v_l\) was inserted in the vertex consecutive to \(v_{l-1}\) in \(P^{\theta}\), so the claim holds. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\), Algorithm CPP inserted \(v_l\) without changing θ, the number of vertices of \(P^{\theta}\) became \(n_{\theta}+1\), and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta+1\), Algorithm CPP called Procedure SHIFT and, by Lemma 1, we conclude the proof.

Case 2: If 1<t<θ, Algorithm CPP must insert \(v_l\) in \(P^{\theta}\) such that \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta + 1\), so that \(v_{l-t-1}\) and \(v_l\) are not adjacent. We observe the position of \(v_{l-1}\) in \(P^{\theta}\). If \(v_{l-1}\) is not adjacent to \(v_{l-t-1}\) in \(P^{\theta}\) (see Fig. 3), in order to minimize the number of vertices of \(P^{\theta}\), Algorithm CPP inserts \(v_l\) in the vertex consecutive to \(v_{l-1}\). By hypothesis, vertex \(v_{l-1}\) was inserted in \(P^{\theta}\) so that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1})\leq \theta\). Then, if \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) < \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). Thus \(v_l\) was inserted in \(P^{\theta}\) without changing θ, the number of vertices of \(P^{\theta}\) became \(n_{\theta}+1\), and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) = \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta + 1\), Procedure SHIFT was called and, by Lemma 1, we conclude the proof.

If \(v_{l-1}\) is adjacent to \(v_{l-t-1}\) in \(P^{\theta}\) (see Fig. 4), the position of \(v_{l-t-1}\) in \(P^{\theta}\) is between \(l-t+1\) and \(\xi_{P^{\theta}}(v_{l-t-1})\), including them. Again, in order to minimize the number of vertices of \(P^{\theta}\), vertex \(v_l\) is inserted \((\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm {order}_{P^{\theta}}(v_{l-1}))\) vertices after vertex \(v_{l-1}\) in \(P^{\theta}\). Thus, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta + 1\).

Since \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) < d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). So, \(v_l\) was inserted in \(P^{\theta}\) without changing θ, and the number of vertices of \(P^{\theta}\) became \(n_{\theta}+(\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm {order}_{P^{\theta}}(v_{l-1}))+1\). This insertion was minimum, because \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\).

This concludes the proof of Lemma 2. □

In order to show that Algorithm CPP returns the smallest power of a path containing \(G\) as an induced subgraph, we present two results about a given power of a path \(P^{\sigma}\) containing \(G\) as an induced subgraph. First, we give some notation from [3]. Given a unit interval graph \(G\) and a unit interval model associated to its vertices \(I=\{v_1,v_2,\ldots,v_n\}\), we recall that the interval associated to vertex \(v\) is \((a_v,b_v)\). We say that \(v_1,v_2,\ldots,v_n\) is a natural labeling for the vertices of \(G\) if \(a_{v_{i}} \leq a_{v_{i+1}}\), for each \(1\leq i\leq n-1\). The ordering \(v_1<v_2<\cdots<v_n\) is a natural ordering if \(v_1,v_2,\ldots,v_n\) is a natural labeling for \(V(G)\). A vertex is a left anchor if it can receive the label \(v_1\) in some natural labeling for \(V(G)\). Consider the model \(I'\) obtained by mirroring a unit interval model \(I\) (that is, replacing each interval \((a,b)\) by \((-b,-a)\)). Model \(I'\) is also a valid unit interval model for \(G\), so the rightmost interval in \(I\) is also a left anchor.

In the next results, we show properties of the ordering of V(G) induced by a natural ordering that is generated by the subscripts of a natural labeling of a power of a path.

Lemma 3

Let \(P_{n_{\sigma}}^{\sigma}\) be a power of a path that contains \(G\) as an induced subgraph. The ordering of the vertices of \(V(G)\) induced by a natural ordering of the vertices of \(V(P^{\sigma})\) satisfies Property 2.

Proof

Suppose that this ordering of \(V(G)\) does not satisfy Property 2. Then, there exist three vertices \(v_r,v_s,v_t \in V(G)\) such that \(v_r<v_s<v_t\) with \(v_{r}v_{s}\not\in E(G)\) and \(v_rv_t \in E(G)\). It follows that \(v_rv_t \in E(G) \subseteq E(P^{\sigma})\). Therefore, \(1\leq |\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{t})|\leq \sigma\). Since \(v_r<v_s<v_t\) in \(V(P^{\sigma})\), we have \(|\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{s})|\leq |\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{t})|\), and then \(1\leq |\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{s})|\leq \sigma\). Consequently, \(v_rv_s \in E(P^{\sigma})\) and \(v_{r}v_{s} \not\in E(G)\), i.e., \(P_{n_{\sigma}}^{\sigma}\) does not contain \(G\) as an induced subgraph. □

Vertices \(v, \overline{v} \in V(G)\) are indistinguishable vertices (twin vertices) if \(N_{G}[v] = N_{G}[\overline{v}]\). The next result states that it is possible to exchange the positions of indistinguishable vertices of \(V(G)\) in a natural ordering of \(V(P^{\sigma})\).

Lemma 4

Let \(v, \overline{v} \in V(G)\) be such that \(N_{G}[v] = N_{G}[\overline{v}]\), with \(v=u_i\) and \(\overline{v} = u_{j}\) in \(V(P^{\sigma})\). If we exchange the positions of vertices \(v\) and \(\overline{v}\) in \(P^{\sigma}\), i.e., \(v=u_j\) and \(\overline{v} = u_{i}\), graph \(G\) will still be an induced subgraph of \(P^{\sigma}\).

Proof

Without loss of generality, suppose \(i<j\). By Lemma 3, the ordering of \(V(G)\) induced by a natural ordering of \(V(P^{\sigma})\) satisfies Property 2. So, \(N_{G}[v]=\{v_{\eta_{G}(v)},\ldots, v_{\xi_{G}(v)}\}\) and \(N_{G}[\overline{v}]=\{v_{\eta_{G}(\overline{v})}, \ldots,v_{\xi_{G}(\overline{v})}\}\). Since \(N[v] = N[\overline{v}]\), we have \(\xi_{G}(\overline{v}) =\xi_{G}(v)\) and \(\eta_{G}(\overline{v}) = \eta_{G}(v)\). Then, \(v_{\xi_{G}(v)} =v_{\xi_{G}(\overline{v})}\), \(v_{\eta_{G}(v)} = v_{\eta_{G}(\overline{v})}\), \(v_{\xi_{G}(v)+1}=v_{\xi_{G}(\overline{v})+1}\) and \(v_{\eta_{G}(v)-1} = v_{\eta_{G}(\overline{v})-1}\). Thus, by exchanging the positions of vertices \(v\) and \(\overline{v}\) in \(P^{\sigma}\), we have \(\mathrm {order}_{P^{\sigma}}(\overline{v}) - \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(\overline{v})-1}) \geq \sigma +1\); then edge \(\overline{v}v_{\eta_{G}(\overline{v})-1} \not\in E(P^{\sigma})\). Also, for any \(v' \in V(G)\) with \(\mathrm {order}_{G}(v') < \mathrm {order}_{G}(v_{\eta_{G}(\overline{v})-1})\), edge \(\overline{v}v' \not\in E(P^{\sigma})\). Similarly, \(\mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(v)+1}) - \mathrm {order}_{P^{\sigma}}(v)\geq \sigma + 1\), i.e., edge \(vv_{\xi_{G}(v)+1} \not\in E(P^{\sigma})\) and also, for any \(v' \in V(G)\) with \(\mathrm {order}_{G}(v_{\xi_{G}(v)+1}) < \mathrm {order}_{G}(v')\), edge \(vv' \not\in E(P^{\sigma})\).

Analogously, \(\sigma \geq \mathrm {order}_{P^{\sigma}}(\overline{v})- \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(v)})\), i.e., edge \(\overline{v}v_{\eta_{G}(\overline{v})} \in E(P^{\sigma})\) and, for any \(v' \in V(G)\) with \(\mathrm {order}_{G}(v_{\eta_{G}(\overline{v})}) < \mathrm {order}_{G}(v') < \mathrm {order}_{G}(\overline{v})\), edge \(\overline{v}v' \in E(P^{\sigma})\). Similarly, \(\sigma \geq \mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(\overline{v})}) - \mathrm {order}_{P^{\sigma}}(v)\), i.e., edge \(vv_{\xi_{G}(v)} \in E(P^{\sigma})\) and, for any \(v' \in V(G)\) with \(\mathrm {order}_{G}(v) < \mathrm {order}_{G}(v') <\mathrm {order}_{G}(v_{\xi_{G}(v)})\), edge \(vv' \in E(P^{\sigma})\). □

In what follows, we write \(v_i <_{B} v_j\) if \(\mathrm {order}_{G}(v_i)<\mathrm {order}_{G}(v_j)\), considering the ordering of \(V(G)\) given by Algorithm Recognize [3]. Before proving Theorem 4, we need two results.

Theorem 3

(Theorem 2.2 [3])

Let \(I\) be a unit interval model of a unit interval graph \(G\) with natural labeling \(v_1,\ldots,v_n\). Then, for all vertices \(\bar{v},\ v \in V(G)\), if \(a_{\bar{v}} < a_{v}\) but \(v <_{B} \bar{v}\), we have \(N_{G}[v]=N_{G}[\bar{v}]\).

As a consequence of Theorem 2.3 of [3], we have the following result.

Lemma 5

([3])

Let \(v'_{1} <_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\) be an ordering of \(V(G)\) given by Algorithm Recognize [3] of a unit interval graph \(G\). Given a natural labeling \(v_1,\ldots,v_n\), then \(N_{G}[v'_{1}]=N_{G}[v_{1}]\) or \(N_{G}[v'_{1}]=N_{G}[v_{n}]\).

Finally, the correctness of Algorithm CPP is given by the theorem below.

Theorem 4

Let \(G\) be a unit interval graph. Algorithm CPP returns the smallest power of a path \(P^{\theta}_{n_{\theta}}\), with respect to θ and \(n_{\theta}\), that contains \(G\) as an induced subgraph.

Proof

Let \(P^{\sigma}_{n_{\sigma}}\) be the smallest power of a path that contains \(G\) as an induced subgraph. Let \(\overline{u}_{1} < \cdots < \overline{u}_{n_{\sigma}}\) be a natural ordering of \(V(P^{\sigma})\) and let \(\overline{v}_{1} < \cdots < \overline{v}_{n}\) be the ordering of \(V(G)\) induced by the natural ordering of \(V(P^{\sigma})\). Clearly, \(\overline{v}_{1}, \ldots, \overline{v}_{n}\) is a natural labeling of \(V(G)\). Let \(I\) be a family of intervals for this labeling of \(V(G)\), such that each \(v \in V(G)\) is associated to \((a_v,b_v) \in I\).

If we prove that \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) is equal to \(v'_{1} <_{B}v'_{2} <_{B} \cdots <_{B} v'_{n}\) up to indistinguishable vertices, we have \(\theta=\sigma\) and \(n_{\theta}=n_{\sigma}\). In fact, since \(P^{\sigma}\) is the smallest power of a path that contains \(G\) as an induced subgraph, \(\sigma \leq \theta\) and \(n_{\sigma} \leq n_{\theta}\). On the other hand, by Lemma 2, the power of a path \(P^{\theta}\) returned by Algorithm CPP is the smallest power of a path that contains \(G\) as an induced subgraph with respect to the ordering \(v'_{1}<_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\). So, if this ordering is equal to \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) up to indistinguishable vertices, then by Lemma 4, \(P^{\sigma}\) contains \(G\) as an induced subgraph with respect to the ordering \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\). Then, by minimality of θ and \(n_{\theta}\) with respect to \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\), we have \(\sigma \geq \theta\) and \(n_{\sigma} \geq n_{\theta}\).

First, suppose that the left anchor \(\overline{v}_{1}\) is equal to \(v'_{1}\). Suppose, for contradiction, that there exist \(v, \ \tilde{v} \in V(G)\) such that \(v < \tilde{v}\), \(\tilde{v} <_{B} v\) and \(N_{G}[v] \neq N_{G}[\tilde{v}]\). Since \(v < \tilde{v}\), we have \(a_{v} \leq a_{\tilde{v}}\). If \(a_{v} = a_{\tilde{v}}\), since all intervals of \(I\) have the same length, we have \(b_{v} = b_{\tilde{v}}\) and hence \(N_{G}[v] =N_{G}[\tilde{v}]\), a contradiction to the hypothesis. If \(a_{v} < a_{\tilde{v}}\), since \(\tilde{v} <_{B} v\), by Theorem 3 we have \(N_{G}[v] = N_{G}[\tilde{v}]\), a contradiction to the hypothesis. Thus, for every pair of vertices \(v, \tilde{v} \in V(G)\) such that \(v < \tilde{v}\) and \(\tilde{v} <_{B} v\), we have \(N_{G}[v] = N_{G}[\tilde{v}]\). Consequently, \(\sigma=\theta\) and \(n_{\sigma}=n_{\theta}\).

Now, suppose that the left anchor \(\overline{v}_{1}\) is different from \(v'_{1}\). By Lemma 5, either \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\) or \(N_{G}[\overline{v}_{n}] =N_{G}[v'_{1}]\). If \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\), by Lemma 4, we can exchange the positions of these vertices in \(V(P^{\sigma})\), i.e., \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(v'_{1})}=\overline{v}_{1}\) and \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(\overline{v}_{1})} = v'_{1}\), and \(G\) will still be an induced subgraph of \(P^{\sigma}\). After this change, \(v'_{1} < \overline{v}_{2} < \cdots <\overline{v}_{1} < \cdots < \overline{v}_{n}\) is the new ordering of \(V(G)\) induced by the ordering of \(V(P^{\sigma})\). We repeat the same argument used in the previous case, where \(\overline{v}_{1}\) is equal to \(v'_{1}\), and we conclude the proof. If \(N_{G}[\overline{v}_{n}] = N_{G}[v'_{1}]\), since \(\overline{v}_{n}\) is the left anchor of the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} <\cdots < \overline{v}_{1}\) of \(V(G)\) induced by the natural ordering \(\overline{u}_{n_{\sigma}} < \cdots < \overline{u}_{1}\) of \(V(P^{\sigma})\), we can repeat the previous argument for the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} < \cdots < \overline{v}_{1}\), and so we conclude the proof. □

Algorithm CPP analyzes each vertex of \(G\) a single time, in the ordering returned by Algorithm Recognize [3]. In the worst case, Algorithm CPP calls Procedure SHIFT once for each vertex \(v_l \in V(G)\). Since for each vertex \(v_l\) Procedure SHIFT analyzes the set of vertices of \(G_l\) at most once, the complexity of Algorithm CPP is \(\mathcal{O}(n^{2})\).

G is not an induced subgraph of \(G_S\) and \(G_T\)

If we relax the constraint that \(G\) must be an induced subgraph of \(G_S\) or \(G_T\), then even for unit interval graphs it is possible to find two powers of paths, whose intersection contains \(G\) as an induced subgraph, smaller than the answer given by Algorithm CPP. See an example in Fig. 5.

Fig. 5 Graph \(G\) is not an induced subgraph of \(G_S\) and \(G_T\)

If graph \(G\) is a unit interval graph, then \(G\) contains no induced Claw (Fig. 6), \(S_3\) (Fig. 7), \(\overline{S}_{3}\) (Fig. 8) or Cycle \(C_n\), n≥4. If \(G\) is a cycle \(C_n\), n≥4, then the smallest θ-powers of paths, \(G_S\) and \(G_T\), such that \(G_S \cap G_T\) contains \(C_n\) as an induced subgraph can be obtained as follows. First, we construct \(G_S\): for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), \(u_{2j-1}:=v_j\); and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), \(u_{2j}:=v_{n+1-j}\). Now, we construct \(G_T\): for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), \(w_{2j-1}:=v_{j+1}\); and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), \(w_{2j}:=v_k\), where \(k = (n+2-j) \operatorname {mod}n\), reading a remainder of 0 as \(k=n\). See an example when \(G\) is a \(C_6\) in Fig. 9.

Fig. 6 Claw

Fig. 7 3-sun (\(S_3\))

Fig. 8 Net (\(\overline{S}_{3}\))

Fig. 9 Graph \(G=C_6\) and the respective \(G_S\) and \(G_T\), 2-powers of paths with 6 vertices
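The cycle construction described above can be checked numerically for small \(n\). The sketch below is an illustration under stated assumptions: vertices \(v_i\) are represented by the integers 1, …, n, the helper names `power_of_path` and `cycle_paths` are hypothetical, and a remainder of 0 in \(k=(n+2-j) \bmod n\) is read as \(v_n\), since the vertices are labeled from 1.

```python
from itertools import combinations

def power_of_path(labels, theta):
    """Edge set of the theta-power of the path listed in `labels` (hypothetical helper)."""
    position = {v: i for i, v in enumerate(labels)}
    return {frozenset((u, v)) for u, v in combinations(labels, 2)
            if abs(position[u] - position[v]) <= theta}

def cycle_paths(n):
    """Paths P_S = u_1..u_n and P_T = w_1..w_n for the cycle C_n (vertex v_i is the integer i)."""
    p_s, p_t = [None] * n, [None] * n
    for j in range(1, (n + 1) // 2 + 1):      # 1 <= j <= ceil(n/2)
        p_s[2 * j - 2] = j                     # u_{2j-1} := v_j
        p_t[2 * j - 2] = j + 1                 # w_{2j-1} := v_{j+1}
    for j in range(1, n // 2 + 1):             # 1 <= j <= floor(n/2)
        p_s[2 * j - 1] = n + 1 - j             # u_{2j} := v_{n+1-j}
        k = (n + 2 - j) % n
        p_t[2 * j - 1] = k if k != 0 else n    # assumption: remainder 0 means v_n
    return p_s, p_t

def cycle_edges(n):
    return {frozenset((i, i % n + 1)) for i in range(1, n + 1)}

p_s, p_t = cycle_paths(6)
print(p_s, p_t)
common = power_of_path(p_s, 2) & power_of_path(p_t, 2)
print(common == cycle_edges(6))
```

For \(n=6\) the two paths are \(v_1,v_6,v_2,v_5,v_3,v_4\) and \(v_2,v_1,v_3,v_6,v_4,v_5\). The second path is the first one with every index shifted by one around the cycle, which is why the extra chords of the two 2-powers never coincide.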

Theorem 5

Let \(G_S\) and \(G_T\) be 2-powers of paths with \(n\) vertices constructed by the previous method. Then \(G_S \cap G_T\) is \(C_n\), n≥4.

Proof

Let \(G_S\) be the 2-power of path \(P_S=u_1,\ldots,u_n\), and let \(G_T\) be the 2-power of path \(P_T=w_1,\ldots,w_n\) constructed by the previous method. Since the distance between consecutive vertices of \(G\) in \(G_S\) (resp. \(G_T\)) is less than or equal to 2, \(G_S\) (resp. \(G_T\)) contains \(G\) as a subgraph.

For each \(v_i \in C_n\), \(i \in \{2, \ldots, \lceil\frac{n}{2}\rceil,\lceil\frac{n}{2}\rceil +2, \ldots, n\}\), and \(3\leq j\leq n-2\), if \(u_j=v_i\) with \(j\) odd then \(w_{j-2}=v_i\); if \(j\) is even, we have \(w_{j+2}=v_i\).

Now, let \(v_i \in C_n\). If \(u_j=v_i\), \(3\leq j\leq n-2\), with \(j\) odd (resp. even), then \(w_{j-2}=v_i\) (resp. \(w_{j+2}=v_i\)), and its neighbors are \(u_{j-1}=v_k=w_{j-1+2}\) (resp. \(u_{j-1}=v_{k-1}=w_{j-1-2}\)) and \(u_{j+1}=v_{k-1}=w_{j+1+2}\) (resp. \(u_{j+1}=v_k=w_{j+1-2}\)). We conclude that \(d_{P_{S}}(v_{i},v_{k})= d_{P_{S}}(v_{i}, v_{k-1}) = 1\), \(d_{P_{T}}(v_{i}, v_{k}) = 3\) and \(d_{P_{T}}(v_{i}, v_{k-1}) = 5\), i.e., \(v_iv_k, v_iv_{k-1} \in E(G_S)\) and \(v_{i}v_{k},v_{i}v_{k-1} \not \in E(G_{T})\). Hence, \(v_{i}v_{k},v_{i}v_{k-1} \not \in G_{S} \cap G_{T}\). □

Conclusion

In this work, we developed an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph \(G\), an explicit representation of the smallest θ-power of a path \(G_S\) (with respect to θ and to the number of vertices) that contains \(G\) as an induced subgraph. We construct \(G_T\), a θ-power of a path with the same number of vertices as \(G_S\), such that the intersection \(G_S \cap G_T\) contains \(G\) as an induced subgraph.

We remark that θ can be greater than or equal to the size of a maximum clique of the graph G, ω(G). We present in Fig. 10 an example where G has ω(G)=4 and Algorithm CPP returns θ=5, but the difference between θ and ω(G) can be greater than 1.

Fig. 10 Graph \(G\) with n=10 and ω(G)=4 and the output returned by Algorithm CPP: \(P^{\theta}_{n_{\theta}}\) with \(n_{\theta}=21\) and θ=5

In the case where graph \(G\) is not an induced subgraph of \(G_S\) and \(G_T\), we show a method that generates \(G_S\) and \(G_T\), 2-powers of paths with \(n\) vertices, whose intersection is \(C_n\), n≥4.

As future work, we intend to investigate this problem for other classes of graphs. We remark that all remaining forbidden induced subgraphs of unit interval graphs (Figs. 6, 7 and 8) have answer YES to Question 1.

For a Claw graph, we see that \(G_S\) is the 2-power of path \(P_S=v_2,a,v_1,v_3,v_4\); and \(G_T\) is the 2-power of path \(P_T=v_3,b,v_1,v_2,v_4\). For a 3-sun graph, we find that \(G_S\) and \(G_T\) are 4-powers of paths \(P_S=v_5,a,b,v_4,v_6,v_3,v_1,v_2\) and \(P_T=v_1,v_6,x,v_5,v_2,v_4,y,v_3\), respectively. For a Net graph, we see that \(G_S\) and \(G_T\) are 2-powers of paths \(P_S=v_4,v_2,v_1,v_5,v_3,a,v_6\) and \(P_T=v_4,b,v_1,v_5,v_3,v_2,v_6\), respectively.
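These three pairs of paths can be checked by computing the intersections directly. The sketch below uses hypothetical helper names (`power_of_path`, `intersection`, `degree_sequence`). Since the exact vertex labeling of Figs. 6 to 8 is not restated in the text, it compares degree sequences rather than fixed edge lists, except for the claw, whose center is forced to be \(v_1\).

```python
from collections import Counter
from itertools import combinations

def power_of_path(labels, theta):
    """Edge set of the theta-power of the path listed in `labels` (hypothetical helper)."""
    position = {v: i for i, v in enumerate(labels)}
    return {frozenset((u, v)) for u, v in combinations(labels, 2)
            if abs(position[u] - position[v]) <= theta}

def intersection(p_s, p_t, theta):
    """Common edges of the theta-powers of the two paths."""
    return power_of_path(p_s, theta) & power_of_path(p_t, theta)

def degree_sequence(edges):
    degree = Counter(v for e in edges for v in e)
    return sorted(degree.values())

claw = intersection(["v2", "a", "v1", "v3", "v4"],
                    ["v3", "b", "v1", "v2", "v4"], 2)
sun = intersection(["v5", "a", "b", "v4", "v6", "v3", "v1", "v2"],
                   ["v1", "v6", "x", "v5", "v2", "v4", "y", "v3"], 4)
net = intersection(["v4", "v2", "v1", "v5", "v3", "a", "v6"],
                   ["v4", "b", "v1", "v5", "v3", "v2", "v6"], 2)

print(sorted("".join(sorted(e)) for e in claw))
print(degree_sequence(sun), degree_sequence(net))
```

The filler vertices a, b, x and y never appear in an intersection because each occurs in only one of the two paths. The 3-sun intersection has a triangle on v2, v4, v6, and the net intersection has a triangle on v1, v3, v5 with one pendant vertex attached to each corner.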

References

  1. Adam Z, Choi V, Sankoff D, Zhu Q (2008) Generalized gene adjacencies, graph bandwidth and clusters in yeast evolution. In: Lecture Notes in Bioinformatics, vol 4983, pp 134–145

  2. Brandstädt A, Hundt C, Mancini F, Wagner P (2010) Rooted directed path graphs are leaf powers. Discrete Math 310:897–910

  3. Corneil DG, Kim H, Natarajan S, Olariu S, Sprague A (1995) Simple linear time recognition of unit interval graphs. Inf Process Lett 55:99–104

  4. Figueiredo CMH, Meidanis J, Mello CP (1995) A linear-time algorithm for proper interval graph recognition. Inf Process Lett 56:179–184

  5. Lin MC, Rautenbach D, Soulignac FJ, Szwarcfiter JL (2011) Powers of cycles, powers of paths, and distance graph. Discrete Appl Math 159:621–627

  6. Lin MC, Soulignac FJ, Szwarcfiter JL (2009) Short models for unit interval graphs. Electron Notes Discrete Math 35:247–255

  7. Roberts FS (1968) Representations of indifference relations. Stanford University, Stanford

  8. Sankoff D, Xu X (2008) Tests for gene clusters satisfying the generalized criterion. Lect Notes Comput Sci 5167:152–160

  9. Soulignac FJ (2010) On proper and helly circular-arc graphs. Universidad de Buenos Aires, Buenos Aires

Acknowledgements

This research was supported by CNPq and FAPERJ.

We are really grateful to professor Jayme Szwarcfiter for having presented to us the paper [5] in the very beginning of our work and for fruitful discussions on this topic. We are also thankful to the anonymous referees for their careful reading and valuable contributions.

Author information

Corresponding author

Correspondence to Simone Dantas.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Costa, V., Dantas, S., Sankoff, D. et al. Gene clusters as intersections of powers of paths. J Braz Comput Soc 18, 129–136 (2012). https://doi.org/10.1007/s13173-012-0064-8

Keywords

  • Power of a path
  • Unit interval graph
  • Genome
  • Gene clusters