Gene clusters as intersections of powers of paths

Costa, Vítor; Dantas, Simone; Sankoff, David; Xu, Ximing

doi:10.1007/s13173-012-0064-8

Volume 18 Supplement 2

GraphCliques

SI: GraphCliques
Open access
Published: 23 February 2012


Journal of the Brazilian Computer Society volume 18, pages 129–136 (2012)


Abstract

There are various definitions of a gene cluster determined by two genomes and methods for finding these clusters. However, there is little work on characterizing configurations of genes that are eligible to be a cluster according to a given definition. For example, given a set of genes in a genome, is it always possible to find two genomes such that their intersection is exactly this cluster? In one version of this problem, we use graph theory to reformulate it as follows: Given a graph G with n vertices, do there exist two θ-powers of paths G_S=(V_S,E_S) and G_T=(V_T,E_T) such that G_S∩G_T contains G as an induced subgraph? In this work, we divide the problem into two cases, depending on whether or not G is an induced subgraph of G_S or G_T. We show an \(\mathcal{O}(n^{2})\) time algorithm that generates the smallest θ-powers of paths G_S and G_T (with respect to θ and the number of vertices) that contain G as an induced subgraph. Finally, we discuss the problem when G is an induced subgraph neither of G_S nor of G_T and we present a method of finding the smallest power of a path when graph G is a cycle C_n.

1 Introduction

Due to recent research on genetic mapping, a large amount of information is available and stored in databases of various research centers around the world. Processing these data in order to obtain relevant biological conclusions is one of the challenges in biology. One way to structure these data is the comparison of genomes, i.e., the search for similarities and differences between two or more organisms. The central question of this paper addresses a problem in this area: given a set of genes in a genome, called a cluster, is it always possible to find two genomes such that their intersection is exactly this cluster? First, we present the model of Adam et al. [1] and Sankoff and Xu [8], which will be used in this paper.

A marker is a gene with a known location on a chromosome. Let V_X be the set of n markers in the genome X. These markers are partitioned among a number of total orders called chromosomes. For markers g and h in V_X on the same chromosome in X, let gh∈E_X if the number of genes intervening between g and h in X is less than θ, where θ≥1 is a fixed neighborhood parameter. We call G_X=(V_X,E_X) a θ-adjacency graph if its edges are determined by a neighborhood parameter θ.
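The definition above can be illustrated with a minimal Python sketch. It assumes that every gene on a chromosome is a marker, so the number of intervening genes between two markers is the difference of their positions minus one; the function name `theta_adjacency_edges` and the toy genome are hypothetical and not part of the original paper.

```python
from itertools import combinations

def theta_adjacency_edges(chromosomes, theta):
    """Return the edge set E_X of a theta-adjacency graph.

    chromosomes: list of lists, each an ordered list of marker names.
    Two markers on the same chromosome are joined when fewer than
    theta genes lie between them (assumption: every gene is a marker).
    """
    edges = set()
    for chrom in chromosomes:
        for i, j in combinations(range(len(chrom)), 2):
            intervening = j - i - 1
            if intervening < theta:
                edges.add(frozenset((chrom[i], chrom[j])))
    return edges

# Hypothetical genome with two chromosomes.
genome_x = [["a", "b", "c", "d"], ["e", "f"]]
edges_x = theta_adjacency_edges(genome_x, theta=2)
```

Markers on different chromosomes are never joined, which is why the loop treats each chromosome independently.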

Consider the θ-adjacency graphs G_S=(V_S,E_S) and G_T=(V_T,E_T) with a non-empty set of common vertices V_ST=V_S∩V_T. We say that a subset V⊆V_ST is a generalized adjacency cluster if it consists of the vertices of a maximal connected subgraph of G_ST=(V_ST,E_S∩E_T). We call G=G_ST[V] the subgraph induced by the set V.
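As a rough illustration, generalized adjacency clusters can be computed as the connected components of G_ST. The sketch below assumes edges are stored as two-element frozensets; the helper names and the small example are hypothetical.

```python
def generalized_adjacency_clusters(v_s, e_s, v_t, e_t):
    """Vertex sets of the connected components of G_ST = (V_S ∩ V_T, E_S ∩ E_T)."""
    v_st = set(v_s) & set(v_t)
    # Keep only common edges whose endpoints are both common vertices.
    e_st = {e for e in set(e_s) & set(e_t) if e <= v_st}
    adjacency = {v: set() for v in v_st}
    for e in e_st:
        a, b = tuple(e)
        adjacency[a].add(b)
        adjacency[b].add(a)
    clusters, seen = [], set()
    for start in v_st:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search from `start`
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        clusters.append(component)
    return clusters

def edge(a, b):
    return frozenset((a, b))

# Hypothetical pair of adjacency graphs.
v_s = ["a", "b", "c", "d"]
e_s = {edge("a", "b"), edge("b", "c"), edge("c", "d")}
v_t = ["a", "b", "c", "d", "e"]
e_t = {edge("a", "b"), edge("c", "d"), edge("d", "e")}
clusters = generalized_adjacency_clusters(v_s, e_s, v_t, e_t)
```

In this example only the edges ab and cd are shared, so the common vertices split into two clusters.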

Let G=(V(G),E(G)) be a graph with vertex set V(G) and edge set E(G), such that |V(G)|=n. Let v, \(\bar{v} \in V(G)\). The distance between vertices v and \(\bar{v}\), denoted by \(d_{G}(v,\bar{v})\), is the number of edges in a shortest path between v and \(\bar{v}\) in G. A path between two vertices v₁ and v_t of graph G is a sequence of vertices v₁,v₂,…,v_t such that v_iv_{i+1} is an edge of G, 1≤i≤t−1. Let P_n be a graph that is a path with n vertices. A θ-power of a path \(P_{n_{\theta}}\), denoted by \(P^{\theta}_{n_{\theta}}\), θ>0, is the graph such that \(V(P^{\theta}_{n_{\theta}}) = V(P_{n_{\theta}})\) and \(E(P_{n_{\theta}}^{\theta}) = \{v\bar{v} : d_{P_{n_{\theta}}}(v, \bar{v})\leq \theta \ \mathrm{with} \ v, \bar{v} \in V(P_{n_{\theta}}^{\theta})\}\). For brevity, we denote the power of a path \(P^{\theta}_{n_{\theta}}\) by P^θ. The definition of a chromosome with n_θ markers in a θ-adjacency graph is analogous to that of a power of a path \(P^{\theta}_{n_{\theta}}\). Now, the central question of this work can be reformulated as follows:

Question 1

([2, 5])

Given a connected graph G, do there exist G_S and G_T, two θ-powers of paths P_S and P_T, whose intersection contains G as an induced subgraph?

If the answer is yes, we are also interested in finding the minimum value of the power θ and of the number of vertices n_θ for these two θ-powers of paths.
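For a concrete candidate pair of paths, Question 1 can be checked directly. The following sketch is only an illustration under stated assumptions: paths are given as vertex lists, edges are two-element frozensets, and the function names and the example graph are hypothetical.

```python
def power_of_path_edges(path, theta):
    """Edges of the theta-power of the path whose vertices are listed in order."""
    n = len(path)
    return {frozenset((path[i], path[j]))
            for i in range(n)
            for j in range(i + 1, min(i + theta, n - 1) + 1)}

def intersection_induces(graph_vertices, graph_edges, path_s, path_t, theta):
    """True if G_S ∩ G_T, restricted to V(G), has exactly the edges of G."""
    vertices = set(graph_vertices)
    if not (vertices <= set(path_s) and vertices <= set(path_t)):
        return False
    common = power_of_path_edges(path_s, theta) & power_of_path_edges(path_t, theta)
    induced = {e for e in common if e <= vertices}
    return induced == set(graph_edges)

# Hypothetical example: G is the path a-b-c; x and y are extra vertices.
g_vertices = ["a", "b", "c"]
g_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}
ok_theta_2 = intersection_induces(g_vertices, g_edges,
                                  ["a", "b", "x", "c"], ["a", "b", "y", "c"], 2)
ok_theta_3 = intersection_induces(g_vertices, g_edges,
                                  ["a", "b", "x", "c"], ["a", "b", "y", "c"], 3)
```

With θ=3 the pair a, c becomes adjacent in both powers, so the intersection no longer induces G; this is the kind of unwanted edge that a suitable placement has to avoid.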

In order to contribute to this challenging problem, we divide our study into two cases, depending on whether or not G is an induced subgraph of G_S or G_T. First, we give some definitions. We say that G is a unit interval graph if there exists a family I of intervals (a,b) on the real line such that each v∈V(G) can be put in a one-to-one correspondence with (a_v,b_v)∈I; the intervals in I are of the same length; and \(v\bar{v}\) is an edge of E(G) if, and only if, \((a_{v}, b_{v}) \cap (a_{\bar{v}}, b_{\bar{v}}) \neq \emptyset\). This family of intervals is called an interval model for G. Lin et al. [6] and Soulignac [9] present a proof that the class of proper interval graphs is precisely the class of unit interval graphs. There exist linear-time recognition algorithms for unit interval graphs, for example those of Figueiredo et al. [4] and Corneil et al. [3].
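To make the definition concrete, the sketch below builds the graph of a unit interval model. It assumes open intervals of a common length, so two intervals intersect exactly when their left endpoints differ by less than that length; the function name and the sample model are hypothetical.

```python
def unit_interval_graph_edges(left_endpoints, length=1.0):
    """Edges of the graph whose vertices are open intervals (a_v, a_v + length).

    left_endpoints: dict mapping each vertex to the left endpoint a_v.
    """
    names = list(left_endpoints)
    return {frozenset((v, w))
            for i, v in enumerate(names)
            for w in names[i + 1:]
            if abs(left_endpoints[v] - left_endpoints[w]) < length}

# Hypothetical interval model with unit length.
model = {"a": 0.0, "b": 0.5, "c": 1.2, "d": 2.5}
model_edges = unit_interval_graph_edges(model)
```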

Brandstädt et al. [2] and Lin et al. [5] proved independently the following structural property:

Theorem 1

([2, 5])

A graph G is an induced subgraph of a power of a path if, and only if, G is a unit interval graph.

Thus, given a unit interval graph G with n vertices, there exists a θ-power of a path \(P_{n_{\theta}}\) that contains G as an induced subgraph. However, the proofs of the structural characterization given by Theorem 1 [2, 5] do not lead to an algorithm that constructs G_S and G_T for Question 1 with minimum values of the power θ and the number of vertices n_θ.

In [6], the authors show an \(\mathcal{O}(n)\) time algorithm that includes new intervals into a proper interval model I of a connected graph G, constructing an extended model I′ containing I. This extended model I′ gives an implicit representation of a power of a path for every proper interval graph G, but the number of inserted intervals, or the size of the power θ, is not guaranteed to be minimum. The authors also remark that any explicit representation would require \(\mathcal{O}(n^{2})\) steps.

We present in this work an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph G, an explicit representation of the smallest θ-power of a path, G_S (with respect to θ and to the number of vertices), that contains G as an induced subgraph. Next, we construct G_T, a θ-power of a path with the same number of vertices as G_S, such that the intersection G_S∩G_T contains G as an induced subgraph.

This paper is organized as follows. In Sects. 2 and 3, we present the algorithm and we prove its correctness and complexity. In Sect. 4, we discuss the problem when G is an induced subgraph neither of G_S nor of G_T and we present a method of finding the smallest power of a path when graph G is a cycle C_n.

2 The algorithm

Our result is based on the ordering of the vertex set of G, given by Algorithm Recognize [3], which satisfies the property proved by Roberts in [7]:

Property 2

A graph G is a unit interval graph if and only if there is an order < on vertices such that for all vertices v, the closed neighborhood of v is a set of consecutive vertices with respect to the order <.
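Property 2 is easy to test for a given order. The sketch below is a direct check and not Algorithm Recognize [3]; the function name and the two small examples (a path and a claw) are hypothetical.

```python
from itertools import permutations

def satisfies_consecutive_neighborhoods(order, edges):
    """Check that every closed neighborhood is consecutive in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        closed = {pos[v]}
        for e in edges:
            if v in e:
                closed |= {pos[w] for w in e}
        # A set of positions is consecutive iff its span equals its size.
        if max(closed) - min(closed) + 1 != len(closed):
            return False
    return True

path_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}
good = satisfies_consecutive_neighborhoods(["a", "b", "c"], path_edges)
bad = satisfies_consecutive_neighborhoods(["a", "c", "b"], path_edges)

# A claw with center c and leaves x, y, z admits no such order.
claw_edges = {frozenset(("c", leaf)) for leaf in ("x", "y", "z")}
claw_has_order = any(satisfies_consecutive_neighborhoods(list(p), claw_edges)
                     for p in permutations(["c", "x", "y", "z"]))
```

The exhaustive search over permutations is only meant for tiny graphs; it illustrates why the claw is not a unit interval graph.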

Since all powers of paths are unit interval graphs, we can insert the vertices of V(G) in the vertex set of a power of a path \(P^{\theta}_{n_{\theta}}\) until this power of a path contains G as an induced subgraph.

This construction is done by Algorithm CPP as follows. First, let v₁<v₂<⋯<v_n be an ordering of V(G) given by Algorithm Recognize [3]. We take θ₀ to be the number of vertices of the maximal clique that contains v₁, minus one, and we insert the vertices of this clique in \(P^{\theta_{0}}\). Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subset P^{\theta_{1}} \subset \cdots \subset P^{\theta_{l-1}} \subset P^{\theta_{l}}\) such that θ_i=θ_{i−1}+1.

Let v be the first vertex non-adjacent to v₁ in the order on V(G). If v is adjacent to v₂, Algorithm CPP must insert v in the vertex of \(P^{\theta_{0}}\) that is at distance θ₀+1 from vertex v₁ in \(P^{\theta_{0}}\). Similarly, if v is not adjacent to v_t, but is adjacent to v_t+1, Algorithm CPP must insert v in the vertex of \(P^{\theta_{0}}\) that is at a distance θ₀+1 from vertex v_t in \(P^{\theta_{0}}\). This is done by inserting t−1 vertices between the vertex of largest index adjacent to v₁ and v in \(P^{\theta_{0}}\). Now, suppose that there exist at least two vertices v, \(\bar{v}\) that are not adjacent to v₁ and adjacent to v₂. Let \(\bar{v}\) be the second vertex of this set. In order to minimize the number of vertices of \(P^{\theta_{0}}\), vertex \(\bar{v}\) must be a vertex of \(P^{\theta_{0}}\) at distance θ₀+2 of vertex v₁ in \(P^{\theta_{0}}\). Then Algorithm CPP must call Procedure SHIFT to increase θ₀ to θ₁:=θ₀+1 because of the edge \(\bar{v}v_{2}\). On the other hand, this increase adds several edges in \(P^{\theta_{0}}\) which are not in E(G). Thus, Procedure SHIFT adjusts the power of a path \(P^{\theta_{0}}\) for the new θ₁, by inserting vertices in \(P^{\theta_{0}}\) in order to preserve the adjacencies and non-adjacencies between vertices of G and generates a new \(P^{\theta_{1}}\). Algorithm CPP proceeds until all vertices of V(G) are included in \(P_{n_{\theta}}^{\theta}\), a smallest power of a path with respect to θ and n_θ.

Before describing Algorithm CPP, we borrow some definitions from [3]. Given an ordering of V(G) returned by Algorithm Recognize [3], then order_G(v) is the position of vertex v considering this ordering; \(\xi_{G}(v) = \textrm{max} \{\mathrm {order}_{G}(\overline{v}) :\bar{v} \in N_{G}[v]\}\) and \(\eta_{G}(v) = \textrm{min} \{\mathrm {order}_{G}(\bar{v}): \bar{v} \in N_{G}[v]\}\), where N_G[v]={w∈V(G):vw∈E(G)}∪{v}. Let v∈V(G) and u∈V(P^θ). We refer to \(\mathrm {order}_{P^{\theta}}(v)\) as the position of vertex v in the ordering of the vertex set of P^θ, i.e., \(\mathrm {order}_{P^{\theta}}(v)= i\), if u_i=v in P^θ. We denote \(\xi_{P^{\theta}}(u)= \textrm{max} \{\mathrm {order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\) and \(\eta_{P^{\theta}}(u) = \textrm{min} \{\mathrm {order}_{P^{\theta}}(\bar{u}) :\bar{u} \in N_{P^{\theta}}[u]\}\).
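The quantities ξ_G and η_G can be computed directly from an ordering. The sketch below uses 1-based positions, as in the text; the function name and the example graph (a triangle v1, v2, v3 followed by the edges v3v4 and v4v5) are hypothetical. The initial power θ₀=ξ_G(v₁)−1 follows the description of Algorithm CPP.

```python
def xi_eta(order, edges):
    """Return (xi, eta): largest and smallest 1-based position in each N_G[v]."""
    pos = {v: i + 1 for i, v in enumerate(order)}
    closed = {v: {pos[v]} for v in order}
    for e in edges:
        a, b = tuple(e)
        closed[a].add(pos[b])
        closed[b].add(pos[a])
    xi = {v: max(closed[v]) for v in order}
    eta = {v: min(closed[v]) for v in order}
    return xi, eta

order = ["v1", "v2", "v3", "v4", "v5"]
edges = {frozenset(p) for p in [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
                                ("v3", "v4"), ("v4", "v5")]}
xi, eta = xi_eta(order, edges)
theta_0 = xi[order[0]] - 1  # size of the maximal clique containing v1, minus one
```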

Next, we present Algorithm CPP and Procedure SHIFT.

Algorithm

CONSTRUCTING_POWER_OF_PATH(CPP)

Procedure SHIFT receives as input a smallest power of a path P^θ that contains G[v₁,…,v_{l−1}], ξ_G(v₁)+1≤l≤n, as an induced subgraph. Power P^θ contains the last vertex v_l inserted by Algorithm CPP. Vertex v_l triggers Procedure SHIFT because v_l is not adjacent to some vertex v_{l−t} in P^θ, although v_{l−t}v_l∈E(G).

Procedure

SHIFT

Algorithm CPP returns \(P_{n_{\theta}}^{\theta}\), the smallest power of a path (with respect to θ and n_θ) that contains the unit interval graph G as an induced subgraph. We construct two powers of paths, G_T=(V_T,E_T) and G_S=(V_S,E_S), from \(P_{n_{\theta}}^{\theta}\) as follows. First, \(V_{T}= V_{S} = V(P^{\theta}_{n_{\theta}})\). Then, the vertices of V_T that are not in V receive new labels that do not occur in \(V(P_{n_{\theta}}^{\theta})\).

We show an example of a unit interval graph G in Fig. 1. For this graph G, Algorithm CPP returns G_S, the 2-power of path P_S=v₁,v₂,v₃,0,v₄,v₅. Then, G_T is a 2-power of path P_T=v₁,v₂,v₃,v_b,v₄,v₅.
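The relabeling step can be sketched as follows. This is an illustration under assumptions: the output of Algorithm CPP is taken to be a list of vertices, the extra vertex is written as a hypothetical filler "f", fresh labels of the form "t1", "t2", … are assumed not to collide with existing names, and the example path only mirrors the shape of the example above.

```python
def power_of_path_edges(path, theta):
    """Edges of the theta-power of the path whose vertices are listed in order."""
    n = len(path)
    return {frozenset((path[i], path[j]))
            for i in range(n)
            for j in range(i + 1, min(i + theta, n - 1) + 1)}

def relabel_fillers(path, graph_vertices, prefix="t"):
    """Copy `path`, giving a fresh label to every vertex that is not in V(G)."""
    fresh, result = 0, []
    for v in path:
        if v in graph_vertices:
            result.append(v)
        else:
            fresh += 1
            result.append(f"{prefix}{fresh}")
    return result

graph_vertices = {"v1", "v2", "v3", "v4", "v5"}
path_s = ["v1", "v2", "v3", "f", "v4", "v5"]  # "f" is a hypothetical filler
path_t = relabel_fillers(path_s, graph_vertices)
common_edges = power_of_path_edges(path_s, 2) & power_of_path_edges(path_t, 2)
```

Because the filler vertices carry different labels in the two paths, every edge incident to a filler drops out of the intersection, and only edges between vertices of G remain.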

3 Proofs

In this section, we present the proofs of correctness of the Procedure SHIFT (Lemma 1) and Algorithm CPP (Theorem 4).

Lemma 1

Let P^θ be a smallest power of a path that contains G_{l−1}=G[v₁,…,v_{l−1}] as an induced subgraph, with respect to the ordering v₁<⋯<v_{l−1}. Let v_l∈V(G) be the next vertex inserted in P^θ and \(v_{l-t-1}v_{l} \not\in E(G), \ v_{l-t}v_{l} \in E(G)\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l})= \theta+1\). Then, the output of the Procedure SHIFT, the power of a path P^{θ+1}, is a smallest power of a path that contains G_l=G[v₁,…,v_{l−1},v_l] as an induced subgraph, with respect to the ordering v₁<⋯<v_{l−1}<v_l.

Proof

Since \(v_{l-t-1}v_{l} \not\in E(G)\), v_{l−t}v_l∈E(G) and \(\theta+1= d_{P_{n_{\theta}}}(v_{l-t}, v_{l})\), the Procedure SHIFT must increase the power θ by one unit (Step 1). But the increase of θ to θ+1 creates several adjacencies in P^θ between pairs of vertices of the set {v₁,…,v_l} that are non-adjacent in G. In order to preserve the adjacencies and non-adjacencies between vertices of G in P^θ, Procedure SHIFT is forced to insert one vertex between the vertex that received v_{l−t−1} in P^θ and its consecutive vertex in P^θ. Again, counting in descending order from vertex v_{l−t−1}, the adjacencies are violated in each “block” of θ vertices in P^θ. So, the procedure must insert one vertex for every θ+1 vertices in descending order, starting from vertex v_{l−t−1} in P^θ. We observe that the set formed by the initial vertices of V(P^θ) has cardinality less than or equal to θ+1, because when \(\mathrm {order}_{P^{\theta}}(v_{l-t-1})\) is divided by θ+1 the remainder is taken to be greater than or equal to 1 and less than or equal to θ+1.

In each step, the procedure inserts the smallest number of vertices necessary to guarantee that the power of a path P^θ+1, created by Procedure SHIFT, contains G_l[v₁,…,v_l] as an induced subgraph. So, the power θ+1 and the number of inserted vertices are minimum and, consequently, P^θ+1 is a smallest power of a path that contains G_l[v₁,…,v_l] as an induced subgraph. □

First, we prove that Algorithm CPP correctly returns a smallest power of a path according to the ordering given by Algorithm Recognize [3].

Lemma 2

Let G be a connected unit interval graph. Algorithm CPP generates the smallest power of a path \(P_{n_{\theta}}^{\theta}\), with respect to θ and n_θ, that contains G as an induced subgraph according to the ordering v₁<⋯<v_n given by the input of CPP.

Proof

Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subseteq P^{\theta_{1}} \subseteq \cdots \subseteq P^{\theta}\), where θ_i=θ_i−1+1. This is done by successively adding, in each \(P^{\theta_{i}}\), vertices of G following the input ordering, preserving the adjacencies and non-adjacencies between vertices of G and minimizing θ and n_θ. Initially, the power of a path \(P^{\theta_{0}}\) receives the maximal clique containing v₁, i.e., \(V(P^{\theta_{0}}) =\{u_{1}, \ldots, u_{\xi_{P^{\theta_{0}}}(v_{1})}\}\) and θ₀=ξ_G(v₁)−1. This is the smallest power of a path that contains \(G[v_{1}, \ldots, v_{\xi_{G}(v_{1})}]\) as an induced subgraph.

Suppose that the first l−1 vertices, i.e., {v₁,…,v_{l−1}}, have already been inserted by Algorithm CPP in the power of a path P^θ, i.e., P^θ is the smallest power of a path, with respect to θ and n_θ, that contains G[v₁,…,v_{l−1}] as an induced subgraph. Let v_l∈V(G) be the next vertex to be inserted by Algorithm CPP in P^θ. Suppose that v_l is adjacent, in P^θ, to {v_{l−t},…,v_{l−1}}. Vertex v_l must be inserted in P^θ between positions \(\xi_{P^{\theta}}(v_{l-t-1})+1\) and \(\xi_{P^{\theta}}(v_{l-t})\) so that G[v₁,…,v_l] is an induced subgraph of P^θ. Then, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta+1\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). We consider two cases with respect to the adjacencies of v_l in G. From now on, we refer to Figs. 2, 3 and 4, where dashed lines represent adjacencies.

Case 1: If t=θ, then after insertion of v_l, \(\theta + 1 \leq d_{P_{n_{\theta}}}(v_{l-t-1},v_{l})\), because the set {v_l−t,…,v_l−1} has t=θ elements (see Fig. 2). In order to minimize θ and n_θ, Algorithm CPP must insert v_l in the consecutive vertex to v_l−1 in the power of a path P^θ, and as a consequence \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta + 1\). In effect, since v_l−1 is adjacent to v_l−t in P^θ, by hypothesis, v_l−1 was inserted in P^θ such that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) \leq \theta\) and v_l was inserted in the consecutive vertex to v_l−1 in P^θ, then the claim is true. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\), Algorithm CPP inserted v_l without changing θ, the number of vertices of P^θ became n_θ+1, and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta+1\), Algorithm CPP called the Procedure SHIFT and, by Lemma 1, we conclude the proof.

Case 2: If 1<t<θ, Algorithm CPP must insert v_l in P^θ such that \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta + 1\) so that v_l−t−1 and v_l are not adjacent. We observe the position of v_l−1 in P^θ. If v_l−1 is not adjacent to v_l−t−1 in P^θ (see Fig. 3), in order to minimize the number of vertices of P^θ, Algorithm CPP inserts v_l in the consecutive vertex to v_l−1. By hypothesis, vertex v_l−1 was inserted in P^θ so that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1})\leq \theta\). Then, if \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) < \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). Thus v_l was inserted in P^θ without changing θ, the number of vertices of P^θ became n_θ+1, and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) = \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta + 1\), Procedure SHIFT was called and, by Lemma 1, we conclude the proof.

If v_{l−1} is adjacent to v_{l−t−1} in P^θ (see Fig. 4), the position of v_l−t−1 in P^θ is between l−t+1 and \(\xi_{P^{\theta}}(v_{l-t-1})\), including them. Again, in order to minimize the number of vertices of P^θ, vertex v_l is inserted \((\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm {order}_{P^{\theta}}(v_{l-1}))\) vertices after vertex v_{l−1} in P^θ. Thus, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta + 1\).

Since \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) < d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). So, v_l was inserted in P^θ without changing θ, and the number of vertices of P^θ became \(n_{\theta}+(\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm {order}_{P^{\theta}}(v_{l-1}))+1\). This insertion was minimum, because \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\).

This concludes the proof of Lemma 2. □

In order to show that Algorithm CPP returns the smallest power of a path containing G as an induced subgraph, we present two results about a given power of a path P^σ containing G as an induced subgraph. First, we recall some notation from [3]. Given a unit interval graph G and a unit interval model associated to its vertices I={v₁,v₂,…,v_n}, we recall that the interval associated to vertex v is (a_v,b_v). We say that v₁,v₂,…,v_n is a natural labeling for the vertices of G, if \(a_{v_{i}} \leq a_{v_{i+1}}\), for each 1≤i≤n−1. The ordering v₁<v₂<…<v_n is a natural ordering, if v₁,v₂,…,v_n is a natural labeling for V(G). A vertex is a left anchor if it can receive the label v₁ in some natural labeling for V(G). Consider the model I′ obtained by mirroring a unit interval model I (that is, replacing each interval (a,b) by (−b,−a)). Model I′ is also a valid unit interval model for G, so the rightmost interval in I is also a left anchor.

In the next results, we show properties of the ordering of V(G) induced by a natural ordering that is generated by the subscripts of a natural labeling of a power of a path.

Lemma 3

Let \(P_{n_{\sigma}}^{\sigma}\) be a power of a path that contains G as an induced subgraph. The ordering of the vertices of V(G) induced by a natural ordering of the vertices of V(P^σ) satisfies Property 2.

Proof

Suppose that this ordering of V(G) does not satisfy Property 2. Then, there exist three vertices v_r,v_s,v_t∈V(G) such that v_r<v_s<v_t with \(v_{r}v_{s}\not\in E(G)\) and v_rv_t∈E(G). It follows that v_rv_t∈E(G)⊂E(P^σ). Therefore, \(1\leq |\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{t})|\leq \sigma\). Since v_r<v_s<v_t in V(P^σ), we have \(|\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{s})|\leq |\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{t})|\), and then \(1\leq |\mathrm {order}_{P^{\sigma}}(v_{r}) - \mathrm {order}_{P^{\sigma}}(v_{s})|\leq \sigma\). Consequently, v_rv_s∈E(P^σ) and \(v_{r}v_{s} \not\in E(G)\), i.e., \(P_{n_{\sigma}}^{\sigma}\) does not contain G as an induced subgraph, a contradiction. □

Vertices \(v, \overline{v} \in V(G)\) are indistinguishable vertices (twin vertices) if \(N_{G}[v] = N_{G}[\overline{v}]\). The next result states that indistinguishable vertices of V(G) can exchange their positions in a natural ordering of V(P^σ).

Lemma 4

Let v, \(\overline{v} \in V(G)\) be such that \(N_{G}[v] = N_{G}[\overline{v}]\) with v=u_i and \(\overline{v} = u_{j}\) in V(P^σ). If we exchange the positions of vertices v and \(\overline{v}\) in P^σ, i.e., v=u_j and \(\overline{v} = u_{i}\), graph G will still be an induced subgraph of P^σ.

Proof

Without loss of generality, suppose i<j. By Lemma 3, the ordering of V(G) induced by a natural ordering of V(P^σ) satisfies Property 2. So, \(N_{G}[v]=\{v_{\eta_{G}(v)},\ldots, v_{\xi_{G}(v)}\}\) and \(N_{G}[\overline{v}]=\{v_{\eta_{G}(\overline{v})}, \ldots,v_{\xi_{G}(\overline{v})}\}\). Since \(N[v] = N[\overline{v}]\), we have \(\xi_{G}(\overline{v}) =\xi_{G}(v)\) and \(\eta_{G}(\overline{v}) = \eta_{G}(v)\). Then, \(v_{\xi_{G}(v)} =v_{\xi_{G}(\overline{v})}\), \(v_{\eta_{G}(v)} = v_{\eta_{G}(\overline{v})}\), \(v_{\xi_{G}(v)+1}=v_{\xi_{G}(\overline{v})+1}\) and \(v_{\eta_{G}(v)-1} = v_{\eta_{G}(\overline{v})-1}\). Thus, by changing the positions of vertices v and \(\overline{v}\) in P^σ, we have \(\mathrm {order}_{P^{\sigma}}(\overline{v}) - \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(\overline{v})-1}) \geq \sigma +1\); then edge \(\overline{v}v_{\eta_{G}(\overline{v})-1} \not\in E(P^{\sigma})\). Also, for any v′∈V(G) with \(\mathrm {order}_{G}(v') < \mathrm {order}_{G}(v_{\eta_{G}(\overline{v})-1})\), edge \(\overline{v}v' \not\in E(P^{\sigma})\). Similarly, \(\mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(v)+1}) - \mathrm {order}_{P^{\sigma}}(v)\geq \sigma + 1\), i.e., edge \(vv_{\xi_{G}(v)+1} \not\in E(P^{\sigma})\) and also, for any v′∈V(G) with \(\mathrm {order}_{G}(v_{\xi_{G}(v)+1}) < \mathrm {order}_{G}(v')\), edge \(vv' \not\in E(P^{\sigma})\).

Analogously, \(\sigma \geq \mathrm {order}_{P^{\sigma}}(\overline{v})- \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(v)})\), i.e., edge \(\overline{v}v_{\eta_{G}(\overline{v})} \in E(P^{\sigma})\) and, for any v′∈V(G) with \(\mathrm {order}_{G}(v_{\eta_{G}(\overline{v})}) < \mathrm {order}_{G}(v') < \mathrm {order}_{G}(\overline{v})\), edge \(\overline{v}v' \in E(P^{\sigma})\). Similarly \(\sigma \geq \mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(\overline{v})}) - \mathrm {order}_{P^{\sigma}}(v)\), i.e., edge \(vv_{\xi_{G}(v)} \in E(P^{\sigma})\) and, for any v′∈V(G) with \(\mathrm {order}_{G}(v) < \mathrm {order}_{G}(v') <\mathrm {order}_{G}(v_{\xi_{G}(v)})\), edge vv′∈E(P^σ). □

In what follows, we write v_i<_Bv_j if order_G(v_i)<order_G(v_j) considering the ordering of V(G) given by Algorithm Recognize [3]. Before proving Theorem 4, we need two results.

Theorem 3

(Theorem 2.2 [3])

Let I be a unit interval model of a unit interval graph G with natural labeling v₁,…,v_n. Then, for all vertices \(\bar{v},\ v \in V(G)\), if \(a_{\bar{v}} < a_{v}\) but \(v <_{B} \bar{v}\), we have \(N_{G}[v]=N_{G}[\bar{v}]\).

As a consequence of Theorem 2.3 of [3], we have the following result.

Lemma 5

([3])

Let \(v'_{1} <_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\) be an ordering of V(G) given by Algorithm Recognize [3] of a unit interval graph G. Given a natural labeling v₁,…,v_n, then \(N_{G}[v'_{1}]=N_{G}[v_{1}]\) or \(N_{G}[v'_{1}]=N_{G}[v_{n}]\).

Finally, the correctness of Algorithm CPP is given by the theorem below.

Theorem 4

Let G be a unit interval graph. Algorithm CPP returns the smallest power of a path \(P^{\theta}_{n_{\theta}}\), with respect to θ and n_θ, that contains G as an induced subgraph.

Proof

Let \(P^{\sigma}_{n_{\sigma}}\) be the smallest power of a path that contains G as an induced subgraph. Let \(\overline{u}_{1} < \cdots < \overline{u}_{n_{\sigma}}\) be a natural ordering of V(P^σ) and let \(\overline{v}_{1} < \cdots < \overline{v}_{n}\) be the ordering of V(G) induced by the natural ordering of V(P^σ). Clearly, \(\overline{v}_{1}, \ldots, \overline{v}_{n}\) is a natural labeling of V(G). Let I be a family of intervals for this labeling of V(G), such that each v∈V(G) is associated to (a_v,b_v)∈I.

If we prove that \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) is equal to \(v'_{1} <_{B}v'_{2} <_{B} \cdots <_{B} v'_{n}\) up to indistinguishable vertices, we have θ=σ and n_θ=n_σ. In fact, since P^σ is the smallest power of a path that contains G as an induced subgraph, then σ≤θ and n_σ≤n_θ. On the other hand, by Lemma 2, the power of a path P^θ returned by Algorithm CPP is the smallest power of a path that contains G as an induced subgraph with respect to the ordering \(v'_{1}<_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\). So, if this ordering is equal to \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\), up to indistinguishable vertices, by Lemma 4, P^σ contains G as an induced subgraph with respect to the ordering \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\). Then, by minimality of θ and n_θ with respect to \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\), we have σ≥θ and n_σ≥n_θ.

First, suppose that the left anchor \(\overline{v}_{1}\) is equal to \(v'_{1}\). Suppose, for contradiction, that there exist \(v, \ \tilde{v} \in V(G)\), such that \(v < \tilde{v}\), \(\tilde{v} <_{B} v\) and \(N_{G}[v] \neq N_{G}[\tilde{v}]\). Since \(v < \tilde{v}\) then \(a_{v} \leq a_{\tilde{v}}\). If \(a_{v} = a_{\tilde{v}}\), since all intervals of I have the same length, we have \(b_{v} = b_{\tilde{v}}\) and hence \(N_{G}[v] =N_{G}[\tilde{v}]\), a contradiction to the hypothesis. If \(a_{v} < a_{\tilde{v}}\), since \(\tilde{v} <_{B} v\) then, by Theorem 3, \(N_{G}[v] = N_{G}[\tilde{v}]\), a contradiction to the hypothesis. Thus, for every pair of vertices \(v, \tilde{v} \in V(G)\) such that \(v < \tilde{v}\) and \(\tilde{v} <_{B} v\), we have \(N_{G}[v] = N_{G}[\tilde{v}]\). Consequently, we have σ=θ and n_σ=n_θ.

Now, suppose that the left anchor \(\overline{v}_{1}\) is different from \(v'_{1}\). By Lemma 5, either \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\) or \(N_{G}[\overline{v}_{n}] =N_{G}[v'_{1}]\). If \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\), by Lemma 4, we can change the positions of these vertices in V(P^σ), i.e., \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(v'_{1})}=\overline{v}_{1}\) and \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(\overline{v}_{1})} = v'_{1}\) and G will still be an induced subgraph of P^σ. After this change \(v'_{1} < \overline{v}_{2} < \cdots <\overline{v}_{1} < \cdots < \overline{v}_{n}\) is the new ordering of V(G) induced by the ordering of V(P^σ). We repeat the same argument used in the previous case, where \(\overline{v}_{1}\) is equal to \(v'_{1}\) and we conclude the proof. If \(N_{G}[\overline{v}_{n}] = N_{G}[v'_{1}]\), since \(\overline{v}_{n}\) is the left anchor of the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} <\cdots < \overline{v}_{1}\) of V(G) induced by the natural ordering \(\overline{u}_{n_{\sigma}} < \cdots < \overline{u}_{1}\) of V(P^σ) then, we can repeat the previous argument for the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} < \cdots < \overline{v}_{1}\) and so we conclude the proof. □

The Algorithm CPP analyzes each vertex of G in the ordering returned by Algorithm Recognize [3] a single time. In the worst case, the Algorithm CPP calls Procedure SHIFT for each vertex v_l∈V(G) only once. Since for each vertex v_l the Procedure SHIFT analyzes the set of vertices of G_l at most once, the complexity of the Algorithm CPP is \(\mathcal{O}(n^{2})\).

4 G is not an induced subgraph of G_S and G_T

If we relax the constraint that G must be an induced subgraph of G_S or G_T then even for unit interval graphs it is possible to find two powers of paths, whose intersection contains G as an induced subgraph, smaller than the answer given by Algorithm CPP. See an example in Fig. 5.

If graph G is a unit interval graph, then G contains no induced Claw (Fig. 6), S₃ (Fig. 7), \(\overline{S}_{3}\) (Fig. 8) or Cycle C_n, n≥4. If G is a cycle C_n, n≥4, then the smallest θ-powers of paths, G_S and G_T, such that G_S∩G_T contains C_n as an induced subgraph can be obtained as follows. First, we construct G_S: for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), u_{2j−1}:=v_j; and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), u_{2j}:=v_{n+1−j}. Now, we construct G_T: for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), w_{2j−1}:=v_{j+1}; and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), w_{2j}:=v_k, where \(k = (n+2-j) \operatorname {mod}n\). See an example when G is a C₆ in Fig. 9.
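The construction for cycles can be checked numerically for small n. The sketch below follows the index formulas above, with one assumption that is not stated in the text: when (n+2−j) mod n equals 0, the index is read as n. Vertices are represented by their indices 1,…,n, and the function names are hypothetical.

```python
def power_of_path_edges(path, theta):
    """Edges of the theta-power of the path whose vertices are listed in order."""
    n = len(path)
    return {frozenset((path[i], path[j]))
            for i in range(n)
            for j in range(i + 1, min(i + theta, n - 1) + 1)}

def cycle_paths(n):
    """Return (P_S, P_T) for the cycle C_n as lists of vertex indices."""
    if n < 4:
        raise ValueError("the construction is stated for n >= 4")
    u = [None] * (n + 1)  # 1-based positions; slot 0 is unused
    w = [None] * (n + 1)
    for j in range(1, (n + 1) // 2 + 1):  # 1 <= j <= ceil(n/2)
        u[2 * j - 1] = j
        w[2 * j - 1] = j + 1
    for j in range(1, n // 2 + 1):        # 1 <= j <= floor(n/2)
        u[2 * j] = n + 1 - j
        k = (n + 2 - j) % n
        w[2 * j] = k if k != 0 else n     # assumption: index 0 is read as n
    return u[1:], w[1:]

def cycle_edges(n):
    """Edges of C_n on vertices 1..n."""
    return {frozenset((i, i % n + 1)) for i in range(1, n + 1)}

def intersection_for_cycle(n):
    p_s, p_t = cycle_paths(n)
    return power_of_path_edges(p_s, 2) & power_of_path_edges(p_t, 2)

p_s6, p_t6 = cycle_paths(6)
```

For n=6 this gives P_S=v₁,v₆,v₂,v₅,v₃,v₄ and P_T=v₂,v₁,v₃,v₆,v₄,v₅, and the intersection of the two 2-powers can be compared with `cycle_edges(6)`.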

Theorem 5

Let G_S and G_T be 2-powers of paths with n vertices constructed by the previous method. Then G_S∩G_T is C_n, n≥4.

Proof

Let G_S be the 2-power of path P_S=u₁,…,u_n, and let G_T be the 2-power of path P_T=w₁,…,w_n constructed by the previous method. Since the distance between consecutive vertices of G in G_S (resp. G_T) is less than or equal to 2, G_S (resp. G_T) contains G as subgraph.

For each v_i∈C_n, \(i \in \{2, \ldots, \lceil\frac{n}{2}\rceil,\lceil\frac{n}{2}\rceil +2, \ldots, n\}\), and 3≤j≤n−2, if u_j=v_i with j odd then w_j−2=v_i; if j is even, we have w_j+2=v_i.

Now, let v_i∈C_n, if u_j=v_i, 3≤j≤n−2 with j odd (resp. even), then w_j−2=v_i (resp. w_j+2=v_i), and its neighbors u_j−1=v_k=w_j−1+2 (resp. u_j−1=v_k−1=w_j−1−2) and u_j+1=v_k−1=w_j+1+2 (resp. u_j+1=v_k=w_j+1−2). We conclude that \(d_{P_{S}}(v_{i},v_{k})= d_{P_{S}}(v_{i}, v_{k-1}) = 1\), \(d_{P_{T}}(v_{i}, v_{k}) = 3\) and \(d_{P_{T}}(v_{i}, v_{k-1}) = 5\), i.e., v_iv_k,v_iv_k−1∈E(G_S) and \(v_{i}v_{k},v_{i}v_{k-1} \not \in E(G_{T})\). Hence, \(v_{i}v_{k},v_{i}v_{k-1} \not \in G_{S} \cap G_{T}\). □

5 Conclusion

In this work, we developed an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph G, an explicit representation of the smallest θ-power of a path G_S (with respect to θ and to the number of vertices) that contains G as an induced subgraph. We construct G_T, a θ-power of a path with the same number of vertices as G_S, such that the intersection G_S∩G_T contains G as an induced subgraph.

We remark that θ can be greater than or equal to the size of a maximum clique of the graph G, ω(G). We present in Fig. 10 an example where G has ω(G)=4 and Algorithm CPP returns θ=5, but the difference between θ and ω(G) can be greater than 1.

In the case where graph G is an induced subgraph of neither G_S nor G_T, we show a method that generates G_S and G_T, 2-powers of paths with n vertices, whose intersection is C_n, n≥4.

As future work, we intend to investigate this problem for other classes of graphs. We remark that all remaining forbidden induced subgraphs of unit interval graphs (Figs. 6, 7 and 8), have answer YES to Question 1.

For a Claw graph, we see that G_S is the 2-power of path P_S=v₂,a,v₁,v₃,v₄; and G_T is the 2-power of path P_T=v₃,b,v₁,v₂,v₄. For a 3-sun graph, we find that G_S and G_T are 4-powers of paths P_S=v₅,a,b,v₄,v₆,v₃,v₁,v₂ and P_T=v₁,v₆,x,v₅,v₂,v₄,y,v₃, respectively. For a Net graph, we see that G_S and G_T are 2-powers of paths P_S=v₄,v₂,v₁,v₅,v₃,a,v₆ and P_T=v₄,b,v₁,v₅,v₃,v₂,v₆, respectively.
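These three constructions can be verified mechanically. The sketch below takes the paths and powers exactly as listed above, computes the intersection of the two powers of paths, and stores the resulting edge sets; the function and variable names are hypothetical, and the vertices a, b, x and y are the extra vertices that do not belong to G.

```python
def power_of_path_edges(path, theta):
    """Edges of the theta-power of the path whose vertices are listed in order."""
    n = len(path)
    return {frozenset((path[i], path[j]))
            for i in range(n)
            for j in range(i + 1, min(i + theta, n - 1) + 1)}

def intersection_edges(path_s, path_t, theta):
    """Edge set of G_S ∩ G_T for two theta-powers of paths."""
    return power_of_path_edges(path_s, theta) & power_of_path_edges(path_t, theta)

def as_edges(pairs):
    return {frozenset((f"v{a}", f"v{b}")) for a, b in pairs}

claw = intersection_edges(["v2", "a", "v1", "v3", "v4"],
                          ["v3", "b", "v1", "v2", "v4"], 2)
sun = intersection_edges(["v5", "a", "b", "v4", "v6", "v3", "v1", "v2"],
                         ["v1", "v6", "x", "v5", "v2", "v4", "y", "v3"], 4)
net = intersection_edges(["v4", "v2", "v1", "v5", "v3", "a", "v6"],
                         ["v4", "b", "v1", "v5", "v3", "v2", "v6"], 2)
```

In each case the extra vertices differ between the two paths, so the intersection contains only edges between vertices of G: a star centered at v₁ for the Claw, a triangle v₂,v₄,v₆ with three vertices of degree two for the 3-sun, and a triangle v₁,v₃,v₅ with three pendant vertices for the Net.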

References

1. Adam Z, Choi V, Sankoff D, Zhu Q (2008) Generalized gene adjacencies, graph bandwidth and clusters in yeast evolution. In: Lecture Notes in Bioinformatics, vol 4983, pp 134–145
2. Brandstädt A, Hundt C, Mancini F, Wagner P (2010) Rooted directed path graphs are leaf powers. Discrete Math 310:897–910
3. Corneil DG, Kim H, Natarajan S, Olariu S, Sprague A (1995) Simple linear time recognition of unit interval graphs. Inf Process Lett 55:99–104
4. Figueiredo CMH, Meidanis J, Mello CP (1995) A linear-time algorithm for proper interval graph recognition. Inf Process Lett 56:179–184
5. Lin MC, Rautenbach D, Soulignac FJ, Szwarcfiter JL (2011) Powers of cycles, powers of paths, and distance graph. Discrete Appl Math 159:621–627
6. Lin MC, Soulignac FJ, Szwarcfiter JL (2009) Short models for unit interval graphs. Electron Notes Discrete Math 35:247–255
7. Roberts FS (1968) Representations of indifference relations. Stanford University, Stanford
8. Sankoff D, Xu X (2008) Tests for gene clusters satisfying the generalized criterion. Lect Notes Comput Sci 5167:152–160
9. Soulignac FJ (2010) On proper and helly circular-arc graphs. Universidad de Buenos Aires, Buenos Aires

Acknowledgements

This research was supported by CNPq and FAPERJ.

We are really grateful to professor Jayme Szwarcfiter for having presented to us the paper [5] in the very beginning of our work and for fruitful discussions on this topic. We are also thankful to the anonymous referees for their careful reading and valuable contributions.

Author information

Authors and Affiliations

Instituto de Matemática e Estatística, Universidade Federal Fluminense, 24.020-140, Niterói, Brazil
Vítor Costa & Simone Dantas
Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
David Sankoff
Department of Statistics, University of Toronto, Toronto, Canada
Ximing Xu

Corresponding author

Correspondence to Simone Dantas.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Costa, V., Dantas, S., Sankoff, D. et al. Gene clusters as intersections of powers of paths. J Braz Comput Soc 18, 129–136 (2012). https://doi.org/10.1007/s13173-012-0064-8

Received: 31 January 2012
Accepted: 01 February 2012
Published: 23 February 2012
Issue Date: June 2012
DOI: https://doi.org/10.1007/s13173-012-0064-8
