Shortest supersequence of all permutations

Importance: Low ✭
Author(s): Hu, T.C.
Koutas, P.J.
Subject: Combinatorics
Recomm. for undergrads: yes
Posted by: Pseudonym
on: July 18th, 2007

Let $ \Sigma = \{ x_1 \ldots x_k \} $ be a finite alphabet, and $ \alpha, \beta \in \Sigma^{*} $. $ \alpha $ is a supersequence of $ \beta $ if $ \beta $ can be constructed by deleting symbols from $ \alpha $.

A universal supersequence over $ \Sigma $ is a sequence $ u \in \Sigma^* $ such that for all $ \alpha \in \pi(\Sigma) $ (that is, for every permutation $ \alpha $ of $ \{ x_1 \ldots x_k \} $), $ u $ is a supersequence of $ \alpha $.

For $ | \Sigma | = k $, denote the length of a shortest universal supersequence over $ \Sigma $ as $ \sigma(k) $.

Question   Find a closed form or recurrence for $ \sigma(k) $.

A little thought gives an obvious upper bound $ \sigma(k) \leq k^2 $. A little more thought gives a slightly tighter one $ \sigma(k) \leq k^2 - k + 1 $.

Bibliography

P.J. Koutas, T.C. Hu, Shortest String Containing All Permutations, Discrete Mathematics, Vol. 11, 1975, pp. 125-132.


* indicates original appearance(s) of problem.

Testing if universal

Is there an efficient algo for testing whether a sequence is universal supersequence over sigma? Thanks!

It's been solved

It was solved in 2001 by R. Erra, N. Lygeros, N. Stewart. The paper is here if anyone is interested:

http://www.nigels.com/research/dmtcs2001.pdf

Re: It's been solved

I have taken a look at the paper by R. Erra, N. Lygeros, N. Stewart. I don't see why their proof implies the conjectured bound.

They construct (conjectured) minimal length sequences from a known minimal length sequence in the case n=4, but they don't show that one cannot do better with another construction.

shortest string containing all permutations

Seems too simple.

The recurrence sigma(k) is:

sigma(0) = 0 sigma(i+1) = 1 + 2*(sigma(i))

The induction step is derived from the fact that,
upon adding a symbol to the alphabet,
the universal supersequence of the result
must contain the new symbol both before and after
the universal supersequence of the original.

The shortest such string is
u(Epsiloni) + newSymbol + u(Epsiloni)

Q.E.D.

-cstb jas@cruzio.co

if it seems too simple...

While it is true that the sequences constructed by your method will contain all possilbe permutations, it is possible to do much better if you use the new symbol more than once. Indeed, the "obvious upper bound" $ \sigma(k) \le k^2 $ from the problem description is much better than the (exponential) one given by your formula.

Shortest string containing all permutations

If memory serves me correctly, this was solved by Koutas and Hu giving your tighter upper bound. Some refs:

L. Adleman, Short Permutation Strings, Discrete Mathematics, Vol. 10, 1974, pp. 197-200

P.J. Koutas, T.C. Hu, Shortest String Containing All Permutations, Discrete Mathematics, Vol. 11, 1975, pp. 125-132

G. Galbiati, F.P. Preparata, On Permutation-Embedding Sequences, SIAM J. Appl. Math., Vol. 30, No. 3, May 1976, pp. 421-423

D.J. Kleitman, D.J. Kwiatkowski, A Lower Bound on the Length of a Sequence Containing All Permutations as Subsequences, JCT (A), Vol. 21, 1976, pp. 129-136

M. Cai, A New Bound on the Length of the Shortest String Containing all r-Permutations, Discrete Mathematics, Vol. 39, 1982, pp. 329-330

C. Savage, Short Strings Containing All k-Element Permutations, Discrete Mathematics, Vol. 42, 1982, pp. 281-285

A.A. Schaffer, Shortest Prefix Strings Containing All Subset Permutations, Discrete Mathematics, Vol. 64, 1987, pp. 239-252

FR

Thanks!

Thanks. I'll take a look at the references when I have a chance, and update the bibliography. I'll also update the problem statement to use the original notation.

I'm pleased to see that the problem is still open. This is exactly the sort of "probably within the grasp of a bright undergraduate" problem that is great to see.

Still open!

Thanks for a lot of relevant references. I didn't look at all of them, but what it seems to me, is that

  1. Koutas and Hu show that $ \sigma(k) \le k^2 - 2k + 4 $ and conjecture that this is tight
  2. Kleitman and Smith prove that $ \sigma(k) \ge k^2 − c k^{7/4+\varepsilon} $

Is that all that is known? It'd be nice if someone interested in this problem did a more extensive search, and then updated the statement of this problem: giving credit to Koutas and Hu (or whoever was the first to conjecture the value of $ \sigma(k) $, summarizing the literature, and perhaps choosing more descriptive keywords.

If you like this problem and want to spend some time researching it, feel free to do these changes (after all, this site is meant to be wiki-like, so that people edit others contributions).

Counterexample to k^2-2k+4 conjecture

Here is a counterexample to the conjecture $ \sigma(k)=k^2-2k+4 $. If the conjecture is true then $ \sigma(10)=84 $, however, the following shows $ \sigma(10)\leq83 $. Let $ \{1,\ldots,8,A,B\} $ be the alphabet of ten symbols. Then the string

$$B 12345678A B 12345678 B A1234567 B 812345A6 B 7812345 B 6A781234 B 5678123A B 45678123 B A45678123 B$$

contains every permutation of $ \{1\ldots8,A,B\} $. To find a permutation with B in its nth place we match its B to the nth B in the sequence. Note that if we reverse the sequence and substitute symbols we get the same sequence. Let $ [a,b] $ denote the set of sequences that contain each $ b $ length string in $ a $ symbols with distinct symbols. We want to show 12345678A$ \in[9,1] $, 12345678A 12345678$ \in[9,2] $, etc. Whenever the i+1th block contains all the symbols other than the last symbol of the $ i $th block, then we can inductively extend (first i blocks)$ \in[9,i] $ to (first i+1 blocks)$ \in[9,i+1] $. We can manually check that (first five blocks)$ \in[9,5] $.

An aside

Using a similar construction, the following sequence shows $ \sigma(11) \leq 102 $, giving another counterexample to the proposed tightness of $ \sigma(k) = k^2 - 2k + 4 $.

"B123456789AB123456789BA12345678B9123456A7B89123456A B78912345B6A7891234B56789123AB456789123BA456789123B"

Unlike the previously constructed counterexample for $ k = 10 $, there is only "near palindromic" symmetry (under the mapping $ \{1 \leftrightarrow 3,4 \leftrightarrow 9,5 \leftrightarrow 8,6 \leftrightarrow 7\} $).

I think it is solved

I think this problem was solved by Aviezri Fraenkel, but I can not find the paper now. I will check it out.

Great!

I'd be very interested in knowing about that, thanks.

About Fraenkel's solution

It seems that Aviezri Fraenkel solved a slightly different problem, where the sequence is universal if it is a super-sequence of any sequence of $ k $ symbols (and not just permutations). I give the solution in the comment below. The solution was explained to me by Udi Hadad, an M.Sc. student of Fraenkel. As far as he knows Fraenkel solved the problem for a biologist who needed the solution, and never published it.

Fraenkel's solution

For convenience, assume that $ \Sigma = \{  1, \ldots, k \} $. Consider the directed graph $ G $ defined as follows. The vertices of $ G $ are the set $ \Sigma^{k-1} $. Every vertex has $ k $ outgoing edges, labeled $ 1 $ to $ k $. Given a vertex $ (a_1, a_2, \ldots, a_{k-1}) $, the $ i $-th outgoing edge from $ (a_1, a_2, \ldots, a_{k-1}) $ lead to the vertex $ (a_2, \ldots, a_{k-1}, i) $. We can identify every edge with some sub-sequence that needs to be covered. Now note that in this graph, the out-degree and in-degree are equal for every vertex, and therefore there exists an Eulerian path in the graph. This path gives the universal sequence, by taking the label of its initial vertex and then the labels of every edge on this path.

This sequence is optimal, because it adds one letter for ever subsequence that should be covered, except for the first one.

Re: Fraenkel's solution

This is a nice solution but to quite a different problem. (Btw, it is due to De Bruijn in 1946, see De Bruijn sequence on Wikipedia.)

The main difference is not in considering non-permutations too, but in that here we want every sequence to consider as a consecutive subsequence (which obviously implies it needs to be of length at least $ k^k $).

The original question, however, asks for "subsequences with gaps", where, indeed, it is easy to find a universal supersequence of length $ k^2 $.