Zimin Words and Bifixes

One of the earliest contributions to the On-Line Encyclopedia of Integer Sequences (OEIS) was a family sequences counting the number of words that begin (or don’t begin) with a palindrome:

  • Let \(f_k(n)\) be the number of strings of length \(n\) over a \(k\)-letter alphabet that begin with a nontrivial palindrome” for various values of \(k\).
  • Let \(g_k(n)\) be the number of strings of length n over a \(k\)-letter alphabet that do not begin with a nontrivial palindrome.
  • Number of binary strings of length \(n\) that begin with an odd-length palindrome. (A254128)

(If I had known better, I would have published fewer sequences in favor of a table, and I would have requested contiguous blocks of A-numbers.)

I must have written some Python code to compute some small terms of this sequence, and I knew that \(g_k(n) = k^n – f_k(n)\), but I remember being at in my friend Q’s bedroom when the recursion hit me for \(f_k(n)\): $$f_k(n) = kf_k(n-1) + k^{\lceil n/2 \rceil} – f_k\big(\lceil \frac n 2 \rceil \big)$$

“Bifix-free” words

One sequence that I didn’t add to the OEIS was the “Number of binary strings of length n that begin with an even-length palindrome”—that’s because this was already in the Encyclopedia under a different name:

A094536: Number of binary words of length n that are not “bifix-free”.

0, 0, 2, 4, 10, 20, 44, 88, 182, 364, 740, 1480, 2980, 5960, …

A “bifix” is a shared prefix and suffix, so a “bifix-free” word is one such that all prefixes are different from all suffixes. More concretely, if the word is \(\alpha_1\alpha_2 \dots \alpha_n\), then \((\alpha_1, \alpha_2, \dots, \alpha_k) \neq (\alpha_{n-k+1},\alpha_{n-k+2},\dots,\alpha_n)\) for all \(k \geq 1\).

The reason why the number of binary words of length \(n\) that begin with an even length palindrome is equal to the number of binary words of length \(n\) that have a bifix is because we have a bijection between the two sets. In particular, find the shortest palindromic prefix, cut it in half, and stick the first half at the end of the word, backward. I’ve asked for a better bijection on Math Stack Exchange, so if you have any ideas, please share them with me!

In 2019–2020, Daniel Gabric, Jeffrey Shallit wrote a paper closely related to this called Borders, Palindrome Prefixes, and Square Prefixes.

Zimin words

A Zimin word can be defined recursively, but I think it’s most suggestive to see some examples:

  • \(Z_1 = A\)
  • \(Z_2 = ABA\)
  • \(Z_3 = ABACABA\)
  • \(Z_4 = ABACABADABACABA\)
  • \(Z_n = Z_{n-1} X Z_{n-1}\)

All Zimin words \(Z_n\) are examples of “unavoidable patterns”, because every sufficiently long string with letters in any finite alphabet contains a substring that matches the \(Z_n\) pattern.

For example the word \(0100010010111000100111000111001\) contains a substring that matches the Zimin word \(Z_3\). Namely, let \(A = 100\), \(B = 0\), and \(C = 1011\), visualized here with each \(A\) emboldened: \( 0(\mathbf{100}\,0\,\mathbf{100}\,1011\,\mathbf{100}\,0\,\mathbf{100})111000111001\).

Binary solo from Bret at 1:26.

I’ve written a Ruby script that generates a random string of length 29 and uses a regular expression to find the first instance of a substring matching the pattern \(Z_3 = ABACABA\). You can run it on TIO, the impressive (and free!) tool from Dennis Mitchell.

# Randomly generates a binary string of length 29.
random_string = 29.times.map { [0,1].sample }.join("")
p random_string
# Finds the first Zimin word ABACABA
p random_string.scan(/(.+)(.+)\1(.+)\1\2\1/)[0]
# Pattern:             A   B   A C   A B A

Why 29? Because all binary words of length 29 contain the pattern \(Z_3 = ABACABA\). However, Joshua Cooper and Danny Rorabaugh’s paper provides 48 words of length 28 that avoid that pattern (these and their reversals):

1100000010010011011011111100
1100000010010011111101101100
1100000010101100110011111100
1100000010101111110011001100
1100000011001100101011111100
1100000011001100111111010100
1100000011011010010011111100
1100000011011011111100100100
1100000011111100100101101100
1100000011111100110011010100
1100000011111101010011001100
1100000011111101101100100100

1100100100000011011011111100
1100100100000011111101101100
1100100101101100000011111100
1100110011000000101011111100
1100110011000000111111010100
1100110011010100000011111100
1101010000001100110011111100
1101010000001111110011001100
1101010011001100000011111100
1101101100000010010011111100
1101101100000011111100100100
1101101100100100000011111100

The Zimin Word \(Z_2 = ABA\) and Bifixes

The number of Zimin words of length \(n\) that match the pattern ABA is equal to the number of of words that begin with an odd-length palindrome. Analogously, the number of words with a bifix is equal to the number of words that begin with an even-length palindrome. The number of these agree when \(n\) is odd.

I’ve added OEIS sequences A342510A342512 which relate to how numbers viewed as binary strings avoid—or fail to avoid—Zimin words. I asked users to implement this on Code Golf Stack Exchange.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *