Analysis - Fibonacci Strings

Analysis: Fibonacci's Revenge

This task asks you to do two things: to create a "Fibonacci" sequence of strings by repeatedly concatinating two starter strings, and to find a specified character in the resulting sequence of strings. For example, if the starter strings are harmony and hedgehog, the first four elements of the sequence are as shown in Table 1

Table 1 - A "Fibonacci" Sequence of Strings
Index	Element	Characters
1	harmony	1 ~ 7
2	hedgehog	8 ~ 15
3	harmonyhedgehog	16 ~ 30
4	hedgehogharmonyhedgehog	31 ~ 53
\(\vdots\)	\(\vdots\)	\(\vdots\)

and the \(42^{nd}\) character in the sequence is m.

Naive Solution

At first glance, this problem seems too easy to be a BIO finals question. We can construct the sequence of characters by repeatedly concatenating the given strings, and we are told the location from which to retrieve the target character. There's no sorting or searching -- or really any computation at all -- involved.

You can see just how simple the naive solution to the problem is by examining this replit. Running the program with defaults (which finds the \(600,000,000^{th}\) character starting from harmony and hedgehog) demonstrates that computation time isn't a problem in this challenge.

What made this problem interesting enough for the 2000 BIO finals was the state of technology at that time. The computing platform that Cambridge University offered the competitors was a "Pentium PC" with perhaps 256 MB of RAM to run the program, the OS, and all background applications. This is roughly comparable to a Raspberry Pi A. If you are using the free version of Replit, your account has twice that for the exclusive use of your program.

Even with this immense advantage, you won't be able to satisfy the requirements of the challenge, which include accepting an \(n\) value up to \(2^{30} = 1,073,741,824\). The naive solution will exceed your RAM limit a little beyond \(n = 600,000,000\), with the exact number depending on the starting strings. When that happens, Replit will just quit running your program.

The problem with the naive solution is that it requires you to keep the sequence (or a large part of it) in memory. In the suggested naive solution, I keep only the last two elements of the sequence in memory, but this means space for more than 480M characters if \(n = 600,000,000\). When I run the naive solution on my own computer and monitor its memory usage using the Valgrind profiler, it takes more than 1.5 GB of heap memory to find the \(1,000,000,000^{th}\) character in the sequence.

Improved Solution

In order to solve our memory problem, we need to avoid making ever-longer strings as we progress along the sequence. To see that this is indeed possible, consider the \(k^{th}\) element, which is the concatination of the \((k-2)^{nd}\) and \((k-1)^{st}\) elements. For example, let us represent the initial strings as a and b. Then instead of storing the \(k = 8^{th}\) element, we could just keep the \(6^{th}\) and \(7^{th}\):

Table 2 - Elements in Terms of Earlier Elements
Index	Element
1	a
2	b
3	ab
4	bab
5	abbab
6	bababbab
7	abbabbababbab
8	bababbababbabbababbab
\(\vdots\)	\(\vdots\)

Of course, this is nothing more than the "Fibonacci Rule" to recursively generate the next element. But what if we didn't stop two steps back but instead went back three:

Table 3 - Using Even Earlier Elements
Index	Element
1	a
2	b
3	ab
4	bab
5	abbab
6	bababbab
7	abbabbababbab
8	bababbababbabbababbab
\(\vdots\)	\(\vdots\)

Now instead of storing the \(8^{th}\) element in memory, we only need the \(5^{th}\) and \(6^{th}\). This saves us the memory needed to store what is effectively a second copy of the \(6^{th}\) element.

Obviously, we can keep going back and replacing each term with a combination of earlier (and importantly, smaller) terms. In fact, every element is made up of the two original strings a and b, so if we are clever they are the only things we have to keep in memory. We can extract the target character from one of the original strings without having to make giant, repetitive concatinations of them first.

Here's how we might implement this using a recursive algorithm:

Determine which element of the "Fibonacci" sequence of strings contains the target character, and determine the offset of the character within that element. (For efficiency, keep a record of the lengths of each element as you step along the sequence.)
1. Create and initialise to zero two variables: one for the sequence element number and one for the sum of the lengths of elements considered so far.
2. Increment the sequence element number variable.
3. Note the difference between the target character's position and the sum of element lengths so far. (If the current element contains the character, we will use it below as the offset.)
4. Find the length of the current element. For the first element, this is the length of string a; for the second, the length of string b. For all other elements, it is the sum of the lengths of the two previous elements.
5. Add the length of the current element to the total length of all previous elements.
6. Compare the new total length to the target character position. If the total length is equal or larger, the target character is in the current element and its offset is the difference we found in step 1.c. above; otherwise, loop again from step 1.b.
Call the recursive function of step 3 below with the element number and offset for the target character's position.
Find the target character, given the element number and offset.
1. If the element number is 1 or 2, use the offset and the original two strings to determine the target character.
2. Otherwise, instead of searching the specified element (call it the \(k^{th}\) element) for the target character, compare the offset to the lentgh of the \((k-2)^{nd}\) element. If the offset is less than or equal to that length, call the recursive function of step 3 with element number \(k-2\) and the same offset; otherwise, call the function with element number \(k-1\) and the difference between the current offset and the length of element \(k-2\).

The approach described above works well enough, but the recursive calls could go as many as 40 deep for short input strings. For a little more work in the recursive function, we can take bigger steps backwards in the sequence. For example, we can decompose the \(k^{th}\) element into three pieces consisting of the \((k-2)^{nd} + (k-3)^{rd} + (k-2)^{nd}\) elements. By doing so, the recursive calls can be cut to 20 deep at worst case. This is the approach I take in the replit.

Next Steps

Recursion is a powerful strategy that often leads to more efficient algorithms and simpler code. You can learn more about recursion here.