This task asks you to do two things: to create a "Fibonacci" sequence of strings by repeatedly concatinating two starter strings, and to find a specified character in the resulting sequence of strings. For example, if the starter strings are harmony and hedgehog, the first four elements of the sequence are as shown in Table 1
Index | Element | Characters |
---|---|---|
1 | harmony | 1 ~ 7 |
2 | hedgehog | 8 ~ 15 |
3 | harmonyhedgehog | 16 ~ 30 |
4 | hedgehogharmonyhedgehog | 31 ~ 53 |
\(\vdots\) | \(\vdots\) | \(\vdots\) |
and the \(42^{nd}\) character in the sequence is m.
At first glance, this problem seems too easy to be a BIO finals question. We can construct the sequence of characters by repeatedly concatenating the given strings, and we are told the location from which to retrieve the target character. There's no sorting or searching -- or really any computation at all -- involved.
You can see just how simple the naive solution to the problem is by examining this replit. Running the program with defaults (which finds the \(600,000,000^{th}\) character starting from harmony and hedgehog) demonstrates that computation time isn't a problem in this challenge.
What made this problem interesting enough for the 2000 BIO finals was the state of technology at that time. The computing platform that Cambridge University offered the competitors was a "Pentium PC" with perhaps 256 MB of RAM to run the program, the OS, and all background applications. This is roughly comparable to a Raspberry Pi A. If you are using the free version of Replit, your account has twice that for the exclusive use of your program.
Even with this immense advantage, you won't be able to satisfy the requirements of the challenge, which include accepting an \(n\) value up to \(2^{30} = 1,073,741,824\). The naive solution will exceed your RAM limit a little beyond \(n = 600,000,000\), with the exact number depending on the starting strings. When that happens, Replit will just quit running your program.
The problem with the naive solution is that it requires you to keep the sequence (or a large part of it) in memory. In the suggested naive solution, I keep only the last two elements of the sequence in memory, but this means space for more than 480M characters if \(n = 600,000,000\). When I run the naive solution on my own computer and monitor its memory usage using the Valgrind profiler, it takes more than 1.5 GB of heap memory to find the \(1,000,000,000^{th}\) character in the sequence.
In order to solve our memory problem, we need to avoid making ever-longer strings as we progress along the sequence. To see that this is indeed possible, consider the \(k^{th}\) element, which is the concatination of the \((k-2)^{nd}\) and \((k-1)^{st}\) elements. For example, let us represent the initial strings as a and b. Then instead of storing the \(k = 8^{th}\) element, we could just keep the \(6^{th}\) and \(7^{th}\):
Index | Element |
---|---|
1 | a |
2 | b |
3 | ab |
4 | bab |
5 | abbab |
6 | bababbab |
7 | abbabbababbab |
8 | bababbababbabbababbab |
\(\vdots\) | \(\vdots\) |
Of course, this is nothing more than the "Fibonacci Rule" to recursively generate the next element. But what if we didn't stop two steps back but instead went back three:
Index | Element |
---|---|
1 | a |
2 | b |
3 | ab |
4 | bab |
5 | abbab |
6 | bababbab |
7 | abbabbababbab |
8 | bababbababbabbababbab |
\(\vdots\) | \(\vdots\) |
Now instead of storing the \(8^{th}\) element in memory, we only need the \(5^{th}\) and \(6^{th}\). This saves us the memory needed to store what is effectively a second copy of the \(6^{th}\) element.
Obviously, we can keep going back and replacing each term with a combination of earlier (and importantly, smaller) terms. In fact, every element is made up of the two original strings a and b, so if we are clever they are the only things we have to keep in memory. We can extract the target character from one of the original strings without having to make giant, repetitive concatinations of them first.
Here's how we might implement this using a recursive algorithm:
The approach described above works well enough, but the recursive calls could go as many as 40 deep for short input strings. For a little more work in the recursive function, we can take bigger steps backwards in the sequence. For example, we can decompose the \(k^{th}\) element into three pieces consisting of the \((k-2)^{nd} + (k-3)^{rd} + (k-2)^{nd}\) elements. By doing so, the recursive calls can be cut to 20 deep at worst case. This is the approach I take in the replit.
Recursion is a powerful strategy that often leads to more efficient algorithms and simpler code. You can learn more about recursion here.