Splay operation experiment¶

NTIN066, homework 3, splay operation experiment. Student: Mikoláš Fromm

In this notebook, we will experiment with splay trees, especially how they perform compared to the naive single rotation implementation.

In [36]:
import matplotlib.pyplot as plt
import numpy as np
import math

from splay_experiment import perform_tests, load_results
In [6]:
## perform all tests
if (False):
    perform_tests(42, save=True, verbose=False)
In [30]:
tests = load_results()
test_types = ["sequential", "random", "subset"]
tree_types = ["std", "naive"]
set_sizes = [10, 100, 1000]

rename_dict = {
    "sequential": "Sequential",
    "random": "Random",
    "subset": "Subset",
    "std" : "Std",
    "naive" : "Naive",
    "combined" : "Combined"
}
In [29]:
fig, axes = plt.subplots(1, 2, figsize=(18, 8))  # 1 row, 2 columns for separate plots

# Iterate over each test and plot them individually in their own subplot
for idx, test in enumerate(test_types[:2]):  # Adjust the test names accordingly
    ax = axes[idx]
    for type in tree_types:
        # Extract x and y values for plotting
        x = tests[test][type][:, 0]
        y = tests[test][type][:, 1]
        ax.plot(x, y, label=f"{rename_dict.get(type, type)}", marker='o')  # Use markers for clarity
    ax.set_title(f"{test.capitalize()} Test")
    ax.set(xlabel='Number of Inserted Nodes', ylabel='Avg. Number of Rotations')
    ax.grid(True)
    ax.legend(title="Tree Type")

plt.tight_layout()
plt.show()
No description has been provided for this image

Interpretation¶

Sequential test¶

The sequential test takes a certain number of nodes and inserts them in their sequential order into the tree. Then it starts to repeatedly find all the nodes in their sequential order.

We can clearly see how the standard splay operation maintains the average number of splay operations O(1) (eg. constant), whereas the naive implementation requires O(n) splay operations for sequential insert and sequential find.

We have proven the behaviour for the standard implementation in the 2nd lecture on the page 15, where we stated that when performing a repeated find of a subsequence S in the increasing order, the total number of operations is O(n), therefore single find operation must take at most O(1) operations.

In case of the naive implementation, we can make an observation that after the complete insertion, we end up with a straight line (or branch) containing all inserted values. The single FIND must therefore take O(n) in the worst case.

Random test¶

The random test inserts n elements and then performs find of 5n elements.

The standard splay tree benefits from the fact that it keeps the tree well-formed, while bringing the most accessed nodes to the top (making them more effective to reach again). This comes at the cost of making more operations for a single splay (single splay requires 2 rotations, while the naive just 1).

If we find the elements randomly, we are effectively giving up the benefit of the splay tree, since splaying the found node up to the root (while self-adjusting the tree) can not guarantee us a more effective find of a next (randomly chosen) node. At the end, we only see that the average number of rotations is higher for the standard implementation compared to the naive one, because the naive one simply does not need to rotate that much to perform a single splay.

The logarithmical growth of the average number of rotations is likely given by the fact that the more nodes are in the tree, the deeper the random node can be. The depth of a node can be, of course, at most linear with the number of nodes in the tree (when the insertion would be performed sequentially in the icreasing order of the nodes), but since the insertion of the nodes into the tree is performed randomly, we are, on average, getting rid of this problem.

In [80]:
# Create a new figure for the third test type
fig2, ax2 = plt.subplots(1, 3, figsize=(20, 8))  # Single plot for the third test

num_of_finds = 10_000

## plot subset with std implementation comparing sizes
test = test_types[2]
for idx, type in enumerate([tree_types[0], tree_types[1], "combined"]):
    ax = ax2[idx]
    for set_size in set_sizes:
        if (type == "combined"):
            x = tests[test][tree_types[0]][set_size][:, 0]
            y = tests[test][tree_types[0]][set_size][:, 1]
            ax.plot(x[x >= set_size], y[x >= set_size], label=f"{set_size} - {rename_dict.get(tree_types[0], tree_types[0])}", marker='o')
            x = tests[test][tree_types[1]][set_size][:, 0]
            y = tests[test][tree_types[1]][set_size][:, 1]
            ax.plot(x[x >= set_size], y[x >= set_size], label=f"{set_size} - {rename_dict.get(tree_types[1], tree_types[1])}", marker='o')
        else:
            x = tests[test][type][set_size][:, 0]
            y = tests[test][type][set_size][:, 1]
            ax.plot(x[x >= set_size], y[x >= set_size], label=f"{set_size}", marker='o')
        
    ax.set_title(f"{rename_dict.get(test, test)} Test - {rename_dict.get(type, type)} {"Tree" if type != "combined" else ""}")
    ax.set(xlabel='Number of Inserted Nodes (log. scale)', ylabel='Avg. Number of Rotations per Find')
    ax.set_xscale('log')

    ax.grid(True)

    if (type == "combined"):
        ax.legend(title="Subset Size - Splay")
    else:
        ax.legend(title="Subset Size")

plt.tight_layout()
plt.show()
No description has been provided for this image

Interpretation¶

Subset test¶

Inserts a sequence of n elements, which contains arithmetic progressions interspersed with random elements. Then repeatedly (10 000 times) accesses a small subset of these elements in a random order.

We can see from the measured results that both implementations are performing quite similarly, only that the standard implementation is having slightly more average rotations per find compared to the naive implementation.

Again, on the lecture 2 on the page 15, we have stated the following:

While performing k-times FIND of a value from a certain subset M (|M| = m) from the splay tree of a size n, the total complexity is O(n*log(n) + k + k*log(m))

As we have stated earlier, one of the advantages of a splay tree is that it keeps the frequently found items close to the root. Since we perform the FIND only on a small subset of all the values, we will quickly move the whole subset to the top of the tree. Any further FIND will therefore very likely don't need to go any deeper than O(log(m)). For that reason, we expect that the jumps between each subset size in the plot are determined by the O(log(m)) complexity of a single FIND. We also demonstrate it on the following plot:

In [82]:
## plot subset with std implementation comparing sizes
fig3, ax3 = plt.subplots(1, 2, figsize=(16,8))  # Single plot for the third test

## plot subset with std implementation comparing sizes
colors = ['blue', 'red', 'green']
test = test_types[2]
for idx, type in enumerate(tree_types):
    ax = ax3[idx]
    for set_size in set_sizes:
        x = tests[test][type][set_size][:, 0]
        y = tests[test][type][set_size][:, 1]
        ax.plot(x[x >= set_size], y[x >= set_size], label=f"{set_size}", marker='o', color=colors[set_sizes.index(set_size)])

        y = np.zeros(len(x)) + 2*math.log(set_size)
        ax.plot(x[x >= set_size], y[x >= set_size], label=f"2*log({set_size})", marker='x', color=colors[set_sizes.index(set_size)])

    ax.set_title(f"{rename_dict.get(test, test)} Test: results vs. log(subset_size) comparison")
    ax.set(xlabel='Number of Inserted Nodes (log. scale)', ylabel='Avg. Number of Rotations per Find')
    ax.set_xscale('log')
    ax.grid(True)   
    ax.legend(title="Size of searched subset\n and 2*log(subset_size)")

plt.tight_layout()
plt.show()
No description has been provided for this image

On the plot above, we can see that for our case, setting the upper bound equal 2*log(m) satisfies all three subsets.

On the other hand, in the case of the subset size equal 1000, the subset starts to represent a relatively big portion of the whole tree and the avg. number of rotations per find starts to grow logarithmically, which likely corresponds to the fact that the greater the subset, the closer we get to the regular FIND in the splay tree without any subset restriction.