ProxySQL’s new routing algorithm to handle hundreds of thousands of schemas/shards

ProxySQL is a service designed to scale and handle traffic in very large setups.
In the past we already showed examples of how ProxySQL can handle thousands of MySQL servers, and millions (yes, millions!) of distinct users.
Although, recently a new challenge raised: can ProxySQL perform routing based on MySQL schemaname for a vary large number of schemas?
We know that in production setups ProxySQL is already performing routing to MySQL servers based on schemaname for few hundreds schemas, but can it handle thousands or more?

Technically there is no limit on how many shards ProxySQL can handle, as long as there are rules for them.
Tibor Korocz already pointed in a blog post that more query rules can affect performance.
In case of query routing based on schemaname no regular expressions are involved (therefore each rule is not expensive), but the same principle applies: the more rules need to be evaluated, longer it takes to make a decision (in this case to perform the routing).

Scope of this blog post is to understand the performance implication of performing routing based on the number of rules.
To simplify the setup to run some benchmark, we used only 1 MySQL server as backend, and we created 50000 schemas:

for i in `seq 10001 60000` ; do
echo "CREATE DATABASE IF NOT EXISTS shard_$i;"
done | mysql -u root

To run the benchmark we used a various number of rules.
For every test, we configured ProxySQL setting the right number of rules: for 100 schemas we created 100 rules, for 1000 schema 1000 rules, etc.
For example, for 200 rules we used this configuration:

(
echo "DELETE FROM mysql_query_rules; INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);"
for i in `seq 10001 10200` ; do
echo "INSERT INTO mysql_query_rules (active,username,schemaname,destination_hostgroup,apply) VALUES (1,\"sbtest\",\"shard_$i\",1,1);"
done
echo "LOAD MYSQL QUERY RULES TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032

We then ran queries as the following:

for j in `seq 1 50000` ; do
echo "USE shard_$(($RANDOM%200+10001))" ; echo "SELECT 1;"
done | mysql -u sbtest -psbtest -h 127.0.0.1 -P6033 --ssl-mode=disabled -NB > /dev/null

It is relevant to note that we aren’t interested in the total execution time (this is why we used this simple one-liner to generate traffic), but in the time spent inside the Query Processor.
Because the execution time of the queries is not relevant, we also configured the query cache.
To compute the time spent in the Query Processor we set variable mysql-stats_time_query_processor='true' and computed the value of status variable Query_Processor_time_nsec. This status variable measures the time spent inside the Query Processor, in nanoseconds.

As we expected, the more query rules ProxySQL needs to evaluate, the longer it takes to compute the destination hostgroup for that schemaname. This results in latency before executing the query:

Performance with legacy routing algorithm

The graph above shows that for 10 rules the average latency is around 1us (microsecond), for 100 rules the average latency is 2us , and that it gradually grows to 6us for 500 rules and 256us for 20000 rules.

6us average latency for 500 rules doesn’t seem to be an issue, but 256us (0.256ms) average latency for 20000 does seem like an issue.
What is also important to note is that these values are the average values for all the shards. Routing decision for some shards are faster than others, depending in which order the rules are written.

In fact, with 20000 rules the average latency for shard_10001 is 0.89us , while the average latency for shard_30000 is 581.1us !

Performance with legacy routing algorithm at 20k rows

Surely, this doesn’t seem a scalable solution.

MySQL Query Rules Fast Routing

ProxySQL 1.4.7 introduces a new routing capability that enhances what already configurable in mysql_query_rules.
The new capability is configurable using table mysql_query_rules_fast_routing. This table is simpler compared to mysql_query_rules, and its purpose is simple: given username, schemaname and flagIN, finds the destination_hostgroup.

For reference, this is the table definition of mysql_query_rules_fast_routing:

CREATE TABLE mysql_query_rules_fast_routing (
    username VARCHAR NOT NULL,
    schemaname VARCHAR NOT NULL,
    flagIN INT NOT NULL DEFAULT 0,
    destination_hostgroup INT CHECK (destination_hostgroup >= 0) NOT NULL,
    comment VARCHAR NOT NULL,
    PRIMARY KEY (username, schemaname, flagIN) )

Table mysql_query_rules_fast_routing should be considered as an extension of mysql_query_rules. After processing the rules in mysql_query_rules, the rules in mysql_query_rules_fast_routing will be evaluated, unless the last matching rule defined in mysql_query_rules sets apply=1.
Because of the nature of the rules in mysql_query_rules_fast_routing, one and only one rule will be evaluated.
The rules defined in mysql_query_rules_fast_routing are loaded in a hash table, where the key is username, schemaname and FlagIN , and the value is the destination_hostgroup .

New routing algorithm in action

This is the command we used to configure mysql_query_rules_fast_routing with a different number of rules, up to 50000 rules in this example:

(
echo "DELETE FROM mysql_query_rules; DELETE FROM mysql_query_rules_fast_routing;"
echo "INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);"
for i in `seq 10001 60000` ; do
echo "INSERT INTO mysql_query_rules_fast_routing (username,schemaname,flagIN,destination_hostgroup,comment) VALUES (\"sbtest\",\"shard_$i\",0,1,\"\");"
done
echo "LOAD MYSQL QUERY RULES TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032

The results are really interesting:

Performance with new routing algorithm

In other words:

  • between 10 rules to 10k rules, the average latency is between 1.7us to 1.9us
  • at 20k rules the average latency is 2.2us
  • at 50k rules the average latency is 2.4us

What is really interesting is that this new routing algorithm allows ProxySQL to perform schema routing for up to 10000 shards introducing less latency the old routing algorithm was introducing for 100 shards, and it scales very well to 50k rules wth very small overhead compared to 100 shards in the old routing algorithm.

Disabling mysql-stats_time_query_processor by default

In ProxySQL 1.4.4 the new variable mysql-stats_time_query_processor was introduced to dynamically enable or disable measuring the time spent in the Query Processor. Although timing is very important to understand the impact of having too many rules, measuring time elapse with good precision has some performance impact.
In fact, on this same server where we ran these benchmark, measuring time elapse with very good precision has a cost (latency) of 0.3us , mostly spent in kernel space (system calls).
That also means that from the metrics above you should remove 0.3us to determine the latency when measuring time is not enabled. For example, “at 50k rules the average latency is 2.4us” should become “at 50k rules the average latency is 2.1us” .
For this reason, since ProxySQL 1.4.4 , mysql-stats_time_query_processor is disabled by default.

Why flagIN is relevant?

The new routing algorithm performs routing based on username, schemaname, and flagIN .
I think some are asking why flagIN is relevant.
flagIN in mysql_query_rules_fast_routing allows to set flagOUT in mysql_query_rules based on other criteria later relevant for routing.
For example, assume that you don’t want to only perform routing based on username+schemaname, but also read/write split.
You can use mysql_query_rules to:

  • set flagOUT = 1 for all queries to be sent to the server that is the master, no matter in which cluster
  • set flagOUT = 2 for all queries to be sent to the server that is the slave, no matter in which cluster

The value of flagOUT resulting from mysql_query_rules will become the flagIN in mysql_query_rules_fast_routing. This will allow to combine read/write split to routing based on username+schemaname.
For benchmark purpose, let’s create 200.000 rules:

(
echo "DELETE FROM mysql_query_rules; DELETE FROM mysql_query_rules_fast_routing; INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);"
for i in `seq 10001 60000` ; do
echo "INSERT INTO mysql_query_rules_fast_routing (username,schemaname,flagIN,destination_hostgroup,comment) VALUES (\"sbtest\",\"shard_$i\",0,1,\"\");"
done
echo "INSERT INTO mysql_query_rules_fast_routing SELECT username, schemaname, FlagIN+1, destination_hostgroup, comment FROM mysql_query_rules_fast_routing;"
echo "INSERT INTO mysql_query_rules_fast_routing SELECT username, schemaname, FlagIN+2, destination_hostgroup, comment FROM mysql_query_rules_fast_routing;"
echo "LOAD MYSQL QUERY RULES TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT COUNT(*) FROM mysql_query_rules_fast_routing"
+----------+
| COUNT(*) |
+----------+
| 200000   |
+----------+

Updating the graph after testing routing based on 200k rules:

Performance with legacy routing algorithm up to 200k rules

As you can see from the graph above, ProxySQL is now able to perform routing decision based on 200k rules with almost no extra impact compared to 100 rules in the legacy algorithm.

Drawback of the new algorithm

This new routing algorithm seems to have a minimum overhead of 1.7us (or 1.4us with mysql-stats_time_query_processor disabled).
The minimum overhead is not present in the old/legacy algorithm, therefore it makes sense to use the new routing algorithm only if you have more than 100 routing rules.
Furthermore, ProxySQL Cluster capability doesn’t support mysql_query_rules_fast_routing yet.
As of v2.0.0 mysql_query_rules_fast_routing is supported also in ProxySQL Cluster mode.

Memory usage

To reduce contention between worker threads in ProxSQL, threads maintain their own copy of query rules. This means that the more rules you have and the more threads you have configured, the more memory is needed. In this example, loading 200k rules and using 4 threads lead to a memory usage of 280MB.
In future we plan to make configurable if, for query processing, ProxySQL should use more memory and be lock free, or use less memory and not be lock free, or a mix of the two options:

  • mysql_query_rules to be lock free
  • mysql_query_rules_fast_routing to be shared between threads

These are implementation details we will be looking into when planning to optimize ProxySQL even further.

Conclusion

ProxySQL 1.4.7 introduces a new routing algorithm that doesn’t replace the legacy algorithm, but enhances it.
Thank to the new routing algorithm, now ProxySQL can easily handle routing based on hundreds of thousands of schemas/shards with almost no impact (few microseconds).