Wednesday, June 3, 2015

SQL Load Balancing Benchmark - Comparing Performance of ProxySQL vs MaxScale

In the MySQL ecosystem there are several load balancers, some of them open source, and ProxySQL is one of the few proxies that work at the application layer and are therefore SQL-aware.
In this blog post we will benchmark ProxySQL against MaxScale, another popular proxy for MySQL.
The idea to compare ProxySQL vs MaxScale came after reading an interesting blog post by Krzysztof Książek, SQL Load Balancing Benchmark, comparing the performance of MaxScale vs HAProxy.

Disclaimer: ProxySQL is not GA yet, therefore please do not use it in production.



Sysbench setup

I wanted a sysbench setup similar to the one Krzysztof used in his benchmark, with two differences:
a) instead of using a MySQL cluster with Galera, I set up a cluster with 1 master and 3 slaves. Since the workload was meant to be completely read-only and in-memory, the 2 setups are functionally identical;
b) instead of using AWS instances I used 4 physical servers: server A was running as a master and servers B, C and D were running as slaves. Since the master was idle (remember, this is a read-only workload that uses only the slaves), I used the same box to also run sysbench and all the various proxies.

Benchmarks were executed running the following command:
./sysbench \
--test=./tests/db/oltp.lua \
--num-threads=$THREADS \
--max-requests=0 \
--max-time=600 \
--mysql-user=rcannao \
--mysql-password=rcannao \
--mysql-db=test \
--db-driver=mysql \
--oltp-tables-count=128 \
--oltp-read-only=on \
--oltp-skip-trx=on  \
--report-interval=1 \
--oltp-point-selects=100 \
--oltp-table-size=400000 \
--mysql-host=127.0.0.1 \
--mysql-port=$PORT \
run

The versions used are:
Percona Server 5.6.22
sysbench 0.5
ProxySQL at commit a47136e with debugging disabled
MaxScale 1.0.5 GA
HAProxy 1.4.15

ProxySQL and MaxScale: a few design differences


In the benchmark executed by Krzysztof, MaxScale was configured to listen on port 4006 where the service "RW Split Router" was running, and on port 4008 where the service "Read Connection Router" was running.
To my understanding:
a) RW Split Router performs read/write split, parsing the queries and tracking the state of the transaction;
b) Read Connection Router performs a simple network forwarding, connecting clients to backends;
c) the two services, to operate, need to listen on different ports.
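For reference, a minimal maxscale.cnf fragment of the kind used in that benchmark might look like the sketch below, based on MaxScale 1.0.x documentation; server names and credentials are placeholders, and parameter details may differ in your version:

```
[RW Split Router]
type=service
router=readwritesplit
servers=server1,server2,server3
user=maxuser
passwd=maxpwd

[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4006

[Read Connection Router]
type=service
router=readconnroute
router_options=slave
servers=server1,server2,server3
user=maxuser
passwd=maxpwd

[Read Connection Listener]
type=listener
service=Read Connection Router
protocol=MySQLClient
port=4008
```

Note how each service needs its own listener section, and therefore its own port.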

ProxySQL is, by design, different.

ProxySQL and RW split


ProxySQL performs a very simple query analysis to determine where the queries need to be sent.
ProxySQL decides where a query needs to be forwarded based on a user configurable chain of rules, where a DBA can specify various matching criteria like username, schemaname, if there is an active transaction (feature not completely implemented), and a regular expression to match the query.
Matching against a regular expression provides better speed than building a syntax tree, and having a chain of rules that match with either regex or other attributes allows a great degree of flexibility compared to hardcoded routing policies.
Therefore, to implement a basic read/write split, ProxySQL was configured in such a way that:
a) all the queries matching '^SELECT.*FOR UPDATE$' were sent to the master;
b) all the queries not matching the previous rule but matching '^SELECT.*' were sent to the slaves;
c) by default, all traffic not matching any of the previous rules was sent to the master.

Considering the 3 rules listed above, all traffic generated by sysbench was always sent to slaves.
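The three rules above can be sketched as an ordered chain of regex matches where the first match wins (a hypothetical simplification in Python, not ProxySQL's actual implementation):

```python
import re

# Ordered chain of routing rules: the first matching regex decides the
# destination; anything that matches no rule falls through to the master.
RULES = [
    (re.compile(r'^SELECT.*FOR UPDATE$', re.IGNORECASE), 'master'),
    (re.compile(r'^SELECT.*', re.IGNORECASE), 'slaves'),
]

def route(query: str) -> str:
    for pattern, destination in RULES:
        if pattern.match(query):
            return destination
    return 'master'  # default rule

print(route("SELECT c FROM sbtest1 WHERE id=42"))             # slaves
print(route("SELECT c FROM sbtest1 WHERE id=42 FOR UPDATE"))  # master
print(route("UPDATE sbtest1 SET k=k+1 WHERE id=42"))          # master
```

Since sysbench in read-only mode with --oltp-skip-trx=on issues only plain SELECTs, every query falls into the second rule and goes to the slaves.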

Additionally, while ProxySQL doesn't perform any syntax parsing to determine the target of a query, no matter what routing rules are in place it always performs a very simple query analysis to determine what type of statement is being executed and generates statistics based on it. That is, ProxySQL counts the types of statements it is executing, and this information is accessible through ProxySQL itself.

As already pointed out in previous articles, one of the main ideas behind ProxySQL is that the DBA is now the one controlling and defining query routing rules, making routing completely transparent to the developers, eliminating the politics of DBAs depending on developers for such tweaking of the setup, and therefore increasing interaction speed.

ProxySQL and Fast Forwarding

I think the way MaxScale implements different modules listening on different ports is a very interesting approach, yet it forces the developers to enforce some sort of read/write split in the application: connect to port 4006 if you want R/W split, or to port 4008 if you want RO load balancing.
My aim with ProxySQL is that the application should have a single connection point, ProxySQL, and the proxy should determine what to do with the incoming requests. In other words, the application should just connect to ProxySQL, and the proxy should take care of the rest, according to its configuration.
To do so, ProxySQL should always authenticate the client before applying any rule. Therefore I thought that a quick feature to implement was Fast Forwarding based on username: when a specific user connects, all its requests are forwarded to the backends without any query processing or connection pooling.
In other words, ProxySQL's Fast Forwarding is a concept similar to MaxScale's Read Connection, but it uses the same port as the R/W split module, and the matching criterion is the client's username instead of the listener port.
Note that ProxySQL already supports multiple listeners, but the same rules apply to all ports; future versions of ProxySQL will support matching criteria based on the listener's port as well, behaving in a way similar to MaxScale, but will also add additional matching criteria like the source of the connection.
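The username-based dispatch described here can be sketched as follows (again a hypothetical simplification in Python; the username is a placeholder, and in ProxySQL this decision happens as part of client authentication):

```python
# Hypothetical sketch of username-based fast forwarding: connections from
# users listed for fast forwarding bypass query processing entirely, while
# everyone else goes through the routing rules of the R/W split.
FAST_FORWARD_USERS = {'sysbench_ro'}  # hypothetical username

def dispatch(username: str) -> str:
    if username in FAST_FORWARD_USERS:
        return 'fast_forward'     # raw forwarding, no per-query analysis
    return 'query_processing'     # regex rule chain evaluated per query

print(dispatch('sysbench_ro'))  # fast_forward
print(dispatch('rcannao'))      # query_processing
```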



Performance benchmarks


As said previously, on the same host where sysbench was running I also configured ProxySQL, MaxScale and HAProxy.
In the blog post published by Severalnines, one of the comments states that MaxScale was very slow with few connections on physical hardware.
Therefore, the first benchmark I wanted to run was precisely at a low number of connections, progressively increasing the number of connections.
ProxySQL and MaxScale were both configured with just 1 worker thread, and HAProxy was configured with only 1 process.

Please note that in the following benchmarks, worker threads and connections are two completely different entities:
1) a connection is defined as a client connection;
2) a worker thread is a thread inside the proxy, either ProxySQL, MaxScale or HAProxy (even if HAProxy uses processes and not threads).
What could cause confusion is the fact that in sysbench a thread is a connection: from a proxy's perspective, it is just a connection.

Benchmark with 1 worker thread




Tagline:
maxscale rw = MaxScale with RW Split Router
maxscale rr = MaxScale with Read Connection Router
proxysql rw = ProxySQL with query routing enabled
proxysql ff = ProxySQL with fast forwarding enabled

Average throughput in QPS:

Connections   HAProxy   MaxScale RW  MaxScale RR  ProxySQL RW  ProxySQL FF
          1   3703.36        709.99       722.27      3534.92      3676.04
          4  14506.45       2815.70      2926.44     13125.67     14275.66
          8  26628.44       5690.22      5833.77     23000.98     24514.94
         32  54570.26      14722.97     22969.73     41072.51     51998.35
        256  53715.79      13902.92     42227.46     45348.59     58210.93

In the above graphs we can easily spot that:
a) indeed, MaxScale performance is very low when running with just a few connections (more details below);
b) for any proxy, performance becomes quite unstable when the number of connections increases;
c) proxysql-ff is very close to the performance of haproxy;
d) with only 1 or 4 client connections, ProxySQL provides 5 times more throughput than MaxScale in both modules; with only 8 client connections, ProxySQL provides 4 times more throughput than MaxScale in R/W split, and 4.3 times more in fast forward mode;
e) at 32 client connections, proxysql-rw provides 2.8x more throughput than maxscale-rw, and proxysql-ff provides 2.3x more than maxscale-rr;
f) four proxy configurations (haproxy, maxscale-rw, proxysql-rw, proxysql-ff) behave similarly at 32 and 256 client connections, while maxscale-rr almost doubles its throughput at 256 connections vs 32: in other words, when the number of connections is high, some bottleneck is taken away.
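The multipliers above can be re-derived from the average-QPS table with plain arithmetic on the published numbers:

```python
# Recompute throughput ratios from the "1 worker thread, TCP" QPS table above.
qps = {
    # connections: (maxscale_rw, maxscale_rr, proxysql_rw, proxysql_ff)
    1:  (709.99,    722.27,    3534.92,  3676.04),
    8:  (5690.22,   5833.77,  23000.98, 24514.94),
    32: (14722.97, 22969.73,  41072.51, 51998.35),
}

for conns, (ms_rw, ms_rr, px_rw, px_ff) in qps.items():
    print(conns, round(px_rw / ms_rw, 1), round(px_ff / ms_rr, 1))
# At 1 connection both ProxySQL modes are ~5x faster than MaxScale;
# at 8 connections proxysql-rw is ~4.0x maxscale-rw and proxysql-ff ~4.2x maxscale-rr;
# at 32 connections the ratios drop to ~2.8x and ~2.3x respectively.
```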

Below are also the graphs of average throughput, average and 95% response time at a low number of connections.





Fortunately, I have access to physical hardware (not AWS instances) and I was able to reproduce the issue reported in that comment: MaxScale seems to be very slow when running with just a few connections.
For comparison, though, I tried a simple benchmark on AWS and found that MaxScale doesn't behave as badly there as on a physical server.
After these interesting results, I tried running the same benchmark connecting to MaxScale and ProxySQL not through TCP but through a Unix Domain Socket, with further interesting results.
Unfortunately, I didn't have a version of HAProxy that accepted connections via UDS, so I ran the benchmark against HAProxy using TCP connections.


Average throughput in QPS:
Connections   HAProxy   MaxScale RW  MaxScale RR  ProxySQL RW  ProxySQL FF
          1   3703.36       3276.85      3771.15      3716.19      3825.81
          4  14506.45      11780.27     14807.45     13333.03     14729.59
          8  26628.44      15203.93     27068.81     24504.42     25538.57
         32  54570.26      16370.69     44711.25     46846.04     58016.03
        256  53715.79      14689.73     45108.54     54229.29     71981.32

In the above graphs we can easily spot that:
a) MaxScale is no longer slow when running with just a few connections: the performance bottleneck at a low number of connections is not present when using UDS instead of TCP;
b) again, for any proxy, performance becomes quite unstable when the number of connections increases;
c) maxscale-rw is the slowest configuration at any number of connections;
d) with an increased number of client connections, MaxScale reaches its limits with an average of 16.4k reads/s peaking at 32 connections for maxscale-rw, and an average of 45.1k reads/s peaking at 256 connections for maxscale-rr;
e) with an increased number of client connections, ProxySQL reaches its limits with an average of 54.2k reads/s peaking at 256 connections for proxysql-rw, and an average of 72.0k reads/s peaking at 256 connections for proxysql-ff.

As pointed out already, with an increased number of connections the performance becomes quite unstable, although it is easy to spot that:
1) in R/W split mode, ProxySQL can reach a throughput over 3 times higher than MaxScale;
2) ProxySQL in Fast Forward mode can reach a throughput 33% higher than MaxScale in Read Connection Router mode;
3) ProxySQL in R/W split mode is faster than MaxScale in simple Read Connection Router mode.

The above shows that while MaxScale has a readconnroute module with low latency, neither of the two MaxScale modules scales very well. The bottleneck seems to be that MaxScale uses a lot of CPU, as already pointed out by Krzysztof in his blog post, therefore it quickly saturates its CPU resources without being able to scale.

Of course, it is possible to scale adding more threads: more results below!


MaxScale and TCP

At this stage I knew that, on physical hardware:
- ProxySQL was running well when clients were connecting via TCP or UDS at any number of connections;
- MaxScale was running well when clients were connecting via UDS at any number of connections;
- MaxScale was running well when clients were connecting via TCP with a high number of connections;
- MaxScale was not running well when clients were connecting via TCP with a low number of connections.

My experience with network programming quickly pointed me to where the bottleneck could be.
This search returns no results:
https://github.com/mariadb-corporation/MaxScale/search?utf8=%E2%9C%93&q=TCP_NODELAY

In other words, MaxScale never disables Nagle's algorithm, adding latency to any communication with the client. The problem is noticeable only at a low number of connections because at a high number of connections the latency introduced by Nagle's algorithm becomes small compared to the overall latency caused by processing multiple clients. For reference:
http://en.wikipedia.org/wiki/Nagle%27s_algorithm
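For reference, disabling Nagle's algorithm is a single socket option; a minimal sketch in plain Python (not MaxScale's or ProxySQL's actual code):

```python
import socket

def disable_nagle(sock: socket.socket) -> None:
    # TCP_NODELAY=1 disables Nagle's algorithm: small packets are sent
    # immediately instead of being buffered while waiting for ACKs,
    # trading some bandwidth efficiency for lower per-query latency.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(s)
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # non-zero once set
s.close()
```

A proxy would typically set this on every accepted client socket and on every backend connection.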

I will also soon open a bug report against MaxScale.

What I can't understand, and I would appreciate if someone else could comment on this, is why Nagle's algorithm doesn't seem to have any effect on AWS or other virtualization environments.
In any case, this is a very interesting example of how software behaves differently on physical hardware and in virtualization environments.

Because MaxScale performs, on average, 5x slower at a low number of connections via TCP, the following graphs use only UDS for ProxySQL and MaxScale: the performance of MaxScale over TCP was too low to be considered.






Benchmark with 2 worker threads

Because MaxScale performs really badly at a low number of connections via TCP, due to Nagle's algorithm on physical hardware, I decided to run all the following benchmarks connecting to MaxScale and ProxySQL only through UDS. HAProxy will still be used for comparison, even if its connections are through TCP sockets.
I know it is not fair to compare the performance of connections via TCP (HAProxy) against connections via UDS (ProxySQL and MaxScale), but HAProxy is used only for reference.


Average throughput in QPS:
Connections    HAProxy   MaxScale RW  MaxScale RR  ProxySQL RW  ProxySQL FF
          4   14549.61      11627.16     14185.88     13697.03     14795.74
          8   27492.41      21865.39     27863.94     25540.61     27747.10
         32   81301.32      29602.84     63553.77     62350.89     77449.45
        256  109867.66      28329.80     73751.24     81663.75    125717.18
        512  105999.84      26696.60     69488.71     81734.18    128512.32
       1024  103654.97      27340.47     63446.61     74747.25    118992.24


Notes with 2 worker threads (for MaxScale and ProxySQL) or 2 worker processes (HAProxy):
a) once again, for any proxy, performance becomes quite unstable when the number of connections increases. Perhaps this is not a bug in the proxies, but a result of how the kernel schedules processes;
b) up to 32 client connections, the performance with 2 workers is very similar to the performance with 1 worker, no matter the proxy. Each proxy configuration has its own performance profile, but it performs the same with either 1 or 2 workers;
c) maxscale-rw reaches its average peak at 32 connections, reaching 29.6k reads/s;
d) maxscale-rr reaches its average peak at 256 connections, reaching 73.8k reads/s;
e) proxysql-rw reaches its average peak at 512 connections, reaching 81.7k reads/s;
f) proxysql-ff reaches its average peak at 512 connections, reaching 128.5k reads/s;

As pointed out already, with an increased number of connections the performance becomes quite unstable, but as in the workload with just one worker thread it is easy to spot that:
1) in R/W split mode, ProxySQL can reach a throughput nearly 3 times higher than MaxScale;
2) ProxySQL in Fast Forward mode can reach a throughput 74% higher than MaxScale in Read Connection Router mode;
3) ProxySQL in R/W split mode is faster than MaxScale in simple Read Connection Router mode.

The above points confirm what was said previously: ProxySQL uses fewer CPU resources, therefore it is able to scale much better than MaxScale as the number of client connections increases.




Benchmark with 4 worker threads

I ran more benchmarks using 4 worker threads for ProxySQL and MaxScale, and 4 worker processes for HAProxy.


Average throughput in QPS:
Connections    HAProxy   MaxScale RW  MaxScale RR  ProxySQL RW  ProxySQL FF
         16   50258.21      41939.80     50621.74     46265.65     51280.99
         32   89501.33      50339.81     87192.58     70321.17     85846.94
        256  174666.09      52294.70    117709.30    115056.50    183602.60
        512  176398.33      46777.17    114743.73    112982.78    188264.03
       2048  157304.08          0.00    107052.01    102456.38    187906.29


What happens with 4 worker threads/processes?
a) as with 1 or 2 workers, for any proxy, performance becomes quite unstable when the number of connections increases, but this time the fluctuations seem smoother. Still, ProxySQL seems the most stable proxy at a high number of connections;
b) at 32 connections, ProxySQL and HAProxy give similar throughput with either 2 or 4 workers;
c) at 32 connections, MaxScale provides more throughput with 4 workers than with 2 workers, showing that MaxScale needs more CPU power to provide better throughput;
d) at 32 connections, HAProxy, ProxySQL and MaxScale provide similar reads/s if they do not analyze traffic (89.5k, 85.8k and 87.2k);
e) using R/W functionality, at 16 connections ProxySQL provides 10% more reads/s than MaxScale (46.3k vs 41.9k), and at 32 connections ProxySQL provides 40% more reads/s than MaxScale (70.3k vs 50.3k);
f) MaxScale in R/W mode wasn't able to handle 2048 client connections;
g) maxscale-rw reaches its average peak at 256 connections, with 52.3k reads/s;
h) maxscale-rr reaches its average peak at 256 connections, with 117.7k reads/s;
i) proxysql-rw reaches its average peak at 256 connections, with 115.1k reads/s;
j) proxysql-ff reaches its average peak at 512 connections, with 188.3k reads/s;
 
A few more notes on scalability with 4 threads:
1) in R/W split mode, ProxySQL can reach a throughput over 2 times higher than MaxScale;
2) ProxySQL in Fast Forward mode can reach a throughput 60% higher than MaxScale in Read Connection Router mode;
3) ProxySQL in R/W split mode is, for the first time, slightly slower than MaxScale in simple Read Connection Router mode (115.1k vs 117.7k).


Note on transport layer load balancing

I consider only the benchmarks related to R/W split important, because only R/W split provides SQL load balancing; HAProxy, ProxySQL in fast forward and MaxScale with the readconnroute module do not provide SQL load balancing, but are present in the benchmarks above to give some reference for the overhead caused by processing SQL traffic.
Furthermore, the performance of MaxScale's readconnroute cannot be compared with the performance of HAProxy or ProxySQL. From a user's perspective, I would prefer to use HAProxy because it provides far better performance.


Conclusions

One of the main goals while developing ProxySQL is that it must be a very fast proxy, introducing almost no latency. This goal seems to be well achieved: ProxySQL is able to process MySQL traffic with very little overhead, and it is able to scale very well.
In all the benchmarks listed above ProxySQL is able to scale easily.
In fact, in R/W split mode (highly configurable in ProxySQL, but hardcoded in MaxScale), ProxySQL is able to provide up to 5 times more throughput than MaxScale, depending on the workload.

Since ProxySQL in query processing mode (R/W split) provides more throughput than MaxScale's readconnroute in the majority of cases, I would always use ProxySQL's query processing, which implements important features like query routing, query rewriting, query caching, statistics, connection pooling, etc.
As of today, the only reason why I wouldn't use ProxySQL in production is that ProxySQL is not GA ... yet!





Tuesday, January 21, 2014

ProxySQL Tutorial - part 1




The first tutorial is a high level overview of how to install ProxySQL, and how to quickly configure and use it.




Install and compile ProxySQL




Installing and compiling ProxySQL is very straightforward:
1) download and uncompress the source code from GitHub
2) compile


Note:
The following packages are needed in order to compile:
- libssl and libssl-dev (Ubuntu) or openssl and openssl-devel (CentOS)
- libglib2 and libglib2.0-dev (Ubuntu) or glib2 and glib2-devel (CentOS)
- libmysqlclient and libmysqlclient-dev (Ubuntu) or mysql-libs and mysql-devel (CentOS)


Below is an example:


vegaicm@voyager:~/proxysql$ wget -q https://github.com/renecannao/proxysql/archive/master.zip -O proxysql.zip
vegaicm@voyager:~/proxysql$ unzip -q proxysql.zip
vegaicm@voyager:~/proxysql$ cd proxysql-master/src/
vegaicm@voyager:~/proxysql/proxysql-master/src$ make
...
gcc -o proxysql obj/main.o obj/free_pkts.o obj/mem.o obj/debug.o obj/sqlite3.o obj/fundadb_hash.o obj/global_variables.o obj/mysql_connpool.o obj/mysql_protocol.o obj/mysql_handler.o obj/network.o obj/queue.o obj/threads.o obj/admin_sqlite.o -I../include -lpthread -ggdb -rdynamic -lcrypto `mysql_config --libs_r --cflags` `pkg-config --libs --cflags gthread-2.0` -DDEBUG -lm








Configure ProxySQL to run against your local MySQL instance



ProxySQL is under active development, evolving day by day, thus some of the options and switches may change from version to version.
To simplify the configuration and make it easy to try, ProxySQL ships with an interactive script that helps with its configuration: proxysql_interactive_config.pl


Launch the script, and it will guide you step by step through a basic configuration.
The script will propose several default values, making the configuration very simple.
Following is an example, where we are going to use many of the defaults and configure only server and usernames:



vegaicm@voyager:~/proxysql/proxysql-master/src$ ./proxysql_interactive_config.pl



Interactive script to configure ProxySQL,
High Performance and High Availability proxy for MySQL





Generic options:
- core_dump_file_size : maximum size of core dump in case of crash
- stack_size : stack size allocated for each thread
- error_log : log file for error messages (not implemented yet)


   core_dump_file_size [0]:
   stack_size (65536-8388608) [524288]:



Clients can communicate with ProxySQL through 2 different sockets:
- proxy_mysql_port : TCP socket for MySQL traffic : default is 6033
- mysql_socket : Unix Domain socket : default is /tmp/proxysql.sock


   proxy_mysql_port [6033]:
   mysql_socket [/tmp/proxysql.sock]:



ProxySQL uses an admin interface for runtime configuration and to export statistics.
This interface uses the MySQL protocol and can be used by any MySQL client.
Options:
- proxy_admin_port : TCP socket for Administration : default is proxy_mysql_port-1 (6032)
- proxy_admin_user : username for authentication ( this is not a mysql user )
- proxy_admin_password : password for the user specified in proxy_admin_user


   proxy_admin_port [6032]:
   proxy_admin_user [admin]:
   proxy_admin_password [admin]:



ProxySQL can cache SELECT statements executed by the application.
Query cache is configured through:
- mysql_query_cache_partitions : defines the number of partitions, reducing contention
- mysql_query_cache_default_timeout : default TTL for queries without an explicit TTL
- mysql_query_cache_size : total amount of memory allocable for query cache


   mysql_query_cache_partitions (1-64) [16]:
   mysql_query_cache_default_timeout (0-315360000) [1]: 30
   mysql_query_cache_size (1048576-10737418240) [67108864]:



Several options define the network behaviour of ProxySQL:
- mysql_threads : defines how many threads will process MySQL traffic
- mysql_poll_timeout : poll() timeout (millisecond)
- mysql_max_query_size : maximum length of a query to be analyzed
- mysql_max_resultset_size : maximum size of resultset for caching and buffering
- net_buffer_size : internal buffer for network I/O
- backlog : listen() backlog


   mysql_threads (1-128) [8]: 4
   mysql_poll_timeout (100-1000000) [10000]:
   mysql_max_query_size (1-16777210) [1048576]:
   mysql_max_resultset_size (1-1073741824) [1048576]:
   net_buffer_size (1024-16777216) [8192]:
   backlog (50-10000) [2000]:



ProxySQL implements an internal connection pool. Configurable with:
- mysql_connection_pool_enabled : enables the connection pool if set to 1
- mysql_wait_timeout : timeout to drop unused connections


   mysql_connection_pool_enabled (0-1) [1]:
   mysql_wait_timeout (1-31536000) [28800]:




ProxySQL connects to various mysqld instances that form the backend.
- mysql_servers : list of mysqld servers in the format host:port;host:port;...


   Hostname[:port] of backend#1 : 127.0.0.1:3306
   Would you like to add another backend server (Y-N) [N]: N



A few options specify how to connect to the backend:
- mysql_usage_user : user used by ProxySQL to connect to the backend to verify its status
- mysql_usage_password : password for user specified in mysql_usage_user


Note:
the user specified in mysql_usage_user needs only USAGE privilege, and you can create the user with GRANT USAGE


   mysql_usage_user [proxy]:
   mysql_usage_password [proxy]:


Note (again!):
The user specified in mysql_usage_user needs only USAGE privilege
You can create the user with GRANT USAGE ON *.* TO 'proxy'@'<proxysqlip>' IDENTIFIED BY 'proxy';




ProxySQL authenticates clients' connections, and then uses the same credentials to connect to the backends.
ProxySQL needs to know clients' usernames and passwords because a single client connection can generate multiple connections to the backend.


   Username for user#1 : testuser
   Password for user msandbox : testpassword
   Would you like to add another user (Y-N) [N]: N


Basic configuration completed!



If you compiled ProxySQL with debug information (enabled by default) you can enable debug verbosity.


   Would you like to enable debug verbosity? (Y-N) [N]: N




#
# ProxySQL config file
#
# Generated using proxysql_interactive_config.pl
#
[global]
core_dump_file_size=0
debug=0
stack_size=524288
proxy_admin_port=6032
proxy_admin_user=admin
proxy_admin_password=admin
net_buffer_size=8192
backlog=2000


[mysql]
mysql_threads=4
proxy_mysql_port=6033
mysql_socket=/tmp/proxysql.sock
mysql_query_cache_partitions=16
mysql_query_cache_default_timeout=30
mysql_query_cache_size=67108864
mysql_poll_timeout=10000
mysql_max_query_size=1048576
mysql_max_resultset_size=1048576
mysql_connection_pool_enabled=1
mysql_wait_timeout=28800
mysql_servers=127.0.0.1:3306
mysql_usage_user=proxy
mysql_usage_password=proxy
[mysql users]
testuser=testpassword


[debug]
debug_generic=0
debug_net=0
debug_pkt_array=0
debug_memory=0
debug_mysql_com=0
debug_mysql_connection=0
debug_mysql_server=0
debug_admin=0
debug_mysql_auth=0



Would you like to write a configuration file? (Y-N) [Y]:
    config filename [proxysql.cnf]:  


Configuration completed!
Quit



Note on debugging: do not enable debug at this stage. I will write an article just on this topic.



Create users in your local MySQL instance to allow ProxySQL to connect:


GRANT USAGE ON *.* TO 'proxy'@'127.0.0.1' IDENTIFIED BY 'proxy';
CREATE USER 'testuser'@'127.0.0.1' IDENTIFIED BY 'testpassword';
GRANT ALL PRIVILEGES ON test.* TO 'testuser'@'127.0.0.1';
GRANT ALL PRIVILEGES ON sbtest.* TO 'testuser'@'127.0.0.1';



Run ProxySQL



Starting ProxySQL is very straightforward: just execute the following at your prompt:


vegaicm@voyager:~/proxysql/proxysql-master/src$ ./proxysql


Note that it currently has some intentional limitations due to the fact that it is under development:
- no daemon support, only foreground;
- no logging facilities included, only stdout;
- the working directory must be writable.



Connect to MySQL through ProxySQL



Let’s make a simple test connecting to ProxySQL using TCP port 6033:



vegaicm@voyager:~$ mysql -u testuser -ptestpassword -h 127.0.0.1 -P6033
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3055020864
Server version: 5.1.30 (Ubuntu)


Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.


Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


mysql> SHOW GRANTS;
+-----------------------------------------------------------------------------------------------------------------+
| Grants for testuser@127.0.0.1                                                                                   |
+-----------------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'testuser'@'127.0.0.1' IDENTIFIED BY PASSWORD '*9F69E47E519D9CA02116BF5796684F7D0D45F8FA' |
| GRANT ALL PRIVILEGES ON `test`.* TO 'testuser'@'127.0.0.1'                                                      |
| GRANT ALL PRIVILEGES ON `sbtest`.* TO 'testuser'@'127.0.0.1'                                                    |
+-----------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)


mysql>
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


mysql> CREATE TABLE proxytest (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=INNODB;
Query OK, 0 rows affected (0.04 sec)


mysql> INSERT INTO proxytest VALUES(NULL);
Query OK, 1 row affected (0.00 sec)


mysql> INSERT INTO proxytest SELECT NULL FROM proxytest;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0


mysql> INSERT INTO proxytest SELECT NULL FROM proxytest;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0


mysql> SELECT * FROM proxytest;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
+----+
4 rows in set (0.00 sec)


mysql> \q
Bye


Now let’s try to connect to ProxySQL through the socket /tmp/proxysql.sock:


vegaicm@voyager:~$ mysql -u testuser -ptestpassword -S /tmp/proxysql.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3054492480
Server version: 5.1.30 (Ubuntu)


Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.


Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


mysql> SELECT * FROM proxytest;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
+----+
4 rows in set (0.00 sec)


mysql> \q
Bye



Simple queries through ProxySQL seem to give us the results we were expecting.
Using mysqlslap we can quickly test how ProxySQL performs compared to MySQL, running the very simple query “SELECT * FROM proxytest”. We will run MySQL both with and without the MySQL query cache enabled.



vegaicm@voyager:~$ for i in `seq 1 20000` ; do echo "SELECT * FROM proxytest;" ; done > /tmp/select_proxytest.sql


vegaicm@voyager:~$ mysql -u root -e "SET GLOBAL query_cache_size=0;" ; mysqlslap -u testuser -ptestpassword --create-schema=test -c 8,16,32,64,128,256 -q /tmp/select_proxytest.sql -S /var/run/mysqld/mysqld.sock | grep "Average number of seconds"
   Average number of seconds to run all queries: 3.284 seconds
   Average number of seconds to run all queries: 6.591 seconds
   Average number of seconds to run all queries: 13.530 seconds
   Average number of seconds to run all queries: 29.546 seconds
   Average number of seconds to run all queries: 66.452 seconds
   Average number of seconds to run all queries: 138.742 seconds



vegaicm@voyager:~$ mysql -u root -e "SET GLOBAL query_cache_size=1024*1024;" ; mysqlslap -u testuser -ptestpassword --create-schema=test -c 8,16,32,64,128,256 -q /tmp/select_proxytest.sql -S /var/run/mysqld/mysqld.sock | grep "Average number of seconds"
   Average number of seconds to run all queries: 1.632 seconds
   Average number of seconds to run all queries: 4.528 seconds
   Average number of seconds to run all queries: 7.533 seconds
   Average number of seconds to run all queries: 17.182 seconds
   Average number of seconds to run all queries: 41.967 seconds
   Average number of seconds to run all queries: 91.434 seconds




vegaicm@voyager:~$ mysqlslap -u testuser -ptestpassword --create-schema=test -c 8,16,32,64,128,256 -q /tmp/select_proxytest.sql -S /tmp/proxysql.sock | grep "Average number of seconds"
   Average number of seconds to run all queries: 1.135 seconds
   Average number of seconds to run all queries: 1.908 seconds
   Average number of seconds to run all queries: 4.575 seconds
   Average number of seconds to run all queries: 9.451 seconds
   Average number of seconds to run all queries: 18.843 seconds
   Average number of seconds to run all queries: 36.143 seconds
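As a rough sanity check, we can compute ProxySQL's speedup at each concurrency level from the averages reported above (a simple sketch using only the numbers shown, nothing more):

```python
# Average seconds reported by mysqlslap above, per concurrency level
# (8, 16, 32, 64, 128, 256 clients).
concurrency = [8, 16, 32, 64, 128, 256]
mysql_no_qc = [3.284, 6.591, 13.530, 29.546, 66.452, 138.742]
mysql_qc    = [1.632, 4.528, 7.533, 17.182, 41.967, 91.434]
proxysql    = [1.135, 1.908, 4.575, 9.451, 18.843, 36.143]

for c, a, b, p in zip(concurrency, mysql_no_qc, mysql_qc, proxysql):
    print(f"{c:>3} clients: {a/p:.2f}x vs no query cache, {b/p:.2f}x vs query cache")
```

ProxySQL comes out between roughly 2.9x and 3.8x faster than MySQL without the query cache, and between 1.4x and 2.5x faster than MySQL with the query cache enabled.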






A picture is worth a thousand words:







Build a small cluster to run ProxySQL



While it is possible to use ProxySQL with a single MySQL server, many of its features (load balancing, query routing, read/write split) require several MySQL backends, configured in various architectures.
To build a small cluster with several MySQL backends we will create a simple replication setup using MySQL Sandbox.


If you are not familiar with MySQL Sandbox, it is “a tool for installing one or more MySQL servers in isolation, without affecting other servers”, and you can download it from https://launchpad.net/mysql-sandbox .


vegaicm@voyager:~$ make_replication_sandbox mysql_binaries/mysql-5.5.34-linux2.6-i686.tar.gz
installing and starting master
installing slave 1
installing slave 2
starting slave 1
.... sandbox server started
starting slave 2
.... sandbox server started
initializing slave 1
initializing slave 2
replication directory installed in $HOME/sandboxes/rsandbox_mysql-5_5_34




Now that the cluster is installed, verify which ports the various mysqld processes are listening on:


vegaicm@voyager:~$ cd sandboxes/rsandbox_mysql-5_5_34
vegaicm@voyager:~/sandboxes/rsandbox_mysql-5_5_34$ cat default_connection.json
{
"master":
   {
       "host":     "127.0.0.1",
       "port":     "23389",
       "socket":   "/tmp/mysql_sandbox23389.sock",
       "username": "msandbox@127.%",
       "password": "msandbox"
   }
,
"node1":
   {
       "host":     "127.0.0.1",
       "port":     "23390",
       "socket":   "/tmp/mysql_sandbox23390.sock",
       "username": "msandbox@127.%",
       "password": "msandbox"
   }
,
"node2":
   {
       "host":     "127.0.0.1",
       "port":     "23391",
       "socket":   "/tmp/mysql_sandbox23391.sock",
       "username": "msandbox@127.%",
       "password": "msandbox"
   }
}
The mysqld processes are listening on port 23389 (master) and 23390 and 23391 (slaves).


Important note:
Unless configured differently (details in another tutorial), ProxySQL considers any server configured with read_only=OFF to be a writable master. Because MySQL Sandbox doesn’t configure the slaves with read_only=ON, ProxySQL will consider every node a master: that is obviously incorrect and needs to be addressed before starting ProxySQL.
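To illustrate the rule, here is a toy sketch (not ProxySQL's actual code) of how a proxy following this convention would classify the sandbox nodes based on their read_only value:

```python
# Toy model of the read_only-based role detection described above:
# read_only=OFF => writable master, read_only=ON => read-only slave.
servers = {
    "127.0.0.1:23389": "OFF",  # master
    "127.0.0.1:23390": "OFF",  # slave, but MySQL Sandbox doesn't set read_only!
    "127.0.0.1:23391": "OFF",  # same problem here
}

def classify(servers):
    masters = [h for h, ro in servers.items() if ro == "OFF"]
    slaves  = [h for h, ro in servers.items() if ro == "ON"]
    return masters, slaves

masters, slaves = classify(servers)
print(masters)  # all three nodes look like writable masters: wrong!

# After running SET GLOBAL read_only=ON on the two slaves:
servers["127.0.0.1:23390"] = "ON"
servers["127.0.0.1:23391"] = "ON"
masters, slaves = classify(servers)
print(masters, slaves)  # now only the real master is writable
```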


Change the read_only setting on the two slaves:


vegaicm@voyager:~$ mysql -u msandbox -pmsandbox -h 127.0.0.1 -P23390 -e "SET GLOBAL read_only=ON"
vegaicm@voyager:~$ mysql -u msandbox -pmsandbox -h 127.0.0.1 -P23391 -e "SET GLOBAL read_only=ON"





Reconfigure ProxySQL to use the test cluster



It is possible to reconfigure ProxySQL without restarting it, but that is a longer process that will be explained in another tutorial. For now, in order to reconfigure proxysql, you should:
- stop proxysql
- delete the proxysql.db* files
- edit proxysql.cnf or relaunch proxysql_interactive_config.pl


Note: the proxysql.db* files are the database files that ProxySQL uses to store its configuration after processing proxysql.cnf, and the configuration stored in these files has higher priority than the same configuration values in proxysql.cnf. That is why you must delete the proxysql.db* files after editing proxysql.cnf, otherwise the new settings in proxysql.cnf won’t have any effect.
In the next article we will describe the content of these database files and how to reconfigure ProxySQL without deleting any file and without restarting the process.
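The precedence rule can be modelled with a tiny sketch (a toy model, not ProxySQL's real startup logic): once values are persisted in the database files, they override whatever is in the cnf file, until the database files are deleted.

```python
# Toy sketch of the precedence described above:
# values persisted in proxysql.db* win over proxysql.cnf.
def effective_config(cnf, db):
    # db is None when the proxysql.db* files have been deleted
    if db is None:
        return dict(cnf)      # fresh start: the cnf is processed and used
    merged = dict(cnf)
    merged.update(db)         # persisted values override the cnf
    return merged

cnf = {"mysql_usage_user": "proxy"}      # freshly edited cnf
db  = {"mysql_usage_user": "msandbox"}   # value persisted at first startup

print(effective_config(cnf, db))    # the cnf edit is ignored
print(effective_config(cnf, None))  # after deleting proxysql.db*, it takes effect
```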


For simplicity I will rerun proxysql_interactive_config.pl and highlight the few differences:


vegaicm@voyager:~/proxysql/proxysql-master/src$ ./proxysql
^C
vegaicm@voyager:~/proxysql/proxysql-master/src$ rm -f proxysql.db*
vegaicm@voyager:~/proxysql/proxysql-master/src$ rm proxysql.cnf
vegaicm@voyager:~/proxysql/proxysql-master/src$ ./proxysql_interactive_config.pl



Interactive script to configure ProxySQL,
High Performance and High Availability proxy for MySQL
...


  Hostname[:port] of backend#1 : 127.0.0.1:23389
  Would you like to add another backend server (Y-N) [N]: Y
  Hostname[:port] of backend#2 : 127.0.0.1:23390
  Would you like to add another backend server (Y-N) [N]: y
  Hostname[:port] of backend#3 : 127.0.0.1:23391
  Would you like to add another backend server (Y-N) [N]: N




  mysql_usage_user [proxy]: msandbox
  mysql_usage_password [proxy]: msandbox


Note (again!):
The user specified in mysql_usage_user needs only the USAGE privilege.
You can create the user with GRANT USAGE ON *.* TO 'msandbox'@'<proxysqlip>' IDENTIFIED BY 'msandbox';




ProxySQL authenticates clients' connections, and then uses the same credentials to connect to the backends.
ProxySQL needs to know clients' usernames and passwords because a single client connection can generate multiple connections to the backend.


  Username for user#1 : msandbox
  Password for user msandbox : msandbox
  Would you like to add another user (Y-N) [N]: N





Your new proxysql.cnf should look like this.









Connect to the various MySQL backends through ProxySQL


Restart ProxySQL:


vegaicm@voyager:~/proxysql/proxysql-master/src$ ./proxysql



Once again, let’s make a simple test connecting to ProxySQL:



vegaicm@voyager:~$ mysql -u msandbox -pmsandbox -h 127.0.0.1 -P6033
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3048201024
Server version: 5.1.30 MySQL Community Server (GPL)


Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.


Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


mysql> SELECT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)


mysql> \q


All looks good so far. But which backend are we connected to?
Let’s make a simple test:


vegaicm@voyager:~$ mysql -u msandbox -pmsandbox -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port"
+------------+--------+
| @@hostname | @@port |
+------------+--------+
| voyager    |  23390 |
+------------+--------+
vegaicm@voyager:~$ mysql -u msandbox -pmsandbox -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port"
+------------+--------+
| @@hostname | @@port |
+------------+--------+
| voyager    |  23391 |
+------------+--------+
vegaicm@voyager:~$ mysql -u msandbox -pmsandbox -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port"
+------------+--------+
| @@hostname | @@port |
+------------+--------+
| voyager    |  23390 |
+------------+--------+
vegaicm@voyager:~$ mysql -u msandbox -pmsandbox -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port"
+------------+--------+
| @@hostname | @@port |
+------------+--------+
| voyager    |  23389 |
+------------+--------+


It seems we are randomly connected to one of the backends.


Let’s make a different test now: we will send the identical query many times over a very short period of time:


vegaicm@voyager:~$ for i in `seq 1 50` ; do mysql -u msandbox -pmsandbox -B -N -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port" ; done | sort | uniq -c
    50 voyager    23389
vegaicm@voyager:~$ for i in `seq 1 50` ; do mysql -u msandbox -pmsandbox -B -N -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port" ; done | sort | uniq -c
    50 voyager    23390
vegaicm@voyager:~$ for i in `seq 1 50` ; do mysql -u msandbox -pmsandbox -B -N -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port" ; done | sort | uniq -c
    50 voyager    23390
vegaicm@voyager:~$ for i in `seq 1 50` ; do mysql -u msandbox -pmsandbox -B -N -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port" ; done | sort | uniq -c
    50 voyager    23389
vegaicm@voyager:~$ for i in `seq 1 50` ; do mysql -u msandbox -pmsandbox -B -N -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port" ; done | sort | uniq -c
    50 voyager    23389
vegaicm@voyager:~$ for i in `seq 1 50` ; do mysql -u msandbox -pmsandbox -B -N -h 127.0.0.1 -P6033 -e "SELECT @@hostname, @@port" ; done | sort | uniq -c
    50 voyager    23391


Caching in action!
Because ProxySQL is configured to cache every SELECT statement for 1 second, if we send several identical queries in a short period of time the queries won’t be executed by the MySQL backend but will be resolved directly from ProxySQL’s internal cache!
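The behaviour can be modelled with a tiny time-to-live cache (a hypothetical sketch, not ProxySQL's implementation): identical queries arriving within the TTL are answered from the cache and never reach a backend.

```python
import time

class TTLQueryCache:
    """Toy query cache: identical queries within `ttl` seconds hit the cache."""
    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self.store = {}           # query -> (result, expiry timestamp)
        self.backend_hits = 0

    def execute(self, query):
        now = time.monotonic()
        entry = self.store.get(query)
        if entry and entry[1] > now:
            return entry[0]       # served from cache, backend not touched
        self.backend_hits += 1    # simulate running the query on a backend
        result = f"result-of({query})"
        self.store[query] = (result, now + self.ttl)
        return result

cache = TTLQueryCache(ttl=1.0)
for _ in range(50):
    cache.execute("SELECT @@hostname, @@port")
print(cache.backend_hits)  # only the first query reached the "backend"
```

This mirrors what the `for i in seq 1 50` loops above show: all 50 identical queries return the same `@@port`, because only the first one was actually executed on a backend.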




Running some simple load test using ProxySQL



We will use sysbench to run some simple load tests.
Note that the following results should not be considered a meaningful benchmark: all the servers run on the same physical box (my laptop), so they can’t really scale out. To scale out and improve performance you need to add more servers.
Therefore, consider what follows only as an example of how to use proxysql.



First, prepare the table:


vegaicm@voyager:~$ sysbench --test=oltp --oltp-table-size=10000 --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=10000 prepare
sysbench 0.4.12:  multi-threaded system evaluation benchmark


No DB drivers specified, using mysql
Creating table 'sbtest'...
Creating 10000 records in table 'sbtest'...



Before running the benchmarks, note that:


1) ProxySQL’s performance is affected by the fact that the TCP protocol is relatively expensive (compared to a Unix domain socket);
2) by default, sysbench uses the MySQL binary protocol (that is, prepared statements), but ProxySQL doesn’t recognize this protocol (yet?), and therefore can’t boost performance when prepared statements are used. Note that the majority of applications do not use prepared statements, so ProxySQL is meant to boost the performance of the majority of applications out there.
With this in mind, we will run a read-only workload using a Unix domain socket and disabling prepared statements in sysbench.


Running sysbench directly against mysqld without ProxySQL:
vegaicm@voyager:~$ sysbench --num-threads=16 --max-requests=0 --max-time=30 --test=oltp --oltp-table-size=10000 --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test --mysql-socket=/tmp/mysql_sandbox23389.sock --oltp-table-size=10000 --oltp-read-only --db-ps-mode=disable run | grep 'transactions:'
    transactions:                        39824  (1327.14 per sec.)


And with ProxySQL:
vegaicm@voyager:~$ sysbench --num-threads=16 --max-requests=0 --max-time=30 --test=oltp --oltp-table-size=10000 --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test --mysql-socket=/tmp/proxysql.sock --oltp-table-size=10000 --oltp-read-only --db-ps-mode=disable run | grep 'transactions:'
    transactions:                        80899  (2696.14 per sec.)





You can easily see how ProxySQL improves performance!





To summarize in a graph:



Testing read/write workload



What about using ProxySQL in a read/write workload?


sysbench with a read/write workload without proxysql:


vegaicm@voyager:~$ sysbench --num-threads=16 --max-requests=0 --max-time=30 --test=oltp --oltp-table-size=10000 --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test --mysql-socket=/tmp/mysql_sandbox23389.sock --oltp-table-size=10000 --db-ps-mode=disable run | grep 'transactions:'
    transactions:                        13010  (433.45 per sec.)


sysbench with a read/write workload with proxysql:


vegaicm@voyager:~$ sysbench --num-threads=16 --max-requests=0 --max-time=30 --test=oltp --oltp-table-size=10000 --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test --mysql-socket=/tmp/proxysql.sock --oltp-table-size=10000 --db-ps-mode=disable run | grep 'transactions:'
    transactions:                        14696  (487.60 per sec.)
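A quick calculation from the transaction rates reported above (a sketch based only on these numbers) puts the two workloads side by side:

```python
# Transactions/sec reported by sysbench above.
readonly  = {"direct": 1327.14, "proxysql": 2696.14}
readwrite = {"direct": 433.45,  "proxysql": 487.60}

for name, tps in (("read-only", readonly), ("read/write", readwrite)):
    print(f"{name}: {tps['proxysql'] / tps['direct']:.2f}x")
```

The gain is much larger in the read-only case (roughly 2x versus roughly 12%), which is expected: only SELECT statements can be served from ProxySQL's cache, so a workload that includes writes benefits less.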





The above shows that proxysql boosts performance also in a read/write workload, but what is more important is that it transparently connects clients to the correct server, whether that is the master or a slave.


As mentioned previously, the above results are only an example of how to use proxysql: in another article we will run real benchmarks, adding more servers and highlighting how ProxySQL is able to scale out the performance of your application without code changes.


In the next article we will discuss some of the internals of ProxySQL, and how to configure them.