Dedicated to Fernando Pedone and other modest and courageous researchers whose work made this possible.
Sysbench Synchrones Transatlantiques
If the curious reader decides to google for "synchronous replication" in the hope of educating himself on the matter, he will be presented with a whole lot of established and not-so-expert advice and explanation, most of which turns out to be, well, to put it mildly, wild superstition rooted in the beginnings of the computer era. Basically the consensus among those "experts" is that "synchronous replication is so problematic that you're better off with asynchronous". Only Microsoft (!!!) honestly admits that asynchronous replication kinda "sucks and must die". Well, that may be too strong a statement, but it is a refreshing point nevertheless.
One of the most widely cultivated myths about synchronous replication is that it is slow and has severe distance limitations. Some "experts" even have a simple (and therefore convincing) formula for the dependence of the maximum transaction rate on distance: 1/RTT (where RTT stands for the round trip time of the signal). "Synchronous replication transaction rate is limited by the speed of light!" say those new Einsteins. Indeed, the minimum signal RTT over a distance of 6000 km (~4000 miles) is 0.04 seconds, which leads to 1/0.04 = 25 trx/sec. Ta-da! Hard science triumphs once again! Too bad for the general public that it is unaware that the advent of virtual synchrony and other developments in computer science have drastically reduced the distance limitations on synchronous replication ("synchronous" as in "either commit everywhere or not at all") since the 60s.
To check this I'll try to replicate a Sysbench OLTP load between the EC2 eu-west and us-east zones using a MySQL/Galera 0.7.1rc cluster (which is soon to be released). I'll use large EC2 instances, a 1M-row table and a variable number of connections to see the trend. The load will be applied to the European part of the cluster and will be synchronously replicated to a reserve node in the American (us-east) zone (a Disaster Recovery scenario). Ping RTT between the eu-west and us-east zones was measured to be min/avg/max/mdev = 86.868/87.284/94.398/0.919 ms, which leads to a "theoretical" limit of ~11 trx/sec for the transaction rate.
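For the record, here is the arithmetic behind the "simple formula", as a minimal Python sketch. The function names are mine, and the only physics assumed is the vacuum speed of light (a real signal in fiber is slower, so the real light-speed bound would be lower still). It evaluates 1/RTT for the 6000 km example and for the measured average ping RTT.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # vacuum speed of light, km/s


def light_rtt_seconds(distance_km):
    """Minimum round trip time for a signal covering distance_km each way."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S


def naive_limit(rtt_seconds):
    """The 'one transaction per round trip' bound: 1/RTT."""
    return 1.0 / rtt_seconds


# The 6000 km example from the formula above.
print(f"6000 km: {naive_limit(light_rtt_seconds(6000)):.1f} trx/sec")

# The measured average eu-west <-> us-east ping RTT, 87.284 ms.
print(f"measured: {naive_limit(0.087284):.1f} trx/sec")
```

Both numbers describe a single connection that waits out a full round trip before starting the next transaction. They say nothing about what many connections can do in parallel.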
For a reference point I'll take an official MySQL 5.1.41 build with innodb_flush_log_at_trx_commit set to 1. This somewhat decreases MySQL performance, but we're testing HA solutions here, aren't we?
Results
I'll put the raw numbers at the end. Everybody likes pictures, including myself.
Transaction rate
Aaaah?! What is it?! Not only is the transaction rate nowhere near the 11 trx/sec predicted by the simple (and therefore convincing) formula, it is actually quite close to that of plain MySQL, given the right number of connections. That's right, it is called "concurrency". "Synchronous concurrency" if you will. And it even shows some signs of scalability when adding more nodes. All this is achieved with the out-of-the-box MySQL/Galera distribution without tweaking of any kind. (To tell the truth, 0.7 is not very tweaking-friendly. I'll revisit this with 0.8.)
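Why concurrency beats 1/RTT can be checked against the raw numbers. The sketch below assumes Sysbench behaves as a closed loop, where each connection issues its next transaction as soon as the previous one returns. In that case Little's law gives throughput ≈ connections / average latency. The rows are the "1eu + 1usa" figures from the Hard Numbers section; the function name is mine.

```python
# (connections, measured trx/sec, average latency in seconds), "1eu + 1usa"
RUNS = [
    (8, 76.25, 0.1049),
    (16, 141.62, 0.1129),
    (32, 204.55, 0.1564),
    (64, 328.22, 0.1949),
    (128, 332.54, 0.3846),
]


def predicted_rate(connections, avg_latency_s):
    """Little's law for a closed loop: throughput = concurrency / latency."""
    return connections / avg_latency_s


for conns, measured, latency in RUNS:
    predicted = predicted_rate(conns, latency)
    error_pct = 100 * abs(predicted - measured) / measured
    print(f"{conns:>4} conns: predicted {predicted:7.2f}, "
          f"measured {measured:7.2f}, off by {error_pct:.2f}%")
```

The predictions land within a fraction of a percent of the measurements. RTT sets the latency of each transaction, while the number of transactions in flight sets the rate.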
Latencies
Well, RTT takes its toll. However, it is not that dramatic: just one RTT per transaction. It'd be much, much worse if you tried to access the database from a client over that distance. Curiously, RTT dominates only at low numbers of concurrent connections. At 64 connections and higher some other factors, independent of the distance, come into play. This roughly coincides with the sharp cut-off we can see on the throughput chart.
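The "one RTT per transaction" claim can also be sanity-checked from the raw numbers. This rough sketch subtracts the average latency of the single European node from that of the "1eu + 1usa" cluster at the same connection count, then compares the difference with the measured average ping RTT. It assumes the two setups differ only by the replication round trip, which is a simplification.

```python
RTT_S = 0.087284  # measured average eu-west <-> us-east ping RTT, seconds

# Average latency in seconds by connection count, from the Hard Numbers section.
SINGLE_NODE = {8: 0.0184, 16: 0.0322, 32: 0.0603, 64: 0.1245}  # "1eu node"
WAN_CLUSTER = {8: 0.1049, 16: 0.1129, 32: 0.1564, 64: 0.1949}  # "1eu + 1usa"


def overhead_in_rtts(connections):
    """Extra latency added by the remote node, expressed in round trips."""
    extra = WAN_CLUSTER[connections] - SINGLE_NODE[connections]
    return extra / RTT_S


for conns in sorted(SINGLE_NODE):
    print(f"{conns:>3} conns: {overhead_in_rtts(conns):.2f} RTT of extra latency")
```

The overhead stays close to one round trip in every row, whatever the number of connections.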
Apparently we have some room for improvement here. Currently the replication subsystem is optimized for LANs, and with some tweaking (just a matter of buffer sizes) it should allow for far higher throughput over a WAN. Optimistic row lock release should further improve concurrency. Expect all these improvements in 0.8, which is not far away.
Conclusions
So we have learned something today, my curious reader. Synchronous replication is not necessarily slow. And it can be done over great distances. And not only for DR, but also to improve performance if your database needs to be accessed by clients from around the world.
So when you meet a "replication expert" who tries to educate you that synchronous replication is slow, or that it requires 2-phase commit for that matter, or... well... No, don't shit bricks like I normally do. Smile. Pat him sympathetically on the back and give him a link to MySQL/Galera distribution. Educate him.
Hard Numbers
In case you find pictures too cramped.
conns  trx/sec  dlk/sec  lat(95)  lat(avg)

plain + trx flush
    8   356.25      0.0   0.0423    0.0224
   16   437.51      0.0   0.0677    0.0366
   32   426.04      0.0   0.1350    0.0751
   64   419.96      0.0   0.2597    0.1524

1eu node
    8   435.22      0.0   0.0352    0.0184
   16   497.05      0.0   0.0589    0.0322
   32   530.13      0.0   0.1123    0.0603
   64   514.13      0.0   0.2194    0.1245

1eu + 1usa
    8    76.25      0.0   0.1226    0.1049
   16   141.62      0.0   0.1401    0.1129
   32   204.55      0.0   0.2115    0.1564
   64   328.22      0.0   0.2695    0.1949
  128   332.54      0.0   0.5518    0.3846
  192   345.54      0.0   0.8469    0.5551
  256   339.87      0.0   1.2046    0.7522
  320   340.46      0.0   1.5074    0.9383

2eu + 1usa
    8    73.90     0.11   0.1287    0.1082
   16   140.9      0.51   0.1366    0.1135
   32   233.31     1.53   0.1890    0.1371
   64   376.57     4.77   0.2784    0.1699
  128   409.60     9.61   0.5578    0.3122
  192   384.42    13.27   0.8818    0.4989
  256   386.39    17.55   1.1541    0.6618
  320   368.29    22.25   1.5761    0.8676

eu-usa rtt min/avg/max/mdev = 86.868/87.284/94.398/0.919 ms