The MySQL compressed protocol
MySQL's compressed protocol is not widely used, but some users do rely on it, hence the need to implement it in mysqlnd.
Unfortunately, the compressed protocol is full of landmines, and they go off one by one.
How does the compressed protocol work?
During the handshake the client asks the server to use compression. The server then tries to compress every packet it sends; there may be a small size threshold, as in Connector/J, but I haven't checked the server's code in net_serv.cc. If the compressed size is smaller than the original size, the compressed data is sent, otherwise the uncompressed data. The compressed protocol adds an overhead of 3 bytes, the compressed header, which holds a 3-byte length (max 16M).
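The "send the compressed form only if it is smaller" rule can be sketched in a few lines. This is a minimal Python sketch, not the server's code: it assumes zlib's deflate format, and the names `choose_payload` and `threshold` are hypothetical. The second return value is what would go into the 3-byte compressed header: the original size, or 0 when the data is sent as-is.

```python
import zlib


def choose_payload(data: bytes, threshold: int = 0) -> tuple[bytes, int]:
    """Return (payload, compressed_header_value) for one outgoing block.

    threshold is a hypothetical cut-off: shorter blocks are never compressed.
    """
    if len(data) >= threshold:
        packed = zlib.compress(data)
        if len(packed) < len(data):
            # Compression paid off: send the packed bytes, record the original size.
            return packed, len(data)
    # Not worth it: send the data untouched and flag that with a zero length.
    return data, 0
```

Very short inputs usually come back unchanged, because the zlib framing alone costs a few bytes.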
So the layout of the wire data is:
- 3-byte length
- 1-byte sequence number
- 3-byte length (the compressed header)
- data
The first length is known from the normal protocol. It is always the size of the data that follows the header, but what that data is depends. If the data compresses well, the first length is the length of the compressed data, not counting the compressed header (3 bytes), and the compressed header holds the length of the data before compression. If the data doesn't compress well, it is sent uncompressed: the first length is then the length of the uncompressed data, again not counting the compressed header, and the compressed header is set to 0x0.
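To make the layout concrete, here is a small Python sketch of reading one such frame from a buffer. It is an illustration under stated assumptions, not mysqlnd code: the function name `read_compressed_frame` is hypothetical, the integers are taken to be little-endian as elsewhere in the MySQL protocol, the first length is assumed to count only the bytes after the 7 header bytes, and the payload is assumed to be zlib data.

```python
import zlib


def read_compressed_frame(buf: bytes) -> tuple[int, bytes, bytes]:
    """Parse one frame; return (sequence number, plain payload, remaining bytes)."""
    if len(buf) < 7:
        raise ValueError("need at least 7 header bytes")
    payload_len = int.from_bytes(buf[0:3], "little")       # bytes on the wire
    seq = buf[3]                                           # compressed-session sequence
    uncompressed_len = int.from_bytes(buf[4:7], "little")  # 0 means "not compressed"
    end = 7 + payload_len
    if len(buf) < end:
        raise ValueError("incomplete frame")
    payload = buf[7:end]
    if uncompressed_len != 0:
        payload = zlib.decompress(payload)
        if len(payload) != uncompressed_len:
            raise ValueError("length mismatch after decompression")
    return seq, payload, buf[end:]
```

The zero check is the whole trick: the same header shape carries both compressed and plain payloads, and only the compressed header tells them apart.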
What is good in this scenario? The MySQL protocol works as before; it just tunnels another MySQL session inside the current one. The payload that is sent contains not only usable data but also a 4-byte header. So, once the payload is decompressed, we have a separate packet that looks like a packet from the uncompressed protocol: "header (4 bytes) + data".
Here comes the trick. To achieve better compression, when the server sends several packets it joins them into one block, which is compressed and sent as the payload of a single packet of the compressed protocol.
But there is more! The compressed session has its own sequence numbers, while the packets inside it carry different ones.
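Once such a joined payload is decompressed, pulling the individual packets back out is a matter of walking the 4-byte headers. A minimal Python sketch, assuming the payload holds only whole packets; the name `split_packets` is hypothetical.

```python
def split_packets(plain: bytes) -> list[tuple[int, bytes]]:
    """Split decompressed bytes into (sequence number, body) pairs."""
    packets = []
    pos = 0
    while pos < len(plain):
        if len(plain) - pos < 4:
            raise ValueError("truncated packet header")
        length = int.from_bytes(plain[pos:pos + 3], "little")  # 3-byte body length
        seq = plain[pos + 3]                                    # inner sequence number
        start = pos + 4
        end = start + length
        if end > len(plain):
            raise ValueError("truncated packet body")
        packets.append((seq, plain[start:end]))
        pos = end
    return packets
```

The sequence numbers returned here belong to the tunnelled session, not to the compressed frame that carried the bytes.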
And there is even more! In my experiments the uncompressed data from the server gets split into 16384-byte blocks, each of them compressed separately and sent over the wire. This can result, and I have seen it happen, in a normal packet being split between two compressed payloads. Yikes!!!!
Maybe there are more mines that I haven't found yet.
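One way to cope with a packet that straddles two payloads is to buffer the decompressed bytes and only hand out packets once they are complete. The following Python sketch shows the idea; the class name `PacketReassembler` and its `feed` method are hypothetical, and this is not how mysqlnd does it.

```python
class PacketReassembler:
    """Collect decompressed chunks and return the inner packets that are complete."""

    def __init__(self) -> None:
        self._pending = bytearray()  # bytes of a packet that is not complete yet

    def feed(self, chunk: bytes) -> list[tuple[int, bytes]]:
        """Add one decompressed payload; return (sequence number, body) pairs."""
        self._pending += chunk
        complete = []
        while len(self._pending) >= 4:
            length = int.from_bytes(self._pending[0:3], "little")
            if len(self._pending) < 4 + length:
                break  # the rest of this packet arrives in a later payload
            seq = self._pending[3]
            complete.append((seq, bytes(self._pending[4:4 + length])))
            del self._pending[:4 + length]
        return complete
```

Feeding a 20000-byte packet as a 16384-byte piece followed by the remainder returns nothing for the first call and the whole packet for the second. Even a header cut in half is handled, since nothing is parsed until 4 bytes are available.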
So, what does that mean for mysqlnd? Well, if there is not much data, mysqlnd works and can even work around the two sets of sequence numbers, but the split into 16384-byte parts is the deal breaker: mysqlnd can't handle it yet. More and more I think the real solution might be to write a stream filter and attach it to the stream connection, so that mysqlnd's routines for reading the header and the data don't need to change. The filter would hide the underlying protocol from the reads. However, I have never written a filter, and from looking at zlib.inflate it might not be that easy. Maybe I should start from string.rot13 instead, which might be easier to understand. So, stay tuned. There will be a solution, but I can't say yet what it will be.
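To illustrate the filter idea (not the eventual mysqlnd solution, and not a PHP stream filter), here is a rough Python sketch of a reader that sits on top of the raw connection and serves plain `read(n)` calls. It assumes the 7-byte frame header described earlier, little-endian lengths and zlib payloads; the class name `DecompressingReader` is hypothetical, and sequence-number checks are left out.

```python
import zlib


class DecompressingReader:
    """Expose the tunnelled byte stream through a plain read(n) call."""

    def __init__(self, raw) -> None:
        self._raw = raw            # any object with a blocking read(n) method
        self._plain = bytearray()  # decompressed bytes not yet consumed

    def _read_exact(self, n: int) -> bytes:
        data = bytearray()
        while len(data) < n:
            part = self._raw.read(n - len(data))
            if not part:
                raise EOFError("stream ended inside a frame")
            data += part
        return bytes(data)

    def _fill(self) -> None:
        """Read one compressed-protocol frame and append its plain payload."""
        header = self._read_exact(7)
        payload_len = int.from_bytes(header[0:3], "little")
        uncompressed_len = int.from_bytes(header[4:7], "little")
        payload = self._read_exact(payload_len)
        if uncompressed_len:  # zero means the payload was sent as-is
            payload = zlib.decompress(payload)
        self._plain += payload

    def read(self, n: int) -> bytes:
        """Return exactly n plain bytes, pulling in more frames as needed."""
        while len(self._plain) < n:
            self._fill()
        out = bytes(self._plain[:n])
        del self._plain[:n]
        return out
```

With something like this in place, the caller can keep doing "read 4 bytes of header, then read the body" and never notice where one compressed payload ends and the next begins.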