A programmer explores the IT security field, offering packets of useful information he picks up along the way.

Compression on the job

May 26, 2008 By: Ron Category: On the Job

I worked on an interesting assignment at work the other week. A little background first. TIBCO is a middleware solution used frequently in the financial industry. It lets legacy applications talk to each other. For example, a process written in C++ can publish a TIBCO message that is picked up and processed by a Java process, and vice versa. TIBCO is set up to run throughout the firm, across many different applications.

TIBCO can be set to run in two modes: ‘reliable’ and ‘certified’. Reliable messaging is not concerned with whether the receiving party actually receives the message; it’s publish and forget. If the recipient picks up the message that was sent, fine. If the recipient doesn’t pick it up, also fine. That is not the publishing process’s concern. Certified messaging, on the other hand, makes sure the receiving process (or processes, if several are listening) actually gets the message. If the receiver misses messages because it crashed, they are queued up, and when the process comes back up the queued messages are published out to it again.
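To make the difference between the two modes concrete, here is a toy in-memory model. It is only a sketch of the behavior described above and does not use the TIBCO API; `ToyReceiver`, `ToyPublisher`, and every method name are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical receiver: records what it gets and can be "down".
final class ToyReceiver {
    boolean up = true;
    final List<String> received = new ArrayList<>();
}

// Hypothetical publisher modelling the two delivery modes (not TIBCO code).
final class ToyPublisher {
    private final ToyReceiver receiver;
    // Certified messages that have not been delivered yet.
    private final Deque<String> ledger = new ArrayDeque<>();

    ToyPublisher(ToyReceiver receiver) {
        this.receiver = receiver;
    }

    // Reliable mode as described in this post: publish and forget.
    void publishReliable(String message) {
        if (receiver.up) {
            receiver.received.add(message);
        }
        // If the receiver is down, the message is simply lost.
    }

    // Certified mode: keep the message until it has been delivered.
    void publishCertified(String message) {
        ledger.addLast(message);
        flush();
    }

    // Deliver queued certified messages, in order, once the receiver is up.
    void flush() {
        while (receiver.up && !ledger.isEmpty()) {
            receiver.received.add(ledger.removeFirst());
        }
    }
}
```

If the receiver is down while one message is published in each mode, only the certified one shows up after the receiver restarts and `flush()` is called.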

The main process, which runs on Unix, talks to the Front-End (FE) trading system in reliable mode. The messages published to the FE include order acknowledgements, executions, and tickets, among other types. The FE processes these messages in real time. Now imagine all these messages being published out to 1,000+ trading FEs. All that traffic could overload the network, especially during high-volume trading periods. We therefore needed a way to reduce the amount of data sent over the wire to all these FEs. I decided to try good old compression, similar to ZIP and GZIP. I implemented a solution in Java that compresses the Java String message before it is published out over TIBCO to the FE. The FE needed code changes as well, to recognize the compressed messages and decompress them on the fly. I also made sure this functionality can be turned off at run time, in case something unforeseen happens and we need to revert to sending messages uncompressed.
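A minimal sketch of what that kind of compression step can look like in Java is below. This is not the production code. It assumes GZIP from `java.util.zip`, UTF-8 text, and a one-byte marker in front of the payload so the receiver can tell compressed from uncompressed messages; the class name `MessageCodec`, the marker values, and the switch are all hypothetical, and the TIBCO publish call itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: encodes a String message for the wire and decodes it again.
final class MessageCodec {
    private static final byte RAW = 0;   // assumed marker: payload is plain UTF-8
    private static final byte GZIP = 1;  // assumed marker: payload is GZIP data

    // Run-time switch; volatile so a change is visible to publishing threads.
    private static volatile boolean compressionEnabled = true;

    static void setCompressionEnabled(boolean enabled) {
        compressionEnabled = enabled;
    }

    // Publisher side: marker byte followed by the (optionally compressed) text.
    static byte[] encode(String message) {
        byte[] raw = message.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (compressionEnabled) {
                out.write(GZIP);
                try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                    gz.write(raw);
                }
            } else {
                out.write(RAW);
                out.write(raw);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Receiver side: inspect the marker and decompress only when needed.
    static String decode(byte[] payload) {
        if (payload.length == 0) {
            throw new IllegalArgumentException("empty payload");
        }
        int bodyLength = payload.length - 1;
        if (payload[0] == RAW) {
            return new String(payload, 1, bodyLength, StandardCharsets.UTF_8);
        }
        if (payload[0] != GZIP) {
            throw new IllegalArgumentException("unknown marker: " + payload[0]);
        }
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(payload, 1, bodyLength))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the receiver looks at the marker byte rather than at the switch, flipping `setCompressionEnabled(false)` on the publisher needs no coordinated change on the FE side: messages sent before and after the flip both decode.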

Data compression reduces the size of data by encoding it more compactly. There are many different compression algorithms, and different ones suit different types of files. The table below shows the compression rates of a few sample messages that were compressed using “on the fly” compression. Notice that the bigger the message, the better the compression rate. The reason is that “on the fly” compression uses a substitution scheme, replacing text it has already seen with shorter references. Bigger messages tend to contain more repeated text, and the more repetitive the text, the better the compression rate.
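The size effect is easy to reproduce. The sketch below uses GZIP from `java.util.zip` as a stand-in for the compression scheme, and the message text is invented for illustration rather than taken from a real trading message. It prints the raw size, compressed size, and ratio for a short message and for a long, repetitive one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of the original solution.
final class CompressionRatioDemo {

    // Number of bytes the message occupies after GZIP compression.
    static int compressedSize(String message) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(message.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Compressed size divided by raw size; lower is better, above 1.0 means growth.
    static double ratio(String message) {
        int raw = message.getBytes(StandardCharsets.UTF_8).length;
        return (double) compressedSize(message) / raw;
    }

    // Builds a message of `count` similar records, differing only in their id.
    static String records(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("type=EXEC|sym=IBM|side=BUY|qty=100|id=").append(i).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int count : new int[] {1, 10, 200}) {
            String message = records(count);
            System.out.printf("records=%d raw=%d compressed=%d ratio=%.2f%n",
                    count, message.length(), compressedSize(message), ratio(message));
        }
    }
}
```

A very short message can even come out larger than it went in, because the GZIP header and trailer add a fixed number of bytes regardless of message size.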