Query Routing and Filtering

SQL Relay's query routing and filtering feature allows you to send one set of queries to one database, another set of queries to another and filter out queries entirely so that they aren't sent to any database.

How is this useful? Here are some examples of what you can do:

These are not the only things you can do. Use your imagination.


How it Works

Normally, SQL Relay maintains a set of persistent connections to a database or database cluster and distributes queries over the connections. The databases to connect to are defined in a configuration file.

When used in query routing and filtering mode, SQL Relay is configured to distribute queries over other instances of SQL Relay running in standard mode.

For instance, you can set up one instance of SQL Relay to maintain 3 connections to a MySQL database and another instance of SQL Relay to maintain 5 connections to a PostgreSQL database, then set up another instance of SQL Relay to route queries to the other 2 instances.

The router instance decides which queries to send to the other instances by comparing them to a set of regular expressions.


General Configuration

Each instance of SQL Relay that will be used as a query router must omit the connection tag and include a single router tag. The router tag must contain a set of route and filter tags. Each route tag defines an instance of SQL Relay to route queries to and contains a set of query tags defining which queries to route to it. Each filter tag contains a set of query tags defining which queries to filter out.

When deciding where to route a query (or whether to filter it out), SQL Relay runs through the query tags, in the order that they appear in the configuration file, and compares the query to the regular expression defined in the tag. If the query matches a regular expression defined in a query tag inside of a route tag, it will be routed to the instance of SQL Relay defined in the route tag. If the query matches a regular expression defined in a query tag inside of a filter tag, it will be filtered out.

See the examples below. Note that the regular expressions in the examples below aren't perfect. They are not case insensitive, for example, nor do they necessarily cover all possible cases. In some cases, better, or more regular expressions could be written. In other cases, developers may need to structure their queries so that they are guaranteed to match.

Sending Inserts, Updates and Deletes to a Master Database and Distributing Selects Over a Set of Slaves

Here's an example configuration for sending inserts, updates and delete's to a master database and distributing selects over a set of slaves.

In this example, 3 instances of SQL Relay are defined: one to maintain connections to the "master" database, another to maintain connections to a set of "slave" databases, and another to route queries over the "master" and "slave" instances. Clients would connect to the "router" instance (on port 9001 or to the /tmp/router.socket unix socket). In this example, all 3 instances of SQL Relay run on the same machine, but in reality, all 3 could run on separate machines.

<?xml version="1.0"?>
<!DOCTYPE instances SYSTEM "sqlrelay.dtd">

<instances>

        <!-- This instance maintains connections to the "master" MySQL database
                on the masterdb machine.  This instance only listens on the
                unix socket /tmp/master.socket and thus cannot be connected to
                by clients from another machine. -->
        <instance id="master" port="" socket="/tmp/master.socket" dbase="mysql">
                <users>
                        <user user="masteruser" password="masterpassword"/>
                </users>
                <connections>
                        <connection string="user=masteruser;password=masterpassword;host=masterdb;db=master;"/>
                </connections>
        </instance>


        <!-- This instance maintains connections to 4 "slave" MySQL databases
                on 4 slave machines.  This instance only listens on the unix
                socket /tmp/slave.socket and thus cannot be connected to by
                clients from another machine. -->
        <instance id="slave" port="" socket="/tmp/slave.socket" dbase="mysql">
                <users>
                        <user user="slaveuser" password="slavepassword"/>
                </users>
                <connections>
                        <connection connectionid="db1" string="user=slaveuser;password=slavepassword;host=slavedb1;db=slave;"/>
                        <connection connectionid="db2" string="user=slaveuser;password=slavepassword;host=slavedb2;db=slave;"/>
                        <connection connectionid="db3" string="user=slaveuser;password=slavepassword;host=slavedb3;db=slave;"/>
                        <connection connectionid="db4" string="user=slaveuser;password=slavepassword;host=slavedb3;db=slave;"/>
                </connections>
        </instance>


        <!-- This instance sends DML (insert,update,delete) and
                DDL (create/delete) queries to the "master" SQL Relay instance
                which, in turn, sends them to the "master" database.
                This instance sends any other queries to the "slave" SQL Relay
                instance which, in turn, distributes them over the "slave"
                databases. -->
        <instance id="router" socket="/tmp/router.socket" dbase="router">
                <users>
                        <user user="routeruser" password="routerpassword"/>
                </users>
                <router>
                        <!-- send all DML/DDL queries to "master"  -->
                        <route host="" port="" socket="/tmp/master.socket" user="masteruser" password="masterpassword">
                                <query pattern="^drop "/>
                                <query pattern="^create "/>
                                <query pattern="^insert "/>
                                <query pattern="^update "/>
                                <query pattern="^delete "/>
                        </route>
                        <!-- send all other queries to "slave" -->
                        <route host="" port="" socket="/tmp/slave.socket" user="slaveuser" password="slavepassword">
                                <query pattern=".*"/>
                        </route>
                </router>
        </instance>

</instances>
Provide a Single Point of Access to Multiple Databases

Here's an example configuration providing a single point of access to multiple databases.

In this example, we provide a single point of access to the MySQL database "mysqldb" and the PostgreSQL database "postgresqldb". Since both databases support the "database.tablename" syntax, we can use that format when specifying tables and send all queries containing "mysqldb." to the MySQL database and all queries containing "postgresqldb." to the PostgreSQL database.

<?xml version="1.0"?>
<!DOCTYPE instances SYSTEM "sqlrelay.dtd">

<instances>

        <!-- This instance maintains connections to a MySQL database -->
        <instance id="mysqldb" port="" socket="/tmp/mysqldb.socket" dbase="mysql">
                <users>
                        <user user="mysqldbuser" password="mysqldbpassword"/>
                </users>
                <connections>
                        <connection string="user=mysqldbuser;password=mysqldbpassword;host=mysqldb;db=mysqldb;"/>
                </connections>
        </instance>


        <!-- This instance maintains connections to a PostgreSQL database -->
        <instance id="postgresqldb" port="" socket="/tmp/postgresqldb.socket" dbase="postgresql">
                <users>
                        <user user="postgresqldbuser" password="postgresqldbpassword"/>
                </users>
                <connections>
                        <connection string="user=postgresqldbuser;password=postgresqldbpassword;host=postgresqldb;db=postgresqldb;"/>
                </connections>
        </instance>


        <!-- This instance sends queries containing "mysqldb." to the mysql
                database and "postgresqldb." to the postgresql database -->
        <instance id="router" socket="/tmp/router.socket" dbase="router">
                <users>
                        <user user="routeruser" password="routerpassword"/>
                </users>
                <router>
                        <!-- send all mysqldb queries to "mysqldb" -->
                        <route host="" port="" socket="/tmp/mysqldb.socket" user="mysqldbuser" password="mysqldbpassword">
                                <query pattern="mysqldb\."/>
                        </route>
                        <!-- send all postgresqldb queries to "postgresqldb" -->
                        <route host="" port="" socket="/tmp/postgresqldb.socket" user="postgresqldbuser" password="postgresqldbpassword">
                                <query pattern="postgresqldb\."/>
                        </route>
                </router>
        </instance>

</instances>
Prevent Joins Between Particular Sets of Tables

Here's an example configuration which prevent joins between particular sets of tables.

In this example, any select containing both table1 and table3 will be filtered out.

<?xml version="1.0"?>
<!DOCTYPE instances SYSTEM "sqlrelay.dtd">

<instances>

        <!-- This instance maintains connections to a MySQL database -->
        <instance id="mysqldb" port="" socket="/tmp/mysqldb.socket" dbase="mysql">
                <users>
                        <user user="mysqldbuser" password="mysqldbpassword"/>
                </users>
                <connections>
                        <connection string="user=mysqldbuser;password=mysqldbpassword;host=mysqldb;db=mysqldb;"/>
                </connections>
        </instance>


        <!-- This instance sends queries to the
                mysqldb instance after filtering them -->
        <instance id="router" socket="/tmp/router.socket" dbase="router">
                <users>
                        <user user="routeruser" password="routerpassword"/>
                </users>
                <router>
                        <!-- filter out selects containing table1 and table3 -->
                        <filter>
                                <query pattern="^select .* (table1.*table3|table3.*table1)"/>
                        </filter>
                        <!-- send any other queries to "mysqldb" -->
                        <route host="" port="" socket="/tmp/mysqldb.socket" user="mysqldbuser" password="mysqldbpassword">
                                <query pattern=".*"/>
                        </route>
                </router>
        </instance>

</instances>
Filter Out Queries That Lack a Where Clause

Here's an example configuration which filters out queries which lack a where clause.

<?xml version="1.0"?>
<!DOCTYPE instances SYSTEM "sqlrelay.dtd">

<instances>

        <!-- This instance maintains connections to a MySQL database -->
        <instance id="mysqldb" port="" socket="/tmp/mysqldb.socket" dbase="mysql">
                <users>
                        <user user="mysqldbuser" password="mysqldbpassword"/>
                </users>
                <connections>
                        <connection string="user=mysqldbuser;password=mysqldbpassword;host=mysqldb;db=mysqldb;"/>
                </connections>
        </instance>


        <!-- This instance sends queries to the
                mysqldb instance after filtering them -->
        <instance id="router" socket="/tmp/router.socket" dbase="router">
                <users>
                        <user user="routeruser" password="routerpassword"/>
                </users>
                <router>
                        <!-- send queries to "mysqldb" which match the
                                specified patterns -->
                        <route host="" port="" socket="/tmp/mysqldb.socket" user="mysqldbuser" password="mysqldbpassword">
                                <!-- allow creates -->
                                <query pattern="^create "/>
                                <!-- allow drops -->
                                <query pattern="^drop "/>
                                <!-- allow inserts -->
                                <query pattern="^insert "/>
                                <!-- allow deletes -->
                                <query pattern="^delete "/>
                                <!-- allow updates -->
                                <query pattern="^update "/>
                                <!-- allow selects with a where clause -->
                                <query pattern="^select .* from .* where "/>
                        </route>
                        <!-- all other queries will be filtered out -->
                        <filter>
                                <query pattern=".*"/>
                        </filter>
                </router>
        </instance>

</instances>
Quirks and Limitations

Query Normalization

To make pattern matching easier, SQL Relay "normalizes" the query before matching it against the pattern. The original query is run against the database but when matched against the pattern, whitespace is compresssed and the entire query is converted to lower-case.

When matching query operators, you must use lower-cased versions of them such as "select", "insert", "and", "or", etc. When matching table names, you must use a lower-cased version of the table-name.



Examples are Limited

The examples above are good introductions, but don't cover every scenario. For example, they expect the query to begin at the beginning of a line with no preceeding comments or spaces. You may have to write more complex patterns for your applications.



Perl Compatible Regular Expressions

SQL Relay is built upon the Rudiments library. Rudiments can be built with or without support for libpcre which provides support for Perl Compatible Regular Expressions. PCRE's are more powerful than standard posix regular expressions and have many more operators.

As such, if you copy an sqlrelay.conf file from a machine where Rudiments was compiled with PCRE support to a machine where Rudiments wasn't compiled with PCRE support, then it's possible that your patterns may not work on the new machine.

To make matters worse, sufficiently old versions of the posix regular expression functions had fewer operators than modern versions. So, even if Rudiments isn't using PCRE's, it's not impossible that after copying a sqlrelay.conf file from a fairly modern OS to an antique, the patterns won't work on the antique machine either.

The examples above ought to work with PCRE's and all versions of posix regular expressions.



Selects Not Showing Changes

In the scenario above where DML/DDL is sent to the master database and selects are distributed over slaves, an unintuitive thing can happen.

If you begin a transaction and do several inserts, updates and deletes, you'll find that if you do a select, you will not see your changes. This is because in a master-slave configuration, changes to the database are not pushed out to the slaves until the changes have been committed. Since your selects are being run against the slaves, you must first commit before your changes will be visible.



Commits and Rollbacks

Behind the scenes, the router maintains parallel transactions on each of the databases that it is routing queries to. When the client issues a begin, commit or rollback, the router issues a begin, commit or rollback to each of the databases. Similarly, if the client turns auto-commit on or off, the router turns auto-commit on or off on each of the databases.

There are scenarios where a commit, rollback or auto-commit on/off command could succeed on some of the databases and fail on others. Some databases have a 2-phase commit feature to handle these scenarios. With 2-phase commit, you can roll back a commit until you do second commit. Many databases don't support 2-phase commit though. At present, SQL Relay doesn't currently support 2-phase commit for any databases. So, currently, SQL Relay doesn't handle these scenarios very well. It returns false, indicating than the operation failed, but there's no good way to know which databases succeeded and which failed. Ideally, SQL Relay would remove the database that failed, stop sending queries to it, notify someone that it has gotten out of sync and log all of the queries that would have gone to it so they can be replayed later, or something like that. But it doesn't currently do any of that.

Since queries may be routed to different kinds of databases, the router has to employ some tricks to maintain parallel transactions. Some databases run in auto-commit mode by default and must be issued a "begin" query to start a transaction. Other databases implicitly start a new transaction when a client logs in and after each commit or rollback. If any of the databases being routed to require a "begin" query to start a transaction, then the ones that don't are put in auto-commit mode when the client logs in and after each commit or rollback and are taken out of auto-commit mode when the client sends a begin query. If none of the databases being routed to require a "begin" query to start a transaction, then the databases are not put in auto-commit mode when the client logs in or after each commit or rollback. Rather, transactions are implicitly started by the database. For example, if your client application is using a router which routes queries over both PostgreSQL and Oracle databases, then since PostgreSQL requires "begin" queries, you must use a "begin" query to start a transaction, even if your app only intends to send queries which would be run against Oracle. Conversely, if your client application is using a router which only routes queries over a set of Oracle databases, then you do not have to use "begin" queries.



Stored Procedures

It's possible to use stored procedures with SQL Relay's query routing feature. However, since stored procedures are run on the database, SQL Relay can't route the individual queries run inside the stored procedure. So, the stored procedure and all queries run inside of it will be run against whichever database it was routed to.