Musings on SQL
SQL stands for "Structured Query Language". For developers it is anything but. Why the hell do the four basic commands of SQL - SELECT, INSERT, UPDATE, DELETE - all have different syntax? Most programs need to add a record, then update it once it's there. If the SQL used to do the update were similar to the insert, much code could be shared - always a boon.
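To see the mismatch, here is the same row going in and then being changed - INSERT wants a column list and a value list, while UPDATE wants column = value pairs (the customer table here is invented for illustration):

    INSERT INTO customer (id, name, balance)
    VALUES (42, 'Smith', 100);

    UPDATE customer
    SET name = 'Smith', balance = 100
    WHERE id = 42;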
Even better, why not add "update or insert" as standard in the language? This would remove the need for many extra "is the record there? yes, do this; no, do that" round trips.
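For what it's worth, Oracle 9i does have a MERGE statement along these lines (SQL Server currently has no equivalent), though being able to rely on it everywhere is another matter. A sketch against the same invented customer table:

    MERGE INTO customer c
    USING (SELECT 42 AS id, 'Smith' AS name FROM dual) src
    ON (c.id = src.id)
    WHEN MATCHED THEN
        UPDATE SET c.name = src.name
    WHEN NOT MATCHED THEN
        INSERT (c.id, c.name) VALUES (src.id, src.name);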
In my experience the number one thing that kills server performance is the number of SQL statements fired at the DB. Oracle and SQL Server are my realms of knowledge - I have developed an app that works on either of them - and in both, the count of "hits" to the database dominates. Asking for 1000 records in one go is normally quicker than doing 1000 individual selects. The average 3GHz Pentium IV PC can push about 400 of my SQL statements through SQL Server a second. Oracle 9 is quite a bit slower - probably about 1/4 the speed through my heavily MS-orientated environment (MTS, ADO, ODBC layers).
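To put the round-trip point concretely - one range query versus a thousand single-row selects fired from a client-side loop (names invented):

    -- One hit: fetch the whole batch in a single statement.
    SELECT id, name FROM customer WHERE id BETWEEN 1 AND 1000;

    -- 1000 hits: the same data, one round trip per row.
    SELECT id, name FROM customer WHERE id = 1;
    SELECT id, name FROM customer WHERE id = 2;
    -- ...and so on, 998 more times.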
So I switched to a stored procedure for the update. I can now do about 1200 a second - a substantial improvement. I have the SP in Oracle as well, but have not benchmarked it. The stored procedure was designed to be table-bound - no dynamic SQL - and did a "check if there, then update or insert" series of steps.
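A minimal T-SQL sketch of that kind of table-bound upsert procedure - the table and parameters are invented, and a real version would want error handling. Trying the UPDATE first and checking @@ROWCOUNT avoids a separate existence check:

    CREATE PROCEDURE upsert_customer
        @id INT,
        @name VARCHAR(100)
    AS
    BEGIN
        -- Try the update first; @@ROWCOUNT says whether the row existed.
        UPDATE customer SET name = @name WHERE id = @id
        IF @@ROWCOUNT = 0
            INSERT INTO customer (id, name) VALUES (@id, @name)
    END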
SPs are good for this type of operation, but lousy at data processing. They execute logic quite slowly when not actually hitting the DB, and both T-SQL and PL/SQL are clumsy languages to code complex logic in.
SPs are also fundamentally unportable, so should be avoided at all costs if you don't want DB lock-in - finding equivalent functions between T-SQL and PL/SQL and vice versa is a chore I could do without.
So back to performance. Read-heavy DBs are SQL Server's forte: selects are blindingly fast, even with complex joins.
Downside: its locking can cause much pain on a busy DB. Views in particular cease to work whilst the underlying table is being updated - something I discovered to my cost. I had a view of the most recent record. Updating the DB with one user was fine. Then I put two users on. Crunch. Database locked. This happens a lot in a busy system, and it makes coding OLAP apps hard. Yes, you could be lazy like most people and not wrap your accesses in transactions, but to me that invalidates the point of the DB... There are, however, "exciting" SQL hints to ignore locks and get around the problem!
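The hint in question is NOLOCK - a dirty read, so you may see uncommitted data, which is exactly the trade-off being made (event_log is an invented table):

    -- Reads past other sessions' locks; rows in flight may be uncommitted.
    SELECT TOP 1 id, logged_at
    FROM event_log WITH (NOLOCK)
    ORDER BY logged_at DESC;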
Oracle is much better in this regard. Readers do not block writers, so transactions allow for seamless updating and Oracle "keeps going" much better under load. Oracle's technique, however, can lead to a subtle transactional problem: if two simultaneous transactions each read a count or total and then update based on what they saw, neither sees the other's change and the results are wrong. A classic manifestation is an overdrawn account, even though transactions should be protecting against such things. To make matters worse, serializable transactions - which would be a way out of this problem - are basically broken (or were as of version 9).
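A sketch of the shape of the problem, with an invented account table - the usual Oracle fix is to take the row lock explicitly with SELECT ... FOR UPDATE:

    -- Both sessions read balance = 100 and decide there is enough money...
    SELECT balance FROM account WHERE id = 1;
    -- ...both then withdraw, and the account ends up overdrawn.
    UPDATE account SET balance = balance - 80 WHERE id = 1;

    -- Fix: lock the row while deciding, so the second session waits.
    SELECT balance FROM account WHERE id = 1 FOR UPDATE;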
Oracle also suffers from having no decent TOP command. Ugly joins on hidden fields are not the way to go - in particular, don't fall into the trap of filtering on the record number and then sorting. It will break, because ROWNUM is assigned before the ORDER BY is applied. I should be able to tell Oracle not to bother with all the records in a huge set, though.
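The pattern that does work is to sort inside an inline view and filter ROWNUM outside it - clunky next to SQL Server's TOP, but correct (event_log invented again):

    -- Broken: ROWNUM is assigned before the sort, so this returns
    -- an arbitrary 10 rows and then sorts them.
    SELECT * FROM event_log WHERE ROWNUM <= 10 ORDER BY logged_at DESC;

    -- Correct: sort first, then cut.
    SELECT *
    FROM (SELECT * FROM event_log ORDER BY logged_at DESC)
    WHERE ROWNUM <= 10;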
Next come table sizes. Log tables are always difficult to deal with. One client had a 9 million row log table in SQL Server. I was upgrading the DB and messed up a field. Doing an update of that one field from a backup proved to be more than SQL Server could manage: 1.5 hours later I cancelled it and tried to optimize, but adding keys was no help - in fact it probably made things worse, as each updated row has to maintain the index too. I have not tried the same with Oracle, but I think the performance fall-off would be much less steep.
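One thing that can help at that scale is chunking the update so each batch commits and the transaction log does not balloon. On SQL Server 2000 the old SET ROWCOUNT trick does this - a sketch with invented names:

    -- Repeat this batch until @@ROWCOUNT comes back as 0.
    SET ROWCOUNT 10000
    UPDATE l
    SET bad_field = b.good_field
    FROM log_table l
    INNER JOIN backup_table b ON l.id = b.id
    WHERE l.bad_field <> b.good_field
    SET ROWCOUNT 0   -- remember to reset afterwards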
SQL Server also supports forward-only recordsets differently to Oracle. Inside a transaction it can only have one open at once. This is poo. It means that if you need to loop over one recordset and inner-loop over another, you cannot. Instead you are forced to compile lists in memory and dump them to the DB. Bearing in mind the performance penalty of individual SQL statements, maybe that is not such a bad thing. It is not so good, though, when you start testing a huge piece of code and suddenly find this limitation on a Monday on site.
Next up, indexes. Query Analyzer in SQL Server is good: it gives a fair idea of what is going to happen. One thing I found out to my cost, though, is that if your indexes do not exactly match the fields in, say, the WHERE clause, then once the table grows too big that index will cease to be used. All of a sudden, massive performance fall-off: a 10000 row table fell back to a table scan. I suspect it's because the query optimizer "gives up" on a partially matching index once the table is large.
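The usual culprit is the leading-column rule: a composite index can only be seeked on if the query filters on its leftmost column(s). A sketch with invented names:

    CREATE INDEX ix_log_user_date ON log_table (user_id, logged_at)

    -- Can use the index (filters on the leading column):
    SELECT * FROM log_table WHERE user_id = 7 AND logged_at > '20030101'

    -- Cannot seek on it (skips the leading column), so on a big table
    -- the optimizer falls back to a scan:
    SELECT * FROM log_table WHERE logged_at > '20030101'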
Oh, and remember to set up a regular statistics refresh - UPDATE STATISTICS (or a maintenance plan) in SQL Server, ANALYZE in Oracle - to keep your indexes being used. This is vital!
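The commands themselves, for reference - table names invented, and Oracle also has the newer DBMS_STATS package for the same job:

    -- SQL Server: refresh the optimizer's statistics.
    UPDATE STATISTICS log_table

    -- Oracle: the old way...
    ANALYZE TABLE log_table COMPUTE STATISTICS;
    -- ...or the recommended way from 8i onwards.
    EXEC DBMS_STATS.GATHER_TABLE_STATS('MYSCHEMA', 'LOG_TABLE');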
So once you have navigated your way through all the pitfalls, how do they square up? Oracle is a big old mother of a DB: solid but slow, robust but hard to manage, with lots of 80s-style tools and rubbish. If you do need it, get a copy of TOAD from Toadsoft - it makes the process slightly less painful.
SQL Server, on the other hand, is an upstart. It loves to perform super quick, but quickly falls on its face. In this day and age that is starting to matter less: machines are getting so big and fast that on small DBs SQL Server wins hands down. Scale it up to multi-gigabyte DBs with huge tables, however, and it quickly stifles itself. The tools are much friendlier, though.
The final point on the two is that they are fundamentally incompatible with each other if you are even vaguely interested in performance - you rapidly need two copies of every SQL statement. Oracle's case sensitivity caused some issues, since Oracle compares strings case-sensitively by default where SQL Server's default collation does not. Forcing every field through an UPPER is a poor solution, but the only quick one I could think of without changing the DB structure between the two vendors. SQL Server actually suffered more from this and required a different statement. At least Oracle 9i supports the MSSQL join syntax in the ODBC driver now.
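The workaround in question - portable, but it defeats any ordinary index on the column unless the DB supports function-based indexes (Oracle 8i onwards does):

    -- Case-insensitive match that behaves the same on both vendors.
    SELECT id, name
    FROM customer
    WHERE UPPER(name) = UPPER('smith');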
Which brings me back to SQL as a language. It's crap. It was designed by researchers back in the 1970s and standardised in the 80s. Its abstraction from the underlying database has made many vendors produce broken, incompatible extensions. It lacks many useful tools - like adequate iteration through parent-child relationships (Oracle has its CONNECT BY extension for this which, of course, is non-standard). It has kludgy formatting, and I cannot imagine it is easy to match to any underlying structure.
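The Oracle extension in question, for the record - walking an invented employee/manager hierarchy:

    -- Walk the tree from the root down; LEVEL gives the depth.
    SELECT LEVEL, name
    FROM employee
    START WITH manager_id IS NULL
    CONNECT BY PRIOR id = manager_id;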
So what would I like? A nice XML-based query structure. Specify keys and transforms. Bits could be turned on and off on demand, like case sensitivity. Maybe some vendors have this - I've not bothered to check - but it needs to be an ANSI standard, and it should be supported as an extension to SQL. Let it tell you extra information about the database, like indexes and keys, in a standard way, and support things like identity column datatypes out of the box.
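Purely to make the wish concrete, an entirely hypothetical sketch - no vendor supports anything like this, it is just what I have in mind:

    <!-- Hypothetical syntax: turn case sensitivity off per query. -->
    <query table="customer" case-sensitive="false">
      <select>
        <field name="id"/>
        <field name="name"/>
      </select>
      <where>
        <equals field="name" value="smith"/>
      </where>
      <limit rows="10"/>
    </query>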
Unfortunately DB vendors are so into lock-in that there is little chance of this actually happening. But enough little rants - maybe someone will find this blog on Google and do something about it...