Mythbuster

There are numerous myths related to SQL Server. On this page I'll try to tell whether they are true, plausible or just completely wrong.

The apply operator causes one function call per row

First of all, you don't have to use apply against a function. You can also use it against methods like xml.nodes(), and you can use it against correlated subqueries, common table expressions or even tables if you insist. But, let's go back to the functions.

It is a common misconception that the apply operator is called once of each row on the left side of the apply operator. We can even have a look at books online:

"The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output."

How the apply operator works, depends on the type of function on the right hand side. The description in BOL holds true for Multi Statement Table Valued Functions (type TF in sys.objects), but not for Inline Table Valued Functions (type IF in sys.objects). If you call a TF and look at the execution plan, you will find a Table Valued Function operator. If you call a IF, you will not. IFs can be expanded in the same way as a view (and hence it is sometimes referred to as parameterized view).

So, let's have a look at some code:

select
  * 
from 
  sys.objects so
    cross apply
  dbo.ObjectTypeCount(so.type)

select 
  * 
from 
  sys.objects so
    cross apply
  dbo.ObjectTypeCount2(so.type)

These two functions are as identical as possible, but ObjectTypeCount is multi statement, whereas ObjectTypeCount is inline.

Looking at the execution plans, you can see a Table Valued Function operator for ObjectTypeCount, but not for ObjectTypeCount2.

Execution plan using ObjectCount
Execution plan for TF
Execution plan using ObjectCount2
Execution plan for IF

Now, if you should happen to do this test and retrieve both execeution plans at the same time, notice that SQL Server tells you that the estimated cost of using IF is approximately twice as high as TF. This is because SQL Server is not very good at estimating costs for function calls. Actually, it is the other way around, even with this little data the IF is noticably faster.

Busted!

Download code as 7-zip archive.

The sum of Max Server Memory for all instances in a cluster should be less than the physical memory of the node with the least memory

As with the 10mbps for heartbeat network myth, this one is really based on an article by Microsoft. However, this article is on SQL Server 2000. SQL Server 2000 was not very good at releasing memory to the OS under memory pressure, thus there were made some changes to the memory mananger when SQL Server 2005 was released, to solve this problem. So, if you should happen to run a SQL Server 2000 cluster, you should definitely set Minimum and Max Server Memory of the SQL Server 2000 instances to control the memory usage of each of these.

For SQL Server 2005 onwards however, this is really not an issue anymore, and we can even look at a 175 page whitepaper on SQL Server Clustering, it does not mention memory opptions at all. So, does that mean that memory options is pretty much useless now?

No, not at all. Microsoft states that there are really three options you can go for:

Actually, there are more options. You have two options for how you can use the Maximum Server Memory option, and you can choose to use the Minimum Server Memory option or not. This gives us a total of six compbinations. So, let's look at the options:

It is a common misconception that SQL Server allocates the amount of memory specified in the Minimum Server Memory option when the instance starts. This is not the case, SQL Server allocates memory dynamically, and it does not even need to ever reach this amount of memory, for some reason. But, once the instance has reached this amount of memory, it will never release memory to an amount less than Minimum Server Memory, unless this option is changed first of course. You can use this option to prevent the instance from releasing too much memory.

Maximum Server Memory specifies an upper memory limit for the instance, potentially leaving out memory for the OS, other applications and other instances. If all nodes have the same amount of memory, Maximum Server Memory should normally be set somewhat lower than the amount of physical memory, reserving let's say 4GB to the OS. If the servers have different amount of memory, this should definitely be taken into consideration, as setting Maximum Server Memory to 28GB on an instance running on a 16GB host would be the same as not having Maximum Server Memory set.

As stated, you can choose to use the Maximum Server Memory option in two ways. As described above, you can reserve memory to the OS and applications only, or you can do it the SQL Server 2000 way and reserve memory for one or more other instances as well, to allow instnace failover without any instances having to release memory first. Normally, this is not an issue though, as SQL Server by default perform a checkpoint every minute to flush dirty pages to disk, if this is needed. Thus, there should be a limited amount of dirty pages waiting to be written to disk, and the instance should normally be able to release memory fairly quickly.

So, my recommendations for the memory options in a cluster environment, or any other environment with multiple instances:

I would also like to recommend Brent Ozar's MCM videos Clustering Introduction and Clustering Setup and Troubleshooting

I'll say that this myth is busted.

10 Mbit half duplex is recommended for heartbeat network in cluster

I've seen this a lot of times, and this myth originates from an old knowledgebase article on clustering in Windows 2000 (or even older). What Microsoft says is:

"If you are not sure of the supported speed of your card and connecting devices, or your manufacturer's recommended settings, Microsoft recommends that you set all the devices on that path of 10 MB/Sec and Half Duplex. This configuration will provide sufficient bandwidth and reliable communication."

If we overlook the typo (10MB/Sec instead of 10Mb/Sec), we can clearly see that this is not a general recommendation. It could actually be rewritten to something like:

"Set your devices to the highest speed and duplex rating that are supported by all components in the chain".

In todays systems, this is typically 1Gbps FDX. Please note that teamed network connections are not supported for the heartbeat network.

Busted!

Myth: SQL Server listens on port TCP 1433

SQL Server default instances listen on port number 1433 by default, so quite often you can connect to SQL Server using this port number. You cannot expect to connect to port 1433 though, as the default instance may be configured with a different port number (rare), or you may have to connect to a named instance. Named instances have dynamic port numbers by default, but they can be configured with specific port numbers. This is most commonly done when there is a firewall between the SQL Server and its clients. So, why is this so?

Each instance runs as a separate process. Only one process may listen to a specific combination of IP address and port number at a time, and as such each instance will either need its own IP address or port number. It is rather impractical to have each server requiring a separate IP address for each instance, so they use their own port numbers instead. In addition, it may be practical to have the instances listening on the loopback addresses (127.0.0.1 and ::1) too. This can only be done if the port numbers are different.

So, how can I know what port number the SQL Server instance is listening on? You may look for EventID 26022 in the application log. The Source here is the SQL Server instance, for instance MSSQL$SQL01 for the SQL01 named instance. In the message field you will see something like:

Server is listening on [::1  49170].
Server is listening on [127.0.0.1  49170].

It may be tempting to just add a rule in the firewall for this port number. Don't. There is no guarantee the instance will keeps its port number. In my test environments, I've seen the port number change several times. I would not even expect it to the the same for all IP addresses, even though I've never seen a case where they're not. If you want to have a specific port number, use SQL Server Configuration Manager to configure the TCP/IP Protocol under “SQL Server Network Configuration”. For each IP of the SQL Server, you may specify whether the instance should listen on it. If the TCP Dynamic Ports is 0, it uses dynamic ports (hey, that's logical right?). Clear it to use a static port, and assign it in the TCP Port field.

So, how the client know what port number to connect to? It does not. It connects to the server on UDP 1434, where it finds the SQL Server Browser service, and asks how it can connect to the instance, and the SQL Server Browser service replies with connection information, such as IP address and port number.

Myth: You must start the restore process with the last full backup

Fortunately, this is not true. Can you imagine that your last full backup is corrupt? If this myth were true, it would be impossible to avoid data loss in case of a disaster. You can start the restore process at any full backup, and restore all log backups in sequence afterwards. It is always recommended to start with the last full backup when you can however, to keep the TTR (time to recover) as low as possible. Busted!

Myth: Truncate table is a nonlogged operation

Truncate table does actually not delete any rows, and as such no deletion is logged in the transaction log. The truncate table statement deallocate all pages allocated to an object and resets the seed of any identity columns, and this operation is indeed logged, and can be rolled back. Busted!

Myth: Common Table Expressions and Windowing functions are proprietary

First of all, I'll say that there is nothing wrong with proprietary code in a database system, quite on the contrary. If you use proprietary code well, you are most likely capable of solving issues in the most effective manner on that platform.

When it comes to Common Table Expressions (CTEs, also known as the WITH-clause) and windowing functions (also known as the OVER clause) these are not proprietary at all. They were both introduced in the SQL:1999 standard, approximately six years before the release of SQL Server 2005, the first version of SQL Server supporting these syntaxes. Both CTEs and Windowing functions are supported by several major database platforms, among them PostgreSQL, DB2, Oracle and SQL Server. Busted!

Myth: A trigger fires once for each row affected by a query

To be quite honest, nobody have told be this. But, I've seen a lot of code which is based on this misunderstanding of how triggers work, so I'll spend a few minutes to desribe why it is important to know how a trigger works. So, let's just start of with a littlebit of code:


use tempdb;
go

create table TriggerTestTable (
  id int
);
go

create trigger DemoTrigger on TriggerTestTable for update 
as
begin
  declare @a int;
  select @a = id from deleted;
  print 'Previus value was '+cast(@a as varchar);
end;
go

insert into TriggerTestTable values (1);
insert into TriggerTestTable values (2);
go

update TriggerTestTable set id = id+1;
go

drop trigger DemoTrigger;
go

drop table TriggerTestTable;
go

If you wonder what the trigger will print when you do the update, it is a single line saying "Previous value was 1". You may be surprised that it does not print two lines, but you should not be. Keep in mind that inserted and deleted are two pseudo-tables, and as such they may have several rows. The trigger is fired once, and in this case both inserted and deleted contained two rows, one for each of the rows updated. However, we did not take this into consideration in our trigger.

Since a trigger fires once for a inserted, update or delete regardless of the number of rows affected, you shall always use set based operations against the pseudo tables. Never assume that only one row is affected. And if you still should happen to be in doubt, this myth is busted!

Download code as 7-zip archive.

Myth: Clustered indexes on identity columns don't get fragmented.

There is absolutely no guarantee that a clustered index on an identity column won't be fragmented. Keep in mind that fragmentation may come because of either inserts, updates and deletes. Even if we don't delete any data, a clustered index may be fragmented due to updates. I've seen large clustered indexes on identity columns that had a average fragmentaion of more than 99%. Like kimberly Tripp said, they're perfectly fragmented. So, why does this happen?

In this case, fragmenation occurs since a page does not have enough room for the row it should hold. But, since the rows are inserted in order, it's a common misunderstanding that you won't experience fragmentation. The logic seems simple, if the block does not have room for the row, another block is added, and thus you avoid fragmentation. So, why is this argumentation wrong? Since we expect the width of the row to be constant! We have not taken into consideration that the row may be updated later on. And, it is important to understand that even though you don't update your clustering key, you may still get fragmentation. For instance, you may insert a lot of rows into a table, and all of these rows have a variable length column set to null. All pages is the table that you alter are fully stuffed with these rows. Later you run a second script to alter the variable length column of some of these rows. Since the page holding the row is full, the update causes a page split, and thus fragmentation.

Normally, an identity column is a very good candidate for a clustered index. We have however seen that freqent updates of variable length columns may cause serious fragmentation in such cases. If you have variable length columns in a table with an identity column, you should consider the update rate for these columns (and whether the updates alter the width of the row) before creating a clustered index on the identity column. Busted!

Myth: Join conditions can be moved to the where clause and vice versa

For inner joins, this is true. There is no difference whether a condition is a part of the join or the where clause. For outer joins however, it is a major difference.

When a condition is a part of the join clause, it controls wether SQL Server can make a match between the tables. Let's say we have a left outer join between the tables A and B. Since there is no match in in table B, NULLs are returned for all columns from B, but information is still returned from table A.

When a condition is a part of the where clause, it controls wether SQL Server will return the row or not. So, if you have a condition on a column from table B, you will only return rows where you actually was able to get a match, which in fact is an inner join. If you look at the execution plan, you will even see that SQL Server has replaced your outer join with an inner join. You can however eliminate this behaviour by using isnull or coalesce.

We've clearly seen that there is a difference between conditions in the join and the where clause. Busted!

Myth: SQL Server cannot utilize more than one index at a time

A common challenge with indexing, is what columns to choose. Of course, picking every column in the table is not a very good idea. We want to have a minimum number of columns in the index, and we want it to cover as much queries as possible. Quite often these two goals are contradicotry. Often we end up having multiple small indexes instead of one large. In some cases, it may even be useful to have an index on a simple colum. SQL Server is actually capable of using this index in combination with other indexes. In your execution plan, you may for instance see one index scan and one index seek followed by a inner join of some kind, for instance a hash match using your clustering key as hash key. SQL Server will join two indexes in this way when:

A good example may be when you have a wide table and issue a query that returns a lot of rows and can be covered by two narrow indexes. Busted!

Myth: You cannot rename a table without modifying client code

Sometimes you want to rename a table that has been put into production. There can be many reasons, among the more common is enforcing a naming standard. For instance, you may have a naming standard that states that all tables should have names that are the plural form of enlgish nouns, written in Pascal case. Let's say that you find yourself having a table called bruker_grupper. It would not be very popular just to rename the table. We can however rename the table and create a synonym for it at the same time:

sp_rename 'bruker_grupper','UserGroups';
create synonym bruker_grupper for UserGroups;

Now we have enforced the naming standard for the tables, while the old code still works. Busted!

Myth: Collation has no effect on unicode strings

Fortunately, this is not true. As mentioned earlier, collations has two effects. One of them is the character set used by SQL Server, which clearly does not have any effect on unicode columns (nchar and nvarchar). The other effect of a collation is the way values are compared, and thus sorted. This affects both non-unicode as well as unicode data. Busted!

Myth: A procedure whose name start with sp_ will perform worse than others

First of all, the prefix sp_ does not mean stored procedure, but system procedure or special procedure. These procedures are special since they can be executed directly (without using fully qualified name) from any database even though they are stored in the master database. If you call one of these procedures with fully qualified name or from the database in which they are stored (typically master), they perform equally well as other procedures. If you call them without using fully qualified name from another databsae however, you will experience a performance hit since SQL Server have to look for the procedure in the local database first, then in the master database. This myth is not completely wrong, so... I'll say plausible.

Myth: An inner join returns the combined rows from two tables where some column is equal in the two tables

A join where columns in two tables should be equal is a equi-join, not an inner join. An inner join can be written with any operator, you can even use the between operator. An inner join returns the combined rows from two tables where the join criteria (comparison) is satisfied. Most commonly an inner join is also a equi-join, but this is not a requirement. Busted!

Myth: The order of joins is irrelevant

This is covered in a Q&A article. In some cases the order of the joins does matter, so it's busted!

Myth: You must use the APPLY operator to create a join against a table valued funciton

It is correct that the APPLY operator is used to make joins against funtions (or methods of .NET based data types) but JOIN and APPLY work in different ways. If you use APPLY the function (or method) is called once for each row. Typically this is the case if the .NET based data type is a column in the table that you query, or if you call a function that takes one or more of your columns as parameters and return a data set. In these cases you have to use the apply operator.

If the function does not take any paramters or if it is called without any correlated parameters, you can use a JOIN just as normal. Busted!

Myth: The XML datatype stores the value assigned to it

When you assign a value, that is an XML document or fragment, to the XML data type this value is parsed into a tree structure representing the content of the value. The value itself is not stored however, and there is absolutely no guarantee that the XML document retrieved from the XML data type is identical to the one assigned to it. In fact, there are some cases in which it is guaranteed not to be identical, as for instance:

declare
  @a xml;
set @a = '<test></test>';
select @a;

You might expect this code to return <test></test>, but it does in fact return <test /> instead. So, instead of having an opening tag and a closing tag for our empty element, we got an empty tag instead. Semantically we do have the same document, but it is not identical. Busted!

Myth: A table cannot be moved from one schema to another

A possible cause for this myth is that you cannot move a table from one table to another by using the alter table syntax. You can do it by use the alter schema syntax though:

use AdventureWorks2008
go

alter schema HumanResources transfer Person.Address;

This code will transfer the Address table from the schema Person to HumanResources. Busted!

Myth: Since no data is persisted in tempdb, it does not need to be on shared storage in a cluster

Unfortunately, this is not true. There is a feature request on this case, but so far you cannot have the tempdb database stored locally in a cluster. If you try, you will get error 5184. Busted!

Myth: The revoke statement denies permission

Let's start off by describing what a revoke actually does. Revoke removes an entry from the access control list. So, what does this actually mean? Imagine that you have a user Bob which is member of a Sales role. The Sales role has been granted select on the Order table, but Bob has been denied select on the same table. If we now issue a revoke select on the table for Bob, Bobs security entry for select against the table is removed. Since the deny rule is removed, user Bob will now have select permission against the table since he is a member of the Sales role. Busted!

Myth: With simple recovery model the log file will be small

While this is true in most cases, there are absolutely no guarantee. With simple recovery model SQL Server will only keep data in the transaction log until the data is committed or rolled back and a checkpoint has occured. In other words, if you have a (long running) transaction performing a lots of inserts, updates or deletes you will see that your transaction logs grows pretty much as they would have done in full recovery model too. Please note that even if this database were in full recovery model and you were taking regular logbackups, the logfile would still grow as the requirement for releasing the internal space in the logfile requires the transaction to be committed or rolled back. So, not even regular log backups are guaranteed to prevent the log file from growing. Busted!

Myth: Clients will not notice a cluster failover

In a cluster, the instance is only running on one server at the time. In case of a failover, the new server will have to start the SQL Server service. As we know, the tempdb database is recreated during service startup, so any data in tempdb is lost. But, that's not all. When SQL Server is starting it will perform recovery of databases if the databsaes was not closed down cleanly. In other words, all changes in the transaction log that not yet has been committed to the database files are rolled forward (redo phase). Any active transactions at the end of the transaction log are rolled back (undo phase). In other words, all active transactions at the time of failover are rolled back. Finally, the new server does of course not have any active connections. All connections are lost during failover, so all the clients will have to reconnect. Busted!

Myth: SQL Server will sort data correctly when I chose the right collation

Let's first kill one myth: "Collations are sort orders". Collations are not really sort orders, they contain a character set and tells SQL Server how to compare data. But of course, sorting data cannot be done without comparing data, so indirectly a collation tells SQL Server how data should be sorted.

Unfortunately, even if you choose the right collation, SQL Server will not neccesarily sort your data correctly. Let's have a simple example. At which letter would you place MacDonald? In many languages MacDonald should be filed under D, not M. Similarily, both o'Brian and van Buuren should be filed under B. And we could include more sofisticated examples. For instance, in Norwegian the letter Å can ble transcribed to AA, but the latter should be filed under A or Å according to pronounciation. So Aass (the brewery) should be filed under Å, and the German city Aachen should be filed under A. Even more interresting, in the norwegian work skoeske should oe NOT be treated as Ø even though Ø can be transcribed to OE.

So, we have seen that the collation indirectly tells SQL Server how to sort data, but unfortunately SQL Server will still not be able to sort your data 100% correctly. Busted!

Myth: Since I want the data sorted by the clustering key, I can eliminate the order by clause

It is true that the data is physically ordered by the clustering key, but that does unfortuantely not eliminate the need for the order by cluase. Let me show why with a simple example showing what may actually happen in SQL Server. First, a query Q1 is issued against table T1 causing a clustered index scan. A few seconds later, another query Q2 is issued against the same table also causing a clustered index scan. SQL Server optimizes this secondary query by attaching it to the same data stream that Q1 uses. Query Q1 ends and returns its data to the client, ordered by the clustering key. Query Q2 has to catch up the data at the beginning of the table, and starts to read from the beginning of the table. Query Q2 finishes, returning the data to the client. These data are NOT ordered by the clustering key. The only way you can ensure that the data is ordered in a specific way is by using an order by clause. Busted!

Myth: The clustering key should be small, unique and sequential

There is no rule stating that a clustering key must have these characteristics, but they are all benificial for a clustering key. A clustering key should be small in size, since all nonclustered indexes automatically will contain the clustering key so that SQL Server can look up the rows for which a query gets a match in the index. If the clustering key is not unique, SQL Server cannot automatically use the clustering key to look up a specific row, therefore a 4 byte uniquifier is added to the clustering key, increasing the size of the clustering key. Finally, if the clusteirng key is sequential new rows are always added at the end of the index and thus eliminating or reducing SQL Server's need of performing a page split. A developer may have valid reasons for choosing a clustering key that is not both small, unique and sequential, still I'll say that the myth is confirmed!

Myth: I don't need to back up my transaction log since I do regular full backups

This is a very common misconception. Full backups are database backups, not log backups. Full backup may indeed back up a portion of the transaction log, but that is only to ensure that SQL Server will be able to restore the database in a transactionally consistent state. To back up transaction log and thus free up space in the log files you will need to do a transaction log backup. Busted!

Myth: Index seeks are better than index scans

It's impossible to say which one performs better, as it really depends. If you have a large table and run a query that returns a single tuple, it is pretty obvious that an index seek is the preferred operator. If you have a small table and issue a query that returns a lot of tuples it is the other way around, an index scan would be the most cost effective operator. So it is wrong to state that index seeks are better than index scans. Busted!

Myth: A clustered index scan is better than a table scan

Well, this one is really simple. A clustered index scan and a table scan is essentially the same thing. In both cases SQL Server reads all the data pages of an object. Busted!