
An interesting issue with SQL Replication and a rogue system spid


I recently came across this interesting issue with SQL Replication. We were trying to create a new publication, and the new publication wizard would just hang. Upon investigation, we found that we were hitting the Connect item mentioned here. However, the Connect item states that the bug is closed as “won’t fix”, so we had to find another way out of the situation. Let me first describe how we narrowed down the issue:

  • First, check sysprocesses to see which spid is blocking the new publication wizard (or whatever replication operation you’re trying to perform). A combined diagnostic query is sketched at the end of this list.
  • If you see a system spid, such as 5 or 7 (as a general rule, all spids less than 50 are system spids), run DBCC OPENTRAN and see if the same spid shows up.
  • If you see something like this in the output:

    Transaction information for database 'master'.

    Oldest active transaction:
        SPID (server process ID): 5
        UID (user ID) : -1
        Name          : user_transaction
        LSN           : (3286:3576:1)
        Start time    : Aug  2 2012  8:04:46:603AM
        SID           : 0x01

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    then you’re likely hitting the same problem.

  • Another thing you might want to check is the locks held by that spid. I checked them using the sp_lock <spid> command, and found this (notice the last one):

    spid   dbid   ObjId         IndId   Type   Resource          Mode    Status
    5      5      0             0       DB                       S       GRANT
    5      10     0             0       DB                       S       GRANT
    5      1      60            0       TAB                      IX      GRANT
    5      5      1663344990    0       TAB                      Sch-M   GRANT
    ...    (the same Sch-M TAB lock repeated, 16 rows in all)
    5      1      60            1       KEY    (fa00cace1004)    X       GRANT

  • Next, check the SQL Server errorlog, and see if you can spot any messages pointing towards “script upgrade”. An example would be:

    2012-08-02 08:04:06.500 Logon        Error: 18401, Severity: 14, State: 1.
    2012-08-02 08:04:06.500 Logon        Login failed for user 'maverick'. Reason: Server is in script upgrade mode. Only administrator can connect at this time. [CLIENT: 150.232.101.86]

  • Also, see if you can spot messages related to upgrading replication in the errorlog. In my case I found quite a few:

    2012-08-02 08:04:11.780 spid5s       Database 'master' is upgrading script 'repl_upgrade.sql' from level 167774691 to level 167777660.
    2012-08-02 08:04:13.010 spid5s       Upgrading distribution settings and system objects in database distribution.
    2012-08-02 08:04:17.590 spid5s       Upgrading publication settings and system objects in database [Cash].
    2012-08-02 08:04:18.270 spid5s       Upgrading publication settings and system objects in database [Sellers].
    2012-08-02 08:04:18.620 spid5s       Upgrading publication settings and system objects in database [Revenue].

  • What this tells us is that a patch was applied at some point, and it failed while upgrading replication. Now, every time SQL Server starts up, it tries to run the replication upgrade again. Let’s see if we can find an upgrade failure message as well. For example, you may find something that looks like this:
    2012-08-02 08:04:46.470 spid5s       Upgrading subscription settings and system objects in database [XYZ].
    2012-08-02 08:04:46.600 spid5s       Index cannot be created on object 'MSreplication_subscriptions' because the object is not a user table or view.
    2012-08-02 08:04:46.600 spid5s       Error executing sp_vupgrade_replication.
    2012-08-02 08:04:46.600 spid5s       Saving upgrade script status to 'SOFTWARE\Microsoft\MSSQLServer\Replication\Setup'.
    2012-08-02 08:04:46.600 spid5s       Saved upgrade script status successfully.
    2012-08-02 08:04:46.600 spid5s       Recovery is complete. This is an informational message only. No user action is required.
  • Also notice the spid in the aforementioned failure messages. See it? Because the replication upgrade fails, this system spid keeps holding locks on certain resources, and as a result, we’re unable to perform any replication related activities.
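As promised above, here’s a minimal diagnostic sketch combining these checks (hedged: spid 5 and the master database are from this example; substitute your own values):

    -- Find blockers, and system spids with open transactions
    SELECT spid, status, blocked, open_tran, cmd, waitresource
    FROM master..sysprocesses
    WHERE blocked <> 0 OR (spid < 50 AND open_tran > 0);

    -- Confirm the oldest active transaction, then inspect the locks it holds
    DBCC OPENTRAN ('master');
    EXEC sp_lock 5;   -- replace 5 with the system spid you found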

Troubleshooting

So how do we troubleshoot this? Let me list out the steps:

  • Let’s first focus on the exact error we see in the errorlog, which seems to be the reason behind the replication upgrade failing:

    2012-08-02 08:04:46.470 spid5s Upgrading subscription settings and system objects in database [XYZ].
    2012-08-02 08:04:46.600 spid5s Index cannot be created on object 'MSreplication_subscriptions' because the object is not a user table or view.

  • We can clearly see that it has an issue with the MSreplication_subscriptions object in the XYZ database. I checked the object using sp_help, and found that it was a synonym (see the sketch after this list).
  • Next, we dropped the offending synonym, and scripted out the MSReplication_Subscriptions object from one of the other databases that had replication enabled. We ran this script in the XYZ database to create the object.
  • As a test, we ran the sp_vupgrade_replication stored procedure explicitly from SSMS, and it completed fine.
  • Next, we restarted SQL, and saw that the script upgrade had completed successfully this time. Subsequent restarts did not result in SQL Server going into script upgrade mode. This meant that the system spid was no longer holding the lock, and we could now perform replication related activities successfully.
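Here’s a hedged sketch of that check and fix (the database name [XYZ] is a stand-in from this example, and the DROP is commented out so you can verify first):

    USE [XYZ];
    -- Is MSreplication_subscriptions a real table here, or a synonym?
    EXEC sp_help 'MSreplication_subscriptions';
    SELECT name, type_desc
    FROM sys.objects
    WHERE name = 'MSreplication_subscriptions';

    -- If it shows up as a SYNONYM, drop it, then re-create the real table by
    -- scripting MSreplication_subscriptions out of another replicated database:
    -- DROP SYNONYM dbo.MSreplication_subscriptions;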

Hope this helps. Comments/feedback are welcome.


When DBMail started complaining about the servername being NULL


I recently came across an issue where, for some reason, DBMail was not working. To be more specific, we were unable to create a profile for DBMail, let alone send emails. When trying to add the profile to the account, we were getting this error:

TITLE: Configuring...
------------------------------
Unable to create new account test for SMTP server Microsoft.SqlServer.Management.SqlManagerUI.SQLiMailServer.
------------------------------
ADDITIONAL INFORMATION:
Create failed for MailAccount 'test'.  (Microsoft.SqlServer.Smo)
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=10.50.2500.0+((KJ_PCU_Main).110617-0038+)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&EvtID=Create+MailAccount&LinkId=20476
------------------------------
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
------------------------------
Cannot insert the value NULL into column 'servername', table 'msdb.dbo.sysmail_server'; column does not allow nulls. INSERT fails.
The statement has been terminated. (Microsoft SQL Server, Error: 515)
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=10.50.2500&EvtSrc=MSSQLServer&EvtID=515&LinkId=20476
------------------------------
BUTTONS:
OK
------------------------------

Now, looking at the error message, it’s clear that we’re somehow passing NULL for the servername field when creating the profile. I tried creating a profile using T-SQL (using the steps mentioned here), and that worked just fine. I could also see the row in the msdb.dbo.sysmail_server table.

So there was definitely an issue with how the servername value was being captured/passed. I captured a profiler trace, and found the following rows to be of interest:

SP:StmtCompleted    SELECT @mailserver_name=@@SERVERNAME
                    --create a credential in the credential store if a password needs to be stored
                    Microsoft SQL Server Management Studio

SP:StmtStarting     IF(@username IS NOT NULL)
                    Microsoft SQL Server Management Studio

SP:StmtCompleted    IF(@username IS NOT NULL)
                    Microsoft SQL Server Management Studio

SP:StmtStarting     INSERT INTO msdb.dbo.sysmail_server (account_id, servertype, servername, port, username, credential_id, use_default_credentials, enable_ssl)
                    VALUES (@account_id, @mailserver_type, @mailserver_name, @port, @username, @credential_id, @use_default_credentials, @enable_ssl)
                    Microsoft SQL Server Management Studio

Exception           Error: 515, Severity: 16, State: 2
User Error Message  Cannot insert the value NULL into column 'servername', table 'msdb.dbo.sysmail_server'; column does not allow nulls. INSERT fails.
User Error Message  The statement has been terminated.

Accordingly, I ran select @@SERVERNAME explicitly on the server, and lo and behold, it returned NULL too! However, select serverproperty('servername') did return the server name. Unfortunately, DBMail uses @@SERVERNAME, and not serverproperty('servername'), as we can see clearly in the profiler trace, so this was definitely where the issue was originating from. I then queried the sys.sysservers compatibility view, and I couldn’t see a record with srvid 0 (the details of the local server are always stored in this view with srvid 0). Next, we ran the following commands to fix the situation:

sp_dropserver '<localservername>'

sp_addserver '<localservername>', @local='LOCAL'
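To verify the fix after the restart, here’s a minimal hedged check:

    -- Both of these should now return the local server name
    SELECT @@SERVERNAME AS server_name, SERVERPROPERTY('ServerName') AS server_property;

    -- And the local server should be back in sysservers with srvid 0
    SELECT srvid, srvname FROM master.dbo.sysservers WHERE srvid = 0;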

After this, we restarted SQL Server, and DBMail worked like a charm (after we had cleaned up the mess we had created earlier, of course). Hope this helps.

 

Migrating TFS from SQL Server Enterprise to Standard can cause problems due to compression


When migrating a Team Foundation Server from SQL Server Enterprise to Standard, you might run into this error:

Restore Failed For Server '<Servername>'. (Microsoft.SqlServer.SmoExtended)

Additional information:
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

Database '<TFS Database name>' cannot be started in this edition of SQL Server because part or all of object 'tbl_Branch' is enabled with data compression or vardecimal storage format. Data compression and vardecimal storage format are only supported on SQL Server Enterprise Edition.

Database '<TFS Database name>' cannot be started because some of the database functionality is not available in the current edition of SQL Server. (Microsoft SQL Server, Error: 909)

The error message seems obvious enough, but the question is, how exactly do you proceed? For example, one of the things you would need to find out is which objects have compression enabled on them (yes, TFS enables compression on some objects in its databases), and how to get rid of it so the migration can proceed. Here are the steps:

    1. Run the following query in each TFS database to determine whether there are objects which have compression enabled:

       select so.name, so.type, so.type_desc, sp.data_compression, sp.data_compression_desc
       from sys.partitions sp
       inner join sys.objects so
       on (so.object_id = sp.object_id)
       where sp.data_compression != 0

    2. If there are objects listed in the output of the query, then the next step is to disable the compression on the objects and their indexes. I actually ended up writing a small script for this (see attachment “Disable Compression on TFS DB’s.sql”). As always, this script does not come with any guarantees. Please do test it thoroughly before running on your production environment. You will need to run this script in the context of each of the TFS databases. A rough sketch of the approach is shown below.
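    Since the attachment isn’t reproduced here, here’s a minimal hedged sketch of the idea: generate ALTER ... REBUILD statements that turn compression off, then review and run the generated output yourself.

       -- Generates one REBUILD statement per compressed table/index; review before running
       SELECT 'ALTER ' +
              CASE WHEN i.index_id IN (0, 1)
                   THEN 'TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
                   ELSE 'INDEX ' + QUOTENAME(i.name) + ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
              END + ' REBUILD WITH (DATA_COMPRESSION = NONE);'
       FROM sys.partitions p
       INNER JOIN sys.objects o ON o.object_id = p.object_id
       INNER JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
       INNER JOIN sys.schemas s ON s.schema_id = o.schema_id
       WHERE p.data_compression <> 0
       GROUP BY s.name, o.name, i.name, i.index_id;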

    After this, you should be good to proceed with the migration. If you face any issues when trying to disable the compression, please do not hesitate to call Microsoft for support.

    Hope this helps. Do let me know if you have any feedback, suggestions or comments. Thanks.

    SQL Server Cluster Failover Root Cause Analysis–the what, where and how


    I know many of you get into situations where SQL Server fails over from one node of a cluster to the other, and you’re hard-pressed to find out why. In this post, I shall seek to answer quite a few questions about how to go about conducting a post-mortem analysis for a SQL Server cluster failover, aka Cluster Failover RCA.

    First up, since this is a post mortem analysis, we need all the logs we can get. Start by collecting the following:

    • SQL Server Errorlogs
    • The “Application” and “System” event logs, saved in txt or csv format (eases analysis)
    • The cluster log (see here and here for details on how to enable/collect cluster logs for Windows 2003 and 2008 respectively)

    Now that we have all the logs in place, we come to the analysis part. I’ve tried to list down the steps and most common scenarios here:

    1. Start with the SQL Errorlog. The Errorlog files in the SQL Server log folder can be viewed using notepad, textpad or any other text editor. The current file will be named Errorlog, the previous one Errorlog.1, and so on. See if SQL Server was shut down normally. For example, the following stack denotes a normal shutdown for SQL:

      2012-09-04 00:32:54.32 spid14s     Service Broker manager has shut down.
      2012-09-04 00:33:02.48 spid6s      SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.
      2012-09-04 00:33:02.50 spid6s      SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
    2. You might see a lot of situations where SQL Server failed over due to a system shutdown, i.e. the node itself rebooted. In that case, the stack at the bottom of the SQL Errorlog will look something like this:

       2012-07-13 06:39:45.22 Server      SQL Server is terminating because of a system shutdown. This is an informational message only. No user action is required.
       2012-07-13 06:39:48.04 spid14s     The Database Mirroring protocol transport has stopped listening for connections.
       2012-07-13 06:39:48.43 spid14s     Service Broker manager has shut down.
       2012-07-13 06:39:55.39 spid7s      SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
       2012-07-13 06:39:55.43 Server      The SQL Server Network Interface library could not deregister the Service Principal Name (SPN) for the SQL Server service. Error: 0x6d3, state: 4. Administrator should deregister this SPN manually to avoid client authentication errors.

       You can also use the systeminfo command from a command prompt to check when the node was last rebooted (look for “System Boot Time”), and see if this matches the time of the failover (a T-SQL check for the instance start time is sketched after this list). If so, then you need to investigate why the node rebooted, because SQL was just a victim in this case.

    3. Next come the event logs. Look for peculiar signs in the application and system event logs that could have caused the failover. For example, one strange scenario that I came across was when the disks hosting tempdb became inaccessible for some reason. In that case, I saw the following in the event logs:

       Information 7/29/2012 12:44:07 AM MSSQLSERVER 680 Server Error [8, 23, 2] occurred while attempting to drop allocation unit ID 423137010909184 belonging to worktable with partition ID 423137010909184.

       Error 7/29/2012 12:44:07 AM MSSQLSERVER 823 Server The operating system returned error 2(The system cannot find the file specified.) to SQL Server during a read at offset 0x000001b6d70000 in file 'H:\MSSQL\Data\tempdata4.ndf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

       And then some time later, we see SQL shutting down in reaction to this:

       Error 7/29/2012 12:44:17 AM MSSQLSERVER 3449 Server SQL Server must shut down in order to recover a database (database ID 2). The database is either a user database that could not be shut down or a system database. Restart SQL Server. If the database fails to recover after another startup, repair or restore the database.

       Error 7/29/2012 12:44:17 AM MSSQLSERVER 3314 Server During undoing of a logged operation in database 'tempdb', an error occurred at log record ID (12411:7236:933). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.

       Error 7/29/2012 12:44:17 AM MSSQLSERVER 9001 Server The log for database 'tempdb' is not available. Check the event log for related error messages. Resolve any errors and restart the database.

       Another error that clearly points toward the disks being a culprit is this:

       Error 7/29/2012 12:44:15 AM MSSQLSERVER 823 Server The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0x00000000196000 in file 'S:\MSSQL\Data\tempdb.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

       The next logical step of course would be to check why the disks became unavailable/inaccessible. I would strongly recommend having your disks checked for consistency, speed and stability by your vendor.

    4. If you don’t have any clue from these past steps, try taking a look at the cluster log as well. Please do note that the Windows cluster logs are always recorded in the GMT/UTC time zone, so you’ll need to make the necessary calculations to determine what time to focus on in the cluster log. See if you can find anything which could have caused the cluster group to fail, such as the network being unavailable, failure of the IP/Network name, etc.
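    As promised in step 2, here’s a hedged T-SQL sketch for checking when the instance itself last started (the sqlserver_start_time column requires SQL 2008 or later; the tempdb trick works on older versions too):

        -- SQL 2008+: instance start time straight from the DMV
        SELECT sqlserver_start_time FROM sys.dm_os_sys_info;

        -- Older versions: tempdb is recreated at every startup, so its
        -- create_date approximates the instance start time
        SELECT create_date FROM sys.databases WHERE name = 'tempdb';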

    There is no exhaustive guide to finding the root cause for a Cluster Failover, mainly because it is an approach thing. I do, however, want to talk about a few cluster concepts here, which might help you understand the messages from the various logs better.

    checkQueryProcessorAlive: Also known as the isAlive check in SQL Server, this executes "SELECT @@servername" against the SQL Server instance. It waits 60 seconds before running the query again, but checks every 5 seconds whether the service is alive by calling sqsrvresCheckServiceAlive. Both these values (60 seconds and 5 seconds) are defaults, and can be changed from the properties of the SQL Server resource in Failover Cluster Manager/Cluster Administrator. I understand that for SQL 2012, we’ve included some more comprehensive checks, like running sp_server_diagnostics as part of this check, to ensure that SQL is in good health.

    sqsrvresCheckServiceAlive: Also known as the looksAlive check in SQL Server, this checks the status of the SQL Service and returns “Service is dead” if the status is not one of the following:

    • SERVICE_RUNNING
    • SERVICE_START_PENDING
    • SERVICE_PAUSED
    • SERVICE_PAUSE_PENDING

    So if you see messages related to one of these checks failing in either the event logs or the cluster logs, you know that SQL Server was not exactly “available” at that time, which caused the failover. The next step, of course, would be to investigate why SQL Server was not available at that time. It can be due to a resource bottleneck such as high CPU or memory consumption, SQL Server being hung/stalled, etc.

    The basic idea here, as with any post-mortem analysis, is to construct a logical series of events leading up to the failover, based on the data. If we can do that, then we have at least a clear indication of what caused the failover, and more importantly, how to avoid such a situation in the future.

    If you’re still unable to determine anything about the cause of the failover, I would strongly recommend contacting Microsoft CSS to review the data once and see if they’re able to spot anything.

    Hope this helps. As always, comments, feedback and suggestions are welcome.

    SQL 2008–Service fails to come online with “a valid certificate could not be found, and it is not possible to create a self-signed certificate”


    You might run into this situation where SQL Server fails to come online (either with a new install or an existing one). Looking at the application event logs, you see these messages:

    Event Type:     Error
    Event Source:   MSSQLSERVER
    Event Category: Server
    Event ID:       17182
    Date:           05/08/2012
    Time:           5:03:40 AM
    User:           N/A
    Computer:       SQLTest1
    Description:
    TDSSNIClient initialization failed with error 0x80092004, status code 0x80. Reason: Unable to initialize SSL support. Cannot find object or property.

    ………………………

    Event Type:     Error
    Event Source:   MSSQLSERVER
    Event Category: Server
    Event ID:       17190
    Date:           05/08/2012
    Time:           5:03:40 AM
    User:           N/A
    Computer:       FTRNSNA01VSQL11
    Description:
    FallBack certificate initialization failed with error code: 1.

    As always, it’s a good idea to take a look at the SQL Errorlog. Looking in the errorlog, you might see these messages:

    2012-05-08 05:10:13.14 Server      Error: 17190, Severity: 16, State: 1.
    2012-05-08 05:10:13.14 Server      FallBack certificate initialization failed with error code: 1.
    2012-05-08 05:10:13.14 Server      Unable to initialize SSL encryption because a valid certificate could not be found, and it is not possible to create a self-signed certificate.
    2012-05-08 05:10:13.16 Server      Error: 17182, Severity: 16, State: 1.
    2012-05-08 05:10:13.16 Server      TDSSNIClient initialization failed with error 0x80092004, status code 0x80. Reason: Unable to initialize SSL support. Cannot find object or property.
    2012-05-08 05:10:13.16 Server      Error: 17182, Severity: 16, State: 1.
    2012-05-08 05:10:13.16 Server      TDSSNIClient initialization failed with error 0x80092004, status code 0x1. Reason: Initialization failed with an infrastructure error. Check for previous errors. Cannot find object or property.

    This is another error that does not exactly point towards the actual cause of the problem. One might ask: why is it not possible to create a self-signed certificate? The answer is that the certificate cannot be created because the user profile of the SQL Server service account is corrupted. Here’s what you can do:

    Workaround : Change the service account. If the new account’s profile on the server is not corrupted, the services will come online.

    Solution: Delete the profile and recreate it. For details, please refer to the KB here

    Hope this helps.

    Why the registry size can cause problems with your AlwaysOn/Failover Cluster setup


    I recently worked on a very interesting issue, where one of the cluster nodes in an AlwaysOn environment became unstable, and the administrators ended up evicting the node from the Windows cluster as an emergency measure. Ideally, since the primary node/replica was no longer available, the Availability Group should have come up on the secondary replica, but it didn’t in this case. The AG was showing online in the Failover Cluster Manager, but in SQL Server Management Studio, the database in the AG was in “Not Synchronizing\Recovery Pending” state.

    We checked the errorlogs (on the secondary), and found these messages:

    2012-09-05 04:01:32.300 spid18s      AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering service to start. This is an informational message only. No user action is required.
    2012-09-05 04:01:32.310 spid21s      Error: 35262, Severity: 17, State: 1.
    2012-09-05 04:01:32.310 spid21s      Skipping the default startup of database 'Test' because the database belongs to an availability group (Group ID:  65537). The database will be started by the availability group. This is an informational message only. No user action is required.
    …….. 

    2012-09-05 04:01:32.430 spid18s      AlwaysOn: The local replica of availability group 'PST TEST' is starting. This is an informational message only. No user action is required.
    …….      
    2012-09-05 04:01:32.470 spid18s      The state of the local availability replica in availability group 'PST TEST' has changed from 'NOT_AVAILABLE' to 'RESOLVING_NORMAL'. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.
    ……. 

    2012-09-05 04:01:32.880 spid52       AlwaysOn: The local replica of availability group 'PST TEST' is preparing to transition to the primary role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required.
    2012-09-05 04:01:32.980 spid52       The state of the local availability replica in availability group 'PST TEST' has changed from 'RESOLVING_NORMAL' to 'PRIMARY_PENDING'. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error. 
    2012-09-05 04:01:33.090 Server       Error: 41015, Severity: 16, State: 1.
    2012-09-05 04:01:33.090 Server       Failed to obtain the Windows Server Failover Clustering (WSFC) node handle (Error code 5042).  The WSFC service may not be running or may not be accessible in its current state, or the specified cluster node name is invalid.

    Since there were clear errors related to the Windows Server Failover Cluster (WSFC), we checked and ensured that the windows cluster was stable. It was, and the cluster validation came back clean.

    We tried bringing the database online using "Restore database lab with recovery", but it failed saying the database is part of an availability group. We then tried removing it from the Availability Group, but it failed with error 41190, stating that the database is not in a state that it can be removed from the Availability Group. The only option we had at this point was to delete the AG. We tried doing so, but that too returned with an error:

    Msg 41172, Level 16, State 0, Line 3
    An error occurred while dropping availability group 'PST TEST' from Windows Server Failover Clustering (WSFC) cluster and from the local metadata. The operation encountered SQL OS error 41036, and has been terminated. Verify that the specified availability group name is correct, and then retry the command.

    However, the AG was no longer visible in SQL Server Management Studio or Failover Cluster Manager. I was still skeptical, since the error had clearly complained about the metadata cleanup. When we tried creating a new AG with the name PST TEST, it errored out as expected, stating that the AG was still present. So we ended up creating an AG with a different name and adding the Test database to it.

    Root Cause Analysis

    So much for getting the environment back up, but what about the root cause? I mean, how can we ensure that such an issue never happens again? I checked with some friends in the Product Group, and according to them, deleting an AG should “always” work. So why didn’t it work in this case?

    The answer lies in the size of the registry on the servers. As many of you might know, the limit for registry size is still 2 GB. This is also documented in the msdn article here. The proper way to investigate would be to follow these steps:

    1. Check the Paged pool usage from perfmon by checking the Memory->Pool Paged Bytes counter
    2. If you see high memory usage there (close to the 2 GB limit), then we need to find who’s using the pool. There’s a very useful article on this:
      http://msdn.microsoft.com/en-us/library/windows/hardware/gg463213.aspx
    3. Using one of the methods described in the article, we can easily identify which process is using the paged pool. One other way is to use the Process->Pool paged bytes Perfmon counter.
    4. In our case, we identified CM31 as the tag using about 1.97 GB from the paged pool. Looking up the tag list available here, we can see that the CM series corresponds to “Configuration Manager (registry)”.
      So it’s clear that the registry is using a large chunk of the paged pool, and once this usage hits 2 GB, users will not be able to log in to the system; as a result, everything else, including the cluster service and the AG, will fail. This issue can happen either due to large registry hives or some process loading keys multiple times.
    5. Next, check the sizes of the files in the Windows\System32\config folder. If these are large (>1 GB), then that will be the cause of the issue. Also, check the sizes of the NTUser.dat files in C:\Users. There will be one for each user, so searching for them in C:\Users is the simplest way.
    6. In our case, we could clearly see that the SOFTWARE hive was by far the largest, and very close to the limit:
       [Screenshot: file sizes in the Windows\System32\config folder, with the SOFTWARE hive close to the 2 GB limit]
    7. The next step is to figure out which process/hive is responsible for the huge size of the Software branch. In our case we found that it was a known issue with the Cluster service, outlined in this KB:
      http://support.microsoft.com/kb/2616514
    8. Another known issue that can cause similar issues in Windows 2000 and 2003:
      http://support.microsoft.com/kb/906952
    9. The best remedial measure is to compress the “Bloated” registry hives, using the steps outlined in this KB:
      http://support.microsoft.com/kb/2498915

    There can, of course, be other processes bloating the Software hive, and the only way to find out is to take a backup of the registry hive and try to find which hives/keys are the largest. Once we have identified the keys, we can trace them back to the process which is responsible.
     

    Hope this helps. Any feedback/suggestions are welcome.
      

    How To : SQL 2012 Filetable Setup and Usage


    One of the cool things about my job is that I get to work on the latest technologies earlier than most people. I recently stumbled upon an issue related to Filetables, a new feature in SQL Server 2012.

    To start with, a Filetable brings you the ability to view files and documents in SQL Server, and allows you to use SQL Server specific features such as Full-Text Search and semantic search on them. At the same time, it also allows you to access those files and documents directly, through Windows Explorer or Windows file system API calls.

    Setting up Filetables

    Here are some basic steps for setting up Filetables in SQL Server 2012:

    1. Enable Filestream for the instance in question from SQL Server Configuration Manager (Right click on the SQL Server Service-> Properties->Filestream-> Enable Filestream for Transact-SQL access). Also make sure you provide a Windows Share name. Restart SQL after making this change.
    2. Create a database in SQL exclusively for Filetables (preferable to using an existing database), and specify the WITH FILESTREAM option. Here’s an example:

       CREATE DATABASE FileTableDB
       ON PRIMARY
       (
           NAME = N'FileTableDB',
           FILENAME = N'C:\FileTable\FileTableDB.mdf'
       ),
       FILEGROUP FilestreamFG CONTAINS FILESTREAM
       (
           NAME = FileStreamGroup1,
           FILENAME = 'C:\FileTable\Data'
       )
       LOG ON
       (
           NAME = N'FileTableDB_Log',
           FILENAME = N'C:\FileTable\FileTableDB_log.ldf'
       )
       WITH FILESTREAM
       (
           NON_TRANSACTED_ACCESS = FULL,
           DIRECTORY_NAME = N'FileTables'
       );

    3. Alternatively, you can add a Filestream Filegroup to an existing database, and then create a Filestream directory for the database:

       ALTER DATABASE [FileTableDB] ADD FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM (NAME = FileStreamGroup1, FILENAME = 'C:\FileTable\Data')
       GO

       ALTER DATABASE FileTableDB
           SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'FileTables' );
       GO

    4. To verify the directory creation for the database, run this query:

       SELECT DB_NAME ( database_id ), directory_name
           FROM sys.database_filestream_options;
       GO

    5. Next, you can run this query to check whether enabling Non-Transacted Access on the database was successful (the database should have the value ‘FULL’ in the non_transacted_access_desc column):

       SELECT DB_NAME(database_id), non_transacted_access, non_transacted_access_desc
           FROM sys.database_filestream_options;
       GO

    6. The next step is to create a Filetable. It is optional to specify the Filetable directory name; if you don’t specify one, the directory will be created with the same name as the Filetable. Example:

       CREATE TABLE DocumentStore AS FileTable
           WITH (
                 FileTable_Directory = 'DocumentTable',
                 FileTable_Collate_Filename = database_default
                );
       GO

    7. Next, you can verify the previous step using this query (don’t be daunted by the number of rows you see for a single object):

       SELECT OBJECT_NAME(parent_object_id) AS 'FileTable', OBJECT_NAME(object_id) AS 'System-defined Object'
           FROM sys.filetable_system_defined_objects
           ORDER BY FileTable, 'System-defined Object';
       GO

    8. Now comes the most exciting part. Open the following path in Windows Explorer:
       \\<servername>\<Instance FileStream Windows share name (from config mgr)>\<DB Filetable directory>\<Table Directory Name>
       In our case, it will be:
       \\Harsh2k8\ENT2012\Filetables\DocumentTable
    9. Next, copy files over to this share, and see the magic:

       select * from DocumentStore

    So you get the best of both worlds: Accessing files through SQL, searching for specific words/strings inside the files from inside SQL, etc. while retaining the ability to access the files directly through a windows share. Really cool, right? I think so too.

    A few points to remember:

    • The Filestream/Filetable features together give you the ability to manage Windows files from SQL Server. Since we’re talking about files on the file system, accessing them requires a Windows user. Thus, these features will not work with SQL Server authentication. The only exception is using a SQL Server login that has sysadmin privileges (in which case it will impersonate the SQL Server service account).
    • Filetables give you the ability to get the logical/UNC path to files and directories, but any file manipulation operations (such as copy, cut, delete, etc.) must be performed by your application, possibly using file system API’s such as CreateFile or CreateDirectory. In short, the onus is on the application to obtain a handle to the file using file system API’s. Filetables only serve the purpose of providing the path to the application (see the path-retrieval sketch below).
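    To that end, here’s a hedged sketch of retrieving those paths (DocumentStore is the Filetable created above):

       -- UNC root for the Filetable's directory
       SELECT FileTableRootPath('dbo.DocumentStore') AS table_root_path;

       -- Path of each file relative to that root
       SELECT name, file_stream.GetFileNamespacePath() AS relative_path
       FROM dbo.DocumentStore;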

    Some useful references for Filetables:
    http://msdn.microsoft.com/en-us/library/gg492089.aspx
    http://msdn.microsoft.com/en-us/library/gg492087.aspx

    Hope this helps. Any comments/feedback/suggestions are welcome.

    An in-depth look at SQL Server Memory–Part 1


    I know that memory management in SQL Server is one area that’s a bit of an enigma for a lot of people, and most of us only tend to know as much about memory as is related to our day-to-day activities. In this post (and others in this series), I shall seek to do a deep dive into SQL Server memory management, and give you as complete a picture as possible.

    Let’s start off by understanding a few terms:


    VAS

    Virtual Address Space. Windows uses Virtual addresses to allocate memory to a process, and the virtual address to physical address mapping is taken care of by the OS. For details on the need for using Virtual Addresses, please refer to the following technet article:

    http://technet.microsoft.com/en-us/library/cc767886.aspx

        

    On a 32 bit system, the max address that can be referenced is 2^32 (since each bit can reflect either a "set" or a "reset" state), which amounts to ~4 GB. Thus, the VAS on a 32 bit system is 4 GB, of which 2 GB is reserved for the OS kernel, and 2 GB is available to each process. This means that each process can potentially grow up to 2 GB in terms of VAS usage.

     

    /PAE

    Stands for Physical Address Extension. Basically, on 32 bit systems, it enables the use of 36 bit pointers (instead of the default 32 bit ones) by utilizing the underlying hardware. Using 36 bit pointers means that we can now use 36 bit addresses as opposed to 32 bit ones, thereby increasing the max memory the OS can "see" to 64 GB (2^36). If you want to utilize more than 4 GB of RAM on a 32 bit server, then you have to use the /PAE switch in the boot.ini OS file.

     

    /3GB

    The /3GB switch changes the default break-up of the VAS, giving 3 GB to applications (such as SQL) which are Large Address Aware, and leaving 1 GB for the OS kernel. Keep in mind that setting the /3GB switch means that the OS Kernel can then only "see" up to 16 GB of physical memory. More on the /3GB switch here.

     

    /USERVA

    This switch is used to fine tune the VAS usage by applications to between 2 and 3 GB, and is added in the boot.ini as well.

     

    Bpool

    Short for Buffer Pool. SQL memory can be divided into 2 parts, BPool and MTL/Non-BPool. The BPool area caters to all memory requests up to 8 KB in size. Since the size of a page in SQL is 8 KB, this basically means that all data and index page allocation requests are catered to from the BPool, as are large pages. The Max Server Memory setting, up to SQL 2008 R2, caps only the BPool area.

     

    MTL/Non-BPool

    All requests for memory greater than 8KB are catered to from the MTL/Non-BPool area. This area also includes memory used for COM Objects, CLR Code, Extended Stored Procedures, Large cached plans, etc. Leaks by these non-SQL components can also cause SQL memory usage to bloat and eventually lead to an OOM (Out Of Memory) condition.

     

    AWE

    Stands for Address Windowing Extensions. There's a specific set of AWE API's used to allocate AWE memory. This feature has different uses in 32 and 64 bit. AWE can only be used if the account under which SQL Service is running (the "Service Account") has the "Lock Pages in Memory" privilege granted to it in gpedit.msc.

    32 bit: In 32 bit systems, enabling AWE basically helps you take advantage of the fact that "fetching from RAM is faster than fetching from Disk". Only if the RAM on the server is greater than the VAS (4 GB) shall SQL be able to utilize AWE. Using the AWE API's, SQL allocates memory, fetches pages (data and index pages only) into RAM, and then maps/unmaps them into the BPool as needed. To put it simply, we create a "window" in the BPool VAS which is used to map/unmap data and index pages stored in the AWE allocated region.

    64 bit: If SQL has the Lock Pages in Memory privilege, then it will try and allocate some amount of memory through AWE API's. The benefit is that this memory cannot be paged out by the Operating System as part of a working set trim operation.

    The VAS windowing concept does not come into picture here because on 64 bit, we have virtually unlimited VAS.


    Please note that the AWE memory is not part of the working set, which is why it will not be a candidate for "working set trimming" by the OS in case of server level memory pressure. This is true for both 32 bit and 64 bit environments.
     


    Memory Architecture

    Now we get to the interesting stuff. Let's understand the major components in the SQL Memory architecture:

    Memory Node: A memory node is a logical division of memory, mapped on top of a NUMA node. In English, this means that if you have 2 NUMA nodes on your server, there will be 2 memory nodes as well. If you do not have NUMA, then there will be just one memory node.

    Memory Allocator: All memory allocations on the memory nodes have to go through memory allocator routines tied to the Memory Nodes. Basically, memory requests to a Memory Node will have to land up with the Memory Allocators in order to be honored. This is because it’s the Memory Allocator routines that know the various types of Windows API’s to be called for different kinds of allocation requests. The allocator routines have code for allocating Pages (used for single, multi and large page requests), the Virtual allocator, and the Shared memory allocator.

    The virtual allocator uses VirtualAlloc() and AWE API’s to allocate memory. More about these later in this post. The multi-page allocator also uses the Virtual Allocator to honor requests for multiple pages. 

    Memory Clerks: Perhaps the most crucial component in the memory architecture is the Memory Clerks. The major memory consumers in SQL have their own memory clerks, and we use the Memory Clerks to track memory usage by component. The memory clerks can be divided into the following categories, based on the larger structures that house them in memory:

    • Generic: Includes the Buffer Pool, CLR, Optimizer and Xevent Clerks. The generic clerks do not use the SQL OS caching infrastructure, but still have the ability to respond to memory pressure.
    • Cache Store: The Procedure Cache and System Rowset clerks come under this bucket. Cache store clerks use multiple hash tables for lookup. So for example, if you're searching on multiple criteria, having multiple hash tables for lookup helps boost performance. These clerks also use the clock algorithm (based on an LRU policy) to control the lifetime and visibility of entries. This clock algorithm enables these clerks to respond efficiently to memory pressure.
    • User Store: Includes the Token Perm and Metadata clerks. User store clerks are similar to Cache Store, but they do not use Hash Tables for lookup. The user store requires cache developer users to leverage the framework to implement their own storage semantics, i.e. they need to build their own custom logic for lookup. In a cache store, the lifetime is fully controlled by SQLOS’s caching framework. In a user store, the entry’s lifetime is only partially controlled by a store. Since the user store implements its own storage, it also participates in lifetime control. In plain English, this means that for user store clerks, the developers can develop their own logic to manage the lifetime of an entry (and hence also the response to memory pressure). They can leave the lifetime management to the caching infrastructure, or they can develop their own way to manage it.
    • Object Store/Memory Pool: Includes clerks like Locks and SNI Network Packets. The Object Store/Memory Pool is a cache of homogeneous objects (unlike User and Cache Stores, which can hold heterogeneous objects). These do not have hash tables for lookup, or clock algorithms for lifetime management.

    When a thread wants memory in SQL, it has to request it from the Memory Clerks. The clerk, in turn, requests memory from the Memory Allocators (it’s not possible for a thread to interface directly with the Allocators). The clerks also have functionality built in for responding to memory pressure. The memory allocation can be from:

    1. Heap/Memory Object: Used when the requirement is for a very small size (say, a few hundred bytes). Heap allocation API’s are used in some rare scenarios by SQL Server.
    2. Virtual Alloc: This is the most commonly used method of allocating memory in SQL, and involves the use of the VirtualAlloc() Windows API. The primary reason for the extensive use of VirtualAlloc() is that it gives us the flexibility to manage memory in our own way. It has the capability to reserve and/or commit memory, as well as to specify access control for the pages involved. VirtualAlloc is used for honoring both single and multi-page requests.
    3. AWE: AWE API's are also used to allocate memory by SQL, as long as the Lock Pages in Memory privilege has been granted to the service account under which SQL Server is running. I've specified the uses of AWE API's on both 32 bit and 64 bit systems above.

    Let's talk about some specific consumers here:

    1. Database Page Cache: The database page cache requests memory from the Buffer Pool, and the Buffer Pool, in turn, calls the Virtual Allocator (which uses the VirtualAlloc() and AWE API's to try and honor the request).

    2. Backup Buffers: Backups request memory from the SQLUtilities memory clerk, which, in turn, calls the Virtual Allocator to allocate memory.

    3. Plan Cache: The plan cache requests memory from the Memory Object (which is like a heap), which in turn requests mostly for single pages using a memory clerk called SQLQUERYPLAN. The interesting thing is that from SQL 2005 onwards, all single page requests go through the Buffer Pool, which is basically code optimized for providing 8K pages. The Buffer Pool, in turn, uses the Virtual Allocator to honor the request.
      If the plan cache needs multiple pages (i.e. requests memory > 8K), then the memory clerk will directly invoke the Multi-page allocator. The multi-page allocator, in turn, uses the same VirtualAlloc() and AWE API's to allocate memory.

    4. Optimizer: The Optimizer requests memory from a mark/shrink heap (as it just uses and then releases memory), and this is tracked by a memory clerk called SQLOPTIMIZER. 

    The Buffer Pool acts as both a memory clerk and a consumer, because it's optimized for allocating 8K pages as well as managing a cache of 8K pages. What this means is that the Buffer Pool is good at tracking its own memory consumption, as well as providing single pages to other consumers such as the plan cache on demand. It also keeps track of the pages it provides to other consumers (which show up as "stolen pages" in DBCC MEMORYSTATUS).
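    A hedged way to observe this (the 'Stolen pages' counter name is from SQL 2005/2008; later versions renamed it):

        -- Stolen pages as tracked by the Buffer Manager perfmon object
        SELECT object_name, counter_name, cntr_value
        FROM sys.dm_os_performance_counters
        WHERE counter_name = 'Stolen pages';

        -- Or look for "Stolen" under the Buffer Counts section of:
        DBCC MEMORYSTATUS;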

     

    So, to sum up, this is what the picture looks like:

    [Diagram: memory consumers requesting memory through memory clerks, which in turn use the memory allocators on the memory nodes]

     


    Additional Information:

    Here are some DMV's that you can use to track the memory architecture components explained above:
    sys.dm_os_memory_clerks
    sys.dm_os_memory_objects
    sys.dm_os_memory_nodes
    sys.dm_os_memory_pools

     

    Please feel free to play around with these. Do refer to books online for more details on these DMV's.
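    For example, a hedged starting point (the column names are from SQL 2005-2008 R2; SQL 2012 merged them into a single pages_kb column):

        -- Top memory clerks by allocated memory
        SELECT TOP 10 type,
               SUM(single_pages_kb + multi_pages_kb) AS total_kb
        FROM sys.dm_os_memory_clerks
        GROUP BY type
        ORDER BY total_kb DESC;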

     

    Hope this post helps. Any comments, suggestions or feedback are welcome.

     


    An in-depth look at SQL Server Memory–Part 2


    The memory architecture evolved in a big way from SQL 2000 to 2005. Basically, in 2000, all we had was the procedure cache (used to cache compiled plans and execution plans, execution contexts, etc.) and the buffer pool. However, in 2005, with the increase in the variety of memory consumers, and the addition of new features, the number of caches increased dramatically. In this part of the memory series, I'll try to do a deep dive into the caching mechanism of SQL Server.

     

    Common Caching Framework:

    SQL Server implements a common caching framework. The highlight of this framework is its inbuilt ability for both lifetime and visibility management of entries. Both lifetime and visibility are controlled by a "Clock Algorithm".

     

    Lifetime Management:

    Under this algorithm, there are "Clock Hands" that sweep the cache at regular intervals. Every time the clock hand steps on a not-in-use entry, it decreases its cost by some amount. If the entry is not in use and its cost is zero, the clock hand makes the entry invisible and then attempts to remove it from the cache. In fact, the lifetime of an entry is managed by an embedded reference count in the Clock Entry Info class; after this count goes to 0, the entry is destroyed.

    There are 2 types of clock hands, internal and external. An external clock hand is moved by the resource monitor (RM) when the whole process gets into memory pressure. The internal clock hand is used to control the size of a cache relative to other caches. You can think of the internal clock hand as a way to put a max cap on a single cache. If this mechanism didn’t exist, it would be possible for a single cache to push the whole process into different types of memory pressure. To avoid this type of situation, the internal clock hand starts moving after the framework predicts that the procedure cache’s max cap is reached.

     

    Visibility Management:

    Visibility of an entry is implemented by a pin count embedded in the Clock Entry Info class. Keep in mind that pin count and reference count are different mechanisms: reference count manages lifetime, and pin count manages visibility. For an entry to be visible, its pin count needs to be larger than 0. It also needs to be non-dirty and not marked for single usage. A pin count of 1 means that the entry is visible and is currently not in use.

     

    Procedure Cache:

    The procedure cache is, in most cases, the biggest consumer of SQL Server memory after the buffer pool. In this section, I'll seek to discuss the architecture and working of the Procedure cache in detail.

     

    Components:

    The main types of objects that can be stored in the procedure cache are described as follows:

    1. Compiled Plans: When the query optimizer finishes compiling a query plan, the output is a compiled plan. A compiled plan is a set of instructions that describes exactly how SQL Server will implement a query. If you submit a T-SQL query for execution, all you have supplied is a set of logical instructions. There may be thousands of different ways that SQL Server could execute the query. The compiled plan for this query, though, would tell SQL Server exactly which physical query operators to use. For example, the compiled plan would specify whether to use an index seek, an index scan, or a table scan to retrieve rows from each table. A compiled plan represents a collection of all the query plans for a single T-SQL batch or stored procedure. Compiled plans are re-entrant, i.e. if multiple users are simultaneously executing the same stored procedure, they can all share a single compiled plan.

    2. Execution Contexts: While executing a compiled plan, SQL Server has to keep track of information about the state of execution. The execution context tracks a particular execution’s parameter and local variable values, which statement in the plan is currently being executed, and object IDs for any temp objects created during execution. If two users are executing the same stored procedure at the same time, each user has his or her own execution context, but the two users may share a single compiled plan. Execution contexts cannot be shared simultaneously, but once one user is done with an execution context, it can be reused by the next user to execute the same query. For this reason, execution contexts are cached. Every execution context is linked to a particular compiled plan. Execution contexts are much cheaper to create than compiled plans, so under memory pressure they are always aged out of cache before compiled plans.

    3. Cursors: Cursors track the execution state of server-side cursors, including the cursor’s current location within a resultset. Cursors have the same relationship to a compiled plan as execution contexts; every cursor is tied to a particular compiled plan. Like execution contexts, cursors can be used by only one connection at a time, but the compiled plan that the cursor is linked to can be concurrently used by multiple connections.
       
    4. Algebrizer Trees: The query optimizer does not directly act on raw query text; it needs a more structured input. The Algebrizer’s job is to produce an algebrizer tree, which represents the logical structure of a query. As part of this process, the Algebrizer performs tasks like resolving table, column, and variable names to particular objects in the database. It also determines the data types of any expressions in the query. Since we have the compiled plans, we do not need to cache Algebrizer trees. The only exceptions are the Algebrizer trees for views, defaults and constraints. They are cached because a view may be referenced by many different queries. Caching the view’s algebrizer tree prevents SQL Server from repeatedly having to parse and algebrize the view every time another query is compiled that references the view.

     

    When you send a query to SQL Server, the batch is parsed and sent to the Algebrizer, which produces an Algebrizer tree. The query optimizer uses the Algebrizer tree as input, and produces a compiled plan as output. Finally, in order to execute the compiled plan, an execution context must be created to track runtime state.

     

    Architecture:

    The caching infrastructure (of which Procedure cache is a part) exposes objects called cachestores. A cachestore provides a common set of memory allocation interfaces that are reused for many different memory consumers inside SQL Server. The procedure cache is split into several cachestores:

     

    Cachestore          Common Name              Description
    CACHESTORE_OBJCP    Object Cachestore        Stored procedures, functions, and triggers
    CACHESTORE_SQLCP    SQL Cachestore           Ad hoc and prepared queries
    CACHESTORE_PHDR     Algebrizer Cachestore    View, default, and constraint algebrizer trees
    CACHESTORE_XPROC    Xproc Cachestore         Extended stored procedures

     

    The object cachestore is used to cache compiled plans and related objects for stored procedures, functions, and triggers. The SQL cachestore holds plans for ad hoc and prepared queries. The Algebrizer cachestore and Xproc Cachestores hold algebrizer trees (for views, defaults and constraints only) and extended stored procedure objects respectively. The Object and SQL Cachestores are generally much larger than the other 2 cachestores.

     

    In each cachestore, the lookup is managed using one or more hash tables. For example, in the SQL Cachestore, each plan is assigned a unique hash value, and the plans are divided into hash buckets based on these. Each bucket holds zero or more cached compiled plans. Each compiled plan may contain cached execution contexts and cached cursors.

    Multiple plans may reside in the same hash bucket, but SQL Server limits the number of entries in each cachestore in an attempt to prevent excessive plan lookup times caused by long hash chain lengths. The SQL and object cachestores are each permitted to grow to approximately 160,000 entries on 64-bit servers, and approximately 40,000 entries on most 32-bit servers.

     

    Memory Pressure:

    As discussed earlier, the "Clock Algorithm" is used to delete old entries from a cachestore based on an LRU algorithm. From SQL 2005 onwards, the execution contexts are treated as part of the compiled plans, rather than being treated as separate cache objects. Every time a clock sweep passes a plan, the plan voluntarily releases half of its cached execution contexts, even if the plan itself is going to remain in the cache.

    The background thread that ages plans out of the cache in SQL Server 2005 is called the resource monitor thread.
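If you want to see this aging in action, the sys.dm_os_memory_cache_clock_hands DMV exposes one row per clock hand per cache store; the rounds_count and removed_all_rounds_count columns give a feel for how aggressively each store is being swept. For example:

select name, type, clock_hand, clock_status, rounds_count, removed_all_rounds_count
from sys.dm_os_memory_cache_clock_hands
where type in ('CACHESTORE_OBJCP', 'CACHESTORE_SQLCP')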

     

    Cache Lookups and Plan Reuse:

    The main purpose of cachestores (as if it wasn't obvious already) is to provide for reuse of the objects that they cache. The lookup method varies according to the object in question. For a stored procedure (CACHESTORE_OBJCP), the Database ID and the Object ID are used to look up the stored procedure plan(s).

    For ad hoc and prepared queries, the text of the query is processed through a hash function. The hash function returns an integer value that is then referred to as the “object ID” of the SQL compiled plan object. SQL 2005 hashes the entire query text. SQL 2000 only hashed the first 8 KB, so there was a chance of the hash being the same for 2 long queries with, say, slightly different where clauses.

    A number of things are taken into account when determining whether a plan will be reused:

• Since we hash the entire query from SQL 2005 onwards, the text of two ad hoc queries must match exactly in order for a cached plan to be reused.
• The object ID and database ID properties must also match the user’s current environment for plan reuse to take place.
• Other properties, like user ID and language settings, may be required to match as well; any property of the cached object that must match in order for a lookup to succeed is referred to as a cache key. The cache keys for a plan are combined by another hash function to determine the bucket in the procedure cache where the plan will be stored.
• Different values for certain SET options can prevent plan reuse for ad hoc and prepared queries.

     

    If you see multiple plans in cache for what appears to be the same query, you can determine the key differences between them by comparing the sys.dm_exec_plan_attributes DMF output for the two plans. The plan attributes that must match in order for reuse to occur will have an is_cache_key column value of 1.
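For example, a sketch along these lines will list the cache-key attributes of a plan (the top 1 pick is just a placeholder; substitute the plan_handle you're actually investigating):

declare @plan_handle varbinary(64)
select top 1 @plan_handle = plan_handle from sys.dm_exec_cached_plans  -- replace with the handle of interest
select attribute, value, is_cache_key
from sys.dm_exec_plan_attributes(@plan_handle)
where is_cache_key = 1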

     

    Flushing the Procedure Cache:

    The most common method of flushing the contents of the procedure cache is to run DBCC FREEPROCCACHE. In addition, ALTER DATABASE or DROP DATABASE commands, closing a database (due to the autoclose database option), and changes to sp_configure options can all implicitly free all or portions of the procedure cache.
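As a side note, on SQL 2008 and later DBCC FREEPROCCACHE also accepts a plan handle, so you can evict a single plan instead of flushing the entire cache. A sketch (the LIKE filter is purely illustrative):

select cp.plan_handle, st.text
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.text like '%some identifying text%'
-- then: DBCC FREEPROCCACHE (<plan_handle from above>)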

     

    In the next part, we will focus on the troubleshooting aspect of memory.

     

    An in-depth look at SQL Server Memory–Part 3


     

In this third and final instalment of the SQL Server Memory series, I will focus on troubleshooting SQL Server memory pressure issues.

     

Before we start on the troubleshooting part though, we need to determine the type of memory pressure that we're dealing with. The possible types are listed below:

1.     External Physical Memory pressure - Overall RAM pressure on the server. We need to find the largest consumers of memory (which might be SQL), and try to reduce their consumption. It might also be that the server simply has inadequate RAM for the workload it's running.

2.     Internal Physical Memory pressure - Memory pressure on specific components of SQL Server. Can be a result of External Physical Memory pressure, or of one of the components hogging too much memory.

3.     Internal Virtual Memory pressure - VAS pressure on SQL Server. Mostly seen only on 32 bit (x86) systems these days (x64 has 8 TB of VAS, whereas x86 only had 4 GB. Refer to Part 1 for details).

4.     External Virtual Memory pressure - Page file pressure on the OS. SQL Server does not recognize or respond to this kind of pressure.

     

    Troubleshooting

    Now for getting our hands dirty. When you suspect memory pressure on a server, I would recommend checking the following things, in order:

     

    1.     Log in to the server, and take a look at the performance tab of the Task Manager. Do you see the overall memory usage on the server getting perilously close to the total RAM installed on the box? If so, it's probable that we're seeing External Physical Memory pressure.

2.     Next, look at the Processes tab, and see which of the processes is using the maximum amount of RAM. Again, for SQL, the true usage might not be reflected in the working set if LPIM is enabled (i.e. SQL is using the AWE APIs to allocate memory). To check SQL's total memory consumption, you can run the following query from inside SQL (valid from SQL 2008 onwards):

select physical_memory_in_use_kb/(1024) as sql_physical_mem_in_use_mb,
locked_page_allocations_kb/(1024) as awe_memory_mb,
total_virtual_address_space_kb/(1024) as max_vas_mb,
virtual_address_space_committed_kb/(1024) as sql_committed_mb,
memory_utilization_percentage as working_set_percentage,
virtual_address_space_available_kb/(1024) as vas_available_mb,
process_physical_memory_low as is_there_external_pressure,
process_virtual_memory_low as is_there_vas_pressure
from sys.dm_os_process_memory
go

For SQL installations prior to 2008 (the command works on 2008 and 2008 R2 as well), you can run DBCC MEMORYSTATUS and take the total of VM Committed and AWE Allocated (shown as Locked Pages Allocated from 2008 onwards) from the Memory Manager section to get a rough idea of the amount of memory being used by SQL Server.
     

3.     Next, compare this with the total amount of RAM installed on the server. If SQL seems to be taking most of the memory, or at least much more than it should, then we need to focus our attention on SQL Server. The exact specifics will vary according to the environment, and factors such as whether it is a dedicated SQL Server box, the number of instances of SQL Server running on the server, etc. In case you have multiple instances of SQL Server, it is best to start with the instance consuming the maximum amount of memory (or showing the maximum deviation from "what it should be consuming"), tune it, and then move on to the next one.
     

    4.     One of the first things to check should be the value of the "max server memory" setting for SQL Server. You can check this by turning on the 'show advanced options' setting of sp_configure, or by right clicking on the instance in Object Explorer in SSMS, selecting properties, and navigating to the "memory" tab. If the value is "2147483647", this means that the setting has been left to default, and has not been set since the instance was installed. It's absolutely vital to set the max server memory setting to an optimal value. A general rule of thumb that you can use to set a starting value is as follows:
    Total server memory - (Memory for other applications/instances+ OS memory)
The recommendation for the OS memory value is around 3-4 GB on 64 bit systems, and 1-2 GB on 32 bit systems. Please note that this is only a recommendation for the starting value. You need to fine tune it based on observations with respect to the performance of both SQL and any other applications on the server.
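For illustration, here's how you would apply the setting through sp_configure (the 58000 MB figure is just an example for a dedicated 64 GB box; derive your own starting value from the formula above):

exec sp_configure 'show advanced options', 1
reconfigure
go
exec sp_configure 'max server memory (MB)', 58000
reconfigure
go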
     

    5.     Once you've determined that the max server memory is set properly, the next step is to find out which component within SQL is consuming the most memory. The best place to start is, quite obviously, the good old "DBCC Memorystatus" command, unless you're using NUMA, in which case, it will be best to use perfmon counters to track page allocations across NUMA nodes, as outlined here.
I will try to break down most of the major components in the DBCC Memorystatus output here (I would recommend reading KB 907877 as a primer before this):

I.     First up is the Memory Manager section. As discussed earlier, this section contains details about the overall memory consumption of SQL Server. An example:

Memory Manager                           KB
---------------------------------------- -----------
VM Reserved                                  4059416
VM Committed                                   43040
Locked Pages Allocated                         41600
Reserved Memory                                 1024
Reserved Memory In Use                             0

     

II.    Next, we have the memory nodes, starting with 0. As mentioned above, there is a known issue with the way DBCC MEMORYSTATUS displays the distribution of allocations across memory nodes, so it is best to study the distribution through the SQL Server performance counters. Here's a sample query:
     

select * from sys.dm_os_performance_counters
where object_name like '%Buffer Node%'
     

III.   Next, we have the clerks. I've tried to outline the not so obvious ones in this table, along with their uses:

Clerk Name                                Used for
---------------------------------------   ---------------------------------------------------
MEMORYCLERK_SQLUTILITIES                  Database mirroring, backups, etc.
MEMORYCLERK_SQLXP                         Extended Stored Procedures (loaded into SQL Server)
MEMORYCLERK_XE, MEMORYCLERK_XE_BUFFER     Extended Events

    If you see any of the clerks hogging memory, then you need to focus on that, and try and narrow down the possible causes.

     

    Another thing to watch out for is high values for the multipage allocator. If you see any clerk with extremely high values for multipage allocator, it means that the non-Bpool area is growing due to one of the following:

i.    CLR code: Check the errorlog for appdomain messages.

ii.   COM objects: Check the errorlog for sp_oacreate.

iii.  Linked servers: Can be checked using Object Explorer in SSMS.

iv.   Extended stored procedures: Check the errorlog for "loading extended stored procedure" messages. Alternatively, you can query the sys.extended_procedures view as well.

v.    Third party DLLs: Third party DLLs loaded into the SQL Server process space. Run the following query to check:
      select * from sys.dm_os_loaded_modules where company <> 'Microsoft Corporation'
     

    Here's a query to check for the biggest multipage consumers:

select type, name, sum(multi_pages_kb)/1024 as multi_pages_mb
from sys.dm_os_memory_clerks
where multi_pages_kb > 0
group by type, name
order by multi_pages_mb desc

     

    Yet another symptom to watch out for is a high ratio of stolen pages from the Buffer Pool. You can check this in the 'Buffer Pool' section of the MEMORYSTATUS output. A sample:

Buffer Pool                                    Value
---------------------------------------- -----------
Committed                                       4448
Target                                         25600
Database                                        2075
Dirty                                             50
In IO                                              0
Latched                                            0
Free                                             791
Stolen                                          1582
Reserved                                           0
Visible                                        25600
Stolen Potential                               22738
Limiting Factor                                   17
Last OOM Factor                                    0
Last OS Error                                      0
Page Life Expectancy                           87529


    What this means is that Buffer Pool pages are being utilized for "other" uses, and not for holding data and index pages in the BPool. This can lead to performance issues and a crunch on the Bpool, thereby slowing down overall query performance (please refer to part 1 for consumers that "Steal" pages from the BPool). You can use the following query to check for the highest "Steal" consumers:

select type, name, sum((single_pages_kb*1024)/8192) as stolen_pages
from sys.dm_os_memory_clerks
where single_pages_kb > 0
group by type, name
order by stolen_pages desc

     

IV.    Next, we have the stores, namely Cachestore, Userstore and Objectstore. Please refer to part 1 for how and by which components these clerks are used. You can use the following queries to check for the biggest Cachestores, Userstores and Objectstores respectively:
     

select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024 as store_size_mb
from sys.dm_os_memory_cache_counters
where type like 'CACHESTORE%'
group by name, type
order by store_size_mb desc
go

select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024 as store_size_mb
from sys.dm_os_memory_cache_counters
where type like 'USERSTORE%'
group by name, type
order by store_size_mb desc
go

select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024 as store_size_mb
from sys.dm_os_memory_clerks
where type like 'OBJECTSTORE%'
group by name, type
order by store_size_mb desc
go

     

V.     Next, we have the gateways. The concept of gateways was introduced to throttle the use of query compilation memory. In plain English, this means that we did not want to allow too many queries with a high requirement for compilation memory to be running at the same time, as this would lead to consequences like internal memory pressure (i.e. one of the components of the buffer pool growing and creating pressure on other components).

The concept basically works like this: when a query starts execution, it starts with a small amount of memory. As its consumption grows, it crosses the threshold for the small gateway, and must wait to acquire it. The gateway is basically implemented through a semaphore, which means that it will allow up to a certain number of threads to acquire it, and make threads beyond the limit wait. As the memory consumption of the query grows further, it must acquire the medium and big gateways before being allowed to continue execution. The exact thresholds depend on factors like the total memory on the server, the SQL max server memory setting, the memory architecture (x86 or x64), load on the server, etc.

The number of queries allowed at each of the gateways is described in the following table:
                 

     

Gateway    Dynamic/Static    Config Value
--------   ---------------   --------------------------------------
Small      Dynamic           Default is (no. of CPUs SQL sees * 4)
Medium     Static            Number of CPUs SQL sees
Large      Static            1 per instance

     

So if you see a large number of queries waiting on the large gateway, it means that you need to see why there are so many queries requiring large amounts of compilation memory, and try to tune those queries. Such queries will show up with RESOURCE_SEMAPHORE_QUERY_COMPILE or RESOURCE_SEMAPHORE wait types in sysprocesses, sys.dm_exec_requests, etc.
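A quick way to spot such queries is a sketch along these lines (on older builds, sysprocesses would serve the same purpose):

select r.session_id, r.wait_type, r.wait_time, r.granted_query_memory, st.text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) st
where r.wait_type in ('RESOURCE_SEMAPHORE', 'RESOURCE_SEMAPHORE_QUERY_COMPILE')
-- note: granted_query_memory is reported in 8 KB pages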

     

I am listing down some DMVs that might come in handy for SQL Server memory troubleshooting:

sysprocesses

sys.dm_exec_requests

sys.dm_os_process_memory: Usage above.

sys.dm_os_sys_memory: Will give you the overall memory picture for the server (see the sample query after this list).

sys.dm_os_sys_info: Can be used to check OS level information like the hyperthread ratio, CPU ticks, OS quantum, etc.

sys.dm_os_virtual_address_dump: Used to check for VAS usage (reservations). The following query will give you VAS usage in descending order of reservations:

     

with vasummary(Size,reserved,free) as (select size = vadump.size,
reserved = SUM(case(convert(int, vadump.base) ^ 0) when 0 then 0 else 1 end),
free = SUM(case(convert(int, vadump.base) ^ 0x0) when 0 then 1 else 0 end)
from
(select CONVERT(varbinary, sum(region_size_in_bytes)) as size,
region_allocation_base_address as base
from sys.dm_os_virtual_address_dump
where region_allocation_base_address <> 0x0
group by region_allocation_base_address
UNION(
select CONVERT(varbinary, region_size_in_bytes),
region_allocation_base_address
from sys.dm_os_virtual_address_dump
where region_allocation_base_address = 0x0)
)
as vadump
group by size)
select * from vasummary order by reserved desc
go

     

sys.dm_os_memory_clerks (usage above)

sys.dm_os_memory_nodes: Just a select * would suffice. This DMV has one row for each memory node.

sys.dm_os_memory_cache_counters: Used above to find the size of the cachestores. Another sample query would be:

select (single_pages_kb+multi_pages_kb) as memusage, * from sys.dm_os_memory_cache_counters order by memusage desc
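And here is the sample query for sys.dm_os_sys_memory promised above; it returns a one-row summary of physical memory on the box and the current memory state as seen by Windows:

select total_physical_memory_kb/1024 as total_physical_mb,
available_physical_memory_kb/1024 as available_physical_mb,
system_memory_state_desc
from sys.dm_os_sys_memory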

     

Once you have narrowed down the primary consumer and the specific component that is causing a memory bottleneck, the resolution steps should be fairly simple. For example, if you see some poorly written code, you can hound the developers to tune it. For other processes hogging memory at the OS level, you will need to investigate them. For high consumption by a particular clerk, check the corresponding components. For example, in case of high usage by the SQLUtilities clerk, one of the first things you should check is whether there is any mirroring set up on the instance, and whether it's working properly.

    Another thing I would strongly recommend would be to watch out for memory related KB articles, and make sure you have the relevant fixes applied.

    Hope this helps. Any feedback, questions or comments are welcome.

     

    Why the service account format matters for upgrades


     

    I've seen this issue a few times in the past few months, so decided to blog about this. When upgrading from SQL 2005 to SQL 2008/SQL 2008 R2 (or even from SQL 2008 to SQL 2008 R2), you might face an error with the in-place upgrade.

Open the setup logs folder (located by default in C:\Program Files\Microsoft SQL Server\<version folder, i.e. 100 for 2008 and 2008 R2>\Setup Bootstrap\Log), and look for a folder with the date and time of the upgrade attempt. Inside this folder, look for a file named "Detail.txt".

    Looking inside the detail.txt file, check for the following stack:

    2013-01-21 11:16:42 Slp: Sco: Attempting to check if container 'WinNT://Harsh2k8,computer' of user account exists

    2013-01-21 11:16:42 Slp: Sco: User srv_sql@contoso.test wasn't located

    2013-01-21 11:16:42 Slp: Sco: User srv_sql@contoso.test doesn't exist

    2013-01-21 11:16:42 SQLBrowser: SQL Server Browser Install for feature 'SQL_Browser_Redist_SqlBrowser_Cpu32' generated exception, and will invoke retry option.  The exception: Microsoft.SqlServer.Configuration.Sco.ScoException: The specified user 'srv_sql@contoso.test' does not exist.

       at Microsoft.SqlServer.Configuration.Sco.UserGroup.AddUser(String userName)

       at Microsoft.SqlServer.Configuration.SqlBrowser.SqlBrowserPrivateConfig.AddAccountToGroup(SqlBrowserPublicConfig publicConfigSqlBrowser)

       at Microsoft.SqlServer.Configuration.SqlBrowser.SqlBrowserPrivateConfig.UpdateAccountIfNeeded(SqlBrowserPublicConfig publicConfigSqlBrowser)

       at Microsoft.SqlServer.Configuration.SqlBrowser.SqlBrowserPrivateConfig.ConfigUserProperties(SqlBrowserPublicConfig publicConfigSqlBrowser)

       at Microsoft.SqlServer.Configuration.SqlBrowser.SqlBrowserPrivateConfig.ExecConfigNonRC(SqlBrowserPublicConfig publicConfigSqlBrowser)

       at Microsoft.SqlServer.Configuration.SqlBrowser.SqlBrowserPrivateConfig.SelectAndExecTiming(ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcbPublicConfig)

       at Microsoft.SqlServer.Configuration.SqlBrowser.SqlBrowserPrivateConfigBase.ExecWithRetry(ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcbPublicConfig).

    2013-01-21 11:16:42 SQLBrowser: The last attempted operation: Adding account 'srv_sql@contoso.test' to the SQL Server Browser service group 'SQLServer2005SQLBrowserUser$Harsh2k8'..

     

The key thing here is the message "Attempting to check if container WinNT://Harsh2k8, computer of user account exists". If you see this message, go to SQL Server Configuration Manager, right click on the offending service mentioned in Detail.txt, open the properties window and navigate to the "Log On" tab. Check the format of the service account here; it will typically be in the domain\username format. Change this to username@domain, and type in the password. After this, restart the SQL Server service to make sure the changes have taken effect.

    Try the setup again, and it should work this time.

     

    Hope this helps.

     

    An interesting issue with SQL Server Script upgrade mode


    Here's another common issue that I've seen quite a few people run into of late.

    When you run a patch against SQL Server, the patch installs successfully, but on restart, SQL goes into "script upgrade mode" and you're unable to connect to it. Upon looking at the errorlog, you see something like this:

     

    2012-08-23 03:43:38.29 spid7s      Error: 5133, Severity: 16, State: 1.

    2012-08-23 03:43:38.29 spid7s      Directory lookup for the file "D:\SQLData\temp_MS_AgentSigningCertificate_database.mdf" failed with the operating system error 2(The system cannot find the file specified.).

    2012-08-23 03:43:38.29 spid7s      Error: 1802, Severity: 16, State: 1.

    2012-08-23 03:43:38.29 spid7s      CREATE DATABASE failed. Some file names listed could not be created. Check related errors.

    2012-08-23 03:43:38.31 spid7s      Error: 912, Severity: 21, State: 2.

    2012-08-23 03:43:38.31 spid7s      Script level upgrade for database 'master' failed because upgrade step 'sqlagent100_msdb_upgrade.sql' encountered error 598, state 1, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.

    2012-08-23 03:43:38.31 spid7s      Error: 3417, Severity: 21, State: 3.

    2012-08-23 03:43:38.31 spid7s      Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

     

Script upgrade means that when SQL is restarted for the first time after the application of the patch, the upgrade scripts are run against each system db (to upgrade the system tables, views, etc.). During this process, SQL Server attempts to create this mdf file in the default data location, and if the path is not available, we get this error. Most of the time, it's a result of the data files having been moved to a different folder, and the original default data path no longer being available.

    The default data path can be checked from the following registry key (for a default SQL 2008 instance):

    HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQLServer

The MSSQLServer key will have a string entry named "DefaultData". If you see a location here that's no longer available, please change it to the current data location (alternatively, you can also "recreate" the default data path mentioned in the string value).

    After this, restart SQL Server and the script upgrade should complete successfully this time. Hope this helps.

    How To: Troubleshooting SQL Server I/O bottlenecks


One of the most common reasons for server performance issues with respect to SQL Server is the presence of an I/O bottleneck on the system. When I say I/O bottleneck, it can mean issues like slow disks, other processes hogging I/O, outdated drivers, etc. In this blog, I will seek to outline the approach for identifying and troubleshooting I/O bottlenecks on SQL Server.

     

    The Symptoms

     

    The following are the most common symptoms of an I/O bottleneck on the SQL Server machine:

    • You see a lot of threads waiting on one or more of the following waits:
      • PAGEIOLATCH_*
      • WRITELOG
      • TRACEWRITE
      • SQLTRACE_FILE_WRITE_IO_COMPLETION
      • ASYNC_IO_COMPLETION
      • IO_COMPLETION
      • LOGBUFFER
         
    • You see the famous "I/O taking longer than 15 seconds" messages in the SQL Server errorlogs: 
2012-11-11 00:21:25.26 spid1 SQL Server has encountered 192 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [E:\SEDATA\stressdb5.ndf] in database [stressdb] (7). The OS file handle is 0x00000000000074D4. The offset of the latest long IO is: 0x00000000022000.
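Before diving into perfmon, you can also get a quick read on per-file latencies from inside SQL Server itself through sys.dm_io_virtual_file_stats. A sketch (the numbers are cumulative since the last restart, so for precision compare two snapshots taken over an interval):

select db_name(vfs.database_id) as database_name, mf.physical_name,
vfs.io_stall_read_ms/nullif(vfs.num_of_reads, 0) as avg_read_latency_ms,
vfs.io_stall_write_ms/nullif(vfs.num_of_writes, 0) as avg_write_latency_ms
from sys.dm_io_virtual_file_stats(null, null) vfs
join sys.master_files mf on vfs.database_id = mf.database_id and vfs.file_id = mf.file_id
order by avg_read_latency_ms desc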

     

    Troubleshooting

     

    Data Collection:

     

    If you see the symptoms outlined above quite frequently on your SQL Server installation, then it will be safe to draw the conclusion that your instance is suffering from a disk subsystem or I/O bottleneck. Let's look at the data collection and troubleshooting approach pertaining to the same:

    1. Enable a custom Performance Monitor collector to capture all disk related counters. Just go to start->run, type perfmon, and hit ok. Next, go to Data Collector sets->User Defined, right click on User Defined, and click New-> Data Collector set.
Note: The best thing about perfmon (apart from the fact that it is built into Windows) is that it's a very lightweight diagnostic, and has negligible performance overhead/impact.

    2. Give the data collector set a name, and select Create manually. Under type of data, select the "Create data logs" option, and check the Performance Counter checkbox under it.

    3. Next, click on add performance counters, and select the "LogicalDisk", "Process" and "PhysicalDisk" groups, and select "All instances" for both before adding them.

    4. After you have added the counters, you can also modify the sample interval. You might want to do this if you see spikes lasting less than 15 seconds, which is the default sample interval. I sometimes use an interval of 5 seconds when I want to closely monitor an environment .

    5. Click on Finish and you will now see the new Data Collector set created under User Defined.

    6. Next, right click on the Data Collector set you just created, and click start.

     

I normally recommend that my clients run the perfmon collector set for at least one business day, so that it captures the load exerted by at least one standard business cycle.

     

     

    Analysis:

     

Now that we have the data, we can start the analysis. After stopping the collector set, you can open the blg file generated (the path is displayed under the output column, on the right hand side in perfmon) using perfmon (a simple double click works, as the file type is associated with perfmon by default). Once open, it should have automatically loaded all the counters. Analysing with all the counters can be a bit cumbersome, so I would suggest that you first delete all the counters and then add specific counters one by one.

     

    I will list out the important counters here, along with their expected values:

     

    1. Process->IO Data Bytes/sec: This counter represents the average amount of IO Data bytes/sec spawned by each process. In conjunction with IO Other Bytes/sec, this counter can be used to determine the average IO per second as well as the total amount of IO spawned by each process during the capture. Check for the largest I/O consumers, and see if SQL is being starved of I/O due to some other process spawning a large amount of I/O on the system.
       
2. Process-> IO Other Bytes/sec: This counter represents the non-data IO spawned by each process during the capture. Usually, the amount of non-data IO is very low compared to data IO. Use the total of IO Data Bytes/sec and IO Other Bytes/sec to determine the total amount of IO spawned by each process during the capture, and again check whether some other process is starving SQL of I/O.
       
3. Physical Disk/Logical Disk->Avg. Disk Sec/Read: This counter signifies the average amount of time it takes for a read I/O request to be serviced for each physical/logical disk. An average of less than 10 ms (0.010) is good, and between 10-15 ms (0.010-0.015) is acceptable, but anything beyond 15 ms (0.015) is a cause for concern.
   
4. Physical Disk/Logical Disk->Avg. Disk Sec/Write: This counter signifies the average amount of time it takes for a write I/O request to be serviced for each physical/logical disk. As with reads, an average of less than 10 ms (0.010) is good, between 10-15 ms (0.010-0.015) is acceptable, and anything beyond 15 ms (0.015) is a cause for concern.

5. Physical Disk/Logical Disk->Disk Bytes/Sec: This counter represents, in bytes, the throughput of your I/O subsystem for each physical/logical disk. Look for the max value for each disk, and divide it by 1024 twice to get the max throughput in MB/s for the disk. SANs generally start from 200-250 MB/s these days. If you see that the throughput is lower than the specifications for the disk, it's not necessarily a cause for concern. Check this counter in conjunction with the Avg. Disk Sec/Read or Avg. Disk Sec/Write counters (depending on the wait/symptom you see in SQL), and see the latency at the time of the maximum throughput. If the latency is within acceptable limits, it just means that SQL spawned I/O that was less than the disk's throughput capacity, and it was easily handled by the disk.

    6. Physical Disk/Logical Disk->Avg. Disk Queue Length: This counter represents the average number of I/O's pending in the I/O queue for each physical/logical disk. Generally, if the average is greater than 2, it's a cause for concern. Check the other counters to confirm.

7. Physical Disk/Logical Disk->Split IO/Sec: This counter indicates the I/Os for which the operating system had to make more than one command call, grouped by physical/logical disk. This happens if the IO request touches data on non-contiguous file segments. It's a good indicator of file/volume fragmentation.

8. Physical Disk/Logical Disk->%Disk Time: This counter is a general mark of how busy the physical/logical disk is. Actually, it is nothing more than the “Avg. Disk Queue Length” counter multiplied by 100. It is the same value displayed in a different scale. This is the reason you can see the %Disk Time going above 100, as explained in the KB http://support.microsoft.com/kb/310067. It basically means that the Avg. Disk Queue Length was greater than 1 during that time. If you've captured the perfmon for a long period (a few hours or a complete business day), and you see the %Disk Time to be greater than 80%, it's generally indicative of a disk bottleneck, and you should take a closer look at the other counters to arrive at a logical conclusion.
       

It's important to keep two things in mind. First, make sure your data capture is not skewed or biased in any way (for example, do not run a capture at the time of a monthly data load or something similar). Second, make sure you correlate the numbers reflected across the various counters to arrive at the overall picture of how your disks are doing.

     

    Most of the time, I see that people are surprised when they are told that there are I/O issues on the system. Their typical response is "But, it's been working just fine for x years, how can it create a bottleneck now?". The answer lies within the question itself. When the server was initially configured, the disk resources were sufficient for the load on the server. However, with time, it's inevitable that the business grows as a whole, and so do the number of transactions, as well as the overall load. As a result, there comes a day when the load breaches that threshold, and the disk resources on the server are no longer sufficient to handle it. If you come to office one fine day, see high latency on the disks during normal working hours, and are sure that

• No special/additional workloads are running on SQL
• No other process on the server is spawning excessive I/O
• Nothing changed on the server in the past 24 hours (like a software installation, patching, reboot, etc.)
• All the BIOS and disk drivers on the server are up to date


    Then it's highly likely that the load on your server has breached this threshold, and you should think about asking your disk vendor(s) for a disk upgrade (after having them check the existing system once for latency and throughput, of course). Another potential root cause that can cause high latency is that your disk drivers and/or BIOS are out of date. I would strongly recommend checking periodically for updates to all the drivers on the machine, as well as the BIOS.

     

    Hope this helps. As always, comments, feedbacks and suggestions are welcome.

     

     

    SQL Server patch fails with "Could not find any resources appropriate for the specified culture or the neutral culture"


    I recently worked on a number of issues where SQL Server Service Pack/patch installation would fail, and we would see this error in the relevant Detail.txt (located in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\<Date time of the installation attempt> for SQL 2008/2008 R2):
     

    2013-04-07 20:14:07 Slp: Package sql_bids_Cpu64: - The path of cached MSI package is: C:\Windows\Installer\5c23b5e.msi . The RTM product version is: 10.50.1600.1

    2013-04-07 20:14:07 Slp: Error: Action "Microsoft.SqlServer.Configuration.SetupExtension.InitializeUIDataAction" threw an exception during execution.

    2013-04-07 20:14:13 Slp: Received request to add the following file to Watson reporting: C:\Users\kalerahul\AppData\Local\Temp\2\tmpCC09.tmp

    2013-04-07 20:14:13 Slp: The following is an exception stack listing the exceptions in outermost to innermost order

    2013-04-07 20:14:13 Slp: Inner exceptions are being indented

    2013-04-07 20:14:13 Slp:

    2013-04-07 20:14:13 Slp: Exception type: System.Resources.MissingManifestResourceException

    2013-04-07 20:14:13 Slp:     Message:

    2013-04-07 20:14:13 Slp:         Could not find any resources appropriate for the specified culture or the neutral culture.  Make sure "Errors.resources" was correctly embedded or linked into assembly "Microsoft.SqlServer.Discovery" at compile time, or that all the satellite assemblies required are loadable and fully signed.

    2013-04-07 20:14:13 Slp:     Stack:

    2013-04-07 20:14:13 Slp:         at System.Resources.ResourceManager.InternalGetResourceSet(CultureInfo culture, Boolean createIfNotExists, Boolean tryParents)

    2013-04-07 20:14:13 Slp:         at System.Resources.ResourceManager.GetObject(String name, CultureInfo culture, Boolean wrapUnmanagedMemStream)

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Discovery.MsiException.GetErrorMessage(Int32 errorNumber, CultureInfo culture)

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Discovery.MsiException.GetErrorMessage(MsiRecord errorRecord, CultureInfo culture)

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Discovery.MsiException.get_Message()

    2013-04-07 20:14:13 Slp:         at System.Exception.ToString()

    2013-04-07 20:14:13 Slp:         at System.Exception.ToString()

    2013-04-07 20:14:13 Slp:         at System.Exception.ToString()

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Setup.Chainer.Workflow.ActionEngine.RunActionQueue()

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Setup.Chainer.Workflow.Workflow.RunWorkflow(WorkflowObject workflowObject, HandleInternalException exceptionHandler)

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Chainer.Setup.Setup.RunRequestedWorkflow()

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Chainer.Setup.Setup.Run()

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Chainer.Setup.Setup.Start()

    2013-04-07 20:14:13 Slp:         at Microsoft.SqlServer.Chainer.Setup.Setup.Main()

     

    Now that's a weird and hard to understand error, isn't it? However, look closely at what the setup is trying to do, and you will see that it's trying to access the following file from the installer cache:
     C:\Windows\Installer\5c23b5e.msi
     

Open the installer cache and try to install the msi manually. If it succeeds, try running the patch setup again and it should proceed beyond the error this time. If the msi setup fails, then you will need to troubleshoot that first, before the patch setup can proceed further. This behaviour is expected, in that the service pack setup will try to access the msi's (Microsoft Installer files, installed with the base installation of SQL) and msp's (Microsoft Patch files, installed by Service Packs, CUs and hotfixes) of each of the installed components of SQL Server. If it's unable to access/run any of these, the Service Pack setup will fail.

     

    Hope this helps.

     

    SQL 2005 patch fails with 1642 “Unable to install Windows Installer MSP file”


    This one is for all my DBA friends out there. I recently ran into this issue when running a security patch installation for a SQL 2005 instance on SP4. The setup failed, and when I looked into the “C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\Log\Hotfix” folder (this is where the patch setup files for 2005 are to be found), here’s what I found in the latest summary.txt:-

    **********************************************************************************
    Product Installation Status
    Product : SQL Server Database Services 2005 (MSSQLSERVER)
    Product Version (Previous): 5000
    Product Version (Final) :
    Status : Failure
    Log File : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log
    Error Number : 1642
    Error Description : Unable to install Windows Installer MSP file
    ----------------------------------------------------------------------------------
    Product : SQL Server Tools and Workstation Components 2005
    Product Version (Previous): 5000
    Product Version (Final) :
    Status : Failure
    Log File : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQLTools9_Hotfix_KB2494120_sqlrun_tools.msp.log
    Error Number : 1642
    Error Description : Unable to install Windows Installer MSP file
    ----------------------------------------------------------------------------------

“Hmmm”, I thought to myself, “here’s one I haven’t seen before”. Since the log file indicated was in the same folder, I pulled up the “SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log” file, and here’s an extract from the time the error occurred:-

    MSI (s) (D8:6C) [07:36:23:597]: File will have security applied from OpCode.
    MSI (s) (D8:6C) [07:36:23:644]: Original patch ==> e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp
    MSI (s) (D8:6C) [07:36:23:644]: Patch we're running from ==> C:\WINDOWS\Installer\5daea.msp
    MSI (s) (D8:6C) [07:36:23:644]: SOFTWARE RESTRICTION POLICY: Verifying patch --> 'e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp' against software restriction policy
    MSI (s) (D8:6C) [07:36:23:644]: Note: 1: 2262 2: DigitalSignature 3: –2147287038
    MSI (s) (D8:6C) [07:36:23:644]: SOFTWARE RESTRICTION POLICY: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is not digitally signed
    MSI (s) (D8:6C) [07:36:23:644]: SOFTWARE RESTRICTION POLICY: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is permitted to run at the 'unrestricted' authorization level.
    MSI (s) (D8:6C) [07:36:23:660]: SequencePatches starts. Product code: {130A3BE1-85CC-4135-8EA7-5A724EE6CE2C}, Product version: 9.00.1399.06, Upgrade code: {929C9FEC-8873-4A1A-A209-9AF432E8E1D1}, Product language 1033
    MSI (s) (D8:6C) [07:36:23:660]: 3.0 patch e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is of type QFE
    MSI (s) (D8:6C) [07:36:23:660]: PATCH SEQUENCER: verifying the applicability of QFE patch e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp against product code: {130A3BE1-85CC-4135-8EA7-5A724EE6CE2C}, product version: 9.00.1399.06, product language 1033 and upgrade code: {929C9FEC-8873-4A1A-A209-9AF432E8E1D1}
    MSI (s) (D8:6C) [07:36:23:660]: Validating transform 'Target01ToUpgrade01' with validation bits 0x920
    MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 2749 2: Target01ToUpgrade01 3: C:\WINDOWS\Installer\5daea.msp 4: 9.4.5000.00 5: 9.00.1399.06
    MSI (s) (D8:6C) [07:36:23:660]: 1: 2749 2: Target01ToUpgrade01 3: C:\WINDOWS\Installer\5daea.msp 4: 9.4.5000.00 5: 9.00.1399.06
    MSI (s) (D8:6C) [07:36:23:660]: PATCH SEQUENCER: QFE patch e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is not applicable.
    MSI (s) (D8:6C) [07:36:23:660]: SequencePatches returns success.
    MSI (s) (D8:6C) [07:36:23:660]: Final Patch Application Order:
    MSI (s) (D8:6C) [07:36:23:660]: Other Patches:
    MSI (s) (D8:6C) [07:36:23:660]: Unknown\Absent: {89F18EEE-A409-4B25-915A-0F03651ECF48} - e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp
    MSI (s) (D8:6C) [07:36:23:660]: Product: Microsoft SQL Server 2005 - Update '{89F18EEE-A409-4B25-915A-0F03651ECF48}' could not be installed. Error code 1642. Additional information is available in the log file C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log.
    MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 1708
    MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 2729
    MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 2729
    MSI (s) (D8:6C) [07:36:23:660]: Product: Microsoft SQL Server 2005 -- Installation failed.

    Just for kicks, I also checked out the Hotfix.log (it’s the precursor to the “Detail.txt” in SQL 2008 that we so often use). Here’s an extract from it for reference:-

    03/29/2012 07:36:17.986 Copy Engine: Creating MSP install log file at: C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log
    03/29/2012 07:36:17.986 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
    03/29/2012 07:36:17.986 Registry: Cannot read registry key value "Debug", error 0
    03/29/2012 07:36:23.785 MSP returned 1642: The installer cannot install the upgrade patch because the program being upgraded may be missing or the upgrade patch updates a different version of the program. Verify that the program to be upgraded exists on your computer and that you have the correct upgrade patch.
    03/29/2012 07:36:23.785 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
    03/29/2012 07:36:23.785 Registry: Cannot read registry key value "Debug", error 997
    03/29/2012 07:36:23.801 Copy Engine: Error, unable to install MSP file: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp
    03/29/2012 07:36:23.801 The following exception occurred: Unable to install Windows Installer MSP file Date: 03/29/2012 07:36:23.801 File: \depot\sqlvault\stable\setupmainl1\setup\sqlse\sqlsedll\copyengine.cpp Line: 807
    03/29/2012 07:36:24.066 Watson: Param1 = Unknown
    03/29/2012 07:36:24.066 Watson: Param2 = 0x66a
    03/29/2012 07:36:24.066 Watson: Param3 = Unknown
    03/29/2012 07:36:24.066 Watson: Param4 = 0x66a
    03/29/2012 07:36:24.066 Watson: Param5 = copyengine.cpp@807
    03/29/2012 07:36:24.066 Watson: Param6 = Unknown
    03/29/2012 07:36:24.066 Watson: Param7 = SQL9
    03/29/2012 07:36:24.066 Watson: Param8 = @
    03/29/2012 07:36:24.066 Watson: Param9 = x86
    03/29/2012 07:36:24.066 Watson: Param10 = 5057
    03/29/2012 07:36:24.066 Installed product: SQL9
    03/29/2012 07:36:24.066 Installing product: SQLTools9
    03/29/2012 07:36:24.285 Registry: Opened registry key "Software\Microsoft\Windows\CurrentVersion\Uninstall"
    03/29/2012 07:36:24.301 Installing instance: SQL Tools
    03/29/2012 07:36:24.301 Installing target: SPJP063
    03/29/2012 07:36:24.301 Installing file: sqlrun_tools.msp
    03/29/2012 07:36:24.332 Copy Engine: Creating MSP install log file at: C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQLTools9_Hotfix_KB2494120_sqlrun_tools.msp.log
    03/29/2012 07:36:24.332 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
    03/29/2012 07:36:24.332 Registry: Cannot read registry key value "Debug", error 0
    03/29/2012 07:36:38.930 MSP returned 1642: The installer cannot install the upgrade patch because the program being upgraded may be missing or the upgrade patch updates a different version of the program. Verify that the program to be upgraded exists on your computer and that you have the correct upgrade patch.
    03/29/2012 07:36:38.930 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
    03/29/2012 07:36:38.930 Registry: Cannot read registry key value "Debug", error 997
    03/29/2012 07:36:38.930 Copy Engine: Error, unable to install MSP file: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixTools\Files\sqlrun_tools.msp
    03/29/2012 07:36:38.930 The following exception occurred: Unable to install Windows Installer MSP file Date: 03/29/2012 07:36:38.930 File: \depot\sqlvault\stable\setupmainl1\setup\sqlse\sqlsedll\copyengine.cpp Line: 807

No clues, right? So finally, in a desperate attempt, I decided to capture a Process Monitor trace (available on TechNet, see here). And whoa, look what I found there:-

    00:24:41.4884798 msiexec.exe 6764 RegEnumKey HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Products\<SQL GUID>\Patches NO MORE ENTRIES Index: 0, Length: 288

    “Aha”, I thought to myself, “so this is the problem”. Basically (and you can check this on a normal/healthy installation), the Patches key is supposed to have subkeys (1 for each patch) for all the patches applied to the SQL Server instance so far; and those keys seem to be missing in this case.

So what happened? God knows. It could be anything: someone may have deleted them manually, or some cleanup program may have “cleaned” them up by mistake, etc.

    The main question, though, is how do we fix it? Simple enough, removing and re-installing the previous Service Pack should recreate the registry keys, and should thus fix it, right? Wrong. Service Pack uninstallation was introduced from SQL 2008 onwards, so that’s not possible. So what else?

    Warning : This is one of those “weird” solutions. Some might even call it a hack, though I just call it exploiting a loophole in the service pack installer. Here are the steps:-

    1. Rename the “sqlservr.exe” in the Binn folder of your instance.
2. Copy the sqlservr.exe from another instance that’s on a lower SP/RTM build than your target instance (in my case, the target was an instance on SP4, so I used the sqlservr.exe from an instance on SP3)
    3. Paste the exe into the Binn folder of your instance.
    4. Now run the SP setup (in my case, it was the SP4 setup), and it should be able to detect SQL on the lower build and allow you to proceed with the install, thereby creating the missing registry entries in the process.

    Yes, you could say this is a loophole in the Service Pack install process, that it only checks the build of the sqlservr.exe to determine what build the instance is on, and I would actually agree with you. But in situations like this, it’s these “loopholes” that come in handy.
    As always, any comments/feedback/questions are both welcome and solicited.


    How to replace/restore start menu shortcuts for any program


    Okay, let me admit first up that this is not an out and out SQL Server issue, but one of those interesting ones, that required me to provide an easy workaround. What happened was, someone (or some program) deleted the entire SQL Server folder from the start menu. All the components were, however, still installed and were functioning perfectly.

    I actually went to the following path (this was a SQL 2008)

    C:\Program Files (x86)\Microsoft SQL Server\100\Tools\binn\VSShell\Common7\IDE

    and was able to find the ssms.exe there, using which SQL Server Management Studio started perfectly.

    So the question was, how do we get the shortcuts back? Here are the steps:-

1. Go to another machine which has the shortcuts in place, right click on the SQL Server 2008 folder in the start menu, and select copy:
   [screenshot: copying the SQL Server 2008 folder from the Start menu]
2. Next, go to a Windows Explorer folder on the same box, and press Ctrl+V. You will see a folder being pasted there:
   [screenshot: the pasted folder in Windows Explorer]
3. Now, copy this folder and paste it into the following path on the machine where the shortcuts are missing:
   C:\ProgramData\Microsoft\Windows\Start Menu\Programs

    And voila, all your shortcuts are back. Cool one, isn’t it? 

    P.S. Please note that this only works if the installation paths are the same for both the machines involved (which they mostly are for Tools and Workstation Components).

    SQL, Sharepoint and the Windows Internal Database – an interesting saga


This one is for all my friends out there who use Sharepoint. A default Sharepoint installation enables/installs the Windows Internal Database, and creates its databases on it. The Windows Internal Database is, in a way, a special edition of SQL Server, in the sense that it’s not a full version, but does not have the data file limitations of SQL Server Express either (yes, you heard that right). Anyway, the focus of this post is going to be on the following things:

    1. How to connect to the Windows Internal Database (to see what's going on at the back-end)
    2. How to troubleshoot common issues such as log file growth for Sharepoint databases attached to Windows Internal database (from a purely SQL perspective)
    3. How to set up automated SQL backups for your Sharepoint databases (remember, Windows Internal database does not have SQL Server Agent, and normal Windows scripts for taking backups will not work either).

    Okay, so let’s get started:

    Connecting to the Windows Internal Database

    If you open the SQL Server Configuration manager on a machine that has Windows Internal database enabled, you will see a service named “Windows Internal Database (MICROSOFT##SSEE)” (also visible on the services console). Right click on the service in SQL Server Configuration manager, go to “Properties”, and click on the “Advanced” tab. Here, select the “Startup Parameters” option, and you will see a drop down next to it. In the drop down, look for the path to the Errorlog. Typically, it will be something like this:

    C:\Windows\SYSMSI\SSEE\MSSQL.2005\MSSQL\LOG\ERRORLOG

    So now we have the path to the Errorlog for the Windows Internal Database. Open the errorlog in a text editor (notepad or anything else of the sort), and look for the pipe name. Typically, the pipe name looks something like this:

    Server local connection provider is ready to accept connection on [ \\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query ]

    This is what we will use to connect to the WI database (yeah, I’m feeling lazy). So we just start up SQL Server Management Studio (on the local box, as you cannot connect to the Windows Internal Database over the network), and fill in the pipe name there, which is “\\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query” in our case, and hit Connect, and voila, you’re connected.

    Troubleshooting log file growth

    Now, if you’re facing issues with, say, log file growth with your Sharepoint databases (which are attached to the Windows Internal Database instance, of course), then as usual, the first thing to check would be the log_reuse_wait_desc column in sys.databases

select log_reuse_wait_desc, * from sys.databases

    This should give you a fair idea if there’s anything preventing your log files from reusing the space inside them. From a SQL perspective, perhaps the best thing would be to put the databases in Simple recovery model, so that you can stop worrying about Log file space reuse altogether. I have done this successfully for a couple of my customers, without any adverse impact whatsoever to their environments. But that’s not to say that it will work fine for your environment as well. Please do take a full backup both before and right after you make the change, to be safe. It might also be a good idea to restore the db on another server and test it after changing the recovery model to Simple.
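For reference, here is a minimal sketch of the recovery model change itself (the database name is just a placeholder; substitute your actual Sharepoint content database, and take those full backups around the change):

alter database [WSS_Content] set recovery simple
go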

    Setting up Automated backups

    This is by far the most interesting part of the post, or at least, the one that took me the maximum amount of time to accomplish. My customer wanted to set up automated backups from inside SQL for the Sharepoint databases. After a lot of time and effort in preparing and testing, we finally got the script ready (SQL_WIDB_Backup.sql, see attached).

You need to customize the script according to your database names and file paths, and then configure a bat file which calls the sql script. The bat file will have a command like this (again, please configure according to your environment):

    sqlcmd -S\\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query -i c:\SQL_WIDB_Backup.sql -o c:\SQL_WIDB_Backup_Report.txt

    The bat file can then be configured to run at specific times using the "Task Scheduler" (Start->Accessories->System Tools).
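In case you can't get to the attachment, the heart of such a script is just a set of plain BACKUP DATABASE statements, one per database. A minimal sketch (the database name and path are placeholders):

backup database [WSS_Content]
to disk = N'D:\SQLBackups\WSS_Content.bak'
with init, stats = 10
go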

    Hope this helps.

    SQL Server Resource database corruption–yes, it’s possible


    It’s very rare that I run into an issue with the Resource database, and the one I ran into recently was rarer still. But before I get into the nitty-gritty of the issue, let us begin by outlining a few details about the resource database:

    The Resource database

    The resource database is a hidden system database, and cannot be accessed explicitly by the users. Also, as documented here, there is no way inside SQL Server to back up the resource DB. The only way to take a backup of the resource db is to make file level copies. This is something that you can do either manually or through VSS (disk level) backups.

    Now, it’s not without reason that we do not have any way to take backups of the Resource database. A few salient points:

    • The resource DB is a read-only database
    • Outside of a hardware issue, there is no way for the resource db to get corrupted.

    But what if there is a hardware problem, say, god forbid, your SAN crashes, or if there’s some sort of a “scribbler” issue with one of the hardware drivers (more details on that in a different post), and you end up with your resource database corrupted, what do you do? Here are the options, in order:

    1. The ideal way to get out of this situation is to restore the resource db files from file level backups. So if you’re reading about this database for the first time, the first thing you should do is to make file-level copies of the resource db files (or add them to the set of files you back-up using VSS backups). I would recommend taking backups of the resource db files immediately after the successful application of a hotfix/Service Pack/CU/Security Update.
    2. If you are in this situation already, and do not have a backup of your resource db files, do not despair. Simply take another server, install an instance with the same instance id and instance name as the target instance, and bring it to the same build as well. Once this is done, stop the SQL Service, copy the resource db files, and use them to replace the corrupted resource db files on the problem instance. Your SQL server service should come online now. I’ve tested this extensively on SQL 2008 and 2008 R2, and it indeed works.
    3. If this is a cluster, and you’re on SQL 2008 or later, you can try bringing SQL up on the second node. If the second node’s copy of the resource db files are not corrupted, you should be successful.

    Now, allow me to explain why this special case described in bullet 3 exists:
In SQL 2005, the resource db was tied to the master database, and the resource db mdf and ldf files had to be in the same folder as the master db files, else your SQL Service would fail to start. In case of a cluster, the resource db resided on a clustered drive, and when the failover happened, the ownership of the resource database was passed to the second node. Since we had only one copy of the resource database to patch, we were able to patch all the nodes on the cluster in a single run in case of SQL 2005.

This behaviour changed from SQL 2008 onwards. In SQL 2008 and 2008 R2, the resource database is no longer tied to the master database, and lives in the Binn folder instead. So essentially, the resource database is part of the instance binaries from SQL 2008 onwards. This is why, in SQL 2008 and 2008 R2, you need to patch the nodes separately (one by one). Makes sense? It is also why I mentioned in point 3 above that if you are on a cluster and SQL is 2008 or later, there is a good chance you can bring SQL up on the other node, even if the resource db files on one node are corrupted.

As a last word, if you’re not sure how your resource db files came to be corrupted, please make finding the root cause a top priority; corruption of a read-only system database is definitely something that warrants further investigation.

    If you have any interesting incidents to share w.r.t the resource database, please feel free to do so in the comments section.

    The ‘NULL’ Debate, and a few other interesting facts


This is for all my developer friends out there. I recently had a very interesting discussion with a friend of mine on the enigma called NULL, and how it’s different from, say, an empty string. This is something that’s been under debate for as long as I can remember, and not just in the realm of RDBMS.

    So what is NULL? A NULL is an undefined value, and is not equivalent to a space or an empty string. Let me illustrate with an example:

    create table t1 (id int, name varchar(20))      --create a table with two fields

    go

    insert into t1(id) values(1)                           -- insert a row containing the value for the first field only

    go

    select * from t1            

    id    name
    1    NULL

    Here, because we did not insert anything for the second field, the field was populated with a default value of NULL. Let’s see what happens if we insert a blank string for the second field:

    insert into t1 values(2,'')   --just two single quotes, with nothing between  them
    go

    select * from t1

    id    name
    1    NULL
    2   

    In this case, because we specified an empty string, the value does not amount to NULL.

    Similarly, if you insert a string containing only spaces in a cell, and then apply the trim functions (ltrim and rtrim) on it, the resultant value will not amount to NULL:

    Insert into t1 values(3,'    ')
    go

    select id, ltrim(rtrim(name)) from t1

    id    (No column name)
    1    NULL
    2   
    3   
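One practical consequence worth demonstrating (a quick sketch against the same table, assuming the default ANSI_NULLS behaviour): NULL is not equal to anything, not even another NULL, so an equality comparison can never find it; you have to use IS NULL:

select * from t1 where name = ''      -- returns rows 2 and 3 (trailing spaces are ignored in comparisons)
select * from t1 where name = NULL    -- returns no rows at all
select * from t1 where name is null   -- returns row 1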

    The Len function

Another interesting thing I discovered was w.r.t the Len function, used to find the length of a character expression. For example, the statement select len('Harsh') returns an output of 5. Also, select len('') returns 0. Both of these outputs are as expected. However, what if we run select len('     ') (five whitespaces between the quotes)? The expected output is 5, right? Wrong. The output is 0.

Another twist is if you add a character to the end of the string, after the whitespaces, i.e., select len('    a') will return an output of 5. Try the following cases as well, just for fun:

select len('  a  ')    --the character a enclosed by 2 whitespaces on each side

select len('h    ')    --the character h followed by 4 whitespaces

For the first one, the output is 3, and not 5 as I expected. This is because the Len function, by design, ignores trailing spaces. In other words, you could say that it does an implicit rtrim on the string. This is also why the second statement returns a length of 1, not 5 as expected.

In case your application is such that the presence of whitespaces in the data matters and you need them counted in the string length (this can be especially true if you’re writing code to move the data as-is to a table/database/application), a suitable alternative is the Datalength function. Datalength counts whitespaces, both leading and trailing, when calculating the length. One caveat: Datalength returns the number of bytes, not characters, so for an nvarchar value every character counts twice. As a simple example (shown below), select datalength('  a  ') (a enclosed by 2 whitespaces on each side) returns 5, as against the 3 returned by Len.
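Here is a small side-by-side comparison; the last column illustrates the bytes-versus-characters caveat:

select len('  a  ')         as len_result,         -- 3: trailing spaces ignored
       datalength('  a  ')  as datalength_result,  -- 5: every byte of the varchar counted
       datalength(N'  a  ') as nvarchar_result     -- 10: nvarchar stores 2 bytes per character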

Hope this helps a few of my developer friends out there. Any comments/suggestions/feedback are welcome.

    SQL 2008/2008 R2/2012 setup disappears/fails when installing Setup Support files


I’m sure many of you have seen this issue when running SQL 2008/2008 R2/2012 setup on a new server. The setup proceeds to install the Setup Support files, the window disappears but, strangely enough, the next window never shows up.

    Here’s what you need to do:

    1. Click on start->run and type %temp% and press enter (basically, go to the temp folder)
    2. Here, look for SQLSetup.log and SQLSetup_1.log. Open the SQLSetup_1.log file. In there, check for the following messages:
      04/16/2012 17:16:47.950 Error: Failed to launch process
      04/16/2012 17:16:47.952 Error: Failed to launch local setup100.exe: 0x80070003

Typically, you get this error only in SQL 2008, SQL 2008 R2 and SQL 2012. The steps are slightly different for all three, and I’ve outlined them here:

    SQL Server 2008

    1. Save the following in a .reg file and merge to populate the registry:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap]
    "BootstrapDir"="C:\\Program Files\\Microsoft SQL Server\\100\\Setup Bootstrap\\"

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap\Setup]
    "PatchLevel"="10.0.1600.22"

    2. Next, copy the following files and folders from the media to the specified destinations:

• X64/X86 folder (depending on what architecture you want to install) -> C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Release
• Setup.exe -> C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Release
• Setup.rll -> C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Release\Resources\1033\

Next, re-run the setup, and it should proceed beyond the point of error this time.

SQL Server 2008 R2

1. Save the following in a .reg file and merge to populate the registry:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap]
"BootstrapDir"="C:\\Program Files\\Microsoft SQL Server\\100\\Setup Bootstrap\\"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap\Setup]
"PatchLevel"="10.50.1600.00"

2. Next, copy the following files and folders from the media to the specified destinations:

• X64/X86 folder (depending on what architecture you want to install) -> C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2
• Setup.exe -> C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2
• Resources folder -> C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2

Next, re-run the setup, and it should proceed beyond the point of error this time.

SQL Server 2012

1. Save the following in a .reg file and merge to populate the registry:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Bootstrap]
"BootstrapDir"="C:\\Program Files\\Microsoft SQL Server\\110\\Setup Bootstrap\\"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Bootstrap\Setup]
"PatchLevel"="11.00.2100.60"

2. Next, copy the following files and folders from the media to the specified destinations:

• X64/X86 folder (depending on what architecture you want to install) -> C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012
• Setup.exe -> C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012
• Resources folder -> C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012

Next, re-run the setup, and it should proceed beyond the point of error this time.

      As always, comments/suggestions/feedback are welcome and solicited.
