Channel: SQL Journey

SQL Server Service Pack installation may fail if your instance name is a Windows reserved word

Okay, so I woke up one morning and decided that this was a good day to patch my SQL Server 2008 R2 instance (named LPT2) to Service Pack 1. So I downloaded the Service Pack from the Microsoft website, ran it, and was going through the screens, whistling softly to myself, when... CRASH! The Service Pack setup failed. Even worse, I did not even get the basic error prompt that I feel is the least DBAs like me deserve. So I thought to myself, "Okay fine, Mr. Service Pack, so you wanna play this the hard way? Let’s see what you got."

I pulled up the C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log folder, selected the subfolder whose modified date most closely matched the time of the ill-fated installation attempt, and opened the Summary.txt inside it... but alas, no clues there. "No problem," I thought to myself, "let’s dive in deeper." So I opened the Detail.txt in the same folder and searched for the common error strings such as “Return value 3” and “at Microsoft.SQLserver”. Surprisingly enough, still nothing!
So I switched to the basic “failed”, and found this "immensely descriptive" error message:

2011-10-19 11:08:50 Slp: Attempting to run patch request for instance: LPT2
2011-10-19 11:08:53 Slp: Error: Failed to run patch request for instance: LPT2 (exit code: -2068774911)
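
The search I did by hand can be sketched as a small Python helper. This is only an illustration: the function name is made up, the search strings are the ones mentioned in this post, and the sample lines are the two quoted above.

```python
from typing import Iterable, List, Tuple

# Search strings mentioned in this post, from most to least specific.
DEFAULT_NEEDLES = ("Return value 3", "at Microsoft.SQLserver", "failed")


def find_failures(lines: Iterable[str],
                  needles: Tuple[str, ...] = DEFAULT_NEEDLES) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing any needle.

    Matching is case-insensitive, so "Failed" and "failed" both count.
    """
    lowered = [needle.lower() for needle in needles]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(needle in text.lower() for needle in lowered):
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    sample = [
        "2011-10-19 11:08:50 Slp: Attempting to run patch request for instance: LPT2",
        "2011-10-19 11:08:53 Slp: Error: Failed to run patch request for instance: LPT2 (exit code: -2068774911)",
    ]
    for number, text in find_failures(sample):
        print(number, text)
```

To use it on a real log, pass an open file object for Detail.txt instead of the sample list.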

By now I had lost track of the tune which I was whistling, and an uncertain frown had taken over my expression. Upon doing some (okay, a lot of) research, I arrived at the following conclusion:

The root cause of the issue was that my instance name was the same as a Windows reserved name (LPT2 is a reserved device name). You can find a list of the Windows reserved names here:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

The issue occurs because, from SQL 2008 onwards, the service pack installation creates a folder named after the instance inside the respective "date_timestamp" folder under C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log. Since the instance name is a Windows reserved name, the installation is not able to create the folder, and hence it is unable to proceed beyond this point.

Yes, I know you're surprised that we're able to install an instance with such a name in the first place. That works because the instance folder is named MSSQL10.<keyword> or MSSQL10_50.<keyword>, which is not itself a reserved name, so Windows allows the folder creation when the RTM installation runs.
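
As a quick pre-install check, the rule can be sketched in Python. This is a minimal sketch, not anything Setup itself runs: the function name is hypothetical, and the reserved list is the set of device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) from the MSDN page linked above.

```python
# Device names that Windows reserves, compared case-insensitively.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_name(name: str) -> bool:
    """Return True if a file or folder called `name` collides with a device name.

    Only the part before the first dot is checked, so "LPT2" and "LPT2.log"
    collide, while "MSSQL10_50.LPT2" does not.
    """
    base = name.split(".", 1)[0].rstrip(" ")
    return base.upper() in RESERVED_DEVICE_NAMES


if __name__ == "__main__":
    # "SQLPROD01" is just an example of a harmless instance name.
    for candidate in ("LPT2", "MSSQL10_50.LPT2", "SQLPROD01"):
        print(candidate, is_reserved_name(candidate))
```

The same check explains both halves of the story: the bare instance name trips it, the MSSQL10_50.<keyword> folder name does not.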

The bad news is that there is, unfortunately, no resolution for this situation. The only way to proceed from here is to use a different instance name, i.e. you have to install a new instance and move all the databases over to it.

So if you’re still reading this post, I hope it leaves you with the same lesson that it left me with:-

“Don’t install instances whose names are Windows reserved names”

Please let me know if you have any questions or concerns regarding the issue mentioned in this post. Or, if you have been unfortunate enough to encounter this issue yourself, and know of a workaround for it, your two cents would be highly appreciated.


Implementing SSL encryption for SQL Server in a DNS forwarding environment

Let’s say you have an environment which implements DNS forwarding. In such a setup, the client connects using a different name (or FQDN) than the actual SQL Server name (or FQDN). The connection request is forwarded to the actual SQL Server through DNS forwarding implemented at the network layer.

In such an environment, the standard procedure for implementing SSL Encryption will not work. This post seeks to list out the steps needed to implement SSL Encryption successfully in such a scenario.

The Error

Do we love them or what? If you try to implement SSL encryption (either client side or Server side) using the standard procedure, the attempt to connect to SQL Server will generate the following error:-

[System.Data.SqlClient.SqlException]        
{"A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: SSL Provider, error: 0 - The certificate's CN name does not match the passed value.)"}        

The reason we get this error is that the client submits a connection request to, say, 'Server X' (which doesn't exist as a machine), while the actual SQL Server name is, say, 'Server Y'. Normal connectivity works fine, as the DNS forwarding alias exists on the network (the DNS server uses something like a mapping table to look up the actual server name for each request, and then redirects the connection attempt to the concerned server). With Force Protocol Encryption enabled, however, the connection request fails, because the certificate has been issued to the actual server name, which is different from the name specified in the connection request.

So how do we fix it?

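There are basically three things that need to be done for implementing SSL Encryption in such an environment: set up the DNS records (steps 1 and 2), get a certificate that carries the "Subject Alternative Name" field (steps 3 to 6), and install that certificate (step 7):-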

1) Create DNS CNAME record(s) which map the alias(es) to the "actual" name of SQL server: - Please use the KB article titled “Creating a DNS Alias Record” available at http://support.microsoft.com/kb/168322

Basically, this relates to the DNS forwarding part of the problem. If we create A records instead of CNAME records for the aliases, we get this error:-

[Microsoft] [ODBC SQL Server Driver] [TCP/IP Sockets] SSL Security error

2) Create a single DNS A record for that "actual" name of the SQL Server.

3) Create an SSL certificate with the "Subject Alternative Name" field: - In such a scenario, the certificate should have the "SUBJECT ALTERNATIVE NAME" field enabled, and this should contain the actual name or FQDN of the SQL Server ('Server Y' in the example above) as well as all the aliases ('Server X' in the example above).

The CN of the "SUBJECT " field should contain the "Actual" name of the SQL server.

4) To enable the Subject Alternative Name field, run the following commands on the CA (Certification Authority) server: -

certutil -setreg policy\EditFlags +EDITF_ATTRIBUTESUBJECTALTNAME2
net stop certsvc
net start certsvc

This command will add a registry entry to enable the "SUBJECT ALTERNATIVE NAME" field on the certificates.

If you are using a third party certificate, request your vendor to issue you a certificate with the "SUBJECT ALTERNATIVE NAME" field enabled. The contents the "SUBJECT ALTERNATIVE NAME" field should have are outlined in Step 6.

5) Submit a new certificate request to the CA (Certification Authority) to get the new certificate issued with both "SUBJECT" and "SUBJECT ALTERNATIVE NAME".

6) The CN name of the "SUBJECT" should have the "actual" name of the SQL Server. The "SUBJECT ALTERNATIVE NAME" field on the certificate should have the actual SQL Server name, as well as all the CNAME labels (aliases).

7) Install the newly issued certificate on the SQL Server or the client depending on whether you are trying to implement Server side or client side encryption.
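
To see why the alias has to be on the certificate, here is a deliberately simplified Python model of the name check. It is a sketch under assumptions: the real validation done by the client's SSL provider is more involved, wildcards are ignored, and the function name and host names below are hypothetical.

```python
from typing import Iterable


def name_is_covered(requested: str, subject_cn: str, alt_names: Iterable[str]) -> bool:
    """Return True if the name the client asked for appears on the certificate.

    Host names are compared case-insensitively. Wildcards are not handled.
    """
    wanted = requested.strip().lower()
    names_on_cert = {subject_cn.strip().lower()}
    names_on_cert.update(name.strip().lower() for name in alt_names)
    return wanted in names_on_cert


if __name__ == "__main__":
    # Hypothetical names: 'servery' is the actual server, 'serverx' the alias.
    actual = "servery.corp.example"
    alias = "serverx.corp.example"

    # Certificate issued to the actual name only: the alias is not covered.
    print(name_is_covered(alias, actual, []))

    # Certificate with both names in Subject Alternative Name: covered.
    print(name_is_covered(alias, actual, [actual, alias]))
```

The first call models the failing setup from "The Error" section, the second the certificate described in steps 3 to 6.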

Please feel free to post your questions in the comments section!

An interesting find about Temp tables in SQL Server

I ran into a very interesting issue with temp tables recently. The issue description goes something like this. We have an application preparing some SQL statements, and sending them to the SQL Database engine for execution. However, the “issue” is easily reproducible in SQL Server Management studio. The first batch of statements looks something like this:-

--Execution Set 1 starts
if exists(select * from tempdb..sysobjects where id=OBJECT_ID('tempdb..#Tmp') and xtype='U') drop table #Tmp
Select
        '1' as A,
        '2' as B
Into #Tmp
Select
        a,b
from #Tmp
--Execution Set 1 ends here

Next, we prepare and send the following set of statements, from the same session:-

--Execution set 2 starts
if exists(select * from tempdb..sysobjects where id=OBJECT_ID('tempdb..#Tmp') and xtype='U')
drop table #Tmp
Select
        '3' as A,
        '4' as B,
        'The Troublemaker' as C
Into #Tmp
Select
        a,b,c
from #Tmp
--Execution Set 2 ends here

Upon execution, the second batch generates an error:-

Msg 207, Level 16, State 1, Line 11

Invalid column name 'c'.

It does seem that SQL Server is caching the temp table definition: when the second batch of statements goes in (remember, it is compiled as a complete batch), #Tmp still exists from the first batch, so the “select from” statement is compiled against the existing temp table definition, which has no column c, and thus fails. However, if we use a “Select * from” instead of “Select a,b,c” in the second batch, we get the desired results. This is because by the time it actually gets to the execution of the “select from”, the table definition has been changed, and it picks up the new definition.

I also found that adding a “go” statement after the “If exists…then drop table #Tmp” statement resolves the issue in Management Studio. This is again expected, as go acts as a batch separator: the table (as well as its cached definition) has already been dropped by the time the second batch (containing the select into) is parsed, so it is able to create a new table using the statement.

Adding a “go” statement after the “select into” statement also resolves the issue. This makes sense too: the select into statement goes as one batch and the select from as another, so the table definition in the cache has already been updated (the select into was compiled and executed before the select from came along).

However, in my case, since the customer was using an application, using go was not possible (since go is not a T-SQL command, as documented here). Upon doing some detailed research, I found the following excerpt from the Books Online for “Create Table” (available here) to be relevant to the situation at hand (though the scenario is not the exact same one):-

“A local temporary table created within a stored procedure or trigger can have the same name as a temporary table that was created before the stored procedure or trigger is called. However, if a query references a temporary table and two temporary tables with the same name exist at that time, it is not defined which table the query is resolved against. Nested stored procedures can also create temporary tables with the same name as a temporary table that was created by the stored procedure that called it. However, for modifications to resolve to the table that was created in the nested procedure, the table must have the same structure, with the same column names, as the table created in the calling procedure.”

So, if you ever face this issue, here are the possible workarounds:-

Connecting from SSMS (or sqlcmd or osql)

When connecting from SSMS, the simplest workaround is to insert a “go” statement in the second batch (after either of the first 2 statements), thereby breaking the batch into two batches. We can also try using a different temp table name, thereby eliminating the issue completely.

Connecting from Application

When connecting from an application, we can have the following workarounds:-

· Use a different temp table name

· Drop the temp table at the end of the first batch itself, rather than at the start of the second batch (which is what my customer used)

· Split the second batch into two batches, placing the “If exists…then drop table #tmp” statement and the other two “Select” statements in 2 separate batches.
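
Since go is only understood by client tools, an application has to do the batch splitting itself. The last workaround can be sketched in Python as below. This is a minimal sketch with a made-up function name: it splits only on lines that contain nothing but GO, and does not handle "GO <count>" or a GO that happens to sit inside a comment or string.

```python
import re
from typing import List

# A line that contains only GO (any case), with optional surrounding spaces.
_GO_LINE = re.compile(r"^\s*GO\s*$", re.IGNORECASE)


def split_batches(script: str) -> List[str]:
    """Split a script into batches at GO lines, dropping empty batches."""
    batches, current = [], []
    for line in script.splitlines():
        if _GO_LINE.match(line):
            batch = "\n".join(current).strip()
            if batch:
                batches.append(batch)
            current = []
        else:
            current.append(line)
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches


if __name__ == "__main__":
    script = """\
if exists(select * from tempdb..sysobjects where id=OBJECT_ID('tempdb..#Tmp') and xtype='U') drop table #Tmp
GO
Select '3' as A, '4' as B, 'The Troublemaker' as C Into #Tmp
Select a,b,c from #Tmp
"""
    for number, batch in enumerate(split_batches(script), start=1):
        print(f"-- batch {number}")
        print(batch)
```

Each returned batch would then be sent in its own execute call on the same connection, so the drop is compiled and run before the batch that recreates and reads #Tmp.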

Hope this helps someone. Please do let me know if you have any questions/doubts/comments related to the issue, or if you know of a different workaround for it.

SQL 2008/R2 setup fails with "Wait on the database engine recovery handle failed"

When installing SQL Server 2008/2008 R2, you might come across a situation where the setup fails towards the end, when trying to start the SQL Server services.

You find this message in the summary.txt:-

Configuration error code: 0x4BDAF9BA@1306@24
Configuration error description: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
Configuration log: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20110831_132727\Detail.txt

In the detail.txt, you find these messages:-

2011-08-31 13:49:57 Slp: at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.Execute(String actionId, TextWriter errorStream)
2011-08-31 13:49:57 Slp: Exception: Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineConfigException.
2011-08-31 13:49:57 Slp: Source: Microsoft.SqlServer.Configuration.SqlServer_ConfigExtension.
2011-08-31 13:49:57 Slp: Message: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes..
2011-08-31 13:49:57 Slp: Watson Bucket 1

Also, since the services are created, the errorlog is also updated. You will find these messages in the errorlog:-

2011-08-31 13:49:57.25 spid7s Starting up database 'mssqlsystemresource'.
2011-08-31 13:49:57.35 spid7s The resource database build version is 10.50.1600. This is an informational message only. No user action is required.
2011-08-31 13:49:57.49 spid7s Error: 15209, Severity: 16, State: 1.
2011-08-31 13:49:57.49 spid7s An error occurred during encryption.
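
If the errorlog is long, a few lines of Python can pull out the error numbers for you. This is only a sketch: the function and class names are hypothetical, and it assumes the layout seen in the excerpt above, where an "Error: N, Severity: N, State: N" line is followed by a line carrying the message text.

```python
import re
from typing import Iterable, List, NamedTuple


class LoggedError(NamedTuple):
    number: int
    severity: int
    state: int
    message: str


_ERROR_LINE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)")


def extract_errors(lines: Iterable[str]) -> List[LoggedError]:
    """Pair each "Error: N, Severity: N, State: N" line with the line after it."""
    rows = [line.rstrip("\n") for line in lines]
    found = []
    for index, row in enumerate(rows):
        match = _ERROR_LINE.search(row)
        if not match:
            continue
        following = rows[index + 1] if index + 1 < len(rows) else ""
        # Drop the leading date, time and spid fields from the message line.
        parts = following.split(None, 3)
        message = parts[3] if len(parts) == 4 else following
        number, severity, state = (int(group) for group in match.groups())
        found.append(LoggedError(number, severity, state, message))
    return found


if __name__ == "__main__":
    sample = [
        "2011-08-31 13:49:57.25 spid7s Starting up database 'mssqlsystemresource'.",
        "2011-08-31 13:49:57.49 spid7s Error: 15209, Severity: 16, State: 1.",
        "2011-08-31 13:49:57.49 spid7s An error occurred during encryption.",
    ]
    for error in extract_errors(sample):
        print(error)
```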

The service will not come online if you try to start it from configuration manager or services console.

The root cause of this issue, in most cases, is that the profile of the user being used for the service account (in my case it was local system) is corrupted.

To resolve it, follow these steps:-

When the installation throws this error, click on OK and allow it to proceed. It will fail for Database Engine, but the SQL Server service should have been created. Check the Services console.

If the service is present, perform the following steps:-

1. Go to SQL Server Configuration manager, right click on the SQL Server service, and change the service account (if it is local system, give it a windows level account, and vice-versa). It might throw a WMI error but you will see the account getting updated anyway. If not, then use the Services console. Change the account for SQL Agent as well.

2. Next, try to start the service. It should come online.

3. However, you will not be able to log in to the SQL Server.

4. Now stop the service and start it from the command prompt using the -m -c -T3608 parameters.

5. Now try logging in to the server using the admin connection from sqlcmd (sqlcmd -S admin:<server name>\<instance name> ...)

6. Once logged in, run sp_addsrvrolemember '<domain\username>','sysadmin'.

Also add BUILTIN\Administrators to the sysadmin role.

7. Now stop the service from the command prompt and start it from SQL Server Configuration Manager.

You should be able to log in to the server now.

Hope this helps someone.

SQL Server Patch installation fails with error 1603

Ran into this interesting issue recently. I was trying to install a patch on SQL, and it failed. I searched in the hotfix.log (since this was a SQL 2005 instance), and found these messages:-

02/03/2012 03:01:03.649 Installing file: sqlrun_sql.msp
02/03/2012 03:01:03.696 Copy Engine: Creating MSP install log file at: C:\Program Files (x86)\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2494113_sqlrun_sql.msp.log
02/03/2012 03:01:03.696 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
02/03/2012 03:01:03.696 Registry: Cannot read registry key value "Debug", error 0
02/03/2012 03:01:04.226 MSP returned 1603: A fatal error occurred during installation.

From here, you can determine which component failed (in this case it was sqlrun_sql).
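
When the hotfix.log is large, the same lookup can be sketched in Python. Treat this as an illustration only: the function name is made up, and it assumes every component logs an "Installing file:" line followed later by an "MSP returned N" line shaped like the ones above. Return codes 0 and 3010 (restart required) are treated as success.

```python
import re
from typing import Iterable, List, Tuple

_INSTALLING = re.compile(r"Installing file: (\S+)")
_RETURNED = re.compile(r"MSP returned (\d+)")

# Windows Installer return codes that do not indicate a failure.
SUCCESS_CODES = {0, 3010}


def failed_components(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (msp_file, return_code) for each MSP that reported a failure."""
    current = None
    failures = []
    for line in lines:
        started = _INSTALLING.search(line)
        if started:
            current = started.group(1)
            continue
        returned = _RETURNED.search(line)
        if returned and current is not None:
            code = int(returned.group(1))
            if code not in SUCCESS_CODES:
                failures.append((current, code))
            current = None
    return failures


if __name__ == "__main__":
    sample = [
        "02/03/2012 03:01:03.649 Installing file: sqlrun_sql.msp",
        '02/03/2012 03:01:03.696 Registry: Cannot read registry key value "Debug", error 0',
        "02/03/2012 03:01:04.226 MSP returned 1603: A fatal error occurred during installation.",
    ]
    print(failed_components(sample))
```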

In case of SQL 2008 and R2, the summary.txt will point you to the relevant component's log file:-

Log with failure: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20111228_093034\SQL8\sql_fulltext_Cpu64_1.log

Upon opening the log of the particular component, you will see something like this:-
MSI (s) (F0:84) [09:38:53:925]: Opening existing patch 'C:\Windows\Installer\278e8021.msp'.
MSI (s) (F0:84) [09:38:53:925]: SequencePatches starts. Product code: {F233AB42-A61A-4FAB-82A5-A0EBEE94176E}, Product version: 10.50.1600.1, Upgrade code: {5A5E4CCE-8348-4EF8-8FA1-2B5524329CF5}, Product language 1033
MSI (s) (F0:84) [09:38:53:926]: PATCH SEQUENCER: verifying the applicability of minor upgrade patch p:\c5316177906c8cf6c8a3248b50cc\x64\setup\sql_fulltext.msp against product code: {F233AB42-A61A-4FAB-82A5-A0EBEE94176E}, product version: 10.50.1600.1, product language 1033 and upgrade code: {5A5E4CCE-8348-4EF8-8FA1-2B5524329CF5}
MSI (s) (F0:84) [09:38:53:926]: Note: 1: 2262 2: _Tables 3: -2147287038
MSI (s) (F0:84) [09:38:53:926]: Note: 1: 2262 2: _Columns 3: -2147287038
MSI (s) (F0:84) [09:38:53:926]: PATCH SEQUENCER: minor upgrade patch p:\c5316177906c8cf6c8a3248b50cc\x64\setup\sql_fulltext.msp is applicable.
MSI (s) (F0:84) [09:38:53:926]: SequencePatches returns success.
MSI (s) (F0:84) [09:38:53:926]: Final Patch Application Order:
MSI (s) (F0:84) [09:38:53:926]: {2E7EB973-48D5-43CA-9360-CA011FAE81EE} - p:\c5316177906c8cf6c8a3248b50cc\x64\setup\sql_fulltext.msp
MSI (s) (F0:84) [09:38:53:926]: Other Patches:
MSI (s) (F0:84) [09:38:53:926]:
Internal Exception during install operation: 0xc0000005 at 0x000007FEFDE93821.
MSI (s) (F0:84) [09:38:53:926]: WER report disabled for silent install.
MSI (s) (F0:84) [09:38:53:926]: WER report disabled for non-console install.
MSI (s) (F0:84) [09:38:53:926]: Internal MSI error. Installer terminated prematurely.

I noticed that as soon as it determines the final patch application order and decides to proceed with the install, it hits an access violation (AV, the 0xc0000005 in the log) and the Windows Installer terminates.

Though it may seem otherwise, this is not a problem with the Windows Installer itself, but a problem with the msp file in question. The file is corrupted in some way, and therefore causes the installer to fail.

I was also able to reproduce the issue on my machine. How? Simple: run the FindSQLInstallsOnly script (available here), pick the msp for a component, go to the installer cache and corrupt it (something as simple as opening it in Notepad, modifying it and saving the changes), and then run a patch which includes a fix for the component in question. You should see a stack similar to the one above in the setup logs.

So what’s the resolution? You guessed it. There are multiple ways to get around this issue. You can:-

1. Go to the installer cache (C:\windows\Installer) and delete/rename the offending file, and then recreate a "good" copy of the same (recommended). The steps for this can be found in the KB:-

http://support.microsoft.com/kb/969052

2. Go to the registry and remove all references to the installer cache file, as well as its last used source path (simply search for the installer cache file name, which you can get from the output of the FindSQLInstallsOnly script). This will allow the file to be created afresh when the patch runs for the next time.

Hope this helps someone. Any questions/feedback, please feel free to use the comments section. Thanks.

Windows Server 2003 - KB 2463332 for Windows Internal Database fails to install

This is an interesting issue that I ran into, and it took some time to find out the steps for resolution. What was happening was that the customer was receiving repeated prompts to install KB 2463332 for Windows Internal Database. It could, of course, have been any other update for the Windows Internal Database as well.

Now, what sets this scenario apart from normal troubleshooting ones is the absence of our sweet old SQL setup logs. So, to troubleshoot the issue, here’s what you need to do:-

1. Download the update from
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=34D4CE5C-23D1-47D2-B9D2-AAB32DB41B19

2. Extract the exe using /x (i.e. run the exe from the command prompt, with the /x switch, which will give you a prompt for the location you want to extract it to).

3. Run the following command from the command prompt:-
msiexec /i SSEE_10.msi CALLERID=OCSetup.exe REINSTALL=ALL REINSTALLMODE=vomus /qn REBOOT=ReallySuppress /l*v wsee.log

4. Check the wsee.log. You may find something like this (or even a different error, but at least now you have one to proceed on):-
GetServiceUserGroup failed for MICROSOFT##SSEE, 5
Error Code: 0x80070534 (1332)
Windows Error Text: No mapping between account names and security IDs was done.
Source File Name: sqlca\sqlcax.cpp
Compiler Timestamp: Thu Dec 9 14:16:30 2010
Function Name: SetInstanceProperty
Source Line Number: 1224

Error Code: 1332
MSI (s) (A0!E8) [13:14:54:064]: Product: Windows Internal Database -- Error 29528. The setup has encountered an unexpected error while Setting Internal Properties. The error is: Fatal error during installation.

Error 29528. The setup has encountered an unexpected error while Setting Internal Properties. The error is: Fatal error during installation.

5. Go to HKLM\software\microsoft\microsoft sql server\mssql.2005\setup and clear the contents of the FTSGroup and SQLGroup keys. (This is obviously based on the exact error found in step 4.)

6. Re-run installation using the same command.

7. You may find something like this in the log:-
MSI (s) (A0:F8) [13:24:48:076]: Skipping action: UpgradeRestoreServiceStatus.D20239D7_E87C_40C9_9837_E70B8D4882C2 (condition is false)
MSI (s) (A0:F8) [13:24:48:076]: Doing action: RemoveExistingProducts
Action ended 13:24:48: SetProductNameInstance. Return value 1.
MSI (s) (A0:F8) [13:24:48:076]: Skipping RemoveExistingProducts action: current configuration is maintenance mode or an uninstall
Action start 13:24:48: RemoveExistingProducts.
MSI (s) (A0:F8) [13:24:48:076]: Skipping action: RemoveSqlProducts.D20239D7_E87C_40C9_9837_E70B8D4882C2 (condition is false)
MSI (s) (A0:F8) [13:24:48:076]: Doing action: ChangeServiceConfig.D20239D7_E87C_40C9_9837_E70B8D4882C2
Action ended 13:24:48: RemoveExistingProducts. Return value 0.
MSI (s) (A0:04) [13:24:48:092]: Invoking remote custom action. DLL: C:\WINDOWS\Installer\MSID735.tmp, Entrypoint: ChangeServiceConfig
Action start 13:24:48: ChangeServiceConfig.D20239D7_E87C_40C9_9837_E70B8D4882C2.

Function=ChangeServiceConfig

Doing Action: ChangeServiceConfig
PerfTime Start: ChangeServiceConfig : Mon Feb 21 13:24:48 2011

Service name: MSSQL$MICROSOFT##SSEE
Startup type = 0
Status = 3

8. Run install again. You may find something like this in the logs:-
Property(S): SqlUpgradeMessage = Service 'MSSQL$Microsoft##SSEE' could not be started. Verify that you have sufficient privileges to start system services. The error code is (3417)
Property(S): UpgradeInstruction = Start service MSSQL$Microsoft##SSEE with parameters -m SqlSetup -T4022 -T4010
Connect to SQL instance RD-SERVER2\MICROSOFT##SSEE as sysadmin
Launch SQL statement USE master
Launch script file C:\WINDOWS\SYSMSI\SSEE\MSSQL.2005\MSSQL\Install\sysdbupg.sql
Launch script file C:\WINDOWS\SYSMSI\SSEE\MSSQL.2005\MSSQL\Install\DbEngine_hotfix_install.sql
Launch script file C:\WINDOWS\SYSMSI\SSEE\MSSQL.2005\MSSQL\Install\systemdbsig.sql
Stop service MSSQL$Microsoft##SSEE
MSI (s) (A0:EC) [13:32:02:359]: Note: 1: 1729
MSI (s) (A0:EC) [13:32:02:359]: Product: Windows Internal Database -- Configuration failed.

MSI (s) (A0:EC) [13:32:02:359]: Cleaning up uninstalled install packages, if any exist
MSI (s) (A0:EC) [13:32:02:359]: MainEngineThread is returning 1603
MSI (s) (A0:B8) [13:32:02:468]: Destroying RemoteAPI object.
MSI (s) (A0:74) [13:32:02:468]: Custom Action Manager thread ending.
=== Logging stopped: 2/21/2011 13:32:02 ===
MSI (c) (B4:B0) [13:32:02:468]: Decrementing counter to disable shutdown. If counter >= 0, shutdown will be denied. Counter after decrement: -1
MSI (c) (B4:B0) [13:32:02:468]: MainEngineThread is returning 1603
=== Verbose logging stopped: 2/21/2011 13:32:02 ===

9. Check the event logs. You may find something like this:-
Event Type: Error
Event Source: MSSQL$MICROSOFT##SSEE
Event Category: (2)
Event ID: 17207
Date: 2/21/2011
Time: 1:34:50 PM
User: N/A
Computer: RD-SERVER2
Description:
FCB::Open: Operating system error 5(Access is denied.) occurred while creating or opening file 'C:\WINDOWS\SYSMSI\SSEE\MSSQL.2005\MSSQL\DATA\master.mdf'. Diagnose and correct the operating system error, and retry the operation.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 37 43 00 00 10 00 00 00 7C......
0008: 1b 00 00 00 52 00 44 00 ....R.D.
0010: 2d 00 53 00 45 00 52 00 -.S.E.R.
0018: 56 00 45 00 52 00 32 00 V.E.R.2.
0020: 5c 00 4d 00 49 00 43 00 \.M.I.C.
0028: 52 00 4f 00 53 00 4f 00 R.O.S.O.
0030: 46 00 54 00 23 00 23 00 F.T.#.#.
0038: 53 00 53 00 45 00 45 00 S.S.E.E.
0040: 00 00 00 00 00 00 ......

10. Go to C:\WINDOWS\SYSMSI\SSEE\MSSQL.2005\MSSQL\DATA\ and give full control to Network Service (the service account of the Windows Internal Database service) on the Data folder.

11. Run install again. You should find "Configuration completed" in the logs this time.

The service should come online after that.

If you find an "Access denied" error message related to default traces in the event logs, go to C:\WINDOWS\SYSMSI\SSEE\MSSQL.2005\MSSQL\Log and give full control to Network Service (the service account of the windows internal database service) on the Log folder.
After this, reboot the box, and try again.

Hope this helps.

An interesting “Issue” with adding Windows Logins in SQL Server

Now here’s one that had me stumped for quite some time. A brief description of the issue:-

1.     I have 2 instances of SQL Server set up on different servers, both with a Case sensitive collation

2.     At least one of the instances is installed on a cluster.

3.     The Windows version for all of them is Windows Server 2008 R2, with the default collation settings (case insensitive).

When I try to add a Windows login in one SQL environment, I am able to add it in either case (upper or lower). But in the other environment, I am unable to add it in both cases; I am only able to add it in one particular case. When trying to add it in the other case, I get this error:-
   

Msg 15401, Level 16, State 1, Line 1
Windows NT user or group 'HARSH2K8\HARSH' not found. Check the name again.

So the question arises, why?

After spending quite some time researching this “weird” issue, I found some interesting stuff, which I will try to explain here without going into too much technical detail:-

    • When we add a windows login in SQL, the basic steps it performs internally go something like this:-

o   Take the name of the login, try to retrieve the SID. If unsuccessful, raise error.

o   Take the SID retrieved in the previous step, and try to fetch the login name for that SID. If unsuccessful, raise error.

o   Take the newly retrieved name, and compare it with the name passed originally (to be added as windows login). If both do not match, raise error.

    • As you probably guessed already, to perform all these steps, SQL uses Windows API calls. These are the same calls which are used when you try to add a Login to the security permissions on a File/folder (where we use the “Check Names” button).
    • The issue arises because the windows API used to retrieve the login name for a SID can return the login in any case (not necessarily the same case in which the Login is defined in Active Directory). And when the name is returned, the SQL code does a “simple” i.e. case sensitive comparison. This is where the mismatch occurs and we see the error mentioned above.
    • There is also some “caching” of the name for a SID at the machine level.
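
The three internal steps above can be mimicked in a few lines of Python. This is a toy model, not SQL Server's code: the two lookup functions are injected stand-ins for the Windows API calls, the exception class stands in for error 15401, and the SID and cached casing are made up.

```python
from typing import Callable


class LoginNotFound(Exception):
    """Stands in for error 15401 in this sketch."""


def validate_windows_login(
    requested: str,
    name_to_sid: Callable[[str], str],
    sid_to_name: Callable[[str], str],
) -> str:
    """Run the three checks described above and return the SID on success."""
    sid = name_to_sid(requested)   # step 1: name -> SID (lookup raises if unknown)
    returned = sid_to_name(sid)    # step 2: SID -> name (lookup raises if unknown)
    if returned != requested:      # step 3: case-sensitive comparison
        raise LoginNotFound(f"Windows NT user or group '{requested}' not found.")
    return sid


if __name__ == "__main__":
    # Fake directory: the name lookup ignores case, but the reverse lookup
    # hands back whatever casing happens to be cached on this machine.
    sids_by_name = {"harsh2k8\\harsh": "S-1-5-21-0-0-0-1001"}
    cached_names = {"S-1-5-21-0-0-0-1001": "HARSH2K8\\harsh"}

    def fake_name_to_sid(name: str) -> str:
        return sids_by_name[name.lower()]

    def fake_sid_to_name(sid: str) -> str:
        return cached_names[sid]

    for attempt in ("HARSH2K8\\harsh", "HARSH2K8\\HARSH"):
        try:
            validate_windows_login(attempt, fake_name_to_sid, fake_sid_to_name)
            print(attempt, "-> added")
        except LoginNotFound as error:
            print(attempt, "->", error)
```

Both attempts resolve to the same SID, yet only the one whose casing matches what the reverse lookup returns gets past step 3.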

Though the jury is still out on whether the issue is with SQL Server for not doing the comparison in a case-insensitive manner, or with the Windows API for not returning the login in the same case in which it exists in AD, the product group figured the best way was to fix it themselves. So they modified the code for SQL 2012, and the change shipped in the RTM version of SQL 2012. However, due to certain restrictions, we were not able to backport it to the earlier versions of SQL. There is also a pretty obvious workaround: add the login in the other case. The only time you’re likely to run into an issue with that is if you try to add one of the two servers as a linked server in the other one, in which case authentication will fail since the logins are in different cases on the two servers. In such a scenario, try rebooting the box, and then try to add the login again. If that doesn’t work, then the best way out is to use a different login altogether, one which you’re able to add in the same case on both servers.

Not a very common or helpful post, I know, but an interesting one nonetheless. What say?

VSS backups might cause SQL to generate Non-Yielding Scheduler dumps if Backup verification is turned on

Found an interesting Non-Yielding scheduler recently. Opened the dump, and found function calls related to backup verification (such as validating the file name, verifying that the drive is part of the cluster group, etc.) at the top of the stack:

Child-SP          RetAddr           Call Site
00000000`29cda478 000007fe`fe21a776 ntdll!ZwAlpcSendWaitReceivePort
00000000`29cda480 000007fe`fe2bcc74 rpcrt4!LRPC_CCALL::SendReceive
00000000`29cda540 000007fe`fe2bcf25 rpcrt4!NdrpClientCall3
00000000`29cda800 000007fe`f8902196 rpcrt4!NdrClientCall3
00000000`29cdab90 000007fe`f89023bb clusapi!ConnectCluster
00000000`29cdac10 00000000`01e83b60 clusapi!OpenClusterImpl
00000000`29cdac80 00000000`01e903f3 sqlservr!FClusMgr::VerifyDriveInClusterGroup
00000000`29cdbd00 00000000`024d258f sqlservr!FileMgr::ValidateFileName
00000000`29cdc5e0 00000000`02dbf13f sqlservr!BackupFileList::GenerateVolumeUsageList
00000000`29cdc610 00000000`02dc03e0 sqlservr!BackupOperation::VerifyBackupSet
00000000`29cddb20 00000000`02dc3d2d sqlservr!BackupEntry::VerifyBackupSet
00000000`29cddcf0 00000000`022224d3 sqlservr!CStmtLoadVol::XretExecute
00000000`29cdde30 00000000`02225f9c sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn
00000000`29cdde60 00000000`015bdad0 sqlservr!CMsqlExecContext::ExecuteStmts<1,0>
00000000`29cdee10 00000000`00724479 sqlservr!CMsqlExecContext::FExecute

I confirmed that SQL was indeed installed on a cluster. I also found these messages in the event logs, at around the same time as the Non-Yielding dump:-

Information 3/15/2012 1:03:59 AM SQLISPackage100 12289 None "Package ""Backup_TLog_EnterpriseSecurity"" finished successfully."
Information 3/15/2012 1:03:58 AM SQLISPackage100 12289 None "Package ""Backup_TLog_Caching"" finished successfully."
Information 3/15/2012 1:03:56 AM MSSQL$REPL 1440 Server Database mirroring is active with database 'AmsAuditing' as the principal copy. This is an informational message only. No user action is required.
Information 3/15/2012 1:03:55 AM MSSQL$REPL 1442 Server Database mirroring is inactive for database 'AmsAuditing'. This is an informational message only. No user action is required.

Warning 3/15/2012 1:03:55 AM VSS 4003 None Volume Shadow Copy Service warning: Writer received a Freeze event more than two minutes ago. The writer is still waiting for either an Abort or a Thaw event.
Operation:
   Gathering Writer Data
Context:
   Writer Class Id: {41e12264-35d8-479b-8e5c-9b23d1dad37e}
   Writer Name: Cluster Database
   Writer Instance ID: {db6da9b0-343e-43e2-ab1d-78501c1c1d32}

Warning 3/15/2012 1:03:55 AM VSS 4003 None Volume Shadow Copy Service warning: Writer received a Freeze event more than two minutes ago. The writer is still waiting for either an Abort or a Thaw event.
Operation:
   Gathering Writer Data
Context:
   Writer Class Id: {4dc3bdd4-ab48-4d07-adb0-3bee2926fd7f}
   Writer Name: Shadow Copy Optimization Writer
   Writer Instance ID: {58c69918-2ebe-4bf2-8035-4cdc660ff0c2}

Information 3/15/2012 1:03:23 AM MSSQL$REPL 17883 Server Process 0:0:0 (0x1754) Worker 0x00000016BF0BA1A0 appears to be non-yielding on Scheduler 29. Thread creation time: 12974041676579. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 8%%. System Idle 91%%. Interval: 70095 ms.

Information 3/15/2012 1:03:01 AM MSSQL$REPL 18265 Backup Log was backed up. Database: EnterpriseSecurity, creation date(time): 2008/03/15(12:30:30), first LSN: 11537:1227:1, last LSN: 11537:1229:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'N:\EnterpriseSecurity_backup_2012_03_15_010301_4748794.trn'}). This is an informational message only. No user action is required.
Information 3/15/2012 1:03:01 AM SQLISPackage100 12288 None "Package ""Backup_TLog_EnterpriseSecurity"" started."
Information 3/15/2012 1:02:04 AM MSSQL$REPL 18265 Backup Log was backed up. Database: Caching, creation date(time): 2008/03/15(12:30:10), first LSN: 31246:4891:1, last LSN: 31252:2022:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'N:\Caching_backup_2012_03_15_010201_5628888.trn'}). This is an informational message only. No user action is required.
Information 3/15/2012 1:02:00 AM SQLISPackage100 12288 None "Package ""Backup_TLog_Caching"" started."

So if you read the message stack from the bottom up, you see the “Backup_TLog_Caching” and “Backup_TLog_EnterpriseSecurity” packages starting, then the “Backup Log” messages, then the non-yielding scheduler message (17883), followed by the VSS warnings, and finally both the “Backup_TLog..” packages completing successfully.

After discussing with my (did I say genius?) TL, we found that this is what is happening:-

  • The customer is running SQL Backups, with the verify backup integrity option enabled.
  • The customer is also running VSS (volume) backups.

How the Non-Yielding situation arises

  • SQL Server runs the backup job, and the backup is successful
  • After that, it needs to verify the backup, which involves several steps such as checking if the disk is part of the cluster group, verify backupset, validate filename, etc. For these operations, SQL needs access to the disk.
  • However, the VSS backup is running at the same time on the target drive, and as a result, I/O on the drive is frozen. So the SQL request is blocked, i.e. made to wait.
  • While waiting, the SQL request hits the time threshold for a non-yielding scheduler dump i.e. it’s made to wait for long enough to trigger the non-yielding scheduler condition. This is why we see the non-yielding dump.

So the obvious workaround/solution to this situation would be to either change the schedule of the VSS backups so that they do not overlap with the SQL backups, or (much simpler) remove the verify backup integrity option from the SQL backups (which would eliminate the need to call those cluster APIs, and hence not cause the SQL Server thread to be blocked).
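
As a rough sketch of the second workaround: the backup itself runs without the verify step, and any verification is run separately at a time that does not overlap with the VSS backup. The file name below is hypothetical, and whether a separate verification step is needed at all is a judgment call for your environment.

```sql
-- Hypothetical file name; the database name and drive are taken from the event log above.
-- The log backup runs on its own, with no verify step attached to it.
BACKUP LOG [Caching]
TO DISK = N'N:\Caching_backup_example.trn'
WITH CHECKSUM;

-- Optional: verify later, outside the VSS backup window.
-- This needs access to the disk, so it should not overlap with the VSS freeze.
RESTORE VERIFYONLY
FROM DISK = N'N:\Caching_backup_example.trn'
WITH CHECKSUM;
```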

An interesting one, what say?


Access Violation dumps and metadata corruption

This is an issue that I had been busy working on these past few days. We were getting AV dumps on the DB, and when I looked at the stack, I found that SQL was calling a function to get the name of a column, given a table name, an index id and a key id. This function call resulted in an exception being generated, which is what caused the AV dump:

Child-SP          RetAddr           Call Site
00000000`249676c8 00000000`76ecc0b0 ntdll!ZwWaitForSingleObject+0xa
00000000`249676d0 00000000`01596369 kernel32!WaitForSingleObjectEx+0x9c
00000000`24967790 00000000`01595d2b sqlservr!CDmpDump::DumpInternal+0x4d9
00000000`24967890 00000000`01f95080 sqlservr!CDmpDump::Dump+0x3b
00000000`249678e0 00000000`0204ebae sqlservr!SQLDumperLibraryInvoke+0x1a0
00000000`24967910 00000000`021968d5 sqlservr!CImageHelper::DoMiniDump+0x3ce
00000000`24967af0 00000000`0219728c sqlservr!ContextDumpNoStackOverflow+0x325
00000000`24968340 00000000`021978ea sqlservr!ContextDump+0x7bc
00000000`24968da0 00000000`01f6db08 sqlservr!stackTraceExceptionFilter+0x24a
00000000`24968df0 00000000`01f820d8 sqlservr!SOS_OS::ExecuteDumpExceptionHandlerRoutine+0x28
00000000`24969080 00000000`0267fda5 sqlservr!GenerateExceptionDump+0x48
00000000`249690b0 00000000`0267fe4c sqlservr!ex_trans_cexcept+0x45
00000000`249690f0 00000000`74f6acf0   sqlservr!SOS_SEHTranslator+0x4c
00000000`24969120 00000000`74f69e0b msvcr80!_CallSETranslator+0x40
00000000`24969190 00000000`74f6a62b msvcr80!FindHandlerForForeignException+0x9b
00000000`24969230 00000000`74f6a86b msvcr80!FindHandler+0x63b
00000000`249697e0 00000000`74f6abe7 msvcr80!__InternalCxxFrameHandler+0x1fb
00000000`24969830 00000000`00d10cf3 msvcr80!__CxxFrameHandler+0x77
00000000`24969880 00000000`770058dd sqlservr!__GSHandlerCheck_EH+0x63
00000000`249698b0 00000000`770096d7 ntdll!RtlpExecuteHandlerForException+0xd
00000000`249698e0 00000000`77016e08 ntdll!RtlDispatchException+0x20c
00000000`24969f80 00000000`025ff563 ntdll!KiUserExceptionDispatch+0x2e
00000000`2496a520 00000000`00bc516b sqlservr!WstrIndkeyWstrI4I4+0x323
00000000`2496a670 00000000`00869f6c sqlservr!CQScanNLJoinNew::GetRowHelper+0x119b
00000000`2496abd0 00000000`00c72c2b sqlservr!CQScanSortNew::BuildSortTable+0x18c
00000000`2496ac90 00000000`0086cc15 sqlservr!CQScanTopSortNew::Open+0x47
00000000`2496acc0 00000000`0086cb2e sqlservr!CQueryScan::Startup+0xcd
00000000`2496ad10 00000000`0086bdea sqlservr!CXStmtQuery::SetupQueryScanAndExpression+0x412
00000000`2496ad70 00000000`0087389b sqlservr!CXStmtQuery::ErsqExecuteQuery+0x2f8
00000000`2496dd80 00000000`0086fe6b sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0xcc2
00000000`2496e030 00000000`0086f789 sqlservr!CMsqlExecContext::FExecute+0x58b
00000000`2496e1b0 00000000`0245bcfd sqlservr!CSQLSource::Execute+0x319
00000000`2496e2e0 00000000`02460b34 sqlservr!ExecuteSql+0x72d
00000000`2496ed60 00000000`02e43271 sqlservr!CSpecProc::ExecuteSpecial+0x234
00000000`2496ee80 00000000`00871270 sqlservr!CSpecProc::Execute+0x1f1
00000000`2496eff0 00000000`008cf87a sqlservr!process_request+0x370
00000000`2496f2b0 00000000`0080b29b sqlservr!process_commands+0x1ba
00000000`2496f4b0 00000000`0080af5a sqlservr!SOS_Task::Param::Execute+0x11b
00000000`2496f5d0 00000000`0080ac35 sqlservr!SOS_Scheduler::RunTask+0xca
00000000`2496f660 00000000`00dbc560 sqlservr!SOS_Scheduler::ProcessTasks+0x95
00000000`2496f6d0 00000000`00dbaca0 sqlservr!SchedulerManager::WorkerEntryPoint+0x110
00000000`2496f790 00000000`00dba640 sqlservr!SystemThread::RunWorker+0x60
00000000`2496f7c0 00000000`00dbc6ff sqlservr!SystemThreadDispatcher::ProcessWorker+0x12c
00000000`2496f850 00000000`74f337d7 sqlservr!SchedulerManager::ThreadEntryPoint+0x12f
00000000`2496f8e0 00000000`74f33894 msvcr80!_callthreadstartex+0x17
00000000`2496f910 00000000`76ebbe3d msvcr80!_threadstartex+0x84
00000000`2496f940 00000000`76ff6861 kernel32!BaseThreadInitThunk+0xd
00000000`2496f970 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

From the dump, I was able to extract the dbid and the object id, and when we tried to run update statistics on the table in question, it failed with the error:-

Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.

So definitely there was some metadata corruption here. However, we were able to update statistics on each of the indexes and statistics on the table explicitly without any issues, while update statistics against the table as a whole continued to fail with the same error.

It was then that we stumbled upon another interesting piece of the puzzle. When we ran a “Select * from sysindexes”, the query failed with the same error…!!! Based on this, and some more research, we concluded that there was a statistic against this table (and a system-generated statistic at that) which was present in sysindexes. This makes sense: when updating an individual index or statistic, we would not need to scan the sysindexes compatibility view, but when running update statistics against the whole table, we would need to scan the view based on the object id, which causes the AV. We were able to confirm this by querying only the status, id, name and indid columns of sysindexes for the object id in question. We saw that the statistic was mapped to an index id which was not present in any of the other views, such as sys.stats, sys.stats_columns, sys.sysindexkeys, etc.

We even found a KB explaining the issue, and a fix for it, which would prevent the issue from occurring in the future (see here).

But the question remained, how do we get rid of it now? As you probably guessed, we just need to delete the offending statistic, right? But when trying to run the Drop statistics statement, we were getting the error:-

Msg 3701, Level 11, State 6, Line 1

Cannot drop the statistics 'Fielders._WA_Sys_08000002_1FA46B10', because it does not exist or you do not have permission.

So we connected through the DAC (simple: just prefix the server name with ADMIN:, as in ADMIN:Servername\Instancename, when connecting from SSMS, but remember you have to be a sysadmin for this), enclosed the statistic name in square brackets, and ran the drop statistics command. Would you believe it, it worked like a charm.
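
The checks and the fix described above can be sketched roughly as follows. The dbo schema is an assumption (the error message only gives the table and statistic names), and the comparison against sys.stats is just one way to spot an entry that exists only in sysindexes.

```sql
-- Assumption: the table lives in the dbo schema; adjust to your environment.
DECLARE @object_id int;
SET @object_id = OBJECT_ID(N'dbo.Fielders');

-- Read only the columns that were safe to query in this case (no SELECT *).
SELECT si.id, si.indid, si.name, si.status
FROM sys.sysindexes AS si
WHERE si.id = @object_id;

-- Entries that sysindexes reports for the table but sys.stats does not know about.
-- indid 0 (the heap row) is skipped because it never has a statistics entry.
SELECT si.id, si.indid, si.name, si.status
FROM sys.sysindexes AS si
WHERE si.id = @object_id
  AND si.indid > 0
  AND NOT EXISTS (SELECT 1
                  FROM sys.stats AS s
                  WHERE s.object_id = si.id
                    AND s.stats_id = si.indid);

-- Run this part from a DAC session (ADMIN:Servername\Instancename),
-- with the names enclosed in square brackets.
DROP STATISTICS [Fielders].[_WA_Sys_08000002_1FA46B10];
```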

Hope that the next time you run into a metadata corruption issue like this, you know what to do.

As always, feedback/comments/suggestions are both welcome and solicited.

SQL 2005 Patch on cluster might fail with “No Passive nodes were successfully patched”

This is one of those “rare” setup issues you might run into, on a “Bad luck” day. The patch fails, and in the summary.txt, you see this at the bottom:-

Summary
     No passive nodes were successfully patched
     Exit Code Returned: 11009

Looking at the hotfix.log, you find this:-

03/18/2012 11:57:51.200 MSP Error: 29527  The setup has encountered an unexpected error in datastore. The action is RestoreSetupParams. The error is :  Source File Name: datastore\cachedpropertycollection.cpp
Compiler Timestamp: Tue Sep 21 15:48:22 2010
     Function Name: CachedPropertyCollection::findProperty
Source Line Number: 130
----------------------------------------------------------
Failed to find property "OwningGroup" {"VirtualServerInfo", "", "SQLINST"} in cache
      Source File Name: datastore\clusterinfocollector.cpp
    Compiler Timestamp: Tue Sep 21 15:48:22 2010
         Function Name: ClusterInfoCollector::collectClusterVSInfo
    Source Line Number: 888
    ----------------------------------------------------------
    Failed to detect VS info due to datastore exception.
          Source File Name: datastore\machineconfigscopeproperties.cpp
        Compiler Timestamp: Tue Sep 21 15:48:22 2010
             Function Name: SqlInstallConfigScope.InstanceName
        Source Line Number: 95
        ----------------------------------------------------------
        Error:
03/18/2012 11:57:51.357 Attempting to continue the 32 bit ngen queue
03/18/2012 11:57:51.592 Attempting to continue the 64 bit ngen queue
03/18/2012 11:57:51.639 The patch installation could not proceed due to unexpected errors

Nothing conclusive, right? So now we proceed to the relevant log file, the one whose name ends with sqlrun_sql.msp.log. In there, I found this:-

<Func Name='RestoreSetupParams'>
Failed to find installation media path due to datastore exception
in FindSetupFolder()
MSI (s) (30!C4) [11:57:48:930]: Note: 1: 2203 2: C:\WINDOWS\system32\Setup\SqlRun.msi 3: -2147287038


Loaded DLL:
C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\xmlrw.dll
Version:
2.0.3609.0


Failed to find installation media path due to datastore exception
in FindSetupFolder()


Failed to find installation media path due to datastore exception
in FindSetupFolder()


Failed to find installation media path due to datastore exception
in FindSetupFolder()


Failed to find installation media path due to datastore exception
in FindSetupFolder()


Failed to find installation media path due to datastore exception
in FindSetupFolder()


Failed to find installation media path due to datastore exception
in FindSetupFolder()
MSI (s) (30!C4) [11:57:50:386]: PROPERTY CHANGE: Adding SqlInstanceName property. Its value is 'MSSQLSERVER'.
MSI (s) (30!C4) [11:57:50:386]: PROPERTY CHANGE: Adding INSTANCENAME property. Its value is 'MSSQLSERVER'.


Loaded DLL:
C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\sqlsval.dll
Version:
2005.90.5000.0


MSI (s) (30!C4) [11:57:51:185]: Transforming table Error.

MSI (s) (30!C4) [11:57:51:185]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30!C4) [11:57:51:185]: Transforming table Error.

MSI (s) (30!C4) [11:57:51:185]: Transforming table Error.

MSI (s) (30!C4) [11:57:51:185]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30!C4) [11:57:51:185]: Transforming table Error.

MSI (s) (30!C4) [11:57:51:185]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30!C4) [11:57:51:185]: Transforming table Error.

MSI (s) (30!C4) [11:57:51:185]: Note: 1: 2262 2: Error 3: -2147287038
Error Code: 29527
MSI (s) (30!C4) [11:57:54:034]: Transforming table Error.

MSI (s) (30!C4) [11:57:54:034]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30!C4) [11:57:54:034]: Transforming table Error.

MSI (s) (30!C4) [11:57:54:034]: Transforming table Error.

MSI (s) (30!C4) [11:57:54:034]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30!C4) [11:57:54:034]: Transforming table Error.

MSI (s) (30!C4) [11:57:54:034]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30!C4) [11:57:54:034]: Transforming table Error.

MSI (s) (30!C4) [11:57:54:034]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30!C4) [11:57:54:034]:
MSI (s) (30:B4) [11:57:54:050]: Transforming table InstallExecuteSequence.

MSI (s) (30:B4) [11:57:54:050]: Note: 1: 2262 2: InstallExecuteSequence 3: -2147287038
MSI (s) (30:B4) [11:57:54:065]: Transforming table InstallExecuteSequence.

MSI (s) (30:B4) [11:57:54:065]: Transforming table InstallExecuteSequence.

MSI (s) (30:B4) [11:57:54:065]: Note: 1: 2262 2: InstallExecuteSequence 3: -2147287038
MSI (s) (30:B4) [11:57:54:065]: Transforming table InstallExecuteSequence.

MSI (s) (30:B4) [11:57:54:065]: Note: 1: 2262 2: InstallExecuteSequence 3: -2147287038
MSI (s) (30:B4) [11:57:54:065]: Transforming table InstallExecuteSequence.

MSI (s) (30:B4) [11:57:54:065]: Note: 1: 2262 2: InstallExecuteSequence 3: -2147287038
MSI (s) (30:B4) [11:57:54:065]: Product: Microsoft SQL Server 2005 (64-bit) - Update 'Service Pack 4 for SQL Server Database Services 2005 (64-bit) ENU (KB2463332)' could not be installed. Error code 1603. Additional information is available in the log file C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2463332_sqlrun_sql.msp.log.

…..

MSI (s) (30:B4) [11:57:54:097]: Note: 1: 2262 2: Error 3: -2147287038
MSI (s) (30:B4) [11:57:54:097]: Product: Microsoft SQL Server 2005 (64-bit) -- Configuration failed.

Any guesses as to what might be causing the issue? Nothing conclusive from the logs, right? I agree. So, the root cause here, or at least in the 3-4 cases of this nature that I have worked on, was that there was another group/disk in the cluster which was in a “failed” state. Once we fixed that (either by bringing the group/disk online, or deleting it altogether if it’s not being used), we were able to install the patch successfully. This is because the SQL setup loops through (we call it enumeration) all the disks in all the cluster groups, and not just the group in which SQL is installed.

Hope this helps. Let me know if there are any comments/discrepancies, or any other causes that you find for this issue.

SQL 2005 patch fails with 1642 “Unable to install Windows Installer MSP file”

This one is for all my DBA friends out there. I recently ran into this issue when running a security patch installation for a SQL 2005 instance on SP4. The setup failed, and when I looked into the “C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\Log\Hotfix” folder (this is where the patch setup log files for 2005 are to be found), here’s what I found in the latest summary.txt:-

**********************************************************************************
Product Installation Status
Product : SQL Server Database Services 2005 (MSSQLSERVER)
Product Version (Previous): 5000
Product Version (Final) :
Status : Failure
Log File : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log
Error Number : 1642
Error Description : Unable to install Windows Installer MSP file
----------------------------------------------------------------------------------
Product : SQL Server Tools and Workstation Components 2005
Product Version (Previous): 5000
Product Version (Final) :
Status : Failure
Log File : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQLTools9_Hotfix_KB2494120_sqlrun_tools.msp.log
Error Number : 1642
Error Description : Unable to install Windows Installer MSP file
----------------------------------------------------------------------------------

“Hmmm”, I thought to myself, “here’s one I haven’t seen before”. Since the Log file indicated was in the same folder, I pulled up the “SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log” file, and here’s an extract from the time the error occurred:-

MSI (s) (D8:6C) [07:36:23:597]: File will have security applied from OpCode.
MSI (s) (D8:6C) [07:36:23:644]: Original patch ==> e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp
MSI (s) (D8:6C) [07:36:23:644]: Patch we're running from ==> C:\WINDOWS\Installer\5daea.msp
MSI (s) (D8:6C) [07:36:23:644]: SOFTWARE RESTRICTION POLICY: Verifying patch --> 'e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp' against software restriction policy
MSI (s) (D8:6C) [07:36:23:644]: Note: 1: 2262 2: DigitalSignature 3: -2147287038
MSI (s) (D8:6C) [07:36:23:644]: SOFTWARE RESTRICTION POLICY: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is not digitally signed
MSI (s) (D8:6C) [07:36:23:644]: SOFTWARE RESTRICTION POLICY: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is permitted to run at the 'unrestricted' authorization level.
MSI (s) (D8:6C) [07:36:23:660]: SequencePatches starts. Product code: {130A3BE1-85CC-4135-8EA7-5A724EE6CE2C}, Product version: 9.00.1399.06, Upgrade code: {929C9FEC-8873-4A1A-A209-9AF432E8E1D1}, Product language 1033
MSI (s) (D8:6C) [07:36:23:660]: 3.0 patch e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is of type QFE
MSI (s) (D8:6C) [07:36:23:660]: PATCH SEQUENCER: verifying the applicability of QFE patch e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp against product code: {130A3BE1-85CC-4135-8EA7-5A724EE6CE2C}, product version: 9.00.1399.06, product language 1033 and upgrade code: {929C9FEC-8873-4A1A-A209-9AF432E8E1D1}
MSI (s) (D8:6C) [07:36:23:660]: Validating transform 'Target01ToUpgrade01' with validation bits 0x920
MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 2749 2: Target01ToUpgrade01 3: C:\WINDOWS\Installer\5daea.msp 4: 9.4.5000.00 5: 9.00.1399.06
MSI (s) (D8:6C) [07:36:23:660]: 1: 2749 2: Target01ToUpgrade01 3: C:\WINDOWS\Installer\5daea.msp 4: 9.4.5000.00 5: 9.00.1399.06
MSI (s) (D8:6C) [07:36:23:660]: PATCH SEQUENCER: QFE patch e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp is not applicable.
MSI (s) (D8:6C) [07:36:23:660]: SequencePatches returns success.
MSI (s) (D8:6C) [07:36:23:660]: Final Patch Application Order:
MSI (s) (D8:6C) [07:36:23:660]: Other Patches:
MSI (s) (D8:6C) [07:36:23:660]: Unknown\Absent: {89F18EEE-A409-4B25-915A-0F03651ECF48} - e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp
MSI (s) (D8:6C) [07:36:23:660]: Product: Microsoft SQL Server 2005 - Update '{89F18EEE-A409-4B25-915A-0F03651ECF48}' could not be installed. Error code 1642. Additional information is available in the log file C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log.
MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 1708
MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 2729
MSI (s) (D8:6C) [07:36:23:660]: Note: 1: 2729
MSI (s) (D8:6C) [07:36:23:660]: Product: Microsoft SQL Server 2005 -- Installation failed.

Just for kicks, I also checked out the Hotfix.log (it’s the precursor to the “Detail.txt” in SQL 2008 that we so often use). Here’s an extract from it for reference:-

03/29/2012 07:36:17.986 Copy Engine: Creating MSP install log file at: C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2494120_sqlrun_sql.msp.log
03/29/2012 07:36:17.986 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
03/29/2012 07:36:17.986 Registry: Cannot read registry key value "Debug", error 0
03/29/2012 07:36:23.785 MSP returned 1642: The installer cannot install the upgrade patch because the program being upgraded may be missing or the upgrade patch updates a different version of the program. Verify that the program to be upgraded exists on your computer and that you have the correct upgrade patch.
03/29/2012 07:36:23.785 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
03/29/2012 07:36:23.785 Registry: Cannot read registry key value "Debug", error 997
03/29/2012 07:36:23.801 Copy Engine: Error, unable to install MSP file: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixSQL\Files\sqlrun_sql.msp
03/29/2012 07:36:23.801 The following exception occurred: Unable to install Windows Installer MSP file Date: 03/29/2012 07:36:23.801 File: \depot\sqlvault\stable\setupmainl1\setup\sqlse\sqlsedll\copyengine.cpp Line: 807
03/29/2012 07:36:24.066 Watson: Param1 = Unknown
03/29/2012 07:36:24.066 Watson: Param2 = 0x66a
03/29/2012 07:36:24.066 Watson: Param3 = Unknown
03/29/2012 07:36:24.066 Watson: Param4 = 0x66a
03/29/2012 07:36:24.066 Watson: Param5 = copyengine.cpp@807
03/29/2012 07:36:24.066 Watson: Param6 = Unknown
03/29/2012 07:36:24.066 Watson: Param7 = SQL9
03/29/2012 07:36:24.066 Watson: Param8 = @
03/29/2012 07:36:24.066 Watson: Param9 = x86
03/29/2012 07:36:24.066 Watson: Param10 = 5057
03/29/2012 07:36:24.066 Installed product: SQL9
03/29/2012 07:36:24.066 Installing product: SQLTools9
03/29/2012 07:36:24.285 Registry: Opened registry key "Software\Microsoft\Windows\CurrentVersion\Uninstall"
03/29/2012 07:36:24.301 Installing instance: SQL Tools
03/29/2012 07:36:24.301 Installing target: SPJP063
03/29/2012 07:36:24.301 Installing file: sqlrun_tools.msp
03/29/2012 07:36:24.332 Copy Engine: Creating MSP install log file at: C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQLTools9_Hotfix_KB2494120_sqlrun_tools.msp.log
03/29/2012 07:36:24.332 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
03/29/2012 07:36:24.332 Registry: Cannot read registry key value "Debug", error 0
03/29/2012 07:36:38.930 MSP returned 1642: The installer cannot install the upgrade patch because the program being upgraded may be missing or the upgrade patch updates a different version of the program. Verify that the program to be upgraded exists on your computer and that you have the correct upgrade patch.
03/29/2012 07:36:38.930 Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
03/29/2012 07:36:38.930 Registry: Cannot read registry key value "Debug", error 997
03/29/2012 07:36:38.930 Copy Engine: Error, unable to install MSP file: e:\1d8e62c6a0cf9250ed0fe97eebe1\HotFixTools\Files\sqlrun_tools.msp
03/29/2012 07:36:38.930 The following exception occurred: Unable to install Windows Installer MSP file Date: 03/29/2012 07:36:38.930 File: \depot\sqlvault\stable\setupmainl1\setup\sqlse\sqlsedll\copyengine.cpp Line: 807

No clues, right? So finally, in a desperate attempt, I decided to capture a Process Monitor trace (available on Technet, see here). And whoa, look what I found there:-

00:24:41.4884798 msiexec.exe 6764 RegEnumKey HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Products\<SQL GUID>\Patches NO MORE ENTRIES Index: 0, Length: 288

“Aha”, I thought to myself, “so this is the problem”. Basically (and you can check this on a normal/healthy installation), the Patches key is supposed to have subkeys (1 for each patch) for all the patches applied to the SQL Server instance so far; and those keys seem to be missing in this case.

So what happened? God knows. Could be anything, could be someone deleted them manually, or some cleanup programs “Cleaned” them up by mistake, etc.

The main question, though, is how do we fix it? Simple enough, removing and re-installing the previous Service Pack should recreate the registry keys, and should thus fix it, right? Wrong. Service Pack uninstallation was only introduced from SQL 2008 onwards, so that’s not an option on SQL 2005. So what else?

Warning : This is one of those “weird” solutions. Some might even call it a hack, though I just call it exploiting a loophole in the service pack installer. Here are the steps:-

  1. Rename the “sqlservr.exe” in the Binn folder of your instance.
  2. Copy the sqlservr.exe from another instance that’s on a lower SP/RTM build than your target instance (in my case, the target was an instance on SP4, so I used the sqlservr.exe from an instance on SP3).
  3. Paste the exe into the Binn folder of your instance.
  4. Now run the SP setup (in my case, it was the SP4 setup), and it should be able to detect SQL on the lower build and allow you to proceed with the install, thereby creating the missing registry entries in the process.
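
To see which build the running engine reports (for example, before starting, and again once the Service Pack setup has completed), a query along these lines can help. It is only a sanity check and not part of the steps above.

```sql
-- Reports the build, the service pack level and the edition of the running instance.
SELECT
    SERVERPROPERTY('ProductVersion') AS product_version,   -- build number of the engine
    SERVERPROPERTY('ProductLevel')   AS product_level,     -- RTM, SP1, SP2, ...
    SERVERPROPERTY('Edition')        AS edition;
```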

Yes, you could say this is a loophole in the Service Pack install process, that it only checks the build of the sqlservr.exe to determine what build the instance is on, and I would actually agree with you. But in situations like this, it’s these “loopholes” that come in handy.
As always, any comments/feedback/questions are both welcome and solicited.

How to replace/restore start menu shortcuts for any program

Okay, let me admit first up that this is not an out and out SQL Server issue, but one of those interesting ones, that required me to provide an easy workaround. What happened was, someone (or some program) deleted the entire SQL Server folder from the start menu. All the components were, however, still installed and were functioning perfectly.

I actually went to the following path (this was a SQL 2008)

C:\Program Files (x86)\Microsoft SQL Server\100\Tools\binn\VSShell\Common7\IDE

and was able to find the ssms.exe there, using which SQL Server Management Studio started perfectly.

So the question was, how do we get the shortcuts back? Here are the steps:-

  1. Go to another machine which has the shortcuts in place, right click on the SQL Server 2008 folder in the start menu, and select Copy.
  2. Next, go to any folder in Windows Explorer on the same box, and press Ctrl+V. You will see a folder being pasted there.
  3. Now, copy this folder and paste it into the following path on the machine where the shortcuts are missing:-
    C:\ProgramData\Microsoft\Windows\Start Menu\Programs

And voila, all your shortcuts are back. Cool one, isn’t it? 

P.S. Please note that this only works if the installation paths are the same for both the machines involved (which they mostly are for Tools and Workstation Components).

SQL, Sharepoint and the Windows Internal Database – an interesting saga

This one is for all my friends out there who use Sharepoint. A default Sharepoint installation enables/installs the Windows Internal database, and creates its databases on it. The Windows Internal Database is, in a way, a special edition of SQL Server, in the sense that it’s not a Full version, but does not have the data file limitations of SQL Server Express either (yes, you heard that right). Anyways, the focus of this post is going to be on the following things:

  1. How to connect to the Windows Internal Database (to see what's going on at the back-end)
  2. How to troubleshoot common issues such as log file growth for Sharepoint databases attached to Windows Internal database (from a purely SQL perspective)
  3. How to set up automated SQL backups for your Sharepoint databases (remember, Windows Internal database does not have SQL Server Agent, and normal Windows scripts for taking backups will not work either).

Okay, so let’s get started:

Connecting to the Windows Internal Database

If you open the SQL Server Configuration manager on a machine that has Windows Internal database enabled, you will see a service named “Windows Internal Database (MICROSOFT##SSEE)” (also visible on the services console). Right click on the service in SQL Server Configuration manager, go to “Properties”, and click on the “Advanced” tab. Here, select the “Startup Parameters” option, and you will see a drop down next to it. In the drop down, look for the path to the Errorlog. Typically, it will be something like this:

C:\Windows\SYSMSI\SSEE\MSSQL.2005\MSSQL\LOG\ERRORLOG

So now we have the path to the Errorlog for the Windows Internal Database. Open the errorlog in a text editor (notepad or anything else of the sort), and look for the pipe name. Typically, the pipe name looks something like this:

Server local connection provider is ready to accept connection on [ \\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query ]

This is what we will use to connect to the WI database (yeah, I’m feeling lazy). So we just start up SQL Server Management Studio (on the local box, as you cannot connect to the Windows Internal Database over the network), and fill in the pipe name there, which is “\\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query” in our case, and hit Connect, and voila, you’re connected.
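
Once connected, a quick check like the one below can confirm that you are on the right instance and show which databases are attached to it. Nothing here is specific to Sharepoint; it only uses standard server properties and catalog views.

```sql
-- Which instance am I connected to, and what edition/build is it?
SELECT
    @@SERVERNAME                     AS server_name,
    SERVERPROPERTY('Edition')        AS edition,
    SERVERPROPERTY('ProductVersion') AS product_version;

-- Databases attached to this instance, with their recovery model and state.
SELECT name, recovery_model_desc, state_desc
FROM sys.databases
ORDER BY name;
```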

Troubleshooting log file growth

Now, if you’re facing issues with, say, log file growth with your Sharepoint databases (which are attached to the Windows Internal Database instance, of course), then as usual, the first thing to check would be the log_reuse_wait_desc column in sys.databases

select log_reuse_wait_desc,* from sys.databases

This should give you a fair idea if there’s anything preventing your log files from reusing the space inside them. From a SQL perspective, perhaps the best thing would be to put the databases in Simple recovery model, so that you can stop worrying about Log file space reuse altogether. I have done this successfully for a couple of my customers, without any adverse impact whatsoever to their environments. But that’s not to say that it will work fine for your environment as well. Please do take a full backup both before and right after you make the change, to be safe. It might also be a good idea to restore the db on another server and test it after changing the recovery model to Simple.
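
A minimal sketch of that sequence, assuming a hypothetical database named WSS_Content and a hypothetical C:\Backups folder that the service account can write to:

```sql
-- Hypothetical database name and backup paths; replace with your own.
-- Full backup before the change.
BACKUP DATABASE [WSS_Content]
TO DISK = N'C:\Backups\WSS_Content_before_simple.bak'
WITH INIT, CHECKSUM;

-- Switch the recovery model.
ALTER DATABASE [WSS_Content] SET RECOVERY SIMPLE;

-- Full backup right after the change.
BACKUP DATABASE [WSS_Content]
TO DISK = N'C:\Backups\WSS_Content_after_simple.bak'
WITH INIT, CHECKSUM;

-- Confirm the new recovery model and check what (if anything) still blocks log reuse.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'WSS_Content';
```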

Setting up Automated backups

This is by far the most interesting part of the post, or at least, the one that took me the maximum amount of time to accomplish. My customer wanted to set up automated backups from inside SQL for the Sharepoint databases. After a lot of time and effort in preparing and testing, we finally got the script ready (SQL_WIDB_Backup.sql, see attached).

You need to customize the script according to your database names and file paths, and then configure a bat file which calls the sql script. The bat file will have a command like this (again, please configure according to your environment):

sqlcmd -S\\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query -i c:\SQL_WIDB_Backup.sql -o c:\SQL_WIDB_Backup_Report.txt

The bat file can then be configured to run at specific times using the "Task Scheduler" (Start->Accessories->System Tools).
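
For readers who do not have the attachment, here is a rough sketch of what a backup script of this kind could look like. It is not the SQL_WIDB_Backup.sql mentioned above, just an illustration under stated assumptions: the C:\Backups\ folder is hypothetical and must already exist, and every online user database gets a full backup with the date in the file name.

```sql
-- A sketch only, not the attached script.
-- Assumption: C:\Backups\ exists and the service account can write to it.
SET NOCOUNT ON;

DECLARE @backup_dir nvarchar(260);
DECLARE @stamp      nvarchar(8);
DECLARE @db_name    sysname;
DECLARE @file       nvarchar(400);

SET @backup_dir = N'C:\Backups\';
SET @stamp      = CONVERT(nvarchar(8), GETDATE(), 112);   -- yyyymmdd

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE database_id > 4            -- skip master, tempdb, model, msdb
      AND state_desc = N'ONLINE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One file per database per day; INIT overwrites a same-day file.
    SET @file = @backup_dir + @db_name + N'_' + @stamp + N'.bak';

    BACKUP DATABASE @db_name TO DISK = @file WITH INIT, CHECKSUM;
    PRINT N'Backed up ' + @db_name + N' to ' + @file;

    FETCH NEXT FROM db_cursor INTO @db_name;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

The PRINT output ends up in the report file named by the -o switch of the sqlcmd command, which gives a simple record of what was backed up on each run.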

Hope this helps.

SQL Server Resource database corruption–yes, it’s possible

It’s very rare that I run into an issue with the Resource database, and the one I ran into recently was rarer still. But before I get into the nitty-gritty of the issue, let us begin by outlining a few details about the resource database:

The Resource database

The resource database is a hidden system database, and cannot be accessed explicitly by the users. Also, as documented here, there is no way inside SQL Server to back up the resource DB. The only way to take a backup of the resource db is to make file level copies. This is something that you can do either manually or through VSS (disk level) backups.

Now, it’s not without reason that we do not have any way to take backups of the Resource database. A few salient points:

  • The resource DB is a read-only database
  • Outside of a hardware issue, there is no way for the resource db to get corrupted.

But what if there is a hardware problem? Say, god forbid, your SAN crashes, or there’s some sort of a “scribbler” issue with one of the hardware drivers (more details on that in a different post), and you end up with a corrupted resource database. What do you do? Here are the options, in order:

  1. The ideal way to get out of this situation is to restore the resource db files from file level backups. So if you’re reading about this database for the first time, the first thing you should do is to make file-level copies of the resource db files (or add them to the set of files you back-up using VSS backups). I would recommend taking backups of the resource db files immediately after the successful application of a hotfix/Service Pack/CU/Security Update.
  2. If you are in this situation already, and do not have a backup of your resource db files, do not despair. Simply take another server, install an instance with the same instance id and instance name as the target instance, and bring it to the same build as well. Once this is done, stop the SQL Service, copy the resource db files, and use them to replace the corrupted resource db files on the problem instance. Your SQL server service should come online now. I’ve tested this extensively on SQL 2008 and 2008 R2, and it indeed works.
  3. If this is a cluster, and you’re on SQL 2008 or later, you can try bringing SQL up on the second node. If the second node’s copy of the resource db files is not corrupted, you should be successful.
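
For option 2, the donor instance has to be at the same build as the problem instance. One way to check this on the donor (and to record it on healthy instances ahead of time) is to query the server properties below. This is just a sketch; since the problem instance will not start, its build would have to come from your own records or its last errorlog:

```sql
-- Run on the donor instance (or on a healthy instance, to record the values).
SELECT SERVERPROPERTY('ProductVersion')             AS product_version,
       SERVERPROPERTY('ProductLevel')               AS product_level,
       SERVERPROPERTY('ResourceVersion')            AS resource_db_version,
       SERVERPROPERTY('ResourceLastUpdateDateTime') AS resource_db_last_update;
```

If product_version and resource_db_version on the donor match the build of the problem instance, the resource db files should be suitable for the copy.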

Now, allow me to explain why the special case described in point 3 exists:
In SQL 2005, the resource db was tied to the master database, and the resource db mdf and ldf files had to be in the same folder as the master db files, else your SQL Service would fail to start. In case of a cluster, the resource db resided on a clustered drive, and when the failover happened, the ownership of the resource database was passed to the second node. Since we had only one copy of the resource database to patch, we were able to patch all the nodes on the cluster in a single run in the case of SQL 2005.

This behaviour changed from SQL 2008 onwards. In SQL 2008 and 2008 R2, the resource database is no longer tied to the master database, and exists in the Binn folder instead. So basically, the resource database is a part of the instance binaries from SQL 2008 onwards. This is why, in case of SQL 2008 and 2008 R2, you need to patch both the nodes separately (one by one). Makes sense? This is why I mentioned in point 3 above that if you are on a cluster and SQL is 2008 or later, there is a good chance you might be able to get SQL up on the other node, even if the resource db files on one node are corrupted.

As a last word, if you’re not sure how your resource db files came to be corrupted, please take it as a top priority to find the root cause behind the corruption, as this is definitely something that warrants further investigation.

If you have any interesting incidents to share w.r.t the resource database, please feel free to do so in the comments section.

The ‘NULL’ Debate, and a few other interesting facts

This is for all my developer friends out there. I recently had a very interesting discussion with a friend of mine on the enigma called NULL and how it’s different from, say, an empty string. This is something that’s been under debate for as long as I can remember, and not just in the realm of RDBMS.

So what is NULL? A NULL is an undefined value, and is not equivalent to a space or an empty string. Let me illustrate with an example:

create table t1 (id int, name varchar(20))      --create a table with two fields

go

insert into t1(id) values(1)                           -- insert a row containing the value for the first field only

go

select * from t1            

id    name
1    NULL

Here, because we did not insert anything for the second field, the field was populated with a default value of NULL. Let’s see what happens if we insert a blank string for the second field:

insert into t1 values(2,'')   --just two single quotes, with nothing between  them
go

select * from t1

id    name
1    NULL
2   

In this case, because we specified an empty string, the value does not amount to NULL.

Similarly, if you insert a string containing only spaces in a cell, and then apply the trim functions (ltrim and rtrim) on it, the resultant value will not amount to NULL:

Insert into t1 values(3,'    ')
go

select id, ltrim(rtrim(name)) from t1

id    (No column name)
1    NULL
2   
3   
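
To see the difference from a query's point of view, here is a small self-contained sketch (it uses a table variable @t of my own instead of t1, so it can be run anywhere). It relies on two behaviours of SQL Server: a NULL only matches IS NULL, never = '', and trailing spaces are ignored in string comparisons, so a spaces-only string compares equal to the empty string:

```sql
-- Assumption: @t is a throwaway table variable standing in for t1.
DECLARE @t TABLE (id int, name varchar(20));

INSERT INTO @t (id) VALUES (1);        -- name is left as NULL
INSERT INTO @t VALUES (2, '');         -- empty string
INSERT INTO @t VALUES (3, '    ');     -- four spaces

SELECT id,
       CASE WHEN name IS NULL THEN 'NULL'
            WHEN name = ''    THEN 'empty (or spaces only)'
            ELSE 'has text'
       END AS name_state,
       ISNULL(name, '<no value>') AS display_name  -- substitute a label for NULL
FROM @t;
```

Row 1 is reported as NULL, while rows 2 and 3 both land in the second branch, which is worth remembering if your application needs to tell an empty string apart from one that contains only spaces.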

The Len function

Another interesting thing I discovered concerns the Len function, which is used to find the length of a character expression. For example, the statement select Len('Harsh') returns 5, and Select Len('') returns 0. Both of these outputs are as expected. However, what if you run Select Len('     ') (a string of five spaces)? The expected output is 5, right? Wrong. The output is 0.

Another twist: if you add a character to the end of the string, after the spaces, the spaces are counted, i.e., Select Len('    a') (four spaces followed by a) returns 5. Try the following cases as well, just for fun:

Select Len('  a  ')    --the character a enclosed by 2 whitespaces on each side

Select Len('h    ')   -- the character h followed by 4 whitespaces

For the first one, the output is 3, and not 5 as I expected. This is because the Len function, by design, ignores trailing spaces (leading spaces are still counted). In other words, you could say that it does an implicit rtrim on the string. This is also the reason why the second statement returns a length of 1, not 5 as expected.

In case your application is such that the presence of whitespaces in the data matters and you need them to be counted in the string length (this can be especially true if you’re writing code to move the data as-is to a table/database/application), then a suitable alternative would be the Datalength function. The Datalength function counts whitespaces, both leading and trailing, when calculating the length. As a simple example, select datalength('  a  ') (a enclosed by 2 whitespaces on each side) will return 5 as against 3 returned by Len. Keep in mind that Datalength returns the number of bytes rather than characters, so for nchar/nvarchar data the result is twice the character count.
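
Here is a short side-by-side sketch of the two functions. The variable names are mine, and the last column shows one common workaround (appending a sentinel character) for counting trailing spaces in characters rather than bytes:

```sql
DECLARE @s  varchar(10);
DECLARE @ns nvarchar(10);
SET @s  = '  a  ';     -- a with 2 spaces on each side
SET @ns = N'  a  ';    -- same text, Unicode

SELECT LEN(@s)           AS len_varchar,         -- 3: trailing spaces ignored
       DATALENGTH(@s)    AS datalength_varchar,  -- 5: one byte per character
       DATALENGTH(@ns)   AS datalength_nvarchar, -- 10: two bytes per character
       LEN(@s + 'x') - 1 AS len_with_sentinel;   -- 5: counts trailing spaces
```

The sentinel approach is handy when the column is nvarchar and you want a character count that includes trailing spaces.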

Hope this helps a few of my developer friends out there. Any comments/suggestions/feedback are welcome.


SQL 2008/2008 R2/2012 setup disappears/fails when installing Setup Support files

I’m sure many of you would have seen this issue when running SQL 2008/2008 R2 setup on a new server. The setup will proceed to install Setup support files, the window will disappear but, strangely enough, the next window never shows up.

Here’s what you need to do:

  1. Click on start->run and type %temp% and press enter (basically, go to the temp folder)
  2. Here, look for SQLSetup.log and SQLSetup_1.log. Open the SQLSetup_1.log file. In there, check for the following messages:
    04/16/2012 17:16:47.950 Error: Failed to launch process
    04/16/2012 17:16:47.952 Error: Failed to launch local setup100.exe: 0x80070003

Typically, you get this error only in SQL 2008, SQL 2008 R2 and SQL 2012. The steps are slightly different for all 3, and I’ve tried to outline them here:

SQL Server 2008

1. Save the following in a .reg file and merge to populate the registry:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap]
"BootstrapDir"="C:\\Program Files\\Microsoft SQL Server\\100\\Setup Bootstrap\\"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap\Setup]
"PatchLevel"="10.0.1600.22"

2. Next, copy the following files and folders from the media to the specified destinations:

    File/Folder in media | Destination
    X64/X86 (depending on what architecture you want to install) | C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Release
    Setup.exe | C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Release
    Setup.rll | C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Release\Resources\1033\

    SQL Server 2008 R2

    1. Save the following in a .reg file and merge to populate the registry:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap]
    "BootstrapDir"="C:\\Program Files\\Microsoft SQL Server\\100\\Setup Bootstrap\\"

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Bootstrap\Setup]
    "PatchLevel"="10.50.1600.00"

    2. Next, copy the following files and folders from the media to the specified destinations:

    File/Folder in media | Destination
    X64/X86 folder (depending on what architecture you want to install) | C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2
    Setup.exe | C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2
    Resources folder | C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2

    Next, re-run the setup, and it should proceed beyond the point of error this time.

    SQL Server 2012

    1. Save the following in a .reg file and merge to populate the registry:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Bootstrap]
    "BootstrapDir"="C:\\Program Files\\Microsoft SQL Server\\110\\Setup Bootstrap\\"

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Bootstrap\Setup]
    "PatchLevel"="11.00.2100.60"

    2. Next, copy the following files and folders from the media to the specified destinations:

    File/Folder in media | Destination
    X64/X86 folder (depending on what architecture you want to install) | C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012
    Setup.exe | C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012
    Resources folder | C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012

    Next, re-run the setup, and it should proceed beyond the point of error this time.

    As always, comments/suggestions/feedback are welcome and solicited.

    How a tiny little whitespace can make life difficult for your SQL Cluster

    Remember that tiny little whitespace that we tend to ignore most of the time? Believe it or not, there are situations when you could pay heavily if you don’t pay attention to this itsy-bitsy little character. Let me explain how:

    If you have a SQL Server instance, or multiple ones, on a cluster, and decide to have all of them running on the same static ports (on different IP’s, of course), then you might be surprised to see some of the services failing to come online after the change. The reason? Read on.

    When we change the port from SQL Server Configuration manager (SQL Server Network Configuration->Protocols for InstanceName –>TCP/IP->Properties), typically we just remove the value for the TCP Dynamic Ports under IPAll, and enter the static port number in the TCP Port field. A value of 0 in the TCP Dynamic Ports field indicates that Dynamic ports are to be used. By default, the SQL installation uses dynamic ports, and except in the case of a default instance, the static port field is empty.

    Coming back to the topic, say, after we change the port settings to reflect the static port number, we restart the service and it fails to come online. Check the errorlog, and you might see something like this:

    2012-05-17 13:08:29.34 Server      Error: 17182, Severity: 16, State: 1.
    2012-05-17 13:08:29.34 Server      TDSSNIClient initialization failed with error 0xd, status code 0x10. Reason: Unable to retrieve registry settings from TCP/IP protocol's 'IPAll' configuration key. The data is invalid.

    2012-05-17 13:08:29.35 Server      Error: 17182, Severity: 16, State: 1.
    2012-05-17 13:08:29.35 Server      TDSSNIClient initialization failed with error 0xd, status code 0x1. Reason: Initialization failed with an infrastructure error. Check for previous errors. The data is invalid.

    So, the error says the data in the IPAll configuration key is invalid. Where exactly is this key, anyway? The TCP protocol key, and the IPAll subkey under it, are located in:

    HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft SQL Server\<InstanceName>\MSSQLServer\SuperSocketNetLib\

    Under the IPAll subkey, you will find the same two “TCP Dynamic Ports” and “TCP Port” keys. Check the value for the TCP Dynamic Ports key. Do you see a whitespace there? If so, then most likely that is the reason for the service startup failure. Removing the whitespace should fix the issue, and the service should come online just fine. This is equivalent to changing it from the SQL Server Configuration manager, and the registry should only be used when you cannot access the SQL Server Configuration Manager for some reason. 
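
Once the service is back online, a quick way to confirm which port your session is actually connected on is the query below. It is only a sketch, and it assumes you connect over TCP/IP (for shared memory or named pipes connections the address and port columns come back as NULL):

```sql
-- Run from a connection made over TCP/IP to the instance in question.
SELECT session_id,
       net_transport,
       local_net_address,   -- IP address the instance accepted the connection on
       local_tcp_port       -- should match the static port you configured
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```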

    Hope this helps.

    Backup database results in error “Could not clear 'DIFFERENTIAL' bitmap in database”

    I recently ran into yet another issue, where the error message had absolutely no relation to the final solution. When trying to back up a database, we were getting the following error:

    Msg 18273, Level 16, State 1, Line 1
    Could not clear 'DIFFERENTIAL' bitmap in database 'RS_newTempDB' because of error 9002. As a result, the differential or bulk-logged bitmap overstates the amount of change that will occur with the next differential or log backup. This discrepancy might slow down later differential or log backup operations and cause the backup sets to be larger than necessary. Typically, the cause of this error is insufficient resources. Investigate the failure and resolve the cause. If the error occurred on a data backup, consider taking a data backup to create a new base for future differential backups.

    When checking the database properties, I noticed that the log file for the DB was just 504 KB in size, and its autogrowth was set to 1 percent. In hindsight, the error 9002 mentioned in the message is the “transaction log is full” error, which fits with what we found. Now, since I had seen issues with keeping the autogrowth for log files low in the past (the famous VLFs issue, which impacts startup and recovery of the DB), I suggested that we increase it. We set the autogrowth to something like 100 MB, and voila, the backup completed successfully.
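
If you prefer to do the same check and change from a query window rather than the database properties dialog, it could look something like this. The logical file name RS_newTempDB_log is an assumption on my part; use whatever name the first query returns for your log file:

```sql
USE [RS_newTempDB];
GO

-- Current size and growth settings of the log file.
-- size is in 8 KB pages; growth is a percentage when is_percent_growth = 1,
-- otherwise a number of 8 KB pages.
SELECT name AS logical_name,
       type_desc,
       size * 8 AS size_kb,
       growth,
       is_percent_growth
FROM sys.database_files
WHERE type_desc = 'LOG';
GO

-- Assumption: RS_newTempDB_log is the logical name returned above.
ALTER DATABASE [RS_newTempDB]
MODIFY FILE (NAME = N'RS_newTempDB_log', FILEGROWTH = 100MB);
GO
```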

    Hope this helps someone.

    The most interesting issue in DB Mirroring you will ever see

    I recently worked on a very interesting “issue” in DB mirroring, relevant to a very specific scenario. Read on to find out more.

    Basically, we have a setup which looks something like this:

    Initial setup with machines A, B and C
    A principal
    B mirror
    C witness


    Take down the principal A (network disconnect or stop SQL)
    A down
    B online principal
    C witness

    Failover happens cleanly. While A is down we do our auto repair (remove witness, add a new mirror on D, establish a witness on E)
    A down
    B online principal
    D mirror
    E witness

    Now when we bring A back up (reconnect network or start SQL)
    A (principal in recovery)
    B online principal
    D mirror
    E witness

    At this point A correctly stays in recovery because it doesn’t have quorum. Now if you restart SQL on A
    A online principal
    B online principal
    D mirror
    E witness

    So we end up with 2 copies of the database online, which can be undesirable in certain situations.
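
If you want to see what each server believes about its own role at any point in this sequence, a query along these lines against the mirroring catalog view can be run on A, B and D. It is a sketch for inspection only; databases whose state prevents them from being opened may still be listed here even though they cannot be queried:

```sql
-- Run on each server instance; lists only databases that have mirroring configured.
SELECT DB_NAME(database_id)        AS database_name,
       mirroring_role_desc,        -- PRINCIPAL or MIRROR
       mirroring_state_desc,       -- e.g. SYNCHRONIZED, DISCONNECTED
       mirroring_partner_instance,
       mirroring_witness_name,
       mirroring_witness_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```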

    Looking at the errorlog, when the service is restarted for the first time, we saw these messages:

    2012-06-26 17:10:23.80 spid21s     Error: 1438, Severity: 16, State: 1.

    2012-06-26 17:10:23.80 spid21s     The server instance Partner rejected configure request; read its error log file for more information. The reason 1460, and state 1, can be of use for diagnostics by Microsoft. This is a transient error hence retrying the request is likely to succeed. Correct the cause if any and retry.

    2012-06-26 17:10:26.49 spid19s     Bypassing recovery for database 'XYZ' because it is marked as an inaccessible database mirroring database. A problem exists with the mirroring session. The session either lacks a quorum or the communications links are broken because of problems with links, endpoint configuration, or permissions (for the server account or security certificate). To gain access to the database, figure out what has changed in the session configuration and undo the change.

    Attempting to access the database gives:

    Msg 955, Level 14, State 1, Line 2

    Database XYZ is enabled for Database Mirroring, but the database lacks quorum: the database cannot be opened.  Check the partner and witness connections if configured.

    However, when you restart the service for the second time, you see:

    2012-06-26 17:32:32.51 spid7s      Recovery is writing a checkpoint in database 'XYZ' (5). This is an informational message only. No user action is required.

    After some research and a lot of discussions, we were able to nail it down to the following steps:

    When A comes back on (first startup), it looks for the Partner B and Witness C. It is able to communicate with the Witness C (say) on port 5033. The Witness C sends back a message (actually a DBM_NOT_SHIPPING error) indicating it is not part of the mirroring session #1 anymore.

    So the old principal A removes the Witness C from its local configuration. Now, after the next restart, it again attempts to contact the mirror B (but not the Witness C, because it has been removed from the configuration on A, remember). The mirror B says it is already part of a different mirroring session, mirroring session #2. So the principal A removes Mirror B also from its configuration.

    At this point the system A is a restarting primary with no witness configured, so it has the implied quorum vote of a primary and is able to restart and come online. This is the same case as if a drop witness command was executed and had acted on the mirror and witness without the acknowledgement getting to the primary before a restart (the command was accepted remotely, so the restarted node syncs with the latest configuration on restart).

    In the normal case, where a session is dropped while the primary is down, the old mirror will return DBM_NOT_SHIPPING, which will cause the old primary to drop mirroring locally and stay online.

    The mirror in this case has been configured with a different DBM session, so it returns DBM_ALREADYSHIPPING_REMOTE, which does not cause the session to drop, but the DB (on A) comes online as a principal with no witness and the mirror not connected. Running an alter database set partner off will put it into the same state as the normal case.

    As you probably surmised already, this behaviour is by design. But how to avoid this? One of my esteemed colleagues was able to come up with the following workaround:

    When you remove the mirroring session #1, and establish mirroring session with B as the Primary, D as mirror and E as witness, you need to make sure that the old Witness C is not using the same endpoint(5033) anymore. This, in turn, will ensure that the old Principal A is unable to talk to any of the remnants of the mirror session # 1. As a result, any attempts by A to communicate to Witness C will lead to a timeout. Thus, the old Principal A will remain in a “RECOVERING” state since Quorum is not established yet. The only negative impact of this approach is that you cannot share the same Witness server/endpoint for multiple mirroring sessions.
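
To check which port the mirroring endpoint on the old Witness C is listening on, and to move it if needed, something like the following could be used. The endpoint name Mirroring and the new port 5034 are placeholders I've assumed; check the first query for the real endpoint name, and remember that any other session using this witness on the old port will be affected:

```sql
-- Run on the old witness (C): list the database mirroring endpoint and its port.
SELECT e.name,
       e.role_desc,
       e.state_desc,
       t.port
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS t
  ON t.endpoint_id = e.endpoint_id;

-- Assumptions: [Mirroring] is the endpoint name returned above, 5034 is a free port.
ALTER ENDPOINT [Mirroring]
AS TCP (LISTENER_PORT = 5034);
```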

    Now apart from this, there are few other things you need to account for:

    • After a new mirror session is setup, you need to drain the information about the old principal A from all existing application connections and provide them with the new principal and its partner information. For this a disconnect and reconnect is required.
    • After the old principal A comes back up, you need to use the following commands to remove remnants of mirror session #1 from it and keep it out of application use:

                alter database XYZ set partner off
                go

                alter database XYZ set single_user with rollback immediate
                go

                alter database XYZ set offline
                go

    Not a very common scenario, but an interesting one nonetheless. What say?

    An in-depth look at Ghost Records in SQL Server

    Ghost records are a bit of an enigma for most folks working with SQL Server, and not just because of the name. Today, I’ll try to explain the concept, as well as outline some troubleshooting techniques.

    The main reason behind introducing the concept of Ghost records was to enhance performance. In the leaf level of an index, when rows are deleted, they're marked as ghost records. This means that the row stays on the page but a bit is changed in the row header to indicate that the row is really a ghost. The page header also reflects the number of ghost records on a page. What this means, in effect, is that the DML operation which fired the delete will return to the user much faster, because it does not have to wait for the records to be deleted physically. Rather, they’re just marked as “ghosted”.

    Ghost records are present only in the index leaf nodes. If ghost records weren't used, the entire range surrounding a deleted key would have to be locked. Here’s an example I picked up from somewhere:
    Suppose you have a unique index on an integer and the index contains the values 1, 30, and 100. If you delete 30, SQL Server will need to lock (and prevent inserts into) the entire range between 1 and 100. With ghosted records, the 30 is still visible to be used as an endpoint of a key-range lock so that during the delete transaction, SQL Server can allow inserts for any value other than 30 to proceed.

    SQL Server provides a special housekeeping thread that periodically checks B-trees for ghosted records and asynchronously removes them from the leaf level of the index. This same thread carries out the automatic shrinking of databases if you have that option set. The presence of ghost records is registered in:

    • The record itself
    • The Page on which the record has been ghosted
    • The PFS for that page (for details on PFS, see Paul Randal’s blog here)
    • The DBTABLE structure for the corresponding database. You can view the DBTABLE structure by using the DBCC DBTABLE command (make sure you have TF 3604 turned on).

    The ghost records can be cleaned up in 3 ways:

    • If a record of the same key value as the deleted record is inserted
    • If the page needs to be split, the ghost records will be handled
    • The Ghost cleanup task (scheduled to run once every 5 seconds)

    The Ghost cleanup process divides the “ghost pages” into 2 categories:

    • Hot Pages (frequently visited by scanning processes)
    • Cold Pages

    The Ghost cleanup thread is able to retrieve the list of Cold pages from the DBTABLE for that database, or the PFS Page for that interval. The cleanup task cleans up a maximum of 10 ghost pages at a time. Also, while searching for the ghost pages, if it covers 10 PFS Pages, it yields.

    As far as hot ghost pages are concerned, the ghost cleanup strives to keep the number of such pages below a specified limit. Also, if the thread cleans up 10 hot ghost pages, it yields. However, if the number of hot ghost pages is above the specified (hard-coded) limit, the task runs non-stop till the count comes down below the threshold value.

    If there is no CPU usage on the system, the Ghost cleanup task runs till there are no more ghost pages to clean up.

    Troubleshooting

    So now we get to the interesting part. If your system has some huge delete operations, and you feel the space is not being freed up at all or even not at the rate it should be, you might want to check if there are ghost records in that database. I’ll try to break down the troubleshooting into some logical steps here:

    1. Run the following command:
      Select * from sys.dm_db_index_physical_stats(db_id('<dbname>'),<ObjectID>,NULL,NULL,'DETAILED')
      P.S. The object ID can be looked up from sys.objects by filtering on the name column.

    2. Check the Ghost_Record_Count and Version_Ghost_Record_Count columns (version ghost record count will be populated when you’re using snapshot isolation on the database). If this is high (several million in some cases), then you’ve most probably got a ghost record cleanup issue. If this is SQL Server 2008/2008 R2, then make sure you have applied the patch mentioned in the kb http://support.microsoft.com/kb/2622823

    3. Try running the following command:
      EXEC sp_clean_db_free_space @dbname=N'<dbname>'

    4. If the ghost record count from step 1 is the same (or similar) after running this command, then we might need to dig in a bit deeper.
      Warning: Some of the troubleshooting steps mentioned from hereon are unpublished and might be unsupported by Microsoft. Proceed at your own risk.

    5. Enable Trace Flag 662 (prints detailed information about the work done by the ghost cleanup task when it runs next), and 3605 (directs the output of TF 662 to the SQL errorlog). Please do this during off hours.

    6. Wait for a few minutes, then examine the errorlog. First, you need to check if the database is being touched at all. If so, it’s very much possible that the Ghost Cleanup task is doing its job, and will probably catch up in a bit. Another thing to watch out for: do you see one page being cleaned up multiple times? If so, note the page number and file id. Please ensure you disable TF 662 after this step (it creates a lot of noise in the errorlog, so please use it for as little time as possible).

    7. Next, run the following command on the page to view its contents (turn on TF 3604 first, so that the output is returned to your session):
      DBCC PAGE('<DBName>',<file id>,<Page no.>,3)

    8. This will give you the contents of the page. See if you can spot a field called m_ghostRecCnt in the output. If it has a non-zero value, that means the page has ghost records. Also, look for the PFS page for that page. It will look something like PFS (1:1). You can also try dumping the PFS page to see if this page has a ‘Has Ghost’ against it. For more details on DBCC PAGE, check out Paul Randal’s post here.
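
For steps 1 and 2, a narrower version of the query that returns only the columns of interest might look like the sketch below. YourDatabase and dbo.YourTable are placeholders of mine; note that the DETAILED mode reads every page of the index, so it can take a while on large tables:

```sql
-- Assumptions: YourDatabase and dbo.YourTable are placeholders.
SELECT index_id,
       index_type_desc,
       page_count,
       record_count,
       ghost_record_count,
       version_ghost_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(N'YourDatabase'),
         OBJECT_ID(N'YourDatabase.dbo.YourTable'),
         NULL, NULL, 'DETAILED')
WHERE index_level = 0;   -- leaf level only, which is where ghost records live
```

Running it before and after step 3 gives you a simple way to tell whether the counts are coming down.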

     

    Another thing that deserves mention is the special role of the PAGLOCK hint w.r.t ghost records:

    • Running a select statement with the PAGLOCK hint against a table will ensure that all the ghost records in that table are queued for cleanup by the ghost cleanup task.
    • Accommodating the PAGLOCK hint in your delete statement will ensure that the records are deleted there and then, and are not left behind for the Ghost Cleanup task to take care of later. By default, all indexes have the PAGLOCK option turned on (you can check by scripting out a create index task), but they might not be able to get it all the time. This is where the PAGLOCK query hint comes in. It makes your query wait for the Page Lock, so it can clean up the records physically before returning. However, it’s not advisable to use the PAGLOCK hint in your delete statements all the time, as the performance trade-off also needs to be taken into consideration (this is the same purpose for which the Ghost Cleanup task was introduced, remember?). This should be resorted to only under situations where you are facing a definite issue with Ghost Record cleanup, and have a dire need to prevent further ghost records from getting created.
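
As a rough illustration of the two uses of the hint (the table dbo.YourTable, the column created_on and the cut-off date are placeholders I've made up for the example):

```sql
-- Assumptions: dbo.YourTable and created_on are hypothetical names.

-- 1. A scan with PAGLOCK, to get the ghosted records in the table
--    queued for the ghost cleanup task
SELECT COUNT(*)
FROM dbo.YourTable WITH (PAGLOCK);

-- 2. A delete with PAGLOCK, so the rows are removed as part of the delete
--    instead of being left behind as ghost records
DELETE FROM dbo.YourTable WITH (PAGLOCK)
WHERE created_on < '20120101';
```

As mentioned above, the second form trades delete performance and concurrency for immediate cleanup, so it is something to reach for only when you have a definite ghost record problem.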

    These steps might or might not solve your problem, but what they will do is give you an insight into how the SQL Server Database Engine works w.r.t Ghost records and their cleanup. One of the most common (and quickest) resolutions for a ghost records issue is to restart SQL Server.

    Once again, this post does not come with any guarantees, and the contents are in no way endorsed by Microsoft or any other corporation or individual.

    Hope this helps you understand the concept of Ghost Records somewhat. You’re more than welcome to share your experiences/opinions/knowledge in the comments section, and I shall be delighted to include them in the contents of the post if suitable.
