SQL Database login for x on x failed. Login failed for x. Event ID: 3351

Fri, Feb 19, 2016 4-minute read

Below follows some notes on an issue I recently encountered and spent best part of a day troubleshooting. This was occurring in a small SharePoint environment but I suspect it could occur in any environment which has some sort of distributed call to a SQL Server.

TL;DR - it was the Dell KACE Agent. See the end of the article for more.

In this case though, there was a SharePoint application server, and a SharePoint WFE that every half hour was showing an error in their event logs with EventID 3351. I noticed the errors were staggered - i.e. one server was at 15 and 45 minutes past the hour and the other was at approx 20 and 50 minutes past the hour.

The error message in the event log read:

SQL database login for ‘Sharepoint_Config’ on instance ‘sqlserver’ failed. Additional error information from SQL Server is included below.

Login failed for user ‘DOMAIN$MACHINENAME'.

Makes sense as although the SQL Server is in mixed mode, the list of domain accounts that can access it is quite refined. Certainly specific machine names are not configured for access. A quick Google suggested that this error can occur when the SPTimerV4 SharePoint Timer Service doesn’t have appropriate permissions to the database. This didn’t sound likely as all timer jobs appeared to be operating correctly, but nevertheless had a look. I also hunted about for all timer jobs running on a 30 minute schedule and the permissions for the domain account running the timer service looked good and there were only 2 jobs that ran on 30 minute schedule… and they were both disabled. So it wasn’t the Timer service.

A further Google search revealed that this can occur with Dynamics CRM NAV, but this wasn’t our case for the same reason as above. Also, a suggestion that a SharePoint Management Pack for System Centre Operations Manager (SCOM) or Systems Centre Configuration Manager (SCCM) could cause it. Whilst there is SCOM in the organisation, the sysadmins advised this wasn’t the case. Nevertheless, I disabled the CcmExec.exe process on the SP servers, and the error still occurred. Bizarre.

I turned to the SQL Server SQL Profiler. I fired up a new trace and refined the events being traced down to Eventlog Error and ErrorLog Errors and waited for the clock to tick by until it hit the half hour mark. The errors duly occurred in the trace log and a bit more of an exception:

Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: xxx.xxx.xxx.xxx]

So this confirmed the error, but wasn’t much help. We already knew that it was something trying to access the SQL Server using a machine account… but what?

I cross-referenced the timings of Event Log errors (which were occurring the Application event log as critical errors) with the SharePoint ULS trace log and noticed that the source was Powershell. And indeed, the error in the ULS log was the same exception in the event log (as you would expect.)

This still didn’t help much. I scoured the Windows Task Scheduler looking for something that may be triggering a Powershell script on a timer. Nothing. Had a look through Powershell itself (Get-ScheduledJob cmdlet) - nothing.

By watching the Windows Task Manager around the time the issue occurred, I noticed that indeed, a Powershell process was popping up. Creating a dump of the process didn’t help because although I got a nice load of stuff about Powershell, there was nothing about what script it was actually running.

It was that point that I turned to Sysinternals Process Monitor which if you don’t know it is like a super-powered Task Manager. The main thing is that it will group processes by their process tree. So, as time ticked by and powershell kicked in, I was able to see the full process tree that was invoking powershell. And it was - Dell KACE Agent (AMPAgent.exe or AMPWatchdog.exe). Hurrah. These are some sort of management and configuration agent that sit on a server and communicate back to a machine manager for Dell servers.

I disabled those services and let yet more time tick over. And this time - no error in the event logs.

Problem solved! (Well for me, anyway… the sysadmins who need/use that Dell KACE Service now need to figure out how to get it working properly.)