I ran into this at a client site recently and wanted to blog about my experience. A number of things in the cache were not working as expected (including Event ID 1000 and Event ID 1026), and at the end of the day it boiled down to two things. First, the cache cluster was improperly configured, so I ended up wiping out the cluster and rebuilding it. Then, after much pain, I found that one of the servers in the cluster, the one that kept complaining it could not start properly, was still misconfigured (it was using the wrong account). Stopping the cluster, exporting the config, fixing it, reimporting it, and restarting the cluster finally solved the problem for good.
I started with this blog post http://www.sharepointconsultant.ch/2013/03/07/adding-a-local-sharepoint-2013-development-server-as-a-cache-host-to-appfabrics-cache-cluster/.
That gave me the following knowledge:
1) There’s a Windows Service named “AppFabric Caching Service”, which matches 1:1 to each server in the cluster (i.e. every server that’s part of the cluster has this service on it, and on a healthy server it should be set to “Automatic” and be running).
2) The key PowerShell you’ll need to know is as follows.
** Always run your PowerShell window as Administrator when working with the AppFabric Cache **
Start with the following line of PowerShell to let it know who’s boss.
PS C:\> Use-CacheCluster
Next, find out the details about your individual host. (It’s most likely configured on port 22233)
PS C:\> Get-CacheHostConfig -ComputerName [yourServerName] -CachePort 22233
That should return the details for this server in the cluster. Something like below.
HostName : [Your Server Name]
ClusterPort : 22234
CachePort : 22233
ArbitrationPort : 22235
ReplicationPort : 22236
Size : 400 MB
ServiceName : AppFabricCachingService
HighWatermark : 99%
LowWatermark : 90%
IsLeadHost : True
If, however, you’re getting an error along the lines of:
Get-AFCacheHostConfiguration : ErrorCode<ERRCAdmin010>:SubStatus<ES0001>:Specified host is not present in cluster.
You can register your host in the cluster as follows.
PS C:\> Register-CacheHost -Provider [yourProvider] -ConnectionString [yourConnectionString] -Account "NT Authority\Network Service" -CachePort 22233 -ClusterPort 22234 -ArbitrationPort 22235 -ReplicationPort 22236 -HostName [yourServerName]
You’ll need 3 pieces of information to properly run the statement above.
yourProvider & yourConnectionString – Can be found in the registry under HKLM\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration, or in the DistributedCacheService.exe.config file in C:\Program Files\AppFabric 1.1 for Windows Server.
yourServerName – The name of your server
(Optionally you can change the account, but I would recommend you leave the Network Service account in place – this seems to keep SharePoint 2013 happy)
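If you’d rather not copy those two values by hand, here is a rough sketch that pulls them out of the config file’s XML. I’m assuming the values live on a clusterConfig element as provider and connectionString attributes; check that against your own DistributedCacheService.exe.config before trusting it. The function name and the sample XML are made up for illustration.

```powershell
# Sketch only: reads the provider and connection string from config XML.
# ASSUMPTION: the values are attributes of a <clusterConfig> element.
function Get-ClusterConnectionInfo {
    param([Parameter(Mandatory = $true)][string]$ConfigXml)

    $doc  = [xml]$ConfigXml
    $node = $doc.SelectSingleNode('//clusterConfig')
    if ($null -eq $node) {
        throw 'No <clusterConfig> element found in the supplied XML.'
    }

    # Emit one object holding both values needed by Register-CacheHost.
    [pscustomobject]@{
        Provider         = $node.GetAttribute('provider')
        ConnectionString = $node.GetAttribute('connectionString')
    }
}

# Made-up sample that mimics the assumed layout of the real file.
$sampleConfig = @'
<configuration>
  <dataCacheConfig cacheHostName="AppFabricCachingService">
    <clusterConfig provider="ExampleProvider" connectionString="Data Source=SQL01;Initial Catalog=ExampleDb" />
  </dataCacheConfig>
</configuration>
'@

$info = Get-ClusterConnectionInfo -ConfigXml $sampleConfig
$info
```

Against a real server you would pass the file contents instead of the sample, for example with Get-Content and the -Raw switch on the config file path mentioned above.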
Now when you run this command:
PS C:\> Get-CacheHost
You should see the following.
HostName : CachePort         Service Name            Service Status Version Info
--------------------         ------------            -------------- ------------
MyServer1.domain.com:22233   AppFabricCachingService UP             3 [3,3][1,3]
MyServer2.domain.com:22233   AppFabricCachingService UP             3 [3,3][1,3]
At the very least, you should see both servers in the cluster at this point. If your output looks like the above, you’re done and don’t need the rest of this article. However, if you’re unlucky and one or more of the servers are down (Service Status = DOWN or STARTING), keep reading.
At this point, one of my servers was not started (DOWN), so I went ahead and ran the following.
PS C:\> Start-CacheHost -ComputerName [yourServerName] -CachePort 22233
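With more than a couple of hosts, picking the unhealthy ones out by eye gets tedious. Here is a small sketch of a filter that keeps only hosts whose status is not UP. I’m assuming the objects returned by Get-CacheHost expose Status, HostName and PortNo properties; the function name and the sample objects below are made up so the snippet runs without a cluster.

```powershell
# Sketch only: passes through any host object whose Status is not 'Up'.
# ASSUMPTION: Get-CacheHost objects have Status, HostName and PortNo properties.
function Select-UnhealthyCacheHost {
    param([Parameter(ValueFromPipeline = $true)]$CacheHost)
    process {
        # Compare as a string so both enum values and plain text work.
        if ("$($CacheHost.Status)" -ne 'Up') { $CacheHost }
    }
}

# Made-up stand-ins for Get-CacheHost output.
$sampleHosts = @(
    [pscustomobject]@{ HostName = 'MyServer1.domain.com'; PortNo = 22233; Status = 'Up' }
    [pscustomobject]@{ HostName = 'MyServer2.domain.com'; PortNo = 22233; Status = 'Down' }
    [pscustomobject]@{ HostName = 'MyServer3.domain.com'; PortNo = 22233; Status = 'Starting' }
)

$unhealthy = @($sampleHosts | Select-UnhealthyCacheHost)
$unhealthy | ForEach-Object { '{0}:{1} is {2}' -f $_.HostName, $_.PortNo, $_.Status }

# On a real cache host (untested sketch, same property assumptions):
# Get-CacheHost | Select-UnhealthyCacheHost |
#     ForEach-Object { Start-CacheHost -ComputerName $_.HostName -CachePort $_.PortNo }
```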
If that failed, like it did for me, I would recommend exporting your cache cluster configuration, and seeing if anything is wrong. To do this, run the following.
PS C:\> Export-CacheClusterConfig [path to output filename]
So, for example…
PS C:\> Export-CacheClusterConfig C:\file.txt
When looking at the file, down near the bottom, I noticed that the account MyServer1 was running under was all goofy (usernames shouldn’t have tildes in them).
<hosts>
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="1909348767" size="800" leadHost="true" account="DOMAIN\appsrv1~"
cacheHostName="AppFabricCachingService" name="MyServer1.domain.com"
cachePort="22233" />
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="1634054989" size="400" leadHost="true" account="DOMAIN\spService"
cacheHostName="AppFabricCachingService" name="MyServer2.domain.com"
cachePort="22233" />
</hosts>
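Spotting a stray tilde by eye works for two hosts, but a quick script is handy for bigger clusters. The sketch below lists every host entry with its account and flags the two symptoms I hit: a tilde in the account name, and accounts that differ between hosts. It assumes the host elements sit under a hosts element with no XML namespace, as in the excerpt above; the function name and the sample XML are made up.

```powershell
# Sketch only: reports the account configured for each <host> entry.
# ASSUMPTION: hosts are found at //hosts/host and carry name/account attributes.
function Get-CacheHostAccountReport {
    param([Parameter(Mandatory = $true)][string]$ConfigXml)

    $doc       = [xml]$ConfigXml
    $hostNodes = @($doc.SelectNodes('//hosts/host'))

    # Distinct account names across all hosts (case-insensitive).
    $distinct = @($hostNodes | ForEach-Object { $_.GetAttribute('account') } | Sort-Object -Unique)

    foreach ($node in $hostNodes) {
        $account = $node.GetAttribute('account')
        [pscustomobject]@{
            Name           = $node.GetAttribute('name')
            Account        = $account
            HasTilde       = $account.Contains('~')
            AccountsDiffer = ($distinct.Count -gt 1)
        }
    }
}

# Made-up sample shaped like the exported <hosts> section.
$sampleExport = @'
<configuration>
  <hosts>
    <host name="MyServer1.domain.com" account="DOMAIN\appsrv1~" cachePort="22233" />
    <host name="MyServer2.domain.com" account="DOMAIN\spService" cachePort="22233" />
  </hosts>
</configuration>
'@

$report = @(Get-CacheHostAccountReport -ConfigXml $sampleExport)
$report | Format-Table -AutoSize
```

A report where HasTilde or AccountsDiffer is true is only a hint to go and look at the file; whether differing accounts are actually wrong depends on how your farm is set up.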
WARNING: MAKE A BACKUP BEFORE YOU MAKE ANY CHANGES!!!
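One simple way to do that is to copy the exported file to a timestamped name before you touch it. This is just a sketch; the function name is made up and the naming scheme is an arbitrary choice.

```powershell
# Sketch only: copies a file to "<path>.<timestamp>.bak" and returns the new path.
function Backup-ConfigFile {
    param([Parameter(Mandatory = $true)][string]$Path)

    $stamp      = Get-Date -Format 'yyyyMMdd-HHmmss'
    $backupPath = '{0}.{1}.bak' -f $Path, $stamp

    # Stop on failure so you never edit a file that was not backed up.
    Copy-Item -LiteralPath $Path -Destination $backupPath -ErrorAction Stop
    $backupPath
}

# Example, using the export path from earlier:
# Backup-ConfigFile -Path 'C:\file.txt'
```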
I fixed the account name (to match the service account on the other server, DOMAIN\spService) and then had to import the configuration back in.
** BUT WAIT – There’s more! **
Before you try to import your configuration, you’ll need to open the Windows “Services” application on each server in the cluster, disable the “AppFabric Caching Service”, and then stop it.
To do this, find the “AppFabric Caching Service” entry in the list and double-click it.
Next, follow this order exactly: set the startup type to Disabled, then stop the service (this is the same as running the PowerShell to shut down the AppFabric host).
Repeat the above steps on each server in the cache cluster.
Finally, once every server is done, you can import the file as shown below.
PS C:\> Import-CacheClusterConfig C:\file.txt
Confirm
Are you sure you want to perform this action?
Performing operation “Replace cluster configuration.” on Target “Cluster configuration.”.
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is “Y”): y
If you shut down the cluster properly (like I describe above), your configuration should take at this point.
If you see the following error, ensure that you’ve shut down the service on all servers in the cluster (seen above).
Import-AFCacheClusterConfiguration : ErrorCode<ERRCAdmin001>:SubStatus<ES0001>:Hosts are already running in the cluster.
At line:1 char:1
+ Import-AFCacheClusterConfiguration C:\file.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Import-AFCacheClusterConfiguration], DataCacheException
+ FullyQualifiedErrorId : ERRCAdmin001,Microsoft.ApplicationServer.Caching.Commands.ImportAFCacheClusterConfigurationCommand
Go back to your services window and set your AppFabric service back to Automatic. Now all you should need to do is start the cluster, and you’ll be good.
PS C:\> Start-CacheCluster
And all your servers should be UP at this point. You can also check the cluster health with the following.
PS C:\> Get-CacheClusterHealth
You can also check the Cache status with the following command.
PS C:\> Get-Cache
Don’t forget, you can always see all the valid PowerShell commands using the following.
PS C:\> Get-Help *Cache*
I hope this helps others where I was pulling my hair out.
This is a fantastic article and very well written. Thanks – helped me solve my problem.
This article literally saved me a tremendous amount of time!! Thank you to the author!
I agree! I’ve spent days trying to fix this. For me, just exporting and re-importing the config file, then starting things back up, finally fixed the problem. GREAT blog post!!
Hi,
All the settings mentioned in the article are fine on my server, but I am still getting the error below very frequently. Can you please help me with this?
ViewStateLog: Failed to write to the velocity cache: http://server:2731/default.aspx
Unexpected Exception in SPDistributedCachePointerWrapper::InitializeDataCacheFactory for usage ‘DistributedViewStateCache’ – Exception ‘Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:The request timed out.. Additional Information : The client was trying to communicate with the server : net.tcp://server:22233 at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCacheProperties(RequestBody request, IClientChannel channel) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCache(String cacheName) at Microsoft.SharePoint.DistributedCaching.SPDistributedCachePointerWrapper.InitializeDataCacheFactory()’.
Below is the info from my dev server
ps>> Get-Cachehostconfig with host details gives me
HostName : server.corp.domain.com
ClusterPort : 22234
CachePort : 22233
ArbitrationPort : 22235
ReplicationPort : 22236
Size : 819 MB
ServiceName : AppFabricCachingService
HighWatermark : 99%
LowWatermark : 90%
IsLeadHost : True
ps>>get-cachehost
PS C:\Users\gnfoip02> Get-CacheHost
HostName : CachePort           Service Name            Service Status Version Info
--------------------           ------------            -------------- ------------
server.corp.domain.com:22233   AppFabricCachingService UP             3 [3,3][1,3]
Hosts under exported files is having
Hi, great blog. Thanks for all the guides. This didn’t solve my problem, but it led me in the correct direction.
I ended up removing the host from the cluster, uninstalling SharePoint, uninstalling AppFabric, and re-running the prerequisites installer.
Other guides which I found helpful are..
http://blogs.msdn.com/b/besidethepoint/archive/2013/03/27/appfabric-caching-and-sharepoint-2.aspx
http://technet.microsoft.com/en-us/library/jj219613.aspx#addremove
http://www.sharpcoder.co.uk/post/2011/04/11/Removing-Hosts-from-an-AppFabric-Cache-Cluster.aspx
http://technet.microsoft.com/en-us/library/jj891108.aspx
http://social.msdn.microsoft.com/Forums/en-US/6a645248-80a2-4b9d-a6d1-1ccc6faa8289/sharepoint-foundation-2013-distributed-cache-not-working-driving-me-nuts
http://blogs.msdn.com/b/josrod/archive/2007/12/12/clear-the-sharepoint-configuration-cache-for-timer-job-and-psconfig-errors.aspx
That just about covers 90% of the scenarios. This is 48 hours’ worth of research 🙂
Great article, helped me out with a quirk in our SP installation! Thanks a lot!
Thanks for this, I was drawing a blank not knowing that you need to execute Use-CacheCluster first. Why oh why does the Technet documentation not mention this? It turned out after exporting the config that I also had the wrong account configured
Great post, helped me solve an annoying issue.
What I found out to resolve my issue, was that the cache-host causing problems had the pre-windows 2000 hostname in the host name attribute. Changing that to the FQDN and importing the config fixed the problem.
Thank you Colin, these steps helped me solve a similar issue.
This post is truly a gold mine.
I had this issue and found a couple of things
When you register FQDN for the server … otherwise it creates a dummy entry which you need to remove. Also, if you accidentally create another service instance you get an error, which is easy to fix:
PS C:\Users\dwesterdale> Add-SPDistributedCacheServiceInstance
Add-SPDistributedCacheServiceInstance : Cannot start service AppFabricCachingService on computer ‘.’.
At line:1 char:1
+ Add-SPDistributedCacheServiceInstance
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (Microsoft.Share…ServiceInstance:SPCmdletAddDist…ServiceInstance) [Add-SPDistributedCacheServiceInstance], InvalidOperationException
+ FullyQualifiedErrorId : Microsoft.SharePoint.PowerShell.SPCmdletAddDistributedCacheServiceInstance
# now delete the old instance
PS C:\Users\dwesterdale> Remove-SPDistributedCacheServiceInstance
PS C:\Users\dwesterdale> Add-SPDistributedCacheServiceInstance
PS C:\Users\dwesterdale> Start-CacheCluster
Note: this shows an ‘error’ because the host has already started, but I don’t think there is any real issue.
Thereafter I unregister the dummy entry I mentioned above ..
AppFabric service is now happy!
Great post, like Daniel said above – this is gold!
This post really nails this problem. Great. Why microsoft does not have such blogs?
My response would be, that’s why they have the MVP program. It helps them to get the best of the community contributors and reward them by saying, you helped us out and figured this out on our behalf.
Excellent post it helped solve the same exact issue
Solved my problem! Thanks for taking the time to blog about this!
Colin – thank you, the Export-CacheClusterConfig helped immensely. I used the MS powershell to change the Cache Svc from the farm account to a managed service account, and everything was fine until I removed the farm account from local admin – at which point, the Dynamic Cache service crashed continually.
When I ran the export I found that 2 groups were involved: securityProperties>
The managed account was already in the WSS_WPG group but not in WSS_ADMIN_WPG
I added the account and all is well.
That still doesn’t explain why the farm account being in the local admin group would affect the DCS (which was running under another service account).
Great write-up, helped me enormously. I had two cache hosts in the cluster, one of which was starting (forever) and the other down. Exporting the config, manually matching up the service account and port values, and then importing the single file to both hosts did the trick.
Interestingly, my dodgy config came from autospinstaller installation of the service!
Excellent post and great time and life saver 🙂
It helped me understand as well as troubleshoot the problem. My problem was a bit different: one of the hosts in the cluster was suffering from ping loss, and that had put the services down.
Is that okay if I can have only one host running the cache services while I disable the windows service on others?
Thank you very much Colin!!
Hi,
I am working on Windows Server App fabric. I have been trying to add 2nd cache host to my cluster.
What I did : –
I created a cluster and added my local machine as cache host on cache port 22233.
Now i installed appfabric on 2nd machine and while configuring it I joined it with the previous cluster.
When I used the command Get-CacheHost, it showed my machine with service status UP and the 2nd machine with service status UNKNOWN.
Also, when i viewed the config of each host both were shown as lead host.
Please help me in adding 2nd cache host in a cluster.
Hi
I am missing the AppFabric services in SharePoint.
will this help?
They will be installed as part of the sharepoint prerequisites.
Colin,
Excellent Post… Thanks for sharing the knowledge.
This helped me solve a rather perplexing issue!
Hello – thank you for your article which almost helped me. 😉 I have 2 servers in a cluster where one has a status of UNKNOWN. The App Fabric service stops and disables itself immediately on being started. The event viewer shows a failure in KERNELBASE.dll. My errors have resisted your fixes. Any ideas?
1. Get-CacheHostConfig : ErrorCode:SubStatus:The requested name is valid, but no data of the requested type was found
2. Register-CacheHost : ErrorCode:SubStatus:The requested name is valid, but no data of the requested type was found
3. Start-CacheHost : ErrorCode:SubStatus:The requested name is valid, but no data of the requested type was found
I must use the NetBIOS name of the server, so if your server name is over 15 characters, truncate it in the PowerShell commands. Also check you have a DNS entry for the shorter name.
You are a super hero.
Excellent article, thank you very much!
Thank you!
This all requires that your cluster is at least partly running, i.e. there is at least one server that is OK. If that is not the case, say you have a one-server SharePoint farm and the distributed cache is broken on it, then the very first command “Use-CacheCluster” already fails (“Failed to connect hosts in the cluster.”). I’m looking for help on how to rebuild the cluster from absolute scratch, where the original, one and only SharePoint server is gone and you have a clone of it under a different name. (This was *not* my idea.)
Any pointers appreciated,
TIA
Hi! Thank you for guide! It is very helpful!