Hi,
I have a client who has a moderately popular website. The entire site is written in classic ASP, running on a fully patched Windows Server 2008 R2 with IIS 7.5. Occasionally his entire site hangs, causing every HTTP(S) request to that website to time out. He has a subsidiary website running on the same machine (different AppPool) that remains accessible during this time. Recycling, restarting, or stopping and starting the AppPool for the hung website does not help and never has. The only fix is a restart of the w3svc service, which takes between 30 seconds and a full minute to stop during an incident like this. His website is high profile at certain times, and when this happens it is crippling to his business. We have IPMonitor checking the status of the website and restarting the service if needed, but this is a terrible workaround.
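For anyone unfamiliar with that kind of workaround, it amounts to roughly the following. This is only a minimal Python sketch of the idea, not IPMonitor's actual logic: the URL, timeout, failure threshold, and polling interval are placeholder assumptions, and the restart simply shells out to the standard `net stop` / `net start` commands.

```python
import http.client
import subprocess
import time
import urllib.request

SITE_URL = "http://www.example.com/"  # hypothetical placeholder, not the real site
TIMEOUT_SECONDS = 15                   # assumed probe timeout
FAILURES_BEFORE_RESTART = 3            # assumed threshold, so one blip does not trigger a restart


def site_responds(url, timeout, opener=urllib.request.urlopen):
    """Return True if the site answers with a 2xx/3xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def next_failure_count(previous_failures, responded):
    """Reset the counter on success, otherwise count one more consecutive failure."""
    return 0 if responded else previous_failures + 1


def restart_w3svc():
    """Stop and start the W3SVC service (needs an elevated prompt)."""
    # The stop can be slow or report a timeout while the site is hung,
    # so its exit code is deliberately not treated as fatal.
    subprocess.run(["net", "stop", "w3svc"], check=False)
    subprocess.run(["net", "start", "w3svc"], check=True)


def watch(poll_interval=60):
    """Probe the site forever and restart W3SVC after repeated failures."""
    failures = 0
    while True:
        responded = site_responds(SITE_URL, TIMEOUT_SECONDS)
        failures = next_failure_count(failures, responded)
        if failures >= FAILURES_BEFORE_RESTART:
            restart_w3svc()
            failures = 0
        time.sleep(poll_interval)

# To run the watchdog, call watch() from an elevated prompt.
```

It only hides the symptom: every restart also takes down the subsidiary site for the duration.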
One important thing we've noticed is that when search engine crawlers index the site, these hangs happen MUCH more frequently. They have had to restrict all robot activity in order for the site to function normally. The hangs still happen, possibly because of bots that don't abide by the robots.txt file.
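One way to check the bot theory would be to tally requests per user agent from the IIS logs around the time of a hang. Below is a rough Python sketch, assuming the logs are in the W3C extended format with the `cs(User-Agent)` field enabled. The sample lines, IP addresses, and bot name are made up purely for illustration.

```python
from collections import Counter


def count_user_agents(lines):
    """Count requests per user agent in IIS W3C extended log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns that follow; it can reappear mid-file.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives (#Software, #Date, ...)
        row = dict(zip(fields, line.split(" ")))
        agent = row.get("cs(User-Agent)")
        if agent:
            # IIS writes spaces inside the user agent string as '+'.
            counts[agent.replace("+", " ")] += 1
    return counts


# Made-up sample data, not taken from the real server.
sample = [
    "#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status time-taken",
    "2012-01-01 10:00:00 GET /default.asp 192.0.2.10 Mozilla/5.0+(compatible;+ExampleBot/1.0) 200 312",
    "2012-01-01 10:00:01 GET /products.asp 192.0.2.10 Mozilla/5.0+(compatible;+ExampleBot/1.0) 200 9120",
    "2012-01-01 10:00:02 GET /default.asp 192.0.2.25 Mozilla/5.0+(Windows+NT+6.1) 200 140",
]

for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

Against a real log, the list would come from something like `open(path)` on a file under the default `%SystemDrive%\inetpub\logs\LogFiles\W3SVC<site id>\` folder; the location may have been changed on this server.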
The server is a very capable Dell, with two modern hexa-core processors at 2.67 GHz, 48 GB of RAM (most of which is never allocated), and 15k drives in RAID 10 for optimal local storage speeds. For extra storage and cluster data they have an EqualLogic array hooked up at 4 Gbps. The SQL database for this server runs on a secondary machine to reduce load.
If anyone could shed any light on issues they've had or seen, it would be much appreciated. Event logs show absolutely nothing. I've tried increasing the debug level on the IIS logs, but I still see no helpful information. Another crash happened just this morning, so I can still look back at any other areas people happen to suggest.
Thanks