NetTalk Central
NetTalk Web Server => Web Server - Ask For Help => Topic started by: trent on May 02, 2013, 03:57:56 PM
-
Hi All,
Our Nettalk Web Server seems to create a lot of SQL 'Sleeping Connections' which then causes our desktop and web apps to not load the tables tables that have these open connections. Is there any way to clear our or stop these sleeping connections so that the SQL Server Instance does not need to be restarted every few days?
Regards,
Trent
-
Hi Trent,
the sleeping connections are ok and actually make SQL work faster re-using a connection rather than closing them and creating new connections (they do close after a period of inactivity). It's sort of like a Thread Pool.
The problem you are *probably* having is dead-locked connections. You can see this in the view process's screen columns blocking and blocked.
Bruce added a method in NT7 called SetSQLTimeout(filename,net:on/net:off)
You may need to call this method if you are doing a lot of file access in handcode.
HTH's
Kevin
-
Hi Kevin,
Thanks for the reply. I don't have NT7 but I see that it is in NT6 as 'experimental'. Will give it a go.
Regards,
Trent
-
Hi Kevin,
I've just gotten around to testing the SetSQLTimeout method. Would you be able to guide in the right direction of implementing this? Do I have to call this method for every table every time I am accessing the tables in hand code? Can I set this once on each table being used at WebServer init?
Regards,
Trent
-
you use it to "wrap" your actual file access. for example (in NT7);
self.SetSqlTimeout(File,net:On)
Add(file)
self.SetSqlTimeout(File,net:Off)
or
self.SetSqlTimeout(File,net:On)
put(file)
self.SetSqlTimeout(File,net:Off)
you can control the actual timeout period in the WebHandler, ProcessLink method, before parent call.
self.Site.SqlTimeout = 10000
10000 is the default value (ie if the line above is omitted then 10000 is used.)
The template already generates all the above around generated file access code.
Cheers
Bruce
-
Awesome thank you Bruce! Does this work with NT6?
-
try it in NT6 - I think it was added just before NT7.
If not here is some prop:sql I use.
TAGGED{Prop:SQL} = 'Set Lock_Timeout 10000' !wait 10 seconds
TAGGED{Prop:SQL} = 'Select * from TAGGED Where TAG_SessionId = <39>' & clip(p_web.SessionId) & '<39> and TAG_Tagged = 1 and TAG_Module = <39>POT<39>'
TAGGED{Prop:SQL} = 'Set Lock_Timeout -1' !
-
Hi Trent,
yes, I think it's in very late builds of NT6.
cheers
Bruce
-
Do I need to use this on:
- access:TABLE.Fetch
- relate:TABLE.Delete
- and any {prop:sql} statements I'm running as well?
Regards,
Trent
-
That is the safest option to avoid the dreaded lock. The other option is to just track down the code that is locking but that can be really hard and it may take a few goes to plug all the holes.
-
Ok cool, thanks Kevin and Bruce.
-
Since Set Lock_Timeout is connection based, if you set it once early it should persist for the life of the connection.
For discussion: Does it not make sense to set it once as you make the initial connection and never set it back to -1.
When would you Not want it to timeout. (By default), You want it to timeout before the web request times out.
Performance issues due to needless blocking are the worst, And dangling Netttalk threads that have lost their way to
the original client request (because the user gave up or timed out), leave the system in an unhealthy state.
So an abandoned thread that is still blocked, could in-turn block others... (in my experience, the same user who
is simply trying again, and again and again...<sigh>)
thoughts?
-
>> And dangling Netttalk threads that have lost their way to the original client request (because the user gave up or timed out),
I'm not sure this statement is accurate. The thread creates the response, and passes that back to the Server thread for passing to the client. So the thread will complete even if the client has disconnected.
If this wasn't so, then the ram consumed by a server would quickly leak away until the process crashed.
(It certainly is possible for a thread to never end, but the above would not be the cause of it.)
cheers
Bruce
-
> Since Set Lock_Timeout is connection based, if you set it once early it should persist for the life of the connection.
I originally tried to set it in the process link of the webhandler but it did not "seem" to work. Maybe there is a better way to implement with some experimentation but finding the steps to replicate is part of the problem.
-
>>The thread creates the response, and passes that back to the Server thread for passing to the client. So the thread will complete even if the client has disconnected.
My dangling reference is the case where the timeout is -1, the thread/connection to SQL would persist until the blocking is resolved, or the connection is killed. The SQL connection that is BLOCKED - remains, um, dangling!? "Sleeping Connections" are fine, But ones that are BLOCKED and persist for any length of time can create a scenario for continued degradation. In these scenarios the users (or the Browser) will give up, or try again...
MS says "At the beginning of a connection, this setting has a value of -1. After it is changed, the new setting stays in effect for the remainder of the connection." My experience is mainly with NT4 and in practise I set the timeout in two places either when i make my initial connection or sometimes within a stored procedure.
IN NT7 IO statements are preceeded explicitly with a LOCK_timeout, So you address the main issue, but you ALSO EXPLICITLY Set it after your IO to the default(-1) not The Setting that existed before you changed it. Nothing wrong with that so long as one understands the impact. To me this overrides the implied behaviour from MS as now "ALL IO statements must EXPLICITLY set the TIMEOUT, because NETTALK is always resetting the behaviour for the connection", but not for a good reason (to me) because all Nettalk generated IO EXPLICITLY SET a good timeout anyway. (not necessarily true for any other code)
I guess a CAVEAT in the DOCS that all hand code and non-nettalk MSSQL I-O should explicitly set the lock_timeout, would suffice.
In my mind, By default, threads that quietly die are better than threads that quietly live a long life, contributing to the additional Blocking. Hence i believe -1 is the least desireable default behaviour, it should be left at what was last intended. (in nettalks defence for some reason it intends -1 <g>)
KEVIN, the basic BLOCK behavior can be tested using Query analzyer, BEGIN a TRANSACTION, select/update a resources with LOCK, test your app accessing that resource. (ROLLBACK when done). SQL Profiler is handy, it will clearly show each time the SET LOCk_TIMEOUT setting.
-
What about when you are hand-coding a loop through a table? Would you set the timeout on and off for each record accessed or just once for the entire loop?
-
I do it for every file access. eg
TAGGED{Prop:SQL} = 'Set Lock_Timeout 10000' !wait 10 seconds
TAGGED{Prop:SQL} = 'Select * from TAGGED Where TAG_SessionId = <39>' & clip(p_web.SessionId) & '<39> and TAG_Tagged = 1 and TAG_Module = <39>POT<39>'
TAGGED{Prop:SQL} = 'Set Lock_Timeout -1' !
Loop
Next(Tagged)
If Errorcode()
Break
End
If Tag:Tagged = 1
Tag:Tagged = 0
TAGGED{Prop:SQL} = 'Set Lock_Timeout 10000' !wait 10 seconds
Access:TAGGED.Update()
TAGGED{Prop:SQL} = 'Set Lock_Timeout -1' !
End
End
I think this is better than holding the timeout until the end of the loop. It can be problematic if you call a sub routine and you set the timeout again before you set it off or if you forget to set the timeout off. This is just my experience...
Kev
-
OK but is it needed for each 'next()'? i.e.
access:UserLicence.open
access:UserLicence.usefile()
clear(ULI:Record)
ULI:UserID = p_web.GSV('UserID')
set(ULI:KeyUserModuleLevel,ULI:KeyUserModuleLevel)
LOOP
p_web.SetSqlTimeout(UserLicence, net:on) <<<<<<<<here?
if access:UserLicence.next() then break.
if ULI:UserID not = p_web.GSV('UserID') then break.
p_web.SSV(p_web.GSV('UserID') &'-'& clip(ULI:ModuleName),ULI:LicenceLevel)
p_web.SetSqlTimeout(UserLicence, net:Off) <<<<<<<<and here?
end!loop
access:UserLicence.close
Also would you go so far as to including it for the 'open', 'usefile()' and 'close'?
-
if using prop:sql, no because it's in the buffer. not sure about abc access but I "guess" it just grabs 1 record at a time and in that case yes.
however in your example there is a problem where you break out of the loop without setting the timeout off. So I would either set the timeout on/off outside the LOOP or rework you next/break statement so you set off before exiting the loop.
If using ABC I think I just set it on/off outside the loop - seemed to work...
-
Thanks for clearing that up Kevin. I think I'll set it for each record in an ABC loop just in case. Having a specific time limit on the entire loop wouldn't work for some of our queries.
I wonder if there is a more convenient way to do this... will have to research a bit further.
Regards,
Trent
-
Hi All,
I've set all handcoded SQL statements to have a timeout now BUT our WebServer and main application are still getting 'blocked' by these sleeping connections to the point that the WebServer has to be restarted at least 2 or three times a day to allow our main application to work correctly (Win32 browses and update form windows just freeze when a user tries to open the window). This looks really bad for us and shouldn't be happening.
A quick google search suggests that disabling or even killing all active sleeping connections is a bad idea and will cause performance issues, so what else can be done to fix this issue?
Does our Win32 application need to have the SetTimeout property set on every handcoded and non-handcoded SQL statement?
Regards,
Trent
-
Hi Trent,
> getting 'blocked' by these sleeping connections
- sleeping connections don't block apps - just the ones that are locked
> killing all active sleeping connections is a bad idea and will cause performance issues, so what else can be done to fix this issue?
- correct. what you want is to bring up the blocked and blocking process. On the blocking process's see what SQL statement it is running. Also look at what Table is being locked. You can then kill individual connections which sometimes works without killing your app.
> Does our Win32 application need to have the SetTimeout property set on every handcoded and non-handcoded SQL statement?
- I don't believe so.
> I've set all handcoded SQL statements to have a timeout now
- if you know what table is being locked then that narrows things a lot. If you miss resetting a timeout (off) after your code then you will experience locks. My feeling is something is missing somewhere.
> WebServer has to be restarted at least 2 or three times a day to allow our main application to work correctly
- I used to have code placed in strategic places that would run some SQL statements (like a sum or count) on tables normally affected and if it came back with the usual errorcode when a table is locked I would automatically restart the app. I also used Xfiles to save all session data before the restart and then reload it when the app restarted so users could carry on without really knowing there was a problem.
-
Hi Kevin,
>>You can then kill individual connections which sometimes works without killing your app.
Yeah I don't really want to be forever having to log into client's servers to manually kill individual connections - I like making things easy for myself not harder :)
>>My feeling is something is missing somewhere.
I'll go through the entire apps again to check. The environment is two Win32 apps using 2x separate databases (one database merges data into the second database, the 1st app can call procedures in the 2nd apps dlls). The WebServer is used for the 1st Win32 app.
>>I would automatically restart the app.
The WebServer runs on a master server. Including this functionality would mean including more 3rd party programs that can start remotely from other machines on the network (unless NetTalk can do this???).
>>I also used Xfiles to save all session data before the restart and then reload it when the app restarted so users could carry on without really knowing there was a problem.
That's pretty awesome how do you do that?
Regards,
Trent
-
It's all done in the web app...
Bruce gets the credit on the xfiles stuff as I needed his expertise to get it working. I log all my messages to disk which then helps me identify which tables were locked etc. Hope this helps. One app that restarted 5-6 times a day now runs for months on end. Another app my uses used to email me with SWOD (spinning wheel of Death) happened so often the app was almost unusable and caused them (and me) a lot of grief. Both fixed with the timeout stuff. I also run an app that comes with self service connectionchecker that also checks the app is running and if not restarts it.
I had this in it's own source procedure hooked off the indexpage so it regularly checked for locks.
CheckForLocks ROUTINE
p_web.SetSQLTimeout(ADJUST,net:on)
DUMMY{Prop:SQL} = 'Select Count(*) as Dum_Field01 From APPROVAL'
p_web.SetSQLTimeout(ADJUST,net:off)
Next(DUMMY)
If Errorcode()
Case Errorcode()
Of 33
p_web._trace('Lock on APPROVAL - restart errorcode=' & errorcode() & 'error=' & error() & ' ' & clip(glo:dbowner))
Message('Lock on APPROVAL - restart errorcode=' & errorcode() & 'error=' & error() & ' Threads=' & 'wp:NumberOfThreads')
p_web.RequestData.Webserver._wait()
xml1.save(p_web.RequestData.Webserver._SessionQueue,'sessionqueue.xml','table','record')
xml1.save(p_web.RequestData.Webserver._SessionDataQueue,'sessiondataqueue.xml','table','record')
p_web.RequestData.Webserver._release()
Chain(Command(0))
Of 90
p_web._trace('APPROVAL Busy=' & errorcode() & 'error=' & error() & ' ' & clip(glo:dbowner))
Message('APPROVAL Busy=' & errorcode() & 'error=' & error() & ' ' & glo:dbowner)
p_web.RequestData.Webserver._wait()
xml1.save(p_web.RequestData.Webserver._SessionQueue,'sessionqueue.xml','table','record')
xml1.save(p_web.RequestData.Webserver._SessionDataQueue,'sessiondataqueue.xml','table','record')
p_web.RequestData.Webserver._release()
Chain(Command(0))
Else
p_web._trace('APPROVAL Error=' & errorcode() & 'error=' & error() & ' ' & clip(glo:dbowner))
Message('APPROVAL Error=' & errorcode() & 'error=' & error() & ' Threads=' & 'wp:NumberOfThreads')
End
Else
End
***********************************
This stuff goes in the webserver procedure
GlobalErrors.SetProcedureName('WebServer')
! [Priority 800]
Message('Web Server started: ' & LongPath())
xml.TagCase = XF:CaseAny
xml.load(ThisWebServer._SessionQueue,'sessionqueue.xml','table','record')
xml.load(ThisWebServer._SessionDataQueue,'sessiondataqueue.xml','table','record')
***
ThisWindow.Kill PROCEDURE
ReturnValue BYTE,AUTO
! Start of "WindowManager Method Data Section"
! [Priority 5000]
! End of "WindowManager Method Data Section"
CODE
! Start of "WindowManager Method Executable Code Section"
! [Priority 50]
! ThisWebServer._Wait()
xml.save(ThisWebServer._SessionQueue,'sessionqueue.xml','table','record')
xml.save(ThisWebServer._SessionDataQueue,'sessiondataqueue.xml','table','record')
! ThisWebServer._Release()
****
xml.SaveCurrentFieldToXML PROCEDURE (Long p_x,Long p_DimCounter,String p_name)
! Start of "Class Method - Data Section"
! [Priority 5000]
a string(1)
! End of "Class Method - Data Section"
CODE
! Start of "Class Method - Executable Code Section"
! [Priority 50]
! xfiles doesn't do &Strings well (yet), so we need to give it a hand here, when saving.
if lower(p_name) = 'extvalue'
if thisWebServer._SessionDataQueue.ExtValue &= Null
self.currentfield &= a
else
self.currentfield &= thisWebServer._SessionDataQueue.ExtValue
end
end
******
xml.AssignField PROCEDURE (String pString)
! Start of "Class Method - Data Section"
! [Priority 5000]
! End of "Class Method - Data Section"
CODE
! Start of "Class Method - Executable Code Section"
! [Priority 50]
! the incoming queue contains the length of memory set aside for the (bigger) session values.
! and big values are stored in extvalue, not value. Extvalue is a pointer ( &string) so we need to
! give xFiles some assistance here.
! This code will only work if extvaluesize is encountered _before_ extvalue.
case lower(self.CurrentTag)
of 'extvaluesize'
thisWebServer._SessionDataQueue.ExtValueLen = pString ! pString is a parameter to this method, not the pString data type.
thisWebServer._SessionDataQueue.ExtValue &= new(string(thisWebServer._SessionDataQueue.ExtValueLen))
return
of 'extvalue'
if pString <> ''
thisWebServer._SessionDataQueue.ExtValue = pString
end
return
end
*****
-
If a sleep connection is blocking, that indicates that the transaction is still active. I had the same issue and it took a lot of digging and frustration. I looked for where I had opened files and not closed them in my code, make sure you are not doing auto-inc in clarion as this can cause a new connection to start and not close properly, and last but not least make sure if you are using MSSQL that you are using verifyViaSelect. In the end, even after I made sure all my files were getting closed and the auto inc was on the server... the verifyViaSelect solved the last of the issue for me. It seems that when it was using the stored procedure to validate the file structure clarion did not always close the connection after it was done with that task.
To troubleshoot these issues I added a few admin pages that displayed the "Active Transactions", and the "Cursors" with the last statement that generated both. Following is the code for that.
Active Transactions
-----------------------------------------------
Create View v_ActiveTransactions
as
SELECT
[s_tst].[session_id],
[s_es].[login_name] AS [Login Name],
DB_NAME (s_tdt.database_id) AS [Database],
[s_tdt].[database_transaction_begin_time] AS [Begin Time],
[s_tdt].[database_transaction_log_bytes_used] AS [Log Bytes],
[s_tdt].[database_transaction_log_bytes_reserved] AS [Log Rsvd],
[s_est].text AS [Last T-SQL Text],
[s_eqp].[query_plan] AS [Last Plan]
FROM
sys.dm_tran_database_transactions [s_tdt]
JOIN
sys.dm_tran_session_transactions [s_tst]
ON
[s_tst].[transaction_id] = [s_tdt].[transaction_id]
JOIN
sys.[dm_exec_sessions] [s_es]
ON
[s_es].[session_id] = [s_tst].[session_id]
JOIN
sys.dm_exec_connections [s_ec]
ON
[s_ec].[session_id] = [s_tst].[session_id]
LEFT OUTER JOIN
sys.dm_exec_requests [s_er]
ON
[s_er].[session_id] = [s_tst].[session_id]
CROSS APPLY
sys.dm_exec_sql_text ([s_ec].[most_recent_sql_handle]) AS [s_est]
OUTER APPLY
sys.dm_exec_query_plan ([s_er].[plan_handle]) AS [s_eqp]
ORDER BY
[Begin Time] ASC;
GO
Cursors
-------------------------
CREATE View [dbo].[dv_CursorSessions]
as
select login_time,status,last_request_start_time,last_request_end_time,
ses.transaction_isolation_level, ses.lock_timeout,row_count, c.*, t.text
from sys.dm_exec_sessions ses
cross apply sys.dm_exec_cursors(ses.session_id) c
CROSS APPLY sys.dm_exec_sql_text (c.sql_handle) t
where ses.is_user_process=1 and ses.session_id <> @@SPID
GO
-
Hi Kevin and Debra,
Thank you very much for your suggestions. What i'll try is this:
1) Add a 'CheckForLocks' routine to restart the WebServer when required on a timer in the WebServer procedure.
2) Add the VerifyViaSelect to the most heavily used tables using prop:sql and a select statement.
Hi Bruce,
Having 'CheckForLocks' routine as an option would be a great idea for anyone using an MSSQL database with their web app.
Regards,
Trent
-
Hi Debra,
how/where do you set the verifyviaselect?
My apps seem to run fine without it but always keen to explore better options.
Cheers,
Kev
-
I have it on each table in my dictionary in the driver options /verifyviaselect. I tried setting in at the connection in my web handler - process link embed but it does not seem to do the trick there... but the dictionary does.
This also made a significant performance increase when opening forms for one of my wider (many columns) tables.
-
Hi Kevin and Debra,
I've tried quite a few things:
1) Check for locks: This didn't work for me. I had to keep adding more and more tables to check and they never returned any errors even though the WebServer was hanging.
2) Verify Via Select: I've added this into each table of one of the applications and haven't noticed any difference.
3) Killing all sleeping connections that are older than 1x hour: This has worked the best for me. I run this on a timer every 30 minutes. The only problem is that for some reason my WebServer is still GPF'ing at some stage that is rediculously hard to find. This is the code:
KillSleepingConnections ROUTINE
clear(GSYSID:Record)
GloSQLSysID{prop:sql} = 'DECLARE @v_spid INT '&|
'DECLARE c_Users CURSOR '&|
' FAST_FORWARD FOR '&|
' SELECT SPID '&|
' FROM master..sysprocesses (NOLOCK) '&|
' WHERE spid>50 '&|
' AND status=''sleeping'' '&|
' AND DATEDIFF(mi,last_batch,GETDATE())>=60 '&|
' AND spid<>@@spid '&|
'OPEN c_Users '&|
'FETCH NEXT FROM c_Users INTO @v_spid '&|
'WHILE (@@FETCH_STATUS=0) '&|
'BEGIN '&|
' EXEC(''KILL ''+@v_spid) '&|
' FETCH NEXT FROM c_Users INTO @v_spid '&|
'END '&|
'CLOSE c_Users '&|
'DEALLOCATE c_Users;'
debug(GloSQLSysID{prop:sql})
next(GloSQLSysID)
if errorcode() = 90
debug('Error on KillSleepingConnections, fileerror('& FILEERROR() &') fileerrorcode('& FILEERRORCODE() &')')
ELSE
debug('no error on KillSleepingConnections')
end!If
4) Restart procedure to run every 12x hours: This I have just tried this morning. It's a basterdisation of the 'Check for Locks' code where the procedure doesn't check for locked tables, it will just restart the WebServer. There is a problem where the session data is not being saved into an XML file. The XML file is not even created. Here is the code:
debug('RestartWebServer')
p_web.RequestData.Webserver._wait()
debug('Saving _SessionQueue to: '& clip(GBL:WorkDirectory) &'sessionqueue.xml')
xml1.save(p_web.RequestData.Webserver._SessionQueue,clip(GBL:WorkDirectory) &'sessionqueue.xml','table','record')
if xml1.Error
debug('XML Error - '& xml1.Error &' '& xml1.ErrorStr)
end!if
debug('Saving _SessionDataQueue to: '& clip(GBL:WorkDirectory) &'sessiondataqueue.xml')
xml1.save(p_web.RequestData.Webserver._SessionDataQueue,clip(GBL:WorkDirectory) &'sessiondataqueue.xml','table','record')
if xml1.Error
debug('XML Error - '& xml1.Error &' '& xml1.ErrorStr)
end!if
p_web.RequestData.Webserver._release()
if exists(clip(GBL:WorkDirectory) &'sessionqueue.xml') and exists(clip(GBL:WorkDirectory) &'sessiondataqueue.xml')
debug(Command('0') & command(''))
Chain(Command('0') & command(''))
ELSE
debug('an xml file does not exist - restart cancelled')
end!If
What I see in the debug is the first line 'Restart WebServer' then the Chain Command. I don't see any debug for the XML saving at all. Why wouldn't this debug be showing? (debug outputs to DebugView)
Regards,
Trent
-
Make sure you are not just checking for sleeping connections. Look for Active Transactions. Look at the code I (debraballenger) posted earlier. Adding this as a view will let you easily see what code was executed to create that connection in most cases... sometimes it just tells you the cursor. In the cases that it shows the cursor you are generally safe to kill the transaction without it killing your app. It is actually important for you to understand if you have sleeping connections or active transactions. If they are active transactions killing the connection causes a roll back and you loose the data changes associated with that transaction. So make sure you try to trace down any active transactions by looking at the statements that generated them.
I ended up writing a separate app that I can view stats of my main website, namely the historical response time, and memory utilized so I can spot spikes. And most importantly the active transactions using the view. I filter the active transactions to show if the transaction has been open for more than 2 minutes. The other things I have done is to turn on the logging that is found in the Multi Host app, and write a log of when my app restarts. With the above history I can see what all was happening on the server at the time of the freeze. One scenario that this helped me track down was in the log from the multi host app I could see that a request was sent to a particular field on a form, the server never sent back a response to the browser and the app hung. So I knew exactly which field and form was causing my non responsive app and fixed it. You can do the same thing with the active transactions based on the transaction start time, you will know what was happening on the server to create that transaction.
So in summary, create an admin site for yourself... the history it creates will save you a lot of frustration of tracking down impossible bugs.