I received the following message in the online.log:

18:47:48 Assert Failed: Memory free block header corruption detected in
mt_shm_malloc_segid 1
18:47:48 Informix Dynamic Server Version 7.31.FC7XS
18:47:48 Who: Session(8013, informix@sappd3, 17925, 1541177696)
Thread(9538, sqlexec, c0000000d5697cc0, 3)
File: mtshpool.c Line: 2649
18:47:48 Results: Unable to repair pool
18:47:48 Action: Please notify Informix Technical Support.
18:47:48 stack trace for pid 8561 written to /local/db_tmp/af.292aa913
18:48:38 See Also: /local/db_tmp/af.292aa913, shmem.292aa913.0

After the error, many user connections were blocked, though new connections
were still accepted. 'onstat -g ath' showed the blocked sqlexec threads in
"mutex wait nsfock". I tried to terminate the blocked sessions, but every
attempt failed. I then tried to shut down the engine with 'onmode -yuk',
but it just sat there. I finally had to kill the oninit process and clean
up the shared memory.
>> Question 1: What is nsfock?
>> Question 2: Is there any way to safely terminate a thread that is
blocked on nsfock?

Here's some information from the dump file:

18:47:48 Found during mt_shm_malloc_segid 1
18:47:48 Pool '8013' (0xc00000019478c028)
18:47:48 Bad free block 0xc000000194798f90
....
18:47:48 Found during recover_pool_bad_free_block 2
18:47:48 Pool '8013' (0xc00000019478c028)
18:47:48 Bad free block 0xc0000001947a1d98
....
18:47:48 Multiple block errors found
18:47:48 Informix Dynamic Server Version 7.31.FC7XS Software Serial
Number ACN#xxxxxxxx
18:47:48 Assert Failed: Memory free block header corruption detected in
mt_shm_malloc_segid 1
18:47:48 Who: Session(8013, informix@sappd3, 17925, 1541177696)
Thread(9538, sqlexec, c0000000d5697cc0, 3)
File: mtshpool.c Line: 2649
18:47:48 Results: Unable to repair pool
18:47:48 Action: Please notify Informix Technical Support.
18:47:48 Stack for thread: 9538 sqlexec

base: 0xc00000019492e000
len: 270336
pc: 0x0000000000000000
tos: 0xc000000194932180
state: running
vp: 3

( 0) 0x40000000004eca3c legacy_hp_afstack + 0x24c
[/informix/PRD/bin/oninit]
( 1) 0x40000000004ebf48 afstack + 0x68 [/informix/PRD/bin/oninit]
( 2) 0x40000000004eb32c afhandler + 0x644 [/informix/PRD/bin/oninit]
( 3) 0x40000000004eac04 affail_interface + 0x54
[/informix/PRD/bin/oninit]
( 4) 0x40000000004df5d0 recover_pool_bad_free_block + 0x220
[/informix/PRD/bin/oninit]
( 5) 0x40000000004dbcfc mt_shm_malloc_segid + 0x214
[/informix/PRD/bin/oninit]
( 6) 0x40000000004db940 mt_shm_malloc + 0x80 [/informix/PRD/bin/oninit]
( 7) 0x40000000004dd328 mt_shm_realloc + 0x208
[/informix/PRD/bin/oninit]
( 8) 0x40000000004a8638 deccvt + 0x3b0 [/informix/PRD/bin/oninit]
( 9) 0x40000000004a6ef4 dectoasc + 0x94 [/informix/PRD/bin/oninit]
(10) 0x4000000000491100 rvalstr + 0x498 [/informix/PRD/bin/oninit]
(11) 0x40000000001c85ec smi_printconst + 0xfc
[/informix/PRD/bin/oninit]
(12) 0x40000000001c7f3c smi_printexpr + 0xe34
[/informix/PRD/bin/oninit]
(13) 0x40000000001c7ea8 smi_printexpr + 0xda0
[/informix/PRD/bin/oninit]
(14) 0x40000000001c6b7c smi_printfilter + 0xe4
[/informix/PRD/bin/oninit]
(15) 0x40000000001c4d58 smi_opexplain + 0x3a8
[/informix/PRD/bin/oninit]
(16) 0x40000000001c984c smi_storeexplainflags + 0x84
[/informix/PRD/bin/oninit]
(17) 0x40000000001c3eb4 prconblock + 0x364 [/informix/PRD/bin/oninit]
(18) 0x40000000003b761c pstread + 0x8ac [/informix/PRD/bin/oninit]
(19) 0x4000000000305a44 pst_rsread + 0x56c [/informix/PRD/bin/oninit]
(20) 0x40000000003066dc rsread + 0xabc [/informix/PRD/bin/oninit]
(21) 0x4000000000555b14 fmread + 0x63c [/informix/PRD/bin/oninit]
(22) 0x4000000000151984 sqisread + 0x2c [/informix/PRD/bin/oninit]
(23) 0x400000000015aa64 readidx + 0x3e4 [/informix/PRD/bin/oninit]
(24) 0x4000000000159ea8 gettupl + 0x2d8 [/informix/PRD/bin/oninit]
(25) 0x4000000000157ccc scan_next + 0x204 [/informix/PRD/bin/oninit]
(26) 0x40000000002a742c inner_next + 0x1ac [/informix/PRD/bin/oninit]
(27) 0x40000000002a6c9c join_next + 0xf4 [/informix/PRD/bin/oninit]
(28) 0x40000000002a6dc8 join_next + 0x220 [/informix/PRD/bin/oninit]
(29) 0x400000000015d32c filltemp + 0x3ac [/informix/PRD/bin/oninit]
(30) 0x40000000001578c0 scan_open + 0x2e8 [/informix/PRD/bin/oninit]
(31) 0x40000000002a82bc group_open + 0x434 [/informix/PRD/bin/oninit]
(32) 0x400000000016bfe8 sort_open + 0x90 [/informix/PRD/bin/oninit]
(33) 0x400000000015eff8 prepselect + 0x5e0 [/informix/PRD/bin/oninit]
(34) 0x400000000020ede8 open_cursor + 0x468 [/informix/PRD/bin/oninit]
(35) 0x400000000020e8b8 sq_open + 0x58 [/informix/PRD/bin/oninit]
(36) 0x4000000000221978 sqmain + 0x100 [/informix/PRD/bin/oninit]
(37) 0x40000000004c8a88 startup + 0xd8 [/informix/PRD/bin/oninit]
(38) 0x40000000004e11fc resume + 0x10c [/informix/PRD/bin/oninit]

....
===========------------- - - - - - -
/informix/PRD/bin/onstat -g ses 8013:

Informix Dynamic Server Version 7.31.FC7XS -- On-Line -- Up 2 days
19:51:06 -- 5041920 Kbytes

session #RSAM total used
id user tty pid hostname threads memory memory
8013 informix - 17925 sappd3 1 450560 442504

tid name rstcb flags curstk status
9538 sqlexec c0000000d5697cc0 ---PR-- 254960
c0000000d5697cc0running

Memory pools count 1
name class addr totalsize freesize #allocfrag #freefrag
Changing data structure forced command termination.

....
===========------------- - - - - - -
/informix/PRD/bin/onstat -g sql 8013:

Informix Dynamic Server Version 7.31.FC7XS -- On-Line -- Up 2 days
19:51:06 -- 5041920 Kbytes

Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmt type Database Lvl Mode ERR ERR Vers
8013 SELECT sysmaster DR Not Wait 0 0 7.31

Current statement name : unlcur

Current SQL statement :
select sqx_sessionid, max(substr(sqx_selflag,4)), max(sqx_estcost),
max(sqx_estrows) from syssqexplain where sqx_sessionid in
(53,54,92,93,96,161,235,257,258,264,280,290,291,295,307,309,334,341,346,383,
390,394,459,483,502,511,515,575,584,587,624,639,642,694,695,707,743,750,777,
805,812,997,1133,1308,1410,3365,3441,4058,4114,4437,4456,4503,4530,4539,
4563,4705,4830,5110,6071,6071,6071,7712) and sqx_iscurrent="Y"
and sqx_ismain="Y" group by 1 order by 1

Last parsed SQL statement :
select sqx_sessionid, max(substr(sqx_selflag,4)), max(sqx_estcost),
max(sqx_estrows) from syssqexplain where sqx_sessionid in
(53,54,92,93,96,161,235,257,258,264,280,290,291,295,307,309,334,341,346,383,
390,394,459,483,502,511,515,575,584,587,624,639,642,694,695,707,743,750,777,
805,812,997,1133,1308,1410,3365,3441,4058,4114,4437,4456,4503,4530,4539,
4563,4705,4830,5110,6071,6071,6071,7712) and sqx_iscurrent="Y"
and sqx_ismain="Y" group by 1 order by 1
>> Question 3: Could there have been a stack overflow that wasn't caught?
'onstat -c' had STACKSIZE = 256 (262144)
'onstat -g ses' had curstk = 254960
'onstat -g stk' had len = 270336
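
To make the three figures easier to compare, here is a small arithmetic
sketch in Python. The variable names are made up for illustration. It
assumes STACKSIZE is expressed in KB (which matches the 262144 shown next
to it) and takes base, len and tos from the stack dump quoted earlier. It
only computes differences; whether curstk means bytes in use or something
else is not established here, so this does not by itself show that an
overflow happened.

```python
# Figures copied from the output quoted in this post.
STACKSIZE_KB = 256          # 'onstat -c' (assumed to be in KB)
curstk = 254960             # 'onstat -g ses'
stk_len = 270336            # 'onstat -g stk' / stack dump "len"
base = 0xc00000019492e000   # stack dump "base"
tos = 0xc000000194932180    # stack dump "tos"

# Convert the configured size to bytes and compare it with the other values.
stacksize_bytes = STACKSIZE_KB * 1024
extra_in_len = stk_len - stacksize_bytes         # len beyond STACKSIZE
stacksize_minus_curstk = stacksize_bytes - curstk
tos_offset = tos - base                          # distance of tos from base
tos_inside_stack = 0 <= tos_offset < stk_len

print("STACKSIZE in bytes:  ", stacksize_bytes)
print("len - STACKSIZE:     ", extra_in_len)
print("STACKSIZE - curstk:  ", stacksize_minus_curstk)
print("tos - base:          ", tos_offset)
print("tos within base+len: ", tos_inside_stack)
```

So len is 8192 bytes larger than STACKSIZE, curstk is 7184 bytes below
STACKSIZE, and tos sits 16768 bytes above base, inside the allocated range.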
>> Question 4: The query above is used by a monitoring program, and is run
frequently. Is there something wrong with the syntax?
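
One thing visible in the statement itself is that session id 6071 appears
three times in the IN list. Whether that has anything to do with the assert
failure is unknown. If the monitoring program builds the list from its own
session table, a de-duplication step along the following lines would remove
the repeats. This is only a sketch in Python; the function name is
hypothetical and the real program's language and structure are not known.

```python
def build_explain_query(session_ids):
    """Return the monitoring SELECT with a de-duplicated, sorted IN list."""
    # int() rejects anything that is not a plain session number.
    unique_ids = sorted({int(s) for s in session_ids})
    if not unique_ids:
        # An empty "in ()" list would not be valid SQL.
        raise ValueError("need at least one session id")
    in_list = ",".join(str(s) for s in unique_ids)
    return (
        "select sqx_sessionid, max(substr(sqx_selflag,4)), "
        "max(sqx_estcost), max(sqx_estrows) "
        "from syssqexplain "
        "where sqx_sessionid in (" + in_list + ") "
        'and sqx_iscurrent="Y" and sqx_ismain="Y" '
        "group by 1 order by 1"
    )


# The tail of the list from the statement above, including the repeats.
query = build_explain_query([5110, 6071, 6071, 6071, 7712])
print(query)
```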

Comments are appreciated,
Tim


sending to informix-list