Is 5.0.7 ready for production? - SCO

  1. #1

    Default Is 5.0.7 ready for production?

    I am about to do some new/upgrade proposals and think I want to use 5.0.7,
    but I am concerned about stability, licensing, and hardware hassles. I just
    googled for "5.0.7", range June - today, I see quite a few requesting
    assistance with problems.



    Did I get the wrong impression from those posts? That perhaps 5.0.7 comes
    with some brand new headaches, even if one does fine on 5.0.5 and 5.0.6?
    Maybe this is a stupid question (to publicly ask "Is 5.0.7 OK?"), or if
    answering a stupid question like this might imply the one who answers is as
    stupid as the one asking the questions? If that is the case, this post will
    get no replies of course.



    Should I just get some certified hardware and go for 5.0.7 just like I would
    5.0.6?


    Bob Guest

  2. #2

    Default Re: Is 5.0.7 ready for production?

    In article <bj8jb9$g2ph7$news.uni-berlin.de> "Bob Meyers" <com> writes:
    $I am about to do some new/upgrade proposals and think I want to use 5.0.7,
    $but I am concerned about stability, licensing, and hardware hassles. I just
    $googled for "5.0.7", range June - today, I see quite a few requesting
    $assistance with problems.

    I think you'll also find a number of people having problems with
    older versions, too. Your caution regarding using a new release
    of any software is wise, though; I guess you've been bitten by
    this too many times in the past :-)

    I have only one 5.0.7 system at the moment; I also have a few 5.0.4
    through 5.0.6 systems at various locations. It would be inappropriate to
    make a blanket statement regarding the stability of any hardware or
    software based on a sample size of one.

    The 5.0.7 system reboots spontaneously after being up for anywhere
    from a day or two to a few weeks. This system replaced a completely
    different box, running 5.0.6, that suddenly developed stability issues
    earlier this year, so while we originally suspected hardware
    problems (or power problems, as their UPS was not working at that
    point - they assure me it's fixed now), it now looks likely that
    there is some other issue, entirely outside the server hardware
    and software, that is causing the problem; we haven't been able
    to track it down yet.

    The 5.0.7 system also has an odd issue in which the hardware
    clock occasionally jumps exactly four hours forward. The setclk
    line in root's crontab was causing this, so I commented it out;
    it still happens on some, though not all, reboots, and I haven't
    figured out the pattern yet. I think I've finally programmed
    their local sysadmin to check and correct the time if he
    reboots the system, but obviously if the system reboots when he's
    not there, he can't do this (and since their other servers,
    including production Web servers, rely on this server, it's
    preferable for it to come back up quickly, even if the time is
    wrong, than to stay down until the next business day).

    Other than that, it's been fine. As for the other systems,
    there's one that gets rebooted occasionally by the client;
    whenever anything is wrong with either it or their other server,
    they usually just reboot both servers, regardless of what the
    problem is or which server was having problems, and without
    making any notes about what is wrong or what error message they
    might have seen. Since they don't generally tell me when they've
    done this, I don't know if this system is more or less stable
    than the 5.0.7 system. I used to have a 5.0.0 system that was
    more stable than this and, over the years, became less stable
    than this. My other 5.0.4/5/6 systems are quite stable, as
    were some 5.0.2 systems I used to have.

    So the only 5.0.7 issue I have that doesn't seem to be
    outside the scope of the OS is the time thing, and that's a
    pretty minor problem for most people. Including the reboot
    problem that I suspect is not the fault of 5.0.7 or of the
    hardware, and it's near (but not necessarily at) the bottom
    of the stability range I've seen with other OSR5 versions;
    excluding the reboot problem, since it's probably not 5.0.7's
    fault, its stability is near the top of the range.
    --
    Stephen M. Dunn <ca>
    ------------------------------------------------------------------
    Say hi to my cat -- http://www.stevedunn.ca/photos/toby/
    Stephen Guest

  3. #3

    Default Re: Is 5.0.7 ready for production?

    "Stephen M. Dunn" wrote:
    -snip- 

    -snip-
    > ------------------------------------------------------------------
    > Say hi to my cat -- http://www.stevedunn.ca/photos/toby/

    And here I thought I had set the time wrong!

    Also if you are updating an 'older' Compaq ProLiant that requires the
    EFS548 version it generates the following error on 5.0.7:

    i386ld: Symbol pci-debug in
    /var/opt/K/SCO/link/1.1.1Hw/etc/conf/pack.d/cpqw/Driver.o
    is multiply defined.

    first defined in
    /var/opt/K/SCO/link/1.1.1Hw/etc/conf/pack.d/pci/Driver.o
    ERROR: Can not link-edit unix


    Obviously the EFS install fails and you don't get the required drivers.
    HP/Compaq is not showing any updated EFS although I did report this
    March 12th. Normally by now I would have updated a portion of my client
    base and would have something to report. The problem now is that we
    do not want to be supporting two versions of O/S, so have left most
    people on 5.0.6.

    I am testing Progress 9.1b on 5.0.7, with a ProLiant ML530G3. So far
    the system has locked up about 6 times with the database running, no
    panic dumps as the system is completely locked. The server runs fine
    for days without Progress running, locks up in a few hours after
    the database is started. No idea why yet.

    Mike

    --
    Michael Brown

    The Kingsway Group
    Mike Guest

  4. #4

    Default Re: Is 5.0.7 ready for production?

    Mike Brown wrote:
     

    "pci_debug", actually. Which shows that you're typing this rather than
    cut-and-paste or redirecting to a file, tsk.
     

    You can fix the above problem by patching the variable name in
    pci/Driver.o to remove the conflict. The variable is internal to
    pci/Driver.o, so its name can be changed without a matching change
    elsewhere. The variable in Compaq's driver is internal to it. We just
    have to resolve the name collision:

    # cd /etc/conf/pack.d/pci
    # cp -p Driver.o Driver.o.orig
    # strings -a -o Driver.o | grep -i debug
    18790 pci_debug
    # bs=1 count=9 oseek=18790 of=Driver.o
    9+0 records in
    9+0 records out
    # strings -a -o Driver.o | grep -i debug
    18790 pci_Debug
    # cd /etc/conf/cf.d
    # ./link_unix
     

    Is this without database activity?? Doesn't the Compaq watchdog stuff
    kick in and reboot the system? (Not that that would be much better, but
    it's odd to actually _hang_ a system with a watchdog in it...)
     
    Bela Guest

  5. #5

    Default Re: Is 5.0.7 ready for production?

    Bela Lubkin typed (on Fri, Sep 05, 2003 at 10:06:50PM +0000):
    |
    | You can fix the above problem by patching the variable name in
    | pci/Driver.o to remove the conflict. The variable is internal to
    | pci/Driver.o, so its name can be changed without a matching change
    | elsewhere. The variable in Compaq's driver is internal to it. We just
    | have to resolve the name collision:
    |
    | # cd /etc/conf/pack.d/pci
    | # cp -p Driver.o Driver.o.orig
    | # strings -a -o Driver.o | grep -i debug
    | 18790 pci_debug
    | # bs=1 count=9 oseek=18790 of=Driver.o

    Hmm... missing echo and dd here?

    # echo pci_debug | dd bs=1 count=9 oseek=18790 of=Driver.o

    | 9+0 records in
    | 9+0 records out
    | # strings -a -o Driver.o | grep -i debug
    | 18790 pci_Debug
    | # cd /etc/conf/cf.d
    | # ./link_unix

    --
    JP
    Jean-Pierre Guest

  6. #6

    Default "pci_debug" symbol conflict, Re: Is 5.0.7 ready for production?

    Jean-Pierre Radley wrote:
     

    Yep. Don't know what went wrong. It's wrong in my outgoing log, so it
    must have been some sort of typo/braino.
     

    Almost:

    # echo pci_Debug | dd conv=notrunc bs=1 count=9 oseek=18790 of=Driver.o

    -- the whole point was to _change_ the name; and "conv=notrunc" is
    necessary or `dd` truncates the file.

    I also meant to point out that the seek offset was from the `strings`
    output, might be different on another system. I must have been asleep.
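
The corrected command can be generalized so that the offset is looked up rather than typed by hand. This is a minimal sketch, assuming GNU grep (for `-a -b -o`) and a `dd` that spells the output seek as `seek=`; the SCO `dd` in the thread uses `oseek=`, and `patch_symbol` is a hypothetical helper name, not something from the posts:

```shell
# Hypothetical helper: rename a symbol string inside a binary, in place.
# Assumes GNU grep (-a -b -o) and a dd that accepts seek= and conv=notrunc.
patch_symbol() {
    file=$1
    old=$2
    new=$3
    # The replacement must be the same length, or later bytes would shift.
    [ "${#old}" -eq "${#new}" ] || { echo "length mismatch" >&2; return 1; }
    # Byte offset of the first occurrence, e.g. "18790:pci_debug" -> 18790.
    offset=$(grep -abo -F -- "$old" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$offset" ] || { echo "symbol not found" >&2; return 1; }
    # Keep a copy of the original, as the thread does with Driver.o.orig.
    cp -p "$file" "$file.orig"
    # conv=notrunc keeps the rest of the file; without it dd truncates.
    printf '%s' "$new" | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}
```

On the system discussed here, the equivalent call would be `patch_symbol Driver.o pci_debug pci_Debug` from /etc/conf/pack.d/pci, followed by `./link_unix` in /etc/conf/cf.d. Looking the offset up each time avoids reusing 18790 on a system where it differs.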
     
     
    Bela Guest

  7. #7

    Default Re: Is 5.0.7 ready for production?

    Bela Lubkin wrote: 
    >
    > "pci_debug", actually. Which shows that you're typing this rather than
    > cut-and-paste or redirecting to a file, tsk.
    >

    Actually worse than that, it was from memory.

     
    >
    > You can fix the above problem by patching the variable name in
    > pci/Driver.o to remove the conflict. The variable is internal to
    > pci/Driver.o, so its name can be changed without a matching change
    > elsewhere. The variable in Compaq's driver is internal to it. We just
    > have to resolve the name collision:
    >
    > # cd /etc/conf/pack.d/pci
    > # cp -p Driver.o Driver.o.orig
    > # strings -a -o Driver.o | grep -i debug
    > 18790 pci_debug
    > # bs=1 count=9 oseek=18790 of=Driver.o
    > 9+0 records in
    > 9+0 records out
    > # strings -a -o Driver.o | grep -i debug
    > 18790 pci_Debug
    > # cd /etc/conf/cf.d
    > # ./link_unix
    >

    I will try this on Sunday, noting the posts from JPR.

    .. 
    >
    > Is this without database activity?? Doesn't the Compaq watchdog stuff
    > kick in and reboot the system? (Not that that would be much better, but
    > it's odd to actually _hang_ a system with a watchdog in it...)


    Spent six hours on that server last night and this morning. I linked in
    the scodb, but after a lockup there is no keyboard input accepted ( not
    even CAPS-LOCK or NUM-LOCK toggle the leds ). Yes, the Compaq watchdog
    does kick in and reboot the server if I wait. After a reboot I can get
    Progress to start up, but if I wait a while the machine locks instantly
    when the database goes to start. With a bit more debugging I think
    there may be a relationship between a consumption of streams resources
    as reported in "netstat -m" under "streams memory in use" ( SMiU ) and
    the system locking up. With just a root login on tty01 running netstat
    and a graphic login on tty02 all is well. If I fire up mozilla, let
    it bring up the sco home page, then watch the SMiU it slowly
    climbs up. Looks like it will go from ~180k to 4MB in 11 minutes.
    Starting Progress at that point, or even using Mozilla to go to more
    web pages instantly locks the system. I repeated the test 3 times.

    If I just bring up Progress ( no graphical login ) the consumption is much
    slower, maybe 2k per minute. The server had frozen up and the Compaq
    watchdog rebooted it during the week, after 35.5 hours. As far as I can
    tell there was very little database use, the Progress server was just brought
    up and left running.

    The HW is a ProLiant ML530, single 2.4GHz Xeon with hyperthreading off,
    1536MB of RAM, and a 6400 RAID controller. The NIC is a Broadcom with
    driver version 6.0.129 embedded on the system board. I installed an
    Intel PRO100 card to replace the BCME, which I disabled in the BIOS
    and in netconfig. There was no change in symptoms, or speed of the
    consumption of SMiU.

    The SW is 5.0.7 with OSS656B and EFS5.60a ( which is current ).
    I updated the system to osr507mp1 and retested, but no change.

    I can ftp or rcp without any problem, copied 8GB to the machine without
    any apparent residual increase in SMiU, but after running mozilla for
    a few minutes the next attempt at copying data piped through a rcmd
    froze the system.

    The machine is not in production at all, it is just for compatibility
    testing at this point. Any ideas?


    --
    Michael Brown

    The Kingsway Group
    Mike Guest

  8. #8

    Default Re: Is 5.0.7 ready for production?

    Mike Brown wrote:
     
    > >
    > > "pci_debug", actually. Which shows that you're typing this rather than
    > > cut-and-paste or redirecting to a file, tsk.
    >
    > Actually worse than that, it was from memory.

    Well in that case you did a pretty good job...
     
    > >
    > > Is this without database activity?? Doesn't the Compaq watchdog stuff
    > > kick in and reboot the system? (Not that that would be much better, but
    > > it's odd to actually _hang_ a system with a watchdog in it...)
    >
    > Spent six hours on that server last night and this morning. I linked in
    > the scodb, but after a lockup there is no keyboard input accepted ( not
    > even CAPS-LOCK or NUM-LOCK toggle the leds ). Yes, the Compaq watchdog
    > does kick in and reboot the server if I wait.

    I think there's a way to get the watchdog to trip into scodb rather than
    reboot -- some sort of software setting in the "cpqw" driver or
    something like that. If not, two other routes would be to install an
    NMI card in the machine (or it might have that built in -- so that's two
    questions to ask a knowledgeable Compaq/HP tech); or try a serial
    console. Serial console is really easy. See a recent post from me on
    "5.0.7 machine locks up!", where I show how to do a temporary
    (single-session) serial console.
     

    That's not exactly "slow" for a resource that is normally sized in the
    neighborhood of 4MB.

    Is this the Mozilla build that came with OSR507? I haven't heard
    anything about it causing STREAMS leaks (and normally it would be
    difficult for a user-level program to cause a STREAMS leak).

    A leak usually shows up entirely in one particular STREAMS buffer size.
    Do you see that -- exceptionally high usage of one size, normal use of
    others? (I'm primarily interested in the "alloc" column.)
     

    We're looking at at least two distinct bugs here, probably three.
    Neither Mozilla nor Progress should cause STREAMS leaks; at worst, they
    might cause a burst of consumption at startup time, leveling off after
    reaching a steady state. There must be a kernel bug which reacts with
    some operation they're doing to cause the leak. Then, whatever it is,
    it's probably a Mozilla bug that it does it so _vigorously_.

    And finally, the system shouldn't lock up when it runs out of STREAMS.
    Normally it would produce console warnings, and various things that use
    STREAMS (mainly networking) would start failing. At a guess, some
    driver has a continual requirement for STREAMS blocks, and mishandles an
    error return very badly. Again guessing, from what you've said the
    Compaq EFS is the only added kernel code, so it probably contains the
    driver that converts a resource crisis into a hang.

    You could experiment with linking out various Compaq EFS drivers. I
    know some of them can be removed individually, some go in groups. (By
    "remove" I mean "turn off in link kit", not actually removing the
    software, unless it's too hard to figure out how to turn them off
    manually...)

    Is any part of the EFS required for the machine to work? Probably the
    RAID driver; anything else? Maybe the thing to do is retract all of the
    EFS that you don't absolutely need, then see which parts still persist.
    Do Mozilla and Progress still leak STREAMS? Does the system still hang
    when hitting the high water mark?
     
    Bela Guest

  9. #9

    Default Re: Is 5.0.7 ready for production?

    Bela Lubkin wrote: 

    I think there is a NMI switch built in, I will try it.
     
    >
    > That's not exactly "slow" for a resource that is normally sized in the
    > neighborhood of 4MB.
    >
    > Is this the Mozilla build that came with OSR507? I haven't heard
    > anything about it causing STREAMS leaks (and normally it would be
    > difficult for a user-level program to cause a STREAMS leak).
    >

    It is the standard Mozilla, so far it's just a plain jane install.

     

    You are right, the class 6, 2048 bytes column is at 836 alloc, everything
    else is at 0 or 1.

    Just the database server was started 20 hours ago, no actual use yet, and
    SMiU is up to 1725.85KB. A bit less than 2K per minute.
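
These figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the post; the function names are made up for illustration:

```shell
# usage: leak_rate TOTAL_KB HOURS -> average growth in KB per minute
leak_rate() {
    awk -v kb="$1" -v h="$2" 'BEGIN { printf "%.2f\n", kb / (h * 60) }'
}

# usage: blocks_kb COUNT BLOCK_BYTES -> KB held by COUNT buffers
blocks_kb() {
    awk -v n="$1" -v b="$2" 'BEGIN { printf "%.1f\n", n * b / 1024 }'
}

leak_rate 1725.85 20    # streams memory in use after 20 hours -> 1.44
blocks_kb 836 2048      # class 6 buffers reported as allocated -> 1672.0
```

The 836 class 6 buffers account for about 1672 KB of the 1725.85 KB in use, and the average rate works out to roughly 1.44 KB per minute, which agrees with "a bit less than 2K per minute".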
     
    >
    > We're looking at at least two distinct bugs here, probably three.
    > Neither Mozilla nor Progress should cause STREAMS leaks; at worst, they
    > might cause a burst of consumption at startup time, leveling off after
    > reaching a steady state. There must be a kernel bug which reacts with
    > some operation they're doing to cause the leak. Then, whatever it is,
    > it's probably a Mozilla bug that it does it so _vigorously_.
    >
    > And finally, the system shouldn't lock up when it runs out of STREAMS.
    > Normally it would produce console warnings, and various things that use
    > STREAMS (mainly networking) would start failing. At a guess, some
    > driver has a continual requirement for STREAMS blocks, and mishandles an
    > error return very badly. Again guessing, from what you've said the
    > Compaq EFS is the only added kernel code, so it probably contains the
    > driver that converts a resource crisis into a hang.
    >

    It would not be the first time.


    I think the raid driver ( CISS ) may be the only item I need, so on Monday
    I will start the debugging process by removing casmd and cevtd.


    --
    Michael Brown

    The Kingsway Group
    Mike Guest

  10. #10

    Default ML530/STREAMS leak, Re: Is 5.0.7 ready for production?

    Mike Brown wrote:
     
    > >
    > > That's not exactly "slow" for a resource that is normally sized in the
    > > neighborhood of 4MB.
    > >
    > > Is this the Mozilla build that came with OSR507? I haven't heard
    > > anything about it causing STREAMS leaks (and normally it would be
    > > difficult for a user-level program to cause a STREAMS leak).
    >
    > It is the standard Mozilla, so far it's just a plain jane install.

    >
    > You are right, the class 6, 2048 bytes column is at 836 alloc, everything
    > else is at 0 or 1.
    >
    > Just the database server was started 20 hours ago, no actual use yet, and
    > SMiU is up to 1725.85KB. A bit less than 2K per minute.

    As an experiment, bring the machine up, bring up Progress and/or Mozilla
    (probably worthwhile to try this with each, individually); then mark the
    network as down:

    ifconfig net0 down # or net1

    _then_ disconnect the ethernet cable. Then leave the machine alone for
    a while, see if STREAMS leak. Which should take a few minutes with
    Mozilla, maybe an hour with Progress. (You don't need to wait until
    STREAMS are all leaked away, just long enough to say "yep, they're
    leaking at the same rate as before".) Do it with `ifconfig down` and
    physical disconnection, not e.g. reconfiguring the kernel with no TCP/IP
    -- I want to keep everything as much the same as possible, except no
    packets flow (cable disconnected) and the interface doesn't even accept
    packets to be sent (ifconfig down) -- else that in itself might leak
    STREAMS.

    If you kill Mozilla off before it's leaked every bit of STREAMS away,
    does it release the lost blocks? How about Progress?
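
To compare the leak rate before and after taking the interface down, it helps to log `netstat -m` at fixed intervals. This is a minimal sketch; `sample` is a hypothetical helper, and nothing here parses the `netstat -m` output, since its exact layout is not shown in the thread:

```shell
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# writing a timestamp line before each snapshot.
# usage: sample INTERVAL COUNT command [args...]
sample() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
        i=$((i + 1))
        # Skip the pause after the final snapshot.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: ten snapshots, one minute apart, saved for later comparison.
# sample 60 10 netstat -m > streams.log
```

Running it once with the network up and once after `ifconfig net0 down` plus pulling the cable gives two logs in which the allocation counts for each buffer size can be compared side by side.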
     

    I'd appreciate if you can try the autistic test first -- shouldn't take
    very long, just a couple of clean reboots and short waits.
     
    Bela Guest

  11. #11

    Default Re: ML530/STREAMS leak, Re: Is 5.0.7 ready for production?

    Bela Lubkin wrote: 
    > >
    > > It is the standard Mozilla, so far it's just a plain jane install.
    > > 
    > >
    > > You are right, the class 6, 2048 bytes column is at 836 alloc, everything
    > > else is at 0 or 1.
    > >
    > > Just the database server was started 20 hours ago, no actual use yet, and
    > > SMiU is up to 1725.85KB. A bit less than 2K per minute.
    >
    > As an experiment, bring the machine up, bring up Progress and/or Mozilla
    > (probably worthwhile to try this with each, individually); then mark the
    > network as down:
    >
    > ifconfig net0 down # or net1
    >
    > _then_ disconnect the ethernet cable. Then leave the machine alone for
    > a while, see if STREAMS leak. Which should take a few minutes with
    > Mozilla, maybe an hour with Progress. (You don't need to wait until
    > STREAMS are all leaked away, just long enough to say "yep, they're
    > leaking at the same rate as before".) Do it with `ifconfig down` and
    > physical disconnection, not e.g. reconfiguring the kernel with no TCP/IP
    > -- I want to keep everything as much the same as possible, except no
    > packets flow (cable disconnected) and the interface doesn't even accept
    > packets to be sent (ifconfig down) -- else that in itself might leak
    > STREAMS.
    >
    > If you kill Mozilla off before it's leaked every bit of STREAMS away,
    > does it release the lost blocks? How about Progress?

    >
    > I'd appreciate if you can try the autistic test first -- shouldn't take
    > very long, just a couple of clean reboots and short waits.
    >

    Some good news, I have built a ML370G1 to the same OS and patch level
    as the ML530 and it's not having any problems. The SMiU is only 118KB,
    38 2048 byte blocks allocated after 12 hours.

    Mike

    --
    Michael Brown

    The Kingsway Group
    Mike Guest

  12. #12

    Default Re: ML530/STREAMS leak, Re: Is 5.0.7 ready for production?

    Mike Brown wrote:

    Easy test, with the net1 marked down and cable disconnected the leak stops.
    I used Mozilla for all the testing. I brought up Mozilla first with the
    networking on, and waited until 325 2K blocks were allocated. After
    marking the net1 down and pulling the cable the allocation stayed the
    same for 10 minutes. I next plugged just the cable in, and the allocation
    rose to 331, then minutes later dropped back to 287 and stayed there.
    After a while I marked net1 as up, and the leak started up again, after
    a delay of around 20 seconds. When 334 blocks had been allocated I closed
    Mozilla and the 2K allocation stayed at 335.

    Maybe 1 or 2 blocks, generally not much.


    When I started on the machine on Monday it had been running over 24 hours,
    with about 1.7MB SMiU. I started up Mozilla and waited to watch the 2K
    block allocation rise, but it stayed constant around 700 blocks. Did not
    seem to be any leaks. I checked to see what had changed, apparently nothing,
    so I used Mozilla for a while, checked out the news on SCO, drivers from
    Compaq for hundreds of pages with no lockups. Very puzzling. The Progress
    server had been running since Saturday. I finally shut down Progress and
    rebooted the server. Then when I ran Mozilla the leak was back, and within
    5 screens on the Compaq site the server locked up. I had been to the
    same pages without a problem moments earlier.

    The BIOS supports turning on an NMI button on the system board. I brought
    the server up and pressed in the button and got:

    WARNING: casm: NMI Handler has been called on processor 0!
    WARNING: casm: NMI - Dump Switch has been pressed - Hour 12 - 9/8/2003
    NOTICE: casm: The ASR Timer has been disabled.
    PANIC: casm: FATAL NMI detected.

    debug0:1>

    I rebooted the machine, used Mozilla to cause a lock up and pressed the NMI
    button but nothing happened until the ASR rebooted the server.

    I then disabled casm and cevt, relinked and rebooted. Server still had a
    leak and froze up with Mozilla.

    I removed the Compaq EFS5.60a totally, and the server hung during Mozilla
    use after a few pages.

    Thanks to all the people who emailed me with suggestions, I did bring up
    a serial console on tty1a, used Mozilla to freeze the system and tried to
    break into debug with ^X, but the system stayed frozen.

    I also went into mkdev graphics and checked the settings, looked okay. I
    tested to see if it may be a video card driver problem by repeatedly
    scrolling up and down a large page in Mozilla, moving it around and
    changing its size. No extra streams leaks or lockups.

    I finished off using rcmd to copy 2GB of data into the system, watched
    2K allocation rise to 818 blocks without a problem, and checked that the
    data got over. After it was finished the allocation did not reduce, and
    7 hours later it is still at 848 blocks. Progress is not running.

    Also, due to a new form of dyslexia called "cantkeepforeverchangingmodelscorrect"
    please note that the machine is actually a ML370G3 with a 2.8GHz Xeon.

    Mike

    --
    Michael Brown

    The Kingsway Group
    Mike Guest

  13. #13

    Default Re: Is 5.0.7 ready for production?


    "Bob Meyers" <com> wrote in message
    news:bj8jb9$g2ph7$news.uni-berlin.de...

    Well now, and I thought I would only see 1 or so responses :-) Thanks guys!


    Bob Guest
