Odd sendmail problem - SCO

  1. #1

    Odd sendmail problem

    (SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)

    Howdy.

    An old system which has been working fine (up until recently, no hardware
    changes, and only unrelated software changes (non-os, just custom app)) has
    had its sendmail daemon start to act funny.

    The only telltale signs are that the load average skyrockets,
    /usr/spool/mqueue/ (/var/opt/K/SCO/SendMail/8.8.5c/spool/mqueue/) becomes
    inaccessible to 'ls' etc., and thus mail flow stops.

    A reboot of the machine returns it to normal usage again for some time, but
    it does it again 2-3 days later.

    I thought that maybe some garbage message was getting into the queue, but
    have no way of verifying this.

    Two things I'd like to know:

    Is there any way to find out what is under that directory structure
    without relying upon the 'ls' and 'find' tools? Find shows the first 2
    entries, but then stops.

    Beyond upgrading everything (not an option at this stage), any other
    ideas about how to combat this?

    The system is in a protected network, so it is highly doubtful that someone
    is attacking this from the outside (it is not in any name servers, and has
    to go through a firewall to get there).

    Firewall logs don't show anything out of the ordinary.

    For the moment, I've tried ruling out an issue with the symlink, and just
    made a bare directory, hopefully leaving the old contents intact (after a
    reboot).

    bkx


    Stuart Guest

  2. #2

    Re: Odd sendmail problem

    In article <tpgi.com.au>,
    Stuart J. Browne <com.au> wrote: 
     

    Are you 100% sure that the non-os custom app isn't contributing to
    the problem? Some applications have been known to send mail - you
    know, like LP does when a job is canceled. Sending mail to
    an unknown user on the system can cause a lot of 'undelivered
    messages'.
     
     

    Next time don't reboot the machine but kill sendmail. Then go
    look at the messages in the queue. What does the output of
    'mailq' tell you? Look at the headers and see if you can spot the
    problem message.
     

    If a restart cures it every time and you don't find the problem
    before you restart, you are going to keep having this until you
    determine what is causing it.
     
     

    So what does 'ls' give you? If it's too slow you probably have
    thousands in the queue and sorting them is going to take time, so
    just type echo *. Output will be a mass all on one line, but
    pipe it through wc and you'll get an idea.
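
    A minimal sketch of the 'echo *' idea, assuming a POSIX-style sh
    rather than the stock Bourne shell on an old system. The function
    name count_entries is made up for illustration. It counts names
    through shell globbing alone, so neither ls nor find is involved.
    Like echo *, it skips dot files, and it will still block if the
    directory itself cannot be read.

```shell
# count_entries DIR: count directory entries using only shell globbing.
# Hypothetical helper for illustration, not part of sendmail or SCO.
count_entries() {
    (
        cd "$1" || exit 1
        set -- *
        # With no matches the pattern stays as a literal '*'.
        if [ "$#" -eq 1 ] && [ "$1" = "*" ] && [ ! -e "*" ] && [ ! -L "*" ]; then
            echo 0
        else
            echo "$#"
        fi
    )
}
```

    Usage would be something like: count_entries /usr/spool/mqueue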
     

    Why do you think upgrading will cure this if you haven't found
    the problem? I'm not being facetious. You say the only change
    was a non-os application. If everything was fine until then it
    surely points a finger at that application. Of course if many
    people have root access anything could be wrong. And an upgrade
    instead of a complete reinstall could carry the problem forward.
     

    I also doubt that is a problem.
     

    You are looking at the wrong logs. Look at the sendmail logs.
    Your clue should be there.
     

    "hopefully?"

    And what 'bare directory' did you make?

    Unix systems DO NOT need to be rebooted in my experience unless
    it's something that has to do with a kernel related problem. That's
    based on 20 years with *n*x systems.

    I've used sendmail as the MTA for two ISPs and it just doesn't
    have problems like the ones you have described. I really bet it is
    NOT sendmail acting funny but something else causing it to
    accumulate tonnes of undeliverable messages.

    Bill
    --
    Bill Vermillion - bv wjv . com
    Bill Guest

  3. #3

    Re: Odd sendmail problem

    Bill Vermillion typed (on Tue, Oct 14, 2003 at 07:25:01PM +0000):
    | In article <tpgi.com.au>,
    | Stuart J. Browne <com.au> wrote:
    | >(SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)
    |
    | >Two things I'd like to know:
    |
    | > Is there any way to find out what is under that directory structure
    | >without relying upon the 'ls' and 'find' tools. Find shows the first 2
    | >entries, but then stops.

    I suspect you're describing what happens when you run a command like

    find /some/path | xargs grep "some string"

    and it turns out that there is a pipe-file under /some/path. It is
    the 'grep' that is apparently stopping -- but it isn't stopping, it'll
    eternally read such a FIFO looking for "some string", until you lose
    patience and kill the command.

    If that's what's happening, then run:

    find /some/path ! -type p | xargs grep "some string"

    or better, to avoid the error you get when you try to grep a directory:

    find /some/path -type f | xargs grep "some string"
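
    To check up front whether a tree holds any pipe-files at all, a
    small sketch along the same lines (list_fifos is a made-up name,
    and -type p is assumed to be supported by the local find):

```shell
# list_fifos DIR: print every named pipe (FIFO) found under DIR.
# Hypothetical helper for illustration.
list_fifos() {
    find "$1" -type p -print
}
```

    If this prints nothing, a FIFO is not what is stalling the grep.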

    --
    JP
    Jean-Pierre Guest

  4. #4

    Re: Odd sendmail problem


    "Jean-Pierre Radley" <com> wrote in message
    news:jpr.com... 


    Actually, all I did was:

    cd /usr/spool/mqueue
    find . -print

    It showed two entries, then halted.


    Stuart Guest

  5. #5

    Re: Odd sendmail problem

    > Stuart J. Browne <com.au> wrote:

    >
    > Are you 100% sure that the non-os custom app isn't contributing to
    > the problem? Some applications have been known to send mail - you
    > know, like LP does when a job is canceled. Sending mail to
    > an unknown user on the system can cause a lot of 'undelivered
    > messages'.

    Our application does launch "/usr/lib/sendmail -t", but I can vouch for the
    data that goes to it. At worst, it should be forwarded straight through
    the queue (DS entry in /usr/lib/sendmail.cf), to a real mail server.
     

    >
    > Next time don't reboot the machine but kill sendmail. Then go
    > look at the messages in the queue. What does the output of
    > 'mailq' tell you? Look at the headers and see if you can spot the
    > problem message.

    This is what I've attempted each time it's happened. The sendmail
    processes don't die. Mailq doesn't complete. I thought maybe it was
    lots-of-files, but after leaving it for over half an hour and still no
    response, no. I've seen lots-of-files before. Besides, using an unsorted
    'find' would get past that, which it did not.
     
    >
    > If a restart cures it every time and you don't find the problem
    > before you restart, you are going to keep having this until you
    > determine what is causing it.


    >
    > So what does 'ls' give you? If it's too slow you probably have
    > thousands in the queue and sorting them is going to take time, so
    > just type echo *. Output will be a mass all on one line, but
    > pipe it through wc and you'll get an idea.

    Forgot about 'echo *', will try that next time.
     
    >
    > Why do you think upgrading will cure this if you haven't found
    > the problem? I'm not being facetious. You say the only change
    > was a non-os application. If everything was fine until then it
    > surely points a finger at that application. Of course if many
    > people have root access anything could be wrong. And an upgrade
    > instead of a complete reinstall could carry the problem forward.

    I'd agree with you if we hadn't updated that application on over 100 other
    machines, some of which are identical. I'm more inclined to believe that
    it has to do with external changes on the LAN (which I know is a pipe
    dream).

    The only thing the custom app does with regards to sendmail is issue
    '/usr/lib/sendmail -t < tmp.file'. The code which does that hasn't changed
    within the application. At no other time does it go anywhere near sendmail
    or the mail subsystems.
     
    >
    > I also doubt that is a problem.

    >
    > You are looking at the wrong logs. Look at the sendmail logs.
    > Your clue should be there.

    Unfortunately not. They are clean. Shows normal operation, and then just
    no operation.
     
    >
    > "hopefully?"

    There are no qf* files in there, just an assortment of df* and xf*, so no
    header details. Nothing is obscenely large, just a mix and match of normal
    mail traffic, about 30 partial messages in all.
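
    For anyone wanting to check for the same condition, here is a rough
    sketch that lists queue IDs having a df* (body) file but no
    matching qf* (header) file. The name list_orphan_df is made up,
    and the ${f#df} expansion assumes a POSIX-style sh, which may not
    be the default shell on an older system.

```shell
# list_orphan_df DIR: print queue IDs that have a df* file in DIR
# but no matching qf* file. Hypothetical helper for illustration.
list_orphan_df() {
    (
        cd "$1" || exit 1
        for f in df*; do
            [ -f "$f" ] || continue    # no df* files: pattern stays literal
            id=${f#df}                 # strip the leading "df"
            [ -f "qf$id" ] || echo "$id"
        done
    )
}
```

    Usage would be something like: list_orphan_df /usr/spool/mqueue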

    'hopefully' meaning that the nasty-killing of processes due to reboot
    leaves the partial files intact, and not cleaned off by a fsck during boot
    (I don't have console access, and walking the users out there through
    single-user boot-up, bypassing fsck's isn't really on the cards).
     

    Moved symlink '/usr/spool/mqueue' aside, created '/usr/spool/mqueue' as a
    directory.
     

    I'm of a similar mind (much to my boss's frustration), but when the load
    average keeps creeping, and processes don't die, choices are very limited,
    especially on production machines where people are trying to work.

    I'll take 10 min abuse for dropping everybody off, rather than 8hrs of
    "It's slow! it's slow!" in this sort of situation.
     

    I use it for 1 ISP, as well as all inhouse, plus all of our clients (in
    excess of 100). I trust sendmail. I don't have that many running this
    particular version on OSR504 however, so I was wondering if there was a
    specific issue that I was unaware of.
     

    Thanks Bill ;) Your insight is much appreciated.

    bkx


    Stuart Guest

  6. #6

    Re: Odd sendmail problem

    In article <3f8c880b$tpgi.com.au>,
    Stuart J. Browne <com.au> wrote:
     

    Is the DS defined to be another machine? If so, what is the
    possibility that machine is rejecting the mail and it is going back
    to a non-existent user?

    I had a situation [BSD web server] where the ownership of the web
    server was changed to www - so a break wouldn't give root access.
    When the account was created it was with a directory of
    'nonexistant' and shell of '/bin/nologin'. One day I got a
    'file system full' message and found that 'www' had many MBs of
    messages, and with no shell or any other access it went unnoticed
    until things got full.

    Another time [a long time ago] I messed up on something in
    sendmail - back in the 4.? days when it wasn't as robust - and when I
    started it up one process was generating error messages and filling
    up the space at a rapid pace. [Irix on an SGI].

    I tried killing a process I'd see in the ps but I would get no such
    processes. sendmail had run amuck and was spawning processes and
    finishing them faster than I could find them.

    I issued 'killall /usr/sbin/sendmail' and it took about 20 minutes
    for the system to become stable as it was killing processes only
    slightly faster than sendmail was generating them. This was in a
    headless server environment where you don't really want to have to
    reboot unless absolutely necessary.

    You can bet the load average went through the roof in the above
    scenario also.
     

    Are you trying to kill sendmail by the process ID? As noted above
    I found that was fruitless.

    Since all the sendmail queue file names end in digits, often when I
    look at the queue and want to find out what is causing what
    without running mailq, I'll just list the directory and note the
    last digits of the last file. Then I can 'less' *NN and get
    the q and d files. You might try this after using echo *,
    and then perhaps try echo *NN until you see a number or two that
    you can try cat *NN on [if nothing else works] to at least get a clue
    about what has gone astray.
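
    A small sketch of the *NN trick as a reusable function, assuming a
    POSIX-style sh. The name queue_files_by_suffix is made up for
    illustration. It prints only the names, so nothing gets opened
    until a likely candidate has been picked out.

```shell
# queue_files_by_suffix DIR NN: print the names in DIR that end in NN.
# Hypothetical helper for illustration.
queue_files_by_suffix() {
    (
        cd "$1" || exit 1
        for f in *"$2"; do
            [ -e "$f" ] || continue    # no match: pattern stays literal
            echo "$f"
        done
    )
}
```

    Usage would be something like: queue_files_by_suffix /usr/spool/mqueue 12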
     

    It's easy to forget and I cuss myself when I need it and take a
    while to remember it. I used to have it when trying to fix old
    Xenix systems where no one had a clue and I'd boot with one of my
    disks and use echo * to find out what was there.

     

    It could be related to that. Particularly if the DS variable in
    your sendmail.cf is not empty and is pointing somewhere else.

    As to other machines which are identical I think one of Murphy's
    laws of computing is 'Identical systems never are'.
     

    [When I referenced /usr/sbin/sendmail above I had forgotten that
    it was under /usr/lib on SCO].

    On your tmp.file you read in with the -t I have no idea what is in
    it but try it with ONLY root at the localhost and see if it goes
    where it should, then try adding the other names one at a time.
    You might also want to temporarily make the DS a blank if it is
    not at this moment, and then restart sendmail. The .cf and
    the sendmail.cw [or local-host-names depending on versions], and
    the relay file are the only ones I've found that need a restart
    to be recognized. I've not seen any documentation [it is too large]
    so that is just observation.
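
    A minimal sketch of a single-recipient test file for that
    experiment. The function name make_test_message and the subject
    and body text are made up for illustration; the only thing taken
    from the thread is that 'sendmail -t' picks up recipients from the
    message headers.

```shell
# make_test_message RECIPIENT: print a minimal message suitable for
# feeding to 'sendmail -t'. Hypothetical helper for illustration.
make_test_message() {
    printf 'To: %s\n' "$1"
    printf 'Subject: sendmail -t test\n'
    printf '\n'                        # blank line ends the headers
    printf 'Test message, single local recipient.\n'
}
```

    Something like 'make_test_message root@localhost > tmp.file'
    followed by '/usr/lib/sendmail -t < tmp.file' would then exercise
    the same path the application uses.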
     

    Do you know when the app is supposed to be sending mail? If so, can
    you trigger it while performing a tail -f on the logfile? [That
    presupposes your system is not so busy that they scroll off the
    screen faster than you can see them like one of my servers].
     

    The only time I've seen just a df* and nothing else appeared to be
    from some spammer somewhere as the files are huge and mostly
    binary. Could something like swen be on a local PC and deluging
    you with something like that? That's just a wild thought as I type
    this.
     

    fsck should not touch a thing if you did a reboot and not a
    poweroff. If your system is swamped it may take quite awhile for
    the shutdown to complete. If it never does and you have to take
    drastic measures you might try sync;sync;sync;reboot all on
    one line. If that works it might get most things flushed so an
    fsck won't occur - but don't bet on it.
     

    Understand.
     

    Do you have a feel for how long it is before it starts getting
    sluggish? If so go for about 1/2 that time and just stop sendmail,
    and take a look to see if things are building up. Then fire it
    back up and see if it goes down at the normal time, or if it now
    takes from the restart time to the normal slowness time to get
    slow. [I hope I explained that properly].

    If that is the case - the time for slowness is based upon sendmail
    start time - you might make a cron entry to restart sendmail
    nightly. If sendmail takes a while to shut down you might want to
    leave it off for 10 or more minutes.

    If you have a secondary MX machine that should be no problem as
    any mail destined for that machine will be held there.
    [I'm a firm believer in secondary MX but I'm constantly amazed at
    how people have them].
     

    That's just good business sense.
     

    I run a small niche market ISP - but in the past 10 days I've seen
    my daily maillog grow 25% larger as the spam just keeps increasing.

    There were 245,808 lines in yesterday's maillog and that's only
    a user population of about 150. [One web site comes up #1 in Google
    and it has never been advertised - only in the last two months has
    any development work been done on it in two years. I just checked
    and I tossed away over 49,000 messages destined for that one
    yesterday. So that's 20 percent of the total. For that I don't
    send rejects or anything. I just route them to /dev/null. Not
    polite but the only effective way to handle those.]

    Being on a tier 1 backbone makes life interesting - and sometimes
    hectic.

    I hope some of these ideas may help.

    Bill


    --
    Bill Vermillion - bv wjv . com
    Bill Guest
