HACMP and failed network - AIX

  1. #1

    HACMP and failed network

    Over the weekend the following occurred, and I've been tasked with
    finding out why. We had a complete network failure and the HA servers
    crashed. My understanding is that HA is supposed to detect network
    failures and take action. What could be some of the causes of
    this?

    Thanks to all replies
    St. Claire Guest

  2. #2

    Re: HACMP and failed network

    On 21 Jul 2003 04:46:59 -0700, tstclairehotmail.com (St. Claire)
    wrote:
    >Over the weekend we had the following to occur and I've been tasked to
    >find out why. We had a complete network failure and the HA servers
    >crashed. Now to my understanding HA is supposed to detect network
    >failures and come into action.
    Your understanding is wrong. HA can detect and handle the failure of
    a single adapter or a single network drop and switch to a standby. A
    complete network failure means there are no working standby adapters,
    so switching is pointless. That said, in my experience HA should be
    able to survive a network failure without the nodes crashing, so work
    with IBM support to make sure your levels of HA code are up to date.


    Darrell Frappier Detroit MI N3JWJ darrell at frappier dot us
    Darrell Frappier Guest

  3. #3

    Re: HACMP and failed network

    That makes sense. However, does this warrant a dead man switch
    condition if the system is still operating normally without a network?


    St. Claire Guest

  4. #4

    Re: HACMP and failed network

    This does not warrant a dead man switch. A dead man switch fires when
    HACMP can't get enough CPU to send heartbeats over the IP and non-IP
    networks. I think you're missing something here. If you experienced
    a total network failure, was it just the IP network? If only the IP
    network fails, your non-IP network is still available and HACMP sends
    its heartbeats over that, so it knows the servers are still alive and
    that only the IP network is down. If you experienced a total failure,
    both IP and non-IP, then did you have a power failure in your computer
    room, or is the private network between your cluster nodes carried on
    the IP network?

    HTH,
    Pete's

    Pete's Guest

  5. #5

    Re: HACMP and failed network

    In practice, by default HACMP takes no action on a network failure
    event other than logging it in the HACMP log files.
    You can, however, customize HACMP to take a specific action (e.g. run
    a script), or you can promote a network failure to a node failure.
    Unless you do that, the event is only logged.

    SASA

    tstclairehotmail.com (St. Claire) wrote in message news:<8581fcb0.0307230950.6f3ac7dposting.google.com>...
    > I didn't think it should, but I'm really confused as to why it
    > happened. We have two TCP/IP networks here and they both went down,
    > but there was no power failure. We have one non-IP network and that
    > one was fine. There was no system dump, which tells me that the system
    > did not crash; I guess it just hung. My reason for thinking it's HA
    > related is that the only systems that 'crashed' were the HA nodes.
    sasa queer Guest
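
For anyone reading this thread later, here is a rough Python sketch of the distinction made in the replies: heartbeats travel over both IP and non-IP paths, a node is only presumed dead when every path is silent, and a network failure is just logged unless it has been promoted to a node failure. This is a conceptual model only, not HACMP code. The names `HeartbeatStatus`, `classify` and `react`, the `promote_network_to_node` flag, and the "takeover" reaction to a node failure are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Verdict(Enum):
    HEALTHY = "healthy"
    NETWORK_DOWN = "network_down"  # peer still alive, some path has failed
    NODE_DOWN = "node_down"        # no heartbeat seen on any path


@dataclass
class HeartbeatStatus:
    """Hypothetical snapshot of which heartbeat paths currently work."""
    ip_networks_ok: List[bool]  # one flag per IP network
    non_ip_ok: bool             # the non-IP (e.g. serial) heartbeat path


def classify(status: HeartbeatStatus) -> Verdict:
    """Decide what the peer's silence means, based on all paths."""
    paths = list(status.ip_networks_ok) + [status.non_ip_ok]
    if not any(paths):
        # Nothing gets through at all: the peer is presumed dead.
        return Verdict.NODE_DOWN
    if all(paths):
        return Verdict.HEALTHY
    # At least one path still carries heartbeats, so the node is alive
    # and only a network has failed.
    return Verdict.NETWORK_DOWN


def react(verdict: Verdict, promote_network_to_node: bool = False) -> str:
    """Assumed reactions: log a network failure unless it is promoted."""
    if verdict is Verdict.NODE_DOWN:
        return "takeover"  # assumption: a node failure triggers takeover
    if verdict is Verdict.NETWORK_DOWN:
        return "takeover" if promote_network_to_node else "log_only"
    return "none"


if __name__ == "__main__":
    # The situation described in the thread: both IP networks down,
    # the non-IP network still fine.
    seen = HeartbeatStatus(ip_networks_ok=[False, False], non_ip_ok=True)
    verdict = classify(seen)
    print(verdict.value, react(verdict))
```

With both IP networks down and the non-IP path still up, the sketch classifies the event as a network failure and only logs it, which matches what the replies say should have happened. That the nodes hung anyway is the part this model does not explain.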
