Professional Web Applications Themes

[7.4.1] signal 11 while accesing DB - PostgreSQL / PGSQL

Hi, Yesterday morning one of our PostgreSQL engines stopped unexpectedly, with the following log: postgres[4192]: [5105-1] LOG: server process (PID 26678) was terminated by signal 11 postgres[4192]: [5106-1] LOG: terminating any other active server processes postgres[4192]: [5107-1] LOG: all server processes terminated; reinitializing postgres[26717]: [5108-1] LOG: database system was interrupted at 2005-01-23 07:23:37 CET postgres[26717]: [5109-1] LOG: checkpoint record is at 31B/FE2C9BA4 postgres[26717]: [5110-1] LOG: redo record is at 31B/FE00A484; undo record is at 0/0; shutdown FALSE postgres[26717]: [5111-1] LOG: next transaction ID: 390435693; next OID: 678576128 postgres[26717]: [5112-1] LOG: database system was not properly shut down; automatic recovery in ...

  1. #1

    Default [7.4.1] signal 11 while accesing DB

    Hi,
    Yesterday morning one of our PostgreSQL engines stopped unexpectedly,
    with the following log:

    postgres[4192]: [5105-1] LOG: server process (PID 26678) was terminated by signal 11
    postgres[4192]: [5106-1] LOG: terminating any other active server processes
    postgres[4192]: [5107-1] LOG: all server processes terminated; reinitializing
    postgres[26717]: [5108-1] LOG: database system was interrupted at 2005-01-23 07:23:37 CET
    postgres[26717]: [5109-1] LOG: checkpoint record is at 31B/FE2C9BA4
    postgres[26717]: [5110-1] LOG: redo record is at 31B/FE00A484; undo record is at 0/0; shutdown FALSE
    postgres[26717]: [5111-1] LOG: next transaction ID: 390435693; next OID: 678576128
    postgres[26717]: [5112-1] LOG: database system was not properly shut down; automatic recovery in progress
    postgres[26717]: [5113-1] LOG: redo starts at 31B/FE00A484
    postgres[26717]: [5114-1] LOG: record with zero length at 31B/FEFCE3A8
    postgres[26717]: [5115-1] LOG: redo done at 31B/FEFCE384
    postgres[26717]: [5116-1] LOG: removing transaction log file "0000031B000000FB"
    postgres[26717]: [5117-1] LOG: removing transaction log file "0000031B000000FC"
    postgres[26717]: [5118-1] LOG: removing transaction log file "0000031B000000FD"
    postgres[26717]: [5119-1] LOG: database system is ready

    and then keep crashing as client connects to one, specified database:

    postgres[4192]: [5108-1] LOG: server process (PID 4501) was terminated by signal 11
    postgres[4192]: [5109-1] LOG: terminating any other active server processes
    postgres[4192]: [5110-1] LOG: all server processes terminated; reinitializing
    postgres[4621]: [5111-1] LOG: database system was interrupted at 2005-01-23 07:25:32 CET
    postgres[4621]: [5112-1] LOG: checkpoint record is at 31B/FEFCE3A8
    postgres[4621]: [5113-1] LOG: redo record is at 31B/FEFCE3A8; undo record is at 0/0; shutdown TRUE
    postgres[4621]: [5114-1] LOG: next transaction ID: 390439982; next OID: 678592512
    postgres[4621]: [5115-1] LOG: database system was not properly shut down; automatic recovery in progress
    postgres[4621]: [5116-1] LOG: record with zero length at 31B/FEFCE3E8
    postgres[4621]: [5117-1] LOG: redo is not required
    postgres[4621]: [5118-1] LOG: database system is ready
    postgres[4192]: [5111-1] LOG: server process (PID 16866) was terminated by signal 11

    I find out two ways to crash the server:
    1. Connecting to the one, specified database ("data")
    2. Creating another database, and trying to access it

    Other databases works find, I can add or remove records with no problem.

    I copied the PGDATA to another machine for debugging and restored the PGDATA from night dump. Everything went smoothly.
    On debugging machine with exactly the same version of PostgreSQL the
    backend keep crashing:

    [16350] DEBUG: /usr/lib/postgresql/bin/postmaster: PostmasterMain: initial environ dump:
    [16350] DEBUG: -----------------------------------------
    [16350] DEBUG: PWD=/DEBUG/lib/postgres/data
    [16350] DEBUG: USER=postgres
    [16350] DEBUG: MAIL=/root/Mailbox
    [16350] DEBUG: LANG=pl_PL
    [16350] DEBUG: LOGNAME=root
    [16350] DEBUG: SHLVL=4
    [16350] DEBUG: SHELL=/bin/sh
    [16350] DEBUG: HOME=/var/lib/postgres
    [16350] DEBUG: TERM=linux
    [16350] DEBUG: PGDATA=/DEBUG/lib/postgres/data
    [16350] DEBUG: PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    [16350] DEBUG: LESSCHT=latin1
    [16350] DEBUG: SSH_TTY=/dev/pts/0
    [16350] DEBUG: _=/usr/lib/postgresql/bin/postmaster
    [16350] DEBUG: -----------------------------------------
    [16350] DEBUG: found "/usr/lib/postgresql/bin/postgres" using argv[0]
    [16350] LOG: could not create IPv6 socket: Rodzina adresów nie obsługiwana przez protokół
    [16350] DEBUG: invoking IpcMemoryCreate(size=31940608)
    [16350] DEBUG: found "/usr/lib/postgresql/bin/postmaster" using argv[0]
    [16356] LOG: database system was shut down at 2005-01-23 14:36:26 CET
    [16356] LOG: checkpoint record is at 31C/1F7C150
    [16356] LOG: redo record is at 31C/1F7C150; undo record is at 0/0; shutdown TRUE
    [16356] LOG: next transaction ID: 390466146; next OID: 679780352
    [16356] LOG: database system is ready
    [16356] DEBUG: proc_exit(0)
    [16356] DEBUG: shmem_exit(0)
    [16356] DEBUG: exit(0)
    [16350] DEBUG: reaping dead processes
    [16350] DEBUG: forked new backend, pid=16435 socket=8
    [16435] DEBUG: /usr/lib/postgresql/bin/postmaster child[16435]: starting with (
    [16435] DEBUG: postgres
    [16435] DEBUG: -v196608
    [16435] DEBUG: -p
    [16435] DEBUG: template1
    [16435] DEBUG: )
    [16435] DEBUG: InitPostgres
    [16435] DEBUG: StartTransactionCommand
    [16435] DEBUG: CommitTransactionCommand
    [16435] DEBUG: proc_exit(0)
    [16435] DEBUG: shmem_exit(0)
    [16435] DEBUG: exit(0)
    [16350] DEBUG: reaping dead processes
    [16350] DEBUG: child process (PID 16435) exited with exit code 0
    [16350] DEBUG: forked new backend, pid=16510 socket=8
    [16510] DEBUG: /usr/lib/postgresql/bin/postmaster child[16510]: starting with (
    [16510] DEBUG: postgres
    [16510] DEBUG: -v196608
    [16510] DEBUG: -p
    [16510] DEBUG: data
    [16510] DEBUG: )
    [16510] DEBUG: InitPostgres
    [16350] DEBUG: reaping dead processes
    [16350] DEBUG: child process (PID 16510) was terminated by signal 11
    [16350] LOG: server process (PID 16510) was terminated by signal 11
    [16350] LOG: terminating any other active server processes
    [16350] LOG: all server processes terminated; reinitializing
    [16350] DEBUG: shmem_exit(0)
    [16350] DEBUG: invoking IpcMemoryCreate(size=31940608)
    [16516] LOG: database system was interrupted at 2005-01-24 08:54:48 CET
    [16516] LOG: checkpoint record is at 31C/1F7C150
    [16516] LOG: redo record is at 31C/1F7C150; undo record is at 0/0; shutdown TRUE
    [16516] LOG: next transaction ID: 390466146; next OID: 679780352
    [16516] LOG: database system was not properly shut down; automatic recovery in progress
    [16516] LOG: record with zero length at 31C/1F7C190
    [16516] LOG: redo is not required
    [16516] LOG: database system is ready

    the postgres process left the core, and the backtrace in gdb shows:

    (gdb) bt
    #0 0x081b4e79 in GetRawDatabaseInfo ()
    #1 0x081b4865 in InitPostgres ()
    #2 0x08154741 in PostgresMain ()
    #3 0x0813437d in ClosePostmasterPorts ()
    #4 0x08133c0f in ClosePostmasterPorts ()
    #5 0x08132519 in PostmasterMain ()
    #6 0x08131f50 in PostmasterMain ()
    #7 0x081088df in main ()
    #8 0x001ef14f in __libc_start_main () from /lib/libc.so.6

    Some specs:
    Main machine: Dual Xeon 2.8GHz, 2GB ECC RAM, two SCSI 73GBs 15K RPM.
    Both machines are Debian Woody with 2.4 kernels, and own compilation of
    PostgreSQL 7.4.1.
    The server is running for nearly two years now, with PostgreSQL for all
    the time, and as it's the first problem I encountered, I don't suspect
    it's a hardware fault.

    Thanks for your help,
    --
    Marcin

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to [email]majordomopostgresql.org[/email]

    Marcin Guest

  2. #2

    Default Re: [7.4.1] signal 11 while accesing DB

    On Mon, 2005-01-24 at 11:45, Marcin wrote:
    > Hi,
    > Yesterday morning one of our PostgreSQL engines stopped unexpectedly,
    > with the following log:
    >
    > postgres[4192]: [5105-1] LOG: server process (PID 26678) was terminated by signal 11
    Have you tried backing up your data directory (as a precaution) and
    upgrading to 7.4.6? It's quite possible that there was a bug in 7.4.1
    that you're being bitten by. Not that I'd know, I never used 7.4 before
    about version 7.4.3 or so.

    ---------------------------(end of broadcast)---------------------------
    TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [email]majordomopostgresql.org[/email])

    Scott Marlowe Guest

  3. #3

    Default Re: [7.4.1] signal 11 while accesing DB

    Marcin <migorop.pl> writes:
    > Yesterday morning one of our PostgreSQL engines stopped unexpectedly,
    > [ with SIGSEGV at ]
    > (gdb) bt
    > #0 0x081b4e79 in GetRawDatabaseInfo ()
    > #1 0x081b4865 in InitPostgres ()
    > #2 0x08154741 in PostgresMain ()
    This suggests corrupted data in pg_database, probably a bad page header
    or a bad item offset.
    > The server is running for nearly two years now, with PostgreSQL for all
    > the time, and as it's the first problem I encountered, I don't suspect
    > it's a hardware fault.
    I do ;-). It would be interesting though to examine pg_database with
    pg_filedump or a similar tool. See
    [url]http://sources.redhat.com/rhdb/utilities.html[/url]
    for pg_filedump. Also, you might want to update to something newer than
    7.4.1 --- I see at least one potentially relevant bug fix in the 7.4
    branch CVS history.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 6: Have you searched our list archives?

    [url]http://archives.postgresql.org[/url]

    Tom Lane Guest

  4. #4

    Default Re: [7.4.1] signal 11 while accesing DB

    Marcin <migorop.pl> writes:
    > Okay, then pg_filedump 1262:
    > and suddenly:
    > Block 4 ************************************************** ******
    > <Header> -----
    > Block Offset: 0x00008000 Offsets: Lower 0 (0x0000)
    > Block: Size 0 Version 0 Upper 0 (0x0000)
    > LSN: logid 0 recoff 0x00000000 Special 0 (0x0000)
    > Items: 0 Free Space: 0
    > Length (including item array): 24
    > Error: Invalid header information.
    Ah-hah. This case should be coped with by 7.4.3 and later, due to this patch:

    2004-06-05 13:42 tgl

    * src/include/storage/: bufpage.h (REL7_4_STABLE), bufpage.h:
    Adjust PageGetMaxOffsetNumber to ensure sane behavior on
    uninitialized pages, even when the macro's result is stored into an
    unsigned variable.

    In the meantime a VACUUM of pg_database should get you out of trouble.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster

    Tom Lane Guest

Similar Threads

  1. Web App Accesing eventlogs
    By Rabia [Xavor] in forum ASP.NET Security
    Replies: 1
    Last Post: August 9th, 02:41 PM
  2. Accesing XML Attribute in a binding...
    By bodhi1000 webforumsuser@macromedia.com in forum Macromedia Flash Actionscript
    Replies: 1
    Last Post: January 30th, 02:49 PM
  3. Accesing hash of hashes
    By Ravi Malghan in forum PERL Beginners
    Replies: 3
    Last Post: November 13th, 08:26 AM
  4. Accesing embedded controls??
    By ozgur develioglu in forum ASP.NET Building Controls
    Replies: 0
    Last Post: July 21st, 06:57 AM
  5. accesing php script in a different port
    By Michael P. Carel in forum PHP Development
    Replies: 1
    Last Post: July 9th, 07:02 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139