Filed                     29-JUN-2001
Product                   Oracle Server - Enterprise Edition V7
Product Version           8.1.7.1
Platform                  Sun SPARC Solaris (64-BIT)
Platform Version          2.7
RDBMS Version             8.1.7.1
Affects Platforms         Generic
Base Bug                  N/A
Fixed in Product Version  No Data

ROLLBACK SEGMENT CORRUPTION ORA-600 [4137]
--------------------------------------------------------------------------------
=========================
PROBLEM:
The rollback segment has been corrupted. SMON was unable to recover the
problem transaction and failed with ORA-600 [4137].
=========================
DIAGNOSTIC ANALYSIS:
The corrupted block of the rollback segment contains an invalid rdba for the
previous undo record. The rdba references a block that belongs to another
transaction, so SMON can't recover the transaction.
=========================
WORKAROUND:
=========================
RELATED BUGS:
Bug:1528312
=========================
REPRODUCIBILITY:
The problem has occurred only once.
=========================
TESTCASE:
No
========================
STACK TRACE:
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
----- Call Stack Trace -----
Stack trace:
5752: ora_smon_Y
ffffffff7dea4508 read (c, ffffffff7fffc14c, 1)
00000001007e54f0 ssdinit (ffffffff7fffc14c, 0, 101bc2cf4, 0, 1, 0) + 270
00000001007e3d04 skdstinit (ffffffff7fffc3c0, 0, 101bc76b0, ffffffff7fffc274, a, ffffffff7fffc0a2) + 4
000000010029f4c4 ksedst (0, 101bc0d88, 1, 1016e8810, 84, ffffffff7fffc06c) + 24
000000010029ee8c ksedmp (0, 0, 2, ffffffff7fffd410, ffffffff7fffd0a0, ffffffff7fffd00c) + f4
00000001002f5658 ksddoa (101c24e70, 101c25f70, 10029ed98, 101c24e70, 2, 101c24e48) + 98
00000001002effa8 ksdpcg (101c24e28, 101c25f70, 0, 0, 0, ffffffff7fffc98c) + 148
00000001002f0340 ksdpec (101bc2bb4, 101bc2bb0, 258, 7f7, 0, ffffffff7fffccf8) + 1b0
00000001002c4364 ksfpec (101bc0d88, 258, 101bc0e70, ffffffff7fffccf8, 0, 0) + 13c
00000001010d88d4 kgeriv (101bc1d98, 1000, 101bc0e60, 101ba4e50, 1016e8810, 101c22c18) + dc
00000001010d8cfc kgesiv (1000, 101c22c18, 1029, 0, ffffffff7fffccf8, 101c22c18) + 8c
000000010029e028 ksesic0 (1029, 1, 0, b5c9ae, 40190feea18, 400198a7014) + 68
0000000100246a20 ktubko (400198a7014, ffffffff7fffd150, 4, ffffffff7fffd410, ffffffff7fffd0a0, ffffffff7fffd00c) + 1c0
0000000100243af4 kturrt (101bc0e60, 101bc2bb4, 1, 0, 3, 4) + 1304
0000000100241fe4 kturec (1, 3, 1, 0, ffff, 11) + 384
0000000100240dd8 kturax (40193adf4c8, 0, 7661, 0, 0, 0) + 1e8
000000010028d31c ktprbeg (101bc0e60, 101bc2bb0, 2911, 101bc30d0, 0, 1) + 234
0000000100b23f24 ktmmon (0, 101bc30c0, 9, 0, 4018124e020, 4018124e020) + 15bc
00000001002fb884 ksbrdp (101bc30d8, 0, 1016e6e90, 38000e24c, 38000e230, ffffffff7fffe800) + 3d4
000000010082ab68 opirip (3800140b0, 101bc6318, 101bc08cc, 101bc3788, 101bc0b40, ffffffff7ffff320) + 3c0
000000010010cbec opidrv (32, ffffffff7ffff488, 101bc0e60, 6c6f6700, 0, 0) + 7f4
000000010010b398 sou2o (ffffffff7ffff8c0, 32, 0, 0, 0, 101c13a00) + 10
000000010010af8c main (0, 101bf7da0, 1, ffffffff7dfb2200, 0, 100000000) + a4
000000010010aecc _start (0, 0, 0, 0, 0, 0) + ec
End of trace
----- End of Call Stack Trace -----
===================================================
SUPPORTING INFORMATION:
Please find on ess30 in /bug/bug1859331
  alert.log               - part of alert.log when the error occurred
  smon_600_4137_head.trc  - SMON trace file with ORA-600 [4137]
  y_ora_17480.trc         - block dump of the corrupted rollback block
  rheader.lis             - dump of the corrupted rollback header
  ib_ora_5271.trc         - dump of the redo for the corrupted block

Unfortunately I can't provide a full dump of the redo at the moment because
the archived redo log files are big: 4 files of 500MB each.
The corrupted rollback segment has been dropped because it is a production
system, but the customer keeps all files ready so we can make any additional
dumps that you need.

- How was the database recovered from this?
- Did they restore and rollforward?
- Did the customer really drop the rollback segment (it had an active
  transaction that couldn't be rolled back)?
- What files do we have available? We would really need the system and
  rollback datafiles. How big are these?
- Do we have a backup from before the errors occurred, and all redo to roll
  forward through the time of the errors?

- How was the database recovered from this?
- Did they restore and rollforward?
- Did the customer really drop the rollback segment (it had an active
  transaction that couldn't be rolled back)?
    Determined the objects that referenced the corrupted rollback segment
    header slot. Fortunately they were temporary working tables. Dropped them.
    Declared the rollback segment as corrupted via the
    _corrupted_rollback_segments parameter, restarted the database and
    dropped the rollback segment.
- What files do we have available? We would really need the system and
  rollback datafiles. How big are these?
    We have both the system tablespace (1GB) and the rollback tablespace
    (2 files: 16GB + 8GB).
- Do we have a backup from before the errors occurred, and all redo to roll
  forward through the time of the errors?
    No, we don't have a datafile backup from before the errors occurred, but
    we have the redo logs covering the time of the errors (June 14, 15, 16).

@ Emailed STOMIN for clarification of recovery procedures

The ora-600 [4137] is not the first serious error. From the alert log, the
following errors precede the ora-600 [4137] (all in a 50 minute period):
- A number of job queue processes die, with PIDs 3127, 7496, 9814, 7873,
  9307, 9816
- Other processes (job queue and user) get various internal errors:
    ora-600 [1117]
    ora-600 [4412]
    ora-600 [1113]
    ora-600 [17082]
    ora-600 [ktcrcm1]
    ora-600 [4427]
    ora-600 [4415]
    ora-600 [4156]
    ora-600 [13009]
    ora-600 [4068]
    ora-600 [4152]
    ora-600 [4135]

Only after the above errors do we see the ora-600 [4137], so I don't think
that we can take the rollback segment corruption in isolation.
Obviously something serious has happened instance-wide prior to the RBS
corruption.
.
The excerpt of the alert log that you have provided starts immediately with
the above mentioned errors.
o. Can you upload a more complete alert log, going back a few days if
   possible?
o. Can you upload all trace files from the errors (above) referenced in the
   alert log? This may give us a better picture of what has happened.
o. As for the ora-600 [4137], can you do the following:
   1. Dump the redo (from all saved logs) for any changes to datafile 2
      block 443414 (the undo block in question).
   2. Get OS dumps from datafile 2 of the undo block, and a few blocks
      either side of the undo block:
      % dd if= bs= skip=443410 count=10 \
           | od -x > /tmp/os_block_dump

Please upload all info to ess30
  alert2.log         - alert.log from 13.06.2001
  ib_ora_23419.trc   - dump of redo for block 2 443414. It is empty because
                       the customer dumped only the redo log files that he
                       had on disk. The customer has some redo log files on
                       tape.
  os_block_dump.txt

Unfortunately they have deleted all trace files related to that moment.
They have only one, y_ora_19743.trc.gz. It is a system state dump.

I'm afraid that we're going to need the redo for this undo block. The OS
block dump shows the following blocks and corresponding transaction IDs:

  Undo block    XID
  0x0086c412    0001.0000.00b5c9ae
  0x0086c413    0001.000d.00b58d43
  0x0086c414    0001.0000.00b5c9ae
  0x0086c415    0001.0025.00b79663
  0x0086c416    0001.0000.00b5c9ae   <- undo block in question
  0x0086c417    0001.0010.00b4bf51
  0x0086c418    0001.0000.00b5c9ae

The ora-600 [4137] occurs because we're trying to roll back a transaction and
find that the xid in the undo block does not agree with the xid we're trying
to roll back. This is often caused by a missed write, so we need to see the
last change in the redo stream for this undo block.
Please dump the redo from the logs saved to tape as follows:

  alter system dump logfile '' dba min 2 . 443410 dba max 2 . 443416;

This will match up with the OS block dump.

*** JBARLOW 08/30/01 12:00 am *** (CHG: Sta->33)
*** JBARLOW 08/30/01 12:00 am *** Suspending
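
Note on the block addresses used above: the undo block addresses in the OS block dump (0x0086c412 to 0x0086c418) and the file/block pairs in the dd and dump logfile commands (2 . 443410 to 2 . 443416) refer to the same blocks. The sketch below is only a cross-check, assuming the usual 32-bit rdba layout of a 10-bit relative file number followed by a 22-bit block number. The function name decode_rdba is hypothetical and not part of any Oracle tool.

```python
# Assumption: a 32-bit rdba is laid out as
#   [10-bit relative file number][22-bit block number]
BLOCK_BITS = 22
BLOCK_MASK = (1 << BLOCK_BITS) - 1


def decode_rdba(rdba: int) -> tuple[int, int]:
    """Split a 32-bit rdba into (relative file number, block number)."""
    file_no = rdba >> BLOCK_BITS
    block_no = rdba & BLOCK_MASK
    return file_no, block_no


if __name__ == "__main__":
    # First block, block in question, and last block from the OS block dump.
    for rdba in (0x0086C412, 0x0086C416, 0x0086C418):
        file_no, block_no = decode_rdba(rdba)
        print(f"0x{rdba:08x} -> file {file_no} block {block_no}")
```

Under that assumption, 0x0086c416 decodes to file 2 block 443414 (the undo block in question), and the ends of the dumped range decode to blocks 443410 and 443416, matching the dba min / dba max values in the dump logfile command.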