Wednesday, November 19, 2014
How to demonstrate Oracle In-Doubt Transactions
Here is the example.
Using databases TEST1 and TEST2, create a table on each. Also create a database link from TEST1 to TEST2.
conn user1/pwd1@TEST1
create table loc_tab (afld varchar2(10));
create database link alink connect to user2 identified by pwd2 using 'TEST2';
conn user2/pwd2@TEST2
create table rem_tab (afld varchar2(10));
Then connect to TEST1 and insert a row into both.
conn user1/pwd1@TEST1
insert into loc_tab values ('A');
insert into rem_tab@alink values ('B');
The connect another session to SYS as SYSDBA on TEST2 and abort the database.
shutdown abort
Back on TEST1, commit the transaction. You will get an error.
SQL> commit;
commit
*
ERROR at line 1:
ORA-02054: transaction 6.21.2068 in-doubt
ORA-03150: end-of-file on communication channel for database link
ORA-02063: preceding line from ALINK
Select * from dba_2pc_pending to see the in doubt transaction on TEST1.
Startup TEST2 and you'll see that the in doubt transaction remains.
The options for dealing with this are COMMIT FORCE or ROLLBACK FORCE. Because we are creating standby databases from TEST1 and don't want the users getting ORA-01591 errors, we will roll back the error.
SQL> select local_tran_id from dba_2pc_pending WHERE state not like 'forced%';
LOCAL_TRAN_ID
----------------------
6.21.2068
SQL> rollback force '6.21.2068';
Rollback complete.
** note the WHERE Clause to avoid seeing other in doubt transaction that have already been forced back recently.
If you choose to COMMIT FORCE '6.21.2068', then it will only commit the local part of the transaction.
I did find that when I restarted TEST2, before performing a 'commit force' on TEST1, the command hung? Not sure about that. I didn't hang on later tests.
Another thing you can do is pretend the failure on TEST2 occurred.
This is done by not aborting TEST2, but instead committing with a comment.
COMMIT COMMENT 'ORA-2PC-CRASH-TEST-7';
Where the 7 can be a number between 1 and 10. The setting of this requires some thinking about. I've listed the values and what they mean at the end of this note.
The result however is the same. You still have to rollback force or commit force the in doubt transaction, but you'll get an error when you do.
Finally there are options to auto fix in doubt transactions based on some criteria.
ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;
ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;
Doc ID 100664.1 from Oracle will give you more detailed information on this matter, and it also has links to other related documents.
COMMIT COMMENT Options.
1 - Crash commit point after collect
2 - Crash non-commit-point site after collect
3 - Crash before prepare (non-commit-point site)
4 - Crash after prepare (non-commit-point site)
5 - Crash commit point site before commit
6 - Crash commit point site after commit
7 - Crash non-commit-point site before commit
8 - Crash non-commit-point site after commit
9 - Crash commit point site before forget
10 - Crash non-commit-point site before forget
Monday, November 3, 2014
Data Guard Transport Lag
I need to thank Emre Baransel's Oracle Blog for pulling me through a tight spot for something I should have already known.
OEM 12c reports the following :-
There are approximately 3,753 seconds of redo not yet available on this standby database.
I know this means that there is a delay or a gap in the transport of REDO logs to a Data Guard Standby Database, but having the commands to hand so you can make a quick diagnosis of the problem, was briefly out of my grasp. So this is what you do.
Locate your Primary and Standby databases and get a sqlplus session running as sysdba on each.
On the primary
SELECT a.thread#,
b. last_seq,
a.applied_seq,
a. last_app_timestamp,
b.last_seq - a.applied_seq arc_diff
FROM (SELECT thread#,
MAX(sequence#) applied_seq,
MAX(next_time) last_app_timestamp
FROM gv$archived_log
WHERE applied = 'YES'
GROUP BY thread#) a,
(SELECT thread#,
MAX (sequence#) last_seq
FROM gv$archived_log
GROUP BY thread#) b
WHERE a.thread# = b.thread#;
Sadly at this point I find that the 'arc_diff' is now zero and the problem has gone from OEM too.
However ,the principle still applies, and when this happens again, I'll put the figures in.
THREAD# LAST_SEQ APPLIED_SEQ LAST_APP_TIMESTAMP ARC_DIFF
---------- ---------- ----------- -------------------- ----------
1 568343 568343 03-NOV-2014 12:02:23 0
Now looking on the Standby.
SELECT thread#, max(SEQUENCE#) FROM V$ARCHIVED_LOG where APPLIED='YES' group by thread#;
THREAD# MAX(SEQUENCE#)
---------- --------------
1 568343
And
select process, status, thread#, sequence#, block#, blocks from v$managed_standby ;
PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
ARCH CLOSING 1 568343 12289 1833
ARCH CONNECTED 0 0 0 0
ARCH CLOSING 1 568342 8193 2034
ARCH CONNECTED 0 0 0 0
MRP0 APPLYING_LOG 1 568344 7008 204800
RFS IDLE 1 568344 7010 116
RFS IDLE 0 0 0 0
RFS IDLE   ; 0 0 0 0
8 rows selected.
•Standby database process status: You can run following query on standby database to see what MRP and RFS processes are doing, which block of which archivelog sequences are being shipped or being applied.
And lastly..... Apply/transport lags: v$dataguard_stats view will show the general synchronization status of standby database.
set lines 200
col name format a40
col value format a20
select * from v$dataguard_stats;
NAME VALUE UNIT TIME_COMPUTED
---------------------------------------- -------------------- ------------------------------ ------------------------------
apply finish time +00 00:00:00.0 day(2) to second(1) interval 03-NOV-2014 12:58:01
apply lag +00 00:00:06 day(2) to second(0) interval 03-NOV-2014 12:58:01
estimated startup time 15 second 03-NOV-2014 12:58:01
standby has been open N 03-NOV-2014 12:58:01
transport lag +00 00:00:00 day(2) to second(0) interval 03-NOV-2014 12:58:01
SQL>
And ..... Apply rate: To find out the speed of media recovery in a standby database, you can use this query
set lines 200
col type format a30
col ITEM format a20
col comments format a20
select * from v$recovery_progress;
START_TIME TYPE ITEM UNITS SOFAR TOTAL TIMESTAMP
-------------------- ------------------------------ -------------------- -------------------------------- ---------- ---------- --------------------
03-NOV-2014 08:41:01 Media Recovery Log Files Files 4 0
03-NOV-2014 08:41:01 Media Recovery Active Apply Rate KB/sec 241 0
03-NOV-2014 08:41:01 Media Recovery Average Apply Rate KB/sec 3 0
03-NOV-2014 08:41:01 Media Recovery Redo Applied Megabytes 48 0
03-NOV-2014 08:41:01 Media Recovery Last Applied Redo SCN+Time 2147483647 0 03-NOV-2014 12:50:07
03-NOV-2014 08:41:01 Media Recovery Active Time Seconds 13 0
03-NOV-2014 08:41:01 Media Recovery Apply Time per Log Seconds 2 0
03-NOV-2014 08:41:01 Media Recovery Checkpoint Time per Seconds 0 0
Log
03-NOV-2014 08:41:01 Media Recovery Elapsed Time Seconds 15598 0
9 rows selected.
Happyjohn