Wednesday, November 19, 2014

How to demonstrate Oracle In-Doubt Transactions

An 'in doubt transaction' occurs in Oracle when you update both the local and a remote table via a db link but then the remote databases fails.

Here is the example.

Using databases TEST1 and TEST2, create a table on each. Also create a database link from TEST1 to TEST2.

conn user1/pwd1@TEST1
create table loc_tab (afld varchar2(10));
create database link alink connect to user2 identified by pwd2 using 'TEST2';

conn user2/pwd2@TEST2
create table rem_tab (afld varchar2(10));

Then connect to TEST1 and insert a row into both.

conn user1/pwd1@TEST1
insert into loc_tab values ('A');
insert into rem_tab@alink values ('B');

The connect another session to SYS as SYSDBA on TEST2 and abort the database.

shutdown abort

Back on TEST1, commit the transaction. You will get an error.

SQL> commit;
commit
*
ERROR at line 1:
ORA-02054: transaction 6.21.2068 in-doubt
ORA-03150: end-of-file on communication channel for database link
ORA-02063: preceding line from ALINK

Select * from dba_2pc_pending to see the in doubt transaction on TEST1.
Startup TEST2 and you'll see that the in doubt transaction remains.

The options for dealing with this are COMMIT FORCE or ROLLBACK FORCE. Because we are creating standby databases from TEST1 and don't want the users getting ORA-01591 errors, we will roll back the error.

SQL> select local_tran_id from dba_2pc_pending WHERE state not like 'forced%';
LOCAL_TRAN_ID
----------------------
6.21.2068

SQL> rollback force '6.21.2068';
Rollback complete.

** note the WHERE Clause to avoid seeing other in doubt transaction that have already been forced back recently.

If you choose to COMMIT FORCE '6.21.2068', then it will only commit the local part of the transaction.

I did find that when I restarted TEST2, before performing a 'commit force' on TEST1, the command hung? Not sure about that. I didn't hang on later tests.


Another thing you can do is pretend the failure on TEST2 occurred.
This is done by not aborting TEST2, but instead committing with a comment.

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-7';

Where the 7 can be a number between 1 and 10. The setting of this requires some thinking about. I've listed the values and what they mean at the end of this note.

The result however is the same. You still have to rollback force or commit force the in doubt transaction, but you'll get an error when you do.

Finally there are options to auto fix in doubt transactions based on some criteria.

ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;
ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

Doc ID 100664.1 from Oracle will give you more detailed information on this matter, and it also has links to other related documents.


COMMIT COMMENT Options.

1 - Crash commit point after collect
2 - Crash non-commit-point site after collect
3 - Crash before prepare (non-commit-point site)
4 - Crash after prepare (non-commit-point site)
5 - Crash commit point site before commit
6 - Crash commit point site after commit
7 - Crash non-commit-point site before commit
8 - Crash non-commit-point site after commit
9 - Crash commit point site before forget
10 - Crash non-commit-point site before forget

Monday, November 3, 2014

Data Guard Transport Lag

I need to thank Emre Baransel's Oracle Blog for pulling me through a tight spot for something I should have already known.

OEM 12c reports the following :-

There are approximately 3,753 seconds of redo not yet available on this standby database.

I know this means that there is a delay or a gap in the transport of REDO logs to a Data Guard Standby Database, but having the commands to hand so you can make a quick diagnosis of the problem, was briefly out of my grasp. So this is what you do.

Locate your Primary and Standby databases and get a sqlplus session running as sysdba on each.

On the primary

SELECT a.thread#, 
                b. last_seq,
                a.applied_seq,
                a. last_app_timestamp,
                b.last_seq - a.applied_seq   arc_diff
FROM (SELECT   thread#,
                                MAX(sequence#) applied_seq,
                                MAX(next_time) last_app_timestamp
                FROM gv$archived_log
                WHERE applied = 'YES'
                GROUP BY thread#) a,
             (SELECT   thread#,
                                MAX (sequence#) last_seq
                FROM gv$archived_log
                GROUP BY thread#) b
WHERE a.thread# = b.thread#;

Sadly at this point I find that the 'arc_diff' is now zero and the problem has gone from OEM too.
However ,the principle still applies, and when this happens again, I'll put the figures in.

  THREAD#   LAST_SEQ APPLIED_SEQ LAST_APP_TIMESTAMP     ARC_DIFF
---------- ---------- ----------- -------------------- ----------
        1     568343      568343 03-NOV-2014 12:02:23          0

Now  looking on the Standby.

SELECT thread#, max(SEQUENCE#) FROM V$ARCHIVED_LOG where APPLIED='YES' group by thread#;

  THREAD# MAX(SEQUENCE#)
---------- --------------
        1         568343

And

select process, status, thread#, sequence#, block#, blocks from v$managed_standby ;

PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#     BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
ARCH      CLOSING               1     568343      12289       1833
ARCH      CONNECTED             0          0          0          0
ARCH      CLOSING               1     568342       8193       2034
ARCH      CONNECTED             0          0          0          0
MRP0      APPLYING_LOG          1     568344       7008     204800
RFS       IDLE                  1     568344       7010        116
RFS       IDLE                  0          0          0          0
RFS       IDLE    ;               0          0          0          0

8 rows selected.

•Standby database process status: You can run following query on standby database to see what MRP and RFS processes are doing, which block of which archivelog sequences are being shipped or being applied.

And lastly..... Apply/transport lags: v$dataguard_stats view will show the general synchronization status of standby database.

 

set lines 200

col name format a40

col value format a20

select * from v$dataguard_stats;

 

NAME                                     VALUE                UNIT                           TIME_COMPUTED

---------------------------------------- -------------------- ------------------------------ ------------------------------

apply finish time                        +00 00:00:00.0       day(2) to second(1) interval   03-NOV-2014 12:58:01

apply lag                                +00 00:00:06         day(2) to second(0) interval   03-NOV-2014 12:58:01

estimated startup time                   15                   second                         03-NOV-2014 12:58:01

standby has been open                    N                                                   03-NOV-2014 12:58:01

transport lag                            +00 00:00:00         day(2) to second(0) interval   03-NOV-2014 12:58:01

 

SQL>

 

And ..... Apply rate: To find out the speed of media recovery in a standby database, you can use this query

 

set lines 200

col type format a30

col ITEM format a20

col comments format a20

select * from v$recovery_progress;

 

START_TIME           TYPE                           ITEM                 UNITS                                 SOFAR      TOTAL TIMESTAMP

-------------------- ------------------------------ -------------------- -------------------------------- ---------- ---------- --------------------

03-NOV-2014 08:41:01 Media Recovery                 Log Files            Files                                     4          0

03-NOV-2014 08:41:01 Media Recovery                 Active Apply Rate    KB/sec                                  241          0

03-NOV-2014 08:41:01 Media Recovery                 Average Apply Rate   KB/sec                                    3          0

03-NOV-2014 08:41:01 Media Recovery                 Redo Applied         Megabytes                                48          0

03-NOV-2014 08:41:01 Media Recovery                 Last Applied Redo    SCN+Time                         2147483647          0 03-NOV-2014 12:50:07

03-NOV-2014 08:41:01 Media Recovery                 Active Time          Seconds                                  13          0

03-NOV-2014 08:41:01 Media Recovery                 Apply Time per Log   Seconds                                   2          0

03-NOV-2014 08:41:01 Media Recovery                 Checkpoint Time per  Seconds                                   0          0

                                                    Log

 

03-NOV-2014 08:41:01 Media Recovery                 Elapsed Time         Seconds                               15598          0

 

9 rows selected.

 

 

 

Happyjohn