Friday, August 17, 2007

Listener hangs in Oracle 10g and no new connections are allowed

Just when I was thinking how boring my job as a DBA was (doing the same routine work, with nothing new to implement until you convince everyone and get their approvals!!), I was made to feel that DBA work is not at all boring when you are at the receiving end :-). I was suddenly bombarded with mails from users saying that they were not able to connect to the database. As usual, I checked the listener status and replied to some of them that there was no visible problem with the listener, but that I would get back to them with a solution. It was a production database and the pressure on me was mounting every second. I was not sure what to do first: reply to the users or solve the problem :-)

The database was Oracle 10.2.0.1 running on Linux (RHEL 4). The listener seemed to be hung, and there were no errors in the listener log file.

I don’t know what made me think of it, but I checked the listener process using the ps command. I was surprised to see a child process, forked automatically, with the same name as the current listener.

$ ps -ef |grep tns
oracle 2310 1 0 Jul 17 ? 72:00 /oracle/ora10g/db/bin/tnslsnr oprem -inherit
oracle 6573 2310 0 14:19:23 ? 0:00 /oracle/ora10g/db/bin/tnslsnr oprem -inherit

I killed the child process and then reloaded the listener. Phew!! It worked. The users were able to connect to the database again.
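The ps check above can also be scripted. What follows is only a sketch in Python, assuming ps -ef style output in which the second and third columns are PID and PPID. The function name find_forked_listeners and the SAMPLE text (copied from the output above) are illustrative and not part of any Oracle tool. It only reports tnslsnr processes whose parent is another tnslsnr; whether to kill one remains a manual decision.

```python
def find_forked_listeners(ps_output):
    """Return (pid, ppid) pairs for tnslsnr processes whose parent is also a tnslsnr.

    Assumes `ps -ef` style lines: UID PID PPID ... CMD.
    """
    listeners = []  # (pid, ppid) of every tnslsnr process seen
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to carry PID and PPID.
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        # Match the executable itself, so that "grep tns" lines are ignored.
        is_listener = any(f == "tnslsnr" or f.endswith("/tnslsnr") for f in fields[3:])
        if is_listener:
            listeners.append((int(fields[1]), int(fields[2])))
    listener_pids = {pid for pid, _ in listeners}
    return [(pid, ppid) for pid, ppid in listeners if ppid in listener_pids]


# Sample text taken from the ps output shown in this post.
SAMPLE = """\
oracle 2310 1 0 Jul 17 ? 72:00 /oracle/ora10g/db/bin/tnslsnr oprem -inherit
oracle 6573 2310 0 14:19:23 ? 0:00 /oracle/ora10g/db/bin/tnslsnr oprem -inherit
"""

for pid, ppid in find_forked_listeners(SAMPLE):
    print("tnslsnr pid %d is a child of listener pid %d" % (pid, ppid))
```

In practice you would feed it the text captured from ps -ef on the database server instead of SAMPLE.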

Later I found that this is an Oracle bug (No.4518443).

Some of the possible solutions for the above problem are:

1. Kill the child process using the kill command and then reload the listener.

2. Add the following parameter to the listener.ora file and restart the listener.

SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER_NAME=OFF

Here LISTENER_NAME is a placeholder for the name of your listener (oprem in this case).

3. Apply the patch.

4. Rename the ons.config file (located in $ORACLE_HOME/opmn/conf) and restart the listener.
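For option 2, the parameter name has to be built from the listener name. A tiny Python sketch of that substitution follows; the helper name node_down_event_off is made up for illustration, and it passes the listener name through unchanged because this post does not say whether case matters.

```python
def node_down_event_off(listener_name):
    """Build the listener.ora line that switches off the node-down event subscription."""
    name = listener_name.strip()
    # A listener name with embedded whitespace cannot form a valid parameter name.
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("listener name must be a single non-empty word")
    return "SUBSCRIBE_FOR_NODE_DOWN_EVENT_%s=OFF" % name


# The listener in this post is called oprem.
print(node_down_event_off("oprem"))
```

The printed line is what would be added to listener.ora before restarting the listener.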

This problem is fixed in 10.2.0.3.

Note: If you add the above parameter to listener.ora in a RAC setup, Fast Application Notification (FAN) will not work.

For the patch and more details, refer to Metalink note 340091.1.

Hope this was useful to you guys.

Monday, August 13, 2007

Sometimes something goes unnoticed...

ORA-01194: File 5 needs media recovery to be consistent


When you get the above error and decide that you have to recover the datafile (or database), think twice. You may not always have to recover the file. I faced this scenario at one of our client sites. Though the initial plan was to restore the datafile from the backup and recover it, I had to change my mind.

Now the scenario… I was informed that a guy at the client site had restarted the database (on Windows) and that they could not open it because of a datafile inconsistency. After some digging I found that the problem was with the backup process!!

It so happened that while the hot backup (user-managed) of the database was being taken, that guy had shut down the database. On restart, a particular datafile was asking for media recovery. When I checked, I found that the datafile was still in backup mode, i.e. the database had been shut down while the backup was going on. The solution is simple: mount the database, issue the alter database datafile 5 end backup command, and open the database. There is absolutely no need to recover the datafile!! :-)
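With the database mounted, the V$BACKUP view shows which datafiles are still in backup mode (STATUS = 'ACTIVE'). As a rough sketch, assuming you have already fetched (file#, status) pairs from V$BACKUP by whatever means you like, the matching end backup statements can be generated as follows. The function name end_backup_statements and the sample rows are invented for illustration.

```python
def end_backup_statements(vbackup_rows):
    """Turn (file#, status) rows from V$BACKUP into 'end backup' statements.

    Only files whose status is ACTIVE are still in backup mode.
    """
    statements = []
    for file_no, status in sorted(vbackup_rows):
        if status.strip().upper() == "ACTIVE":
            statements.append("alter database datafile %d end backup;" % int(file_no))
    return statements


# Hypothetical rows: only datafile 5 was caught in backup mode.
rows = [(1, "NOT ACTIVE"), (5, "ACTIVE"), (2, "NOT ACTIVE")]
for stmt in end_backup_statements(rows):
    print(stmt)
```

Run the generated statements in sqlplus against the mounted database and then open it.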

So what exactly happened here?!? Well, nothing strange. It is normal Oracle behavior, nothing else. When the instance was restarted, the datafile that was in backup mode looked old because its header had been frozen at an older SCN. So it will (and it should :-)) ask for recovery. That’s it!

Generally, on Unix servers, if you issue a normal shutdown command (anything except abort!!) while a hot backup is going on, the shutdown won’t happen. It throws an error saying that a datafile is in backup mode. So there is very little chance of facing this scenario on Unix servers.

But on Windows, it is general practice to shut down the database by stopping the OracleService in the Services window. In this case, even if a backup is in progress, the database goes down with a shutdown abort issued internally. You won’t even come to know that a shutdown abort has happened.

You can also face this scenario when the instance crashes during a hot backup. You try to restart the instance without knowing that a backup was active at the time of the crash, and end up with the same error.

So, don’t panic… Take it easy!!!

How to recover a table using RMAN backup

This article is about recovering a particular table (which was dropped or truncated) using an RMAN backup. I assume that logical backups are not planned because the size of the database is in terabytes (TB), and that RMAN is used for backing up the database. Tablespace point-in-time recovery (TSPITR) does not come into the picture here because we want to recover a single table and not all the objects in the tablespace.

It is assumed that

- The target database is on host A and the RMAN full backup was taken before the table TEST which is to be recovered was dropped.
- The database is to be restored onto host B
- The directory structure of host B is different to host A
- The ORACLE_SID will not change for the restored database
- The backups were carried out to disk
- TEST table to be recovered is in the tablespace TEST_DATA

The following steps are required:

- make the backup available to host B
- make a copy of the init.ora available to host B
- edit the init.ora to reflect directory structure changes
- set up a password file for the duplicated database
- mount the database
- restore and rename the datafiles
- recover and open the database
- export and import the table to the target database

These steps are explained further below.


1. Back up the latest controlfile

sqlplus> alter database backup controlfile to '/oracle/control.ctl';

Note: Back up the archived logs as well.
Move all the archived logs generated since the backup was taken to host B.


2. List Datafile Locations on Host A

The datafile numbers and locations on host A are required. These datafile locations will change on host B.

sqlplus> select file#, name from v$datafile;

file# name
----- ------------------------------
1 /oracle/orcl/oradata/system01.dbf
2 /oracle/orcl/oradata/users..dbf
3 /oracle/orcl/oradata/undo01.dbf
4 /oracle/orcl/oradata/tools01.dbf
5 /oracle/orcl/oradata/test01.dbf
6 /oracle/orcl/oradata/test02.dbf
7 /oracle/orcl/oradata/undo02.dbf
8 /oracle/orcl/oradata/rcvcat.dbf


3. Make the Backups Available to Host B

During restore, RMAN will expect the backup sets to be located in the same directory as written to during the backup. For disk backups, this can be accomplished in many ways:

- set up an NFS directory, mounted on both host A and host B
- use of symbolic links on host B



4. init.ora on Host B

The "init.ora" needs to be made available on host B. Any location-specific parameters must be amended. For example,
- *_dump_dest
- log_archive_dest*
- control_files=(control file backup taken on Host A)


5. Set Up the Password File

In order to allow RMAN remote connections, a password file must be set up for the duplicated database. For example,

$ orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID password=oracle


6. Recover the Database

On Host B perform the following steps.

6.1 Startup nomount the database

sqlplus> startup nomount pfile=

6.2 Mount the database

sqlplus> alter database mount;

6.3 Rename and restore the datafiles, and perform database recovery

RMAN can be used to change the location of the datafiles from the location on host A to the new location on host B. Here, rename only the datafiles of the SYSTEM, UNDOTBS1 and TEST_DATA tablespaces.

Note: If you have two undo tablespaces in your database and you keep switching between them, it is necessary to restore both undo tablespaces.

RMAN> run {

allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;

set newname for datafile 1 to '/oracle/datafiles/system01.dbf';
set newname for datafile 3 to '/oracle/datafiles/undo01.dbf';
set newname for datafile 5 to '/oracle/datafiles/test01.dbf';
set newname for datafile 6 to '/oracle/datafiles/test02.dbf';
set newname for datafile 7 to '/oracle/datafiles/undo02.dbf';

restore tablespace SYSTEM;
restore tablespace UNDOTBS1;
restore tablespace TEST_DATA;
switch datafile all;
}
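Typing the set newname lines by hand gets tedious when there are many datafiles. The following Python sketch generates them from the file# / name listing of step 2, assuming the files keep their base names and all move into a single new directory. The helper name set_newname_lines is hypothetical; the paths are the ones used in this article.

```python
import posixpath


def set_newname_lines(datafiles, file_numbers, new_dir):
    """Build RMAN 'set newname' lines for the chosen datafiles.

    datafiles: dict of file# -> current path on host A
    file_numbers: the file#s that will actually be restored
    new_dir: target directory on host B (base names are kept)
    """
    lines = []
    for file_no in sorted(file_numbers):
        base_name = posixpath.basename(datafiles[file_no])
        new_path = posixpath.join(new_dir, base_name)
        lines.append("set newname for datafile %d to '%s';" % (file_no, new_path))
    return lines


# Datafiles of SYSTEM, the undo tablespaces and TEST_DATA, as listed on host A.
datafiles = {
    1: "/oracle/orcl/oradata/system01.dbf",
    3: "/oracle/orcl/oradata/undo01.dbf",
    5: "/oracle/orcl/oradata/test01.dbf",
    6: "/oracle/orcl/oradata/test02.dbf",
    7: "/oracle/orcl/oradata/undo02.dbf",
}

for line in set_newname_lines(datafiles, [1, 3, 5, 6, 7], "/oracle/datafiles"):
    print(line)
```

The output can be pasted into the run block ahead of the restore commands.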

6.4 Recover and open the database

Perform incomplete recovery, and take the datafiles of the tablespaces other than SYSTEM, UNDOTBS1 and TEST_DATA offline. This makes the restore easier, i.e. you don’t have to restore the whole database backup. When you issue the offline drop command, the controlfile assumes that these files are not needed for recovery (so there is no need to restore them!!). This is helpful when you have a database of, say, 1 TB and the tablespace containing the table to be recovered is, say, 10 GB. By skipping the restoration of the other tablespaces you save a lot of time and space.

sqlplus> alter database datafile 2 offline drop;
sqlplus> alter database datafile 4 offline drop;
sqlplus> alter database datafile 8 offline drop;
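The files to take offline are simply all datafiles minus the ones that were restored. A small Python sketch of that set difference follows, using the file numbers from this article; the helper name offline_drop_statements is invented for illustration.

```python
def offline_drop_statements(all_file_numbers, restored_file_numbers):
    """Build 'offline drop' statements for every datafile that was not restored."""
    skipped = sorted(set(all_file_numbers) - set(restored_file_numbers))
    return ["alter database datafile %d offline drop;" % n for n in skipped]


# Host A has datafiles 1 to 8; files 1, 3, 5, 6 and 7 were restored on host B.
for stmt in offline_drop_statements(range(1, 9), [1, 3, 5, 6, 7]):
    print(stmt)
```

For the numbers above this yields statements for datafiles 2, 4 and 8, matching the commands in this step.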

sqlplus> recover database using backup controlfile until cancel; (or until time)

Roll the database forward by applying archived redo log files up to the point just before the table was dropped, then stop the recovery by typing cancel at the prompt (assuming that the required archived redo log files are in the log_archive_dest directory).


6.5 Rename the logfiles prior to opening the database

sqlplus> alter database rename file '/oracle/orcl/oradata/redo1.log' to '/oracle/redologs/redo1.log';

sqlplus> alter database rename file '/oracle/orcl/oradata/redo2.log' to '/oracle/redologs/redo2.log';

sqlplus> alter database open resetlogs;

Now you can query the table TEST to check the data.

Once you have confirmed that the table TEST is recovered, export it and import it into the target database.