Every DBA knows the My Oracle Support (MOS) statement: ‘This script is intended to be a self-help tool to assist in diagnosing and resolving the problem you encountered.’
I don’t remember who wrote that, but I like it. And since we’ve got to learn how to fix the problems ourselves before our bosses will let us hire help, this is good advice.
By now, most of us have many MOS scripts for fixing things: adding database links, adding users, and creating temporary tablespaces. Even so, there are still some basic tasks that nobody seems to write down anywhere for reference or future use! These are the little steps that we forget when we are under duress. Or worse yet: we know about them but just cannot find them when needed.
Now, this is not a problem with the MOS statement itself. You are always welcome to report it in Bugzilla. But the difficulty of finding these scripts could be eased by more descriptive text on some of them.
I have written here before about how to recover single objects from a damaged database, but today I am going to show you how to get everything back! Since everybody’s SQL*Plus setup is different, all I can offer are steps that work for me, so you will need to adapt them slightly if you use another toolset.
The steps are grouped in “phases” which together should restore your control over your database:
1) Find out what needs fixing
2) Collect data
3) Apply fixes
4) Repair and recover
5) (Optional) Clean up
1. Find out what needs fixing
After an instance crash, you might see lots of red in your alert log file. This is normal, but before jumping in to fix things, make sure you have found out what actually made the server go down, so that it does not go down again. Here are some good resources for diagnosing problems:
Before doing anything, you should figure out why your database is in the state it’s in. If there are no good reasons for recovery yet, you can jump straight to Phase 2 and start collecting data.
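As a starting point for that diagnosis, here is a minimal sketch that counts the ORA- error codes in an alert log so the most frequent ones stand out. It assumes the alert log is a plain-text file and that error codes appear in the usual ORA-nnnnn form; the function names are my own and are not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches Oracle error codes written as ORA- followed by five digits.
ORA_CODE = re.compile(r"\bORA-\d{5}\b")


def count_ora_errors(lines):
    """Return a Counter of the ORA- error codes found in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(ORA_CODE.findall(line))
    return counts


def summarize_alert_log(path):
    """Read a text alert log and return (code, hits) pairs, most frequent first."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_ora_errors(handle).most_common()
```

Calling summarize_alert_log with the path to your own alert log gives a short list of codes to look up first. It only counts occurrences and says nothing about cause, so treat it as a way to decide where to start reading.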
2. Collect Data
This step includes backing up critical files, running integrity checks on those backups, and running scripts that can help you diagnose problems:
- For basic information about your instance: show parameter settings and execution plans (explain plan)
- To find further problems: list segments by tablespace/owner/object; analyze tablespace usage; analyze object dependencies.
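The "back up, then check the backup" part of this phase can be sketched as a file copy followed by a checksum comparison. This is only an illustration under assumptions: it is meant for files that are not being written to while they are copied (for example a text parameter file, or files of a database that is shut down), and the function names and directory layout are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_checksum(source, backup_dir):
    """Copy one file into backup_dir and confirm the copy matches the source."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    expected = sha256_of(source)
    actual = sha256_of(target)
    if expected != actual:
        raise IOError(f"checksum mismatch for {source}")
    return target, actual
```

Keeping the returned digest next to the copy lets you re-check the backup later, before you rely on it during a restore.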
3. Apply Fixes
The main goal of this phase is to get the most important files back, either by restoring or re-creating them:
Data files and control files; online redo logs; archived redo logs; parameter file; SPFILE backups.
4. Repair and recover
Now that you have fixed your database enough that it can run on its own for a while, you should try to repair the damage caused by the crash itself. This step includes fixing dependent objects which are no longer valid because their base objects became invalid, e.g.:
- truncate table t1
- rename table t1 to t2
- recreate indexes on table t2
- delete from t1 where 1=2
- index ix on t1 (…)
5. (Optional) Clean up
After the crash, lots of temporary files accumulate in your data directories. It’s a good idea to clean them up regularly so you don’t run out of space.
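A cautious way to approach this is to list candidate files first and delete nothing automatically. The sketch below does exactly that, assuming the leftovers can be recognised by a file name pattern and by age; the default "*.tmp" pattern and the function name are assumptions for illustration, so adjust them to whatever your own directories actually contain.

```python
import time
from pathlib import Path


def find_stale_files(directory, max_age_days, pattern="*.tmp", now=None):
    """List files matching pattern that were last modified over max_age_days ago.

    Nothing is deleted; the caller reviews the list before acting on it.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    stale = []
    for path in Path(directory).rglob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)
```

Review the returned list by hand before removing anything, because a pattern that is too broad could match files the database still needs.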
You might also want to stop collecting archived redo logs unless you are planning an instant recovery or flashback database operation. If there is nothing else to fix right now, you can also improve general performance by clearing some caches and re-initializing components like AQ and ASM.
In this post I have shown you how to recover from an instance crash using Oracle built-in tools and scripts. As always, the more information there is on the Internet about a problem that affects us all, the better our chances of finding a good solution for it quickly.
The recovery process can be long and tricky. The best way to avoid errors is to have a good plan in advance. You might want to include the steps listed above in your own list of typical maintenance activities that you already execute from time to time.
As part of the Ask Tom Team, he focuses on performance tuning and troubleshooting complex technical problems in all areas of Oracle technology: applications, database server internals, storage engines and servers. He has been working actively with Oracle since 1993 and has obtained several professional certifications including OCP.