RAC FAQ

1.How to check whether the db is running on the nodes
--> srvctl status database -d <db_name>
output=>Instance <instance_name> is running on node <node_name>
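The status output can also be checked from a script. This is only a minimal sketch: it assumes each output line has the form "Instance <name> is running on node <node>" or "Instance <name> is not running on node <node>", and the function name down_instances and the sample instance names are made up for illustration.

```shell
#!/bin/sh
# down_instances: read `srvctl status database -d <db_name>` output on stdin
# and print the name of every instance reported as "not running".
# Assumed line format (hypothetical names):
#   Instance racdb1 is running on node rac1
#   Instance racdb2 is not running on node rac2
down_instances() {
    awk '$1 == "Instance" && $3 == "is" && $4 == "not" { print $2 }'
}

# Sample run on canned text, so it works without a cluster
printf '%s\n' \
    'Instance racdb1 is running on node rac1' \
    'Instance racdb2 is not running on node rac2' | down_instances
```

On a real cluster the usage would be: srvctl status database -d <db_name> | down_instances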
2.How to stop rac environment
i)stop dbconsole
$export ORACLE_SID=<sid_name>
$emctl stop dbconsole

ii)stop services
srvctl stop service -d <db_name>
iii)stop database
srvctl stop database -d <db_name>
iv)stop asm in rac1
srvctl stop asm -n <rac1>

v)stop asm in rac2
srvctl stop asm -n <rac2>
vi)stop (gsd,ons,listener,vip) nodeapps
srvctl stop nodeapps -n <rac1>
srvctl stop nodeapps -n <rac2>
vii)stop cluster
a)#cd /etc/init.d
#./init.crs stop
 or
b)#cd $ORA_CRS_HOME/bin
#./crsctl stop crs
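The order of the steps above matters, so it can help to print the plan before running anything. The following is a dry-run sketch only: it echoes the commands from steps ii) to vii) and does not execute them. The function name rac_stop_plan and the names racdb, rac1 and rac2 are assumptions for illustration; the dbconsole step and the init.crs alternative are left out.

```shell
#!/bin/sh
# rac_stop_plan: print (do not run) the shutdown commands in the order listed
# above: services, database, ASM per node, nodeapps per node, then the cluster.
# Usage: rac_stop_plan <db_name> <node> [<node> ...]
rac_stop_plan() {
    db=$1
    shift
    echo "srvctl stop service -d $db"
    echo "srvctl stop database -d $db"
    for node in "$@"; do
        echo "srvctl stop asm -n $node"
    done
    for node in "$@"; do
        echo "srvctl stop nodeapps -n $node"
    done
    echo "crsctl stop crs   # as root, on each node"
}

# Hypothetical two-node example
rac_stop_plan racdb rac1 rac2
```

Printing first and running the lines by hand keeps the sketch safe to try on any machine.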
3.How to register a database
$srvctl add database -d <db_name> -o $ORACLE_HOME
4.How to register an instance
$srvctl add instance -i <instance_name> -n <node_name> -d <db_name>
5.How to see the patches in cluster
$opatch lsinventory -detail -oh $ORA_CRS_HOME
6.How to see the patches in db
$opatch lsinventory -detail -oh $ORACLE_HOME
7.How to check OLR
bin#ocrcheck -local
8.How to know which nodes are participating in the cluster
$./olsnodes -s
output=><node1> Active
<node2> Active
9.How to see which resources are registered in the cluster
$crs_stat -t
(crs_stat is deprecated in 11gr2; use crsctl status resource -t)
10.How to check the status of the cluster
$crsctl check crs
11.How to know the version of the cluster
$crsctl query crs softwareversion
or
$crsctl query crs activeversion
output=>clusterware version on node rac1 is 11.2.0.2.0

12.How to check/start/stop the cluster in 11gr2
bin#crsctl check/start/stop cluster -all
(in 11gr2 one command can be run for all the nodes at once)
13.How to know the location of voting disk
$crsctl query css votedisk
14.How to know the location of the ocr file and check the integrity of the ocr
$ocrcheck
Version: 11.2.0.3
Total space (kbytes): 262120
Used space (kbytes): 2844
Available space (kbytes): 264668210
15.How to know the location of OLR(11gr2)
bin#ocrcheck -local (need root privs)
16.How to dump the content of the OCR into a textfile
$ocrdump
17.How to know the default backup location of the ocr file
$ocrconfig -showbackup
18.How to disable/enable automatic startup of the cluster
#crsctl disable/enable crs
19.How to know the disk timeout (delay)
$crsctl get css disktimeout
20.How to know the network heartbeat timeout (misscount)
$crsctl get css misscount

ADMINISTERING ORACLE RAC USING SRVCTL
=====================================
1.How to check the status of all instances
$srvctl status database -d <db_name>
2.How to check the status of a specific instance
$srvctl status instance -i <instance_name> -d <db_name>
3.How to know the configuration of database
$srvctl config database -d <db_name>
4.How to enable/disable the database
$srvctl enable/disable database -d <db_name>
5.How to create a high availability service
$srvctl add service -s <service_name> -d <db_name> -r <preferred_instance> -a <available_instance> -P basic
6.How to check the status of/start/stop a specific service
$srvctl status/start/stop service -s <service_name> -d <db_name>
7.How to stop the listener
$srvctl stop listener -n <node_name>
8.How to know the configuration of scan
$srvctl config scan
9.How to know the config/status of scan listener
srvctl status/config scan-listener
10.what is the default location of OLR
$GRID_HOME/cdata/<host_name>.olr
11.How to know the master node
select * from gv$ges_resource;
or
ocrconfig -showbackup (the node that took the automatic backups is the OCR master) or the alert log file
12.How to take manual backup of ocr
$ocrconfig -export /opt/ocr.bkp
13.How to restore the OCR file from the default (automatic) backup
$ocrconfig -restore $ORA_CRS_HOME/cdata/crs/ocr001.ocr
(note:- the cluster should be down on all the nodes)
14.How to restore ocr from manual backup
$ocrconfig -import /opt/ocr.bkp
15.By default Oracle does not take a backup of the voting disk (before 11gr2; from 11gr2 the voting disk data is backed up automatically in the OCR and dd backup/restore is not supported)
->if the vd is in the cluster file system, use the cp command to take the backup
->if the vd is in a raw partition, use the dd command to take the backup
#dd if=/dev/sda5 of=/opt/voting.bkp
16.How to restore the VD
#dd if=/opt/voting.bkp of=/dev/sda5
==============================================
1.What is the weak start dependency on the vip for clusterware resources
->When a db instance starts, the resource tries to start the vip for the node. If the vip does not start successfully, the instance still starts but the services do not start.

2.What is the generic server pool
->
i)The Oracle-defined server pool is called generic
ii)Oracle manages the generic server pool to support administrator managed dbs
iii)When we add or remove an administrator managed db using either srvctl or DBCA, Oracle RAC creates or removes the server pools that are members of generic
iv)We can't use srvctl or crsctl to modify the generic server pool

3.What are policy managed dbs
->
i)we have pmdbs (policy managed dbs) in 11gr2
ii)pmdbs and amdbs (administrator managed dbs) can't coexist on the same servers
iii)pmdbs run in one or more db server pools that are created in the cluster
iv)pmdbs can run on different servers at different times
v)if you are using Oracle ASM with OMF for your db storage, then when an instance starts and there is no redo thread available, Oracle RAC automatically enables one and creates the required redo log files and undo tablespace.

4.What is awr
->A built-in repository that exists in every Oracle db. At regular intervals, the Oracle database takes a snapshot of all of its vital statistics and workload information and stores them in awr.

5.What is cache coherency
->The synchronization of data in multiple caches so that reading a memory location through any cache returns the most recent data written to that location through any other cache. It is sometimes called cache consistency.

6.What is cardinality
->The number of database instances you want running during normal operations
7.What is the cluster
->Multiple interconnected computers or servers that appear to end users and applications as if they are one server

7.What is the cluster file system
->A distributed filesystem built on a cluster of servers that collaborate to provide high performance service to their clients. The cluster file system s/w deals with distributing requests to the storage cluster components.

8.What is cluster ready services daemon (CRSD)
->The primary Oracle clusterware process, which performs high availability recovery and management operations such as maintaining the OCR

9.What are GV$ views
->
i)In addition to the v$ information, each GV$ view contains an extra column, inst_id, which displays the instance number from which the associated v$ view information was obtained.
ii)They are created automatically if we create the database with DBCA
iii)If we create the db manually then we have to run the catclustdb.sql script.

10.What are the advantages of a policy managed database
i)Before 11gr2, databases were administrator managed: a dba managed each instance of the db by defining specific instances to run on specific nodes in the cluster.
ii)11gr2 introduces dynamic grid configuration with policy managed databases, where the dba is required only to define the cardinality, i.e. the number of db instances required.
iii)Oracle clusterware manages the allocation of nodes to run the instances.
iv)Oracle RAC allocates the required redo threads and undo tablespaces. This happens only if the db uses Oracle managed files.

11.What are the RAC background processes
i)ACMS:- Atomic controlfile to memory service
The per-instance ACMS process is an agent that helps ensure a distributed sga memory update is either globally committed on success or globally aborted if a failure occurs
ii)GTX0-J:- Global transaction process
It provides transparent support for XA global transactions in a RAC env. The db auto-tunes the number of these processes based on the workload of XA global transactions.
iii)LMON:- Global enqueue service monitor
It monitors global enqueues and resources across the cluster and performs global enqueue recovery operations
iv)LMD:- Global enqueue service daemon
It manages incoming remote resource requests within each instance
v)LMS:- Global cache service process
It maintains records of datafile statuses and of each cached block by recording information in the GRD (global resource directory)
It controls the flow of messages to remote instances and manages global datablock access and block images between the buffer caches of different instances.
vi)LCK0:- Instance enqueue process
It manages non-cache fusion resource requests such as library and row cache requests
vii)RMSn:- Oracle RAC management process
It creates resources when a new instance is added to the cluster
viii)RSMN:- Remote slave monitor
It manages background slave process creation and communication on remote instances.
These background slave processes perform tasks on behalf of a coordinating process running in another instance

==============================================================
12.What is the process of adding a node in rac

Step 1:- from rac1, check whether the new node is ready from a hardware and operating system perspective
rac1$su - grid
$export GRID_HOME=/u01/app/11.2.0/grid
$ $GRID_HOME/bin/cluvfy stage -post hwos -n rac3

Step 2:- from rac1, check the compatibility of the new node by comparing it to an existing node
rac1$ $GRID_HOME/bin/cluvfy comp peer -refnode rac1 -n rac3 -orainv oinstall -osdba dba -verbose

Step 3:- verify the integrity of the cluster and whether it is ready for a new node

$GRID_HOME/bin/cluvfy stage -pre nodeadd -n rac3 -fixup -verbose

Step 4:- From an existing node, extend the clusterware to the new node using addNode.sh
rac1$export IGNORE_PREADDNODE_CHECKS=Y
rac1$ $GRID_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={rac3}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac3-vip}"

Step 5:- Verify that the clusterware has been extended to the new node properly
rac1$ $GRID_HOME/bin/cluvfy stage -post nodeadd -n rac3 -verbose

Step 6:- extend the oracle db s/w to the new node
rac1$echo $ORACLE_HOME
rac1$ $ORACLE_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={rac3}"
run the root.sh script on the new node as root, as directed
rac3#/u01/app/oracle/product/11.2.0/db_1/root.sh

Step 7:- change the group ownership and permissions of the oracle executable in the newly created $ORACLE_HOME on rac3
rac3$export ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1
rac3$chgrp asmadmin $ORACLE_HOME/bin/oracle
rac3$chmod 6751 $ORACLE_HOME/bin/oracle
rac3$ls -ltr $ORACLE_HOME/bin/oracle

Step 8:- Verify the administrative privileges across all nodes
rac3$ $ORACLE_HOME/bin/cluvfy comp admprv -o db_config -d $ORACLE_HOME -n rac1,rac2,rac3 -verbose

Step 9:- satisfy the node-instance dependencies on the new node rac3: create the password file, init.ora file and oratab entry for the new instance
rac3$echo $ORACLE_HOME
$cd $ORACLE_HOME/dbs
dbs$mv initracdb1.ora initracdb3.ora
dbs$mv orapwracdb1 orapwracdb3
$echo "racdb3:$ORACLE_HOME:N" >> /etc/oratab

From a node with an existing instance of racdb, create the public thread, undo tablespace and init.ora entries for the new instance
rac1$export ORACLE_SID=racdb1
$. oraenv
$sqlplus "/ as sysdba"
sql>alter database add logfile thread 3 group 7 ('+data','+fra') size 100m, group 8 ('+data','+fra') size 100m, group 9 ('+data','+fra') size 100m;
sql>alter database enable public thread 3;
sql>create undo tablespace undotbs3 datafile '+data' size 200m;
sql>alter system set undo_tablespace=undotbs3 scope=spfile sid='racdb3';
sql>alter system set cluster_database_instances=3 scope=spfile sid='*';

Step 10:
Update the OCR for the new instance and check its status and configuration
rac3$srvctl add instance -d racdb -i racdb3 -n rac3
rac3$srvctl status instance -d racdb -i racdb3
rac3$srvctl config database -d racdb

add the racdb3 instance to the 'racsvc.colestock.test' service and verify
rac3$srvctl add service -d racdb -s racsvc.colestock.test -r racdb3 -u
rac3$srvctl config service -d racdb -s racsvc.colestock.test

Step 11:-
start the new instance and verify the status of instance and services
rac3$srvctl start instance -d racdb -i racdb3
rac3$srvctl status instance -d racdb -i racdb3 -v

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1.What is the purpose of the private interconnect
-->
i)Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.

ii)RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache fusion is the remote memory mapping of oracle buffers shared between the caches of the participating nodes in the cluster.

2.Why do we have a virtual ip (vip) in oracle rac
-->
Without using vips or fan, clients connected to a node that died will often wait for a tcp timeout period (which can be up to 10 mins) before getting an error. As a result, you don't really have a good HA solution without using vips.

When a node fails, the vip associated with it is automatically failed over to some other node, and the new node re-ARPs the world, indicating a new MAC address for the IP. Subsequent packets sent to the vip go to the new node, which sends error RST packets back to the clients. This results in the clients getting errors immediately.

3.What is the voting disk

-->
Oracle clusterware uses the voting disk (VD) to determine which nodes are members of a cluster. The VD must reside on a shared disk. Basically all nodes in the RAC cluster register their heartbeat information on this VD, and the heartbeats recorded there decide which nodes are counted as active in the RAC cluster. The VDs are also used for checking the availability of nodes in RAC and for removing unavailable nodes from the cluster. This helps in preventing the split brain condition and keeps database information intact.
For high availability, oracle recommends that you have a min. of 3 VDs. If you configure a single VD, then you should use external mirroring to provide redundancy. You can have up to 32 VDs in your cluster (the limit is 15 from 11gr2). The reason for an odd number of VDs is that a node must see a majority (more than half) of the VDs to continue to function; with 2 VDs, a node that can see only 1 sees exactly half, which is not a majority.
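The majority rule is easy to tabulate. A small sketch, assuming only the rule stated above (a node must see more than half of the VDs); the function names vd_majority and vd_tolerated are made up for illustration.

```shell
#!/bin/sh
# vd_majority: smallest number of voting disks a node must see to keep
# running, i.e. strictly more than half of the configured disks.
vd_majority() {
    echo $(( $1 / 2 + 1 ))
}

# vd_tolerated: how many voting disks can be lost while a majority remains.
vd_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Tabulate 1 to 5 voting disks
for n in 1 2 3 4 5; do
    echo "VDs=$n majority=$(vd_majority "$n") tolerated_failures=$(vd_tolerated "$n")"
done
```

The table shows why an even count buys nothing: 2 VDs tolerate no more failures than 1, and 4 VDs no more than 3.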

4.What is split brain syndrome
-->
In an oracle rac environment all the instances/servers communicate with each other using a high speed interconnect on a private network. This pvt network interface or interconnect is redundant and is used only for inter-instance oracle datablock transfers.
In an oracle rac system, split brain occurs when the instance members in a RAC fail to ping/connect to each other via this pvt interconnect, while the servers are all physically up and running and the database instance on each of these servers is also running. The individual nodes are running fine and can conceptually accept user connections and work independently.

So basically, due to the lack of communication, each instance thinks that the other instance it is not able to connect to is down, and that it needs to do something about the situation. The problem is that if we leave these instances running, the same block might be read and updated in the individual instances and there would be a data integrity issue, as the blocks changed in one instance would not be locked and could be overwritten by another instance. Oracle has efficiently implemented a check for the split brain syndrome.

5.What does rac do in case a node becomes inactive
-->
In rac, if any node becomes inactive or if other nodes are unable to ping/connect to a node in the rac, then the node which first detects that one of the nodes is not accessible will evict that node from the rac group.
Ex:- There are 4 nodes in a rac cluster and node 3 becomes unavailable. Node 1 tries to connect to node 3 and finds it not responding, so the nodes will evict node 3 from the rac group and leave only node 1, node 2 and node 4 in the rac group to continue functioning.
Ex 2:- (complicated, 10 nodes)
There are 10 rac nodes in a cluster, and say 4 nodes are not able to communicate with the other 6. So there are 2 groups formed in this 10 node rac cluster (one group of 4 nodes and the other of 6 nodes).
Now the nodes will quickly try to affirm their membership by locking the controlfile; the node that locks the controlfile will then check the votes of the other nodes. The group with the most active nodes gets preference and the others are evicted. The more common case is only 1 node getting evicted while the rest function fine.
When a node is evicted, oracle rac will usually reboot that node and try to do a cluster reconfiguration to include the evicted node again. The error is ORA-29740.
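The "largest group wins" rule from Ex 2 can be sketched as a small function. This is an illustration of the rule as described here, not of Oracle's actual implementation; the function name surviving_group is made up, and a tie is only reported, since the text does not say how a tie is resolved.

```shell
#!/bin/sh
# surviving_group: given the sizes of the sub-groups formed after an
# interconnect failure, print the size of the group that keeps running
# (the largest one). Prints "tie" when two groups share the largest size.
surviving_group() {
    best=0
    tie=no
    for size in "$@"; do
        if [ "$size" -gt "$best" ]; then
            best=$size
            tie=no
        elif [ "$size" -eq "$best" ]; then
            tie=yes
        fi
    done
    if [ "$tie" = yes ]; then
        echo tie
    else
        echo "$best"
    fi
}

# The 10-node example: groups of 4 and 6 nodes, so the group of 6 survives
surviving_group 4 6
```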
