Saturday, 10 December 2011


Oracle Clusterware is a portable cluster infrastructure that provides high availability (HA) to RAC databases and other applications. Here a cluster is a collection of two or more nodes that share a common pool of storage (used by the Oracle Clusterware system files, the OCR and the voting disk), a common network interconnect, and a common operating system.


The primary purpose of the voting disk is to help in situations where private network communication fails. The voting disk is used to communicate node state information, which is then used to determine which nodes will go offline.


The Oracle Cluster Registry (OCR) file is the other key component of Oracle Clusterware. It maintains information about the high availability components in your cluster: the cluster node list used by the Cluster Synchronization Services daemon, the mapping of cluster database instances to nodes, and CRS application resource profiles (such as services, virtual IP addresses, and so on). This file is maintained by administrative tools such as SRVCTL. Its size is around 100 MB.

Oracle Clusterware Run-Time View

On UNIX, the Oracle Clusterware stack is run from entries in /etc/inittab with the respawn action. Here is a basic description of each process:

Cluster Synchronization Service Daemon (OCSSD) : This process runs in both vendor clusterware and nonvendor clusterware environments, and integrates with existing vendor clusterware when present. OCSSD's primary job is internode health monitoring, primarily using the network interconnect as well as the voting disks, and database/ASM instance endpoint discovery via group services. OCSSD runs as the oracle user, and a failure exit causes a machine reboot to prevent data corruption in the event of a split brain.

Process Monitor Daemon (OPROCD) : This process is spawned in any nonvendor clusterware environment and runs as root. It is used to detect hardware and driver freezes on the machine; if OPROCD detects problems, it kills the node. If a machine has been frozen long enough for the other nodes to evict it from the cluster, it needs to kill itself to prevent any I/O from being reissued to the disk after the rest of the cluster has remastered locks.

Cluster Ready Services Daemon (CRSD) : This process is the engine for high availability operations. It manages Oracle Clusterware registered applications and starts, stops, checks, and fails them over via special action scripts. CRSD spawns a dedicated process called RACGIMON that monitors the health of the database and ASM instances and hosts various feature threads such as Fast Application Notification (FAN). One RACGIMON process is spawned for each instance. CRSD maintains configuration profiles as well as resource statuses in the OCR (Oracle Cluster Registry). It runs as root and is restarted automatically on failure. In addition, CRSD can spawn temporary children to execute particular actions, such as: racgeut (execute under time), racgmdb (manage database), racgchsn (change service name), racgons (add/remove ONS configuration in the OCR), and racgvip (start/stop/check the instance virtual IP).

Event Management Daemon (EVMD) : This process forwards cluster events when things happen. It spawns a permanent child, evmlogger, that, on demand, spawns children such as racgevtf to invoke callouts. It runs as oracle and is restarted automatically on failure.

NOTE :- The RACG infrastructure is used to deploy the Oracle database in a highly available clustered environment. This infrastructure is mainly implemented using the racgwrap script, which invokes the racgmain program. It is used by CRS to execute actions for all node-centric resources as well as to proxy actions for all instance-centric resources to RACGIMON. Basically, this infrastructure is responsible for managing all ora.* resources.


Global Cache Service processes (GCS) are processes that, when spawned by Oracle, copy blocks directly from the holding instance's buffer cache and send a read consistent copy of the block to the requesting foreground process on the requesting instance to be placed into the buffer cache. GCS rolls back any uncommitted transactions for any blocks that are being requested for consistent read by the remote instance.

RAC software provides for up to 10 GCS processes (0 through 9), depending on the amount of messaging traffic. By default, however, there is one GCS process per pair of CPUs. In general, the number of GCS processes varies depending on the amount of messaging traffic among the nodes in the cluster. Note: Global Cache Services (GCS) were called Lock Manager Services (LMS) in Oracle Parallel Server (OPS).

Oracle 10g added a new initialization parameter called gcs_server_processes. This parameter allows you to set the number of GCS processes that are started to handle interconnect traffic. The default is 2, but can be increased on a per node basis if required.
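For example, the current value can be inspected and, since this is a static parameter, changed in the spfile ahead of a restart. A sketch (the value 4 is only an illustration, not a recommendation):

```sql
-- Current number of GCS server (LMSn) processes
SHOW PARAMETER gcs_server_processes;

-- Static parameter: takes effect after an instance restart
-- (the value 4 here is illustrative only)
ALTER SYSTEM SET gcs_server_processes = 4 SCOPE = SPFILE SID = '*';
```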


Cluster Group Services (CGS) is a background process that monitors the entire cluster to manage global resources. By constantly probing the other instances, it checks and manages instance deaths and the associated recovery for GCS. When a node joins or leaves the cluster, it handles reconfiguration of locks and resources. In particular, CGS handles the part of recovery associated with global resources. Note: Cluster Group Services was called Lock Manager Monitor in OPS.


The Global Enqueue Service Daemon (LMD) is a background agent process that manages requests for resources to control access to blocks and global enqueues. It manages lock manager service requests for GCS resources and sends them to a service queue to be handled by the GCS process. The GES process also handles global deadlock detection and remote resource requests (remote resource requests are requests originating from another instance).


The Lock Process (LCK) manages non-Cache Fusion resource requests, such as library cache and row cache requests and lock requests that are local to the server. LCK manages instance resource requests and cross-instance call operations for shared resources. It builds a list of invalid lock elements and validates lock elements during recovery. Because the LMS process handles the primary function of lock management, only a single LCK process exists in each instance.


The Diagnosability Daemon (DIAG) background process monitors the health of the instance and captures diagnostic data about process failures within instances. The operation of this daemon is automated and updates an alert log file to record the activity that it performs.

Oracle 11g Processes

In addition, Oracle 11g added the following new processes:

ACMS: The Atomic Controlfile to Memory Service (ACMS) process ensures that SGA memory updates are applied on all nodes or on none.

GTX: The Global Transaction Process supports XA transactions within RAC.

RMS: The RAC Management Process creates resources that allow new database instances to be added to the cluster.

RSM: The Remote Slave Monitor creates slave processes for processes running on other RAC instances within the cluster.
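A quick way to see which of these background processes are actually started on an instance is to query V$BGPROCESS. A sketch (the PADDR filter keeps only running processes):

```sql
-- Background processes currently started on this instance
SELECT name, description
FROM   v$bgprocess
WHERE  paddr <> '00'
ORDER  BY name;
```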

Wednesday, 23 November 2011

Actual use of Active Session History (ASH)

Active Session History (ASH) was introduced in Oracle 10g. It samples the activity of each active database session every second. The data is held in a buffer in memory in the database. The design goal is to keep about an hour of history (your mileage will vary).
If a session is not active, it will not be sampled. The in-memory buffer is exposed via a view called v$active_session_history. If you want inactive sessions to be sampled as well, you can set the hidden parameter _ash_enable_all = TRUE (unsupported, so use it with care).

When an AWR snapshot is taken, 1 row in 10 from the ASH buffer is copied down into the AWR repository. The buffer can also be flushed to disk between snapshots when it reaches 66% full, so no data is missed. The data is stored in WRH$_ACTIVE_SESSION_HISTORY and is exposed via dba_hist_active_sess_history.
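Because only one sample in ten is retained, each AWR row stands for roughly ten seconds of session activity. A rough top-SQL-by-DB-time sketch against the repository (the factor of 10 follows from the sampling just described):

```sql
-- Approximate DB time per statement over the last day
-- (each retained AWR sample represents ~10 seconds of activity)
SELECT   sql_id,
         COUNT(*) * 10 AS est_db_seconds
FROM     dba_hist_active_sess_history
WHERE    sample_time > SYSDATE - 1
GROUP BY sql_id
ORDER BY est_db_seconds DESC;
```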

The main components of ASH are:
- two background processes, MMON and MMNL, and
- a rolling buffer, which sits in the SGA fixed area and holds the historical statistical information of active sessions.

ASH is enabled by default, but before you rush off to use it, be aware that it is a licenced feature. It is part of the Diagnostic Pack, so you have to pay for it. I don’t like that either, but that’s how it is.

The ASH buffer auto-configures itself to try to hold at least one hour's worth of instance activity without wasting too much valuable SGA space. It takes 2 MB of memory per CPU, but will never be bigger than 5% of SGA_TARGET or, if automatic SGA management is not used, 5% of the shared pool.
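That sizing rule can be sketched as a small calculation (a simplified model of the heuristic above, not Oracle's exact internal algorithm):

```python
def ash_buffer_mb(cpu_count, shared_pool_mb):
    """Estimate the ASH buffer size in MB.

    Simplified model of the rule above: 2 MB per CPU,
    capped at 5% of the shared pool (or of SGA_TARGET
    when automatic SGA management is in use).
    """
    return min(2 * cpu_count, 0.05 * shared_pool_mb)

# 8 CPUs, 1000 MB shared pool -> the per-CPU term wins: 16 MB
print(ash_buffer_mb(8, 1000))
# 64 CPUs, 1000 MB shared pool -> the 5% cap wins: 50 MB
print(ash_buffer_mb(64, 1000))
```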

Comparison of ASH with SQL TRACE

ASH and SQL*Trace are not the same thing, but both are valuable tools for finding out about where processes spend time. SQL*Trace (or event 10046 as we used to call it) has been my weapon of choice for solving performance issues for a very long time, and it is extremely effective, and there is still a place for it.

There are difficulties with using SQL trace, especially in a production environment.

Firstly, it does have a run-time overhead. You could afford to trace a single process, but you certainly couldn't trace the entire database.

1) You have to work with trace in a reactive way. You will probably not already be tracing a process when you experience a performance problem, so you need to run the process again and reproduce the poor performance with trace.

2) Trace will tell you if a session is blocked waiting on a lock. However, it will not tell you who is blocking you. ASH will do this (although there are limitations).

3) A trace file records everything that happens in a session, whereas ASH samples the session every second. Short-lived events will be missed, so the data has to be handled statistically.

4) There are problems with both approaches if you have the kind of application where you have lots of different SQL statements because the application uses literal values rather than bind variables (and cursor sharing is EXACT).

5) Oracle’s TKPROF trace file profiler cannot aggregate these statements, but I have found another called ORASRP that can. With ASH, you will see different SQL_IDs, but it can be effective to group statements with the same execution plan.

6) You may have trouble finding the SQL text in the SGA (or via the DBMS_XPLAN package) because it has already been aged out of the library cache. You may have similar problems with historical ASH data because the statement had been aged out when the AWR snapshot was taken.

7) A trace file, with STATISTICS_LEVEL set to ALL, will give you timings for each operation in the execution plan. So, you can see where in the execution plan the time was spent. ASH will only tell you how long the whole statement takes to execute, and how long was spent on which wait event.

Application Instrumentation

Oracle has provided a package called DBMS_APPLICATION_INFO since at least Oracle 8. This allows you to set two attributes, MODULE and ACTION, for a session. Those values then appear in v$session and can be very useful in identifying which database sessions relate to which part of an application. These values are also captured by ASH. I cannot over-emphasise the importance of this instrumentation when analysing performance issues. Without sensible values in these columns, all you have is the program name, and you will probably struggle to identify the ASH data for the sessions of interest. These values are not set by default; DBAs are dependent on developers to include them in their code. For example, Oracle E-Business Suite has built this into the application.
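Setting these attributes is a one-liner from PL/SQL (the module and action names below are hypothetical):

```sql
BEGIN
  -- Tag this session so v$session and ASH show where the work comes from
  DBMS_APPLICATION_INFO.SET_MODULE(
    module_name => 'PAYROLL',        -- hypothetical module name
    action_name => 'MONTHLY_RUN');   -- hypothetical action name
END;
/
```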

The following query fetches the top SQL statements by CPU, wait, and I/O sample counts (thanks to Kyle Hailey for this script):

select
   ash.SQL_ID,
   sum(decode(ash.session_state,'ON CPU',1,0)) "CPU",
   sum(decode(ash.session_state,'WAITING',1,0)) -
   sum(decode(ash.session_state,'WAITING', decode(en.wait_class,'User I/O',1,0),0)) "WAIT",
   sum(decode(ash.session_state,'WAITING', decode(en.wait_class,'User I/O',1,0),0)) "IO",
   sum(decode(ash.session_state,'ON CPU',1,1)) "TOTAL"
from v$active_session_history ash, v$event_name en
where ash.SQL_ID is not NULL
  and en.event# = ash.event#
group by ash.SQL_ID
order by sum(decode(ash.session_state,'ON CPU',1,1)) desc;

Tuesday, 22 November 2011

I/O Calibration Workaround

The I/O calibration feature of Oracle Database enables you to assess the performance of the storage subsystem, and determine whether I/O performance problems are caused by the database or the storage subsystem. Unlike other external I/O calibration tools that issue I/Os sequentially, the I/O calibration feature of Oracle Database issues I/Os randomly using Oracle data files to access the storage media, producing results that more closely match the actual performance of the database.

Oracle Database also provides Orion, an I/O calibration tool. Orion is a tool for predicting the performance of an Oracle database without having to install Oracle or create a database. Unlike other I/O calibration tools, Oracle Orion is expressly designed for simulating Oracle database I/O workloads using the same I/O software stack as Oracle. Orion can also simulate the effect of striping performed by Oracle Automatic Storage Management.

Prerequisites for I/O calibration:
1) The user must be granted the SYSDBA privilege.
2) TIMED_STATISTICS must be set to TRUE.
3) Asynchronous I/O must be enabled. When using file systems, asynchronous I/O can be enabled by setting the FILESYSTEMIO_OPTIONS initialization parameter to SETALL.
4) Ensure that asynchronous I/O is enabled for the data files.
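Whether asynchronous I/O is actually in effect for the data files can be checked with a query along these lines (a sketch based on the V$IOSTAT_FILE view, available from 11g):

```sql
-- ASYNCH_IO shows ASYNC_ON when asynchronous I/O is enabled for the file
SELECT f.name, i.asynch_io
FROM   v$datafile f, v$iostat_file i
WHERE  f.file# = i.file_no
AND    i.filetype_name = 'Data File';
```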



Orion Test Targets

You can use Orion to test any disk-based character device that supports asynchronous I/O. Orion has been tested on the following types of targets:
1) DAS (direct-attached) storage: You can use Orion to test the performance of one or more local disks, volumes, or files on the local host.
2) SAN (storage-area network) storage: Orion can be run on any host that has all or parts of the SAN storage mapped as character devices. The devices can correspond to striped or un-striped volumes exported by the storage array(s), or individual disks, or one or more whole arrays.
3) NAS (network-attached storage): You can use Orion to test the performance on data files on NAS storage. In general, the performance results on NAS storage are dependent on the I/O patterns with which the data files have been created and updated. Therefore, you should initialize the data files appropriately before running Orion.

For each type of workload , Orion can run tests using different I/O loads to measure performance metrics such as MBPS, IOPS, and I/O latency. Load is expressed in terms of the number of outstanding asynchronous I/Os. Internally, for each such load level, the Orion software keeps issuing I/O requests as fast as they complete to maintain the I/O load at that level. For random workloads, using either large or small sized I/Os, the load level is the number of outstanding I/Os. For large sequential workloads, the load level is a combination of the number of sequential streams and the number of outstanding I/Os per stream. Testing a given workload at a range of load levels can help you understand how performance is affected by load.
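A minimal Orion run might look like the following (a sketch: the test name and disk count are illustrative, and Orion expects a <testname>.lun file listing the raw devices to test):

```
# mytest.lun lists the candidate devices, one per line, e.g.:
#   /dev/sdb
#   /dev/sdc

# Small random I/O test across 4 disks (illustrative values)
./orion -run simple -testname mytest -num_disks 4
```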

Note the following when you use Orion:
1) Run Orion when the storage is idle (or pretty close to idle). Orion calibrates the performance of the storage based on the I/O load it generates; Orion is not able to properly assess the performance if non-Orion I/O workloads run simultaneously.
2) If a database has been created on the storage, the storage can alternatively be calibrated using the PL/SQL routine dbms_resource_manager.calibrate_io().
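A sketch of such a calibration run (the disk count and latency target are illustrative; the parameter names follow the documented DBMS_RESOURCE_MANAGER.CALIBRATE_IO signature):

```sql
SET SERVEROUTPUT ON
DECLARE
  l_max_iops PLS_INTEGER;
  l_max_mbps PLS_INTEGER;
  l_latency  PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 4,    -- illustrative: number of spindles
    max_latency        => 10,   -- illustrative: target latency in ms
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops='||l_max_iops||
                       ' max_mbps='||l_max_mbps||
                       ' latency='||l_latency);
END;
/
```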

Thursday, 26 May 2011


An SQL cursor is nothing but an area in user session memory that contains data about a specific statement. An SQL cursor actually has two parts: the first heap contains library cache metadata, and the second, larger heap contains the executable representation of the cursor (usually called the SQLAREA). Each heap is comprised of one or more chunks of memory.

The SQL area can be thought of as a handle that points to a private SQL area in the UGA, which in turn points to the shared SQL area. All SQL cursors are allocated in 4K chunks; the bigger the cursor, the more 4K chunks are allocated. An open cursor has already been parsed, and its cursor handle is in the library cache.

If the cursor is not in the shared pool, it obviously has to be reconstructed (a hard parse), which requires the shared pool and library cache latches. A soft parse is preferable to a hard parse because the database skips the optimization and row source generation steps, proceeding straight to execution.

To reduce the amount of latching in the database, DBAs configure a session cursor cache, which contains instantiations of shared child cursors (closed session cursors) so that repeatable statements can reuse them. The parameter SESSION_CACHED_CURSORS is independent of OPEN_CURSORS; you can set it higher than OPEN_CURSORS because session cursors are not cached in the open state. In 11g R2 a commonly used value of OPEN_CURSORS is 300, and the default for SESSION_CACHED_CURSORS is 50.

When the statement is in the library cache, the associated cursor handle is placed on a hash chain; all these cursor handles live in the shared pool inside hash chains. The parsing procedure actually works in the following order:

1) Find and execute an open cursor.
2) Find a closed cursor in the session cursor cache.
3) Search the hash chains (a soft parse).
4) Construct the cursor (a hard parse).
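How well the session cursor cache is working for the current session can be gauged from the session statistics. A sketch joining V$MYSTAT to V$STATNAME:

```sql
-- 'session cursor cache hits' vs 'parse count (total)' shows how often
-- a closed cursor was found in the cache instead of being re-parsed
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name IN ('session cursor cache hits',
                   'session cursor cache count',
                   'parse count (total)');
```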

A cursor is a handle to a specific private SQL area. The private SQL area, which is in the UGA, keeps information about a parsed SQL statement and session-specific information for processing. When a statement is executed, the process uses the private SQL area to store bind variable values, the query execution work area, and so on.
A shared SQL area is different from a private SQL area: the shared SQL area is in the library cache in the SGA and contains the execution plan.

In the context of PL/SQL, cursors fall into the following categories.


Static cursors are the normal cursors (explicit and implicit). For explicit cursors, a cursor is a mechanism by which you can assign a name to a SELECT statement and manipulate the information within that SQL statement. A programmer can open this kind of cursor, fetch data from it, close it, and check its attributes.

Implicit cursors are faster and require less coding effort. An implicit cursor will never raise an INVALID_CURSOR error, and it cannot be opened outside its statement.


A REF CURSOR is created only when it is opened. A REF CURSOR is basically a data type. A variable created based on such a data type is generally called a cursor variable. A cursor variable can be associated with different queries at run time. The primary advantage of using cursor variables is their ability to pass result sets between subprograms (stored procedures, functions, packages, and so on).
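A sketch of a cursor variable passing a result set out of a function (the EMP table and its columns are just the classic demo schema, assumed here):

```sql
CREATE OR REPLACE FUNCTION dept_employees (p_deptno NUMBER)
  RETURN SYS_REFCURSOR
IS
  l_rc SYS_REFCURSOR;   -- cursor variable of the predefined weak type
BEGIN
  OPEN l_rc FOR
    SELECT empno, ename
    FROM   emp                -- assumption: classic SCOTT.EMP demo table
    WHERE  deptno = p_deptno;
  RETURN l_rc;                -- the result set travels with the variable
END;
/
```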

Saturday, 21 May 2011

Adaptive Cursor Sharing


The adaptive cursor sharing feature enables a single statement that contains bind variables to use multiple execution plans. Cursor sharing is "adaptive" because the cursor adapts its behavior so that the database does not always use the same plan for each execution or bind variable value.


A bind-sensitive cursor is a cursor whose optimal plan may depend on the value of a bind variable. The database monitors the behavior of a bind-sensitive cursor that uses different bind values to determine whether a different plan is beneficial.

The criteria used by the optimizer to decide whether a cursor is bind-sensitive include the following:

1) The optimizer has peeked at the bind values to generate selectivity estimates.
2) A histogram exists on the column containing the bind value.


A bind-aware cursor is a bind-sensitive cursor eligible to use different plans for different bind values. After a cursor has been made bind-aware, the optimizer chooses plans for future executions based on the bind value and its selectivity estimate. When a statement with a bind-sensitive cursor executes, the database decides whether to mark the cursor bind-aware. The decision depends on whether the cursor produces significantly different data access patterns for different bind values.

If the database marks the cursor bind-aware, then the next time that the cursor executes the database does the following:

1) Generates a new plan based on the new bind value.
2) Marks the original cursor generated for the statement as not shareable (V$SQL.IS_SHAREABLE is N). This cursor is no longer usable and will be among the first to be aged out of the shared SQL area.
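The cursor flags involved can be watched in V$SQL (substitute the SQL_ID of the statement of interest):

```sql
SELECT child_number, is_bind_sensitive, is_bind_aware, is_shareable,
       plan_hash_value, executions
FROM   v$sql
WHERE  sql_id = '&sql_id'    -- substitute the statement of interest
ORDER  BY child_number;
```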


If the optimizer creates a plan for a bind-aware cursor, and if this plan is the same as an existing cursor, then the optimizer can perform cursor merging. In this case, the database merges cursors to save space in the shared SQL area. The database increases the selectivity range for the cursor to include the selectivity of the new bind. Suppose you execute a query with a bind value that does not fall within the selectivity ranges of the existing cursors. The database performs a hard parse and generates a new plan and new cursor. If this new plan is the same plan used by an existing cursor, then the database merges these two cursors and deletes one of the old cursors.