Wednesday 23 November 2011

Actual use of Active Session History (ASH)

Active Session History (ASH) was introduced in Oracle 10g. It samples the activity of each active database session every second. The data is held in a buffer in memory in the database. The design goal is to keep about an hour of history in memory (your mileage will vary).
If a session is not active it will not be sampled, although you can set the hidden parameter _ash_enable_all = TRUE if you want inactive sessions to be sampled as well. The in-memory buffer is exposed via a view called v$active_session_history.
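For example, to see what a single session has been doing over the last few minutes (the SID here is illustrative):

select sample_time, session_state, event, sql_id
from v$active_session_history
where session_id = 123                    -- SID of the session of interest
and sample_time > sysdate - 5/1440        -- last five minutes
order by sample_time;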

When an AWR snapshot is taken, 1 row in 10 from the ASH buffer is copied down into the AWR repository. The buffer can also be flushed to disk between snapshots when it reaches 66% full, so no data is missed. The persisted data is stored in WRH$_ACTIVE_SESSION_HISTORY and exposed via dba_hist_active_sess_history.

The main components of ASH are:
- two background processes, MMON and MMNL,
- a rolling buffer, which sits in the SGA fixed area and holds the historical statistical information of active sessions.

ASH is enabled by default, but before you rush off to use it, be aware that it is a licensed feature. It is part of the Diagnostic Pack, so you have to pay for it. I don’t like that either, but that’s how it is.

ASH will auto-configure itself to try to hold at least one hour's worth of instance activity without wasting too much valuable SGA space. It allocates about 2MB of memory per CPU, but will never exceed 5% of SGA_TARGET, or 5% of the shared pool if automatic SGA management is not used.
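You can check how much memory the buffer actually got from v$sgastat; the component name below ('ASH buffers') is what I have seen on 10g and 11g, so verify it on your version:

select pool, name, round(bytes/1024/1024,1) mb
from v$sgastat
where name = 'ASH buffers';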

Comparison of ASH with SQL TRACE

ASH and SQL*Trace are not the same thing, but both are valuable tools for finding out where processes spend time. SQL*Trace (or event 10046, as we used to call it) has been my weapon of choice for solving performance issues for a very long time; it is extremely effective, and there is still a place for it.

There are difficulties with using SQL trace, especially in a production environment.

Firstly, it does have a run-time overhead. You could afford to trace a single process, but you certainly couldn’t trace the entire database.

1) You have to work with trace in a reactive way. You will probably not already be tracing a process when you experience a performance problem, so you will need to run the process again and reproduce the poor performance with trace enabled.

2) Trace will tell you if a session is blocked waiting on a lock. However, it will not tell you who is blocking you. ASH will do this (although there are limitations); see the query after this list.

3) A trace file records everything that happens in a session, whereas ASH samples the session once per second. Short-lived events will be missed, so the data has to be handled statistically.

4) There are problems with both approaches if you have the kind of application that generates lots of different SQL statements because it uses literal values rather than bind variables (and CURSOR_SHARING is set to EXACT).

5) Oracle’s TKPROF trace file profiler cannot aggregate these statements, but I have found another profiler, ORASRP (www.oracledba.ru/orasrp), that can. With ASH, you will see different SQL_IDs, but it can be effective to group statements that share the same execution plan.

6) You may have trouble finding the SQL text in the SGA (or via the DBMS_XPLAN package) because it has already been aged out of the library cache. You may have similar problems with historical ASH data if the statement had already been aged out when the AWR snapshot was taken.

7) A trace file, with STATISTICS_LEVEL set to ALL, will give you timings for each operation in the execution plan, so you can see where in the plan the time was spent. ASH will only tell you how long the whole statement took to execute, and how long was spent on which wait event.
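As a sketch of point 2, here is a blocker lookup against the in-memory view, assuming 10gR2 or later where the blocking_session column is available (note it is only populated for some wait events):

select sample_time, session_id, blocking_session, event, sql_id
from v$active_session_history
where event like 'enq:%'                  -- lock (enqueue) waits
order by sample_time;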

Application Instrumentation

Oracle has provided a package called DBMS_APPLICATION_INFO since at least Oracle 8. This allows you to set two attributes, MODULE and ACTION, for a session. These values then appear in v$session, and can be very useful in identifying which database sessions relate to which part of an application. They are also captured by ASH. I cannot over-emphasise the importance of this instrumentation when analysing performance issues. Without sensible values in these columns, all you have is the program name, and you will probably struggle to identify the ASH data for the sessions you are interested in. These values are not set by default; DBAs are dependent on developers to include them in their code. Oracle E-Business Suite, for example, has built this into the application.
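Setting these values is trivial; here is a minimal sketch (the module and action names are invented for illustration):

begin
  -- identify the business function this session is about to perform
  dbms_application_info.set_module(
    module_name => 'ORDER_ENTRY',         -- illustrative module name
    action_name => 'VALIDATE_PAYMENT');   -- illustrative action name
  -- ... do the work ...
  -- clear the action when the step completes
  dbms_application_info.set_action(action_name => null);
end;
/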

The following query reports the top SQL statements by time spent on CPU, on non-I/O waits, and on I/O (thanks to Kyle Hailey for this script):

select
ash.SQL_ID ,
sum(decode(ash.session_state,'ON CPU',1,0)) "CPU",
sum(decode(ash.session_state,'WAITING',1,0)) -
sum(decode(ash.session_state,'WAITING', decode(en.wait_class, 'User I/O',1,0),0)) "WAIT" ,
sum(decode(ash.session_state,'WAITING', decode(en.wait_class, 'User I/O',1,0),0)) "IO" ,
sum(decode(ash.session_state,'ON CPU',1,1)) "TOTAL"
from v$active_session_history ash, v$event_name en
where ash.SQL_ID is not NULL and en.event#=ash.event#
group by ash.SQL_ID
order by sum(decode(ash.session_state,'ON CPU',1,1)) desc;
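The same approach works against the AWR-persisted data; a sketch, remembering that each row in dba_hist_active_sess_history represents roughly ten seconds of database time, so the counts are relative weights rather than exact seconds:

select ash.SQL_ID, count(*) samples
from dba_hist_active_sess_history ash
where ash.SQL_ID is not null
and ash.sample_time > sysdate - 1         -- last 24 hours of retained samples
group by ash.SQL_ID
order by count(*) desc;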

Tuesday 22 November 2011

I/O Calibration Workaround

The I/O calibration feature of Oracle Database enables you to assess the performance of the storage subsystem, and determine whether I/O performance problems are caused by the database or the storage subsystem. Unlike other external I/O calibration tools that issue I/Os sequentially, the I/O calibration feature of Oracle Database issues I/Os randomly using Oracle data files to access the storage media, producing results that more closely match the actual performance of the database.

Oracle Database also provides Orion, a standalone I/O calibration tool, which is described in the overview below.

Prerequisites for I/O calibration:
1) The user must be granted the SYSDBA privilege.
2) TIMED_STATISTICS must be set to TRUE.
3) Asynchronous I/O must be enabled. When using file systems, it can be enabled by setting the FILESYSTEMIO_OPTIONS initialization parameter to SETALL.
4) Ensure that asynchronous I/O is enabled for data files by running the following query:
SELECT NAME, ASYNCH_IO FROM V$DATAFILE F, V$IOSTAT_FILE I
WHERE F.FILE# = I.FILE_NO AND FILETYPE_NAME = 'Data File';
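Once the prerequisites are met, calibration is run through DBMS_RESOURCE_MANAGER.CALIBRATE_IO. A minimal sketch; the disk count and latency figures below are illustrative, so substitute values that describe your storage:

set serveroutput on
declare
  l_max_iops       pls_integer;
  l_max_mbps       pls_integer;
  l_actual_latency pls_integer;
begin
  dbms_resource_manager.calibrate_io(
    num_physical_disks => 10,  -- approximate number of physical disks (illustrative)
    max_latency        => 20,  -- maximum tolerated latency in milliseconds (illustrative)
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_latency);
  dbms_output.put_line('max_iops       = ' || l_max_iops);
  dbms_output.put_line('max_mbps       = ' || l_max_mbps);
  dbms_output.put_line('actual_latency = ' || l_actual_latency);
end;
/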

An Overview of the Orion Tool

Oracle Orion is a tool for predicting the performance of an Oracle database without having to install Oracle or create a database. Unlike other I/O calibration tools, Oracle Orion is expressly designed for simulating Oracle database I/O workloads using the same I/O software stack as Oracle. Orion can also simulate the effect of striping performed by Oracle Automatic Storage Management.

Orion Test Targets

You can use Orion to test any disk-based character device that supports asynchronous I/O. Orion has been tested on the following types of targets:
1) DAS (direct-attached) storage: You can use Orion to test the performance of one or more local disks, volumes, or files on the local host.
2) SAN (storage-area network) storage: Orion can be run on any host that has all or parts of the SAN storage mapped as character devices. The devices can correspond to striped or un-striped volumes exported by the storage array(s), or individual disks, or one or more whole arrays.
3) NAS (network-attached storage): You can use Orion to test the performance on data files on NAS storage. In general, the performance results on NAS storage are dependent on the I/O patterns with which the data files have been created and updated. Therefore, you should initialize the data files appropriately before running Orion.

For each type of workload, Orion can run tests using different I/O loads to measure performance metrics such as MBPS, IOPS, and I/O latency. Load is expressed in terms of the number of outstanding asynchronous I/Os. Internally, for each such load level, the Orion software keeps issuing I/O requests as fast as they complete to maintain the I/O load at that level. For random workloads, using either large or small sized I/Os, the load level is the number of outstanding I/Os. For large sequential workloads, the load level is a combination of the number of sequential streams and the number of outstanding I/Os per stream. Testing a given workload at a range of load levels can help you understand how performance is affected by load.
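For example, a basic first test might be invoked like this, where mytest.lun is a text file listing one device or LUN per line (the flag values are illustrative; check the Orion documentation for your version):

./orion -run simple -testname mytest -num_disks 8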

Note the following when you use Orion:
1) Run Orion when the storage is idle (or pretty close to idle). Orion calibrates the performance of the storage based on the I/O load it generates; Orion is not able to properly assess the performance if non-Orion I/O workloads run simultaneously.
2) If a database has been created on the storage, the storage can alternatively be calibrated using the PL/SQL routine dbms_resource_manager.calibrate_io(), as sketched under the prerequisites above.