Thursday, July 9, 2020

Oracle Resource Manager | Decoding The Enegma Machine

Oracle Resource Manager (available since 8i) is one of the underestimated Oracle features, because it's not commonly used among DBAs.
In this article I'll try to explain it in a simple way to make you change your mind and start making use of it.
What is Resource Manager:  
In short, it gives the DBA the power of controlling system resources i.e. (CPU) by specifying who has the proirity of using system resources at the time of resource starvation. This means if you don't have CPU bottle necks throughout the day, you don't need it!

Resource Manager comes with Enterprise Edition without extra cost.

Why DBA should use Resource Manager:
The following scenarios illustrate the benefit of using the Resource Manager:

- You want to control the resource consumed by users using tools like SQL Developer/TOAD. (i.e. limit the CPU consumption/ Cancel the query/ Kill the session/ Limit the used degree of parallelism).
- In case there is a resource contention occur at night and you have a very important ETL that should complete on time, and you want Oracle to give this job the highest priority.
- There is an important DB service being used by finance applications at the end of the month, and you don't want other departments connecting via different services to compete the CPU resource with that finance service.
- You want the night RMAN full backup job to release the CPU resource for production applications in case a CPU contention happen at night.

How to use Resource Manager:
Configuring Resource Manager is a bit complicated, but if you follow the right sequence it will be a piece of cake.

Now it's time to setup a Plan and its Consumer Groups and Map these consumer Groups to Consumers. Before getting confused with these terms, let me first start with an example simplifies them to you:

Imagine you have opened a new fried chicken chain called Zeko Fried Chickens ZFC:
You will start with putting a business plan that will boost the food production and get the customers served as quickly as possible [let's refer to this plan as Resource Plan] the plan is as follows:
- You categorized the food production into 3 main meals [Spicy , Non-Spicy and Kids meal]. [let's refer to the meals as Consumer Groups]- You categorize your customers into three categories (Adults like spicy food, Adults don't like spicy food and kids). [Let's refer to the customers as Consumers]
- When customers come in, they will order one of the above mentioned meals, each meal will be ready within 2 minutes. [Let's refer to the process of delivering the meals to the customers as Plan Directive]

Congrats! if you understood the above example you understood the Resource Manager components;
Resource Plan specifies how much CPU will be shared with consumer groups, who will be assigned to these consumer groups, and the distribution of the rest of unused CPU if available.
Consumer Group defines how much CPU resource will be used by specific consumers.
Consumers are the ones who will utilize the resources allocated to the consumer group.
Plan Directive defines the criteria by which the Consumers will get assigned to Consumer Groups.
Next, let's move to the creation of the Resource Manager components ...

- First, create something called Pending Area, it's kind of a temporary area Oracle  use to track/validate the changes to the resource manager to maintain the integrity of the Resource Manager configurations and avoid any miss up could happen. This Pending Area will be associated with your session only, at the end of setting up the Resource Manager you have to close/finish this Pending Area in order to validate and submit the changes. It's similar of BEGIN & END in PL/SQL but this one is specific to Resource Manager.
EXEC  DBMS_RESOURCE_MANAGER.clear_pending_area();
EXEC  DBMS_RESOURCE_MANAGER.create_pending_area(); 

Create a Plan:
EXEC  DBMS_RESOURCE_MANAGER.create_plan('prod_plan', 'Plan for high and low priority tasks.');

    Rollback: Delete Plan:
    -- EXEC dbms_resource_manager.DELETE_PLAN_CASCADE ('prod_plan');

Create consumer groups:
EXEC  DBMS_RESOURCE_MANAGER.create_consumer_group('high_cg',       'high priority free to utilize the cpu');
EXEC  DBMS_RESOURCE_MANAGER.create_consumer_group('low_cg',        'low priority');
EXEC  DBMS_RESOURCE_MANAGER.create_consumer_group('conditioned_cg','low priority when met some conditions');

    Rollback: Delete Consumer Groups:
    -- EXEC DBMS_RESOURCE_MANAGER.delete_consumer_group('high_cg');
    -- EXEC DBMS_RESOURCE_MANAGER.delete_consumer_group('low_cg');
    -- EXEC DBMS_RESOURCE_MANAGER.delete_consumer_group('conditioned_cg');

Create Plan Directive:
Plan Directive defines the resource limits for each Consumer group: [Here is the core idea of the Resource Manager, if you understand it, you understand the whole topic]

Note: MGMT_Px represent the CPU level, MGMT_P1 is the capacity of 100% of all CPUs (cores), whatever will remain unused (leftover) from MGMT_P1 it can be utilized by MGMT_P2 (level2), and whatever leftover from level2 can be used by MGMT_P3 and so on up to 8 levels. Which means if MGMT_P1 is utilizing 100% no CPU will remain left for MGMT_P2.

Ideally, all apps/services should have a share in level 1 (MGMT_P1) each share represent % of total CPU, all shares combined should not exceed 100%, This share doesn't restrict the App/service of using 100% of the CPU if CPU is not utilized by other services (as long as the active running sessions is less than the number of CPUs), but it will maintain that at anytime the Apps/services will get their defined % share in case there are other sessions competing for CPU resources (CPU is busy).
OK, if all applications will be under level 1 then each application will have a defined share in the CPU, but what is the difference between the plan directive levels (MGMT_P1, MGMT_P2, ...)?
The difference between plan directive levels is similar to the lion family, when they are not hungry, and meat is available, anyone of the lion family can eat peacfully from the meat (no CPU contention), but when they all become hungry and meat is not enough for all (CPU contention), here the rule will apply (Plan Directive), the lion will eat first till it get stomach full, then the lionesses and last the lion kids.

Note: There is a consumer group exist by default called "OTHER_GROUPS", it MUST be part of any defined plan directive.

Note: To drop a consumer group you must start with dropping the lowest MGMT_Px consumer group.
Note: Plan directive can be modified (dropped/recreated) on the fly and it will reflect immediately on the current connected sessions once the pending area get submitted.

Define a separate plan directive for default "OTHER_GROUPS": [80% o CPU in Level 1 - highest]
  DBMS_RESOURCE_MANAGER.create_plan_directive (
    plan             => 'prod_plan',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'high priority by default',
    MGMT_P1          => 80,
    MGMT_P2          => 0);
Note: When CPU resources are not enough for everyone, the sessions assigned to this group will gurantee that 80% of CPU resources are reserved for them to utilize.
    Rollback: Delete Plan Directive:
    -- EXEC DBMS_RESOURCE_MANAGER.DELETE_PLAN_DIRECTIVE (plan=>'prod_plan', group_or_subplan=>'OTHER_GROUPS');
Define the LOW Priority group: [20% of CPU in level 1]
  DBMS_RESOURCE_MANAGER.create_plan_directive (
    plan                      => 'prod_plan',
    group_or_subplan => 'low_cg',
    comment              => 'Low Priority - level 1',
    -- Limit the number of the concurrent active sessions at any given time for the sessions associated to this consumer group:
    --ACTIVE_SESS_POOL_P1 => 32
    -- Limit the degree of parallelism for the sessions associated to this consumer group:
    MGMT_P1          => 20
Note: When CPU resources are not enough for everyone, the sessions assigned to this group will gurantee that 20% of CPU resources are reserved for them to utilize.
    Rollback: Delete Plan Directive:
    -- EXEC DBMS_RESOURCE_MANAGER.DELETE_PLAN_DIRECTIVE (plan=>'prod_plan',group_or_subplan=>'low_cg');

Define the Conditioned group: [the least priority but 100% of level 2]
    plan             => 'prod_plan',
    group_or_subplan => 'conditioned_cg',
    comment          => 'if the utilization met specific parameter lower the priority',
    MGMT_P1          => 0,
    MGMT_P2          => 100,
    MGMT_P3          => 0,
    -- Switch to other group (with lower CPU)/Cancel Call/Kill Session if the switch conditions are met:
--  switch_group        => 'low_cg',
-- CANCEL_SQL is available only in 12c onwards.
--  switch_group        => 'CANCEL_SQL',
    switch_group        => 'KILL_SESSION',
    -- If Elapsed Time exceeds 30 minutes (1800 seconds):
    switch_time          => 1800,
    -- If the Number of I/O requests exceeds 500000:
    switch_io_reqs      => 500000,
    -- If the Amount of I/O requested exceeds 1024M:
    switch_io_megabytes => 1024,
    -- switch_for call: (12c+)
    switch_for_call      => TRUE
Note: When CPU resources are not enough for everyone, the sessions assigned to this group will not utilize any CPU resources until the sessions assigned to level 1 groups (OTHER_GROUPS & LOW_CG) are not utilizing the CPU anymore. (remember the lionesses that have to what for the lion to finish his meal in the early example).
    Rollback: Delete Plan Directive:
    -- EXEC DBMS_RESOURCE_MANAGER.DELETE_PLAN_DIRECTIVE (plan=>'prod_plan',group_or_subplan=>'conditioned_cg');

Validate/Apply the resource plan:
EXEC DBMS_RESOURCE_MANAGER.validate_pending_area;
EXEC DBMS_RESOURCE_MANAGER.submit_pending_area();

Map/Assign Users/Services/Modules/Machines to Consumer Groups:

Let's suppose that we don't want the reporting applications to compete with the other production Apps for the CPU resources, in this case we will assign the sessions connecting through the reporting application service and let's say it's name is reporting_svc, to conditioned_cg consumer group:

create a pending area: [temporary work area for Resource Management configuration until it get submitted]

EXEC  DBMS_RESOURCE_MANAGER.clear_pending_area();
EXEC  DBMS_RESOURCE_MANAGER.create_pending_area();

DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (
  --attribute     => DBMS_RESOURCE_MANAGER.oracle_user,
  --attribute     => DBMS_RESOURCE_MANAGER.module_name,
  --attribute     => DBMS_RESOURCE_MANAGER.client_os_user,
  --attribute     => DBMS_RESOURCE_MANAGER.client_program,
  --attribute     => DBMS_RESOURCE_MANAGER.client_machine,
  --attribute     => DBMS_RESOURCE_MANAGER.service_module
  attribute       => DBMS_RESOURCE_MANAGER.service_name,
  value           => 'reporting_svc',
  consumer_group  => 'conditioned_cg');

    Rollback: To Remove an attribute, run the same command with the same attribute & value, but remove "consumer_group" parameter:
    DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (
      attribute       => DBMS_RESOURCE_MANAGER.service_name,
      value           => 'reporting_svc');

Here we are providing service_name but you can assign the sessions to the consumer group based on the other available attributes (which are commented and grayed. 
Now what will happen:
As the sessions connected to reporting_svc became part of conditioned_cg consumer group the following will be applied to them:

- Sessions connected to reporting_svc service will not compete with the other applications when CPU reach 100% and will wait for CPU to come below 100% to start to work.
- If any session connected to reporting_svc service ran a query for more than 30 minutes the session will be killed.
- If any session connected to reporting_svc service exceed 50000 I/O requests the session will be killed.
- If any session connected to reporting_svc service exceed 1GB of the I/O volume the session will be killed. 

You may ask, but you didn't assign the rest of application sessions yet to the other consumer groups, how they will have a high priority then?
Any session that is not yet manually been assigned to any of the consumer groups will be part of OTHER_GROUPS consumer group by default, this explains why OTHER_GROUPS must be part of any plan --to not leave any session not mapped to a consumer group; also explains why we configured it with MGMT_P1 which has the highest priority.

Example of mapping SQL Developer sessions to conditioned_cg group:
           DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (
           attribute     => DBMS_RESOURCE_MANAGER.module_name,
value          => 'SQL Developer',
           consumer_group  => 'conditioned_cg');

Validate/Apply the resource plan:
EXEC DBMS_RESOURCE_MANAGER.validate_pending_area;
EXEC DBMS_RESOURCE_MANAGER.submit_pending_area();

Grant permission to All users to use/switch to the created Consumer Groups:

EXEC dbms_resource_manager_privs.grant_switch_consumer_group('public', 'high_cg', FALSE);
EXEC dbms_resource_manager_privs.grant_switch_consumer_group('public', 'low_cg',  FALSE);
EXEC dbms_resource_manager_privs.grant_switch_consumer_group('public', 'conditioned_cg', FALSE);

Activiate the Resource Plan:
   Rollback: Reset the plan in RAC setup:

   Rollback: Reset the plan in a Standalone setup:

To prevent a resource plan from being changed during such maintenance windows use FORCE keyword:


The advantage of keeping default resource plan DEFAULT_PLAN on RAC protects PMON and LMS, from CPU starvation, SYS & SYSTEM sessions will have the highest priority, maintenance AUTO TASKS (i.e. Gather Statistics, SQL Tuning, Segment Advisor) will have the lowest priority.

Implementation is complete, you can tailor the above steps to cater to your business needs.
Monitoring The Resource Manager:
View which Resource Plan is currently Active: [CPU should be ON]
SQL> select name, cpu_managed from v$rsrc_plan where is_top_plan = 'TRUE';  

View a Consumer Plan and its consumer Groups:
SQL> select group_or_subplan from dba_rsrc_plan_directives where plan = 'prod_plan';

View the History of activated Resource Plans:
SQL> select name, to_char(start_time, 'MON DD HH24:MI') start_time, to_char(end_time, 'MON DD HH24:MI') end_time from v$rsrc_plan_history order by start_time;

View the Consumer Groups Mapping Attributes:
SQL> select * from DBA_RSRC_GROUP_MAPPINGS order by consumer_group;
View the mapped Consumer Group to the connected sessions:
SQL> SELECT username,initial_rsrc_consumer_group FROM dba_users;
SQL> select name,active_sessions,io_service_time,small_read_megabytes from V$RSRC_CONSUMER_GROUP;
SQL> SELECT username,module,sid,serial#,resource_consumer_group,count(*) from v$session
     where username is not null group by username,module,sid,serial#,resource_consumer_group order by 2,1;
SQL> SELECT username,sid, resource_consumer_group FROM  v$session WHERE  SERVICE_NAME='reporting_svc';

     -- Display the Sessions and their current consumer group:
     select s.username,r.sid, r.mapped_consumer_group, r.mapping_attribute,  c.consumer_group original_consumer_group
     from v$rsrc_session_info r, dba_rsrc_consumer_groups c, v$session s where r.orig_consumer_group_id = c.consumer_group_id and s.sid=r.sid;

     -- Display the Sessions that moved between consumer groups:
     select r.sid, c1.consumer_group original_consumer_group, c2.consumer_group current_consumer_group
     from v$rsrc_session_info r, dba_rsrc_consumer_groups c1, dba_rsrc_consumer_groups c2  where r.orig_consumer_group_id = c1.consumer_group_id
     and r.current_consumer_group_id = c2.consumer_group_id and r.orig_consumer_group_id != r.current_consumer_group_id;

View the Consumer Groups:
col CONSUMER_GROUP     for a30
col CATEGORY        for a20
col COMMENTS        for a30
col CPU_METHOD        for a20
col MGMT_METHOD        for a20
col status        for a10
col MANDATORY        for a10
where  consumer_group not like '%ORA$%'

View the Consumer Group attributes:

Consumer Groups CPU Consumption:
SQL> select to_char(begin_time, 'HH:MI') time, consumer_group_name,   60 * (select value from v$osstat where stat_name = 'NUM_CPUS') total_CPU,   60 * (select value from v$parameter where name = 'cpu_count') db_total, cpu_consumed_time / 1000 consumed, cpu_consumed_time / (select value from v$parameter where name =       'cpu_count') / 600 %cpu_utilization, cpu_wait_time / 1000 throttled from v$rsrcmgrmetric_history  order by begin_time;

Note: The existance of wait event "resmgr:cpu quantum" represent the CPU throttling that caused by the Resource Manager when DB has a workload exceeds the CPU capacity.


No comments:

Post a Comment