Thursday 23 February 2023

What does the error message "fork: retry: Resource temporarily unavailable" mean?

Issue:
The following errors are seen in /var/log/secure:

Feb 23 14:34:57 erpuatappl sshd[11618]: fatal: setresuid 501: Resource temporarily unavailable
Feb 23 14:41:15 erpuatappl sshd[11850]: error: do_exec_pty: fork: Resource temporarily unavailable

Root Cause:
There can be various reasons for processes not being able to fork:

There is a misbehaving service or process running, consuming more resources than expected.
The system was not able to create new processes, because of the limits set for nproc in /etc/security/limits.conf.
The system ran out of memory and new processes were unable to start because they could not allocate memory.
There is not an available ID to assign to the new process. A unique value less than kernel.pid_max must be available.

Resolution:
Because there are several possible causes, there are also several possible resolutions:

1. When the system runs into a limitation in the number of processes, increase the nproc value in /etc/security/limits.conf or /etc/security/limits.d/90-nproc.conf depending on RHEL version. 

2. The limit can be increased for a specific user or for all users. Here is an example /etc/security/limits.d/90-nproc.conf file:

<user>       -          nproc     2048      <<<----[ Only for the "<user>" user ]

*            -          nproc     2048      <<<----[ For all users ]

3. Check the total number of threads and processes running on the server:

[applmgr@erpuatappl ~]$  ps -eLf | wc -l

2332

[applmgr@erpuatappl ~]$ cat /proc/sys/kernel/pid_max

32768

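In this example, 2332 threads and processes are running against a kernel.pid_max of 32768, so PID exhaustion is not the cause here. If the count approaches the limit, increase kernel.pid_max (for example with sysctl -w kernel.pid_max=65536, persisted in /etc/sysctl.conf).

kernel.pid_max must be larger than the total number of simultaneous threads and processes.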
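
The comparison above can be scripted. This is only a minimal sketch: the helper names pid_usage_pct and pid_usage_check are hypothetical, and the 80% warning threshold is an assumed value, not a kernel default.

```shell
#!/bin/sh
# Hypothetical helper: percentage of kernel.pid_max consumed by a
# given number of threads/processes (integer arithmetic, rounds down).
pid_usage_pct() {
    used=$1
    max=$2
    echo $(( used * 100 / max ))
}

# Hypothetical helper: print a warning when usage reaches the
# threshold (80 is an assumed value, not a kernel default).
pid_usage_check() {
    pct=$(pid_usage_pct "$1" "$2")
    if [ "$pct" -ge 80 ]; then
        echo "WARNING: ${pct}% of pid_max in use"
    else
        echo "OK: ${pct}% of pid_max in use"
    fi
}
```

On a live system it could be called as pid_usage_check "$(ps -eLf | wc -l)" "$(cat /proc/sys/kernel/pid_max)". The ps count includes one header line, which does not matter at this scale.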

Diagnostic Steps:

Check with sar whether all memory was used or whether a large number of processes was spawned.

**To check process usage against what the user is allowed, look at the output of ulimit -u for the limit set for that particular user, and compare it with the number of processes the user is running.

**You can run the command below to find the number of processes and threads owned by each user, and compare that with the limits defined in /etc/security/limits.conf or /etc/security/limits.d/*.

[applmgr@erpuatappl ~]$ ps --no-headers auxwwwm | awk '$2 == "-" { print $1 }' | sort | uniq -c | sort -n
      1 dbus
      1 gdm
      1 rpc
      1 rpcuser
      2 postfix
      3 68
      3 rtkit
      5 mfe
    165 oracle
    813 applmgr
    317 root
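
To pick out the users that are over a given limit from a listing like the one above, a small filter can help. This is a sketch under assumptions: users_over_limit is a hypothetical helper name, and it applies one threshold to every user, whereas real limits can differ per user.

```shell
#!/bin/sh
# Hypothetical helper: read "count user" lines (the output format of
# `uniq -c`) on stdin and print "user count" for every user whose
# count exceeds the limit given as the first argument.
users_over_limit() {
    awk -v limit="$1" '$1 + 0 > limit + 0 { print $2 " " $1 }'
}
```

For example, appending | users_over_limit 500 to the ps pipeline above would print only the users that own more than 500 processes and threads.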

**Increase the "nproc" limit (and, where needed, the "nofile" limit) in /etc/security/limits.conf.
Add the following settings to /etc/security/limits.conf:
oracle           soft    nproc   4096
oracle           hard    nproc   16384
applmgr          soft    nofile  4096
applmgr          hard    nofile  65536

**Add or edit the following line in the /etc/pam.d/login file, if it does not already exist:

session     required     pam_limits.so

OR

**Create /etc/profile if it does not already exist, then add the following lines to it:

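# Raise the process and open-file limits for the oracle user only.
# Variables are quoted so the tests do not break if they are empty.
if [ "$USER" = "oracle" ]; then
    if [ "$SHELL" = "/bin/ksh" ]; then
        ulimit -p 16384
        ulimit -n 65536
    else
        ulimit -u 16384 -n 65536
    fi
fi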

Troubleshooting performance issues in Linux

Server Slowdown:
Performance problems are caused by bottlenecks in one or more hardware subsystems, depending on the profile of resource usage on your system. 

Some elements to consider (in roughly sorted order):

Buggy software
Disk usage
Memory usage
CPU cycles
Network bandwidth

Now, let’s look at the three biggest causes of server slowdown: CPU, RAM, and disk I/O. CPU usage can cause overall slowness on the host, and difficulty completing tasks in a timely fashion. Some tools I use when looking at CPU are top and sar.

SAR Command:

For historical CPU performance data I rely on the sar command, which is provided by the sysstat package. On most server versions of Linux, sysstat is installed by default, but if it’s not, you can add it with your distro’s package manager. The sar utility collects system data every 10 minutes via a cron job located in /etc/cron.d/sysstat (CentOS 7.6). Here’s how to check all of the "Big 3" using sar. "sar -A" shows a full report.

To check RAM performance, I use the sar -r command, which gives you that day’s memory usage:

$sar -r (starting at midnight)

The main thing to look for in RAM usage is %memused and %commit. A quick word about the %commit field: 

This field can show above 100% since the Linux kernel routinely overcommits RAM. If %commit is consistently over 100%, this result could be an indicator that the system needs more RAM.
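
To see how often %commit actually goes above 100%, the sar -r output can be filtered. This is a minimal sketch: commit_over_100 is a hypothetical helper name, and it finds the %commit column by its header name because the column position differs between sysstat versions.

```shell
#!/bin/sh
# Hypothetical helper: read `sar -r` output on stdin and print the
# sample time and %commit value of every row where %commit > 100.
commit_over_100() {
    awk '
        {
            # A header line contains the literal field "%commit";
            # remember its position and skip the line.
            for (i = 1; i <= NF; i++) {
                if ($i == "%commit") { col = i; next }
            }
        }
        # Skip the "Average:" summary row; with a 12-hour clock it has
        # fewer leading fields than the sample rows.
        $1 == "Average:" { next }
        col && NF >= col && $col + 0 > 100 { print $1, $col }
    '
}
```

It could be used as sar -r | commit_over_100. If it prints many rows spread across the day, that supports the "consistently over 100%" reading above.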

The command sar -u gives you info about all CPUs on the system, starting at midnight:

$sar -u (starting at midnight)

As with top, the main things to check here are %user, %system, %iowait, and %idle. This information can tell you how far back the server has been having issues.

For disk I/O performance, I use sar -d, which reports disk I/O per device using kernel-style device names such as dev8-0.
To get the familiar device names (sda, sdb, and so on), add the -p (pretty-print) option:
$sar -d
$sar -dp

For this output, looking at %util and await will give you a good overall picture of disk I/O on the system. The %util field is pretty self-explanatory: It’s the utilization of that device. The await field is the average time, in milliseconds, that I/O requests take to be served, including the time they spend waiting in the queue.

If any of these commands show a problem, you can go back to see when the server issues started by using:

$sar {-u, -r, -d, -dp} -f /var/log/sa/sa<XX> (where XX is the two-digit day of the month you wish to look at).
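
Building the file name for a given day is easy to get wrong for single-digit days, so here is a small sketch. sa_file is a hypothetical helper name, and it assumes the /var/log/sa/sa<XX> layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: build the sysstat data file path for a given
# day of the month, using the /var/log/sa/sa<XX> layout.
sa_file() {
    day=${1#0}    # drop one leading zero so "08" is not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}
```

For example, sar -u -f "$(sa_file 5)" would read the CPU data recorded on the 5th of the month.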

TOP Command:

The top utility gives you a real-time look at what’s going on with the server.

Command syntax

$ top -c or top

By default, when top starts, it shows a combined summary for all CPUs (press 1 to toggle a per-CPU breakdown). Some things to look for in this view are the load average (displayed on the right side of the top row) and the following CPU values:

us: This percentage represents the amount of CPU consumed by user processes.

sy: This percentage represents the amount of CPU consumed by system processes.

id: This percentage represents how idle each CPU is.

Each of these three values can give you a fairly good, real-time idea of whether CPUs are bound by user processes or system processes.

Virtual Memory: Report virtual memory statistics

Virtual memory statistics reporter, also known as vmstat, is a Linux command-line tool that reports various bits of system information. Things like memory, paging, processes, IO, CPU, and disk scheduling are all included in the array of information provided.

Basic vmstat Output

The basic output of the vmstat command displays system information in six sections.

1. procs – Process Statistics
r – Number of runnable processes (running or waiting for CPU time).
b – Number of processes in uninterruptible sleep (typically waiting for I/O).

2. memory – Memory statistics
swpd – Amount of swap space in use. The swap space is initially unoccupied. However, the kernel starts using swap space as the system’s physical memory reaches its limit.
free – Amount of idle (unused) memory.
buff – Total memory temporarily used as a data buffer.
cache – Total cache memory.

3. swap – Swap space Statistics
si – The rate of swapping-in memory from disk.
so – The rate of swapping-out memory to disk.

4. io – Input/Output Statistics
bi – Blocks received from a block device per second.
bo – Blocks sent to a block device per second.

5. system – Scheduling statistics
in – The number of interrupts per second, including the clock.
cs – The number of context switches per second.

6. cpu – CPU Statistics
us – The percentage of CPU time spent on non-kernel processes.
sy – The percentage of CPU time spent on kernel processes.
id – The percentage of idle CPU.
wa – The percentage of CPU time spent waiting for Input/Output.
st – The percentage of CPU time stolen from this virtual machine by the hypervisor.

Command syntax

The syntax for the vmstat command is rather simple:

$ vmstat [options] [delay [count]]

Options to know

The -a option will give us the active and inactive memory of the system:
$vmstat -a

The -f option will give us the number of forks since boot:
$vmstat -f

The -s option displays various memory statistics as well as CPU and IO event counters:
$vmstat -s

The -d option gives you read/write stats for various disks:
$vmstat -d

The -t option adds a timestamp to every update:
$vmstat -t

Using a Time Interval
We can have vmstat provide regular updates to these figures by using a delay value. The delay value is provided in seconds. 

To have the statistics updated every five seconds, we’d use the following command:
$vmstat 5

Using a Count Value
Using too low a delay value will put additional strain on your system. If you need to have rapid updates to try to diagnose a problem, it is recommended that you use a count value as well as a delay value.

The count value tells vmstat how many updates to perform before it exits and returns you to the command prompt. If you do not provide a count value, vmstat will run until it is stopped by Ctrl+C.

To have vmstat provide an update every five seconds—but only for four updates—use the following command:
$vmstat 5 4
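
A bounded run like this is convenient to post-process. The sketch below counts how many samples show swap activity. swap_active_samples is a hypothetical helper name, and it assumes the default vmstat column order (r b swpd free buff cache si so ...).

```shell
#!/bin/sh
# Hypothetical helper: read default `vmstat` output on stdin and count
# the samples that show swap activity (si or so above zero).
# Assumes the default column order: r b swpd free buff cache si so ...
swap_active_samples() {
    awk '
        # Sample rows start with a number; the two header rows do not.
        $1 ~ /^[0-9]+$/ && ($7 + 0 > 0 || $8 + 0 > 0) { n++ }
        END { print n + 0 }
    '
}
```

For example, vmstat 5 4 | swap_active_samples prints a number between 0 and 4. Keep in mind that the first vmstat sample reports averages since boot, so it can show swap activity that happened long ago.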

Changing the Units
You can choose to have the memory and swap statistics displayed in kilobytes or megabytes using the -S (unit-character) option. This must be followed by one of k, K, m, or M. These represent:

k: 1000 bytes
K: 1024 bytes
m: 1000000 bytes
M: 1048576 bytes

To have the statistics updated every 10 seconds with the memory and swap statistics displayed in megabytes, use the following command:
$vmstat 10 -S M

Wednesday 22 February 2023

If responsibilities assigned to a user in Oracle Applications are not appearing, follow the steps below.

Error: I recently faced an issue where users complained that they could not see responsibilities that had been assigned to them.

When I tried to switch to the recently added responsibility, it was not showing up. I tried logging off and logging in again and clearing the IE browser cache, but no luck. After some research, I found a solution that worked. Here it is.

Solution:

1. Log in as System Administrator.
2. Submit the concurrent program "Workflow Directory Services User/Role Validation".

Run the program 'Workflow Directory Services User/Role Validation' with the following parameters.

Parameters to be passed
Batch Size - 10000 (default value)
Username for the user having the issue - ERPTEAM
Check_Dangling - Yes (default value No)
Add missing user/role assignments - Yes (default value No)
Update WHO columns in WF tables - Yes (default value No)


This request checks users and their assigned responsibilities and syncs users up with their attached responsibilities. Users should be able to view the assigned responsibility once the program has completed successfully.

Note: As per Oracle documentation, it is also good to run 'Synchronize WF LOCAL tables' after this program run. But it is optional.

How to identify and kill zombie/defunct processes in Linux without reboot.


1) Identify the zombie processes

#top -b1 -n1 | grep Z

2) Find the parent of zombie processes

#ps -A -ostat,ppid | grep -e '[zZ]' | awk '{ print $2 }' | sort -u | xargs ps -p

3) Send SIGCHLD signal to the parent process. This signal tells the parent process to execute the wait() system call and clean up its zombie children

#kill -s SIGCHLD ppid

4) Check again whether the zombie processes have been cleaned up

#top -b1 -n1 | grep Z

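5) If the zombies are still present, kill the parent process. The zombie children are then re-parented to init, which reaps them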

#kill -9 ppid
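
Step 2 can also be written as a small filter that only looks at the STAT column, so a stray "Z" elsewhere in the line cannot match. This is a sketch under assumptions: zombie_parents is a hypothetical helper name, and it expects the two-column output of ps -A -o stat,ppid.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -A -o stat,ppid` output on stdin and
# print each parent PID that has at least one zombie child, once,
# in ascending numeric order.
zombie_parents() {
    awk '$1 ~ /^Z/ { print $2 }' | sort -un
}
```

It could be used as ps -A -o stat,ppid | zombie_parents, and each PID it prints is a candidate for the SIGCHLD in step 3.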

Configure and enable Data Guard broker in Oracle Data Guard

At this point we have a primary database and a standby database, so now we need to start using the Data Guard Broker to manage them. 

Connect to both databases (primary and standby) and issue the following command

PRODRHEL------Primary Database

PRODOL------Standby Database

Register the Service name on both sides:

SQL>alter system set service_names='PROD','PRODRHEL' scope=both;(Primary)

SQL>alter system set service_names='PROD','PRODOL' scope=both;(Standby)

#Enable broker:

On primary:

SQL> alter system set dg_broker_start=true;

SQL> show parameter dg_broker_start;

On standby:

SQL> alter system set dg_broker_start=true;

SQL> show parameter dg_broker_start;

#Register the primary database with broker

On primary, connect to DGMGRL utility and register the primary database with broker

On primary:

$dgmgrl sys/prod@prodrhel

DGMGRL>create configuration my_dg as primary database is prodrhel connect identifier is prodrhel;

DGMGRL>show configuration;

Add the standby database to the configuration:

DGMGRL>add database prodol as connect identifier is prodol;

DGMGRL>show configuration;

#Enable configuration

DGMGRL>ENABLE CONFIGURATION;
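
After enabling the configuration it is worth confirming that the broker reports it as healthy. The sketch below checks the text of "show configuration" from a script. dg_config_ok is a hypothetical helper name, and the output layout it relies on (a "Configuration Status:" line followed by a line starting with SUCCESS) is an assumption that should be checked against your own broker version.

```shell
#!/bin/sh
# Hypothetical helper: read the text of DGMGRL "show configuration" on
# stdin and succeed only if the line after "Configuration Status:"
# starts with SUCCESS. The output layout is an assumption.
dg_config_ok() {
    awk '
        # This rule fires on the line that follows the status heading.
        seen { ok = ($1 == "SUCCESS"); exit }
        /^Configuration Status:/ { seen = 1 }
        END { exit(ok ? 0 : 1) }
    '
}
```

Assuming your dgmgrl accepts a command as its final argument, it could be used as dgmgrl -silent sys/prod@prodrhel "show configuration" | dg_config_ok && echo "broker configuration healthy".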

#Stop log apply:

$dgmgrl sys/prod@prodrhel

DGMGRL>show configuration;

DGMGRL>show database prodol;

DGMGRL>edit database prodol set state=APPLY-OFF;

DGMGRL>show database prodol;

#Start log apply:

DGMGRL>edit database prodol set state=APPLY-ON;

DGMGRL>show database prodol;

Just as log shipping from the primary to the standby can be enabled and disabled manually, the broker can be used to do the same:

#Disable log shipping/transport:

$dgmgrl sys/prod@prodrhel

DGMGRL> show configuration;

DGMGRL> edit database prodrhel set state=TRANSPORT-OFF;

DGMGRL> show database prodrhel;

#Enable log shipping/transport:

DGMGRL> edit database prodrhel set state=TRANSPORT-ON;

DGMGRL> show database prodrhel;

#Database Switchover

A database can be in one of two mutually exclusive modes (primary or standby). These roles can be altered at runtime without loss of data or resetting of redo logs. 

This process is known as a Switchover and can be performed using the following commands. Connect to the primary database (prodrhel) and switchover to the standby database (prodol).

$dgmgrl sys/prod@prodrhel

DGMGRL> SWITCHOVER TO prodol;

Performing switchover NOW, please wait...

Operation requires a connection to instance "PROD" on database "prodol"

Connecting to instance "PROD"...

Connected.

New primary database "prodol" is opening...

Operation requires startup of instance "PROD" on database "prodrhel"

Starting instance "PROD"...

ORACLE instance started.

Database mounted.

Database opened.

Switchover succeeded, new primary is "prodol"

DGMGRL>

#Let's switch back to the original primary. Connect to the new primary (prodol) and switchover to the new standby database (prodrhel).

$dgmgrl sys/prod@prodol

DGMGRL>SWITCHOVER TO prodrhel;

Performing switchover NOW, please wait...

Operation requires a connection to instance "PROD" on database "prodrhel"

Connecting to instance "PROD"...

Connected.

New primary database "prodrhel" is opening...

Operation requires startup of instance "PROD" on database "prodol"

Starting instance "PROD"...

ORACLE instance started.

Database mounted.

Database opened.

Switchover succeeded, new primary is "prodrhel"

DGMGRL>

#Database Failover

If the primary database is not available the standby database can be activated as a primary database using the following statements.

Connect to the standby database (prodol) and failover.

$dgmgrl sys/prod@prodol

DGMGRL>FAILOVER TO prodol;

Since the standby database is now the primary database, it should be backed up immediately. The original primary database can now be configured as a standby. If flashback database was enabled on the primary database, then this can be done relatively easily with the following command.

DGMGRL>reinstate database prodrhel;

#Snapshot Standby

Introduced in 11g, snapshot standby allows the standby database to be opened in read-write mode. When switched back into standby mode, all changes made whilst in read-write mode are lost. This is achieved using flashback database, but the standby database does not need to have flashback database explicitly enabled to take advantage of this feature, though it works just the same if it is.

Connect to the primary (prodrhel) database and convert the standby database (prodol) to a snapshot standby.

$dgmgrl sys/prod@prodrhel

DGMGRL>show configuration;

DGMGRL>CONVERT DATABASE prodol TO SNAPSHOT STANDBY;

DGMGRL>show configuration;

=======Example========

$sqlplus / as sysdba

SQL> create table student(id number(5));

SQL>begin

for i in 1 .. 100000 loop

insert into student values(1);

end loop;

end;

/

SQL> commit;

Commit complete.

SQL> select count(*) from student;

  COUNT(*)

----------

    100000

When you are finished with the snapshot standby, convert it back to a standby database.

$dgmgrl sys/prod@prodrhel

DGMGRL>show configuration;

DGMGRL>CONVERT DATABASE prodol TO PHYSICAL STANDBY;

DGMGRL>show configuration;

#Changing from Maximum Performance to Maximum Availability

Using DGMGRL connect to either the primary or the standby database.

$dgmgrl sys/prod@prodrhel

DGMGRL>SHOW DATABASE VERBOSE 'prodol';

DGMGRL>edit database prodol set property logxptmode=SYNC;

DGMGRL>edit database prodrhel set property logxptmode=SYNC;

DGMGRL>edit configuration set protection mode as maxavailability;

DGMGRL> show configuration;

#Let's switch back to the original status

DGMGRL> edit configuration set protection mode as maxperformance;

DGMGRL>edit database prodol set property logxptmode=ASYNC;

DGMGRL>edit database prodrhel set property logxptmode=ASYNC;

DGMGRL> show configuration;

#Enable Fast-Start-Failover Data Guard Broker

While Oracle Data Guard definitely protects a database when the entire production site is lost via its failover capabilities, it’s still necessary for an Oracle DBA to intervene to complete the failover process.

With this activity, we can enable automatic failover using Fast-Start-Failover Observer with Data Guard broker.

---Configure FSFO----

Check StaticConnectIdentifier: In order to enable FSFO, the StaticConnectIdentifier property must be set on both the primary and the standby.

On primary(prodrhel):

$dgmgrl sys/prod@prodrhel

DGMGRL> show database prodrhel StaticConnectIdentifier;

DGMGRL> show database prodol StaticConnectIdentifier;

If StaticConnectIdentifier is blank: The StaticConnectIdentifier takes its value from LOCAL_LISTENER parameter from the database. 

If this value is not set (or blank) for any database above, then connect to sqlplus and edit LOCAL_LISTENER parameter

SQL> ALTER SYSTEM SET LOCAL_LISTENER='(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.204)(PORT=1521))';

Once you make changes to LOCAL_LISTENER parameter, you must restart the listener.

Define FastStartFailoverTarget: In general, there can be more than one physical standby database. 

So, we need to pair physical standby with primary to let Fast Start Failover know which physical standby to be activated

On primary (prodrhel):

$dgmgrl sys/prod@prodrhel

DGMGRL> SHOW FAST_START FAILOVER

DGMGRL> EDIT DATABASE prodrhel SET PROPERTY FastStartFailoverTarget = 'prodol';

DGMGRL> EDIT DATABASE prodol SET PROPERTY FastStartFailoverTarget = 'prodrhel';

DGMGRL> show database verbose prodrhel;

DGMGRL> show database verbose prodol;

Define FastStartFailoverThreshold: Next we need to let the broker know when to initiate automatic failover.

This is the time (in seconds) that FSFO waits after losing contact with the primary before initiating failover.

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold=30;

DGMGRL> show fast_start failover

Define FastStartFailoverLagLimit: Optionally, we can define how many seconds of data we are prepared to lose when Data Guard is in Maximum Performance mode.

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit = 30;

---Enable FSFO: Now we can enable FSFO. Never start the observer on the production database server.

On Standby Server:

$dgmgrl sys/prod@prodol

DGMGRL> ENABLE FAST_START FAILOVER;

DGMGRL> show configuration;

DGMGRL> start observer;

Observer started

11:25:34.09  Wednesday, November 02, 2022

Initiating Fast-Start Failover to database "prodol"...

Performing failover NOW, please wait...

Failover succeeded, new primary is "prodol"

11:25:48.30  Wednesday, November 02, 2022

11:27:53.85  Wednesday, November 02, 2022

Initiating reinstatement for database "prodrhel"...

Reinstating database "prodrhel", please wait...

Operation requires shutdown of instance "PROD" on database "prodrhel"

Shutting down instance "PROD"...

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

Operation requires startup of instance "PROD" on database "prodrhel"

Starting instance "PROD"...

ORACLE instance started.

Database mounted.

Continuing to reinstate database "prodrhel" ...

Reinstatement of database "prodrhel" succeeded

11:29:04.71  Wednesday, November 02, 2022

Observer stopped


----Test FSFO Configuration

Let us simulate a failure. We will abort primary (prodrhel) instance and wait for FSFO to perform automatic failover.

Simulate failure: On prodrhel, the current primary, let us shut abort the instance

On primary (prodrhel):

$sqlplus / as sysdba

SQL> shut abort;

Check logfiles: At this stage, check the alert log and observer log files. FSFO should perform an automatic failover, and prodol becomes the new primary database.

On new primary (prodol):

$sqlplus / as sysdba

SQL> set pagesize 200;

SQL> set linesize 200;

SQL> select DATABASE_ROLE,name, open_mode, db_unique_name,PROTECTION_MODE,SWITCHOVER_STATUS from v$database;

#Reinstate Failed Primary

Mount the failed primary (prodrhel) and it will auto reinstate. 

Note: Do not open the database as it will be switched to physical standby

On failed primary (prodrhel):

sqlplus / as sysdba

SQL> startup mount;

dgmgrl sys/prod@prodrhel

DGMGRL> show configuration;

You can perform switchover to get back the original configuration

On current primary (prodol):

$dgmgrl sys/prod@prodol

DGMGRL> show configuration;

DGMGRL> switchover to prodrhel;

-----Disable FSFO----

$dgmgrl sys/prod@prodrhel

DGMGRL> show configuration;

DGMGRL> DISABLE FAST_START FAILOVER;

-----Stop observer:

$dgmgrl sys/prod@prodol

DGMGRL> stop observer;

BI Publisher Login Error "Server not initialized. Please make sure the repository is ready"

User received the “Server not initialized. Please make sure the repository is ready” error when they tried to access bi publisher:


9502/xmlpserver/ : Server not initialized. Please make sure the repository is ready.

We checked the logs, which showed that the NodeManager was not running, so we ran the commands below:

$cd /erp_ords/BI/BI_Publisher/user_projects/domains/bip/bitools/bin
$./status.sh

Status of Domain: /erp_ords/BI/BI_Publisher/user_projects/domains/bip
NodeManager (erpapxrpt.nicsi.in:9508):NOT RUNNING

$./start.sh

Starting domain; Using domainHome: /erp_ords/BI/BI_Publisher/user_projects/domains/bip ...                                                           

Initializing WebLogic Scripting Tool (WLST) ...

Welcome to WebLogic Server Administration Scripting Shell

Type help() for help on available commands


NodeManager already running
Reading domain...
/Servers/AdminServer/ListenPort=9500
Accessing admin server using URL t3://erpapxrpt.nicsi.in:9500

AdminServer already running

Starting all servers ...
Server bi_server1 not started as already in state (RUNNING)
Starting obiccs1 (Original State:SHUTDOWN) ...
Started obiccs1

Starting obis1 (Original State:SHUTDOWN) ...
Started obis1

Starting obips1 (Original State:SHUTDOWN) ...
Started obips1

Starting obijh1 (Original State:SHUTDOWN) ...
Started obijh1

Starting obisch1 (Original State:SHUTDOWN) ...
Started obisch1


Finished starting servers

I then checked again, and the issue was resolved.