Thursday, 23 February 2023

What does the error message "fork: retry: Resource temporarily unavailable" mean?

Issue:
The following errors are seen in /var/log/secure:

Feb 23 14:34:57 erpuatappl sshd[11618]: fatal: setresuid 501: Resource temporarily unavailable
Feb 23 14:41:15 erpuatappl sshd[11850]: error: do_exec_pty: fork: Resource temporarily unavailable

Root Cause:
There can be various reasons for processes not being able to fork:

There is a misbehaving service or process running, consuming more resources than expected.
The system was not able to create new processes, because of the limits set for nproc in /etc/security/limits.conf.
The system ran out of memory and new processes were unable to start because they could not allocate memory.
No process ID is available for the new process. A unique value less than kernel.pid_max must be free.

Resolution:
Because fork can fail for several reasons, there are several possible resolutions:

1. When the system hits the limit on the number of processes, increase the nproc value in /etc/security/limits.conf or /etc/security/limits.d/90-nproc.conf, depending on the RHEL version.

2. The limit can be increased for a specific user or for all users. For example, /etc/security/limits.d/90-nproc.conf might contain:

<user>       -          nproc     2048      <<<----[ Only for "<user>" user ]

*            -          nproc     2048      <<<----[ For all users ]

3. Check the total number of threads and processes running on the server:

[applmgr@erpuatappl ~]$  ps -eLf | wc -l

2332

[applmgr@erpuatappl ~]$ cat /proc/sys/kernel/pid_max

32768

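In this example, 2332 threads and processes are running against a kernel.pid_max of 32768, so the PID space is not exhausted. If the count were close to kernel.pid_max, kernel.pid_max would need to be increased.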

kernel.pid_max must be larger than the total number of simultaneous threads and processes.
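
The comparison above can be sketched as a small shell helper. This is only an illustration: pid_headroom is a hypothetical name, and the 80% warning threshold is an assumption, not a value from any vendor guidance.

```shell
# Sketch: report how much of kernel.pid_max is in use.
# pid_headroom is a hypothetical helper, not a standard tool.
pid_headroom() {
    used=$1   # number of threads and processes currently running
    max=$2    # value of kernel.pid_max
    pct=$(( used * 100 / max ))   # integer percentage, rounded down
    echo "used=$used max=$max pct=$pct"
    # Succeed only while usage is below the threshold (assumed default: 80%).
    [ "$pct" -lt "${3:-80}" ]
}

# Live check on a Linux host (left commented out so the sketch has no side effects):
# pid_headroom "$(ps -eLf --no-headers | wc -l)" "$(cat /proc/sys/kernel/pid_max)"
```

With the values shown earlier, pid_headroom 2332 32768 reports 7% usage and returns success.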

Diagnostic Steps:

Check with sar whether all memory was used or whether a large number of processes was spawned.

**To check process usage against what is allowed for a user, compare the output of ulimit -u (the limit set for that user) with the number of processes the user is running.
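
A minimal sketch of that comparison follows. The helper name nproc_usage is hypothetical, and the commented ps invocation assumes the procps version of ps found on RHEL.

```shell
# Sketch: compare a user's process/thread count with the nproc limit.
# nproc_usage is a hypothetical helper name.
nproc_usage() {
    count=$1   # number of processes/threads the user is running
    limit=$2   # output of `ulimit -u` for that user
    if [ "$limit" = "unlimited" ]; then
        echo "count=$count limit=unlimited"
        return 0
    fi
    echo "count=$count limit=$limit remaining=$(( limit - count ))"
    # Succeed only while the user is still below the limit.
    [ "$count" -lt "$limit" ]
}

# Live check for the current user (commented out; assumes procps ps):
# nproc_usage "$(ps -L -U "$USER" --no-headers | wc -l)" "$(ulimit -u)"
```

A non-zero return means the user has reached the limit and further fork calls are likely to fail.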

**Run the command below to count the processes (including threads) running for each user, then check whether any count exceeds the limit defined in /etc/security/limits.conf or /etc/security/limits.d/*.

[applmgr@erpuatappl ~]$ ps --no-headers auxwwwm | awk '$2 == "-" { print $1 }' | sort | uniq -c | sort -n
      1 dbus
      1 gdm
      1 rpc
      1 rpcuser
      2 postfix
      3 68
      3 rtkit
      5 mfe
    165 oracle
    813 applmgr
    317 root
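
To pick out the heavy users from a listing like the one above, the counts can be filtered against a threshold. The sketch below assumes input in the "count user" form produced by uniq -c; over_threshold is a hypothetical helper name.

```shell
# Sketch: list users whose count meets or exceeds a threshold.
# Reads "count user" lines (as printed by `uniq -c`) on standard input.
over_threshold() {
    # $1 = threshold to compare each count against
    awk -v limit="$1" '$1 + 0 >= limit + 0 { print $2, $1 }'
}

# Example (commented out): feed it the per-user counts from the ps pipeline.
# ps --no-headers auxwwwm | awk '$2 == "-" { print $1 }' | sort | uniq -c | over_threshold 300
```

Applied to the output above with a threshold of 300, this would print applmgr and root with their counts.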

**Increase the value of the nproc parameter (and nofile, where needed) in /etc/security/limits.conf by adding settings such as the following:
oracle         soft    nproc    4096
oracle         hard    nproc    16384
applmgr        soft    nofile   4096
applmgr        hard    nofile   65536
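
After editing, it can help to list the entries that apply to a given user. The following is a rough sketch: show_limits is a hypothetical helper, and it only matches exact user names and the "*" wildcard, ignoring @group entries and any override order between files.

```shell
# Sketch: print limits entries for one user and one item (e.g. nproc).
# show_limits is a hypothetical helper; it does not handle @group entries.
show_limits() {
    user=$1    # user name to look up
    item=$2    # limit item, such as nproc or nofile
    shift 2    # remaining arguments are the files to scan
    awk -v u="$user" -v i="$item" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        NF >= 4 && ($1 == u || $1 == "*") && $3 == i { print $1, $2, $3, $4 }
    ' "$@"
}

# Example (commented out):
# show_limits oracle nproc /etc/security/limits.conf /etc/security/limits.d/*.conf
```

The limits only take effect for new login sessions, so ulimit -u should be rechecked after logging in again.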
                                           
**Add the following line to the /etc/pam.d/login file if it is not already present:

session     required     pam_limits.so
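
Whether the line is already present can be checked before editing. This is a minimal sketch; has_pam_limits is a hypothetical helper, and it does not follow include directives, so the module may still be loaded through another file even when the check fails.

```shell
# Sketch: check whether a PAM service file loads pam_limits.so for sessions.
# has_pam_limits is a hypothetical helper name.
has_pam_limits() {
    # $1 = PAM service file, e.g. /etc/pam.d/login
    # Matches an uncommented "session ... pam_limits.so" line.
    grep -Eq '^[[:space:]]*session[[:space:]]+[^#]*pam_limits\.so' "$1"
}

# Example (commented out):
# has_pam_limits /etc/pam.d/login && echo "pam_limits is enabled for login"
```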

OR

**Set the limits from the shell profile. Add the following lines to /etc/profile, creating the file if it does not already exist:

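# Raise the process and open-file limits for the oracle user at login.
# Variables are quoted so the tests do not break if they are empty.
if [ "$USER" = "oracle" ]; then
    if [ "$SHELL" = "/bin/ksh" ]; then
        # ksh syntax
        ulimit -p 16384
        ulimit -n 65536
    else
        # other shells: -u is max user processes, -n is open files
        ulimit -u 16384 -n 65536
    fi
fi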
