[Information] Walltime limit of 336 hours (2 weeks) for processes on login nodes and maximum CPU time of 24 hours for jobs on login nodes

06.05.25 11:30 CET (12:30 EET)

  • The load on the LUMI login nodes keeps increasing due to a substantial number of processes from tools such as tmux, VS Code server or JetBrains server that are not cleaned up. These processes use resources such as virtual memory space and terminal devices, and some hanging processes also accumulate a considerable amount of compute time.
    • To ensure that processes that are likely hanging get cleaned up, we now limit the walltime of a process on a login node to 336 hours (2 weeks). This still allows users to keep sessions alive for a few days with tmux or screen and to run scripts on the login nodes that monitor the output of running jobs, but it eliminates forgotten tmux clients and VS Code server instances that were not cleaned up. Each process first receives a soft kill, followed by a hard kill only if needed, so that a process that still has to write some data (such as a shell history) can do so properly.
  • Another issue is that heavy runs still take place on the login nodes when compute nodes should be used instead (see the login nodes policy at https://docs.lumi-supercomputer.eu/runjobs/).
    • To ensure that login nodes cannot be used for running longer jobs, we set a maximum CPU time of 24 hours for jobs. This is long enough that software builds or reasonable data transfers to LUMI-O are not interrupted, yet short enough that a compute job running at 100% on the 16 cores available to it would get killed after 1.5 hours.
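
The 1.5 hour figure follows from dividing the CPU-time limit by the number of fully busy cores. The sketch below only illustrates that arithmetic. It assumes that CPU time accumulates as the sum over all cores a job keeps busy, and the function name is hypothetical; the announcement does not describe how the limit is enforced.

```python
def wall_hours_until_cpu_limit(cpu_limit_hours: float, busy_cores: float) -> float:
    """Wall-clock hours until a job that keeps `busy_cores` cores at 100%
    has consumed `cpu_limit_hours` of CPU time.

    Assumption: CPU time is summed over all busy cores of the job.
    """
    if busy_cores <= 0:
        raise ValueError("busy_cores must be positive")
    return cpu_limit_hours / busy_cores


CPU_LIMIT_HOURS = 24  # maximum CPU time stated in the announcement

if __name__ == "__main__":
    # Compare a single-core task with a job using all 16 cores.
    for cores in (1, 4, 16):
        hours = wall_hours_until_cpu_limit(CPU_LIMIT_HOURS, cores)
        print(f"{cores:2d} core(s) at 100%: limit reached after {hours:g} h of walltime")
```

Under this assumption, a mostly idle process such as a monitoring script or a data transfer that waits on the network accumulates CPU time far more slowly than walltime, which is why it can run much longer than a compute job.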