During my internship at the Biometrics Lab at IIT Jodhpur, most of the work has been done on remote systems because of the high compute requirements.
So I thought I would write something about it, as it’s bound to become more common in the future, at least for folks planning to begin their journey with Deep Learning.
Connecting to a host:
$ ssh user_id@host_address
This will open a prompt to enter the password.
That’s it, we are now connected to a remote system. For simple use cases this method is more than enough. However, there’s the issue of persistence. If the ssh connection is terminated while a command is still running, the process will usually be killed along with the session.
If we are training a model, which can often take many hours, we cannot rely on having a stable connection for that long. There are three common ways to tackle this:
- nohup
- tmux
- screen
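For comparison, here is a minimal sketch of the nohup route. The script name fake_train.sh and the files train.log and train.pid are placeholders made up for illustration; the loop only stands in for a real training command.

```shell
# Stand-in for a long training job (hypothetical); replace with your
# real command, e.g. something like `python train.py`.
cat > fake_train.sh <<'EOF'
#!/bin/sh
for epoch in 1 2 3; do
    echo "epoch $epoch done"
done
EOF

# nohup makes the job ignore the hangup signal sent when the ssh
# session closes; '&' runs it in the background; stdout and stderr
# are both written to train.log.
nohup sh fake_train.sh > train.log 2>&1 &

# $! is the PID of the most recent background job; save it so the
# job can be checked on or stopped after reconnecting.
echo "$!" > train.pid
```

After reconnecting, something like `tail -f train.log` shows the progress so far, and `kill "$(cat train.pid)"` stops the job if needed.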
My experience with nohup has been finicky, but that could be due to my lack of experience. tmux is a terminal multiplexer, which can be a bit overkill for solving this issue, but it gives the desired result.
So the process for tmux would look something like this:
$ tmux
# This starts a new tmux session.
# You can detach from the session (leaving it running) by pressing Ctrl+b
# followed by d.
$ tmux ls
# This shows a list of all tmux sessions.
$ tmux attach-session -t session_number
# This attaches to any existing session; session_number is the
# number shown at the start of its line in `tmux ls`.
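Numbered sessions get hard to tell apart once there are a few of them, so it can help to name them. The sketch below assumes tmux is installed on the remote machine; the session name train and the `sleep 60` command are placeholders for illustration, not anything specific to my setup.

```shell
# Start a detached session named "train" (hypothetical name) running a
# placeholder command; replace `sleep 60` with your training command.
tmux new-session -d -s train 'sleep 60'

# List sessions; "train" should appear in the output.
tmux ls

# has-session exits with status 0 if the named session exists,
# which is handy for checking from a script.
tmux has-session -t train && echo "train session is running"

# To look at it interactively, run: tmux attach-session -t train
# Once the work is finished, remove the session.
tmux kill-session -t train
```

Because the session is started detached with `-d`, nothing needs to stay open on your side: you can log out right after the first command and attach by name later.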