Linux Fundamentals

Core Linux knowledge for software engineers: process management, file system, networking, shell scripting, and systemd. Essential for anyone running services in production.

linux bash shell processes networking permissions systemd

File System

# Navigation
pwd                          # print working directory
ls -lah                      # list with sizes, hidden files, human-readable
find /var/log -name "*.log" -mtime -1   # files modified in last 24h
stat myfile.txt              # inode, permissions, timestamps

# Permissions: rwxrwxrwx (owner/group/other)
chmod 755 script.sh          # rwxr-xr-x: owner full, others read+execute
chmod 600 ~/.ssh/id_rsa      # rw-------: only owner read/write
chown -R www-data:www-data /var/www/html   # change owner and group recursively

# Disk usage
df -h                        # disk space by filesystem
du -sh /var/log/*            # size of each item in /var/log
ncdu /                       # interactive disk usage explorer

# Links
ln -s /etc/nginx/sites-available/mysite /etc/nginx/sites-enabled/  # symlink
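
To check the octal notation used above, here is a minimal sketch, assuming GNU coreutils stat (the -c format flag is GNU-specific) and a throwaway file in a temporary directory. Each octal digit is the sum of read (4), write (2), and execute (1), for owner, group, and other in turn.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch file in a temporary directory (hypothetical name, removed at the end).
workdir="$(mktemp -d)"
file="$workdir/example.txt"
touch "$file"

chmod 640 "$file"                      # owner rw- (4+2), group r-- (4), other --- (0)
perms="$(stat -c '%a %A' "$file")"     # %a = octal mode, %A = symbolic mode (GNU stat)
echo "$perms"                          # → 640 -rw-r-----

rm -rf "$workdir"
```

The leading character of the symbolic form is the file type (- for a regular file, d for a directory), followed by the three rwx triplets.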

Process Management

# View processes
ps aux                       # all processes
ps aux | grep python         # filter by name
pgrep -fl gunicorn           # find processes matching name
top                          # interactive (q to quit)
htop                         # better top (install separately)

# Control
kill -9 <pid>                # SIGKILL — instant, no cleanup
kill -15 <pid>               # SIGTERM — graceful shutdown (default)
kill -HUP <pid>              # SIGHUP — reload config
pkill -f "gunicorn myapp"    # kill by command pattern

# Background jobs
nohup python worker.py &     # run in background, ignore SIGHUP
jobs                         # list background jobs
fg %1                        # bring job 1 to foreground
bg %1                        # send job 1 to background

# Resource limits
ulimit -n 65536              # set open file descriptor limit for this shell and its children
nice -n 10 python cpu_task.py  # run at lower CPU priority

Shell Scripting

#!/usr/bin/env bash
set -euo pipefail            # exit on error, undefined vars, pipe failures

# Variables
APP_ENV="${APP_ENV:-production}"    # default value
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# Conditionals
if [[ "$APP_ENV" == "production" ]]; then
    LOG_LEVEL="INFO"
else
    LOG_LEVEL="DEBUG"
fi

# Check file/dir exists
if [[ ! -f ".env" ]]; then
    echo "ERROR: .env file not found" >&2
    exit 1
fi

# Loops
for FILE in /var/log/app/*.log; do
    echo "Processing: $FILE"
    gzip "$FILE"
done

# Functions
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a /var/log/deploy.log
}

# Trap for cleanup on exit
cleanup() {
    log "Cleaning up..."
    rm -f /tmp/deploy.lock
}
trap cleanup EXIT

# Check command succeeded
if ! systemctl restart myapp; then
    log "ERROR: failed to restart myapp"
    exit 1
fi
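
The cleanup function above removes /tmp/deploy.lock, but nothing in the snippet creates it. One possible way to take the lock is sketched below, assuming the lock is a plain file and using bash's noclobber option so that creation fails if the file already exists. The LOCK_FILE variable and the acquire_lock function are hypothetical names, not part of the original script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lock path; override LOCK_FILE to experiment without touching /tmp/deploy.lock.
LOCK_FILE="${LOCK_FILE:-/tmp/deploy.lock}"

acquire_lock() {
    # With noclobber set, '>' refuses to overwrite an existing file,
    # so only one process can create the lock.
    if ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2>/dev/null; then
        # Register cleanup only once the lock is ours, so a run that
        # failed to get the lock never deletes another run's lock file.
        trap 'rm -f "$LOCK_FILE"' EXIT
        return 0
    fi
    echo "ERROR: lock held by PID $(cat "$LOCK_FILE")" >&2
    return 1
}
```

A deploy script would call acquire_lock near the top and exit if it fails. Registering the trap after the lock is taken is the main design point: a trap set unconditionally would remove a lock that belongs to a different, still-running deploy.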

Networking

# Connectivity
ping -c 4 google.com
curl -v https://api.myapp.com/health      # verbose HTTP
wget -O - https://api.myapp.com/health    # GET + print to stdout
nc -zv postgres.internal 5432             # check port open (netcat)

# DNS
dig api.myapp.com                         # full DNS lookup
dig +short api.myapp.com A               # just the IP
nslookup api.myapp.com 8.8.8.8           # use specific DNS server
host api.myapp.com

# Open ports and connections
ss -tlnp                     # listening TCP ports + process
netstat -tlnp                # same (older)
lsof -i :8000                # what's using port 8000
iptables -L -n               # firewall rules

# Transfer files
scp user@host:/path/file .
rsync -avz --progress user@host:/var/log/ ./logs/

# SSH tunnels
ssh -L 5432:postgres.internal:5432 bastion.myapp.com   # forward local:5432 → remote postgres
ssh -N -f -L 6379:redis.internal:6379 bastion          # background tunnel

systemd Service Management

# Service control
systemctl start myapp
systemctl stop myapp
systemctl restart myapp
systemctl reload myapp        # run the unit's ExecReload (SIGHUP in the unit below); reloads config without a restart
systemctl status myapp
systemctl enable myapp        # start on boot
systemctl disable myapp

# Logs
journalctl -u myapp -f        # follow service logs
journalctl -u myapp --since "1 hour ago"
journalctl -u myapp -p err    # only errors

# /etc/systemd/system/myapp.service
[Unit]
Description=My Application
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/opt/myapp
EnvironmentFile=/opt/myapp/.env
ExecStart=/opt/myapp/venv/bin/gunicorn myapp.wsgi:application \
    --bind 0.0.0.0:8000 \
    --workers 4 \
    --timeout 30
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Text Processing

# grep
grep -r "ERROR" /var/log/      # recursive search
grep -n "NullPointer" app.log  # with line numbers
grep -v "DEBUG" app.log        # exclude lines
grep -E "ERROR|WARN" app.log   # extended regex (OR)
grep -c "500" access.log       # count matches

# awk — field-based processing
awk '{print $1}' access.log                    # print first field
awk -F: '{print $1}' /etc/passwd               # colon delimiter
awk '$9 == 500 {print $7}' access.log          # URLs returning 500
awk 'END {print NR}' app.log                   # count lines

# sed — stream editor
sed 's/ERROR/[ERROR]/g' app.log                # replace
sed -n '/2026-05-01/,/2026-05-02/p' app.log   # range print
sed -i 's/localhost/api.prod.com/g' config.yml # in-place edit

# sort + uniq
sort access.log | uniq -c | sort -rn | head -20  # 20 most frequent lines, with counts
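
The awk one-liners above that use $7 and $9 assume the nginx/Apache combined log format, where splitting on whitespace puts the request path in field 7 and the status code in field 9. The sketch below runs that filter on three made-up log lines; the failing_paths function name and the sample data are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# failing_paths: read combined-format access log lines on stdin and print
# "<count> <path>" for requests that returned 500, most frequent first.
failing_paths() {
    awk '$9 == 500 { count[$7]++ } END { for (p in count) print count[p], p }' | sort -rn
}

# Three made-up log lines; the client addresses come from a documentation range.
sample_log='203.0.113.5 - - [01/May/2026:10:00:00 +0000] "GET /api/users HTTP/1.1" 500 123 "-" "curl/8.0"
203.0.113.6 - - [01/May/2026:10:00:01 +0000] "GET /api/users HTTP/1.1" 500 123 "-" "curl/8.0"
203.0.113.7 - - [01/May/2026:10:00:02 +0000] "GET /health HTTP/1.1" 200 2 "-" "curl/8.0"'

printf '%s\n' "$sample_log" | failing_paths    # → 2 /api/users
```

If the log format has been customized, the field numbers shift, so it is worth printing a single line with awk '{print $7, $9}' before trusting the counts.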

Useful Combinations

# Real-time error monitor with context
tail -f /var/log/app.log | grep --line-buffered -A 3 "ERROR"

# Count HTTP status codes in nginx access log
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# Find large files
find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null

# Check open file descriptors for process
ls /proc/$(pgrep -f gunicorn | head -1)/fd | wc -l   # one entry per open fd (ls -la would add the total, . and .. lines)

Common Failure Cases

Script fails silently because set -euo pipefail is missing
Why: without these flags, a failed command in the middle of a script is ignored and execution continues in a broken state; the script can still exit 0 even though work was skipped.
Detect: a deploy script reports success but the application binary was never copied; intermediate commands failed without stopping the script.
Fix: add set -euo pipefail as the second line of every bash script (after the shebang) so any non-zero exit, unbound variable, or pipe failure immediately halts execution.
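
The difference is easy to reproduce. The sketch below runs the same failing pipeline in two child bash processes, once without the flags and once with them, and compares the exit codes; the variable names are illustrative.

```shell
#!/usr/bin/env bash

# Without the flags: the pipeline's status is that of the last command (cat),
# so the failure of 'false' is lost and the script carries on.
without_flags=0
bash -c 'false | cat; echo "continued after failure"' >/dev/null || without_flags=$?

# With the flags: pipefail makes the pipeline fail, and -e stops the script there.
with_flags=0
bash -c 'set -euo pipefail; false | cat; echo "never reached"' >/dev/null || with_flags=$?

echo "without flags: exit $without_flags"   # → without flags: exit 0
echo "with flags:    exit $with_flags"      # → with flags:    exit 1
```

Note that set -e does not fire for commands tested in an if condition or chained with || and &&, which is why the explicit if ! systemctl restart myapp check in the scripting section still works under these flags.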

Service crashes repeatedly because Restart=always masks the root cause
Why: with Restart=always, systemd respawns the service after RestartSec (100 ms if unset), so a service that fails at startup enters a fast crash loop that consumes resources and buries the actual error in a flood of identical log entries.
Detect: systemctl status myapp shows "Active: activating (auto-restart)" cycling every few seconds; the same error fills journalctl.
Fix: set RestartSec to a longer delay such as 10 seconds to throttle restarts and give yourself time to read the logs; fix the root cause rather than tuning the restart policy.
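
One way to apply the throttling without editing the main unit file is a drop-in, sketched below under stated assumptions: the unit is named myapp.service, the values are illustrative, and the write_restart_dropin helper is hypothetical. Besides RestartSec, it sets StartLimitIntervalSec and StartLimitBurst (both [Unit] options) so that systemd stops retrying and marks the unit failed after five starts within two minutes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_restart_dropin DIR: write a drop-in that spaces out restarts and
# makes systemd give up after repeated failures. Values are illustrative.
write_restart_dropin() {
    local dir="$1"
    mkdir -p "$dir"
    cat > "$dir/restart.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=120
StartLimitBurst=5

[Service]
RestartSec=10
EOF
}

# On a real host (as root), assuming the unit is named myapp.service:
#   write_restart_dropin /etc/systemd/system/myapp.service.d
#   systemctl daemon-reload && systemctl restart myapp
#   systemctl show myapp -p NRestarts     # how many automatic restarts have happened
```

A unit that hits the start limit stays in the failed state until someone runs systemctl reset-failed myapp or starts it again, which makes a crash loop visible instead of letting it run indefinitely.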

File descriptor exhaustion kills the process
Why: each open socket, file, or pipe consumes an fd; the common default soft limit of 1024 (ulimit -n) is hit quickly by connection-heavy services, causing OSError: [Errno 24] Too many open files.
Detect: ls /proc/<pid>/fd | wc -l is close to the limit shown in /proc/<pid>/limits (lsof -p <pid> also lists memory-mapped files and so overcounts); errors mention "too many open files".
Fix: raise the limit in the systemd unit with LimitNOFILE=65536, and ensure connections are properly closed (use context managers for files and sockets).
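
A quick way to compare current usage with the limit is sketched below. It assumes a Linux /proc filesystem and permission to read the target process's entries (your own processes, or root); the fd_usage function name is hypothetical.

```shell
#!/usr/bin/env bash

# fd_usage PID: print "<open fds> <soft limit>" for a process (Linux /proc only).
fd_usage() {
    local pid="$1" open limit
    # Each entry in /proc/PID/fd is one open descriptor.
    open="$(ls "/proc/$pid/fd" | wc -l)"
    # The "Max open files" row lists the soft limit in column 4 and the hard limit in column 5.
    limit="$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")"
    echo "$open $limit"
}

# Example: inspect the current shell itself.
fd_usage "$$"
```

Reading the limit from /proc/PID/limits rather than from ulimit -n matters because the service's limit comes from its own unit or parent, not from the shell you happen to be logged into.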

kill -9 leaves stale lock files and orphaned child processes
Why: SIGKILL cannot be caught, so the process's signal handlers and cleanup code (releasing locks, flushing buffers, closing sockets, stopping its children) never run.
Detect: a pid file or socket lock remains on disk after the process is gone; a subsequent start fails with "address already in use" or "lock file exists".
Fix: always send SIGTERM (kill -15) first and wait for graceful shutdown; only escalate to SIGKILL if the process does not exit within the grace period.
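
The TERM-then-KILL escalation can be sketched as a small function. This is a minimal illustration, not a replacement for a service manager: the stop_gracefully name and the default grace period of 10 seconds are assumptions, and it polls once per second.

```shell
#!/usr/bin/env bash

# stop_gracefully PID [GRACE_SECONDS]
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if it had to be sent SIGKILL after the grace period.
stop_gracefully() {
    local pid="$1" grace="${2:-10}" waited=0
    kill -TERM "$pid" 2>/dev/null || return 0      # nothing to stop
    while kill -0 "$pid" 2>/dev/null; do           # signal 0 only checks that the PID exists
        if (( waited >= grace )); then
            kill -KILL "$pid" 2>/dev/null || true  # last resort: no cleanup will run
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}

# Example: stop a throwaway background process.
sleep 300 &
stop_gracefully "$!" 5 && echo "stopped cleanly"
```

For systemd-managed services the same escalation is built in: systemctl stop sends the configured stop signal and only falls back to SIGKILL after TimeoutStopSec.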

Connections

se-hub · cs-fundamentals/distributed-systems · cloud/cloud-security · cs-fundamentals/security-fundamentals-se · cloud/serverless-patterns · cs-fundamentals/concurrency

Open Questions

What are the most common misapplications of this concept in production codebases?
When should you explicitly choose not to use this pattern or technique?