Today I discovered the versatile tr command for translating and manipulating character streams in Unix pipelines.

Basic tr Command Usage

The tr (translate) command transforms characters from stdin according to specified rules:

Character Translation:

# Convert lowercase to uppercase
echo "hello world" | tr 'a-z' 'A-Z'
# Output: HELLO WORLD

# Convert uppercase to lowercase  
echo "HELLO WORLD" | tr 'A-Z' 'a-z'
# Output: hello world

# Replace specific characters
echo "hello-world" | tr '-' '_'
# Output: hello_world

# Multiple character replacement
echo "hello world" | tr 'hw' 'HW'
# Output: Hello World
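
Translation is positional: the Nth character of the first set maps to the Nth character of the second. As a small sketch of that idea, the helper below (swap_case is a hypothetical name, not part of tr) swaps upper and lower case in a single pass by pairing two 52-character sets. It assumes plain ASCII input.

```shell
# Hypothetical helper: swap upper and lower case in one pass.
# 'a-zA-Z' and 'A-Za-z' are both 52 characters long, so each
# letter maps to its opposite-case counterpart by position.
swap_case() {
  tr 'a-zA-Z' 'A-Za-z'
}

printf 'Hello World\n' | swap_case
# Output: hELLO wORLD
```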

Character Deletion:

# Delete specific characters
echo "hello123world456" | tr -d '0-9'
# Output: helloworld

# Delete spaces (use '[:space:]' to delete tabs and newlines as well)
echo "hello   world" | tr -d ' '
# Output: helloworld

# Delete newlines (join lines)
cat multiline.txt | tr -d '\n'

Character Sets and Ranges:

# Using predefined character classes
echo "Hello World 123!" | tr '[:upper:]' '[:lower:]'
# Output: hello world 123!

echo "Hello World 123!" | tr -d '[:punct:]'
# Output: Hello World 123

echo "Hello World 123!" | tr -d '[:digit:]'
# Output: Hello World !

# Available character classes:
# [:alnum:]  - alphanumeric characters
# [:alpha:]  - alphabetic characters  
# [:digit:]  - numeric characters
# [:lower:]  - lowercase letters
# [:upper:]  - uppercase letters
# [:punct:]  - punctuation characters
# [:space:]  - whitespace characters

Advanced tr Operations

Squeeze Repeated Characters:

# Squeeze multiple spaces into single space
echo "hello    world" | tr -s ' '
# Output: hello world

# Collapse any run of whitespace into a single space
# (translate every whitespace character to a space, then squeeze)
printf 'hello\t\t\nworld' | tr -s '[:space:]' ' '
# Output: hello world

# Squeeze runs of one specific character
echo "hellooo wooorld" | tr -s 'o'
# Output: hello world
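
Squeezing also works on newlines, which gives a quick way to drop empty lines. The sketch below uses a hypothetical helper name (drop_blank_lines). It only removes lines that are completely empty: a line holding spaces or tabs survives, and a blank line at the very start of the input is reduced to one newline rather than removed.

```shell
# Hypothetical helper: collapse runs of newlines so empty lines disappear.
drop_blank_lines() {
  tr -s '\n'
}

printf 'one\n\n\ntwo\n\nthree\n' | drop_blank_lines
# Output:
# one
# two
# three
```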

Complement Operations:

# Keep only specified characters (delete complement)
echo "abc123def456" | tr -cd '[:digit:]'
# Output: 123456

# Delete everything except letters and whitespace
echo "Hello, World! 123" | tr -cd '[:alpha:][:space:]'
# Output: Hello World  (a trailing space remains where " 123" was)
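
Because -cd keeps only the characters in the set, piping the result to wc -c counts how many characters of a class the input contains. A minimal sketch, with count_digits as a hypothetical helper name. The final tr -d ' ' is there only because some wc implementations pad their output with spaces.

```shell
# Hypothetical helper: count the digits on stdin.
# tr -cd deletes everything that is NOT a digit, wc -c counts what is left.
count_digits() {
  tr -cd '[:digit:]' | wc -c | tr -d ' '
}

printf 'abc123def456\n' | count_digits
# Output: 6
```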

Practical Use Cases

Data Cleaning:

# Convert simple CSV to tab-separated (breaks on quoted fields that contain commas)
cat data.csv | tr ',' '\t'

# Remove Windows line endings
cat windows_file.txt | tr -d '\r'

# Convert DOS to Unix line endings
tr -d '\r' < dos_file.txt > unix_file.txt

# Clean phone numbers
echo "(555) 123-4567" | tr -cd '[:digit:]'
# Output: 5551234567
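
tr only reads stdin and writes stdout, so it has no in-place mode, and redirecting output to the same file it reads from would truncate that file before tr sees any data. One way around this, sketched below with a hypothetical helper name (strip_cr), is to write to a temporary file and copy the result back. It assumes mktemp is available, and the helper's variables are not local to the function.

```shell
# Hypothetical helper: strip carriage returns from a file "in place".
strip_cr() {
  target=$1
  tmp=$(mktemp) || return 1
  # Write the cleaned data to a temp file first, then copy it back
  # so the original file keeps its permissions.
  if tr -d '\r' < "$target" > "$tmp"; then
    cat "$tmp" > "$target"
  fi
  rm -f "$tmp"
}

# Demo on a scratch file with Windows line endings.
demo=$(mktemp)
printf 'line one\r\nline two\r\n' > "$demo"
strip_cr "$demo"
cat "$demo"
# Output:
# line one
# line two
rm -f "$demo"
```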

Text Processing:

# ROT13 cipher
echo "hello" | tr 'a-zA-Z' 'n-za-mN-ZA-M'
# Output: uryyb

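# Create URL-safe strings: lowercase first, then turn every run of
# non-alphanumeric characters (except the newline) into a single dash
echo "Hello World!" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]\n' '-'
# Output: hello-world-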

# Extract words (replace non-letters with newlines)
echo "hello,world;testing" | tr -cs '[:alpha:]' '\n'
# Output:
# hello
# world  
# testing

Log Analysis:

# Count requests per IP address (assumes the IP is the first space-separated field)
cat access.log | tr -s ' ' | cut -d' ' -f1 | sort | uniq -c

# Extract only numeric data from mixed content
cat mixed_data.txt | tr -cd '[:digit:]\n'

# Convert log timestamps
cat log.txt | tr ':' '-'  # Replace colons with dashes
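
The sort | uniq -c pipeline prints one count per address. To get a single number of distinct addresses instead, sort -u | wc -l can replace the last two stages. This is a sketch only: count_unique_ips is a hypothetical helper, and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: count distinct values in the first space-separated field.
count_unique_ips() {
  tr -s ' ' | cut -d' ' -f1 | sort -u | wc -l | tr -d ' '
}

# Made-up sample lines standing in for a real access log.
printf '%s\n' \
  '10.0.0.1 - - "GET / HTTP/1.1" 200' \
  '10.0.0.2 - - "GET /about HTTP/1.1" 200' \
  '10.0.0.1 - - "GET /contact HTTP/1.1" 404' |
  count_unique_ips
# Output: 2
```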

System Administration:

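# Generate random passwords (simple method)
# LC_ALL=C makes tr treat input as raw bytes; without it, some systems
# (macOS, for example) fail with "Illegal byte sequence"
LC_ALL=C tr -cd '[:alnum:]' < /dev/urandom | head -c 16
# Example output (differs on every run): aB3xK9mP2qR8vN4L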

# Convert file paths
echo "/path/to/file" | tr '/' '\\'
# Output: \path\to\file

# Show PATH entries one per line
echo "$PATH" | tr ':' '\n'
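
Once a colon-separated list is split into lines, ordinary line tools can work on it. As a sketch, the hypothetical helper below (in_path_list) checks whether a directory appears in such a list. It takes the list as an argument rather than reading $PATH directly, and uses grep -Fx so that /usr does not match /usr/bin.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1.
in_path_list() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -Fxq -- "$2"
}

if in_path_list "/usr/local/bin:/usr/bin:/bin" "/usr/bin"; then
  echo "found"
fi
# Output: found
```

In a script this could be called as in_path_list "$PATH" "$HOME/bin" before appending a directory to PATH.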

Advanced Patterns

Character Mapping Tables:

# Create substitution cipher
plaintext="abcdefghijklmnopqrstuvwxyz"
ciphertext="zyxwvutsrqponmlkjihgfedcba"
echo "secret message" | tr "$plaintext" "$ciphertext"
# Output: hvxivg nvhhztv

# Leetspeak conversion
echo "elite hacker" | tr 'eElLoOaAsS' '33110044$$'
# Output: 31it3 h4ck3r
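
A mapping table can be reversed by swapping the two sets. The reversed alphabet above happens to be its own inverse, so the sketch below uses a Caesar shift of three instead, where encoding and decoding really do differ. The function names are hypothetical, and only lowercase ASCII letters are handled.

```shell
# Hypothetical helpers: Caesar cipher with a shift of 3 (lowercase only).
# 'd-za-c' is the alphabet rotated by three places, so a->d, b->e, ..., z->c.
caesar_encode() { tr 'a-z' 'd-za-c'; }
# Decoding uses the same two sets in the opposite order.
caesar_decode() { tr 'd-za-c' 'a-z'; }

printf 'hello\n' | caesar_encode
# Output: khoor

printf 'hello\n' | caesar_encode | caesar_decode
# Output: hello
```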

Combining with Other Commands:

# Word frequency analysis
cat text.txt | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr

# Extract email domains
grep -o '[a-zA-Z0-9._%+-]*@[a-zA-Z0-9.-]*\.[a-zA-Z]*' emails.txt | tr '[:upper:]' '[:lower:]' | cut -d'@' -f2 | sort | uniq

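# Convert camelCase to snake_case
# (insert the underscores first; lowercasing first would erase the word boundaries)
echo "camelCaseVariable" | sed 's/\([a-z]\)\([A-Z]\)/\1_\2/g' | tr '[:upper:]' '[:lower:]'
# Output: camel_case_variable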

Performance Considerations:

# tr is very fast for simple transformations
time cat large_file.txt | tr '[:lower:]' '[:upper:]' > /dev/null

# For simple cleanups, chained tr calls are often faster than an equivalent sed/awk script; time both on your own data
time cat large_file.txt | tr -d '[:punct:]' | tr -s '[:space:]' > /dev/null

The tr command is particularly valuable because it’s designed for character-level transformations and is extremely fast, making it ideal for preprocessing data in complex pipelines before more sophisticated tools like sed, awk, or grep operate on it.