Today I discovered the versatile tr
command for translating and manipulating character streams in Unix pipelines.
Basic tr Command Usage#
The tr
(translate) command transforms characters from stdin according to specified rules:
Character Translation:#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
# Convert lowercase to uppercase
echo "hello world" | tr 'a-z' 'A-Z'
# Output: HELLO WORLD
# Convert uppercase to lowercase
echo "HELLO WORLD" | tr 'A-Z' 'a-z'
# Output: hello world
# Replace specific characters
echo "hello-world" | tr '-' '_'
# Output: hello_world
# Multiple character replacement
echo "hello world" | tr 'hw' 'HW'
# Output: Hello World
|
Character Deletion:#
1
2
3
4
5
6
7
8
9
10
|
# Delete specific characters
echo "hello123world456" | tr -d '0-9'
# Output: helloworld
# Delete whitespace
echo "hello world" | tr -d ' '
# Output: helloworld
# Delete newlines (join lines)
cat multiline.txt | tr -d '\n'
|
Character Sets and Ranges:#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
# Using predefined character classes
echo "Hello World 123!" | tr '[:upper:]' '[:lower:]'
# Output: hello world 123!
echo "Hello World 123!" | tr -d '[:punct:]'
# Output: Hello World 123
echo "Hello World 123!" | tr -d '[:digit:]'
# Output: Hello World !
# Available character classes:
# [:alnum:] - alphanumeric characters
# [:alpha:] - alphabetic characters
# [:digit:] - numeric characters
# [:lower:] - lowercase letters
# [:upper:] - uppercase letters
# [:punct:] - punctuation characters
# [:space:] - whitespace characters
|
Advanced tr Operations#
Squeeze Repeated Characters:#
1
2
3
4
5
6
7
8
9
10
11
|
# Squeeze multiple spaces into single space
echo "hello world" | tr -s ' '
# Output: hello world
# Squeeze any whitespace
echo -e "hello\t\t\nworld" | tr -s '[:space:]'
# Output: hello world
# Remove duplicate characters
echo "hellooo wooorld" | tr -s 'o'
# Output: helo world
|
Complement Operations:#
1
2
3
4
5
6
7
|
# Keep only specified characters (delete complement)
echo "abc123def456" | tr -cd '[:digit:]'
# Output: 123456
# Delete everything except letters and spaces
echo "Hello, World! 123" | tr -cd '[:alpha:][:space:]'
# Output: Hello World
|
Practical Use Cases#
Data Cleaning:#
1
2
3
4
5
6
7
8
9
10
11
12
|
# Clean CSV data - replace commas with tabs
cat data.csv | tr ',' '\t'
# Remove Windows line endings
cat windows_file.txt | tr -d '\r'
# Convert DOS to Unix line endings
tr -d '\r' < dos_file.txt > unix_file.txt
# Clean phone numbers
echo "(555) 123-4567" | tr -cd '[:digit:]'
# Output: 5551234567
|
Text Processing:#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
# ROT13 cipher
echo "hello" | tr 'a-zA-Z' 'n-za-mN-ZA-M'
# Output: uryyb
# Create URL-safe strings
echo "Hello World!" | tr '[:upper:][:space:][:punct:]' '[:lower:]--'
# Output: hello-world-
# Extract words (replace non-letters with newlines)
echo "hello,world;testing" | tr -cs '[:alpha:]' '\n'
# Output:
# hello
# world
# testing
|
Log Analysis:#
1
2
3
4
5
6
7
8
|
# Count unique IP addresses in log
cat access.log | tr -s ' ' | cut -d' ' -f1 | sort | uniq -c
# Extract only numeric data from mixed content
cat mixed_data.txt | tr -cd '[:digit:]\n'
# Convert log timestamps
cat log.txt | tr ':' '-' # Replace colons with dashes
|
System Administration:#
1
2
3
4
5
6
7
8
9
10
|
# Generate random passwords (simple method)
tr -cd '[:alnum:]' < /dev/urandom | head -c 16
# Output: aB3xK9mP2qR8vN4L
# Convert file paths
echo "/path/to/file" | tr '/' '\\'
# Output: \path\to\file
# Clean environment variables
echo "$PATH" | tr ':' '\n' # Show PATH entries one per line
|
Advanced Patterns#
Character Mapping Tables:#
1
2
3
4
5
6
7
8
9
|
# Create substitution cipher
plaintext="abcdefghijklmnopqrstuvwxyz"
ciphertext="zyxwvutsrqponmlkjihgfedcba"
echo "secret message" | tr "$plaintext" "$ciphertext"
# Output: hvxivg nvhhztv
# Leetspeak conversion
echo "elite hacker" | tr 'eElLoOaAsS' '33110044$$'
# Output: 3lit3 h4ck3r
|
Combining with Other Commands:#
1
2
3
4
5
6
7
8
|
# Word frequency analysis
cat text.txt | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr
# Extract email domains
grep -o '[a-zA-Z0-9._%+-]*@[a-zA-Z0-9.-]*\.[a-zA-Z]*' emails.txt | tr '[:upper:]' '[:lower:]' | cut -d'@' -f2 | sort | uniq
# Convert camelCase to snake_case
echo "camelCaseVariable" | tr '[:upper:]' '[:lower:]' | sed 's/\([a-z]\)\([A-Z]\)/\1_\2/g'
|
1
2
3
4
5
|
# tr is very fast for simple transformations
time cat large_file.txt | tr '[:lower:]' '[:upper:]' > /dev/null
# For complex patterns, tr + other tools often faster than sed/awk
time cat large_file.txt | tr -d '[:punct:]' | tr -s '[:space:]' > /dev/null
|
The tr
command is particularly valuable because it’s designed for character-level transformations and is extremely fast, making it ideal for preprocessing data in complex pipelines before more sophisticated tools like sed
, awk
, or grep
operate on it.