Today I dug into advanced Python performance optimization techniques from James Powell's Fast and Furious Python 7, along with Scalene, a profiler that gives detailed, line-level insight into where Python applications spend CPU time and memory.
James Powell - Fast and Furious Python 7
James Powell’s Fast and Furious Python 7 presents advanced techniques for writing high-performance Python code:
1. Understand Python's Object Model:

```python
# Expensive: strings are immutable, so each += allocates a new object
def slow_string_building(items):
    result = ""
    for item in items:
        result += str(item)  # builds a new string every iteration
    return result

# Fast: join builds the final string in a single pass
def fast_string_building(items):
    return "".join(str(item) for item in items)

# Benchmark the difference
import timeit

items = list(range(1000))
slow_time = timeit.timeit(lambda: slow_string_building(items), number=100)
fast_time = timeit.timeit(lambda: fast_string_building(items), number=100)
print(f"Slow: {slow_time:.4f}s, Fast: {fast_time:.4f}s")
# join is typically 10-100x faster on large inputs
```
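The underlying reason is the object model itself: strings are immutable, so `+=` must produce a brand-new object whenever anything else still references the old one. A minimal sketch of that behavior:

```python
# Strings are immutable: += rebinds the name to a new object rather
# than mutating the old one in place.
a = "spam"
b = a          # b still references the original string
a += "eggs"    # a is rebound to a newly built string
print(a is b)  # False: two distinct objects
print(b)       # "spam", untouched
```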
2. Leverage Built-in Functions:

```python
import math
import operator
import statistics
from functools import reduce

# Slow: explicit Python-level loop
def slow_sum(numbers):
    total = 0
    for num in numbers:
        total += num
    return total

# Fast: the built-in sum() is implemented in C
def fast_sum(numbers):
    return sum(numbers)

# An alternative via reduce; in practice sum() is usually still faster
def reduce_sum(numbers):
    return reduce(operator.add, numbers, 0)

# Specialized operations from the standard library
numbers = list(range(1, 1001))
mean_value = statistics.mean(numbers)
median_value = statistics.median(numbers)
sqrt_sum = math.sqrt(sum(x * x for x in numbers))
```
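Reusing `slow_sum` and `fast_sum` from above, a quick `timeit` comparison makes the gap concrete (exact numbers vary by machine and Python version):

```python
import timeit

numbers = list(range(10_000))
loop_time = timeit.timeit(lambda: slow_sum(numbers), number=1_000)
builtin_time = timeit.timeit(lambda: fast_sum(numbers), number=1_000)
print(f"loop: {loop_time:.3f}s, sum(): {builtin_time:.3f}s")
# sum() usually wins by several times because its loop runs in C
```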
3. Use List Comprehensions and Generator Expressions:

```python
# Slow: explicit loop with repeated method lookups
def slow_filter_map(items):
    result = []
    for item in items:
        if item % 2 == 0:
            result.append(item * 2)
    return result

# Fast: list comprehension
def fast_filter_map(items):
    return [item * 2 for item in items if item % 2 == 0]

# Memory efficient: generator expression (lazy, produces items on demand)
def memory_efficient_filter_map(items):
    return (item * 2 for item in items if item % 2 == 0)

# Benchmark (IPython's %timeit magic; times are illustrative)
items = list(range(10000))
%timeit slow_filter_map(items)                    # ~2.5ms
%timeit fast_filter_map(items)                    # ~0.8ms
%timeit list(memory_efficient_filter_map(items))  # ~0.9ms
```
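To see the memory side of the trade-off, compare the container sizes with `sys.getsizeof` (it measures only the list's pointer array and the generator object itself, so treat the numbers as illustrative):

```python
import sys

items = list(range(10000))
as_list = fast_filter_map(items)             # materializes all 5000 results
as_gen = memory_efficient_filter_map(items)  # produces results lazily
print(sys.getsizeof(as_list))  # tens of kilobytes, grows with the input
print(sys.getsizeof(as_gen))   # a few hundred bytes, regardless of input size
```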
4. Optimize Data Structures:

```python
from collections import Counter, defaultdict, deque

# Slow: membership test against a list scans it linearly, O(n) per lookup
def slow_membership_test(items, targets):
    return [item for item in targets if item in items]

# Fast: build a set once for O(1) average-case lookups
def fast_membership_test(items, targets):
    item_set = set(items)
    return [item for item in targets if item in item_set]

# Fast queue operations: deque appends and pops at both ends in O(1)
queue = deque()
queue.appendleft("task")  # O(1), vs list.insert(0, item) which is O(n)

# Fast counting
words = ["spam", "eggs", "spam", "ham"]
word_counts = Counter(words)  # much faster than manual dict counting

# Avoid KeyError checks with defaultdict
word_positions = defaultdict(list)
for i, word in enumerate(words):
    word_positions[word].append(i)  # no need to test whether the key exists
```
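A rough benchmark of the two membership tests shows why the one-time set build pays off; the sizes here are arbitrary and the numbers illustrative:

```python
import timeit

items = list(range(5_000))
targets = list(range(0, 10_000, 10))  # a mix of hits and misses
list_time = timeit.timeit(lambda: slow_membership_test(items, targets), number=10)
set_time = timeit.timeit(lambda: fast_membership_test(items, targets), number=10)
print(f"list: {list_time:.3f}s, set: {set_time:.3f}s")
# one O(n) set construction amortized over many O(1) lookups
```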
5. Algorithm Complexity Awareness:

```python
from collections import Counter

# O(n^2): avoid nested loops when possible
def slow_find_duplicates(items):
    duplicates = []
    for i, item1 in enumerate(items):
        for item2 in items[i + 1:]:
            if item1 == item2:
                duplicates.append(item1)
    return duplicates

# O(n): use data structures effectively
def fast_find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return list(duplicates)

# Even better: use Counter
def fastest_find_duplicates(items):
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]
```
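One caveat worth knowing: the three variants are not drop-in replacements for each other. The nested-loop version reports an item once per extra pairing, while the set- and Counter-based versions report each duplicate exactly once:

```python
sample = [1, 2, 2, 3, 3, 3]
print(slow_find_duplicates(sample))     # [2, 3, 3, 3]
print(fast_find_duplicates(sample))     # [2, 3] (set order is arbitrary)
print(fastest_find_duplicates(sample))  # [2, 3]
```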
Advanced Optimization Techniques:
Memory Layout Optimization:

```python
import array

# __slots__ removes the per-instance __dict__: less memory and
# faster attribute access for classes with a fixed set of attributes
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

# array.array stores raw C values instead of boxed Python objects
numbers = array.array("i", range(10000))  # far less memory than a list
```
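A quick size check shows the difference; `sys.getsizeof` counts the list's pointer array but not the 10,000 separate int objects behind it, so the real gap is even larger than it looks (this assumes `'i'` is a 4-byte C int on your platform):

```python
import array
import sys

as_list = list(range(10000))
as_array = array.array("i", range(10000))
print(sys.getsizeof(as_list))   # ~80 KB of pointers, plus the int objects
print(sys.getsizeof(as_array))  # ~40 KB: raw 4-byte values, no boxing
```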
Function Call Optimization:

```python
import re
from functools import lru_cache

# Cache expensive computations
@lru_cache(maxsize=128)
def expensive_function(n):
    # Simulate an expensive computation
    return sum(i * i for i in range(n))

# Pre-compile regular expressions that are used repeatedly
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def validate_email(email):
    return EMAIL_PATTERN.match(email) is not None
```
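`functools.lru_cache` also exposes statistics on the wrapped function, which is useful for confirming the cache is actually being hit:

```python
expensive_function(50_000)  # miss: computed and stored
expensive_function(50_000)  # hit: served from the cache
print(expensive_function.cache_info())
# e.g. CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```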
Scalene - Advanced Python Profiler
Scalene is a high-performance CPU and memory profiler designed specifically for Python:
Key Features:
Comprehensive Profiling:
- CPU profiling: Time spent in Python vs native code
- Memory profiling: Memory usage and allocation patterns
- Line-by-line analysis: Detailed per-line performance data
- GPU profiling: NVIDIA GPU usage tracking
Installation and Basic Usage:

```bash
# Install Scalene
pip install scalene

# Profile a Python script
scalene your_program.py

# Write an interactive HTML report
scalene --html --outfile profile.html your_program.py

# Profile CPU usage only
scalene --cpu-only your_program.py

# Profile with memory tracking
scalene --memory your_program.py
```
Advanced Profiling Options:

```bash
# Output profiles every 0.01 seconds
scalene --profile-interval 0.01 your_program.py

# Adjust the CPU sampling rate to trade accuracy against overhead
scalene --cpu-sampling-rate 0.01 your_program.py

# Profile all executed code, not just your own files
scalene --profile-all your_program.py

# Generate machine-readable reports
scalene --json --outfile profile.json your_program.py
```
Integration with Code:
Programmatic Profiling:

```python
from scalene import scalene_profiler

# Note: run the script under Scalene (e.g. `scalene --off your_program.py`)
# so that start()/stop() can toggle profiling on and off.

def compute_intensive_function():
    result = []
    for i in range(100000):
        result.append(i ** 2)
    return result

# Start profiling
scalene_profiler.start()
data = compute_intensive_function()
# Stop profiling; Scalene writes its report when the program exits
scalene_profiler.stop()
```
Profiling Specific Code Blocks:

```python
from scalene import scalene_profiler

def your_function():
    # Function to profile
    return [i * i for i in range(100000)]

# Toggle the profiler around a specific block
# (scalene_profiler is a module, not a context manager,
# so wrap start()/stop() in try/finally rather than `with`)
scalene_profiler.start()
try:
    result = your_function()
finally:
    scalene_profiler.stop()
```
Interpreting Scalene Output:

```
Line | %CPU | %Native | Memory | Function
-----|------|---------|--------|---------------------
   5 | 45.2 |    12.3 |  2.1MB | list_comprehension()
  12 | 32.1 |     8.7 |  0.8MB | string_processing()
  18 | 15.4 |     2.1 |  1.2MB | file_operations()
```
- %CPU: Percentage of total CPU time
- %Native: Time spent in native (C) code
- Memory: Memory allocated on this line
- Function: Function name and context
Memory Profile Features:

```python
# Example code whose memory behavior Scalene can attribute line by line
def memory_intensive_function():
    # Scalene tracks the allocation made on this line
    large_list = [i for i in range(1000000)]

    # Structures that keep growing, like this cache, show up as
    # sustained memory growth and can point to leaks
    cache = {}
    for i in range(10000):
        cache[i] = [j for j in range(100)]

    # Scalene reports the peak memory usage reached here
    return large_list, cache

# Scalene provides:
# - Peak memory usage per line
# - Memory growth over time
# - Hints about potential memory leaks
# - Memory efficiency information
```
Scalene vs Other Profilers:

```bash
# cProfile (standard library)
python -m cProfile -o profile.prof your_program.py

# line_profiler
kernprof -l -v your_program.py

# memory_profiler
python -m memory_profiler your_program.py

# Scalene (CPU + memory in one pass)
scalene --html your_program.py
```
Scalene Advantages:
- Lower overhead than most profilers
- Combines CPU and memory profiling
- Better handling of native code
- More accurate line-by-line attribution
- Modern, interactive HTML reports
Together, these techniques and tools cover Python performance work end to end, from algorithmic complexity awareness to line-by-line measurement of how a program actually behaves.