Today I discovered advanced Python performance optimization techniques and Scalene, a profiler that provides detailed, line-level insight into Python application performance.

James Powell - Fast and Furious Python 7

James Powell’s Fast and Furious Python 7 presents advanced techniques for writing high-performance Python code:

Core Performance Principles:

1. Understand Python’s Object Model:

# Expensive: Creates new objects repeatedly
def slow_string_building(items):
    result = ""
    for item in items:
        result += str(item)  # Creates new string each time
    return result

# Fast: Use join for string concatenation
def fast_string_building(items):
    return "".join(str(item) for item in items)

# Benchmark difference
import timeit
items = list(range(1000))
slow_time = timeit.timeit(lambda: slow_string_building(items), number=100)
fast_time = timeit.timeit(lambda: fast_string_building(items), number=100)
print(f"Slow: {slow_time:.4f}s, Fast: {fast_time:.4f}s")
# join is usually much faster, and the gap widens as the input grows

2. Leverage Built-in Functions:

# Slow: Python loop
def slow_sum(numbers):
    total = 0
    for num in numbers:
        total += num
    return total

# Fast: Built-in function (implemented in C)
def fast_sum(numbers):
    return sum(numbers)

# A more general reduction (for plain addition, built-in sum remains fastest)
import operator
from functools import reduce

def reduce_sum(numbers):
    return reduce(operator.add, numbers, 0)

# Specialized operations
import statistics
import math

# Fast statistical functions (sample data so the snippet runs standalone)
numbers = [2, 4, 4, 4, 5, 5, 7, 9]
mean_value = statistics.mean(numbers)
median_value = statistics.median(numbers)
sqrt_sum = math.sqrt(sum(x * x for x in numbers))  # Euclidean norm

3. Use List Comprehensions and Generator Expressions:

# Slow: Explicit loop
def slow_filter_map(items):
    result = []
    for item in items:
        if item % 2 == 0:
            result.append(item * 2)
    return result

# Fast: List comprehension
def fast_filter_map(items):
    return [item * 2 for item in items if item % 2 == 0]

# Memory efficient: Generator expression
def memory_efficient_filter_map(items):
    return (item * 2 for item in items if item % 2 == 0)

# Benchmark (%timeit is IPython magic; timings are indicative)
items = list(range(10000))
%timeit slow_filter_map(items)     # ~2.5ms
%timeit fast_filter_map(items)     # ~0.8ms
%timeit list(memory_efficient_filter_map(items))  # ~0.9ms

4. Optimize Data Structures:

# Slow: Linear search in list
def slow_membership_test(items, targets):
    return [item for item in targets if item in items]

# Fast: Use sets for O(1) lookup
def fast_membership_test(items, targets):
    item_set = set(items)
    return [item for item in targets if item in item_set]

# Collections module optimizations
from collections import deque, Counter, defaultdict

# Fast queue operations
queue = deque()
queue.appendleft("task")  # O(1), vs list.insert(0, x) which is O(n)

# Fast counting
words = ["spam", "eggs", "spam"]  # sample data so the snippet runs standalone
word_counts = Counter(words)  # Much faster than counting by hand with a dict

# Avoid KeyError with defaultdict
word_positions = defaultdict(list)
for i, word in enumerate(words):
    word_positions[word].append(i)  # No need to check whether the key exists

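The set version's advantage is easy to confirm with timeit; a minimal sketch reusing the two functions above (sizes are arbitrary):

import timeit

items = list(range(10_000))
targets = list(range(0, 20_000, 2))

list_time = timeit.timeit(lambda: slow_membership_test(items, targets), number=10)
set_time = timeit.timeit(lambda: fast_membership_test(items, targets), number=10)
print(f"list: {list_time:.4f}s, set: {set_time:.4f}s")
# Expect the set version to win by a wide margin at these sizes
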
5. Algorithm Complexity Awareness:

# O(n²) - Avoid nested loops when possible
def slow_find_duplicates(items):
    duplicates = []
    for i, item1 in enumerate(items):
        for j, item2 in enumerate(items[i+1:], i+1):
            if item1 == item2:
                duplicates.append(item1)
    return duplicates

# O(n) - Use data structures effectively
def fast_find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return list(duplicates)

# Even better: Use Counter
from collections import Counter
def fastest_find_duplicates(items):
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]

Advanced Optimization Techniques:

Memory Layout Optimization:

# Use __slots__ for classes with fixed attributes
class Point:
    __slots__ = ['x', 'y']  # Saves memory, faster attribute access
    
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Use array.array for numeric data
import array
numbers = array.array('i', range(10000))  # Much less memory than list

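A rough way to verify the savings is sys.getsizeof; rough because getsizeof on a list counts only the pointer array, so the int objects must be added separately (and small ints are shared, which slightly inflates the estimate):

import array
import sys

n = 10_000
as_list = list(range(n))
as_array = array.array('i', range(n))

# list: pointer storage plus one Python int object per element
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# array: raw 4-byte C ints stored inline
array_bytes = sys.getsizeof(as_array)

print(f"list:  ~{list_bytes:,} bytes")
print(f"array: ~{array_bytes:,} bytes")
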
Function Call Optimization:

# Cache expensive computations
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_function(n):
    # Simulate expensive computation
    return sum(i*i for i in range(n))

# Pre-compile regular expressions
import re
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def validate_email(email):
    return EMAIL_PATTERN.match(email) is not None

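As a usage note, lru_cache wrappers expose cache statistics via cache_info(), a quick way to confirm the cache is actually being hit:

expensive_function(10_000)              # first call: computed
expensive_function(10_000)              # second call: served from the cache
print(expensive_function.cache_info())
# e.g. CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
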
Scalene - Advanced Python Profiler

Scalene is a high-performance CPU, GPU, and memory profiler designed specifically for Python:

Key Features:

Comprehensive Profiling:

  • CPU profiling: Time spent in Python vs native code
  • Memory profiling: Memory usage and allocation patterns
  • Line-by-line analysis: Detailed per-line performance data
  • GPU profiling: NVIDIA GPU usage tracking

Installation and Basic Usage:

# Install Scalene
pip install scalene

# Profile a Python script
scalene your_program.py

# Profile with specific options
scalene --html --outfile profile.html your_program.py

# Profile only CPU usage
scalene --cpu-only your_program.py

# Profile with memory tracking
scalene --memory your_program.py

Advanced Profiling Options:

# Emit a fresh profile every N seconds while the program runs
scalene --profile-interval 5.0 your_program.py

# Adjust the CPU sampling interval (seconds between samples)
scalene --cpu-sampling-rate 0.01 your_program.py

# Profile all executed code, including library internals
scalene --profile-all your_program.py

# Generate machine-readable reports
scalene --json --outfile profile.json your_program.py

Integration with Code:

Programmatic Profiling:

from scalene import scalene_profiler

# Turn profiling on programmatically (run the script under Scalene;
# with the --off flag, profiling stays disabled until this call)
scalene_profiler.start()

def compute_intensive_function():
    result = []
    for i in range(100000):
        result.append(i ** 2)
    return result

data = compute_intensive_function()

# Turn profiling off and record what was gathered
scalene_profiler.stop()

Profiling a Specific Block:

Scalene's documented programmatic interface is the start()/stop() pair shown above, which can also bracket a single block so that surrounding code is excluded:

from scalene import scalene_profiler

def your_function():
    # Function to profile
    return [i*i for i in range(100000)]

# Profile only this block
scalene_profiler.start()
result = your_function()
scalene_profiler.stop()

Interpreting Scalene Output:

CPU Profile Information:

Line   %CPU    %Native   Memory   Function
5      45.2%   12.3%     2.1MB    list_comprehension()
12     32.1%   8.7%      0.8MB    string_processing()
18     15.4%   2.1%      1.2MB    file_operations()

  • %CPU: Percentage of total CPU time
  • %Native: Time spent in native (C) code
  • Memory: Memory allocated on this line
  • Function: Function name and context (see the toy script after this list)

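To see these columns in practice, a toy script along the following lines (file name and workload chosen purely for illustration) separates interpreter-bound from native-bound work; running scalene toy.py should attribute the loop mostly to Python time and the sort mostly to native time:

# toy.py -- illustration only; actual percentages vary by machine
import random

def python_heavy():
    # Interpreter-bound loop: expect a high share of Python time here
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def native_heavy(data):
    # list.sort() runs in C: time here shows up largely as native time
    data.sort()
    return data

python_heavy()
native_heavy([random.random() for _ in range(2_000_000)])
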
Memory Profile Features:

# Example code that Scalene can analyze
def memory_intensive_function():
    # Scalene tracks memory allocation here
    large_list = [i for i in range(1000000)]
    
    # Scalene identifies memory leaks
    cache = {}
    for i in range(10000):
        cache[i] = [j for j in range(100)]
    
    # Scalene shows peak memory usage
    return large_list, cache

# Scalene reports:
# - Peak memory usage per line
# - Memory growth over time
# - Likely memory leaks
# - Copy volume (MB/s), a hint at avoidable copying

Performance Comparison:

Scalene vs Other Profilers:

# cProfile (standard library)
python -m cProfile -o profile.prof your_program.py

# line_profiler
kernprof -l -v your_program.py

# memory_profiler
python -m memory_profiler your_program.py

# Scalene (comprehensive)
scalene --html your_program.py

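One practical difference: cProfile writes a binary file that needs a second step to inspect, whereas Scalene renders its report directly. A minimal pstats sketch for reading the profile.prof produced above:

import pstats

stats = pstats.Stats("profile.prof")
stats.sort_stats("cumulative").print_stats(10)  # top 10 entries by cumulative time
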
Scalene Advantages:

  • Lower overhead than most profilers
  • Combines CPU and memory profiling
  • Better handling of native code
  • More accurate line-by-line attribution
  • Modern, interactive HTML reports

These tools and techniques provide a comprehensive approach to Python performance optimization, from understanding algorithmic complexity to detailed profiling and measurement of actual performance characteristics.