Visual NumPy and Data Representation

A Visual Intro to NumPy and Data Representation – Jay Alammar

Brilliant visual guide to understanding NumPy through interactive diagrams:

Why Visual Learning Works:

  • Complex Concepts: NumPy operations involve multi-dimensional thinking
  • Abstract Operations: Broadcasting and reshaping are hard to visualize mentally
  • Mathematical Foundation: Linear algebra concepts become clearer with visuals
  • Debugging Aid: Understanding data shapes prevents common errors

Core NumPy Concepts Visualized:

Array Creation and Structure:

import numpy as np

# 1D array visualization
arr_1d = np.array([1, 2, 3, 4])
# [1] [2] [3] [4]

# 2D array (matrix) visualization  
arr_2d = np.array([[1, 2, 3], 
                   [4, 5, 6]])
# [[1] [2] [3]]
# [[4] [5] [6]]

# 3D array visualization
arr_3d = np.array([[[1, 2], [3, 4]], 
                   [[5, 6], [7, 8]]])
# Layer 0: [[1] [2]]  Layer 1: [[5] [6]]
#          [[3] [4]]           [[7] [8]]

Array Operations:

# Element-wise operations visualized
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Addition: [1] + [4] = [5]
#          [2] + [5] = [7]  
#          [3] + [6] = [9]
result = a + b  # [5, 7, 9]

# Broadcasting visualization
scalar = 10
# [1] + 10 = [11]
# [2] + 10 = [12]
# [3] + 10 = [13]
result = a + scalar  # [11, 12, 13]
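
Broadcasting also applies between two arrays of different shapes, not only between an array and a scalar. The following is a minimal sketch (the names `col`, `row`, and `grid` are illustrative and not taken from the guide): shapes are compared from the trailing dimension, and a dimension of size 1 is stretched to match the other array.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# Shapes are compared right to left: (3, 1) with (4,) gives (3, 4).
grid = col + row
print(grid.shape)
print(grid)

# Trailing dimensions of 3 and 4 are incompatible, so this raises ValueError.
try:
    np.array([1, 2, 3]) + row
except ValueError as err:
    print("shape mismatch:", err)
```

Printing `.shape` before combining arrays is usually the quickest way to see why a broadcast fails.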

Advanced Visualizations:

Matrix Multiplication:

# Dot product visualization
A = np.array([[1, 2], 
              [3, 4]])
B = np.array([[5, 6], 
              [7, 8]])

# Visual representation of A @ B:
# Row 1 of A × Column 1 of B = (1×5 + 2×7) = 19
# Row 1 of A × Column 2 of B = (1×6 + 2×8) = 22
# Row 2 of A × Column 1 of B = (3×5 + 4×7) = 43
# Row 2 of A × Column 2 of B = (3×6 + 4×8) = 50
result = A @ B  # [[19, 22], [43, 50]]
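
The same row-times-column rule decides which shapes can be multiplied at all. A small sketch, assuming a second matrix `C` invented for this example: an (n, k) matrix times a (k, m) matrix gives (n, m), and mismatched inner dimensions raise an error.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])          # shape (2, 2)
C = np.array([[1, 0, 2],
              [0, 1, 3]])       # shape (2, 3), made up for this sketch

# (2, 2) @ (2, 3): inner dimensions match, so the result is (2, 3).
product = A @ C
print(product.shape)

# Each entry is the dot product of one row of A with one column of C.
manual = np.array([[sum(A[i, k] * C[k, j] for k in range(2))
                    for j in range(3)]
                   for i in range(2)])
print(np.array_equal(product, manual))

# (2, 3) @ (2, 2): inner dimensions 3 and 2 differ, so this raises ValueError.
try:
    C @ A
except ValueError as err:
    print("cannot multiply:", err)
```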

Reshaping and Indexing:

# Original shape: (6,)
arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape to (2, 3): 
# [[1] [2] [3]]
# [[4] [5] [6]]
reshaped = arr.reshape(2, 3)

# Reshape to (3, 2):
# [[1] [2]]
# [[3] [4]]  
# [[5] [6]]
reshaped = arr.reshape(3, 2)
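
Two details of `reshape` are easy to miss when looking only at the grids. This sketch (variable names are illustrative) shows that `-1` lets NumPy infer one dimension, and that reshaping a contiguous array returns a view that shares memory with the original.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the array size: 6 / 2 = 3.
inferred = arr.reshape(2, -1)
print(inferred.shape)

# For a contiguous array, reshape returns a view on the same memory,
# so writing through the view also changes the original array.
print(np.shares_memory(arr, inferred))
inferred[0, 0] = 99
print(arr[0])

# A size that does not divide evenly raises ValueError.
try:
    arr.reshape(4, -1)
except ValueError as err:
    print("cannot reshape:", err)
```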

Practical Applications:

Data Science Workflows:

# Image processing example
image = np.random.rand(256, 256, 3)  # Height, Width, RGB channels

# Grayscale conversion (visual: RGB → single channel)
grayscale = np.mean(image, axis=2)   # Average across color channels

# Downsampling by striding (visual: coarser grid, no interpolation)
resized = image[::2, ::2, :]         # Keep every other pixel in each direction
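
Strided slicing keeps one pixel from every 2×2 block and discards the other three. An alternative, sketched here under the assumption that height and width are even, averages each block instead. The reshape trick and the names `blocks` and `downsampled` are illustrative, not taken from the guide.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so the sketch is reproducible
image = rng.random((256, 256, 3))       # height, width, RGB channels

# Split each spatial axis into (blocks, 2), then average over the two
# size-2 axes: every output pixel is the mean of a 2x2 input block.
blocks = image.reshape(128, 2, 128, 2, 3)
downsampled = blocks.mean(axis=(1, 3))

print(downsampled.shape)
```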

Machine Learning Data:

# Dataset visualization
X = np.random.rand(1000, 10)  # 1000 samples, 10 features
y = np.random.randint(0, 2, 1000)  # Binary labels

# Feature normalization (visual: data distribution shifting)
X_normalized = (X - X.mean(axis=0)) / X.std(axis=0)

# Train/test split (visual: data partitioning)
split_idx = int(0.8 * len(X))
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]  # keep labels aligned with rows
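
The slice-based split takes the first 80% of rows in stored order, which can bias the result if the data is sorted. A hedged sketch of a shuffled split follows; the seed, the index names, and the choice to normalize with training-set statistics are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((1000, 10))              # 1000 samples, 10 features
y = rng.integers(0, 2, size=1000)       # binary labels

# One shared permutation keeps each row of X aligned with its label in y.
order = rng.permutation(len(X))
split_idx = int(0.8 * len(X))
train_idx, test_idx = order[:split_idx], order[split_idx:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Use training-set statistics for both splits so no test data leaks in.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std

print(X_train.shape, X_test.shape)
```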

Learning Benefits:

  • Intuitive Understanding: See what operations actually do to data
  • Debugging Skills: Quickly spot shape mismatches and dimensionality issues
  • Performance Awareness: Understand which operations are efficient
  • Mathematical Connection: Bridge between code and underlying mathematics

Advent of Code - Learning Through Problem Solving

My solutions and walkthroughs for Advent of Code
Peter Norvig’s Advent of Code 2020 Solutions

Exploring algorithmic problem-solving through Advent of Code challenges:

What is Advent of Code:

  • Daily Programming Puzzles: 25 problems released from Dec 1-25
  • Increasing Difficulty: Problems get progressively harder
  • Algorithmic Focus: Emphasis on data structures and algorithms
  • Multiple Languages: Solve in any programming language

Learning from Peter Norvig’s Solutions:

Elegant Python Patterns:

# Pattern: List comprehensions for parsing (one list of ints per line)
def parse_input(text):
    return [list(map(int, line.split())) for line in text.strip().split('\n')]

# Pattern: Using collections.Counter for frequency analysis
from collections import Counter
def find_mode(items):
    # most_common(1) returns [(item, count)]; keep only the item
    return Counter(items).most_common(1)[0][0]

# Pattern: Recursive solutions with memoization
from functools import lru_cache

@lru_cache(maxsize=None)
def count_arrangements(adapters, index=0):
    # adapters must be a sorted tuple: lru_cache needs hashable arguments
    if index == len(adapters) - 1:
        return 1
    return sum(count_arrangements(adapters, i)
               for i in range(index + 1, len(adapters))
               if adapters[i] - adapters[index] <= 3)

Data Structure Selection:

# Using sets for fast membership testing
position = (0, 0)  # example coordinate
visited = set()
if position not in visited:
    visited.add(position)

# Using deque for efficient queue operations
from collections import deque
start_position = (0, 0)  # example start
queue = deque([start_position])
while queue:
    current = queue.popleft()  # O(1) removal from the left end
    # ... process current and append its unvisited neighbors here ...

# Using complex numbers for 2D coordinates
position = 0 + 0j  # (0, 0)
directions = [1, -1, 1j, -1j]  # right, left, up, down
new_position = position + directions[0]  # Move right

Algorithm Categories in AoC:

Graph Algorithms:

  • BFS/DFS: Pathfinding and reachability
  • Dijkstra: Shortest path with weights
  • Topological Sort: Dependency resolution
  • Connected Components: Network analysis
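
As an illustration of the BFS entry, here is a minimal sketch of a shortest-path search on a small grid. The grid format (`#` for walls), the function name `shortest_path_length`, and the sample maze are all made up for this example and do not come from any particular puzzle.

```python
from collections import deque

def shortest_path_length(grid, start, goal):
    """Return the number of steps from start to goal, or None if unreachable.

    grid is a list of equal-length strings; '#' marks a wall (a convention
    invented for this sketch). Positions are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != '#' and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return None

maze = ["..#.",
        "..#.",
        "...."]
print(shortest_path_length(maze, (0, 0), (0, 3)))  # 7
```

Marking a cell as visited when it is enqueued, rather than when it is dequeued, keeps the same cell from entering the queue twice.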

Dynamic Programming:

  • Memoization: Avoid recomputing subproblems
  • Bottom-up: Build solutions incrementally
  • State Space: Define problem states clearly
  • Optimization: Find optimal solutions
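
The memoized `count_arrangements` function shown earlier can also be written bottom-up. This is a sketch under the same rule (each step may increase by at most 3), assuming a non-empty input of distinct integers; the function name and the sample list are invented for illustration.

```python
def count_arrangements_bottom_up(adapters):
    """Count chains from the smallest to the largest value in which each
    step increases by at most 3. Assumes a non-empty list of distinct ints."""
    adapters = sorted(adapters)
    ways = [0] * len(adapters)
    ways[0] = 1  # one way to stand on the starting value
    for i in range(1, len(adapters)):
        # Add the counts of every earlier value within reach of adapters[i].
        j = i - 1
        while j >= 0 and adapters[i] - adapters[j] <= 3:
            ways[i] += ways[j]
            j -= 1
    return ways[-1]

# Small made-up input: 0 -> 3 directly, via 1, via 2, or via 1 and 2.
print(count_arrangements_bottom_up([0, 1, 2, 3]))  # 4
```

The table `ways` replaces the recursion cache, so the input can be a plain list and there is no recursion depth to worry about.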

Pattern Recognition:

  • Cycle Detection: Find repeating patterns
  • State Machines: Model system behavior
  • Regular Expressions: Text parsing and matching
  • Mathematical Sequences: Number theory problems
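
Cycle detection is often what makes a "simulate a billion steps" puzzle tractable. A minimal sketch, assuming hashable states and a deterministic step function; `find_cycle` and the toy update rule are hypothetical names and data chosen for this example.

```python
def find_cycle(step, state):
    """Apply step repeatedly until a state repeats.

    Returns (start, length): the index at which the cycle begins and its
    period. States must be hashable.
    """
    seen = {}  # state -> index of its first occurrence
    index = 0
    while state not in seen:
        seen[state] = index
        state = step(state)
        index += 1
    start = seen[state]
    return start, index - start

# Toy rule: x -> (x * x + 1) % 10, starting from 3.
# The sequence is 3, 0, 1, 2, 5, 6, 7, 0, ... so the cycle starts at index 1.
print(find_cycle(lambda x: (x * x + 1) % 10, 3))  # (1, 6)
```

Once `start` and `length` are known, the state after N steps (for N at least `start`) equals the state at index `start + (N - start) % length`.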

Best Practices from Expert Solutions:

Code Organization:

def solve_part1(input_text):
    data = parse_input(input_text)
    return process_data_part1(data)

def solve_part2(input_text):
    data = parse_input(input_text)
    return process_data_part2(data)

def parse_input(text):
    # Centralized parsing logic
    pass

# Separate parsing from solving
# Reusable functions for both parts

Testing and Validation:

def test_with_examples():
    example_input = """..."""
    assert solve_part1(example_input) == expected_result
    print("Part 1 example passed!")

# Always test with provided examples first
# Use assertions to catch regressions
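
The skeletons above leave the parsing, the solving, and the expected value unspecified. To show the pieces working together, here is a runnable sketch built around a made-up puzzle (one integer per line); the puzzle rules, the example input, and the expected values are invented for illustration only.

```python
def parse_input(text):
    """Parse one integer per line (a made-up puzzle format)."""
    return [int(line) for line in text.strip().splitlines()]

def solve_part1(input_text):
    """Hypothetical part 1: count values larger than their predecessor."""
    data = parse_input(input_text)
    return sum(b > a for a, b in zip(data, data[1:]))

def solve_part2(input_text):
    """Hypothetical part 2: sum of all values."""
    return sum(parse_input(input_text))

def test_with_examples():
    example_input = """
199
200
198
210
"""
    assert solve_part1(example_input) == 2
    assert solve_part2(example_input) == 807
    print("Examples passed!")

test_with_examples()
```

Both parts share `parse_input`, so a parsing bug only needs to be fixed once.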

Johnny Decimal Organization System

Home | Johnny•Decimal

Revolutionary system for organizing digital information and files:

The Core Concept:

  • Hierarchical Structure: Up to 10 areas, up to 10 categories per area, numbered items within each category
  • Numeric Identification: Every item gets a unique number
  • Search Friendly: Find anything by its number
  • Scalable: Works for personal and business organization

System Structure:

Three Levels:

Areas (10-19, 20-29, 30-39, etc.)
├── Categories (11, 12, 13, etc.)
    └── Items (11.01, 11.02, 11.03, etc.)
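
The numbering scheme can also be checked mechanically. A minimal sketch, assuming IDs always take the two-digit category, dot, two-digit item form used in this section (such as 11.03); `parse_id` and `ID_PATTERN` are names invented for this example and are not part of the Johnny Decimal system itself.

```python
import re

ID_PATTERN = re.compile(r"^(\d)(\d)\.(\d{2})$")

def parse_id(jd_id):
    """Split an ID such as '11.03' into (area, category, item)."""
    match = ID_PATTERN.match(jd_id)
    if match is None:
        raise ValueError(f"not a category.item ID: {jd_id!r}")
    tens, units, item = match.groups()
    area_start = int(tens) * 10
    area = f"{area_start:02d}-{area_start + 9:02d}"
    category = int(tens + units)
    return area, category, int(item)

print(parse_id("11.03"))  # ('10-19', 11, 3)
print(parse_id("21.02"))  # ('20-29', 21, 2)
```

The first digit of the category alone determines the area, which is why an item can be located without searching.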

Example Implementation:

10-19 Personal Development
├── 11 Learning & Education
│   ├── 11.01 Online Courses
│   ├── 11.02 Books to Read
│   └── 11.03 Skill Practice
├── 12 Health & Fitness
│   ├── 12.01 Workout Plans
│   ├── 12.02 Meal Planning
│   └── 12.03 Medical Records
└── 13 Career Development
    ├── 13.01 Resume Versions
    ├── 13.02 Portfolio Projects
    └── 13.03 Interview Prep

20-29 Home & Life Management
├── 21 Financial Management
├── 22 Home Maintenance
└── 23 Travel & Vacation

Digital Implementation:

File System Organization:

/Documents/
├── 11.01 Online Courses/
│   ├── python-course/
│   └── design-fundamentals/
├── 11.02 Books to Read
│   ├── programming-books.md
│   └── business-books.md
├── 21.01 Budget & Expenses/
│   ├── 2021-budget.xlsx
│   └── monthly-tracking.xlsx
└── 21.02 Tax Documents
    ├── 2020-tax-return.pdf
    └── receipts/

Note-Taking Integration:

# 11.03 Daily Learning Notes

## 2021-01-08 - NumPy Visualization Study
- Reviewed Jay Alammar's visual NumPy guide
- Key insight: Broadcasting is like stretching arrays
- Next: Practice with real dataset

## Cross-references:
- See 13.02 for portfolio project using NumPy
- See 21.03 for data analysis of expenses
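
Because every reference is just a number, cross-references can be pulled out of a note with a regular expression. This is a small sketch under the assumption that IDs are written as two digits, a dot, and two digits; the pattern name `ID_REF` and the shortened note text are illustrative.

```python
import re

note = """# 11.03 Daily Learning Notes

## Cross-references:
- See 13.02 for portfolio project using NumPy
- See 21.03 for data analysis of expenses
"""

# Two digits, a dot, two digits, not embedded in a longer number.
ID_REF = re.compile(r"(?<![\d.])\d{2}\.\d{2}(?![\d.])")

print(ID_REF.findall(note))  # ['11.03', '13.02', '21.03']
```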

Benefits of the System:

Cognitive Load Reduction:

  • No More “Where Did I Put That?”: Everything has a number
  • Consistent Structure: Same hierarchy everywhere
  • Muscle Memory: Numbers become automatic
  • Reduced Decisions: Clear place for everything

Scalability:

  • Personal Use: Organize your entire digital life
  • Team Projects: Shared understanding of structure
  • Business Applications: Department and project organization
  • Long-term Maintenance: Structure remains stable over time

Implementation Tips:

Getting Started:

  1. Audit Current Organization: What areas of life/work do you manage?
  2. Define Areas: 10 broad categories (can use fewer initially)
  3. Break Down Categories: What subcategories exist in each area?
  4. Assign Numbers: Start with most important/frequent items
  5. Iterate: Refine structure as you use it

Digital Tools Integration:

# Python script to create Johnny Decimal folder structure
import os

areas = {
    10: "Personal Development",
    20: "Home & Life Management", 
    30: "Work Projects"
}

categories = {
    11: "Learning & Education",
    12: "Health & Fitness",
    21: "Financial Management",
    22: "Home Maintenance"
}

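def create_structure(base_path):
    for area_num, area_name in areas.items():
        # Area folders are named like "10-19 Personal Development"
        area_path = os.path.join(base_path, f"{area_num}-{area_num + 9} {area_name}")
        os.makedirs(area_path, exist_ok=True)

        for cat_num, cat_name in categories.items():
            # A category belongs to the area whose range contains its number
            if area_num <= cat_num < area_num + 10:
                cat_path = os.path.join(area_path, f"{cat_num} {cat_name}")
                os.makedirs(cat_path, exist_ok=True)

if __name__ == "__main__":
    create_structure("Documents")  # example base path; adjust as needed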

These three resources represent different approaches to managing complexity: visual learning for technical concepts, structured problem-solving for algorithm development, and systematic organization for information management.