Memory Profiling with Memray Plugin

The Memray Plugin provides a seamless way to profile the memory usage of Python functions using the Memray library. It helps developers identify memory leaks, optimize memory consumption, and understand allocation patterns within their applications by generating detailed, interactive reports.

Core Capabilities

The memray_profiling decorator adds memory tracking to any Python function. It records the allocations made while the function runs and then renders an HTML report from the captured data.

Detailed Memory Tracking: The plugin runs the decorated function inside a memray.Tracker context manager and exposes the following Tracker options:
- Native Traces: Capture native (C/C++) stack frames alongside Python frames to pinpoint allocations that originate in compiled extensions or other native libraries. Enable this by setting native_traces=True.
- Python Allocator Tracing: Record allocations served by Python's internal allocator (pymalloc) as individual events, instead of only the larger blocks pymalloc requests from the system allocator. Set trace_python_allocators=True to activate.
- Fork Tracking: Continue tracking in child processes created with os.fork(), which matters for applications that spawn worker processes. Enable with follow_fork=True. The plugin renders only the parent process's capture file into the report.
- Memory Interval Control: memory_interval_ms (default 10) sets how many milliseconds elapse between samples of the process's resident set size (RSS). A smaller value gives a finer-grained memory-over-time graph in the reports, at the cost of recording more samples.
Automated Report Generation: After the decorated function returns, the plugin renders an interactive HTML report from the capture file.
- Reporter Selection: Choose the flamegraph (default) or table reporter with the memray_html_reporter parameter. Any other value raises a ValueError. Flame graphs show memory usage along the call stack, while table reports list allocations in tabular form.
- Custom Reporter Arguments: Pass extra command-line arguments to the chosen Memray reporter through memray_reporter_args for finer control over how the report is generated. For example, memray_reporter_args=["--leaks"] with the table reporter shows only allocations that had not been freed when tracking stopped.
Organized Output: Raw capture files (.bin) and the generated HTML reports are written to a memray_bin directory under the current working directory, which keeps profiling artifacts separate from the rest of the project.
Integrated Report Display: The HTML report is read back from disk and passed to a Deck component, so it can be rendered directly in the UI that displays decks.
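The report step described above comes down to validating the options and running one memray CLI command. The plugin performs the same checks in its constructor and runs the command as a single string through os.system. The sketch below is an illustrative alternative, not the plugin's code. It builds the same command as an argument list for subprocess.run. The helper names check_reporter_options, build_reporter_command, and render_report are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

SUPPORTED_REPORTERS = ("flamegraph", "table")


def check_reporter_options(reporter: str, reporter_args: Optional[Sequence[str]]) -> None:
    """Apply the same checks as the plugin's constructor (hypothetical helper)."""
    if reporter not in SUPPORTED_REPORTERS:
        raise ValueError(f"{reporter} is not a supported html reporter.")
    # Like the plugin, only require each argument to be a string containing "--".
    if reporter_args is not None and not all(
        isinstance(arg, str) and "--" in arg for arg in reporter_args
    ):
        raise ValueError(f"unrecognized arguments for {reporter} reporter.")


def build_reporter_command(
    reporter: str,
    bin_filepath: str,
    html_filepath: str,
    reporter_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return argv for: python -m memray <reporter> -o <html> [args...] <bin>."""
    check_reporter_options(reporter, reporter_args)
    return [
        sys.executable, "-m", "memray", reporter,
        "-o", html_filepath,
        *(reporter_args or []),
        bin_filepath,
    ]


def render_report(
    reporter: str,
    bin_filepath: str,
    html_filepath: str,
    reporter_args: Optional[Sequence[str]] = None,
) -> bool:
    """Run the memray reporter without a shell; return True if it exited with status 0."""
    cmd = build_reporter_command(reporter, bin_filepath, html_filepath, reporter_args)
    return subprocess.run(cmd, check=False).returncode == 0
```

Passing a list hands each path to memray verbatim. A directory name containing spaces therefore cannot split the command, which an unquoted os.system string does not guarantee. Because the check mirrors the plugin's, an argument such as "--not-a-real-flag" still passes it. It only fails when memray itself parses the command.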

Usage

To profile a function, apply the memray_profiling decorator to it.

```python
import os
import shutil
import sys
import time
from typing import Callable, List, Optional

import memray


# ClassDecorator and Deck are provided elsewhere in the system.
# Minimal stand-ins are defined here so the example runs on its own.
class ClassDecorator:
    def __init__(self, task_function: Optional[Callable] = None, **kwargs):
        self.task_function = task_function
        # Keep the options separately instead of copying them onto the instance,
        # so they don't overwrite attributes the subclass has already normalized
        # (e.g. memray_reporter_args=None -> []).
        self.decorator_kwargs = kwargs

    def __call__(self, *args, **kwargs):
        if self.task_function is None:
            # Used with parentheses, e.g. @memray_profiling(...): the first call
            # receives the function to decorate, so return a configured instance.
            return self.__class__(args[0], **self.decorator_kwargs)
        return self.execute(*args, **kwargs)


class Deck:
    def __init__(self, title: str, content: str):
        print(f"--- Deck: {title} ---")
        # In a real system, this would render the HTML content
        print(f"HTML content snippet: {content[:200]}...")


# The memray_profiling decorator from the plugin (the report command is simulated below)
class memray_profiling(ClassDecorator):
    def __init__(
        self,
        task_function: Optional[Callable] = None,
        native_traces: bool = False,
        trace_python_allocators: bool = False,
        follow_fork: bool = False,
        memory_interval_ms: int = 10,
        memray_html_reporter: str = "flamegraph",
        memray_reporter_args: Optional[List[str]] = None,
    ):
        if memray_html_reporter not in ["flamegraph", "table"]:
            raise ValueError(f"{memray_html_reporter} is not a supported html reporter.")

        if memray_reporter_args is not None and not all(
            isinstance(arg, str) and "--" in arg for arg in memray_reporter_args
        ):
            raise ValueError(
                f"unrecognized arguments for {memray_html_reporter} reporter. Please check https://bloomberg.github.io/memray/{memray_html_reporter}.html"
            )

        self.native_traces = native_traces
        self.trace_python_allocators = trace_python_allocators
        self.follow_fork = follow_fork
        self.memory_interval_ms = memory_interval_ms
        self.dir_name = "memray_bin"
        self.memray_html_reporter = memray_html_reporter
        self.memray_reporter_args = memray_reporter_args if memray_reporter_args else []

        super().__init__(
            task_function,
            native_traces=native_traces,
            trace_python_allocators=trace_python_allocators,
            follow_fork=follow_fork,
            memory_interval_ms=memory_interval_ms,
            memray_html_reporter=memray_html_reporter,
            memray_reporter_args=memray_reporter_args,
        )

    def execute(self, *args, **kwargs):
        os.makedirs(self.dir_name, exist_ok=True)

        bin_filepath = os.path.join(
            self.dir_name,
            f"{self.task_function.__name__}.{time.strftime('%Y%m%d%H%M%S')}.bin",
        )

        with memray.Tracker(
            bin_filepath,
            native_traces=self.native_traces,
            trace_python_allocators=self.trace_python_allocators,
            follow_fork=self.follow_fork,
            memory_interval_ms=self.memory_interval_ms,
        ):
            output = self.task_function(*args, **kwargs)

        self.generate_flytedeck_html(reporter=self.memray_html_reporter, bin_filepath=bin_filepath)

        return output

    def generate_flytedeck_html(self, reporter, bin_filepath):
        html_filepath = bin_filepath.replace(
            self.task_function.__name__, f"{reporter}.{self.task_function.__name__}"
        ).replace(".bin", ".html")

        memray_reporter_args_str = " ".join(self.memray_reporter_args)

        # The plugin runs this command with os.system. Here it is only printed,
        # and a placeholder HTML file is written so the example runs without
        # invoking the memray CLI.
        print(f"Executing: {sys.executable} -m memray {reporter} -o {html_filepath} {memray_reporter_args_str} {bin_filepath}")
        with open(html_filepath, "w", encoding="utf-8") as f:
            f.write(f"<html><body><h1>Memray {reporter.capitalize()} Report for {self.task_function.__name__}</h1><p>Generated with args: {memray_reporter_args_str}</p></body></html>")

        if os.path.exists(html_filepath):  # stands in for a successful command
            with open(html_filepath, "r", encoding="utf-8") as file:
                html_content = file.read()

            Deck(f"Memray {reporter.capitalize()}", html_content)
        else:
            print(f"Failed to generate HTML report at {html_filepath}")

    def get_extra_config(self):
        return {}


# Example 1: Basic memory profiling with the default flamegraph report
@memray_profiling()
def process_data_basic(size: int):
    """Allocates a list of integers."""
    data = [i for i in range(size)]
    return len(data)


# Example 2: Native traces and a table report that shows only leaked allocations
@memray_profiling(native_traces=True, memray_html_reporter="table", memray_reporter_args=["--leaks"])
def process_data_advanced(num_objects: int):
    """Creates objects, some of which outlive the tracked call (for demonstration)."""
    class MyObject:
        def __init__(self, value):
            self.value = value
            self.large_data = bytearray(1024 * 1024)  # 1 MiB per object

    objects = [MyObject(i) for i in range(num_objects)]
    # The returned objects are still referenced when the tracker stops, so the
    # --leaks report lists them as never freed; the rest are released when
    # `objects` goes out of scope at return.
    return objects[: num_objects // 2]


# Run the examples
if __name__ == "__main__":
    print("--- Running basic profiling ---")
    result_basic = process_data_basic(100000)
    print(f"Basic processing complete. Result: {result_basic}")

    print("\n--- Running advanced profiling with native traces and table report ---")
    # Remove the previous memray_bin directory for a fresh run
    shutil.rmtree("memray_bin", ignore_errors=True)

    result_advanced = process_data_advanced(5)  # 5 objects; 5 // 2 == 2 are returned and reported as leaks
    print(f"Advanced processing complete. Result (first half of objects): {len(result_advanced)}")

    # Clean up generated files
    shutil.rmtree("memray_bin", ignore_errors=True)
```

When process_data_basic or process_data_advanced executes, the plugin automatically:

1. Creates a memray_bin directory if it doesn't exist.
2. Starts a memray.Tracker instance, configured with the specified parameters.
3. Executes the decorated function.
4. Stops the tracker, saving the raw profiling data to a .bin file (e.g., memray_bin/process_data_basic.YYYYMMDDHHMMSS.bin).
5. Generates an HTML report (e.g., memray_bin/flamegraph.process_data_basic.YYYYMMDDHHMMSS.html) using the chosen reporter and arguments.
6. Passes the HTML content to the Deck component for display.
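The file names above follow a fixed pattern. The plugin derives the HTML path by calling str.replace on the .bin path. That call also rewrites the directory part if the function name occurs there, for example in a function named memray. The sketch below builds both paths directly from their parts. It is an illustration under that assumption, not the plugin's code, and report_paths is a hypothetical helper.

```python
import os
import time
from typing import Optional, Tuple


def report_paths(
    func_name: str,
    reporter: str,
    timestamp: Optional[str] = None,
    dir_name: str = "memray_bin",
) -> Tuple[str, str]:
    """Return (<dir>/<func>.<ts>.bin, <dir>/<reporter>.<func>.<ts>.html)."""
    # Same second-resolution timestamp format the decorator uses.
    ts = timestamp or time.strftime("%Y%m%d%H%M%S")
    bin_path = os.path.join(dir_name, f"{func_name}.{ts}.bin")
    html_path = os.path.join(dir_name, f"{reporter}.{func_name}.{ts}.html")
    return bin_path, html_path
```

For process_data_basic with the flamegraph reporter, this yields the same two paths as the steps above. For a function named memray, it keeps the directory as memray_bin.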

Common Use Cases

Identifying Memory Leaks: Pinpoint where memory is allocated but never released, which makes memory consumption grow over time. The table reporter with the --leaks argument is particularly useful here.
Optimizing Memory Footprint: Understand which parts of the code consume the most memory and identify opportunities to reduce allocations or use more memory-efficient data structures. Flame graphs provide an excellent visual aid for this.
Benchmarking Memory Performance: Compare the memory usage of different algorithms or implementations for a given task to choose the most efficient one.
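Debugging Out-of-Memory Errors: When a function's memory usage grows out of control, the capture shows which call paths were allocating. If the decorated function raises (for example, a MemoryError), the exception propagates before the report step runs, so no HTML report is produced. The .bin file is still written to memray_bin, however, and can be rendered manually with the memray CLI.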
Analyzing Native Extension Memory: With native_traces=True, investigate memory allocations originating from C/C++ extensions, which are often harder to debug with pure Python tools.
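For benchmarking, the decorator's HTML report is meant for a person to read. To compare implementations in code, one option is to record each one with memray.Tracker and read the peak heap size from the capture's metadata. The sketch below does this outside the plugin. It assumes memray is installed and that memray.FileReader exposes metadata.peak_memory. The names peak_memory_bytes, build_with_list, and build_with_generator are hypothetical.

```python
import os
import tempfile
from typing import Any, Callable


def peak_memory_bytes(func: Callable[..., Any], *args: Any, **kwargs: Any) -> int:
    """Run func under a fresh memray.Tracker and return the recorded peak heap size in bytes."""
    import memray  # imported lazily so this module still loads where memray is absent

    with tempfile.TemporaryDirectory() as tmp:
        # A fresh temp directory guarantees the capture path does not exist yet.
        capture = os.path.join(tmp, "capture.bin")
        with memray.Tracker(capture):
            func(*args, **kwargs)
        return memray.FileReader(capture).metadata.peak_memory


def build_with_list(n: int) -> int:
    return len([i for i in range(n)])  # materializes all n integers at once


def build_with_generator(n: int) -> int:
    return sum(1 for _ in range(n))  # holds one item at a time
```

Both helpers return the same count. The list version must hold the whole list in memory, so its recorded peak should be noticeably higher. The peak only reflects allocations memray observed while the tracker was active.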

Considerations

Performance Overhead: Memory profiling adds overhead because every allocation must be intercepted and recorded. Enabling native_traces or trace_python_allocators increases it further. Use the plugin during development, testing, or targeted debugging sessions rather than in production environments unless absolutely necessary.
Reporter Argument Validation: The plugin only checks that each entry in memray_reporter_args is a string containing -- (it does not require the entry to start with --). It does not check whether the arguments are meaningful for the chosen Memray reporter, so an unsupported flag only surfaces as a failure when the reporter command runs. Refer to the official Memray documentation for the flamegraph and table reporters to make sure the arguments are valid.
Deck Component Integration: The plugin relies on an external Deck component to display the generated HTML reports. Ensure this component is properly configured and available in your environment for the reports to be rendered.
Generated Files: The plugin writes .bin files containing raw profiling data and .html report files to the memray_bin directory. It never deletes them, so they accumulate across runs. Manage these files as needed, especially in CI/CD pipelines or environments with strict storage policies.
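Because nothing in the plugin removes old artifacts, a small cleanup step can keep memray_bin bounded in long-lived environments. The sketch below keeps only the most recently modified .bin and .html files. It treats every file independently and does not pair a capture with its report. It is a hypothetical helper, not part of the plugin, and the default keep=10 is an arbitrary choice.

```python
from pathlib import Path
from typing import List


def prune_memray_artifacts(dir_name: str = "memray_bin", keep: int = 10) -> List[Path]:
    """Delete all but the `keep` newest .bin/.html files in dir_name; return the deleted paths."""
    root = Path(dir_name)
    if not root.is_dir():
        return []
    artifacts = [p for p in root.iterdir() if p.is_file() and p.suffix in {".bin", ".html"}]
    artifacts.sort(key=lambda p: p.stat().st_mtime, reverse=True)  # newest first
    removed = artifacts[keep:]
    for path in removed:
        path.unlink()
    return removed
```

Calling prune_memray_artifacts(keep=20) at the end of a CI job, for example, leaves at most 20 artifacts behind. Other files in the directory are not touched.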
