Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful! (Source)
- Joe Armstrong (co-designer of the Erlang programming language)
This article is part of the Python series “Data Science: From School to Work.” Since the beginning, you have learned how to manage your Python project with uv, how to write clean code using PEP and SOLID principles, how to handle errors and use loguru to log your code, and how to write tests.
Now you are ready to create working, production-ready code. But code is never perfect and can always be improved. A final (optional, but highly recommended) step in creating code is optimization.
To optimize your code, you need to be able to track what is going on in it. To do so, we use tools called profilers. They generate profiles of your code, that is, sets of statistics that describe how often and for how long various parts of the program executed. They make it possible to identify bottlenecks and parts of the code that consume too many resources. In other words, they show where your code should be optimized.
Today, there is such a proliferation of profilers in Python that the default profiler in PyCharm is called yappi, for “Yet Another Python Profiler”.
This article is therefore not an exhaustive list of all existing profilers. I present one tool for each aspect of the code we want to profile: memory, time and CPU/GPU consumption. Other packages are mentioned with some references but are not detailed.
I – Memory profilers
Memory profiling is the technique of monitoring and evaluating a program’s memory usage while it runs. This method helps developers locate memory leaks, optimize memory usage, and understand their programs’ memory consumption patterns. Memory profiling is crucial to prevent applications from using more memory than necessary and causing sluggish performance or crashes.
1/ memory-profiler
memory_profiler is an easy-to-use Python module designed to profile the memory usage of a script. It depends on the psutil module. To install the package, simply type:
pip install memory_profiler # (in your virtual environment)
# or if you use uv (which I encourage)
uv add memory_profiler
Profiling executable
One of the advantages of this package is that it is not limited to Python code. It installs the mprof command, which can monitor the activity of any executable.
For instance, you can monitor the memory consumption of applications like ollama by running this command:
mprof run ollama run gemma3:4b
# or with uv
uv run mprof run ollama run gemma3:4b
To see the result, you have to install matplotlib first. Then, you can plot the recorded memory profile of your executable by running:
mprof plot
# or with uv
uv run mprof plot
The graph then looks like this:
Profiling Python code
Let’s get back to what brings us here: monitoring Python code.
memory_profiler works in a line-by-line mode using a simple decorator, @profile. First, you decorate the function of interest and then you run the script. The output is written directly to the terminal. Consider the following monitoring.py script:
@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()
It is important to note that it is not necessary to import the decorator with from memory_profiler import profile at the beginning of the script. In that case, you have to pass specific arguments to the Python interpreter:
python -m memory_profiler monitoring.py
# or
uv run -m memory_profiler monitoring.py
And you get the following output with line-by-line details:

The output is a table with five columns.
- Line #: The line number of the profiled code.
- Mem usage: The memory usage of the Python interpreter after executing that line.
- Increment: The change in memory usage compared to the previous line.
- Occurrences: The number of times that line was executed.
- Line Contents: The actual source code.
This output is very detailed and allows very fine monitoring of a specific function.
Important: Unfortunately, this package is no longer actively maintained. The author is looking for a replacement.
2/ tracemalloc
tracemalloc is a built-in Python module that tracks memory allocations and deallocations. It provides an easy-to-use interface for capturing and analyzing memory usage snapshots, making it a valuable tool for any Python developer.
It offers the following details:
- Shows where each object was allocated by providing a traceback.
- Provides memory allocation statistics by file and line number, including the overall size, count, and average size of memory blocks.
- Allows you to compare two snapshots to identify potential memory leaks.
The tracemalloc package can be useful for identifying memory leaks in your code.
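As an illustration of the snapshot comparison described above, here is a minimal sketch. The build_cache() function and its size are hypothetical, chosen only to produce a visible allocation; the tracemalloc calls themselves are from the standard library.

```python
import tracemalloc


def build_cache(n):
    # Hypothetical workload: keeps n strings alive, as a leaking cache would
    return [f"item-{i}" * 10 for i in range(n)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
cache = build_cache(50_000)
after = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Differences between the two snapshots, grouped by source line
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

The first entries point to the lines whose allocations grew the most between the two snapshots, which is usually where to start looking for a leak.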
Personally, I find it less intuitive to set up than the other packages presented in this article. Here are some links to go further:
II – Time profilers
Time profiling is the process of measuring the time spent in different parts of a program. By identifying performance bottlenecks, you can focus your optimization efforts on the parts of the code that will have the most significant impact.
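Before looking at dedicated tools, the idea can be sketched by hand with the standard library. The timed decorator and the timings dictionary below are hypothetical helpers written for this illustration, not part of any profiler. They only show what "measuring the time spent in a function" means.

```python
import time
from collections import defaultdict
from functools import wraps

# function name -> [number of calls, total seconds]
timings = defaultdict(lambda: [0, 0.0])


def timed(func):
    """Accumulate call count and wall-clock time for func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = timings[func.__name__]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper


@timed
def create_list(lst_len):
    arr = []
    for i in range(lst_len):
        arr.append(i)
    return arr


create_list(100_000)
create_list(200_000)
for name, (calls, total) in timings.items():
    print(f"{name}: {calls} calls, {total:.4f} s in total")
```

Real profilers do the same bookkeeping automatically, for every function or every line, and with far less manual work.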
1/ line-profiler
The line-profiler package is quite similar to memory-profiler, but it serves a different purpose. It is designed to profile specific functions by measuring the execution time of each line within those functions. To use LineProfiler effectively, you need to explicitly specify which functions you want it to profile by simply adding the @profile decorator above them.
To install it, just type:
pip install line_profiler # (in your virtual environment)
# or
uv add line_profiler
Consider the following script named monitoring.py:
@profile
def create_list(lst_len: int):
    arr = []
    for i in range(0, lst_len):
        arr.append(i)

def print_statement(idx: int):
    if idx == 0:
        print("Starting array creation!")
    elif idx == 1:
        print("Array created successfully!")
    else:
        raise ValueError("Invalid index provided!")

@profile
def main():
    print_statement(0)
    create_list(400000)
    print_statement(1)

if __name__ == "__main__":
    main()
To measure the execution time of the functions main() and create_list(), we add the @profile decorator.
The easiest way to get a time profile of this script is to use the kernprof script.
kernprof -lv monitoring.py # (in your virtual environment)
# or
uv run kernprof -lv monitoring.py
It will create a binary file named your_script.py.lprof. The -v argument shows the output directly in the terminal.
Otherwise, you can view the results later like so:
python -m line_profiler monitoring.py.lprof # (in your virtual environment)
# or
uv run python -m line_profiler monitoring.py.lprof
It gives the following information:

There are two tables, one per profiled function. Each table contains the following information:
- Line #: The line number in the file.
- Hits: The number of times that line was executed.
- Time: The total amount of time spent executing the line, in the timer’s units. In the header information before the tables, you will see a line “Timer unit:” giving the conversion factor to seconds. It may be different on different systems.
- Per Hit: The average amount of time spent executing the line once, in the timer’s units.
- % Time: The percentage of time spent on that line relative to the total amount of recorded time spent in the function.
- Line Contents: The actual source code.
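The relationship between these columns is simple arithmetic. The sketch below uses made-up numbers (a timer unit of 1e-6 s and arbitrary hit counts), not values from a real run, and the helper functions are hypothetical names introduced only for this example.

```python
def to_seconds(time_in_units, timer_unit):
    # "Time" column converted to seconds with the "Timer unit:" header value
    return time_in_units * timer_unit


def per_hit(time_in_units, hits):
    # "Per Hit" column: average time per execution of the line
    return time_in_units / hits


def percent_time(line_time, function_total):
    # "% Time" column: share of the function's recorded time
    return 100 * line_time / function_total


# Hypothetical values for one line of a profiled function
timer_unit = 1e-6          # seconds per timer unit
hits = 400_000             # times the line was executed
line_time = 120_000        # time spent on the line, in timer units
function_total = 300_000   # time recorded for the whole function

print(f"seconds on this line: {to_seconds(line_time, timer_unit):.3f}")
print(f"per hit (timer units): {per_hit(line_time, hits):.2f}")
print(f"% time: {percent_time(line_time, function_total):.1f}")
```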
1/ cProfile
Python comes with two built-in profilers:
- cProfile: A C extension with reasonable overhead that makes it suitable for profiling long-running programs. It is recommended for most users.
- profile: A pure Python module whose interface is imitated by cProfile, but which adds significant overhead to profiled programs. It can be a valuable tool when you need to extend or customize the profiling functionality.
The base syntax is cProfile.run(statement, filename=None, sort=-1). The filename argument can be passed to save the output, and the sort argument specifies how the output is ordered when printed. By default, it is set to -1 (no value).
For instance, if you modify the monitoring script like this:
import cProfile

def create_list(lst_len: int):
    arr = []
    for i in range(0, lst_len):
        arr.append(i)

def print_statement(idx: int):
    if idx == 0:
        print("Starting array creation!")
    elif idx == 1:
        print("Array created successfully!")
    else:
        raise ValueError("Invalid index provided!")

def main():
    print_statement(0)
    create_list(400000)
    print_statement(1)

if __name__ == "__main__":
    cProfile.run("main()")
we now have the next output:

First, we have the script outputs: print_statement(0) and print_statement(1).
Then, we have the profiler output. The first line shows the number of function calls and the time it took to run. The second line is a reminder of the sort parameter. Finally, the profiler gives a table with six columns:
- ncalls: The number of calls made.
- tottime: Total time spent in the given function. Note that the time spent in calls to sub-functions is excluded.
- percall: tottime divided by the number of calls.
- cumtime: Unlike tottime, this includes time spent in this function and in all the sub-functions it calls. It is the most useful figure and is accurate even for recursive functions.
- percall: The percall following cumtime is the quotient of cumtime divided by primitive calls. The primitive calls are all the calls that were not induced via recursion.
- filename:lineno(function): The file, line number and name of the function.
The first and the last rows of the table come from cProfile. The other rows are about the script.
You can customize the output by using the Profile() class. First, you have to create an instance of the Profile class and use the methods enable() and disable() to, respectively, start and stop collecting profiling data. Then, the pstats module can be used to manipulate the results collected by the profiler object.
To sort the output by cumulative time, instead of by the standard name, the previous code can be rewritten like this:
import cProfile, pstats
# ...
# Same as before
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('cumtime')
    stats.print_stats()
And the output becomes:

As you can see, the table is now sorted by cumtime, and the two cProfile rows from the previous table no longer appear.
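To make the difference between tottime and cumtime concrete, here is a small sketch with two hypothetical functions, inner() and outer(), that are not part of the monitoring script. It reads the table programmatically with Stats.get_stats_profile(), which I assume is available (Python 3.9 or later).

```python
import cProfile
import pstats


def inner():
    # Pure-Python busy loop, so the time is charged to inner() itself
    total = 0
    for i in range(200_000):
        total += i
    return total


def outer():
    # Does almost nothing itself: its time is spent inside inner()
    return inner() + inner()


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# get_stats_profile() exposes the printed table as named fields
profile = pstats.Stats(profiler).get_stats_profile()
for name in ("outer", "inner"):
    row = profile.func_profiles[name]
    print(f"{name}: ncalls={row.ncalls} tottime={row.tottime} cumtime={row.cumtime}")
```

outer() should show a tottime close to zero but a cumtime at least as large as that of inner(), because almost all of its time is spent in the two calls it makes.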
Visualize profiling with Snakeviz
The output is very easy to analyse. But it can become unreadable if the profiled code becomes too big.
Another way to analyse the output is to visualize the data instead of reading it. To do so, we use the Snakeviz package. To install it, simply type:
pip install snakeviz # (in your virtual environment)
# or
uv add snakeviz
Then, replace stats.print_stats() with stats.dump_stats("profile.prof") to save the profiling data. Now, you can visualize your profiling by typing:
snakeviz profile.prof
It launches a browser interface from which you can choose between two data visualizations: Icicle and Sunburst.


It is easier to read than the print_stats() output because you can interact with each element by moving your mouse over it. For instance, you can get more details about the function create_list().

Create a call graph with gprof2dot
A call graph is a visual representation of the relationships between functions or methods in a program, showing which functions call others and how long each function or method takes. It can be seen as a map of your code.
pip install gprof2dot # (in your virtual environment)
# or
uv add gprof2dot
Then execute your script by typing:
python -m cProfile -o monitoring.pstats monitoring.py # (in your virtual environment)
# or
uv run python -m cProfile -o monitoring.pstats monitoring.py
It will create a monitoring.pstats file that can be turned into a call graph using the following command:
gprof2dot -f pstats monitoring.pstats | dot -Tpng -o monitoring.png # (in your virtual environment)
# or
uv run gprof2dot -f pstats monitoring.pstats | dot -Tpng -o monitoring.png
Then the call graph is saved into a png file named monitoring.png.
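To see what a call graph contains without any external tool, here is a rough sketch that records caller/callee pairs with sys.setprofile from the standard library. This is not how gprof2dot works internally; record_call_graph() is a hypothetical helper written for this illustration, and it only counts calls between Python functions (no timings).

```python
import sys
from collections import defaultdict


def record_call_graph(func):
    """Run func() and count (caller, callee) pairs between Python functions."""
    edges = defaultdict(int)

    def tracer(frame, event, arg):
        # "call" is emitted for Python functions only; C calls are ignored
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    previous = sys.getprofile()
    sys.setprofile(tracer)
    try:
        func()
    finally:
        sys.setprofile(previous)
    return dict(edges)


def create_list(lst_len):
    arr = []
    for i in range(lst_len):
        arr.append(i)
    return arr


def print_statement(idx):
    print("start" if idx == 0 else "done")


def main():
    print_statement(0)
    create_list(1000)
    print_statement(1)


for (caller, callee), count in record_call_graph(main).items():
    print(f"{caller} -> {callee}: {count} call(s)")
```

Each printed pair is one arrow of the call graph; tools like gprof2dot add the timing information and draw the result.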

2/ Other interesting packages
a/ PyCallGraph
PyCallGraph is a Python module that creates call graph visualizations. To create a call graph of your code, run it inside a PyCallGraph context like this:
from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput
with PyCallGraph(output=GraphvizOutput()):
    # code you want to profile
Then, you get a png of the call graph of your code, named pycallgraph.png by default.
I have made the call graph of the previous example:

In each box, you have the name of the function, the time spent in it and the number of calls. As with snakeviz, the graph can be very complex if your code has many dependencies. But the color indicates the bottlenecks. In complex code, it is very interesting to study it to see the dependencies and relationships.
b/ PyInstrument
PyInstrument is also a Python profiler that is very easy to use. You can add the profiler to your script by surrounding the code like this:
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
# code you want to profile
profiler.stop()
print(profiler.output_text(unicode=True, color=True))
The output gives:

It is less detailed than cProfile but it is also more readable. Your functions are highlighted and sorted by time.
But the real interest of PyInstrument comes with its HTML output. To get this HTML output, simply type in the terminal:
pyinstrument --html monitoring.py
# or
uv run pyinstrument --html monitoring.py
It launches a browser interface from which you can choose between two data visualizations: Call stack and Timeline.


Here, the profile is more detailed and you have many options to filter.
III – CPU/GPU profiler
CPU and GPU profiling is the process of analyzing the utilization and performance of a program on the central processing unit (CPU) and graphics processing unit (GPU). By measuring how many resources are spent on different parts of the code on these processing units, developers can identify performance bottlenecks, understand where their code is being executed, and optimize their application to achieve better performance and efficiency.
As far as I know, there is only one package that can profile GPU power consumption.
1/ Scalene
Scalene is a high-performance CPU, GPU and memory profiler designed specifically for Python. It is an open-source package that provides detailed insights. It is designed to be fast, accurate, and easy to use, making it an excellent tool for developers looking to optimize their code.
- CPU/GPU Profiling: Scalene provides detailed information on CPU/GPU usage, including the time spent in different parts of your code. It can help you identify performance bottlenecks and optimize your code for better execution times.
- Memory Profiling: Scalene tracks memory allocation and deallocation, helping you understand how your code uses memory. This is particularly useful for identifying memory leaks or optimizing memory-intensive applications.
- Line-by-Line Profiling: Scalene provides line-by-line profiling, which gives you a detailed breakdown of the time spent in each line of your code. This feature is invaluable for pinpointing performance issues.
- Visualization: Scalene includes a graphical interface for visualizing profiling results, making it easier to understand and navigate the data.
To highlight all the advantages of Scalene, I have developed functions with the sole aim of consuming memory (memory_waster()), CPU (cpu_waster()) and GPU (gpu_convolution()). All of them are in a script named scalene_tuto.py.
import random
import copy
import math
import cupy as cp
import numpy as np

def memory_waster():
    """Wastes memory but in a controlled way"""
    memory_hogs = []
    # Create moderately sized redundant data structures
    for i in range(100):
        garbage_data = []
        for j in range(1000):
            waste = f"Useless string #{j} repeated " * 10
            garbage_data.append(waste)
            garbage_data.append(
                {
                    "id": j,
                    "data": waste,
                    "numbers": [random.random() for _ in range(50)],
                    "range_data": list(range(100)),
                }
            )
        memory_hogs.append(garbage_data)
    for iteration in range(4):
        print(f"Creating copy #{iteration}...")
        memory_copy = copy.deepcopy(memory_hogs)
        memory_hogs.extend(memory_copy)
    return memory_hogs

def cpu_waster():
    meaningless_result = 0
    for i in range(10000):
        for j in range(10000):
            temp = (i**2 + j**2) * random.random()
            temp = temp / (random.random() + 0.01)
            temp = abs(temp**0.5)
            meaningless_result += temp
            # Some trigonometric operations
            angle = random.random() * math.pi
            temp += math.sin(angle) * math.cos(angle)
        if i % 100 == 0:
            random_mess = [random.randint(1, 1000) for _ in range(1000)]  # Smaller list
            random_mess.sort()
            random_mess.reverse()
            random_mess.sort()
    return meaningless_result

def gpu_convolution():
    image_size = 128
    kernel_size = 64
    image = np.random.random((image_size, image_size)).astype(np.float32)
    kernel = np.random.random((kernel_size, kernel_size)).astype(np.float32)
    image_gpu = cp.asarray(image)
    kernel_gpu = cp.asarray(kernel)
    result = cp.zeros_like(image_gpu)
    for y in range(kernel_size // 2, image_size - kernel_size // 2):
        for x in range(kernel_size // 2, image_size - kernel_size // 2):
            pixel_value = 0
            for ky in range(kernel_size):
                for kx in range(kernel_size):
                    iy = y + ky - kernel_size // 2
                    ix = x + kx - kernel_size // 2
                    pixel_value += image_gpu[iy, ix] * kernel_gpu[ky, kx]
            result[y, x] = pixel_value
    result_cpu = cp.asnumpy(result)
    cp.cuda.Stream.null.synchronize()
    return result_cpu

def main():
    print("\n1/ Wasting some memory (controlled)...")
    _ = memory_waster()
    print("\n2/ Wasting CPU cycles (controlled)...")
    _ = cpu_waster()
    print("\n3/ Wasting GPU cycles (controlled)...")
    _ = gpu_convolution()

if __name__ == "__main__":
    main()
For the GPU function, you have to install cupy according to your CUDA version (run nvcc --version to get it):
pip install cupy-cuda12x # (in your virtual environment)
# or
uv add cupy-cuda12x
Further details on installing cupy can be found in the documentation.
To run Scalene, use the command:
scalene scalene_tuto.py
# or
uv run scalene scalene_tuto.py
It profiles CPU, GPU, and memory by default. If you only want one or some of the options, use the flags --cpu, --gpu, and --memory.
Scalene provides line-level and function-level profiling. It has two interfaces: the Command Line Interface (CLI) and the web interface.
Important: It is better to use Scalene on Ubuntu through WSL. Otherwise, the profiler does not retrieve memory consumption information.
a) Command Line Interface
By default, Scalene’s output is the web interface. To obtain the CLI instead, add the flag --cli.
scalene scalene_tuto.py --cli
# or
uv run scalene scalene_tuto.py --cli
You get the following results:


By default, the code is displayed in dark mode. So if, like me, you work in light mode, the result isn’t very pretty.
The visualization is split into three distinct colors, each representing a different profiling metric.
- The blue section represents CPU profiling, which provides a breakdown of the time spent executing Python code, native code (such as C or C++), and system-related tasks (like I/O operations).
- The green section is dedicated to memory profiling, showing the percentage of memory allocated by Python code, as well as the overall memory usage over time and its peak values.
- The yellow section focuses on GPU profiling, displaying the GPU’s running time and the volume of data copied between the GPU and CPU, measured in MB/s. It is worth noting that GPU profiling is currently limited to NVIDIA GPUs.
b) The web interface
The web interface is divided into three parts.



The color code is the same as in the command line interface. But some icons are added:
- 💥: Optimizable code region (performance indication in the Function Profile section).
- ⚡: Optimizable lines of code.
c) AI Suggestions
One of the great advantages of Scalene is the ability to use AI to fix the slowness and/or overconsumption you have identified. It currently supports the OpenAI API, Amazon Bedrock, Azure OpenAI and ollama running locally.

After selecting your tools, you just have to click on 💥 or ⚡ if you want to optimize a part of the code or just a line.
I tested it with codellama:7b-python from ollama to optimize the gpu_convolution() function. Unfortunately, as mentioned in the interface:
Note that optimizations are AI-generated and may not be correct.
None of the suggested optimizations worked. But the codebase was not conducive to optimization, since it was artificially complicated: simply removing unnecessary lines would save time and memory. Also, I used a small model, which could be the reason.
Even though my tests were inconclusive, I think this option can be interesting and will surely continue to improve.
Conclusion
Nowadays, we are less concerned about the resource consumption of our developments, and very quickly these optimization deficits can accumulate, making the code slow, too slow for production, and sometimes even requiring the purchase of more powerful hardware.
Code profiling tools are indispensable when it comes to identifying areas in need of optimization.
The combination of the memory profiler and line profiler provides a good initial analysis: easy to set up, with easy-to-understand reports.
Tools such as cProfile and Scalene are complete and have graphical representations, but require more time to analyze. Finally, the AI optimization option provided by Scalene is a real asset, even if in my case the model used was not sufficient to provide anything relevant.