cpython

Author	SHA1	Message	Date
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: about 1.7% better pyperformance geomean overall versus current (https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg); Richards, the most improved benchmark, is 100% faster than with the current JIT. The worst benchmark slows down by about 10-15% versus the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1% (https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg). Stats: 50% more uops executed and 30% more traces entered, as of the last time we ran them. The stats also suggest that our trace lengths are too short for a real trace-recording JIT, as there are a lot of "trace too long" aborts (https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md). This new JIT frontend is already able to record and execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. The old JIT frontend could do none of these. Some custom dunder uops were discovered to be broken as part of this work (gh-140277). The optimizer stack space check is disabled, as it is no longer valid when dealing with underflow. Pros: * Ignoring the generated tracer code, which is created automatically, this is only 1k additional lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control flow than the old one. * Tracing has very low overhead. We use the tail-calling interpreter/computed-goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires the tail-calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, though: tail calling is coming to Windows soon (https://github.com/python/cpython/pull/139962). Design: * After each instruction, the `record_previous_inst` function/label is executed, which, as the name suggests, records the previous instruction. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to chain side exits effectively, without repeating much code. We force re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as they are not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter with tail calling or computed gotos when the JIT is disabled. With the JIT enabled, there might be a slowdown from the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction, `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
Sam Gross	a10152f8fd	gh-137400: Fix thread-safety issues when profiling all threads (gh-137518) There were a few thread-safety issues when profiling or tracing all threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads: * The loop over thread states could crash if a thread exits concurrently (in both the free threading and default build) * The modification of `c_profilefunc` and `c_tracefunc` wasn't thread-safe on the free threading build.	2025-08-13 14:15:12 -04:00
Kumar Aditya	9a6b60af40	gh-136870: fix data races in instrumentation of bytecode (#136994) De-instrumenting code objects modifies the thread-local bytecode for all threads, so holding the critical section on the code object is not sufficient and leads to data races. De-instrumentation is now performed under a stop-the-world pause, so no thread can execute the thread-local bytecode while it is being de-instrumented.	2025-07-24 17:58:46 +00:00
Sergey Muraviov	cf19b6435d	gh-134411: assert `PyLong_FromLong(x) != NULL` when `x` is known to be small (#134415) `PyLong_FromLong(PY_MONITORING_DEBUGGER_ID)` falls into the `small_int` case and can't return `NULL`. Added `assert`s for extra confidence. https://github.com/python/cpython/issues/134411#issuecomment-2897653868	2025-07-21 11:59:06 +03:00
Tian Gao	db2032407a	Fix a minor indentation error (#136661)	2025-07-14 17:01:56 -07:00
mpage	619edb802e	gh-132336: Mark a few "slow path" functions used by the interpreter loop as noinline (#132337) Mark a few functions used by the interpreter loop as noinline. These are all slow paths and should not be inlined into the interpreter loop. Unfortunately, they end up being inlined with LTO and the current PGO task.	2025-04-10 10:41:15 +02:00
Sergey Muraviov	151d1bfd1b	gh-131763: Replace the redundant check with assert in remove_tools (#131765)	2025-03-26 18:36:04 -04:00
Bénédikt Tran	43fde78bef	gh-111178: fix UBSan failures for `Python/instrumentation.c` (#131608)	2025-03-24 10:58:33 +01:00
Victor Stinner	b69da006a4	gh-131238: Remove includes from pycore_interp.h (#131495) Also remove now-unused includes in C files.	2025-03-20 11:35:23 +00:00
Victor Stinner	20c5f969dd	gh-131238: Remove more includes from pycore_interp.h (#131480)	2025-03-19 23:01:32 +01:00
Victor Stinner	22706843e0	gh-131238: Remove many includes from pycore_interp.h (#131472)	2025-03-19 17:46:24 +00:00
Mark Shannon	a1aeec61c4	GH-131238: Core header refactor (GH-131250) * Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h * Removes many cross-header dependencies	2025-03-17 09:19:04 +00:00
Kumar Aditya	ea57ffa02e	gh-131141: fix data race in instrumentation while registering callback (#131142)	2025-03-13 00:11:52 +05:30
Mark Shannon	89df62c120	GH-128534: Fix behavior of branch monitoring for `async for` (GH-130847) * Both branches in a pair now have a common source and are included in co_branches	2025-03-07 14:30:31 +00:00
Mark Shannon	2a18e80695	GH-128534: Instrument branches for `async for` loops. (GH-130569)	2025-02-27 09:36:41 +00:00
Brandt Bucher	11bb08e4ec	GH-129715: Don't project traces that return to an unknown caller (GH-130024)	2025-02-12 10:16:43 -08:00
Sam Gross	a10f99375e	Revert "GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)" (GH-129202) The commit introduced a ~2.5-3% regression in the free threading build. This reverts commit `ab61d3f430`.	2025-01-23 09:26:25 +00:00
Mark Shannon	f5b6356a11	GH-128563: Add new frame owner type for interpreter entry frames (GH-129078) Add new frame owner type for interpreter entry frames	2025-01-21 10:15:02 +00:00
Mark Shannon	7239da7559	GH-127953: Make line number lookup O(1) regardless of the size of the code object (GH-128350)	2025-01-21 09:33:23 +00:00
Mark Shannon	ab61d3f430	GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)	2025-01-20 17:09:23 +00:00
Mark Shannon	f826beca0c	GH-128375: Better instrument for `FOR_ITER` (GH-128445)	2025-01-06 17:54:47 +00:00
Mark Shannon	d2f1d917e8	GH-122548: Implement branch taken and not taken events for sys.monitoring (GH-122564)	2024-12-19 16:59:51 +00:00
Eric Snow	9dabace39d	gh-114940: Add _Py_FOR_EACH_TSTATE_UNLOCKED(), and Friends (gh-127077) This is a precursor to the actual fix for gh-114940, where we will change these macros to use the new lock. This change is almost entirely mechanical; the exceptions are the loops in codeobject.c and ceval.c, which now hold the "head" lock. Note that almost all of the uses of _Py_FOR_EACH_TSTATE_UNLOCKED() here will change to _Py_FOR_EACH_TSTATE_BEGIN() once we add the new per-interpreter lock.	2024-11-21 11:08:38 -07:00
mpage	2e95c5ba3b	gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926) Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying their copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads. Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization. Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.	2024-11-04 11:13:32 -08:00
Mark Shannon	faa3272fb8	GH-125837: Split `LOAD_CONST` into three. (GH-125972) * Add LOAD_CONST_IMMORTAL opcode * Add LOAD_SMALL_INT opcode * Remove RETURN_CONST opcode	2024-10-29 11:15:42 +00:00
Mark Shannon	f55273b3b7	GH-116968: Remove branch from advance_backoff_counter (GH-124469)	2024-10-07 11:46:33 +01:00
Tian Gao	5e0abb4788	gh-116750: Add clear_tool_id function to unregister events and callbacks (#124568)	2024-10-01 13:32:55 -04:00
Mark Shannon	7a65439b93	GH-122390: Replace `_Py_GetbaseOpcode` with `_Py_GetBaseCodeUnit` (GH-122942)	2024-08-13 14:22:57 +01:00
Tian Gao	57d7c3e78f	gh-122247: Move instruction instrumentation sanity check after tracing check (#122251)	2024-08-07 21:30:14 -07:00
Ken Jin	b1b61dc4ce	gh-117657: Fix some simple races in instrumentation.c (GH-120118) * stop the world when setting local events	2024-06-13 17:31:21 +08:00
scoder	70b07aa415	gh-111997: Fix argument count for LINE event and clarify type of argument counts. (#119179)	2024-05-26 12:37:33 +00:00
Irit Katriel	c85e352673	gh-119431: fix refleak in test_monitoring (#119444)	2024-05-23 10:21:53 +01:00
Irit Katriel	6e9863d7a3	gh-118692: Avoid creating unnecessary StopIteration instances for monitoring (#119216)	2024-05-21 20:42:51 +00:00
Dino Viehland	00d913c671	gh-118415: Fix issues with local tracing being enabled/disabled on a function (#118496)	2024-05-06 13:06:09 -07:00
Irit Katriel	85af789961	gh-111997: C-API for signalling monitoring events (#116413)	2024-05-04 08:23:50 +00:00
Tian Gao	9c14ed0618	gh-107674: Improve performance of `sys.settrace` (GH-117133) * Check tracing in RESUME_CHECK * Only change to RESUME_CHECK if not tracing	2024-05-03 19:49:24 +01:00
Guido van Rossum	7d83f7bcc4	gh-118335: Configure Tier 2 interpreter at build time (#118339) The code for Tier 2 is now only compiled when configured with `--enable-experimental-jit[=yes|interpreter]`. We drop support for `PYTHON_UOPS` and `-Xuops`, but you can disable the interpreter or JIT at runtime by setting `PYTHON_JIT=0`. You can also build it without enabling it by default using `--enable-experimental-jit=yes-off`; enable with `PYTHON_JIT=1`. On Windows, the `build.bat` script supports `--experimental-jit`, `--experimental-jit-off`, `--experimental-interpreter`. In the C code, `_Py_JIT` is defined as before when the JIT is enabled; the new variable `_Py_TIER2` is defined when the JIT or the interpreter is enabled. It is actually a bitmask: 1: JIT; 2: default-off; 4: interpreter.	2024-04-30 18:26:34 -07:00
Dino Viehland	4a1cf66c5c	gh-117657: Fix small issues with instrumentation and TSAN (#118064) Small TSAN fixups for instrumentation	2024-04-30 11:38:05 -07:00
Tian Gao	375c94c75d	gh-107674: Lazy load line number to improve performance of tracing (GH-118127)	2024-04-29 09:54:52 +01:00
Dino Viehland	07525c9a85	gh-116818: Make `sys.settrace`, `sys.setprofile`, and monitoring thread-safe (#116775) Makes sys.settrace, sys.setprofile, and monitoring generally thread-safe. Mostly uses a stop-the-world approach and synchronization around the code object's _co_instrumentation_version. There may be a little bit of extra synchronization around the monitoring data that's required to be TSAN clean.	2024-04-19 14:47:42 -07:00
Tian Gao	57183241af	gh-107674: Remove some unnecessary code in instrumentation code (GH-117393)	2024-04-09 09:54:28 +01:00
Guido van Rossum	060a96f1a9	gh-116968: Reimplement Tier 2 counters (#117144) Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical. The behavior of the Tier 2 counters is changed: - There are no longer dynamic thresholds (we never varied these). - All counters now use the same exponential backoff. - The counter for ``JUMP_BACKWARD`` starts counting down from 16. - The ``temperature`` in side exits starts counting down from 64.	2024-04-04 15:03:27 +00:00
Brett Simmers	0adfa8482d	gh-115832: Fix instrumentation version mismatch during interpreter shutdown (#115856) A previous commit introduced a bug to `interpreter_clear()`: it set `interp->ceval.instrumentation_version` to 0, without making the corresponding change to `tstate->eval_breaker` (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and the version check in bytecodes.c will see a different result than the one in instrumentation.c, causing an infinite loop. The fix itself is straightforward: clear `tstate->eval_breaker` when clearing `interp->ceval.instrumentation_version`.	2024-03-04 11:29:39 -05:00
Tian Gao	7895a61168	gh-116098: Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)" (GH-116178) Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)" This reverts commit `0a61e23700`.	2024-03-01 07:46:33 +01:00
Tian Gao	0a61e23700	gh-107674: Improve performance of `sys.settrace` (GH-114986)	2024-02-28 15:21:42 +00:00
Michael Droettboom	b05afdd5ec	gh-115168: Add pystats counter for invalidated executors (GH-115169)	2024-02-26 17:51:47 +00:00
Brett Simmers	0749244d13	gh-112175: Add `eval_breaker` to `PyThreadState` (#115194) This change adds an `eval_breaker` field to `PyThreadState`. The primary motivation is for performance in free-threaded builds: with thread-local eval breakers, we can stop a specific thread (e.g., for an async exception) without interrupting other threads. The source of truth for the global instrumentation version is stored in the `instrumentation_version` field in PyInterpreterState. Threads usually read the version from their local `eval_breaker`, where it continues to be colocated with the eval breaker bits.	2024-02-20 09:57:48 -05:00
Mark Shannon	0ae60b66de	GH-113486: Do not emit spurious PY_UNWIND events for optimized calls to classes. (GH-113680)	2024-01-05 09:45:22 +00:00
Tian Gao	e0afed7e27	gh-103615: Use local events for opcode tracing (GH-109472) * Use local monitoring for opcode trace * Remove f_opcode_trace_set * Add test for setting f_trace_opcodes after settrace	2023-11-03 16:39:50 +00:00
Sam Gross	6dfb8fe023	gh-110481: Implement biased reference counting (gh-110764)	2023-10-30 16:06:09 +00:00


92 Commits
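The gh-139109 entry describes "dual dispatch": two dispatch tables that dispatch into each other, where the tracing handlers behave like the normal ones and then call `record_previous_inst`. The sketch below is a toy Python model of that idea only, not CPython's C implementation. The opcode names, the `run` driver, and the `trace_from`/`trace_until` switch points are hypothetical, and "recording" is reduced to appending to a list.

```python
# Toy model of "dual dispatch": a normal handler table and a tracing handler
# table. Opcode names and the run() driver are hypothetical.

def op_push(state, arg):
    state["stack"].append(arg)


def op_add(state, arg):
    b = state["stack"].pop()
    a = state["stack"].pop()
    state["stack"].append(a + b)


NORMAL = {"PUSH": op_push, "ADD": op_add}


def record_previous_inst(state, opname, arg):
    # The real tracer lowers the instruction to uops; this model just logs it.
    state["trace"].append((opname, arg))


def make_tracing_handler(opname):
    def handler(state, arg):
        NORMAL[opname](state, arg)                # identical semantics
        record_previous_inst(state, opname, arg)  # the only extra work
    return handler


TRACING = {name: make_tracing_handler(name) for name in NORMAL}


def run(program, trace_from=None, trace_until=None):
    state = {"stack": [], "trace": []}
    table = NORMAL
    for pc, (opname, arg) in enumerate(program):
        if pc == trace_from:
            table = TRACING  # switch tables: start recording
        if pc == trace_until:
            table = NORMAL   # switch back: stop recording
        table[opname](state, arg)
    return state
```

For example, `run([("PUSH", 1), ("PUSH", 2), ("ADD", None)], trace_from=1)` leaves `[3]` on the stack and records only the last two instructions. In this model, only the table changes and neither table tests a per-instruction "tracing" flag. That mirrors the entry's claim that the non-tracing path pays nothing extra.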
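The gh-137400 entry notes two problems. The loop over thread states could crash if a thread exited concurrently, and updating `c_profilefunc` was not thread-safe in the free-threaded build. The following is a minimal sketch, assuming one lock guards the thread list (in the spirit of the "head" lock mentioned in the gh-114940 entry). Under that assumption, exiting threads and the all-threads setter take the same lock, so the list cannot change mid-iteration. `Interpreter`, `ThreadState`, and `head_lock` are hypothetical Python stand-ins, not CPython's API.

```python
import threading


class ThreadState:
    """Hypothetical per-thread state holding a profile function."""

    def __init__(self, name):
        self.name = name
        self.c_profilefunc = None


class Interpreter:
    """Hypothetical interpreter owning a lock-protected list of threads."""

    def __init__(self):
        self.head_lock = threading.Lock()
        self.tstates = []

    def add_thread(self, tstate):
        with self.head_lock:
            self.tstates.append(tstate)

    def remove_thread(self, tstate):
        # Exiting threads take the same lock, so they cannot vanish while
        # another thread is walking the list.
        with self.head_lock:
            self.tstates.remove(tstate)

    def set_profile_all_threads(self, func):
        # Iterate and update under the lock instead of walking an unlocked list.
        with self.head_lock:
            for tstate in self.tstates:
                tstate.c_profilefunc = func
```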
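The gh-134411 entry relies on small integers being served from a preallocated cache, so `PyLong_FromLong` cannot fail for them. The toy model below shows why: the small-int path is a plain table lookup with no allocation. The cache bounds (-5 to 256) are an assumption based on CPython's long-standing defaults. `BoxedInt` and `from_long` are hypothetical names.

```python
# Toy model of a small-int cache. The -5..256 range is an assumption.
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257


class BoxedInt:
    """Hypothetical stand-in for a heap-allocated int object."""

    def __init__(self, value):
        self.value = value


# Built once at startup, so later lookups never allocate.
SMALL_INTS = [BoxedInt(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def from_long(value):
    if -NSMALLNEGINTS <= value < NSMALLPOSINTS:
        # Cached path: an index into a prebuilt table, so it cannot fail.
        return SMALL_INTS[value + NSMALLNEGINTS]
    # Allocation path: in C, this is where a NULL (MemoryError) could arise.
    return BoxedInt(value)
```

Because a small constant such as `PY_MONITORING_DEBUGGER_ID` always takes the cached path, an `assert` documents the invariant where an error check would otherwise be needed.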
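The gh-115999 entry describes a globally unique per-thread index into every code object's `co_tlbc` array, with entry 0 reserved for the main copy. The following is a minimal sketch, assuming three things: released indices are reused lowest-first, the first thread to reserve an index (the main thread) receives 0, and new copies start from the original unspecialized bytes. `TLBCIndexPool`, `CodeObject.get_tlbc`, and these choices are illustrative, not CPython's code.

```python
import heapq
import threading


class TLBCIndexPool:
    """Hypothetical allocator of globally unique per-thread bytecode indices."""

    def __init__(self):
        self._lock = threading.Lock()
        self._free = []   # min-heap of released indices
        self._next = 0    # next never-used index; 0 goes to the first thread

    def reserve(self):
        # Called at thread creation.
        with self._lock:
            if self._free:
                return heapq.heappop(self._free)
            idx = self._next
            self._next += 1
            return idx

    def release(self, idx):
        # Called at thread destruction.
        with self._lock:
            heapq.heappush(self._free, idx)


class CodeObject:
    """Hypothetical code object with a co_tlbc array of bytecode copies."""

    def __init__(self, bytecode):
        self._original = bytes(bytecode)
        self._lock = threading.Lock()
        # Entry 0 is always the main copy, so single-threaded code never copies.
        self.co_tlbc = [bytearray(self._original)]

    def get_tlbc(self, idx):
        # Lazily create this thread's copy (the entry says: on the first RESUME).
        with self._lock:
            if idx >= len(self.co_tlbc):
                self.co_tlbc.extend([None] * (idx + 1 - len(self.co_tlbc)))
            if self.co_tlbc[idx] is None:
                self.co_tlbc[idx] = bytearray(self._original)
            return self.co_tlbc[idx]
```

Reusing the lowest free index keeps `co_tlbc` arrays short when threads come and go. That is one plausible reason to prefer it over a counter that only ever grows.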
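The gh-118335 entry defines `_Py_TIER2` as a bitmask (1: JIT; 2: default-off; 4: interpreter). It also lets `PYTHON_JIT=0` or `PYTHON_JIT=1` override the build default at runtime. The sketch below models that decision in Python. Two parts are assumptions: the mapping from `--enable-experimental-jit` values to bits, which covers only the values named in the entry, and treating an unset or other `PYTHON_JIT` value as "use the build default".

```python
# _Py_TIER2 bit values as listed in the gh-118335 entry.
TIER2_JIT = 1
TIER2_DEFAULT_OFF = 2
TIER2_INTERPRETER = 4

# Assumed mapping of --enable-experimental-jit values to _Py_TIER2 bits.
CONFIGURE_OPTIONS = {
    "yes": TIER2_JIT,
    "yes-off": TIER2_JIT | TIER2_DEFAULT_OFF,
    "interpreter": TIER2_INTERPRETER,
}


def tier2_enabled(flags, environ):
    """Model whether Tier 2 runs, given build flags and an environment mapping."""
    if flags == 0:
        return False  # Tier 2 was not compiled in at all
    setting = environ.get("PYTHON_JIT")
    if setting == "0":
        return False  # runtime opt-out
    if setting == "1":
        return True   # runtime opt-in, e.g. for a yes-off build
    return not (flags & TIER2_DEFAULT_OFF)  # fall back to the build default
```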
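The gh-116968 entries describe a unified 16-bit exponential backoff counter, with `JUMP_BACKWARD` counting down from 16 and side-exit `temperature` counting down from 64. A later entry removes a branch from `advance_backoff_counter`. The toy model below packs a 12-bit countdown value and a 4-bit backoff exponent into one 16-bit integer, so advancing is a single subtraction. The bit split, the restart rule, and all function names are assumptions for illustration, not CPython's definitions.

```python
# Toy 16-bit backoff counter: high 12 bits = countdown value,
# low 4 bits = backoff exponent. The layout is an assumption.
BACKOFF_BITS = 4
MAX_BACKOFF = 12                  # largest exponent whose value fits in 12 bits
VALUE_MASK = (1 << 12) - 1


def make_counter(value, backoff):
    assert 0 <= value <= VALUE_MASK and 0 <= backoff <= MAX_BACKOFF
    return (value << BACKOFF_BITS) | backoff


def counter_value(c):
    return c >> BACKOFF_BITS


def counter_backoff(c):
    return c & ((1 << BACKOFF_BITS) - 1)


def triggers(c):
    return counter_value(c) == 0


def advance(c):
    # Decrement the value field with one subtraction (no branch);
    # only valid while the value is still above zero.
    return c - (1 << BACKOFF_BITS)


def restart(c):
    # After a trigger that did not pay off, wait roughly twice as long next time.
    backoff = min(counter_backoff(c) + 1, MAX_BACKOFF)
    return make_counter((1 << backoff) - 1, backoff)
```

In this model, a `JUMP_BACKWARD`-style counter is `make_counter(16, 4)`, which triggers after 16 advances. A side-exit-style counter could start as `make_counter(64, 6)`.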
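The gh-112175 entry keeps the instrumentation version colocated with the eval breaker bits in each thread's `eval_breaker`, with the interpreter's `instrumentation_version` as the source of truth. The gh-115832 entry shows what goes wrong when the two disagree. The following is a minimal model, assuming an 8-bit event field in the low bits and the version above it. The split, the flag names, and the `Interp` class are hypothetical. Bumping the version rewrites every thread's copy while preserving pending events, and signalling one thread touches only that thread's word.

```python
# Toy model of a thread-local eval_breaker word: pending-event flags in the
# low EVENTS_BITS bits, instrumentation version in the bits above them.
# The 8-bit split and the flag names are assumptions.
EVENTS_BITS = 8
EVENTS_MASK = (1 << EVENTS_BITS) - 1

GIL_DROP_REQUEST = 1 << 0  # hypothetical event flag
ASYNC_EXC = 1 << 1         # hypothetical event flag


def set_version(breaker, version):
    """Replace the version bits while keeping any pending event bits."""
    return (version << EVENTS_BITS) | (breaker & EVENTS_MASK)


def version_of(breaker):
    return breaker >> EVENTS_BITS


def pending_events(breaker):
    return breaker & EVENTS_MASK


class Interp:
    """Hypothetical interpreter with one eval_breaker word per thread."""

    def __init__(self, nthreads):
        self.instrumentation_version = 0   # source of truth
        self.eval_breakers = [0] * nthreads

    def bump_version(self):
        self.instrumentation_version += 1
        self.eval_breakers = [
            set_version(b, self.instrumentation_version) for b in self.eval_breakers
        ]

    def signal_thread(self, tid, flag):
        # Only the target thread's word changes; other threads keep running.
        self.eval_breakers[tid] |= flag

    def code_is_current(self, tid, code_version):
        # One read of the thread-local word answers the version check.
        return version_of(self.eval_breakers[tid]) == code_version
```

In this model, suppose `instrumentation_version` is reset without also rewriting `eval_breakers`. Then `code_is_current` would disagree with the interpreter's own view, which is the kind of mismatch the gh-115832 entry fixes.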