Commit Graph

76 Commits

Author SHA1 Message Date
Sam Gross
a10f99375e Revert "GH-128914: Remove conditional stack effects from bytecodes.c and the code generators (GH-128918)" (GH-129202)
The commit introduced a ~2.5-3% regression in the free threading build.

This reverts commit ab61d3f430.
2025-01-23 09:26:25 +00:00
Mark Shannon
f5b6356a11 GH-128563: Add new frame owner type for interpreter entry frames (GH-129078)
Add new frame owner type for interpreter entry frames
2025-01-21 10:15:02 +00:00
Mark Shannon
7239da7559 GH-127953: Make line number lookup O(1) regardless of the size of the code object (GH-128350) 2025-01-21 09:33:23 +00:00
Mark Shannon
ab61d3f430 GH-128914: Remove conditional stack effects from bytecodes.c and the code generators (GH-128918) 2025-01-20 17:09:23 +00:00
Mark Shannon
f826beca0c GH-128375: Better instrument for FOR_ITER (GH-128445) 2025-01-06 17:54:47 +00:00
Mark Shannon
d2f1d917e8 GH-122548: Implement branch taken and not taken events for sys.monitoring (GH-122564) 2024-12-19 16:59:51 +00:00
Eric Snow
9dabace39d gh-114940: Add _Py_FOR_EACH_TSTATE_UNLOCKED(), and Friends (gh-127077)
This is a precursor to the actual fix for gh-114940, where we will change these macros to use the new lock.  This change is almost entirely mechanical; the exceptions are the loops in codeobject.c and ceval.c, which now hold the "head" lock.  Note that almost all of the uses of _Py_FOR_EACH_TSTATE_UNLOCKED() here will change to _Py_FOR_EACH_TSTATE_BEGIN() once we add the new per-interpreter lock.
2024-11-21 11:08:38 -07:00
mpage
2e95c5ba3b gh-115999: Implement thread-local bytecode and enable specialization for BINARY_OP (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Mark Shannon
faa3272fb8 GH-125837: Split LOAD_CONST into three. (GH-125972)
* Add LOAD_CONST_IMMORTAL opcode

* Add LOAD_SMALL_INT opcode

* Remove RETURN_CONST opcode
2024-10-29 11:15:42 +00:00
Mark Shannon
f55273b3b7 GH-116968: Remove branch from advance_backoff_counter (GH-124469) 2024-10-07 11:46:33 +01:00
Tian Gao
5e0abb4788 gh-116750: Add clear_tool_id function to unregister events and callbacks (#124568) 2024-10-01 13:32:55 -04:00
Mark Shannon
7a65439b93 GH-122390: Replace _Py_GetbaseOpcode with _Py_GetBaseCodeUnit (GH-122942) 2024-08-13 14:22:57 +01:00
Tian Gao
57d7c3e78f gh-122247: Move instruction instrumentation sanity check after tracing check (#122251) 2024-08-07 21:30:14 -07:00
Ken Jin
b1b61dc4ce gh-117657: Fix some simple races in instrumentation.c (GH-120118)
* stop the world when setting local events
2024-06-13 17:31:21 +08:00
scoder
70b07aa415 gh-111997: Fix argument count for LINE event and clarify type of argument counts. (#119179) 2024-05-26 12:37:33 +00:00
Irit Katriel
c85e352673 gh-119431: fix refleak in test_monitoring (#119444) 2024-05-23 10:21:53 +01:00
Irit Katriel
6e9863d7a3 gh-118692: Avoid creating unnecessary StopIteration instances for monitoring (#119216) 2024-05-21 20:42:51 +00:00
Dino Viehland
00d913c671 gh-118415: Fix issues with local tracing being enabled/disabled on a function (#118496) 2024-05-06 13:06:09 -07:00
Irit Katriel
85af789961 gh-111997: C-API for signalling monitoring events (#116413) 2024-05-04 08:23:50 +00:00
Tian Gao
9c14ed0618 gh-107674: Improve performance of sys.settrace (GH-117133)
* Check tracing in RESUME_CHECK

* Only change to RESUME_CHECK if not tracing
2024-05-03 19:49:24 +01:00
Guido van Rossum
7d83f7bcc4 gh-118335: Configure Tier 2 interpreter at build time (#118339)
The code for Tier 2 is now only compiled when configured
with `--enable-experimental-jit[=yes|interpreter]`.

We drop support for `PYTHON_UOPS` and -`Xuops`,
but you can disable the interpreter or JIT
at runtime by setting `PYTHON_JIT=0`.
You can also build it without enabling it by default
using `--enable-experimental-jit=yes-off`;
enable with `PYTHON_JIT=1`.

On Windows, the `build.bat` script supports
`--experimental-jit`, `--experimental-jit-off`,
`--experimental-interpreter`.

In the C code, `_Py_JIT` is defined as before
when the JIT is enabled; the new variable
`_Py_TIER2` is defined when the JIT *or* the
interpreter is enabled. It is actually a bitmask:
1: JIT; 2: default-off; 4: interpreter.
2024-04-30 18:26:34 -07:00
Dino Viehland
4a1cf66c5c gh-117657: Fix small issues with instrumentation and TSAN (#118064)
Small TSAN fixups for instrumentation
2024-04-30 11:38:05 -07:00
Tian Gao
375c94c75d gh-107674: Lazy load line number to improve performance of tracing (GH-118127) 2024-04-29 09:54:52 +01:00
Dino Viehland
07525c9a85 gh-116818: Make sys.settrace, sys.setprofile, and monitoring thread-safe (#116775)
Makes sys.settrace, sys.setprofile, and monitoring generally thread-safe.

Mostly uses a stop-the-world approach and synchronization around the code object's _co_instrumentation_version.  There may be a little bit of extra synchronization around the monitoring data that's required to be TSAN clean.
2024-04-19 14:47:42 -07:00
Tian Gao
57183241af gh-107674: Remove some unnecessary code in instrumentation code (GH-117393) 2024-04-09 09:54:28 +01:00
Guido van Rossum
060a96f1a9 gh-116968: Reimplement Tier 2 counters (#117144)
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``),
shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The
API used for adaptive specialization counters is changed but the behavior is
(supposed to be) identical.

The behavior of the Tier 2 counters is changed:
- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
2024-04-04 15:03:27 +00:00
Brett Simmers
0adfa8482d gh-115832: Fix instrumentation version mismatch during interpreter shutdown (#115856)
A previous commit introduced a bug to `interpreter_clear()`: it set
`interp->ceval.instrumentation_version` to 0, without making the corresponding
change to `tstate->eval_breaker` (which holds a thread-local copy of the
version). After this happens, Python code can still run due to object finalizers
during a GC, and the version check in bytecodes.c will see a different result
than the one in instrumentation.c causing an infinite loop.

The fix itself is straightforward: clear `tstate->eval_breaker` when clearing
`interp->ceval.instrumentation_version`.
2024-03-04 11:29:39 -05:00
Tian Gao
7895a61168 gh-116098: Revert "gh-107674: Improve performance of sys.settrace (GH-114986)" (GH-116178)
Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)"

This reverts commit 0a61e23700.
2024-03-01 07:46:33 +01:00
Tian Gao
0a61e23700 gh-107674: Improve performance of sys.settrace (GH-114986) 2024-02-28 15:21:42 +00:00
Michael Droettboom
b05afdd5ec gh-115168: Add pystats counter for invalidated executors (GH-115169) 2024-02-26 17:51:47 +00:00
Brett Simmers
0749244d13 gh-112175: Add eval_breaker to PyThreadState (#115194)
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.

The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
2024-02-20 09:57:48 -05:00
Mark Shannon
0ae60b66de GH-113486: Do not emit spurious PY_UNWIND events for optimized calls to classes. (GH-113680) 2024-01-05 09:45:22 +00:00
Tian Gao
e0afed7e27 gh-103615: Use local events for opcode tracing (GH-109472)
* Use local monitoring for opcode trace

* Remove f_opcode_trace_set

* Add test for setting f_trace_opcodes after settrace
2023-11-03 16:39:50 +00:00
Sam Gross
6dfb8fe023 gh-110481: Implement biased reference counting (gh-110764) 2023-10-30 16:06:09 +00:00
Irit Katriel
67a91f78e4 gh-109094: replace frame->prev_instr by frame->instr_ptr (#109095) 2023-10-26 13:43:10 +00:00
Mark Shannon
52e902ccf0 GH-109369: Add machinery for deoptimizing tier2 executors, both individually and globally. (GH-110384) 2023-10-23 14:49:09 +01:00
Mark Shannon
bf4bc36069 GH-109369: Merge all eval-breaker flags and monitoring version into one word. (GH-109846) 2023-10-04 16:09:48 +01:00
Tian Gao
d5611f2804 GH-107265: Add missing deoptimizations for ENTER_EXECUTOR's original opcode (GH-109420) 2023-09-22 14:13:31 -07:00
Tian Gao
412f5e85d6 gh-109371: Fix monitoring with instruction events set (gh-109385) 2023-09-18 23:30:08 +09:00
Brandt Bucher
22e65eecaa GH-105848: Replace KW_NAMES + CALL with LOAD_CONST + CALL_KW (GH-109300) 2023-09-13 10:25:45 -07:00
Guido van Rossum
bcce5e2718 gh-109039: Branch prediction for Tier 2 interpreter (#109038)
This adds a 16-bit inline cache entry to the conditional branch instructions POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE} and their instrumented variants, which is used to keep track of the branch direction.

Each time we encounter these instructions we shift the cache entry left by one and set the bottom bit to whether we jumped.

Then when it's time to translate such a branch to Tier 2 uops, we use the bit count from the cache entry to decided whether to continue translating the "didn't jump" branch or the "jumped" branch.

The counter is initialized to a pattern of alternating ones and zeros to avoid bias.

The .pyc file magic number is updated. There's a new test, some fixes for existing tests, and a few miscellaneous cleanups.
2023-09-11 18:20:24 +00:00
Mark Shannon
4a69301ea4 GH-108976. Keep monitoring data structures valid during de-optimization during callback. (GH-109131) 2023-09-11 14:37:09 +01:00
Irit Katriel
96396962ce gh-109094: remove unnecessary updates of frame->prev_instr in instrumentation functions (#109076) 2023-09-07 18:23:11 +01:00
Dong-hee Na
3bfa24e29f gh-107265: Remove all ENTER_EXECUTOR when execute _Py_Instrument (gh-108539) 2023-09-07 09:53:54 +09:00
Mark Shannon
5a2a046151 GH-108390: Prevent non-local events being set with sys.monitoring.set_local_events() (GH-108420) 2023-09-05 08:03:53 +01:00
Dong-hee Na
6cb48f0495 gh-107265: Fix initialize/remove_tools for ENTER_EXECUTOR case (gh-108482) 2023-08-27 12:31:29 +09:00
Irit Katriel
72119d16a5 gh-105481: remove regen-opcode. Generated _PyOpcode_Caches in regen-cases. (#108367) 2023-08-23 18:39:00 +01:00
Dong-hee Na
2135bcd3ca gh-107265: Ensure de_instrument does not handle ENTER_EXECUTOR (#108366) 2023-08-23 08:45:20 -07:00
Mark Shannon
006e44f950 GH-108035: Remove the _PyCFrame struct as it is no longer needed for performance. (GH-108036) 2023-08-17 11:16:03 +01:00
Irit Katriel
971a4c2751 gh-103082: remove assumption that INSTRUMENTED_LINE is the last instrumented opcode (#107978) 2023-08-15 16:40:05 +01:00