Commit Graph

233 Commits

Author SHA1 Message Date
Stan Ulbrych
46f11b36ad gh-76007: Deprecate zlib.__version__ attribute (#140130) 2025-10-15 13:18:48 +02:00
Emma Smith
f262297d52 gh-139877: Use PyBytesWriter in pycore_blocks_output_buffer.h (#139976)
Previously, the _BlocksOutputBuffer code creates a list of bytes objects to handle the output data from compression libraries. This ends up being slow due to the output buffer code needing to copy each bytes element of the list into the final bytes object buffer at the end of compression.

The new PyBytesWriter API introduced in PEP 782 is an ergonomic and fast method of writing data into a buffer that will later turn into a bytes object. Benchmarks show that using the PyBytesWriter API is 10-30% faster for decompression across a variety of settings. The performance gains are greatest when the decompressor is very performant, such as for Zstandard (and likely zlib-ng). Otherwise the decompressor can bottleneck decompression and the gains are more modest, but still sizable (e.g. 10% faster for zlib)!

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-10-14 10:03:55 -07:00
Victor Stinner
7168e98c80 gh-129813, PEP 782: Use PyBytesWriter in lzma and zlib (#138832)
Replace PyBytes_FromStringAndSize(NULL, size) with the new public
PyBytesWriter API.
2025-09-13 19:27:04 +02:00
Victor Stinner
06b7891f12 gh-129813, PEP 782: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#138830)
Replace PyBytes_FromStringAndSize(NULL, 0) with
Py_GetConstant(Py_CONSTANT_EMPTY_BYTES). Py_GetConstant() cannot
fail.
2025-09-13 18:30:25 +02:00
Bénédikt Tran
4978bfca10 gh-116946: add Py_TPFLAGS_IMMUTABLETYPE to several internal types (#138582)
The following types are now immutable:

* `_curses_panel.panel`,
* `[posix,nt].ScandirIterator`, `[posix,nt].DirEntry` (exposed in `os.py`),
* `_remote_debugging.RemoteUnwinder`,
* `_tkinter.Tcl_Obj`, `_tkinter.tkapp`, `_tkinter.tktimertoken`,
* `zlib.Compress`, and `zlib.Decompress`.
2025-09-11 09:56:20 +02:00
Adam Turner
98b4cd6fe9 GH-135763: AC: Use `Py_ssize_t(allow_negative=False)` (#138394) 2025-09-02 21:29:05 +01:00
Bénédikt Tran
2a54acf3c3 gh-116946: fully implement GC protocol for zlib objects (#138290) 2025-09-01 10:24:23 +02:00
Adam Turner
bb75dec87f gh-95534: Convert `ZlibDecompressor.__new__` to AC (#137923) 2025-08-19 09:52:13 +01:00
Adam Turner
918e3ba6c0 GH-137623: Use an AC decorator for docstring line length enforcement (#137690) 2025-08-18 18:29:00 +01:00
Bénédikt Tran
737b4ba020 gh-134635: add zlib.{adler32,crc32}_combine to combine checksums (#134650) 2025-05-27 10:48:34 +02:00
Victor Stinner
34c1ea3109 gh-111178: Fix function signatures for multiple tests (#131496) 2025-03-20 12:27:03 +01:00
Steve Dower
63a638c43f gh-91349: Replace zlib with zlib-ng in Windows build (GH-131438) 2025-03-19 19:03:25 +00:00
Bénédikt Tran
1c9b020479 gh-111178: fix UBSan failures in Modules/zlibmodule.c (GH-128252) 2025-01-03 15:36:41 +01:00
Victor Stinner
6a39e96ab8 gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#125195)
Replace PyBytes_FromString("") and PyBytes_FromStringAndSize("", 0)
with Py_GetConstant(Py_CONSTANT_EMPTY_BYTES).
2024-10-09 17:12:11 +02:00
Victor Stinner
12af8ec864 gh-121040: Use __attribute__((fallthrough)) (#121044)
Fix warnings when using -Wimplicit-fallthrough compiler flag.

Annotate explicitly "fall through" switch cases with a new
_Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if
available. Replace "fall through" comments with _Py_FALLTHROUGH.

Add _Py__has_attribute() macro. No longer define __has_attribute()
macro if it's not defined. Move also _Py__has_builtin() at the top
of pyport.h.

Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
2024-06-27 09:58:44 +00:00
Brett Simmers
c2627d6eea gh-116322: Add Py_mod_gil module slot (#116882)
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.

PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.

A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
2024-05-03 11:30:55 -04:00
Gregory P. Smith
4eddb4c9d9 gh-105967: Work around a macOS bug, limit zlib C library crc32 API calls to 1gig (#112615)
Work around a macOS bug, limit zlib crc32 calls to 1GiB.

Without this, `zlib.crc32` and `binascii.crc32` could produce incorrect
results on multi-gigabyte inputs depending on the macOS version's Apple
supplied zlib implementation.
2023-12-04 12:04:05 -08:00
Victor Stinner
21c0844742 gh-108220: Internal header files require Py_BUILD_CORE to be defined (#108221)
* pycore_intrinsics.h does nothing if included twice
  (add #ifndef and #define).
* Update Tools/cases_generator/generate_cases.py to generate the
  Py_BUILD_CORE test.
* _bz2, _lzma, _opcode and zlib extensions now define the
  Py_BUILD_CORE_MODULE macro to use internal headers
  (pycore_code.h, pycore_intrinsics.h and pycore_blocks_output_buffer.h).
2023-08-21 19:15:52 +02:00
shailshouryya
4b2e54bd3c gh-107279 Add <stddef.h> to Modules/zlibmodule.c to fix failing builds (#107280) 2023-07-27 12:26:39 +05:30
Victor Stinner
1a3faba9f1 gh-106869: Use new PyMemberDef constant names (#106871)
* Remove '#include "structmember.h"'.
* If needed, add <stddef.h> to get offsetof() function.
* Update Parser/asdl_c.py to regenerate Python/Python-ast.c.
* Replace:

  * T_SHORT => Py_T_SHORT
  * T_INT => Py_T_INT
  * T_LONG => Py_T_LONG
  * T_FLOAT => Py_T_FLOAT
  * T_DOUBLE => Py_T_DOUBLE
  * T_STRING => Py_T_STRING
  * T_OBJECT => _Py_T_OBJECT
  * T_CHAR => Py_T_CHAR
  * T_BYTE => Py_T_BYTE
  * T_UBYTE => Py_T_UBYTE
  * T_USHORT => Py_T_USHORT
  * T_UINT => Py_T_UINT
  * T_ULONG => Py_T_ULONG
  * T_STRING_INPLACE => Py_T_STRING_INPLACE
  * T_BOOL => Py_T_BOOL
  * T_OBJECT_EX => Py_T_OBJECT_EX
  * T_LONGLONG => Py_T_LONGLONG
  * T_ULONGLONG => Py_T_ULONGLONG
  * T_PYSSIZET => Py_T_PYSSIZET
  * T_NONE => _Py_T_NONE
  * READONLY => Py_READONLY
  * PY_AUDIT_READ => Py_AUDIT_READ
  * READ_RESTRICTED => Py_AUDIT_READ
  * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED
  * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)
2023-07-25 15:28:30 +02:00
Serhiy Storchaka
329e4a1a3f gh-86493: Modernize modules initialization code (GH-106858)
Use PyModule_Add() or PyModule_AddObjectRef() instead of soft deprecated
PyModule_AddObject().
2023-07-25 14:34:49 +03:00
Inada Naoki
d5bd32fb48 gh-104922: remove PY_SSIZE_T_CLEAN (#106315) 2023-07-02 15:07:46 +09:00
chgnrdv
13b5d79090 Fix missing/incomplete NULL checks in multiple source files (#104564)
Co-authored-by: Oleg Iarygin <oleg@arhadthedev.net>
2023-05-23 14:01:17 -06:00
Eric Snow
a9c6e0618f gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules.  We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
2023-05-05 21:11:27 +00:00
Ruben Vorderman
a89e6713c4 gh-101322: Ensure test_zlib.ZlibDecompressorTest runs, fix errors in ZlibDecompressor (#101323)
* Ensure test_zlib.ZlibDecompressorTest actually runs, fix errors in ZlibDecompressor.
2023-02-04 12:07:30 -08:00
Victor Stinner
65dd745f1a gh-99300: Use Py_NewRef() in Modules/ directory (#99473)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in test C files of the Modules/ directory.
2022-11-14 16:21:40 +01:00
Eric Snow
73679b13ca gh-90110: Update the C-analyzer Tool (gh-99307) 2022-11-10 09:03:57 -07:00
Benjamin Peterson
0f156c1c56 Remove unused arrange_output_buffer function from zlibmodule.c. (GH-98358) 2022-10-17 09:38:34 -07:00
Ruben Vorderman
eae7dad402 gh-95534: Improve gzip reading speed by 10% (#97664)
Change summary:
+ There is now a `gzip.READ_BUFFER_SIZE` constant that is 128KB. Other programs that read in 128KB chunks: pigz and cat. So this seems best practice among good programs. Also it is faster than 8 kb chunks.
+ a zlib._ZlibDecompressor was added. This is the _bz2.BZ2Decompressor ported to zlib. Since the zlib.Decompress object is better for in-memory decompression, the _ZlibDecompressor is hidden. It only makes sense in file decompression, and that is already implemented now in the gzip library. No need to bother the users with this.
+ The ZlibDecompressor uses the older Cpython arrange_output_buffer functions, as those are faster and more appropriate for the use case. 
+ GzipFile.read has been optimized. There is no longer a `unconsumed_tail` member to write back to padded file. This is instead handled by the ZlibDecompressor itself, which has an internal buffer. `_add_read_data` has been inlined, as it was just two calls.

EDIT: While I am adding improvements anyway, I figured I could add another one-liner optimization now to the python -m gzip application. That read chunks in io.DEFAULT_BUFFER_SIZE previously, but has been updated now to use READ_BUFFER_SIZE chunks.
2022-10-16 19:10:58 -07:00
Gregory P. Smith
9d1c4d69db bpo-38256: Fix binascii.crc32() when inputs are 4+GiB (GH-32000)
When compiled with `USE_ZLIB_CRC32` defined (`configure` sets this on POSIX systems), `binascii.crc32(...)` failed to compute the correct value when the input data was >= 4GiB. Because the zlib crc32 API is limited to a 32-bit length.

This lines it up with the `zlib.crc32(...)` implementation that doesn't have that flaw.

**Performance:** This also adopts the same GIL releasing for larger inputs logic that `zlib.crc32` has, and causes the Windows build to always use zlib's crc32 instead of our slow C code as zlib is a required build dependency on Windows.
2022-03-20 12:28:15 -07:00
Ma Lin
b3f2d4c8ba bpo-47040: improve document of checksum functions (gh-31955)
Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier.
Also cleans up an unnecessary intermediate variable in the implementation.

Authored-By: Ma Lin / animalize
Co-authored-by: Gregory P. Smith <greg@krypto.org>
2022-03-19 14:42:04 -07:00
Ma Lin
7edb6270a7 bpo-41735: Fix thread lock in zlib.Decompress.flush() may go wrong (GH-29587)
* Fix thread lock in zlib.Decompress.flush() may go wrong

Getting `.unconsumed_tail` before acquiring the thread lock may mix up decompress state.
2021-11-26 16:18:17 -08:00
Christian Clauss
dd02a696e5 Fix typos in the Modules directory (GH-28761) 2021-10-07 01:34:42 -07:00
Mohamad Mansour
8f943ca257 [codemod] Fix non-matching bracket pairs (GH-28473)
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-09-22 01:09:00 +02:00
Ruben Vorderman
ea23e7820f bpo-43613: Faster implementation of gzip.compress and gzip.decompress (GH-27941)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-09-02 17:02:59 +02:00
Ma Lin
a9a69bb3ea bpo-41486: zlib uses an UINT32_MAX sliding window for the output buffer (GH-26143)
* zlib uses an UINT32_MAX sliding window for the output buffer

These funtions have an initial output buffer size parameter:
- zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
- zlib.Decompress.flush([length])

If the initial size > UINT32_MAX, use an UINT32_MAX sliding window, instead of clamping to UINT32_MAX.
Speed up when (the initial size == the actual size).

This fixes a memory consumption and copying performance regression in earlier 3.10 beta releases if someone used an output buffer larger than 4GiB with zlib.decompress.

Reviewed-by: Gregory P. Smith
2021-07-04 18:10:44 -07:00
Ma Lin
251ffa9d2b bpo-41486: Fix initial buffer size can't > UINT32_MAX in zlib module (GH-25738)
* Fix initial buffer size can't > UINT32_MAX in zlib module

After commit f9bedb630e, in 64-bit build,
if the initial buffer size > UINT32_MAX, ValueError will be raised.

These two functions are affected:
1. zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
2. zlib.Decompress.flush([length])

This commit re-allows the size > UINT32_MAX.

* adds curly braces per PEP 7.

* Renames `Buffer_*` to `OutputBuffer_*` for clarity
2021-04-30 16:32:49 -07:00
Erlend Egeberg Aasland
9746cda705 bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748)
Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types:

* _dbm.dbm
* _gdbm.gdbm
* _multibytecodec.MultibyteCodec
* _sre..SRE_Scanner
* _thread._localdummy
* _thread.lock
* _winapi.Overlapped
* array.arrayiterator
* functools.KeyWrapper
* functools._lru_list_elem
* pyexpat.xmlparser
* re.Match
* re.Pattern
* unicodedata.UCD
* zlib.Compress
* zlib.Decompress
2021-04-30 16:04:57 +02:00
Ma Lin
f9bedb630e bpo-41486: Faster bz2/lzma/zlib via new output buffering (GH-21740)
Faster bz2/lzma/zlib via new output buffering.
Also adds .readall() function to _compression.DecompressReader class
to take best advantage of this in the consume-all-output at once scenario.

Often a 5-20% speedup in common scenarios due to less data copying.

Contributed by Ma Lin.
2021-04-27 23:58:54 -07:00
Ma Lin
93f411838a Fix thread locks in zlib module may go wrong in rare case. (#22126)
Setting `next_in` before acquiring the thread lock may mix up compress/decompress state in other threads.
2021-04-27 10:37:11 +02:00
Victor Stinner
32bd68c839 bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)
No longer use deprecated aliases to functions:

* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
2020-12-01 10:37:39 +01:00
Mohamed Koubaa
1aaa21ff81 bpo-1635741 port zlib module to multi-phase init (GH-21995)
Port the zlib extension module to multi-phase initialization (PEP 489).
2020-09-07 10:27:55 +02:00
Serhiy Storchaka
578c3955e0 bpo-37999: No longer use __int__ in implicit integer conversions. (GH-15636)
Only __index__ should be used to make integer conversions lossless.
2020-05-26 18:43:38 +03:00
Victor Stinner
4a21e57fe5 bpo-40268: Remove unused structmember.h includes (GH-19530)
If only offsetof() is needed: include stddef.h instead.

When structmember.h is used, add a comment explaining that
PyMemberDef is used.
2020-04-15 02:35:41 +02:00
Victor Stinner
62183b8d6d bpo-40268: Remove explicit pythread.h includes (#19529)
Remove explicit pythread.h includes: it is always included
by Python.h.
2020-04-15 02:04:42 +02:00
Hai Shi
f707d94af6 bpo-39968: Convert extension modules' macros of get_module_state() to inline functions (GH-19017) 2020-03-16 14:15:01 +01:00
Dino Viehland
a1ffad0719 bpo-38074: Make zlib extension module PEP-384 compatible (GH-15792)
Updated zlibmodule.c to be PEP 384 compliant.
2019-09-10 03:27:03 -07:00
Jeroen Demeyer
530f506ac9 bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464)
Automatically replace
tp_print -> tp_vectorcall_offset
tp_compare -> tp_as_async
tp_reserved -> tp_as_async
2019-05-30 19:13:39 -07:00
Serhiy Storchaka
6a44f6eef3 bpo-36048: Use __index__() instead of __int__() for implicit conversion if available. (GH-11952)
Deprecate using the __int__() method in implicit conversions of Python
numbers to C integers.
2019-02-25 17:57:58 +02:00
Alexey Izbyshev
3d4fabb2a4 bpo-35090: Fix potential division by zero in allocator wrappers (GH-10174)
* Fix potential division by zero in BZ2_Malloc()
* Avoid division by zero in PyLzma_Malloc()
* Avoid division by zero and integer overflow in PyZlib_Malloc()

Reported by Svace static analyzer.
2018-10-28 17:45:50 +01:00