Add Support HTTP compression#802
Conversation
mfussenegger
left a comment
There was a problem hiding this comment.
- I think we can remove the
compress_serversetting and always set the header leaving it up to the server - given that in CrateDB compression also needs to be enabled with default being off. - I'd also remove the
compress_algorithmsetting - given that there is so far only gzip supported there's no point in the setting - I'd tend to merge
compress_client/compress_thresholdto a single setting:- number: Compressed if payload body > the number
- bool true: Always compressed
- bool false: Never compressed
Otherwise the interaction between enabled/threshold seems more ambiguous and also sort of redundant to have both separate.
Didn't look into the details yet
|
Thanks for your review!
Agree to remove it and always send
Agree to remove it if there is no plan to implement other algorithms.
Make sense to merge compress parameters. The type check needs to test for bool before int, otherwise |
mfussenegger
left a comment
There was a problem hiding this comment.
Left some minor suggestions, otherwise lgtm
| socket_tcp_keepintvl=None, | ||
| socket_tcp_keepcnt=None, | ||
| jwt_token=None, | ||
| compress=8192, |
There was a problem hiding this comment.
| compress=8192, | |
| compress: Union[int, bool] = 8192, |
I think it would be good if we started adding type annotations - at least for the public API
There was a problem hiding this comment.
I added types, I'd like to tackle all annotations in separate PR. I needed to add for _inactive_servers and server_pool because mypy were complaining.
| compress_enabled = self.compress is True or ( | ||
| not isinstance(self.compress, bool) and len(data) >= self.compress | ||
| ) |
There was a problem hiding this comment.
| compress_enabled = self.compress is True or ( | |
| not isinstance(self.compress, bool) and len(data) >= self.compress | |
| ) | |
| compress_enabled = self.compress is True or ( | |
| isinstance(self.compress, int) and len(data) >= self.compress | |
| ) |
Or might even make it stricter and fail if it's neither bool nor int.
There was a problem hiding this comment.
I added check in __init__ and will fail if it's not int or bool
Adds opt-in HTTP compression support to the crate-python driver to reduce bandwidth usage and improve performance for large queries.
Summary of the changes / Why this is an improvement
Accept-Encoding: gzip, deflateto CrateDB server when it's enabled.compress_thresholdbytes (default 8192)Small note on threshold default value:
I've changed
compress_thresholdto 8192 while implementing. I ran a payload-size sweep against both a local CrateDB instance and a CrateDB Cloud cluster (eu-west-1 free tier), measuring round-trip latency with and without gzip compression. I saw benefits of compression when latency included over 8kb on local and 1kb on cloud instances.Latency can vary based on my network and cluster settings/location. I decided to move on 8kb as a default value for threshold.
Checklist