botocore is a fairly large Python package. Up to and including the 1.31.x releases, it came in at close to 90 MB. Since version 1.32.1, released in November 2023, it has dropped to around 25 MB because the AWS developers started to compress files.
Specifically, they compress the service definitions in the package's data folder, which are JSON files. botocore carries around the full API specifications of all AWS services, whether you need them or not.
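To see how much each service contributes, the data folder can be measured directly. The following is a minimal sketch rather than part of the pruning utility; the helper names service_sizes and largest_services are made up for illustration, and it assumes you pass the path of an installed botocore data folder.

```python
import pathlib


def service_sizes(data_dir: pathlib.Path) -> dict[str, int]:
    """Return the total size in bytes of each service folder in data_dir."""
    sizes = {}
    for folder in data_dir.iterdir():
        # Top-level files in the data folder are not service definitions; skip them.
        if folder.is_dir():
            sizes[folder.name] = sum(
                f.stat().st_size for f in folder.rglob("*") if f.is_file()
            )
    return sizes


def largest_services(data_dir: pathlib.Path, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` largest service folders, biggest first."""
    sizes = service_sizes(data_dir)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]
```

Calling largest_services(pathlib.Path("package/botocore/data")) would list the ten heaviest service folders, which gives a rough idea of what pruning can save.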
With these API definitions still making up the bulk of the botocore package, an obvious way to reduce the disk space consumed is to remove unused files from the data folder.
This is particularly helpful when looking for ways to shrink installation bundles, such as Lambda packages. (Note that AWS recommends bundling botocore and boto3 with your Lambda function code, even if they provide an up-to-date version in the Lambda runtime environment.)
However, we have to be careful not to remove files that our program does depend on. Here's a small utility script that detects the AWS services a Python program uses and prunes a botocore package accordingly:
import argparse
import pathlib
import re
import shutil

# Matches boto3.client("name") and boto3.resource("name", ...) calls
# and captures the service name.
PATTERN = re.compile(
    r"boto3\.(?:client|resource)\([\"'](\w+)[\"'](?:\)|\s*,)"
)


def prune_botocore(
    source_path: pathlib.Path,
    botocore_path: pathlib.Path,
    keep: list[str],
):
    # Collect every service name referenced in the program's source files.
    for file in source_path.rglob("*.py"):
        print(f"Checking {file}")
        with open(file, "r") as f:
            for line in f:
                if (
                    match := re.search(
                        PATTERN,
                        line,
                    )
                ) is not None:
                    keep.append(match.group(1))
    keep = set(keep)
    print("Keeping the following services:", sorted(keep))
    # Delete every service folder under data/ that is not in the keep set.
    # Plain files directly inside data/ are left untouched.
    for folder in botocore_path.rglob("data/*"):
        if folder.is_dir() and folder.name not in keep:
            shutil.rmtree(folder)


if __name__ == "__main__":
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument(
        "--source",
        type=pathlib.Path,
        required=True,
        help="Location of the main source code.",
    )
    arg_parser.add_argument(
        "--botocore",
        type=pathlib.Path,
        required=True,
        help="Location of the botocore package.",
    )
    arg_parser.add_argument(
        "--keep",
        nargs="+",
        help="Services to keep even if not referenced in source code.",
        default=[],
    )
    args = arg_parser.parse_args()
    prune_botocore(
        source_path=args.source,
        botocore_path=args.botocore,
        keep=args.keep,
    )
It finds all instances where a boto3 client or resource is instantiated using the following regular expression:
boto3\.(?:client|resource)\([\"'](\w+)[\"'](?:\)|\s*,)
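To get a feel for what this expression does and does not catch, here is a small sketch that runs it against a few sample lines. The find_service helper and the sample lines are made up for illustration. Note that \w does not match a hyphen, so service names such as cognito-idp are not detected, and neither are clients created from a session object or with a keyword argument.

```python
import re
from typing import Optional

# Same expression as above, written as a raw string.
PATTERN = re.compile(r"boto3\.(?:client|resource)\([\"'](\w+)[\"'](?:\)|\s*,)")


def find_service(line: str) -> Optional[str]:
    """Return the service name if the line creates a boto3 client or resource."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


examples = [
    's3 = boto3.client("s3")',                                      # detected
    "table = boto3.resource('dynamodb', region_name='eu-west-1')",  # detected
    'sqs = boto3.client(service_name="sqs")',                       # keyword argument: missed
    'idp = boto3.client("cognito-idp")',                            # hyphenated name: missed
    'sns = session.client("sns")',                                  # session object: missed
]
for line in examples:
    print(f"{line} -> {find_service(line)}")
```

Services that are used in one of the missed forms have to be listed explicitly so that their definitions survive the pruning.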
Then, it removes all other services from the data folder.
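Since the removal is irreversible, it can be useful to preview it first. The following is a minimal dry-run sketch that mirrors the folder selection of the script without deleting anything; plan_pruning is a hypothetical helper and not part of the utility itself.

```python
import pathlib


def plan_pruning(
    botocore_path: pathlib.Path, keep: set[str]
) -> tuple[list[str], list[str]]:
    """Split the service folders under data/ into (kept, removed) names.

    Nothing is deleted; this only reports what the pruning step would do.
    """
    kept, removed = [], []
    for folder in sorted(botocore_path.rglob("data/*")):
        if not folder.is_dir():
            continue  # plain files in data/ are never removed by the script
        (kept if folder.name in keep else removed).append(folder.name)
    return kept, removed
```

If a service you rely on shows up in the second list, it was not found in the source code and needs to be kept explicitly.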
It's also possible to specify which services should be kept beyond those found in the code. For example, in the case of a Lambda function that uses the X-Ray wrapper of Powertools for AWS Lambda, we have to keep the xray service even though we don't reference it directly in our code:
python prune_botocore.py --source package/my-function-with-tracing \
--botocore package/botocore \
--keep xray