Batch Inferencing
Some problem statements do not warrant the deployment of an API server; instead, they call for batch inferencing, where a batch of data is provided to a script that churns out a set of predictions, perhaps exported to a file.
This template provides a Python script (`src/batch_infer.py`) and a configuration file (`conf/batch_infer.yaml`) for this purpose.
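To give a sense of what such a script does, below is a minimal, self-contained sketch of a batch inferencing loop. It is not the template's actual implementation: the `predict` function is a hypothetical stand-in for whatever model `src/batch_infer.py` loads, and the paths simply mirror the walkthrough below.

```python
import datetime
import json
from pathlib import Path


def predict(text: str) -> str:
    """Hypothetical stand-in for the template's model inference call."""
    return text


def batch_infer(input_dir: str, output_file: str) -> None:
    """Run inference on every .txt file in input_dir and write one JSON line per file."""
    with open(output_file, "w") as out:
        for path in sorted(Path(input_dir).glob("*.txt")):
            record = {
                "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                "text_filepath": str(path),
                "prediction": predict(path.read_text()),
            }
            out.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    batch_infer("./data/batch-infer", "batch-infer-res.jsonl")
```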
Let's first create some sample data on our local machine for us to conduct batch inferencing on:
Linux/macOS:

```bash
mkdir -p data/batch-infer && cd $_
echo -n "Output1" > in1.txt
echo -n "Output2" > in2.txt
echo -n "Output3" > in3.txt
```
Windows PowerShell:

```powershell
New-Item -ItemType Directory -Force -Path 'data/batch-infer'
$currentDirectory = Get-Location
Set-Location -Path 'data/batch-infer'
New-Item -ItemType File -Force -Name 'in1.txt' | Out-Null
Add-Content -Path 'in1.txt' -Value "Output1"
New-Item -ItemType File -Force -Name 'in2.txt' | Out-Null
Add-Content -Path 'in2.txt' -Value "Output2"
New-Item -ItemType File -Force -Name 'in3.txt' | Out-Null
Add-Content -Path 'in3.txt' -Value "Output3"
Set-Location -Path $currentDirectory
```
To execute the batch inferencing script locally:
Linux/macOS:

```bash
# Navigate back to root directory
cd "$(git rev-parse --show-toplevel)"
conda activate {{cookiecutter.repo_name}}
python src/batch_infer.py
```
Windows PowerShell:

```powershell
# Navigate back to root directory
Set-Location -Path (git rev-parse --show-toplevel)
conda activate {{cookiecutter.repo_name}}
python src/batch_infer.py
```
The script will log to the terminal the location of the `.jsonl` file (`batch-infer-res.jsonl`) containing predictions that look like the following:
```
...
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in1.txt", "prediction": "Output1"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in2.txt", "prediction": "Output2"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in3.txt", "prediction": "Output3"}
...
```
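Since each line of the file is a standalone JSON object (JSON Lines format), downstream code can consume the predictions with the standard library alone. The snippet below is a small sketch; the path is a placeholder for the location that the script logs.

```python
import json
from pathlib import Path

# Placeholder: substitute the path logged by src/batch_infer.py.
results_path = Path("batch-infer-res.jsonl")

with results_path.open() as f:
    # One JSON object per line.
    predictions = [json.loads(line) for line in f]

for record in predictions:
    print(record["text_filepath"], "->", record["prediction"])
```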
The `hydra.job.chdir=True` flag causes the `.jsonl` file containing the predictions to be written to a subdirectory within the `outputs` folder. Refer to Hydra's documentation on its output/working directory for more information on the outputs generated by Hydra.
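If you need to locate the run's output directory programmatically (for example, to pick up `batch-infer-res.jsonl` right after a run), Hydra exposes it at runtime. The helper below is a small sketch using Hydra's `HydraConfig` API; it is not part of the template and is only valid inside a running Hydra application.

```python
from hydra.core.hydra_config import HydraConfig


def current_output_dir() -> str:
    """Return the output directory of the current Hydra run.

    Only callable from within a @hydra.main-decorated application.
    With hydra.job.chdir=True this is also the current working directory.
    """
    return HydraConfig.get().runtime.output_dir
```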