Batch Inferencing

Some problem statements do not warrant the deployment of an API server. Instead, they call for batch inferencing, where a batch of data is provided to a script that churns out a set of predictions, perhaps exported to a file.

This template provides a Python script (src/batch_infer.py) and a configuration file (conf/batch_infer.yaml) for this purpose.
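
For orientation, a script of this kind typically pairs a Hydra entry point with a loop over the input files and a JSON Lines writer. The sketch below is only a minimal illustration, not the template's actual implementation; the config keys (input_path, output_filename) and the predict function are placeholders.

import glob
import json
import os
from datetime import datetime, timezone

import hydra
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="../conf", config_name="batch_infer")
def main(cfg: DictConfig) -> None:
    """Run a prediction for every text file in the configured input directory."""
    with open(cfg.output_filename, "w") as out_file:  # hypothetical config key
        # cfg.input_path is also a hypothetical key for illustration
        for text_filepath in sorted(glob.glob(os.path.join(cfg.input_path, "*.txt"))):
            with open(text_filepath) as in_file:
                text = in_file.read()
            record = {
                "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S%z"),
                "text_filepath": text_filepath,
                "prediction": predict(text),
            }
            out_file.write(json.dumps(record) + "\n")


def predict(text: str) -> str:
    """Placeholder model: echo the input text back as the prediction."""
    return text


if __name__ == "__main__":
    main()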

Let's first create some sample data on our local machine to conduct batch inferencing on:

Linux/macOS:

mkdir -p data/batch-infer && cd $_
echo -n "Output1" > in1.txt
echo -n "Output2" > in2.txt
echo -n "Output3" > in3.txt
Windows PowerShell:

New-Item -ItemType Directory -Force -Path 'data/batch-infer'
$currentDirectory = Get-Location
Set-Location -Path 'data/batch-infer'

New-Item -ItemType File -Force -Name 'in1.txt' | Out-Null
Add-Content -Path 'in1.txt' -Value "Output1"

New-Item -ItemType File -Force -Name 'in2.txt' | Out-Null
Add-Content -Path 'in2.txt' -Value "Output2"

New-Item -ItemType File -Force -Name 'in3.txt' | Out-Null
Add-Content -Path 'in3.txt' -Value "Output3"

Set-Location -Path $currentDirectory
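
If you prefer a single cross-platform way to create the same sample files, the following Python snippet is an equivalent sketch; the directory and file names mirror the commands above.

from pathlib import Path

# Create the sample input directory and three dummy text files,
# mirroring the shell commands above.
batch_dir = Path("data/batch-infer")
batch_dir.mkdir(parents=True, exist_ok=True)
for idx in range(1, 4):
    (batch_dir / f"in{idx}.txt").write_text(f"Output{idx}")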

To execute the batch inferencing script locally:

Linux/macOS:

# Navigate back to root directory
cd "$(git rev-parse --show-toplevel)"
conda activate {{cookiecutter.repo_name}}
python src/batch_infer.py
Windows PowerShell:

# Navigate back to root directory
Set-Location -Path (git rev-parse --show-toplevel)
conda activate {{cookiecutter.repo_name}}
python src/batch_infer.py

The script will log to the terminal the location of the .jsonl file (batch-infer-res.jsonl) containing predictions that look like the following:

...
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in1.txt", "prediction": "Output1"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in2.txt", "prediction": "Output2"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in3.txt", "prediction": "Output3"}
...
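
Downstream consumers can read these predictions back line by line. Here is a minimal sketch; the file path is hypothetical, so substitute the location logged by the script.

import json

# Hypothetical path; use the location logged by the script, which sits
# inside a Hydra-generated subdirectory of the outputs folder.
predictions_path = "outputs/2024-02-29/10-09-00/batch-infer-res.jsonl"

with open(predictions_path) as f:
    predictions = [json.loads(line) for line in f]

for record in predictions:
    print(record["text_filepath"], "->", record["prediction"])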

The hydra.job.chdir=True flag makes Hydra change the working directory for the run, so the .jsonl file containing the predictions is written to a timestamped subdirectory within the outputs folder. Refer to Hydra's documentation on its output/working directory for more information on the outputs that Hydra generates.
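
Note that when hydra.job.chdir=True is in effect, relative paths inside the script resolve against that new working directory. A common pattern, shown as a hedged sketch below, is to resolve configured input paths against the original launch directory with hydra.utils.to_absolute_path; the cfg.input_path key is an assumption for illustration.

import hydra
from hydra.utils import to_absolute_path
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="../conf", config_name="batch_infer")
def main(cfg: DictConfig) -> None:
    # With hydra.job.chdir=True, the process runs inside the outputs
    # subdirectory, so convert configured input paths back to absolute
    # paths anchored at the original working directory.
    input_dir = to_absolute_path(cfg.input_path)  # hypothetical config key
    print(f"Reading batch inputs from: {input_dir}")


if __name__ == "__main__":
    main()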