Batch Inferencing
Info
The guide below is meant for reference only and is not meant to be followed verbatim. You may need to generate your own guide site if you require guidance specific to your own project.
Some problem statements do not warrant the deployment of an API server; instead, they call for batch inferencing, where a batch of data is provided to a script that churns out a set of predictions, perhaps exported to a file.
This template provides a Python script (`src/batch_infer.py`) and a configuration file (`conf/batch_infer.yaml`) for this purpose.
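To give a rough sense of how such a script can be wired together, the following is a minimal sketch only and not the template's actual implementation; it assumes a Hydra-decorated entry point, a hypothetical `input_dir` key in `conf/batch_infer.yaml`, and a placeholder `predict` function standing in for the real model:

```python
# Minimal sketch only; the template's actual src/batch_infer.py will differ.
import glob
import json
import os
from datetime import datetime, timezone

import hydra
from omegaconf import DictConfig


def predict(text: str) -> str:
    """Placeholder standing in for the real model's prediction logic."""
    return text


@hydra.main(version_base=None, config_path="../conf", config_name="batch_infer")
def main(cfg: DictConfig) -> None:
    # `input_dir` is a hypothetical config key used here for illustration.
    with open("batch-infer-res.jsonl", "w") as fout:
        for filepath in sorted(glob.glob(os.path.join(cfg.input_dir, "*.txt"))):
            with open(filepath) as fin:
                text = fin.read()
            record = {
                "time": datetime.now(timezone.utc).isoformat(),
                "text_filepath": filepath,
                "prediction": predict(text),
            }
            fout.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    main()
```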
Let's first create some sample data on our local machine for us to conduct batch inferencing on:
Linux/macOS:

```bash
mkdir -p data/batch-infer && cd $_
echo -n "Output1" > in1.txt
echo -n "Output2" > in2.txt
echo -n "Output3" > in3.txt
```
Windows PowerShell:

```powershell
New-Item -ItemType Directory -Force -Path 'data/batch-infer'
$currentDirectory = Get-Location
Set-Location -Path 'data/batch-infer'
# -NoNewline mirrors `echo -n` so the file contents match the Linux/macOS example
New-Item -ItemType File -Force -Name 'in1.txt' | Out-Null
Add-Content -Path 'in1.txt' -Value "Output1" -NoNewline
New-Item -ItemType File -Force -Name 'in2.txt' | Out-Null
Add-Content -Path 'in2.txt' -Value "Output2" -NoNewline
New-Item -ItemType File -Force -Name 'in3.txt' | Out-Null
Add-Content -Path 'in3.txt' -Value "Output3" -NoNewline
Set-Location -Path $currentDirectory
```
To execute the batch inferencing script locally:
Linux/macOS:

```bash
# Navigate back to root directory
cd "$(git rev-parse --show-toplevel)"
conda activate project
python src/batch_infer.py
```
Windows PowerShell:

```powershell
# Navigate back to root directory
Set-Location -Path (git rev-parse --show-toplevel)
conda activate project
python src/batch_infer.py
```
The script will log to the terminal the location of the `.jsonl` file (`batch-infer-res.jsonl`) containing predictions that look like the following:
```
...
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in1.txt", "prediction": "Output1"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in2.txt", "prediction": "Output2"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in3.txt", "prediction": "Output3"}
...
```
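Should the predictions need to be consumed downstream, a small sketch like the one below can load the JSON Lines records back into Python objects; the path shown is only an example, since the actual location is printed by the script and sits inside a timestamped Hydra output subdirectory:

```python
# Sketch: loading the generated predictions back into memory.
import json

# Example path only; use the location that the script logs to the terminal.
results_path = "outputs/2024-02-29/10-09-00/batch-infer-res.jsonl"

with open(results_path) as f:
    predictions = [json.loads(line) for line in f if line.strip()]

for record in predictions:
    print(record["text_filepath"], "->", record["prediction"])
```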
The `hydra.job.chdir=True` flag writes the `.jsonl` file containing the predictions to a subdirectory within the `outputs` folder. Refer to the Hydra documentation on output directories for more information on the outputs generated by Hydra.
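If you want the script itself to report exactly where Hydra has placed a run's outputs, one option, sketched below under the assumption of Hydra 1.2 or newer, is to query Hydra's runtime configuration from within the decorated function:

```python
# Sketch: resolving the Hydra run's output directory at runtime (Hydra >= 1.2).
import hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="../conf", config_name="batch_infer")
def main(cfg: DictConfig) -> None:
    # With hydra.job.chdir=True, the working directory is switched to this
    # folder, so files written with relative paths end up inside it.
    output_dir = HydraConfig.get().runtime.output_dir
    print(f"Outputs for this run will be written under: {output_dir}")


if __name__ == "__main__":
    main()
```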