Batch Inferencing
Info

The guide below is meant for reference only and is not meant to be followed verbatim. You may need to generate your own guide site if you require guidance specific to your own project.
Some problem statements do not warrant the deployment of an API server. Instead, they call for batch inferencing, where a batch of data is provided to a script that churns out a set of predictions, perhaps exported to a file.
This template provides a Python script (`src/batch_infer.py`) and a configuration file (`conf/batch_infer.yaml`) for this purpose.
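For orientation, here is a minimal sketch of the general shape such a script might take. It is not the template's actual implementation: the `input_dir` config key and the `predict` placeholder are hypothetical, and only the `output_path` override appears in the commands below.

```python
import glob
import json
import os
from datetime import datetime, timezone

import hydra
from omegaconf import DictConfig


def predict(text: str) -> str:
    """Hypothetical placeholder for the template's actual model inference."""
    return text


@hydra.main(version_base=None, config_path="../conf", config_name="batch_infer")
def main(cfg: DictConfig) -> None:
    # Iterate over the input text files and write one JSON record per line.
    with open(cfg.output_path, "w") as out_file:
        for filepath in sorted(glob.glob(os.path.join(cfg.input_dir, "*.txt"))):
            with open(filepath) as in_file:
                text = in_file.read()
            record = {
                "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S%z"),
                "text_filepath": filepath,
                "prediction": predict(text),
            }
            out_file.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    main()
```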
Let's first create some sample data on our local machine for us to conduct batch inferencing on:
Linux:

```bash
docker run --rm \
    -u $(id -u):$(id -g) \
    -w /batch-infer \
    -v ./data/batch-infer:/batch-infer \
    alpine \
    sh -c "echo -n 'Output1' > in1.txt && \
        echo -n 'Output2' > in2.txt && \
        echo -n 'Output3' > in3.txt"
```

macOS:

```bash
docker run --rm \
    -w /batch-infer \
    -v ./data/batch-infer:/batch-infer \
    alpine \
    sh -c "echo -n 'Output1' > in1.txt && \
        echo -n 'Output2' > in2.txt && \
        echo -n 'Output3' > in3.txt"
```

Windows PowerShell:

```powershell
docker run --rm `
    -w /batch-infer `
    -v .\data\batch-infer:/batch-infer `
    alpine `
    sh -c "echo -n 'Output1' > in1.txt && `
        echo -n 'Output2' > in2.txt && `
        echo -n 'Output3' > in3.txt"
```
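If the command succeeds, the three sample input files should now exist on your local machine:

```bash
ls data/batch-infer
# in1.txt  in2.txt  in3.txt
```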
To execute the batch inferencing script using the Docker image:
Linux:

```bash
docker run --rm \
    -v ./data:/home/aisg/project/data \
    -w /home/aisg/project \
    registry.aisingapore.net/project-path/gpu:0.1.0 \
    python src/batch_infer.py output_path=data/batch-infer/batch_infer_res.jsonl
sudo chown $(id -u):$(id -g) data/batch-infer/batch_infer_res.jsonl
```

macOS:

```bash
docker run --rm \
    -v ./data:/home/aisg/project/data \
    -w /home/aisg/project \
    registry.aisingapore.net/project-path/gpu:0.1.0 \
    python src/batch_infer.py output_path=data/batch-infer/batch_infer_res.jsonl
```

Windows PowerShell:

```powershell
docker run --rm `
    -v .\data:/home/aisg/project/data `
    -w /home/aisg/project `
    registry.aisingapore.net/project-path/gpu:0.1.0 `
    python src/batch_infer.py output_path=data/batch-infer/batch_infer_res.jsonl
```
The script logs the location of the `.jsonl` file (`batch_infer_res.jsonl`) to the terminal; the file contains predictions that look like the following:
```
...
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in1.txt", "prediction": "Output1"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in2.txt", "prediction": "Output2"}
{"time": "2024-02-29T10:09:00+0000", "text_filepath": "./data/batch-infer/in3.txt", "prediction": "Output3"}
...
```
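Since the output is newline-delimited JSON, the predictions can be read back with a few lines of Python. A sketch, assuming the output path used in the commands above:

```python
import json

# Read the newline-delimited JSON predictions back into a list of dicts.
with open("data/batch-infer/batch_infer_res.jsonl") as f:
    predictions = [json.loads(line) for line in f]

for record in predictions:
    print(record["text_filepath"], "->", record["prediction"])
```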
Passing the `hydra.job.chdir=True` flag writes the `.jsonl` file containing the predictions to a subdirectory within the `outputs` folder instead. See the Hydra documentation for more information on the outputs it generates.
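For example, the override can be appended to the script invocation as below. With `hydra.job.chdir=True` set, Hydra changes the working directory before the script runs, so the relative `output_path` resolves under the Hydra run directory (by default `outputs/YYYY-MM-DD/HH-MM-SS/`) rather than the project root:

```bash
python src/batch_infer.py hydra.job.chdir=True \
    output_path=data/batch-infer/batch_infer_res.jsonl
```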