This post describes the process of converting an encrypted TLT file (.etlt) to a TensorRT engine file (.engine) on a Jetson Nano device. While it is entirely possible to use an encrypted TLT file with DeepStream, it is recommended that you convert to a TensorRT engine to take advantage of environment- and hardware-specific optimizations. Further, whenever you upgrade CUDA or TensorRT (minor versions included), you must rebuild the engine so it is optimized for the updated libraries. According to Nvidia, “Running an engine that was generated with a different version of TensorRT or CUDA is not supported and will cause unknown behavior that affects inference speed, accuracy, and stability, or it may fail to run altogether.”
This guide will provide the necessary steps to make generating and updating engine files super simple. We will be using Nvidia’s PeopleNet network, which utilizes ResNet34. Instructions on downloading the pre-built pruned version of the encrypted TLT for this model are given below.
Note: This process should work on any Jetson device, but we will use the Nano because it is the most restricted in terms of on-board graphics memory. The tool we will be using,
tlt-converter, uses TensorRT to convert a pre-built .etlt file and hence requires sufficient memory.
It is assumed that you are running TensorRT 7.1.x and CUDA 10.2, which are the versions included in JetPack 4.4 and 4.5 as of this writing.
More information can be found in Nvidia’s Deploying to DeepStream guide.
Step 1: Download tlt-converter
While there are technically two methods for retrieving and using
tlt-converter, this article will only focus on downloading the binary to be used directly on the Jetson Nano. The other method is to use
nvidia-docker, but it requires an
x86 host machine. If you prefer this method, which requires you to install TLT on your
x86 host, you can find more information here.
Update (Feb 26, 2021): It appears the link to download
tlt-converter is changed periodically by Nvidia. This was brought to my attention by a comment from AnthonyJ. Unfortunately, that makes the process slightly more complicated, since you will need to navigate to Nvidia’s website to download the binary rather than using
wget. You can find the download link here under the subheading “Instructions for Jetson”. Just make sure you save the zip file to your home directory (
~/). You can either manually unzip it or use the unzip command below. Thanks again to AnthonyJ for pointing this out!
From Jetson Nano:
sudo apt-get update
sudo apt-get install libssl-dev

# Only if you want to use the unzip utility
sudo apt-get install unzip

unzip tlt-converter.zip -d ~/tlt-converter
cd tlt-converter
chmod +x tlt-converter
sudo mv tlt-converter /usr/bin
This installs (or updates) the unzip utility, as well as the SSL library that we will require later on when we run
tlt-converter. It also extracts the Readme and
tlt-converter binary from the zip file. Further, it adds execute permissions so you can run the binary, and places it in the user’s binary folder, which should be in your PATH. To verify this folder is in your PATH, run:

echo $PATH

You should see a colon-separated list with various paths, and
/usr/bin should be one of them. At this point, you should be able to run tlt-converter (for example, with its -h flag).
If everything was successful, you will get a print out similar to the following:
usage: tlt-converter [-h] [-v] [-e ENGINE_FILE_PATH] [-k ENCODE_KEY]
                     [-c CACHE_FILE] [-o OUTPUTS] [-d INPUT_DIMENSIONS]
                     [-b BATCH_SIZE] [-m MAX_BATCH_SIZE] [-w MAX_WORKSPACE_SIZE]
                     [-t DATA_TYPE] [-i INPUT_ORDER]
                     input_file ...
If you get an error complaining about TensorRT (or anything else), it is quite possible that you do not have the correct version of TensorRT installed. Please ensure that you are using TensorRT 7.1.x and CUDA 10.2.
Step 2: Download the encrypted TLT file
We need a sample file to work with, so for now let’s download a pre-built and pruned version of ResNet34 PeopleNet. You do not need to have DeepStream installed to run the resulting engine, since you can run it as a standalone TensorRT engine, so we will not assume that you have DeepStream installed at this point. Download the pre-built .etlt file and store it in your home directory:
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplenet/versions/pruned_v1.0/files/resnet34_peoplenet_pruned.etlt -O ~/resnet34_peoplenet_pruned.etlt
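Once the download finishes, it does not hurt to confirm that the file exists and has a non-zero size (the path assumes you saved it to your home directory as shown above):

```shell
# Print the file size if the download succeeded, or a warning if not
ls -lh ~/resnet34_peoplenet_pruned.etlt 2>/dev/null || echo "etlt file not found"
```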
Note: If you do have DeepStream installed with the included samples, you can find much more information on the various pre-built encrypted TLT files in the README typically located at
Step 3: Environment variables
Until now, all of the steps would work on just about every Jetson device without trouble. However, there are a few key differences that set the Nano apart from devices like the TX2 or Xavier. The following environment variables reflect the Nano’s limitations in memory and processing power. Feel free to tune them to your needs, but these are the bare-bones defaults for converting on a Nano:
export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"
export INPUT_DIMENSIONS=3,544,960
export ENCODE_KEY=tlt_encode
export BATCH_SIZE=1
export ENGINE_FILE_PATH=resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
export INPUT_ORDER=nhwc
export MAX_BATCH_SIZE=2
export OUTPUTS=output_bbox/BiasAdd,output_cov/Sigmoid
export DATA_TYPE=fp16
export MAX_WORKSPACE_SIZE=1610612736
export MODEL_IN=resnet34_peoplenet_pruned.etlt
TRT_LIB_PATH: Points to the directory where your TensorRT shared library files are located.
TRT_INC_PATH: Points to the directory where the TensorRT header files are located.
INPUT_DIMENSIONS: Specifies the input dimensions for ResNet34, in order of channels, height, width.
ENCODE_KEY: The key that was used to generate the .etlt file. In this case, it was provided by Nvidia.
BATCH_SIZE: Calibration is a step performed by the builder when deciding suitable scale factors for inference. We set it to 1 because of memory considerations on the Nano. The default is 8. This is only relevant for converting the .etlt file to a TensorRT engine.
ENGINE_FILE_PATH: The output file name. Note that we did not prepend a path to the filename, meaning it will be stored in the directory from which you run tlt-converter.
INPUT_ORDER: Specifies the order in which
INPUT_DIMENSIONS should be interpreted. I have a post on Nvidia’s developer forums because, quite frankly, the INPUT_ORDER seems counter-intuitive to me at the time of this writing. I will update this post once I learn more. Suffice it to say,
nhwc works for this example.
MAX_BATCH_SIZE: Specifies the maximum TensorRT engine batch size, with a default value of 16.
OUTPUTS: Specifies the outputs expected for each node.
DATA_TYPE: Specifies the data type used in the model.
MAX_WORKSPACE_SIZE: Specifies the maximum TensorRT workspace size. This is probably the most significant setting for the Jetson Nano. I tried various values here, starting from the default of 1 << 31 = 2147483648 bytes (2 GB), and the value 1610612736 (1.5 GB) happens to work the best. The full 2 GB default results in an out-of-memory exception on the Nano, while a value that is too small does not provide enough memory for TensorRT to employ various “tactics” and results in a warning, though it still technically works. I found that using 1.5 GB produces a workspace-size warning about 15% of the time, but more often than not, it is sufficient to let TensorRT do its thing without complaining.
MODEL_IN: Simply the input .etlt file.
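For reference, the two workspace sizes discussed above can be checked with shell arithmetic:

```shell
# Default workspace: 1 << 31 bytes = 2 GB
echo $((1 << 31))                      # 2147483648
# The value used here: 1.5 GB
echo $((3 * 1024 * 1024 * 1024 / 2))   # 1610612736
```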
Step 4: Converting (and waiting)
With the environment variables set up, we should be good to go on converting the encrypted TLT file to a TensorRT engine. We will assume that you saved the
resnet34_peoplenet_pruned.etlt file in your home directory (
cd ~) and you have
exported all of the environment variables from above.
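Since you will need to repeat this conversion after every CUDA or TensorRT upgrade, it can help to persist the exports to a small script and source it before each rebuild. The file name below is just a suggestion, and only a few of the variables are shown; extend it with the full list from Step 3:

```shell
# Save the settings to a reusable file
cat > ~/tlt-env.sh <<'EOF'
export INPUT_DIMENSIONS=3,544,960
export ENCODE_KEY=tlt_encode
export DATA_TYPE=fp16
EOF

# Load them into the current shell before running tlt-converter
. ~/tlt-env.sh
```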
tlt-converter \
  -d $INPUT_DIMENSIONS \
  -k $ENCODE_KEY \
  -b $BATCH_SIZE \
  -e $ENGINE_FILE_PATH \
  -i $INPUT_ORDER \
  -m $MAX_BATCH_SIZE \
  -o $OUTPUTS \
  -t $DATA_TYPE \
  -w $MAX_WORKSPACE_SIZE \
  $MODEL_IN
This could take anywhere from 1–5 minutes, depending on whether or not TensorRT complains about maximum workspace memory. I found that resetting the Nano typically resulted in faster conversion times, most likely because the GPU memory is wiped and not taken up by other processes.
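Once the conversion finishes (assuming you ran it from your home directory), a quick listing confirms the output:

```shell
# List any engine files produced by the conversion
ls -lh ~/*.engine 2>/dev/null || echo "no engine files found yet"
```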
The result should be a new TensorRT engine file in your
/home/username directory. If you have DeepStream or some other TensorRT pipeline set up, feel free to try it out! Be sure to leave any questions or comments below and I will do my best to reply. Make sure you subscribe to get additional quality content in the future! Otherwise, look for an upcoming post where I discuss configuring and running PeopleNet on the Nano!