Person Search With Natural Language Description: Person Description Model



DESCRIPTION

import numpy as np

ort_inputs = {
    # Dummy image batch; the image input is ignored for text-only queries.
    "images": np.zeros([1, 3, 384, 128], dtype=np.float32),
    # Tokenized search query, shape (1, 64).
    "txt": np.expand_dims(np.array(tokens["input_ids"], dtype=np.int64), axis=0),
    # All ones so the model attends to the whole query.
    "attention_mask": np.ones([1, 64], dtype=np.int64),
}

ort_outs = ort_session.run(None, ort_inputs)

text_emb = ort_outs[1]  # text embedding, shape (1, 2048)

The example code above shows how to use this model, where ort_session is an ONNX Runtime inference session.
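For reference, a minimal sketch of creating that session with onnxruntime; the filename person_description.onnx is a placeholder, not the actual name of the distributed model file:

import onnxruntime

# Placeholder path; substitute the actual exported ONNX model file.
ort_session = onnxruntime.InferenceSession("person_description.onnx")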

During inference the input image is not needed, because the video to be searched must already be pre-indexed; it can simply be an array of zeros. The txt input is the tokenized search query, produced with the same tokenizer as the WangchanBERTa model. Since the model must see the entire query text, attention_mask is all ones.
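As a sketch of the tokenization step, the query can be tokenized with the Hugging Face tokenizer for WangchanBERTa, padded to the fixed length of 64 that the model expects. The checkpoint name and the example query below are assumptions for illustration:

from transformers import AutoTokenizer

# Assumed checkpoint; the text above only says the tokenizer matches WangchanBERTa.
tokenizer = AutoTokenizer.from_pretrained("airesearch/wangchanberta-base-att-spm-uncased")

query = "ผู้ชายใส่เสื้อสีแดง"  # example query: "a man wearing a red shirt"
tokens = tokenizer(query, padding="max_length", truncation=True, max_length=64)
# tokens["input_ids"] now has length 64 and can be fed to ort_inputs["txt"] above.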

The output text_emb is the embedding of the description text, a numpy array of shape 1x2048. It can be compared against the pre-computed image embedding of each person in the video using approximate nearest-neighbour search.
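As an illustration of the matching step, here is a brute-force cosine-similarity search; in practice an approximate nearest-neighbour library (e.g. FAISS or Annoy) would replace the exact scan, and person_embs / person_embeddings.npy are assumed names for the pre-indexed gallery:

import numpy as np

# Assumed: one pre-computed 2048-d image embedding per person in the video.
person_embs = np.load("person_embeddings.npy")  # shape (num_persons, 2048)

# L2-normalise both sides so the dot product equals cosine similarity.
q = text_emb[0] / np.linalg.norm(text_emb[0])
gallery = person_embs / np.linalg.norm(person_embs, axis=1, keepdims=True)

scores = gallery @ q                  # similarity of the query to every person
top_k = np.argsort(-scores)[:10]      # indices of the 10 closest matches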
