Model module

Model.register_connector(name)

Register a connector model with ConnectorFactory under the given name.

Parameters:: name (str) – name under which the connector is registered
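Registration functions like this are often written as class decorators that record the class in a module-level dict, which the matching factory then reads. The sketch below assumes that pattern. CONNECTOR_FACTORY, LinearConnector and the exact-name lookup are hypothetical and may not match the module's internals.

```python
# Hypothetical registry; the real module may store connectors differently.
CONNECTOR_FACTORY = {}


def register_connector(name):
    """Return a class decorator that records the class under `name`."""
    def decorator(cls):
        CONNECTOR_FACTORY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


def ConnectorFactory(connector_name):
    """Look up a previously registered connector class by exact name."""
    try:
        return CONNECTOR_FACTORY[connector_name]
    except KeyError:
        raise ValueError(f"{connector_name!r} is not a registered connector") from None


@register_connector("linear")
class LinearConnector:  # hypothetical connector, for illustration only
    pass


print(ConnectorFactory("linear").__name__)  # LinearConnector
```

The decorator returns the class unchanged, so registration has no effect on how the class itself is used.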

Model.register_llm(name)

Register an LLM with LLMFactory under the given name.

Parameters:: name (str) – name under which the LLM is registered

Model.register_vision_tower(name)

Register a VisionTower model with VisionTowerFactory under the given name.

Parameters:: name (str) – name under which the VisionTower is registered

Model.VisionTowerFactory(vision_tower_name)

Return the VisionTower model registered under vision_tower_name.

Returns: the matching VisionTower model

Parameters:: vision_tower_name (str) – name of the VisionTower

Model.LLMFactory(model_name_or_path)

Return the LLM model that corresponds to model_name_or_path.

Returns: the matching LLM model

Parameters:: model_name_or_path (str) – name or path of the LLM
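Because model_name_or_path may be a full checkpoint path rather than a short registered name, a factory of this kind has to decide how the two are matched. The sketch below assumes case-insensitive substring matching that prefers the longest registered name. LLM_FACTORY, the stub classes and the matching rule are all hypothetical.

```python
# Hypothetical registry contents; stub classes stand in for real LLM classes.
class TinyLlamaStub: ...
class PhiStub: ...
class Phi2Stub: ...


LLM_FACTORY = {"tinyllama": TinyLlamaStub, "phi": PhiStub, "phi-2": Phi2Stub}


def LLMFactory(model_name_or_path):
    """Return the registered class whose name occurs in model_name_or_path.

    Matching ignores case, and the longest matching name wins, so that
    'phi-2' is chosen over 'phi' for a path such as 'microsoft/phi-2'.
    """
    path = model_name_or_path.lower()
    matches = [name for name in LLM_FACTORY if name.lower() in path]
    if not matches:
        raise ValueError(f"no registered LLM matches {model_name_or_path!r}")
    return LLM_FACTORY[max(matches, key=len)]


print(LLMFactory("TinyLlama/TinyLlama-1.1B-Chat-v1.0").__name__)  # TinyLlamaStub
print(LLMFactory("microsoft/phi-2").__name__)                     # Phi2Stub
```

Preferring the longest match keeps a short name such as "phi" from shadowing a more specific one; an exact-match lookup would instead require callers to pass the registered name itself.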

Model.ConnectorFactory(connector_name)

Return the connector model registered under connector_name.

Returns: the matching connector model

Parameters:: connector_name (str) – name of the connector

class Model.VisionTower

Bases: object

Loads a VisionTower model by vision_tower_name and extracts image features.

__init__(cfg)

Initialize the VisionTower model.

Parameters:: cfg (dict) – configuration

load_model(vision_tower_name)

Load VisionTower model.

Parameters:: vision_tower_name (str) – name of the VisionTower model to load

forward(x)

Extract image features from the input image.

Returns: image features

Return type: tensor

Parameters:: x (tensor) – input image
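The three methods above suggest a simple lifecycle: the constructor reads the tower name from cfg and calls load_model, and forward runs the loaded encoder on an image. The sketch below assumes that flow in plain Python. The cfg key "vision_tower_name", the ENCODERS table and the toy patch encoder are hypothetical stand-ins for loading a pretrained vision model.

```python
from typing import Callable, Dict, List

Image = List[float]           # stand-in for an image tensor
Features = List[List[float]]  # one feature vector per patch

# Hypothetical encoders keyed by name; a real VisionTower would load a
# pretrained vision model here instead.
ENCODERS: Dict[str, Callable[[Image], Features]] = {
    # Toy "patch" encoder: splits the flat image into 2-value patches.
    "toy-patch2": lambda img: [img[i:i + 2] for i in range(0, len(img), 2)],
}


class VisionTower:
    def __init__(self, cfg: dict):
        self._encoder = None
        self.load_model(cfg["vision_tower_name"])  # assumed cfg key

    def load_model(self, vision_tower_name: str) -> None:
        if vision_tower_name not in ENCODERS:
            raise ValueError(f"unknown vision tower {vision_tower_name!r}")
        self._encoder = ENCODERS[vision_tower_name]

    def forward(self, x: Image) -> Features:
        return self._encoder(x)


tower = VisionTower({"vision_tower_name": "toy-patch2"})
print(tower.forward([0.1, 0.2, 0.3, 0.4]))  # [[0.1, 0.2], [0.3, 0.4]]
```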

class Model.Connector

Bases: object

Load the Connector model and weights.

__init__(config)

Initialize the connector model.

Parameters:: config (dict) – configuration
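In a LLaVA-style model the connector presumably maps VisionTower features into the LLM's embedding space. A minimal sketch, assuming the simplest choice of a single linear layer applied to every visual token. LinearConnector and the "weight"/"bias" config keys are hypothetical, and real weights would be loaded from a checkpoint rather than passed in the config.

```python
from typing import List

Matrix = List[List[float]]


class LinearConnector:
    """Hypothetical connector: y = W x + b for every visual token,
    mapping vision_dim -> llm_hidden_dim."""

    def __init__(self, config: dict):
        self.weight: Matrix = config["weight"]   # shape (llm_hidden_dim, vision_dim)
        self.bias: List[float] = config["bias"]  # shape (llm_hidden_dim,)

    def forward(self, features: Matrix) -> Matrix:
        # One output row per input feature vector (visual token).
        return [
            [sum(w * x for w, x in zip(row, feat)) + b
             for row, b in zip(self.weight, self.bias)]
            for feat in features
        ]


connector = LinearConnector({"weight": [[1, 0], [0, 1], [1, 1]], "bias": [0, 0, 1]})
print(connector.forward([[2, 3]]))  # [[2, 3, 6]]
```

Here a 2-dimensional vision feature is projected to a 3-dimensional "LLM" space; only the output width needs to match the LLM's hidden size.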

class Model.TinyLlavaPreTrainedModel

Bases: object

Creates a pretrained TinyLLaVA model from a TinyLlavaConfig configuration.

Parameters:: TinyLlavaConfig (dict) – the TinyLLaVA model configuration

class Model.TinyLlavaForConditionalGeneration

Bases: TinyLlavaPreTrainedModel

Inherits from TinyLlavaPreTrainedModel and creates a pretrained TinyLLaVA model.

__init__(TinyLlavaConfig)

Initialize the tokenizer, LLM, Connector, and VisionTower.

Parameters:: TinyLlavaConfig (dict) – the TinyLLaVA model configuration

forward(input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, images, image_sizes, return_dict)

Call language_model.forward to compute next-token logits over the vocabulary (and the language-modeling loss when labels are given).

Returns: next-token logits over the vocabulary, plus the loss when labels are given

Return type: Tuple or CausalLMOutputWithPast

Parameters:

input_ids (tensor) – input token IDs
attention_mask (tensor) – attention mask
position_ids (tensor) – position IDs
past_key_values (list(tensor)) – cached key/value states from previous decoding steps
inputs_embeds (tensor) – precomputed input embeddings
labels (tensor) – labels used to compute the loss
use_cache (bool) – whether to use the key/value cache
output_attentions (bool) – whether to return attention weights
output_hidden_states (bool) – whether to return hidden states
images (tensor) – input images
image_sizes (list) – sizes of the input images
return_dict (bool) – whether to return a CausalLMOutputWithPast instead of a plain tuple
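One plausible control flow for forward, modeled on LLaVA-style code and not confirmed by this reference: when inputs_embeds is not supplied, text and images are first merged by prepare_inputs_labels_for_multimodal, and the resulting embeddings are passed to language_model.forward in place of input_ids. TinyLlavaSketch, StubLanguageModel and the placeholder embedding step are hypothetical.

```python
class StubLanguageModel:
    """Hypothetical stand-in that returns the keyword arguments it receives."""

    def forward(self, **kwargs):
        return kwargs


class TinyLlavaSketch:
    def __init__(self, language_model):
        self.language_model = language_model

    def prepare_inputs_labels_for_multimodal(self, input_ids, position_ids,
                                             attention_mask, past_key_values,
                                             labels, images, image_sizes=None):
        # Placeholder: a real implementation would splice image features into
        # the text embeddings. Here each token id becomes a 1-d "embedding".
        new_input_embeds = [[float(t)] for t in input_ids]
        return position_ids, attention_mask, past_key_values, new_input_embeds, labels

    def forward(self, input_ids=None, attention_mask=None, position_ids=None,
                past_key_values=None, inputs_embeds=None, labels=None,
                images=None, image_sizes=None, **kwargs):
        if inputs_embeds is None:
            (position_ids, attention_mask, past_key_values,
             inputs_embeds, labels) = self.prepare_inputs_labels_for_multimodal(
                input_ids, position_ids, attention_mask, past_key_values,
                labels, images, image_sizes)
            input_ids = None  # the embeddings now replace the token ids
        # Remaining flags (use_cache, return_dict, ...) pass straight through.
        return self.language_model.forward(
            input_ids=input_ids, attention_mask=attention_mask,
            position_ids=position_ids, past_key_values=past_key_values,
            inputs_embeds=inputs_embeds, labels=labels, **kwargs)


model = TinyLlavaSketch(StubLanguageModel())
out = model.forward(input_ids=[5, 7], images="image")
print(out["input_ids"], out["inputs_embeds"])  # None [[5.0], [7.0]]
```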

generate(inputs, images, image_sizes)

Call language_model.generate to generate an answer.

Returns: the generated answer

Return type: torch.LongTensor or ModelOutput (as returned by language_model.generate)

Parameters:

inputs (tensor) – input token IDs
images (tensor) – input images
image_sizes (tensor) – sizes of the input images

encode_images(images)

Extract image features from the input images.

Returns: image features

Return type: tensor

Parameters:: images (tensor) – input image
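Given the components this class initializes, encode_images plausibly chains the VisionTower and the Connector so that image features end up in the LLM's embedding space. The sketch below assumes that composition; the encode_images signature and the toy tower and connector functions are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def encode_images(images: List[float],
                  vision_tower: Callable[[List[float]], Matrix],
                  connector: Callable[[Matrix], Matrix]) -> Matrix:
    """Hypothetical composition: images -> vision features -> LLM-space features."""
    image_features = vision_tower(images)
    return connector(image_features)


def toy_tower(images: List[float]) -> Matrix:
    # Each "pixel" becomes a 2-d feature vector.
    return [[p, p] for p in images]


def toy_connector(features: Matrix) -> Matrix:
    # Project each 2-d feature down to 1-d by summing its components.
    return [[a + b] for a, b in features]


print(encode_images([1.0, 2.0], toy_tower, toy_connector))  # [[2.0], [4.0]]
```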

prepare_inputs_for_generation(input_ids, past_key_values, inputs_embeds)

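Prepare the input token IDs and input images for a generation step.

Returns: a dict containing the input token IDs and input images

Return type: dict

Parameters:

input_ids (tensor) – input token IDs
past_key_values (list(tensor)) – cached key/value states from previous decoding steps
inputs_embeds (tensor) – input embeddings
images (tensor) – input images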
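Since images is not among the positional parameters in the signature, one common approach in Hugging Face style generation is to accept it as a keyword argument, let the text-only base implementation build its dict, and then re-attach the images. The sketch below assumes that approach; BaseLM and MultimodalLM are hypothetical stand-ins.

```python
class BaseLM:
    """Hypothetical stand-in for a text-only prepare_inputs_for_generation."""

    def prepare_inputs_for_generation(self, input_ids, past_key_values=None,
                                      inputs_embeds=None, **kwargs):
        return {"input_ids": input_ids, "past_key_values": past_key_values,
                "inputs_embeds": inputs_embeds}


class MultimodalLM(BaseLM):
    def prepare_inputs_for_generation(self, input_ids, past_key_values=None,
                                      inputs_embeds=None, **kwargs):
        # Remove the image arguments so the text-only base method never sees them.
        images = kwargs.pop("images", None)
        image_sizes = kwargs.pop("image_sizes", None)
        inputs = super().prepare_inputs_for_generation(
            input_ids, past_key_values=past_key_values,
            inputs_embeds=inputs_embeds, **kwargs)
        # Re-attach them so the next forward() call can encode the images.
        if images is not None:
            inputs["images"] = images
        if image_sizes is not None:
            inputs["image_sizes"] = image_sizes
        return inputs


inputs = MultimodalLM().prepare_inputs_for_generation([1, 2], images="image")
print(sorted(inputs))  # ['images', 'input_ids', 'inputs_embeds', 'past_key_values']
```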

prepare_inputs_labels_for_multimodal(input_ids, position_ids, attention_mask, past_key_values, labels, images, image_sizes)

Prepare combined text and image inputs for multimodal processing, producing new input embeddings and labels.

Returns:

position_ids (tensor) – the position id after processing
attention_mask (tensor) – the attention mask after processing
past_key_values (list(tensor)) – the key/value cache, unchanged from the input
new_input_embeds (tensor) – the input embeddings after processing
new_labels (tensor) – the labels after processing

Parameters:

input_ids (tensor) – input token IDs
position_ids (tensor) – position IDs
attention_mask (tensor) – attention mask
past_key_values (list(tensor)) – cached key/value states from previous decoding steps
labels (tensor) – labels used to compute the loss
images (tensor) – input images
image_sizes (list) – sizes of the input images
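In LLaVA-style models the core of this step is replacing each image placeholder token with that image's feature vectors and masking those positions out of the loss. A minimal single-sequence sketch, assuming the placeholder id -200 and ignore label -100 used in LLaVA-derived code (both are assumptions here, as are splice_image_features, the toy embedding function and the feature values). Batching, padding and past_key_values handling are omitted.

```python
from typing import Callable, List, Tuple

IMAGE_TOKEN_INDEX = -200  # assumed placeholder id marking an image position
IGNORE_INDEX = -100       # assumed label value that the loss skips

Vector = List[float]


def splice_image_features(
    input_ids: List[int],
    labels: List[int],
    image_features: List[List[Vector]],
    embed: Callable[[int], Vector],
) -> Tuple[List[int], List[int], List[Vector], List[int]]:
    """Return (position_ids, attention_mask, new_input_embeds, new_labels)."""
    new_embeds: List[Vector] = []
    new_labels: List[int] = []
    images = iter(image_features)  # one entry per placeholder, in order
    for token, label in zip(input_ids, labels):
        if token == IMAGE_TOKEN_INDEX:
            feats = next(images)
            new_embeds.extend(feats)                        # one embedding per visual token
            new_labels.extend([IGNORE_INDEX] * len(feats))  # never predict image tokens
        else:
            new_embeds.append(embed(token))
            new_labels.append(label)
    # The sequence length changed, so positions and mask are rebuilt.
    position_ids = list(range(len(new_embeds)))
    attention_mask = [1] * len(new_embeds)
    return position_ids, attention_mask, new_embeds, new_labels


def toy_embed(token: int) -> Vector:
    return [float(token)]


pos, mask, embeds, labs = splice_image_features(
    input_ids=[1, IMAGE_TOKEN_INDEX, 2],
    labels=[1, IGNORE_INDEX, 2],
    image_features=[[[9.0], [9.5]]],  # one image with two visual tokens
    embed=toy_embed,
)
print(embeds)  # [[1.0], [9.0], [9.5], [2.0]]
print(labs)    # [1, -100, -100, 2]
```

A single placeholder expands into as many positions as the image has visual tokens, which is why the returned position IDs and attention mask differ in length from the inputs.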