Model module

Model.register_connector(name)

Register a connector model with ConnectorFactory.

Parameters:

name (str) – name under which the connector is registered

Model.register_llm(name)

Register an LLM with LLMFactory.

Parameters:

name (str) – name under which the LLM is registered

Model.register_vision_tower(name)

Register a VisionTower model with VisionTowerFactory.

Parameters:

name (str) – name under which the VisionTower is registered
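The three register_* functions imply a name-to-class registry that the matching factory reads from. Below is a minimal sketch of that pattern, assuming register_connector is used as a class decorator. CONNECTOR_REGISTRY and MLPConnector are hypothetical names, and rejecting duplicate names is a choice made for this sketch, not documented behaviour.

```python
from typing import Callable, Dict

# Hypothetical registry table; how the real module stores registrations is not documented here.
CONNECTOR_REGISTRY: Dict[str, type] = {}


def register_connector(name: str) -> Callable[[type], type]:
    """Return a class decorator that records the decorated class under `name`."""
    def decorator(cls: type) -> type:
        if name in CONNECTOR_REGISTRY:
            # This sketch refuses duplicate names instead of silently overwriting them.
            raise ValueError(f"connector {name!r} is already registered")
        CONNECTOR_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


@register_connector("mlp")
class MLPConnector:
    """Hypothetical connector, defined only to show registration."""
```

register_llm and register_vision_tower would follow the same shape, each with its own table.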

Model.VisionTowerFactory(vision_tower_name)

Get the VisionTower model that corresponds to vision_tower_name.

Returns: the matching VisionTower model

Parameters:

vision_tower_name (str) – name of the VisionTower

Model.LLMFactory(model_name_or_path)

Get the LLM that corresponds to model_name_or_path.

Returns: the matching LLM

Parameters:

model_name_or_path (str) – name or path of the LLM

Model.ConnectorFactory(connector_name)

Get the connector model that corresponds to connector_name.

Returns: the matching connector model

Parameters:

connector_name (str) – name of the connector
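LLMFactory accepts model_name_or_path, so a lookup has to cope with full checkpoint paths and not only short registry names. One plausible approach, sketched here as an assumption, is a case-insensitive substring match against the registered keys. The registry contents, class names, and llm_factory are hypothetical. This sketch returns the registered class; whether the documented factories return a class or an instance is not stated above.

```python
from typing import Dict

# Hypothetical registry contents, standing in for classes added via register_llm.
LLM_REGISTRY: Dict[str, type] = {}


class TinyLlamaLLM:
    """Hypothetical LLM wrapper class."""


class Phi2LLM:
    """Hypothetical LLM wrapper class."""


LLM_REGISTRY["tinyllama"] = TinyLlamaLLM
LLM_REGISTRY["phi"] = Phi2LLM


def llm_factory(model_name_or_path: str) -> type:
    """Return the first registered class whose key occurs in model_name_or_path.

    Matching is case-insensitive and substring-based, so a hub path such as
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0" resolves to the "tinyllama" entry.
    """
    target = model_name_or_path.lower()
    for key, cls in LLM_REGISTRY.items():
        if key in target:
            return cls
    raise KeyError(f"{model_name_or_path!r} is not registered")
```

Substring matching is convenient for paths but can be ambiguous when one key is contained in another, so registration order matters in this sketch.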

class Model.VisionTower

Bases: object

Load the VisionTower model specified by vision_tower_name and extract image features.

__init__(cfg)

Initialize the VisionTower model.

Parameters:

cfg (dict) – configuration

load_model(vision_tower_name)

Load the VisionTower model.

Parameters:

vision_tower_name (str) – name of the VisionTower model to load

forward(x)

Extract image features.

Returns: image features

Return type: tensor

Parameters:

x (tensor) – input image
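To make the __init__(cfg) / load_model(vision_tower_name) / forward(x) interface concrete, here is a toy stand-in. It is not a real vision encoder: it assumes a grayscale image given as nested lists, splits it into non-overlapping patches, and emits one small feature vector ([mean, max]) per patch, roughly mirroring the one-feature-per-patch output of ViT-style towers. ToyVisionTower and the patch_size config key are hypothetical.

```python
from typing import Dict, List, Optional

Image = List[List[float]]     # H x W grayscale image
Features = List[List[float]]  # num_patches x feature_dim


class ToyVisionTower:
    """Mirrors the documented __init__/load_model/forward interface with toy logic."""

    def __init__(self, cfg: Dict) -> None:
        # "patch_size" is a hypothetical config key used only in this sketch.
        self.patch_size: int = cfg.get("patch_size", 2)
        self.model_name: Optional[str] = None

    def load_model(self, vision_tower_name: str) -> None:
        # A real tower would load pretrained weights here; the toy only records the name.
        self.model_name = vision_tower_name

    def forward(self, x: Image) -> Features:
        p = self.patch_size
        h, w = len(x), len(x[0])
        features: Features = []
        # Visit non-overlapping p x p patches in row-major order, dropping ragged edges.
        for top in range(0, h - h % p, p):
            for left in range(0, w - w % p, p):
                patch = [x[r][c] for r in range(top, top + p) for c in range(left, left + p)]
                # One feature vector per patch: [mean, max].
                features.append([sum(patch) / len(patch), max(patch)])
        return features
```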

class Model.Connector

Bases: object

Load the Connector model and weights.

__init__(config)

Initialize the connector model.

Parameters:

config (dict) – configuration of the connector
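In LLaVA-style models the connector maps vision features into the LLM's embedding space. Only __init__(config) is documented above, so the forward method and the single linear layer below are assumptions; real connectors may be MLPs or other modules. ToyLinearConnector and the "weight"/"bias" config keys are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


class ToyLinearConnector:
    """Projects each vision feature vector to the LLM hidden size with y = W x + b."""

    def __init__(self, config: Dict) -> None:
        # Hypothetical keys: "weight" is out_dim x in_dim, "bias" has out_dim entries.
        self.weight: Matrix = config["weight"]
        self.bias: List[float] = config.get("bias", [0.0] * len(self.weight))

    def forward(self, features: Matrix) -> Matrix:
        # Apply the same projection independently to every patch feature x.
        return [
            [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.weight, self.bias)]
            for x in features
        ]
```

Here a 2-dimensional vision feature becomes a 3-dimensional "LLM" embedding; in practice the output width would match the LLM's hidden size.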

class Model.TinyLlavaPreTrainedModel

Bases: object

Create a pretrained TinyLLaVA model from a TinyLlavaConfig configuration.

Parameters:

TinyLlavaConfig (dict) – the configuration of the TinyLLaVA model

class Model.TinyLlavaForConditionalGeneration

Bases: TinyLlavaPreTrainedModel

Inherits from TinyLlavaPreTrainedModel and creates a pretrained TinyLLaVA model.

__init__(TinyLlavaConfig)

Initialize the tokenizer, LLM, connector, and VisionTower.

Parameters:

TinyLlavaConfig (dict) – the configuration of the TinyLLaVA model

forward(input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, images, image_sizes, return_dict)

Call language_model.forward to compute next-token scores (logits) over the vocabulary.

Returns: the next-token logits over the vocabulary

Return type: Tuple or CausalLMOutputWithPast

Parameters:
  • input_ids (tensor) – input token IDs

  • attention_mask (tensor) – attention mask

  • position_ids (tensor) – position IDs

  • past_key_values (list(tensor)) – cached key/value states from previous decoding steps

  • inputs_embeds (tensor) – input embeddings

  • labels (tensor) – labels for computing the language-modeling loss

  • use_cache (bool) – whether to use the key/value cache

  • output_attentions (bool) – whether to return attention weights

  • output_hidden_states (bool) – whether to return hidden states

  • images (tensor) – input image

  • image_sizes (list) – sizes of the input images

  • return_dict (bool) – whether to return a CausalLMOutputWithPast instead of a tuple
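In Hugging Face causal LMs, CausalLMOutputWithPast carries logits (unnormalized scores), not probabilities. When probabilities are needed, a softmax over the vocabulary at the last position converts them. The sketch below does this for a single position with plain Python lists; next_token_probs and greedy_next_token are hypothetical helpers.

```python
import math
from typing import List


def next_token_probs(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(logits: List[float]) -> int:
    # Softmax is monotonic, so the argmax of the logits is also the most probable token.
    return max(range(len(logits)), key=logits.__getitem__)
```

Because softmax preserves ordering, greedy decoding can pick the next token directly from the logits without normalizing.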

generate(inputs, images, image_sizes)

Call language_model.generate to generate an answer.

Returns: the generated answer as token IDs

Return type: GenerateOutput or torch.LongTensor

Parameters:
  • inputs (tensor) – input token IDs

  • images (tensor) – input image

  • image_sizes (tensor) – sizes of the input images
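language_model.generate supports several decoding strategies; the simplest, greedy decoding, repeatedly runs a forward pass and appends the highest-scoring token. The sketch below shows only that loop. It ignores images (in the real model the image features would presumably be merged into the prompt first), and greedy_generate and toy_score are hypothetical.

```python
from typing import Callable, List

Scorer = Callable[[List[int]], List[float]]


def greedy_generate(score: Scorer, input_ids: List[int], max_new_tokens: int, eos_id: int) -> List[int]:
    """Append the highest-scoring token until EOS appears or the length budget runs out."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = score(ids)  # stands in for one forward pass over the current sequence
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_score(ids: List[int]) -> List[float]:
    """Toy scorer over a 5-token vocabulary that always prefers (last_id + 1) % 5."""
    logits = [0.0] * 5
    logits[(ids[-1] + 1) % 5] = 1.0
    return logits
```

For example, greedy_generate(toy_score, [1], max_new_tokens=10, eos_id=4) stops as soon as token 4 is produced.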

encode_images(images)

Extract image features.

Returns: image features

Return type: tensor

Parameters:

images (tensor) – input image
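Since the model holds both a VisionTower and a Connector, a natural reading is that encode_images runs the tower and then the connector, so the returned features live in the LLM's embedding space. That ordering is an assumption here. make_encode_images, toy_tower, and toy_connector are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def make_encode_images(vision_tower: Callable[[Matrix], Matrix],
                       connector: Callable[[Matrix], Matrix]) -> Callable[[Matrix], Matrix]:
    """Compose tower and connector into a single image encoder."""
    def encode_images(images: Matrix) -> Matrix:
        features = vision_tower(images)  # num_patches x vision_dim
        return connector(features)       # num_patches x llm_hidden_dim
    return encode_images


def toy_tower(images: Matrix) -> Matrix:
    # One 1-dimensional "patch feature" per image row.
    return [[float(sum(row))] for row in images]


def toy_connector(features: Matrix) -> Matrix:
    # Map each 1-dimensional feature to 2 dimensions.
    return [[v[0], 2.0 * v[0]] for v in features]


encode_images = make_encode_images(toy_tower, toy_connector)
```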

prepare_inputs_for_generation(input_ids, past_key_values, inputs_embeds)

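Prepare the input token IDs and input images for generation.

Returns: a dict containing the input token IDs and input images

Return type: dict

Parameters:
  • input_ids (tensor) – input token IDs

  • past_key_values (list(tensor)) – cached key/value states from previous decoding steps

  • inputs_embeds (tensor) – input embeddings

  • images (tensor) – input image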
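The signature lists no images argument, yet images is documented as a parameter. A common pattern, assumed here, is that images and image_sizes arrive as keyword arguments and are copied into the per-step input dict so the next forward call receives them. This simplified sketch skips the cache-dependent trimming a full implementation would normally do.

```python
from typing import Any, Dict, Optional


def prepare_inputs_for_generation(input_ids: Any,
                                  past_key_values: Optional[Any] = None,
                                  inputs_embeds: Optional[Any] = None,
                                  **kwargs: Any) -> Dict[str, Any]:
    """Build the per-step model inputs and carry the image arguments along."""
    # Assumption: images and image_sizes are passed as keyword arguments.
    images = kwargs.pop("images", None)
    image_sizes = kwargs.pop("image_sizes", None)
    inputs: Dict[str, Any] = {
        "input_ids": input_ids,
        "past_key_values": past_key_values,
        "inputs_embeds": inputs_embeds,
    }
    # Only add image entries that were actually supplied.
    if images is not None:
        inputs["images"] = images
    if image_sizes is not None:
        inputs["image_sizes"] = image_sizes
    return inputs
```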

prepare_inputs_labels_for_multimodal(input_ids, position_ids, attention_mask, past_key_values, labels, images, image_sizes)

Prepare inputs containing text and image data for multimodal processing.

Returns:

  • position_ids (tensor) – position IDs after processing

  • attention_mask (tensor) – attention mask after processing

  • past_key_values (list(tensor)) – the input key/value cache, returned unchanged

  • new_input_embeds (tensor) – input embeddings combining the text and image data

  • new_labels (tensor) – labels after processing

Parameters:
  • input_ids (tensor) – input token IDs

  • position_ids (tensor) – position IDs

  • attention_mask (tensor) – attention mask

  • past_key_values (list(tensor)) – cached key/value states from previous decoding steps

  • labels (tensor) – labels for computing the language-modeling loss

  • images (tensor) – input image

  • image_sizes (list) – sizes of the input images
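The core idea of combining text and image data can be sketched for a single, unpadded sample: wherever an image placeholder token appears, splice in one embedding per image feature, and give those positions a label that the loss ignores. IMAGE_TOKEN_INDEX is an assumed placeholder ID; IGNORE_INDEX = -100 matches PyTorch's CrossEntropyLoss default ignore_index. splice_image_features is hypothetical, and the real method also deals with batching, padding, truncation, and passing past_key_values through.

```python
from typing import Callable, List, Tuple

IMAGE_TOKEN_INDEX = -200  # assumed placeholder ID marking where the image goes
IGNORE_INDEX = -100       # label value excluded from the loss

Vector = List[float]


def splice_image_features(
    input_ids: List[int],
    labels: List[int],
    image_features: List[Vector],
    embed: Callable[[int], Vector],
) -> Tuple[List[Vector], List[int], List[int], List[int]]:
    """Replace the image placeholder with image features for one sample.

    Returns (new_input_embeds, new_labels, attention_mask, position_ids).
    """
    new_embeds: List[Vector] = []
    new_labels: List[int] = []
    for tok, lab in zip(input_ids, labels):
        if tok == IMAGE_TOKEN_INDEX:
            # One embedding per image feature; these positions are never prediction targets.
            new_embeds.extend(image_features)
            new_labels.extend([IGNORE_INDEX] * len(image_features))
        else:
            new_embeds.append(embed(tok))
            new_labels.append(lab)
    # The sequence grew, so the mask and positions are rebuilt for the new length.
    attention_mask = [1] * len(new_embeds)
    position_ids = list(range(len(new_embeds)))
    return new_embeds, new_labels, attention_mask, position_ids
```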