
Download the model and place it in the models directory

Natural Question Answering SDK [English]

The model is trained on Google's Natural Questions dataset (100k Google search queries paired with relevant passages from Wikipedia). Natural Questions (NQ) is a large-scale corpus for training and evaluating open-domain question answering systems. Before its release, there was no large-scale public dataset of naturally occurring questions (the kind people ask when seeking information) and their answers for training and evaluating question answering models. NQ replicates the end-to-end process by which people find answers to questions.


SDK Features:

  • Query / passage [title, text] vector extraction
  • Similarity calculation

Running Example - QANaturalQuestionsExample

After running successfully, you should see the following information on the command line:

...
# Test sentence:
[INFO] - query: How many people live in London?
# A passage is a <title, text> pair.
[INFO] - passage [title, text]: [London, London has 9,787,426 inhabitants at the 2011 census.]

# Vector dimension:
[INFO] - Vector dimension: 768

# Generating vectors:
[INFO] - query embeddings: [0.04629234, -0.33281654, ..., -0.22015738, -0.06693681]
[INFO] - passage[title, text] embeddings: [-0.015913313, -0.10886402, ..., 0.48449898, -0.32266212]

# Calculating similarity:
[INFO] - Similarity: 0.650292
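
The similarity score in the log above is computed from the query and passage embeddings. This README does not state which measure the SDK uses; cosine similarity is a common choice for sentence embeddings, so the following is a minimal sketch under that assumption. The function name `cosine_similarity` and the toy vectors are illustrative and are not part of the SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for the 768-dimensional embeddings.
query_vec = [1.0, 2.0, 3.0]
passage_vec = [2.0, 4.0, 6.0]
print(cosine_similarity(query_vec, passage_vec))
```

Vectors pointing in the same direction score close to 1, orthogonal vectors score 0, and the score does not depend on vector length.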

Open source algorithm

1. Open source algorithms used by the SDK

2. How to export the model?

  • how_to_convert_your_model_to_torchscript

  • Export the CPU and GPU models separately. PyTorch models are special in that CPU and GPU models are not interchangeable, so each must be exported for its own device.

  • CPU: device='cpu'

  • GPU: device='cuda'

  • export_model_natural_questions.py

from sentence_transformers import SentenceTransformer
import torch

# Load the pretrained model on the CPU and switch to inference mode.
# model = SentenceTransformer('stsb-distilbert-base', device='cpu')
model = SentenceTransformer('nq-distilbert-base-v1', device='cpu')
model.eval()

batch_size = 1
max_seq_length = 128
device = torch.device("cpu")
model.to(device)

# Dummy inputs, used only to trace the computation graph.
input_ids = torch.zeros(batch_size, max_seq_length, dtype=torch.long).to(device)
input_mask = torch.zeros(batch_size, max_seq_length, dtype=torch.long).to(device)
# DistilBERT takes no token type ids, so only input_ids and attention_mask are passed.
input_features = {'input_ids': input_ids, 'attention_mask': input_mask}

# strict=False lets the tracer accept the dict that the model returns.
traced_model = torch.jit.trace(model, example_inputs=input_features, strict=False)
traced_model.save("models/nq-distilbert-base-v1/nq-distilbert-base-v1.pt")
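
Because the CPU and GPU models have to be exported separately, the script above can be wrapped in a function that takes the device as a parameter. The following is a sketch only. The helper names (`resolve_device`, `output_path`, `export`) and the per-device file suffix are hypothetical, and the SDK may expect the original file name, so rename the output if needed.

```python
from pathlib import Path


def resolve_device(name: str) -> str:
    """Map a friendly device name to the device string PyTorch expects."""
    aliases = {"cpu": "cpu", "gpu": "cuda", "cuda": "cuda"}
    try:
        return aliases[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported device: {name!r}")


def output_path(model_name: str, device: str, root: str = "models") -> Path:
    # Hypothetical naming scheme: one file per device, so the two
    # exports never overwrite each other.
    return Path(root) / model_name / f"{model_name}-{resolve_device(device)}.pt"


def export(model_name: str, device: str, max_seq_length: int = 128) -> Path:
    # Heavy imports are deferred so the helpers above work without PyTorch.
    import torch
    from sentence_transformers import SentenceTransformer

    dev = resolve_device(device)
    model = SentenceTransformer(model_name, device=dev)
    model.eval()

    # Dummy inputs on the target device, used only for tracing.
    ids = torch.zeros(1, max_seq_length, dtype=torch.long, device=dev)
    mask = torch.zeros(1, max_seq_length, dtype=torch.long, device=dev)
    features = {"input_ids": ids, "attention_mask": mask}

    traced = torch.jit.trace(model, example_inputs=features, strict=False)
    path = output_path(model_name, device)
    path.parent.mkdir(parents=True, exist_ok=True)
    traced.save(str(path))
    return path
```

Calling `export("nq-distilbert-base-v1", "cpu")` and then `export("nq-distilbert-base-v1", "gpu")` would produce one file per device. The GPU call requires a CUDA-capable PyTorch installation.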