With the current AI craze and its impact across the industry, we want to take the opportunity to explore how to get started with AI hands-on. In today’s article we will explore how to use the HuggingFace platform on Ubuntu. HuggingFace is a company known for its work in the field of Natural Language Processing (NLP). It has gained immense popularity thanks to its open-source library “Transformers,” which provides state-of-the-art NLP models, architectures, and tools that are easy to use, train, and deploy. Ubuntu is one of the most popular Linux distributions, known for its user-friendly interface and robust performance. Combined with the power of HuggingFace’s Transformers library, it becomes an ideal environment for diving into AI and NLP. In this guide, we’ll walk through setting up and using HuggingFace’s Transformers on an Ubuntu system.
Setting up your Ubuntu environment
Before we start with HuggingFace, let’s ensure our system is up-to-date:
sudo apt update
sudo apt upgrade
Installing Python and Pip
Although Ubuntu ships with Python, we need to make sure Python 3 and the package manager pip are installed:
sudo apt install python3 python3-pip
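To confirm everything is in place, you can check the installed versions (the exact numbers will depend on your Ubuntu release):
python3 --version
pip3 --version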
Installing HuggingFace’s Transformers and PyTorch
With Python set up, installing Transformers and its dependencies is straightforward:
pip3 install transformers
pip3 install torch # PyTorch, the backend for many HuggingFace models
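As a quick sanity check, make sure both libraries import cleanly (the versions printed will vary depending on when you install):
python3 -c "import transformers, torch; print(transformers.__version__, torch.__version__)"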
Using Pre-trained Models
Using pre-trained models is straightforward. Here’s how to perform sentiment analysis with a BERT model fine-tuned for sentiment, which predicts a rating from 1 to 5 stars:
from transformers import BertTokenizer, BertForSequenceClassification
from torch.nn.functional import softmax
import torch
# Load pre-trained model and tokenizer
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = BertForSequenceClassification.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
# Tokenize input and get model predictions
text = "HuggingFace on Ubuntu is awesome!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
outputs = model(**inputs).logits
probs = softmax(outputs, dim=1) # Get probabilities
# Display results
predicted_class = torch.argmax(probs, dim=1).item()
print(f"Predicted sentiment: {predicted_class + 1} stars")  # class indices 0-4 map to 1-5 stars
Fine-tuning on Custom Datasets
Suppose we have a dataset custom_dataset:
custom_dataset = [
("HuggingFace is great on Ubuntu!", 1),
("I don't like this setup.", 0),
...
]
where 1 denotes positive sentiment and 0 negative.
Here’s how to fine-tune:
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from torch.utils.data import Dataset
import torch

# Define a custom dataset that tokenizes each example on the fly
class CustomDataset(Dataset):
    def __init__(self, data, tokenizer):
        self.data = data
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        text, label = self.data[idx]
        encoding = self.tokenizer(text, truncation=True, padding='max_length', max_length=512, return_tensors='pt')
        return {
            'input_ids': encoding['input_ids'].squeeze(),
            'attention_mask': encoding['attention_mask'].squeeze(),
            'labels': torch.tensor(label),  # Trainer expects the label under the 'labels' key
        }

# Prepare dataset
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
dataset = CustomDataset(custom_dataset, tokenizer)

# Load BERT with a two-class classification head (positive/negative)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Training setup
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=8,
    logging_dir='./logs',
    logging_steps=10,
    num_train_epochs=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,  # Trainer builds its own DataLoader internally
)

# Fine-tune!
trainer.train()
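Once training finishes, you will likely want to persist the fine-tuned weights so they can be reloaded later. A minimal sketch, where the output directory name is just an example:
# Save the fine-tuned model and tokenizer (directory name is arbitrary)
trainer.save_model("./fine-tuned-sentiment")
tokenizer.save_pretrained("./fine-tuned-sentiment")
# Reload later for inference
model = BertForSequenceClassification.from_pretrained("./fine-tuned-sentiment")
tokenizer = BertTokenizer.from_pretrained("./fine-tuned-sentiment")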
Conclusion
Ubuntu provides a solid foundation for anyone looking to delve into AI and NLP. With its straightforward setup and the power of HuggingFace’s Transformers, developers can seamlessly integrate AI into their applications. As always, this is just the beginning. Dive deeper into the documentation, explore various models, and make the most of the capabilities that Ubuntu and HuggingFace have to offer.