Fine-tune Llama 2 using QLoRA and Deploy it on Amazon SageMaker with AWS Inferentia2 | Amazon Web Services
In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploy the fine-tuned model on AWS Inferentia2. We use the AWS Neuron software…