Prompting, Generative AI and much more …
Welcome back! Let’s discuss some important topics related to LLMs.
If you haven’t checked out the basics of LLMs yet, start with Introduction to LLMs.
Topics discussed on this page -
- Prompting and prompt engineering
- Generative configuration
- Generative AI project lifecycle
Prompting and prompt engineering -
The text that you feed into the model is called the prompt.
The act of using the model to generate text is known as inference.
The output text is known as completion.
The full amount of text or the memory that is available to use for the prompt is called the context window.
You may have to revise the language in your prompt, or the way it’s written, several times to get the model to behave the way you want. This work of developing and improving the prompt is known as prompt engineering.

One powerful strategy to get the model to produce better outcomes is to include examples of the task that you want the model to carry out inside the prompt. Providing examples inside the context window is called in-context learning.
In-context learning? What does it mean? Let’s look at some examples.
Within the prompt, we can ask the model to classify the sentiment of a review as positive or negative. For example, we can ask -
Classify the review
“I love this movie”
Sentiment ?
It outputs Positive, which is right. This method of including your input data within the prompt, with no examples, is called zero-shot inference.
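To make this concrete, here is a minimal sketch of zero-shot inference using the Hugging Face `transformers` library. The checkpoint, prompt wording and token limit are just illustrative choices, not recommendations.

```python
# A minimal zero-shot inference sketch with Hugging Face transformers.
# "gpt2" is just an example checkpoint; any causal LM from the Hub works.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = 'Classify the review.\n"I love this movie"\nSentiment:'

# The model simply continues the prompt; a capable model should
# complete it with "Positive".
output = generator(prompt, max_new_tokens=5)
print(output[0]["generated_text"])
```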
Smaller models, on the other hand, can struggle with this. With an earlier, much smaller model such as GPT-2, the model often doesn’t follow the instruction: while it does generate text with some relation to the prompt, it can’t figure out the details of the task and does not identify the sentiment. This is where providing an example within the prompt can improve performance. The inclusion of a single completed example is known as one-shot inference. For example -
Classify the review
“I love this movie”
Sentiment: Positive
…
Classify the review
“I hate movies”
Sentiment ?
With the completed first example showing the model what is expected, it outputs Negative for the second review. Sometimes one or two examples aren’t enough for the model to pick up the task, and we need to provide several; the process of giving multiple examples in the prompt is called few-shot inference.
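In code, a few-shot prompt is just the worked examples concatenated ahead of the new input inside one context window. A sketch, reusing the `generator` pipeline from the zero-shot example above; the final, unlabeled review is a hypothetical input.

```python
# Build a few-shot prompt: labeled examples first, new input last.
examples = [
    ('"I love this movie"', "Positive"),
    ('"I hate movies"', "Negative"),
]

few_shot_prompt = ""
for review, label in examples:
    few_shot_prompt += f"Classify the review.\n{review}\nSentiment: {label}\n\n"

# The new, unlabeled review the model should classify.
few_shot_prompt += 'Classify the review.\n"The plot was dull"\nSentiment:'

output = generator(few_shot_prompt, max_new_tokens=5)
print(output[0]["generated_text"])
```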
Generally, if you find that your model still isn’t performing well after including five or six examples, you should try fine-tuning it instead.
Generative Configuration
If you have used LLMs in playgrounds such as the Hugging Face website or AWS, you might have been presented with controls to adjust how the LLM behaves. Each model exposes a set of configuration parameters that influence the model’s output during inference.
Note:
These are different from the training parameters, which are learned during training. These configuration parameters are set at inference time and give you control over things like -
- Maximum number of tokens in the completion
- How creative the output is
- Top-p and top-k sampling techniques to help limit the random sampling
- The temperature value (applied within the final softmax layer of the model), which controls the randomness of the model output: the higher the temperature, the higher the randomness, and vice versa (see the sketches after this list)
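To make these knobs concrete, here is a sketch of passing them to `generate()` in `transformers`; the checkpoint and the specific values are placeholders, not tuned recommendations.

```python
# Inference-time configuration parameters passed to generate().
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,  # cap on the number of tokens in the completion
    do_sample=True,     # enable random sampling instead of greedy decoding
    top_k=50,           # sample only from the 50 most likely tokens
    top_p=0.9,          # ...or the smallest set with cumulative prob >= 0.9
    temperature=0.7,    # < 1.0 = less random, > 1.0 = more random
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```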
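And a small self-contained demo of what the temperature actually does: the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The logits here are made-up numbers.

```python
# How temperature reshapes the softmax distribution over the vocabulary.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])  # toy logits for a 3-token vocabulary

print(softmax_with_temperature(logits, 0.5))  # peaked: low randomness
print(softmax_with_temperature(logits, 1.0))  # standard softmax
print(softmax_with_temperature(logits, 2.0))  # flatter: high randomness
```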

Generative AI project lifecycle

Scope —
The most important step in any project is to define the scope as accurately and narrowly as you can.
LLMs are capable of carrying out many tasks, but their abilities depend on the size and architecture of the model. Things to consider while deciding the scope -
- What function will the LLM have in the specific application you build?
- How broad the model’s capabilities need to be, i.e. should it carry out many different tasks, including long-form text generation, with a high degree of capability, or is it specific to a narrower task like named entity recognition?
Select —
Once the scope is decided, you should choose either to train your model from scratch or to use an existing model, which you can then train further on the specific tasks related to the scope you decided in the first step.
Adapt and Align model —
You need to assess the model’s performance and carry out additional training if needed for your application; prompt engineering can sometimes be enough to get your model to perform well.
As models get trained, it’s important to ensure that they behave well and in a way that is aligned with human preferences. Use metrics and benchmarks to determine how well your model is performing and how well it is aligned to your preferences (see the sketch below).
This step can be highly iterative.
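As one concrete example of such a metric, generated text is often scored against human-written references with ROUGE. A minimal sketch using the Hugging Face `evaluate` library, with toy predictions and references:

```python
# Score a model's output against a reference with ROUGE.
# Requires: pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

predictions = ["the cat sat on the mat"]  # toy model output
references = ["the cat lay on the mat"]   # toy human reference

print(rouge.compute(predictions=predictions, references=references))
```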
Application Integration —
Once evaluated in the previous step, the model can be deployed into your infrastructure and integrated with your application.
An important step here is to optimize your model for deployment, to ensure the best use of compute resources so that the application provides the best user experience.
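One common optimization is quantization: loading the model with lower-precision weights to reduce memory use. A sketch using the `transformers` integration with the optional `bitsandbytes` package (assumes a GPU and the `accelerate` package are available; the checkpoint is a placeholder):

```python
# Load a model with 8-bit quantized weights for cheaper inference.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # let accelerate place layers on available devices
)
```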