
Building Four ML-Powered Language Applications with Bloom LLM

In this article, we explore BigScienceW's large language model called Bloom. You'll learn how to build ML-powered apps using Bloom and how to host them on Gradio.
Bloom is a Large Language Model (LLM) built collaboratively by more than 1,000 researchers from HuggingFace, EleutherAI, and over 250 other institutions. Researchers from more than 70 countries have come together under the umbrella of the BigScienceW community to build this LLM, in an effort comparable in scale to the scientific collaborations at organizations like CERN.
In this article, we'll explore Bloom LLM so that you can build ML-powered apps and host them on Gradio.




What Is Bloom LLM?

When we say Bloom is a large model, we're not underselling it. Trained on a massive corpus, it's remarkably similar in size to GPT-3 (176 billion parameters for Bloom, 175 billion for GPT-3). Apart from its humongous size, Bloom has other notable features:
  • It is trained on 46 natural languages and 13 programming languages
  • This multi-language approach gives Bloom a more inclusive worldview than GPT-3, OPT, or PaLM
  • The training corpus is 1.6 TB of pre-processed text, which was converted into 350B unique tokens
  • It was trained continuously for 117 days (!)
  • The model is released under the BigScience RAIL (Responsible AI License), which means that if you fine-tune this model or otherwise use it, you will have to release your work under the same license. The license also prohibits using Bloom for certain purposes, such as generating text that could violate applicable laws, publishing generated text without disclosing that it is machine-generated, and harassing or impersonating others.
If you want to learn more about Large Language Models, you should definitely check out this wonderful blog.

Zero-Shot and Few-Shot Learners

Large Language Models can be adapted to new tasks very quickly (in fact, here's an example where we trained GPT-3 with Dr. Who synopses). For some tasks, these models can perform well given just a few examples. These examples have come to be called prompts to a language model, and formatting the examples as input is referred to as prompt engineering.
Language generation based on prompts is a brilliant concept, and it can be done in two main ways: Zero-Shot predictions and Few-Shot predictions.
In Zero-shot predictions, you pass a prompt that gives the LLM a task description and ask it to generate text. For example, for zero-shot summarization, you can present a body of text to the LLM along with an instruction for it to follow, like 'In summary', 'tldr:', or even 'To explain to a 5-year-old'.
In Few-shot summarization, you instead present a few examples of texts and their summaries to the LLM first. You can then present a new text to the model and expect it to generate a summary. In other words, you give it a few examples vs. none.
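As a concrete (hypothetical) example, a zero-shot summarization prompt can simply append one of those instructions to the passage; the passage and wording below are just for illustration:
# Hypothetical zero-shot summarization prompt: the instruction "tldr:" is appended
# to the passage, and the model is expected to continue with a summary.
zero_shot_prompt = (
    "Bloom is an open large language model trained by the BigScience collaboration "
    "on 46 natural languages and 13 programming languages.\n"
    "tldr:"
)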
Prompt-based generation is also much simpler than fine-tuning. Let's look at some of the ways you can interact with an LLM, based on this amazing paper from OpenAI and Johns Hopkins University.
Zero-shot
The model predicts the answer when provided only a description of the task. No gradient updates are performed on the model. Example prompt -
Translate English to French: (This is the task description)
Cheese => (this is you prompting the LLM to complete the sentence)
One-shot
In addition to the task description, you provide the model with one example of what you expect it to produce. Example prompt -
Translate English to French: (Task description for the model)
Sea Otter => loutre de mer (One example for the model to learn from)
Cheese => (providing a prompt to LLM to follow the lead)
Few-shot
In addition to the task description, the model is provided with a few examples of the task. Example prompt -
Translate English to French: (Task description for the model)
Sea Otter => loutre de mer (a few examples for the model to learn from)
Plush girafe => girafe peluche
Cheese => (providing a prompt to LLM to follow the lead)

Zero-Shot Reasoners and Chain-of-Thought

We've recently seen that Large Language Models have immense and intuitive language generation power. This paper from the University of Tokyo and the Google Brain team suggests that LLMs have fundamental zero-shot capabilities on broad, high-level cognitive tasks and that these capabilities can be extracted by simple Chain-of-Thought (or CoT) prompting.
Another paper by the Google Brain team investigated CoT prompting further. The authors noted that by generating a chain of thought (a series of intermediate reasoning steps), LLMs significantly improve their ability to perform complex reasoning. Their experiments on three large language models showed that chain-of-thought prompting improves performance on a range of arithmetic, common sense, and symbolic reasoning tasks.
This is probably best explained with an example. Let's look at one from the Google Brain team's paper on using Chain-of-Thought prompting to interact with an LLM:
  • Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
  • A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Additionally:
  • Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
  • A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.
From the overview of this paper:
Chain of thought reasoning allows models to decompose complex problems into intermediate steps that are solved individually. Moreover, the language-based nature of chain of thought makes it applicable to any task that a person could solve via language. We find through empirical experiments that chain of thought prompting can improve performance on various reasoning tasks, and that successful chain of thought reasoning is an emergent property of model scale.
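To make this concrete, here is a minimal sketch (in Python, purely for illustration) of how such a chain-of-thought exemplar can be prepended to a new question before sending the whole thing to an LLM:
# Minimal sketch: prepend a worked chain-of-thought exemplar to a new question.
# The exemplar is the tennis-ball example quoted from the paper above.
cot_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n"
)
new_question = (
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?\n"
    "A:"
)
prompt = cot_exemplar + new_question  # the model is expected to continue with its own reasoning steps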

A Few Applications of LLMs

LLMs have kick-started a new range of AI-powered products. For example, GPT-3 and GPT-2 (both from OpenAI) have been used to produce coherent code in GitHub Copilot and HuggingFace CodeParrot, respectively. Copilot is now a living, breathing product built on top of an LLM's capability to produce sensible code given the correct prompt.
According to OpenAI's website, there are hundreds of products and websites using their pioneering LLM GPT-3 to build new features, mostly relying on few-shot and zero-shot capabilities. Examples include summarizing customer-success tickets, improving search results by answering a query with text instead of links, powering natural conversation with virtual chatbots, and more. You can also refer to this webpage for very interesting use cases of a large language model like GPT-3.

Bloom-Powered Apps

Now, let's look at some Bloom-powered applications, starting with a chain-of-thought reasoning app. We'll walk through the implementation and code for each app below.

App 1: Step By Step With Bloom

This app explores Bloom for Chain-of-Thought reasoning, or CoT prompts. You can append key phrases to the prompt, like 'Let's think step by step' or 'Let's solve the problem by splitting it into steps', and so on.
Token Authorization - Using your HF token
Bloom provides a Hosted Inference API. Go to the model's homepage on Hugging Face, select Deploy in the top-right corner, then select Accelerated Inference, and click the 'Show API Token' checkbox. This will display your token in front of the headers field as 'Bearer **'. Copy the token and store it temporarily in a scratchpad or notepad file.
Next, create a new Space and go to the 'Settings' tab. Once there, create a new secret by specifying a Name and Value pair: enter the literal HF_TOKEN in the Name field, paste the **** token from your scratchpad into the Value field, and press the 'Create Secret' button.
Using Bloom Accelerated Inference API for Python
Start using the Bloom API for lightning-fast inference, as shown in the code snippet below.
import os
import requests

## Bloom Accelerated Inference API endpoint and auth headers
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
HF_TOKEN = os.environ["HF_TOKEN"]
headers = {"Authorization": f"Bearer {HF_TOKEN}"}

def text_generate(problem, template, prompt):
    # Use the free-form prompt if no word problem / CoT template was selected,
    # otherwise build the prompt as: word problem + "A: " + chain-of-thought suffix
    if len(problem) == 0 and len(template) == 0:
        p = prompt
    else:
        p = problem + "A: " + template  # + "\n"
    print(f"Final prompt is : {p}")
    json_ = {
        "inputs": p,
        "parameters": {
            "top_p": 0.9,
            "temperature": 1.1,
            "max_new_tokens": 64,
            "return_full_text": True,
        },
        "options": {
            "use_cache": True,
            "wait_for_model": True,
        },
    }
    response = requests.post(API_URL, headers=headers, json=json_)
    output = response.json()
    output_tmp = output[0]["generated_text"]
    # Keep only the completion for the current question; drop any follow-up "Q:" the model invents
    solution = output_tmp.split("\nQ:")[0]
    return solution
Inference function
When making calls to and receiving responses from the Inference API, you will have to format JSON-style inputs and parse JSON-style outputs. To learn more about formatting the inputs and outputs, please refer to the API Inference documentation on HuggingFace. The documentation is detailed and covers all your possible ML use cases and tasks.
The code above defines my inference function. If you are familiar with text generation using language models, or if you have gone through the HuggingFace documentation linked above, the code is self-explanatory.
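For instance, a call to this function might look like the following (the word problem and chain-of-thought suffix here are illustrative, not necessarily the app's built-in examples):
# Hypothetical call to the inference function defined above.
solution = text_generate(
    problem="Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?\n",
    template="Let's think step by step.",
    prompt="",
)
print(solution)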
Understanding the Prompt
Prompt Engineering is a fancy and fun term coined for the text-based instructions provided as inputs to an LLM. You can refer to the meme below (which Karpathy tweeted a while back) to understand the impact of Prompt Engineering on the Deep Learning world.

Prompt design requires a careful analysis of an LLM's behavior on any given problem. Prompting a large language model is a significant step in priming it to produce the expected response.
As general advice, when designing prompts for your LLM, you should first understand what makes the model produce correct zero-shot outputs. If you can establish a pattern in your prompt using a few example inputs and desired outputs, you might end up priming your large language model to produce similar results on any future prompt whose input follows the same pattern. To understand more about effective Prompt Engineering, I would refer you to this wonderful Medium blog by Shubham Sahoo.
Gradio Interface and Model Responses
What you'll see below:
  • I provide simple mathematical word problems as examples. These word problems are displayed as different radio buttons.
  • I provide another set of prompts as radio buttons, which can be suffixed to the word problem selected in the previous step.
  • I also create a text box in which users of this app can write their own prompts, following my given examples.
  • Lastly, I create a button whose on-click event calls my inference function, with the output shown in a separate text box.
  • All this is done using Gradio's intuitive Blocks API, which lets me lay out all the components in the app very neatly (see the sketch after this list).
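To make that layout concrete, here is a minimal sketch of what such a Blocks app could look like. The example problem, suffixes, and labels are placeholders chosen for illustration, and text_generate is the inference function defined earlier; the actual app's layout may differ.
import gradio as gr

# Minimal Gradio Blocks sketch (illustrative, not the app's exact code).
with gr.Blocks() as demo:
    problem = gr.Radio(
        ["Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?\n"],
        value="Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?\n",
        label="Example word problems",
    )
    template = gr.Radio(
        ["Let's think step by step.", "Let's solve the problem by splitting it into steps."],
        value="Let's think step by step.",
        label="Chain-of-thought suffixes",
    )
    prompt = gr.Textbox(label="Or write your own prompt")
    output = gr.Textbox(label="Bloom's response")
    generate_btn = gr.Button("Generate")
    # On click, call the inference function defined earlier and show its result
    generate_btn.click(text_generate, inputs=[problem, template, prompt], outputs=output)

demo.launch()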
You can play around with my app, Step By Step With Bloom, embedded below. Please note that the app uses the public Inference API provided by Hugging Face. LLMs are very costly to serve, so from time to time HF might curtail the number of request and response tokens, or degrade/upgrade the hosted models depending on traffic and other parameters during a given period. Given all this, the app's performance will fluctuate from time to time, but it's free, so it's really not a big issue for us here.



App 2: Zero Shot SQL by Bloom

Moving on to the next app, we'll be using Bloom to create a SQL query based on your text prompt. Since the token authorization, Bloom Accelerated Inference API, inference function, and Gradio interface are very similar, I won't revisit those concepts for the remaining apps. The key difference is in how the prompts are engineered, so let's focus on that instead.
Understanding the prompt
I expect the Bloom LLM to infer the pattern in my zero-shot prompts. In my experiments, as well as from the general examples shared in Twitter and Reddit posts, I have noticed that if you provide a small, sensible heading for what should follow (e.g., Instruction; see below), an input question, and a cue for the expected output type (e.g., PostgreSQL query), a large language model takes care of the rest.
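For illustration, a zero-shot prompt following that pattern might look something like this (the exact wording and question used in the app may differ):
# Hypothetical zero-shot prompt following the Instruction / Input / Query pattern described above.
sql_prompt = (
    "Instruction: Given an input question, respond with a syntactically correct PostgreSQL query.\n"
    "Input: How many customers placed an order in the last 30 days?\n"
    "PostgreSQL query:"
)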
Again, you can play around with the app, Zero Shot SQL by Bloom, embedded below. A 176-billion-parameter LLM can sometimes take a long time to respond, or you might have to wait in the serving queue. Please be patient with the API request in such scenarios. All the caveats stated for the previous app apply to all the apps discussed in this article.



App 3: Write Stories Using Bloom

This Gradio app allows you to write quirky and creative stories from an initial prompt. Please refer to the 'How this App works' section in the embedded app below for more insights.
Understanding the prompt
After the user provides an initial prompt, the app recursively feeds the last few generated tokens back to the Bloom LLM, generating a coherent continuation of the text on every button click.
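Here is a minimal sketch of that recursive idea, reusing the API_URL and headers from the App 1 snippet. The window size and token count below are assumptions for illustration, not the app's actual values:
import requests

def continue_story(story_so_far, window=300, new_tokens=64):
    # Keep only the tail of the story so far as the prompt fed back to Bloom
    context = story_so_far[-window:]
    payload = {
        "inputs": context,
        "parameters": {"max_new_tokens": new_tokens, "return_full_text": False},
    }
    response = requests.post(API_URL, headers=headers, json=payload)
    continuation = response.json()[0]["generated_text"]
    # Append the newly generated text to the running story
    return story_so_far + continuation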
Please play around with the app, Write Stories Using Bloom, embedded below. If your query takes a long time to run, please understand that requests to the model are queued and served in order. All the caveats stated for the apps above apply here as well.



App 4: Distracted Boyfriend Meme Using Bloom

We live in the generation of memes. And one of the all-timers is the Distracted Boyfriend meme.
This Gradio app lets you produce infinite versions of the meme. Refer to the app description to understand how the app works and how you can use it.





Conclusion

We hope you enjoyed this explainer on using LLMs (specifically Bloom, in our case) to create language apps. We looked at everything from chain-of-thought reasoning to code writing to meme-making, and we hope the range of applications built on this single model showcases its power and versatility. Because, really, we've only scratched the surface of what these LLMs can do.