Evaluating Generative AI Tools

Evaluation strategies

Below is a list of questions to consider when evaluating a generative AI tool.  

Is the tool a good match for the task?  

This one may seem obvious, but since different AI tools work best for different kinds of tasks, the first thing you’ll want to consider is whether the AI tool you’re thinking of using is a good match for the task you’re trying to complete. For example, if you need to look up accurate information, a large language model is not the best tool for the job, because it is designed to generate plausible-sounding text, not to retrieve verified facts.

How easy is the tool to use?  

Many generative AI tools are made to be intuitive, but sometimes it can take a little work to get the best out of them. Taking time to research, for example, the best way to word a prompt for a chatbot can save you a lot of time in the long run.

What is the quality of the tool’s output? 

This question can be tricky to answer, especially when it comes to chatbots. A chatbot’s response will usually be clearly written and sound authoritative, but that doesn’t mean it is accurate. Similarly, image and video generators can produce images that look great to a casual viewer but on closer inspection might have noticeable flaws. Doing a little research ahead of time or playing around with an AI tool to get a better sense of the quality of its output can be helpful.

What data was the tool trained on?  

This can also be difficult to answer since many AI companies are not transparent about the data that they use to train their tools. However, finding out what you can about the data the tool was trained on can help you learn a lot about both the tool’s capabilities and its limitations. For example, if a chatbot was only trained on data published up to a few years ago, there may be some important gaps in its “knowledge.” Similarly, image and video generators can reflect the biases of the data that they were trained on.

What are the ethical considerations of using this tool?  

The ethical considerations of using generative AI are going to depend a lot on the context in which you are using it. Using a generative AI tool to create a grocery list or exercise plan for your own personal use is likely just fine, as long as you’re okay with the user agreement attached to that tool. But using it to complete an assignment in a class where you know it’s against the rules or to complete a work project when your boss has asked you not to could be a problem. Inputting your own work into an AI tool to have it “clean up” the language and grammar is okay, too. But inputting someone else’s work without their permission and asking it to analyze that work in some way is less so because of the murky issues related to generative AI and copyright. These are just a few examples of the ethical considerations you’ll want to keep in mind when you use generative AI.  

What support is available for users? 

You don’t want to wait until you encounter a problem with an AI tool to find out what support the company that created it offers its users. Find out ahead of time what support options you would have and how responsive the company is to reports of issues with the tool.

The ROBOT Test

Reliability

  • How reliable is the information available about the AI technology?
  • If it’s not produced by the party responsible for the AI, what are the author’s credentials? Bias?
  • If it is produced by the party responsible for the AI, how much information are they making available? 
    • Is information only partially available due to trade secrets?
    • How biased is the information that they produce?

Objectivity

  • What is the goal or objective of the use of AI?
  • What is the goal of sharing information about it?
    • To inform?
    • To convince?
    • To find financial support?

Bias

  • What could create bias in the AI technology?
  • Are there ethical issues associated with this?
  • Are bias or ethical issues acknowledged?
    • By the source of information?
    • By the party responsible for the AI?
    • By its users?

Ownership

  • Who is the owner or developer of the AI technology?
  • Who is responsible for it?
    • Is it a private company?
    • The government?
    • A think tank or research group?
  • Who has access to it?
  • Who can use it?

Type

  • Which subtype of AI is it?
  • Is the technology theoretical or applied?
  • What kind of information system does it rely on?
  • Does it rely on human intervention? 

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

To cite in APA: Hervieux, S., & Wheatley, A. (2020). The ROBOT test [Evaluation tool]. The LibrAIry. https://thelibrairy.wordpress.com/2020/03/11/the-robot-test