With the hype surrounding AI, it’s vital that those developing it stop and think about its impact before releasing it to the public.
As AI becomes more common in every industry, we should keep at the forefront of our minds the need to build and use AI responsibly and be prepared to take accountability when something goes wrong. While AI gets a lot of things right, it also gets its fair share wrong, depending on how well it’s trained and the kind of data used to teach it. Even if you use a Large Language Model - which is itself a type of AI - that data hasn’t been checked by human eyes, so there’s no guarantee that it’s all 100% correct or the right kind of data for the AI in question.
For example, MIT student Rona Wang used Playground AI, a photo-editing platform where you can prompt AI to create and edit photos for you, to create a professional-looking photo for her LinkedIn profile. Instead of giving her photo a neutral background or sharpening the edges, Playground AI edited Wang to have more Caucasian features, including larger, blue eyes and paler skin. This is likely a result of the AI mostly being fed LinkedIn profile pictures or stock images of Caucasian professionals, and it’s a good illustration of the type of caution we need to take when developing AI.
AI is such an exciting and complicated system that it can be easy to dive in and start building while leaving the big-picture questions until later, especially with the current hype around AI tools. The Organisation for Economic Cooperation and Development (OECD) detailed the AI lifecycle in 2019, and we’ve tweaked it slightly for our purposes:
Stage four is where the OECD mentions ethical considerations and possibly retiring an AI system over potential ethical problems. However, ethics and bias should be considered much earlier in the process. The first stage of the AI lifecycle, where data is collected, is the first of many points at which ethics and responsibility must be discussed. AI can only learn from what we choose to feed it, so to prevent results like the one Wang saw, there needs to be a balanced sampling of photos across races, ages, and genders. While this may be sufficient for a photo-editing AI, the onset of generative AI and the hype surrounding it put even more pressure on developers to take accountability for the AI’s output and be mindful of the data they use to teach it.
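To make the idea of balanced sampling a little more concrete, here is a minimal sketch of a check that could be run at the data-collection stage. It is only an illustration under stated assumptions: the function name `representation_report`, the `tolerance` parameter, and the placeholder group labels are all hypothetical, and it assumes each photo already carries a single group label. A real audit would need to look at several attributes together and would not stop at simple counts.

```python
from collections import Counter


def representation_report(labels, expected_groups, tolerance=0.5):
    """Report each expected group's share of a dataset.

    A group is flagged as underrepresented when its share falls below
    `tolerance` times the share it would have if all expected groups
    were equally represented. All names here are hypothetical.
    """
    if not labels or not expected_groups:
        raise ValueError("labels and expected_groups must be non-empty")

    counts = Counter(labels)
    total = len(labels)
    equal_share = 1 / len(expected_groups)

    report = {}
    for group in expected_groups:
        # Counter returns 0 for groups that never appear in the data,
        # so entirely missing groups are flagged too.
        share = counts[group] / total
        report[group] = {
            "count": counts[group],
            "share": share,
            "underrepresented": share < tolerance * equal_share,
        }
    return report


if __name__ == "__main__":
    # Placeholder labels standing in for whatever attribute is being audited.
    labels = ["group_a"] * 80 + ["group_b"] * 15 + ["group_c"] * 5
    expected = ["group_a", "group_b", "group_c", "group_d"]
    for group, stats in representation_report(labels, expected).items():
        print(group, stats)
```

Passing the list of expected groups explicitly, rather than inferring it from the data, is a deliberate choice in this sketch: a group that is missing from the dataset entirely would otherwise never show up in the report.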
Even AI built with the best of intentions can end up causing harm. The National Eating Disorders Association in the US released a “wellness chatbot” named Tessa (believe me, I’m not thrilled about the name) that was supposed to provide users with coping mechanisms. However, when Dr. Alexis Conason tested the chatbot, she found that it could cause more harm than good. Dr. Conason told the chatbot that she had gained weight and had “an eating disorder,” and Tessa recommended that she count calories and establish “a safe daily calorie deficit.” This kind of advice can be extremely harmful. Luckily, the chatbot was taken down immediately and an investigation was launched (more details can be found here). While the organization should be applauded for taking responsibility so quickly, this is an example of why rigorous testing before release is vital.
The Harvard Business Review presented four “dimensions” of AI accountability: assess governance structures, understand the data, define performance goals and metrics, and review monitoring plans. These four steps can help ensure AI is built more responsibly and prevent harm to future users. While Playground AI’s edits to Wang’s photo were relatively harmless, the recommendations from Tessa could have been detrimental to someone in recovery from an eating disorder. The ethics and potential ramifications of a new AI should be discussed in detail as early in the AI lifecycle as possible, ensuring that AI is created and released responsibly.