While AI has steadily worked its way into the enterprise and business vernacular over many years, gen AI has not only become an abrupt and immediate force unto itself, but also an overarching AI accelerant. Not without warning signs, however.
Gen AI has the potential to magnify existing risks around data privacy laws that govern how sensitive data is collected, used, shared, and stored. It can also expose companies to future legislation. In response, though late out of the blocks, regulators are bringing greater scrutiny.
Europe, for instance, continues to get up to speed with its AI Act, which now addresses gen AI even though the Act was first proposed before gen AI's advent. Then there are the lawsuits. Several gen AI vendors, including OpenAI, Microsoft, Midjourney, Stability AI, and others, have had suits filed against them. These complaints, filed by a variety of copyright holders, allege that the companies trained their AIs on copyrighted data: images, code, and text.
There are also congressional hearings and petitions to pause AI development, inclusive of gen AI. Any of these could put pressure on regulators or legislators to limit gen AI's use.
Even individual cities are getting in on the action. In July, for example, New York City started enforcing new rules about the use of AI in hiring decisions. These rules require automated decision-making tools to undergo bias audits, and job candidates to be notified about their use. Similar rules are under consideration in New Jersey, Maryland, Illinois, and California.
“This is a very hot topic,” says Eric Vandevelde, AI co-chair and partner at the law firm Gibson, Dunn & Crutcher. “We’re getting bombarded with questions and inquiries from clients and potential clients about the risks of AI.”
It’s no surprise, then, that according to a June KPMG survey, uncertainty about the regulatory environment was the top barrier to implementing gen AI. In fact, 77% of CEOs of large companies said regulatory uncertainty impacts their gen AI deployment decisions, and 41% said they’re taking a short pause of three to six months to monitor the regulatory landscape.
So here are some of the strategies organisations are using to deploy gen AI in the face of regulatory uncertainty.
The slower road to AI
Some companies, particularly those in regulated industries, are being cautious about their use of gen AI and are only deploying it in areas with the least risk.
“I’ve actually been approached by a company that will upload all our clients’ medical records and bills and formulate demand letters,” says Robert Fakhouri, founder of The Fakhouri Firm, a Chicago-based personal injury law firm. The idea is that by generating the letters using AI, there will be less need for human employees.
“I chose not to get into that,” he says. “I have enough fears about the fact we’re storing medical information. I’m not going to upload this information to another service. The risk is too high.”
The firm also prohibits staff from using ChatGPT to write letters to clients. But there’s one low-risk use case where gen AI is allowed, he says. “When it comes to ChatGPT, the only utilisation in my practice is the way we go about creating our marketing strategy on social media—getting ideas, generating scripts, seeing what it can provide us as inspiration for new content. But I’d like to see more legislation and guidance in place, especially for medical records.”
Many enterprises are deploying AI in lower-risk use cases first, says Kjell Carlsson, head of data science strategy and evangelism at Domino Data Lab.
“Most companies I’m speaking to are augmenting internal users,” he says. “If I’m an energy company, I want to make it possible for folks to leverage geologic surveys and reports that are miserable to go through.”
With AI, their users can get extremely smart research assistants.
“Now I’ve got summarisation capabilities, access to the world’s best research librarian, and a first-draft text generator for a lot of things I want to do,” he says.
In traditional application development, enterprises have to be careful that end users aren’t allowed access to data they don’t have permission to see. For example, in an HR application, an employee might be allowed to see their own salary information and benefits, but not that of other employees. If such a tool is augmented or replaced by an HR chatbot powered by gen AI, then it will need to have access to the employee database so it can answer user questions. But how can a company be sure the AI doesn’t tell everything it knows to anyone who asks?
This is particularly important for customer-facing chatbots that might have to answer questions about customers’ financial transactions or medical records. Protecting access to sensitive data is just one part of the data governance picture.
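One common way to handle this problem is to enforce permissions before any data reaches the model, so the chatbot only ever sees records the requesting user is entitled to view. The sketch below illustrates that pattern with an HR example like the one above; all names (`EMPLOYEES`, `get_salary_context`, the manager-based rule) are illustrative assumptions, not a real product's API.

```python
# Hypothetical sketch: enforce the caller's permissions *before* any records
# reach the model, rather than trusting the chatbot to withhold them.

EMPLOYEES = {
    "alice": {"salary": 95000, "manager": "carol"},
    "bob": {"salary": 87000, "manager": "carol"},
}

def get_salary_context(requesting_user: str, subject: str) -> str:
    """Return salary data only if the requester is allowed to see it."""
    record = EMPLOYEES.get(subject)
    if record is None:
        return "No such employee."
    # Illustrative rule: you may see your own record, or your direct reports'.
    if requesting_user == subject or record["manager"] == requesting_user:
        return f"{subject}'s salary is {record['salary']}."
    return "You are not authorized to view this record."

# Only the authorized context would then be passed to the LLM as part of its
# prompt; the model never holds data the user isn't allowed to see.
```

The design choice here is that the permission check lives in ordinary application code, where it can be tested deterministically, instead of relying on the model's instructions to keep secrets.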
“You need to know where the data’s coming from, how it’s transformed, and what the outputs are,” says Nick Amabile, CEO at DAS42, a data consulting firm. “Companies in general are still having problems with data governance.”
And with large language models (LLMs), data governance is in its infancy.
“We’re still in the pilot phases of evaluating LLMs,” he says. “Some vendors have started to talk about how they’re going to add governance features to their platforms. Retraining, deployment, operations, testing—a lot of these features just aren’t available yet.”
As companies mature in their understanding and use of gen AI, they’ll have to put safeguards in place, says Juan Orlandini, CTO, North America at Insight, a Tempe-based solution integrator. That can include learning how to verify that correct controls are in place, models are isolated, and they’re appropriately used, he says.
“When we created our own gen AI policy, we stood up our own instance of ChatGPT and deployed it to all 14,000 teammates globally,” he says. Insight used the Azure OpenAI Service to do this.
The company is also training its employees about how to use AI safely, especially tools not yet vetted and approved for secure use. For example, employees should treat these tools like they would any social media platform, where anyone could potentially see what you post.
“Would you put your client’s sales forecast into Facebook? Probably not,” Orlandini says.
Layers of control
There’s no guarantee a gen AI model won’t produce biased or dangerous results. These models are designed to create new material, and the same request can produce a different result every time. This is very different from traditional software, where a particular set of inputs produces a predictable set of outputs.
“Testing will only show the presence of errors, not the absence,” says Martin Fix, technology director at Star, a technology consulting company. “AI is a black box. All you have are statistical methods to observe the output and measure it, and it’s not possible to test the whole area of capability of AI.”
That’s because users can enter any prompt they can imagine into an LLM, and researchers keep finding new ways to trick AIs into performing objectionable actions, a process known as “jailbreaking.”
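The statistical approach Fix describes can be sketched in a few lines: because the same prompt can yield different outputs, you sample many completions and estimate a failure rate rather than asserting one expected answer. `toy_model` below is an illustrative nondeterministic stand-in, not a real LLM, and the 10% failure rate is an assumption for the sketch.

```python
import random

def toy_model(prompt: str, rng: random.Random) -> str:
    # Imagine a model that occasionally emits an unwanted completion.
    return "unsafe completion" if rng.random() < 0.1 else "safe completion"

def estimated_failure_rate(model, prompt, is_acceptable, n=2000, seed=42):
    """Sample the model n times and return the observed failure fraction."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(n) if not is_acceptable(model(prompt, rng)))
    return failures / n

rate = estimated_failure_rate(
    toy_model, "summarize this report", lambda out: out == "safe completion"
)
# The estimate hovers near the model's true 10% failure rate. Note what this
# can and can't do: it bounds the error rate statistically, but it can never
# prove that errors are absent.
```

This is why such tests can only reduce risk, not eliminate it: a low observed failure rate over thousands of samples says nothing about the prompt a tester never thought to try.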
Some companies are also looking at using other AIs to test results for risky outputs, or using data loss prevention and other security tools to prevent users from putting sensitive data into prompts in the first place.
“You can reduce the risks by combining different technologies, creating layers of safety and security,” says Fix.
This is going to be especially important if an AI is running inside a company and has access to large swathes of corporate data.
“If an AI has access to all of it, it can disclose all of it,” he says. “So you have to be much more thorough in the security of the system and put in as many layers as necessary.”
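The layering idea can be made concrete with a minimal sketch: a data-loss-prevention scan on the prompt before it leaves the company, plus a screen on the model's output before it reaches the user. The patterns, blocklist, and the echo stand-in for the model are all simplifications assumed for illustration; a real deployment would use a proper DLP product and more sophisticated output classifiers.

```python
import re

# Layer 1: a DLP-style scan on the prompt (illustrative patterns only).
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US Social Security number format
    re.compile(r"\b\d{13,16}\b"),          # possible payment card number
]

def dlp_scan(prompt: str) -> bool:
    """Return True if the prompt appears to contain sensitive data."""
    return any(p.search(prompt) for p in SENSITIVE_PATTERNS)

# Layer 2: a screen on the model's output (illustrative blocklist only).
BLOCKLIST = {"password", "api key"}

def screen_output(text: str) -> bool:
    """Return True if the model's output should be blocked."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def guarded_call(prompt: str, model=lambda p: f"echo: {p}") -> str:
    """Wrap a model call in both layers; either one can stop the request."""
    if dlp_scan(prompt):
        return "[blocked: prompt contains sensitive data]"
    output = model(prompt)
    if screen_output(output):
        return "[blocked: output flagged by screen]"
    return output
```

Each layer is independently fallible, which is the point of stacking them: a prompt that slips past the DLP scan can still be caught at the output screen, and vice versa.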
The open source approach
Commercial AI systems, like OpenAI’s ChatGPT, are like the black boxes Fix describes: enterprises have little insight into the training data that goes into them, how they’re fine tuned, what information goes into ongoing training, how the AI actually makes its decisions, and exactly how all the data involved is secured. In highly regulated industries in particular, some enterprises may be reluctant to take a risk with these opaque systems.
One option, however, is to use open source software. There are a number of models, under various licenses, currently available to the public. In July, this list was significantly expanded when Meta released Llama 2, an enterprise-grade LLM available in three sizes, with commercial use allowed and completely free to enterprises, at least for applications with fewer than 700 million monthly active users.
Enterprises can download, install, fine-tune and run Llama 2 themselves, in either its original form or one of its many variations, or use third-party AI systems based on Llama 2.
For example, patient health company Aiberry uses customised open-source models, including Flan-T5, Llama 2, and Vicuna, says Michael Mullarkey, the company’s senior clinical data scientist.
The models run within Aiberry’s secure data infrastructure, he says, and are fine-tuned to perform in a way that meets the company’s needs. “This seems to be working well,” he says.
Aiberry has a data set it uses for training, testing, and validating these models, which try to anticipate what clinicians need and provide information up front based on assessments of patient screening information.
“For other parts of our workflows that don’t involve sensitive data, we use ChatGPT, Claude, and other commercial models,” he adds.
Running open source software on-prem or in private clouds can help reduce risks, such as that of data loss, and can help companies comply with data sovereignty and privacy regulations. But open source software carries its own risks as well, especially as the number of AI projects on the open source repositories multiplies.
That includes cybersecurity risks. In some regulated industries, companies have to be careful about the open source code they run in their systems, since vulnerable or malicious code can lead to data breaches, privacy violations, or the biased or discriminatory decisions that create regulatory liability.
According to the Synopsys open source security report released in February, 84% of open source codebases in general contain at least one vulnerability.
“Open source code or apps have been exploited to cause a lot of damage,” says Alla Valente, an analyst at Forrester Research.
For example, the Log4Shell vulnerability, patched in late 2021, was still seeing half a million attack requests per day at the end of 2022.
In addition to vulnerabilities, open source code can also contain malicious code and backdoors, and open source AI models could potentially be trained or fine-tuned on poisoned data sets.
“If you’re an enterprise, you know better than just taking something you found in open source and plugging it into your systems without any kind of guardrails,” says Valente.
Enterprises will need to set up controls for AI models similar to those they already have for other software projects, and information security and compliance teams need to be aware of what data science teams are doing.
In addition to the security risks, companies also have to be careful about the sourcing of the training data for the models, Valente adds. “How was this data obtained? Was it legal and ethical?” One place companies can look to for guidance is the letter the FTC sent to OpenAI this summer.
According to a report in the Washington Post, the letter asks OpenAI to explain how they source the training data for their LLMs, vet the data, and test whether the models generate false, misleading, or disparaging statements, or generate accurate, personally identifiable information about individuals.
In the absence of any federally-mandated frameworks, this letter gives companies a place to start, Valente says. “And it definitely foreshadows what’s to come if there’s federal regulation.”
If an AI tool is used to draft a letter about a customer’s financial records or medical history, the prompt request containing this sensitive information will be sent to an AI for processing. With a public chatbot like ChatGPT or Bard, it’s impossible for a company to know where exactly this request will be processed, potentially running afoul of national data residency requirements.
Enterprises already have several ways to deal with the problem, says DAS42’s Amabile, whose firm helps companies with data residency issues.
“We’re actually seeing a lot of trusted enterprise vendors enter the space,” he says. “Instead of bringing the data to the AI, we’re bringing AI to the data.”
And cloud providers like AWS and Azure have long offered geographically-based infrastructure to their users. Microsoft’s Azure OpenAI service, for example, allows customers to store data in the data source and location they designate, with no data copied into the Azure OpenAI service itself. Data vendors like Snowflake and Databricks, which historically have focused on helping companies with the privacy, residency, and other compliance implications of data management, are also getting into the gen AI space.
“We’re seeing a lot of vendors offering this on top of their platform,” says Amabile.
Some vendors, understanding that companies are wary of risky AI models, are offering indemnification.
For example, image gen AIs, which have been popular for a few months longer than language models, have been accused of violating copyrights in their training data.
While the lawsuits are playing out in courts, Adobe, Shutterstock, and other enterprise-friendly platforms have been deploying AIs trained only on fully-licensed data, or data in the public domain.
In addition, in June, Adobe announced it would indemnify enterprises for content generated by AI, allowing them to deploy it confidently across their organisation.
Other enterprise vendors, including Snowflake and Databricks, also offer various degrees of indemnification to their customers. In its terms of service, for example, Snowflake promises to defend its customers against any third-party claims of services infringing on any intellectual property right of such third party.
“The existing vendors I’m working with today, like Snowflake and Databricks, are offering protection to their customers,” says Amabile. When he buys his AI models through his existing contracts with those vendors, all the same indemnification provisions are in place.
“That’s really a benefit to the enterprise,” he says. “And a benefit of working with some of the established vendors.”
According to Gibson, Dunn & Crutcher’s Vandevelde, AI requires top-level attention.
“This is not just a CIO problem or a chief privacy officer problem,” he says. “This is a whole-company issue that needs to be grappled with from the board down.”
This is the same trajectory that cybersecurity and privacy followed, and the industry is now just at the beginning of the journey, he says.
“It was foreign for boards 15 years ago to think about privacy and have chief privacy officers, and have privacy at the design level of products and services,” he says. “The same thing is going to happen with AI.”
And it might need to happen faster than it’s currently taking, he adds.
“The new models are and feel very different in terms of their power, and the public consciousness sees that,” he says. “This has bubbled up in all facets of regulation, legislation, and government action. Whether fair or not, there’s been criticism that regulations around data privacy and data security were too slow, so regulators are seeking to move much quicker to establish themselves and their authority.”