One of the key highlights at Oracle Cloudworld 2023 is general availability of Select AI feature that allows anyone to query the Oracle Database using natural language. This is likely made possible with a template wrapping the database metadata with the prompt, and then calling either the OpenAI or Cohere APIs to generate the intended text.
This feature is already available on an Oracle Autonomous Database (ADB), and yes, that includes the Always Free instances that I use for the managing the Oracle APEX Challenge Coin registry. I know everyone's eager to try it out, and so am I! So here's a quick set of instructions to get you going.
The first thing that you'll need to do is make sure your application schema has the appropriate rights to use this feature. As the ADB ADMIN user, first add the Access Control Entry (ACE) for using the intended API. Also, grant the user access to both the DBMS_CLOUD
and new DBMS_CLOUD_AI
packages.
define schema_name = 'core'
begin
dbms_network_acl_admin.append_host_ace(
host => 'api.cohere.ai'
, ace => xs$ace_type(
privilege_list => xs$name_list('http')
, principal_name => upper('&schema_name')
, principal_type => xs_acl.ptype_db
)
);
end;
/
grant execute on dbms_cloud to &schema_name;
grant execute on dbms_cloud_ai to &schema_name;
NOTE:
For OpenAI, the
host
value would beapi.openai.com
.
Next, as the user, create a credential for storing the required credentials for accessing the target API. For Cohere, you will first need to sign up for an account. For development use, access to the API is free, and an API key should have already been generated for you. A username
is not required, and thus I replaced with the name associated with the API key.
begin
dbms_cloud.create_credential(
credential_name => 'COHERE_CRED'
, username => 'default'
, password => '***'
);
end;
/
Create a profile using the credential.
begin
dbms_cloud_ai.create_profile(
profile_name => 'COHERE'
, attributes => json_object(
key 'provider' value 'cohere'
, key 'credential_name' value 'COHERE_CRED'
, key 'object_list' value json_array(
json_object(
key 'owner' value 'core'
, key 'name' value 'coin'
)
)
)
);
end;
/
The credential and profile name can be anything that makes sense to you. However, when creating the profile, the provider
attribute is required and must be either openai
or cohere
. The object_list
attribute provides you the opportunity to scope what database objects should be included. You may optionally specify the owner
and not a name
to included all objects in the owner's schema. There are likely limits to how much data you might want to include in a request, and should also keep in mind that the metadata is submitted to an external system.
Other profile attributes include:
model
temperature
max_tokens
stop_tokens
This allows further adjustments to how the Generative AI model behaves.
Finally, simply set the user's profile for performing the Select AI tasks:
begin
dbms_cloud_ai.set_profile(
profile_name => 'COHERE'
);
end;
/
With that, I can now ask:
define question = 'how many coins where created for the APEX community?'
select ai runsql dbms_assert.enquote_literal('&question');
I correctly received the result 200
! But that's way too easy for this Large Language Model (LLM). So, I thought I'll add all the objects in the schema, since it isn't too big.
begin
dbms_cloud_ai.drop_profile('COHERE');
dbms_cloud_ai.create_profile(
profile_name => 'COHERE'
, attributes => json_object(
key 'provider' value 'cohere'
, key 'credential_name' value 'COHERE_CRED'
, key 'object_list' value json_array(
json_object(
key 'owner' value 'core'
--, key 'name' value 'coin'
)
)
)
);
end;
/
And then I asked, "How many coins have been sponsored?". It said, "0", and that is incorrect! To figure out why, I used the action keyword showsql as opposed to runsql.
define question = 'how many coins have been sponsored?'
select ai showsql dbms_assert.enquote_literal('&question');
Yielding the SQL statement:
SELECT COUNT(*) FROM "CORE"."SHIPMENT"
WHERE "SHIPMENT_ID" IN (
SELECT "SHIPMENT_ID" FROM "CORE"."SPONSOR_EVENT"
WHERE "SPONSORSHIP_TYPE" = 'SPONSORED'
)
And this could be wrong for reasons as follows:
- There is no
SPONSORSHIP_TYPE
with the valueSPONSORED
. I wonder if having a lookup table might have helped, but I do have aCHECK
constraint that specifies what values are allowed. - It might not have understood the intended meaning of the table name
SPONSOR_EVENT
, but I don't always name tables such they are semantically accurate. And of course, having lived through the age of 30-character limits on object names, table names might be missing vowels or word separators. - My data model for this project might not have been optimum.
SHIPMENT
was about shipments, but not all sponsor events involved a shipping event. They were delivered personally.
I would have expected the narrate
to explain how it had derived the SQL statement it generated.
select ai narrate dbms_assert.enquote_literal('&question');
However, what I got was a different SQL statement. And the same with chat
.
Finally, the next question I had was, "Can I use this in APEX?".
Maybe not yet?
I don't believe that, today, there is a simply way to "plugin" this new feature. The outputs of the query can be as dynamic as the inputs. Maybe a blend of using the APEX_EXEC
package and DBMS_CLOUD_AI.GENERATE
function? I wonder. Also, currently, I don't believe there's an APEX PL/SQL API that would allow us to create a report dynamically. Perhaps that might change some day. If you have thoughts about this, please reach out.
Summary
This technology is still very much in its infancy, but exciting nonetheless. Though I am on the fence if it's ready for prime time. I wouldn't put this in front of an uninformed user yet. To be fair, I have not tried this with the OpenAI LLM models. Maybe it'll do better. Or perhaps, I need the LLMs to suggest improvements to my data models. ;-) And that brings me to the final point - semantics! In the era of big data and machine learning, one concept that often gets brushed away is providing rich semantics to data and metadata. We have had OCI Data Catalog for a while now... how many of us actually use it? Or adding proper comments to the schema metadata, which was clearly lacking in my toy project. If that was included in the prompt, then perhaps I would have had better results.
Update [2023-09-24]
So, I was still trying to get Select AI to give me the correct answer to the question:
How many coins have been sponsored?
I did find the answer to my earlier question about whether or not comments were included in the prompt. There is one more profile attribute that can be added to the profile. The comments
attribute supplies the LLM with comments in the sys.all_col_comments
view. Unfortunately, table comments don't seem to be included. Adding comments describing what the tables and views represented didn't seem to help. I couldn't get it to ignore a view that was never going to tell you if a coin has been sponsored. It was also creating predicates on columns that didn't exist, so the third attribute I had tweaked was to specify what database objects are in the object_list
.
declare
l_objects_list_jsonstr varchar2(32767);
begin
begin
dbms_cloud_ai.drop_profile('COHERE_WITH_COMMENTS');
exception
when others then null;
end;
select json_arrayagg(
json_object(key 'owner' value 'core', key 'name' value table_name))
into l_objects_list_jsonstr
from user_tables;
dbms_cloud_ai.create_profile(
profile_name => 'COHERE_WITH_COMMENTS'
, attributes => json_object(
key 'provider' value 'cohere'
, key 'credential_name' value 'COHERE_CRED'
, key 'comments' value true
, key 'temperature' value 0
, key 'object_list' value l_objects_list_jsonstr
)
);
end;
/
I still do have the correct answer. :-(
Credits
- Banner image generated using Stability AI's Stable Diffusion model.