Power Query + OpenAI API (GPT)

Carlos Barboza is experimenting with visualizing Elon Musk’s activity on Twitter. That looks really great, but I wanted to see more than just quantity; I wanted to analyze the content of the tweets.

Subjectively, it appears that Elon is becoming more active in X, and recently he has switched from tweeting mostly about rockets to spreading fakes, and promoting anti-democratic, anti-Ukrainian, pro-Russian narratives…

But I want statistics. Therefore I’ve tried to categorize the tweets using OpenAI API (GPT) in Power BI.

First, I updated the M function fxOpenAI in my GitHub repository to make it compatible with the current version of OpenAI. It’s a basic template function for using OpenAI (GPT) in Power Query that accepts only the most important parameters.

To start using it, create a blank fxGPT query and insert the fxOpenAI function code into the Advanced Editor:

Don’t forget to visit https://platform.openai.com/api-keys (you need to sign in and buy some credits if you don’t have any) and create a new secret key. Replace <API_KEY> placeholder in the code with your key.

Now, fxGPT can be used to make requests to GPT from Power Query.

fxGPT parameters (user prompt, optional system prompt, optional model name):

Usage format:

= fxGPT ("user prompt", ["system prompt], ["model"])

Default model is gpt-3.5-turbo. It’s defined in the fxGPT code and you can change it. Otherwise use optional “model” parameter.

Warning: gpt-4-turbo-preview and gpt-4 are significantly more expensive than gpt-3.5-turbo. Use the latter for testing to reduce the cost.

In my project I was experimenting with this dataset of Elon Musk’s tweets.

To use the function add a custom column to the data table with, for example, the following code:

= Table.AddColumn(#"Kept First Rows", each fxGPT( [Tweet Text], "Categorize the following text into one of these categories: space and rockets, politics (left), politics (right), politics (other), entertainment, technology, finance, transportation. Use lowercase for all categories. If the text does not fit any of these, provide a custom category in 1-3 words. Respond with only the category name."))

Where [text] column contains the tweets.

The function returns a list with 2 values: GPT response message and finish reason.

The message the model returns may be partial because it exceeded the token limit (finish reason = “length”). To guard against this, check finish reason before parsing the response. If API returned complete message then finish reason = “stop”. See https://platform.openai.com/docs/guides/text-generation/chat-completions-response-format for more details.

I’ve added two more custom columns to extract both values (response message goes into the “category” column and finish reason into the “code” column.)

= Table.AddColumn(#"Removed Other Columns", "category", each [GPT]{0})

= Table.AddColumn(#"Added Custom1", "code", each [GPT]{1})

Then I used “category” column created by fxGPT to make this chart:

The first attempt was successful; I categorized the first 2,500 tweets. The next step is to write a more detailed and optimized prompt based on the results of the first test. For example, I don’t like that useless “custom category” and I would like to merge some small categories that were excluded from the chart.

Then, I’ll process the complete dataset and examine how the topics of the tweets changed over time. Although, the data in the dataset is from before Elon Musk acquired Twitter, I’ll need to obtain more recent data to confirm his current bias.

Regarding the performance and API rate limits, on Tier 1, it’s definitely not suitable for large datasets, but it is good enough for some simple tasks and experiments with a maximum of a few thousand rows in the data table. I’m yet to test it on the higher OpenAI tiers. Well, probably at some point the bottle neck will be on the Power Query side.

P.S. The tweets require some cleaning in Power Query, before they can be processed by the API:


Share the article