How to analyze surveys with ChatGPT

Lior Neu-ner

Nov 08, 2023

Surveys are a great way of collecting feedback from your users, especially if you ask your users like "How can we improve our product?". However, they can be hard to analyze if you receive hundreds or more answers. Fortunately, OpenAI's ChatGPT is great at doing this.

In this tutorial, we'll show you how to use ChatGPT to analyze your survey results. We'll create a basic Node.js script, parse a CSV of survey answers, and use the ChatGPT API to extract useful information such as sentiment and theme.

The completed code for this tutorial is available on GitHub

Creating a Node script

First, make sure Node is installed (18.16.0 or newer). Then create a package.json file by running npm init. Give your app a name and select the defaults for all the options.

Then, create a new JavaScript file for our script (we name ours survey-analyzer.js):

Terminal
npm init
touch survey-analyzer.js

Finally, add a placeholder function to your script. We'll add code to it in the next steps.

JavaScript
// in survey-analyzer.js

const analyzeSurveyAnswers = async () => {
  // TODO: add code
}

analyzeSurveyAnswers()

Parsing a CSV of survey answers

We'll import sample survey answers into our script from a CSV file called answers.csv. You can download the CSV file here.

💡 PostHog Tip: If you're using PostHog surveys, you can export your answers by navigating to your survey, clicking "Export", and then "Copy CSV to clipboard".

Add the CSV file to your project folder. Then, install csv-parser – a handy library for parsing CSVs.

Terminal
npm install csv-parser

Next, add the following code to convert the CSV into an array:

JavaScript
// in survey-analyzer.js

const fs = require('fs');
const csv = require('csv-parser');

const csvToArr = async (filePath) => {
  const results = [];
  return new Promise((resolve, reject) => {
    fs.createReadStream(filePath)
      .pipe(csv({ headers: ['answer', 'timestamp', 'person'] })) // your CSV header names
      .on('data', data => results.push(data))
      .on('end', () => {
        resolve(results);
      })
      .on('error', error => {
        reject(error);
      });
  });
}

const analyzeSurveyAnswers = async () => {
  const surveyAnswers = await csvToArr('answers.csv')
}

analyzeSurveyAnswers()

Analyzing answers with ChatGPT

The next step is to submit our answers to ChatGPT to analyze them. To do this, first you'll need to sign up for an OpenAI account and retrieve your API key.

Then, install OpenAI's Node library and initialize it in the code.

Terminal
npm install openai

JavaScript
// in survey-analyzer.js

const OpenAI = require("openai");

const openai = new OpenAI({
    apiKey: "<your-api-key>"
});

// rest of the code

Next, for each survey answer, we'll make a call to the chat completions API to extract the following information:

Whether the sentiment is positive or negative.
The theme of the answer.

To do so, the API method requires 2 parameters:

model – which OpenAI model to use. We use gpt-3.5-turbo-16k (we explain why later in this post).
messages – i.e., the task you are requesting the model to perform. You can find more details on the required format in the API docs.

With this in mind, here's what our code looks like:

JavaScript
// rest of the code

const analyzeSurveyAnswers = async () => {
  const surveyAnswers = await csvToArr('answers.csv')
  for (const surveyAnswer of surveyAnswers) {
    try {
      const result = await openai.chat.completions.create({
        model: "gpt-3.5-turbo-16k",
        messages: [
          {
            "role": "system", 
            "content": `Your job is to analyze survey answers. For each answer, extract the sentiment (positive or negative only) as well as the theme in the answer. 

              You must only respond with a JSON. This JSON must contain the keys "survey_answer", "sentiment", and "theme". 
              
              Here's an example of an valid answer: 
              \`\`\`
              {
                "survey_answer": "The product has potential, but it's not very intuitive to use. Please simplify the interface",
                "sentiment": "negative",
                "theme": "user interface"
              }
              \`\`\`
              `
          },
          {
            "role": "user", 
            "content": `Analyze this survey answer: ${JSON.stringify(surveyAnswer.answer)}`
          }
        ],
      });

      const analyzedAnswer = JSON.parse(result.choices[0].message.content);
    } catch (error) {
      console.log(`An error occurred: ${error}`)
    }
  }
}

Finally, we'll save the answer from ChatGPT into a CSV file. This way we can analyze the results later:

JavaScript
const analyzeSurveyAnswers = async () => {
    // rest of function code...

      const analyzedAnswer = JSON.parse(result.choices[0].message.content);
      const csvLine = `"${analyzedAnswer.survey_answer}","${analyzedAnswer.sentiment}","${analyzedAnswer.theme}"\n`;
      fs.appendFile('analyzed_answers.csv', csvLine, (err) => {
        if (err) throw err;
        console.log('The analyzed answer was appended to file!');
      });
    } catch (error) {
      console.log(`An error occurred: ${error}`)
    }
  }
}

If you now run node survey-analyzer.js, you should see your analyzed answers appear in your new CSV file.

Batching our requests

Since we submit each survey answer one at a time, it can be quite tedious waiting for the script to complete. To improve this, we can batch our survey answers so we analyze multiple at the same time.

The only constraint is the ChatGPT model token limit:

A token can be thought of as roughly 4 characters or 0.75 words. When you make an API request, the text in the messages parameter and the response from the API are converted into tokens. This means that the number of tokens in your API call is number tokens in your request plus the number of tokens in the response.

The token limit refers to the maximum number of tokens that the model can process in a single request. If we exceed this limit, the model won't be able to handle the request, resulting in an error. Each model has a different limit, ranging from approximately 4,000 to 32,000 tokens.

The gpt-3.5-turbo-16k model we use has a large limit (~16,000 tokens), while still being sufficiently fast and powerful for analyzing survey answers.

This brings us to the question: How many survey answers can we batch and analyze at the same time?

Using tokenizer, a tool that counts the number of tokens in a string, we see that each survey answer is roughly 60 tokens and each response is about 120 tokens.

So the total cost to analyze each survey answer is approximately 180 tokens. This means we can batch at most 16,000 / 180 = 88 answers together. To be extra safe and ensure we don't exceed this limit, we'll batch at most 50 answers together.

Let's update our code to do that:

First we use a variable batchSize = 50 in our for-loop.
Next, we modify our messages argument to request ChatGPT to analyze multiple survey answers (instead of only one at a time).
Finally, we save the array from the API response into our CSV file.

JavaScript
// rest of code...

const analyzeSurveyAnswers = async () => {
  const surveyAnswers = await csvToArr('answers.csv')
  const batchSize = 50;
  
  for (let i = 0; i < surveyAnswers.length; i += batchSize) {
    const batch = surveyAnswers.slice(i, i + batchSize);
    try {
      const result = await openai.chat.completions.create({
        model: "gpt-3.5-turbo-16k",
        messages: [
          {
            "role": "system", 
            "content": `Your job is to analyze survey answers. For each answer, extract the sentiment (positive or negative only) as well as the theme in the answer.

              You must only respond with an array containing JSONs. These JSON objects must contain the keys "survey_answer", "sentiment", and "theme". 
              
              Here's an example of an valid answer: 
              \`\`\`
              [
                {
                  "survey_answer": "The product has potential, but it's not very intuitive to use. Please simplify the interface",
                  "sentiment": "negative",
                  "theme": "user interface",
                }
              ]
              \`\`\`
              `
          },
          {
            "role": "user", 
            "content": `Analyze these survey answer: ${JSON.stringify(batch.map(surveyAnswer => surveyAnswer.answer))}`
          }
        ],
      });

      const analyzedAnswers = JSON.parse(result.choices[0].message.content);
      analyzedAnswers.forEach(r => {
        const csvLine = `"${r.survey_answer}","${r.sentiment}","${r.theme}"\n`;
        fs.appendFile('analyzed_answers.csv', csvLine, (err) => {
          if (err) throw err;
          console.log('The analyzed answer was appended to file!');
        });
      });
    } catch (error) {
      console.log(`An error occurred: ${error}`)
    }
  }
}

// rest of code...

Now if you run node survey-analyzer.js, ChatGPT analyzes 50 survey answers at a time.

Analyzing results in Excel or Google Sheets

Now that we've analyzed our survey answers, we can import them into Excel or Google Sheets to visualize them.

For example, you can create a pivot table to find the most common themes in the survey answers. You can do this by:

Importing analyzed_answers.csv in your sheet. Then, insert a new row at the top of the sheet so you can name the columns answer, sentiment and theme.
Clicking on the "Insert" tab, then "Pivot table".
Select the data range of your results.
In your pivot table editor, select theme in both the Rows and Value field.
Lastly, sort by descending COUNTA of theme.

You should now see a sorted list of the most common themes in your survey answers.

Reducing duplicate themes

You may have noticed that some of the themes suggested by ChatGPT are similar to others (but not the exact same).

For example, when we ran the script, it suggested a theme of user permissions for one survey answer but user roles and permissions for another. These are clearly the same theme, but the slightly different names mean that it's harder to aggregate the results in our pivot table.

To fix this, we create a new script reduce-themes.js which will aggregate similar themes together and reduce the total number of distinct themes.

In the script we:

Extract the themes analyzed by ChatGPT from analyzed_answers.csv.
Send this list of themes to ChatGPT and ask it aggregate it, thus reducing the number of similar themes.
Use the response from ChatGPT to update similar themes in analyzed_answers.csv

Altogether, this looks like this:

JavaScript
// in reduce-themes.js
const fs = require('fs');
const csv = require('csv-parser');
const OpenAI = require("openai");

const openai = new OpenAI({
  apiKey: "<your-api-key>"
});

const csvToArr = async (filePath) => {
  const results = [];
  return new Promise((resolve, reject) => {
    fs.createReadStream(filePath)
      .pipe(csv({ headers: ['answer', 'sentiment', 'theme'] }))
      .on('data', data => results.push(data))
      .on('end', () => {
        resolve(results);
      })
      .on('error', error => {
        reject(error);
      });
  });
}

const reduceThemes = async () => {
  const analyzedAnswers = await csvToArr('analyzed_answers.csv')  
    try {
      const themes = new Set(analyzedAnswers.map(answer => answer.theme))
      const result = await openai.chat.completions.create({
        model: "gpt-4",
        messages: [
          {
            "role": "system", 
            "content": `Your job is to aggregate this list of themes and group similar themes together. By grouping the themes together, you will reduce the number of themes in the list.
            In your answer, you must list the old theme and the new theme are you aggregating it to. 
            
            You must only respond with a JSON object. This JSON objects must use the old themes as the keys, and the values must be the new themes. 
              
              Here's an example of an valid answer: 
              \`\`\`
                {
                  "collaboration": "collaboration"
                  "collaboration tools": "collaboration",
                  "user permissions": "user permissions",
                  "user roles and permissions": "user permissions",
                  "integration": "integrations",
                  "integrations": "integrations",
                }
              \`\`\`
              `
          },
          {
            "role": "user", 
            "content": `Aggregate these themes: ${JSON.stringify(Array.from(themes))}`
          }
        ],
      });
      const newThemes = JSON.parse(result.choices[0].message.content);

      // update the previously analyzed answers with the new themes
      const updatedAnswers = analyzedAnswers.map(old => {
        const updatedAnswer = old
        updatedAnswer.theme = newThemes[old.theme]
        return updatedAnswer
      })

      // overwrite the current contents of analyzed_answers.csv with the updated answers
      const lines = analyzedAnswers.map(r => `"${r.answer}","${r.sentiment}","${r.theme}"`).join('\n');
      fs.writeFile('analyzed_answers.csv', lines, (err) => {
        if (err) throw err;
        console.log('All analyzed answers were written to file!');
      });

    } catch (error) {
      console.log(`An error occurred: ${error}`)
    }
}

reduceThemes()

Now, running node reduce-themes.js will aggregate similar themes. If you re-import your list in Google Sheets or Excel and create a pivot table, you should see fewer themes.

How to analyze surveys with ChatGPT

Contents