Part 3: Vibe Coding with LLM — Ask ChatGPT to Write Your Code

With the environment from Part 2 ready, this is where the course gets exciting. In this part, you will develop the habit of describing the analysis you want in plain English to an LLM, having it write Python code as an .ipynb file, and running it in your own VSCode. Even without knowing how to code, you can ask an LLM to write it for you. When errors occur, you can ask the LLM to fix them. This back-and-forth style is called "vibe coding."

01 What is vibe coding?

Vibe Coding is a coding style where, instead of writing code yourself, you describe your intent in natural language and have an LLM write the code for you. Tell it "I want to run this analysis" or "I want to read this CSV and aggregate by column," and the LLM translates that into Python code.

For this course, the free version of either ChatGPT (OpenAI) or Claude (Anthropic) is sufficient; both can handle every exercise here.

Paid plans (ChatGPT Plus, Claude Pro) give access to more capable models with higher code accuracy. However, the free plan is plenty to start. Consider upgrading once you've gotten comfortable.

02 Critical first: handling personal and confidential data

Before you start using an LLM, confirm one absolute rule upfront so you don't accidentally share sensitive information as you work through the examples.

Never send the following to an LLM:
· Raw experimental data (especially anything containing subject information or personally identifiable information)
· The text or figures of an unpublished manuscript
· Information covered by a confidentiality agreement in a collaborative project
· Personal information (names, addresses, IDs, etc.)

The only thing you should send to an LLM is the structure of your data (column names, what each column means, approximate row count). For example, the following kind of prompt is perfectly fine:

sample_data.xlsx has 4 columns: "Sample ID", "Biomass", "Leaf Area", and "Nitrogen Content", with about 100 rows. All values are numeric except Sample ID, which is a string. Please read this file and calculate the correlation between Biomass and Leaf Area.

The key is to describe only the shape of your data to the LLM. The code runs entirely on your own PC, so the data itself never leaves your machine.

Using the "Temporary Chat" feature

Both ChatGPT and Claude offer a Temporary Chat mode, where the conversation is not used for training data and does not appear in your chat history. This is useful for sensitive requests or when discussing your lab's internal data structure.

ChatGPT: "Temporary Chat" toggle near the account icon in the top right
Claude: Check settings when starting a new chat, or under Account → Privacy settings

Using Temporary Chat does not mean it is safe to send actual data. Think of it as an additional safeguard on top of keeping shared information to the bare minimum.

03 Starting point: extract your data structure and share it with the LLM

As Step 02 established, you cannot send your raw data to the LLM. But without knowing your data structure (column names, types, row count), the LLM cannot write accurate code. The solution is a two-step workflow that forms the foundation of vibe coding: extract a structural summary of your data locally, then share only that structural information with the LLM. Skipping this step causes the LLM to assume incorrect column names or types — always start here.

Steps

1. Ask the LLM: "Please write a short code snippet that outputs only the structure of my data (column names, data types, missing value counts, and min/max/mean for each numeric column)"
2. Run the returned code in VSCode
3. Copy the output (structural information only; check that it contains no personal data)
4. Paste that output back into the LLM and continue: "Based on this data structure, please write the code to perform [X] analysis"

First, I'd like to check my data structure. Please write a short code snippet that reads my_first_analysis/data/sample_data.xlsx (the notebook lives in the notebooks/ folder, so in code use ../data/sample_data.xlsx) and outputs only the column names, data types, missing value counts, and min/max/mean for each numeric column. I'll paste the output back to you afterward and we'll continue from there.

Once you paste the output (e.g., "Columns are A, B, C; A is numeric, no missing values, min 0.5, max 12.3...") back into the LLM, it can return highly accurate code based on your actual data structure — without you having sent any of the actual data.
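
For orientation, the snippet the LLM returns for this request might look roughly like the sketch below. It is an illustration under stated assumptions, not the code any particular model will produce: the function name summarize_structure is hypothetical, pandas is assumed to be installed, and a small made-up DataFrame stands in for the real Excel file so that the sketch runs on its own.

```python
import pandas as pd


def summarize_structure(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and min/max/mean for numeric columns."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })
    numeric = df.select_dtypes(include="number")
    # agg() returns one row per statistic; transpose so each column becomes a row
    stats = numeric.agg(["min", "max", "mean"]).T
    # Non-numeric columns simply get empty (NaN) statistics after the join
    return summary.join(stats)


# Made-up stand-in for: pd.read_excel("../data/sample_data.xlsx")
# (reading .xlsx files with pandas typically also needs the openpyxl package)
demo = pd.DataFrame({
    "Sample ID": ["S1", "S2", "S3"],
    "Biomass": [1.0, 2.0, None],
    "Leaf Area": [10.0, 20.0, 30.0],
})
structure = summarize_structure(demo)
print(structure)
```

Only the printed table is pasted back into the LLM. Before pasting, glance over it to confirm that nothing in it (for example an unusual column name) identifies a person or project.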

04 The most important skill: articulate your intent clearly and systematically

The most important skill in vibe coding is not learning Python syntax or memorizing useful libraries. It is understanding precisely what you want to do and communicating that to the LLM concretely and systematically.

LLMs are not magic: they infer code from the information you give them. A vague request like "please aggregate my data" gives the LLM no basis for deciding how to aggregate. Conversely, a specific request like "for each unique value in column A, calculate the mean and standard deviation of column B, sort by column A descending, and save the result as a CSV in the results/ folder" leaves little to guess, so the returned code is far more likely to do what you want on the first try.
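
To see why the specific request is easier to satisfy, note that each clause in it maps onto one step of code. The sketch below is one possible translation, assuming pandas; the function name aggregate_by_group, the demo values, and the output filename summary.csv are all hypothetical, and a temporary folder stands in for results/.

```python
import tempfile
from pathlib import Path

import pandas as pd


def aggregate_by_group(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Mean and standard deviation of value_col per group, sorted by group descending."""
    return (
        df.groupby(group_col)[value_col]   # "for each unique value in column A"
        .agg(["mean", "std"])              # "mean and standard deviation of column B"
        .sort_index(ascending=False)       # "sort by column A descending"
        .reset_index()
    )


# Made-up data standing in for the real file
demo = pd.DataFrame({"A": ["x", "x", "y", "y"], "B": [1.0, 3.0, 10.0, 20.0]})
result = aggregate_by_group(demo, "A", "B")

# "save the result as a CSV in the results/ folder"
# (a temporary folder is used here; in the course project this would be ../results/)
out_dir = Path(tempfile.mkdtemp()) / "results"
out_dir.mkdir(parents=True, exist_ok=True)
result.to_csv(out_dir / "summary.csv", index=False)
print(result)
```

The vague request "please aggregate my data" fixes none of these four steps, which is why the LLM has to guess.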

Checklist for what to include in a prompt

☐ Purpose: What is this analysis for?
☐ Input: What data, where, and in what structure? (paste the structural info from Step 03)
☐ Processing: What specific calculation or aggregation do you want?
☐ Output: What to save, where, and in what format?
☐ Constraints: Which libraries to use, what behavior to avoid, etc.

Simply keeping this checklist in mind will dramatically improve the quality of code the LLM returns. Conversely, if you yourself are unclear about what you want, no amount of prompting will produce the right result. Half of the analysis is decided in the "articulate your intent" stage — before any code is written.
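
If you like, the checklist can also be kept as a small reusable template. The sketch below is only one way to do that; build_prompt is a hypothetical helper, not part of any library, and it does nothing more than refuse to produce a prompt while a checklist item is still empty.

```python
def build_prompt(purpose: str, input_desc: str, processing: str,
                 output: str, constraints: str) -> str:
    """Assemble a prompt from the five checklist items, rejecting empty ones."""
    sections = [
        ("Purpose", purpose),
        ("Input", input_desc),
        ("Processing", processing),
        ("Output", output),
        ("Constraints", constraints),
    ]
    missing = [name for name, text in sections if not text.strip()]
    if missing:
        raise ValueError("Checklist items still empty: " + ", ".join(missing))
    # One bracketed heading per item, separated by blank lines
    return "\n\n".join(f"[{name}]\n{text.strip()}" for name, text in sections)


prompt = build_prompt(
    purpose="Examine the relationship between biomass and leaf area.",
    input_desc="data/sample_data.xlsx with columns Sample ID, Biomass, Leaf Area, Nitrogen Content",
    processing="Calculate the correlation coefficient and p-value.",
    output="Save statistics.csv to the results/ folder.",
    constraints="Add comments to each cell.",
)
print(prompt)
```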

05 Making your first request (ask for an .ipynb file directly)

Open ChatGPT (or Claude) and start a new chat. Both ChatGPT and Claude can now output an .ipynb file directly and make it available for download. There is no need to copy and paste code cell by cell. In your first request, explicitly ask: "Please write this as an .ipynb file and make it available for download."

Turning on web search can improve accuracy: for questions about the latest library specifications or up-to-date error fixes, enable the web search icon (globe icon) in the ChatGPT input bar before sending your prompt. Claude has a similar web search feature. This is especially useful for newer libraries or version-specific questions.

I'd like to do data analysis in Python.

[Purpose]
Examine the relationship between biomass and leaf area in a new crop variety.

[Folder structure]
- The notebook is at notebooks/analysis.ipynb
- Raw data is in the data/ folder; outputs go in the results/ folder
- So in code, read data from ../data/ and save results to ../results/ (create results/ if it doesn't exist)

[Input]
data/sample_data.xlsx (from the notebook: ../data/sample_data.xlsx)
- Columns: "Sample ID" (string), "Biomass", "Leaf Area", "Nitrogen Content" (numeric)
- Rows: approximately 100
- (paste the detailed structural output from Step 03 here)

[Processing]
- Calculate basic statistics for each column
- Create a scatter plot of "Leaf Area vs Biomass" with a regression line
- Calculate the correlation coefficient and p-value

[Output]
Save scatter.png (scatter plot) and statistics.csv (summary statistics) to the project's results/ folder (from the notebook: ../results/).

[Constraints]
- Add English comments to each cell explaining what it does
- Please output this as an .ipynb file named analysis.ipynb and make it available for download

Once the LLM generates the file, a download link will appear. Follow these steps:

1. Download analysis.ipynb from the link (it typically saves to your Downloads folder)
2. In Explorer/Finder, move the downloaded analysis.ipynb into the my_first_analysis/notebooks/ folder you created in Part 2
3. With the folder open in VSCode, click analysis.ipynb to open and run it

If you place the file anywhere other than notebooks/, the code's relative paths (e.g., "../data/sample_data.xlsx") will fail to find the data, producing a FileNotFoundError. Always place the notebook inside the notebooks/ folder (it reads data from ../data/).
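
The effect of the notebook's location on relative paths can be checked with a few lines of standard-library Python. The sketch below builds a throwaway copy of the folder layout in a temporary directory, so it does not touch your real project; project_paths is a hypothetical helper, and the sketch assumes the notebook's working directory is the folder the notebook lives in, which is the usual default in VSCode but can be changed in settings.

```python
import tempfile
from pathlib import Path


def project_paths(working_dir: Path) -> tuple[Path, Path]:
    """Resolve ../data/sample_data.xlsx and ../results/ relative to working_dir."""
    data_file = (working_dir / ".." / "data" / "sample_data.xlsx").resolve()
    results_dir = (working_dir / ".." / "results").resolve()
    return data_file, results_dir


# Throwaway copy of the project layout (stands in for my_first_analysis/)
project = Path(tempfile.mkdtemp()) / "my_first_analysis"
(project / "notebooks").mkdir(parents=True)
(project / "data").mkdir()
(project / "data" / "sample_data.xlsx").touch()  # empty placeholder, not a real workbook

# Notebook placed correctly in notebooks/: ../data resolves inside the project
data_ok, results_dir = project_paths(project / "notebooks")
results_dir.mkdir(exist_ok=True)  # create results/ if it doesn't exist
print("found from notebooks/:", data_ok.exists())

# Notebook placed in the project root by mistake: ../data now points outside the project
data_wrong, _ = project_paths(project)
print("found from project root:", data_wrong.exists())
```

In a real notebook, printing Path.cwd() in the first cell is a quick way to confirm where relative paths start from.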

06 Running, partial editing, and full rewrites

Open the downloaded analysis.ipynb in VSCode. Run the cells from top to bottom (Shift + Enter). As you review the output, ask the LLM in plain English for any changes you want.
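
For orientation, the cells an LLM returns for the Step 05 request might boil down to something like the sketch below. It is not the output of any particular model. The data is synthetic (generated with made-up parameters), the use of scipy.stats.linregress and matplotlib is an assumption about which libraries the LLM picks, putting Leaf Area on the x-axis is one reading of "Leaf Area vs Biomass", and results go to a temporary folder instead of ../results/.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # lets the sketch run as a plain script; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for pd.read_excel("../data/sample_data.xlsx"); values are made up
rng = np.random.default_rng(0)
leaf_area = rng.uniform(10, 50, size=100)
df = pd.DataFrame({
    "Leaf Area": leaf_area,
    "Biomass": 0.5 * leaf_area + rng.normal(0, 2, size=100),
})

results_dir = Path(tempfile.mkdtemp())  # in the course project this would be ../results/

# Basic statistics for each column
df.describe().to_csv(results_dir / "statistics.csv")

# One call returns slope, intercept, correlation coefficient (rvalue), and p-value
fit = stats.linregress(df["Leaf Area"], df["Biomass"])

# Scatter plot with the fitted regression line
fig, ax = plt.subplots()
ax.scatter(df["Leaf Area"], df["Biomass"])
x_line = np.linspace(df["Leaf Area"].min(), df["Leaf Area"].max(), 100)
ax.plot(x_line, fit.intercept + fit.slope * x_line, color="red")
ax.set_xlabel("Leaf Area")
ax.set_ylabel("Biomass")
fig.savefig(results_dir / "scatter.png", dpi=150)
plt.close(fig)

print(f"r = {fit.rvalue:.3f}, p = {fit.pvalue:.3g}")
```

Reading a sketch like this once makes the later edit requests easier to phrase, because you can point at concrete things such as the axis labels or the line color.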

Example — partial edit

Regarding the scatter plot code from before, could you make the following changes? Please re-output the updated .ipynb for download.
· Set the chart title to "Leaf Area vs Biomass (n=100)"
· Change the point color to green
· Use English labels for both axes

Example — full rewrite

If you want to fundamentally change the approach, simply ask for a full rewrite.

Instead of the previous approach, could you redraw the scatter plot using the seaborn library? Please include a regression line as well. Rewrite the entire code and output a new .ipynb file for download.

07 Handling errors

Errors will happen; in fact, an error-free first run would be unusual, so there is no need to panic when one appears. Copy the entire error message and paste it into the LLM (ChatGPT or Claude). That is all you need to do.

I got the following error when running the code. Could you fix it? Please output the corrected .ipynb for download.

---
FileNotFoundError: [Errno 2] No such file or directory: '../data/sample_data.xlsx'
---

The LLM will infer the cause from the error message and return a fixed file. Download the corrected version and run it again — repeat this cycle as needed.
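
"The entire error message" means the whole traceback, not just its last line: the final line names the error type, while the lines above it show where in the code it happened. The sketch below illustrates the difference using only the standard library; run_and_capture and load_missing_file are hypothetical names, and the file path is deliberately one that does not exist.

```python
import traceback


def run_and_capture(func):
    """Call func(); return the full traceback text if it raises, otherwise None."""
    try:
        func()
    except Exception:
        return traceback.format_exc()
    return None


def load_missing_file():
    # Deliberately points at a path that does not exist
    with open("no_such_folder_for_demo/sample_data.xlsx", "rb") as handle:
        return handle.read()


report = run_and_capture(load_missing_file)
print(report)                           # the full text worth pasting into the LLM
print(report.strip().splitlines()[-1])  # the last line alone carries far less context
```

In a VSCode notebook you normally do not need a helper like this, because the full traceback already appears in the cell output; select all of it when copying.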

Common errors and how to handle them

ModuleNotFoundError: A required library is missing → with the virtual environment activated, run pip install library-name in the terminal
FileNotFoundError: The file path is wrong → check the filename and location
KeyError: A column name is wrong → make sure it exactly matches the column names in your Excel file
SyntaxError: A coding mistake → ask the LLM to fix it
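
A ModuleNotFoundError can often be anticipated before running the whole notebook. The sketch below uses the standard library's importlib.util.find_spec to list which top-level libraries are not importable in the current environment; missing_libraries is a hypothetical helper and the second library name in the demo is intentionally fake.

```python
import importlib.util


def missing_libraries(names):
    """Return the top-level module names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately fake
needed = ["json", "a_library_that_does_not_exist_xyz"]
print(missing_libraries(needed))
```

Keep in mind that the name used in import statements is not always the name used with pip install (for example, the module sklearn is installed as scikit-learn), so ask the LLM for the install name if the two differ.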

08 When needed: changing your Python version

As you work through analyses, you may encounter situations where "this library only supports up to Python 3.10" or "that library requires Python 3.12 or later." The LLM may also suggest "please use Python 3.X" while helping you resolve an error.

To change your Python version, follow these steps:

1. Delete the old virtual environment: Delete the .venv folder inside your project folder using Explorer/Finder (delete the whole folder).
2. Install the required Python version: Download it from python.org/downloads (e.g., Python 3.10). It can coexist with your existing Python installation.
3. Create a new virtual environment with the specified version: On Windows: py -3.10 -m venv .venv; on Mac/Linux: python3.10 -m venv .venv.
4. Activate the new environment and reinstall libraries: On Windows (PowerShell), run .venv\Scripts\Activate.ps1; on Mac/Linux, run source .venv/bin/activate. Then run pip install ... to reinstall.
5. Select the new .venv via "Select Kernel" in VSCode.
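
After switching, it is worth confirming that the notebook really runs on the new interpreter. The sketch below is a minimal check using only the standard library; describe_interpreter is a hypothetical helper name. Comparing sys.prefix with sys.base_prefix is a common way to tell whether Python is running inside a virtual environment created with venv.

```python
import sys


def describe_interpreter() -> dict:
    """Report the running Python version and whether it belongs to a virtual environment."""
    info = sys.version_info
    return {
        "version": f"{info.major}.{info.minor}.{info.micro}",
        # Inside a venv, sys.prefix points at the environment, not the base installation
        "in_venv": sys.prefix != sys.base_prefix,
        "executable": sys.executable,
    }


info = describe_interpreter()
print(info)
```

If the version shown is not the one you installed, or the executable path does not point inside your project's .venv folder, the kernel selection in VSCode is the first thing to recheck.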

Think of virtual environments as disposable work rooms. Don't hesitate to delete and recreate one. There are no important settings stored inside. As long as you keep your requirements.txt (explained in Part 4), recreation is straightforward.

09 The typical vibe coding workflow

The repeating pattern in real-world analysis looks like this:

1. Extract data structure and share it with the LLM (Step 03)
2. Clarify what you want and communicate it systematically to the LLM (Steps 04 & 05)
3. Download the .ipynb the LLM generates → move it to the project folder → run it
4. Error occurs → paste the error into the LLM and ask for a fix (Step 07)
5. It runs but the result is not what you expected → describe the difference and ask for an improvement (Step 06)
6. Satisfied → move on to the next analysis (repeat from step 1)

As you go through this cycle repeatedly, you develop an intuition for "this kind of prompt produces this kind of code." You don't need to memorize Python syntax. Instead, you train the skill of articulating what you want clearly.

✓ What you can now do

The fundamental rule of not sending personal or confidential data, and how to use Temporary Chat
A checklist for communicating your intent systematically to an LLM
The workflow of asking for .ipynb files directly → downloading → running in VSCode
Error handling (copying the full error message and giving it to the LLM)
How to request partial edits and full rewrites
The two-step workflow of extracting structural data and sharing it with the LLM
How to change your Python version when needed

In Part 4, you will learn how to create a requirements.txt file that lets others reproduce the analysis environment you built here.
