yep this is what I meant. hallucinating, justifying or missing bad stuff.
additionally, similar to how large PRs are more likely to just be skimmed and replied with a "LGTM!", an LLM missing some bad stuff but still producing a seemingly thorough review would increase the chance of the bad stuff making its way in.
allowing LLMs to write code would be fine if it's truly verified by a human, but let another LLM hallucinate and cloud a person's judgement and you've got a problem
"Please review the above code. How does it work? Is it well designed? Is it efficient? What are its good points and its bad points? How should it be improved? Is it readable and maintainable?"
i feel like gpt-4's code review (included below) was mostly correct and useful. however, the efficiency concerns in particular are unfounded, and the python approach to handling errors like those cited is to just let the exception propagate, suboptimal though that is for usability. also, i'm not sure i agree with its high opinion of the modularity, usability, and readability
simply pasting gpt-4's partly incorrect analysis into a textarea when someone asks you for a code review would be malpractice, but it could be a useful source of ideas for possible improvements to suggest and possible problems to investigate. also, i've found it to be especially good at coming up with plausible ways that someone could misunderstand my code, which is to say, ways that my code is unclear and therefore could have its clarity improved
overall i'd say that gpt-4 is a lot better at reviewing code than at writing code. i've gotten significant improvements out of its code by asking it to review its own code looking for improvements, then asking it what the code would look like with those improvements. also, having it write test suites, then showing it the results of running the test suites
— ⁂ —
The provided Python script is designed to generate and print arithmetic tables
for balanced nonary (base 9) numbers using a unique set of Unicode glyphs. The
script covers operations such as multiplication, maximum, addition, and
subtraction. Here's a breakdown of how the code works and an evaluation of its
design:
### How It Works:
1. `gen_table` Function:
- This function generates a table for a given binary operation (`op`) like
multiplication or addition.
- It takes an optional `name` parameter for labeling the operation in the
table header.
- The `values` parameter specifies the range of values to be used in
generating the table (default is from -4 to 4).
- It constructs each row by applying the operation between all pairs of
values.
2. `nonary` Function:
- Converts an integer into its balanced nonary representation using
specified glyphs.
- Handles negative numbers and numbers greater than 8 by recursive calls.
3. `layout` Function:
- Formats the table into aligned columns for better readability.
4. `paste` Function:
- Combines multiple tables horizontally with spaces between them.
5. Main Execution Block (`if __name__ == '__main__'`):
- Generates specific tables for different operations and prints them
formatted properly.
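The balanced-nonary conversion described above might look something like this recursive sketch. The glyph string and the function signature here are placeholders, not the script's actual code; balanced nonary uses digits -4 through 4, which is what makes a plain divmod insufficient:

```python
GLYPHS = "wxyz0abcd"  # hypothetical glyphs for digits -4..-1, 0, 1..4

def nonary(n: int, glyphs: str = GLYPHS) -> str:
    """Convert an integer to balanced nonary (a guess at the script's approach)."""
    d = ((n + 4) % 9) - 4      # nearest balanced digit, in -4..4
    rest = (n - d) // 9        # what remains after peeling off that digit
    digit = glyphs[d + 4]
    if rest == 0:
        return digit
    return nonary(rest, glyphs) + digit  # recursion, as the review describes
```

For example, `nonary(5)` peels off a digit of -4 and recurses on 1, since 5 = 1×9 + (-4); negative numbers fall out of the same arithmetic with no special case.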
### Evaluation:
Good Points:
- Modularity: Functions like `gen_table`, `nonary`, and `layout` each
encapsulate a specific task cleanly.
- Reusability: The functions can be reused or extended easily; e.g., adding
new operations or changing glyph sets.
- Readability: Despite some complexity in logic, especially in recursion
and formatting, variable names and function names are descriptive, which aids
understanding.
Areas for Improvement:
1. Efficiency Concerns:
- Recursion in `nonary`: For large numbers, this could lead to deep
recursion levels.
- Repeated calculation of widths in `layout`: This could be optimized if
performance becomes an issue with larger data sets.
2. Error Handling:
- There's no error handling for invalid inputs, such as non-existent
glyphs or unsupported operations.
3. Documentation & Comments:
- While there's a detailed module-level docstring explaining high-level
concepts, inline comments explaining complex parts (like recursion in `nonary`)
would improve maintainability.
- More detailed docstrings on what each function expects as input and what
it outputs would help other developers understand their interfaces quickly
without diving into code details.
4. Testing & Validation:
- Adding unit tests would ensure that changes do not break existing
functionality and help validate edge cases like very large/small numbers or
unusual sequences of operations.
5. Scalability & Generalization:
- Currently tailored specifically to the balanced nonary system;
generalizing it to handle any base might make it more useful as a tool for
exploring other numeral systems.
6. User Interface Improvements:
- Providing command-line arguments or interactive prompts could allow
users to specify ranges or select specific tables without modifying source code
directly.
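The testing suggestion above could be sketched as a round-trip property check. Everything here is hypothetical: the `nonary` stand-in and the glyph string "wxyz0abcd" are assumptions so the test runs alone, and the real script's interface may differ:

```python
import unittest

GLYPHS = "wxyz0abcd"  # assumed glyphs for digits -4..4

def nonary(n, glyphs=GLYPHS):
    # stand-in for the script's own conversion, so this file is self-contained
    d = ((n + 4) % 9) - 4
    rest = (n - d) // 9
    digit = glyphs[d + 4]
    return digit if rest == 0 else nonary(rest, glyphs) + digit

class NonaryRoundTrip(unittest.TestCase):
    def decode(self, s):
        # read a glyph string back into an integer
        value = 0
        for ch in s:
            value = value * 9 + (GLYPHS.index(ch) - 4)
        return value

    def test_round_trip(self):
        # covers zero, negatives, and multi-digit values in one sweep
        for n in range(-100, 101):
            self.assertEqual(self.decode(nonary(n)), n)

if __name__ == "__main__":
    unittest.main()
```

A round-trip test like this catches off-by-one digit errors without hand-writing expected strings for each value.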
### Conclusion:
The script is well-designed concerning modularity and separation of concerns
but lacks robustness due to missing error handling and testing components.
Enhancements to documentation, user interaction, and
scalability/generalization could significantly increase its utility while
maintaining ease of use for educational purposes or numeral-system
explorations.