yep this is what I meant. hallucinating, justifying or missing bad stuff.
additionally, similar to how large PRs are more likely to just be skimmed and replied with a "LGTM!", an LLM missing some bad stuff but still producing a seemingly thorough review would increase the chance of the bad stuff making its way in.
allowing LLMs to write code would be fine if it's truly verified by a human, but let another LLM hallucinate and cloud a person's judgement and you've got a problem
"Please review the above code. How does it work? Is it well designed? Is it efficient? What are its good points and its bad points? How should it be improved? Is it readable and maintainable?"
i feel like gpt-4's code review (included below) was mostly correct and useful. however, the efficiency concerns in particular are unfounded, and the python approach to handling errors like those cited is to just let the exception propagate, suboptimal though that is for usability. also, i'm not sure i agree with its high opinion of the modularity, usability, and readability
simply pasting gpt-4's partly incorrect analysis into a textarea when someone asks you for a code review would be malpractice, but it could be a useful source of ideas for possible improvements to suggest and possible problems to investigate. also, i've found it to be especially good at coming up with plausible ways that someone could misunderstand my code, which is to say, ways that my code is unclear and therefore could have its clarity improved
overall i'd say that gpt-4 is a lot better at reviewing code than at writing code. i've gotten significant improvements out of its code by asking it to review its own code looking for improvements, then asking it what the code would look like with those improvements. also, having it write test suites, then showing it the results of running the test suites
— ⁂ —
The provided Python script is designed to generate and print arithmetic tables
for balanced nonary (base 9) numbers using a unique set of Unicode glyphs. The
script covers operations such as multiplication, maximum, addition, and
subtraction. Here's a breakdown of how the code works and an evaluation of its
design:
### How It Works:
1. `gen_table` Function:
- This function generates a table for a given binary operation (`op`) like
multiplication or addition.
- It takes an optional `name` parameter for labeling the operation in the
table header.
- The `values` parameter specifies the range of values to be used in
generating the table (default is from -4 to 4).
- It constructs each row by applying the operation between all pairs of
values.
2. `nonary` Function:
- Converts an integer into its balanced nonary representation using
specified glyphs.
- Handles negative numbers and numbers greater than 8 by recursive calls.
3. `layout` Function:
- Formats the table into aligned columns for better readability.
4. `paste` Function:
- Combines multiple tables horizontally with spaces between them.
5. Main Execution Block (`if __name__ == '__main__'`):
- Generates specific tables for different operations and prints them
formatted properly.
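The balanced-nonary conversion described above might look something like this recursive sketch. The glyph string and the function signature here are placeholders, not the script's actual code; balanced nonary uses digits -4 through 4, which is what makes a plain divmod insufficient:

```python
GLYPHS = "wxyz0abcd"  # hypothetical glyphs for digits -4..-1, 0, 1..4

def nonary(n: int, glyphs: str = GLYPHS) -> str:
    """Convert an integer to balanced nonary (a guess at the script's approach)."""
    d = ((n + 4) % 9) - 4      # nearest balanced digit, in -4..4
    rest = (n - d) // 9        # what remains after peeling off that digit
    digit = glyphs[d + 4]
    if rest == 0:
        return digit
    return nonary(rest, glyphs) + digit  # recursion, as the review describes
```

For example, `nonary(5)` peels off a digit of -4 and recurses on 1, since 5 = 1×9 + (-4); negative numbers fall out of the same arithmetic with no special case.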
### Evaluation:
Good Points:
- Modularity: Functions like `gen_table`, `nonary`, and `layout` each
encapsulate a specific task cleanly.
- Reusability: The functions can be reused or extended easily; e.g., adding
new operations or changing glyph sets.
- Readability: Despite some complexity in logic, especially in recursion
and formatting, variable names and function names are descriptive, which aids
understanding.
Areas for Improvement:
1. Efficiency Concerns:
- Recursion in `nonary`: For large numbers, this could lead to deep
recursion levels.
- Repeated calculation of widths in `layout`: This could be optimized if
performance becomes an issue with larger data sets.
2. Error Handling:
- There's no error handling for invalid inputs, such as non-existent
glyphs or unsupported operations.
3. Documentation & Comments:
- While there's a detailed module-level docstring explaining high-level
concepts, inline comments explaining complex parts (like recursion in `nonary`)
would improve maintainability.
- More detailed docstrings on what each function expects as input and what
it outputs would help other developers understand their interfaces quickly
without diving into code details.
4. Testing & Validation:
- Adding unit tests would ensure that changes do not break existing
functionality and help validate edge cases like very large/small numbers or
unusual sequences of operations.
5. Scalability & Generalization:
- Currently tailored specifically to the balanced nonary system;
generalizing it to handle any base might make it more useful as a tool for
exploring other numeral systems.
6. User Interface Improvements:
- Providing command-line arguments or interactive prompts could allow
users to specify ranges or select specific tables without modifying source code
directly.
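The testing suggestion above could be sketched as a round-trip property check. Everything here is hypothetical: the `nonary` stand-in and the glyph string "wxyz0abcd" are assumptions so the test runs alone, and the real script's interface may differ:

```python
import unittest

GLYPHS = "wxyz0abcd"  # assumed glyphs for digits -4..4

def nonary(n, glyphs=GLYPHS):
    # stand-in for the script's own conversion, so this file is self-contained
    d = ((n + 4) % 9) - 4
    rest = (n - d) // 9
    digit = glyphs[d + 4]
    return digit if rest == 0 else nonary(rest, glyphs) + digit

class NonaryRoundTrip(unittest.TestCase):
    def decode(self, s):
        # read a glyph string back into an integer
        value = 0
        for ch in s:
            value = value * 9 + (GLYPHS.index(ch) - 4)
        return value

    def test_round_trip(self):
        # covers zero, negatives, and multi-digit values in one sweep
        for n in range(-100, 101):
            self.assertEqual(self.decode(nonary(n)), n)

if __name__ == "__main__":
    unittest.main()
```

A round-trip test like this catches off-by-one digit errors without hand-writing expected strings for each value.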
### Conclusion:
The script is well-designed concerning modularity and separation of concerns
but lacks robustness due to missing error handling and testing components.
Enhancements to documentation, user interaction, and
scalability/generalization could significantly increase its utility while
maintaining ease of use for educational purposes or numeral-system
explorations.