o1-mini seems to get it on the first try (I didn't vet the code, but I tested it and it works on both examples provided in the notebook, `dates` and `gabe_dates`):
from collections import defaultdict
def find_cheryls_birthday(possible_dates):
    # Parse the dates into month and day
    dates = [date.split() for date in possible_dates]
    months = [month for month, day in dates]
    days = [day for month, day in dates]

    # Step 1: Albert knows the month and says he doesn't know the birthday
    # and that Bernard doesn't know either. This implies the month has no unique days.
    month_counts = defaultdict(int)
    day_counts = defaultdict(int)
    for month, day in dates:
        month_counts[month] += 1
        day_counts[day] += 1

    # Months with all days appearing more than once
    possible_months = [month for month in month_counts if all(day_counts[day] > 1 for m, day in dates if m == month)]
    filtered_dates = [date for date in dates if date[0] in possible_months]

    # Step 2: Bernard knows the day and now knows the birthday
    # This means the day is unique in the filtered dates
    filtered_days = defaultdict(int)
    for month, day in filtered_dates:
        filtered_days[day] += 1
    possible_days = [day for day in filtered_days if filtered_days[day] == 1]
    filtered_dates = [date for date in filtered_dates if date[1] in possible_days]

    # Step 3: Albert now knows the birthday, so the month must be unique in remaining dates
    possible_months = defaultdict(int)
    for month, day in filtered_dates:
        possible_months[month] += 1
    final_dates = [date for date in filtered_dates if possible_months[date[0]] == 1]

    # Convert back to original format
    return ' '.join(final_dates[0]) if final_dates else "No unique solution found."

# Example usage:
possible_dates = [
    "May 15", "May 16", "May 19",
    "June 17", "June 18",
    "July 14", "July 16",
    "August 14", "August 15", "August 17"
]
birthday = find_cheryls_birthday(possible_dates)
print(f"Cheryl's Birthday is on {birthday}.")
In addition, after the models produced a first program with mistakes, the author should have shown them the invalid output and given them a chance to fix it. Humans who try to solve this on the first attempt, without running the code, also frequently get it wrong.
"seems to" isn't good enough, especially since it's entirely possible to generate code that doesn't give the right answer. 4o is able to write some bad code, run it, recognize that it's bad, and then fix it, if you tell it to.
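One way to get past "seems to" is to check a generated program against a direct encoding of the three statements in the puzzle, rather than against a single known answer. What follows is only a sketch of that idea, not the notebook's code: the helper names (`told`, `knows`, `statement_albert_1`, and so on) are made up for illustration, and it assumes dates are "Month Day" strings like the ones in the example above.

```python
DATES = [
    "May 15", "May 16", "May 19",
    "June 17", "June 18",
    "July 14", "July 16",
    "August 14", "August 15", "August 17",
]

def month(date):
    return date.split()[0]

def day(date):
    return date.split()[1]

def told(part, dates):
    """Dates still possible for someone told only `part` (a month or a day)."""
    return [d for d in dates if part in d.split()]

def knows(candidates):
    """A person knows the birthday when exactly one candidate remains."""
    return len(candidates) == 1

def statement_albert_1(date, dates):
    # Albert (told the month) doesn't know, and is sure that
    # Bernard (told the day) doesn't know either.
    after_month = told(month(date), dates)
    return (not knows(after_month)
            and all(not knows(told(day(d), dates)) for d in after_month))

def statement_bernard(date, dates):
    # Bernard didn't know at first, but knows after hearing Albert.
    after_day = told(day(date), dates)
    consistent = [d for d in after_day if statement_albert_1(d, dates)]
    return not knows(after_day) and knows(consistent)

def statement_albert_2(date, dates):
    # Albert knows after hearing Bernard.
    after_month = told(month(date), dates)
    consistent = [d for d in after_month if statement_bernard(d, dates)]
    return knows(consistent)

def solutions(dates):
    """Every date for which all three statements would be true."""
    return [d for d in dates
            if statement_albert_1(d, dates)
            and statement_bernard(d, dates)
            and statement_albert_2(d, dates)]

print(solutions(DATES))  # ['July 16']
```

Because `solutions` returns a list, it also shows when a date set has no solution or more than one, which a function that returns only the first match would hide. Comparing a generated function's output with `solutions(dates)` on several date lists is a stronger check than one passing example.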