For an open-source alternative, check out https://github.com/OpenAdaptAI/OpenAdapt. We combine the Segment Anything Model with GPT-4V to understand recordings of workflows in desktop user interfaces, then replay them according to the user's instructions.