I’m Anton, a founding engineer at Athena Intelligence. Athena is an AI data platform+agent that supports workflows for enterprise teams.
Over the last two years we’ve deployed state-of-the-art LLM data tooling to enterprise clients across several domains, zeroing in on co-piloting workflows and offloading more and more of the work to AI agents.
Today we’re proud to share a demo of the first (to our knowledge) autonomous data AI agent!
The Athena engineering team has built several key pillars for end-to-end autonomous workflows:
* Data platform: Athena connects to all structured data sources and supports embeds of all popular unstructured document types. We're built on Apache Superset and Iceberg, balancing secure multi-database queries on client warehouses with lightning-fast data pipelines for complex workflows. Building an unstructured document platform on top of that has enabled us to offer users granular control over their knowledge retrieval pipelines. We support a broad range of use cases, from low-latency, high-volume simple retrievals with Llama 3 on Groq to involved, self-reflective RAG pipelines powered by the most capable models (Claude 3 models, GPT-4T) and adaptive architectures.
* Execution platform: Athena has integrated SQL Editor and Jupyter Notebook environments to work with data, Notion-like reports and a Figma-like whiteboard to compile the end knowledge-work outputs, and a Browse environment to explore and retrieve information from the web. We are big believers in giving users more UI options so they can experiment and find workflows that work for them. Most of our “apps” use a Yjs backend, allowing live collaboration between people and agents.
* Headless and Human modes: every environment and its metadata are transparent to, and can be manipulated by, Athena (in addition to humans through the UI). That means any item of work can be collaborated on between AI and humans in a two-way fashion, in real time.
* Agentic tools and orchestration: each environment has a set of retrieval pipelines, tools, agents, and agent orchestration components to support execution of specific tasks (example: find the latest product catalog online) and end-to-end workflows (update our numbers and externalities, make a forecast for next month's orders). We serve a wide range of models, both proprietary and open-source, through agentic architectures built with LangChain and LangGraph (a minimal orchestration sketch follows this list).
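To make the orchestration bullet more concrete, here is a minimal plan-and-execute graph in LangGraph. This is a hedged sketch, not our production code: the state schema, node names, and step logic are hypothetical illustrations, and exact APIs can differ between LangGraph versions.

```python
# Minimal plan-and-execute loop in LangGraph (hypothetical state and node names).
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    task: str           # the problem statement received from the user
    plan: List[str]     # remaining high-level steps
    results: List[str]  # outputs of executed steps


def plan_step(state: AgentState) -> dict:
    # In a real system this would call an LLM to draft or revise the plan.
    if state["plan"]:
        return {}
    return {"plan": [f"research: {state['task']}", "write report"]}


def execute_step(state: AgentState) -> dict:
    # Execute the next step with the appropriate tool (SQL, notebook, browser, ...).
    step, *rest = state["plan"]
    return {"plan": rest, "results": state["results"] + [f"done: {step}"]}


def should_continue(state: AgentState) -> str:
    # Keep executing until the plan is exhausted, then finish.
    return "execute" if state["plan"] else END


graph = StateGraph(AgentState)
graph.add_node("plan", plan_step)
graph.add_node("execute", execute_step)
graph.set_entry_point("plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges("execute", should_continue)
app = graph.compile()

print(app.invoke({"task": "forecast next month's orders", "plan": [], "results": []}))
```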
As a result, you get an agent that can:
* Intake real-world problems through existing workplace channels like email or Slack.
* Compile and iterate on a high-level plan to solve the problem.
* Execute the plan step by step, incorporating self-correcting and 'human-in-the-loop' behaviors (a minimal sketch of this review-and-resume pattern follows this list).
* Access data sources, including freeform browsing and querying both structured and unstructured documents.
* Synthesize the final writeup for the multi-step analysis and disseminate it to platforms and colleagues downstream.
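The 'human-in-the-loop' behavior above can be pictured as an interrupt-and-resume pattern. Continuing the hypothetical graph from the earlier sketch, LangGraph lets you compile with a checkpointer and pause before a node so a person can review or edit the plan; checkpointer import paths and exact APIs vary between versions, so treat this as an illustration only.

```python
# Human-in-the-loop sketch: pause before the "execute" node, let a person
# review the drafted plan, then resume from the saved checkpoint.
# Reuses `graph` from the plan-and-execute sketch above; names are hypothetical.
from langgraph.checkpoint.memory import MemorySaver

app = graph.compile(checkpointer=MemorySaver(), interrupt_before=["execute"])
config = {"configurable": {"thread_id": "order-forecast-1"}}

# Run until the graph pauses right before execution.
app.invoke({"task": "forecast next month's orders", "plan": [], "results": []}, config)

# A human inspects (and could edit) the drafted plan at this point ...
print(app.get_state(config).values["plan"])

# ... then the agent resumes from the checkpoint and finishes the run.
app.invoke(None, config)
```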
You can see the agent in action in the linked video.
The autonomous mode is in closed beta for now. We’re deploying it across select customers for the next month, battle-testing the architecture and taking notes. We're also excited about expanding and upgrading key parts of the architecture to unlock features such as:
* “Save checkpoint” support in agentic workflows, so you can move back and forth between “execution paths” and specific nodes of a given workflow.
* "Time travel" capability with Apache Iceberg to enable versioning of data pipelines.
* Agent-to-agent interaction: have different agents collaborate on an analytical report to populate, review, and evaluate it before a human check is required.
* Asynchronous agents, autonomously executing background tasks like documenting newly connected datasets, identifying data drift, and compiling regular reports.
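For the Iceberg "time travel" item above: Iceberg already exposes snapshot reads, which is the mechanism the pipeline versioning would build on. Here is a rough PySpark sketch; the table name, snapshot ID, and timestamp are placeholders, and it assumes a Spark session already configured with an Iceberg catalog.

```python
# Iceberg time-travel reads with PySpark. Placeholders: table name, snapshot id,
# timestamp. Assumes the Spark session is configured with an Iceberg catalog.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Read the table as of a specific snapshot (e.g. the state a past pipeline run saw).
df_at_snapshot = (
    spark.read.format("iceberg")
    .option("snapshot-id", 10963874102873)  # placeholder snapshot id
    .load("warehouse.db.orders")            # placeholder table
)

# Or read the table as it looked at a point in time (milliseconds since epoch).
df_at_time = (
    spark.read.format("iceberg")
    .option("as-of-timestamp", 1714521600000)
    .load("warehouse.db.orders")
)

# Equivalent SQL form on newer Spark/Iceberg versions:
spark.sql("SELECT * FROM warehouse.db.orders FOR VERSION AS OF 10963874102873").show()
```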
Excited to learn about other autonomous data agents out there and looking forward to your comments!
Depends on your needs.
A good CS undergrad is still very valuable, and for me good structure, peers, and a context in which I just have to learn help a lot, so I might have gone for a top CS university or something. Besides, you meet so many talented people.
Interactive online platforms are amazing nowadays, so I would definitely try Datacamp to explore the basics and feel it out, and maybe even stick with it if I saw I could make decent progress on my own.
And I'd definitely jump into the amazing online communities that are out there, build fun projects alongside others, and so on.
I’d explore my bootcamp options too. If I saw something like Launch School for DS, I would go for that :)
This isn't new, by the way. Many people have talked about it. Off the top of my head, Patrick Collison mentions the importance of reviewing and expanding your product-market fit on Tim Ferriss's podcast.
Assuming you're confident around CS fundamentals, good software engineering practices, and higher math (statistics, probability theory, linear algebra), you can take shortcuts that wouldn't work for complete newbies, like playing around with other people's Jupyter notebooks full of handy examples and reading good books/papers that come with lots of prerequisites.
I'm curious to hear what you want to learn Data Science for, though. Stating your goal and the areas you'd like to focus on would help a lot in pointing you to the right resources.
Hi. Thanks a lot for your answer! Those links will be of great help. :)
I was actually skeptical about getting an answer on this after 24 hours without one - I thought the thread would become unreachable.
I am indeed confident around higher math, analysis of algorithms, software engineering, etc., so maybe I'll get to take some shortcuts.
I work at a company that collects data from social platforms. We used to provide our clients with a data visualization tool, but lately we've also started to analyse the data and build custom reports. The thing is, none of us has studied data analysis (or data science), and it is currently impossible for us to hire. So I've realized how useful that kind of knowledge is, even if I'm not going to learn it all in the short term.
I work for an online learning startup [1]. Here are some pointers:
1. Lambda School has some great reviews, a high percentage of students employed, and a great community. It's very intense though (9 months full-time or 18 months part-time).
2. Try reading some of the reviews on review sites yourself and see if they sound objective. I find most reviews to be helpful, so don't dismiss a big data source without validating its quality.
3. Reddit Data Science communities have a lot of detailed reviews and advice on bootcamps. I recommend looking through some, although Reddit users are a bit biased towards heroic self-learning.
If you elaborate a bit on why you want to go to a DS bootcamp, I could give you a more specific recommendation :)
Hey, what are your thoughts on going through the DS track at Yandex with the goal of landing a data scientist position in a competitive job market like San Francisco? Is this a realistic goal?
I have a bachelor's degree in a non-quantitative field. I recognize that job descriptions for data science, more often than not, state their minimum requirement as a Master's degree (PhD preferred) in a quantitative discipline like physics, math, statistics, etc. I feel this is much more strictly enforced than for software engineering positions, where you're much more likely to come across self-taught devs. I work on the product side, and the data scientist attached to my team holds her M.S. in Statistics, so that seems to hold up from everything I've observed. Essentially, I don't want to rush into studying and learning statistics/data science if the effort is futile or unlikely to pay off from the beginning.
Also, does the job guarantee apply to students in the US, or is that only for Russian students? Does your program have any US employers as hiring partners?