Autogen Unveiled: Capabilities and Potential

by OQTACORE TEAM

Today, we’ll overview something known as AutoGen by Microsoft.

It was released without much fanfare or advertising, unlike OpenAI’s products, but it’s still a groundbreaking tool.

Why? That’s what we’ll explore today. 

What is No-Code?

No-Code. But what exactly is it?

No-code can be likened to an electric car: plug it in to charge, and it’s ready to use. This is the main advantage of no-code tools.

The “What You See Is What You Get” (WYSIWYG) interface allows you to interact visually, clicking and dragging elements. 

It’s important to understand that this is a 100% visual tool — there’s no possibility of writing code. Despite this, you get pre-built widgets that have been developed using manual coding, retaining the advantages of programming.

The target audience for no-code tools includes professional developers and experienced technologists, as well as a new class of developers like business technologists. This opens up the possibility of creating amazing applications even for those who are not technical experts.

In terms of applications, with no-code, you can create user interfaces for web and mobile apps, as well as dashboards for analytics displaying KPIs and charts. It also enables process automation, including decision making based on business rules, document processing, and integration between applications.

AutoGen and LLMs (Large Language Models)

AutoGen is a part of the rapidly evolving field of Large Language Models (LLMs).

But what exactly are LLMs?

LLMs are models trained on large volumes of text to understand and generate natural language. They’ve played a significant role in the advancement of the No-Code approach. And it’s not just about ChatGPT. For virtually every task, there’s a specialized solution.

For copywriting and text generation, there are various options like ChatGPT, Claude, Llama 2, Cohere Command, and Jurassic, to name a few. LLMs are also incredibly useful for working with knowledge bases, that is, answering specific questions from the contents of digital archives.

When it comes to text classification, LLMs can group texts by meaning, for example with clustering techniques. Applications include measuring customer sentiment, determining relationships between texts, and document retrieval.

And of course, there’s code generation. LLMs are proficient in generating code based on natural language queries. Examples include Amazon CodeWhisperer and the OpenAI Codex, used in GitHub Copilot, which can write code in Python, JavaScript, Ruby, and other programming languages. Other programming applications include creating SQL queries, writing command-line instructions, and designing websites. Let’s focus more on the code generation aspect.

What is AutoGen?

AutoGen is a Microsoft project designed to create and orchestrate autonomous agents for collaborative efforts. This versatile framework allows you to define various agents, assign them roles, and easily launch them into action. Interaction with these agents is also part of AutoGen’s functionality.

The essence of AutoGen lies in simplifying the creation and utilization of multiple agents working together towards a common goal. When several artificial intelligences collaborate, the results are usually of higher quality. AutoGen enables the integration of a multi-agent approach into any project and offers an alternative to OpenAI’s API.

AutoGen Overview

AutoGen is a framework that facilitates the orchestration, optimization, and automation of processes using large language models. It offers customizable and conversational agents, harnessing the power of advanced LLMs like GPT-4, and combines them with human interaction and tools. AutoGen allows for conversation between agents via automated chat, circumventing the limitations of the OpenAI API.

With AutoGen, creating a complex multi-agent communication system is simplified to defining agents with unique abilities and roles. For instance, creating an engineering team might involve an engineer, a project manager, and a quality specialist. Defining interactions between agents allows for precise tuning of their collaborative work.
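To make the idea of roles and interactions more concrete, here is a minimal sketch in plain Python. It is not AutoGen code: the `AgentSpec` class, the `can_talk_to` field, and the role prompts are hypothetical names we made up for illustration. It only shows how a team and its allowed interactions might be written down before handing them to a framework.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical description of one agent: a name plus its role prompt."""
    name: str
    system_message: str
    can_talk_to: list[str] = field(default_factory=list)


# The engineering team from the text; the prompts are illustrative placeholders.
team = {
    "project_manager": AgentSpec(
        "project_manager",
        "Break the task into steps and assign them.",
        can_talk_to=["engineer", "quality_specialist"],
    ),
    "engineer": AgentSpec(
        "engineer",
        "Write code for the step you are given.",
        can_talk_to=["quality_specialist"],
    ),
    "quality_specialist": AgentSpec(
        "quality_specialist",
        "Review the code and report problems.",
        can_talk_to=["engineer", "project_manager"],
    ),
}


def allowed(sender: str, receiver: str) -> bool:
    """Return True if the sender is permitted to message the receiver."""
    return receiver in team[sender].can_talk_to
```

In this sketch the engineer hands work to the quality specialist but cannot message the project manager directly, which is one way of tuning how the team collaborates.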

AutoGen features two standard agents: the User Proxy Agent and the Assistant Agent. The User Proxy Agent acts on behalf of the user, making decisions and requesting input from the user. It can, for example, send code generated by an engineering agent to a human for approval or automatically execute it. The Assistant Agent, in this case, is a basic agent that writes code.

The User Proxy Agent and Assistant Agent from AutoGen can create an enhanced version of ChatGPT with a code interpreter and plugins. The Assistant Agent acts as an AI assistant, while the User Proxy Agent simulates user behavior, including executing code. Without it, one would have to run code in Visual Studio Code. However, the User Proxy Agent can do this automatically, saving time.
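A pairing like this might look roughly as follows. This is a sketch that assumes the 0.2-style `pyautogen` package (later releases changed the API) and an `OPENAI_API_KEY` environment variable; the helper names `build_llm_config` and `run_task`, the `coding` work directory, and the reply limit are our own choices, not something the framework prescribes.

```python
import os


def build_llm_config(model: str = "gpt-4") -> dict:
    """Build the llm_config dict; the key is read from OPENAI_API_KEY."""
    return {
        "config_list": [
            {"model": model, "api_key": os.environ.get("OPENAI_API_KEY", "")}
        ]
    }


def run_task(task: str, llm_config: dict):
    """Start a two-agent chat. Needs pyautogen installed and a valid API key."""
    # Imported lazily so the rest of the file loads without the package.
    from autogen import AssistantAgent, UserProxyAgent

    assistant = AssistantAgent(name="assistant", llm_config=llm_config)
    user_proxy = UserProxyAgent(
        name="user_proxy",
        human_input_mode="NEVER",      # run without asking the human
        max_consecutive_auto_reply=5,  # stop after five automatic replies
        code_execution_config={"work_dir": "coding", "use_docker": False},
    )
    # The proxy sends the task, then executes any code the assistant returns.
    return user_proxy.initiate_chat(assistant, message=task)
```

Calling `run_task("Plot a sine wave and save it to sine.png", build_llm_config())` would start the conversation; nothing is sent to the model until that call is made.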

AutoGen automates communication between agents, allowing human intervention and feedback. The User Proxy Agent effectively includes a human in the process, requesting input when necessary.
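The loop itself can be illustrated without any framework. The toy below is not how AutoGen is implemented: `stub_assistant` stands in for the LLM and always returns the same snippet, and `approve` is a hypothetical hook that plays the role of the human. It shows the basic cycle of receiving a reply, extracting the code, optionally asking for approval, and executing it.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built this way so the example can sit inside a code fence


def stub_assistant(task: str) -> str:
    """Stand-in for the LLM: always answers with one fixed Python snippet."""
    return f"Here is the code:\n{FENCE}python\nprint(sum(range(1, 11)))\n{FENCE}"


def extract_code(reply: str) -> str:
    """Pull the first fenced Python block out of a reply, or return ''."""
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, reply, re.DOTALL)
    return match.group(1) if match else ""


def user_proxy_step(reply: str, approve=lambda code: True) -> str:
    """Execute the assistant's code if the (human) approver allows it."""
    code = extract_code(reply)
    if not code:
        return "no code found"
    if not approve(code):
        return "rejected by human"
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # unsafe outside a sandbox; fine for a toy example
    return buffer.getvalue().strip()


result = user_proxy_step(stub_assistant("Add the numbers 1 to 10"))
print(result)  # prints 55
```

Passing `approve=lambda code: input("Run this? [y/n] ") == "y"` would put a real person in the loop before anything is executed.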

Functionality of AutoGen

Now, let’s delve into the key features of AutoGen and find out how to navigate the dashboard of AutoGen Studio, its graphical interface.

Build Tab:

Here, you can create skills, agents, and workflows. Pre-installed workflows are available, such as Travel Agent Workflow, Group Chat Workflow, General Agent Workflow, and Visualization Agent Workflow. These templates assist in executing various tasks. You can also craft your own workflows, define their characteristics, and choose from different summarization types like “none” or

Agent Management: 

Customize agents to perform specific tasks, including code creation, design, testing, and documentation. The ability to configure agents allows for the development of complex interaction systems between different departments and specializations.

Playground:

This is the primary space for testing and interacting with your agents. Here, you can experiment with various configurations and skills, a crucial aspect for adjusting and enhancing agent functionality.

Gallery:

This section is likely intended for storing and showcasing examples of work created using AutoGen Studio. Ours was empty because we had reinstalled the application, which emphasizes the importance of regularly saving your projects.

Agent Behavior Configuration:

Adjust the frequency of input requests from humans and system messages to control agent behavior. This enables agents to have their unique “personalities” and roles within the workflow.
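As an illustration of what "frequency of input requests" can mean, here is a small sketch of the decision logic. The mode names mirror the `human_input_mode` values we know from the 0.2-style AutoGen API ("ALWAYS", "TERMINATE", "NEVER"); the function `should_ask_human` and the `is_final_turn` flag are our own simplification, not the framework's code.

```python
def should_ask_human(mode: str, is_final_turn: bool) -> bool:
    """Decide whether to pause for human input on this turn.

    mode mirrors AutoGen's human_input_mode values:
    "ALWAYS", "TERMINATE", or "NEVER".
    is_final_turn is True when the chat is about to end (a termination
    message arrived or the auto-reply limit was reached).
    """
    if mode == "ALWAYS":
        return True
    if mode == "TERMINATE":
        return is_final_turn
    if mode == "NEVER":
        return False
    raise ValueError(f"unknown mode: {mode}")
```

With "ALWAYS" the human is consulted on every turn, with "TERMINATE" only when the conversation is about to end, and with "NEVER" the agents run fully on their own.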

AutoGen Studio simplifies the process of creating and managing multi-agent systems, making them accessible even for those who have faced challenges in this field. It can be described as a powerful tool that combines the capabilities of various AI technologies, allowing the creation of personalized skills for agents tailored to specific tasks.

Now, let’s explore some additional intriguing features:

Integration with GPT-4:

This feature enables the incorporation of the latest advancements in language models. Integration with GPT-4 significantly expands the capabilities of your agents, especially in understanding and generating text.

Adding Skills through API:

The ability to integrate with various APIs enhances the functionality of your agents. For instance, you can use arXiv for scientific article searches or ElevenLabs for creating AI voices. This makes agents more flexible and adaptable to different tasks.
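A skill is essentially a Python function the agent can call. The sketch below shows what an arXiv search skill might look like using only the standard library and the public arXiv query endpoint. The function names and the choice to return only titles are our assumptions; a real skill would probably also return authors and links.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace of the Atom feed


def build_arxiv_url(query: str, max_results: int = 5) -> str:
    """Build a query URL for the public arXiv API."""
    params = urllib.parse.urlencode(
        {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    )
    return f"http://export.arxiv.org/api/query?{params}"


def parse_titles(atom_xml: str) -> list[str]:
    """Extract entry titles from an Atom feed, collapsing extra whitespace."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(f"{ATOM}title", default="").split())
        for entry in root.iter(f"{ATOM}entry")
    ]


def search_arxiv(query: str, max_results: int = 5) -> list[str]:
    """The skill itself: fetch results over the network and return titles."""
    with urllib.request.urlopen(build_arxiv_url(query, max_results)) as resp:
        return parse_titles(resp.read().decode("utf-8"))
```

Splitting the URL building and the parsing from the network call keeps the skill easy to check offline; only `search_arxiv` needs an internet connection.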

Utilizing DALL-E 3 for Image Generation:

This showcases that AutoGen Studio is not limited to text tasks only; it can also include visual aspects, significantly expanding the potential for content creation.

AutoGen + GPT or How to Not Write a Single Line of Code

But still, to install and configure AutoGen, you might need to check tutorials, guides, and work in the command line. Do you need to be a super-skilled developer and programmer for this?

Now, apparently not. Thinking Abacus has created an automatic constructor for this. They’ve developed a user-friendly GPT that transforms your ideas into the necessary number of agents.

So, to start, you can ask anything, but what caught our attention is one of the questions: “Can I upload a screenshot of the process?” This intrigued us — can it actually do that?

Let’s try it out. We have a screenshot of the Game Studio.

Upload the image and write: “Create a software development agency with four departments: design, coding, testing, and documentation. In each department, there will be one agent managing it. The intermediary agent for the user will act as the Chief Technical Officer (CTO), communicating with other agents to create the final product.” Normally, we would expand on what each department does, but let’s just hit Enter and see what happens, and find out if additional explanations are needed.

So, it creates the agency. By the way, we’re using only one agent per department, although, of course, you can use more. Many AI studies have shown that multiple agents working together, especially if they differ slightly from each other, usually yield better results than a single agent working in isolation. This is why code generation is so crucial: it allows us to fully harness this power.

It writes the Python script you need to create these agents and describes every part of it. The intermediary agent is called CTO. The system message is how you tell an agent what it should do; in effect, you give it a mission.

Then it creates a design agent with the system message “Responsible for game design.” We think ChatDev lists its system messages somewhere, so at this stage, it even makes sense to just take them and paste them here. But in general, you can change this text to anything.
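We can't reproduce the generated script verbatim, but a script of this kind might look roughly like the sketch below. It assumes the 0.2-style `pyautogen` API. Only the design prompt comes from our session; the other three prompts, the CTO's message, the agent names, and the round limit are placeholders we wrote ourselves.

```python
# Role prompts are placeholders; replace them with your own wording.
DEPARTMENTS = {
    "designer": "Responsible for game design.",
    "coder": "Responsible for writing the code.",
    "tester": "Responsible for testing the code.",
    "documenter": "Responsible for documentation.",
}


def build_agency(llm_config: dict):
    """Create a CTO proxy plus one agent per department in a group chat."""
    # Lazy import: requires the pyautogen package (0.2-style API assumed).
    import autogen

    cto = autogen.UserProxyAgent(
        name="CTO",
        system_message="Coordinate the departments and deliver the product.",
        human_input_mode="TERMINATE",  # ask the human only when the chat ends
        code_execution_config=False,   # the CTO does not run code itself
    )
    workers = [
        autogen.AssistantAgent(name=name, system_message=msg, llm_config=llm_config)
        for name, msg in DEPARTMENTS.items()
    ]
    groupchat = autogen.GroupChat(agents=[cto, *workers], messages=[], max_round=12)
    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
    return cto, manager
```

With a valid `llm_config`, `cto, manager = build_agency(llm_config)` followed by `cto.initiate_chat(manager, message="Build a simple snake game")` would start the departments talking to each other.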

In other words, if we combine generation capabilities with management capabilities, we get an incredibly powerful tool. You literally don’t need to write anything yourself. We just want to say one thing: incredible. We can already imagine how much it simplifies and speeds up the development process. And it makes it accessible not only to engineers but also to product managers, for example.

Conclusion

So, that was a basic overview of what AutoGen by Microsoft is all about. The beauty of AutoGen lies in its support for various communication schemes for complex workflows. Developers can build schemes that differ in how autonomous the conversation is, how many agents take part, and how the agents are connected to each other (the communication topology). This flexibility enables the creation of systems with varying degrees of complexity, covering a broad spectrum of applications from different domains.

You don’t need to learn coding from scratch; a combination of AI tools can now do a lot for you. Obviously, at the moment, this doesn’t compare to a team of highly skilled engineers. There are issues, glitches, and challenges with generation, but it’s rapidly evolving and becoming truly impressive, and we’re looking forward to seeing where it goes.

Currently, since this is a new thing, some aspects may seem not quite polished, but over time, it will become more streamlined. If you find this interesting, get familiar with the installation process if you haven’t already. It might seem daunting, but it can genuinely be beneficial.

 
