Skip to content

Extend AI Actions

By extending AI Actions, you can make regular content management and editing tasks more appealing and less demanding. You can start by integrating additional AI services to the existing action types or develop custom ones that impact completely new areas of application. For example, you can create a handler that connects to a translation model and use it to translate your website on-the-fly, or generate illustrations based on a body of an article.

Execute Actions

You can execute AI Actions by using the ActionServiceInterface service, as in the following example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
        $action = new GenerateAltTextAction(new Image([$imageEncodedInBase64]));

        $action->setRuntimeContext(new RuntimeContext(['languageCode' => $languageCode]));
        $action->setActionContext(
            new ActionContext(
                new ActionConfigurationOptions(['default_locale_fallback' => 'en']), // System context
                new ActionConfigurationOptions(['max_lenght' => 100]), // Action Type options
                new ActionConfigurationOptions( // Action Handler options
                    [
                        'prompt' => 'Generate the alt text for this image in less than 100 characters.',
                        'temperature' => 0.7,
                        'max_tokens' => 4096,
                        'model' => 'gpt-4o-mini',
                    ]
                )
            )
        );

        $output = $this->actionService->execute($action)->getOutput();

The GenerateAltTextAction is a built-in action that implements the ActionInterface, takes an Image as an input, and generates the alternative text in the response.

This action is parameterized with the RuntimeContext and the ActionContext, which allows you to pass additional options to the Action before it's executed.

Type of context Type of options Usage Example
Runtime Context Runtime options Sets additional parameters that are relevant to the specific action that is currently executed Information about the language of the content that is being processed
Action Context Action Type options Sets additional parameters for the Action Type Information about the expected response length
Action Context Action Handler options Sets additional parameters for the Action Handler Information about the model, temperature, prompt, and max tokens allowed
Action Context System options Sets additional information, not matching the other option collections Information about the fallback locale

Both ActionContext and RuntimeContext are passed to the Action Handler (an object implementing the ActionHandlerInterface) to execute the action. The Action Handler is responsible for combining all the options together, sending them to the AI service and returning an ActionResponse.

You can pass the Action Handler directly to the ActionServiceInterface::execute() method, which overrides all the other ways of selecting the Action Handler. You can also specify the Action Handler by including it in the provided Action Configuration. In other cases, the Action Handler is selected automatically. You can affect this choice by creating your own class implementing the ActionHandlerResolverInterface or by listening to the ResolveActionHandlerEvent Event sent by the default implementation.

You can influence the execution of an Action with two events:

Below you can find the full example of a Symfony Command, together with a matching service definition. The command finds the images modified in the last 24 hours, and adds the alternative text to them if it's missing.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
<?php

declare(strict_types=1);

namespace App\Command;

use Ibexa\Contracts\ConnectorAi\Action\ActionContext;
use Ibexa\Contracts\ConnectorAi\Action\DataType\Image;
use Ibexa\Contracts\ConnectorAi\Action\DataType\Text;
use Ibexa\Contracts\ConnectorAi\Action\GenerateAltTextAction;
use Ibexa\Contracts\ConnectorAi\Action\RuntimeContext;
use Ibexa\Contracts\ConnectorAi\ActionConfiguration\ActionConfigurationOptions;
use Ibexa\Contracts\ConnectorAi\ActionServiceInterface;
use Ibexa\Contracts\Core\Repository\ContentService;
use Ibexa\Contracts\Core\Repository\FieldTypeService;
use Ibexa\Contracts\Core\Repository\PermissionResolver;
use Ibexa\Contracts\Core\Repository\UserService;
use Ibexa\Contracts\Core\Repository\Values\Content\ContentList;
use Ibexa\Contracts\Core\Repository\Values\Content\Query\Criterion\ContentTypeIdentifier;
use Ibexa\Contracts\Core\Repository\Values\Content\Query\Criterion\DateMetadata;
use Ibexa\Contracts\Core\Repository\Values\Content\Query\Criterion\Operator;
use Ibexa\Contracts\Core\Repository\Values\Filter\Filter;
use Ibexa\Core\FieldType\Image\Value;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

final class AddMissingAltTextCommand extends Command
{
    protected static $defaultName = 'app:add-alt-text';

    private const IMAGE_FIELD_IDENTIFIER = 'image';

    private ContentService $contentService;

    private PermissionResolver $permissionResolver;

    private UserService $userService;

    private FieldTypeService $fieldTypeService;

    private ActionServiceInterface $actionService;

    private string $projectDir;

    public function __construct(
        ContentService $contentService,
        PermissionResolver $permissionResolver,
        UserService $userService,
        FieldTypeService $fieldTypeService,
        ActionServiceInterface $actionService,
        string $projectDir
    ) {
        parent::__construct();
        $this->contentService = $contentService;
        $this->permissionResolver = $permissionResolver;
        $this->userService = $userService;
        $this->fieldTypeService = $fieldTypeService;
        $this->actionService = $actionService;
        $this->projectDir = $projectDir;
    }

    protected function configure(): void
    {
        $this->addArgument('user', InputArgument::OPTIONAL, 'Login of the user executing the actions', 'admin');
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $this->setUser($input->getArgument('user'));

        $modifiedImages = $this->getModifiedImages();
        $output->writeln(sprintf('Found %d modified image in the last 24h', $modifiedImages->getTotalCount()));

        /** @var \Ibexa\Contracts\Core\Repository\Values\Content\Content $content */
        foreach ($modifiedImages as $content) {
            /** @var ?Value $value */
            $value = $content->getFieldValue(self::IMAGE_FIELD_IDENTIFIER);

            if ($value === null || !$this->shouldGenerateAltText($value)) {
                $output->writeln(sprintf('Image %s has the image field empty or the alternative text is already specified. Skipping.', $content->getName()));
                continue;
            }

            $contentUpdateStruct = $this->contentService->newContentUpdateStruct();
            $value->alternativeText = $this->getSuggestedAltText($this->convertImageToBase64($value->uri), $content->getDefaultLanguageCode());
            $contentUpdateStruct->setField(self::IMAGE_FIELD_IDENTIFIER, $value);

            $updatedContent = $this->contentService->updateContent(
                $this->contentService->createContentDraft($content->getContentInfo())->getVersionInfo(),
                $contentUpdateStruct
            );
            $this->contentService->publishVersion($updatedContent->getVersionInfo());
        }

        return Command::SUCCESS;
    }

    private function getSuggestedAltText(string $imageEncodedInBase64, string $languageCode): string
    {
        $action = new GenerateAltTextAction(new Image([$imageEncodedInBase64]));

        $action->setRuntimeContext(new RuntimeContext(['languageCode' => $languageCode]));
        $action->setActionContext(
            new ActionContext(
                new ActionConfigurationOptions(['default_locale_fallback' => 'en']), // System context
                new ActionConfigurationOptions(['max_lenght' => 100]), // Action Type options
                new ActionConfigurationOptions( // Action Handler options
                    [
                        'prompt' => 'Generate the alt text for this image in less than 100 characters.',
                        'temperature' => 0.7,
                        'max_tokens' => 4096,
                        'model' => 'gpt-4o-mini',
                    ]
                )
            )
        );

        $output = $this->actionService->execute($action)->getOutput();

        assert($output instanceof Text);

        return $output->getText();
    }

    private function convertImageToBase64(?string $uri): string
    {
        $file = file_get_contents($this->projectDir . \DIRECTORY_SEPARATOR . 'public' . \DIRECTORY_SEPARATOR . $uri);
        if ($file === false) {
            throw new \RuntimeException('Cannot read file');
        }

        return 'data:image/jpeg;base64,' . base64_encode($file);
    }

    private function getModifiedImages(): ContentList
    {
        $filter = (new Filter())
            ->withCriterion(
                new DateMetadata(DateMetadata::MODIFIED, Operator::GTE, strtotime('-1 day'))
            )
        ->andWithCriterion(new ContentTypeIdentifier('image'));

        return $this->contentService->find($filter);
    }

    private function shouldGenerateAltText(Value $value): bool
    {
        return $this->fieldTypeService->getFieldType('ezimage')->isEmptyValue($value) === false &&
            $value->isAlternativeTextEmpty();
    }

    private function setUser(string $userLogin): void
    {
        $this->permissionResolver->setCurrentUserReference($this->userService->loadUserByLogin($userLogin));
    }
}
1
2
3
    App\Command\AddMissingAltTextCommand:
        arguments:
            $projectDir: '%kernel.project_dir%'

Executing Actions this way has a major drawback: all the parameters are stored directly in the code and cannot be easily reused or changed. To manage configurations of an AI Action you need to use another concept: Action Configurations.

Action Configurations

Manage Action Configurations

Action Configurations allow you to store the parameters for a given Action in the database and reuse them when needed. They can be managed through the back office, data migrations, or through the PHP API.

To manage Action Configurations through the PHP API, you need to use the ActionConfigurationServiceInterface service.

You can manage them using the following methods:

See the AI Actions event reference for a list of events related to these operations.

You can get a specific Action Configuration using the ActionConfigurationServiceInterface::getActionConfiguration() method and search for them using the ActionConfigurationServiceInterface::findActionConfigurations() method. See Action Configuration Search Criteria reference and Action Configuration Search Sort Clauses reference to discover query possibilities.

The following example creates a new Action Configuration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
        $refineTextActionType = $this->actionTypeRegistry->getActionType('refine_text');

        $actionConfigurationCreateStruct = new ActionConfigurationCreateStruct('rewrite_casual');

        $actionConfigurationCreateStruct->setType($refineTextActionType);
        $actionConfigurationCreateStruct->setName('eng-GB', 'Rewrite in casual tone');
        $actionConfigurationCreateStruct->setDescription('eng-GB', 'Rewrites the text using a casual tone');
        $actionConfigurationCreateStruct->setActionHandler('openai-text-to-text');
        $actionConfigurationCreateStruct->setActionHandlerOptions(new ArrayMap([
            'max_tokens' => 4000,
            'temperature' => 1,
            'prompt' => 'Rewrite this content to improve readability. Preserve meaning and crucial information but use casual language accessible to a broader audience.',
            'model' => 'gpt-4-turbo',
        ]));
        $actionConfigurationCreateStruct->setEnabled(true);

        $this->actionConfigurationService->createActionConfiguration($actionConfigurationCreateStruct);

Actions Configurations are tied to a specific Action Type and are translatable.

Execute Actions with Action Configurations

Reuse existing Action Configurations to simplify the execution of AI Actions. You can pass one directly to the ActionServiceInterface::execute() method:

1
2
3
4
5
6
7
8
        $action = new RefineTextAction(new Text([
            <<<TEXT
            Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, 
            and which usually results in protein folding into a specific 3D structure that determines its activity.
TEXT
        ]));
        $actionConfiguration = $this->actionConfigurationService->getActionConfiguration('rewrite_casual');
        $actionResponse = $this->actionService->execute($action, $actionConfiguration)->getOutput();

The passed Action Configuration is only taken into account if the Action Context was not passed to the Action directly using the ActionInterface::setActionContext() method. The ActionServiceInterface service extracts the configuration options from the Action Configuration object and builds the Action Context object internally:

  • Action Type options are mapped to Action Type options in the Action Context
  • Action Handler options are mapped to Action Handler options in the Action Context
  • System Context options are modified using the ContextEvent event

Create custom Action Handler

Ibexa DXP comes with a built-in connector to OpenAI services, but you're not limited to it and can add support for additional AI services in your application.

The following example adds a new Action Handler connecting to a local AI run using the llamafile project which you can use to execute Text-To-Text Actions, such as the built-in "Refine Text" Action.

Register a custom Action Handler in the system.

Create a class implementing the ActionHandlerInterface and register it as a service:

  • The ActionHandlerInterface::supports() method decides whether the Action Handler is able to execute given Action.
  • The ActionHandlerInterface::handle() method is responsible for combining all the Action options together, sending them to the AI service and forming an Action Response.
  • The ActionHandlerInterface::getIdentifier() method returns the identifier of the Action Handler which you can use to refer to it in other places in the code.

See the code sample below, together with a matching service definition:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
<?php

declare(strict_types=1);

namespace App\AI\Handler;

use Ibexa\Contracts\ConnectorAi\Action\ActionHandlerInterface;
use Ibexa\Contracts\ConnectorAi\Action\DataType\Text;
use Ibexa\Contracts\ConnectorAi\Action\Response\TextResponse;
use Ibexa\Contracts\ConnectorAi\Action\TextToText\Action as TextToTextAction;
use Ibexa\Contracts\ConnectorAi\ActionInterface;
use Ibexa\Contracts\ConnectorAi\ActionResponseInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;

final class LLaVaTextToTextActionHandler implements ActionHandlerInterface
{
    private HttpClientInterface $client;

    private string $host;

    public const IDENTIFIER = 'LLaVATextToText';

    public function __construct(HttpClientInterface $client, string $host = 'http://localhost:8080')
    {
        $this->client = $client;
        $this->host = $host;
    }

    public function supports(ActionInterface $action): bool
    {
        return $action instanceof TextToTextAction;
    }

    public function handle(ActionInterface $action, array $context = []): ActionResponseInterface
    {
        /** @var \Ibexa\Contracts\ConnectorAi\Action\DataType\Text */
        $input = $action->getInput();
        $text = $this->sanitizeInput($input->getText());

        $systemMessage = $action->hasActionContext() ? $action->getActionContext()->getActionHandlerOptions()->get('system_prompt', '') : '';

        $response = $this->client->request(
            'POST',
            sprintf('%s/v1/chat/completions', $this->host),
            [
                'headers' => [
                    'Authorization: Bearer no-key',
                ],
                'json' => [
                    'model' => 'LLaMA_CPP',
                    'messages' => [
                        (object)[
                            'role' => 'system',
                            'content' => $systemMessage,
                        ],
                        (object)[
                            'role' => 'user',
                            'content' => $text,
                        ],
                    ],
                    'temperature' => 0.7,
                ],
            ]
        );

        $output = strip_tags(json_decode($response->getContent(), true)['choices'][0]['message']['content']);

        return new TextResponse(new Text([$output]));
    }

    public static function getIdentifier(): string
    {
        return self::IDENTIFIER;
    }

    private function sanitizeInput(string $text): string
    {
        return str_replace(["\n", "\r"], ' ', $text);
    }
}
1
2
3
4
    App\AI\Handler\LLaVATextToTextActionHandler:
        tags:
            - { name: ibexa.ai.action.handler, priority: 0 }
            - { name: ibexa.ai.action.handler.text_to_text, priority: 0 }

The ibexa.ai.action.handler tag is used by the ActionHandlerResolverInterface to find all the Action Handlers in the system.

The built-in Action Types use service tags to find Action Handlers capable of handling them and display in the back office UI:

  • Refine Text uses the ibexa.ai.action.handler.text_to_text service tag
  • Generate Alt Text uses the ibexa.ai.action.handler.image_to_text service tag

Provide Form configuration

Form configuration makes the Handler configurable by using the back office. The example handler uses the system_prompt option, which becomes part of the Action Configuration UI thanks to the following code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
<?php

declare(strict_types=1);

namespace App\Form\Type;

use Symfony\Component\Form\AbstractType;
use Symfony\Component\Form\Extension\Core\Type\TextareaType;
use Symfony\Component\Form\FormBuilderInterface;
use Symfony\Component\OptionsResolver\OptionsResolver;

final class TextToTextOptionsType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options): void
    {
        $builder->add('system_prompt', TextareaType::class, [
            'required' => true,
            'disabled' => $options['translation_mode'],
            'label' => 'System message',
        ]);
    }

    public function configureOptions(OptionsResolver $resolver): void
    {
        $resolver->setDefaults([
            'translation_domain' => 'app_ai',
            'translation_mode' => false,
        ]);

        $resolver->setAllowedTypes('translation_mode', 'bool');
    }
}
1
2
3
4
5
6
7
    app.connector_ai.action_configuration.handler.llava_text_to_text.form_mapper.options:
        class: Ibexa\Bundle\ConnectorAi\Form\FormMapper\ActionConfiguration\ActionHandlerOptionsFormMapper
        arguments:
            $formType: 'App\Form\Type\TextToTextOptionsType'
        tags:
            - name: ibexa.connector_ai.action_configuration.form_mapper.options
              type: !php/const \App\AI\Handler\LLaVaTextToTextActionHandler::IDENTIFIER

The created Form Type adds the system_prompt field to the Form. Use the Ibexa\Bundle\ConnectorAi\Form\FormMapper\ActionConfiguration\ActionHandlerOptionsFormMapper class together with the ibexa.connector_ai.action_configuration.form_mapper.options service tag to make it part of the Action Handler options form. Pass the Action Handler identifier (LLaVATextToText) as the type when tagging the service.

The Action Handler and Action Type options are rendered in the back office using the built-in Twig option formatter. You can create your own formatting by creating a class implementing the OptionsFormatterInterface interface and aliasing it to Ibexa\Contracts\ConnectorAi\ActionConfiguration\OptionsFormatterInterface.

The following service definition switches the options rendering to the other built-in option formatter, displaying the options as JSON.

1
2
    Ibexa\Contracts\ConnectorAi\ActionConfiguration\OptionsFormatterInterface:
        alias: Ibexa\ConnectorAi\ActionConfiguration\JsonOptionsFormatter

Custom Action Type use case

With custom Action Types you can create your own tasks for the AI services to perform. They can be integrated with the rest of the AI framework provided by Ibexa and incorporated into the back office.

The following example shows how to implement a custom Action Type dedicated for transcribing audio with an example Handler using the OpenAI's Whisper project.

Create custom Action Type

Start by creating your own Action Type, a class implementing the ActionTypeInterface. The class needs to define following parameters of the Action Type:

  • name
  • identifier
  • input type identifier
  • output type identifier
  • Action object
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
<?php

declare(strict_types=1);

namespace App\AI\ActionType;

use App\AI\Action\TranscribeAudioAction;
use App\AI\DataType\Audio;
use Ibexa\Contracts\ConnectorAi\Action\DataType\Text;
use Ibexa\Contracts\ConnectorAi\ActionInterface;
use Ibexa\Contracts\ConnectorAi\ActionType\ActionTypeInterface;
use Ibexa\Contracts\ConnectorAi\DataType;
use Ibexa\Contracts\Core\Exception\InvalidArgumentException;

final class TranscribeAudioActionType implements ActionTypeInterface
{
    public const IDENTIFIER = 'transcribe_audio';

    /** @var iterable<\Ibexa\Contracts\ConnectorAi\Action\ActionHandlerInterface> */
    private iterable $actionHandlers;

    /** @param iterable<\Ibexa\Contracts\ConnectorAi\Action\ActionHandlerInterface> $actionHandlers*/
    public function __construct(iterable $actionHandlers)
    {
        $this->actionHandlers = $actionHandlers;
    }

    public function getIdentifier(): string
    {
        return self::IDENTIFIER;
    }

    public function getName(): string
    {
        return 'Transcribe audio';
    }

    public function getInputIdentifier(): string
    {
        return Audio::getIdentifier();
    }

    public function getOutputIdentifier(): string
    {
        return Text::getIdentifier();
    }

    public function getOptions(): array
    {
        return [];
    }

    public function createAction(DataType $input, array $parameters = []): ActionInterface
    {
        if (!$input instanceof Audio) {
            throw new InvalidArgumentException(
                'audio',
                'expected \App\AI\DataType\Audio type, ' . get_debug_type($input) . ' given.'
            );
        }

        return new TranscribeAudioAction($input);
    }

    public function getActionHandlers(): iterable
    {
        return $this->actionHandlers;
    }
}
1
2
3
4
5
6
7
8
    App\AI\ActionType\TranscribeAudioActionType:
        arguments:
            $actionHandlers: !tagged_iterator
                tag: app.connector_ai.action.handler.audio_to_text
                default_index_method: getIdentifier
                index_by: key
        tags:
            - { name: ibexa.ai.action.type, identifier: !php/const \App\AI\ActionType\TranscribeAudioActionType::IDENTIFIER }

The service definition introduces a custom app.connector_ai.action.handler.audio_to_text service tag to mark all the handlers capable of working with this Action Type. The ibexa.ai.action.type service tag registers the class in the service container as a new Action Type.

If the Action Type is meant to be used mainly with prompt-based systems you can use the LLMBaseActionTypeInterface interface as the base for your Action Type. It allows you to define a base prompt directly in the Action Type that can be common for all Action Configurations.

Action Type names can be localized using the Translation component. See the built-in Action Types like Generate Alt Text or Refine Text for an example.

Create custom Data classes

The TranscribeAudio Action Type requires adding two data classes that exists in its definition:

  • an Audio class, implementing the DataType interface, to store the input data for the Action
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
<?php

declare(strict_types=1);

namespace App\AI\DataType;

use Ibexa\Contracts\ConnectorAi\DataType;

/**
 * @implements DataType<string>
 */
final class Audio implements DataType
{
    /** @var non-empty-array<string> */
    private array $base64;

    /**
     * @param non-empty-array<string> $base64
     */
    public function __construct(array $base64)
    {
        $this->base64 = $base64;
    }

    public function getBase64(): string
    {
        return reset($this->base64);
    }

    public function getList(): array
    {
        return $this->base64;
    }

    public static function getIdentifier(): string
    {
        return 'audio';
    }
}
  • an TranscribeAudioAction class, implementing the ActionInterface interface. Pass this object to the ActionServiceInterface::execute() method to execute the action.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
<?php

declare(strict_types=1);

namespace App\AI\Action;

use App\AI\DataType\Audio;
use Ibexa\Contracts\ConnectorAi\Action\Action;

final class TranscribeAudioAction extends Action
{
    private Audio $audio;

    public function __construct(Audio $audio)
    {
        $this->audio = $audio;
    }

    public function getParameters(): array
    {
        return [];
    }

    public function getInput(): Audio
    {
        return $this->audio;
    }

    public function getActionTypeIdentifier(): string
    {
        return 'transcribe_audio';
    }
}

Create custom Action Type options form

Custom Form Type is needed if the Action Type requires additional options configurable in the UI. The following example adds a checkbox field that indicates to the Action Handler whether the transcription should include the timestamps.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
<?php

declare(strict_types=1);

namespace App\Form\Type;

use Symfony\Component\Form\AbstractType;
use Symfony\Component\Form\Extension\Core\Type\CheckboxType;
use Symfony\Component\Form\FormBuilderInterface;
use Symfony\Component\OptionsResolver\OptionsResolver;

final class TranscribeAudioOptionsType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options): void
    {
        $builder->add('include_timestamps', CheckboxType::class, [
            'required' => false,
            'disabled' => $options['translation_mode'],
            'label' => 'Include timestamps',
        ]);
    }

    public function configureOptions(OptionsResolver $resolver): void
    {
        $resolver->setDefaults([
            'translation_domain' => 'app_ai',
            'translation_mode' => false,
        ]);

        $resolver->setAllowedTypes('translation_mode', 'bool');
    }
}
1
2
3
4
5
6
7
    app.connector_ai.action_configuration.handler.transcribe_audio.form_mapper.options:
        class: Ibexa\Bundle\ConnectorAi\Form\FormMapper\ActionConfiguration\ActionTypeOptionsFormMapper
        arguments:
            $formType: 'App\Form\Type\TranscribeAudioOptionsType'
        tags:
            - name: ibexa.connector_ai.action_configuration.form_mapper.action_type_options
              type: !php/const \App\AI\ActionType\TranscribeAudioActionType::IDENTIFIER

The built-in Ibexa\Bundle\ConnectorAi\Form\FormMapper\ActionConfiguration\ActionTypeOptionsFormMapper renders the Form Type in the back office when editing the Action Configuration for a specific Action Type (indicated by the type attribute of the ibexa.connector_ai.action_configuration.form_mapper.action_type_options service tag).

Create custom Action Handler

An example Action Handler combines the input data and the Action Type options and passes them to the Whisper executable to form an Action Response. The language of the transcribed data is extracted from the Runtime Context for better results. The Action Type options provided in the Action Context dictate whether the timestamps will be removed before returning the result.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
<?php

declare(strict_types=1);

namespace App\AI\Handler;

use App\AI\ActionType\TranscribeAudioActionType;
use Ibexa\Contracts\ConnectorAi\Action\ActionHandlerInterface;
use Ibexa\Contracts\ConnectorAi\Action\DataType\Text;
use Ibexa\Contracts\ConnectorAi\Action\Response\TextResponse;
use Ibexa\Contracts\ConnectorAi\ActionInterface;
use Ibexa\Contracts\ConnectorAi\ActionResponseInterface;
use Symfony\Component\Process\Exception\ProcessFailedException;
use Symfony\Component\Process\Process;

final class WhisperAudioToTextActionHandler implements ActionHandlerInterface
{
    private const TIMESTAMP_FORMAT = '/^\[\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}\.\d{3}]\s*/';

    public function supports(ActionInterface $action): bool
    {
        return $action->getActionTypeIdentifier() === TranscribeAudioActionType::IDENTIFIER;
    }

    public function handle(ActionInterface $action, array $context = []): ActionResponseInterface
    {
        /** @var \App\AI\DataType\Audio $input */
        $input = $action->getInput();

        $path = $this->saveInputToFile($input->getBase64());

        $arguments = ['whisper'];

        $language = $action->getRuntimeContext()?->get('languageCode');
        if ($language !== null) {
            $arguments[] = sprintf('--language=%s', substr($language, 0, 2));
        }

        $arguments[] = '--output_format=txt';
        $arguments[] = $path;

        $process = new Process($arguments);
        $process->run();

        if (!$process->isSuccessful()) {
            unlink($path);
            throw new ProcessFailedException($process);
        }

        $output = $process->getOutput();

        $includeTimestamps = $action->getActionContext()
            ?->getActionTypeOptions()
            ->get('include_timestamps', false)
            ?? false;

        if (!$includeTimestamps) {
            $output = $this->removeTimestamps($output);
        }

        unlink($path);

        return new TextResponse(new Text([$output]));
    }

    public static function getIdentifier(): string
    {
        return 'whisper_audio_to_text';
    }

    private function removeTimestamps(string $text): string
    {
        $lines = explode(PHP_EOL, $text);

        $processedLines = array_map(static function (string $line): string {
            return preg_replace(self::TIMESTAMP_FORMAT, '', $line) ?? '';
        }, $lines);

        return implode(PHP_EOL, $processedLines);
    }

    private function saveInputToFile(string $audioEncodedInBase64): string
    {
        $filename = uniqid('audio');
        $path = sys_get_temp_dir() . \DIRECTORY_SEPARATOR . $filename;
        file_put_contents($path, base64_decode($audioEncodedInBase64));

        return $path;
    }
}
1
2
3
4
    App\AI\Handler\WhisperAudioToTextActionHandler:
        tags:
            - { name: ibexa.ai.action.handler, priority: 0 }
            - { name: app.connector_ai.action.handler.audio_to_text, priority: 0 }

Integrate with the REST API

At this point the custom Action Type can already be executed by using the PHP API. To integrate it with the AI Actions execute endpoint you need to create additional classes responsible for parsing the request and response data. See adding custom media type and creating new REST resource to learn more about extending the REST API.

Handle input data

Start by creating an Input Parser able to handle the application/vnd.ibexa.api.ai.TranscribeAudio media type.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
<?php

declare(strict_types=1);

namespace App\AI\REST\Input\Parser;

use App\AI\DataType\Audio as AudioDataType;
use App\AI\REST\Value\TranscribeAudioAction;
use Ibexa\ConnectorAi\REST\Input\Parser\Action;
use Ibexa\Contracts\ConnectorAi\Action\RuntimeContext;
use Ibexa\Contracts\Rest\Input\ParsingDispatcher;
use Ibexa\Rest\Input\BaseParser;

final class TranscribeAudio extends BaseParser
{
    public const AUDIO_KEY = 'Audio';
    public const BASE64_KEY = 'base64';

    /** @param array<mixed> $data */
    public function parse(array $data, ParsingDispatcher $parsingDispatcher): TranscribeAudioAction
    {
        $this->assertInputIsValid($data);
        $runtimeContext = $this->getRuntimeContext($data);

        return new TranscribeAudioAction(
            new AudioDataType([$data[self::AUDIO_KEY][self::BASE64_KEY]]),
            $runtimeContext
        );
    }

    /** @param array<mixed> $data */
    private function assertInputIsValid(array $data): void
    {
        if (!array_key_exists(self::AUDIO_KEY, $data)) {
            throw new \InvalidArgumentException('Missing audio key');
        }

        if (!array_key_exists(self::BASE64_KEY, $data[self::AUDIO_KEY])) {
            throw new \InvalidArgumentException('Missing base64 key');
        }
    }

    /**
     * @param array<string, mixed> $data
     */
    private function getRuntimeContext(array $data): RuntimeContext
    {
        return new RuntimeContext(
            $data[Action::RUNTIME_CONTEXT_KEY] ?? []
        );
    }
}
1
2
3
4
    App\AI\REST\Input\Parser\TranscribeAudio:
        parent: Ibexa\Rest\Server\Common\Parser
        tags:
            - { name: ibexa.rest.input.parser, mediaType: application/vnd.ibexa.api.ai.TranscribeAudio }

The TranscribeAudioAction is a value object holding the parsed request data.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
<?php

declare(strict_types=1);

namespace App\AI\REST\Value;

use App\AI\DataType\Audio;
use Ibexa\Contracts\ConnectorAi\Action\RuntimeContext;

final class TranscribeAudioAction
{
    private Audio $input;

    private RuntimeContext $runtimeContext;

    public function __construct(
        Audio $input,
        RuntimeContext $runtimeContext
    ) {
        $this->input = $input;
        $this->runtimeContext = $runtimeContext;
    }

    public function getInput(): Audio
    {
        return $this->input;
    }

    public function getRuntimeContext(): RuntimeContext
    {
        return $this->runtimeContext;
    }
}

Handle output data

To transform the TranscribeAudioAction into a REST response you need to create:

  • An AudioText value object holding the REST response data
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
<?php

declare(strict_types=1);

namespace App\AI\REST\Value;

use Ibexa\ConnectorAi\REST\Value\RestActionResponse;

final class AudioText extends RestActionResponse
{
}
  • A resolver converting the Action Response returned from the PHP API layer into the AudioText object. The resolver is activated when application/vnd.ibexa.api.ai.AudioText media type is specified in the Accept header:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
<?php

declare(strict_types=1);

namespace App\AI\REST\Output\Resolver;

use App\AI\REST\Value\AudioText;
use Ibexa\ConnectorAi\REST\Output\ResolverInterface;
use Ibexa\Contracts\ConnectorAi\ActionResponseInterface;

final class AudioTextResolver implements ResolverInterface
{
    public function getRestValue(
        ActionResponseInterface $actionResponse
    ): AudioText {
        return new AudioText(
            $actionResponse->getOutput()
        );
    }
}
1
2
3
    App\AI\REST\Output\Resolver\AudioTextResolver:
        tags:
            - { name: ibexa.ai.action.mime_type, key: application/vnd.ibexa.api.ai.AudioText }
  • A visitor converting the response value object into a serialized REST response:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
<?php

declare(strict_types=1);

namespace App\AI\REST\Output\ValueObjectVisitor;

use Ibexa\Contracts\Rest\Output\Generator;
use Ibexa\Contracts\Rest\Output\ValueObjectVisitor;
use Ibexa\Contracts\Rest\Output\Visitor;

final class AudioText extends ValueObjectVisitor
{
    private const OBJECT_IDENTIFIER = 'AudioText';

    public function visit(Visitor $visitor, Generator $generator, $data): void
    {
        $mediaType = 'ai.' . self::OBJECT_IDENTIFIER;
        $text = $data->getOutput();

        $generator->startObjectElement(self::OBJECT_IDENTIFIER, $mediaType);
        $visitor->setHeader('Content-Type', $generator->getMediaType($mediaType));

        $visitor->visitValueObject($text);

        $generator->endObjectElement(self::OBJECT_IDENTIFIER);
    }
}
1
2
3
4
    App\AI\REST\Output\ValueObjectVisitor\AudioText:
        parent: Ibexa\Contracts\Rest\Output\ValueObjectVisitor
        tags:
            - { name: ibexa.rest.output.value_object.visitor, type: App\AI\REST\Value\AudioText }

You can now execute a specific Action Configuration for the new custom Action Type through REST API by sending the following request:

1
2
3
POST /ai/action/execute/my_action_configuration HTTP/1.1
Accept: application/vnd.ibexa.api.ai.AudioText+json
Content-Type: application/vnd.ibexa.api.ai.TranscribeAudio+json
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
{
    "TranscribeAudio": {
        "Audio": {
            "base64": "audioEncodedInBase64"
        },
        "RuntimeContext": {
            "languageCode": "eng-GB"
        }
    }
}

Integrate into the back office

The last step in fully integrating the Transcribe Audio Action Type embeds it directly into the back office, allowing Editors to invoke it while doing their daily work.

Extend the default editing template of the ezbinaryfile fieldtype by creating a new file called templates/themes/admin/admin/ui/fieldtype/edit/form_fields_binary_ai.html.twig. This template embeds the AI component, but only if a dedicated transcript field (of eztext type) is available in the same content type to store the content of the transcription.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
{% extends '@ibexadesign/ui/field_type/edit/ezbinaryfile.html.twig' %}

{% block ezbinaryfile_preview %}
    {{ parent() }}

    {% set transcriptFieldIdentifier = 'transcript' %}
    {% set fieldTypeIdentifiers = form.parent.parent.vars.value|keys %}

    {% if transcriptFieldIdentifier in fieldTypeIdentifiers %}
        {% set module_id = 'TranscribeAudio' %}
        {% set ai_config_id = 'transcribe_audio' %}
        {% set container_selector = '.ibexa-edit-content' %}
        {% set input_selector = '.ibexa-field-edit-preview__action--preview' %}
        {% set output_selector = '#ezplatform_content_forms_content_edit_fieldsData_transcript_value' %}
        {% set cancel_wrapper_selector = '.ibexa-field-edit-preview__media-wrapper' %}

        {% embed '@ibexadesign/connector_ai/ui/ai_module/ai_component.html.twig' with {
            ai_config_id,
            container_selector,
            input_selector,
            output_selector,
        } %}
        {% endembed %}
    {% endif %}
{% endblock %}

And add it to the SiteAccess configuration for the admin_group:

1
2
3
4
5
6
7
ibexa:
    system:
        admin_group:
            admin_ui_forms:
                content_edit:
                    form_templates:
                        - { template: '@ibexadesign/admin/ui/fieldtype/edit/form_fields_binary_ai.html.twig', priority: -10 } }

The configuration of the AI component takes the following parameters:

  • module_id - name of the JavaScript module to handle the invoked action. ImgToText is a built-in one handling alternative text use case, TranscribeAudio is a custom one.
  • ai_config_id - identifier of the Action Type to load Action Configurations for. The ibexa_ai_config Twig function is used under the hood.
  • container_selector - CSS selector to narrow down the HTML area which is affected by the AI component.
  • input_selector - CSS selector indicating the input field (must be below the container_selector in the HTML structure).
  • output_selector - CSS selector indicating the output field (must be below the container_selector in the HTML structure).
  • cancel_wrapper_selector - CSS selector indicating the element to which the "Cancel AI" UI element is attached.

Now create the JavaScript module mentioned in the template that is responsible for:

  • gathering the input data (downloading the attached binary file and converting it into base64)
  • executing the Action Configuration chosen by the editor through the REST API
  • attaching the response to the output field

You can find the code of the module below. Place it in a file called assets/js/transcribe.audio.js

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
import BaseAIComponent from '../../vendor/ibexa/connector-ai/src/bundle/Resources/public/js/core/base.ai.component';

export default class TranscribeAudio extends BaseAIComponent {
    constructor(mainElement, config) {
        super(mainElement, config);

        this.requestHeaders = {
            Accept: 'application/vnd.ibexa.api.ai.AudioText+json',
            'Content-Type': 'application/vnd.ibexa.api.ai.TranscribeAudio+json',
        };
    }

    getAudioInBase64() {
        const request = new XMLHttpRequest();
        request.open('GET', this.inputElement.href, false);
        request.overrideMimeType('text/plain; charset=x-user-defined');
        request.send();

        if (request.status === 200) {
            return this.convertToBase64(request.responseText);
        }
        else {
            this.processError('Error occured when decoding the file.');
        }
    }

    getRequestBody() {
        const body = {
            TranscribeAudio: {
                Audio: {
                    base64: this.getAudioInBase64(),
                },
                RuntimeContext: {},
            },
        };

        if (this.languageCode) {
            body.TranscribeAudio.RuntimeContext.languageCode = this.languageCode;
        }

        return JSON.stringify(body);
    }

    afterFetchData(response) {
        super.afterFetchData();

        if (response) {
            this.outputElement.value = response.AudioText.Text.text[0];
        }
    }

    toggle(forceEnabled) {
        super.toggle(forceEnabled);

        this.outputElement.disabled = !forceEnabled || !this.outputElement.disabled;
    }

    convertToBase64(data) {
        let binary = '';

        for (let i = 0; i < data.length; i++) {
            binary += String.fromCharCode(data.charCodeAt(i) & 0xff);
        }

        return btoa(binary);
    }
}

The last step is adding the module to the list of AI modules in the system, by using the provided addModule function.

Create a file called assets/js/addAudioModule.js:

1
2
3
4
import { addModule } from '../../vendor/ibexa/connector-ai/src/bundle/Resources/public/js/core/create.ai.module';
import TranscribeAudio from './transcribe.audio';

addModule(TranscribeAudio);

And include it into the back office using Webpack Encore. See configuring assets from main project files to learn more about this mechanism.

1
2
3
4
5
6
7
8
9
const ibexaConfigManager = require('./ibexa.webpack.config.manager.js');

ibexaConfigManager.add({
    ibexaConfig,
    entryName: 'ibexa-admin-ui-layout-js',
    newItems: [
        path.resolve(__dirname, './assets/js/addAudioModule.js')
    ],
});

Your custom Action Type is now fully integrated into the back office UI and can be used by the Editors.

Transcribe Audio Action Type integrated into the back office