IP-Coster | WO2023099954 | DYNAMIC BATCHING FOR INFERENCE SYSTEM FOR TRANSFORMER-BASED GENERATION TASKS

Publication Number WO/2023/099954

Publication Date 08.06.2023

International Application No. PCT/IB2022/000666

International Filing Date 18.11.2022

Title **

[English] DYNAMIC BATCHING FOR INFERENCE SYSTEM FOR TRANSFORMER-BASED GENERATION TASKS

[French] MISE EN LOTS DYNAMIQUE POUR SYSTÈME D'INFÉRENCE POUR TÂCHES DE GÉNÉRATION BASÉES SUR UN TRANSFORMATEUR

Applicants **

FRIENDLIAI INC. Rm. 514, Bldg. 138, Gwanak-ro, Gwanak-gu Seoul 08826, KR

Inventors

YU, Gyeongin Friendliai Inc. Rm. 514, Bldg. 138, Gwanak-ro, Gwanak-gu Seoul 08826, KR

KIM, Geon-woo Friendliai Inc. Rm. 514, Bldg. 138, Gwanak-ro, Gwanak-gu Seoul 08826, KR

JEONG, Joo Seong Friendliai Inc. Rm. 514, Bldg. 138, Gwanak-ro, Gwanak-gu Seoul 08826, KR

KIM, Soojeong Friendliai Inc. Rm. 514, Bldg. 138, Gwanak-ro, Gwanak-gu Seoul 08826, KR

CHUN, Byung-gon Friendliai Inc. Rm. 514, Bldg. 138, Gwanak-ro, Gwanak-gu Seoul 08826, KR

Priority Data

17/542,193 03.12.2021 US

17/881,549 04.08.2022 US

Application details

International Searching Authority	MOIP *
Applicant's Legal Status	Legal Entity *
Entry into National Phase under	Chapter I *

* The data is based on automatic recognition. Please verify and amend if necessary.

** IP-Coster compiles data from publicly available sources. If this data includes your personal information, you can contact us to request its removal.

Quotation for National Phase entry

Country	Stages	Total (USD)
China	Filing	1438
EPO	Filing, Examination	8601
Japan	Filing	591
South Korea	Filing	607
USA	Filing, Examination	2710

Total: 13947 USD

The term for entry into the National Phase has expired. This quotation is for informational purposes only.

Abstract

[English] An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.

[French] L'invention porte sur un système d'inférence appliquant un modèle de transformateur d'apprentissage automatique à un lot de demandes ayant une longueur d'entrée variable ou une longueur cible variable ou une longueur d'état interne variable par mise en lots sélective d'un sous-ensemble d'opérations dans le modèle de transformateur mais traitement des demandes dans le lot individuellement pour un sous-ensemble d'opérations dans le modèle de transformateur. Dans un mode de réalisation, l'opération à traiter individuellement est une opération d'attention d'un codeur ou d'un décodeur du modèle de transformateur. Au moyen de la mise en lots sélective, le système d'inférence peut permettre que des opérations de mise en lots soient effectuées pour un lot de demandes ayant une longueur d'entrée variable ou une longueur cible variable ou une longueur d'état interne variable pour utiliser les capacités de calcul parallèle d'accélérateurs matériels tout en empêchant les calculs inutiles qui surviennent pour des solutions de contournement qui limitent les données d'un lot de demandes à une même longueur.
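
The selective batching described in the abstract can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names (`selective_batch_layer`, `attention`, `matmul`), the single-head unmasked attention, and the choice of the query/key/value projections as the batched subset are all assumptions made for illustration. The idea shown is that tokens from requests of different lengths are concatenated so the projections run once over the whole batch, while attention runs separately per request, so no request has to be forced to a common length.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for ONE request (single head, no mask)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


def selective_batch_layer(requests, w_q, w_k, w_v):
    """Hypothetical layer: batched projections, per-request attention.

    requests: list of token matrices, one per request; lengths may differ.
    """
    lengths = [len(r) for r in requests]

    # Batched subset: flatten all tokens into one (total_tokens x dim) matrix
    # and apply each projection once, without bringing requests to one length.
    flat = [token for r in requests for token in r]
    q_all = matmul(flat, w_q)
    k_all = matmul(flat, w_k)
    v_all = matmul(flat, w_v)

    # Individual subset: split the rows back per request and run attention
    # on each request separately, since each has its own sequence length.
    outputs, start = [], 0
    for n in lengths:
        end = start + n
        outputs.append(attention(q_all[start:end], k_all[start:end], v_all[start:end]))
        start = end
    return outputs


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    short_request = [[1.0, 2.0]]                          # 1 token
    long_request = [[0.5, 0.1], [0.2, 0.9], [1.0, 1.0]]   # 3 tokens
    result = selective_batch_layer([short_request, long_request],
                                   identity, identity, identity)
    print([len(r) for r in result])
```

In this sketch each request's output has the same number of rows as its input, and batching a request with others does not change its result, because the projections act on each token row independently and attention never mixes rows from different requests.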
