IP-Coster | WO2023209632 | VOICE ATTRIBUTE CONVERSION USING SPEECH TO SPEECH

Publication Number WO/2023/209632

Publication Date 02.11.2023

International Application No. PCT/IB2023/054378

International Filing Date 27.04.2023

Title **

[English] VOICE ATTRIBUTE CONVERSION USING SPEECH TO SPEECH

[French] CONVERSION VOIX-VOIX D'ATTRIBUTS VOCAUX

Applicants **

MEANING.TEAM, INC

Inventors

CARMIEL, Yishay

WOJCIAK, Lukasz

ZELASKO, Piotr

VAINER, Jan

NEKVINDA, Tomas

PLATEK, Ondrej

Priority Data

17/731,474 28.04.2022 US

Application details

International Searching Authority	ILPO *
Applicant's Legal Status	Legal Entity *
Entry into National Phase under	Chapter I *

* The data is based on automatic recognition. Please verify and amend if necessary.

** IP-Coster compiles data from publicly available sources. If this data includes your personal information, you can contact us to request its removal.

Quotation for National Phase entry

Country	Stages	Total
China	Filing	1686
EPO	Filing, Examination	11689
Japan	Filing	590
South Korea	Filing	574
USA	Filing, Examination	4110

Total: 18,649

The term for entry into the National Phase has expired. This quotation is for informational purposes only

Abstract

[English] There is provided a computer-implemented method of training a speech-to-speech (S2S) machine learning (ML) model for adapting voice attribute(s) of speech, comprising: creating an S2S training dataset of S2S records, wherein an S2S record comprises a first audio content comprising speech having first voice attribute(s), and a ground truth label of a second audio content comprising speech having second voice attribute(s), wherein the first audio content and the second audio content have the same lexical content and are time-synchronized, and wherein the durations of phones of the second audio content are controlled in response to segment-level durations defined by segment-level start and end time stamps; and training the S2S ML model using the S2S training dataset, wherein the S2S ML model is fed an input of a source audio content with source voice attribute(s) and generates an outcome of the source audio content with target voice attribute(s).
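
The record layout and the duration constraint described in the abstract can be illustrated with a minimal sketch. Everything named here (`Segment`, `S2SRecord`, `fit_phone_durations`, and the proportional-rescaling rule) is a hypothetical illustration and is not taken from the application, which does not disclose an implementation in this record. The sketch only shows one plausible way to make phone durations add up to a segment-level duration given by start and end time stamps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A span of audio (for example a word) with time stamps in seconds."""
    start: float
    end: float
    phones: List[str]  # phone labels to be realized inside this span


@dataclass
class S2SRecord:
    """One training pair: same lexical content, different voice attributes."""
    source_audio: List[float]  # samples of the first audio content
    target_audio: List[float]  # ground-truth samples of the second audio content
    segments: List[Segment]    # segment-level alignment shared by both signals


def fit_phone_durations(segment: Segment,
                        natural: List[float]) -> List[Tuple[str, float]]:
    """Rescale per-phone durations so that they sum to the segment duration.

    `natural` holds assumed unconstrained durations in seconds, one per phone.
    Proportional rescaling is an assumption made for this sketch only.
    """
    if len(natural) != len(segment.phones):
        raise ValueError("need exactly one duration per phone")
    total = sum(natural)
    if total <= 0:
        raise ValueError("durations must sum to a positive value")
    scale = (segment.end - segment.start) / total
    return [(phone, dur * scale) for phone, dur in zip(segment.phones, natural)]


def is_time_synchronized(record: S2SRecord, tolerance: int = 0) -> bool:
    """Crude check: both signals have the same number of samples."""
    return abs(len(record.source_audio) - len(record.target_audio)) <= tolerance


# Usage: a 0.5 s segment whose two phones would naturally take 0.4 s in total.
seg = Segment(start=0.5, end=1.0, phones=["HH", "AY"])
fitted = fit_phone_durations(seg, natural=[0.1, 0.3])
record = S2SRecord(source_audio=[0.0] * 4, target_audio=[0.0] * 4, segments=[seg])
```

In this sketch the target speech keeps the timing of the source because each segment's phones are stretched or compressed to fill exactly the span between the segment's start and end time stamps. A real system would work on acoustic features rather than raw sample lists.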
