From audio to text: how?

neolinux
Rampante Reduce
Posts: 5511
Joined: Thursday 11 December 2008, 21:52

Posted by neolinux » Wednesday 9 September 2020, 1:40

Apparently Transcriber 1.5.1 can turn a spoken MP3 file into text on Ubuntu 18.04.

However, Transcriber reads MP3 files as a deafening noise. It looks like a known codec problem, but I don't understand what I am supposed to do with GStreamer. Or is there simpler software?
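
One possible workaround for a codec problem like this is to convert the MP3 to an uncompressed WAV file before opening it. The sketch below assumes that ffmpeg is installed, which this thread does not mention, and the helper names wav_name and mp3_to_wav are made up for illustration. Whether Transcriber then reads the file correctly is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: derive the WAV name from the input name
# (recording.mp3 -> recording.wav) by stripping the last extension.
wav_name() {
    printf '%s.wav\n' "${1%.*}"
}

# Hypothetical helper: convert one audio file to 16 kHz, mono,
# 16-bit PCM WAV. Assumes ffmpeg is installed.
mp3_to_wav() {
    ffmpeg -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$(wav_name "$1")"
}

# Example (not run automatically):
#   mp3_to_wav recording.mp3    # writes recording.wav next to the MP3
```

The 16 kHz mono format is chosen because speech tools commonly expect it; adjust the values if your program asks for something else.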

The 3 MB upload limit I saw on an online service is not enough for me; at least 5 MB, or more, would do.

On Android smartphones this kind of thing seems routine; maybe I am wrong to look for it on Linux/Ubuntu.
Last edited by neolinux on Wednesday 16 September 2020, 19:02; edited 1 time in total.


Re: Speech recognition for MP3: Transcriber 1.5.1 or something else???

Posted by neolinux » Wednesday 16 September 2020, 12:19

The right keywords in Synaptic (the package manager) seem to be these:
speech recognition


Re: From audio to text: how?

Posted by neolinux » Wednesday 16 September 2020, 19:03

That is how I found Julius, but it is command-line only and I don't understand how to tell it "transcribe this spoken audio file into text".


Re: From audio to text: how?

Posted by neolinux » Friday 18 September 2020, 12:28

I find terminal instructions hard to follow, but which commands would let me automatically convert a spoken audio file into text?
If I have some examples I can carry on by myself with copy and paste and small adjustments (e.g. changing formats, file names and destinations...)
This is the Julius help text, but unfortunately I found no example commands in it. Does it support spoken Italian???

Code:

julius -help
Julius rev.4.2.2 - based on JuliusLib rev.4.2.2 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    :
 -  Compiled by  : gcc -g -O2 -fdebug-prefix-map=/build/julius-BbVgaR/julius-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security

Options:

--- Global Options -----------------------------------------------

 Speech Input:
    (Can extract only MFCC based features from waveform)
    [-input devname]    input source  (default = htkparam)
         htkparam/mfcfile  HTK parameter file
         file/rawfile      waveform file (RAW(BE),WAV)
         mic               default microphone device
         alsa              use ALSA interface
         oss               use OSS interface
         pulseaudio        use PulseAudio interface
         adinnet           adinnet client (TCP/IP)
         stdin             standard input
    [-filelist file]    filename of input file list
    [-adport portnum]   adinnet port number to listen         (5530)
    [-48]               enable 48kHz sampling with internal down sampler (OFF)
    [-zmean/-nozmean]   enable/disable DC offset removal      (OFF)
    [-nostrip]          disable stripping off zero samples
    [-record dir]       record triggered speech data to dir
    [-rejectshort msec] reject an input shorter than specified

 Speech Detection: (default: on=mic/net off=files)
    [-cutsilence]       turn on (force) skipping long silence
    [-nocutsilence]     turn off (force) skipping long silence
    [-lv unsignedshort] input level threshold (0-32767)       (2000)
    [-zc zerocrossnum]  zerocross num threshold per sec.      (60)
    [-headmargin msec]  header margin length in msec.         (300)
    [-tailmargin msec]  tail margin length in msec.           (400)
    [-chunksize sample] unit length for processing            (1000)

 GMM utterance verification:
    -gmm filename       GMM definition file
    -gmmnum num         GMM Gaussian pruning num              (10)
    -gmmreject string   comma-separated list of noise model name to reject

 On-the-fly Decoding: (default: on=mic/net off=files)
    [-realtime]         turn on, input streamed with MAP-CMN
    [-norealtime]       turn off, input buffered with sentence CMN

 Others:
    [-C jconffile]      load options from jconf file
    [-quiet]            reduce output to only word string
    [-demo]             equal to "-quiet -progout"
    [-debug]            (for debug) dump numerous log
    [-callbackdebug]    (for debug) output message per callback
    [-check (wchmm|trellis)] (for debug) check internal structure
    [-check triphone]   triphone mapping check
    [-setting]          print engine configuration and exit
    [-help]             print this message and exit

--- Instance Declarations ----------------------------------------

    [-AM]               start a new acoustic model instance
    [-LM]               start a new language model instance
    [-SR]               start a new recognizer (search) instance
    [-AM_GMM]           start an AM feature instance for GMM
    [-GLOBAL]           start a global section
    [-nosectioncheck]   disable option location check

--- Acoustic Model Options (-AM) ---------------------------------

 Acoustic analysis:
    [-htkconf file]     load parameters from the HTK Config file
    [-smpFreq freq]     sample period (Hz)                    (16000)
    [-smpPeriod period] sample period (100ns)                 (625)
    [-fsize sample]     window size (sample)                  (400)
    [-fshift sample]    frame shift (sample)                  (160)
    [-preemph]          pre-emphasis coef.                    (0.97)
    [-fbank]            number of filterbank channels         (24)
    [-ceplif]           cepstral liftering coef.              (22)
    [-rawe] [-norawe]   toggle using raw energy               (no)
    [-enormal] [-noenormal] toggle normalizing log energy     (no)
    [-escale]           scaling log energy for enormal        (1.0)
    [-silfloor]         energy silence floor in dB            (50.0)
    [-delwin frame]     delta windows length (frame)          (2)
    [-accwin frame]     accel windows length (frame)          (2)
    [-hifreq freq]      freq. of upper band limit, off if <0  (-1)
    [-lofreq freq]      freq. of lower band limit, off if <0  (-1)
    [-sscalc]           do spectral subtraction (file input only)
    [-sscalclen msec]   length of head silence for SS (msec)  (300)
    [-ssload filename]  load constant noise spectrum from file for SS
    [-ssalpha value]    alpha coef. for SS                    (2.000000)
    [-ssfloor value]    spectral floor for SS                 (0.500000)
    [-zmeanframe/-nozmeanframe] frame-wise DC removal like HTK(OFF)
    [-usepower/-nousepower] use power in fbank analysis       (OFF)
    [-cmnload file]     load initial CMN param from file on startup
    [-cmnsave file]     save CMN param to file after each input
    [-cmnnoupdate]      not update CMN param while recog. (use with -cmnload)
    [-cmnmapweight]     weight value of initial cm for MAP-CMN (100.00)
    [-cvn]              cepstral variance normalisation       (on)
    [-vtln alpha lowcut hicut] enable VTLN (1.0 to disable)   (1.000000)

 Acoustic Model:
    -h hmmdefsfile      HMM definition file name
    [-hlist HMMlistfile] HMMlist filename (must for triphone model)
    [-iwcd1 methodname] switch IWCD triphone handling on 1st pass
             best N     use N best score (default of n-gram, N=3)
             max        use maximum score
             avg        use average score (default of dfa)
    [-force_ccd]        force to handle IWCD
    [-no_ccd]           don't handle IWCD
    [-notypecheck]      don't check input parameter type
    [-spmodel HMMname]  name of short pause model             ("sp")
    [-multipath]        switch decoding for multi-path HMM    (auto)

 Acoustic Model Computation Method:
    [-gprune methodname] select Gaussian pruning method:
             safe          safe pruning
             heuristic     heuristic pruning
             beam          beam pruning (default for TM/PTM)
             none          no pruning (default for non tmix models)
    [-tmix gaussnum]    Gaussian num threshold per mixture for pruning (2)
    [-gshmm hmmdefs]    monophone hmmdefs for GS
    [-gsnum N]          N-best state will be selected        (24)

--- Language Model Options (-LM) ---------------------------------

 N-gram:
    -d file.bingram     n-gram file in Julius binary format
    -nlr file.arpa      forward n-gram file in ARPA format
    -nrl file.arpa      backward n-gram file in ARPA format
    [-lmp float float]  weight and penalty (tri: 8.0 -2.0 mono: 5.0 -1)
    [-lmp2 float float]       for 2nd pass (tri: 8.0 -2.0 mono: 6.0 0)
    [-transp float]     penalty for transparent word (+0.0)

 DFA Grammar:
    -dfa file.dfa       DFA grammar file
    -gram file[,file2...] (list of) grammar prefix(es)
    -gramlist filename  filename of grammar list
    [-penalty1 float]   word insertion penalty (1st pass)     (0.0)
    [-penalty2 float]   word insertion penalty (2nd pass)     (0.0)

 Word Dictionary for N-gram and DFA:
    -v dictfile         dictionary file name
    [-silhead wordname] (n-gram) beginning-of-sentence word   (<s>)
    [-siltail wordname] (n-gram) end-of-sentence word         (</s>)
    [-mapunk wordname]  (n-gram) map unknown words to this    (<unk>)
    [-forcedict]        ignore error entry and keep running
    [-iwspword]         (n-gram) add short-pause word for inter-word CD sp
    [-iwspentry entry]  (n-gram) word entry for "-iwspword" (<UNK> [sp] sp sp)
    [-adddict dictfile] (n-gram) load extra dictionary
    [-addentry entry]   (n-gram) load extra word entry

 Isolated Word Recognition:
    -w file[,file2...]  (list of) wordlist file name(s)
    -wlist filename     file that contains list of wordlists
    -wsil head tail sp  name of silence/pause model
                          head - BOS silence model name       (silB)
                          tail - EOS silence model name       (silE)
                           sp  - their name as context or "NULL" (NULL)

--- Recognizer / Search Options (-SR) ----------------------------

 Search Parameters for the First Pass:
    [-b beamwidth]      beam width (by state num)             (guessed)
                        (0: full search, -1: force guess)
    [-bs score_width]   beam width (by score offset)          (disabled)
                        (-1: disable)
    [-sepnum wordnum]   (n-gram) # of hi-freq word isolated from tree (150)
    [-1pass]            do 1st pass only, omit 2nd pass
    [-inactive]         recognition process not active on startup

 Search Parameters for the Second Pass:
    [-b2 hyponum]       word envelope beam width (by hypo num) (30)
    [-n N]              # of sentence to find                 (1)
    [-output N]         # of sentence to output               (1)
    [-sb score]         score beam threshold (by score)       (80.0)
    [-s hyponum]        global stack size of hypotheses       (500)
    [-m hyponum]        hypotheses overflow threshold num     (2000)
    [-lookuprange N]    frame lookup range in word expansion  (5)
    [-looktrellis]      (dfa) expand only backtrellis words
    [-[no]multigramout] (dfa) output per-grammar results
    [-oldtree]          (dfa) use old build_wchmm()
    [-oldiwcd]          (dfa) use full lcdset
    [-iwsp]             insert sp for all word end (multipath)(off)
    [-iwsppenalty]      trans. penalty for iwsp (multipath)   (-1.0)

 Short-pause Segmentation:
    [-spsegment]        enable short-pause segmentation
    [-spdur]            length threshold of sp frames         (10)
    [-pausemodels str]  comma-delimited list of pause models for segment

 Graph Output with graph-oriented search:
    [-lattice]          enable word graph (lattice) output
    [-confnet]          enable confusion network output
    [-nolattice]][-noconfnet] disable lattice / confnet output
    [-graphrange N]     merge same words in graph (0)
                        -1: not merge, leave same loc. with diff. score
                         0: merge same words at same location
                        >0: merge same words around the margin
    [-graphcut num]     graph cut depth at postprocess (-1: disable)(80)
    [-graphboundloop num] max. num of boundary adjustment loop (20)
    [-graphsearchdelay] inhibit search termination until 1st sent. found
    [-nographsearchdelay] disable it (default)

 Forced Alignment:
    [-walign]           optionally output word alignments
    [-palign]           optionally output phoneme alignments
    [-salign]           optionally output state alignments

 Confidence Score:
    [-cmalpha value]    CM smoothing factor                    (0.050000)

 Message Output:
    [-fallback1pass]    use 1st pass result when search failed
    [-progout]          progressive output in 1st pass
    [-proginterval]     interval of progout in msec           (300)

-------------------------------------------------

 Additional options for application:
    [--help]	display this help
    [-help]	display this help
    [-outfile]	save result in separate .out file
    [-nolog]	not output any log
    [-logfile arg]	output log to file
    [-separatescore]	output AM and LM scores separately
    [-kanji arg]	convert character set for output
    [-nocharconv]	disable charconv
    [-charconv arg arg]	convert character set for output
    [-outcode arg]	select info to output to the module: WLPSCwlps
    [-module (arg)]	run as a server module
    [-record arg]	record input waveform to file in dir
It should be something like:
Julius command destinationfile.txt
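
The help output above lists the pieces such a command would need: -C jconffile to load options, -input with the file source, -filelist for a list of input files, and -outfile to save the result in a separate .out file. What follows is a minimal sketch, not a tested recipe. It assumes a model set for the spoken language (acoustic model, dictionary, language model) is already available and referenced from a jconf file. The name italian.jconf is hypothetical, and this thread does not establish whether an Italian model exists. Per the help, the input must be a waveform file (RAW or WAV), not an MP3, and 16000 Hz is the default sampling frequency. The helper names make_filelist and transcribe_list are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write one WAV path per line into a list file,
# which is the kind of file the -filelist option reads.
# Usage: make_filelist LISTFILE WAV [WAV...]
make_filelist() {
    list="$1"
    shift
    printf '%s\n' "$@" > "$list"
}

# Hypothetical helper: run Julius over every file named in the list.
# $1 = jconf file pointing at the models (an assumption, see above)
# $2 = list file produced by make_filelist
transcribe_list() {
    julius -C "$1" -input file -filelist "$2" -outfile
}

# Example (not run automatically):
#   make_filelist wavs.txt recording.wav
#   transcribe_list italian.jconf wavs.txt
```

Using a file list avoids typing file names interactively, and -outfile keeps each result in its own .out file instead of only printing it to the terminal.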


Re: From audio to text: how?

Posted by neolinux » Tuesday 22 September 2020, 22:35

I am closing this thread and trying a different route
neolinux wrote:
Tuesday 22 September 2020, 22:42
here
