<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="it">
	<id>https://wiki.montellug.it/index.php?action=history&amp;feed=atom&amp;title=OCR</id>
	<title>OCR - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.montellug.it/index.php?action=history&amp;feed=atom&amp;title=OCR"/>
	<link rel="alternate" type="text/html" href="index.php?title=OCR&amp;action=history"/>
	<updated>2026-05-08T00:01:39Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.14</generator>
	<entry>
		<id>index.php?title=OCR&amp;diff=23885&amp;oldid=prev</id>
		<title>Odeeno at 14:50, 8 February 2017</title>
		<link rel="alternate" type="text/html" href="index.php?title=OCR&amp;diff=23885&amp;oldid=prev"/>
		<updated>2017-02-08T14:50:39Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;it&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 14:50, 8 February 2017&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot; &gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Optical character recognition &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Optical character recognition HOWTO on programs for OCR (optical character recognition) =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;== &lt;/del&gt;HOWTO on programs for OCR (optical character recognition) &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;=&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;It is often necessary to rework the text of documents we have scanned. Scanning normally produces image files, which cannot be edited with word processors such as LibreOffice Writer and similar programs.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;It is often necessary to rework the text of documents we have scanned. Scanning normally produces image files, which cannot be edited with word processors such as LibreOffice Writer and similar programs.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l39&quot; &gt;Line 39:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 38:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;      mv temp.txt $OUTPUT&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;      mv temp.txt $OUTPUT&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  done&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  done&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== Programs with a graphical interface ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* ocropy https://github.com/tmbdev/ocropy&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* OCRFeeder, available from the repositories&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Any additions are welcome.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Any additions are welcome.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;/table&gt;</summary>
		<author><name>Odeeno</name></author>
	</entry>
	<entry>
		<id>index.php?title=OCR&amp;diff=23884&amp;oldid=prev</id>
		<title>Odeeno: HOWTO OCR</title>
		<link rel="alternate" type="text/html" href="index.php?title=OCR&amp;diff=23884&amp;oldid=prev"/>
		<updated>2017-02-08T14:33:49Z</updated>

		<summary type="html">&lt;p&gt;HOWTO OCR&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= Optical character recognition =&lt;br /&gt;
== HOWTO on programs for OCR (optical character recognition) ==&lt;br /&gt;
&lt;br /&gt;
It is often necessary to rework the text of documents we have scanned. Scanning normally produces image files, which cannot be edited with word processors such as LibreOffice Writer and similar programs.&lt;br /&gt;
&lt;br /&gt;
So what can be done? Linux also has excellent programs for this job: examining an image document and extracting its text. Unfortunately, few of them have a useful graphical interface (anyone who knows of one is invited to add it to this page), but the handy script shown below can be used instead.&lt;br /&gt;
&lt;br /&gt;
== Useful commands ==&lt;br /&gt;
If the source file is a single-page PDF, the following commands can be used directly.&lt;br /&gt;
&lt;br /&gt;
'''First step''': convert the source PDF into a monochrome TIFF file at 300 dpi, using the ''convert'' program&lt;br /&gt;
 convert -monochrome -density 300 sorgente.pdf nuovo_sorgente.tif&lt;br /&gt;
&lt;br /&gt;
Further options: in some cases, keeping 8-bit depth and turning off the alpha channel, rather than reducing the image to black and white, gives better results; in that case try&lt;br /&gt;
 convert -density 300 sorgente.pdf -depth 8 -background white +matte nuovo_sorgente.tif&lt;br /&gt;
&lt;br /&gt;
'''Second step''': read the .tif file and save the text it contains, using the ''tesseract'' program; tesseract appends .txt to the output name itself, so the text ends up in testo.txt&lt;br /&gt;
 tesseract nuovo_sorgente.tif testo&lt;br /&gt;
&lt;br /&gt;
Further options: the program can be told which language the text is written in and which page layout to assume (mode 4 treats the page as a single column of text of variable sizes; Tesseract 4 and later spell this option --psm); in that case try&lt;br /&gt;
 tesseract nuovo_sorgente.tif testo -l ita -psm 4&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Script to automate recognition ==&lt;br /&gt;
If, as is usually the case, the document has several pages, the whole document can be read automatically with the following script.&lt;br /&gt;
&lt;br /&gt;
 PAGINE=11               # number of pages in the PDF&lt;br /&gt;
 SORGENTE=sorgente.pdf   # name of the PDF file to process&lt;br /&gt;
 OUTPUT=testo.txt        # name of the output file&lt;br /&gt;
 RESOLUTION=400          # resolution in dpi used to rasterize each page&lt;br /&gt;
 touch $OUTPUT&lt;br /&gt;
 # ImageMagick numbers pages from 0, so page N of the PDF is index N-1&lt;br /&gt;
 for i in `seq 0 $((PAGINE - 1))`; do&lt;br /&gt;
     convert -density $RESOLUTION $SORGENTE\[$i\] -depth 8 -background white +matte page$i.tif&lt;br /&gt;
     tesseract page$i.tif page$i -l ita -psm 4&lt;br /&gt;
     cat $OUTPUT page$i.txt &amp;gt; temp.txt&lt;br /&gt;
     rm $OUTPUT&lt;br /&gt;
     rm page$i.tif&lt;br /&gt;
     rm page$i.txt&lt;br /&gt;
     mv temp.txt $OUTPUT&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;
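The cat, rm and mv sequence in the script above rewrites the whole output file on every pass. A minimal sketch of an alternative, assuming the per-page text files are named pageN.txt as in the loop: the helper merge_pages is hypothetical (it is not part of the original script) and simply appends the page files to the output in numeric order, deleting each one afterwards.&lt;br /&gt;

```shell
# merge_pages OUTPUT FIRST LAST
# Hypothetical helper: concatenates pageFIRST.txt .. pageLAST.txt
# into OUTPUT in numeric order, then deletes the per-page files.
merge_pages() {
    output=$1
    first=$2
    last=$3
    : > "$output"                      # create or truncate the output file
    i=$first
    while [ "$i" -le "$last" ]; do
        cat "page$i.txt" >> "$output"  # append this page's text
        rm "page$i.txt"                # the per-page file is no longer needed
        i=$((i + 1))
    done
}
```

For example, if the loop only ran convert and tesseract, calling merge_pages testo.txt 1 11 afterwards would collect page1.txt through page11.txt into testo.txt.&lt;br /&gt;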
&lt;br /&gt;
Any additions are welcome.&lt;/div&gt;</summary>
		<author><name>Odeeno</name></author>
	</entry>
</feed>