[Postfixbuch-users] FuzzyOcr again - very detailed

usenet at deiszner.de usenet at deiszner.de
Tue May 1 17:19:24 CEST 2007


Hello,

I have now sent a mail from my 'work account' at the university to the
server in question, with a JPG file attached that contains the word (the
little blue pills; I am not writing it out because I don't know whether
this mail would then still make it onto Peer's list).

Now for the problematic part:

1. The mail is delivered; it only complains about the sender's missing
real name.

2. When I run the mail directly through spamassassin from the bash shell,
it is flagged as spam (output below).

Why does it not recognize the mail as spam in 'real' operation?

#spamassassin  --debug < msg.VA4U\:2\,S  /dev/null

Subroutine FuzzyOcr::O_CREAT redefined at
/usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
 at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
Subroutine FuzzyOcr::O_EXCL redefined at
/usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
 at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
Subroutine FuzzyOcr::O_RDWR redefined at
/usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
 at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
[20323] dbg: FuzzyOcr: focr_bin_helper:
'pnmnorm,pnminvert,convert,ppmtopgm,tesseract'
[20323] info: FuzzyOcr: Adding <5> new helper apps
[20323] info: FuzzyOcr: Starting preprocessor parser for file
"/etc/mail/spamassassin/FuzzyOcr.preps"...
[20323] dbg: FuzzyOcr: line: preprocessor normalize {
[20323] dbg: FuzzyOcr: line: command = pnmnorm
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor invert {
[20323] dbg: FuzzyOcr: line: command = pnminvert
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor ppmtopgm {
[20323] dbg: FuzzyOcr: line: command = ppmtopgm
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor pamtopnm {
[20323] dbg: FuzzyOcr: line: command = pamtopnm
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor pamthreshold {
[20323] dbg: FuzzyOcr: line: command = pamthreshold
[20323] dbg: FuzzyOcr: line: args = -simple -threshold 0.5
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor maketiff {
[20323] dbg: FuzzyOcr: line: command = pnmtotiff
[20323] dbg: FuzzyOcr: line: args = -color -truecolor
[20323] dbg: FuzzyOcr: line: }
[20323] info: FuzzyOcr: Starting scanset parser for file
"/etc/mail/spamassassin/FuzzyOcr.scansets"...
[20323] dbg: FuzzyOcr: line scanset ocrad {
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset ocrad-invert {
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 -i $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset ocrad-decolorize-invert {
[20323] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 -i $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset ocrad-decolorize {
[20323] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset gocr {
[20323] dbg: FuzzyOcr: line command = $gocr
[20323] dbg: FuzzyOcr: line args = -i $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset gocr-180 {
[20323] dbg: FuzzyOcr: line command = $gocr
[20323] dbg: FuzzyOcr: line args = -l 180 -d 2 -i $input
[20323] dbg: FuzzyOcr: line }
[20323] info: FuzzyOcr: Searching in: /usr/local/netpbm/bin
[20323] info: FuzzyOcr: Searching in: /usr/local/bin
[20323] info: FuzzyOcr: Searching in: /usr/bin
[20323] info: FuzzyOcr: Using gifsicle => /usr/local/bin/gifsicle
[20323] info: FuzzyOcr: Using giffix => /usr/bin/giffix
[20323] info: FuzzyOcr: Using giftext => /usr/bin/giftext
[20323] info: FuzzyOcr: Using gifinter => /usr/bin/gifinter
[20323] info: FuzzyOcr: Using giftopnm => /usr/bin/giftopnm
[20323] info: FuzzyOcr: Using jpegtopnm => /usr/bin/jpegtopnm
[20323] info: FuzzyOcr: Using pngtopnm => /usr/bin/pngtopnm
[20323] info: FuzzyOcr: Using bmptopnm => /usr/bin/bmptopnm
[20323] info: FuzzyOcr: Using tifftopnm => /usr/bin/tifftopnm
[20323] info: FuzzyOcr: Using ppmhist => /usr/bin/ppmhist
[20323] info: FuzzyOcr: Using pamfile => /usr/bin/pamfile
[20323] info: FuzzyOcr: Using ocrad => /usr/local/bin/ocrad
[20323] info: FuzzyOcr: Using gocr => /usr/bin/gocr
[20323] info: FuzzyOcr: Using pnmnorm => /usr/bin/pnmnorm
[20323] info: FuzzyOcr: Using pnminvert => /usr/bin/pnminvert
[20323] info: FuzzyOcr: Using convert => /usr/bin/convert
[20323] info: FuzzyOcr: Using ppmtopgm => /usr/bin/ppmtopgm
[20323] info: FuzzyOcr: Using tesseract => /usr/local/bin/tesseract
[20323] dbg: FuzzyOcr: Threshold[max_hash] => 5
[20323] dbg: FuzzyOcr: Threshold[c] => 5
[20323] dbg: FuzzyOcr: Threshold[s] => 0.01
[20323] dbg: FuzzyOcr: Threshold[w] => 0.01
[20323] dbg: FuzzyOcr: Threshold[h] => 0.01
[20323] dbg: FuzzyOcr: Threshold[cn] => 0.01
[20323] dbg: FuzzyOcr: focr_add_score => 1
[20323] dbg: FuzzyOcr: focr_autodisable_negative_score => -5
[20323] dbg: FuzzyOcr: focr_autodisable_score => 1000
[20323] dbg: FuzzyOcr: focr_autosort_buffer => 10
[20323] dbg: FuzzyOcr: focr_autosort_scanset => 1
[20323] dbg: FuzzyOcr: focr_base_score => 5
[20323] dbg: FuzzyOcr: focr_corrupt_score => 2.5
[20323] dbg: FuzzyOcr: focr_corrupt_unfixable_score => 5
[20323] dbg: FuzzyOcr: focr_counts_required => 2
[20323] dbg: FuzzyOcr: focr_db_hash => /etc/mail/spamassassin/FuzzyOcr.db
[20323] dbg: FuzzyOcr: focr_db_max_days => 35
[20323] dbg: FuzzyOcr: focr_db_safe =>
/etc/mail/spamassassin/FuzzyOcr.safe.db
[20323] dbg: FuzzyOcr: focr_digest_db =>
/etc/mail/spamassassin/FuzzyOcr.hashdb
[20323] dbg: FuzzyOcr: focr_enable_image_hashing => 2
[20323] dbg: FuzzyOcr: focr_global_timeout => 0
[20323] dbg: FuzzyOcr: focr_global_wordlist =>
/etc/mail/spamassassin/FuzzyOcr.words
[20323] dbg: FuzzyOcr: focr_hashing_learn_scanned => 1
[20323] dbg: FuzzyOcr: focr_keep_bad_images => 0
[20323] dbg: FuzzyOcr: focr_log_pmsinfo => 1
[20323] dbg: FuzzyOcr: focr_log_stderr => 1
[20323] dbg: FuzzyOcr: focr_logfile => /home/fuzzyocr.log
[20323] dbg: FuzzyOcr: focr_max_height => 800
[20323] dbg: FuzzyOcr: focr_max_width => 800
[20323] dbg: FuzzyOcr: focr_min_height => 4
[20323] dbg: FuzzyOcr: focr_min_width => 4
[20323] dbg: FuzzyOcr: focr_minimal_scanset => 1
[20323] dbg: FuzzyOcr: focr_mysql_db => FuzzyOcr
[20323] dbg: FuzzyOcr: focr_mysql_hash => Hash
[20323] dbg: FuzzyOcr: focr_mysql_host => localhost
[20323] dbg: FuzzyOcr: focr_mysql_port => 3306
[20323] dbg: FuzzyOcr: focr_mysql_safe => Safe
[20323] dbg: FuzzyOcr: focr_mysql_update_hash => 0
[20323] dbg: FuzzyOcr: focr_mysql_user => fuzzyocr
[20323] dbg: FuzzyOcr: focr_no_homedirs => 0
[20323] dbg: FuzzyOcr: focr_path_bin =>
/usr/local/netpbm/bin:/usr/local/bin:/usr/bin
[20323] dbg: FuzzyOcr: focr_personal_wordlist =>
__userstate__/FuzzyOcr.words
[20323] dbg: FuzzyOcr: focr_preprocessor_file =>
/etc/mail/spamassassin/FuzzyOcr.preps
[20323] dbg: FuzzyOcr: focr_scanset_file =>
/etc/mail/spamassassin/FuzzyOcr.scansets
[20323] dbg: FuzzyOcr: focr_score_ham => 0
[20323] dbg: FuzzyOcr: focr_skip_bmp => 0
[20323] dbg: FuzzyOcr: focr_skip_gif => 0
[20323] dbg: FuzzyOcr: focr_skip_jpeg => 0
[20323] dbg: FuzzyOcr: focr_skip_png => 0
[20323] dbg: FuzzyOcr: focr_skip_tiff => 0
[20323] dbg: FuzzyOcr: focr_skip_updates => 0
[20323] dbg: FuzzyOcr: focr_strip_numbers => 1
[20323] dbg: FuzzyOcr: focr_threshold => 0.25
[20323] dbg: FuzzyOcr: focr_timeout => 10
[20323] dbg: FuzzyOcr: focr_twopass_scoring_factor => 1.5
[20323] dbg: FuzzyOcr: focr_unique_matches => 0
[20323] dbg: FuzzyOcr: focr_verbose => 1
[20323] dbg: FuzzyOcr: focr_wrongctype_score => 1.5
[20323] dbg: FuzzyOcr: focr_wrongext_score => 1.5
[20323] info: FuzzyOcr: Loaded preprocessor normalize: /usr/bin/pnmnorm
[20323] info: FuzzyOcr: Loaded preprocessor invert: /usr/bin/pnminvert
[20323] info: FuzzyOcr: Loaded preprocessor ppmtopgm: /usr/bin/ppmtopgm
[20323] info: FuzzyOcr: Loaded preprocessor pamtopnm: pamtopnm
[20323] info: FuzzyOcr: Loaded preprocessor pamthreshold:
pamthreshold -simple -threshold 0.5
[20323] info: FuzzyOcr: Loaded preprocessor maketiff:
pnmtotiff -color -truecolor
[20323] info: FuzzyOcr: Using scan ocrad: /usr/local/bin/ocrad -s5 $input
[20323] info: FuzzyOcr: Using scan ocrad-invert: /usr/local/bin/ocrad -s5 -i
$input
[20323] info: FuzzyOcr: Using scan ocrad-decolorize-invert:
/usr/local/bin/ocrad -s5 -i $input
[20323] info: FuzzyOcr: Using scan ocrad-decolorize:
/usr/local/bin/ocrad -s5 $input
[20323] info: FuzzyOcr: Using scan gocr: /usr/bin/gocr -i $input
[20323] info: FuzzyOcr: Using scan gocr-180: /usr/bin/gocr -l 180 -d 2 -i
$input
[20323] info: FuzzyOcr: Added <47> words from
"/etc/mail/spamassassin/FuzzyOcr.words"
[20323] info: rules: meta test DIGEST_MULTIPLE has undefined dependency
'DCC_CHECK'
[20323] dbg: FuzzyOcr: Starting FuzzyOcr...
[20323] info: FuzzyOcr: Processing Message with ID
"<002b01c78c01$43811f20$2601a8c0 at SKHWDNOTEBOOK>"
(<dienstkonto at uni-leipzig.de> -> <sebastian at deiszner.de>)
[20323] dbg: FuzzyOcr: fname: "testmail.jpg" => "testmail.jpg"
[20323] info: FuzzyOcr: JPEG: [46x144] testmail.jpg (2577)
[20323] dbg: FuzzyOcr: Saved: /tmp/.spamassassin20323q4X4mLtmp/testmail.jpg
[20323] dbg: FuzzyOcr: Saved: /tmp/.spamassassin20323q4X4mLtmp/raw.eml
[20323] info: FuzzyOcr: Found: 1 images
[20323] dbg: FuzzyOcr: pfile =>
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20323] dbg: FuzzyOcr: efile =>
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.err
[20323] dbg: FuzzyOcr: Errors to: /tmp/.spamassassin20323q4X4mLtmp/raw.err
[20323] dbg: FuzzyOcr: File has Content-Type "image/jpeg" and File Extension
"jpg"
[20323] info: FuzzyOcr: Found JPEG header name="testmail.jpg"
[20323] dbg: FuzzyOcr: Saved pid: 20326
[20326] dbg: FuzzyOcr: Exec : /usr/bin/jpegtopnm
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg
[20326] dbg: FuzzyOcr: Stdout:
 >/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20326] dbg: FuzzyOcr: Stderr:
 >>/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.err
[20323] dbg: FuzzyOcr: Elapsed [20326]: 0.009051 sec. (/usr/bin/jpegtopnm:
exit 0)
[20323] info: FuzzyOcr: Calculating image hash for:
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20323] dbg: FuzzyOcr: Saved pid: 20327
[20327] dbg: FuzzyOcr: Exec : /usr/bin/ppmhist -noheader
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20327] dbg: FuzzyOcr: Stdout:
 >/tmp/.spamassassin20323q4X4mLtmp/ppmhist.info
[20327] dbg: FuzzyOcr: Stderr: >/dev/null
[20323] dbg: FuzzyOcr: Elapsed [20327]: 0.009170 sec. (/usr/bin/ppmhist:
exit 0)
[20323] dbg: FuzzyOcr: Got:
<19886:46:144:208::255:255:255:255:3678::0:0:0:0:436::254:254:254:254:152::253:253:253:253:141::252:252:252:252:134::251:251:251:251:123>
[20323] info: FuzzyOcr: Scanset Order: ocrad(0) ocrad-invert(0)
ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
[20323] dbg: FuzzyOcr: Saved pid: 20328
[20328] dbg: FuzzyOcr: Exec : /usr/local/bin/ocrad -s5
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20328] dbg: FuzzyOcr: Stdout:
 >/tmp/.spamassassin20323q4X4mLtmp/scanset.ocrad.out
[20328] dbg: FuzzyOcr: Stderr:
 >/tmp/.spamassassin20323q4X4mLtmp/scanset.ocrad.err
[20323] dbg: FuzzyOcr: Elapsed [20328]: 0.018049 sec. (/usr/local/bin/ocrad:
exit 0)
[20323] dbg: FuzzyOcr: ocrdata=>>viagra
[20323] dbg: FuzzyOcr:
[20323] dbg: FuzzyOcr: <<=end
[20323] info: FuzzyOcr: Scanset "ocrad" found word "viagra" with fuzz of
0.0000
[20323] info: FuzzyOcr: line: "viagra"
[20323] dbg: FuzzyOcr: Not enough OCR Hits without space stripping, doing
second matching pass...
[20323] info: FuzzyOcr: Scanset "ocrad" found word "viagra" with fuzz of
0.0000
[20323] info: FuzzyOcr: line: "viagra"
[20323] info: FuzzyOcr: Scanset "ocrad" generates enough hits (2), skipping
further scansets...
[20323] info: FuzzyOcr: Message is spam, score = 5.000
[20323] info: FuzzyOcr: Adding Hash to "/etc/mail/spamassassin/FuzzyOcr.db"
with score "5.000"
[20323] dbg: FuzzyOcr: Digest:
19886:46:144:208::255:255:255:255:3678::0:0:0:0:436::254:254:254:254:152::253:253:253:253:141::252:252:252:252:134::251:251:251:251:123
[20323] info: FuzzyOcr: Words found:
[20323] info: FuzzyOcr: "viagra" in 1 lines
[20323] info: FuzzyOcr: "viagra" in 1 lines
[20323] info: FuzzyOcr: (2 word occurrences found)
[20323] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin20323q4X4mLtmp
[20323] dbg: FuzzyOcr: FuzzyOcr ending successfully...
[20323] dbg: FuzzyOcr: Processed in 0.075586 sec.
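
The score of 5.000 reported in the debug output above is consistent with a simple linear rule: a base score once the required number of word hits is reached, plus an increment for every additional hit. The Python sketch below models that arithmetic. It is an assumption about how the numbers fit together, not FuzzyOcr's actual Perl code; the function name focr_score is made up for illustration, and the default values are the focr_base_score, focr_add_score and focr_counts_required settings printed in the log.

```python
def focr_score(hits, base_score=5.0, add_score=1.0, counts_required=2):
    """Hypothetical model of the FuzzyOcr score; defaults come from the debug log."""
    if hits < counts_required:
        # Assumption: with too few word hits the rule does not fire at all.
        return 0.0
    # Base score at the required count, plus add_score per extra hit.
    return base_score + (hits - counts_required) * add_score


# The log reports "2 word occurrences found".
print(focr_score(2))
```

With the two hits from the log this gives the base score only, which matches the "score = 5.000" line.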
>From dienstkonto at uni-leipzig.de  Tue May  1 16:59:08 2007
Received: from localhost by 86-56-25-43
 with SpamAssassin (version 3.1.8);
 Tue, 01 May 2007 17:15:19 +0200
From: <dienstkonto at uni-leipzig.de>
To: <ich at deiszner.de>
Subject: E-Mail schreiben an: testmail.jpg
Date: Tue, 1 May 2007 16:58:42 +0200
Message-Id: <002b01c78c01$43811f20$2601a8c0 at SKHWDNOTEBOOK>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on 86-56-25-43
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.5 required=5.0 tests=FUZZY_OCR,NO_REAL_NAME
 autolearn=no version=3.1.8
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_46375987.485D3BC8"

This is a multi-part message in MIME format.

------------=_46375987.485D3BC8
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Spam detection software, running on the system

    cdu-anhalt-bitterfeld.de

has identified this incoming email as possible spam.
The original message has been attached to this report so that you can
view it (in case it is a legitimate email after all) or flag
similar unwanted messages in the future.
If you have any questions about this, please contact

    ich at cdu-anhalt-bitterfeld.de

Content preview: Your message is ready to be sent with the following file or
   link attachments: testmail.jpg Note: To protect against computer viruses,
   e-mail programs may prevent sending or receiving certain types of file
   attachments. Check your e-mail security settings to determine how
   attachments are handled. [...]

Content analysis details:   (5.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.6 NO_REAL_NAME           No full name in sender address
 5.0 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "viagra" in 1 lines
                            "viagra" in 1 lines
                            (2 word occurrences found)

The original message was not completely plain text and may be unsafe
to open with some email programs (for example, if it contains a
computer virus).
If you still want to view the message, it is probably safer to save it
to a file first and then open that file with a text editor.


------------=_46375987.485D3BC8
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin
Content-Disposition: attachment
Content-Transfer-Encoding: 8bit

Return-Path: <dienstkonto at uni-leipzig.de>
X-Spam-Flag: NO
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on 86-56-25-43
X-Spam-Level:
X-Spam-Status: No, hits=-101.3 required=4.0 tests=AWL,BAYES_00,NO_REAL_NAME,
 USER_IN_WHITELIST autolearn=no version=3.1.8
X-Original-To: ich at deiszner.de
Delivered-To: ich at localhost
Received: from v1.rz.uni-leipzig.de (v1.rz.uni-leipzig.de [139.18.1.26])
 by cdu-anhalt-bitterfeld.de (Postfix) with ESMTP id ED1FB4A00FA
 for <ich at deiszner.de>; Tue,  1 May 2007 16:59:06 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at v1-ul
Received: from v1.rz.uni-leipzig.de ([127.0.0.1])
 by localhost (v1.rz.uni-leipzig.de [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 41Fg1iwRrjLN for <ich at deiszner.de>;
 Tue,  1 May 2007 16:59:06 +0200 (CEST)
Received: from server1.rz.uni-leipzig.de (server1.rz.uni-leipzig.de
[139.18.1.1])
 by v1.rz.uni-leipzig.de (Postfix) with ESMTP id 3D608126F4
 for <ich at deiszner.de>; Tue,  1 May 2007 16:59:06 +0200 (CEST)
Received: from SKHWDNOTEBOOK (110.75.203.62.cust.bluewin.ch [62.203.75.110])
 by server1.rz.uni-leipzig.de (Postfix) with ESMTP id 52B2421
 for <ich at deiszner.de>; Tue,  1 May 2007 16:59:04 +0200 (METDST)
Message-ID: <002b01c78c01$43811f20$2601a8c0 at SKHWDNOTEBOOK>
From: <dienstkonto at uni-leipzig.de>
To: <ich at deiszner.de>
Subject: E-Mail schreiben an: testmail.jpg
Date: Tue, 1 May 2007 16:58:42 +0200
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----=_NextPart_000_0009_01C78C11.F95B9590"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3028
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
X-Virus-Status: No
X-Virus-Checker-Version: clamassassin 1.2.3 with clamscan / ClamAV
0.90.2/3188/Tue May  1 12:24:57 2007 signatures .

This is a multi-part message in MIME format.
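
One way to compare the two runs is to diff the tests= lists of the two X-Spam-Status headers quoted in this mail: the one produced by the shell run and the one already present in the delivered original message. The following is a minimal Python sketch; the helper name spam_tests is made up for illustration, and the two header strings are copied from the output above.

```python
import re


def spam_tests(status_header):
    """Return the rule names listed in the tests= field of an X-Spam-Status value."""
    # Header folding can insert whitespace after a comma inside the list.
    joined = re.sub(r",\s+", ",", status_header)
    match = re.search(r"tests=(\S*)", joined)
    if match is None:
        return set()
    return {name for name in match.group(1).split(",") if name}


shell_run = (
    "Yes, score=5.5 required=5.0 tests=FUZZY_OCR,NO_REAL_NAME\n"
    " autolearn=no version=3.1.8"
)
delivered = (
    "No, hits=-101.3 required=4.0 tests=AWL,BAYES_00,NO_REAL_NAME,\n"
    " USER_IN_WHITELIST autolearn=no version=3.1.8"
)

print("only in shell run:", sorted(spam_tests(shell_run) - spam_tests(delivered)))
print("only in delivery: ", sorted(spam_tests(delivered) - spam_tests(shell_run)))
```

For these two headers the delivered run lists AWL, BAYES_00 and USER_IN_WHITELIST but no FUZZY_OCR, and it uses required=4.0 instead of 5.0. That suggests the two runs may not be using the same configuration or user preferences.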



