[Postfixbuch-users] FuzzyOcr again - very detailed
usenet at deiszner.de
Tue May 1 17:19:24 CEST 2007
Hello,
I have just sent a mail from my 'work account' at the university to the
server in question, with a jpg file attached that contains the word (the
little blue pills - I'm not sure whether spelling it out would land this
mail on Peer's list).
Now for the problematic part:
1. The mail is delivered - it only complains about the sender's missing
real name.
2. When I run the same mail through spamassassin directly from the bash
shell, FuzzyOcr finds the word and the mail is marked as spam (output below).
Why is the mail not recognised as spam in 'real' operation?
#spamassassin --debug < msg.VA4U\:2\,S /dev/null
Subroutine FuzzyOcr::O_CREAT redefined at
/usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
Subroutine FuzzyOcr::O_EXCL redefined at
/usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
Subroutine FuzzyOcr::O_RDWR redefined at
/usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
[20323] dbg: FuzzyOcr: focr_bin_helper:
'pnmnorm,pnminvert,convert,ppmtopgm,tesseract'
[20323] info: FuzzyOcr: Adding <5> new helper apps
[20323] info: FuzzyOcr: Starting preprocessor parser for file
"/etc/mail/spamassassin/FuzzyOcr.preps"...
[20323] dbg: FuzzyOcr: line: preprocessor normalize {
[20323] dbg: FuzzyOcr: line: command = pnmnorm
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor invert {
[20323] dbg: FuzzyOcr: line: command = pnminvert
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor ppmtopgm {
[20323] dbg: FuzzyOcr: line: command = ppmtopgm
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor pamtopnm {
[20323] dbg: FuzzyOcr: line: command = pamtopnm
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor pamthreshold {
[20323] dbg: FuzzyOcr: line: command = pamthreshold
[20323] dbg: FuzzyOcr: line: args = -simple -threshold 0.5
[20323] dbg: FuzzyOcr: line: }
[20323] dbg: FuzzyOcr: line: preprocessor maketiff {
[20323] dbg: FuzzyOcr: line: command = pnmtotiff
[20323] dbg: FuzzyOcr: line: args = -color -truecolor
[20323] dbg: FuzzyOcr: line: }
[20323] info: FuzzyOcr: Starting scanset parser for file
"/etc/mail/spamassassin/FuzzyOcr.scansets"...
[20323] dbg: FuzzyOcr: line scanset ocrad {
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset ocrad-invert {
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 -i $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset ocrad-decolorize-invert {
[20323] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 -i $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset ocrad-decolorize {
[20323] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[20323] dbg: FuzzyOcr: line command = $ocrad
[20323] dbg: FuzzyOcr: line args = -s5 $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset gocr {
[20323] dbg: FuzzyOcr: line command = $gocr
[20323] dbg: FuzzyOcr: line args = -i $input
[20323] dbg: FuzzyOcr: line }
[20323] dbg: FuzzyOcr: line scanset gocr-180 {
[20323] dbg: FuzzyOcr: line command = $gocr
[20323] dbg: FuzzyOcr: line args = -l 180 -d 2 -i $input
[20323] dbg: FuzzyOcr: line }
[20323] info: FuzzyOcr: Searching in: /usr/local/netpbm/bin
[20323] info: FuzzyOcr: Searching in: /usr/local/bin
[20323] info: FuzzyOcr: Searching in: /usr/bin
[20323] info: FuzzyOcr: Using gifsicle => /usr/local/bin/gifsicle
[20323] info: FuzzyOcr: Using giffix => /usr/bin/giffix
[20323] info: FuzzyOcr: Using giftext => /usr/bin/giftext
[20323] info: FuzzyOcr: Using gifinter => /usr/bin/gifinter
[20323] info: FuzzyOcr: Using giftopnm => /usr/bin/giftopnm
[20323] info: FuzzyOcr: Using jpegtopnm => /usr/bin/jpegtopnm
[20323] info: FuzzyOcr: Using pngtopnm => /usr/bin/pngtopnm
[20323] info: FuzzyOcr: Using bmptopnm => /usr/bin/bmptopnm
[20323] info: FuzzyOcr: Using tifftopnm => /usr/bin/tifftopnm
[20323] info: FuzzyOcr: Using ppmhist => /usr/bin/ppmhist
[20323] info: FuzzyOcr: Using pamfile => /usr/bin/pamfile
[20323] info: FuzzyOcr: Using ocrad => /usr/local/bin/ocrad
[20323] info: FuzzyOcr: Using gocr => /usr/bin/gocr
[20323] info: FuzzyOcr: Using pnmnorm => /usr/bin/pnmnorm
[20323] info: FuzzyOcr: Using pnminvert => /usr/bin/pnminvert
[20323] info: FuzzyOcr: Using convert => /usr/bin/convert
[20323] info: FuzzyOcr: Using ppmtopgm => /usr/bin/ppmtopgm
[20323] info: FuzzyOcr: Using tesseract => /usr/local/bin/tesseract
[20323] dbg: FuzzyOcr: Threshold[max_hash] => 5
[20323] dbg: FuzzyOcr: Threshold[c] => 5
[20323] dbg: FuzzyOcr: Threshold[s] => 0.01
[20323] dbg: FuzzyOcr: Threshold[w] => 0.01
[20323] dbg: FuzzyOcr: Threshold[h] => 0.01
[20323] dbg: FuzzyOcr: Threshold[cn] => 0.01
[20323] dbg: FuzzyOcr: focr_add_score => 1
[20323] dbg: FuzzyOcr: focr_autodisable_negative_score => -5
[20323] dbg: FuzzyOcr: focr_autodisable_score => 1000
[20323] dbg: FuzzyOcr: focr_autosort_buffer => 10
[20323] dbg: FuzzyOcr: focr_autosort_scanset => 1
[20323] dbg: FuzzyOcr: focr_base_score => 5
[20323] dbg: FuzzyOcr: focr_corrupt_score => 2.5
[20323] dbg: FuzzyOcr: focr_corrupt_unfixable_score => 5
[20323] dbg: FuzzyOcr: focr_counts_required => 2
[20323] dbg: FuzzyOcr: focr_db_hash => /etc/mail/spamassassin/FuzzyOcr.db
[20323] dbg: FuzzyOcr: focr_db_max_days => 35
[20323] dbg: FuzzyOcr: focr_db_safe =>
/etc/mail/spamassassin/FuzzyOcr.safe.db
[20323] dbg: FuzzyOcr: focr_digest_db =>
/etc/mail/spamassassin/FuzzyOcr.hashdb
[20323] dbg: FuzzyOcr: focr_enable_image_hashing => 2
[20323] dbg: FuzzyOcr: focr_global_timeout => 0
[20323] dbg: FuzzyOcr: focr_global_wordlist =>
/etc/mail/spamassassin/FuzzyOcr.words
[20323] dbg: FuzzyOcr: focr_hashing_learn_scanned => 1
[20323] dbg: FuzzyOcr: focr_keep_bad_images => 0
[20323] dbg: FuzzyOcr: focr_log_pmsinfo => 1
[20323] dbg: FuzzyOcr: focr_log_stderr => 1
[20323] dbg: FuzzyOcr: focr_logfile => /home/fuzzyocr.log
[20323] dbg: FuzzyOcr: focr_max_height => 800
[20323] dbg: FuzzyOcr: focr_max_width => 800
[20323] dbg: FuzzyOcr: focr_min_height => 4
[20323] dbg: FuzzyOcr: focr_min_width => 4
[20323] dbg: FuzzyOcr: focr_minimal_scanset => 1
[20323] dbg: FuzzyOcr: focr_mysql_db => FuzzyOcr
[20323] dbg: FuzzyOcr: focr_mysql_hash => Hash
[20323] dbg: FuzzyOcr: focr_mysql_host => localhost
[20323] dbg: FuzzyOcr: focr_mysql_port => 3306
[20323] dbg: FuzzyOcr: focr_mysql_safe => Safe
[20323] dbg: FuzzyOcr: focr_mysql_update_hash => 0
[20323] dbg: FuzzyOcr: focr_mysql_user => fuzzyocr
[20323] dbg: FuzzyOcr: focr_no_homedirs => 0
[20323] dbg: FuzzyOcr: focr_path_bin =>
/usr/local/netpbm/bin:/usr/local/bin:/usr/bin
[20323] dbg: FuzzyOcr: focr_personal_wordlist =>
__userstate__/FuzzyOcr.words
[20323] dbg: FuzzyOcr: focr_preprocessor_file =>
/etc/mail/spamassassin/FuzzyOcr.preps
[20323] dbg: FuzzyOcr: focr_scanset_file =>
/etc/mail/spamassassin/FuzzyOcr.scansets
[20323] dbg: FuzzyOcr: focr_score_ham => 0
[20323] dbg: FuzzyOcr: focr_skip_bmp => 0
[20323] dbg: FuzzyOcr: focr_skip_gif => 0
[20323] dbg: FuzzyOcr: focr_skip_jpeg => 0
[20323] dbg: FuzzyOcr: focr_skip_png => 0
[20323] dbg: FuzzyOcr: focr_skip_tiff => 0
[20323] dbg: FuzzyOcr: focr_skip_updates => 0
[20323] dbg: FuzzyOcr: focr_strip_numbers => 1
[20323] dbg: FuzzyOcr: focr_threshold => 0.25
[20323] dbg: FuzzyOcr: focr_timeout => 10
[20323] dbg: FuzzyOcr: focr_twopass_scoring_factor => 1.5
[20323] dbg: FuzzyOcr: focr_unique_matches => 0
[20323] dbg: FuzzyOcr: focr_verbose => 1
[20323] dbg: FuzzyOcr: focr_wrongctype_score => 1.5
[20323] dbg: FuzzyOcr: focr_wrongext_score => 1.5
[20323] info: FuzzyOcr: Loaded preprocessor normalize: /usr/bin/pnmnorm
[20323] info: FuzzyOcr: Loaded preprocessor invert: /usr/bin/pnminvert
[20323] info: FuzzyOcr: Loaded preprocessor ppmtopgm: /usr/bin/ppmtopgm
[20323] info: FuzzyOcr: Loaded preprocessor pamtopnm: pamtopnm
[20323] info: FuzzyOcr: Loaded preprocessor pamthreshold:
pamthreshold -simple -threshold 0.5
[20323] info: FuzzyOcr: Loaded preprocessor maketiff:
pnmtotiff -color -truecolor
[20323] info: FuzzyOcr: Using scan ocrad: /usr/local/bin/ocrad -s5 $input
[20323] info: FuzzyOcr: Using scan ocrad-invert: /usr/local/bin/ocrad -s5 -i
$input
[20323] info: FuzzyOcr: Using scan ocrad-decolorize-invert:
/usr/local/bin/ocrad -s5 -i $input
[20323] info: FuzzyOcr: Using scan ocrad-decolorize:
/usr/local/bin/ocrad -s5 $input
[20323] info: FuzzyOcr: Using scan gocr: /usr/bin/gocr -i $input
[20323] info: FuzzyOcr: Using scan gocr-180: /usr/bin/gocr -l 180 -d 2 -i
$input
[20323] info: FuzzyOcr: Added <47> words from
"/etc/mail/spamassassin/FuzzyOcr.words"
[20323] info: rules: meta test DIGEST_MULTIPLE has undefined dependency
'DCC_CHECK'
[20323] dbg: FuzzyOcr: Starting FuzzyOcr...
[20323] info: FuzzyOcr: Processing Message with ID
"<002b01c78c01$43811f20$2601a8c0 at SKHWDNOTEBOOK>"
(<dienstkonto at uni-leipzig.de> -> <sebastian at deiszner.de>)
[20323] dbg: FuzzyOcr: fname: "testmail.jpg" => "testmail.jpg"
[20323] info: FuzzyOcr: JPEG: [46x144] testmail.jpg (2577)
[20323] dbg: FuzzyOcr: Saved: /tmp/.spamassassin20323q4X4mLtmp/testmail.jpg
[20323] dbg: FuzzyOcr: Saved: /tmp/.spamassassin20323q4X4mLtmp/raw.eml
[20323] info: FuzzyOcr: Found: 1 images
[20323] dbg: FuzzyOcr: pfile =>
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20323] dbg: FuzzyOcr: efile =>
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.err
[20323] dbg: FuzzyOcr: Errors to: /tmp/.spamassassin20323q4X4mLtmp/raw.err
[20323] dbg: FuzzyOcr: File has Content-Type "image/jpeg" and File Extension
"jpg"
[20323] info: FuzzyOcr: Found JPEG header name="testmail.jpg"
[20323] dbg: FuzzyOcr: Saved pid: 20326
[20326] dbg: FuzzyOcr: Exec : /usr/bin/jpegtopnm
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg
[20326] dbg: FuzzyOcr: Stdout:
>/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20326] dbg: FuzzyOcr: Stderr:
>>/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.err
[20323] dbg: FuzzyOcr: Elapsed [20326]: 0.009051 sec. (/usr/bin/jpegtopnm:
exit 0)
[20323] info: FuzzyOcr: Calculating image hash for:
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20323] dbg: FuzzyOcr: Saved pid: 20327
[20327] dbg: FuzzyOcr: Exec : /usr/bin/ppmhist -noheader
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20327] dbg: FuzzyOcr: Stdout:
>/tmp/.spamassassin20323q4X4mLtmp/ppmhist.info
[20327] dbg: FuzzyOcr: Stderr: >/dev/null
[20323] dbg: FuzzyOcr: Elapsed [20327]: 0.009170 sec. (/usr/bin/ppmhist:
exit 0)
[20323] dbg: FuzzyOcr: Got:
<19886:46:144:208::255:255:255:255:3678::0:0:0:0:436::254:254:254:254:152::253:253:253:253:141::252:252:252:252:134::251:251:251:251:123>
[20323] info: FuzzyOcr: Scanset Order: ocrad(0) ocrad-invert(0)
ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
[20323] dbg: FuzzyOcr: Saved pid: 20328
[20328] dbg: FuzzyOcr: Exec : /usr/local/bin/ocrad -s5
/tmp/.spamassassin20323q4X4mLtmp/testmail.jpg.pnm
[20328] dbg: FuzzyOcr: Stdout:
>/tmp/.spamassassin20323q4X4mLtmp/scanset.ocrad.out
[20328] dbg: FuzzyOcr: Stderr:
>/tmp/.spamassassin20323q4X4mLtmp/scanset.ocrad.err
[20323] dbg: FuzzyOcr: Elapsed [20328]: 0.018049 sec. (/usr/local/bin/ocrad:
exit 0)
[20323] dbg: FuzzyOcr: ocrdata=>>viagra
[20323] dbg: FuzzyOcr:
[20323] dbg: FuzzyOcr: <<=end
[20323] info: FuzzyOcr: Scanset "ocrad" found word "viagra" with fuzz of
0.0000
[20323] info: FuzzyOcr: line: "viagra"
[20323] dbg: FuzzyOcr: Not enough OCR Hits without space stripping, doing
second matching pass...
[20323] info: FuzzyOcr: Scanset "ocrad" found word "viagra" with fuzz of
0.0000
[20323] info: FuzzyOcr: line: "viagra"
[20323] info: FuzzyOcr: Scanset "ocrad" generates enough hits (2), skipping
further scansets...
[20323] info: FuzzyOcr: Message is spam, score = 5.000
[20323] info: FuzzyOcr: Adding Hash to "/etc/mail/spamassassin/FuzzyOcr.db"
with score "5.000"
[20323] dbg: FuzzyOcr: Digest:
19886:46:144:208::255:255:255:255:3678::0:0:0:0:436::254:254:254:254:152::253:253:253:253:141::252:252:252:252:134::251:251:251:251:123
[20323] info: FuzzyOcr: Words found:
[20323] info: FuzzyOcr: "viagra" in 1 lines
[20323] info: FuzzyOcr: "viagra" in 1 lines
[20323] info: FuzzyOcr: (2 word occurrences found)
[20323] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin20323q4X4mLtmp
[20323] dbg: FuzzyOcr: FuzzyOcr ending successfully...
[20323] dbg: FuzzyOcr: Processed in 0.075586 sec.
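The option values in the debug output above are consistent with a simple reading of how the plugin arrives at "score = 5.000". The Python sketch below is only an assumption based on the option names (focr_counts_required, focr_base_score, focr_add_score, focr_autodisable_negative_score, focr_autodisable_score) and on this one log; it is not FuzzyOcr's actual code, and the function names `should_run_ocr` and `fuzzyocr_score` are made up for illustration.

```python
# Illustrative sketch only: both rules are assumptions inferred from the
# option names and values in the debug output, not taken from FuzzyOcr itself.

def should_run_ocr(current_score, autodisable_negative=-5.0, autodisable=1000.0):
    """Assumed gate: skip OCR when the message score is already very low or very high."""
    return autodisable_negative <= current_score <= autodisable


def fuzzyocr_score(hits, counts_required=2, base_score=5.0, add_score=1.0):
    """Assumed scoring: nothing below counts_required hits, then base_score
    plus add_score for every additional hit."""
    if hits < counts_required:
        return 0.0
    return base_score + (hits - counts_required) * add_score


if __name__ == "__main__":
    print(fuzzyocr_score(2))       # two word hits, as in the log above
    print(should_run_ocr(0.6))     # a message carrying only NO_REAL_NAME
    print(should_run_ocr(-101.3))  # score reported in the delivered mail's header
```

Under these assumptions two hits give 5.0, which matches the log, and a message that already stands at -101.3 would be skipped before any OCR is attempted.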
>From dienstkonto at uni-leipzig.de Tue May 1 16:59:08 2007
Received: from localhost by 86-56-25-43
with SpamAssassin (version 3.1.8);
Tue, 01 May 2007 17:15:19 +0200
From: <dienstkonto at uni-leipzig.de>
To: <ich at deiszner.de>
Subject: E-Mail schreiben an: testmail.jpg
Date: Tue, 1 May 2007 16:58:42 +0200
Message-Id: <002b01c78c01$43811f20$2601a8c0 at SKHWDNOTEBOOK>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on 86-56-25-43
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.5 required=5.0 tests=FUZZY_OCR,NO_REAL_NAME
autolearn=no version=3.1.8
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_46375987.485D3BC8"
This is a multi-part message in MIME format.
------------=_46375987.485D3BC8
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Software zur Erkennung von "Spam" auf dem Rechner
cdu-anhalt-bitterfeld.de
hat die eingegangene E-mail als mögliche "Spam"-Nachricht identifiziert.
Die ursprüngliche Nachricht wurde an diesen Bericht angehängt, so dass
Sie sie anschauen können (falls es doch eine legitime E-Mail ist) oder
ähnliche unerwünschte Nachrichten in Zukunft markieren können.
Bei Fragen zu diesem Vorgang wenden Sie sich bitte an
ich at cdu-anhalt-bitterfeld.de
Vorschau: Die Nachricht kann jetzt mit folgender Datei oder Link als
Anlage gesendet werden: testmail.jpg Hinweis: E-Mail-Programme können das
Senden oder Empfangen von bestimmten Dateitypen als Anlagen aufgrund von
Computerviren verhindern. Überprüfen Sie die
E-Mail-Sicherheitseinstellungen,
um zu ermitteln, wie Anlagen gehandhabt werden. [...]
Inhaltsanalyse im Detail: (5.5 Punkte, 5.0 benötigt)
Pkte Regelname Beschreibung
---- ---------------------- --------------------------------------------------
0.6 NO_REAL_NAME Kein vollständiger Name in Absendeadresse
5.0 FUZZY_OCR BODY: Mail contains an image with common spam
text inside
Words found:
"viagra" in 1 lines
"viagra" in 1
lines
(2 word occurrences found)
Die ursprüngliche Nachricht enthielt nicht ausschließlich Klartext
(plain text) und kann eventuell eine Gefahr für einige E-Mail-Programme
darstellen (falls sie z.B. einen Computervirus enthält).
Möchten Sie die Nachricht dennoch ansehen, ist es wahrscheinlich
sicherer, sie zuerst in einer Datei zu speichern und diese Datei danach
mit einem Texteditor zu öffnen.
------------=_46375987.485D3BC8
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin
Content-Disposition: attachment
Content-Transfer-Encoding: 8bit
Return-Path: <dienstkonto at uni-leipzig.de>
X-Spam-Flag: NO
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on 86-56-25-43
X-Spam-Level:
X-Spam-Status: No, hits=-101.3 required=4.0 tests=AWL,BAYES_00,NO_REAL_NAME,
USER_IN_WHITELIST autolearn=no version=3.1.8
X-Original-To: ich at deiszner.de
Delivered-To: ich at localhost
Received: from v1.rz.uni-leipzig.de (v1.rz.uni-leipzig.de [139.18.1.26])
by cdu-anhalt-bitterfeld.de (Postfix) with ESMTP id ED1FB4A00FA
for <ich at deiszner.de>; Tue, 1 May 2007 16:59:06 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at v1-ul
Received: from v1.rz.uni-leipzig.de ([127.0.0.1])
by localhost (v1.rz.uni-leipzig.de [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id 41Fg1iwRrjLN for <ich at deiszner.de>;
Tue, 1 May 2007 16:59:06 +0200 (CEST)
Received: from server1.rz.uni-leipzig.de (server1.rz.uni-leipzig.de
[139.18.1.1])
by v1.rz.uni-leipzig.de (Postfix) with ESMTP id 3D608126F4
for <ich at deiszner.de>; Tue, 1 May 2007 16:59:06 +0200 (CEST)
Received: from SKHWDNOTEBOOK (110.75.203.62.cust.bluewin.ch [62.203.75.110])
by server1.rz.uni-leipzig.de (Postfix) with ESMTP id 52B2421
for <ich at deiszner.de>; Tue, 1 May 2007 16:59:04 +0200 (METDST)
Message-ID: <002b01c78c01$43811f20$2601a8c0 at SKHWDNOTEBOOK>
From: <dienstkonto at uni-leipzig.de>
To: <ich at deiszner.de>
Subject: E-Mail schreiben an: testmail.jpg
Date: Tue, 1 May 2007 16:58:42 +0200
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0009_01C78C11.F95B9590"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3028
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
X-Virus-Status: No
X-Virus-Checker-Version: clamassassin 1.2.3 with clamscan / ClamAV
0.90.2/3188/Tue May 1 12:24:57 2007 signatures .
This is a multi-part message in MIME format.
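Comparing the two X-Spam-Status headers in the output above is one way to narrow the question down. The Python sketch below parses both header values as quoted in this post and lists the rules that fired in only one of the runs. The helper names `parse_spam_status` and `compare_runs` are hypothetical; the header strings are copied from the output above.

```python
import re


def parse_spam_status(header):
    """Split an X-Spam-Status value into verdict, score, threshold and rule names."""
    # Unfold continuation lines and drop whitespace after commas.
    text = re.sub(r",\s+", ",", " ".join(header.split()))
    verdict = text.split(",", 1)[0]
    # The two headers in this post use different key names: score= and hits=.
    score = float(re.search(r"(?:score|hits)=(-?\d+(?:\.\d+)?)", text).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", text).group(1))
    tests = set(re.search(r"tests=(\S*)", text).group(1).split(","))
    return {"verdict": verdict, "score": score, "required": required, "tests": tests}


def compare_runs(first, second):
    """Report rules that fired in only one run, plus both thresholds."""
    return {
        "only_first": sorted(first["tests"] - second["tests"]),
        "only_second": sorted(second["tests"] - first["tests"]),
        "required": (first["required"], second["required"]),
    }


# Header of the report produced by the shell run.
SHELL_RUN = """Yes, score=5.5 required=5.0 tests=FUZZY_OCR,NO_REAL_NAME
 autolearn=no version=3.1.8"""

# Header already present in the delivered ("real") message.
DELIVERED = """No, hits=-101.3 required=4.0 tests=AWL,BAYES_00,NO_REAL_NAME,
 USER_IN_WHITELIST autolearn=no version=3.1.8"""

if __name__ == "__main__":
    result = compare_runs(parse_spam_status(SHELL_RUN), parse_spam_status(DELIVERED))
    for key, value in result.items():
        print(key, value)
```

For the two headers above this shows FUZZY_OCR only in the shell run, and AWL, BAYES_00 and USER_IN_WHITELIST only in the delivered mail, with required=5.0 versus required=4.0. That would be consistent with the two runs using different settings (for example different user preferences, including a whitelist entry), though the post itself does not confirm this.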