Project

General

Profile

Actions

Demande #3021

closed

Demande #2979: Améliorer la déliverabilité de nos courriels

Traiter les erreurs 5xx sur les emails

Added by François Poulain over 6 years ago. Updated over 3 years ago.

Status:
Fermé
Priority:
Normale
Category:
-
Target version:
Start date:
03/14/2018
Due date:
% Done:

0%

Estimated time:
Difficulté:
2 Facile

Description

Actuellement on monitore les files d'email avec mailq et qshape. Mais si un serveur refuse notre connexion c'est un comportement transparent au monitoring.

Il faut grepper les logs pour voir apparaître nos relations avec les serveurs vers lesquels on expédie, et remonter une alerte si le taux d'erreur est trop élevé.


Files

check_qshape.py (6.34 KB) check_qshape.py François Poulain, 09/06/2019 02:09 PM
check_bounces.py (7.11 KB) check_bounces.py François Poulain, 09/06/2019 03:28 PM

Related issues 1 (0 open1 closed)

Related to Admins - Demande #3187: Mieux comprendre les erreurs emailsFerméFrançois Poulain06/01/2018

Actions
Actions #1

Updated by François Poulain over 6 years ago

Ceci donne un premier indice :

grep status /var/log/mail.log.1 | grep -v 'relay=\(127.0.0.1\|local\|172.16.0.[0-9]\+\)' | grep -o 'status=[a-z]\+' | sort | uniq -c

Sur mail:

# grep status /var/log/mail.log.1 | grep -v 'relay=\(127.0.0.1\|local\|172.16.0.[0-9]\+\)' | grep -o 'status=[a-z]\+' | sort | uniq -c
   3258 status=bounced
  41088 status=deferred
    227 status=expired
  61687 status=sent

Sur un petit srv perso :

# grep status /var/log/mail.log.1 | grep -v 'relay=\(127.0.0.1\|local\|172.16.0.[0-9]\+\)' | grep -o 'status=[a-z]\+' | sort | uniq -c
      6 status=bounced
     46 status=deferred
   2319 status=sent

Actions #2

Updated by François Poulain over 6 years ago

En remontant dans le temps pour voir ce qui se passe :

======== /var/log/mail.log.52.gz ========
   1254 status=bounced
  20800 status=deferred
     70 status=expired
  44041 status=sent
======== /var/log/mail.log.51.gz ========
    806 status=bounced
  32369 status=deferred
    149 status=expired
  43568 status=sent
======== /var/log/mail.log.50.gz ========
    577 status=bounced
  15528 status=deferred
     46 status=expired
  17542 status=sent
======== /var/log/mail.log.49.gz ========
    959 status=bounced
  24425 status=deferred
    108 status=expired
  63238 status=sent
======== /var/log/mail.log.48.gz ========
    728 status=bounced
  20431 status=deferred
     78 status=expired
  17796 status=sent
======== /var/log/mail.log.47.gz ========
   1399 status=bounced
  41192 status=deferred
    111 status=expired
  35967 status=sent
======== /var/log/mail.log.46.gz ========
    746 status=bounced
  18989 status=deferred
    164 status=expired
  68464 status=sent
======== /var/log/mail.log.45.gz ========
    685 status=bounced
  16917 status=deferred
     87 status=expired
  69933 status=sent
======== /var/log/mail.log.44.gz ========
    724 status=bounced
  62526 status=deferred
     78 status=expired
  51419 status=sent
======== /var/log/mail.log.43.gz ========
   1535 status=bounced
 128312 status=deferred
    252 status=expired
  71554 status=sent
======== /var/log/mail.log.42.gz ========
    954 status=bounced
  46654 status=deferred
    207 status=expired
  40905 status=sent
======== /var/log/mail.log.41.gz ========
   1864 status=bounced
  63824 status=deferred
     96 status=expired
  96238 status=sent
======== /var/log/mail.log.40.gz ========
    441 status=bounced
  21313 status=deferred
     19 status=expired
  66549 status=sent
======== /var/log/mail.log.39.gz ========
    441 status=bounced
  11190 status=deferred
     32 status=expired
  35736 status=sent
======== /var/log/mail.log.38.gz ========
    545 status=bounced
  32119 status=deferred
    123 status=expired
  54105 status=sent
======== /var/log/mail.log.37.gz ========
    605 status=bounced
  23014 status=deferred
     97 status=expired
  47575 status=sent
======== /var/log/mail.log.36.gz ========
    426 status=bounced
  13960 status=deferred
     60 status=expired
  42070 status=sent
======== /var/log/mail.log.35.gz ========
   1270 status=bounced
  27563 status=deferred
     56 status=expired
  29064 status=sent
======== /var/log/mail.log.34.gz ========
    949 status=bounced
  40742 status=deferred
    275 status=expired
  43303 status=sent
======== /var/log/mail.log.33.gz ========
    552 status=bounced
  10730 status=deferred
     76 status=expired
  16007 status=sent
======== /var/log/mail.log.32.gz ========
    424 status=bounced
  12929 status=deferred
     21 status=expired
  21529 status=sent
======== /var/log/mail.log.31.gz ========
    666 status=bounced
  11704 status=deferred
     66 status=expired
 100985 status=sent
======== /var/log/mail.log.30.gz ========
   1327 status=bounced
  39379 status=deferred
     41 status=expired
  50981 status=sent
======== /var/log/mail.log.29.gz ========
    604 status=bounced
  49388 status=deferred
    211 status=expired
  57371 status=sent
======== /var/log/mail.log.28.gz ========
    696 status=bounced
  32752 status=deferred
     75 status=expired
  26038 status=sent
======== /var/log/mail.log.27.gz ========
    448 status=bounced
  20059 status=deferred
     83 status=expired
  24808 status=sent
======== /var/log/mail.log.26.gz ========
    501 status=bounced
  33729 status=deferred
     77 status=expired
  12933 status=sent
======== /var/log/mail.log.25.gz ========
   1879 status=bounced
  57966 status=deferred
    229 status=expired
  70861 status=sent
======== /var/log/mail.log.24.gz ========
    722 status=bounced
  21268 status=deferred
     64 status=expired
  35044 status=sent
======== /var/log/mail.log.23.gz ========
      6 status=deferred
      1 status=sent
======== /var/log/mail.log.22.gz ========
     83 status=bounced
   5780 status=deferred
     64 status=expired
   6284 status=sent
======== /var/log/mail.log.21.gz ========
    738 status=bounced
  21598 status=deferred
     91 status=expired
  46008 status=sent
======== /var/log/mail.log.20.gz ========
    575 status=bounced
  16405 status=deferred
    117 status=expired
  37422 status=sent
======== /var/log/mail.log.19.gz ========
   1416 status=bounced
  40617 status=deferred
     86 status=expired
  31145 status=sent
======== /var/log/mail.log.18.gz ========
    496 status=bounced
  17452 status=deferred
    229 status=expired
  37518 status=sent
======== /var/log/mail.log.17.gz ========
   1080 status=bounced
  15386 status=deferred
     62 status=expired
  49990 status=sent
======== /var/log/mail.log.16.gz ========
    906 status=bounced
  17304 status=deferred
     82 status=expired
  34790 status=sent
======== /var/log/mail.log.15.gz ========
   1448 status=bounced
  30976 status=deferred
     94 status=expired
  38957 status=sent
======== /var/log/mail.log.14.gz ========
    607 status=bounced
  14832 status=deferred
    172 status=expired
  28377 status=sent
======== /var/log/mail.log.13.gz ========
    910 status=bounced
  19809 status=deferred
     49 status=expired
  47211 status=sent
======== /var/log/mail.log.12.gz ========
    973 status=bounced
  23172 status=deferred
    110 status=expired
  33582 status=sent
======== /var/log/mail.log.11.gz ========
      9 status=bounced
    442 status=deferred
      3 status=expired
    210 status=sent
======== /var/log/mail.log.10.gz ========
   2280 status=bounced
  28941 status=deferred
     43 status=expired
  38283 status=sent
======== /var/log/mail.log.9.gz ========
   1183 status=bounced
  21788 status=deferred
    255 status=expired
  38946 status=sent
======== /var/log/mail.log.8.gz ========
   1257 status=bounced
  33082 status=deferred
    173 status=expired
  27051 status=sent
======== /var/log/mail.log.7.gz ========
    946 status=bounced
  22904 status=deferred
    123 status=expired
  29684 status=sent
======== /var/log/mail.log.6.gz ========
   2121 status=bounced
  45794 status=deferred
     95 status=expired
  56044 status=sent
======== /var/log/mail.log.5.gz ========
   1346 status=bounced
  30627 status=deferred
    341 status=expired
  66218 status=sent
======== /var/log/mail.log.4.gz ========
   1738 status=bounced
  27895 status=deferred
    115 status=expired
  51034 status=sent
======== /var/log/mail.log.3.gz ========
    450 status=bounced
  13314 status=deferred
    128 status=expired
  28299 status=sent
======== /var/log/mail.log.2.gz ========
   2697 status=bounced
  25577 status=deferred
     88 status=expired
  78895 status=sent

Actions #3

Updated by François Poulain over 6 years ago

La commande :

for n in $(seq 52 -1 2); do echo ======== /var/log/mail.log.$n.gz ========; zgrep status /var/log/mail.log.$n.gz | grep -v 'relay=\(127.0.0.1\|local\|172.16.0.[0-9]\+\)' | grep -o 'status=[a-z]\+' | sort | uniq -c; done

Actions #4

Updated by Quentin Gibeaux over 6 years ago

  • Target version changed from Mars 2018 to Avril 2018
Actions #5

Updated by Quentin Gibeaux over 6 years ago

  • Assignee changed from François Poulain to Edouard Dausque

Mettre en place une telle sonde.
Pour info : mails de fred :

Salut,

il pourrait être utile d'avoir une sonde pour surveiller la mail queue
et les refus de courriels de la part des fournisseurs qui nous prennent
pour des spammeurs.

En cherchant rapidement j'ai trouvé
https://blog.christosoft.de/2014/08/icinga-monitor-refused-mails-postfix-mailqueue

Le script utilise les messages de la commande mailq. Pour mon propre
besoin (laposte.net qui des fois rejette des courriels sur le serveur
co-géré perso avec Benj) j'ai ajouté "service refused, please try
later".

Je ne suis pas sûr à 100% que le script fonctionne car pour le moment
sur la VM mail la commande mailq n'affiche pas de messages refusés
(pareil sur le serveur co-géré). Ou alors il y en a mais mailq n'affiche
pas le texte de rejet.

Lors du prochain mailing (lettres d'info ou mailing gDTC) on en aura forcément.

PS :

Dans le script, il y a une erreur sur la page :

remplacer

mails=`mailq | grep -oP "(refused to talk to me(?!(.*out of connection slots)))|(unsolicited mail originating from your IP)|(temporarily deferred due to user complai$

par

mails=`mailq | grep -oP "(refused to talk to me(?!(.*out of connection slots)))|(unsolicited mail originating from your IP)|(temporarily deferred due to user complaints)"` 

En fait, la ligne correcte serait plutôt (il manquait notamment le wc -l):

mails=`mailq | grep -oP "(refused to talk to me(?!(.*out of connection slots)))|(unsolicited mail originating from your IP)|(temporarily deferred due to user complaints)|(.*service refused, please try later.*)"|wc -l`"`

Fred.
Actions #6

Updated by François Poulain over 6 years ago

La situation ces 4 derniers jours:

# zgrep status /var/log/mail.log| grep -v 'relay=\(127.0.0.1\|local\|172.16.0.[0-9]\+\)' | grep -o 'status=[a-z]\+' | sort | uniq -c
   2262 status=bounced
 274928 status=deferred
   3430 status=expired
  25209 status=sent

Actions #7

Updated by Quentin Gibeaux over 6 years ago

  • Target version changed from Avril 2018 to Mai 2018
Actions #8

Updated by François Poulain over 6 years ago

Sinon on continue toujours d'avoir pas mal de bounces et deferred. Je pense qu'il faudrait en plus nettoyer notre dtc et notre sympa des emails à l'abandon (ils sont nombreux). Normalement sympa fait ça tout seul mais c'est sûrement mieux de vérifier.

Actions #9

Updated by Quentin Gibeaux over 6 years ago

  • Target version changed from Mai 2018 to Juin 2018
Actions #10

Updated by François Poulain over 6 years ago

La situation ces derniers jours :

(April) root@mail:~# for n in $(seq 10 -1 2); do echo ======== /var/log/mail.log.$n.gz ========; zgrep status /var/log/mail.log.$n.gz | grep -v 'relay=\(127.0.0.1\|local\|172.16.0.[0-9]\+\)' | grep -o 'status=[a-z]\+' | sort | uniq -c; done
======== /var/log/mail.log.10.gz ========
    827 status=bounced
  18507 status=deferred
    144 status=expired
  26017 status=sent
======== /var/log/mail.log.9.gz ========
    488 status=bounced
  22121 status=deferred
    100 status=expired
  28357 status=sent
======== /var/log/mail.log.8.gz ========
   2994 status=bounced
 624187 status=deferred
    127 status=expired
  56417 status=sent
======== /var/log/mail.log.7.gz ========
   2664 status=bounced
 298052 status=deferred
   3542 status=expired
  61501 status=sent
======== /var/log/mail.log.6.gz ========
    603 status=bounced
  32835 status=deferred
    147 status=expired
  50248 status=sent
======== /var/log/mail.log.5.gz ========
    958 status=bounced
  35820 status=deferred
    182 status=expired
  35727 status=sent
======== /var/log/mail.log.4.gz ========
   1858 status=bounced
  61413 status=deferred
     64 status=expired
  72616 status=sent
======== /var/log/mail.log.3.gz ========
    420 status=bounced
  19622 status=deferred
    249 status=expired
  33156 status=sent
======== /var/log/mail.log.2.gz ========
    943 status=bounced
  25045 status=deferred
    130 status=expired
  45488 status=sent
(April) root@mail:~# zgrep status /var/log/mail.log.1 | grep -v 'relay=\(127.0.0.1\|local\|172.16.0.[0-9]\+\)' | grep -o 'status=[a-z]\+' | sort | uniq -c
    683 status=bounced
  20577 status=deferred
     69 status=expired
  24545 status=sent

On continue a avoir beaucoup de deferred. Il faut travailler à identifier où et pourquoi.

Actions #11

Updated by François Poulain over 6 years ago

J'ai créé la tache dédiée #3187.

Actions #12

Updated by François Poulain over 6 years ago

  • Related to Demande #3187: Mieux comprendre les erreurs emails added
Actions #13

Updated by François Poulain over 6 years ago

  • Related to Demande #3187: Mieux comprendre les erreurs emails added
Actions #14

Updated by François Poulain over 6 years ago

  • Related to deleted (Demande #3187: Mieux comprendre les erreurs emails)
Actions #15

Updated by Quentin Gibeaux over 6 years ago

  • Target version changed from Juin 2018 to Été 2018
Actions #16

Updated by Edouard Dausque about 6 years ago

  • Subject changed from Monitorer les erreurs 5xx sur les emails to Traiter les erreurs 5xx sur les emails

pendant la revue de sprint de début septembre, nous évoquions le fait qu'il serait pertinent d'étudier ce qui est actuellement en place (mails envoyés par sympa, par dtc, par autre ?) dans le cas d'un bounce.
Où ces messages arrivent, comment/quand sont-ils traiter ?

à étudier/documenter

Actions #17

Updated by Quentin Gibeaux about 6 years ago

  • Target version changed from Été 2018 to Septembre 2018
Actions #18

Updated by Quentin Gibeaux about 6 years ago

  • Target version changed from Septembre 2018 to Octobre 2018
Actions #19

Updated by Quentin Gibeaux about 6 years ago

  • Assignee deleted (Edouard Dausque)
Actions #20

Updated by Quentin Gibeaux about 6 years ago

  • Target version changed from Octobre 2018 to Backlog
Actions #21

Updated by Quentin Gibeaux almost 6 years ago

  • Assignee set to François Poulain
  • Target version changed from Backlog to Décembre 2018
Actions #22

Updated by François Poulain almost 6 years ago

J'ai commencé à coder une sonde check_bounces.py.

Actions #23

Updated by Quentin Gibeaux almost 6 years ago

  • Target version changed from Décembre 2018 to Janvier 2019
Actions #24

Updated by Quentin Gibeaux almost 6 years ago

  • Target version changed from Janvier 2019 to Février 2019
Actions #25

Updated by Quentin Gibeaux almost 6 years ago

  • Target version changed from Février 2019 to Mars 2019
Actions #26

Updated by Quentin Gibeaux over 5 years ago

  • Target version changed from Mars 2019 to Avril 2019
Actions #27

Updated by Quentin Gibeaux over 5 years ago

  • Target version changed from Avril 2019 to Mai 2019
Actions #28

Updated by Quentin Gibeaux over 5 years ago

  • Target version changed from Mai 2019 to Juin 2019
Actions #29

Updated by Quentin Gibeaux over 5 years ago

  • Target version changed from Juin 2019 to Été 2019
Actions #30

Updated by Quentin Gibeaux about 5 years ago

  • Target version changed from Été 2019 to Septembre 2019
Actions #31

Updated by Loïc Dachary about 5 years ago

  • Assignee changed from François Poulain to Loïc Dachary
Actions #32

Updated by François Poulain about 5 years ago

François Poulain a écrit :

J'ai commencé à coder une sonde check_bounces.py.

Le script WIP en PJ.

Actions #33

Updated by Loïc Dachary about 5 years ago

Pour info qshape est un script de moins de 400 lignes qui n'a pas bougé depuis trois ans.

Actions #34

Updated by François Poulain about 5 years ago

François Poulain a écrit :

François Poulain a écrit :

J'ai commencé à coder une sonde check_bounces.py.

Le script WIP en PJ.

Le bon script.

Actions #35

Updated by Loïc Dachary about 5 years ago

TL;DR: le taux de bounce est bon, i.e. inférieur à 2%

$ cd /var/log ; pflogsumm --detail 5 mail.log mail.log.1
Grand Totals
------------
messages

  60832   received
 172374   delivered
  27732   forwarded
   1374   deferred  (65998  deferrals)
   3003   bounced
  24849   rejected (12%)
      0   reject warnings
      0   held
      0   discarded (0%)

Donc un bounce rate de 3003/172374 ~= 1.7% et en cherchant vaguement sur le net:

The benchmark for bounces is less than 2%. Anything above a 2% bounce rate for your email campaign is worthy of your attention. If you’re seeing bounce rates over 5%, or even as high as 10% or greater, this suggests a significant problem that you will want to resolve. Making Sense of Email Bounce Rates

As per our Sending Policy, your bounce rate should be below 8%. What is an acceptable bounce rate?

Et il y a un nombre incalculable de pages qui disent que le bounce rate doit être bas sans donner de nombre exact, ce qui n'est pas super utile.

Actions #36

Updated by Loïc Dachary about 5 years ago

Pour analyser les erreurs dtc, on pourrait analyser les logs dtc. Si ce n'est pas possible, comment on fait pour savoir qu'un courriel vient de dtc en regardant les logs postfix sur le relai ?

Actions #37

Updated by Quentin Gibeaux about 5 years ago

  • Target version changed from Septembre 2019 to Octobre 2019
Actions #38

Updated by François Poulain about 5 years ago

Pour analyser les erreurs dtc, on pourrait analyser les logs dtc.

Fyi, gDTC stock les bounce en db pour traitement manuel.

Si ce n'est pas possible, comment on fait pour savoir qu'un courriel vient de dtc en regardant les logs postfix sur le relai ?

fyi, client=dtc.cluster.april.org[172.16.0.6] ou message-id=<gdtc-out-xxxxxxx@april.org> ou from=<adherents-out@april.org> te renseigneront.

Actions #39

Updated by Quentin Gibeaux about 5 years ago

  • Target version changed from Octobre 2019 to Backlog
Actions #40

Updated by Loïc Dachary about 5 years ago

  • Assignee deleted (Loïc Dachary)
Actions #41

Updated by François Poulain over 3 years ago

  • Status changed from Nouveau to Résolu

Le taux de bounce reste identique à celui relevé par Loic à l'époque. Après les envois de convoc + vote en ligne :

Grand Totals
------------
messages

  77117   received
 171407   delivered
  47838   forwarded
   4364   deferred  (89652  deferrals)
   4029   bounced
  10927   rejected (5%)
      0   reject warnings
      0   held
      0   discarded (0%)

Pour moi la situation est actuellement saine. On peut fermer.

Actions #42

Updated by Christian P. Momon over 3 years ago

  • Assignee set to François Poulain
Actions #43

Updated by Quentin Gibeaux over 3 years ago

  • Status changed from Résolu to Fermé
Actions

Also available in: Atom PDF