[00:17:52] Operations, Commons, Datasets-General-or-Unknown, Dumps-Generation, Community-Wishlist-Survey-2016: Back up of Commons files - https://phabricator.wikimedia.org/T160229#3092719 (Zppix) I'd be willing to help with backing it up
[00:21:38] PROBLEM - puppet last run on analytics1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:32:02] (PS7) Zppix: service: Send uwsgi logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: Ladsgroup)
[00:51:38] RECOVERY - puppet last run on analytics1044 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[00:55:58] PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:23:58] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[01:51:48] PROBLEM - puppet last run on labvirt1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:08:58] PROBLEM - puppet last run on db1085 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:20:06] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.15) (duration: 07m 25s)
[02:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:21:49] RECOVERY - puppet last run on labvirt1005 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[02:25:38] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar 12 02:25:37 UTC 2017 (duration 5m 32s)
[02:25:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:35:58] RECOVERY - puppet last run on db1085 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[02:40:58] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:08:59] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[03:24:18] PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:30:58] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:33:08] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:45:48] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:53:18] RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[03:58:58] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[04:01:08] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[04:08:28] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1441.60 Read Requests/Sec=2756.00 Write Requests/Sec=7.30 KBytes Read/Sec=36111.60 KBytes_Written/Sec=2387.20
[04:08:48] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100%
[04:09:18] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.19 ms
[04:13:48] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[04:17:28] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=69.80 Read Requests/Sec=0.10 Write Requests/Sec=3.00 KBytes Read/Sec=0.40 KBytes_Written/Sec=85.20
[05:20:48] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:48:48] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:29:58] PROBLEM - puppet last run on rdb1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:56:58] RECOVERY - puppet last run on rdb1005 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[07:19:18] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:23:58] PROBLEM - puppet last run on mc1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:39:48] PROBLEM - puppet last run on dbproxy1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:48:18] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:51:58] RECOVERY - puppet last run on mc1026 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[07:58:21] mw2256 again DOWN/UP, there must be something weird ongoing
[08:06:39] 08:05:27 up 3:56, 1 user, load average: 0.05, 0.05, 0.01
[08:07:48] RECOVERY - puppet last run on dbproxy1007 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[08:11:00] Operations, ops-codfw, Patch-For-Review, User-Elukey: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#3093730 (elukey) Happened again today: ``` 04:08 PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100% 04:09 RECOVERY - Host mw2...
[09:01:58] PROBLEM - puppet last run on notebook1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:13:18] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:30:58] RECOVERY - puppet last run on notebook1002 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[09:42:18] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[09:51:58] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:10:12] <_joe_> elukey: as I said, I suspect a double assignment of an IP address
[10:15:58] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:20:58] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[10:43:58] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[10:51:16] _joe_: yeah I'll try to re-check again, but didn't find evidence of the double IP assignment..
[10:51:27] (maybe I haven't looked in the right place :)
[10:51:46] I added a note in the task to investigate though
[11:56:08] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:13:58] PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:24:08] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[12:41:58] RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[12:43:08] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:00:14] Hi, I'm wondering, should I create a task on the basis that Yahoo is slowing mail coming from Wikimedia servers?
[13:10:08] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[13:15:36] paladox: sounds like something to report to yahoo rather than to wmf?
[13:15:50] I did report it to them
[13:16:05] But sending an email using a command on my mac works straight away
[13:18:59] ....so?
[13:22:23] That would indicate a Wikimedia problem. If it works for me, then Wikimedia's mail servers are not doing something Yahoo's servers like
[13:26:15] as you know, given you've commented on https://phabricator.wikimedia.org/T58414 it's an ongoing issue with Yahoo being shitty about e-mail in general.
[13:27:11] the best advice is to move your e-mail to a better service such as gMail, forward your Yahoo to gMail initially and change over all your accounts using Yahoo as you go.
[13:46:54] I use btemail so I doubt that one will move away from Yahoo anytime soon, as their migration is on hold.
[13:47:33] btemail as in I have a private one, then the one I use for Yahoo. But btemail is using a mix of BT email servers and Yahoo.
[13:47:45] Also I like the 1 TB Yahoo gives me.
[13:48:58] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:16:58] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[14:39:28] PROBLEM - Juniper alarms on mr1-eqiad is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 208.80.154.199
[14:40:28] RECOVERY - Juniper alarms on mr1-eqiad is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[14:44:58] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:08:58] PROBLEM - puppet last run on conf1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:13:58] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[15:36:58] RECOVERY - puppet last run on conf1002 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[16:19:49] (CR) Volans: [C: -1] "I did review only the python script, see a couple of inline comments in it. From the look of the puppet code it seems that it's not follow" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/341005 (https://phabricator.wikimedia.org/T148541) (owner: Filippo Giunchedi)
[16:24:56] (CR) Volans: [C: 1] "LGTM, but the use of an officially registered bot would be advisable." [puppet] - https://gerrit.wikimedia.org/r/342222 (owner: Muehlenhoff)
[16:45:08] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:14:08] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[19:09:58] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100%
[19:10:28] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.15 ms
[21:21:52] (CR) Hashar: "Gerrit does not send mails directly but uses a smart relay:" [puppet] - https://gerrit.wikimedia.org/r/342313 (owner: Paladox)
[21:29:38] (CR) Paladox: "I did create this T159960 task and my email header is https://phabricator.wikimedia.org/P5028" [puppet] - https://gerrit.wikimedia.org/r/342313 (owner: Paladox)
[21:40:20] Operations, Gerrit, Mail, Release-Engineering-Team: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084643 (valhallasw) From P5028 : ``` Received: from 127.0.0.1 (EHLO mx1001.wikimedia.org) (208.80.154.76) by mta1470.mail.gq1.yahoo.com with SMTPS;...
[22:02:19] Operations, Gerrit, Mail, Release-Engineering-Team: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3094454 (Paladox) Filed here https://forums.yahoo.net/t5/Sending-and-receiving/Receiving-emails-from-some-of-wikipedia-s-domains-is-taking/m-p/203788/hi...
[22:04:04] Operations, Gerrit, Mail, Release-Engineering-Team: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3094456 (Paladox) Other users have reported it here https://forums.yahoo.net/t5/Sending-and-receiving/Delays-in-receiving-emails/m-p/205982/highlight/fal...
[23:13:58] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.012 second response time
[23:15:58] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.029 second response time
[23:23:18] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:52:18] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[23:54:39] PROBLEM - puppet last run on prometheus2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues