[00:00:06] (CR) jerkins-bot: [V: -1] planet: convert to profile/role-structure [puppet] - https://gerrit.wikimedia.org/r/342163 (owner: Dzahn)
[00:01:08] RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 0.50 seconds
[00:06:55] (PS4) Dzahn: planet: convert to profile/role-structure [puppet] - https://gerrit.wikimedia.org/r/342163
[00:07:34] !log going to deploy updater patch on wdq1003. The host is in maintenance, not a production deployment.
[00:07:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:08:20] !log smalyshev@tin Started deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix
[00:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:08:36] !log smalyshev@tin Finished deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix (duration: 00m 16s)
[00:08:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:10] (PS5) Dzahn: planet: convert to profile/role-structure [puppet] - https://gerrit.wikimedia.org/r/342163
[00:16:49] !log smalyshev@tin Started deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix
[00:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:17:36] (CR) Dzahn: [C: 1] "now no-diff: http://puppet-compiler.wmflabs.org/5751/" [puppet] - https://gerrit.wikimedia.org/r/342163 (owner: Dzahn)
[00:19:05] !log smalyshev@tin Finished deploy [wdqs/wdqs@UNKNOWN]: Deploy new updater on 1003 for potential connection drop fix (duration: 02m 15s)
[00:19:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:49:09] PROBLEM - puppet last run on labvirt1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:02:18] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:04:18] RECOVERY - puppet last run on mc1010 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[01:17:09] RECOVERY - puppet last run on labvirt1009 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[01:23:18] PROBLEM - puppet last run on mc1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:36:58] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[01:51:18] RECOVERY - puppet last run on mc1026 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[02:16:55] !ops
[02:22:17] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.15) (duration: 07m 41s)
[02:22:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:23:08] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100%
[02:23:38] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.04 ms
[02:55:34] (PS1) 20after4: fix a couple of bugs in scap clean [mediawiki-config] - https://gerrit.wikimedia.org/r/342297
[02:56:20] (CR) 20after4: "still needs a bit of work but it is now usable" [mediawiki-config] - https://gerrit.wikimedia.org/r/342297 (owner: 20after4)
[03:08:13] Operations, Commons, Datasets-General-or-Unknown, Dumps-Generation, Community-Wishlist-Survey-2016: Back up of Commons files - https://phabricator.wikimedia.org/T160229#3092863 (Reedy)
[03:23:18] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 808.76 seconds
[03:26:18] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 262.88 seconds
[03:39:59] (CR) Krinkle: [C: 1] fix a couple of bugs in scap clean [mediawiki-config] - https://gerrit.wikimedia.org/r/342297 (owner: 20after4)
[03:40:24] (CR) 20after4: [C: 2] fix a couple of bugs in scap clean [mediawiki-config] - https://gerrit.wikimedia.org/r/342297 (owner: 20after4)
[03:42:02] (Merged) jenkins-bot: fix a couple of bugs in scap clean [mediawiki-config] - https://gerrit.wikimedia.org/r/342297 (owner: 20after4)
[03:42:11] (CR) jenkins-bot: fix a couple of bugs in scap clean [mediawiki-config] - https://gerrit.wikimedia.org/r/342297 (owner: 20after4)
[03:52:18] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:14:48] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.012 second response time
[04:15:08] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1543.10 Read Requests/Sec=1569.80 Write Requests/Sec=0.60 KBytes Read/Sec=34950.40 KBytes_Written/Sec=29.60
[04:15:48] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.026 second response time
[04:20:18] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[04:26:08] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.70 Read Requests/Sec=0.30 Write Requests/Sec=79.90 KBytes Read/Sec=2.00 KBytes_Written/Sec=550.00
[04:56:28] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:24:28] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[07:07:19] PROBLEM - puppet last run on db1078 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:58] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:35:18] RECOVERY - puppet last run on db1078 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[07:45:58] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[07:52:18] PROBLEM - Host es2015 is DOWN: PING CRITICAL - Packet loss = 100%
[07:58:48] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:18:28] PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.02 seconds
[08:19:28] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:22:28] PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 320.03 seconds
[08:25:38] Operations, ops-codfw, DBA: es2015 crashed on 2017-03-11 - https://phabricator.wikimedia.org/T160242#3092963 (jcrespo)
[08:26:48] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[08:28:28] RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 29.02 seconds
[08:36:38] PROBLEM - parsoid on wtp2007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:37:28] RECOVERY - parsoid on wtp2007 is OK: HTTP OK: HTTP/1.1 200 OK - 1014 bytes in 0.140 second response time
[08:39:28] !log powercycle es2015 - unresponsive
[08:39:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:38] RECOVERY - Host es2015 is UP: PING OK - Packet loss = 0%, RTA = 36.12 ms
[08:44:29] PROBLEM - parsoid on wtp2009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:18] RECOVERY - parsoid on wtp2009 is OK: HTTP OK: HTTP/1.1 200 OK - 1014 bytes in 0.098 second response time
[08:48:28] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[08:53:11] (PS4) Mbch331: Remove exception on Other Projects sidebar for Dutch Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634)
[08:55:38] PROBLEM - puppet last run on prometheus2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:23:39] RECOVERY - puppet last run on prometheus2001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[09:57:58] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 275 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[10:02:58] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 16 probes of 275 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[10:08:38] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:36:39] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[10:53:41] Operations, IDS-extension, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#3093051 (Shoichi) >>! In T148693#3092220, @Arthur2e5 wrote: > P.S.: Shoichi, I am seeing some heavy spamming on http://ids-testing.wmflabs.org/wiki/Mai...
[11:54:28] PROBLEM - puppet last run on dbproxy1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:22:28] RECOVERY - puppet last run on dbproxy1009 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[12:33:44] Operations, Mail: Get mail relay out of Yahoo! blacklist: apply to Yahoo for whitelisting bulk mail - https://phabricator.wikimedia.org/T58414#616526 (Thibaut120094) FYI, Yahoo! is blocking replies from OTRS: https://otrs-wiki.wikimedia.org/w/index.php?title=Caf%C3%A9&oldid=74130#Yahoo.21_Mail_is_blockin...
[12:41:36] Operations, Mail: Get mail relay out of Yahoo! blacklist: apply to Yahoo for whitelisting bulk mail - https://phabricator.wikimedia.org/T58414#3093085 (Liuxinyu970226)
[12:55:22] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) (owner: Mbch331)
[14:27:38] PROBLEM - git_daemon_running on contint2001 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/lib/git-core/git-daemon
[14:28:38] RECOVERY - git_daemon_running on contint2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/git-core/git-daemon
[14:50:48] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 647604
[15:20:50] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0
[15:28:16] Operations, Mail: Get mail relay out of Yahoo! blacklist: apply to Yahoo for whitelisting bulk mail - https://phabricator.wikimedia.org/T58414#616526 (Paladox) Not sure if related but emails from wikimedia our taking super long for me compared to using outlook.
[15:31:30] (PS5) Ladsgroup: service: Send uwsgi logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010)
[15:32:31] (CR) jerkins-bot: [V: -1] service: Send uwsgi logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: Ladsgroup)
[15:34:22] (PS6) Ladsgroup: service: Send uwsgi logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010)
[15:42:58] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:10:58] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[16:11:05] (PS2) MarcoAurelio: Allow 'autoreviewrestore' to be managed from Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/342042
[17:05:48] PROBLEM - puppet last run on mw1306 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:20:29] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:33:48] RECOVERY - puppet last run on mw1306 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[17:48:28] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[18:07:38] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:23:08] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:26:18] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100%
[18:26:38] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 39.75 ms
[18:36:38] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[18:50:09] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[19:37:10] (PS3) ArielGlenn: write hash sums, dumpruninfo, status report additionally in json [dumps] - https://gerrit.wikimedia.org/r/336395 (https://phabricator.wikimedia.org/T147177)
[19:37:12] (PS1) ArielGlenn: use the various json outputs to write a combined file for status api use [dumps] - https://gerrit.wikimedia.org/r/342310 (https://phabricator.wikimedia.org/T147177)
[19:37:14] (PS1) ArielGlenn: have dump monitor collect current run json files and produce index.json [dumps] - https://gerrit.wikimedia.org/r/342311 (https://phabricator.wikimedia.org/T147177)
[19:37:38] (CR) jerkins-bot: [V: -1] write hash sums, dumpruninfo, status report additionally in json [dumps] - https://gerrit.wikimedia.org/r/336395 (https://phabricator.wikimedia.org/T147177) (owner: ArielGlenn)
[19:37:41] (CR) jerkins-bot: [V: -1] use the various json outputs to write a combined file for status api use [dumps] - https://gerrit.wikimedia.org/r/342310 (https://phabricator.wikimedia.org/T147177) (owner: ArielGlenn)
[19:37:42] (CR) jerkins-bot: [V: -1] have dump monitor collect current run json files and produce index.json [dumps] - https://gerrit.wikimedia.org/r/342311 (https://phabricator.wikimedia.org/T147177) (owner: ArielGlenn)
[19:39:19] booooo
[19:39:23] * apergos snickers
[19:39:40] I forgot to pep8 them of couse.
[19:39:46] plenty o pylint >_<
[19:46:17] (PS4) ArielGlenn: write hash sums, dumpruninfo, status report additionally in json [dumps] - https://gerrit.wikimedia.org/r/336395 (https://phabricator.wikimedia.org/T147177)
[19:46:19] (PS2) ArielGlenn: use the various json outputs to write a combined file for status api use [dumps] - https://gerrit.wikimedia.org/r/342310 (https://phabricator.wikimedia.org/T147177)
[19:46:21] (PS2) ArielGlenn: have dump monitor collect current run json files and produce index.json [dumps] - https://gerrit.wikimedia.org/r/342311 (https://phabricator.wikimedia.org/T147177)
[19:49:02] http://i3.kym-cdn.com/photos/images/facebook/001/088/640/bf6.jpg
[20:09:18] (PS1) ArielGlenn: add a version number to status file output!! [dumps] - https://gerrit.wikimedia.org/r/342312
[21:17:38] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:24:49] (PS3) Paladox: Phabricator: Remove three unneeded configs [puppet] - https://gerrit.wikimedia.org/r/342275
[21:24:59] (PS3) Paladox: Phabricator: Use hiera for deciding when to enable read and write for mysql search [puppet] - https://gerrit.wikimedia.org/r/342276
[21:28:46] (Draft1) Paladox: Gerrit: Increase sendemail.threadPoolSize to 5 [puppet] - https://gerrit.wikimedia.org/r/342313
[21:28:49] (PS2) Paladox: Gerrit: Increase sendemail.threadPoolSize to 5 [puppet] - https://gerrit.wikimedia.org/r/342313
[21:29:57] (CR) Paladox: "On my local machine sending emails through gerrit is instant. So the issue may be with the email server we are using. Or how many threads " [puppet] - https://gerrit.wikimedia.org/r/342313 (owner: Paladox)
[21:46:38] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[22:09:58] PROBLEM - git_daemon_running on contint2001 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/lib/git-core/git-daemon
[22:11:58] RECOVERY - git_daemon_running on contint2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/git-core/git-daemon
[22:14:48] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:42:48] RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[23:07:48] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:36:48] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[23:54:08] Operations, Commons, Datasets-General-or-Unknown, Dumps-Generation, Community-Wishlist-Survey-2016: Back up of Commons files - https://phabricator.wikimedia.org/T160229#3093522 (Peachey88)