[00:33:35] <icinga-wm>	 PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[00:35:56] <wikibugs_>	 (PS1) Jcrespo: mariadb: Depool db1055 because hardware issues [mediawiki-config] - https://gerrit.wikimedia.org/r/374062
[00:38:58] <wikibugs_>	 (CR) Jcrespo: [C: 2] mariadb: Depool db1055 because hardware issues [mediawiki-config] - https://gerrit.wikimedia.org/r/374062 (owner: Jcrespo)
[00:40:29] <wikibugs_>	 (Merged) jenkins-bot: mariadb: Depool db1055 because hardware issues [mediawiki-config] - https://gerrit.wikimedia.org/r/374062 (owner: Jcrespo)
[00:40:39] <wikibugs_>	 (CR) jenkins-bot: mariadb: Depool db1055 because hardware issues [mediawiki-config] - https://gerrit.wikimedia.org/r/374062 (owner: Jcrespo)
[00:42:31] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1051, hw issues, may get lag (duration: 00m 44s)
[00:42:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:52] <jynus>	 !log correction last log s/db1051/db1055/
[00:46:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:50:02] <wikibugs_>	 Operations, ops-eqiad, DBA: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265#3556378 (jcrespo)
[00:50:39] <wikibugs_>	 Operations, ops-eqiad, DBA: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265#3556391 (jcrespo) db1055 depooled for performance reasons https://gerrit.wikimedia.org/r/374062
[01:03:35] <icinga-wm>	 RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[01:04:25] <icinga-wm>	 PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1503795853 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 5172369 keys, up 4 minutes 11 seconds - replication_delay is 1503795853
[01:05:16] <icinga-wm>	 RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 5168749 keys, up 5 minutes 7 seconds - replication_delay is 0
[01:14:16] <icinga-wm>	 PROBLEM - puppet last run on rdb1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:33:35] <icinga-wm>	 PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[01:42:35] <icinga-wm>	 RECOVERY - puppet last run on rdb1008 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[03:25:35] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 710.82 seconds
[03:32:55] <icinga-wm>	 PROBLEM - puppet last run on mw2164 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:33:35] <icinga-wm>	 PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:41:48] <wikibugs_>	 (PS1) GeoffreyT2000: Rename Wikisaurus namespace on Wiktionary to "Thesaurus" [mediawiki-config] - https://gerrit.wikimedia.org/r/374063 (https://phabricator.wikimedia.org/T174264)
[03:55:46] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 62.27 seconds
[04:00:56] <icinga-wm>	 RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[04:01:26] <icinga-wm>	 RECOVERY - puppet last run on mw2164 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[05:30:55] <marostegui>	 !log Force BBU relearn on db1055 - T174265
[05:31:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:31:09] <stashbot>	 T174265: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265
[05:34:04] <wikibugs_>	 Operations, ops-eqiad, DBA: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265#3556421 (Marostegui) I will force a re-learn cycle on this host to see if the BBU comes back to optimal. Anyhow, @Cmjohnson can we use a BBU of the servers that are ready to be dec...
[06:03:36] <icinga-wm>	 RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[06:04:18] <wikibugs_>	 Operations, ops-eqiad, DBA: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265#3556425 (Marostegui) After the re-learn the BBU is back to Optimal and the RAID back to WB:  ```  root@db1055:~#  megacli -AdpBbuCmd  -a0  BBU status for Adapter: 0  BatteryType: B...
[06:23:35] <icinga-wm>	 PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[06:27:45] <icinga-wm>	 PROBLEM - graphite.wikimedia.org on graphite1003 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time
[06:28:45] <icinga-wm>	 RECOVERY - graphite.wikimedia.org on graphite1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 0.009 second response time
[07:23:36] <icinga-wm>	 RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[08:23:45] <icinga-wm>	 PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[09:05:08] <wikibugs_>	 Operations, media-storage: Two cases of local-multiwrite storage backend failure - https://phabricator.wikimedia.org/T174269#3556512 (Ladsgroup)
[09:06:36] <wikibugs_>	 Operations, media-storage: Two cases of local-multiwrite storage backend failure - https://phabricator.wikimedia.org/T174269#3556526 (Ladsgroup)
[10:33:45] <icinga-wm>	 RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[10:53:45] <icinga-wm>	 PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[11:23:36] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2033270
[11:32:36] <wikibugs_>	 (PS1) MarcoAurelio: SVG logo for es.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/374065 (https://phabricator.wikimedia.org/T170604)
[11:48:59] <wikibugs_>	 (PS1) Urbanecm: Allow sysops to grant/remove transwiki user group [mediawiki-config] - https://gerrit.wikimedia.org/r/374066 (https://phabricator.wikimedia.org/T174226)
[12:03:45] <icinga-wm>	 RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[12:04:28] <wikibugs_>	 (CR) MarcoAurelio: Allow sysops to grant/remove transwiki user group (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/374066 (https://phabricator.wikimedia.org/T174226) (owner: Urbanecm)
[12:23:45] <icinga-wm>	 PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[12:43:45] <icinga-wm>	 RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[13:03:45] <icinga-wm>	 PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[13:57:03] <wikibugs_>	 (PS1) Urbanecm: Add several HD logos [mediawiki-config] - https://gerrit.wikimedia.org/r/374071 (https://phabricator.wikimedia.org/T150618)
[14:13:45] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 4051
[14:21:27] <wikibugs_>	 (PS2) Urbanecm: Allow sysops to grant/remove transwiki user group in dtywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/374066 (https://phabricator.wikimedia.org/T174226)
[14:21:34] <wikibugs_>	 (CR) Urbanecm: "Fixed, thank you." [mediawiki-config] - https://gerrit.wikimedia.org/r/374066 (https://phabricator.wikimedia.org/T174226) (owner: Urbanecm)
[14:49:32] <wikibugs_>	 (CR) Framawiki: [C: 1] throttle.php: Separate the throttling definitions from the exception values itself [mediawiki-config] - https://gerrit.wikimedia.org/r/373695 (https://phabricator.wikimedia.org/T167040) (owner: Urbanecm)
[15:33:45] <icinga-wm>	 RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[15:46:41] <akosiaris>	 !log upload kubernetes_1.4.6-7 to apt.wikimedia.org/jessie-wikimedia/main T170346
[15:46:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:57] <stashbot>	 T170346: Kubernetes man pages missing from WMF packages - https://phabricator.wikimedia.org/T170346
[15:50:45] <icinga-wm>	 PROBLEM - DPKG on kubernetes1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:52:45] <icinga-wm>	 RECOVERY - DPKG on kubernetes1001 is OK: All packages OK
[17:00:11] <wikibugs_>	 (CR) MarcoAurelio: [C: 1] Allow sysops to grant/remove transwiki user group in dtywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/374066 (https://phabricator.wikimedia.org/T174226) (owner: Urbanecm)
[17:28:32] <wikibugs_>	 (PS1) Samtar: Make both LoginNotify email features default for Hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/374082 (https://phabricator.wikimedia.org/T174263)
[18:01:51] <wikibugs_>	 (CR) Framawiki: [C: 1] Make both LoginNotify email features default for Hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/374082 (https://phabricator.wikimedia.org/T174263) (owner: Samtar)
[18:12:58] <wikibugs_>	 (CR) Urbanecm: [C: 1] Make both LoginNotify email features default for Hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/374082 (https://phabricator.wikimedia.org/T174263) (owner: Samtar)
[18:18:35] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received
[18:20:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received
[18:21:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received
[18:23:56] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[18:28:46] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) is CRITICAL: Test normal source and target returned the unexpected status 429 (expecting: 200)
[18:29:35] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy
[18:29:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy
[18:31:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy
[18:33:55] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) timed out before a response was received
[18:35:15] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) timed out before a response was received
[18:35:45] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) timed out before a response was received
[18:38:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received
[18:39:55] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy
[18:43:05] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received
[18:43:55] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy
[18:44:05] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy
[18:44:05] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy
[18:44:15] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[18:47:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 502 (expecting: 200)
[18:48:16] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[20:21:31] <wikibugs_>	 (CR) Luke081515: Automatically include commons and wikidata in $wmgThrottlingExceptions (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/373698 (https://phabricator.wikimedia.org/T163872) (owner: Urbanecm)
[20:24:19] <wikibugs_>	 (CR) Framawiki: [C: 1] Automatically include commons and wikidata in $wmgThrottlingExceptions [mediawiki-config] - https://gerrit.wikimedia.org/r/373698 (https://phabricator.wikimedia.org/T163872) (owner: Urbanecm)
[20:27:54] <wikibugs_>	 (PS7) ArielGlenn: write dump output files to temporary location, move in place when done [dumps] - https://gerrit.wikimedia.org/r/368744 (https://phabricator.wikimedia.org/T169849)
[20:28:15] <wikibugs_>	 (CR) jenkins-bot: [V: -1] write dump output files to temporary location, move in place when done [dumps] - https://gerrit.wikimedia.org/r/368744 (https://phabricator.wikimedia.org/T169849) (owner: ArielGlenn)
[20:30:32] <wikibugs_>	 (PS8) ArielGlenn: write dump output files to temporary location, move in place when done [dumps] - https://gerrit.wikimedia.org/r/368744 (https://phabricator.wikimedia.org/T169849)
[20:32:58] <wikibugs_>	 (CR) ArielGlenn: [C: 2] write dump output files to temporary location, move in place when done [dumps] - https://gerrit.wikimedia.org/r/368744 (https://phabricator.wikimedia.org/T169849) (owner: ArielGlenn)
[20:34:51] <logmsgbot>	 !log ariel@tin Started deploy [dumps/dumps@39f9b52]: write output files to temp location and move into place when complete
[20:34:54] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@39f9b52]: write output files to temp location and move into place when complete (duration: 00m 02s)
[20:35:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:35:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:35:39] <wikibugs_>	 Operations, Dumps-Generation, Patch-For-Review: Architecture and puppetize setup for dumpsdata boxes - https://phabricator.wikimedia.org/T169849#3557115 (ArielGlenn) This is now deployed. The Sept 1 dumps will use this code.
[21:22:55] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2014703
[22:02:55] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 84
[22:27:34] <wikibugs_>	 (Abandoned) Paladox: Enabled Ogg Opus support for TimedMediaHandler [mediawiki-config] - https://gerrit.wikimedia.org/r/256967 (owner: Paladox)
[22:45:00] <wikibugs_>	 (CR) Platonides: [C: -1] "I don't think we should be setting per-wiki defaults for these preferences that should be global. See T174263#3557302" [mediawiki-config] - https://gerrit.wikimedia.org/r/374082 (https://phabricator.wikimedia.org/T174263) (owner: Samtar)