[00:10:34] <wikibugs>	 (PS4) Dzahn: standard: actually drop 'has_ganglia' param entirely [puppet] - https://gerrit.wikimedia.org/r/382926 (https://phabricator.wikimedia.org/T177225)
[00:13:10] <wikibugs>	 (PS5) Dzahn: standard: actually drop 'has_ganglia' param entirely [puppet] - https://gerrit.wikimedia.org/r/382926 (https://phabricator.wikimedia.org/T177225)
[00:14:38] <wikibugs>	 (CR) Dzahn: [C: 2] standard: actually drop 'has_ganglia' param entirely [puppet] - https://gerrit.wikimedia.org/r/382926 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:21:49] <mutante>	 meh @  icinga-wm 
[00:29:51] <wikibugs>	 (PS2) Dzahn: ganglia: delete ganglia-web classes and role [puppet] - https://gerrit.wikimedia.org/r/382932 (https://phabricator.wikimedia.org/T177225)
[00:30:44] <wikibugs>	 (CR) Dzahn: [C: 2] ganglia: delete ganglia-web classes and role [puppet] - https://gerrit.wikimedia.org/r/382932 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:36:40] <wikibugs>	 (PS1) Dzahn: network::constants: drop uranium from monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/399119 (https://phabricator.wikimedia.org/T177225)
[00:38:46] <wikibugs>	 (PS1) Dzahn: remove ganglia_aggregators settings from hiera [puppet] - https://gerrit.wikimedia.org/r/399120 (https://phabricator.wikimedia.org/T177225)
[00:44:47] <mutante>	 !log einsteinium: sudo systemctl restart ircecho   (alias kick-icinga-wm) 
[00:44:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:49:17] <wikibugs>	 (PS1) Dzahn: rm role/manifests/ganglia/config [puppet] - https://gerrit.wikimedia.org/r/399121
[00:49:38] <wikibugs>	 (CR) jerkins-bot: [V: -1] rm role/manifests/ganglia/config [puppet] - https://gerrit.wikimedia.org/r/399121 (owner: Dzahn)
[00:51:06] <wikibugs>	 (PS2) Dzahn: rm role/manifests/ganglia/config [puppet] - https://gerrit.wikimedia.org/r/399121 (https://phabricator.wikimedia.org/T177225)
[01:00:22] <wikibugs>	 (PS1) Chad: Nightly server: let MW releasers manage Jenkins [puppet] - https://gerrit.wikimedia.org/r/399123
[01:05:05] <icinga-wm>	 PROBLEM - HTTP on releases1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 13009 bytes in 0.009 second response time
[01:06:18] <mutante>	 no_justification: ^
[01:06:34] <no_justification>	 Yep yep, I know
[01:09:16] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4021 is CRITICAL: CRITICAL: expiry mailbox lag is 2092413
[01:10:05] <icinga-wm>	 RECOVERY - HTTP on releases1001 is OK: HTTP OK: HTTP/1.1 200 OK - 19215 bytes in 0.082 second response time
[01:11:42] <wikibugs>	 (CR) Dzahn: [C: 2] rm role/manifests/ganglia/config [puppet] - https://gerrit.wikimedia.org/r/399121 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[01:15:55] <wikibugs>	 (PS1) Dzahn: remove ganglia.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/399124 (https://phabricator.wikimedia.org/T177225)
[01:20:25] <wikibugs>	 Operations: decom uranium - https://phabricator.wikimedia.org/T183209#3846949 (Dzahn)
[01:21:25] <wikibugs>	 Operations, monitoring, Technical-Debt: decom uranium - https://phabricator.wikimedia.org/T183209#3846962 (Dzahn)
[01:26:59] <wikibugs>	 (PS1) Dzahn: remove uranium.wikimedia.org, v4 + v6 [dns] - https://gerrit.wikimedia.org/r/399125 (https://phabricator.wikimedia.org/T183209)
[01:30:02] <wikibugs>	 (PS1) Dzahn: uranium: remove mapped v6, add decom comment [puppet] - https://gerrit.wikimedia.org/r/399127 (https://phabricator.wikimedia.org/T183209)
[01:31:15] <wikibugs>	 (CR) Dzahn: [C: 2] uranium: remove mapped v6, add decom comment [puppet] - https://gerrit.wikimedia.org/r/399127 (https://phabricator.wikimedia.org/T183209) (owner: Dzahn)
[01:42:57] <wikibugs>	 (CR) Dzahn: "as a minimum i can definitely confirm you wouldn't be the first to let puppet execute usermod to fix this or similar:" [puppet] - https://gerrit.wikimedia.org/r/399101 (owner: Ayounsi)
[01:45:24] <mutante>	 afk now,bbl
[01:50:05] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[01:50:15] <icinga-wm>	 PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0
[02:24:39] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.12) (duration: 05m 22s)
[02:24:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:54:55] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[02:54:56] <icinga-wm>	 RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[03:56:15] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4024 is CRITICAL: CRITICAL: expiry mailbox lag is 2055624
[04:16:15] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4024 is OK: OK: expiry mailbox lag is 0
[04:23:05] <icinga-wm>	 PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.016 second response time
[04:29:26] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.486 second response time
[05:11:06] <icinga-wm>	 RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.025 second response time
[05:14:26] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.332 second response time
[05:18:24] <andrewbogott>	 !log restarting slapd on seaborgium (in response to ldap complaints on the grid master)
[05:18:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:29:26] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.291 second response time
[05:39:26] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.391 second response time
[06:09:56] <marostegui>	 !log Deploy schema change on db1065 (s1 sanitarium master) with replication, so some lag will be generated on labs - T174569
[06:10:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:10] <stashbot>	 T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:21:26] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/399138 (https://phabricator.wikimedia.org/T161294)
[06:24:27] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/399138 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[06:25:49] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/399138 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[06:26:48] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/399138 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[06:26:51] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1106 - T161294 (duration: 00m 53s)
[06:27:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:02] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[06:29:07] <marostegui>	 !log Stop replication in sync on db1100 and db1106 - T161294
[06:29:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:23] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/399140
[06:40:21] <marostegui>	 !log Stop replication in sync on db1106 and dbstore1002 s5 - T161294
[06:40:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:31] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[06:49:21] <logmsgbot>	 !log mobrovac@tin Started deploy [restbase/deploy@2b75a64]: Bug fix: Add the time_to_live config option to the Parsoid module
[06:49:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:28] <marostegui>	 !log Stop replication in sync on db1106 and db2052 - T161294 
[06:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:39] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[06:53:48] <logmsgbot>	 !log mobrovac@tin Finished deploy [restbase/deploy@2b75a64]: Bug fix: Add the time_to_live config option to the Parsoid module (duration: 04m 26s)
[06:53:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:36] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/399140 (owner: Marostegui)
[06:59:00] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/399140 (owner: Marostegui)
[06:59:10] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/399140 (owner: Marostegui)
[07:00:03] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1106 - T161294 (duration: 00m 51s)
[07:00:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:15] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[07:16:32] <wikibugs>	 (CR) Giuseppe Lavagetto: First version of the helm chart scaffolding for production services (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/392619 (https://phabricator.wikimedia.org/T177397) (owner: Giuseppe Lavagetto)
[07:17:05] <wikibugs>	 (PS3) Giuseppe Lavagetto: Create an envoy docker image. [docker-images/production-images] - https://gerrit.wikimedia.org/r/396021
[08:02:32] <wikibugs>	 (PS9) Jcrespo: Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507)
[08:02:34] <wikibugs>	 (PS2) Jcrespo: [WIP]Quick & dirty script to check data differences between tables [puppet] - https://gerrit.wikimedia.org/r/345188 (https://phabricator.wikimedia.org/T160509)
[08:03:12] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP]Quick & dirty script to check data differences between tables [puppet] - https://gerrit.wikimedia.org/r/345188 (https://phabricator.wikimedia.org/T160509) (owner: Jcrespo)
[08:05:07] <logmsgbot>	 !log jmm@puppetmaster1001 conftool action : set/pooled=yes; selector: mw2119.codfw.wmnet
[08:05:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:33] <logmsgbot>	 !log jmm@puppetmaster1001 conftool action : set/pooled=yes; selector: mw2246.codfw.wmnet
[08:05:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:05] <moritzm>	 !log installing openssl security updates
[08:21:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:44] <marostegui>	 !log Stop replication in sync on db2045 and db1109 - T161294
[08:28:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:55] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[08:30:42] <wikibugs>	 (PS2) Muehlenhoff: Fix texlive dependency for stretch onwards [puppet] - https://gerrit.wikimedia.org/r/395712
[08:33:55] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2246 is OK: OK
[08:35:58] <moritzm>	 !log reimaging mw1317 (video scaler) to stretch
[08:36:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:17] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Fix texlive dependency for stretch onwards [puppet] - https://gerrit.wikimedia.org/r/395712 (owner: Muehlenhoff)
[08:38:03] <wikibugs>	 (PS3) Jcrespo: [WIP]Quick & dirty script to check data differences between tables [puppet] - https://gerrit.wikimedia.org/r/345188 (https://phabricator.wikimedia.org/T160509)
[08:38:08] <wikibugs>	 (PS2) Filippo Giunchedi: prometheus: recording rules for redis [puppet] - https://gerrit.wikimedia.org/r/398871 (https://phabricator.wikimedia.org/T148637)
[08:38:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP]Quick & dirty script to check data differences between tables [puppet] - https://gerrit.wikimedia.org/r/345188 (https://phabricator.wikimedia.org/T160509) (owner: Jcrespo)
[08:42:19] <wikibugs>	 (PS2) Jcrespo: [WIP]Initial commit of existent python scripts [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/354206
[08:42:38] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] prometheus: recording rules for redis [puppet] - https://gerrit.wikimedia.org/r/398871 (https://phabricator.wikimedia.org/T148637) (owner: Filippo Giunchedi)
[08:46:09] <wikibugs>	 (PS1) Alexandros Kosiaris: Bump puppetdb on puppet compiler to 3G [puppet] - https://gerrit.wikimedia.org/r/399145
[08:47:11] <wikibugs>	 (PS1) Marostegui: Revert "Revert "db-eqiad.php: Depool db1106"" [mediawiki-config] - https://gerrit.wikimedia.org/r/399146
[08:52:53] <wikibugs>	 (CR) Elukey: "pcc https://puppet-compiler.wmflabs.org/compiler02/9395/" [puppet] - https://gerrit.wikimedia.org/r/398869 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[08:52:58] <wikibugs>	 (CR) Volans: "Nice! I know it's a WIP, I just left few minor comments/suggestions." (7 comments) [puppet] - https://gerrit.wikimedia.org/r/345188 (https://phabricator.wikimedia.org/T160509) (owner: Jcrespo)
[08:53:07] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "Revert "db-eqiad.php: Depool db1106"" [mediawiki-config] - https://gerrit.wikimedia.org/r/399146 (owner: Marostegui)
[08:55:47] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM, though I don't see pdns-exporter running on labservices1001 yet" [puppet] - https://gerrit.wikimedia.org/r/398867 (owner: Muehlenhoff)
[08:55:49] <wikibugs>	 (Merged) jenkins-bot: Revert "Revert "db-eqiad.php: Depool db1106"" [mediawiki-config] - https://gerrit.wikimedia.org/r/399146 (owner: Marostegui)
[08:56:49] <wikibugs>	 (CR) jenkins-bot: Revert "Revert "db-eqiad.php: Depool db1106"" [mediawiki-config] - https://gerrit.wikimedia.org/r/399146 (owner: Marostegui)
[08:56:53] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1106 - T161294 (duration: 00m 51s)
[08:57:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:04] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[08:57:33] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] Bump puppetdb on puppet compiler to 3G [puppet] - https://gerrit.wikimedia.org/r/399145 (owner: Alexandros Kosiaris)
[08:59:31] <wikibugs>	 (CR) Volans: [C: 1] "Ack that there is no hurry and we can wait Jan. Adding +1 because the  patch looks good to me now. @herron: feel free to -2 it to ensure i" [puppet] - https://gerrit.wikimedia.org/r/398120 (https://phabricator.wikimedia.org/T182819) (owner: Herron)
[09:03:54] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2119 is OK: OK
[09:03:54] <icinga-wm>	 ACKNOWLEDGEMENT - Host cp4032 is DOWN: PING CRITICAL - Packet loss = 100% Volans Under maintenance https://phabricator.wikimedia.org/T183176
[09:03:57] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/399148 (https://phabricator.wikimedia.org/T161294)
[09:04:19] <wikibugs>	 (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/9396/" [puppet] - https://gerrit.wikimedia.org/r/398847 (https://phabricator.wikimedia.org/T181995) (owner: Filippo Giunchedi)
[09:04:27] <wikibugs>	 (PS2) Filippo Giunchedi: Add nutcracker_exporter profile [puppet] - https://gerrit.wikimedia.org/r/398847 (https://phabricator.wikimedia.org/T181995)
[09:04:42] <wikibugs>	 (CR) Jcrespo: "Volans: aside from return values, the fact that you focus on the nitpicks and not on the fact that this is a monolithic unmaintainable mes" [puppet] - https://gerrit.wikimedia.org/r/345188 (https://phabricator.wikimedia.org/T160509) (owner: Jcrespo)
[09:06:14] <wikibugs>	 Operations, ops-ulsfo, Traffic: cp4032 memory error - https://phabricator.wikimedia.org/T183176#3845938 (Volans) @RobH FYI I've ack'ed the Icinga alert of the host down and set it to downtime until Fri UTC morning.
[09:07:04] <icinga-wm>	 PROBLEM - DPKG on webperf1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:08:03] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/399148 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[09:09:04] <icinga-wm>	 RECOVERY - DPKG on webperf1001 is OK: All packages OK
[09:09:06] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 1] "https://puppet-compiler.wmflabs.org/compiler02/9398/ is rather happy, I 'll proceed with this and see what we get out of it" [puppet] - https://gerrit.wikimedia.org/r/398276 (https://phabricator.wikimedia.org/T182860) (owner: Alexandros Kosiaris)
[09:10:42] <wikibugs>	 (PS2) Alexandros Kosiaris: Populate the docker group in admin module [puppet] - https://gerrit.wikimedia.org/r/398276 (https://phabricator.wikimedia.org/T182860)
[09:11:33] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/399148 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[09:11:47] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/399148 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[09:12:39] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1097:3315 - T161294 (duration: 00m 51s)
[09:12:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:49] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[09:15:34] <wikibugs>	 (PS3) Alexandros Kosiaris: Populate the docker group in admin module [puppet] - https://gerrit.wikimedia.org/r/398276 (https://phabricator.wikimedia.org/T182860)
[09:18:40] <wikibugs>	 (PS3) Elukey: profile::mariadb::misc::el::master: apply data sanitization policies [puppet] - https://gerrit.wikimedia.org/r/398869 (https://phabricator.wikimedia.org/T108850)
[09:19:43] <wikibugs>	 (CR) Elukey: [C: 2] profile::mariadb::misc::el::master: apply data sanitization policies [puppet] - https://gerrit.wikimedia.org/r/398869 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[09:20:35] <wikibugs>	 (PS4) Alexandros Kosiaris: Populate the docker group in admin module [puppet] - https://gerrit.wikimedia.org/r/398276 (https://phabricator.wikimedia.org/T182860)
[09:21:01] <wikibugs>	 (CR) Filippo Giunchedi: [C: 1] Add Prometheus scraper configs for WDQS updater and Blazegraph [puppet] - https://gerrit.wikimedia.org/r/398865 (owner: Muehlenhoff)
[09:22:25] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] Populate the docker group in admin module [puppet] - https://gerrit.wikimedia.org/r/398276 (https://phabricator.wikimedia.org/T182860) (owner: Alexandros Kosiaris)
[09:26:13] <icinga-wm>	 PROBLEM - puppet last run on db1107 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:26:34] <elukey>	 this is me --^
[09:27:47] <wikibugs>	 (PS10) Jcrespo: Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507)
[09:30:50] <wikibugs>	 (PS1) Elukey: profile::mariadb::misc::eventlogging: fix group/user dependencies [puppet] - https://gerrit.wikimedia.org/r/399149 (https://phabricator.wikimedia.org/T108850)
[09:31:23] <wikibugs>	 (CR) Elukey: [C: 2] profile::mariadb::misc::eventlogging: fix group/user dependencies [puppet] - https://gerrit.wikimedia.org/r/399149 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[09:31:34] <wikibugs>	 (PS7) ArielGlenn: rename 'otherdir' in the dumps modules [puppet] - https://gerrit.wikimedia.org/r/398034
[09:31:54] <elukey>	 akosiaris: shall I merge?
[09:32:18] <akosiaris>	 elukey: no, I got it... I have to check the results anyway
[09:32:24] <elukey>	 sure
[09:32:47] <wikibugs>	 (PS11) Jcrespo: Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507)
[09:33:20] <wikibugs>	 (PS8) ArielGlenn: rename 'otherdir' in the dumps modules [puppet] - https://gerrit.wikimedia.org/r/398034
[09:33:57] <wikibugs>	 (CR) ArielGlenn: [C: 2] rename 'otherdir' in the dumps modules [puppet] - https://gerrit.wikimedia.org/r/398034 (owner: ArielGlenn)
[09:36:18] <icinga-wm>	 RECOVERY - puppet last run on db1107 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[09:37:24] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -2] "Done with a different approach in https://gerrit.wikimedia.org/r/398276" [puppet] - https://gerrit.wikimedia.org/r/398240 (https://phabricator.wikimedia.org/T182860) (owner: Hashar)
[09:38:11] <wikibugs>	 Operations, Ops-Access-Requests, Continuous-Integration-Infrastructure (shipyard), Patch-For-Review, Release-Engineering-Team (Kanban): Allow contint-admins to interact with docker on CI hosts - https://phabricator.wikimedia.org/T182860#3847374 (akosiaris) Open>Resolved a:akosiaris...
[09:38:17] <wikibugs>	 (PS12) Jcrespo: Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507)
[09:39:37] <wikibugs>	 (Abandoned) Hashar: contint: allow releng to interact with Docker [puppet] - https://gerrit.wikimedia.org/r/398240 (https://phabricator.wikimedia.org/T182860) (owner: Hashar)
[09:41:08] <wikibugs>	 Operations, Ops-Access-Requests, Continuous-Integration-Infrastructure (shipyard), Patch-For-Review, Release-Engineering-Team (Kanban): Allow contint-admins to interact with docker on CI hosts - https://phabricator.wikimedia.org/T182860#3847378 (hashar) ``` contint1001$ groups  wikidev docker...
[09:41:58] <icinga-wm>	 PROBLEM - DPKG on mwdebug1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:42:35] <godog>	 that's me ^
[09:44:19] <wikibugs>	 (PS3) ArielGlenn: clean up directory setup manifests for dumps nfs and web servers [puppet] - https://gerrit.wikimedia.org/r/398095
[09:49:59] <icinga-wm>	 RECOVERY - DPKG on mwdebug1001 is OK: All packages OK
[09:53:33] <wikibugs>	 (PS13) Jcrespo: Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507)
[09:54:13] <wikibugs>	 (PS14) Jcrespo: Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507)
[09:54:58] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] Add nutcracker_exporter profile [puppet] - https://gerrit.wikimedia.org/r/398847 (https://phabricator.wikimedia.org/T181995) (owner: Filippo Giunchedi)
[09:55:07] <wikibugs>	 (PS3) Filippo Giunchedi: Add nutcracker_exporter profile [puppet] - https://gerrit.wikimedia.org/r/398847 (https://phabricator.wikimedia.org/T181995)
[09:55:30] <wikibugs>	 (CR) ArielGlenn: [C: 2] clean up directory setup manifests for dumps nfs and web servers [puppet] - https://gerrit.wikimedia.org/r/398095 (owner: ArielGlenn)
[09:56:09] <wikibugs>	 (PS4) Filippo Giunchedi: Add nutcracker_exporter profile [puppet] - https://gerrit.wikimedia.org/r/398847 (https://phabricator.wikimedia.org/T181995)
[09:59:06] <wikibugs>	 (PS4) Giuseppe Lavagetto: First version of the helm chart scaffolding for production services [deployment-charts] - https://gerrit.wikimedia.org/r/392619 (https://phabricator.wikimedia.org/T177397)
[10:00:19] <icinga-wm>	 PROBLEM - Check systemd state on mw1187 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:00:20] <icinga-wm>	 PROBLEM - Check systemd state on mw1262 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:00:26] <_joe_>	 akosiaris: I'd merge that change, and declare that done
[10:00:31] <_joe_>	 godog: is that you? ^^
[10:00:54] <godog>	 _joe_: yeah that's me :( I'm taking a look
[10:00:54] <_joe_>	 ● prometheus-nutcracker-exporter.service                                                    loaded failed failed    Prometheus Nutcracker exporter
[10:00:59] <_joe_>	 yes :)
[10:01:10] <icinga-wm>	 PROBLEM - Check systemd state on mw2202 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:19] <icinga-wm>	 PROBLEM - Check systemd state on mw2240 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:20] <icinga-wm>	 PROBLEM - Check systemd state on mw1188 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:29] <icinga-wm>	 PROBLEM - Check systemd state on mw2141 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:30] <icinga-wm>	 PROBLEM - Check systemd state on mw1190 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:33] <akosiaris>	 _joe_: yeah sounds fine to me
[10:01:36] <godog>	 sigh, it worked ok on mwdebug 
[10:01:39] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:01:39] <icinga-wm>	 PROBLEM - configured eth on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:01:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2156 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2212 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2255 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:40] <icinga-wm>	 PROBLEM - Check systemd state on mw1183 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:01:45] <godog>	 I'll rollback, sorry about the spam
[10:01:57] <_joe_>	 godog: don't, let's try to fix it instead
[10:02:00] <icinga-wm>	 PROBLEM - Check systemd state on mw2223 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:02:09] <icinga-wm>	 PROBLEM - Check systemd state on mw2157 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:02:09] <wikibugs>	 (PS2) ArielGlenn: apachedir is available to dumps cron jobs via a bash script, use it [puppet] - https://gerrit.wikimedia.org/r/398106
[10:02:10] <icinga-wm>	 PROBLEM - Check systemd state on mw1309 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:02:19] <_joe_>	 godog: let's disable puppet wherever it didn't run instead
[10:02:20] <icinga-wm>	 PROBLEM - Check systemd state on mw2137 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:02:21] <_joe_>	 lemme do it
[10:02:29] <icinga-wm>	 PROBLEM - Check systemd state on mw2172 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:02:34] <godog>	 _joe_: ok
[10:02:39] <icinga-wm>	 PROBLEM - Check systemd state on scb2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:02:39] <icinga-wm>	 PROBLEM - Check systemd state on mw2234 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:09] <icinga-wm>	 PROBLEM - Check systemd state on mw1218 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:10] <icinga-wm>	 PROBLEM - Check systemd state on mw2162 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:19] <icinga-wm>	 PROBLEM - Check systemd state on mw1214 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:19] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:03:19] <icinga-wm>	 PROBLEM - dhclient process on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:03:25] <_joe_>	 done
[10:03:29] <icinga-wm>	 PROBLEM - Check systemd state on mw1322 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:34] <_joe_>	 mw1317 is being reimaged?
[10:03:39] <volans>	 yes
[10:03:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2225 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2144 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:59] <icinga-wm>	 PROBLEM - Check systemd state on mw1216 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:03:59] <icinga-wm>	 PROBLEM - Check systemd state on mw2113 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:00] <icinga-wm>	 PROBLEM - Check systemd state on mw1204 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:00] <icinga-wm>	 PROBLEM - Check systemd state on mw2177 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:02] <_joe_>	 ok
[10:04:08] <wikibugs>	 (CR) Marostegui: [C: 1] Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[10:04:09] <icinga-wm>	 PROBLEM - Check systemd state on mw1213 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:09] <icinga-wm>	 PROBLEM - Check systemd state on mw1287 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:09] <icinga-wm>	 PROBLEM - Check systemd state on mw2122 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:10] <icinga-wm>	 PROBLEM - Check systemd state on mw1233 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:14] <_joe_>	 godog: puppet is disabled on all those systems btw
[10:04:19] <icinga-wm>	 PROBLEM - Check systemd state on mw2251 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:19] <icinga-wm>	 PROBLEM - Check systemd state on mw2231 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:30] <icinga-wm>	 PROBLEM - Check systemd state on mw2106 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:30] <icinga-wm>	 PROBLEM - Check systemd state on mw2176 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2201 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:04:40] <icinga-wm>	 PROBLEM - Check systemd state on mw1220 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:00] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1317 is CRITICAL: Host mw1317 is not in mediawiki-installation dsh group
[10:05:00] <icinga-wm>	 PROBLEM - DPKG on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:05:00] <icinga-wm>	 PROBLEM - Check systemd state on mw1195 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:09] <icinga-wm>	 PROBLEM - Check systemd state on scb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:09] <icinga-wm>	 PROBLEM - Check systemd state on mw2244 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:11] <godog>	 _joe_: thanks! I'm trying to understand why, on those hosts, nutcracker answers with connection reset by peer when asked for stats; it works ok e.g. on mwdebug
[10:05:19] <icinga-wm>	 PROBLEM - Check systemd state on mw1232 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:20] <icinga-wm>	 PROBLEM - Check systemd state on mw2252 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:20] <icinga-wm>	 PROBLEM - Check systemd state on mw1295 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:20] <icinga-wm>	 PROBLEM - Check systemd state on mw2218 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:22] <_joe_>	 godog: uhm
[10:05:25] <_joe_>	 lemme see
[10:05:29] <icinga-wm>	 PROBLEM - Check systemd state on mw2233 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:29] <icinga-wm>	 PROBLEM - Check systemd state on mw2245 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:29] <icinga-wm>	 PROBLEM - Check systemd state on thumbor2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:29] <icinga-wm>	 PROBLEM - Check systemd state on mw2133 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:29] <icinga-wm>	 PROBLEM - Check systemd state on mw2017 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:30] <icinga-wm>	 PROBLEM - Check systemd state on mw2101 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2214 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:40] <icinga-wm>	 PROBLEM - Check systemd state on mw1208 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2220 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:49] <icinga-wm>	 PROBLEM - Check systemd state on mw1284 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:05:59] <icinga-wm>	 PROBLEM - Check systemd state on mw1263 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:06:08] <jynus>	 not causing real problems, right?
[10:06:09] <icinga-wm>	 PROBLEM - Check systemd state on mw2246 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:06:26] <_joe_>	 jynus: nope
[10:06:29] <godog>	 no, but noise
[10:06:30] <jynus>	 good
[10:06:49] <icinga-wm>	 PROBLEM - Disk space on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:06:49] <icinga-wm>	 PROBLEM - nutcracker port on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:06:49] <icinga-wm>	 PROBLEM - Check systemd state on mw1310 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:06:59] <icinga-wm>	 PROBLEM - Check systemd state on mw1294 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:07:20] <icinga-wm>	 PROBLEM - Check systemd state on mw1293 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:07:49] <icinga-wm>	 PROBLEM - Check systemd state on scb2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:08:29] <icinga-wm>	 PROBLEM - nutcracker process on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:08:29] <icinga-wm>	 PROBLEM - HHVM processes on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:08:30] <icinga-wm>	 PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:09:00] <icinga-wm>	 PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:09:35] <_joe_>	 godog: on most of the systems where it failed nutcracker returns its stats correctly
[10:10:10] <icinga-wm>	 PROBLEM - Check systemd state on mw1203 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:10:10] <icinga-wm>	 PROBLEM - Check systemd state on mw1242 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:10:10] <icinga-wm>	 PROBLEM - HHVM rendering on mw1317 is CRITICAL: connect to address 10.64.16.198 and port 80: Connection refused
[10:10:10] <icinga-wm>	 PROBLEM - puppet last run on mw1317 is CRITICAL: Return code of 255 is out of bounds
[10:10:10] <icinga-wm>	 PROBLEM - Check systemd state on mw2169 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:10:12] <godog>	 indeed
[10:11:23] <wikibugs>	 (PS1) Muehlenhoff: Add PowerDNS exporter to labservices1001 [puppet] - https://gerrit.wikimedia.org/r/399152
[10:11:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add PowerDNS exporter to labservices1001 [puppet] - https://gerrit.wikimedia.org/r/399152 (owner: Muehlenhoff)
[10:11:59] <icinga-wm>	 PROBLEM - Check systemd state on scb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:12:06] <_joe_>	 godog: uhm, yeah it seems like there is something we don't understand
[10:12:14] <_joe_>	 I'll have to look at the code again
[10:12:51] <_joe_>	 brb
[10:12:51] <godog>	 looks like nutcracker closes the connection before python has had time to read all the output
[10:13:12] <_joe_>	 yeah it's possible that's an optimization in order to preserve sockets
[10:13:18] <_joe_>	 under high load
[10:13:52] <wikibugs>	 (CR) ArielGlenn: [C: 2] apachedir is available to dumps cron jobs via a bash script, use it [puppet] - https://gerrit.wikimedia.org/r/398106 (owner: ArielGlenn)
[10:14:28] <godog>	 yeah I got the code wrong, it 
[10:14:28] <wikibugs>	 (PS2) Muehlenhoff: Add PowerDNS exporter to labservices1001 [puppet] - https://gerrit.wikimedia.org/r/399152
[10:14:29] <icinga-wm>	 PROBLEM - Check systemd state on thumbor2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:14:40] <godog>	 it just so happened to work in the cases I tested
[10:15:07] <godog>	 fixing it
[10:15:20] <icinga-wm>	 PROBLEM - Check systemd state on thumbor2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:15:20] <icinga-wm>	 PROBLEM - Apache HTTP on mw1317 is CRITICAL: connect to address 10.64.16.198 and port 80: Connection refused
[10:15:20] <icinga-wm>	 PROBLEM - MD RAID on mw1317 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:15:20] <icinga-wm>	 PROBLEM - Check systemd state on mw1274 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:17:49] <icinga-wm>	 PROBLEM - Check systemd state on thumbor1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:18:40] <icinga-wm>	 PROBLEM - Check systemd state on scb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:18:49] <icinga-wm>	 PROBLEM - Check systemd state on mw2253 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:19:49] <icinga-wm>	 PROBLEM - Check systemd state on thumbor2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:20:27] <wikibugs>	 (PS1) Elukey: eventlogging_purging_whitelist.tsv: remove old table [puppet] - https://gerrit.wikimedia.org/r/399153
[10:21:52] <wikibugs>	 (PS2) Elukey: eventlogging_purging_whitelist.tsv: remove old table [puppet] - https://gerrit.wikimedia.org/r/399153 (https://phabricator.wikimedia.org/T108850)
[10:23:19] <icinga-wm>	 PROBLEM - Check systemd state on scb1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:23:59] <icinga-wm>	 PROBLEM - Check systemd state on thumbor1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:24:29] <icinga-wm>	 PROBLEM - puppet last run on scb1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:25:39] <icinga-wm>	 PROBLEM - Check systemd state on scb1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:25:41] <wikibugs>	 (PS1) Filippo Giunchedi: Fix nutcracker metrics fetching [debs/prometheus-nutcracker-exporter] - https://gerrit.wikimedia.org/r/399154 (https://phabricator.wikimedia.org/T181995)
[10:26:08] <godog>	 _joe_: ^
[10:27:00] <icinga-wm>	 PROBLEM - Check systemd state on scb2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:27:50] <_joe_>	 godog: seems ok, want me to do a serious review?
[10:28:23] <wikibugs>	 Operations, Cloud-Services, Cloud-VPS: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#3847501 (akosiaris) a:akosiaris>None
[10:28:51] <godog>	 _joe_: the code is the same as the diamond exporter now so I'll just go ahead
[10:29:34] <_joe_>	 please do
[10:30:02] <icinga-wm>	 PROBLEM - Check systemd state on scb2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:30:15] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] Fix nutcracker metrics fetching [debs/prometheus-nutcracker-exporter] - https://gerrit.wikimedia.org/r/399154 (https://phabricator.wikimedia.org/T181995) (owner: Filippo Giunchedi)
[10:30:17] <wikibugs>	 Operations, OTRS, Security: Upgrade OTRS to 5.0.26 - https://phabricator.wikimedia.org/T183228#3847505 (akosiaris)
[10:30:36] <wikibugs>	 Operations, OTRS, Security: Upgrade OTRS to 5.0.26 - https://phabricator.wikimedia.org/T183228#3847518 (akosiaris) Open>Resolved Upgrade to 5.0.26 done. Resolving.
[10:30:43] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] First version of the helm chart scaffolding for production services [deployment-charts] - https://gerrit.wikimedia.org/r/392619 (https://phabricator.wikimedia.org/T177397) (owner: Giuseppe Lavagetto)
[10:33:33] <icinga-wm>	 RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:34:02] <icinga-wm>	 RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:34:38] <wikibugs>	 (PS1) Hashar: Bump Jinja2 to 2.10+ [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/399155
[10:34:51] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3847529 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1330.eqiad.wmnet', 'mw1331.eqiad.wmnet...
[10:35:10] <elukey>	 imaging mw133[0,1] --^
[10:37:03] <icinga-wm>	 RECOVERY - Check systemd state on mw1204 is OK: OK - running: The system is fully operational
[10:38:42] <icinga-wm>	 RECOVERY - Check systemd state on mw2017 is OK: OK - running: The system is fully operational
[10:38:42] <icinga-wm>	 RECOVERY - Check systemd state on mw2172 is OK: OK - running: The system is fully operational
[10:38:43] <icinga-wm>	 RECOVERY - Check systemd state on mw2157 is OK: OK - running: The system is fully operational
[10:38:45] <godog>	 !log rollout updated version of prometheus-nutcracker-exporter
[10:38:52] <icinga-wm>	 RECOVERY - Check systemd state on mw2141 is OK: OK - running: The system is fully operational
[10:38:52] <icinga-wm>	 RECOVERY - Check systemd state on mw2101 is OK: OK - running: The system is fully operational
[10:38:52] <icinga-wm>	 RECOVERY - Check systemd state on mw2106 is OK: OK - running: The system is fully operational
[10:38:52] <icinga-wm>	 RECOVERY - Check systemd state on mw2176 is OK: OK - running: The system is fully operational
[10:38:53] <icinga-wm>	 RECOVERY - Check systemd state on mw2156 is OK: OK - running: The system is fully operational
[10:38:53] <icinga-wm>	 RECOVERY - Check systemd state on mw2201 is OK: OK - running: The system is fully operational
[10:38:53] <icinga-wm>	 RECOVERY - Check systemd state on mw2212 is OK: OK - running: The system is fully operational
[10:38:54] <icinga-wm>	 RECOVERY - Check systemd state on mw2144 is OK: OK - running: The system is fully operational
[10:38:54] <icinga-wm>	 RECOVERY - Check systemd state on mw2225 is OK: OK - running: The system is fully operational
[10:38:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:02] <icinga-wm>	 RECOVERY - Check systemd state on scb2005 is OK: OK - running: The system is fully operational
[10:39:03] <icinga-wm>	 RECOVERY - Check systemd state on mw2122 is OK: OK - running: The system is fully operational
[10:39:12] <icinga-wm>	 RECOVERY - Check systemd state on mw2113 is OK: OK - running: The system is fully operational
[10:39:12] <icinga-wm>	 RECOVERY - Check systemd state on scb2002 is OK: OK - running: The system is fully operational
[10:39:12] <icinga-wm>	 RECOVERY - Check systemd state on mw2223 is OK: OK - running: The system is fully operational
[10:39:12] <icinga-wm>	 RECOVERY - Check systemd state on mw1287 is OK: OK - running: The system is fully operational
[10:39:13] <icinga-wm>	 RECOVERY - Check systemd state on mw2177 is OK: OK - running: The system is fully operational
[10:39:13] <icinga-wm>	 RECOVERY - Check systemd state on mw1218 is OK: OK - running: The system is fully operational
[10:39:13] <icinga-wm>	 RECOVERY - Check systemd state on mw1203 is OK: OK - running: The system is fully operational
[10:39:14] <icinga-wm>	 RECOVERY - Check systemd state on mw1242 is OK: OK - running: The system is fully operational
[10:39:14] <icinga-wm>	 RECOVERY - Check systemd state on mw1309 is OK: OK - running: The system is fully operational
[10:39:22] <icinga-wm>	 RECOVERY - Check systemd state on mw1274 is OK: OK - running: The system is fully operational
[10:39:32] <icinga-wm>	 RECOVERY - Check systemd state on mw2202 is OK: OK - running: The system is fully operational
[10:39:32] <icinga-wm>	 RECOVERY - Check systemd state on thumbor2001 is OK: OK - running: The system is fully operational
[10:39:32] <icinga-wm>	 RECOVERY - Check systemd state on mw1293 is OK: OK - running: The system is fully operational
[10:39:33] <icinga-wm>	 RECOVERY - Check systemd state on mw1295 is OK: OK - running: The system is fully operational
[10:39:33] <icinga-wm>	 RECOVERY - Check systemd state on mw2231 is OK: OK - running: The system is fully operational
[10:39:33] <icinga-wm>	 RECOVERY - Check systemd state on mw2252 is OK: OK - running: The system is fully operational
[10:39:33] <icinga-wm>	 RECOVERY - Check systemd state on mw1187 is OK: OK - running: The system is fully operational
[10:39:34] <icinga-wm>	 RECOVERY - Check systemd state on mw1188 is OK: OK - running: The system is fully operational
[10:39:34] <icinga-wm>	 RECOVERY - Check systemd state on mw1262 is OK: OK - running: The system is fully operational
[10:39:35] <icinga-wm>	 RECOVERY - Check systemd state on mw1322 is OK: OK - running: The system is fully operational
[10:39:35] <icinga-wm>	 RECOVERY - Check systemd state on mw2218 is OK: OK - running: The system is fully operational
[10:39:42] <icinga-wm>	 RECOVERY - Check systemd state on mw2137 is OK: OK - running: The system is fully operational
[10:39:42] <icinga-wm>	 RECOVERY - Check systemd state on mw2233 is OK: OK - running: The system is fully operational
[10:39:42] <icinga-wm>	 RECOVERY - Check systemd state on mw2245 is OK: OK - running: The system is fully operational
[10:39:42] <icinga-wm>	 RECOVERY - Check systemd state on thumbor2004 is OK: OK - running: The system is fully operational
[10:39:42] <icinga-wm>	 RECOVERY - Check systemd state on thumbor2003 is OK: OK - running: The system is fully operational
[10:39:43] <icinga-wm>	 RECOVERY - Check systemd state on mw2133 is OK: OK - running: The system is fully operational
[10:39:43] <icinga-wm>	 RECOVERY - Check systemd state on scb1003 is OK: OK - running: The system is fully operational
[10:39:44] <icinga-wm>	 RECOVERY - Check systemd state on mw1190 is OK: OK - running: The system is fully operational
[10:39:52] <icinga-wm>	 RECOVERY - Check systemd state on mw1208 is OK: OK - running: The system is fully operational
[10:39:52] <icinga-wm>	 PROBLEM - puppet last run on scb1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 16 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:39:52] <icinga-wm>	 RECOVERY - Check systemd state on scb2003 is OK: OK - running: The system is fully operational
[10:39:52] <icinga-wm>	 RECOVERY - Check systemd state on mw1183 is OK: OK - running: The system is fully operational
[10:39:53] <icinga-wm>	 RECOVERY - Check systemd state on mw1220 is OK: OK - running: The system is fully operational
[10:39:53] <icinga-wm>	 RECOVERY - Check systemd state on scb2001 is OK: OK - running: The system is fully operational
[10:39:53] <icinga-wm>	 RECOVERY - Check systemd state on mw2234 is OK: OK - running: The system is fully operational
[10:39:54] <icinga-wm>	 RECOVERY - Check systemd state on mw2214 is OK: OK - running: The system is fully operational
[10:39:54] <icinga-wm>	 RECOVERY - Check systemd state on thumbor1002 is OK: OK - running: The system is fully operational
[10:39:55] <icinga-wm>	 RECOVERY - Check systemd state on mw2220 is OK: OK - running: The system is fully operational
[10:40:02] <icinga-wm>	 RECOVERY - Check systemd state on mw2255 is OK: OK - running: The system is fully operational
[10:40:02] <icinga-wm>	 RECOVERY - Check systemd state on mw2253 is OK: OK - running: The system is fully operational
[10:40:02] <icinga-wm>	 RECOVERY - Check systemd state on thumbor2002 is OK: OK - running: The system is fully operational
[10:40:02] <icinga-wm>	 RECOVERY - Check systemd state on mw1263 is OK: OK - running: The system is fully operational
[10:40:03] <icinga-wm>	 RECOVERY - Check systemd state on mw1294 is OK: OK - running: The system is fully operational
[10:40:03] <icinga-wm>	 RECOVERY - Check systemd state on scb1002 is OK: OK - running: The system is fully operational
[10:40:03] <icinga-wm>	 RECOVERY - Check systemd state on thumbor1001 is OK: OK - running: The system is fully operational
[10:40:03] <icinga-wm>	 RECOVERY - Check systemd state on scb2004 is OK: OK - running: The system is fully operational
[10:40:04] <icinga-wm>	 RECOVERY - Check systemd state on mw1195 is OK: OK - running: The system is fully operational
[10:40:06] <elukey>	 \o/
[10:40:12] <icinga-wm>	 RECOVERY - Check systemd state on scb1001 is OK: OK - running: The system is fully operational
[10:40:13] <icinga-wm>	 RECOVERY - Check systemd state on mw1213 is OK: OK - running: The system is fully operational
[10:40:32] <icinga-wm>	 RECOVERY - Check systemd state on mw1233 is OK: OK - running: The system is fully operational
[10:40:32] <icinga-wm>	 RECOVERY - Check systemd state on mw2246 is OK: OK - running: The system is fully operational
[10:40:32] <icinga-wm>	 RECOVERY - Check systemd state on scb1004 is OK: OK - running: The system is fully operational
[10:42:12] <icinga-wm>	 RECOVERY - Check systemd state on mw1214 is OK: OK - running: The system is fully operational
[10:43:22] <icinga-wm>	 RECOVERY - Check systemd state on mw1232 is OK: OK - running: The system is fully operational
[10:43:32] <icinga-wm>	 RECOVERY - Check systemd state on mw2251 is OK: OK - running: The system is fully operational
[10:43:32] <icinga-wm>	 RECOVERY - Check systemd state on mw2240 is OK: OK - running: The system is fully operational
[10:43:52] <icinga-wm>	 RECOVERY - Check systemd state on mw1310 is OK: OK - running: The system is fully operational
[10:44:02] <icinga-wm>	 RECOVERY - Check systemd state on mw1284 is OK: OK - running: The system is fully operational
[10:44:02] <icinga-wm>	 RECOVERY - Check systemd state on mw2244 is OK: OK - running: The system is fully operational
[10:44:03] <icinga-wm>	 RECOVERY - Check systemd state on mw1216 is OK: OK - running: The system is fully operational
[10:44:05] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3847554 (elukey) Next steps:  1) image all the hosts in https://gerrit.wikimedia.org/r/397749 and put them in production (January) 2) decom old row C appserve...
[10:44:14] <_joe_>	 godog: can I reenable puppet then?
[10:44:22] <godog>	 _joe_: yup I can do it too
[10:44:32] <wikibugs>	 (PS15) Jcrespo: Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507)
[10:44:38] <_joe_>	 done
[10:44:48] <godog>	 nice, thanks
[10:45:22] <jynus>	 !log disabling puppet on dbproxies for 398450 deploy
[10:45:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:42] <icinga-wm>	 RECOVERY - Check systemd state on mw2162 is OK: OK - running: The system is fully operational
[10:45:50] <wikibugs>	 (CR) Jcrespo: [C: 2] Update mariadb::proxy to the latest style and path locations [puppet] - https://gerrit.wikimedia.org/r/398450 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[10:45:53] <icinga-wm>	 PROBLEM - puppet last run on mw1293 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:46:02] <icinga-wm>	 PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:46:22] <icinga-wm>	 PROBLEM - puppet last run on mw2233 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:46:23] <icinga-wm>	 RECOVERY - Apache HTTP on mw1317 is OK: HTTP OK: HTTP/1.1 200 OK - 10975 bytes in 0.001 second response time
[10:46:32] <icinga-wm>	 PROBLEM - puppet last run on mw1310 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:46:32] <icinga-wm>	 PROBLEM - puppet last run on mw2252 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:46:32] <icinga-wm>	 PROBLEM - puppet last run on mw2251 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:46:43] <icinga-wm>	 PROBLEM - puppet last run on mw2253 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:47:10] <_joe_>	 uh
[10:47:12] <godog>	 mhh I thought I had upgraded the exporter everywhere with this cumin query
[10:47:22] <godog>	 'R:class = profile::prometheus::nutcracker_exporter'
[10:47:22] <icinga-wm>	 PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:47:22] <icinga-wm>	 RECOVERY - Check systemd state on mw2169 is OK: OK - running: The system is fully operational
[10:47:24] <godog>	 clearly not
[10:47:29] <elukey>	 !log restart zookeeper on conf2001 for jvm updates - T179943
[10:47:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:40] <stashbot>	 T179943: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943
[10:47:41] <_joe_>	 godog: those hosts are the ones where puppet didn't run maybe
[10:47:53] <icinga-wm>	 PROBLEM - puppet last run on mw2101 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:48:11] <volans>	 godog: is it the same as 'R:Service = prometheus-nutcracker-exporter' ? 119 hosts
[10:48:39] <_joe_>	 godog: what's the correct version?
[10:48:40] <godog>	 yeah should be volans 
[10:48:42] <icinga-wm>	 PROBLEM - puppet last run on mw2169 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-nutcracker-exporter]
[10:48:51] <godog>	 _joe_: 0.2 or 0.2~trusty1 on trusty
[10:48:57] <volans>	 'P:prometheus::nutcracker_exporter' gets 129 hosts
[10:49:12] <_joe_>	 godog: it's 0.2 indeed on mw1293
[10:49:26] <godog>	 I guess because puppet ran and failed they never updated puppetdb and thus don't show up in the cumin query?
[10:49:38] <_joe_>	 godog: no it's something else I'd say
[10:49:42] <_joe_>	 lemme see
[10:50:22] <_joe_>	 which is a videoscaler btw
[10:50:29] <_joe_>	 sorry, imagescaler
[10:51:52] <icinga-wm>	 RECOVERY - configured eth on mw1317 is OK: OK - interfaces up
[10:51:53] <icinga-wm>	 RECOVERY - nutcracker port on mw1317 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[10:51:53] <icinga-wm>	 RECOVERY - Disk space on mw1317 is OK: DISK OK
[10:52:12] <icinga-wm>	 RECOVERY - DPKG on mw1317 is OK: All packages OK
[10:52:23] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw1317 is OK: OK ferm input default policy is set
[10:52:23] <icinga-wm>	 RECOVERY - dhclient process on mw1317 is OK: PROCS OK: 0 processes with command name dhclient
[10:52:23] <icinga-wm>	 RECOVERY - MD RAID on mw1317 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[10:52:42] <icinga-wm>	 RECOVERY - nutcracker process on mw1317 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[10:52:42] <icinga-wm>	 RECOVERY - HHVM processes on mw1317 is OK: PROCS OK: 6 processes with command name hhvm
[10:52:46] <moritzm>	 !log upgrading pdns-recursor on nescio to 4.0.4+deb9u3~bpo8+1 (security fix)
[10:52:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:06] <wikibugs>	 (CR) ArielGlenn: [C: 2] config setting to permit a list of wikis to be dumped in a specific order [dumps] - https://gerrit.wikimedia.org/r/398861 (owner: ArielGlenn)
[10:53:45] <elukey>	 !log reboot conf2001 for kernel updates - T179943
[10:53:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:58] <stashbot>	 T179943: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943
[10:54:32] <icinga-wm>	 RECOVERY - puppet last run on scb1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:54:32] <godog>	 _joe_: tried a new puppet run on mw2251 and it worked, I'll force a run where it failed
[10:54:40] <_joe_>	 yeah
[10:54:42] <logmsgbot>	 !log ariel@tin Started deploy [dumps/dumps@2bafffe]: allow dump runs in specified wiki list order, rather than by longest to wait
[10:54:45] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@2bafffe]: allow dump runs in specified wiki list order, rather than by longest to wait (duration: 00m 02s)
[10:54:47] <elukey>	 ah snap there is also etcd, always forget
[10:54:51] <_joe_>	 elukey: rebooting conf2001?
[10:54:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:52] <icinga-wm>	 RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:54:54] <_joe_>	 seriously?
[10:55:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:10] <elukey>	 _joe_ yes yes I am checking etcd now, but it needs to be done
[10:55:45] <_joe_>	 elukey: and it's ok, let's just verify for instance where the mirror is running
[10:55:52] <icinga-wm>	 RECOVERY - puppet last run on mw1293 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:56:02] <icinga-wm>	 RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[10:56:13] <elukey>	 _joe_ yes sorry I had a moment of "yeah there is only zk on those" and then as soon as I've hit "enter" I realized :D
[10:56:22] <icinga-wm>	 RECOVERY - puppet last run on mw2233 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[10:56:24] <elukey>	 (enter == SAL)
[10:56:29] <_joe_>	 yeah I got that
[10:56:31] <_joe_>	 so
[10:56:32] <icinga-wm>	 RECOVERY - puppet last run on mw1310 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[10:56:32] <icinga-wm>	 RECOVERY - puppet last run on mw2251 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[10:56:32] <icinga-wm>	 RECOVERY - puppet last run on mw2252 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[10:56:44] <_joe_>	 conf2001 or the whole cluster in codfw?
[10:56:50] <wikibugs>	 (03PS1) 10Hashar: Bump Jinja2 from 2.9.6 to 2.10 [docker-images/docker-pkg/deploy] - 10https://gerrit.wikimedia.org/r/399157
[10:56:55] <elukey>	 _joe_ I'll do the reboots after the freeze, only zk restarts now
[10:57:10] <_joe_>	 elukey: yeah I think that's advisable
[10:57:16] <elukey>	 yep yep
[10:57:20] <_joe_>	 I mean we can reboot those
[10:57:22] <icinga-wm>	 RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[10:57:29] <_joe_>	 but I'd wait if possible
[10:57:36] <elukey>	 +1, brain fault
[10:57:47] <_joe_>	 we'll have to get the new conf* servers in prod in eqiad anyways in january
[10:58:44] <icinga-wm>	 RECOVERY - puppet last run on mw2169 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:00:31] <wikibugs>	 (03CR) 10Filippo Giunchedi: Add PowerDNS exporter to labservices1001 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/399152 (owner: 10Muehlenhoff)
[11:00:54] <icinga-wm>	 PROBLEM - nova-compute process on labvirt1010 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute
[11:01:35] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw1317 is OK: OK: synced at Tue 2017-12-19 11:01:33 UTC.
[11:01:44] <icinga-wm>	 PROBLEM - MD RAID on mw1318 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:01:44] <icinga-wm>	 RECOVERY - puppet last run on mw2253 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[11:01:54] <icinga-wm>	 RECOVERY - nova-compute process on labvirt1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute
[11:03:47] <wikibugs>	 (03PS1) 10ArielGlenn: enable dumps of big wikis to run in a fixed order [puppet] - 10https://gerrit.wikimedia.org/r/399158
[11:04:01] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Move role::prometheus::k8s to profile [puppet] - 10https://gerrit.wikimedia.org/r/399159
[11:04:03] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging [puppet] - 10https://gerrit.wikimedia.org/r/399160
[11:04:41] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Introduce profile::prometheus::k8s::staging [puppet] - 10https://gerrit.wikimedia.org/r/399160 (owner: 10Alexandros Kosiaris)
[11:05:05] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1318 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:05:05] <icinga-wm>	 PROBLEM - configured eth on mw1318 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:05:33] <moritzm>	 ^ reimage, silencing
[11:06:34] <icinga-wm>	 RECOVERY - puppet last run on mw2101 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[11:10:35] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging [puppet] - 10https://gerrit.wikimedia.org/r/399160
[11:11:39] <wikibugs>	 (03CR) 10ArielGlenn: [C: 04-2] "Do not merge until current dump run completes." [puppet] - 10https://gerrit.wikimedia.org/r/399158 (owner: 10ArielGlenn)
[11:13:03] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3847595 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1330.eqiad.wmnet'] ``` The log can be...
[11:15:50] <wikibugs>	 (03CR) 10Hashar: "Or we can solely bump it in the deploy repo as done via https://gerrit.wikimedia.org/r/#/c/399157/" [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/399155 (owner: 10Hashar)
[11:16:21] <moritzm>	 !log uploaded pdns-recursor 4.0.4+deb9u3~bpo8+1 to apt.wikimedia.org
[11:16:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:32] <wikibugs>	 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring, 10User-fgiunchedi: Port redis statistics to Prometheus - https://phabricator.wikimedia.org/T148637#3847626 (10fgiunchedi) I "promoted" (renamed) the prometheus dashboard to "redis" and the previous to "redis-graphite": https://grafana.wikim...
[11:26:22] <wikibugs>	 (03PS1) 10Volans: wmf-auto-reimage: improve resume capabilities [puppet] - 10https://gerrit.wikimedia.org/r/399161 (https://phabricator.wikimedia.org/T182702)
[11:26:25] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Bump Jinja2 from 2.9.6 to 2.10 [docker-images/docker-pkg/deploy] - 10https://gerrit.wikimedia.org/r/399157 (owner: 10Hashar)
[11:26:35] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "So, should I include this in the role ? But then codfw is going to be polling the staging cluster. Should I create a new role ? But then w" [puppet] - 10https://gerrit.wikimedia.org/r/399160 (owner: 10Alexandros Kosiaris)
[11:27:53] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "> So, should I include this in the role ? But then codfw is going to" [puppet] - 10https://gerrit.wikimedia.org/r/399160 (owner: 10Alexandros Kosiaris)
[11:28:07] <logmsgbot>	 !log hashar@tin Started deploy [docker-pkg/deploy@09087ad]: Bumping Jinja2 2.9.6..2.10
[11:28:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:37] <logmsgbot>	 !log hashar@tin Finished deploy [docker-pkg/deploy@09087ad]: Bumping Jinja2 2.9.6..2.10 (duration: 00m 30s)
[11:28:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:23] <moritzm>	 !log upgrading pdns-recursor on maerlant to 4.0.4+deb9u3~bpo8+1 (security fix)
[11:29:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:37] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw1331 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:31:37] <icinga-wm>	 PROBLEM - configured eth on mw1331 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:33:24] <elukey>	 added downtime, new appserver --^
[11:34:03] <wikibugs>	 10Operations, 10MediaWiki-Configuration, 10discovery-system: [DRAFT] Use EtcdConfig in production to allow automation of a datacenter switch - https://phabricator.wikimedia.org/T182597#3847651 (10Volans) p:05Triage>03Normal
[11:35:25] <volans>	 thanks godog (kill+restore topic)
[11:36:14] <godog>	 np!
[11:36:18] <godog>	 the kill wasn't me tho
[11:40:01] <zeljkof>	 hashar, volans: do you know who is in charge these days for beta cluster? looks like enwiki is broken :( https://phabricator.wikimedia.org/T183232
[11:41:06] <volans>	 zeljkof: me not really
[11:41:15] <volans>	 but I can try to find out ;)
[11:41:38] <zeljkof>	 volans: that was my guess, but I was sure you would know more than I do :)
[11:41:44] <zeljkof>	 thanks
[11:41:45] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: add nutcracker job [puppet] - 10https://gerrit.wikimedia.org/r/399163 (https://phabricator.wikimedia.org/T181995)
[11:41:49] <wikibugs>	 (03CR) 10Hashar: "Bumping Jinja2 in /deploy fixed it for me ( https://gerrit.wikimedia.org/r/#/c/399157/ )." [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/399155 (owner: 10Hashar)
[11:42:26] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[11:42:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[11:44:41] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[11:45:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[11:46:02] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging [puppet] - 10https://gerrit.wikimedia.org/r/399160
[11:47:54] <wikibugs>	 (03PS3) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[11:48:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[11:49:26] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1318 is OK: OK: nf_conntrack is 0 % full
[11:49:26] <icinga-wm>	 RECOVERY - configured eth on mw1318 is OK: OK - interfaces up
[11:49:57] <icinga-wm>	 RECOVERY - MD RAID on mw1318 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[11:53:46] <icinga-wm>	 RECOVERY - configured eth on mw1331 is OK: OK - interfaces up
[11:57:00] <wikibugs>	 (03PS4) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[11:57:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[11:59:37] <hashar>	 !log CI: switching composer-php55 / composer-package-php55 jobs from Nodepool to Docker | https://gerrit.wikimedia.org/r/#/c/398920/
[11:59:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:59:54] <wikibugs>	 (03PS5) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[12:00:40] <icinga-wm>	 PROBLEM - Check systemd state on mw1330 is CRITICAL: Return code of 255 is out of bounds
[12:01:39] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw1331 is OK: OK: synced at Tue 2017-12-19 12:01:31 UTC.
[12:02:20] <icinga-wm>	 PROBLEM - configured eth on mw1330 is CRITICAL: Return code of 255 is out of bounds
[12:02:20] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw1330 is CRITICAL: Return code of 255 is out of bounds
[12:04:09] <icinga-wm>	 PROBLEM - dhclient process on mw1330 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:04:09] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1330 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:05:49] <icinga-wm>	 PROBLEM - DPKG on mw1330 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:05:49] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1330 is CRITICAL: Host mw1330 is not in mediawiki-installation dsh group
[12:07:30] <icinga-wm>	 PROBLEM - Disk space on mw1330 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:07:30] <icinga-wm>	 PROBLEM - nutcracker port on mw1330 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:08:24] <godog>	 silenced
[12:09:02] <volans>	 zeljkof: so it seems it should be mostly releng if I'm not mistaken
[12:09:13] <zeljkof>	 volans: :D
[12:09:27] <zeljkof>	 hashar: do you agree? ;) ^
[12:09:42] <zeljkof>	 I know hashar used to work on it, but I don't think he does any more
[12:10:14] <volans>	 and having a quick look at shinken there are a lot of things in alarm, not sure which one is the culprit. For example Puppet has been failing on deployment-mediawikiNN for 5 days due to a duplicate declaration
[12:10:17] <zeljkof>	 looks like it's a wikidata problem... (from the error message)
[12:11:45] <_joe_>	 I'm definitely sure deployment-prep is taken care of by releng.
[12:12:10] <_joe_>	 if that has changed, I didn't get the memo, or I didn't read it (both are equally possible)
[12:12:40] <hashar>	 !log CI: switching mwgate-composer-php70 job from Nodepool to Docker | https://gerrit.wikimedia.org/r/#/c/398921/
[12:12:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:29] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3847732 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1330.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['mw1330.eqiad.wmnet'] ```
[12:20:06] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: Create an envoy docker image. [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/396021
[12:25:17] <wikibugs>	 10Operations: After reimage Puppet order: sudo command failed - https://phabricator.wikimedia.org/T183236#3847744 (10Volans)
[12:25:27] <wikibugs>	 10Operations: After reimage Puppet order: sudo command failed - https://phabricator.wikimedia.org/T183236#3847754 (10Volans) p:05Triage>03Normal
[12:29:33] <icinga-wm>	 RECOVERY - configured eth on mw1330 is OK: OK - interfaces up
[12:29:43] <icinga-wm>	 RECOVERY - Disk space on mw1330 is OK: DISK OK
[12:29:43] <icinga-wm>	 RECOVERY - nutcracker port on mw1330 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[12:29:53] <icinga-wm>	 RECOVERY - Check systemd state on mw1330 is OK: OK - running: The system is fully operational
[12:29:54] <icinga-wm>	 RECOVERY - DPKG on mw1330 is OK: All packages OK
[12:30:14] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw1330 is OK: OK ferm input default policy is set
[12:30:14] <icinga-wm>	 RECOVERY - dhclient process on mw1330 is OK: PROCS OK: 0 processes with command name dhclient
[12:31:05] <wikibugs>	 (03PS3) 10ArielGlenn: dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798)
[12:31:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798) (owner: 10ArielGlenn)
[12:32:23] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw1330 is OK: OK: synced at Tue 2017-12-19 12:32:18 UTC.
[12:34:21] <wikibugs>	 (03PS4) 10ArielGlenn: dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798)
[12:48:49] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3847784 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1330.eqiad.wmnet'] ```  and were **ALL** successful.
[12:50:59] <wikibugs>	 (03PS5) 10ArielGlenn: dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798)
[12:54:18] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 031] Jonas Kress move from ldap to shell, add to groups [puppet] - 10https://gerrit.wikimedia.org/r/398524 (https://phabricator.wikimedia.org/T182908) (owner: 10RobH)
[13:03:54] <wikibugs>	 (03CR) 10Gehel: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/398858 (owner: 10Muehlenhoff)
[13:05:12] <logmsgbot>	 !log elukey@puppetmaster1001 conftool action : set/pooled=no; selector: name=mw133[0-1].eqiad.wmnet
[13:05:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:53] <logmsgbot>	 !log jmm@puppetmaster1001 conftool action : set/pooled=yes; selector: mw1318.eqiad.wmnet
[13:05:54] <elukey>	 volans: confirmed also from my side that everything works fine with wmf-auto-reimage now, thanks!
[13:06:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:35] <volans>	 elukey: good to know, thanks, I've sent a CR for some improvements on resume, JIC ;)
[13:07:11] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3847801 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1332.eqiad.wmnet', 'mw1333.eqiad.wmnet...
[13:07:58] <icinga-wm>	 RECOVERY - HHVM rendering on mw1317 is OK: HTTP OK: HTTP/1.1 200 OK - 73999 bytes in 0.145 second response time
[13:10:45] <wikibugs>	 (03PS3) 10Muehlenhoff: Add Prometheus exporter for Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/398858
[13:13:36] <logmsgbot>	 !log jmm@puppetmaster1001 conftool action : set/pooled=yes; selector: mw1317.eqiad.wmnet
[13:13:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:15] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Goal, 10Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3796474 (10dcausse) I ported elasticsearch-memory and elasticsearch-indexing. - https://grafana-admin.wikimedia.org/dashboard/db/el...
[13:15:17] <icinga-wm>	 RECOVERY - puppet last run on mw1317 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:33:31] <wikibugs>	 (03PS1) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315)
[13:34:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315) (owner: 10Gilles)
[13:37:26] <wikibugs>	 (03PS2) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315)
[13:37:44] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Add Prometheus exporter for Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/398858 (owner: 10Muehlenhoff)
[13:37:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315) (owner: 10Gilles)
[13:40:14] <wikibugs>	 (03PS3) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315)
[13:41:16] <wikibugs>	 (03PS2) 10Muehlenhoff: Add Prometheus scraper configs for WDQS updater and Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/398865
[13:42:28] <wikibugs>	 10Operations, 10ops-ulsfo, 10Traffic: cp4032 memory error - https://phabricator.wikimedia.org/T183176#3845938 (10ema) >>! In T183176#3847321, @Volans wrote: > @RobH FYI I've ack'ed the Icinga alert of the host down and set it to downtime until Fri UTC morning.  I've just ack'ed all related strongswan alerts...
[13:43:18] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Add Prometheus scraper configs for WDQS updater and Blazegraph [puppet] - 10https://gerrit.wikimedia.org/r/398865 (owner: 10Muehlenhoff)
[13:55:13] <wikibugs>	 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3847941 (10MoritzMuehlenhoff)
[13:55:15] <wikibugs>	 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Create Prometheus exporter for wdqs-updater - https://phabricator.wikimedia.org/T182773#3847939 (10MoritzMuehlenhoff) 05Open>03Resolved An exporter has been written, packaged and rolled out.
[13:55:18] <wikibugs>	 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3650139 (10MoritzMuehlenhoff)
[13:55:21] <wikibugs>	 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Create Prometheus exporter for Blazegraph - https://phabricator.wikimedia.org/T182857#3847942 (10MoritzMuehlenhoff) 05Open>03Resolved An exporter has been written, packaged and rolled out.
[13:57:56] <moritzm>	 !log upgrading pdns-recursor on achernar/acamar to 4.0.4+deb9u3~bpo8+1 (security fix)
[13:58:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:54] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/399159 (owner: 10Alexandros Kosiaris)
[14:01:47] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM (modulo jenkins' -1)" [puppet] - 10https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196) (owner: 10Alexandros Kosiaris)
[14:01:52] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1332 is CRITICAL: Host mw1332 is not in mediawiki-installation dsh group
[14:01:52] <icinga-wm>	 PROBLEM - DPKG on mw1332 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[14:02:37] <wikibugs>	 (03PS4) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315)
[14:05:01] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1317 is OK: OK
[14:05:51] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1330 is OK: OK
[14:07:01] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] "Looks ok to me: https://puppet-compiler.wmflabs.org/compiler02/9411/" [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[14:08:20] <wikibugs>	 (03PS6) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[14:11:49] <wikibugs>	 10Operations, 10DBA: Reimage and upgrade to stretch all proxies - https://phabricator.wikimedia.org/T183249#3848018 (10jcrespo) p:05Triage>03Normal
[14:12:07] <wikibugs>	 (03PS5) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315)
[14:12:17] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#3848045 (10jcrespo)
[14:12:19] <wikibugs>	 10Operations, 10DBA: Reimage and upgrade to stretch all proxies - https://phabricator.wikimedia.org/T183249#3848043 (10jcrespo)
[14:13:35] <wikibugs>	 (03PS3) 10Ema: mtail: add varnishreqstats.mtail [puppet] - 10https://gerrit.wikimedia.org/r/398819 (https://phabricator.wikimedia.org/T177199)
[14:14:20] <wikibugs>	 (03PS7) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[14:16:41] <wikibugs>	 (03PS6) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315)
[14:16:52] <wikibugs>	 (03CR) 10Filippo Giunchedi: "See inline, LGTM overall." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/399160 (owner: 10Alexandros Kosiaris)
[14:18:48] <wikibugs>	 (03CR) 10Filippo Giunchedi: mtail: add varnishreqstats.mtail (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/398819 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema)
[14:23:02] <wikibugs>	 (03CR) 10Gilles: "Puppet compiler: https://puppet-compiler.wmflabs.org/compiler02/9415/" [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315) (owner: 10Gilles)
[14:25:01] <icinga-wm>	 RECOVERY - DPKG on mw1332 is OK: All packages OK
[14:26:47] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Move role::prometheus::k8s to profile [puppet] - 10https://gerrit.wikimedia.org/r/399159
[14:26:56] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Reimage and upgrade to stretch all dbproxies - https://phabricator.wikimedia.org/T183249#3848099 (10jcrespo)
[14:27:01] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Move role::prometheus::k8s to profile [puppet] - 10https://gerrit.wikimedia.org/r/399159 (owner: 10Alexandros Kosiaris)
[14:28:26] <jynus>	 !log disabling puppet on dbproxies for 399164 deploy
[14:28:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:28] <wikibugs>	 (03PS4) 10Ema: mtail: add varnishreqstats.mtail [puppet] - 10https://gerrit.wikimedia.org/r/398819 (https://phabricator.wikimedia.org/T177199)
[14:30:09] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[14:30:17] <wikibugs>	 (03PS8) 10Jcrespo: mariadb: Preparing reimage of dbproxy1001 and setup proxy firewall [puppet] - 10https://gerrit.wikimedia.org/r/399164 (https://phabricator.wikimedia.org/T148507)
[14:30:46] <wikibugs>	 (03CR) 10Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/399160 (owner: 10Alexandros Kosiaris)
[14:30:56] <wikibugs>	 (03PS4) 10Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging [puppet] - 10https://gerrit.wikimedia.org/r/399160
[14:34:14] <wikibugs>	 (03PS1) 10Catrope: Depool deployment-db04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399182 (https://phabricator.wikimedia.org/T183252)
[14:35:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Depool deployment-db04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399182 (https://phabricator.wikimedia.org/T183252) (owner: 10Catrope)
[14:35:37] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging [puppet] - 10https://gerrit.wikimedia.org/r/399160
[14:36:09] <wikibugs>	 (03PS5) 10Ema: mtail: add varnishreqstats.mtail [puppet] - 10https://gerrit.wikimedia.org/r/398819 (https://phabricator.wikimedia.org/T177199)
[14:37:33] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Reimage and upgrade to stretch all dbproxies - https://phabricator.wikimedia.org/T183249#3848018 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['dbproxy1001.eqiad.wmnet'] ``` The log can be found in `/var/...
[14:38:46] <wikibugs>	 (03PS3) 10Muehlenhoff: Add PowerDNS exporter to labservices1001 [puppet] - 10https://gerrit.wikimedia.org/r/399152
[14:39:08] <wikibugs>	 (03CR) 10Ema: mtail: add varnishreqstats.mtail (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/398819 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema)
[14:40:37] <wikibugs>	 (03PS1) 10Rush: openstack: labvirt role shuffle [puppet] - 10https://gerrit.wikimedia.org/r/399183
[14:42:43] <wikibugs>	 (03PS2) 10Catrope: Depool deployment-db04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399182 (https://phabricator.wikimedia.org/T183252)
[14:44:41] <moritzm>	 !log installing request-tracker4 update from jessie point release on ununpentium
[14:44:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:03] <wikibugs>	 (03CR) 10BBlack: [C: 031] mtail: add varnishreqstats.mtail [puppet] - 10https://gerrit.wikimedia.org/r/398819 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema)
[14:45:11] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] Depool deployment-db04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399182 (https://phabricator.wikimedia.org/T183252) (owner: 10Catrope)
[14:49:23] <wikibugs>	 (03PS2) 10Rush: openstack: labvirt role shuffle [puppet] - 10https://gerrit.wikimedia.org/r/399183
[14:49:33] <moritzm>	 !log restarting hhvm on canary app servers to pick up security updates for openssl, icu and libx11
[14:49:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:43] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3848236 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1334.eqiad.wmnet', 'mw1333.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['mw1334.eq...
[14:51:25] <wikibugs>	 (03CR) 10Catrope: [C: 032] Depool deployment-db04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399182 (https://phabricator.wikimedia.org/T183252) (owner: 10Catrope)
[14:52:54] <wikibugs>	 (03Merged) 10jenkins-bot: Depool deployment-db04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399182 (https://phabricator.wikimedia.org/T183252) (owner: 10Catrope)
[14:53:13] <wikibugs>	 (03CR) 10jenkins-bot: Depool deployment-db04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399182 (https://phabricator.wikimedia.org/T183252) (owner: 10Catrope)
[14:53:40] <wikibugs>	 (03CR) 10Ema: [C: 032] mtail: add varnishreqstats.mtail [puppet] - 10https://gerrit.wikimedia.org/r/398819 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema)
[14:54:51] <wikibugs>	 (03CR) 10Herron: "Thanks Volans!" [puppet] - 10https://gerrit.wikimedia.org/r/398120 (https://phabricator.wikimedia.org/T182819) (owner: 10Herron)
[14:55:05] <wikibugs>	 (03CR) 10Herron: [C: 04-2] "Not to be merged until after holiday break" [puppet] - 10https://gerrit.wikimedia.org/r/398120 (https://phabricator.wikimedia.org/T182819) (owner: 10Herron)
[14:55:09] <wikibugs>	 (03PS3) 10Rush: openstack: labvirt role shuffle [puppet] - 10https://gerrit.wikimedia.org/r/399183
[14:55:23] <volans>	 yw herron :)
[14:55:38] <wikibugs>	 (03PS1) 10Marostegui: Revert "Revert "Revert "db-eqiad.php: Depool db1106""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399186
[14:55:46] <wikibugs>	 (03CR) 10Rush: [C: 032] openstack: labvirt role shuffle [puppet] - 10https://gerrit.wikimedia.org/r/399183 (owner: 10Rush)
[14:55:51] <wikibugs>	 (03PS2) 10Marostegui: Revert "Revert "Revert "db-eqiad.php: Depool db1106""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399186
[14:59:08] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "Revert "Revert "db-eqiad.php: Depool db1106""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399186 (owner: 10Marostegui)
[15:00:36] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Revert "Revert "db-eqiad.php: Depool db1106""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399186 (owner: 10Marostegui)
[15:00:49] <wikibugs>	 (03CR) 10jenkins-bot: Revert "Revert "Revert "db-eqiad.php: Depool db1106""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399186 (owner: 10Marostegui)
[15:01:40] <wikibugs>	 (03PS1) 10Muehlenhoff: Add library hint for libx11 [puppet] - 10https://gerrit.wikimedia.org/r/399189
[15:01:41] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1106 - T161294 (duration: 00m 52s)
[15:01:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:54] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[15:04:06] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1109, db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/399191 (https://phabricator.wikimedia.org/T161294)
[15:07:49] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1109, db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/399191 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[15:08:13] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Add library hint for libx11 [puppet] - https://gerrit.wikimedia.org/r/399189 (owner: Muehlenhoff)
[15:09:27] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1109, db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/399191 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[15:09:40] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1109, db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/399191 (https://phabricator.wikimedia.org/T161294) (owner: Marostegui)
[15:10:45] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 and db1109 - T161294 (duration: 00m 51s)
[15:10:50] <marostegui>	 !log Stop replication in sync on db1109 and db1099:3318 - https://phabricator.wikimedia.org/T161294
[15:10:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:54] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[15:11:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:48] <wikibugs>	 (PS1) Jcrespo: dbproxy: Apply both regular and cloud only exception for 'cloud' [puppet] - https://gerrit.wikimedia.org/r/399194 (https://phabricator.wikimedia.org/T104699)
[15:14:29] <wikibugs>	 (CR) Jcrespo: [C: 2] dbproxy: Apply both regular and cloud only exception for 'cloud' [puppet] - https://gerrit.wikimedia.org/r/399194 (https://phabricator.wikimedia.org/T104699) (owner: Jcrespo)
[15:14:43] <wikibugs>	 (PS3) Ottomata: Set superset auth_settings => undef if not using ldap_proxy [puppet] - https://gerrit.wikimedia.org/r/396143 (https://phabricator.wikimedia.org/T166689)
[15:14:49] <wikibugs>	 (CR) Ottomata: [V: 2 C: 2] Set superset auth_settings => undef if not using ldap_proxy [puppet] - https://gerrit.wikimedia.org/r/396143 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[15:15:38] <jynus>	 moritzm: ok to merge?
[15:16:05] <wikibugs>	 (PS1) Elukey: role::druid::analytics: lower down all the Xms settings [puppet] - https://gerrit.wikimedia.org/r/399195
[15:17:42] <moritzm>	 jynus: yes, sorry
[15:18:22] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[15:18:51] <wikibugs>	 (CR) Elukey: [C: 2] role::druid::analytics: lower down all the Xms settings [puppet] - https://gerrit.wikimedia.org/r/399195 (owner: Elukey)
[15:19:08] <wikibugs>	 (CR) Mforns: [C: 1] "LVGTM!" [puppet] - https://gerrit.wikimedia.org/r/399153 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[15:19:20] <ottomata>	 elukey:   my thing can be merged
[15:19:24] <ottomata>	 it's a no-op
[15:19:32] <ottomata>	 i'll let you puppet-merge yours and mine
[15:19:40] <elukey>	 super
[15:20:16] <wikibugs>	 (PS3) Elukey: eventlogging_purging_whitelist.tsv: remove old table [puppet] - https://gerrit.wikimedia.org/r/399153 (https://phabricator.wikimedia.org/T108850)
[15:20:22] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge.
[15:21:59] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] network::constants: drop uranium from monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/399119 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[15:22:05] <wikibugs>	 (PS2) Alexandros Kosiaris: network::constants: drop uranium from monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/399119 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[15:22:07] <wikibugs>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] network::constants: drop uranium from monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/399119 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[15:22:58] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 1] remove ganglia_aggregators settings from hiera [puppet] - https://gerrit.wikimedia.org/r/399120 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[15:24:43] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] "The commit message needs some rewording. It gives the impression we are wondering about how to do things while the commit itself is clear " [puppet] - https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[15:24:57] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 1] ganglia: delete the module [puppet] - https://gerrit.wikimedia.org/r/382933 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[15:25:54] <chasemp>	 !log labvirt10[19|20] aptitude install linux-image-4.4.0-81-generic linux-image-extra-4.4.0-81-generic; sudo update-grub; /sbin/reboot T172538
[15:26:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:05] <stashbot>	 T172538: rack/setup/install labvirt10(19|20).eqiad.wmnet - https://phabricator.wikimedia.org/T172538
[15:29:49] <wikibugs>	 (Abandoned) Ottomata: [WIP] Add cergen module [puppet] - https://gerrit.wikimedia.org/r/391134 (https://phabricator.wikimedia.org/T166167) (owner: Ottomata)
[15:30:21] <wikibugs>	 (PS6) Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging [puppet] - https://gerrit.wikimedia.org/r/399160
[15:30:23] <wikibugs>	 (PS5) Alexandros Kosiaris: Add prometheus::postgres_exporter class to users [puppet] - https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196)
[15:30:57] <wikibugs>	 (PS6) Alexandros Kosiaris: Add prometheus::postgres_exporter class to users [puppet] - https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196)
[15:31:29] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add prometheus::postgres_exporter class to users [puppet] - https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196) (owner: Alexandros Kosiaris)
[15:31:46] <akosiaris>	 damn I hate you jenkins
[15:33:45] <SantaC>	 rip
[15:33:46] <wikibugs>	 (PS7) Alexandros Kosiaris: Add prometheus::postgres_exporter class to users [puppet] - https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196)
[15:38:16] <wikibugs>	 Operations, Discovery-Search (Current work), Goal, Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3848419 (Gehel) Additional missing metrics:  * elasticsearch.indices.search.groups.prefix.query_total * elasticsearch.indices.sea...
[15:39:16] <icinga-wm>	 PROBLEM - NTP peers on achernar is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
[15:40:16] <icinga-wm>	 RECOVERY - NTP peers on achernar is OK: NTP OK: Offset 0.000159 secs
[15:40:24] <wikibugs>	 (PS1) Ema: prometheus: add reqstats aggregation rule [puppet] - https://gerrit.wikimedia.org/r/399199 (https://phabricator.wikimedia.org/T177199)
[15:41:46] <icinga-wm>	 PROBLEM - NTP peers on nescio is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
[15:41:46] <icinga-wm>	 PROBLEM - NTP peers on acamar is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
[15:42:06] <wikibugs>	 (CR) Muehlenhoff: [C: 1] prometheus: add nutcracker job [puppet] - https://gerrit.wikimedia.org/r/399163 (https://phabricator.wikimedia.org/T181995) (owner: Filippo Giunchedi)
[15:42:46] <icinga-wm>	 RECOVERY - NTP peers on nescio is OK: NTP OK: Offset -0.00034 secs
[15:42:46] <icinga-wm>	 RECOVERY - NTP peers on acamar is OK: NTP OK: Offset -0.00036 secs
[15:43:12] <wikibugs>	 (PS1) Jcrespo: mariadb: Fix bug by which the wrong role was being set up on dbproxies [puppet] - https://gerrit.wikimedia.org/r/399200
[15:44:30] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Fix bug by which the wrong role was being set up on dbproxies [puppet] - https://gerrit.wikimedia.org/r/399200 (owner: Jcrespo)
[15:45:46] <icinga-wm>	 PROBLEM - NTP peers on hydrogen is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
[15:46:38] <wikibugs>	 Operations, Discovery-Search (Current work), Goal, Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3848492 (Gehel) Issue opened upstream to include those metrics: https://github.com/justwatchcom/elasticsearch_exporter/issues/115
[15:46:46] <icinga-wm>	 RECOVERY - NTP peers on hydrogen is OK: NTP OK: Offset -1e-06 secs
[15:48:26] <wikibugs>	 (PS2) Ema: prometheus: add reqstats aggregation rule [puppet] - https://gerrit.wikimedia.org/r/399199 (https://phabricator.wikimedia.org/T177199)
[15:51:35] <wikibugs>	 (CR) Ema: [C: 2] prometheus: add reqstats aggregation rule [puppet] - https://gerrit.wikimedia.org/r/399199 (https://phabricator.wikimedia.org/T177199) (owner: Ema)
[15:54:35] <icinga-wm>	 PROBLEM - NTP peers on chromium is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
[15:55:05] <icinga-wm>	 PROBLEM - NTP peers on maerlant is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
[15:55:34] <wikibugs>	 Operations, Traffic, Goal, Patch-For-Review, User-fgiunchedi: Add Prometheus client support for varnish/statsd metrics daemons - https://phabricator.wikimedia.org/T177199#3848555 (ema)
[15:55:35] <icinga-wm>	 RECOVERY - NTP peers on chromium is OK: NTP OK: Offset 0.008719 secs
[15:56:05] <icinga-wm>	 RECOVERY - NTP peers on maerlant is OK: NTP OK: Offset -0.000284 secs
[15:59:41] <wikibugs>	 Operations, Discovery-Search (Current work), Goal, Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3848559 (Gehel) >>! In T181627#3845066, @Ottomata wrote: > I'm probably doing something wrong with the ~jessie1 and ~stretch1 ver...
[16:01:58] <wikibugs>	 (PS1) Elukey: role::druid::analytics::worker: review jvm configurations [puppet] - https://gerrit.wikimedia.org/r/399205
[16:04:19] <wikibugs>	 (CR) Elukey: [C: 2] eventlogging_purging_whitelist.tsv: remove old table [puppet] - https://gerrit.wikimedia.org/r/399153 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[16:04:30] <wikibugs>	 (PS4) Elukey: eventlogging_purging_whitelist.tsv: remove old table [puppet] - https://gerrit.wikimedia.org/r/399153 (https://phabricator.wikimedia.org/T108850)
[16:04:38] <wikibugs>	 (CR) Elukey: [V: 2 C: 2] eventlogging_purging_whitelist.tsv: remove old table [puppet] - https://gerrit.wikimedia.org/r/399153 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[16:06:27] <wikibugs>	 Operations, Discovery-Search (Current work), Goal, Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3796474 (MoritzMuehlenhoff) >  As I understand it, Go statically links everything, so the same build should still be good for bot...
[16:10:40] <wikibugs>	 (PS1) Catrope: Remove temporary read-only setting for beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/399207 (https://phabricator.wikimedia.org/T183252)
[16:10:59] <wikibugs>	 (CR) Catrope: [C: 2] Remove temporary read-only setting for beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/399207 (https://phabricator.wikimedia.org/T183252) (owner: Catrope)
[16:12:38] <wikibugs>	 (Merged) jenkins-bot: Remove temporary read-only setting for beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/399207 (https://phabricator.wikimedia.org/T183252) (owner: Catrope)
[16:12:52] <wikibugs>	 (CR) jenkins-bot: Remove temporary read-only setting for beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/399207 (https://phabricator.wikimedia.org/T183252) (owner: Catrope)
[16:14:55] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] "PCC happy at https://puppet-compiler.wmflabs.org/compiler02/9423/, let's break puppet on those hosts ;-)" [puppet] - https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196) (owner: Alexandros Kosiaris)
[16:15:02] <wikibugs>	 (PS8) Alexandros Kosiaris: Add prometheus::postgres_exporter class to users [puppet] - https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196)
[16:15:04] <wikibugs>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] Add prometheus::postgres_exporter class to users [puppet] - https://gerrit.wikimedia.org/r/392441 (https://phabricator.wikimedia.org/T177196) (owner: Alexandros Kosiaris)
[16:15:43] <wikibugs>	 Operations, MediaWiki-Vagrant: Import kibana package from jessie into stretch - https://phabricator.wikimedia.org/T183071#3848623 (EBernhardson) @gehel It looks like we need to release stretch packages for all our custom elastic stuff (kibana, logstash, es, plugins?). My intuition is that since this is a...
[16:16:52] <wikibugs>	 (PS2) Volans: Jonas Kress move from ldap to shell, add to groups [puppet] - https://gerrit.wikimedia.org/r/398524 (https://phabricator.wikimedia.org/T182908) (owner: RobH)
[16:18:19] <wikibugs>	 (PS1) Giuseppe Lavagetto: puppet-compiler: fix facts update process [puppet] - https://gerrit.wikimedia.org/r/399210
[16:18:55] <_joe_>	 volans, akosiaris ^^
[16:19:25] <wikibugs>	 Operations, MediaWiki-Vagrant: Import kibana package from jessie into stretch - https://phabricator.wikimedia.org/T183071#3842659 (MoritzMuehlenhoff) >>! In T183071#3848623, @EBernhardson wrote: > @gehel It looks like we need to release stretch packages for all our custom elastic stuff (kibana, logstash,...
[16:19:42] <wikibugs>	 Operations, MediaWiki-Vagrant: Import kibana package from jessie into stretch - https://phabricator.wikimedia.org/T183071#3848646 (Gehel) The elasticsearch / kibana / logstash packages have already been uploaded to our stretch repo (under the thirdparty/elastic55 component). This should fix the issue rep...
[16:20:05] <wikibugs>	 (CR) Volans: [C: 2] Jonas Kress move from ldap to shell, add to groups [puppet] - https://gerrit.wikimedia.org/r/398524 (https://phabricator.wikimedia.org/T182908) (owner: RobH)
[16:20:19] <wikibugs>	 (PS1) Awight: Disable the ORES UI on beta wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266)
[16:21:44] <wikibugs>	 Operations, ops-ulsfo, Traffic: cp4032 memory error - https://phabricator.wikimedia.org/T183176#3848653 (RobH) Error codes from ePSA test:  Service Tag : 3ND3KH2 Error Code : 2000-0125 Validation : 107826
[16:22:38] <moritzm>	 !log installing ncurses updates from jessie point release
[16:22:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:04] <icinga-wm>	 PROBLEM - puppet last run on maps1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Augeas[hba_create-prometheus@localhost]
[16:23:23] <icinga-wm>	 PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:23:54] <icinga-wm>	 RECOVERY - IPsec on cp1065 is OK: Strongswan OK - 44 ESP OK
[16:23:54] <icinga-wm>	 RECOVERY - IPsec on cp2004 is OK: Strongswan OK - 56 ESP OK
[16:23:54] <icinga-wm>	 RECOVERY - IPsec on cp2019 is OK: Strongswan OK - 56 ESP OK
[16:23:54] <icinga-wm>	 RECOVERY - IPsec on cp1055 is OK: Strongswan OK - 44 ESP OK
[16:24:03] <icinga-wm>	 RECOVERY - Host cp4032 is UP: PING OK - Packet loss = 0%, RTA = 78.58 ms
[16:24:03] <icinga-wm>	 RECOVERY - IPsec on kafka1020 is OK: Strongswan OK - 114 ESP OK
[16:24:03] <icinga-wm>	 RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 114 ESP OK
[16:24:04] <icinga-wm>	 RECOVERY - IPsec on cp2007 is OK: Strongswan OK - 56 ESP OK
[16:24:13] <icinga-wm>	 RECOVERY - IPsec on cp1068 is OK: Strongswan OK - 44 ESP OK
[16:24:13] <icinga-wm>	 RECOVERY - IPsec on kafka1022 is OK: Strongswan OK - 114 ESP OK
[16:24:13] <icinga-wm>	 RECOVERY - IPsec on kafka1013 is OK: Strongswan OK - 114 ESP OK
[16:24:23] <icinga-wm>	 RECOVERY - IPsec on cp2016 is OK: Strongswan OK - 56 ESP OK
[16:24:23] <icinga-wm>	 RECOVERY - IPsec on cp1053 is OK: Strongswan OK - 44 ESP OK
[16:24:26] <wikibugs>	 Operations, ops-ulsfo, Traffic: cp4032 memory error - https://phabricator.wikimedia.org/T183176#3848685 (BBlack) Dell info says that code means: `The IPMI system event log is full for various reasons or logging has stopped because too many ECC errors have occurred.`
[16:24:35] <icinga-wm>	 RECOVERY - IPsec on cp2023 is OK: Strongswan OK - 56 ESP OK
[16:24:35] <icinga-wm>	 RECOVERY - IPsec on cp1052 is OK: Strongswan OK - 44 ESP OK
[16:24:35] <icinga-wm>	 RECOVERY - IPsec on cp2010 is OK: Strongswan OK - 56 ESP OK
[16:24:35] <icinga-wm>	 RECOVERY - IPsec on cp2001 is OK: Strongswan OK - 56 ESP OK
[16:24:35] <icinga-wm>	 RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 44 ESP OK
[16:24:43] <icinga-wm>	 RECOVERY - IPsec on cp1054 is OK: Strongswan OK - 44 ESP OK
[16:24:43] <icinga-wm>	 RECOVERY - IPsec on kafka1023 is OK: Strongswan OK - 114 ESP OK
[16:24:44] <icinga-wm>	 RECOVERY - IPsec on cp1066 is OK: Strongswan OK - 44 ESP OK
[16:24:53] <icinga-wm>	 RECOVERY - IPsec on kafka1014 is OK: Strongswan OK - 114 ESP OK
[16:24:53] <icinga-wm>	 RECOVERY - IPsec on cp2013 is OK: Strongswan OK - 56 ESP OK
[16:25:54] <icinga-wm>	 PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Augeas[hba_create-prometheus@localhost]
[16:27:23] <icinga-wm>	 PROBLEM - https://phabricator.wikimedia.org on phab1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string focus on bug not found on https://phabricator.wikimedia.org:443https://phabricator.wikimedia.org/ - 4297 bytes in 2.051 second response time
[16:27:55] <wikibugs>	 (CR) Halfak: [C: 1] Disable the ORES UI on beta wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:28:17] <bblack>	 A Troublesome Encounter!
[16:28:19] <bblack>	 Woe! This request had its journey cut short by unexpected circumstances (Can Not Connect to MySQL).
[16:28:23] <bblack>	 ^ is what phab is saying
[16:28:57] <icinga-wm>	 PROBLEM - puppet last run on maps2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Augeas[hba_create-prometheus@localhost]
[16:29:17] <jynus>	 it is now back, right?
[16:29:17] <_joe_>	 bblack: known, jynus is on it
[16:29:18] <icinga-wm>	 RECOVERY - https://phabricator.wikimedia.org on phab1001 is OK: HTTP OK: HTTP/1.1 200 OK - 34525 bytes in 0.297 second response time
[16:29:27] <_joe_>	 can someone look at that augeas thing?
[16:29:35] <jynus>	 where is phabricator installed?
[16:29:36] <_joe_>	 akosiaris: maps2001 is you?
[16:29:39] <akosiaris>	 I am the reason for augeas
[16:29:43] <_joe_>	 jynus: phab1001/2001
[16:29:46] <akosiaris>	 I am fixing... trying to at least
[16:29:52] <wikibugs>	 (CR) Awight: [C: 2] "Self-merging "urgent" beta change." [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:30:04] <wikibugs>	 Operations, ops-ulsfo, Traffic: cp4032 memory error - https://phabricator.wikimedia.org/T183176#3848695 (RobH) Yeah, it turns up nothing but the error codes for the actual failed dimm.  It doesn't matter much, just helps for the part replacement.   SR958387090 is the self dispatch part # for the repl...
[16:30:37] <jynus>	 so the firewall was as expected, but it is not allowing all phab clients, apparently
[16:30:42] <robh>	 bblack: ^ new memory should be here tomorrow, and i'll either put it in then or thursday =]
[16:32:14] <wikibugs>	 (PS1) Alexandros Kosiaris: Remove single quotes from netbox prometheus [puppet] - https://gerrit.wikimedia.org/r/399216
[16:32:55] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] Remove single quotes from netbox prometheus [puppet] - https://gerrit.wikimedia.org/r/399216 (owner: Alexandros Kosiaris)
[16:33:07] <icinga-wm>	 PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:33:27] <wikibugs>	 (Merged) jenkins-bot: Disable the ORES UI on beta wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:33:40] <wikibugs>	 (CR) jenkins-bot: Disable the ORES UI on beta wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:33:57] <icinga-wm>	 PROBLEM - puppet last run on nihal is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 8 seconds ago with 1 failures. Failed resources (up to 3 shown): Augeas[hba_create-prometheus@localhost]
[16:34:07] <icinga-wm>	 PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Augeas[hba_create-prometheus@localhost]
[16:34:17] <akosiaris>	 all these expected ^ kind of
[16:34:36] <wikibugs>	 Operations, Discovery-Search (Current work), Goal, Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3848717 (Ottomata) > While elasticsearch is JVM, the exporter is Go ?    Huh, I thought we were talking about prometheus-jmx-expo...
[16:34:39] <elukey>	 !log manually started eventlogging cleaner on db1107 to purge/sanitize data up to 90 days ago (tmux is running for user eventlogcleaner) - T108850
[16:34:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:50] <stashbot>	 T108850: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850
[16:35:59] <awight>	 RoanKattouw: FYI I’m pushing “Remove temporary read-only setting for beta labs” on tin and tin-beta to keep in sync (and cos I have a beta config patch that comes after it).
[16:36:10] <RoanKattouw>	 awight: Thanks and sorry
[16:36:38] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/399217
[16:36:42] <wikibugs>	 (PS2) Marostegui: Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/399217
[16:36:44] <awight>	 RoanKattouw: :D IOU several dozen awesome late-night/weekend saves, don’t be sorry.
[16:37:00] <wikibugs>	 (PS1) Alexandros Kosiaris: osm::slave: Correct dependency of prometheus [puppet] - https://gerrit.wikimedia.org/r/399218
[16:37:23] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] osm::slave: Correct dependency of prometheus [puppet] - https://gerrit.wikimedia.org/r/399218 (owner: Alexandros Kosiaris)
[16:37:27] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1003 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[16:37:54] <RoanKattouw>	 awight: Thursday's ORES breakage is actually directly responsible for me traveling to a freezing environment without a coat :D
[16:38:22] <awight>	 That’s terrible news.  Feel free to burn any ORES you still have clinging to your T-shirt.
[16:38:31] <RoanKattouw>	 (I had planned to go home around 4pm to pack, then go to a 6pm meeting, then sleep, but instead I ended up fighting the ORES fire from 4pm till about 5:40pm, and packing hastily around midnight, forgetting my coat)
[16:38:37] <awight>	 And buy a new coat…
[16:38:42] <logmsgbot>	 !log awight@tin Synchronized wmf-config/CirrusSearch-labs.php: wmf-config/CommonSettings-labs.php wmf-config/db-labs.php wmf-config/InitialiseSettings-labs.php wmf-config/interwiki-labs.php wmf-config/jobqueue-labs.php wmf-config/mc-labs.php wmf-config/mobile-labs.php wmf-config/Wikibase-labs.php Sync out labs config changes (duration: 00m 51s)
[16:38:45] <RoanKattouw>	 Oh well, I didn't like that coat anyway, so now I have a good excuse to buy a new one
[16:38:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:57] <awight>	 lol Salvation Army here we come
[16:39:27] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1005 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[16:39:28] <halfak>	 Oh no RoanKattouw!
[16:40:07] <RoanKattouw>	 Thankfully it is less freezing now, it's in the mid-40s
[16:40:51] <awight>	 halfak: I think we actually owe RoanKattouw a coat at least as nice as the ones we have :p
[16:41:05] <RoanKattouw>	 awight: Don't you live in a warm country now?
[16:41:06] <awight>	 RoanKattouw: You ever read this Gogol short story… The Overcoat...
[16:41:19] <awight>	 RoanKattouw: ;-) rats you got right to the bottom of that bluff
[16:41:28] <halfak>	 I dunno man.  The coats I carry around cost almost as much as my bike. 
[16:41:39] <halfak>	 I live in the land of necessary winter technology. 
[16:42:04] <SantaC>	 Nowhere can be as cold on average as Yakutsk, Russia
[16:42:38] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1010 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[16:42:43] <halfak>	 SantaC, yeah, on average, MN is pretty warm.  But -40 F/C is a common occurrence in a MN winter. 
[16:42:55] <halfak>	 We get hot in the summer and cold in the winter. :) 
[16:42:59] <wikibugs>	 (CR) Addshore: "Woo!" [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:43:27] <halfak>	 Looks like Yakutsk gets a lot colder than MN though
[16:43:33] <wikibugs>	 (CR) Addshore: "Seems lame that the solution is to disable the Ui though :(" [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:43:36] <halfak>	 https://weatherspark.com/y/142848/Average-Weather-in-Yakutsk-Russia-Year-Round
[16:43:46] <SantaC>	 freeze off your face cold
[16:44:55] <wikibugs>	 (CR) Awight: "@addshore: I agree that this is a lame workaround, but see the task for details.  There are two issues, one is that the models just should" [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:45:25] <marostegui>	 !log Defragment s7 databases on db1102 - https://phabricator.wikimedia.org/T172169
[16:45:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:41] <wikibugs>	 (CR) Catrope: "Per the task, there is a maintenance script that we could run to put the DB in a good state, but we can't run it right now because it need" [mediawiki-config] - https://gerrit.wikimedia.org/r/399211 (https://phabricator.wikimedia.org/T183266) (owner: Awight)
[16:47:47] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1011 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[16:48:23] <wikibugs>	 (PS1) Jcrespo: mariadb: Fix bug causing too restrictive firewall on some proxies [puppet] - https://gerrit.wikimedia.org/r/399221
[16:48:40] <RoanKattouw>	 awight: To be completely honest I've actually forgotten which country you moved to, I just remembered that it sounded warm :)
[16:49:18] <awight>	 RoanKattouw: hehe, not your job to worry.  I’m in the Sacred Valley, Peru.  Did you go back home to bikelandia for the holidays?
[16:49:36] <wikibugs>	 (PS2) Jcrespo: mariadb: Fix bug causing too restrictive firewall on some proxies [puppet] - https://gerrit.wikimedia.org/r/399221
[16:49:45] <awight>	 It is rainy season but I’m fine with that.  It’s magnificent to watch the irrigation canals here.
[16:50:26] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/399217 (owner: Marostegui)
[16:50:30] <RoanKattouw>	 I did
[16:50:47] <RoanKattouw>	 -1C on Sunday but now back up to about 6C
[16:51:40] <RoanKattouw>	 awight: Nice! When you said you were considering moving away from the suburb you were in, I didn't think you'd go quite that far :D
[16:51:44] <awight>	 gross.  Don’t go licking any chain link fences.
[16:52:06] <awight>	 haha I don’t do anything half-assed.  Unless it’s writing code and config patches for a top-5 website.
[16:53:07] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/399217 (owner: Marostegui)
[16:53:20] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1097:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/399217 (owner: Marostegui)
[16:54:17] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1097:3315 - T161294 (duration: 00m 51s)
[16:54:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:54:27] <stashbot>	 T161294: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294
[16:54:31] <wikibugs>	 (PS1) Rush: dumps: add wikidata-primary-sources-tool mount [puppet] - https://gerrit.wikimedia.org/r/399223 (https://phabricator.wikimedia.org/T183229)
[16:56:10] <wikibugs>	 Operations, Ops-Access-Requests, Patch-For-Review, User-Addshore: Requesting access to analytics-privatedata-users group for Jonas Kress - https://phabricator.wikimedia.org/T182908#3848782 (Volans)
[16:57:02] <wikibugs>	 Operations, Ops-Access-Requests, User-Addshore: Requesting access to analytics-privatedata-users group for Jonas Kress - https://phabricator.wikimedia.org/T182908#3838373 (Volans) Open>Resolved a:Volans All done, resolving.
[16:58:26] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Fix bug causing too restrictive firewall on some proxies [puppet] - https://gerrit.wikimedia.org/r/399221 (owner: Jcrespo)
[16:58:46] <wikibugs>	 Operations, Discovery-Search (Current work), Goal, Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3848793 (EBernhardson) Unfortunately all of the elasticsearch-specific metrics are not exposed over jmx. We can get generic JVM in...
[17:00:10] <moritzm>	 !log installing libxv security updates on jessie
[17:00:25] <icinga-wm>	 RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[17:00:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:03:04] <wikibugs>	 (PS1) Muehlenhoff: Add library hint for libxv [puppet] - https://gerrit.wikimedia.org/r/399227
[17:04:17] <wikibugs>	 (CR) Chad: [C: 2] Remove unfinished/broken branch plugin [mediawiki-config] - https://gerrit.wikimedia.org/r/399116 (owner: Chad)
[17:06:14] <wikibugs>	 (PS1) Alexandros Kosiaris: postgresql::user: Differentiate augeas on type=local [puppet] - https://gerrit.wikimedia.org/r/399228
[17:06:45] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1007 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[17:06:46] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1004 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[17:07:40] <wikibugs>	 (03PS2) 10Chad: Remove unfinished/broken branch plugin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399116
[17:09:14] <wikibugs>	 (03CR) 10Chad: [C: 031] Remove unfinished/broken branch plugin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399116 (owner: 10Chad)
[17:09:17] <wikibugs>	 (03CR) 10Chad: [C: 032] Remove unfinished/broken branch plugin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399116 (owner: 10Chad)
[17:10:45] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] postgresql::user: Differentiate augeas on type=local [puppet] - 10https://gerrit.wikimedia.org/r/399228 (owner: 10Alexandros Kosiaris)
[17:11:19] <jynus>	 !log purging ferm from dbproxy1002, 3, 6, 9, 10 and 11
[17:11:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:38] <wikibugs>	 (03Merged) 10jenkins-bot: Remove unfinished/broken branch plugin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399116 (owner: 10Chad)
[17:13:55] <icinga-wm>	 RECOVERY - puppet last run on nihal is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:14:16] <wikibugs>	 (03CR) 10Chad: [C: 032] All kinds of pylint and other style fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399117 (owner: 10Chad)
[17:14:35] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on dbproxy1005 is OK: OK ferm input default policy is set
[17:15:55] <icinga-wm>	 RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[17:17:05] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1006 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[17:17:18] <wikibugs>	 (03CR) 10jenkins-bot: Remove unfinished/broken branch plugin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399116 (owner: 10Chad)
[17:18:05] <icinga-wm>	 RECOVERY - puppet last run on maps1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:18:05] <icinga-wm>	 RECOVERY - puppet last run on netmon1002 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[17:18:56] <icinga-wm>	 RECOVERY - puppet last run on maps2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:19:06] <icinga-wm>	 RECOVERY - puppet last run on nitrogen is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:19:12] <wikibugs>	 (03Merged) 10jenkins-bot: All kinds of pylint and other style fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399117 (owner: 10Chad)
[17:20:32] <wikibugs>	 (03CR) 10jenkins-bot: All kinds of pylint and other style fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/399117 (owner: 10Chad)
[17:23:30] <wikibugs>	 (03PS1) 10Ayounsi: LibreNMS: fix issue where service ircbot is declared twice [puppet] - 10https://gerrit.wikimedia.org/r/399230
[17:25:59] <wikibugs>	 (03PS1) 10Elukey: Revert "role::druid::analytics: lower down all the Xms settings" [puppet] - 10https://gerrit.wikimedia.org/r/399231
[17:26:06] <wikibugs>	 (03PS2) 10Elukey: Revert "role::druid::analytics: lower down all the Xms settings" [puppet] - 10https://gerrit.wikimedia.org/r/399231
[17:26:29] <wikibugs>	 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port postgresql metrics to Prometheus - https://phabricator.wikimedia.org/T179306#3848869 (10akosiaris) Apart from netmon2001 who has some puppet issues, all other postgres dbs have now the reporter installed and seemingly working fine.  Let's cr...
[17:26:35] <wikibugs>	 (03CR) 10Elukey: [C: 032] Revert "role::druid::analytics: lower down all the Xms settings" [puppet] - 10https://gerrit.wikimedia.org/r/399231 (owner: 10Elukey)
[17:26:59] <wikibugs>	 (03CR) 10Ayounsi: [C: 032] LibreNMS: fix issue where service ircbot is declared twice [puppet] - 10https://gerrit.wikimedia.org/r/399230 (owner: 10Ayounsi)
[17:27:05] <wikibugs>	 (03CR) 10Ayounsi: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/9427/" [puppet] - 10https://gerrit.wikimedia.org/r/399230 (owner: 10Ayounsi)
[17:27:17] <wikibugs>	 (03PS2) 10Ayounsi: LibreNMS: fix issue where service ircbot is declared twice [puppet] - 10https://gerrit.wikimedia.org/r/399230
[17:27:33] <wikibugs>	 (03CR) 10Ayounsi: [V: 032 C: 032] LibreNMS: fix issue where service ircbot is declared twice [puppet] - 10https://gerrit.wikimedia.org/r/399230 (owner: 10Ayounsi)
[17:29:22] <wikibugs>	 (03PS1) 10Andrew Bogott: tools exim: Fixes for our simple route-to-mail-relay setup [puppet] - 10https://gerrit.wikimedia.org/r/399233 (https://phabricator.wikimedia.org/T183171)
[17:33:37] <wikibugs>	 10Operations, 10Cloud-Services, 10MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), 10cloud-services-team (Kanban): Recover "Flominator" svn account for use as a modern developer account - https://phabricator.wikimedia.org/T180813#3848896 (10bd808) 05Open>03Resolved
[17:37:20] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] tools exim: Fixes for our simple route-to-mail-relay setup [puppet] - 10https://gerrit.wikimedia.org/r/399233 (https://phabricator.wikimedia.org/T183171) (owner: 10Andrew Bogott)
[17:39:27] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Goal, 10Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3848913 (10Ottomata) Ah ok, my response was to Filippo asking how to build prometheus-jmx-exporter.
[17:39:43] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#3848914 (10jcrespo) Firewall has been enabled on all proxies except the active ones:  ``` dbproxy1002.yaml:profile::mariadb::proxy::firewall: 'disabled' dbproxy1003.yaml:profile::ma...
[17:40:34] <wikibugs>	 (03PS1) 10Andrew Bogott: tools exim: ipv6 future-proof the exim config [puppet] - 10https://gerrit.wikimedia.org/r/399234
[17:42:38] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] tools exim: ipv6 future-proof the exim config [puppet] - 10https://gerrit.wikimedia.org/r/399234 (owner: 10Andrew Bogott)
[17:45:52] <wikibugs>	 (03PS4) 10Zoranzoki21: Redirect techblog.wikimedia.org to blog.wikimedia.org/c/technology [puppet] - 10https://gerrit.wikimedia.org/r/394743 (https://phabricator.wikimedia.org/T181878) (owner: 10Framawiki)
[17:46:37] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Reimage and upgrade to stretch all dbproxies - https://phabricator.wikimedia.org/T183249#3848937 (10jcrespo) dbproxy1001 has been successfully reimaged, which joins the already upgraded to stretch dbproxy1004 and dbproxy1009 (although these one have to yet be reconfig...
[17:58:00] <volans>	 nuria_: just in case you missed it, to let you know about T181952
[17:58:00] <stashbot>	 T181952: Requesting access to EventLogging data for Vinitha - https://phabricator.wikimedia.org/T181952
[17:59:29] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4021 is OK: OK: expiry mailbox lag is 0
[18:00:13] <nuria_>	 volans: let me see
[18:00:45] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10AICaptcha, 10WMF-NDA-Requests: Requesting access to EventLogging data for Vinitha - https://phabricator.wikimedia.org/T181952#3848976 (10Nuria) Updating ticket from conversation on e-mail. To grant access two things are needed:  - the date at which access will expire...
[18:00:56] <nuria_>	 volans: ah, sorry i thought i had updated that ticket ages ago!
[18:01:10] <nuria_>	 volans: updated now, had talked to tgr|away about it already
[18:01:16] <nuria_>	 volans: good reminder
[18:01:30] <wikibugs>	 10Operations, 10Puppet: Trusty puppet 4 approach - https://phabricator.wikimedia.org/T182894#3837838 (10MoritzMuehlenhoff) >>! In T182894#3841972, @herron wrote: > A first stab at Trusty packages for puppet 4.8.2 and dependencies (hiera, ruby-deep-merge) have been built on boron (in /var/cache/pbuilder/result/...
[18:02:04] <volans>	 nuria_: that's perfect, thanks a lot
[18:03:03] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10AICaptcha, 10WMF-NDA-Requests: Requesting access to EventLogging data for Vinitha - https://phabricator.wikimedia.org/T181952#3848980 (10Volans) a:05Nuria>03None
[18:04:08] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10AICaptcha, 10WMF-NDA-Requests: Requesting access to EventLogging data for Vinitha - https://phabricator.wikimedia.org/T181952#3848981 (10Tgr)
[18:05:53] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10AICaptcha, 10WMF-NDA-Requests: Requesting access to EventLogging data for Vinitha - https://phabricator.wikimedia.org/T181952#3807578 (10Tgr)
[18:14:54] <moritzm>	 !log installing zsh update from stretch point release
[18:15:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:21] <wikibugs>	 (03CR) 10Zoranzoki21: [C: 031] Enable TemplateStyles extension on svwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394831 (https://phabricator.wikimedia.org/T176082) (owner: 10Jon Harald Søby)
[18:25:18] <icinga-wm>	 PROBLEM - Disk space on kafka1023 is CRITICAL: DISK CRITICAL - free space: /var/spool/kafka/c 71227 MB (3% inode=99%): /var/spool/kafka/b 118950 MB (6% inode=99%)
[18:25:54] <wikibugs>	 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Investigate why ORES logs are being written to syslog despite explicit logging config.  Fix. - https://phabricator.wikimedia.org/T182614#3849065 (10awight) This is deployed on the beta cluster, but isn't working.  I think I accide...
[18:26:02] <volans>	 ottomata: FYI ^^^^
[18:26:19] <wikibugs>	 10Operations: Integrate stretch 9.3 point update - https://phabricator.wikimedia.org/T182655#3849066 (10MoritzMuehlenhoff) These are fully rolled out: xml2 libxkbcommon python2.7
[18:28:50] <ottomata>	 yeahhhhhh
[18:28:51] <ottomata>	 :)
[18:28:55] <ottomata>	 elukey:  ^^^
[18:31:18] <icinga-wm>	 PROBLEM - Disk space on kafka1023 is CRITICAL: DISK CRITICAL - free space: /var/spool/kafka/c 71270 MB (3% inode=99%): /var/spool/kafka/b 118078 MB (6% inode=99%)
[18:32:49] <elukey>	 yeah we investigated it today :)
[18:33:00] <elukey>	 ottomata: need to step away from keyboard for ~1h, brb 
[18:39:43] <wikibugs>	 10Operations, 10ops-eqsin, 10netops: setup and deploy eqsin network infrastructure - https://phabricator.wikimedia.org/T181558#3849122 (10ayounsi) 05Open>03Resolved
[18:41:18] <icinga-wm>	 PROBLEM - Disk space on kafka1023 is CRITICAL: DISK CRITICAL - free space: /var/spool/kafka/c 71155 MB (3% inode=99%): /var/spool/kafka/b 117649 MB (6% inode=99%)
[18:46:32] <wikibugs>	 10Operations, 10Patch-For-Review: Debian Jessie reimage/install ends up in kernel panic with 8.10 netboot image - https://phabricator.wikimedia.org/T182702#3849127 (10MoritzMuehlenhoff) Unfortunately there won't be rebuilt netinst images until the next point release: https://bugs.debian.org/cgi-bin/bugreport.c...
[18:52:46] <wikibugs>	 (03PS7) 10Alexandros Kosiaris: Introduce profile::prometheus::k8s::staging [puppet] - 10https://gerrit.wikimedia.org/r/399160
[18:54:17] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1059 - https://phabricator.wikimedia.org/T182853#3849130 (10Cmjohnson) There were 2 failed disks. Replaced both and they're rebuilding  Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online,...
[18:57:42] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1059 - https://phabricator.wikimedia.org/T182853#3849134 (10Marostegui) Thank you!
[18:58:10] <wikibugs>	 10Operations, 10ops-eqiad: Possible memory errors on ganeti1005, ganeti1006, ganeti1008 - https://phabricator.wikimedia.org/T181121#3780358 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by akosiaris on neodymium.eqiad.wmnet for hosts: ``` ganeti1006.eqiad.wmnet ``` The log can be found in `/var/l...
[18:58:42] <wikibugs>	 10Operations, 10ops-eqiad: Possible memory errors on ganeti1005, ganeti1006, ganeti1008 - https://phabricator.wikimedia.org/T181121#3849139 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['ganeti1006.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['ganeti1006.eqiad.wmnet'] ```
[19:01:03] <wikibugs>	 10Operations, 10ops-eqiad: Possible memory errors on ganeti1005, ganeti1006, ganeti1008 - https://phabricator.wikimedia.org/T181121#3849141 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by akosiaris on neodymium.eqiad.wmnet for hosts: ``` ganeti1006.eqiad.wmnet ``` The log can be found in `/var/l...
[19:12:28] <icinga-wm>	 PROBLEM - HHVM rendering on mw2150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:13:19] <icinga-wm>	 RECOVERY - HHVM rendering on mw2150 is OK: HTTP OK: HTTP/1.1 200 OK - 73976 bytes in 0.296 second response time
[19:16:28] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dbproxy1001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[19:16:59] <wikibugs>	 (03PS1) 10Herron: tools exim: allow relay of unqualified mails via localhost smtp [puppet] - 10https://gerrit.wikimedia.org/r/399240 (https://phabricator.wikimedia.org/T183171)
[19:18:22] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] tools exim: allow relay of unqualified mails via localhost smtp [puppet] - 10https://gerrit.wikimedia.org/r/399240 (https://phabricator.wikimedia.org/T183171) (owner: 10Herron)
[19:18:28] <mutante>	 !log gerrit2001 - reboot for kernel upgrade
[19:18:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:49] <icinga-wm>	 PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:23:52] <wikibugs>	 (03PS6) 10ArielGlenn: dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798)
[19:23:57] <mutante>	 !log webperf1001/webperf2001 - rebooting for kernel upgrades (not used yet)
[19:24:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:57] <wikibugs>	 (03PS2) 10Dzahn: remove ganglia_aggregators settings from hiera [puppet] - 10https://gerrit.wikimedia.org/r/399120 (https://phabricator.wikimedia.org/T177225)
[19:30:07] <wikibugs>	 (03CR) 10Dzahn: [C: 032] remove ganglia_aggregators settings from hiera [puppet] - 10https://gerrit.wikimedia.org/r/399120 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn)
[19:35:46] <wikibugs>	 10Operations, 10ops-eqiad: Possible memory errors on ganeti1005, ganeti1006, ganeti1008 - https://phabricator.wikimedia.org/T181121#3849185 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['ganeti1006.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['ganeti1006.eqiad.wmnet'] ```
[19:38:21] <icinga-wm>	 RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 15 minutes ago with 0 failures
[19:40:37] <wikibugs>	 (03CR) 10Dzahn: [C: 032] Migrate contint::worker_localhost to a profile [puppet] - 10https://gerrit.wikimedia.org/r/398227 (owner: 10Hashar)
[19:40:44] <wikibugs>	 10Operations, 10ops-eqiad: Possible memory errors on ganeti1005, ganeti1006, ganeti1008 - https://phabricator.wikimedia.org/T181121#3849219 (10Volans) @akosiaris if you're trying to reimage those as Jessie, we still have the netinst issue open, so you need to set numa=off to unblock it, see T182702.
[19:40:46] <wikibugs>	 (03PS2) 10Dzahn: Migrate contint::worker_localhost to a profile [puppet] - 10https://gerrit.wikimedia.org/r/398227 (owner: 10Hashar)
[19:48:10] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "webserver was already down since yesterday, removing" [dns] - 10https://gerrit.wikimedia.org/r/399124 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn)
[19:48:51] <wikibugs>	 (03PS7) 10ArielGlenn: dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798)
[19:49:20] <hashar>	 mutante:  thanks :)
[19:49:23] <mutante>	 !log deleted ganglia.wikimedia.org from DNS - webserver was already down since yesterday - not used anymore (T177225)
[19:49:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:34] <stashbot>	 T177225: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225
[19:49:36] <hashar>	 ohh and Ganglia is gone!!!
[19:49:50] <mutante>	 hashar: welcome! how about the one in integration/config that linked to that :)
[19:49:55] <mutante>	 one number above 
[19:49:58] <hashar>	 I merged it :)
[19:50:01] <mutante>	 oK:)
[19:50:21] <hashar>	 which is really just about changing:   include contint::foo   with include profile::ci::foo :D
[19:50:27] <mutante>	 hashar: yea, very very close to gone
[19:50:48] <mutante>	 a few more merges, like 'delete the module'
[19:51:04] <hashar>	 1814 warnings to go https://integration.wikimedia.org/ci/job/operations-puppet-wmf-style-guide/lastBuild/ :)
[19:51:11] <mutante>	 and "dont call it ganglia_clusters anymore in Hiera even though it is now unrelated"
[19:51:44] <mutante>	 hashar: :) i worked on site.pp 
[19:51:49] <mutante>	 for the style count
[19:51:57] <mutante>	 like appserver includes -9
[19:52:10] <icinga-wm>	 ACKNOWLEDGEMENT - Disk space on kafka1023 is CRITICAL: DISK CRITICAL - free space: /var/spool/kafka/c 67310 MB (3% inode=99%): /var/spool/kafka/b 113031 MB (6% inode=99%): ottomata elukey and I will fix this tomorrow EU morning.
[19:52:29] <hashar>	 ah nice
[19:52:58] <mutante>	 hashar: all site.pp https://gerrit.wikimedia.org/r/#/q/topic:site-includes  :p
[19:54:06] <hashar>	 I have another one for the CI boxes if you dont mind https://gerrit.wikimedia.org/r/#/c/397787/
[19:54:35] <hashar>	 which removes XDebug from the list of Zend extensions that are enabled by default
[19:54:46] <hashar>	 because that slows down PHP :]
[19:54:54] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1059 - https://phabricator.wikimedia.org/T182853#3849238 (10Marostegui) 05Open>03Resolved All good! Thanks  ``` root@db1059:~# megacli -LDInfo -L0 -a0   Adapter 0 -- Virtual Drive Information: Virtual Drive: 0 (Target Id: 0) Name                : RAID...
[19:55:27] <mutante>	 reads.. and yea
[19:55:29] <icinga-wm>	 RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[19:56:09] <wikibugs>	 (03PS3) 10Dzahn: contint: disable XDebug by default [puppet] - 10https://gerrit.wikimedia.org/r/397787 (https://phabricator.wikimedia.org/T175028) (owner: 10Hashar)
[19:56:58] <wikibugs>	 (03CR) 10Dzahn: [C: 032] contint: disable XDebug by default [puppet] - 10https://gerrit.wikimedia.org/r/397787 (https://phabricator.wikimedia.org/T175028) (owner: 10Hashar)
[19:56:59] <wikibugs>	 (03PS1) 10Rush: openstack: whitelist kernel versions for compute [puppet] - 10https://gerrit.wikimedia.org/r/399243
[19:57:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] openstack: whitelist kernel versions for compute [puppet] - 10https://gerrit.wikimedia.org/r/399243 (owner: 10Rush)
[20:00:08] <hashar>	 mutante: the PHPUnit tests thank you in advance :D
[20:00:19] <mutante>	 [contint1001:~] $ /usr/sbin/php5query -s cli -m xdebug
[20:00:19] <mutante>	 No module matches xdebug
[20:00:25] <mutante>	 i didnt see puppet do anything
[20:00:32] <mutante>	 but the module is already not loaded
[20:00:33] <hashar>	 cause I already cherry picked it
[20:00:41] <hashar>	 and contint1001 probably doesn't have that class enabled anyway :]
[20:01:01] <hashar>	 contint1001 just hosts Zuul/Jenkins, all the rest is in labs/cloudVps/xxxthing :)
[20:01:10] <mutante>	 oh, i expected this one was contint1001-affecting
[20:01:14] <mutante>	 as opposed to the one before
[20:01:23] <mutante>	 ok
[20:09:46] <wikibugs>	 (03PS2) 10Dzahn: ganglia: delete the module [puppet] - 10https://gerrit.wikimedia.org/r/382933 (https://phabricator.wikimedia.org/T177225)
[20:10:05] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ganglia: delete the module [puppet] - 10https://gerrit.wikimedia.org/r/382933 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn)
[20:13:22] <wikibugs>	 (03PS3) 10Dzahn: ganglia: delete the module [puppet] - 10https://gerrit.wikimedia.org/r/382933 (https://phabricator.wikimedia.org/T177225)
[20:17:40] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1044 - https://phabricator.wikimedia.org/T181696#3849296 (10Cmjohnson) Disks are wiped
[20:17:59] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Kanban, 10hardware-requests: Decommission db104[67] - https://phabricator.wikimedia.org/T181784#3849298 (10Cmjohnson) Disks are wiped
[20:18:09] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Kanban, 10hardware-requests: Decommission db104[67] - https://phabricator.wikimedia.org/T181784#3849299 (10Cmjohnson)
[20:18:26] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1044 - https://phabricator.wikimedia.org/T181696#3849300 (10Cmjohnson)
[20:18:35] <wikibugs>	 (03PS1) 10Dzahn: redis: delete ganglia monitoring script [puppet] - 10https://gerrit.wikimedia.org/r/399248
[20:18:44] <wikibugs>	 (03PS1) 10Hashar: contint: worker_localhost had the jenkins user hardcoded [puppet] - 10https://gerrit.wikimedia.org/r/399249
[20:18:44] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1015 - https://phabricator.wikimedia.org/T173570#3849302 (10Cmjohnson)
[20:19:00] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1021 - https://phabricator.wikimedia.org/T181378#3849305 (10Cmjohnson)
[20:19:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] redis: delete ganglia monitoring script [puppet] - 10https://gerrit.wikimedia.org/r/399248 (owner: 10Dzahn)
[20:19:18] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1026 - https://phabricator.wikimedia.org/T174763#3849306 (10Cmjohnson)
[20:19:38] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1045 - https://phabricator.wikimedia.org/T174806#3849307 (10Cmjohnson)
[20:19:54] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests: Decommission db1049 - https://phabricator.wikimedia.org/T175264#3849308 (10Cmjohnson)
[20:20:08] <wikibugs>	 (03CR) 10Dzahn: [C: 032] ganglia: delete the module [puppet] - 10https://gerrit.wikimedia.org/r/382933 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn)
[20:20:17] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1050 - https://phabricator.wikimedia.org/T178162#3849310 (10Cmjohnson)
[20:20:22] <wikibugs>	 (03PS4) 10Dzahn: ganglia: delete the module [puppet] - 10https://gerrit.wikimedia.org/r/382933 (https://phabricator.wikimedia.org/T177225)
[20:21:28] <hashar>	 mutante: despite rspec testing, I failed one of the changes for contint::worker_localhost. The fix-up is https://gerrit.wikimedia.org/r/399249 :D
[20:22:59] <mutante>	 yea, but i need to finish this one i am at. in a minute
[20:23:15] <hashar>	 no worries, it is not that urgent :]
[20:23:19] <wikibugs>	 (03PS2) 10Dzahn: redis: delete ganglia monitoring script [puppet] - 10https://gerrit.wikimedia.org/r/399248
[20:23:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] redis: delete ganglia monitoring script [puppet] - 10https://gerrit.wikimedia.org/r/399248 (owner: 10Dzahn)
[20:24:54] <mutante>	 hashar: you know what's my most common reason for -1 probably?  The space between "Bug:" and "T12345" in the commit message :)
[20:24:55] <stashbot>	 T12345: Create "annotation" namespace on Hebrew Wikisource - https://phabricator.wikimedia.org/T12345
[20:25:14] <wikibugs>	 (03PS3) 10Dzahn: redis: delete ganglia monitoring script [puppet] - 10https://gerrit.wikimedia.org/r/399248 (https://phabricator.wikimedia.org/T177225)
[20:25:39] <mutante>	 just keep doing it all the time
[20:25:42] <hashar>	 mutante: you can probably have it checked automatically whenever you do a commit
[20:25:53] <hashar>	 or find a vim rule to highlight it in red hehe
[20:26:16] <wikibugs>	 (03PS2) 10Dzahn: contint: worker_localhost had the jenkins user hardcoded [puppet] - 10https://gerrit.wikimedia.org/r/399249 (owner: 10Hashar)
[20:26:34] <mutante>	 hashar: yea, that's right, i should highlight it :)
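A minimal sketch of the automatic check hashar suggests, assuming the expected footer form is "Bug: T12345" with exactly one space after the colon. The function name and the regular expressions are hypothetical and are not taken from any existing hook in the repository.

```python
import re

# Assumption: a well-formed footer is "Bug: T<digits>", one space after the colon.
WELL_FORMED = re.compile(r"^Bug: T\d+$")
# Anything that starts like a bug footer, in any spelling, gets checked.
LOOKS_LIKE_BUG = re.compile(r"^bug\s*:", re.IGNORECASE)


def malformed_bug_lines(message):
    """Return the footer lines that look like a Bug: reference but are malformed."""
    return [
        line
        for line in message.splitlines()
        if LOOKS_LIKE_BUG.match(line) and not WELL_FORMED.match(line)
    ]


if __name__ == "__main__":
    msg = "redis: delete ganglia monitoring script\n\nBug:T177225"
    for bad in malformed_bug_lines(msg):
        print("malformed footer:", bad)
```

Something like this could be called from a local commit-msg hook, so the spacing mistake is caught before the change is pushed for review.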
[20:27:45] <wikibugs>	 (03CR) 10Dzahn: [C: 032] contint: worker_localhost had the jenkins user hardcoded [puppet] - 10https://gerrit.wikimedia.org/r/399249 (owner: 10Hashar)
[20:29:39] <wikibugs>	 (03PS4) 10Dzahn: redis: delete ganglia monitoring script [puppet] - 10https://gerrit.wikimedia.org/r/399248 (https://phabricator.wikimedia.org/T177225)
[20:29:41] <wikibugs>	 (03PS5) 10Dzahn: redis: delete ganglia monitoring script [puppet] - 10https://gerrit.wikimedia.org/r/399248 (https://phabricator.wikimedia.org/T177225)
[20:31:13] <hashar>	 mutante: https://phabricator.wikimedia.org/P6488 :)
[20:31:22] <hashar>	 ie run the tests before doing a commit
[20:32:21] <mutante>	 ah :) thanks!
[20:32:50] <hashar>	 -j1 is to run the tasks serially, that usually makes it easier to spot the issue
[20:34:53] <mutante>	 "The page you requested was not found, or you do not have permission to view this page."  eh.. Gerrit? 
[20:36:34] <mutante>	 and it's fine 
[20:38:35] <hashar>	 mutante: puppet fixed !:)
[20:39:04] <mutante>	 hashar: :)
[20:40:11] <paladox>	 mutante it's a draft you are trying to view.
[20:40:28] <mutante>	 ohh :)
[20:40:57] <mutante>	 i did not intend to use the draft feature
[20:41:22] <paladox>	 did you push it normally like git push? or did you add %draft
[20:41:44] <mutante>	 git review
[20:41:50] <mutante>	 and some rebasing
[20:42:14] <paladox>	 Using the rest api i got some details on the patches
[20:42:20] <mutante>	 how do you turn it from draft to final 
[20:42:22] <mutante>	 if you cant access it
[20:42:32] <paladox>	 https://phabricator.wikimedia.org/P6489
[20:43:03] <paladox>	 mutante requires an admin to do it i think. Though i've never switched the patch from drafts to normal patch using the command line.
[20:43:25] <mutante>	 it can just be abandoned, i wouldnt care
[20:43:31] <mutante>	 very easy to redo
[20:43:55] <mutante>	 and thanks for pointing this out, i really dont know why it became a draft
[20:44:53] <paladox>	 here's an updated view https://phabricator.wikimedia.org/P6490
[20:44:53] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Rack and setup db1111 and db1112 - https://phabricator.wikimedia.org/T180788#3849389 (10Marostegui) 05stalled>03Open
[20:45:59] <mutante>	 ok, how do you know it's a draft from that?
[20:46:44] <paladox>	 looks like it is not a draft.
[20:46:54] <paladox>	 as the status is new and not draft.
[20:47:59] <mutante>	 hmm, ok, well, as i said, if an admin can just abandon it that's fine
[20:48:07] <mutante>	 i'll just need to get lunch for now
[20:48:34] <mutante>	 i'll just make a new one later
[20:48:41] <paladox>	 ok
[20:48:56] <mutante>	 (and this one can still be investigated)  cu in a bit
[20:49:04] <wikibugs>	 (03PS1) 10Ottomata: Add nrpe check_newest_file_age; monitor some analytics file backups [puppet] - 10https://gerrit.wikimedia.org/r/399255 (https://phabricator.wikimedia.org/T182327)
[20:49:08] <wikibugs>	 (03PS1) 10Dduvall: Use sed instead of envsubst [deployment-charts] - 10https://gerrit.wikimedia.org/r/399256
[20:53:05] <wikibugs>	 (03CR) 10Ottomata: [C: 032] "Looks good i think! https://puppet-compiler.wmflabs.org/compiler02/9433/analytics1002.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/399255 (https://phabricator.wikimedia.org/T182327) (owner: 10Ottomata)
[20:54:53] <wikibugs>	 (03PS8) 10ArielGlenn: dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798)
[20:55:17] <paladox>	 mutante you can try and do curl --user username:password -X POST https://gerrit.wikimedia.org/r/a/changes/399248/abandon
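The curl call paladox gives can be sketched in Python as follows. This only builds the request and does not send it. It assumes the server accepts HTTP basic auth with the account's HTTP password; the helper name and the placeholder credentials are hypothetical.

```python
import base64
import urllib.request


def build_abandon_request(base_url, change_number, username, http_password):
    """Build (but do not send) a POST to the change's abandon endpoint."""
    # The /a/ prefix marks an authenticated REST call, as in the curl example.
    url = f"{base_url}/a/changes/{change_number}/abandon"
    # Assumption: basic auth is accepted, which is what curl --user sends by default.
    token = base64.b64encode(f"{username}:{http_password}".encode()).decode()
    req = urllib.request.Request(url, data=b"{}", method="POST")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    req = build_abandon_request(
        "https://gerrit.wikimedia.org/r", 399248, "username", "password"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it possible to inspect the URL and headers first, which helps when it is unclear whether the change is really a draft.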
[20:56:33] <wikibugs>	 (03PS9) 10ArielGlenn: dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798)
[20:57:50] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] dataset1001 rsync to labs of dumps can now use explicit inclusion list [puppet] - 10https://gerrit.wikimedia.org/r/336204 (https://phabricator.wikimedia.org/T154798) (owner: 10ArielGlenn)
[21:02:27] <wikibugs>	 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Create Prometheus exporter for wdqs-updater - https://phabricator.wikimedia.org/T182773#3834187 (10Gehel) I updated the grafana dashboard as well: https://grafana-admin.wikimedia.org/dashboard/db/wikidata-query-s...
[21:02:28] <wikibugs>	 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Create Prometheus exporter for Blazegraph - https://phabricator.wikimedia.org/T182857#3836850 (10Gehel) I updated the grafana dashboard as well: https://grafana-admin.wikimedia.org/dashboard/db/wikidata-query-ser...
[21:04:49] <wikibugs>	 (03PS2) 10Rush: openstack: whitelist kernel versions for compute [puppet] - 10https://gerrit.wikimedia.org/r/399243
[21:05:16] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] openstack: whitelist kernel versions for compute [puppet] - 10https://gerrit.wikimedia.org/r/399243 (owner: 10Rush)
[21:08:13] <wikibugs>	 (03PS3) 10Rush: openstack: whitelist kernel versions for compute [puppet] - 10https://gerrit.wikimedia.org/r/399243
[21:08:56] <wikibugs>	 (03PS1) 10Ottomata: Allow nagios to sudo to check analytics database backup newest file age [puppet] - 10https://gerrit.wikimedia.org/r/399260 (https://phabricator.wikimedia.org/T182327)
[21:09:17] <wikibugs>	 (03PS1) 10ArielGlenn: add dumpsgen user to labstore1003 for dumps cron cleanup job [puppet] - 10https://gerrit.wikimedia.org/r/399261 (https://phabricator.wikimedia.org/T154798)
[21:11:36] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] add dumpsgen user to labstore1003 for dumps cron cleanup job [puppet] - 10https://gerrit.wikimedia.org/r/399261 (https://phabricator.wikimedia.org/T154798) (owner: 10ArielGlenn)
[21:12:37] <wikibugs>	 (03PS2) 10Ottomata: Allow nagios to sudo to check analytics backup newest file age [puppet] - 10https://gerrit.wikimedia.org/r/399260 (https://phabricator.wikimedia.org/T182327)
[21:14:33] <wikibugs>	 (03PS3) 10Ottomata: Allow nagios to sudo to check analytics backup newest file age [puppet] - 10https://gerrit.wikimedia.org/r/399260 (https://phabricator.wikimedia.org/T182327)
[21:16:02] <wikibugs>	 (03PS4) 10Rush: openstack: whitelist kernel versions for compute [puppet] - 10https://gerrit.wikimedia.org/r/399243
[21:26:42] <wikibugs>	 (CR) Ottomata: [V: 2 C: 2] "https://puppet-compiler.wmflabs.org/compiler02/9435/analytics1002.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/399260 (https://phabricator.wikimedia.org/T182327) (owner: Ottomata)
[21:31:31] <wikibugs>	 (PS1) Andrew Bogott: Bigbrother: pass in a giant shell string to subprocess [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171)
[21:31:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] Bigbrother: pass in a giant shell string to subprocess [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171) (owner: Andrew Bogott)
[21:34:40] <wikibugs>	 (CR) Zhuyifei1999: Bigbrother: pass in a giant shell string to subprocess (1 comment) [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171) (owner: Andrew Bogott)
[21:47:03] <wikibugs>	 (PS5) Rush: openstack: whitelist kernel versions for compute [puppet] - https://gerrit.wikimedia.org/r/399243
[21:47:43] <wikibugs>	 (PS7) Rush: openstack: whitelist kernel versions for compute [puppet] - https://gerrit.wikimedia.org/r/399243
[21:51:18] <XioNoX>	 !log removing local-as AS43821 from ams transits - T167840
[21:51:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:29] <stashbot>	 T167840: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840
[22:00:11] <wikibugs>	 (CR) 20after4: [C: 1] Fix linewrap issue on wikimedia error page [puppet] - https://gerrit.wikimedia.org/r/395552 (https://phabricator.wikimedia.org/T180656) (owner: Phantom42)
[22:18:31] <wikibugs>	 (PS2) Andrew Bogott: Bigbrother: build restart command out of a big list [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171)
[22:21:15] <wikibugs>	 (PS3) Zoranzoki21: Add xpda.com to $wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/398702 (https://phabricator.wikimedia.org/T183073)
[22:30:18] <wikibugs>	 (PS3) Andrew Bogott: Bigbrother: build restart command out of a big list [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171)
[22:31:06] <wikibugs>	 (PS4) Andrew Bogott: Bigbrother: build restart command out of a big list [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171)
[22:33:54] <wikibugs>	 (CR) Andrew Bogott: [C: 1] openstack: whitelist kernel versions for compute [puppet] - https://gerrit.wikimedia.org/r/399243 (owner: Rush)
[22:44:23] <wikibugs>	 (CR) BryanDavis: Bigbrother: build restart command out of a big list (2 comments) [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171) (owner: Andrew Bogott)
[22:47:56] <wikibugs>	 (PS5) Andrew Bogott: Bigbrother: build restart command out of a big list [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171)
[22:50:01] <wikibugs>	 (PS6) Andrew Bogott: Bigbrother: build restart command out of a big list [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171)
[22:50:34] <wikibugs>	 (CR) Andrew Bogott: [C: 2] Bigbrother: build restart command out of a big list [puppet] - https://gerrit.wikimedia.org/r/399262 (https://phabricator.wikimedia.org/T183171) (owner: Andrew Bogott)
[22:51:47] <wikibugs>	 (PS1) Hashar: contint: convert Apache proxying to profiles [puppet] - https://gerrit.wikimedia.org/r/399311
[22:54:50] <wikibugs>	 Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#2192798 (Nuria) Can we go ahead and close ticket?
[22:59:41] <wikibugs>	 (CR) Hashar: [C: -1] contint: convert Apache proxying to profiles [puppet] - https://gerrit.wikimedia.org/r/399311 (owner: Hashar)
[23:00:16] <wikibugs>	 (PS2) Hashar: contint: convert Apache proxying to profiles [puppet] - https://gerrit.wikimedia.org/r/399311
[23:02:47] <wikibugs>	 (CR) Hashar: "https://puppet-compiler.wmflabs.org/compiler02/9437/ . Apparently only class get renamed:" [puppet] - https://gerrit.wikimedia.org/r/399311 (owner: Hashar)
[23:04:12] <wikibugs>	 Operations, ops-eqiad, Analytics-Kanban, Patch-For-Review, User-Elukey: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3849840 (Nuria) Open>Resolved
[23:19:33] <icinga-wm>	 PROBLEM - parsoid on ruthenium is CRITICAL: connect to address 10.64.16.151 and port 8142: Connection refused
[23:58:29] <wikibugs>	 (PS1) Cmjohnson: Adding entries for db1113 and 1114 T182896 [puppet] - https://gerrit.wikimedia.org/r/399314