[00:00:04] <jouncebot>	 addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: How many deployers does it take to do Evening SWAT (Max 8 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T0000).
[00:00:04] <jouncebot>	 Amir1: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:11] <Amir1>	 o/
[00:00:20] <Amir1>	 not testable, quick
[00:06:07] <Hauskatze>	 bd808: so personal PCs and laptops like ours are affected... that makes 90% of the machines...
[00:08:41] <Reedy>	 probably more than 90%
[00:11:37] <bd808>	 Hauskatze: I'd guess more like 99.99% of all things with a CPU
[00:12:11] * bd808 buys stock in pen and paper manufacturers
[00:12:30] <Hauskatze>	 typewriters and safes
[00:12:37] <Hauskatze>	 and a shotgun under your desk
[00:13:06] <Hauskatze>	 good night
[00:15:14] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1221 is OK: OK - load average: 6.30, 12.37, 23.52
[00:41:04] <no_justification>	 As I said elsewhere: good thing I use a laptop and not a desktop!
[00:41:08] <no_justification>	 I'm safe!
[00:41:37] <Krenair>	 true not all computers are going to be using functionality where this is a huge deal
[00:43:39] <Krenair>	 but stuff that runs untrusted VMs or JavaScript etc.... that's gonna be a huge percentage of things affected
[00:49:19] <wikibugs>	 Operations, Cleanup, Continuous-Integration-Config, Gerrit, and 6 others: Archive mediawiki/extensions/Collection and others - https://phabricator.wikimedia.org/T183891#3873797 (Tgr) Archiving a stable extension should involve some amount of public discussion, not just someone making an arbitrary...
[00:51:23] <Platonides>	 well, that's often pretty much the point why people use VMs...
[00:53:10] <Krenair>	 some VMs are more trustworthy than others
[00:55:45] <Krenair>	 there is a huge difference between VMs on hosts shared with anyone (e.g. public cloud providers, Labs, etc.), and the ganeti stuff running in prod where the guests are managed as production hosts
[00:55:59] <wikibugs>	 Operations, Cleanup, Continuous-Integration-Config, Gerrit, and 6 others: Archive mediawiki/extensions/Collection and others - https://phabricator.wikimedia.org/T183891#3866385 (demon) Just because it isn't used at WMF anymore doesn't mean it's worth archiving. I'm inclined to deny this.
[00:56:15] <wikibugs>	 Operations, Cleanup, Continuous-Integration-Config, Gerrit, and 6 others: Archive mediawiki/extensions/Collection and others - https://phabricator.wikimedia.org/T183891#3873803 (demon) (aka: what do the author(s) have to say?)
[00:56:28] <Krenair>	 it's still not good of course
[01:00:04] <jouncebot>	 twentyafterfour: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T0100).
[01:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[01:00:10] <wikibugs>	 Operations, ops-codfw, Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3873806 (Papaul) @RobH   This was not on my dashboard so I missed it. I will get on it when back at the DC tomorrow.
[01:39:59] <bawolff>	 reading the details of these vulns, spectre sounds even scarier than meltdown in a way, because it seems impossible to fix, and affects all processors
[01:40:15] <Amir1>	 no_justification: no one deployed the SWAT, can we do it now?
[01:41:00] <no_justification>	 Amir1: Do swat? I'm about to walk out the door....
[01:41:32] <Amir1>	 I can do it
[01:41:39] <Amir1>	 just was thinking if it's okay
[01:44:14] <no_justification>	 I don't see why not :)
[01:44:17] <no_justification>	 All seems quiet
[01:44:25] <Amir1>	 cool
[01:44:29] <no_justification>	 (he says, as he walks away from any responsibility)
[01:45:00] <Amir1>	 I think it should be okay, the patch is super straightforward 
[01:45:36] <wikibugs>	 (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/398704 (https://phabricator.wikimedia.org/T182326) (owner: Ladsgroup)
[01:47:02] <wikibugs>	 (Merged) jenkins-bot: Move testwiki2 from group0 to group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/398704 (https://phabricator.wikimedia.org/T182326) (owner: Ladsgroup)
[01:47:13] <wikibugs>	 (CR) jenkins-bot: Move testwiki2 from group0 to group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/398704 (https://phabricator.wikimedia.org/T182326) (owner: Ladsgroup)
[01:49:52] <Amir1>	 I get lots of things like this when deploying with scap
[01:49:55] <Amir1>	 https://www.irccloud.com/pastebin/BFCq0qcL/
[01:50:05] <logmsgbot>	 !log ladsgroup@tin Synchronized dblists/group0.dblist: SWAT: Move testwiki2 from group0 to group1 (T182326) (duration: 01m 02s)
[01:50:10] <Reedy>	 ignore it
[01:50:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:50:17] <stashbot>	 T182326: Make one group1 wiki a client of testwikidata (preferably a test wiki)  - https://phabricator.wikimedia.org/T182326
[01:51:20] <Amir1>	 okay
[01:51:26] <Amir1>	 the deployment is done now
[01:58:08] <wikibugs>	 (PS1) Dzahn: peopleweb: access based on roles, not host names [puppet] - https://gerrit.wikimedia.org/r/401829
[02:17:34] <icinga-wm>	 PROBLEM - MegaRAID on db1059 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[02:25:03] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.12) (duration: 07m 50s)
[02:25:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:04] <icinga-wm>	 PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1515033119 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 3115013 keys, up 4 minutes 9 seconds - replication_delay is 1515033119
[02:33:05] <icinga-wm>	 RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 3098559 keys, up 5 minutes 9 seconds - replication_delay is 0
[02:37:34] <icinga-wm>	 RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[02:49:26] <logmsgbot>	 !log legoktm@tin Synchronized php-1.31.0-wmf.15/extensions/Flow/Hooks.php: Fix CheckUser type check thingy - T182834 (duration: 01m 01s)
[02:49:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:49:37] <stashbot>	 T182834: Argument 1 passed to FlowHooks::onSpecialCheckUserGetLinksFromRow() must be an instance of CheckUser, SpecialCheckUser given - https://phabricator.wikimedia.org/T182834
[03:07:34] <icinga-wm>	 PROBLEM - MegaRAID on db1059 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[03:12:27] <wikibugs>	 (CR) Dzahn: [C: 1] "i would like to just merge it as is to be able to test it on my Cloud VPS project without having to use local puppetmaster, next i would l" [puppet] - https://gerrit.wikimedia.org/r/400100 (owner: Giuseppe Lavagetto)
[03:16:54] <icinga-wm>	 PROBLEM - MegaRAID on db1016 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[03:17:34] <icinga-wm>	 RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[03:26:05] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 782.36 seconds
[04:05:14] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 267.44 seconds
[04:52:25] <icinga-wm>	 PROBLEM - HHVM rendering on mw2122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:53:15] <icinga-wm>	 RECOVERY - HHVM rendering on mw2122 is OK: HTTP OK: HTTP/1.1 200 OK - 73439 bytes in 0.323 second response time
[05:17:34] <icinga-wm>	 PROBLEM - MegaRAID on db1059 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[05:47:34] <icinga-wm>	 RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[05:56:54] <icinga-wm>	 RECOVERY - MegaRAID on db1016 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[05:57:35] <icinga-wm>	 PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[05:57:45] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0
[06:00:54] <icinga-wm>	 RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0
[06:02:44] <icinga-wm>	 RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.61 ms
[06:17:34] <icinga-wm>	 PROBLEM - MegaRAID on db1059 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[06:22:54] <wikibugs>	 Operations, ops-eqiad, DBA: db1059 possibly BBU issues - https://phabricator.wikimedia.org/T184160#3874338 (Marostegui)
[06:23:20] <marostegui>	 !log Issue a BBU re-learn cycle on db1059 - T184160
[06:23:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:33] <stashbot>	 T184160: db1059 possibly BBU issues - https://phabricator.wikimedia.org/T184160
[06:27:55] <marostegui>	 !log Deploy schema change on db1068 (s4) master - T174569
[06:28:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:08] <stashbot>	 T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:33:10] <wikibugs>	 Operations, ops-eqiad, DBA: db1059 possibly BBU issues - https://phabricator.wikimedia.org/T184160#3874356 (Marostegui) ``` Time: Fri Nov 24 23:39:07 2017 Event Description: Battery started charging Time: Fri Nov 24 23:46:42 2017 Event Description: Battery charge complete Time: Sun Nov 26 08:04:47 20...
[06:35:14] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401899 (https://phabricator.wikimedia.org/T174569)
[06:36:13] <wikibugs>	 (CR) jerkins-bot: [V: -1] db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401899 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:37:06] <wikibugs>	 (PS2) Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401899 (https://phabricator.wikimedia.org/T174569)
[06:37:34] <icinga-wm>	 RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[06:38:05] <wikibugs>	 (CR) jerkins-bot: [V: -1] db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401899 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:40:20] <wikibugs>	 (Abandoned) Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401899 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:41:15] <wikibugs>	 Operations, ops-eqiad, DBA: db1059 possibly BBU issues - https://phabricator.wikimedia.org/T184160#3874359 (Marostegui) After the manual relearn: ``` ˜/icinga-wm 7:37> RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy ```  Don't know for how long it will last
[06:42:42] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401900 (https://phabricator.wikimedia.org/T174569)
[06:43:00] <wikibugs>	 Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3874361 (Marostegui) This host failed again and recovered itself: ``` 03:16 < icinga-wm> PROBLEM - MegaRAID on db1016 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, cu...
[06:45:05] <wikibugs>	 Operations, ops-eqiad, DBA: db1059 possibly BBU issues - https://phabricator.wikimedia.org/T184160#3874362 (Marostegui) p:Triage>Normal
[06:46:00] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401900 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:46:55] <wikibugs>	 (PS2) Giuseppe Lavagetto: site.pp: convert dns recursors to single role [puppet] - https://gerrit.wikimedia.org/r/401547
[06:47:20] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/9532/" [puppet] - https://gerrit.wikimedia.org/r/401547 (owner: Giuseppe Lavagetto)
[06:47:33] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401900 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:47:43] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401900 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:48:27] <marostegui>	 !log Deploy schema change on db1079 (s7) with replication enabled - this will generate lag on labs replicas - T174569
[06:48:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:39] <stashbot>	 T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:48:56] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1079 - T174569 (duration: 01m 02s)
[06:49:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:50] <wikibugs>	 (PS2) Giuseppe Lavagetto: bastionhost: add role for caching PoPs [puppet] - https://gerrit.wikimedia.org/r/401548
[07:03:48] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1113 and db1114 - https://phabricator.wikimedia.org/T182896#3874373 (Marostegui) a:Cmjohnson>Marostegui
[07:07:34] <wikibugs>	 Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3874375 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1113.eqiad.wmnet', 'db111...
[07:07:34] <icinga-wm>	 PROBLEM - MegaRAID on db1059 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[07:11:04] <wikibugs>	 Operations, ops-eqiad, DBA: db1059 possibly BBU issues - https://phabricator.wikimedia.org/T184160#3874377 (Marostegui) a:Cmjohnson ``` PROBLEM - MegaRAID on db1059 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough ```  We should replace the BBU
[07:13:22] <wikibugs>	 (PS2) ArielGlenn: add wmflabs config for dumps scap [dumps/scap] - https://gerrit.wikimedia.org/r/400598
[07:15:06] <wikibugs>	 (CR) ArielGlenn: "Yeah the dumpsgen user is included in the profiles that are applied via any of the snapshot roles." (1 comment) [dumps/scap] - https://gerrit.wikimedia.org/r/400598 (owner: ArielGlenn)
[07:31:51] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on db1113 is CRITICAL: Return code of 255 is out of bounds
[07:31:51] <icinga-wm>	 PROBLEM - DPKG on db1114 is CRITICAL: Return code of 255 is out of bounds
[07:33:31] <icinga-wm>	 PROBLEM - DPKG on db1113 is CRITICAL: Return code of 255 is out of bounds
[07:33:31] <icinga-wm>	 PROBLEM - Disk space on db1114 is CRITICAL: Return code of 255 is out of bounds
[07:35:20] <icinga-wm>	 PROBLEM - Disk space on db1113 is CRITICAL: Return code of 255 is out of bounds
[07:37:20] <icinga-wm>	 RECOVERY - Disk space on db1113 is OK: DISK OK
[07:37:31] <icinga-wm>	 RECOVERY - DPKG on db1114 is OK: All packages OK
[07:37:40] <icinga-wm>	 RECOVERY - Disk space on db1114 is OK: DISK OK
[07:37:41] <icinga-wm>	 RECOVERY - DPKG on db1113 is OK: All packages OK
[07:38:59] <wikibugs>	 (CR) ArielGlenn: [V: 2 C: 2] add wmflabs config for dumps scap [dumps/scap] - https://gerrit.wikimedia.org/r/400598 (owner: ArielGlenn)
[07:43:41] <wikibugs>	 Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3874390 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1113.eqiad.wmnet', 'db1114.eqiad.wmnet'] ```  and were **ALL** successful.
[07:55:38] <wikibugs>	 Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3874396 (Marostegui) I put the wrong task ID, it was meant to be T182896 Sorry!
[07:56:26] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1113 and db1114 - https://phabricator.wikimedia.org/T182896#3837888 (Marostegui)
[08:01:54] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on db1113 is OK: OK: synced at Thu 2018-01-04 08:01:46 UTC.
[08:01:58] <wikibugs>	 (PS2) ArielGlenn: add dumps repo source to beta scap, add snapshot to beta mw scap [puppet] - https://gerrit.wikimedia.org/r/400237
[08:02:08] <wikibugs>	 (CR) ArielGlenn: add dumps repo source to beta scap, add snapshot to beta mw scap (1 comment) [puppet] - https://gerrit.wikimedia.org/r/400237 (owner: ArielGlenn)
[08:03:00] <wikibugs>	 (CR) ArielGlenn: [C: 2] add dumps repo source to beta scap, add snapshot to beta mw scap [puppet] - https://gerrit.wikimedia.org/r/400237 (owner: ArielGlenn)
[08:22:57] <wikibugs>	 (PS1) Elukey: profile::hadoop::master: remove the last hadoop cdh auto-lookup [puppet] - https://gerrit.wikimedia.org/r/401904 (https://phabricator.wikimedia.org/T167790)
[08:22:59] <wikibugs>	 (PS1) Marostegui: site.pp: Add db111{3,4} to spare [puppet] - https://gerrit.wikimedia.org/r/401905 (https://phabricator.wikimedia.org/T184161)
[08:24:29] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1113 and db1114 - https://phabricator.wikimedia.org/T182896#3874449 (Marostegui) Open>Resolved
[08:25:57] <wikibugs>	 (CR) Elukey: "pcc: https://puppet-compiler.wmflabs.org/compiler02/9538/" [puppet] - https://gerrit.wikimedia.org/r/401904 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[08:26:42] <wikibugs>	 (PS3) Giuseppe Lavagetto: bastionhost: add role for caching PoPs [puppet] - https://gerrit.wikimedia.org/r/401548
[08:28:51] <wikibugs>	 (PS1) Elukey: role::analytics_cluster::coordinator: fix system::role [puppet] - https://gerrit.wikimedia.org/r/401907 (https://phabricator.wikimedia.org/T167790)
[08:29:17] <wikibugs>	 (CR) Elukey: [C: 2] role::analytics_cluster::coordinator: fix system::role [puppet] - https://gerrit.wikimedia.org/r/401907 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[08:30:43] <wikibugs>	 (CR) Marostegui: [C: 2] site.pp: Add db111{3,4} to spare [puppet] - https://gerrit.wikimedia.org/r/401905 (https://phabricator.wikimedia.org/T184161) (owner: Marostegui)
[08:30:48] <wikibugs>	 (PS2) Marostegui: site.pp: Add db111{3,4} to spare [puppet] - https://gerrit.wikimedia.org/r/401905 (https://phabricator.wikimedia.org/T184161)
[08:31:21] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/401909
[08:34:48] <wikibugs>	 Operations, Performance-Team, HHVM, Patch-For-Review: HHVM hangs on the API cluster - https://phabricator.wikimedia.org/T184048#3874481 (Joe) @Imarlier no I think there isn't much we can do until we have a reproduction case. For now I'm focusing on mitigations for this issue as 1) we're not on th...
[08:43:43] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/401909 (owner: Marostegui)
[08:45:01] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/401909 (owner: Marostegui)
[08:45:49] <wikibugs>	 (PS4) Giuseppe Lavagetto: bastionhost: add role for caching PoPs [puppet] - https://gerrit.wikimedia.org/r/401548
[08:46:17] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 - T174569 (duration: 01m 02s)
[08:46:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:29] <stashbot>	 T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[08:46:39] <wikibugs>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/401909 (owner: Marostegui)
[08:47:11] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] bastionhost: add role for caching PoPs [puppet] - https://gerrit.wikimedia.org/r/401548 (owner: Giuseppe Lavagetto)
[08:48:55] <marostegui>	 !log Deploy schema change on db1069 (s7) - T174569
[08:49:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:55] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1336 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[08:53:04] <marostegui>	 !log Fixing inconsistencies on s7 - T163190
[08:53:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:16] <stashbot>	 T163190: Checksum data on s7 - https://phabricator.wikimedia.org/T163190
[08:54:05] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1337 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[08:55:29] <elukey>	 so these are the new jobrunners
[08:55:41] <elukey>	 I bet that the race condition for the conntrack is wrong
[08:55:41] <elukey>	 fixing
[08:56:05] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1337 is OK: OK: nf_conntrack is 78 % full
[08:56:25] <icinga-wm>	 PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Exec[mkfs-/dev/sdf1]
[08:57:02] <elukey>	 !log set sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 on mw133[67] (new jobrunners)
[08:57:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:34] <icinga-wm>	 RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[08:58:01] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1336 is OK: OK: nf_conntrack is 77 % full
[08:58:44] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on ms-be1013 - https://phabricator.wikimedia.org/T184053#3874527 (fgiunchedi) Open>Resolved Thanks @Cmjohnson ! Disk is rebuilding.
[08:59:22] <wikibugs>	 (CR) Gehel: [C: 1] "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/compiler02/9540/" [puppet] - https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) (owner: Smalyshev)
[09:01:31] <icinga-wm>	 RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:02:52] <icinga-wm>	 PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: consumer/mysql-m4-master-00 consumer/mysql-eventbus
[09:05:04] <elukey>	 ah yes this is ok, downtime expired --^
[09:12:21] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[09:13:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw2125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:13:26] <moritzm>	 !log rebooting kubernetes1001 for kernel update
[09:13:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:12] <icinga-wm>	 RECOVERY - Apache HTTP on mw2125 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.128 second response time
[09:15:21] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[09:17:46] <wikibugs>	 (PS1) Filippo Giunchedi: graphite: cleanup stale ORES metrics [puppet] - https://gerrit.wikimedia.org/r/401917 (https://phabricator.wikimedia.org/T169969)
[09:18:08] <wikibugs>	 (CR) jerkins-bot: [V: -1] graphite: cleanup stale ORES metrics [puppet] - https://gerrit.wikimedia.org/r/401917 (https://phabricator.wikimedia.org/T169969) (owner: Filippo Giunchedi)
[09:19:14] <elukey>	 marostegui: I am checking https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors and the above mw exceptions are saying "Could not wait for replica DBs to catch up to db1062" - expected?
[09:21:15] <wikibugs>	 (PS2) Filippo Giunchedi: graphite: cleanup stale ORES metrics [puppet] - https://gerrit.wikimedia.org/r/401917 (https://phabricator.wikimedia.org/T169969)
[09:21:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] graphite: cleanup stale ORES metrics [puppet] - https://gerrit.wikimedia.org/r/401917 (https://phabricator.wikimedia.org/T169969) (owner: Filippo Giunchedi)
[09:22:53] <wikibugs>	 (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/9542/graphite1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/401917 (https://phabricator.wikimedia.org/T169969) (owner: Filippo Giunchedi)
[09:25:02] <marostegui>	 elukey: checking
[09:25:17] <jynus>	 elukey: given that that is a master and has no replication, that is really bad
[09:25:39] <jynus>	 or is it happening on other host?
[09:26:31] <jynus>	 "Timed out waiting on db1101:3317"
[09:26:47] <marostegui>	 is it down?
[09:27:05] <marostegui>	 i can see it fine
[09:27:16] <jynus>	 https://logstash.wikimedia.org/goto/cc705e4aa21677c0e9a9ebce69235622
[09:27:36] <elukey>	 it was a 10 min blip afaics
[09:27:46] <elukey>	 now the exceptions have fully recovered
[09:27:51] <jynus>	 Server db1101:3317 has 63.380876064301 seconds of lag
[09:29:00] <marostegui>	 That could have been me, while fixing inconsistencies
[09:29:06] <marostegui>	 I am checking the graphs, and matches the times
[09:29:32] <marostegui>	 I think I will prewarm the tables and throttle it a bit
[09:29:51] <icinga-wm>	 PROBLEM - SSH on ms-be1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:29:51] <jynus>	 it seems it was a large replace
[09:29:52] <marostegui>	 Because yesterday I saw no issues on other hosts, but these ones are multi-instance and have less buffer pool, so it could be that
[09:30:00] <marostegui>	 jynus: yep, that was me
[09:30:31] <jynus>	 I do not think it was performance
[09:30:35] <jynus>	 but locking
[09:30:55] <jynus>	 see the threads running stats
[09:31:41] <icinga-wm>	 RECOVERY - SSH on ms-be1013 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u1 (protocol 2.0)
[09:31:47] <marostegui>	 could be, yeah, I will throttle it more then
[09:31:51] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 112.76, 105.24, 68.57
[09:32:39] <jynus>	 in theory, one replica having issues should not affect mediawiki, in practice, because of that ticket, it does :-(
[09:33:23] <marostegui>	 actually, I am going to depool it, it will be easier
[09:34:51] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be1013 is OK: OK - load average: 52.71, 79.99, 65.11
[09:35:16] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401925 (https://phabricator.wikimedia.org/T163190)
[09:38:28] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401925 (https://phabricator.wikimedia.org/T163190) (owner: Marostegui)
[09:38:30] <moritzm>	 !log rebooting mw1307 and wtp1025 for kernel update
[09:38:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:49] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401925 (https://phabricator.wikimedia.org/T163190) (owner: Marostegui)
[09:39:59] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1101:3317 [mediawiki-config] - https://gerrit.wikimedia.org/r/401925 (https://phabricator.wikimedia.org/T163190) (owner: Marostegui)
[09:43:33] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1101:3317 - T163190 (duration: 03m 09s)
[09:43:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:45] <stashbot>	 T163190: Checksum data on s7 - https://phabricator.wikimedia.org/T163190
[09:46:39] <wikibugs>	 Operations, Continuous-Integration-Config: tox 2.5.0 on phabricator-jessie-diffs fails with ERROR: Commands not specified - https://phabricator.wikimedia.org/T184060#3874550 (fgiunchedi) My point is more like that's a regression in tox 2.5.0 (i.e. environment without `commands` is invalid) that got rever...
[09:50:04] <wikibugs>	 Operations, Patch-For-Review, Prometheus-metrics-monitoring, User-fgiunchedi: Move deployment-prep redis instances to stretch - https://phabricator.wikimedia.org/T179371#3874565 (fgiunchedi) So IIRC @Pchelolo has finished running their tests in deployment-prep that used redis. So we could actuall...
[09:54:18] <wikibugs>	 (PS2) Giuseppe Lavagetto: site.pp: use role keyword for striker::web only on californium [puppet] - https://gerrit.wikimedia.org/r/401549
[09:58:35] <jynus>	 !log restart and upgrade db2053
[09:58:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:22] <wikibugs>	 (PS3) Giuseppe Lavagetto: site.pp: use role keyword for striker::web only on californium [puppet] - https://gerrit.wikimedia.org/r/401549
[10:04:35] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/9544/" [puppet] - https://gerrit.wikimedia.org/r/401549 (owner: Giuseppe Lavagetto)
[10:09:17] <wikibugs>	 (PS1) Elukey: Refactor thorium's roles in one [puppet] - https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790)
[10:14:14] <logmsgbot>	 !log mobrovac@tin Started deploy [mathoid/deploy@7f664ff]: Update Mathoid in codfw to v0.7.0, take #2 - T183557
[10:14:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:26] <stashbot>	 T183557: Mathoid v0.7.0 not accepting chem formula - https://phabricator.wikimedia.org/T183557
[10:14:35] <wikibugs>	 (PS2) Elukey: Refactor thorium's roles in one [puppet] - https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790)
[10:16:52] <logmsgbot>	 !log mobrovac@tin Finished deploy [mathoid/deploy@7f664ff]: Update Mathoid in codfw to v0.7.0, take #2 - T183557 (duration: 02m 38s)
[10:17:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:13] <wikibugs>	 (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/9546/thorium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[10:20:00] <wikibugs>	 (PS1) Jcrespo: mariadb: Move db2053 socket to /run [puppet] - https://gerrit.wikimedia.org/r/401928 (https://phabricator.wikimedia.org/T148507)
[10:20:32] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Move db2053 socket to /run [puppet] - https://gerrit.wikimedia.org/r/401928 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[10:21:26] <wikibugs>	 (PS1) Alexandros Kosiaris: Disable docker bridge in production/staging [puppet] - https://gerrit.wikimedia.org/r/401929
[10:21:58] <wikibugs>	 (PS3) Elukey: Refactor thorium's roles in one [puppet] - https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790)
[10:25:30] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401930 (https://phabricator.wikimedia.org/T163190)
[10:29:21] <wikibugs>	 (PS2) Giuseppe Lavagetto: cache: add ipsec to basic roles [puppet] - https://gerrit.wikimedia.org/r/401550
[10:29:53] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 1] Disable docker bridge in production/staging [puppet] - https://gerrit.wikimedia.org/r/401929 (owner: Alexandros Kosiaris)
[10:31:38] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401930 (https://phabricator.wikimedia.org/T163190) (owner: Marostegui)
[10:37:37] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/401930 (https://phabricator.wikimedia.org/T163190) (owner: Marostegui)
[10:39:05] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1079 - T163190 (duration: 01m 02s)
[10:39:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:16] <stashbot>	 T163190: Checksum data on s7 - https://phabricator.wikimedia.org/T163190
[10:39:17] <marostegui>	 !log Stop replication in sync on db1079 and db1101:3317 - T163190
[10:39:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:04] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler03/9550/ shows this is a noop." [puppet] - 10https://gerrit.wikimedia.org/r/401550 (owner: 10Giuseppe Lavagetto)
[10:40:38] <wikibugs>	 (03CR) 10Ema: [C: 031] cache: add ipsec to basic roles [puppet] - 10https://gerrit.wikimedia.org/r/401550 (owner: 10Giuseppe Lavagetto)
[10:44:57] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401935
[10:46:46] <wikibugs>	 (03PS9) 10Ema: mtail: add program to count varnish backend metrics [puppet] - 10https://gerrit.wikimedia.org/r/401535 (https://phabricator.wikimedia.org/T177199) (owner: 10Filippo Giunchedi)
[10:47:47] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401935 (owner: 10Marostegui)
[10:50:30] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401935 (owner: 10Marostegui)
[10:51:45] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 - T163190 (duration: 01m 01s)
[10:51:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:58] <stashbot>	 T163190: Checksum data on s7 - https://phabricator.wikimedia.org/T163190
[10:55:25] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: ifguard $realm and $cluster with defined() [puppet] - 10https://gerrit.wikimedia.org/r/401974
[10:57:01] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401930 (https://phabricator.wikimedia.org/T163190) (owner: 10Marostegui)
[10:57:05] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401935 (owner: 10Marostegui)
[10:57:42] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: cache: add ipsec to basic roles [puppet] - 10https://gerrit.wikimedia.org/r/401550
[11:00:46] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] cache: add ipsec to basic roles [puppet] - 10https://gerrit.wikimedia.org/r/401550 (owner: 10Giuseppe Lavagetto)
[11:06:21] <icinga-wm>	 PROBLEM - HHVM rendering on mw2107 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:07:11] <icinga-wm>	 RECOVERY - HHVM rendering on mw2107 is OK: HTTP OK: HTTP/1.1 200 OK - 73425 bytes in 0.317 second response time
[11:10:00] <wikibugs>	 (03PS7) 10Ema: varnish: add varnishmtail instance for varnish backends [puppet] - 10https://gerrit.wikimedia.org/r/401526 (https://phabricator.wikimedia.org/T177199)
[11:10:10] <wikibugs>	 (03CR) 10Ema: [V: 032 C: 032] varnish: add varnishmtail instance for varnish backends [puppet] - 10https://gerrit.wikimedia.org/r/401526 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema)
[11:11:31] <wikibugs>	 (03CR) 10Ema: [C: 032] mtail: add program to count varnish backend metrics [puppet] - 10https://gerrit.wikimedia.org/r/401535 (https://phabricator.wikimedia.org/T177199) (owner: 10Filippo Giunchedi)
[11:11:37] <wikibugs>	 (03PS10) 10Ema: mtail: add program to count varnish backend metrics [puppet] - 10https://gerrit.wikimedia.org/r/401535 (https://phabricator.wikimedia.org/T177199) (owner: 10Filippo Giunchedi)
[11:11:45] <wikibugs>	 (03CR) 10Ema: [V: 032 C: 032] mtail: add program to count varnish backend metrics [puppet] - 10https://gerrit.wikimedia.org/r/401535 (https://phabricator.wikimedia.org/T177199) (owner: 10Filippo Giunchedi)
[11:12:56] <_joe_>	 ema: I'll reenable puppet everywhere shortly
[11:13:05] <ema>	 _joe_: thanks
[11:13:35] <_joe_>	 ema: not sure everything works as expected though
[11:13:48] <_joe_>	 ema: puppet has run on cp1048 and now on cp1052
[11:13:54] <_joe_>	 with your changes
[11:14:06] <ema>	 _joe_: let's see
[11:14:10] <_joe_>	 do you want to check those hosts before reenabling?
[11:14:44] <ema>	 _joe_: just checked, the change worked fine
[11:14:52] <ema>	 CC: godog 
[11:14:55] <_joe_>	 nevermind, it was a partial merge apparently with my first puppet run
[11:15:22] <_joe_>	 ema: reenabling then :)
[11:15:35] <ema>	 yes, please!
[11:15:45] <_joe_>	 {{done}}
[11:16:29] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: site.pp: simplify role() keyword call for cache::canary [puppet] - 10https://gerrit.wikimedia.org/r/401551
[11:17:00] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] site.pp: simplify role() keyword call for cache::canary [puppet] - 10https://gerrit.wikimedia.org/r/401551 (owner: 10Giuseppe Lavagetto)
[11:17:47] <_joe_>	 ema: do you think puppet can be reenabled on cp1008?
[11:18:21] <_joe_>	 if not, that's ok, but it's disabled since forever
[11:18:54] <ema>	 _joe_: I'll check in a second
[11:19:44] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: site.pp: one role for dbstore2001.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/401552
[11:20:26] <wikibugs>	 (03PS1) 10Elukey: role::prometheus::analytics: add configuration for jmx hadoop agents [puppet] - 10https://gerrit.wikimedia.org/r/402021 (https://phabricator.wikimedia.org/T177458)
[11:21:33] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] "https://puppet-compiler.wmflabs.org/compiler03/9552/" [puppet] - 10https://gerrit.wikimedia.org/r/401552 (owner: 10Giuseppe Lavagetto)
[11:25:00] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: monitoring: create role::alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/401553
[11:30:16] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] role::prometheus::analytics: add configuration for jmx hadoop agents [puppet] - 10https://gerrit.wikimedia.org/r/402021 (https://phabricator.wikimedia.org/T177458) (owner: 10Elukey)
[11:30:47] <elukey>	 \o/
[11:30:54] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] monitoring: create role::alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/401553 (owner: 10Giuseppe Lavagetto)
[11:31:36] <wikibugs>	 (03PS1) 10Ema: mtail: update varnishbackend.mtail regex [puppet] - 10https://gerrit.wikimedia.org/r/402022 (https://phabricator.wikimedia.org/T177199)
[11:32:01] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:36:53] <wikibugs>	 (03PS2) 10Ema: mtail: update varnishbackend.mtail regex [puppet] - 10https://gerrit.wikimedia.org/r/402022 (https://phabricator.wikimedia.org/T177199)
[11:38:46] <ema>	 _joe_: puppet re-enabled on cp1008, it was disabled for digicert certs testing a while ago
[11:41:01] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes1003 is OK: OK - running: The system is fully operational
[11:42:12] <wikibugs>	 (03PS3) 10Ema: mtail: update varnishbackend.mtail regex [puppet] - 10https://gerrit.wikimedia.org/r/402022 (https://phabricator.wikimedia.org/T177199)
[11:43:07] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] mtail: update varnishbackend.mtail regex [puppet] - 10https://gerrit.wikimedia.org/r/402022 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema)
[11:43:48] <wikibugs>	 (03CR) 10Ema: [C: 032] mtail: update varnishbackend.mtail regex [puppet] - 10https://gerrit.wikimedia.org/r/402022 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema)
[11:51:46] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: eventlogging: create compound role, consolidate hiera [puppet] - 10https://gerrit.wikimedia.org/r/401554
[11:55:21] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: eventlogging: create compound role, consolidate hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/401554 (owner: 10Giuseppe Lavagetto)
[11:55:27] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] eventlogging: create compound role, consolidate hiera [puppet] - 10https://gerrit.wikimedia.org/r/401554 (owner: 10Giuseppe Lavagetto)
[11:58:57] <wikibugs>	 (03PS1) 10Filippo Giunchedi: smart: bump timeout to 60s [puppet] - 10https://gerrit.wikimedia.org/r/402023 (https://phabricator.wikimedia.org/T86552)
[11:58:59] <wikibugs>	 (03PS1) 10Filippo Giunchedi: smart: ignore drbd disks [puppet] - 10https://gerrit.wikimedia.org/r/402024 (https://phabricator.wikimedia.org/T86552)
[12:00:13] <moritzm>	 !log upgrading HHVM on API canaries (mw1276-mw1279) to HHVM 3.18.6
[12:00:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:29] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] smart: bump timeout to 60s [puppet] - 10https://gerrit.wikimedia.org/r/402023 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[12:00:41] <wikibugs>	 (03PS2) 10Filippo Giunchedi: smart: bump timeout to 60s [puppet] - 10https://gerrit.wikimedia.org/r/402023 (https://phabricator.wikimedia.org/T86552)
[12:02:31] <logmsgbot>	 !log mobrovac@tin Started deploy [mathoid/deploy@c9957ce]: Mathoid v0.7.1 - T172767
[12:02:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:43] <stashbot>	 T172767: Prepare mathoid 0.7 release (tracking) - https://phabricator.wikimedia.org/T172767
[12:03:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] smart: ignore drbd disks [puppet] - 10https://gerrit.wikimedia.org/r/402024 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[12:03:04] <wikibugs>	 (03PS2) 10Filippo Giunchedi: smart: ignore drbd disks [puppet] - 10https://gerrit.wikimedia.org/r/402024 (https://phabricator.wikimedia.org/T86552)
[12:06:11] <icinga-wm>	 PROBLEM - Host kubernetes1003 is DOWN: PING CRITICAL - Packet loss = 100%
[12:06:21] <icinga-wm>	 RECOVERY - Host kubernetes1003 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms
[12:07:36] <logmsgbot>	 !log mobrovac@tin Finished deploy [mathoid/deploy@c9957ce]: Mathoid v0.7.1 - T172767 (duration: 05m 05s)
[12:07:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:49] <stashbot>	 T172767: Prepare mathoid 0.7 release (tracking) - https://phabricator.wikimedia.org/T172767
[12:10:08] <wikibugs>	 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3874954 (10Qgil) @Andrew @Austin @EBernhardson @Tgr @Samwilson @yuvipanda, as current admins of [[ https://tools.wmflabs.org/openstack-brows...
[12:12:51] <wikibugs>	 (03PS1) 10ArielGlenn: use strict var syntax in snapshot/dumps modules [puppet] - 10https://gerrit.wikimedia.org/r/402029
[12:13:21] <icinga-wm>	 PROBLEM - puppet last run on mw1277 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm-dbg],Package[hhvm]
[12:17:08] <wikibugs>	 (03PS5) 10Volans: PuppetDB backend: add support for API v4 [software/cumin] - 10https://gerrit.wikimedia.org/r/399821 (https://phabricator.wikimedia.org/T182575)
[12:20:43] * volans looking at icinga-wm
[12:20:53] <akosiaris>	 volans: it's me
[12:21:01] <volans>	 akosiaris: ack :)
[12:24:30] <wikibugs>	 (03CR) 10Volans: "Tested on a local docker deployment of PuppetDB using:" [software/cumin] - 10https://gerrit.wikimedia.org/r/399821 (https://phabricator.wikimedia.org/T182575) (owner: 10Volans)
[12:28:10] <wikibugs>	 (03PS2) 10Elukey: role::prometheus::analytics: add configuration for jmx hadoop agents [puppet] - 10https://gerrit.wikimedia.org/r/402021 (https://phabricator.wikimedia.org/T177458)
[12:28:36] <Hauskatze>	 Reedy: I think I've finished with https://phabricator.wikimedia.org/P6522 -- can we run the script again and check if there's anything left?
[12:29:28] <wikibugs>	 (03CR) 10Elukey: [C: 032] role::prometheus::analytics: add configuration for jmx hadoop agents [puppet] - 10https://gerrit.wikimedia.org/r/402021 (https://phabricator.wikimedia.org/T177458) (owner: 10Elukey)
[12:30:00] <icinga-wm>	 CUSTOM - Host kubernetes1003 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[12:30:44] <icinga-wm>	 CUSTOM - Host kubernetes1003 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[12:31:49] <icinga-wm>	 CUSTOM - Host kubernetes1003 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[12:32:06] <akosiaris>	 I like how custom notifications require a text but are not reporting it 
[12:32:23] <volans>	 they usually do, at least for services...
[12:32:40] <volans>	 I've used them, at least for services and CRITICAL they do
[12:34:19] <icinga-wm>	 CUSTOM - Host kubernetes1003 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[12:36:01] <wikibugs>	 (03PS2) 10ArielGlenn: use strict var syntax in snapshot/dumps modules [puppet] - 10https://gerrit.wikimedia.org/r/402029
[12:37:57] <akosiaris>	 what on earth is this software doing...
[12:38:21] <icinga-wm_>	 RECOVERY - puppet last run on mw1277 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:40:07] <wikibugs>	 (03PS3) 10ArielGlenn: use strict var syntax in snapshot/dumps modules [puppet] - 10https://gerrit.wikimedia.org/r/402029
[12:40:45] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] use strict var syntax in snapshot/dumps modules [puppet] - 10https://gerrit.wikimedia.org/r/402029 (owner: 10ArielGlenn)
[12:40:50] <Hauskatze>	 legoktm: around?
[12:41:33] <logmsgbot>	 !log mobrovac@tin Started deploy [restbase/deploy@66b7efe]: Switch Mathoid to Cassandra 3 and drop Cassandra 2 references - T179419
[12:41:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:45] <stashbot>	 T179419: Migrate mathoid storage from legacy to new strategy - https://phabricator.wikimedia.org/T179419
[12:45:38] <logmsgbot>	 !log mobrovac@tin Finished deploy [restbase/deploy@66b7efe]: Switch Mathoid to Cassandra 3 and drop Cassandra 2 references - T179419 (duration: 04m 05s)
[12:45:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:23] <wikibugs>	 10Operations, 10RESTBase, 10Patch-For-Review, 10Services (blocked), 10User-mobrovac: Set up RESTBase on Cassandra 3 nodes - https://phabricator.wikimedia.org/T184110#3875011 (10mobrovac)
[12:49:21] <wikibugs>	 10Operations, 10RESTBase, 10Patch-For-Review, 10Services (doing), 10User-mobrovac: Set up RESTBase on Cassandra 3 nodes - https://phabricator.wikimedia.org/T184110#3872628 (10mobrovac)
[12:53:49] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] "It requires a sanitarium restart- but it is not high priority- x1 tables should not reach labsdbs anyway." [puppet] - 10https://gerrit.wikimedia.org/r/397623 (owner: 10Gergő Tisza)
[12:53:52] <moritzm>	 !log upgrading HHVM on mwdebug* to 3.18.6
[12:54:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:06] <wikibugs>	 (03CR) 10Jcrespo: "If this intends to run every day at 2:42, I do not think it will work- but I do not know which is the intended schedule (not shown on the " [puppet] - 10https://gerrit.wikimedia.org/r/395694 (https://phabricator.wikimedia.org/T181107) (owner: 10Gergő Tisza)
[13:09:59] <wikibugs>	 (03PS1) 10Steinsplitter: Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow localizaion. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040
[13:12:58] <wikibugs>	 (03PS2) 10Steinsplitter: Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow onwiki localizaion of a commosn specific notice. All changes have been made onwiki yet. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848)
[13:17:55] <moritzm>	 !log upgrading HHVM on mw1180-mw1220 to 3.18.6
[13:18:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:27] <wikibugs>	 (03PS9) 10Elukey: role::puppetmaster::puppetdb: add Prometheus monitoring for puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/394966
[13:28:54] <wikibugs>	 10Operations, 10MediaWiki-Maintenance-scripts, 10Wikidata: Missing references to s8 on maintenance and cloud scripts (and potentially others) - https://phabricator.wikimedia.org/T184179#3875081 (10jcrespo) p:05Triage>03Normal
[13:28:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] role::puppetmaster::puppetdb: add Prometheus monitoring for puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/394966 (owner: 10Elukey)
[13:30:19] <elukey>	 argh
[13:30:30] <wikibugs>	 (03PS10) 10Elukey: role::puppetmaster::puppetdb: add Prometheus monitoring for puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/394966
[13:30:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] role::puppetmaster::puppetdb: add Prometheus monitoring for puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/394966 (owner: 10Elukey)
[13:30:59] <elukey>	 ah yes new violation, expected
[13:31:14] <wikibugs>	 (03PS1) 10Marostegui: Revert "Revert "db-eqiad.php: Depool db1079"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402043
[13:33:13] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "Revert "db-eqiad.php: Depool db1079"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402043 (owner: 10Marostegui)
[13:34:20] <wikibugs>	 10Operations, 10Data-Services, 10MediaWiki-Maintenance-scripts, 10Wikidata: Missing references to s8 on maintenance and cloud scripts (and potentially others) - https://phabricator.wikimedia.org/T184179#3875094 (10jcrespo) @bd808 @Andrew This contains the self-actionable part of T181643 (mostly dns-related...
[13:34:42] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Revert "db-eqiad.php: Depool db1079"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402043 (owner: 10Marostegui)
[13:35:01] <icinga-wm_>	 PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 147.44, 129.70, 119.13
[13:35:55] <marostegui>	 !log Stop replication in sync db1079 db1101:3317 T163190
[13:36:02] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1079 - T163190 (duration: 01m 02s)
[13:36:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:05] <stashbot>	 T163190: Checksum data on s7 - https://phabricator.wikimedia.org/T163190
[13:36:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:26] <wikibugs>	 (03PS1) 10Marostegui: Revert "Revert "Revert "db-eqiad.php: Depool db1079""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402045
[13:36:44] <wikibugs>	 (03CR) 10jenkins-bot: Revert "Revert "db-eqiad.php: Depool db1079"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402043 (owner: 10Marostegui)
[13:40:32] <wikibugs>	 (03PS1) 10Jcrespo: mediawiki-maintenance: Run maintenance on new s8 replica set, too [puppet] - 10https://gerrit.wikimedia.org/r/402047 (https://phabricator.wikimedia.org/T184179)
[13:40:38] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "Revert "Revert "db-eqiad.php: Depool db1079""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402045 (owner: 10Marostegui)
[13:41:31] <icinga-wm_>	 PROBLEM - SSH on ms-be1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:42:06] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Revert "Revert "db-eqiad.php: Depool db1079""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402045 (owner: 10Marostegui)
[13:42:08] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] "Requires s8.dblist/s5.dblist update and potentially noc source code update, too." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401436 (https://phabricator.wikimedia.org/T177208) (owner: 10Marostegui)
[13:42:19] <wikibugs>	 (03CR) 10jenkins-bot: Revert "Revert "Revert "db-eqiad.php: Depool db1079""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402045 (owner: 10Marostegui)
[13:43:58] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 - T163190 (duration: 01m 01s)
[13:44:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:09] <stashbot>	 T163190: Checksum data on s7 - https://phabricator.wikimedia.org/T163190
[13:45:21] <icinga-wm_>	 RECOVERY - SSH on ms-be1013 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u1 (protocol 2.0)
[13:47:40] <wikibugs>	 10Operations, 10Data-Services, 10MediaWiki-Maintenance-scripts, 10Wikidata, 10Patch-For-Review: Missing references to s8 on maintenance and cloud scripts (and potentially others) - https://phabricator.wikimedia.org/T184179#3875127 (10jcrespo) ^see if that patch makes sense
[13:48:48] <wikibugs>	 10Operations, 10Data-Services, 10MediaWiki-Maintenance-scripts, 10Wikidata, 10Patch-For-Review: Missing references to s8 on maintenance and cloud scripts (and potentially others) - https://phabricator.wikimedia.org/T184179#3875129 (10mark) p:05Normal>03High
[13:58:13] <wikibugs>	 (03PS5) 10EddieGP: Restrict sending mails to new users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/397768 (https://phabricator.wikimedia.org/T182541)
[13:59:21] <icinga-wm_>	 PROBLEM - DPKG on mw1209 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:59:53] <wikibugs>	 (03PS4) 10Marostegui: db-eqiad.php: Point wikidatawiki to s8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401436 (https://phabricator.wikimedia.org/T177208)
[14:00:04] <jouncebot>	 addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy European Mid-day SWAT(Max 8 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T1400).
[14:00:04] <jouncebot>	 eddiegp and Steinsplitter: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:21] <icinga-wm_>	 RECOVERY - DPKG on mw1209 is OK: All packages OK
[14:00:26] * eddiegp is here
[14:00:33] * Steinsplitter waves
[14:00:38] <Niharika>	 I can SWAT. 
[14:01:03] <zeljkof>	 o/
[14:01:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] db-eqiad.php: Point wikidatawiki to s8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401436 (https://phabricator.wikimedia.org/T177208) (owner: 10Marostegui)
[14:01:14] <wikibugs>	 (03CR) 10ArielGlenn: [C: 031] "As far as overlap with dumps usage of vslow, it's no better or worse than previous usage, which has gotten to be kind of crappy over time" [puppet] - 10https://gerrit.wikimedia.org/r/402047 (https://phabricator.wikimedia.org/T184179) (owner: 10Jcrespo)
[14:01:32] <icinga-wm_>	 PROBLEM - SSH on ms-be1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:01:34] <wikibugs>	 (03PS3) 10Steinsplitter: Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow onwiki localizaion of a commosn specific notice. All changes have been made onwiki yet. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848)
[14:02:54] <wikibugs>	 (03CR) 10Niharika29: [C: 032] "SWAT." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/397768 (https://phabricator.wikimedia.org/T182541) (owner: 10EddieGP)
[14:03:05] <wikibugs>	 (03CR) 10Luke081515: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848) (owner: 10Steinsplitter)
[14:03:13] <wikibugs>	 (03CR) 10Luke081515: [C: 031] Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow onwiki localizaion of a commosn specific notice. All changes have been made onw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848) (owner: 10Steinsplitter)
[14:04:14] <wikibugs>	 (03Merged) 10jenkins-bot: Restrict sending mails to new users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/397768 (https://phabricator.wikimedia.org/T182541) (owner: 10EddieGP)
[14:06:52] <wikibugs>	 (03CR) 10jenkins-bot: Restrict sending mails to new users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/397768 (https://phabricator.wikimedia.org/T182541) (owner: 10EddieGP)
[14:07:13] <Niharika>	 eddiegp: Can you test your change? It's on mwdebug1002. 
[14:07:24] <eddiegp>	 Niharika: doing
[14:07:42] <Niharika>	 eddiegp: Wait a second. scap is taking too long.
[14:08:11] <icinga-wm_>	 PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 129.54, 140.99, 127.14
[14:08:31] <icinga-wm_>	 RECOVERY - SSH on ms-be1013 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u1 (protocol 2.0)
[14:08:55] <Niharika>	 Still waiting...
[14:09:41] <Niharika>	 eddiegp: Done. 
[14:09:44] <eddiegp>	 Niharika: It's already working though :)
[14:09:53] <Niharika>	 eddiegp: Okay, great! 
[14:10:56] <wikibugs>	 (03CR) 10Jcrespo: "I think the issue is that you created a file, not a ../../../ link" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401436 (https://phabricator.wikimedia.org/T177208) (owner: 10Marostegui)
[14:11:41] <icinga-wm_>	 PROBLEM - SSH on ms-be1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:11:43] <logmsgbot>	 !log niharika29@tin Synchronized wmf-config/InitialiseSettings.php: Restrict sending mails to new users T182541 (duration: 01m 02s)
[14:11:52] <Niharika>	 eddiegp: Synced^
[14:11:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:55] <stashbot>	 T182541: Update Wikimedia configuration to prevent some users from sending emails - https://phabricator.wikimedia.org/T182541
[14:12:00] <wikibugs>	 (03PS4) 10Niharika29: Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow onwiki localizaion of a commosn specific notice. All changes have been made onwiki yet. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848) (owner: 10Steinsplitter)
[14:12:10] <eddiegp>	 Niharika: Thanks :)
[14:12:34] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/9551/ says it's fine with a PCC across the fleet. Also a quick check in labs was fine as we" [puppet] - 10https://gerrit.wikimedia.org/r/401974 (owner: 10Alexandros Kosiaris)
[14:12:49] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: ifguard $realm and $cluster with defined() [puppet] - 10https://gerrit.wikimedia.org/r/401974
[14:13:02] <wikibugs>	 (03CR) 10Niharika29: [C: 032] "SWAT." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848) (owner: 10Steinsplitter)
[14:13:42] <icinga-wm_>	 RECOVERY - SSH on ms-be1013 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u1 (protocol 2.0)
[14:14:12] <icinga-wm_>	 PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 101.68, 118.30, 121.21
[14:14:21] <wikibugs>	 (03Merged) 10jenkins-bot: Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow onwiki localizaion of a commosn specific notice. All changes have been made onwiki yet. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848) (owner: 10Steinsplitter)
[14:15:04] <Niharika>	 Steinsplitter: Yours is on mwdebug1002 as well.
[14:15:04] <wikibugs>	 (03PS5) 10Marostegui: db-eqiad.php: Point wikidatawiki to s8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401436 (https://phabricator.wikimedia.org/T177208)
[14:15:17] <Steinsplitter>	 Niharika: thanks will test 
[14:16:11] <wikibugs>	 (03CR) 10Jcrespo: "Hey, I asked to upgrade the dump hosts! And dumps should be faster on dedicated hardware, probably?" [puppet] - 10https://gerrit.wikimedia.org/r/402047 (https://phabricator.wikimedia.org/T184179) (owner: 10Jcrespo)
[14:16:56] <wikibugs>	 (03CR) 10jenkins-bot: Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow onwiki localizaion of a commosn specific notice. All changes have been made onwiki yet. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402040 (https://phabricator.wikimedia.org/T183848) (owner: 10Steinsplitter)
[14:17:55] <wikibugs>	 (03CR) 10ArielGlenn: [C: 031] "They run slower due to other concerns, such as wikidata eating us alive, nothing for your todo list ;-)" [puppet] - 10https://gerrit.wikimedia.org/r/402047 (https://phabricator.wikimedia.org/T184179) (owner: 10Jcrespo)
[14:19:08] <Steinsplitter>	 Niharika: is it synchronized yet?
[14:19:19] <Niharika>	 Steinsplitter: Yep. 
[14:19:45] <Steinsplitter>	 perfect, thanks.
[14:20:05] <Niharika>	 Steinsplitter: Oh you mean synchornised everywhere?
[14:20:11] <Niharika>	 No, I was waiting on you testing it.
[14:20:20] <Niharika>	 synchronized* 
[14:20:51] <Steinsplitter>	 one sec.
[14:21:58] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] "That should be it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/401436 (https://phabricator.wikimedia.org/T177208) (owner: 10Marostegui)
[14:22:05] <Steinsplitter>	 Niharika: works.
[14:22:18] <Niharika>	 Alright, it's going live then.
[14:24:24] <logmsgbot>	 !log niharika29@tin Synchronized wmf-config/InitialiseSettings.php: Adding Movepage-summary to wgForceUIMsgAsContentMsg T183848 (duration: 01m 02s)
[14:24:31] <Niharika>	 Steinsplitter: Done. It should be out everywhere now. 
[14:24:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:38] <stashbot>	 T183848: MediaWiki:Movepage-summary is not forced to content language - https://phabricator.wikimedia.org/T183848
[14:25:38] <Niharika>	 I guess that's all for SWAT today. 
[14:25:45] <Steinsplitter>	 Niharika: thanks!
[14:25:51] <icinga-wm_>	 PROBLEM - SSH on ms-be1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:27:11] <icinga-wm_>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 24 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:27:24] <wikibugs>	 (03PS1) 10Elukey: profile::hadoop: set hiera defaults to ease labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790)
[14:28:41] <wikibugs>	 (03CR) 10Ottomata: [C: 031] profile::hadoop: set hiera defaults to ease labs deployments (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[14:32:12] <icinga-wm_>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 10 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:33:43] <wikibugs>	 10Operations, 10Puppet, 10Puppet-infrastructure-modernization: Fix unknown variables warning that occur with puppet 4.x - https://phabricator.wikimedia.org/T184186#3875233 (10akosiaris)
[14:34:51] <icinga-wm_>	 RECOVERY - SSH on ms-be1013 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u1 (protocol 2.0)
[14:35:22] <icinga-wm_>	 PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 88.80, 121.97, 129.62
[14:35:48] <wikibugs>	 (03CR) 10Ottomata: [C: 031] "Cool!  +1 for this as a no-op :)" [puppet] - 10https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[14:36:09] <wikibugs>	 (03PS37) 10TerraCodes: $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[14:36:17] <wikibugs>	 (03CR) 10Ottomata: [C: 031] profile::hadoop::master: remove the last hadoop cdh auto-lookup [puppet] - 10https://gerrit.wikimedia.org/r/401904 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[14:36:33] <volans>	 akosiaris: should we restart ircecho to have icinga-wm_ without underscore?
[14:38:20] <akosiaris>	 the code is a mess btw (not that I didn't already know, my hands are in that one as well), I've taken a step back to reevaluate a bit
[14:38:29] <akosiaris>	 volans: and done
[14:38:36] <volans>	 ok, and thanks!
[14:43:23] <wikibugs>	 (03PS2) 10Elukey: profile::hadoop::master: remove the last hadoop cdh auto-lookup [puppet] - 10https://gerrit.wikimedia.org/r/401904 (https://phabricator.wikimedia.org/T167790)
[14:44:03] <wikibugs>	 (03CR) 10Elukey: [C: 032] profile::hadoop::master: remove the last hadoop cdh auto-lookup [puppet] - 10https://gerrit.wikimedia.org/r/401904 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[14:44:19] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Disable docker bridge in production/staging [puppet] - 10https://gerrit.wikimedia.org/r/401929
[14:44:25] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Disable docker bridge in production/staging [puppet] - 10https://gerrit.wikimedia.org/r/401929 (owner: 10Alexandros Kosiaris)
[14:44:35] <wikibugs>	 (03CR) 10Jcrespo: "See parsercachepurging.pp for what I mean about the cron scheduling. Not voting -1 because maybe weekly purges are intended." [puppet] - 10https://gerrit.wikimedia.org/r/395694 (https://phabricator.wikimedia.org/T181107) (owner: 10Gergő Tisza)
[14:44:58] <wikibugs>	 (03PS4) 10Jcrespo: Add ReadingLists tables to Toolforge filter config [puppet] - 10https://gerrit.wikimedia.org/r/397623 (owner: 10Gergő Tisza)
[14:45:15] <wikibugs>	 (03PS4) 10Elukey: Refactor thorium's roles in one [puppet] - 10https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790)
[14:46:20] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] Add ReadingLists tables to Toolforge filter config [puppet] - 10https://gerrit.wikimedia.org/r/397623 (owner: 10Gergő Tisza)
[14:46:51] <wikibugs>	 (03CR) 10Elukey: [C: 032] Refactor thorium's roles in one [puppet] - 10https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[14:46:56] <wikibugs>	 (03PS5) 10Elukey: Refactor thorium's roles in one [puppet] - 10https://gerrit.wikimedia.org/r/401927 (https://phabricator.wikimedia.org/T167790)
[14:47:06] <wikibugs>	 10Operations, 10Puppet, 10Puppet-infrastructure-modernization: Fix unknown variables warning that occur with puppet 4.x - https://phabricator.wikimedia.org/T184186#3875271 (10Paladox) For  'passwords::gerrit::gerrit_phab_token'. at /etc/puppet/modules/gerrit/manifests/jetty.pp:43:19  Can be fixed anytime but...
[14:49:58] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Move db2046 socket location to /run [puppet] - 10https://gerrit.wikimedia.org/r/402054 (https://phabricator.wikimedia.org/T148507)
[14:51:32] <icinga-wm>	 PROBLEM - puppet last run on thorium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[init_superset]
[14:51:41] <elukey>	 this is me --^
[14:51:41] <elukey>	 fixing
[14:51:52] <icinga-wm>	 PROBLEM - Host kubernetes1002 is DOWN: PING CRITICAL - Packet loss = 100%
[14:52:21] <icinga-wm>	 RECOVERY - Host kubernetes1002 is UP: PING WARNING - Packet loss = 61%, RTA = 0.19 ms
[14:54:15] <jynus>	 !log restart db2046 database to move socket location
[14:54:18] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: add backend varnish mtail job [puppet] - 10https://gerrit.wikimedia.org/r/402055 (https://phabricator.wikimedia.org/T177199)
[14:54:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:59] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Move db2046 socket location to /run [puppet] - 10https://gerrit.wikimedia.org/r/402054 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[14:58:05] <wikibugs>	 10Operations, 10ops-esams, 10Epic: Remove all decommissioned hardware - https://phabricator.wikimedia.org/T184063#3875293 (10mark)
[15:00:05] <icinga-wm>	 PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:00:35] <icinga-wm>	 PROBLEM - Host kubernetes1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:00:54] <icinga-wm>	 RECOVERY - Host kubernetes1004 is UP: PING WARNING - Packet loss = 50%, RTA = 84.17 ms
[15:00:55] <icinga-wm>	 RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 73461 bytes in 0.177 second response time
[15:01:34] <icinga-wm>	 RECOVERY - puppet last run on thorium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:01:55] <icinga-wm>	 PROBLEM - Host kubestage1001 is DOWN: PING CRITICAL - Packet loss = 100%
[15:02:23] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: extend eqiad SMART checking deployment [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552)
[15:02:44] <icinga-wm>	 PROBLEM - Host kubestage1002 is DOWN: PING CRITICAL - Packet loss = 100%
[15:03:01] <moritzm>	 !log upgrading HHVM on eqiad image scalers to 3.18.6
[15:03:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:14] <icinga-wm>	 RECOVERY - Host kubestage1002 is UP: PING WARNING - Packet loss = 61%, RTA = 0.26 ms
[15:03:24] <icinga-wm>	 RECOVERY - Host kubestage1001 is UP: PING OK - Packet loss = 0%, RTA = 1.37 ms
[15:03:55] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402057
[15:05:51] <wikibugs>	 (03PS2) 10Elukey: profile::hadoop: set hiera defaults to ease labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790)
[15:06:15] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 031] "The lab* hosts in this look fine to me.  You're only checking single servers as a proof-of-concept, I take it?  Ultimately I'd like all th" [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[15:06:27] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402057 (owner: 10Marostegui)
[15:06:35] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be1013 is OK: OK - load average: 68.53, 70.56, 79.82
[15:07:54] <icinga-wm>	 PROBLEM - HHVM rendering on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:07:55] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402057 (owner: 10Marostegui)
[15:08:06] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402057 (owner: 10Marostegui)
[15:08:44] <icinga-wm>	 RECOVERY - HHVM rendering on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 73453 bytes in 0.260 second response time
[15:09:10] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1101:3317 - T163190 (duration: 01m 02s)
[15:09:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:21] <stashbot>	 T163190: Checksum data on s7 - https://phabricator.wikimedia.org/T163190
[15:10:29] <wikibugs>	 (03CR) 10Elukey: "no op: https://puppet-compiler.wmflabs.org/compiler02/9560/" [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[15:11:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: "> The lab* hosts in this look fine to me.  You're only checking" [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[15:11:44] <icinga-wm>	 PROBLEM - HHVM rendering on mw1296 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:12:25] <icinga-wm>	 PROBLEM - puppet last run on mw1293 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm-dbg],Package[hhvm]
[15:12:34] <icinga-wm>	 RECOVERY - HHVM rendering on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 73453 bytes in 0.313 second response time
[15:12:54] <icinga-wm>	 PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm-dbg],Package[hhvm]
[15:13:29] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 031] "> If you have a more representative sample of labvirt hosts you'd like to have covered first let me know!" [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[15:15:38] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Move db2060 socket to /run [puppet] - 10https://gerrit.wikimedia.org/r/402058 (https://phabricator.wikimedia.org/T148507)
[15:16:03] <wikibugs>	 (03PS1) 10Volans: Migration to Python 3 [software/cumin] - 10https://gerrit.wikimedia.org/r/402059
[15:16:14] <icinga-wm>	 PROBLEM - DPKG on kubernetes1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:16:24] <icinga-wm>	 PROBLEM - DPKG on kubernetes1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:16:25] <icinga-wm>	 PROBLEM - DPKG on kubernetes2003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:16:25] <icinga-wm>	 PROBLEM - DPKG on kubernetes2002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:16:35] <icinga-wm>	 PROBLEM - DPKG on kubernetes1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:16:45] <icinga-wm>	 PROBLEM - DPKG on kubernetes2004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:16:54] <icinga-wm>	 PROBLEM - DPKG on kubernetes2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:16:54] <icinga-wm>	 PROBLEM - DPKG on kubernetes1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:17:35] <icinga-wm>	 PROBLEM - puppet last run on kubernetes2001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[openssh-server],Exec[set debconf flag seen for wireshark-common/install-setuid]
[15:17:46] <jynus>	 !log upgrade and restart db2060
[15:17:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:44] <moritzm>	 did someone do a dist-upgrade on kubernetes*? those are stuck in a debconf prompt for openssh
[15:19:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Migration to Python 3 [software/cumin] - 10https://gerrit.wikimedia.org/r/402059 (owner: 10Volans)
[15:19:15] <icinga-wm>	 PROBLEM - puppet last run on kubernetes2002 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[openssh-server],Exec[set debconf flag seen for wireshark-common/install-setuid]
[15:19:21] <moritzm>	 needs to be done with -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"
[15:19:35] <akosiaris>	 I did
[15:19:46] <akosiaris>	 but not a dist-upgrade
[15:19:54] <akosiaris>	 simply apt upgrade 
[15:19:55] <jynus>	 was it puppet?
[15:19:57] <jynus>	 ah
[15:20:04] <akosiaris>	 anyway fixing
[15:20:06] <moritzm>	 that'll trigger it as well
[15:20:32] <wikibugs>	 (03PS1) 10Ema: cache_canary: use main Kafka cluster(s) [puppet] - 10https://gerrit.wikimedia.org/r/402061
[15:21:27] <wikibugs>	 (03PS2) 10Volans: Migration to Python 3 [software/cumin] - 10https://gerrit.wikimedia.org/r/402059
[15:21:31] <moritzm>	 ideally we'd find a way to fix our puppet sshd config not to interfere with the default config shipped in the package, needs some poking
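
A minimal sketch of the unattended upgrade invocation discussed above, assuming the two Dpkg options moritzm quotes. The helper name `noninteractive_upgrade_command` is hypothetical, and setting `DEBIAN_FRONTEND=noninteractive` is an added assumption that nobody in the channel mentioned. The function only builds the command and environment; it does not run apt.

```python
import os


def noninteractive_upgrade_command(dist_upgrade=False):
    """Build argv and environment for an unattended apt-get upgrade.

    Hypothetical helper: it only assembles the command, it never runs it.
    """
    action = "dist-upgrade" if dist_upgrade else "upgrade"
    argv = [
        "apt-get",
        "-y",
        # Options quoted in the channel: take the default conffile action
        # when there is one, and otherwise keep the locally modified file.
        "-o", "Dpkg::Options::=--force-confdef",
        "-o", "Dpkg::Options::=--force-confold",
        action,
    ]
    # Assumption (not from the log): ask debconf not to prompt at all.
    env = dict(os.environ, DEBIAN_FRONTEND="noninteractive")
    return argv, env


if __name__ == "__main__":
    argv, _env = noninteractive_upgrade_command()
    print(" ".join(argv))
```

The resulting `argv` and `env` could be handed to `subprocess.run(argv, env=env, check=True)` on a host where an upgrade is actually intended.
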
[15:21:45] <icinga-wm>	 RECOVERY - DPKG on kubernetes2004 is OK: All packages OK
[15:21:54] <icinga-wm>	 RECOVERY - DPKG on kubernetes1002 is OK: All packages OK
[15:21:54] <icinga-wm>	 RECOVERY - DPKG on kubernetes2001 is OK: All packages OK
[15:22:16] <icinga-wm>	 RECOVERY - DPKG on kubernetes1004 is OK: All packages OK
[15:22:24] <icinga-wm>	 RECOVERY - DPKG on kubernetes1003 is OK: All packages OK
[15:22:24] <icinga-wm>	 RECOVERY - DPKG on kubernetes2003 is OK: All packages OK
[15:22:25] <icinga-wm>	 RECOVERY - DPKG on kubernetes2002 is OK: All packages OK
[15:22:25] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Move db2060 socket to /run [puppet] - 10https://gerrit.wikimedia.org/r/402058 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[15:22:35] <icinga-wm>	 RECOVERY - puppet last run on kubernetes2001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[15:22:35] <icinga-wm>	 RECOVERY - DPKG on kubernetes1001 is OK: All packages OK
[15:23:38] <wikibugs>	 (03CR) 10Ottomata: [V: 032 C: 032] "Ah!  great, yes." [puppet] - 10https://gerrit.wikimedia.org/r/402061 (owner: 10Ema)
[15:23:42] <wikibugs>	 (03PS2) 10Ottomata: cache_canary: use main Kafka cluster(s) [puppet] - 10https://gerrit.wikimedia.org/r/402061 (owner: 10Ema)
[15:23:53] <ottomata>	 ema:  shall I merge and apply that?
[15:24:15] <icinga-wm>	 RECOVERY - puppet last run on kubernetes2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[15:25:13] <ema>	 ottomata: yeah, that patch fixes one of pinkunicorn puppetfails so let's do that :)
[15:25:24] <ema>	 thanks!
[15:26:41] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users for imarlier - https://phabricator.wikimedia.org/T184190#3875340 (10Imarlier)
[15:26:46] <ema>	 _joe_: the other puppetfail is likely due to adding ipsec to basic cache roles I think
[15:26:57] <ema>	 https://puppet-compiler.wmflabs.org/compiler02/9562/cp1008.wikimedia.org/change.cp1008.wikimedia.org.err
[15:27:20] <wikibugs>	 10Operations, 10Patch-For-Review: Debian Jessie reimage/install ends up in kernel panic with 8.10 netboot image - https://phabricator.wikimedia.org/T182702#3875354 (10Marostegui)
[15:27:25] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Rack and setup db1111 and db1112 - https://phabricator.wikimedia.org/T180788#3875352 (10Marostegui) 05Open>03Resolved >>! In T180788#3854799, @Marostegui wrote: > This has been all set.  > Servers replicate between each other (db1111 being the master).  > They con...
[15:27:25] <akosiaris>	 moritzm: anyway, why does upgrading openssh-server generate an SSH2 DSA key?
[15:27:28] <akosiaris>	 Creating SSH2 DSA key; this may take some time ...
[15:27:28] <akosiaris>	 1024 SHA256:fcRTTDIGn+z2JzgwZAx5RAuuG19jEK9tH9axQxhMlME root@kubestage1001 (DSA)
[15:27:39] <akosiaris>	 I thought it was disabled in 2016 
[15:27:49] <akosiaris>	 in fact that's what the changelog says
[15:27:49] <wikibugs>	 (03PS1) 10Ottomata: Refine mediawiki job queue events into Hive event database [puppet] - 10https://gerrit.wikimedia.org/r/402064
[15:28:01] <ema>	 _joe_: oh, yes, that's because role::cache::canary includes role::cache::text
[15:28:05] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Move db2067 socket to /run [puppet] - 10https://gerrit.wikimedia.org/r/402065 (https://phabricator.wikimedia.org/T148507)
[15:30:01] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Move db2067 socket to /run [puppet] - 10https://gerrit.wikimedia.org/r/402065 (https://phabricator.wikimedia.org/T148507)
[15:30:31] <wikibugs>	 (03PS1) 10Ema: role::cache::text: do not include ipsec role for pinkunicorn [puppet] - 10https://gerrit.wikimedia.org/r/402067
[15:30:33] <moritzm>	 akosiaris: the postinst checks whether sshd_config configures "HostKey" and if that's the case generates host keys for all configured variants
[15:31:06] <wikibugs>	 (03CR) 10Ottomata: [C: 032] "Looooks good. https://puppet-compiler.wmflabs.org/compiler02/9563/analytics1003.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/402064 (owner: 10Ottomata)
[15:31:07] <moritzm>	 to fully disable it from our config we need the ganeti from stretch 9.2 or a backport, see https://phabricator.wikimedia.org/T177371
[15:31:10] <wikibugs>	 (03PS2) 10Ottomata: Refine mediawiki job queue events into Hive event database [puppet] - 10https://gerrit.wikimedia.org/r/402064
[15:31:14] <wikibugs>	 (03PS1) 10Rush: tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018)
[15:31:20] <wikibugs>	 (03CR) 10Ottomata: [V: 032 C: 032] Refine mediawiki job queue events into Hive event database [puppet] - 10https://gerrit.wikimedia.org/r/402064 (owner: 10Ottomata)
[15:31:31] <wikibugs>	 (03PS1) 10Filippo Giunchedi: cassandra: use prometheus-jmx-exporter Debian package [puppet] - 10https://gerrit.wikimedia.org/r/402069 (https://phabricator.wikimedia.org/T181728)
[15:31:33] <wikibugs>	 (03PS1) 10Filippo Giunchedi: cassandra: switch to using jmx-exporter jar from Debian package [puppet] - 10https://gerrit.wikimedia.org/r/402070 (https://phabricator.wikimedia.org/T181728)
[15:31:36] <logmsgbot>	 !log demon@tin Synchronized php-1.31.0-wmf.15/extensions/ActiveAbstract/: unbreak, T184177 (duration: 01m 02s)
[15:31:44] <no_justification>	 apergos: ^^^
[15:31:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:45] <stashbot>	 T184177: Abstract dumps broken by MW deploy - https://phabricator.wikimedia.org/T184177
[15:31:48] <apergos>	 thank you
[15:31:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[15:31:52] <no_justification>	 yw
[15:32:00] <apergos>	 let me have a run of that command on my test host again
[15:32:04] <akosiaris>	 moritzm: ah nice thanks
[15:32:25] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users, perf-team for imarlier - https://phabricator.wikimedia.org/T184190#3875392 (10Imarlier)
[15:32:53] <wikibugs>	 (03PS2) 10Filippo Giunchedi: hieradata: extend eqiad SMART checking deployment [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552)
[15:33:02] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1335 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[15:33:36] <akosiaris>	 moritzm: so what basically happened in this case is that this was the first time I upgraded the openssh-server package since those hosts were installed (and the sshd_config we ship was applied afterwards). OK, makes sense
[15:33:37] <akosiaris>	 thanks!
[15:34:47] <wikibugs>	 (03PS2) 10Ema: role::cache::text: do not include ipsec role for pinkunicorn [puppet] - 10https://gerrit.wikimedia.org/r/402067
[15:34:57] <wikibugs>	 (03CR) 10Ema: [V: 032 C: 032] role::cache::text: do not include ipsec role for pinkunicorn [puppet] - 10https://gerrit.wikimedia.org/r/402067 (owner: 10Ema)
[15:35:02] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1335 is OK: OK: nf_conntrack is 72 % full
[15:35:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: "> > If you have a more representative sample of labvirt hosts you'd" [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[15:35:15] <wikibugs>	 (03PS2) 10Rush: tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018)
[15:35:28] <wikibugs>	 (03PS3) 10Rush: tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018)
[15:36:15] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Move db2067 socket to /run [puppet] - 10https://gerrit.wikimedia.org/r/402065 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo)
[15:36:21] <wikibugs>	 (03PS3) 10Jcrespo: mariadb: Move db2067 socket to /run [puppet] - 10https://gerrit.wikimedia.org/r/402065 (https://phabricator.wikimedia.org/T148507)
[15:36:34] <jynus>	 !log upgrade and restart db2067
[15:36:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[15:36:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:15] <apergos>	 looks good, thanks much
[15:37:22] <icinga-wm>	 RECOVERY - puppet last run on mw1293 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[15:37:49] <ema>	 _joe_: fixed with https://gerrit.wikimedia.org/r/402067 FYI 
[15:37:52] <icinga-wm>	 RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[15:40:45] <wikibugs>	 (03PS3) 10Elukey: profile::hadoop: set hiera defaults to ease labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790)
[15:41:12] <icinga-wm>	 PROBLEM - HHVM rendering on mw1297 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:41:22] <icinga-wm>	 PROBLEM - HHVM rendering on mw1298 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:41:57] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler03/9567/" [puppet] - 10https://gerrit.wikimedia.org/r/402070 (https://phabricator.wikimedia.org/T181728) (owner: 10Filippo Giunchedi)
[15:42:02] <icinga-wm>	 RECOVERY - HHVM rendering on mw1297 is OK: HTTP OK: HTTP/1.1 200 OK - 73453 bytes in 0.274 second response time
[15:42:13] <icinga-wm>	 RECOVERY - HHVM rendering on mw1298 is OK: HTTP OK: HTTP/1.1 200 OK - 73455 bytes in 1.996 second response time
[15:42:15] <wikibugs>	 (03PS1) 10Ottomata: Use intermediate script for json refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/402072
[15:42:21] <wikibugs>	 (03PS2) 10Ottomata: Use intermediate script for json refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/402072
[15:42:52] <icinga-wm>	 PROBLEM - Host kubernetes1002 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:02] <icinga-wm>	 RECOVERY - Host kubernetes1002 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms
[15:44:12] <jynus>	 !log upgrade and restart db2076
[15:44:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:22] <icinga-wm>	 PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm-dbg],Package[hhvm]
[15:47:02] <icinga-wm>	 PROBLEM - Host kubernetes1002 is DOWN: PING CRITICAL - Packet loss = 100%
[15:47:22] <icinga-wm>	 RECOVERY - Host kubernetes1002 is UP: PING WARNING - Packet loss = 93%, RTA = 192.30 ms
[15:48:32] <wikibugs>	 (03PS4) 10Elukey: profile::hadoop: set hiera defaults to ease labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790)
[15:49:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::hadoop: set hiera defaults to ease labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[15:49:08] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users, perf-team for imarlier - https://phabricator.wikimedia.org/T184190#3875434 (10Nuria) Approved
[15:49:27] <wikibugs>	 (03PS3) 10Ottomata: Use intermediate script for json refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/402072
[15:49:32] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:52:07] <wikibugs>	 (03PS5) 10Elukey: profile::hadoop: set hiera defaults to ease labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790)
[15:53:26] <wikibugs>	 (03PS4) 10Andrew Bogott: tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[15:53:51] <wikibugs>	 (03CR) 10Ottomata: [C: 032] "https://puppet-compiler.wmflabs.org/compiler03/9571/analytics1003.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/402072 (owner: 10Ottomata)
[15:53:59] <wikibugs>	 10Operations, 10Data-Services, 10MediaWiki-Maintenance-scripts, 10Wikidata, 10Patch-For-Review: Missing references to s8 on maintenance and cloud scripts (and potentially others) - https://phabricator.wikimedia.org/T184179#3875081 (10chasemp) Just an extra ping for @bd808 as he wrote most of what I think...
[15:55:36] <wikibugs>	 (03CR) 10Elukey: [C: 032] "a wonderful no-op https://puppet-compiler.wmflabs.org/compiler03/9572/" [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[15:55:43] <wikibugs>	 (03PS6) 10Elukey: profile::hadoop: set hiera defaults to ease labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/402050 (https://phabricator.wikimedia.org/T167790)
[16:03:37] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes1002 is OK: OK - unknown: The operational state could not be determined, due to lack of resources or another error cause.
[16:04:07] <akosiaris>	 all the kubernetes hosts rebooting is me (in case icinga-wm is fast enough)
[16:04:37] <icinga-wm>	 PROBLEM - Host kubernetes2003 is DOWN: PING CRITICAL - Packet loss = 100%
[16:04:47] <icinga-wm>	 PROBLEM - Host kubernetes1004 is DOWN: PING CRITICAL - Packet loss = 100%
[16:04:57] <icinga-wm>	 PROBLEM - Host kubernetes2002 is DOWN: PING CRITICAL - Packet loss = 100%
[16:04:58] <icinga-wm>	 PROBLEM - Host kubernetes1003 is DOWN: PING CRITICAL - Packet loss = 100%
[16:04:58] <icinga-wm>	 PROBLEM - Host kubestage1002 is DOWN: PING CRITICAL - Packet loss = 100%
[16:05:08] <mark>	 icinga-wm is faster than you think
[16:05:17] <icinga-wm>	 PROBLEM - Host kubestage1001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:05:17] <icinga-wm>	 PROBLEM - Host kubernetes2004 is DOWN: PING CRITICAL - Packet loss = 100%
[16:05:17] <icinga-wm>	 RECOVERY - Host kubernetes1003 is UP: PING OK - Packet loss = 0%, RTA = 14.23 ms
[16:05:17] <icinga-wm>	 PROBLEM - Host kubernetes2001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:05:28] <icinga-wm>	 RECOVERY - Host kubernetes1004 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[16:05:37] <icinga-wm>	 RECOVERY - Host kubernetes2004 is UP: PING WARNING - Packet loss = 61%, RTA = 57.96 ms
[16:05:37] <icinga-wm>	 RECOVERY - Host kubernetes2003 is UP: PING OK - Packet loss = 0%, RTA = 36.14 ms
[16:05:37] <icinga-wm>	 RECOVERY - Host kubernetes2001 is UP: PING OK - Packet loss = 0%, RTA = 36.10 ms
[16:05:47] <icinga-wm>	 RECOVERY - Host kubestage1001 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[16:05:49] <icinga-wm>	 RECOVERY - Host kubernetes2002 is UP: PING OK - Packet loss = 0%, RTA = 36.08 ms
[16:05:49] <icinga-wm>	 RECOVERY - Host kubestage1002 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[16:06:06] <akosiaris>	 it may have something to do with python threads starting python threads starting forever loops
[16:06:08] <akosiaris>	 :P
[16:06:24] * akosiaris has a headache once more by reading ircecho.py
[16:07:19] <chasemp>	 akosiaris: one time I fixed a bug in it and that resulted in revealing like 5 more scoping and shadow var name issues and I have fled from it ever since
[16:08:30] <akosiaris>	 chasemp: I feel you. I've done the exact same thing. I've even started reading python-irclib code. That's when I was terrified and decided I don't want much to do with it. And it only returned like some bad movie with a vengeance
[16:09:55] <mark>	 is ircecho.py running on kubernetes yet :D
[16:10:09] <akosiaris>	 sure
[16:10:36] <akosiaris>	 for like -2147483648 days already
[16:10:41] <akosiaris>	 oh wait....
[16:10:48] <paravoid>	 should replace our IRC bots with a proper bot that consumes from kafka
[16:11:00] <paravoid>	 and a bunch of kafka producers emitting notable events :)
[16:11:33] <akosiaris>	 paravoid is volunteering to rewrite ircecho as kafkaecho!!!
[16:11:36] <ottomata>	 <:o)
[16:11:40] <paravoid>	 I just might :)
[16:11:44] <mark>	 paravoid has a team now
[16:11:48] <ottomata>	 he already sorta has!
[16:12:08] <paravoid>	 nah, that was an IRC server!
[16:12:13] <ottomata>	 OHHH
[16:12:15] <ottomata>	 the IRC bot
[16:12:16] <ottomata>	 haha
[16:12:17] <ottomata>	 yes!
[16:12:18] <ottomata>	 YES!
[16:12:24] <ottomata>	 bring it on IIIIN https://wikitech.wikimedia.org/wiki/User:Ottomata/Stream_Data_Platform#Stream_Data_Platform
[16:12:54] <paravoid>	 I proposed running fedmsg in the past
[16:13:00] <paravoid>	 ( http://www.fedmsg.com/en/stable/ )
[16:13:12] <paravoid>	 but nowadays should probably just leverage kafka instead
[16:14:58] <jynus>	 !log upgrade and restart db2087 (s6/s7)
[16:15:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:17] <icinga-wm>	 RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:18:36] <ottomata>	 paravoid:  have you seen https://github.com/confluentinc/ksql#use-cases-and-examples ?
[16:19:23] <paravoid>	 ksql you mean, or one of those specific examples?
[16:19:37] <ottomata>	 ksql
[16:20:01] <paravoid>	 --- Log opened Mon Aug 28 23:10:03 2017
[16:20:01] <paravoid>	 23:10 <paravoid> did you see https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/ ?
[16:20:04] <paravoid>	 23:11 <ottomata> whoooa no i didn't
[16:20:08] <paravoid>	 :P
[16:20:11] <ottomata>	 HAHAHHA
[16:20:24] <ottomata>	 i'm sure i'll ask you again in a few months too
[16:20:29] <paravoid>	 :D
[16:21:02] <ottomata>	 i finally got their more recent code to build and run pointed at kafka-jumbo 
[16:21:05] <ottomata>	 still need to play some more
[16:21:10] <paravoid>	 cool!
[16:22:04] <wikibugs>	 (03PS5) 10Andrew Bogott: tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[16:22:06] <wikibugs>	 (03PS1) 10Andrew Bogott: kmod blacklist: allow ensure => absent for a given blacklist [puppet] - 10https://gerrit.wikimedia.org/r/402075 (https://phabricator.wikimedia.org/T184018)
[16:26:20] <wikibugs>	 10Operations, 10Packaging, 10Scap, 10Patch-For-Review: SCAP: Upload debian package version 3.7.4-3 - https://phabricator.wikimedia.org/T182347#3875582 (10thcipriani)
[16:26:22] <wikibugs>	 10Operations, 10Scap, 10Patch-For-Review: scap 3.7.4-2 is broken - https://phabricator.wikimedia.org/T183046#3875579 (10thcipriani) 05Open>03Resolved a:03akosiaris This one was resolved with the release of scap 3.7.4-3
[16:29:36] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: cp1066's DRAC not responding to SSH - https://phabricator.wikimedia.org/T184196#3875589 (10ema)
[16:29:50] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: cp1066's DRAC not responding to SSH - https://phabricator.wikimedia.org/T184196#3875603 (10ema) p:05Triage>03Normal
[16:30:06] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: cp1066's DRAC not responding to SSH - https://phabricator.wikimedia.org/T184196#3875589 (10ema)
[16:31:12] <wikibugs>	 (03CR) 10Andrew Bogott: "Compiler output (effective no-op) here:  https://puppet-compiler.wmflabs.org/compiler02/9575/puppetmaster1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[16:32:42] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] hieradata: extend eqiad SMART checking deployment [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[16:37:09] <wikibugs>	 (03PS3) 10Filippo Giunchedi: hieradata: extend eqiad SMART checking deployment [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552)
[16:37:30] <wikibugs>	 (03CR) 10Rush: tools: need overlay module for overlay2 for k8s (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[16:38:45] <jynus>	 !log upgrade and restart db2089 (s5/s6)
[16:38:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:39:24] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] hieradata: extend eqiad SMART checking deployment [puppet] - 10https://gerrit.wikimedia.org/r/402056 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi)
[16:46:12] <wikibugs>	 10Operations, 10ops-codfw, 10Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3875655 (10Papaul) @RobH  Row B rack B1 labtestvirt2002 ge-1/0/12
[16:46:55] <wikibugs>	 10Operations, 10ops-codfw, 10Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3875657 (10Papaul) a:05Papaul>03RobH
[16:50:26] <wikibugs>	 (03CR) 10Rush: [C: 031] "makes sense to me" [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[16:50:49] <wikibugs>	 (03CR) 10Rush: [C: 031] "Have to manage state here for good or ill to leave existing bans or future removed bans in a predictive state" [puppet] - 10https://gerrit.wikimedia.org/r/402075 (https://phabricator.wikimedia.org/T184018) (owner: 10Andrew Bogott)
[16:52:03] <wikibugs>	 (03PS2) 10Andrew Bogott: kmod blacklist: allow ensure => absent for a given blacklist [puppet] - 10https://gerrit.wikimedia.org/r/402075 (https://phabricator.wikimedia.org/T184018)
[16:52:05] <wikibugs>	 (03PS1) 10Herron: mx: add civicrm.wikimedia.org to donate_domains [puppet] - 10https://gerrit.wikimedia.org/r/402078 (https://phabricator.wikimedia.org/T184120)
[16:53:52] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] kmod blacklist: allow ensure => absent for a given blacklist [puppet] - 10https://gerrit.wikimedia.org/r/402075 (https://phabricator.wikimedia.org/T184018) (owner: 10Andrew Bogott)
[16:54:38] <wikibugs>	 10Operations, 10Discovery-Search (Current work), 10Goal, 10Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3875676 (10Gehel) A .deb of prometheus jmx_exporter is now available. I started to experiment on `deployment-elastic06`. Elasticsea...
[16:54:45] <wikibugs>	 (03CR) 10Herron: [C: 032] mx: add civicrm.wikimedia.org to donate_domains [puppet] - 10https://gerrit.wikimedia.org/r/402078 (https://phabricator.wikimedia.org/T184120) (owner: 10Herron)
[16:54:54] <wikibugs>	 (03PS2) 10Herron: mx: add civicrm.wikimedia.org to donate_domains [puppet] - 10https://gerrit.wikimedia.org/r/402078 (https://phabricator.wikimedia.org/T184120)
[16:56:27] <wikibugs>	 10Operations, 10Cloud-VPS, 10Toolforge, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3875680 (10chasemp)
[17:00:05] <jouncebot>	 godog, moritzm, and _joe_: #bothumor My software never has bugs. It just develops random features. Rise for Puppet SWAT(Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T1700).
[17:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[17:00:49] <mutante>	 ah, i'll take a look as well
[17:01:11] <mutante>	 nothing in it.. ok
[17:01:20] <mutante>	 because already merged :)
[17:02:28] <godog>	 \o/
[17:03:58] <wikibugs>	 (03CR) 10Eevans: [C: 031] "LGTM (when the timing is appropriate)" [puppet] - 10https://gerrit.wikimedia.org/r/401784 (https://phabricator.wikimedia.org/T184110) (owner: 10Mobrovac)
[17:06:48] <wikibugs>	 (03PS6) 10Andrew Bogott: tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[17:10:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[17:11:16] <wikibugs>	 (03CR) 10Andrew Bogott: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[17:11:54] <wikibugs>	 (03PS2) 10Herron: add mx records for civicrm.wikimedia.org pointing to production mx's [dns] - 10https://gerrit.wikimedia.org/r/401604 (https://phabricator.wikimedia.org/T184120) (owner: 10Jgreen)
[17:12:05] <wikibugs>	 (03PS2) 10Dzahn: apache: add httpd module as a replacement [puppet] - 10https://gerrit.wikimedia.org/r/400100 (owner: 10Giuseppe Lavagetto)
[17:12:28] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] tools: need overlay module for overlay2 for k8s [puppet] - 10https://gerrit.wikimedia.org/r/402068 (https://phabricator.wikimedia.org/T184018) (owner: 10Rush)
[17:14:07] <wikibugs>	 (03CR) 10Herron: [C: 032] add mx records for civicrm.wikimedia.org pointing to production mx's [dns] - 10https://gerrit.wikimedia.org/r/401604 (https://phabricator.wikimedia.org/T184120) (owner: 10Jgreen)
[17:14:14] <wikibugs>	 (03PS3) 10Herron: add mx records for civicrm.wikimedia.org pointing to production mx's [dns] - 10https://gerrit.wikimedia.org/r/401604 (https://phabricator.wikimedia.org/T184120) (owner: 10Jgreen)
[17:16:35] <wikibugs>	 (03CR) 10Ema: [C: 031] prometheus: add backend varnish mtail job [puppet] - 10https://gerrit.wikimedia.org/r/402055 (https://phabricator.wikimedia.org/T177199) (owner: 10Filippo Giunchedi)
[17:17:33] <icinga-wm>	 PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:19:12] <wikibugs>	 (03PS2) 10Filippo Giunchedi: prometheus: add backend varnish mtail job [puppet] - 10https://gerrit.wikimedia.org/r/402055 (https://phabricator.wikimedia.org/T177199)
[17:19:25] <wikibugs>	 (03CR) 10Chad: "This has been running beta, should we land it?" [puppet] - 10https://gerrit.wikimedia.org/r/386869 (https://phabricator.wikimedia.org/T179371) (owner: 10Filippo Giunchedi)
[17:20:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] prometheus: add backend varnish mtail job [puppet] - 10https://gerrit.wikimedia.org/r/402055 (https://phabricator.wikimedia.org/T177199) (owner: 10Filippo Giunchedi)
[17:20:33] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: ircecho: Remove redundant thread [puppet] - 10https://gerrit.wikimedia.org/r/402081
[17:24:37] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: ircecho: Remove redundant thread [puppet] - 10https://gerrit.wikimedia.org/r/402081
[17:26:31] <wikibugs>	 (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/402069 (https://phabricator.wikimedia.org/T181728) (owner: 10Filippo Giunchedi)
[17:26:44] <wikibugs>	 (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/402070 (https://phabricator.wikimedia.org/T181728) (owner: 10Filippo Giunchedi)
[17:26:54] <wikibugs>	 (03PS1) 10Elukey: role::analytics_cluster: avoid to expicitly instance the standard class [puppet] - 10https://gerrit.wikimedia.org/r/402084 (https://phabricator.wikimedia.org/T167790)
[17:27:04] <wikibugs>	 (03PS1) 10Herron: exim: add civicrm.wikimedia.org to wikimedia_domains [puppet] - 10https://gerrit.wikimedia.org/r/402085 (https://phabricator.wikimedia.org/T184120)
[17:27:53] <wikibugs>	 (03CR) 10Herron: [C: 032] exim: add civicrm.wikimedia.org to wikimedia_domains [puppet] - 10https://gerrit.wikimedia.org/r/402085 (https://phabricator.wikimedia.org/T184120) (owner: 10Herron)
[17:28:03] <wikibugs>	 (03PS1) 10BryanDavis: wikireplica_dns: Add s8 shard [puppet] - 10https://gerrit.wikimedia.org/r/402086 (https://phabricator.wikimedia.org/T184179)
[17:28:05] <wikibugs>	 (03PS1) 10BryanDavis: wmcs: Add s8 to maintain-meta_p [puppet] - 10https://gerrit.wikimedia.org/r/402087 (https://phabricator.wikimedia.org/T184179)
[17:32:01] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] wmcs: Add s8 to maintain-meta_p [puppet] - 10https://gerrit.wikimedia.org/r/402087 (https://phabricator.wikimedia.org/T184179) (owner: 10BryanDavis)
[17:32:40] <wikibugs>	 (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/9577/ - noop" [puppet] - 10https://gerrit.wikimedia.org/r/402084 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey)
[17:32:46] <wikibugs>	 (03PS2) 10Elukey: role::analytics_cluster: avoid to expicitly instance the standard class [puppet] - 10https://gerrit.wikimedia.org/r/402084 (https://phabricator.wikimedia.org/T167790)
[17:33:03] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] wikireplica_dns: Add s8 shard [puppet] - 10https://gerrit.wikimedia.org/r/402086 (https://phabricator.wikimedia.org/T184179) (owner: 10BryanDavis)
[17:33:15] <wikibugs>	 (03CR) 10Dzahn: [C: 031] "the type aliases added into wmflib seem a little unrelated to the httpd module itself, maybe they should be a separate change" [puppet] - 10https://gerrit.wikimedia.org/r/400100 (owner: 10Giuseppe Lavagetto)
[17:34:25] <wikibugs>	 10Operations, 10ops-eqiad: mw1191 ipmi-sel cpu errors - https://phabricator.wikimedia.org/T179640#3731680 (10RobH) This host now also has a failed sector check:   ``` This message was generated by the smartd daemon running on:     host name:  mw1191    DNS domain: eqiad.wmnet  The following warning/error was l...
[17:34:44] <wikibugs>	 (03PS2) 10Andrew Bogott: wikireplica_dns: Add s8 shard [puppet] - 10https://gerrit.wikimedia.org/r/402086 (https://phabricator.wikimedia.org/T184179) (owner: 10BryanDavis)
[17:35:14] <wikibugs>	 (03PS2) 10Andrew Bogott: wmcs: Add s8 to maintain-meta_p [puppet] - 10https://gerrit.wikimedia.org/r/402087 (https://phabricator.wikimedia.org/T184179) (owner: 10BryanDavis)
[17:39:48] <wikibugs>	 (03CR) 10Chad: [V: 032 C: 032] Add hooks plugin @ 2.13.9 [software/gerrit] - 10https://gerrit.wikimedia.org/r/401697 (https://phabricator.wikimedia.org/T183792) (owner: 10Chad)
[17:40:25] <logmsgbot>	 !log demon@tin Started deploy [gerrit/gerrit@1e1a79d]: deploying hooks plugin
[17:40:35] <logmsgbot>	 !log demon@tin Finished deploy [gerrit/gerrit@1e1a79d]: deploying hooks plugin (duration: 00m 10s)
[17:40:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:41:10] <no_justification>	 paladox: ^^
[17:42:02] <no_justification>	 Oh, also I'm wondering if we can come up with /some/ sort of Jenkins job to validate us before merging? Like...I feel gross just +2 / +2 / Submit myself....
[17:42:04] <icinga-wm>	 PROBLEM - Check systemd state on pc2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:42:04] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on pc2005 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[17:42:36] <moritzm>	 !log upgrading HHVM on eqiad video scalers to 3.18.6
[17:42:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:47:34] <icinga-wm>	 RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:52:13] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on pc2005 is OK: OK ferm input default policy is set
[17:53:04] <icinga-wm>	 RECOVERY - Check systemd state on pc2005 is OK: OK - running: The system is fully operational
[17:55:06] <wikibugs>	 (03PS3) 10Chad: Beta: Moving all docroots to standard-docroot [puppet] - 10https://gerrit.wikimedia.org/r/394203
[18:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and Amir1: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T1800).
[18:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[18:05:04] <wikibugs>	 (03PS4) 10Chad: Beta: Moving all docroots to standard-docroot [puppet] - 10https://gerrit.wikimedia.org/r/394203 (https://phabricator.wikimedia.org/T126306)
[18:05:06] <wikibugs>	 (03PS1) 10Chad: Moving all docroots to standard-docroot [puppet] - 10https://gerrit.wikimedia.org/r/402090 (https://phabricator.wikimedia.org/T126306)
[18:05:10] <wikibugs>	 (03PS1) 10Chad: Drop unused docroots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/402091 (https://phabricator.wikimedia.org/T126306)
[18:20:22] <wikibugs>	 10Operations, 10ops-codfw, 10Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3876000 (10RobH) synced up with @papaul via irc:  labtestvirt2003:eth1:ge-1/0/12 labtestvirt2003:eth2:ge-1/0/14
[18:25:59] <edsanders>	 hey - I'm trying to connect to stat1006 but can't (I haven't before either)
[18:26:12] <edsanders>	 I can connect to tin and bast1001 fine though
[18:27:23] <jynus>	 !log upgrade and restart labsdb1009
[18:27:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:29:19] <jynus>	 haproxy will complain in a second while I reboot
[18:31:45] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1010 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[18:34:36] <bd808>	 edsanders: are you in the analytics-privatedata-users group?
[18:35:02] * bd808 grumbles about that list not being in alphabetical order
[18:35:38] <bd808>	 edsanders: you aren't. So you need to be granted into that group
[18:35:54] <edsanders>	 bd808: well that would explain it
[18:35:56] <wikibugs>	 (03PS1) 10Gehel: elasticsearch / prometheus: enable prometheus jmx_exporter [puppet] - 10https://gerrit.wikimedia.org/r/402095 (https://phabricator.wikimedia.org/T181627)
[18:36:03] <edsanders>	 how do I go about doing that?
[18:36:28] * bd808 is looking for an example ticket
[18:36:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch / prometheus: enable prometheus jmx_exporter [puppet] - 10https://gerrit.wikimedia.org/r/402095 (https://phabricator.wikimedia.org/T181627) (owner: 10Gehel)
[18:36:59] <bd808>	 edsanders: make a ticket like T115548
[18:36:59] <stashbot>	 T115548: Requesting access to analytics-privatedata-users for Bryan Davis - https://phabricator.wikimedia.org/T115548
[18:37:24] <jynus>	 and we should be back up
[18:37:34] <wikibugs>	 (03CR) 10Gehel: "I'm not entirely sure about the organization of the different files / classes. It is not entirely clear to me what should be in the elasti" [puppet] - 10https://gerrit.wikimedia.org/r/402095 (https://phabricator.wikimedia.org/T181627) (owner: 10Gehel)
[18:37:45] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1010 is OK: OK check_failover servers up 2 down 0
[18:39:02] <wikibugs>	 (03PS2) 10Gehel: elasticsearch / prometheus: enable prometheus jmx_exporter [puppet] - 10https://gerrit.wikimedia.org/r/402095 (https://phabricator.wikimedia.org/T181627)
[18:40:03] <edsanders>	 thanks
[18:43:34] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to analytics-privatedata-users for Ed Sanders - https://phabricator.wikimedia.org/T184206#3876092 (10Esanders)
[18:46:58] <wikibugs>	 (03CR) 10Gehel: "Puppet compiler looks happy: https://puppet-compiler.wmflabs.org/compiler02/9578/" [puppet] - 10https://gerrit.wikimedia.org/r/402095 (https://phabricator.wikimedia.org/T181627) (owner: 10Gehel)
[18:49:27] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team: 2018-01-02: labstore Tools and Misc share very full - https://phabricator.wikimedia.org/T183920#3876110 (10madhuvishy)
[18:49:31] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team: templatetiger is using 827G of 8T available tools nfs storage - https://phabricator.wikimedia.org/T183954#3876108 (10madhuvishy) 05Open>03Resolved Thank you!
[18:52:14] <logmsgbot>	 !log bsitzmann@tin Started deploy [mobileapps/deploy@8bcffa9]: Update mobileapps to a4ba9fd (T182330 T177430 T170690 T182652 T184198)
[18:52:15] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team: wikidumpparse is using 1.2TB of 5T available NFS misc storage - https://phabricator.wikimedia.org/T183970#3876124 (10madhuvishy) @notconfusing @Dfko @Hargup Hello! Poke on this task again, could you please clean up the home folder soon, thank you.
[18:52:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:30] <stashbot>	 T182652: Citations field becomes type with just a single value - https://phabricator.wikimedia.org/T182652
[18:52:30] <stashbot>	 T184198: French news has empty news story today - https://phabricator.wikimedia.org/T184198
[18:52:30] <stashbot>	 T182330: Media: handle galleries - https://phabricator.wikimedia.org/T182330
[18:52:30] <stashbot>	 T170690: Extract a References JSON API - https://phabricator.wikimedia.org/T170690
[18:52:31] <stashbot>	 T177430: Develop a Media JSON API - https://phabricator.wikimedia.org/T177430
[18:53:07] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team: tools.iabot is using 1.3T of 8T available tools nfs storage - https://phabricator.wikimedia.org/T183953#3876159 (10madhuvishy) @Cyberpower678 Any update on this? Thanks!
[18:53:49] <wikibugs>	 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3876160 (10Tgr) >>! In T180854#3874954, @Qgil wrote: > @Andrew @Austin @EBernhardson @Tgr @Samwilson @yuvipanda, as current admins of [[ htt...
[18:58:14] <logmsgbot>	 !log bsitzmann@tin Finished deploy [mobileapps/deploy@8bcffa9]: Update mobileapps to a4ba9fd (T182330 T177430 T170690 T182652 T184198) (duration: 06m 01s)
[18:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:58:28] <stashbot>	 T182652: Citations field becomes type with just a single value - https://phabricator.wikimedia.org/T182652
[18:58:34] <stashbot>	 T184198: French news has empty news story today - https://phabricator.wikimedia.org/T184198
[18:58:34] <stashbot>	 T182330: Media: handle galleries - https://phabricator.wikimedia.org/T182330
[18:58:34] <stashbot>	 T170690: Extract a References JSON API - https://phabricator.wikimedia.org/T170690
[18:58:34] <stashbot>	 T177430: Develop a Media JSON API - https://phabricator.wikimedia.org/T177430
[18:59:28] <edsanders>	 bd808: Do I need to do anything else? (https://phabricator.wikimedia.org/T184206)
[19:00:05] <jouncebot>	 addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for Morning SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T1900).
[19:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[19:02:26] <bd808>	 edsanders: nope. I mean you could gently nudge some roots to review it and get the wait period started I guess
[19:05:47] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1335 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[19:08:25] <RoanKattouw>	 bd808: Is manager approval still required?
[19:08:31] <RoanKattouw>	 (I always thought that was a somewhat silly part of the process)
[19:08:46] <bd808>	 RoanKattouw: oh probably, and yes silly
[19:09:05] <bd808>	 although I guess expecting techops to know everyone and what they do is also silly
[19:09:08] <wikibugs>	 10Operations, 10Cloud-Services, 10netops: Consider renumbering Labs to separate address spaces - https://phabricator.wikimedia.org/T122406#1903452 (10chasemp)  > The 172.16.0.0/12 space (still RFC 1918) for private addresses, instead of 10/8.  Right now our allocation for instances in the main deployment in...
[19:10:01] <RoanKattouw>	 That's true
[19:12:56] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1335 is OK: OK: nf_conntrack is 74 % full
[19:13:00] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: auto reload log4j2 configuration [puppet] - 10https://gerrit.wikimedia.org/r/388130
[19:13:11] <chasemp>	 I can attest to rarely knowing who anyone is or what they do :)
[19:13:52] <wikibugs>	 (03CR) 10Gehel: [C: 032] elasticsearch: auto reload log4j2 configuration [puppet] - 10https://gerrit.wikimedia.org/r/388130 (owner: 10Gehel)
[19:14:13] <mutante>	 it would help to have an orgchart where you can check
[19:14:56] <RoanKattouw>	 Namely has an org chart, kind of
[19:15:02] <RoanKattouw>	 It does reliably know who someone's manager is
[19:15:24] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: ircecho: Remove redundant thread [puppet] - 10https://gerrit.wikimedia.org/r/402081
[19:15:24] <RoanKattouw>	 (Also the staff page can tell you which team someone's on, and what their title is)
[19:15:26] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: ircecho: Force unbuffered stdin/stdout/stderr [puppet] - 10https://gerrit.wikimedia.org/r/402101
[19:15:28] <mutante>	 ah yea, that's good
[19:20:20] <mutante>	 this doesn't eliminate a need for approval but it certainly makes it easier to find out who the manager is
[19:21:18] <bd808>	 there is an undocumented wish to build a self-service portal app that would make all of this easier. someday p.aravoid or I will find developer time to work on that :)
[19:22:17] <RoanKattouw>	 You know, we did that once
[19:22:29] <RoanKattouw>	 Erik Möller commissioned marktraceur to write one once
[19:22:49] <marktraceur>	 RIP
[19:24:25] <wikibugs>	 10Operations, 10Data-Services, 10MediaWiki-Maintenance-scripts, 10Wikidata, 10Patch-For-Review: Missing references to s8 on maintenance and cloud scripts (and potentially others) - https://phabricator.wikimedia.org/T184179#3876321 (10bd808) I think my patches above take care of the Cloud Services/Data Se...
[19:25:37] <icinga-wm>	 PROBLEM - HP RAID on db2054 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:4 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK
[19:25:39] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on db2054 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:4 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T184210
[19:25:42] <wikibugs>	 10Operations, 10ops-codfw: Degraded RAID on db2054 - https://phabricator.wikimedia.org/T184210#3876324 (10ops-monitoring-bot)
[19:26:10] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: db2054: Disk with predictive failure - https://phabricator.wikimedia.org/T183887#3876328 (10Papaul) a:05Papaul>03Marostegui Disk replacement complete
[19:31:51] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users, perf-team for imarlier - https://phabricator.wikimedia.org/T184190#3876335 (10RobH) p:05Triage>03Normal
[19:32:13] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users, perf-team for imarlier - https://phabricator.wikimedia.org/T184190#3875340 (10RobH)
[19:34:18] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2054 - https://phabricator.wikimedia.org/T184210#3876339 (10Peachey88)
[19:37:25] <paladox>	 no_justification :).
[19:38:25] <wikibugs>	 (03PS1) 10RobH: adding shell user imarlier [puppet] - 10https://gerrit.wikimedia.org/r/402102 (https://phabricator.wikimedia.org/T184190)
[19:41:35] <wikibugs>	 (03PS1) 10RobH: adding imarlier to groups [puppet] - 10https://gerrit.wikimedia.org/r/402103
[19:41:41] <joal>	 Hi ops-team - Little ping about me deploying analytics-refinery (analytics only stuff)
[19:42:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] adding imarlier to groups [puppet] - 10https://gerrit.wikimedia.org/r/402103 (owner: 10RobH)
[19:42:17] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users, perf-team for imarlier - https://phabricator.wikimedia.org/T184190#3876351 (10RobH)
[19:42:27] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2054 - https://phabricator.wikimedia.org/T184210#3876353 (10Marostegui) 05Open>03declined Thanks -  this is because we replaced a disk which  was on predicted failure:  T183887
[19:43:29] <wikibugs>	 (03PS2) 10RobH: adding imarlier to groups [puppet] - 10https://gerrit.wikimedia.org/r/402103 (https://phabricator.wikimedia.org/T184190)
[19:43:35] <robh>	 ok, the space AFTER bug: has changed; i could swear it required NOT having a space not that long ago...
[19:43:53] <robh>	 i have more failed commits in the past 6 months for commit messages......
[19:44:35] <robh>	 oh well.
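The failed commit robh describes comes down to how the "Bug:" footer is spelled. The sketch below shows one way such a check could look. It is not the actual Gerrit/CI commit-message hook: the function name `bad_bug_footers` and the exact rule (one space after the colon, then a Phabricator task ID) are assumptions based only on the remark above.

```python
import re

# Assumed footer form (hypothetical rule, not the real hook):
# "Bug:" followed by exactly one space and a Phabricator task ID such as T184190.
BUG_FOOTER = re.compile(r"Bug: T\d+")


def bad_bug_footers(message):
    """Return every line that looks like a Bug footer but is malformed."""
    bad = []
    for line in message.splitlines():
        # Only inspect lines that start with "bug:" in any letter case.
        if line.lower().startswith("bug:") and not BUG_FOOTER.fullmatch(line):
            bad.append(line)
    return bad


if __name__ == "__main__":
    # The first message is missing the space after the colon, the second is fine.
    print(bad_bug_footers("adding imarlier to groups\n\nBug:T184190"))
    print(bad_bug_footers("adding imarlier to groups\n\nBug: T184190"))
```

Running a check like this locally before pushing would catch the spacing problem without waiting for the jerkins-bot verdict.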
[19:46:21] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users, perf-team for imarlier - https://phabricator.wikimedia.org/T184190#3875340 (10RobH) I've updated the task description with the checklist of required items....
[19:46:51] <logmsgbot>	 !log joal@tin Started deploy [analytics/refinery@a69a2cd]: Regular analytics deploy
[19:47:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:58] <wikibugs>	 (03PS1) 10Jgreen: A/PTR for civicrm-eqiad.wm.o at 208.80.152.232, remove deprecated frdev hostnames [dns] - 10https://gerrit.wikimedia.org/r/402105
[19:49:35] <wikibugs>	 (03CR) 10Jgreen: [C: 032] A/PTR for civicrm-eqiad.wm.o at 208.80.152.232, remove deprecated frdev hostnames [dns] - 10https://gerrit.wikimedia.org/r/402105 (owner: 10Jgreen)
[19:51:29] <logmsgbot>	 !log joal@tin Finished deploy [analytics/refinery@a69a2cd]: Regular analytics deploy (duration: 04m 38s)
[19:51:32] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to analytics-privatedata-users for Ed Sanders - https://phabricator.wikimedia.org/T184206#3876366 (10RobH)
[19:51:39] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to Production SSH, statistics-privatedata-users, analytics-privatedata-users, perf-team for imarlier - https://phabricator.wikimedia.org/T184190#3876367 (10Imarlier) Awesome -- thanks, Rob!
[19:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:51] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to analytics-privatedata-users for Ed Sanders - https://phabricator.wikimedia.org/T184206#3876092 (10RobH) @esanders: I didn't see your signature on the L3 document.  This is typically required for shell access.  If your access precedes the document usage...
[19:56:05] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to analytics-privatedata-users for Ed Sanders - https://phabricator.wikimedia.org/T184206#3876092 (10Ottomata) @Esanders, what data are you trying to access?  `analytics-privatedata-users` does not get you access to stat1006 or the MySQL EventLogging datab...
[20:00:04] <jouncebot>	 no_justification: I, the Bot under the Fountain, allow thee, The Deployer, to do MediaWiki train deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180104T2000).
[20:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[20:00:07] <wikibugs>	 (03CR) 10Paladox: [C: 031] "Need to build the hooks plugin for 2.14." [software/gerrit] - 10https://gerrit.wikimedia.org/r/395820 (https://phabricator.wikimedia.org/T156120) (owner: 10Chad)
[20:03:52] <twentyafterfour>	 !log preparing to deploy the train (filling in for no_justification)
[20:04:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:23] <twentyafterfour>	 RoanKattouw: care to take a look at https://phabricator.wikimedia.org/T184123 ? `git blame` says the change was your handiwork.
[20:05:51] <twentyafterfour>	 I'd attempt a patch but I'm actually not sure what 'text' is supposed to be
[20:06:38] <twentyafterfour>	 !log There are still open blockers for wmf.15 - see T180748 .. attempting to resolve them to unblock the train.
[20:06:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:06:49] <stashbot>	 T180748: 1.31.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T180748
[20:11:20] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team: wikidumpparse is using 1.2TB of 5T available NFS misc storage - https://phabricator.wikimedia.org/T183970#3876440 (10Dfko) Hi, I am looking around for the offending files to delete them, but it has been a long while since I worked on any of this and I don't r...
[20:19:43] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team: templatetiger is using 827G of 8T available tools nfs storage - https://phabricator.wikimedia.org/T183954#3876450 (10Peachey88) a:05madhuvishy>03Kolossos
[20:20:04] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to analytics-privatedata-users for Ed Sanders - https://phabricator.wikimedia.org/T184206#3876092 (10Jdforrester-WMF) >>! In T184206#3876383, @Ottomata wrote: > @Esanders, what data are you trying to access?  `analytics-privatedata-users` does not get you...
[20:20:07] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to analytics-privatedata-users for Ed Sanders - https://phabricator.wikimedia.org/T184206#3876453 (10Esanders) >>! In T184206#3876366, @RobH wrote: > Please sign the L3, thanks!  Done.
[20:23:18] <wikibugs>	 10Operations, 10Ops-Access-Requests: Requesting access to analytics-privatedata-users for Ed Sanders - https://phabricator.wikimedia.org/T184206#3876466 (10Ottomata) `researchers` and `analytics-privatedata-users` should be whatchu need.  :)
[20:25:13] <RoanKattouw>	 twentyafterfour: Argh. It's supposed to be $sectionName
[20:25:17] <RoanKattouw>	 Fix coming
[20:28:36] <RoanKattouw>	 twentyafterfour: https://gerrit.wikimedia.org/r/#/c/402109/1/includes/parser/Parser.php
[20:37:13] <wikibugs>	 (CR) VolkerE: [C: -1] "Those are largely unoptimized. Please see https://www.mediawiki.org/wiki/Manual:Coding_conventions/SVG for in-depth optimization guideline" [mediawiki-config] - https://gerrit.wikimedia.org/r/401523 (https://phabricator.wikimedia.org/T178942) (owner: Urbanecm)
[20:40:28] <no_justification>	 I never filed a task for that? cc twentyafterfour RoanKattouw?
[20:40:39] <no_justification>	 I know I complained on IRC... like before Xmas
[20:40:40] <RoanKattouw>	 For what?
[20:40:47] <RoanKattouw>	 The undefined variable text thing?
[20:42:13] <no_justification>	 Yeah
[20:42:26] <RoanKattouw>	 Not that I know of
[20:45:11] <no_justification>	 Yeah I didn't file a task
[20:45:15] <no_justification>	 I just bitched on IRC
[20:45:19] <no_justification>	 (that usually works heh)
[20:47:24] <twentyafterfour>	 cherry-picking to wmf.15
[20:55:40] <wikibugs>	 (PS1) Rush: openstack: these servers should be an HA pair [puppet] - https://gerrit.wikimedia.org/r/402115 (https://phabricator.wikimedia.org/T167559)
[20:56:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: these servers should be an HA pair [puppet] - https://gerrit.wikimedia.org/r/402115 (https://phabricator.wikimedia.org/T167559) (owner: Rush)
[20:57:38] <mutante>	 chasemp: odd error, eh
[20:57:58] <mutante>	 The hostname 'labtestneutron200[1-2].codfw.wmnet' contains illegal characters (only letters, digits, '_', '-', and '.' are allowed)    but we have  [] all the time
[20:58:34] <wikibugs>	 (PS2) Rush: openstack: these servers should be an HA pair [puppet] - https://gerrit.wikimedia.org/r/402115 (https://phabricator.wikimedia.org/T167559)
[20:58:46] <chasemp>	 mutante: yeah not sure but trying a patch now
[20:58:51] <mutante>	 oh, i see it
[20:59:05] <mutante>	 needs to start with   node /^
[20:59:17] <chasemp>	 yeppers
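Editor's note on the exchange above: a minimal sketch, in Python rather than Puppet, of why a node name containing `[1-2]` is rejected as a literal hostname but works once it is written as an anchored regular expression. The allowed character set is taken from the error message mutante quoted; the exact anchored pattern and the function names are assumptions for illustration, not the contents of the actual patch.

```python
import re

# Allowed characters for a literal hostname, per the quoted error message:
# letters, digits, '_', '-', and '.'.
LITERAL_HOSTNAME = re.compile(r"^[A-Za-z0-9_.-]+$")

# Hypothetical regex form of the same node definition: anchored at both
# ends, with the dots escaped so they match only a literal '.'.
NODE_PATTERN = re.compile(r"^labtestneutron200[1-2]\.codfw\.wmnet$")


def is_valid_literal_hostname(name: str) -> bool:
    """Return True if the name uses only characters legal in a literal hostname."""
    return LITERAL_HOSTNAME.match(name) is not None


def node_matches(fqdn: str) -> bool:
    """Return True if the FQDN is selected by the regex node definition."""
    return NODE_PATTERN.match(fqdn) is not None


if __name__ == "__main__":
    # The bracketed form is not a legal literal hostname...
    print(is_valid_literal_hostname("labtestneutron200[1-2].codfw.wmnet"))
    # ...but read as a regex it selects both hosts of the pair.
    for host in ("labtestneutron2001.codfw.wmnet", "labtestneutron2002.codfw.wmnet"):
        print(host, node_matches(host))
```

Treated as a plain string, the square brackets are illegal characters; treated as a pattern, they are a character class that covers both servers with one definition.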
[21:00:54] <wikibugs>	 Operations, Developer-Relations, cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3876522 (Qgil) Can you give me admin access, please?
[21:00:57] <wikibugs>	 (CR) Rush: [C: 2] openstack: these servers should be an HA pair [puppet] - https://gerrit.wikimedia.org/r/402115 (https://phabricator.wikimedia.org/T167559) (owner: Rush)
[21:10:53] <wikibugs>	 (CR) Aaron Schulz: "Sorry, I merged shortly before a break and didn't revert." [puppet] - https://gerrit.wikimedia.org/r/392221 (owner: Aaron Schulz)
[21:14:27] <wikibugs>	 Operations, Developer-Relations, cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3876561 (Tgr) Done.
[21:21:34] <wikibugs>	 (PS18) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[21:24:14] <wikibugs>	 (PS1) BryanDavis: wmcs: maintain-meta_p missing python-requests [puppet] - https://gerrit.wikimedia.org/r/402117
[21:25:32] <moritzm>	 !log reboot multatuli for kernel update
[21:25:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:32:10] <wikibugs>	 (PS1) Dzahn: httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118
[21:32:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118 (owner: Dzahn)
[21:33:48] <twentyafterfour>	 !log deploying patches to unblock the train
[21:33:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:38] <wikibugs>	 (CR) BryanDavis: "Seems to work ok: https://puppet-compiler.wmflabs.org/compiler02/9579/labsdb1009.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/402117 (owner: BryanDavis)
[21:35:03] <logmsgbot>	 !log twentyafterfour@tin Synchronized php-1.31.0-wmf.15/includes/parser/Parser.php: Deploy 601cf9d183b0e5a97d264048efaab71a4a925500 (duration: 01m 03s)
[21:35:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:55] <wikibugs>	 (CR) Madhuvishy: [C: 2] wmcs: maintain-meta_p missing python-requests [puppet] - https://gerrit.wikimedia.org/r/402117 (owner: BryanDavis)
[21:37:47] <moritzm>	 !log uploaded linux-4.9.65-3+deb9u1~bpo8+2 for jessie-wikimedia to apt.wikimedia.org (provides KPTI backport)
[21:37:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:13] <wikibugs>	 (PS1) BryanDavis: pcc: Python3 compatibility [puppet] - https://gerrit.wikimedia.org/r/402119
[21:44:35] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Emails send by subscribers don't arrive on the mailing list Moderators-nl - https://phabricator.wikimedia.org/T181906#3876587 (herron) Just now I sent 3 testing messages with different subjects and 3 replies (one reply to each subject).  The problem is happening for m...
[21:46:49] <wikibugs>	 (PS1) Muehlenhoff: Bump meta package for new ABI in 4.9 [debs/linux-meta] - https://gerrit.wikimedia.org/r/402121
[21:49:50] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Bump meta package for new ABI in 4.9 [debs/linux-meta] - https://gerrit.wikimedia.org/r/402121 (owner: Muehlenhoff)
[21:53:20] <logmsgbot>	 !log twentyafterfour@tin Synchronized php-1.31.0-wmf.15/extensions/TitleBlacklist/TitleBlacklistPreAuthenticationProvider.php: Deploy 332fab0d737b5a524abbed7264d64890dd3ce6dc to stop logspam and unblock the train (duration: 01m 02s)
[21:53:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:59:21] <wikibugs>	 (PS1) 20after4: all wikis to 1.31.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/402123
[21:59:23] <wikibugs>	 (CR) 20after4: [C: 2] all wikis to 1.31.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/402123 (owner: 20after4)
[22:00:35] <twentyafterfour>	 !log No blockers remain for T180748, proceeding to deploy wmf.15 to all wikis
[22:00:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:45] <stashbot>	 T180748: 1.31.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T180748
[22:01:14] <wikibugs>	 (CR) Krinkle: [C: 1] mediawiki: Remove unused python-pygments package [puppet] - https://gerrit.wikimedia.org/r/400458 (https://phabricator.wikimedia.org/T182851) (owner: Legoktm)
[22:02:07] <wikibugs>	 (Merged) jenkins-bot: all wikis to 1.31.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/402123 (owner: 20after4)
[22:02:21] <wikibugs>	 (CR) jenkins-bot: all wikis to 1.31.0-wmf.15 [mediawiki-config] - https://gerrit.wikimedia.org/r/402123 (owner: 20after4)
[22:03:12] <logmsgbot>	 !log twentyafterfour@tin rebuilt and synchronized wikiversions files: all wikis to 1.31.0-wmf.15
[22:03:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:35] <wikibugs>	 (PS19) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[22:04:40] <wikibugs>	 (PS2) Dzahn: httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118
[22:05:41] <icinga-wm>	 RECOVERY - HP RAID on db2054 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK
[22:06:15] <wikibugs>	 (CR) jerkins-bot: [V: -1] httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118 (owner: Dzahn)
[22:09:47] <moritzm>	 !log uploaded linux-meta 1.16 for jessie-wikimedia to apt.wikimedia.org (which installs the new KPTI-enabled kernel with the new ABI)
[22:09:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:24:32] <wikibugs>	 (PS3) Dzahn: httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118
[22:25:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118 (owner: Dzahn)
[22:27:09] <wikibugs>	 (PS4) Dzahn: httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118
[22:28:23] <wikibugs>	 (CR) jerkins-bot: [V: -1] httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118 (owner: Dzahn)
[22:29:03] <mutante>	 how to remove the Ganglia line from https://wikitech.wikimedia.org/wiki/Template:Server   without breaking the rest of the template ?:)
[22:29:14] <mutante>	 i did it wrong, mismatching brackets
[22:29:44] <mutante>	 it's a template using another template
[22:30:19] <paladox>	 mutante https://wikitech.wikimedia.org/w/index.php?title=Template%3AServer&type=revision&diff=1779540&oldid=1779538
[22:30:20] <mutante>	 https://wikitech.wikimedia.org/w/index.php?title=Template:Ganglia&action=edit
[22:30:59] <mutante>	 paladox: :) thanks
[22:31:03] <paladox>	 you're welcome :).
[22:31:07] <mutante>	 i didn't remove enough
[22:31:16] <paladox>	 heh
[22:32:24] <mutante>	 https://wikitech.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Template%3AGanglia&namespace=   uhmm
[22:33:07] <mutante>	 it still uses the template in another place
[22:33:13] <mutante>	 server template using ganglia template that is
[22:33:32] <mutante>	 but the link is just  {{tl|Ganglia}})
[22:33:40] <paladox>	 ;server_group: (optional) Name of organizational server group (not physical per se). Should match the "Source" group of the node in Ganglia (passed to {{tl|Ganglia}})
[22:33:40] <paladox>	 ;server_nodename: (optional) "Node" hostname. Should match the "Node name" in Ganglia (passed to {{tl|Ganglia}})
[22:34:29] <mutante>	 removes
[22:34:42] <mutante>	 https://wikitech.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Template%3AGanglia&namespace= better 
[22:35:00] <mutante>	 still lists Template:Server itself :)
[22:35:09] <mutante>	 but no transclusion
[22:36:19] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Emails send by subscribers don't arrive on the mailing list Moderators-nl - https://phabricator.wikimedia.org/T181906#3876673 (Trijnstel) >>! In T181906#3876587, @herron wrote: > Just now I sent 3 testing messages with different subjects and 3 replies (one reply to ea...
[22:36:38] <mutante>	 == See also ==
[22:36:39] <mutante>	 * {{tl|Server}}
[22:37:02] <mutante>	 and NOW i can delete the template:)
[22:37:22] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Emails send by subscribers don't arrive on the mailing list Moderators-nl - https://phabricator.wikimedia.org/T181906#3876675 (Natuur12) I received three emails from Herron, zero from Trijnstel.
[22:38:34] <wikibugs>	 (PS5) Smalyshev: Add loading DCAT-AP data into dcatap namespace on WDQS [puppet] - https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978)
[22:39:36] <wikibugs>	 (PS5) Dzahn: httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118
[22:40:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118 (owner: Dzahn)
[22:41:28] <wikibugs>	 Operations, Cloud-VPS, Toolforge, cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3876678 (Paladox) I think upstream have started rolling out the security update.
[22:42:49] <wikibugs>	 Operations, Cloud-VPS, Toolforge, cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3875329 (MoritzMuehlenhoff) @paladox: Most of WMCS runs trusty with either the 3.13 or 4.4 kernel and needs an update by Canonical (which is...
[23:11:21] <icinga-wm>	 PROBLEM - HHVM rendering on mw2222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:12:11] <icinga-wm>	 RECOVERY - HHVM rendering on mw2222 is OK: HTTP OK: HTTP/1.1 200 OK - 73405 bytes in 0.328 second response time
[23:12:25] <wikibugs>	 (PS1) Dzahn: network::constants: add fake CACHE_MISC for labs [puppet] - https://gerrit.wikimedia.org/r/402136
[23:12:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] network::constants: add fake CACHE_MISC for labs [puppet] - https://gerrit.wikimedia.org/r/402136 (owner: Dzahn)
[23:15:13] <wikibugs>	 (CR) Thcipriani: [C: 1] "Seems to work as long as you don't want a service called ^A" [deployment-charts] - https://gerrit.wikimedia.org/r/399256 (owner: Dduvall)
[23:17:25] <wikibugs>	 (PS1) BryanDavis: wmcs: Add database drop support to maintain-views [puppet] - https://gerrit.wikimedia.org/r/402137 (https://phabricator.wikimedia.org/T181925)
[23:17:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmcs: Add database drop support to maintain-views [puppet] - https://gerrit.wikimedia.org/r/402137 (https://phabricator.wikimedia.org/T181925) (owner: BryanDavis)
[23:19:31] <wikibugs>	 (PS2) BryanDavis: wmcs: Add database drop support to maintain-views [puppet] - https://gerrit.wikimedia.org/r/402137 (https://phabricator.wikimedia.org/T181925)
[23:19:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmcs: Add database drop support to maintain-views [puppet] - https://gerrit.wikimedia.org/r/402137 (https://phabricator.wikimedia.org/T181925) (owner: BryanDavis)
[23:21:34] <wikibugs>	 (PS6) Dzahn: httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118
[23:23:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] httpd: testing new module with planet (test only) [puppet] - https://gerrit.wikimedia.org/r/402118 (owner: Dzahn)
[23:30:02] <apergos>	 !log rebooted releases1001 and 2001 (new kernel) 
[23:30:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:33:16] <wikibugs>	 (PS3) BryanDavis: wmcs: Add database drop support to maintain-views [puppet] - https://gerrit.wikimedia.org/r/402137 (https://phabricator.wikimedia.org/T181925)
[23:34:56] * bd808 wishes he could get the ops/puppet tests to execute locally
[23:35:21] <apergos>	 no puppet compiler?
[23:35:27] <apergos>	 dammit, I meant to not look at irc
[23:36:15] <bd808>	 apergos: no, it's something funky about bundler on my laptop I think. Weird and nonsensical ruby errors
[23:36:22] <bd808>	 pcc works for me
[23:36:26] <bd808>	 just not the linters
[23:37:30] <apergos>	 ah ha
[23:38:28] <apergos>	 and now I really am going to stop looking.  good night folks
[23:51:05] <wikibugs>	 (CR) Smalyshev: "I think this is now ready for merge" [puppet] - https://gerrit.wikimedia.org/r/399954 (https://phabricator.wikimedia.org/T178978) (owner: Smalyshev)