[20:24:51] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] logstash: temp stop managing indices [puppet] - https://gerrit.wikimedia.org/r/472247 (owner: Filippo Giunchedi)
[20:24:52] <logmsgbot>	 !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: rollback labswiki to 1.33.0-wmf.2
[20:24:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:25:55] <thcipriani>	 greg-g: ^ done, problem gone?
[20:26:44] <wikibugs>	 (PS1) Thcipriani: labswiki rollback to 1.33.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/472250
[20:27:06] <wikibugs>	 (CR) Anomie: [C: 1] Allow Cloud VPS 172.16.0.0/16 for $wmgAllowLabsAnonEdits wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/472243 (https://phabricator.wikimedia.org/T208986) (owner: BryanDavis)
[20:28:48] <James_F>	 Oh oops, I never !log-ed that I ran namespaceDupes.php. Next time I'll remember.
[20:29:21] <wikibugs>	 (CR) Thcipriani: [C: 2] labswiki rollback to 1.33.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/472250 (owner: Thcipriani)
[20:29:59] <wikibugs>	 (PS1) Effie Mouzeli: Change rdb1005 to spare:system [puppet] - https://gerrit.wikimedia.org/r/472251 (https://phabricator.wikimedia.org/T206450)
[20:30:34] <wikibugs>	 (Merged) jenkins-bot: labswiki rollback to 1.33.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/472250 (owner: Thcipriani)
[20:31:53] <Krinkle>	 thcipriani: Got a couple of error fixes prepared whenever it's good – https://gerrit.wikimedia.org/r/#/q/status:open+branch:wmf/1.33.0-wmf.3
[20:34:18] <thcipriani>	 Krinkle: train rollout seems stable now
[20:36:14] <Krinkle>	 OK
[20:40:02] <wikibugs>	 (CR) jenkins-bot: labswiki rollback to 1.33.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/472250 (owner: Thcipriani)
[20:40:40] <greg-g>	 thcipriani: sorry, yes
[20:41:40] <thcipriani>	 I deduced from the error logs :)
[20:43:02] <greg-g>	 thcipriani: humans are fallible
[20:43:40] <wikibugs>	 (PS1) Andrew Bogott: Nova: add cloudvirt1017 to the scheduler pool [puppet] - https://gerrit.wikimedia.org/r/472253 (https://phabricator.wikimedia.org/T208733)
[20:53:09] <wikibugs>	 (PS2) Banyek: wiki replicas: depool labsdb1010 for view changes [puppet] - https://gerrit.wikimedia.org/r/471295 (https://phabricator.wikimedia.org/T189158) (owner: Bstorm)
[20:53:15] <wikibugs>	 (CR) Banyek: [V: 2 C: 2] wiki replicas: depool labsdb1010 for view changes [puppet] - https://gerrit.wikimedia.org/r/471295 (https://phabricator.wikimedia.org/T189158) (owner: Bstorm)
[20:55:04] <banyek>	 !log depool labsdb1010 (T189158)
[20:55:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:06] <stashbot>	 T189158: Change `image` view to properly expose the new `img_description_id` field - https://phabricator.wikimedia.org/T189158
[20:56:21] <Raymond_>	 I am getting a fatal error „UnexpectedValueException“ for https://commons.wikimedia.org/w/index.php?title=Category:Argenta_(company)&action=edit
[20:59:21] <paladox>	 thcipriani ^^
[20:59:41] <MaxSem>	 Raymond_: disable TwoColConflict for now
[20:59:52] <Raymond_>	 MaxSem: thanks
[21:00:04] <jouncebot>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181107T2100).
[21:00:24] <Raymond_>	 yep. works now :)
[21:00:45] <MaxSem>	 I'll submit the bug
[21:01:03] * Krinkle staging on mwdebug1002
[21:01:09] <wikibugs>	 (PS2) Bstorm: sonofgridengine: remove ldapconfig materials [puppet] - https://gerrit.wikimedia.org/r/472241 (https://phabricator.wikimedia.org/T200557)
[21:01:18] <Amir1>	 I deploy something for ores
[21:01:41] <Krinkle>	 Amir1: mw?
[21:01:43] <wikibugs>	 (PS2) Cwhite: diamond: ensure Nginx collector absent [puppet] - https://gerrit.wikimedia.org/r/471360 (https://phabricator.wikimedia.org/T183454)
[21:01:57] <Amir1>	 nope, ores. It's services deploy window
[21:01:58] <wikibugs>	 (PS3) Cwhite: diamond: ensure Nginx collector absent [puppet] - https://gerrit.wikimedia.org/r/471360 (https://phabricator.wikimedia.org/T183454)
[21:02:01] <Krinkle>	 OK
[21:02:58] <icinga-wm>	 PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 28734 MB (5% inode=99%)
[21:03:01] <wikibugs>	 (CR) Bstorm: [C: 2] sonofgridengine: remove ldapconfig materials [puppet] - https://gerrit.wikimedia.org/r/472241 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[21:03:21] <thcipriani>	 MaxSem: I see this error in logstash, is this something that I should be rolling back for? What is the impact?
[21:03:28] <logmsgbot>	 !log ladsgroup@deploy1001 Started deploy [ores/deploy@25dfa4f]: T191842 T197096
[21:03:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:03:33] <stashbot>	 T191842: Deployment git server can't supply ORES hosts in parallel - https://phabricator.wikimedia.org/T191842
[21:03:34] <stashbot>	 T197096: [Epic] Use LFS for large ORES files - https://phabricator.wikimedia.org/T197096
[21:03:59] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.3/includes/jobqueue/jobs/RefreshLinksJob.php: T208147 -I7f5fafe9439d8a7b4 (duration: 00m 54s)
[21:04:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:02] <stashbot>	 T208147: PHP Fatal Error from RefreshLinksJob: Argument to runForTitle() must be Title - https://phabricator.wikimedia.org/T208147
[21:04:06] <wikibugs>	 (CR) Cwhite: [C: 2] diamond: ensure Nginx collector absent [puppet] - https://gerrit.wikimedia.org/r/471360 (https://phabricator.wikimedia.org/T183454) (owner: Cwhite)
[21:04:11] <Krinkle>	 thcipriani: I believe the fix is already up for SWAT in a few hours, not a new error.
[21:04:23] <wikibugs>	 (PS4) Cwhite: diamond: ensure Nginx collector absent [puppet] - https://gerrit.wikimedia.org/r/471360 (https://phabricator.wikimedia.org/T183454)
[21:04:32] <MaxSem>	 thcipriani: 48 errors over the last 1 hour
[21:04:37] <thcipriani>	 Krinkle: ah, ok. I hadn't seen it before afaicr.
[21:04:52] <Krinkle>	 Might be a different error from TwoColConflict, that's possible
[21:05:09] <Krinkle>	 https://phabricator.wikimedia.org/T205942
[21:06:08] <banyek>	 !log stopping replication on db2072 (T208954)
[21:06:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:06:12] <stashbot>	 T208954: Missing row in enwiki.archive on sanitarium - https://phabricator.wikimedia.org/T208954
[21:06:49] <Amir1>	 Canary is happy, moving to all
[21:06:58] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.3/extensions/AbuseFilter/includes/AbuseFilter.php: T208144 - I0fdda51010243 (duration: 00m 53s)
[21:07:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:01] <stashbot>	 T208144: Fatal error on file upload: "Argument to AbuseFilter::filterAction() must be Title, null given" - https://phabricator.wikimedia.org/T208144
[21:11:48] <icinga-wm>	 PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:16:48] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.3/extensions/VipsScaler: Id9f82afd (duration: 00m 55s)
[21:16:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:18:56] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.2/extensions/AbuseFilter/includes/AbuseFilter.php: T208144 - I0fdda510102436 (duration: 00m 53s)
[21:19:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:02] <stashbot>	 T208144: Fatal error on file upload: "Argument to AbuseFilter::filterAction() must be Title, null given" - https://phabricator.wikimedia.org/T208144
[21:20:52] <logmsgbot>	 !log ladsgroup@deploy1001 Finished deploy [ores/deploy@25dfa4f]: T191842 T197096 (duration: 17m 24s)
[21:21:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:02] <stashbot>	 T191842: Deployment git server can't supply ORES hosts in parallel - https://phabricator.wikimedia.org/T191842
[21:21:03] <stashbot>	 T197096: [Epic] Use LFS for large ORES files - https://phabricator.wikimedia.org/T197096
[21:21:49] <icinga-wm>	 RECOVERY - Disk space on elastic1025 is OK: DISK OK
[21:23:04] <wikibugs>	 Operations, Release-Engineering-Team, Scap, Patch-For-Review, and 2 others: Deployment git server can't supply ORES hosts in parallel - https://phabricator.wikimedia.org/T191842 (Ladsgroup) Now deployment time has been reduced from 22 minutes to 17 minutes. I will increase the number of parallel...
[21:27:38] <icinga-wm>	 RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.539 second response time
[21:31:08] <icinga-wm>	 PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:44:52] <James_F>	 thcipriani: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/472336 should allow the train to go back to wikitechwiki
[21:45:14] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@4edc771]: Updating Parsoid to 970751a
[21:45:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:38] <thcipriani>	 James_F: thanks for backporting, I'll go ahead and push that out now and try to re-roll-forward
[21:51:19] <icinga-wm>	 PROBLEM - IPsec on rdb2005 is CRITICAL: Strongswan CRITICAL - ok: 1 not-conn: rdb1005_v4
[21:52:00] <James_F>	 Yay.
[21:52:07] <wikibugs>	 (CR) Cwhite: [C: 2] add socket_bufsize option to make SO_RCVBUF tunable [debs/statsd-proxy] (wmf_v0.0.10) - https://gerrit.wikimedia.org/r/470512 (https://phabricator.wikimedia.org/T196484) (owner: Cwhite)
[21:52:29] <wikibugs>	 (CR) Cwhite: [V: 2 C: 2] add socket_bufsize option to make SO_RCVBUF tunable [debs/statsd-proxy] (wmf_v0.0.10) - https://gerrit.wikimedia.org/r/470512 (https://phabricator.wikimedia.org/T196484) (owner: Cwhite)
[21:52:40] <James_F>	 thcipriani: There's also https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/472243/ to fix the Cloud VPS issue.
[21:52:58] <James_F>	 I said I'd do it once the train was done, but I'm happy to leave it to you. ;-)
[21:54:31] <thcipriani>	 oh good :)
[21:54:35] <thcipriani>	 sure, I'll get it
[21:54:48] <icinga-wm>	 PROBLEM - Check health of redis instance on 6378 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 628 600 - REDIS 2.8.17 on 127.0.0.1:6378 has 1 databases (db0) with 6 keys, up 127 days 19 hours - replication_delay is 628
[21:54:48] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@4edc771]: Updating Parsoid to 970751a (duration: 09m 34s)
[21:54:49] <icinga-wm>	 PROBLEM - Check health of redis instance on 6381 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 631 600 - REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 3178 keys, up 127 days 19 hours - replication_delay is 631
[21:54:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:59] <icinga-wm>	 PROBLEM - Check health of redis instance on 6380 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 645 600 - REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 2888 keys, up 127 days 19 hours - replication_delay is 645
[21:55:15] <wikibugs>	 (CR) Thcipriani: [C: 2] Allow Cloud VPS 172.16.0.0/16 for $wmgAllowLabsAnonEdits wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/472243 (https://phabricator.wikimedia.org/T208986) (owner: BryanDavis)
[21:55:19] <icinga-wm>	 PROBLEM - Check health of redis instance on 6379 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 665 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 3088 keys, up 127 days 19 hours - replication_delay is 665
[21:55:19] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on rdb2005 is CRITICAL: Strongswan CRITICAL - ok: 1 not-conn: rdb1005_v4 Effie Mouzeli T206450: rdb1005 is being reimaged
[21:56:32] <wikibugs>	 (Merged) jenkins-bot: Allow Cloud VPS 172.16.0.0/16 for $wmgAllowLabsAnonEdits wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/472243 (https://phabricator.wikimedia.org/T208986) (owner: BryanDavis)
[21:58:09] <icinga-wm>	 ACKNOWLEDGEMENT - Check health of redis instance on 6378 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 830 600 - REDIS 2.8.17 on 127.0.0.1:6378 has 1 databases (db0) with 6 keys, up 127 days 19 hours - replication_delay is 830 Effie Mouzeli T206450: rdb1005 is being reimaged
[21:58:09] <icinga-wm>	 ACKNOWLEDGEMENT - Check health of redis instance on 6379 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 801 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 3088 keys, up 127 days 19 hours - replication_delay is 801 Effie Mouzeli T206450: rdb1005 is being reimaged
[21:58:09] <icinga-wm>	 ACKNOWLEDGEMENT - Check health of redis instance on 6380 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 780 600 - REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 2888 keys, up 127 days 19 hours - replication_delay is 780 Effie Mouzeli T206450: rdb1005 is being reimaged
[21:58:09] <icinga-wm>	 ACKNOWLEDGEMENT - Check health of redis instance on 6381 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 834 600 - REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 3178 keys, up 127 days 19 hours - replication_delay is 834 Effie Mouzeli T206450: rdb1005 is being reimaged
[22:02:07] <arlolra>	 !log Updated Parsoid to 970751a (T206940)
[22:02:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:02:10] <stashbot>	 T206940: Quote marks in "alt" text break media attribute parsing - https://phabricator.wikimedia.org/T206940
[22:02:12] <logmsgbot>	 !log thcipriani@deploy1001 Synchronized wmf-config/CommonSettings.php: [[gerrit:472243|Allow Cloud VPS 172.16.0.0/16 for $wmgAllowLabsAnonEdits wikis]] T208986 (duration: 00m 54s)
[22:02:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:02:15] <stashbot>	 T208986: WDQS tests can no longer edit test.wikidata.org - https://phabricator.wikimedia.org/T208986
[22:07:08] <icinga-wm>	 RECOVERY - IPsec on rdb2005 is OK: Strongswan OK - 2 ESP OK
[22:07:08] <icinga-wm>	 RECOVERY - Check health of redis instance on 6378 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6378 has 1 databases (db0) with 6 keys, up 127 days 20 hours - replication_delay is 9
[22:07:24] <logmsgbot>	 !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.3/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: [[gerrit:472336|Expose methods used by OpenStackManager]] T208995 (duration: 00m 54s)
[22:07:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:07:26] <stashbot>	 T208995: PHP Fatal Error: Call to private method LdapAuthenticationPlugin::bindAs() from context 'OpenStackNovaLdapConnection' - https://phabricator.wikimedia.org/T208995
[22:07:40] <wikibugs>	 (CR) jenkins-bot: Allow Cloud VPS 172.16.0.0/16 for $wmgAllowLabsAnonEdits wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/472243 (https://phabricator.wikimedia.org/T208986) (owner: BryanDavis)
[22:08:33] <wikibugs>	 (PS1) Thcipriani: Revert "labswiki rollback to 1.33.0-wmf.2" [mediawiki-config] - https://gerrit.wikimedia.org/r/472339
[22:08:58] <icinga-wm>	 RECOVERY - Check health of redis instance on 6379 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 3088 keys, up 127 days 20 hours - replication_delay is 7
[22:09:29] <icinga-wm>	 RECOVERY - Check health of redis instance on 6381 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 3178 keys, up 127 days 20 hours - replication_delay is 2
[22:10:49] <icinga-wm>	 RECOVERY - Check health of redis instance on 6380 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 2888 keys, up 127 days 20 hours - replication_delay is 2
[22:13:41] <logmsgbot>	 !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: Revert "labswiki rollback to 1.33.0-wmf.2"
[22:13:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:49] <wikibugs>	 (CR) Thcipriani: [C: 2] Revert "labswiki rollback to 1.33.0-wmf.2" [mediawiki-config] - https://gerrit.wikimedia.org/r/472339 (owner: Thcipriani)
[22:16:36] <wikibugs>	 (Merged) jenkins-bot: Revert "labswiki rollback to 1.33.0-wmf.2" [mediawiki-config] - https://gerrit.wikimedia.org/r/472339 (owner: Thcipriani)
[22:22:15] <wikibugs>	 (CR) jenkins-bot: Revert "labswiki rollback to 1.33.0-wmf.2" [mediawiki-config] - https://gerrit.wikimedia.org/r/472339 (owner: Thcipriani)
[22:23:26] <wikibugs>	 (CR) Effie Mouzeli: puppet:Reduce cronspam from modules/mediawiki/ (4 comments) [puppet] - https://gerrit.wikimedia.org/r/470877 (https://phabricator.wikimedia.org/T150375) (owner: Thifranc)
[22:24:08] <icinga-wm>	 RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.622 second response time
[22:24:41] <RoanKattouw>	 thcipriani: Let me know when you're done doing stuff to wmf.2/wmf.3, I'd like to deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/472340 so that I can safely add GrowthExperiments to extension-list
[22:24:55] <thcipriani>	 RoanKattouw: I'm all finished
[22:24:56] <RoanKattouw>	 (Not for deployment in prod yet, just beta, but extension-list isn't split between prod and beta)
[22:24:58] <RoanKattouw>	 OK cool thanks
[22:27:00] <James_F>	 After Roan I have an extension to drop from production (yay) if that's OK.
[22:27:29] <icinga-wm>	 PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:34:08] <wikibugs>	 (CR) Effie Mouzeli: [C: 2] Change rdb1005 to spare:system [puppet] - https://gerrit.wikimedia.org/r/472251 (https://phabricator.wikimedia.org/T206450) (owner: Effie Mouzeli)
[22:34:19] <wikibugs>	 (PS2) Effie Mouzeli: Change rdb1005 to spare:system [puppet] - https://gerrit.wikimedia.org/r/472251 (https://phabricator.wikimedia.org/T206450)
[22:37:21] <wikibugs>	 Operations, ops-codfw: unrack/decom cr1-eqord - https://phabricator.wikimedia.org/T208049 (Papaul)
[22:37:37] <wikibugs>	 Operations, ops-codfw: unrack/decom cr1-eqord - https://phabricator.wikimedia.org/T208049 (Papaul)
[22:38:48] <wikibugs>	 Operations, ops-codfw, netops, Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (Papaul)
[22:41:05] <wikibugs>	 Operations, DBA: db2061 has predictive disk errors - https://phabricator.wikimedia.org/T208957 (Papaul) p:Triage>Normal
[22:41:36] <wikibugs>	 (PS2) Catrope: Add GrowthExperiments extension to extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/470885 (https://phabricator.wikimedia.org/T208449)
[22:41:47] <wikibugs>	 (CR) Catrope: [C: 2] Add GrowthExperiments extension to extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/470885 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[22:42:58] <wikibugs>	 (Merged) jenkins-bot: Add GrowthExperiments extension to extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/470885 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[22:43:35] <wikibugs>	 Operations, Patch-For-Review, User-Joe, User-jijiki: Reorganize our redis rdb1/rdb2 clusters - https://phabricator.wikimedia.org/T206450 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ``` rdb1005.eqiad.wmnet ``` The log can be found in `/var/...
[22:43:48] <wikibugs>	 (CR) Gehel: [C: -1] wdqs: separation of concerns (1 comment) [puppet] - https://gerrit.wikimedia.org/r/471665 (https://phabricator.wikimedia.org/T208394) (owner: Mathew.onipe)
[22:47:29] <icinga-wm>	 RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.389 second response time
[22:48:39] <wikibugs>	 (PS1) Bstorm: sonofgridengine: correct some issues for stretch bastions [puppet] - https://gerrit.wikimedia.org/r/472343 (https://phabricator.wikimedia.org/T200557)
[22:50:08] <wikibugs>	 (CR) Bstorm: [C: 2] sonofgridengine: correct some issues for stretch bastions [puppet] - https://gerrit.wikimedia.org/r/472343 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[22:50:29] <icinga-wm>	 PROBLEM - IPsec on rdb2005 is CRITICAL: Strongswan CRITICAL - ok: 1 not-conn: rdb1005_v4
[22:50:36] <wikibugs>	 (CR) jenkins-bot: Add GrowthExperiments extension to extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/470885 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[22:53:19] <icinga-wm>	 PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:53:39] <icinga-wm>	 PROBLEM - Check health of redis instance on 6378 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 604 600 - REDIS 2.8.17 on 127.0.0.1:6378 has 1 databases (db0) with 6 keys, up 127 days 20 hours - replication_delay is 604
[22:53:48] <icinga-wm>	 PROBLEM - Check health of redis instance on 6381 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 609 600 - REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 3178 keys, up 127 days 20 hours - replication_delay is 609
[22:53:59] <icinga-wm>	 PROBLEM - Check health of redis instance on 6380 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 623 600 - REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 2888 keys, up 127 days 20 hours - replication_delay is 623
[22:54:28] <icinga-wm>	 PROBLEM - Check health of redis instance on 6379 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 645 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 3088 keys, up 127 days 20 hours - replication_delay is 645
[22:57:53] <logmsgbot>	 !log catrope@deploy1001 Started scap: Full scap to rebuild i18n for the addition of the GrowthExperiments extension
[22:57:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:01:35] <icinga-wm>	 PROBLEM - puppet last run on rdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:03:34] <mutante>	 those redis alerts are also due to reinstall of rdb1005
[23:14:01] <icinga-wm>	 RECOVERY - Long running screen/tmux on an-coord1001 is OK: OK: SCREEN detected but not long running.
[23:16:16] <wikibugs>	 Operations, Patch-For-Review, User-Joe, User-jijiki: Reorganize our redis rdb1/rdb2 clusters - https://phabricator.wikimedia.org/T206450 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['rdb1005.eqiad.wmnet'] ```  and were **ALL** successful.
[23:16:48] <James_F>	 How long does a full scap take nowadays?
[23:17:14] <mutante>	 jiji: ^ success :)
[23:19:35] <jiji>	 :D
[23:21:21] <jiji>	 !log Disabled nagios checks on rdb1006 and rdb2005 due to rdb1005 reimaging - T206450
[23:21:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:21:25] <stashbot>	 T206450: Reorganize our redis rdb1/rdb2 clusters - https://phabricator.wikimedia.org/T206450
[23:23:53] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s1 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 6738.22 seconds Banyek T208954
[23:24:41] <wikibugs>	 (CR) Nuria: [C: 1] "Let's please document this here: https://wikitech.wikimedia.org/wiki/X-Analytics" [puppet] - https://gerrit.wikimedia.org/r/471257 (https://phabricator.wikimedia.org/T208795) (owner: Dr0ptp4kt)
[23:32:21] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw2251 is CRITICAL: Return code of 255 is out of bounds
[23:32:21] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw2262 is CRITICAL: Return code of 255 is out of bounds
[23:32:21] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw2285 is CRITICAL: Return code of 255 is out of bounds
[23:33:22] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw2251 is OK: OK - load average: 0.79, 4.77, 3.09
[23:33:22] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw2262 is OK: OK - load average: 0.62, 1.99, 1.16
[23:33:22] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw2285 is OK: OK - load average: 0.37, 1.62, 1.02
[23:34:23] <thcipriani>	 James_F: full scap has been like 25 minutes
[23:34:25] <thcipriani>	 recently
[23:34:41] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:37:11] <James_F>	 thcipriani: Right.
[23:37:33] <logmsgbot>	 !log catrope@deploy1001 Finished scap: Full scap to rebuild i18n for the addition of the GrowthExperiments extension (duration: 39m 40s)
[23:37:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:37:40] <greg-g>	 or 40 :)
[23:38:02] <RoanKattouw>	 That was with i18n rebuild for both branches
[23:38:23] <wikibugs>	 (CR) Catrope: [C: 2] GrowthExperiments, part I: Add extension flag to InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470889 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:40:35] <James_F>	 Oy.
[23:41:29] <wikibugs>	 (PS2) Catrope: GrowthExperiments, part I: Add extension flag to InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470889 (https://phabricator.wikimedia.org/T208449)
[23:41:38] <wikibugs>	 (CR) Catrope: [C: 2] GrowthExperiments, part I: Add extension flag to InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470889 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:42:44] <wikibugs>	 (Merged) jenkins-bot: GrowthExperiments, part I: Add extension flag to InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470889 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:43:31] <wikibugs>	 (CR) jenkins-bot: GrowthExperiments, part I: Add extension flag to InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470889 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:44:14] <wikibugs>	 (PS3) Catrope: GrowthExperiments, part II: make extension flag operative in CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470890 (https://phabricator.wikimedia.org/T208449)
[23:44:19] <wikibugs>	 (CR) Catrope: [C: 2] GrowthExperiments, part II: make extension flag operative in CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470890 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:44:55] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add flag for GrowthExperiments to InitialiseSettings (duration: 00m 53s)
[23:44:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:45:32] <wikibugs>	 (Merged) jenkins-bot: GrowthExperiments, part II: make extension flag operative in CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470890 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:46:01] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:48:26] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/CommonSettings.php: Make GrowthExperiments flag operative in CommonSettings (duration: 00m 53s)
[23:48:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:27] <wikibugs>	 (PS3) Catrope: GrowthExperiments, part III: Enable on English and Korean beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/470891 (https://phabricator.wikimedia.org/T208449)
[23:49:31] <wikibugs>	 (CR) Catrope: [C: 2] GrowthExperiments, part III: Enable on English and Korean beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/470891 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:49:43] <RoanKattouw>	 James_F: OK I'm done messing with production, all yours now
[23:50:14] <wikibugs>	 (CR) Jforrester: [C: 2] Drop the Petition extension: Part I - disable in production [mediawiki-config] - https://gerrit.wikimedia.org/r/472213 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:50:36] <wikibugs>	 (Merged) jenkins-bot: GrowthExperiments, part III: Enable on English and Korean beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/470891 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:54:26] <wikibugs>	 (PS1) Dzahn: icinga: fix path to retention.dat file on stretch [puppet] - https://gerrit.wikimedia.org/r/472352 (https://phabricator.wikimedia.org/T202782)
[23:55:27] <wikibugs>	 (CR) jenkins-bot: [V: -1] icinga: fix path to retention.dat file on stretch [puppet] - https://gerrit.wikimedia.org/r/472352 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:55:30] <wikibugs>	 (PS2) Jforrester: Drop the Petition extension: Part I - disable in production [mediawiki-config] - https://gerrit.wikimedia.org/r/472213 (https://phabricator.wikimedia.org/T208081)
[23:55:36] <wikibugs>	 (CR) Jforrester: [C: 2] "…" [mediawiki-config] - https://gerrit.wikimedia.org/r/472213 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:55:43] <wikibugs>	 (PS2) Jforrester: Drop the Petition extension: Part II - disable in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/472214 (https://phabricator.wikimedia.org/T208081)
[23:55:49] <wikibugs>	 (CR) Jforrester: [C: 2] Drop the Petition extension: Part II - disable in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/472214 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:55:58] <wikibugs>	 (PS2) Jforrester: Drop the Petition extension: Part III - drop related user-rights [mediawiki-config] - https://gerrit.wikimedia.org/r/472215 (https://phabricator.wikimedia.org/T208081)
[23:56:04] <wikibugs>	 (CR) Jforrester: [C: 2] Drop the Petition extension: Part III - drop related user-rights [mediawiki-config] - https://gerrit.wikimedia.org/r/472215 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:56:22] <icinga-wm>	 RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.300 second response time
[23:57:08] <wikibugs>	 (Merged) jenkins-bot: Drop the Petition extension: Part I - disable in production [mediawiki-config] - https://gerrit.wikimedia.org/r/472213 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:57:34] <wikibugs>	 (Merged) jenkins-bot: Drop the Petition extension: Part II - disable in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/472214 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:57:40] <wikibugs>	 (Merged) jenkins-bot: Drop the Petition extension: Part III - drop related user-rights [mediawiki-config] - https://gerrit.wikimedia.org/r/472215 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:58:08] <wikibugs>	 (CR) jenkins-bot: GrowthExperiments, part II: make extension flag operative in CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470890 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:58:10] <wikibugs>	 (CR) jenkins-bot: GrowthExperiments, part III: Enable on English and Korean beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/470891 (https://phabricator.wikimedia.org/T208449) (owner: Catrope)
[23:58:12] <wikibugs>	 (CR) jenkins-bot: Drop the Petition extension: Part I - disable in production [mediawiki-config] - https://gerrit.wikimedia.org/r/472213 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:58:14] <wikibugs>	 (CR) jenkins-bot: Drop the Petition extension: Part II - disable in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/472214 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:58:36] <wikibugs>	 (CR) jenkins-bot: Drop the Petition extension: Part III - drop related user-rights [mediawiki-config] - https://gerrit.wikimedia.org/r/472215 (https://phabricator.wikimedia.org/T208081) (owner: Jforrester)
[23:59:52] <icinga-wm>	 PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:59:52] <James_F>	 I'll SWAT, given I'm doing it already.