[00:00:04] MaxSem and kaldari: It is that lovely time of the day again! You are hereby commanded to deploy CongressLookup Window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T0000).
[00:06:08] jouncebot: now
[00:06:09] For the next 0 hour(s) and 53 minute(s): CongressLookup Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T0000)
[00:31:44] (PS1) MaxSem: Temporarily disable GlobalPreferences in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/432022 (https://phabricator.wikimedia.org/T194229)
[00:32:02] (CR) MaxSem: [C: 2] Temporarily disable GlobalPreferences in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/432022 (https://phabricator.wikimedia.org/T194229) (owner: MaxSem)
[00:33:19] (Merged) jenkins-bot: Temporarily disable GlobalPreferences in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/432022 (https://phabricator.wikimedia.org/T194229) (owner: MaxSem)
[00:33:35] (CR) jenkins-bot: Temporarily disable GlobalPreferences in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/432022 (https://phabricator.wikimedia.org/T194229) (owner: MaxSem)
[00:35:35] !log maxsem@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/432022/ - noop in prod (duration: 01m 20s)
[00:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:41:28] (PS2) MaxSem: Deploy CongressLookup to betalabs [mediawiki-config] - https://gerrit.wikimedia.org/r/431989
[00:41:44] (CR) MaxSem: [C: 2] Deploy CongressLookup to betalabs [mediawiki-config] - https://gerrit.wikimedia.org/r/431989 (owner: MaxSem)
[00:42:59] (Merged) jenkins-bot: Deploy CongressLookup to betalabs [mediawiki-config] - https://gerrit.wikimedia.org/r/431989 (owner: MaxSem)
[00:43:16] (CR) jenkins-bot: Deploy CongressLookup to betalabs [mediawiki-config] - https://gerrit.wikimedia.org/r/431989 (owner: MaxSem)
[00:52:25] MaxSem: Everything's ready on beta labs. Lemme know when I can test. Thanks!
[00:53:10] kaldari: it's merged, now gotsa wait for the god of jobs to sync it to beta :P
[00:57:46] MaxSem: https://phabricator.wikimedia.org/T194230
[01:18:14] jouncebot: now
[01:18:15] No deployments scheduled for the next 11 hour(s) and 41 minute(s)
[01:19:54] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2205.codfw.wmnet
[01:19:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:20:54] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2207.codfw.wmnet
[01:20:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:20:59] What could https://noc.wikimedia.org/conf/highlight.php?file=dblists/wikipedia-e-acute.dblist possibly be for..
[01:21:20] oh, for the logo and stuff
[01:21:51] e-acute = e with the é on top
[01:22:44] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2208.codfw.wmnet
[01:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:26:01] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2203.codfw.wmnet
[01:26:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:03] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2204.codfw.wmnet
[01:27:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:30:32] Operations, netops: Update BGP_sanitize_in filter - https://phabricator.wikimedia.org/T190317#4192939 (ayounsi) The last change to be applied: ```lang=diff + route-filter 0.0.0.0/0 prefix-length-range /25-/32; - route-filter 0.0.0.0/0 prefix-length-range /27-/32; ``` Will cause 135 invalid pref...
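For readers unfamiliar with the route-filter diff in the 01:30:32 message, here is a minimal Python sketch of what widening the range from /27-/32 to /25-/32 would change. It assumes the filter matches on prefix length alone and that matched routes are the ones the sanitize policy drops; the log does not show the policy action. The function names and sample prefixes (documentation ranges) are illustrative and do not come from the router config.

```python
import ipaddress

# Ranges taken from the diff quoted in the log.
OLD_RANGE = (27, 32)  # removed line: prefix-length-range /27-/32
NEW_RANGE = (25, 32)  # added line:   prefix-length-range /25-/32


def matches_length_range(prefix: str, shortest: int, longest: int) -> bool:
    """Return True if the prefix length lies within [shortest, longest].

    Assumption: 0.0.0.0/0 covers every IPv4 route, so only the length matters.
    """
    network = ipaddress.ip_network(prefix, strict=True)
    return shortest <= network.prefixlen <= longest


def newly_matched(prefixes):
    """Prefixes that the new range matches but the old range did not."""
    return [
        p for p in prefixes
        if matches_length_range(p, *NEW_RANGE)
        and not matches_length_range(p, *OLD_RANGE)
    ]


# Hypothetical sample routes, not the real routing table.
SAMPLE = [
    "192.0.2.0/24",
    "198.51.100.0/25",
    "198.51.100.128/26",
    "203.0.113.0/27",
]

print(newly_matched(SAMPLE))
```

Under these assumptions only the /25 and /26 entries are newly matched; the /27 was already covered by the old range and the /24 is untouched by either.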
[01:33:40] !log mw2209,mw2210,mw2211 - reinstall wtih stretch
[01:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:10] !log progressively push updated BGP_sanitize_in prefix-length-range to routers - T190317
[01:36:12] !log dzahn@neodymium conftool action : set/pooled=inactive; selector: name=mw2209.codfw.wmnet
[01:36:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:14] T190317: Update BGP_sanitize_in filter - https://phabricator.wikimedia.org/T190317
[01:36:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:26] !log dzahn@neodymium conftool action : set/pooled=inactive; selector: name=mw2210.codfw.wmnet
[01:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:38] !log dzahn@neodymium conftool action : set/pooled=inactive; selector: name=mw2211.codfw.wmnet
[01:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:56:01] (PS1) MaxSem: Deploy CongressLookup on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/432029 (https://phabricator.wikimedia.org/T194230)
[01:56:05] (PS1) MaxSem: Deploy CongressLookup on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/432030 (https://phabricator.wikimedia.org/T194230)
[02:00:00] RECOVERY - MegaRAID on analytics1032 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[02:00:38] !log maxsem@tin Synchronized php-1.32.0-wmf.3/extensions/CongressLookup/: Preparation for T194230 (duration: 01m 22s)
[02:00:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:00:43] T194230: Deploy new CongressLookup to Beta Labs (for testing) and Meta Wiki (once it's tested) - https://phabricator.wikimedia.org/T194230
[02:02:13] !log maxsem@tin Synchronized php-1.32.0-wmf.2/extensions/CongressLookup/: Preparation for T194230 (duration: 01m 17s)
[02:02:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:06:38] (PS1) Catrope: Enable mapframe on all but a few wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/432031 (https://phabricator.wikimedia.org/T191585)
[02:06:57] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: Preparation for T194230 (duration: 01m 16s)
[02:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:02] T194230: Deploy new CongressLookup to Beta Labs (for testing) and Meta Wiki (once it's tested) - https://phabricator.wikimedia.org/T194230
[02:07:03] (PS2) Catrope: Enable mapframe on all but a few wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/432031 (https://phabricator.wikimedia.org/T191585)
[02:08:14] Operations, netops: Update BGP_sanitize_in filter - https://phabricator.wikimedia.org/T190317#4192977 (ayounsi) Open>Resolved All done!
[02:09:22] !log maxsem@tin Synchronized wmf-config: Preparation for T194230 (duration: 01m 15s)
[02:09:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:30:24] PROBLEM - MegaRAID on analytics1032 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[02:57:51] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.2) (duration: 09m 06s)
[02:57:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:34] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 830.57 seconds
[03:40:54] PROBLEM - Disk space on elastic1026 is CRITICAL: DISK CRITICAL - free space: /srv 61677 MB (12% inode=99%)
[03:43:25] RECOVERY - Disk space on elastic1026 is OK: DISK OK
[03:55:31] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.3) (duration: 16m 34s)
[03:55:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:02:56] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed May 9 04:02:56 UTC 2018 (duration 7m 26s)
[04:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:16:25] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 284.41 seconds
[04:35:18] (PS1) Andrew Bogott: keystonehooks: Update our create_project monkeypatch to match Mitaka upstream [puppet] - https://gerrit.wikimedia.org/r/432040
[04:36:21] (CR) jerkins-bot: [V: -1] keystonehooks: Update our create_project monkeypatch to match Mitaka upstream [puppet] - https://gerrit.wikimedia.org/r/432040 (owner: Andrew Bogott)
[04:37:20] (PS2) Andrew Bogott: keystonehooks: Update our create_project monkeypatch to match Mitaka upstream [puppet] - https://gerrit.wikimedia.org/r/432040
[04:53:45] RECOVERY - MegaRAID on analytics1032 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[05:10:45] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/432041
[05:10:52] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/432041
[05:13:28] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/432041 (owner: Marostegui)
[05:14:41] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/432041 (owner: Marostegui)
[05:15:03] Operations, ops-eqiad, DBA: Degraded RAID on db1073 - https://phabricator.wikimedia.org/T194197#4193082 (Marostegui) Open>Resolved a:Cmjohnson The disk is now part of the RAID again. ``` root@db1073:~# megacli -PDRbld -ShowProg -PhysDrv [32:7] -aALL Device(Encl-32 Slot-7) is not in rebui...
[05:19:10] !log Stop slave on db1116:s3 to do some gtid cleanups and tests
[05:19:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:19:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 after alter table (duration: 01m 36s)
[05:19:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:19:55] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/432041 (owner: Marostegui)
[05:24:20] PROBLEM - MegaRAID on analytics1032 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[05:26:47] !log Stop MySQL on db2092 to do some clean ups
[05:26:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:28:34] Operations, ops-eqiad: anaytics1032's BBU is not working correctly - https://phabricator.wikimedia.org/T194234#4193088 (elukey) p:Triage>Normal
[05:55:01] RECOVERY - MegaRAID on analytics1032 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[06:00:38] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2209.codfw.wmnet
[06:00:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:41] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2210.codfw.wmnet
[06:01:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:24] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2211.codfw.wmnet
[06:04:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:50] PROBLEM - MegaRAID on analytics1032 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[06:30:40] PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:32:31] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:32:50] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:35:51] (PS1) Smalyshev: Add wikis with more that 1000 categories [mediawiki-config] - https://gerrit.wikimedia.org/r/432043 (https://phabricator.wikimedia.org/T194139)
[06:38:20] (PS2) Smalyshev: Add wikis with more that 1000 categories to categories dump [mediawiki-config] - https://gerrit.wikimedia.org/r/432043 (https://phabricator.wikimedia.org/T194139)
[06:46:50] PROBLEM - puppet last run on labvirt1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints]
[06:57:01] RECOVERY - puppet last run on dbproxy1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:01] RECOVERY - puppet last run on mw1319 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:01:52] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures
[07:05:32] !log reimaging mw1334, mw1335 (job runners) to stretch
[07:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:16] !log reimaging mw2162 to stretch (last jessie job runner in codfw)
[07:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:08] !log reimaging mw2152 to stretch (video scaler with a deprecated one-off partman recipe)
[07:12:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:31] RECOVERY - puppet last run on labvirt1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:27:42] elukey: I kind of feel like that IP ban on phab may have banned the whole of morocco :P
[07:28:18] every internet connection I have used so far all seem to be in the range
[07:28:27] It's a shame the block can't only be for uploads or something
[07:29:47] addshore: yep it might be too broad, I'd suggest to follow up with a phab task (I am a bit ignorant about how those rules are reviewed)
[07:33:50] okay!
[07:39:40] added a comment :)
[07:47:33] RECOVERY - MegaRAID on analytics1032 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[08:01:45] PROBLEM - nutcracker process on mw2152 is CRITICAL: Return code of 255 is out of bounds
[08:03:14] PROBLEM - Disk space on mw2152 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:14] PROBLEM - puppet last run on mw2152 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:35] PROBLEM - HHVM jobrunner on mw2152 is CRITICAL: connect to address 10.192.32.40 and port 9005: Connection refused
[08:04:11] ^ silencing
[08:13:25] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[08:15:49] (CR) Chad: [V: 2 C: 2] Minimal pom.xml so output from mvn looks sane [software/gerrit] (stable-2.14) - https://gerrit.wikimedia.org/r/432017 (owner: Chad)
[08:17:07] !log demon@tin Started deploy [gerrit/gerrit@c421c91]: 2.14.7 -> 2.14.8
[08:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:18] !log demon@tin Finished deploy [gerrit/gerrit@c421c91]: 2.14.7 -> 2.14.8 (duration: 00m 11s)
[08:17:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:25] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[08:18:28] ^ the memcached alert is harmless and caused by job runner reimages
[08:18:31] !log stop and upgrade db1053
[08:18:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:02] :)
[08:19:07] !log gerrit: restarting for version bump 2.14.7 -> 2.14.8
[08:19:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:27] paladox: We're on 2.14.8-22-g07c8aa9910 now :)
[08:21:54] PROBLEM - haproxy failover on dbproxy1008 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[08:22:14] PROBLEM - haproxy failover on dbproxy1003 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[08:22:30] ^that is scheduled
[08:22:32] see log
[08:26:24] RECOVERY - Memory correctable errors -EDAC- on db1053 is OK: (C)3 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db1053&var-datasource=eqiad%2520prometheus%252Fops
[08:26:54] RECOVERY - haproxy failover on dbproxy1008 is OK: OK check_failover servers up 2 down 0
[08:27:14] RECOVERY - haproxy failover on dbproxy1003 is OK: OK check_failover servers up 2 down 0
[08:39:56] (CR) Gehel: [C: -1] "A few comments inline. I have not tested any of this, so I might be entirely wrong. If that's the case, let me know!" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/431860 (https://phabricator.wikimedia.org/T193766) (owner: Herron)
[08:41:57] (CR) Gehel: [C: -1] "Note (1): I have only a very generic idea of how kibana works with multiple indices. I suspect the current CR would work on the kibana sid" [puppet] - https://gerrit.wikimedia.org/r/431860 (https://phabricator.wikimedia.org/T193766) (owner: Herron)
[08:48:36] PROBLEM - MegaRAID on analytics1032 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[08:51:59] !log reimaging mw1336, mw1337 (job runners) to stretch
[08:52:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:03] Operations, Patch-For-Review: Ship host syslogs to ELK - https://phabricator.wikimedia.org/T193766#4193319 (Gehel) Great to see this moving! Starting experimenting with multiple indices is a great idea! And syslog is probably sufficiently simple to be a good starting point. A few questions / comments:...
[09:09:55] (CR) Volans: "I'm late, but I have a a refactor suggestion" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/431542 (https://phabricator.wikimedia.org/T186069) (owner: Filippo Giunchedi)
[09:11:07] (CR) Muehlenhoff: [C: 1] "Looks good to me" [software/debmonitor] - https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) (owner: Volans)
[09:18:31] (PS2) Jcrespo: mariab: Fully pool db1064 [mediawiki-config] - https://gerrit.wikimedia.org/r/431740 (https://phabricator.wikimedia.org/T194118)
[09:23:27] (CR) Jcrespo: [C: 2] mariab: Fully pool db1064 [mediawiki-config] - https://gerrit.wikimedia.org/r/431740 (https://phabricator.wikimedia.org/T194118) (owner: Jcrespo)
[09:24:42] (Merged) jenkins-bot: mariab: Fully pool db1064 [mediawiki-config] - https://gerrit.wikimedia.org/r/431740 (https://phabricator.wikimedia.org/T194118) (owner: Jcrespo)
[09:29:05] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Pool db1064 with full weight (duration: 01m 27s)
[09:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:00] (CR) jenkins-bot: mariab: Fully pool db1064 [mediawiki-config] - https://gerrit.wikimedia.org/r/431740 (https://phabricator.wikimedia.org/T194118) (owner: Jcrespo)
[09:34:16] (CR) DCausse: [C: 1] Add wikis with more that 1000 categories to categories dump [mediawiki-config] - https://gerrit.wikimedia.org/r/432043 (https://phabricator.wikimedia.org/T194139) (owner: Smalyshev)
[09:53:21] (PS2) Alexandros Kosiaris: Provision RSA keys for ganeti root auth [puppet] - https://gerrit.wikimedia.org/r/431782
[09:59:11] (CR) Chad: [C: 1] mediawiki/apache: seperate line for each chapter ServerAlias [puppet] - https://gerrit.wikimedia.org/r/429863 (owner: Dzahn)
[09:59:27] (CR) Chad: [C: 2] Forward response codes >= 400 on search.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/430502 (https://phabricator.wikimedia.org/T179266) (owner: EBernhardson)
[10:00:42] (Merged) jenkins-bot: Forward response codes >= 400 on search.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/430502 (https://phabricator.wikimedia.org/T179266) (owner: EBernhardson)
[10:00:55] (CR) jenkins-bot: Forward response codes >= 400 on search.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/430502 (https://phabricator.wikimedia.org/T179266) (owner: EBernhardson)
[10:02:14] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:02:30] !log demon@tin Synchronized docroot/search.wikimedia.org/index.php: improve 5xx/4xx error handling (duration: 01m 27s)
[10:02:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:05] Operations, ops-eqiad, Traffic, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4193457 (Vgutierrez) So MSI-X limit can be changed on the NIC BIOS, it was set to 16 for enp4s0f0, after setting it to 32 **and power cycling** the server, lspci showed the...
[10:03:12] (PS2) DCausse: [cirrus] Increase the number of shards for wikidatawiki_content, enwiki_general [mediawiki-config] - https://gerrit.wikimedia.org/r/427176 (https://phabricator.wikimedia.org/T192064)
[10:05:34] moritzm: I'm guessing your reimage explains the spike of nutcracker failures in logstash for mw1337?
[10:06:08] PROBLEM - nutcracker port on mw2152 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[10:06:27] (CR) Chad: [C: 2] multiversion: Remove unused vendor/autoload from getMWVersion. [mediawiki-config] - https://gerrit.wikimedia.org/r/432011 (owner: Krinkle)
[10:06:37] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:08:35] no_justification: yeah, those are all local, we've implemented some option to wmf-reimage to mask jobrunner during the reimage, but puppet still starts the service
[10:09:09] on the bright side, those were the last two job runners to be reimaged to stretch, so we won't see that kind of monitoring spam for a while :-)
[10:09:14] :)
[10:10:27] RECOVERY - nutcracker port on mw2152 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[10:11:28] RECOVERY - Disk space on mw2152 is OK: DISK OK
[10:12:58] RECOVERY - HHVM jobrunner on mw2152 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.201 second response time
[10:13:27] RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:14:28] RECOVERY - nutcracker process on mw2152 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[10:14:31] (CR) Chad: [C: 2] Remove unused vendor/autoload.php from missing.php [mediawiki-config] - https://gerrit.wikimedia.org/r/432010 (owner: Krinkle)
[10:15:48] (Merged) jenkins-bot: Remove unused vendor/autoload.php from missing.php [mediawiki-config] - https://gerrit.wikimedia.org/r/432010 (owner: Krinkle)
[10:15:50] (Merged) jenkins-bot: multiversion: Remove unused vendor/autoload from getMWVersion. [mediawiki-config] - https://gerrit.wikimedia.org/r/432011 (owner: Krinkle)
[10:18:59] (CR) Alexandros Kosiaris: [C: 2] Provision RSA keys for ganeti root auth [puppet] - https://gerrit.wikimedia.org/r/431782 (owner: Alexandros Kosiaris)
[10:20:42] (PS1) Jcrespo: mariadb: Move db1077 to ROW binlog_format [puppet] - https://gerrit.wikimedia.org/r/432054 (https://phabricator.wikimedia.org/T192979)
[10:20:48] (CR) jenkins-bot: Remove unused vendor/autoload.php from missing.php [mediawiki-config] - https://gerrit.wikimedia.org/r/432010 (owner: Krinkle)
[10:21:02] (PS2) Jcrespo: mariadb: Move db1077 to ROW binlog_format [puppet] - https://gerrit.wikimedia.org/r/432054 (https://phabricator.wikimedia.org/T192979)
[10:21:44] (CR) Jcrespo: [C: 2] mariadb: Move db1077 to ROW binlog_format [puppet] - https://gerrit.wikimedia.org/r/432054 (https://phabricator.wikimedia.org/T192979) (owner: Jcrespo)
[10:27:57] RECOVERY - Memory correctable errors -EDAC- on cp1068 is OK: (C)3 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=cp1068&var-datasource=eqiad%2520prometheus%252Fops
[10:28:37] Operations, ops-eqiad: kafka1023 correctable memory errors - https://phabricator.wikimedia.org/T194249#4193534 (fgiunchedi)
[10:28:58] !log cycle-load edac kernel modules for cp1068 to reset counters
[10:29:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:46] ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on kafka1023 is CRITICAL: 18 ge 3 Filippo Giunchedi T194249 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=kafka1023&var-datasource=eqiad%2520prometheus%252Fops
[10:30:55] !log cycle-load edac kernel modules for scb1002 to reset counters
[10:30:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:00] (PS1) Jcrespo: mariadb: Depool db1072 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/432056 (https://phabricator.wikimedia.org/T192979)
[10:33:56] (CR) Jcrespo: [C: 2] mariadb: Depool db1072 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/432056 (https://phabricator.wikimedia.org/T192979) (owner: Jcrespo)
[10:35:06] (Merged) jenkins-bot: mariadb: Depool db1072 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/432056 (https://phabricator.wikimedia.org/T192979) (owner: Jcrespo)
[10:36:12] Operations, ops-codfw: wtp2013 memory correctable errors - https://phabricator.wikimedia.org/T194174#4193595 (fgiunchedi) ``` wtp2013:~$ sudo ipmi-sel ID | Date | Time | Name | Type | Event 1 | Jan-15-2015 | 23:04:45 | SEL | Event Logging Disable...
[10:36:49] Operations, ops-codfw: rdb2002 correctable memory errors - https://phabricator.wikimedia.org/T194171#4193600 (fgiunchedi) ``` 4 | May-06-2018 | 04:46:06 | Mem ECC Warning | Memory | transition to Non-Critical from OK ; OEM Event Data2 code = 90h ; OEM Event Data3 code = 40h ```
[10:37:50] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4193603 (Marostegui) To sum up the last events. db1116:3313 yesterday broke with: ``` Last_IO_Error: Got fatal error 1236 fr...
[10:40:58] !log demon@tin Synchronized multiversion/getMWVersion: remove vendor dep (duration: 03m 27s)
[10:41:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:46] RECOVERY - Memory correctable errors -EDAC- on scb1002 is OK: (C)3 ge (W)1 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=scb1002&var-datasource=eqiad%2520prometheus%252Fops
[10:45:55] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1072 (duration: 01m 19s)
[10:45:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:11] !log demon@tin Synchronized wmf-config/missing.php: remove vendor dep (duration: 01m 20s)
[10:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:22] (PS1) Filippo Giunchedi: prometheus: alert on config reload failure [puppet] - https://gerrit.wikimedia.org/r/432059
[10:54:06] (CR) Addshore: BETA ONLY - WikibaseLexeme config (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) (owner: Addshore)
[10:54:13] <_joe_> godog: is there a way for prometheus to verify its config without actually trying to load it?
[10:54:40] Operations, ops-codfw, DBA: Degraded RAID on db2067 - https://phabricator.wikimedia.org/T194103#4193662 (Marostegui) a:Marostegui>Papaul
[10:55:24] (CR) Volans: [V: 2 C: 2] Created Django apps [software/debmonitor] - https://gerrit.wikimedia.org/r/394619 (https://phabricator.wikimedia.org/T167504) (owner: Volans)
[10:55:33] (CR) jerkins-bot: [V: -1] Created Django apps [software/debmonitor] - https://gerrit.wikimedia.org/r/394619 (https://phabricator.wikimedia.org/T167504) (owner: Volans)
[10:55:37] !log reimaging mw2246 to stretch (video scaler with a deprecated one-off partman recipe)
[10:55:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:13] _joe_: yes, we do an "onlyif" already for the main configuration in puppet, though for files not in the main configuration the reload is automatic from prometheus too
[10:56:26] <_joe_> uh I see
[10:56:36] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1962 bytes in 0.096 second response time
[10:56:42] <_joe_> well there is the validate_cmd attribute in puppet that could help you maybe?
[10:57:08] <_joe_> that only changes a file on disk if the command listed there exits with exit code 0
[10:58:33] (CR) Ladsgroup: "In practice, you know the config belongs to repo or client, that narrows down the search drastically. I'd be happy with InitializeWikibase" [mediawiki-config] - https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) (owner: Addshore)
[10:59:52] _joe_: validate_cmd takes the temporary file as input iirc ? that would work for rules files, for autogenerated target files there isn't a way to check each file in isolation
[11:01:27] (PS1) Marostegui: db2092.yaml: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/432062 (https://phabricator.wikimedia.org/T190704)
[11:02:16] (CR) Marostegui: [C: 2] db2092.yaml: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/432062 (https://phabricator.wikimedia.org/T190704) (owner: Marostegui)
[11:03:44] !log updated jenkins packages
[11:03:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:06] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.097 second response time
[11:08:28] (PS2) Filippo Giunchedi: prometheus: alert on config reload failure [puppet] - https://gerrit.wikimedia.org/r/432059
[11:10:57] PROBLEM - Varnish HTTP text-frontend - port 3127 on cp5009 is CRITICAL: connect to address 10.132.0.109 and port 3127: Connection refused
[11:10:57] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp5009 is CRITICAL: connect to address 10.132.0.109 and port 3122: Connection refused
[11:11:36] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp5009 is CRITICAL: connect to address 10.132.0.109 and port 3120: Connection refused
[11:14:02] !log reimage mw2206 (earlier reimage failed since the host lacked a puppet cert)
[11:14:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:16] RECOVERY - Varnish HTTP text-frontend - port 3127 on cp5009 is OK: HTTP OK: HTTP/1.1 200 OK - 503 bytes in 0.494 second response time
[11:14:16] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp5009 is OK: HTTP OK: HTTP/1.1 200 OK - 502 bytes in 0.507 second response time
[11:14:46] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp5009 is OK: HTTP OK: HTTP/1.1 200 OK - 503 bytes in 0.494 second response time
[11:18:12] (PS1) Muehlenhoff: Update MAC address for mw2202 [puppet] - https://gerrit.wikimedia.org/r/432066
[11:18:51] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/432066 (owner: Muehlenhoff)
[11:19:51] (PS2) Muehlenhoff: Update MAC address for mw2202 [puppet] - https://gerrit.wikimedia.org/r/432066
[11:20:49] (CR) Muehlenhoff: [C: 2] Update MAC address for mw2202 [puppet] - https://gerrit.wikimedia.org/r/432066 (owner: Muehlenhoff)
[11:21:08] (PS1) Jcrespo: mariadb: Setup db1123 into s3 [puppet] - https://gerrit.wikimedia.org/r/432068 (https://phabricator.wikimedia.org/T192979)
[11:21:23] (PS2) Jcrespo: mariadb: Setup db1123 into s3 [puppet] - https://gerrit.wikimedia.org/r/432068 (https://phabricator.wikimedia.org/T192979)
[11:22:01] (CR) Jcrespo: [C: 2] mariadb: Setup db1123 into s3 [puppet] - https://gerrit.wikimedia.org/r/432068 (https://phabricator.wikimedia.org/T192979) (owner: Jcrespo)
[11:23:57] (PS1) Marostegui: db1074.yaml: Make clearer that db1074 uses ROW [puppet] - https://gerrit.wikimedia.org/r/432069
[11:24:18] (PS2) Marostegui: db1074.yaml: Make clearer that db1074 uses ROW [puppet] - https://gerrit.wikimedia.org/r/432069
[11:25:11] (CR) Jcrespo: [C: 1] "+1000" [puppet] - https://gerrit.wikimedia.org/r/432069 (owner: Marostegui)
[11:26:42] !log installing wget security updates on Debian systems
[11:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:54] (CR) Marostegui: [C: 2] db1074.yaml: Make clearer that db1074 uses ROW [puppet] - https://gerrit.wikimedia.org/r/432069 (owner: Marostegui)
[11:33:49] Operations, WMDE-QWERTY-Team, wikidiff2, Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4193719 (WMDE-Fisch) >>! In T190717#4190164, @MoritzMuehlenhoff wrote: >> In the mean time deployment-prep was also migrated to stretch, so...
[11:34:10] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Upgrade mw* servers to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T174431#4193720 (10MoritzMuehlenhoff) All job runners in eqiad and codfw are now running stretch. [11:34:12] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Upgrade mw* servers to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T174431#4193721 (10MoritzMuehlenhoff) [12:04:45] (03CR) 10Mforns: "It's safe to merge this now :]" [puppet] - 10https://gerrit.wikimedia.org/r/429465 (https://phabricator.wikimedia.org/T193176) (owner: 10Mforns) [12:06:12] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4193769 (10MoritzMuehlenhoff) Great, thanks [12:11:03] PROBLEM - Check whether ferm is active by checking the default input chain on ganeti2008 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [12:18:53] 10Operations, 10SRE-Access-Requests: Access to Google Search Console for Go Fish Digital - https://phabricator.wikimedia.org/T192893#4193791 (10MoritzMuehlenhoff) >>! In T192893#4190022, @faidon wrote: > Thanks @Deskana :) I think that all seems sufficient and we should just go ahead with this. 2018-08-01 soun... [12:31:29] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4193810 (10Lea_WMDE) @MoritzMuehlenhoff I just talked to my team, and with the hackathon coming up, it will be difficult for us to do the com... 
[12:39:50] RECOVERY - Check whether ferm is active by checking the default input chain on ganeti2008 is OK: OK ferm input default policy is set [12:48:07] (03CR) 10Volans: "Answered questions" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/430881 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:53:06] (03PS3) 10Elukey: Add cron job to sanitize EventLogging data in Hive [puppet] - 10https://gerrit.wikimedia.org/r/429465 (https://phabricator.wikimedia.org/T193176) (owner: 10Mforns) [12:54:02] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/11169/" [puppet] - 10https://gerrit.wikimedia.org/r/429465 (https://phabricator.wikimedia.org/T193176) (owner: 10Mforns) [12:54:04] (03CR) 10Elukey: [C: 032] Add cron job to sanitize EventLogging data in Hive [puppet] - 10https://gerrit.wikimedia.org/r/429465 (https://phabricator.wikimedia.org/T193176) (owner: 10Mforns) [12:54:56] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4193887 (10MoritzMuehlenhoff) >>! In T190717#4193810, @Lea_WMDE wrote: > @MoritzMuehlenhoff I just talked to my team, and with the hackathon... [12:59:02] jouncebot: refresh [12:59:03] I refreshed my knowledge about deployments. [12:59:06] jouncebot: next [12:59:07] In 0 hour(s) and 0 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1300) [12:59:38] !log milimetric@tin Started deploy [analytics/refinery@640bc35]: Renaming geoeditors druid datasource [12:59:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy European Mid-day SWAT(Max 6 patches). Get on with it. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1300). [13:00:04] dcausse and stephanebisson: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:19] o/ [13:00:20] hello [13:00:57] I can swat today [13:02:03] dcausse: want to deploy your commit yourself? or should I? [13:02:13] zeljkof: sure I can [13:02:28] dcausse: go ahead, I will review stephanebisson commits [13:02:34] ok [13:03:00] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4193908 (10Lea_WMDE) >>! In T190717#4193887, @MoritzMuehlenhoff wrote: > We can do that. My original plan was to start rolling out the new w... [13:03:04] (03CR) 10DCausse: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/427176 (https://phabricator.wikimedia.org/T192064) (owner: 10DCausse) [13:03:43] stephanebisson: 431609 is not related to any phab task? [13:04:00] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.123 second response time [13:04:27] (03Merged) 10jenkins-bot: [cirrus] Increase the number of shards for wikidatawiki_content, enwiki_general [mediawiki-config] - 10https://gerrit.wikimedia.org/r/427176 (https://phabricator.wikimedia.org/T192064) (owner: 10DCausse) [13:05:08] (03PS2) 10Sbisson: Remove unused wgKartographerDfltStyle setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431609 (https://phabricator.wikimedia.org/T191655) (owner: 10Catrope) [13:05:23] zeljkof: Sorry. It's a clean from yesterday. I've linked it. 
[13:05:36] a clean up* [13:05:49] !log milimetric@tin Finished deploy [analytics/refinery@640bc35]: Renaming geoeditors druid datasource (duration: 06m 11s) [13:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:24] stephanebisson: thanks! [13:06:30] (03PS1) 10Gergő Tisza: Enable TemplateStyles for nowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432073 (https://phabricator.wikimedia.org/T193786) [13:08:10] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:08:43] !log reimage analytics103[2,3] to Debian Stretch [13:08:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:01] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1978 bytes in 0.109 second response time [13:09:20] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:10:53] !log dcausse@tin Synchronized wmf-config/InitialiseSettings.php: T192064 [cirrus] Increase the number of shards for wikidatawiki_content, enwiki_general (duration: 01m 20s) [13:10:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:57] T192064: Increase the number of shards for enwiki_general, viwiki_general and wikidatawiki_content - https://phabricator.wikimedia.org/T192064 [13:11:44] (03PS3) 10Zfilipin: Remove unused wgKartographerDfltStyle setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431609 (https://phabricator.wikimedia.org/T191655) (owner: 10Catrope) [13:12:05] (03PS1) 10Filippo Giunchedi: prometheus: use validate_cmd for rules and config files [puppet] - 10https://gerrit.wikimedia.org/r/432074 [13:12:09] zeljkof: I'm done [13:12:20] ok, taking over swat [13:12:39] (03CR) 10jerkins-bot: [V: 04-1] prometheus: use validate_cmd for rules and config files [puppet] - 10https://gerrit.wikimedia.org/r/432074 (owner: 10Filippo Giunchedi) [13:12:46] stephanebisson: please stand by, your commit will be at mwdebug in a few minutes [13:13:08] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431609 (https://phabricator.wikimedia.org/T191655) (owner: 10Catrope) [13:14:18] (03Merged) 10jenkins-bot: Remove unused wgKartographerDfltStyle setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431609 (https://phabricator.wikimedia.org/T191655) (owner: 10Catrope) [13:14:39] (03PS2) 10Filippo Giunchedi: prometheus: use validate_cmd for rules and config files [puppet] - 10https://gerrit.wikimedia.org/r/432074 [13:15:40] stephanebisson: 431609 is at mwdebug1002, please test and let me know if I can deploy it [13:15:55] zeljkof: testing... 
[13:16:39] looks good [13:17:14] ok, deploying [13:17:59] stephanebisson: 432031 can not be rebased because of merge conflict [13:18:40] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:431609|Remove unused wgKartographerDfltStyle setting (T191655)]] (duration: 01m 20s) [13:18:41] zeljkof: I'll do it manually [13:18:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:44] T191655: Deploy maps internationalization to production - https://phabricator.wikimedia.org/T191655 [13:18:50] stephanebisson: 431609 is deployed [13:19:42] (03PS3) 10Sbisson: Enable mapframe on all but a few wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432031 (https://phabricator.wikimedia.org/T191585) (owner: 10Catrope) [13:20:31] zeljkof: rebased [13:20:50] stephanebisson: ok, reviewing [13:21:10] PROBLEM - Host ganeti2008 is DOWN: PING CRITICAL - Packet loss = 100% [13:21:45] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432031 (https://phabricator.wikimedia.org/T191585) (owner: 10Catrope) [13:21:56] akosiaris: any work in progress? ^^^ [13:22:00] RECOVERY - Host ganeti2008 is UP: PING OK - Packet loss = 0%, RTA = 36.39 ms [13:22:04] yup [13:22:17] final reboot before starting to move VMs to it [13:22:20] it's empty anyway [13:22:31] ah great, sorry my bad missing context [13:22:43] no worries, I should have logged it [13:23:10] * volans will refrain to say anything else to not jinx it [13:23:20] (03Merged) 10jenkins-bot: Enable mapframe on all but a few wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432031 (https://phabricator.wikimedia.org/T191585) (owner: 10Catrope) [13:23:24] so... 
what VMs to send over there [13:23:48] if it's for testing, feel free to send puppetboard2001 if in the same row (don't remember) [13:24:07] no it's not [13:24:15] stephanebisson: 432031 is at mwdebug [13:24:16] but mwdebug2001 is [13:24:27] sounds a good one :) [13:24:28] can't think of a better point in time to do this :-) [13:25:49] zeljkof: looks good [13:26:09] ok, deploying [13:27:32] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:432031|Enable mapframe on all but a few wikis (T191585)]] (duration: 01m 20s) [13:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:36] T191585: Release mapframe to all NON-Flagged Revs wikipedia (and a few who do have Flagged Revs). - https://phabricator.wikimedia.org/T191585 [13:27:52] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Different production servers have different versions of tidy installed, resulting in varying output - https://phabricator.wikimedia.org/T193414#4193987 (10ssastry) 05Open>03declined [13:28:03] stephanebisson: 432031 is deployed, please test and thanks for deploying with #releng :) [13:28:12] zeljkof: thanks! 
[13:28:23] !log EU SWAT finished [13:28:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:59] !log milimetric@tin Started deploy [analytics/refinery@a5a8cbc]: Renaming geoeditors druid datasource [13:50:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:21] (03PS13) 10Volans: First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) [13:53:23] (03PS12) 10Volans: Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) [13:53:25] (03PS15) 10Volans: Add basic test coverage [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394621 (https://phabricator.wikimedia.org/T167504) [13:53:27] (03PS11) 10Volans: Add login and LDAP support [software/debmonitor] - 10https://gerrit.wikimedia.org/r/425417 (https://phabricator.wikimedia.org/T167504) [13:53:29] (03PS6) 10Volans: Add server side validation of client certificates [software/debmonitor] - 10https://gerrit.wikimedia.org/r/428302 (https://phabricator.wikimedia.org/T167504) [13:54:00] (03CR) 10jerkins-bot: [V: 04-1] First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [13:54:07] (03CR) 10jerkins-bot: [V: 04-1] Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [13:54:39] (03CR) 10Volans: "Fixes/replies inline." 
(0310 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [13:55:26] !log milimetric@tin Finished deploy [analytics/refinery@a5a8cbc]: Renaming geoeditors druid datasource (duration: 05m 27s) [13:55:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:51] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:57:12] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:59:21] !log beginning upgrade of Kafka main-eqiad cluster from 0.9.0.1 to 1.1.0 - T167039 [13:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:25] T167039: Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039 [13:59:26] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:01:08] (03PS2) 10Ottomata: Stop main-codfw -> main-eqiad MirrorMaker during Kafka main upgrade [puppet] - 10https://gerrit.wikimedia.org/r/431799 (https://phabricator.wikimedia.org/T167039) [14:01:17] (03CR) 10Ottomata: [V: 032 C: 032] Stop main-codfw -> main-eqiad MirrorMaker during Kafka main upgrade [puppet] - 10https://gerrit.wikimedia.org/r/431799 (https://phabricator.wikimedia.org/T167039) (owner: 10Ottomata) [14:02:48] ok stopping mirror maker instances [14:04:14] ottomata: downtime added? 
[14:04:19] ya [14:04:22] super [14:05:59] also reset-failed :) [14:06:01] ok [14:06:28] ok beginning restart 1 [14:06:58] (03PS1) 10Fdans: Make sure only maxmind files are archived [puppet] - 10https://gerrit.wikimedia.org/r/432078 [14:08:07] (03PS1) 10Andrew Bogott: Nova: depool labvirt1001 [puppet] - 10https://gerrit.wikimedia.org/r/432081 (https://phabricator.wikimedia.org/T194258) [14:08:54] elukey, this fixes that issue ^ [14:10:22] fdans: not sure i totally understand but [14:10:24] would [14:10:41] cp -l $MAXMIND_DB_SOURCE_DIR/* $MAXMIND_DB_ARCHIVE_DIR/$CURRENT_DATE/ [14:10:43] do it too? [14:10:50] yeah I was about to ask the same [14:10:59] (03CR) 10Andrew Bogott: [C: 032] Nova: depool labvirt1001 [puppet] - 10https://gerrit.wikimedia.org/r/432081 (https://phabricator.wikimedia.org/T194258) (owner: 10Andrew Bogott) [14:11:24] OH i see [14:11:29] anyhow, let's focus on Kafka now :) [14:11:36] no elukey beacuse archive/ is inside of source dir [14:11:36] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:11:37] i see [14:11:46] find -type f makes sense [14:11:48] ottomata: yeah that's what failed :) [14:12:05] I didn't think of it because it was one of the last changes we made [14:12:17] fdans: for the find command, i think instead of xargs you could just do [14:12:18] (03CR) 10Anomie: WIP: wiki replicas - prepare for refactored actor storage (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/431823 (https://phabricator.wikimedia.org/T188299) (owner: 10Bstorm) [14:12:34] -exec cp -l {} "$MAXMIND_DB_ARCHIVE_DIR/$CURRENT_DATE/" \; [14:13:26] ottomata: yesss the stackoverflow people agree with you [14:13:34] (03CR) 10jenkins-bot: multiversion: Remove unused vendor/autoload from getMWVersion. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/432011 (owner: 10Krinkle) [14:14:12] (03CR) 10jenkins-bot: [cirrus] Increase the number of shards for wikidatawiki_content, enwiki_general [mediawiki-config] - 10https://gerrit.wikimedia.org/r/427176 (https://phabricator.wikimedia.org/T192064) (owner: 10DCausse) [14:14:41] (03CR) 10jenkins-bot: Remove unused wgKartographerDfltStyle setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431609 (https://phabricator.wikimedia.org/T191655) (owner: 10Catrope) [14:14:56] ottomata: 1001 upgraded right? [14:15:00] (03CR) 10jenkins-bot: Enable mapframe on all but a few wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432031 (https://phabricator.wikimedia.org/T191585) (owner: 10Catrope) [14:15:09] (03PS2) 10Krinkle: multiversion: Move vendor/autoload from MWMultiVersion to profiler.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432012 [14:15:13] 10Operations, 10DBA, 10Traffic: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462#4194193 (10Rduran) hpenc looks interesting, so maybe we can keep it in mind for future improvements. [14:15:14] elukey: yes, sorry, just about done upgrading 1003 [14:15:14] (03PS3) 10Krinkle: multiversion: Move vendor/autoload from MWMultiVersion to profiler.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432012 [14:15:20] (03CR) 10jenkins-bot: mariadb: Depool db1072 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432056 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo) [14:15:24] ah nice! [14:16:31] ok restart 1 complete, package is upgraded [14:16:46] 10Operations, 10DBA, 10Traffic: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462#4194198 (10jcrespo) Yes, I was not suggesting to do it now, just document the suggestion for the future- or maybe they can even set it up for us in parallel. Changing the algorithm, assuming o... 
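(Aside on the maxmind archive discussion at 14:10-14:13: a minimal sketch of the find-based approach, not the actual patch. The function name archive_regular_files and the -maxdepth 1 flag are assumptions added for illustration; only find -type f and -exec cp -l {} ... \; come from the log. It shows why the glob form fails: the archive directory lives inside the source directory, so $MAXMIND_DB_SOURCE_DIR/* also matches that directory, and cp without -r refuses to copy it.)

```shell
#!/bin/sh
# Hypothetical helper (not the merged change): hard-link only the regular
# files at the top level of src_dir into dest_dir.
archive_regular_files() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    # -type f skips directories, so an archive/ directory nested inside
    # src_dir is never handed to cp. -maxdepth 1 is an assumption (GNU/BSD
    # find): it also keeps find from descending into older archives.
    find "$src_dir" -maxdepth 1 -type f -exec cp -l {} "$dest_dir/" \;
}
```

Under those assumptions it would be called as archive_regular_files "$MAXMIND_DB_SOURCE_DIR" "$MAXMIND_DB_ARCHIVE_DIR/$CURRENT_DATE". Since cp -l creates hard links, source and destination have to be on the same filesystem.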
[14:16:47] (03PS2) 10Fdans: Make sure only maxmind files are archived [puppet] - 10https://gerrit.wikimedia.org/r/432078 [14:17:07] (03PS2) 10Ottomata: Kafka main-eqiad inter_broker_protocol_version: 1.1.0 [puppet] - 10https://gerrit.wikimedia.org/r/431800 (https://phabricator.wikimedia.org/T167039) [14:17:12] (03CR) 10Ottomata: [V: 032 C: 032] Kafka main-eqiad inter_broker_protocol_version: 1.1.0 [puppet] - 10https://gerrit.wikimedia.org/r/431800 (https://phabricator.wikimedia.org/T167039) (owner: 10Ottomata) [14:18:04] ok, beginning restart 2: inter broker protocol version bump [14:18:30] 10Operations, 10Patch-For-Review: consider hybrid caching options for ssd+disk - https://phabricator.wikimedia.org/T88992#4194207 (10MoritzMuehlenhoff) [14:18:35] 10Operations, 10Graphite, 10Patch-For-Review: use graphite1002 to test dm-cache - https://phabricator.wikimedia.org/T88994#4194203 (10MoritzMuehlenhoff) 05stalled>03declined Host was decomissioned in T187190 [14:19:04] PROBLEM - Ubuntu mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/ubuntu is over 14 hours old. [14:19:31] ottomata: can we wait a sec if not too late for all metrics to have recovered? [14:20:10] oh, already did 1001 elukey [14:20:20] super fine then [14:20:54] what are you waiting to see? leaders balanced? [14:21:01] ottomata: let's just wait a minute after this roung of restarts ok? [14:21:07] s/roung/round [14:21:17] ok [14:21:31] shall i continue with 1002 and 1003 or just wait for now? 
[14:22:21] log flush rate seems a bit up on 1003, but probably nothing to worry about, and also disk iops is recovering [14:22:23] continue, just give it a chill after [14:22:25] PROBLEM - nova-compute proc minimum on labvirt1007 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute [14:22:27] maybe 5 mins [14:22:37] there's a cirrus job running now, so traffic is a little higher [14:22:46] cirrusSearchIncomingLinkCount [14:22:52] yeah I was about to say it, probably unrelated sorry [14:22:52] (03CR) 10Krinkle: prometheus: varnish_thumbnails aggregation rule (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/431528 (https://phabricator.wikimedia.org/T184942) (owner: 10Ema) [14:22:54] it's bursty, runs every 30 mins maybe? [14:23:23] you are free to go [14:23:26] RECOVERY - nova-compute proc minimum on labvirt1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute [14:23:31] ok [14:23:31] the cirrus stuff is a no-op right now [14:24:11] they are fed into kafka but nothing is depending on them on the other side [14:24:34] (03CR) 10Jcrespo: "Let's deploy the software on the operations/software/wmfmariadbpy repository, we will see later how to deploy it." [puppet] - 10https://gerrit.wikimedia.org/r/430868 (owner: 10Rduran) [14:24:49] 10Operations, 10DBA, 10Traffic: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462#4194211 (10Vgutierrez) in stretch chacha20 is available as "chacha20" and in jessie as "chacha20-poly1305", BTW for big enough block size (16384 bytes), chacha20 performs better than rc4 on on... [14:27:42] ok restart 2 finished, all brokers have protocol version 1.1.0 [14:27:45] will chill for a bit [14:28:48] next step is to flip statsv to api 1.1.0 ?
[14:29:30] ya next step is clients yup [14:29:44] ack [14:30:24] after you do your consumers ottomata I will deploy mine [14:30:54] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4194227 (10MoritzMuehlenhoff) >>! In T190717#4193908, @Lea_WMDE wrote: >> Do you have a time estimate for those mobile changes? Totally depen... [14:31:27] ping me when done [14:31:49] ok [14:32:06] strange db patterns at 14:11 and many errors at 14:24 [14:32:28] all sections at the same time [14:33:29] (03PS2) 10Ottomata: Kafka main-eqiad - remove api.version [puppet] - 10https://gerrit.wikimedia.org/r/431801 (https://phabricator.wikimedia.org/T167039) [14:33:33] (03CR) 10Ottomata: [V: 032 C: 032] Kafka main-eqiad - remove api.version [puppet] - 10https://gerrit.wikimedia.org/r/431801 (https://phabricator.wikimedia.org/T167039) (owner: 10Ottomata) [14:35:14] (03PS4) 10Rduran: [WIP] Use Cumin to implement the comunication for the transfer [puppet] - 10https://gerrit.wikimedia.org/r/430868 [14:35:58] 10Operations, 10DBA, 10Traffic: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462#4194233 (10Rduran) Thank you both! I'm using "chacha20" right now and it seems to work just fine (I'm using buster, but stretch is also on 1.1.0). Does jessie need to be supported too? [14:36:02] jynus: peaks that suddently recover or is it still ongoing ? [14:36:27] we started the kafka upgrade around that time (I suppose UTC) [14:36:33] 10Operations, 10DBA, 10Traffic: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462#4194234 (10jcrespo) No, stick to stretch, that is ok- that is the target. [14:37:35] (03CR) 10Volans: "Is there a Phabricator task where it's described what we're trying to achieve here? 
I don't see it linked in the commit message (as Bug: )" [puppet] - 10https://gerrit.wikimedia.org/r/430868 (owner: 10Rduran) [14:38:20] ok Pchelolo done [14:38:23] proceed with cp/jq [14:38:53] ack ottomata [14:39:37] vk metrics are good [14:39:48] (03CR) 10Jcrespo: "T156462" [puppet] - 10https://gerrit.wikimedia.org/r/430868 (owner: 10Rduran) [14:40:27] !log ppchelko@tin Started deploy [changeprop/deploy@e468d8e]: Allow protocol version negotiation. T167039 [14:40:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:30] T167039: Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039 [14:41:19] !log ppchelko@tin Finished deploy [changeprop/deploy@e468d8e]: Allow protocol version negotiation. T167039 (duration: 00m 53s) [14:41:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:29] (03CR) 10Rduran: "> T156462" [puppet] - 10https://gerrit.wikimedia.org/r/430868 (owner: 10Rduran) [14:42:00] CP done, lemme wait for the metrics to come up [14:42:47] (03PS5) 10Rduran: [WIP] Use Cumin to implement the comunication for the transfer [puppet] - 10https://gerrit.wikimedia.org/r/430868 (https://phabricator.wikimedia.org/T156462) [14:45:05] !log ppchelko@tin Started deploy [cpjobqueue/deploy@58935d5]: Allow protocol version negotiation. T167039 [14:45:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:37] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@58935d5]: Allow protocol version negotiation. 
T167039 (duration: 00m 34s) [14:45:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:41] T167039: Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039 [14:46:22] JQ done, again give me a minute for the metrics to come up [14:47:10] hm ok, not exactly sure why, but the 0.9 MM running eqiad -> other eqiads (analytics, and jumbo) stopped replicating [14:47:12] i bounced them, they seem fine [14:47:31] the main->eqiad -> analytics-eqiad has to keep working, or EventStreams will stop [14:47:39] which....actually just made me realize something [14:48:28] i think it should be fine, but we are going to have to live with the performance degradation for that mirror maker [14:48:33] until we switch EventStreams over [14:48:35] to jumbo [14:48:38] with 1.x mirrormaker [14:48:49] should be ok, since we don't replicate the high volume stuff [14:48:52] but something to be aware of [14:49:08] kafka won't be able to 0 copy transfer for that MM instance [14:50:26] waaait [14:50:46] elukey: https://grafana.wikimedia.org/dashboard/db/jobqueue-eventbus?orgId=1&panelId=15&fullscreen [14:51:06] you see that straight line over there [14:51:26] we removed that consumer a long time ago [14:52:04] but burrow is weird [14:52:30] burrow is weird...i did restart it with new api version [14:52:40] maybe something in the new api version causes it to report lag differently? [14:52:55] if the consumer group data is still there, it reports it [14:53:12] but why would it show back up again all of the sudden? [14:53:43] is that the lag? If you restarted it, it might have refreshed its internal state, it doesn't save it anywhere between restarts [14:54:11] yeah it is the lag [14:54:18] mmmm.
Ok, it's not a blocker then [14:55:19] I am pretty sure that the consumer group state is still in kafka, and restarting burrow triggered a re-discovery of all the data [14:55:19] elukey: could you be so kind to clean up that straight line consumer group data please? [14:55:49] Pchelolo: I have no idea how to do it if burrow is reporting it [14:56:10] hi all, I was hoping to do a jenkins upgrade this morning in between deployment windows, but it seems like the kafka upgrade is going to be in progress for a bit, is there an eta so I don't step on toes? [14:56:43] thcipriani: let's recheck in ~1h to be sure, we should be ok but we are looking into some issues nw [14:56:46] now [14:57:04] would it be ok for you? [14:57:27] thcipriani: 20-30 mins [14:57:42] thcipriani: ya i don't see jenkins interfering with this [14:57:48] i think it'd be fine [14:58:06] 3 different answers :D [14:58:08] haha [14:58:40] (03PS2) 10Ottomata: Kafka main-eqiad - log.message.format.version [puppet] - 10https://gerrit.wikimedia.org/r/431802 (https://phabricator.wikimedia.org/T167039) [14:58:45] Pchelolo: elukey, you ok with me continuing ^ ? [14:59:13] I mean, worst-case is you can't run patches through jenkins for..however long it takes me to figure out issues, I'll check back in 30 minutes or so and if you're still ongoing I'll push to this afternoon after other deploy windows. [14:59:13] elukey: I think we can postpone the burrow thing and just file a ticket. all the rest of the metrics report that everything is ok [14:59:49] Pchelolo: i've had a similar problem with burrow metrics in mirrormaker dashes [14:59:54] I am. let's see what elukey thinks [15:00:04] i did a thing where I exclude showing ones if the offset lag hasn't changed in some period of time [15:00:05] i use a week [15:00:17] yep I agree, it is sadly a burrow weirdness, definitely worth a task [15:00:20] but not blocking atm [15:00:30] ottomata: do we know what happened to the mm instances?
[15:00:32] https://grafana-admin.wikimedia.org/dashboard/db/kafka-mirrormaker?refresh=5m&panelId=5&fullscreen&edit&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-lag_datasource=codfw%20prometheus%2Fops&var-mirror_name=main-eqiad_to_eqiad&from=now-30m&to=now [15:00:41] elukey: no, but i put it up to the old buggy MM [15:00:43] version [15:00:47] it looks ok now [15:01:01] k. let's goooooooooooooooooooo [15:01:38] wait a sec, let's just triple check one mm instance [15:01:43] we are not in a hurry [15:02:11] (03PS1) 10Krinkle: prometheus: Add varnishrls aggregation rules [puppet] - 10https://gerrit.wikimedia.org/r/432090 (https://phabricator.wikimedia.org/T184942) [15:02:39] I know it is working now but the last rolling restart will take 10 mins tops [15:02:44] https://grafana-admin.wikimedia.org/dashboard/db/kafka-mirrormaker?refresh=5m&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-lag_datasource=codfw%20prometheus%2Fops&var-mirror_name=main-eqiad_to_eqiad&from=now-1h&to=now [15:03:23] sure but was the error the same that we encountered before? [15:03:35] (03PS1) 10Vgutierrez: pybal: switch lvs1016 to cr1-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/432091 (https://phabricator.wikimedia.org/T184293) [15:04:05] the last time we saw mm dying randomly even if restarting was getting it up as it was fine [15:05:37] elukey: all i see is Offset commit for group kafka-mirror-main-eqiad_to_eqiad f [15:05:38] ailed due to REQUEST_TIMED_OUT, will find new coordinator and retry [15:05:47] Offset commit for group kafka-mirror-main-eqiad_to_eqiad failed due to NOT_COORDINATOR_FOR_GROUP, will find new coordinator and retry [15:05:51] Marking the coordinator 2147482644 dead. 
[15:06:11] tons of those [15:06:47] jouncebot: next [15:06:48] In 0 hour(s) and 53 minute(s): CongressLookup deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1600) [15:06:56] yep just seen those as well, seems good then [15:07:05] (CR) Vgutierrez: [C: 2] pybal: switch lvs1016 to cr1-eqiad [puppet] - https://gerrit.wikimedia.org/r/432091 (https://phabricator.wikimedia.org/T184293) (owner: Vgutierrez) [15:07:12] green light from my side [15:07:43] gooooooooooooo!!!!!! [15:07:47] Operations, SRE-Access-Requests: Access to Google Search Console for Go Fish Digital - https://phabricator.wikimedia.org/T192893#4194319 (RobH) >>! In T192893#4190022, @faidon wrote: > Thanks @Deskana :) I think that all seems sufficient and we should just go ahead with this. 2018-08-01 sounds reasonable... [15:07:59] (PS1) Ladsgroup: Enable wp10 data storage in enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/432093 (https://phabricator.wikimedia.org/T192268) [15:08:10] ok! [15:08:14] (CR) Ottomata: [C: 2] Kafka main-eqiad - log.message.format.version [puppet] - https://gerrit.wikimedia.org/r/431802 (https://phabricator.wikimedia.org/T167039) (owner: Ottomata) [15:08:18] (PS3) Ottomata: Kafka main-eqiad - log.message.format.version [puppet] - https://gerrit.wikimedia.org/r/431802 (https://phabricator.wikimedia.org/T167039) [15:08:20] (CR) Ottomata: [V: 2 C: 2] Kafka main-eqiad - log.message.format.version [puppet] - https://gerrit.wikimedia.org/r/431802 (https://phabricator.wikimedia.org/T167039) (owner: Ottomata) [15:08:42] vgutierrez: ok if i puppet merge?
[15:08:51] ottomata: yes please :) [15:09:01] k don [15:09:02] e [15:10:11] ok proceeding with restart 3 [15:11:20] (CR) Filippo Giunchedi: prometheus: Add varnishrls aggregation rules (2 comments) [puppet] - https://gerrit.wikimedia.org/r/432090 (https://phabricator.wikimedia.org/T184942) (owner: Krinkle) [15:12:33] (PS2) Krinkle: prometheus: Add varnishrls aggregation rules [puppet] - https://gerrit.wikimedia.org/r/432090 (https://phabricator.wikimedia.org/T184942) [15:13:56] (PS3) Krinkle: prometheus: Add varnishrls aggregation rules [puppet] - https://gerrit.wikimedia.org/r/432090 (https://phabricator.wikimedia.org/T184942) [15:14:21] done restart 3 [15:14:35] \o/ [15:15:35] i see the same errors for the running 0.9 mms again [15:15:37] going to bounce them [15:15:49] (PS2) Ema: prometheus: varnish_thumbnails aggregation rule [puppet] - https://gerrit.wikimedia.org/r/431528 (https://phabricator.wikimedia.org/T184942) [15:16:04] on the consumer side it was not even noticed [15:16:06] I really hope that the new mm will be a bit more reliable :D [15:16:11] me too [15:16:25] elukey: not sure if you heard me say, but the guy at blizzard i'm in touch with says it is wayyyy better [15:16:30] they are doing the exact same upgrades we are [15:16:34] and having the same mm problems we are [15:16:40] yep yep you mentioned during standup!
[15:16:41] (CR) Fdans: [C: -1] Make sure only maxmind files are archived [puppet] - https://gerrit.wikimedia.org/r/432078 (owner: Fdans) [15:16:43] aye ya [15:16:43] (CR) Filippo Giunchedi: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/432090 (https://phabricator.wikimedia.org/T184942) (owner: Krinkle) [15:16:52] ok, i'm going to re-enable the main mm instance [15:16:55] really great also that we are in contact with people doing the same [15:16:59] first codfw -> eqiad [15:17:38] (PS1) Ottomata: Re-enable main-codfw -> main-eqiad MirrorMaker [puppet] - https://gerrit.wikimedia.org/r/432096 (https://phabricator.wikimedia.org/T167039) [15:17:45] (CR) Fdans: Make sure only maxmind files are archived [puppet] - https://gerrit.wikimedia.org/r/432078 (owner: Fdans) [15:18:22] (CR) Ottomata: [C: 2] Re-enable main-codfw -> main-eqiad MirrorMaker [puppet] - https://gerrit.wikimedia.org/r/432096 (https://phabricator.wikimedia.org/T167039) (owner: Ottomata) [15:20:55] metrics coming back in so far so good [15:21:16] gonna wait til it catches back up to present [15:21:26] then will re enable eqiad -> codfw one [15:21:33] that will have all traffic since yesterday to catch up on [15:22:05] ottomata: do you want to do ops sync while we wait? Or do you prefer to skip? [15:22:36] elukey: ya let's do it! [15:22:43] no hurry at this stage [15:22:46] we are in post upgrade steps! [15:22:48] Operations, SRE-Access-Requests: Access to Google Search Console for Go Fish Digital - https://phabricator.wikimedia.org/T192893#4194356 (Deskana) >>! In T192893#4193791, @MoritzMuehlenhoff wrote: > I can take care (but when it has been properly documented in wikitech it should be part of regular clinic... [15:22:50] !log mw2212,mw2213,mw2214 - reinstall with stretch [15:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:57] as long as things keep ticking i'd say we are done with upgrade!
[15:22:58] elukey: r u filing the ticket about our burrow weirdness or should I? [15:23:00] thanks Pchelolo akosiaris! [15:23:36] Pchelolo: if you have time please do, otherwise I'll do it later on [15:23:45] ottomata: it deserves more than "Thanks". it deserves a proper celebration [15:24:17] I have a bulk of meetings now, I'll put it in the todo list [15:24:25] ottomata: :D. Nice work! [15:25:28] OH elukey i think i might need to mess with dashboards! i think maybe metrics have changed in new version! [15:25:30] for producer at least [15:25:31] cheers guys. I'll go prepare breakfast now. [15:25:39] k! [15:26:01] ottomata: you made the perfect plan, thank you [15:26:25] :) [15:26:57] !log Replacing lvs1003 with lvs1016 - T184293 [15:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:02] T184293: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293 [15:27:47] (CR) WMDE-leszek: BETA ONLY - WikibaseLexeme config (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) (owner: Addshore) [15:31:30] congrats folks, I'm going to go ahead with the jenkins upgrade that should be done, hopefully, fairly quickly [15:31:55] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [15:31:59] !log upgrading jenkins on contint2001/contint1001 [15:32:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:25] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [15:34:01] (PS1) Vgutierrez: pybal: set lvs1016 as primary instead of lvs1003 [puppet] - https://gerrit.wikimedia.org/r/432102 (https://phabricator.wikimedia.org/T184293) [15:34:18] Operations, SRE-Access-Requests: Access to Google Search Console for Go Fish Digital - https://phabricator.wikimedia.org/T192893#4194388 (RobH) @deskana: It looks like we still need the following: * The names and a shared email address
if these two users will share an account. ** The login is tied to an... [15:36:04] (PS1) Jcrespo: mariadb: Move db1072 to m3, db1123 to s4 [software] - https://gerrit.wikimedia.org/r/432103 (https://phabricator.wikimedia.org/T192979) [15:36:13] (CR) BBlack: [C: 1] pybal: set lvs1016 as primary instead of lvs1003 [puppet] - https://gerrit.wikimedia.org/r/432102 (https://phabricator.wikimedia.org/T184293) (owner: Vgutierrez) [15:36:20] (PS2) Vgutierrez: pybal: set lvs1016 as primary instead of lvs1003 [puppet] - https://gerrit.wikimedia.org/r/432102 (https://phabricator.wikimedia.org/T184293) [15:37:05] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 2.97 ms [15:37:06] Operations, SRE-Access-Requests: Access to Google Search Console for Go Fish Digital - https://phabricator.wikimedia.org/T192893#4194394 (RobH) @deskana: It looks like we still need the following: * The names and a shared email address if these two users will share an account. ** The login is tied to an... [15:37:26] (CR) Vgutierrez: [C: 2] pybal: set lvs1016 as primary instead of lvs1003 [puppet] - https://gerrit.wikimedia.org/r/432102 (https://phabricator.wikimedia.org/T184293) (owner: Vgutierrez) [15:37:29] (CR) BBlack: [C: 1] pybal: set lvs1016 as primary instead of lvs1003 [puppet] - https://gerrit.wikimedia.org/r/432102 (https://phabricator.wikimedia.org/T184293) (owner: Vgutierrez) [15:37:35] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.66 ms [15:41:22] Operations, SRE-Access-Requests: Access to Google Search Console for Go Fish Digital - https://phabricator.wikimedia.org/T192893#4194396 (Deskana) Thank you, @RobH! >>! In T192893#4194388, @RobH wrote: > * The names and a shared email address if these two users will share an account. > ** The login is t...
[15:42:46] (PS1) Milimetric: Drop private data used for geoeditor aggregation [puppet] - https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T190409) [15:43:06] mutante: can you git pull pwstore and confirm the new google-search-console file is readable to you? [15:43:11] once you do, ill git rm the other outdated files [15:43:19] (CR) jerkins-bot: [V: -1] Drop private data used for geoeditor aggregation [puppet] - https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T190409) (owner: Milimetric) [15:45:28] Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4194406 (Imarlier) [15:46:10] Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4194417 (Imarlier) [15:46:13] Operations, Performance-Team, Patch-For-Review: Move coal from graphite#001 nodes to webperf#001 - https://phabricator.wikimedia.org/T159354#4194418 (Imarlier) [15:49:35] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [15:52:36] XioNoX: ^^ [15:52:46] robh: confirmed readable [15:53:28] paravoid: thx, saw it, low priority, hoping that it solves by itself :) [15:54:55] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 2.95 ms [15:56:08] !log starting revision cleanup job, wikipedia_T_mobile__ng_lead keyspace - T192689 [15:56:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:13] T192689: Unchecked storage growth - https://phabricator.wikimedia.org/T192689 [16:00:04] MaxSem and kaldari: My dear minions, it's time we take the moon! Just kidding. Time for CongressLookup deployment deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1600). [16:00:17] ready, kaldari? [16:00:51] MaxSem: yes [16:01:39] okay, let's deploy the deployment!
[16:02:41] can i deploy the new deployment server too j/k , i'll add it to calendar [16:04:19] (PS2) MaxSem: Deploy CongressLookup on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/432029 (https://phabricator.wikimedia.org/T194230) [16:04:33] is there any public page like https://tools.wmflabs.org/admin/oge/status where one can see if there is a replica lag? [16:04:49] (and is this the right place to ask this question?) [16:05:46] https://tools.wmflabs.org/replag/ [16:06:49] (CR) Milimetric: [C: -1] "-1 to test with --dry-run" [puppet] - https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T190409) (owner: Milimetric) [16:06:56] seth_unr: if for wikireplicas, that^and #wikimedia-cloud [16:07:27] if for production, https://dbtree.wikimedia.org/ and yes [16:07:55] (PS1) Vgutierrez: install_server: Reimage lvs1003 as strech spare system [puppet] - https://gerrit.wikimedia.org/r/432116 (https://phabricator.wikimedia.org/T184293) [16:08:21] I think you can also query it using mediawiki apis [16:08:41] thanks, jynus and MaxSem. [16:08:55] (PS2) Vgutierrez: install_server: Reimage lvs1003 as stretch spare system [puppet] - https://gerrit.wikimedia.org/r/432116 (https://phabricator.wikimedia.org/T184293) [16:08:56] https://tools.wmflabs.org/replag/ was what I was looking for. [16:09:26] seth_unr: there was maintenance and then apparently overload [16:09:34] it is going down but it will take time [16:09:37] (CR) MaxSem: [C: 2] Deploy CongressLookup on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/432029 (https://phabricator.wikimedia.org/T194230) (owner: MaxSem) [16:10:00] I see.
(right now dewiki has a replag of >24h) [16:10:03] the web one usually is much better because of stricter query limits [16:10:55] (Merged) jenkins-bot: Deploy CongressLookup on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/432029 (https://phabricator.wikimedia.org/T194230) (owner: MaxSem) [16:11:08] seth_unr: connect to web and you will get just 3 hours [16:11:21] 2h30 [16:13:43] (PS1) Krinkle: mtail: Use a temporary variable for $cache_control [puppet] - https://gerrit.wikimedia.org/r/432117 (https://phabricator.wikimedia.org/T184942) [16:13:54] (CR) BBlack: [C: 1] install_server: Reimage lvs1003 as stretch spare system (1 comment) [puppet] - https://gerrit.wikimedia.org/r/432116 (https://phabricator.wikimedia.org/T184293) (owner: Vgutierrez) [16:14:12] (CR) jerkins-bot: [V: -1] mtail: Use a temporary variable for $cache_control [puppet] - https://gerrit.wikimedia.org/r/432117 (https://phabricator.wikimedia.org/T184942) (owner: Krinkle) [16:15:38] !log maxsem@tin Started scap: Deploy CongressLookup on testwiki T194230 [16:15:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:43] T194230: Deploy new CongressLookup to Beta Labs (for testing) and Meta Wiki (once it's tested) - https://phabricator.wikimedia.org/T194230 [16:15:43] (CR) Krinkle: "Yeah, I guess that isn't valid syntax." [puppet] - https://gerrit.wikimedia.org/r/432117 (https://phabricator.wikimedia.org/T184942) (owner: Krinkle) [16:16:15] (CR) Milimetric: [C: -1] "these commands fail because of https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/util.py#L450, so we need to eit" [puppet] - https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T190409) (owner: Milimetric) [16:17:11] oh, a-team, joseph's not around so someone's gotta do scrum of scrums [16:17:22] (oops, sorry) [16:17:35] jynus: by "web" do you mean to normal api (e.g.
https://de.wikipedia.org/w/api.php)? [16:18:30] seth_unr: join #wikimedia-cloud will give details on-topic there [16:18:54] thanks [16:19:31] (PS3) Vgutierrez: install_server: Reimage lvs1003 as stretch spare system [puppet] - https://gerrit.wikimedia.org/r/432116 (https://phabricator.wikimedia.org/T184293) [16:20:15] (PS2) Bstorm: wiki replicas: remove the SQL reference file for indexes since it is obsolete [puppet] - https://gerrit.wikimedia.org/r/431825 [16:20:37] (PS1) Ottomata: Re-enable main-eqiad -> main-codfw MirrorMaker [puppet] - https://gerrit.wikimedia.org/r/432118 (https://phabricator.wikimedia.org/T167039) [16:20:49] (CR) Vgutierrez: install_server: Reimage lvs1003 as stretch spare system (1 comment) [puppet] - https://gerrit.wikimedia.org/r/432116 (https://phabricator.wikimedia.org/T184293) (owner: Vgutierrez) [16:21:09] (CR) Ottomata: [C: 2] Re-enable main-eqiad -> main-codfw MirrorMaker [puppet] - https://gerrit.wikimedia.org/r/432118 (https://phabricator.wikimedia.org/T167039) (owner: Ottomata) [16:25:36] (CR) Krinkle: prometheus: varnish_thumbnails aggregation rule (1 comment) [puppet] - https://gerrit.wikimedia.org/r/431528 (https://phabricator.wikimedia.org/T184942) (owner: Ema) [16:26:38] (CR) jenkins-bot: Deploy CongressLookup on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/432029 (https://phabricator.wikimedia.org/T194230) (owner: MaxSem) [16:39:44] !log restarting blazegraph and updater on wdqs1003 [16:39:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:49:51] (PS1) Ottomata: Enable 1.1.0 MirrorMaker main-eqiad -> jumbo-eqiad [puppet] - https://gerrit.wikimedia.org/r/432120 (https://phabricator.wikimedia.org/T189464) [16:53:40] (CR) Ottomata: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/11171/" [puppet] - https://gerrit.wikimedia.org/r/432120 (https://phabricator.wikimedia.org/T189464) (owner: Ottomata)
[16:56:41] !log disabled 0.9 MirrorMaker on kafka102[023], enabled 1.x MirrorMaker on kafka-jumbo* [16:56:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:40] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@1 on kafka1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@1/producer\.properties [16:57:40] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@0 on kafka1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@0/producer\.properties [16:57:40] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@2 on kafka1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@2/producer\.properties [16:57:49] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@3 on kafka1023 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@3/producer\.properties [16:57:50] oh shush! 
icinga puppet needs updating [16:57:59] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@1 on kafka1023 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@1/producer\.properties [16:57:59] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@2 on kafka1023 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@2/producer\.properties [16:58:09] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@0 on kafka1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@0/producer\.properties [16:58:19] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@0 on kafka1023 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@0/producer\.properties [16:58:46] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@3 on kafka1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@3/producer\.properties [16:58:47] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@2 on kafka1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@2/producer\.properties [16:59:38] MaxSem: scap still running? [17:00:05] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Morning SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1700). [17:00:05] Amir1 and Smalyshev: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. 
Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:00:20] here [17:00:33] kaldari: yep :O [17:00:52] here but the first patch might not go out soon [17:00:59] hmm .. what happened to my swat entry? [17:01:01] ignore my first patch for now [17:02:00] https://wikitech.wikimedia.org/w/index.php?title=Deployments&oldid=1790654 got lost in later edits. let me re-add it. [17:02:44] MaxSem: crap, guess we should have started earlier :P [17:03:41] i re-added my (deleted) swat entry for this window. [17:05:56] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@3 on kafka1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@3/producer\.properties [17:08:56] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad@1 on kafka1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_jumbo-eqiad@1/producer\.properties [17:11:28] (PS1) Ottomata: Remove MirrorMaker configs from analytics_b hosts [puppet] - https://gerrit.wikimedia.org/r/432124 (https://phabricator.wikimedia.org/T189464) [17:12:08] (CR) Ottomata: [C: 2] Remove MirrorMaker configs from analytics_b hosts [puppet] - https://gerrit.wikimedia.org/r/432124 (https://phabricator.wikimedia.org/T189464) (owner: Ottomata) [17:12:56] Operations, Product-Analytics: Requesting access to stat1006 for Go Fish Digital - https://phabricator.wikimedia.org/T194287#4194550 (Framawiki) [17:13:03] okay, don't deploy my first patch [17:13:28] (PS1) Ottomata: Remove profile::kafka::mirror from role analytics b [puppet] - https://gerrit.wikimedia.org/r/432125 (https://phabricator.wikimedia.org/T189464) [17:13:45] (CR) Ottomata: [V: 2 C: 2] Remove profile::kafka::mirror from role analytics b [puppet] - https://gerrit.wikimedia.org/r/432125
(https://phabricator.wikimedia.org/T189464) (owner: Ottomata) [17:17:20] anyone for SWAT? [17:17:26] MaxSem: When did scap go back to taking over an hour? [17:17:41] when we ditched php5 [17:18:02] wha? [17:18:08] :( [17:19:26] MaxSem: once we run scap for testwiki, will we have to run it again for meta, or is it fine since the i18n cache is shared? [17:19:48] no, meta is simple sync [17:20:13] (PS1) Ottomata: Use mirror_name label for produce rate alert [puppet] - https://gerrit.wikimedia.org/r/432127 (https://phabricator.wikimedia.org/T189464) [17:20:24] are swats waiting on the above-mentioned scap to complete? [17:21:03] (CR) Ottomata: [C: 2] Use mirror_name label for produce rate alert [puppet] - https://gerrit.wikimedia.org/r/432127 (https://phabricator.wikimedia.org/T189464) (owner: Ottomata) [17:22:17] (CR) MarcoAurelio: [C: -1] "https://phabricator.wikimedia.org/T194230#4193544" [mediawiki-config] - https://gerrit.wikimedia.org/r/432030 (https://phabricator.wikimedia.org/T194230) (owner: MaxSem) [17:23:20] kaldari: it's due to HHVM, not scap :) [17:23:37] tl;dr: blame FB :) [17:23:44] did we try php7? [17:23:59] stupid facebook! [17:26:49] MaxSem: I believe we'll do that when we switch to deploy1001 (from tin) [17:27:01] * greg-g isn't 100% sure, honestly [17:30:50] discussion about rebuildLocalisationCache w/php7/hhvm/etc is happening on https://phabricator.wikimedia.org/T191921 and adjacent tasks [17:32:03] are swats happening in this window then? given the ongoing scap .. [17:38:03] subbu: can't over-ride the current scap [17:38:10] it's still going? [17:38:25] MaxSem, ^? [17:38:30] wooh! [17:38:31] 17:38:08 Started sync-pull-masters [17:40:48] Operations, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4121650 (Paladox) couldn't a new flag be added that switches between php7.0 and hhvm?
[17:41:28] (PS1) Arturo Borrero Gonzalez: [WIP] openstack: neutron: nova.conf: enable options [puppet] - https://gerrit.wikimedia.org/r/432130 (https://phabricator.wikimedia.org/T193657) [17:43:16] MaxSem: close then! [17:43:21] so close! [17:43:47] kaldari: want to comment on https://gerrit.wikimedia.org/r/#/c/432030/ ? [17:44:04] ug [17:44:08] sure [17:44:16] already 44 minutes passed [17:44:49] FYI, I'm running the scap "fetch [17:45:03] "fetch" stage on ORES boxes, but there will be no impact. [17:46:05] PROBLEM - MegaRAID on analytics1032 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [17:48:23] Amir1: I can swat but I think MaxSem hasn't finished. [17:48:35] PROBLEM - High CPU load on API appserver on mw1339 is CRITICAL: CRITICAL - load average: 77.47, 32.10, 20.17 [17:48:40] which means you can't swat:) [17:48:43] Niharika: yeah, ignore the first patch [17:49:24] MaxSem: Which means I can swat after you're done. Let me know. [17:49:45] RECOVERY - High CPU load on API appserver on mw1339 is OK: OK - load average: 33.83, 28.29, 19.71 [17:49:55] Amir1: Take it off the calendar if you can. [17:50:13] sure thing [17:50:25] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1960 bytes in 0.087 second response time [17:51:11] (PS2) MaxSem: Deploy CongressLookup on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/432030 (https://phabricator.wikimedia.org/T194230) [17:54:50] ok so I assume SWAT is not really happening? should we reschedule the patches? [17:54:56] greg-g: Can I propose moving swat back to 11am PST? And moving the train and services windows ahead by one hour to create the "pre train sanity break"?
It conflicts with people's meetings which means fewer people turn up to swat, if anyone. [17:55:36] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1961 bytes in 0.103 second response time [17:56:04] SMalyshev: I imagine we can swat since the next window is free and this one was not available. [17:56:48] well, I can wait till next one, not a big deal... just wanted to get it today, but I can wait for 4pm window [17:57:37] uh, and we're losing several minutes waiting for several slow hosts (dumps?) [17:57:55] what's going on with them that they're SO slow? [18:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1800) [18:00:43] (PS3) Bstorm: wiki replicas: remove the SQL reference file for indexes since it is obsolete [puppet] - https://gerrit.wikimedia.org/r/431825 [18:00:58] !log maxsem@tin Finished scap: Deploy CongressLookup on testwiki T194230 (duration: 105m 19s) [18:01:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:01:03] T194230: Deploy new CongressLookup to Beta Labs (for testing) and Meta Wiki (once it's tested) - https://phabricator.wikimedia.org/T194230 [18:01:21] (CR) MaxSem: [C: 2] "Sorry, we have no time for this. The senate votes today and I have ED orders to make this campaign happen."
[mediawiki-config] - https://gerrit.wikimedia.org/r/432030 (https://phabricator.wikimedia.org/T194230) (owner: MaxSem) [18:01:54] (CR) Bstorm: [C: 2] wiki replicas: remove the SQL reference file for indexes since it is obsolete [puppet] - https://gerrit.wikimedia.org/r/431825 (owner: Bstorm) [18:02:35] (Merged) jenkins-bot: Deploy CongressLookup on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/432030 (https://phabricator.wikimedia.org/T194230) (owner: MaxSem) [18:02:52] (CR) jenkins-bot: Deploy CongressLookup on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/432030 (https://phabricator.wikimedia.org/T194230) (owner: MaxSem) [18:03:00] (CR) Ottomata: [C: 2] Make sure only maxmind files are archived [puppet] - https://gerrit.wikimedia.org/r/432078 (owner: Fdans) [18:03:04] (PS3) Ottomata: Make sure only maxmind files are archived [puppet] - https://gerrit.wikimedia.org/r/432078 (owner: Fdans) [18:03:06] (CR) Ottomata: [V: 2 C: 2] Make sure only maxmind files are archived [puppet] - https://gerrit.wikimedia.org/r/432078 (owner: Fdans) [18:03:55] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: Traceback (most recent call last) [18:04:23] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/432030/ (duration: 01m 21s) [18:04:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:46] whee https://meta.wikimedia.org/wiki/Special:NetNeutrality [18:04:50] kaldari: ^ [18:04:59] yep, already testing [18:05:03] looks good to me [18:05:22] for Satan's sake, almost 2 hours 8-& [18:07:05] PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: Traceback (most recent call last) [18:07:51] MaxSem: I don't see any problems. Anything in the error logs? [18:08:33] kaldari: https://meta.wikimedia.org/wiki/Special:SenateLookup [18:08:47] Is it supposed to give that error?
[18:09:05] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 7 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map [18:09:12] (PS1) ArielGlenn: ad ability to keep some dumps files and remove others, during cleanup [puppet] - https://gerrit.wikimedia.org/r/432135 (https://phabricator.wikimedia.org/T194124) [18:09:41] Niharika: not on meta yet [18:09:53] Ah, okay. [18:10:01] ooh https://meta.wikimedia.org/wiki/Special:SenateLookup?state=CA [18:10:13] kaldari: nah, all A-OK [18:10:53] (PS3) Ottomata: eventlogging service logstash with gelf [puppet] - https://gerrit.wikimedia.org/r/430808 (https://phabricator.wikimedia.org/T193230) [18:10:56] SenateLookup only displays stuff when it gets state posted [18:11:25] RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 2 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/11645085/#!map [18:11:39] (PS1) DCausse: Add extra-analysis analyzers as separate plugins [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/432136 (https://phabricator.wikimedia.org/T193734) [18:12:48] (PS2) DCausse: Add extra-analysis analyzers as separate plugins [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/432136 (https://phabricator.wikimedia.org/T193734) [18:14:00] Maybe we could redirect users back to Special:NetNeutrality if there isn't a state rather than show them an error. [18:14:45] are there any chances for the Senate to overturn the FCC ruling?
[18:14:53] haha, no [18:15:54] (PS4) Ottomata: eventlogging service logstash with gelf [puppet] - https://gerrit.wikimedia.org/r/430808 (https://phabricator.wikimedia.org/T193230) [18:16:05] (the swat is done, for those who are wondering) [18:17:57] (PS5) Ottomata: eventlogging service logstash with gelf [puppet] - https://gerrit.wikimedia.org/r/430808 (https://phabricator.wikimedia.org/T193230) [18:18:21] ummmm https://meta.wikimedia.org/wiki/Special:NetNeutrality outputs unbalanced HTML [18:18:33] you can tell because the footer is all messed up, compare it to normal pages [18:19:05] !log installing jenkins security updates on releases* [18:19:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:02] may be caused by the footer message, MatmaRex ? [18:20:30] (header and footer are local only) [18:20:39] https://meta.wikimedia.org/w/index.php?title=MediaWiki:Net-neutrality-footer&action=edit [18:20:41] Check https://meta.wikimedia.org/wiki/MediaWiki:Net-neutrality-header and https://meta.wikimedia.org/wiki/MediaWiki:Net-neutrality-footer [18:21:14] (CR) Tjones: [C: 1] Add extra-analysis analyzers as separate plugins [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/432136 (https://phabricator.wikimedia.org/T193734) (owner: DCausse) [18:21:16] MaxSem: i don't know why, i'm just noticing that it's broken [18:22:20] kaldari: ^ [18:22:30] hopefully fixed now [18:22:43] !log imarlier@tin Started deploy [performance/coal@8e57e4a]: Deploy only to webperf [18:22:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:49] !log imarlier@tin Finished deploy [performance/coal@8e57e4a]: Deploy only to webperf (duration: 00m 06s) [18:22:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:58] MatmaRex: Fixed [18:23:18] (PS2) ArielGlenn: add ability to keep some dumps files and remove others, during cleanup [puppet] -
10https://gerrit.wikimedia.org/r/432135 (https://phabricator.wikimedia.org/T194124) [18:24:18] thanks kaldari [18:27:23] !log sbisson@tin Started deploy [kartotherian/deploy@ef61ad7]: Make kartotherian serve up to z15 [18:27:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:29:43] !log sbisson@tin Finished deploy [kartotherian/deploy@ef61ad7]: Make kartotherian serve up to z15 (duration: 02m 20s) [18:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:38] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4194795 (10Imarlier) [18:31:00] !log sbisson@tin Started deploy [tilerator/deploy@a86f8f8]: Make tilerator store up to zoom 15 [18:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:31] (03PS6) 10Ottomata: eventlogging service logstash with gelf [puppet] - 10https://gerrit.wikimedia.org/r/430808 (https://phabricator.wikimedia.org/T193230) [18:32:43] Niharika, swats moved to a later window in the day? [18:33:21] subbu: Gah, I suppose so. :( [18:33:29] ok. [18:33:42] should i move my swat entries to that window or do they get automatically carried over? :) [18:33:55] subbu: You have to move them. 
[18:34:33] k [18:37:28] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2214.codfw.wmnet [18:37:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:35] !log sbisson@tin Finished deploy [tilerator/deploy@a86f8f8]: Make tilerator store up to zoom 15 (duration: 06m 36s) [18:37:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:51] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2213.codfw.wmnet [18:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:06] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2212.codfw.wmnet [18:41:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:53] (03PS1) 10Dzahn: Revert "disable icinga notifications on mw22* hosts" [puppet] - 10https://gerrit.wikimedia.org/r/432139 [19:00:04] twentyafterfour: Dear deployers, time to do the MediaWiki train deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1900). [19:02:05] 10Operations, 10Scap, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4194970 (10Legoktm) >>! In T191921#4161858, @Krinkle wrote: > I've put a straw-man up at T176370#4161855. Seems reasonable, but... 
[19:04:16] (03CR) 10Bstorm: WIP: wiki replicas - prepare for refactored actor storage (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/431823 (https://phabricator.wikimedia.org/T188299) (owner: 10Bstorm) [19:06:01] (03CR) 10Dzahn: [C: 032] "done with mw22* (except mw2202)" [puppet] - 10https://gerrit.wikimedia.org/r/432139 (owner: 10Dzahn) [19:06:08] (03PS2) 10Dzahn: Revert "disable icinga notifications on mw22* hosts" [puppet] - 10https://gerrit.wikimedia.org/r/432139 [19:10:25] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4194981 (10ayounsi) [19:10:54] subbu: are y'all using your deployment window today? [19:16:02] (03PS1) 10Dzahn: disable icinga notifications on mw21[3-4]* hosts [puppet] - 10https://gerrit.wikimedia.org/r/432153 [19:17:35] (03CR) 10Dzahn: [C: 032] "this is the last batch anyways ;)" [puppet] - 10https://gerrit.wikimedia.org/r/432153 (owner: 10Dzahn) [19:19:10] (03CR) 10Dzahn: [C: 032] "@imarlier: do we need to do any manual cleanup and delete files?" 
[puppet] - 10https://gerrit.wikimedia.org/r/431792 (https://phabricator.wikimedia.org/T159354) (owner: 10Imarlier) [19:19:15] (03PS7) 10Ottomata: eventlogging service logstash with gelf [puppet] - 10https://gerrit.wikimedia.org/r/430808 (https://phabricator.wikimedia.org/T193230) [19:20:13] (03CR) 10Milimetric: [C: 04-1] "This can't be merged until this is fixed: https://gerrit.wikimedia.org/r/#/c/432154/" [puppet] - 10https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T190409) (owner: 10Milimetric) [19:23:07] !log Disable coal service on graphite1001/graphite2001 (T194283) [19:23:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:13] T194283: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283 [19:23:14] !log Stop and disable coal-web (uwsgi-coal) service on graphite1001/graphite2001 (T194283) [19:23:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:32] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4194406 (10Krinkle) I've performed this subset of commands on graphite1001 and graphite2001, as I noticed coal-web running there. ``` sudo systemctl stop uwsgi-coal sudo systemctl stop coal... [19:25:43] cscott: are y'all using your deployment window today? [19:26:05] !log mw2145, mw2146, mw2147 - reinstall with stretch, depooled, downtimed [19:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:54] (03CR) 10Anomie: WIP: wiki replicas - prepare for refactored actor storage (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/431823 (https://phabricator.wikimedia.org/T188299) (owner: 10Bstorm) [19:28:10] (03CR) 10Imarlier: "> @imarlier: do we need to do any manual cleanup and delete files?" 
[puppet] - 10https://gerrit.wikimedia.org/r/431792 (https://phabricator.wikimedia.org/T159354) (owner: 10Imarlier) [19:29:41] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4195058 (10Dzahn) a:03Dzahn [19:29:56] (03CR) 10Dzahn: [C: 032] "wow, perfect! taking that ticket" [puppet] - 10https://gerrit.wikimedia.org/r/431792 (https://phabricator.wikimedia.org/T159354) (owner: 10Imarlier) [19:34:29] (03CR) 10Ottomata: [C: 032] eventlogging service logstash with gelf [puppet] - 10https://gerrit.wikimedia.org/r/430808 (https://phabricator.wikimedia.org/T193230) (owner: 10Ottomata) [19:34:34] (03PS8) 10Ottomata: eventlogging service logstash with gelf [puppet] - 10https://gerrit.wikimedia.org/r/430808 (https://phabricator.wikimedia.org/T193230) [19:35:08] !log otto@tin Started deploy [eventlogging/eventbus@c70e8c5]: logstash - T193230 [19:35:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:13] T193230: EventBus HTTP Proxy service does not report errors to logstash - https://phabricator.wikimedia.org/T193230 [19:36:52] !log graphite1001, graphite2001 - deleting uwsgi-coal and coal sytemd unit files; systemctl daemon-reload (T194283) [19:36:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:56] T194283: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283 [19:38:41] !log otto@tin Finished deploy [eventlogging/eventbus@c70e8c5]: logstash - T193230 (duration: 03m 33s) [19:38:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:58] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4195086 (10Dzahn) [19:41:50] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4194406 (10Dzahn) [19:43:01] (03PS1) 10Ottomata: 
Use gelf port for eventbus logstash [puppet] - 10https://gerrit.wikimedia.org/r/432157 (https://phabricator.wikimedia.org/T193230) [19:44:46] (03CR) 10Ottomata: [C: 032] Use gelf port for eventbus logstash [puppet] - 10https://gerrit.wikimedia.org/r/432157 (https://phabricator.wikimedia.org/T193230) (owner: 10Ottomata) [19:44:50] (03PS2) 10Ottomata: Use gelf port for eventbus logstash [puppet] - 10https://gerrit.wikimedia.org/r/432157 (https://phabricator.wikimedia.org/T193230) [19:44:54] (03CR) 10Ottomata: [V: 032 C: 032] Use gelf port for eventbus logstash [puppet] - 10https://gerrit.wikimedia.org/r/432157 (https://phabricator.wikimedia.org/T193230) (owner: 10Ottomata) [19:45:21] (03CR) 10Raimond Spekking: [C: 031] cawiki: remove gendered namespace aliases, already on MW core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429989 (https://phabricator.wikimedia.org/T113616) (owner: 10MarcoAurelio) [19:47:34] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4195097 (10Dzahn) [19:48:42] !log otto@tin Started deploy [eventlogging/eventbus@aa9eb2c]: logstash T193230 [19:48:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:47] T193230: EventBus HTTP Proxy service does not report errors to logstash - https://phabricator.wikimedia.org/T193230 [19:48:58] !log otto@tin Finished deploy [eventlogging/eventbus@aa9eb2c]: logstash T193230 (duration: 00m 15s) [19:49:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:18] !log graphite1001/2001 - rm check_uwsgi-coal NRPE check config, reloading nagios-nrpe-server (T194283) [19:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:22] T194283: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283 [19:49:48] !log otto@tin Started deploy [eventlogging/eventbus@aa9eb2c]: logstash T193230 [19:49:53] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:58] (03PS14) 10Volans: First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) [19:51:00] (03PS13) 10Volans: Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) [19:51:02] (03PS16) 10Volans: Add basic test coverage [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394621 (https://phabricator.wikimedia.org/T167504) [19:51:04] (03PS12) 10Volans: Add login and LDAP support [software/debmonitor] - 10https://gerrit.wikimedia.org/r/425417 (https://phabricator.wikimedia.org/T167504) [19:51:05] !log otto@tin Finished deploy [eventlogging/eventbus@aa9eb2c]: logstash T193230 (duration: 01m 17s) [19:51:06] (03PS7) 10Volans: Add server side validation of client certificates [software/debmonitor] - 10https://gerrit.wikimedia.org/r/428302 (https://phabricator.wikimedia.org/T167504) [19:51:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:51:38] (03PS3) 10ArielGlenn: add ability to keep some dumps files and remove others, during cleanup [puppet] - 10https://gerrit.wikimedia.org/r/432135 (https://phabricator.wikimedia.org/T194124) [19:51:40] (03CR) 10jerkins-bot: [V: 04-1] First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [19:51:46] (03CR) 10jerkins-bot: [V: 04-1] Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [19:53:39] !log rolling restart eventbus service to deploy logstash config [19:53:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:15] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - 
https://phabricator.wikimedia.org/T194283#4195105 (10Dzahn) [19:55:18] (03PS1) 1020after4: group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432160 [19:55:20] (03CR) 1020after4: [C: 032] group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432160 (owner: 1020after4) [19:56:15] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4194406 (10Dzahn) [19:56:36] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432160 (owner: 1020after4) [19:59:25] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4195125 (10Dzahn) [19:59:57] 10Operations: Clean up graphite1001.eqiad.wmnet, now that coal has been moved - https://phabricator.wikimedia.org/T194283#4194406 (10Dzahn) 05Open>03Resolved all done, see checkboxes in the description and notes, i appreciate the detailed ticket with command lines :) [20:00:00] 10Operations, 10Performance-Team, 10Patch-For-Review: Move coal from graphite#001 nodes to webperf#001 - https://phabricator.wikimedia.org/T159354#4195133 (10Dzahn) [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T2000). [20:00:26] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.3 [20:00:34] marlier: ^ all done on graphite1001/2001! 
and thanks for that detailed ticket with commands, that made it easy [20:00:34] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432160 (owner: 1020after4) [20:00:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:44] !log group1 to 1.32.0-wmf.3 refs T191049 [20:00:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:54] T191049: 1.32.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T191049 [20:00:56] mutante: Nice, thank you! [20:01:08] 10Operations, 10Performance-Team, 10monitoring, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#4195140 (10Imarlier) [20:01:11] 10Operations, 10Performance-Team, 10Patch-For-Review: Move coal from graphite#001 nodes to webperf#001 - https://phabricator.wikimedia.org/T159354#4195139 (10Imarlier) 05Open>03Resolved [20:01:47] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.32.0-wmf.3 (duration: 01m 20s) [20:01:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:44] twentyafterfour: lemme know when the train deploy is over [20:02:50] nothing for mobileapps today [20:03:14] kaldari: it's over assuming I don't see any reason to roll back [20:03:25] so far I don't see anything [20:04:33] MaxSem: no one's using the parsoid window. Could we push out https://gerrit.wikimedia.org/r/#/c/432146/? [20:04:51] greg-g: ^ [20:04:52] i'm deploying parsoid now [20:05:20] arlolra: oops :) [20:05:26] MaxSem: nevermind [20:05:39] https://www.mediawiki.org/wiki/Parsoid/Deployments#Wednesday,_May_9,_2018_around_1:15_pm_PT:_5ce2608_to_be_deployed [20:06:12] shouldn't take long though [20:06:14] kaldari: that should be ok, I'll speak for greg on this one if he isn't around? 
[20:06:17] kaldari: go ahead [20:06:24] * greg-g is in our quarterly check-in [20:06:26] oh there he is [20:06:28] arlolra first :) [20:06:38] we can wait [20:08:33] !log arlolra@tin Started deploy [parsoid/deploy@181e3b1]: Updating Parsoid to 5ce2608 [20:08:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:16] I'm preparing the cherrypicks meanwhile [20:16:14] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Different production servers have different versions of tidy installed, resulting in varying output - https://phabricator.wikimedia.org/T193414#4195174 (10Legoktm) [20:17:25] !log arlolra@tin Finished deploy [parsoid/deploy@181e3b1]: Updating Parsoid to 5ce2608 (duration: 08m 52s) [20:17:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:04] (03PS1) 10Herron: mailman: reject HTTP subscription requests from IPs listed on spam blocklists [puppet] - 10https://gerrit.wikimedia.org/r/432168 (https://phabricator.wikimedia.org/T194032) [20:18:34] (03CR) 10jerkins-bot: [V: 04-1] mailman: reject HTTP subscription requests from IPs listed on spam blocklists [puppet] - 10https://gerrit.wikimedia.org/r/432168 (https://phabricator.wikimedia.org/T194032) (owner: 10Herron) [20:19:05] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Servers using tidy-html5 are rendering pages differently, especially with - https://phabricator.wikimedia.org/T193414#4195192 (10Legoktm) [20:19:10] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Servers using tidy-html5 are rendering pages differently, especially with - https://phabricator.wikimedia.org/T193414#4168635 (10Legoktm) [20:19:22] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Servers using tidy-html5 are rendering pages differently, especially with - 
https://phabricator.wikimedia.org/T193414#4168635 (10Legoktm) 05duplicate>03declined Sorry, I duped the wrong way. [20:20:17] (03PS2) 10Herron: mailman: reject HTTP subscription requests from IPs listed on spam blocklists [puppet] - 10https://gerrit.wikimedia.org/r/432168 (https://phabricator.wikimedia.org/T194032) [20:21:39] !log Updated Parsoid to 5ce2608 (T194081, T188118) [20:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:45] T194081: Cannot read property 'replace' of null - https://phabricator.wikimedia.org/T194081 [20:21:45] T188118: Suppress spurious autoInsertedEnd flags from native wikitext tags - https://phabricator.wikimedia.org/T188118 [20:21:54] kaldari: all yours [20:22:01] thank you! [20:22:14] MaxSem: You still around? [20:22:24] yup [20:22:27] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Servers using tidy-html5 are rendering pages differently, especially with - https://phabricator.wikimedia.org/T193414#4195214 (10zhuyifei1999) So what makes this declined? The merged task clearly displays some... [20:22:42] doing [20:22:48] yay! [20:23:01] (03PS3) 10Herron: mailman: reject HTTP subscription requests from IPs listed on spam blocklists [puppet] - 10https://gerrit.wikimedia.org/r/432168 (https://phabricator.wikimedia.org/T194032) [20:28:41] !log maxsem@tin Synchronized php-1.32.0-wmf.3/extensions/CongressLookup/: https://gerrit.wikimedia.org/r/#/c/432146/ (duration: 01m 19s) [20:28:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:08] kaldari: do we have a repo for the wikimedia equivset? [20:29:16] antispoof etc [20:29:18] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Servers using tidy-html5 are rendering pages differently, especially with - https://phabricator.wikimedia.org/T193414#4195257 (10Legoktm) All servers are running stretch, so if there are still inconsistencies I... 
[20:29:32] I believe so... [20:29:34] (03PS4) 10Herron: mailman: reject HTTP subscription requests from IPs listed on spam blocklists [puppet] - 10https://gerrit.wikimedia.org/r/432168 (https://phabricator.wikimedia.org/T194032) [20:30:04] Hauskatze: https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/libs/Equivset [20:30:12] !log maxsem@tin Synchronized php-1.32.0-wmf.2/extensions/CongressLookup/: https://gerrit.wikimedia.org/r/#/c/432146/ (duration: 01m 20s) [20:30:16] thanks :) [20:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:30:49] kaldari: done [20:31:19] MaxSem: Looks great! Thanks! [20:31:26] wee [20:33:15] (03PS1) 10Chad: Initial stable-2.15 fork for wikimedia [software/gerrit/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/432180 [20:34:48] (03CR) 10Paladox: [C: 031] "Yay!" [software/gerrit/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/432180 (owner: 10Chad) [20:52:03] jouncebot: now [20:52:03] For the next 0 hour(s) and 7 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T2000) [20:52:03] For the next 0 hour(s) and 7 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T1900) [20:52:05] jouncebot: next [20:52:05] In 2 hour(s) and 7 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T2300) [21:18:45] !log reedy@tin Synchronized php-1.32.0-wmf.2/includes/DefaultSettings.php: Add default edit rate limit of 90 edits/minute for all users (duration: 01m 20s) [21:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:20:39] !log reedy@tin Synchronized php-1.32.0-wmf.3/includes/DefaultSettings.php: Add default edit rate limit of 90 edits/minute for all users (duration: 01m 20s) [21:20:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[21:36:50] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds [21:37:11] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds [21:37:11] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds [21:37:31] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds [21:37:46] (03PS4) 10MarcoAurelio: cawiki: remove gendered namespace aliases, already on MW core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429989 (https://phabricator.wikimedia.org/T113616) [21:37:51] PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds [21:38:00] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds [21:38:21] Reedy: rate limit? :o [21:39:11] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds [21:39:28] (03PS3) 10MarcoAurelio: mediawiki/apache: seperate line for each chapter ServerAlias [puppet] - 10https://gerrit.wikimedia.org/r/429863 (owner: 10Dzahn) [21:40:20] mutante: I think we can deploy ^^ [21:40:35] cha-d voted +1 so it's probably right? [21:43:13] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Servers using tidy-html5 are rendering pages differently, especially with - https://phabricator.wikimedia.org/T193414#4195475 (10ssastry) 05declined>03Open This has been fixed upstream https://github.com/ht... [21:47:07] 10Operations, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Parsing-Team, and 2 others: Servers using tidy-html5 are rendering pages differently, especially with - https://phabricator.wikimedia.org/T193414#4195487 (10ssastry) @zhuyifei1999 Tidy is being removed completely in 7 weeks time (see {T17... 
[21:47:45] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4195489 (10ayounsi) [21:47:55] (03PS5) 10Addshore: BETA ONLY - WikibaseLexeme config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) [21:50:09] (03PS4) 10Addshore: BETA ONLY - Enable WikibaseLexeme on BETA wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431563 (https://phabricator.wikimedia.org/T191459) [21:50:17] _joe_: when you have time, mind looking into https://phabricator.wikimedia.org/T190893#4193076 since the linked patch was by you? [21:50:20] jouncebot now [21:50:20] No deployments scheduled for the next 1 hour(s) and 9 minute(s) [21:50:23] jouncebot next [21:50:24] In 1 hour(s) and 9 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T2300) [21:50:37] * addshore is going to throw some config changes in for beta now [21:52:33] (03PS6) 10Addshore: BETA ONLY - WikibaseLexeme config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) [21:52:37] (03PS5) 10Addshore: BETA ONLY - Enable WikibaseLexeme on BETA wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431563 (https://phabricator.wikimedia.org/T191459) [21:53:00] (03PS2) 10Bstorm: WIP: wiki replicas - prepare for refactored actor storage [puppet] - 10https://gerrit.wikimedia.org/r/431823 (https://phabricator.wikimedia.org/T188299) [21:54:18] (03CR) 10Addshore: [C: 032] BETA ONLY - WikibaseLexeme config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) (owner: 10Addshore) [21:55:39] (03Merged) 10jenkins-bot: BETA ONLY - WikibaseLexeme config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) (owner: 10Addshore) [21:56:06] * addshore waits 
for scap to run on beta [21:56:10] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient [21:56:10] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [21:56:30] RECOVERY - DPKG on stat1005 is OK: All packages OK [21:56:41] RECOVERY - Disk space on stat1005 is OK: DISK OK [21:56:50] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational [21:57:00] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [21:57:22] well... the previous in progress scap... lame [21:57:52] 10Operations, 10Design-Research: Edit optoutresearch@ mailing list recipients - https://phabricator.wikimedia.org/T100860#4195541 (10aripstra) Hi! Sorry bout delayed response. Just saw it yesterday. Thank you for bringing it back from the dead! Dchen and aripstra (me) are still here, so we can remain on the... [22:00:11] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [22:00:29] (03CR) 10jenkins-bot: BETA ONLY - WikibaseLexeme config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431306 (https://phabricator.wikimedia.org/T184745) (owner: 10Addshore) [22:03:04] !log awight@tin Started deploy [ores/deploy@c0db102]: ORES: force git-lfs install [22:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:05:39] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4195555 (10ayounsi) Thanks for unblocking that. Let's aim to do the move on Thursday May 24th, morning east coast time. Those server types will suffer a few... 
[22:05:53] !log awight@tin Finished deploy [ores/deploy@c0db102]: ORES: force git-lfs install (duration: 02m 50s) [22:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:06:04] !log addshore@tin Synchronized wmf-config/InitialiseSettings-labs.php: T184745 BETA ONLY [[gerrit:431306|WikibaseLexeme config]] (duration: 01m 19s) [22:06:06] !log awight@tin Started deploy [ores/deploy@c0db102]: ORES: force git-lfs install (take 2) [22:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:06:09] T184745: Prepare config for WikibaseLexeme on beta wikidata - https://phabricator.wikimedia.org/T184745 [22:06:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:08:02] !log addshore@tin Synchronized wmf-config/Wikibase-labs.php: T184745 BETA ONLY [[gerrit:431306|WikibaseLexeme config]] (duration: 01m 20s) [22:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:09:21] !log awight@tin Finished deploy [ores/deploy@c0db102]: ORES: force git-lfs install (take 2) (duration: 03m 15s) [22:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:09:40] (03PS6) 10Addshore: BETA ONLY - Enable WikibaseLexeme on BETA wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431563 (https://phabricator.wikimedia.org/T191459) [22:11:16] (03CR) 10Addshore: [C: 032] BETA ONLY - Enable WikibaseLexeme on BETA wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431563 (https://phabricator.wikimedia.org/T191459) (owner: 10Addshore) [22:12:28] (03Merged) 10jenkins-bot: BETA ONLY - Enable WikibaseLexeme on BETA wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431563 (https://phabricator.wikimedia.org/T191459) (owner: 10Addshore) [22:13:04] Hauskatze: ok, i will do that soon [22:14:43] (03CR) 10jenkins-bot: BETA ONLY - Enable WikibaseLexeme on BETA wikidatawiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/431563 (https://phabricator.wikimedia.org/T191459) (owner: 10Addshore) [22:17:48] !log awight@tin Started deploy [ores/deploy@2a09939]: ORES: force git-lfs install (take 3) [22:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:18:06] !log addshore@tin Synchronized wmf-config/InitialiseSettings-labs.php: T184745 T191459 BETA ONLY [[gerrit:431563|Enable WikibaseLexeme on BETA wikidatawiki]] (duration: 01m 19s) [22:18:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:18:11] T191459: Deploy WikibaseLexeme to the Beta Cluster - https://phabricator.wikimedia.org/T191459 [22:18:11] T184745: Prepare config for WikibaseLexeme on beta wikidata - https://phabricator.wikimedia.org/T184745 [22:18:49] That should be my silly syncing done [22:20:53] !log awight@tin Finished deploy [ores/deploy@2a09939]: ORES: force git-lfs install (take 3) (duration: 03m 05s) [22:20:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:23:04] (03PS1) 10Reedy: Add default edit rate limit of 90 edits/minute for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432304 [22:23:05] * Reedy stabs portals [22:23:17] (03PS2) 10Reedy: Add default edit rate limit of 90 edits/minute for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432304 [22:23:26] addshore: Have you deployed yours? 
[22:23:35] yup [22:23:40] all yours [22:24:55] (03CR) 10Reedy: [C: 032] Add default edit rate limit of 90 edits/minute for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432304 (owner: 10Reedy) [22:26:12] (03Merged) 10jenkins-bot: Add default edit rate limit of 90 edits/minute for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432304 (owner: 10Reedy) [22:27:54] (03PS1) 10Addshore: Add WikibaseLexeme to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432306 (https://phabricator.wikimedia.org/T184745) [22:28:14] (03CR) 10Addshore: [C: 04-2] "Should be added to the make-wmf-branch script first https://gerrit.wikimedia.org/r/#/c/431305/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432306 (https://phabricator.wikimedia.org/T184745) (owner: 10Addshore) [22:28:25] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: Add default edit rate limit of 90 edits/minute for all users (except wikidata) (duration: 01m 19s) [22:28:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:28:51] Reedy: I'll have 1 more in a sec, let me know when you're done! [22:28:56] I'm done [22:29:00] sweet! 
:)_ [22:30:08] (03PS2) 10Addshore: Add WikibaseLexeme to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432306 (https://phabricator.wikimedia.org/T184745) [22:30:12] (03CR) 10Addshore: [C: 032] Add WikibaseLexeme to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432306 (https://phabricator.wikimedia.org/T184745) (owner: 10Addshore) [22:31:02] (03CR) 10jenkins-bot: Add default edit rate limit of 90 edits/minute for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432304 (owner: 10Reedy) [22:31:29] (03Merged) 10jenkins-bot: Add WikibaseLexeme to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432306 (https://phabricator.wikimedia.org/T184745) (owner: 10Addshore) [22:32:00] !log awight@tin Started deploy [ores/deploy@1b13ef1]: ORES: drafttopic [22:32:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:33:34] !log addshore@tin Synchronized wmf-config/extension-list: extension-list [[gerrit:432306|Add WikibaseLexeme to extension-list]] (duration: 01m 19s) [22:33:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:33:41] That's my syncing done again! [22:34:25] * addshore waits for beta [22:35:15] !log awight@tin Finished deploy [ores/deploy@1b13ef1]: ORES: drafttopic (duration: 03m 15s) [22:35:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:35:21] addshore: uhhhh [22:35:27] ? [22:35:29] Did you add it to the wmf deployed branches? 
[22:35:37] the extension, that is [22:35:51] hmm, it is added to the make branch script but not to deployed branches no [22:35:56] which I now realize also needs to be done [22:36:00] Yeah [22:36:06] Or you just add it to extension-list-labs for now [22:36:08] instead [22:36:11] * addshore thinks some of this should be cleverer [22:36:16] otherwise scap is gonna fuck some shit up [22:36:21] extension-list-labs hasn't existed for a while [22:36:31] wat [22:36:34] xD [22:36:38] How does that work? [22:36:42] Maybe it is cleverer? [22:37:07] Check the email from 1 March from chad [22:37:13] Email is hard [22:37:15] * addshore looks in the channel list for chad's current username [22:37:20] no_justification [22:37:33] * addshore forgot [22:37:38] Just put it in extension-list [22:37:44] sweet :) [22:38:01] (03CR) 10jenkins-bot: Add WikibaseLexeme to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432306 (https://phabricator.wikimedia.org/T184745) (owner: 10Addshore) [22:38:22] Ideally I wanna kill that file outright [22:39:41] mmm [22:40:02] mmmmmm [22:40:18] jouncebot next [22:40:18] In 0 hour(s) and 19 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T2300) [22:40:37] !log awight@tin Started deploy [ores/deploy@bf1e2b1]: ORES: drafttopic [22:40:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:56:40] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2145.codfw.wmnet [22:56:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:58:37] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2146.codfw.wmnet [22:58:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:59:26] !log dzahn@neodymium conftool action : set/pooled=yes; selector: name=mw2147.codfw.wmnet [22:59:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:00:05] 
addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180509T2300). [23:00:05] RoanKattouw, Smalyshev, and subbu: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:01:14] o/ [23:02:48] !log mw2141,mw2143,mw2142 - reinstalling with stretch - mw2144: puppet cert not found [23:02:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:03:15] Hello [23:03:18] I'll do the SWAT today [23:04:12] (03PS2) 10Catrope: Enable RemexHtml on more wikibooks wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430903 (https://phabricator.wikimedia.org/T192821) (owner: 10Subramanya Sastry) [23:04:16] (03CR) 10Catrope: [C: 032] Enable RemexHtml on more wikibooks wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430903 (https://phabricator.wikimedia.org/T192821) (owner: 10Subramanya Sastry) [23:05:34] (03Merged) 10jenkins-bot: Enable RemexHtml on more wikibooks wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430903 (https://phabricator.wikimedia.org/T192821) (owner: 10Subramanya Sastry) [23:06:27] !log awight@tin Finished deploy [ores/deploy@bf1e2b1]: ORES: drafttopic (duration: 25m 51s) [23:06:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:07:43] Did mwdebug1002 have its ssh key change? [23:08:05] i think so due to the reinstall. [23:08:07] mutante ^^ [23:08:18] RoanKattouw: on the 3rd moritz reimaged it [23:08:23] So yeah [23:08:57] Thanks [23:09:24] subbu: Your patch is on mwdebug1002, please test [23:09:38] testing [23:10:03] RoanKattouw: Do you mind if I slip in a tiny Wikibase backport? 
https://phabricator.wikimedia.org/T194316 [23:10:20] Reply ahead of time ;) [23:11:03] Sure -- but I'll wait until it merges in master first [23:11:34] RoanKattouw, lgtm. [23:11:41] OK, syncing [23:13:03] SMalyshev: You around for SWAT? [23:13:25] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable RemexHtml on some wikibooks wikis (T192821) (duration: 01m 21s) [23:13:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:13:29] (03PS3) 10Catrope: Enable ORES on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431036 (https://phabricator.wikimedia.org/T192501) [23:13:30] T192821: Enable RemexHTML on wikibook wikis with < 100 linter errors in all high priority linter categories in ns0 (main namespace) - https://phabricator.wikimedia.org/T192821 [23:13:33] (03CR) 10Catrope: [C: 032] Enable ORES on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431036 (https://phabricator.wikimedia.org/T192501) (owner: 10Catrope) [23:13:38] (03PS5) 10Catrope: Enable ORES on lvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431038 (https://phabricator.wikimedia.org/T192499) [23:13:41] (03CR) 10Catrope: [C: 032] Enable ORES on lvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431038 (https://phabricator.wikimedia.org/T192499) (owner: 10Catrope) [23:13:47] (03PS4) 10Catrope: Enable ORES on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431040 (https://phabricator.wikimedia.org/T192496) [23:13:50] (03CR) 10Catrope: [C: 032] Enable ORES on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431040 (https://phabricator.wikimedia.org/T192496) (owner: 10Catrope) [23:14:53] (03Merged) 10jenkins-bot: Enable ORES on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431036 (https://phabricator.wikimedia.org/T192501) (owner: 10Catrope) [23:15:19] (03Merged) 10jenkins-bot: Enable ORES on lvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431038 
(https://phabricator.wikimedia.org/T192499) (owner: 10Catrope) [23:15:28] (03Merged) 10jenkins-bot: Enable ORES on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431040 (https://phabricator.wikimedia.org/T192496) (owner: 10Catrope) [23:24:04] !log awight@tin Started deploy [ores/deploy@bf182e2]: Rollback ores1001 [23:24:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:07] !log awight@tin Finished deploy [ores/deploy@bf182e2]: Rollback ores1001 (duration: 00m 03s) [23:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:25:40] (03CR) 10jenkins-bot: Enable RemexHtml on more wikibooks wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430903 (https://phabricator.wikimedia.org/T192821) (owner: 10Subramanya Sastry) [23:25:45] (03CR) 10jenkins-bot: Enable ORES on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431036 (https://phabricator.wikimedia.org/T192501) (owner: 10Catrope) [23:25:51] (03CR) 10jenkins-bot: Enable ORES on lvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431038 (https://phabricator.wikimedia.org/T192499) (owner: 10Catrope) [23:25:54] Ugh the population script for ORES is broken [23:25:57] (03CR) 10jenkins-bot: Enable ORES on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431040 (https://phabricator.wikimedia.org/T192496) (owner: 10Catrope) [23:31:37] Reedy: RoanKattouw: Seems like the backport will fail https://integration.wikimedia.org/ci/job/mwext-php70-phan-docker/6313/ for unrelated reasons [23:48:00] (03PS1) 10Catrope: Follow-up 3904582dfd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432328 [23:48:21] (03CR) 10Catrope: [C: 032] Follow-up 3904582dfd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432328 (owner: 10Catrope) [23:49:31] (03Merged) 10jenkins-bot: Follow-up 3904582dfd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432328 (owner: 10Catrope) [23:49:40] 
(03PS1) 10Catrope: Follow-up 1a40cea4dccd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432329 [23:49:50] (03CR) 10Catrope: [C: 032] Follow-up 1a40cea4dccd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432329 (owner: 10Catrope) [23:50:57] (03CR) 10jenkins-bot: Follow-up 3904582dfd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432328 (owner: 10Catrope) [23:51:05] (03Merged) 10jenkins-bot: Follow-up 1a40cea4dccd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432329 (owner: 10Catrope) [23:53:18] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable ORES on cawiki, lvwiki, huwiki (T192501, T192499, T192496) (duration: 01m 29s) [23:53:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:53:25] T192499: Deploy ORES advanced editquality models to lvwiki - https://phabricator.wikimedia.org/T192499 [23:53:26] T192501: Deploy ORES advanced editquality models to cawiki - https://phabricator.wikimedia.org/T192501 [23:53:26] T192496: Deploy ORES advanced editquality models to huwiki - https://phabricator.wikimedia.org/T192496 [23:55:15] (03CR) 10Catrope: [C: 032] Enable ORES on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431035 (https://phabricator.wikimedia.org/T192498) (owner: 10Catrope) [23:55:24] (03CR) 10jerkins-bot: [V: 04-1] Enable ORES on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431035 (https://phabricator.wikimedia.org/T192498) (owner: 10Catrope) [23:56:49] (03PS2) 10Catrope: Enable ORES on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431035 (https://phabricator.wikimedia.org/T192498) [23:57:05] (03CR) 10Catrope: [C: 032] Enable ORES on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431035 (https://phabricator.wikimedia.org/T192498) (owner: 10Catrope) [23:57:21] (03CR) 10jenkins-bot: Follow-up 1a40cea4dccd: fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432329 (owner: 
10Catrope) [23:58:21] (03Merged) 10jenkins-bot: Enable ORES on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431035 (https://phabricator.wikimedia.org/T192498) (owner: 10Catrope)