[00:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150910T0000). [00:04:42] (03PS1) 10Dzahn: mailman: also rsync qfiles [puppet] - 10https://gerrit.wikimedia.org/r/237299 [00:04:44] okay, for some reason, interwiki language aliases don't overwrite existing links [00:04:56] (03PS4) 10Dzahn: mailman: change the way we rsync stuff [puppet] - 10https://gerrit.wikimedia.org/r/237287 (https://phabricator.wikimedia.org/T108071) [00:06:52] (03PS2) 10Dzahn: mailman: also rsync qfiles [puppet] - 10https://gerrit.wikimedia.org/r/237299 [00:07:14] (03PS3) 10Dzahn: mailman: also rsync qfiles [puppet] - 10https://gerrit.wikimedia.org/r/237299 (https://phabricator.wikimedia.org/T110138) [00:08:21] (03CR) 10Dzahn: [C: 032] "i'm running this stuff with --dry-run first" [puppet] - 10https://gerrit.wikimedia.org/r/237287 (https://phabricator.wikimedia.org/T108071) (owner: 10Dzahn) [00:08:33] (03PS4) 10Dzahn: mailman: also rsync qfiles [puppet] - 10https://gerrit.wikimedia.org/r/237299 (https://phabricator.wikimedia.org/T110138) [00:11:23] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 14 data above and 9 below the confidence bounds [00:11:55] (03CR) 10Dzahn: [C: 032] mailman: also rsync qfiles [puppet] - 10https://gerrit.wikimedia.org/r/237299 (https://phabricator.wikimedia.org/T110138) (owner: 10Dzahn) [00:15:48] (03CR) 10Dzahn: "we can also do this via hiera. 
i see this in hieradata:" [puppet] - 10https://gerrit.wikimedia.org/r/237045 (owner: 10Hashar) [00:17:23] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [00:18:05] (03PS1) 10Dzahn: contint: add nagios contact group in hiera common [puppet] - 10https://gerrit.wikimedia.org/r/237301 [00:26:48] (03PS2) 10Dzahn: contint: add nagios contact group in hiera common [puppet] - 10https://gerrit.wikimedia.org/r/237301 [00:28:32] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/237301/ ?" [puppet] - 10https://gerrit.wikimedia.org/r/237045 (owner: 10Hashar) [00:39:30] (03PS5) 10Thcipriani: Add config deployment [tools/scap] - 10https://gerrit.wikimedia.org/r/235385 [00:46:32] PROBLEM - nutcracker port on mw1154 is CRITICAL: Connection refused [00:46:51] i'll look at mw1154 [00:49:48] ori: memory leak? i saw: Memory cgroup out of memory: Kill process 27901 (convert) and this extreme load and actually tried starting nutcracker [00:50:35] mutante: yeah, I can't ssh in [00:51:30] mutante: what do you suggest we do? We could probably bring it back to life with salt (or by rebooting it via console, if that fails). If we want to examine it more deeply, then we should probably depool it. [00:51:53] ori: let's reboot it [00:51:59] i can do it [00:52:04] k, +1. thanks. 
[00:54:13] !log powercycling unresponsive mw1154 [00:54:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:54:23] i tried a normal reboot first, but no [00:55:43] PROBLEM - Host mw1154 is DOWN: PING CRITICAL - Packet loss = 100% [00:55:45] fwiw, it was somehow really busy since pretty much exactly 1 day [00:56:12] RECOVERY - nutcracker process on mw1154 is OK: PROCS OK: 1 process with UID = 109 (nutcracker), command name nutcracker [00:56:13] RECOVERY - Host mw1154 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [00:56:32] RECOVERY - nutcracker port on mw1154 is OK: TCP OK - 0.000 second response time on port 11212 [00:57:02] RECOVERY - dhclient process on mw1154 is OK: PROCS OK: 0 processes with command name dhclient [00:57:23] RECOVERY - salt-minion processes on mw1154 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:57:32] RECOVERY - HHVM processes on mw1154 is OK: PROCS OK: 6 processes with command name hhvm [00:57:45] i saw it in icinga, could still ssh, try to restart nutcracker and then ..meh. 
[01:00:32] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [01:03:27] !log ori@tin Synchronized php-1.26wmf22/extensions/NavigationTiming: I2605c746b: Ensure timings are reported after the page has loaded (duration: 00m 12s) [01:03:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:04:03] !log ori@tin Synchronized php-1.26wmf21/extensions/NavigationTiming: I2605c746b: Ensure timings are reported after the page has loaded (duration: 00m 13s) [01:04:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:21:28] (03PS1) 10Andrew Bogott: Designate kilo renames 'database_connection' to 'connection' [puppet] - 10https://gerrit.wikimedia.org/r/237314 [01:22:49] (03CR) 10Andrew Bogott: [C: 032] Designate kilo renames 'database_connection' to 'connection' [puppet] - 10https://gerrit.wikimedia.org/r/237314 (owner: 10Andrew Bogott) [01:29:40] (03PS1) 10Dzahn: mailman: remove import scripts [puppet] - 10https://gerrit.wikimedia.org/r/237315 (https://phabricator.wikimedia.org/T110131) [01:32:13] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Puppet last ran 1 day ago [01:32:17] (03PS2) 10Dzahn: mailman: remove import scripts [puppet] - 10https://gerrit.wikimedia.org/r/237315 (https://phabricator.wikimedia.org/T110131) [01:34:13] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:38:37] (03PS1) 10Dzahn: mailman: rsyncd conf for exim sync [puppet] - 10https://gerrit.wikimedia.org/r/237316 (https://phabricator.wikimedia.org/T110440) [01:54:03] PROBLEM - DPKG on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:54:13] PROBLEM - RAID on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:54:23] PROBLEM - dhclient process on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[01:55:03] PROBLEM - puppet last run on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:55:22] PROBLEM - Check size of conntrack table on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:55:22] PROBLEM - salt-minion processes on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:55:33] PROBLEM - Disk space on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:55:42] PROBLEM - HTTP on planet1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:56:02] PROBLEM - configured eth on planet1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:56:31] eh, checking that [01:56:53] RECOVERY - puppet last run on planet1001 is OK: OK: Puppet is currently enabled, last run 13 minutes ago with 0 failures [01:57:12] RECOVERY - Check size of conntrack table on planet1001 is OK: OK: nf_conntrack is 0 % full [01:57:12] RECOVERY - salt-minion processes on planet1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [01:57:23] RECOVERY - Disk space on planet1001 is OK: DISK OK [01:57:33] RECOVERY - HTTP on planet1001 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 459 bytes in 0.003 second response time [01:57:50] nrpe[22674]: Error: Could not complete SSL handshake [01:57:52] RECOVERY - configured eth on planet1001 is OK: OK - interfaces up [01:57:53] RECOVERY - DPKG on planet1001 is OK: All packages OK [01:58:03] RECOVERY - RAID on planet1001 is OK: OK: no RAID installed [01:58:13] RECOVERY - dhclient process on planet1001 is OK: PROCS OK: 0 processes with command name dhclient [02:03:07] (03CR) 10Dzahn: [C: 032] mailman: remove import scripts [puppet] - 10https://gerrit.wikimedia.org/r/237315 (https://phabricator.wikimedia.org/T110131) (owner: 10Dzahn) [02:03:46] (03CR) 10Dzahn: [C: 032] mailman: rsyncd conf for exim sync [puppet] - 10https://gerrit.wikimedia.org/r/237316 (https://phabricator.wikimedia.org/T110440) (owner: 10Dzahn) 
[02:10:29] operations, Wikimedia-Mailing-lists, Patch-For-Review: import all lists with the script we wrote for that - https://phabricator.wikimedia.org/T110131#1623442 (Dzahn) we won't use that script anymore. instead we adjusted the rsyncd config and the script running on the source side to directly sync into /...
[02:10:41] operations, Wikimedia-Mailing-lists, Patch-For-Review: import all lists with the script we wrote for that - https://phabricator.wikimedia.org/T110131#1623443 (Dzahn) Open>Invalid
[02:10:41] operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1623444 (Dzahn)
[02:10:48] (PS1) Andrew Bogott: Change the notification driver for kilo [puppet] - https://gerrit.wikimedia.org/r/237318
[02:12:39] (PS2) Andrew Bogott: Change the notification driver for kilo [puppet] - https://gerrit.wikimedia.org/r/237318
[02:22:11] (PS3) Andrew Bogott: Change the notification driver for kilo [puppet] - https://gerrit.wikimedia.org/r/237318
[02:22:49] Krenair: there is a local uncommitted modification on tin in WikimediaEvents
[02:23:16] I think that's from you if I'm reading the logs right
[02:23:17] WikimediaEvents?
[02:23:24] you mean WikimediaMaintenance?
[02:23:27] yes
[02:23:28] sorry
[02:23:45] yeah
[02:23:52] that's actually a revert of a commit from earlier that broke things
[02:24:11] !log krinkle@tin Synchronized php-1.26wmf21/resources/src/mediawiki/mediawiki.js: Ic0b1fb64ee7 backport (duration: 00m 12s)
[02:24:11] Can you revert that in gerrit as well?
[02:24:14] (CR) Andrew Bogott: [C: 2] Change the notification driver for kilo [puppet] - https://gerrit.wikimedia.org/r/237318 (owner: Andrew Bogott)
[02:24:18] I put the revert commit up for review
[02:24:20] and restore tin :)
[02:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:24:38] Krenair: If it's in a wmf branch only, you can just self-merge, right?
[02:24:48] no, it's also on master
[02:25:03] it's a silly distinction given that we're talking about WikimediaMaintenance, but there you go...
[02:25:11] I mean whatever is deployed and on tin should match wmf branch. I don't care about the details or what's in master. :/
[02:25:30] we should just revert on the deploy branches unless master is also fixed
[02:25:49] shouldn't*
[02:25:51] otherwise the next branch will break it
[02:25:53] At the very least commit the revert locally on tin using git.
[02:25:58] OK
[02:25:59] whether it's in gerrit or not I don't mind even.
[02:26:21] You can also use git-revert to do the revert itself of course :)
[02:26:30] actually, I'm just going to self-+2 on master
[02:26:42] it's a revert to a previously reviewed state, so meh
[02:28:12] will wait for that to go through jenkins and then sort out tin
[02:30:24] k
[02:32:11] are you running scap Krinkle ?
[02:32:15] No [02:32:18] oh, l10nupdate [02:32:23] I'm done [02:36:50] !log l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 10m 45s) [02:37:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:43:20] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-10 02:43:20+00:00 [02:43:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:50:49] !log krenair@tin Synchronized php-1.26wmf21/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/237303 (duration: 00m 10s) [02:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:51:53] !log krenair@tin Synchronized php-1.26wmf22/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/237304 (duration: 00m 11s) [02:52:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:59:45] !log l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 06m 10s) [02:59:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:02:45] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-10 03:02:45+00:00 [03:02:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:34:03] PROBLEM - Host mw2031 is DOWN: PING CRITICAL - Packet loss = 100% [03:35:22] RECOVERY - Host mw2031 is UP: PING OK - Packet loss = 0%, RTA = 34.45 ms [03:52:33] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: puppet fail [03:54:33] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:33] (03PS1) 10KartikMistry: Beta: Enable Content Translation suggestions for Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237322 [04:54:21] 6operations, 6Release-Engineering-Team, 7Database: Audit all existing code to 
ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1623545 (10greg) Is the list of blockers here... [05:19:35] (03PS1) 10KartikMistry: CX: Enable suggestion for ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237327 (https://phabricator.wikimedia.org/T111901) [05:29:43] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds [05:33:53] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [05:47:11] (03PS1) 10Mdann52: noindex userspace, per T014797 Change-Id: Iae2edc388f081da4618bee0697c67b15367e227f [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 [05:47:53] (03PS2) 10Mdann52: noindex userspace, per T014797 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 [05:48:58] (03PS3) 10Mdann52: noindex userspace, per T104797 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 [05:51:44] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 9 below the confidence bounds [05:57:44] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 8 below the confidence bounds [06:01:54] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds [06:07:53] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 8 below the confidence bounds [06:12:50] (03CR) 10Nemo bis: [C: 031] "Communication part done." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) (owner: 10Greg Grossmeier) [06:13:02] (03CR) 10Santhosh: CX: Enable suggestion for ptwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237327 (https://phabricator.wikimedia.org/T111901) (owner: 10KartikMistry) [06:13:15] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Sep 10 06:13:14 UTC 2015 (duration 13m 13s) [06:13:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [06:13:52] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [06:21:31] (03PS2) 10KartikMistry: CX: Enable suggestion for ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237327 (https://phabricator.wikimedia.org/T111901) [06:21:38] (03CR) 10jenkins-bot: [V: 04-1] CX: Enable suggestion for ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237327 (https://phabricator.wikimedia.org/T111901) (owner: 10KartikMistry) [06:24:06] (03PS3) 10KartikMistry: CX: Enable suggestion for ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237327 (https://phabricator.wikimedia.org/T111901) [06:27:04] (03PS2) 10KartikMistry: Beta: Enable Content Translation suggestions for Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237322 [06:29:15] (03PS1) 10Nemo bis: Configure $wgExtraSignatureNamespaces for it.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237331 (https://phabricator.wikimedia.org/T7645) [06:30:43] PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:14] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [100000000.0] [06:31:33] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail [06:31:42] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:53] PROBLEM - puppet last run on 
db1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:03] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:33] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:53] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:12] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:12] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:13] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:22] PROBLEM - puppet last run on subra is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:33] PROBLEM - puppet last run on mw2043 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:42] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:12] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:23] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:33] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:48:32] (03PS4) 10KartikMistry: CX: Enable suggestion for ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237327 (https://phabricator.wikimedia.org/T111901) [06:51:36] (03PS3) 10KartikMistry: Beta: Enable Content Translation suggestions for Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237322 [06:51:59] (03PS5) 10KartikMistry: CX: Enable suggestion for ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237327 (https://phabricator.wikimedia.org/T111901) [06:55:32] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:55:33] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:55:43] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, 
last run 56 seconds ago with 0 failures [06:56:02] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:56:13] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [06:56:23] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:56:23] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:56:34] RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:43] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [06:57:02] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:03] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:57:03] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:57:13] RECOVERY - puppet last run on subra is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:23] RECOVERY - puppet last run on mw2043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:24] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:03] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:02:02] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds [07:07:14] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [07:29:17] (03PS2) 
10MarcoAurelio: T110674 Update of permissions at Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234544 (https://phabricator.wikimedia.org/T110674) [07:30:06] (03PS3) 10MarcoAurelio: Update of permissions at Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234544 (https://phabricator.wikimedia.org/T110674) [07:43:31] 6operations, 6Release-Engineering-Team, 7Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ExternalStore has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1623741 (10hashar) [08:38:01] 6operations, 10ops-codfw, 5Patch-For-Review: provision wmf5846 and wmf5848 - https://phabricator.wikimedia.org/T111697#1623817 (10fgiunchedi) 5Open>3Resolved switch ports configured, resolving [08:39:29] (03PS1) 10Muehlenhoff: Create ferm rules for Hadoop master and Hadoop standby (common rules) [puppet] - 10https://gerrit.wikimedia.org/r/237335 [08:43:54] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [08:46:05] 6operations, 10Math, 5Patch-For-Review: Install texlive-extra-utils on mw appservers - https://phabricator.wikimedia.org/T109195#1623852 (10Reedy) >>! In T109195#1623089, @Dzahn wrote: >>>! In T109195#1543321, @Reedy wrote: >> Do they just need whitelisting in the Math extension maybe then? :/ > > How do yo... 
[08:47:53] (03PS1) 10Filippo Giunchedi: install_server: provision restbase-test2* with raid1 [puppet] - 10https://gerrit.wikimedia.org/r/237338 (https://phabricator.wikimedia.org/T111382) [08:48:16] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install_server: provision restbase-test2* with raid1 [puppet] - 10https://gerrit.wikimedia.org/r/237338 (https://phabricator.wikimedia.org/T111382) (owner: 10Filippo Giunchedi) [08:50:15] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1623888 (10hashar) [08:51:34] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1623896 (10hashar) 5Open>3stalled Stalling the ticket. The gitblit software powering git.wikimedia.org is in the process of being replaced by Phabricator Diffusion. I have made {T111465} a bloc... [08:59:11] (03PS1) 10Jcrespo: Add grants for new database designate_pool_manager [puppet] - 10https://gerrit.wikimedia.org/r/237339 (https://phabricator.wikimedia.org/T112041) [09:08:04] (03PS8) 10Filippo Giunchedi: certificate/keystore generation script [puppet] - 10https://gerrit.wikimedia.org/r/236389 (https://phabricator.wikimedia.org/T108953) (owner: 10Eevans) [09:08:11] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] certificate/keystore generation script [puppet] - 10https://gerrit.wikimedia.org/r/236389 (https://phabricator.wikimedia.org/T108953) (owner: 10Eevans) [09:20:13] !log restbase deploying 0182962 [09:20:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:20:58] (03CR) 10MarcoAurelio: "Not a dev, but I think it needs to be rebased? Also (very minor), it seems that the practice is to add simply "T..." instead of "Bug T..."" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: 10Cenarium) [09:21:41] jynus: around? 
[09:21:51] kart_, yes
[09:22:37] jynus: I guess I misread the comment you made about the CX tables.
[09:22:46] ?
[09:23:14] jynus: regarding: https://phabricator.wikimedia.org/T111317
[09:24:45] jynus: you can go ahead with creating tables :)
[09:25:00] kart_, I know, I have to fix the blocker task
[09:25:26] but I have like 30 high tickets because people are abusing "high"
[09:25:26] jynus: good to fix by next Thursday (code will be deployed in wmf23)
[09:25:32] heh
[09:25:36] I intended to do it today
[09:26:12] jynus: thanks. We have a script to run which depends on the tables.
[09:26:12] in fact, I intended to do it the very same day you told me about, but as you delayed it, 30 more tickets arrived in between
[09:26:23] OOO
[09:26:30] jynus: sorry about that.
[09:26:37] I want to test in Beta first.
[09:26:41] oh, it is not your fault
[09:27:27] Thanks. Ping me anytime with regard to T111317.
[09:27:28] kart_, as I commented, you can do Beta yourself
[09:27:37] jynus: done on Beta.
[09:28:03] I just want you to understand that I have a huge backlog
[09:28:26] due mainly to being only 1 person (and we will fix that soon)
[09:28:48] I also proposed
[09:28:54] a new tag
[09:29:05] to make sure I do not miss schema changes
[09:29:25] please check https://wikitech.wikimedia.org/wiki/Schema_changes/Coordination_proposal
[09:29:32] No worries.
[09:29:36] Checking.
[09:29:43] with that, I will be able to attend to those faster
[09:29:50] and make sure I do not miss them
[09:29:59] and that I have all the information
[09:31:01] I know there is a problem, I am trying to fix it (I am also on clinic duty this week, which means I have a lot of regular operations tasks with high priority)
[09:31:10] looks good.
[09:31:46] as I said to someone before, I can only work 80 hours a week
[09:33:01] 80!
[09:33:42] and pinging me does not make me work faster, I assure you
[09:34:14] it is on the backlog, it will be done soon(TM)
[09:38:07] (PS2) Jcrespo: Add grants for new database designate_pool_manager [puppet] - https://gerrit.wikimedia.org/r/237339 (https://phabricator.wikimedia.org/T112041)
[09:39:02] (CR) Jcrespo: [C: 2] Add grants for new database designate_pool_manager [puppet] - https://gerrit.wikimedia.org/r/237339 (https://phabricator.wikimedia.org/T112041) (owner: Jcrespo)
[09:40:46] kart_, one thing you can do to help me is to promote the new workflow I sent you before
[09:41:25] I will send a proposal soon to release eng team and the several developers teams
[09:42:09] and if it works, it will help me prioritize workflows in which I am a blocker
[09:43:20] (PS3) Muehlenhoff: Add a ferm define for mw_appserver_networks (needed for scap::proxy) [puppet] - https://gerrit.wikimedia.org/r/237085
[09:44:52] jynus: sure. I'll be happy to help to make work faster.
[09:45:12] thanks, kart_ :-)
[09:45:29] you can share it at least within your team
[09:45:45] to see if they have any comments or criticism
[09:46:19] please note that I would love to make everything work faster, ok?
[09:47:37] (CR) Muehlenhoff: [C: 2] "Going ahead with merging this; I've tested it in deployment-prep and it only adds a definition which isn't sourced anywhere yet." [puppet] - https://gerrit.wikimedia.org/r/237085 (owner: Muehlenhoff)
[09:49:37] jynus: okay :)
[09:50:19] so, here is the thing, kart_
[09:50:28] tables on wikishared are latin1
[09:50:36] not only the database configuration
[09:50:49] should we convert them to utf8mb4?
[09:51:12] right now you won't be able to insert non-latin characters there
[09:51:56] can you point me to your repository for wikishared, to see what was the intended encoding?
[09:52:53] on regular wikis, we use the binary encoding, and let mediawiki handle everything
[09:53:36] jynus: it is mediawiki/extensions/ContentTranslation
[09:53:51] cx_translations is binary already
[09:54:07] jynus: that should be good then?
[09:55:00] so, only bounce_records is latin1, aside from the default config
[09:55:08] the others are binary
[09:55:25] and for WMF, binary is the standard
[09:55:30] jynus: bounce_records is not CX table.
[09:55:33] operations, Wikimedia-Language-setup, Wikimedia-Site-Requests: Rename Võro Wikipedia, fiu-vro -> vro - https://phabricator.wikimedia.org/T31186#1624273 (Malafaya) I wonder what's up with these rename requests that takes years to complete. What's hindering it?
[09:55:50] jynus: ie it is not from Content Translation :)
[09:55:51] ok, then we wouldn't touch it
[09:55:56] cool.
[09:56:19] we should put a bug report for that
[09:56:40] so, my proposal would be only to change the default
[09:56:42] to binary
[09:56:51] (no touching any existing tables)
[09:57:06] put the bug report to whoever owns that
[09:57:10] jynus: it is from, mediawiki-extensions-bouncehandler but you can check with legoktm to confirm.
[09:57:11] and then create the new ones
[09:57:20] does that sound ok?
[09:57:25] yes.
[10:00:33] operations, ContentTranslation-Deployments, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release6, and 4 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1624293 (jcrespo)
[10:03:18] (PS1) Filippo Giunchedi: cassandra: add restbase-test200[1-3] spares [puppet] - https://gerrit.wikimedia.org/r/237347 (https://phabricator.wikimedia.org/T111382)
[10:04:33] (PS2) Filippo Giunchedi: cassandra: add restbase-test200[1-3] spares [puppet] - https://gerrit.wikimedia.org/r/237347 (https://phabricator.wikimedia.org/T111382)
[10:04:34] operations, Wikimedia-Language-setup, Wikimedia-Site-Requests: Rename Võro Wikipedia, fiu-vro -> vro - https://phabricator.wikimedia.org/T31186#1624314 (Amire80) Not much any more. I started a thread about it a couple of weeks ago - https://lists.wikimedia.org/pipermail/wikitech-l/2015-August/082914....
[10:04:39] (CR) Filippo Giunchedi: [C: 2 V: 2] cassandra: add restbase-test200[1-3] spares [puppet] - https://gerrit.wikimedia.org/r/237347 (https://phabricator.wikimedia.org/T111382) (owner: Filippo Giunchedi)
[10:15:34] operations, ContentTranslation-Deployments, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release6, and 4 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1624337 (jcrespo) These 2 tables have been created on x1-master and its slaves:...
[10:16:32] kart_, one thing I am going to point out
[10:16:45] the second table has no primary key
[10:17:45] there is no policy yet, but you should check T17441, and try to fix that
[10:20:07] jynus: thanks a lot.
[10:20:38] if there is no candidate key, an auto-increment would be fine
[10:21:18] I mention this in your own interest: once the table has data, having no primary key would usually mean a more painful alter process
[10:21:26] (and more delays :-P)
[10:23:02] jynus: noted. Poked santhosh too.
[10:29:22] (03PS1) 10Filippo Giunchedi: install_server: provision restbase-test2 with standard 2-disk raid [puppet] - 10https://gerrit.wikimedia.org/r/237351 [10:29:52] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install_server: provision restbase-test2 with standard 2-disk raid [puppet] - 10https://gerrit.wikimedia.org/r/237351 (owner: 10Filippo Giunchedi) [10:32:47] PROBLEM - Host restbase-test2001 is DOWN: PING CRITICAL - Packet loss = 100% [10:35:32] that's me ^ races between wmf-reimage / clean puppet stored configs / puppet generating icinga config [10:35:58] RECOVERY - Host restbase-test2001 is UP: PING OK - Packet loss = 0%, RTA = 34.31 ms [10:40:08] PROBLEM - configured eth on restbase-test2001 is CRITICAL: Connection refused by host [10:40:17] PROBLEM - Cassandra database on restbase-test2001 is CRITICAL: Connection refused by host [10:40:28] PROBLEM - Check size of conntrack table on restbase-test2001 is CRITICAL: Connection refused by host [10:40:29] PROBLEM - DPKG on restbase-test2001 is CRITICAL: Connection refused by host [10:40:47] PROBLEM - Disk space on restbase-test2001 is CRITICAL: Connection refused by host [10:48:41] (03CR) 10Gilles: [C: 04-1] Send image varnish frontend data from logs to statsd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/234157 (https://phabricator.wikimedia.org/T105681) (owner: 10Gilles) [10:53:08] 6operations, 7Database: Upgrade x1 cluster - https://phabricator.wikimedia.org/T112079#1624574 (10jcrespo) 3NEW a:3jcrespo [10:56:38] RECOVERY - configured eth on restbase-test2001 is OK: OK - interfaces up [10:56:58] RECOVERY - Check size of conntrack table on restbase-test2001 is OK: OK: nf_conntrack is 0 % full [10:56:59] RECOVERY - DPKG on restbase-test2001 is OK: All packages OK [10:57:17] RECOVERY - Disk space on restbase-test2001 is OK: DISK OK [11:03:32] (03CR) 10JanZerebecki: [C: 031] Create real URIs for wikidata RDF URIs [puppet] - 10https://gerrit.wikimedia.org/r/230483 
(https://phabricator.wikimedia.org/T97195) (owner: 10Smalyshev) [11:08:30] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, and 4 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601252 (10KartikMistry) [11:20:24] (03PS1) 10Muehlenhoff: Enable the second video scaler in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237354 [11:20:37] (03PS2) 10TTO: Configure $wgExtraSignatureNamespaces for it.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237331 (https://phabricator.wikimedia.org/T7645) (owner: 10Nemo bis) [11:20:45] (03CR) 10TTO: [C: 031] Configure $wgExtraSignatureNamespaces for it.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237331 (https://phabricator.wikimedia.org/T7645) (owner: 10Nemo bis) [11:24:32] (03PS1) 10Muehlenhoff: Enable ferm on remaining image scalers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237355 [11:42:27] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.69% of data above the critical threshold [500.0] [11:46:21] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable the second video scaler in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237354 (owner: 10Muehlenhoff) [11:47:10] (03PS2) 10Muehlenhoff: Enable ferm on remaining image scalers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237355 [11:50:38] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [11:52:55] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on remaining image scalers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237355 (owner: 10Muehlenhoff) [11:52:57] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624790 (10Dereckson) [11:54:30] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624124 
(10Dereckson) Can't repro one of the point: I can access https://fr.wikiversity.org/wiki/Projet:Wikiversit%C3%A9/Cr%C3%A9ation_d%27une_biblioth%C3%A8que_Wikiversitaire (Chrome 44) [11:56:30] (03PS1) 10Filippo Giunchedi: cassandra: fail on missing CA/cert subject [puppet] - 10https://gerrit.wikimedia.org/r/237358 [11:57:16] (03PS2) 10Filippo Giunchedi: cassandra: fail on missing CA/cert subject [puppet] - 10https://gerrit.wikimedia.org/r/237358 [11:57:22] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] cassandra: fail on missing CA/cert subject [puppet] - 10https://gerrit.wikimedia.org/r/237358 (owner: 10Filippo Giunchedi) [12:02:53] (03PS1) 10Muehlenhoff: Enable ferm on remaining job runners in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237359 [12:06:58] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on remaining job runners in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237359 (owner: 10Muehlenhoff) [12:19:03] PROBLEM - puppet last run on mw1007 is CRITICAL: CRITICAL: Puppet has 1 failures [12:19:05] (03PS9) 10Gilles: Send image varnish frontend data from logs to statsd [puppet] - 10https://gerrit.wikimedia.org/r/234157 (https://phabricator.wikimedia.org/T105681) [12:19:30] (03CR) 10Gilles: Send image varnish frontend data from logs to statsd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/234157 (https://phabricator.wikimedia.org/T105681) (owner: 10Gilles) [12:24:15] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624897 (10Krenair) What does "*Edition are recorded but not visible." refer to? 
[12:30:21] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624911 (10Krenair) The search suggestions show behind the sidebar somehow [12:35:18] PROBLEM - Cassandra database on restbase-test2002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (cassandra), command name java, args CassandraDaemon [12:35:18] PROBLEM - Cassandra database on restbase-test2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (cassandra), command name java, args CassandraDaemon [12:36:20] !log enabled ferm on mediawiki video scalers, image scalers and job runners in codfw [12:36:24] (03PS1) 10Muehlenhoff: Enable ferm on remaining API appservers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237361 [12:36:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:36:57] PROBLEM - Restbase endpoints health on restbase-test2002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [12:36:57] PROBLEM - Restbase endpoints health on restbase-test2003 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [12:37:08] PROBLEM - Restbase root url on restbase-test2002 is CRITICAL: Connection refused [12:37:08] PROBLEM - Restbase root url on restbase-test2003 is CRITICAL: Connection refused [12:38:37] PROBLEM - Cassanda CQL query interface on restbase-test2002 is CRITICAL: Connection refused [12:38:37] PROBLEM - Cassanda CQL query interface on restbase-test2003 is CRITICAL: Connection refused [12:39:15] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on remaining API appservers in codfw [puppet] - 
10https://gerrit.wikimedia.org/r/237361 (owner: 10Muehlenhoff) [12:41:18] PROBLEM - Restbase endpoints health on restbase-test2001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [12:41:28] PROBLEM - Restbase root url on restbase-test2001 is CRITICAL: Connection refused [12:42:37] PROBLEM - Cassanda CQL query interface on restbase-test2001 is CRITICAL: Connection refused [12:44:58] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:47:43] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624964 (10Lionel_Scheepmans) @ Krenair : "Edition are recorded but not visible." meens, I've made a wikicode edit [[ //fr.wikiversity.org/w/index.php?title=Projet:Wikiversité/Journal_scient... [12:49:01] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624971 (10Krenair) Search suggestions work if you add debug=true to the URL [12:50:16] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624982 (10Krenair) >>! In T112069#1624964, @Lionel_Scheepmans wrote: > @ Krenair : "Edition are recorded but not visible." meens, I've made a wikicode edit [[ //fr.wikiversity.org/w/index.p... [12:50:48] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624986 (10Lionel_Scheepmans) >>! In T112069#1624897, @Krenair wrote: > What does "*Edition are recorded but not visible." refer to? I've made a wikicode edit [[ //fr.wikiversity.org/w/index... 
[12:51:34] !log enabled ferm on mediawiki API servers in codfw [12:51:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:53:16] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624999 (10Lionel_Scheepmans) >>! In T112069#1624982, @Krenair wrote: >>>! In T112069#1624964, @Lionel_Scheepmans wrote: >> @ Krenair : "Edition are recorded but not visible." meens, I've ma... [12:58:07] !sal [12:58:07] https://wikitech.wikimedia.org/wiki/Server_Admin_Log https://tools.wmflabs.org/sal/production See it and you will know all you need. [13:10:22] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1625082 (10JackPotte) Personally I've encountered all these bugs on the French Wiktionary too, in Monobook with the last Firefox. But I never had the time to edit without any gadget to compl... [13:16:56] (03PS1) 10BBlack: varnish: standardize/de-duplicate do_gzip [puppet] - 10https://gerrit.wikimedia.org/r/237366 (https://phabricator.wikimedia.org/T96847) [13:16:58] (03PS1) 10BBlack: Add do_gzip to the misc_web cache cluster [puppet] - 10https://gerrit.wikimedia.org/r/237367 [13:17:00] (03PS1) 10BBlack: Add do_gzip to the maps cache cluster [puppet] - 10https://gerrit.wikimedia.org/r/237368 [13:17:02] (03PS1) 10BBlack: Compress js (and other text) in varnish [puppet] - 10https://gerrit.wikimedia.org/r/237369 (https://phabricator.wikimedia.org/T109040) [13:23:19] (03PS1) 10Muehlenhoff: Enable ferm on remaining appservers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237370 [13:26:51] (03CR) 10Alex Monk: [C: 04-1] "This is changing a single wiki, so please put that in the commit message. 
And the task on a separate line" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 (owner: 10Mdann52) [13:29:32] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [13:30:04] !log performing schema change and maintenance on officewiki and public all wikis with flow enabled [13:30:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:30:36] ^no downtime, just clarifying that [13:34:40] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on remaining appservers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237370 (owner: 10Muehlenhoff) [13:35:15] !log enabled ferm on mediawiki app servers in codfw [13:35:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:39:10] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:39:31] jynus: I’m back to getting "please make sure all tables are CHARSET=utf8” on my powerdns database. That was an easy fix, right? [13:39:47] that shouldn't happen anymore [13:39:59] good morning andrewbogott :-} [13:40:00] because I changed the config [13:40:08] this is a different db from the one we looked at before [13:40:08] let me check, andrewbogott [13:40:11] thanks [13:40:21] oh, then it is a 1 line change [13:40:32] if you can start from 0 [13:40:38] not from 0 [13:40:49] rerun the command you are doing [13:40:57] I definitely need to preserve the data in that db... [13:40:59] but, ok, trying... 
[13:41:04] yes, I meant that [13:41:06] not yet [13:41:12] ok :) [13:41:13] have to change it first [13:41:53] to be fair, it should be an upstream bug [13:43:35] so, andrewbogott, probably what you are running has already created some table with the wrong config [13:44:10] yes, most likely [13:44:33] lets talk in private [13:47:12] git fetch https://review.openstack.org/openstack-infra/nodepool refs/changes/73/221873/2 && git cherry-pick FETCH_HEAD [13:47:17] .. [13:50:10] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 693 [13:50:19] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 696 [13:50:48] db1008? [13:51:13] I think that ones is not mine [13:51:39] I think that ones is jeff's [13:55:10] RECOVERY - check_mysql on db1008 is OK: Uptime: 4136840 Threads: 1 Questions: 28318127 Slow queries: 27919 Opens: 67744 Flush tables: 2 Open tables: 64 Queries per second avg: 6.845 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [13:55:19] RECOVERY - check_mysql on lutetium is OK: Uptime: 1901966 Threads: 1 Questions: 14099009 Slow queries: 6077 Opens: 30697 Flush tables: 2 Open tables: 64 Queries per second avg: 7.412 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [14:03:40] godog: mind acking those codfw RB/Cass alerts? [14:09:51] mobrovac: whoops, downtime expired.. [14:14:57] (03PS3) 10Dzahn: phab: use mysql slave not master for scripts [puppet] - 10https://gerrit.wikimedia.org/r/236944 (https://phabricator.wikimedia.org/T111547) [14:15:10] (03CR) 10Dzahn: [C: 032] phab: use mysql slave not master for scripts [puppet] - 10https://gerrit.wikimedia.org/r/236944 (https://phabricator.wikimedia.org/T111547) (owner: 10Dzahn) [14:15:36] (03PS1) 10Filippo Giunchedi: cassandra: new class ca_manager [puppet] - 10https://gerrit.wikimedia.org/r/237377 (https://phabricator.wikimedia.org/T108953) [14:19:42] wut.. 
applying puppet on iridium, configuration version number keeps changing but what i merged is not applied? [14:20:14] let me check mysql backend [14:21:09] script contents are just not changed .. [14:21:49] (03PS1) 10Hashar: 0.1.1-wmf3: Use 'debian' user for setup step [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/237380 (https://phabricator.wikimedia.org/T111377) [14:22:02] (03PS1) 10Hashar: Add 'debian' user in bootstrapServer() [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/237381 [14:22:27] (03CR) 10Hashar: [C: 04-2] "Not meant to be merged, that is applied to the Debian package as a quilt patch." [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/237381 (owner: 10Hashar) [14:24:03] 6operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-Requests: Rename Võro Wikipedia, fiu-vro -> vro - https://phabricator.wikimedia.org/T31186#1625347 (10Krenair) >>! In T31186#1624273, @Malafaya wrote: > I wonder what's up with these rename requests that takes years to complete. What's hindering it?... [14:25:43] (03PS2) 10Alex Monk: Also update langlist for be-x-old -> be-tarask rename [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236966 [14:26:43] (03PS3) 10Alex Monk: Also update langlist for be-x-old -> be-tarask rename [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236966 (https://phabricator.wikimedia.org/T111853) [14:27:03] (03CR) 10Alex Monk: [C: 04-1] "Need to deal with T111876" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236966 (https://phabricator.wikimedia.org/T111853) (owner: 10Alex Monk) [14:28:03] (03CR) 10Mobrovac: "LGTM, I just find the class name ca_manager somewhat clumsy... 
I'd rather lose the underscore, but that's just me" [puppet] - 10https://gerrit.wikimedia.org/r/237377 (https://phabricator.wikimedia.org/T108953) (owner: 10Filippo Giunchedi) [14:32:34] (03PS5) 10BBlack: park wikiartpedia domains [dns] - 10https://gerrit.wikimedia.org/r/197361 (owner: 10Dzahn) [14:34:08] bblack: ..oh :) [14:34:17] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [14:34:42] mutante: seems like a good first test case to see if anyone screams. if nobody does for a while, then we can use that argument to back up a bunch more :) [14:34:58] 7Blocked-on-Operations, 6operations: Upload nodepool_0.1.1-wmf3 to apt.wikimedia.org and upgrade package on labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T112100#1625406 (10hashar) 3NEW a:3hashar [14:35:25] bblack: yes, that's what i thougt too, wikiartpedia and a ".biz" on top of it:) [14:35:38] i have one more like that for "visualwikipedia" [14:35:45] (03CR) 10BBlack: [C: 032] park wikiartpedia domains [dns] - 10https://gerrit.wikimedia.org/r/197361 (owner: 10Dzahn) [14:36:50] i don't think the mobile team wants .mobi either:) [14:37:40] do we really need to park all those domains by registering them ? [14:38:16] hashar: whether we need to register some of our random odd domains is really a legal question, probably complicated and case-by-case [14:38:26] hashar: it's the other way around, they were registered all the time, we just park them now [14:38:44] the thing I'm going after is basically if they're non-canonical, we can at least not have them be working insecure redirects, which kinda screws up the whole TLS thing [14:38:59] so if I went with the idea of wikibizpedia... we would register it ? 
[14:39:06] the parking template puts them in a state where they "work" and are responsive in a DNS sense, but aren't useful for web browsers [14:39:12] hashar: maybe we would win it in court [14:39:23] hashar: please don't even ask that question heh. the last thing we need is more junk domains [14:39:28] mutante: but still have to register it after we won ? [14:39:50] seems to me if a domain is infringing some trademark it should be parked / disabled at the registrars level [14:39:51] hashar: i guess so, we don't do that part anymore [14:39:54] saving us the operation overhead [14:39:59] and the associated fee [14:40:16] well either way this would be the first step [14:40:20] we can have that conversation later [14:41:00] hashar: it's really a question for legal, markmonitor does some trademark blocking and the registrar too [14:41:05] at [14:41:17] migrating existing junk domains from "aliasing wikipedia.org + redirects from apache" to "parked with no browser support at all" is something we can do and undo locally first. [14:41:48] the list of junk domains is long, and there might be some arguments about whether some of them are truly junk or not along the way [14:42:08] (100+ of them) [14:42:10] (03PS1) 10Alex Monk: Add gom, lrc and azb wikipedias to restbase [puppet] - 10https://gerrit.wikimedia.org/r/237388 (https://phabricator.wikimedia.org/T111897) [14:42:24] pretty sure we have more that WMF owns but are not in our DNS [14:42:36] yeah that too [14:42:39] mutante: bblack: I was just wondering :-} Don't obsess about it! 
[14:42:50] just it is a pity to have to register/setup/pay for all those junk domains [14:43:01] some of which are acutally pointed at our DNS servers and lame delegating, but we never were informed to configure them either [14:43:05] (03PS1) 10Muehlenhoff: Raise default conntrack table size [puppet] - 10https://gerrit.wikimedia.org/r/237389 (https://phabricator.wikimedia.org/T105307) [14:43:41] Anyone mind helping with an on-wiki javascript problem? Autocomplete is broken on en.wikiversity.org due to some Common.js code that sets User-Agent to MOOC-JS/0.2 (https://en.wikiversity.org/wiki/User:Sebschlicht; sebschlicht@uni-koblenz.de) [14:44:20] i'm pretty sure commenting out "importScript("MediaWiki:Common.js/addin-mooc.js");" from https://en.wikiversity.org/wiki/MediaWiki:Common.js will fix it [14:44:55] it's actually working for me on enwikiversity [14:45:15] despite the error in console about user agent [14:45:16] possibly depends on browser? its the browser rejecting to spoof the User-Agent in an xhr [14:45:27] oh, maybe [14:45:31] I tested with chrome 46 [14:45:36] i doesn't work for me in chrome 44 [14:45:56] There is another wiki which has search autocomplete actually broken though - frwikiversity [14:46:07] oh fun :) i'll poke that too [14:46:10] results appear in a funny position on the page without proper styling [14:55:11] Morning, SWATty people. I have a patch in so I can do it, or I can test, or whatever [14:57:54] (03CR) 10Mobrovac: [C: 031] "We can go ahead with this, no need to coordinate the deploy since RESTBase will not pick up the new config until it's been restarted." 
[puppet] - 10https://gerrit.wikimedia.org/r/237388 (https://phabricator.wikimedia.org/T111897) (owner: 10Alex Monk) [14:58:34] (03PS1) 10Filippo Giunchedi: cassandra: install certs and CA from private.git [puppet] - 10https://gerrit.wikimedia.org/r/237397 (https://phabricator.wikimedia.org/T108953) [15:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150910T1500). Please do the needful. [15:00:05] kart_ marktraceur: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:13] kart_: You ready? [15:00:17] yep [15:00:46] We'll do yours first [15:01:03] (03CR) 10MarkTraceur: [C: 032] Beta: Enable Content Translation suggestions for Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237322 (owner: 10KartikMistry) [15:01:10] (03Merged) 10jenkins-bot: Beta: Enable Content Translation suggestions for Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237322 (owner: 10KartikMistry) [15:03:43] (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/237377 (https://phabricator.wikimedia.org/T108953) (owner: 10Filippo Giunchedi) [15:04:03] * marktraceur waiting for sync [15:04:08] (03PS2) 10Alex Monk: Add gom, lrc and azb wikipedias to restbase [puppet] - 10https://gerrit.wikimedia.org/r/237388 (https://phabricator.wikimedia.org/T111897) [15:04:41] * YuviPanda notes that therr doesn't seem to be anything for puppetswat today [15:04:44] (03CR) 10Hashar: "Filled T112100 to get it deployed." 
[debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/237380 (https://phabricator.wikimedia.org/T111377) (owner: 10Hashar) [15:04:53] (03CR) 10Hashar: [C: 032 V: 032] 0.1.1-wmf3: Use 'debian' user for setup step [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/237380 (https://phabricator.wikimedia.org/T111377) (owner: 10Hashar) [15:05:26] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling: Upload nodepool_0.1.1-wmf3 to apt.wikimedia.org and upgrade package on labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T112100#1625406 (10hashar) [15:05:30] YuviPanda, services were going to put up one of my patches [15:05:31] !log marktraceur@tin Synchronized wmf-config/: [SWAT] [config] Beta: Enable Content Translation suggestions (duration: 02m 22s) [15:05:32] for restbase [15:05:35] That....is a lot of errors [15:05:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:05:41] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling: Upload nodepool_0.1.1-wmf3 to apt.wikimedia.org and upgrade package on labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T112100#1625614 (10hashar) a:5hashar>3None [15:05:42] Krenair: ah [15:05:48] Krenair: don't see it yet tho [15:05:52] or at least, I think they were [15:05:55] 15:05:31 64 apaches had sync errors [15:06:02] Tell me that's not normal [15:06:25] That's not normal. [15:06:30] Should I try it again? [15:06:41] Was this only changing a labs file? [15:07:28] Yes. [15:07:37] But I have another patch for UW that's going to prod. [15:07:39] what were the errors? [15:08:06] sudo -u mwdeploy /usr/bin/rsync returned non-zero exit status 10 [15:08:12] (snipped) [15:08:18] marktraceur: error with patch? [15:08:26] kart_: I think error with deploy, not patch [15:08:30] that it? 
[15:08:34] Krenair: Yeah [15:09:05] Er, maybe it was a timeout [15:09:14] rsync: failed to connect to mw2187.codfw.wmnet (10.192.32.75): Connection timed out (110) [15:09:25] Not totally clear which error comes first [15:09:26] I see more in the log [15:09:29] 6operations, 10Wikimedia-Site-Requests: instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1625642 (10Nemo_bis) [15:09:38] I mean, there were 64 of them, that's the only one I see [15:09:45] 6operations, 10Wikimedia-Site-Requests: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1625644 (10Nemo_bis) [15:10:08] ahh [15:10:12] so mw2187 is a scap proxy [15:10:22] C4 codfw [15:10:39] if you fail to connect to that, likely quite a few others will fail, you'd think [15:10:41] is bd808 around? [15:10:42] So if that's down, then lots of things fail, yeah [15:10:46] * marktraceur wonders. [15:11:00] at least it's codfw and won't be serving real user traffic yet [15:11:05] True. [15:11:08] o/ [15:11:09] Should I carry on? [15:11:41] if we've got a downed proxy it should be removed from the dsh group file [15:11:48] (03CR) 10Eevans: cassandra: install certs and CA from private.git (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/237397 (https://phabricator.wikimedia.org/T108953) (owner: 10Filippo Giunchedi) [15:11:58] 6operations, 10Wikimedia-Site-Requests: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1625654 (10Nemo_bis) [15:12:34] kart_: I assume your patch has propagated to Beta, at least, so please check [15:12:49] krenair@fluorine:/a/mw-log$ grep marktraceur scap.log | grep mw2187 -c [15:12:49] 64 [15:12:59] so it probably is the cause of all 64 apache sync failures [15:13:01] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624124 (10Nemo_bis) [15:13:09] Cool. That's good news. 
[15:14:13] So if you ignored it and went ahead now because that's codfw only, then you'd have to re-run the sync later [15:14:23] So I might as well wait [15:14:25] No problemo. [15:14:26] well [15:14:43] hashar, that is mostly for andrewbogott, which he found issues, wanted to keep you in the loop in case you could be also affected [15:14:51] I logged into that host yesterday. It much have been the one that was missing i18n cache files [15:15:24] yup -- https://tools.wmflabs.org/sal/log/AU-yrXeA1oXzWjit5e34 [15:15:25] if it were removed, what would those 64 (now 63?) apaches sync from? another nearby proxy? [15:15:27] tin? [15:15:48] they would pull from the "next closest" [15:15:56] probably something else in codfw [15:16:02] jynus: yeah and that is an excellent idea [15:16:35] scap tries the whole list with increasing hop count and picks the first to respond [15:16:39] jynus: my database knowledge for the last year is resumed by: echo $question | pick_favorite_dba [15:16:53] hashar, the summary is wikis' defaults, which we use, may not be liked by many other applications [15:17:15] even if its the application's fault for not checking it [15:17:18] jynus: due to binary / unicode connection and collations right ? [15:17:22] yep [15:17:52] (03PS4) 10Mdann52: noindex userspace on en wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 [15:17:53] I will probably write a warning somewhere in the wiki for future users [15:17:56] marktraceur: thanks! [15:18:13] bd808: So why didn't the timeout cause it to use some other proxy? [15:18:33] I can actually log into mw2187... 
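[Editorial sketch] The proxy selection bd808 describes at 15:16:35 (try the whole list in order of increasing hop count and take the first host that responds) might look roughly like the following. This is not scap's actual code, which the log does not show: the `(hostname, hop_count)` shape, the function name `pick_proxy`, and the `responds` probe are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical data shape: (hostname, hop_count) pairs. scap's real
# data structures are not shown in this log.
Candidate = Tuple[str, int]


def pick_proxy(candidates: Iterable[Candidate],
               responds: Callable[[str], bool]) -> Optional[str]:
    """Return the nearest candidate that answers the probe, else None."""
    # Sort by hop count so the closest hosts are probed first.
    for host, _hops in sorted(candidates, key=lambda c: c[1]):
        if responds(host):
            return host
    return None


if __name__ == "__main__":
    # Placeholder hostnames; the probe pretends the nearest proxy is down.
    hosts = [("proxy-far.example", 3),
             ("proxy-near.example", 1),
             ("proxy-mid.example", 2)]
    print(pick_proxy(hosts, lambda h: h != "proxy-near.example"))
```

If the probe is only a TCP connect to the SSH port, as the later discussion suggests, a proxy can pass the probe while its rsync port is still firewalled, which would explain why no other proxy was chosen here.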
[15:18:33] (03CR) 10Mdann52: "Apologies - I was unaware of the exact requirements - I noticed the first earlier, but I didn't get a chance to update until now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 (owner: 10Mdann52) [15:18:39] it's a tcp connect check which apparently worked [15:18:58] marktraceur, try syncing again [15:19:03] Yessir [15:19:26] see if it works this time, because I shouldn't be able to connect and log in if mwdeploy can't [15:19:34] I suppose at some point we should integrate scap pystuff and all of that in a single monster [15:19:41] Looks....similar [15:19:42] I can log in from tin and direct but `last` didn't show mwdeploy until I just did it manually [15:19:46] I'm at 64 remaining [15:20:10] oh, hang on [15:20:36] did moritz just firewall this? [15:20:39] * marktraceur is [15:21:44] !log marktraceur@tin Synchronized wmf-config/: [SWAT] Attempting another sync to mw2187 hoping it's up now (duration: 02m 22s) [15:21:46] Krenair: Talking about the patch re: "ferm" about two hours ago? [15:21:48] 7Puppet, 10Continuous-Integration-Config, 6Scrum-of-Scrums, 5Patch-For-Review: Setup rubocop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1625716 (10akosiaris) >>! In T102020#1567220, @zeljkofilipin wrote: > @hashar: any idea on which folders contain third party code?... [15:21:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:21:53] (same result, by the way) [15:22:31] 6operations, 10ops-codfw, 5Patch-For-Review: rack & initial setup of elastic2001-2024 - https://phabricator.wikimedia.org/T111080#1625720 (10fgiunchedi) all machines provisioned, puppet keys not signed yet. I have a couple of takeaways: * legacy bios should be enabled by default from the vendor not UEFI * u... 
[15:22:54] I can ssh as mwdeploy from tin to mw2187 [15:22:54] Krenair: yes https://gerrit.wikimedia.org/r/#/c/237370/1/manifests/site.pp,unified [15:23:19] do we have ferm rules for rsync? [15:23:36] moritzm, hey [15:23:42] :| [15:23:49] That doesn't bode well [15:24:42] so maybe ferm isn't playing nicely with the proxy [15:25:21] how do the proxies actually work bd808? just like bastions in terms of ssh config? [15:25:26] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1624124 (10Nemo_bis) We now have each component of the issue tracked, except the search bar thing. >>! In T112069#1624911, @Krenair wrote: > The search suggestions show... [15:25:47] 6operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-Requests: Rename Võro Wikipedia, fiu-vro -> vro - https://phabricator.wikimedia.org/T31186#1625748 (10Amire80) >>! In T31186#1625347, @Krenair wrote: >>>! In T31186#1624273, @Malafaya wrote: >> I wonder what's up with these rename requests that takes... [15:25:51] keyholder isn't running there with the mwdeploy key is it? [15:25:53] hey, is the deployment still ongoing? (still going poorly?) [15:26:03] !log started hadoop decomission of analytics1016 [15:26:04] ssh in to run a fetch; rsync protocol from clients [15:26:09] MatmaRex: codfw is codfwcked [15:26:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:26:32] the ssh all comes from tin [15:26:33] bd808, right... and it's the actual hosts failing to connect to the proxy, right? [15:26:45] (i would like to have https://gerrit.wikimedia.org/r/#/c/237401/ deployed, if you manage to uncodfwck it) [15:26:45] right via rsync [15:26:54] so maybe we're missing ferm rules for rsync to the proxy? [15:26:56] so it seems like 873 on mw2187 is unreachable from mw2188 so rsync ferm rules, I guess. 
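[Editorial sketch] The check behind "873 on mw2187 is unreachable from mw2188" can be reproduced with a plain TCP connect test. The helper below is a minimal sketch using only the Python standard library; the function name `port_reachable` and the 3-second default timeout are assumptions, not anything the channel participants ran.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 22 is SSH, 873 is the default rsync daemon port.
    for port in (22, 873):
        print(port, port_reachable("mw2187.codfw.wmnet", port))
```

Run from a codfw app server, a result of True for port 22 and False for port 873 would match what the channel observed: SSH logins work while rsync fails. A firewall that silently drops packets tends to show up as a timeout (error 110, as in the rsync output above) rather than a refusal (error 111).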
[15:26:57] moritzm, hey [15:26:57] * MatmaRex watches [15:27:01] MatmaRex: Noted [15:27:09] I'll probably be doing that during our standup. :) [15:27:40] Let this be a lesson to me, never think to yourself "Oh, two patches, I can finish that SWAT in 30 minutes" [15:27:43] * bd808 is trying to find the puppet bits that setup the proxies [15:28:42] why is mw2187 a scap proxy while being commented from mediawiki-installation? [15:28:48] is that ok? [15:28:55] $scap_proxies = hiera('dsh::config::scap_proxies',[]) [15:29:05] two different lists :/ [15:30:42] rsync module defined in puppet/modules/scap/manifests/proxy.pp; applied from puppet/manifests/role/mediawiki.pp if $::fqdn in hiera('dsh::config::scap_proxies',[]) [15:30:50] I don't see any ferm for it at all [15:31:27] ::role::mediawiki::common needs ferm in the if member($scap_proxies, $::fqdn) {...} block [15:31:46] hmm, there was a ferm rule in modules/rsync/manifests/server.pp [15:32:44] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1625829 (10Krenair) >>! In T112069#1625736, @Nemo_bis wrote: > We now have each component of the issue tracked, except the search bar thing. > >>>! In T112069#1624911, @... [15:34:02] hey roots! we need some help to fix scap. broken in codfw due to missing ferm rules for rsync [15:35:00] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1625859 (10Nemo_bis) [15:36:51] bd808, Krenair, I'm going to do the other patches and just come back and scap later [15:36:59] ok [15:37:09] it looks like it is just codfw right? [15:37:15] if so yeah move along [15:38:21] Hi people [15:38:34] Has there been scheduled maintance today? [15:38:41] akosiaris: looks like moritzm has a patch ready in gerrit -- https://gerrit.wikimedia.org/r/#/c/237087/ [15:38:50] ShakespeareFan00: what's broken? 
[15:39:03] Early I've been getting some very odd and intermittent issues in accessing sites [15:39:27] bd808: yes that one looks good. let me merge it [15:39:32] as in click the link in a Google search and the site fails to load at all (nothing , no 404) and no page referesh [15:39:35] Phabricator got updated and I synced a config patch, ShakespeareFan00, nothing else I think [15:39:48] codfw has troubles but it's not serving pages yet (I thought) [15:39:57] That wouldn't cause non access of an entire Wikimedia site though [15:40:08] and links from Google to hang Firefox [15:40:19] codfw caches are serving pages to some parts of the US [15:40:33] A couple of days agao I was having a problem withh the logo on Wikipedia non appearing... [15:40:34] but not mw server AFAIK [15:40:41] hmmm, firefox has a problem with redirect loops on pages with ' some ppl are reporting on WP:VP/T [15:40:54] yes [15:40:56] I've heard a similar report about particular links causing a redirect loop in FF [15:41:00] thedj: https://gerrit.wikimedia.org/r/#q,237405,n,z [15:41:05] somebody please deploy that [15:41:09] if it's possible [15:41:15] (03PS2) 10Alexandros Kosiaris: Add ferm rules for scap proxy [puppet] - 10https://gerrit.wikimedia.org/r/237087 (owner: 10Muehlenhoff) [15:41:29] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add ferm rules for scap proxy [puppet] - 10https://gerrit.wikimedia.org/r/237087 (owner: 10Muehlenhoff) [15:41:31] i just put it in the table for the current SWAT. are we still swatting? [15:41:53] (03CR) 10Alexandros Kosiaris: "Merging as people complained about scap not working in codfw. LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/237087 (owner: 10Muehlenhoff) [15:42:11] Also from -tech : https://support.mozilla.org/de/questions/1038557 [15:42:41] marktraceur: what is the status of this SWAT? [15:43:26] MatmaRex: I'm syncing the offset patch, yours is up next [15:43:35] akosiaris: thanks. 
can you force puppet on the codfw hosts in https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/dsh/config.yaml [15:43:36] codfw is failing but I'm accepting that as fate [15:43:43] ah. okay, thanks [15:43:46] bd808: yes I was about to [15:43:51] my hero :) [15:44:08] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [15:44:25] So what's causing link failures? [15:44:42] (and intermittent images) [15:45:30] !log marktraceur@tin Synchronized php-1.26wmf22/extensions/UploadWizard/resources/transports/mw.FormDataTransport.js: [SWAT] [wmf22] Always set 'offset' with chunked uploads, even for first chunk (offset == 0) (duration: 02m 21s) [15:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:45:51] Oh, MatmaRex, is your patch going to fix ShakespeareFan00's problem? [15:46:02] what is ShakespeareFan00's problem? [15:46:21] 2015-09-10 - 10:33:06 and links from Google to hang Firefox [15:46:26] Complete link failure with no obviosu page change when accessing some Wikimedia sites [15:46:41] probably not? unless the pages have an apostrophe in the titles, in which case probably yes [15:47:02] en.wikiversity.org doesn't have an apostrphe annd it wasn't accessing [15:47:13] The fault seemt to be intermittent [15:47:16] the bug i'm fixing is that you get a "redirect loop" error on some versions of Firefox when trying to view a page with an apostrophe in the title [15:47:27] anything else is not my fault :) [15:47:34] But the pages I'm viewing don't have apostrophes... [15:47:45] then you likely have a different problem :) [15:47:49] Still puzzled why it works on IE, but not intermittently in fieforx [15:47:55] Ah well, I was hopeful for nothing [15:48:01] thedj: OK So what broke? 
[15:50:17] 'some wikimedia sites' [15:50:23] if we knew what's broken, it would be already fixed :P [15:51:11] thedj: Namely "Wikisource" annd "Wikiversity" currently [15:51:58] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [15:52:12] bd808: I suppose scap is behaving fine now ? [15:52:44] marktraceur: ^ any news good or bad for akosiaris? [15:53:04] No news right now. I'll try in a sec. [15:53:16] no news is good news!! [15:53:17] Waiting for a patch to merge. [15:53:32] if only that was the case right now [15:53:53] ShakespeareFan00: i have no problems with FF + google + those web properties [15:53:55] I could try touching something and syncing it if you want, but my patch should be merged in a matter of minutes [15:54:15] That makes it all the more frustratring [15:54:39] 6operations, 6Labs, 10Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1625979 (10ArielGlenn) Authentication errors: finally found one at least of the causes. there's some cleanup script that deletes certain salt keys if they don't match a particular... [15:54:46] thedj: As it's ann intermittent problem [15:55:09] (03PS1) 10RobH: I'm no longer on vacation. [puppet] - 10https://gerrit.wikimedia.org/r/237409 [15:55:09] flapping route somewhere? [15:55:25] bd808: maybe [15:55:36] !log restbase disabled puppet on rb100x [15:55:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:55:46] akosiaris: YuviPanda: ^^ [15:55:53] ok [15:55:53] (03PS2) 10RobH: RobH back into pager contacts. 
[puppet] - 10https://gerrit.wikimedia.org/r/237409 [15:56:15] but not sure how to prove it conclusively as it's intermittennt [15:56:25] (03CR) 10Krinkle: [C: 031] varnish: standardize/de-duplicate do_gzip [puppet] - 10https://gerrit.wikimedia.org/r/237366 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [15:56:53] (03CR) 10Mobrovac: [C: 031] "I have disabled puppet on rb100x, so we can go ahead, merge it and run puppet in staging" [puppet] - 10https://gerrit.wikimedia.org/r/237388 (https://phabricator.wikimedia.org/T111897) (owner: 10Alex Monk) [15:57:07] YuviPanda: Looks like you have a patch for puppet SWAT, so just fyi I'm still waiting on one patch [15:57:13] 6operations, 6Labs, 10Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1626007 (10ArielGlenn) Note that reauth of all minions after key rotation takes a little time, this is by design. you don't want several hundred or a thousand hosts all trying to do... [15:57:19] Syncing soon here [15:57:37] OK did something change in the last few seconds? [15:57:40] (03PS3) 10RobH: RobH back into pager contacts. [puppet] - 10https://gerrit.wikimedia.org/r/237409 [15:57:46] marktraceur: ok, I'll wait for you to be done [15:57:56] because styles definitly just broke for me [15:58:00] on wikisource [15:58:01] Ta. [15:58:06] (03CR) 10RobH: [C: 032] RobH back into pager contacts. [puppet] - 10https://gerrit.wikimedia.org/r/237409 (owner: 10RobH) [15:58:34] marktraceur: Found something? [15:58:48] 6operations, 6Labs, 10Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1626023 (10ArielGlenn) oh and in the long term we shouldn't be running such a script once a minute. 
let's delete salt keys on instance deletion etc [15:58:50] ShakespeareFan00: Nothing for you right now, I'm just SWATting [15:59:03] Oh OK [15:59:07] ShakespeareFan00, what happens when you clear your browser cache [15:59:12] apergos: lol [15:59:20] yeah wow [15:59:21] The fault goes away for a time [15:59:34] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: phabricator metrics script should use slave, not master - https://phabricator.wikimedia.org/T111547#1626025 (10Dzahn) 5Open>3Resolved a:3Dzahn applied on iridium now. scripts use slave. i ran one of them and works normal. resolving [15:59:49] but I've been cache clearing on each reccurrence of the glitch which shouldn't be needed (i.e like once ann hour or so) [15:59:57] ShakespeareFan00, do you have any firefox extensions? [16:00:04] YuviPanda akosiaris: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150910T1600). Please do the needful. [16:00:04] mobrovac: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:07] Hm, scap seems to be hanging. Never seen that before. [16:00:18] jynus: Not that should be enabled ofr Wikimedia sites [16:00:23] marktraceur: still SWATting? [16:00:33] Yeah, just waiting for one last sync... [16:00:34] akosiaris: let's wait for marktraceur to finish swatting [16:00:54] kk [16:01:18] OK here we go, proxies going [16:01:33] bblack: the gzip here, https://gerrit.wikimedia.org/r/#/c/237367/1/modules/role/manifests/cache/misc.pp, would that apply to all requests unconditionally (eg. incl images) [16:01:42] All is good [16:01:49] bblack: It's fine I suppose, just wondering. 
[16:01:56] YuviPanda: If you would be a dear and run scap when you're done, that's all I have remaining to do [16:02:01] !log marktraceur@tin Synchronized php-1.26wmf22/: [SWAT] [wmf22] Revert opera redirect loop fix that caused redirect loops in Firefox (duration: 02m 30s) [16:02:08] All yours [16:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:02:10] ShakespeareFan00, what if you browse in icognito mode? [16:02:13] MatmaRex: Can you make sure it's working now? [16:02:16] marktraceur: not it :D but puppetswat shouldn't do anything with swat [16:02:21] marktraceur: sorry for the interruption [16:02:22] marktraceur: thanks, i can view https://www.mediawiki.org/wiki/Manual:Chris_G's_botclasses using Firefox 3.6 again [16:02:23] I don't have incognitio mode in Firefox [16:02:27] Nemo_bis: around? can you verify? [16:02:37] ShakespeareFan00, which version are you using of firefox? [16:02:37] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 1 below the confidence bounds [16:02:42] YuviPanda: Mind if I run a scap just to bring codfw into the fold, then? [16:02:47] marktraceur: sure [16:03:07] MatmaRex: confirmed; thanks a lot [16:03:07] 40.0.3 I think [16:03:12] !log marktraceur@tin Started scap: Make sure codfw got the last few patches sync'd to it [16:03:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:03:33] ShakespeareFan00, well it is not called incognito mode but "new private window" [16:03:45] Still puzzled why this non-link issue is only Wikisource and not the mian Wikipedia [16:03:57] ShakespeareFan00: Different versions today, I think [16:04:06] ? [16:04:18] could be a cache issue, a cooky issue [16:04:20] wikipedia is not running the same version as wikisource at the moment, ShakespeareFan00 [16:04:27] it's planned to update later today [16:04:33] Ah [16:04:39] So I might see a breakage there? 
[16:04:46] I noted the Notifcation box had changed [16:04:49] If it doesn't get fixed in the meantime... [16:04:51] and there was, I think, a javascrip issue at some point in some wikis [16:05:05] at some point in some wikis? :) [16:05:13] more like javascript issues at all points in all wikis [16:05:20] Krenair, yes, I could be more specific [16:05:23] :-) [16:05:30] :D [16:05:43] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1626057 (10matmarex) The patch from {T106793}, which accidentally broke viewing pages with a single quote (apostrophe) in their titles in some Firefox versions, is reverte... [16:05:50] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1626059 (10matmarex) [16:05:53] Wikiversity, in the Project: namespace, with a cache issue. [16:06:08] * marktraceur starts designing Wikimedia Cluster Clue [16:06:30] 6operations, 6Labs, 10Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1626065 (10ArielGlenn) --rotate-aes-key this option to salt-key -d in the script will stop the rotations in case/when this comes up again. the quickie fix. [16:06:31] Sorry to useless at bug pinpointing [16:06:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [16:09:38] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Instability on fr.wikiversity server - https://phabricator.wikimedia.org/T112069#1626081 (10matmarex) [16:10:41] 6operations, 6Labs, 10Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1626084 (10akosiaris) OK, intrigued. Which script ? 
[16:10:48] !log marktraceur@tin Finished scap: Make sure codfw got the last few patches sync'd to it (duration: 07m 36s) [16:10:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:11:36] marktraceur: scap's done I suppose. YuviPanda I think we can proceed with puppetswat [16:11:38] OK then! [16:11:41] thanks! [16:11:48] akosiaris: 100% perfect, thanks for fixing [16:11:51] akosiaris: yup! wanna do the merges? :) [16:11:56] mobrovac: moving forward with puppetswat [16:12:14] akosiaris: gimme 5 min please [16:12:23] (03PS3) 10Alexandros Kosiaris: Add gom, lrc and azb wikipedias to restbase [puppet] - 10https://gerrit.wikimedia.org/r/237388 (https://phabricator.wikimedia.org/T111897) (owner: 10Alex Monk) [16:12:29] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add gom, lrc and azb wikipedias to restbase [puppet] - 10https://gerrit.wikimedia.org/r/237388 (https://phabricator.wikimedia.org/T111897) (owner: 10Alex Monk) [16:12:59] mobrovac: ok, stalling on the merge [16:13:06] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [16:13:26] !log started puppetSWAT [16:13:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:14:53] akosiaris, wanna do it yourself, as I did 2 on Tuesday? [16:15:12] you probably won't sweat [16:15:18] jynus: yeah I am already on it [16:15:22] :-) [16:17:16] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [16:17:21] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: phabricator metrics script should use slave, not master - https://phabricator.wikimedia.org/T111547#1626139 (10jcrespo) Thank you, @Dzahn, didn't had the time! Thanks a lot! 
[16:19:16] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:19:17] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [16:19:57] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [16:20:19] 2? which is the other one? [16:20:30] thanks jynus and akosiaris :) [16:21:05] so not thank me, I did absolutly nothing [16:21:10] *do [16:21:11] jynus: robh key [16:21:17] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [16:21:20] ah! [16:21:28] I 'll merge them both once mobrovac gives me an OK [16:22:23] ack [16:22:24] sorry [16:22:39] akosiaris: sorry about that, thx for merging [16:23:00] robh: no worries. and wb! [16:23:26] one of these days ill learn to put a vacation day at the end of vacation days. i wanted to sleep in so bad! thx =] [16:23:37] yay [16:23:48] heh [16:26:12] (03PS1) 10Alexandros Kosiaris: WIP: just testing something [puppet] - 10https://gerrit.wikimedia.org/r/237412 [16:26:19] mobrovac: any luck ? [16:27:37] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [16:29:56] (03CR) 10Dzahn: "it's Thursday now. 
re: "-1 until the 3 day-wait rule (expires on Thursday)."" [puppet] - 10https://gerrit.wikimedia.org/r/236793 (https://phabricator.wikimedia.org/T111756) (owner: 10Dzahn) [16:30:20] akosiaris: i'm back [16:30:22] sorry, meeting [16:30:25] (03PS1) 10Jcrespo: Depool es1001 for decommision; increase weight of es1015 and es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237414 (https://phabricator.wikimedia.org/T105843) [16:30:25] go ahead [16:30:51] jynus: ^ is your -1 not valid anymore now? it's Thursrday [16:30:53] akosiaris: once merged, could you force a puppet run on staging? [16:31:33] (03CR) 10Jcrespo: [C: 031] admin: add user for addshore [puppet] - 10https://gerrit.wikimedia.org/r/236793 (https://phabricator.wikimedia.org/T111756) (owner: 10Dzahn) [16:31:38] mobrovac: yeah sure [16:31:44] jynus: thx [16:31:45] mutante, will merge later [16:31:49] ok [16:32:02] I think there is another related one [16:32:06] had to check [16:32:28] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [16:32:50] (03PS2) 10Filippo Giunchedi: cassandra: install certs and CA from private.git [puppet] - 10https://gerrit.wikimedia.org/r/237397 (https://phabricator.wikimedia.org/T108953) [16:33:19] mobrovac: merged. you should be ok to go [16:33:33] thnx akosiaris [16:33:57] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [16:34:10] Krinkle: no, it only applies to the mime-types matching the regex [16:34:27] (the final pending patch to that is here: https://gerrit.wikimedia.org/r/#/c/237369/) [16:38:05] akosiaris: yupi, works on staging, out of precaution will enable puppet only on rb1001 and then try there [16:40:41] Who would be the best person to talk to about how Wikimedia uses Grafana? [16:40:53] (03PS1) 10Andrew Bogott: Designate/pdns changes for kilo. [puppet] - 10https://gerrit.wikimedia.org/r/237417 [16:41:42] walkeran: godog probably [16:41:58] Perfect. Thanks! 
[16:44:17] (03CR) 10Yuvipanda: [C: 04-1] Designate/pdns changes for kilo. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/237417 (owner: 10Andrew Bogott) [16:44:38] walkeran: hashar (not here atm) and addshore have also been peeking at it [16:45:48] YuviPanda: Gotcha. Thanks again [16:49:12] bblack: aha, that parameter materialises as @vcl_config.fetch("do_gzip", .. ) it doesn't set beresp.do_gzip itself [16:49:27] !log restbase enabled puppet on rb100x [16:49:32] (03PS2) 10Andrew Bogott: Designate/pdns changes for kilo. [puppet] - 10https://gerrit.wikimedia.org/r/237417 [16:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:50:14] akosiaris: YuviPanda: patch applied, restarting rb on rb100x and then we should be done [16:50:25] !log restbase rolling restart of rb100x [16:50:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:50:41] mobrovac: in the future, if you have conflicts during the puppetswat window, you shouldn't put it for puppetswat :) [16:50:46] akosiaris: ^ [16:51:06] YuviPanda: yes, i know, sorry for the confusion [16:51:25] (03CR) 10Krinkle: [C: 031] Compress js (and other text) in varnish [puppet] - 10https://gerrit.wikimedia.org/r/237369 (https://phabricator.wikimedia.org/T109040) (owner: 10BBlack) [16:51:59] YuviPanda: ok [16:52:07] !log puppetswat done [16:52:12] did puppet swat en- [16:52:14] oh [16:52:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:52:16] ok [16:52:44] Krenair: your rb patch got through though [16:52:45] thnx [16:53:48] 6operations, 6Discovery: Fix CirrusSearch monitoring - https://phabricator.wikimedia.org/T84163#1626330 (10Deskana) a:5Manybubbles>3None [16:55:02] mobrovac, when will restbase apply it? [16:56:16] Krenair: restarting RB as we speak [16:56:22] ok [16:58:21] mobrovac: ping me too when done. [16:58:34] kk [17:06:41] akosiaris: YuviPanda: restart done, all good, thnx a lot! 
[17:06:55] Krinkle: kart_: done, these are now available via restbase [17:06:56] mobrovac: ok thanks [17:07:09] (03CR) 10Yuvipanda: "Services look familiar." [puppet] - 10https://gerrit.wikimedia.org/r/237417 (owner: 10Andrew Bogott) [17:07:19] Krenair: I assume that was for you ^ [17:07:38] yep, thanks [17:08:34] ah, right, sorry Krinkle (hitting tab too quickly apparently) :P [17:09:38] (03PS1) 10Ottomata: Move hdfs balancer cron command into a script to be sure stdout and stderr are properly redirected [puppet] - 10https://gerrit.wikimedia.org/r/237423 [17:10:36] (03PS2) 10Ottomata: Move hdfs balancer cron command into a script to be sure stdout and stderr are properly redirected [puppet] - 10https://gerrit.wikimedia.org/r/237423 [17:13:10] (03CR) 10Ottomata: [C: 032] Move hdfs balancer cron command into a script to be sure stdout and stderr are properly redirected [puppet] - 10https://gerrit.wikimedia.org/r/237423 (owner: 10Ottomata) [17:14:48] (03PS3) 10Andrew Bogott: Designate/pdns changes for kilo. [puppet] - 10https://gerrit.wikimedia.org/r/237417 [17:17:11] (03CR) 10Yuvipanda: [C: 031] "That should work!" [puppet] - 10https://gerrit.wikimedia.org/r/237417 (owner: 10Andrew Bogott) [17:19:14] (03PS4) 10Andrew Bogott: Designate/pdns changes for kilo. [puppet] - 10https://gerrit.wikimedia.org/r/237417 [17:20:19] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [17:20:33] (03CR) 10Andrew Bogott: [C: 032] Designate/pdns changes for kilo. [puppet] - 10https://gerrit.wikimedia.org/r/237417 (owner: 10Andrew Bogott) [17:23:43] mobrovac: Thanks! [17:23:51] mobrovac: seems working now. [17:23:57] nice! [17:28:11] Thanks Krenair for the patch :) [17:28:39] heya, need deployment, tin, salt, network.pp advice [17:28:46] who should I Piiinng... [17:28:47] ? [17:28:48] hm. [17:28:58] akosiaris: maybe? [17:31:06] ottomata, deployment+tin? what's up? 
[17:31:24] can't help with salt/network.pp though [17:31:46] so, apache on tin restricts requests via Order allow,deny [17:31:47] allow from ... [17:31:48] ottomata: wassup ? [17:32:02] eventlogging is deployed to a few places, one of which includes hafnium.wikimedia.org [17:32:04] ottomata, yes? [17:32:04] which has a public IP [17:32:13] $deployable_networks = $::network::constants::deployable_networks [17:32:26] $deployable_networks = [ [17:32:27] $mw_appserver_networks, [17:32:27] $analytics_networks, [17:32:27] ] [17:32:27] } [17:32:59] oh, right, you're deploying the eventlogging service rather than mw? [17:33:00] (03PS1) 10Andrew Bogott: Tidy up designate init scripts [puppet] - 10https://gerrit.wikimedia.org/r/237425 [17:33:07] yes [17:33:07] (03CR) 10jenkins-bot: [V: 04-1] Tidy up designate init scripts [puppet] - 10https://gerrit.wikimedia.org/r/237425 (owner: 10Andrew Bogott) [17:33:16] and, eventlogging hasn't properly deployed to hafnium since may [17:33:22] i don't see what changed made it stop working [17:33:27] but we need to be able to deploy there [17:33:34] (03PS2) 10Andrew Bogott: Tidy up designate init scripts [puppet] - 10https://gerrit.wikimedia.org/r/237425 [17:33:41] i'd rather not hardcode hafnium's IP into the list of deployable networks, but i'm not sure what else to do [17:33:47] when i try to pull from hafnium, i get [17:34:00] [client 2620:0:861:3:7a2b:cbff:fe1f:1748] client denied by server configuration: [17:34:06] in tin's apache logs [17:34:32] maybe it is using ipv6 instead of ipv4? [17:34:42] well, there are IPv6 networks that are allowed [17:34:44] just not public ones [17:34:47] (03CR) 10Andrew Bogott: [C: 032] Tidy up designate init scripts [puppet] - 10https://gerrit.wikimedia.org/r/237425 (owner: 10Andrew Bogott) [17:35:20] why not add hafnium to $analytics_networks, if it is part of that? 
[17:36:08] cause hafnium is not analytics [17:36:19] as in it is not in the analytics networks [17:36:27] not in those VLANs [17:36:52] ottomata: I am looking into it btw [17:37:56] while aleks checks it, I will see what we do with addshore :-) [17:37:57] ottomata: so public1-b-eqiad has been whitelisted because of silver from what I see [17:38:14] adding public1-c-eqiad does not seems like a bad idea [17:38:21] I am liking it [17:38:32] lemme create a patch [17:39:03] (03PS1) 10Andrew Bogott: Make config and logfile explicit for designate services [puppet] - 10https://gerrit.wikimedia.org/r/237426 [17:39:39] jynus: so i think we can just merge that one that creates his user, and then we still need new changes to be the actual access request that add him to the right groups [17:39:56] and which groups these are is maybe defined already but maybe not [17:40:01] ok, mutante let's do in 2 steps [17:40:06] yes, it is defined [17:40:08] btw [17:40:10] $deployable_networks = [ [17:40:10] $mw_appserver_networks, [17:40:10] $analytics_networks, [17:40:11] ] [17:40:14] this I dislike [17:40:21] (03CR) 10Andrew Bogott: [C: 032] Make config and logfile explicit for designate services [puppet] - 10https://gerrit.wikimedia.org/r/237426 (owner: 10Andrew Bogott) [17:40:22] the first element is based on a software [17:40:26] the second on a team [17:40:32] jynus: this one is like a pre-requisite for 2 tickets [17:40:34] It's back [17:40:35] not the best way to create infrastructure [17:40:43] mutante, yes [17:40:47] The non-link issue has reccured again [17:40:53] This time on Wikipedia itselg [17:40:58] *itself [17:41:15] A few days ago prior to the most recent upgrade it was OK [17:41:19] So what chanaged? [17:41:25] look at the list [17:41:38] jynus: the ssh key is not particularly long, but i dont think we have strict rules for that [17:41:39] Krenair : which list? [17:41:51] the list of commits in wmf22? 
[17:42:05] Well what ever changed , it broke [17:42:14] although wikipedia is not on wmf22 yet [17:42:20] (03PS1) 10Andrew Bogott: Typo fixes [puppet] - 10https://gerrit.wikimedia.org/r/237427 [17:42:22] Lots of things change all the time, ShakespeareFan00. [17:42:25] Yeah [17:42:38] It's likely that only one of them would be to blame. [17:42:42] but until a few days ago I wasn't getting this intermittent link failure issue [17:42:46] (and the missing logo) [17:43:09] So far I've tried cache clearing about eveyr hour or so [17:43:13] (03CR) 10Andrew Bogott: [C: 032] Typo fixes [puppet] - 10https://gerrit.wikimedia.org/r/237427 (owner: 10Andrew Bogott) [17:43:20] which I am starting to find rathe tiresome [17:43:45] would like to make that "openstack: typo fixes" or so [17:43:48] akosiaris: not a team [17:43:54] as i am not having issues with non-wikimedia sites... [17:43:57] its based on a cluster [17:44:02] mutante, how did you determined the uid? [17:44:04] of nodes for a purpose [17:44:08] andrewbogott: you should prefix commit messages with the module they're doing it to :) so 'openstack: Typo fixes' [17:44:22] ottomata: not making it much better. still inconsistent [17:44:27] yeah, that’s fair [17:44:35] jynus: i went to terbium.eqiad.wmnet and ran "ldaplist -l passwd addshore" [17:44:47] akosiaris: you don't liek the deployable_networks part, or the fact that there is an $analytics_networks [17:44:48] ? [17:45:05] of I am fine with $analytics_networks [17:45:20] it 's the mixing of mw networks and analytics network in deployable_networks [17:45:27] jynus: we try to always use the existing uidNumber from labs to match those up. if a user wants production but doesnt have labs (rare) we even ask them to quickly create a labs user for that [17:45:47] yes, makes sense, mutante [17:45:59] oh ok [17:46:00] i see [17:46:10] uhhh not sure how else to do that though [17:46:17] you think we should make more network vars? [17:46:23] yup... 
I am thinking about it [17:46:46] messing with network.pp is so often unclear... [17:46:50] yeah [17:46:53] what to do ... what to do .. [17:47:05] i mean, i think just leave it as it is [17:47:17] it is fine really. it'd be a pain if we had to update network.pp anytime a node purpose changes [17:47:18] or we add a new one [17:47:42] indeed [17:47:54] the restriction here just keeps random nodes from pulling from git [17:47:56] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1626647 (10Addshore) I guess an issue with grafana however is that the currently access policy means that anyone can write... [17:47:56] sorry [17:47:56] from tin. [17:48:25] we also have the ticket formerly known as "kill network.pp" [17:48:35] so if there is a way to move stuff _out_ of that, that would be good [17:48:55] recently i could remove like 10 lines at least :p [17:49:29] we 'll never fully kill network.pp unfortunately [17:49:32] (03PS1) 10Andrew Bogott: Move labvirt1004 to kilo [puppet] - 10https://gerrit.wikimedia.org/r/237431 [17:49:45] but yeah, a lot of stuff can move out of it [17:49:47] yea, i saw it got renamed to "where possible" and "as much as we can" [17:51:03] (03CR) 10Andrew Bogott: [C: 032] Move labvirt1004 to kilo [puppet] - 10https://gerrit.wikimedia.org/r/237431 (owner: 10Andrew Bogott) [17:51:51] (03PS1) 10Legoktm: Remove legoktm from integration shinken alerts [puppet] - 10https://gerrit.wikimedia.org/r/237432 [17:52:24] YuviPanda: ^ plz [17:52:39] legoktm: debugging some ORES stuff, poke me in 1h [17:52:56] ok [17:53:09] ottomata: I think I got an idea [17:53:27] this is actual "gerrit spam" as opposed to normal review requests: "(Abandoned) Umherirrender: fdsafdsa [sandbox] .. (owner: Gerrit Patch Uploader) ". patch uploader makes it easy but there are 2 sides to that [17:53:50] akosiaris: oh? 
[17:55:03] mutante: those look like valhallasw testing the patch uploader :P [17:55:20] legoktm: oh, i thought he was just cleaning up spam [17:58:54] (03PS1) 10Yuvipanda: celery: Allow customizing loglevels [puppet] - 10https://gerrit.wikimedia.org/r/237434 [17:59:22] (03PS2) 10Yuvipanda: celery: Allow customizing loglevels [puppet] - 10https://gerrit.wikimedia.org/r/237434 [18:00:05] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150910T1800). [18:00:26] (03PS1) 10Jcrespo: addshore groups: bastion, analytics-privatedata and mw-log-readers [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) [18:01:09] (03CR) 10Jcrespo: [C: 04-1] "This patch depends on gerrit:236793" [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) (owner: 10Jcrespo) [18:01:20] (03CR) 10Yuvipanda: [C: 032] celery: Allow customizing loglevels [puppet] - 10https://gerrit.wikimedia.org/r/237434 (owner: 10Yuvipanda) [18:01:29] (03CR) 10jenkins-bot: [V: 04-1] addshore groups: bastion, analytics-privatedata and mw-log-readers [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) (owner: 10Jcrespo) [18:02:26] nice that it checks that for me :-) [18:04:50] RECOVERY - RAID on ms-be1010 is OK: OK: optimal, 13 logical, 13 physical [18:05:21] jouncebot: next [18:05:21] In 4 hour(s) and 54 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150910T2300) [18:05:26] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1626713 (10Dzahn) We have a ticket at T112135 asking about phabricator dumps being disabled since Aug 28. Looking at that i saw the cronjob for there has been di... 
[18:06:11] (03PS1) 10Yuvipanda: celery: Because puppet is possessive of parameters named loglevel [puppet] - 10https://gerrit.wikimedia.org/r/237441 [18:06:17] (03CR) 10jenkins-bot: [V: 04-1] celery: Because puppet is possessive of parameters named loglevel [puppet] - 10https://gerrit.wikimedia.org/r/237441 (owner: 10Yuvipanda) [18:07:02] twentyafterfour: would it be possible to sneak https://gerrit.wikimedia.org/r/237440 in your train deploy? if not I can do it afterwards [18:07:04] (03PS2) 10Yuvipanda: celery: Because puppet is possessive of parameters named loglevel [puppet] - 10https://gerrit.wikimedia.org/r/237441 [18:07:18] (03CR) 10Yuvipanda: [C: 032 V: 032] celery: Because puppet is possessive of parameters named loglevel [puppet] - 10https://gerrit.wikimedia.org/r/237441 (owner: 10Yuvipanda) [18:07:35] 6operations, 10ops-eqiad: ms-be1010.eqiad.wmnet: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T111553#1626719 (10Cmjohnson) 5Open>3Resolved Swapped the disk and added back. Enclosure Device ID: 32 Slot Number: 5 Drive's position: DiskGroup: 13, Span: 0, Arm: 0 Enclosure position: 1 Device Id:... [18:09:01] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1626723 (10mmodell) When we had issues before, I noticed heavy search engine indexer activity. I committed some changes to the robots.txt to exclude a lot of URL... 
[18:09:31] (03PS1) 1020after4: wikipedia wikis to 1.26wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237443 [18:09:39] !log reseating pem2 cr2-eqiad [18:09:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:09:47] legoktm: sure [18:09:59] (03PS1) 10Andrew Bogott: Labs: Move labvirt1009 to kilo and add to the scheduler pool [puppet] - 10https://gerrit.wikimedia.org/r/237444 [18:10:54] twentyafterfour: i also have something for wikidata (would be nice not to wait until swat at 1am) [18:10:55] 6operations, 10fundraising-tech-ops: build libanon package for trusty - https://phabricator.wikimedia.org/T110739#1626728 (10Jgreen) >>! In T110739#1585629, @Jgreen wrote: > Doesn't compile: > > checking whether to build shared libraries... yes > checking whether to build static libraries... yes > ./configure... [18:11:11] aude: ok, link? [18:11:47] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1626731 (10jcrespo) @Dzahn, very unlikely, but still, dumps should be done on the slave (they create more overhead than statistics). We can talk about that on tha... [18:11:48] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: puppet fail [18:12:03] twentyafterfour: thanks! 
[18:12:12] (03PS2) 10Andrew Bogott: Labs: Move labvirt1009 to kilo and add to the scheduler pool [puppet] - 10https://gerrit.wikimedia.org/r/237444 [18:12:49] link coming in a minute [18:12:55] 6operations, 10fundraising-tech-ops: package udp-filter for Trusty, for use on fundraising banner_logger - https://phabricator.wikimedia.org/T110592#1626741 (10Jgreen) [18:13:09] (03PS1) 10Alexandros Kosiaris: networks::constants: Bring up to par IPv4/IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/237446 [18:13:15] (03CR) 10Andrew Bogott: [C: 032] Labs: Move labvirt1009 to kilo and add to the scheduler pool [puppet] - 10https://gerrit.wikimedia.org/r/237444 (owner: 10Andrew Bogott) [18:13:37] ottomata: ^ [18:14:16] 6operations, 10ops-eqiad: ms-be1010.eqiad.wmnet: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T111553#1626756 (10Cmjohnson) Return Info USPS 9202 3946 5301 2428 7317 75 FEDEX (9611918) 2393026 50223458 [18:14:26] 6operations, 10fundraising-tech-ops: package udp-filter for Trusty, for use on fundraising banner_logger - https://phabricator.wikimedia.org/T110592#1581705 (10Jgreen) >>! In T110592#1620649, @Ottomata wrote: > Ah hm. Ok! Since they work fine there, maybe we can just reprepro copy them into Trusty? I don't... [18:16:01] (03CR) 10Alexandros Kosiaris: [C: 031] networks::constants: Bring up to par IPv4/IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/237446 (owner: 10Alexandros Kosiaris) [18:16:17] ottomata: this should solve your problem. I think it's ok and more consistent than before. 
feel free to merge [18:16:19] 6operations, 7Graphite, 7Monitoring: Restrict edit rights in grafana / enable dashboard deletion - https://phabricator.wikimedia.org/T93710#1626770 (10GWicke) [18:16:40] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1626771 (10jcrespo) The spikes are actually back since the 8th, but not re-surfing to the user because of the patch I mentioned. I would wait for this weekend aga... [18:17:31] 6operations, 10fundraising-tech-ops: package udp-filter for Trusty, for use on fundraising banner_logger - https://phabricator.wikimedia.org/T110592#1626773 (10Jgreen) Package builds for Precise, but explodes in a ball of fire on Trusty in what appears to be an automake forward incompatibility: > Makefile.am:... [18:17:43] (03PS1) 10Yuvipanda: ores: Allow customizing log_level [puppet] - 10https://gerrit.wikimedia.org/r/237447 [18:17:46] legoktm: merged and pulled, should I sync wmf22 and test it somehow before updating wikipedias to the new branch? 
[18:17:48] PROBLEM - puppet last run on db2045 is CRITICAL: CRITICAL: puppet fail [18:17:50] (03PS1) 10Ori.livneh: Put Grafana behind password authentication [puppet] - 10https://gerrit.wikimedia.org/r/237448 [18:18:03] (03PS2) 10Yuvipanda: ores: Allow customizing log_level [puppet] - 10https://gerrit.wikimedia.org/r/237447 [18:18:08] twentyafterfour: if you sync it I can test it on mw.o [18:18:21] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1626778 (10jcrespo) [18:18:32] (03CR) 10Ori.livneh: [C: 032 V: 032] Put Grafana behind password authentication [puppet] - 10https://gerrit.wikimedia.org/r/237448 (owner: 10Ori.livneh) [18:18:33] twentyafterfour: https://gerrit.wikimedia.org/r/#/c/237449/ [18:18:36] (03CR) 10Yuvipanda: [C: 032 V: 032] ores: Allow customizing log_level [puppet] - 10https://gerrit.wikimedia.org/r/237447 (owner: 10Yuvipanda) [18:18:44] so we have to wait for jenkins [18:18:57] (03PS3) 10Yuvipanda: ores: Allow customizing log_level [puppet] - 10https://gerrit.wikimedia.org/r/237447 [18:18:59] (03CR) 10jenkins-bot: [V: 04-1] ores: Allow customizing log_level [puppet] - 10https://gerrit.wikimedia.org/r/237447 (owner: 10Yuvipanda) [18:18:59] * aude links the bugs [18:19:04] (03CR) 10Yuvipanda: [V: 032] ores: Allow customizing log_level [puppet] - 10https://gerrit.wikimedia.org/r/237447 (owner: 10Yuvipanda) [18:20:17] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1597666 (10jcrespo) @Addshore, 3 days have passed, which is the delay requirement by the policy, which means your request is approved. [18:20:28] (03CR) 10MaxSem: "I don't see this variable being used in maps VCL, unlike mobile, text and upload. Also, is it really needed? 
Background: of all data being" [puppet] - 10https://gerrit.wikimedia.org/r/237368 (owner: 10BBlack) [18:20:54] 6operations, 6Parsing-Team, 6Services, 10VisualEditor: [DRAFT] Services team roadmap October - December 2015 (Q2 2015/16) - https://phabricator.wikimedia.org/T111819#1626804 (10GWicke) [18:21:15] 6operations, 6Parsing-Team, 6Services, 10VisualEditor: [DRAFT] Services team goals October - December 2015 (Q2 2015/16) - https://phabricator.wikimedia.org/T111819#1626807 (10GWicke) [18:21:33] (03PS1) 10Yuvipanda: ores: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/237451 [18:21:39] (03CR) 10jenkins-bot: [V: 04-1] ores: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/237451 (owner: 10Yuvipanda) [18:21:50] (03PS2) 10Yuvipanda: ores: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/237451 [18:22:29] (03PS4) 10Jcrespo: admin: add user for addshore [puppet] - 10https://gerrit.wikimedia.org/r/236793 (https://phabricator.wikimedia.org/T111756) (owner: 10Dzahn) [18:23:00] (03CR) 10Jcrespo: [C: 032] admin: add user for addshore [puppet] - 10https://gerrit.wikimedia.org/r/236793 (https://phabricator.wikimedia.org/T111756) (owner: 10Dzahn) [18:23:06] (03CR) 10Yuvipanda: [C: 032] ores: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/237451 (owner: 10Yuvipanda) [18:23:11] aude: no problem, I'm syncing wmf22 in the meantime [18:23:13] !log twentyafterfour@tin Synchronized php-1.26wmf22: deploy https://gerrit.wikimedia.org/r/#/c/237440/ (duration: 01m 42s) [18:23:18] legoktm: it's sync'd [18:23:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:23:22] twentyafterfour: ok [18:23:46] twentyafterfour: confirmed fixed, thanks [18:23:58] YuviPanda, please deploy [18:24:20] jynus: ? 
[18:24:23] np [18:24:33] I had a conflict with you [18:24:40] :-) [18:24:47] (03PS1) 10Andrew Bogott: Openstack nova: Increase the allowed disk ration a bit [puppet] - 10https://gerrit.wikimedia.org/r/237452 [18:24:58] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1626821 (10Dzahn) see this now: https://gerrit.wikimedia.org/r/#/c/237448/ [18:25:08] jynus: aaah,. want me to rebase and +2 yours again? [18:25:11] jynus: I merged mine [18:25:21] I am giving you preference :-) [18:25:25] nice! [18:25:32] then now it is me :-) [18:25:34] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1626830 (10Addshore) heh, well, its now behind LDAP so I guess this can be closed for now? [18:25:35] jynus: :D thanks! [18:25:41] (03PS5) 10Jcrespo: admin: add user for addshore [puppet] - 10https://gerrit.wikimedia.org/r/236793 (https://phabricator.wikimedia.org/T111756) (owner: 10Dzahn) [18:26:52] (03PS2) 10Andrew Bogott: Openstack nova: Increase the allowed disk ratio a bit [puppet] - 10https://gerrit.wikimedia.org/r/237452 [18:27:35] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1626838 (10Dzahn) @ori did this to mitigate a JS injection issue on grafana reported by @Volker_E. {P2006} [18:29:54] (03PS2) 10Jcrespo: addshore groups: bastion, analytics-privatedata and mw-log-readers [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) [18:29:56] (03PS3) 10Andrew Bogott: Openstack nova: Increase the allowed disk ratio a bit [puppet] - 10https://gerrit.wikimedia.org/r/237452 [18:30:18] RECOVERY - puppet last run on db2045 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:31:16] jynus, does https://gerrit.wikimedia.org/r/#/c/237453/1/db_patches/patch-reference_wiki-phase2.sql look more performant? [18:31:23] If so I'll test it. 
[18:31:27] (03CR) 10Andrew Bogott: [C: 032] Openstack nova: Increase the allowed disk ratio a bit [puppet] - 10https://gerrit.wikimedia.org/r/237452 (owner: 10Andrew Bogott) [18:31:56] matt_flaschen, we can try it :-) [18:32:15] let me copy the table on a slave and I will test it [18:32:36] Great, thanks. I'll take a look at the explain on x1. [18:33:18] in theory, those are things that I should do, but to be fair, this week any help is welcome [18:34:10] and you also have a taste of "my world" [18:35:26] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1626882 (10Ottomata) +1 [18:35:26] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1626883 (10Ottomata) +1 [18:35:36] (03CR) 10Yurik: "agree with max - why would we use this? From my understanding, adding do_zip alters ETag, and regardless - it is not needed as we always r" [puppet] - 10https://gerrit.wikimedia.org/r/237368 (owner: 10BBlack) [18:36:15] AndyRussG: I'm gonna deploy https://gerrit.wikimedia.org/r/#/c/234830/ if there isn't any objection [18:36:28] OK [18:36:37] Finally someone suggested what might be causing problems [18:36:46] Does anyone here use Avast? [18:37:16] because according to someone in #firefox that was alleged to have broken Firefox in Windows 10 [18:37:39] (03CR) 10Jcrespo: [C: 031] addshore groups: bastion, analytics-privatedata and mw-log-readers [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) (owner: 10Jcrespo) [18:37:46] twentyafterfour: great! thanks :) sorry for the delay on that. Do you want to merge CN master into wmf_deploy or cherry-pick? 
[18:38:47] AndyRussG: that's what I was just trying to decide, I guess merge if there aren't any potentially problematic commits on master [18:38:47] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1626912 (10faidon) A few clarifications are in order: - We do want our Graphite data and our dashboards to be open. It's part of our transparency efforts. - Graphite is not actually behind auth, the original task descrip... [18:38:49] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [18:39:03] (03CR) 10Ottomata: [C: 031] networks::constants: Bring up to par IPv4/IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/237446 (owner: 10Alexandros Kosiaris) [18:39:26] twentyafterfour: I think that's usually better... gimme 2 minutes, I'll check to be sure [18:39:32] akosiaris: +1ed [18:39:33] sorry was in meeting [18:39:38] ori: why grafana behind pw? [18:40:26] jynus, it still says "rows: 20634", and there are only 41,723 where ref_src_wiki is unpopulated. Does that mean it's scanning half those rows even with the LIMIT 1000? [18:41:01] Or maybe not half those particular rows, but an equivalent number. [18:41:08] AndyRussG: I don't see anything that looks too suspicious [18:41:57] 6operations, 10ops-eqiad, 10netops: cr2-eqiad PEM 2 failure - https://phabricator.wikimedia.org/T112000#1626929 (10Cmjohnson) Started ticket with JTAC Dear Juniper Networks Customer, Thank you for contacting the Juniper Networks Global Support. We have opened Service Request number 2015-0910-0589 to track... [18:42:08] twentyafterfour: yeah all good! [18:42:17] matt_flaschen, maybe, let me check [18:42:34] twentyafterfour: When would you deploy? Just to make sure I'm nearby [18:45:39] AndyRussG: right about now .. just waiting for wikidata to merge and I think it just finished testing [18:46:05] twentyafterfour: cool! 
[18:46:08] Thanks much :) [18:46:31] * aude tries to be patient for jenkins [18:46:57] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1627017 (10faidon) Also note T93710 which is similar in nature and could be potentially merged here. [18:47:15] oooh, it's merged! [18:49:37] matt_flaschen, Unknown column 'flow_wiki_ref_deleteme.workflow_id' in 'where clause' [18:49:54] the flow_wiki_ref_deleteme is the fake table [18:50:32] aude: I'm gonna deploy the wikidata patch before pushing wmf22 to wikipedias [18:50:42] ok [18:51:28] jynus: I don't see flow_wiki_ref.workflow_id in his query, but I do see flow_workflow.workflow_id [18:51:33] !log twentyafterfour@tin Synchronized php-1.26wmf22/extensions/Wikidata: Deploy wikidata patch: https://gerrit.wikimedia.org/r/#/c/237449/ (duration: 00m 19s) [18:51:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:51:42] RoanKattouw, yes, just saw it [18:52:00] AndyRussG: https://gerrit.wikimedia.org/r/#/c/237458/ look ok? [18:52:07] that is a convoluted subquery [18:52:11] I just merged master into wmf_deploy [18:52:16] * aude tests [18:52:29] akosiaris: , can I merge? 
[18:52:48] If it makes you feel any better, the one we had before was worse :P (involved three tables) [18:53:08] ha [18:53:28] twentyafterfour: looks ok [18:53:39] 6operations, 10ops-ulsfo: Properly patch Telia @ ulsfo - https://phabricator.wikimedia.org/T112152#1627080 (10faidon) 3NEW a:3RobH [18:53:45] (03PS2) 10Yuvipanda: Remove legoktm from integration shinken alerts [puppet] - 10https://gerrit.wikimedia.org/r/237432 (owner: 10Legoktm) [18:53:47] (03CR) 1020after4: [C: 032] wikipedia wikis to 1.26wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237443 (owner: 1020after4) [18:53:51] and don't see anything problematic in the logs [18:53:52] (03CR) 10Yuvipanda: [C: 032 V: 032] Remove legoktm from integration shinken alerts [puppet] - 10https://gerrit.wikimedia.org/r/237432 (owner: 10Legoktm) [18:53:58] so RoanKattouw, matt_flaschen it takes 35 seconds to execute and changes 1022 rows? [18:54:06] (03Merged) 10jenkins-bot: wikipedia wikis to 1.26wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237443 (owner: 1020after4) [18:54:10] twentyafterfour: hmm one sec... [18:54:41] Changes 1022 rows despite LIMIT 1000? [18:54:45] Oh I guess I can see how that works [18:55:07] yes, the limit is in the select, not in the update [18:55:11] not an issue [18:55:13] twentyafterfour: yeah looks fine! [18:55:19] Yeah and the WHERE covers multiple rows [18:55:23] But yeah that's fine [18:55:28] but still 35 seconds every execution [18:55:39] :S [18:55:49] Do you think that's because the WHERE part of the UPDATE is unindexed? 
[18:56:07] I brought that up earlier and Matt said he was aware but questioned whether it would be worth it to index that just for this [18:56:18] 6operations, 10ops-ulsfo: Move NTT @ ulsfo to a different patch - https://phabricator.wikimedia.org/T112154#1627110 (10faidon) 3NEW a:3RobH [18:56:39] actually yes, I can create an index online without problems and delete it later [18:56:49] I can try easily on the fake table [18:57:20] OK [18:57:30] How slow would creating the index be? [18:57:39] Hm I guess you'll find out when you try it on the fake table :) [18:57:52] !log restarted phd on iridium [18:58:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:58:02] AndyRussG: thanks [18:58:09] RoanKattouw, the index is not an issue because it can be done online and independently on each host [18:58:28] Oh right [18:58:29] Likewise...! Mmm I don't have +2 rights on the deploy branches of core, however [18:58:31] It doesn't have to replicate [18:58:40] yes, can be done twice [18:58:47] 3, actually [18:58:56] AndyRussG: You should have +2 rights on deploy branches if and only if you have deploy access [18:59:09] Because we have a rule that if you +2 something there, you need to deploy it imminently [19:00:05] RoanKattouw: Yeah I hear... it's something I just haven't gotten around to fixing... 8p [19:00:17] https://gerrit.wikimedia.org/r/#/admin/groups/21,members [19:00:32] RoanKattouw, worst case scenario, we can do a failover, but I would try to avoid that option for now [19:00:57] I'll fix it [19:01:09] So far CentralNotice deploys outside the train or swat have been group operations, so it hasn't been an issue... Ah thanks Krenair! [19:02:11] AndyRussG, you should now be able to +2 on deployment branches [19:02:24] Krenair: cool beans! [19:03:37] This *should* be done when you get deployment access. [19:03:44] But several people are mysteriously missing it. 
[19:04:02] I strongly suspect that most of these people do not actually need deployment access. [19:04:33] Query OK, 1001 rows affected, 1 warning (0.22 sec) <-- it works [19:05:00] RoanKattouw, matt_flaschen [19:05:28] Found the glitch [19:05:36] Avast SNAFU [19:05:36] (03PS1) 10Ottomata: Make eventlogging mysql consumer consume from kafka instead of ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/237469 (https://phabricator.wikimedia.org/T106260) [19:05:39] Yay! [19:05:49] So people [19:05:51] ALTER TABLE flow_wiki_ref add index(ref_src_workflow_id, ref_src_wiki); [19:06:01] If you want Wikimedia sites to work for you don't use Avast [19:06:11] What does Wikimedia use as AV? [19:06:14] Clam? [19:06:29] (03CR) 10jenkins-bot: [V: 04-1] Make eventlogging mysql consumer consume from kafka instead of ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/237469 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [19:06:35] Hmmm [19:07:04] (03PS2) 10Ottomata: Make eventlogging mysql consumer consume from kafka instead of ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/237469 (https://phabricator.wikimedia.org/T106260) [19:08:34] Thanks, jynus, RoanKattouw. BTW, the reason for the weird subquery is just that mysql does not allow a LIMIT on an UPDATE that accesses more than one table. [19:08:48] ah, yes [19:08:57] in any case [19:09:02] this is the plan [19:09:17] we add ref_src_workflow_id, ref_src_wiki (it would be nice to put it on the patch) [19:09:30] !log twentyafterfour@tin Synchronized php-1.26wmf22/extensions/CentralNotice: deploy https://gerrit.wikimedia.org/r/#/c/237458/ (duration: 00m 12s) [19:09:31] in an online fashion [19:09:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:10:08] do the rest of the process as the new patch suggests [19:10:12] ok I think I'm ready to actually finish the train deployment now [19:10:18] then delete the index if it is not useful [19:10:30] does that seem ok? 
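[note] The plan sketched between 18:54 and 19:10 (add a temporary index, backfill in batches whose LIMIT sits in a subquery, then drop the index) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the patch under review: it runs against an in-memory SQLite database as a stand-in for MySQL, and the column names workflow_wiki and ref_id, the index name tmp_ref_src, the sample rows, and the idea that "unpopulated" means NULL are all hypothetical. Only flow_wiki_ref, flow_workflow, workflow_id, ref_src_workflow_id and ref_src_wiki come from the conversation.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory stand-in for the two tables named in the chat."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE flow_workflow (
            workflow_id   INTEGER PRIMARY KEY,
            workflow_wiki TEXT NOT NULL           -- assumed column name
        );
        CREATE TABLE flow_wiki_ref (
            ref_id              INTEGER PRIMARY KEY,  -- assumed column name
            ref_src_workflow_id INTEGER NOT NULL,
            ref_src_wiki        TEXT                  -- NULL = unpopulated (assumption)
        );
        -- Temporary index on the same two columns as the ALTER TABLE in the log.
        CREATE INDEX tmp_ref_src
            ON flow_wiki_ref (ref_src_workflow_id, ref_src_wiki);
        """
    )
    conn.executemany(
        "INSERT INTO flow_workflow VALUES (?, ?)",
        [(i, "enwiki") for i in range(1, 5)],
    )
    # Workflow 1 has three references; workflows 2, 3 and 4 have one each.
    conn.executemany(
        "INSERT INTO flow_wiki_ref (ref_src_workflow_id) VALUES (?)",
        [(w,) for w in (1, 1, 1, 2, 3, 4)],
    )
    conn.commit()
    return conn


def backfill_batch(conn, batch_size):
    """Populate ref_src_wiki for up to batch_size workflows; return rows changed.

    The LIMIT applies to the inner SELECT of workflow ids, not to the UPDATE,
    so one batch can touch more rows than batch_size.
    """
    cur = conn.execute(
        """
        UPDATE flow_wiki_ref
        SET ref_src_wiki = (
            SELECT w.workflow_wiki
            FROM flow_workflow w
            WHERE w.workflow_id = flow_wiki_ref.ref_src_workflow_id
        )
        WHERE ref_src_workflow_id IN (
            SELECT DISTINCT r.ref_src_workflow_id
            FROM flow_wiki_ref r
            JOIN flow_workflow w ON w.workflow_id = r.ref_src_workflow_id
            WHERE r.ref_src_wiki IS NULL
            ORDER BY r.ref_src_workflow_id
            LIMIT ?
        )
        """,
        (batch_size,),
    )
    conn.commit()
    return cur.rowcount


def backfill_all(conn, batch_size):
    """Run batches until nothing is left, then drop the temporary index."""
    counts = []
    while True:
        changed = backfill_batch(conn, batch_size)
        if changed == 0:
            break
        counts.append(changed)
    conn.execute("DROP INDEX tmp_ref_src")
    return counts


if __name__ == "__main__":
    print(backfill_all(make_db(), batch_size=2))
```

With a batch size of 2 the first batch in this toy data changes 4 rows, the same effect as the "1022 rows despite LIMIT 1000" observation at 18:54. On MySQL itself the inner SELECT would most likely need to be wrapped in a derived table, since a LIMIT directly inside an IN subquery and a subquery on the table being updated are both commonly rejected there; treat that as an assumption to check against the real patch.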
[19:10:31] jynus [19:10:33] Hi [19:10:38] hello, ShakespeareFan00 [19:10:48] Does someone have a list of Wikimedia domains? [19:11:06] So I can tell Avast's Web shield to calm the *** down [19:11:11] the dns configuration is public [19:11:12] have fun [19:11:19] jynus: Sounds good to me [19:11:19] Link? [19:11:22] ShakespeareFan00: I guess you want wm.o dns? [19:11:29] you should have them in puppet operations/dns [19:11:37] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf22 [19:11:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:11:46] RoanKattouw, it is getting late here [19:11:47] https://github.com/wikimedia/operations-dns if you like GitHub [19:11:49] jynus, yeah, sounds reasonable. [19:11:52] jynus: I'm on an airplane and I only have about 22 minutes of battery left, but hopefully matt_flaschen will be around [19:11:56] let me comment on the patch [19:11:57] just the names... SPF... I don't need the key to Kimbos nuke vault.... [19:12:00] I will. [19:12:00] ;) [19:12:43] (03PS1) 10Yurik: Changed yurik's pub key [puppet] - 10https://gerrit.wikimedia.org/r/237473 [19:12:45] however, I won't be able to monitor it if we apply it now [19:12:48] (03PS3) 10Ottomata: Make eventlogging mysql consumer consume from kafka instead of ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/237469 (https://phabricator.wikimedia.org/T106260) [19:12:57] maybe do it on my morning? [19:13:04] in 12 hours? [19:13:51] jynus, I won't be here in 12 hours, but I'll have the patch tested and up (including ADD and DROP index) ready for you. [19:14:07] SPF|Cloud: Thanks [19:14:08] yes, that is the plan, I will roll the schema change [19:14:08] 12 hours from now it'll be 3am in Matt's timezone [19:14:20] I can probably figure out most of the domains from there... 
[19:14:24] If needed [19:14:43] matt_flaschen, your call [19:14:46] And 12 hours from now I will be in your timezone :) but I will also be sleep-deprived and in a car somewhere [19:15:06] but I do not want to do a schema change and not be around [19:15:23] could someone +2 my pubkey change https://gerrit.wikimedia.org/r/#/c/237473/ -- can be confirmed via the phone if needed. [19:15:29] I mean, we can schedule for Monday, your morning [19:15:33] no problem with that [19:16:06] jynus, do you need someone from Collaboration team around when you do it? If so, mlitn might be. [19:16:14] Oh, yeah, good idea [19:16:17] He's in CEST [19:16:22] yes [19:16:52] Friday is not a good day for deployments, anyway [19:16:56] Right [19:17:09] I'll also be awake on Monday morning/afternoon your time [19:17:15] because I'll be in the same timezone [19:17:19] So lets schedule for your Morning, I will be awake [19:17:31] jynus, I'm going to be out on Monday, though. [19:17:42] I have a few non-work things to do but I can keep an eye on IRC, or we can schedule it before 2pm or so [19:17:48] you put a date :-) [19:18:02] at that time (my afternoon, your morning) [19:18:05] matt_flaschen: You OK with Monday with both me and mlitn being around? [19:18:08] and we roll it [19:18:14] RoanKattouw, yes, that's fine. [19:18:17] OK [19:18:19] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.69% of data above the critical threshold [500.0] [19:18:34] jynus: Does Monday 11am work for you? [19:18:50] 11 Pacific? [19:18:51] What's with the apache vs. www-data split? 
[19:19:19] jynus: 11am Amsterdam/Paris/Berlin time [19:19:30] matt_flaschen, RoanKattouw lets go to #wikimedia-office to avoid creating more noise here [19:19:35] sorry [19:19:38] * RoanKattouw will be working remotely from the Netherlands that week [19:19:39] (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/237397 (https://phabricator.wikimedia.org/T108953) (owner: 10Filippo Giunchedi) [19:19:40] #wikimedia-databases [19:19:46] Krenair: i think that comes even from Fedora vs. Ubuntu .. [19:19:59] https://wikitech.wikimedia.org/wiki/UID too [19:20:25] 6operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-Requests: Rename Võro Wikipedia, fiu-vro -> vro - https://phabricator.wikimedia.org/T31186#1627432 (10Malafaya) Thank you both for the explanations. [19:20:26] www-data would be standard on Debian-like systems [19:20:32] Some servers don't seem to care, but some of them will only work if you sudo as www-data [19:20:36] and a few don't work with either [19:20:48] (to run php) [19:21:47] let's make a list of server names for these groups [19:22:09] then check what they have in common ? [19:22:26] Krenair: Solved my glitch from earlier [19:22:33] (03CR) 10Madhuvishy: [C: 031] Make eventlogging mysql consumer consume from kafka instead of ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/237469 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [19:22:35] Turned out to be Avast Web Shield [19:22:48] (03CR) 10Ottomata: [C: 032] Make eventlogging mysql consumer consume from kafka instead of ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/237469 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [19:22:55] being paranoid [19:23:02] Yes, I saw [19:23:20] (03CR) 10Gilles: "Did strangers start messing with stuff when the SSL dashboard hit hacker news?" [puppet] - 10https://gerrit.wikimedia.org/r/237448 (owner: 10Ori.livneh) [19:23:22] (It thought AVG's install shouldn't be downloaded!! and silently blocked it!!!) 
ROFL [19:24:29] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:27:37] akosiaris: i wanna merge it! i think i'm gonna! [19:27:40] you +1ed it so! [19:27:41] here we go [19:27:48] (03PS2) 10Ottomata: networks::constants: Bring up to par IPv4/IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/237446 (owner: 10Alexandros Kosiaris) [19:28:09] (03CR) 10Ottomata: [C: 032 V: 032] networks::constants: Bring up to par IPv4/IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/237446 (owner: 10Alexandros Kosiaris) [19:33:11] 6operations, 10hardware-requests, 7Performance: Refresh Parser cache servers pc1001-pc1003 - https://phabricator.wikimedia.org/T111777#1627495 (10ori) [19:34:37] (03PS1) 10Dzahn: mailman: don't use "c" option with rsync [puppet] - 10https://gerrit.wikimedia.org/r/237476 [19:37:22] (03PS2) 10Dzahn: mailman: don't use "c" option with rsync [puppet] - 10https://gerrit.wikimedia.org/r/237476 [19:38:14] 6operations, 10netops, 7Monitoring: Netflow Collector Project - https://phabricator.wikimedia.org/T83119#909374 (10faidon) [19:39:25] (03CR) 10Dzahn: [C: 032] mailman: don't use "c" option with rsync [puppet] - 10https://gerrit.wikimedia.org/r/237476 (owner: 10Dzahn) [19:39:55] (03PS2) 10Jcrespo: Depool es1001 for decommision; increase weight of es1015 and es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237414 (https://phabricator.wikimedia.org/T105843) [19:40:08] hi, please merge my pubkey change while i'm online so i can check - https://gerrit.wikimedia.org/r/#/c/237473/ -- can be confirmed via the phone if needed. 
[19:41:17] (03CR) 10Jcrespo: [C: 032] Depool es1001 for decommision; increase weight of es1015 and es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237414 (https://phabricator.wikimedia.org/T105843) (owner: 10Jcrespo) [19:41:19] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [500.0] [19:43:44] (03CR) 10BBlack: "Yeah, I fixed it a couple of times. most-recently they wiped out all the definitions :P" [puppet] - 10https://gerrit.wikimedia.org/r/237448 (owner: 10Ori.livneh) [19:46:50] (03PS2) 10BBlack: varnish: standardize/de-duplicate do_gzip [puppet] - 10https://gerrit.wikimedia.org/r/237366 (https://phabricator.wikimedia.org/T96847) [19:47:05] (03CR) 10Dzahn: [C: 031] "i can confirm "bastion" and "mw-log-readers" is right, for the analytics groups i'd get a +1 from ottomata though" [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) (owner: 10Jcrespo) [19:47:12] (03CR) 10BBlack: [C: 032 V: 032] varnish: standardize/de-duplicate do_gzip [puppet] - 10https://gerrit.wikimedia.org/r/237366 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [19:47:49] ebernhardson: would be nice to slowly push https://phabricator.wikimedia.org/T109715 along [19:47:50] (03PS2) 10BBlack: Add do_gzip to the misc_web cache cluster [puppet] - 10https://gerrit.wikimedia.org/r/237367 [19:48:57] (03CR) 10BBlack: [C: 032] Add do_gzip to the misc_web cache cluster [puppet] - 10https://gerrit.wikimedia.org/r/237367 (owner: 10BBlack) [19:49:06] (03PS1) 10Ottomata: Remove eventlogging multiplexer, forward kafka eventlogging events to zmq port 8600 [puppet] - 10https://gerrit.wikimedia.org/r/237479 (https://phabricator.wikimedia.org/T106260) [19:49:08] (03PS2) 10BBlack: Add do_gzip to the maps cache cluster [puppet] - 10https://gerrit.wikimedia.org/r/237368 [19:49:31] (03PS2) 10Ottomata: Remove eventlogging multiplexer, forward kafka eventlogging events to zmq port 8600 
[puppet] - 10https://gerrit.wikimedia.org/r/237479 (https://phabricator.wikimedia.org/T106260) [19:50:01] (03CR) 10BBlack: [C: 032 V: 032] Add do_gzip to the maps cache cluster [puppet] - 10https://gerrit.wikimedia.org/r/237368 (owner: 10BBlack) [19:50:33] 6operations, 10ops-codfw, 10netops: cr1-eqdfw PEM 0 failure - https://phabricator.wikimedia.org/T110435#1627613 (10Papaul) create an account with Juniper, waiting for them to send me a confirmation email for activation. [19:50:36] (03PS2) 10BBlack: Compress js (and other text) in varnish [puppet] - 10https://gerrit.wikimedia.org/r/237369 (https://phabricator.wikimedia.org/T109040) [19:50:49] PROBLEM - puppet last run on mw2110 is CRITICAL: CRITICAL: puppet fail [19:50:54] i think on stat1002 they are loaded [19:51:00] oops wrong chat [19:52:31] (03PS3) 10Ottomata: Remove eventlogging multiplexer, forward kafka eventlogging events to zmq port 8600 [puppet] - 10https://gerrit.wikimedia.org/r/237479 (https://phabricator.wikimedia.org/T106260) [19:52:42] YuviPanda: if you have 2.5TB of disk i can try :) [19:52:52] (03CR) 10BBlack: [C: 032] Compress js (and other text) in varnish [puppet] - 10https://gerrit.wikimedia.org/r/237369 (https://phabricator.wikimedia.org/T109040) (owner: 10BBlack) [19:53:02] 6operations, 10ops-ulsfo: Move NTT @ ulsfo to a different cross-connect - https://phabricator.wikimedia.org/T112154#1627622 (10faidon) [19:53:16] (03PS3) 10Jcrespo: addshore groups: bastion, analytics-privatedata and mw-log-readers [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) [19:53:21] ebernhardson: Dell PowerEdge R420, dual Intel Xeon E5-2450 v2 2.50GHz, 64GB Memory, (4) 3TB Disks [19:53:32] ebernhardson: not sure how it'll do with only 64G RAM [19:54:09] YuviPanda: hard to say. 
could try though :) [19:54:37] (03CR) 10Jcrespo: [C: 032] "Otto game the +1 here: https://phabricator.wikimedia.org/T111204#1626882" [puppet] - 10https://gerrit.wikimedia.org/r/237437 (https://phabricator.wikimedia.org/T111204) (owner: 10Jcrespo) [19:54:42] ebernhardson: I can file a request to get that one setup if you think you'll have the time to try set it up :) [19:55:21] YuviPanda: getting it running shouldn't be too hard, i'm not sure what to do about the pipeline from prod to labs yet. there isn't any kind of cross-cluster replication in es [19:55:49] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:55:51] there used to be a thing called rivers for that, but es found them unstable (or something) and removed them [19:56:00] YuviPanda: so i suppose...more brainstorming before requesting hardware [19:56:05] ebernhardson: yeah, I guess that'll be the 'big deal'. Also will need to do security stuff - prevent deleted / revdelled things from showing up [19:56:06] ebernhardson: yeah [19:56:32] ebernhardson: I noted the revdel stuff there [19:56:53] generally revdel isn't a big deal in search, we only index the top revision and you always revdel old revisions [19:56:57] so the only question is did we get the update [19:57:08] right [19:57:13] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Some other bugs with Firefox 37.0 and Vector skin on fr.wikiversity : *Predefined messages for summary deit box don't work. *Purge option (preference>gadget ) don't work. *Book creator menu for adding p... 
- https://phabricator.wikimedia.org/T112161#1627653 [19:57:25] ebernhardson: so then it means we can't just dump and reload every other day / week, I guess [19:57:46] rivers like river the lucene hero, funny [19:58:12] 6operations, 10Datasets-General-or-Unknown, 7JavaScript: Some other bugs with Firefox 37.0 and Vector skin on fr.wikiversity - https://phabricator.wikimedia.org/T112161#1627665 (10Krenair) [19:58:24] 6operations, 7JavaScript: Some other bugs with Firefox 37.0 and Vector skin on fr.wikiversity - https://phabricator.wikimedia.org/T112161#1627653 (10Krenair) [19:59:09] YuviPanda: i suppose, i just wrote all the code for elasticsearch updates to go to multiple clusters. *if* the job runners are allowed to talk to the beta labs it could work. Might actually be a good test as well. [19:59:20] YuviPanda: i don't know what the deal is with opening up holes thugh [19:59:32] labs != beta labs :P [19:59:36] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to fluorine / mw-log-readers group for Addshore - https://phabricator.wikimedia.org/T111756#1627671 (10jcrespo) 5Open>3Resolved a:3jcrespo Access provided. [19:59:38] :P [19:59:45] i meant just labs :) [19:59:52] yeah we can open up specific hles [19:59:53] *holes [20:00:00] ebernhardson: that's how the db replication works [20:00:33] 6operations, 10ops-codfw, 10netops: cr1-eqdfw PEM 0 failure - https://phabricator.wikimedia.org/T110435#1627674 (10Cmjohnson) Papaul, Once you have an account ping me on IRC. In the meantime you will need to pull a couple of reports for JTAC. You will need to have: Shipping information needed for RMA. (O... [20:00:41] YuviPanda: so i guess i changed my mind, we can ask for hardware :) [20:01:08] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [20:01:09] ebernhardson: hah :) write that out in the ticket and I"ll setup the ticket for hardware setup? 
[20:01:34] 6operations: Setup install server in codfw - tftp done, but not apt and other install services - https://phabricator.wikimedia.org/T84380#1627681 (10faidon) [20:01:36] 6operations, 10ops-codfw, 10netops: cr1-eqdfw PEM 0 failure - https://phabricator.wikimedia.org/T110435#1627683 (10Cmjohnson) Also worth noting, make sure you mention in the notes that you reseated the PEM several times. [20:01:37] ebernhardson: doing at least some back-of-the-envelope calculations about the ratio of disks to memory would be nice tho [20:02:05] ebernhardson: also, does ES support any form of user accounts? [20:02:34] (03CR) 10Ottomata: [C: 032] Remove eventlogging multiplexer, forward kafka eventlogging events to zmq port 8600 [puppet] - 10https://gerrit.wikimedia.org/r/237479 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [20:02:36] YuviPanda: oh, no that is also going to be an issue. we might have to throw together a quick nginx/nodejs/whatever proxy that only allows GET requests through [20:02:42] YuviPanda: anyone can delete or change anything [20:02:57] it's pretty easy to reverse proxy with apache [20:03:14] probably nginx too but I'm an apache geek [20:03:21] whatever works :) [20:03:30] ebernhardson: so if we just allow GETs [20:03:40] hmm we can even whitelist them [20:03:48] bd808: yeah, pretty easy with nginx too :D [20:03:51] re ES auth: https://www.elastic.co/products/shield [20:03:55] * YuviPanda has never worked with apache much in life [20:03:56] in theory people can still crash the cluster with gets ... but thats a hazard we can't fix while allowing people open access to invent their own queries [20:04:01] walkeran: yeah, unfortunately is proprietary... [20:04:12] shield makes me sad [20:04:12] Ah, gotcha. 
Sorry, I should really quit lurking in here :) [20:04:13] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1627697 (10jcrespo) 5Open>3Resolved a:3jcrespo Addshore: both tickets requesting access were granted. Ping me on irc (jynus) f... [20:04:26] walkeran: shussh :D do stay on [20:04:36] That said, I'm rather enjoying the massive chatops movement. This is amazing. [20:04:38] there's probably some simple regex-y ways to limit maximum query complexity, but the better answer would be to have ES be able to set an internal complexity budget and bail if the query's too hard [20:04:54] ebernhardson: yeah, but we can do things at the nginx layer anyway. fitlering, rate limiting [20:06:33] 6operations, 10netops, 7Monitoring: Juniper monitoring - https://phabricator.wikimedia.org/T83992#1627704 (10faidon) [20:08:09] ebernhardson: when you say ' For reference in prod we are running 1.4GB of memory for every 1GB of primary shard size. We are seeing reasonable results on a 2 node lab cluster w/ 32GB memory against 50GB of primary shard.' [20:08:12] ebernhardson: what do you mean? [20:08:22] bblack: yeah, and I wonder if ES has a simple 'time limit' thing as well [20:08:26] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1001; increase weight of es1015 and es1019 (duration: 00m 11s) [20:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:09:18] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [20:09:46] YuviPanda: well first some backstory, i bring that up because our first attempt to backtest data was to import 50GB enwiki index to a 16GB labs instance. 
This ended taking ~10-30s per query [20:10:10] ahhhhh [20:10:34] oddly adding a second 16GB machine brought that down to .5-1s per query [20:10:50] so, some of those mw* failed [20:10:59] so, i just expect there to be some sort of issues related to the amount of data queried vs the amount of memory available [20:11:02] ebernhardson: which project was this in? is there a ticket or something somewhere? [20:11:21] wasn't the rsync issue solved? [20:11:27] not that oddly, with multiple copies of the same shard elasticsearch is pretty smart about routing things to where the caches are hot [20:11:32] (03PS1) 10Ottomata: Remove unused ZMQ inputs and outputs [puppet] - 10https://gerrit.wikimedia.org/r/237486 (https://phabricator.wikimedia.org/T106260) [20:12:01] jynus: it was. the last scap during swat had no errors [20:12:06] mmm [20:12:15] (03CR) 10Aaron Schulz: "Needs rebase? Anyone looking into this btw?" [puppet] - 10https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: 10Yuvipanda) [20:12:20] YuviPanda: hmm, i dunno if we made explicit tickets for it just tickets for the things we wanted to backtest [20:12:34] YuviPanda: its in the `search` project, the new instances are estest01 and estest02.search.eqiad.wmflabs [20:13:09] the first test that failed miserably was called elasticsearch-tests.search.eqiad.wmflabs iirc but it was shut down. [20:13:43] 6operations, 6Community-Advocacy, 10Traffic, 5Patch-For-Review: Fix/decom multiple-subdomain wikis in wikimedia.org - https://phabricator.wikimedia.org/T102826#1627732 (10BBlack) [20:14:02] 6operations, 10Traffic, 7HTTPS: Preload HSTS for select hostnames within wikimedia.org - https://phabricator.wikimedia.org/T111967#1621104 (10BBlack) Removed the subdomain blocker - it's not complete as a whole yet, but the parts that blocked this task are. [20:14:09] ebernhardson: also 2.5TB is just enwiki? 
[20:14:23] ebernhardson: oh, no, that's everything [20:14:31] YuviPanda: looking for time limits on elasticsearch queries leads to an open PR that keeps getting bumped from making it into release roadmaps (was 2.0 now 2.1) [20:14:46] (03CR) 10Madhuvishy: [C: 031] Remove unused ZMQ inputs and outputs [puppet] - 10https://gerrit.wikimedia.org/r/237486 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [20:14:50] (03PS2) 10Ottomata: Remove unused ZMQ inputs and outputs [puppet] - 10https://gerrit.wikimedia.org/r/237486 (https://phabricator.wikimedia.org/T106260) [20:14:52] YuviPanda: yup, everthing [20:15:06] I think it is the same issue rsync exits with 10 error code [20:15:34] 6operations, 10hardware-requests: Site: (eqiad) hardware access request for ElasticSearch replication to Labs - https://phabricator.wikimedia.org/T112163#1627748 (10yuvipanda) 3NEW [20:15:39] bd808: ebernhardson ^ [20:16:05] (03CR) 10Ottomata: [C: 032] Remove unused ZMQ inputs and outputs [puppet] - 10https://gerrit.wikimedia.org/r/237486 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [20:16:09] YuviPanda: the top 10 indices take up 1.4TB, everything else takes up the other 1.1TB [20:16:10] jynus: were all the failing hosts in codfw? [20:16:24] ebernhardson: when you say 'top 10 indices' whwat do you mean? [20:16:34] YuviPanda: the 10 largest. 
There are 2 for each wiki [20:16:36] 10 biggest wikis [20:16:38] I am checking that, many ate least [20:16:58] ebernhardson: ah, I see [20:17:05] but also eqiad rsync: failed to connect to mw1201.eqiad.wmnet [20:17:13] 6operations, 10hardware-requests: Site: (eqiad) hardware request for ElasticSearch replication to Labs - https://phabricator.wikimedia.org/T112163#1627768 (10yuvipanda) [20:17:18] oh no [20:17:25] they are all codfw [20:17:38] RECOVERY - puppet last run on mw2110 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [20:17:42] !log deployed kartotherian [20:17:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:18:40] ebernhardson: is my characterization on the hardware request ticket accurate? [20:19:08] PROBLEM - Disk space on ms-be2006 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdg1 is not accessible: Input/output error [20:19:24] it is definitelly both :-) [20:19:47] YuviPanda: yup that looks accurate [20:19:49] PROBLEM - RAID on ms-be2006 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [20:19:55] jynus: yeah I'm looking at the logs and see them from all over [20:20:01] 6operations, 6Community-Advocacy, 10Traffic, 5Patch-For-Review: Fix/decom multiple-subdomain wikis in wikimedia.org - https://phabricator.wikimedia.org/T102826#1627789 (10Jalexander) >>! In T102826#1621167, @BBlack wrote: > `www.meta` and `www.commons` I think we can just remove unilaterally at this point... 
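Note on the memory-to-index ratios discussed above: the back-of-the-envelope calculation YuviPanda asked for can be sketched from the figures quoted in the channel alone (1.4GB of memory per 1GB of primary shard in prod, 50GB of index on one and then two 16GB labs instances, and a 64GB box for roughly 2.5TB of indices). This is only a rough sketch. Treating the whole 2.5TB as the index size that has to be served, and using decimal units, are assumptions, and the function names are made up for illustration.

```python
# Back-of-the-envelope memory vs. index size, using only figures quoted in the log.
# Assumption: 1 TB = 1000 GB, and the full 2.5 TB counts as index size to be served.

GB_PER_TB = 1000


def memory_ratio(memory_gb, index_gb):
    """GB of RAM available per GB of index."""
    return memory_gb / index_gb


def memory_needed_gb(index_gb, ratio):
    """RAM needed to serve index_gb of index at a given RAM-per-GB ratio."""
    return index_gb * ratio


PROD_RATIO = 1.4                            # quoted: 1.4GB memory per 1GB primary shard
LABS_ONE_NODE = memory_ratio(16, 50)        # 10-30s per query in the first test
LABS_TWO_NODES = memory_ratio(32, 50)       # 0.5-1s per query after adding a node
PROPOSED_BOX = memory_ratio(64, 2.5 * GB_PER_TB)

if __name__ == "__main__":
    print("labs, one node    :", round(LABS_ONE_NODE, 3))
    print("labs, two nodes   :", round(LABS_TWO_NODES, 3))
    print("proposed 64GB box :", round(PROPOSED_BOX, 4))
    print("RAM for 2.5TB at the prod ratio (GB)    :",
          round(memory_needed_gb(2.5 * GB_PER_TB, PROD_RATIO)))
    print("RAM for 2.5TB at the two-node ratio (GB):",
          round(memory_needed_gb(2.5 * GB_PER_TB, LABS_TWO_NODES)))
```

On these assumptions the proposed box has far less memory per GB of index than even the single-node labs test that performed badly, which fits the caution voiced above about "only 64G RAM".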
[20:20:01] Copying to mw1136.eqiad.wmnet from mw1070.eqiad.wmnet -> failed [20:20:08] ebernhardson: yup, I'm requesting it 'temporarily' now so we can play with it and see how it goes, and adjust hardware requirements [20:20:17] bd808, it is too verbose, sorry [20:20:35] $ tail -1000 /a/mw-log/scap.log | ~bd808/scaplog.py |grep "failed to connect to" [20:20:48] yep, was looking at in on my backscroll [20:21:08] I was confused because some proxys were mixed, I think [20:21:20] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: Puppet has 1 failures [20:21:26] !log killed/restarted ganglia aggregator process for text-caches esams on hooft [20:21:27] 6operations, 10hardware-requests: Site: (eqiad) hardware request for ElasticSearch replication to Labs - https://phabricator.wikimedia.org/T112163#1627795 (10RobH) a:3RobH chatted in irc, yuvi says he'd need these for 3-4 weeks. (stealing for review and escalation for approvals later this week) [20:21:29] ebernhardson: primarily to see how the utilization is. do you know what needs to be opened up for https://phabricator.wikimedia.org/T109734? just port 80? [20:21:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:22:02] YuviPanda: it would talk over port 9200 [20:22:04] (03PS1) 10Hashar: Fix linting issues [debs/adminbot] - 10https://gerrit.wikimedia.org/r/237488 [20:22:06] (03PS1) 10Hashar: Add tox and flake8 [debs/adminbot] - 10https://gerrit.wikimedia.org/r/237489 [20:22:10] ebernhardson: and that'd just be http, right? [20:22:16] !log last SCAP failed on 266/466 hosts [20:22:21] YuviPanda: yup, plain http [20:22:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:22:25] (03PS1) 10Ottomata: Turn off varnishncsa udp logging for eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/237490 (https://phabricator.wikimedia.org/T106260) [20:22:38] ebernhardson: and from labs instances to this would be port 80? 
[20:22:46] is that all of codfw it failed for? [20:22:49] Does $::MW_APPSERVER_NETWORKS expand into ipv4 and ipv6? [20:23:09] YuviPanda: i think so, if we setup nginx on the same box. [20:23:14] yup, cool [20:23:43] 6operations, 10hardware-requests: Site: (eqiad) hardware request for ElasticSearch replication to Labs - https://phabricator.wikimedia.org/T112163#1627806 (10yuvipanda) We'll need for production hosts (mw*) jobrunners to be able to hit port 9200 on this box, and for labs instances to be able to hit port 80. B... [20:26:09] ebernhardson: so if I'm understanding this correctly, we'll just add 'labs' as an additional 'cluster' to the config in your patches, and maybe do an 'initial import' [20:26:36] Krenair: they were all over both DCs [20:26:52] * bd808 is looking to see if it was a subset of proxies [20:27:20] (03CR) 10Hashar: "Will let us get rid of the pep8 and pyflakes Jenkins jobs. The related CI patch is https://gerrit.wikimedia.org/r/237489" [debs/adminbot] - 10https://gerrit.wikimedia.org/r/237489 (owner: 10Hashar) [20:28:09] !log killed/restarted ganglia aggregator process for mobile-cache, upload cache, misc esams ... 
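Note on the "only allow GET requests through" idea discussed above: whatever ends up in front of port 9200 (nginx, apache, or something else), the decision it makes per request is small. A minimal sketch of that decision follows, assuming a whitelist of read-only search paths. The path pattern, the function name, and the example index name are hypothetical and not taken from any existing config.

```python
import re

# Hypothetical allow-list: only GET, and only the search endpoint,
# either cluster-wide (/_search) or for a named index (/<index>/_search).
ALLOWED_METHODS = {"GET"}
ALLOWED_PATH = re.compile(r"^/(?:[A-Za-z0-9_.,*-]+/)?_search$")


def is_allowed(method, path):
    """Return True if a request should be passed through to the backend."""
    path_only = path.split("?", 1)[0]  # ignore the query string when matching
    return method.upper() in ALLOWED_METHODS and bool(ALLOWED_PATH.match(path_only))


if __name__ == "__main__":
    # "enwiki_content" is only an example index name.
    for method, path in [
        ("GET", "/enwiki_content/_search?q=foo"),
        ("DELETE", "/enwiki_content"),
        ("POST", "/enwiki_content/_search"),
        ("GET", "/_cluster/settings"),
    ]:
        print(method, path, "->", is_allowed(method, path))
```

As pointed out in the channel, a filter like this keeps people from deleting or changing things but does nothing about expensive GET queries; rate limiting or a complexity budget would have to be handled separately.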
[20:28:16] (03PS2) 10Ottomata: Turn off varnishncsa udp logging for eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/237490 (https://phabricator.wikimedia.org/T106260) [20:28:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:28:35] jynus, Krenair: here are the proxies and hosts that had problems -- https://phabricator.wikimedia.org/P2008 [20:28:47] bd808, thanks [20:29:44] I am looking for the last commit after the last issue [20:29:48] (03CR) 10Ottomata: [C: 032] Turn off varnishncsa udp logging for eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/237490 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [20:30:11] the ferm patch was Iaac4b9132849114d55ade0edd098a2159e15903d [20:30:29] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to fluorine / mw-log-readers group for Addshore - https://phabricator.wikimedia.org/T111756#1627874 (10Dzahn) [fluorine:~] $ sudo -u addshore cat /a/mw-log/api.log ^ works @addshore there's the API log [20:31:23] !log turning off varnishncsa eventlogging eventlistener instances on frontend caches, it is now superseded by varnishkafka [20:31:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:31:44] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to fluorine / mw-log-readers group for Addshore - https://phabricator.wikimedia.org/T111756#1627894 (10Dzahn) [20:33:32] 6operations, 10Traffic, 5Patch-For-Review: Support ALPN + HTTP/2 - https://phabricator.wikimedia.org/T96848#1627909 (10Paladox) [20:35:54] 6operations, 7Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1627937 (10Tgr) > Only the web dashboard designer is and that's due to some extra paranoia around discoverability purposes and possible security or performance issues. Grafana also provides discoverability via its query... 
[20:37:21] bd808: (back) yes, $::MW_APPSERVER_NETWORKS includes ipv4 and ipv6 [20:37:45] cool. that was just one random guess for the partial failure [20:38:13] it's sourced from the same manifests/network.pp definition as used in modules/scap [20:38:21] * bd808 makes a lot of random guesses [20:38:26] (03PS1) 10Ottomata: Parallelize 12 eventlogging client side processors on eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/237494 (https://phabricator.wikimedia.org/T104228) [20:38:31] well, it is the time! [20:38:45] to do random guesses :-) [20:38:57] (03PS2) 10Ottomata: Parallelize 12 eventlogging client side processors on eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/237494 (https://phabricator.wikimedia.org/T104228) [20:39:11] if not, we have to do it the old way, telnet/curl and iptables [20:39:39] PROBLEM - nova-compute process on labvirt1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [20:39:41] (03PS3) 10Ottomata: Parallelize 12 eventlogging client side processors on eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/237494 (https://phabricator.wikimedia.org/T104228) [20:40:14] so, first it is the ports, because 10 is socket io error [20:41:20] (03CR) 10Ottomata: [C: 032] Parallelize 12 eventlogging client side processors on eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/237494 (https://phabricator.wikimedia.org/T104228) (owner: 10Ottomata) [20:42:38] It looks like mw1017:/usr/local/bin/mwscript is out of date? [20:43:03] Krenair, suspecting puppet issue? [20:43:06] probably unpuppetised [20:43:17] Krenair: I think it is leftover from days of old [20:43:42] Right now it doesn't work because it tries to sudo as apache rather than www-data [20:43:46] I think we only actively put mwscript on tin/terbium [20:44:12] and silver and mira, but yeah [20:44:19] greg-g: can I deploy an Echo backport? 
https://gerrit.wikimedia.org/r/237496 we missed it yesterday [20:44:23] what is different from a full scap to a sync-file, could that be the difference? [20:44:40] Krenair: we switched to the standard apache user (www-data) with the 14.04 update [20:45:00] jynus: really shouldn't matter. both use the same driver script on the MW server side [20:45:01] (it is another random suggestion) [20:45:12] I supposed so [20:45:49] RECOVERY - nova-compute process on labvirt1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [20:46:18] I just did a manual sync between two hosts on the list of failures and it worked fine (to mw2113 from mw2080.codfw.wmnet) [20:46:53] bd808, do you want me to retry? "have you tried to switch it on an of again" [20:46:58] the sync updated wikiversions, InitialiseSettings and db-eqiad [20:47:12] that might be the thing to do, yeah [20:47:22] Krenair: wah that's almost a year old then :) [20:47:23] !log restarting eventlogging with 12 client side processors on eventlog1001 [20:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:47:30] YuviPanda, what, the mwscript file? [20:47:43] yeah, over: -rwxr-xr-x 1 root root 654 Aug 8 2014 /usr/local/bin/mwscript [20:47:47] yeah, before www-data [20:48:14] but it's a 14.04 machine running hhvm [20:48:17] so how did this ever work? [20:48:34] it wasn't switched over during hhvm [20:48:37] it came after, I think [20:48:43] when I was doing the betacluster work [20:50:14] yeah, we did it when the jessie boxes were being added I think (or there abouts) [20:50:25] 12/12 proxis [20:50:28] !log jynus@tin Synchronized wmf-config/db-eqiad.php: depool es1001; increase weight of es1015 and es1019 (duration: 00m 19s) [20:50:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:50:37] legoktm: sure [20:50:41] 466/466 mw [20:50:48] emh [20:50:48] sunspots! [20:50:51] ? 
[20:51:00] no, really [20:51:10] this is worse than before [20:51:18] all failed? [20:51:21] before que had a problem [20:51:35] now we have a problem only sometimes (all worked) [20:52:12] should I be happy or sad? [20:53:02] as someone with root you should be sad. roots hate nondeterminism [20:54:36] aude: umm, I see that 'Update Wikidata - Fix uncaught exception on some diff pages' wasn't pulled on tin? [20:55:20] 6operations: mw1017 has outdated broken mwscript - https://phabricator.wikimedia.org/T112174#1628038 (10Krenair) 3NEW [20:55:26] does the host firewall log drops? If it does maybe someone can look at mw1201.eqiad.wmnet (most failures) around 20:08 [20:56:02] I think mor* usually let those in logging mode for a while when first setup [20:56:09] aude: nvm, it was pulled, just weirdly [20:56:30] SMalyshev: wanna try out something with the icinga commands? [20:56:43] mutante: sure [20:57:17] SMalyshev: so you can't send anything right now, correct? it tells you some permission denied thing [20:57:51] twentyafterfour: umm, why did you make a local commit to wmf22 to pull in the echo change? [20:58:44] mutante: right, I can't disable notifications etc. [20:58:52] mutante: says "Not Authorized" [20:59:10] bd808: there's no firewall on mw1201 [20:59:36] ah right. so that shoots that idea down entirely [20:59:36] so, that is an X-file, for sure [20:59:40] is /var/cache/hhvm/cli.hhbc.sq3 supposed to be owned by root on some random hosts? it's usually www-data, and seems to prevent `php -a` from being run by www-data [20:59:44] SMalyshev: ok, so next step let's make you a contact for a specific service to test this with [21:00:34] checking if you are in a contact group yet [21:00:48] Krenair: it "should" always be owned by www-data. 
if it's owned by root that means something is running php without proper sudo wrapping [21:01:09] so I guess twentyafterfour isn't around [21:01:12] let me check whether rsyncd logged something on mw1201 [21:01:17] can someone help me figure out what's going on on tin? [21:01:29] what's up? [21:01:37] SMalyshev: i guess we should make a new contact group first, and call it "wdqs-admins" or so. does that make sense? [21:01:46] mutante: yes, it does [21:01:58] he created a local commit bumping the Echo submodule, but now my core submodule bump is unable to rebase [21:02:03] and it spit out a bunch of scary git errors [21:02:19] but bd808 that should have preventy me from doing it again, shouldn't it? [21:02:25] RECOVERY - Disk space on ms-be2006 is OK: DISK OK [21:02:54] legoktm, ahh... it's that [21:03:00] Krenair: http://fpaste.org/265833/18975144/raw/ [21:03:01] 6operations: mw1017 has outdated broken mwscript - https://phabricator.wikimedia.org/T112174#1628084 (10bd808) mwscript is provisioned by scap::scripts. That class is applied directly on terbium and indirectly on hosts with role::deployment::server (tin, mira) and role::nova::manager (silver). If /usr/local/bin... [21:03:48] legoktm, oh wow, I haven't seen something like that before [21:04:09] (03PS1) 10Dzahn: icinga: add new contact group wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237499 (https://phabricator.wikimedia.org/T111243) [21:04:13] jynus: moritzm is reminding us that base::firewall isn't applied in eqiad yet; just codfw [21:04:22] true [21:04:32] so the failures we saw in eqiad were caused by something else [21:04:53] still, the errors was a socket io error [21:04:55] Krenair: so I'm not sure what to do...can I just throw away his commit and bump it properly? 
[21:05:01] (and how do I do that :/) [21:05:06] aude, you have an un-rebased change on tin [21:05:19] (03PS2) 10Dzahn: icinga: add new contact group wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237499 (https://phabricator.wikimedia.org/T111243) [21:05:30] Krenair: the Wikidata one was pulled into the Wikidata repo, but not core... [21:05:33] (03PS3) 10Dzahn: icinga: add new contact group wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237499 (https://phabricator.wikimedia.org/T111243) [21:05:43] it's got a core submodule update [21:06:06] could it be transient network error? [21:06:09] (03CR) 10Dzahn: [C: 032] icinga: add new contact group wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237499 (https://phabricator.wikimedia.org/T111243) (owner: 10Dzahn) [21:06:20] ahh [21:06:40] SMalyshev: ^ there. after this we should pick a service and give it that contact group [21:07:05] jynus: possibly? but most of those connections should be within the same row. seems weird that so many would hit at the same time [21:07:06] probably in hiera [21:07:34] 6operations, 10Wikimedia-Git-or-Gerrit: Upgrade gerrit to latest 2.8.x (minor version upgrade) - https://phabricator.wikimedia.org/T65847#1628121 (10greg) p:5Normal>3Low Reducing priority as the energy spent on code-review tools in the near term (ie: for the next two quarters) will be spent on migrating to... [21:07:37] mutante: ok, I guess everything around wdqs_eqiad group should be there [21:07:49] I do not know, I predict that would resurface again [21:08:01] and we may have more information [21:08:09] looking at syslog on mw1201 there was a puppet change in /etc/rsync.d/frag-common (19:42 UTC), was that before the scap run failed? 
[21:08:35] yes, just before I think [21:08:37] a change in the hosts allow stanza [21:08:52] SMalyshev: i'll try "role/common/wdqs.yaml" in hiera [21:09:00] which would have restarted rsyncd as puppet carried it out [21:09:00] legoktm, don't touch it, am poking [21:09:14] ok, thanks [21:09:44] moritzm: the failed sync was at 20:08 [21:09:58] Krenair: twentyafterfour deployed it for me [21:10:27] * aude assumes it really is deployed [21:10:53] in the preceding change one allowed network was changed from 2620:0:861:2:: to 2620:0:860::/46 [21:11:02] mutante: maybe makes sense to keep irc (or if possible irc that goes to wikmedia-search) in the contacts? [21:11:20] * aude knows it is deployed [21:11:32] legoktm, how's that? [21:11:35] mutante: or maybe irc-wikidata. After all it's *wikidata* query service :) [21:12:03] ah, and later on in syslog is shows: [21:12:05] Krenair: awesome, thanks [21:12:09] Krenair: what did you do? [21:12:16] bind() failed, Address already in use [21:12:16] legoktm, magic [21:12:31] SMalyshev: we have wikidata-feed for stuff like gerrit bot [21:12:33] (03PS1) 10Dzahn: icinga: set admin groups for common/wdqs in hiera [puppet] - 10https://gerrit.wikimedia.org/r/237504 (https://phabricator.wikimedia.org/T111243) [21:12:37] so the restart of rsynd failed after the "allowed networks" were changed [21:12:37] that sound interesting, moritzm [21:12:54] suppose icinga is more important though and maybe ok in the main wikidata channel [21:12:54] probably because puppet doesn't wait until the older process is shut down before starting the new one [21:12:59] !log legoktm@tin Synchronized php-1.26wmf22/extensions/Echo/modules: Align popup footer buttons to take 50% width each (duration: 00m 15s) [21:13:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:13:11] aude: SMalyshev: we already have icinga-wm in #wikidata [21:13:23] can do [21:13:31] so, moritzm, a race condition (shouldn't be very frequent)? 
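Note on the suspected race above: the guess in the channel is that the new rsyncd was started before the old one had released its listening socket, giving "bind() failed, Address already in use". The general pattern of waiting for the port before starting the replacement can be sketched as below. This is an illustration only, not how puppet or the rsync::server module actually behaves, and the helper names are made up.

```python
import socket
import time


def can_bind(host, port):
    """Try to bind host:port; False means something else still holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def wait_until_free(host, port, timeout=5.0, interval=0.1):
    """Poll until the port can be bound, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if can_bind(host, port):
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # Simulate the old daemon still holding its port.
    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old.bind(("127.0.0.1", 0))
    old.listen(1)
    port = old.getsockname()[1]
    print("free while old daemon runs:", can_bind("127.0.0.1", port))
    old.close()
    print("free after old daemon exits:", wait_until_free("127.0.0.1", port))
```

A restart that stops the old process, waits like this, and only then starts the new one would avoid the failed bind, at the cost of a short window with no listener.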
[21:13:31] The socket error is also logged: "rsync error: error in socket IO (code 10) at socket.c(555) [21:13:33] but kind of separate from the permissions [21:13:34] mutante: yeah that's what I mean [21:13:54] mutante: ok [21:13:54] legoktm, IIRC I made it start rebasing, went into the submodule and reset --hard to the correct new commit for the submodule update, cherry-pick'd the live patch back, staged the submodule change and rebase --continue'd [21:13:56] jynus: I think so, I'm having at look at what puppet does [21:14:10] I think [21:14:16] Krenair: sounds like magic :P thanks [21:14:31] I've had to do something like that a few times before, quite a pain [21:14:34] Krenair: will I/you have to do something similar if we have to backport an Echo patch again? [21:14:43] especially when you have to do it on both branches [21:14:44] because I know for sure we'll need to backport more stuff :/ [21:14:44] yeah [21:14:49] SMalyshev: and yes, i'm going to change contacts from "admins" to "admins,wdqs-admins" and "admins" has member "irc" [21:15:46] SMalyshev: mutante are there instructions on how to restart it? (and who has permission?) [21:15:57] mutante: ah, ok [21:16:00] * aude assumes any ops [21:16:10] aude: ops have permissions for everything :) [21:16:18] yeah [21:16:18] moritzm, do not lose too much time, it is too late for us [21:16:26] do they know what to do? 
[21:16:32] aude: docs are here: https://www.mediawiki.org/wiki/Wikidata_query_service/Implementation [21:16:39] k [21:16:49] aude: https://wikitech.wikimedia.org/wiki/Icinga#IRC_notification but yea, it needs a root [21:16:57] we had enough trouble with magnus' query service and labs breaking on the weekends [21:17:11] it's a bug somewhere the rsync::server module, but I don't know the puppet primitives well enough to pin-point, I'd say let's create a ticket and get rsync::server fixed [21:17:11] aude: (comments welcome btw - as I deployed it, there may be things that are obvious for me but less obvious for others :) [21:17:13] oh, you meant restarting the service or the bot ?:) [21:17:22] the service [21:17:25] ok [21:17:34] * aude can wait and see how often it is needed [21:17:41] (hopefully rarely) [21:17:56] aude: also, anybody with login/sudo can restart the services [21:18:27] the new part for me is that i'm trying to use hiera to set the groups, i'll check if it works on neon [21:18:35] aude: :( it shouldn't have broken recently, though? 
[21:18:39] (Magnus' query service) [21:18:44] YuviPanda: not recently [21:18:48] yay :) [21:18:58] I think it's been unbroken since the NFS outage when I ripped NFS from that project [21:18:59] SMalyshev: sounds ok for now [21:19:05] bd808: ^ that seems to have been caused by the preceding rsyncd restart, we'll should get that fixed, but it shoudn't affect further deployments today unless there's an additional change to the rsyncd config [21:19:26] thanks for digging [21:19:29] jynus: I'll create a ticket tomorrow, I'm calling it a day [21:20:09] (03PS2) 10Dzahn: icinga: set contact groups for common/wdqs in hiera [puppet] - 10https://gerrit.wikimedia.org/r/237504 (https://phabricator.wikimedia.org/T111243) [21:20:57] aude: yeah, there's wdqs-admins which is me for now, but if needed that's where it should be extended [21:21:48] (03PS3) 10Dzahn: icinga: set contact groups for common/wdqs in hiera [puppet] - 10https://gerrit.wikimedia.org/r/237504 (https://phabricator.wikimedia.org/T111243) [21:22:00] SMalyshev: ok [21:22:02] (03CR) 10Dzahn: [C: 032] icinga: set contact groups for common/wdqs in hiera [puppet] - 10https://gerrit.wikimedia.org/r/237504 (https://phabricator.wikimedia.org/T111243) (owner: 10Dzahn) [21:25:27] now it takes a bit, waiting for puppet on neon [21:25:37] but will get back to you [21:26:36] mutante: https://gerrit.wikimedia.org/r/#/c/235065/ [21:26:46] Otherwise that does nothing :) [21:28:24] umpf, that was exactly what i wanted to check [21:29:00] then how does it already get use in LVS config? 
[21:29:13] common/lvs/configuration.yaml [21:31:28] mutante: I see no evidence of its use [21:31:36] this is an example of https://de.wiktionary.org/wiki/vom_H%C3%B6lzchen_aufs_St%C3%B6ckchen_kommen :p which is untranslatable [21:33:12] at least it's not a bad change and just preparing for the future [21:36:33] JohnFLewis: i'm gonna use the role class for now (at least not site.pp) [21:37:05] should apply not only to the hosts but also to all services on that host [21:37:27] But when will the future be? :£ [21:37:46] jynus: ^^ see gerrit patch I linked above [21:39:40] (03CR) 10Dzahn: [C: 04-2] "nevermind, this won't work yet before something like https://gerrit.wikimedia.org/r/#/c/235065/" [puppet] - 10https://gerrit.wikimedia.org/r/237301 (owner: 10Dzahn) [21:40:42] (03PS2) 10Dzahn: contint: tweak Icinga contact group for prod servers [puppet] - 10https://gerrit.wikimedia.org/r/237045 (owner: 10Hashar) [21:40:54] (03PS1) 10Dzahn: wdqs: set icinga contact groups, add wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237508 (https://phabricator.wikimedia.org/T111243) [21:41:18] (03CR) 10Dzahn: [C: 032] contint: tweak Icinga contact group for prod servers [puppet] - 10https://gerrit.wikimedia.org/r/237045 (owner: 10Hashar) [21:41:58] (03PS2) 10Dzahn: wdqs: set icinga contact groups, add wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237508 (https://phabricator.wikimedia.org/T111243) [21:44:30] download.wikimedia.org is 4 times slower than usual [21:44:36] at least 4 times [21:44:53] is it worth notifying anywhere? [21:45:16] (03PS3) 10Dzahn: wdqs: set icinga contact groups, add wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237508 (https://phabricator.wikimedia.org/T111243) [21:45:37] Vito: yes, ideally can you create a ticket? 
[21:45:54] (03PS4) 10Dzahn: wdqs: set icinga contact groups, add wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237508 (https://phabricator.wikimedia.org/T111243) [21:46:47] (03CR) 10Dzahn: [C: 032] wdqs: set icinga contact groups, add wdqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/237508 (https://phabricator.wikimedia.org/T111243) (owner: 10Dzahn) [21:47:12] mutante: it could be a temporary carrier problem so I'm not sure a task is the best way to report [21:47:15] anyway let's do [21:49:05] (03PS1) 10Dzahn: Revert "phab: disable tools crons" [puppet] - 10https://gerrit.wikimedia.org/r/237513 [21:49:20] Vito: thanks [21:49:32] uhm [21:49:39] which project? [21:50:14] Vito: "operations" and we'll sort it out [21:52:22] (03PS2) 10Dzahn: Revert "phab: disable tools crons" [puppet] - 10https://gerrit.wikimedia.org/r/237513 (https://phabricator.wikimedia.org/T112135) [21:52:44] (03PS3) 10Dzahn: Revert "phab: disable tools crons" [puppet] - 10https://gerrit.wikimedia.org/r/237513 (https://phabricator.wikimedia.org/T112135) [21:53:40] (03PS4) 10Dzahn: Revert "phab: disable tools crons" [puppet] - 10https://gerrit.wikimedia.org/r/237513 (https://phabricator.wikimedia.org/T112135) [21:54:00] (03PS1) 10Dzahn: phab: re-enable dump script [puppet] - 10https://gerrit.wikimedia.org/r/237514 (https://phabricator.wikimedia.org/T112135) [21:54:10] 6operations: download.wikimedia.org is slow from Telecom Italia - https://phabricator.wikimedia.org/T112190#1628429 (10Vituzzu) [21:54:20] https://phabricator.wikimedia.org/T112190 here's mutante [21:54:47] (03PS2) 10Dzahn: phab: re-enable dump script [puppet] - 10https://gerrit.wikimedia.org/r/237514 (https://phabricator.wikimedia.org/T112135) [21:55:21] Vito: alright, looks good [21:55:22] 6operations, 10netops: download.wikimedia.org is slow from Telecom Italia - https://phabricator.wikimedia.org/T112190#1628434 (10Dzahn) [21:55:57] mine it's a simple g.dmt ADSL behind an ATM over PDH wireless link but still I never 
seen it running so slowly [21:56:10] oh I was about to use "netops" but I didn't find any description [21:56:32] (03PS3) 10Dzahn: phab: re-enable dump script [puppet] - 10https://gerrit.wikimedia.org/r/237514 (https://phabricator.wikimedia.org/T112135) [21:56:44] bblack, "Error: 403, Insecure POST Forbidden - use HTTPS" on beta? [21:56:46] Vito: it's kind of new [21:56:49] the tag i mean [21:57:08] mutante: my connection isn't for sure :p [21:58:32] (03CR) 10Dzahn: [C: 032] phab: re-enable dump script [puppet] - 10https://gerrit.wikimedia.org/r/237514 (https://phabricator.wikimedia.org/T112135) (owner: 10Dzahn) [22:00:16] 6operations, 10netops: download.wikimedia.org is slow from Telecom Italia - https://phabricator.wikimedia.org/T112190#1628450 (10MaxSem) Potentially related: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Images_won.27t_load [22:02:49] 6operations, 10netops: download.wikimedia.org is slow from Telecom Italia - https://phabricator.wikimedia.org/T112190#1628468 (10Vituzzu) Yep! Yesterday or maybe the day before some thumbnails (namely old file versions in Commons image history) didn't load. [22:06:29] 6operations, 6Phabricator: phabricator dump script should use slave db, not master - https://phabricator.wikimedia.org/T112193#1628514 (10Dzahn) 3NEW [22:07:39] 6operations, 10Beta-Cluster, 10Traffic: Beta giving Error: 403, Insecure POST Forbidden - https://phabricator.wikimedia.org/T112195#1628536 (10Krenair) 3NEW [22:09:05] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1628547 (10Dzahn) I re-enabled the dump script for T112135 (follow-up to make it use the slave at T112193). I did _not_ disable 2 other disabled crons called "b... 
[22:09:30] (03PS5) 10Dzahn: Revert "phab: disable tools crons" [puppet] - 10https://gerrit.wikimedia.org/r/237513 (https://phabricator.wikimedia.org/T112135) [22:11:27] legoktm: that local commit was due to a local commit in the subrepo. I just did it so that there wouldn't be uncommitted changes in git status [22:11:55] twentyafterfour: but that's not necessary. Just pull/rebase in core, and run "git submodule update Echo" [22:12:34] legoktm: ok, you want me to reset the commit? [22:12:41] yes please [22:13:47] (03PS8) 10Yuvipanda: redis: Have redis machines overcommit if persistance is enabled [puppet] - 10https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) [22:14:26] (03CR) 10Ori.livneh: [C: 04-1] "Get food first. Looks good otherwise." [puppet] - 10https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: 10Yuvipanda) [22:15:04] (03CR) 10Yuvipanda: "MAKE ME" [puppet] - 10https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: 10Yuvipanda) [22:15:44] legoktm: so git submodule update won't wipe out the submodule local commit? [22:15:58] no, it should rebase [22:16:29] PROBLEM - HTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:16:55] mutante, isn't download. on dataset1001? [22:17:42] Vito, ^ [22:17:45] Krenair: not anymore it seems [22:17:52] heh [22:17:53] download 600 IN DYNA geoip!text-addrs [22:18:06] } elsif (req.http.Host == "download.wikimedia.org") { [22:18:06] set req.backend = dataset1001; [22:18:19] but that isnt the misc cluster [22:18:33] that's the normal "text" cluster [22:18:41] ... true. [22:19:00] and that might be the change here .. 
[22:19:44] legoktm: ok it rebased, I guess that's how it should look [22:19:51] and now text load at download is pretty slow [22:26:07] (03PS1) 10Alex Monk: Don't try to enforce secure POSTs on beta [puppet] - 10https://gerrit.wikimedia.org/r/237523 (https://phabricator.wikimedia.org/T112195) [22:31:02] (03CR) 10Alex Monk: "I guess we want to do something with hiera here, this is just the hack I put on deployment-puppetmaster so I could continue what I was doi" [puppet] - 10https://gerrit.wikimedia.org/r/237523 (https://phabricator.wikimedia.org/T112195) (owner: 10Alex Monk) [22:33:08] (03CR) 10Dzahn: "can't we stop making the insecure requests instead of introducing another special rule for labs? i assume it's blocked by T50501 though, r" [puppet] - 10https://gerrit.wikimedia.org/r/237523 (https://phabricator.wikimedia.org/T112195) (owner: 10Alex Monk) [22:34:36] (03PS1) 10Andrew Bogott: Move labvirt1008 to kilo [puppet] - 10https://gerrit.wikimedia.org/r/237524 [22:35:05] (03CR) 10Alex Monk: "That was my assumption, yes." [puppet] - 10https://gerrit.wikimedia.org/r/237523 (https://phabricator.wikimedia.org/T112195) (owner: 10Alex Monk) [22:35:15] 6operations, 10Beta-Cluster, 6Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1628650 (10Dzahn) one more reason we should have this is to avoid needing https://gerrit.wikimedia.org/r/237523 (T105794, T112195) [22:35:42] (03CR) 10Andrew Bogott: [C: 032] Move labvirt1008 to kilo [puppet] - 10https://gerrit.wikimedia.org/r/237524 (owner: 10Andrew Bogott) [22:36:35] mutante, you probably don't need a hack like this for it [22:36:42] it can probably be much nicer [22:38:10] bd808: why would '+message:*CentralAuth' give kibana results but not '+message:*Central'? [22:39:39] Because "CentralAuth" is a word and you don't have a trailing wildcard? [22:40:06] also "message:*Foo" is the same as "message:Foo" [22:40:21] I think... 
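A minimal sketch of the token-matching behaviour discussed just above, assuming a toy analyzer that lowercases text and splits it on non-alphanumeric characters. This is not Lucene or Kibana code; `tokenize` and `token_search` are hypothetical helpers and the sample log message is made up. It only illustrates why a pattern without a trailing wildcard has to match a whole token:

```python
import re
from fnmatch import fnmatchcase


def tokenize(message):
    """Split a message into lowercase word tokens (toy stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", message.lower())


def token_search(pattern, message):
    """Return True if any single token matches the entire wildcard pattern."""
    pattern = pattern.lower()
    return any(fnmatchcase(token, pattern) for token in tokenize(message))


# Made-up log message for illustration only.
log_line = "CentralAuth login failed for user Example"

print(token_search("*CentralAuth", log_line))  # pattern covers the whole token
print(token_search("*Central", log_line))      # no token ends in "central"
print(token_search("*Central*", log_line))     # trailing wildcard covers "auth"
```

In this toy model the second query finds nothing because "centralauth" is indexed as one token and the pattern stops at "central"; adding a trailing wildcard lets it match again.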
[22:40:54] it's lucene search under the hood so it is a token search [22:44:11] ori: AaronSchulz am going to merge https://gerrit.wikimedia.org/r/#/c/194095/ [22:44:22] will you guys be around for a bit in case redis starts acting up? [22:44:27] or should I postpone deployment? [22:47:32] (03CR) 10Yurik: "Brandon, did you see our comments above?" [puppet] - 10https://gerrit.wikimedia.org/r/237368 (owner: 10BBlack) [22:47:38] bblack, ^ [22:48:58] YuviPanda: I assume so [22:49:01] YuviPanda: i'm here [22:49:58] ori: ok, remove -1? :) [22:50:14] (03CR) 10Ori.livneh: [C: 031] redis: Have redis machines overcommit if persistance is enabled [puppet] - 10https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: 10Yuvipanda) [22:51:36] nothing for swat? [22:54:28] (03CR) 10GWicke: [C: 031] cassandra: install certs and CA from private.git [puppet] - 10https://gerrit.wikimedia.org/r/237397 (https://phabricator.wikimedia.org/T108953) (owner: 10Filippo Giunchedi) [22:55:16] (03CR) 10MarcoAurelio: [C: 031] "I think that it makes sense to do it. Per once a wiki is c" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224771 (owner: 10Dereckson) [22:55:57] (03PS1) 10Alex Monk: Revert "Add interwiki-labs.cdb" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237529 [22:57:20] (03CR) 10Alex Monk: Cleans up wikimania2013. 
extraneous rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224771 (owner: 10Dereckson) [22:58:12] 10Ops-Access-Requests, 6operations, 3Discovery-Wikidata-Query-Service-Sprint, 7Icinga, 5Patch-For-Review: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1628808 (10Dzahn) a:3Dzahn [22:58:16] (03PS5) 10Alex Monk: noindex user namespace on en.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 (https://phabricator.wikimedia.org/T104797) (owner: 10Mdann52) [22:58:24] (03CR) 10Alex Monk: noindex user namespace on en.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237330 (https://phabricator.wikimedia.org/T104797) (owner: 10Mdann52) [23:00:04] RoanKattouw ostriches rmoen Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150910T2300). [23:01:55] okay then [23:02:28] so, nothing on the deployments calendar... let's see if I can pull some trivial stuff out of the review queue [23:03:06] (03PS2) 10Alex Monk: Cleans up wikimania2013. extraneous rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224771 (owner: 10Dereckson) [23:03:27] (03CR) 10Alex Monk: [C: 032] Cleans up wikimania2013. extraneous rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224771 (owner: 10Dereckson) [23:03:33] (03Merged) 10jenkins-bot: Cleans up wikimania2013. 
extraneous rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224771 (owner: 10Dereckson) [23:04:37] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/224771 (duration: 00m 12s) [23:04:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:11:15] 6operations, 10Wikimedia-Git-or-Gerrit: Wikimedia Gerrit doesn't work if OpenSSH version is higher than 7.0 - https://phabricator.wikimedia.org/T112025#1628930 (10greg) [23:11:41] (03CR) 10Alex Monk: [C: 032] "Yep, seems to work. Thanks." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 (owner: 10BryanDavis) [23:12:13] (03Merged) 10jenkins-bot: wikitech: Local logging config for ldap debugging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 (owner: 10BryanDavis) [23:12:50] jynus around for a quick pm? [23:13:09] !log krenair@tin Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/221825 (duration: 00m 13s) [23:13:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:13:28] addshore, * [jynus] idle 01:56:59, signon: Thu Sep 10 09:11:30 [23:13:47] ahh, will leave it to tomorrow! 
:) [23:18:03] (03PS9) 10Yuvipanda: redis: Have redis machines overcommit if persistance is enabled [puppet] - 10https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) [23:18:17] (03CR) 10Yuvipanda: [C: 032 V: 032] redis: Have redis machines overcommit if persistance is enabled [puppet] - 10https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: 10Yuvipanda) [23:22:11] ori: merged [23:22:15] thanks [23:26:17] ori: looks ok (on mc1001) [23:29:34] YuviPanda: sweet, thanks [23:35:09] PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: puppet fail [23:35:29] hmm [23:36:08] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: puppet fail [23:36:12] am looking [23:38:23] (03PS1) 10Dzahn: wdqs: set icinga contact group on node [puppet] - 10https://gerrit.wikimedia.org/r/237535 [23:38:39] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: undefined method `empty?' for false:FalseClass at /etc/puppet/modules/redis/manifests/init.pp:49 on node rcs1001.eqiad.wmnet [23:39:36] lolpuppet :'( [23:39:58] false:FalseClass ? 
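The catalog error above is what happens when a string-style emptiness check is applied to a boolean false. A minimal Python sketch of the same class of bug and one possible guard follows. It is an analogy only, not the Puppet manifest or the actual fix; `persistence_enabled`, `unguarded`, and the example path are hypothetical, and it assumes the parameter is either false or a string:

```python
def persistence_enabled(persist):
    """Return True only when persist is a non-empty string.

    Assumption for illustration: persist is either False/None
    (persistence off) or a string describing where data is persisted.
    """
    # Handle the boolean/None case first, before any string-only operation.
    if persist is False or persist is None:
        return False
    return len(persist.strip()) > 0


def unguarded(persist):
    """The failing pattern: assumes persist is always a string."""
    return len(persist.strip()) == 0


print(persistence_enabled(False))
print(persistence_enabled("/srv/redis"))  # made-up example value

try:
    unguarded(False)
except AttributeError as err:
    # Python's analogue of "undefined method `empty?' for false:FalseClass".
    print("unguarded check failed:", err)
```

The design point is simply to test for the false case before calling anything that only exists on strings.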
[23:40:39] Krenair: https://gerrit.wikimedia.org/r/#/c/233659/ [23:41:59] it fixed the cert issue for download.wm [23:42:10] that's why it's not dumps directly anymore [23:42:18] ah [23:43:16] (03PS1) 10Yuvipanda: redis: Handle the case when $persist is false [puppet] - 10https://gerrit.wikimedia.org/r/237536 [23:43:20] ori: ^ fix rcstream puppetfail [23:43:22] (03CR) 10jenkins-bot: [V: 04-1] redis: Handle the case when $persist is false [puppet] - 10https://gerrit.wikimedia.org/r/237536 (owner: 10Yuvipanda) [23:43:35] (03PS2) 10Yuvipanda: redis: Handle the case when $persist is false [puppet] - 10https://gerrit.wikimedia.org/r/237536 [23:43:59] (03CR) 10Ori.livneh: [C: 031] redis: Handle the case when $persist is false [puppet] - 10https://gerrit.wikimedia.org/r/237536 (owner: 10Yuvipanda) [23:44:11] (03CR) 10Yuvipanda: [C: 032 V: 032] redis: Handle the case when $persist is false [puppet] - 10https://gerrit.wikimedia.org/r/237536 (owner: 10Yuvipanda) [23:45:28] (03CR) 10Alex Monk: [C: 032] Enable Extension:GuidedTour on srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237056 (https://phabricator.wikimedia.org/T107862) (owner: 10MarcoAurelio) [23:45:48] (03Merged) 10jenkins-bot: Enable Extension:GuidedTour on srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237056 (https://phabricator.wikimedia.org/T107862) (owner: 10MarcoAurelio) [23:47:55] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237056/ (duration: 00m 11s) [23:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:48:19] ori: ok now [23:49:19] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:53] 6operations, 10netops: download.wikimedia.org is slow from Telecom Italia - https://phabricator.wikimedia.org/T112190#1629057 (10Dzahn) for completeness because we talked about it on IRC: Recently on Aug 25 with [[ 
https://gerrit.wikimedia.org/r/#/c/233659/2 | change 233659 ]] download.wm has been switched t... [23:51:36] (03CR) 10Alex Monk: [C: 032] Change wgSitename, wgMetaNamespace and wgMetaNamespaceTalk for srwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237064 (https://phabricator.wikimedia.org/T111247) (owner: 10MarcoAurelio) [23:51:58] (03Merged) 10jenkins-bot: Change wgSitename, wgMetaNamespace and wgMetaNamespaceTalk for srwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237064 (https://phabricator.wikimedia.org/T111247) (owner: 10MarcoAurelio) [23:52:26] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237064/ (duration: 00m 11s) [23:52:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:58:10] 6operations, 10Beta-Cluster, 10Traffic, 5Patch-For-Review: Beta giving Error: 403, Insecure POST Forbidden - https://phabricator.wikimedia.org/T112195#1629068 (10Dzahn) importing gerrit comments: // Alex Monk 15:31 Patch Set 1: I guess we want to do something with hiera here, this is just the hack I put... [23:59:37] 6operations, 10Beta-Cluster, 6Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#527164 (10Dzahn) [23:59:39] 6operations, 10Beta-Cluster, 10Traffic, 5Patch-For-Review: Beta giving Error: 403, Insecure POST Forbidden - https://phabricator.wikimedia.org/T112195#1629090 (10Dzahn)