[00:00:00] I wasn't allowed to change the topic here last I tried. I went ahead and added your request to the meeting agenda.
[00:09:23] thanks shdubsh
[00:10:12] looks like I don't have the mojo for topic change either. I bet the channel has topiclock on
[00:10:31] i think it was due to that spam attack a few years ago.
[00:20:31] !log add port 22 in cloud-in4 term labsdb
[00:20:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:36] PROBLEM - High load average on labstore1007 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [24.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[01:22:09] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:24:13] Operations, Parsoid, RESTBase, Traffic, and 5 others: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 (mobrovac) >>! In T215956#4958423, @BBlack wrote: > Correct me if I'm wrong, but I would think all VE traffic would already be uncacheable at the Varnish level a...
[01:53:29] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[02:17:27] (PS1) Andrew Bogott: labsdb: open up some firewall ports for VM access [puppet] - https://gerrit.wikimedia.org/r/490960
[02:18:05] (CR) jerkins-bot: [V: -1] labsdb: open up some firewall ports for VM access [puppet] - https://gerrit.wikimedia.org/r/490960 (owner: Andrew Bogott)
[02:25:46] (PS2) Andrew Bogott: labsdb: open up some firewall ports for VM access [puppet] - https://gerrit.wikimedia.org/r/490960
[02:26:20] (CR) jerkins-bot: [V: -1] labsdb: open up some firewall ports for VM access [puppet] - https://gerrit.wikimedia.org/r/490960 (owner: Andrew Bogott)
[02:26:56] (CR) Andrew Bogott: [V: +2 C: +2] "Overriding the linter because this is a soon-to-be removed class for a soon-to-be-decom'd server" [puppet] - https://gerrit.wikimedia.org/r/490960 (owner: Andrew Bogott)
[02:27:18] RECOVERY - High load average on labstore1007 is OK: OK: Less than 85.00% above the threshold [16.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[02:39:24] PROBLEM - High load average on labstore1007 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [24.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[06:28:31] PROBLEM - Check systemd state on netmon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:32:29] PROBLEM - puppet last run on cp5008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:38:23] RECOVERY - Check systemd state on netmon2001 is OK: OK - running: The system is fully operational
[06:43:17] PROBLEM - Check systemd state on netmon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
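The labstore1007 load alerts switch between CRITICAL ("100.00% of data above the critical threshold [24.0]") and OK ("Less than 85.00% above the threshold [16.0]"). One plausible reading, assumed here rather than taken from the check's actual source, is that the check inspects a window of load-average samples and fires when at least a given percentage of them exceed a threshold. A minimal Python sketch of that logic, with thresholds copied from the alert text and the function name `check_load` purely hypothetical:

```python
# Hypothetical sketch of a "percentage of samples over threshold" check,
# modelled on the alert wording above; NOT the real monitoring plugin.
from typing import Sequence, Tuple


def check_load(
    datapoints: Sequence[float],
    warning: float = 16.0,     # threshold quoted in the RECOVERY message
    critical: float = 24.0,    # threshold quoted in the PROBLEM message
    percentage: float = 85.0,  # "Less than 85.00% above the threshold"
) -> Tuple[str, float]:
    """Return (state, percent_of_samples_above) for a window of samples."""
    if not datapoints:
        return "UNKNOWN", 0.0
    n = len(datapoints)
    # Share of samples strictly above the critical threshold.
    pct_crit = 100.0 * sum(1 for v in datapoints if v > critical) / n
    if pct_crit >= percentage:
        return "CRITICAL", pct_crit
    # Otherwise fall back to the warning threshold.
    pct_warn = 100.0 * sum(1 for v in datapoints if v > warning) / n
    if pct_warn >= percentage:
        return "WARNING", pct_warn
    return "OK", pct_warn
```

Under this assumed model, a short stretch of samples below the threshold is enough to clear the alert, which would be consistent with the 02:27 RECOVERY being followed by another PROBLEM at 02:39.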
[06:58:33] RECOVERY - puppet last run on cp5008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:08:49] RECOVERY - Check systemd state on netmon2001 is OK: OK - running: The system is fully operational
[08:49:51] Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (aaron) >>! In T211661#4928637, @ori wrote: > It seems that Swift has [[ https://docs.openstack.org/swift/latest/api/object-exp...
[10:13:23] (PS1) Elukey: Remove profile::oozie::client global configuration [puppet] - https://gerrit.wikimedia.org/r/490965
[10:17:39] (PS2) Elukey: Remove profile::oozie::client global configuration [puppet] - https://gerrit.wikimedia.org/r/490965
[10:20:11] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/14709/" [puppet] - https://gerrit.wikimedia.org/r/490965 (owner: Elukey)
[10:42:43] as I read things via special:version Extension:LabeledSectionTransclusion is installed on wikimaniawiki however, it isn't functioning (same pages tested plWikisource) for different results
[10:43:01] what am I missing?
[10:45:38] or more accurately I am not seeing #lst at wikimaniawiki
[11:01:20] Just got an exception..
[11:01:24] "Original exception: [XGftIQpAAEMAADgWX4AAAAAS] 2019-02-16 11:00:46: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError""
[11:01:26] :O
[11:35:42] not to worry, typos don't help
[14:17:52] Operations, ops-eqiad, Patch-For-Review, cloud-services-team (Kanban): Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855 (aborrero) Checking cloudvirt1020: {F28229232} {F28229236} So this should be ready to go from this point of view.
[14:21:57] !log T194855 cloudvirt1020 is poweroff, waiting for disk setup before installing
[14:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:00] T194855: Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855
[14:45:04] Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): cloudvirt1009: evaluate upgrading to 10G - https://phabricator.wikimedia.org/T216324 (aborrero) p:Triage→High
[15:25:41] Hi, I have a question about SWAT deployment
[15:26:03] How can I submit a patch to SWAT
[15:47:11] WQL: you add it to the deployment calendar after making sure it's passed through code review https://wikitech.wikimedia.org/wiki/Deployments#Upcoming
[15:47:51] thx
[15:47:54] if you're not sure if the patch is ready, you can ask in #wikimedia-releng when people are around (more likely Mon through Fri)
[15:49:11] err let me see...
[15:49:27] I have added into the calendar
[15:49:28] https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=1816328&oldid=1816244
[15:49:37] Tuesday, February 19
[15:49:44] is this right?
[15:59:34] Operations, DNS, Domains, Traffic, HTTPS: Merge Wikipedia subdomains into one, to discourage censorship - https://phabricator.wikimedia.org/T215071 (Liuxinyu970226) @Krenair > How exactly would certs from LetsEncrypt be a downgrade in security? Because, as I've tried once on my localhost, th...
[16:01:31] wql... that looks... like they left
[16:01:39] oh well I was going to say 'it looks fine', heh
[16:09:13] (PS3) ArielGlenn: misc dumps: report names of most recent failed wikis if we bail out [dumps] - https://gerrit.wikimedia.org/r/488261
[16:10:39] (CR) ArielGlenn: [C: +2] misc dumps: report names of most recent failed wikis if we bail out [dumps] - https://gerrit.wikimedia.org/r/488261 (owner: ArielGlenn)
[16:11:34] (PS2) ArielGlenn: generate recombined multistream index file without (m)awk [dumps] - https://gerrit.wikimedia.org/r/490591 (https://phabricator.wikimedia.org/T215414)
[16:11:39] Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) Good point. I think we just ought to clean up that loophole in Varnish, which is already removing the most obvious cac...
[16:23:14] (CR) ArielGlenn: [C: +2] generate recombined multistream index file without (m)awk [dumps] - https://gerrit.wikimedia.org/r/490591 (https://phabricator.wikimedia.org/T215414) (owner: ArielGlenn)
[16:25:59] !log ariel@deploy1001 Started deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures
[16:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:03] !log ariel@deploy1001 Finished deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures (duration: 00m 03s)
[16:26:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:30] !help
[16:42:30] want docs? ask for "!wm-bot". all keywords? try "@regsearch .*"
[17:06:23] (PS1) Arturo Borrero Gonzalez: wmcs: introduce new toolsdb primary role [puppet] - https://gerrit.wikimedia.org/r/491003 (https://phabricator.wikimedia.org/T193264)
[17:15:34] (PS1) Bstorm: toolsdb: fix up the config for the new server [puppet] - https://gerrit.wikimedia.org/r/491004 (https://phabricator.wikimedia.org/T216208)
[17:16:31] (CR) jerkins-bot: [V: -1] toolsdb: fix up the config for the new server [puppet] - https://gerrit.wikimedia.org/r/491004 (https://phabricator.wikimedia.org/T216208) (owner: Bstorm)
[17:17:41] (PS1) Arturo Borrero Gonzalez: cloudvps: refresh FQDN A record for tools.db.svc.eqiad.wmflabs [puppet] - https://gerrit.wikimedia.org/r/491005 (https://phabricator.wikimedia.org/T193264)
[17:19:25] (PS2) Bstorm: toolsdb: fix up the config for the new server [puppet] - https://gerrit.wikimedia.org/r/491004 (https://phabricator.wikimedia.org/T216208)
[17:20:28] (CR) Arturo Borrero Gonzalez: [C: +1] toolsdb: fix up the config for the new server [puppet] - https://gerrit.wikimedia.org/r/491004 (https://phabricator.wikimedia.org/T216208) (owner: Bstorm)
[17:25:05] (PS3) Bstorm: toolsdb: fix up the config for the new server [puppet] - https://gerrit.wikimedia.org/r/491004 (https://phabricator.wikimedia.org/T216208)
[17:26:08] (CR) Arturo Borrero Gonzalez: [C: +1] toolsdb: fix up the config for the new server [puppet] - https://gerrit.wikimedia.org/r/491004 (https://phabricator.wikimedia.org/T216208) (owner: Bstorm)
[17:26:56] (CR) Bstorm: [C: +2] toolsdb: fix up the config for the new server [puppet] - https://gerrit.wikimedia.org/r/491004 (https://phabricator.wikimedia.org/T216208) (owner: Bstorm)
[17:32:38] (CR) Bstorm: wmcs: introduce new toolsdb primary role (1 comment) [puppet] - https://gerrit.wikimedia.org/r/491003 (https://phabricator.wikimedia.org/T193264) (owner: Arturo Borrero Gonzalez)
[17:43:01] (PS6) Zoranzoki21: Add new throttle rule for T215839 [mediawiki-config] - https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839)
[17:45:29] (PS4) ArielGlenn: showcrcs: util to write out crc information from a bzip2 file [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/490299 (https://phabricator.wikimedia.org/T216009)
[17:57:49] PROBLEM - Host thumbor2002 is DOWN: PING CRITICAL - Packet loss = 100%
[18:05:51] (PS11) ArielGlenn: dumps:nfs: add data types, move variables to parameters and Hiera [puppet] - https://gerrit.wikimedia.org/r/479335 (owner: Dzahn)
[18:08:17] (CR) ArielGlenn: [C: +2] dumps:nfs: add data types, move variables to parameters and Hiera [puppet] - https://gerrit.wikimedia.org/r/479335 (owner: Dzahn)
[18:16:17] Operations, Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (bd808) > The plan has syslog + json as formatting, since that's what we use for logstash already and preserves more information. Although we could have...
[18:33:10] sigh, downtime expired on thumbor2002
[18:33:14] sorry
[18:43:22] (PS10) Paladox: Add support for cherry picking with merge conflicts from the UI (PolyGerrit) [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/490225
[18:45:19] finally got ^^ working!
[18:50:51] (PS1) Andrew Bogott: labsdb: add ::role::mariadb::ferm to the master role [puppet] - https://gerrit.wikimedia.org/r/491007
[18:52:41] (CR) Andrew Bogott: "I don't especially think we should merge this, but the puppet compiler confirms that it does what it says it does." [puppet] - https://gerrit.wikimedia.org/r/491007 (owner: Andrew Bogott)
[18:59:25] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:30:41] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[19:41:57] RECOVERY - ElasticSearch shard size check - eqiad-search- on search.svc.eqiad.wmnet is OK: OK - All good!
[19:43:52] Yay!
[20:38:02] (PS1) Bstorm: maintain_dbusers: add the new database VM [puppet] - https://gerrit.wikimedia.org/r/491013 (https://phabricator.wikimedia.org/T193264)
[20:46:52] (PS2) Bstorm: wmcs: introduce new toolsdb primary role [puppet] - https://gerrit.wikimedia.org/r/491003 (https://phabricator.wikimedia.org/T193264) (owner: Arturo Borrero Gonzalez)
[20:47:39] (PS3) Bstorm: wmcs: introduce new toolsdb primary role [puppet] - https://gerrit.wikimedia.org/r/491003 (https://phabricator.wikimedia.org/T193264) (owner: Arturo Borrero Gonzalez)
[20:51:14] (PS4) Bstorm: wmcs: introduce new toolsdb primary role [puppet] - https://gerrit.wikimedia.org/r/491003 (https://phabricator.wikimedia.org/T193264) (owner: Arturo Borrero Gonzalez)
[20:52:33] (CR) Bstorm: "This is blocked on network. The VM cannot be contacted by the labstore cluster at the moment." [puppet] - https://gerrit.wikimedia.org/r/491013 (https://phabricator.wikimedia.org/T193264) (owner: Bstorm)
[20:53:13] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-herron: Ban recurrent spam to Wikimedia mailing lists (January 2019) - https://phabricator.wikimedia.org/T215251 (Quiddity) All seems clear on the lists I subscribe to and admin. Thanks for the work here!
[20:53:17] (PS5) Bstorm: wmcs: introduce new toolsdb primary role [puppet] - https://gerrit.wikimedia.org/r/491003 (https://phabricator.wikimedia.org/T193264) (owner: Arturo Borrero Gonzalez)
[20:54:32] (CR) Bstorm: [C: +2] wmcs: introduce new toolsdb primary role [puppet] - https://gerrit.wikimedia.org/r/491003 (https://phabricator.wikimedia.org/T193264) (owner: Arturo Borrero Gonzalez)
[21:03:05] Operations: New cronspam from db clusters - https://phabricator.wikimedia.org/T216273 (Marostegui) We probably just need to reboot them without the kernel running debug mode as spoken on Friday
[21:21:53] (PS2) Bstorm: cloudvps: refresh FQDN A record for tools.db.svc.eqiad.wmflabs [puppet] - https://gerrit.wikimedia.org/r/491005 (https://phabricator.wikimedia.org/T193264) (owner: Arturo Borrero Gonzalez)
[21:54:23] (CR) Bstorm: "This is presuming we are using the publicly routable static IP. I am curious if there are alternatives to doing that, not that it doesn't" [puppet] - https://gerrit.wikimedia.org/r/491013 (https://phabricator.wikimedia.org/T193264) (owner: Bstorm)
[22:18:25] Operations, DNS, Domains, Traffic, HTTPS: Merge Wikipedia subdomains into one, to discourage censorship - https://phabricator.wikimedia.org/T215071 (Platonides) I don't think any //certificate// could. The SNI is transferred //before// the certificate is presented by the server. The server ca...
[23:00:43] Operations, ops-eqiad, DC-Ops, decommission, cloud-services-team (Kanban): decommission labvirt101[01].eqiad.wmnet (Dec 2018 lease return) - https://phabricator.wikimedia.org/T210735 (Andrew) I marked these as 'offline' which is not totally accurate but the closest thing I could find. Is it...
[23:56:37] PROBLEM - puppet last run on install1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:58:34] (PS5) Krinkle: PhpAutoPrepend: Remove PhpAutoPrepend-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/486177
[23:59:16] (CR) Krinkle: [C: +2] "krinkle@deployment-deploy01:/srv/mediawiki$ cat /etc/hhvm/* | grep prepend" [mediawiki-config] - https://gerrit.wikimedia.org/r/486177 (owner: Krinkle)
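Platonides' comment on T215071 rests on TLS ordering: the client names the host it wants (SNI) in its ClientHello, which is sent before the server presents any certificate, so the choice of certificate cannot hide that hostname. A minimal Python sketch using only the standard `ssl` module's in-memory BIOs illustrates this without opening a network connection; the hostname is just an example, and `client_hello` is a hypothetical helper name:

```python
import ssl


def client_hello(hostname: str) -> bytes:
    """Start a TLS handshake in memory and return the client's first flight."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what fills in the SNI extension of the ClientHello.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello has been written to `outgoing` and the
        # client now waits for a ServerHello and certificate that never come.
        pass
    return outgoing.read()


hello = client_hello("zh.wikipedia.org")
print(hello[0] == 0x16)              # first byte: TLS handshake record type
print(b"zh.wikipedia.org" in hello)  # hostname readable in cleartext
```

Both checks print `True`: the requested hostname is readable in the very first bytes on the wire, before any certificate has been exchanged.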