[02:20:20] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.8) (duration: 08m 17s)
[02:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:04:02] Operations, Wikimedia-Mailing-lists: New mail list for Signpost team - https://phabricator.wikimedia.org/T197732#4301223 (Brianhe)
[03:27:11] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 876.26 seconds
[03:38:01] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 228.52 seconds
[04:57:31] (PS1) Tim Starling: Rewrite sql script to use the new mysql.php wrapper [puppet] - https://gerrit.wikimedia.org/r/441153
[05:35:17] Operations, MediaWiki-extensions-Translate, Language-2018-Apr-June, User-Nikerabbit, and 2 others: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293#4301482 (Nikerabbit) a:Nikerabbit>None
[05:39:48] (CR) Smalyshev: Add cirrussearch settings for wikibase (1/3) (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse)
[06:24:55] Operations, MediaWiki-extensions-Translate, Language-2018-Apr-June, User-Nikerabbit, and 2 others: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293#4301537 (Aklapper) >>! In T195293#4224220, @Nikerabbit wr...
[06:49:15] Operations, Wikimedia-Mailing-lists: New mail list for Signpost team - https://phabricator.wikimedia.org/T197732#4301553 (Aklapper) @Brianhe: Will a secondary list administrator's email address be provided?
[06:53:10] (PS1) Dzahn: site: add spare role to bast2002 [puppet] - https://gerrit.wikimedia.org/r/441157
[06:54:43] (PS2) Dzahn: site: add spare role to bast2002 [puppet] - https://gerrit.wikimedia.org/r/441157 (https://phabricator.wikimedia.org/T196665)
[06:56:11] (CR) Dzahn: [C: 2] site: add spare role to bast2002 [puppet] - https://gerrit.wikimedia.org/r/441157 (https://phabricator.wikimedia.org/T196665) (owner: Dzahn)
[06:59:06] Operations, ops-codfw, Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4301559 (Dzahn) a:Dzahn>None Thank you Papaul. I am unassigning from myself for now because i will be on vacation for a couple weeks. It doesn't have to be blocked by...
[07:00:50] Operations, ops-codfw, Patch-For-Review: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665#4301563 (Dzahn) P.S. Just added role(spare) because that adds firewalling. As pointed out by others we should avoid having servers (with a public IP) in puppet without a rol...
[07:02:24] Operations, ops-eqiad, Analytics, User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707#4301564 (elukey) p:Triage>High a:elukey
[07:03:37] Operations, ops-eqiad, Analytics, User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707#4300251 (elukey) @RobH, @Cmjohnson - is it possible to swap the disk even if warranty is expired? We are not ready yet to decom this host (but will anticipate its hw replacement...
[07:03:49] Operations, Wikimedia-IRC-RC-Server, Patch-For-Review: Replace ircd-ratbox with something newer/maintained - https://phabricator.wikimedia.org/T134271#4301568 (Dzahn) p:Normal>Low
[07:06:50] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[07:07:10] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[07:21:03] Operations: Redirect http://status.wikipedia.org to http://status.wikimedia.org - https://phabricator.wikimedia.org/T79839#4301578 (Dzahn)
[07:21:38] Operations, DNS, Traffic: Redirect http://status.wikipedia.org to http://status.wikimedia.org - https://phabricator.wikimedia.org/T32811#347241 (Dzahn) >>! In T32811#4298820, @hashar wrote: > (which is not public and under #wmf-nda ) Fixed that. Now public. It was just NDA because that was default f...
[07:22:34] Operations, DNS, Traffic: Redirect http://status.wikipedia.org to http://status.wikimedia.org - https://phabricator.wikimedia.org/T32811#4301586 (Dzahn)
[07:22:37] Operations, DNS, Traffic: Redirect status.wikipedia.org to status.wikimedia.org - https://phabricator.wikimedia.org/T167239#4301587 (Dzahn)
[07:23:24] Operations, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Remove wildcard vhost for *.wikimedia.org - https://phabricator.wikimedia.org/T192206#4301588 (Joe) p:Triage>Low
[07:27:51] Operations, Analytics, SRE-Access-Requests: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4301599 (Joe) We also need @greg approval for adding people to deployers.
[07:28:56] Operations, ops-eqiad: decommission samarium.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T197630#4301600 (Joe) p:Triage>Normal
[07:29:21] !log deploy1001: rebased php-1.32.0-wmf.8/extensions/Translate to catch up with a non production merged change ( https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/441127 ).
[07:29:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:15] Operations, ops-eqiad, Analytics, User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707#4301603 (Marostegui) These are 1TB disks. ``` Raw Size: 1.090 TB [0x8bba0cb0 Sectors] ```
[07:32:22] Operations, monitoring, Privacy, Security-Core: status.wikimedia.org should not load Google Analytics - https://phabricator.wikimedia.org/T115945#1736775 (Joe) >>! In T115945#4298247, @Framawiki wrote: > Hello @Ottomata. Ping @Dzahn and @BBlack. > > The fact that this site is hosted by a third p...
[07:32:29] Operations, monitoring, Privacy, Security-Core: status.wikimedia.org should not load Google Analytics - https://phabricator.wikimedia.org/T115945#4301615 (Joe) p:Triage>Normal
[07:34:51] Operations, ops-eqiad, DC-Ops: Replace wtp1043's sda - https://phabricator.wikimedia.org/T196886#4301619 (Cmjohnson)
[07:34:53] Operations, ops-eqiad: Degraded RAID on wtp1043 - https://phabricator.wikimedia.org/T196260#4301621 (Cmjohnson)
[07:40:22] RECOVERY - Host mw1272 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[07:40:59] <_joe_> !log powercycled mw1272, down since yesterday
[07:41:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:05] <_joe_> hashar: why a deploy?
[07:47:06] apergos: for when you think is okay: https://gerrit.wikimedia.org/r/c/operations/puppet/+/440613
[07:47:27] Amir1: I saw it but no deploys this week :-/
[07:47:32] so yeah it will go in next week
[07:47:47] I see. That's good. Thank you!
[07:47:48] I could have +1 it but I will jut turn around and +2 it later
[07:48:03] thanks for the patch, I never noticed that little issue
[07:54:51] (CR) DCausse: Add cirrussearch settings for wikibase (1/3) (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse)
[08:04:36] (CR) Hashar: [V: 1] "I have checked the disk usage and it is rather marginal, we have enough disk space on the CI labs instance." [puppet] - https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) (owner: Hashar)
[08:07:30] (PS19) DCausse: Add cirrussearch settings for wikibase (1/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717)
[08:07:32] (PS4) DCausse: Add cirrussearch settings for wikibase (2/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/441056 (https://phabricator.wikimedia.org/T182717)
[08:07:34] (PS4) DCausse: Add cirrussearch settings for wikibase (3/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/441057 (https://phabricator.wikimedia.org/T182717)
[08:23:47] ACKNOWLEDGEMENT - Router interfaces on cr1-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 46, down: 1, dormant: 0, excluded: 0, unused: 0: Ayounsi GTT mess
[08:31:22] PROBLEM - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 288 bytes in 0.011 second response time
[08:33:02] cteam ^ I believe this is a false negative. There is some weird condition w/ that check that is nonsensical even if things are fine.
[08:33:47] huh
[08:34:14] who runs that checker, chasemp?
[08:34:47] apergos: that check is run on icinga and hits an endpoint at http://checker.tools.wmflabs.org/
[08:34:58] which checks for an OK string on a job looking at dumps status
[08:35:35] what I meant was that tool checker.tools
[08:35:38] who runs that
[08:35:42] RECOVERY - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.006 second response time
[08:36:18] and can we get a look at it...
[08:38:47] apergos:
[08:38:50] tools-bastion-03:~# getent group tools.toolschecker
[08:38:50] tools.toolschecker:*:52524:marc,andrew,yuvipanda
[08:38:56] ah
[08:38:57] run from tools-checker-01
[08:39:16] fwiw modules/toollabs/manifests/checker.pp and modules/toollabs/files/toolschecker.py
[08:39:33] great, thanks
[08:47:29] Operations, Mail, Wikimedia-Mailing-lists, Patch-For-Review: mailman listing unresponsive (fermium high latency) - https://phabricator.wikimedia.org/T196989#4301795 (jcrespo) I would be ok with resolving this, long term actions could be done on a separate ticket(s) if needed.
[08:47:34] <_joe_> !log removing user.log.1 and messages.log.1 on tegmen to save some space
[08:47:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:28] <_joe_> !log restarting nsca daemon on tegmen (gone wild, hundreds of subprocesses
[08:51:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:46] Operations, MediaWiki-extensions-Translate, Language-2018-Apr-June, User-Nikerabbit, and 2 others: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293#4301798 (jcrespo) I am ok with any index change you want,...
[08:54:14] <_joe_> !log killall nsca on tegmen
[08:54:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:48] <_joe_> !log running logrotate on tegmen
[08:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:14] Operations, JADE, Scoring-platform-team (Current), User-Joe: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547#4301801 (awight) Back to giving a reasonable estimate, now that we're only planning for human patrolling actions. frwik...
[08:55:45] Operations, Wikimedia-Mailing-lists: comunicação_BR1 - https://phabricator.wikimedia.org/T197717#4301807 (Qgil)
[08:57:11] <_joe_> !log removing user.log as well on tegmen
[08:57:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:01] (CR) Krinkle: Rewrite sql script to use the new mysql.php wrapper (1 comment) [puppet] - https://gerrit.wikimedia.org/r/441153 (owner: Tim Starling)
[09:01:14] (CR) Jcrespo: [C: 1] "0 issues with this, but do you have some context pointer for me about mysql.php so I can have a look to it for pure understanding reasons?" [puppet] - https://gerrit.wikimedia.org/r/441153 (owner: Tim Starling)
[09:01:41] hi all!
[09:02:00] I'm seeing 15% packet loss en route to gerrit.wikimedia.org.
[09:02:07] Packets seem to drop at ash-b1-v6.telia.net
[09:02:10] is this known?
[09:02:37] I'm coming in via Telekom in Germany
[09:03:09] <_joe_> DanielK_WMDE_: nope, but see https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue
[09:03:59] <_joe_> FWIW, I have 0% packet loss here in italy, so it's specific of your connection
[09:04:03] <_joe_> *to
[09:04:16] <_joe_> XioNoX: ^^ some telia trouble in aushburn?
[09:04:33] routes via esams seem fine
[09:04:45] <_joe_> DanielK_WMDE_: in fact, I go to eqiad via GTT
[09:11:56] Operations, Gerrit, Traffic, netops: Packet loss en route to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T197763#4301821 (daniel)
[09:12:01] _joe_: https://phabricator.wikimedia.org/T197763
[09:12:46] <_joe_> DanielK_WMDE_: I hope XioNoX can notice, I'm dealing with tegmen right now, sorry
[09:13:51] _joe_: that was just fyi, didn't mean to by pushy. do your think :)
[09:13:55] thing even
[09:13:57] i'll get on my bike and go to the office now, let's hope it's better there :) same ISP though.
[09:13:59] I'm here
[09:14:04] looking
[09:14:21] <_joe_> thanks
[09:16:50] XioNoX: oh, i'm coming in via ipv6. 2003:e9:ef04:48b1:718b:37f4:2e05:10ea. will add that to the ticket
[09:17:16] DanielK_WMDE_: is the issue still happening?
[09:17:30] Operations, Gerrit, Traffic, netops: Packet loss en route to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T197763#4301861 (daniel)
[09:17:39] XioNoX: yes. loss is 15% now
[09:17:48] There was a telia outage yesterday, and they are just done with the fix
[09:17:57] DanielK_WMDE_: can you try via v4?
[09:18:04] mtr -4 xxxxx
[09:18:12] XioNoX: my wifi is also flaky some times. but right now, all hops up to telia.net have 0% loss
[09:18:31] From Prague via v4 is fine
[09:18:51] i don't even know if mtr defaults to ipv6. probably doesn't.
[09:19:08] it's v6 on the one you shared
[09:19:49] XioNoX: -4 and -6 are both fine now.
[09:20:55] Operations, Gerrit, Traffic, netops: Packet loss en route to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T197763#4301865 (daniel) Update: it's fine now. Feel free to close or keep open, as makes more sense for you.
[09:20:57] I hate intermediate problems ;)
[09:21:27] Operations, Gerrit, Traffic, netops: Packet loss en route to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T197763#4301821 (ayounsi) v4 from Prague goes well: ``` $ mtr text-lb.eqiad.wikimedia.org --report-wide Start: Wed Jun 20 11:15:08 2018 HOST: laptop...
[09:21:36] yeah, Telia just finished fixing a fibercut
[09:21:51] ah, well, that would explain it :)
[09:22:33] XioNoX: thanks for looking
[09:22:54] Operations, Gerrit, Traffic, netops: Packet loss en route to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T197763#4301869 (ayounsi) Open>Resolved a:ayounsi
[09:23:01] thanks for reporting it
[09:35:03] (PS1) Dzahn: site: add dns2001/dns2002 with role(spare) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493)
[09:35:21] (CR) jerkins-bot: [V: -1] site: add dns2001/dns2002 with role(spare) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493) (owner: Dzahn)
[09:36:01] (PS2) Dzahn: site: add dns2001/dns2002 with role(spare) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493)
[09:37:19] Operations, Commons, MediaWiki-File-management, Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#4301952 (Imarlier)
[09:39:36] (CR) Vgutierrez: site: add dns2001/dns2002 with role(spare) (1 comment) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493) (owner: Dzahn)
[09:42:29] (PS1) ArielGlenn: make sure dump checker always looks at the dump dir from earliest run [puppet] - https://gerrit.wikimedia.org/r/441179
[09:42:49] (CR) Dzahn: site: add dns2001/dns2002 with role(spare) (1 comment) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493) (owner: Dzahn)
[09:42:57] (PS3) Dzahn: site: add dns2001/dns2002 with role(spare) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493)
[09:43:15] (CR) jerkins-bot: [V: -1] site: add dns2001/dns2002 with role(spare) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493) (owner: Dzahn)
[09:43:21] (CR) ArielGlenn: "Not convinced this is the fix for the false positives problem, but we should do it anyhow." [puppet] - https://gerrit.wikimedia.org/r/441179 (owner: ArielGlenn)
[09:43:54] (PS4) Dzahn: site: add dns2001/dns2002 with role(spare) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493)
[09:45:20] (CR) Aklapper: "Who can review my latest try from 11 days ago?" [mediawiki-config] - https://gerrit.wikimedia.org/r/439436 (https://phabricator.wikimedia.org/T165773) (owner: Aklapper)
[09:46:08] (CR) Vgutierrez: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493) (owner: Dzahn)
[09:47:22] (PS3) Reedy: Create a FeaturedFeed for the News on mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/439436 (https://phabricator.wikimedia.org/T165773) (owner: Aklapper)
[09:47:36] (CR) Reedy: "Careful with git commit -a! PS3 removes an unrelated .gitreview change :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/439436 (https://phabricator.wikimedia.org/T165773) (owner: Aklapper)
[10:26:03] PROBLEM - MediaWiki centralauth errors on graphite1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [1.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=3&fullscreen
[10:40:50] (CR) MarcoAurelio: "Does this really make the local and global steward group have the same increased password policies as expected and described in the task?" [mediawiki-config] - https://gerrit.wikimedia.org/r/440834 (https://phabricator.wikimedia.org/T197577) (owner: MarcoAurelio)
[10:41:35] Operations, scap2, HHVM, Scap (Scap3-MediaWiki-MVP), releng-201617-q4: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#4302182 (Imarlier)
[10:41:53] (CR) MarcoAurelio: "I'm asking because I've personally found that block of config rather hard to understand sometimes :-) But maybe it's just me. I just want " [mediawiki-config] - https://gerrit.wikimedia.org/r/440834 (https://phabricator.wikimedia.org/T197577) (owner: MarcoAurelio)
[10:46:23] (Abandoned) Gergő Tisza: MWScript: do not wrangle absolute path [mediawiki-config] - https://gerrit.wikimedia.org/r/440743 (owner: Gergő Tisza)
[10:49:52] RECOVERY - MediaWiki centralauth errors on graphite1001 is OK: OK: Less than 30.00% above the threshold [0.5] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=3&fullscreen
[11:16:29] (CR) Tim Starling: "> Patch Set 1: Code-Review+1" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/441153 (owner: Tim Starling)
[11:38:15] Operations, Commons, Multimedia, media-storage, User-Josve05a: Specific revisions of multiple files missing from Swift - 404 Not Found returned - https://phabricator.wikimedia.org/T124101#4302403 (AlexisJazz) I tried 00 to ff for https://upload.wikimedia.org/wikipedia/commons/2/2e/Burbuja_%28...
[11:50:13] (CR) Dzahn: [C: 2] site: add dns2001/dns2002 with role(spare) [puppet] - https://gerrit.wikimedia.org/r/441177 (https://phabricator.wikimedia.org/T196493) (owner: Dzahn)
[11:54:42] (CR) Dzahn: [C: 2] DNS: Add production DNS entries for dns200[1-2] [dns] - https://gerrit.wikimedia.org/r/441055 (https://phabricator.wikimedia.org/T196493) (owner: Papaul)
[11:55:45] (PS2) Dzahn: DHCP: Add MAC address entries for dns200[1-2] [puppet] - https://gerrit.wikimedia.org/r/441059 (https://phabricator.wikimedia.org/T196493) (owner: Papaul)
[11:59:31] (CR) Dzahn: [C: 2] DHCP: Add MAC address entries for dns200[1-2] [puppet] - https://gerrit.wikimedia.org/r/441059 (https://phabricator.wikimedia.org/T196493) (owner: Papaul)
[12:01:15] (PS2) Dzahn: DNS: Fix mgmt asset tag for dns2002 [dns] - https://gerrit.wikimedia.org/r/441062 (https://phabricator.wikimedia.org/T196493) (owner: Papaul)
[12:05:27] (CR) Dzahn: [C: 2] DNS: Fix mgmt asset tag for dns2002 [dns] - https://gerrit.wikimedia.org/r/441062 (https://phabricator.wikimedia.org/T196493) (owner: Papaul)
[12:38:27] Operations, Wikimedia-Mailing-lists: Request for a mailing list for VVIT WikiConnect - https://phabricator.wikimedia.org/T191702#4302590 (Krishna_Chaitanya_Velaga) p:Triage>High Hi Daniel, I am sorry that I've forgot the administrator's password. I request you to help me to reset it. Sorry for...
[12:39:26] Operations, Wikimedia-Mailing-lists: Request for a mailing list for VVIT WikiConnect - https://phabricator.wikimedia.org/T191702#4302594 (Aklapper) Resolved>Open p:High>Triage [[ https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities | Resetting priority ]]...
[13:00:26] Operations, Traffic, User-Johan, User-notice: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4302612 (Verdy_p) Will that affect Wikipedia Zero, given it is freely hosted by third party volunteer ISPs that provide their own...
[13:10:32] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[13:11:33] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy
[13:18:01] Operations, ops-eqiad, Analytics, User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707#4302652 (Cmjohnson) These servers are out of warranty, I do not think I have any 1TB disks in the data center but I can use a 2TB.
[13:28:15] Operations, monitoring, Privacy, Security-Core: status.wikimedia.org should not load Google Analytics - https://phabricator.wikimedia.org/T115945#4302670 (Ottomata) What about just removing the .wikimedia.org domain? I think the objection is mostly that it looks like WMF is using Google Analytics.
[13:31:24] (CR) Anomie: "I note that the old script had to have options before the wiki name: `sql --write enwiki` or `sql --group vslow enwiki` work, while `sql e" [puppet] - https://gerrit.wikimedia.org/r/441153 (owner: Tim Starling)
[13:34:08] Operations, Wikimedia-Mailing-lists: New mail list for Signpost team - https://phabricator.wikimedia.org/T197732#4302689 (herron) p:Triage>Normal
[13:38:45] !log Re-running populateExternallinksIndex60.php on plwiki and ptwiki for [[phab:T59176]] (initial run collided with the s2 master switch).
[13:38:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:49] T59176: ApiQueryExtLinksUsage::run query has crazy limit - https://phabricator.wikimedia.org/T59176
[13:39:44] This may cause codfw replication lag. Estimated time for the run is about 8 hours.
[13:40:04] s/lag/lag on s2/
[13:43:10] !log imarlier@deploy1001 Started deploy [performance/navtiming@995cb0f]: (no justification provided)
[13:43:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:15] !log imarlier@deploy1001 Finished deploy [performance/navtiming@995cb0f]: (no justification provided) (duration: 00m 05s)
[13:43:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:21] Operations, Commons, MediaWiki-File-management, Multimedia, and 11 others: RFC: Use content hash based image / thumb URLs - https://phabricator.wikimedia.org/T149847#4302761 (Krinkle)
[13:58:40] Operations, Commons, MediaWiki-File-management, Multimedia, and 10 others: RFC: Use content hash based image / thumb URLs - https://phabricator.wikimedia.org/T149847#2766313 (Krinkle) Removing from perf radar in favour of T19577.
[14:00:31] (PS1) Dzahn: convert check_prometheus_metric.py to python3 [puppet] - https://gerrit.wikimedia.org/r/441208
[14:03:24] !log imarlier@deploy1001 Started deploy [performance/navtiming@8914e26]: (no justification provided)
[14:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:29] !log imarlier@deploy1001 Finished deploy [performance/navtiming@8914e26]: (no justification provided) (duration: 00m 05s)
[14:03:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:49] (PS1) Dzahn: convert check_graphite to python3 [puppet] - https://gerrit.wikimedia.org/r/441209
[14:05:32] (PS3) Hagar Shilo: CORS whitelist chapter wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165)
[14:07:15] !log imarlier@deploy1001 Started deploy [performance/navtiming@742edb0]: (no justification provided)
[14:07:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:20] !log imarlier@deploy1001 Finished deploy [performance/navtiming@742edb0]: (no justification provided) (duration: 00m 04s)
[14:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:26] (CR) Krinkle: [C: -1] CORS whitelist chapter wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: Hagar Shilo)
[14:36:37] Operations, Icinga, monitoring, Patch-For-Review, User-herron: Icinga check for sysctl settings - https://phabricator.wikimedia.org/T160060#4302870 (herron)
[14:40:46] Operations, Icinga, monitoring, Patch-For-Review, User-herron: Icinga check for sysctl settings - https://phabricator.wikimedia.org/T160060#4302883 (herron) No news, but I should pick it back up. Next step is phased roll out to systems.
[14:45:41] Operations, Wikimedia-Mailing-lists, User-herron: Spam to -owner mailing lists from *@qq.com emails - https://phabricator.wikimedia.org/T189957#4302900 (herron)
[14:49:45] (PS4) Hagar Shilo: CORS whitelist chapter wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165)
[14:51:03] PROBLEM - Disk space on elastic1020 is CRITICAL: DISK CRITICAL - free space: /srv 60086 MB (12% inode=99%)
[14:51:22] Operations, Wikimedia-Mailing-lists, User-herron: Spam to -owner mailing lists from *@qq.com emails - https://phabricator.wikimedia.org/T189957#4302934 (herron) In the past qq.com spam to -owner addresses was coming mostly from blocklisted mail systems, so https://gerrit.wikimedia.org/r/#/c/operation...
[14:58:41] Operations, DNS, Mail, Patch-For-Review, User-herron: Outbound mail from Greenhouse is broken - https://phabricator.wikimedia.org/T189065#4302980 (herron) Following up -- Will we be able to enable DKIM for `gh-mail.wikimedia.org`?
[14:59:18] !log disk space issue on elastic1020 is due to shard rebalancing (currently receiving 2 enwiki_general shards but removing one wikidatawiki_content)
[14:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:28] Operations, ORES, Scoring-platform-team: Tuning profile::ores::celery parameters should cause a Celery service restart - https://phabricator.wikimedia.org/T182203#3816483 (awight) Open>declined I learned that Ops would rather restart the services explicitly
[15:13:39] (CR) Krinkle: "@Tim: Good point regarding on distinguishing from wiki-anostic "wikiless". I suppose we could have that concept within MWScript.php, but t" [puppet] - https://gerrit.wikimedia.org/r/441153 (owner: Tim Starling)
[15:15:53] RECOVERY - Disk space on elastic1020 is OK: DISK OK
[15:18:24] (CR) Jforrester: "Do we want to fix T196923 first to avoid an avalanche of unhelpful feedback from users?" [mediawiki-config] - https://gerrit.wikimedia.org/r/439471 (owner: Reedy)
[15:22:02] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[15:24:12] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy
[15:28:23] (CR) BryanDavis: [C: 1] "As Ariel said, this may not fix all the false positives, but it seems like a reasonable step to take." [puppet] - https://gerrit.wikimedia.org/r/441179 (owner: ArielGlenn)
[15:30:20] Operations, Puppet, Traffic, User-herron: Puppet hosts with signed certificate present on agent but not master - https://phabricator.wikimedia.org/T185239#4303127 (herron)
[15:30:34] Operations, Puppet, Patch-For-Review, User-herron: remove puppet_major_version and puppetdb_major_version variables. clean up puppet master/db hieradata - https://phabricator.wikimedia.org/T190318#4303128 (herron)
[15:31:21] (CR) Mooeypoo: CORS whitelist chapter wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: Hagar Shilo)
[15:31:31] Operations, monitoring, Patch-For-Review, User-herron: Reduce false positive icinga alerts during host reimages - https://phabricator.wikimedia.org/T195423#4303132 (herron) The CR looks good to me. Anything we should do before going ahead with it?
[15:31:45] Operations, User-herron: Improve visibility of incoming operations tasks - https://phabricator.wikimedia.org/T197624#4303134 (herron)
[15:48:36] (CR) Reedy: CORS whitelist chapter wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: Hagar Shilo)
[15:49:08] !log sbisson@deploy1001 Started deploy [kartotherian/deploy@9af9191]: (no justification provided)
[15:49:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:43] (CR) Reedy: [C: -1] CORS whitelist chapter wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: Hagar Shilo)
[15:50:02] Operations, Mail, monitoring, User-herron, Wikimedia-Incident: Improve outbound mail service alerting - https://phabricator.wikimedia.org/T197172#4303186 (herron)
[15:52:34] C̶r̶i̶t̶i̶c̶a̶l Device asw2-c-eqiad.mgmt.eqiad.wmnet recovered from Critical syslog messages
[15:52:43] !log sbisson@deploy1001 Finished deploy [kartotherian/deploy@9af9191]: (no justification provided) (duration: 03m 36s)
[15:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:03] <_joe_> sigh
[15:56:37] !log sbisson@deploy1001 Started deploy [kartotherian/deploy@9af9191]: Kartotherian: make Pl fallback to EN
[15:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:57:04] !log sbisson@deploy1001 Finished deploy [kartotherian/deploy@9af9191]: Kartotherian: make Pl fallback to EN (duration: 00m 27s)
[15:57:04] stephanebisson: hey, just so yo uknow, we have a deployment freeze on this week
[15:57:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:57:12] while all SREs are at a summit
[15:58:04] !log sbisson@deploy1001 Started deploy [kartotherian/deploy@9af9191]: Kartotherian: make Pl fallback to EN
[15:58:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:13] !log sbisson@deploy1001 Finished deploy [kartotherian/deploy@9af9191]: Kartotherian: make Pl fallback to EN (duration: 03m 08s)
[16:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:30] (PS5) Hagar Shilo: CORS whitelist chapter wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165)
[16:11:33] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 44 probes of 325 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[16:16:42] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 325 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[16:46:52] (PS1) Thcipriani: Scap: Remove git_server from scap.cfg [software/cassandra-twcs] - https://gerrit.wikimedia.org/r/441235 (https://phabricator.wikimedia.org/T162814)
[16:47:22] Operations, Mail, Wikimedia-Logstash, User-herron: Ship MX logs to ELK - https://phabricator.wikimedia.org/T197173#4303413 (herron) That makes sense, and likely applies to other types of logs as well. It seems to me like something worth solving in the ELK stack. Do we have an X-Pack subscriptio...
[16:52:08] 10Operations, 10Mail, 10Wikimedia-Logstash, 10User-herron: Ship MX logs to ELK - https://phabricator.wikimedia.org/T197173#4303438 (10herron) [16:52:10] 10Operations, 10Mail, 10monitoring, 10Wikimedia-Incident: Graph outbound mail volume on per-service or hostgroup level - https://phabricator.wikimedia.org/T197171#4303437 (10herron) [16:52:15] 10Operations, 10Cloud-Services, 10Developer-Relations, 10LDAP: Create a single application to provision and manage developer (LDAP) accounts - https://phabricator.wikimedia.org/T179463#4303441 (10bd808) [16:53:06] 10Operations, 10Mail, 10monitoring, 10User-herron, 10Wikimedia-Incident: Graph outbound mail volume on per-service or hostgroup level - https://phabricator.wikimedia.org/T197171#4280935 (10herron) [16:53:42] 10Operations, 10Patch-For-Review, 10User-herron: Ship host syslogs to ELK - https://phabricator.wikimedia.org/T193766#4303447 (10herron) [16:53:58] 10Operations, 10Puppet, 10User-herron: Knock down puppet 4 deprecation warnings - https://phabricator.wikimedia.org/T193664#4303450 (10herron) [17:07:32] 10Operations, 10Puppet, 10puppet-compiler, 10User-herron: Upgrade Puppet compilers to Stretch - https://phabricator.wikimedia.org/T191438#4105315 (10herron) [17:14:02] 10Operations, 10Puppet, 10User-herron: Improve puppet alerting - https://phabricator.wikimedia.org/T178628#4303532 (10herron) That sounds great, what is the next step? [17:14:54] 10Operations, 10Puppet, 10Patch-For-Review, 10User-herron: custom fact interface_primary breaks under newer versions of facter - https://phabricator.wikimedia.org/T182819#4303534 (10herron) [17:15:21] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review: Connect or troubleshoot eth1 on labvirt1019 and labvirt1020 - https://phabricator.wikimedia.org/T194964#4303537 (10Bstorm) I probably shouldn't actually switch things in the bios for 1020 until we confirm it is cabled for that? @Cmjohnson is... 
[17:15:24] 10Operations, 10Mail, 10Wikidata: Large number of "A page you created was linked on Wikidata" emails to one recipient in short period of time - https://phabricator.wikimedia.org/T177099#4303538 (10herron) 05Open>03Resolved a:03herron [17:17:58] 10Operations, 10Mail, 10monitoring, 10User-herron, 10Wikimedia-Incident: Graph outbound mail volume on per-service or hostgroup level - https://phabricator.wikimedia.org/T197171#4303541 (10herron) @fgiunchedi I linked this to T197173 (which I think is worthwhile thing to do anyway) but do you think there... [17:18:50] 10Operations, 10Wikimedia-Mailing-lists, 10User-herron: Spam to -owner mailing lists from *@qq.com emails - https://phabricator.wikimedia.org/T189957#4303546 (10herron) a:03herron [17:40:51] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: investigate caching of mailman listinfo pages - https://phabricator.wikimedia.org/T197819#4303598 (10herron) p:05Triage>03Normal [17:41:35] 10Operations, 10Mail, 10Wikimedia-Mailing-lists, 10Patch-For-Review: mailman listing unresponsive (fermium high latency) - https://phabricator.wikimedia.org/T196989#4275321 (10herron) 05Open>03Resolved Sounds good, thanks @jcrespo! [18:20:54] (03CR) 10Mooeypoo: CORS whitelist chapter wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: 10Hagar Shilo) [18:49:26] (03PS1) 10Bstorm: install: Add labstore1008 & labstore1009 [puppet] - 10https://gerrit.wikimedia.org/r/441247 (https://phabricator.wikimedia.org/T193655) [18:57:29] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Release-Engineering-Team (Kanban), 10User-greg: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4303725 (10RobH) a:05herron>03greg @greg: Would you review and approve/deny deployers access for @mbsantos? Once done, feel f... 
[19:01:33] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Release-Engineering-Team (Kanban), 10User-greg: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4303730 (10greg) Sorry, approved! [19:01:54] 10Operations, 10Analytics, 10SRE-Access-Requests, 10Release-Engineering-Team (Kanban), 10User-greg: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4303731 (10RobH) a:05greg>03None [19:02:29] 10Operations, 10Analytics, 10SRE-Access-Requests: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4303732 (10greg) [19:19:29] 10Operations, 10Wikimedia-Mailing-lists, 10User-herron: Spam to -owner mailing lists from *@qq.com emails - https://phabricator.wikimedia.org/T189957#4303756 (10Aklapper) At least for `cep-owner@` this stopped a while ago and I don't have any such messages anymore, sorry. If noone experiences this problem an... [19:22:52] (03CR) 10Bstorm: [C: 032] install: Add labstore1008 & labstore1009 [puppet] - 10https://gerrit.wikimedia.org/r/441247 (https://phabricator.wikimedia.org/T193655) (owner: 10Bstorm) [19:26:33] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200) [19:29:52] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy [19:30:14] (03PS1) 10Ottomata: Remove unsused webrequest-analytics.erb camus and irrelevant TODO comments [puppet] - 10https://gerrit.wikimedia.org/r/441252 [19:31:46] (03PS1) 10Ottomata: Move camus mediawiki_job to use kafka jumbo instead of analytics [puppet] - 10https://gerrit.wikimedia.org/r/441254 (https://phabricator.wikimedia.org/T189713) [19:32:01] (03PS2) 10Ottomata: Remove unsused webrequest-analytics.erb camus and irrelevant TODO comments [puppet] - 10https://gerrit.wikimedia.org/r/441252 
[19:32:14] (03CR) 10Ottomata: [V: 032 C: 032] Remove unsused webrequest-analytics.erb camus and irrelevant TODO comments [puppet] - 10https://gerrit.wikimedia.org/r/441252 (owner: 10Ottomata) [19:32:31] (03PS2) 10Ottomata: Move camus mediawiki_job to use kafka jumbo instead of analytics [puppet] - 10https://gerrit.wikimedia.org/r/441254 (https://phabricator.wikimedia.org/T189713) [19:33:05] (03CR) 10Ottomata: [V: 032 C: 032] Move camus mediawiki_job to use kafka jumbo instead of analytics [puppet] - 10https://gerrit.wikimedia.org/r/441254 (https://phabricator.wikimedia.org/T189713) (owner: 10Ottomata) [19:44:04] (03PS1) 10Ottomata: Disable main-eqiad -> analytics MirrorMaker [puppet] - 10https://gerrit.wikimedia.org/r/441255 (https://phabricator.wikimedia.org/T175461) [19:46:45] (03CR) 10Ottomata: "https://puppet-compiler.wmflabs.org/compiler02/11539/" [puppet] - 10https://gerrit.wikimedia.org/r/441255 (https://phabricator.wikimedia.org/T175461) (owner: 10Ottomata) [19:46:52] (03PS2) 10Ottomata: Disable main-eqiad -> analytics MirrorMaker [puppet] - 10https://gerrit.wikimedia.org/r/441255 (https://phabricator.wikimedia.org/T175461) [19:47:15] (03CR) 10Ottomata: [V: 032 C: 032] Disable main-eqiad -> analytics MirrorMaker [puppet] - 10https://gerrit.wikimedia.org/r/441255 (https://phabricator.wikimedia.org/T175461) (owner: 10Ottomata) [19:49:49] (03PS3) 10EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 [19:50:03] PROBLEM - Check systemd state on install2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:50:54] (03CR) 10jerkins-bot: [V: 04-1] Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 (owner: 10EBernhardson) [19:52:49] PROBLEM - Check systemd state on install1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[19:54:26] !log removed Kafka MirrorMaker from kafka10(12|13|14) [19:54:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:14] (03PS1) 10Ottomata: Remove Kafka MirrorMaker jmxtrans puppetization [puppet] - 10https://gerrit.wikimedia.org/r/441256 (https://phabricator.wikimedia.org/T175461) [20:03:24] !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@3420e67]: Update mobileapps to 9d856ec [20:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:10] (03PS2) 10Ottomata: Remove Kafka MirrorMaker jmxtrans puppetization [puppet] - 10https://gerrit.wikimedia.org/r/441256 (https://phabricator.wikimedia.org/T175461) [20:07:15] !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@3420e67]: Update mobileapps to 9d856ec (duration: 03m 51s) [20:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:29] Rolledback canary^ [20:08:08] (03CR) 10Ottomata: "https://puppet-compiler.wmflabs.org/compiler02/11541/" [puppet] - 10https://gerrit.wikimedia.org/r/441256 (https://phabricator.wikimedia.org/T175461) (owner: 10Ottomata) [20:08:10] (03CR) 10Ottomata: [C: 032] Remove Kafka MirrorMaker jmxtrans puppetization [puppet] - 10https://gerrit.wikimedia.org/r/441256 (https://phabricator.wikimedia.org/T175461) (owner: 10Ottomata) [20:08:18] !log rolled back "Update mobileapps to 9d856ec" (was just on canary) [20:08:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:46] (03PS1) 10Ottomata: Remove unused MirrorMaker old consumer parameters [puppet] - 10https://gerrit.wikimedia.org/r/441259 (https://phabricator.wikimedia.org/T175461) [20:12:30] (03CR) 10jerkins-bot: [V: 04-1] Remove unused MirrorMaker old consumer parameters [puppet] - 10https://gerrit.wikimedia.org/r/441259 (https://phabricator.wikimedia.org/T175461) (owner: 10Ottomata) [20:13:26] (03CR) 10Krinkle: [C: 04-1] CORS whitelist chapter wikis (031 comment) [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: 10Hagar Shilo) [20:13:38] (03CR) 10Krinkle: [C: 04-1] "Thanks, and sorry about the protocol confusion." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: 10Hagar Shilo) [20:15:12] (03PS2) 10Ottomata: Remove unused MirrorMaker old consumer parameters [puppet] - 10https://gerrit.wikimedia.org/r/441259 (https://phabricator.wikimedia.org/T175461) [20:16:09] PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200) [20:17:10] RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy [20:18:17] (03PS3) 10Ottomata: Remove unused MirrorMaker old consumer parameters [puppet] - 10https://gerrit.wikimedia.org/r/441259 (https://phabricator.wikimedia.org/T175461) [20:20:22] (03CR) 10Ottomata: [C: 032] "No op https://puppet-compiler.wmflabs.org/compiler02/11544/kafka1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/441259 (https://phabricator.wikimedia.org/T175461) (owner: 10Ottomata) [20:20:24] (03CR) 10Ottomata: [C: 032] Remove unused MirrorMaker old consumer parameters [puppet] - 10https://gerrit.wikimedia.org/r/441259 (https://phabricator.wikimedia.org/T175461) (owner: 10Ottomata) [20:20:49] PROBLEM - puppet last run on install2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Service[isc-dhcp-server] [20:23:25] (03PS4) 10EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 [20:24:11] (03CR) 10jerkins-bot: [V: 04-1] Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 (owner: 10EBernhardson) [20:25:30] PROBLEM - puppet last run on install1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[isc-dhcp-server] [20:26:25] (03PS1) 10Ottomata: Remove outdated README TODO [puppet] - 10https://gerrit.wikimedia.org/r/441260 [20:26:40] (03CR) 10Ottomata: [V: 032 C: 032] Remove outdated README TODO [puppet] - 10https://gerrit.wikimedia.org/r/441260 (owner: 10Ottomata) [20:32:08] (03PS5) 10EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 [20:40:07] (03PS6) 10Hagar Shilo: CORS whitelist chapter wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) [20:51:06] (03PS6) 10EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 [20:55:19] (03CR) 10Smalyshev: [C: 031] Add cirrussearch settings for wikibase (1/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse) [20:56:24] (03CR) 10Smalyshev: [C: 031] Add cirrussearch settings for wikibase (1/3) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse) [20:57:42] (03CR) 10Smalyshev: Add cirrussearch settings for wikibase (2/3) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441056 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse) [20:58:06] (03CR) 10Smalyshev: [C: 031] Add cirrussearch settings for wikibase (3/3) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/441057 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse) [21:31:15] (03CR) 10EBernhardson: "This is ready for review now, there is more prep work to do (prometheus and ferm) but i don't think it would fit in this patch." [puppet] - 10https://gerrit.wikimedia.org/r/440498 (owner: 10EBernhardson) [21:59:00] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install labstore1008 & labstore1009 - https://phabricator.wikimedia.org/T193655#4304028 (10Bstorm) on labstore1008, that setup didn't work. It might be the 10g interface plugged in. I'll try that one... [21:59:30] (03PS1) 10Bstorm: Change labstore1008 to 10g interface [puppet] - 10https://gerrit.wikimedia.org/r/441311 (https://phabricator.wikimedia.org/T193655) [22:00:53] (03CR) 10Bstorm: [C: 032] Change labstore1008 to 10g interface [puppet] - 10https://gerrit.wikimedia.org/r/441311 (https://phabricator.wikimedia.org/T193655) (owner: 10Bstorm) [22:05:37] (03CR) 10Reedy: [C: 031] "LGTM (I haven't checked/compared the dblists)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441096 (https://phabricator.wikimedia.org/T181165) (owner: 10Hagar Shilo) [23:03:20] 10Operations, 10Wikimedia-Mailing-lists: New mail list for Signpost team - https://phabricator.wikimedia.org/T197732#4304175 (10Brianhe) Just a brief note that we don't have a backup admin yet, I have to make sure they are OK with email address appearing on Phabricator. If possible I'd like to assign them afte... 
[23:21:40] (03PS1) 10EBernhardson: prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321 [23:22:42] (03CR) 10jerkins-bot: [V: 04-1] prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321 (owner: 10EBernhardson) [23:24:36] (03PS2) 10EBernhardson: prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321 [23:25:06] (03CR) 10Reedy: "I'm not sure without trying to test it. It's a good candidate for mwdebug" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440834 (https://phabricator.wikimedia.org/T197577) (owner: 10MarcoAurelio) [23:25:32] (03CR) 10jerkins-bot: [V: 04-1] prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321 (owner: 10EBernhardson) [23:30:48] (03PS3) 10Reedy: Increase password policies for 'steward' to max [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440834 (https://phabricator.wikimedia.org/T197577) (owner: 10MarcoAurelio) [23:40:17] (03PS3) 10EBernhardson: prometheus/elasticsearch support multiple exporters per host [puppet] - 10https://gerrit.wikimedia.org/r/441321