[02:02:58] (03CR) 10Yuvipanda: WIP Add simple-json-datasource plugin to labs grafana (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) (owner: 10Addshore) [02:03:05] (03CR) 10Yuvipanda: "Thanks for the patch :D" [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) (owner: 10Addshore) [02:21:07] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.12) (duration: 08m 22s) [02:21:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:26:44] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Aug 1 02:26:44 UTC 2016 (duration 5m 38s) [02:26:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:20:28] !log aaron@tin Synchronized php-1.28.0-wmf.12/includes/filerepo/file/LocalFile.php: c4f34e7a12baa9 (duration: 00m 44s) [03:20:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:21:59] 06Operations, 10MediaWiki-Cache, 10Traffic, 05MW-1.28-release-notes, and 3 others: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2510875 (10aaron) Thumbail deferred updates were also post-send for the last few days by oversight, which was fixed/deployed in... 
[03:34:18] PROBLEM - puppet last run on sca2001 is CRITICAL: CRITICAL: puppet fail [04:01:27] RECOVERY - puppet last run on sca2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:19:18] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24 [04:20:19] (03CR) 10KartikMistry: add cron job for Content Translation dumps (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) (owner: 10ArielGlenn) [04:21:07] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms [05:05:48] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) [05:06:48] PROBLEM - puppet last run on cp2009 is CRITICAL: CRITICAL: Puppet has 1 failures [05:07:37] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. [05:31:58] RECOVERY - puppet last run on cp2009 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [05:52:17] PROBLEM - puppet last run on mw2172 is CRITICAL: CRITICAL: puppet fail [05:53:17] MIERDA.. FUCK IT IS WIKIPEDIA.. PUDRANSE TODOS LOS DE #WIKIPEDIA-ES HIJOS DE PERRA 
[06:00:30] wtf [06:00:39] Seriously... [06:02:49] Bsadowski1: Couple of days now. [06:04:09] in commons, wikidata… pretty promiscuous [06:09:17] is there someone that is an op in #countervandalism? [06:16:39] MasterMarshall: #wikimedia-ops is the place to ask [06:17:01] thanks [06:17:41] MasterMarshall: Most channels, people are stalking ! ops.. without the space [06:21:18] RECOVERY - puppet last run on mw2172 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:21:33] ok [06:30:57] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: puppet fail [06:31:28] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:37] PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:58] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:52:46] !staff JEM Y UAWIKI Y MATIIA SON BASURAS. . #WIKIPEDIA-ES VALE MIERDA [06:52:53] !ops 
[06:55:59] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [06:56:29] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:39] RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:56:58] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [07:00:58] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 1 failures [07:26:17] RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [07:42:38] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2510990 (10elukey) Checked if a 304 response would return the body with the new... 
[08:04:17] 06Operations, 10Ops-Access-Requests: Requesting access to deployment access for Niharika - https://phabricator.wikimedia.org/T141593#2511019 (10ema) p:05Triage>03Normal [08:15:22] (03CR) 10Thiemo Mättig (WMDE): [C: 031] Gerrit: Avoid breaking full phabricator URLs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [08:22:55] (03CR) 10Thiemo Mättig (WMDE): "An other suggestion to avoid the use of $2:" [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [08:25:25] (03PS2) 10Ema: cache_maps: remove varnish3 VCL compat [puppet] - 10https://gerrit.wikimedia.org/r/302074 (https://phabricator.wikimedia.org/T122880) [08:26:13] (03CR) 10Ema: [C: 032 V: 032] cache_maps: remove varnish3 VCL compat [puppet] - 10https://gerrit.wikimedia.org/r/302074 (https://phabricator.wikimedia.org/T122880) (owner: 10Ema) [08:26:54] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003.eqiad.wmnet for WMDE-jand - https://phabricator.wikimedia.org/T141339#2511048 (10Abraham) Hi all, I approve this request /cc @Dzahn. Thank you. 
[08:27:44] 06Operations, 10Ops-Access-Requests: Requesting deployment access for Niharika - https://phabricator.wikimedia.org/T141593#2511050 (10Legoktm) [08:31:29] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: puppet fail [08:46:04] (03CR) 10MarcoAurelio: "To SWAT officer: after merge, please run the following:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301807 (https://phabricator.wikimedia.org/T140566) (owner: 10Dereckson) [08:47:48] (03PS1) 10Jcrespo: Sort s3.dblist in lexicographical order [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302223 [08:49:51] (03PS2) 10Jcrespo: Sort s3.dblist in lexicographical order [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302223 [08:51:06] (03PS5) 10ArielGlenn: add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) [08:52:09] (03CR) 10ArielGlenn: add cron job for Content Translation dumps (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) (owner: 10ArielGlenn) [08:55:29] 06Operations, 10media-storage, 07Tracking: expand swift hardware in codfw/eqiad (tracking) - https://phabricator.wikimedia.org/T130012#2511086 (10fgiunchedi) [08:55:32] 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2511081 (10fgiunchedi) 05Open>03Resolved this is completed, though see also T136631 as we likely need to upgrade the controller firmware on these boxes too [08:57:17] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [09:04:04] (03CR) 10Addshore: WIP Add simple-json-datasource plugin to labs grafana (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) (owner: 10Addshore) [09:04:07] (03PS3) 10Addshore: WIP Add simple-json-datasource plugin to labs 
grafana [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) [09:04:36] (03PS4) 10Addshore: WIP Add simple-json-datasource plugin to labs grafana [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) [09:06:18] (03CR) 10jenkins-bot: [V: 04-1] WIP Add simple-json-datasource plugin to labs grafana [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) (owner: 10Addshore) [09:07:10] (03PS5) 10Addshore: WIP Add simple-json-datasource plugin to labs grafana [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) [09:08:11] (03CR) 10Jcrespo: "It could be that they are ordered in binary collation, rather than unicode, in that case only one wiki is missplaced:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302223 (owner: 10Jcrespo) [09:09:41] (03PS1) 10ArielGlenn: reduce cronspam from dataset pagecounts rsync [puppet] - 10https://gerrit.wikimedia.org/r/302225 [09:12:14] (03CR) 10ArielGlenn: [C: 032] reduce cronspam from dataset pagecounts rsync [puppet] - 10https://gerrit.wikimedia.org/r/302225 (owner: 10ArielGlenn) [09:13:04] (03CR) 10Jcrespo: "diff s3.dblist <(LC_ALL=C sort s3.dblist)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302223 (owner: 10Jcrespo) [09:21:52] 06Operations, 10ops-codfw, 10ops-eqiad, 10media-storage: audit / test / upgrade hp smartarray firmware - https://phabricator.wikimedia.org/T141756#2511148 (10fgiunchedi) [09:22:05] 06Operations, 10ops-eqiad, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2341913 (10fgiunchedi) a:05RobH>03Cmjohnson @Cmjohnson we'll likely require firmware upgrades on these machines, see also T141756 [09:25:22] !log dropping index name_type_patrolled_timestamp on zhwiki on db1060 and db1054 T140108 [09:25:23] T140108: ApiQueryRecentChanges::run is spiking, nuking API servers - 
https://phabricator.wikimedia.org/T140108 [09:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:25:55] 06Operations, 10ops-codfw, 10ops-eqiad, 10media-storage: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756#2511169 (10fgiunchedi) [09:32:52] apergos: ^ the issue we were talking about on sat [09:33:06] yeah, I saw your notes on the tickets [09:33:19] hopefully that will fix it [09:33:31] interesting [09:34:36] indeed, I was looking at other controller models across the HPs and elastic/restbase machines might need the upgrade too [09:48:04] 06Operations, 10ops-codfw, 10ops-eqiad, 10media-storage: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756#2511197 (10fgiunchedi) The hp machines with raid controllers ```lines=5 root@neodymium:~# salt --out raw -C 'G@manufacturer:hp' cmd.run '[ -x /usr/sbin/hp... [09:49:26] godog: have you investigated what it would take to upgrade the firmware? [09:50:16] paravoid: yeah from the HP support page they provide an rpm to do it online, or put files on a usb key and do it offline [09:50:35] I've never done it before though, wikitech doesn't seem to have hp-specific firmware instructions [09:50:58] HP had an upgrade CD [09:51:38] but it'd be nicer if we did it from within the system [09:51:52] (03CR) 10Paladox: Gerrit: Avoid breaking full phabricator URLs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [09:52:42] yeah that'd be nicer, I'll poke at the rpm and see what's inside [09:53:10] (03CR) 10Paladox: "Currently we carnt do # links until we make sure it dosent cause gerrit to fail." 
[puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [09:53:54] (03CR) 10Paladox: "@Thiemo Mättig (WMDE) I'm wondering if you can get it to only work with the full phabricator link which will allow us to test it in Bug: b" [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [09:53:58] (03CR) 10Paladox: "please." [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [09:56:44] (03PS1) 10ArielGlenn: combine steps in dump stages for enwiki so that dependent steps are together [puppet] - 10https://gerrit.wikimedia.org/r/302230 [09:59:17] (03CR) 10ArielGlenn: [C: 032] combine steps in dump stages for enwiki so that dependent steps are together [puppet] - 10https://gerrit.wikimedia.org/r/302230 (owner: 10ArielGlenn) [10:00:49] (03Draft2) 10Paladox: Gerrit: Support having phab commits as links [puppet] - 10https://gerrit.wikimedia.org/r/302229 [10:01:10] (03PS2) 10Paladox: Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) [10:01:16] paravoid: looks fairly straightforward, I'll try on a ms-be codfw box [10:01:30] (03PS3) 10Paladox: Gerrit: Support having phab commits as links [puppet] - 10https://gerrit.wikimedia.org/r/302229 [10:11:01] !log reboot ms-be2027 after raid controller fw upgrade T141756 [10:11:02] T141756: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 [10:11:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:44:06] 06Operations, 10ops-codfw, 10ops-eqiad, 10media-storage: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756#2511254 (10fgiunchedi) Looking at the fleet of controller models, not all are covered by the same firmware edition/version. 
Namely the `P420i` isn't includ... [11:02:35] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2511294 (10ArielGlenn) snapshot1002,3,4 are now ready for decommissioning as all functionality has been moved off the misc cron host and nothing now run... [11:15:29] godog: ms-fe3001/2 alerts? is this you? [11:21:05] paravoid: ah yes it is, thanks! I've cleaned the containers there but didn't recreate the monitoring one, doing it now [11:29:05] looks like the firmware upgrade was fine on ms-be2027, I'm tempted to go with ms-be1026 (ms-be1027 isn't in service yet) [11:30:07] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1890 [11:35:17] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2190 [11:37:58] mhh frack eh, might be related to https://phabricator.wikimedia.org/T126314 [11:40:08] RECOVERY - check_mysql on fdb2001 is OK: Uptime: 413827 Threads: 1 Questions: 2577677 Slow queries: 2632 Opens: 664 Flush tables: 2 Open tables: 578 Queries per second avg: 6.228 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [11:51:03] Any mantainence planned this morning people? [11:51:12] Seeing ocassional "Techncial Difficulties' signs [11:51:17] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [11:51:32] And getting some odd before on Quarry queries concerning category links [11:51:46] (it's using old data meaning it gives bad results). [11:52:58] 06Operations, 10ops-eqiad: Megaraid controller reset due to (what seemsa) a faulty disk on analytics1045 - https://phabricator.wikimedia.org/T141761#2511340 (10elukey) [11:53:18] ShakespeareFan00, there was some heavy queries on labsdb1003 this morning [11:53:30] jynus: Any idea which users? 
[11:53:39] I was doing some BIG queries earlier [11:53:40] but nothing on production, that I can see [11:53:56] 06Operations, 10ops-eqiad: Megaraid controller reset due to (what seemsa) a faulty disk on analytics1045 - https://phabricator.wikimedia.org/T141761#2511353 (10elukey) p:05Triage>03Normal [11:54:03] can you check the logs to determine if a particular user was doing something? [11:56:18] ShakespeareFan00, I do not think it was a specific user, just the confluence of several [11:56:42] if you still want me to check load when you run anything, please create a ticker and I will do it [11:56:55] jynus: How do i set up a ticker? [11:56:57] (the ticket is needed to check ownership of the account) [11:57:01] (or do you mean a phab ticket.) [11:57:06] yes, sorry [11:57:13] Will bear that in mind [11:57:23] thanks [11:57:25] I do not see right now nothing strange [11:57:28] * ShakespeareFan00 otu [11:57:31] and load is lower [11:57:58] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [11:59:33] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2511360 (10elukey) >>! In T73487#2510990, @elukey wrote: > So the new version o... 
[12:01:37] 06Operations, 10Datasets-General-or-Unknown: decommission snapshot1002, 1003, 1004 - https://phabricator.wikimedia.org/T141762#2511362 (10ArielGlenn) [12:01:58] 06Operations, 10Datasets-General-or-Unknown: decommission snapshot1002, 1003, 1004 - https://phabricator.wikimedia.org/T141762#2511376 (10ArielGlenn) [12:02:00] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2511375 (10ArielGlenn) [12:03:38] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:04:17] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:04:19] 06Operations, 10Datasets-General-or-Unknown, 10hardware-requests: decommission snapshot1002, 1003, 1004 - https://phabricator.wikimedia.org/T141762#2511378 (10Peachey88) [12:07:31] there was a big spike of 500s https://grafana.wikimedia.org/dashboard/db/varnish-http-errors?panelId=17&fullscreen [12:08:09] (03PS1) 10ArielGlenn: remove all traces of snapshot1002 1003 1004 [puppet] - 10https://gerrit.wikimedia.org/r/302241 [12:09:48] elukey: yeah, in the grand scheme of things isn't very big, couldn't find anything obvious from the logs on oxygen tho [12:09:57] (03CR) 10ArielGlenn: [C: 032] remove all traces of snapshot1002 1003 1004 [puppet] - 10https://gerrit.wikimedia.org/r/302241 (owner: 10ArielGlenn) [12:10:18] godog: I checked logstash and same thing, nothing noticeable [12:12:26] elukey: indeed [12:12:35] I'll go to lunch [12:19:14] 06Operations, 10Datasets-General-or-Unknown, 10hardware-requests: decommission snapshot1002, 1003, 1004 - https://phabricator.wikimedia.org/T141762#2511401 (10Peachey88) [12:26:29] 06Operations, 10Datasets-General-or-Unknown, 10hardware-requests: decommission snapshot1002, 1003, 1004 - https://phabricator.wikimedia.org/T141762#2511407 (10ArielGlenn) These hosts have been shut off and 
their puppet certs, storedconfigs and their salt keys removed. Handing off to dc-tech folks. [12:27:14] 06Operations, 10Datasets-General-or-Unknown, 10hardware-requests: decommission snapshot1002, 1003, 1004 - https://phabricator.wikimedia.org/T141762#2511409 (10ArielGlenn) [12:28:01] 06Operations, 10ops-eqiad, 10hardware-requests: decommission snapshot1002, 1003, 1004 - https://phabricator.wikimedia.org/T141762#2511411 (10ArielGlenn) [12:31:16] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [12:33:25] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [12:36:15] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [12:38:06] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [12:42:26] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2511431 (10ArielGlenn) asking @RobH if you know if we have any hosts with Perc H200 running rAID (I guess RAID 1 would do), and if so, any gotchas I nee... 
[12:45:56] PROBLEM - Disk space on francium is CRITICAL: DISK CRITICAL - free space: / 1537 MB (3% inode=75%) [12:46:42] !log zotero translators deployed cde2f75 [12:46:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:54:16] (03CR) 10Thiemo Mättig (WMDE): "What's happening here, in detail (I added spaces to work around all the bugs, please ignore these spaces):" [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [12:55:25] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [12:56:26] (03CR) 10Paladox: "Hi when you do for example just" [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [12:57:25] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:09:15] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05MW-1.28-release-notes, and 3 others: Create Wikipedia Tulu - https://phabricator.wikimedia.org/T140898#2511461 (10Dereckson) Okay, so let's wait wmf13 is deployed and we create the wiki? 
[13:09:39] disk space on francium is my fault folks, cleaning up now [13:10:06] RECOVERY - Disk space on francium is OK: DISK OK [13:14:48] 06Operations, 10ops-eqiad, 10Analytics-Cluster: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#2511466 (10elukey) @Cmjohnson ping :) [13:20:46] (03PS1) 10Alexandros Kosiaris: puppetmaster: Add one final round of codfw boxes [puppet] - 10https://gerrit.wikimedia.org/r/302244 [13:21:19] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] puppetmaster: Add one final round of codfw boxes [puppet] - 10https://gerrit.wikimedia.org/r/302244 (owner: 10Alexandros Kosiaris) [13:23:14] 503 on phabricator [13:23:42] I can't reproduce [13:23:47] not 503s for me [13:24:16] jynus: on POST or GET ? [13:24:21] GET [13:24:27] I do not see db issues, not I can repeat it [13:24:39] trasnient ? [13:24:44] could there be indeed something ongoing on varnish level [13:24:55] remember there was someone else here before [13:25:06] (at a very low percentage) [13:25:59] no, levels are low [13:26:15] except for a couple of spikes some hours ago [13:31:25] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [13:32:46] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [13:34:46] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:35:25] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:36:39] (03PS1) 10Alexandros Kosiaris: puppetmaster: Switch all of codfw to the new server [puppet] - 10https://gerrit.wikimedia.org/r/302246 [13:37:31] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] puppetmaster: Switch all of codfw to the new server [puppet] - 10https://gerrit.wikimedia.org/r/302246 (owner: 10Alexandros Kosiaris) [13:48:36] !log reboot ms-be1023 after raid 
controller fw upgrade T141756 [13:48:37] T141756: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 [13:48:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:51:46] (03CR) 10MarcoAurelio: "@Nemo: I'd say that NS_PROJECT can be 'ವಿಕಿಪೀಡಿಯ', as it translates to 'Wikipedia'. It's the same as SiteName, although MetaNamespaces is " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/300182 (https://phabricator.wikimedia.org/T140898) (owner: 10Paladox) [14:02:40] mafk: yeah that is ok [14:02:53] 06Operations, 10Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2511537 (10BBlack) Initial run at an equal-pref patch for openssl-1.1.0 uploaded to github: https://github.com/blblack/openssl/commit/3c88126a9814be08d100f1d14e660d31b7fece75 I haven't really tested... [14:03:03] :) [14:09:48] 06Operations, 10ops-codfw, 10ops-eqiad, 10media-storage: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756#2511542 (10fgiunchedi) fw upgrade on ms-be1023 seems to be ok, it still takes ~40s for `check_hpssacli` though, waiting for a bit and see if the kernel mes... [14:15:24] 06Operations, 10Traffic: Age header reset to 0 after 24 hours on varnish frontends - https://phabricator.wikimedia.org/T141373#2511553 (10ema) I've started looking for an upper bound when it comes to the value of Age coming out of backends and frontends in eqiad sampling 5 minutes of traffic. Objects with Age... 
[14:18:55] 06Operations, 10MediaWiki-Releasing, 10Parsoid, 06Release-Engineering-Team: debian signing keyid E84AFDD2 has expired - https://phabricator.wikimedia.org/T141400#2511561 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi resolving, key updated on wikitech/mediawiki.org/etc [14:22:54] 06Operations, 10Traffic: Age header reset to 0 after 24 hours on varnish frontends - https://phabricator.wikimedia.org/T141373#2511583 (10BBlack) It will probably take significantly longer to get the same statistical certainty on the backend, due to the reduced request rate there. [14:24:35] (03PS1) 10Alexandros Kosiaris: puppetmaster: Switch over eqiad [puppet] - 10https://gerrit.wikimedia.org/r/302254 [14:25:15] 06Operations, 10Traffic: Age header reset to 0 after 24 hours on varnish frontends - https://phabricator.wikimedia.org/T141373#2511584 (10ema) >>! In T141373#2511583, @BBlack wrote: > It will probably take significantly longer to get the same statistical certainty on the backend, due to the reduced request rat... [14:31:01] (03PS1) 10Ottomata: 1.2.5 release [debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/302255 [14:34:44] jouncebot next [14:34:45] In 0 hour(s) and 25 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160801T1500) [14:38:55] (03CR) 10Alexandros Kosiaris: [C: 032] "https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=puppetmaster&vl=load&x=&n=&hreg[]=%28palladium%7Cstrontium%7Crhodium%" [puppet] - 10https://gerrit.wikimedia.org/r/302254 (owner: 10Alexandros Kosiaris) [14:40:58] !log restarting slapd on serpens to reclaim leaked memory [14:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:46:03] (03CR) 10Andrew Bogott: [C: 031] "I have to take that tangled commandline on faith, but I would very much like to see this merged soon." 
[puppet] - 10https://gerrit.wikimedia.org/r/300902 (https://phabricator.wikimedia.org/T130593) (owner: 10Dzahn) [14:56:15] (03PS1) 10Addshore: Remove dewiki_diffstats logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302259 (https://phabricator.wikimedia.org/T135751) [14:57:29] oh hi mafk ! :P [14:58:19] * mafk hides from addshore [14:58:34] 06Operations, 10Traffic, 10Wikimedia-Planet: mixed-content issues on planet.wikimedia.org - https://phabricator.wikimedia.org/T141480#2511673 (10ema) p:05Triage>03Normal [15:00:05] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160801T1500). Please do the needful. [15:00:05] mafk and Addshore: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [15:00:10] *waves* [15:00:20] meow [15:00:38] I can SWAT today. [15:00:41] I can do my patches if you would like thcipriani ;) [15:00:51] addshore: awesome :) [15:01:05] so it leaves thcipriani and me [15:01:21] sorry for torturing you today again :) [15:01:27] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301761 (https://phabricator.wikimedia.org/T141421) (owner: 10MarcoAurelio) [15:01:27] thcipriani: in theory I could do them all, but I should let you do something! ;) [15:01:58] (03Merged) 10jenkins-bot: Expanding throttle limits for enwiki Edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301761 (https://phabricator.wikimedia.org/T141421) (owner: 10MarcoAurelio) [15:02:36] addshore: heh, noteworthy that it is rare that I find myself itching to do SWAT :) [15:02:57] haha! 
[15:04:33] !log thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:301761|Expanding throttle limits for enwiki Edit-a-thon (T141421)]] (duration: 00m 24s) [15:04:34] T141421: Request to lift IP cap for Wikipedia Edit-a-thon on 2016-08-03 - https://phabricator.wikimedia.org/T141421 [15:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:04:40] ^ mafk throttle limits updated [15:04:47] thcipriani: ack, thanks [15:05:01] can't test that, any errors in the logs? [15:05:51] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302054 (https://phabricator.wikimedia.org/T140563) (owner: 10Matanya) [15:06:04] (03PS4) 10Thcipriani: Allow sysops on he.wiktionary to remove autopatroller and patroller user rights. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302054 (https://phabricator.wikimedia.org/T140563) (owner: 10Matanya) [15:06:35] mafk: nope, no new errors from the last patch. [15:06:47] ktnx [15:06:47] (03CR) 10Thcipriani: Allow sysops on he.wiktionary to remove autopatroller and patroller user rights. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302054 (https://phabricator.wikimedia.org/T140563) (owner: 10Matanya) [15:06:54] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302054 (https://phabricator.wikimedia.org/T140563) (owner: 10Matanya) [15:07:25] (03Merged) 10jenkins-bot: Allow sysops on he.wiktionary to remove autopatroller and patroller user rights. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/302054 (https://phabricator.wikimedia.org/T140563) (owner: 10Matanya) [15:07:38] (03PS1) 10Alexandros Kosiaris: varnish: prepend @ to site in VCL templates [puppet] - 10https://gerrit.wikimedia.org/r/302260 [15:08:43] mafk: https://gerrit.wikimedia.org/r/#/c/302054 is live on mw1099, check with X-Wikimedia-Debug please [15:08:49] on it [15:10:01] thcipriani: looks ok on mw1099.equiad [15:10:13] mafk: ack, thanks, deploying everywhere [15:11:43] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:302054|Allow sysops on he.wiktionary to remove autopatroller and patroller user rights (T140563)]] (duration: 00m 30s) [15:11:44] T140563: Configuration error on user rights for he.wiktionary - https://phabricator.wikimedia.org/T140563 [15:11:45] ^ mafk check live please [15:11:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:11:49] on it [15:12:17] looks good on Special:ListGroupRights@hewiktionary [15:12:30] (03PS2) 10Thcipriani: Set favicon for mk.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301807 (https://phabricator.wikimedia.org/T140566) (owner: 10Dereckson) [15:12:51] cool, thanks for checking [15:13:05] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301807 (https://phabricator.wikimedia.org/T140566) (owner: 10Dereckson) [15:13:30] (03Merged) 10jenkins-bot: Set favicon for mk.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301807 (https://phabricator.wikimedia.org/T140566) (owner: 10Dereckson) [15:13:34] thcipriani: this requires a maintenance script, I left a comment in the gerrit patch, did you see it? [15:13:44] mafk: yup, saw it :) [15:13:48] thanks for the reminder. 
[15:13:51] great :D [15:13:54] no probs [15:15:57] PROBLEM - puppet last run on titanium is CRITICAL: CRITICAL: puppet fail [15:17:05] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [15:17:14] hmm [15:17:26] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail [15:17:46] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:301807|Set favicon for mk.wiktionary (T140566)]] (duration: 00m 25s) [15:17:47] T140566: On mk.wiktionary, rename namespace Wiktionary to Викиречник and change logo - https://phabricator.wikimedia.org/T140566 [15:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:18:06] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: puppet fail [15:18:22] ^ mafk check please [15:19:02] thcipriani: no change, can you purge the static resources from terbium? [15:19:13] it has to be done on en.wikipedia regardless of the project [15:19:34] mafk: yup, run. [15:19:44] ok, I purged the caché [15:19:51] favicon is updated as well [15:19:56] WFM [15:20:06] cool, thanks :) [15:20:11] addshore: all yours! [15:20:17] great! [15:20:19] *starts* [15:20:27] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: puppet fail [15:20:33] (03PS2) 10Addshore: Beta move $wgEchoMentionStatusNotifications to CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301820 (https://phabricator.wikimedia.org/T140234) [15:20:35] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [15:20:37] (03CR) 10ArielGlenn: [C: 031] "If 50% is the good number, my awk-fu says that's a good command." 
[puppet] - 10https://gerrit.wikimedia.org/r/300902 (https://phabricator.wikimedia.org/T130593) (owner: 10Dzahn) [15:20:39] (03CR) 10Addshore: [C: 032] Beta move $wgEchoMentionStatusNotifications to CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301820 (https://phabricator.wikimedia.org/T140234) (owner: 10Addshore) [15:20:53] all the periodic table is failing :P [15:21:05] (03Merged) 10jenkins-bot: Beta move $wgEchoMentionStatusNotifications to CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301820 (https://phabricator.wikimedia.org/T140234) (owner: 10Addshore) [15:21:30] thanks thcipriani :) [15:21:56] mafk: yup, thanks for the patches :) [15:22:40] !log addshore@tin Synchronized wmf-config/CommonSettings-labs.php: [[gerrit:301820|Beta move $wgEchoMentionStatusNotifications to CommonSettings]] PART 1/2 (duration: 00m 24s) [15:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:23:08] !log addshore@tin Synchronized wmf-config/InitialiseSettings-labs.php: [[gerrit:301820|Beta move $wgEchoMentionStatusNotifications to CommonSettings]] PART 2/2 (duration: 00m 24s) [15:23:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:23:16] 06Operations, 10Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2511727 (10BBlack) OpenSSL upstream confirms equal-pref ciphers groups won't make 1.1, and they're looking to revamp ciphersuite selection stuff more significantly in the long term in light of TLSv1.... 
[15:23:31] (03CR) 10Addshore: [C: 032] Debug logging for T138987 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302085 (https://phabricator.wikimedia.org/T138987) (owner: 10Addshore) [15:23:34] (03PS2) 10Addshore: Debug logging for T138987 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302085 (https://phabricator.wikimedia.org/T138987) [15:23:35] PROBLEM - MegaRAID on graphite1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [15:23:47] (03CR) 10Addshore: [C: 032] Debug logging for T138987 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302085 (https://phabricator.wikimedia.org/T138987) (owner: 10Addshore) [15:24:11] (03Merged) 10jenkins-bot: Debug logging for T138987 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302085 (https://phabricator.wikimedia.org/T138987) (owner: 10Addshore) [15:25:12] (03PS2) 10Addshore: Remove T124356 debug logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302114 (https://phabricator.wikimedia.org/T124356) [15:25:25] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:302085|Debug logging for T138987]] (duration: 00m 26s) [15:25:26] T138987: Notice: Undefined index: width in /srv/mediawiki/php-1.28.0-wmf.7/includes/Linker.php - https://phabricator.wikimedia.org/T138987 [15:26:00] (03CR) 10Addshore: [C: 032] Remove T124356 debug logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302114 (https://phabricator.wikimedia.org/T124356) (owner: 10Addshore) [15:26:21] (03Merged) 10jenkins-bot: Remove T124356 debug logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302114 (https://phabricator.wikimedia.org/T124356) (owner: 10Addshore) [15:26:36] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: puppet fail [15:27:14] (03PS2) 10Addshore: Remove dewiki_diffstats logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302259 (https://phabricator.wikimedia.org/T135751) [15:27:32] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: 
[[gerrit:302114|Remove T124356 debug logging]] (duration: 00m 25s) [15:27:33] T124356: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356 [15:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:27:52] (03CR) 10Addshore: [C: 032] Remove dewiki_diffstats logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302259 (https://phabricator.wikimedia.org/T135751) (owner: 10Addshore) [15:28:07] (03CR) 10BBlack: [C: 032] openssl (1.0.2h-1~wmf2) jessie-wikimedia; urgency=medium [debs/openssl] - 10https://gerrit.wikimedia.org/r/301903 (owner: 10BBlack) [15:28:14] (03Merged) 10jenkins-bot: Remove dewiki_diffstats logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302259 (https://phabricator.wikimedia.org/T135751) (owner: 10Addshore) [15:28:26] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: puppet fail [15:29:21] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:302259|Remove dewiki_diffstats logging]] (duration: 00m 26s) [15:29:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:30:04] 06Operations, 10Wikimedia-Logstash, 10service-runner: service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511781 (10ssastry) [15:31:01] 06Operations, 10Wikimedia-Logstash, 10service-runner: service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511794 (10ssastry) p:05Triage>03High [15:31:26] last one [15:31:26] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail [15:31:46] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: puppet fail [15:32:26] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [15:32:36] PROBLEM - puppet last run on hydrogen is 
CRITICAL: CRITICAL: puppet fail [15:32:50] lots of puppet 'problems' [15:33:39] 06Operations, 10Parsoid, 10Wikimedia-Logstash, 10service-runner: Parsoid's service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511796 (10ssastry) [15:37:05] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: puppet fail [15:37:22] (03PS1) 10Eevans: Enable Cassandra instance restbase1015-c.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/302263 (https://phabricator.wikimedia.org/T134016) [15:37:50] what a cocktail of elements :) [15:37:59] (03CR) 10Eevans: [C: 031] "This is ready to be merged." [puppet] - 10https://gerrit.wikimedia.org/r/302263 (https://phabricator.wikimedia.org/T134016) (owner: 10Eevans) [15:38:15] 06Operations, 10Parsoid, 10Wikimedia-Logstash, 10service-runner: Parsoid's service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511800 (10ssastry) More relevant info: ``` Pchelolo, mobrovac, looks like this may not be specific to pa... [15:38:51] !log addshore@tin Synchronized php-1.28.0-wmf.12/includes/Linker.php: [[gerrit:302226|Debug Logging for Undefined index: width in Linker.php]] (duration: 00m 25s) [15:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:39:01] all done here thcipriani [15:39:06] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: puppet fail [15:39:34] addshore: nice :) [15:39:58] Is there someone that can merge https://gerrit.wikimedia.org/r/#/c/302263/ for me? It's a Cassandra instance bootstrap in the RESTBase cluster (the very last one in this expansion, as a matter of fact). 
[15:40:10] * urandom is so excited [15:40:18] 06Operations, 10Parsoid, 10Wikimedia-Logstash, 10service-runner: Parsoid's service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511781 (10mobrovac) Indeed, not specific to Parsoid. Change Prop logs are missing completely too. [15:41:13] 06Operations, 10Parsoid, 06Services, 10Wikimedia-Logstash, and 2 others: Parsoid's service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511820 (10mobrovac) [15:42:13] urandom: o/ [15:42:21] elukey: o/ [15:42:45] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: puppet fail [15:42:45] checking [15:43:14] urandom: ready for the merge right? [15:43:21] elukey: yup! [15:43:28] all right, looks good, merging [15:43:36] elukey: thank you sir! [15:43:47] 06Operations, 10Parsoid, 06Services, 10Wikimedia-Logstash, and 2 others: Parsoid's service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511823 (10ssastry) [[https://logstash.wikimedia.org/app/kibana#/dashboard/default?_g=(refreshInterval:(d... [15:44:13] (03CR) 10Elukey: [C: 032] "Reverse of 10.64.48.140 is restbase1015-c.eqiad.wmnet, plus I had a chat with urandom on IRC and he is ready for the merge. Looks good to " [puppet] - 10https://gerrit.wikimedia.org/r/302263 (https://phabricator.wikimedia.org/T134016) (owner: 10Eevans) [15:45:24] urandom: done! [15:45:25] PROBLEM - Host eeden is DOWN: PING CRITICAL - Packet loss = 100% [15:45:26] PROBLEM - Host ns2-v4 is DOWN: PING CRITICAL - Packet loss = 100% [15:45:38] elukey: thanks again! [15:47:52] urandom: so, done?! :) !!! [15:48:13] mobrovac: well, in a day or so, but yeah :) [15:48:27] hell yeah! [15:50:38] I think we are doing 4 million queries/minute on enwiki, maybe unsually high? 
[15:52:16] RECOVERY - Host eeden is UP: PING OK - Packet loss = 0%, RTA = 83.80 ms [15:52:46] RECOVERY - Host ns2-v4 is UP: PING OK - Packet loss = 0%, RTA = 82.95 ms [15:53:11] jynus: DDoS? [15:54:09] no, load seems usual [15:54:20] wow [15:54:34] or a bot making too many queries? [15:54:43] Luke081515, that is only one wiki, we have 900 of those :-) [15:54:44] just brainstorming [15:54:59] jynus: yeah, but not 900 with the size of enwiki :P [15:55:05] true :-) [15:55:25] oops? [15:55:43] probably it is some very fast job queue or some query that takes no time, such as a code change doing SET X=Y; [15:55:58] hm [15:56:03] I will check the performance logs [15:56:20] we have some global renames in progress but I'd say that's not the source of this [15:56:43] to calm you- this is not causing any issues [15:57:15] but I need to investigate to know why the rates are high, when normally we have 1/2 or 2/3s of that [15:57:16] I'm fine, thanks :) [15:57:42] sometimes these things uncover things that should not be there [15:57:56] !log uploaded openssl_1.0.2h-1~wmf2 to carbon (jessie-wikimedia) - https://gerrit.wikimedia.org/r/#/c/301903/ [15:58:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:58:13] !log T134016: Bootstrapping restbase1015-c.eqiad.wmnet [15:58:14] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016 [15:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:59:41] !log cp1065: canary upgrade of openssl to 1.0.2h-1~wmf2 (+ nginx upgrade-restart) [15:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:08:51] robh: I think you can use /msg ChanServ SET TOPICLOCK ON, so that you can set -t here, and you don't need to +o to change the topic.
the benefit is that chanserv reverts a topic change and then sets +r if someone unauthorized changes the topic [16:09:37] robh: for more info: /msg ChanServ help set topiclock [16:09:56] cool, i'll check it out later, in ops meeting now. (I just changed it cuz someone asked me to ;) [16:10:45] !log Restarted logstash on logstash1001; missing hhvm and service logs; no output to /var/log/logstash/logstash.log for days [16:10:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:13:27] 06Operations, 10Parsoid, 06Services, 10Wikimedia-Logstash, and 3 others: Parsoid's service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511919 (10ssastry) a:03bd808 [16:14:18] 06Operations, 10Parsoid, 06Services, 10Wikimedia-Logstash, and 3 others: Parsoid's service-runner events not showing up in Kibana since 2016-07-28 21:00 UTC - https://phabricator.wikimedia.org/T141776#2511935 (10ssastry) 05Open>03Resolved Looks like logstash1001 needed a service restart. hhvm logs wer... [16:16:05] legoktm: one global rename stuck, see ml, thanks. [16:16:42] 06Operations, 10Wikimedia-Logstash: Add monitoring for ensuring logstash services are operational - https://phabricator.wikimedia.org/T141783#2511954 (10ssastry) [16:16:52] 06Operations, 10Wikimedia-Logstash: Log event rate types from Logstash to graphite - https://phabricator.wikimedia.org/T141784#2511968 (10mobrovac) [16:17:05] 06Operations, 10Wikimedia-Logstash: Add monitoring for detecting when logstash services are down - https://phabricator.wikimedia.org/T141783#2511980 (10ssastry) [16:21:16] (03CR) 10BBlack: [C: 04-1] "Don't merge this for now, it's just for testing on pinkunicorn."
[debs/openssl] - 10https://gerrit.wikimedia.org/r/301920 (https://phabricator.wikimedia.org/T131908) (owner: 10BBlack) [16:24:07] 06Operations, 10hardware-requests: Eqiad: procure 4 servers for kubernetes - https://phabricator.wikimedia.org/T141624#2512002 (10RobH) a:03RobH [16:24:39] (03CR) 10Addshore: [C: 04-1] "needs a manual rebase" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302113 (https://phabricator.wikimedia.org/T107711) (owner: 10Addshore) [16:24:41] 06Operations, 10hardware-requests: Eqiad: procure 4 servers for kubernetes - https://phabricator.wikimedia.org/T141624#2505235 (10RobH) This was discussed during our operations meeting. This may be able to be fulfilled with spare units in codfw (and then we can just add more new servers to the spares pool at... [16:27:54] (03PS3) 10Addshore: Remove T107711 debug logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302113 (https://phabricator.wikimedia.org/T107711) [16:39:08] 06Operations, 10Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2512057 (10BBlack) [16:41:02] 06Operations, 10Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2512073 (10BBlack) [16:41:12] 06Operations, 10Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2512057 (10BBlack) [16:41:40] 06Operations, 10Wikimedia-Logstash, 15User-bd808: Log event rate types from Logstash to graphite - https://phabricator.wikimedia.org/T141784#2512081 (10bd808) 05Open>03Resolved a:03bd808 We have the rates (https://grafana.wikimedia.org/dashboard/db/production-logging?panelId=13&fullscreen) so now we ju... [16:58:32] (03CR) 10Alexandros Kosiaris: "apart from a nitpick, change looks good. 
Need to test though before we merge" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/296687 (https://phabricator.wikimedia.org/T139008) (owner: 10Ladsgroup) [16:59:14] 06Operations, 10Wikimedia-Logstash: Add monitoring for detecting when logstash services are down - https://phabricator.wikimedia.org/T141783#2511954 (10bd808) We have monitoring for the service proper, but what has seemed to happen several times is that the java process gets hung up in a gc cycle and stops pro... [16:59:57] !log labs deploy restbase 840411a44 [17:00:01] (03PS2) 10Alexandros Kosiaris: varnish: prepend @ to site in VCL templates [puppet] - 10https://gerrit.wikimedia.org/r/302260 [17:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:00:04] gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160801T1700). [17:00:06] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] varnish: prepend @ to site in VCL templates [puppet] - 10https://gerrit.wikimedia.org/r/302260 (owner: 10Alexandros Kosiaris) [17:00:32] twentyafterfour, ostriches: "/bin/sh: 1: /srv/phab/tools/public_task_dump.py: not found" [17:00:35] twentyafterfour, ostriches: phab2001 cronspam [17:01:24] Herp derp. [17:01:29] Lemme find out why [17:01:41] no WDQS deployment scheduled for today [17:02:38] ostriches: also Cron /usr/local/bin/project_changes.sh ("Access denied for user 'phstats'@'10.192.32.147" + /usr/bin/mail: invalid option) [17:03:08] Yeah I knew about that one. I think phab2001 and iridium have different `mail` installed? [17:03:57] (03PS6) 10Addshore: Add simple-json-datasource plugin to labs grafana [puppet] - 10https://gerrit.wikimedia.org/r/302119 (https://phabricator.wikimedia.org/T141636) [17:04:18] yuvipanda: ^^ repo is created and that should be ready to go in (in theory) [17:04:33] paravoid: Actually, phab_task_dump shouldn't exist on phab2001.
Lemme tweak the puppet manifest so the cron disappears properly. [17:05:23] (03CR) 10Ottomata: [C: 032] Add dbslave to statistics::wmde config [puppet] - 10https://gerrit.wikimedia.org/r/302130 (owner: 10Addshore) [17:05:28] (03PS2) 10Ottomata: Add dbslave to statistics::wmde config [puppet] - 10https://gerrit.wikimedia.org/r/302130 (owner: 10Addshore) [17:05:51] ottomata: and it is fine to split my queries / scripts between the master and slave then? [17:06:16] addshore: that's actually not something I know much about [17:06:19] nor does elukey [17:06:28] but i'm sure dan or marcel know a lot [17:06:52] !log staging deploy restbase 840411a44 [17:06:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:07:11] ottomata: okay! [17:08:13] (03PS1) 10Chad: Phab: Properly remove public_task_dump.py if $dump = false [puppet] - 10https://gerrit.wikimedia.org/r/302271 [17:08:19] paravoid: ^^ should fix the first one [17:08:21] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2512257 (10RobH) So I looked in a few places: http://www.dell.com/learn/us/en/04/campaigns/dell-raid-controllers The H200 seems very, very much like t... [17:09:24] (03CR) 10jenkins-bot: [V: 04-1] Phab: Properly remove public_task_dump.py if $dump = false [puppet] - 10https://gerrit.wikimedia.org/r/302271 (owner: 10Chad) [17:09:42] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2512267 (10ArielGlenn) No need to replace, sw raid should be just fine for this. Thanks for the info! 
[17:09:44] Oh jenkins, bleh [17:12:17] PROBLEM - restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.200, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [17:13:27] PROBLEM - Restbase root url on xenon is CRITICAL: Connection refused [17:14:57] (03PS1) 10Alexandros Kosiaris: include standard in a number of places [puppet] - 10https://gerrit.wikimedia.org/r/302275 [17:15:12] (03PS2) 10Chad: Phab: Properly remove public_task_dump.py if $dump = false [puppet] - 10https://gerrit.wikimedia.org/r/302271 [17:19:11] (03PS2) 10Alexandros Kosiaris: include standard in a number of places [puppet] - 10https://gerrit.wikimedia.org/r/302275 [17:19:16] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] include standard in a number of places [puppet] - 10https://gerrit.wikimedia.org/r/302275 (owner: 10Alexandros Kosiaris) [17:21:18] (03PS1) 10Chad: Adding my new SSH key to production [puppet] - 10https://gerrit.wikimedia.org/r/302277 [17:22:59] is it best practice to add standard and base::firewall right into the role classes? [17:23:07] I mean if a role class goes on a node [17:23:19] guess I'm asking you that, akosiaris [17:23:40] apergos: yes it is [17:23:55] well there's a changeset waiting for me then, I've got them on the node instead [17:23:57] thank you [17:24:07] don't mention it [17:24:15] ahahaha [17:24:16] so [17:24:20] https://tickets.puppetlabs.com/browse/PUP-864 [17:24:28] guess what ? we are doing that [17:24:33] at least in nova.pp [17:24:50] sigh.. anyway deprecated since 3.5, not yet removed... 
will survive for now [17:26:38] (03PS1) 10ArielGlenn: include standard and base::firewall in manifest common to snapshots [puppet] - 10https://gerrit.wikimedia.org/r/302279 [17:27:59] (03CR) 10ArielGlenn: [C: 032] include standard and base::firewall in manifest common to snapshots [puppet] - 10https://gerrit.wikimedia.org/r/302279 (owner: 10ArielGlenn) [17:29:27] (03PS1) 10Alexandros Kosiaris: Revert "puppetmaster: Switch over eqiad" [puppet] - 10https://gerrit.wikimedia.org/r/302280 [17:29:44] (03PS2) 10Alexandros Kosiaris: Revert "puppetmaster: Switch over eqiad" [puppet] - 10https://gerrit.wikimedia.org/r/302280 [17:29:49] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Revert "puppetmaster: Switch over eqiad" [puppet] - 10https://gerrit.wikimedia.org/r/302280 (owner: 10Alexandros Kosiaris) [17:31:46] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.127 second response time [17:32:36] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [17:35:58] !log upgrading openssl package on cache_maps + cache_misc [17:36:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:36:46] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:37:06] RECOVERY - puppet last run on titanium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:37:26] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [17:37:27] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:37:27] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:37:36] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:37:36] RECOVERY - puppet last 
run on netmon1001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:37:36] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:37:59] (03PS1) 10ArielGlenn: move ferm rules for dataset roles into common manifest [puppet] - 10https://gerrit.wikimedia.org/r/302281 [17:38:05] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:38:06] RECOVERY - puppet last run on strontium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:38:33] (03PS1) 10Ottomata: Parameterize eventlogging-service access log level so labs can make this more verbose [puppet] - 10https://gerrit.wikimedia.org/r/302282 [17:38:35] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:38:37] (03PS2) 10ArielGlenn: move ferm rules for dataset roles into common manifest [puppet] - 10https://gerrit.wikimedia.org/r/302281 [17:38:45] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:39:26] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:41:00] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:42:38] 06Operations, 06Labs: labnet100[12].eqiad.wmnet need to be reimaged with RAID - https://phabricator.wikimedia.org/T136718#2512444 (10chasemp) we said next week post-Liberty upgrade we will schedule a time for failover and reimage of labnet1002. 
[17:43:11] PROBLEM - restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.200, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [17:44:51] (03CR) 10ArielGlenn: [C: 032] move ferm rules for dataset roles into common manifest [puppet] - 10https://gerrit.wikimedia.org/r/302281 (owner: 10ArielGlenn) [17:45:12] PROBLEM - cassandra-c CQL 10.64.48.140:9042 on restbase1015 is CRITICAL: Connection refused [17:45:30] (03PS1) 10Dzahn: statistics: set cluster for stat1004 [puppet] - 10https://gerrit.wikimedia.org/r/302283 (https://phabricator.wikimedia.org/T141360) [17:47:11] PROBLEM - Restbase root url on xenon is CRITICAL: Connection refused [17:48:42] (03PS3) 10Paladox: Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) [17:48:47] (03PS4) 10Paladox: Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) [17:49:11] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.874 second response time [17:51:22] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [17:52:45] 06Operations, 10ops-eqiad: graphite1002.eqiad.wmnet: slot=10 disk failed - https://phabricator.wikimedia.org/T141795#2512457 (10fgiunchedi) [17:55:50] ACKNOWLEDGEMENT - MegaRAID on graphite1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Filippo Giunchedi disk failed, T141795 [18:10:28] !log upgrading openssl on cache_text and cache_upload [18:10:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:14:01] PROBLEM - cassandra-a CQL 10.64.0.202:9042 on xenon is CRITICAL: Connection refused [18:18:00] RECOVERY - cassandra-a CQL 10.64.0.202:9042 on xenon is OK: TCP OK - 0.002 second 
response time on port 9042 [18:33:50] PROBLEM - restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.200, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [18:34:18] 06Operations, 10Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2512718 (10BBlack) The DHE-related Windows bugfix is deployed, and so far doesn't seem to have any effect on this (which I kinda expected, but had to check!). This bug was only supposed to... [18:35:40] PROBLEM - Restbase root url on xenon is CRITICAL: Connection refused [18:39:31] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.814 second response time [18:41:50] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [18:42:47] Is there an opsen that can help me twiddle something in the bios of xenon.eqiad.wmnet? [18:43:04] basically, this: https://phabricator.wikimedia.org/T123924#1941098 [18:43:24] because of this: https://phabricator.wikimedia.org/T141675 [18:46:13] 06Operations, 10Ops-Access-Requests: Platonides access to #mediawiki_security - https://phabricator.wikimedia.org/T140288#2512752 (10Dzahn) @Platonides We can do that if you could sign an NDA (--> L2) please. [18:47:23] urandom: ok, yes [18:47:32] mutante: thanks! 
[18:47:45] (03PS1) 10Catrope: Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) [18:48:13] (03CR) 10jenkins-bot: [V: 04-1] Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) (owner: 10Catrope) [18:49:06] !log rebooting xenon to toggle HT setting in BIOS [18:49:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:50:50] PROBLEM - Host xenon is DOWN: PING CRITICAL - Packet loss = 100% [18:51:29] (03PS2) 10Catrope: Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) [18:51:52] ACKNOWLEDGEMENT - cassandra-c CQL 10.64.48.140:9042 on restbase1015 is CRITICAL: Connection refused eevans Bootstrapping - The acknowledgement expires at: 2016-08-03 18:51:36. [18:52:03] (03CR) 10jenkins-bot: [V: 04-1] Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) (owner: 10Catrope) [18:53:13] 06Operations, 10Cassandra: xenon.eqiad.wmnet: very high cpu utilization - https://phabricator.wikimedia.org/T141675#2512799 (10Dzahn) as requested i have turned "logical processor" off and on again in BIOS, like on T123924#1941098 [18:54:00] 06Operations, 10Parsoid, 06Services, 15User-mobrovac: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2512809 (10mobrovac) Proposed timeline: - 2016-08-02: take out `wtp100[12]` and reimage them, potentially start them up the same day - 2016-08-03: have the two node... [18:54:11] urandom: done. assuming it was actually "off" -> "on" -> save .. 
and did not need 2 saves [18:54:22] * urandom shrugs [18:54:36] mutante: this feels like a Hail Mary to me either way :) [18:54:47] heh, yes [18:55:00] 06Operations, 10Cassandra: xenon.eqiad.wmnet: very high cpu utilization - https://phabricator.wikimedia.org/T141675#2512811 (10Eevans) [18:55:31] RECOVERY - Host xenon is UP: PING OK - Packet loss = 0%, RTA = 1.14 ms [18:55:54] urandom: http://www.catb.org/jargon/html/W/wave-a-dead-chicken.html http://www.catb.org/jargon/html/R/rain-dance.html :) [18:56:52] * urandom nods [18:57:01] both apropos [18:58:00] mutante: well, the high utilization acpi_pad processes haven't returned, so... [18:59:14] godog ftw [18:59:38] :) yay [19:00:56] what's funny, is that one of godog's fav bits of comedy is the whole "Did you trying turning it off, then back on again?", from the IT crowd [19:01:04] s/trying/try/ [19:01:08] hahah, true [19:03:09] (03CR) 10Catrope: [C: 031] "Turns out it is set to a different value in labs by way of $wmfLocalServices, but we no longer need this line. The Parsoid settings should" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301743 (owner: 10MaxSem) [19:03:27] hahaha yeah that scene is epic [19:04:02] 06Operations, 10Parsoid, 06Services, 15User-mobrovac: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2512823 (10greg) @mobrovac what's the plan regarding upgrading the services in Beta Cluster? Seems unwise to ignore our only holistic testing cluster and do this in... [19:04:09] from the same writer as Father Ted, recommended [19:04:40] mutante: https://www.youtube.com/watch?v=nn2FB1P_Mn8 [19:05:04] hehe, yes :) [19:05:45] 06Operations, 10Cassandra: xenon.eqiad.wmnet: very high cpu utilization - https://phabricator.wikimedia.org/T141675#2512828 (10Eevans) >>! In T141675#2512799, @Dzahn wrote: > as requested i have turned "logical processor" off and on again in BIOS, like on T123924#1941098 I cannot believe that actually worked. 
[19:07:10] 06Operations, 06Release-Engineering-Team, 15User-greg: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2512832 (10greg) ```lang=irc 18:34 <+ greg-g> For the record, there are only 4 UBN! tasks right now: 2 related to Fundraising (they use UBN! for their stuff in a... [19:14:31] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [19:16:25] (03CR) 10Dzahn: [C: 032] statistics: set cluster for stat1004 [puppet] - 10https://gerrit.wikimedia.org/r/302283 (https://phabricator.wikimedia.org/T141360) (owner: 10Dzahn) [19:16:33] (03PS2) 10Dzahn: statistics: set cluster for stat1004 [puppet] - 10https://gerrit.wikimedia.org/r/302283 (https://phabricator.wikimedia.org/T141360) [19:18:30] 06Operations, 10Traffic, 13Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2512853 (10ema) There are a few Range-related things we need to test before starting the upgrade: 1) User1 requests range 1-10 of a 1GB object. We want to make sure that varnish fetc... [19:18:52] (03CR) 10Ottomata: [C: 032] Parameterize eventlogging-service access log level so labs can make this more verbose [puppet] - 10https://gerrit.wikimedia.org/r/302282 (owner: 10Ottomata) [19:18:59] (03PS2) 10Ottomata: Parameterize eventlogging-service access log level so labs can make this more verbose [puppet] - 10https://gerrit.wikimedia.org/r/302282 [19:22:56] (03CR) 10Ottomata: [V: 032] Parameterize eventlogging-service access log level so labs can make this more verbose [puppet] - 10https://gerrit.wikimedia.org/r/302282 (owner: 10Ottomata) [19:24:06] mutante: are sca boxes ubuntu ? [19:25:41] yes matanya [19:26:14] planned to move to jessie in near future ? 
[19:27:40] (03PS3) 10Dzahn: statistics: set cluster for stat1004 [puppet] - 10https://gerrit.wikimedia.org/r/302283 (https://phabricator.wikimedia.org/T141360) [19:32:57] !log deploy restbase 840411a4 canary on restbase1007 [19:33:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:34:49] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 6x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2512875 (10RobH) [19:35:16] (03PS1) 10Mobrovac: [Beta] Parsoid: Use deployment-parsoid09 [puppet] - 10https://gerrit.wikimedia.org/r/302300 (https://phabricator.wikimedia.org/T135176) [19:37:29] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 6x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2512883 (10RobH) So in the last order, we didn't need to order SSDs for all the systems in codfw, since rebalancing freed up enough existing samsung SSDs for 2 of t... [19:37:43] !log deploy restbase 840411a4 [19:37:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:39:06] 06Operations, 10Analytics-Wikistats, 13Patch-For-Review, 07Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#2512886 (10Krinkle) [19:39:08] 06Operations, 10DNS, 10Traffic: Set up compat redirect stats.wikipedia.org -> stats.wikimedia.org - https://phabricator.wikimedia.org/T21353#2512888 (10Krinkle) [19:39:36] (03PS2) 10Mobrovac: [Beta] Parsoid: Use deployment-parsoid09 [puppet] - 10https://gerrit.wikimedia.org/r/302300 (https://phabricator.wikimedia.org/T135176) [19:40:41] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [19:41:24] 06Operations, 06Services, 10Wikimedia-Logstash: Kibana / logstash dashboards timing out consistently since Kibana upgrade - https://phabricator.wikimedia.org/T141384#2512892 
(10Pchelolo) 05Open>03Resolved a:03Pchelolo After the deployment of https://github.com/wikimedia/hyperswitch/pull/50 the restbase... [19:44:31] 06Operations, 10Analytics: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2512900 (10Dzahn) investigated a bit. could confirm outgoing packets from stat1004 towards carbon (the aggregator for eqiad).. could NOT confirm incoming packets on carbon (unlike from stat1003 and other... [19:46:20] (03CR) 10Mobrovac: [C: 031] "Works in beta." [puppet] - 10https://gerrit.wikimedia.org/r/302300 (https://phabricator.wikimedia.org/T135176) (owner: 10Mobrovac) [19:46:21] PROBLEM - Apache HTTP on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:47:23] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 6x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2512906 (10GWicke) @RobH: We don't have very conclusive data, as those nodes aren't seeing any read traffic. iowait from writes is lower on the intel ssds, but read... [19:48:11] RECOVERY - Apache HTTP on mw1267 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.243 second response time [19:48:51] 06Operations, 10Analytics: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2512910 (10Dzahn) i don't understand how analytics roles are setup. "role::analytics_cluster::client" includes a bunch of other things and the word "firewall" or "base::firewall" does not show up in any o... 
[19:50:02] 06Operations, 10Analytics: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2512911 (10Dzahn) a:05Dzahn>03None [19:51:04] (03PS1) 10Mobrovac: [Beta] Parsoid: Switch to using deployment-parsoid09 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302301 (https://phabricator.wikimedia.org/T135176) [19:52:59] (03CR) 10Mobrovac: [C: 032] [Beta] Parsoid: Switch to using deployment-parsoid09 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302301 (https://phabricator.wikimedia.org/T135176) (owner: 10Mobrovac) [19:53:27] (03Merged) 10jenkins-bot: [Beta] Parsoid: Switch to using deployment-parsoid09 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302301 (https://phabricator.wikimedia.org/T135176) (owner: 10Mobrovac) [19:54:59] 06Operations, 10Analytics-Wikistats, 13Patch-For-Review, 07Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#2512943 (10Krinkle) Until 4 months ago, this redirect existed. {0f5815e9b6} - https://gerrit.wikimedia.org/... 
[19:55:05] 06Operations, 10Analytics-Wikistats, 13Patch-For-Review, 07Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#2512944 (10Krinkle) 05declined>03Open [19:55:22] 06Operations, 10Analytics-Wikistats, 07Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#2010000 (10Krinkle) [19:55:24] (03PS2) 10Dzahn: admin: add shell account for Jan Dittrich [puppet] - 10https://gerrit.wikimedia.org/r/301721 (https://phabricator.wikimedia.org/T141339) [19:56:21] !log mobrovac@tin Synchronized wmf-config/LabsServices.php: (no message) (duration: 00m 38s) [19:56:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:57:51] 06Operations, 10Parsoid, 06Services, 15User-mobrovac: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2512947 (10mobrovac) >>! In T135176#2512823, @greg wrote: > @mobrovac what's the plan regarding upgrading the services in Beta Cluster? Seems unwise to ignore our o... [19:59:07] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 6x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2512948 (10Eevans) Unfortunately, the Intel disks ended up in restbase2009.codfw.wmnet, where we don't typically see much traffic. However, I did some bootstraps i... [20:00:04] gwicke, cscott, arlolra, subbu, bearND, and mdholloway: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160801T2000). Please do the needful. 
[20:02:17] (03PS1) 10Ladsgroup: ores: changes for cofigs for the refactor [puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) [20:03:41] (03CR) 10jenkins-bot: [V: 04-1] ores: changes for cofigs for the refactor [puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) (owner: 10Ladsgroup) [20:04:03] !log starting parsoid deploy [20:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:04:35] Hey, anyone from Ops around? [20:05:07] We want to deploy ores, we need a patch gets merged before and in case we needed to rollback, revert it [20:05:10] https://gerrit.wikimedia.org/r/#/c/302303/ [20:06:24] !log synced new parsoid code; restarted parsoid on wtp1001 as a canary [20:06:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:07:38] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2512977 (10GWicke) [20:08:37] (03PS2) 10Ladsgroup: ores: changes for cofigs for the refactor [puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) [20:09:18] (03CR) 10Andrew Bogott: [C: 032] [Beta] Parsoid: Use deployment-parsoid09 [puppet] - 10https://gerrit.wikimedia.org/r/302300 (https://phabricator.wikimedia.org/T135176) (owner: 10Mobrovac) [20:12:53] (03PS3) 10Ladsgroup: ores: changes for configs for the refactor [puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) [20:13:57] any ops around? I see ganglia showing some strange things .. https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Parsoid+eqiad&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name [20:14:46] for the record, no mobileapps deployment today [20:14:48] it is affecting all eqiad servers. something with the aggregator? 
[20:14:49] negative memory usage ?? [20:15:14] subbu: it's because i restarted the aggregators on carbon trying to debug [20:15:16] ok .. so, good, it is indeed unrelated to the code i am deploying. [20:15:19] it will catch up to normal soon [20:15:24] yes, unrelated [20:15:26] ok. i'll proceed with my deploy then. :) [20:15:32] !log restarted ganglia aggregators on carbon [20:15:35] yes please [20:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:16:08] mutante: hey, It would be great if you merge this patch. I'm blocked on it to make the deployment [20:16:16] ema: ^ could you help [20:16:17] https://gerrit.wikimedia.org/r/302303 [20:18:10] !log finished deploying parsoid sha abf396eb [20:18:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:18:34] (03PS3) 10Dzahn: admin: add shell account for Jan Dittrich [puppet] - 10https://gerrit.wikimedia.org/r/301721 (https://phabricator.wikimedia.org/T141339) [20:19:04] 06Operations, 10Ops-Access-Requests: Requesting deployment access for Niharika - https://phabricator.wikimedia.org/T141593#2512990 (10kaldari) I approve and can help train Niharika on using the prod servers. [20:21:26] 06Operations, 10Analytics, 10Analytics-Wikistats, 07Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#2512993 (10Dzahn) [20:23:12] (03PS4) 10Dzahn: ores: changes for configs for the refactor [puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) (owner: 10Ladsgroup) [20:23:18] 06Operations, 10Analytics, 10Analytics-Wikistats, 07Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#2010000 (10BBlack) I tend to agree that if this was linked externally, we shouldn't have broken it. I don't think... 
[20:24:33] (03CR) 10Dzahn: [C: 032] ores: changes for configs for the refactor [puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) (owner: 10Ladsgroup) [20:25:15] (03PS9) 10BBlack: VCL backends 2/N: sort misc req_handling [puppet] - 10https://gerrit.wikimedia.org/r/300579 (https://phabricator.wikimedia.org/T110717) [20:25:17] (03PS10) 10BBlack: VCL backends 5/N: use for all clusters [puppet] - 10https://gerrit.wikimedia.org/r/300656 [20:25:19] (03PS9) 10BBlack: VCL backends 3/N: add force-pass support [puppet] - 10https://gerrit.wikimedia.org/r/300581 (https://phabricator.wikimedia.org/T110717) [20:25:21] (03PS10) 10BBlack: VCL backends 4/N: subpaths and defaulting [puppet] - 10https://gerrit.wikimedia.org/r/300655 [20:25:23] (03PS9) 10BBlack: VCL backends 1/N [WIP] [puppet] - 10https://gerrit.wikimedia.org/r/300574 (https://phabricator.wikimedia.org/T110717) [20:27:29] (03PS1) 10Ppchelko: WIP: Change-Prop: Enable sampled logging [puppet] - 10https://gerrit.wikimedia.org/r/302309 [20:29:03] 06Operations, 10Analytics: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2513016 (10Dzahn) 05Open>03Resolved a:03Dzahn eh.. yea.. after looking more, i restarted all aggregators on carbon (as in "kill" them and run puppet) stat1004 showed up https://ganglia.wikimedia.... 
[20:31:41] (03CR) 10Dzahn: [C: 032] "approved by Abraham (head of software dev at WMDE)" [puppet] - 10https://gerrit.wikimedia.org/r/301721 (https://phabricator.wikimedia.org/T141339) (owner: 10Dzahn) [20:32:02] (03PS4) 10Dzahn: admin: add shell account for Jan Dittrich [puppet] - 10https://gerrit.wikimedia.org/r/301721 (https://phabricator.wikimedia.org/T141339) [20:33:04] !log deploying 624d777 to ores [20:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:33:23] \o/ [20:34:46] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [20:36:46] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4822219 keys - replication_delay is 0 [20:36:47] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: puppet fail [20:38:44] and we have problem with redis [20:38:48] the puppet change is not there [20:45:36] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] [20:46:21] (03PS2) 10Ppchelko: WIP: Change-Prop: Enable sampled logging [puppet] - 10https://gerrit.wikimedia.org/r/302309 [20:47:41] (03CR) 10jenkins-bot: [V: 04-1] WIP: Change-Prop: Enable sampled logging [puppet] - 10https://gerrit.wikimedia.org/r/302309 (owner: 10Ppchelko) [20:57:25] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2513064 (10Jdlrobson) [20:57:44] 06Operations, 10Parsoid, 06Services, 15User-mobrovac: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2513078 (10greg) >>! In T135176#2512947, @mobrovac wrote: >>>! In T135176#2512823, @greg wrote: >> @mobrovac what's the plan regarding upgrading the services in Bet... 
[20:58:58] (03PS3) 10Ppchelko: Change-Prop: Enable sampled logging [puppet] - 10https://gerrit.wikimedia.org/r/302309 (https://phabricator.wikimedia.org/T139674) [20:59:37] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [21:00:04] dapatrick and bawolff: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160801T2100). [21:00:44] (03CR) 10Ppchelko: "Puppet compiler: https://puppet-compiler.wmflabs.org/3550/" [puppet] - 10https://gerrit.wikimedia.org/r/302309 (https://phabricator.wikimedia.org/T139674) (owner: 10Ppchelko) [21:04:58] RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:08:54] https://phabricator.wikimedia.org/diffusion/1912/browse/master/ [21:09:09] okay, something urgent. Can you update this manually [21:09:16] it's a mirror [21:09:21] and it's not updated [21:09:48] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] [21:11:30] Amir1: who's "you"? [21:11:46] the person who can update that [21:11:47] :D [21:11:53] ^ errors appear to be ORES-related. ping Amir1 [21:12:05] Amir1: rollback [21:12:12] it's too complicated [21:12:15] .... [21:12:18] puppet changes [21:12:26] if we can get this fixed [21:12:37] I'm making patch in tin [21:12:41] like a security patch [21:14:01] do you need a revert of the puppet change? [21:14:09] I'm confused on what needs to be changed Amir1 [21:14:20] are you saying you pull from diffusion and it's lagging behind has caused an issue? [21:14:25] yeah [21:14:28] chasemp: yup [21:14:55] diffusion is on a staggered update schedule based on change rate in a repo iirc [21:15:04] let me see here if I can force it [21:16:01] what is teh callsign for this? 
[21:16:18] Callsign [21:16:18] No Callsign [21:16:21] ok nvmd [21:16:27] r1912 [21:17:36] !log iridium sudo -u phd /srv/phab/phabricator/bin/repository update 1912 [21:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:17:44] ok.. Amir1 can you tell if it's fixed? [21:17:49] yup [21:18:07] thanks chasemp [21:18:08] so you guys will have o work this out and wait until it's reached iridium or build in some kind of hook for scap to force an update [21:19:04] Amir1: this needs an incident report [21:19:10] deploying [21:19:15] greg-g: yeah, definitely [21:19:29] Amir1: :) [21:19:32] I will explain completely why this bug survived even beta [21:19:38] bad bug [21:20:21] !log deploying e8d2475 to scb nodes [21:20:22] ores [21:20:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:20:26] !for ores [21:20:34] Amir1: file for when you need it: https://phabricator.wikimedia.org/F4327465 [21:20:55] thanks [21:20:56] s/file/fatalmonitor screenshot/ [21:20:57] :) [21:21:23] I hate when it takes so long to deploy [21:21:57] 06Operations, 10Parsoid, 06Services, 15User-mobrovac: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2513154 (10ssastry) [21:21:59] 06Operations, 06Services: Move all Node.JS services to Jessie and Node 4 - https://phabricator.wikimedia.org/T124989#2513155 (10ssastry) [21:22:08] 06Operations, 06Discovery, 06Maps, 06WMF-Legal, 03Maps-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2513156 (10MaxSem) [21:22:40] the logstash canary check that thcipriani added to scap would *probably* have caught this/prevented it going past canary hosts (just saying "probably" because I can't predict the future) [21:22:50] restarting service [21:23:01] greg-g: we have canary, and it didn't work out [21:23:02] \o/ [21:23:04] should be released this week (pending new deb package creation/adding to wikimedia apt) [21:23:17] We're 
up! [21:23:28] it's up [21:23:30] yeah [21:23:35] https://gerrit.wikimedia.org/r/#/c/302346/ [21:23:43] I'm sorry for this commit message though [21:23:56] greg-g: seems like a decent candidate for a deploy hook since more than a few will want the newest version to clone from etc, glad it worked out tho [21:24:23] thanks chasemp! [21:24:25] chasemp: the "make sure whatever you are pulling from is updated" hook? maybe [21:24:34] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513172 (10BBlack) Yeah. As a first pass, we could simply strip URLs with `?.*$`. If `shortcode` is a pretty restricted subset, it might make more sense just g... [21:24:53] or have a tie between projects and repos and update it pre-fetch [21:24:53] I say maybe only because... shouldn't you know what you are deploying/double check the hash? [21:25:01] I can't think of a good scenario where you want diffusion to be old [21:25:06] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 705 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4825472 keys - replication_delay is 705 [21:25:10] sure true but I'm unsure how obvious that is [21:25:11] and also [21:25:13] race conditions [21:25:19] as you aren't actually blocking it from updating either [21:25:24] greg-g: I do double check the hash. [21:25:34] I guess they pulled down head tho yeah [21:25:35] so in theory [21:25:41] this should use deploy tags? [21:25:49] Amir1: I'll just wait until the incident report before opining then ;) [21:25:50] and checkout a tag and the missing tag would cause a freeze or rollback [21:25:59] aaaanyway, side seat driving [21:26:03] yup [21:27:06] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4821542 keys - replication_delay is 0 [21:27:41] still seeing the "no model available for {blah}" fatal [21:28:41] where?
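Editorial sketch: the "double check the hash" and deploy-hook ideas discussed above could look roughly like the pre-deploy guard below. It is only an illustration under stated assumptions. It assumes the deployer has captured the text printed by `git ls-remote` for the mirror (lines of `<sha><whitespace><ref>`), and the function names `parse_ls_remote` and `mirror_is_current` are hypothetical, not part of scap or Phabricator. The SHA values in the usage note are made up.

```python
def parse_ls_remote(output):
    """Map ref name -> commit SHA from `git ls-remote`-style text.

    Each non-empty line is expected to look like "<sha>\t<ref>".
    """
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        refs[ref.strip()] = sha
    return refs


def mirror_is_current(ls_remote_output, ref, expected_sha):
    """Return True only if the mirror's ref points at the expected commit.

    expected_sha may be abbreviated (as in the !log lines), so a prefix
    match is used. A missing ref counts as "not current".
    """
    actual = parse_ls_remote(ls_remote_output).get(ref)
    return actual is not None and actual.startswith(expected_sha)
```

A deploy script could refuse to continue when `mirror_is_current(...)` is false, for example `mirror_is_current("a" * 40 + "\trefs/heads/master\n", "refs/heads/master", "aaaaaaa")` is true, while an expected prefix of `"bbbbbbb"` is not. This does not address the race condition raised in the channel, since the mirror can still change between the check and the fetch.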
[21:28:53] in exception log [21:29:35] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2513179 (10dr0ptp4kt) Approved. [21:29:41] logstash view of fatalmonitor: https://logstash.wikimedia.org/goto/6ad16af7764be672a10670a8f00f7922 [21:30:28] halfak: it seems like a bug [21:30:32] ^ [21:30:45] url? [21:30:51] did you change the model and revid behavior [21:30:55] https://logstash.wikimedia.org/goto/6ad16af7764be672a10670a8f00f7922 [21:30:57] halfak: open logstash and look at the errors [21:30:57] Shouldn't have [21:31:04] Checks are still working in labs [21:31:26] not working in prod, this is spamming the hell out of fatalmonitor, we need to revert ASAP, this is not ok to leave as is [21:31:42] it seems it gets rev id instead of model [21:31:46] yup [21:32:11] example: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2016.08.01/mediawiki/?id=AVZIAsJp14thRtYyt2lD [21:32:23] Amir1, is this the kind of URL that the extension should be hitting? https://ores.wikimedia.org/v2/scores/wikidatawiki/damaging/2142536 [21:32:26] (03PS1) 10Alex Monk: beta: update graphite URL that seems to have been missed during changes [puppet] - 10https://gerrit.wikimedia.org/r/302351 [21:32:28] Seems to be working fine [21:32:42] (03CR) 10Dzahn: "paladox has tested on http://gerrit-test.wmflabs.org/gerrit/#/c/16/" [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [21:32:48] https://ores.wikimedia.org/v2/scores/plwiki/damaging/2142 Works [21:32:50] let me get the url [21:32:58] https://ores.wikimedia.org/v2/scores/enwiki/damaging/2142 works [21:33:07] let's rollback and check later [21:33:15] mutante: hey, can you revert that? [21:33:16] https://ores.wikimedia.org/v2/scores/testwiki/damaging/124235 works [21:33:26] But it seems to be working [21:33:30] What is broken about the check? 
[21:33:33] I don't know [21:33:44] https://ores.wikimedia.org/v2/scores/testwiki/reverted/124235 works [21:33:49] halfak: the ORES extension is fataling in prod, it needs to be reverted now [21:34:04] (03PS1) 10Dzahn: Revert "ores: changes for configs for the refactor" [puppet] - 10https://gerrit.wikimedia.org/r/302352 [21:34:04] Oh! I thought it was the healthmonitor [21:34:08] https://github.com/wikimedia/mediawiki-extensions-ORES/blob/master/includes/Api.php [21:34:16] halfak: did you not open the links I shared? [21:34:26] and :P [21:34:43] (03PS2) 10Dzahn: Revert "ores: changes for configs for the refactor" [puppet] - 10https://gerrit.wikimedia.org/r/302352 [21:34:46] Amir1: ^ that? [21:34:51] yup [21:35:06] (03CR) 10Dzahn: [C: 032] "causes fatals in prod" [puppet] - 10https://gerrit.wikimedia.org/r/302352 (owner: 10Dzahn) [21:35:14] (03CR) 10Dzahn: [V: 032] Revert "ores: changes for configs for the refactor" [puppet] - 10https://gerrit.wikimedia.org/r/302352 (owner: 10Dzahn) [21:35:26] halfak: https://ores.wikimedia.org/scores/wikidatawiki?models=damaging|revids=12234 [21:35:33] nope [21:35:48] AmandaNP, https://ores.wikimedia.org/scores/wikidatawiki?models=damaging&revids=12234 [21:35:48] Amir1: reverted on puppetmaster [21:35:49] Woops [21:35:50] https://ores.wikimedia.org/scores/wikidatawiki?models=damaging&revids=12234 [21:35:52] Amir ^ [21:35:57] That URL you gave is broken [21:36:06] mutante: thanks [21:36:15] halfak: yeah, I made it manually [21:36:43] Hmm... This response structure does look wrong. [21:36:49] * halfak looks into that [21:37:17] Now *this* we should have caught in Beta [21:37:19] FYI, I'm holding off my my security patch deploy until this^ business is resolved. [21:37:32] yes [21:37:33] we changed "staging" and "prod" at the same time. next time just change "staging" first ? 
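Editorial sketch: the hand-built URL pasted above joined its two query parameters with `|` instead of `&`, which is why it was called broken. Building the query string with the standard library avoids that kind of slip. The two URL shapes below are copied from the links pasted in the channel and are not checked against any ORES documentation; the helper names `v1_scores_url` and `v2_score_url` are hypothetical.

```python
from urllib.parse import urlencode

ORES_BASE = "https://ores.wikimedia.org"  # host as it appears in the log


def v1_scores_url(wiki, model, rev_id):
    """Query-string form, e.g. /scores/<wiki>?models=...&revids=..."""
    # urlencode joins the parameters with "&" and escapes the values.
    query = urlencode({"models": model, "revids": rev_id})
    return f"{ORES_BASE}/scores/{wiki}?{query}"


def v2_score_url(wiki, model, rev_id):
    """Path form, e.g. /v2/scores/<wiki>/<model>/<rev_id>"""
    return f"{ORES_BASE}/v2/scores/{wiki}/{model}/{rev_id}"
```

For the revision IDs mentioned in the channel, `v1_scores_url("wikidatawiki", "damaging", 12234)` gives the corrected `...?models=damaging&revids=12234` form.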
[21:37:37] !log deploying 6790ccb [21:37:37] 06Operations, 06Discovery, 06Maps, 06WMF-Legal, 03Maps-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2513156 (10BBlack) In general we'll probably allow third parties, as we've done for all of our other content in the general case (e.g. upload.wm.o images are linked and embe... [21:37:41] !for ores [21:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:37:47] !log for ores [21:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:38:06] halfak: it was v1, I think v1 should be okay [21:38:12] mutante: yes [21:38:37] mutante: I'm still in the dark about what actually happened, so I'll wait to say more :) [21:38:43] *nod* [21:39:04] ...other than "REVERT!" ;) [21:39:29] rollbacks shouldn't take that long [21:39:31] strange [21:39:50] mutante: can you run puppet agent on scb nodes too [21:40:01] I don't have access [21:40:05] ok [21:40:08] all scb* ? [21:40:29] well, just 2, yea [21:40:39] 2 per dc [21:40:51] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513218 (10Legoktm) The shortcode set is `23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$_`. It's basically static and won't be changing once we enabl... [21:41:21] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513219 (10BBlack) Looks like base36 in the specs, up to 9 total characters (3x wiki id, 6x article id). We could restrict legal urls to `^/[0-9A-Z]{0,9}$`. Is... [21:42:03] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513221 (10BBlack) Simul-edit! So it's not base36? 
[21:42:42] late to the party, but fwiw, the logstash canary check I added was for mediawiki deploys. scap3 deploys have canary check hooks. [21:43:24] thcipriani: dur, sorry [21:43:29] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513225 (10Legoktm) It is `[0-9A-Za-z$\-]` with 0, O, I, l, and 1 removed because they're visually confusing when written down. And yes, it is case-sensitive. [21:44:37] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513226 (10Legoktm) Oh, and there is no max length, we'll keep generating longer ids as people keep shortening new URLs... [21:44:52] Amir1: all 4 ran puppet.. scb1002 may be different from others ? [21:44:53] greg-g: np :) The canary check script that gwick.e wrote could be useful for other services, too. [21:45:25] mutante: nope, probably it's done already [21:45:30] we are up [21:45:33] really up [21:45:38] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513227 (10BBlack) Ok I basically get it. But in the two different explanations, one has `-` and one has `_` ...? [21:45:53] starting to go down, maybe [21:46:23] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513228 (10Legoktm) Oops sorry, I meant `_`. My bad. 
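Editorial sketch: the shortcode rules worked out in the T141170 comments above (the alphabet Legoktm quoted, case-sensitive, no maximum length, query parameters stripped) can be expressed as a small validator. This is a Python illustration only, not the VCL that would run on the caches. The name `normalize_short_url` is hypothetical, and rejecting an empty shortcode is an assumption of this sketch.

```python
import re
from urllib.parse import urlsplit

# Alphabet as quoted in the task comments: 0, O, I, l and 1 are left out.
SHORTCODE_ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$_"

# One or more alphabet characters after the leading slash, no length cap.
SHORT_PATH_RE = re.compile("/[" + re.escape(SHORTCODE_ALPHABET) + "]+")


def normalize_short_url(url):
    """Return the URL without query string or fragment, or None if the
    path is not a valid shortcode."""
    parts = urlsplit(url)
    if not SHORT_PATH_RE.fullmatch(parts.path):
        return None
    return f"{parts.scheme}://{parts.netloc}{parts.path}"
```

With this sketch, `normalize_short_url("https://w.wiki/3?foo=bar")` yields `https://w.wiki/3`, and a path containing an excluded character such as `/l0` is rejected.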
[21:47:51] no, not going down completely [21:48:01] (03PS2) 10Yuvipanda: beta: update graphite URL that seems to have been missed during changes [puppet] - 10https://gerrit.wikimedia.org/r/302351 (owner: 10Alex Monk) [21:48:09] (03CR) 10Yuvipanda: [C: 032 V: 032] beta: update graphite URL that seems to have been missed during changes [puppet] - 10https://gerrit.wikimedia.org/r/302351 (owner: 10Alex Monk) [21:48:26] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513231 (10BBlack) Ok. Either way, when I count up both representations, I get 59 total characters, does that sound right? [21:49:26] bblack, hey [21:49:57] I know the letsencrypt puppet code requires the server to map /.well-known/acme-challenge to /var/acme/challenge [21:50:04] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513234 (10Legoktm) ``` >>> len('23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$_') 59 ``` Yep! [21:50:28] is that possible to do with just varnish (as running for text/upload) and no extra web server? [21:51:34] Hi [21:51:38] anny devs in [21:51:40] maybe using the nginx we have on those boxes for tls termination? [21:51:45] hi ShakespeareFan00 [21:51:51] I've got some image 'phantoms' in a query - https://quarry.wmflabs.org/query/6052 [21:52:13] In that they show up in the query but not when i look for them on the project concerned.. [21:52:35] I thought this might be cache related but they've been there for a few days [21:53:33] Krenair: possible, but not useful except in the single-varnish case like beta. v3 doesn't do static content. I think there's a vmod for varnish4. Either way you could theoretically hack it into nginx over HTTPS-only, and have varnish just redirect HTTP there. 
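Note on the acme-challenge question above: whichever server ends up answering the request, the mapping Krenair mentions is from a URL under /.well-known/acme-challenge to a file under /var/acme/challenge. A minimal Python sketch of that mapping follows. It is not the puppet or nginx configuration. The function name `challenge_file_for` and the rejection rules for odd tokens are assumptions added for illustration.

```python
import posixpath
from typing import Optional

URL_PREFIX = "/.well-known/acme-challenge/"  # URL prefix named in the discussion
CHALLENGE_DIR = "/var/acme/challenge"        # directory named in the discussion


def challenge_file_for(url_path: str) -> Optional[str]:
    """Map a request path to a file in CHALLENGE_DIR, or None if it does not qualify."""
    if not url_path.startswith(URL_PREFIX):
        return None
    token = url_path[len(URL_PREFIX):]
    # Assumption: reject empty tokens and anything that could leave the directory.
    if not token or "/" in token or token in (".", ".."):
        return None
    return posixpath.join(CHALLENGE_DIR, token)
```

With several caches in front, every host that might receive the validation request would need the same file in place, which is the syncing problem raised for the multi-varnish case.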
[21:53:49] Krenair: for multiple varnishes, we'd have to have a way to sync the challenge data out. [21:54:46] The concern is 'bad tables' hence my mentioning it here [21:57:32] (03PS1) 10BBlack: text VCL: validate w.wiki short URLs [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) [21:57:42] ShakespeareFan00: it would be great if you can copy/paste all that on a ticket. backlog is changing fast here. people are in different timezones etc [21:58:16] Willl wait another 34 hours and then submit a ticket if it perissts [21:58:27] ShakespeareFan00, looks to me like a replication issue [21:58:28] Some stuff was resynched this morning i understand [21:59:02] !log restarting uwsgi-ores in scb1001 and scb1002 [21:59:04] "select * from page where page_id in (49959270, 37396818, 49954143, 49949181, 47935756, 49954287, 49956079, 48731525, 2093268);" returns 9 rows on labsdb1001's enwiki_p, but only 1 row on db2034's enwiki [21:59:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:59:07] was there a ticket about the thing that happened this morning? [21:59:17] wherever it was morning [21:59:57] mutante: I don't know [22:00:05] Krenair: Thanks [22:00:10] that's why we should have one [22:00:25] SO I just need to wait for the replication to catch up? 
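Note on the replication check above: the comparison amounts to running the same page_id query on two hosts and diffing the ids that come back. A rough Python sketch of the diff step is below. It assumes the two result sets have already been fetched; no database client is shown. The helper name `phantom_ids` is made up, and the log does not say which single row db2034 returned, so no expected result is given for the real ids.

```python
from typing import Iterable, Set

# page_id values copied from the query quoted in the log.
QUERIED_IDS = [49959270, 37396818, 49954143, 49949181, 47935756,
               49954287, 49956079, 48731525, 2093268]


def phantom_ids(replica_ids: Iterable[int], reference_ids: Iterable[int]) -> Set[int]:
    """Return ids that the replica still has but the reference host does not."""
    return set(replica_ids) - set(reference_ids)
```

Passing the ids returned by labsdb1001 as `replica_ids` and those returned by db2034 as `reference_ids` would list the rows worth quoting in the ticket.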
[22:00:48] I'm not sure [22:01:29] 07:45, 31 March 2016 Closedmouth (talk | contribs) deleted page File:BIT central teaching building panorama.jpg (F7: Violates non-free content criterion #1) [22:01:29] Opening a ticket [22:01:37] it should definitely be replicated to labs by now [22:01:43] it's been 4 months [22:02:22] (03PS1) 10Dzahn: admin: add jdittrich to researchers, bastions [puppet] - 10https://gerrit.wikimedia.org/r/302355 (https://phabricator.wikimedia.org/T141339) [22:03:52] !log Deployed patch for T139670 to wmf.12 [22:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:05:33] (03PS2) 10Dzahn: admin: add jdittrich to researchers, bastions [puppet] - 10https://gerrit.wikimedia.org/r/302355 (https://phabricator.wikimedia.org/T141339) [22:05:50] (03CR) 10Paladox: [C: 031] "Yes please" [puppet] - 10https://gerrit.wikimedia.org/r/301896 (owner: 10Chad) [22:06:11] (03CR) 10Dzahn: [C: 032] admin: add jdittrich to researchers, bastions [puppet] - 10https://gerrit.wikimedia.org/r/302355 (https://phabricator.wikimedia.org/T141339) (owner: 10Dzahn) [22:06:37] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [22:07:00] ^ strontium is not active anymore [22:07:19] i heard earlier today [22:08:03] or not, there is actually an unmerged change [22:08:19] I'm deploying patches if anyone else needs time in the window. [22:08:23] merges Yuvipands [22:08:26] PROBLEM - Unmerged changes on repository puppet on rhodium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [22:08:27] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
[22:08:36] sigh [22:08:38] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:08:38] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [22:08:42] (03PS2) 10Paladox: Gerrit: Simplify dependencies [puppet] - 10https://gerrit.wikimedia.org/r/301896 (owner: 10Chad) [22:08:44] there we go.. also strontium [22:08:51] kind of surprisingly [22:08:55] (03PS1) 10Chad: Gerrit: Default to no replication [puppet] - 10https://gerrit.wikimedia.org/r/302356 (https://phabricator.wikimedia.org/T141803) [22:08:57] (03CR) 10Legoktm: text VCL: validate w.wiki short URLs (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) (owner: 10BBlack) [22:09:04] bblack: thanks :D [22:09:08] https://phabricator.wikimedia.org/T141818 - Wasn't sure how to tag it [22:09:41] (03CR) 10Paladox: [C: 031] "Thanks :)" [puppet] - 10https://gerrit.wikimedia.org/r/302356 (https://phabricator.wikimedia.org/T141803) (owner: 10Chad) [22:10:18] RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge. [22:10:27] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [22:12:32] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003.eqiad.wmnet for WMDE-jand - https://phabricator.wikimedia.org/T141339#2513304 (10Dzahn) [bast1001:~] $ id jdittrich uid=14685(jdittrich) gid=500(wikidev) groups=500(wikidev),707(bastiononly) [stat1003:~] $ id jdittrich uid=14685(jdittrich) g... [22:12:33] ShakespeareFan00, you should really put my line earlier about labsdb1001 vs. 
db2034 in the ticket [22:12:38] (03PS1) 10Chad: Gerrit: Make IPv6 optional [puppet] - 10https://gerrit.wikimedia.org/r/302359 (https://phabricator.wikimedia.org/T133070) [22:12:51] tag DBA, Labs [22:12:57] (03CR) 10Paladox: [C: 031] "Yay thanks :)" [puppet] - 10https://gerrit.wikimedia.org/r/302359 (https://phabricator.wikimedia.org/T133070) (owner: 10Chad) [22:13:06] change Quarry reference in the title to "labs replicas" or something [22:13:39] Added your note from earlie [22:14:32] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003.eqiad.wmnet for WMDE-jand - https://phabricator.wikimedia.org/T141339#2513326 (10Dzahn) 05Open>03Resolved a:03Dzahn @Jan_Dittrich Your request has been granted. You should now be able to ssh to the "bastion hosts" (like bast1001.wikimedi... [22:16:27] (03CR) 10Dzahn: [C: 032] Gerrit: Default to no replication [puppet] - 10https://gerrit.wikimedia.org/r/302356 (https://phabricator.wikimedia.org/T141803) (owner: 10Chad) [22:17:06] ShakespeareFan00, okay... also maybe my recommendations on tags and title [22:17:17] See the udpated version [22:24:09] (03CR) 10BBlack: text VCL: validate w.wiki short URLs (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) (owner: 10BBlack) [22:24:25] 06Operations, 10Ops-Access-Requests: Requesting deployment access for Niharika - https://phabricator.wikimedia.org/T141593#2504190 (10Dzahn) per ops meeting just needs the "3 day waiting period". 
[22:24:34] (03PS2) 10BBlack: text VCL: validate w.wiki short URLs [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) [22:25:41] (03CR) 10BBlack: "re: the "not found" text, I updated it, but keep in mind this will use our standard error page template, so it's not going to be prominent" [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) (owner: 10BBlack) [22:26:01] (03PS2) 10Dzahn: Gerrit: Make IPv6 optional [puppet] - 10https://gerrit.wikimedia.org/r/302359 (https://phabricator.wikimedia.org/T133070) (owner: 10Chad) [22:27:53] (03PS3) 10Chad: Gerrit: Make IPv6 optional [puppet] - 10https://gerrit.wikimedia.org/r/302359 (https://phabricator.wikimedia.org/T133070) [22:28:27] (03CR) 10Dzahn: [C: 032] Gerrit: Make IPv6 optional [puppet] - 10https://gerrit.wikimedia.org/r/302359 (https://phabricator.wikimedia.org/T133070) (owner: 10Chad) [22:30:46] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] [22:31:05] (03CR) 10Legoktm: [C: 031] "OK, sounds good to me! Thanks :)" [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) (owner: 10BBlack) [22:32:59] FYI: Incident report for ORES downtime: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160801-ORES [22:33:03] Thanks for everyone's help. 
[22:33:34] (03CR) 10QChris: [C: 031] Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [22:34:47] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:35:27] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/3551/lead.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/301896 (owner: 10Chad) [22:35:36] (03PS3) 10Dzahn: Gerrit: Simplify dependencies [puppet] - 10https://gerrit.wikimedia.org/r/301896 (owner: 10Chad) [22:38:01] 06Operations, 10Deployment-Systems: dologmsg doesn't work on terbium - https://phabricator.wikimedia.org/T141619#2513444 (10greg) p:05Triage>03Normal [22:39:54] bblack: Notice: /Stage[main]/Role::Cache::Ssl::Unified/Letsencrypt::Cert::Integrated[testing-le]/Exec[acme-setup-acme-testing_le]/returns: "detail": "Provided agreement URL [https://letsencrypt.org/documents/LE-SA-v1.0.1-July-27-2015.pdf] does not match current agreement URL [https://letsencrypt.org/documents/LE-SA-v1.1.1-August-1-2016.pdf]", [22:40:14] looks like they have updated it today :( [22:40:48] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] [22:41:21] huh, ores is still broken? [22:41:31] Oh... It shouldn't be. [22:41:47] MaxSem, looks up to me [22:41:53] RuntimeException from line 136 of /srv/mediawiki/php-1.28.0-wmf.12/extensions/ORES/includes/Cache.php: No model available for [361156268] [22:42:02] seems that old url is hardcoded in our acme_tiny.py script for new registrations [22:42:41] MaxSem, I have no idea what that is about [22:42:46] Amir1, still around? [22:42:53] MaxSem, when did that error come through? [22:43:35] not one error, flood of them [22:44:01] resumed at 22:23 UTC [22:45:40] That's really weird. That's an hour after our issues stopped. 
I don't think it's the service since we did a revert. [22:45:45] I'm looking though. [22:46:46] yeah. Seems like we're definitely running the old code on the server. [22:47:01] How could I tell if new code was deployed for the extension? [22:47:24] go to a random server and see [22:47:35] MaxSem, not sure how to do that [22:47:52] you don't have MW access? [22:47:57] https://github.com/wiki-ai/ores/commit/3d6f5dd844b9a8681e304f6a20a791910842a041 [22:48:01] Krenair: ok, updating [22:48:05] is there a revert needed on github? [22:48:24] (03PS1) 10BBlack: LE: update agreement in acme_tiny [puppet] - 10https://gerrit.wikimedia.org/r/302363 [22:48:29] mutante, that code isn't deployed. [22:48:50] (03CR) 10BBlack: [C: 032 V: 032] LE: update agreement in acme_tiny [puppet] - 10https://gerrit.wikimedia.org/r/302363 (owner: 10BBlack) [22:49:00] (03CR) 10Paladox: "@Chad could this be merged please?" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/299164 (owner: 10Chad) [22:49:06] halfak, maxsem@tin:/srv/mediawiki-staging/php-1.28.0-wmf.12/extensions/ORES$ git log -1 [22:49:06] commit 0afafe70f765153ae298ef10490ef5dda1b9a59b [22:49:06] Author: Amir Sarabadani [22:49:06] Date: Tue Jul 26 19:40:39 2016 +0430 [22:50:47] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:50:49] (03CR) 10Dzahn: "it says "as well", but isn't this actually replacing it, so the original redirect without /r won't work anymore" [puppet] - 10https://gerrit.wikimedia.org/r/301829 (owner: 10Chad) [22:51:20] Thanks MaxSem. 
Looking now [22:52:38] halfak, poking at a random appserver indicates that it has whatever is present on tin [22:53:54] (03PS3) 10BBlack: text VCL: validate w.wiki short URLs [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) [22:54:10] (03CR) 10BBlack: [C: 032 V: 032] text VCL: validate w.wiki short URLs [puppet] - 10https://gerrit.wikimedia.org/r/302354 (https://phabricator.wikimedia.org/T141170) (owner: 10BBlack) [22:54:33] Amir1: halfak I'd like to see a review of why it took so long to fix/revert [22:55:08] greg-g, sure. As soon as I confirm that it is indeed fixed. [22:55:09] 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2513539 (10BBlack) 05Open>03Resolved a:03BBlack [22:55:17] Also, what do you mean by review? [22:55:37] MaxSem, it looks to me like error you cited stop by 22:41 [22:55:49] The errors that occur after that are expected. [22:56:31] yep [22:56:33] OK. I'm declaring victory again. [22:57:09] jouncebot: next [22:57:10] In 0 hour(s) and 2 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160801T2300) [22:57:18] So yeah. The review. Generally, I think it took a long time to revert because (1) we tried a hotfix and found that diffusion took a *long* time to sync with github and (2) the revert required a revert in puppet too. [22:57:18] heh, ok, then i won't [22:57:30] greg-g, ^ [22:57:55] So, lessons learned. no hot fixes in deployment and try not to combine puppet changes with code. [22:58:40] bblack, any tips for what the VCL code to force passing to nginx would look like? I'm thinking something like: if (req.url ~ "^/\.well-known/acme-challenge") { pass } [22:58:42] this one was hard because the code required config changes and puppet manages our config. [22:58:51] So they needed to happen at the same time. 
[22:58:58] Do we have a green light for SWAT or do you still need some time? [22:59:01] Or maybe if we're more clever we can avoid that. [23:00:04] RoanKattouw, ostriches, MaxSem, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160801T2300). Please do the needful. [23:00:04] RoanKattouw, Kaldari, and MaxSem: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:38] here [23:00:39] I can swat this evening (excepted if there is still something to do before). [23:00:51] yay [23:01:01] here [23:01:05] (03CR) 10Dzahn: [C: 031] "paladox tested on gerrit-test, just not restarting gerrit during deploy window:)" [puppet] - 10https://gerrit.wikimedia.org/r/301898 (owner: 10Chad) [23:01:06] here [23:02:18] (03PS2) 10Dereckson: Labs: Set CategoryCollation for dewiki to 'uca-de-u-kn' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301550 (https://phabricator.wikimedia.org/T128806) (owner: 10Raimond Spekking) [23:02:40] 06Operations, 10Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2513575 (10BBlack) Looks like buggy Microsoft updates (similar/related to the one(s) linked before) are the culprit. Some related reporting: http://www.infoworld.com/article/3099109/micro... 
[23:02:57] (03CR) 10Dereckson: [C: 032] "SWAT, labs only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301550 (https://phabricator.wikimedia.org/T128806) (owner: 10Raimond Spekking) [23:03:05] (03CR) 10Dzahn: [C: 031] "yup, this will go together with other change(s) to minimize the number of restarts" [puppet] - 10https://gerrit.wikimedia.org/r/301894 (owner: 10Chad) [23:03:23] (03Merged) 10jenkins-bot: Labs: Set CategoryCollation for dewiki to 'uca-de-u-kn' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301550 (https://phabricator.wikimedia.org/T128806) (owner: 10Raimond Spekking) [23:03:46] (03CR) 10Dzahn: [C: 031] Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [23:05:16] !log dereckson@tin Synchronized wmf-config/InitialiseSettings-labs.php: Labs: Set CategoryCollation for dewiki to 'uca-de-u-kn' (T128806) (duration: 00m 38s) [23:05:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:05:23] T128806: Switch German Wikipedia to uca-de category collation - https://phabricator.wikimedia.org/T128806 [23:06:09] (03PS3) 10Dereckson: Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) (owner: 10Catrope) [23:06:38] (03CR) 10jenkins-bot: [V: 04-1] Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) (owner: 10Catrope) [23:07:09] RoanKattouw: ^ [23:07:33] 23:06:20 PHP Parse error: syntax error, unexpected '=>' (T_DOUBLE_ARROW), expecting ']' in wmf-config/InitialiseSettings-labs.php on line 203 [23:07:52] (03PS2) 10Dereckson: Load Elastica via extension.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301856 (owner: 10MaxSem) [23:08:26] (03CR) 10Dereckson: [C: 032] 
"SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301856 (owner: 10MaxSem) [23:08:55] (03Merged) 10jenkins-bot: Load Elastica via extension.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301856 (owner: 10MaxSem) [23:09:38] mutante: on tin: error: insufficient permission for adding an object to repository database .git/objects [23:10:35] mutante: there are a lot of root in /srv/mediawiki-staging/.git/objects [23:10:36] Urgh [23:10:42] Looking [23:11:04] we just recently added an Icinga check for that [23:11:19] should have triggered when a root was doing stuff (i wasn't, never touch -staging ever) [23:11:21] (03PS4) 10Catrope: Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) [23:11:25] (03PS5) 10Catrope: Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) [23:11:36] Dereckson: Sorry, silly mistake. Fixed [23:11:43] * RoanKattouw loves the new edit-in-browser feature in Gerrit [23:11:53] .git/objects is still mwdeploy, it's some subdirectories which belong to root [23:12:11] That usually means someone ran git pull or some other git command as root [23:12:29] mutante: Could you do a recursive chown as root to fix that? [23:12:33] (03PS1) 10Dzahn: tcpircbot: allow connections from terbium [puppet] - 10https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) [23:12:41] I've done this before but I gave up root a couple years ago [23:12:45] Krenair: I meant the other way around. As in, VCL code should enforce redirects to HTTPS (like prod already does), and the nginx https server should support acme-challenge lookup on the FS directly (but will only work over HTTPS, not HTTP, since nginx never sees HTTP traffic. But that should work fine in theory if the redirect is there). 
[23:13:18] Krenair: but it's kind of going way off into a beta-production-split direction doing that kind of thing [23:15:06] !log root@tin:/srv/mediawiki-staging# find . -uid 0 -exec chown mwdeploy:wikidev {} \; [23:15:11] Oh I see [23:15:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:15:18] RoanKattouw: Dereckson: try again [23:15:34] also.. why did the alert not work :p [23:16:04] mutante: what exactly is watched by the alert? [23:16:13] that files are owned by root [23:16:35] .git was okay, it was a subdirectory lost in the hierarchy [23:17:01] this https://gerrit.wikimedia.org/r/#/c/301327/ [23:17:04] was supposed to tell us [23:17:09] MaxSem: live on mw1099 [23:17:18] but also i dont ever see a root actually doing stuff [23:17:53] 06Operations, 10Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2513661 (10BBlack) "KB 3161639" is an interesting search term. That's what's most-often cited in related bug reports. Some of the reports sound like this actually *is* related to the inte... [23:17:59] Dereckson, wfm [23:18:25] oh, the check is broken due to [23:18:28] NRPE: Unable to read output [23:18:37] well that explains it at least [23:18:50] gotta go, be back in a little [23:18:52] MaxSem: ack [23:21:36] halfak: I'm more interested in root cause [23:21:38] !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Load Elastica extension via extension.json ([[Gerrit:301856]]) (duration: 00m 31s) [23:21:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:21:51] halfak: I saw Amir1 say "you" to no one in particular after a long time [23:21:59] once there was an ops it was 2 minute [23:22:00] s [23:22:06] what were the previous 30 minutes doing? 
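Note on the ownership problem above: the logged fix was `find . -uid 0 -exec chown mwdeploy:wikidev {} \;`. The detection half of that command can be sketched in Python as follows. This is not the Icinga check from the linked change. The function name `paths_owned_by` is hypothetical, and the sketch only reports paths without changing ownership.

```python
import os
from typing import Iterator


def paths_owned_by(root: str, uid: int = 0) -> Iterator[str]:
    """Yield every path under root, including root itself, that is owned by uid."""
    if os.lstat(root).st_uid == uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports on a symlink itself rather than on its target.
            if os.lstat(path).st_uid == uid:
                yield path
```

Walking the whole tree matters here: as noted in the discussion, .git/objects itself was fine and only some subdirectories deeper in the hierarchy belonged to root.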
[23:22:18] (03CR) 10Chad: "Well that's dumb :\" [puppet] - 10https://gerrit.wikimedia.org/r/301829 (owner: 10Chad) [23:22:19] spent* [23:22:58] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) (owner: 10Catrope) [23:23:22] (03Merged) 10jenkins-bot: Add $wmgEchoMentionStatusNotifications and enable it in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302293 (https://phabricator.wikimedia.org/T135717) (owner: 10Catrope) [23:23:37] RoanKattouw: live on mw1099 [23:23:52] Dereckson: OK, checking that it is in fact a no-op there [23:23:52] halfak: please add a section/paragraph explaining what lessons are learned regarding time to revert [23:24:22] Dereckson: Yup, looking good [23:25:02] !log dereckson@tin Synchronized wmf-config/: Add $wmgEchoMentionStatusNotifications and enable it in beta labs (T135717, T139623) (duration: 00m 30s) [23:25:04] T139623: Create notification for successful mentions - https://phabricator.wikimedia.org/T139623 [23:25:04] T135717: Add mention failure notifications - https://phabricator.wikimedia.org/T135717 [23:25:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:25:25] 498 Notice: Undefined variable: wmgEchoMentionStatusNotifications in /srv/mediawiki/wmf-config/CommonSettings.php on line 2741 [23:25:39] (counting down) [23:26:20] RoanKattouw: I revert [23:26:34] OK [23:26:39] Checking where that came from [23:27:00] Dereckson: wtf how is that possible, look at InitialiseSettings.php [23:27:20] Are you sure those errors persist, or are they just a temporary spike while CommonSettings had synced but InitialiseSettings hadn't? 
[23:27:25] !log dereckson@tin Synchronized wmf-config/: Revert "Add $wmgEchoMentionStatusNotifications and enable it in beta labs" (T135717, T139623) (duration: 00m 27s) [23:27:27] T139623: Create notification for successful mentions - https://phabricator.wikimedia.org/T139623 [23:27:27] T135717: Add mention failure notifications - https://phabricator.wikimedia.org/T135717 [23:27:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:29:41] RoanKattouw: I'd say sync-dir is really not the way to sync [23:29:59] If you sync-file InitialiseSettings first and CommonsSettings second, I think that should be safe [23:30:27] Also I thought sync-* were supposed to have a waiting period during which errors are tracked? Or has that feature not been released yet? [23:30:49] (03CR) 10Chad: [C: 031] Rely on commits name instead of branch [puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [23:31:21] I was discussing the sync-dir vs. CS IS careful order issue with Reedy some days ago, it seems we should *really* avoid sync-dir. [23:31:39] (03CR) 10Paladox: "It seems this does not work" [puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [23:31:47] What's "CS IS"? [23:31:52] Oh, I see [23:31:58] CommonSettings, InitialiseSettings [23:32:01] yes [23:32:40] also, log collection seems slow to propagate [23:32:51] we've still notices now after the revert [23:33:32] (03Abandoned) 10Paladox: gerrit: support linking to a phabricator comment [puppet] - 10https://gerrit.wikimedia.org/r/301673 (https://phabricator.wikimedia.org/T76459) (owner: 10Paladox) [23:33:39] 06Operations, 06Discovery, 06Maps, 06WMF-Legal, 03Maps-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2513698 (10Slaporte) a:03Slaporte [23:33:59] RoanKattouw: I offer we wait some minutes fatalmonitor is quiet about the former notice error, then we sync IS first, then CS? 
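Note on the sync ordering above: the notice appeared because CommonSettings.php read a variable before the file defining it had arrived. The IS-then-CS order proposed in the discussion can be expressed as a simple sort. The Python sketch below is an illustration only and is not part of scap. The function name `sync_order` is hypothetical, and placing unrelated files between the two groups is an assumption.

```python
from typing import Iterable, List


def sync_order(changed_files: Iterable[str]) -> List[str]:
    """Order files so InitialiseSettings* goes first and CommonSettings* goes last."""
    def rank(path: str) -> int:
        base = path.rsplit("/", 1)[-1]
        if base.startswith("InitialiseSettings"):
            return 0  # defines the wmg* settings
        if base.startswith("CommonSettings"):
            return 2  # reads the wmg* settings
        return 1      # assumption: everything else goes in between
    # sorted() is stable, so files of equal rank keep their input order.
    return sorted(changed_files, key=rank)
```

Each file in the returned list would then be synced one at a time, so the definition is live before any code that depends on it.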
[23:34:08] Sure, sounds good [23:34:12] I'll be around for another hour or so [23:36:37] Am I the only one who gets zero results for the trending panel in logstash for the mw-errors and fatalmonitor dashboards? [23:36:57] ostriches: me too, been annoying all day [23:37:11] if you click edit I *think* it shows you what you should see, which is... odd [23:42:28] (03Draft2) 10Paladox: Testing [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302371 [23:42:42] (03Draft1) 10Paladox: Testing [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302371 [23:44:44] only semi-related, but I just noticed the GeoIP cookies are borked for w.wiki tii [23:44:47] *too [23:44:47] < Set-Cookie: GeoIP=:::::v6; Path=/; secure; Domain=.w.wiki [23:46:13] (03CR) 10Dzahn: "we just had an incident this should detect. noticed that we have currently "NRPE: Unable to read output" which makes it a WARN.. looking i" [puppet] - 10https://gerrit.wikimedia.org/r/301842 (owner: 10Chad) [23:49:00] (03PS3) 10Dzahn: Deploy masters: Improve icinga check for bad ownership [puppet] - 10https://gerrit.wikimedia.org/r/301842 (owner: 10Chad) [23:49:48] (03PS4) 10Paladox: Add gbp.conf file for debian [debs/gerrit] - 10https://gerrit.wikimedia.org/r/301841 [23:51:01] (03PS5) 10Paladox: Add gbp.conf file for debian [debs/gerrit] - 10https://gerrit.wikimedia.org/r/301841 [23:51:24] (03CR) 10Dzahn: [C: 032] Deploy masters: Improve icinga check for bad ownership [puppet] - 10https://gerrit.wikimedia.org/r/301842 (owner: 10Chad) [23:52:46] (03PS1) 10Dereckson: Revert "Add $wmgEchoMentionStatusNotifications and enable it in beta labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302373 [23:53:55] (03CR) 10Dereckson: [C: 032] "SWAT (already reverted at 23:27:25)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302373 (owner: 10Dereckson) [23:54:20] (03Merged) 10jenkins-bot: Revert "Add $wmgEchoMentionStatusNotifications and enable it in beta labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302373 
(owner: 10Dereckson)