[02:54:32] (03PS2) 10Brian Wolff: Customize article.html for better bidi support [puppet] - 10https://gerrit.wikimedia.org/r/543252 (https://phabricator.wikimedia.org/T235458) [03:02:15] 10Operations, 10Wikimedia-General-or-Unknown, 10Readers-Web-Backlog (Tracking), 10SEO: Yoruba Language Wikipedia not being indexed by search engines - https://phabricator.wikimedia.org/T236241 (10Krinkle) In my experience, Google's index of Wikipedia content tends to be updated near real-time, in the order... [03:02:23] 10Operations, 10Wikimedia-General-or-Unknown, 10Readers-Web-Backlog (Tracking), 10SEO: Yoruba Language Wikipedia not being indexed by search engines - https://phabricator.wikimedia.org/T236241 (10Krinkle) 05Resolved→03Open [03:33:19] 10Operations, 10Security-Team, 10Wikimedia-Site-requests, 10MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), 10Patch-For-Review: Enable csp-report-only mode everywhere - https://phabricator.wikimedia.org/T207900 (10Bawolff) >>! In T207900#4846582, @Krinkle wrote: >>>! @Bawolff wrote at 10Operations, 10Wikimedia-General-or-Unknown, 10Readers-Web-Backlog (Needs Product Owner Decisions), 10SEO: Yoruba Language Wikipedia not being indexed by search engines - https://phabricator.wikimedia.org/T236241 (10Jdlrobson) The fact some of that [[ https://yo.wikipedia.org/wiki/D%E1%BA%B9%CC%80j%E1%BB%... [04:00:19] 10Operations, 10Wikimedia-General-or-Unknown, 10serviceops, 10Performance-Team (Radar), 10Wikimedia-Incident: Investigate recurrent GET latency spikes on MediaWiki appservers (Oct 2019) - https://phabricator.wikimedia.org/T235872 (10Krinkle) 05Open→03Resolved Sounds good to me. 
[04:06:44] 10Operations, 10Research, 10serviceops: Request for a in-memory caching data set for caching research - https://phabricator.wikimedia.org/T240503 (10Krinkle) [04:08:00] 10Operations, 10Research, 10serviceops: Request for a in-memory caching data set for caching research - https://phabricator.wikimedia.org/T240503 (10Krinkle) This is not a bug or feature request about MediaWiki core's ability to cache data. Rather it appears to be a request for data gathering and publication... [04:16:05] (03PS2) 10Brian Wolff: Adjust CSP header for pdfs & videos & set enforce on testwiki [puppet] - 10https://gerrit.wikimedia.org/r/547929 (https://phabricator.wikimedia.org/T117618) [04:34:38] (03CR) 10Jdlrobson: [C: 03+1] Re-add localized Wikipedia wordmark for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557584 (https://phabricator.wikimedia.org/T233104) (owner: 10Ammarpad) [07:22:39] (03CR) 10Jcrespo: [C: 03+1] "Sorry I blocked this- the gtid issue is not such a thing- we have still spikes of lag* but after researching code, logs and production sta" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 (owner: 10Aaron Schulz) [07:23:32] (03PS5) 10Elukey: profile::analytics::cluster::packages::common: add libcrypto.so link [puppet] - 10https://gerrit.wikimedia.org/r/566062 (https://phabricator.wikimedia.org/T240934) [07:29:27] (03CR) 10Jcrespo: [C: 03+1] "I don't get this part." 
(031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 (owner: 10Aaron Schulz) [07:33:32] (03CR) 10Elukey: [C: 03+2] profile::analytics::cluster::packages::common: add libcrypto.so link [puppet] - 10https://gerrit.wikimedia.org/r/566062 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [08:29:13] (03PS3) 10Elukey: Set Spark2 encryption options as default for Hadoop [puppet] - 10https://gerrit.wikimedia.org/r/566231 (https://phabricator.wikimedia.org/T240934) [08:48:46] (03PS1) 10Ema: cache: collect varnish fd count everywhere [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) [08:58:26] 10Operations, 10Continuous-Integration-Infrastructure: Add python3.8 to buster-wikimedia pyall component - https://phabricator.wikimedia.org/T241195 (10Legoktm) [08:59:54] 10Operations, 10Continuous-Integration-Infrastructure: Add python3.8 to buster-wikimedia pyall component - https://phabricator.wikimedia.org/T241195 (10Legoktm) >>! In T241195#5758441, @faidon wrote: > - That said, I don't have any intentions to backport 3.8 to stretch. Seems reasonable. Retitled the task acc... 
[09:02:59] (03CR) 10Filippo Giunchedi: [C: 03+1] Switch authdns* to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566476 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [09:04:03] (03CR) 10Filippo Giunchedi: [C: 03+2] Scrape perception survey alerts from Grafana [puppet] - 10https://gerrit.wikimedia.org/r/568092 (https://phabricator.wikimedia.org/T243865) (owner: 10Dave Pifke) [09:10:40] (03PS4) 10Filippo Giunchedi: install_server: introduce raid0 standard partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/564959 (https://phabricator.wikimedia.org/T156955) [09:10:42] (03PS3) 10Filippo Giunchedi: install_server: move oresrdb and sessionstore to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566290 (https://phabricator.wikimedia.org/T156955) [09:10:44] (03PS3) 10Filippo Giunchedi: install_server: switch ms-fe to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566291 (https://phabricator.wikimedia.org/T156955) [09:10:46] (03PS3) 10Filippo Giunchedi: install_server: switch wtp/weblog to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566293 (https://phabricator.wikimedia.org/T156955) [09:11:18] (03PS4) 10Filippo Giunchedi: install_server: switch wtp/weblog to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566293 (https://phabricator.wikimedia.org/T156955) [09:12:32] (03CR) 10Filippo Giunchedi: [C: 03+2] "Thanks for the reviews!" 
[puppet] - 10https://gerrit.wikimedia.org/r/566293 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:14:15] (03CR) 10Muehlenhoff: [C: 03+1] install_server: introduce raid0 standard partman recipe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/564959 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:17:36] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: introduce raid0 standard partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/564959 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:23:48] (03PS1) 10Tarrow: Wikidata - enable TaintedRefs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569515 (https://phabricator.wikimedia.org/T241989) [09:25:08] 10Operations, 10Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (10ema) Thanks to netconsole (T242579) we finally managed to get the kernel oops of two upload@esams crashes. cp3051 crashing: ` Jan 26 21:20:27 ganeti3002 nc.openbsd[14771]: [3097828.536600] ------... [09:25:57] (03CR) 10Tarrow: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569515 (https://phabricator.wikimedia.org/T241989) (owner: 10Tarrow) [09:29:15] (03CR) 10Elukey: [C: 03+2] Set Spark2 encryption options as default for Hadoop [puppet] - 10https://gerrit.wikimedia.org/r/566231 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [09:41:05] 10Operations, 10Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (10ema) Source code taken from linux-source-4.9 4.9.189-3+deb9u2, the crash is at net/core/skbuff.c:1212 (see ema@boron.eqiad.wmnet:~/linux-source-4.9): ` 1185 /** 1186 * pskb_expand_head - realloca... 
[09:49:29] (03PS1) 10ArielGlenn: filter out 10wiki from dump stats early enough to make a difference [puppet] - 10https://gerrit.wikimedia.org/r/569518 [09:51:12] (03CR) 10ArielGlenn: [C: 03+2] filter out 10wiki from dump stats early enough to make a difference [puppet] - 10https://gerrit.wikimedia.org/r/569518 (owner: 10ArielGlenn) [10:05:11] 10Operations, 10Citoid, 10Core Platform Team Workboards (Clinic Duty Team): Citoid is logging all request / response headers as separate fields - https://phabricator.wikimedia.org/T239713 (10Mvolz) [10:05:32] 10Operations, 10Citoid, 10observability, 10serviceops: Alert on 0 zotero requests from zotero - https://phabricator.wikimedia.org/T234544 (10Mvolz) [10:08:57] !log installing sudo security updates on stretch [10:08:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:03] (03CR) 10Filippo Giunchedi: "nit on metric name, LGTM otherwise" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema) [10:14:08] (03CR) 10Ema: [C: 03+2] traffic-pool.service: replace nginx with ats-tls [puppet] - 10https://gerrit.wikimedia.org/r/567526 (https://phabricator.wikimedia.org/T231627) (owner: 10Ema) [10:24:08] !log temp disable puppet on cp hosts as precaution for https://gerrit.wikimedia.org/r/c/operations/puppet/+/563977 [10:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:23] (03CR) 10Filippo Giunchedi: [C: 03+2] varnish: use journald for varnishlog consumers [puppet] - 10https://gerrit.wikimedia.org/r/563977 (https://phabricator.wikimedia.org/T227108) (owner: 10Filippo Giunchedi) [10:24:37] (03PS7) 10Filippo Giunchedi: varnish: use journald for varnishlog consumers [puppet] - 10https://gerrit.wikimedia.org/r/563977 (https://phabricator.wikimedia.org/T227108) [10:25:49] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] varnish: use journald for varnishlog consumers [puppet] - 
10https://gerrit.wikimedia.org/r/563977 (https://phabricator.wikimedia.org/T227108) (owner: 10Filippo Giunchedi) [10:45:00] (03PS2) 10Ema: cache: collect varnish fd count everywhere [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) [10:45:14] (03CR) 10Ema: cache: collect varnish fd count everywhere (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema) [10:46:01] (03CR) 10Filippo Giunchedi: [C: 03+1] cache: collect varnish fd count everywhere [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema) [10:49:07] (03PS1) 10Filippo Giunchedi: Revert "varnish: use journald for varnishlog consumers" [puppet] - 10https://gerrit.wikimedia.org/r/569529 [10:49:30] (03CR) 10Filippo Giunchedi: [C: 03+2] Revert "varnish: use journald for varnishlog consumers" [puppet] - 10https://gerrit.wikimedia.org/r/569529 (owner: 10Filippo Giunchedi) [10:50:33] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] Revert "varnish: use journald for varnishlog consumers" [puppet] - 10https://gerrit.wikimedia.org/r/569529 (owner: 10Filippo Giunchedi) [10:53:38] 10Operations, 10Traffic, 10Wikimedia-Logstash, 10observability, and 2 others: Port varnishlog consumers to log to syslog / logging infra - https://phabricator.wikimedia.org/T227108 (10fgiunchedi) Had to revert in https://gerrit.wikimedia.org/r/c/operations/puppet/+/569529, at least two issues found: 1. jo... 
[10:55:04] (03PS1) 10Elukey: spark: remove spark.io.encryption settings from defaults [puppet] - 10https://gerrit.wikimedia.org/r/569530 (https://phabricator.wikimedia.org/T240934) [11:02:04] (03CR) 10Elukey: [C: 03+2] spark: remove spark.io.encryption settings from defaults [puppet] - 10https://gerrit.wikimedia.org/r/569530 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [11:04:31] (03PS1) 10Jcrespo: acct: Add 2 line cron patch to mitigate cronspam [puppet] - 10https://gerrit.wikimedia.org/r/569532 (https://phabricator.wikimedia.org/T167035) [11:06:11] (03PS2) 10Jcrespo: acct: Add 2 line cron patch to mitigate cronspam [puppet] - 10https://gerrit.wikimedia.org/r/569532 (https://phabricator.wikimedia.org/T167035) [11:10:03] (03CR) 10Jcrespo: "I know this is bad and not the right way, but the cron is bothering me, and Filippo reported it almost 3 years ago. I am ok if you don't l" [puppet] - 10https://gerrit.wikimedia.org/r/569532 (https://phabricator.wikimedia.org/T167035) (owner: 10Jcrespo) [11:14:36] (03PS1) 10Filippo Giunchedi: varnish: switch logs from syslog to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569533 (https://phabricator.wikimedia.org/T227108) [11:16:42] (03CR) 10jerkins-bot: [V: 04-1] varnish: switch logs from syslog to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569533 (https://phabricator.wikimedia.org/T227108) (owner: 10Filippo Giunchedi) [11:17:01] PROBLEM - Host cp3057 is DOWN: PING CRITICAL - Packet loss = 100% [11:24:53] (03PS2) 10Filippo Giunchedi: varnish: switch logs from syslog to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569533 (https://phabricator.wikimedia.org/T227108) [11:26:07] PROBLEM - DPKG on proton1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [11:27:15] PROBLEM - DPKG on mx1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [11:29:45] 
RECOVERY - DPKG on proton1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [11:30:04] jan_drewniak: Dear deployers, time to do the Wikimedia Portals Update deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200203T1130). [11:30:13] PROBLEM - rsyslog TLS listener on port 6514 on centrallog2001 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Logs [11:31:03] RECOVERY - rsyslog TLS listener on port 6514 on centrallog2001 is OK: SSL OK - Certificate centrallog2001.codfw.wmnet valid until 2024-11-16 16:04:24 +0000 (expires in 1748 days) https://wikitech.wikimedia.org/wiki/Logs [11:31:54] 10Operations, 10ops-esams, 10Traffic: cp3057 network down - https://phabricator.wikimedia.org/T244127 (10jcrespo) [11:32:40] 10Operations, 10ops-esams, 10Traffic, 10netops: cp3057 network down - https://phabricator.wikimedia.org/T244127 (10jcrespo) [11:32:45] RECOVERY - DPKG on mx1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [11:38:01] !log powercycle cp3057 T244127 T238305 [11:38:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:05] T238305: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 [11:38:06] T244127: cp3057 network down - https://phabricator.wikimedia.org/T244127 [11:41:49] RECOVERY - Host cp3057 is UP: PING OK - Packet loss = 0%, RTA = 83.36 ms [11:50:42] 10Operations, 10Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (10ema) [11:50:55] 10Operations, 10ops-esams, 10Traffic, 10netops: cp3057 network down - https://phabricator.wikimedia.org/T244127 (10ema) p:05Triage→03Normal [11:51:36] jouncebot: next [11:51:36] In 0 hour(s) and 8 minute(s): European Mid-day SWAT(Max 6 patches) 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200203T1200) [11:53:10] jouncebot: refresh [11:53:11] I refreshed my knowledge about deployments. [11:55:00] that’s a lot of changes in the calendar 🤨️ [11:55:33] 10Operations, 10ops-esams, 10Traffic, 10netops: cp3057 network down - https://phabricator.wikimedia.org/T244127 (10ema) The host went down at 11:17 according to icinga, and the following warning was reported a little earlier to netconsole. Unfortunately, we currently cannot tell which host sent which messa... [11:56:03] hi Lucas_WMDE :) [11:56:30] hi:) [12:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: Time to snap out of that daydream and deploy European Mid-day SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200203T1200). [12:00:05] Zoranzoki21, revi, tarrow, and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:06] hi [12:00:10] I can SWAT today! [12:00:57] 10Operations, 10ops-esams, 10Traffic, 10netops: cp3057 network down - https://phabricator.wikimedia.org/T244127 (10jcrespo) +1, there were icinga errors as early as 11:15: ` [2020-02-03 11:15:57] SERVICE ALERT: cp3057;Webrequests Varnishkafka log producer;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket ti... [12:01:50] o/ I'm here for SWAT. Seemingly fell off IRC at some point [12:02:02] hi tarrow [12:02:06] hey!
[12:02:30] doh doh doh [12:03:02] (03CR) 10Revi: "(I'm taking the SWAT)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569267 (https://phabricator.wikimedia.org/T244022) (owner: 10Majavah) [12:03:04] (03PS1) 10Urbanecm: Remove $wgImgAuthDetails=true; [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569536 [12:03:24] (03CR) 10Urbanecm: [C: 03+2] Remove $wgImgAuthDetails=true; [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569536 (owner: 10Urbanecm) [12:03:26] (03CR) 10Urbanecm: [C: 03+2] Add wgImportSources for hiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569267 (https://phabricator.wikimedia.org/T244022) (owner: 10Majavah) [12:03:54] mobileapps-codfw soft-flopping again, like last week [12:04:24] (03Merged) 10jenkins-bot: Remove $wgImgAuthDetails=true; [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569536 (owner: 10Urbanecm) [12:04:32] (03Merged) 10jenkins-bot: Add wgImportSources for hiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569267 (https://phabricator.wikimedia.org/T244022) (owner: 10Majavah) [12:04:35] jynus: should I stop SWATting? [12:04:57] Urbanecm: unless you are deploying movile apps services, no [12:05:08] *mobile [12:05:12] jynus: ack, thanks [12:05:39] revi: pulled onto mwdebug1001 [12:06:17] ack [12:06:23] maybe deployments could impact performance???? 
but I am not seeing significant impact/strong correlation on mw side [12:07:20] !log urbanecm@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: Remove $wgImgAuthDetails=true (T153459) (duration: 01m 36s) [12:07:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:34] jynus: ^^ is my first sync today [12:07:43] Urbanecm: so definitely unrelated [12:07:50] Urbanecm: +2 [12:08:04] revi: thanks [12:08:10] has been happening on an off for days anyway [12:08:35] revi: syncing [12:08:46] (abused my stew bit but who cares when I didn't commit anything :P) [12:08:56] (03PS2) 10Urbanecm: Wikidata - enable TaintedRefs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569515 (https://phabricator.wikimedia.org/T241989) (owner: 10Tarrow) [12:09:10] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569515 (https://phabricator.wikimedia.org/T241989) (owner: 10Tarrow) [12:09:20] thanks! [12:09:21] tarrow: +2'ed your patch, will ping you once it's ready to test [12:09:33] you'll do the deploy? [12:09:33] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 0c0ef87: Add wgImportSources for hiwikibooks (T244022) (duration: 01m 05s) [12:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:37] T244022: Enabling Transwiki Import on hi.wikibooks - https://phabricator.wikimedia.org/T244022 [12:09:42] coolio [12:09:51] tarrow: yes (unless you really want to do it yourself :)) [12:09:59] I guess I am done? /me goes to Starbucks to refill his last tea of the day [12:10:00] nah, I'm good [12:10:07] (03Merged) 10jenkins-bot: Wikidata - enable TaintedRefs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569515 (https://phabricator.wikimedia.org/T241989) (owner: 10Tarrow) [12:10:13] revi: yup [12:10:14] enjoy your tea [12:10:16] cya [12:10:40] tarrow: pulled onto mwdebug1001, please test and let me know [12:11:17] thanks! 
[12:11:20] just checking [12:12:20] change is as expected for me on wikidata.org [12:12:55] thanks tarrow [12:12:55] Hi, sorry for being late [12:13:07] Who works on today's SWAT? [12:13:16] * Urbanecm is [12:13:21] Cool [12:13:26] akosiaris: Hi. Got a minute, please? [12:13:59] I was in an ambulance for 5-10 minutes [12:14:25] what happened to you Zoranzoki21 ? [12:14:32] emergency medical services [12:14:36] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 6b497e7: Wikidata - enable TaintedRefs (T241989) (duration: 01m 06s) [12:14:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:40] T241989: Enable Tainted References on www.wikidata.org - https://phabricator.wikimedia.org/T241989 [12:14:47] 10Operations, 10ops-esams, 10Traffic: cp3057 network down - https://phabricator.wikimedia.org/T244127 (10ayounsi) [12:14:57] tarrow: deployed [12:15:01] thanks! [12:15:01] Ahh stomach and etc... [12:15:11] (03PS2) 10Urbanecm: Assign editautopatrolprotected to hewiki patrollers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567240 (https://phabricator.wikimedia.org/T243665) [12:15:15] yw tarrow [12:15:20] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567240 (https://phabricator.wikimedia.org/T243665) (owner: 10Urbanecm) [12:15:41] (03PS7) 10Urbanecm: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565723 (https://phabricator.wikimedia.org/T243118) (owner: 10Zoranzoki21) [12:15:47] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565723 (https://phabricator.wikimedia.org/T243118) (owner: 10Zoranzoki21) [12:16:20] (03Merged) 10jenkins-bot: Assign editautopatrolprotected to hewiki patrollers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567240 (https://phabricator.wikimedia.org/T243665) (owner: 10Urbanecm) [12:17:38] 10Operations, 10ops-esams, 10Traffic: cp3057 crash (was: network 
down) - https://phabricator.wikimedia.org/T244127 (10jcrespo) [12:19:08] 10Operations, 10ops-esams, 10Traffic: cp3057 crash (was: network down) - https://phabricator.wikimedia.org/T244127 (10jcrespo) [12:20:01] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 6c48af8: Assign editautopatrolprotected to hewiki patrollers (T243665) (duration: 01m 06s) [12:20:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:20:05] T243665: Hewiki: Allow patroller group to edit autopatrol protect level - https://phabricator.wikimedia.org/T243665 [12:21:30] (03PS8) 10Urbanecm: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565723 (https://phabricator.wikimedia.org/T243118) (owner: 10Zoranzoki21) [12:21:37] (03CR) 10Urbanecm: [C: 03+2] Add vzg-easydb.gbv.de to the wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565723 (https://phabricator.wikimedia.org/T243118) (owner: 10Zoranzoki21) [12:22:02] Cool, you can deploy wgCopyUploadsDomains patch without mwdebug [12:22:30] I am at home, let me log in from laptop [12:22:33] (03Merged) 10jenkins-bot: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565723 (https://phabricator.wikimedia.org/T243118) (owner: 10Zoranzoki21) [12:23:03] sure [12:24:55] I'm on laptop now :) [12:24:57] (03PS6) 10Urbanecm: Re-add localized Wikipedia wordmark for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557584 (https://phabricator.wikimedia.org/T233104) (owner: 10Ammarpad) [12:25:00] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 32e0356: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains (T243118) (duration: 01m 07s) [12:25:02] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557584 (https://phabricator.wikimedia.org/T233104) (owner: 10Ammarpad) [12:25:03] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:25:04] T243118: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T243118 [12:25:29] (03PS4) 10Urbanecm: Add wordmark for etwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565549 (https://phabricator.wikimedia.org/T230379) (owner: 10Pikne) [12:25:32] hi Zoranzoki21 [12:25:36] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565549 (https://phabricator.wikimedia.org/T230379) (owner: 10Pikne) [12:26:03] (03Merged) 10jenkins-bot: Re-add localized Wikipedia wordmark for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557584 (https://phabricator.wikimedia.org/T233104) (owner: 10Ammarpad) [12:26:38] Zoranzoki21: I'm syncing the png/svgs now [12:26:38] Related to: "If it's okay with Zoranzoki21 and time allows could you also deploy the following similar changes (jdlrobson (talk) 04:38, 3 February 2020 (UTC))?:" [12:26:40] (03Merged) 10jenkins-bot: Add wordmark for etwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565549 (https://phabricator.wikimedia.org/T230379) (owner: 10Pikne) [12:26:42] Ok [12:29:12] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/: SWAT: 76e67cd: e266e25: Add static wordmarks for szlwiki and etwiki (T233104, T230379) (duration: 01m 06s) [12:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:16] T230379: Estonian Wikipedia does not have a SVG wordmark - https://phabricator.wikimedia.org/T230379 [12:29:17] T233104: Add localized Wikipedia wordmark to the Silesian (szl) mobile frontend - https://phabricator.wikimedia.org/T233104 [12:30:37] Zoranzoki21: and now the IS.php... 
[12:30:41] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/: SWAT: 76e67cd: e266e25: Add wordmarks for szlwiki and etwiki (T233104, T230379) (duration: 01m 06s) [12:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:07] cool [12:31:20] Zoranzoki21: could you check and update the tasks, please? [12:31:59] !log Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg (T233104) [12:32:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:13] Which all? [12:33:33] See the task numbers I mentioned in SAL [12:33:43] (03PS14) 10Urbanecm: Add minerva custom log for la.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557439 (https://phabricator.wikimedia.org/T240728) (owner: 10Ammarpad) [12:34:05] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557439 (https://phabricator.wikimedia.org/T240728) (owner: 10Ammarpad) [12:35:05] (03Merged) 10jenkins-bot: Add minerva custom log for la.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557439 (https://phabricator.wikimedia.org/T240728) (owner: 10Ammarpad) [12:35:35] !log installing openjpeg2 security updates [12:35:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:05] Zoranzoki21: 557439 is going now [12:37:52] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/: SWAT: 5f13c19: Add minerva custom log for la.wiki (T240728; 1/2) (duration: 01m 06s) [12:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:55] T240728: Create and use the Latin wikipedia (VICIPÆDIA) wordmark on mobile site - https://phabricator.wikimedia.org/T240728 [12:38:01] Urbanecm: Cool, I have +3 patches more [12:38:26] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/562025/ and +2 from https://phabricator.wikimedia.org/T243509 [12:38:32] Can you deploy it too pls? 
[12:38:37] Zoranzoki21: could you add them to the calendar please? [12:38:42] Yes, sure [12:40:05] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 5f13c19: Add minerva custom log for la.wiki (T240728; 2/2) (duration: 01m 06s) [12:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:40:25] Added [12:40:50] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562025 (https://phabricator.wikimedia.org/T241888) (owner: 10Ammarpad) [12:40:53] thanks [12:41:46] (03Merged) 10jenkins-bot: Disable MobileFrontend Mainpage special casing on frwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562025 (https://phabricator.wikimedia.org/T241888) (owner: 10Ammarpad) [12:42:22] Zoranzoki21: could you test https://gerrit.wikimedia.org/r/562025 at mwdebug1001, please? [12:42:55] No needs i [12:42:57] *it [12:43:46] ack [12:45:58] !log urbanecm@deploy1001 Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: e9387b2: Disable MobileFrontend Mainpage special casing on frwiktionary (T241888) (duration: 01m 05s) [12:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:02] T241888: Disabling MFSpecialCaseMainPage on French Wiktionary - https://phabricator.wikimedia.org/T241888 [12:47:16] Zoranzoki21: could you verify it now? [12:47:33] (03CR) 10Urbanecm: [C: 03+2] Update logo for zh_classical wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567307 (https://phabricator.wikimedia.org/T243509) (owner: 10Ammarpad) [12:48:22] (03CR) 10Urbanecm: [C: 04-1] "Could you ask them to configure extension on-wiki please? 
https://www.mediawiki.org/wiki/Extension:NewUserMessage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567306 (https://phabricator.wikimedia.org/T243509) (owner: 10Ammarpad) [12:48:23] good to go [12:48:31] Zoranzoki21: ^^ [12:48:33] (03Merged) 10jenkins-bot: Update logo for zh_classical wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567307 (https://phabricator.wikimedia.org/T243509) (owner: 10Ammarpad) [12:48:37] NewUserMessage is not configured on-wiki yet [12:49:49] ok, let's ignore that patch and work on 567307 [12:50:04] I'm syncing on that one [12:50:10] just asking you to communicate that on-task [12:51:30] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: af0b745: Update logo for zh_classical wiki (T243509) (duration: 01m 06s) [12:51:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:34] T243509: Site requests for zh-classical wikipedia - https://phabricator.wikimedia.org/T243509 [12:51:34] Done [12:52:06] All ok, I think we're done with European Mid-Day SWAT today :) [12:52:33] !log Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki*.png (T243509) [12:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:45] !log Morning SWAT done [12:52:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:55] Morning? [12:53:30] !Log Previous message should be "EU SWAT done" [12:53:32] thanks [12:53:37] !log Previous message should be "EU SWAT done" [12:53:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:47] yw [12:53:53] "it is always morning somewhere" :-D [12:54:14] so never technically wrong :-D [12:54:15] yes, right [12:54:17] I think it’s morning in NYC right now? 
^^ [12:54:37] still a bit early for the west coast though [12:55:14] hehe [12:55:21] :D [12:57:48] !log installing spamassassin security updates [12:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:47] !log deactivate v6 BGP to AS25596 [12:58:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:33] 10Operations, 10observability, 10Patch-For-Review: log spam from mtail 3.0.0~rc19 on wezen - https://phabricator.wikimedia.org/T225604 (10MoritzMuehlenhoff) [13:15:48] revi: thanks for cleaning up the wikitech spam [13:16:13] say hi to our LTA :P [13:16:27] /joke [13:20:15] (03PS1) 10Effie Mouzeli: (WIP) mcrouter: add gutter pool servers in configuration [puppet] - 10https://gerrit.wikimedia.org/r/569541 (https://phabricator.wikimedia.org/T213089) [13:22:19] (03CR) 10jerkins-bot: [V: 04-1] (WIP) mcrouter: add gutter pool servers in configuration [puppet] - 10https://gerrit.wikimedia.org/r/569541 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli) [13:28:52] (03CR) 10Filippo Giunchedi: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/569532 (https://phabricator.wikimedia.org/T167035) (owner: 10Jcrespo) [13:31:32] !log rebooting ganeti1009 - ganeti1022 to pick up microcode update T228924 [13:31:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:36] T228924: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 [13:32:11] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [13:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:20] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:32:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:28] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [13:32:29] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:32:30] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:35] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1009 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [13:38:55] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:43:56] zhwiki, although it looks like a larger spike than usual [13:44:03] will keep monitoring it [13:46:33] (03PS3) 10Filippo Giunchedi: varnish: switch logs from syslog to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569533 (https://phabricator.wikimedia.org/T227108) [13:48:13] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:50:04] (03CR) 10Filippo Giunchedi: "Tested on cp5012 and DTRT, please take a look!" 
[puppet] - 10https://gerrit.wikimedia.org/r/569533 (https://phabricator.wikimedia.org/T227108) (owner: 10Filippo Giunchedi) [13:52:20] (03CR) 10Filippo Giunchedi: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/566303 (owner: 10Filippo Giunchedi) [13:58:05] !log installing libidn2 security updates [13:58:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:10] 10Operations, 10SRE-Access-Requests: Requesting access to deployment for niedzielski - https://phabricator.wikimedia.org/T243924 (10jijiki) @Niedzielski please check the boxes of the actions which are already done, so we know where we stand, @MarkTraceur please approve this request. Thank you! [14:05:41] 10Operations, 10SRE-Access-Requests: Requesting access to deployment for niedzielski - https://phabricator.wikimedia.org/T243924 (10Niedzielski) [14:08:06] (03PS1) 10Muehlenhoff: Add library hint for libidn2 [puppet] - 10https://gerrit.wikimedia.org/r/569547 [14:12:09] (03CR) 10Muehlenhoff: [C: 03+2] Add library hint for libidn2 [puppet] - 10https://gerrit.wikimedia.org/r/569547 (owner: 10Muehlenhoff) [14:18:09] (03CR) 10Jcrespo: [C: 04-1] "> * add the fix to the 'toil' module since that's essentially what this is" [puppet] - 10https://gerrit.wikimedia.org/r/569532 (https://phabricator.wikimedia.org/T167035) (owner: 10Jcrespo) [14:18:20] !log T243634 ✔️ cdanis@cp4031.ulsfo.wmnet ~ 🕤☕ sudo varnish-frontend-restart [14:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:25] T243634: ulsfo varinsh-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 [14:19:44] (03CR) 10CDanis: [C: 03+1] cache: collect varnish fd count everywhere (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema) [14:21:17] !log restarting slapd on ldap-corp* to pick up libidn2 security updates [14:21:18] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:19] RECOVERY - Maps tiles generation on icinga1001 is OK: OK: Less than 90.00% under the threshold [10.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1 [14:30:24] (03PS3) 10Ema: cache: collect varnish fd count everywhere [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) [14:32:59] (03CR) 10Ema: cache: collect varnish fd count everywhere (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema) [14:35:30] (03PS4) 10Ema: cache: collect varnish fd count everywhere [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) [14:41:11] (03CR) 10Ema: [C: 03+2] cache: collect varnish fd count everywhere [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema) [14:41:20] (03CR) 10Ema: [C: 03+2] cache: collect varnish fd count everywhere [puppet] - 10https://gerrit.wikimedia.org/r/569513 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema) [14:44:02] !log restarting apache on an-tool*. cloudmetrics*, logstash*, grafana1002 to pick up libidn security update [14:44:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:20] !log restarting superset on an-tool1004/1005 to pick up libidn security update [14:55:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:21] !log restarting exim on phab* to pick up libidn security update [14:59:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:35] (03CR) 10Andrew Bogott: [C: 03+1] "This sounds good to me! One request -- can you add 'on VMs' to the top-level patch title? 
Like, "refactor hiera for cloud VMs' or someth" [puppet] - 10https://gerrit.wikimedia.org/r/569230 (https://phabricator.wikimedia.org/T229441) (owner: 10Arturo Borrero Gonzalez) [15:02:07] (03CR) 10Andrew Bogott: [C: 03+2] openstack: Add Python 3 support to wmcs-region-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565799 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:02:54] (03CR) 10Andrew Bogott: [C: 03+2] openstack: Add Python 3 support to wmcs-region-migrate-security-groups [puppet] - 10https://gerrit.wikimedia.org/r/565798 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:03:07] (03CR) 10Andrew Bogott: [C: 03+2] openstack: Add Python 3 support to wmcs-region-migrate-quotas [puppet] - 10https://gerrit.wikimedia.org/r/565797 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:04:18] (03CR) 10Andrew Bogott: [C: 03+2] openstack: Add Python 3 support to wmcs-makedomain [puppet] - 10https://gerrit.wikimedia.org/r/565796 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:04:50] (03CR) 10Andrew Bogott: [C: 03+2] openstack: Add Python 3 support to wmcs-live-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565795 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:05:21] (03CR) 10Andrew Bogott: [C: 03+2] openstack: Add Python 3 support to wmcs-cold-nova-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565794 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:06:03] (03CR) 10Andrew Bogott: [C: 03+2] openstack: Add Python 3 support to wmcs-cold-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565793 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:07:14] (03PS2) 10Andrew Bogott: openstack: Add Python 3 support to wmcs-cold-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565793 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:08:13] (03PS1) 10Muehlenhoff: Add Icinga check for SMTP on Phabricator [puppet] - 
10https://gerrit.wikimedia.org/r/569563 [15:08:17] (03PS2) 10Andrew Bogott: openstack: Add Python 3 support to wmcs-cold-nova-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565794 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:08:29] (03PS2) 10Andrew Bogott: openstack: Add Python 3 support to wmcs-live-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565795 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:08:49] (03PS2) 10Andrew Bogott: openstack: Add Python 3 support to wmcs-makedomain [puppet] - 10https://gerrit.wikimedia.org/r/565796 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:09:05] (03PS2) 10Andrew Bogott: openstack: Add Python 3 support to wmcs-region-migrate-quotas [puppet] - 10https://gerrit.wikimedia.org/r/565797 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:09:17] (03PS2) 10Andrew Bogott: openstack: Add Python 3 support to wmcs-region-migrate-security-groups [puppet] - 10https://gerrit.wikimedia.org/r/565798 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:09:29] (03PS2) 10Andrew Bogott: openstack: Add Python 3 support to wmcs-region-migrate [puppet] - 10https://gerrit.wikimedia.org/r/565799 (https://phabricator.wikimedia.org/T229920) (owner: 10Legoktm) [15:17:47] (03CR) 10Ema: [C: 03+1] varnish: switch logs from syslog to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569533 (https://phabricator.wikimedia.org/T227108) (owner: 10Filippo Giunchedi) [15:21:06] 10Operations: ProdPasteBot uses deprecated certificate auth - https://phabricator.wikimedia.org/T242857 (10Dsharpe) Is there any update on this one? I ask because it is blocking "Update WMF run bots using certificate auth (Phaste Bot and bzimport) to use token auth" action item in incident https://docs.google.... 
[15:22:50] (03PS1) 10Filippo Giunchedi: wip: cassandra logs to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569564 (https://phabricator.wikimedia.org/T213899) [15:23:18] (03CR) 10Filippo Giunchedi: [C: 03+2] varnish: switch logs from syslog to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569533 (https://phabricator.wikimedia.org/T227108) (owner: 10Filippo Giunchedi) [15:30:08] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for nfacctd [puppet] - 10https://gerrit.wikimedia.org/r/569567 (https://phabricator.wikimedia.org/T135991) [15:33:07] (03PS1) 10Andrew Bogott: Openstack scripts: use keystoneauth1.session instead of keystoneclient.session [puppet] - 10https://gerrit.wikimedia.org/r/569568 [15:33:43] 10Operations, 10netops: BFD session alerts due to inconsistent status on cr3-knams - https://phabricator.wikimedia.org/T240659 (10ayounsi) Now that the issue is on the cr1-eqiad to cr3-knams link, I'm going to push the following: `lang=diff,name=cr3-knams [edit system syslog] file messages { ... } + fi... 
[15:33:54] !log add debug on eqiad-knams link interfaces - T240659 [15:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:34:00] T240659: BFD session alerts due to inconsistent status on cr3-knams - https://phabricator.wikimedia.org/T240659 [15:34:09] 10Operations, 10Wikimedia-Logstash, 10observability, 10Patch-For-Review, 10User-herron: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs - https://phabricator.wikimedia.org/T213899 (10fgiunchedi) [15:36:33] (03CR) 10Andrew Bogott: [C: 03+2] Openstack scripts: use keystoneauth1.session instead of keystoneclient.session [puppet] - 10https://gerrit.wikimedia.org/r/569568 (owner: 10Andrew Bogott) [15:38:30] !log rollback: add debug on eqiad-knams link interfaces - T240659 [15:38:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:02] (03CR) 10Ammarpad: "Thanks, I was not aware the extension is not even installed there." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567306 (https://phabricator.wikimedia.org/T243509) (owner: 10Ammarpad) [15:41:45] (03CR) 10Ammarpad: "Oh, sorry. I confused the onwiki config with installation. I got it now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/567306 (https://phabricator.wikimedia.org/T243509) (owner: 10Ammarpad) [15:47:11] PROBLEM - Host mc2035.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:49:48] (03CR) 10Ammarpad: "> Could you ask them to configure extension on-wiki please?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/567306 (https://phabricator.wikimedia.org/T243509) (owner: 10Ammarpad) [15:50:05] (03PS2) 10Filippo Giunchedi: wip: cassandra logs to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569564 (https://phabricator.wikimedia.org/T213899) [15:50:07] (03PS1) 10Filippo Giunchedi: cassandra: use wmflib::secret for binary files [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T213899) [15:54:15] (03CR) 10Filippo Giunchedi: "This fixes the compilation issues for me:" [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T213899) (owner: 10Filippo Giunchedi) [15:58:49] (03CR) 10Ayounsi: [C: 03+1] "I have no idea how it works, but it's fine to restart the process occasionally anytime." [puppet] - 10https://gerrit.wikimedia.org/r/569567 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:59:13] RECOVERY - Host mc2035.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.69 ms [16:02:35] PROBLEM - Host ms-be2029.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:02:45] PROBLEM - Host ms-be2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:08:56] godog: is this expected or should we do something ^ [16:08:58] ? 
[16:10:38] effie: not expected, since it is mgmt I'm assuming onsite work though cc papaul [16:11:25] oh cool right [16:14:19] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [16:20:53] RECOVERY - Host ms-be2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.67 ms [16:21:42] (03PS2) 10Eevans: Configure remainder of testwikis group for kask-transition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565696 (https://phabricator.wikimedia.org/T243106) [16:25:19] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [16:26:17] PROBLEM - Host ms-be2031.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:26:51] RECOVERY - Host ms-be2029.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.65 ms [16:28:36] 10Operations: Remove mobrovac@wikimedia.org from techcom@wikimedia.org - https://phabricator.wikimedia.org/T244146 (10kchapman) [16:28:59] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [16:31:05] (03PS2) 10Arturo Borrero Gonzalez: cloud: hiera: puppetmaster: refactor hiera (for VM instances) [puppet] - 10https://gerrit.wikimedia.org/r/569230 (https://phabricator.wikimedia.org/T229441) [16:35:16] (03PS1) 10Eevans: Upgrade staging to Kask v1.0.6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/569575 (https://phabricator.wikimedia.org/T243106) [16:35:37] RECOVERY - Host ms-be2030.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.72 ms [16:36:39] (03CR) 10Eevans: [V: 03+2 C: 03+2] Upgrade staging to Kask v1.0.6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/569575 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans) [16:38:23] PROBLEM - Host ms-be2032.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:38:40] !log eevans@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' . [16:38:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:25] RECOVERY - Host ms-be2031.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.71 ms [16:45:26] (03PS1) 10Eevans: Upgrade sessionstore production to Kask v1.0.6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/569577 (https://phabricator.wikimedia.org/T243106) [16:46:45] (03CR) 10Eevans: [V: 03+2 C: 03+2] Upgrade sessionstore production to Kask v1.0.6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/569577 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans) [16:48:00] !log eevans@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' . 
[16:48:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:51:35] PROBLEM - Host ms-be2033.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:51:35] PROBLEM - Host ms-be2034.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:52:19] !log eevans@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' . [16:52:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:32] <_joe_> urandom: you don't need to V+2 in that repo [16:55:58] <_joe_> urandom: CI is running there and can save you from the most obvious mistakes [16:56:31] RECOVERY - Host ms-be2032.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.67 ms [16:56:58] _joe_: auh, OK [16:57:05] _joe_: has that always been the case? [16:57:41] <_joe_> urandom: nope [16:57:57] Ok [16:58:03] <_joe_> that's why I'm telling you, we introduced it in december IIRC [16:58:11] awesome [16:59:11] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Add Itamar Givon to the ldap/wmde group - https://phabricator.wikimedia.org/T244148 (10ItamarWMDE) [16:59:27] PROBLEM - Host ms-be2035.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:03:37] RECOVERY - Host ms-be2033.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.72 ms [17:04:25] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:05:23] PROBLEM - Host ms-be2036.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:06:15] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:07:25] PROBLEM - Host blog.wikimedia.org is DOWN: /bin/ping -n -U -w 15 -c 5 blog.wikimedia.org [17:08:55] RECOVERY - Host blog.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms [17:09:43] RECOVERY - Host ms-be2034.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.78 ms [17:11:25] PROBLEM - Host ms-be2037.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:11:29] RECOVERY - Host ms-be2035.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.74 ms [17:13:05] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Add Itamar Givon to the ldap/wmde group - https://phabricator.wikimedia.org/T244148 (10RStallman-legalteam) Hello @ItamarWMDE, I can create the NDA for you and send it to you for electronic signature. I couldn't find your email address on the WMDE staff page... [17:13:53] PROBLEM - Host ms-be2038.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:14:55] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Add Itamar Givon to the ldap/wmde group - https://phabricator.wikimedia.org/T244148 (10WMDE-leszek) As an Engineering Manager at WMDE I endorse this request, and confirm @ItamarWMDE is who he claims. @RStallman-legalteam could you please send the NDA to sign... 
[17:15:14] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Add Itamar Givon to the ldap/wmde group - https://phabricator.wikimedia.org/T244148 (10WMDE-leszek) [17:17:21] PROBLEM - Host ms-be2039.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:19:05] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:23:33] RECOVERY - Host ms-be2036.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.67 ms [17:24:37] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:29:21] RECOVERY - Host ms-be2039.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.68 ms [17:29:35] RECOVERY - Host ms-be2037.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.68 ms [17:29:49] (03PS1) 10Effie Mouzeli: hieradata: put memcached gutter hosts in cluster memcached_gutter [puppet] - 10https://gerrit.wikimedia.org/r/569579 (https://phabricator.wikimedia.org/T240684) [17:32:21] (03PS10) 10ArielGlenn: write out and reuse pagerange info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434) [17:34:00] 10Operations, 10observability, 10Patch-For-Review: log spam from mtail 3.0.0~rc19 on wezen - https://phabricator.wikimedia.org/T225604 (10colewhite) @MoritzMuehlenhoff doing that shouldn't hurt anything AFAIK.
[17:38:05] RECOVERY - Host ms-be2038.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.69 ms [17:41:05] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [17:41:37] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [17:45:38] hello folks. as the train is still blocked on T243548, i'm planning to see if i can usefully reproduce on a mwdebug box and will be updating wikiversions there accordingly. [17:45:39] T243548: Elevated response times and CPU usage after deploy of 1.35.0-wmf.16 to all wikis - https://phabricator.wikimedia.org/T243548 [17:46:18] i don't see anything listed under the upcoming wikidata query service window, but please ping me if i'm standing in the way of anything. 
[17:52:05] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:55:07] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [17:56:17] ^ looks like a spike of timeouts on zhwiki [17:58:49] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [17:59:27] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [18:00:05] gehel and onimisionipe: My dear minions, it's time we take the moon! Just kidding. Time for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200203T1800).
[18:02:29] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [18:03:31] (03CR) 10Jeena Huneidi: "> Patch Set 10:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/545421 (https://phabricator.wikimedia.org/T228910) (owner: 10Jeena Huneidi) [18:03:48] (03Abandoned) 10Jeena Huneidi: Modify Restrouter chart to allow for minikube development [deployment-charts] - 10https://gerrit.wikimedia.org/r/545421 (https://phabricator.wikimedia.org/T228910) (owner: 10Jeena Huneidi) [18:09:26] (03CR) 10Giuseppe Lavagetto: "> Patch Set 10:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/545421 (https://phabricator.wikimedia.org/T228910) (owner: 10Jeena Huneidi) [18:12:40] (03CR) 10Jeena Huneidi: "> Patch Set 10:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/545421 (https://phabricator.wikimedia.org/T228910) (owner: 10Jeena Huneidi) [18:12:46] (03PS1) 10Mholloway: Update wikifeeds to 2020-02-03-180007-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/569585 [18:15:16] (03CR) 10Mholloway: [C: 03+2] Update wikifeeds to 2020-02-03-180007-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/569585 (owner: 10Mholloway) [18:15:33] (03Merged) 10jenkins-bot: Update wikifeeds to 2020-02-03-180007-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/569585 (owner: 10Mholloway) [18:17:53] !log mholloway-shell@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' . 
[18:17:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:57] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [18:22:39] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [18:23:22] !log mholloway-shell@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' . [18:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:43] !log mholloway-shell@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' . [18:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:17] !log edited /srv/mediawiki-staging/wikiversions.json on deploy1001; scap pull and scap wikiversions-compile on mwdebug1002; revert wikiversions changes on deploy1001. [18:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:26] !log doc1001: chown -R nobody:wikidev /srv/docroot [18:44:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:32] ah.. there is a ticket for that by the way [18:47:38] doc1001 permission problems https://phabricator.wikimedia.org/T237707 [18:48:07] thanks bblack.
that once looked resolved but is reopened [18:56:17] !doc1001 sudo -u doc-uploader chmod g+w /srv/docroot/org/wikimedia/doc (the 'manual fixed' from https://gerrit.wikimedia.org/r/c/operations/puppet/+/484304) [18:57:41] < bblack> !log doc1001: chown -R nobody:wikidev /srv/docroot | < mutante> !doc1001 sudo -u doc-uploader chmod g+w /srv/docroot/org/wikimedia/doc | https://gerrit.wikimedia.org/r/c/operations/puppet/+/484304 | (T237707) [18:57:42] T237707: doc1001 permission problems for doc.wikimedia.org deploy - https://phabricator.wikimedia.org/T237707 [18:58:12] !log < bblack> !log doc1001: chown -R nobody:wikidev /srv/docroot | < mutante> !doc1001 sudo -u doc-uploader chmod g+w /srv/docroot/org/wikimedia/doc | https://gerrit.wikimedia.org/r/c/operations/puppet/+/484304 | (T237707) [18:58:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:05] RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200203T1900). [19:00:05] urandom and Pikne: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:20] o/ [19:00:23] I can SWAT today! [19:00:34] please note i've still got mwdebug1002 pointed at wmf.16 for all wikis. [19:00:43] brennen: should I wait? [19:01:11] Urbanecm: go ahead; i'll hold any further activity until after SWAT. 
[19:01:17] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [19:01:17] brennen: thanks [19:01:25] (03CR) 10Dzahn: [C: 04-2] "per hashar and https://phabricator.wikimedia.org/T236675" [puppet] - 10https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn) [19:01:31] Pikne: you around? [19:01:35] Hi [19:01:45] (just note that wikiversions.json is using wmf.16 for everything on mwdebug1002, which does not reflect reality.) [19:01:55] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [19:02:01] (03CR) 10Urbanecm: [C: 03+2] Configure remainder of testwikis group for kask-transition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565696 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans) [19:02:21] brennen: taking that into account :). Thanks for letting me know. [19:02:27] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [19:02:42] sure thing. 
[19:03:00] (03Merged) 10jenkins-bot: Configure remainder of testwikis group for kask-transition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565696 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans) [19:03:07] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569310 (owner: 10Pikne) [19:03:39] urandom: pulled onto mwdebug1001, let me know if I can deploy [19:04:01] Urbanecm: checking [19:04:05] (03Merged) 10jenkins-bot: Add gcr, mnw and szy to InterwikiSortOrders [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569310 (owner: 10Pikne) [19:04:36] thanks [19:07:49] Urbanecm: looks good [19:07:57] urandom: syncing [19:08:28] ty! [19:08:32] yw [19:09:44] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 7bb6a12: Configure remainder of testwikis group for kask-transition (T243106) (duration: 01m 14s) [19:09:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:47] T243106: Phased rollout of sessionstore to production fleet - https://phabricator.wikimedia.org/T243106 [19:10:21] Pikne: pulled onto mwdebug1001, not sure if it's possible to test it [19:10:36] I'll see [19:11:01] Pikne: see https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug if you're looking for docs [19:13:16] I've used the addon for a couple of times before [19:13:27] okay, wasn't sure if you know about that :) [19:13:45] But regretfully I'm not getting the desired result at the moment [19:14:12] (03PS1) 10Eevans: Configure remainder of testwikis group for kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569609 (https://phabricator.wikimedia.org/T243106) [19:14:20] E.g. 
on https://et.wikipedia.org/wiki/Esileht last interwiki link shouldn't be "Sakizaya" [19:15:26] I'm trying to pull once more [19:18:34] I believe this is some kind of a cache, I'll sync anyway - code seems to be good [19:19:42] !log doc1001 - chown -R doc-uploader:doc-uploader /srv/docroot ; temp. disabled puppet (T237707) [19:19:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:46] T237707: doc1001 permission problems for doc.wikimedia.org deploy - https://phabricator.wikimedia.org/T237707 [19:20:11] !log urbanecm@deploy1001 Synchronized wmf-config/InterwikiSortOrders.php: SWAT: 7b53a52: Add gcr, mnw and szy to InterwikiSortOrders (duration: 01m 11s) [19:20:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:21:07] !log Morning SWAT done [19:21:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:21:16] brennen: I'm done [19:21:25] Ah, indeed, null edit on page did the trick, look good now [19:21:28] Thanks! [19:22:00] Wonderful! [19:22:01] Yw [19:22:30] (03PS2) 10Jforrester: [officewiki] Enable VisualEditor desktop section editing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566867 [19:22:49] brennen: Can I sneak out ^ for a bit of fun? [19:23:52] James_F: by all means [19:23:58] Awesome. [19:24:02] (03CR) 10Jforrester: [C: 03+2] [officewiki] Enable VisualEditor desktop section editing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566867 (owner: 10Jforrester) [19:25:01] (03Merged) 10jenkins-bot: [officewiki] Enable VisualEditor desktop section editing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566867 (owner: 10Jforrester) [19:27:31] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [officewiki] Enable VisualEditor desktop section editing (duration: 01m 07s) [19:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:55] brennen: Clear. [19:32:38] James_F: thx. 
[19:33:47] (03PS1) 10Dzahn: doc: use doc-uploader group for docroot privs, stop using shared=>true [puppet] - 10https://gerrit.wikimedia.org/r/569620 (https://phabricator.wikimedia.org/T237707) [19:36:09] (03PS2) 10Dzahn: doc: use doc-uploader group for docroot privs, stop using shared=>true [puppet] - 10https://gerrit.wikimedia.org/r/569620 (https://phabricator.wikimedia.org/T237707) [19:38:41] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:39:12] (03CR) 10Dzahn: "As the existing ticket says "There is always some permission problem or another." and it struck again today. This attempt would be to make" [puppet] - 10https://gerrit.wikimedia.org/r/569620 (https://phabricator.wikimedia.org/T237707) (owner: 10Dzahn) [19:40:54] (03CR) 10Jforrester: [C: 03+1] doc: use doc-uploader group for docroot privs, stop using shared=>true [puppet] - 10https://gerrit.wikimedia.org/r/569620 (https://phabricator.wikimedia.org/T237707) (owner: 10Dzahn) [19:41:37] brennen: did you manage to figure out whats going on with the cpu times at all :/ ? [19:42:23] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:43:13] addshore: Nope. [19:43:23] is it still on mwdebug1001? [19:48:41] addshore: not a clue. [19:49:21] addshore: sorry, to the last question - i'm working on mwdebug1002. 
[19:49:43] ack, i was gonna stare at some profiles for a few min see if i spot anything [19:50:16] i say "working", but what i really mean is this is the first time i've tried to profile anything like this in production and i'm pretty much lost. [19:50:23] any input welcome. :) [19:50:43] :D [19:50:54] i'm trying to figure out how this compare thingy works in xhgui [19:51:48] (03CR) 10Aaron Schulz: Use GTIDs for master position queries for external DB when possible (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 (owner: 10Aaron Schulz) [19:51:51] (03PS4) 10Aaron Schulz: Use GTIDs for master position queries for external DB when possible [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 [19:52:47] (03PS1) 10Zoranzoki21: Add .webm in files.viewable-mime-types of Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T244162) [19:53:00] (03PS5) 10Aaron Schulz: Avoid code duplication in 'lagDetectionMethod' settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 [19:53:40] (03PS2) 10Zoranzoki21: Add .webm in files.viewable-mime-types of Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T244162) [20:06:27] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] "tested on cr3-knams" [homer/public] - 10https://gerrit.wikimedia.org/r/562505 (https://phabricator.wikimedia.org/T243482) (owner: 10Ayounsi) [20:08:40] 10Operations, 10ops-codfw, 10fundraising-tech-ops: codfw: rack/setup/install 3 new payments server for frack - https://phabricator.wikimedia.org/T244169 (10Papaul) [20:08:51] 10Operations, 10ops-codfw, 10fundraising-tech-ops: codfw: rack/setup/install 3 new payments server for frack - https://phabricator.wikimedia.org/T244169 (10Papaul) p:05Triage→03Normal [20:09:12] (03CR) 10Dzahn: [C: 03+2] "since i already ran the command to let doc-uploader own the files i'll merge it to reflect reality and enable puppet again.
but please do " [puppet] - 10https://gerrit.wikimedia.org/r/569620 (https://phabricator.wikimedia.org/T237707) (owner: 10Dzahn) [20:12:53] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [20:13:56] !log doc1001 - re-enabled puppet after merging gerrit:569620 - Git::Clone[integration/docroot]/File[/srv/docroot]/mode: mode changed '2775' to '0755' - Profile::Doc/File[/srv/docroot/org/wikimedia/doc]/group: group changed 'doc-uploader' to 'wikidev', mode changed '0775' to '0755'. needs another follow-up (T237707) [20:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:02] T237707: doc1001 permission problems for doc.wikimedia.org deploy - https://phabricator.wikimedia.org/T237707 [20:18:25] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [20:19:24] !log remove test flowspec rule from cr3-knams [20:19:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:58] !log reactivate L3 only LB in esams/knams [20:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:03] (03PS1) 10Ayounsi: Add option to clamp TCP-MSS [homer/public] - 10https://gerrit.wikimedia.org/r/569636 [20:28:51] 10Operations, 10ops-codfw, 
10fundraising-tech-ops: codfw: rack/setup/install 3 new payments server for frack - https://phabricator.wikimedia.org/T244169 (10Papaul) [20:30:48] (03PS1) 10Dzahn: doc: stop using wikidev group, use doc-uploader group [puppet] - 10https://gerrit.wikimedia.org/r/569637 (https://phabricator.wikimedia.org/T237707) [20:35:05] (03PS1) 10Ayounsi: Add option to prepend our AS# to peers [homer/public] - 10https://gerrit.wikimedia.org/r/569639 [20:35:07] (03PS1) 10Ayounsi: Add outbound flowspec support [homer/public] - 10https://gerrit.wikimedia.org/r/569640 (https://phabricator.wikimedia.org/T243482) [20:36:43] going to cherry-pick https://gerrit.wikimedia.org/r/c/mediawiki/core/+/565155 to wmf.16 to see if that affects T243548 [20:36:43] T243548: Elevated response times and CPU usage after deploy of 1.35.0-wmf.16 to all wikis - https://phabricator.wikimedia.org/T243548 [20:37:50] (03CR) 10Jforrester: [C: 03+1] doc: stop using wikidev group, use doc-uploader group [puppet] - 10https://gerrit.wikimedia.org/r/569637 (https://phabricator.wikimedia.org/T237707) (owner: 10Dzahn) [20:39:29] (03CR) 10Dzahn: "Thanks for uploading. Though I think it needs some discussion (ticket, mailing list, wiki, ..) 
before we can just enable embedding videos " [puppet] - 10https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T244162) (owner: 10Zoranzoki21) [20:39:43] (03PS1) 10BryanDavis: nf-mounts: expose the dumps mount in the recommendation-api project [puppet] - 10https://gerrit.wikimedia.org/r/569642 (https://phabricator.wikimedia.org/T244166) [20:40:14] (03CR) 10Dzahn: [C: 03+2] doc: stop using wikidev group, use doc-uploader group [puppet] - 10https://gerrit.wikimedia.org/r/569637 (https://phabricator.wikimedia.org/T237707) (owner: 10Dzahn) [20:40:22] (03PS2) 10Dzahn: doc: stop using wikidev group, use doc-uploader group [puppet] - 10https://gerrit.wikimedia.org/r/569637 (https://phabricator.wikimedia.org/T237707) [20:43:09] !log doc1001 - sudo chown -R doc-uploader:doc-uploader /srv/docroot/ [20:43:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:46:43] 10Operations, 10Traffic, 10Inuka-Team (Kanban), 10MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), 10Performance-Team (Radar): Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029 (10nshahquinn-wmf) Thanks for testing on Beta cluster, @SBisson! I see two server log entries here so... 
[20:53:21] (03CR) 10Aaron Schulz: Avoid code duplication in 'lagDetectionMethod' settings (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 (owner: 10Aaron Schulz) [20:55:05] (03PS6) 10Aaron Schulz: Use GTIDs for "wait for replica" barriers for external DB clusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 [20:55:20] (03PS7) 10Aaron Schulz: Use GTIDs for "wait for replica" barriers for external DB clusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/525147 [20:55:22] (03CR) 10Ppchelko: [C: 03+1] Configure remainder of testwikis group for kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569609 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans) [20:56:01] 10Operations, 10Traffic, 10Inuka-Team (Kanban), 10MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), 10Performance-Team (Radar): Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029 (10nshahquinn-wmf) [21:00:04] cscott, arlolra, subbu, halfak, and accraze: Time to snap out of that daydream and deploy Services – Graphoid / Parsoid / Citoid / ORES. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200203T2100). [21:00:15] Doing an ORES deployment [21:01:24] !log halfak@deploy1001 Started deploy [ores/deploy@50a101a]: T243451 [21:01:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:01:30] T243451: Deploy ORES -- Late Jan 2020 - https://phabricator.wikimedia.org/T243451 [21:06:43] halfak: mind giving me a ping when clear? going to give wmf.16 another shot. [21:07:06] (03PS2) 10Bstorm: nf-mounts: expose the dumps mount in the recommendation-api project [puppet] - 10https://gerrit.wikimedia.org/r/569642 (https://phabricator.wikimedia.org/T244166) (owner: 10BryanDavis) [21:07:16] RoanKattouw: that revert is on mwdebug1002 and things do not seem to explode when loading pages there. 
[21:07:27] (03PS3) 10Bstorm: nfs-mounts: expose the dumps mount in the recommendation-api project [puppet] - 10https://gerrit.wikimedia.org/r/569642 (https://phabricator.wikimedia.org/T244166) (owner: 10BryanDavis) [21:08:24] brennen: :D [21:10:20] (03CR) 10Bstorm: [C: 03+2] nfs-mounts: expose the dumps mount in the recommendation-api project [puppet] - 10https://gerrit.wikimedia.org/r/569642 (https://phabricator.wikimedia.org/T244166) (owner: 10BryanDavis) [21:11:35] 10Operations, 10Performance-Team, 10serviceops, 10Wikimedia-production-error: Page takes over 15s to load: https://en.wikipedia.org/w/index.php?title=European_Union&type=revision&diff=938561921&oldid=938557616 - https://phabricator.wikimedia.org/T244058 (10Gilles) a:03aaron [21:14:09] !log halfak@deploy1001 Finished deploy [ores/deploy@50a101a]: T243451 (duration: 12m 47s) [21:14:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:13] T243451: Deploy ORES -- Late Jan 2020 - https://phabricator.wikimedia.org/T243451 [21:16:28] Looks like we're out of the woods for ORES. [21:16:30] All done. [21:16:36] cool, thx. [21:20:17] going ahead with wmf.16 -> all wikis. 
[21:22:22] * addshore watches [21:28:31] !log brennen@deploy1001 Synchronized php-1.35.0-wmf.16/includes/TemplateParser.php: Syncing https://gerrit.wikimedia.org/r/c/mediawiki/core/+/569643 for T243548 (duration: 01m 08s) [21:28:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:28:35] T243548: Elevated response times and CPU usage after deploy of 1.35.0-wmf.16 to all wikis - https://phabricator.wikimedia.org/T243548 [21:29:12] (03PS1) 10Brennen Bearnes: all wikis to 1.35.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569652 [21:29:14] (03CR) 10Brennen Bearnes: [C: 03+2] all wikis to 1.35.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569652 (owner: 10Brennen Bearnes) [21:30:01] (03CR) 10Dzahn: Add Icinga check for SMTP on Phabricator (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569563 (owner: 10Muehlenhoff) [21:30:24] (03Merged) 10jenkins-bot: all wikis to 1.35.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569652 (owner: 10Brennen Bearnes) [21:31:24] (03CR) 10Dzahn: Add Icinga check for SMTP on Phabricator (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569563 (owner: 10Muehlenhoff) [21:31:34] (03PS2) 10Dzahn: Add Icinga check for SMTP on Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/569563 (owner: 10Muehlenhoff) [21:31:45] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [21:31:53] (03CR) 10Dzahn: [C: 03+2] Add Icinga check for SMTP on Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/569563 (owner: 10Muehlenhoff) [21:32:55] i guess its not that then [21:33:01] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16 [21:33:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:33:25] welp. [21:35:25] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method= [21:35:34] rolling back. [21:39:41] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.35.0-wmf.15" [21:39:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:39:45] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [21:41:37] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [21:42:44] (03PS1) 10Brennen Bearnes: Revert "all wikis to 1.35.0-wmf.16" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569654 [21:42:46] (03CR) 10Brennen Bearnes: [C: 03+2] Revert "all wikis to 1.35.0-wmf.16" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569654 (owner: 10Brennen Bearnes) [21:42:56] @_@ [21:44:09] (03Merged) 10jenkins-bot: Revert "all wikis to 1.35.0-wmf.16" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569654 (owner: 10Brennen Bearnes) [21:45:36] (03PS11) 10ArielGlenn: write out and reuse pagerage info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434) [21:46:27] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [21:48:23] 10Operations, 10ops-codfw: codfw: rack/setup/install parse200[1-20].codfw.wmnet - https://phabricator.wikimedia.org/T243112 (10Dzahn) Cool, thanks. 
[21:48:46] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [21:50:29] (03CR) 10Dzahn: "https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=phab1001&service=Phabricator+SMTP works!" [puppet] - 10https://gerrit.wikimedia.org/r/569563 (owner: 10Muehlenhoff) [21:50:56] things are now status quo as far as deployments are concerned except that https://gerrit.wikimedia.org/r/c/mediawiki/core/+/569643 is on wmf.16 now. is something else up unrelated to deploy activity? [21:53:00] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [21:53:40] stuff still looks busy [21:55:18] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [21:56:04] it looks stable, i don't quite understand the alarm [21:58:50] yea, the graph is back to like it was before the deploy [21:59:31] 10Operations, 10Performance-Team, 10SRE-Access-Requests: Requesting access
to deployment for dpifke - https://phabricator.wikimedia.org/T244183 (10dpifke) [22:00:04] Reedy and sbassett: (Dis)respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200203T2200). Please do the needful. [22:00:20] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [22:05:28] !log ganeti1010 - rebooting host to clear microcode mitigations CPU alert [22:05:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:10:38] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1010 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:13:31] !log rebooting ganeti1010, ganeti1011 and other new ganeti machines to pickup microcode mitigations, for some reason the previous reboots did not do it. 
rescheduled service check on icinga for ganeti1010 and now it recovered (T228924) [22:13:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:35] T228924: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 [22:14:33] !log andrew@deploy1001 Started deploy [horizon/deploy@8bffc7d]: Fix for T243355 [22:14:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:36] T243355: puppet panel: Can't add new prefixes - https://phabricator.wikimedia.org/T243355 [22:16:14] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1011 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:18:01] !log andrew@deploy1001 Finished deploy [horizon/deploy@8bffc7d]: Fix for T243355 (duration: 03m 29s) [22:18:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:18:05] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1012 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:20:17] PROBLEM - Host ganeti1013 is DOWN: PING CRITICAL - Packet loss = 100% [22:21:23] PROBLEM - Host ganeti1014 is DOWN: PING CRITICAL - Packet loss = 100% [22:21:27] RECOVERY - Host ganeti1013 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [22:21:33] RECOVERY - Host ganeti1014 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [22:22:33] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1014 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:22:39] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1013 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:24:49] PROBLEM - Host ganeti1015 is DOWN: PING CRITICAL - Packet loss = 100% [22:25:19] RECOVERY - Host ganeti1015 is 
UP: PING OK - Packet loss = 0%, RTA = 0.35 ms [22:28:33] (03CR) 10Zoranzoki21: "> Thanks for uploading. Though I think it needs some discussion" [puppet] - 10https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T244162) (owner: 10Zoranzoki21) [22:30:21] (03PS12) 10ArielGlenn: write out and reuse pagerage info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434) [22:31:37] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1015 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:31:43] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1016 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:32:10] 10Operations, 10LDAP-Access-Requests: Request LDAP access to the WMF group for Edna M - https://phabricator.wikimedia.org/T244176 (10Reedy) [22:32:12] 10Operations, 10LDAP-Access-Requests: Request LDAP access to the WMF group for Edna M - https://phabricator.wikimedia.org/T244176 (10Reedy) [22:33:02] (03CR) 10Jforrester: [C: 03+1] Support MPEG-1 and MPEG-2 video files with .mpg or .mpeg extension [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/569341 (https://phabricator.wikimedia.org/T166024) (owner: 10Brion VIBBER) [22:35:13] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1019 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:37:01] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1021 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:37:32] PROBLEM - Host ganeti1018 is DOWN: PING CRITICAL - Packet loss = 100% [22:37:32] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [22:37:33] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1020 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:38:07] RECOVERY - Host ganeti1018 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [22:39:01] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1018 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:40:07] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1022 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [22:40:40] ganeti host reboots done - this time it did recover those microcode alerts [22:57:36] ah cool [23:01:26] 10Operations, 10Gerrit, 10Release-Engineering-Team-TODO, 10serviceops, 10Release-Engineering-Team (Development services): Deploy multi-site plugin to gerrit1001 and gerrit2001 - https://phabricator.wikimedia.org/T217174 (10Dzahn) [23:02:17] 10Operations, 10ops-eqiad, 10vm-requests: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (10Dzahn) For some reason the previous reboots did not fix it but the second attempt did it. The microcode alerts are recovered now after rebooting hosts. [23:02:48] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team Workboards (Clinic Duty Team): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) a:05Dzahn→03hnowlan Hey Hugh, per chat at allhands. Can you test an Icinga command? 
[23:10:39] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team Workboards (Clinic Duty Team): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) Regarding the GPG key i see it on the keyserver but it has no new signatures yet. Looks like we... [23:12:31] !log removing AS15542 from esams [23:12:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:18:14] all right, after some discussion, one more shot for the day at taking wmf.16 -> all wikis. [23:21:41] (03PS1) 10Brennen Bearnes: all wikis to 1.35.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569675 [23:21:43] (03CR) 10Brennen Bearnes: [C: 03+2] all wikis to 1.35.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569675 (owner: 10Brennen Bearnes) [23:21:49] !log gerrit1002 - deleting gerrit.log and gerrit.json files from January to free about 4GB of space (T239151 T243983) [23:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:21:54] T239151: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 [23:21:54] T243983: Add second virtual hard disk to ganeti gerrit test instance - https://phabricator.wikimedia.org/T243983 [23:22:43] (03Merged) 10jenkins-bot: all wikis to 1.35.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569675 (owner: 10Brennen Bearnes) [23:24:52] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16 [23:24:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:32] !log ganeti1003 - sudo gnt-instance modify --disk add:size=10G gerrit1002.wikimedia.org (T239151 T243983) [23:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:34:13] !log rebooting gerrit1002 (test VM) [23:34:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:33] 10Operations, 10ops-eqsin, 
10Traffic: rack/setup/install ps[12]-60[34]-eqsin - https://phabricator.wikimedia.org/T242250 (10RobH) Please note this has been confirmed as likely to occur on Feb 6th (GMT). Jin has approved that he can work during that window, and we need to get confirmation from @bblack that t... [23:46:01] 10Operations, 10Parsoid-PHP, 10serviceops, 10User-brennen, 10Wikimedia-production-error: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (10brennen) [23:59:16] (03PS2) 10Jforrester: Configure remainder of testwikis group for kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569609 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans)