[00:45:34] (CR) Krinkle: Update authmanager-statsd channel names (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/572502 (owner: Gergő Tisza)
[00:55:10] (CR) Krinkle: Increase Commons linkpurge rate limit for patrollers (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/572339 (https://phabricator.wikimedia.org/T245214) (owner: Gergő Tisza)
[01:21:08] (PS2) Gergő Tisza: Update authmanager-statsd channel names [mediawiki-config] - https://gerrit.wikimedia.org/r/572502
[01:21:31] (CR) Gergő Tisza: Update authmanager-statsd channel names (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/572502 (owner: Gergő Tisza)
[04:04:14] (CR) Thcipriani: [C: +1] admin: change matrix.php column "grp" to "groups" [puppet] - https://gerrit.wikimedia.org/r/556281 (owner: Krinkle)
[04:12:08] (PS1) Andrew Bogott: Keystone: switch from UUID tokens to fernet tokens [puppet] - https://gerrit.wikimedia.org/r/572507 (https://phabricator.wikimedia.org/T243418)
[04:14:42] (PS2) Andrew Bogott: Keystone: switch from UUID tokens to fernet tokens [puppet] - https://gerrit.wikimedia.org/r/572507 (https://phabricator.wikimedia.org/T243418)
[04:41:26] (CR) Brian Wolff: "Note, there are other things going to the capcha channel (they use wfDebugLog), will that still be ok with this?" [mediawiki-config] - https://gerrit.wikimedia.org/r/572502 (owner: Gergő Tisza)
[06:16:49] (CR) Vgutierrez: [C: +2] ATS: Disable DNS resolution on ats-tls [puppet] - https://gerrit.wikimedia.org/r/571270 (https://phabricator.wikimedia.org/T244538) (owner: Vgutierrez)
[06:20:46] (PS1) Marostegui: wmnet: Promote dbproxy1015 to m2-master [dns] - https://gerrit.wikimedia.org/r/572512 (https://phabricator.wikimedia.org/T202367)
[06:21:58] (CR) Marostegui: [C: +2] wmnet: Promote dbproxy1015 to m2-master [dns] - https://gerrit.wikimedia.org/r/572512 (https://phabricator.wikimedia.org/T202367) (owner: Marostegui)
[06:32:35] (PS1) Marostegui: dbproxy100[2,7]: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/572514 (https://phabricator.wikimedia.org/T245384)
[06:37:07] (CR) Gergő Tisza: "> Note, there are other things going to the capcha channel (they use wfDebugLog), will that still be ok with this?" [mediawiki-config] - https://gerrit.wikimedia.org/r/572502 (owner: Gergő Tisza)
[06:43:31] (PS4) Gergő Tisza: Increase Commons linkpurge rate limit for patrollers [mediawiki-config] - https://gerrit.wikimedia.org/r/572339 (https://phabricator.wikimedia.org/T245214)
[06:45:01] (CR) Gergő Tisza: Increase Commons linkpurge rate limit for patrollers (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/572339 (https://phabricator.wikimedia.org/T245214) (owner: Gergő Tisza)
[06:46:48] (CR) Marostegui: [C: +2] dbproxy100[2,7]: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/572514 (https://phabricator.wikimedia.org/T245384) (owner: Marostegui)
[06:51:44] (PS1) Vgutierrez: ATS: Extend KA experiment between ats-tls and varnish-fe to all ulsfo [puppet] - https://gerrit.wikimedia.org/r/572515 (https://phabricator.wikimedia.org/T244464)
[06:52:57] (CR) Giuseppe Lavagetto: [C: +1] "I like this patch a lot, and sorry for being late in reviewing it!" (3 comments) [software/httpbb] - https://gerrit.wikimedia.org/r/567147 (owner: RLazarus)
[06:54:36] (CR) Giuseppe Lavagetto: discoverydns: integrate into servicecatalog (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572214 (owner: Giuseppe Lavagetto)
[06:59:01] (PS7) Giuseppe Lavagetto: discoverydns: integrate into servicecatalog [puppet] - https://gerrit.wikimedia.org/r/572214
[07:02:41] (PS8) Giuseppe Lavagetto: discoverydns: integrate into servicecatalog [puppet] - https://gerrit.wikimedia.org/r/572214
[07:02:57] (PS2) Vgutierrez: ATS: Extend KA experiment between ats-tls and varnish-fe to all ulsfo [puppet] - https://gerrit.wikimedia.org/r/572515 (https://phabricator.wikimedia.org/T244464)
[07:09:50] (PS3) Vgutierrez: ATS: Extend KA experiment between ats-tls and varnish-fe to all ulsfo [puppet] - https://gerrit.wikimedia.org/r/572515 (https://phabricator.wikimedia.org/T244464)
[07:13:57] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/20820/" [puppet] - https://gerrit.wikimedia.org/r/572214 (owner: Giuseppe Lavagetto)
[07:22:06] !log Stop haproxy on dbproxy1002 - T245384
[07:22:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:10] T245384: decommission dbproxy1002.eqiad.wmnet - https://phabricator.wikimedia.org/T245384
[07:23:09] (PS3) Giuseppe Lavagetto: profile::configmaster: use wmflib::service functions [puppet] - https://gerrit.wikimedia.org/r/570070
[07:24:37] (PS4) Vgutierrez: ATS: Extend KA experiment between ats-tls and varnish-fe to all ulsfo [puppet] - https://gerrit.wikimedia.org/r/572515 (https://phabricator.wikimedia.org/T244464)
[07:26:44] (CR) Vgutierrez: "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1001/20822/" [puppet] - https://gerrit.wikimedia.org/r/572515 (https://phabricator.wikimedia.org/T244464) (owner: Vgutierrez)
[07:30:27] (PS4) Giuseppe Lavagetto: profile::configmaster: use wmflib::service functions [puppet] - https://gerrit.wikimedia.org/r/570070
[07:32:31] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/20823/puppetmaster1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/570070 (owner: Giuseppe Lavagetto)
[07:35:32] (PS3) Giuseppe Lavagetto: profile::lvs::realserver: use wmflib::service::fetch [puppet] - https://gerrit.wikimedia.org/r/570071
[07:47:43] (CR) Giuseppe Lavagetto: [C: -1] httpd: Add a new LogFormat, wmfjson (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572252 (owner: Alexandros Kosiaris)
[07:50:42] (PS4) Giuseppe Lavagetto: profile::lvs::realserver: use wmflib::service::fetch [puppet] - https://gerrit.wikimedia.org/r/570071
[07:57:14] (CR) Giuseppe Lavagetto: [C: +2] profile::lvs::realserver: use wmflib::service::fetch [puppet] - https://gerrit.wikimedia.org/r/570071 (owner: Giuseppe Lavagetto)
[07:57:37] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/20826/" [puppet] - https://gerrit.wikimedia.org/r/570071 (owner: Giuseppe Lavagetto)
[08:00:04] Deploy window NO DEPLOYS (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200217T0800)
[08:21:49] (PS22) ArielGlenn: write out and reuse pagerange info for big page content jobs [dumps] - https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434)
[08:41:54] (Restored) Kosta Harlan: Echo: Enable poll for updates feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222) (owner: Kosta Harlan)
[08:42:16] (PS4) Kosta Harlan: Echo: Enable poll for updates feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222)
[08:45:58] (PS5) Kosta Harlan: Echo: Enable poll for updates feature on testwiki and mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222)
[08:47:27] (PS2) Muehlenhoff: Switch component-pyall to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/563474
[08:54:34] (CR) Muehlenhoff: [C: +2] Switch component-pyall to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/563474 (owner: Muehlenhoff)
[09:01:33] (PS2) Muehlenhoff: profile::url_downloader: Add types and switch to lookup() [puppet] - https://gerrit.wikimedia.org/r/562472
[09:06:09] !log +50G to prometheus/ops fs on prometheus eqiad - T245361
[09:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:14] T245361: prometheus1003/prometheus1004 /srv/prometheus/ops disk space warning - https://phabricator.wikimedia.org/T245361
[09:09:58] !log +10G to prometheus/ops fs on prometheus eqiad - T245361
[09:10:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:05] !log correction, +100G
[09:10:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:50] Operations, observability: prometheus1003/prometheus1004 /srv/prometheus/ops disk space warning - https://phabricator.wikimedia.org/T245361 (fgiunchedi) Thanks @Marostegui @Volans ! Indeed the space used grew because of longer retention, I added 150G to the LVs (last log is wrong, it is 100G) which shoul...
[09:16:13] (PS3) Muehlenhoff: profile::url_downloader: Add types and switch to lookup() [puppet] - https://gerrit.wikimedia.org/r/562472
[09:19:22] (CR) jerkins-bot: [V: -1] profile::url_downloader: Add types and switch to lookup() [puppet] - https://gerrit.wikimedia.org/r/562472 (owner: Muehlenhoff)
[09:21:21] Operations, observability: prometheus1003/prometheus1004 /srv/prometheus/ops disk space warning - https://phabricator.wikimedia.org/T245361 (fgiunchedi) a:fgiunchedi I'll take this and resolve once space has stabilized again
[09:21:24] (PS4) Muehlenhoff: profile::url_downloader: Add types and switch to lookup() [puppet] - https://gerrit.wikimedia.org/r/562472
[09:21:34] Operations, observability: prometheus1003/prometheus1004 /srv/prometheus/ops disk space warning - https://phabricator.wikimedia.org/T245361 (fgiunchedi) p:High→Medium
[09:25:49] (Abandoned) Filippo Giunchedi: Switch mw/mwmaint to standard Partman recipes [puppet] - https://gerrit.wikimedia.org/r/572211 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[09:31:31] Operations, Core Platform Team, MediaWiki-API, Pywikibot: WMFTimeoutException on non-existent files - https://phabricator.wikimedia.org/T245374 (Xqt) I am missing the error traceback and log entry in front of the wait() statement in line 1767. The Code is ` except Exception:...
[09:37:48] Operations, ops-eqiad, serviceops: (Need By Feb 28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (jijiki) Thank you all!
[09:41:27] (Abandoned) Muehlenhoff: Puppetise yubikey-val (WIP) [puppet] - https://gerrit.wikimedia.org/r/285962 (owner: Muehlenhoff)
[09:42:59] (CR) Muehlenhoff: [C: +1] add apt1001.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/572311 (https://phabricator.wikimedia.org/T244626) (owner: Dzahn)
[09:48:10] (CR) Arturo Borrero Gonzalez: [C: +2] Misc work to make puppet run in codfw1dev again following Icad66f70 [puppet] - https://gerrit.wikimedia.org/r/572421 (https://phabricator.wikimedia.org/T242607) (owner: Alex Monk)
[09:48:51] (CR) Muehlenhoff: [C: -1] "This seems some additional Hiera changes first, right now it would setup an additional sync between install1002 and install2002." [puppet] - https://gerrit.wikimedia.org/r/572312 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[09:52:22] (CR) Ema: [C: +2] ATS: remove 'tls:' prefix from X-Analytics-TLS [puppet] - https://gerrit.wikimedia.org/r/572009 (https://phabricator.wikimedia.org/T237993) (owner: Ema)
[10:00:23] !log installing Linux 4.9.210 kernels on stretch systems
[10:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:28] (CR) Ema: [C: +1] ATS: Extend KA experiment between ats-tls and varnish-fe to all ulsfo [puppet] - https://gerrit.wikimedia.org/r/572515 (https://phabricator.wikimedia.org/T244464) (owner: Vgutierrez)
[10:15:28] (CR) Vgutierrez: [C: +2] ATS: Extend KA experiment between ats-tls and varnish-fe to all ulsfo [puppet] - https://gerrit.wikimedia.org/r/572515 (https://phabricator.wikimedia.org/T244464) (owner: Vgutierrez)
[10:20:41] !log rolling restart of ats-tls and varnish-fe on ulsfo to enable KA between them - T244464
[10:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:45] T244464: Investigate side-effects of enabling KA between ats-tls and varnish-fe - https://phabricator.wikimedia.org/T244464
[10:22:18] !log marostegui@cumin1001 dbctl commit (dc=all): ' db1107 increase API weight from 10 to 15 for 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10420 and previous config saved to /var/cache/conftool/dbconfig/20200217-102218-marostegui.json
[10:22:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:23] T242702: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702
[10:23:50] (CR) Alexandros Kosiaris: httpd: Add a new LogFormat, wmfjson (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572252 (owner: Alexandros Kosiaris)
[10:25:05] (PS2) Alexandros Kosiaris: httpd: Add a new LogFormat, wmfjson [puppet] - https://gerrit.wikimedia.org/r/572252
[10:25:07] (PS2) Alexandros Kosiaris: otrs: Add wmfjson logs to OTRS [puppet] - https://gerrit.wikimedia.org/r/572253
[10:28:21] (PS2) Arturo Borrero Gonzalez: cloud: refresh names for DNS servers in eqiad1/codfw1dev [puppet] - https://gerrit.wikimedia.org/r/572213 (https://phabricator.wikimedia.org/T243766)
[10:29:58] (CR) Giuseppe Lavagetto: [C: +1] httpd: Add a new LogFormat, wmfjson [puppet] - https://gerrit.wikimedia.org/r/572252 (owner: Alexandros Kosiaris)
[10:30:15] <_joe_> akosiaris: interesting results from your tests
[10:31:25] !log dropping all databases from db1140:3313
[10:31:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:50] (CR) Alexandros Kosiaris: httpd: Add a new LogFormat, wmfjson (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572252 (owner: Alexandros Kosiaris)
[10:34:54] (CR) Alexandros Kosiaris: [C: +2] httpd: Add a new LogFormat, wmfjson [puppet] - https://gerrit.wikimedia.org/r/572252 (owner: Alexandros Kosiaris)
[10:35:24] (PS1) Arturo Borrero Gonzalez: realm: rename labs realm to cloud [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222)
[10:35:32] (CR) Alexandros Kosiaris: [C: +2] "Thanks everyone! Merging, let's evaluate first on a low traffic host like ticket.wikimedia.org (subsequent patch)" [puppet] - https://gerrit.wikimedia.org/r/572252 (owner: Alexandros Kosiaris)
[10:35:51] (CR) Alexandros Kosiaris: [C: +2] otrs: Add wmfjson logs to OTRS [puppet] - https://gerrit.wikimedia.org/r/572253 (owner: Alexandros Kosiaris)
[10:38:06] _joe_: I was thinking I should just spin up an apache and fuzz it
[10:38:08] (PS1) Ladsgroup: Start reading for the new term store for clients up to Q1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/572628 (https://phabricator.wikimedia.org/T225057)
[10:38:25] try the headers with log with any kind of input, just for the heck of it
[10:39:51] (CR) jerkins-bot: [V: -1] Start reading for the new term store for clients up to Q1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/572628 (https://phabricator.wikimedia.org/T225057) (owner: Ladsgroup)
[10:40:04] (CR) Ladsgroup: "IS.php should be synced first, twice." [mediawiki-config] - https://gerrit.wikimedia.org/r/569031 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[10:46:11] (PS2) Ladsgroup: Start reading for the new term store for clients up to Q1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/572628 (https://phabricator.wikimedia.org/T225057)
[10:47:42] (CR) Jbond: [C: +1] "lgtm" [dns] - https://gerrit.wikimedia.org/r/572311 (https://phabricator.wikimedia.org/T244626) (owner: Dzahn)
[10:55:38] (CR) Jbond: "I know this is WIP so you are likely aware however just wanted to note that this CR dose not address instances of the `$::realm` global va" [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222) (owner: Arturo Borrero Gonzalez)
[11:28:34] Operations, Traffic: Session resumption seems to be broken in ATS for TLSv1.3 - https://phabricator.wikimedia.org/T245419 (Vgutierrez)
[11:30:23] Operations, Traffic: Session resumption seems to be broken in ATS for TLSv1.3 - https://phabricator.wikimedia.org/T245419 (Vgutierrez) p:Triage→High
[11:31:01] Operations, Traffic: Session resumption seems to be broken in ATS for TLSv1.3 - https://phabricator.wikimedia.org/T245419 (Vgutierrez)
[11:32:16] (CR) Arturo Borrero Gonzalez: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222) (owner: Arturo Borrero Gonzalez)
[11:33:01] (PS2) Arturo Borrero Gonzalez: realm: rename labs realm to cloud [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222)
[11:36:12] (CR) jerkins-bot: [V: -1] realm: rename labs realm to cloud [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222) (owner: Arturo Borrero Gonzalez)
[11:37:59] (CR) Jbond: "taken a first pass looks good, some nits and recommendations" (5 comments) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[11:40:45] (CR) Jbond: [C: +2] tlsproxy::localssl: change duplicate definition detection [puppet] - https://gerrit.wikimedia.org/r/572282 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[11:44:24] (PS3) Arturo Borrero Gonzalez: realm: rename labs realm to cloud [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222)
[11:48:37] (CR) jerkins-bot: [V: -1] realm: rename labs realm to cloud [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222) (owner: Arturo Borrero Gonzalez)
[11:48:54] (PS4) Arturo Borrero Gonzalez: realm: rename labs realm to cloud [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222)
[12:39:41] (CR) jerkins-bot: [V: -1] realm: rename labs realm to cloud [puppet] - https://gerrit.wikimedia.org/r/572626 (https://phabricator.wikimedia.org/T244222) (owner: Arturo Borrero Gonzalez)
[12:39:41] Operations, Traffic, Patch-For-Review: ats-tls performance issues under production load - https://phabricator.wikimedia.org/T244538 (Vgutierrez) `Tested again on cp1075 (now running buster) before disabling DNS on ats-tls and after: `name=before vgutierrez@cp1075:~$ ./hey -c 1 -z 10s https://en.wiki...
[12:39:41] (CR) Hnowlan: Migrate changeprop & cpjobqueue to kubernetes (7 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/554576 (owner: Holger Knust)
[12:39:44] Operations, Core Platform Team, MediaWiki-API, Pywikibot: WMFTimeoutException on non-existent files - https://phabricator.wikimedia.org/T245374 (jbond) p:Triage→Medium
[12:39:44] Operations, Analytics, decommission, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (jbond) p:Triage→Medium
[12:39:44] (CR) Alexandros Kosiaris: [C: -1] "Couple of inline comments of my own" (7 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/554576 (owner: Holger Knust)
[12:39:45] Operations: Integrate Stretch 9.12 point update - https://phabricator.wikimedia.org/T244695 (MoritzMuehlenhoff)
[12:39:48] Operations, Continuous-Integration-Infrastructure, Packaging, puppet-compiler, User-jbond: PCC always has an ERROR when compiling for servers with profile::redis::slave - https://phabricator.wikimedia.org/T228266 (jbond)
[12:39:50] Operations, Puppet, User-jbond: Add check for changes applied at all runs - https://phabricator.wikimedia.org/T242910 (jbond) I have deployed the change to remove the tlsproxy noise however there are still quit a few boxes which are showing changed on every run ` lang=yaml ./reports.py | sort --- ac...
[12:39:50] !log reboot acmechief instances (kernel upgrade)
[12:39:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:52] !log installing postgresql-9.4 security updates
[12:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:52] !log add test flowspec rules to cr3-knams
[12:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:52] great, they killed the session...
[12:40:20] (CR) Arturo Borrero Gonzalez: [C: +2] nova_fixed_multi: support adding/deleting records in a 'legacy' domain [puppet] - https://gerrit.wikimedia.org/r/572122 (https://phabricator.wikimedia.org/T245173) (owner: Andrew Bogott)
[12:40:23] (CR) Brian Wolff: "Note to self: may want to use the plugin-types directive for pdfs" [puppet] - https://gerrit.wikimedia.org/r/547929 (https://phabricator.wikimedia.org/T117618) (owner: Brian Wolff)
[12:40:31] (CR) Arturo Borrero Gonzalez: [C: +1] nova_fixed_multi: support adding/deleting records in a 'legacy' domain [puppet] - https://gerrit.wikimedia.org/r/572122 (https://phabricator.wikimedia.org/T245173) (owner: Andrew Bogott)
[12:42:36] (PS4) Muehlenhoff: Add script to track OS migrations status [puppet] - https://gerrit.wikimedia.org/r/572251
[12:42:51] (PS1) Jbond: acme_chief::server: change exec job to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910)
[12:43:18] (CR) Muehlenhoff: Add script to track OS migrations status (5 comments) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[12:44:08] (CR) jerkins-bot: [V: -1] acme_chief::server: change exec job to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[12:48:34] Operations, ops-eqiad: eqiad - Duplicate IP on mgmt network - https://phabricator.wikimedia.org/T245427 (ayounsi) p:Triage→High
[12:48:48] (PS2) Jbond: acme_chief::server: change exec job to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910)
[12:50:06] XioNoX: :-)
[12:53:02] (PS3) Jbond: acme_chief::server: change exec job to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910)
[12:54:32] Operations, MediaWiki-General, serviceops, Service-Architecture: Create a service-to-service proxy for handling HTTP calls from services to other entities - https://phabricator.wikimedia.org/T244843 (Joe) a:Joe
[12:55:13] (CR) Vgutierrez: acme_chief::server: change exec job to a systemd timer (2 comments) [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[12:55:21] (CR) Muehlenhoff: "JFTR, the new standard RAID0 recipes were successfully tested with logstash2026." [puppet] - https://gerrit.wikimedia.org/r/570596 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[12:56:29] (PS2) Muehlenhoff: elasticsearch::packages: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561866
[12:56:55] (PS4) Jbond: acme_chief::server: change exec job to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910)
[12:57:27] (CR) Jbond: "thanks for the quick review, updated" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[12:58:25] (PS5) Jbond: acme_chief::server: change exec job to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910)
[13:03:42] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[13:05:13] (Abandoned) Muehlenhoff: Allow filtering services for restart notification (WIP) [debs/debdeploy] - https://gerrit.wikimedia.org/r/493697 (owner: Muehlenhoff)
[13:10:01] (PS1) Arturo Borrero Gonzalez: icinga: remove wmflabs.org HTTPS cert check [puppet] - https://gerrit.wikimedia.org/r/572665 (https://phabricator.wikimedia.org/T235252)
[13:31:27] (PS1) Jbond: profile::ganeti: update the permissions of the users file [puppet] - https://gerrit.wikimedia.org/r/572667 (https://phabricator.wikimedia.org/T242910)
[13:31:42] (CR) Jbond: [C: +2] acme_chief::server: change exec job to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/572659 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[13:42:21] (CR) Volans: "Two quick comments from a very shallow pass, I'll have a deeper look later" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[13:47:38] (PS1) Ema: cache: enable cgroup accounting on two esams nodes [puppet] - https://gerrit.wikimedia.org/r/572669 (https://phabricator.wikimedia.org/T183146)
[13:49:46] (CR) Vgutierrez: [C: +1] cache: enable cgroup accounting on two esams nodes [puppet] - https://gerrit.wikimedia.org/r/572669 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[14:02:59] (CR) Jbond: Add script to track OS migrations status (4 comments) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[14:03:04] (CR) Gehel: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/561866 (owner: Muehlenhoff)
[14:04:42] (CR) Muehlenhoff: [C: +2] elasticsearch::packages: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561866 (owner: Muehlenhoff)
[14:09:47] Operations, observability, Patch-For-Review: Monitor resource usage on a per-cgroup basis - https://phabricator.wikimedia.org/T183146 (ema) Cadvisor is not in Buster, see [[ https://packages.qa.debian.org/c/cadvisor.html | the package tracker ]]. I have tried building it on boron: it turns out that t...
[14:17:21] !log reprepro includedeb buster-wikimedia ~ema/cadvisor_0.35.0+ds1-4_amd64.deb T183146
[14:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:26] T183146: Monitor resource usage on a per-cgroup basis - https://phabricator.wikimedia.org/T183146
[14:17:57] (CR) Ema: [C: +2] cache: enable cgroup accounting on two esams nodes [puppet] - https://gerrit.wikimedia.org/r/572669 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[14:31:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10422 and previous config saved to /var/cache/conftool/dbconfig/20200217-143146-marostegui.json
[14:31:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:51] T242702: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702
[14:37:48] (PS1) Ayounsi: Add flowspec1001 to DNS [dns] - https://gerrit.wikimedia.org/r/572681
[14:38:57] (CR) Ayounsi: [C: +2] Add flowspec1001 to DNS [dns] - https://gerrit.wikimedia.org/r/572681 (owner: Ayounsi)
[14:42:05] Operations, Parsoid, VisualEditor, wikitech.wikimedia.org: VisualEditor was removed from Wikitech because Parsoid/PHP isn't yet compatible with how Wikitech is set up - https://phabricator.wikimedia.org/T241961 (ayounsi) Any ETA on when it will be fixed?
[14:43:08] (PS1) Ema: Add module to configure cadvisor [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146)
[14:46:26] (CR) Muehlenhoff: Add module to configure cadvisor (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[14:52:06] (PS2) Ema: prometheus: add cadvisor_exporter module [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146)
[14:52:40] (CR) Ema: prometheus: add cadvisor_exporter module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[14:59:49] (PS1) Jbond: query_service::common: ensure we dont run exec on every run [puppet] - https://gerrit.wikimedia.org/r/572684 (https://phabricator.wikimedia.org/T242910)
[15:00:57] (CR) Muehlenhoff: prometheus: add cadvisor_exporter module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[15:01:51] (PS3) Ema: prometheus: add cadvisor_exporter module and profile [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146)
[15:02:41] (PS2) CDanis: maps.wm.o: reduce TTL from 1D to 10m [dns] - https://gerrit.wikimedia.org/r/572274
[15:02:43] (PS2) CDanis: Manitoba: better served by codfw [dns] - https://gerrit.wikimedia.org/r/572269
[15:02:45] (PS2) CDanis: Saskatchewan: ulsfo >> codfw > eqiad [dns] - https://gerrit.wikimedia.org/r/572270
[15:06:14] (CR) Muehlenhoff: prometheus: add cadvisor_exporter module and profile (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[15:22:37] (CR) Muehlenhoff: prometheus: add cadvisor_exporter module and profile (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[15:23:00] (PS1) Jcrespo: backups: Disable s3-eqiad backups until source host is restored [puppet] - https://gerrit.wikimedia.org/r/572685 (https://phabricator.wikimedia.org/T244958)
[15:23:25] (PS2) Andrew Bogott: nova_fixed_multi: support adding/deleting records in a 'legacy' domain [puppet] - https://gerrit.wikimedia.org/r/572122 (https://phabricator.wikimedia.org/T245173)
[15:23:27] (PS1) Andrew Bogott: Designate: start using '.eqiad1.wikimedia.cloud' domain in eqiad1 [puppet] - https://gerrit.wikimedia.org/r/572686 (https://phabricator.wikimedia.org/T245173)
[15:27:41] (PS4) Ema: prometheus: add cadvisor_exporter module and profile [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146)
[15:30:05] (PS1) Elukey: Fix Redirect rule in httpd config of stats.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/572690 (https://phabricator.wikimedia.org/T245414)
[15:30:14] (PS1) Jbond: profile::prometheus::ops_mysql: change exec to a system timer [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910)
[15:31:26] (CR) Elukey: [C: +2] Fix Redirect rule in httpd config of stats.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/572690 (https://phabricator.wikimedia.org/T245414) (owner: Elukey)
[15:31:32] (CR) Mforns: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/572690 (https://phabricator.wikimedia.org/T245414) (owner: Elukey)
[15:31:48] (CR) Jcrespo: [C: +2] backups: Disable s3-eqiad backups until source host is restored [puppet] - https://gerrit.wikimedia.org/r/572685 (https://phabricator.wikimedia.org/T244958) (owner: Jcrespo)
[15:32:12] (PS5) Ema: prometheus: add cadvisor_exporter module and profile [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146)
[15:32:14] (PS1) Ema: cache: add cadvisor exporter [puppet] - https://gerrit.wikimedia.org/r/572693 (https://phabricator.wikimedia.org/T183146)
[15:37:40] (PS2) Andrew Bogott: Designate: start using '.eqiad1.wikimedia.cloud' domain in eqiad1 [puppet] - https://gerrit.wikimedia.org/r/572686 (https://phabricator.wikimedia.org/T245173)
[15:39:29] (PS1) Jbond: openstack::clientpackages::mitaka::buster: change notice to warning [puppet] - https://gerrit.wikimedia.org/r/572696 (https://phabricator.wikimedia.org/T242910)
[15:44:04] !log ✔️ cdanis@icinga1001.wikimedia.org ~ 🕥☕ sudo systemctl restart ircecho
[15:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:54] (CR) Filippo Giunchedi: "Change LGTM, would be nice to have PCC too. Adding DBA as heads up" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[15:46:59] (CR) Filippo Giunchedi: [C: +1] profile::prometheus::ops_mysql: change exec to a system timer [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[15:47:32] (PS1) Alexandros Kosiaris: otrs: Add local hostname to trusted proxies [puppet] - https://gerrit.wikimedia.org/r/572697
[15:47:34] (PS1) Alexandros Kosiaris: httpd: Fix X-Client-IP defition in wmfjson [puppet] - https://gerrit.wikimedia.org/r/572698
[15:48:26] !log ayounsi@cumin1001 START - Cookbook sre.ganeti.makevm
[15:48:26] !log ayounsi@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
[15:48:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:02] !log ayounsi@cumin1001 START - Cookbook sre.ganeti.makevm
[15:50:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:11] (PS6) Ema: prometheus: add cadvisor_exporter module and profile [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146)
[15:52:12] (CR) Ema: prometheus: add cadvisor_exporter module and profile (2 comments) [puppet] - https://gerrit.wikimedia.org/r/572682 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[15:52:14] (PS2) Alexandros Kosiaris: otrs: Add local hostname to trusted proxies [puppet] - https://gerrit.wikimedia.org/r/572697
[15:52:16] (PS2) Alexandros Kosiaris: httpd: Fix X-Client-IP defition in wmfjson [puppet] - https://gerrit.wikimedia.org/r/572698
[15:55:02] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[15:55:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:42] CUSTOM - Memory correctable errors -EDAC- on mwdebug2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mwdebug2001&var-datasource=codfw+prometheus/ops
[15:56:07] haha, funnily the IRC notification config does not log the message if you 'send custom notification'
[15:56:21] but, bot seems to work again
[15:58:18] thanks!
[15:58:38] (CR) Alexandros Kosiaris: [C: +2] otrs: Add local hostname to trusted proxies [puppet] - https://gerrit.wikimedia.org/r/572697 (owner: Alexandros Kosiaris)
[15:58:50] (CR) Alexandros Kosiaris: [C: +2] httpd: Fix X-Client-IP defition in wmfjson [puppet] - https://gerrit.wikimedia.org/r/572698 (owner: Alexandros Kosiaris)
[16:01:30] (CR) Marostegui: "Let's get a PCC indeed" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:02:08] (PS1) Ayounsi: Add DHCP and Netboot for flowspec1001 [puppet] - https://gerrit.wikimedia.org/r/572701
[16:03:52] (CR) Ayounsi: [C: +2] Add DHCP and Netboot for flowspec1001 [puppet] - https://gerrit.wikimedia.org/r/572701 (owner: Ayounsi)
[16:04:34] (CR) Jcrespo: "There was a lot of discussion on how to implement this when it was first deployed. I agree it is not optimal, but it was left as is as a c" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:06:47] (CR) Jcrespo: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:12:56] (CR) Jbond: "> > What is the error handling if run fails?" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:14:03] (CR) Jbond: "Sending again as formatting was off in the last mail" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:14:21] (PS1) Alexandros Kosiaris: DNM: httpd: Globally enable wmfjson [puppet] - https://gerrit.wikimedia.org/r/572702
[16:21:17] (CR) Jcrespo: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:23:19] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:24:48] (CR) Jcrespo: [C: +1] "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:26:33] (CR) Jcrespo: [C: +1] "I think also adding documentation on how to update the list manually (command to run) on wikitech, I can add that once it is tested." [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:28:35] (PS1) Jbond: profile::ci::docker: manage all group membership in data module [puppet] - https://gerrit.wikimedia.org/r/572707 (https://phabricator.wikimedia.org/T242910)
[16:36:42] (PS2) Alexandros Kosiaris: DNM: httpd: Globally enable wmfjson [puppet] - https://gerrit.wikimedia.org/r/572702
[16:36:44] (PS1) Alexandros Kosiaris: httpd: Switch defaults.conf from file to template [puppet] - https://gerrit.wikimedia.org/r/572708
[16:37:19] (CR) Muehlenhoff: Add script to track OS migrations status (2 comments) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[16:38:09] (CR) Alexandros Kosiaris: [C: +1] profile::ci::docker: manage all group membership in data module [puppet] - https://gerrit.wikimedia.org/r/572707 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:38:16] (PS5) Muehlenhoff: Add script to track OS migrations status [puppet] - https://gerrit.wikimedia.org/r/572251
[16:41:52] (PS2) Jbond: profile::prometheus::ops_mysql: change exec to a system timer [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910)
[16:42:00] (CR) Jbond: "missed some clean up otherwise lgtm" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[16:44:27] (PS3) Jbond: profile::prometheus::ops_mysql: change exec to a system timer [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910)
[16:44:40] (CR) Jbond: "PCC added" [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:45:21] (CR) Muehlenhoff: "Looks good, one comment inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572707 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:47:07] (PS6) Muehlenhoff: Add script to track OS migrations status [puppet] - https://gerrit.wikimedia.org/r/572251
[16:47:09] (CR) Muehlenhoff: Add script to track OS migrations status (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[16:47:54] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 92 probes of 527 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:49:20] PROBLEM - Host flowspec1001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:50:06] (PS4) Jcrespo: profile::prometheus::ops_mysql: change exec to a system timer [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:52:58] (CR) Jbond: [C: +1] "one optional nit i missed but lgtm" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572251 (owner: Muehlenhoff)
[16:52:58] (CR) Jcrespo: [C: +1] profile::prometheus::ops_mysql: change exec to a system timer [puppet] - https://gerrit.wikimedia.org/r/572691 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[16:53:50] er, that's me, I created the VM and shut it down until the puppet side is ready
[16:54:02] flowspec?
[16:54:21] jynus: basically a network controller
[16:54:40] flowspec allows to propagate firewall rules via BGP
[16:54:46] yeah, sorry, just confirming it was an answer to that
[16:54:54] no prob on my side :-D
[16:58:24] (PS1) Elukey: admin: add kerberos flag for user sguebo [puppet] - https://gerrit.wikimedia.org/r/572714 (https://phabricator.wikimedia.org/T244913)
[16:59:32] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 1.982e+04 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:59:45] ^ me
[17:00:23] <_joe_> effie: please stop for now
[17:00:52] PROBLEM - LVS HTTPS IPv6 #page on ncredir-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:00:54] (PS2) Jbond: profile::ci::docker: manage all group membership in data module [puppet] - https://gerrit.wikimedia.org/r/572707 (https://phabricator.wikimedia.org/T242910)
[17:00:58] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 10 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:01:06] * volans here
[17:01:09] * vgutierrez here
[17:01:16] * effie stopped
[17:01:24] what's going on?
[17:01:27] * jbond42 here
[17:01:50] the memcached errors were mine
[17:01:56] the LVS not mine
[17:02:04] (CR) Elukey: [C: +2] admin: add kerberos flag for user sguebo [puppet] - https://gerrit.wikimedia.org/r/572714 (https://phabricator.wikimedia.org/T244913) (owner: Elukey)
[17:02:26] PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 27.81 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:02:45] Operations, Analytics, Analytics-Kanban, LDAP-Access-Requests: LDAP access to the wmf group for CherRaye Glenn (superset, turnilo, hue) - https://phabricator.wikimedia.org/T244410 (Nuria)
[17:02:49] (CR) Jbond: profile::ci::docker: manage all group membership in data module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572707 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[17:03:50] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 60672528 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:03:58] Operations: Remove references to m4-master - https://phabricator.wikimedia.org/T245238 (Nuria)
[17:04:26] PROBLEM - LVS HTTPS IPv4 #page on ncredir-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:04:28] PROBLEM - LVS HTTP IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:05:22] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by ConnectTimeoutError(urllib3.connection.VerifiedHTTPSConnection object at 0x7f10564a9160, Connection to text-lb.esams.wikimedia.org timed out. (connect timeout=15)): /api/rest_v1/?spec https://wikitech.wikimedia.org/wiki/RESTBase
[17:05:30] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:05:58] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 67 probes of 523 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:06:00] PROBLEM - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:06:18] PROBLEM - LVS HTTPS IPv4 #page on text-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:07:10] PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[17:07:40] RECOVERY - LVS HTTPS IPv4 #page on ncredir-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 233 bytes in 3.390 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:08:00] (PS1) Giuseppe Lavagetto: depool esams [dns] - https://gerrit.wikimedia.org/r/572718
[17:08:46] RECOVERY - LVS HTTPS IPv6 #page on ncredir-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 233 bytes in 7.513 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:08:52] RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[17:09:09] (CR) BBlack: [C: +1] depool esams [dns] - https://gerrit.wikimedia.org/r/572718 (owner: Giuseppe Lavagetto)
[17:09:39] (CR) BBlack: [C: +2] depool esams [dns] - https://gerrit.wikimedia.org/r/572718 (owner: Giuseppe Lavagetto)
[17:10:24] ^ pending authdns-update, not live yet, we may leave it
[17:10:32] PROBLEM - LibreNMS has a critical alert #page on icinga1001 is CRITICAL: Primary inbound port utilisation over 80% #page (cr2-esams.wikimedia.org,cr3-esams.wikimedia.org) https://wikitech.wikimedia.org/wiki/Network_monitoring%23LibreNMS_alerts
[17:11:04] RECOVERY - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15019 bytes in 4.091 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:11:15] RECOVERY - LVS HTTP IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 563 bytes in 4.087 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:11:16] cdanis: neat ^ (the librenms utilization)
[17:11:25] RECOVERY - LVS HTTPS IPv4 #page on text-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15006 bytes in 4.714 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:11:47] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by ReadTimeoutError(HTTPSConnectionPool(host=text-lb.esams.wikimedia.org, port=443): Read timed out.,): /api/rest_v1/?spec https://wikitech.wikimedia.org/wiki/RESTBase
[17:12:47] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[17:14:21] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 7959 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:15:04] (CR) Alexandros Kosiaris: [C: +1] Add config for OpusMT [deployment-charts] - https://gerrit.wikimedia.org/r/563110 (https://phabricator.wikimedia.org/T234194) (owner: KartikMistry)
[17:15:44] Operations, DBA, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (Trizek-WMF) >>! In T244238#5876674, @Marostegui wrote: > @Trizek-WMF so we are going to do this maintenance Thursday 20th at 09:00 AM UTC, can you post it on Technews? I'm just co...
[17:16:11] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 12 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:16:13] RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 104.1 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:17:45] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 31 probes of 523 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:18:01] RECOVERY - LibreNMS has a critical alert #page on icinga1001 is OK: OK: zero critical LibreNMS alerts https://wikitech.wikimedia.org/wiki/Network_monitoring%23LibreNMS_alerts
[17:18:22] Operations, DBA, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (Marostegui) >>! In T244238#5890721, @Trizek-WMF wrote: >>>! In T244238#5876674, @Marostegui wrote: >> @Trizek-WMF so we are going to do this maintenance Thursday 20th at 09:00 AM U...
[17:19:24] Operations, DBA, Wikimedia-Etherpad: Upgrade and restart m1 master (db1135) - https://phabricator.wikimedia.org/T244238 (Trizek-WMF) We all run at high speeds. Don't worry though, I don't think that missing Tech News will be a blocker. :)
[17:25:39] !log GRE MTU mitigations applied to esams cp hosts only - T232602
[17:25:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:44] T232602: GRE MTU mitigations - Tracking - https://phabricator.wikimedia.org/T232602
[17:43:13] (CR) Florianschmidtwelzow: [C: +1] Update authmanager-statsd channel names [mediawiki-config] - https://gerrit.wikimedia.org/r/572502 (owner: Gergő Tisza)
[17:44:03] (CR) Florianschmidtwelzow: [C: +1] Make the logstash and authmanager-statsd Monolog handlers compatible [mediawiki-config] - https://gerrit.wikimedia.org/r/572401 (owner: Gergő Tisza)
[17:45:31] (CR) Arturo Borrero Gonzalez: [C: +1] Designate: start using '.eqiad1.wikimedia.cloud' domain in eqiad1 [puppet] - https://gerrit.wikimedia.org/r/572686 (https://phabricator.wikimedia.org/T245173) (owner: Andrew Bogott)
[17:52:42] (CR) Arturo Borrero Gonzalez: [C: +1] "I may still have some servers set to use 'mitaka'. We can either apply this patch or wait until we discover which server and fix it so we " [puppet] - https://gerrit.wikimedia.org/r/572696 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[17:55:47] (CR) Jbond: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/572696 (https://phabricator.wikimedia.org/T242910) (owner: Jbond)
[17:56:43] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[18:03:45] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[18:03:45] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:08:13] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 69 probes of 527 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:09:51] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 38 probes of 523 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:10:43] PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:10:47] PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:10:47] PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:10:47] PROBLEM - restbase endpoints health on restbase2023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:10:47] PROBLEM - restbase endpoints health on restbase2014 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:11:45] Operations, Core Platform Team, MediaWiki-API, Pywikibot: WMFTimeoutException on non-existent files - https://phabricator.wikimedia.org/T245374 (AntiCompositeNumber) The code I used: ` #!/bin/env python3 import pywikibot import logging logging.basicConfig(filename="test.log", level=logging.DEBU...
[18:12:41] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/legacy/pagecounts/aggregate/{project}/{access-site}/{granularity}/{start}/{end} (Get pagecounts) timed out before a response was received: /analytics.wikimedia.org/v1/unique-devices/{project}/{access-site}/{granularity}/{start}/{end} (Get unique devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitorin
[18:13:22] I am restarting cassandra on aqs1004, might be me
[18:13:43] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[18:13:43] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:14:27] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 37 probes of 527 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:15:49] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 31 probes of 523 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:16:33] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[18:16:37] RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:16:39] RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:16:39] RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:16:39] RECOVERY - restbase endpoints health on restbase2023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:16:39] RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:18:02] elukey: aware of aqs, but do you know if restbase services could use aqs as a dependency?
[18:18:27] *be using
[18:19:12] jynus: aqs is behind restbase, it affects (usually) mobile apps or similar if it breaks, but not rb as I am seeing now
[18:19:38] yeah, I knew the first part, but I was surprised about the feed alert one
[18:20:10] me too, the timing is suspicious, but I don't know any correlation
[18:20:11] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 32 probes of 527 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:20:24] maybe something to ask maintainers :-D
[18:20:25] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 31 probes of 527 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:20:42] will do tomorrow
[18:25:31] !log restart kafka on kafka-jumbo1001 to pick up new openjdk updates
[18:25:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:38] Operations: Integrate Stretch 9.12 point update - https://phabricator.wikimedia.org/T244695 (MoritzMuehlenhoff)
[18:57:48] can someone put this person on moderation? https://lists.wikimedia.org/pipermail/wikitech-ambassadors/2020-February/thread.html
[18:57:53] quiddity: ^
[18:58:11] will do
[19:01:25] Thanks!
[19:02:29] PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: RRDP status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[19:08:19] (PS4) CDanis: Add option to clamp TCP-MSS [homer/public] - https://gerrit.wikimedia.org/r/569636 (owner: Ayounsi)
[19:27:45] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 1.174e+04 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:28:04] (PS3) Ayounsi: Add cookbook to control CF BGP advertisements [cookbooks] - https://gerrit.wikimedia.org/r/572262
[19:28:28] (PS1) Bartosz Dziewoński: Add DiscussionTools to four wikis in hidden mode [mediawiki-config] - https://gerrit.wikimedia.org/r/572731 (https://phabricator.wikimedia.org/T244870)
[19:29:45] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 15 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:33:51] !log no-op enable flowspec change on cr2-eqord and cr2-eqiad
[19:33:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:30] (CR) CDanis: [C: +2] "Needed now." [homer/public] - https://gerrit.wikimedia.org/r/569636 (owner: Ayounsi)
[19:34:45] (CR) CDanis: [V: +2 C: +2] Add option to clamp TCP-MSS [homer/public] - https://gerrit.wikimedia.org/r/569636 (owner: Ayounsi)
[19:34:47] (Merged) jenkins-bot: Add option to clamp TCP-MSS [homer/public] - https://gerrit.wikimedia.org/r/569636 (owner: Ayounsi)
[19:38:20] (PS1) Ayounsi: Add prepending to esams [homer/public] - https://gerrit.wikimedia.org/r/572732
[19:39:29] (PS2) Ayounsi: Add prepending and TCP-mss clamping to esams [homer/public] - https://gerrit.wikimedia.org/r/572732
[19:39:56] (PS1) CDanis: enable TCP MSS clamping in eqiad/eqord [homer/public] - https://gerrit.wikimedia.org/r/572733
[19:40:07] (CR) CDanis: [C: +2] Add prepending and TCP-mss clamping to esams [homer/public] - https://gerrit.wikimedia.org/r/572732 (owner: Ayounsi)
[19:40:23] (Merged) jenkins-bot: Add prepending and TCP-mss clamping to esams [homer/public] - https://gerrit.wikimedia.org/r/572732 (owner: Ayounsi)
[19:41:25] (CR) Ayounsi: [C: +1] "not tested but lgtm" [homer/public] - https://gerrit.wikimedia.org/r/572733 (owner: CDanis)
[19:44:26] (CR) CDanis: [C: +2] enable TCP MSS clamping in eqiad/eqord [homer/public] - https://gerrit.wikimedia.org/r/572733 (owner: CDanis)
[19:44:44] (Merged) jenkins-bot: enable TCP MSS clamping in eqiad/eqord [homer/public] - https://gerrit.wikimedia.org/r/572733 (owner: CDanis)
[19:49:00] !log no-op enable TCP-MSS clamping on eqord and eqiad
[19:49:09] !log s/no-op//
[19:49:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:47] Hello, I have problems with ContentTranslation
[19:53:20] When I click to publish translation it just shows publishing and nothing hapens
[19:54:07] *happens
[19:54:48] Can someone check logstash for srwiki?
[19:56:30] !log finish enabling TCP-MSS clamping in eqiad
[19:56:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:56:34] (PS1) Krinkle: Raise minimum log level for 'OAuth' from DEBUG to INFO [mediawiki-config] - https://gerrit.wikimedia.org/r/572737 (https://phabricator.wikimedia.org/T244185)
[19:57:57] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 527 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:58:26] I'm waiting 8 minutes already, but nothing in my contributions: https://sr.wikipedia.org/wiki/%D0%9F%D0%BE%D1%81%D0%B5%D0%B1%D0%BD%D0%BE:%D0%94%D0%BE%D0%BF%D1%80%D0%B8%D0%BD%D0%BE%D1%81%D0%B8/Zoranzoki21
[20:05:17] Operations, serviceops, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (jijiki)
[20:06:13] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[20:07:09] (PS4) Krinkle: [BETA] Enable array LCStoreStaticArray format on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/508724 (https://phabricator.wikimedia.org/T99740) (owner: Jforrester)
[20:07:15] PROBLEM - Juniper alarms on asw-a-codfw is CRITICAL: JNX_ALARMS CRITICAL - 1 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[20:08:37] (CR) jerkins-bot: [V: -1] [BETA] Enable array LCStoreStaticArray format on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/508724 (https://phabricator.wikimedia.org/T99740) (owner: Jforrester)
[20:11:17] RECOVERY - Juniper alarms on asw-a-codfw is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[20:14:03] (CR) Gergő Tisza: [C: +1] Raise minimum log level for 'OAuth' from DEBUG to INFO [mediawiki-config] - https://gerrit.wikimedia.org/r/572737 (https://phabricator.wikimedia.org/T244185) (owner: Krinkle)
[20:14:35] (PS23) ArielGlenn: write out and reuse pagerange info for big page content jobs [dumps] - https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434)
[20:14:37] (PS5) ArielGlenn: properly handle failure of writing of temp stubs for page content jobs [dumps] - https://gerrit.wikimedia.org/r/562995 (https://phabricator.wikimedia.org/T242209)
[20:16:20] hmm
[20:16:24] * apergos looks at patchset 23
[20:16:52] (PS5) Krinkle: [BETA] Enable LCStoreStaticArray format on Beta Cluster wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/508724 (https://phabricator.wikimedia.org/T99740) (owner: Jforrester)
[20:17:17] oh heh i forgot to push the subtest stuff. well then
[20:18:06] 25 minutes and translation isn't still published....
[20:19:57] Operations, ops-codfw: asw-a-codfw:FPC8 PEM0 flapping - https://phabricator.wikimedia.org/T245458 (ayounsi)
[20:26:44] (CR) Krinkle: [C: -1] "Updated to enable Beta all at once. Added warnings and checks. -1 because still blocked on Scap release." [mediawiki-config] - https://gerrit.wikimedia.org/r/508724 (https://phabricator.wikimedia.org/T99740) (owner: Jforrester)
[20:29:07] Operations, serviceops, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (jijiki) **Test if failover works and when to failover** //Failover when we hit a TKO. // Failover works properly, when we block access from mwdebug1001...
[20:29:29] Zoranzoki21: please file a task
[20:38:34] p858snake: Thanks, reported as T245461
[20:38:35] T245461: ContentTranslation: Publishing translation takes much time - https://phabricator.wikimedia.org/T245461
[20:38:42] Operations, ops-eqiad: eqiad - Duplicate IP on mgmt network - https://phabricator.wikimedia.org/T245427 (wiki_willy) Hey Arzhel - John came in on Saturday, to fix the dup ip issue via the following: https://phabricator.wikimedia.org/T245320 Are you still seeing issues? Thanks Willy
[20:41:58] Operations, Wikimedia-Mailing-lists: Creation of North Carolina mailing list - https://phabricator.wikimedia.org/T245462 (Pharos)
[20:49:52] Operations, ContentTranslation: ContentTranslation: Publishing translation takes much time - https://phabricator.wikimedia.org/T245461 (Zoranzoki21)
[21:02:08] Operations, ops-eqiad, DC-Ops: mr1-eqiad.wikimedia.org - Duplicate IP on mgmt network - https://phabricator.wikimedia.org/T245320 (ayounsi)
[21:02:10] Operations, ops-eqiad: eqiad - Duplicate IP on mgmt network - https://phabricator.wikimedia.org/T245427 (ayounsi)
[21:02:39] Operations, ops-eqiad, DC-Ops: mr1-eqiad.wikimedia.org - Duplicate IP on mgmt network - https://phabricator.wikimedia.org/T245320 (ayounsi) Thanks! LibreNMS check re-enabled.
[21:08:09] PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: RRDP status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[22:18:41] (PS1) Art-Baltai: Complete WikiPage/Article split and deprecate Page interface change Article::getTouched to Article::getPage()->getTouched() [mediawiki-config] - https://gerrit.wikimedia.org/r/572751 (https://phabricator.wikimedia.org/T239975)
[22:51:29] Operations, Cloud-VPS, DNS, Maps, and 2 others: multi-component wmflabs.org subdomains doesn't work under simple wildcard TLS cert - https://phabricator.wikimedia.org/T161256 (TheDJ)
[23:37:55] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=pdu_sentry4 site=eqsin https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:39:53] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:43:15] Operations, Performance-Team, Traffic, Patch-For-Review, Wikimedia-Incident: 15% response start regression as of 2019-11-11 (Varnish->ATS) - https://phabricator.wikimedia.org/T238494 (Krinkle) I've updated some of the navtiming dashboards in Grafana to include a comparison line for "1 year ag...
[23:48:32] Operations, ContentTranslation: ContentTranslation: Publishing translation takes much time - https://phabricator.wikimedia.org/T245461 (Zoranzoki21) I tried again now to publish, but looks like it behaves same. Can someone check Logstash?