[00:04:02] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/377366 (owner: Dzahn)
[00:04:26] (CR) jerkins-bot: [V: -1] elasticsearch: replace validate_bool with validate_legacy [puppet] - https://gerrit.wikimedia.org/r/377366 (owner: Dzahn)
[00:09:31] (PS2) Smalyshev: Add setup for https://www.mediawiki.org/ontology [mediawiki-config] - https://gerrit.wikimedia.org/r/368120 (https://phabricator.wikimedia.org/T171807)
[00:43:07] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2048879
[00:45:32] (PS6) GeoffreyT2000: Rename Wikisaurus namespace on Wiktionary to "Thesaurus" [mediawiki-config] - https://gerrit.wikimedia.org/r/374063 (https://phabricator.wikimedia.org/T174264)
[00:46:57] PROBLEM - Check Varnish expiry mailbox lag on cp1050 is CRITICAL: CRITICAL: expiry mailbox lag is 2022245
[00:48:19] !log awight@tin Started deploy [ores/deploy@42c5663]: Cause ORES service restart
[00:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:37] !log awight@tin Finished deploy [ores/deploy@42c5663]: Cause ORES service restart (duration: 00m 18s)
[00:48:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:13:04] robh: o/ feel free to reenable puppet on ores100*, I’m done testing and just puzzling over the results.
[01:13:15] cool, doing now
[01:13:41] done
[01:24:20] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3599239 (awight) I'm stuck and unable to run the service (T175654), seemingly due to file handl...
[02:08:55] !log cp1099 - backend restart, mailbox lag
[02:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:10:32] !log cp1050 - backend restart, mailbox lag
[02:10:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:13:07] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 0
[02:17:07] RECOVERY - Check Varnish expiry mailbox lag on cp1050 is OK: OK: expiry mailbox lag is 0
[02:22:36] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.17) (duration: 07m 31s)
[02:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:21] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Sep 12 02:29:20 UTC 2017 (duration 6m 45s)
[02:29:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:03] (PS1) KartikMistry: Fix apertium-all-dev Depends [debs/contenttranslation/apertium] - https://gerrit.wikimedia.org/r/377389
[03:31:19] (PS1) KartikMistry: apertium-crh: Initial Debian packaging [debs/contenttranslation/apertium-crh] - https://gerrit.wikimedia.org/r/377390 (https://phabricator.wikimedia.org/T174765)
[03:40:40] (PS1) KartikMistry: apertium-tur: Initial Debian packaging [debs/contenttranslation/apertium-tur] - https://gerrit.wikimedia.org/r/377392 (https://phabricator.wikimedia.org/T174765)
[04:04:53] (PS1) EBernhardson: Setup Cirrus MLR models for top 20 language AB test [mediawiki-config] - https://gerrit.wikimedia.org/r/377393
[04:05:09] (PS2) EBernhardson: Setup Cirrus MLR models for top 20 language AB test [mediawiki-config] - https://gerrit.wikimedia.org/r/377393
[04:43:40] (PS1) KartikMistry: apertium-cat: New upstream release [debs/contenttranslation/apertium-cat] - https://gerrit.wikimedia.org/r/377395 (https://phabricator.wikimedia.org/T174988)
[04:56:39] (PS1) KartikMistry: apertium-srd: New upstream release [debs/contenttranslation/apertium-srd] - https://gerrit.wikimedia.org/r/377396 (https://phabricator.wikimedia.org/T174988)
[04:57:59] (PS3) KartikMistry: cg3: New upstream version [debs/contenttranslation/cg3] - https://gerrit.wikimedia.org/r/362334 (https://phabricator.wikimedia.org/T171406)
[05:06:38] (PS1) KartikMistry: apertium-ita: New upstream release [debs/contenttranslation/apertium-ita] - https://gerrit.wikimedia.org/r/377398 (https://phabricator.wikimedia.org/T174988)
[05:14:06] (PS1) KartikMistry: apertium-srd-ita: New upstream release [debs/contenttranslation/apertium-srd-ita] - https://gerrit.wikimedia.org/r/377399 (https://phabricator.wikimedia.org/T174988)
[05:14:45] (CR) jerkins-bot: [V: -1] apertium-srd-ita: New upstream release [debs/contenttranslation/apertium-srd-ita] - https://gerrit.wikimedia.org/r/377399 (https://phabricator.wikimedia.org/T174988) (owner: KartikMistry)
[05:26:49] (CR) ArielGlenn: [C: 2] Add categories RDF dump into the index page [puppet] - https://gerrit.wikimedia.org/r/377369 (https://phabricator.wikimedia.org/T173892) (owner: Smalyshev)
[05:31:08] (PS1) KartikMistry: apertium-cat-srd: Initial Debian packaging [debs/contenttranslation/apertium-cat-srd] - https://gerrit.wikimedia.org/r/377402 (https://phabricator.wikimedia.org/T174987)
[06:07:37] PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS1299/IPv6: Active, AS1299/IPv4: Active
[06:13:47] RECOVERY - BGP status on cr2-eqiad is OK: BGP OK - up: 150, down: 0, shutdown: 4
[06:16:29] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0
[06:28:47] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[06:37:52] (PS1) VolkerE: Remove unnecessary `id` attributes [mediawiki-config] - https://gerrit.wikimedia.org/r/377406 (https://phabricator.wikimedia.org/T175670)
[07:02:47] (CR) Alexandros Kosiaris: [C: 2] cg3: New upstream version [debs/contenttranslation/cg3] - https://gerrit.wikimedia.org/r/362334 (https://phabricator.wikimedia.org/T171406) (owner: KartikMistry)
[07:05:07] (CR) Alexandros Kosiaris: "@Kartik, I would suggest deployment-prep. I 'll avoid uploading the package to apt.wikimedia.org and just provide it via my homedir while " [debs/contenttranslation/cg3] - https://gerrit.wikimedia.org/r/362334 (https://phabricator.wikimedia.org/T171406) (owner: KartikMistry)
[07:09:20] (CR) Alexandros Kosiaris: [C: -2] "Hm, I 'd say less avoid making changes this time to third-party code like stdlib. I 'd suggest submitting a PR to https://github.com/puppe" [puppet] - https://gerrit.wikimedia.org/r/377355 (owner: Dzahn)
[07:10:37] (CR) Alexandros Kosiaris: [C: -1] "19:23:21 Line 2: Second line should be empty" [puppet] - https://gerrit.wikimedia.org/r/377327 (owner: Halfak)
[07:13:07] (PS2) Jcrespo: Revert "mariadb: Depool es1019 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/377306
[07:14:16] (PS3) Jcrespo: Revert "mariadb: Depool es1019 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/377306
[07:14:28] (CR) Alexandros Kosiaris: "@KartikMistry, newer cg3 packages at https://people.wikimedia.org/~akosiaris/" [debs/contenttranslation/cg3] - https://gerrit.wikimedia.org/r/362334 (https://phabricator.wikimedia.org/T171406) (owner: KartikMistry)
[07:15:34] !log upgrading mw1180-mw1188, mw1209-1220 (app servers) to HHVM-Luasandbox 2.0.14 (along with HHVM restarts)
[07:15:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:40] (CR) Jcrespo: [C: 2] Revert "mariadb: Depool es1019 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/377306 (owner: Jcrespo)
[07:19:09] (CR) KartikMistry: "> @KartikMistry, newer cg3 packages at https://people.wikimedia.org/~akosiaris/" [debs/contenttranslation/cg3] - https://gerrit.wikimedia.org/r/362334 (https://phabricator.wikimedia.org/T171406) (owner: KartikMistry)
[07:20:04] akosiaris: Thanks. Can you quick guide me how can I update package in deployment-prep?
[07:20:14] (Merged) jenkins-bot: Revert "mariadb: Depool es1019 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/377306 (owner: Jcrespo)
[07:20:23] (CR) jenkins-bot: Revert "mariadb: Depool es1019 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/377306 (owner: Jcrespo)
[07:20:38] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[07:20:58] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[07:24:56] kart_: hmm I don't even know the hosts apertium is deployed at in deployment-prep these days
[07:27:25] kart_: maybe deployment-apertium02.deployment-prep.eqiad.wmflabs is enough ?
[07:27:30] login and update the package ?
[07:29:40] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3599482 (akosiaris) >>! In T174402#3598841, @RobH wrote: > So I just did the same, upon request...
[07:30:19] (CR) DCausse: Setup Cirrus MLR models for top 20 language AB test (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/377393 (owner: EBernhardson)
[07:33:11] akosiaris: it is deployment-sca02 now :)
[07:33:28] heh, ok good to know
[07:34:39] akosiaris: and I found some mistakes!
[07:35:10] akosiaris: OK. I'll update and let you know. cg3 like to break many packages, so I'll go bit slow.
[07:35:30] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1019 with full weight (duration: 00m 45s)
[07:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:57] Operations, Performance-Team, monitoring: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3599487 (Krinkle)
[07:43:41] (PS17) Phedenskog: Make values stackable [puppet] - https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902)
[07:48:35] Operations, Release Pipeline, Release-Engineering-Team (Kanban): Provision Docker >= 17.05 on contint1001 - https://phabricator.wikimedia.org/T175293#3599497 (hashar) In apt.wikimedia.org we have: | docker.io | 1.6.2~dfsg1-1~bpo8+1 | http://mirrors.wikimedia.org/debian/ jessie-backports/main amd64 P...
[07:51:06] I didn't add anything to SWAT
[07:52:07] RECOVERY - Disk space on stat1005 is OK: DISK OK
[07:55:36] Operations, ops-eqiad, hardware-requests, Patch-For-Review, Performance-Team (Radar): Decommission osmium.eqiad.wmnet - https://phabricator.wikimedia.org/T175093#3599504 (Krinkle)
[08:01:16] Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3599508 (elukey) >>! In T132256#3598540, @Cmjohnson wrote: > @elukey we finished these...correct? Nope still to do :) Updated l...
[08:04:20] (CR) Hashar: "recheck" [debs/contenttranslation/apertium-cat-srd] - https://gerrit.wikimedia.org/r/377402 (https://phabricator.wikimedia.org/T174987) (owner: KartikMistry)
[08:04:22] (CR) Hashar: "recheck" [debs/contenttranslation/apertium-crh] - https://gerrit.wikimedia.org/r/377390 (https://phabricator.wikimedia.org/T174765) (owner: KartikMistry)
[08:04:25] (CR) Hashar: "recheck" [debs/contenttranslation/apertium-tur] - https://gerrit.wikimedia.org/r/377392 (https://phabricator.wikimedia.org/T174765) (owner: KartikMistry)
[08:05:06] (CR) jerkins-bot: [V: -1] apertium-cat-srd: Initial Debian packaging [debs/contenttranslation/apertium-cat-srd] - https://gerrit.wikimedia.org/r/377402 (https://phabricator.wikimedia.org/T174987) (owner: KartikMistry)
[08:05:38] (CR) jerkins-bot: [V: -1] apertium-tur: Initial Debian packaging [debs/contenttranslation/apertium-tur] - https://gerrit.wikimedia.org/r/377392 (https://phabricator.wikimedia.org/T174765) (owner: KartikMistry)
[08:05:49] (PS2) Muehlenhoff: Stop installing debdeploy-minion for new WMCS VPS instances [puppet] - https://gerrit.wikimedia.org/r/377202
[08:06:49] (CR) Filippo Giunchedi: "LGTM modulo comment needs updating" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/376571 (owner: Thcipriani)
[08:10:51] (CR) Muehlenhoff: [C: 2] Stop installing debdeploy-minion for new WMCS VPS instances [puppet] - https://gerrit.wikimedia.org/r/377202 (owner: Muehlenhoff)
[08:12:23] (PS2) KartikMistry: apertium-tur: Initial Debian packaging [debs/contenttranslation/apertium-tur] - https://gerrit.wikimedia.org/r/377392 (https://phabricator.wikimedia.org/T174765)
[08:20:02] (PS1) Aaron Schulz: Enable logging of post-send DB updates [mediawiki-config] - https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199)
[08:22:09] (CR) Filippo Giunchedi: "LGTM overall, see comment on having 3d2png on imagescalers too" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[08:24:04] (PS1) Hashar: graphite: cleanup servers.* [puppet] - https://gerrit.wikimedia.org/r/377414
[08:24:49] (CR) Hashar: "So that when doing a query for servers.mw* we get rid of obsoletes servers :]" [puppet] - https://gerrit.wikimedia.org/r/377414 (owner: Hashar)
[08:26:23] Operations, ops-eqiad, Analytics, Analytics-Cluster, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3599528 (elukey)
[08:26:25] Operations, Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: kafka-jumbo.cfg partman recipe creation/troubleshooting - https://phabricator.wikimedia.org/T174457#3599527 (elukey) Open>Resolved
[08:28:03] !log upgrading mw1189-mw1208 (API servers) to HHVM-Luasandbox 2.0.14 (along with HHVM restarts)
[08:28:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:42] !log deploying elasticsearch plugins on relforge - T175159
[08:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:56] T175159: Prepare a new deb package for search plugins compatible with elastic 5.5.2 - https://phabricator.wikimedia.org/T175159
[08:38:07] !log deploying elasticsearch plugins on relforge - T158560 (wrong ticket number before)
[08:38:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:19] T158560: Use debian packages instead of salt to deploy elasticsearch plugins - https://phabricator.wikimedia.org/T158560
[08:41:49] RECOVERY - Host kafka-jumbo1001 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[08:42:18] Operations, Traffic, monitoring: prometheus -> grafana stats for per-numa-node meminfo - https://phabricator.wikimedia.org/T175636#3599588 (ema) p:Triage>Normal
[08:43:26] Operations, DBA, Performance-Team, Availability (Multiple-active-datacenters): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3599592 (aaron)
[08:44:57] Operations, DBA, Performance-Team, Availability (Multiple-active-datacenters): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3599592 (aaron)
[08:49:26] Operations, DBA, Performance-Team, Availability (Multiple-active-datacenters): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3599616 (jcrespo) I can help with this, but I will need supervision to understand the...
[08:55:42] (PS6) DCausse: Upgrade plugins to elastic 5.5.2 [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/376477 (https://phabricator.wikimedia.org/T175159)
[09:03:12] Operations, ops-eqiad, Analytics, Analytics-Cluster, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3599635 (ops-monitoring-bot) Script wmf_auto_reimage was launched by volans on sarin.codfw.wmnet for hosts: ``` ['kafka-jumbo1002.e...
[09:03:26] (CR) Gehel: [C: 1] "I'll wait until the 5.3.x plugins are deployed everywhere to merge this CR, build the package and upload it." [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/376477 (https://phabricator.wikimedia.org/T175159) (owner: DCausse)
[09:14:29] Operations, DBA, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3599667 (Gilles)
[09:21:43] jynus: For the watchlist bug, I think the second bullet you make ("Else, if the number of recentchanges is less than M, join recentchanges -> Watchlist") is also status quo, just that with wikidata adding so many RC entries, recentchanges is never less than M anymore, for any sane value of M
[09:22:31] the important bit is the third one :-)
[09:22:39] I only wrote that for completeness
[09:23:37] I think there could be some plans ongoing to do some paging, but I am not sure about that
[09:23:45] some key people is on vacations
[09:24:02] I believe there is, although paging a query using a temporary table doesn't sound too helpful
[09:24:12] what?
[09:24:14] no
[09:24:20] no temporary table
[09:24:24] we do not use those
[09:24:30] we do not even talk about those
[09:24:36] I mean the "using temporary" in the explain
[09:24:40] Operations, ops-eqiad, Analytics, Analytics-Cluster, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3599688 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['kafka-jumbo1002.eqiad.wmnet', 'kafka-jumbo1003.eqiad.wmnet', 'k...
[09:24:47] not an actual created temporary table
[09:24:55] !log upgrading mw1293-mw1205 (image scalers) to HHVM-Luasandbox 2.0.14 (along with HHVM restarts)
[09:25:03] !log upgrading mw1293-mw1295 (image scalers) to HHVM-Luasandbox 2.0.14 (along with HHVM restarts)
[09:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:11] PROBLEM - Check systemd state on kafka-jumbo1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:25:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:19] if we join recentchanges -> watchlist
[09:25:27] there is not implicit temporary table
[09:25:34] on the kafka-jumbo hosts that's me with elukey ^^^
[09:25:40] that is only needed because the extra sort phase
[09:26:07] PROBLEM - Kafka Broker Server on kafka-jumbo1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args Kafka /etc/kafka/server.properties
[09:26:07] PROBLEM - Check systemd state on kafka-jumbo1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:26:43] yes, but in the case of doing the watchlist -> recentchanges join direction, I can't imagine paging is going to help at all
[09:26:48] volans elukey ack
[09:27:01] we are not doing that, read my comment again
[09:27:02] damn, disabled notification, I didn't know it would have paged, sorry
[09:28:10] jynus: Oh you mean in future when doing the "Else, scan recentchanges in inverse timestamp order, paged in small batches of size S, and join in the direction small recentchanges batch" idea?
[09:28:22] Sorry, I thought we were talking about the current situation
[09:28:38] bawolff: https://phabricator.wikimedia.org/P5988
[09:28:45] see, no sort, no temporary table
[09:29:04] the problem there is the large amount of rows scanned on recentchanges
[09:29:09] that is where the paging is useful
[09:29:28] normally you have to scan N * M results
[09:29:42] I think I understand what you mean now
[09:29:56] by making N so small it is almost O(1), you only have M rather than M^2 results
[09:30:10] please feel free to clarify on the ticket for other too
[09:30:13] I believe some other people had plans to do some sort of paging, while keeping the current queries, which is how I got confused over what we were talking about
[09:30:33] It is difficult to do mysqldump my brain to the ticket sometimes
[09:31:15] if we do all queries, it will be the same results as now
[09:31:37] the point is to page so 1) each individual query is very fast so it doesn't timeout
[09:31:55] 2) you stop after gathering enough results so not all queries are needed
[09:32:29] !log upgrading mw2* to HHVM-Luasandbox 2.0.14
[09:32:36] we could even tune a bit the query, let me see
[09:32:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:58] The concern I have with this plan, is does scanning in small batches actually return any results in a reasonable number of batches for actual users?
[09:33:25] I assume when you say small batch size, you mean something on the order of 1000 ?
[09:33:27] !log upgrading deployment servers to HHVM-Luasandbox 2.0.14
[09:33:30] if watchlist for the user is large, it should
[09:33:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:02] in any case, if a user has to wait 1 minute, queries will not fail
[09:34:13] Well I guess I could try and test and see. Fae has an insanely large watchlist on commons so makes a good test case
[09:34:17] I think actually I can make query better
[09:34:22] *the
[09:34:30] on rc slaves
[09:34:32] let me see
[09:36:08] an index on rc_type, rc_id, maybe?
[09:36:14] like we do with revision
[09:36:14] Operations, MediaWiki-Platform-Team, Epic, Performance-Team (Radar), Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support, Q2 goals - https://phabricator.wikimedia.org/T175213#3599706 (Gilles)
[09:36:19] Operations, MediaWiki-Platform-Team, Epic, Performance-Team (Radar), Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support - https://phabricator.wikimedia.org/T175206#3599708 (Gilles)
[09:37:51] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 32 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[09:38:51] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 41 probes of 276 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[09:39:56] (PS6) MarcoAurelio: Lift account creation restrictions for WM Taiwan 10th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/377227 (https://phabricator.wikimedia.org/T175534)
[09:41:40] PROBLEM - Host ripe-atlas-ulsfo IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[09:41:51] mark, paravoid: ^
[09:43:40] (CR) MarcoAurelio: "> Just to check, was this done intentionally?" [mediawiki-config] - https://gerrit.wikimedia.org/r/377274 (owner: MarcoAurelio)
[09:43:45] (PS4) MarcoAurelio: Add Extension:Newsletter permissions to CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/377274
[09:44:14] can that be due to PWIC78398 (in Ashburn)?
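The paging idea discussed above (scan recentchanges newest first in small batches of size S, join each small batch against the user's watchlist, and stop once enough rows are found) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not MediaWiki's actual query code: the function name `paged_watchlist`, the `(rc_id, title)` tuples and the Python set standing in for the watchlist table are all hypothetical, rc_id stands in for rc_timestamp ordering, and each list slice stands in for one small `LIMIT`ed query on recentchanges.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical stand-in for a recentchanges row: (rc_id, page_title).
RCRow = Tuple[int, str]


def paged_watchlist(
    rc_newest_first: Sequence[RCRow],
    watched: Set[str],
    limit: int,
    batch_size: int,
) -> Tuple[List[RCRow], int]:
    """Collect up to `limit` watched changes by scanning recentchanges in small batches.

    `rc_newest_first` must already be ordered newest first (by rc_id here,
    standing in for rc_timestamp). Returns (matches, batches_run).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    matches: List[RCRow] = []
    batches_run = 0
    cursor = 0
    while cursor < len(rc_newest_first) and len(matches) < limit:
        # One small, bounded query, roughly:
        #   ... WHERE rc_id < <last seen rc_id> ORDER BY rc_id DESC LIMIT <batch_size>
        batch = rc_newest_first[cursor:cursor + batch_size]
        cursor += batch_size
        batches_run += 1
        for row in batch:
            # Join the small batch against the watchlist (a set lookup here).
            if row[1] in watched:
                matches.append(row)
                if len(matches) == limit:
                    break
    # The loop ends as soon as enough results are gathered, so later batches never run.
    return matches, batches_run


if __name__ == "__main__":
    rc = [(i, f"Page{i % 10}") for i in range(100, 0, -1)]  # rc_id 100..1, newest first
    rows, batches = paged_watchlist(rc, {"Page3"}, limit=3, batch_size=10)
    print(rows, batches)
```

In the example run, the three newest matches (rc_id 93, 83 and 73) are found after three batches, and the remaining seven batches are never queried. Each batch is bounded by `batch_size`, so no single query grows with the size of recentchanges; the concern raised about sparse watchlists shows up as a larger `batches_run` when few recent changes touch watched pages.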
[09:45:29] (CR) Matthias Mullie: [C: 1] Add 3d2png deploy repo to image scalers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[09:45:30] PROBLEM - Host ripe-atlas-codfw IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[09:45:52] (Abandoned) MarcoAurelio: Update es.wiktionary logo from SVG version [mediawiki-config] - https://gerrit.wikimedia.org/r/374370 (owner: MarcoAurelio)
[09:46:24] Operations, Performance-Team: Move coal from graphite machine(s) - https://phabricator.wikimedia.org/T159354#3599741 (Krinkle)
[09:46:25] (PS4) Muehlenhoff: Remove package declarations for debdeploy-minion/debdeploy-common [puppet] - https://gerrit.wikimedia.org/r/377265
[09:47:03] ema: looking
[09:47:52] (PS16) MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013)
[09:49:13] maint window is already past
[09:49:18] and the bgp session is up
[09:53:11] Operations, ops-eqiad, Analytics, Analytics-Cluster, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3352701 (Volans) For the record they were reimaged correctly, the new reimage script hit a small bug in the post-reimage part, I've...
[09:53:48] mark: ok
[09:54:00] mark: perhaps interestingly, the alerts are only about IPv6
[09:54:00] (PS1) Elukey: network::constants: update IP addresses of the new Kafka hosts [puppet] - https://gerrit.wikimedia.org/r/377417 (https://phabricator.wikimedia.org/T167992)
[09:54:05] yes
[09:55:01] moritzm: whenever you have time --^
[09:55:29] (PS1) Filippo Giunchedi: prometheus: increase jmx_exporter timeout [puppet] - https://gerrit.wikimedia.org/r/377418 (https://phabricator.wikimedia.org/T171772)
[09:56:24] (CR) Filippo Giunchedi: [C: 2] prometheus: increase jmx_exporter timeout [puppet] - https://gerrit.wikimedia.org/r/377418 (https://phabricator.wikimedia.org/T171772) (owner: Filippo Giunchedi)
[09:56:38] Operations, Performance-Team, Patch-For-Review: webpagetest-alerts: Difference in size authenticated - https://phabricator.wikimedia.org/T164209#3599776 (Krinkle) Open>Resolved I can't find the task for that regression, but it was a genuine regression in enwiki-mobile.anon.js size, and was ad...
[09:56:40] weird
[09:56:47] Operations, Performance-Team: webpagetest-alerts: Difference in size authenticated - https://phabricator.wikimedia.org/T164209#3599778 (Krinkle)
[09:57:16] elukey: having a look now
[09:57:58] (PS1) Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] - https://gerrit.wikimedia.org/r/377419 (https://phabricator.wikimedia.org/T170111)
[10:00:30] i can't ping ripe-atlas-ulsfo even from the ulsfo routers
[10:00:32] over ipv6
[10:00:37] no idea how they are managed
[10:03:28] Operations, Performance-Team: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam - https://phabricator.wikimedia.org/T169249#3391697 (Gilles) Is this still a problem?
[10:03:32] the two hosts marked as down on icinga, ripe-atlas-{ulsfo,codfw}, are those the anchors?
[10:05:10] yes
[10:05:11] 2620:0:863:201:198:35:26:244 none incomplete 1 no no ae1.1221
[10:05:22] the router has no nexthop info for it (ulsfo)
[10:06:27] so it seems like something up with those two anchors themselves
[10:06:33] i have no idea how they are managed, remotely perhaps
[10:06:38] they're in sandbox subnets on our side
[10:08:31] librenms doesn't let me in
[10:08:31] fun
[10:08:58] Operations, Goal, Kubernetes, Patch-For-Review, and 2 others: Implement a pod networking policy approach - https://phabricator.wikimedia.org/T170111#3599812 (akosiaris) My first approach had been to just document these things in the puppet repo without even partially actually enforcing them. That...
[10:13:03] Operations, Operations-Software-Development, Patch-For-Review, Technical-Debt: Remove Salt from wmf-auto-reimage / wmf-reimage - https://phabricator.wikimedia.org/T166300#3291472 (ops-monitoring-bot) Script wmf_auto_reimage was launched by volans on sarin.codfw.wmnet for hosts: ``` ['mc1002.eqiad...
[10:13:52] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/377417 (https://phabricator.wikimedia.org/T167992) (owner: Elukey)
[10:13:56] i'm in now
[10:13:59] (CR) Filippo Giunchedi: [C: 1] "LGTM, Matthias can you make it to Thurs puppet swat slot to merge this? i.e. this slot https://wikitech.wikimedia.org/wiki/Deployments#dep" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[10:14:09] but i see 0 evidence of any other problems than the 2 anchors themselves
[10:14:14] (PS2) Elukey: network::constants: update IP addresses of the new Kafka hosts [puppet] - https://gerrit.wikimedia.org/r/377417 (https://phabricator.wikimedia.org/T167992)
[10:14:16] (CR) Muehlenhoff: [C: 2] Remove package declarations for debdeploy-minion/debdeploy-common [puppet] - https://gerrit.wikimedia.org/r/377265 (owner: Muehlenhoff)
[10:14:21] thanks moritzm !
[10:14:22] (PS5) Muehlenhoff: Remove package declarations for debdeploy-minion/debdeploy-common [puppet] - https://gerrit.wikimedia.org/r/377265
[10:15:49] Operations, Traffic, monitoring: prometheus -> grafana stats for per-numa-node meminfo - https://phabricator.wikimedia.org/T175636#3598644 (fgiunchedi) AFAICT the `meminfo_numa` collector has been introduced in node_exporter 0.13 so it is already available across the fleet (we are running 0.14). It s...
[10:16:16] (CR) Matthias Mullie: [C: 1] "Yeah sure, Thursday works for me!" [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[10:16:18] PROBLEM - Disk space on copper is CRITICAL: DISK CRITICAL - /var/lib/docker/containers/680a7c956be0486d20392e113ff61a7a5df1451a77473d750493802ec360be64/shm is not accessible: Permission denied
[10:17:01] (CR) Elukey: [C: 2] network::constants: update IP addresses of the new Kafka hosts [puppet] - https://gerrit.wikimedia.org/r/377417 (https://phabricator.wikimedia.org/T167992) (owner: Elukey)
[10:20:13] (PS6) Muehlenhoff: Remove package declarations for debdeploy-minion/debdeploy-common [puppet] - https://gerrit.wikimedia.org/r/377265
[10:20:30] (CR) Muehlenhoff: [V: 2 C: 2] Remove package declarations for debdeploy-minion/debdeploy-common [puppet] - https://gerrit.wikimedia.org/r/377265 (owner: Muehlenhoff)
[10:20:49] mark: so we've got one service alert (ripe-atlas-eqiad IPv6 ping to eqiad) and two hosts down (ripe-atlas-ulsfo/codfw)
[10:21:54] nothing wrong in esams though
[10:22:13] oh there's no anchor in esams :P
[10:29:20] (CR) Ema: [C: 1] Remove salt grains for LVS [puppet] - https://gerrit.wikimedia.org/r/377291 (owner: Muehlenhoff)
[10:29:48] (PS2) Hashar: aptly: support components for clients [puppet] - https://gerrit.wikimedia.org/r/374813
[10:30:32] (Abandoned) Hashar: contint: aptly server in labs [puppet] - https://gerrit.wikimedia.org/r/374805 (https://phabricator.wikimedia.org/T161882) (owner: Hashar)
[10:31:37] Operations, ops-eqiad, Analytics, Analytics-Cluster, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3599827 (elukey) Proposed new term for the analytics-in4 filter on cr1/cr2 eqiad: ``` term mysql { from { destination-...
[10:32:47] (PS4) Hashar: aptly: support https [puppet] - https://gerrit.wikimedia.org/r/374837
[10:33:20] (PS2) Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] - https://gerrit.wikimedia.org/r/377419 (https://phabricator.wikimedia.org/T170111)
[10:33:40] (CR) Hashar: "I removed the part that switched contint to use https since https://gerrit.wikimedia.org/r/#/c/374805/ got abandoned (we will use reprepo " [puppet] - https://gerrit.wikimedia.org/r/374837 (owner: Hashar)
[10:34:03] (PS1) Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377421 (https://phabricator.wikimedia.org/T170111)
[10:34:33] (Abandoned) Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] - https://gerrit.wikimedia.org/r/377419 (https://phabricator.wikimedia.org/T170111) (owner: Alexandros Kosiaris)
[10:35:28] RECOVERY - Disk space on copper is OK: DISK OK
[10:35:43] _joe_: mobrovac: hello. Can we come around installation of packages for graphoid and other services ? :)
[10:37:15] <_joe_> hashar: later, I have like ~ 10 things in queue already :P
[10:37:24] to me the main question is if this mechanism will be used once we switch to k8s or not
[10:37:36] if not, i don't care, if yes, i do care and want to bike-shed :D
[10:37:43] well it breaks the provision of CI nodes so I can come with a ugly copy paste, but would like to figure out a good solution
[10:37:52] <_joe_> akosiaris: thanks, master removed
[10:38:06] _joe_: ah nice. thanks!
[10:39:19] I hate to be a nuisance, but I'm still unable to delete https://commons.wikimedia.org/wiki/File:Literature_II,_Harutyun_Surkhatian.djvu, the bug https://phabricator.wikimedia.org/T173374 has been sitting for nearly a month now with no activity.
[10:39:51] I guess I will just copy paste the list of packages
[10:40:18] <_joe_> hashar: I can work on a decent solution tonight, if you just do a c/p for now it's better
[10:40:23] (PS1) Muehlenhoff: yubiauth: Use the future parser [puppet] - https://gerrit.wikimedia.org/r/377422
[10:40:31] <_joe_> but I urge you again to include your code into ops/puppet
[10:40:39] <_joe_> else this will happen again
[10:42:21] well now I gotta do the opposite: monkey patch the list of packages in the CI puppet to unblock myself
[10:42:28] but yeah part of them should be "upstreamed"
[10:42:29] PROBLEM - Disk space on copper is CRITICAL: DISK CRITICAL - /var/lib/docker/containers/59e63204c2668d0bf3df83747cd47c2ce09a1036f777662c42ce08526142cb0b/shm is not accessible: Permission denied
[10:43:28] RECOVERY - Disk space on copper is OK: DISK OK
[10:44:28] RECOVERY - Host ripe-atlas-ulsfo IPv6 is UP: PING OK - Packet loss = 0%, RTA = 78.77 ms
[10:45:02] NotASpy: I'll take a look now, can you try again out of curiosity?
[10:45:29] yeah, still not working
[10:47:52] _joe_: mobrovac: I have just copy pasted the list of packages in ci https://gerrit.wikimedia.org/r/#/c/377424/1/dib/puppet/ciimage.pp
[10:47:59] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 16 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[10:48:07] also noted, there's no JPG previews either. They return 404 errors.
[10:48:18] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 89 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[10:48:38] RECOVERY - Host ripe-atlas-codfw IPv6 is UP: PING OK - Packet loss = 0%, RTA = 36.22 ms
[10:51:04] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 9 probes of 276 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[10:53:14] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 8 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[10:55:28] Operations, Operations-Software-Development, Patch-For-Review, Technical-Debt: Remove Salt from wmf-auto-reimage / wmf-reimage - https://phabricator.wikimedia.org/T166300#3599846 (ops-monitoring-bot) Script wmf_auto_reimage was launched by volans on sarin.codfw.wmnet for hosts: ``` ['mc1002.eqiad...
[10:55:39] NotASpy: yeah, I found what's up with it, I'll update the task
[10:56:11] cheers. Do you think there could be more djvu files hiding with the same issue ?
[11:00:42] Operations, media-storage: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3599853 (fgiunchedi) The file is not accessible due to nginx on swift proxy machines yielding `upstream sent too big head...
[11:01:13] NotASpy: could be, a pathological case but valid djvu nevertheless
[11:02:28] (PS2) Muehlenhoff: Remove salt grains for LVS [puppet] - https://gerrit.wikimedia.org/r/377291
[11:02:45] Operations, media-storage, User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3599857 (fgiunchedi)
[11:02:46] that's good news, I was quite sure the uploader had managed to create corrupt djvu files.
[11:03:49] NotASpy: I forced a download of the file but haven't looked into it, I can pass it along if you want to take a look if it is indeed corrupt
[11:05:57] (CR) Muehlenhoff: [C: 2] Remove salt grains for LVS [puppet] - https://gerrit.wikimedia.org/r/377291 (owner: Muehlenhoff)
[11:07:33] don't know if I've anything which would allow me to check for corruption. I'll take a look and see what I can do.
[11:08:34] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:10:32] Operations, Performance-Team: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam - https://phabricator.wikimedia.org/T169249#3599864 (fgiunchedi) @Gilles yeah looks like the spam does recur every now and then {F9479077}
[11:10:46] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[11:14:33] <_joe_> godog: gee that's awful :P
[11:17:20] _joe_: yep, processing arbitrary input FTW
[11:18:11] Operations, media-storage, User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3599866 (fgiunchedi) I've downloaded the file Literature_II,_Harutyun_Surkhatian.djvu to check for c...
[11:39:01] (PS1) Alexandros Kosiaris: Add WMF http_proxies in build.sh [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377432
[11:44:59] (PS2) Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377421 (https://phabricator.wikimedia.org/T170111)
[11:45:01] (PS1) Alexandros Kosiaris: Better logging around json decode errors [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377433
[11:45:03] (PS1) Alexandros Kosiaris: Add copyrights [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377434
[11:45:05] (PS1) Alexandros Kosiaris: Fix dockerfile url to use our own images [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377435
[11:45:41] (PS3) Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377421 (https://phabricator.wikimedia.org/T170111)
[11:45:43] (PS2) Alexandros Kosiaris: Add WMF http_proxies in build.sh [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377432
[11:46:01] (Abandoned) Alexandros Kosiaris: Fix dockerfile url to use our own images [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377435 (owner: Alexandros Kosiaris)
[11:46:12] (Abandoned) Alexandros Kosiaris: Add copyrights [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377434 (owner: Alexandros Kosiaris)
[11:46:32] (Abandoned) Alexandros Kosiaris: Better logging around json decode errors [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377433 (owner: Alexandros Kosiaris)
[11:48:33] mark: ripe anchors are back online apparently
[11:50:36] Operations, DBA, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3599905 (jcrespo)
[11:52:49] yeah
[11:54:20] (PS1) Faidon Liambotis: swift: use !~ instead of ! $title =~ /.../ [puppet] - https://gerrit.wikimedia.org/r/377436
[11:54:22] (PS1) Faidon Liambotis: nutcracker: use validate_numeric() [puppet] - https://gerrit.wikimedia.org/r/377437
[11:54:24] (PS1) Faidon Liambotis: uwsgi: use validate_numeric() [puppet] - https://gerrit.wikimedia.org/r/377438
[11:54:26] (PS1) Faidon Liambotis: statsd_proxy: use validate_numeric() [puppet] - https://gerrit.wikimedia.org/r/377439
[11:54:28] (PS1) Faidon Liambotis: Use String as redis::instance's $name (noop) [puppet] - https://gerrit.wikimedia.org/r/377440
[11:54:30] (PS1) Faidon Liambotis: ganglia: fix class dependencies [puppet] - https://gerrit.wikimedia.org/r/377441
[11:54:32] (PS1) Faidon Liambotis: openstack: move $ssl_settings near the template [puppet] - https://gerrit.wikimedia.org/r/377442
[11:55:07] _joe_: ^^^
[11:55:44] Operations, ops-codfw, DBA: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3599907 (jcrespo) p:High>Low 5 got rebuilt correctly, let's go with **Slot Number: 0** now (much lower priority). It has 12K errors.
[11:55:52] <_joe_> paravoid: <3
[11:55:59] <_joe_> I'll look soon
[11:59:21] (PS1) BBlack: Add NUMA meminfo stats for cache+lvs nodes [puppet] - https://gerrit.wikimedia.org/r/377443 (https://phabricator.wikimedia.org/T175636)
[12:00:02] (PS1) Faidon Liambotis: grafana: quote 'type' as the class' parameter [puppet] - https://gerrit.wikimedia.org/r/377444
[12:01:52] (CR) BBlack: [C: 2] Add NUMA meminfo stats for cache+lvs nodes [puppet] - https://gerrit.wikimedia.org/r/377443 (https://phabricator.wikimedia.org/T175636) (owner: BBlack)
[12:02:27] (PS2) DCausse: [cirrus] Force native script for super noop [mediawiki-config] - https://gerrit.wikimedia.org/r/376027 (https://phabricator.wikimedia.org/T174652)
[12:02:43] gehel: speaking of future parser, any progress with the cassandra stuff?
[12:02:55] I completely skipped over those, because I think you were working on them?
[12:03:17] paravoid: i should ping joe again so that he can tell me I did a crappy job :)
[12:03:28] lol
[12:03:31] (CR) Giuseppe Lavagetto: [C: 1] swift: use !~ instead of ! $title =~ /.../ [puppet] - https://gerrit.wikimedia.org/r/377436 (owner: Faidon Liambotis)
[12:03:38] is there a patch queued or something?
[12:03:48] I have a CR that seems to work, lemme find it
[12:04:01] https://gerrit.wikimedia.org/r/#/c/372124/
[12:04:30] (PS5) Ema: [WIP] stabilize backend storage patterns [puppet] - https://gerrit.wikimedia.org/r/376751 (owner: BBlack)
[12:04:35] oh cool
[12:04:43] Operations, media-storage, User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3599963 (Gilles) We've abandoned X-Content-Dimensions, so I think we need to look at how we can clea...
[12:04:46] (CR) Giuseppe Lavagetto: [C: 1] "LGTM, but see the comment." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/377437 (owner: Faidon Liambotis)
[12:05:14] _joe_: I didn't add that, because we'll have to replace all validate_numerics anyway
[12:05:18] they're deprecated
[12:05:23] and issue warnings as it is
[12:05:37] so purely grepping for "validate_numeric" should suffice
[12:05:38] <_joe_> ok
[12:05:42] but happy to add the comment if you want too :)
[12:05:46] <_joe_> actually validate_*
[12:05:47] <_joe_> :P
[12:05:50] yeah, sure
[12:05:53] <_joe_> but yeah, you're right
[12:06:00] Operations, DBA, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3599981 (jcrespo)
[12:06:05] <_joe_> those can be done in a single sweep
[12:06:17] also validate_integer is the right one here, but I didn't see it being used anywhere else so I avoided it
[12:06:41] basically everywhere we use validate_numeric, we expect integers I think
[12:06:45] but whatever :)
[12:06:49] it's all temporary anyway
[12:08:38] <_joe_> validate_numeric is more flexible IIRC
[12:11:51] validate_numeric accepts Float too, that's the difference AIUI
[12:12:19] (CR) Mobrovac: [C: 1] uwsgi: use validate_numeric() [puppet] - https://gerrit.wikimedia.org/r/377438 (owner: Faidon Liambotis)
[12:14:40] (CR) Giuseppe Lavagetto: [C: 1] statsd_proxy: use validate_numeric() [puppet] - https://gerrit.wikimedia.org/r/377439 (owner: Faidon Liambotis)
[12:28:30] (CR) Giuseppe Lavagetto: [C: -1] openstack: move $ssl_settings near the template (1 comment) [puppet] - https://gerrit.wikimedia.org/r/377442 (owner: Faidon Liambotis)
[12:31:29] Operations, Collection, OfflineContentGenerator, Readers-Web-Backlog, and 2 others: Remove deprecated features from book creator UI - https://phabricator.wikimedia.org/T150917#3600039 (ovasileva)
[12:31:44] (CR) Giuseppe Lavagetto: [C: 1] grafana: quote 'type' as the class' parameter [puppet] - https://gerrit.wikimedia.org/r/377444 (owner: Faidon Liambotis)
[12:32:44] (CR) Muehlenhoff: [C: -1] "All of these three users are not part of any further groups, so it makes sense to disable shell access entirely by moving them to groups:a" [puppet] - https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878) (owner: Elukey)
[12:34:24] (PS2) Faidon Liambotis: openstack: move $ssl_settings near the template [puppet] - https://gerrit.wikimedia.org/r/377442
[12:34:26] (PS2) Faidon Liambotis: grafana: quote 'type' as the class' parameter [puppet] - https://gerrit.wikimedia.org/r/377444
[12:34:28] (PS3) Faidon Liambotis: nagios_common: use the template if empty($content) [puppet] - https://gerrit.wikimedia.org/r/377445
[12:35:07] (CR) Giuseppe Lavagetto: [C: 1] nagios_common: use the template if empty($content) [puppet] - https://gerrit.wikimedia.org/r/377445 (owner: Faidon Liambotis)
[12:35:59] (CR) Giuseppe Lavagetto: [C: 1] service::node: move to puppet 4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[12:36:21] (CR) jerkins-bot: [V: -1] nagios_common: use the template if empty($content) [puppet] - https://gerrit.wikimedia.org/r/377445 (owner: Faidon Liambotis)
[12:37:10] <_joe_> paravoid: I clearly didn't catch the failing tests in nagios_common when I prepared the rakefile :(
[12:37:27] <_joe_> I'll either fix the tests or add the module to the excluded ones for spec running
[12:37:30] what do you mean?
[12:37:36] (PS2) Elukey: Remove access to analytics posix groups for users not needing them. [puppet] - https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878)
[12:37:43] oh you mean these aren't related to my change
[12:37:56] <_joe_> nope
[12:38:07] <_joe_> how could they be?
[12:38:11] <_joe_> let me check though
[12:38:53] (PS1) KartikMistry: apertium-crh-tur: Initial Debian packaging [debs/contenttranslation/apertium-crh-tur] - https://gerrit.wikimedia.org/r/377449 (https://phabricator.wikimedia.org/T174765)
[12:38:59] (CR) jerkins-bot: [V: -1] apertium-crh-tur: Initial Debian packaging [debs/contenttranslation/apertium-crh-tur] - https://gerrit.wikimedia.org/r/377449 (https://phabricator.wikimedia.org/T174765) (owner: KartikMistry)
[12:39:24] (PS1) Aaron Schulz: [WIP] Avoid warnings for invalid lines in reverse-stack mode [puppet] - https://gerrit.wikimedia.org/r/377451 (https://phabricator.wikimedia.org/T169249)
[12:39:32] <_joe_> uhm, actually it's your change I guess?
[12:39:39] <_joe_> how can that be?
[12:39:41] <_joe_> wtf?
[12:39:58] 12:20:36 Unknown function empty at /tmp/cache/puppet/modules/monitoring/spec/fixtures/modules/nagios_common/manifests/contacts.pp:50 on node 42c6e6a22fe4.integration.eqiad.wmflabs
[12:40:48] <_joe_> ah
[12:41:05] <_joe_> ok, that's fixable adding stdlib to the dependencies of nagios_common
[12:41:08] it's a stdlib function, not sure why it wouldn't be visible to rspec?
[12:41:11] oh!
[12:41:13] of course
[12:41:19] <_joe_> because you have a mechanism to do that
[12:41:31] (PS4) Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377421 (https://phabricator.wikimedia.org/T170111)
[12:41:33] (PS3) Alexandros Kosiaris: Add WMF http_proxies in build.sh [calico-k8s-policy-controller] (0.6.0) - https://gerrit.wikimedia.org/r/377432
[12:41:59] nagios_common: "#{source_dir}"
[12:41:59] + stdlib: "../../../../stdlib"
[12:42:05] under modules/nagios_common/.fixtures.yml
[12:42:06] right?
[12:42:49] <_joe_> yes
[12:42:51] <_joe_> IIRC
[12:43:17] yes that's it
[12:43:18] (PS4) Faidon Liambotis: nagios_common: use the template if empty($content) [puppet] - https://gerrit.wikimedia.org/r/377445
[12:43:34] alright
[12:44:13] (CR) BBlack: [C: 1] varnish::instance: fix template attributes scope [puppet] - https://gerrit.wikimedia.org/r/376242 (owner: Ema)
[12:44:31] akosiaris: https://puppet-compiler.wmflabs.org/compiler02/7683/kubernetes2004.codfw.wmnet//index-future.html + https://puppet-compiler.wmflabs.org/compiler02/7683/neon.eqiad.wmnet//index-future.html
[12:44:39] while I'm keeping _joe_ busy :P
[12:45:09] (CR) jerkins-bot: [V: -1] nagios_common: use the template if empty($content) [puppet] - https://gerrit.wikimedia.org/r/377445 (owner: Faidon Liambotis)
[12:46:11] <_joe_> the latter is just a case of undef being treated differently by the two parsers maybe
[12:46:28] the former is k8s::proxy's $master_host being referenced in a template included by k8s::infrastructure_config
[12:46:29] <_joe_> if a parameter is undef, the new parser will not add it to the catalog
[12:46:40] <_joe_> adn k8s::kubelet
[12:46:46] <_joe_> it comes from two places
[12:46:52] <_joe_> I can fix that
[12:46:57] k
[12:47:44] (CR) Muehlenhoff: [C: -1] Remove access to analytics posix groups for users not needing them. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878) (owner: Elukey)
[12:47:55] the stdlib fix didn't work
[12:47:59] PROBLEM - Check size of conntrack table on labnet1001 is CRITICAL: CRITICAL: nf_conntrack is 92 % full
[12:48:13] I'm not sure why the path is ../../../../stdlib and not ../stdlib, I just copied it from modules/nrpe
[12:48:28] akosiaris: ^
[12:48:36] * elukey hides in shame
[12:49:50] (PS3) Elukey: Remove access to analytics posix groups for users not needing them. [puppet] - https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878)
[12:51:00] RECOVERY - Check size of conntrack table on labnet1001 is OK: OK: nf_conntrack is 77 % full
[12:51:11] <_joe_> gehel: the cassandra patch is a behemoth
[12:51:28] <_joe_> but it does exactly what I wanted to do vaguely since I had to put my hands on it
[12:51:43] _joe_: yeah, sorry for the size, but it was hard to split...
[12:51:55] <_joe_> :) yes don't worry
[12:51:59] <_joe_> I was praising your work
[12:52:19] thanks!
[12:52:20] <_joe_> can you imagine doing such a patch without the catalog differ?
[12:52:49] see the number of patches in that change... you have your answer :)
[12:53:36] (PS2) Aaron Schulz: [WIP] Avoid warnings for invalid lines in reverse-stack mode [puppet] - https://gerrit.wikimedia.org/r/377451 (https://phabricator.wikimedia.org/T169249)
[12:53:44] tbh, I'm still going to be scared to merge that one :)
[12:53:55] (PS3) Aaron Schulz: Avoid warnings for invalid lines in reverse-stack mode [puppet] - https://gerrit.wikimedia.org/r/377451 (https://phabricator.wikimedia.org/T169249)
[12:54:20] <_joe_> gehel: I'll rebase it by hand
[12:54:32] damn, good luck!
[12:54:37] <_joe_> heh
[12:54:47] if you have something more pressing, I'm happy to do the rebase
[12:55:06] <_joe_> I just have N patches to review :P
[12:55:26] <_joe_> but this is the last big hurdle to overcome
[12:55:28] _joe_: go do your reviews, I'll do the rebase
[12:55:32] <_joe_> ok
[12:55:34] <_joe_> thanks
[12:59:33] (CR) Giuseppe Lavagetto: [C: 1] "This might not be the most elegant approach (I think we could refactor the whole caches puppet to become a bit less convoluted, once we ha" [puppet] - https://gerrit.wikimedia.org/r/376242 (owner: Ema)
[12:59:35] (PS1) Jcrespo: mariadb: Depool db1059, pool db1097 as api with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/377455 (https://phabricator.wikimedia.org/T175679)
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170912T1300).
[13:00:04] Amir1, tabbycat, and stephanebisson: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:05] present o/
[13:00:12] o/
[13:00:35] hello
[13:01:01] take 2
[13:01:10] take 4 for me
[13:01:38] So it hasn't worked all of yesterday?
[13:01:52] barely on morning swat
[13:01:56] what happened yesterday night?
[13:01:58] no on EU/Evening
[13:02:10] should be fixed *cross fingers*
[13:02:11] cause the deployment calendar says the Eveniing one had to be canceled due to some jenkins issue
[13:03:28] Amir1: https://gerrit.wikimedia.org/r/#/c/376562/ got reverted. Can you handle the cherry pick + changing the Change-Id please ? :)
[13:03:52] Amir1: I have CR+2 the Wikidata change that tweaks some css. Will deploy once it is merged
[13:04:14] stephanebisson: your is in the pipe
[13:04:26] hashar: Thanks. I forgot that
[13:05:04] hashar: yep, let me know when/where to test
[13:05:23] stephanebisson: that would be on mwdebug1001 at some point :]
[13:06:08] (PS1) Ladsgroup: Reduce wikiPageUpdaterDbBatchSize to 20 [mediawiki-config] - https://gerrit.wikimedia.org/r/377458 (https://phabricator.wikimedia.org/T173710)
[13:06:19] hashar: https://gerrit.wikimedia.org/r/377458
[13:06:35] (PS3) Gehel: redis - instance names should be strings in puppet 4 [puppet] - https://gerrit.wikimedia.org/r/369695
[13:06:37] (PS4) Gehel: service::node: move to puppet 4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704)
[13:06:39] (PS25) Gehel: cassandra - puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704)
[13:06:53] tabbycat: same for https://gerrit.wikimedia.org/r/#/c/376886/ and https://gerrit.wikimedia.org/r/#/c/376897/ they got merged and reverted. Can you please cherry pick them with a new change-id ? :)
[13:07:02] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377458 (https://phabricator.wikimedia.org/T173710) (owner: Ladsgroup)
[13:07:19] (CR) jerkins-bot: [V: -1] service::node: move to puppet 4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[13:07:21] hashar: I think I already did that?
[13:07:28] (CR) jerkins-bot: [V: -1] cassandra - puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[13:07:53] tabbycat: I guess that is just https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0September.C2.A012 pointing to the old changes :)
[13:08:22] hashar: sh-t, yep, I just copy-pasted from above
[13:08:35] https://gerrit.wikimedia.org/r/#/c/377275/
[13:08:59] https://gerrit.wikimedia.org/r/#/c/377274/
[13:09:15] (CR) Muehlenhoff: [C: -1] "The user fields need to remain, but with "ensure: absent" and an empty ssh_keys field, e.g. check the entry for jhobs as an example. This " [puppet] - https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878) (owner: Elukey)
[13:09:33] hashar: I can fix the links there in the meanwhile
[13:09:38] but those should be the ones
[13:09:56] Operations, Performance-Team, Patch-For-Review: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam - https://phabricator.wikimedia.org/T169249#3391697 (Gilles) a:aaron
[13:10:36] (Merged) jenkins-bot: Reduce wikiPageUpdaterDbBatchSize to 20 [mediawiki-config] - https://gerrit.wikimedia.org/r/377458 (https://phabricator.wikimedia.org/T173710) (owner: Ladsgroup)
[13:10:45] (CR) jenkins-bot: Reduce wikiPageUpdaterDbBatchSize to 20 [mediawiki-config] - https://gerrit.wikimedia.org/r/377458 (https://phabricator.wikimedia.org/T173710) (owner: Ladsgroup)
[13:11:20] (Abandoned) Gehel: service::node: move to puppet 4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[13:11:35] Operations, monitoring, Release-Engineering-Team (Watching / External), Tracking, Wikimedia-Incident: Tracking: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#3600174 (Gilles)
[13:11:37] Amir1: setting wikiPageUpdaterDbBatchSize = 20
[13:11:45] err setting > syncing
[13:12:02] (PS26) Gehel: cassandra - puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704)
[13:12:13] !log hashar@tin Synchronized wmf-config/Wikibase-production.php: Reduce wikiPageUpdaterDbBatchSize to 20 - T173710 (duration: 00m 45s)
[13:12:13] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377227 (https://phabricator.wikimedia.org/T175534) (owner: MarcoAurelio)
[13:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:25] T173710: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710
[13:12:34] (CR) jerkins-bot: [V: -1] cassandra - puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[13:12:38] hashar: I've fixed the numbers in wikitech
[13:12:43] (PS1) Giuseppe Lavagetto: k8s: fix template scoping [puppet] - https://gerrit.wikimedia.org/r/377459 (https://phabricator.wikimedia.org/T171704)
[13:13:00] hashar: thanks. Not testable
[13:13:05] <_joe_> paravoid: your turn ^^
[13:13:25] <_joe_> Amir1: for some values of "not testable" :P
[13:13:26] heh
[13:13:38] <_joe_> paravoid: I need to check labs before I can merge it
[13:13:41] Amir1: and syncing the css fix up
[13:13:54] <_joe_> I strongly suspect we're using some fucked up thing there
[13:14:12] (Merged) jenkins-bot: Lift account creation restrictions for WM Taiwan 10th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/377227 (https://phabricator.wikimedia.org/T175534) (owner: MarcoAurelio)
[13:14:29] stephanebisson: will sync your change next :]
[13:14:41] Awesome!
[13:14:59] (PS4) Elukey: Remove access to analytics posix groups for users not needing them. [puppet] - https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878)
[13:15:32] Operations, media-storage, User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3600200 (Nick) >>! In T173374#3599866, @fgiunchedi wrote: > I've downloaded the file Literature_II,_...
[13:15:42] !log hashar@tin Synchronized php-1.30.0-wmf.17/extensions/Wikidata/extensions/Wikibase/view/resources/jquery/wikibase/themes/default/jquery.wikibase.entityselector.css: Remove red highlighting on all entity selectors - T175525 (duration: 00m 45s)
[13:15:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:55] T175525: When editing a statement, unedited entity input fields are marked unrecognized once unfocused - https://phabricator.wikimedia.org/T175525
[13:16:39] stephanebisson: and the CentralAuth patch is finally on mwdebug1001 !
[13:16:51] _joe_: I don't think this works
[13:16:54] you mean me?
[13:17:01] I think "require" implies "include" and will fail
[13:17:03] (CR) jenkins-bot: Lift account creation restrictions for WM Taiwan 10th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/377227 (https://phabricator.wikimedia.org/T175534) (owner: MarcoAurelio)
[13:17:16] <_joe_> paravoid: it won't
[13:17:28] <_joe_> paravoid: you need a class to be declared explicitly exactly once
[13:17:36] hashar: I'm testing...
[13:17:38] (PS5) Elukey: Remove access to analytics posix groups for users not needing them. [puppet] - https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878)
[13:17:48] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377274 (owner: MarcoAurelio)
[13:17:59] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377275 (https://phabricator.wikimedia.org/T175356) (owner: MarcoAurelio)
[13:18:02] !log hashar@tin Synchronized wmf-config/throttle.php: Lift account creation restrictions for WM Taiwan 10th anniversary - T175534 (duration: 00m 45s)
[13:18:09] <_joe_> paravoid: https://puppet-compiler.wmflabs.org/compiler02/7814/kubernetes1004.eqiad.wmnet/ and https://puppet-compiler.wmflabs.org/compiler02/7814/index-future.html
[13:18:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:16] T175534: Requesting temporary lift of IP cap for Wikimedia Taiwan 10th anniversary (16 & 17 Sept, 2017) - https://phabricator.wikimedia.org/T175534
[13:18:53] (PS27) Gehel: cassandra - puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704)
[13:19:22] (Merged) jenkins-bot: Add Extension:Newsletter permissions to CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/377274 (owner: MarcoAurelio)
[13:19:50] (CR) jenkins-bot: Add Extension:Newsletter permissions to CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/377274 (owner: MarcoAurelio)
[13:20:38] (Merged) jenkins-bot: Enable WikidataPageBanner for Russian Wikimedia chapter wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/377275 (https://phabricator.wikimedia.org/T175356) (owner: MarcoAurelio)
[13:21:36] tabbycat: for the ruwikimedia page banner, I guess it is all about browsing the site ?
[13:21:44] tabbycat: I am going to add it to mwdebug1001
[13:21:44] tabbycat: I am going to add it to mwdebug1001 [13:21:54] hashar: I think so, please push to mwdebug and I'll test [13:21:55] _joe_: that works only if you define these in the right order I think, but I guess in this case it's ok [13:21:55] (03PS4) 10Aaron Schulz: Avoid perl warnings for invalid lines in reverse-stack mode [puppet] - 10https://gerrit.wikimedia.org/r/377451 (https://phabricator.wikimedia.org/T169249) [13:21:56] !log hashar@tin Synchronized wmf-config/CommonSettings.php: Add Extension:Newsletter permissions to CommonSettings (duration: 00m 45s) [13:22:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:16] I'll check the newsletter stuff by adding them to the steward group [13:22:16] tabbycat: ok it is on mwdebug1001 now [13:22:25] <_joe_> paravoid: that's why I prefer require to include [13:22:33] hashar: just the ruwikimedia, right? the commonsettings is live already? [13:22:37] (03CR) 10Faidon Liambotis: [C: 031] k8s: fix template scoping [puppet] - 10https://gerrit.wikimedia.org/r/377459 (https://phabricator.wikimedia.org/T171704) (owner: 10Giuseppe Lavagetto) [13:22:37] <_joe_> it forces you to have that class defined befor the other [13:22:45] (03CR) 10jenkins-bot: Enable WikidataPageBanner for Russian Wikimedia chapter wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377275 (https://phabricator.wikimedia.org/T175356) (owner: 10MarcoAurelio) [13:22:53] hashar: I see high db lag from mwdebug1001... 
but my patch seems to work [13:23:02] _joe_: "require" is "include" + Class->Class dependency, sadly :( [13:23:13] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878) (owner: 10Elukey) [13:23:37] stephanebisson: I guess , lets push it everywhere :) [13:23:38] _joe_: so if that class parameter had a default value, depending on the order, it might have been included without a parameter [13:23:47] (03CR) 10Elukey: [C: 032] Remove access to analytics posix groups for users not needing them. [puppet] - 10https://gerrit.wikimedia.org/r/377446 (https://phabricator.wikimedia.org/T170878) (owner: 10Elukey) [13:23:51] <_joe_> yes, but my point is it avoids the chance you do end up with unexpected behaviour, as long as you explicitly declare the class before the scope where you require that [13:23:52] _joe_: and then when the next inclusion with a (different) parameter happened, it would conflict and error out [13:24:19] <_joe_> brb [13:24:27] hashar: nothing seems broken at ruwm [13:24:35] I guess you can merge [13:24:52] tabbycat: great thank you [13:25:17] (03PS1) 10Jcrespo: Add new m1 host db2078, enable firewall on all misc services [puppet] - 10https://gerrit.wikimedia.org/r/377460 (https://phabricator.wikimedia.org/T175685) [13:25:33] !log hashar@tin Synchronized php-1.30.0-wmf.17/includes/changes/ChangesListBooleanFilter.php: WLFilters: Respect default values - T174725 (duration: 00m 45s) [13:25:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:47] T174725: Turn off 'classic' ORES highlighting on the Watchlist (with beta) - https://phabricator.wikimedia.org/T174725 [13:26:30] (03PS1) 10Muehlenhoff: Add apache2-dev to package list [puppet] - 10https://gerrit.wikimedia.org/r/377461 [13:26:36] (03PS5) 10Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] (0.6.0) - 10https://gerrit.wikimedia.org/r/377421 
(https://phabricator.wikimedia.org/T170111) [13:26:38] (03PS4) 10Alexandros Kosiaris: Add WMF http_proxies in build.sh [calico-k8s-policy-controller] (0.6.0) - 10https://gerrit.wikimedia.org/r/377432 [13:26:40] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable WikidataPageBanner for Russian Wikimedia chapter wiki - T175356 (duration: 00m 45s) [13:26:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:52] T175356: Enable WikidataPageBanner extension on ru.wikimedia.org - https://phabricator.wikimedia.org/T175356 [13:28:04] Thanks hashar, I'm glad this is finally deployed [13:28:30] tabbycat: and the centralAuth patch, I have yet to find it merged in ( https://gerrit.wikimedia.org/r/#/c/377326/ ) :D [13:28:46] (03CR) 10Muehlenhoff: [C: 032] Add apache2-dev to package list [puppet] - 10https://gerrit.wikimedia.org/r/377461 (owner: 10Muehlenhoff) [13:30:14] (03PS2) 10Jcrespo: Add new m1 host db2078, enable firewall on all misc services [puppet] - 10https://gerrit.wikimedia.org/r/377460 (https://phabricator.wikimedia.org/T175685) [13:30:19] hashar: on php-1.30-wmf.17 [13:30:58] tabbycat: yeah sorry. 
I got confused because git log was showing the author (kunal) and I was expecting your name to show up :D [13:31:19] stephanebisson: thanks for the cherry pick and sorry for the troubles yesterday :( [13:31:23] !log hashar@tin Synchronized php-1.30.0-wmf.17/extensions/CentralAuth/includes/api/ApiSetGlobalAccountStatus.php: API: Unbreak setglobalaccountstatus locked=lock - T175462 (duration: 00m 44s) [13:31:26] so I believe I got everything deploye [13:31:27] d [13:31:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:38] T175462: Inability to lock via setglobalaccountstatus API in mw 1.29 - https://phabricator.wikimedia.org/T175462 [13:34:38] (03PS16) 10Ema: varnish::instance: fix template attributes scope [puppet] - 10https://gerrit.wikimedia.org/r/376242 [13:36:07] (03CR) 10Ema: [C: 032] varnish::instance: fix template attributes scope [puppet] - 10https://gerrit.wikimedia.org/r/376242 (owner: 10Ema) [13:36:34] hashar: tested the commonsettings patch merged right now and it works [13:36:41] will test the API lock right now too [13:37:02] 10Operations, 10Traffic, 10Patch-For-Review: Explicitly limit varnishd transient storage - https://phabricator.wikimedia.org/T164768#3600287 (10BBlack) We're still missing caps for the upload cluster, right? (well and misc, but that case isn't all that important here). I'm a little concerned about the inter... [13:37:28] !log pooling wdqs100[45] - T171210 [13:37:30] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet [13:37:38] (03PS6) 10Alexandros Kosiaris: Support supplying a default egress policy [calico-k8s-policy-controller] (0.6.0) - 10https://gerrit.wikimedia.org/r/377421 (https://phabricator.wikimedia.org/T170111) [13:37:39] 10Operations, 10media-storage, 10User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"."
- https://phabricator.wikimedia.org/T173374#3600290 (10fgiunchedi) >>! In T173374#3599963, @Gilles wrote: > We've abandoned X-Content-Dimensions,... [13:37:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:40] T171210: rack/setup/install wdqs100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T171210 [13:37:40] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs1005.eqiad.wmnet [13:37:40] (03PS5) 10Alexandros Kosiaris: Add WMF http_proxies in build.sh [calico-k8s-policy-controller] (0.6.0) - 10https://gerrit.wikimedia.org/r/377432 [13:37:45] tabbycat: thank you for the extra tests :) [13:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:06] 10Operations, 10media-storage, 10User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3600291 (10fgiunchedi) >>! In T173374#3600200, @Nick wrote: >>>! In T173374#3599866, @fgiunchedi wrote... 
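The include/require exchange between paravoid and _joe_ (13:21 to 13:24) turns on Puppet's class declaration rules. The manifest below is a minimal sketch of that behaviour. The class names `profile::example` and `role::example_consumer` and the `$port` parameter are hypothetical and do not come from the Wikimedia puppet tree.

The sketch shows three things:
- A resource-like declaration with an explicit parameter has to be evaluated first.
- `include` is idempotent and does nothing extra once the class exists.
- `require` does what `include` does and also adds a class-to-class dependency, as paravoid says.

```puppet
# Hypothetical class with a defaulted parameter (illustrative names only).
class profile::example (
  Integer $port = 8080,
) {
  notify { "profile::example port is ${port}": }
}

# Hypothetical consumer class.
class role::example_consumer {
  # include-like declaration: idempotent, so this is a no-op here because
  # the class was already declared (with port => 9090) further down.
  include profile::example

  # require = include + dependency: Class['profile::example'] is applied
  # before Class['role::example_consumer'].
  require profile::example
}

# Resource-like declaration with an explicit parameter. It must be
# evaluated before any include-like declaration of the same class.
class { 'profile::example':
  port => 9090,
}

include role::example_consumer
```

Suppose you swap the last two statements. Then `include profile::example` runs first and declares the class with the default `$port`. The later resource-like declaration then fails compilation with a duplicate declaration error. This is the order-dependent conflict paravoid describes.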
[13:38:10] (03PS1) 10Giuseppe Lavagetto: Add missing secrets [labs/private] - 10https://gerrit.wikimedia.org/r/377465 [13:38:10] 15:37] StewardBot MarcoAurelio locked global account Example~metawiki with the following comment: API test [13:38:20] works as well [13:38:21] :D [13:38:22] <_joe_> paravoid: I'll start merging your patches if that's ok with you [13:38:29] <_joe_> and then run PCC on the whole tree again [13:38:46] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add missing secrets [labs/private] - 10https://gerrit.wikimedia.org/r/377465 (owner: 10Giuseppe Lavagetto) [13:38:47] I was about to say that :) [13:38:58] unfortunately I have to go now [13:39:06] and I'll be back in a couple of hours [13:39:09] 10Operations, 10media-storage, 10User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3600293 (10Gilles) And I guess we can do the cleanup "just" for file types that can be multipage (TIFF... [13:39:13] dr's appointment [13:39:21] <_joe_> ttyl [13:39:22] (03PS28) 10Gehel: cassandra - puppet 4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) [13:41:20] (03PS2) 10Giuseppe Lavagetto: swift: use !~ instead of ! 
$title =~ /.../ [puppet] - 10https://gerrit.wikimedia.org/r/377436 (owner: 10Faidon Liambotis) [13:41:27] (03CR) 10Alexandros Kosiaris: [C: 031] "The entire infrastucture_config class will probably be re-evaluated when we work on authn/authz, seems this is good enough for now" [puppet] - 10https://gerrit.wikimedia.org/r/377459 (https://phabricator.wikimedia.org/T171704) (owner: 10Giuseppe Lavagetto) [13:43:32] (03PS2) 10Gehel: role::elasticsearch::(cirrus|relforge): move to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/374341 (https://phabricator.wikimedia.org/T171704) [13:43:39] paravoid: https://puppet-compiler.wmflabs.org/compiler02/7683/neon.eqiad.wmnet//index-future.html is due to the future parser's hiera function working with undef values [13:43:56] as in it used to not work reliably [13:44:00] let me find the task [13:44:23] https://tickets.puppetlabs.com/browse/PUP-3863 [13:44:38] hmm this is supposedly fixed in 3.8 [13:45:13] 10Operations, 10Performance-Team, 10Thumbor, 10User-fgiunchedi: Remove X-Content-Dimensions for multipage originals - https://phabricator.wikimedia.org/T175689#3600307 (10fgiunchedi) [13:45:14] (03CR) 10Giuseppe Lavagetto: [C: 032] swift: use !~ instead of ! $title =~ /.../ [puppet] - 10https://gerrit.wikimedia.org/r/377436 (owner: 10Faidon Liambotis) [13:45:31] (03PS2) 10Giuseppe Lavagetto: nutcracker: use validate_numeric() [puppet] - 10https://gerrit.wikimedia.org/r/377437 (owner: 10Faidon Liambotis) [13:46:12] ah .. 
finally [13:46:29] my changes to the calico policy controller work fine [13:46:50] <_joe_> :)) [13:47:02] <_joe_> akosiaris: too bad we will need to reimplement them in go [13:47:03] <_joe_> :P [13:47:19] (03CR) 10Giuseppe Lavagetto: [C: 032] nutcracker: use validate_numeric() [puppet] - 10https://gerrit.wikimedia.org/r/377437 (owner: 10Faidon Liambotis) [13:47:20] meh, I 'll just learn go at that point [13:47:26] 10Operations, 10Performance-Team, 10Thumbor, 10User-fgiunchedi: Remove X-Content-Dimensions for multipage originals - https://phabricator.wikimedia.org/T175689#3600327 (10Gilles) a:03Gilles [13:47:34] I am already 1 year late according to my plan [13:49:29] (03CR) 10Gehel: "puppet compiler looks happy on the few hosts I tested: https://puppet-compiler.wmflabs.org/compiler02/7816/" [puppet] - 10https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) (owner: 10Gehel) [13:49:59] <_joe_> akosiaris: me too! [13:51:18] (03PS2) 10Giuseppe Lavagetto: uwsgi: use validate_numeric() [puppet] - 10https://gerrit.wikimedia.org/r/377438 (owner: 10Faidon Liambotis) [13:52:35] (03CR) 10Giuseppe Lavagetto: [C: 032] uwsgi: use validate_numeric() [puppet] - 10https://gerrit.wikimedia.org/r/377438 (owner: 10Faidon Liambotis) [13:52:37] _joe_: I still have one strange diff with the future parser: https://puppet-compiler.wmflabs.org/compiler02/7816/aqs1004.eqiad.wmnet/index-future.html [13:52:50] * gehel has been staring at that code for too long, needs new eyes [13:52:58] <_joe_> uh [13:53:11] <_joe_> that seems something that gets fixed by the future parser :P [13:53:14] <_joe_> I'll look [13:53:20] lol [13:53:31] so ... we had no monitoring for a and b ? [13:53:32] or something that I broke with the current parser ... 
[13:53:43] (03PS2) 10Giuseppe Lavagetto: statsd_proxy: use validate_numeric() [puppet] - 10https://gerrit.wikimedia.org/r/377439 (owner: 10Faidon Liambotis) [13:54:09] <_joe_> yeah it's a complex patch :) [13:54:22] (03CR) 10Giuseppe Lavagetto: [C: 032] statsd_proxy: use validate_numeric() [puppet] - 10https://gerrit.wikimedia.org/r/377439 (owner: 10Faidon Liambotis) [13:55:14] _joe_: thanks for the help [13:55:25] <_joe_> I did nothing :P [13:55:39] yeah, but I'm counting on you to do something :) [13:55:48] (03PS4) 10Gehel: redis - instance names should be strings in puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/369695 [13:56:25] (03CR) 10Gehel: [C: 032] redis - instance names should be strings in puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/369695 (owner: 10Gehel) [13:56:29] Is swat done? [13:56:50] (03PS2) 10Jcrespo: mariadb: Depool db1059, pool db1097 as api with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377455 (https://phabricator.wikimedia.org/T175679) [13:58:20] <_joe_> gehel: ouch I had a competing patch to merge [13:58:32] _joe_: oops, sorry... [13:58:45] <_joe_> can you revert? paravoid's patch was more complete :) [13:59:02] sure [13:59:09] <_joe_> https://gerrit.wikimedia.org/r/#/c/377440/1 [13:59:14] <_joe_> well I can rebase instead [13:59:16] <_joe_> I'll do that [13:59:27] ok, thanks! [13:59:54] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1059, pool db1097 as api with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377455 (https://phabricator.wikimedia.org/T175679) (owner: 10Jcrespo) [14:00:10] yeah, my patch was only looking at my small part of the world... [14:00:18] hashar: just to double check, SWAT is over right? [14:00:28] volans: yes ! [14:00:33] thx! 
[14:01:31] (03Merged) 10jenkins-bot: mariadb: Depool db1059, pool db1097 as api with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377455 (https://phabricator.wikimedia.org/T175679) (owner: 10Jcrespo) [14:01:41] (03CR) 10jenkins-bot: mariadb: Depool db1059, pool db1097 as api with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377455 (https://phabricator.wikimedia.org/T175679) (owner: 10Jcrespo) [14:02:18] !log switching elasticsearch plugin deployment to .deb instead of scap on cirrus/codfw - T158560 [14:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:32] T158560: Use debian packages instead of salt to deploy elasticsearch plugins - https://phabricator.wikimedia.org/T158560 [14:04:35] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1059, pool db1097 with low load (duration: 00m 45s) [14:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:51] (03CR) 10Bmansurov: [C: 031] Remove unnecessary `id` attributes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377406 (https://phabricator.wikimedia.org/T175670) (owner: 10VolkerE) [14:04:58] (03PS2) 10Giuseppe Lavagetto: Use String as redis::instance's $name (noop) [puppet] - 10https://gerrit.wikimedia.org/r/377440 (owner: 10Faidon Liambotis) [14:05:59] !log restarting elasticsearch on elastic2001 to validate new plugin deployment - T158560 [14:06:06] (03CR) 10Giuseppe Lavagetto: [C: 032] Use String as redis::instance's $name (noop) [puppet] - 10https://gerrit.wikimedia.org/r/377440 (owner: 10Faidon Liambotis) [14:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:14] !log testing reimage of mw2100 - T166300 [14:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:27] T166300: Remove Salt from wmf-auto-reimage / wmf-reimage - https://phabricator.wikimedia.org/T166300 [14:07:25] !log switching elasticsearch plugin deployment 
to .deb instead of scap on cirrus/eqiad - T158560 [14:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:37] T158560: Use debian packages instead of salt to deploy elasticsearch plugins - https://phabricator.wikimedia.org/T158560 [14:07:45] (03PS1) 10Jcrespo: mariadb: Remove all references of db1059 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377468 (https://phabricator.wikimedia.org/T175679) [14:08:01] (03PS1) 10Muehlenhoff: Create a repository component component/ci [puppet] - 10https://gerrit.wikimedia.org/r/377469 [14:08:28] (03PS4) 10Gehel: elasticsearch - deploy plugins with debian package instead of trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/375812 (https://phabricator.wikimedia.org/T158560) [14:09:18] 10Operations, 10Goal, 10Kubernetes, 10Patch-For-Review, and 2 others: Implement a pod networking policy approach - https://phabricator.wikimedia.org/T170111#3600392 (10akosiaris) > From here things become a little bit more obscure as Updates to the ConfigMap will make it to the policy controller and it wil... 
[14:09:28] (03PS1) 10Alexandros Kosiaris: Ship the default egress policy [puppet] - 10https://gerrit.wikimedia.org/r/377470 (https://phabricator.wikimedia.org/T170111) [14:09:38] (03PS2) 10Giuseppe Lavagetto: ganglia: fix class dependencies [puppet] - 10https://gerrit.wikimedia.org/r/377441 (owner: 10Faidon Liambotis) [14:10:16] !log restarting elasticsearch on elastic1020 to validate new plugin deployment - T158560 [14:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:41] (03PS2) 10Jcrespo: install_server: Remove db1069 & dbstore2001 from the list of reimaging [puppet] - 10https://gerrit.wikimedia.org/r/373309 [14:11:46] (03CR) 10Jcrespo: [C: 032] install_server: Remove db1069 & dbstore2001 from the list of reimaging [puppet] - 10https://gerrit.wikimedia.org/r/373309 (owner: 10Jcrespo) [14:12:52] 10Operations, 10Operations-Software-Development, 10Patch-For-Review, 10Technical-Debt: Remove Salt from wmf-auto-reimage / wmf-reimage - https://phabricator.wikimedia.org/T166300#3600402 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by volans on sarin.codfw.wmnet for hosts: ``` mw2100.codfw.w... [14:12:57] 10Operations, 10Operations-Software-Development, 10Patch-For-Review, 10Technical-Debt: Remove Salt from wmf-auto-reimage / wmf-reimage - https://phabricator.wikimedia.org/T166300#3600404 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw2100.codfw.wmnet'] ``` Of which those **FAILED**: ``` [... 
[14:13:58] (03CR) 10Jcrespo: [C: 032] mariadb: Remove all references of db1059 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377468 (https://phabricator.wikimedia.org/T175679) (owner: 10Jcrespo) [14:15:38] (03Merged) 10jenkins-bot: mariadb: Remove all references of db1059 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377468 (https://phabricator.wikimedia.org/T175679) (owner: 10Jcrespo) [14:15:52] (03PS1) 10Elukey: admin::data.yaml: Set cwdent to ldap user only [puppet] - 10https://gerrit.wikimedia.org/r/377472 (https://phabricator.wikimedia.org/T170878) [14:16:31] (03CR) 10jenkins-bot: mariadb: Remove all references of db1059 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377468 (https://phabricator.wikimedia.org/T175679) (owner: 10Jcrespo) [14:16:46] (03PS1) 10MarcoAurelio: New 'abusefilter-helper' configuration for en.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377473 (https://phabricator.wikimedia.org/T175684) [14:17:11] (03PS2) 10MarcoAurelio: New 'abusefilter-helper' configuration for en.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377473 (https://phabricator.wikimedia.org/T175684) [14:18:35] !log jynus@tin Synchronized wmf-config/db-codfw.php: Remove all references to db1059 (duration: 00m 45s) [14:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:28] 10Operations, 10Operations-Software-Development, 10Patch-For-Review, 10Technical-Debt: Remove Salt from wmf-auto-reimage / wmf-reimage - https://phabricator.wikimedia.org/T166300#3600433 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by volans on sarin.codfw.wmnet for hosts: ``` mw2100.codfw.w... 
[14:19:35] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Remove all references to db1059 (duration: 00m 44s) [14:19:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:26] (03PS3) 10Giuseppe Lavagetto: ganglia: fix class dependencies [puppet] - 10https://gerrit.wikimedia.org/r/377441 (owner: 10Faidon Liambotis) [14:22:23] (03PS3) 10Jcrespo: Add new m1 host db2078, enable firewall on all misc services [puppet] - 10https://gerrit.wikimedia.org/r/377460 (https://phabricator.wikimedia.org/T175685) [14:22:25] (03PS1) 10Jcrespo: mariadb: Move db1059 from mediawiki to misc (m3) [puppet] - 10https://gerrit.wikimedia.org/r/377474 (https://phabricator.wikimedia.org/T175679) [14:23:05] (03PS5) 10Gehel: elasticsearch - deploy plugins with debian package instead of trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/375812 (https://phabricator.wikimedia.org/T158560) [14:23:14] (03CR) 10Giuseppe Lavagetto: [C: 032] ganglia: fix class dependencies [puppet] - 10https://gerrit.wikimedia.org/r/377441 (owner: 10Faidon Liambotis) [14:23:31] (03PS3) 10Giuseppe Lavagetto: openstack: move $ssl_settings near the template [puppet] - 10https://gerrit.wikimedia.org/r/377442 (owner: 10Faidon Liambotis) [14:23:59] (03CR) 10Giuseppe Lavagetto: [C: 032] openstack: move $ssl_settings near the template [puppet] - 10https://gerrit.wikimedia.org/r/377442 (owner: 10Faidon Liambotis) [14:25:26] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Cluster, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3363320 (10akosiaris) LGTM. Minor nitpick: I love comments about which hostname IPs match to . e.g. ``` term puppet { from {... 
[14:25:58] (03PS1) 10Hashar: nodepool: remove trusty image [puppet] - 10https://gerrit.wikimedia.org/r/377476 (https://phabricator.wikimedia.org/T175696) [14:26:04] 10Operations, 10Traffic, 10Patch-For-Review: Explicitly limit varnishd transient storage - https://phabricator.wikimedia.org/T164768#3600514 (10ema) >>! In T164768#3600287, @BBlack wrote: > We're still missing caps for the upload cluster, right? (well and misc, but that case isn't all that important here).... [14:26:59] (03CR) 10Hashar: [C: 031] "Once Nodepool catch up with that new configuration, I will delete the leftover trusty instances/images." [puppet] - 10https://gerrit.wikimedia.org/r/377476 (https://phabricator.wikimedia.org/T175696) (owner: 10Hashar) [14:27:42] (03PS2) 10Jcrespo: mariadb: Move db1059 from mediawiki to misc (m3) [puppet] - 10https://gerrit.wikimedia.org/r/377474 (https://phabricator.wikimedia.org/T175679) [14:28:14] (03PS6) 10Gehel: elasticsearch - deploy plugins with debian package instead of trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/375812 (https://phabricator.wikimedia.org/T158560) [14:28:55] (03PS3) 10Giuseppe Lavagetto: grafana: quote 'type' as the class' parameter [puppet] - 10https://gerrit.wikimedia.org/r/377444 (owner: 10Faidon Liambotis) [14:29:43] (03CR) 10Giuseppe Lavagetto: [C: 032] grafana: quote 'type' as the class' parameter [puppet] - 10https://gerrit.wikimedia.org/r/377444 (owner: 10Faidon Liambotis) [14:32:25] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/377472 (https://phabricator.wikimedia.org/T170878) (owner: 10Elukey) [14:34:09] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2087253 [14:34:18] 10Operations, 10ops-eqiad, 10DC-Ops: scs-c1-eqiad unresponsive - https://phabricator.wikimedia.org/T175625#3600549 (10Cmjohnson) scs-c1-eqiad is dead! No power, swapped power cable, tried different power port. We need to replace this ASAP. 
[14:35:28] 10Operations, 10Goal, 10Kubernetes, 10Patch-For-Review, and 2 others: Implement a pod networking policy approach - https://phabricator.wikimedia.org/T170111#3600554 (10akosiaris) This is practically done. I 'll wait a bit for some review on the above commits before merging them. Those commits above impleme... [14:38:34] 10Operations, 10Goal, 10Kubernetes, 10Patch-For-Review, and 2 others: Implement a pod networking policy approach - https://phabricator.wikimedia.org/T170111#3600569 (10akosiaris) [14:38:46] (03PS5) 10Giuseppe Lavagetto: nagios_common: use the template if empty($content) [puppet] - 10https://gerrit.wikimedia.org/r/377445 (owner: 10Faidon Liambotis) [14:40:27] (03CR) 10Giuseppe Lavagetto: [C: 032] nagios_common: use the template if empty($content) [puppet] - 10https://gerrit.wikimedia.org/r/377445 (owner: 10Faidon Liambotis) [14:40:36] (03PS2) 10Muehlenhoff: Create a repository component component/ci [puppet] - 10https://gerrit.wikimedia.org/r/377469 [14:41:29] (03PS2) 10Alexandros Kosiaris: Ship the default egress policy [puppet] - 10https://gerrit.wikimedia.org/r/377470 (https://phabricator.wikimedia.org/T170111) [14:42:56] !log cp1074 - backend restart, mailbox lag [14:43:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:46] !log add kafka-jumbo IPs to the kafka term of the analytics-in4 filter on cr1/cr2 eqiad [14:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:04] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [14:45:33] (03CR) 10Muehlenhoff: [C: 032] Create a repository component component/ci [puppet] - 10https://gerrit.wikimedia.org/r/377469 (owner: 10Muehlenhoff) [14:48:47] (03PS19) 10Rush: openstack: designate as module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/376848 (https://phabricator.wikimedia.org/T171494) [14:49:23] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Cluster, and 2 others: 
rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3600621 (10elukey) Added them with annotations! [14:49:46] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3600623 (10Papaul) Disk replacement in slot 0 complete [14:50:20] 10Operations, 10TechCom-RfC, 10Traffic, 10Wikimedia-Developer-Summit-2016, and 5 others: RFC: API-driven web front-end - https://phabricator.wikimedia.org/T111588#3600637 (10Krinkle) [14:50:41] (03CR) 10DCausse: [C: 031] elasticsearch - deploy plugins with debian package instead of trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/375812 (https://phabricator.wikimedia.org/T158560) (owner: 10Gehel) [14:51:48] (03PS1) 10Jcrespo: mariadb: Microsecond view bug fixed, removing workaround [puppet] - 10https://gerrit.wikimedia.org/r/377480 [14:51:50] (03CR) 10Filippo Giunchedi: "LGTM overall, see inline for naming bikeshed. Also it looks like in PCC there are a few "requires" lost on some /etc directories, though t" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) (owner: 10Gehel) [14:52:02] (03PS2) 10Jcrespo: mariadb: Microsecond view bug fixed, removing workaround [puppet] - 10https://gerrit.wikimedia.org/r/377480 [14:52:32] (03PS3) 10Jcrespo: mariadb: Microsecond view bug fixed, removing workaround [puppet] - 10https://gerrit.wikimedia.org/r/377480 [14:52:59] (03CR) 10Gehel: "The missing "require" are actually auto required, so they were useless in the first place..." 
[puppet] - 10https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) (owner: 10Gehel) [14:53:06] !log shutting down mw2256 for maintenance [14:53:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:36] (03PS6) 10Ema: [WIP] stabilize backend storage patterns [puppet] - 10https://gerrit.wikimedia.org/r/376751 (owner: 10BBlack) [14:55:49] * elukey cheers for papaul fighting with mw2256 [14:56:02] (03CR) 10Jcrespo: [C: 032] mariadb: Microsecond view bug fixed, removing workaround [puppet] - 10https://gerrit.wikimedia.org/r/377480 (owner: 10Jcrespo) [14:59:23] (03Draft1) 10MarcoAurelio: Temporary lift account creation limits for WM United Kingdom workshop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377481 (https://phabricator.wikimedia.org/T175700) [14:59:32] (03PS2) 10MarcoAurelio: Temporary lift account creation limits for WM United Kingdom workshop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377481 (https://phabricator.wikimedia.org/T175700) [15:01:18] (03CR) 10MarcoAurelio: "Urbanecm: I am not sure about the +1:00 would work as requested. Can you please review this patch set? Thank you." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377481 (https://phabricator.wikimedia.org/T175700) (owner: 10MarcoAurelio) [15:02:14] godog: can I get a review of https://gerrit.wikimedia.org/r/#/c/377332/ ? and/or thoughts about when to deploy? [15:03:34] (03Abandoned) 10Jcrespo: [WIP]mariadb: Include a new option "socket" for all servers [puppet] - 10https://gerrit.wikimedia.org/r/339004 (owner: 10Jcrespo) [15:04:04] !log uploaded php5.5 5.5.38 to component/ci of jessie-wikimedia (for use in CI) [15:04:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:33] andrewbogott: yep, was the puppet compiler happy about it? [15:04:56] godog: I tried it on a labs VM and it worked ok; haven't run it through the compiler [15:05:05] I can do that now, hang on... 
[15:06:02] !log disable puppet for lab* things for designate refactor rollout (delayed a few minutes for a meeting) [15:06:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:00] (03CR) 10Elukey: [C: 032] admin::data.yaml: Set cwdent to ldap user only [puppet] - 10https://gerrit.wikimedia.org/r/377472 (https://phabricator.wikimedia.org/T170878) (owner: 10Elukey) [15:07:06] (03PS2) 10Elukey: admin::data.yaml: Set cwdent to ldap user only [puppet] - 10https://gerrit.wikimedia.org/r/377472 (https://phabricator.wikimedia.org/T170878) [15:07:20] 10Operations, 10ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175704#3600716 (10ops-monitoring-bot) [15:07:34] (03PS4) 10Jcrespo: Add new m1 host db2078, enable firewall on all misc services [puppet] - 10https://gerrit.wikimedia.org/r/377460 (https://phabricator.wikimedia.org/T175685) [15:07:36] (03PS3) 10Jcrespo: mariadb: Move db1059 from mediawiki to misc (m3) [puppet] - 10https://gerrit.wikimedia.org/r/377474 (https://phabricator.wikimedia.org/T175679) [15:07:38] (03PS1) 10Jcrespo: mariadb: Fix typo on view creation query (missing `) [puppet] - 10https://gerrit.wikimedia.org/r/377484 [15:08:07] ^it is complaining, not sure if it is the substitution, a new thing or the rebuild [15:09:12] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3600731 (10jcrespo) [15:09:14] 10Operations, 10ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175704#3600733 (10jcrespo) [15:09:22] (03CR) 10Krinkle: "Regarding sslNeg bugging nav timing, this gets worse with Firefox 56 implementing the prop, too:" [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [15:10:43] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3587259 (10jcrespo) Still on Firmware state: Rebuild, we will wait a bit for the next
one. (I am a bit more cautions than I have to be due to the RAID 10 because the disks are not new, so there is a change f... [15:11:27] (03PS2) 10Jcrespo: mariadb: Fix typo on view creation query (missing `) [puppet] - 10https://gerrit.wikimedia.org/r/377484 [15:15:23] (03CR) 10Filippo Giunchedi: "Overall LGTM, couple of comments inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/377332 (https://phabricator.wikimedia.org/T151009) (owner: 10Andrew Bogott) [15:19:21] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3600771 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by bblack on neodymium.eqiad.wmnet for hosts: ``` ['lvs1007.eqiad.wmnet'] ``` The log can be found in `/var/log/wm... [15:20:54] (03CR) 10Andrew Bogott: prometheus::web to apache (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377332 (https://phabricator.wikimedia.org/T151009) (owner: 10Andrew Bogott) [15:21:41] (03CR) 10Jcrespo: [C: 032] mariadb: Fix typo on view creation query (missing `) [puppet] - 10https://gerrit.wikimedia.org/r/377484 (owner: 10Jcrespo) [15:22:10] 10Operations, 10Operations-Software-Development, 10Patch-For-Review, 10Technical-Debt: Remove Salt from wmf-auto-reimage / wmf-reimage - https://phabricator.wikimedia.org/T166300#3600800 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw2100.codfw.wmnet'] ``` Of which those **FAILED**: ``` [... [15:22:43] anyone notice icinga dropped out and never came back? [15:24:02] hmm yeah, gone for a while now [15:24:09] 14:53 -!- icinga-wm [~icinga-wm@einsteinium.wikimedia.org] has quit [Ping timeout: 240 seconds] [15:24:17] 31 mins? 
[15:25:48] there was a nagios-related commit not that long before it [15:25:52] https://gerrit.wikimedia.org/r/#/c/377445/ [15:26:00] (03CR) 10Filippo Giunchedi: prometheus::web to apache (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377332 (https://phabricator.wikimedia.org/T151009) (owner: 10Andrew Bogott) [15:26:21] doesn' [15:26:29] doesn't seem likely to be a problematic one, though [15:28:56] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3600824 (10Cmjohnson) The CPU in slot 2 has been replaced and racadm log cleared. Please let me know if additional problems pop up. Return shipping info of old part USPS 92... [15:31:16] (03CR) 10Thcipriani: [C: 031] Remove salt grains from app server canaries [puppet] - 10https://gerrit.wikimedia.org/r/377222 (owner: 10Muehlenhoff) [15:31:27] (03PS7) 10Gehel: elasticsearch - deploy plugins with debian package instead of trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/375812 (https://phabricator.wikimedia.org/T158560) [15:33:36] bblack: probably unrelated but my tests with the reimage script made a puppet run on einsteinium that completed successfully at 15:17:54 [15:33:49] (03CR) 10Gehel: [C: 032] elasticsearch - deploy plugins with debian package instead of trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/375812 (https://phabricator.wikimedia.org/T158560) (owner: 10Gehel) [15:34:03] it might be one of those cases in which the process is still up but disconnected [15:34:06] and doesn't get restarted [15:34:53] (03PS1) 10Jcrespo: mariadb: Set db1097 as the main api server for s4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377489 [15:35:39] !log restart elasticsearch on relforge (last check of new plugin deployment) - T158560 [15:35:46] (03PS3) 10EBernhardson: Setup Cirrus MLR models for top 20 language AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377393 [15:35:49] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:52] T158560: Use debian packages instead of salt to deploy elasticsearch plugins - https://phabricator.wikimedia.org/T158560 [15:35:53] (03PS2) 10Giuseppe Lavagetto: k8s: fix template scoping [puppet] - 10https://gerrit.wikimedia.org/r/377459 (https://phabricator.wikimedia.org/T171704) [15:37:20] (03CR) 10jerkins-bot: [V: 04-1] Setup Cirrus MLR models for top 20 language AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377393 (owner: 10EBernhardson) [15:37:48] (03PS4) 10EBernhardson: Setup Cirrus MLR models for top 20 language AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377393 [15:37:56] (03PS4) 10Jcrespo: mariadb: Move db1059 from mediawiki to misc (m3) [puppet] - 10https://gerrit.wikimedia.org/r/377474 (https://phabricator.wikimedia.org/T175679) [15:39:23] (03PS5) 10Jcrespo: mariadb: Move db1059 from mediawiki to misc (m3) [puppet] - 10https://gerrit.wikimedia.org/r/377474 (https://phabricator.wikimedia.org/T175679) [15:39:26] (03CR) 10jerkins-bot: [V: 04-1] Setup Cirrus MLR models for top 20 language AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377393 (owner: 10EBernhardson) [15:40:18] (03CR) 10Giuseppe Lavagetto: [C: 032] k8s: fix template scoping [puppet] - 10https://gerrit.wikimedia.org/r/377459 (https://phabricator.wikimedia.org/T171704) (owner: 10Giuseppe Lavagetto) [15:40:21] (03PS5) 10EBernhardson: Setup Cirrus MLR models for top 20 language AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377393 [15:40:28] bblack: can you restart ircecho? [15:40:33] bblack: that are seconds [15:40:45] oh, sry [15:42:12] I'm tied up on a serial console at present [15:42:48] 10Operations, 10Ops-Access-Requests, 10Analytics, 10Research, and 2 others: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3482327 (10elukey) Added the user `shiladsen` to the `nda` LDAP group to allow yarn/pivot/etc.. access. 
[15:43:02] Operations, Puppet, Patch-For-Review, User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3600865 (Joe) We did a lot of work today on this, and I am thus running a new puppet compiler full run, which can be found here https://puppet-compiler.wmflabs.org/co...
[15:43:50] <_joe_> !log restarted ircecho on einsteinium
[15:43:58] <_joe_> Sagan: ^^
[15:44:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:40] (CR) Mforns: Add cron to purge old mediawiki data snapshots (2 comments) [puppet] - https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) (owner: Nuria)
[15:48:40] (PS6) Jcrespo: mariadb: Move db1059 from mediawiki to misc (m3) [puppet] - https://gerrit.wikimedia.org/r/377474 (https://phabricator.wikimedia.org/T175679)
[15:49:56] (PS1) Thcipriani: CI: install docker-ce from download.docker.com [puppet] - https://gerrit.wikimedia.org/r/377492 (https://phabricator.wikimedia.org/T175293)
[15:51:02] (PS1) Giuseppe Lavagetto: role::snapshot::common: properly scope included classes [puppet] - https://gerrit.wikimedia.org/r/377493 (https://phabricator.wikimedia.org/T171704)
[15:55:06] Operations, ops-eqiad, Analytics, Analytics-Cluster, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3600884 (elukey) Cleaned up placeholders lvm partitions, now the next steps are: 1) decide a TLS port for the Kafka cluster and wh...
[15:56:29] Operations, ops-eqiad, DC-Ops: scs-c1-eqiad unresponsive - https://phabricator.wikimedia.org/T175625#3600890 (RobH) a:Cmjohnson>RobH After chatting with sales to determine the warranty period, turns out its 4 years for opengear without any additional warranties. I've opened a support case vi...
[15:59:06] (CR) Jcrespo: [C: 2] "https://puppet-compiler.wmflabs.org/compiler03/7826/db1059.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/377474 (https://phabricator.wikimedia.org/T175679) (owner: Jcrespo)
[16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170912T1600).
[16:00:05] Niharika, twentyafterfour, and Amir1: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:10] o/
[16:01:16] Whoops. twentyafterfour I'm still on the way to office. Will be there in about twenty minutes. I misread the time as 10 PST.
[16:01:17] Hello puppet swat :)
[16:01:38] Niharika: ok
[16:01:41] godog: you have a cherry-picked patch on labs which breaks: https://gerrit.wikimedia.org/r/#/c/361648/
[16:01:53] godog: want me to rebase it? Is it still needed?
[16:03:01] Niharika: ok
[16:03:44] (PS2) 20after4: Scap: Allow phabricator as a source [puppet] - https://gerrit.wikimedia.org/r/376571 (owner: Thcipriani)
[16:03:57] gehel: yes please rebase it if it isn't too nasty, otherwise I can take a look tomorrow to get rid of it
[16:04:08] (CR) 20after4: [C: 1] Scap: Allow phabricator as a source [puppet] - https://gerrit.wikimedia.org/r/376571 (owner: Thcipriani)
[16:04:17] godog: ok, I'll try (I have no idea what this does, so we'll see...)
[16:04:49] so puppet swat, I'll start with Amir1's patch since it is easy and move on to the rest
[16:05:00] twentyafterfour: You could take care of puppet steps on https://phabricator.wikimedia.org/T129134#3589830 (you might need to read some comments ahead of it)
[16:05:18] no_justification: standby for gerrit wmf logo revert :P
[16:05:49] Gerrit got a custom logo? :)
[16:05:53] (PS2) Filippo Giunchedi: Use new logo of WMF for gerrit [puppet] - https://gerrit.wikimedia.org/r/377049 (https://phabricator.wikimedia.org/T174576) (owner: Ladsgroup)
[16:06:25] Operations, Ops-Access-Requests, Analytics, Research, and 2 others: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3600914 (Shilad) Indeed, I now have Yarn access! Thanks @elukey!
[16:06:28] aye, the guideline-based one
[16:06:34] (CR) Filippo Giunchedi: [C: 2] Use new logo of WMF for gerrit [puppet] - https://gerrit.wikimedia.org/r/377049 (https://phabricator.wikimedia.org/T174576) (owner: Ladsgroup)
[16:06:43] Neat.
[16:07:58] (CR) 20after4: [C: 1] Deploy scholarships with scap3 [puppet] - https://gerrit.wikimedia.org/r/326461 (https://phabricator.wikimedia.org/T129134) (owner: Niharika29)
[16:08:24] (PS7) Gehel: swift: use implicit /dev/swift prefix for swift devices [puppet] - https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) (owner: Filippo Giunchedi)
[16:08:29] Amir1: nope, I see the logo too high
[16:08:45] (CR) 20after4: [C: 1] Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - https://gerrit.wikimedia.org/r/377304 (owner: Chad)
[16:08:59] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Must pass master_host to Class[K8s::Infrastructure_config] at /etc/puppet/modules/k8s/manifests/proxy.pp:6 on node tools-bastion-03.tools.eqiad.wmflabs
[16:08:59] Warning: Not using cache on failed catalog
[16:08:59] Error: Could not retrieve catalog; skipping run
[16:09:01] _joe_: ^
[16:09:01] yeah it's missing some margin
[16:09:13] godog: I guessed
[16:09:24] Can I make another one right now?
[16:10:08] (CR) Gehel: "rebased" [puppet] - https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) (owner: Filippo Giunchedi)
[16:10:17] Amir1: sure, I'll go ahead with the rest in the meantime
[16:10:27] godog: thanks
[16:10:52] it just needs background-position: 0
[16:11:11] actually
[16:11:12] background-position-y: center;
[16:11:23] background-position-x: 0;
[16:12:00] Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3600929 (jcrespo) db1069 has been reused on s7, probably we should chose db1066 instead.
[16:12:36] twentyafterfour Niharika ok to go ahead with scholarships patch?
[16:12:51] godog: yeah
[16:12:52] (PS5) Andrew Bogott: prometheus::web to apache [puppet] - https://gerrit.wikimedia.org/r/377332 (https://phabricator.wikimedia.org/T151009)
[16:12:56] Yup.
[16:13:25] (PS9) Filippo Giunchedi: Deploy scholarships with scap3 [puppet] - https://gerrit.wikimedia.org/r/326461 (https://phabricator.wikimedia.org/T129134) (owner: Niharika29)
[16:14:08] (CR) Filippo Giunchedi: [C: 2] Deploy scholarships with scap3 [puppet] - https://gerrit.wikimedia.org/r/326461 (https://phabricator.wikimedia.org/T129134) (owner: Niharika29)
[16:14:11] godog: before running puppet on targets we need to run `scap deploy --init` on tin
[16:14:26] but before that we need to merge puppet patch and run puppet on tin
[16:14:45] so merge -> puppet on tin -> scap deploy --init -> puppet on targets -> deploy
[16:15:01] or I think that's what I got from https://wikitech.wikimedia.org/wiki/Scap3/Migration_Guide#First_Deployment
[16:15:40] godog: patch still failing after a rebase, I'll let you look into it, I'm missing too much context, sorry...
[16:16:00] I can help with it in five minutes.
[16:16:18] twentyafterfour: ack, puppet has just finished running on tin
[16:16:39] ok I'll try scap init
[16:17:14] done
[16:17:16] oops, just did that too
[16:17:20] no prob
[16:17:28] there's a yaml-error in checks.yaml I fixed, see diff
[16:17:46] (PS6) Andrew Bogott: prometheus::web to apache [puppet] - https://gerrit.wikimedia.org/r/377332 (https://phabricator.wikimedia.org/T151009)
[16:18:06] Niharika: no problem, there will be a fix for checks.yaml too ^
[16:18:30] I'll run puppet on krypton
[16:18:32] (CR) Gehel: "Some of the test failures seem to not be related to this CR, but tests that were already failing (my bad most probably)." [puppet] - https://gerrit.wikimedia.org/r/377366 (owner: Dzahn)
[16:19:05] gehel: ack, I'll take a look tomorrow
[16:19:13] godog: thanks!
[16:21:38] (PS1) Ladsgroup: gerrit: Fix up for the logo [puppet] - https://gerrit.wikimedia.org/r/377499
[16:21:49] godog: https://gerrit.wikimedia.org/r/377499
[16:21:52] (CR) jerkins-bot: [V: -1] gerrit: Fix up for the logo [puppet] - https://gerrit.wikimedia.org/r/377499 (owner: Ladsgroup)
[16:23:40] (PS1) Ottomata: role::kafka::jumbo::broker - Don't include standard if already included [puppet] - https://gerrit.wikimedia.org/r/377500
[16:24:26] (PS7) Andrew Bogott: prometheus::web to apache [puppet] - https://gerrit.wikimedia.org/r/377332 (https://phabricator.wikimedia.org/T151009)
[16:24:39] (PS2) Ottomata: role::kafka::jumbo::broker - Don't include standard if already included [puppet] - https://gerrit.wikimedia.org/r/377500
[16:24:42] (CR) Ottomata: [V: 2 C: 2] role::kafka::jumbo::broker - Don't include standard if already included [puppet] - https://gerrit.wikimedia.org/r/377500 (owner: Ottomata)
[16:25:11] (PS29) Gehel: cassandra - puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704)
[16:25:36] (CR) Gehel: cassandra - puppet 4 compatibility (1 comment) [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[16:25:42] (PS2) Zppix: gerrit: Fix up for the logo [puppet] - https://gerrit.wikimedia.org/r/377499 (owner: Ladsgroup)
[16:25:52] (PS30) Gehel: cassandra - puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704)
[16:26:21] (PS3) Gehel: role::elasticsearch::(cirrus|relforge): move to the future parser [puppet] - https://gerrit.wikimedia.org/r/374341 (https://phabricator.wikimedia.org/T171704)
[16:26:37] so can we do https://gerrit.wikimedia.org/r/#/c/376571/2 today too? ;)
[16:26:52] godog: I set up the old nginx-based class on a labs VM and then applied https://gerrit.wikimedia.org/r/#/c/377332/ and everything looks good
[16:26:58] !log volans@sarin conftool action : set/pooled=yes; selector: name=mw2100.codfw.wmnet
[16:27:00] (CR) Gehel: [C: 2] role::elasticsearch::(cirrus|relforge): move to the future parser [puppet] - https://gerrit.wikimedia.org/r/374341 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[16:27:03] (it removed nginx and switched over without issue)
[16:27:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:27:37] twentyafterfour: we can, mind adding it to the deployments page?
[16:27:49] (CR) Filippo Giunchedi: [C: 2] gerrit: Fix up for the logo [puppet] - https://gerrit.wikimedia.org/r/377499 (owner: Ladsgroup)
[16:27:54] Operations, Puppet, Patch-For-Review, User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3600977 (Gehel)
[16:27:56] (PS3) Filippo Giunchedi: gerrit: Fix up for the logo [puppet] - https://gerrit.wikimedia.org/r/377499 (owner: Ladsgroup)
[16:28:19] k
[16:29:53] godog: added
[16:30:00] thanks twentyafterfour
[16:30:31] * Niharika is here now
[16:30:48] twentyafterfour: godog: Can I help with anything or you're done? :)
[16:31:15] Niharika: iegreview isn't done afaik but I guess scholerships is ready to deploy?
[16:31:58] Niharika: yup what twentyafterfour said, you can scap deploy scholarships now
[16:32:04] twentyafterfour: If we've run puppet on targets then yup.
[16:32:12] Okay.
[16:32:39] (PS1) Volans: wmf-auto-reimage refactoring [puppet] - https://gerrit.wikimedia.org/r/377501 (https://phabricator.wikimedia.org/T148814)
[16:32:42] Here's the patch for iegreview: https://gerrit.wikimedia.org/r/#/c/375112/ Same steps.
[16:33:08] (CR) jerkins-bot: [V: -1] wmf-auto-reimage refactoring [puppet] - https://gerrit.wikimedia.org/r/377501 (https://phabricator.wikimedia.org/T148814) (owner: Volans)
[16:33:20] (CR) 20after4: [C: 1] Deploy iegreview with scap3 [puppet] - https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: BryanDavis)
[16:33:51] Niharika: kk, let me know when you are done with scholarships and we'll move to iegreview
[16:34:03] godog: it looks great
[16:34:05] thank you
[16:34:16] Amir1: np
[16:34:39] godog: I think we need https://gerrit.wikimedia.org/r/#/c/376571/2 before iegreview will work? since it's hosted in phab instead of gerrit
[16:35:08] !log niharika29@tin Started deploy [scholarships/scholarships@004635d]: Deploying scholarships with scap3 T129134
[16:35:09] (PS2) Volans: wmf-auto-reimage refactoring [puppet] - https://gerrit.wikimedia.org/r/377501 (https://phabricator.wikimedia.org/T148814)
[16:35:11] !log niharika29@tin Finished deploy [scholarships/scholarships@004635d]: Deploying scholarships with scap3 T129134 (duration: 00m 03s)
[16:35:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:20] T129134: Deploy scholarships with scap3 - https://phabricator.wikimedia.org/T129134
[16:35:25] godog: Done^
[16:35:25] twentyafterfour: ah that's right, I'll merge the phab origin one then
[16:35:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:23] (PS2) Andrew Bogott: nodepool: remove trusty image [puppet] - https://gerrit.wikimedia.org/r/377476 (https://phabricator.wikimedia.org/T175696) (owner: Hashar)
[16:37:30] (PS3) Filippo Giunchedi: Scap: Allow phabricator as a source [puppet] - https://gerrit.wikimedia.org/r/376571 (owner: Thcipriani)
[16:38:43] Operations, Performance-Team: Add profiling for Varnish and VCL - https://phabricator.wikimedia.org/T175710#3601019 (Krinkle)
[16:38:53] waiting for CI
[16:38:57] (CR) Andrew Bogott: [C: 2] nodepool: remove trusty image [puppet] - https://gerrit.wikimedia.org/r/377476 (https://phabricator.wikimedia.org/T175696) (owner: Hashar)
[16:39:13] Niharika: https://gerrit.wikimedia.org/r/#/c/377502/ found while running scap deploy --init
[16:39:24] Operations, ops-codfw, DBA: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3601037 (jcrespo) 0 is Online, Spun UP. Next one should be **Span: 1**
[16:40:01] (CR) Filippo Giunchedi: [C: 2] Scap: Allow phabricator as a source [puppet] - https://gerrit.wikimedia.org/r/376571 (owner: Thcipriani)
[16:40:10] godog: Okay. Want me to +2?
[16:40:11] (PS4) Filippo Giunchedi: Scap: Allow phabricator as a source [puppet] - https://gerrit.wikimedia.org/r/376571 (owner: Thcipriani)
[16:40:35] Niharika: sure, +2 and merge is fine, I changed it on tin already
[16:40:45] Alright.
[16:41:07] unrelated but I'd really appreciate if we don't keep merging unrelated patches during puppet swat
[16:41:12] Might be similar for iegreview.
[16:41:43] andrewbogott: hi :) puppet is disabled on labnodepool1001.eqiad.wmnet, I guess because rate/servers are live hacked
[16:41:58] Operations, ops-eqiad, Traffic, netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3601051 (RobH) a:RobH>Cmjohnson Ok, now we are in a bad state. We are trying to remotely enter the Bios, and get the following when telling it to enter bios on vsp: ``` R...
[16:42:04] hasharAway: it's not because of the rate hack, chase is rolling out a big refactor today
[16:42:13] (PS2) Giuseppe Lavagetto: role::snapshot::common: properly scope included classes [puppet] - https://gerrit.wikimedia.org/r/377493 (https://phabricator.wikimedia.org/T171704)
[16:42:13] so it's disabled on lab*
[16:42:15] (PS1) Giuseppe Lavagetto: toollabs: fix k8s classes that just include k8s::proxy [puppet] - https://gerrit.wikimedia.org/r/377505
[16:42:24] hasharAway: should be back to normal in a few hours if things go well
[16:42:43] andrewbogott: sounds good. Thanks!
[16:43:14] Niharika: quite possible yeah
[16:43:45] PROBLEM - Host lvs1007 is DOWN: PING CRITICAL - Packet loss = 100%
[16:44:11] Operations, ops-eqiad, Traffic, netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3601088 (BBlack) Also, the NIC firmware update only applied to ports 2+3, but not ports 0+1. I don't suspect NIC firmware level was a leading candidate for the fix anyways, but ha...
[16:44:15] (CR) Dduvall: [C: 1] "Yes please! This is needed to support multi-stage builds in the new container release pipeline." [puppet] - https://gerrit.wikimedia.org/r/377492 (https://phabricator.wikimedia.org/T175293) (owner: Thcipriani)
[16:44:21] (PS2) Giuseppe Lavagetto: toollabs: fix k8s classes that just include k8s::proxy [puppet] - https://gerrit.wikimedia.org/r/377505
[16:44:55] (CR) Volans: "Puppet compiler: https://puppet-compiler.wmflabs.org/compiler03/7828/" [puppet] - https://gerrit.wikimedia.org/r/377501 (https://phabricator.wikimedia.org/T148814) (owner: Volans)
[16:46:00] (CR) Rush: [C: 1] "The params passed to this role are a mess and dupe value's abound in teh toollabs roles :( but that's not a problem here -- thank you for " [puppet] - https://gerrit.wikimedia.org/r/377505 (owner: Giuseppe Lavagetto)
[16:46:08] (CR) Giuseppe Lavagetto: [C: 2] toollabs: fix k8s classes that just include k8s::proxy [puppet] - https://gerrit.wikimedia.org/r/377505 (owner: Giuseppe Lavagetto)
[16:47:01] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler03/7829/krypton.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: BryanDavis)
[16:47:13] (PS6) Filippo Giunchedi: Deploy iegreview with scap3 [puppet] - https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: BryanDavis)
[16:48:08] (CR) Filippo Giunchedi: [C: 2] Deploy iegreview with scap3 [puppet] - https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: BryanDavis)
[16:48:09] <_joe_> can y'all wait a sec before merging puppet patches?
[16:48:15] <_joe_> lol :P
[16:48:25] _joe_: no, puppet swat is on :P
[16:48:35] <_joe_> godog: I'm fixing a breakage in puppet
[16:48:52] <_joe_> it will take me ~ 1 min from now
[16:49:11] (PS3) Giuseppe Lavagetto: toollabs: fix k8s classes that just include k8s::proxy [puppet] - https://gerrit.wikimedia.org/r/377505
[16:49:35] godog: I would ask to revert that logo again because I think the black & white one is just generally unattractive, but I lost *that* battle some time ago
[16:49:36] (CR) Giuseppe Lavagetto: [V: 2 C: 2] toollabs: fix k8s classes that just include k8s::proxy [puppet] - https://gerrit.wikimedia.org/r/377505 (owner: Giuseppe Lavagetto)
[16:49:45] At least this time the font isn't Times New Roman ;-)
[16:49:59] <_joe_> {{done}}
[16:52:16] Niharika: yeah iegreview will need the same checks.yaml fix, I'll patch it on tin now
[16:52:22] no_justification: +1 ;)
[16:53:34] godog: Thanks. It's on arcanist so I was digging the instructions for submitting a patch.
[16:53:47] no_justification: what about removing the page-bkg.cache.jpg light blue background? I think it's nicer without with this logo
[16:54:17] Niharika: ack, you can scap deploy iegreview now btw, I've ran puppet on krypton
[16:54:36] Okay. On it.
[16:54:54] Niharika: if you need help with arcanist just let me know
[16:55:18] `arc diff HEAD^` is the magic incantation, once you have arc installed
[16:55:24] !log niharika29@tin Started deploy [iegreview/iegreview@69c4c3f]: Deploying iegreview with scap3 T129154
[16:55:24] volans: I'd rather not have a logo at all :\
[16:55:25] !log niharika29@tin Finished deploy [iegreview/iegreview@69c4c3f]: Deploying iegreview with scap3 T129154 (duration: 00m 01s)
[16:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:55:38] T129154: Deploy iegreview with scap3 - https://phabricator.wikimedia.org/T129154
[16:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:55:55] lol
[16:55:59] And done. Woo. Thanks godog and twentyafterfour. I'll let you know if I have arcanist problems. I've done it in the past, just need a refresher.
[16:56:29] no_justification: I'd rather have a custom logo. Not this one. :(
[16:56:39] Niharika: no problem, ok!
[16:56:42] Anyway, I've stopped caring already
[16:56:44] (PS2) Muehlenhoff: Remove salt grains from app server canaries [puppet] - https://gerrit.wikimedia.org/r/377222
[16:56:50] godog: Niharika: thanks to you both as well ;)
[16:56:51] Got 99 problems but logos aren't one
[16:57:02] no_justification: lols
[16:57:28] no_justification: MRW if you ask for the revert https://media1.giphy.com/media/FdmPbRxNRGNvq/giphy.mp4
[16:57:59] godog: thanks for all the puppet wrangling!
[16:58:14] on a serious note, I don't particularly mind the logo
[16:58:23] thcipriani: np! was easy enough :)
[16:59:20] Niharika: twentyafterfour thanks for the deploys. Two more repos down! \o/
[16:59:50] no_justification: Cause you're young and you're black and your hat's real low? or are your real problems caused by a logo?
[17:00:04] No patches in the queue for this window. Wheeee!
[17:00:32] jouncebot: did you mean to rhyme with me and jay-z?
[17:00:45] _joe_: thx
[17:00:55] jouncebot has a smooth flow
[17:01:29] :)
[17:01:58] jouncebot's got 99 problems but a patch ain't one.
[17:02:09] bd808: can we get a new command added to have jouncebot drop some sick verses?
[17:02:40] no_justification: I handed over the maintainer keys to Niharika :)
[17:03:11] Imma keep dumping old projects on her until she starts dumping them on others
[17:03:32] (CR) Muehlenhoff: [C: 2] Remove salt grains from app server canaries [puppet] - https://gerrit.wikimedia.org/r/377222 (owner: Muehlenhoff)
[17:03:45] jouncebot raps + T170484
[17:03:46] T170484: Play elevator music while scap is running - https://phabricator.wikimedia.org/T170484
[17:04:17] no_justification: jouncebot should probably throw some classic taytay for legoktm too
[17:04:32] spoken like a true manager bd808 !
[17:05:11] its the cycle of life in the FLOSS world :)
[17:05:44] show yourself to be useful and competent and you will be rewarded with more work than you can possibly keep up with
[17:06:37] bd808: For me too.....
[17:06:37] indeed, "careful what you wish for"
[17:06:40] * no_justification shakes it off
[17:07:10] hahah
[17:08:00] https://www.youtube.com/watch?v=WcM14Al83Ls
[17:08:31] ^ not a rickroll
[17:08:54] Operations, ops-codfw, DBA: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3601142 (Papaul) Disk in slot 2 replacement complete.
[17:09:00] that's exactly what I would say if it was a rickroll twentyafterfour
[17:09:08] chasemp: exactly
[17:09:23] crafty :)
[17:11:02] Instead of Rick Astley, this rickroll contains the sickest Canadian a-capella Taylor Swift cover ever produced.
[17:12:16] Operations, Discovery, Wikidata, Wikidata-Query-Service, and 2 others: rack/setup/install wdqs100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T171210#3601148 (Gehel) Server are installed, pooled and are serving user traffic. We can close this task.
[17:16:41] Amir1: It seems the PNG logo is too bad. If I dsable the svg rule in my browser, the PNG doesn't fit (way too big)
[17:17:09] either scale it down to 1x, or add background-size rule if it's meant to support non-svg-hdpi browsers (I don't think we support those)
[17:17:28] but afaik we only support hi-dpi for svg-supporting browsers
[17:17:42] Operations, Discovery, Wikidata, Wikidata-Query-Service, and 2 others: rack/setup/install wdqs100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T171210#3601163 (Smalyshev) Open>Resolved
[17:21:28] Or, as I keep suggesting... No logo at all
[17:21:35] * no_justification writes a patch
[17:24:44] (CR) Dzahn: prometheus::web to apache (2 comments) [puppet] - https://gerrit.wikimedia.org/r/377332 (https://phabricator.wikimedia.org/T151009) (owner: Andrew Bogott)
[17:26:22] (PS1) Catrope: Enable $wgStructuredFiltersShowPreference in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/377511
[17:26:38] (CR) Catrope: [C: 2] Enable $wgStructuredFiltersShowPreference in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/377511 (owner: Catrope)
[17:27:21] PROBLEM - MegaRAID on db2010 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[17:27:23] ACKNOWLEDGEMENT - MegaRAID on db2010 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T175715
[17:27:26] Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175715#3601227 (ops-monitoring-bot)
[17:27:59] ema bblack Hi! Is it possible that recently we've changed the Access-Control-Allow-Origin hedaers for /becaon/event calls in the past few months?
[17:28:15] (Merged) jenkins-bot: Enable $wgStructuredFiltersShowPreference in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/377511 (owner: Catrope)
[17:28:25] (CR) jenkins-bot: Enable $wgStructuredFiltersShowPreference in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/377511 (owner: Catrope)
[17:28:34] Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175715#3601237 (Volans)
[17:28:38] Operations, ops-codfw, DBA: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3601239 (Volans)
[17:28:47] (PS1) Faidon Liambotis: openstack2: use !~ instead of ! $title =~ /.../ [puppet] - https://gerrit.wikimedia.org/r/377512
[17:28:49] (PS1) Faidon Liambotis: thumbor: fix weird integer interpolation [puppet] - https://gerrit.wikimedia.org/r/377513
[17:29:00] Also, do you know if for requests rejected due to CORS (not sure of the details of the interaction there) an entry would still go into the server logs?
[17:29:15] Krinkle: ^ also :)
[17:29:18] thx in advance!!
[17:36:30] Operations, ops-eqiad, Traffic, netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3601285 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['lvs1007.eqiad.wmnet'] ``` Of which those **FAILED**: ``` set(['lvs1007.eqiad.wmnet']) ```
[17:37:11] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 2071818
[17:38:24] (CR) EBernhardson: [C: 1] [cirrus] Force native script for super noop [mediawiki-config] - https://gerrit.wikimedia.org/r/376027 (https://phabricator.wikimedia.org/T174652) (owner: DCausse)
[17:47:21] RECOVERY - MegaRAID on db2010 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[17:47:33] (PS20) Rush: openstack: designate as module/profile/role [puppet] - https://gerrit.wikimedia.org/r/376848 (https://phabricator.wikimedia.org/T171494)
[17:48:50] (CR) Rush: [C: 2] openstack: designate as module/profile/role [puppet] - https://gerrit.wikimedia.org/r/376848 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[17:52:14] AndyRussG, isn't that something enforced by web browsers?
[17:54:03] Krenair: I think so... So, I'm getting an error and request canceled message in the browser. And for some reason I'm seeing less than the expected number of said requests in the Hive in wmf.webrequests
[17:54:07] logs
[17:54:14] or table, rather
[17:54:26] yeah that sounds like the web browser is blocking it from ever getting to the server
[17:54:55] Krenair: so I guess somehow the browser stops the http negotiation before its completed, so we don't log the request?
[17:55:02] it's*
[17:55:47] hmm. I'd check with Krinkle
[17:55:49] Also the lower than expected number of events received is pretty new (last few months) so I was kinda hoping to identify what happened
[17:56:13] in some cases it tries the request with a special http method first
[17:57:02] Ahh right hmmm yes I seem to recall reading that... Hmmm but only for POST and such, I think
Hmmm but only for POST and such, I think [17:57:15] K that makes sense actually [17:57:28] https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS#Overview [17:57:30] thx!! [17:57:41] it's slightly complicated :/ [17:57:41] yeah :) [17:58:12] Aaaarg my brain didn't want to set the priority for "learning that" very high, yurp [17:59:26] * AndyRussG sprints back to the old-age home [18:20:45] (03PS1) 10Cmjohnson: Adding new mac address for deploy1001 to reflect nic change w/new motherboard. [puppet] - 10https://gerrit.wikimedia.org/r/377523 [18:21:21] (03CR) 10Cmjohnson: [C: 032] Adding new mac address for deploy1001 to reflect nic change w/new motherboard. [puppet] - 10https://gerrit.wikimedia.org/r/377523 (owner: 10Cmjohnson) [18:27:42] Krinkle: I will fix it ASAP [18:27:47] It's hard to test png [18:35:11] RECOVERY - Host lvs1007 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [18:35:21] (03CR) 10Nuria: Add cron to purge old mediawiki data snapshots (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) (owner: 10Nuria) [18:42:10] (03PS1) 10MaxSem: Start migration to HTML5 sections on test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377527 (https://phabricator.wikimedia.org/T175725) [18:47:49] (03PS3) 10Nuria: Add cron to purge old mediawiki data snapshots [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) [18:48:14] (03CR) 10jerkins-bot: [V: 04-1] Add cron to purge old mediawiki data snapshots [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) (owner: 10Nuria) [18:50:11] andrewbogott: trusty is gone from nodepool though I am keeping the base image around for a while just in case :] [18:50:19] ok! [18:50:32] that is one less madness on nodepool/openstack! 
[18:59:44] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[19:00:05] No patches in the queue for this window. Wheeee!
[19:00:15] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[19:02:48] :)
[19:03:13] PROBLEM - puppet last run on kafka1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:14] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:14] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:23] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:33] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:33] PROBLEM - puppet last run on install2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:43] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:53] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:53] PROBLEM - puppet last run on pc1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:54] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:54] Operations, Deployment-Systems, Release-Engineering-Team (Backlog): Trebuchet targets for test/testrepo are out of date - https://phabricator.wikimedia.org/T149180#3601711 (demon) Open>declined Nobody cares anymore.
[19:04:03] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:04:13] PROBLEM - puppet last run on db1095 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:04:13] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.07 ms
[19:04:14] RECOVERY - Host mw2256.mgmt is UP: PING OK - Packet loss = 0%, RTA = 41.03 ms
[19:04:14] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:05:05] !log maintenance complete on mw2256
[19:05:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:18] Operations, ops-codfw: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346#3601717 (Papaul) Tech arrived on site at 10:13am and started working on the server at 10:30. After replacing the main board the server came up with an error on the PSU 's (PSU Mismatch) After troubleshooting with...
[19:05:21] Operations, Maps-Sprint, Maps (Kartotherian): Upgrade kartotherian and tilerator to nodejs 6.11 - https://phabricator.wikimedia.org/T171707#3601718 (Pnorman) The test servers are done re-imaging, so we can now upgrade to 6.11 on them and test
[19:05:23] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[19:05:45] (PS1) Hashar: DO NOT SUBMIT contint: php5.5 on permanent slaves [puppet] - https://gerrit.wikimedia.org/r/377529
[19:05:55] (CR) Hashar: [C: -1] DO NOT SUBMIT contint: php5.5 on permanent slaves [puppet] - https://gerrit.wikimedia.org/r/377529 (owner: Hashar)
[19:06:00] (CR) jerkins-bot: [V: -1] DO NOT SUBMIT contint: php5.5 on permanent slaves [puppet] - https://gerrit.wikimedia.org/r/377529 (owner: Hashar)
[19:06:37] "No patches in the queue for this window. Wheeee!"
[19:06:43] jouncebot is stupid these days
[19:06:48] NOT ALL WINDOWS HAVE PATCHES YOU STUPID BOT
[19:06:53] * no_justification preferred the senseless ping
[19:07:32] To avoid pinging in an empty window, we made it ping never?
[19:07:36] Seems like a sledgehammer.
[19:09:03] PROBLEM - puppet last run on wtp1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:09:05] no_justification: yeah... I think there should be a flag for windows that its not wanted for in the markup or something.
[19:09:16] Or just go back to the old mode
[19:09:21] Which worked just fine
[19:09:22] but some SWAT folks were crabby about the empty pings
[19:09:27] no_justification: I have been meaning to fix that. Sorry.
[19:09:28] Maybe they shouldn't SWAT then
[19:09:31] Operations, Discovery, Maps, Maps-Sprint, Traffic: Make maps active / active - https://phabricator.wikimedia.org/T162362#3601731 (debt)
[19:10:08] (PS2) Hashar: DO NOT SUBMIT contint: php5.5 on permanent slaves [puppet] - https://gerrit.wikimedia.org/r/377529
[19:10:10] bd808: If I got crabby over every useless ping I got on IRC, my ignore list would be 60 people long by now.
[19:10:22] (CR) jerkins-bot: [V: -1] DO NOT SUBMIT contint: php5.5 on permanent slaves [puppet] - https://gerrit.wikimedia.org/r/377529 (owner: Hashar)
[19:10:29] mine probably is :)
[19:11:09] (PS3) Volans: wmf-auto-reimage refactoring [puppet] - https://gerrit.wikimedia.org/r/377501 (https://phabricator.wikimedia.org/T148814)
[19:14:26] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:14:27] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[19:16:56] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:17:09] bd808: tbh, I'd just hang out in non-public channels if pings bothered me :p
[19:17:39] (CR) Volans: "Updated compiler https://puppet-compiler.wmflabs.org/compiler02/7830/" [puppet] - https://gerrit.wikimedia.org/r/377501 (https://phabricator.wikimedia.org/T148814) (owner: Volans)
[19:18:36] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:20:08] hashar: I'm not sure I'm getting your comment in T175712, if there are salt masters internally to single projects that's out of scope of the task
[19:20:09] T175712: Install cumin in the WMCS infrastructure - https://phabricator.wikimedia.org/T175712
[19:23:51] Operations, ops-eqiad, Traffic, netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3601816 (ops-monitoring-bot) Script wmf_auto_reimage was launched by bblack on neodymium.eqiad.wmnet for hosts: ``` ['lvs1007.eqiad.wmnet'] ``` The log can be found in `/var/log/wm...
[19:24:33] o/ I've got some log files on ores1001.eqiad.wmnet that I can't read. Can someone help?
[19:24:35] See ores1001:/srv/log/ores
[19:24:37] (CR) Dzahn: "ooh, that's all upstream? for some reason i thought the validate_legacy stuff is our local thing. gotcha!" [puppet] - https://gerrit.wikimedia.org/r/377355 (owner: Dzahn)
[19:24:48] app.log is 644
[19:24:51] but main.log is 600
[19:24:57] I'd like both to be 644
[19:25:02] (Abandoned) Dzahn: stdlib: fix quoting in validate_legacy example [puppet] - https://gerrit.wikimedia.org/r/377355 (owner: Dzahn)
[19:26:06] PROBLEM - Host lvs1007 is DOWN: PING CRITICAL - Packet loss = 100%
[19:26:31] (Abandoned) BBlack: maps: active/active public interface [puppet] - https://gerrit.wikimedia.org/r/345591 (owner: BBlack)
[19:27:36] RECOVERY - Host lvs1007 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[19:27:50] Operations, ops-eqiad, Traffic, netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3601825 (ops-monitoring-bot) Script wmf_auto_reimage was launched by bblack on neodymium.eqiad.wmnet for hosts: ``` ['lvs1007.eqiad.wmnet'] ``` The log can be found in `/var/log/wm...
[19:28:50] Operations, ORES, Scoring-platform-team (Current): Give ores admins read access to /srv/log/ores/main.log* - https://phabricator.wikimedia.org/T175736#3601826 (awight)
[19:29:19] Operations, ORES, Scoring-platform-team (Current): Give ores admins read access to /srv/log/ores/main.log* - https://phabricator.wikimedia.org/T175736#3601826 (awight) I'd be happy to write the patch, if an Ops engineer wants to suggest the preferred way to give us access?
[19:29:44] Operations, Scoring-platform-team, monitoring: ORES error messages not in logstash? - https://phabricator.wikimedia.org/T175653#3601840 (awight)
[19:30:06] PROBLEM - Host lvs1007 is DOWN: PING CRITICAL - Packet loss = 100%
[19:30:09] Operations, ops-eqiad, DC-Ops, netops: two switches have same serial in racktables - https://phabricator.wikimedia.org/T175737#3601841 (RobH)
[19:31:09] Operations, Scoring-platform-team, monitoring: ORES log messages should be sent to logstash - https://phabricator.wikimedia.org/T175653#3599044 (awight)
[19:31:16] RECOVERY - puppet last run on pc1005 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[19:31:19] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[19:31:26] RECOVERY - puppet last run on wtp1033 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[19:31:37] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[19:31:37] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[19:31:46] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[19:31:47] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[19:32:06] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[19:32:06] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[19:32:12] volans: I just wanted to raise awareness that some labs project have their instance attached to a standalone salt master
[19:32:19] volans: eg, not the WMCS Salt master :D
[19:32:26] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[19:32:36] RECOVERY - puppet last run on db1095 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[19:32:36] RECOVERY - puppet last run on install2002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[19:32:36] RECOVERY - puppet last run on kafka1012 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[19:32:47] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[19:32:53] Operations, ORES, Scoring-platform-team (Current): Give ores admins read access to /srv/log/ores/main.log* - https://phabricator.wikimedia.org/T175736#3601826 (Dzahn) The preferred way would be if logs can be read using systemd's journalctl and in the admin module in the section for ores-admin, a lin...
[19:32:55] hashar: yeah, that's fine as long as the owners of the labs project manage their salt
[19:33:12] not sure it that means that we'll need to keep salt code into the puppet tree though
[19:33:56] (PS2) Ottomata: Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352
[19:34:24] (CR) jerkins-bot: [V: -1] Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352 (owner: Ottomata)
[19:34:31] (PS3) Hashar: DO NOT SUBMIT contint: php5.5 on permanent slaves [puppet] - https://gerrit.wikimedia.org/r/377529
[19:34:45] (CR) jerkins-bot: [V: -1] DO NOT SUBMIT contint: php5.5 on permanent slaves [puppet] - https://gerrit.wikimedia.org/r/377529 (owner: Hashar)
[19:35:30] (PS3) Ottomata: Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352
[19:35:59] (PS4) Ottomata: Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352
[19:36:01] (CR) jerkins-bot: [V: -1] Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352 (owner: Ottomata)
[19:36:13] volans: most probably we would want a way to install a cumin master on a labs project :D
[19:36:27] (CR) jerkins-bot: [V: -1] Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352 (owner: Ottomata)
[19:38:09] hashar: I don't see a problem for that, project owners have root in their VMs ;) it might need some puppet coding to make it easier/automatic though
[19:39:24] (PS5) Ottomata: Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352
[19:39:27] volans: put another way, just like most things, we should have a very similar setup in beta cluster as we do in prod, so, could you see how hard it would be to do that? :)
[19:39:37] volans: better question: do you know off hand if that is possible?
[19:39:52] (CR) jerkins-bot: [V: -1] Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352 (owner: Ottomata)
[19:41:13] volans: (with the current way that it is configured in puppet, that is)
[19:41:16] greg-g: I've developed and tested cumin in a labs project ;)
[19:41:35] with very little local modifications in the puppet tree
[19:42:34] (PS6) Ottomata: Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352
[19:43:18] volans: given our standing policy/guideline of "people who maintain it in prod maintain it in beta (eg: discovery folks maintain the ES cluster there), can oyu take a stab at making a cumin master in beta during your other cloud-vps related work?
[19:43:31] unbalanced ", sorry ;)
[19:43:59] I can look into it, yes, but probably after the end of the quarter
[19:44:30] but who should use it?
[19:44:58] I mean has a real use case or is just a mirroring of prod?
[19:45:18] (PS1) Chad: group0 to wmf.18 [mediawiki-config] - https://gerrit.wikimedia.org/r/377541
[19:45:24] I use salt in beta sometimes
[19:45:36] !log demon@tin Started scap: bootstrap wmf.18
[19:45:46] I don't want to put something unpuppetised in beta
[19:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:18] It should ideally work like prod, with some allowances for the constraints of labs and separate hostnames, access control, etc.
[19:46:24] volans: yeah, people use salt in beta at times, so we should migrate those uses to cumin, afaiui
[19:49:02] (PS7) Ottomata: Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352
[19:50:52] greg-g: fair enough, but is the puppetmaster in depl-prep using puppetdb?
[19:51:07] I cannot find it in the config of deployment-puppetmaster02
[19:51:32] volans: no. there is no puppetdb outside of "prod"
[19:52:15] host targeting will do >90% of what is done with salt today in Beta Cluster and CI
[19:52:16] actually...
[19:52:22] https://wikitech.wikimedia.org/w/index.php?title=Hiera%3ADeployment-prep&type=revision&diff=1368443&oldid=1337008
[19:52:29] (CR) Ottomata: "https://puppet-compiler.wmflabs.org/compiler02/7832/analytics1002.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/377352 (owner: Ottomata)
[19:52:31] (CR) Ottomata: [C: 2] Fetch Hadoop NameNode fsimage backups daily and also save them in bacula [puppet] - https://gerrit.wikimedia.org/r/377352 (owner: Ottomata)
[19:52:33] I don't remember how far I got with it
[19:52:58] heh. Krenair's always ahead of the curve
[19:53:50] !log demon@tin Finished scap: bootstrap wmf.18 (duration: 08m 14s)
[19:54:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:08] bd808: what do you mean with "host targeting will do..." ?
[19:54:24] related operations/puppet commit that remains open *to this day*
[19:54:25] https://gerrit.wikimedia.org/r/#/c/333471/
[19:54:30] Operations, fundraising-tech-ops: Long term storage for frack prometheus data - https://phabricator.wikimedia.org/T175738#3601924 (cwdent)
[19:55:19] volans: the hostnames in those environments are typically descriptive so being able to say "run this on the foo-* hosts" is most of what they need (if not all)
[19:56:11] bd808: sure, but cumin needs a place where to expand the "*" into FQDN ;) I was planning to write at some point a backend for the known hosts format (if not hashed)
[19:56:24] PROBLEM - Check systemd state on analytics1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[19:56:50] volans, that might already work
[19:57:02] (PS1) Ottomata: Allow read access to /srv/backup in analytics cluster [puppet] - https://gerrit.wikimedia.org/r/377543
[19:57:09] volans, this is the reason for the puppetdb dependency right?
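At 19:56:11 volans mentions wanting a cumin backend that expands a glob like "foo-*" into FQDNs using an unhashed SSH known_hosts file. The Python sketch below only illustrates that idea under stated assumptions: it is not cumin's backend API, the function names `hosts_from_known_hosts` and `expand` are hypothetical, and it relies only on the OpenSSH known_hosts layout (comma-separated host field, optional `@cert-authority`/`@revoked` marker, `[host]:port` for non-default ports, hashed names starting with `|1|`).

```python
import fnmatch
from typing import Iterable, List


def hosts_from_known_hosts(lines: Iterable[str]) -> List[str]:
    """Collect literal host names/addresses from OpenSSH known_hosts lines (hypothetical helper)."""
    hosts = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        # An optional marker (@cert-authority or @revoked) may precede the hosts field.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        for name in fields[0].split(","):
            if name.startswith("|1|"):
                continue  # hashed entry: the host name cannot be recovered
            if name.startswith("!") or "*" in name or "?" in name:
                continue  # negated or wildcard pattern, not a concrete host
            if name.startswith("["):
                name = name[1:].split("]", 1)[0]  # "[host]:port" -> "host"
            hosts.add(name)
    return sorted(hosts)


def expand(pattern: str, known_hosts_lines: Iterable[str]) -> List[str]:
    """Expand a shell-style glob such as "foo-*" into the matching known hosts (hypothetical helper)."""
    return [h for h in hosts_from_known_hosts(known_hosts_lines)
            if fnmatch.fnmatchcase(h, pattern)]
```

Called as `expand("foo-*", open(path))` on a known_hosts file, it returns the sorted concrete host names that match. Hashed entries are skipped because their names cannot be recovered, which is why the approach only works "if not hashed", as volans says.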
[19:57:25] volans: https://github.com/bd808/wikimedia-cloud-vps-hostgroup-generator :)
[19:58:12] Krenair: puppetdb is not a dependency, is one of the possible backends, in prod it allows to query hosts by puppet resources or facts
[19:58:23] (CR) Ottomata: [C: 2] Allow read access to /srv/backup in analytics cluster [puppet] - https://gerrit.wikimedia.org/r/377543 (owner: Ottomata)
[19:58:54] ok
[19:58:56] well
[19:59:02] preferably it'd be set up like prod
[19:59:58] so if the puppetdb installation in beta is functional then we may be able to use it
[20:00:25] RECOVERY - Check systemd state on analytics1002 is OK: OK - running: The system is fully operational
[20:00:32] if not we may be able to either fix it, or if it came to it consider another backend, but I'd like to avoid that
[20:01:51] (PS1) Ladsgroup: gerrit: Smaller png logo [puppet] - https://gerrit.wikimedia.org/r/377547
[20:02:08] I’m trying to point a Python service (ORES) at logstash, and wondering: Should I construct the SysLogHandler with address=(logstash1001.eqiad.wmnet, 10514) or “/dev/log”?
[20:03:25] volans, is there a simple way to check our puppetdb is functioning?
[20:03:34] It's been many months since I looked at it
[20:03:37] godog: sorry, do you have a minute to merge this? https://gerrit.wikimedia.org/r/#/c/377547/
[20:06:26] bd808: Can I ask for logstash advice? (Q above ^)
[20:06:39] !log demon@tin Started scap: bootstrap wmf.18 (for real this time)
[20:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:16] awight: there's a new service name you should point at I think
[20:07:20] * bd808 is on a call
[20:07:28] kk ty
[20:08:42] awight: logstash.svc.eqiad.wmnet
[20:08:46] awight: you should use logstash.svc.eqiad.wmnet (the service name)
[20:10:30] Great! I see in operations-puppet that the UDP port is the same, 10514. That should be all I need, ty
[20:14:18] bd808: No rush to review, but here’s my rough patch to send to logstash, https://gerrit.wikimedia.org/r/377553
[20:14:53] Operations, Analytics, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3602063 (Jan_Dittrich) Is there some progress on this issue? The last experiments I was in contact with (and responsible for thinking up) were still hack-ish. I wonder if packaging one of the hac...
[20:25:11] MaxSem: I'm running a tad late on train today. Plz check with me before your deploy window @ 2pm
[20:25:11] kthnxbai
[20:25:26] okie
[20:27:01] Operations, Analytics, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3602138 (Nuria) We will not be doing more work towards this as statistically we did not find a more private conscoius way to do ab testing than the one event logging -based experiments offers. You...
[20:33:03] !log demon@tin Finished scap: bootstrap wmf.18 (for real this time) (duration: 26m 23s)
[20:33:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:25] (CR) Hashar: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/377541 (owner: Chad)
[21:00:04] No patches in the queue for this window. Wheeee!
[21:01:10] (CR) Chad: [C: 2] group0 to wmf.18 [mediawiki-config] - https://gerrit.wikimedia.org/r/377541 (owner: Chad)
[21:02:49] (Merged) jenkins-bot: group0 to wmf.18 [mediawiki-config] - https://gerrit.wikimedia.org/r/377541 (owner: Chad)
[21:03:03] (CR) jenkins-bot: group0 to wmf.18 [mediawiki-config] - https://gerrit.wikimedia.org/r/377541 (owner: Chad)
[21:04:23] no_justification: Can you determine whether 0942f2071 in extensions/VisualEditor/lib/ve (the submodule) made it into wmf.18? Gerrit says it is, but that https://gerrit.wikimedia.org/r/#/c/377524/ (which pulled it into extensions/VisualEditor) didn't…
[21:05:09] Krenair: dunno, and I have no knowledge of what was done in the past with it
[21:05:17] no_justification: (It was a UBN we spotted this morning.)
[21:05:26] I dunno.
[21:05:32] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.18
[21:05:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:50] https://gerrit.wikimedia.org/r/#/c/376500/ when this can be merged? _joe_ mobrovac ?
[21:07:54] akosiaris:
[21:08:22] (CR) Zppix: [C: 1] service: Use LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/376500 (https://phabricator.wikimedia.org/T175242) (owner: Ladsgroup)
[21:15:06] no_justification, is the train done?
[21:17:13] Operations, Analytics-Kanban, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3602339 (Nuria) Open>Resolved
[21:17:39] Operations, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: rack/setup/install druid100[456].eqiad.wmnet - https://phabricator.wikimedia.org/T171626#3602340 (Nuria) Open>Resolved
[21:18:20] PROBLEM - salt-minion processes on labtestvirt2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[21:19:03] Operations, Scoring-platform-team, monitoring, Patch-For-Review: ORES log messages should be sent to logstash - https://phabricator.wikimedia.org/T175653#3602349 (Halfak)
[21:29:50] MaxSem: looks so?
[21:37:54] !log Populating ip_changes in group0 wikis
[21:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:33] MaxSem: would you ping me when your deploy is done? I'd like to deploy mobileapps afterwards.
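At 20:02:08 awight asks how to construct Python's `logging.handlers.SysLogHandler` for logstash. The answer in channel is to use the service name logstash.svc.eqiad.wmnet, and awight reads the UDP port 10514 from operations-puppet. The sketch below wires those two values into the standard-library handler. The helper name `make_logstash_handler` and the "ores:" message prefix are illustrative assumptions, not taken from the actual ORES patch (https://gerrit.wikimedia.org/r/377553), and this log does not confirm that the logstash syslog input parses this exact message format.

```python
import logging
import logging.handlers
import socket

# Values from the channel discussion above; treat them as assumptions, not verified config.
LOGSTASH_HOST = "logstash.svc.eqiad.wmnet"  # service name suggested instead of logstash1001
LOGSTASH_PORT = 10514                       # UDP port awight found in operations-puppet


def make_logstash_handler(host: str = LOGSTASH_HOST,
                          port: int = LOGSTASH_PORT) -> logging.Handler:
    """Return a SysLogHandler that sends records to (host, port) over UDP (hypothetical helper).

    SysLogHandler looks up the host name when the handler is constructed,
    so this raises socket.gaierror if the name does not resolve.
    """
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_USER,
        socktype=socket.SOCK_DGRAM,  # UDP (also the default for a (host, port) address)
    )
    # Illustrative program tag so messages can be told apart downstream.
    handler.setFormatter(
        logging.Formatter("ores: %(levelname)s %(name)s %(message)s"))
    return handler
```

Attaching it would look like `logging.getLogger("ores").addHandler(make_logstash_handler())`. The "/dev/log" alternative would hand messages to the local syslog daemon instead, and whether that daemon forwards them to logstash depends on host configuration not shown in this log.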
[21:42:06] (PS2) MaxSem: Start migration to HTML5 sections on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/377527 (https://phabricator.wikimedia.org/T175725)
[21:42:25] (CR) MaxSem: [C: 2] Start migration to HTML5 sections on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/377527 (https://phabricator.wikimedia.org/T175725) (owner: MaxSem)
[21:43:54] (Merged) jenkins-bot: Start migration to HTML5 sections on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/377527 (https://phabricator.wikimedia.org/T175725) (owner: MaxSem)
[21:46:22] (CR) jenkins-bot: Start migration to HTML5 sections on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/377527 (https://phabricator.wikimedia.org/T175725) (owner: MaxSem)
[21:49:13] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/377527/2 T175725 (duration: 00m 49s)
[21:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:49:25] T175725: Deploy HTML5 sections to WMF production - https://phabricator.wikimedia.org/T175725
[21:53:10] PROBLEM - nova-compute process on labtestvirt2002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute
[21:55:00] bearND, all yours
[21:55:36] MaxSem: thank you
[21:55:45] !log bsitzmann@tin Started deploy [mobileapps/deploy@b11b75c]: Update mobileapps to 297b048 (T174707 T175305)
[21:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:58] T174707: French onthisday/selected for some 1sts of the month pages not working - https://phabricator.wikimedia.org/T174707
[21:55:58] T175305: mobile-sections: Cannot read property 'indexOf' of undefined in markReferenceSections - https://phabricator.wikimedia.org/T175305
[21:56:40] team, we just went live X2!
[21:57:09] test wikis now have range contribs and human readable sections
[21:59:09] wrong channel
[22:00:13] !log bsitzmann@tin Finished deploy [mobileapps/deploy@b11b75c]: Update mobileapps to 297b048 (T174707 T175305) (duration: 04m 28s)
[22:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:33] (PS1) Volans: Clustershell: make call to tqdm.write() explicit [software/cumin] - https://gerrit.wikimedia.org/r/377656
[22:01:15] (PS2) Volans: Clustershell: make call to tqdm.write() explicit [software/cumin] - https://gerrit.wikimedia.org/r/377656
[22:03:50] (CR) Volans: [C: 2] "Trivial fix" [software/cumin] - https://gerrit.wikimedia.org/r/377656 (owner: Volans)
[22:05:33] (Merged) jenkins-bot: Clustershell: make call to tqdm.write() explicit [software/cumin] - https://gerrit.wikimedia.org/r/377656 (owner: Volans)
[22:05:44] (PS1) Mattflaschen: Enable Flow on Newsletter_talk on mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502)
[22:09:21] PROBLEM - Host labtestvirt2002 is DOWN: PING CRITICAL - Packet loss = 100%
[22:10:54] ^me
[22:17:56] !log labvirt1018:~# /sbin/reboot
[22:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:30:59] (CR) Mattflaschen: [C: -2] "Mattflaschen needs to run a script immediately before deployment." [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[22:43:40] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2063364
[22:43:50] PROBLEM - Host labtestvirt2003 is DOWN: PING CRITICAL - Packet loss = 100%
[22:47:10] (PS7) GeoffreyT2000: Rename Wikisaurus namespace on Wiktionary to "Thesaurus" [mediawiki-config] - https://gerrit.wikimedia.org/r/374063 (https://phabricator.wikimedia.org/T174264)
[23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170912T2300).
[23:00:04] Smalyshev and matt_flaschen: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:28] Present
[23:03:10] I can SWAT
[23:03:30] SMalyshev: ping for SWAT
[23:04:21] matt_flaschen: do you still need to run your script for https://gerrit.wikimedia.org/r/#/c/377657/ ?
[23:04:55] thcipriani, yes, let me know when you're about to deploy mine, and I'll do it.
[23:05:05] It should be directly before.
[23:05:29] okie doke
[23:05:59] thcipriani: thanks, I'm here
[23:06:07] sorry, was distracted by patches :)
[23:06:16] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:06:27] SMalyshev: no problem :)
[23:07:49] (CR) Thcipriani: Enable Flow on Newsletter_talk on mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:08:14] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:08:48] (PS2) Mattflaschen: Enable Flow on Newsletter_talk on mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502)
[23:10:00] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[23:10:00] thcipriani, sorry, I made a last-minute comment change (PS2).
[23:10:21] RECOVERY - Host labtestvirt2003 is UP: PING OK - Packet loss = 0%, RTA = 36.03 ms
[23:11:01] matt_flaschen: np, was confused there for a second :)
[23:11:26] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:13:13] (Merged) jenkins-bot: Enable Flow on Newsletter_talk on mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:13:35] (CR) jenkins-bot: Enable Flow on Newsletter_talk on mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/377657 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:14:18] matt_flaschen: everything is ready to go on tin. You want me to pull to mwdebug1002? Or do you want to run script first?
[23:16:19] thcipriani, you can pull to mwdebug1002. I'll just test that a non-existent page appears as Flow there. I'm not sure I want to make an edit, because then the config may be in inconsistent state between that machine and job queue.
[23:16:28] Then before going to all machines, I'll do the script.
[23:17:11] matt_flaschen: ok, it's live on mwdebug1002
[23:19:15] thcipriani, it's not working. It has to be the number. I should have caught that. One sec, I'll a followup.
[23:19:26] ok
[23:20:47] (PS1) Nuria: Stopping event collection for PageCreation events [puppet] - https://gerrit.wikimedia.org/r/377667 (https://phabricator.wikimedia.org/T171629)
[23:21:59] Hello
[23:22:04] (PS1) Mattflaschen: Followup Newsletter talk: Must be numeric [mediawiki-config] - https://gerrit.wikimedia.org/r/377668 (https://phabricator.wikimedia.org/T175502)
[23:22:41] thcipriani, up now. ^
[23:23:21] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377668 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:25:02] (Merged) jenkins-bot: Followup Newsletter talk: Must be numeric [mediawiki-config] - https://gerrit.wikimedia.org/r/377668 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:25:10] thcipriani: yesterday, there were some JenkinsBot / Jenkins / Zuul issues, so I've two config changes I'd like to report to this window. Could you handle them afterwards or ping me so I can handle them?
[23:25:36] Dereckson: sure, I can get them
[23:25:45] Thaks
[23:25:59] matt_flaschen: pulled to mwdebug1002
[23:26:22] (CR) jenkins-bot: Followup Newsletter talk: Must be numeric [mediawiki-config] - https://gerrit.wikimedia.org/r/377668 (https://phabricator.wikimedia.org/T175502) (owner: Mattflaschen)
[23:27:24] thcipriani, thanks, sorry for the inconvenience. It's working on MediaWiki.org: https://www.mediawiki.org/wiki/Newsletter_talk:Something . I'll do the script now.
[23:28:25] no problem, let me know when it's ready to go live
[23:32:31] thcipriani, script is done.
[23:32:43] matt_flaschen: ok, going live
[23:33:02] (PS2) Nuria: Stopping event collection for Page events [puppet] - https://gerrit.wikimedia.org/r/377667 (https://phabricator.wikimedia.org/T171629)
[23:35:22] !log thcipriani@tin Synchronized wmf-config: SWAT: [[gerrit:377657|Enable Flow on Newsletter_talk on mediawiki.org]] T175502 (duration: 00m 52s)
[23:35:31] ^ matt_flaschen live now
[23:35:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:36] T175502: Please enable Flow in MediaWiki.org's Newsletter talk namespace - https://phabricator.wikimedia.org/T175502
[23:36:02] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377358 (https://phabricator.wikimedia.org/T171807) (owner: Smalyshev)
[23:41:27] thcipriani, confirmed working (created then deleted Newsletter_talk:Sandbox)
[23:41:36] awesome :)
[23:41:46] thanks for checking
[23:42:21] thcipriani: that patch should have no immediate effect, it's only used as config for weekly dumps now. So you can deploy it
[23:42:52] SMalyshev: whenever Jenkins decides to merge it, will do :)
[23:44:55] PROBLEM - MD RAID on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:45:27] (Merged) jenkins-bot: Add more wikis to the list of the dumped [mediawiki-config] - https://gerrit.wikimedia.org/r/377358 (https://phabricator.wikimedia.org/T171807) (owner: Smalyshev)
[23:45:39] (PS1) Faidon Liambotis: racktables: pass $racktables_host to module [puppet] - https://gerrit.wikimedia.org/r/377669
[23:46:29] (CR) jenkins-bot: Add more wikis to the list of the dumped [mediawiki-config] - https://gerrit.wikimedia.org/r/377358 (https://phabricator.wikimedia.org/T171807) (owner: Smalyshev)
[23:47:23] !log thcipriani@tin Synchronized dblists/categories-rdf.dblist: SWAT: [[gerrit:377358|Add more wikis to the list of the dumped]] T171807 (duration: 00m 49s)
[23:47:25] PROBLEM - configured eth on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:47:29] ^ SMalyshev live now
[23:47:36] thcipriani: thanks!
[23:47:59] (PS2) Thcipriani: Revert "Don't deploy Timeless on fr.wiktionary for now" [mediawiki-config] - https://gerrit.wikimedia.org/r/377346 (owner: Dereckson)
[23:48:09] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377346 (owner: Dereckson)
[23:48:25] PROBLEM - dhclient process on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:49:15] PROBLEM - kvm ssl cert on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:50:25] (Merged) jenkins-bot: Revert "Don't deploy Timeless on fr.wiktionary for now" [mediawiki-config] - https://gerrit.wikimedia.org/r/377346 (owner: Dereckson)
[23:50:35] (CR) jenkins-bot: Revert "Don't deploy Timeless on fr.wiktionary for now" [mediawiki-config] - https://gerrit.wikimedia.org/r/377346 (owner: Dereckson)
[23:51:05] PROBLEM - puppet last run on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:51:10] Dereckson: https://gerrit.wikimedia.org/r/#/c/377346/2 live on mwdebug1002, check please
[23:51:25] works
[23:51:46] Isarra: now you can do your on-wiki magic to reduce the wordmark :) ^
[23:51:48] (CR) Faidon Liambotis: [C: 2] role::snapshot::common: properly scope included classes [puppet] - https://gerrit.wikimedia.org/r/377493 (https://phabricator.wikimedia.org/T171704) (owner: Giuseppe Lavagetto)
[23:51:56] PROBLEM - DPKG on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:51:56] PROBLEM - salt-minion processes on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:52:02] * thcipriani deploys
[23:52:15] PROBLEM - nova-compute process on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:52:45] PROBLEM - Disk space on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds
[23:53:58] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:377346|Revert "Do not deploy Timeless on fr.wiktionary for now"]] (duration: 00m 49s)
[23:54:03] ^ Dereckson live now
[23:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:54:19] (PS2) Thcipriani: Enable Special:PageLanguage on mul.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/377357 (https://phabricator.wikimedia.org/T175622) (owner: Dereckson)
[23:54:25] (works in prod too)
[23:54:54] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/377357 (https://phabricator.wikimedia.org/T175622) (owner: Dereckson)
[23:55:20] (CR) Faidon Liambotis: "I'm not sure if we should do that, or if we should just go for data types directly. In any case, not strictly future-parser related, so ch" [puppet] - https://gerrit.wikimedia.org/r/377331 (owner: Dzahn)
[23:57:55] (Merged) jenkins-bot: Enable Special:PageLanguage on mul.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/377357 (https://phabricator.wikimedia.org/T175622) (owner: Dereckson)
[23:58:09] (CR) jenkins-bot: Enable Special:PageLanguage on mul.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/377357 (https://phabricator.wikimedia.org/T175622) (owner: Dereckson)
[23:58:40] Dereckson: https://gerrit.wikimedia.org/r/#/c/377357/2 live on mwdebug1002, check please
[23:58:59] (PS31) Faidon Liambotis: cassandra: future parser and Puppet 4 compatibility [puppet] - https://gerrit.wikimedia.org/r/372124 (https://phabricator.wikimedia.org/T171704) (owner: Gehel)
[23:59:40] Doesn't seem to work, checking more