[00:00:04] wait a second, there is already an older ticket that allowed this (krypton to logstash)...
[00:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T0000).
[00:00:05] RoanKattouw: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:25] RoanKattouw: Give us a few more minutes.
[00:00:27] (CR) CRusnov: [C: +1] "looks good!" [puppet] - https://gerrit.wikimedia.org/r/490417 (https://phabricator.wikimedia.org/T215251) (owner: Herron)
[00:00:29] I'll do the SWAT since I'm the only customer
[00:00:33] Niharika: OK, ping me when you're done
[00:06:50] Operations, Wikimedia-Logstash, serviceops: ensure httpd error logs from "misc apps" (krypton) end up in logstash - https://phabricator.wikimedia.org/T216090 (Dzahn)
[00:12:27] RoanKattouw: Go ahead and SWAT.
[00:16:03] Thanks
[00:16:34] (PS2) Catrope: GrowthExperiments: Enable help panel search on kowiki and cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/490372 (https://phabricator.wikimedia.org/T209301) (owner: Kosta Harlan)
[00:16:38] (CR) Catrope: [C: +2] GrowthExperiments: Enable help panel search on kowiki and cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/490372 (https://phabricator.wikimedia.org/T209301) (owner: Kosta Harlan)
[00:17:18] Operations, ops-codfw, decommission, Patch-For-Review: Decommission baham - https://phabricator.wikimedia.org/T199247 (Papaul) Open→Resolved Complete
[00:17:44] (Merged) jenkins-bot: GrowthExperiments: Enable help panel search on kowiki and cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/490372 (https://phabricator.wikimedia.org/T209301) (owner: Kosta Harlan)
[00:18:18] (PS2) Catrope: Enable ORES (damaging-only) on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/489934 (https://phabricator.wikimedia.org/T211032)
[00:18:45] (CR) Dzahn: [C: +1] "thanks, that was good to read. the only thing i could see would be too nitpicky so i'm not even saying it :) lgtm" [deployment-charts] - https://gerrit.wikimedia.org/r/490027 (owner: Alexandros Kosiaris)
[00:20:46] (PS7) D3r1ck01: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878)
[00:22:04] (CR) Dzahn: "https://www.quora.com/What-is-the-ideal-Max-thread-configuration-for-jetty-server-What-are-the-ways-to-estimate-calculate-the-max-threads-" [puppet] - https://gerrit.wikimedia.org/r/489475 (owner: Paladox)
[00:23:56] (CR) Dzahn: "i think we need to actually calculate the best numbers per the formula in that quora response etc.. or ideally from jetty upstream" [puppet] - https://gerrit.wikimedia.org/r/489475 (owner: Paladox)
[00:24:12] (PS3) Catrope: Enable ORES (damaging-only) on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/489934 (https://phabricator.wikimedia.org/T211032)
[00:24:23] (CR) Catrope: [C: +2] Enable ORES (damaging-only) on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/489934 (https://phabricator.wikimedia.org/T211032) (owner: Catrope)
[00:24:39] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable help panel search on cswiki and kowiki (T209301) (duration: 00m 55s)
[00:24:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:49] T209301: Help panel: enable searching help articles - https://phabricator.wikimedia.org/T209301
[00:25:25] (Merged) jenkins-bot: Enable ORES (damaging-only) on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/489934 (https://phabricator.wikimedia.org/T211032) (owner: Catrope)
[00:27:01] (CR) Dzahn: "but only if we actually solve a problem we have, which i am not sure about. there is no problem statement or event linked to this, right" [puppet] - https://gerrit.wikimedia.org/r/489475 (owner: Paladox)
[00:28:27] (PS6) Dzahn: tor: add data types to parameters [puppet] - https://gerrit.wikimedia.org/r/489347
[00:30:21] RoanKattouw: can I do a scap3 deploy of scholarships without stepping on your toes?
[00:30:36] bd808: Can you wait 5 mins just to be sure?
[00:30:43] RoanKattouw: yup
[00:32:57] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable ORES (damaging only) on itwiki (T211032) (duration: 00m 53s)
[00:33:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:33:01] T211032: Enable ORES filters on RC for Italian Wikipedia - https://phabricator.wikimedia.org/T211032
[00:35:40] bd808: all yours, go for it
[00:35:52] thx
[00:36:10] !log bd808@deploy1001 Started deploy [scholarships/scholarships@1d89fe2]: Live hack PHPMailer namespace T215302
[00:36:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:36:13] !log bd808@deploy1001 Finished deploy [scholarships/scholarships@1d89fe2]: Live hack PHPMailer namespace T215302 (duration: 00m 02s)
[00:36:13] T215302: Website Revamp - https://phabricator.wikimedia.org/T215302
[00:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:17] (CR) Dzahn: "noop in prod on torrelay1001" [puppet] - https://gerrit.wikimedia.org/r/489347 (owner: Dzahn)
[00:40:06] Operations, ops-codfw, DBA, Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (Papaul) @Marostegui in most cases the CPU1/CPU2 Machine check error detected is caused from outdated BIOS. I will recommend that we first update the BIOS. The sys...
[00:41:40] PROBLEM - Check systemd state on cloudvirt1024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:43:33] ##mpp
[00:44:12] (PS13) Dzahn: services: add missing 'mediawiki/services' prefix to git cloning [puppet] - https://gerrit.wikimedia.org/r/484602 (https://phabricator.wikimedia.org/T201366)
[01:00:04] twentyafterfour: That opportune time is upon us again. Time for a Phabricator update deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T0100).
[01:01:53] twentyafterfour: how about "phab swat" with that https://gerrit.wikimedia.org/r/c/operations/puppet/+/489121
[01:03:35] (CR) Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/14676/scandium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/484602 (https://phabricator.wikimedia.org/T201366) (owner: Dzahn)
[01:07:29] Operations, Parsoid, serviceops: parsoid-vd - "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (Dzahn)
[01:07:44] mutante: sure
[01:08:23] twentyafterfour: ok, before or after the upgrade
[01:08:48] mutante: I don't have an update to deploy today
[01:08:55] oh, heh :)
[01:09:06] (PS3) Dzahn: phabricator: Remove old mail config [puppet] - https://gerrit.wikimedia.org/r/489121 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[01:09:19] paladox: ^
[01:09:24] :)
[01:10:21] (CR) Dzahn: [C: +2] phabricator: Remove old mail config [puppet] - https://gerrit.wikimedia.org/r/489121 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[01:11:40] [/srv/phab/phabricator/conf/local/local.json]/content: content changed '{
[01:11:45] deployed
[01:11:50] please test
[01:12:37] ok
[01:12:42] mutante comment on the link task
[01:13:09] done
[01:13:13] Operations, Mail, Phabricator, serviceops, and 2 others: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (Dzahn) deployed in production (phab1001) , please test that mail still works
[01:14:42] mutante i got your email
[01:15:14] !log phab1001 - phabricator mail config converted to cluster.mailers to adjust to upstream change (T212989)
[01:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:17] T212989: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989
[01:15:36] paladox: good :) and i see outgoing in mail log too
[01:15:42] though i dont use mail myself
[01:15:46] for phab
[01:15:52] heh :)
[01:16:35] (PS1) Ayounsi: Monitoring: add cr2-eqsin [puppet] - https://gerrit.wikimedia.org/r/490518 (https://phabricator.wikimedia.org/T213121)
[01:16:39] twentyafterfour: also looks good to you?
[01:16:55] seems good, I also don't have phab set up to email me
[01:16:59] then i guess we can close that ticket
[01:18:00] Operations, ops-eqiad, ops-eqsin, netops, Patch-For-Review: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (ayounsi) a:Cmjohnson→ayounsi
[01:18:20] paladox' link at https://github.com/phacility/phabricator/commit/9d5b933ed5a36076801a95429d625941d04e77ff#diff-c083f10aba02bb34e6e823f23c980485 describes the upstream part nicely, ack
[01:18:27] yup
[01:18:39] "About a year ago (in 2018 Week 6, see D19003) we moved from individually configured mailers to `cluster.mailers`, primarily to support fallback across multiple mail providers."
[01:19:40] chaomodus: do you get mail from phab?
[01:20:34] I get mail about like things I get tagged in on yes
[01:23:45] chaomodus: did you receive one just now (sorry for doing a test)
[01:24:09] i added a comment on one you are subscribed to but not many others
[01:25:45] (PS14) Dzahn: services: add missing 'mediawiki/services' prefix to git cloning [puppet] - https://gerrit.wikimedia.org/r/484602 (https://phabricator.wikimedia.org/T201366)
[01:34:38] (CR) Dzahn: [C: +1] "new compiler output looks indeed good. ack. https://puppet-compiler.wmflabs.org/compiler1001/14676/scandium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/484602 (https://phabricator.wikimedia.org/T201366) (owner: Dzahn)
[01:37:24] mutante: I did indeed get that email
[01:39:07] chaomodus: thanks! just double checking because we changed phab config, that was all
[01:39:27] (CR) GTirloni: [C: +2] hiera: upgrade prometheus-node-exporter to 0.17 in labs [puppet] - https://gerrit.wikimedia.org/r/489753 (https://phabricator.wikimedia.org/T213708) (owner: Cwhite)
[01:40:15] Hah I see
[01:45:28] (CR) Dzahn: [C: +2] "only parsoid-testing uses the combination of "service::node" but with "deployment => git" instead of scap3, so only this was / is affected" [puppet] - https://gerrit.wikimedia.org/r/484602 (https://phabricator.wikimedia.org/T201366) (owner: Dzahn)
[01:46:45] (CR) jenkins-bot: group1 wikis to 1.33.0-wmf.17 [mediawiki-config] - https://gerrit.wikimedia.org/r/490398 (owner: Thcipriani)
[01:46:47] (CR) jenkins-bot: GrowthExperiments: Enable help panel search on kowiki and cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/490372 (https://phabricator.wikimedia.org/T209301) (owner: Kosta Harlan)
[01:46:49] (CR) jenkins-bot: Enable ORES (damaging-only) on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/489934 (https://phabricator.wikimedia.org/T211032) (owner: Catrope)
[01:46:51] (CR) Dzahn: "and for the record, only service::node uses service::deploy::gitclone" [puppet] - https://gerrit.wikimedia.org/r/484602 (https://phabricator.wikimedia.org/T201366) (owner: Dzahn)
[01:48:37] Operations, Parsoid, serviceops: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Dzahn)
[01:52:21] !log scandium - removing parsoid deploy dir and letting puppet re-clone it after merging gerrit fix 484602 - replace manual hack with proper puppet
[01:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:52:48] !log scandium - removing parsoid deploy dir and letting puppet re-clone it after merging gerrit fix 484602 - replace manual clone with proper puppetization (T201366)
[01:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:52:51] T201366: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366
[02:02:09] (PS1) Dzahn: visualdiff: create empty testrun.ids but don't change content [puppet] - https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049)
[02:02:43] (CR) jerkins-bot: [V: -1] visualdiff: create empty testrun.ids but don't change content [puppet] - https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049) (owner: Dzahn)
[02:20:17] (PS2) Dzahn: visualdiff: create empty testrun.ids but don't change content [puppet] - https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049)
[02:21:56] (CR) Dzahn: "https://puppet.com/docs/puppet/5.3/types/file.html#file-attribute-replace" [puppet] - https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049) (owner: Dzahn)
[02:22:24] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 18236088 and 0 seconds
[02:23:40] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2968 and 3 seconds
[02:46:36] PROBLEM - puppet last run on mc1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:12:40] RECOVERY - puppet last run on mc1023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:42:53] Operations, ops-eqiad, ops-eqsin, netops, Patch-For-Review: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (ayounsi)
[03:55:09] (PS1) Ayounsi: Make lvs5003 peer with cr2-eqsin [puppet] - https://gerrit.wikimedia.org/r/490525 (https://phabricator.wikimedia.org/T213121)
[03:57:03] (PS2) Ayounsi: Make lvs5003 peer with cr2-eqsin [puppet] - https://gerrit.wikimedia.org/r/490525 (https://phabricator.wikimedia.org/T213121)
[03:57:12] Operations, ops-eqiad, ops-eqsin, netops, Patch-For-Review: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (ayounsi)
[03:57:32] Operations, ops-eqiad, ops-eqsin, netops, Patch-For-Review: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (ayounsi) @BBlack @faidon @mark reviews welcome if you have some time!
[05:24:48] Operations, DBA, Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Nuria)
[05:33:44] (CR) Mobrovac: "> Sure, but I am not sure I understand the intent behind the last" [deployment-charts] - https://gerrit.wikimedia.org/r/488800 (owner: Alexandros Kosiaris)
[05:43:23] Operations, netops: Fix codfw x-connect 65373 - https://phabricator.wikimedia.org/T215193 (ayounsi) Ticket opened with CyrusOne.
[05:45:35] (CR) Mobrovac: Add EventBus multi endpoint configuration and add eventgate-analytics endpoint (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/490418 (https://phabricator.wikimedia.org/T211247) (owner: Ottomata)
[06:01:46] Operations, ops-codfw, DBA, Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (Marostegui) >>! In T214840#4953022, @Papaul wrote: > @Marostegui in most cases the CPU1/CPU2 Machine check error detected is caused from outdated BIOS. I will rec...
[06:02:55] Operations, Cloud-VPS, Traffic, netops, cloud-services-team (Kanban): Evaluate the possibility to add Juniper images to Openstack - https://phabricator.wikimedia.org/T180179 (ayounsi) Stalled→Open Bumping this task, now that WMCS is on Neutron.
[06:05:29] (PS1) Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/490535 (https://phabricator.wikimedia.org/T210713)
[06:07:19] Operations, ops-eqiad, DBA, Patch-For-Review: db1114 crashed - https://phabricator.wikimedia.org/T214720 (Marostegui) @Cmjohnson should we also try to exchange the DIMM modules listed at T214720#4937872 and see if they fail again?
[06:07:42] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/490535 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:08:48] (Merged) jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/490535 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:09:38] (CR) jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/490535 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:10:00] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1087 T210713 (duration: 00m 55s)
[06:10:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:04] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[06:10:16] !log Deploy schema change on db1087 with replication, lag will be generated on labsdb:s8 T210713
[06:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:28] Operations, ops-eqiad, Analytics, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (elukey) >>! In T215231#4951831, @Cmjohnson wrote: > @elukey is this a 1G or 10G rack? I see that labsdb1010 and 1011 have 1G, so I'd go for...
[06:53:52] (CR) Giuseppe Lavagetto: [C: -1] "Let's also remove the usage of base::service_unit here." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/490090 (owner: Muehlenhoff)
[06:56:54] Operations, Wikidata, Wikidata-Termbox-Hike, serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (Smalyshev) Thanks, T213318 makes it a bit clearer though not entirely 100% clear which parts stay in PHP and which parts move to JS. Would...
[07:13:50] (CR) Muehlenhoff: thumbor: add support for debian stretch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597) (owner: Effie Mouzeli)
[07:28:27] Operations, Wikidata, Wikidata-Termbox-Hike, serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (daniel) @Smalyshev My understanding (which may be dated or incomplete) is this: there would be no PHP rendering, JS rendering would need to...
[07:29:50] PROBLEM - Check systemd state on cloudvirt1024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:36:54] !log Stop MySQL on db1106 for reboot - T214840
[07:36:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:57] T214840: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840
[07:50:56] (PS1) Muehlenhoff: Removed LDAP access for audiohazel [puppet] - https://gerrit.wikimedia.org/r/490540
[07:52:44] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/490541
[07:54:54] (CR) Muehlenhoff: [C: +2] Removed LDAP access for audiohazel [puppet] - https://gerrit.wikimedia.org/r/490540 (owner: Muehlenhoff)
[07:54:57] (PS8) D3r1ck01: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878)
[07:54:59] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/490541 (owner: Marostegui)
[07:56:06] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/490541 (owner: Marostegui)
[07:57:13] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1087 T210713 (duration: 00m 54s)
[07:57:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:15] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[08:01:20] Operations, ops-codfw, DBA, Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (Marostegui) After the FW and BIOS upgraded I have rebooted db1106 a number of times with 4.9.0-8 and this is the result: 1st reboot: OK 2nd reboot: OK 3rd reboot...
[08:02:45] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - https://gerrit.wikimedia.org/r/490541 (owner: Marostegui)
[08:05:52] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/490543
[08:05:57] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/490543
[08:08:22] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/490543 (owner: Marostegui)
[08:09:24] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/490543 (owner: Marostegui)
[08:10:31] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1106 T214840 (duration: 00m 52s)
[08:10:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:34] T214840: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840
[08:12:54] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/490543 (owner: Marostegui)
[08:16:03] Operations, ops-codfw, DBA, Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (jcrespo) > After the FW and BIOS upgraded I have rebooted db1106 a number of times with 4.9.0-8 and this is the result yay {meme, src="goat-for-it"}
[08:19:04] (PS2) Muehlenhoff: memcached: Unconditionally use systemd [puppet] - https://gerrit.wikimedia.org/r/487898
[08:20:35] (CR) Muehlenhoff: [C: +2] memcached: Unconditionally use systemd [puppet] - https://gerrit.wikimedia.org/r/487898 (owner: Muehlenhoff)
[08:22:49] (CR) Marostegui: "@jcrespo any thoughts?" [puppet] - https://gerrit.wikimedia.org/r/489703 (owner: Marostegui)
[08:24:41] (CR) Jcrespo: [C: +1] mariadb: Disable local_infile on some roles [puppet] - https://gerrit.wikimedia.org/r/489703 (owner: Marostegui)
[08:24:59] (PS3) Marostegui: mariadb: Disable local_infile on some roles [puppet] - https://gerrit.wikimedia.org/r/489703
[08:26:05] (CR) Marostegui: [C: +2] mariadb: Disable local_infile on some roles [puppet] - https://gerrit.wikimedia.org/r/489703 (owner: Marostegui)
[08:43:15] (PS2) Muehlenhoff: profile::redis::multidc: Remove trusty support [puppet] - https://gerrit.wikimedia.org/r/489732
[08:45:54] (CR) Muehlenhoff: [C: +2] profile::redis::multidc: Remove trusty support [puppet] - https://gerrit.wikimedia.org/r/489732 (owner: Muehlenhoff)
[08:48:16] Operations, Wikidata, Wikidata-Termbox-Hike, serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (WMDE-leszek) We do not intend to maintain the "proper" UI logic in PHP. SSR service will render the page on the server side which will (via...
[08:52:22] (CR) Alexandros Kosiaris: [C: +1] service::node: Stop supporting trusty/upstart [puppet] - https://gerrit.wikimedia.org/r/490090 (owner: Muehlenhoff)
[08:56:05] (PS4) Effie Mouzeli: thumbor: add support for debian stretch [puppet] - https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597)
[08:56:40] (CR) jerkins-bot: [V: -1] thumbor: add support for debian stretch [puppet] - https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597) (owner: Effie Mouzeli)
[08:57:13] (CR) Effie Mouzeli: thumbor: add support for debian stretch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597) (owner: Effie Mouzeli)
[09:11:40] PROBLEM - Hadoop NodeManager on analytics1032 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:02] PROBLEM - Hadoop NodeManager on analytics1037 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:12] PROBLEM - Hadoop NodeManager on analytics1036 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:13] PROBLEM - Hadoop NodeManager on analytics1033 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:18] elukey: ^^^
[09:12:22] PROBLEM - Hadoop NodeManager on analytics1035 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:52] PROBLEM - Hadoop NodeManager on analytics1040 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:58] PROBLEM - Hadoop NodeManager on analytics1038 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:15:24] testing cluster sorry, I thought it was silenced
[09:15:45] fiuuu, I was trying to check logs and about to call you ;)
[09:16:20] lemme explain what happens :)
[09:16:39] I had to turn off the yarn master in testing since they were interfering with production
[09:16:51] so the orphaned yarn workers are complaining now
[09:17:16] ok
[09:17:38] * volans will have a look to https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[09:18:10] Operations, Wiki-Loves-Love, Wikimedia-Mailing-lists, User-jijiki: Reset password for wll mailling list - https://phabricator.wikimedia.org/T215390 (jijiki) Open→Resolved Please reopen if you have any issues
[09:19:17] (PS1) Gilles: Revert "Skip PDF JPX test on Jessie" [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/490565
[09:22:13] (CR) Gilles: [V: +2 C: +2] Revert "Skip PDF JPX test on Jessie" [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/490565 (owner: Gilles)
[09:22:38] (PS1) Elukey: Silence icinga alerts for the Hadoop Testing cluster [puppet] - https://gerrit.wikimedia.org/r/490566
[09:23:41] (CR) Elukey: [C: +2] Silence icinga alerts for the Hadoop Testing cluster [puppet] - https://gerrit.wikimedia.org/r/490566 (owner: Elukey)
[09:24:52] (PS1) Gehel: admin: extend Mathew's expiry date and update contact [puppet] - https://gerrit.wikimedia.org/r/490567
[09:25:16] (CR) Muehlenhoff: "This should have been hieradata/role/common/analytics_test_cluster/hadoop/standby.yaml instead of hieradata/role/common/analytics_cluster/" [puppet] - https://gerrit.wikimedia.org/r/490566 (owner: Elukey)
[09:26:07] (PS4) Gilles: Upgrade to 2.3 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/488060 (https://phabricator.wikimedia.org/T198370)
[09:27:14] moritzm: you are right, I spotted it after merging, I was about to send the fix
[09:27:32] ack
[09:27:44] (CR) Mathew.onipe: [C: +1] "Thanks Nuria and Guillaume for this!!" [puppet] - https://gerrit.wikimedia.org/r/490567 (owner: Gehel)
[09:28:54] (PS1) Elukey: Silence icinga notifications only for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/490568
[09:29:07] (PS2) Gehel: profile::maps: remove replication factor [puppet] - https://gerrit.wikimedia.org/r/490389 (https://phabricator.wikimedia.org/T215521) (owner: Mathew.onipe)
[09:29:45] Operations, Wikimedia-Mailing-lists, User-jijiki: Please create docker-sig@ mailing list - https://phabricator.wikimedia.org/T215563 (jijiki) Open→Resolved List has been created, all list options are set to default. Please add the list to https://meta.wikimedia.org/wiki/Mailing_lists/Overview...
[09:30:25] (CR) Gehel: [C: +2] profile::maps: remove replication factor [puppet] - https://gerrit.wikimedia.org/r/490389 (https://phabricator.wikimedia.org/T215521) (owner: Mathew.onipe)
[09:31:28] (CR) Gehel: "Note that increasing replication factor is documented in https://wikitech.wikimedia.org/wiki/Maps#Manual_steps, with the appropriate warni" [puppet] - https://gerrit.wikimedia.org/r/490389 (https://phabricator.wikimedia.org/T215521) (owner: Mathew.onipe)
[09:31:41] (CR) Muehlenhoff: [C: +1] Silence icinga notifications only for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/490568 (owner: Elukey)
[09:32:56] (PS2) Elukey: Silence icinga notifications only for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/490568
[09:35:21] (CR) Elukey: [C: +2] Silence icinga notifications only for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/490568 (owner: Elukey)
[09:36:58] Operations, serviceops, HHVM, Wikimedia-production-error: mw1338 hhvm complaining intermittently about TC - https://phabricator.wikimedia.org/T216084 (Joe)
[09:45:24] (CR) Gehel: [C: -1] "See comments inline" (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: Mathew.onipe)
[09:46:45] (CR) Giuseppe Lavagetto: "Turns out I was wrong in my commit message, the patch works as-is on jessie too." (1 comment) [debs/python-etcd] - https://gerrit.wikimedia.org/r/490109 (owner: Giuseppe Lavagetto)
[09:48:19] (PS2) Giuseppe Lavagetto: Fix (again) dnspython dependency [debs/python-etcd] - https://gerrit.wikimedia.org/r/490109 (https://phabricator.wikimedia.org/T209136)
[09:49:42] (CR) jerkins-bot: [V: -1] Fix (again) dnspython dependency [debs/python-etcd] - https://gerrit.wikimedia.org/r/490109 (https://phabricator.wikimedia.org/T209136) (owner: Giuseppe Lavagetto)
[09:55:02] (PS5) Effie Mouzeli: thumbor: add support for debian stretch [puppet] - https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597)
[09:55:36] (PS1) Elukey: Allow the configuration of the ZK rmstore path in yarn-site.xml [puppet/cdh] - https://gerrit.wikimedia.org/r/490572
[09:56:24] (PS2) Elukey: Allow the configuration of the ZK rmstore path in yarn-site.xml [puppet/cdh] - https://gerrit.wikimedia.org/r/490572
[09:57:45] (CR) Alexandros Kosiaris: [C: +1] lists: enforce domain or ip literal HELO check [puppet] - https://gerrit.wikimedia.org/r/490416 (https://phabricator.wikimedia.org/T215251) (owner: Herron)
[09:58:30] (CR) Giuseppe Lavagetto: [V: +2 C: +2] "Merging as per my discussion with Filippo, his -1 was based on a false premise I gave him." [debs/python-etcd] - https://gerrit.wikimedia.org/r/490109 (https://phabricator.wikimedia.org/T209136) (owner: Giuseppe Lavagetto)
[09:58:39] (CR) Alexandros Kosiaris: [C: +1] lists: drop connection if remote tries to send HELO [puppet] - https://gerrit.wikimedia.org/r/490417 (https://phabricator.wikimedia.org/T215251) (owner: Herron)
[09:59:12] (CR) DCausse: Add wdqs data transfer cookbook (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: Mathew.onipe)
[09:59:18] (CR) Alexandros Kosiaris: [C: +1] lists: add 5 second smtp banner delay [puppet] - https://gerrit.wikimedia.org/r/490481 (https://phabricator.wikimedia.org/T215251) (owner: Herron)
[10:01:01] (CR) Alexandros Kosiaris: [V: +2 C: +2] "thanks, merging, we can always make it better in subsequent patches :)" [deployment-charts] - https://gerrit.wikimedia.org/r/490027 (owner: Alexandros Kosiaris)
[10:01:23] (CR) Muehlenhoff: [C: +1] thumbor: add support for debian stretch [puppet] - https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597) (owner: Effie Mouzeli)
[10:01:41] (CR) Elukey: [C: +2] Allow the configuration of the ZK rmstore path in yarn-site.xml [puppet/cdh] - https://gerrit.wikimedia.org/r/490572 (owner: Elukey)
[10:05:18] (PS2) Alexandros Kosiaris: Add GPLv3 license to the repo to be used by all charts [deployment-charts] - https://gerrit.wikimedia.org/r/490028
[10:08:26] (CR) Alexandros Kosiaris: [C: +2] Remove all non kubernetes related zotero stuff from repo [puppet] - https://gerrit.wikimedia.org/r/490069 (owner: Alexandros Kosiaris)
[10:08:34] (PS2) Alexandros Kosiaris: Remove all non kubernetes related zotero stuff from repo [puppet] - https://gerrit.wikimedia.org/r/490069
[10:13:06] (PS1) Elukey: Define the zookeeper rmstore path on each Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/490575
[10:14:58] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/14679/" [puppet] - 10https://gerrit.wikimedia.org/r/490575 (owner: 10Elukey) [10:24:34] (03CR) 10Alexandros Kosiaris: "I 've switch to GPLv3, although I have to say I have exactly 0 expectations of the extra provisions provided by it over apache-2 ever bein" [deployment-charts] - 10https://gerrit.wikimedia.org/r/490028 (owner: 10Alexandros Kosiaris) [10:30:16] (03CR) 10DCausse: Add wdqs data transfer cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [10:39:49] (03CR) 10DCausse: Add wdqs data transfer cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [10:42:32] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, +Luca +Andrew" [puppet] - 10https://gerrit.wikimedia.org/r/490401 (https://phabricator.wikimedia.org/T213898) (owner: 10Herron) [10:46:43] (03CR) 10Elukey: [C: 03+1] "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/490401 (https://phabricator.wikimedia.org/T213898) (owner: 10Herron) [10:48:18] (03PS1) 10Muehlenhoff: Also sync pcre3 to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/490579 [10:53:30] !log bounce prometheus instances on prometheus2004 to take a snapshot [10:53:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:55:56] PROBLEM - Maps - OSM synchronization lag - codfw on icinga2001 is CRITICAL: 3.85e+05 ge 2.592e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1 [10:56:05] there will be a bunch of UNKNOWN in icinga btw, should be recovering now [10:59:37] !log installing python3.4 security updates [10:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:50] !log Disabling puppet on thumbor* servers - T214597 [11:02:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:53] T214597: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 [11:03:06] jijiki: \o/ [11:03:56] !log rolling security updates for curl [11:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:37] <_joe_> !log upgrading python3-etcd on stretch T209136 [11:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:40] T209136: python3-etcd needs python3-dnspython - https://phabricator.wikimedia.org/T209136 [11:05:13] <_joe_> jbond42: oh sorry [11:05:13] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] Add GPLv3 license to the repo to be used by all charts [deployment-charts] - 10https://gerrit.wikimedia.org/r/490028 (owner: 10Alexandros Kosiaris) [11:05:17] <_joe_> lemme know when you're done [11:05:26] <_joe_> I risk interfering with your operations [11:05:50] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Add GPLv3 license to the repo to be used by all charts [deployment-charts] - 10https://gerrit.wikimedia.org/r/490028 
(owner: 10Alexandros Kosiaris) [11:06:03] _joe_: :D no problem will be done in 15 mins [11:07:54] (03CR) 10Effie Mouzeli: [C: 03+2] thumbor: add support for debian stretch [puppet] - 10https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597) (owner: 10Effie Mouzeli) [11:08:03] (03PS6) 10Effie Mouzeli: thumbor: add support for debian stretch [puppet] - 10https://gerrit.wikimedia.org/r/490405 (https://phabricator.wikimedia.org/T214597) [11:12:48] <_joe_> jijiki: uhm have you tested what happens on current thumbor servers? [11:13:16] _joe_: I disabled puppet for that reason [11:13:37] <_joe_> yeah I thought librsvg2-2 might need to go in a distro conditional [11:13:39] <_joe_> that's all [11:14:04] hmm well in theory since it is already installed on the current servers [11:14:09] it should do nothing [11:14:13] <_joe_> but worse that can happen is puppet fails to install it and you can add the conditional at that point [11:14:15] <_joe_> oh ok [11:14:18] <_joe_> then it's fine [11:14:36] <_joe_> and yes, if we don't plan to reinstall with jessie ever again (yay) it makes sense :) [11:14:52] yep :) [11:15:09] <_joe_> (I almost always disable puppet for any non-trivial change btw) [11:21:04] (03CR) 10Alexandros Kosiaris: [C: 03+2] "PCC at https://puppet-compiler.wmflabs.org/compiler1001/14680/ is OK, merging" [puppet] - 10https://gerrit.wikimedia.org/r/490069 (owner: 10Alexandros Kosiaris) [11:21:17] (03PS3) 10Alexandros Kosiaris: Remove all non kubernetes related zotero stuff from repo [puppet] - 10https://gerrit.wikimedia.org/r/490069 [11:21:20] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Remove all non kubernetes related zotero stuff from repo [puppet] - 10https://gerrit.wikimedia.org/r/490069 (owner: 10Alexandros Kosiaris) [11:21:39] _joe_: im done with updates [11:21:54] <_joe_> jbond42: quite literally 15 minutes :D [11:21:56] <_joe_> thanks [11:22:01] 10Operations: Netbox: fill network topology - 
https://phabricator.wikimedia.org/T205897 (10faidon) The medium-term plan is for this data to be entered into Netbox after a server is racked but before it's provisioned or even powered up, and that data to be used by our tooling to configure and execute the provisio... [11:22:13] :) [11:24:22] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) Thanks @EBernhardson! Buried in the previous updates there is a solution to this mess, namely using python3.6 via snapshot.debian.org.... [11:24:46] (03CR) 10Giuseppe Lavagetto: Add a prune action (031 comment) [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703) (owner: 10Giuseppe Lavagetto) [11:27:51] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/490353 (https://phabricator.wikimedia.org/T213527) (owner: 10Muehlenhoff) [11:34:19] (03PS1) 10Filippo Giunchedi: prometheus: add rules_k8s.yml converted from rules_k8s.conf [puppet] - 10https://gerrit.wikimedia.org/r/490582 (https://phabricator.wikimedia.org/T187987) [11:36:38] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10MoritzMuehlenhoff) >>! In T148843#4953772, @elukey wrote: > 2) the null pointer is due to some code for the Hawaii GPU cards (like ours), so no... 
[11:38:15] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: add rules_k8s.yml converted from rules_k8s.conf [puppet] - 10https://gerrit.wikimedia.org/r/490582 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [11:39:30] (03CR) 10Mathew.onipe: Add wdqs data transfer cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [11:40:03] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) Yep definitely, I am all for it. I hoped to get this GPU working beforehand to have a vague idea about what card we needed (and what qu... [11:43:12] <_joe_> !log restarting hhvm on mw1338, hot tc exhausted T216084 [11:43:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:15] T216084: mw1338 hhvm complaining intermittently about TC - https://phabricator.wikimedia.org/T216084 [11:43:18] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10MoritzMuehlenhoff) https://rocm.github.io/ROCmInstall.html#supported-gpus should serve as a useful enough base to select a new GPU I guess (we'... [11:43:38] 10Operations, 10serviceops, 10HHVM, 10Wikimedia-production-error: mw1338 hhvm complaining intermittently about TC - https://phabricator.wikimedia.org/T216084 (10Joe) a:03Joe [11:44:00] PROBLEM - HHVM jobrunner on mw1338 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.074 second response time [11:44:35] 10Operations, 10serviceops, 10HHVM, 10Wikimedia-production-error: mw1338 hhvm complaining intermittently about TC - https://phabricator.wikimedia.org/T216084 (10Joe) 05Open→03Resolved The Jit TC hot portion was completely full and exhausted on that server. A restart of HHVM should've solved the issue. 
[11:45:14] RECOVERY - HHVM jobrunner on mw1338 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.077 second response time [11:48:11] (03PS1) 10Effie Mouzeli: thumbor: add support for debian stretch [puppet] - 10https://gerrit.wikimedia.org/r/490583 (https://phabricator.wikimedia.org/T214597) [11:49:13] 10Operations, 10Operations-Software-Development, 10Core Platform Team Backlog (Watching / External), 10Patch-For-Review, 10Services (watching): python3-etcd needs python3-dnspython - https://phabricator.wikimedia.org/T209136 (10Joe) 05Open→03Resolved [11:50:25] 10Operations, 10RESTBase, 10RESTBase-Cassandra, 10Core Platform Team Backlog (Watching / External), and 2 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (10Joe) 05Open→03Resolved ` oblivian@restbase1016:~$ sudo -i pool-restbase oblivian@restbase1016:~$ echo $? 0 ` [11:51:09] (03CR) 10Effie Mouzeli: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/14681/ NOOP (expected for jessie hosts)" [puppet] - 10https://gerrit.wikimedia.org/r/490583 (https://phabricator.wikimedia.org/T214597) (owner: 10Effie Mouzeli) [11:58:13] (03PS26) 10Jbond: Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) [11:58:36] (03CR) 10Jbond: "Ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) (owner: 10Jbond) [11:59:56] 10Operations, 10monitoring, 10Patch-For-Review: Evaluate/integrate rasdaemon as a replacement for mcelog - https://phabricator.wikimedia.org/T205396 (10jbond) 05Open→03Resolved rasdaemon is now part of the buster policy [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European Mid-day SWAT(Max 6 patches). 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T1200). [12:00:04] kart_: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:14] o/ [12:00:18] (03PS1) 10WMDE-leszek: Added settings to defined Wikibase entity types that have no RDF output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490586 [12:00:18] I can SWAT today [12:00:20] (03PS1) 10WMDE-leszek: Disable RDF output of mediainfo Wikibase entities [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490587 (https://phabricator.wikimedia.org/T213483) [12:00:28] kart_: around for swat? [12:00:40] (03PS37) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [12:00:53] (03CR) 10Jbond: "Ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [12:01:27] zeljkof: yes [12:01:43] (03PS3) 10Jbond: Update the location of the contacts file [puppet] - 10https://gerrit.wikimedia.org/r/490362 (https://phabricator.wikimedia.org/T82937) [12:01:45] kart_: ok, I'll ping you when the patch is at mwdebug for testing [12:01:49] (03CR) 10jerkins-bot: [V: 04-1] Added settings to defined Wikibase entity types that have no RDF output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490586 (owner: 10WMDE-leszek) [12:02:01] zeljkof: OK. Nothing much we can test, but still.. 
[12:02:15] (03CR) 10jerkins-bot: [V: 04-1] Disable RDF output of mediainfo Wikibase entities [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490587 (https://phabricator.wikimedia.org/T213483) (owner: 10WMDE-leszek) [12:03:04] (03CR) 10Jbond: [C: 03+2] Update the location of the contacts file [puppet] - 10https://gerrit.wikimedia.org/r/490362 (https://phabricator.wikimedia.org/T82937) (owner: 10Jbond) [12:04:30] kart_: nobody from your team is a deployer? (just asking) [12:04:30] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) I'd stick with: ` GFX9 GPUs “Vega 10” chips, such as on the AMD Radeon RX Vega 64 and Radeon Instinct MI25 “Vega 7nm” chips ` The pri... [12:04:51] because, you know, you could deploy your patches yourself :) [12:05:06] zeljkof: That reminds me, I'm. But, I hate to deploy my patches. [12:05:14] kart_: uh oh :P [12:05:26] I hate to deploy other people patches ;P [12:05:32] zeljkof: Let's restart with next deploy? [12:05:35] that will have to change [12:06:10] kart_: sure, there's no rush, but devs should deploy own patches, we are here in case we're needed, but deployments should be taken care of by devs [12:06:16] zeljkof: Probably I can be your trainee for a week and I'll refresh my deployment knowledge. [12:07:08] zeljkof: Let's start with next week's EU Evening SWAT. [12:07:13] kart_: sure, I'm almost always around, both IRC and Meet training is available :) [12:07:36] I'm around only for EU SWAT, but other #releng people will be glad to help you at other times [12:10:35] cool. Thanks! [12:13:50] kart_: ok, merged, will be at mwdebug in a minute or two [12:14:08] OK! [12:16:46] kart_: it's at mwdebug1002, please test [12:18:37] zeljkof: OK. [12:20:08] zeljkof: go ahead. Nothing much to test except it doesn't break current status. 
[12:20:22] ok, deploying [12:21:31] !log zfilipin@deploy1001 Synchronized php-1.33.0-wmf.17/extensions/ExternalGuidance/: SWAT: [[gerrit:490523|Fix the eventlogging schema definition as per manifest_version=2]] (duration: 00m 55s) [12:21:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:40] kart_: deployed! [12:21:48] zeljkof: thanks! [12:21:49] please test and thanks for deploying with #releng ;) [12:22:13] !log EU SWAT finished [12:22:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:35] (03PS1) 10ArielGlenn: generate recombined multistream index file without (m)awk [dumps] - 10https://gerrit.wikimedia.org/r/490591 (https://phabricator.wikimedia.org/T215414) [12:26:20] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] Also sync pcre3 to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/490579 (owner: 10Muehlenhoff) [12:51:07] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10BBlack) >>! In T205897#4953769, @faidon wrote: > The medium-term plan is for this data to be entered into Netbox after a server is racked but before it's provisioned or even powered up, and that data to be used by our tool... [12:55:41] (03PS2) 10Muehlenhoff: Also sync pcre3/libzip to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/490579 [12:58:53] (03PS2) 10WMDE-leszek: Added a setting to define Wikibase entity types that have no RDF output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490586 (https://phabricator.wikimedia.org/T213483) [12:58:58] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Overall LGTM, an inline comments plus this misses a hieradata rule for the user. 
The user should be in the public repo, the password in th" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490397 (owner: 10CRusnov) [12:59:06] Hi ops team - I have trouble editing wikitech - I get JS errors (Cannot read property 'getDefaultMode' of undefined) when trying to visual-edit or edit-source [12:59:44] (03CR) 10Alexandros Kosiaris: [C: 04-1] Add ganeti read-only user deployment (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490397 (owner: 10CRusnov) [13:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T1300) [13:00:29] (03PS2) 10WMDE-leszek: Disable RDF output of mediainfo Wikibase entities [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490587 (https://phabricator.wikimedia.org/T213483) [13:01:53] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Add GPLv3 license to the repo to be used by all charts [deployment-charts] - 10https://gerrit.wikimedia.org/r/490028 (owner: 10Alexandros Kosiaris) [13:02:27] 10Operations, 10DC-Ops, 10Wikimedia-Incident, 10cloud-services-team (Kanban): Increase visibility of auto-generated tasks for RAID errors - https://phabricator.wikimedia.org/T216133 (10aborrero) [13:03:01] 10Operations, 10DC-Ops, 10Wikimedia-Incident, 10cloud-services-team (Kanban): Increase visibility of auto-generated tasks for RAID errors - https://phabricator.wikimedia.org/T216133 (10aborrero) p:05Triage→03High [13:08:46] (03PS3) 10Giuseppe Lavagetto: Add a prune action [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703) [13:11:35] (03CR) 10Mathew.onipe: Add wdqs data transfer cookbook (035 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [13:13:00] (03PS6) 10Mathew.onipe: Add wdqs data transfer cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) [13:13:52] 
(03PS3) 10Muehlenhoff: Also sync pcre3/libzip to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/490579 [13:14:10] (03CR) 10Mathew.onipe: Add wdqs data transfer cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [13:15:42] PROBLEM - Check systemd state on cloudvirt1024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:17:52] (03CR) 10Muehlenhoff: [C: 03+2] Also sync pcre3/libzip to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/490579 (owner: 10Muehlenhoff) [13:21:48] PROBLEM - Check systemd state on cloudvirt1024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [13:23:44] (03CR) 10jerkins-bot: [V: 04-1] Add a prune action [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703) (owner: 10Giuseppe Lavagetto) [13:24:01] 10Operations, 10monitoring, 10Patch-For-Review: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10fgiunchedi) Status update: yesterday I've reimaged prometheus2003 and prometheus 2.7.1 has been running there, host is still depooled but collecting... 
[13:24:33] ACKNOWLEDGEMENT - MegaRAID on cloudvirt1024 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T216135 [13:24:39] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T216135 (10ops-monitoring-bot) [13:25:58] (03PS2) 10Gehel: admin: extend Mathew's expiry date and update contact [puppet] - 10https://gerrit.wikimedia.org/r/490567 [13:27:06] (03CR) 10Gehel: [C: 03+2] admin: extend Mathew's expiry date and update contact [puppet] - 10https://gerrit.wikimedia.org/r/490567 (owner: 10Gehel) [13:27:46] (03PS5) 10Gehel: cassandra: package should be installed after apt-get update [puppet] - 10https://gerrit.wikimedia.org/r/490303 (https://phabricator.wikimedia.org/T214073) [13:29:01] (03CR) 10Gehel: [C: 03+2] cassandra: package should be installed after apt-get update [puppet] - 10https://gerrit.wikimedia.org/r/490303 (https://phabricator.wikimedia.org/T214073) (owner: 10Gehel) [13:31:08] RECOVERY - puppet last run on phab1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:34:06] (03PS2) 10Gehel: elasticsearch: check mean shard size intead of largest [puppet] - 10https://gerrit.wikimedia.org/r/490333 [13:35:25] (03CR) 10Gehel: [C: 03+2] elasticsearch: check mean shard size intead of largest [puppet] - 10https://gerrit.wikimedia.org/r/490333 (owner: 10Gehel) [13:35:57] 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10Jhernandez) Given the SCB deprecation and the incoming work load to re-architect this to work on Kubernet... 
[13:37:53] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T216068 (10aborrero) [13:37:56] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (10aborrero) [13:39:24] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T216135 (10aborrero) [13:39:28] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (10aborrero) [13:39:35] !log T215892 icinga downtime cloudvirt1024 for 2 weeks [13:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:38] T215892: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 [13:40:49] (03PS1) 10Filippo Giunchedi: install_server: use default distribution for logstash100[789] [puppet] - 10https://gerrit.wikimedia.org/r/490601 (https://phabricator.wikimedia.org/T213898) [13:40:53] (03PS1) 10Filippo Giunchedi: scap: use logstash service name for logstash_host [puppet] - 10https://gerrit.wikimedia.org/r/490602 (https://phabricator.wikimedia.org/T213898) [13:41:55] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: use default distribution for logstash100[789] [puppet] - 10https://gerrit.wikimedia.org/r/490601 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi) [13:42:03] (03PS2) 10Filippo Giunchedi: install_server: use default distribution for logstash100[789] [puppet] - 10https://gerrit.wikimedia.org/r/490601 (https://phabricator.wikimedia.org/T213898) [13:54:07] (03CR) 10Filippo Giunchedi: [C: 04-1] "I take this back, 9200 (elasticsearch) isn't exposed via lvs, probably should though" [puppet] - 10https://gerrit.wikimedia.org/r/490602 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi) [13:56:17] Any sysadmin around? 
I need to perform a large history deletion [13:58:53] (03CR) 10Gehel: [C: 04-1] Add wdqs data transfer cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [14:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T1400) [14:01:01] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] thumbor: add support for debian stretch [puppet] - 10https://gerrit.wikimedia.org/r/490583 (https://phabricator.wikimedia.org/T214597) (owner: 10Effie Mouzeli) [14:01:10] (03CR) 10Gehel: [C: 03+2] elasticsearch: add doc type to delete query [software/spicerack] - 10https://gerrit.wikimedia.org/r/490383 (https://phabricator.wikimedia.org/T207920) (owner: 10Mathew.onipe) [14:01:24] (03PS2) 10Effie Mouzeli: thumbor: add support for debian stretch [puppet] - 10https://gerrit.wikimedia.org/r/490583 (https://phabricator.wikimedia.org/T214597) [14:02:05] (03CR) 10jenkins-bot: elasticsearch: add doc type to delete query [software/spicerack] - 10https://gerrit.wikimedia.org/r/490383 (https://phabricator.wikimedia.org/T207920) (owner: 10Mathew.onipe) [14:06:18] (03PS1) 10Elukey: profile::analytics::refinery: require python3-dnspthon [puppet] - 10https://gerrit.wikimedia.org/r/490604 (https://phabricator.wikimedia.org/T212386) [14:07:27] (03PS2) 10Elukey: profile::analytics::refinery: require python3-dnspthon [puppet] - 10https://gerrit.wikimedia.org/r/490604 (https://phabricator.wikimedia.org/T212386) [14:08:39] 10Operations, 10ops-eqiad: Broken memory on thumbor1004 - https://phabricator.wikimedia.org/T207721 (10CDanis) [14:08:42] 10Operations, 10ops-eqiad, 10Thumbor, 10serviceops: thumbor1004 memory errors - https://phabricator.wikimedia.org/T215411 (10CDanis) [14:09:09] (03PS3) 10Elukey: profile::analytics::refinery: require python3-dnspthon [puppet] - 10https://gerrit.wikimedia.org/r/490604 (https://phabricator.wikimedia.org/T212386) 
[14:12:01] 10Operations, 10Traffic: Investigating using CI to automate testing VCL changes against all cluster/dc combos - https://phabricator.wikimedia.org/T216140 (10akosiaris) [14:12:06] 10Operations, 10Traffic: Investigating using CI to automate testing VCL changes against all cluster/dc combos - https://phabricator.wikimedia.org/T216140 (10akosiaris) p:05Triage→03Low [14:12:28] (03CR) 10Elukey: [C: 03+2] profile::analytics::refinery: require python3-dnspthon [puppet] - 10https://gerrit.wikimedia.org/r/490604 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey) [14:12:38] !log Enabling puppet on thumbor* servers - T214597 [14:12:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:41] T214597: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 [14:14:25] (03PS2) 10Filippo Giunchedi: scap: use logstash1008 for logstash_host [puppet] - 10https://gerrit.wikimedia.org/r/490602 (https://phabricator.wikimedia.org/T213898) [14:19:11] (03CR) 10Filippo Giunchedi: [C: 03+2] scap: use logstash1008 for logstash_host [puppet] - 10https://gerrit.wikimedia.org/r/490602 (https://phabricator.wikimedia.org/T213898) (owner: 10Filippo Giunchedi) [14:19:20] (03PS3) 10Filippo Giunchedi: scap: use logstash1008 for logstash_host [puppet] - 10https://gerrit.wikimedia.org/r/490602 (https://phabricator.wikimedia.org/T213898) [14:20:07] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10Papaul) @Marostegui this can be done anytime today. Just let me know when the server is down. Thanks [14:20:44] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10Marostegui) @Papaul thanks - I am going to put it down now. Will ping you on IRC once it is down Thanks! 
[14:20:47] !log Stop MySQL on db2085 for on-site maintenance - T214840 [14:20:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:49] T214840: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 [14:24:04] Melos: What do you need? [14:26:32] marostegui: Usually before doing a bigdelete we ask for a sysadmin if something should go wrong (there are 57000 entries) [14:26:42] Melos: on which wiki is that? [14:26:50] itwiki [14:27:08] Melos: Let me check one thing [14:27:28] 10Operations, 10ops-eqiad, 10Thumbor, 10serviceops: thumbor1004 memory errors - https://phabricator.wikimedia.org/T215411 (10jijiki) @RobH How should we proceed? [14:27:53] (03PS1) 10Arturo Borrero Gonzalez: systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) [14:28:43] Melos: You can proceed [14:28:50] ok thank you [14:28:53] (03CR) 10jerkins-bot: [V: 04-1] systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [14:29:00] Melos: Can you let me know when done? [14:29:47] sure [14:29:52] thank you [14:30:13] (03PS1) 10Gehel: maps: restrict OSM sync check to maps2001 [puppet] - 10https://gerrit.wikimedia.org/r/490607 (https://phabricator.wikimedia.org/T215521) [14:31:56] marostegui: done thank you [14:32:09] Excellent! Thanks! [14:32:23] 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10dr0ptp4kt) Hi all - I was aware of this task but hadn't been following it. But it was brought to my atten... 
[14:34:45] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Thumbor upgrade to stretch plan - https://phabricator.wikimedia.org/T214597 (10jijiki) [14:34:54] (03PS4) 10Giuseppe Lavagetto: Add a prune action [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703) [14:36:08] (03CR) 10Mathew.onipe: [C: 03+1] maps: restrict OSM sync check to maps2001 [puppet] - 10https://gerrit.wikimedia.org/r/490607 (https://phabricator.wikimedia.org/T215521) (owner: 10Gehel) [14:38:13] (03PS7) 10Mathew.onipe: Add wdqs data transfer cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) [14:39:08] (03CR) 10Mathew.onipe: Add wdqs data transfer cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [14:43:12] (03PS1) 10Effie Mouzeli: Upgrade thumbor2002 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/490610 (https://phabricator.wikimedia.org/T214597) [14:43:56] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Add a prune action [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703) (owner: 10Giuseppe Lavagetto) [14:45:34] !log depool and stop logstash1009 for stretch reimage - T213898 [14:45:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:37] T213898: Replace and expand Elasticsearch storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898 [14:46:12] (03Merged) 10jenkins-bot: Add a prune action [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703) (owner: 10Giuseppe Lavagetto) [14:46:52] (03CR) 10jenkins-bot: Add a prune action [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485499 (https://phabricator.wikimedia.org/T207703) (owner: 10Giuseppe 
Lavagetto) [14:48:20] (03CR) 10CDanis: [C: 03+1] Upgrade thumbor2002 to stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490610 (https://phabricator.wikimedia.org/T214597) (owner: 10Effie Mouzeli) [14:49:41] (03CR) 10Muehlenhoff: Upgrade thumbor2002 to stretch (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490610 (https://phabricator.wikimedia.org/T214597) (owner: 10Effie Mouzeli) [14:49:43] (03CR) 10Gehel: [C: 03+2] maps: restrict OSM sync check to maps2001 [puppet] - 10https://gerrit.wikimedia.org/r/490607 (https://phabricator.wikimedia.org/T215521) (owner: 10Gehel) [14:49:56] (03PS1) 10Alexandros Kosiaris: HTTP availability alerts: Decrease retries to 1 [puppet] - 10https://gerrit.wikimedia.org/r/490612 [14:50:46] PROBLEM - logstash syslog TCP port on logstash1009 is CRITICAL: connect to address 127.0.0.1 and port 10514: Connection refused [14:50:56] PROBLEM - logstash log4j TCP port on logstash1009 is CRITICAL: connect to address 127.0.0.1 and port 4560: Connection refused [14:51:12] PROBLEM - Check systemd state on logstash1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[14:51:32] (03CR) 10Muehlenhoff: [C: 03+1] systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [14:51:34] PROBLEM - logstash JSON linesTCP port on logstash1009 is CRITICAL: connect to address 127.0.0.1 and port 11514: Connection refused [14:51:34] PROBLEM - logstash process on logstash1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 499 (logstash), command name java, args logstash [14:51:43] ah that's me, forgot to silence [14:53:04] RECOVERY - Maps - OSM synchronization lag - codfw on icinga2001 is OK: (C)2.592e+05 ge (W)1.764e+05 ge 1.4e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1 [14:53:07] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.15 [software/spicerack] - 10https://gerrit.wikimedia.org/r/490614 [15:01:02] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.15 [software/spicerack] - 10https://gerrit.wikimedia.org/r/490614 (owner: 10Volans) [15:01:08] (03CR) 10Mathew.onipe: [C: 03+1] CHANGELOG: add changelogs for release v0.0.15 [software/spicerack] - 10https://gerrit.wikimedia.org/r/490614 (owner: 10Volans) [15:07:15] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.15 [software/spicerack] - 10https://gerrit.wikimedia.org/r/490614 (owner: 10Volans) [15:08:14] (03CR) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.15 [software/spicerack] - 10https://gerrit.wikimedia.org/r/490614 (owner: 10Volans) [15:09:38] (03PS1) 10Volans: Upstream release v0.0.15 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/490616 [15:10:55] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10Papaul) a:05Papaul→03Marostegui Upgrade BIOS from 2.4.3 to 2.9.1 IDRAC from 2.40. 
to 2.60 [15:12:06] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10Marostegui) Thank you! I will delete the idrac logs and start testing [15:12:09] !log Clear idrac logs from db2085 - T214840 [15:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:13] T214840: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 [15:12:14] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T216004 (10Andrew) This host is now fully drained, so the dcops folks can do whatever, whenever. [15:12:35] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T215892 (10Andrew) This host is now fully drained, so the dcops folks can do whatever, whenever. [15:17:01] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.15 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/490616 (owner: 10Volans) [15:17:42] RECOVERY - Check systemd state on logstash1009 is OK: OK - running: The system is fully operational [15:20:50] RECOVERY - logstash process on logstash1009 is OK: PROCS OK: 1 process with UID = 498 (logstash), command name java, args logstash [15:23:14] (03Merged) 10jenkins-bot: Upstream release v0.0.15 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/490616 (owner: 10Volans) [15:26:19] (03PS1) 10Andrew Bogott: cloudvirt1018: when we reimage, do so as Stretch [puppet] - 10https://gerrit.wikimedia.org/r/490619 (https://phabricator.wikimedia.org/T212302) [15:26:46] !log uploaded spicerack_0.0.15-1_amd64.deb to apt.wikimedia.org stretch-wikimedia [15:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:40] (03CR) 10Andrew Bogott: [C: 03+2] cloudvirt1018: when we reimage, do so as Stretch [puppet] - 
10https://gerrit.wikimedia.org/r/490619 (https://phabricator.wikimedia.org/T212302) (owner: 10Andrew Bogott) [15:27:52] (03PS1) 10Marostegui: dbproxy1011: Depool labsdb1009 [puppet] - 10https://gerrit.wikimedia.org/r/490620 (https://phabricator.wikimedia.org/T210713) [15:28:02] 10Operations, 10DC-Ops, 10Wikimedia-Incident, 10cloud-services-team (Kanban): Increase visibility of auto-generated tasks for RAID errors - https://phabricator.wikimedia.org/T216133 (10faidon) We discussed this a little bit yesterday, and T216088 was filed to further discuss this. Help there is welcome :)... [15:28:17] (03CR) 10Marostegui: [C: 04-2] "Wait for labsdb1010 to be done" [puppet] - 10https://gerrit.wikimedia.org/r/490620 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [15:28:42] !log upgraded spicerack to v0.0.15 on cumin[12]001 [15:28:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:47] onimisionipe, gehel: ^^^ [15:29:01] volans: thanks! [15:29:14] RECOVERY - logstash syslog TCP port on logstash1009 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 10514 [15:29:32] RECOVERY - logstash log4j TCP port on logstash1009 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 4560 [15:29:39] yw :) [15:29:50] RECOVERY - logstash JSON linesTCP port on logstash1009 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11514 [15:30:16] 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10akosiaris) >>! In T211881#4952874, @Jrbranaa wrote: >> In any case, and with the risk of repeating myself... [15:32:11] 10Operations: Mapping of servers to stakeholders - https://phabricator.wikimedia.org/T216088 (10Volans) I went ahead and tried the naming convention approach adding a column to that table and adding my Phabricator username there where relevant. 
I've actually added a link to the phab profile, probably just the nam... [15:39:25] (03PS2) 10Marostegui: dbproxy1011: Depool labsdb1009 [puppet] - 10https://gerrit.wikimedia.org/r/490620 (https://phabricator.wikimedia.org/T210713) [15:43:46] !log START - Cookbook sre.hosts.upgrade-and-reboot (volans@cumin1001) [15:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:40] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10Marostegui) 05Open→03Resolved Reboot tests with db2085 4.9.0-8 after getting the BIOS and FW upgraded by Papaul (T214840#4954418) 1st reboot: OK 2nd reboot:... [15:45:49] (03CR) 10Gehel: [C: 04-1] "See comments inline." (037 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: 10Mathew.onipe) [15:46:00] (03PS1) 10Fsero: Updating tiller image to latest version [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/490621 [15:46:29] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10MoritzMuehlenhoff) Are there other servers of that batch beside db1106 and db2085? 
[15:47:42] (03CR) 10Alexandros Kosiaris: [C: 03+1] Updating tiller image to latest version [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/490621 (owner: 10Fsero) [15:48:15] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490622 [15:48:29] (03PS2) 10Marostegui: Revert "db-codfw.php: Depool db2085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490622 [15:48:48] (03CR) 10Fsero: [V: 03+2 C: 03+2] "Thanks :)" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/490621 (owner: 10Fsero) [15:50:10] !log END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) (volans@cumin1001) [15:50:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:18] !log building and publishing new tiller docker image on boron [15:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:51:58] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10jcrespo) yeah, I would like to see this applied to similar servers- while not in a hurry, I prefer this done rather than suffering after a crash or an emergency r... [15:52:30] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10Nuria) Giving my approval here to buy a new GPU card, need to consult with @elukey when it comes to budget but I think we could use part of the... 
[15:52:57] (03CR) 10Nuria: [C: 03+1] admin: extend Mathew's expiry date and update contact [puppet] - 10https://gerrit.wikimedia.org/r/490567 (owner: 10Gehel) [16:02:23] (03PS1) 10BryanDavis: php72: sync php-defaults to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/490623 (https://phabricator.wikimedia.org/T216076) [16:06:17] (03PS1) 10Fsero: typo: tag format doesnt allow ~ [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/490625 [16:06:56] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10crusnov) >>! In T205897#4953987, @BBlack wrote: >>>! In T205897#4953769, @faidon wrote: >> The medium-term plan is for this data to be entered into Netbox after a server is racked but before it's provisioned or even powere... [16:07:22] (03PS1) 10Filippo Giunchedi: Update config and modules [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/490626 [16:07:24] (03PS1) 10Filippo Giunchedi: Use hwraid packages from WMF apt repo [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/490627 [16:07:26] (03PS1) 10Filippo Giunchedi: .gitignore: more debirf profile ignores [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/490628 [16:07:28] (03PS1) 10Filippo Giunchedi: Import 'packages' file into debian-amd64 [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/490629 [16:07:30] (03PS1) 10Filippo Giunchedi: Makefile: add conveniency targets [software/rescue-pxe] - 10https://gerrit.wikimedia.org/r/490630 [16:07:50] cdanis: ^ the patches I mentioned yesterday [16:08:06] (03CR) 10Marostegui: [C: 03+2] Revert "db-codfw.php: Depool db2085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490622 (owner: 10Marostegui) [16:08:51] oooooh [16:09:13] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490622 (owner: 10Marostegui) [16:09:43] godog: this is awesome [16:10:14] (03CR) 10Fsero: [V: 03+2 C: 03+2] "automerging for minor typo" 
[docker-images/production-images] - 10https://gerrit.wikimedia.org/r/490625 (owner: 10Fsero) [16:10:15] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2085 - T214840 (duration: 00m 52s) [16:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:18] T214840: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 [16:10:28] (03CR) 10BBlack: [C: 03+1] "Makes sense!" [puppet] - 10https://gerrit.wikimedia.org/r/490612 (owner: 10Alexandros Kosiaris) [16:11:26] cdanis: yeah debirf is pretty sweet! feel free to review/change/etc at will if you have time, unlikely I will for another little while [16:11:42] !log updating tiller version on staging cluster [16:11:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:11] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] php72: sync php-defaults to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/490623 (https://phabricator.wikimedia.org/T216076) (owner: 10BryanDavis) [16:12:18] 10Operations, 10Maps, 10Traffic, 10Reading-Infrastructure-Team-Backlog (Kanban): Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (10Jhernandez) Hey @Mholloway, what is next for this? Resolve, clarify what is needed before resolving, make a follow up task, or someth... [16:13:53] godog: I didn't even know we had this [16:16:06] cdanis: hehe in production we still don't, though now most pieces are in place to have the debirf image as an option for sure [16:16:33] herron: you interested in this as well? 
^^ [16:17:52] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490622 (owner: 10Marostegui) [16:18:11] chaomodus: you too ;) ^^^ [16:19:17] (03PS2) 10BryanDavis: systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [16:19:44] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10Marostegui) All eqiad servers from the same batch as db1106 are running 4.9.0-8 already db1096-db1106 All codfw servers from the same batch as db2085 are running... [16:20:12] 10Operations, 10cloud-services-team (Kanban): reprepro: automate incoming processing - https://phabricator.wikimedia.org/T215812 (10aborrero) [16:21:39] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10jcrespo) My suggestion would be to take one or 2 codfw servers, reboot it a few times and see if it suffers the same issues. Maybe I just got lucky and the next t... [16:22:44] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10jcrespo) It is really easy to reboot codfw servers, and I can take care of that if you want me to. [16:22:55] (03CR) 10BryanDavis: "> Uploaded patch set 2." 
[puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [16:23:06] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10Marostegui) Sure - go ahead :-) [16:23:11] (03PS8) 10Eevans: Initial configuration for session storage service [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) [16:23:55] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10jcrespo) Creating a separate ticket for that, will refer here. [16:25:21] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (10MoritzMuehlenhoff) JFTR, the next Stretch update (this weekend) will update the kernel to 4.9.144-2, so that can be piggybacked. [16:26:51] !log upgrading tiller on codfw [16:26:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:12] PROBLEM - puppet last run on labtestmetal2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tmux] [16:31:38] (03CR) 10Arturo Borrero Gonzalez: "> > Uploaded patch set 2." 
[puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [16:32:16] (03CR) 10Filippo Giunchedi: Initial configuration for session storage service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) (owner: 10Eevans) [16:32:17] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) Need to triple check with somebody else but from the inventory stat1005 is a Dell PowerEdge 730, that should be equipped with a Intel X... [16:33:09] (03PS1) 10Gehel: elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/490631 [16:33:14] (03CR) 10Filippo Giunchedi: "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/490612 (owner: 10Alexandros Kosiaris) [16:33:19] (03CR) 10Filippo Giunchedi: [C: 03+1] HTTP availability alerts: Decrease retries to 1 [puppet] - 10https://gerrit.wikimedia.org/r/490612 (owner: 10Alexandros Kosiaris) [16:34:48] hah awesome [16:36:21] (03CR) 10Alexandros Kosiaris: [C: 03+2] HTTP availability alerts: Decrease retries to 1 [puppet] - 10https://gerrit.wikimedia.org/r/490612 (owner: 10Alexandros Kosiaris) [16:36:28] (03PS2) 10Alexandros Kosiaris: HTTP availability alerts: Decrease retries to 1 [puppet] - 10https://gerrit.wikimedia.org/r/490612 [16:36:35] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] HTTP availability alerts: Decrease retries to 1 [puppet] - 10https://gerrit.wikimedia.org/r/490612 (owner: 10Alexandros Kosiaris) [16:37:04] (03PS3) 10Marostegui: dbproxy1011: Depool labsdb1009 [puppet] - 10https://gerrit.wikimedia.org/r/490620 (https://phabricator.wikimedia.org/T210713) [16:37:14] (03PS1) 10Muehlenhoff: Fix Cumin alias after labs::monitoring rename [puppet] - 10https://gerrit.wikimedia.org/r/490632 [16:37:19] (03PS1) 10WMDE-leszek: Added 
wmgWikibaseRepoLocalEntitySourceName to define the "local" source of Wikibase Repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490633 (https://phabricator.wikimedia.org/T214557) [16:37:21] (03PS1) 10WMDE-leszek: Added wmgUseEntitySourceBasedFederation setting to switch the federation mechanism [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490634 (https://phabricator.wikimedia.org/T214557) [16:37:54] (03CR) 10Marostegui: [C: 03+2] dbproxy1011: Depool labsdb1009 [puppet] - 10https://gerrit.wikimedia.org/r/490620 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [16:38:59] (03PS2) 10Muehlenhoff: Fix Cumin alias after labs::monitoring rename [puppet] - 10https://gerrit.wikimedia.org/r/490632 [16:39:03] !log Depool labsdb1009 - T210713 [16:39:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:05] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [16:39:55] (03PS1) 10Marostegui: Revert "dbproxy1011: Depool labsdb1009" [puppet] - 10https://gerrit.wikimedia.org/r/490636 [16:40:04] (03CR) 10Marostegui: [C: 04-2] "Wait for labsdb1009 to be done" [puppet] - 10https://gerrit.wikimedia.org/r/490636 (owner: 10Marostegui) [16:40:24] (03CR) 10DCausse: [C: 03+1] elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/490631 (owner: 10Gehel) [16:41:09] (03CR) 10Muehlenhoff: [C: 03+2] Fix Cumin alias after labs::monitoring rename [puppet] - 10https://gerrit.wikimedia.org/r/490632 (owner: 10Muehlenhoff) [16:44:09] (03PS3) 10BBlack: zone_validator: require -z argument zones dir [dns] - 10https://gerrit.wikimedia.org/r/489287 [16:44:12] (03PS3) 10BBlack: deploy-check: integrate other checks, no-gdnsd opt [dns] - 10https://gerrit.wikimedia.org/r/489288 [16:44:14] (03PS3) 10BBlack: update README and run-tests.sh [dns] - 10https://gerrit.wikimedia.org/r/489289 [16:45:48] (03PS1) 10Muehlenhoff: Restore accidentally removed 
alias [puppet] - 10https://gerrit.wikimedia.org/r/490638 [16:46:05] (03PS2) 10BBlack: authdns-local-update: update deploy-check.py args [puppet] - 10https://gerrit.wikimedia.org/r/489292 [16:46:34] (03CR) 10BBlack: [C: 03+2] zone_validator: require -z argument zones dir [dns] - 10https://gerrit.wikimedia.org/r/489287 (owner: 10BBlack) [16:46:39] (03CR) 10BBlack: [C: 03+2] deploy-check: integrate other checks, no-gdnsd opt [dns] - 10https://gerrit.wikimedia.org/r/489288 (owner: 10BBlack) [16:46:42] (03CR) 10BBlack: [C: 03+2] update README and run-tests.sh [dns] - 10https://gerrit.wikimedia.org/r/489289 (owner: 10BBlack) [16:46:55] (03CR) 10Muehlenhoff: [C: 03+2] Restore accidentally removed alias [puppet] - 10https://gerrit.wikimedia.org/r/490638 (owner: 10Muehlenhoff) [16:48:08] (03CR) 10BBlack: [C: 03+2] authdns-local-update: update deploy-check.py args [puppet] - 10https://gerrit.wikimedia.org/r/489292 (owner: 10BBlack) [16:48:17] (03PS3) 10BBlack: authdns-local-update: update deploy-check.py args [puppet] - 10https://gerrit.wikimedia.org/r/489292 [16:48:47] (03PS8) 10BBlack: Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris) [16:49:00] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10EBernhardson) As long as it fits in the case, a high end consumer GPU from AMD should be just fine. The most important spec for choosing will p... 
[16:49:08] (03PS1) 10Cwhite: expose status file mod time [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/490639 [16:50:31] (03PS1) 10Thcipriani: Gerrit: comment styles for the pipeline [puppet] - 10https://gerrit.wikimedia.org/r/490640 (https://phabricator.wikimedia.org/T177868) [16:52:02] (03PS3) 10Herron: lists: add 5 second smtp banner delay [puppet] - 10https://gerrit.wikimedia.org/r/490481 (https://phabricator.wikimedia.org/T215251) [16:52:28] (03CR) 10Jforrester: [C: 04-1] "You need to add to extensionlist (for i18n), then to Init/Wikibase for the code to be aware, then add config (for beta cluster) first, and" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489598 (https://phabricator.wikimedia.org/T215684) (owner: 10Smalyshev) [16:52:56] (03CR) 10CDanis: [C: 04-1] "Nice! One nit, though" (031 comment) [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/490639 (owner: 10Cwhite) [16:53:15] (03CR) 10Herron: [C: 03+2] lists: add 5 second smtp banner delay [puppet] - 10https://gerrit.wikimedia.org/r/490481 (https://phabricator.wikimedia.org/T215251) (owner: 10Herron) [16:55:09] (03CR) 10Filippo Giunchedi: [C: 04-1] "Can't be merged as-is but looks good overall" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490073 (https://phabricator.wikimedia.org/T214289) (owner: 10Fsero) [16:55:27] 10Operations, 10Operations-Software-Development: Puppet compiler: abort on git rebase conflict - https://phabricator.wikimedia.org/T157001 (10Volans) p:05Normal→03Low [16:55:31] 10Operations, 10Operations-Software-Development: Puppet compiler: abort on git rebase conflict - https://phabricator.wikimedia.org/T157001 (10crusnov) If this is still occurring it seems as though it just needs to check if the rebase failed, and exit immediately. 
[16:56:03] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) IIRC during the last procurement task Rob took care of all the aspects related to power consumption and space, so in theory we should b... [16:56:38] RECOVERY - puppet last run on labtestmetal2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:56:49] (03PS3) 10Cwhite: prometheus: attempt to force apt update [puppet] - 10https://gerrit.wikimedia.org/r/490204 (https://phabricator.wikimedia.org/T213708) [16:57:14] (03PS2) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in esams [puppet] - 10https://gerrit.wikimedia.org/r/490229 (https://phabricator.wikimedia.org/T213708) [16:57:53] (03CR) 10Gehel: [C: 03+2] elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/490631 (owner: 10Gehel) [16:58:00] (03PS2) 10Gehel: elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/490631 [16:58:31] (03CR) 10Filippo Giunchedi: [C: 04-1] "Overall looks good, the non-nit comment would be to split introducing the module from rollout into separate changes" (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [16:58:55] (03CR) 10Muehlenhoff: "It would be great to have some common Puppet define which adds a component and a list of packages and handles the repo addition, the apt-get " [puppet] - 10https://gerrit.wikimedia.org/r/490204 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [16:59:19] !log restarting apertium-apy on scb1001 to pick up Python security update [16:59:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:00:04] godog and _joe_: My dear minions, it's time we take the moon! Just kidding. 
Time for Puppet SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T1700). [17:00:05] No GERRIT patches in the queue for this window AFAICS. [17:01:09] (03PS1) 10Herron: lists: add 's' to smtp banner delay setting [puppet] - 10https://gerrit.wikimedia.org/r/490642 (https://phabricator.wikimedia.org/T215251) [17:03:02] (03CR) 10BBlack: [C: 03+1] "Should be good to go whenever you're ready" [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris) [17:04:22] (03CR) 10Herron: [C: 03+2] lists: add 's' to smtp banner delay setting [puppet] - 10https://gerrit.wikimedia.org/r/490642 (https://phabricator.wikimedia.org/T215251) (owner: 10Herron) [17:06:46] (03PS2) 10Jforrester: Enable WikibaseCirrusSearch on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489598 (https://phabricator.wikimedia.org/T215684) (owner: 10Smalyshev) [17:06:48] (03PS1) 10Jforrester: Deploy WikibaseCirrusSearch: Part I, extensionlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490643 [17:06:50] (03PS1) 10Jforrester: Deploy WikibaseCirrusSearch: Part II, InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490644 [17:06:52] (03PS1) 10Jforrester: Deploy WikibaseCirrusSearch: Part III, Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490645 [17:06:54] (03PS1) 10Jforrester: [BETA] Enable WikibaseCirrusSearch on wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490646 [17:06:56] (03PS1) 10Jforrester: [BETA] Enable WikibaseCirrusSearch on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490647 [17:06:58] (03PS1) 10Jforrester: Enable WikibaseCirrusSearch on wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490648 (https://phabricator.wikimedia.org/T215684) [17:07:00] (03PS1) 10Jforrester: Enable WikibaseCirrusSearch on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490649 (https://phabricator.wikimedia.org/T215684) 
[17:07:56] (03PS4) 10Cwhite: prometheus: attempt to force apt update [puppet] - 10https://gerrit.wikimedia.org/r/490204 (https://phabricator.wikimedia.org/T213708) [17:10:17] (03CR) 10Zppix: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/490204 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [17:12:19] (03PS3) 10Arturo Borrero Gonzalez: systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) [17:12:43] (03CR) 10Cwhite: [C: 03+2] prometheus: attempt to force apt update [puppet] - 10https://gerrit.wikimedia.org/r/490204 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [17:12:53] (03CR) 10jerkins-bot: [V: 04-1] systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [17:13:05] 10Operations, 10Core Platform Team Backlog (Watching / External), 10Patch-For-Review, 10Services (watching): More verbose messages from service-checker-swagger - https://phabricator.wikimedia.org/T150560 (10Volans) Removing #operations-software-development for now, feel free to re-add if you need anything... 
[17:13:56] (03CR) 10Cwhite: [C: 03+2] hiera: upgrade prometheus-node-exporter to 0.17 in esams [puppet] - 10https://gerrit.wikimedia.org/r/490229 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [17:14:09] (03PS3) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in esams [puppet] - 10https://gerrit.wikimedia.org/r/490229 (https://phabricator.wikimedia.org/T213708) [17:16:51] (03PS4) 10Arturo Borrero Gonzalez: systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) [17:17:47] (03CR) 10jerkins-bot: [V: 04-1] systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [17:22:27] (03PS5) 10Arturo Borrero Gonzalez: systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) [17:22:59] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10EBernhardson) I look back over things and it looks like stat1005 is in an R470 case, they advertise compatibility with several full-size nvidia... [17:23:46] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] systemd: user slice: pinning for libsystemd0 as well [puppet] - 10https://gerrit.wikimedia.org/r/490605 (https://phabricator.wikimedia.org/T215154) (owner: 10Arturo Borrero Gonzalez) [17:24:21] (03CR) 10Dzahn: "> Patch Set 2:" [dns] - 10https://gerrit.wikimedia.org/r/489103 (owner: 10Dzahn) [17:27:26] apergos: hi :) [17:27:35] hello [17:27:47] * apergos eyes joal cautiously [17:27:54] apergos: would you have a few minutes for me to talk about wikidata-entities dump schedule? 
[17:28:05] sure [17:28:07] * joal looks back gently smiling :) [17:28:12] it's once a week iirc [17:28:15] Great [17:28:32] there's not a way to move that up because the run takes too long, and we need to be able to recover from errors [17:28:35] apergos: I have noticed that currently the dumps are generated based on weekday stability [17:28:39] yep [17:29:30] apergos: Would there be a way to have scheduled for monthly-dates stability instead of weekday? [17:29:49] we'd have to see what users of them need [17:29:50] for instance, having YYYYMM01 and YYYYMM15 regular dates for dumps? [17:30:07] are there some users that have to have them once a week at certain times? [17:30:48] it's technically doable, it's a matter of what current users need [17:33:22] (03PS1) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/490651 (https://phabricator.wikimedia.org/T213708) [17:33:47] subbu: yesterday i removed the parsoid git clone dir on scandium and then let puppet git clone it freshly..into the same location. because we had cloned manually and then i fixed the puppet issue with it. today i see parsoid service is having an issue in icinga. did i cause that or unrelated? 
[17:34:10] PROBLEM - rsyslog TLS listener on port 6514 on lithium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer [17:34:25] !log bounce rsyslog on wezen/lithium, tls listener timeout in icinga [17:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:34:33] anyways it did not cause notifications per the change we made to turn them off, per test system only [17:34:39] I hear you apergos - I should probably talk to wikidata PM [17:34:44] RECOVERY - rsyslog TLS listener on port 6514 on lithium is OK: SSL OK - Certificate lithium.eqiad.wmnet valid until 2021-10-23 19:09:29 +0000 (expires in 982 days) [17:34:52] hoo and SMalyshev have been involved in the past and might have some info [17:35:00] 10Operations, 10Operations-Software-Development, 10Patch-For-Review: Systemd session creation fails under I/O load - https://phabricator.wikimedia.org/T199911 (10Volans) @fgiunchedi with the bandaid the problem doesn't show up anymore, do we want to keep this open for tracking or it's ok to resolve it? [17:35:12] Thanks for telling me it's feasible, I'll try to get an acceptance ticket :) [17:35:15] apergos: --^ [17:35:26] just keep me in the loop; if there's a ticket you can slap my name on there so I can follow along [17:35:30] PROBLEM - rsyslog TLS listener on port 6514 on wezen is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer [17:35:41] and put it in the 'dumps-generation' project [17:36:08] (03PS1) 10Arturo Borrero Gonzalez: systemd: slice: all_users: use proper syntax for apt::pin [puppet] - 10https://gerrit.wikimedia.org/r/490652 [17:36:56] 10Operations, 10Operations-Software-Development, 10Patch-For-Review: Systemd session creation fails under I/O load - https://phabricator.wikimedia.org/T199911 (10fgiunchedi) >>! In T199911#4954809, @Volans wrote: > @fgiunchedi with the bandaid the problem doesn't show up anymore, do we want to keep this open... 
[17:37:10] (03CR) 10Filippo Giunchedi: [C: 03+1] hiera: upgrade prometheus-node-exporter to 0.17 in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/490651 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [17:37:26] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] systemd: slice: all_users: use proper syntax for apt::pin [puppet] - 10https://gerrit.wikimedia.org/r/490652 (owner: 10Arturo Borrero Gonzalez) [17:38:26] PROBLEM - SSH on labservices1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:38:33] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 6 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10aaron) >>! In T215611#4951753, @jcrespo wrote: > ^I don't have enough context for this patch, is configuration for regular servers... [17:39:02] mutante, which service again? [17:39:08] parsoid-rt or parsoid-vd? [17:39:39] subbu: parsoid - HTTP CRITICAL: HTTP/1.1 500 Internal Server Error [17:39:46] that one [17:39:49] 10Operations, 10Operations-Software-Development: wmf-auto-reimage tries to remove from Debmonitor even with --new - https://phabricator.wikimedia.org/T204789 (10Volans) [17:39:54] that it's getting 500 on HTTP check [17:40:01] mutante, oh .. hmm .. let me check. 
[17:40:04] thanks [17:40:14] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (10Volans) [17:40:45] (03CR) 10Cwhite: [C: 03+2] hiera: upgrade prometheus-node-exporter to 0.17 in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/490651 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [17:40:52] (03PS2) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/490651 (https://phabricator.wikimedia.org/T213708) [17:41:05] mutante, Failed to lookup view "home" in views directory "/srv/deployment/parsoid/deploy/src/lib/api/views" .. i'll check if there is some unchecked file or some other assumption that is broken. [17:41:06] RECOVERY - rsyslog TLS listener on port 6514 on wezen is OK: SSL OK - Certificate wezen.codfw.wmnet valid until 2021-08-21 20:09:05 +0000 (expires in 919 days) [17:42:00] RECOVERY - SSH on labservices1001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.11 (protocol 2.0) [17:42:36] subbu: ok, thanks. so fyi, what i did was (re)move the /srv/deployment/parsoid dir and then run puppet and it cloned it freshly into the same dir. 
no other changes and next time we apply this role it will just do it [17:42:49] (and in case there was something changed in there i do have a backup too) [17:43:24] apergos: I created T216160 - Please ping people you think would be interested in there :) [17:43:24] T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 [17:43:29] Many thanks again [17:43:42] sure thing [17:48:59] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10EBernhardson) Looks like one more option, a workstation card from AMD, the Vega Frontier has 16GB of memory with very similar compute to the Ve... [17:49:30] (03PS2) 10WMDE-leszek: DNM Define Wikibase "entity sources" on beta commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490108 (https://phabricator.wikimedia.org/T214557) [17:49:32] (03PS1) 10WMDE-leszek: DNM Use "entity source based federation" on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490658 (https://phabricator.wikimedia.org/T214557) [17:51:15] (03CR) 10jerkins-bot: [V: 04-1] DNM Define Wikibase "entity sources" on beta commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490108 (https://phabricator.wikimedia.org/T214557) (owner: 10WMDE-leszek) [17:51:57] (03CR) 10jerkins-bot: [V: 04-1] DNM Use "entity source based federation" on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490658 (https://phabricator.wikimedia.org/T214557) (owner: 10WMDE-leszek) [17:54:24] (03PS1) 10RobH: decommision analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/490659 (https://phabricator.wikimedia.org/T206524) [17:55:31] (03PS1) 10RobH: decom analytics1003 prod entries [dns] - 10https://gerrit.wikimedia.org/r/490660 (https://phabricator.wikimedia.org/T206524) [17:55:52] (03CR) 10RobH: [C: 03+2] decom analytics1003 prod entries [dns] - 
10https://gerrit.wikimedia.org/r/490660 (https://phabricator.wikimedia.org/T206524) (owner: 10RobH) [17:55:58] (03CR) 10Mathew.onipe: "I feel like we should have two rules to differentiate them easily. Like 'elastic-https-mwmaint-9243' and 'elastic-https-mwmaint-[small|alp" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/490631 (owner: 10Gehel) [17:56:22] (03CR) 10RobH: [C: 03+2] decommision analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/490659 (https://phabricator.wikimedia.org/T206524) (owner: 10RobH) [17:58:49] (03CR) 10Subramanya Sastry: visualdiff: create empty testrun.ids but don't change content (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049) (owner: 10Dzahn) [17:59:14] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (10RobH) a:05RobH→03Cmjohnson [17:59:32] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (10RobH) ready for wipe and unracking steps [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T1800). [18:03:24] !log upgrading tiller to 2.12.2 on eqiad [18:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:14] (03PS1) 10Dzahn: service::deploy::gitclone: add '/deploy' suffix to clone dir [puppet] - 10https://gerrit.wikimedia.org/r/490662 (https://phabricator.wikimedia.org/T201366) [18:05:26] (03CR) 10Jforrester: [C: 04-2] "Testing in Beta Cluster first. 
:-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489598 (https://phabricator.wikimedia.org/T215684) (owner: 10Smalyshev) [18:05:32] (03CR) 10Jforrester: [C: 04-2] Enable WikibaseCirrusSearch on wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490648 (https://phabricator.wikimedia.org/T215684) (owner: 10Jforrester) [18:05:36] (03CR) 10Jforrester: [C: 04-2] Enable WikibaseCirrusSearch on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490649 (https://phabricator.wikimedia.org/T215684) (owner: 10Jforrester) [18:06:02] (03PS2) 10Dzahn: service::deploy::gitclone: add '/deploy' suffix to clone dir [puppet] - 10https://gerrit.wikimedia.org/r/490662 (https://phabricator.wikimedia.org/T201366) [18:07:01] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (10RobH) [18:07:03] (03CR) 10Dzahn: [C: 03+2] "already had checked yesterday that only parsoid-testing uses service::deploy::gitclone and previous change broke things there" [puppet] - 10https://gerrit.wikimedia.org/r/490662 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [18:07:28] How often is puppet meant to run on the Beta Cluster? Shelled in and it's pointing out that it's been over a week… [18:08:36] (03PS4) 10Herron: logstash: move role::ls::eventlogging to profile::logstash::collector [puppet] - 10https://gerrit.wikimedia.org/r/490401 (https://phabricator.wikimedia.org/T213898) [18:10:28] (03PS5) 10Herron: logstash: move role::ls::eventlogging to profile::logstash::collector [puppet] - 10https://gerrit.wikimedia.org/r/490401 (https://phabricator.wikimedia.org/T213898) [18:10:50] (03PS2) 10Cwhite: expose status file mod time [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/490639 [18:11:30] James_F: every ~20 minutes or so. Which instance? 
[18:11:37] (03CR) 10Herron: [C: 03+2] logstash: move role::ls::eventlogging to profile::logstash::collector [puppet] - 10https://gerrit.wikimedia.org/r/490401 (https://phabricator.wikimedia.org/T213898) (owner: 10Herron) [18:11:54] bd808: deployment-deploy01.deployment-prep.eqiad.wmflabs [18:12:17] * bd808 checks it out [18:12:20] (03CR) 10Cwhite: "> Patch Set 1: Code-Review-1" (031 comment) [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/490639 (owner: 10Cwhite) [18:12:27] !log scandium - deleting parsoid clone dir and running puppet [18:12:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:36] I guess someone was doing some local poking. [18:13:03] (03CR) 10CDanis: [C: 03+1] expose status file mod time [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/490639 (owner: 10Cwhite) [18:13:39] PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.248 second response time [18:13:44] (03CR) 10Jforrester: [C: 04-1] "I think this has to wait until the WikibaseCirrusSearch extension has been branched for all production variants (i.e., until Thursday afte" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490643 (owner: 10Jforrester) [18:14:58] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (10RobH) a:05RobH→03ayounsi So, trying to disable the switch port: robh@asw2-b-eqiad# show | compare [edit interfaces interface-range disabled] member ge-4/0/38 { ... } +... 
[18:15:21] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (10RobH) [18:15:34] ACKNOWLEDGEMENT - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.248 second response time andrew bogott this is probably a result of me moving things around [18:15:36] James_F: T216164 -- there are some hiera settings missing. Probably from refactoring and the confusing situation that role() hiera settings are not applied in cloud vps [18:15:37] T216164: Puppet failures on deployment-deploy01.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T216164 [18:16:31] bd808: You rock, thank you. [18:16:53] (03PS3) 10Dzahn: visualdiff: create empty testrun.ids but don't change content [puppet] - 10https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049) [18:17:50] (03PS3) 10WMDE-leszek: DNM Define Wikibase "entity sources" on beta commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490108 (https://phabricator.wikimedia.org/T214557) [18:21:33] (03PS2) 10WMDE-leszek: DNM Use "entity source based federation" on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490658 (https://phabricator.wikimedia.org/T214557) [18:23:25] (03CR) 10Smalyshev: [C: 03+1] "+1ing for that time it can go forward, let's do it in a week then" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490643 (owner: 10Jforrester) [18:26:52] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/14682/scandium.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049) (owner: 10Dzahn) [18:26:54] (03PS3) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in labs [puppet] - 10https://gerrit.wikimedia.org/r/489753 
(https://phabricator.wikimedia.org/T213708) [18:27:20] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10RobH) [18:27:57] (03PS4) 10Dzahn: visualdiff: create empty testrun.ids but don't change content [puppet] - 10https://gerrit.wikimedia.org/r/490521 (https://phabricator.wikimedia.org/T215049) [18:30:27] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10RobH) [18:33:04] (03CR) 10Gehel: [C: 03+2] elasticsearch: both relforge clusters are accessible from mwmaint (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490631 (owner: 10Gehel) [18:35:21] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10RobH) [18:35:24] (03PS2) 10Herron: lists: enforce domain or ip literal HELO check [puppet] - 10https://gerrit.wikimedia.org/r/490416 (https://phabricator.wikimedia.org/T215251) [18:36:34] (03PS3) 10Gehel: elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/490631 [18:36:37] (03CR) 10Herron: [C: 03+2] lists: enforce domain or ip literal HELO check [puppet] - 10https://gerrit.wikimedia.org/r/490416 (https://phabricator.wikimedia.org/T215251) (owner: 10Herron) [18:42:43] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10RobH) [18:43:25] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10ops-monitoring-bot) wmf-decommission-host was executed by robh for graphite1001.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Remo... 
[18:43:38] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10ops-monitoring-bot) wmf-decommission-host was executed by robh for graphite1003.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Remo... [18:44:36] (03PS1) 10Ppchelko: [WIP]: Switch kafka logging to EventBus logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) [18:45:04] (03PS1) 10RobH: decom graphite100[13] prod dns [dns] - 10https://gerrit.wikimedia.org/r/490669 (https://phabricator.wikimedia.org/T209357) [18:45:26] (03CR) 10jerkins-bot: [V: 04-1] [WIP]: Switch kafka logging to EventBus logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [18:45:57] (03PS1) 10Dzahn: service::deploy::gitclone: let wikidev group own files [puppet] - 10https://gerrit.wikimedia.org/r/490671 (https://phabricator.wikimedia.org/T215049) [18:46:38] (03PS1) 10RobH: decom graphite100[12] [puppet] - 10https://gerrit.wikimedia.org/r/490672 (https://phabricator.wikimedia.org/T209357) [18:46:49] (03PS2) 10Dzahn: service::deploy::gitclone: let wikidev group own files [puppet] - 10https://gerrit.wikimedia.org/r/490671 (https://phabricator.wikimedia.org/T215049) [18:47:01] (03CR) 10RobH: [C: 03+2] decom graphite100[12] [puppet] - 10https://gerrit.wikimedia.org/r/490672 (https://phabricator.wikimedia.org/T209357) (owner: 10RobH) [18:47:25] 10Operations, 10Operations-Software-Development: Systemd session creation fails under I/O load - https://phabricator.wikimedia.org/T199911 (10Volans) 05Open→03Resolved a:03Volans [18:47:42] (03CR) 10RobH: [C: 03+2] decom graphite100[13] prod dns [dns] - 10https://gerrit.wikimedia.org/r/490669 (https://phabricator.wikimedia.org/T209357) (owner: 10RobH) [18:47:53] (03CR) 10Dzahn: [C: 03+2] "again only affects parsoid-testing code currently, 
all other "modules/service-services deploy with scap" [puppet] - 10https://gerrit.wikimedia.org/r/490671 (https://phabricator.wikimedia.org/T215049) (owner: 10Dzahn) [18:48:31] (03PS3) 10Dzahn: service::deploy::gitclone: let wikidev group own files [puppet] - 10https://gerrit.wikimedia.org/r/490671 (https://phabricator.wikimedia.org/T215049) [18:50:07] 10Operations, 10ops-eqiad, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10RobH) a:05RobH→03Cmjohnson [18:52:04] !log scandium - deleting parsoid clone dir and running puppet one more time, to fix permissions to allow wikidev [18:52:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:52:10] (03PS2) 10Ppchelko: [WIP]: Switch kafka logging to EventBus logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) [18:53:14] (03CR) 10jerkins-bot: [V: 04-1] [WIP]: Switch kafka logging to EventBus logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [18:55:11] (03PS3) 10Ppchelko: [WIP]: Switch kafka logging to EventBus logging. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) [18:56:56] (03CR) 10Ppchelko: "Testing Monolog in Vagrant appears to be a nightmare, so I guess we will have to test this in labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [18:58:47] (03PS1) 10Dzahn: service::deploy::gitclone: set 'shared' param to true [puppet] - 10https://gerrit.wikimedia.org/r/490673 (https://phabricator.wikimedia.org/T215049) [18:59:21] (03CR) 10jerkins-bot: [V: 04-1] service::deploy::gitclone: set 'shared' param to true [puppet] - 10https://gerrit.wikimedia.org/r/490673 (https://phabricator.wikimedia.org/T215049) (owner: 10Dzahn) [18:59:52] (03PS2) 10Dzahn: service::deploy::gitclone: set 'shared' param to true [puppet] - 10https://gerrit.wikimedia.org/r/490673 (https://phabricator.wikimedia.org/T215049) [19:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Morning SWAT (Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T1900). [19:00:04] xSavitar: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:37] (03CR) 10Dzahn: [C: 03+2] service::deploy::gitclone: set 'shared' param to true [puppet] - 10https://gerrit.wikimedia.org/r/490673 (https://phabricator.wikimedia.org/T215049) (owner: 10Dzahn) [19:00:49] it used to be a t-shirt but inflation [19:01:29] !log scandium - deleting parsoid clone dir and running puppet one more time, to fix permissions to allow wikidev [19:01:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:01] I can SWAT, xSavitar around? [19:03:10] I'm around! 
[19:03:44] (03PS9) 10Thcipriani: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01) [19:03:49] thcipriani: ^ [19:04:08] (03CR) 10Thcipriani: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01) [19:04:22] xSavitar: great :) [19:04:23] hey hey.. Once upon a time I was sent a link to a file that has all the configs currently in production for the wikipedias.. can someone point me to that once again? [19:05:15] (03Merged) 10jenkins-bot: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01) [19:07:06] dmaza: Maybe it was https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php ? [19:07:28] xSavitar: your change is live on mwdebug1002, check please [19:07:37] On it! [19:09:16] thcipriani: I've got a patch for SWAT as well, waiting for the build to finish so I can backport it [19:09:54] kostajh: okie doke [19:10:00] thx [19:10:06] thcipriani: Works as expected! When popups gadget is enabled, PagePreviews is disabled and when popups gadget is disabled, PagePreviews is enabled. Works as expected :) [19:10:15] Thanks, so I think I'm good :) [19:10:26] Maybe jdlrobson is around to further confirm this? [19:10:59] (03CR) 10jenkins-bot: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01) [19:11:38] Otherwise, then I think it's good. 
Thanks thcipriani :) [19:11:51] xSavitar: cool, I'll go ahead and deploy :) [19:11:55] +1 [19:14:07] !log thcipriani@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:487007|Stop NavPopups gadget conflict with PagePreviews on Wikivoyage]] T214878 (duration: 00m 54s) [19:14:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:10] T214878: English Wikivoyage NavPopups gadget conflicts with Page previews - https://phabricator.wikimedia.org/T214878 [19:14:11] ^ xSavitar live now [19:14:36] Cool! Thanks! To be at peace with myself, let me double check :) thcipriani [19:14:45] :) [19:15:32] thcipriani: You're wonderful. Thanks a lot, double checked and it works just fine! [19:15:52] xSavitar: glad all's working, thanks for checking (and doublechecking :)) [19:16:30] thcipriani: do I need to wait for the gate-and-submit jobs to finish before cherry-picking to wmf17? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/490670 [19:17:03] zuul just went from showing 0 minutes remaining to 10 [19:17:07] kostajh: I can cherry pick now, just for wmf.17? [19:17:16] thcipriani: yes [19:17:21] (I guess that's going live in everywhere in a few minutes) [19:17:51] (03CR) 10Mathew.onipe: [C: 03+1] elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/490631 (owner: 10Gehel) [19:18:22] kostajh: can I get you to +1 that this looks right? https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/GrowthExperiments/+/490674/ [19:20:07] thcipriani: done [19:20:20] kostajh: +2'd, now also waiting for jenkins :) [19:21:17] RoanKattouw: I was under the impression that there was something else other than the config repo [19:21:32] but maybe I'm wrong [19:21:50] thcipriani: wmf.17 goes to group2 in 30 minutes or so? [19:21:53] That's possible, I just don't know offhand what you're talking about [19:22:02] kostajh: that's correct. 
[19:22:14] https://tools.wmflabs.org/versions/ [19:22:28] thcipriani: then yeah, I think wmf16 can live without this one :) [19:22:37] ok :) [19:23:29] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 508 bytes in 3.121 second response time [19:25:28] kostajh: change is live on mwdebug1002, check please [19:25:34] thcipriani: looking [19:28:33] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.131 second response time [19:29:03] thcipriani: not working as I'd expect but I'm checking to see if I'm getting stale JS [19:29:28] * thcipriani doublechecks I fetched it down correctly [19:29:53] PROBLEM - toolschecker: All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 512 bytes in 3.952 second response time [19:30:10] yeah, I've done hard refresh + debug=1 in the URL + network inspector has Disable cache checked... [19:30:44] my traffic is going through mwdebug1002 [19:32:25] yep, after some grepping, I can confirm that the correct code is in place on mwdebug1002 [19:32:37] hmm [19:33:02] thcipriani: let me try a different browser, just a minute [19:33:38] k [19:34:20] RoanKattouw: to be more specific I was looking for the current config on the password policies in production. We dynamically set those in CommonSettings and I was under the impression that I saw a file with the "final" generated config [19:34:31] but I think I'm just confused [19:37:03] thcipriani: I'm not seeing the new code when accessing via mwdebug1002 [19:37:24] kostajh: which wiki are you trying? [19:37:40] thcipriani: oh, oops. 
[19:37:47] :) [19:37:51] one sec [19:38:12] * kostajh facepalms [19:39:23] thcipriani: everything's great [19:39:47] kostajh: yeah? ok to go live? [19:39:57] thcipriani: yep! [19:40:01] * thcipriani does [19:42:20] !log thcipriani@deploy1001 Synchronized php-1.33.0-wmf.17/extensions/GrowthExperiments/modules/help: SWAT: [[gerrit:490674|Help Panel: Fix IME broken in help panel search]] T216131 (duration: 00m 54s) [19:42:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:25] ^ kostajh live now [19:42:25] T216131: Help panel: typing Korean is broken in Help Panel Search bar - https://phabricator.wikimedia.org/T216131 [19:43:19] thcipriani: thanks for bearing with me. :) time for more coffee... [19:43:42] kostajh: no worries, thanks for verifying your change [19:44:34] 10Operations, 10Mail: Set up basic email infra for w.wiki domain - https://phabricator.wikimedia.org/T216172 (10BBlack) p:05Triage→03Normal [19:45:14] 10Operations, 10Mail: Set up basic email infra for w.wiki domain - https://phabricator.wikimedia.org/T216172 (10RobH) Copy of the email: > Hello Rob, > > I am trying to finish the yearly re-validation of your domain (w.wiki) so you can order SSL certificates for it without interruption, and I need your he... [19:48:25] 10Operations: HP Gen9 onboard controller review - https://phabricator.wikimedia.org/T216175 (10RobH) p:05Triage→03High [19:53:43] RECOVERY - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.110 second response time [19:55:53] RECOVERY - toolschecker: All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.257 second response time [20:00:04] thcipriani: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - Americas version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T2000). 
[20:00:54] 10Operations: HP Gen9 onboard controller review - https://phabricator.wikimedia.org/T216175 (10RobH) netbox list of ALL DL360 Gen9 systems: https://netbox.wikimedia.org/dcim/devices/?device_type_id=54&per_page=250 So, checking older orders of the HP DL360 Gen9 example of aqs1004: https://netbox.wikimedia.org/dc... [20:02:26] * thcipriani trains [20:04:29] (03PS4) 10Ppchelko: [WIP]: Switch kafka logging to EventBus logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) [20:10:56] (03PS1) 10Thcipriani: all wikis to 1.33.0-wmf.17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490682 [20:10:58] (03CR) 10Thcipriani: [C: 03+2] all wikis to 1.33.0-wmf.17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490682 (owner: 10Thcipriani) [20:12:28] (03Merged) 10jenkins-bot: all wikis to 1.33.0-wmf.17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490682 (owner: 10Thcipriani) [20:14:07] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.17 [20:16:05] thcipriani@deploy1001: Failed to log message to wiki. Somebody should check the error logs. [20:16:33] (03PS1) 10Herron: logstash: disable notifications on logstash101[0-2] during setup [puppet] - 10https://gerrit.wikimedia.org/r/490686 (https://phabricator.wikimedia.org/T214608) [20:18:14] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.17 [20:18:43] (03CR) 10Herron: [C: 03+2] logstash: disable notifications on logstash101[0-2] during setup [puppet] - 10https://gerrit.wikimedia.org/r/490686 (https://phabricator.wikimedia.org/T214608) (owner: 10Herron) [20:19:18] thcipriani: Failed to log message to wiki. Somebody should check the error logs. [20:19:22] huh [20:20:43] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 6 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10aaron) >>! 
In T215611#4950057, @bd808 wrote: > My recommendation would be that if de-duplication or rate limiting is actually wha... [20:20:44] PROBLEM - Apache HTTP on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:20:47] (03PS2) 10EBernhardson: Re-apply defaults removed in cirrus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490496 [20:20:52] PROBLEM - HHVM rendering on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:10] PROBLEM - Nginx local proxy to apache on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:13] (03PS3) 10EBernhardson: Re-apply defaults removed in cirrus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490496 [20:23:04] RECOVERY - Apache HTTP on mw1243 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.114 second response time [20:23:12] RECOVERY - HHVM rendering on mw1243 is OK: HTTP OK: HTTP/1.1 200 OK - 75212 bytes in 0.251 second response time [20:23:30] RECOVERY - Nginx local proxy to apache on mw1243 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.184 second response time [20:28:30] (03PS1) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in codfw [puppet] - 10https://gerrit.wikimedia.org/r/490689 (https://phabricator.wikimedia.org/T213708) [20:36:40] (03PS1) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/490690 (https://phabricator.wikimedia.org/T213708) [20:36:58] (03PS4) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in labs [puppet] - 10https://gerrit.wikimedia.org/r/489753 (https://phabricator.wikimedia.org/T213708) [20:37:25] (03CR) 10jenkins-bot: all wikis to 1.33.0-wmf.17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490682 (owner: 10Thcipriani) [20:38:02] !log Restarted mariadb on labsdb1005 for https://wikitech.wikimedia.org/wiki/Incident_documentation/20190214-labsdb1005 [20:38:03] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:33] 10Operations, 10Mail, 10Phabricator: DomainKeys Identified Mail (DKIM) for phabricator.wikimedia.org - https://phabricator.wikimedia.org/T116805 (10mmodell) I've never seen this but I don't routinely use gmail for following phabricator. Is this really still an issue? [20:42:25] 10Operations, 10Mail, 10Phabricator: DomainKeys Identified Mail (DKIM) for phabricator.wikimedia.org - https://phabricator.wikimedia.org/T116805 (10mmodell) afaik the outbound email path was completely redone since this task was filed. [20:43:15] 10Operations, 10Operations-Software-Development, 10Phabricator, 10Technical-Debt: Update Puppet repo code that uses deprecated maniphest.update/.createtask/.query Conduit API - https://phabricator.wikimedia.org/T159045 (10mmodell) [20:45:01] (03PS1) 10Cwhite: hiera: upgrade prometheus-node-exporter to 0.17 in labs [puppet] - 10https://gerrit.wikimedia.org/r/490693 (https://phabricator.wikimedia.org/T213708) [20:46:09] (03PS1) 10Herron: logstash: apply role::logstash to new logstash101[0-2] hardare hosts [puppet] - 10https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) [20:46:39] (03CR) 10Cwhite: [C: 03+2] hiera: upgrade prometheus-node-exporter to 0.17 in labs [puppet] - 10https://gerrit.wikimedia.org/r/490693 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [20:48:00] (03CR) 10Cwhite: [C: 03+2] admin: add esanders to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/490500 (https://phabricator.wikimedia.org/T215830) (owner: 10Cwhite) [20:48:05] (03PS2) 10Cwhite: admin: add esanders to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/490500 (https://phabricator.wikimedia.org/T215830) [20:50:51] (03CR) 10Cwhite: [C: 03+2] admin: add ladsgroup to analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/490403 (https://phabricator.wikimedia.org/T215938) (owner: 10Cwhite) [20:51:00] (03PS2) 
10Cwhite: admin: add ladsgroup to analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/490403 (https://phabricator.wikimedia.org/T215938) [20:53:55] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Access request: Ladsgroup to analytics-wmde-users - https://phabricator.wikimedia.org/T215938 (10colewhite) The group membership change has been deployed. Please feel free to reopen if you encounter any related issue. [20:54:06] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Access request: Ladsgroup to analytics-wmde-users - https://phabricator.wikimedia.org/T215938 (10colewhite) 05Open→03Resolved [20:54:51] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata for esanders - https://phabricator.wikimedia.org/T215830 (10colewhite) The group membership change has been deployed. Please feel free to reopen if you encounter any related issue. [20:55:04] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata for esanders - https://phabricator.wikimedia.org/T215830 (10colewhite) 05Open→03Resolved [21:00:07] 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch: Have dedicated master nodes for elasticsearch - https://phabricator.wikimedia.org/T130590 (10EBernhardson) 05Open→03Declined [21:02:50] (03PS3) 10Dzahn: remove bast3003, bast3002 has been repaired [dns] - 10https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936) [21:03:06] (03PS4) 10Dzahn: remove bast3003, bast3002 has been repaired [dns] - 10https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936) [21:03:50] (03Abandoned) 10CRusnov: Update old hardware report to exclude certain device types. 
[software/netbox-reports] - 10https://gerrit.wikimedia.org/r/480286 (https://phabricator.wikimedia.org/T205899) (owner: 10CRusnov) [21:07:32] (03PS1) 10CRusnov: Make oldhardware report exclued cablemgmt and storagebin [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/490700 [21:08:41] (03PS2) 10Herron: logstash: apply role::logstash to new logstash101[0-2] hardare hosts [puppet] - 10https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) [21:09:32] (03CR) 10CRusnov: [C: 03+1] "Looks good! Please merge." [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/488909 (owner: 10Hashar) [21:12:47] (03PS2) 10CRusnov: Make oldhardware report exclued cablemgmt and storagebin [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/490700 [21:16:30] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1001/14687/" [puppet] - 10https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) (owner: 10Herron) [21:17:16] (03CR) 10Zoranzoki21: Add new throttle rule for T215839 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) (owner: 10Zoranzoki21) [21:17:53] (03PS4) 10Zoranzoki21: Add new throttle rule for T215839 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) [21:17:58] (03PS5) 10Zoranzoki21: Add new throttle rule for T215839 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489819 (https://phabricator.wikimedia.org/T215839) [21:49:55] (03PS1) 10BryanDavis: Change !log formatting to match Stashbot expectations [software/spicerack] - 10https://gerrit.wikimedia.org/r/490774 [21:51:29] (03CR) 10Jforrester: [C: 03+1] "Neat." 
[puppet] - 10https://gerrit.wikimedia.org/r/490640 (https://phabricator.wikimedia.org/T177868) (owner: 10Thcipriani) [21:51:39] (03CR) 10BryanDavis: "Example of parsing the current format: https://tools.wmflabs.org/sal/log/AWjsrZoFzCcrHSwqSdCu" [software/spicerack] - 10https://gerrit.wikimedia.org/r/490774 (owner: 10BryanDavis) [21:52:12] (03PS1) 10Andrew Bogott: bootstrap-vz: tidy up root terminal settings [puppet] - 10https://gerrit.wikimedia.org/r/490775 (https://phabricator.wikimedia.org/T215211) [21:53:10] (03CR) 10Andrew Bogott: [C: 03+2] bootstrap-vz: tidy up root terminal settings [puppet] - 10https://gerrit.wikimedia.org/r/490775 (https://phabricator.wikimedia.org/T215211) (owner: 10Andrew Bogott) [21:56:37] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 6 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10jcrespo) > We don't emit DEBUG type events unless explicitly configured to. Thanks for the explanation, it help me understand it.... 
[22:00:18] (03PS1) 10Andrew Bogott: Rename labvirt1012 to cloudvirt1012, with Stretch [puppet] - 10https://gerrit.wikimedia.org/r/490780 [22:02:46] (03PS1) 10Andrew Bogott: Rename labvirt1012 to cloudvirt1012 [dns] - 10https://gerrit.wikimedia.org/r/490781 (https://phabricator.wikimedia.org/T216190) [22:02:48] (03PS1) 10Andrew Bogott: Remove old refs to labvirt1012 [dns] - 10https://gerrit.wikimedia.org/r/490782 (https://phabricator.wikimedia.org/T216190) [22:02:59] (03CR) 10jerkins-bot: [V: 04-1] Rename labvirt1012 to cloudvirt1012 [dns] - 10https://gerrit.wikimedia.org/r/490781 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott) [22:03:36] (03PS2) 10Andrew Bogott: Rename labvirt1012 to cloudvirt1012, with Stretch [puppet] - 10https://gerrit.wikimedia.org/r/490780 (https://phabricator.wikimedia.org/T216190) [22:07:17] !log rebuilding labvirt1012 as cloudvirt1012, T216190 [22:07:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:07:19] T216190: Rebuild labvirt1012 and cloudvirt1012 - https://phabricator.wikimedia.org/T216190 [22:07:26] (03CR) 10Andrew Bogott: [C: 03+2] Rename labvirt1012 to cloudvirt1012, with Stretch [puppet] - 10https://gerrit.wikimedia.org/r/490780 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott) [22:09:22] (03PS2) 10Andrew Bogott: Rename labvirt1012 to cloudvirt1012 [dns] - 10https://gerrit.wikimedia.org/r/490781 (https://phabricator.wikimedia.org/T216190) [22:09:38] (03Abandoned) 10Andrew Bogott: Remove old refs to labvirt1012 [dns] - 10https://gerrit.wikimedia.org/r/490782 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott) [22:09:51] (03CR) 10Andrew Bogott: [C: 03+2] Rename labvirt1012 to cloudvirt1012 [dns] - 10https://gerrit.wikimedia.org/r/490781 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott) [22:12:11] (03PS4) 10Gehel: elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 
10https://gerrit.wikimedia.org/r/490631 [22:12:49] (03CR) 10Gehel: [C: 03+2] elasticsearch: both relforge clusters are accessible from mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/490631 (owner: 10Gehel) [22:15:01] (03PS1) 10Andrew Bogott: cloudvirt1012: disable notifications during rebuild [puppet] - 10https://gerrit.wikimedia.org/r/490784 (https://phabricator.wikimedia.org/T216190) [22:15:27] (03CR) 10CRusnov: "> Patch Set 1: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/490397 (owner: 10CRusnov) [22:15:46] (03PS2) 10Andrew Bogott: cloudvirt1012: disable notifications during rebuild [puppet] - 10https://gerrit.wikimedia.org/r/490784 (https://phabricator.wikimedia.org/T216190) [22:15:58] 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch, 10Epic: EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade) - https://phabricator.wikimedia.org/T109089 (10EBernhardson) [22:16:01] 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch: Investigate the need for master only (non data nodes) in our ES cluster - https://phabricator.wikimedia.org/T109090 (10EBernhardson) 05Open→03Declined [22:16:32] (03CR) 10Andrew Bogott: [C: 03+2] cloudvirt1012: disable notifications during rebuild [puppet] - 10https://gerrit.wikimedia.org/r/490784 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott) [22:19:41] (03PS1) 10Andrew Bogott: cloudvirt1012: use the new, weird 'eno' names for nics [puppet] - 10https://gerrit.wikimedia.org/r/490785 (https://phabricator.wikimedia.org/T216190) [22:20:30] (03CR) 10Andrew Bogott: [C: 03+2] cloudvirt1012: use the new, weird 'eno' names for nics [puppet] - 10https://gerrit.wikimedia.org/r/490785 (https://phabricator.wikimedia.org/T216190) (owner: 10Andrew Bogott) [22:24:36] 10Operations, 10monitoring, 10Patch-For-Review: Evaluate/integrate rasdaemon as a replacement for mcelog - https://phabricator.wikimedia.org/T205396 (10CDanis) 05Resolved→03Open 
a:05jbond→03CDanis @jbond kindly backported the buster version of rasdaemon to stretch. I'm going to attempt installing it... [22:26:01] 10Operations, 10ops-eqiad: Update label and switch to rename labvirt1012 to cloudvirt1012 - https://phabricator.wikimedia.org/T216192 (10Andrew) [22:28:08] (03PS1) 10Andrew Bogott: cloudvirt1012: enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/490786 (https://phabricator.wikimedia.org/T216190) [22:33:47] (03PS1) 10CDanis: WIP: puppet for rasdaemon testing [puppet] - 10https://gerrit.wikimedia.org/r/490787 [22:34:29] (03CR) 10jerkins-bot: [V: 04-1] WIP: puppet for rasdaemon testing [puppet] - 10https://gerrit.wikimedia.org/r/490787 (owner: 10CDanis) [22:36:56] (03PS2) 10CDanis: WIP: puppet for rasdaemon testing [puppet] - 10https://gerrit.wikimedia.org/r/490787 [22:37:39] (03CR) 10jerkins-bot: [V: 04-1] WIP: puppet for rasdaemon testing [puppet] - 10https://gerrit.wikimedia.org/r/490787 (owner: 10CDanis) [22:39:37] (03PS3) 10CDanis: WIP: puppet for rasdaemon testing [puppet] - 10https://gerrit.wikimedia.org/r/490787 [22:40:20] (03CR) 10jerkins-bot: [V: 04-1] WIP: puppet for rasdaemon testing [puppet] - 10https://gerrit.wikimedia.org/r/490787 (owner: 10CDanis) [22:46:45] 10Operations, 10Analytics, 10Research-management, 10Patch-For-Review, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10EBernhardson) >>! In T148843#4954602, @elukey wrote: > Need to triple check with somebody else but from the inventory stat1005 is a Dell PowerE... 
[22:51:04] 10Operations, 10Office-IT, 10Wikimedia-Mailing-lists: Mailing list migration for Arbitration Committee - https://phabricator.wikimedia.org/T215940 (10colewhite) p:05Triage→03Normal a:03colewhite [22:52:07] 10Operations, 10Parsoid, 10serviceops, 10Patch-For-Review: parsoid-vd - "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (10Dzahn) 05Open→03Resolved If the file in question does not exist, puppet will create it, avoiding this issue f... [22:52:37] 10Operations, 10Parsoid, 10serviceops, 10Patch-For-Review: parsoid-vd - "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (10Dzahn) The other 2 changes above i should have linked to T201366 but it's kind of related. [22:59:47] 10Operations, 10ops-esams: decom bast3003 (formerly ms-be3003 - https://phabricator.wikimedia.org/T216199 (10Dzahn) p:05Triage→03Low [23:00:04] Niharika and bd808: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Wikimania scholarships app - part II . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190214T2300). [23:00:58] Niharika: I have a meeting right now, but I think you can handle this. ping me if things go sideways [23:05:36] bd808: Sounds good. 
[23:11:41] 10Operations, 10ops-esams, 10DC-Ops, 10decommission: decom bast3003 (formerly ms-be3003) - https://phabricator.wikimedia.org/T216199 (10Dzahn) [23:11:51] 10Operations, 10ops-esams, 10DC-Ops, 10decommission: decom bast3003 (formerly ms-be3003, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (10Dzahn) [23:11:55] 10Operations, 10ops-eqiad: Update label and switch to rename labvirt1012 to cloudvirt1012 - https://phabricator.wikimedia.org/T216192 (10colewhite) p:05Triage→03Normal [23:12:43] 10Operations, 10ops-eqiad: Disk failure on labsdb1005 - https://phabricator.wikimedia.org/T216202 (10Bstorm) [23:12:46] (03PS1) 10Dzahn: Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 10https://gerrit.wikimedia.org/r/490788 [23:12:46] 10Operations, 10Wikimedia-Logstash, 10serviceops: ensure httpd error logs from "misc apps" (krypton) end up in logstash - https://phabricator.wikimedia.org/T216090 (10colewhite) p:05Triage→03Normal [23:13:05] (03PS2) 10Dzahn: Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 10https://gerrit.wikimedia.org/r/490788 (https://phabricator.wikimedia.org/T216199) [23:13:29] (03CR) 10jerkins-bot: [V: 04-1] Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 10https://gerrit.wikimedia.org/r/490788 (https://phabricator.wikimedia.org/T216199) (owner: 10Dzahn) [23:15:10] (03PS3) 10Dzahn: Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 10https://gerrit.wikimedia.org/r/490788 (https://phabricator.wikimedia.org/T216199) [23:16:08] (03CR) 10jerkins-bot: [V: 04-1] Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 10https://gerrit.wikimedia.org/r/490788 (https://phabricator.wikimedia.org/T216199) (owner: 10Dzahn) [23:20:42] (03PS4) 10Dzahn: Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 10https://gerrit.wikimedia.org/r/490788 (https://phabricator.wikimedia.org/T216199) [23:23:50] (03CR) 10Dzahn: [C: 03+2] Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 
10https://gerrit.wikimedia.org/r/490788 (https://phabricator.wikimedia.org/T216199) (owner: 10Dzahn) [23:24:09] (03PS5) 10Dzahn: Revert "DHCP: add 24:B6:FD:F6:17:3A as bast3003" [puppet] - 10https://gerrit.wikimedia.org/r/490788 (https://phabricator.wikimedia.org/T216199) [23:28:27] 10Operations, 10ops-esams, 10DC-Ops, 10decommission, 10Patch-For-Review: decom bast3003 (formerly ms-be3003, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (10Dzahn) [23:28:53] 10Operations, 10ops-esams, 10DC-Ops, 10decommission, 10Patch-For-Review: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (10Dzahn) [23:37:38] !log niharika29@deploy1001 Started deploy [scholarships/scholarships@25ea138]: Update app with updated dependencies to mitigate PHPMailer error T215302 [23:37:41] !log niharika29@deploy1001 Finished deploy [scholarships/scholarships@25ea138]: Update app with updated dependencies to mitigate PHPMailer error T215302 (duration: 00m 02s) [23:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:37:42] T215302: Website Revamp - https://phabricator.wikimedia.org/T215302 [23:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:41:41] bd808: Deployment went smooth. We have some new translations in so I left files for those languages in. Interestingly, it seems like some translators are ignoring URLs completely when doing translations (which makes sense from their POV) but it means we have links pointing to wikimania2016 wikis in those languages. [23:41:52] I created a new test application and it works fine. [23:43:09] Niharika: nice. 
If you really care, you can edit urls yourself via TranslateWiki :) [23:43:11] (03PS1) 10Dduvall: ci: Permit git traffic between zuul mergers ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) [23:43:52] (03PS1) 10Dzahn: contint: allow port 9418 on IPv6 for zuul-mergers [puppet] - 10https://gerrit.wikimedia.org/r/490791 (https://phabricator.wikimedia.org/T216204) [23:44:06] (03PS2) 10Dduvall: ci: Permit git traffic between zuul mergers over ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) [23:44:18] bd808: I'll probably just do a find and replace commit for them once we get more translations in and it's time for another deploy. [23:44:59] (03PS3) 10Dduvall: ci: Permit git traffic between zuul mergers over ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) [23:47:28] !log restarting labsdb1005 mysql in read only mode [23:47:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:48:22] mutante: i think our two patches sailed silently past each other :) [23:49:01] i will not be offended if you merge your own [23:49:02] marxarelli: yea, i noticed after hitting enter but before git-review was done :p [23:49:15] but yours is a bit fancier i guess :) [23:49:20] moving the stuff out to Hiera [23:49:36] haha :) [23:49:49] i wasn't even sure the @resolve would work in yaml [23:49:55] i was just annoyed with vim trying to wrap the line [23:50:03] as good a reason as any ^ [23:50:09] yea, i also got annoyed by the long line and used \ [23:50:12] ugly in its own way [23:50:39] yeah, @resolve should be fine in a multiline string [23:50:49] we should take yours if it compiles fine [23:51:12] cool :D [23:51:47] btw, if you use vim i fixed the syntax highlighting for those multiline scalars a while back [23:51:53] should be upstream [23:52:07] too much yaml editing! 
[23:52:12] ah :) nice [23:52:28] (03PS5) 10Dzahn: remove bast3003 prod IP, bast3002 has been repaired [dns] - 10https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936) [23:52:41] it helps a lot w/ integration/config/jjb edits [23:52:54] (03Abandoned) 10Dzahn: contint: allow port 9418 on IPv6 for zuul-mergers [puppet] - 10https://gerrit.wikimedia.org/r/490791 (https://phabricator.wikimedia.org/T216204) (owner: 10Dzahn) [23:53:29] i wanted to keep puppet-lint happy but we also tell it to ignore the "long line warnings" it would emit by default [23:54:13] marxarelli: try out "check experimental" ?:) [23:54:27] do you know that one already? [23:54:36] oh for operations/puppet? [23:54:39] yes [23:54:52] 10Operations, 10Office-IT, 10Wikimedia-Mailing-lists: Mailing list migration for Arbitration Committee - https://phabricator.wikimedia.org/T215940 (10elappen-WMF) [23:54:54] i think i may have used it before [23:54:55] you can add a line "Hosts: , " [23:55:02] oooh that's fancy [23:55:05] and then "check experimental" and it should give you puppet compiler output [23:55:09] for these shosts [23:56:04] the Hosts: line should be above the Bug: line [23:56:33] oh neat. in the commit message then? 
[23:56:52] yes, the Hosts in the commit message and the "check" command as a gerrit comment [23:57:10] (03PS4) 10Dduvall: ci: Permit git traffic between zuul mergers over ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) [23:57:19] (03CR) 10Dduvall: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: 10Dduvall) [23:57:25] I have always gone to the puppet compiler jenkins page and triggered it by hand, that's good to know [23:58:05] (03CR) 10jerkins-bot: [V: 04-1] ci: Permit git traffic between zuul mergers over ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: 10Dduvall) [23:58:07] i have to say it's really experimental because i also ran into a case where it said OK but in reality the compiler said it failed due to a typo [23:58:14] (03CR) 10jerkins-bot: [V: 04-1] ci: Permit git traffic between zuul mergers over ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: 10Dduvall) [23:58:17] oh that's so cool! [23:58:18] so dont trust it completely yet [23:59:01] so that got you https://puppet-compiler.wmflabs.org/compiler1001/112/ [23:59:12] and the srange at https://puppet-compiler.wmflabs.org/compiler1001/112/contint1001.wikimedia.org/ [23:59:29] hmm, that other job bailed with "Invalid commit message"
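For reference, the commit-message layout described in the chat (a "Hosts:" line placed above the "Bug:" line, followed by a "check experimental" comment in Gerrit) can be sketched as a small local check. This is only an illustration: the function name hosts_above_bug is hypothetical, the host in the example is taken from the compiler output URL quoted above, and this is not the validation that CI itself performs, so it does not explain the "Invalid commit message" failure.

```python
def hosts_above_bug(message: str) -> bool:
    """Return True if a 'Hosts:' line appears before the first 'Bug:' line.

    Hypothetical helper for illustration; the real CI check may differ.
    """
    lines = [line.strip() for line in message.splitlines()]
    # Index of the first line starting with each footer key, or None if absent.
    hosts_idx = next((i for i, l in enumerate(lines) if l.startswith("Hosts:")), None)
    bug_idx = next((i for i, l in enumerate(lines) if l.startswith("Bug:")), None)
    if hosts_idx is None or bug_idx is None:
        return False
    return hosts_idx < bug_idx


# Example message; the host value is an assumption based on the compiler URL in the log.
EXAMPLE = """ci: Permit git traffic between zuul mergers over ipv6

Hosts: contint1001.wikimedia.org
Bug: T216204
"""

if __name__ == "__main__":
    print(hosts_above_bug(EXAMPLE))
```

Once the commit message is laid out this way, the "check experimental" comment is posted on the Gerrit change, as was done on change 490790 above.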