[00:15:57] (03Draft1) 10Paladox: WIP: Planet: Redesgn UI [puppet] - 10https://gerrit.wikimedia.org/r/435327 [00:16:01] (03PS2) 10Paladox: WIP: Planet: Redesgn UI [puppet] - 10https://gerrit.wikimedia.org/r/435327 [00:16:15] paladox: thanks ! [00:16:21] you're welcome :) [00:16:42] not finished the ui though need to fix the location for the logo [00:16:51] what was the link again to find which instance uses a puppet role [00:17:26] https://planet-hotdog.wmflabs.org/# [00:17:55] wow, very nice [00:17:59] that's the KDE one, right [00:18:02] yep [00:18:06] much better than this morning, heh [00:18:10] heh [00:19:02] mobile optimised and also you don't get any errors with urls like before (you would see errors in the console) this is from my testing at least. [00:19:08] i think that unblocked the switch away from venus.. and with that jessie [00:19:20] great [00:20:35] prettyyyyyyyyyyyy [00:21:47] 10Blocked-on-Operations, 10Puppet, 10Reading-Infrastructure-Team-Backlog, 10Sentry, and 3 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#4233265 (10Dzahn) @Tgr I would like to merge https://gerrit.wikimedia.org/r/#/c/434539/ and I saw this role is not in production... [00:22:02] yes :) a big difference [00:23:10] you gotta compare to rawdog's theme we had before, not the current prod planet, heh [00:23:14] but either way [00:23:44] heh [00:25:14] try the subscribe link [00:25:23] paladox, is it empty? [00:25:34] nope [00:25:44] atom should be in there (file not generated yet) [00:26:17] yes, there is the label but not the file. i was wondering if we call it atom or rss20 for that pending change [00:26:25] ok!:) [00:27:11] heh i am planning on seeing this [00:27:13] http://offog.org/git/rawdog-plugins/archive.py [00:27:16] will do it [00:27:58] ah, a different plugin [00:28:22] yep [00:28:51] cool! cu later, i'll step away for a bit [00:29:20] though it looks like it generates multiple files [00:30:23] hmm. 
yea not sure if "archive" is exactly like "feed of feeds" [00:30:30] maybe not [00:48:00] the rss file should be atom compatible i think [03:44:14] PROBLEM - HHVM jobrunner on mw1296 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:44:25] PROBLEM - HHVM jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:46:54] PROBLEM - HHVM jobrunner on mw1318 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:47:04] PROBLEM - HHVM jobrunner on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:47:45] PROBLEM - HHVM jobrunner on mw1294 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:47:45] PROBLEM - HHVM jobrunner on mw1338 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:48:44] PROBLEM - HHVM jobrunner on mw1307 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:34] RECOVERY - HHVM jobrunner on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 9.603 second response time [03:58:44] RECOVERY - HHVM jobrunner on mw1318 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [03:58:55] RECOVERY - HHVM jobrunner on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [03:59:25] RECOVERY - HHVM jobrunner on mw1307 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [03:59:44] RECOVERY - HHVM jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [04:00:44] RECOVERY - HHVM jobrunner on mw1294 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [04:00:44] RECOVERY - HHVM jobrunner on mw1338 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [05:13:03] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435284 (owner: 10Marostegui) [05:14:27] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435284 
(owner: 10Marostegui) [05:16:33] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1092 after alter table (duration: 01m 22s) [05:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:16:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435492 (https://phabricator.wikimedia.org/T194273) [05:18:12] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435492 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [05:19:20] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435492 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [05:19:39] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435284 (owner: 10Marostegui) [05:21:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 for alter table (duration: 01m 21s) [05:21:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:21:54] !log Add tmp1 index back on db1099:3318 - T194273 [05:21:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:21:58] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273 [05:40:25] PROBLEM - HHVM rendering on mw2261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:41:24] RECOVERY - HHVM rendering on mw2261 is OK: HTTP OK: HTTP/1.1 200 OK - 74053 bytes in 0.311 second response time [06:07:49] 10Operations, 10MediaWiki-extensions-WikibaseClient, 10Wikidata: Lookup by property label for {{#property}} and {{#statements}} is broken (WMF production environment) - https://phabricator.wikimedia.org/T195642#4233316 (10eranroz) [06:09:49] 10Operations, 10MediaWiki-extensions-WikibaseClient, 10Wikidata: Lookup by property label for {{#property}} and 
{{#statements}} is broken (WMF production environment) - https://phabricator.wikimedia.org/T195642#4233305 (10eranroz) May be related to T195520 [06:29:45] PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/modprobe.d/nf_conntrack.conf] [06:33:06] 10Operations, 10MediaWiki-extensions-WikibaseClient, 10Wikidata: Lookup by property label for {{#property}} and {{#statements}} is broken (WMF production environment) - https://phabricator.wikimedia.org/T195642#4233305 (10Legoktm) >>! In T195642#4233316, @eranroz wrote: > May be related to T195520 Yes, that... [06:33:15] 10Operations, 10MediaWiki-extensions-WikibaseClient, 10Wikidata: Lookup by property label for {{#property}} and {{#statements}} is broken (WMF production environment) - https://phabricator.wikimedia.org/T195642#4233342 (10eranroz) [06:33:19] 10Operations, 10Wikidata, 10Wikimedia-General-or-Unknown, 10MW-1.32-release-notes (WMF-deploy-2018-05-15 (1.32.0-wmf.4)), and 4 others: Multiple projects reporting Cannot access the database: No working replica DB server - https://phabricator.wikimedia.org/T195520#4233345 (10eranroz) [06:45:04] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.099 second response time [06:50:05] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1969 bytes in 0.110 second response time [07:00:05] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:48:05] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1946 bytes in 0.100 second response time [07:58:14] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: 
HTTP/1.1 200 OK - 1951 bytes in 0.101 second response time [08:39:34] PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet operation_type={container_status,create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [08:40:35] RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [09:33:55] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.095 second response time [09:42:49] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435623 [09:44:28] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435623 (owner: 10Marostegui) [09:45:46] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435623 (owner: 10Marostegui) [09:47:40] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 after alter table (duration: 01m 21s) [09:47:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:34] (03PS1) 10Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435624 (https://phabricator.wikimedia.org/T194273) [09:51:24] (03PS2) 10Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435624 (https://phabricator.wikimedia.org/T194273) [09:52:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435624 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [09:54:14] RECOVERY - wikidata.org dispatch lag is higher 
than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1943 bytes in 0.095 second response time [09:54:30] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435624 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [09:56:10] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1087 for alter table (duration: 01m 20s) [09:56:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:15] !log Add tmp1 index back on db1087 (sanitarium master), this will generate lag on labsdb hosts - T194273 [09:56:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:19] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273 [10:47:05] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1960 bytes in 0.101 second response time [10:57:14] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1951 bytes in 0.091 second response time [11:04:34] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1976 bytes in 0.092 second response time [11:14:44] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1977 bytes in 0.099 second response time [11:26:55] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1975 bytes in 0.126 second response time [11:32:04] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1944 bytes in 0.106 second response time [11:44:15] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 
200 OK - pattern not found - 1968 bytes in 0.117 second response time [11:54:25] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.094 second response time [12:42:14] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.109 second response time [12:44:18] (03CR) 10Alex Monk: "with one tiny little edit I think this will be good" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/432532 (owner: 10Merlijn van Deen) [12:52:25] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1967 bytes in 0.094 second response time [12:56:13] (03PS1) 10Urbanecm: Revert "Revert "Enable $wgUseRCPatrol on azwiki"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435628 (https://phabricator.wikimedia.org/T194389) [12:56:42] (03PS1) 10Urbanecm: Revert "Revert "Revert "Temp rate limit for arwiki due to mass vandalism""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435629 (https://phabricator.wikimedia.org/T192668) [13:08:56] (03Abandoned) 10Urbanecm: Add images.rkd.nl to copy upload whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430390 (https://phabricator.wikimedia.org/T193639) (owner: 10Urbanecm) [13:29:18] 10Operations, 10Traffic, 10Wikimedia-Hackathon-2018: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962#4233674 (10Krenair) Getting that working was almost suspiciously easy... 
[13:30:14] 10Puppet, 10Beta-Cluster-Infrastructure, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4233678 (10Krenair) [13:30:17] 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure: Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76) - https://phabricator.wikimedia.org/T187736#4233675 (10Krenair) 05Open>03Resolved a:03Krenair It's back and Puppet is behaving. [13:40:47] 10Operations, 10Analytics, 10Analytics-Wikistats, 10Domains, 10Traffic: HTTP 500 on stats.wikipedia.org (invalid domain) - https://phabricator.wikimedia.org/T195568#4231062 (10Nemo_bis) There are still lingering links to the old domain, so a redirect would be in order. (Unless it's extraordinarily compli... [13:41:02] 10Operations, 10Analytics, 10Analytics-Wikistats, 10Domains, 10Traffic: HTTP 500 on stats.wikipedia.org (invalid domain) - https://phabricator.wikimedia.org/T195568#4233684 (10Nemo_bis) p:05Triage>03Normal [13:47:34] (03PS1) 10Alex Monk: Allow PuppetDB use on standalone puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) [13:48:08] (03CR) 10jerkins-bot: [V: 04-1] Allow PuppetDB use on standalone puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [13:48:24] (03PS3) 10ArielGlenn: turn off misc dump crons on snapshot1007 [puppet] - 10https://gerrit.wikimedia.org/r/432365 (https://phabricator.wikimedia.org/T181936) [13:49:03] (03CR) 10ArielGlenn: [C: 032] turn off misc dump crons on snapshot1007 [puppet] - 10https://gerrit.wikimedia.org/r/432365 (https://phabricator.wikimedia.org/T181936) (owner: 10ArielGlenn) [13:51:10] (03CR) 10Alex Monk: "13:48:06 modules/role/manifests/puppetmaster/standalone.pp:80 wmf-style: Found hiera call in class 'role::puppetmaster::standalone' for 'p" [puppet] - 10https://gerrit.wikimedia.org/r/435631 
(https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [13:56:13] (03PS2) 10ArielGlenn: role for new misc dumps cron host [puppet] - 10https://gerrit.wikimedia.org/r/432366 (https://phabricator.wikimedia.org/T181936) [13:57:09] (03CR) 10ArielGlenn: [C: 032] role for new misc dumps cron host [puppet] - 10https://gerrit.wikimedia.org/r/432366 (https://phabricator.wikimedia.org/T181936) (owner: 10ArielGlenn) [14:00:15] (03PS3) 10ArielGlenn: add snapshot1008 role and hiera settings [puppet] - 10https://gerrit.wikimedia.org/r/432367 (https://phabricator.wikimedia.org/T181936) [14:08:55] (03CR) 10ArielGlenn: [C: 032] add snapshot1008 role and hiera settings [puppet] - 10https://gerrit.wikimedia.org/r/432367 (https://phabricator.wikimedia.org/T181936) (owner: 10ArielGlenn) [14:12:03] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435634 [14:12:24] (03PS1) 10ArielGlenn: add snapshot1008 to dumps scap targets [dumps/scap] - 10https://gerrit.wikimedia.org/r/435635 [14:12:38] apergos, any chance you can run 'apt-cache policy puppetdb-termini' on a prod puppetmaster? [14:12:50] looking at puppetmaster/puppetdb stuff in deployment-prep [14:12:52] not right now, I'm in the middle of something [14:12:53] ok [14:13:13] p.s. ignore any broken puppet on snapshot1008 for the next little while [14:14:01] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435634 (owner: 10Marostegui) [14:14:31] (03CR) 10ArielGlenn: [V: 032 C: 032] add snapshot1008 to dumps scap targets [dumps/scap] - 10https://gerrit.wikimedia.org/r/435635 (owner: 10ArielGlenn) [14:15:23] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435634 (owner: 10Marostegui) [14:16:42] anyone happen to know if the puppetmasters run stretch? 
[14:17:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1087 after alter table (duration: 01m 20s) [14:17:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:07] yes they do [14:19:20] ok thanks [14:20:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435636 (https://phabricator.wikimedia.org/T194273) [14:22:42] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435636 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [14:23:57] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435636 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [14:25:31] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 for alter table (duration: 01m 07s) [14:25:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:37] !log Add tmp1 index back on db1101:3318 - T194273 [14:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:42] T194273: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273 [14:29:52] PROBLEM - Check systemd state on snapshot1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[14:46:02] PROBLEM - nutcracker port on snapshot1008 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused [14:47:51] PROBLEM - nutcracker process on snapshot1008 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nutcracker), command name nutcracker [14:53:11] RECOVERY - nutcracker process on snapshot1008 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker [14:53:41] RECOVERY - nutcracker port on snapshot1008 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212 [14:53:42] RECOVERY - Check systemd state on snapshot1008 is OK: OK - running: The system is fully operational [15:10:31] Krenair: did you get your puppetmaster stuff sorted? I have a few minutes now [15:10:57] well [15:11:05] I think I have the stuff I need for my project [15:11:33] puppet DB won't fully be in use in deployment-prep (e.g. including for SSH keys) until I figure out the different puppetdb versions vs. packaging vs. jessie [15:12:04] alternatively, until I just make a new puppetmaster running stretch [15:12:31] if prod is already heading in that direction, moving to stretch probably makes the most sense [15:12:38] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation: Rack and setup snapshot1008 - https://phabricator.wikimedia.org/T195385#4233839 (10ArielGlenn) Puppetization is done, closing this in favor of T181936 if there are any issue. [15:12:59] fair enough [15:13:46] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests, 10Patch-For-Review: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#3807102 (10ArielGlenn) [15:16:37] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests, 10Patch-For-Review: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#4233849 (10ArielGlenn) To do: make sure that cron jobs kick off as we expect (just wait for any one to run) up number... 
[15:19:18] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests, 10Patch-For-Review: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#4233851 (10ArielGlenn) @hoo: wikidata weeklies now run on snapshot1008. Do not try to run them on snapshot1007! [15:28:05] apergos, you still using deployment-dumps-puppetmaster? [15:28:11] yes [15:28:26] ok [15:29:11] if it can run stretch without a problem (at the time there was some issue) then feel free [15:32:09] i've been running a puppet master as stretch in the cloud [15:32:13] so it should work :) [16:18:46] (03CR) 10Tulsi Bhagat: [C: 031] Enable template editor group on newiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435106 (https://phabricator.wikimedia.org/T195557) (owner: 10Biplab Anand) [16:25:46] now I'm trying to make my new stretch puppetmaster be its own client [16:25:51] but it thinks its own cert is revoked?? [16:29:04] in fact if I move another host to using it, that thinks the host cert is revoked too. wtf [16:31:04] Krenair puppetdb? [16:31:10] no [16:31:13] nothing to do with that [16:31:34] rm -rf /var/lib/puppet/ssl [16:32:08] sudo service apache2 stop [16:32:39] sudo puppet cert list -a (ctrl-c when you see "Notice: Signed certificate request for ca") [16:32:49] then "sudo puppet master --no-daemonize --verbose" [16:32:58] when you see "Notice: Starting Puppet master " ctrl-c [16:33:10] Krenair see https://docs.puppet.com/puppet/3.8/ssl_regenerate_certificates.html#step-1-clear-and-regenerate-certs-on-your-puppet-master [16:34:04] puppet cert list -a does not show such a notice [16:34:46] also we're using 4.8, not 3.8 [16:35:33] Krenair yeh but the steps still apply [16:36:09] looking at https://puppet.com/docs/puppet/4.8/ssl_regenerate_certificates.html#step-1-clear-and-regenerate-certs-on-your-puppet-master [16:36:16] "Regenerate the CA by running sudo puppet cert list -a. 
You should see this message: Notice: Signed certificate request for ca." [16:36:21] except I don't see that [16:36:46] wait there we go [16:37:46] had to get rid of /var/lib/puppet/server/ssl too [16:39:02] oh i forgot about that one [16:39:05] (different path) [16:39:50] nah, no good [16:40:34] Krenair did you sign the cert? [16:40:46] yes [16:40:46] puppet cert sign [16:40:50] ok [16:41:23] Krenair what is the full error? [16:41:43] there was no error from the signing process, that was fine [16:42:30] yep i mean the cert error you're getting. [16:43:50] the same one I've been getting from the beginning [16:44:01] certificate revoked for /CN=deployment-puppetmaster03.deployment-prep.eqiad.wmflabs [16:44:38] https://phabricator.wikimedia.org/P7159 [16:45:22] Krenair what puppet config are you using [16:45:27] /etc/puppet/puppet.conf [16:45:31] ? [16:45:34] oh [16:46:10] it comes from base/puppet.conf.d/10-main.conf.erb [16:46:18] it shouldn't [16:46:29] is what it uses in cloud anyways /etc/puppet/puppet.conf [16:46:59] why not? [16:47:24] -puppetmaster02's /etc/puppet/puppet.conf comes from base/puppet.conf.d/10-main.conf.erb [16:48:01] /etc/puppet/puppet.conf that's a symlink to base/puppet.conf.d/10-main.conf.erb ? [16:48:15] is this deployment-puppetmaster03.deployment-prep.eqiad.wmflabs using labs puppet master? [16:48:25] no [16:48:30] it's not a symlink [16:48:41] there's a great big header at the top reading [16:48:42] ##### THIS FILE IS MANAGED BY PUPPET [16:48:42] ##### as template('base/puppet.conf.d/10-main.conf.erb') [16:48:55] and no, it's not using the labs puppet master [16:48:56] Krenair my config is https://phabricator.wikimedia.org/P7160 [16:49:13] it was bootstrapped from -puppetmaster02 and I'm now trying to make it use itself [16:49:47] oh i see. [16:52:12] Krenair maybe https://ask.puppet.com/question/18204/certificate-gets-revoked/ ? 
[16:52:23] already saw that [16:52:31] didn't do any good [17:04:22] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: Traceback (most recent call last) [17:04:22] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: Traceback (most recent call last) [17:09:31] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 9 probes of 304 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [17:09:31] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 7 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map [17:31:49] Krenair: last I knew stretch + self-puppetmaster didn't play well together [17:32:04] yeah :( [17:32:09] you could set up a separate puppetmaster and get it done that way (which I therefore did for dumps) [17:32:16] maybe they have not been able to fix that yet [17:32:43] I am making a separate puppetmaster [17:32:45] ok it wastes an image but you don't need a lot of resources for the separate one [17:33:09] I'm making deployment-puppetmaster03 [17:33:35] if I can get it working as well as the current one, deployment-puppetmaster02 (jessie), I'll move everything over [17:34:25] ok [17:34:43] good luck! 
[17:35:05] thanks [18:28:28] (03PS2) 10Hoo man: Wikidata dispatching: Update comment [puppet] - 10https://gerrit.wikimedia.org/r/432778 [18:31:37] (03PS1) 10Hoo man: Wikidata: Always have 4 change dispatchers running [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435648 (https://phabricator.wikimedia.org/T194602) [18:56:18] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435650 [18:58:03] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435650 (owner: 10Marostegui) [18:59:35] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435650 (owner: 10Marostegui) [19:01:46] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 after alter table (duration: 01m 21s) [19:01:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:47:05] paladox: I declare victory [20:47:10] Notice: Applied catalog in 8.39 seconds [20:47:13] Krenair :) [20:47:20] now to figure out what the *** I did that made it work [20:47:23] and document it somewhere [20:47:25] heh [20:47:28] probably [20:47:33] Krenair it should work straight away. [20:47:43] at least it works for me. [20:47:52] well it doesn't [20:50:14] https://phabricator.wikimedia.org/P7161 [20:50:56] Krenair ah, i guess the ssl path should be changed? [20:51:00] ? 
[20:51:15] Krenair cp server/ssl/certs/deployment-puppetmaster03.deployment-prep.eqiad.wmflabs.pem ssl/certs/deployment-puppetmaster03.deployment-prep.eqiad.wmflabs.pem [20:51:25] so moving it from server to ssl [20:51:29] which is the default [20:51:31] from server/ssl to ssl [20:51:34] yep [20:51:37] basically the trick was [20:51:40] that's the default in puppet [20:51:53] and that would at least fix puppetdb support [20:51:55] instead of running puppet agent -tv on the client, puppet cert sign {client fqdn} on the master, and puppet agent -tv on the client [20:52:06] run puppet cert generate {client fqdn}, then copy files from server to client [20:52:16] (except all on one box which serves as both master and client) [20:52:55] and with some swapping between puppet-master and apache2 for reasons I've forgotten [20:53:07] and creating server/ssl/crl to make our apache config happy [20:54:48] oh [21:11:22] so I had to do that [21:11:31] I also had to copy over the operations/puppet and labs/private repositories [21:11:37] and the volatile directory [21:11:52] but deployment-tin is successfully moved to the new puppetmaster [21:11:55] :) [21:15:12] 10Puppet, 10Cloud-VPS: role::puppetmaster::standalone on stretch: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3479983 (10Krenair) Looks like this got fixed at some stage, suggest closing: ```krenair@deployment-puppetmaster03:~$ apt-cache policy geoipupdate geoipupdate: I... 
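The self-client workaround described above (generate the certificate on the master, copy it from `server/ssl` to `ssl`, create `server/ssl/crl`) can be sketched as a dry-run planner. This is a minimal sketch, not the exact procedure that was run: the `/var/lib/puppet` base directory is taken from the paths mentioned earlier in the log, while the function name `self_client_steps`, the `subdirs` parameter, and the ordering of the steps are assumptions for illustration. The log only spells out the `certs/` copy, and the apache2/puppet-master swapping is left out because its details are not recorded. Nothing is executed; the commands are only built as strings.

```python
import shlex


def self_client_steps(fqdn, base="/var/lib/puppet", subdirs=("certs",)):
    """Return shell commands for making a puppetmaster its own client.

    Hypothetical helper: it only builds command strings, it runs nothing.
    """
    steps = [
        # Generate key and certificate on the master side, instead of the
        # usual agent request followed by "puppet cert sign".
        "puppet cert generate " + shlex.quote(fqdn),
    ]
    for sub in subdirs:
        # Copy from the master's ssldir (server/ssl) to the agent's (ssl).
        # Only the certs/ copy appears in the log; any other subdirectory
        # passed in here is the caller's assumption.
        src = shlex.quote(f"{base}/server/ssl/{sub}/{fqdn}.pem")
        dst = shlex.quote(f"{base}/ssl/{sub}/{fqdn}.pem")
        steps.append(f"cp {src} {dst}")
    # The Apache config mentioned in the log expected this directory.
    steps.append("mkdir -p " + shlex.quote(f"{base}/server/ssl/crl"))
    # Finally run the agent against the local master.
    steps.append("puppet agent -tv")
    return steps


if __name__ == "__main__":
    for cmd in self_client_steps(
        "deployment-puppetmaster03.deployment-prep.eqiad.wmflabs"
    ):
        print(cmd)
```

Printing the plan first and running the commands by hand keeps the sketch safe to try on a box where the CA state matters.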
[21:26:18] (03PS5) 10Reedy: Move /multiversion/vendor to /vendor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432013 (owner: 10Krinkle) [21:27:55] (03CR) 10Reedy: [C: 032] Move /multiversion/vendor to /vendor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432013 (owner: 10Krinkle) [21:29:31] (03Merged) 10jenkins-bot: Move /multiversion/vendor to /vendor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432013 (owner: 10Krinkle) [21:38:44] (03PS1) 10Reedy: rm test.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435657 [21:39:16] (03CR) 10Reedy: [C: 032] rm test.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435657 (owner: 10Reedy) [21:40:30] (03Merged) 10jenkins-bot: rm test.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435657 (owner: 10Reedy) [21:42:35] !log reedy@tin Synchronized vendor/: canhasvendor (duration: 01m 46s) [21:42:36] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10TechCom-RFC (TechCom-Approved), 10User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3656187 (10JeroenDeDauw) Any ETA on when WMF will be able to run full PHP 7.0? And will it be running PHP 7.0 or a more rece... [21:42:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:30] another interesting thing I found: [21:43:33] 2018/05/26 21:39:58 [error] 461#461: *9709 connect() failed (111: Connection refused) while connecting to upstream, client: 10.68.21.200, server: , request: "POST /v3/commands?checksum=2f697045a43a9082c55862231b8b968804cbad8f HTTP/1.1", upstream: "http://[::1]:8080/v3/commands?checksum=2f697045a43a9082c55862231b8b968804cbad8f", host: "deployment-puppetdb01.deployment-prep.eqiad.wmflabs" [21:43:53] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10TechCom-RFC (TechCom-Approved), 10User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4234207 (10Reedy) >>! 
In T176370#4234205, @JeroenDeDauw wrote: > Any ETA on when WMF will be able to run full PHP 7.0? And w... [21:43:54] this is because nginx is proxying to localhost (::1) [21:45:09] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10TechCom-RFC (TechCom-Approved), 10User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4234208 (10Reedy) T174431#4198574 Should be a few weeks before we have PHP 7 available everywhere. There's still work to m... [21:45:13] !log reedy@tin Synchronized multiversion/: multiversion (duration: 01m 21s) [21:45:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:45:44] but it's not accepting connections on that port over ipv6 [21:45:51] Reedy, messing with multiversion on a saturday? :) [21:45:59] Past tense [21:46:04] heh [21:47:00] !log reedy@tin Synchronized composer.json: (no justification provided) (duration: 01m 19s) [21:47:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:43:38] i fixed the logo layout on planet (using the new ui) now https://planet-hotdog.wmflabs.org :) [22:50:09] Krenair: if anyone can get away with breaking the wiki over a weekend, its Reedy [23:00:40] (03PS1) 10Legoktm: Add `webservice-python-bootstrap` command [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/435662 (https://phabricator.wikimedia.org/T174769) [23:04:21] (03CR) 10Zhuyifei1999: [C: 031] Add `webservice-python-bootstrap` command [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/435662 (https://phabricator.wikimedia.org/T174769) (owner: 10Legoktm) [23:07:57] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435623 (owner: 10Marostegui) [23:08:09] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435636 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [23:08:20] (03CR) 
10jenkins-bot: rm test.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435657 (owner: 10Reedy) [23:08:44] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435492 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [23:08:54] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435634 (owner: 10Marostegui) [23:09:26] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435624 (https://phabricator.wikimedia.org/T194273) (owner: 10Marostegui) [23:09:35] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435650 (owner: 10Marostegui) [23:09:41] (03Draft2) 10Reedy: Support PHPUnit 6.5 in composer.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435626 [23:09:43] (03CR) 10jenkins-bot: Move /multiversion/vendor to /vendor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432013 (owner: 10Krinkle) [23:32:28] (03CR) 10Jforrester: "> "The requested package || could not be found, it looks like its name is invalid, "||" is not allowed in package names."" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435626 (owner: 10Reedy) [23:33:19] (03CR) 10Reedy: "Filed T195688" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435626 (owner: 10Reedy) [23:34:02] (03CR) 10Harej: [C: 031] "Thank you for your contribution!" [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/435662 (https://phabricator.wikimedia.org/T174769) (owner: 10Legoktm) [23:38:38] (03CR) 10Jforrester: "Fun." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/435626 (owner: 10Reedy) [23:52:07] (03PS1) 10Alex Monk: puppet DB nginx: Talk to upstream only over IPv4 localhost [puppet] - 10https://gerrit.wikimedia.org/r/435670 [23:52:35] (03CR) 10Alex Monk: "Full error example: 2018/05/26 23:16:35 [error] 17301#17301: *45 connect() failed (111: Connection refused) while connecting to upstream, " [puppet] - 10https://gerrit.wikimedia.org/r/435670 (owner: 10Alex Monk) [23:54:28] (03CR) 10Alex Monk: "(and before anyone asks: I checked, it looks like talking to ::1 in a labs VM does work, just not to this particular service)" [puppet] - 10https://gerrit.wikimedia.org/r/435670 (owner: 10Alex Monk)
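The "Connection refused" error quoted above fits a service that listens on the IPv4 loopback only while the proxy connects to `[::1]`. A minimal Python sketch of that check follows, assuming nothing about the real PuppetDB setup: the helper names `can_connect` and `probe_loopback` are hypothetical, and the demo uses a throwaway listener on an OS-chosen port rather than port 8080 from the log.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and hosts without IPv6.
        return False


def probe_loopback(port):
    """Check the same port on the IPv4 and IPv6 loopback addresses."""
    return {addr: can_connect(addr, port) for addr in ("127.0.0.1", "::1")}


if __name__ == "__main__":
    # Stand-in for a service bound to the IPv4 loopback only.
    # Port 0 lets the OS pick a free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        print(probe_loopback(srv.getsockname()[1]))
```

Running `probe_loopback(8080)` on the affected host would be one way to confirm which loopback address the upstream actually accepts before pinning the proxy to IPv4.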