[00:16:45] (PS16) MaxSem: Script to do the initial data load from OSM for Maps project [puppet] - https://gerrit.wikimedia.org/r/293105 (owner: Gehel)
[00:53:43] !log hand hack apache on labmon to make it work temporarily
[00:53:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:54:51] wow
[00:54:56] shinken looks empty...
[00:55:05] (for deployment-prep)
[00:56:46] chasemp, so does this leave us at the PARTY! step of T137924? :)
[00:56:46] T137924: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924
[00:57:23] Krenair: heh well I think I can make graphite work but I'm confused about why it ever worked before and yuvi tested this morning
[00:57:32] so yeah light hearted party I think :)
[00:59:45] chasemp, hmm
[01:00:10] did there used to be a symlink from wherever the default docroot is to the proper one?
[01:00:53] good question, I'm not sure
[01:02:24] okay... shall I leave a comment suggesting that on the ticket?
[01:04:04] sure, I was going to confirm w/ yuvi what exactly he tested this morning before I got too speculative
[01:07:27] !log sign labstore1005 puppet certs and bootstrap the server
[01:07:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:23:07] hey ori, keyholder issue has come up in beta
[01:23:20] Shouldn't check_keyholder use SSH_AUTH_SOCK=/run/keyholder/proxy.sock when calling ssh-add -l?
[01:48:42] Krenair: it has 'export SSH_AUTH_SOCK="/run/keyholder/proxy.sock"' in line 4
[01:48:53] so it doesn't need to set it especially for ssh-add -l
[01:58:32] Oops.
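The point settled at 01:48 is that a variable exported near the top of a script is inherited by every child process the script spawns, so `ssh-add -l` already sees the keyholder socket without setting it again. A minimal Python sketch of that inheritance, assuming only the socket path quoted above; it never contacts a real agent, it just spawns a child interpreter that reports the value it inherited. The function name `child_sees_auth_sock` is hypothetical.

```python
import os
import subprocess
import sys

# Socket path quoted in the log; this sketch never connects to it.
KEYHOLDER_SOCK = "/run/keyholder/proxy.sock"


def child_sees_auth_sock() -> str:
    """Hypothetical helper: export SSH_AUTH_SOCK once, then spawn a child without passing env."""
    # Equivalent of the script's `export SSH_AUTH_SOCK=...` near its top.
    os.environ["SSH_AUTH_SOCK"] = KEYHOLDER_SOCK
    # subprocess.run inherits os.environ by default, just as `ssh-add -l`
    # inherits the exported variable from the shell script.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('SSH_AUTH_SOCK', ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_sees_auth_sock())  # prints /run/keyholder/proxy.sock
```

Setting the variable per command (`SSH_AUTH_SOCK=... ssh-add -l`) would also work, but exporting it once keeps every later agent call in the script consistent.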
[01:58:34] Yes, good point
[01:58:41] That's why things were not working for me
[02:28:20] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.6) (duration: 10m 53s)
[02:28:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:45:17] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.7) (duration: 07m 58s)
[02:45:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:51:43] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jun 25 02:51:43 UTC 2016 (duration 6m 26s)
[02:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:39:45] Operations, Icinga, Shinken: shutdown neon (icinga) after it has been replaced with shinken - https://phabricator.wikimedia.org/T125023#2406116 (Krenair) Do we have/need a separate ticket for migrating from icinga to shinken?
[04:24:42] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 200, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/2/3: down - Core: cr2-codfw:xe-5/0/1 (Zayo, OGYX/120003//ZYO) 36ms {#2909} [10Gbps wave]
[04:32:13] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 26 probes of 397 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[04:38:24] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 397 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[04:44:45] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 202, down: 0, dormant: 0, excluded: 0, unused: 0
[06:12:51] Operations, DBA, Phabricator: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2401019 (greg) >>! In T138460#2404370, @jcrespo wrote: > Also, could this be part of your goal (maybe not technically, but give them equal importance as gallium)? I own these machines, but m3...
[06:31:26] (PS1) 20after4: Install arcanist from apt rather than git. [puppet] - https://gerrit.wikimedia.org/r/295975
[06:31:42] PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:11] PROBLEM - puppet last run on nobelium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:11] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:21] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:21] PROBLEM - puppet last run on db2056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:52] PROBLEM - puppet last run on elastic1042 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:52] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:10] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:11] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:31] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:40] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:41] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:10] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:41:51] PROBLEM - puppet last run on mw2086 is CRITICAL: CRITICAL: puppet fail
[06:56:01] RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:56:21] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:41] RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:57:10] RECOVERY - puppet last run on elastic1042 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[06:57:11] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:57:21] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:57:30] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:57:42] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:57:51] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:57:51] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:22] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:32] RECOVERY - puppet last run on nobelium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:32] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:50] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:08:59] RECOVERY - puppet last run on mw2086 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[07:20:19] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[07:34:56] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: puppet fail
[07:59:16] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:22:05] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:23:13] chasemp sorry, was away again. I tested by going to https://graphite.wmflabs.org - everything worked as expected. I looked at a few tool labs instance graphs, they had a gap in the middle but that was it
[08:23:38] chasemp robh err, and https://graphite.wmflabs.org/ works for me too just now?
[09:06:01] ori: looks fine now
[09:16:45] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[09:21:25] (PS1) Dzahn: diamond: move _lib classes to own files [puppet] - https://gerrit.wikimedia.org/r/295982
[09:37:12] !log install2001 killing ganglia aggregator processes, running puppet, for debugging
[09:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:37:35] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[09:40:56] (PS1) Dzahn: openstack: move instancersync define to own file [puppet] - https://gerrit.wikimedia.org/r/295985
[09:46:30] Operations, Security: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2406322 (Dzahn)
[09:49:09] Operations, vm-requests, Security: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2406334 (Peachey88)
[09:50:35] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:00:33] (PS1) Dzahn: lint-ignore arrows in tests/server [puppet/kafka] - https://gerrit.wikimedia.org/r/295986
[10:03:46] (PS1) Dzahn: puppetmaster: lint-ignore layout in test/puppetmaster [puppet] - https://gerrit.wikimedia.org/r/295987
[10:08:10] PROBLEM - puppet last run on mw2175 is CRITICAL: CRITICAL: puppet fail
[10:15:41] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[10:16:42] (PS1) Dzahn: rm misc::monitoring::view::hadoop [puppet] - https://gerrit.wikimedia.org/r/295988
[10:21:59] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2343854 (Jarry1250) Hi. I'm really surprised that my bot (LivingBot) is still failing this. Is there any way to get more diagnostic info?
[10:23:27] 10:19 < doctaxon|WM> Hi! How can I find out the target page of a redirecting page by API? Any ideas?
[10:23:30] 10:20 < doctaxon|WM> I couldn't find it in mediawiki api manual
[10:30:37] doctaxon|WM: https://en.wikipedia.org/w/api.php?action=query&titles=Hybrid%20cars&prop=info|links&redirects
[10:30:52] doctaxon|WM: the from/to part in there i think
[10:33:21] RECOVERY - puppet last run on mw2175 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[11:14:46] When do you sleep Alexz?
[11:21:27] mutante: thank you
[11:35:57] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2406455 (BBlack) >>! In T136674#2406396, @Jarry1250 wrote: > Hi. I'm really surprised that my bot (LivingBot) is still failing this. Is t...
[11:58:09] Operations, Traffic, Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#2406470 (BBlack) Yeah it's not great, but what do you expect to happen? That's what we're telling Varnish to do based on the standards. This is the timeline we're talking about (...
[12:46:39] Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2406548 (Aklapper) Crystal ball broken. :) When this will be finished depends on the author (see T138136#2404415) and anyon...
[12:59:02] (CR) Glaisher: "Can this be abandoned now? The task has been closed as declined for a long time." [mediawiki-config] - https://gerrit.wikimedia.org/r/239854 (https://phabricator.wikimedia.org/T111898) (owner: Mdann52)
[13:05:56] (Abandoned) Jforrester: Allow sysops to add and remove accounts from bot group on mai.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/239854 (https://phabricator.wikimedia.org/T111898) (owner: Mdann52)
[13:06:03] Glaisher: {{done}}
[13:06:16] Thanks
[13:09:06] (PS1) KartikMistry: apertium-hbs-eng: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-eng] - https://gerrit.wikimedia.org/r/296049 (https://phabricator.wikimedia.org/T107306)
[13:09:16] Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2406592 (KartikMistry)
[13:13:21] (PS1) KartikMistry: apertium-isl: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-isl] - https://gerrit.wikimedia.org/r/296050 (https://phabricator.wikimedia.org/T107306)
[13:14:53] Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2406596 (KartikMistry)
[13:19:53] Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2406600 (KartikMistry)
[13:19:58] (PS1) KartikMistry: apertium-hbs-mkd: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-mkd] - https://gerrit.wikimedia.org/r/296051 (https://phabricator.wikimedia.org/T107306)
[13:32:25] PROBLEM - puppet last run on mc2015 is CRITICAL: CRITICAL: puppet fail
[13:33:57] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:40:15] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: puppet fail
[13:45:15] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: puppet fail
[13:50:15] RECOVERY - check_puppetrun on americium is OK: OK: Puppet is currently enabled, last run 144 seconds ago with 0 failures
[13:58:46] RECOVERY - puppet last run on mc2015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:00:36] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:03:02] (PS2) Dzahn: install_server: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295784 (owner: Muehlenhoff)
[14:03:34] (CR) Dzahn: [C: 2] install_server: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295784 (owner: Muehlenhoff)
[14:25:43] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2406690 (Jarry1250) >>! In T136674#2406455, @BBlack wrote: > @Jarry1250 - The insecure accesses with account `LivingBot` have the User-Ag...
[14:32:08] (PS1) Dzahn: ores: add monitoring for production workers [puppet] - https://gerrit.wikimedia.org/r/296054
[14:33:07] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:33:55] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
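The 10:30 answer to doctaxon|WM relies on the `redirects` flag of `action=query`: with it, the MediaWiki API resolves redirects and lists each one as a from/to pair under `query.redirects`. A minimal Python sketch that builds such a request and extracts those pairs, assuming JSON output (`format=json` is added here; the URL in the log omits it). The helper names are hypothetical, and the sample response below is made up for illustration, not real API output.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def redirect_query_url(title: str) -> str:
    """Hypothetical helper: build an action=query URL that resolves redirects."""
    params = {
        "action": "query",
        "titles": title,
        "redirects": "1",  # ask the API to follow redirects and report from/to pairs
        "format": "json",  # assumption: JSON output, absent from the URL in the log
    }
    return API + "?" + urlencode(params)


def redirect_targets(response: dict) -> dict:
    """Hypothetical helper: map each redirect source title to its target title."""
    pairs = response.get("query", {}).get("redirects", [])
    return {pair["from"]: pair["to"] for pair in pairs}


if __name__ == "__main__":
    print(redirect_query_url("Hybrid cars"))
    # Made-up response shape, for illustration only.
    sample = {"query": {"redirects": [{"from": "Old title", "to": "New title"}]}}
    print(redirect_targets(sample))  # {'Old title': 'New title'}
```

Fetching the URL (for example with `urllib.request.urlopen` and `json.load`) and passing the parsed body to `redirect_targets` gives the target page without having to parse wikitext.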
[14:34:57] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy
[14:35:34] (CR) Ladsgroup: [C: 1] ores: add monitoring for production workers [puppet] - https://gerrit.wikimedia.org/r/296054 (owner: Dzahn)
[14:36:10] (PS2) Dzahn: ores: add monitoring for production workers [puppet] - https://gerrit.wikimedia.org/r/296054
[14:36:59] (CR) Dzahn: [C: 2] ores: add monitoring for production workers [puppet] - https://gerrit.wikimedia.org/r/296054 (owner: Dzahn)
[14:40:27] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[15:00:30] (PS1) Dzahn: ores: fix-up monitoring of workers in prod [puppet] - https://gerrit.wikimedia.org/r/296055
[15:02:35] (PS2) Dzahn: ores: fix-up monitoring of workers in prod [puppet] - https://gerrit.wikimedia.org/r/296055
[15:04:01] (CR) Dzahn: [C: 2] ores: fix-up monitoring of workers in prod [puppet] - https://gerrit.wikimedia.org/r/296055 (owner: Dzahn)
[15:07:36] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2406796 (Dzahn) @Ladsgroup The issue with "Fkraus" bot by Paulis that we talked about at Wikimania is part of this ticket.
[16:08:42] PROBLEM - puppet last run on mw2230 is CRITICAL: CRITICAL: puppet fail
[16:15:05] Operations, Labs, Labs-Infrastructure: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2406932 (Krenair) ```krenair@mira:~$ host 10.68.17.146 146.17.68.10.in-addr.arpa domain name pointer ci-jessie-wikimedia-140638.contintcloud.eqiad.wmflabs. 146.17...
[16:32:40] Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2406964 (Dereckson) @MoritzMuehlenhoff So, as said in T138136#2404415 the Debian package is for another font, Lateef, desig...
[16:36:01] RECOVERY - puppet last run on mw2230 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:36:46] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2406972 (Jarry1250) Okay, hopefully LivingBot is fixed now... let's see.
[17:01:21] (Abandoned) Yuvipanda: tools: Install jdk8 in trusty nodes [puppet] - https://gerrit.wikimedia.org/r/292960 (https://phabricator.wikimedia.org/T121279) (owner: Yuvipanda)
[17:02:36] (Abandoned) Yuvipanda: labs: Alias floating IPs in wikitextexp project as well [puppet] - https://gerrit.wikimedia.org/r/282511 (https://phabricator.wikimedia.org/T132216) (owner: Yuvipanda)
[17:02:49] (PS7) Yuvipanda: devpi: Add module + role [puppet] - https://gerrit.wikimedia.org/r/282102
[19:44:01] PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Delete_rows_v1 event on table phabricator_cache.cache_markupcache: Can't find record in cache_markupcache, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the event's master log db1043-bin.001181, end_log_pos 799761020
[19:50:44] (PS1) Dzahn: ores: missing single quotes for worker monitor [puppet] - https://gerrit.wikimedia.org/r/296065
[19:53:15] (CR) Dzahn: [C: 2] ores: missing single quotes for worker monitor [puppet] - https://gerrit.wikimedia.org/r/296065 (owner: Dzahn)
[19:57:31] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 200, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/2/3: down - Core: cr2-codfw:xe-5/0/1 (Zayo, OGYX/120003//ZYO) 36ms {#2909} [10Gbps wave]
[19:58:58] (PS2) Dzahn: puppetmaster: lint-ignore layout in test/puppetmaster [puppet] - https://gerrit.wikimedia.org/r/295987
[19:59:09] (CR) Dzahn: [C: 2] puppetmaster: lint-ignore layout in test/puppetmaster [puppet] - https://gerrit.wikimedia.org/r/295987 (owner: Dzahn)
[20:07:21] PROBLEM - MariaDB Slave Lag: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1693.35 seconds
[20:15:22] Puppet, Patch-For-Review, Scap3 (Scap3-Adoption-Phase1): move scap3 keyholder configuration to hiera to avoid proliferation of more*::deployment::source classes - https://phabricator.wikimedia.org/T130419#2407199 (mmodell) Open>Resolved a:mmodell
[20:24:26] twentyafterfour: your user account on phab2001 will be created once it uses the phabricator role class
[20:24:59] mutante: thanks, how do I add the role?
[20:25:21] ops has to do it
[20:25:23] twentyafterfour: i'll do it and show you
[20:26:00] we want it to be an exact copy of iridium, right
[20:28:45] (PS1) Dzahn: let phab2001 use same role classes as iridium [puppet] - https://gerrit.wikimedia.org/r/296067 (https://phabricator.wikimedia.org/T137928)
[20:29:24] this way it is least likely they become any different
[20:29:33] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2407232 (Whatamidoing-WMF) I've contact all of the new names in the list (Electron_Bot, Pahles, KSFT, Amalthea_(bot), Qsx753698, and Alph...
[20:29:48] (CR) jenkins-bot: [V: -1] let phab2001 use same role classes as iridium [puppet] - https://gerrit.wikimedia.org/r/296067 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[20:30:06] isn't there still supposed to be a { ?
[20:30:28] Operations, ORES, Revision-Scoring-As-A-Service, Traffic, HTTPS: https://ores.wikimedia.org redirects me to HTTP when I don't include a trailing slash - https://phabricator.wikimedia.org/T138682#2407233 (Ladsgroup)
[20:30:46] and I think it'd be better for the regex to be /^(iridium\.eqiad|phab2001\.codfw)\.wmnet$/
[20:31:11] (PS2) Dzahn: let phab2001 use same role classes as iridium [puppet] - https://gerrit.wikimedia.org/r/296067 (https://phabricator.wikimedia.org/T137928)
[20:32:04] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2407236 (Whatamidoing-WMF) @Johan, if you are planning to run an announcement about this in Tech News soon, then may I recommend a link t...
[20:39:00] PROBLEM - puppet last run on mw2122 is CRITICAL: CRITICAL: Puppet has 1 failures
[20:47:54] Amir1: i cant confirm the redirect
[20:49:23] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/295987 (owner: Dzahn)
[20:51:48] (CR) jenkins-bot: [V: -1] puppetmaster: lint-ignore layout in test/puppetmaster [puppet] - https://gerrit.wikimedia.org/r/295987 (owner: Dzahn)
[20:51:56] haha
[20:52:16] rake-jessie ABORTED in 52m 24s
[20:52:32] ABORTED is an interesting and new reason to fail
[20:53:00] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/295987 (owner: Dzahn)
[20:53:50] (PS3) Dzahn: let phab2001 use same role classes as iridium [puppet] - https://gerrit.wikimedia.org/r/296067 (https://phabricator.wikimedia.org/T137928)
[20:54:47] mutante: yes exact copy of iridium. A couple of things will fail because it's running jessie but I can fix that (mostly just needs systemd changes, I think)
[20:55:33] twentyafterfour: ok, i'm merging it then
[20:55:39] Krenair: amended regex , true
[20:57:28] twentyafterfour: yea, very likely. the service{} needs to use provider => systemd _if_ jessie.. or even better, just "provider => $::initsystem"
[20:58:03] and then a unit file in /etc/systemd/system
[20:58:34] * twentyafterfour had a patch to do that before but I think it got lost
[20:59:01] can copy it from other modules where we had to do the same
[21:00:11] https://gerrit.wikimedia.org/r/#/c/274488/
[21:01:30] (Restored) 20after4: Phabricator: support systemd as well as upstart. [puppet] - https://gerrit.wikimedia.org/r/274488 (owner: 20after4)
[21:01:43] yep, close. but i'd add "provider =>" to the service
[21:01:46] (PS3) 20after4: Phabricator: support systemd as well as upstart. [puppet] - https://gerrit.wikimedia.org/r/274488
[21:03:02] (CR) Dzahn: [C: 2] "no diff on iridium, (expected) fail on phab2001" [puppet] - https://gerrit.wikimedia.org/r/296067 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[21:03:39] RECOVERY - puppet last run on mw2122 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[21:05:10] (PS3) Dzahn: puppetmaster: lint-ignore layout in test/puppetmaster [puppet] - https://gerrit.wikimedia.org/r/295987
[21:05:56] twentyafterfour: some more things that will be coming up
[21:05:58] E: Unable to locate package python-phabricator
[21:07:23] Submodule 'arcanist' ...
[21:07:50] a bunch of dependency issues, let's see on second run though
[21:08:42] twentyafterfour: but one by one.. first of all you should be able to ssh to it now
[21:08:50] your user has been created
[21:09:03] :)
[21:10:43] E: Unable to locate package php5-mailparse
[21:10:45] hrmm
[21:11:45] mutante: these are packages that need to be updated for jessie
[21:11:52] custom packages I think
[21:13:07] ok, so i would expect it in E: Unable to locate package php5-mailparse
[21:13:10] eh., wrong paste
[21:13:16] here https://apt.wikimedia.org/wikimedia/pool/main/p/
[21:13:50] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Puppet has 7 failures
[21:14:22] hmm, maybe it was installed manually on iridium? I know phabricator got set up very much manually at first and then puppetized
[21:15:14] arrg
[21:15:59] even though we had phab-01 to test
[21:16:22] well, it's the right time to fix it then
[21:16:44] let's not do anything manual on this one
[21:17:25] manual and production is just a big nono
[21:18:01] ACKNOWLEDGEMENT - puppet last run on phab2001 is CRITICAL: CRITICAL: Puppet has 7 failures daniel_zahn https://phabricator.wikimedia.org/T137928 in progress
[21:18:48] mutante: agreed
[21:19:07] I thought I actually built the php-mailparse package, I'm not sure why it isn't in the repo
[21:19:12] chasemp might know
[21:21:47] alright, ok
[21:24:03] relatedly, arcanist and libphutil are now packaged
[21:25:00] but phabricator will use a local checkout of libphutil rather than the package. I'll tag an update for the package every time I deploy a new version to phabricator in production
[21:25:16] so that we have a version that's in sync
[21:26:30] *nod*
[21:26:52] we need /srv/phab/phabricator/src/
[21:27:25] that's deployed with scap
[21:27:27] hmmm
[21:27:27] makes a patch
[21:27:47] so puppet wants to create ./src/extensions
[21:27:55] /srv/phab should be a symlink to /srv/deployment/phabricator/deploy
[21:28:02] but ./src/ isnt there yet
[21:28:10] src/extensions is obsolete, we can kill that puppet rule
[21:28:38] ok :)
[21:28:47] stops making a patch :)
[21:28:52] I'll make one :)
[21:28:53] well, amend
[21:28:55] ok :)
[21:30:22] madhuvishy: notebook1001 is down. did you know?
[21:31:10] eh, actually, icinga claimed that but i can still connect to it.
[21:31:16] which one should I amend?
[21:31:52] twentyafterfour: i did not upload one yet for that one
[21:32:10] make a new one that drops the ./extensions/ part
[21:32:31] i had just started one that adds ./extensions/src
[21:36:38] PROBLEM - PHD should be supervising processes on phab2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (phd)
[21:37:48] Operations: notebook1001 shown as DOWN in icinga, due to firewall rules - https://phabricator.wikimedia.org/T138685#2407306 (Dzahn)
[21:38:09] Operations: notebook1001 shown as DOWN in icinga, due to firewall rules - https://phabricator.wikimedia.org/T138685#2407294 (Dzahn) https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=notebook1001
[21:38:14] (PS1) 20after4: Remove obsolete code that manages phabricator src/extensions [puppet] - https://gerrit.wikimedia.org/r/296068
[21:38:35] https://gerrit.wikimedia.org/r/#/c/296068/
[21:38:58] mutante: ^ ... also, should I silence icinga? I just got paged
[21:39:58] ACKNOWLEDGEMENT - PHD should be supervising processes on phab2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (phd) 20after4 new server, not yet live in production
[21:40:07] ACKNOWLEDGEMENT - puppet last run on phab2001 is CRITICAL: CRITICAL: Puppet has 7 failures 20after4 new server, not yet live in production
[21:40:38] ACKNOWLEDGEMENT - PHD should be supervising processes on phab2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (phd) daniel_zahn https://phabricator.wikimedia.org/T137928
[21:40:51] yes, that :)
[21:41:12] ACK is best because it means it will shut up until the next state change
[21:41:25] but we dont have to remember removing something again either
[21:41:31] nice
[21:42:49] ACKNOWLEDGEMENT - Host notebook1001 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T138685
[21:43:47] (PS2) Dzahn: Remove obsolete code that manages phabricator src/extensions [puppet] - https://gerrit.wikimedia.org/r/296068 (owner: 20after4)
[21:45:24] (CR) Dzahn: [C: 2] Remove obsolete code that manages phabricator src/extensions [puppet] - https://gerrit.wikimedia.org/r/296068 (owner: 20after4)
[21:45:33] (PS4) 20after4: Phabricator: support systemd as well as upstart. [puppet] - https://gerrit.wikimedia.org/r/274488
[21:46:29] ^ systemd support should be good to go now
[21:47:40] (PS2) 20after4: Install arcanist from apt rather than git. [puppet] - https://gerrit.wikimedia.org/r/295975
[21:48:05] (PS5) 20after4: Phabricator: support systemd as well as upstart. [puppet] - https://gerrit.wikimedia.org/r/274488
[21:48:30] ok, so the extensions thing looks fixed
[21:48:34] now i noticed this
[21:48:36] -LVS_SERVICE_IPS="10.0.5.3"
[21:48:36] +LVS_SERVICE_IPS=""
[21:48:58] runs puppet again
[21:49:48] I don't think phab2001 is going to be hooked up to lvs, at least not initially. It's going to be a hot spare at first, eventually we can get it set up as a HA cluster
[21:50:20] yea, i'm just wondering how come it had that IP and then puppet removes it later
[21:50:40] that I do not know
[21:51:31] this part is interesing.. didnt know
[21:51:38] ConditionPathExists=!...
[21:55:09] (CR) Dzahn: [C: -1] "http://puppet-compiler.wmflabs.org/3196/" [puppet] - https://gerrit.wikimedia.org/r/274488 (owner: 20after4)
[21:55:31] it gets a duplicate declaration somehow
[21:55:36] when that gets applied on iridium
[21:56:14] need to make a phone call
[21:58:44] (PS6) 20after4: Phabricator: support systemd as well as upstart. [puppet] - https://gerrit.wikimedia.org/r/274488
[22:01:10] (PS7) Dzahn: Phabricator: support systemd as well as upstart. [puppet] - https://gerrit.wikimedia.org/r/274488 (owner: 20after4)
[22:01:32] (CR) Dzahn: [C: 2] "lgtm now http://puppet-compiler.wmflabs.org/3197/iridium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/274488 (owner: 20after4)
[22:04:29] twentyafterfour: one more fixed, one more to go
[22:04:31] Notice: /Stage[main]/Phabricator::Vcs/Service[ssh-phab]/ensure: ensure changed 'stopped' to 'running'
[22:04:34] :)
[22:05:14] Execution of '/usr/sbin/service phd start --force' returned 6: Failed to start phd.service: Unit phd.service failed to load: No such file or directory.
[22:05:15] nice
[22:05:32] it's getting a bit late though. maybe we can continue on that tomorrow?
[22:05:39] already much closer
[22:05:53] in euro timezone
[22:05:54] mutante: yes that's fine with me. thanks for helping :)
[22:06:05] yw! happy to continue tomorrow
[22:27:06] Operations, ORES, Revision-Scoring-As-A-Service, Traffic, HTTPS: https://ores.wikimedia.org redirects me to HTTP when I don't include a trailing slash - https://phabricator.wikimedia.org/T138682#2407324 (Ladsgroup) This seems to be related https://github.com/pallets/flask/issues/773
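Krenair's suggested node regex (20:30) anchors both ends and escapes the dots, so it selects exactly the two FQDNs and nothing that merely contains them. A small Python check of that pattern, assuming plain hostnames with no embedded newlines. Puppet evaluates node regexes with Ruby's engine, but for inputs like these Python's `re` gives the same answers. The helper name `uses_phabricator_roles` and the three non-matching hostnames are hypothetical examples.

```python
import re

# Pattern from the log, without Puppet's /.../ delimiters.
NODE_RE = re.compile(r"^(iridium\.eqiad|phab2001\.codfw)\.wmnet$")


def uses_phabricator_roles(fqdn: str) -> bool:
    """Hypothetical helper: would the node regex select this host?"""
    return NODE_RE.search(fqdn) is not None


if __name__ == "__main__":
    for host in [
        "iridium.eqiad.wmnet",
        "phab2001.codfw.wmnet",
        "phab2001.eqiad.wmnet",     # hypothetical: wrong datacenter
        "iridiumXeqiad.wmnet",      # hypothetical: an unescaped '.' would match this
        "old-iridium.eqiad.wmnet",  # hypothetical: rejected by the ^ anchor
    ]:
        print(host, uses_phabricator_roles(host))
```

Only the first two hosts print `True`. Dropping either anchor or either backslash would let one of the hypothetical hosts through.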