[00:01:15] (CR) Reedy: [C: 1] MWMultiversion: Move CLI entry point to class and out of MWVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/331930 (owner: Chad)
[00:02:13] (CR) Dzahn: [C: 1] Use LE for wikitech [puppet] - https://gerrit.wikimedia.org/r/331638 (https://phabricator.wikimedia.org/T154913) (owner: Alex Monk)
[00:02:30] (Abandoned) Dzahn: DNS: Add DNS entries for ms-fe200[5-8] Bug:T152612 [dns] - https://gerrit.wikimedia.org/r/325856 (owner: Papaul)
[00:03:54] (PS3) Dzahn: mirrors: Indent @ssl_settings in NGINX configuration [puppet] - https://gerrit.wikimedia.org/r/329743 (owner: Tim Landscheidt)
[00:05:18] (CR) Chad: [C: 2] MWMultiversion: Move CLI entry point to class and out of MWVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/331930 (owner: Chad)
[00:05:33] (CR) Dzahn: [C: 2] mirrors: Indent @ssl_settings in NGINX configuration [puppet] - https://gerrit.wikimedia.org/r/329743 (owner: Tim Landscheidt)
[00:06:29] (Merged) jenkins-bot: MWMultiversion: Move CLI entry point to class and out of MWVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/331930 (owner: Chad)
[00:11:09] (PS2) Brion VIBBER: Add 'webp' package to ImageMagick role [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397)
[00:11:29] (CR) jenkins-bot: MWMultiversion: Move CLI entry point to class and out of MWVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/331930 (owner: Chad)
[00:12:42] !log demon@tin Synchronized multiversion: Clean up cli entry point (duration: 00m 54s)
[00:12:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:18] (CR) Brion VIBBER: "Note I have not tested the actual puppet config but I think I got syntax correct. :)" [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: Brion VIBBER)
[00:18:55] brion: Protip, puppet compiler!
:)
[00:19:23] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/build
[00:19:32] Give it a change #, a target host, boom
[00:19:45] (fwiw, syntax is fine)
[00:19:50] ooooh
[00:20:15] * brion *bookmarks*
[00:22:58] (CR) Dzahn: "you'll need a line like" [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: Brion VIBBER)
[00:23:38] (CR) Brion VIBBER: "Ah whoops! I'll fix. Thanks :)" [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: Brion VIBBER)
[00:25:37] apergos: So, I landed the CLI side of multiversion cleanup w/o incident. I wrote up https://gerrit.wikimedia.org/r/#/c/331931/ for your runallthescriptsalltheplacesimserious script :)
[00:25:47] Can land whenever you want
[00:25:58] (PS3) Brion VIBBER: Add 'webp' package to ImageMagick role [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397)
[00:26:26] PROBLEM - puppet last run on dbproxy1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:26:45] heh just noticed my commits are still on my personal email addr
[00:26:57] volunteer 4 life
[00:27:08] if that email redirector service ever goes out of business i'm gonna be so effed
[00:27:10] (PS1) Chad: Remove getMediaWikiCli() entry point, unused [mediawiki-config] - https://gerrit.wikimedia.org/r/331933
[00:28:10] holy shitballs i have had that pobox.com address for 20 years
[00:28:13] "Member since January 1997"
[00:34:22] (PS2) ArielGlenn: MWMultiVersion: Use proper (new) cli entry point [puppet] - https://gerrit.wikimedia.org/r/331931 (owner: Chad)
[00:35:50] (CR) ArielGlenn: [C: 2] MWMultiVersion: Use proper (new) cli entry point [puppet] - https://gerrit.wikimedia.org/r/331931 (owner: Chad)
[00:37:01] ostriches: thanks, done
[00:37:20] brion: you _just_ noticed that? I've been snickering about that for...
well since you left and came back
[00:40:10] apergos: i vaguely thought i'd set it different on my work repos, must've forgot ;)
[00:40:23] heh
[00:42:56] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[00:43:56] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2818907 keys, up 73 days 16 hours - replication_delay is 47
[00:54:26] RECOVERY - puppet last run on dbproxy1008 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[00:59:06] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:59:46] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:00:00] Operations, ops-codfw: es2019 crashed again - https://phabricator.wikimedia.org/T149526#2937391 (Papaul) Open>Resolved This task has been open for almost 2 months. Closing it. It can be reopened anytime we have the issue again.
[01:00:31] Operations, Ops-Access-Requests: Requesting access to analytics-privatedata-users for anomie - https://phabricator.wikimedia.org/T155143#2937393 (dr0ptp4kt) Approved.
[01:01:56] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[01:19:26] (CR) Krinkle: [C: 1] "It seems Vagrant still has a variation of this method as well. Would be good to phase it out there as well (or first, to set the example)." [mediawiki-config] - https://gerrit.wikimedia.org/r/331933 (owner: Chad)
[01:20:36] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[01:21:02] (CR) Krinkle: [C: 1] Grant access to analytics-privatedata-users to demon [puppet] - https://gerrit.wikimedia.org/r/331925 (https://phabricator.wikimedia.org/T155198) (owner: Chad)
[01:21:34] ostriches: https://gerrit.wikimedia.org/r/#/c/330709/ - aye, not yet rolled out
[01:21:40] Would be nice to get out today
[01:27:46] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[01:30:22] (PS4) Reedy: Optionally filter private wiki results in mwgrep [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581)
[01:30:28] !log bsitzmann@tin Starting deploy [trending-edits/deploy@cf388a9]: Update trending-edits to 421fa63
[01:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:30:53] (CR) jerkins-bot: [V: -1] Optionally filter private wiki results in mwgrep [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy)
[01:32:21] !log bsitzmann@tin Finished deploy [trending-edits/deploy@cf388a9]: Update trending-edits to 421fa63 (duration: 01m 53s)
[01:32:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:01] ostriches: https://en.wikipedia.org/w/wiki@phtml?title=foo&action=info - Fun url that works :P
[01:36:03] Also - https://en.wikipedia.org/w/wiki/phtml?title=foo&action=info
[01:38:28] (CR) Reedy: [C: -1] "Needs rebasing" [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy)
[01:47:25] Krinkle: probably apache rewriting those phtml ones
[01:48:25] It is
[01:48:33] Krinkle noticed no escaping on a .
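For context, the effect of an unescaped period can be sketched with Python's re module instead of Apache's rewrite syntax. The two patterns below are hypothetical stand-ins, since the actual rewrite rule is not quoted in this log; only the three URL paths come from the messages above.

```python
import re

# Hypothetical patterns: the real Apache rewrite rule is not shown in the log.
# An unescaped "." matches any single character, not just a literal period.
unescaped = re.compile(r"^/w/wiki.phtml$")
escaped = re.compile(r"^/w/wiki\.phtml$")

# Paths taken from the URLs mentioned in the chat.
paths = ["/w/wiki.phtml", "/w/wiki@phtml", "/w/wiki/phtml"]

unescaped_hits = [p for p in paths if unescaped.match(p)]
escaped_hits = [p for p in paths if escaped.match(p)]

print(unescaped_hits)  # all three paths match
print(escaped_hits)    # only the literal wiki.phtml path matches
```

Under these assumptions, escaping the period is what stops variants such as wiki@phtml from being treated like wiki.phtml.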
[01:48:37] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[01:55:23] (PS1) Reedy: Escape period in wiki.phtml rewrites [puppet] - https://gerrit.wikimedia.org/r/331944
[01:58:06] (PS2) Reedy: Escape period in wiki.phtml rewrites [puppet] - https://gerrit.wikimedia.org/r/331944
[01:58:30] (CR) Reedy: "Prevents these:" [puppet] - https://gerrit.wikimedia.org/r/331944 (owner: Reedy)
[02:20:42] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.7) (duration: 07m 11s)
[02:20:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:25:57] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Jan 13 02:25:57 UTC 2017 (duration 5m 16s)
[02:26:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:32] Operations, Wikimedia-General-or-Unknown: Increase $wgHTTPImportTimeout to a higher value on WMF wikis - https://phabricator.wikimedia.org/T155209#2937514 (TTO)
[03:06:03] (Draft2) TTO: Increase $wgHTTPImportTimeout to 50 seconds [mediawiki-config] - https://gerrit.wikimedia.org/r/331946 (https://phabricator.wikimedia.org/T155209)
[03:25:26] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[03:26:51] (CR) Chad: [C: 1] Escape period in wiki.phtml rewrites [puppet] - https://gerrit.wikimedia.org/r/331944 (owner: Reedy)
[03:37:12] (CR) Chad: "Will do, but that's its own code and not ours, so not a dependency :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/331933 (owner: Chad)
[03:45:11] (CR) Chad: "Done in I53fbc8acef6aa4dbdf81a62173d93e41f6af9150" [mediawiki-config] - https://gerrit.wikimedia.org/r/331933 (owner: Chad)
[03:53:26] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[04:00:46] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:09:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [24.0]
[04:21:46] (CR) Chad: [C: -2] "I don't *think* this is used anywhere anymore, but want to hold this for a day when I can watch logs closely for a few hours.
Waiting til " [mediawiki-config] - https://gerrit.wikimedia.org/r/331933 (owner: Chad)
[04:29:47] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[04:30:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[04:39:38] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to hive/webrequest data for demon - https://phabricator.wikimedia.org/T155198#2937620 (demon)
[04:46:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[04:52:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[04:55:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[04:56:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[04:57:14] (CR) Reedy: "I guess this needs .8, but won't harm going out before in prep" [mediawiki-config] - https://gerrit.wikimedia.org/r/331946 (https://phabricator.wikimedia.org/T155209) (owner: TTO)
[04:57:40] looking at labstore1003 flapping
[05:13:22] seems like rsync from dumps is happening - should be okay
[05:26:17] (CR) Chad: "Well if the files in debian/ are GPLv2+, why do we list that and apache for it? I figured:" [debs/gerrit] - https://gerrit.wikimedia.org/r/331873 (owner: Paladox)
[05:28:06] (CR) Chad: "Everyone on the subscription list: can you please indicate that you're ok with relicensing this whole repo from GPL to Apache? Thanks!" [debs/gerrit] - https://gerrit.wikimedia.org/r/331873 (owner: Paladox)
[05:29:35] (CR) Chad: [C: 1] "Is the import limit new to .8?
But yeah, can go out whenever" [mediawiki-config] - https://gerrit.wikimedia.org/r/331946 (https://phabricator.wikimedia.org/T155209) (owner: TTO)
[05:32:56] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[06:03:56] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 xe-0/0/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-314534, 29ms) {#11375} [10Gbps wave]
[06:03:56] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 xe-5/2/1: down - Core: cr1-eqord:xe-0/0/0 (Telia, IC-314534, 24ms) {#10694} [10Gbps wave]
[06:29:56] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:37:56] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[06:37:56] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[06:39:56] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[06:47:46] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:50:06] PROBLEM - Check HHVM threads for leakage on mw1260 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:54:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:06:46] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:06:56] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[07:09:56] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:16:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:21:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:23:16] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[07:40:56] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]
[08:03:56] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[08:04:56] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[08:07:06] RECOVERY - Check HHVM threads for leakage on mw1260 is OK: OK
[08:25:44] (PS1) Alexandros Kosiaris: zotero: Actually limit as not rss [puppet] - https://gerrit.wikimedia.org/r/331959
[08:25:58] (CR) Alexandros Kosiaris: [V: 2 C: 2] zotero: Actually limit as not rss [puppet] - https://gerrit.wikimedia.org/r/331959 (owner: Alexandros Kosiaris)
[08:32:56] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[08:47:24] (PS1) Juniorsys: base module linting changes [puppet] - https://gerrit.wikimedia.org/r/331960
[08:51:26] PROBLEM - puppet last run on mc1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:19:26] RECOVERY - puppet last run on mc1035 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[09:47:56] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0
[09:50:56] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]
[09:51:56] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0
[10:05:31] Operations, Wikimedia-General-or-Unknown: Increase $wgHTTPImportTimeout to a higher value on WMF wikis - https://phabricator.wikimedia.org/T155209#2937859 (TTO) Not sure where @gerritbot went, but there is a #patch-for-review at https://gerrit.wikimedia.org/r/#/c/331946/
[10:51:06] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (Zotero alive) is CRITICAL: Test Zotero alive returned the unexpected status
404 (expecting: 200)
[10:52:06] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[10:58:26] (CR) Paladox: "Oh user now. * is Apache 2.0 and Debian/ is gpl 2.0+." [debs/gerrit] - https://gerrit.wikimedia.org/r/331873 (owner: Paladox)
[10:59:12] (PS31) Paladox: Fix debian's lintian test [debs/gerrit] - https://gerrit.wikimedia.org/r/331873
[11:00:50] (CR) Paladox: "@Chad I fixed the license now so that it is gpl 2.0+ but * is Apache 2.0." [debs/gerrit] - https://gerrit.wikimedia.org/r/331873 (owner: Paladox)
[11:02:03] Operations, Labs: register - https://phabricator.wikimedia.org/T155234#2938071 (Ramalepe)
[11:04:18] Operations, OTRS, Wikimedia-Incident: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2938084 (akosiaris) The problem has been identified and an upstream bug has been filed https://bugs.otrs.org/show_bug.cgi?id=12536
[11:12:46] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK
[11:18:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[11:22:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[11:42:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [24.0]
[11:43:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[11:46:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[11:50:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[11:58:01] (PS7) Hashar: Use rake tasks to run modules spec [puppet] - https://gerrit.wikimedia.org/r/307223
[11:58:38] (PS2) Alexandros Kosiaris: base module linting changes [puppet] - https://gerrit.wikimedia.org/r/331960 (owner: Juniorsys)
[11:59:09] (CR) jerkins-bot: [V: -1] Use rake tasks to run modules spec [puppet] - https://gerrit.wikimedia.org/r/307223 (owner: Hashar)
[11:59:36] (CR) Hashar: "I have dropped spec:all that had all the spec:* as prerequisites. It does not quite works since whenever a module fail rake stops." [puppet] - https://gerrit.wikimedia.org/r/307223 (owner: Hashar)
[12:00:59] (PS8) Hashar: Use rake tasks to run modules spec [puppet] - https://gerrit.wikimedia.org/r/307223
[12:01:30] (CR) Hashar: "rubocop hates me. I have dropped a redundant return in spec_modules function" [puppet] - https://gerrit.wikimedia.org/r/307223 (owner: Hashar)
[12:02:32] (CR) Hashar: [C: -1] "That is a transient commit until all the spec fixes land in puppet.git" [puppet] - https://gerrit.wikimedia.org/r/331850 (owner: Hashar)
[12:05:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[12:07:11] (CR) Alexandros Kosiaris: [C: 2] base module linting changes [puppet] - https://gerrit.wikimedia.org/r/331960 (owner: Juniorsys)
[12:07:53] Operations, Labs: register - https://phabricator.wikimedia.org/T155234#2938146 (zhuyifei1999) Open>Invalid * https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Access for Tool Labs shell account * https://wikitech.wikimedia.org/wiki/Help:Getting_Started for WMFLabs shell account * https://wikitec...
[12:11:29] Operations, IDS-extension, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2938148 (Shoichi) >>! In T148693#2936504, @Arthur2e5 wrote: > Yes. OK. I make a little change to use 't' as means translation: ``` //t this is a fun...
[12:12:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[12:15:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [24.0]
[12:18:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[12:29:46] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[12:39:56] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[12:40:56] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2826234 keys, up 74 days 4 hours - replication_delay is 0
[12:52:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[13:04:56] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[13:09:46] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:24:36] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[13:32:56] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[13:38:46] RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[13:52:36] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[15:07:41] I got 4 successive swift failure running importImages.php, I've noted the filenames here: https://phabricator.wikimedia.org/T152938#2938378
[15:29:46] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:33:56] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:49:04] (PS1) Dereckson: Enable GuidedTour on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/331986 (https://phabricator.wikimedia.org/T152827)
[15:57:46] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[16:01:56] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[16:07:39] Operations, Commons, TimedMediaHandler-Transcode, Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2938415 (Paladox) I wonder is this http://stackoverflow.com/questions/30801590/ffmpeg-performance-issue-when-...
[16:16:26] (PS2) Tim Landscheidt: trebuchet: Fully qualify hostname [puppet] - https://gerrit.wikimedia.org/r/328457 (https://phabricator.wikimedia.org/T153608)
[16:19:18] elukey: hey, if you don't mind can you run this query in hadoop?
https://phabricator.wikimedia.org/T154434#2911597
[16:19:28] for Persian Wikipedia
[16:20:13] I have access to stat1003 which doesn't have hadoop access :(
[16:31:01] (Draft1) Paladox: Update gerrit.init to match upstream stable-2.13 [debs/gerrit] - https://gerrit.wikimedia.org/r/331988
[16:31:03] (Draft2) Paladox: Update gerrit.init to match upstream stable-2.13 [debs/gerrit] - https://gerrit.wikimedia.org/r/331988
[16:49:13] Operations, Commons, TimedMediaHandler-Transcode, Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2938481 (zhuyifei1999) >>! In T153488#2938415, @Paladox wrote: > I wonder is this http://stackoverflow.com/qu...
[16:54:46] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:58:03] (CR) Dzahn: [C: 1] Add 'webp' package to ImageMagick role [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: Brion VIBBER)
[17:02:46] RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[17:02:54] brion: /win 3
[17:03:00] oops: p
[17:03:15] windows 3.1.1 for workgroups
[17:06:41] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/5090/mw1298.eqiad.wmnet/1" [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: Brion VIBBER)
[17:12:06] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[17:12:56] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2843025 keys, up 74 days 8 hours - replication_delay is 0
[17:17:02] (PS2) Dzahn: restbase: add wikimania2018 [puppet] - https://gerrit.wikimedia.org/r/331523
(https://phabricator.wikimedia.org/T155038)
[17:21:14] (CR) DatGuy: [C: 1] restbase: add wikimania2018 [puppet] - https://gerrit.wikimedia.org/r/331523 (https://phabricator.wikimedia.org/T155038) (owner: Dzahn)
[17:42:16] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:43:16] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy
[17:44:32] (CR) Chad: [C: 2] "Not planning to use this in production but sure ok" [debs/gerrit] - https://gerrit.wikimedia.org/r/331988 (owner: Paladox)
[17:46:16] PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:55:46] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[17:55:46] PROBLEM - puppet last run on mc1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:59:44] (PS2) Tim Landscheidt: staging: Fully qualify hostname [puppet] - https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608)
[18:02:17] n 26
[18:02:25] Operations, Commons, TimedMediaHandler-Transcode, Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2938572 (Paladox) Should we give --speed a go?
[18:05:17] (PS2) Tim Landscheidt: deployment-prep: Fully qualify hostnames [puppet] - https://gerrit.wikimedia.org/r/328455 (https://phabricator.wikimedia.org/T153608)
[18:10:46] Amir1: let's discuss it on #wikimedia-analytics
[18:10:51] more people will help in there
[18:11:24] sure
[18:15:16] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[18:22:20] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to hive/webrequest data for demon - https://phabricator.wikimedia.org/T155198#2938582 (greg) Chad's manager approval.
[18:22:36] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[18:22:56] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 612 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2845120 keys, up 74 days 10 hours - replication_delay is 612
[18:23:46] RECOVERY - puppet last run on mc1018 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[18:24:56] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2843946 keys, up 74 days 10 hours - replication_delay is 0
[18:39:26] Operations, netops: cr2-esams<->cr2-eqiad link down - https://phabricator.wikimedia.org/T154952#2938588 (faidon) Open>Resolved a:faidon This apparently was related to a maintenance that we were never notified of.
[18:40:15] Operations, netops: cr2-esams<->cr2-eqiad link flaps - https://phabricator.wikimedia.org/T154577#2938591 (faidon) Flaps seem to have reappeared again: ``` Jan 9 23:05:53 re0.cr2-esams mib2d[1797]: SNMP_TRAP_LINK_DOWN: ifIndex 537, ifAdminStatus up(1), ifOperStatus down(2), ifName xe-0/1/3 Jan 10 00:44:...
[19:49:57] (Draft1) Paladox: Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - https://gerrit.wikimedia.org/r/331998
[19:50:01] (Draft2) Paladox: Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - https://gerrit.wikimedia.org/r/331998
[19:50:39] (CR) Paladox: "This is safe as the user can only run sudo service gerrit* *" [puppet] - https://gerrit.wikimedia.org/r/331998 (owner: Paladox)
[19:50:54] (CR) jerkins-bot: [V: -1] Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - https://gerrit.wikimedia.org/r/331998 (owner: Paladox)
[19:51:42] (PS3) Paladox: Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - https://gerrit.wikimedia.org/r/331998
[19:52:36] Operations, Commons, TimedMediaHandler-Transcode, Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2938801 (brion) If I'm reading the graph data correctly, the queue is now caught up; I'm seeing actual transc...
[20:02:09] Operations, Traffic, Patch-For-Review: convert librenms.wikimedia.org from GS to LE cert (expires: 2017-02-11) - https://phabricator.wikimedia.org/T154919#2938832 (RobH) Open>Resolved
[20:03:06] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:03:14] (PS4) Paladox: Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - https://gerrit.wikimedia.org/r/331998
[20:04:54] Operations, ops-ulsfo, netops: lvs4002 power supply failure - https://phabricator.wikimedia.org/T151273#2938835 (RobH) I've not gone on-site to do this yet, it seemed lower priority than my tasks at the time. I'll plan to drive into ULSFO during the week next week to swap power supplies around and o...
[20:15:35] (PS5) Paladox: Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - https://gerrit.wikimedia.org/r/331998
[20:19:44] (CR) Paladox: "tested and sudo service gerrit works on the user gerrit2 user :)" [puppet] - https://gerrit.wikimedia.org/r/331998 (owner: Paladox)
[20:19:49] (CR) Paladox: [C: 1] Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - https://gerrit.wikimedia.org/r/331998 (owner: Paladox)
[20:29:06] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:32:06] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[20:54:41] (CR) Paladox: [C: 1] "Only problem i have identified is stopping it won't work. But starting it will. So we will need to manually go to the dir to stop it." [puppet] - https://gerrit.wikimedia.org/r/331998 (owner: Paladox)
[20:58:06] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[21:01:08] (PS32) Paladox: Fix debian's lintian test [debs/gerrit] - https://gerrit.wikimedia.org/r/331873
[21:02:23] (PS33) Paladox: Fix debian's lintian test [debs/gerrit] - https://gerrit.wikimedia.org/r/331873
[21:03:20] (PS34) Paladox: Fix debian's lintian test [debs/gerrit] - https://gerrit.wikimedia.org/r/331873
[21:07:17] (PS35) Paladox: Fix debian's lintian test [debs/gerrit] - https://gerrit.wikimedia.org/r/331873
[21:11:40] (PS36) Paladox: Fix debian's lintian test [debs/gerrit] - https://gerrit.wikimedia.org/r/331873
[21:12:55] Operations, Mail, Patch-For-Review: mx1001/2001 - Exim SMTP - Certificate expires Sep 22 2016 - https://phabricator.wikimedia.org/T144568#2938864 (RobH)
[21:13:32] Operations: repalce non-lvm paritioning with lvm - https://phabricator.wikimedia.org/T129287#2938865 (RobH)
05Open>03declined [21:14:27] (03PS37) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:18:32] (03PS38) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:21:56] PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:25:01] (03PS39) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:25:06] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:06] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:39] (03PS40) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:33:18] (03PS41) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:35:06] RECOVERY - check_puppetrun on backup4001 is OK: OK: Puppet is currently enabled, last run 170 seconds ago with 0 failures [21:35:47] (03PS42) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:37:46] (03PS43) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:40:16] PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: Puppet has 1 failures [21:40:32] (03PS44) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 [21:45:16] RECOVERY - check_puppetrun on db1025 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:45:19] (03CR) 10Paladox: "Thanks :)" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331988 (owner: 10Paladox) [21:47:00] (03CR) 10Paladox: "@Chad this now passes :)" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 (owner: 10Paladox) [21:50:56] RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [22:11:56] 
PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:39:56] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:55:30] (03CR) 10Hashar: [C: 031] deployment-prep: Fully qualify hostnames [puppet] - 10https://gerrit.wikimedia.org/r/328455 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [22:56:38] !log delete labs instance data older than 60d from graphite[21]001, low disk space [22:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log