[00:39:28] (PS12) Dzahn: install_server: split out reprepro to module aptrepo [puppet] - https://gerrit.wikimedia.org/r/284763 (https://phabricator.wikimedia.org/T132757)
[00:45:53] (PS1) Dzahn: zookeeper: fix lint warnings [puppet/zookeeper] - https://gerrit.wikimedia.org/r/289979
[00:55:29] (PS1) Dzahn: kafka: fix lint warnings [puppet/kafka] - https://gerrit.wikimedia.org/r/289980
[00:59:49] (PS1) Dzahn: memcached: move logrotate,snapshot files into module [puppet] - https://gerrit.wikimedia.org/r/289981
[01:22:27] PROBLEM - puppet last run on mw2150 is CRITICAL: CRITICAL: puppet fail
[01:51:36] RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[02:06:08] PROBLEM - Last backup of the tools filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-tools was exit-code
[02:23:29] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 10m 22s)
[02:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:32:13] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat May 21 02:32:13 UTC 2016 (duration 8m 44s)
[02:32:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:06:16] PROBLEM - Last backup of the others filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-others was exit-code
[03:12:08] PROBLEM - BGP status on cr1-eqord is CRITICAL: BGP CRITICAL - AS6939/IPv4: Connect, AS6939/IPv6: Connect
[03:14:08] RECOVERY - BGP status on cr1-eqord is OK: BGP OK - up: 43, down: 2, shutdown: 0
[03:22:16] PROBLEM - BGP status on cr1-eqord is CRITICAL: BGP CRITICAL - AS6939/IPv4: Connect, AS6939/IPv6: Active
[03:26:16] RECOVERY - BGP status on cr1-eqord is OK: BGP OK - up: 43, down: 2, shutdown: 0
[06:05:27] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail
[06:30:07] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 187 bytes in 0.027 second response time
[06:31:16] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:16] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail
[06:31:26] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:26] PROBLEM - puppet last run on mc2005 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:36] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:31:57] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:47] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:17] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:42:13] Can anybody abandon https://gerrit.wikimedia.org/r/#/c/289656/ as a duplicate of https://gerrit.wikimedia.org/r/#/c/289773/ and abandon https://gerrit.wikimedia.org/r/#/c/289645/ as outdated (I'll merge 289773 and 289645 into one commit, "Permissions changes on fawiki")? Thanks.
[06:43:49] and please abandon 289773 as outdated too, since I'm merging it. Thanks.
[06:55:57] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:57:16] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:57:18] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:27] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:57:27] RECOVERY - puppet last run on mc2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:46] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:17] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:26:18] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.028 second response time
[09:34:48] Operations, Performance-Team, Patch-For-Review: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963#2314781 (elukey) Some mc1009 metrics from today: 1) memory consumption is still growing very slowly, it will probably reach the 80G limit in few days. Good...
[10:07:30] Operations, Mail, OTRS: administrative rights for GLAM@ - https://phabricator.wikimedia.org/T135874#2314807 (Peachey88)
[10:18:09] Operations, Mail, OTRS: Administrative rights request for GLAM queue on OTRS - https://phabricator.wikimedia.org/T135874#2314822 (Peachey88)
[10:48:06] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[10:48:06] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:56:17] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:56:17] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:43:26] (CR) Ladsgroup: [C: +1] "I just merged this: https://github.com/wiki-ai/ores-wikimedia-config/pull/59 so we are good now." [puppet] - https://gerrit.wikimedia.org/r/288618 (owner: Alexandros Kosiaris)
[12:46:26] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:12:27] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:38:44] Operations, Mail, OTRS: Administrative rights request for GLAM queue on OTRS - https://phabricator.wikimedia.org/T135874#2314980 (Rjd0060) Open>Resolved a:Rjd0060 Hi- Alex (or his manager) can request access for an OTRS account by emailing us at otrs-admins@lists.wikimedia.org Ryan OTRS...
[14:25:16] (CR) Luke081515: [C: +1] Botadmin group should not be allowed to assign or remove user groups in FA WP [mediawiki-config] - https://gerrit.wikimedia.org/r/289773 (https://phabricator.wikimedia.org/T135774) (owner: Huji)
[14:28:04] (CR) Luke081515: [C: -1] "We can abandon this patch, as https://gerrit.wikimedia.org/r/#/c/289773/ does the same. No need for this patch." [mediawiki-config] - https://gerrit.wikimedia.org/r/289656 (https://phabricator.wikimedia.org/T135736) (owner: Huji)
[14:29:35] Operations, Commons, media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#2315032 (Dvorapa) Is there any progress on this? Or is it stuck on any subprocess/review?
[14:32:11] (PS1) Urbanecm: Adjust groups permissions on fa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736)
[14:33:10] (CR) Luke081515: [C: -1] "Correction: You just removed the permission for botadmin to add users to the groups. The botadmins can still remove users from that group" [mediawiki-config] - https://gerrit.wikimedia.org/r/289773 (https://phabricator.wikimedia.org/T135774) (owner: Huji)
[14:33:15] (CR) Urbanecm: "This is a merge of Huji's commits." [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736) (owner: Urbanecm)
[14:34:02] Urbanecm: currently here?
[14:35:09] (PS2) Urbanecm: Adjust groups permissions on fa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736)
[14:39:01] (CR) Luke081515: [C: -1] "Sorry, but I don't think we should deploy this:" [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736) (owner: Urbanecm)
[14:39:16] hm, gerrit has no ** ?
[14:41:13] (CR) Urbanecm: "I wanted to merge all commits into one. I requested for abandon of the others (https://gerrit.wikimedia.org/r/#/c/289773/, https://gerrit." [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736) (owner: Urbanecm)
[14:41:53] Luke081515 I'm here :)
[14:41:55] (CR) Luke081515: "then we should include an improved patch set of https://gerrit.wikimedia.org/r/#/c/289773/1" [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736) (owner: Urbanecm)
[14:42:13] Urbanecm: ok, I think your merge is ok, but we should include the third patch too ;)
[14:44:20] working on it...
[14:44:34] ok, no hurry :)
[14:46:06] (PS3) Urbanecm: Adjust groups permissions on fa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736)
[14:47:00] Done as PS3
[14:47:06] PROBLEM - puppet last run on mw2199 is CRITICAL: CRITICAL: puppet fail
[14:48:16] Urbanecm: I think https://gerrit.wikimedia.org/r/#/c/289773/1 isn't included yet, is it? If so, botadmins should not be able to change any groups
[14:50:39] (PS4) Urbanecm: Adjust groups permissions on fa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736)
[14:51:52] Urbanecm: two things to do: the task about removing the group modifications says botadmins should not remove users from any group either (you only removed the eliminator group), and: your commit message doesn't match your patch ;)
[14:51:56] nice patch number btw
[14:52:52] (PS5) Urbanecm: Adjust groups permissions on fa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135736)
[14:53:46] Urbanecm: looks fine now :)
[14:54:20] No, it isn't ok. I removed the fawikinews section :D
[14:54:22] (PS6) Luke081515: Adjust groups permissions on fa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135774) (owner: Urbanecm)
[14:54:25] oh
[14:54:45] (I just corrected one task number and a bit more style) ;)
[14:55:36] Urbanecm: are you sure? I can't see that you removed any fawikinews stuff
[14:56:34] Luke081515: I didn't see "Skipped ... common lines"; it looks ok now.
[14:56:51] ok, then I can +1 now :)
[14:57:11] (CR) Luke081515: [C: +1] "Looks fine now" [mediawiki-config] - https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135774) (owner: Urbanecm)
[14:57:44] Thanks. What will happen with the other commits? Should I ask Huji to abandon them?
[14:58:07] Urbanecm: him, or someone with +2 access, I think they can do that too.
[14:58:40] I will do the phab stuff
[14:59:25] (CR) Luke081515: "Merged into https://gerrit.wikimedia.org/r/#/c/290000/, we can abandon this." [mediawiki-config] - https://gerrit.wikimedia.org/r/289645 (https://phabricator.wikimedia.org/T135725) (owner: Huji)
[14:59:55] (CR) Luke081515: "Finally merged into https://gerrit.wikimedia.org/r/#/c/290000/, we can abandon this." [mediawiki-config] - https://gerrit.wikimedia.org/r/289656 (https://phabricator.wikimedia.org/T135736) (owner: Huji)
[15:00:12] (CR) Luke081515: "Merged into https://gerrit.wikimedia.org/r/#/c/290000/, we can abandon this." [mediawiki-config] - https://gerrit.wikimedia.org/r/289773 (https://phabricator.wikimedia.org/T135774) (owner: Huji)
[15:00:48] Ok. BTW, I can't schedule it, there is no timetable. Will it be archived soon (I want to use May 23 Morning SWAT)?
[15:01:32] maybe ask the guys in #wikimedia-releng? They're normally the ones creating the table
[15:02:33] Urbanecm: concerning that whitelist patch
[15:02:45] the gerrit bot will run more unit tests, and give you Verified +2
[15:02:56] so if you make mistakes, you can find them earlier
[15:04:08] Urbanecm: to give you an example: I'm whitelisted, you aren't yet. If you take a look at https://gerrit.wikimedia.org/r/#/c/290000/, the bot only ran basic tests for you. Later, when I uploaded a patch set, it ran a full test
[15:04:55] If it'll help me, I have no idea why I should oppose this proposal. Why doesn't the gerrit bot run full tests for everybody? Performance?
[15:05:40] Urbanecm: no, security. Unit tests can do damage if someone acting in bad faith can trigger them
[15:06:06] Urbanecm: and you can comment "recheck" on other users' patches to run full tests on them, as I did on the whitelist patch
[15:07:12] Luke081515: How can unit tests do damage? But thanks :)
[15:07:42] Urbanecm: I'm not sure, that was just the reason they told me ;). You can ask hashar in -releng, if he is there, for the detailed reason; he is the expert there
[15:08:03] Urbanecm: but you can take a look at https://integration.wikimedia.org/zuul/, there you can see the differences in how a change is checked :)
[15:08:33] Urbanecm: maybe it was performance instead. It was some time ago that they told me the reason... ;)
[15:08:41] Thanks.
[15:11:43] I'm creating a table.
[15:11:52] Thanks Dereckson.
[15:13:46] RECOVERY - puppet last run on mw2199 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[15:19:26] MatmaRex: Urbanecm: enjoy https://wikitech.wikimedia.org/wiki/Deployments#Week_of_May_23th
[15:19:38] Thanks.
[15:22:02] You're welcome.
[15:39:41] Dereckson: do you have the permission to abandon mw-config patches?
[15:44:12] (Abandoned) Huji: Botadmin group should not be allowed to assign or remove user groups in FA WP [mediawiki-config] - https://gerrit.wikimedia.org/r/289773 (https://phabricator.wikimedia.org/T135774) (owner: Huji)
[15:44:40] (Abandoned) Huji: Add 'deletedhistory' right to "eliminator" user group on fa.wikipedia. [mediawiki-config] - https://gerrit.wikimedia.org/r/289645 (https://phabricator.wikimedia.org/T135725) (owner: Huji)
[15:44:52] (Abandoned) Huji: Adjust groups permissions on fa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/289656 (https://phabricator.wikimedia.org/T135736) (owner: Huji)
[15:46:31] Urbanecm and Luke081515: I think hashar plans on letting everyone test fully without having to be whitelisted.
[15:46:38] I think he will use nodepool
[15:46:44] Since it can be isolated.
[15:46:59] But for now we have to whitelist everyone
[15:47:08] ok
[15:47:59] ok
[15:50:07] PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:55:26] Luke081515: I have, which ones?
[16:00:21] Dereckson: see above, they have now been abandoned by the author. But thanks :)
[16:01:07] Did he abandon only the duplicates, or all of them?
[16:01:58] only the dupes. He is not the owner of the change where all those patches are included^^
[16:16:08] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[17:01:57] Operations, Commons, media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#2315212 (Aklapper) @Dvorapa: See "Blocked By" above. (For the records, upstream 2.40.11 had some regressions. Latest releases can be found here: http://ftp.gnome.org/pub/GNOME/sources/lib...
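Luke081515's 14:33 review comment on 289773 turns on the fact that MediaWiki keeps two independent settings, $wgAddGroups and $wgRemoveGroups. A minimal Python model shows why dropping only the "add" rights leaves the "remove" rights in place. The group lists and the helper names can_change and drop_all are hypothetical illustrations, not the real fawiki configuration.

```python
from typing import Dict, Set

# Toy model of MediaWiki's two independent settings, $wgAddGroups and
# $wgRemoveGroups: each maps an actor's group to the groups it may grant
# (or revoke). The group names are hypothetical, not fawiki's real config.
Rights = Dict[str, Set[str]]


def can_change(rights: Rights, actor: str, target: str) -> bool:
    """True if members of `actor` may grant (or revoke) `target`."""
    return target in rights.get(actor, set())


def drop_all(rights: Rights, actor: str) -> Rights:
    """Return a copy of `rights` in which `actor` may change no groups."""
    new = {group: set(targets) for group, targets in rights.items()}
    new[actor] = set()
    return new


add_groups: Rights = {"botadmin": {"bot", "eliminator"}}
remove_groups: Rights = {"botadmin": {"bot", "eliminator"}}

# A patch that edits only the "add" side leaves the "remove" side untouched:
add_groups = drop_all(add_groups, "botadmin")
print(can_change(add_groups, "botadmin", "bot"))     # False
print(can_change(remove_groups, "botadmin", "bot"))  # True
```

To take away both abilities, as the task asked, the same change has to be applied to both mappings.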
[17:32:04] Operations, Commons, media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#2315250 (Dvorapa) @Aklapper Could this be provisionally fixed in an old way (before the blocked by task will be resolved)? There are still SVG files all over Wikimedia projects, which are...
[18:02:46] PROBLEM - aqs endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:04:37] RECOVERY - aqs endpoints health on aqs1003 is OK: All endpoints are healthy
[18:51:26] PROBLEM - HHVM rendering on mw2027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:53:17] RECOVERY - HHVM rendering on mw2027 is OK: HTTP OK: HTTP/1.1 200 OK - 69443 bytes in 3.307 second response time
[19:54:07] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[19:54:07] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[19:58:07] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:58:07] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:02:16] PROBLEM - configured eth on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:02:17] PROBLEM - DPKG on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:02:37] PROBLEM - dhclient process on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:02:56] PROBLEM - RAID on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:03:17] PROBLEM - puppet last run on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:03:36] PROBLEM - salt-minion processes on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:03:38] PROBLEM - Check size of conntrack table on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:03:47] PROBLEM - Disk space on kraz is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:06:06] ^ weird, but ircd is still running
[23:06:40] and the rc messages are going through..