[00:53:56] PROBLEM - Debian mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/debian is over 37 hours old.
[00:59:04] RECOVERY - Debian mirror in sync with upstream on carbon is OK: /srv/mirrors/debian is over 0 hours old.
[01:04:24] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100%
[01:05:45] RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 16%, RTA = 43.64 ms
[01:36:24] RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[01:36:35] PROBLEM - are wikitech and wt-static in sync on silver is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (92732s 90000s)
[02:13:55] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL 1.69% of data above the critical threshold [1000.0]
[02:22:45] !log l10nupdate Synchronized php-1.26wmf6/cache/l10n: (no message) (duration: 06m 36s)
[02:23:00] Logged the message, Master
[02:27:43] !log LocalisationUpdate completed (1.26wmf6) at 2015-05-25 02:26:39+00:00
[02:27:47] Logged the message, Master
[02:45:54] !log l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 32s)
[02:46:02] Logged the message, Master
[02:50:48] !log LocalisationUpdate completed (1.26wmf7) at 2015-05-25 02:49:45+00:00
[02:50:52] Logged the message, Master
[03:09:25] RECOVERY - carbon-cache too many creates on graphite1001 is OK Less than 1.00% above the threshold [500.0]
[03:58:35] PROBLEM - puppet last run on mw2001 is CRITICAL puppet fail
[04:15:24] RECOVERY - puppet last run on mw2001 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures
[05:05:25] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (17031 90000s)
[05:12:50] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon May 25 05:11:47 UTC 2015 (duration 11m 46s)
[05:12:55] Logged the message, Master
[05:14:44] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0]
[05:23:14] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[06:30:54] PROBLEM - puppet last run on cp3037 is CRITICAL Puppet has 2 failures
[06:33:35] PROBLEM - puppet last run on mw1119 is CRITICAL Puppet has 1 failures
[06:34:15] PROBLEM - puppet last run on elastic1030 is CRITICAL Puppet has 2 failures
[06:34:34] PROBLEM - puppet last run on db2036 is CRITICAL Puppet has 1 failures
[06:34:44] PROBLEM - puppet last run on mw1235 is CRITICAL Puppet has 1 failures
[06:34:45] PROBLEM - puppet last run on mw2079 is CRITICAL Puppet has 1 failures
[06:34:45] PROBLEM - puppet last run on mw2113 is CRITICAL Puppet has 1 failures
[06:34:45] PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 1 failures
[06:34:45] PROBLEM - puppet last run on mw2093 is CRITICAL Puppet has 1 failures
[06:34:54] PROBLEM - puppet last run on mw2134 is CRITICAL Puppet has 1 failures
[06:35:15] PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 1 failures
[06:46:04] RECOVERY - puppet last run on cp3037 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:46:04] RECOVERY - puppet last run on elastic1030 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:46:34] RECOVERY - puppet last run on mw1235 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:46:35] RECOVERY - puppet last run on mw2079 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:46:44] RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:46:45] RECOVERY - puppet last run on mw2134 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:47:04] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:14] RECOVERY - puppet last run on mw1119 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:48:04] RECOVERY - puppet last run on db2036 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:15] RECOVERY - puppet last run on mw2113 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:24] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:54:14] PROBLEM - Debian mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/debian is over 43 hours old.
[06:57:34] RECOVERY - Debian mirror in sync with upstream on carbon is OK: /srv/mirrors/debian is over 0 hours old.
[07:54:14] PROBLEM - puppet last run on rdb1001 is CRITICAL Puppet has 1 failures
[08:09:24] RECOVERY - puppet last run on rdb1001 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures
[08:11:25] PROBLEM - puppet last run on cp4001 is CRITICAL puppet fail
[08:28:35] RECOVERY - puppet last run on cp4001 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures
[08:33:25] anyone doing maintenance on analytics1036 ?
[08:34:21] or any router, network seems down for it?
[08:36:11] !log running du -d 1 -h > du-may-25-2015 on /exp/project/tools on labstore1001 to audit tools' NFS usage
[08:36:15] Logged the message, Master
[08:37:37] (PS2) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[08:41:38] I will reboot analytics1036 and flag it
[08:42:15] (PS3) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[08:43:22] (PS4) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[08:44:53] oh, I see https://phabricator.wikimedia.org/T99845
[08:45:11] will mark it on icinga
[08:55:25] (PS5) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[08:58:35] (PS6) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[09:01:22] (PS1) Alexandros Kosiaris: labsdb1004: Enable daily osm planet sync [puppet] - https://gerrit.wikimedia.org/r/213488
[09:05:04] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0]
[09:06:28] (PS1) coren: Tool Labs: add xvfb to exec environment [puppet] - https://gerrit.wikimedia.org/r/213490 (https://phabricator.wikimedia.org/T100268)
[09:07:06] Blocked-on-Operations, operations, Maps, Scrum-of-Scrums, hardware-requests: Eqiad Spare allocation: 1 hardware access request for OSM Maps project - https://phabricator.wikimedia.org/T97638#1308987 (akosiaris) @yurik, @maxsem. The hardware is ready for use. The hostname is: pgsql.eqiad.wmnet. T...
[09:12:28] (CR) Yuvipanda: [C: 1] Tool Labs: add xvfb to exec environment [puppet] - https://gerrit.wikimedia.org/r/213490 (https://phabricator.wikimedia.org/T100268) (owner: coren)
[09:17:48] (PS7) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[09:21:25] (PS8) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[09:22:01] (CR) coren: [C: 2] "Eeew!" [puppet] - https://gerrit.wikimedia.org/r/213490 (https://phabricator.wikimedia.org/T100268) (owner: coren)
[09:27:24] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[09:38:55] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[09:40:45] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[09:41:01] (PS1) Mobrovac: RESTBase: Set Graphoid address for deployment-prep [puppet] - https://gerrit.wikimedia.org/r/213500 (https://phabricator.wikimedia.org/T99885)
[09:42:14] (PS9) Yuvipanda: ores: Initial module, with web class / role [puppet] - https://gerrit.wikimedia.org/r/213354
[10:09:45] (PS1) Jcrespo: Depooling db1018 for maintenance and testing References T99485 [mediawiki-config] - https://gerrit.wikimedia.org/r/213508
[10:23:24] (PS1) Mobrovac: Set $::ipaddress_eth0 as the default IP address [puppet/cassandra] - https://gerrit.wikimedia.org/r/213511 (https://phabricator.wikimedia.org/T99564)
[10:25:19] (CR) Alexandros Kosiaris: initial debian packaging (1 comment) [debs/python-etcd] - https://gerrit.wikimedia.org/r/212528 (https://phabricator.wikimedia.org/T99771) (owner: Filippo Giunchedi)
[10:26:26] (CR) Jcrespo: [C: 2] Depooling db1018 for maintenance and testing References T99485 [mediawiki-config] - https://gerrit.wikimedia.org/r/213508 (owner: Jcrespo)
[10:26:44] operations, database: investigate performance_schema for wmf prod - https://phabricator.wikimedia.org/T99485#1309376 (jcrespo)
[10:29:35] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[10:31:50] !log jynus Synchronized wmf-config/db-eqiad.php: depool db1018 (duration: 00m 13s)
[10:31:54] Logged the message, Master
[10:32:02] (CR) Alexandros Kosiaris: [C: 2] RESTBase: Set Graphoid address for deployment-prep [puppet] - https://gerrit.wikimedia.org/r/213500 (https://phabricator.wikimedia.org/T99885) (owner: Mobrovac)
[10:37:10] (CR) Alexandros Kosiaris: [C: 2] labsdb1004: Enable daily osm planet sync [puppet] - https://gerrit.wikimedia.org/r/213488 (owner: Alexandros Kosiaris)
[10:42:27] akosiaris: thnx for merging :)
[10:43:12] akosiaris: here's an interesting one - https://gerrit.wikimedia.org/r/#/c/213511
[10:43:35] dunno how to run puppet-compiler on it though
[10:47:11] mobrovac: where did you get bit by $::ipaddress ?
[10:47:20] labs or dev env ?
[10:49:44] akosiaris: labs, deployment-restbase0x
[10:53:41] mobrovac: kill the docker instance running on docker0 ?
[10:55:26] and the problem should pretty much fix itself. Plus as you point out "Note: this change does not affect the actual configuration in production, as there the first iface happens to be eth0". This is not guaranteed
[10:57:20] hmm, puppet compiler is indeed having problems with this patch
[10:57:35] there is the assumption of operations/puppet ...
[11:05:29] (CR) Alexandros Kosiaris: [C: -1] "I am not very fond of this. We using $::ipaddress_eth0 instead of $::ipaddress we should be justifying why we do it and documenting it. As" [puppet/cassandra] - https://gerrit.wikimedia.org/r/213511 (https://phabricator.wikimedia.org/T99564) (owner: Mobrovac)
[11:11:57] operations, Datasets-General-or-Unknown, Wikidata, Wikidata-Sprint-2015-04-07, and 2 others: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1309472 (Tobi_WMDE_SW)
[11:22:12] (CR) Alexandros Kosiaris: [C: -1] "Couple of questions and a suggestion" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/213293 (owner: Ori.livneh)
[11:37:12] operations: Backport and include linux-tools-3.19 to our jessie repository - https://phabricator.wikimedia.org/T100216#1309510 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[11:49:48] (CR) Mobrovac: "True that. In that case we would have to change the address in a different way. I'll follow up with a patch just setting the correct value" [puppet/cassandra] - https://gerrit.wikimedia.org/r/213511 (https://phabricator.wikimedia.org/T99564) (owner: Mobrovac)
[11:59:13] (PS1) Mobrovac: Cassandra: deployment-prep: Set the correct listen IP [puppet] - https://gerrit.wikimedia.org/r/213530 (https://phabricator.wikimedia.org/T99564)
[12:00:06] (Abandoned) Mobrovac: Set $::ipaddress_eth0 as the default IP address [puppet/cassandra] - https://gerrit.wikimedia.org/r/213511 (https://phabricator.wikimedia.org/T99564) (owner: Mobrovac)
[12:00:47] akosiaris: https://gerrit.wikimedia.org/r/#/c/213530/ should be a better, beta-localised solution
[12:13:15] PROBLEM - puppet last run on ms-be2001 is CRITICAL puppet fail
[12:27:47] (PS1) Andrew Bogott: Remove redundant or defunct ldap servers from the ldap list. [puppet] - https://gerrit.wikimedia.org/r/213542
[12:27:49] (PS1) Andrew Bogott: Replace many references to virt1000 and labcontrol2001 with hiera lookups [puppet] - https://gerrit.wikimedia.org/r/213543
[12:28:27] (CR) jenkins-bot: [V: -1] Replace many references to virt1000 and labcontrol2001 with hiera lookups [puppet] - https://gerrit.wikimedia.org/r/213543 (owner: Andrew Bogott)
[12:31:55] RECOVERY - puppet last run on ms-be2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:49:13] (PS2) Andrew Bogott: Replace many references to virt1000 and labcontrol2001 with hiera lookups [puppet] - https://gerrit.wikimedia.org/r/213543
[12:49:55] (CR) jenkins-bot: [V: -1] Replace many references to virt1000 and labcontrol2001 with hiera lookups [puppet] - https://gerrit.wikimedia.org/r/213543 (owner: Andrew Bogott)
[12:53:25] PROBLEM - Debian mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/debian is over 49 hours old.
[12:53:48] (PS3) Andrew Bogott: Replace many references to virt1000 and labcontrol2001 with hiera lookups [puppet] - https://gerrit.wikimedia.org/r/213543
[12:54:30] (CR) jenkins-bot: [V: -1] Replace many references to virt1000 and labcontrol2001 with hiera lookups [puppet] - https://gerrit.wikimedia.org/r/213543 (owner: Andrew Bogott)
[12:55:05] RECOVERY - Debian mirror in sync with upstream on carbon is OK: /srv/mirrors/debian is over 0 hours old.
[12:59:33] (PS4) Andrew Bogott: Replace many references to virt1000 and labcontrol2001 with hiera lookups [puppet] - https://gerrit.wikimedia.org/r/213543
[13:05:49] (PS1) Jcrespo: Repool db1018 db1018 will go back to production but with the activation of the performance schema T99485 in order to check its impact on production [mediawiki-config] - https://gerrit.wikimedia.org/r/213554
[13:06:57] (CR) Jcrespo: [C: 2] Repool db1018 db1018 will go back to production but with the activation of the performance schema T99485 in order to check its impact on pro [mediawiki-config] - https://gerrit.wikimedia.org/r/213554 (owner: Jcrespo)
[13:09:05] !log jynus Synchronized wmf-config/db-eqiad.php: repool db1018 (duration: 00m 14s)
[13:09:09] Logged the message, Master
[13:28:17] greg-g, i'm about to +2 https://gerrit.wikimedia.org/r/#/c/212480/ -- labs-only minor change. Would rather not sync unless needed
[13:29:26] (CR) Yurik: [C: 2] Beta: updated graphoid to the new api endpoint [mediawiki-config] - https://gerrit.wikimedia.org/r/212480 (owner: Yurik)
[13:29:38] (Merged) jenkins-bot: Beta: updated graphoid to the new api endpoint [mediawiki-config] - https://gerrit.wikimedia.org/r/212480 (owner: Yurik)
[13:30:00] yurik: please do just sync it, it'll be ok, and it'll annoy others who get surprised by it ;)
[13:30:10] greg-g, k, syncing
[13:31:48] jynus, did you change a file on tin and didn't commit it? CC: greg-g
[13:32:46] yurik, yes
[13:32:54] tsk tsk tsk :)
[13:32:59] about to revert it
[13:33:01] operations: pc100[123] maintenance and upgrade - https://phabricator.wikimedia.org/T100301#1310015 (Matanya)
[13:33:26] jynus, when done, can you git pull / sync - i made a minor change to the labs settings and merged it
[13:33:37] https://gerrit.wikimedia.org/r/#/c/212480/
[13:33:51] ok, I will
[13:34:54] thx :)
[13:34:55] :) :)
[13:37:00] !log jynus Synchronized wmf-config/db-eqiad.php: repool db1018 (warm cache) (duration: 00m 13s)
[13:37:04] Logged the message, Master
[13:37:30] gwicke, is restbase enabled on all wiki domains, or just wp?
[13:38:14] !log jynus Synchronized wmf-config/InitialiseSettings-labs.php: restbase change from yurik (duration: 00m 14s)
[13:38:18] Logged the message, Master
[13:53:34] !log legoktm Synchronized php-1.26wmf7/extensions/ExtensionDistributor: Update ExtensionDistributor to master (duration: 00m 13s)
[13:53:39] Logged the message, Master
[14:16:36] !log legoktm Synchronized php-1.26wmf7/extensions/WikimediaMessages/i18n/: ExtensionDistributor message updates (duration: 00m 17s)
[14:16:40] Logged the message, Master
[14:17:17] !log intentionally not scapping right now, will let l10nupdate sync it out
[14:17:21] Logged the message, Master
[14:21:18] (PS1) Legoktm: Change default extension distributor branch to REL1_25 [mediawiki-config] - https://gerrit.wikimedia.org/r/213569
[14:22:05] (CR) Legoktm: [C: 2] Change default extension distributor branch to REL1_25 [mediawiki-config] - https://gerrit.wikimedia.org/r/213569 (owner: Legoktm)
[14:23:07] <_joe_> legoktm: please don't deploy code now
[14:23:16] (Merged) jenkins-bot: Change default extension distributor branch to REL1_25 [mediawiki-config] - https://gerrit.wikimedia.org/r/213569 (owner: Legoktm)
[14:23:23] <_joe_> I don't know how many ops are available in case of problems
[14:23:36] <_joe_> also, you should be @ the showcase :)
[14:24:01] _joe_: it's just changing the extension distributor config for the 1.25 release, I can not deploy ^ if you'd rather I don't
[14:24:41] and I'm at the showcase :P
[14:30:26] _joe_: ^ ?
[14:31:22] "it's just ..." <- famous last words
[14:31:31] :)
[14:31:51] docker push is so slowwww
[14:37:51] (PS1) Legoktm: Revert "Change default extension distributor branch to REL1_25" [mediawiki-config] - https://gerrit.wikimedia.org/r/213571
[14:38:15] (CR) Legoktm: [C: 2] Revert "Change default extension distributor branch to REL1_25" [mediawiki-config] - https://gerrit.wikimedia.org/r/213571 (owner: Legoktm)
[14:38:21] (Merged) jenkins-bot: Revert "Change default extension distributor branch to REL1_25" [mediawiki-config] - https://gerrit.wikimedia.org/r/213571 (owner: Legoktm)
[14:38:34] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[14:38:54] heh
[14:39:10] I pulled it in
[14:40:16] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[14:42:50] (PS1) Legoktm: Revert "Revert "Change default extension distributor branch to REL1_25"" [mediawiki-config] - https://gerrit.wikimedia.org/r/213572
[14:43:09] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1310296 (Patrick87) librsvg 2.40.9-2 is now in Debian testing. Is this enough to use it on Wikimedia servers?
[14:54:52] (PS1) GWicke: Point /api/rest_v1 (without trailing slash) to RESTBase [puppet] - https://gerrit.wikimedia.org/r/213573 (https://phabricator.wikimedia.org/T99859)
[14:57:40] (CR) Mjbmr: Allow faux-renaming/database remapping [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[14:57:44] (CR) Mjbmr: Rename chapcomwiki to affcomwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/169939 (https://bugzilla.wikimedia.org/39482) (owner: Reedy)
[15:01:00] (CR) John F. Lewis: [C: 1] tin: set cluster in hiera, not in site.pp [puppet] - https://gerrit.wikimedia.org/r/210835 (owner: Dzahn)
[15:07:03] (CR) Jcrespo: "I have thrown some thought on the db side of things: https://phabricator.wikimedia.org/T83609 (comments)" [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[15:08:55] jynus: I can't see any input from you on the ticket?
[15:09:12] JohnLewis, on the comments
[15:10:22] still can't see anything, am I looking at the right place? :)
[15:11:04] JohnLewis, my fault
[15:11:09] check again in 5 min
[15:11:24] okay
[15:22:14] jynus: see them now :)
[15:22:26] sorry!
[15:56:07] (PS1) Nemo bis: [WIP] Stub LimeSurvey configuration [puppet] - https://gerrit.wikimedia.org/r/213579 (https://phabricator.wikimedia.org/T94807)
[15:58:34] (CR) Nemo bis: "How to test this meaningfully in labs?" [puppet] - https://gerrit.wikimedia.org/r/213579 (https://phabricator.wikimedia.org/T94807) (owner: Nemo bis)
[15:59:57] (CR) Nemo bis: "(So far I just monkey-copied the Wikimania Scholarship app configuration which Ori has suggested as model.)" [puppet] - https://gerrit.wikimedia.org/r/213579 (https://phabricator.wikimedia.org/T94807) (owner: Nemo bis)
[16:00:10] Negative24: \o/
[16:00:15] sorry, wrong 'Ne'
[16:00:20] Nemo_bis: \o/ :)
[16:01:03] :)
[16:01:06] :)
[16:07:13] (CR) Jcrespo: "Please note that for consistency reasons, mariadb changes do not get applied automatically. Ping any dba when you need those." [puppet] - https://gerrit.wikimedia.org/r/213579 (https://phabricator.wikimedia.org/T94807) (owner: Nemo bis)
[16:34:00] operations: pc100[123] maintenance and upgrade - https://phabricator.wikimedia.org/T100301#1310549 (jcrespo) Running the following on pc1001 in order to retrieve a sample of traffic for later analysis of 10 compatibility: ``` SET @slow_query_log := @@GLOBAL.slow_query_log; -- ON SET @slow_query_log_file := @...
[16:34:11] Puppet, Reading-Infrastructure-Team, Release-Engineering, Patch-For-Review, User-Bd808-Test: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1310551 (Tgr)
[16:36:07] !log running diagnostics on mariadb@pc1001: a very small amount of requests may experience extra latency
[16:36:14] Logged the message, Master
[17:51:00] operations: pc100[123] maintenance and upgrade - https://phabricator.wikimedia.org/T100301#1310684 (jcrespo) Log set back to normal. This wiki page documents the queries being done there and how fast, in order to compare with 10: - https://wikitech.wikimedia.org/wiki/MariaDB/parsercache It is a simple...
[17:51:44] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/213216 (https://phabricator.wikimedia.org/T99990) (owner: QChris)
[17:56:24] operations: pc100[123] maintenance and upgrade - https://phabricator.wikimedia.org/T100301#1310690 (jcrespo) If someone needs the full log or output (includes potential private queries), it is on `root@pc1001:/home/jynus`
[17:56:53] Is SUL finalized? From what I heard it was...
[17:57:02] yes
[17:57:14] Almost done.
[17:57:17] Not really finalized.
[17:57:50] Negative24: https://phabricator.wikimedia.org/T37707
[17:57:51] Glaisher: then what were all the celebratory emails that I got about?
[17:58:28] Negative24: Renaming of conflicting usernames has been done. That's like 98% of it.
[17:58:38] ah ok
[18:00:26] legoktm would know what's missing
[18:02:18] see you later!
[18:06:30] (CR) Alexandros Kosiaris: [C: 2] "This is technically correct, the flag naming is somewhat convoluted and had me thinking for a while, but not this commit's fault. Merging" [puppet] - https://gerrit.wikimedia.org/r/213216 (https://phabricator.wikimedia.org/T99990) (owner: QChris)
[18:06:38] (CR) Alexandros Kosiaris: [V: 2] Turn off sshd MAC and KEX hardening for gerrit replication targets [puppet] - https://gerrit.wikimedia.org/r/213216 (https://phabricator.wikimedia.org/T99990) (owner: QChris)
[18:07:35] PROBLEM - Debian mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/debian is over 55 hours old.
[18:09:14] RECOVERY - Debian mirror in sync with upstream on carbon is OK: /srv/mirrors/debian is over 0 hours old.
[18:09:50] (CR) Alex Monk: "Looks like I08d0a64d found a less temporary use for this." [puppet] - https://gerrit.wikimedia.org/r/212909 (owner: Yuvipanda)
[18:21:46] You guys are lazy today! :) http://i.imgur.com/N7CU56P.png
[18:22:15] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: git.wikimedia.org replication from gerrit stopped or lags - https://phabricator.wikimedia.org/T99990#1310710 (Paladox) How long would it take for gerrit and git to pickup the change so that they both can start working again.
[18:26:23] Negative24: I find it really funny that you call us lazy and then the next line is a bug that explains why we are :D
[18:27:02] :D
[18:29:38] (CR) Paladox: "I still see no change on git.wikimedia.org." [puppet] - https://gerrit.wikimedia.org/r/213216 (https://phabricator.wikimedia.org/T99990) (owner: QChris)
[18:30:19] (CR) Paladox: "Never mind it works now." [puppet] - https://gerrit.wikimedia.org/r/213216 (https://phabricator.wikimedia.org/T99990) (owner: QChris)
[18:32:44] PROBLEM - Debian mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/debian is over 55 hours old.
[18:34:05] (PS1) Alexandros Kosiaris: Merge branch 'debian' [debs/etherpad-lite] - https://gerrit.wikimedia.org/r/213603
[18:36:04] RECOVERY - Debian mirror in sync with upstream on carbon is OK: /srv/mirrors/debian is over 0 hours old.
[18:39:36] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: git.wikimedia.org replication from gerrit stopped or lags - https://phabricator.wikimedia.org/T99990#1310738 (akosiaris) Open>Resolved a:akosiaris The general rule is 20 minutes right now, though some changes might take up to 40 minutes or...
[18:41:25] (CR) Alexandros Kosiaris: [C: 2] Merge branch 'debian' [debs/etherpad-lite] - https://gerrit.wikimedia.org/r/213603 (owner: Alexandros Kosiaris)
[18:41:29] (CR) Alexandros Kosiaris: [V: 2] Merge branch 'debian' [debs/etherpad-lite] - https://gerrit.wikimedia.org/r/213603 (owner: Alexandros Kosiaris)
[18:59:36] PROBLEM - Debian mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/debian is over 55 hours old.
[18:59:36] PROBLEM - puppet last run on mw1090 is CRITICAL Puppet has 1 failures
[19:01:25] RECOVERY - Debian mirror in sync with upstream on carbon is OK: /srv/mirrors/debian is over 0 hours old.
[19:03:45] PROBLEM - puppet last run on carbon is CRITICAL Puppet last ran 6 hours ago
[19:16:25] RECOVERY - puppet last run on mw1090 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[19:23:55] RECOVERY - puppet last run on carbon is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:43:25] PROBLEM - puppet last run on mw2106 is CRITICAL puppet fail
[19:50:02] matanya: what's your email? :)
[19:50:14] matanya@foss.co.il
[19:52:25] matanya: done, I've just straight up set you as list admin because you'll know what to do with it :)
[19:52:39] thanks JohnFLewis !
[19:53:02] set a description for it asap otherwise we'll have words ;)
[20:01:55] RECOVERY - puppet last run on mw2106 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:22:58] (PS1) Andrew Bogott: Feed the puppet host IP directly to dnsmasq. [puppet] - https://gerrit.wikimedia.org/r/213629
[20:26:52] (PS1) Dereckson: Import sources on mai.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/213652 (https://phabricator.wikimedia.org/T99490)
[20:51:45] PROBLEM - check if phabricator taskmaster is running on iridium is CRITICAL: PROCS CRITICAL: 10 processes with regex args PhabricatorTaskmasterDaemon
[20:53:34] RECOVERY - check if phabricator taskmaster is running on iridium is OK: PROCS OK: 1 process with regex args PhabricatorTaskmasterDaemon
[20:56:28] pages, what's up ?
[21:00:02] pages?
[21:00:20] ori: yeah for phabricator
[21:00:30] ahh
[21:00:34] it recovered 2 mins later on its own
[21:01:37] * apergos looks in
[21:01:47] *yawn*
[21:01:52] ok, going away again
[21:02:17] akosiaris: clearly phab felt like there was a challenge to take on in tasks
[21:04:07] JohnFLewis: :-)
[21:08:13] Ops-Access-Requests, operations, Patch-For-Review: Shell and research access for Moushira Elamrawy - https://phabricator.wikimedia.org/T100091#1311150 (ori) Open>Resolved a:ori
[21:09:35] PROBLEM - puppet last run on mw2050 is CRITICAL puppet fail
[21:12:21] stderr:fatal: could not read Username for 'https://gerrit.wikimedia.org': No such device or address
[21:12:22] interesting
[21:13:07] what's that from?
[21:13:09] 358147 times since Mar 24...
[21:13:16] Krenair: phabricator
[21:13:21] phd/daemons.log
[21:13:45] https? huh..
[21:13:47] while trying to update repos seems like
[21:14:58] the error message is coming from git, it appears: http://stackoverflow.com/questions/20871549/error-when-push-commits-with-github-fatal-could-not-read-username
[21:16:01] yeah, git fetch is the culprit
[21:16:12] 'the missing device is actually a console. You've tried to authenticate non-interactively. This has failed. As a fallback, git has tried to prompt the user to input a username, but this doesn't work as an interactive console is not available.' (http://stackoverflow.com/a/21676317)
[21:16:13] called from /srv/phab/phabricator/bin/repository
[21:16:29] oh man...
[21:16:37] ok anyway, irrelevant to the actual page
[21:16:41] though it should be fixed
[21:16:50] it's stack tracing btw
[21:17:04] but seems like it's just git fetching failing
[21:25:22] ori: the check needs some better thresholds, otherwise we are OK. And this is triggered by phabricator trying to contact non-existent email addresses. Either because they don't exist anymore, or perhaps never existed in the first place.
[21:25:43] Will file a ticket tomorrow. Now I am going back to sleep. Have a nice night.
[21:26:34] RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures
[21:27:07] akosiaris: have a nice night. (and thanks for the detailed review of that varnish patch, by the way!)
[21:27:53] ori: you're welcome. I am actually not fully sure on some of my comments, hence the questions
[21:28:25] take it with a grain of salt, I might not have grasped fully the design
[21:30:03] i looked over them very briefly earlier and i think they were valid points
[21:55:12] can someone give phabricator a kick
[22:00:35] PROBLEM - check if phabricator taskmaster is running on iridium is CRITICAL: PROCS CRITICAL: 6 processes with regex args PhabricatorTaskmasterDaemon
[22:02:25] RECOVERY - check if phabricator taskmaster is running on iridium is OK: PROCS OK: 1 process with regex args PhabricatorTaskmasterDaemon
[22:11:18] operations: Backport and include linux-tools-3.19 to our jessie repository - https://phabricator.wikimedia.org/T100216#1311322 (BBlack) Primarily the reason I haven't installed it (perf) by default in the past is it brings in the kernel debuginfo package, which is usually a significant fraction of the rootfs'...
[23:03:27] operations, Wikimedia-DNS, Wikimedia-Site-requests, Patch-For-Review: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1311379 (AddisWang) Hey guys! Thank you a lot for all these work. I guess it already in the end of the process? I'm not sure. Is there...
[23:19:34] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 3 below the confidence bounds
[23:20:55] operations, Wikimedia-DNS, Wikimedia-Site-requests, Patch-For-Review: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1311391 (Krenair) @greg: can I please have a deployment window tomorrow after the morning SWAT? I'll finish this, it's just a couple o...
[23:36:39] (PS6) Alex Monk: cn.wikimedia.org initial configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/211103 (https://phabricator.wikimedia.org/T98676) (owner: Dereckson)
[23:37:23] operations, Wikimedia-DNS, Wikimedia-Site-requests, Patch-For-Review: Create fishbowl wiki for Wikimedia User Group China - https://phabricator.wikimedia.org/T98676#1311396 (Krenair)
[23:37:46] (Abandoned) Alex Monk: Imported logo for Wikimedia User Group China [mediawiki-config] - https://gerrit.wikimedia.org/r/211094 (https://phabricator.wikimedia.org/T98676) (owner: Dereckson)
[23:45:19] Puppet, operations: require_package behaves herratically with other puppet contructs - https://phabricator.wikimedia.org/T1245#1311409 (ori) Open>Resolved This was fixed in [[ https://gerrit.wikimedia.org/r/#/c/178434/ | I2200e312 ]] and [[ https://gerrit.wikimedia.org/r/#/c/178431/ |I723673cbf ]].