[00:36:06] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[00:37:27] looks like gitblit is hosed again, i'll restart it.
[00:43:06] !log restarted gitblit on antimony
[00:43:06] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.181 second response time
[00:44:12] hm no response from logmsgbot
[00:45:28] er i mean morebots
[00:46:16] died in the netsplit maybe?
[00:51:56] hmm i don't have access to tools-login.eqiad.wmflabs as described in https://wikitech.wikimedia.org/wiki/Morebots#Example:_restart_the_ops_channel_morebot
[00:52:03] and i don't see any instances in project "tools" in labs
[01:09:01] jgage, access to tools-login.wmflabs.org wouldn't be enough, you'd need to be in the morebots group as well
[01:09:33] people listed next to morebots at https://tools.wmflabs.org/
[01:09:41] e.g. legoktm_
[01:18:31] ah ok now that i've added myself to the project i can see the instance list
[01:18:47] still can't login though
[01:24:57] ok got it
[01:27:42] !log testing morebots
[01:27:52] hmph
[01:28:35] well i followed the procedure and it still isn't responding. i'll let someone else mess with it.
[01:37:43] !log testing morebots
[01:37:49] Logged the message, Master
[01:38:00] jgage, ^
[01:38:13] you just tried to !log before it had joined
[01:38:55] hah. cool, ok.
[01:39:49] !log restarted gitblit on antimony at 00:43 UTC
[01:39:54] Logged the message, Master
[01:39:56] :)
[02:17:58] jgage: just a little note, the instance lists on Nova_Resource namespaces often aren't correct (usually need a null edit to fix and sometimes that doesn't even work).
Special:NovaInstance is your best bet
[02:23:02] !log l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 10m 23s)
[02:23:11] Logged the message, Master
[02:27:13] !log LocalisationUpdate completed (1.26wmf10) at 2015-06-21 02:27:13+00:00
[02:27:17] Logged the message, Master
[02:28:12] operations, RESTBase-Cassandra, Services, Patch-For-Review, RESTBase-architecture: put new restbase servers in service - https://phabricator.wikimedia.org/T102015#1385512 (GWicke) With 2.1.7 on both 1006 and 1009 the first ~400G stream succeeds, but for some reason some other part of the stream f...
[02:32:48] I'm getting 503 errors when sending stuff (editing) via the api
[02:33:17] Here's the "technical stuff": Request: POST http://es.wikipedia.org/w/api.php?action=edit&format=json, from 10.64.0.102 via cp1065 cp1065 ([10.64.0.102]:3128), Varnish XID 888688929
Forwarded for: 10.68.17.228, 10.64.0.102, 10.64.0.102
Error: 503, Service Unavailable at Sun, 21 Jun 2015 02:30:59 GMT
[02:59:19] Negative24: interesting. thanks.
[03:34:36] PROBLEM - puppet last run on mc1001 is CRITICAL Puppet has 1 failures
[03:35:47] PROBLEM - puppet last run on cp2015 is CRITICAL Puppet has 1 failures
[03:35:56] PROBLEM - puppet last run on mw2038 is CRITICAL Puppet has 1 failures
[03:35:56] PROBLEM - puppet last run on mw2051 is CRITICAL Puppet has 1 failures
[03:36:37] PROBLEM - puppet last run on mw1181 is CRITICAL Puppet has 1 failures
[03:36:56] PROBLEM - puppet last run on mw1004 is CRITICAL Puppet has 1 failures
[03:38:06] PROBLEM - puppet last run on cp3039 is CRITICAL puppet fail
[03:39:37] PROBLEM - puppet last run on wtp2017 is CRITICAL Puppet has 1 failures
[03:39:57] PROBLEM - puppet last run on db1037 is CRITICAL puppet fail
[03:50:18] RECOVERY - puppet last run on mc1001 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures
[03:51:47] PROBLEM - puppet last run on mw1044 is CRITICAL puppet fail
[03:51:47] RECOVERY - puppet last run on cp2015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:52:27] RECOVERY - puppet last run on mw1181 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures
[03:52:46] RECOVERY - puppet last run on mw1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:53:27] RECOVERY - puppet last run on mw2038 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:53:28] RECOVERY - puppet last run on mw2051 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:53:56] RECOVERY - puppet last run on cp3039 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures
[03:55:27] RECOVERY - puppet last run on db1037 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[03:57:16] RECOVERY - puppet last run on wtp2017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:09:17]
RECOVERY - puppet last run on mw1044 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures
[04:22:27] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[04:24:07] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms
[04:28:37] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[04:31:58] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.643 second response time
[04:39:08] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[04:44:17] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.444 second response time
[04:49:37] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[04:54:46] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 3.664 second response time
[05:01:07] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 21 05:01:07 UTC 2015 (duration 1m 6s)
[05:01:12] Logged the message, Master
[05:01:57] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[05:06:57] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.402 second response time
[05:14:06] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[05:17:26] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.390 second response time
[05:22:56] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[05:29:47] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.200 second response time
[05:35:06] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[05:41:58] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.197 second response
time
[05:49:06] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[05:50:38] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60573 bytes in 0.372 second response time
[05:55:57] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[05:57:47] PROBLEM - puppet last run on es1010 is CRITICAL Puppet has 1 failures
[06:06:36] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 9.171 second response time
[06:13:27] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[06:15:17] RECOVERY - puppet last run on es1010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:23:56] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60552 bytes in 0.394 second response time
[06:29:17] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[06:30:37] PROBLEM - puppet last run on cp2013 is CRITICAL Puppet has 2 failures
[06:30:37] PROBLEM - puppet last run on mw2013 is CRITICAL Puppet has 1 failures
[06:30:46] PROBLEM - puppet last run on mw2016 is CRITICAL Puppet has 1 failures
[06:30:57] PROBLEM - puppet last run on db1042 is CRITICAL Puppet has 1 failures
[06:30:57] PROBLEM - puppet last run on db1067 is CRITICAL Puppet has 1 failures
[06:31:17] PROBLEM - puppet last run on db1023 is CRITICAL Puppet has 1 failures
[06:32:07] PROBLEM - puppet last run on labcontrol2001 is CRITICAL Puppet has 1 failures
[06:34:07] PROBLEM - puppet last run on db2040 is CRITICAL Puppet has 1 failures
[06:34:17] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 1 failures
[06:34:27] PROBLEM - puppet last run on mw1166 is CRITICAL Puppet has 1 failures
[06:34:48] PROBLEM - puppet last run on mw1144 is CRITICAL Puppet has 1 failures
[06:35:37] PROBLEM - puppet last run on mw2092 is CRITICAL Puppet has 1 failures
[06:35:37] PROBLEM - puppet last run on
mw2126 is CRITICAL Puppet has 1 failures
[06:35:47] PROBLEM - puppet last run on mw2059 is CRITICAL Puppet has 1 failures
[06:35:56] PROBLEM - puppet last run on mw2123 is CRITICAL Puppet has 1 failures
[06:36:16] PROBLEM - puppet last run on mw1129 is CRITICAL Puppet has 2 failures
[06:36:16] PROBLEM - puppet last run on mw1061 is CRITICAL Puppet has 1 failures
[06:36:28] PROBLEM - puppet last run on mw2030 is CRITICAL Puppet has 2 failures
[06:39:46] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60553 bytes in 0.437 second response time
[06:45:08] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[06:46:18] RECOVERY - puppet last run on cp2013 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:46:26] RECOVERY - puppet last run on mw2013 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:46:27] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:46:27] RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:37] RECOVERY - puppet last run on mw1123 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:46:47] RECOVERY - puppet last run on db1042 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:46:47] RECOVERY - puppet last run on mw1166 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:46:47] RECOVERY - puppet last run on db1067 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:46:47] RECOVERY - puppet last run on mw1061 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:07] RECOVERY - puppet last run on db1023 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:07] RECOVERY - puppet last run on mw1144 is OK Puppet is currently enabled, last run 1 minute ago
with 0 failures
[06:48:07] RECOVERY - puppet last run on labcontrol2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:07] RECOVERY - puppet last run on mw2126 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:07] RECOVERY - puppet last run on mw2092 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:48:17] RECOVERY - puppet last run on mw2059 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:48:17] RECOVERY - puppet last run on db2040 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:46] RECOVERY - puppet last run on mw1129 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:49:06] RECOVERY - puppet last run on mw2030 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:55:17] !log restarted bootstrap on restbase1009 earlier today; hardware hasn't died yet
[06:55:22] Logged the message, Master
[06:57:26] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60574 bytes in 0.893 second response time
[07:02:57] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[07:11:17] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60553 bytes in 0.581 second response time
[07:16:46] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[07:24:53] operations, MediaWiki-File-management, MediaWiki-Tarball-Backports, Multimedia, and 6 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1385611 (Nemo_bis) > Surely we have to draw the line of where the oldest base system we support running MediaWiki on is...
[07:28:57] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60574 bytes in 0.057 second response time
[07:34:17] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[07:44:47] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60553 bytes in 0.867 second response time
[07:50:07] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[07:52:16] (PS11) Giuseppe Lavagetto: varnish: add generation of the dynamic list of directors [puppet] - https://gerrit.wikimedia.org/r/217818 (https://phabricator.wikimedia.org/T97975)
[08:04:07] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60553 bytes in 0.287 second response time
[08:09:27] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[08:19:47] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60553 bytes in 0.061 second response time
[08:42:56] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[08:43:46] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 1.78 ms
[09:29:24] operations, HTTPS: Replace SHA1 certificates with SHA256 - https://phabricator.wikimedia.org/T73156#1385732 (Chmarkine) >>! In T73156#1383664, @Dzahn wrote: > so all is left here is OTRS it seems WMF Labs, Planet, and the domains mentioned by @JGreen, civicrm, frdata, fundraising, payments-listener are st...
[10:43:47] PROBLEM - puppet last run on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:45:37] RECOVERY - puppet last run on mw1110 is OK Puppet is currently enabled, last run 22 minutes ago with 0 failures
[11:03:37] PROBLEM - RAID on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:04:57] PROBLEM - puppet last run on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:06] PROBLEM - Apache HTTP on mw1110 is CRITICAL - Socket timeout after 10 seconds
[11:05:17] PROBLEM - salt-minion processes on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:18] PROBLEM - HHVM rendering on mw1110 is CRITICAL - Socket timeout after 10 seconds
[11:05:18] PROBLEM - nutcracker process on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:18] PROBLEM - HHVM processes on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:18] PROBLEM - dhclient process on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:27] PROBLEM - SSH on mw1110 is CRITICAL - Socket timeout after 10 seconds
[11:05:28] PROBLEM - nutcracker port on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:36] PROBLEM - configured eth on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:37] PROBLEM - DPKG on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:05:56] PROBLEM - Disk space on mw1110 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:06:57] RECOVERY - nutcracker process on mw1110 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[11:06:57] RECOVERY - salt-minion processes on mw1110 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:06:57] RECOVERY - HHVM processes on mw1110 is OK: PROCS OK: 6 processes with command name hhvm
[11:06:57] RECOVERY - dhclient process on mw1110 is OK: PROCS OK: 0 processes with command name dhclient
[11:06:58] RECOVERY - SSH on mw1110 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[11:07:06] RECOVERY - RAID on mw1110 is OK no RAID installed
[11:07:07] RECOVERY - nutcracker port on mw1110 is OK: TCP OK - 0.000 second response time on port 11212
[11:07:16] RECOVERY - DPKG on mw1110 is OK: All packages OK
[11:07:17] RECOVERY - configured eth on mw1110 is OK - interfaces up
[11:07:27] RECOVERY - Disk space on mw1110 is OK: DISK OK
[11:08:17] RECOVERY - puppet last run on mw1110 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[11:13:37] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.066 second response time
[11:13:58] RECOVERY - HHVM rendering on mw1110 is OK: HTTP OK: HTTP/1.1 200 OK - 64762 bytes in 1.068 second response time
[11:20:56] PROBLEM - Apache HTTP on mw1110 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.013 second response time
[11:21:08] PROBLEM - HHVM rendering on mw1110 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.009 second response time
[11:28:46] !log restarting apache on mw1110
[11:29:04] Logged the message, Master
[11:35:07] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.049 second response time
[11:35:07] RECOVERY - HHVM rendering on mw1110 is OK: HTTP OK: HTTP/1.1 200 OK - 64762 bytes in 0.160 second response time
[11:35:53] that kick in the guts may have fixed it?
for some reason, hhvm didn't want to come back
[11:36:05] after an OOM
[11:39:52] kibana/graphite seems to confirm it
[12:13:17] PROBLEM - puppet last run on db2066 is CRITICAL puppet fail
[12:16:37] ^second time in 2 days that I see the puppet master load balancing fail
[12:16:48] RECOVERY - puppet last run on db2066 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures
[12:39:07] PROBLEM - Host eeden is DOWN: PING CRITICAL - Packet loss = 100%
[12:39:17] PROBLEM - Host ns2-v4 is DOWN: PING CRITICAL - Packet loss = 100%
[12:44:57] RECOVERY - Host eeden is UP: PING OK - Packet loss = 0%, RTA = 89.05 ms
[12:45:07] RECOVERY - Host ns2-v4 is UP: PING OK - Packet loss = 0%, RTA = 88.36 ms
[12:48:57] PROBLEM - puppet last run on eeden is CRITICAL puppet fail
[13:01:17] RECOVERY - puppet last run on eeden is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:18:54] -_-
[13:21:08] PROBLEM - puppet last run on lvs3002 is CRITICAL puppet fail
[13:27:08] sjoerddebruin: have I missed anything with that face?
[13:27:31] JohnFLewis: Don't know if you see all the events in this channel.
[13:28:08] Oh Keegan?
[13:38:28] RECOVERY - puppet last run on lvs3002 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures
[14:17:07] Has anyone noticed that the Google Doodle is pointing to a ?title= URL?
[14:18:01] <_joe_> Katie: what doodle? every country gets a different one
[14:19:32] U.S. Father's Day.
[14:19:49] operations, Labs, Labs-Infrastructure, Labs-Sprint-100, and 2 others: Migrate Labs NFS storage from RAID6 to RAID10 - https://phabricator.wikimedia.org/T96063#1385920 (coren) The filesystem crashed caused us to... improvise around this plan a great deal. All but one project has been switched to a r...
[14:19:59] _joe_: https://en.wikipedia.org/?title=Father's_Day in the top result.
[14:20:30] <_joe_> Katie: oh ok, here the doodle is about summer starting
[14:20:48] _joe_: I'm pretty frustrated about us sending wrong canonical tags.
[14:20:57] I would think the operations team would be as well.
[14:21:11] <_joe_> but the wikipedia link is like that as well
[14:21:14] I feel like we must be hurting our caching.
[14:22:11] <_joe_> uhm I'm not sure if that gets internally remapped, but I think not
[14:22:25] <_joe_> and yes, this is going to hurt caching
[14:22:49] I think it's partially due to the HTTPS switchover. I think Google is re-indexing us a lot.
[14:23:39] Oh, Krinkle_ picked up the task: https://phabricator.wikimedia.org/T67402
[14:23:52] <_joe_> it's just enwiki btw
[14:23:55] <_joe_> AFAICS
[14:25:25] I see https://it.wikipedia.org/?title=/dev/null in Google search results.
[14:26:46] PROBLEM - YARN NodeManager Node-State on analytics1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:28:17] RECOVERY - YARN NodeManager Node-State on analytics1016 is OK YARN NodeManager analytics1016.eqiad.wmnet:8041 Node-State: RUNNING
[14:34:01] operations, ops-codfw, Labs: Labs: Install the new RAID controller in labstore2001 and test - https://phabricator.wikimedia.org/T103267#1385952 (coren) NEW
[14:36:52] operations, MediaWiki-Sites, SEO, Patch-For-Review: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1385964 (MZMcBride) >>! In T67402#1383833, @gerritbot wrote: > Change 219446 had a related patch set uploaded (by...
[14:37:47] _joe_ are you a plumber by any chance?
[14:38:10] <_joe_> ToAruShiroiNeko: I usually joke I'm an internet plumber :P
[14:38:27] okay
[14:38:35] so you plumb the series of tubes?
[14:40:45] <_joe_> not really :P
[14:57:12] operations, MediaWiki-Sites, SEO, Patch-For-Review: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1385975 (BBlack) Yeah this problem is growing in the Google indices. We may actually need to fix the rel=canonic...
[14:58:02] operations, MediaWiki-Sites, SEO, Patch-For-Review: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1385978 (BBlack) (and yes: this hurts our cache performance. It also hurts purge behavior on edits, from the use...
[15:01:51] (PS1) Yuvipanda: quarry: Don't install flask and mwoauth everywhere [puppet] - https://gerrit.wikimedia.org/r/219622
[15:02:28] (PS2) Yuvipanda: quarry: Don't install flask and mwoauth everywhere [puppet] - https://gerrit.wikimedia.org/r/219622
[15:02:36] (CR) Yuvipanda: [C: 2 V: 2] quarry: Don't install flask and mwoauth everywhere [puppet] - https://gerrit.wikimedia.org/r/219622 (owner: Yuvipanda)
[15:57:53] Puppet, Mobile-Web, Patch-For-Review: Certain urls do not redirect to mobile - https://phabricator.wikimedia.org/T103158#1386003 (BBlack)
[16:46:07] PROBLEM - puppet last run on mw2197 is CRITICAL Puppet has 1 failures
[17:00:49] operations, Multimedia, Performance-Team, Wikimedia-Site-requests: Please offer larger image thumbnail sizes in Special:Preferences - https://phabricator.wikimedia.org/T65440#1386068 (Glaisher) Could someone from one of the teams explain whether this is feasible or not, currently?
[17:01:48] RECOVERY - puppet last run on mw2197 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures
[17:14:20] (PS1) JanZerebecki: Default wmgUseWikibaseQuality on beta to true.
[mediawiki-config] - https://gerrit.wikimedia.org/r/219630 (https://phabricator.wikimedia.org/T99351)
[17:41:18] operations, Multimedia, Performance-Team, Wikimedia-Site-requests: Please offer larger image thumbnail sizes in Special:Preferences - https://phabricator.wikimedia.org/T65440#1386100 (ori) >>! In T65440#1386068, @Glaisher wrote: > Could someone from one of the teams explain whether this is feasible...
[17:43:43] (CR) Glaisher: Default wmgUseWikibaseQuality on beta to true. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/219630 (https://phabricator.wikimedia.org/T99351) (owner: JanZerebecki)
[17:59:03] (CR) Ori.livneh: [C: 1] "Could the service init script / unit file / whatever be made to check whether more than gc_time has elapsed since the node last joined the" (1 comment) [puppet/cassandra] - https://gerrit.wikimedia.org/r/219503 (https://phabricator.wikimedia.org/T103134) (owner: GWicke)
[18:05:47] (CR) GWicke: "Re 1): Possibly, but it still wouldn't be safe to re-join the cluster after being down for more than the length of the hinted hand-off win" [puppet/cassandra] - https://gerrit.wikimedia.org/r/219503 (https://phabricator.wikimedia.org/T103134) (owner: GWicke)
[18:07:48] (CR) GWicke: Don't start cassandra on boot or via puppet (1 comment) [puppet/cassandra] - https://gerrit.wikimedia.org/r/219503 (https://phabricator.wikimedia.org/T103134) (owner: GWicke)
[18:14:14] (PS1) BryanDavis: Add HHVM restart support [tools/scap] - https://gerrit.wikimedia.org/r/219751 (https://phabricator.wikimedia.org/T103008)
[18:14:17] (PS1) BryanDavis: Move dsh group file names to config [tools/scap] - https://gerrit.wikimedia.org/r/219752
[18:14:35] (CR) jenkins-bot: [V: -1] Move dsh group file names to config [tools/scap] - https://gerrit.wikimedia.org/r/219752 (owner: BryanDavis)
[18:16:37] (PS2) BryanDavis: Move dsh group file names to config [tools/scap] -
https://gerrit.wikimedia.org/r/219752
[18:23:13] (CR) BryanDavis: "Needs testing. I don't have my local test VM setup properly to try this in any realistic way yet. We may be able to rig up a testing syste" (4 comments) [tools/scap] - https://gerrit.wikimedia.org/r/219751 (https://phabricator.wikimedia.org/T103008) (owner: BryanDavis)
[18:24:17] PROBLEM - YARN NodeManager Node-State on analytics1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:26:06] RECOVERY - YARN NodeManager Node-State on analytics1016 is OK YARN NodeManager analytics1016.eqiad.wmnet:8041 Node-State: RUNNING
[18:29:32] I'm getting database errors when attempting to revdel something: " Function: RevDelRevisionList::doQuery (newline) Error: 2013 Lost connection to MySQL server during query (10.64.48.28)" - known outage?
[18:35:20] eh, breaking it up into smaller parts fixed it. still, ugh.
[19:41:06] operations, MediaWiki-Sites, SEO, Patch-For-Review: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1386193 (Krinkle) >>! In T67402#1385964, @MZMcBride wrote: >>>! In T67402#1383833, @gerritbot wrote: >> Change 21...
[20:23:16] PROBLEM - YARN NodeManager Node-State on analytics1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:26:46] RECOVERY - YARN NodeManager Node-State on analytics1016 is OK YARN NodeManager analytics1016.eqiad.wmnet:8041 Node-State: RUNNING
[21:21:59] operations, RESTBase-Cassandra, Services, Patch-For-Review, RESTBase-architecture: put new restbase servers in service - https://phabricator.wikimedia.org/T102015#1386251 (GWicke) During the second bootstrap attempt restbase1009 now suffered (almost) the same disk failure fate as its brethren; on...
[21:24:57] PROBLEM - YARN NodeManager Node-State on analytics1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:31:56] RECOVERY - YARN NodeManager Node-State on analytics1016 is OK YARN NodeManager analytics1016.eqiad.wmnet:8041 Node-State: RUNNING
[21:58:14] (PS1) Thcipriani: Wrap varnishkafka ganglia monitor with has_ganglia [puppet] - https://gerrit.wikimedia.org/r/219775
[22:05:21] operations, MediaWiki-Sites, SEO, Patch-For-Review: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1386273 (BBlack) The only reason I mention rel=canonical first perhaps being clearer is because I'm no longer at...
[22:06:54] (PS2) Thcipriani: Wrap varnishkafka ganglia monitor with has_ganglia [puppet] - https://gerrit.wikimedia.org/r/219775 (https://phabricator.wikimedia.org/T103278)
[22:10:27] (CR) BBlack: [C: -1] "Can you do the same for the one in role/manifests/cache/kafka/webrequest.pp as well while you're at it?" [puppet] - https://gerrit.wikimedia.org/r/219775 (https://phabricator.wikimedia.org/T103278) (owner: Thcipriani)
[22:24:10] (PS3) Thcipriani: Wrap varnishkafka ganglia monitor with has_ganglia [puppet] - https://gerrit.wikimedia.org/r/219775 (https://phabricator.wikimedia.org/T103278)
[22:26:17] PROBLEM - YARN NodeManager Node-State on analytics1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:46] RECOVERY - YARN NodeManager Node-State on analytics1016 is OK YARN NodeManager analytics1016.eqiad.wmnet:8041 Node-State: RUNNING
[22:40:14] operations, MediaWiki-Sites, SEO, Patch-For-Review: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1386307 (ori) >>! In T67402#1386193, @Krinkle wrote: > I expect this will require standardisation of sorts that n...
[23:29:17] PROBLEM - puppet last run on mw1106 is CRITICAL Puppet has 6 failures
[23:31:37] PROBLEM - puppet last run on mw2092 is CRITICAL Puppet has 1 failures
[23:49:08] RECOVERY - puppet last run on mw2092 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures