[00:01:42] woopsie :-)
[00:01:43] !log initiate dd replicate from labstore1001 tools snapshot to labstore1002 lv of tools-04032016
[00:01:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:03:20] (PS6) BBlack: varnish: get rid of backend_options [puppet] - https://gerrit.wikimedia.org/r/275117 (https://phabricator.wikimedia.org/T127484)
[00:03:22] (PS6) BBlack: varnish: allow director backends to be single-value again [puppet] - https://gerrit.wikimedia.org/r/275118 (https://phabricator.wikimedia.org/T127484)
[00:03:24] (PS6) BBlack: r::c::config: remove has_ganglia [puppet] - https://gerrit.wikimedia.org/r/275119 (https://phabricator.wikimedia.org/T127484)
[00:03:26] (PS6) BBlack: r::c::config: remove parsoid (unused) [puppet] - https://gerrit.wikimedia.org/r/275121 (https://phabricator.wikimedia.org/T127484)
[00:03:29] (PS6) BBlack: r::c::config: remove lvs::configuration include [puppet] - https://gerrit.wikimedia.org/r/275120 (https://phabricator.wikimedia.org/T127484)
[00:03:30] (PS7) BBlack: r::c::config: move to hieradata [puppet] - https://gerrit.wikimedia.org/r/275123 (https://phabricator.wikimedia.org/T127484)
[00:03:32] (PS6) BBlack: r::c::config: add restbase @ codfw [puppet] - https://gerrit.wikimedia.org/r/275122 (https://phabricator.wikimedia.org/T127484)
[00:03:35] (PS7) BBlack: varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484)
[00:03:44] and with that, I think I'm done spamming CI for the night :P
[00:09:20] :-)
[00:15:12] Operations, ops-eqiad, Labs: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2090540 (chasemp)
[00:16:47] (CR) BBlack: [C: 1] "Compiler checks out ok:" [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484) (owner: BBlack)
[00:19:57] Operations, Traffic, Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016: Refactor VCL for applayer datacenter-switching - https://phabricator.wikimedia.org/T127484#2090577 (BBlack) Status update: The changes uploaded so far to the [[ https://gerrit.wikimedia.org/r/#/q/project:operations...
[00:22:27] (PS1) Dzahn: ganglia: temp. put aggregator on alsafi [puppet] - https://gerrit.wikimedia.org/r/275137
[00:24:20] (CR) Dzahn: [C: 2] ganglia: temp. put aggregator on alsafi [puppet] - https://gerrit.wikimedia.org/r/275137 (owner: Dzahn)
[00:27:07] Operations, MobileFrontend, Traffic, MW-1.27-release, and 7 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2090607 (Liuxinyu970226)
[00:28:49] (PS1) Dduvall: labs: Database server to support Program Dashboard [puppet] - https://gerrit.wikimedia.org/r/275138 (https://phabricator.wikimedia.org/T127105)
[00:39:07] (PS1) Dzahn: ganglia: don't try to use upstart on jessie [puppet] - https://gerrit.wikimedia.org/r/275139 (https://phabricator.wikimedia.org/T123674)
[00:40:05] Operations, MobileFrontend, Traffic, MW-1.27-release, and 7 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2090639 (Jdlrobson) A revert firstly cannot be...
[00:40:24] (CR) jenkins-bot: [V: -1] ganglia: don't try to use upstart on jessie [puppet] - https://gerrit.wikimedia.org/r/275139 (https://phabricator.wikimedia.org/T123674) (owner: Dzahn)
[00:41:48] (PS2) Dzahn: ganglia: don't try to use upstart on jessie [puppet] - https://gerrit.wikimedia.org/r/275139 (https://phabricator.wikimedia.org/T123674)
[00:52:35] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/1958/" [puppet] - https://gerrit.wikimedia.org/r/275139 (https://phabricator.wikimedia.org/T123674) (owner: Dzahn)
[00:52:42] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: puppet fail
[00:55:57] Operations, MobileFrontend, Traffic, MW-1.27-release, and 7 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2090667 (Jdlrobson) Note that https://gerrit.wi...
[00:56:33] ACKNOWLEDGEMENT - puppet last run on alsafi is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn daniel/ganglia/jessie
[00:59:15] (PS2) Dduvall: labs: Database server to support Program Dashboard [puppet] - https://gerrit.wikimedia.org/r/275138 (https://phabricator.wikimedia.org/T127105)
[01:01:58] (PS3) Dduvall: labs: Database server to support Program Dashboard [puppet] - https://gerrit.wikimedia.org/r/275138 (https://phabricator.wikimedia.org/T127105)
[01:10:16] (PS4) Dduvall: labs: Database server to support Program Dashboard [puppet] - https://gerrit.wikimedia.org/r/275138 (https://phabricator.wikimedia.org/T127105)
[01:19:01] (PS1) Dzahn: ganglia: add unit file for systemd on jessie [puppet] - https://gerrit.wikimedia.org/r/275146 (https://phabricator.wikimedia.org/T123674)
[01:20:00] Operations: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2090718 (Dzahn) https://gerrit.wikimedia.org/r/#/c/275139 https://gerrit.wikimedia.org/r/275146 (WIP)
[01:20:02] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[01:22:43] (PS2) Dzahn: ganglia: add unit file for systemd on jessie [puppet] - https://gerrit.wikimedia.org/r/275146 (https://phabricator.wikimedia.org/T123674)
[01:30:57] (PS1) Krinkle: wikitech: Remove confusing "Alias /w" that breaks static files [puppet] - https://gerrit.wikimedia.org/r/275147 (https://phabricator.wikimedia.org/T128747)
[01:31:13] ori: Could you merge/deploy ^ for me?
[01:31:40] Not've been able to test since I got no root on silver. But worked on mw1017
[01:31:51] Though this special in some way, so revert hammer standing by
[01:42:51] (PS1) Dereckson: Ateneo de Manila University workshops throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/275149 (https://phabricator.wikimedia.org/T124284)
[01:46:07] Operations, Discovery, Wikimedia-Logstash, Discovery-Search-Sprint, and 2 others: Upgrade ElasticSearch to 1.7.5 - https://phabricator.wikimedia.org/T122697#2090768 (Deskana) Open>Resolved Yay!
[02:01:53] RECOVERY - Last backup of the tools filesystem on labstore1001 is OK: OK - Last run for unit replicate-tools was successful
[02:09:27] o/
[02:10:37] Operations, Labs, wikitech.wikimedia.org, Patch-For-Review: Wikitechwiki has 4xx responses to requests for some static assets inc. poweredby_mediawiki_88x31.png and WikiEditor's button-sprite.svg - https://phabricator.wikimedia.org/T128747#2090838 (Krinkle) >>! In T128747#2089235, @Krinkle wrote: >...
[02:24:56] Hey... can anyone help me find dberror.log? Don't see it on fluorine as per https://wikitech.wikimedia.org/wiki/Logs
[02:25:43] Hmm maybe it's now wfLogDBError.log
[02:28:26] https://phabricator.wikimedia.org/T128869
[02:30:01] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 13m 51s)
[02:30:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:37:44] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Mar 5 02:37:44 UTC 2016 (duration 7m 43s)
[02:37:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:39:18] AndyRussG, what is logging to dberror.log?
[02:39:56] Krenair: no I mean, I was just looking at stuff mentioned https://wikitech.wikimedia.org/wiki/Logs
[02:40:05] ...for some log that might be worth looking at
[02:40:17] oh, might be outdated
[02:40:19] ^ https://phabricator.wikimedia.org/T128869
[02:40:26] Yeah heh so it seems :)
[02:40:52] Here's an approximation of the message reported: "To avoid excessive replication lag, your transaction has been cancelled after exceeding the 6 second timeout (6.05 s). If you are changing many items, try multiple smaller transactions."
[02:41:20] Sounds pretty clear, I do know what to look for in the code, but I'd also like to find a more concrete record
[02:41:24] an approximation?
[02:41:45] Person reporting closed browser tab, so that's not the message word for word
[02:41:51] But it's prob'ly pretty close :)
[02:42:03] ah
[02:43:39] However the errors did stop when he only updated one item at a time, and we know that the code that was running is the absolute opposite of optimized
[02:44:07] Just dunno where to look
[02:50:41] Krenair: well I found something on oxygen.eqiad.wmnet:/srv/log/webrequest/5xx.json
[02:50:54] not much info per se
[02:51:02] yeah, won't find much there afaik
[02:51:44] Well at least the exact times
[02:52:07] actually, that probably shows you the requested url, right?
[02:53:37] you might be able to match that against the mw error logs on fluorine
[02:54:02] Krenair: yeah I know the URL infact
[02:54:22] (PS1) Dereckson: Namespace configuration on he.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/275154 (https://phabricator.wikimedia.org/T127654)
[02:54:29] Also sez which hosts
[02:55:59] Though the real error was in DB interaction
[02:57:48] Krenair: any idea which logs there, specifically?
[02:58:14] (PS1) Dereckson: Namespace configuration on wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/275155
[02:58:22] exception log keeps request URLs
[02:59:51] Krenair: Ah gotcha, found something! cool thx :)
[03:00:18] nice
[03:00:27] To avoid creating high replication lag, this transaction was aborted because the write duration (6.1663353443146) exceeded the 6 seconds limit
[03:01:02] RECOVERY - Last backup of the others filesystem on labstore1001 is OK: OK - Last run for unit replicate-others was successful
[03:02:01] cff71ab3
[03:03:26] Krenair: hmmm yeah whatdoesthat mean?
[03:03:33] it's just the ID
[03:03:41] Of what on what?
[03:03:52] the exception
[03:04:17] On the DB end you mean?
[03:04:50] Or in MW?
[03:04:52] I don't think the DB gave an error itself
[03:05:16] Well no
[03:06:03] MW aborted the transaction
[03:06:04] Ah K yes I see
[03:06:05] Yeah
[03:06:10] Gotcha
[03:06:38] * AndyRussG does something like a facepalm to say, rrrg yes I should have seen that ;p
[03:07:36] Is there something more that you think I could find out using the ID?
[03:07:40] (PS2) Dereckson: Ateneo de Manila University workshops throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/275149 (https://phabricator.wikimedia.org/T128847)
[03:07:56] no
[03:08:32] (PS2) Dereckson: Namespace configuration on wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/275155 (https://phabricator.wikimedia.org/T128354)
[03:09:40] Krenair: K.... Mmmm so yeah, I guess this just confirms the obvious, that we have to optimize (or make less heinous) the code that we knew was running, and that we know is heinous
[03:09:48] Really great to see some hard logs about it tho
[03:09:52] from the bug: "However, it seems that disabling a lot of campaigns at once hasn't caused problems."
[03:10:08] Krenair: yep
[03:10:28] not much campaign disabling is happening at the moment?
[03:10:39] or it really is behaving differently?
[03:10:42] It was a Fundraising maintenance window
[03:10:52] FR campaigns were disabled then enabled en masse
[03:10:56] It happens every now and again
[03:11:11] so it had issues during enabling, but not during disabling?
[03:11:13] Sometimes unplanned disabling happens...
[03:11:19] yeah
[03:11:42] Dunno, could just be because it was at a different time of day and loads were different
[03:11:59] when enabling campaigns, do you also update/insert lots of other data?
[03:12:13] I don't know anything about the workflow for campaigns, not being a CN user
[03:12:22] also tend to*
[03:17:11] Hmmm looking at the code now, not sure if there's a huge difference in enabling or disabling
[03:20:41] that code could definitely be improved but I wouldn't expect it to have performance issues like this
[03:21:18] that table only has 1k rows :|
[03:23:20] https://github.com/wikimedia/mediawiki-extensions-CentralNotice/blob/8b2b0ca60091370f0be3b4918489083027249124/special/SpecialCentralNotice.php#L184-L194
[03:23:36] Krenair: ah OK that's important to know!
[03:23:57] yeah, that's the code I was looking at
[03:24:27] I just subscribed u to the task, pls feel free to unsubscribe if u prefer :)
[03:25:22] Krenair: I mean, it updates them all one at a time
[03:27:42] would like to see the query it's executing that times out
[03:28:39] Why does WebRequest::getArray default to null instead of []?
[03:32:01] Humm no idea
[03:33:52] Maybe it's somehow doing more writes than it should
[03:35:18] The main thing we should do with things in that whole method, is to check if a value needs changing before updating
[03:35:43] Shouldn't have been done the way it is, IMHO
[03:41:13] Basically it's updating several properties for every row unnecessarily on every post.
[03:43:12] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 61.90% of data above the critical threshold [5000000.0]
[03:45:12] I still suspect there's something non-obvious going on here
[03:46:07] Hmmm sounds plausible
[03:46:21] Krenair: OK if I paraphras eyou in the Phab task?
[03:46:30] go for it
[03:46:43] * Krenair sleeps
[03:47:49] Krenair: cya!
[03:49:45] Krenair: thx much!
[03:50:22] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[04:01:52] RECOVERY - Last backup of the maps filesystem on labstore1001 is OK: OK - Last run for unit replicate-maps was successful
[05:18:21] (PS1) Andrew Bogott: Monkeypatch Horizon to simplify the instance-creation panel. [puppet] - https://gerrit.wikimedia.org/r/275156
[05:19:45] (CR) jenkins-bot: [V: -1] Monkeypatch Horizon to simplify the instance-creation panel. [puppet] - https://gerrit.wikimedia.org/r/275156 (owner: Andrew Bogott)
[06:29:51] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: puppet fail
[06:58:41] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[08:34:01] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[08:35:22] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[10:57:02] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 61.54% of data above the critical threshold [5000000.0]
[11:12:22] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[11:14:15] !log Data trasnfer completed during the night, (re)starting MySQL on es200[124] and es201[123] T127330
[11:14:16] T127330: Migration from es2001-es2010 to es2011-es2019 - https://phabricator.wikimedia.org/T127330
[11:14:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:32:09] !log nodetool stop COMPACTION / CLEANUP on restbase1006
[11:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:43:01] <_joe_> Italians at work
[11:43:03] <_joe_> :P
[11:43:17] <_joe_> it's saturday, you know :P
[11:55:02] hehe indeed, I'm about to go _joe_
[11:58:57] yeah!
[12:08:12] PROBLEM - HHVM rendering on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:09:13] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:43:18] (PS1) Volans: Repool es200[124] after data migration [mediawiki-config] - https://gerrit.wikimedia.org/r/275166 (https://phabricator.wikimedia.org/T127330)
[12:44:26] (CR) Volans: "I'll merge it on Monday" [mediawiki-config] - https://gerrit.wikimedia.org/r/275166 (https://phabricator.wikimedia.org/T127330) (owner: Volans)
[12:44:34] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[12:44:52] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 1.59 ms
[13:29:50] !log hhvm restarted on mw1025
[13:29:52] RECOVERY - HHVM rendering on mw1025 is OK: HTTP OK: HTTP/1.1 200 OK - 67429 bytes in 0.256 second response time
[13:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:31:02] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.041 second response time
[13:42:38] (PS3) Tim Landscheidt: Tools: Remove obsolete classes [puppet] - https://gerrit.wikimedia.org/r/272441
[13:50:34] (CR) Tim Landscheidt: [C: -1] [ssh, WIP] allow login from tools-login [puppet] - https://gerrit.wikimedia.org/r/220214 (https://phabricator.wikimedia.org/T103552) (owner: Merlijn van Deen)
[13:55:32] Operations, Services, hardware-requests: Hardware request for SCA and SCB in codfw - https://phabricator.wikimedia.org/T128475#2091153 (Ricordisamoa) >>! In T128475#2089514, @RobH wrote: > We don't disclose individual servers prices. The overall budget is shared via the typical annual reports and such...
[14:31:03] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 56.00% of data above the critical threshold [5000000.0]
[14:31:55] (CR) Luke081515: [C: 1] Enable assignment of 'accountcreator' for maiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/270897 (https://phabricator.wikimedia.org/T126950) (owner: Pmlineditor)
[14:46:11] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[15:18:28] Blocked-on-Operations, RESTBase: Long-term graphite aggregation for restbase.requests.varnish_requests API request metrics not working - https://phabricator.wikimedia.org/T121580#2091220 (GWicke) p:Normal>High
[15:31:42] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [5000000.0]
[15:42:53] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[16:13:23] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail
[16:20:53] Operations, Labs, wikitech.wikimedia.org, Patch-For-Review: Wikitechwiki has 4xx responses to requests for some static assets inc. poweredby_mediawiki_88x31.png and WikiEditor's button-sprite.svg - https://phabricator.wikimedia.org/T128747#2084612 (scfc) When I "Edit Source" a page, the buttons hav...
[16:41:42] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:45:56] Operations, media-storage: Images not showing up at Commons - https://phabricator.wikimedia.org/T128961#2091362 (Glaisher)
[17:13:17] (CR) JanZerebecki: [C: -1] Add caching headers for nginx (2 comments) [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) (owner: Smalyshev)
[17:21:17] (CR) Smalyshev: Add caching headers for nginx (2 comments) [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) (owner: Smalyshev)
[17:27:03] Operations, media-storage: Images not showing up at Commons - https://phabricator.wikimedia.org/T128961#2091362 (Platonides) This is a client of PlusNet (a subsidiary of BT, but operationally independent). Yet, when accessing irc or http://whatismyipaddress.com/ he connects from a dynamic IP of AS6871 Plu...
[17:48:32] (CR) JanZerebecki: Add caching headers for nginx (1 comment) [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) (owner: Smalyshev)
[18:23:13] Operations, Traffic, domains: Register nlwikipedia.org to prevent squatting - https://phabricator.wikimedia.org/T128968#2091529 (Multichill)
[19:03:15] (PS1) ArielGlenn: datasets: toss index.html files copied from wikitech anywhere in tree [puppet] - https://gerrit.wikimedia.org/r/275202
[19:04:56] (CR) ArielGlenn: [C: 2] datasets: toss index.html files copied from wikitech anywhere in tree [puppet] - https://gerrit.wikimedia.org/r/275202 (owner: ArielGlenn)
[19:08:22] PROBLEM - puppet last run on mw1121 is CRITICAL: CRITICAL: Puppet has 1 failures
[19:10:44] Operations, Datasets-General-or-Unknown, Dumps-Generation, Labs, and 2 others: copy wikitech dumps to dumps server ? - https://phabricator.wikimedia.org/T128680#2091569 (ArielGlenn) The copy works but the cleanup of the autogenerated index.html files, also copied, did not. Patch is in, will check...
[19:11:00] (CR) Tim Landscheidt: [C: -1] [WIP] Unfinished db table check tools References: T104459 (1 comment) [software] - https://gerrit.wikimedia.org/r/256231 (owner: Jcrespo)
[19:35:42] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:55:36] Operations, MobileFrontend, Traffic, MW-1.27-release, and 7 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2091623 (Sjoerddebruin)
[20:06:32] PROBLEM - puppet last run on mw2124 is CRITICAL: CRITICAL: puppet fail
[20:35:33] RECOVERY - puppet last run on mw2124 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:53:10] (PS1) ArielGlenn: turn off output messages in central auth dumps [puppet] - https://gerrit.wikimedia.org/r/275209
[20:54:20] (CR) ArielGlenn: [C: 2] turn off output messages in central auth dumps [puppet] - https://gerrit.wikimedia.org/r/275209 (owner: ArielGlenn)
[21:01:32] (PS1) ArielGlenn: datasets: make wget quieter in cron job, clean up old wikitech dumps [puppet] - https://gerrit.wikimedia.org/r/275210
[21:02:56] (CR) ArielGlenn: [C: 2] datasets: make wget quieter in cron job, clean up old wikitech dumps [puppet] - https://gerrit.wikimedia.org/r/275210 (owner: ArielGlenn)
[21:08:24] PROBLEM - puppet last run on wtp2013 is CRITICAL: CRITICAL: puppet fail
[21:16:26] (PS1) ArielGlenn: dumps: add wikitech dumps to the 'other' index html page for downloaders [puppet] - https://gerrit.wikimedia.org/r/275215
[21:17:26] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: puppet fail
[21:17:32] (CR) ArielGlenn: [C: 2] dumps: add wikitech dumps to the 'other' index html page for downloaders [puppet] - https://gerrit.wikimedia.org/r/275215 (owner: ArielGlenn)
[21:18:06] Operations, Traffic, domains: Register nlwikipedia.org to prevent squatting - https://phabricator.wikimedia.org/T128968#2091706 (Mbch331)
[21:20:13] Operations, Datasets-General-or-Unknown, Dumps-Generation, Labs, and 2 others: copy wikitech dumps to dumps server ? - https://phabricator.wikimedia.org/T128680#2091707 (ArielGlenn) https://gerrit.wikimedia.org/r/#/c/275215/ added the dumps to the 'other' index page where they are now available.
[21:35:32] RECOVERY - puppet last run on wtp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:44:41] RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[21:48:51] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 27.27% of data above the critical threshold [100000000.0]
[22:53:53] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]