[00:00:06] just on time.
[00:00:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:00:06] T181107: Deploy Reading Lists Service to production - https://phabricator.wikimedia.org/T181107
[00:00:07] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Evening SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171208T0000).
[00:00:07] No GERRIT patches in the queue for this window AFAICS.
[00:04:14] !log cp4026 - restart varnish backend, mailbox lag
[00:04:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:47] (PS1) Dzahn: mariadb::labs_deprecated: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396291 (https://phabricator.wikimedia.org/T177225)
[00:18:43] (PS2) Dzahn: mariadb::labs_deprecated: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396291 (https://phabricator.wikimedia.org/T177225)
[00:25:20] (PS3) Dzahn: mariadb::labs_deprecated: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396291 (https://phabricator.wikimedia.org/T177225)
[00:26:31] no_justification: is the symlink warning in https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_4:_synchronize_the_changes_to_the_cluster still relevant?
[00:27:38] For that particular file, no.
[00:27:38] But for symlinks in general, yes
[00:28:02] (CR) Dzahn: [C: 2] mariadb::labs_deprecated: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396291 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:28:03] PrivateSettings and StartProfiler no longer have that pitfall
[00:32:13] (CR) Dzahn: "..no issues either.. can't reproduce" [puppet] - https://gerrit.wikimedia.org/r/396291 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:37:02] (PS1) Dzahn: labsdb: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396292 (https://phabricator.wikimedia.org/T177225)
[00:38:42] (PS2) Dzahn: labsdb: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396292 (https://phabricator.wikimedia.org/T177225)
[00:39:05] (PS3) Dzahn: labsdb: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396292 (https://phabricator.wikimedia.org/T177225)
[00:40:11] (CR) Dzahn: [C: 2] labsdb: remove ganglia [puppet] - https://gerrit.wikimedia.org/r/396292 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:45:39] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:45:55] (PS1) Dzahn: labsdb::slave: keep ganglia because postgresql [puppet] - https://gerrit.wikimedia.org/r/396294 (https://phabricator.wikimedia.org/T177225)
[00:46:15] (CR) Dzahn: "ok on all except one: 1004 because it is a postgresql, keeping that one" [puppet] - https://gerrit.wikimedia.org/r/396292 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:47:02] (PS2) Dzahn: labsdb::slave: keep ganglia because postgresql [puppet] - https://gerrit.wikimedia.org/r/396294 (https://phabricator.wikimedia.org/T177225)
[00:47:09] (CR) Dzahn: [C: 2] labsdb::slave: keep ganglia because postgresql [puppet] - https://gerrit.wikimedia.org/r/396294 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:50:39] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:52:23] (Draft10) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[00:59:38] Operations, Traffic, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3821855 (Dzahn) @bblack ^ So i think it's basically a request to add a new domain to the list of canonical domains. Any thoughts how we sho...
[02:14:53] (CR) Legoktm: [C: 1] "This is ready to be deployed now!" [mediawiki-config] - https://gerrit.wikimedia.org/r/394913 (https://phabricator.wikimedia.org/T181535) (owner: Legoktm)
[02:17:10] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[02:17:19] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0
[02:21:08] (PS10) TerraCodes: Remove single editor tab for plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/393121 (https://phabricator.wikimedia.org/T181045)
[02:21:19] (PS16) TerraCodes: Add loginwiki and wikidata to $wgLocalVirtualHosts [mediawiki-config] - https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302)
[02:35:30] (PS22) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[02:40:15] (PS23) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[02:42:28] (PS24) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[03:07:29] PROBLEM - HHVM rendering on mw2147 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:08:19] RECOVERY - HHVM rendering on mw2147 is OK: HTTP OK: HTTP/1.1 200 OK - 79611 bytes in 0.301 second response time
[03:24:29] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 783.14 seconds
[03:46:30] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 294.37 seconds
[04:27:49] Operations, Deployments, Beta-Cluster-reproducible, HHVM, and 2 others: Switch mwscript from Zend PHP5 to default php alternative (e.g. HHVM or PHP7) - https://phabricator.wikimedia.org/T146285#3822069 (tstarling)
[04:44:39] RECOVERY - MegaRAID on db1068 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[04:56:18] !log scholarships: updated db schema for 2018 cycle (T181072)
[04:56:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:56:31] T181072: Updates to scholarship application form for Wikimania 2018 - https://phabricator.wikimedia.org/T181072
[06:35:12] Operations, ops-eqiad, DBA: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T182288#3822167 (Marostegui) Open>Resolved a:Cmjohnson All good now - thanks!! ``` root@db1068:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name...
[06:36:20] (PS1) Marostegui: db-eqiad.php: Fully pool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/396304 (https://phabricator.wikimedia.org/T178359)
[06:40:34] (CR) Marostegui: [C: 2] db-eqiad.php: Fully pool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/396304 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[06:41:59] (Merged) jenkins-bot: db-eqiad.php: Fully pool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/396304 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[06:42:12] (CR) jenkins-bot: db-eqiad.php: Fully pool db1099:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/396304 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[06:43:33] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully pool db1099:3311 - T178359 (duration: 00m 55s)
[06:43:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:45] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359
[06:52:30] !log Fix labsdb1004 replication broken
[06:52:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:31] Operations, ops-eqsin, netops: Setup eqsin RIPE Atlas anchor - https://phabricator.wikimedia.org/T179042#3822193 (RobH) a:RobH>None
[07:17:46] Operations, ops-eqsin, netops: Setup eqsin RIPE Atlas anchor - https://phabricator.wikimedia.org/T179042#3711364 (RobH)
[07:35:43] Operations, ops-eqsin, netops: Setup eqsin RIPE Atlas anchor - https://phabricator.wikimedia.org/T179042#3822199 (ayounsi) a:faidon
[08:19:17] (CR) Alexandros Kosiaris: "As you can see jenkins doesn't like the approach. I have to agree with jenkins on this one. But I am not sure what kind of feedback to pro" [puppet] - https://gerrit.wikimedia.org/r/396072 (https://phabricator.wikimedia.org/T182304) (owner: Gehel)
[08:28:37] akosiaris: ^ the feedback you should provide: "don't be lazy, and get started on that refactoring" !
[08:29:20] :D
[08:29:36] it's not an easy refactoring btw
[08:29:49] that class existed before hiera was introduced IIRC
[08:30:05] and it's a reusable hiera store in essence
[08:30:20] no, that's why I'm not so keen on transforming a very simple change into a major rewrite...
[08:30:29] but it is probably worth it in the longer term...
[08:32:56] not today though..
[08:35:40] RECOVERY - SSH on ganeti1006 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[10:08:55] (CR) Elukey: "I like the idea a lot, thanks a lot for the accurate information! I'd prefer to have this feature controlled by an on/off switch rather tha" [puppet/cdh] - https://gerrit.wikimedia.org/r/395923 (https://phabricator.wikimedia.org/T182276) (owner: EBernhardson)
[10:20:27] (PS1) Lucas Werkmeister (WMDE): Remove detail from wbcheckconstraints API response [mediawiki-config] - https://gerrit.wikimedia.org/r/396311 (https://phabricator.wikimedia.org/T180614)
[10:33:42] Operations, ops-eqiad: Decomission eventlog2001 - https://phabricator.wikimedia.org/T182397#3822380 (elukey) p:Triage>Normal
[10:34:07] Operations, Analytics, Analytics-EventLogging, Icinga, Patch-For-Review: eventlog2001 - CRITICAL status of defined EventLogging jobs - https://phabricator.wikimedia.org/T119930#3822391 (elukey) Thanks a lot @Dzahn, opened https://phabricator.wikimedia.org/T182397
[10:34:30] Operations, ops-eqiad, Analytics: Decomission eventlog2001 - https://phabricator.wikimedia.org/T182397#3822392 (elukey)
[10:43:31] (CR) Joal: "Quick nit-picky comment inside" (1 comment) [software/druid_exporter] - https://gerrit.wikimedia.org/r/396051 (owner: Elukey)
[10:49:14] (PS8) Elukey: List of fixes: [software/druid_exporter] - https://gerrit.wikimedia.org/r/396051
[10:49:54] Operations, MediaWiki-Platform-Team, TechCom-RfC, HHVM, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3623015 (daniel) @tstarling According to TechCom notes, this was to enter Last Call on November 22, but that was never announced. What's the status?
[10:50:44] (PS9) Elukey: List of fixes: [software/druid_exporter] - https://gerrit.wikimedia.org/r/396051
[11:04:01] (PS10) Elukey: List of fixes: [software/druid_exporter] - https://gerrit.wikimedia.org/r/396051
[11:07:11] (PS11) Elukey: List of fixes: [software/druid_exporter] - https://gerrit.wikimedia.org/r/396051
[11:21:02] (PS12) Elukey: List of fixes: [software/druid_exporter] - https://gerrit.wikimedia.org/r/396051
[11:23:28] (PS1) EddieGP: Show HTML summaries on cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/396318 (https://phabricator.wikimedia.org/T182321)
[11:25:09] (CR) Elukey: [V: 2 C: 2] List of fixes: [software/druid_exporter] - https://gerrit.wikimedia.org/r/396051 (owner: Elukey)
[11:28:47] (CR) Phuedx: [C: -1] "Blocked until T179875 is resolved." [mediawiki-config] - https://gerrit.wikimedia.org/r/396318 (https://phabricator.wikimedia.org/T182321) (owner: EddieGP)
[11:39:41] !log upload prometheus-druid-exporter 0.6 to stretch/jessie wikimedia
[11:39:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:42:48] (PS1) Joal: Add clickstream rsync for external visibility [puppet] - https://gerrit.wikimedia.org/r/396324
[11:42:58] elukey: --^ if you have a minute
[11:43:22] (CR) jerkins-bot: [V: -1] Add clickstream rsync for external visibility [puppet] - https://gerrit.wikimedia.org/r/396324 (owner: Joal)
[11:43:28] :(
[11:44:21] indentation, sorry
[11:44:36] (PS2) Joal: Add clickstream rsync for external visibility [puppet] - https://gerrit.wikimedia.org/r/396324
[11:45:17] (CR) jerkins-bot: [V: -1] Add clickstream rsync for external visibility [puppet] - https://gerrit.wikimedia.org/r/396324 (owner: Joal)
[11:45:38] joal: should we also ask apergos?
[11:45:56] !log updated prometheus-druid-exporter on druid* to 0.6
[11:46:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:14] elukey: hm, why?
[11:46:31] elukey: This dataset is unrelated to xml dumps - Why not, but not sure why :)
[11:47:12] (PS3) Joal: Add clickstream rsync for external visibility [puppet] - https://gerrit.wikimedia.org/r/396324 (https://phabricator.wikimedia.org/T175844)
[11:47:47] joal: sure, no idea about how those are managed, this is why I was asking :)
[11:48:24] yes you should
[11:48:56] how big are these files? how many will we want to keep?
[11:49:35] joal:
[11:50:06] apergos: As of now, ~500G
[11:50:15] no, 500M sorry - every month
[11:50:22] apergos: --^
[11:50:29] how many months are we expected to keep?
[11:50:43] and I think we're gonna be willing to keep them kinda indefinitely, meaning a long time
[11:50:57] same as pageviews
[11:54:26] the difference is that pageviews is only 36G a month
[11:54:33] .5T is rather a lot
[11:54:56] apergos: we're thinking 500M a month
[11:55:01] oh
[11:55:05] so way smaller than pageview
[11:55:08] sorry, I read the 50-G
[11:55:10] 500
[11:55:20] and my eyes popped out a bit :-D
[11:55:26] yes apergos - corrected that, but wrong message was still sent :)
[11:55:27] that's fine then
[11:55:29] yep
[11:55:31] heh
[11:55:41] Thanks apergos :)
[11:55:43] yw
[11:55:57] elukey: We have apergos approval ! Yay !
[11:55:58] at some point I'm going to refactor these so it's one job retrieving them all
[11:56:01] but that day is not today
[11:56:27] no problem apergos - ping us if you wish, we're interested in following that obviously :)
[11:56:55] of course
[11:57:53] I need to get food, back in a little while (30 mins?)
[11:58:12] _joe_ hashar managed to get to the bottom of the long API post requests from yesterday ( i believe ) https://phabricator.wikimedia.org/T182322#3822596
[12:00:38] (CR) EddieGP: "Oops, I overlooked the blocking task. Thanks for pointing that out, I've reverted my edit on [[wikitech:Deployments]] that scheduled it for S" [mediawiki-config] - https://gerrit.wikimedia.org/r/396318 (https://phabricator.wikimedia.org/T182321) (owner: EddieGP)
[12:06:28] (PS4) Joal: Add clickstream rsync for external visibility [puppet] - https://gerrit.wikimedia.org/r/396324 (https://phabricator.wikimedia.org/T175844)
[12:34:11] apergos/addshore (not sure who is on ops duty): Commons is loading really really slow. I also asked other users who confirmed it. Sometimes it takes 8-15 seconds to get a response from the server.
[12:35:26] Steinsplitter: it's in the topic :) although they are not in the channel? O_o
[12:36:12] the topic is outdated?
[12:36:27] perhaps
[12:36:44] hmm, commons seems speedy / fine from here
[12:36:50] you access from EU?
[12:37:02] yup
[12:38:21] it's fine for me as well (also eu)
[12:38:57] people aren't on 24x7, Steinsplitter: the topic is probably accurate but he's in a different tz than we are
[12:39:35] Steinsplitter: is it just page loads or something else specific?
[12:39:43] seems ok (?) https://usercontent.irccloud-cdn.com/file/0YagvEwn/Unbenannt.JPG
[12:39:47] is it page loads after doing another action?
[12:40:06] I assume this is as a logged in user
[12:40:11] yes
[12:40:20] same here, I was logged in
[12:40:47] okay, thanks^^
[12:41:09] Steinsplitter: it's not after doing another action then? just on page load?
[12:41:34] on page load & api calls.
[12:43:14] could be related to https://phabricator.wikimedia.org/T182322 but that only happens in a particular set of circumstances, a POST and then a GET
[12:43:15] brb
[12:46:25] addshore: thanks, seems to be the prob since a few scripts use exactly that function in a few cases. thanks.
[12:54:59] (CR) Brian Wolff: "Just a note that changes to apache short url config need to be merged in puppet before $wgVariantArticlePath is set." [mediawiki-config] - https://gerrit.wikimedia.org/r/396282 (https://phabricator.wikimedia.org/T23582) (owner: Tjones)
[13:10:39] Operations, ORES, Scoring-platform-team: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3822730 (akosiaris)
[13:10:42] Operations, ORES, Scoring-platform-team, Patch-For-Review: rack/setup/install ores2001-2009 - https://phabricator.wikimedia.org/T165170#3822729 (akosiaris)
[13:17:05] Steinsplitter: mind writing a quick comment on the ticket perhaps with what exactly you are doing / what gadgets? :)
[13:17:34] addshore: busy now, will do later :)
[13:24:18] (PS2) Brian Wolff: Updates to enable transliteration for crhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/396282 (https://phabricator.wikimedia.org/T23582) (owner: Tjones)
[13:26:53] (CR) Brian Wolff: Updates to enable transliteration for crhwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/396282 (https://phabricator.wikimedia.org/T23582) (owner: Tjones)
[13:34:33] (CR) Brian Wolff: "There are also rewrite rules in /Users/bawolff/src/puppet/modules/mediawiki/files/apache/beta/sites/wikipedia.conf (and wiktionary.conf, a" [puppet] - https://gerrit.wikimedia.org/r/396283 (https://phabricator.wikimedia.org/T23582) (owner: Tjones)
[13:35:46] (CR) Brian Wolff: "Obviously, in my last file path I posted, you should ignore the /Users/bawolff/src/puppet prefix" [puppet] - https://gerrit.wikimedia.org/r/396283 (https://phabricator.wikimedia.org/T23582) (owner: Tjones)
[13:50:19] special:undelete moved to OOUI, even on monobook, is this planned?
[13:54:02] Vito: yes
[13:54:12] Vito: but we are going to partially revert later today
[13:54:33] err, i mean in mw. revert won't hit wikis until monday
[13:54:57] bawolff: no chance to use monobook without ooui?
[13:55:03] T182398
[13:55:03] T182398: Special:Undelete contains egregious white space after OOUI update - https://phabricator.wikimedia.org/T182398
[13:55:33] Umm, I don't think we change between ooui vs not ooui based on skin anywhere else
[13:56:07] Isn't monobook supposed to change the look of ooui so it fits in with monobook?
[13:56:33] We're planning to revert the text area due to the width thing, but the current plan is to keep the buttons ooui
[13:58:47] bawolff: I personally don't like ooui, for example I don't like special:undelete text to be pale grey
[13:59:37] I liked ooui on prefs sadly it was error heaven, and was reverted :/
[13:59:38] same for the special:block dropdown which I liked to use only via tab button
[13:59:40] RECOVERY - Disk space on stat1004 is OK: DISK OK
[14:00:31] Vito: Well I can understand that, I think that's more a question of what our broader direction for mediawiki's front end interface is
[14:01:12] Vito: which I have no idea about, and I'd have to pass the buck on that question to either designers or whoever is responsible for the plans in that area
[14:02:01] My understanding is that the long term plan is to slowly convert everything to ooui, but I don't really know much about the plans in this area of mediawiki
[14:02:43] honestly it seems a (supposed) aesthetic improvement was given precedence over stratified micro-usability fixes
[14:03:54] old editors tend to be conservative about interface, the number of people still using monobook should ring a bell
[14:04:48] I don't see benefits in forcing them to change since we're dealing with plain old simple features needing not so much support
[14:05:39] There's a benefit to consistency in the interface
[14:05:51] But beyond that, I don't really have a horse in the ooui race
[14:08:56] I wonder whether it would be possible to use css to neutralize changes I don't like
[14:09:07] grey of special:undelete for sure
[14:09:36] Hmm, which part is grey
[14:09:45] * bawolff looks at the page again
[14:10:46] deleted text snippet
[14:11:20] So that part is the part that will be reverted
[14:12:45] (CR) Tjones: "Thanks, Brian. I appreciate your help with this!" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/396282 (https://phabricator.wikimedia.org/T23582) (owner: Tjones)
[14:14:23] Vito: Weird, on my local wiki the text is black, but there is css to set it to grey that is overridden by other css to set it to black
[14:14:37] uhm
[14:14:56] oh wait no that's the background color
[14:15:05] Hmm, let me see what it's like on real wikimedia
[14:15:48] (PS3) Tjones: Updates to enable transliteration for crhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/396282 (https://phabricator.wikimedia.org/T23582)
[14:16:09] (PS4) Tjones: Updates to enable transliteration for crhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/396282 (https://phabricator.wikimedia.org/T23582)
[14:18:18] Vito: So the text is only grey on monobook. It is not grey on vector
[14:18:21] Which is kind of odd
[14:18:41] For an actual read only text box (e.g. a protected page you cannot edit) grey text might make sense to show it's disabled, but it certainly doesn't make sense here
[14:18:50] And it's really odd to be grey only in monobook skin
[14:20:10] (PS1) Jon Harald Søby: Set category collation for sewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/396381 (https://phabricator.wikimedia.org/T181503)
[14:21:02] bawolff: definitely!
[14:21:31] I can fix it with my personal css, though it's definitely a dumb solution
[14:22:58] Vito: https://phabricator.wikimedia.org/T182426
[14:23:10] In any case, we are definitely going to revert the textbox change come monday
[14:23:21] (PS3) Tjones: Updates to enable short URLs for transliteration for crhwiki [puppet] - https://gerrit.wikimedia.org/r/396283 (https://phabricator.wikimedia.org/T23582)
[14:24:24] (PS5) Tjones: Updates to enable short URLs for transliteration for crhwiki [puppet] - https://gerrit.wikimedia.org/r/396283 (https://phabricator.wikimedia.org/T23582)
[14:24:41] And that should make the text black again
[14:25:28] but I'll hardly have my tab button-only blocking interface back :/
[14:27:23] Vito: Why doesn't the block interface work with only tabs
[14:27:35] err, first let me retest that on monobook to see if it's a per skin thing ;)
[14:28:11] bawolff: I used to select something from dropdowns, then move to the next one with tab button
[14:28:18] Vito: yeah, i just tested now. Seems entirely possible to use just tabs
[14:28:32] now I have to press return before pressing tab
[14:28:54] if I use arrows to select elements
[14:29:03] Oh i see, you have to press space or tab to mark the one you want
[14:29:56] yeah, and with native widgets, the moment you pressed the arrow key it made a selection
[14:30:03] Well I think we should fix that behaviour
[14:30:22] Do you know if there is an existing bug for this?
[14:30:45] I don't think so
[14:31:05] there's a series of mini-usability stuffs many of us are used to
[14:36:59] !log gehel@tin Started deploy [tilerator/deploy@e52ea1d]: testing new tilerator packaging on maps-test2003
[14:37:04] Vito: https://phabricator.wikimedia.org/T182429
[14:37:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:36] great bawolff!
[14:37:36] thank you!
[14:37:52] Vito: I would encourage you to file bugs about mini-usability stuff. Many filed bugs are ignored, but we ignore 100% of the bugs that we don't know about :)
[14:39:33] !log gehel@tin Finished deploy [tilerator/deploy@e52ea1d]: testing new tilerator packaging on maps-test2003 (duration: 02m 34s)
[14:39:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:10] PROBLEM - tilerator on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6534: Connection refused
[14:40:17] bawolff: I tend to file bugs about "more serious" stuffs, and that's my fault
[14:44:08] Operations, HHVM, User-Elukey: Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy - https://phabricator.wikimedia.org/T177498#3660812 (jhsoby-WMNO)
[14:49:41] (CR) Alexandros Kosiaris: [C: 1] "https://puppet-compiler.wmflabs.org/compiler02/9234/ says this will work. It will lower available uwsgi workers on scb1001 as you can see " [puppet] - https://gerrit.wikimedia.org/r/396055 (https://phabricator.wikimedia.org/T182249) (owner: Awight)
[15:05:05] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003
[15:05:08] (CR) Alexandros Kosiaris: [C: -1] Refactor web workers for ORES (1 comment) [puppet] - https://gerrit.wikimedia.org/r/396064 (https://phabricator.wikimedia.org/T182249) (owner: Halfak)
[15:05:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:47] !log gehel@tin Finished deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003 (duration: 00m 42s)
[15:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:03] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003
[15:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:43] ACKNOWLEDGEMENT - tilerator on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6534: Connection refused Gehel testing new packaging
[15:08:12] !log gehel@tin Finished deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003 (duration: 02m 08s)
[15:08:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:10] RECOVERY - tilerator on maps-test2003 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.090 second response time
[15:28:33] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003
[15:28:37] !log gehel@tin Finished deploy [tilerator/deploy@29d633e]: testing new tilerator packaging on maps-test2003 (duration: 00m 03s)
[15:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:11] !log Fix dbstore1002 s5 replication
[15:35:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:35] (PS1) Herron: puppet: set diadem puppet_major_version 4 [puppet] - https://gerrit.wikimedia.org/r/396411 (https://phabricator.wikimedia.org/T179721)
[15:50:12] (PS1) Ottomata: Open druid coordinator to more networks; superset needs to query it [puppet] - https://gerrit.wikimedia.org/r/396413 (https://phabricator.wikimedia.org/T166689)
[15:54:14] (CR) Ottomata: "https://puppet-compiler.wmflabs.org/compiler02/9235/" [puppet] - https://gerrit.wikimedia.org/r/396413 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[15:54:16] (CR) Ottomata: [C: 2] Open druid coordinator to more networks; superset needs to query it [puppet] - https://gerrit.wikimedia.org/r/396413 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[15:57:47] (CR) Herron: [C: 2] puppet: set diadem puppet_major_version 4 [puppet] - https://gerrit.wikimedia.org/r/396411 (https://phabricator.wikimedia.org/T179721) (owner: Herron)
[15:57:56] (PS2) Herron: puppet: set diadem puppet_major_version 4 [puppet] - https://gerrit.wikimedia.org/r/396411 (https://phabricator.wikimedia.org/T179721)
[15:59:12] Operations, Analytics-Kanban: Allow access to druid public-eqiad cluster ports 8081 from analytics VLAN - https://phabricator.wikimedia.org/T182443#3823299 (Ottomata) p:Triage>Normal
[16:08:45] is there anyone available to jump on a host's out-of-band console on a Just In Case basis?
[16:09:36] basically, I want to disable smart path on a raid controller, which i expect to be non-disruptive, but...Just In Case :)
[16:09:53] Operations, Analytics-Kanban: Allow access to druid public-eqiad cluster ports 8081 from analytics VLAN - https://phabricator.wikimedia.org/T182443#3823331 (Ottomata)
[16:09:54] herron: could you help urandom with that if needed?
[16:10:13] sure which host urandom?
[16:10:19] restbase1010
[16:10:45] herron: i just wanted to make sure there was someone that could, if on the off chance something doesn't go right
[16:11:51] herron: i'll get setup and proceed in a few minutes if that's ok, and with any luck i won't need to bother you :)
[16:12:21] urandom ok good call, I've got a serial connection open now
[16:12:35] sounds good
[16:13:21] thanks :)
[16:14:06] (PS1) Gehel: wdqs: adding "su" directive to log rotation [puppet] - https://gerrit.wikimedia.org/r/396419
[16:14:18] yes, thanks!
[16:15:13] !log shutting down cassandra, restbase1010 - T178177
[16:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:24] T178177: Investigate aberrant Cassandra columnfamily read latency of restbase101{0,2,4} - https://phabricator.wikimedia.org/T178177
[16:18:37] Operations, Analytics-Kanban: Allow access to druid public-eqiad cluster ports 8081 from analytics VLAN - https://phabricator.wikimedia.org/T182443#3823385 (mark) Open>Resolved a:mark Added port 8081 to the existing term for druid on cr1-eqiad and cr2-eqiad.
[16:20:08] !log disabling smart path, restbase1010, array 'a' (canary) - T178177
[16:20:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:08] !log disabling smart path, restbase1010, arrays 'b'...'e' - T178177
[16:22:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:18] T178177: Investigate aberrant Cassandra columnfamily read latency of restbase101{0,2,4} - https://phabricator.wikimedia.org/T178177
[16:23:14] !log starting cassandra, restbase1010 - T178177
[16:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:45] (CR) Halfak: [C: 1] "Alex, the plan here is to not disrupt production while still achieving the refactor. I think it is a great idea to have the # of workers " [puppet] - https://gerrit.wikimedia.org/r/396064 (https://phabricator.wikimedia.org/T182249) (owner: Halfak)
[16:27:28] herron: as anticlimactic as i'd hoped (so far) :)
[16:27:48] haha great
[16:29:48] (PS1) Herron: puppet: set puppetcompiler1001 puppet major version 4 [puppet] - https://gerrit.wikimedia.org/r/396423 (https://phabricator.wikimedia.org/T177254)
[16:33:19] (CR) Herron: [C: 2] puppet: set puppetcompiler1001 puppet major version 4 [puppet] - https://gerrit.wikimedia.org/r/396423 (https://phabricator.wikimedia.org/T177254) (owner: Herron)
[16:59:17] hrmm, so i guess the icinga check for HP RAID expects smart path to be enabled
[16:59:49] lest it paint the service in orange to indicate its annoyance
[17:25:14] (PS1) Marostegui: sanitarium_multiinstance: Keep less the binlogs [puppet] - https://gerrit.wikimedia.org/r/396435 (https://phabricator.wikimedia.org/T153058)
[17:31:45] (CR) Marostegui: [C: 2] sanitarium_multiinstance: Keep less the binlogs [puppet] - https://gerrit.wikimedia.org/r/396435 (https://phabricator.wikimedia.org/T153058) (owner: Marostegui)
[17:50:50] no_justification: hi. Have you updated MediaWiki recently, because Special:UserLogin@wmfwiki gives me [WirQvgpAEK0AABOc-fYAAABK] 2017-12-08 17:49:50: Fatal exception of type "InvalidArgumentException"
[17:51:08] so not even the Special:UserLogin page is displayed
[17:51:45] (reporting to Phab soon)
[17:51:51] 2017-12-08 17:49:50 [WirQvgpAEK0AABOc-fYAAABK] mw1218 foundationwiki 1.31.0-wmf.11 exception ERROR: [WirQvgpAEK0AABOc-fYAAABK] /w/index.php?title=Special:UserLogin&returnto=Resolution%3ADelegation+of+policy-making+authority InvalidArgumentException from line 1492 of /srv/mediawiki/php-1.31.0-wmf.11/includes/Block.php: Blocker must be a local user or a name that cannot be a local user {"exception_id":"WirQvgpAEK0AABOc-fYAAABK","
[17:51:51] exception_url":"/w/index.php?title=Special:UserLogin&returnto=Resolution%3ADelegation+of+policy-making+authority","caught_by":"mwe_handler"}
[17:52:04] Hauskatze: I think no_justification already filed a task for this
[17:52:29] ah, true
[17:52:37] that "blocker must be" thing
[17:52:44] okay, following-up on there
[17:53:09] I did
[17:53:42] It wasn't happening a lot so I didn't make it a blocker
[17:54:10] PROBLEM - puppet last run on labstore1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:54:15] ubn?
[17:54:42] no_justification: Shall we just disable globalblocking on fishbowl?
[17:55:24] Hauskatze: It was happening less than once a minute, and mostly to fishbowl wikis
[17:55:26] So not UBN
[17:55:35] Reedy: A good start. But it was also happening at plwikivoyage
[17:55:53] (PS1) Reedy: Disable GlobalBlocking on fishbowl wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/396437
[17:55:54] which is not fishbowl
[17:56:26] Hence..."mostly"
[17:56:42] well, it does not seem there would be much trouble not applying globalblocking on fishbowls
[17:57:02] Reedy: Write a patch, I've gotta run some errands
[17:57:08] I already have
[17:57:30] (PS2) Reedy: Disable GlobalBlocking on fishbowl wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/396437 (https://phabricator.wikimedia.org/T182344)
[17:57:36] (CR) Reedy: [C: 2] Disable GlobalBlocking on fishbowl wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/396437 (https://phabricator.wikimedia.org/T182344) (owner: Reedy)
[17:57:54] selfmerge!!11 :P
[17:58:30] Hauskatze: We're pretty cavalier like that :p
[17:58:47] So the problem is that one a non-local user blocks someone, the block is from a non-local user ;)
[17:58:52] *when
[17:58:58] Kind of ironic error message
[17:59:03] (Merged) jenkins-bot: Disable GlobalBlocking on fishbowl wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/396437 (https://phabricator.wikimedia.org/T182344) (owner: Reedy)
[17:59:13] I know what it might be, no_justification, bawolff and Reedy
[17:59:13] RECOVERY - puppet last run on labstore1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:59:14] (CR) jenkins-bot: Disable GlobalBlocking on fishbowl wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/396437 (https://phabricator.wikimedia.org/T182344) (owner: Reedy)
[17:59:23] what if the IP you try to use is globally blocked?
[17:59:32] I think that might be the cause [17:59:48] Hauskatze: yes, that's likely the cause [18:00:01] I mean: user X is trying to log-in with a globally blocked IP/IP range [18:00:10] and that's likely my reason [18:00:18] bawolff: Is there a super compelling reason fishbowl wikis aren't in CA? [18:00:33] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: Disable GlobalBlocking on fishbowl wikis (duration: 00m 45s) [18:00:36] No idea [18:00:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:53] 'cause account creations on those are manual and shouldn't be automatic maybe? [18:00:56] I'm not aware of any at least [18:01:16] Yeah, manual right now I suppose [18:01:23] But they don't *have* to be is my point [18:01:38] I guess user rights are probably tied to having an account instead of some group, but there's no reason that has to be the case [18:02:09] I can log in now again Reedy [18:02:14] yay [18:06:29] also Reedy no_justification -- Is the proposed solution for T182356 actually valid? [18:06:29] T182356: Broken namespaces / pagelinks on enwiki Beta Cluster - https://phabricator.wikimedia.org/T182356 [18:06:52] it's 'pagelinks', not sure --fix would handle them? 
[18:07:03] I've never seen the task and I'm going afk now baiiiiiii [18:07:24] Hauskatze: Yes it does [18:07:33] // Update *_from_namespace in links tables [18:07:33] $fromNamespaceTables = [ [18:07:33] [ 'pagelinks', 'pl' ], [18:07:33] [ 'templatelinks', 'tl' ], [18:07:33] [ 'imagelinks', 'il' ] ]; [18:07:44] * Hauskatze runs the script then [18:35:59] (03PS1) 10Herron: puppet: fix puppet/facter package pinning in stretch [puppet] - 10https://gerrit.wikimedia.org/r/396438 (https://phabricator.wikimedia.org/T177254) [18:39:50] (03PS1) 10Ottomata: Set default topic timestamp.type to LogAppendTime [puppet] - 10https://gerrit.wikimedia.org/r/396439 (https://phabricator.wikimedia.org/T161731) [18:40:36] (03CR) 10Ppchelko: [C: 031] Set default topic timestamp.type to LogAppendTime [puppet] - 10https://gerrit.wikimedia.org/r/396439 (https://phabricator.wikimedia.org/T161731) (owner: 10Ottomata) [18:40:40] (03CR) 10Ottomata: [C: 032] Set default topic timestamp.type to LogAppendTime [puppet] - 10https://gerrit.wikimedia.org/r/396439 (https://phabricator.wikimedia.org/T161731) (owner: 10Ottomata) [18:41:14] (03CR) 10Herron: [C: 032] puppet: fix puppet/facter package pinning in stretch [puppet] - 10https://gerrit.wikimedia.org/r/396438 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [18:41:25] (03PS2) 10Herron: puppet: fix puppet/facter package pinning in stretch [puppet] - 10https://gerrit.wikimedia.org/r/396438 (https://phabricator.wikimedia.org/T177254) [18:47:39] Reedy: no_justification have you seen our UBN ticket? [18:48:03] addshore: in English, please. 
[18:48:26] Un block now for Wikidata *finds the link* [18:48:52] https://phabricator.wikimedia.org/T182322 [18:49:25] All of wmde is away already today and would love that reverting in core, but can't ourselves [18:50:13] Wikidata performance is currently rather shit ;) [18:50:21] PRESS TEH REVERT BUTTON [18:50:21] It's hard [18:50:31] xD [18:50:45] People were complaining about something similar at commons i believe [18:50:57] my internet is slooooow [18:51:00] on behalf of Wikimedia users, please [18:51:23] PROBLEM - High lag on wdqs1003 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [18:51:47] bawolff: yes, I pointed them to the same ticket [18:52:12] I would revert but won't be at a machine until tomorrow [18:55:07] just .11? [18:55:14] Yup [18:55:44] I'm not sure if both of the patches I linked need reverting or just the one! [18:56:41] what about taking down Wikidata? [18:56:48] problem solved :P [18:57:32] Is gerrit slow? Or is it my internet? [18:58:11] 19:50:57 Reedy | my internet is slooooow [18:58:17] I guess it's your internet :P [18:58:28] Well, I'm not trying to use anything but gerrit atm [18:58:58] Might just be my laptop [18:59:07] More seriously: Gerrit works normal for me. [18:59:29] gerrit's fast for me [19:00:02] addshore: There seem to be more patches on top.. [19:00:39] is someone taking the lead on reverting? [19:00:41] Hmm, I only spotted 2 [19:01:56] Reverting the later commit... conflicts in WANObjectCache.php [19:02:21] Blergh [19:02:55] There's 6 commits on top on https://github.com/wikimedia/mediawiki/commits/master/includes/libs/objectcache/WANObjectCache.php [19:03:22] both the conflicts are trivial... 
[19:04:35] We should not forget to mark this ticket as a blocker for next week's train too (in its current state) [19:05:00] Lydia_WMDE says thank you :) [19:06:17] wtf is up with my laptop [19:09:53] Reedy: ask Santa for a new one [19:10:04] It's still under apple care [19:11:44] PROBLEM - Check systemd state on wdqs1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:15:27] Reedy: seems easier to just conflict merge. You'd only need the cpPosTime one. [19:15:44] [19:12:35] (PS1) Reedy: Revert "objectcache: Make WANObjectCache interim caching not interfere with ChronologyProtector" [core] - https://gerrit.wikimedia.org/r/396443 (https://phabricator.wikimedia.org/T182322) [19:15:44] [19:12:37] (PS1) Reedy: Revert "Make ChronologyProtector actually use cpPosTime cookies" [core] - https://gerrit.wikimedia.org/r/396444 (https://phabricator.wikimedia.org/T182322) [19:16:49] (03CR) 10Ottomata: "https://puppet-compiler.wmflabs.org/compiler02/9238/dataset1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/396324 (https://phabricator.wikimedia.org/T175844) (owner: 10Joal) [19:17:13] (03CR) 10Ottomata: [C: 031] Add clickstream rsync for external visibility [puppet] - 10https://gerrit.wikimedia.org/r/396324 (https://phabricator.wikimedia.org/T175844) (owner: 10Joal) [19:17:27] Reedy: eh, I'll just make a small patch to setup.php [19:23:54] PROBLEM - Host labstore1006 is DOWN: PING CRITICAL - Packet loss = 100% [19:27:44] RECOVERY - Check systemd state on wdqs1003 is OK: OK - running: The system is fully operational [19:29:34] PROBLEM - Host mw2140.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [19:34:11] AaronSchulz: thanks! [20:00:57] hello ops, not sure if you were aware that there have been complaints of performance issues on enwiki? 
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Slowness_with_Twinkle_scripts [20:01:02] specifically the API [20:02:00] musikanimal: maybe related to https://phabricator.wikimedia.org/T182322 [20:02:19] musikanimal, hello, as far as I know, the Performance team (who is responsible for performance surprisingly) do know about this issue and they are working on it. [20:02:26] bawolff, you were faster :D [20:02:31] okay! thank you :) [20:02:40] I'll comment on the discussion and let them know [20:03:13] musikanimal, thanks! By the way, "my" wiki (cs.wikipedia) is slow too. It is something very wide. Have a nice day! [20:03:39] Urbanecm: enwiki is being slow too [20:03:55] musikanimal: There's talk about reverting the thingy [20:04:17] bawolff, which thingy if I may ask? [20:04:20] Zppix, every wiki... [20:04:32] Its annoying [20:04:35] Maybe T182390 can be interesting too [20:04:35] T182390: 2017-12-07 Huge SaveTiming spike - https://phabricator.wikimedia.org/T182390 [20:04:50] Zppix, yeah, it is, especially when you're trying to do regular patrolling :D [20:05:01] Urbanecm: the using cpPosTime cookie thingy [20:05:38] https://gerrit.wikimedia.org/r/#/c/396451/ [20:05:51] That actually looks merged, so maybe that already happened [20:05:57] I haven't been following this issue too closely [20:06:39] bawolff, actually, I have cpPosTime cookie at my computer right now. [20:06:51] (downtimed labstore1006 -I'm poking at it) [20:06:54] LastAccessed:"Fri, 08 Dec 2017 19:59:29 GMT" [20:06:58] Expires:"Fri, 08 Dec 2017 20:00:29 GMT" [20:07:07] So it seems to be quite fresh [20:07:16] Urbanecm: Having the cookie has been that way for a long time, its how the cookie is processed that changed [20:08:02] The patch should disable issuing the cookie, shouldn't it? [20:08:22] bawolff: yes looks like the same issue [20:08:35] Is someone backporting & deploying? 
:) [20:08:52] Urbanecm: no, its only supposed to disable processing of the cookie in the way that caused slowness [20:09:03] bawolff, Ok. [20:09:23] addshore, at least in next MW window :D [20:10:08] Urbanecm: it's Friday there are no windows today [20:10:13] But it should be deployed today [20:10:33] addshore, yeah. I meant at least in next MW window there should be MW related deploys. [20:11:05] no deployment window, even for UBN? [20:11:28] There are no windows but we can deploy, but I am only on a phone until tomorrow [20:11:35] RECOVERY - Host labstore1006 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [20:11:38] I'm still poking around for someone else to do it :) [20:11:44] oh okay :) many thanks! [20:17:24] RECOVERY - High lag on wdqs1003 is OK: OK: Less than 30.00% above the threshold [600.0] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [20:19:14] RECOVERY - puppet last run on labstore1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:21:40] greg-g: Would I be able to deploy https://gerrit.wikimedia.org/r/#/c/396465/ today? 
Its reverting something that's making wikipedians uppity [20:22:49] !log aaron@tin Synchronized php-1.31.0-wmf.11/includes/Setup.php: a319c3e7ab61 - disable cpPosTime injection (duration: 00m 45s) [20:22:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:45:09] 10Operations, 10Wikimedia-Logstash, 10hardware-requests: decommission logstash100[1-3] - https://phabricator.wikimedia.org/T175830#3824061 (10debt) [20:45:11] 10Operations, 10Wikimedia-Logstash, 10Discovery-Search (Current work), 10Patch-For-Review: setup/install logstash100[7-9].eqiad.wmnet - https://phabricator.wikimedia.org/T175045#3824062 (10debt) [20:45:14] 10Operations, 10Wikimedia-Logstash, 10Discovery-Search (Current work), 10Patch-For-Review: all log producers need to use the logstash LVS endpoint - https://phabricator.wikimedia.org/T175242#3824060 (10debt) 05Open>03Resolved [20:47:04] 10Operations, 10ops-codfw, 10Discovery, 10Elasticsearch, 10Discovery-Search (Current work): HP RAID Battery issue on elastic2004 - https://phabricator.wikimedia.org/T181412#3824065 (10debt) 05Open>03Resolved [20:53:46] Thanks AaronSchulz ! [21:03:39] (03PS1) 10Umherirrender: Remove MoodBar from submodules.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396475 [21:08:52] (03PS1) 10Umherirrender: Remove AccountAudit from submodules.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396478 [21:12:09] PROBLEM - Check systemd state on wdqs1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[21:14:09] RECOVERY - Host labstore1007 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [21:16:32] (03PS2) 10Umherirrender: Remove AccountAudit from multiversion/submodules.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396478 [21:16:59] (03PS2) 10Umherirrender: Remove MoodBar from multiversion/submodules.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396475 [21:17:40] (03PS1) 10Umherirrender: Remove Wikidata from multiversion/submodules.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396482 [21:22:59] RECOVERY - Kafka Broker Under Replicated Partitions on kafka1012 is OK: OK: Less than 50.00% above the threshold [1.0] https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1 [21:24:29] PROBLEM - Kafka Broker Under Replicated Partitions on kafka1020 is CRITICAL: CRITICAL: 51.72% of data above the critical threshold [10.0] https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1 [21:28:09] RECOVERY - Check systemd state on wdqs1003 is OK: OK - running: The system is fully operational [21:31:49] RECOVERY - Check whether ferm is active by checking the default input chain on labstore1007 is OK: OK ferm input default policy is set [21:31:49] RECOVERY - Disk space on labstore1007 is OK: DISK OK [21:31:49] RECOVERY - dhclient process on labstore1007 is OK: PROCS OK: 0 processes with command name dhclient [21:31:49] RECOVERY - Check size of conntrack table on labstore1007 is OK: OK: nf_conntrack is 0 % full [21:31:49] RECOVERY - Check systemd state on labstore1007 is OK: OK - running: The system is fully operational [21:31:50] RECOVERY - configured eth on labstore1007 is OK: OK - interfaces up [21:31:50] RECOVERY - DPKG on labstore1007 is OK: All packages OK [21:33:37] (03PS1) 10Ottomata: Use sync worker for superset [puppet] - 10https://gerrit.wikimedia.org/r/396488 (https://phabricator.wikimedia.org/T166689) [21:34:10] (03CR) 10Ottomata: [C: 032] Use sync worker for superset [puppet] - 
10https://gerrit.wikimedia.org/r/396488 (https://phabricator.wikimedia.org/T166689) (owner: 10Ottomata) [21:35:49] RECOVERY - IPMI Sensor Status on labstore1007 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK [21:36:18] (03PS1) 10Ottomata: Bump superset workers to 8 [puppet] - 10https://gerrit.wikimedia.org/r/396489 [21:36:59] (03CR) 10jerkins-bot: [V: 04-1] Bump superset workers to 8 [puppet] - 10https://gerrit.wikimedia.org/r/396489 (owner: 10Ottomata) [21:37:32] (03PS2) 10Ottomata: Bump superset workers to 8 [puppet] - 10https://gerrit.wikimedia.org/r/396489 (https://phabricator.wikimedia.org/T396488) [21:37:39] (03CR) 10Ottomata: [V: 032 C: 032] Bump superset workers to 8 [puppet] - 10https://gerrit.wikimedia.org/r/396489 (https://phabricator.wikimedia.org/T396488) (owner: 10Ottomata) [21:42:56] 10Operations, 10Ops-Access-Requests, 10AICaptcha, 10WMF-NDA-Requests: Requesting access to EventLogging data for Vinitha - https://phabricator.wikimedia.org/T181952#3824268 (10Tgr) [21:46:21] 10Operations, 10Ops-Access-Requests, 10AICaptcha, 10WMF-NDA-Requests: Requesting access to EventLogging data for Vinitha - https://phabricator.wikimedia.org/T181952#3824275 (10Tgr) Per [[https://wikitech.wikimedia.org/wiki/Production_shell_access|Production shell access]], this needs approval from the empl... 
[21:58:44] RECOVERY - puppet last run on labstore1007 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:59:35] !log smalyshev@tin Started deploy [wdqs/wdqs@353b3cb]: temporary fix for T182464, better fix coming soon [21:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:59:45] T182464: Updater got stuck on large update - https://phabricator.wikimedia.org/T182464 [21:59:55] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [21:59:55] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0 [22:05:30] !log smalyshev@tin Finished deploy [wdqs/wdqs@353b3cb]: temporary fix for T182464, better fix coming soon (duration: 05m 55s) [22:05:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:05:41] T182464: Updater got stuck on large update - https://phabricator.wikimedia.org/T182464 [22:06:44] PROBLEM - Disk space on scb1001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=78%) [22:11:42] (03PS1) 10Reedy: Fix LandingCheck indenting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396535 [22:14:38] (03PS1) 10Reedy: Remove old commented out $wgCollectionFormats [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396536 [22:15:27] !log Kicked off rsync of /data/xmldatadumps/public to labstore1006 & 7 [22:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:17:05] (03CR) 10Chad: [C: 032] Fix LandingCheck indenting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396535 (owner: 10Reedy) [22:17:13] (03CR) 10Chad: "Whoops, wrong button" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/396535 (owner: 10Reedy) [22:30:55] madhuvishy: pulling from ms1001 to catch up? 
[22:32:44] PROBLEM - Disk space on scb1001 is CRITICAL: DISK CRITICAL - free space: / 340 MB (3% inode=78%) [22:38:56] apergos: no because i reimaged, starting again [22:42:54] ah, you mean the data was wiped from the arrays? madhuvishy [22:43:15] apergos: yeah, i had to start again from scratch [22:44:01] (03PS1) 10ArielGlenn: move content translation dumps to new nfs server [puppet] - 10https://gerrit.wikimedia.org/r/396541 (https://phabricator.wikimedia.org/T179942) [22:44:34] ok, well, good luck! [22:45:57] :) hopefully it'll be done when my work week starts Monday [22:49:19] baw / legoktm I'm out today (and the next 2 fridays). Ask thcipriani (or no_justification ) [22:55:54] change seems fairly innocuous, but "uppity" doesn't sound like it reaches the critical level for Friday deployment (excepting that there have been a weird number of deployments today) [22:56:54] (03CR) 10ArielGlenn: [C: 031] "Talked about it with Joal and the storage requirements are fine." [puppet] - 10https://gerrit.wikimedia.org/r/396324 (https://phabricator.wikimedia.org/T175844) (owner: 10Joal) [23:00:05] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:00:55] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:01:04] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:01:04] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:01:04] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:01:05] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:01:44] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:01:45] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:02:44] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:02:45] PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:04] PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:04] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:24] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:44] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:21:44] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:23:46] !log force ran puppet on contint2001 [23:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:28:04] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [23:28:24] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [23:28:44] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:30:04] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:30:55] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:31:04] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:31:04] RECOVERY - puppet last run on mw1186 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:31:04] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:31:05] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:31:44] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:32:44] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:32:45] PROBLEM - Disk space on scb1001 is CRITICAL: DISK CRITICAL - free space: / 318 MB (3% inode=78%) [23:32:45] RECOVERY - puppet last run on logstash1005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:33:04] RECOVERY - 
puppet last run on nitrogen is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures