[00:00:01] Meh, just let it run to completion
[00:00:04] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1026671 (Matanya)
[00:00:04] its really my fault, i keep forgetting to look at the next day in the depl schedule
[00:00:05] RoanKattouw, ^d, marktraceur, RoanKattouw: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150210T0000).
[00:00:16] Krenair: yes
[00:00:16] i wish we had per time-zone rendering )
[00:00:26] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 48 threshold =0.1% breach: status: red, number_of_nodes: 2, unassigned_shards: 47, timed_out: False, active_primary_shards: 45, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 90, initializing_shards: 1, number_of_data_nodes: 2
[00:00:35] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 47 threshold =0.1% breach: status: yellow, number_of_nodes: 2, unassigned_shards: 46, timed_out: False, active_primary_shards: 46, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 91, initializing_shards: 1, number_of_data_nodes: 2
[00:00:37] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.137:9200/_cluster/health error while fetching: Request timed out.
[00:00:42] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1026674 (Dzahn) Did the test of the NDA volunteer process work?
[00:00:48] RoanKattouw, ^d, marktraceur: still running scap (
[00:00:59] yurikR1: Actually the deployments page *does* have per-timezone rendering
[00:01:17] RoanKattouw, the day breaks are UTC
[00:01:17] At least it does for me when I'm in Europe or on the East Coast
[00:01:26] OK that, yes
[00:01:28] That is annoying
[00:01:32] yep ((
[00:01:36] yurikR1, I can wait for you, it's fine
[00:01:36] The times themselves are converted though
[00:01:36] (PS1) Ori.livneh: vbench: tidy up string formatting code [puppet] - https://gerrit.wikimedia.org/r/189626
[00:01:43] Krenair: https://gerrit.wikimedia.org/r/189625
[00:01:49] (CR) Ori.livneh: [C: 2 V: 2] vbench: tidy up string formatting code [puppet] - https://gerrit.wikimedia.org/r/189626 (owner: Ori.livneh)
[00:02:05] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1026677 (Dzahn) If you go to the Operations tag page at https://phabricator.wikimedia.org/tag/operations/ can you "Watch Project" or "Subs...
[00:02:10] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1026678 (Matanya) that was before the migration, and at the time, it worked.
[00:03:44] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1026680 (Matanya) None of them.
[00:06:28] where's it at now yurikR1?
[00:06:40] Krenair, 00:01:56 Updating LocalisationCache for 1.25wmf16 using 4 thread(s)
[00:08:30] (CR) BBlack: [C: 1] add Cyrillic project domain names [dns] - https://gerrit.wikimedia.org/r/189102 (https://phabricator.wikimedia.org/T88722) (owner: Dzahn)
[00:09:33] RoanKattouw, greg-g: So does stuff break if you try to sync other things while the l10n cache is updating?
[00:09:42] (CR) Dzahn: [C: 2] add Cyrillic project domain names [dns] - https://gerrit.wikimedia.org/r/189102 (https://phabricator.wikimedia.org/T88722) (owner: Dzahn)
[00:09:46] Probably not
[00:10:04] Hmm although all the sync scripts were rewritten
[00:10:07] RoanKattouw: won't have weird RL implications?
[00:10:08] So you should ask bd808
[00:10:22] greg-g: Depends on whether it's in the cache building or cache syncing phase
[00:10:27] I think it will refuse to sync now
[00:10:36] syncing apaches
[00:10:43] cool, not much longer then
[00:10:46] Krenair: around
[00:11:57] Trying to run sync-file/sync-dir while the scap was active would fail but there's nothing automated that stops changing files in /srv/mediawiki-staging which is actually horrible
[00:12:06] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 5, timed_out: False, active_primary_shards: 46, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 131, initializing_shards: 2, number_of_data_nodes: 3
[00:12:10] ebernhardson, okay, you're not too late because something else ran into the swat window :)
[00:12:15] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 5, timed_out: False, active_primary_shards: 46, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 131, initializing_shards: 2, number_of_data_nodes: 3
[00:12:16] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 5, timed_out: False, active_primary_shards: 46, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 131, initializing_shards: 2, number_of_data_nodes: 3
[00:13:16] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:34] !log restarted elasticsearch on logstash1001; OOM
[00:13:39] Logged the message, Master
[00:15:36] PROBLEM - HHVM rendering on mw1139 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:16:06] PROBLEM - Apache HTTP on mw1139 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:16:26] hm. that sounds bad
[00:19:54] sync-common 70%
[00:20:44] what's up with mw1139?
[00:21:04] https://ganglia.wikimedia.org/latest/?c=API%20application%20servers%20eqiad&h=mw1139.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2 does not look great
[00:21:36] that doesn't look right at all
[00:21:55] network drop could be pybal routing around it I guess
[00:22:06] PROBLEM - HHVM rendering on mw1128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:22:26] PROBLEM - Apache HTTP on mw1128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:22:37] paravoid: rolling hhvm outage? ^ mw1139 and not mw1128
[00:22:43] *now
[00:22:44] that's another box
[00:23:01] is it scap that's causing it?
[00:23:06] PROBLEM - HHVM busy threads on mw1139 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [86.4]
[00:23:13] 96% commons synced
[00:23:45] PROBLEM - HHVM queue size on mw1139 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [80.0]
[00:23:55] yurikR1: Probably not scap's fault.
neither of them are rsync slaves
[00:24:57] rebuilding cdbs, 13%
[00:27:35] PROBLEM - HHVM busy threads on mw1128 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [86.4]
[00:28:46] PROBLEM - HHVM queue size on mw1128 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [80.0]
[00:29:17] !log yurik Finished scap: syncing ZeroBanner i18n (duration: 34m 06s)
[00:29:21] aright
[00:29:23] Logged the message, Master
[00:29:48] tgr
[00:30:06] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[00:30:14] yurikR1, all good?
[00:30:22] Krenair, yep, thx )
[00:30:25] sorry for the delay
[00:30:35] (CR) Alex Monk: [C: 2] Add -labs settings for Score [mediawiki-config] - https://gerrit.wikimedia.org/r/181358 (https://phabricator.wikimedia.org/T85049) (owner: Gergő Tisza)
[00:30:46] (Merged) jenkins-bot: Add -labs settings for Score [mediawiki-config] - https://gerrit.wikimedia.org/r/181358 (https://phabricator.wikimedia.org/T85049) (owner: Gergő Tisza)
[00:31:03] http://commons.wikimedia.beta.wmflabs.org/ looks pretty raw
[00:31:51] (PS1) Ori.livneh: vbench: add 'vb' launcher [puppet] - https://gerrit.wikimedia.org/r/189630
[00:32:29] (CR) Ori.livneh: [C: 2 V: 2] vbench: add 'vb' launcher [puppet] - https://gerrit.wikimedia.org/r/189630 (owner: Ori.livneh)
[00:33:16] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:33:22] !log krenair Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/181358/ (duration: 00m 05s)
[00:33:25] Logged the message, Master
[00:33:47] !log krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/181358/ (duration: 00m 05s)
[00:33:51] tgr, ok?
[00:33:54] Logged the message, Master
[00:34:19] (PS1) Ori.livneh: Add missing file to I28ae265729 [puppet] - https://gerrit.wikimedia.org/r/189631
[00:34:28] (CR) Ori.livneh: [C: 2 V: 2] Add missing file to I28ae265729 [puppet] - https://gerrit.wikimedia.org/r/189631 (owner: Ori.livneh)
[00:35:26] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[00:35:59] tgr?
[00:36:24] Krenair: slightly less broken than before, I'll accept that for now :)
[00:36:27] http://en.wikipedia.beta.wmflabs.org/wiki/Sandbox
[00:36:28] :)
[00:36:38] no exception at least
[00:36:55] ebernhardson, ok, around?
[00:37:00] Krenair: ya
[00:40:04] Krenair: got time for a quick fix?
[00:40:17] the error seems trivial
[00:40:42] I guess so, waiting for jenkins...
[00:42:15] tgr, actually, maybe let's put it behind roan's patches
[00:42:22] (PS1) Gergő Tisza: Fix Score URL in I894f4591528aefb0182178e292984424ec16db11 [mediawiki-config] - https://gerrit.wikimedia.org/r/189634 (https://phabricator.wikimedia.org/T85049)
[00:42:32] sure
[00:43:33] (CR) Se4598: [C: 1] Fix Score URL in I894f4591528aefb0182178e292984424ec16db11 [mediawiki-config] - https://gerrit.wikimedia.org/r/189634 (https://phabricator.wikimedia.org/T85049) (owner: Gergő Tisza)
[00:45:34] is the planned wikitech outage already in 15min?
[00:45:56] !log krenair Synchronized php-1.25wmf16/extensions/Echo/includes/DiscussionParser.php: https://gerrit.wikimedia.org/r/#/c/189549/ (duration: 00m 06s)
[00:46:02] Logged the message, Master
[00:46:35] !log krenair Synchronized php-1.25wmf16/extensions/Echo/tests/phpunit/includes/DiscussionParserTest.php: https://gerrit.wikimedia.org/r/#/c/189549/ (duration: 00m 06s)
[00:46:36] ebernhardson, please test
[00:46:38] Logged the message, Master
[00:47:54] (PS1) Ori.livneh: vbench: pass all args from vb to vbench [puppet] - https://gerrit.wikimedia.org/r/189636
[00:48:12] (CR) Ori.livneh: [C: 2 V: 2] vbench: pass all args from vb to vbench [puppet] - https://gerrit.wikimedia.org/r/189636 (owner: Ori.livneh)
[00:49:50] Krenair: hmm, mentions not working right :( please rollback and cancel the other
[00:50:05] did not start the other
[00:50:22] ah, apart from the core update and branch, ok
[00:50:39] Krenair: yea, i mean do the revert on this patch, anddont deploy the other :)
[00:50:53] not sure why its not working...but enough people use echo i should figure it out not in prod :)
[00:51:36] ebernhardson, do we revert the core submodule update or revert the branch change?
[00:51:40] for wmf16
[00:51:42] Krenair: the core submodule update
[00:51:53] Krenair: leave the branch, we will fix up and re-deploy again tomorow probably
[00:54:42] bd808: you around?
[00:54:52] wondering about logstash health
[00:54:58] gwicke: hup. what's up?
[00:55:06] it's wreck
[00:55:12] (PS1) Ori.livneh: vbench: don't trip click handler [puppet] - https://gerrit.wikimedia.org/r/189641
[00:55:17] earlier this morning I got errors about an index not existing
[00:55:24] I've been doing a rolling restart because of OOMs
[00:55:26] (CR) Ori.livneh: [C: 2 V: 2] vbench: don't trip click handler [puppet] - https://gerrit.wikimedia.org/r/189641 (owner: Ori.livneh)
[00:55:28] now that's gone, but here aren't any entries
[00:55:41] ok, so it's known
[00:55:47] *nod* yeah
[00:56:28] Having no mediawiki logs is sort of on purpose but your apps should still be logging. Hopefully I'll get it un-sick soon
[00:57:12] yup
[00:57:23] in many ways logstash is a victim of its own success
[00:57:24] (CR) Jforrester: [C: 1] Enable VisualEditor on project namespace at cawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/189211 (https://phabricator.wikimedia.org/T88896) (owner: Glaisher)
[00:57:31] we need the new hardware
[00:57:39] not sure if there is a quote for it yet or not
[00:58:42] what are the specs like?
[00:58:56] cassandra nodes have 3t ssd space + 64G ram
[00:59:05] I wonder if elasticsearch has similar needs
[00:59:09] "like pod elasticsearch boxes but with spinning disk"
[00:59:20] *prod
[00:59:27] spinning disks?
[00:59:29] ouch ;)
[00:59:43] ssd seemed too expensive for logging
[00:59:59] we have 5.5T on each node
[01:00:01] 'cuz no one cares about logging
[01:00:12] !log krenair Synchronized php-1.25wmf16/extensions/Echo/includes/DiscussionParser.php: https://gerrit.wikimedia.org/r/#/c/189638/ (duration: 00m 05s)
[01:00:39] 1T consumer / mid-level costs $400-$600 these days
[01:00:40] !log krenair Synchronized php-1.25wmf16/extensions/Echo/tests/phpunit/includes/DiscussionParserTest.php: https://gerrit.wikimedia.org/r/#/c/189638/ (duration: 00m 08s)
[01:00:48] ebernhardson, ^
[01:01:02] should be properly reverted now
[01:01:11] gwicke: we need at least 2T per node and really more like 4T to grow
[01:01:16] RoanKattouw, around still?
[01:01:30] Krenair: Yeah
[01:01:40] Krenair: looks to work now, thanks
[01:01:41] bd808: *nod*, just saying that ssds aren't that expensive any more
[01:01:44] I see you're having fun reverting a submodule change?
[01:01:49] indeed
[01:01:57] Echo DiscussionParser change did not go as hoped :(
[01:02:36] robh: Do I need to do anything to help with getting a quote for the new logstash elasticsearch backends? -- https://phabricator.wikimedia.org/T87460#1008558
[01:02:53] Like make a ticket that makes sense?
[01:04:08] hrmm, i think that makes sense, i just hadn't gotten to it yet
[01:04:27] it could stand to be a bit more fomalized but whatever
[01:04:48] oh wait
[01:04:52] yea, this is to take temp nodes
[01:04:56] which is rejected....
[01:05:06] springle: are you around and ready to stand by while I move wikitech?
[01:05:13] robh: right. now we want new hardware
[01:05:17] I’ve marked both wikis read-only… nothing left to do but move dns, right?
[01:05:32] bd808: So yea, if you guys dont make a new one, I'll end up making one and adding you all to it
[01:05:39] cuz that really should just be rejected and linked to the new one
[01:05:49] but i'll get to it, its just not yet on top of list
[01:06:20] robh: *nod* mostly wanted to know it was still in the queue
[01:06:40] looks like it might be now (once that ticket is created)
[01:06:40] yep, 5 hardware requests still pending
[01:06:47] andrewbogott: go ahead
[01:06:49] andrewbogott, fyi I am about to deploy a swat change, this won't conflict will it?
[01:07:18] Krenair: it won’t, but I can’t guarantee that wikitech will remain readable in the meantime.
[01:07:24] Krenair: let me know when you’re clear?
[01:07:31] I have what I need open, I can use wikitech-static
[01:07:35] we should be fine
[01:07:38] andrewbogott: sorry, swat went over
[01:07:45] s’alright
[01:07:48] (PS1) Andrew Bogott: Move wikitech from virt1000 to silver [dns] - https://gerrit.wikimedia.org/r/189643
[01:08:03] unless springle is missing lunch
[01:08:11] nope
[01:10:51] !log krenair Synchronized php-1.25wmf16/extensions/VisualEditor/modules/ve-mw: https://gerrit.wikimedia.org/r/#/c/189144/ (duration: 00m 05s)
[01:11:09] RoanKattouw, please -
[01:11:11] sigh.
[01:11:17] guess I can test this myself
[01:12:15] yeah seems fine
[01:12:48] James_F, ^
[01:13:11] James_F, next is https://gerrit.wikimedia.org/r/#/c/189147/1 - do you know what we're fixing here?
[01:13:18] there's no task
[01:13:32] Krenair: It's the same as the VE change you just deployed…
[01:13:42] Krenair: They needed to go together, I think.
[01:14:23] Krenair: Roan confirms he screwed up.
[01:14:46] ok, I will sync this as well
[01:14:50] Thanks.
[01:14:52] but it doesn't seem very broken to me
[01:16:33] (PS2) Alex Monk: Enable VisualEditor on project namespace at cawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/189211 (https://phabricator.wikimedia.org/T88896) (owner: Glaisher)
[01:17:57] (CR) Alex Monk: [C: 2] Enable VisualEditor on project namespace at cawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/189211 (https://phabricator.wikimedia.org/T88896) (owner: Glaisher)
[01:20:16] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[01:20:30] Multimedia, operations, MediaWiki-extensions-UploadWizard: Investigate server error when uploading an OGV - https://phabricator.wikimedia.org/T89018#1026790 (Tgr) >>! In T89018#1026574, @Tgr wrote: > Fatal/OOM? The zend interpreter can die without even invoking the shutdown handler on certain fatal errors (no...
[01:20:41] (Merged) jenkins-bot: Enable VisualEditor on project namespace at cawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/189211 (https://phabricator.wikimedia.org/T88896) (owner: Glaisher)
[01:21:49] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/189211/ per James_F in -dev (duration: 00m 06s)
[01:23:29] !log krenair Synchronized php-1.25wmf16/resources/lib/oojs-ui: https://gerrit.wikimedia.org/r/#/c/189147/ (duration: 00m 08s)
[01:23:59] James_F, please check etc.
[01:26:05] Krenair: Done. Thanks!
[01:26:42] ok. I had tried to slip that config through jenkins while waiting for the oojs ui one to be merged, but instead it queued behind.
gj jenkins
[01:26:49] RESTBase, operations: Public entry point for RESTBase - https://phabricator.wikimedia.org/T78194#1026816 (Jdforrester-WMF)
[01:26:51] Krenair: :-)
[01:27:10] (CR) Alex Monk: [C: 2] Fix Score URL in I894f4591528aefb0182178e292984424ec16db11 [mediawiki-config] - https://gerrit.wikimedia.org/r/189634 (https://phabricator.wikimedia.org/T85049) (owner: Gergő Tisza)
[01:27:12] Krenair: so… all clear?
[01:27:16] (Merged) jenkins-bot: Fix Score URL in I894f4591528aefb0182178e292984424ec16db11 [mediawiki-config] - https://gerrit.wikimedia.org/r/189634 (https://phabricator.wikimedia.org/T85049) (owner: Gergő Tisza)
[01:27:21] tgr had one last thing for me
[01:28:15] !log krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/189634/ (duration: 00m 10s)
[01:28:19] tgr, there you go^
[01:28:44] Krenair: thanks, works this time
[01:29:00] all clear andrewbogott
[01:29:09] thx
[01:29:27] (CR) Andrew Bogott: [C: 2] Move wikitech from virt1000 to silver [dns] - https://gerrit.wikimedia.org/r/189643 (owner: Andrew Bogott)
[01:29:52] where is morebots anyway?
[01:30:23] operations, Project-Creators: Create #site-incident tag and use it for incident reports - https://phabricator.wikimedia.org/T85889#1026827 (Aklapper) Open>stalled Setting status to "stalled" - feel free to set back to open once this has been discussed and decided.
[01:30:42] Ops-Access-Requests: Sudo for Roan on osmium - https://phabricator.wikimedia.org/T89038#1026830 (RobH) While previously having root does show a history of trust, it doesn't invalidate any part of the access escalation request. Per our policies that operations settled on in our last in person meetings in Janu...
[01:31:30] morebots, that better?
[01:31:30] I am a logbot running on tools-exec-11.
[01:31:30] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[01:31:30] To log a message, type !log .
[01:31:38] !log moved wikitech to silver
[01:32:49] guess I'll manually fill in SAL
[01:33:04] heh
[01:33:10] gotta love db lock
[01:33:31] at least it tweets or whatever still right? (i dont use twitter)
[01:33:31] chicken-egg
[01:33:38] when you move wikitech :)
[01:34:01] well there was a conversation about having SAL update on wikitech static as non static but then it required it being something than a plain old wiki page
[01:34:06] so it was just not worth th eeffort
[01:36:57] hm… can y’all try to log out/in at wikitech and confirm that it’s broken for everyone and not just me?
[01:37:19] I was just trying to login to edit sal :)
[01:37:24] confirmed broken
[01:37:33] i was trying but now stopped
[01:37:37] heh
[01:37:52] 'k
[01:38:28] or, maybe, just maybe, we should log things to a.... logging system
[01:39:18] springle: any chance you didn’t replicate the 2fa db?
[01:39:58] andrewbogott: there is no db called 2fa on either virt1000 or silver, so no :)
[01:40:08] it’s something like oathauth
[01:40:11] or oauthath?
[01:40:15] or some similar acronym
[01:40:22] oathauth, iirc
[01:40:26] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[01:40:45] well you could strip the 2fa off your login and retry
[01:40:48] no such db oathauth on either host
[01:41:07] https://wikitech.wikimedia.org/wiki/Password_reset
[01:41:18] oathauth_users is the table
[01:41:19] mediawiki.org/wiki/Extension:OATHAuth
[01:41:19] https://mediawiki.org/wiki/Extension:OATHAuth
[01:41:27] where you wipe the 2fa off on labsdb
[01:41:34] sorry, labswiki
[01:42:10] labswiki.oathauth_users does exist on both hosts
[01:42:19] is the extension installed?
i wanted to check but Special:Version looks broken
[01:42:23] so yea, thats the 2fa table, seems like it did copy
[01:42:23] https://wikitech.wikimedia.org/wiki/Special:Version
[01:42:57] so far silver db hasn't seen any connections except from root
[01:43:37] interesting, I turned off 2fa but /now/ it complains that my 2fa token is wrong
[01:43:54] PHP Fatal error: Class 'LuaSandbox' not found
[01:43:55] ^
[01:44:17] "Class 'LuaSandbox' not found"
[01:44:32] dammit, I thought I fixed all these
[01:44:35] Reedy, still up?
[01:44:44] andrewbogott: i think Reedy is gone :/
[01:44:51] I know
[01:44:52] like for real..sigh
[01:45:12] yea i cannot really be sad about that cuz hes gonna fly planes
[01:45:21] * robh cannot legally fly planes as a diabetic so of course he wants to.
[01:46:11] RobH, you have diabetes also?
[01:46:14] im pretty sure i could get experimental license and kick it in ultralights.
[01:46:18] type1 since i was 10
[01:46:22] hmm, so php-luasandbox is installed
[01:46:22] I have type 1 also
[01:46:27] Since I was 16
[01:46:29] it's just a major version difference to before
[01:46:34] yea, not our fault diabetes! ;]
[01:47:06] andrewbogott: what's the path to localconfig pls
[01:47:06] (that really pisses off my buddy with type 2)
[01:47:14] andrewbogott: LocalSettings.php i mean
[01:47:20] mutante: it’s like a production wiki
[01:47:34] so everything is in /srv/mediawiki/wmf-config/etc/etc/etc/etc
[01:47:57] alright.. hmm
[01:48:16] Krenair: interested in helping out with this? You probably have a login on silver
[01:48:46] what are we missing exactly?
[01:49:05] PHP Fatal error: Class 'LuaSandbox' not found in /srv/mediawiki/php-1.25wmf15/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 17
[01:49:09] for starters
[01:49:20] I can't ssh to silver
[01:49:25] Krenair: ii php-luasandbox 2.0-7+wmf2.1
[01:49:29] it's installed but still not found
[01:50:01] the admin group ‘deployment’ is applied on silver.
[01:50:05] I don’t know what that means loginwise
[01:50:10] but it should be like any other wiki host
[01:50:21] also running the tip config
[01:50:35] I get "open failed: administratively prohibited: open failed" when trying to ssh in
[01:50:39] CommonSettings.php: $wgScribuntoDefaultEngine = 'luasandbox';
[01:50:43] I try to ssh in exactly the same way I would to tin, it gives me ssh_exchange_identification: Connection closed by remote host
[01:50:59] can we just disable Scribunto on wikitech?
[01:51:19] that seems rash, we support templates after all
[01:51:47] seems there are existing modules
[01:51:49] e.g. https://wikitech.wikimedia.org/wiki/Module:Deployment_schedule_test
[01:52:41] ok, so why does it no find the interpreter when we have that package installed
[01:53:38] difference between silver and virt1000: on silver it's the 'amd64' version of liblua
[01:54:34] (PS1) Springle: switch wikitech to silver [mediawiki-config] - https://gerrit.wikimedia.org/r/189648
[01:54:35] mutante: silver is Trusty, so there should be lots of package version differences.
[01:54:50] andrewbogott: ^ unrelated to lua, but likely also needed
[01:55:17] else silver MW will use virt1000 db
[01:55:26] springle: yeah, that’s clearly important :)
[01:55:35] But let’s hold off until we aren’t blocked by other things.
[01:55:38] bd808: any idea?
[01:55:39] yep
[01:57:29] andrewbogott: are you running under hhvm of php5?
[01:57:34] no
[01:57:50] no hhvm, normal php5 then
[01:57:59] of should have been or
[01:58:43] "Yes."
[01:59:17] okay yeah, I definitely can't ssh in to silver
[01:59:24] though it is in the deployment admin group
[01:59:41] it's not due to being in the virt cluster is it?
[01:59:48] It’s not in the virt cluster.
[01:59:56] Krenair: i see your home dir and an ssh key on silver
[02:00:34] got it.
[02:00:38] Accepted publickey for krenair
[02:00:40] I was trying silver.eqiad.wmnet
[02:01:13] ah, sorry, yeah, it’s public
[02:02:38] huh, php shell is not working for me here
[02:02:38] php -m isn't showing luasandob as installed
[02:02:48] *luasandbox
[02:02:55] just "Interactive mode enabled" and then no response
[02:03:45] back in August 2014, Reedy mentioned the same error once
[02:03:50] isn't in /etc/php5/mods-available either
[02:03:54] not seeing extension=luasandbox.so in php ini. is that needed
[02:04:47] http://www.mediawiki.org/wiki/Extension:Scribunto#php.ini
[02:05:09] yes
[02:05:16] yeah, I think that's the problem
[02:05:24] and should it be in apache2 or cli?
[02:05:38] probably both..?
[02:05:52] the ini files are in /etc/php5/conf.d but not symlinked to /etc/php5/apache2/conf.d
[02:05:53] indeed, but wouldnt we expect the package to do that
[02:06:20] so, there are 0 extensions referenced in that file
[02:06:27] surely this isn’t the one exception?
[02:06:50] The mediawiki puppet stuff expects all trusty boxes to be running hhvm
[02:06:54] oh, maybe I’m looking at the wrong config
[02:07:08] there is also no lua in /etc/php5/apache2/conf.d on a random appserver
[02:07:14] the method for setting up php extensions is different
[02:07:24] because they are hhvm
[02:07:34] *nod* random appservers are running hhvm
[02:07:43] does this mean wikitech should switch to hhvm ?
[02:08:11] the 3 files in /etc/php5/conf.d need to be symlinked into /etc/php5/apache2/conf.d and /etc/php5/cli/conf.d
[02:08:55] bd808: can you do that or shall I?
[02:09:25] I shouldn't be able to...
[02:09:48] drwxr-xr-x 2 root root
[02:09:57] I wouldn't expect to be able to
[02:10:39] and we need to fix puppet to put the ini files in the right place (or update the packages)
[02:10:43] * mutante mumbles something about live hack while being 'like cluster'
[02:10:47] Does anyone know https://phabricator.wikimedia.org/p/Memeht/ ?
[02:11:08] ok, Special::Version loads now.
Anyone remember why we were doing that?
[02:11:18] to check if Scribunto extension was installed :)
[02:11:19] * bd808 looks to see what mw-vagrant does for this
[02:11:55] oh! and of course it’s looking at the db on virt1000 so thinks I’m still using 2fa
[02:12:27] hmph
[02:12:39] (CR) Andrew Bogott: [C: 2] switch wikitech to silver [mediawiki-config] - https://gerrit.wikimedia.org/r/189648 (owner: Springle)
[02:14:43] there is some permission thing for thumbnail dirs
[02:14:49] it needs to be able mkdir new dirs
[02:15:07] that’s not where it’s supposed to look for images
[02:15:15] andrewbogott: mediawiki-vagrant has this hack to fix the php extensions for trusty -- https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/apache/files/envvars#L18-L23
[02:15:15] but maybe creating thumbnails it ignores the image dir
[02:15:54] bah, I don’t remember how to update config on tin
[02:16:01] I thought I did a rebase in /srv/mediawiki/wmf-config
[02:16:18] in /srv/mediawiki-staging
[02:16:39] Fiona, OPW, I think?
[02:16:46] Fiona, per https://phabricator.wikimedia.org/T77960#981419
[02:16:59] oh but I see you subscribed there now, ok
[02:17:06] !log l10nupdate Synchronized php-1.25wmf15/cache/l10n: (no message) (duration: 00m 02s)
[02:17:10] Yeah, I was trying to figure out how legit that request was.
[02:17:19] They seem to be involved with EventLogging and Parsoid, maybe?
[02:17:20] ok, and then to sync...
[02:17:24] sync-common on tin?
[02:17:25] So I figured somebody must know them.
[02:17:41] andrewbogott: is this for the db-eqiad.php?
sync-file
[02:17:44] andrewbogott: sync-file is probably all you need
[02:18:04] sync-common won't do what you think it will
[02:18:14] !log LocalisationUpdate completed (1.25wmf15) at 2015-02-10 02:17:10+00:00
[02:18:20] sync-file, sync-dir and scap are for pushing things out
[02:18:27] !log andrew Synchronized wmf-config/db-eqiad.php: (no message) (duration: 00m 06s)
[02:18:34] sync-common pulls things in
[02:18:44] !log andrew Synchronized wmf-config/db-eqiad.php: (no message) (duration: 00m 06s)
[02:19:13] meaning it is what gets run on each host to sync with tin (or a designated rsync proxy)
[02:19:15] Access denied for user 'wikiuser'@'208.80.154.136'
[02:19:44] mutante: good. now to let it in carefully
[02:20:03] ok, now wikitech is erroring out entirely
[02:20:27] seems like when this happened before it was because I needed to regenerate the localized strings or something...
[02:20:35] * bd808 waits for springle's magic fingers to fix it
[02:20:42] i think it just started talking to the right db
[02:20:46] and what bd808 said
[02:20:52] oh, ok
[02:21:03] I can see my User page now
[02:21:04] Ah, I should’ve read the next line on the error page
[02:21:05] ok, wikiuser@208.80.154.136 allowed
[02:21:09] still read_only db
[02:21:10] works again
[02:21:40] cant test 2fa, phone down :/
[02:21:41] yeah, and I still can’t log in
[02:21:52] your phone is ‘down’?
[02:21:55] andrewbogott: we need to make a clean cut on virt1000, then allow writes on silver
[02:22:11] yes, the battery is horrible
[02:22:12] springle: ok, but, I guess I’d like silver to work first
[02:22:35] this auth thing doesn’t make sense, I can see it succeeding the ldap check
[02:22:36] andrewbogott: ok to do that? once we allow it, silver will diverge from virt1000.
we can still fail back, but any writes done in the interim won't be on virt1000
[02:22:37] in incognito mode this works -- https://wikitech.wikimedia.org/wiki/User_talk:BryanDavis
[02:22:54] springle: please not until I can log in
[02:23:06] As myself (still have a live session on wikitech) it errors out because of the ro db
[02:23:06] I don’t want to commit to the new box while it’s broken
[02:23:22] unless you think that enable r/w will somehow fix logins
[02:23:26] andrewbogott: ok, waiting (can you actually login with a read_only db?)
[02:23:36] dunno
[02:23:42] heh
[02:23:43] i doubt it
[02:23:49] ok, let’s try then
[02:23:49] it needs to update the table
[02:23:52] ok, switching over
[02:23:53] after you used a code, no?
[02:24:02] I guess if someone modifies the documentation in the next 10 minutes we can just chuck it :)
[02:24:13] mutante: I’m not using 2fa
[02:24:20] andrewbogott: try it now
[02:24:20] and, no, it doesn’t use up codes unless you’re using emergency codes
[02:24:25] oh, and you still can't login without it? ok..
[02:24:38] nah, same behavior
[02:24:40] still can’t log in
[02:26:55] quick, everyone try at once so I can’t read the logfile
[02:27:33] maybe the /usr/local/apache path is related
[02:28:15] you mean, the thumbnail thing?
[02:28:24] it always tries to create an "Ambox_notice.png" and the referer is the login page
[02:28:57] RESTBase, Services, operations, Scrum-of-Scrums: RESTbase deployment - https://phabricator.wikimedia.org/T1228#1026944 (GWicke)
[02:29:31] this is osmething in openstack I think
[02:29:51] operations, Release-Engineering, WMF-Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1026945 (Jaredzimmerman-WMF) @technical13 what would a "desktop view" mean in this case, as it is a responsive page it adjusts for the device… many browsers (mobile and desktop) allow you to spoof or...
[02:32:09] !log l10nupdate Synchronized php-1.25wmf16/cache/l10n: (no message) (duration: 00m 03s) [02:33:16] !log LocalisationUpdate completed (1.25wmf16) at 2015-02-10 02:32:13+00:00 [02:33:51] ah, firewall [02:35:29] andrewbogott: wmf-config/InitialiseSettings.php: 'labswiki' => '/usr/local/apache/images', [02:35:35] ^ i think that's the thumb issue [02:35:42] need to fix path in InitialiseSettings [02:35:48] mutante: ah, that should be /srv/org/somethingsomething [02:36:07] looking [02:38:11] ehm, yea, normally this is /mnt/upload7 for the other prod wikis [02:38:48] 'default' => '/mnt/upload7/$site/$lang', [02:41:28] you mean we are using /srv/mediawiki/images/ here? [02:41:42] omg it takes like 90 minutes to ‘git review’ [02:41:49] I’ve never been anywhere with internet as slow as SF [02:41:59] did you copy the images from virt1000? [02:42:07] mutante: yes [02:42:14] but, no, they’re in... [02:42:21] well, look at the apache config [02:42:27] they’re in a weird place for historical reasons [02:42:51] (PS1) Andrew Bogott: Give wikitech (wherever it is) access to keystone and nova services. [puppet] - https://gerrit.wikimedia.org/r/189654 [02:42:52] we have the choice now [02:42:58] it's new [02:44:08] mutante: ok, I’m not following, but… I welcome a patch, as long as you keep in mind that the images are in a weird place on silver [02:44:20] (CR) Andrew Bogott: [C: 2] Give wikitech (wherever it is) access to keystone and nova services. [puppet] - https://gerrit.wikimedia.org/r/189654 (owner: Andrew Bogott) [02:44:27] just tell me where you put the images?? [02:45:29] from the apache site config: Alias /w/images /srv/org/wikimedia/controller/wikis/images [02:46:13] ok, I can log in now. bd808 and Krenair if still here can you confirm that your 2fa works?
[02:46:58] (PS1) Dzahn: fix image path on wikitech after switch to silver [mediawiki-config] - https://gerrit.wikimedia.org/r/189655 [02:48:56] (PS2) Dzahn: fix image path on wikitech after switch to silver [mediawiki-config] - https://gerrit.wikimedia.org/r/189655 [02:49:08] (CR) Andrew Bogott: [C: 1] fix image path on wikitech after switch to silver [mediawiki-config] - https://gerrit.wikimedia.org/r/189655 (owner: Dzahn) [02:49:11] thanks mutante [02:49:57] andrewbogott: so we're good? [02:50:09] springle: so far everything is working. I logged in, got a list of instances, created a new one… [02:50:19] let’s not drop anything off virt1000 just yet [02:50:23] yep [02:50:43] but yeah, so far it’s working surprisingly well for me. [02:50:54] Would be nice if a second user could confirm log in/page edit/instance list [02:52:06] andrewbogott: i can login, edit, and list instances [02:52:17] springle: splendid. And it’s not ridiculously slow for you? [02:52:23] It is for me, but I blame my local connection [02:52:29] no it seems fast [02:52:45] Krenair, do you still need https://gerrit.wikimedia.org/r/#/c/189639/ ? [02:52:54] great. [02:52:58] I think you can stand down, then, springle. [02:53:15] I’m sure there will be minor openstack integration things I’ll need to hunt down, but shouldn’t be anything db-related. [02:53:37] thank you, everyone, for your help with this! I always get super tense during things like this :( [02:54:21] !log finished wikitech move to silver. [02:54:29] Logged the message, Master [02:54:32] woo [02:54:47] andrewbogott: i've left wikiuser/wikiadmin db users on virt1000 but switched them to read_only there. nothing else is dropped yet, until you say so [02:55:10] hm, what’s going on here? https://wikitech.wikimedia.org/wiki/Labs_Server_Admin_Log [02:55:12] missing templates? [02:55:16] springle: sounds good [02:55:44] springle: any idea why those template bits didn’t replicate?
[02:55:58] no [02:56:03] hm [02:56:16] the db was replicated entirely. something else is going on [02:56:18] maybe lua stuff… but that’s still in the db same as everything else, right? [02:56:49] andrewbogott, I logged in and edited [02:56:56] Krenair: thanks! [02:57:22] MaxSem, that was supposed to roll back a broken change [02:57:33] no idea why it randomly failed like that [02:57:45] do you still need it? [02:58:14] will try it again [02:58:35] (that commit it's reverting never made it to tin) [02:59:25] springle: false alarm, those templates look just as broken on the old wikitech [02:59:51] hm, wait, maybe... [03:00:52] yeah, they were broken all along [03:10:31] andrewbogott, I was useless, you didn't need to give me any credit :p [03:10:38] andrewbogott: Woo, congrats. [03:11:16] Also, well done springle and everyone. :-) [03:12:34] moral support counts [03:14:53] hm, and now suddenly my internet is so much slower that I can’t even keep an ssh session alive. [03:15:00] I guess that’s my cue to punch out for the night [03:22:34] andrewbogott, time to install mosh in prod? :P [03:22:45] maybe! [03:23:08] Then we can adopt the ‘all ops on busses, all the time’ policy that I’ve been advocating for. [03:27:05] g’night all — please email me if wikitech croaks [03:27:18] (03PS1) 10Andrew Bogott: Remove the nova::manager class from virt1000. [puppet] - 10https://gerrit.wikimedia.org/r/189658 [03:27:20] (03PS1) 10Andrew Bogott: Revert "Temporary hack: Turn off wikitech-static dump crons." 
[puppet] - 10https://gerrit.wikimedia.org/r/189659 [03:39:05] (03PS1) 10Cenarium: Remove 'autoreview' from 'autoconfirmed', check the former for PC2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189661 [03:45:33] (03Abandoned) 10Cenarium: Checking "autoreview" instead of "autoconfirmed" for enwiki FlaggedRevs restriction level [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189513 (owner: 10Cenarium) [04:10:51] andrewbogott_afk: Those missing SALs never showed on that page as I recall. They are the two that aren't like the others. See https://wikitech.wikimedia.org/wiki/Template:EmbedSAL to see why [04:30:14] (03PS2) 10Cenarium: Remove 'autoreview' from 'autoconfirmed', check the former for PC2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189661 [04:40:12] (03CR) 10Tim Starling: [C: 031] "The main reason for setting a high max_execution_time is to avoid terminating long-running write requests. Terminating them could lead to " [puppet] - 10https://gerrit.wikimedia.org/r/189505 (owner: 10Giuseppe Lavagetto) [04:43:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Feb 10 04:42:17 UTC 2015 (duration 42m 16s) [04:43:26] Logged the message, Master [05:15:17] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures [05:33:07] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:28:05] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:55] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 4 failures [06:29:16] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:26] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:47] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 3 failures [06:30:06] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures [06:42:42] 
3operations, Incident-20150205-SiteOutage, MediaWiki-Core-Team, Wikimedia-Logstash: Prototype Monolog and rsyslog configuration to ship log events from MediaWiki to Logstash - https://phabricator.wikimedia.org/T88870#1027134 (10bd808) The syslog transport will pass messages up to 65023 bytes minus the syslog mes... [06:46:07] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:46:35] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [06:46:55] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [06:47:06] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [06:47:37] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:47:55] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [07:09:27] <_joe_> !log restarting HHVM on mw1139, in a deadlock in HPHP::StatCache::refresh () [07:09:33] Logged the message, Master [07:10:06] RECOVERY - Apache HTTP on mw1139 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.079 second response time [07:10:06] RECOVERY - HHVM rendering on mw1139 is OK: HTTP OK: HTTP/1.1 200 OK - 65973 bytes in 0.380 second response time [07:10:52] <_joe_> !log restarting HHVM on mw1128, in a deadlock in HPHP::RequestInjectionData::onSessionInit () [07:10:56] Logged the message, Master [07:12:06] RECOVERY - Apache HTTP on mw1128 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.093 second response time [07:12:06] RECOVERY - HHVM rendering on mw1128 is OK: HTTP OK: HTTP/1.1 200 OK - 65965 bytes in 0.237 second response time [07:17:15] RECOVERY - HHVM queue size on mw1139 is OK: OK: Less than 30.00% above the threshold [10.0] [07:17:25] 
RECOVERY - HHVM busy threads on mw1139 is OK: OK: Less than 30.00% above the threshold [57.6] [07:20:06] RECOVERY - HHVM queue size on mw1128 is OK: OK: Less than 30.00% above the threshold [10.0] [07:20:15] RECOVERY - HHVM busy threads on mw1128 is OK: OK: Less than 30.00% above the threshold [57.6] [07:24:59] 3operations, Incident-20150205-SiteOutage: Nutcracker needs to automatically recover from MC failure - rebalancing issues - https://phabricator.wikimedia.org/T88730#1027183 (10Joe) [07:25:02] 3operations, Incident-20150205-SiteOutage, ops-eqiad: Split memcached in eqiad across multiple racks/rows - https://phabricator.wikimedia.org/T83551#1027182 (10Joe) [07:26:52] 3operations, Incident-20150205-SiteOutage, ops-eqiad: Split memcached in eqiad across multiple racks/rows - https://phabricator.wikimedia.org/T83551#915542 (10Joe) This is not blocked by me at all now - my investigation in rebalancing is done. Whenever mc1017 and mc1018 are moved and reinstalled I can work on m... [07:27:11] 3operations, Incident-20150205-SiteOutage, ops-eqiad: Split memcached in eqiad across multiple racks/rows - https://phabricator.wikimedia.org/T83551#1027185 (10Joe) a:5Joe>3Cmjohnson [07:45:36] PROBLEM - puppet last run on heze is CRITICAL: CRITICAL: puppet fail [07:55:20] 3operations: Create a service_unit custom type for puppet that supports systemd - https://phabricator.wikimedia.org/T89086#1027194 (10Joe) 3NEW [07:55:34] 3operations: Create a service_unit custom type for puppet that supports systemd - https://phabricator.wikimedia.org/T89086#1027201 (10Joe) a:3Joe [07:57:28] 3operations: Create a service_unit custom type for puppet that supports systemd - https://phabricator.wikimedia.org/T89086#1027194 (10Joe) [07:57:29] 3operations: Setup memcached cluster in codfw - https://phabricator.wikimedia.org/T86888#1027212 (10Joe) [08:03:36] RECOVERY - puppet last run on heze is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [08:23:17] (03PS2) 
Giuseppe Lavagetto: mediawiki: send .phtml files to HHVM as well [puppet] - https://gerrit.wikimedia.org/r/189440 [08:26:22] (CR) Giuseppe Lavagetto: [C: 2] mediawiki: send .phtml files to HHVM as well [puppet] - https://gerrit.wikimedia.org/r/189440 (owner: Giuseppe Lavagetto) [08:40:26] operations: For exhibitor of DubaiWoodShow 2015. - https://phabricator.wikimedia.org/T89090#1027273 (emailbot) [08:45:33] (PS1) Giuseppe Lavagetto: Revert "mediawiki: send .phtml files to HHVM as well" [puppet] - https://gerrit.wikimedia.org/r/189688 [08:45:49] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Revert "mediawiki: send .phtml files to HHVM as well" [puppet] - https://gerrit.wikimedia.org/r/189688 (owner: Giuseppe Lavagetto) [08:45:54] * _joe_ sighs [08:58:29] (CR) KartikMistry: "Real reason for this is:" [puppet] - https://gerrit.wikimedia.org/r/188796 (owner: KartikMistry) [09:03:39] greetings [09:07:57] <_joe_> ciao godog [09:08:33] (CR) Filippo Giunchedi: [C: 1] Bump cassandra memory on restbase test cluster [puppet] - https://gerrit.wikimedia.org/r/189531 (owner: GWicke) [09:08:47] hey _joe_ [09:10:37] (CR) Hashar: "Don't we want to run tests with the pinned versions from composer.lock ?
It seems to me the issue we had with composer install could be f" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189148 (https://phabricator.wikimedia.org/T85947) (owner: 10Legoktm) [09:16:36] (03CR) 10QChris: Correcting docs and thresholds for eventlogging alarms (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/189588 (owner: 10Nuria) [09:23:11] 3operations: reclaim graphite1002 - https://phabricator.wikimedia.org/T88994#1027334 (10fgiunchedi) I've changed my mind :) I'll reprovision this with jessie and try out dm-cache instead [09:23:44] (03PS15) 10KartikMistry: cxserver: Use different registry for Beta and Production [puppet] - 10https://gerrit.wikimedia.org/r/188796 [09:23:52] 3operations: use graphite1002 to test dm-cache - https://phabricator.wikimedia.org/T88994#1027338 (10fgiunchedi) [09:24:14] 3operations: use graphite1002 to test dm-cache - https://phabricator.wikimedia.org/T88994#1025029 (10fgiunchedi) [09:24:46] 3operations: use graphite1002 to test dm-cache - https://phabricator.wikimedia.org/T88994#1025029 (10fgiunchedi) [09:24:48] 3operations: consider hybrid caching options for ssd+disk - https://phabricator.wikimedia.org/T88992#1027349 (10fgiunchedi) [09:33:28] 3operations: migrate graphite to new hardware - https://phabricator.wikimedia.org/T85909#1027357 (10fgiunchedi) [09:36:59] 3operations: migrate graphite to new hardware - https://phabricator.wikimedia.org/T85909#1027361 (10fgiunchedi) graphite1001 in service at the moment, waiting for graphite2001 to be online to resolve this [09:39:51] * godog morning phabricator sweeping [09:43:18] (03PS16) 10KartikMistry: cxserver: Use different registry for Beta and Production [puppet] - 10https://gerrit.wikimedia.org/r/188796 [09:50:07] (03CR) 10KartikMistry: "Tested at: http://es.wikipedia.beta.wmflabs.org/wiki/Especial:ContentTranslation" [puppet] - 10https://gerrit.wikimedia.org/r/188796 (owner: 10KartikMistry) [09:50:22] akosiaris: ^^ [09:51:36] PROBLEM - Disk space on xenon is 
CRITICAL: DISK CRITICAL - free space: /mnt/data 11842 MB (3% inode=99%): [09:52:38] I'll take a look at xenon but that's restbase/cassandra test box [09:55:15] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 645 [09:58:47] ACKNOWLEDGEMENT - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 10681 MB (3% inode=99%): Filippo Giunchedi cassandra/restbase/titan test box [10:00:16] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 945 [10:00:36] <_joe_> godog: that's promising [10:01:22] hehe gwicke ^ 09:51 -icinga-wm:#wikimedia-operations- PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 11842 MB (3% inode=99% [10:02:59] (03PS1) 10Florianschmidtwelzow: Mobile: Fix JS-Exception for non-english beta projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189693 (https://phabricator.wikimedia.org/T89095) [10:04:46] (03CR) 10Florianschmidtwelzow: "Follow up: I36b62c084b17394ba8942a37f44d84dd26bf6faf" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/186319 (owner: 10Jdlrobson) [10:05:15] (03PS2) 10Florianschmidtwelzow: Mobile: Fix JS-Exception for non-english beta projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189693 (https://phabricator.wikimedia.org/T89095) [10:05:15] RECOVERY - check_mysql on db1008 is OK: Uptime: 54541 Threads: 2 Questions: 787486 Slow queries: 432 Opens: 1268 Flush tables: 2 Open tables: 64 Queries per second avg: 14.438 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [10:13:19] (03PS1) 10Giuseppe Lavagetto: mediawiki: rewrite /w/wiki.phtml on HHVM [puppet] - 10https://gerrit.wikimedia.org/r/189697 [10:16:21] (03CR) 10Hashar: [C: 031] "Thanks Florian!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/189693 (https://phabricator.wikimedia.org/T89095) (owner: 10Florianschmidtwelzow) [10:23:23] 3operations: Scribunto_LuaInterpreterNotFoundError in production - https://phabricator.wikimedia.org/T88942#1027454 (10Joe) I merged the patch, and upon early testing I found that HHVM is taking ~ 30 seconds to respond to any request to that page; also, the response is very different from the one you get from Ze... [10:40:46] PROBLEM - Host graphite1002 is DOWN: PING CRITICAL - Packet loss = 100% [10:42:05] PROBLEM - Disk space on cerium is CRITICAL: DISK CRITICAL - free space: /mnt/data 11851 MB (3% inode=99%): [10:42:10] <_joe_> godog: is that you? [10:43:26] RECOVERY - Host graphite1002 is UP: PING OK - Packet loss = 0%, RTA = 1.32 ms [10:43:28] graphite1002? yes, forgot to !log [10:43:35] !log reimage graphite1002 [10:43:42] Logged the message, Master [10:43:44] I thought it was in downtime tho [10:44:19] ah yes of course, that's icinga picking up the new host [10:46:55] (03PS1) 10Filippo Giunchedi: install graphite1002 with jessie [puppet] - 10https://gerrit.wikimedia.org/r/189710 (https://phabricator.wikimedia.org/T88994) [10:47:25] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install graphite1002 with jessie [puppet] - 10https://gerrit.wikimedia.org/r/189710 (https://phabricator.wikimedia.org/T88994) (owner: 10Filippo Giunchedi) [10:48:36] (03CR) 10Faidon Liambotis: [C: 04-1] "- I read anecdotal evidence on the web that a configuration reload is enough to reload keys; we should verify this if you haven't done thi" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/189613 (owner: 10BBlack) [10:50:48] (03PS3) 10Giuseppe Lavagetto: mediawiki: do not escape urls in the catchall redirect to https [puppet] - 10https://gerrit.wikimedia.org/r/188762 [10:51:50] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: do not escape urls in the catchall redirect to https [puppet] - 10https://gerrit.wikimedia.org/r/188762 
(owner: 10Giuseppe Lavagetto) [10:56:12] (03CR) 10Hoo man: [C: 032] "Beta-only change." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189693 (https://phabricator.wikimedia.org/T89095) (owner: 10Florianschmidtwelzow) [10:56:28] (03Merged) 10jenkins-bot: Mobile: Fix JS-Exception for non-english beta projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189693 (https://phabricator.wikimedia.org/T89095) (owner: 10Florianschmidtwelzow) [10:57:13] !log hoo Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 06s) [10:57:17] Logged the message, Master [11:01:55] 3operations: Redirects to https need to set NE (no escape) in apache - https://phabricator.wikimedia.org/T88359#1027601 (10Joe) 5Open>3Resolved [11:02:06] (03CR) 10Krinkle: [C: 031] mediawikiwiki: Allow sysop to add and remove themself from translationadmin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187183 (https://phabricator.wikimedia.org/T87797) (owner: 10Florianschmidtwelzow) [11:15:10] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [11:20:12] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [11:30:12] (03CR) 10KartikMistry: [C: 04-1] "Should not merge unless we sure about Yandex on Beta 'only'." 
[puppet] - 10https://gerrit.wikimedia.org/r/188517 (owner: 10KartikMistry) [11:36:31] (03CR) 10Florianschmidtwelzow: "thanks for merging hoo :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189693 (https://phabricator.wikimedia.org/T89095) (owner: 10Florianschmidtwelzow) [11:40:35] (03PS13) 10KartikMistry: WIP: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [11:44:33] (03PS1) 10Filippo Giunchedi: introduce partman recipe for dm-cache [puppet] - 10https://gerrit.wikimedia.org/r/189720 (https://phabricator.wikimedia.org/T88992) [11:46:17] (03PS2) 10Filippo Giunchedi: introduce partman recipe for dm-cache [puppet] - 10https://gerrit.wikimedia.org/r/189720 (https://phabricator.wikimedia.org/T88992) [11:46:48] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] introduce partman recipe for dm-cache [puppet] - 10https://gerrit.wikimedia.org/r/189720 (https://phabricator.wikimedia.org/T88992) (owner: 10Filippo Giunchedi) [11:50:25] (03PS2) 10Giuseppe Lavagetto: mediawiki: rewrite /w/wiki.phtml on HHVM [puppet] - 10https://gerrit.wikimedia.org/r/189697 [11:50:29] (03CR) 10Phuedx: [C: 031] Adding original language of this work campaign for WikiGrok [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188731 (owner: 10Kaldari) [11:51:30] 3operations, ops-eqiad: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1027725 (10faidon) [11:51:56] 3operations, ops-eqiad: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1027726 (10faidon) 5declined>3stalled [12:00:16] (03CR) 10Alexandros Kosiaris: [C: 031] "Ew and LGTM in the same sentence. Perhaps we should consider dropping that .phtml file in the future ?" [puppet] - 10https://gerrit.wikimedia.org/r/189697 (owner: 10Giuseppe Lavagetto) [12:08:24] (03CR) 10Giuseppe Lavagetto: "Unluckily, that file is in the binary distribution of mediawiki. 
We could do that, but it's going to be a backward-compatibility-breaking " [puppet] - https://gerrit.wikimedia.org/r/189697 (owner: Giuseppe Lavagetto) [12:08:46] <_joe_> akosiaris: I know it's horrible [12:09:40] _joe_: yeah but it's like PHP2 called [12:10:05] and we have trouble moving on to PHP5.5 [12:10:18] <_joe_> akosiaris: :P [12:10:18] it sounds ... weird [12:10:22] <_joe_> it does [12:10:38] <_joe_> also, the fact that it breaks hhvm is quite notable if you ask me :P [12:21:07] operations: Upgrade eqiad LVS to 10G - https://phabricator.wikimedia.org/T89120#1027772 (faidon) [12:27:39] (CR) Alexandros Kosiaris: [C: 2] "I assume you meant smaller (not larger) JVM heap sizes lead to GC stop-the-world pauses. Otherwise LGTM" [puppet] - https://gerrit.wikimedia.org/r/189531 (owner: GWicke) [12:33:22] (PS17) KartikMistry: cxserver: Use different registry for Beta and Production [puppet] - https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) [12:43:37] (PS1) Matanya: apparmor: minor lint fix [puppet] - https://gerrit.wikimedia.org/r/189723 [12:44:30] PROBLEM - puppet last run on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:52:24] akosiaris: pupppet private repo question: source => 'puppet:///private/authdns/id_rsa.pub', returns in pupppet lint: WARNING: puppet:// URL without modules/ found on line 37 [12:52:24] do we want/can fix it ? [12:52:38] many ppp today [13:04:38] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1027818 (Aklapper) When I go to https://phabricator.wikimedia.org/tag/operations/ , "Join Project" and "Edit Members" is greyed out for me a...
[13:05:11] PROBLEM - puppet last run on mw1134 is CRITICAL: CRITICAL: Puppet has 1 failures [13:06:00] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1027820 (Aklapper) p:Triage>Normal [13:23:13] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1027846 (Matanya) I guess it refers to: https://git.wikimedia.org/blob/operations%2Fpuppet.git/e568e9aa26b8abfa4b8cbb0ceaba7d2497d39b11/modu... [13:23:21] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [13:38:38] (PS2) Steinsplitter: To allow dia-files (for flowcharts) we need to whitelist x-gzip [mediawiki-config] - https://gerrit.wikimedia.org/r/188557 (https://phabricator.wikimedia.org/T88242) [13:43:50] matanya: sorry, I was at lunch.
No I don't think there is much that we can do about the private puppet repo [13:44:05] no worries, thanks for your answer [13:56:17] (03CR) 10Alexandros Kosiaris: [C: 032] Update cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/189530 (https://phabricator.wikimedia.org/T88956) (owner: 10GWicke) [13:59:40] (03CR) 10Alexandros Kosiaris: [C: 032] apparmor: minor lint fix [puppet] - 10https://gerrit.wikimedia.org/r/189723 (owner: 10Matanya) [14:08:12] Steinsplitter: https://wikitech.wikimedia.org/w/index.php?diff=143650&oldid=143622 isn't quite right, should probably have been like https://wikitech.wikimedia.org/w/index.php?diff=143653&oldid=143622 [14:08:52] anomie: thanks :): [14:08:55] *) [14:12:05] (03CR) 10Gerardduenas: [C: 031] Enable Quiz extension at cawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187913 (https://phabricator.wikimedia.org/T88208) (owner: 10Glaisher) [14:17:40] (03CR) 10Gerardduenas: [C: 031] Create 'interface-editor' user group on cawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187915 (https://phabricator.wikimedia.org/T85713) (owner: 10Glaisher) [14:22:28] (03CR) 10Gerardduenas: [C: 031] Standardize the name of interface editor group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731) (owner: 10Glaisher) [14:26:47] 3operations: NetEase/YouDa company seeks guidance for setting up local mirror of wikipedia - https://phabricator.wikimedia.org/T89137#1028006 (10ArielGlenn) 3NEW a:3ArielGlenn [14:27:10] 3operations: NetEase/YouDa company seeks guidance for setting up local mirror of wikipedia - https://phabricator.wikimedia.org/T89137#1028015 (10ArielGlenn) Quoting from one of those emails: "As you mentioned in the meeting, you provide three methods for downloading Wiki content(Dump, HTML, OpenZim). We tend to... 
[14:29:56] (03CR) 10Alex Monk: Allow a full text search button on Commons whenever possible (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/186916 (https://phabricator.wikimedia.org/T19471) (owner: 10Nemo bis) [14:30:04] AndyRussG, ejegg: Respected human, time to deploy CentralNotice update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150210T1430). Please do the needful. [14:30:23] Yurp [15:01:21] 3operations: NetEase/YouDa company seeks guidance for setting up local mirror of wikipedia - https://phabricator.wikimedia.org/T89137#1028088 (10JanWMF) Thanks Ariel. "The meeting" in the quote on the top was a breakfast NetEase's delegation had with Erik, Damon, and Amir on 1/28 and this ticket is part of a wid... [15:12:37] (03PS2) 10Nuria: Correcting docs and thresholds for eventlogging alarms [puppet] - 10https://gerrit.wikimedia.org/r/189588 [15:16:48] (03CR) 10Nuria: Correcting docs and thresholds for eventlogging alarms (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/189588 (owner: 10Nuria) [15:25:00] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:32:21] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59691 bytes in 0.308 second response time [15:34:52] <^d> aude: About? [15:39:15] ^d: ? [15:39:26] <^d> Spotted in prod: https://phabricator.wikimedia.org/P277 [15:39:38] <^d> Dunno if it's known, fixed, new? [15:39:43] we know :/ [15:39:59] <^d> There a task for it already? [15:40:03] i think we were working on a fix, but wouldn't be deployed yet [15:40:06] let me find it [15:40:18] 3operations, Continuous-Integration: Create a Debian package for NodePool - https://phabricator.wikimedia.org/T89142#1028174 (10hashar) 3NEW [15:40:21] cool we can do pastes in phabricator :) [15:40:24] <^d> Long as there's a task and a fix in the pipe then I'm good :) [15:40:32] <^d> Just making sure it wasn't a "omg what is that?!?" 
[15:40:33] <^d> :) [15:40:47] <_joe_> ^d: you're working on those logs eh? :) [15:40:57] * _joe_ is happy [15:41:02] (03CR) 10GWicke: "@Alex, I actually mean larger, as the heap then gets too large for reasonably quick collections. 8g is basically the largest recommended h" [puppet] - 10https://gerrit.wikimedia.org/r/189531 (owner: 10GWicke) [15:41:17] <^d> _joe_: They've gotten noisy since we got them all quiet like 2 years ago :p [15:41:17] <_joe_> oh, java. [15:41:21] 3operations, Continuous-Integration: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1028185 (10hashar) [15:41:38] https://phabricator.wikimedia.org/T76545 [15:41:58] <^d> Awesome! [15:41:59] <^d> Thx! [15:42:06] sure [15:42:17] <^d> Cool trick with the pastes by the way, you can embed them. [15:42:18] thanks for poking us [15:42:25] cool :) [15:42:31] <^d> So pastebin to phab, then {P12182918291829} and it'll show it in a scrolling textbox [15:42:47] and can ctrl+v pictures into phab [15:43:04] <_joe_> ^d: at DAYJOB~2 we did blame of any error we had in prod (and we used E_STRICT) and the dev responsible for introducing it should just fix it. We had a zero-notice in prod policy that, being enforced by the local sysadmin, was quite effective :P [15:43:25] <_joe_> did I mention the code base was horrible anyways? :P [15:43:32] <^d> All are :) [15:43:50] <_joe_> no. Mediawiki is fairly decent tbh [15:44:09] <_joe_> even mediawiki-config is at least rationally built and indented [15:44:26] <_joe_> for php standards, of course [15:48:29] <^d> Ok, wikidata is on its way to being fixed. bugs filed for echo and flow [15:50:57] I suppose I'll SWAT this morning. [15:51:07] <^d> jouncebot: next [15:51:07] In 0 hour(s) and 8 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150210T1600) [15:51:23] Steinsplitter, Glaisher: Ping for SWAT in about 9 minutes [15:51:33] I'm here. 
[15:52:12] :) [15:52:20] <^d> # Pete made me --Roan [15:52:21] <^d> $wgFileExtensions[] = 'omniplan'; [15:52:22] <^d> Heh [15:54:56] <^d> _joe_: Is the apache2 log on fluorine supposed to just be a useless wall of fcgi noise now? [15:55:17] <_joe_> ^d: yes it's a known, bogus wall of noise [15:55:44] <_joe_> it's noise, and fixing it requires me patching, repackaging and maintaining apache for trusty [15:55:50] <_joe_> not sure it's gonna happen [15:56:42] <^d> All of apache? Or just proxy_fcgi? [15:57:57] <^d> Oh I guess it's a standard module they distribute in the apache2 package, nvm. [15:58:01] <_joe_> there is no such distinction :) [15:58:39] (CR) BBlack: "I'm confident that config reload does reload keys based on what I've read (including the source). We could verify that functionally, of c" [puppet] - https://gerrit.wikimedia.org/r/189613 (owner: BBlack) [15:59:00] oh wow that's lengthy :) [16:00:04] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150210T1600). Please do the needful. [16:00:11] Steinsplitter: You're first [16:00:24] Oh, I'm glad someone else is paying attention :) [16:00:38] (CR) Anomie: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/188557 (https://phabricator.wikimedia.org/T88242) (owner: Steinsplitter) [16:01:14] thanks. will test [16:01:19] (should :P) [16:01:52] paravoid: :) probably this should have been a phab task instead of a gerrit-comment-discussion, but meh [16:03:32] operations, Continuous-Integration: Create a Debian package for NodePool - https://phabricator.wikimedia.org/T89142#1028225 (hashar) [16:05:29] anomie: the change needs to be synchronized?
[16:05:43] Steinsplitter: Jenkins is being slow/broken [16:06:00] ok :) [16:06:39] operations, Incident-20150205-SiteOutage, Wikimedia-Logstash: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1028231 (bd808) >>! In T88732#1025266, @bd808 wrote: > Not that I've found yet but I haven't deeply investigated Monolog's GELF support. I've... [16:06:54] (Merged) jenkins-bot: To allow dia-files (for flowcharts) we need to whitelist x-gzip [mediawiki-config] - https://gerrit.wikimedia.org/r/188557 (https://phabricator.wikimedia.org/T88242) (owner: Steinsplitter) [16:07:03] ^d: I don't see Nik around, however FYI I'll resume bouncing the elasticsearch cluster starting with elastic1002 [16:07:47] !log anomie Synchronized wmf-config/CommonSettings.php: SWAT: Whitelist application/x-gzip on private wikis to fully allow dia files [[gerrit:188557]] (duration: 00m 05s) [16:07:47] Steinsplitter: ^ Test please [16:07:52] Logged the message, Master [16:07:52] Glaisher: You're next [16:07:59] <^d> godog: Go for it, I'm about [16:08:17] k [16:08:49] anomie: unfortunately not [16:09:55] Steinsplitter: ? [16:10:21] RECOVERY - Disk space on xenon is OK: DISK OK [16:10:33] !log stop replication on elasticsearch cluster and restart ES on elastic1002 [16:10:37] Logged the message, Master [16:11:03] anomie: File extension ".dia" does not match the detected MIME type of the file (application/x-gzip). [16:11:07] operations, Continuous-Integration: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1028235 (hashar) [16:11:26] but it schould now, because x-gzip is whitelisted. 
strange [16:11:54] operations, Incident-20150205-SiteOutage, MediaWiki-Core-Team, Wikimedia-Logstash: Prototype Monolog and rsyslog configuration to ship log events from MediaWiki to Logstash - https://phabricator.wikimedia.org/T88870#1028237 (bd808) If we wanted to use rsyslog as an intermediary, it would need configuration si... [16:12:24] Steinsplitter: Well, I'm going to move on with Glaisher's patches for the moment while you figure out what you want to do next. [16:12:35] (PS3) Anomie: Set $wmgUseFloatedToc to false at dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/187408 (https://phabricator.wikimedia.org/T87534) (owner: Glaisher) [16:12:46] (CR) Anomie: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/187408 (https://phabricator.wikimedia.org/T87534) (owner: Glaisher) [16:12:49] (Merged) jenkins-bot: Set $wmgUseFloatedToc to false at dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/187408 (https://phabricator.wikimedia.org/T87534) (owner: Glaisher) [16:13:10] !log anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wmgUseFloatedToc to false at dewikivoyage [[gerrit:187408]] (duration: 00m 06s) [16:13:11] Glaisher: ^ Test please [16:13:13] Logged the message, Master [16:14:28] anomie: working as expected [16:14:41] (PS2) Anomie: Enable Quiz extension at cawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/187913 (https://phabricator.wikimedia.org/T88208) (owner: Glaisher) [16:14:42] operations, Incident-20150205-SiteOutage, Wikimedia-Logstash: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1028241 (faidon) We don't really care about local logging, so I don't see why we'd involve the local rsyslog. Remote syslog over UDP or GELF... 
[16:14:50] (CR) Anomie: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/187913 (https://phabricator.wikimedia.org/T88208) (owner: Glaisher) [16:14:54] (Merged) jenkins-bot: Enable Quiz extension at cawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/187913 (https://phabricator.wikimedia.org/T88208) (owner: Glaisher) [16:15:15] !log anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Quiz extension at cawikibooks [[gerrit:187913]] (duration: 00m 07s) [16:15:16] Glaisher: ^ Test please [16:15:17] ^d: cool, currently in yellow [16:15:19] Logged the message, Master [16:15:47] anomie: the config is synchronized to nun public wikis (like otrs?) [16:16:05] on my testwiki it has worked. *sigh* [16:16:46] Steinsplitter: Should be, as far as I know. [16:16:55] anomie: working [16:16:56] thanks [16:17:58] Steinsplitter: Are you getting the same error, or is it a different error now? [16:18:05] !log temporarily disable puppet on carbon [16:18:08] Logged the message, Master [16:18:10] RECOVERY - Disk space on cerium is OK: DISK OK [16:18:15] anomie: same [16:21:35] operations, Release-Engineering, WMF-Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1028251 (greg) >>! In T76560#1026252, @bd808 wrote: > @greg is going to find a #release-engineering helper for this project @Nirzar: Can you give me/this ticket a status update of where you are/what... 
[16:22:52] (PS1) Anomie: Revert "To allow dia-files (for flowcharts) we need to whitelist x-gzip" [mediawiki-config] - https://gerrit.wikimedia.org/r/189746 [16:22:59] (PS2) Anomie: Revert "To allow dia-files (for flowcharts) we need to whitelist x-gzip" [mediawiki-config] - https://gerrit.wikimedia.org/r/189746 [16:23:08] (CR) Anomie: [C: 2] Revert "To allow dia-files (for flowcharts) we need to whitelist x-gzip" [mediawiki-config] - https://gerrit.wikimedia.org/r/189746 (owner: Anomie) [16:23:13] (Merged) jenkins-bot: Revert "To allow dia-files (for flowcharts) we need to whitelist x-gzip" [mediawiki-config] - https://gerrit.wikimedia.org/r/189746 (owner: Anomie) [16:23:23] Steinsplitter: I'm going to revert your change then, since it doesn't seem to have been useful. [16:23:38] yes, there seems to be a other security measure [16:24:04] er docs this schould resolve the problem. thans anomie [16:24:12] !log anomie Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Whitelist application/x-gzip on private wikis to fully allow dia files", wasn't a correct fix for the issue (duration: 00m 05s) [16:24:18] Logged the message, Master [16:24:39] Steinsplitter: Once I looked into it some, application/x-gzip was never blacklisted in the first place. [16:27:42] Ops-Access-Requests: Create shell access for Zeljko - RelEng rights - https://phabricator.wikimedia.org/T87597#1028266 (zeljkofilipin) Apologies for being slow on this. I have installed a clean OS on my main machine before going to a 3 week business trip. Now that I am back I am overwhelmed with stuff that ne... [16:27:52] *grimbl* [16:29:29] What is the procedure for getting ops people to check on something? We've added it to your project but if your backlog is as huge as ours that might not do anything [16:30:34] marktraceur: if you look at the ‘On Ops Duty’ entry in the topic… that’s the person to ping if you have something urgent. Alex, this week. 
[16:30:41] (Sorry if I’m stating the obvious) [16:30:56] Hm...urgency is subjective, but fair enough [16:31:13] akosiaris: We have this business happening: https://phabricator.wikimedia.org/T89018#1026790 [16:31:16] qchris_away: ping [16:31:24] well, it’s his job to triage as well. So telling you to get bent is well within his scope :) [16:31:38] akosiaris: Large files problematic on Commons currently, unsure of the cause, maybe OOM error, maybe something we can fix in the extension, not sure [16:31:54] andrewbogott: Sounds like a fun game [16:32:30] bd808: At this point I’m not clear on if the new wikitech is or isn’t fully on the deployment train. Do I need to change some settings in the config still, or is just the fact that deployers have access to the box the last step? [16:32:50] operations, Incident-20150205-SiteOutage, Wikimedia-Logstash: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1028273 (chasemp) Thought about this a bit and actually have had this same conversation in the past. I don't think rsyslog is clearly a bette... [16:32:57] I guess I can wait until swat and see what happens... [16:33:23] andrewbogott: well. you pushed changes last night via sync-file from tin right? [16:33:46] bd808: no, I ran sync-common on silver. But that was just out of habit, I’m not sure if it was needed :/ [16:34:08] ok. Lets check the dsh group file on tin to see if silver is in there [16:34:41] ok — where’s that? [16:34:55] Not there yet. /etc/dsh/groups/mediawiki-installation [16:35:07] marktraceur: ok looking into it [16:35:08] That is the file that scap reads to decide what hosts to talk to [16:35:42] Thanks akosiaris! 
[16:35:59] andrewbogott: So silver needs to be added there via ops/puppet and then someone can test a no-op sync to see if silver is updated or throws an error [16:36:16] But I can log into silver so that's a good sign that it may work [16:36:17] bd808: ah, mediawiki-installation is straight from puppet? [16:36:19] * andrewbogott looks [16:36:20] Ops-Access-Requests: Create shell access for Zeljko - RelEng rights - https://phabricator.wikimedia.org/T87597#1028277 (hashar) @zeljkofilipin we can pair it together if you want :-] [16:38:32] marktraceur: only reproducible in firefox ? I assume due to firefogg being a prerequisite ? [16:38:51] akosiaris: Actually one of the commenters on the other bug said they were using Chrome [16:39:13] You're uploading an OGV file, so Firefogg doesn't do anything (because it converts *to* OGV) [16:39:40] I did not even know firefogg existed a couple of minutes ago, thanks for the explanation [16:40:06] (PS2) Andrew Bogott: Remove the nova::manager class from virt1000. [puppet] - https://gerrit.wikimedia.org/r/189658 [16:40:08] (PS2) Andrew Bogott: Revert "Temporary hack: Turn off wikitech-static dump crons." [puppet] - https://gerrit.wikimedia.org/r/189659 [16:40:10] (PS1) Andrew Bogott: Add silver to the deployment train [puppet] - https://gerrit.wikimedia.org/r/189750 [16:40:16] Ops-Access-Requests, operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1028291 (chasemp) So yeah this is a side effect of how we are organizing things. A few thoughts. 1. {T77228} -- I think we need this on ou... 
[16:40:42] operations: =?UTF-8?Q?For_exhibitor_of_DubaiWoodShow_2015.?= - https://phabricator.wikimedia.org/T89090#1028295 (chasemp) Open>Invalid a:chasemp [16:42:21] (CR) Gerardduenas: "Thanks Anomie" [mediawiki-config] - https://gerrit.wikimedia.org/r/187913 (https://phabricator.wikimedia.org/T88208) (owner: Glaisher) [16:42:53] bd808: so… that’s it? https://gerrit.wikimedia.org/r/#/c/189750/ [16:43:11] godog: those test boxes don't have much storage (only 280G each), so aren't really enough to hold just enwiki and the titan test db [16:43:31] now nuked some stuff to make space [16:43:47] Ops-Access-Requests: Create shell access for Zeljko - RelEng rights - https://phabricator.wikimedia.org/T87597#1028308 (zeljkofilipin) @hashar: That would be great, thanks! So, this should be resolved around noon tomorrow CET. :) [16:43:59] (CR) BBlack: "Looking at client capabilities listed at https://www.ssllabs.com/ssltest/clients.html , it seems obvious we won't be able to replace clien" [puppet] - https://gerrit.wikimedia.org/r/189613 (owner: BBlack) [16:44:34] gwicke: thanks, do you expect more tests like that that will fill up the disk? [16:44:46] (CR) BryanDavis: [C: 1] "LGTM. A deployer should test right after merging to make sure that scap can actually talk to the host correctly." [puppet] - https://gerrit.wikimedia.org/r/189750 (owner: Andrew Bogott) [16:45:30] andrewbogott: I think so, yes. cc'd ^d and twentyafterfour on the patch. One of them should test a sync for you after merging to make sure it works [16:45:46] thanks [16:45:55] (PS2) Andrew Bogott: Add silver to the deployment train. [puppet] - https://gerrit.wikimedia.org/r/189750 [16:46:04] godog: I think for now it should be fine, and the real hardware should be ready late this week / early next (with 3T bper node) [16:46:29] * ^d looks [16:46:59] (CR) Chad: [C: 1] Add silver to the deployment train. 
[puppet] - https://gerrit.wikimedia.org/r/189750 (owner: Andrew Bogott) [16:47:38] ^d: shall I merge right now? Or wait until swat? Or… something? [16:47:55] <^d> anomie: You're done swatting, right? [16:48:12] paravoid: pong [16:48:25] ^d: Yes [16:48:33] <^d> thx [16:48:34] <^d> andrewbogott: Let's do it now then. [16:48:39] qchris: jgage and I were at the meeting :) [16:48:41] nevermind, next week [16:48:50] (CR) Andrew Bogott: [C: 2] Add silver to the deployment train. [puppet] - https://gerrit.wikimedia.org/r/189750 (owner: Andrew Bogott) [16:48:53] ok :-) [16:48:58] <^d> bd808: Do we need a full scap so silver's up to date? [16:49:32] paravoid: wait ... "the" meeting not "a" meeting. Which meeting did I miss? [16:49:33] ^d: wikitech.php is a good file to test since that has local changes on silver. I can verify that they’re clobbered after you sync-file. [16:49:49] or a full scap is fine as well if you have the patience [16:49:50] <^d> Ah, we'll just sync-file that then [16:49:58] ^d: I think andrewbogott did a sync-common last night but it wouldn't have caught sat from this morning [16:50:01] running puppet on tin… [16:50:11] *swat [16:50:11] yeah, correct [16:50:27] <^d> We'll do a wikitech sync-file to confirm, then a full scap [16:50:30] (PS3) Andrew Bogott: Remove the nova::manager class from virt1000. [puppet] - https://gerrit.wikimedia.org/r/189658 [16:50:32] (PS3) Andrew Bogott: Revert "Temporary hack: Turn off wikitech-static dump crons." [puppet] - https://gerrit.wikimedia.org/r/189659 [16:50:55] ^d: ok, puppet is done on tin. So… sync away. [16:50:57] gwicke: *nod* [16:51:29] qchris: "analytics-ops checkpoint" [16:51:40] oh. I am not longer in those. [16:51:47] oh, didn't know [16:51:47] Gotta remove myself from the invite. [16:51:50] sorry. 
[16:52:03] !log demon Synchronized wmf-config/wikitech.php: Testing silver sync (duration: 00m 05s) [16:52:09] no worries [16:52:10] Logged the message, Master [16:52:30] <^d> andrewbogott: Ok, wikitech.php sync'd [16:52:33] <^d> No errors from tin [16:53:23] ^d: worked, my debug changes on silver are gone [16:53:23] <^d> \o/ [16:53:23] <^d> Ok, I'll start a full scap [16:53:24] great, thanks [16:53:25] !log demon Started scap: No code changes, bringing silver in as deploy target [16:53:25] Logged the message, Master [16:54:20] (CR) Alexandros Kosiaris: "Ah, OK, now I understand the comment. I was confused from the 5G -> 8G increase and though the comment was related to that and not >8G. Ye" [puppet] - https://gerrit.wikimedia.org/r/189531 (owner: GWicke) [16:54:24] <^d> andrewbogott: So is this really it? Is wikitech totally on the train now? [16:54:38] yep! [16:54:41] <^d> \o/ \o/ [16:54:57] It still depends on services that run on virt1000, so I can still break it in a way that you can’t fix :) [16:55:05] (CR) BryanDavis: "We can't fake entries in the compsoer.lock file. It's the list of exactly what composer has put into vendor. So either we pack all the dev" [mediawiki-config] - https://gerrit.wikimedia.org/r/189148 (https://phabricator.wikimedia.org/T85947) (owner: Legoktm) [16:55:05] But all of the mw stuff should be in your hands now. [16:55:25] (CR) Legoktm: "It doesn't matter that much what versions of the PHP linter or PHPUnit we use as long as they follow semver so we don't pull in breaking c" [mediawiki-config] - https://gerrit.wikimedia.org/r/189148 (https://phabricator.wikimedia.org/T85947) (owner: Legoktm) [16:55:26] <^d> That's the important bit. 
I'm glad we've finally fixed this :) [16:56:18] (PS17) ArielGlenn: Allow "hoo" to sudo into datasets [puppet] - https://gerrit.wikimedia.org/r/152724 (https://phabricator.wikimedia.org/T86808) (owner: Hoo man) [16:56:20] (CR) Faidon Liambotis: [C: 1] "Oh wow, that's such a detailed response. Merging this ASAP sounds good to me! (modulo whitespace :)" [puppet] - https://gerrit.wikimedia.org/r/189613 (owner: BBlack) [16:56:26] (CR) jenkins-bot: [V: -1] Allow "hoo" to sudo into datasets [puppet] - https://gerrit.wikimedia.org/r/152724 (https://phabricator.wikimedia.org/T86808) (owner: Hoo man) [16:56:57] (PS1) Cmjohnson: Changing dns entries for mc1017 and mc1018 [dns] - https://gerrit.wikimedia.org/r/189752 [16:57:30] operations, Release-Engineering, WMF-Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1028337 (Technical13) >>! In T76560#1026945, @Jaredzimmerman-WMF wrote: > @technical13 what would a "desktop view" mean in this case, as it is a responsive page it adjusts for the device… many browse... [16:59:28] <^d> mw1097 be sick? [16:59:28] Reedy: wikitech is really on the train now! mission accomplished! (flags fly, rockets blaze) [16:59:29] andrewbogott, so what group does wikitech get deployments in? [16:59:50] Krenair: I’m in a meeting now, but… it’s in the config :) [17:00:00] <^d> greg-g: ^? [17:00:13] <^d> group0? [17:00:36] (CR) Cmjohnson: [C: 2] Changing dns entries for mc1017 and mc1018 [dns] - https://gerrit.wikimedia.org/r/189752 (owner: Cmjohnson) [17:00:44] <^d> group1 seems more likely. [17:00:50] eh? [17:00:59] <^d> wikitech's on train now. which group? 
[17:01:03] (PS1) Giuseppe Lavagetto: base: add the service_unit init wrapper [puppet] - https://gerrit.wikimedia.org/r/189753 [17:01:21] oh..., uhhh, 1 or 2 [17:01:27] <_joe_> bblack: when I decided to leave behind puppet' [17:01:28] just not 0 [17:01:45] <_joe_> s insanity, it wasn't too hard to provide something that looks nice [17:01:50] <_joe_> or almost nice [17:03:17] operations, ops-eqiad: wipe holmium disks - https://phabricator.wikimedia.org/T87391#1028348 (Cmjohnson) Open>Resolved Disks have been wiped - Task completed [17:03:46] operations, Incident-20150205-SiteOutage, ops-eqiad: Split memcached in eqiad across multiple racks/rows - https://phabricator.wikimedia.org/T83551#1028354 (Cmjohnson) mc1017/18 have been moved to d8 in eqiad. [17:06:16] (PS18) ArielGlenn: Allow "hoo" to sudo into datasets [puppet] - https://gerrit.wikimedia.org/r/152724 (https://phabricator.wikimedia.org/T86808) (owner: Hoo man) [17:07:03] (CR) jenkins-bot: [V: -1] Allow "hoo" to sudo into datasets [puppet] - https://gerrit.wikimedia.org/r/152724 (https://phabricator.wikimedia.org/T86808) (owner: Hoo man) [17:09:50] (PS19) ArielGlenn: Allow "hoo" to sudo into datasets [puppet] - https://gerrit.wikimedia.org/r/152724 (https://phabricator.wikimedia.org/T86808) (owner: Hoo man) [17:10:19] operations, ops-eqiad: rack and setup restbase production cluster in eqiad - https://phabricator.wikimedia.org/T88805#1028371 (Cmjohnson) We do not have a name for these yet. The bios doesn't seem to want to accept the 10G card as a boot device [17:10:49] !log demon Finished scap: No code changes, bringing silver in as deploy target (duration: 17m 31s) [17:10:55] Logged the message, Master [17:12:48] operations, Citoid, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1028376 (akosiaris) Trying to answer to every comment in sequence, please bear with me on this. @GWicke, yes, let's not overcomplicate things more. 
They are already complicated enough as is. The zotero s... [17:13:16] (CR) ArielGlenn: [C: 2] Allow "hoo" to sudo into datasets [puppet] - https://gerrit.wikimedia.org/r/152724 (https://phabricator.wikimedia.org/T86808) (owner: Hoo man) [17:13:20] operations, Project-Creators, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1028378 (Krenair) This broke the policy at https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#New_projects that all project creations must go through discussion under #Pro... [17:13:34] (CR) BBlack: "I think structurally this is a good place to start, but..." [puppet] - https://gerrit.wikimedia.org/r/189753 (owner: Giuseppe Lavagetto) [17:14:21] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Puppet last ran 1 day ago [17:14:33] ignore that, it will be fixed in 3 minues [17:15:09] ^d: Did that 17m scap pick up new l10n changes or is it just taking that long for a bare sync now? [17:15:20] <^d> It picked up a few. [17:15:24] <^d> had to rebuild cdbs. [17:15:39] *nod* [17:15:50] <^d> I can pastebin [17:16:12] <^d> https://phabricator.wikimedia.org/P278 [17:16:13] With all the new rsync slaves I would hope a truly no-op scap would only be ~5m [17:17:40] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:17:42] operations, Project-Creators, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1028388 (Krenair) [17:19:47] operations, Project-Creators, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1028391 (chasemp) >>! In T87262#1028378, @Krenair wrote: > Also, https://phabricator.wikimedia.org/tag/Interdatacenter-IPSEC/ is marked as visibility restricted, but this is not allowed. So we e... 
[17:20:27] (CR) Giuseppe Lavagetto: "1) Not a problem IMHO - if the init script is provided upstream you just default to it with has_initscripts => false. I don't think we wil" [puppet] - https://gerrit.wikimedia.org/r/189753 (owner: Giuseppe Lavagetto) [17:20:55] ^d: Hmm... what about the file perms errors there? They look to be 0444 root:root on both sides now, was that fixed after the scap? [17:22:36] <^d> You mean on mw1097? [17:23:13] <^d> Oh the chgrp [17:23:13] <^d> wtf? [17:23:26] operations, Project-Creators, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1028395 (Krenair) >>! In T87262#1028391, @chasemp wrote: >>>! In T87262#1028378, @Krenair wrote: >> Also, https://phabricator.wikimedia.org/tag/Interdatacenter-IPSEC/ is marked as visibility res... [17:23:41] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [17:25:01] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: puppet fail [17:29:21] yep, fixing already [17:29:31] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: puppet fail [17:29:32] er? unmerged change? ggrrrr [17:30:41] PROBLEM - puppet last run on snapshot1002 is CRITICAL: CRITICAL: puppet fail [17:31:00] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: puppet fail [17:32:10] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [17:32:34] Deployment-Systems, operations: Add virt1000 to the dsh group for scap - https://phabricator.wikimedia.org/T75938#1028411 (Krenair) So we no longer need this, right? 
[17:33:30] Deployment-Systems, operations, Scrum-of-Scrums: Update wikitech wiki with deployment train - https://phabricator.wikimedia.org/T70751#1028424 (Andrew) [17:33:31] Deployment-Systems, operations: Add virt1000 to the dsh group for scap - https://phabricator.wikimedia.org/T75938#1028421 (Andrew) stalled>Invalid a:Andrew yes, this is moot now. [17:35:44] ^d: I think those files should really be 0644 mwdeploy:mwdeploy like the other file in private. Would need a root to help with that. [17:35:48] (PS1) ArielGlenn: snapshots: move the admin include into the role [puppet] - https://gerrit.wikimedia.org/r/189759 [17:37:16] <_joe_> bd808: what's the problem? [17:37:56] _joe_: File perms on tin and silver -- https://phabricator.wikimedia.org/P278$23 [17:38:13] <^d> Ah yeah [17:38:14] <^d> Got it [17:38:24] operations, Project-Creators, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1028440 (chasemp) > What's the point in allowing restricted visibility projects like that? You'd still have to make a public #Project-Creators ticket to get one. If desirable that process does... [17:38:25] Probably files that andrewbogott touched last night in the move of wikitech [17:38:46] <_joe_> prolly [17:38:56] <_joe_> I can't really help right now [17:39:12] hm, thought I did those syncs as ‘andrew’ rather than as ‘root' [17:39:25] On mw1015 they show as 0444 mwdeploy:mwdeploy; on tin and silver they are 0444 root:root [17:39:28] ^d: still yellow but elastic1002 filling up its disks, I'll wait a bit more but here's what was in SAL the last time from Nik https://wikitech.wikimedia.org/wiki/Server_Admin_Log#February_4 [17:39:42] bd808: ok, so, want me to chown stuff on root? What files? 
[17:40:13] andrewbogott: /srv/mediawiki/private/WikitechPrivateLdapSettings.php and /srv/mediawiki/private/WikitechPrivateSettings.php [17:40:45] but in mediawiki-staging too [17:41:53] Hm, those files are deployed by puppet I think [17:41:55] (CR) ArielGlenn: [C: 2] snapshots: move the admin include into the role [puppet] - https://gerrit.wikimedia.org/r/189759 (owner: ArielGlenn) [17:41:58] lemme look [17:42:06] <_joe_> andrewbogott: I don't think so? [17:43:00] Oh, they probably are. Hmmm [17:43:17] maybe they just need to be fixed in the /srv/mediawiki locations then [17:43:39] and rsync will make them right elsewhere? [17:43:47] If puppet creates them then puppet should set the ownerships. But I need to track this down a bit. [17:43:49] (PS2) BBlack: fix PFS key rotation issues [puppet] - https://gerrit.wikimedia.org/r/189613 [17:44:03] (And I’m in a meeting so… distracted) [17:44:10] <^d> godog: I'm banning 1002 from allocation [17:44:11] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [17:44:37] andrewbogott: no worries; not fatal problems. The error was only from silver when ^d ran the scap [17:44:42] and so he leaves :-D [17:44:53] bye hoo|away [17:45:06] hm… and it doesn’t work for them to be owned by root? [17:45:15] I guess it doesn’t matter. [17:45:17] I’ll change [17:46:02] ^d: ack [17:46:37] andrewbogott: I think the error was on silver. 
Looking at mw1015 the files are 0444 mwdeploy:mwdeploy so I think the mwdeploy user tried to make that change on silver as was stopped by the root:root ownership [17:48:50] <^d> `curl -s localhost:9200/_cat/allocation?v; curl -s localhost:9200/_cat/shards | grep 'RELOCATING'` [17:48:50] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [17:48:51] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:48:55] <^d> This is going to take awhile to settle. [17:49:45] (PS1) MaxSem: Add a mobile subdomain for wikitech [dns] - https://gerrit.wikimedia.org/r/189761 (https://phabricator.wikimedia.org/T87633) [17:50:01] operations, Incident-20150205-SiteOutage, MediaWiki-Core-Team, Wikimedia-Logstash: Prototype Monolog and rsyslog configuration to ship log events from MediaWiki to Logstash - https://phabricator.wikimedia.org/T88870#1028470 (Anomie) >>! In T88870#1025205, @bd808 wrote: > What it does is prepend a partial RFC... [17:50:11] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:50:11] Ops-Access-Requests, operations: Give "hoo" sudo access to dataset snapshot hosts - https://phabricator.wikimedia.org/T86808#1028471 (ArielGlenn) another rebasing, upate of a gid, fix of a typo, and class incllude cleanup later and it's live. hoo, please check with me before you start using this, you are goin... [17:51:33] operations, Incident-20150205-SiteOutage, MediaWiki-Core-Team, Wikimedia-Logstash: Prototype Monolog and rsyslog configuration to ship log events from MediaWiki to Logstash - https://phabricator.wikimedia.org/T88870#1028474 (bd808) >>! In T88870#1028470, @Anomie wrote: >>>! In T88870#1025205, @bd808 wrote: >>... 
[17:52:16] greg-g: dammit, sorry — my internet is terrible here [17:52:20] Carry on without me :) [17:52:22] (CR) 20after4: [C: 1] Change Blocking Tasks to 'Blocked By' Tasks [puppet] - https://gerrit.wikimedia.org/r/189329 (https://phabricator.wikimedia.org/T33) (owner: Merlijn van Deen) [17:53:15] andrewbogott: s'ok :) [17:53:47] basically as long as I do just one thing it’s fine, but if I type ‘git review’ in another window or my automatic backup system starts… [17:55:17] ^d: sigh, okay [17:56:43] <^d> I'll babysit this [17:57:33] ^d: I wonder if I'm doing something silly, twice in a row is hardly is a coincidence, I'm looking at https://wikitech.wikimedia.org/wiki/Search#Restarting_a_node [17:58:43] (PS1) Andrew Bogott: Change these mw config files to mwdeploy:mwdeploy to avoid scap confusion [puppet] - https://gerrit.wikimedia.org/r/189763 [17:59:15] (CR) Andrew Bogott: [C: 2] Remove the nova::manager class from virt1000. [puppet] - https://gerrit.wikimedia.org/r/189658 (owner: Andrew Bogott) [18:00:04] maxsem, kaldari: Respected human, time to deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150210T1800). Please do the needful. [18:00:49] <^d> godog: Stop replication, bounce, start replication? No, that's the process. [18:01:21] (CR) Andrew Bogott: [C: 2] Revert "Temporary hack: Turn off wikitech-static dump crons." [puppet] - https://gerrit.wikimedia.org/r/189659 (owner: Andrew Bogott) [18:02:23] twentyafterfour: I'm going to be afk during the deploy today, ^d's around though :) [18:02:59] ^d: yep, waiting on green health after bounce is what fails ATM [18:03:06] <^d> Yeahhh [18:03:11] <^d> I'll do 1003, see what's up. [18:03:19] thanks! [18:03:20] <^d> Well, once 1002 is less icky [18:03:54] operations, Citoid, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1028512 (GWicke) >>! 
In T76308#1028376, @akosiaris wrote: > Trying to answer to every comment in sequence, please bear with me on this. > > @GWicke, yes, let's not overcomplicate things more. They are al... [18:04:50] (CR) Andrew Bogott: [C: 2] Change these mw config files to mwdeploy:mwdeploy to avoid scap confusion [puppet] - https://gerrit.wikimedia.org/r/189763 (owner: Andrew Bogott) [18:04:59] bd808: ^ should fix the file permission thing [18:05:22] apergos: Bad timing :P [18:05:27] Works for me, btw \o/ [18:05:38] great [18:06:41] hoo: :) [18:09:32] (PS1) Cmjohnson: Adding dhcpd entries for mc1018-8 [puppet] - https://gerrit.wikimedia.org/r/189767 [18:09:48] <^d> Ahhhhhhhh! [18:09:55] <^d> I see why it's taking so long to recover. [18:10:14] <^d> es-tool enable-replication wasn't getting called or something. Replication was still set to "primaries" [18:10:18] <^d> godog: ^ [18:10:35] (CR) Cmjohnson: [C: 2] Adding dhcpd entries for mc1018-8 [puppet] - https://gerrit.wikimedia.org/r/189767 (owner: Cmjohnson) [18:11:22] ^d: that should happen after es-tool health is successful again tho? [18:11:45] operations: Redirects to https need to set NE (no escape) in apache - https://phabricator.wikimedia.org/T88359#1028543 (Arlolra) Thanks @Joe [18:12:56] <^d> Gahhhhh [18:12:58] <^d> Stupid me [18:13:28] <^d> It's returning the same error code for red/yellow. [18:13:32] <^d> And only 0 on green [18:13:40] <^d> It'll never get green until replication is back on [18:13:44] ok, I have to run to an appt, be back after the deploy is done. :) [18:13:52] ^d: lolz, that explains it [18:14:01] _joe_ updated...running to get some food back in a few [18:14:10] (Abandoned) Andrew Bogott: Specify the DNS server for dnsmasq. 
[puppet] - https://gerrit.wikimedia.org/r/186741 (owner: Andrew Bogott) [18:14:43] ^d: indeed es-tool restart-fast shouldn't have this problem [18:15:04] <^d> The output is kinda jankey but yeah [18:16:14] <^d> Also explains why disk would fill up. It still had non-primary shards it was hanging on to [18:16:26] <^d> So as things got moved back to that node, it had to run up the disk usage [18:16:45] <^d> disk usage on 1002 going down now [18:17:32] \o/ okay that's saner [18:20:44] Ops-Access-Requests, operations: Give "hoo" sudo access to dataset snapshot hosts - https://phabricator.wikimedia.org/T86808#1028581 (ArielGlenn) Open>Resolved had the chat, verified his access works, closing. [18:21:08] (PS3) BBlack: fix PFS key rotation issues [puppet] - https://gerrit.wikimedia.org/r/189613 [18:21:19] (CR) BBlack: [C: 2 V: 2] fix PFS key rotation issues [puppet] - https://gerrit.wikimedia.org/r/189613 (owner: BBlack) [18:21:27] ^d: I see what you meant now, "wait for elasticsearch to answer on http again" [18:21:32] <_joe_> godog: ping [18:21:32] _joe_: ping detected, please leave a message! [18:21:52] <^d> godog: Yeahhh. Sorry for screwing up the stuff on wikitech [18:23:05] ^d: np, I'm +1 on moving shell scripts off wikitech and into es-tool (didn't notice fast-restart earlier) [18:25:19] <^d> That script could use a little more love from someone who know more python than me :) [18:26:41] (CR) Nikerabbit: "A note about the reason would be helpful for "outsiders"." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/189746 (owner: 10Anomie) [18:27:39] 3Release-Engineering, Wikimedia-General-or-Unknown, operations, WMF-Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1028598 (10Nemo_bis) [18:34:20] (03CR) 10Andrew Bogott: [C: 031] Add a mobile subdomain for wikitech [dns] - 10https://gerrit.wikimedia.org/r/189761 (https://phabricator.wikimedia.org/T87633) (owner: 10MaxSem) [18:35:57] (03CR) 10Dzahn: "MobileFrontends is disabled on wikitech, unless that was changed recently. it would be nice though if it could be enabled, because wikitec" [dns] - 10https://gerrit.wikimedia.org/r/189761 (https://phabricator.wikimedia.org/T87633) (owner: 10MaxSem) [18:36:41] (03CR) 10Dzahn: "this was also the reason i abandoned the idea to automatically create these" [dns] - 10https://gerrit.wikimedia.org/r/189761 (https://phabricator.wikimedia.org/T87633) (owner: 10MaxSem) [18:37:25] 3operations, Project-Creators, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1028622 (10Aklapper) Thanks for bringing this up so we can try to avoid such confusion in the future and/or improve our project creation guidelines. >>! In T87262#1028378, @Krenair wrote: > This... [18:38:41] (03CR) 10Dzahn: "'wmgMobileFrontend' => array(" [dns] - 10https://gerrit.wikimedia.org/r/189761 (https://phabricator.wikimedia.org/T87633) (owner: 10MaxSem) [18:44:13] (03PS1) 10Andrew Bogott: Puppetize a few symlinks that are hotfixed on silver [puppet] - 10https://gerrit.wikimedia.org/r/189774 [18:44:56] bd808: ^ is the last little bit of puppet from last night. Look ok to you? (I lost the link to your vagrant patch which probably does it more elegantly) [18:45:36] hoo, hey [18:46:28] hey [18:47:12] andrewbogott: That looks like it would work. 
You probably also want to add matching symlinks in /etc/php5/cli/conf.d [18:47:17] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Use the apache::conf and apache::mod directives as we do in the mediawiki module" [puppet] - 10https://gerrit.wikimedia.org/r/189774 (owner: 10Andrew Bogott) [18:47:18] will pm [18:47:26] <_joe_> andrewbogott: there you are :P [18:47:37] (03CR) 10MaxSem: "You can't enable MobileFrontend before the DNS change because you will either have the mobile view link lead to nowhere or pollute the mai" [dns] - 10https://gerrit.wikimedia.org/r/189761 (https://phabricator.wikimedia.org/T87633) (owner: 10MaxSem) [18:47:41] 3operations, Citoid, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1028654 (10akosiaris) > Running one instance of zotero per citoid worker can actually be more HA than a simple LVS setup, as it can also detect hanging backends & restart them as needed. We have [done this... [18:47:50] <_joe_> oh it's the PHP ones? [18:48:06] <_joe_> andrewbogott: how can it be that they are not linked by the deb package? [18:48:11] andrewbogott: The mw-vagrant version is not very elegant. It's more of a big hammer approach -- https://github.com/wikimedia/mediawiki-vagrant/blob/d7fbc059049c91a003f7a998a53f3e38dc583614/puppet/modules/apache/files/envvars#L18-L23 [18:48:31] _joe_: I don’t know. That box is not using hhvm so I assume I’m in a weird corner case. [18:48:48] I would not object to switching over to hhvm but only if you have time to steer [18:48:51] <_joe_> andrewbogott: is that trusty? [18:48:55] _joe_: yes [18:49:02] <_joe_> ohhh ok I see [18:49:22] <_joe_> andrewbogott: for now keep the unpuppetized symlinks, we need to fix the packages [18:49:25] _joe_: trusty changed where php ini files go. In precise the apache2 and cli conf.d dirs were symlinks to the same /etc/php5/conf.d dir. 
Trusty introduces php5enmod to manage the link farm [18:49:31] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "Sorry, I got this wrong. These symlinks should be created by the extensions debian packages." [puppet] - 10https://gerrit.wikimedia.org/r/189774 (owner: 10Andrew Bogott) [18:49:38] _joe_: ok, fair enough. Should I open a bug? [18:49:39] <_joe_> bd808: I know, hence I asked :) [18:49:42] andrewbogott, so what was the full fix for the luasandbox thing yesterday? some symlinks? [18:49:50] <_joe_> andrewbogott: please do [18:49:59] Krenair: yeah, it was approximately that patch above. [18:50:00] <_joe_> ok, now I'm really off [18:51:59] I think Roan & Ori ran into something broken about luasandbox in mod_php (not hhvm) on osmium [18:52:00] 3Ops-Access-Requests, operations: Give "hoo" sudo access to dataset snapshot hosts - https://phabricator.wikimedia.org/T86808#1028663 (10Lydia_Pintscher) Thank you! [18:52:18] we don't have luasandbox for php on trusty [18:52:46] we don't? there is a deb [18:53:06] I haven't really tested it but there is one [18:53:16] it's probably not installed by puppet [18:53:29] having mediawiki sorta work on trusty because of the default php installation is an accident [18:53:31] not by design [18:53:37] *nod* [18:53:48] wikitech is doing it since last night [18:53:58] php5 on trusty that is [18:58:29] ii php-luasandbox 2.0-7+wmf2.1 [19:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150210T1900). Please do the needful. [19:01:20] hey swatters, can we have this config change for wikitech? [19:01:29] https://gerrit.wikimedia.org/r/#/c/189655/ [19:03:32] 3Wikimedia-Mailing-lists, operations: New mailing list for wikidata-query-tech - https://phabricator.wikimedia.org/T89158#1028698 (10Manybubbles) 3NEW [19:05:28] akosiaris: Any news from serverland? 
[19:05:42] abartov just showed up in -commons asking about it [19:06:16] (03CR) 10Aklapper: [C: 031] "Under the assumption that ' do not need escaping when there are surrounding " this is a +1." [puppet] - 10https://gerrit.wikimedia.org/r/189329 (https://phabricator.wikimedia.org/T33) (owner: 10Merlijn van Deen) [19:06:46] 3Multimedia, operations, MediaWiki-extensions-UploadWizard: Investigate server error when uploading an OGV - https://phabricator.wikimedia.org/T89018#1028708 (10Ijon) Same just happened to me, on both Chrome and Firefox, with a 150MB PDF file, on a strong connection (from the WMF office). uploads complete, but... [19:07:45] (03CR) 10Merlijn van Deen: "http://yaml-online-parser.appspot.com/?url=https%3A%2F%2Fgit.wikimedia.org%2Fraw%2Foperations%252Fpuppet%2F4987f33bbcd0b9f8d27704eb7d7b2a4" [puppet] - 10https://gerrit.wikimedia.org/r/189329 (https://phabricator.wikimedia.org/T33) (owner: 10Merlijn van Deen) [19:08:21] (03PS1) 10Dzahn: site.pp: fix lint errors/warns (puppet-lint 1.1) [puppet] - 10https://gerrit.wikimedia.org/r/189779 [19:11:25] (03CR) 10Bartosz Dziewoński: [C: 031] Change Blocking Tasks to 'Blocked By' Tasks [puppet] - 10https://gerrit.wikimedia.org/r/189329 (https://phabricator.wikimedia.org/T33) (owner: 10Merlijn van Deen) [19:15:39] (03CR) 10Dzahn: [C: 032] Change Blocking Tasks to 'Blocked By' Tasks [puppet] - 10https://gerrit.wikimedia.org/r/189329 (https://phabricator.wikimedia.org/T33) (owner: 10Merlijn van Deen) [19:17:07] ^ if that confused you before in phabricator (blocking tasks and blocked tasks being kind of reversed) [19:17:28] we changed the labels [19:18:26] <^d> godog: 5 shards left to initialize then we'll be back to green [19:19:01] ^d: happy days! 
\o/ thanks for babysitting this time, I can resume with 1003 tomorrow now that we know what was wrong [19:19:09] <^d> Sounds good [19:20:22] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [19:29:19] 3Wikimedia-Mailing-lists, operations: New mailing list for wikidata-query-tech - https://phabricator.wikimedia.org/T89158#1028753 (10Legoktm) Why can't the already low traffic wikidata-tech mailing list be used? [19:30:02] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [500.0] [19:30:22] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [19:31:23] 3Wikimedia-Mailing-lists, operations, wikidata-query-service: New mailing list for wikidata-query-tech - https://phabricator.wikimedia.org/T89158#1028760 (10hoo) I agree with Legoktm, I don't think that a new list is needed. [19:33:04] mutante: if there’s a more normal place for images, I wouldn’t mind moving them if it lets us pull out a few lines of config. Any ideas? [19:33:21] They’re only in that weird /srv/org place because of old weird wiki setup on virt1000 as far as I know [19:37:21] 3Wikimedia-Mailing-lists, operations, wikidata-query-service: New mailing list for wikidata-query-tech - https://phabricator.wikimedia.org/T89158#1028765 (10Manybubbles) The folks that has asked me about making this list weren't particularly interested in other parts of wikidata-tech. We _can_ do it if that mak... [19:38:46] 3Wikimedia-Mailing-lists, operations, wikidata-query-service: New mailing list for wikidata-query-tech - https://phabricator.wikimedia.org/T89158#1028767 (10Lydia_Pintscher) Not much else is happening on that list anyway. 
[19:39:14] 3Wikimedia-Mailing-lists, operations, wikidata-query-service: New mailing list for wikidata-query-tech - https://phabricator.wikimedia.org/T89158#1028769 (10hoo) Are you talking about external (non WMDE/WMF) people here (like database vendors)? If so, that might make sense [19:42:37] (03CR) 10Anomie: "The reason for the revert? As mentioned in the commit message, application/x-gzip wasn't blacklisted to begin with. Thus the original patc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189746 (owner: 10Anomie) [19:42:39] (03PS1) 10Andrew Bogott: Add a backup cron to the nova database class. [puppet] - 10https://gerrit.wikimedia.org/r/189782 [19:43:02] (03CR) 10Andrew Bogott: "This is now documented here:" [puppet] - 10https://gerrit.wikimedia.org/r/189774 (owner: 10Andrew Bogott) [19:46:07] (03CR) 10Andrew Bogott: [C: 032] Add a backup cron to the nova database class. [puppet] - 10https://gerrit.wikimedia.org/r/189782 (owner: 10Andrew Bogott) [19:50:12] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: puppet fail [19:50:32] <^d> apergos: Got a sec? I'd like to push https://gerrit.wikimedia.org/r/#/c/164009/ through if you could have a look [19:50:44] <^d> I think it's legacy dumps stuff and isn't even needed, but I'd like to confirm [19:51:14] I don't know without testing dumps against it [19:51:43] wow this looks really old all right [19:51:45] hrm [19:51:48] <^d> Yeah. [19:52:03] <^d> I can't find any evidence that WIKIBACKUP is set as an env variable [19:52:11] <^d> (at least it's not in puppet) [19:52:11] I am… in the process of messing with the virt1000 puppet, everyone else can disregard [19:52:21] 3hardware-requests, operations: codfw: virtulization servers for misc services - https://phabricator.wikimedia.org/T89161#1028809 (10RobH) 3NEW a:3akosiaris [19:54:24] (03PS1) 10Andrew Bogott: I guess we need these target dirs if we're going to backup to them. 
[puppet] - 10https://gerrit.wikimedia.org/r/189783 [19:56:42] (03CR) 10Aaron Schulz: Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188586 (owner: 10Aaron Schulz) [19:57:49] andrewbogott: the normal place would be /mnt/upload7 but i only call it 'normal' because of: [19:57:57] 'default' => '/mnt/upload7/$site/$lang', [19:58:05] i havent actually looked on the appservers [19:58:10] greg-g: Is the train delayed? :D [19:58:15] just the mw config repo [19:58:20] <^d> apergos: How about we do it tomorrow, but earlier? I think it'll be harmless but I'd rather not blow up dumps so late in your day. [19:58:32] mutante: oh, yeah, fot a standalone box that’s not any simpler. [19:58:34] I'm trying to see if I can find anything [19:58:39] So I guess we can stick with the status quo [19:58:45] and yeah it's too late in my day for me to fix anything (10 pm) today [19:59:00] i suppose /mnt/upload7 only mounted on upload servers [19:59:04] ? [19:59:50] mm I bet it was on a few others [19:59:55] like *cough* fenari [20:00:03] andrewbogott: do we have local uploads or do we use commons now? [20:00:24] mutante: dunno. Local, I think. [20:00:27] I hope. [20:01:01] andrewbogott: noooo [20:01:05] andrewbogott: mark said to leave it at commons [20:01:07] marktraceur: I can reproduce the bug even on testwiki but I can not get a log [20:01:09] very specifically! [20:01:11] really? [20:01:13] yep [20:01:17] a stack trace or something [20:01:21] Doesn’t that mean that if commons goes down all the wikitech diagrams vanish? [20:01:29] 3operations: Our custom php packages need to create some conf.d links - https://phabricator.wikimedia.org/T89157#1028845 (10Krenair) [20:01:30] ......... yes [20:01:37] how does that affect static? [20:01:45] andrewbogott: i didnt think of that, seems like a good reason to revsit it [20:01:55] can you resurrect the thread if i tell you the subject line? 
;] [20:01:55] ok, so i think we need to fix the ImagePath either way [20:02:02] y’know, I have no idea how images work on static. Maybe they don’t [20:02:11] I will find the thread, probably I neglected it when it first came through [20:02:16] 'Wikitech - uploads disabled to commons - intentional?' [20:02:27] sent initially on 2014-12-14 [20:02:32] <^d> You could configure wikitech-static to have wikitech as a commons-like thing too [20:02:33] thanks [20:02:34] sorry, 2014-12-22 [20:02:39] <^d> So you'd get both wikitech's and commons' images [20:02:39] fucking dates in non iso in gmail. [20:02:48] foreign file repository, yeah [20:02:52] well, i think that ideally wikitech would use commons [20:02:54] we _could_ also let puppet create /mnt/upload7 on wikitech, just so that it's like cluster wikis, to avoid the difference in mw config.. [20:02:58] and wikitech static would import them [20:03:05] cuz if we hit static due to cluster being down [20:03:08] we may need a diagram. [20:03:15] hm, static at least has images… [20:03:16] its a very very valid issue that i just hadnt thought of [20:03:28] andrewbogott: yea, but what if its a commons image? [20:03:29] https://wikitech-static.wikimedia.org/wiki/File:Geowiki_workflow.png [20:03:29] seems to host its own images [20:03:42] most of wikitech images are local from history [20:03:51] only recently went to commons, we need to find a commons file on wikitech, heh [20:03:59] in any case, it should _not_ be /usr/local/something [20:03:59] we should allow upload directly to wikitech [20:03:59] and see how it copies (if it does or just links or what) [20:04:09] <^d> You can configure instantcommons to cache images from commons for an absurdly long time [20:04:15] <^d> Then you don't have to fuck with copying images over [20:04:38] seems reasonble as long as something silly can't dump the cache? [20:04:40] .... 
that seems like a good answer [20:04:44] like an apache restart [20:04:58] <^d> They're stored on disk [20:05:03] ^d: I've run dumps a bunch locally without ever setting that variable so I think it's probably ok but let's do it tomorrow if that's ok with you [20:05:11] <^d> Sounds good to me [20:05:27] andrewbogott: I think ^d just gave us the best answer [20:05:35] we use the extension to cache the commons images on wikitech [20:05:36] ^d nice :) learned something today cool [20:05:39] and still use commons to store our things [20:06:00] using commons is a nice thing for community fuzzies i'd think [20:06:03] using our own stuff ;D [20:06:25] andrewbogott: or we can just push back to local uploads on wikitech, but i didnt want you to go directly against what mark said on email and get in trouble =] [20:06:41] I am so far behind… just because we agreed that wikitech /should/ use commons, did someone implement that? (I still haven’t finished reading the email thread) [20:06:49] andrewbogott: it was until you moved it. [20:06:58] so at some point before december, it stopped beigng local uploads [20:07:03] i dunno when or why, as i said in the email [20:07:08] andrewbogott: I think ppl started using commons because of local being messed up [20:07:12] and it's a bit of a defacto thing now [20:07:13] but until you moved it to silver, it was disabled for local uploads [20:07:29] and mark agreed that while it wasnt a calculated decision, he liked the result enough that we should try it [20:07:40] and use commons for our diagrams unless we ran into a good reason not to [20:07:47] this may count as that reason. [20:07:56] (03PS1) 10Cmjohnson: Fixing dhcpd entry for mc1017/18 [puppet] - 10https://gerrit.wikimedia.org/r/189785 [20:08:05] but if it does, we need to update folks on what to do [20:08:13] (though it seems no one reads my emails anyhow! 
;p ) [20:08:17] <^d> o_O [20:08:19] ok, it is going to take me a few minute to digest this [20:08:23] * ^d stares harder at wikitech config [20:08:24] heh [20:08:37] I think it disabled the local uploads due tto some new licensing page addition [20:08:43] 3Multimedia, operations, MediaWiki-extensions-UploadWizard: Investigate server error when uploading an OGV - https://phabricator.wikimedia.org/T89018#1028856 (10akosiaris) I can reproduce this on https://test.wikipedia.org/wiki/Special:UploadWizard but can't actually get something relevant in the log files [20:08:55] So it seems it may have simply been disabled when wikitech went over the train deploy, as it is lacking content in its MediaWiki:Licenses page. [20:08:55] If we decide to re-enable on wikitech (it seems rude of me to simply do so now that I brought it up on here), we could just copy over the officewiki options: [20:08:55] https://office.wikimedia.org/wiki/MediaWiki:Licenses [20:08:57] (https://gerrit.wikimedia.org/r/#/c/136520/ shows when the page was required.) [20:09:00] (copy from my email) [20:09:04] (03CR) 10Cmjohnson: [C: 032] Fixing dhcpd entry for mc1017/18 [puppet] - 10https://gerrit.wikimedia.org/r/189785 (owner: 10Cmjohnson) [20:10:52] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:11:10] ok, so we want to… enable commons, and add the instantcommons extension, and set some absurdly long caching interval… is that where we’re heading? [20:11:21] or jsut say its all too much work [20:11:25] and we should go to localuploads [20:11:35] because its just WAY easier. [20:11:37] enabling instacommons is just setting a global [20:11:43] heh [20:11:45] this doesn’t sound like that much work to me [20:11:59] I would ++commons but since I'm not doing it will defer [20:12:00] then putting it in commons is likely the best way to de-dupe things. [20:12:17] and promotes goodwill in that ops puts things where folks can see and edit [20:12:18] etc... 
[20:12:35] <^d> It's a few line config change in wikitech.php [20:12:39] bd808 replied tot htat thread to tag things in [[Category:Wikimedia_technical_operations]] [20:12:44] <^d> Swap out $wgUseInstantCommons for some real config [20:12:44] which is a good idea and i'd advise following it [20:12:49] (for your wikitech uploads to commons) [20:13:19] andrewbogott: so yea, if you do all that i agree with chase that using commons is good) [20:13:20] <^d> (fwiw, wikitech already has instant commons as well) [20:13:40] ^d: you are doing a fine job as the new reedy for ops today. [20:13:51] ^d want to write a patch that just does all this? Since I’m still a bit lost? [20:14:11] yea reedy would have written a patch! ;D [20:14:21] * robh is just being a shit now. [20:14:54] <^d> I already am :p [20:14:59] 3Multimedia, operations, MediaWiki-extensions-UploadWizard: Investigate server error when uploading an OGV - https://phabricator.wikimedia.org/T89018#1028863 (10MarkTraceur) So weird. [20:15:10] so if wikitech is caching them the cached copies will pull to wikitech-static? [20:15:17] doesnt static need to have the extension enabled as well? [20:15:31] (which we can simply steal the config from wikitech, just asking) [20:16:25] robh: I’m pretty confused about how wikitech-static works. So, I’m not sure. [20:16:30] i'd think it does [20:16:46] or else it'll try to pull the commons file rather than the local cache [20:17:03] andrewbogott: wikitechstatic is just a mediawiki install where we overwrite its content regularly i thought [20:17:11] I wonder if I should rebuild -static and put it on the deployment train as well? [20:17:13] (03PS1) 10Chad: wikitech: use big kid settings for commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189786 [20:17:17] Probably not today! [20:17:20] its off cluster [20:17:20] robh: yeah, that’s right. 
[20:17:22] its on rackspace [20:17:29] so not sure how it could hit the deployment train [20:17:36] It would be annoying, that’s for sure. [20:17:42] IIRC InstantCommons is just a config variable in core that sets up a foreign file repository entry for commons [20:17:42] plus it only has to be read only [20:17:45] Ah, I guess deploy probably wouldn’t like fqdn [20:17:48] did we deploy the train yet? [20:17:49] indeed [20:17:53] <^d> Krenair: Yes [20:17:56] <^d> aude: Ask twentyafterfour [20:18:00] ok [20:18:09] wikitech-static is likely never going to be anything other than a standalone mw install that gets content overwritten [20:18:20] <^d> We could automate the code pull [20:18:22] not sure if it's suitable for wikimedia production usage [20:18:23] <^d> cron! [20:18:26] tying a third party virtual machine platform to the deploy would be horrible i'd think. [20:18:35] <^d> Krenair: It works well enough for wikitech and has for awhile [20:18:40] in particular since we ONLY care about it having the copy of wikitech, not editing at all. [20:18:56] ok [20:19:17] robh: yes. My only worry is if content ever becomes incompatible with the version of mw it’s running. I try to update it now and then but that’s sporadic. [20:19:29] <^d> I wonder if we could make it really static [20:19:30] most of wikimedia has some custom foreign file repo config for commons instead [20:19:31] <^d> Just html scrape [20:19:34] <^d> And pull any images [20:19:35] (03PS1) 1020after4: Group1 wikis to 1.25wmf16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189787 [20:19:36] That might be a thing that never happens though (content vs. mw version) [20:19:44] at the end of wmf-config/filebackend.php [20:19:49] andrewbogott: perhaps an icinga check that then at least annoys us? [20:19:54] email that is. [20:20:06] robh: yes, monitoring to ensure page loads would be good. 
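A page-load check of the kind suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing Icinga plugin: the function names, the URL, and the expected text are hypothetical. Only the exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is the standard Nagios/Icinga plugin one.

```python
import sys
import urllib.error
import urllib.request

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status, body, expected_text):
    """Map an HTTP status and page body to (exit_code, message).

    status is None when no HTTP response was received at all.
    """
    if status is None:
        return CRITICAL, "CRITICAL - no response from server"
    if status != 200:
        return CRITICAL, "CRITICAL - HTTP %d" % status
    if expected_text not in body:
        return WARNING, "WARNING - page loaded but expected text is missing"
    return OK, "OK - page loaded and contains expected text"


def fetch(url, timeout=10):
    """Return (status, body); status is None on connection failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code, ""
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, etc.
        return None, ""


# Hypothetical invocation: check_page.py <url> <expected text>
if __name__ == "__main__" and len(sys.argv) == 3:
    status, body = fetch(sys.argv[1])
    code, message = classify(status, body, sys.argv[2])
    print(message)
    sys.exit(code)
```

Keeping classify() separate from fetch() means the status-to-exit-code mapping can be tested without touching the network.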
[20:20:08] * andrewbogott makes a note [20:20:10] once its all nicely up to date and documented that is [20:20:15] right now we dont check shit [20:20:23] and one day we're going to need it and not have it ;P [20:20:54] !log restarting phd on iridium (phab) for config change [20:21:01] Logged the message, Master [20:21:08] (03CR) 1020after4: [C: 032] "self-submitting because our deployment system is insane" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189787 (owner: 1020after4) [20:21:54] * twentyafterfour is about to deploy the train, slightly behind schedule [20:22:01] twentyafterfour: ok, thanks [20:22:01] aude: ^ [20:22:19] twentyafterfour: let me know if there are any hiccups on silver — I think I just fixed the last ones. [20:22:20] don't think we need anything special for wikidata this time [20:23:56] andrewbogott: sorry for just adding a bunch of shit to your to do list =[ [20:24:05] ^d: thx for the help on this stuff =] [20:24:19] it’s ok. Getting wikitech-static working properly was one of the motivators for all of this wikitech stuff. [20:24:27] akosiaris: Thanks for looking into it, is there anything else you can think of or are we going to need to keep digging elsewhere? [20:24:35] well that and the uploads, i could have just kept quiet and likely no one would have noticed for months and months [20:25:06] much like when it moved to commons, but i disliked finding it out back then like that so figured avoid that for others... [20:26:34] Is it possible to change the text on the upload file page? 
[20:27:04] Where we could perhaps include instructions that not only should we use commons (like it says) but suggest the category [[Category:Wikimedia_technical_operations]] [20:27:05] (03Merged) 10jenkins-bot: Group1 wikis to 1.25wmf16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189787 (owner: 1020after4) [20:27:20] https://wikitech.wikimedia.org/wiki/Special:Upload [20:27:36] seems the best place to mention it since folks who dont know will hit that page first [20:27:50] marktraceur: I am still looking but to be honest I am a bit at a loss [20:27:56] <^d> Where do we want the local images to be cached? [20:27:59] <^d> For wikitech? [20:28:03] andrewbogott: ^ [20:28:04] somehow this thing does not get logged ... [20:28:09] <^d> I'm guessing not /usr/local/apache/ :p [20:28:17] akosiaris: That's what I'm saying, it's weird [20:28:22] indeed, id avoid that [20:28:26] ^d: https://gerrit.wikimedia.org/r/#/c/189655/2/wmf-config/InitialiseSettings.php [20:28:33] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.25wmf16 [20:28:42] Logged the message, Master [20:28:46] akosiaris: Did anything change recently about the API on Commons? We didn't switch it to hhvm or anything, right? [20:28:56] ^d: mutante's link shows » 'labswiki' => '/srv/org/wikimedia/controller/wikis/images', [20:29:03] plus the other mounts [20:29:05] ^d: I don’t care, somewhere on /srv [20:29:09] 'cause if that happened within the past week or two I'd probably start thinking about it as the cause [20:29:12] but the /srv is the newer standard [20:29:19] mutante: thats why you were linking right? [20:29:23] robh: that’s where actual uploaded content is I think [20:29:28] not sure if it makes sense to put the cache there [20:29:50] the /srv makes sense [20:29:55] but not sure behyond that [20:30:27] marktraceur: HHVM ? 
I am pretty sure we are on HHVM [20:30:37] plus I can reproduce it on test.wikipedia.org which is HHVM [20:30:43] akosiaris: For API traffic? [20:30:50] I wasn't sure that had happened [20:30:53] whereever the cached files are located, i just want to point out that currently: [20:30:59] failed to mkdir "/usr/local/apache/images/thumb/ [20:31:02] this happens all the time [20:31:07] imagescalers, videoscalers are not yet on HHVM [20:31:22] so either it needs to stop tryin to make directories in /thumbs [20:31:24] what's the issue? [20:31:29] or that needs to be changed away from /usr/local/ [20:31:31] * ori can't find it in scroll-back buffer [20:31:34] marktraceur: ^ [20:31:40] chad jsut wants to know where [20:31:45] ori: https://phabricator.wikimedia.org/T89018 [20:32:18] ori: UploadWizard dies with unknown error in prod, zero response from the server (not blank, *no response*, not even headers), only/mostly with big files [20:32:32] ideally we'd just make sure the cached directory copies to wikitech-static as well [20:32:33] Not reproducible in the same form on beta, you get either a 503 or the API help page as a response [20:32:36] so having it well defined would be ncie. [20:32:54] (03CR) 10Aaron Schulz: wikitech: use big kid settings for commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189786 (owner: 10Chad) [20:33:10] i'd say /srv/org/wikitech/cache but meh its default was /usr/local/apache [20:33:11] PROBLEM - Varnish HTTP upload-backend on cp1064 is CRITICAL: Connection refused [20:33:27] sorry, /srv/org/wikimedia/wikitech/cache [20:33:32] <_joe_> marktraceur: since when? [20:33:39] i forgot we have the fqdn in the directory (typically) [20:33:42] _joe_: Near as I can tell, middle of next week [20:33:48] last* [20:33:48] ^d: so thats my view, but fuck if its right ;D [20:33:53] Time-travelling bug. [20:34:12] andrewbogott: silver is wikitech? 
[20:34:16] yes [20:34:19] silver is wikitech [20:34:19] twentyafterfour: yep, as of yesterday [20:34:22] (seems to be just fine) [20:34:25] cool [20:34:26] thanks [20:34:40] (it got the new version, doesn't look broken) [20:34:42] andrewbogott: do you have a view on where the extension should cache the images from commons? [20:34:46] you probably want /srv/org/wikimedia/controller/wikis/ [20:34:47] _joe_: Maybe later - I think the complaints started to make sense around Saturday afternoon [20:34:57] because the imaes are in that ./controller/ dir for historic reasons [20:35:05] <_joe_> ok so definitely not HHVM [20:35:10] 'kay [20:35:12] robh: no, other than on /srv and ‘somewhere unsurprising’ [20:35:16] well [20:35:20] <_joe_> I'd look at deploys tbh [20:35:21] someone has to tell chad so he can finisht he patch [20:35:32] <_joe_> or, the outage changed something? [20:35:37] (dont make the dude who doesnt have to do this for us wait ;) [20:35:38] That may be. [20:35:50] mutante: if you feel strongly enough to argue that versus my view, i'll bow out [20:35:56] i dont feel strongly about it. [20:36:04] ^d: can you tell me all the directories that you need to know? I will formulate a scheme and make sure puppet creates them. [20:36:46] <^d> We just need a directory writable by the apache process so it can save the cached images it fetches from commons. That directory can be anywhere on disk. [20:36:46] he doesnt know yet [20:37:00] he was asking us what to call it ;] [20:37:05] it is trying to write but it cant [20:37:09] oh, just the one? [20:37:14] <^d> Yes, just one. [20:37:18] to cache all the commons images we pull. [20:37:22] RECOVERY - Varnish HTTP upload-backend on cp1064 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.020 second response time [20:37:23] ^d: isn’t thre also a directory for thumbs, or something? [20:37:37] <^d> We don't fetch originals, just thumbs [20:37:50] oh, .... 
well that doesnt fix gthe issue if its just thumbs [20:37:52] did you look at the gerrit link? [20:37:56] what happens when we want to view the actual diagram? [20:38:06] and commons is down. [20:38:36] here: silver:/srv/org/wikimedia/controller/wikis/images# [20:38:44] make "thumbs" inside that [20:38:46] via puppet [20:39:07] <^d> Got it [20:39:23] unless we are going to ask for thumbs that are like 90% of the actual scale [20:39:25] Bah, I’m still behind, didn’t clue in that those were the same thing [20:39:36] so again, if this extension is thumbs only [20:39:39] then we are wasting time [20:40:05] (but maybe not cuz im confused about it since someone keeps bringing up thumbs) [20:40:07] <^d> Guess it depends on how many pictures you upload that are big :) [20:40:30] marktraceur: do i need special permissions to upload files >100mb? [20:40:46] ^d: so this extension doesn't cache the full versions, only the thumbs? [20:40:52] <^d> Right [20:41:00] ^d: Ok, here’s the scenario. Suppose I visit https://wikitech.wikimedia.org/wiki/Puppet#Making_changes [20:41:04] And commons is down [20:41:15] and that image is hosted on commons (or instantcommons or whatever) [20:41:19] what happens when I click on the image? [20:41:33] <^d> You'd go to the local file page which would say "This image is from commons!" [20:41:37] ori: chunked uploads allows you more. " maximum file size (1000MB[1] as opposed to the usual 100MB)" [20:41:40] <^d> "Click here to go there!" [20:41:51] [[File:puppetquery.png |This could be worse, believe me.]] [20:41:51] ^d: and that page would or wouldn’t include the full-sized image? [20:42:20] <^d> It would include the thumb on the image page. If it's a super big image that's scaled down, no, you wouldn't have an original. [20:42:25] ah [20:42:30] <^d> If it's big enough to fit on that page, you'd get the full size. [20:42:36] mutante: how do i enable that? 
[20:42:37] meh, seems like it may be ok for most but not all use cases [20:42:38] ok, so then we still have a potential failure case if commons is down [20:42:43] <^d> /That picture is also a thumb/ [20:42:45] and seems like perahsp we should just opt out of commons due to this. [20:42:53] ori: https://upload.wikimedia.org/wikipedia/commons/thumb/0/09/Chunked_upload_checkbox.png/640px-Chunked_upload_checkbox.png [20:42:55] unless we wanna force scale thumbnail sizes for larger images. [20:42:58] (which is hacky) [20:42:58] (03PS1) 10BBlack: jessie vm tuning: back to more-conservative-ish [puppet] - 10https://gerrit.wikimedia.org/r/189790 [20:43:00] (03PS1) 10BBlack: downsize upload frontend mallocs to 1/12 [puppet] - 10https://gerrit.wikimedia.org/r/189791 [20:43:06] (totally doable, but hacky) [20:43:09] ori: https://commons.wikimedia.org/wiki/Commons:Chunked_uploads [20:43:15] mutante: thanks [20:43:36] * ^d is curious if there's any diagrams we'd need to reference if commons was down anyway [20:43:39] (03CR) 10BBlack: [C: 032 V: 032] jessie vm tuning: back to more-conservative-ish [puppet] - 10https://gerrit.wikimedia.org/r/189790 (owner: 10BBlack) [20:44:04] the idea is wikitech-static should be a fully read only accessible version of wikitech [20:44:06] (03CR) 10BBlack: [C: 032 V: 032] downsize upload frontend mallocs to 1/12 [puppet] - 10https://gerrit.wikimedia.org/r/189791 (owner: 10BBlack) [20:44:07] becaues we dont know [20:44:20] <^d> This isn't wikitech static we're talking about though [20:44:22] what if the backends have a cabling diagram and they go down and have to be redone [20:44:30] well, it is, because wikitech-static is simply a copy of wikitech [20:44:31] (03CR) 10Nikerabbit: "Thanks. I didn't notice that part." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/189746 (owner: 10Anomie) [20:44:47] when its all localupload media, it gets copied [20:45:00] the issue now is when a page about something with an imagine is duplicated to static [20:45:07] and we need to see the diagram, and commons is down [20:45:12] because the entire cluster is down [20:45:20] * ^d abandons commit [20:45:22] so network diagrams and the like exist for reference [20:45:28] (not sure if im making sense) [20:45:48] I think good principles are: 1) wikitech is fully functional if any random piece of prod goes down 2) wikitech-static is fully functional if /all/ of prod goes down [20:45:57] +1 [20:46:00] It seems like relying on commons violates 1, regardless. [20:46:01] a script on wikitech-static could fetch the full size images from commons by file name [20:46:14] <_joe_> marktraceur: can you point me to the code handling the chunked uploads? [20:46:28] mutante: but then we have to rewrite all the image urls on static right? [20:46:53] so then they need to be re-written with every update of static from wikitech [20:47:32] how often does -static sync up anyways? [20:47:40] ideally it would be hourly to daily [20:47:52] as we may need to reference any document change for outage repair [20:47:55] i'm not sure about that, robh, but first thought is the urls should not change [20:48:02] akosiaris, marktraceur, _joe_: [20:48:03]

413 Request Entity Too Large [20:48:03] nginx/1.6.2
[20:48:10] blame bblack [20:48:13] * ^d votes that someone who knows MW sets up wikitech-static for ops [20:48:25] ori: what is that? [20:48:25] robh: just to reconfirm… wikitech /currently/ uses only local uploads, right? [20:48:32] Huh [20:48:33] NO [20:48:36] ori: Where did you get that? [20:48:38] bblack: when trying to upload large file on commons [20:48:48] marktraceur: that's what the api.php response ends up being, if you check in network inspector [20:48:54] how large? [20:48:55] <_joe_> that makes sense btw [20:48:57] 150mb [20:48:58] andrewbogott: that is not correct. Wikitech is predominiantly local uploads, but there are some that are not [20:49:05] i don't know where or how many [20:49:06] I've been checking network inspector for days, it never gave me *any* response [20:49:08] 3operations, Beta-Cluster: mwscript hardcoded with user 'apache', should use ::mediawiki::users::web - https://phabricator.wikimedia.org/T89165#1028974 (10hashar) [20:49:19] ori: chunked? [20:49:22] andrewbogott: however, its so few we can allowt hem to break imo [20:49:25] robh: ok, but, moving forward that’s the case? I mean, it’s currently configured only to use local files for new images? [20:49:28] <_joe_> bblack: it should be, yes [20:49:31] bblack: yeah [20:49:33] as long as we push a notification that folks need to know and update accordingly [20:49:46] does it work over http instead of https? [20:49:50] andrewbogott: well, moving forward then you went directly against what mark said to do in the email. [20:49:57] which is why i brought it up in the first place. [20:49:57] is this specific to nginx 1.6 frontend? [20:50:11] <_joe_> I don't think so [20:50:11] robh: OK, I’m not speaking in theory, though, but in practice! What is wikitech /currently/ configured to do with new uploads? [20:50:21] i dunno you just moved it man [20:50:25] is it possible to enable chunked uploading for logged-out (http)? [20:50:26] it seems to be set to upload locally. 
[20:50:28] alas [20:50:31] ok :) [20:50:43] ori: UW is the only thing that uses it, and it's only for logged-in [20:50:46] ori: I guess not [20:50:57] All upload is logged-in only [20:51:07] Autoconfirmed only outside Commons [20:51:08] mutante: I don't know how you could make wikitech use commons, and wikitech-static use local images, and not rewrite image urls. [20:51:10] in any case, I'm about to disable the only nginx 1.6 frontend you could be hitting, let's see if that changes anything [20:51:21] ^d: i agree, it would be nice if wikitech-static had mw specific knowledge setup. [20:51:23] <^d> it already uses commons. [20:51:32] !log cp1064 frontend disabled in pybal [20:51:34] <^d> and it also knows how to mangle urls when creating thingies. [20:51:39] robh: it would be a link like "Image:Foo" on the local wiki in either case [20:51:41] Logged the message, Master [20:51:45] whether it gets it from commons or not [20:51:52] <^d> That ^ [20:51:53] _joe_: btw, there's also this: [20:51:56] - EnableEarlyFlush, ForceChunkedEncoding [20:51:56] EnableEarlyFlush allows chunked encoding responses, and ForceChunkedEncoding will only send chunked encoding responses, unless client doesn’t understand. [20:51:59] (https://github.com/facebook/hhvm/wiki/runtime-options) [20:52:05] ok so who is writing the script to sync commons images? [20:52:23] i dont have any personal preference, i just brought up the issue that andrew reenabled something that mark said to keep disabled. [20:52:58] i dont want to have a preference, since NONE of the work to make either option the default involve me doing shit ;] [20:53:08] i just want there to be a working standard. [20:53:12] <_joe_> ori: I bet something along the (long) chain doesn't speak chunked encoded properly [20:53:44] andrewbogott: so yea.... 
i dunno what to add here other than what i did, so ive told you what i know and im bowing out of this conversation ;] [20:55:19] bblack: Still not working for me, but I'll try again [20:55:56] robh: I’m responding to the Ops thread (finally). [20:56:06] same here [20:56:18] <^d> I wonder if I could make instant commons also fetch originals [20:56:23] <^d> That'd be interesting [20:56:35] <^d> rabbit hole! [20:56:36] <_joe_> I was pretty sure that wasn't the problem [20:56:37] * ^d runs [20:56:50] marktraceur: can you confirm the 413 doesn't say nginx/1.6? [20:56:52] <_joe_> marktraceur: so it works on beta, not in prod? [20:56:57] I never got the 413 [20:57:03] <_joe_> that was ori [20:57:12] so two different problems? [20:58:00] Maybe, or maybe Firefox is too stupid to understand a 413 response and ori's using Chrome? [20:58:14] (03Abandoned) 10Chad: wikitech: use big kid settings for commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189786 (owner: 10Chad) [20:58:20] i am using chrome, am mid-way through my second attempt (post-cp1064-depool) [20:58:43] chrome as well and I never got the 413 [20:59:22] 413 Request Entity Too Large [20:59:22] nginx/1.6.2 [20:59:31] that was definitely after you depooled it [20:59:33] (03CR) 10Chad: [C: 032] fix image path on wikitech after switch to silver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189655 (owner: 10Dzahn) [20:59:36] that's the result i just got, fwiw [20:59:53] <_joe_> akosiaris: are you pointing to eqiad? [20:59:58] i am using the X-Wikimedia-Debug header, though, so i am getting routed to mw1017 [21:00:08] <_joe_> mh are you? [21:00:15] <_joe_> lemme check [21:00:18] _joe_: nope, esams [21:00:21] (03Merged) 10jenkins-bot: fix image path on wikitech after switch to silver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189655 (owner: 10Dzahn) [21:00:28] ^d: thanks for your help, sorry about changing horses mid-stream. 
[21:00:34] <_joe_> akosiaris: maybe fix your dns, that may help nailing this down [21:00:34] but I can point me to eqiad, that's easy [21:00:36] hoo@fluorine:/a/mw-log$ wc -l localhost.log [21:00:36] 41405880 localhost.log [21:00:40] do we really need that? [21:00:43] <^d> mutante, andrewbogott: upload path fixed for wikitech. to turn uplodas back on we need to add some licenses [21:00:47] <^d> even if they're bogus [21:01:23] !log demon Synchronized wmf-config/InitialiseSettings.php: fixing upload path for labswiki (duration: 00m 06s) [21:01:30] Logged the message, Master [21:01:59] bblack: even when i turn off the X-W-Debug header, the response headers contain: server:nginx/1.6.2 [21:02:13] ^d: adding licenses is a thing done on-wiki, not in the config right? [21:02:17] 3operations, Project-Creators, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1029023 (10Krenair) This wasn't a discussion until after the creation had already taken place, because I came here to object to the way it had been done. Removing the requirement for these reques... [21:02:25] ori: what url hostname is the request hitting? [21:02:34] <^d> andrewbogott: Yes [21:02:38] bblack: https://commons.wikimedia.org/w/api.php [21:02:45] * andrewbogott adds yet one more thing to the todo list [21:03:06] oh, commons is text-lb [21:03:22] I just assumed uploads went through, you know, upload-lb :p [21:03:35] <_joe_> ori: does X-W-Debug catch api as well as the sites? [21:03:39] yeah [21:03:40] <^d> andrewbogott: I'll make it easy [21:04:02] thanks! [21:04:07] <^d> Copy text from: into [21:04:09] <^d> Press save [21:04:11] <^d> Profit!! 
[21:04:18] !log cp1065 frontend disabled in pybal temporarily [21:04:18] bblack: here's the request exported as a curl invocation: https://dpaste.de/qF1z/raw i changed centralauth_Token and commonswikiSession cookie values to 'XXX' but the request is otherwise reproduced faithfully [21:04:18] yea but mark said not to do that =p [21:04:23] Logged the message, Master [21:04:25] cuz i pointed that out in the mailing list thread! [21:04:26] ;D [21:04:38] mostly cuz either ^d or someone else told me [21:04:46] honestly I have no desire to get deep into this right now [21:04:53] <^d> All I see in the thread is "we can try commons but can do local uploads again if it's hard" [21:05:10] indeed, and someone has to reply back saying its hard and explain why [21:05:15] heh [21:05:15] if there's a real functional issue, I'll disable the nginx/1.6 frontends and we can make a phab task to investigate [21:05:43] ori: try now, should hit non-1.6.2 [21:05:50] marktraceur: unrelated bug: [UploadWizardFlowEvent] Value "retry-uploads-button-clicked" for property "event" is not one of ["upload-button-clicked","flickr-upload-button-clicked","leave-page"] [21:05:51] robh: I’m going to disable local uploads but now chad’s suggested edit is in the history which makes it easier once I get my way [21:05:59] Yeah I saw that [21:06:15] no one can see this [21:06:21] but this conversation makes me start laughing [21:06:28] cuz im so happy im not having to push it on list. [21:06:54] * robh had to try to push this topic two months ago and failed [21:07:21] Thanks for the bug report ori :) [21:07:51] bblack: now the connection just resets [21:07:52] *sigh* [21:07:57] there should be a big red banner on top of the EventLogging docs saying "DON'T USE ENUM. YOU'LL BE SORRY IF YOU DO." [21:08:39] ori: well, seems like 1.6 is better then! [21:09:04] (03PS2) 10Andrew Bogott: I guess we need these target dirs if we're going to backup to them. 
[puppet] - 10https://gerrit.wikimedia.org/r/189783 [21:09:16] ori: bblack I never get the 413, just connection resets like ori just did [21:09:35] my IP is 73.162.156.245, anything in the nginx logs for that? [21:10:21] <_joe_> ori... can you try that on the backend directly? (you'd need to upload the file on somewhere ) [21:10:37] <_joe_> I suspect hhvm/apache respond 413 [21:10:39] i actually have to go, but i agree that someone should do that [21:10:46] <_joe_> we may need to tune that [21:10:53] marktraceur: do you know how to set up a SOCKS proxy by sshing to bastion? [21:11:02] Ehhh, know how to maybe [21:11:05] ori: we don't log locally for nginx [21:11:08] But I haven't done it in some time [21:11:13] <_joe_> ok, sorry, it's a bit late for me to do that, and I am supposedly watching TV and NOT working [21:11:18] well, maybe for errors, I'll check that [21:11:52] 3Labs: Puppet logs should be timestamped in a human-readable way - https://phabricator.wikimedia.org/T88108#1029069 (10coren) p:5Triage>3Low [21:12:06] (03CR) 10Alex Monk: [C: 04-1] "(see task, that namespace may not have been fully cleaned up)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172012 (https://phabricator.wikimedia.org/T75164) (owner: 10Dereckson) [21:12:16] ori: 2015/02/10 20:57:02 [error] 26175#0: *160941080 client intended to send too large body: 133365469 bytes, client: 66.41.61.126, server: wikimedia.org, request: "POST /w/api.php HTTP/1.1", host: "commons.wikimedia.org", referrer: "https://commons.wikimedia.org/wiki/Special:UploadWizard?debug=true" [21:12:20] ori: bblack: I actually got something[error] 6649#0: *193359485 client intended to send too large body: 133365325 bytes, client: [21:12:25] heh [21:12:28] oh.. 
here we go [21:12:28] not you, but probably the same thing [21:12:35] the fact that the error doesn't happen on beta also fingers the ssh layer [21:12:35] yes, that is me [21:12:48] errrrrr, https [21:12:49] (03CR) 10Andrew Bogott: [C: 032] I guess we need these target dirs if we're going to backup to them. [puppet] - 10https://gerrit.wikimedia.org/r/189783 (owner: 10Andrew Bogott) [21:12:49] s/ssh/ssl/ ? [21:12:49] not ssh [21:12:53] yes :) [21:12:57] that error is directly from nginx, yes [21:13:01] it's probably a config setting [21:13:33] client_max_body_size [21:13:44] Fiona: who's Max? [21:13:55] root@cp1066:/etc/nginx# grep client_max_body -r . [21:13:55] ./nginx.conf: client_max_body_size 100m; [21:14:02] right; the file is 150m [21:14:11] ori: he's been asking for a while, couldn't find out yet [21:14:11] speaking of hhvm... looks like hhvm error rate increased after the train deploy of 1.25wmf16 [21:14:22] we're explicitly configured for 100m, and have been. this is nothing new. so how has this been working before? [21:14:33] twentyafterfour: fatals or exceptions? [21:14:35] I suspect that must not apply to working chunked? [21:14:39] if you use chunked uploads you are allowed 1000m [21:14:42] looks like the chunked upload does not get chunked? [21:14:44] Given that we still don't have backtraces for fatals, I have no idea what's up [21:14:47] twentyafterfour: I just got back, how'd the deploy go? [21:14:52] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:15:04] hoo: the entries tagged hhvm on https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor [21:15:28] it should be the client's responsibility to break it up into reasonably-sized pieces [21:15:56] That must be it... 
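For reference, a minimal sketch of the client-side chunking discussed above, assuming nothing about how UploadWizard actually does it. The function name `plan_chunks` and the 5 MiB chunk size are hypothetical choices for illustration; the only values taken from the log are the `client_max_body_size 100m` limit and the 133365469-byte body that nginx rejected.

```python
MB = 1024 * 1024

# From the log: nginx.conf on cp1066 has "client_max_body_size 100m".
NGINX_BODY_LIMIT = 100 * MB

# Hypothetical chunk size, chosen only for illustration.
CHUNK_SIZE = 5 * MB


def plan_chunks(file_size, chunk_size):
    """Return a list of (offset, length) pairs that cover file_size bytes.

    Every piece is at most chunk_size bytes, so each one can be sent as
    its own request body instead of one oversized POST.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# The body size nginx complained about in the error log above.
rejected_body = 133365469
pieces = plan_chunks(rejected_body, CHUNK_SIZE)
```

Sent as a single body, the upload exceeds the 100m limit and gets the 413. Split this way, every piece stays well under it, which is why a chunked upload that is not actually chunking fails only for large files.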
[21:16:02] !log rebooting cp1064 for experimental kernel (is depooled) [21:16:02] So...now I need to figure out what changed [21:16:10] Logged the message, Master [21:16:17] Back to the salt mines [21:16:19] it's not actually chunking [21:16:31] 3Wikimedia-Mailing-lists, operations, wikidata-query-service: New mailing list for wikidata-query-tech - https://phabricator.wikimedia.org/T89158#1029093 (10Manybubbles) Yeah, database vendors, the folks at google who were interested. [21:16:44] Thanks for the help everyone [21:16:54] twentyafterfour: over 4h it has significantly reduced AFAIS [21:17:06] But still, I need freaking backtraces [21:17:08] hoo: ok I was looking at past hour only [21:17:34] <_joe_> so the problem is uploadwizard is not chunking the request? [21:17:39] looking at the last 1 hour with 5m interval on fatalmonitor dashboard, there was a spike and noticeably more hhvm entries post-version-bump [21:18:02] PROBLEM - Host cp1064 is DOWN: PING CRITICAL - Packet loss = 100% [21:18:03] <_joe_> I'm not sure of that, I'd have to look at details of how nignx work, not now [21:18:31] greg-g sorry I didn't see your message. 
deploy went ok [21:18:37] mutante: don't be like that, I liked tracking the discussion in here and via email :p [21:18:38] twentyafterfour: sweet :) [21:18:43] but if this is a new failure: nothing related has changed for the current nginx/1.4 hosts [21:19:13] no major issues, I'm just poking the logs trying to fish out anything that needs to get reported in a phabricator ticket [21:19:25] _joe_: Seems like it, yes, I'm looking into it [21:19:29] greg-g: ^ [21:19:46] twentyafterfour: rock [21:19:47] JohnFLewis: which discussion about what [21:20:02] Whether the list should or should not be created [21:20:26] yea, sorry, i dont think all of ops need to be involved in a discussion whether a list should be created or not [21:20:37] there should be consensus first [21:20:48] also, doesn't need ops because others have the needed access [21:21:20] this one pushed other updates out of my notifications [21:21:28] Yeah Yeah but IRC tracking ;) (I get the point) [21:21:36] IRC "tracking"? [21:21:41] 3Release-Engineering, Wikimedia-General-or-Unknown, operations, WMF-Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1029124 (10greg) See also: notes on 404/500 error pages from @bd808: https://phabricator.wikimedia.org/P279 [21:22:08] mutante: nevermind :) (only being my joking self) [21:26:28] 3Release-Engineering, Wikimedia-General-or-Unknown, operations, WMF-Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1029152 (10Nirzar) @greg thanks, this makes things a lot clear :) only one small thing, /mediawiki-config/503.html this page has language selection in the footer but the w... [21:27:42] Huh, it might be Firefogg's fault after all [21:29:59] Or just a normalsauce bug. [21:32:41] know which database replaced db1001? 
[21:32:52] db1001 is not found anymore [21:33:26] 3Labs, Wikimedia-Labs-Infrastructure: Puppet failure on new instance: Various package have no installation candidate - https://phabricator.wikimedia.org/T84967#1029178 (10coren) 5Open>3Invalid a:3coren Bootstrap issue; some packages only "exist" after several puppet runs because the repos from which they c... [21:33:26] ignore me, my bad, it does exist [21:33:47] it exists, is reachable and mysql on it is fine [21:33:50] yeah :P [21:34:07] "normalsauce bug"? Isn't that normally called a "feature"? :-) [21:36:34] feature request as in 'enhancement' doesn't exist anymore in phab as it it did in BZ [21:38:36] !log repooled cp1065 eqiad text frontend in pybal [21:38:43] Logged the message, Master [21:39:08] springle: Are you working on labsdb1003? [21:40:22] So it turns out an upload object is not a config object [21:40:24] Who knew? [21:40:43] Fix patch coming, hopefully we'll get it out either during the evening SWAT today or during the morning SWAT tomorrow. 
[21:45:11] (03CR) 10QChris: Correcting docs and thresholds for eventlogging alarms (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/189588 (owner: 10Nuria) [21:46:52] RECOVERY - Host cp1064 is UP: PING OK - Packet loss = 0%, RTA = 1.25 ms [21:49:02] PROBLEM - Varnish HTTP upload-backend on cp1064 is CRITICAL: Connection refused [21:49:32] PROBLEM - Varnish HTTP upload-frontend on cp1064 is CRITICAL: Connection timed out [21:49:41] PROBLEM - Varnish traffic logger on cp1064 is CRITICAL: Timeout while attempting connection [21:49:56] * greg-g just looks in [21:50:04] assuming that's all ok (varnish) [21:50:12] PROBLEM - Host cp1064 is DOWN: PING CRITICAL - Packet loss = 100% [21:51:38] greg-g: yes, cp1064 is depooled, I'm playing with it [21:51:44] kk [21:51:52] RECOVERY - Host cp1064 is UP: PING OK - Packet loss = 0%, RTA = 1.27 ms [21:52:28] 3operations: Puppet broken on silver.wikimedia.org - https://phabricator.wikimedia.org/T88513#1029224 (10Dzahn) a:3Andrew Andrew moved wikitech over to silver. As of now puppet runs fine again. Notice: Finished catalog run in 31.46 seconds it does include these: 2431 include role::nova::manager 2432... [21:52:42] RECOVERY - Varnish traffic logger on cp1064 is OK: PROCS OK: 2 processes with command name varnishncsa [21:56:13] (03CR) 10Mattflaschen: [C: 04-1] "Nemo, why did you remove the TODOs without actually addressing them?" [puppet] - 10https://gerrit.wikimedia.org/r/49678 (owner: 10Ottomata) [21:57:29] (03CR) 10Mattflaschen: "> It sounds like the "on a typical day" actually *is* a problem for traffic analysis." [puppet] - 10https://gerrit.wikimedia.org/r/49678 (owner: 10Ottomata) [21:57:31] PROBLEM - puppet last run on mw1066 is CRITICAL: CRITICAL: Puppet has 1 failures [21:59:37] 3Ops-Access-Requests, operations, Citoid, Services: Give mvolz access to sha machine i.e. 
http://citoid.wikimedia.org/ - https://phabricator.wikimedia.org/T89057#1029240 (10RobH) I do not currently see a user mvolz in our shell access group (modules/admin/data/data.yml.) Is this a new user request for access?... [22:00:05] 3Ops-Access-Requests, operations, Citoid, Services: Give mvolz access to sha machine i.e. http://citoid.wikimedia.org/ - https://phabricator.wikimedia.org/T89057#1029241 (10RobH) p:5Triage>3High [22:00:08] !log repooled cp1064 eqiad upload frontends in pybal [22:00:16] Logged the message, Master [22:02:56] (03CR) 10Alex Monk: [C: 031] mediawikiwiki: Allow sysop to add and remove themself from translationadmin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187183 (https://phabricator.wikimedia.org/T87797) (owner: 10Florianschmidtwelzow) [22:06:04] (03PS1) 10BBlack: repool cp1064 upload backend cache [puppet] - 10https://gerrit.wikimedia.org/r/189838 [22:06:09] (03CR) 10Matanya: [C: 04-1] "edit-interface is found on large wikis with multiple user who want to help on the tech side but don't want the mop. 
on a wiki this size it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187915 (https://phabricator.wikimedia.org/T85713) (owner: 10Glaisher) [22:06:26] (03CR) 10BBlack: [C: 032 V: 032] repool cp1064 upload backend cache [puppet] - 10https://gerrit.wikimedia.org/r/189838 (owner: 10BBlack) [22:08:46] (03PS3) 10Alex Monk: enwiki FlaggedRevs: Remove 'autoreview' from 'autoconfirmed', check the former for PC2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189661 (owner: 10Cenarium) [22:09:32] PROBLEM - etherpad.wikimedia.org HTTPS on zirconium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:11:42] RECOVERY - etherpad.wikimedia.org HTTPS on zirconium is OK: HTTP OK: HTTP/1.1 200 OK - 28850 bytes in 8.084 second response time [22:14:32] RECOVERY - puppet last run on mw1066 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:15:07] 3operations: investigate etherpad service interrruptions / possible migrate service - https://phabricator.wikimedia.org/T89174#1029271 (10RobH) 3NEW [22:15:12] (03CR) 10Alex Monk: [C: 04-1] "editinterface right into templateeditor group? what?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: 10Calak) [22:28:21] (03CR) 10Calak: [C: 031] "Yes, like rowiki. The name of it is not important." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: 10Calak) [22:31:32] (03CR) 10Nemo bis: "Well, I added them, I can also remove them. 
There didn't seem to be actual interest in those questions" [puppet] - 10https://gerrit.wikimedia.org/r/49678 (owner: 10Ottomata) [22:31:41] (03CR) 10Alex Monk: Change templateeditor user group rights on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: 10Calak) [22:32:41] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: puppet fail [22:33:02] (03CR) 10Aklapper: [C: 031] "looks good to me from looking at it (but didn't test locally)" [puppet] - 10https://gerrit.wikimedia.org/r/189326 (https://phabricator.wikimedia.org/T865) (owner: 10Merlijn van Deen) [22:41:01] (03CR) 10Krinkle: [C: 031] Move jq package to module, all elasticsearch machines should have it [puppet] - 10https://gerrit.wikimedia.org/r/188881 (owner: 10Chad) [22:45:21] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: Puppet has 1 failures [22:50:21] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:52] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [22:55:21] RECOVERY - check_puppetrun on lutetium is OK: OK: Puppet is currently enabled, last run 280 seconds ago with 0 failures [23:12:49] (03PS1) 10Legoktm: Re-enable $wgCentralAuthAutoMigrate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189862 [23:13:41] (03PS1) 10Robmoen: Enable gather extension on en beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189863 [23:20:14] (03CR) 10Hoo man: [C: 04-1] "duplicate" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189862 (owner: 10Legoktm) [23:20:35] hoo: duplicate of what? [23:21:47] https://gerrit.wikimedia.org/r/188554 [23:21:56] Of my change with almost the exact same title [23:22:12] oops. 
[23:22:38] (03Abandoned) 10Legoktm: Re-enable $wgCentralAuthAutoMigrate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189862 (owner: 10Legoktm) [23:22:52] hoo: any reason you didn't also enable $wgCentralAuthAutoMigrateNonGlobalAccounts ? [23:23:07] hoo: and I'm going to put this in for SWAT in half an hour [23:23:14] Not really, no [23:23:24] feel free to also flip that swtich, if you want [23:24:18] * legoktm does [23:25:06] (03PS2) 10Legoktm: Re-enable wgCentralAuthAutoMigrate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188554 (owner: 10Hoo man) [23:26:59] 3Multimedia, operations, MediaWiki-extensions-UploadWizard: Investigate server error when uploading an OGV - https://phabricator.wikimedia.org/T89018#1029426 (10Tgr) [23:33:14] 3operations: investigate etherpad service interrruptions / possible migrate service - https://phabricator.wikimedia.org/T89174#1029458 (10faidon) FWIW, at the dev summit it was a simple matter of a silly default MaxClients setting of 150. I debugged this and fixed it on the spot and it remained stable for the re... [23:33:17] 3Multimedia, operations, MediaWiki-extensions-UploadWizard: Chunked upload fails in UploadWizard with the server aborting the connection, and no errors in the server logs - https://phabricator.wikimedia.org/T89018#1029459 (10Tgr) [23:33:48] 3Ops-Access-Requests, operations, Citoid, Services: Give mvolz access to sha machine i.e. http://citoid.wikimedia.org/ - https://phabricator.wikimedia.org/T89057#1029461 (10Mvolz) * Name: Marielle Volz * Labs: https://wikitech.wikimedia.org/wiki/User:Mvolz, shell name: mvolz * Requested username: mvolz * Public... 
[23:34:51] (03CR) 10Nuria: Correcting docs and thresholds for eventlogging alarms (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/189588 (owner: 10Nuria) [23:38:43] !log andyrussg Synchronized php-1.25wmf16/extensions/CentralNotice/: Update CentralNotice (duration: 00m 06s) [23:38:53] Logged the message, Master [23:40:12] marktraceur: did I mess up and re-add you to the afternoon swat window? (you're currently listed for it, but I remember you asking to be removed, right?) [23:46:43] (03PS1) 10Springle: remove labsdb100[123] hacked role. use the mariadb config class. [puppet] - 10https://gerrit.wikimedia.org/r/189868 [23:47:29] ori: Still searching. [23:47:31] Nemo_bis: ^ [23:48:29] (03CR) 10Springle: [C: 032] remove labsdb100[123] hacked role. use the mariadb config class. [puppet] - 10https://gerrit.wikimedia.org/r/189868 (owner: 10Springle) [23:50:02] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:50:11] (03CR) 10Hoo man: remove labsdb100[123] hacked role. use the mariadb config class. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/189868 (owner: 10Springle) [23:50:21] ^demon|away, andrewbogott_afk: Regarding Commons images + wikitech-static, you could just set the cache time really high... [23:50:29] InstantCommons has a local thumbnail cache. [23:50:45] I also imagine it doesn't purge if Commons is unreachable. [23:53:22] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:53:48] 3Multimedia, operations, MediaWiki-extensions-UploadWizard: Chunked upload fails in UploadWizard with the server aborting the connection, and no errors in the server logs - https://phabricator.wikimedia.org/T89018#1029509 (10Tgr) For the record, @ori and @bblack have investigated this and found that the nginx se... [23:59:14] I'll swat today [23:59:48] * greg-g adds Krenair's name to the afternoon SWAT window :)