[00:01:37] !log ran mwscript maintenance/updateCollation.php --wiki=ruwikinews --force
[00:01:38] MaxSem: hmm, not convinced my javascript changes have made it out :S
[00:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:01:49] MatmaRex, real 5m13.443s
[00:01:52] MaxSem: although, i see them with ?debug=1 .... so not sure
[00:02:40] MaxSem: nice. thanks
[00:03:39] Operations, MediaWiki-Interface, Traffic: Purge pages cached with mobile editlinks - https://phabricator.wikimedia.org/T125841#2120927 (Jdlrobson) In a sample of 100 pages visited via random I found only one page with the issue: https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Vancouver My con...
[00:03:49] (PS3) Reedy: Disable OAI extension [mediawiki-config] - https://gerrit.wikimedia.org/r/277229 (https://phabricator.wikimedia.org/T70867)
[00:04:31] (CR) Reedy: [C: 2] Disable OAI extension [mediawiki-config] - https://gerrit.wikimedia.org/r/277229 (https://phabricator.wikimedia.org/T70867) (owner: Reedy)
[00:05:33] (Merged) jenkins-bot: Disable OAI extension [mediawiki-config] - https://gerrit.wikimedia.org/r/277229 (https://phabricator.wikimedia.org/T70867) (owner: Reedy)
[00:06:24] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Disable OAI (duration: 00m 25s)
[00:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:07:29] !log reedy@tin Synchronized wmf-config/extension-list: Remove OAI (duration: 00m 24s)
[00:07:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:09:03] Thanks for the deploy MaxSem.
[00:18:15] RECOVERY - puppet last run on mw2070 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:18:15] MaxSem: weird, finally starting to get events in from the new schema, even though it was pushed out 50 minutes ago.
wonder what took it so long....anyways should be fine now
[00:24:47] MaxSem: so, it's because i'm an idiot :P event logging in the db is backlogged by about an hour :P
[00:26:06] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:26:07] :'}
[00:31:34] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.052 second response time
[00:32:33] (PS5) Dereckson: Config changes for gu.wikiquote.org [mediawiki-config] - https://gerrit.wikimedia.org/r/263614 (https://phabricator.wikimedia.org/T121853) (owner: Mdann52)
[00:33:22] (CR) Dereckson: [C: 1] "PS5: rebased, replaced a space by an underscore in namespace" [mediawiki-config] - https://gerrit.wikimedia.org/r/263614 (https://phabricator.wikimedia.org/T121853) (owner: Mdann52)
[00:51:35] Operations, Discovery, Wikidata, Wikidata-Query-Service: implement wdqs1001/1002 disk upgrades (extend lvm) - https://phabricator.wikimedia.org/T120714#2121044 (RobH) Barring any issues during the reimagine (as long as nothing breaks) it can typically be reinstalled in around an hour. This include...
[01:17:22] Operations, MediaWiki-Interface, Traffic: Purge pages cached with mobile editlinks - https://phabricator.wikimedia.org/T125841#2121093 (matmarex) For the current code, I think a good enough check is to match pages that contain ` http://puppet-compiler.wmflabs.org/2053/" [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[01:54:32] (PS5) Dzahn: ganglia: do not start meta-service on jessie/systemd [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197)
[01:55:51] (CR) jenkins-bot: [V: -1] ganglia: do not start meta-service on jessie/systemd [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[01:55:56] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:57:34] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 615 bytes in 0.125 second response time
[01:57:42] (PS6) Dzahn: ganglia: do not start meta-service on jessie/systemd [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197)
[01:59:24] (PS7) Dzahn: ganglia: do not start meta-service on jessie/systemd [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197)
[02:01:14] PROBLEM - Apache HTTP on mw1025 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.003 second response time
[02:02:55] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.043 second response time
[02:04:07] (CR) Dzahn: [C: -1] "getting better, but still not there yet on alsafi http://puppet-compiler.wmflabs.org/2054/" [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[02:11:06] PROBLEM - HHVM rendering on mw1025 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable
- 50392 bytes in 0.011 second response time
[02:11:36] (PS5) Krinkle: Avoid legacy overhead in mobile web experience [mediawiki-config] - https://gerrit.wikimedia.org/r/277422 (owner: Jdlrobson)
[02:12:09] (CR) Krinkle: [C: 2] Avoid legacy overhead in mobile web experience [mediawiki-config] - https://gerrit.wikimedia.org/r/277422 (owner: Jdlrobson)
[02:12:41] (Merged) jenkins-bot: Avoid legacy overhead in mobile web experience [mediawiki-config] - https://gerrit.wikimedia.org/r/277422 (owner: Jdlrobson)
[02:12:54] RECOVERY - HHVM rendering on mw1025 is OK: HTTP OK: HTTP/1.1 200 OK - 68149 bytes in 0.189 second response time
[02:16:33] (PS1) Dzahn: ganglia: don't install old init script if systemd is used [puppet] - https://gerrit.wikimedia.org/r/277451 (https://phabricator.wikimedia.org/T124197)
[02:20:55] (PS2) Dzahn: ganglia: don't install old init scripts if systemd is used [puppet] - https://gerrit.wikimedia.org/r/277451 (https://phabricator.wikimedia.org/T124197)
[02:24:58] (PS1) Dereckson: Test Collection extension on zh.wikipedia.beta.wmflabs.org [mediawiki-config] - https://gerrit.wikimedia.org/r/277452 (https://phabricator.wikimedia.org/T128425)
[02:26:32] (CR) Dzahn: [C: 2] "compiler tested with desired result http://puppet-compiler.wmflabs.org/2055/" [puppet] - https://gerrit.wikimedia.org/r/277451 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[02:28:23] (PS8) Dzahn: ganglia: do not start meta-service on jessie/systemd [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197)
[02:28:45] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[02:28:54] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[02:29:18] (CR) Dzahn: "had to rebase this on top of https://gerrit.wikimedia.org/r/#/c/277451/ first" [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[02:30:16] (CR) Dereckson: "@Luke081515 This change seems ready for deployment." [mediawiki-config] - https://gerrit.wikimedia.org/r/247093 (https://phabricator.wikimedia.org/T115812) (owner: Luke081515)
[02:31:08] (CR) Dzahn: "confirmed noop on carbon and current aggregators in prod" [puppet] - https://gerrit.wikimedia.org/r/277451 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[02:32:14] PROBLEM - HHVM rendering on mw1025 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 1.096 second response time
[02:33:55] RECOVERY - HHVM rendering on mw1025 is OK: HTTP OK: HTTP/1.1 200 OK - 65771 bytes in 0.203 second response time
[02:39:23] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 17m 31s)
[02:39:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:48:16] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Mar 15 02:48:16 UTC 2016 (duration 8m 54s)
[02:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:50:56] (CR) Dzahn: [C: -1] "more puppet dependencies that fail when the old service doesn't exist anymore on systemd where each aggregator is its own service now. gett" [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[02:52:35] (Abandoned) Dzahn: ganglia: script to start multiple aggregators [puppet] - https://gerrit.wikimedia.org/r/276369 (owner: Dzahn)
[02:53:16] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[02:53:25] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[02:53:28] !log krinkle@tin Synchronized wmf-config/mobile.php: Remove legacy scripts from autoload on mobile (duration: 00m 26s)
[02:53:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:53:39] (PS1) Dzahn: ganglia: no dependency for old upstart service on systemd [puppet] - https://gerrit.wikimedia.org/r/277455 (https://phabricator.wikimedia.org/T124197)
[02:54:56] (PS1) Andrew Bogott: Add makedomain tool, for creation of domains in designate. [puppet] - https://gerrit.wikimedia.org/r/277456
[03:02:30] (CR) Dzahn: [C: 2] "compiler tested noop in prod http://puppet-compiler.wmflabs.org/2057/" [puppet] - https://gerrit.wikimedia.org/r/277455 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[03:05:54] (PS9) Dzahn: ganglia: do not start meta-service on jessie/systemd [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197)
[03:10:03] glad that we are _NOT_ affected by this bug. and things work https://bugzilla.redhat.com/show_bug.cgi?id=752774
[03:10:21] re: starting multiple instances of one service, with systemd and puppet
[03:11:08] it works, now it's just about all the puppet dependencies etc to make it work with both systems at the same time
[03:12:25] but on jessie it can definitely spawn many instances from just one unit template file now and the puppet class for it does the job
[03:14:53] (CR) Dzahn: [C: 2] "more rebasing on top of other stuff, now compiles fine : http://puppet-compiler.wmflabs.org/2058/" [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[03:18:12] (CR) Dzahn: "noop on carbon, bast4001 (prod), the next new issue on jessie.
one by one" [puppet] - https://gerrit.wikimedia.org/r/277354 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[03:24:14] (PS1) Dzahn: ganglia: fix me - service notify systemd (WIP) [puppet] - https://gerrit.wikimedia.org/r/277458
[03:28:03] Operations, MediaWiki-JobQueue, Patch-For-Review: The refreshLinks jobs enqueue rate is 10 times the normal rate - https://phabricator.wikimedia.org/T129517#2121269 (Legoktm)
[03:28:17] ACKNOWLEDGEMENT - puppet last run on logstash1001 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn created ticket https://phabricator.wikimedia.org/T129934
[03:28:17] ACKNOWLEDGEMENT - puppet last run on logstash1002 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn created ticket https://phabricator.wikimedia.org/T129934
[03:28:17] ACKNOWLEDGEMENT - puppet last run on logstash1003 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn created ticket https://phabricator.wikimedia.org/T129934
[03:30:23] (PS2) Dzahn: ganglia: fix me - service notify systemd (WIP) [puppet] - https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197)
[03:30:58] (CR) Dzahn: [C: -1] "to be continued tomorrow. prod is fine."
[puppet] - https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197) (owner: Dzahn)
[04:30:45] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:32:25] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 615 bytes in 0.735 second response time
[04:40:15] (Abandoned) CSteipp: Set password policy for enwiki sysops [mediawiki-config] - https://gerrit.wikimedia.org/r/251678 (https://phabricator.wikimedia.org/T119100) (owner: CSteipp)
[04:49:03] (PS2) CSteipp: Enforce password policies on labs [mediawiki-config] - https://gerrit.wikimedia.org/r/276518 (https://phabricator.wikimedia.org/T119100)
[05:16:48] Operations: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963#2121329 (ori)
[05:17:53] Operations, Performance-Team: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963#2121342 (ori)
[05:51:09] (PS1) Ori.livneh: Add commented-out entries in readOnlyBySection for all database clusters [mediawiki-config] - https://gerrit.wikimedia.org/r/277461
[05:52:35] (PS1) Ori.livneh: Put eqiad in read-only mode for scheduled test [mediawiki-config] - https://gerrit.wikimedia.org/r/277462
[06:01:40] Reminder: in one hour (07:00 UTC) all wikis will be in read-only mode for five minutes for a scheduled test. [https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Schedule_for_Q3_FY2015-2016_rollout]
[06:09:15] PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [24.0]
[06:12:44] RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[06:16:13] (CR) Jcrespo: [C: 1] Put eqiad in read-only mode for scheduled test [mediawiki-config] - https://gerrit.wikimedia.org/r/277462 (owner: Ori.livneh)
[06:16:45] what are the tests?
[06:19:22] (PS1) KartikMistry: Enable non-default MT for some languages [puppet] - https://gerrit.wikimedia.org/r/277463 (https://phabricator.wikimedia.org/T129849)
[06:27:01] (PS2) KartikMistry: Enable non-default MT for some languages [puppet] - https://gerrit.wikimedia.org/r/277463 (https://phabricator.wikimedia.org/T129849)
[06:28:50] jynus: not what, who
[06:29:06] ?
[06:29:54] sorry, bad joke. the point is primarily to have 5-minutes' worth of log data for all log buckets for the duration of the test
[06:31:16] jynus: https://gerrit.wikimedia.org/r/#/c/277461/ look OK?
[06:31:35] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:45] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:54] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:16] I am not sure that will rebase
[06:32:34] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:45] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:05] what do you mean? it's based on the current head
[06:33:16] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:40] (CR) Giuseppe Lavagetto: [C: 1] Put eqiad in read-only mode for scheduled test [mediawiki-config] - https://gerrit.wikimedia.org/r/277462 (owner: Ori.livneh)
[06:36:45] Operations, Discovery, Wikimedia-Logstash, Elasticsearch: logstash - nginx failed service start - https://phabricator.wikimedia.org/T129934#2121442 (Gehel) a:Gehel
[06:40:00] the queue will continue executing. Do you want me to put the DBs in read only?
[06:40:14] or do you want me to log the writes?
[06:41:47] or are you going to stop the queue?
[06:41:47] <_joe_> jynus: we can stop the jobrunners during the test
[06:41:51] <_joe_> it makes sense
[06:42:12] <_joe_> give me 2 mins to prepare the salt calls (basically, lemme finish coffee)
[06:43:39] jynus: could you log the writes?
[06:45:49] <_joe_> !log stopping puppet on the eqiad jobrunners, in preparation for the read-only test
[06:45:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:46:16] well, technically they are always written on the binlog, I only need to store them permanently, just make sure the period is short, and you record the exact timestamps of the process
[06:47:09] <_joe_> whenever you want I can stop the jobqueue
[06:50:14] let's wait another five minutes
[06:50:25] <_joe_> we said 8 am right?
[06:50:33] <_joe_> I can do it one minute before the time
[06:50:52] hey
[06:51:14] we said 7 UTC
[06:52:36] <_joe_> yeah sorry ori :P
[06:52:40] <_joe_> paravoid: hey
[06:52:41] I believe that's 8 am where joe and I live. or at least I hope so, I'd hate if I woke up early for nothing.
[06:52:55] <_joe_> Elitre: it's 8 am all right :)
[06:53:01] <_joe_> Elitre: ciao btw
[06:53:37] yay! barely slept because of the excitement. :D
[06:55:46] Oh same to me, I could barely sleep because someone got excited with a non-issue with mediawiki and decided to give me a call
[06:55:57] <_joe_> jynus: :/
[06:56:13] <_joe_> I'd go on and stop the jobrunners now
[06:56:22] good because I just put up a banner saying "incoming!"
:P
[06:56:27] <_joe_> I'll start with the videoscalers
[06:56:40] thanks _joe_; go ahead
[06:56:54] going to merge but not sync yet
[06:56:58] <_joe_> !log stopping jobrunner and jobchron on the videoscalers in eqiad
[06:57:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:57:25] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:57:58] (CR) Ori.livneh: [C: 2] Add commented-out entries in readOnlyBySection for all database clusters [mediawiki-config] - https://gerrit.wikimedia.org/r/277461 (owner: Ori.livneh)
[06:58:22] (Merged) jenkins-bot: Add commented-out entries in readOnlyBySection for all database clusters [mediawiki-config] - https://gerrit.wikimedia.org/r/277461 (owner: Ori.livneh)
[06:58:24] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:35] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:58:46] <_joe_> ok I'll stop the main jobrunners too
[06:58:47] (CR) Ori.livneh: [C: 2] Put eqiad in read-only mode for scheduled test [mediawiki-config] - https://gerrit.wikimedia.org/r/277462 (owner: Ori.livneh)
[06:59:06] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:11] (Merged) jenkins-bot: Put eqiad in read-only mode for scheduled test [mediawiki-config] - https://gerrit.wikimedia.org/r/277462 (owner: Ori.livneh)
[06:59:19] <_joe_> !log stopping all jobrunners in eqiad
[06:59:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:59:58] <_joe_> all jobrunners are stopped
[07:00:13] jynus: are you ready to start logging writes?
[07:00:30] jobcrawlers
[07:00:40] <_joe_> there is terbium ofc, which could have some long-running jobs
[07:00:41] I already did
[07:00:59] <_joe_> let's go then :)
[07:01:05] OK, going to sync. jynus, roger?
[07:01:10] go
[07:01:15] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:01:21] I did this already
[07:01:41] Jamesofur: the time doesn't look right in CN settings
[07:01:49] * Jamesofur sighs
[07:01:51] * Jamesofur looks
[07:02:19] doh
[07:02:23] fixed
[07:02:24] :(
[07:02:29] !log ori@tin Synchronized wmf-config/db-eqiad.php: Ie3f798ac: Put eqiad in read-only mode for scheduled test (duration: 00m 55s)
[07:02:31] midnight because midnight here....
[07:02:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:02:36] sorry
[07:02:45] (it's been a long day already...)
[07:02:48] It's 6am utc
[07:02:56] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:02:56] 7am :)
[07:02:57] Er no
[07:03:01] Yep
[07:03:02] mark: don't DO that
[07:03:03] but, yeah, fixed
[07:03:04] :P
[07:03:09] Haha
[07:03:10] hahaha
[07:03:15] It's early here
[07:03:15] the qps are going down, but because of the job runners, not the read only
[07:03:21] ;-)
[07:04:01] But I cannot save pages!
[07:04:30] It says "The Wikipedia database is temporarily in read-only mode."
[07:05:39] shell scripting crime of the day: https://gist.github.com/atdt/f557d57dc26bffb636e2 -- tail -f all log files on fluorine into a timestamped directory
[07:06:24] around 1 exception per second
[07:06:40] KPI!
[07:06:53] o/
[07:07:01] morning Krinkle
[07:07:21] OK
[07:07:23] loginwiki cannot refresh cookies
[07:07:24] <_joe_> https://grafana.wikimedia.org/dashboard/db/redis-jobqueue-elukey confirms jobs are not being submitted anymore
[07:07:33] shall I revert now?
[07:07:36] <_joe_> that's good
[07:07:39] we're just over the five-minute mark
[07:07:54] <_joe_> ori: green light for me
[07:07:54] (PS1) Ori.livneh: Revert "Put eqiad in read-only mode for scheduled test" [mediawiki-config] - https://gerrit.wikimedia.org/r/277464
[07:08:03] user preferences cannot be saved
[07:08:04] watchlist page doesn't show the readonly banner
[07:08:32] (CR) Ori.livneh: [C: 2] Revert "Put eqiad in read-only mode for scheduled test" [mediawiki-config] - https://gerrit.wikimedia.org/r/277464 (owner: Ori.livneh)
[07:08:33] "Could not update user with ID '0'; DB is read-only."
[07:08:55] I mean I can hit the mark all as visited, but if I go to the edit page it shows the banner as expected
[07:09:08] (Merged) jenkins-bot: Revert "Put eqiad in read-only mode for scheduled test" [mediawiki-config] - https://gerrit.wikimedia.org/r/277464 (owner: Ori.livneh)
[07:10:06] !log ori@tin Synchronized wmf-config/db-eqiad.php: I1eb69f16: Revert "Put eqiad in read-only mode for scheduled test" (duration: 00m 28s)
[07:10:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:10:24] I have reversed the editor decline
[07:10:25] <_joe_> !log reenabling puppet, jobrunner, jobchron on jobrunners and videoscalers
[07:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:10:29] for the first time!
[07:10:30] <_joe_> ahahahahaha
[07:11:46] 15 minutes of no database errors, too
[07:11:49] https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor
[07:12:48] I'm still getting the database locked on preference page
[07:12:50] https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-errors
[07:13:21] <_joe_> the error or a banner?
[07:13:38] Seems Special:Translate is throwing DB-readonly when trying to view a page
[07:13:44] banner
[07:14:12] Damn it, I didn't sync the change properly.
[07:14:19] <_joe_> oh
[07:14:27] <_joe_> ?
[07:14:54] metawiki /wiki/Special:MyLanguage/Wikimedia_Foundation_Board_noticeboard/10_March_2016_-_Wikimedia_Foundation_executive_transition_update - fails in MessageGroupStats
[07:15:12] sorry, synced now.
[07:15:28] !log ori@tin Synchronized wmf-config/db-eqiad.php: I1eb69f16: Revert "Put eqiad in read-only mode for scheduled test" (for real) (duration: 00m 28s)
[07:15:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:16:25] ori: confirmed preferences works too now :)
[07:24:14] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 51.72% of data above the critical threshold [5000000.0]
[07:24:26] OK, thanks everyone
[07:24:54] _joe_, jynus, Elitre, Jamesofur, volans, Krinkle
[07:25:06] o7
[07:25:19] thank you, ori and Ops. high 5 time?
[07:25:26] thank you ori :)
[07:26:14] Elitre: not quite! This is a pretty rudimentary test, to pinpoint code-paths that don't handle read-only mode gracefully
[07:26:44] A few interesting deadlocks happened in mediawiki-errors
[07:26:52] not sure if those recover themselves
[07:27:40] !log log files for RO test in fluorine:/a/mw-log.read-only.1458025363.tar.bz2
[07:27:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:28:23] I think /a is what the filesystem hierarchy standard recommends, right?
[07:28:29] /a for amazing
[07:28:47] <_joe_> and for /awesome
[07:29:20] <_joe_> ori: think back two years approx, when I joined
[07:29:41] <_joe_> we've removed a lot of /a-wesomeness since then
[07:29:45] I can't tell if morebots is just using technical language or it's being deferential with someone.
[07:30:17] <_joe_> Elitre: the latter
[07:30:50] isn't fluorine the last holdout?
[07:31:33] <_joe_> ori: rdbs are gone right?
[07:31:58] yeah, with the roll-out of multi-instance
[07:34:40] surprisingly enough, the only writes that continued during read only were non-mediawiki ones
[07:35:56] \o/
[07:38:06] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[07:39:44] <_joe_> jynus: which ones?
[07:39:59] ori: What's in that tarball?
[07:40:26] _joe_, pt-heartbeat, the database watchdog/lag alert
[07:40:37] Krinkle: output of tail -f of each log file in /a/mw-log for the duration of the test
[07:49:36] (PS1) Ori.livneh: Allow X-Wikimedia-Debug header to request pages in read-only mode [mediawiki-config] - https://gerrit.wikimedia.org/r/277469
[07:50:45] (PS2) Ori.livneh: Allow X-Wikimedia-Debug header to request pages in read-only mode [mediawiki-config] - https://gerrit.wikimedia.org/r/277469
[07:51:35] ori: I'll look more later, but noticed at https://grafana-admin.wikimedia.org/dashboard/db/resourceloader?from=now-7d that traffic almost halved since last week's branch roll out.
[07:51:38] (PS2) Ori.livneh: X-Wikimedia-Debug: profile if 'profiler' attribute set [mediawiki-config] - https://gerrit.wikimedia.org/r/276220
[07:51:40] Which is a good sign
[07:51:42] more cache hits I guess
[07:51:45] (CR) Ori.livneh: [C: 2] X-Wikimedia-Debug: profile if 'profiler' attribute set [mediawiki-config] - https://gerrit.wikimedia.org/r/276220 (owner: Ori.livneh)
[07:51:49] (PS3) KartikMistry: Enable non-default MT for some languages [puppet] - https://gerrit.wikimedia.org/r/277463 (https://phabricator.wikimedia.org/T129849)
[07:51:55] far, far fewer 304s and fewer 200s as well.
[07:52:11] wow
[07:52:13] Not sure exactly yet, but it initially looks good.
[07:52:20] Also no idea why yet exactly
[07:52:24] (Merged) jenkins-bot: X-Wikimedia-Debug: profile if 'profiler' attribute set [mediawiki-config] - https://gerrit.wikimedia.org/r/276220 (owner: Ori.livneh)
[07:52:52] removal of some old symlinks would have turned some requests into 404s?
[07:53:34] (CR) Ori.livneh: [C: 2] Allow X-Wikimedia-Debug header to request pages in read-only mode [mediawiki-config] - https://gerrit.wikimedia.org/r/277469 (owner: Ori.livneh)
[07:53:49] ori: this is pure RL only (/w/load.php) not other misc-RL-ish requests
[07:53:59] (Merged) jenkins-bot: Allow X-Wikimedia-Debug header to request pages in read-only mode [mediawiki-config] - https://gerrit.wikimedia.org/r/277469 (owner: Ori.livneh)
[07:54:20] very promising
[07:54:48] breadonly
[07:54:50] * Krinkle is happy
[07:54:55] * Krinkle loves bread
[07:55:20] :)
[07:55:24] o/
[07:55:31] O
[07:55:33] seeya!
[07:55:35] I'll be back in a few hours
[07:57:04] (CR) Mobrovac: [C: -1] "One puppet quirk to resolve and we should be good to go." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/277423 (owner: Thcipriani)
[07:57:18] !log ori@tin Synchronized wmf-config/StartProfiler.php: I82ec01a: X-Wikimedia-Debug: profile if "profile" attribute set (duration: 00m 25s)
[07:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:57:33] (PS6) Jforrester: Enable VisualEditor for IP users on the German Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/271713 (https://phabricator.wikimedia.org/T127881)
[07:57:44] (CR) Jforrester: [C: 1] "Due to go out today at 15:00 UTC."
[mediawiki-config] - https://gerrit.wikimedia.org/r/271713 (https://phabricator.wikimedia.org/T127881) (owner: Jforrester)
[07:57:54] James_F: \o/
[07:58:00] (PS4) Jforrester: Enable VisualEditor Single Edit Tab on the Polish Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/274130 (https://phabricator.wikimedia.org/T128477)
[07:58:08] (CR) Jforrester: [C: 1] "Due to go out today at 15:00 UTC." [mediawiki-config] - https://gerrit.wikimedia.org/r/274130 (https://phabricator.wikimedia.org/T128477) (owner: Jforrester)
[07:59:04] !log ori@tin Synchronized wmf-config/CommonSettings.php: Ieeb76087: Allow X-Wikimedia-Debug header to request pages in read-only mode (duration: 00m 25s)
[07:59:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:01:51] (PS1) Ori.livneh: Follow-up for Ieeb76087: tolerate missing trailing semicolon [mediawiki-config] - https://gerrit.wikimedia.org/r/277472
[08:02:15] (CR) Ori.livneh: [C: 2] Follow-up for Ieeb76087: tolerate missing trailing semicolon [mediawiki-config] - https://gerrit.wikimedia.org/r/277472 (owner: Ori.livneh)
[08:02:41] (Merged) jenkins-bot: Follow-up for Ieeb76087: tolerate missing trailing semicolon [mediawiki-config] - https://gerrit.wikimedia.org/r/277472 (owner: Ori.livneh)
[08:10:34] !log ori@tin Synchronized wmf-config/CommonSettings.php: I01c01dcd: Follow-up for Ieeb76087: tolerate missing trailing semicolon (duration: 00m 29s)
[08:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:15:15] !log restbase deploy start of c68f5f456
[08:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:24:23] !log restbase deploy end of c68f5f456
[08:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:40:58] (PS1) Gehel: Adding the ability to ensure nginx => absent [puppet/nginx] -
https://gerrit.wikimedia.org/r/277476 (https://phabricator.wikimedia.org/T129934)
[08:47:22] (CR) Giuseppe Lavagetto: [C: -1] "I wouldn't implement this using graphite, if we want it to be a critical alert." [puppet] - https://gerrit.wikimedia.org/r/252396 (https://phabricator.wikimedia.org/T118331) (owner: Ori.livneh)
[08:50:06] Operations, Performance-Team, Availability, codfw-rollout, codfw-rollout-Jan-Mar-2016: Dig through logs from 15 Mar 2016 read-only test and file bugs - https://phabricator.wikimedia.org/T129973#2121627 (ori)
[08:56:52] (PS1) DCausse: Enable ICU Folding on greek wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/277477 (https://phabricator.wikimedia.org/T129502)
[09:03:14] (PS1) Gehel: Fixes issues with nginx package on logstash. [puppet] - https://gerrit.wikimedia.org/r/277478 (https://phabricator.wikimedia.org/T129934)
[09:06:05] (PS4) Giuseppe Lavagetto: realm: add $::app_routes hash [puppet] - https://gerrit.wikimedia.org/r/275443 (https://phabricator.wikimedia.org/T125673)
[09:06:22] <_joe_> mobrovac: ^^ I'm revisiting the whole thing a bit
[09:07:37] kk _joe_, lemme know when you're done and i'll take a look
[09:13:55] (PS2) Ema: RESTBase caching: Force clients to revalidate purged end points [puppet] - https://gerrit.wikimedia.org/r/277056 (owner: GWicke)
[09:15:30] (CR) Ema: [C: 2 V: 2] RESTBase caching: Force clients to revalidate purged end points [puppet] - https://gerrit.wikimedia.org/r/277056 (owner: GWicke)
[09:18:54] Operations, Wikimedia-General-or-Unknown, Patch-For-Review: Dynamic backend selection via X-Wikimedia-Debug header - https://phabricator.wikimedia.org/T129000#2121683 (ori) Open>Resolved a:ori https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug
[09:27:03] Operations, Performance-Team, Availability, codfw-rollout, codfw-rollout-Jan-Mar-2016: Dig through logs from 15 Mar 2016 read-only test and file bugs -
https://phabricator.wikimedia.org/T129973#2121692 (10ori) Perhaps #release-engineering-team can help? [09:27:56] !log restbase rolling restart for https://gerrit.wikimedia.org/r/277056 [09:28:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:28:04] ema: ^ [09:28:34] 6Operations, 10Wikimedia-Fundraising: Add /fundraising to dumps.wikimedia.org - https://phabricator.wikimedia.org/T42847#464766 (10ArielGlenn) 5Open>3Resolved Well the changeset above was long since merged, so I am closing this ticket. [09:28:53] 6Operations, 10Datasets-General-or-Unknown, 10Wikimedia-Fundraising: Add /fundraising to dumps.wikimedia.org - https://phabricator.wikimedia.org/T42847#2121699 (10ArielGlenn) [09:30:31] 6Operations, 10Datasets-General-or-Unknown, 6Labs, 10wikitech.wikimedia.org: copy wikitech dumps to dumps server ? - https://phabricator.wikimedia.org/T128680#2082710 (10ArielGlenn) [09:32:06] (03PS5) 10Giuseppe Lavagetto: realm: add $::app_routes hash [puppet] - 10https://gerrit.wikimedia.org/r/275443 (https://phabricator.wikimedia.org/T125673) [09:34:56] 6Operations, 10Datasets-General-or-Unknown, 6Labs, 10Tool-Labs: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#2121741 (10ArielGlenn) [09:40:46] (03CR) 10Filippo Giunchedi: [C: 031] dependencies needed for logstash filtering [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/277264 (https://phabricator.wikimedia.org/T128787) (owner: 10Eevans) [09:48:35] 6Operations, 10Wikimedia-Site-Requests, 7Wikimedia-log-errors: Requests to localhost spam the 'localhost' and 'xff' log buckets - https://phabricator.wikimedia.org/T129982#2121837 (10hashar) [09:53:34] (03CR) 10Filippo Giunchedi: [C: 031] Adding the ability to ensure nginx => absent [puppet/nginx] - 10https://gerrit.wikimedia.org/r/277476 (https://phabricator.wikimedia.org/T129934) (owner: 10Gehel) [09:55:38] (03CR) 10Filippo Giunchedi: "would there be any harm in having elasticsearch https 
enabled even on logstash?" [puppet] - 10https://gerrit.wikimedia.org/r/277478 (https://phabricator.wikimedia.org/T129934) (owner: 10Gehel) [09:57:29] (03CR) 10Gehel: "Not really any harm as far as I can think, but I'm not sure there is any use either..." [puppet] - 10https://gerrit.wikimedia.org/r/277478 (https://phabricator.wikimedia.org/T129934) (owner: 10Gehel) [09:59:28] (03PS3) 10Giuseppe Lavagetto: cxserver: use dc-aware urls [puppet] - 10https://gerrit.wikimedia.org/r/275537 (https://phabricator.wikimedia.org/T125065) [09:59:30] (03PS6) 10Giuseppe Lavagetto: realm: add $::app_routes hash [puppet] - 10https://gerrit.wikimedia.org/r/275443 (https://phabricator.wikimedia.org/T125673) [10:02:05] (03CR) 10Filippo Giunchedi: "license/copyright for varnishapi missing, other than that LGTM!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [10:04:45] (03CR) 10Giuseppe Lavagetto: [C: 031] "Does the right thing according to the compiler:" [puppet] - 10https://gerrit.wikimedia.org/r/275537 (https://phabricator.wikimedia.org/T125065) (owner: 10Giuseppe Lavagetto) [10:06:42] (03CR) 10Filippo Giunchedi: [C: 031] "nevermind, I just noticed the default is absent in the class anyways!" [puppet] - 10https://gerrit.wikimedia.org/r/277478 (https://phabricator.wikimedia.org/T129934) (owner: 10Gehel) [10:14:44] (03PS2) 10Muehlenhoff: Add ferm rules for carbon-c-relay for labs graphite [puppet] - 10https://gerrit.wikimedia.org/r/276482 [10:15:15] (03PS2) 10Gehel: Fixes issues with nginx package on logstash. [puppet] - 10https://gerrit.wikimedia.org/r/277478 (https://phabricator.wikimedia.org/T129934) [10:15:52] (03CR) 10Gehel: [C: 032] Fixes issues with nginx package on logstash. 
[puppet] - 10https://gerrit.wikimedia.org/r/277478 (https://phabricator.wikimedia.org/T129934) (owner: 10Gehel) [10:15:59] (03CR) 10Gehel: [C: 032] Adding the ability to ensure nginx => absent [puppet/nginx] - 10https://gerrit.wikimedia.org/r/277476 (https://phabricator.wikimedia.org/T129934) (owner: 10Gehel) [10:19:11] I just merged a change to puppet-nginx, which is a submodule. It does not seem to be deployed by "puppet-merge" on palladium. Should I update it manually? [10:20:09] yeah you'll need a puppet.git code review that updates the submodule commit [10:20:36] damn submodules, I never remember how they work... [10:21:26] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: puppet fail [10:21:53] ^ above icinga problem is my doing (again) sorry for the noise... [10:21:55] PROBLEM - puppet last run on elastic2008 is CRITICAL: CRITICAL: puppet fail [10:22:26] PROBLEM - puppet last run on elastic2013 is CRITICAL: CRITICAL: puppet fail [10:26:12] (03PS3) 10Muehlenhoff: Add ferm rules for carbon-c-relay for labs graphite [puppet] - 10https://gerrit.wikimedia.org/r/276482 [10:27:08] (03CR) 10Ori.livneh: [C: 031] "A few comments inline; good to go otherwise." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [10:30:35] godog: I'm lost with submodules. Do you happend to know the command to update the nginx submodule? [10:33:21] (03CR) 10Ladsgroup: "I don't know operations/puppet coding conventions but python files here does not follow python standards mostly PEP8. 
Take a look and I wo" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [10:34:35] (03PS1) 10Gehel: updated nginx submodule [puppet] - 10https://gerrit.wikimedia.org/r/277484 [10:34:35] godog: forget it, I found the solution [10:34:36] gehel: sure, I think it is sth like cd modules/nginx && git fetch && git checkout origin/master [10:34:42] ah ok [10:34:45] * gehel hates submodule [10:34:54] godog: thanks anyway! [10:35:14] hehe np, I had to look it up too... not a fan of submodules either [10:35:22] git submodule update modules/nginx [10:35:30] should take care of it [10:36:47] (03CR) 10Filippo Giunchedi: [C: 031] updated nginx submodule [puppet] - 10https://gerrit.wikimedia.org/r/277484 (owner: 10Gehel) [10:36:52] godog: I added you as a reviewer, but I think this is trivial enough and I'm just going to merge it (you already reviewed the nginx change itself) [10:36:53] the default being to checkout [10:37:00] (03CR) 10Gehel: [C: 032] updated nginx submodule [puppet] - 10https://gerrit.wikimedia.org/r/277484 (owner: 10Gehel) [10:37:04] (03CR) 10Filippo Giunchedi: [C: 031] Add ferm rules for carbon-c-relay for labs graphite [puppet] - 10https://gerrit.wikimedia.org/r/276482 (owner: 10Muehlenhoff) [10:37:14] you can pass --force to force a checkout even if the submodule HEAD match the super project registered commit [10:37:15] gehel: yeah np, trivial [10:37:18] --recursive to process submodules in submodules etc [10:37:43] hashar: neat, didn't know that! [10:37:53] godog: man git-submodule is a good read :-} [10:38:17] hashar: could you please define "good read" ? 
[10:38:21] you can make the command to rebase/merge instead and have the default behavior configured in the super project on a per submodule basis [10:38:41] (03PS3) 10Giuseppe Lavagetto: restbase: make restbase configuration use application routes [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) [10:38:43] (03PS1) 10Giuseppe Lavagetto: service::configuration: simplify hiera defs [puppet] - 10https://gerrit.wikimedia.org/r/277486 (https://phabricator.wikimedia.org/T125065) [10:38:45] "good read" as "quite interesting" or "you will get to discover a few nice tip'n tricks regarding git" :D [10:39:39] an example is the mediawiki staging area. The extensions are submodules and we have live patches there [10:40:02] previously we used git submodule update then manually reapplied the various live patches [10:40:14] we changed the parent project git config to have submodule autorebase [10:40:23] (03CR) 10jenkins-bot: [V: 04-1] restbase: make restbase configuration use application routes [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) (owner: 10Giuseppe Lavagetto) [10:40:26] so we can just git submodule update and the live patches are rebased all magically [10:40:26] <3 [10:40:35] RECOVERY - DPKG on logstash1001 is OK: All packages OK [10:41:07] hashar: hey, there is an interesting bug in ores extension which only happens in beta cluster :D https://phabricator.wikimedia.org/T129892 [10:41:25] I take a look at it today [10:41:56] Amir1: maybe it is a missing css ? [10:42:03] (03PS4) 10Giuseppe Lavagetto: restbase: make restbase configuration use application routes [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) [10:42:25] oh the mark 'r' is not even showing up ... 
bah [10:42:32] I don't think so, the CSS for ordinary edits and new pages are the same [10:42:51] so if it wass css no row would be higlighted [10:43:19] Amir1: how are RC entries flagged as damaging? Is that a field in the rcchange table ? [10:43:46] it's a hook [10:43:50] let me find the source code [10:44:02] after a class is added CSS take care of the rest [10:45:31] (03PS1) 10Gehel: nginx service should be stopped AND disabled when nginx is absent [puppet/nginx] - 10https://gerrit.wikimedia.org/r/277487 [10:45:44] https://github.com/wikimedia/mediawiki-extensions-ORES/blob/master/includes/Hooks.php#L188 [10:45:49] hashar: ^ [10:46:15] RECOVERY - DPKG on logstash1002 is OK: All packages OK [10:48:18] (03CR) 10Gehel: [C: 032] nginx service should be stopped AND disabled when nginx is absent [puppet/nginx] - 10https://gerrit.wikimedia.org/r/277487 (owner: 10Gehel) [10:48:50] (03CR) 10Nikerabbit: HHVM: Enable translation cache garbage-collection on canary app servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/277061 (https://phabricator.wikimedia.org/T277061) (owner: 10Ori.livneh) [10:50:16] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:50:45] RECOVERY - puppet last run on elastic2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:50:54] (03CR) 10Ori.livneh: "It should have been https://phabricator.wikimedia.org/T103886 . 
If you're following this, I submitted a stack trace upstream and PB said " [puppet] - 10https://gerrit.wikimedia.org/r/277061 (https://phabricator.wikimedia.org/T277061) (owner: 10Ori.livneh) [10:51:24] RECOVERY - puppet last run on elastic2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:53:16] Amir1: works for me [10:53:27] (03PS1) 10Gehel: updated nginx submodule to latest [puppet] - 10https://gerrit.wikimedia.org/r/277489 [10:53:50] yeah, also in in my local host in another instance I've put up in labs [10:54:08] (03PS1) 10Filippo Giunchedi: graphite: add 'big_users' route and cluster [puppet] - 10https://gerrit.wikimedia.org/r/277490 (https://phabricator.wikimedia.org/T85451) [10:54:08] *and [10:54:16] but not in beta [10:54:46] (03CR) 10Filippo Giunchedi: [C: 04-1] "blocked by https://phabricator.wikimedia.org/T126253" [puppet] - 10https://gerrit.wikimedia.org/r/277490 (https://phabricator.wikimedia.org/T85451) (owner: 10Filippo Giunchedi) [10:55:57] !log rolling reboot of mw* in codfw for kernel upgrade [10:56:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:56:24] (03CR) 10Gehel: [C: 032] updated nginx submodule to latest [puppet] - 10https://gerrit.wikimedia.org/r/277489 (owner: 10Gehel) [10:57:33] Amir1: I have pasted some thoughts at v [10:57:36] Amir1: https://phabricator.wikimedia.org/T129892#2122052 [10:57:42] thanks :) [10:57:55] Amir1: and maybe your user preference has some kind of cache in action [10:58:35] Amir1: so even logged in, your user preference would be fetched from cache. 
I am not sure beta features are properly enabled / invalidating the user pref caches when they are first deployed [10:58:45] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [10:59:12] (03CR) 10Mobrovac: [C: 031] realm: add $::app_routes hash [puppet] - 10https://gerrit.wikimedia.org/r/275443 (https://phabricator.wikimedia.org/T125673) (owner: 10Giuseppe Lavagetto) [10:59:18] hashar: that's intentional, We don't want show ores to people who hasn't enabled it as a beta feature [10:59:24] but when you enable it [10:59:40] on my account it just worked but I have beta feature auto enable [11:00:06] it doesn't work properly [11:00:15] http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:RecentChanges&hidenondamaging=1 [11:00:40] that link only shows me damaging edits [11:00:47] in this case it must highlights all rows but [11:01:04] it doesn't higlight edits that are making new pages [11:01:08] but the "New pages" entries do not :D [11:01:28] even though it understands to put them in damaging edits [11:01:44] so you should amend the task and better explain the issue maybe? [11:01:54] Amir1: at least the new page entry lacks the CSS class 'damaging' [11:01:58] yeah, I assumed too much [11:02:14] thanks for pointing out [11:02:17] a random entry has "mw-changeslist-ns0-Echo_test_page_44174601353330810529329695458487451239 mw-changeslist-line-not-watched mw-line-odd" [11:02:26] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [11:02:29] (that is on the entry
<li> element) [11:02:51] yeah and it should have the damaging class added per line 202 in here [11:02:52] another one has: "mw-changeslist-ns3-Selenium_user mw-changeslist-line-not-watched mw-line-even damaging" (last is "damaging" i.e. correct) [11:03:00] https://github.com/wikimedia/mediawiki-extensions-ORES/blob/master/includes/Hooks.php#L201 [11:03:04] sorry line 201 [11:03:32] and it adds it in my localhost and several other places [11:03:34] RECOVERY - DPKG on logstash1003 is OK: All packages OK [11:03:47] but not in beta [11:04:34] http://mw-revscoring.wmflabs.org/w/index.php?title=Special:RecentChanges&hidenondamaging=1&days=30 [11:05:01] hashar: ^ (you need to make an account and enable it in beta features, needless to say use a dummy password) [11:05:34] !log rebooting subra/suhail for kernel upgrade [11:05:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:05:43] (03PS3) 10Giuseppe Lavagetto: mobileapps: point to $rb_route, not to the local restbase cluster [puppet] - 10https://gerrit.wikimedia.org/r/275538 [11:05:45] (03PS3) 10Giuseppe Lavagetto: iegreview: use $parsoid_primary [puppet] - 10https://gerrit.wikimedia.org/r/275539 (https://phabricator.wikimedia.org/T125673) [11:07:07] 6Operations, 10hardware-requests: additional graphite machines request, 1x per DC - https://phabricator.wikimedia.org/T126253#2122103 (10mark) a:5mark>3None Approved. [11:07:21] Amir1: maybe the condition if ( strpos( $s, $separator ) === false ) { is not matched [11:08:28] I checked in the source code but nothing useful, the separator is hard-coded in mediawiki/core [11:08:48] but I check again [11:10:12] (03CR) 10Mobrovac: cxserver: use dc-aware urls (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/275537 (https://phabricator.wikimedia.org/T125065) (owner: 10Giuseppe Lavagetto) [11:10:36] 6Operations, 6Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Wikitechwiki has 4xx responses to requests for some static assets inc. 
poweredby_mediawiki_88x31.png and WikiEditor's button-sprite.svg - https://phabricator.wikimedia.org/T128747#2084612 (10Nikerabbit) Broken again (still?). GET https://wi... [11:14:24] (03PS4) 10Giuseppe Lavagetto: iegreview: use $::parsoid_site [puppet] - 10https://gerrit.wikimedia.org/r/275539 (https://phabricator.wikimedia.org/T125673) [11:14:26] (03PS2) 10Giuseppe Lavagetto: parsoid::testing: use master_dc variables [puppet] - 10https://gerrit.wikimedia.org/r/275814 (https://phabricator.wikimedia.org/T124670) [11:18:25] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:18:35] (03CR) 10Mobrovac: [C: 031] "Would be nice to also get rid of zotero::http_proxy" [puppet] - 10https://gerrit.wikimedia.org/r/277486 (https://phabricator.wikimedia.org/T125065) (owner: 10Giuseppe Lavagetto) [11:24:25] !log Updated Wikidata's property suggester with data from Monday's json dump [11:24:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:31:20] (03CR) 10Mobrovac: [C: 04-1] restbase: make restbase configuration use application routes (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) (owner: 10Giuseppe Lavagetto) [11:34:32] 6Operations, 6Discovery, 10Wikimedia-Logstash, 3Discovery-Search-Sprint, and 2 others: logstash - nginx failed service start - https://phabricator.wikimedia.org/T129934#2122226 (10Gehel) [11:35:16] 6Operations, 6Discovery, 10Wikimedia-Logstash, 3Discovery-Search-Sprint, and 2 others: logstash - nginx failed service start - https://phabricator.wikimedia.org/T129934#2120380 (10Gehel) I reworked the changes to puppet elasticsearch module. Last puppet runs are successful on logstash servers as well. 
[11:35:44] (03PS1) 10Ema: Varnish support for shutting users out of a DC/cluster [puppet] - 10https://gerrit.wikimedia.org/r/277491 (https://phabricator.wikimedia.org/T129424) [11:36:39] (03CR) 10Mobrovac: mobileapps: point to $rb_route, not to the local restbase cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/275538 (owner: 10Giuseppe Lavagetto) [11:39:30] (03CR) 10Mobrovac: [C: 031] "As a side note, we ought to port iegreview to RB" [puppet] - 10https://gerrit.wikimedia.org/r/275539 (https://phabricator.wikimedia.org/T125673) (owner: 10Giuseppe Lavagetto) [11:41:34] (03CR) 10Mobrovac: [C: 031] parsoid::testing: use master_dc variables [puppet] - 10https://gerrit.wikimedia.org/r/275814 (https://phabricator.wikimedia.org/T124670) (owner: 10Giuseppe Lavagetto) [11:42:09] (03CR) 10Alexandros Kosiaris: [C: 031] Add ferm rules for redis access on maps cluster [puppet] - 10https://gerrit.wikimedia.org/r/277197 (owner: 10Muehlenhoff) [11:42:49] (03CR) 10Alexandros Kosiaris: [C: 04-1] "wrong port number" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/277198 (owner: 10Muehlenhoff) [11:45:01] (03CR) 10Alexandros Kosiaris: [C: 032] Add ferm rules for DNS auth servers (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/277258 (owner: 10Muehlenhoff) [11:45:12] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Pedantic comment" [puppet] - 10https://gerrit.wikimedia.org/r/277258 (owner: 10Muehlenhoff) [11:52:03] (03PS1) 10Hoo man: Enable allowDataAccessInUserLanguage on beta commons and wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277493 [11:52:35] (03CR) 10Hoo man: [C: 032] "Beta only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277493 (owner: 10Hoo man) [11:52:46] (03CR) 10BBlack: [C: 031] Varnish support for shutting users out of a DC/cluster [puppet] - 10https://gerrit.wikimedia.org/r/277491 (https://phabricator.wikimedia.org/T129424) (owner: 10Ema) [11:53:10] (03Merged) 10jenkins-bot: Enable 
allowDataAccessInUserLanguage on beta commons and wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277493 (owner: 10Hoo man) [11:55:58] !log hoo@tin Synchronized wmf-config/: Consistency sync (duration: 02m 29s) [11:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:56:32] A few codfw hosts timed out and mw2020 apparently changed its identification (reimage?) [11:58:27] nice stuff [11:58:37] hoo: sorry, I'm rebooting mw* servers in codfw for a kernel security update, wasn't aware of your work, I only checked the https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0March.C2.A015 page [11:59:03] moritzm: Not important [11:59:11] I was just syncing beta changes [11:59:29] <_joe_> hoo: mw2020 or 2090? [11:59:33] 2020 [11:59:41] <_joe_> oh, yes [12:00:00] <_joe_> but that was a few days ago, I'm perplexed [12:00:02] <_joe_> let me check [12:00:28] hoo: ok, this will continue for at least 1-2 hrs [12:00:48] <_joe_> it seems to be unreachable via ssh [12:02:15] (03PS2) 10Ema: Varnish support for shutting users out of a DC/cluster [puppet] - 10https://gerrit.wikimedia.org/r/277491 (https://phabricator.wikimedia.org/T129424) [12:02:55] dig http://wikidata.beta.wmflabs.org @8.8.8.8 [12:02:58] :'D [12:03:01] (test it) [12:03:03] (03CR) 10Ema: [C: 032 V: 032] Varnish support for shutting users out of a DC/cluster [puppet] - 10https://gerrit.wikimedia.org/r/277491 (https://phabricator.wikimedia.org/T129424) (owner: 10Ema) [12:03:06] (03PS1) 10ArielGlenn: pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 [12:03:13] lol nice [12:05:04] (03PS1) 10Ladsgroup: Flake8 for ganglia [puppet] - 10https://gerrit.wikimedia.org/r/277498 [12:05:05] can someone punicode encode that? 
[12:05:43] aw forget that, won't work :( [12:06:22] (03PS2) 10ArielGlenn: pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 [12:07:09] hashar: would like to make flake8 voting for operations/puppet? [12:07:34] I started to make some patches https://gerrit.wikimedia.org/r/277498 [12:13:06] 6Operations, 10hardware-requests: additional graphite machines request, 1x per DC - https://phabricator.wikimedia.org/T126253#2122267 (10fgiunchedi) a:3RobH [12:13:10] (03CR) 10jenkins-bot: [V: 04-1] pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 (owner: 10ArielGlenn) [12:15:58] (03PS3) 10ArielGlenn: pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 [12:16:10] (03PS4) 10ArielGlenn: pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 [12:18:00] (03CR) 10jenkins-bot: [V: 04-1] pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 (owner: 10ArielGlenn) [12:18:42] (03CR) 10Giuseppe Lavagetto: cxserver: use dc-aware urls (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/275537 (https://phabricator.wikimedia.org/T125065) (owner: 10Giuseppe Lavagetto) [12:20:51] (03PS5) 10ArielGlenn: pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 [12:26:13] (03CR) 10ArielGlenn: [C: 032] pylint list-last-n-good-dumps.py first pass removing camelcase [puppet] - 10https://gerrit.wikimedia.org/r/277497 (owner: 10ArielGlenn) [12:26:41] (03CR) 10Giuseppe Lavagetto: restbase: make restbase configuration use application routes (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) (owner: 10Giuseppe Lavagetto) [12:27:54] (03PS5) 10Giuseppe Lavagetto: restbase: make restbase 
configuration use application routes [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) [12:35:52] (03PS1) 10ArielGlenn: pylint list-last-n-good-dumps.py most of the rest [puppet] - 10https://gerrit.wikimedia.org/r/277500 [12:37:25] (03CR) 10ArielGlenn: [C: 032] pylint list-last-n-good-dumps.py most of the rest [puppet] - 10https://gerrit.wikimedia.org/r/277500 (owner: 10ArielGlenn) [12:38:38] akosiaris: hey, can you check this? https://phabricator.wikimedia.org/T125562 please reassign it to someone else if you don't have time to process [12:46:09] (03PS1) 10ArielGlenn: pylint and pep8 for rsync-dumps.py [puppet] - 10https://gerrit.wikimedia.org/r/277501 [12:46:56] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail [12:47:34] (03CR) 10ArielGlenn: [C: 032] pylint and pep8 for rsync-dumps.py [puppet] - 10https://gerrit.wikimedia.org/r/277501 (owner: 10ArielGlenn) [12:53:34] 6Operations, 6Performance-Team, 7Availability, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Dig through logs from 15 Mar 2016 read-only test and file bugs - https://phabricator.wikimedia.org/T129973#2122343 (10Aklapper) Please mark any findings as blocking {T129968} [13:00:05] PROBLEM - puppet last run on mw2020 is CRITICAL: Connection refused by host [13:00:34] PROBLEM - configured eth on mw2020 is CRITICAL: Connection refused by host [13:00:34] PROBLEM - nutcracker port on mw2020 is CRITICAL: Connection refused by host [13:00:50] Amir1: I am actually working on that one [13:00:55] PROBLEM - Check size of conntrack table on mw2020 is CRITICAL: Connection refused by host [13:01:04] PROBLEM - dhclient process on mw2020 is CRITICAL: Connection refused by host [13:01:05] PROBLEM - RAID on mw2020 is CRITICAL: Connection refused by host [13:01:15] PROBLEM - Disk space on mw2020 is CRITICAL: Connection refused by host [13:01:15] PROBLEM - Apache HTTP on mw2020 is CRITICAL: Connection refused [13:01:25] PROBLEM - nutcracker process on mw2020 
is CRITICAL: Connection refused by host [13:01:26] PROBLEM - HHVM rendering on mw2020 is CRITICAL: Connection refused [13:01:46] PROBLEM - DPKG on mw2020 is CRITICAL: Connection refused by host [13:01:55] PROBLEM - salt-minion processes on mw2020 is CRITICAL: Connection refused by host [13:01:55] PROBLEM - HHVM processes on mw2020 is CRITICAL: Connection refused by host [13:02:59] (03PS2) 10Alexandros Kosiaris: lvs: remove port from ProxyFetch URL definitions [puppet] - 10https://gerrit.wikimedia.org/r/276756 [13:03:08] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] lvs: remove port from ProxyFetch URL definitions [puppet] - 10https://gerrit.wikimedia.org/r/276756 (owner: 10Alexandros Kosiaris) [13:08:10] thanks akosiaris :) [13:08:22] tell me if I can help in anything [13:09:20] twentyafterfour: Hi it seems phabricator is really slow for me. But other websites are loading faster for me so it doesn't look like it is my connection. [13:14:30] !log restarting elasticsearch server elastic2001.codfw.wmnet [13:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:15:04] PROBLEM - DPKG on iridium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:16:05] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:21:42] (03PS4) 10KartikMistry: Enable non-default MT for some languages [puppet] - 10https://gerrit.wikimedia.org/r/277463 (https://phabricator.wikimedia.org/T129849) [13:21:49] akosiaris: did you checked packages after I pushed tag(s)? [13:21:55] PROBLEM - Disk space on iridium is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied [13:23:45] RECOVERY - Disk space on iridium is OK: DISK OK [13:24:33] kart_: hmm I think not.. 
lemme do that now [13:25:40] kart: gbp:error: upstream/3.3.2_r63423 is not a valid treeish in https://gerrit.wikimedia.org/r/#/c/269115/ [13:27:00] (03CR) 10Alexandros Kosiaris: "Problems above still stands" [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/269115 (https://phabricator.wikimedia.org/T124137) (owner: 10KartikMistry) [13:27:25] ah, that means gbp is not recognizing tag? [13:27:55] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Puppet has 9 failures [13:28:03] (03CR) 10Alexandros Kosiaris: "configure:2579: error: Package requirements (apertium >= 3.4.0) were not met:" [debs/contenttranslation/apertium-dan] - 10https://gerrit.wikimedia.org/r/269912 (https://phabricator.wikimedia.org/T124137) (owner: 10KartikMistry) [13:28:34] (03CR) 10Alexandros Kosiaris: "configure:2579: error: Package requirements (apertium >= 3.4.0) were not met:" [debs/contenttranslation/apertium-nob] - 10https://gerrit.wikimedia.org/r/269914 (https://phabricator.wikimedia.org/T124317) (owner: 10KartikMistry) [13:29:35] (03CR) 10Alexandros Kosiaris: "configure:2579: error: Package requirements (apertium >= 3.4.0) were not met:" [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/269916 (https://phabricator.wikimedia.org/T124137) (owner: 10KartikMistry) [13:30:07] (03CR) 10Alexandros Kosiaris: "configure:2579: error: Package requirements (apertium >= 3.4.0) were not met:" [debs/contenttranslation/apertium-nno] - 10https://gerrit.wikimedia.org/r/269915 (https://phabricator.wikimedia.org/T124137) (owner: 10KartikMistry) [13:30:29] (03CR) 10Alexandros Kosiaris: "Above issue still stands" [debs/contenttranslation/giella-core] - 10https://gerrit.wikimedia.org/r/270671 (https://phabricator.wikimedia.org/T120087) (owner: 10KartikMistry) [13:30:54] kart_: I 've added comments on all of them [13:33:19] kart_: it probably means you pushed a different tag [13:33:46] I see on lttoolbox upstream/3.3.0.56152 while upstream/3.3.2_r63423 is 
requested [13:34:04] or not pushed a new tag at all [13:34:38] a freshly cloned giella-core has no tag at all [13:42:04] akosiaris: will fix [13:42:17] akosiaris: we need apertium 3.4.0 now, will also push it. [13:46:57] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2122384 (10Gehel) HTTPS is now active on all elasticsearch servers on port 9243. Still to do (non exhaustive list): * re-generate SSL cert... [13:47:45] PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - apaches_80 - Could not depool server mw2114.codfw.wmnet because of too many down! [13:49:35] RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy [13:51:04] <_joe_> what's that ^? [13:51:08] <_joe_> moritzm: ? [13:51:16] RECOVERY - DPKG on iridium is OK: All packages OK [13:52:58] I am unsure as well... transient ? [13:53:08] _joe_: I had rebooted 2100-2119 together, reducing my batches to ten now [13:53:16] ah, ok [13:53:21] lol [13:53:42] although a batch of 20 worked twice before, dunno why it flagged this time [13:55:05] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:00:17] 6Operations, 10ops-codfw: mw2066 to mw2074 don't reboot cleanly - https://phabricator.wikimedia.org/T130008#2122402 (10MoritzMuehlenhoff) [14:00:27] (03CR) 10Mobrovac: restbase: make restbase configuration use application routes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) (owner: 10Giuseppe Lavagetto) [14:15:10] !log restarting elasticsearch server elastic2002.codfw.wmnet [14:15:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:18:56] 6Operations, 10media-storage: [tracking] refresh swift hardware in codfw/eqiad - https://phabricator.wikimedia.org/T130012#2122471 
(10fgiunchedi) [14:19:28] 6Operations, 10media-storage: add ms-be1019 / 1020 / 1021 to swift - https://phabricator.wikimedia.org/T118183#2122484 (10fgiunchedi) 5Open>3Resolved machines are in service, for weight / rack-zone allocation / etc see {T130012} [14:20:48] (03PS1) 10Hoo man: Enable arbitrary access on beta commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277510 [14:21:28] (03CR) 10Hoo man: [C: 032] "beta only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277510 (owner: 10Hoo man) [14:21:58] (03Merged) 10jenkins-bot: Enable arbitrary access on beta commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277510 (owner: 10Hoo man) [14:23:07] !log hoo@tin Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 36s) [14:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:25:34] PROBLEM - Host mw2147 is DOWN: PING CRITICAL - Packet loss = 100% [14:26:25] PROBLEM - Host mw2145 is DOWN: PING CRITICAL - Packet loss = 100% [14:26:36] PROBLEM - HHVM rendering on mw2144 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.161 second response time [14:26:44] RECOVERY - Host mw2147 is UP: PING OK - Packet loss = 0%, RTA = 36.52 ms [14:26:49] ^ extended downtime [14:26:54] RECOVERY - Host mw2145 is UP: PING OK - Packet loss = 0%, RTA = 36.10 ms [14:28:34] RECOVERY - HHVM rendering on mw2144 is OK: HTTP OK: HTTP/1.1 200 OK - 68188 bytes in 3.509 second response time [14:30:23] 6Operations, 10Traffic, 10Wikimedia-IRC-RC-Server, 7HTTPS, and 2 others: Stop rewriting URLs to unencrypted HTTP in the IRC feed - https://phabricator.wikimedia.org/T122933#2122521 (10faidon) a:5faidon>3None [14:30:25] RECOVERY - Host ms-fe1004 is UP: PING WARNING - Packet loss = 80%, RTA = 0.71 ms [14:30:49] oh hi ms-fe1004 [14:31:07] godog: all fixed [14:31:15] cmjohnson1: sweet, thanks! what was it? 
[14:31:30] the NIC card had to be reseated [14:32:38] 6Operations, 10ops-eqiad: ms-fe1004 off the network - https://phabricator.wikimedia.org/T129896#2122527 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson Verified that the sfp's and fiber were okay. Re-seated NIC card and link was re-established. [14:35:43] bah [14:40:28] (03CR) 10Subramanya Sastry: [C: 031] parsoid::testing: use master_dc variables [puppet] - 10https://gerrit.wikimedia.org/r/275814 (https://phabricator.wikimedia.org/T124670) (owner: 10Giuseppe Lavagetto) [14:43:00] <_joe_> subbu: thanks :) [14:43:10] 6Operations, 10Traffic, 10Wikimedia-IRC-RC-Server, 7HTTPS, and 2 others: Stop rewriting URLs to unencrypted HTTP in the IRC feed - https://phabricator.wikimedia.org/T122933#1916986 (10hashar) Note the IRC feed URLs have been set explicitly to HTTP because of {T31925}. Using `https://` broke patrolling bot... [14:43:34] yw [14:43:51] (03CR) 10Hashar: "That will surely break a few patrolling bots which is the reason the URLs are hardcoded to http T31925 . I commented a bit more on T12293" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/217858 (https://phabricator.wikimedia.org/T122933) (owner: 10Faidon Liambotis) [14:44:04] (03PS3) 10Alexandros Kosiaris: lvs: normalize ProxyFetch URL configuration [puppet] - 10https://gerrit.wikimedia.org/r/276739 [14:46:12] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "Discussions on IRC with bblack, pointed out that we do lose some data in this configuration change. 
At the same time, that data is not cle" [puppet] - 10https://gerrit.wikimedia.org/r/276739 (owner: 10Alexandros Kosiaris) [14:46:49] (03PS7) 10Giuseppe Lavagetto: realm: add $::app_routes hash [puppet] - 10https://gerrit.wikimedia.org/r/275443 (https://phabricator.wikimedia.org/T125673) [14:49:43] (03CR) 10Giuseppe Lavagetto: [C: 032] realm: add $::app_routes hash [puppet] - 10https://gerrit.wikimedia.org/r/275443 (https://phabricator.wikimedia.org/T125673) (owner: 10Giuseppe Lavagetto) [14:52:22] (03PS4) 10Giuseppe Lavagetto: cxserver: use dc-aware urls [puppet] - 10https://gerrit.wikimedia.org/r/275537 (https://phabricator.wikimedia.org/T125065) [14:53:04] (03PS2) 10Thcipriani: Pass deploy user from service::node [puppet] - 10https://gerrit.wikimedia.org/r/277423 [14:53:34] 6Operations, 10ops-codfw: mw2066 to mw2074 don't reboot cleanly - https://phabricator.wikimedia.org/T130008#2122576 (10Papaul) Those are all servers related to T125088 [14:57:35] (03CR) 10Giuseppe Lavagetto: [C: 032] cxserver: use dc-aware urls [puppet] - 10https://gerrit.wikimedia.org/r/275537 (https://phabricator.wikimedia.org/T125065) (owner: 10Giuseppe Lavagetto) [14:57:40] !log restarting elasticsearch server elastic2003.codfw.wmnet [14:57:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:59:23] Hi. [15:00:04] anomie ostriches thcipriani marktraceur Krenair aude: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160315T1500). Please do the needful. [15:00:04] James_F csteipp Dereckson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:14] * csteipp is here! [15:00:45] Who's swatting/ [15:00:46] Howdy howdy, let's do this :) [15:00:52] * ostriches can do it this morn' [15:01:39] Chris was first to respond, lets do him :) [15:01:41] I want to make pep8 voting for operations/puppet. 
Anyone has some time reviewing this? https://gerrit.wikimedia.org/r/#/c/277498/ [15:01:48] I'll stop rebooting mw* servers until the morning swat is done [15:02:00] Amir1: the ganglia are libs [15:02:00] Amir1: Put it on puppetswat :) [15:02:09] <_joe_> ostriches: mw2020 is being reimaged right now [15:02:09] Amir1: Might want to poke mutante|away [15:02:13] I've got a few other things for pep8/flake8 validation up too [15:02:48] thanks [15:02:49] <_joe_> ostriches: that patch definitely won't go through puppetswat [15:03:08] Ok nvm then :) [15:03:14] Amir1: there is also https://gerrit.wikimedia.org/r/#/c/244148/ and jayvdb fixed a lot more on https://gerrit.wikimedia.org/r/#/c/263866/ [15:03:14] <_joe_> and I think there have been preceding -1s on "pep8 only" patches for the ganglia dir [15:03:15] (03CR) 10Chad: [C: 032] Password policies for advanced permission groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/272660 (https://phabricator.wikimedia.org/T119100) (owner: 10CSteipp) [15:03:25] Hey. [15:03:31] <_joe_> I think we should instead skip checking those files [15:03:36] (03CR) 10Gehel: [C: 031] pep8: don't use a lambda in check_legal_html.py [puppet] - 10https://gerrit.wikimedia.org/r/267149 (owner: 10Chad) [15:03:52] \o/ [15:03:53] <_joe_> they're from upstream, right? [15:04:04] (03Merged) 10jenkins-bot: Password policies for advanced permission groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/272660 (https://phabricator.wikimedia.org/T119100) (owner: 10CSteipp) [15:04:20] * James_F bates with waited breath. [15:04:21] (03CR) 10Hashar: [C: 04-1] "The ganglia plugins are imported from 3rd parties, so there is little sense to fix pep8 issues in them. We should just ignore them. That " [puppet] - 10https://gerrit.wikimedia.org/r/277498 (owner: 10Ladsgroup) [15:04:23] * aude is happy to do swat tomorrow or once in a while (though not every day...) 
[15:04:28] (03CR) 10Gehel: [C: 031] pep8 fixes all over the place [puppet] - 10https://gerrit.wikimedia.org/r/267150 (owner: 10Chad) [15:04:37] <_joe_> hashar: heh :) [15:05:02] udplog2socket and diskstat look like they could go through [15:05:04] that is all yet another side task I am not pushing forward :( [15:05:13] (03CR) 10Gehel: [C: 031] phab_epipe.py: don't use lambda when it's not needed [puppet] - 10https://gerrit.wikimedia.org/r/275855 (owner: 10Chad) [15:05:21] the third one I would not want to do on a puppet swat [15:05:25] !log demon@tin Synchronized wmf-config/CommonSettings.php: password policy stuff (duration: 00m 32s) [15:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:05:30] csteipp: ^^^^^^^ [15:06:12] (03CR) 10Gehel: [C: 031] demux.py: don't import os, unused [puppet] - 10https://gerrit.wikimedia.org/r/276533 (owner: 10Chad) [15:06:14] James_F: Copy+paste fail? https://gerrit.wikimedia.org/r/#/c/271712/ was merged last week [15:06:29] (03CR) 10Chad: [C: 032] Enable VisualEditor Single Edit Tab on the Polish Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274130 (https://phabricator.wikimedia.org/T128477) (owner: 10Jforrester) [15:06:29] ostriches: Thanks! [15:06:47] (03CR) 10Dereckson: "The changes look good to me, the removed imports weren't used or already declared, the indent seems coherent." 
[puppet] - 10https://gerrit.wikimedia.org/r/277498 (owner: 10Ladsgroup) [15:06:56] ostriches: Oh, oops, that should be https://gerrit.wikimedia.org/r/#/c/271713/ [15:07:03] (03Merged) 10jenkins-bot: Enable VisualEditor Single Edit Tab on the Polish Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274130 (https://phabricator.wikimedia.org/T128477) (owner: 10Jforrester) [15:07:04] Figured [15:07:20] (03CR) 10Gehel: [C: 031] Gerrit: Make git directory location configurable so we can move it [puppet] - 10https://gerrit.wikimedia.org/r/276764 (owner: 10Chad) [15:07:50] ostriches: Maybe in different syncs though. [15:07:56] (03CR) 10Chad: [C: 032] Enable VisualEditor for IP users on the German Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/271713 (https://phabricator.wikimedia.org/T127881) (owner: 10Jforrester) [15:08:06] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: VE single edit table plwiki (duration: 00m 29s) [15:08:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:08:11] James_F: Yep. Just put plwiki one live. ^^^ [15:08:16] Gonna do the dewiki one now [15:08:16] Yay. [15:08:18] * James_F tests. 
[15:08:43] (03Merged) 10jenkins-bot: Enable VisualEditor for IP users on the German Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/271713 (https://phabricator.wikimedia.org/T127881) (owner: 10Jforrester) [15:09:45] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: VE for ip users on dewiki (duration: 00m 28s) [15:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:09:50] 6Operations, 10ops-codfw: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2122605 (10Papaul) Network ports: both systems are racked in Row B rack B8 on asw-b8-codfw labstore2003 ge-8/0/9 labstore2004 ge-8/0/1 [15:09:51] (03CR) 10Chad: [C: 032] Taller d'iniciació a la Viquipèdia, Montserrat throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276900 (https://phabricator.wikimedia.org/T129490) (owner: 10Dereckson) [15:10:02] James_F: and dewiki change live now too [15:10:10] 6Operations, 10ops-codfw: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2122618 (10Papaul) [15:10:20] (03CR) 10Paladox: [C: 031] Gerrit: Make git directory location configurable so we can move it [puppet] - 10https://gerrit.wikimedia.org/r/276764 (owner: 10Chad) [15:10:25] No Dereckson? [15:10:32] (03CR) 10Dereckson: "According https://github.com/ganglia/monitor-core/blob/master/gmond/python_modules/disk/diskstat.py some of the issues have already been f" [puppet] - 10https://gerrit.wikimedia.org/r/277498 (owner: 10Ladsgroup) [15:10:32] ostriches: Yup, looks good. 
[15:10:37] (03Merged) 10jenkins-bot: Taller d'iniciació a la Viquipèdia, Montserrat throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276900 (https://phabricator.wikimedia.org/T129490) (owner: 10Dereckson) [15:11:26] PROBLEM - HHVM jobrunner on mw1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:11:35] !log demon@tin Synchronized wmf-config/throttle.php: Taller d'iniciació a la Viquipèdia, Montserrat throttle rule (duration: 00m 27s) [15:11:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:11:58] Throttle I'll do, but I don't wanna make the guwki* changes without someone who's watchin it [15:12:27] k [15:13:36] !log rebooting californium for kernel upgrade [15:13:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:13:48] is there any phab task, etc. to dicuss bets approach on enabling flake8 in puppet? [15:13:56] *best [15:14:54] PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors [15:15:46] ostriches: what watching do you suggest? [15:16:07] Oh snap, I didn't see you here! [15:16:16] Tab complete failed [15:16:26] Let's merge then, I just wanted someone to own it :) [15:16:27] Oh I were in stealth mode, I said 14:59:23 < Dereckson> Hi. before the jouncebot annoucement. 
[15:16:48] (03CR) 10Chad: [C: 032] Config changes for gu.wikiquote.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263614 (https://phabricator.wikimedia.org/T121853) (owner: 10Mdann52) [15:17:06] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 65.52% of data above the critical threshold [5000000.0] [15:17:25] (03Merged) 10jenkins-bot: Config changes for gu.wikiquote.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263614 (https://phabricator.wikimedia.org/T121853) (owner: 10Mdann52) [15:17:27] (03CR) 10Paladox: "@Chad gerrit 2.12.2 is available which I think fixes the bug described above." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/263631 (owner: 10Chad) [15:19:04] (03PS1) 10Papaul: Adding prodcution DNS for labstore2004 Bug:T128764 [dns] - 10https://gerrit.wikimedia.org/r/277516 (https://phabricator.wikimedia.org/T128764) [15:19:24] RECOVERY - Apache HTTP on mw2020 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.077 second response time [15:20:01] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: guwikiquote namespace and import stuff (duration: 00m 27s) [15:20:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:20:07] Testing. [15:21:17] Thx :) [15:22:08] ostriches: looks good to me. Timezone, namespaces tested. I'll ask original task reporter to test import. [15:23:10] (03PS2) 10Giuseppe Lavagetto: service::configuration: simplify hiera defs [puppet] - 10https://gerrit.wikimedia.org/r/277486 (https://phabricator.wikimedia.org/T125065) [15:23:29] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] service::configuration: simplify hiera defs [puppet] - 10https://gerrit.wikimedia.org/r/277486 (https://phabricator.wikimedia.org/T125065) (owner: 10Giuseppe Lavagetto) [15:24:33] Dereckson: Thx! [15:24:54] Ok, that's all of swat this morning. Tune in later today for more swat games and prizes. 
[15:26:20] (03PS6) 10Giuseppe Lavagetto: restbase: make restbase configuration use application routes [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) [15:26:57] !log rebooting nobelium for kernel upgrade [15:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:31:02] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: The refreshLinks jobs enqueue rate is 10 times the normal rate - https://phabricator.wikimedia.org/T129517#2122734 (10ArielGlenn) I am having a look at the logs. I found a few wikis that show up at the top of the list these last few days but not earlier,... [15:31:46] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [15:32:55] RECOVERY - Check size of conntrack table on mw2020 is OK: OK: nf_conntrack is 0 % full [15:33:04] (03PS1) 10Nikerabbit: Set valid content language for Norwegian wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277519 (https://phabricator.wikimedia.org/T126146) [15:33:06] RECOVERY - Disk space on mw2020 is OK: DISK OK [15:33:24] RECOVERY - HHVM processes on mw2020 is OK: PROCS OK: 6 processes with command name hhvm [15:33:44] RECOVERY - nutcracker port on mw2020 is OK: TCP OK - 0.000 second response time on port 11212 [15:33:44] RECOVERY - configured eth on mw2020 is OK: OK - interfaces up [15:33:46] RECOVERY - RAID on mw2020 is OK: OK: no RAID installed [15:34:05] RECOVERY - dhclient process on mw2020 is OK: PROCS OK: 0 processes with command name dhclient [15:34:25] RECOVERY - nutcracker process on mw2020 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [15:34:26] RECOVERY - DPKG on mw2020 is OK: All packages OK [15:34:54] RECOVERY - salt-minion processes on mw2020 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:35:56] (03PS5) 10KartikMistry: Enable non-default MT for some languages [puppet] - 
10https://gerrit.wikimedia.org/r/277463 (https://phabricator.wikimedia.org/T129849) [15:36:24] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct [15:36:25] PROBLEM - puppet last run on mw2020 is CRITICAL: CRITICAL: Puppet has 7 failures [15:37:10] (03CR) 10Ema: Port varnishlog to new VSL API (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [15:37:24] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: The refreshLinks jobs enqueue rate is 10 times the normal rate - https://phabricator.wikimedia.org/T129517#2122763 (10ArielGlenn) Also I note here that Aaron's patch https://gerrit.wikimedia.org/r/#/c/277353/ is pending for merge.... [15:38:20] !log restarted pybal on lvs1007, lvs1008, lvs1009. Had already restarted pybal on lvs1010, lvs1011, lvs1012 about an hour before [15:38:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:38:31] (03PS12) 10Ema: Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) [15:38:55] RECOVERY - HHVM rendering on mw2020 is OK: HTTP OK: HTTP/1.1 200 OK - 68211 bytes in 6.869 second response time [15:40:04] RECOVERY - puppet last run on mw2020 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:42:34] 6Operations, 10ops-codfw: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2122807 (10Papaul) chat with @RobH on IRC, according to him, those systems are setup with a HW RAID10 with no raid partman recipe. [15:44:08] 6Operations, 10ops-codfw: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2122809 (10RobH) They should be, as they are R510s with 12 disks and a perc raid controller. They may be in an invalid configuration, and the raid bios should be checked. 
[15:48:03] 6Operations, 10ops-codfw: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2122812 (10RobH) >>! In T128764#2122605, @Papaul wrote: > Network ports: both systems are racked in Row B rack B8 on asw-b8-codfw > > labstore2003 ge-8/0/9 > labstore2004 ge-8/0/1 port descript... [15:50:16] 6Operations, 10ops-codfw, 6Labs: Figure out what labstore hardware is viable in codfw - https://phabricator.wikimedia.org/T128083#2063474 (10RobH) Interface Admin Link Description ge-1/0/0 up up labstore2001 ge-1/0/1 up down labstore2002 So labstore2002 is enabled and on the rig... [15:50:37] !log rebooting analytics1027 for kernel upgrade [15:50:38] 6Operations, 10Analytics-Cluster, 10hardware-requests: eqiad: New Hive / Oozie server node in eqiad Analytics VLAN - https://phabricator.wikimedia.org/T124945#2122817 (10Ottomata) Bump, @Robh, could we use WMF4541 instead of waiting? [15:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:53:34] !log disabled puppet on analytics1027 for camus stop and reboot to apply kernel update [15:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:53:38] (03CR) 10Giuseppe Lavagetto: [C: 032] restbase: make restbase configuration use application routes [puppet] - 10https://gerrit.wikimedia.org/r/275536 (https://phabricator.wikimedia.org/T126235) (owner: 10Giuseppe Lavagetto) [15:53:56] (03CR) 10Paladox: "2.12.1 fixes the migration from 2.11. See https://gerrit-documentation.storage.googleapis.com/ReleaseNotes/ReleaseNotes-2.12.1.html" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/263631 (owner: 10Chad) [15:55:24] <_joe_> mobrovac: ^^ I'm applying the change on rb2001 now [15:55:39] kk _joe_ [15:56:06] (03CR) 10Chad: "I know." 
[debs/gerrit] - 10https://gerrit.wikimedia.org/r/263631 (owner: 10Chad) [15:58:02] <_joe_> mobrovac: so graphoid isn't working properly in codfw [15:58:15] <_joe_> I just found out by running service checker on restbase [15:58:18] *sigh* [15:58:20] <_joe_> I correct myself [15:58:28] <_joe_> it's probably not set up correctly [15:58:36] * mobrovac going to check scb2001 [15:59:26] (03CR) 10Chad: [C: 04-1] "Err, maybe this doesn't work how I thought :\" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273004 (owner: 10Chad) [15:59:33] <_joe_> yup, I can't connect to the svc ip/port [15:59:53] ah _joe_, perhaps akosiaris needs to finish the LVS work? [15:59:53] _joe_: the LVS IP you mean ? [15:59:56] (03PS5) 10Alexandros Kosiaris: lvs: SC[AB] services lvs configuration [puppet] - 10https://gerrit.wikimedia.org/r/276199 (https://phabricator.wikimedia.org/T129234) [15:59:58] (03PS1) 10Alexandros Kosiaris: lvs: Add apertium.svc.{eqiad,codfw}.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/277525 [15:59:58] <_joe_> akosiaris: yes [15:59:59] hahaha [16:00:02] <_joe_> ahahahahahah [16:00:04] it's not advertised yet [16:00:04] _joe_ gehel: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160315T1600). Please do the needful. [16:00:04] mdholloway urandom ostriches kart_ mobrovac: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:06] <_joe_> fuck :P [16:00:11] * ostriches waves [16:00:17] I am fixing it as we speak more or less [16:00:17] <_joe_> mobrovac: I'd say we akc the alerts as they arrive [16:00:20] howdy [16:00:23] <_joe_> ostriches: is SWAT done? 
[16:00:31] Yeah, finished about 30m ago [16:00:40] _joe_: yup, makes sense [16:00:46] <_joe_> it's 9 patches [16:00:54] ah [16:00:54] <_joe_> I'm unsure we'll get to the bottom of it [16:00:55] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:00:59] and mine is the last one [16:01:13] mobrovac: I'm on 8th :) [16:01:15] * urandom looks at ^^^ [16:01:19] <_joe_> urandom: nope [16:01:21] :P [16:01:22] <_joe_> it's known [16:01:24] oh/ [16:01:26] ? [16:01:29] wassup? [16:01:30] <_joe_> urandom: that's ok it's akosiaris' fault :P [16:01:38] <_joe_> urandom: read backscroll :) [16:01:43] we break things so we can fix'em urandom :) [16:01:44] <_joe_> mdholloway: you're up [16:01:49] 6Operations, 6Discovery, 10hardware-requests: Refresh elastic10{01..16}.eqiad.wmnet servers - https://phabricator.wikimedia.org/T128000#2122841 (10EBernhardson) [16:01:58] _joe_: ok [16:02:38] mobrovac: job security? [16:02:41] <_joe_> any other restbase-related config changes? [16:03:08] (03PS2) 10Alexandros Kosiaris: lvs: Add apertium.svc.{eqiad,codfw}.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/277525 [16:03:15] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] lvs: Add apertium.svc.{eqiad,codfw}.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/277525 (owner: 10Alexandros Kosiaris) [16:03:31] (03PS2) 10Giuseppe Lavagetto: Add Accept: header to RESTBase/Parsoid requests [puppet] - 10https://gerrit.wikimedia.org/r/275853 (https://phabricator.wikimedia.org/T128237) (owner: 10Mholloway) [16:03:48] _joe_: nothing else from me [16:03:59] 6Operations, 6Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Wikitechwiki has 4xx responses to requests for some static assets inc. 
poweredby_mediawiki_88x31.png and WikiEditor's button-sprite.svg - https://phabricator.wikimedia.org/T128747#2122852 (10Krenair) WFM too [16:04:01] _joe_: i have one, but it's not on the swat list, so we can leave it for another day [16:04:06] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Add Accept: header to RESTBase/Parsoid requests [puppet] - 10https://gerrit.wikimedia.org/r/275853 (https://phabricator.wikimedia.org/T128237) (owner: 10Mholloway) [16:04:34] <_joe_> mobrovac: I'm not restarting restbase in codfw [16:04:38] <_joe_> fyi [16:04:39] urandom: yup :) gotta make yourself indispensable [16:04:43] kk _joe_, better [16:06:19] <_joe_> mdholloway: running on scb1002, I'll tell you if I see immediate issues [16:07:00] <_joe_> ok all seems ok after the deploy [16:07:10] _joe_: excellent, thanks! [16:07:11] (03PS1) 10Papaul: DHCP: adding labstore200[3-4] MAC addrees Bug: T128764 [puppet] - 10https://gerrit.wikimedia.org/r/277527 (https://phabricator.wikimedia.org/T128764) [16:07:22] bblack, Krinkle, is esi.wmflabs.org currently in use? I’d like to delete it or move it to esi.varnish.wmflabs.org [16:07:50] !log restarted pybal on lvs2004, lvs2005, lvs2006 [16:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:07:54] <_joe_> urandom: your patch is not "puppet" [16:08:12] <_joe_> mdholloway: your patch is done, please test and tell me if it's ok [16:08:21] _joe_: great, looking now [16:08:23] !log restarted pybal on lvs1004, lvs1005, lvs1006 to pickup apertium change [16:08:26] _joe_: it's not? 
[16:08:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:08:34] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] dependencies needed for logstash filtering [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/277264 (https://phabricator.wikimedia.org/T128787) (owner: 10Eevans) [16:08:46] <_joe_> operations/software/logstash-logback-encoder [16:08:46] _joe_: oh, right [16:08:58] <_joe_> I mean, should I do anything else? :P [16:09:16] _joe_: nope [16:10:39] <_joe_> ostriches: so, I'm looking at https://gerrit.wikimedia.org/r/#/c/267149/, it seems ok [16:10:52] mobrovac: can you verify that RESTBase is now receiving Accept: headers from mobileapps? [16:11:04] godog, ori, Krinkle, is one of you the mind behind grafana.wmflabs.org? I’m wondering if it can be behind a web proxy rather than using a public IP. [16:11:14] (03PS2) 10Giuseppe Lavagetto: pep8: don't use a lambda in check_legal_html.py [puppet] - 10https://gerrit.wikimedia.org/r/267149 (owner: 10Chad) [16:11:14] _joe_: thank you sir [16:11:23] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] pep8: don't use a lambda in check_legal_html.py [puppet] - 10https://gerrit.wikimedia.org/r/267149 (owner: 10Chad) [16:11:26] (03PS1) 10Catrope: Enable Flow beta feature on plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277529 (https://phabricator.wikimedia.org/T130009) [16:11:46] mdholloway: shold be ok, i see no complaints in the logs which is always a good thing [16:12:22] (03PS2) 10Giuseppe Lavagetto: pep8 fixes all over the place [puppet] - 10https://gerrit.wikimedia.org/r/267150 (owner: 10Chad) [16:12:54] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] pep8 fixes all over the place [puppet] - 10https://gerrit.wikimedia.org/r/267150 (owner: 10Chad) [16:13:05] 6Operations, 10Phabricator, 6Project-Admins, 6Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#2122884 (10mcruzWMF) Hi, me and 
@Ocaasi would like to become project managers. We are working together on the Wikimedia Resource Center, a si... [16:13:09] (03CR) 10Alex Monk: "yes, that's a known side effect of the change, but it needs to be done at some point. we can't support legacy stuff forever, and bots with" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/217858 (https://phabricator.wikimedia.org/T122933) (owner: 10Faidon Liambotis) [16:13:50] (03PS3) 10Giuseppe Lavagetto: phab_epipe.py: don't use lambda when it's not needed [puppet] - 10https://gerrit.wikimedia.org/r/275855 (owner: 10Chad) [16:14:06] <_joe_> ostriches: this phab_epipe was written by us? [16:14:18] <_joe_> yes, chase :) [16:14:28] mobrovac: just verified on https://logstash.wikimedia.org/#/dashboard/elasticsearch/restbase that accept headers are coming through. i think we're good. thanks again, _joe_! [16:14:35] _joe_: Yeah I think that's ours :) [16:15:29] (03CR) 10Giuseppe Lavagetto: [C: 032] phab_epipe.py: don't use lambda when it's not needed [puppet] - 10https://gerrit.wikimedia.org/r/275855 (owner: 10Chad) [16:15:39] 6Operations, 10Phabricator, 6Project-Admins, 6Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#2122904 (10Aklapper) @mcruzWMF, @Ocaasi: Are you planning to regularly create projects in Phabricator or change the settings of those project... [16:15:49] andrewbogott: not me [16:16:26] andrewbogott: I don't think we need it anymore though. 
For actual labs-y metrics we can use prod grafana and load things client-side from labsmon [16:16:32] that's how we do it currently anyway [16:16:32] (03PS2) 10Giuseppe Lavagetto: demux.py: don't import os, unused [puppet] - 10https://gerrit.wikimedia.org/r/276533 (owner: 10Chad) [16:16:41] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] demux.py: don't import os, unused [puppet] - 10https://gerrit.wikimedia.org/r/276533 (owner: 10Chad) [16:16:54] <_joe_> ostriches: I'm going to merge this lot now [16:16:56] !log restarting elasticsearch server elastic2004.codfw.wmnet [16:16:58] andrewbogott: but don't take my word for it, maybe someone else started using it [16:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:17:13] Krinkle: it sure looks like it could go behind a proxy [16:17:19] but I’ll wait a bit for ori to chime in [16:18:58] (03PS2) 10Giuseppe Lavagetto: Gerrit: Make git directory location configurable so we can move it [puppet] - 10https://gerrit.wikimedia.org/r/276764 (owner: 10Chad) [16:19:27] (03PS13) 10Ema: Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) [16:20:05] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Gerrit: Make git directory location configurable so we can move it [puppet] - 10https://gerrit.wikimedia.org/r/276764 (owner: 10Chad) [16:20:11] 6Operations, 10Phabricator, 6Project-Admins, 6Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#2122918 (10mcruzWMF) >>! In T706#2122904, @Aklapper wrote: > @mcruzWMF, @Ocaasi: Are you planning to regularly create projects in Phabricator... [16:20:12] (03CR) 10Alex Monk: "is labtest set up to be able to test this?" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/277456 (owner: 10Andrew Bogott) [16:21:56] (03CR) 10Ema: "pep8 and license fixes merged upstream:" [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [16:21:58] _joe_: Lemme know when that last one's on puppetmaster so I can kick puppet on ytterbium. Should be a no-op but I wanna make sure I don't bring down gerrit :D [16:22:12] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "Adding 2K lines to the hiera data there? As I said time and time before, this is _not_ the place for this." [puppet] - 10https://gerrit.wikimedia.org/r/277463 (https://phabricator.wikimedia.org/T129849) (owner: 10KartikMistry) [16:22:48] <_joe_> ostriches: I'm doing it already [16:22:53] Oh okies [16:23:28] _joe_: those defaults will go away soon next week. [16:23:28] <_joe_> noop [16:23:46] <_joe_> kart_: so we can wait next week, I guess; I'm not removing my -2 [16:23:56] _joe_: gerrit looks happy to me. thx [16:24:10] <_joe_> next week we have a code freeze, so I guess that won't happen [16:24:13] _joe_: We have planned deployment today :/ [16:24:15] (03PS1) 10Alexandros Kosiaris: Add apertium LVS IP on LVS servers [puppet] - 10https://gerrit.wikimedia.org/r/277530 [16:24:17] (fwiw: this is in prep of being able to move to a more sane partition scheme on the new box) [16:24:37] <_joe_> kart_: you can only blame yourself for this. I told you to fix this way of configuring your software 6 months ago or so. [16:24:50] <_joe_> but you can see if someone in ops doesn't agree with me [16:25:01] <_joe_> I am _not_ merging this, period. [16:27:05] !log reenabled puppet on analytics1027 [16:27:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:27:19] <_joe_> mobrovac: what should I do for your change? 
[16:27:21] (03PS14) 10Ema: Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) [16:27:26] <_joe_> apart from merging it I mean [16:27:32] !log osmium - stopping rsyncd, removing remnants from backup job for ruthenium upgrade T122328 [16:27:33] T122328: Update ruthenium to Debian jessie from Ubuntu 12.04 - https://phabricator.wikimedia.org/T122328 [16:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:27:42] (03PS3) 10Giuseppe Lavagetto: Mathoid: enable PNG generation [puppet] - 10https://gerrit.wikimedia.org/r/276734 (https://phabricator.wikimedia.org/T71702) (owner: 10Mobrovac) [16:27:44] (03CR) 10Ema: [C: 032 V: 032] Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [16:27:50] _joe_: as soon as puppet runs, it'll restart mathoid [16:28:02] <_joe_> mobrovac: ok so just verify it [16:28:04] <_joe_> cool [16:28:08] yup [16:28:12] <_joe_> I'll test with a codfw host then :P [16:28:16] (03CR) 10Andrew Bogott: "> is labtest set up to be able to test this?" 
[puppet] - 10https://gerrit.wikimedia.org/r/277456 (owner: 10Andrew Bogott) [16:28:21] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Mathoid: enable PNG generation [puppet] - 10https://gerrit.wikimedia.org/r/276734 (https://phabricator.wikimedia.org/T71702) (owner: 10Mobrovac) [16:28:29] (03PS4) 10Giuseppe Lavagetto: Mathoid: enable PNG generation [puppet] - 10https://gerrit.wikimedia.org/r/276734 (https://phabricator.wikimedia.org/T71702) (owner: 10Mobrovac) [16:28:39] (03CR) 10Giuseppe Lavagetto: [V: 032] Mathoid: enable PNG generation [puppet] - 10https://gerrit.wikimedia.org/r/276734 (https://phabricator.wikimedia.org/T71702) (owner: 10Mobrovac) [16:30:14] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [16:30:40] !log osmium - delete /srv/ruthenium data that has already been copied back [16:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:31:54] <_joe_> mobrovac: service_checker is happy [16:32:10] 6Operations, 6Parsing-Team, 10Parsoid, 6Services, 13Patch-For-Review: Update ruthenium to Debian jessie from Ubuntu 12.04 - https://phabricator.wikimedia.org/T122328#2122943 (10Dzahn) forgot to cleanup on osmium. moritz asked about the rsyncd there. on osmium: stopped rsyncd, deleted config/init script... 
[16:32:25] hm seems it hasn't been restarted on scb1001 yet _joe_ [16:33:00] yup, puppet hasn't run yet [16:33:38] <_joe_> mobrovac: yeah i ran it just in codfw [16:33:53] <_joe_> i supposed waiting wasn't an issue for now [16:34:00] that explains it [16:34:03] nope [16:34:16] <_joe_> i just checked it didn't break [16:35:10] 6Operations, 10hardware-requests, 13Patch-For-Review, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: MediaWiki maintenance host for codfw (terbium's equivalent) - https://phabricator.wikimedia.org/T126987#2122964 (10Papaul) [16:35:14] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [16:35:14] PROBLEM - check_puppetrun on heka is CRITICAL: CRITICAL: Puppet has 1 failures [16:35:58] 6Operations, 10Wikimedia-Stream, 7user-notice: reboot of rcs servers - https://phabricator.wikimedia.org/T130024#2122965 (10Dzahn) [16:36:34] (03PS2) 10Andrew Bogott: Add makedomain tool, for creation of domains in designate. [puppet] - 10https://gerrit.wikimedia.org/r/277456 [16:37:44] 6Operations, 10Wikimedia-Stream, 7user-notice: reboot of rcs servers (stream.wikimedia.org) - https://phabricator.wikimedia.org/T130024#2122980 (10Dzahn) [16:38:13] (03PS2) 10Alexandros Kosiaris: Add apertium LVS IP on LVS servers [puppet] - 10https://gerrit.wikimedia.org/r/277530 [16:40:04] 6Operations, 10RESTBase, 13Patch-For-Review: install restbase1010-restbase1015 - https://phabricator.wikimedia.org/T128107#2123010 (10Cmjohnson) @fgiunchedi restbase101[23] have ssds now, enabled on the switch and are ready for you to install. 
[16:40:14] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [16:40:14] PROBLEM - check_puppetrun on heka is CRITICAL: CRITICAL: Puppet has 1 failures [16:43:15] (03PS3) 10Alexandros Kosiaris: Add apertium LVS IP on LVS servers [puppet] - 10https://gerrit.wikimedia.org/r/277530 [16:43:21] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add apertium LVS IP on LVS servers [puppet] - 10https://gerrit.wikimedia.org/r/277530 (owner: 10Alexandros Kosiaris) [16:43:47] (03PS2) 10Dzahn: DHCP: adding labstore200[3-4] MAC addrees Bug: T128764 [puppet] - 10https://gerrit.wikimedia.org/r/277527 (https://phabricator.wikimedia.org/T128764) (owner: 10Papaul) [16:44:32] (03CR) 10Dzahn: [C: 032 V: 032] "Dell MACs" [puppet] - 10https://gerrit.wikimedia.org/r/277527 (https://phabricator.wikimedia.org/T128764) (owner: 10Papaul) [16:44:38] <_joe_> !log repooling mw2020 [16:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:45:14] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 300 seconds ago with 0 failures [16:45:14] RECOVERY - check_puppetrun on heka is OK: OK: Puppet is currently enabled, last run 185 seconds ago with 0 failures [16:45:54] 6Operations, 10ops-codfw: rack new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2123042 (10Papaul) [16:46:32] <_joe_> !log repooling mw1196 [16:46:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:46:44] !log labstore200[3-4] added to DHCP (T128764) @papaul #codfw [16:46:45] T128764: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764 [16:46:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:47:51] (03CR) 10Alex Monk: "I tried to do this:" [puppet] - 10https://gerrit.wikimedia.org/r/277456 (owner: 10Andrew Bogott) [16:48:06] 6Operations, 10hardware-requests, 13Patch-For-Review, 5codfw-rollout, 
3codfw-rollout-Jan-Mar-2016: MediaWiki maintenance host for codfw (terbium's equivalent) - https://phabricator.wikimedia.org/T126987#2123048 (10Joe) mw2090 has been reimaged to act as terbium's replacement for now; we can work on the n... [16:48:17] (03PS2) 10Dzahn: Adding production DNS for labstore2004 [dns] - 10https://gerrit.wikimedia.org/r/277516 (https://phabricator.wikimedia.org/T128764) (owner: 10Papaul) [16:48:55] (03PS3) 10Dzahn: Adding production DNS for labstore2004 [dns] - 10https://gerrit.wikimedia.org/r/277516 (https://phabricator.wikimedia.org/T128764) (owner: 10Papaul) [16:49:33] cmjohnson1: thanks! re: restbase101[23], is it you on the console ? [16:50:20] (03CR) 10Dzahn: [C: 032] Adding production DNS for labstore2004 [dns] - 10https://gerrit.wikimedia.org/r/277516 (https://phabricator.wikimedia.org/T128764) (owner: 10Papaul) [16:51:28] (03CR) 10Andrew Bogott: "At least one of the issues with your test is that designate is picky about domain names -- you would need to request bast-test.wmflabs.org" [puppet] - 10https://gerrit.wikimedia.org/r/277456 (owner: 10Andrew Bogott) [16:51:31] !log authdns-update - add labstore2004 [16:51:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:51:40] <_joe_> !log repool mw1201 [16:51:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:52:42] 6Operations, 10ops-codfw, 13Patch-For-Review: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2123082 (10Dzahn) labstore2004.codfw.wmnet has address 10.192.21.8 [16:53:15] 6Operations, 10ops-codfw, 13Patch-For-Review: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2123085 (10Dzahn) [16:55:47] <_joe_> !log repooling mw1107 [16:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:57:34] (03CR) 10Dzahn: "looks like i already answered it myself back then , heh :)" [puppet] - 
10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [16:57:36] (03PS6) 10Dzahn: Use %{TIME_YEAR} instead of updating Wikimania redirects every year [puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [16:58:14] !log restarting elasticsearch server elastic2005.codfw.wmnet [16:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:58:43] 6Operations, 7Wikimedia-log-errors: mw1099 has lost nutcracker - https://phabricator.wikimedia.org/T127939#2123102 (10Joe) 5Open>3Resolved a:3Joe [16:59:20] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2123105 (10EBernhardson) @Smalyshev Any thoughts on persistent http connections from php? My plan right now is to evaluate zend-http, which... [17:00:04] yurik gwicke cscott arlolra subbu: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160315T1700). [17:00:09] 6Operations: Investigate idle/depooled eqiad appservers - https://phabricator.wikimedia.org/T116256#2123108 (10Joe) 5Open>3Resolved a:5RobH>3Joe [17:00:14] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:51] 6Operations, 10hardware-requests: additional graphite machines request, 1x per DC - https://phabricator.wikimedia.org/T126253#2008955 (10RobH) [17:02:15] 6Operations, 10hardware-requests: additional graphite machines request, 1x per DC - https://phabricator.wikimedia.org/T126253#2008955 (10RobH) I've added T128910 as a blocker, as the codfw allocation will require one of these proposed spare pool systems. 
[17:04:54] _joe_: thanks for repooling the idle appservers [17:05:09] ganglia looks better now :) [17:05:13] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 188 seconds ago with 0 failures [17:05:14] PROBLEM - check_puppetrun on heka is CRITICAL: CRITICAL: Puppet has 1 failures [17:05:40] !log mw1017 (canary) - test wikimania redirect change, restart apache [17:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:06:24] (03CR) 10Dzahn: [C: 032] "tested on canary appserver mw1017, using apache-fast-test from tin" [puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [17:07:57] (03PS3) 10Andrew Bogott: Add makedomain tool, for creation of domains in designate. [puppet] - 10https://gerrit.wikimedia.org/r/277456 [17:08:51] (03CR) 10Andrew Bogott: "I added env fallbacks to the args." [puppet] - 10https://gerrit.wikimedia.org/r/277456 (owner: 10Andrew Bogott) [17:09:50] (03PS5) 10MarcoAurelio: Enabling Translation extension on AffCom (chapcomwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/275289 (https://phabricator.wikimedia.org/T66122) [17:10:13] PROBLEM - check_puppetrun on heka is CRITICAL: CRITICAL: Puppet has 1 failures [17:15:10] RECOVERY - check_puppetrun on heka is OK: OK: Puppet is currently enabled, last run 171 seconds ago with 0 failures [17:16:56] PROBLEM - LVS HTTP IPv4 on apertium.svc.codfw.wmnet is CRITICAL: Connection timed out [17:17:08] damn... [17:17:10] that's me [17:17:19] PROBLEM - LVS HTTP IPv4 on apertium.svc.eqiad.wmnet is CRITICAL: Connection refused [17:17:40] ignore [17:17:57] <_joe_> both? [17:18:02] yes [17:18:04] (03PS4) 10Dzahn: salt-misc: set bastion host based on realm as $2 [software] - 10https://gerrit.wikimedia.org/r/276884 [17:18:05] <_joe_> ok [17:18:29] mutante: when we get a page and know why, would it help you if one of us replied to the page email for you to parse during the weekly meeting pages for awareness? 
[17:18:47] i know next monday you'll list off each page, just trying to make it easier on ya =] [17:20:23] robh: thanks, but not really, i just search for "to alerts@" and they are in addition to usual icinga mail [17:20:38] cool, just wanted to check [17:24:30] at former job: have icinga and setup asterisk, notification command of icinga is to trigger an outgoing phone call on the asterisk box, people get called, hear automatic message from asterisk, and then you can have "if you want to ACK this, press 1" and stuff, by scripting a phone menu.. and give the info back to icinga [17:24:55] that scripting language to do that was just horribly annoying [17:26:01] 6Operations, 10MobileFrontend, 10Traffic, 3Reading-Web-Sprint-68-"Java and JavaScript are basically the same", and 4 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getT... - https://phabricator.wikimedia.org/T124356#2123192 [17:26:33] (03PS5) 10Dzahn: salt-misc: set bastion host based on realm as $2 [software] - 10https://gerrit.wikimedia.org/r/276884 [17:28:27] (03CR) 10Dzahn: [C: 032] "@ArielGlenn done, quoted, works also if the user doesn't supply $2 and since you said you like it.. merge.." [software] - 10https://gerrit.wikimedia.org/r/276884 (owner: 10Dzahn) [17:28:57] (03CR) 10Dzahn: [V: 032] salt-misc: set bastion host based on realm as $2 [software] - 10https://gerrit.wikimedia.org/r/276884 (owner: 10Dzahn) [17:30:02] (03PS3) 10Dzahn: salt-misc: add new target_type role (WIP) [software] - 10https://gerrit.wikimedia.org/r/276890 [17:32:46] godog: sorry yep I was on console [17:32:47] off now [17:34:18] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2123260 (10Smalyshev) Reimplementing full HTTP is kind of PITA, and I'm not sure how up-to-date is Zend/Http with all new HTTP stuff. If it... 
[17:37:02] !log restarting elasticsearch server elastic2006.codfw.wmnet [17:37:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:37:56] (03PS1) 10Alexandros Kosiaris: lvs: Fix apertium LVS IP assignment [puppet] - 10https://gerrit.wikimedia.org/r/277555 [17:38:10] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] lvs: Fix apertium LVS IP assignment [puppet] - 10https://gerrit.wikimedia.org/r/277555 (owner: 10Alexandros Kosiaris) [17:38:19] 6Operations, 10ops-eqiad, 10Dumps-Generation: Rack and setup snapshot1005-1007 - https://phabricator.wikimedia.org/T129553#2123266 (10Cmjohnson) [17:39:34] (03CR) 10Legoktm: "Since you asked...why not keep it at horizon? Most other services are named after their software, not functionality (i.e. phabricator, gra" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [17:39:34] 6Operations, 10ops-eqiad: ms1001 bad disk - https://phabricator.wikimedia.org/T129008#2123283 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson Fixed [17:42:09] (03CR) 10Andrew Bogott: "'Horizon' seems terrible from a UI standpoint. If you're trying to modify something on labs and can't remember the URL, you're at least l" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [17:42:11] (03CR) 10Greg Grossmeier: "+1 to horizon/what Lego said." [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [17:42:25] 6Operations, 10ops-eqiad: db1053 failed disk (degraded RAID) - https://phabricator.wikimedia.org/T129829#2117255 (10Cmjohnson) Replaced disk and it's rebuilding Firmware state: Rebuild [17:42:45] dangit, I should have waited to +1 [17:43:42] openstack-dashboard-horizon-django.wikimedia.org [17:43:49] (03CR) 10Rush: "I am open to whatever the general preference is but I do prefer horizon" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [17:44:02] how is ‘horizon’ not terrible? 
[17:44:12] I mean, obviously it’s easier for me to leave it as is :) [17:44:14] 6Operations, 6Editing-Department, 6Parsing-Team, 6Services: Services team goals April - June 2016 (Q4 2015/16) - https://phabricator.wikimedia.org/T118871#2123306 (10GWicke) We discussed this in today's team meeting, and decided to reduce the number of goals to two, and focusing one more quarter on 1) AP... [17:44:42] andrewbogott: it's consistently terrible! like phabricator.wikimedia.org instead of bugs.wikimedia.org, etc. [17:44:44] I mean it's a name as bad as any other but labsdashboard seems bad as it furthers the labs labs labs confusion [17:44:57] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2123310 (10Smalyshev) Found also this for hhvm - https://docs.hhvm.com/hack/reference/function/curl_init_pooled/ - that may be useful. [17:45:02] But, you know, we call dishwashers ‘dishwashers’ and not wakalixes because you can kind of tell what a dishwasher does from the name. [17:45:02] also, we could set up a redirect if you think people aren't going to remember horizon, but I doubt it? [17:45:10] (03CR) 10Dzahn: "whichever name is picked, but add a link on https://wikitech.wikimedia.org/wiki/Main_Page because you get redirected there from http://la" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [17:45:21] (like bugs.mediawiki.org) [17:45:44] if I think labsdashboard I think of grafana or something [17:45:49] chasemp: except isn’t labs an actual thing with actual correct usage? [17:46:01] andrewbogott: we call kleenex kleenex and not tissues ;) [17:46:02] Or did we just decide that ‘labs’ is so polluted that we can’t use the term at all anymore? 
[17:46:04] i think labs.wikimedia.org is the easiest to remember of all [17:46:16] well now, mutante has a good point :) [17:46:18] and should redirect to the currently used dashboard [17:46:18] what is labsdashboard supposed to be? [17:46:24] really we haven't decided on anything I think, other than that labs isn't a good identifier of a specific thing [17:46:33] 6Operations, 6Editing-Department, 6Parsing-Team, 6Services: Services team goals April - June 2016 (Q4 2015/16) - https://phabricator.wikimedia.org/T118871#2123341 (10GWicke) [17:46:38] chasemp: it’s a good identifier for All the things [17:46:39] or should be [17:46:43] which is what labsdashboard is... [17:46:46] * apergos fancies some bikeshedding right about now... [17:46:54] So -- [17:47:02] Actually, I’ll email this instead [17:47:03] 6Operations, 6Editing-Department, 6Parsing-Team, 6Services: Services team goals April - June 2016 (Q4 2015/16) - https://phabricator.wikimedia.org/T118871#1811347 (10GWicke) Updated summary per team meeting. [17:47:43] andrewbogott: you prefer labsdashboard to horizon or you feel like users will and so it's easier? [17:47:48] I think we pretty much mostly name domains after the software they run and not their function, and don't see a good reason to deviate from that [17:48:21] there can be reasons to avoid software names.. if the companies dont want you to because trademarks [17:48:28] like nagios.wm [17:49:14] apergos: tried to answer your question in a followup email [17:49:20] κ [17:49:39] chasemp: I think the name should have some relationship to function, rather than just being “WORD" [17:49:40] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0] [17:50:03] where did the name horizon come from? 
[17:50:53] I pretty much think of dashboards as having a bunch of graphs now [17:50:58] meh [17:50:59] PROBLEM - puppet last run on analytics1026 is CRITICAL: CRITICAL: Puppet has 1 failures [17:51:02] yes [17:51:14] chasemp: ‘horizon’ is the openstack codename, like ‘nova’ ‘keystone’ ‘glance’ etc. [17:51:14] also I admit I hate the idea of naming it 'horizon' [17:51:16] hate hate hate [17:51:17] control panel [17:51:27] so maybe we’re back to ‘labsconsole’ [17:51:29] "the plesk of labs" :p [17:51:33] it has a noble and storied history [17:51:41] labsconsole is not awful [17:52:01] Yeah, ‘dashboard’ was just a sop to ‘name things after the software they run’ [17:52:06] you gotta announce it as "there's a new console on the horizon" [17:52:08] which in this case is ‘openstack-dashboard’ [17:52:15] ah I see [17:52:30] well the caches are finally called caches [17:52:36] not squids (yay!) [17:53:49] the snapshots are named after function [17:54:54] uh [17:55:00] labsmanager ? [17:55:06] I don't have any feelings about horizon vs labsconsole tbh [17:55:10] I mean, you say it's about management of all that stuff [17:55:28] but it is true that dashboard conjures images of a grafana type thing in our setup nowadays [17:55:37] noooo, not labsconsole [17:56:12] chasemp: Do you know why using 2FA at horizon is not optional like at wikitech? [17:56:15] there are still links pointing at that, let's not break them [17:56:24] HOrizon Labs Operations = HOLO => holodeck.wikimedia [17:56:31] nice [17:56:38] that's kinda fun [17:56:49] I could see that [17:56:56] mutante, oooh [17:56:57] Luke081515: well, andrewbogott would have a better answer but I think because we want it to be [17:57:08] :-/ [17:58:05] Seems like a good time to make the switch [17:58:07] Luke081515: yep, 2fa will be required going forward. That tool is too powerful for just username/password [17:58:20] more powerful than wikitech? 
[17:58:33] It’s replacing wikitech [17:58:36] well, gradually [17:59:08] Can someone use horizon right now? [17:59:13] Luke081515: sorry, what I mean is: It’s already super dangerous :) [17:59:25] halfak: sure, as long as you keep in mind that I’m constantly messing with it [17:59:34] and that the DNS stuff doesn’t work yet [17:59:35] andrewbogott: But before the last update, I could use it too, without 2FA. [17:59:43] andrewbogott, will start playing. [17:59:52] Luke081515: horizon, you mean? Yeah, it was read-only [18:00:00] Now that it can actually delete things, 2fa [18:00:12] some time ago this was possible too [18:00:15] Luke081515: do you not have a smart phone? [18:00:42] andrewbogott: I have one, but it is always away from my computer, I don't use it much [18:01:43] so in theory I can use it, but this means a change-over for me [18:03:09] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 50.00% above the threshold [1000000.0] [18:03:57] Luke081515: there are options that are not phone apps but in general we are moving to a consolidated capabilities model w/ all things running through this tool including invasive / dangerous things, it's already 2fa for a lot of ppl on wikitech now [18:05:20] * halfak just set up 2FA and logged into horizon. [18:05:24] Luke081515: yeah, I agree that it’s annoying. If it’s any consolation, it’s the most annoying to the person making the decision to require it :) [18:05:51] :) [18:10:27] 6Operations, 10MobileFrontend, 10Traffic, 3Reading-Web-Sprint-68-"Java and JavaScript are basically the same", and 4 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getT... 
- https://phabricator.wikimedia.org/T124356#2123484 [18:11:46] (03PS3) 10Dzahn: ganglia: fix me - service notify systemd (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197) [18:11:53] 6Operations, 10MobileFrontend, 10Traffic, 3Reading-Web-Sprint-68-"Java and JavaScript are basically the same", and 4 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getT... - https://phabricator.wikimedia.org/T124356#2123485 [18:15:26] Luke081515: you might be able to generate some backup codes for offline use, like stored on a piece of paper in your wallet. but if both the password and codes for this would end up on the same computer then it contradicts the point of a second factor [18:17:14] mutante: If the password is in his head and not his computer, though, then his laptop could be the second factor, right? [18:18:01] andrewbogott: yea, i just assume that if the password is in his head he is either a genius or it's a bad password [18:18:03] (03PS1) 10Alexandros Kosiaris: Assign LVS apertium IPs to sca, scb hieradata files [puppet] - 10https://gerrit.wikimedia.org/r/277561 [18:18:08] so keepassx [18:18:39] well, then there is of course the password for the keepassx access [18:18:44] lol @ remembering passwords becoming bad practice. [18:18:48] and that should be in the head [18:19:04] Remembering many passwords -- totally. [18:19:10] I'm looking at https://chrome.google.com/webstore/detail/authy/gaedmjdfmmahhbjefcbgaolhhanlaolb/related now [18:19:14] 2fa: Something you have, and another different thing you have. 
DO NOT involve your brain in this process [18:19:50] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Assign LVS apertium IPs to sca, scb hieradata files [puppet] - 10https://gerrit.wikimedia.org/r/277561 (owner: 10Alexandros Kosiaris) [18:19:52] technically something you are is cognitive enough to remember the something you know [18:19:58] I would love to make that argument [18:19:59] RSA has had a 2fa desktop app for years [18:20:05] let's not get into biometric access for labs :) [18:20:07] mutante: Some pages say that a desktop PC would need 4000 years to crack it :P [18:20:14] (my pwd) [18:20:34] sure until you mispaste it or reuse it by accident :) [18:20:55] Luke081515: oh, i believe you can remember a good password, just not many good passwords, and since you probably have many many accounts that would mean the same good password is reused for many things [18:21:25] I remember three passwords: keepassx, wikitech, ssh key [18:21:35] But I guess I shouldn’t assume that wikitech is on the shortlist for anyone but me :) [18:21:39] for me it's keepassx master password, ssh passphrase [18:21:50] user login on laptop [18:22:15] mutante: so your horizon factors are: phone, laptop, password-in-brain [18:22:18] three factors! 
[18:22:26] sort of [18:22:27] RECOVERY - LVS HTTP IPv4 on apertium.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 4999 bytes in 0.076 second response time [18:22:37] hrmm, and disk encryption when it boots is another thing [18:22:41] only 2, as you can't count the same factor twice (something you have) [18:22:41] ah one is back [18:22:50] well, back, up for the very first time [18:22:58] is more like it [18:23:10] RECOVERY - cxserver endpoints health on scb2002 is OK: All endpoints are healthy [18:23:19] RECOVERY - cxserver endpoints health on scb2001 is OK: All endpoints are healthy [18:23:37] I got: laptop pwd, google pwd, beta cluster pwd, bot account pwd, LDAP pwd and CA pwd [18:23:43] chasemp: really, even if they’re different things I have? [18:23:46] !log OS install labstore200[3-4] [18:23:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:23:52] only 6 ? lucky you [18:23:58] akosiaris: now this is in my head https://en.wikipedia.org/wiki/Feels_Like_the_First_Time [18:24:08] i didnt provide a you tube link cuz im not that mean. [18:24:10] andrewbogott: yeah that's how it works for compliance regs at least, it's the method not just the count [18:24:15] robh: lol [18:24:16] hm, ok [18:24:20] thanks ! :-D [18:24:32] I wonder how I 'll sleep tonight now [18:24:34] I guess it would depend on w/not I keep my phone in my laptop bag [18:24:59] * robh is having to listen to the song to get rid of it. [18:25:04] the only way out is through. [18:25:18] and then .. 
how is your phone secured, PIN 1234, swipe a triangle, or fingerprint people can take from a glass :p [18:25:37] 7 digit pin [18:25:39] and encrypte [18:25:42] encrypted [18:25:58] but people tell me I am crazy [18:26:01] oh 7 is long [18:26:08] I don't have many things on my phone, so I'm using a pattern currently [18:26:28] * Luke081515 uses programs to clean up the private data regularly [18:26:45] i have a longer passcode for my phone that's alphanumeric [18:26:52] it's a bitch until you get really used to it. [18:27:09] but i still have to be looking at my phone to do it. (so if im driving, i dont get texts.) [18:27:27] though server crash texts are allowed to bypass my lock screen ;D [18:27:28] !log pool sca1001, sca1002 for apertium.svc.eqiad.wmnet in conftool [18:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:27:40] (while gir sings the doom song) [18:27:53] aaand akosiaris just made me recheck the crypto option on the phone.. and i sure did miss doing it when i got the new phone [18:27:59] doing that now [18:28:13] yea my phone has my 2fa for phab, email, and wikitech [18:28:27] clicks "Encrypt phone" [18:28:36] RECOVERY - LVS HTTP IPv4 on apertium.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 4999 bytes in 0.013 second response time [18:28:42] ah and here is the second one [18:28:44] yay! [18:29:07] https://www.youtube.com/watch?v=sY278K4ljWs when servers crash. [18:29:21] just the doom song part... not the entire clip. [18:30:35] (03CR) 10Hashar: [C: 031] "I would keep Horizon myself which is the OpenStack project name for the web based admin interface. It is also easier for me to access sinc" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [18:30:40] ok, now the big part... 
let's see how many pages [18:31:01] (03PS6) 10Alexandros Kosiaris: lvs: SC[AB] services lvs configuration [puppet] - 10https://gerrit.wikimedia.org/r/276199 (https://phabricator.wikimedia.org/T129234) [18:32:00] PROBLEM - Apache HTTP on mw1129 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.009 second response time [18:32:19] PROBLEM - HHVM rendering on mw1129 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.012 second response time [18:32:49] (03PS4) 10Dzahn: ganglia: fix up for aggregator service on systemd [puppet] - 10https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197) [18:35:18] (03PS5) 10Dzahn: ganglia: fix up for aggregator service on systemd [puppet] - 10https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197) [18:39:31] (03CR) 10Dzahn: "as intended, no-op on existing aggregators:" [puppet] - 10https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197) (owner: 10Dzahn) [18:40:13] (03PS7) 10Alexandros Kosiaris: lvs: SC[AB] services lvs configuration [puppet] - 10https://gerrit.wikimedia.org/r/276199 (https://phabricator.wikimedia.org/T129234) [18:40:45] (03PS6) 10Dzahn: ganglia: fix up for aggregator service on systemd [puppet] - 10https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197) [18:40:53] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/2073/" [puppet] - 10https://gerrit.wikimedia.org/r/277458 (https://phabricator.wikimedia.org/T124197) (owner: 10Dzahn) [18:43:24] !log disable puppet on neon for a few minutes while deploying https://gerrit.wikimedia.org/r/#/c/276199/7 [18:43:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:43:28] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [18:43:40] (03PS8) 10Alexandros Kosiaris: lvs: SC[AB] services lvs configuration [puppet] 
- 10https://gerrit.wikimedia.org/r/276199 (https://phabricator.wikimedia.org/T129234) [18:43:46] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] lvs: SC[AB] services lvs configuration [puppet] - 10https://gerrit.wikimedia.org/r/276199 (https://phabricator.wikimedia.org/T129234) (owner: 10Alexandros Kosiaris) [18:44:50] !log restarting elasticsearch server elastic2007.codfw.wmnet [18:44:54] (03PS8) 10Krinkle: Remove 'https -> http' rewrite for IRC notifications [mediawiki-config] - 10https://gerrit.wikimedia.org/r/217858 (https://phabricator.wikimedia.org/T122933) (owner: 10Faidon Liambotis) [18:44:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:45:28] (03CR) 10Krinkle: [C: 031] "This needs public announcements, ideally 1-2 weeks ahead of time. But overall I think this is okay to do." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/217858 (https://phabricator.wikimedia.org/T122933) (owner: 10Faidon Liambotis) [18:46:49] Krinkle: has happened already [18:47:19] Krinkle: https://meta.wikimedia.org/wiki/Tech/News/2016/02 [18:49:18] paravoid: It didn't have a date set yet though, so another reminder this week and roll out next week would be nice [18:51:18] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [18:51:38] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [18:51:44] :-) [18:51:55] mobrovac: ^ things seem to be going well [18:52:09] after 2 unnecessary errors, finally at least [18:52:59] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [18:56:54] 6Operations, 10ops-codfw, 13Patch-For-Review: labstore2003-labstore2004 onsite setup taks - https://phabricator.wikimedia.org/T128764#2123686 (10RobH) Ok, we need to write a new partman recipe for this, but we can use lvm-noraid-large.a.cfg as the basis. Modify it to have the following: * 300MB outside of... 
[18:57:07] !log alsafi - url-downloader codfw - reboot [18:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:57:27] RoanKattouw: This may be the second time I’ve asked you this, but… do you have any affection for 'parsoid-spof’? So far no will claim ownership and I’m on the verge of deleting it. https://phabricator.wikimedia.org/T128620 [18:58:31] andrewbogott: Not me. Ask gwicke or subbu , one of them is likely to know [19:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160315T1900). Please do the needful. [19:00:46] andrewbogott, you can delete it .. i responded on that email thread already. but, let me comment on T128620 as well. [19:00:47] T128620: Maybe delete instance parsoid-spof in 'visualeditor' project - https://phabricator.wikimedia.org/T128620 [19:00:51] !log alsafi back up with 4.4 kernel [19:00:53] RoanKattouw: you’re on the tail end of a long, illustrious list of people I have asked :) [19:00:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:01:07] subbu: ah, I’ve been thinking of the address question and the instance question as separate questions [19:01:10] but I suppose they aren't [19:01:17] * andrewbogott deletes [19:01:39] <_joe_> akosiaris: has restbase recovered too? [19:01:52] _joe_: on 2001 ? yes [19:02:00] (08:52:59 μμ) icinga-wm: RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [19:02:14] <_joe_> cool, I can restart restbase everywhere in codfw then [19:02:40] 6Operations, 13Patch-For-Review: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2123726 (10Dzahn) This works now. You can see it on alsafi. I can killall -u ganglia , run puppet and puppet starts all the services: ``` ganglia 459 0.0 0.3 48724 3164 ? Ssl... 
[19:02:44] <_joe_> and we officially have restbase in codfw talking to services there and everything (including parsoid) seem to work [19:03:12] (03PS4) 10Chad: Also keep /srv/patches in sync between masters [puppet] - 10https://gerrit.wikimedia.org/r/266773 [19:03:17] <_joe_> let me rolling restart restbase in codfw to pick up the config [19:03:33] (03CR) 10Chad: Also keep /srv/patches in sync between masters (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/266773 (owner: 10Chad) [19:04:50] <_joe_> !log rolling restart of restbase in codfw [19:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:05:31] !log enable puppet on neon [19:05:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:05:46] heh, LVS checks seem successful this time around [19:06:13] I did find a bug in our configuration though. We are sending a false Host: header in the LVS icinga checks, but I 'll fix that tomorrow [19:06:49] yay! 
icinga says all are OK [19:07:06] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Puppet has 1 failures [19:07:40] _joe_: beverage of your choice is on me next time you are around ;) [19:08:32] <_joe_> gwicke: :) [19:08:37] 6Operations, 6Services, 13Patch-For-Review: setup/deploy sc[ab]200[1-2] - https://phabricator.wikimedia.org/T129234#2123779 (10akosiaris) [19:09:13] _joe_: we'll do a bit more manual testing tomorrow [19:09:28] 6Operations, 6Services, 3Mobile-Content-Service, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Prepare mobileapps for the codfw switchover - https://phabricator.wikimedia.org/T125061#2123788 (10akosiaris) [19:09:31] 6Operations, 10Graphoid, 6Services, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Prepare graphoid for the codfw switchover - https://phabricator.wikimedia.org/T125060#2123789 (10akosiaris) [19:09:34] <_joe_> gwicke: sure, but the checks not failing is a promising start I think [19:09:35] 6Operations, 10Mathoid, 6Services, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Prepare mathoid for the codfw switchover - https://phabricator.wikimedia.org/T125058#2123790 (10akosiaris) [19:09:38] 6Operations, 6Services, 13Patch-For-Review: setup/deploy sc[ab]200[1-2] - https://phabricator.wikimedia.org/T129234#2099212 (10akosiaris) 5Open>3Resolved All services on sca200X/scb200X are now up and running. LVS is running fine and icinga reports no errors. 
Resolving [19:10:02] 6Operations: Reimage hooft with jessie and rename to bast3001 - https://phabricator.wikimedia.org/T123712#2123796 (10Dzahn) [19:10:04] 6Operations, 13Patch-For-Review: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#2123797 (10Dzahn) [19:10:06] 6Operations: Make services manageable by systemd (tracking) - https://phabricator.wikimedia.org/T97402#2123798 (10Dzahn) [19:10:10] _joe_: indeed, they cover a lot of ground already -- main thing missing is the update code paths (no-cache) [19:10:10] 6Operations, 13Patch-For-Review: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2123794 (10Dzahn) 5Open>3Resolved template unit file from which instances are spawned: ``` root@alsafi:/# cat /etc/systemd/system/ganglia-monitor-aggregator\@.service [Unit] Description=G... [19:10:46] _joe_: some time we should write a blog post about how we leverage swagger specs for monitoring [19:11:03] <_joe_> gwicke: maybe when me and marko find the time to write some docs [19:11:12] (03CR) 10Thcipriani: [C: 031] Also keep /srv/patches in sync between masters [puppet] - 10https://gerrit.wikimedia.org/r/266773 (owner: 10Chad) [19:11:13] 6Operations: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2123809 (10Dzahn) [19:11:25] <_joe_> gwicke: I think we should "package" service_checker and upload it to PyPI [19:11:46] that would be great [19:12:04] I haven't heard of anybody else doing this yet, and it is pretty cool [19:12:10] (03PS2) 10Andrew Bogott: Have the site-branding link link back to horizon rather than to wikitech. [puppet] - 10https://gerrit.wikimedia.org/r/276264 [19:12:12] (03PS2) 10Andrew Bogott: Move horizon.wikimedia.org to labsconsole.wikimedia.org. 
[puppet] - 10https://gerrit.wikimedia.org/r/276262 [19:12:58] <_joe_> I'm pretty sure a lot of people do that internally, maybe not using swagger specs but some non-standard format [19:13:34] (03CR) 10Ottomata: "Just tested with the latest patch in mediawiki-vagrant with this config file: https://gist.github.com/ottomata/45cc0bc09f2b859ef9f1 -- I" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/276439 (https://phabricator.wikimedia.org/T124278) (owner: 10Elukey) [19:14:06] (03PS1) 10Andrew Bogott: Added labsconsole.wikimedia.org, the new horizon vhost [dns] - 10https://gerrit.wikimedia.org/r/277571 [19:14:06] _joe_: sure, but basing it off the same spec that drives the service as well avoids potential disagreements & the need to maintain two copies [19:14:31] (03CR) 10jenkins-bot: [V: 04-1] Added labsconsole.wikimedia.org, the new horizon vhost [dns] - 10https://gerrit.wikimedia.org/r/277571 (owner: 10Andrew Bogott) [19:14:38] also, we use the same for unit tests [19:15:11] <_joe_> yeah that's well done in fact :) [19:15:44] (03PS2) 10Andrew Bogott: Added labsconsole.wikimedia.org, the new horizon vhost [dns] - 10https://gerrit.wikimedia.org/r/277571 [19:16:55] !log restarting elasticsearch server elastic2008.codfw.wmnet [19:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:17:40] (03PS1) 10Dzahn: ganglia: remove aggregator from alsafi [puppet] - 10https://gerrit.wikimedia.org/r/277572 [19:20:05] 6Operations, 13Patch-For-Review: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#2123880 (10Dzahn) This should now be unblocked since ganglia::monitor::aggregator now works with systemd. [19:20:54] andrewbogott, wait, labsconsole? 
[19:21:40] Uhoh, do you hate labsconsole [19:21:45] 6Operations: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#2123950 (10Dzahn) [19:21:47] I almost thought there was agreement :) [19:24:04] 6Operations: Grant Joseph and Dan deploy permissions on aqs100[1-3] - https://phabricator.wikimedia.org/T116169#2123984 (10Dzahn) 5stalled>3Open setting stalled -> open because the blocking task has been resolved [19:24:45] 6Operations: Grant Joseph and Dan deploy permissions on aqs100[1-3] - https://phabricator.wikimedia.org/T116169#2123990 (10Dzahn) Hi all, so do you still want this access request processed now that AQS is deployed with scap? [19:24:47] 6Operations: Grant Joseph and Dan deploy permissions on aqs100[1-3] - https://phabricator.wikimedia.org/T116169#2123991 (10Dzahn) Hi all, so do you still want this access request processed now that AQS is deployed with scap? [19:25:57] 6Operations, 10Ops-Access-Requests: Grant Joseph and Dan deploy permissions on aqs100[1-3] - https://phabricator.wikimedia.org/T116169#2124005 (10Dzahn) [19:32:18] (03PS3) 10Andrew Bogott: Have the site-branding link link back to horizon rather than to wikitech. [puppet] - 10https://gerrit.wikimedia.org/r/276264 [19:32:20] (03PS3) 10Andrew Bogott: Move horizon.wikimedia.org to labsconsole.wikimedia.org. [puppet] - 10https://gerrit.wikimedia.org/r/276262 [19:33:00] (03CR) 10Hashar: [C: 031] "labsconsole sounds good ;-} Feel free to drop the horizon redirect!" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [19:42:22] !log twentyafterfour@tin Started scap: testwiki to 1.27.0-wmf.17 [19:42:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:46:26] (03CR) 10Luke081515: [C: 031] "I'm ok with labsconsole, but I think the horizon redirect should be keept." 
[puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [19:46:42] ups, typo [19:46:55] (03CR) 10Ottomata: "Hmmm, the binlog method does seem more robust, especially since it would stream updates as they came in." [puppet] - 10https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991) (owner: 10Ottomata) [19:48:58] (03CR) 10Ottomata: "Pssh, maybe I should just run a slave somewhere and periodically stop it and tar /var/lib/mysql." [puppet] - 10https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991) (owner: 10Ottomata) [19:50:07] (03CR) 10Ottomata: "Or, i suppose I could do this with mylvmbackup via LVM snapshots instead, since /var/lib/mysql is on an lvm partition." [puppet] - 10https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991) (owner: 10Ottomata) [19:55:16] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 745 [19:56:45] (03CR) 10Luke081515: [C: 031] Enable Flow beta feature on plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277529 (https://phabricator.wikimedia.org/T130009) (owner: 10Catrope) [19:58:22] (03PS1) 10Ori.livneh: Add quick-n-dirty logging function hackLog() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277585 [19:59:25] (03PS2) 10Ori.livneh: Add quick-n-dirty logging function hackLog() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277585 [20:05:29] !log labstore200[3-4] OS install on hold: making new partman recipe [20:05:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:06:51] (03PS4) 10Andrew Bogott: Have the site-branding link link back to horizon rather than to wikitech. [puppet] - 10https://gerrit.wikimedia.org/r/276264 [20:06:53] (03PS4) 10Andrew Bogott: Move horizon.wikimedia.org to labsconsole.wikimedia.org. 
[puppet] - 10https://gerrit.wikimedia.org/r/276262 [20:06:55] (03PS1) 10Andrew Bogott: Slight fix to Horizon login splash [puppet] - 10https://gerrit.wikimedia.org/r/277589 [20:09:36] (03PS2) 10Andrew Bogott: Slight fix to Horizon login splash [puppet] - 10https://gerrit.wikimedia.org/r/277589 [20:10:16] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 771 [20:10:45] (03PS1) 10Dzahn: Revert "phab: add sender domains for maint-announce tickets" [puppet] - 10https://gerrit.wikimedia.org/r/277592 [20:10:53] 6Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124188 (10Ironholds) Sooo how exactly does one *use* it? `su analytics-search-user` complaints about passwords. Otto maintains... [20:11:17] (03CR) 10Andrew Bogott: [C: 032] Slight fix to Horizon login splash [puppet] - 10https://gerrit.wikimedia.org/r/277589 (owner: 10Andrew Bogott) [20:11:32] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124191 (10Dzahn) [20:12:07] (03PS2) 10Dzahn: Revert "phab: add sender domains for maint-announce tickets" [puppet] - 10https://gerrit.wikimedia.org/r/277592 [20:12:34] (03CR) 10Dzahn: [C: 032] Revert "phab: add sender domains for maint-announce tickets" [puppet] - 10https://gerrit.wikimedia.org/r/277592 (owner: 10Dzahn) [20:13:52] (03Abandoned) 10Addshore: Rename wdqs-admins contact group to wdqs [puppet] - 10https://gerrit.wikimedia.org/r/268673 (owner: 10Addshore) [20:13:56] 6Operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service: implement wdqs1001/1002 disk upgrades (extend lvm) - https://phabricator.wikimedia.org/T120714#2124199 (10Smalyshev) OK, that sounds good. 
I spoke today with Blazegraph folks, and looks like we will have implementation of geospatial indexing by... [20:15:04] (03CR) 10Legoktm: [C: 04-1] "Errr, no labsconsole please. There are still links pointing to it before it merged with wikitech." [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [20:15:16] RECOVERY - check_mysql on lutetium is OK: Uptime: 21469 Threads: 3 Questions: 378640 Slow queries: 91 Opens: 1064 Flush tables: 2 Open tables: 64 Queries per second avg: 17.636 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [20:15:28] RECOVERY - RAID on db1053 is OK: OK: optimal, 1 logical, 2 physical [20:15:34] legoktm: tell me more about why not labsconsole? [20:15:41] The new site is, in fact, the new labs console [20:16:17] yeah idk if it's the name in 10 years but I've fallen into the labsconsole camp as the most obvious of things [20:16:29] 6Operations, 10ops-codfw: rack new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2124217 (10Papaul) mgmt info 10.193.2.249 port information Row D rack D5 port ge-5/0/8 [20:19:03] andrewbogott: Because there are going to be links pointing to that domain from before it merged into wikitech and should be redirected onto wikitech [20:19:30] legoktm: can you give me an example? [20:19:38] And, what will people be expecting when they follow those links? [20:20:20] 6Operations, 13Patch-For-Review: reinstall bast2001 with jessie - https://phabricator.wikimedia.org/T128899#2124263 (10Dzahn) [20:20:22] 6Operations, 10ops-codfw: Check bast2001 for hardware problems - https://phabricator.wikimedia.org/T129316#2124264 (10Dzahn) [20:20:38] andrewbogott: https://www.mediawiki.org/w/index.php?target=labsconsole.wikimedia.org&title=Special%3ALinkSearch https://en.wikipedia.org/w/index.php?target=labsconsole.wikimedia.org&title=Special%3ALinkSearch https://meta.wikimedia.org/w/index.php?target=labsconsole.wikimedia.org&title=Special%3ALinkSearch plus mailing lists, git history, etc. 
[20:20:55] andrewbogott: redirects to the actual content on wikitech? [20:21:40] was there actual content on labsconsole? I thought it was only ever osm [20:22:07] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2124270 (10EBernhardson) curl_init_pooled looks very interesting. Unfortunately it is new as of 3.9.0 and prod is on 3.6.5 [20:22:15] iirc the early tool labs docs were? [20:22:55] I forget the exact timeline of when the wikis were merged, I'll look into that later [20:23:24] But re-using domain names when we could easily avoid it (labsdashboard?) seems not-so-great. [20:23:44] it just redirects to wikitech now [20:24:08] I don't understand why it's worse if it redirects to an openstack dashboard labsconsole url in the future [20:24:10] legoktm: I’d love it if you would respond to my mail thread and make a coherent argument. As it is everyone is just randomly sniping at me directly so there’s no dialogue between people with opinions. [20:24:13] I do not have an opinion. 
[20:24:15] any link in the wild is already invalid [20:24:37] fair idea [20:24:41] andrewbogott: sure, will do [20:24:46] thanks [20:25:00] (will be a few hours before I have time for that though) [20:25:15] well, I should say, I have the opinion that ‘horizon’ seems pretty cryptic [20:25:37] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 58.62% of data above the critical threshold [5000000.0] [20:26:47] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 58.62% of data above the critical threshold [5000000.0] [20:30:49] (03PS1) 10Papaul: DNS:Adding mgmt entries for wasat Bug:T129930 [dns] - 10https://gerrit.wikimedia.org/r/277597 (https://phabricator.wikimedia.org/T129930) [20:31:26] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124289 (10mpopov) `su analytics-search-user` says there's no password entry. `su analytics-search` still asks me for password. [20:31:27] andrewbogott: all software names are. The best similar tool we have is probably Phabricator (i.e. it's user facing, not a dev/ops tool like logstash or grafana). [20:31:47] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124291 (10EBernhardson) ```sudo -u analytics-search ls``` [20:31:50] andrewbogott: do we have many other examples of not using the software name for the domain? 
[20:31:59] greg-g: yeah, there are some of each [20:32:02] (other than wikis of course) [20:32:03] * bd808 wishes he'd named logstash.wm.o logs.wm.o instead [20:32:12] But leaving it ‘horizon’ is easiest for me :) [20:33:28] 6Operations, 10ops-codfw: rack new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2124292 (10Papaul) [20:33:36] PROBLEM - Disk space on terbium is CRITICAL: DISK CRITICAL - free space: / 2487 MB (3% inode=81%) [20:34:37] !log twentyafterfour@tin Finished scap: testwiki to 1.27.0-wmf.17 (duration: 52m 15s) [20:34:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:35:00] !log terbium - gzip nutcracker.log.1 for disk space [20:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:35:44] 6Operations, 10Ops-Access-Requests: Grant Joseph and Dan deploy permissions on aqs100[1-3] - https://phabricator.wikimedia.org/T116169#2124293 (10Milimetric) I don't think we need this access any more, @Dzahn. Thanks for following up, if I understand the scap process correctly, we can deploy without this. I'... 
[20:35:58] !log testwiki still shows 1.27.0-wmf.16 :( [20:36:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:36:19] !log grr now it says .17, pebkac [20:36:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:37:06] RECOVERY - Disk space on terbium is OK: DISK OK [20:37:37] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [20:39:13] (03PS2) 10Jdlrobson: WikidataPageBanner config changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277431 (https://phabricator.wikimedia.org/T129099) [20:39:37] (03CR) 10jenkins-bot: [V: 04-1] WikidataPageBanner config changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277431 (https://phabricator.wikimedia.org/T129099) (owner: 10Jdlrobson) [20:39:48] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 50.00% above the threshold [1000000.0] [20:40:55] (03PS1) 10Cmjohnson: Adding snapshot100[4-7] to dhcpd file and giving raid1-lvm-ext4-srv.cfg [puppet] - 10https://gerrit.wikimedia.org/r/277599 [20:42:40] (03CR) 10Cmjohnson: [C: 032] Adding snapshot100[4-7] to dhcpd file and giving raid1-lvm-ext4-srv.cfg [puppet] - 10https://gerrit.wikimedia.org/r/277599 (owner: 10Cmjohnson) [20:44:53] 6Operations, 10ops-codfw: rack new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2124343 (10RobH) switch port description set, enabled, and vlan set. 
[20:45:32] !log krinkle@tin Synchronized php-1.27.0-wmf.16/extensions/MoodBar/MoodBar.php: T129978 (duration: 00m 53s) [20:45:33] T129978: MoodBar registers a dependency on a message key that isn't specified - https://phabricator.wikimedia.org/T129978 [20:45:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:48:18] !log restarting elasticsearch server elastic2009.codfw.wmnet [20:48:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:50:02] (03PS1) 1020after4: delete stale symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277601 [20:50:04] (03PS1) 1020after4: group0 to 1.27.0-wmf.17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277602 [20:52:48] (03Merged) 10jenkins-bot: delete stale symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277601 (owner: 1020after4) [20:52:58] (03Merged) 10jenkins-bot: group0 to 1.27.0-wmf.17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277602 (owner: 1020after4) [20:53:39] (03PS2) 10GWicke: Increase purged entry point s-maxage from 12 to 48 hours [puppet] - 10https://gerrit.wikimedia.org/r/277112 [20:54:43] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124382 (10Ironholds) But analytics-search apparently doesn't have a /home/ directory so is not actually what we were looking for. Sod it, let's just... 
[20:56:05] (03PS3) 10Jdlrobson: WikidataPageBanner config changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277431 (https://phabricator.wikimedia.org/T129099) [20:57:15] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124389 (10Dzahn) 5Resolved>3Open reopening, should be fixed properly instead of running it through an individual user account [21:02:17] 6Operations, 10ops-eqiad, 10Dumps-Generation: Rack and setup snapshot1005-1007 - https://phabricator.wikimedia.org/T129553#2124403 (10Cmjohnson) [21:02:40] (03CR) 10Hashar: "Then we break links and people will adjust ? ;-}" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [21:04:08] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124406 (10Dzahn) Ok, there seems to be some major confusion on this ticket. When you say you are "requesting access to a user", that is not what is h... [21:06:35] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124415 (10Ironholds) Okay, then this was a fundamental misunderstanding from the get-go. We asked Andrew Otto for access to or the creation of a rol... [21:07:17] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124416 (10Dzahn) [analytics1001:~] $ id ironholds uid=5004(ironholds) gid=500(wikidev) groups=500(wikidev),731(analytics-privatedata-users),771(analy... 
[21:07:44] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124420 (10EBernhardson) actually the analytics-search-user group was created so that there is an analytics-search user. This user account is used to... [21:08:41] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124427 (10Ironholds) Well, not just cron jobs but also have somewhere to store the scripts the jobs are triggering. [21:08:56] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [21:09:16] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124428 (10Dzahn) >>! In T129260#2124415, @Ironholds wrote: > We asked Andrew Otto for access to or the creation of a role account that'd exist indepe... [21:09:52] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124431 (10Ironholds) And the files live where? (Otto is suggesting /a/) [21:11:16] 6Operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service: implement wdqs1001/1002 disk upgrades (extend lvm) - https://phabricator.wikimedia.org/T120714#2124433 (10RobH) That works for me, and I'll be around that entire week to assist on the operations end of things. Would we be able to include the r... [21:11:55] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124435 (10Dzahn) >>! 
In T129260#2124420, @EBernhardson wrote: > so that any person in the group can fix them, rather than having it owned by a specif... [21:15:39] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124444 (10EBernhardson) >>! In T129260#2124435, @Dzahn wrote: >>>! In T129260#2124420, @EBernhardson wrote: >> so that any person in the group can fi... [21:24:03] !log restarting elasticsearch server elastic2010.codfw.wmnet [21:24:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:27:40] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124489 (10Dzahn) >>! In T129260#2124444, @EBernhardson wrote: > the analytics team (i talked with otto on irc) felt like having researchers puppetize... [21:34:12] 6Operations, 10Ops-Access-Requests: Requesting access to to analytics-search-user for Mikhail Popov and Oliver Keyes - https://phabricator.wikimedia.org/T129260#2124510 (10Ironholds) Sure, then we'll look at that when we're no longer on a timer. I'm checking out for today. I'll start moving scripts tomorrow. [21:35:18] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [21:44:07] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
[21:44:21] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.17 [21:44:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:47:29] !log twentyafterfour@tin Purged l10n cache for 1.27.0-wmf.14 [21:47:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:50:26] PROBLEM - check_disk on db1025 is CRITICAL: DISK CRITICAL - free space: / 3724 MB (52% inode=72%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 362884 MB (29% inode=99%): /a/tmp 2 MB (0% inode=99%) [21:51:04] (03PS5) 10Andrew Bogott: Have the site-branding link link back to horizon rather than to wikitech. [puppet] - 10https://gerrit.wikimedia.org/r/276264 [21:51:06] (03PS5) 10Andrew Bogott: Move horizon.wikimedia.org to labsconsole.wikimedia.org. [puppet] - 10https://gerrit.wikimedia.org/r/276262 [21:51:08] (03PS1) 10Andrew Bogott: Update horizon login splash logo, again [puppet] - 10https://gerrit.wikimedia.org/r/277622 [21:51:11] ^^^ icinga/looking [21:53:11] 7Puppet, 10Phabricator, 5Gerrit-Migration, 13Patch-For-Review, 7WorkType-Maintenance: Configure backula to backup the /srv/phab/repos directory - https://phabricator.wikimedia.org/T120045#2124601 (10Dzahn) ``` [helium:~] $ sudo bconsole Connecting to Director helium.eqiad.wmnet:9101 1000 OK: helium.eqiad... 
[21:54:06] 7Puppet, 10Phabricator, 5Gerrit-Migration, 7WorkType-Maintenance: Configure backula to backup the /srv/phab/repos directory - https://phabricator.wikimedia.org/T120045#2124604 (10Dzahn) [21:55:16] RECOVERY - check_disk on db1025 is OK: DISK OK - free space: / 3724 MB (52% inode=72%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 362884 MB (29% inode=99%): /a/tmp 102316 MB (99% inode=99%) [21:56:44] (03CR) 10Andrew Bogott: [C: 032] Update horizon login splash logo, again [puppet] - 10https://gerrit.wikimedia.org/r/277622 (owner: 10Andrew Bogott) [21:56:55] awight: wagyu beef? [21:57:31] 6Operations, 10ops-eqiad, 10Dumps-Generation: Rack and setup snapshot1005-1007 - https://phabricator.wikimedia.org/T129553#2124612 (10Cmjohnson) I used raid1-lvm-ext4-srv.cfg and didn't go over well @ArielGlenn please take a look. [21:57:45] mutante: strictly vegan meetings, actually, but I had to be there in the flesh [21:58:07] awight: aaah, i was thinking something vegan and was kind of surprised. got it! [21:58:18] "meating" [21:58:25] :p clever :) [21:58:34] It's just to make remote people hungry [21:58:39] meatbot [21:58:48] (03PS1) 10Ottomata: Add mysql_wmf::mylvmbackup define, use this for backups of analytics-meta mysql instance [puppet] - 10https://gerrit.wikimedia.org/r/277640 (https://phabricator.wikimedia.org/T127991) [22:00:59] !log ori@tin Synchronized php-1.27.0-wmf.16/includes/api/ApiPurge.php: I29636c04: Add RecursiveLinkPurge log for API requests (duration: 00m 33s) [22:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:01:56] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. 
[22:02:09] (03PS2) 10Ottomata: Add mysql_wmf::mylvmbackup define, use this for backups of analytics-meta mysql instance [puppet] - 10https://gerrit.wikimedia.org/r/277640 (https://phabricator.wikimedia.org/T127991) [22:03:46] (03CR) 10Ottomata: "Perhaps using mylvmbackup is better than xtrabackup here? If this works, I will abandon https://gerrit.wikimedia.org/r/#/c/273312/12." [puppet] - 10https://gerrit.wikimedia.org/r/277640 (https://phabricator.wikimedia.org/T127991) (owner: 10Ottomata) [22:03:48] (03CR) 10Dzahn: [C: 032] "(gerrit trivia) Wasat is the traditional name of Delta Geminorum" [dns] - 10https://gerrit.wikimedia.org/r/277597 (https://phabricator.wikimedia.org/T129930) (owner: 10Papaul) [22:04:31] (03CR) 10jenkins-bot: [V: 04-1] Add mysql_wmf::mylvmbackup define, use this for backups of analytics-meta mysql instance [puppet] - 10https://gerrit.wikimedia.org/r/277640 (https://phabricator.wikimedia.org/T127991) (owner: 10Ottomata) [22:08:32] (03CR) 10Dzahn: "you say "move horizon" and "redirect remains in place" but the redirect that is added is labsconsole.wikimedia.org to wikitech.wikimedia" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [22:08:50] (03CR) 10jenkins-bot: [V: 04-1] Add mysql_wmf::mylvmbackup define, use this for backups of analytics-meta mysql instance [puppet] - 10https://gerrit.wikimedia.org/r/277640 (https://phabricator.wikimedia.org/T127991) (owner: 10Ottomata) [22:09:56] 6Operations, 10ops-codfw: rack new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2120304 (10Dzahn) mgmt dns merged [22:11:38] (03CR) 10Andrew Bogott: "@Dzahn, doesn't horizon.wikimedia.org.erb redirect horizon to labsconsole?" 
[puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [22:12:26] 6Operations, 10Analytics, 10hardware-requests, 13Patch-For-Review: eqiad: (3) AQS replacement nodes - https://phabricator.wikimedia.org/T124947#1970805 (10RobH) So we have a specification for this, it is actually our new spare pool specification on T128910. I've added this as dependent on that specificati... [22:19:49] @seen joal [22:19:49] mutante: joal is in here, right now [22:20:02] :) [22:20:27] joal: hey! can i ask you about access on aqs really quick [22:20:33] now that AQS is deployed with scap [22:20:46] do you still care about the "deploy access" access request or is it solved [22:20:59] i think you have access now, right [22:21:05] but in a different way [22:21:29] 6Operations: "Unable to connect to redis server" log spam - https://phabricator.wikimedia.org/T130078#2124709 (10hashar) [22:23:44] (03CR) 10Dzahn: "@Andrew, aah! i see it now, first just saw "redirect" and the changed redirects.conf" [puppet] - 10https://gerrit.wikimedia.org/r/276262 (owner: 10Andrew Bogott) [22:26:51] 6Operations: Reimage hooft with jessie and rename to bast3001 - https://phabricator.wikimedia.org/T123712#2124749 (10Dzahn) [22:26:53] 6Operations: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#2124750 (10Dzahn) [22:26:55] 6Operations: Make services manageable by systemd (tracking) - https://phabricator.wikimedia.org/T97402#2124751 (10Dzahn) [22:26:57] 6Operations: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2124747 (10Dzahn) 5Resolved>3Open remaining puppet issue that got overlooked earlier [22:27:26] (03CR) 10Krinkle: Add quick-n-dirty logging function hackLog() (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277585 (owner: 10Ori.livneh) [22:31:56] (03PS1) 10Ori.livneh: Turn on RecursiveLinkPurge log bucket, for I29636c045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277677 [22:32:46] !log 
krinkle@tin Synchronized php-1.27.0-wmf.16/extensions/VisualEditor/extension.json: T129704 (duration: 00m 24s) [22:32:47] T129704: Warning: Unable to fetch message `visualeditor-annotationbutton-magiclinknode` and related keys - https://phabricator.wikimedia.org/T129704 [22:32:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:35:40] (03PS2) 10Ori.livneh: Turn on RecursiveLinkPurge log bucket, for I29636c045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277677 [22:35:57] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:37:09] (03CR) 10Ori.livneh: [C: 032] Turn on RecursiveLinkPurge log bucket, for I29636c045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277677 (owner: 10Ori.livneh) [22:37:34] (03Merged) 10jenkins-bot: Turn on RecursiveLinkPurge log bucket, for I29636c045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277677 (owner: 10Ori.livneh) [22:39:11] 6Operations: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2124781 (10Dzahn) ehmm.. everything was alright earlier. then the error popped up about puppet not being able to start ganglia-monitor-service (that is not the aggregator service this whole ticket was about), then i... [22:40:09] !log ori@tin Synchronized wmf-config/InitialiseSettings.php: I87c174cf83: Turn on RecursiveLinkPurge log bucket, for I29636c045 (duration: 00m 25s) [22:40:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:41:24] 6Operations, 10EventBus: setup/deploy conf200[1-3] - https://phabricator.wikimedia.org/T127344#2124806 (10RobH) [22:41:26] 6Operations, 10EventBus, 10hardware-requests: 3 conf200x servers in codfw for zookeeper (and etcd?) - https://phabricator.wikimedia.org/T121882#2124802 (10RobH) 5Open>3stalled T130080 has been created to get quotes for this, and is a blocker to this task. 
[22:41:28] 6Operations, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#2124807 (10RobH) [22:41:34] 6Operations, 10EventBus, 10hardware-requests: 3 conf200x servers in codfw for zookeeper (and etcd?) - https://phabricator.wikimedia.org/T121882#2124808 (10RobH) [22:43:47] 6Operations: Reimage hooft with jessie and rename to bast3001 - https://phabricator.wikimedia.org/T123712#2124828 (10Dzahn) [22:43:49] 6Operations: Make services manageable by systemd (tracking) - https://phabricator.wikimedia.org/T97402#2124830 (10Dzahn) [22:43:51] 6Operations: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2124826 (10Dzahn) 5Open>3Resolved can't reproduce. i can killall -u ganglia, run puppet. all things come back normal without issue. multiple times [22:43:53] 6Operations: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#2124829 (10Dzahn) [22:44:58] (03CR) 10Hashar: "I am myself using:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277585 (owner: 10Ori.livneh) [22:45:59] (03PS1) 10Papaul: DNS: Adding production entries for wasat Bug:T129930 [dns] - 10https://gerrit.wikimedia.org/r/277680 (https://phabricator.wikimedia.org/T129930) [22:46:15] 7Blocked-on-Operations, 10Datasets-Archiving, 10Dumps-Generation, 10Flow, 3Collaboration-Team-Current: Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#2124834 (10Mattflaschen) It has been merged. [23:00:04] RoanKattouw ostriches Krenair MaxSem: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160315T2300). [23:00:04] RoanKattouw Jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. 
[23:01:22] I'll do it [23:02:29] (03CR) 10Catrope: [C: 032] On Beta Cluster: Use different logo for login form [mediawiki-config] - 10https://gerrit.wikimedia.org/r/243732 (https://phabricator.wikimedia.org/T115078) (owner: 10Jdlrobson) [23:02:43] RoanKattouw: :) [23:03:29] (03PS4) 10Madhuvishy: [WIP] ifttt: Set up Wikimedia IFTTT channel service using puppet on labs [puppet] - 10https://gerrit.wikimedia.org/r/277189 [23:05:16] (03CR) 10Catrope: [C: 032] WikidataPageBanner config changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277431 (https://phabricator.wikimedia.org/T129099) (owner: 10Jdlrobson) [23:06:24] (03Merged) 10jenkins-bot: On Beta Cluster: Use different logo for login form [mediawiki-config] - 10https://gerrit.wikimedia.org/r/243732 (https://phabricator.wikimedia.org/T115078) (owner: 10Jdlrobson) [23:08:37] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 55.17% of data above the critical threshold [5000000.0] [23:08:49] (03Merged) 10jenkins-bot: WikidataPageBanner config changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277431 (https://phabricator.wikimedia.org/T129099) (owner: 10Jdlrobson) [23:10:02] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: WikidataPageBanner config changes (duration: 00m 33s) [23:10:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:10:35] (03CR) 10Catrope: [C: 032] Strip rather than hide HTML [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276809 (https://phabricator.wikimedia.org/T110613) (owner: 10Jdlrobson) [23:11:11] (03Merged) 10jenkins-bot: Strip rather than hide HTML [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276809 (https://phabricator.wikimedia.org/T110613) (owner: 10Jdlrobson) [23:12:44] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Strip rather than hide HTML in MobileFrontend (duration: 00m 29s) [23:12:48] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:13:12] jdlrobson: That's all of your patches done, please test/confirm [23:13:18] on it [23:15:34] 2/3 verified :) [23:16:44] 6Operations, 10ops-codfw: rack new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2124929 (10Papaul) [23:18:59] RoanKattouw: all verified! Thanks! [23:22:18] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0] [23:28:26] (03PS1) 10Papaul: DHCP: Adding MAC address for wasat Bug:T129930 [puppet] - 10https://gerrit.wikimedia.org/r/277691 (https://phabricator.wikimedia.org/T129930) [23:28:28] 6Operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service: implement wdqs1001/1002 disk upgrades (extend lvm) - https://phabricator.wikimedia.org/T120714#2124959 (10Smalyshev) Yes, that's the plan - to do both reimage & full reload. [23:28:41] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: The refreshLinks jobs enqueue rate is 10 times the normal rate - https://phabricator.wikimedia.org/T129517#2124961 (10hashar) I haven't really looked at the issue but both svwiki and frwiki have a very similar pattern with almost all. I wrote a very very... [23:30:22] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: The refreshLinks jobs enqueue rate is 10 times the normal rate - https://phabricator.wikimedia.org/T129517#2124963 (10hashar) For the record: ``` mwscript extensions/WikimediaMaintenance/getJobQueueLengths.php | sort -n -k2 | tail -n10 itwiki 3007 enwik...
[23:33:29] !log ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: AdHocDebug for JobQueueGroup->lazyPush() (duration: 00m 28s) [23:33:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:35:45] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2124967 (10Smalyshev) If upgrade is hard I imaging extracting just that part in curl extension and backporting it may be possible. Dependin... [23:36:15] (03PS1) 10Papaul: Adding install params for wasat Bug:T129930 [puppet] - 10https://gerrit.wikimedia.org/r/277694 (https://phabricator.wikimedia.org/T129930) [23:37:42] !log ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: AdHocDebug for JobQueueGroup->lazyPush() (with ip) (duration: 00m 30s) [23:37:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:38:23] 6Operations, 10ops-codfw: rack new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2124976 (10Papaul) [23:38:25] (03PS2) 10Dzahn: DHCP: Adding MAC address for wasat Bug:T129930 [puppet] - 10https://gerrit.wikimedia.org/r/277691 (https://phabricator.wikimedia.org/T129930) (owner: 10Papaul) [23:38:30] (03CR) 10Dzahn: [C: 032] DHCP: Adding MAC address for wasat Bug:T129930 [puppet] - 10https://gerrit.wikimedia.org/r/277691 (https://phabricator.wikimedia.org/T129930) (owner: 10Papaul) [23:40:52] !log ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: AdHocDebug for JobQueueGroup->lazyPush() (exclude job runners) (duration: 00m 28s) [23:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:43:02] (03PS2) 10Dzahn: Adding install params for wasat Bug:T129930 [puppet] - 10https://gerrit.wikimedia.org/r/277694 (https://phabricator.wikimedia.org/T129930) (owner: 10Papaul) [23:43:08] (03CR) 10Dzahn: [C: 032] 
Adding install params for wasat Bug:T129930 [puppet] - 10https://gerrit.wikimedia.org/r/277694 (https://phabricator.wikimedia.org/T129930) (owner: 10Papaul) [23:43:16] (03CR) 10Dzahn: [V: 032] Adding install params for wasat Bug:T129930 [puppet] - 10https://gerrit.wikimedia.org/r/277694 (https://phabricator.wikimedia.org/T129930) (owner: 10Papaul) [23:43:44] !log ori@tin Synchronized php-1.27.0-wmf.16/includes/jobqueue/JobQueueGroup.php: (no message) (duration: 00m 24s) [23:43:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:44:51] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: The refreshLinks jobs enqueue rate is 10 times the normal rate - https://phabricator.wikimedia.org/T129517#2124984 (10hashar) [23:44:54] (03PS2) 10Dzahn: DNS: Adding production entries for wasat Bug:T129930 [dns] - 10https://gerrit.wikimedia.org/r/277680 (https://phabricator.wikimedia.org/T129930) (owner: 10Papaul) [23:48:09] 6Operations, 10ops-codfw, 13Patch-For-Review: labstore2003-labstore2004 onsite setup task - https://phabricator.wikimedia.org/T128764#2125006 (10Papaul) [23:55:20] 6Operations, 13Patch-For-Review: Mediawiki font packages: switch to Jessie - https://phabricator.wikimedia.org/T102623#2125018 (10Dereckson) [23:55:38] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [23:56:41] (03PS1) 10Catrope: Enable Flow by default in all talk namespaces on gomwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277697 (https://phabricator.wikimedia.org/T128359)