[00:00:38] where can i see the full text of that OperationalError above^ ? [00:00:54] which box, or a pastebin :) [00:01:04] holmium [00:01:07] but I will paste, hang on [00:01:10] !log on tin: disabling puppet for scap test. Patching scap locally [00:01:13] Logged the message, Master [00:01:20] (swat is over, btw) [00:02:08] springle: https://phabricator.wikimedia.org/P404 [00:04:17] springle: sorry, I have a maximum number of debug channels turned on. Can pare them down if it’s making for illegible output [00:04:42] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: puppet fail [00:05:17] on mw1101 was my fault, I pressed ctrl-C [00:07:17] andrewbogott: the recordsets table is trying to reference a foreign key on domains table, but the fields have different data types. not sure how this would ever work [00:07:26] what generated the CREATE TABLE statement? [00:07:39] springle: I bet they only ever test on sqlite — maybe it has less stringent error checking? [00:08:28] !log tstarling Started scap: (no message) [00:08:31] Logged the message, Master [00:08:36] !log tstarling scap failed: NameError global name 'random' is not defined (duration: 00m 08s) [00:08:39] Logged the message, Master [00:08:41] springle: All I’ve run is designate-manage database-init and designate-manage database-sync [00:10:22] !log tstarling Started scap: (no message) [00:10:30] !log tstarling scap failed: TypeError cannot concatenate 'str' and 'int' objects (duration: 00m 07s) [00:10:33] Logged the message, Master [00:11:34] !log tstarling Started scap: (no message) [00:11:42] !log tstarling scap failed: CalledProcessError Command 'mkdir "/tmp/scap_l10n_3829571451" && /usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="cawikibooks" --outdir="/tmp/scap_l10n_3829571451" --threads=4 %(force)s --quiet' returned non-zero exit status 2 (duration: 00m 07s) [00:11:45] Logged the message, Master [00:15:31] andrewbogott: i created a correct verions of the table recordsets. 
try it again? [00:17:40] springle: I think now it’s erroring out because it’s trying to upgrade a table that’s already upgraded [00:17:51] !log tstarling Started scap: (no message) [00:17:52] (OperationalError) (1050, "Table 'recordsets' already exists") [00:17:59] !log tstarling scap failed: CalledProcessError Command 'mkdir "/tmp/scap_l10n_110284512" && /usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="cawikibooks" --outdir="/tmp/scap_l10n_110284512" --threads=4 %(force)s --quiet' returned non-zero exit status 2 (duration: 00m 08s) [00:18:02] Logged the message, Master [00:18:36] andrewbogott: this is obviously untested. we should let it use sqlite and file bugs, or go away and test this process on a non-production system [00:19:06] springle: do you mind if we give it one more chance before surrender? [00:19:13] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [00:19:26] andrewbogott: how? :) migration won't run [00:20:34] springle: Ah, well, this failure I take to be a symptom of the previous failure. But, hm... [00:21:05] Part of the context is that I’m running an oldish version of this, because of various dependency issues. So it’s likely not as crappy in the up-to-date version [00:22:02] the new version might have these bugs fixed :) [00:22:13] yes, but I can’t run the new version [00:22:53] can you try dropping recordsets? I’ll rerun and if it fails again then I’ll just fall back on sqlite [00:23:03] ok done [00:23:04] and hope that we can migrate to a proper db sometime later [00:23:29] bah, on to the next one. [00:23:31] Ok, screw this :) [00:23:34] Thank you for your help. [00:23:37] (03CR) 10Alex Monk: Set $wgRateLimits['badcaptcha'] to counter bots (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195886 (https://phabricator.wikimedia.org/T92376) (owner: 10Nemo bis) [00:23:40] andrewbogott: :) [00:23:49] andrewbogott: are you leaving pdns on mysql?
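A minimal sketch of the point raised at 00:07:17 and 00:07:39: SQLite accepts a table whose foreign key column has a different declared type from the column it references, while MySQL's InnoDB rejects such a definition. The schema below is hypothetical and simplified; the real designate column types do not appear in the log, and the helper name `create_mismatched_schema` is made up for illustration.

```python
import sqlite3

# Hypothetical, simplified schema. The actual designate column types are not
# shown in the log; the only point is that the two key types do not match.
SCHEMA = """
CREATE TABLE domains (
    id CHAR(32) PRIMARY KEY
);
CREATE TABLE recordsets (
    id INTEGER PRIMARY KEY,
    domain_id INTEGER REFERENCES domains (id)
);
"""


def create_mismatched_schema():
    """Create both tables in an in-memory SQLite database and list them."""
    conn = sqlite3.connect(":memory:")
    # Foreign key enforcement is off by default in SQLite; turning it on
    # still does not make CREATE TABLE compare the declared column types.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    conn.close()
    return names


if __name__ == "__main__":
    print(create_mismatched_schema())
```

This would be consistent with the guess that the migrations were only ever exercised against sqlite, though the log does not confirm it.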
[00:24:04] springle: yes, that seems like the more critical component anyway [00:24:42] ok [00:26:14] (03PS1) 10Andrew Bogott: Use designate defaults (sqlite) for its db. [puppet] - 10https://gerrit.wikimedia.org/r/197258 [00:26:33] springle: designate will talk directly to mysql in order to add pdns entries. I hope /that/ isn’t screwed up as well :( [00:27:05] (03Abandoned) 10Andrew Bogott: s/connection/database_connection in designate conf. [puppet] - 10https://gerrit.wikimedia.org/r/197252 (owner: 10Andrew Bogott) [00:27:40] (03PS2) 10Andrew Bogott: Use designate defaults (sqlite) for its db. [puppet] - 10https://gerrit.wikimedia.org/r/197258 [00:28:54] (03CR) 10Andrew Bogott: "Can you please explain more about this? Silver has a cron that runs the local job queue every minute -- and I'm sure that works, because " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190406 (owner: 10Ori.livneh) [00:29:21] (03PS2) 10Alex Monk: Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: 10Jforrester) [00:29:23] (03CR) 10Andrew Bogott: [C: 032] Use designate defaults (sqlite) for its db. [puppet] - 10https://gerrit.wikimedia.org/r/197258 (owner: 10Andrew Bogott) [00:30:41] springle: I need to go make dinner now. I’m unblocked for the moment — thanks. [00:34:00] (03CR) 10Alex Monk: "Hi Mjbmr. I'm having trouble verifying this - please could you clarify where this came from? The linked task does not explain it..." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/195017 (owner: 10Mjbmr) [00:42:19] (03CR) 10Dzahn: [C: 031] "nitpick, single quotes please" [puppet] - 10https://gerrit.wikimedia.org/r/193837 (owner: 10ArielGlenn) [00:43:27] (03PS3) 10Dzahn: Backup /var/lib/jenkins/config.xml on gallium [puppet] - 10https://gerrit.wikimedia.org/r/165991 (owner: 10Alexandros Kosiaris) [00:43:56] !log tstarling Started scap: (no message) [00:44:49] !log running extensions/GlobalUsage/refreshGlobalimagelinks.php --wiki=plwiki --pages=nonexisting [00:44:53] Logged the message, Master [00:46:42] (03PS4) 10Dzahn: Backup /var/lib/jenkins/config.xml on gallium [puppet] - 10https://gerrit.wikimedia.org/r/165991 (https://phabricator.wikimedia.org/T65938) (owner: 10Alexandros Kosiaris) [00:47:07] (03PS1) 10Ori.livneh: Increase HHVM's memory_limit from 300M to 500M [puppet] - 10https://gerrit.wikimedia.org/r/197261 [00:47:27] TimStarling: ^ [00:47:41] I can apply / babysit [00:48:12] (03CR) 10Tim Starling: [C: 031] Increase HHVM's memory_limit from 300M to 500M [puppet] - 10https://gerrit.wikimedia.org/r/197261 (owner: 10Ori.livneh) [00:48:22] (03PS2) 10Ori.livneh: Increase HHVM's memory_limit from 300M to 500M [puppet] - 10https://gerrit.wikimedia.org/r/197261 [00:48:25] !log tstarling scap failed: ValueError unsupported format character '/' (0x2f) at index 18 (duration: 04m 29s) [00:48:32] Logged the message, Master [00:48:44] (03CR) 10Ori.livneh: [C: 032 V: 032] "_joe_, FYI." [puppet] - 10https://gerrit.wikimedia.org/r/197261 (owner: 10Ori.livneh) [00:50:30] (03CR) 10Dzahn: [C: 032] "added bug T65938 . this was needed, but the ticket lists more" [puppet] - 10https://gerrit.wikimedia.org/r/165991 (https://phabricator.wikimedia.org/T65938) (owner: 10Alexandros Kosiaris) [00:50:31] ori: Can you do a quick config change? https://gerrit.wikimedia.org/r/#/c/197250/ reverts what turned out to be a no-op for production but breaking for Beta Cluster. 
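The scap failures logged at 00:10:30, 00:11:42 and 00:48:25 look like common pitfalls of Python's %-style string formatting: joining a string and an integer with `+`, a `%(force)s` placeholder that shows up verbatim in the executed command (which suggests the template was not interpolated at that point), and a literal `%` followed by an unsupported character. The sketch below reproduces each class of error in isolation. It is not the scap code; the function names and command strings are hypothetical, and the exact exception wording differs between Python 2 (as in the log) and Python 3.

```python
def build_command(template, params):
    """Interpolate a %-style command template with a dict of parameters."""
    return template % params


def temp_dir_name(prefix, suffix):
    """Build a directory name; str() avoids str + int concatenation errors."""
    return prefix + str(suffix)


def classify_failure(func, *args):
    """Return the exception class name raised by func(*args), or None."""
    try:
        func(*args)
    except Exception as exc:  # broad on purpose: we only report the type
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # Every placeholder has a matching key, so interpolation succeeds.
    print(build_command(
        'rebuild --outdir="%(outdir)s" %(force)s --quiet',
        {"outdir": "/tmp/example", "force": "--force"},
    ))
    # A bare "%" followed by "/" is not a valid conversion specifier.
    print(classify_failure(build_command, "sudo -u www-data %/bin/true", {}))
    # Concatenating str and int directly raises TypeError.
    print(classify_failure(lambda: "/tmp/scap_l10n_" + 123))
```

A literal percent sign in such a template has to be written as `%%`.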
[00:52:08] James_F: let's wait for TimStarling to be done with scap, if that's OK [00:52:17] Of course. [00:52:58] !log tstarling Started scap: (no message) [00:53:13] yeah I'm a bit rusty at python [00:53:29] all the more reason to maintain this tool [00:54:08] what's the proper way to test scap? presumably patching tin live is not it [00:54:30] live patch in beta or make a local test rig [00:54:56] I have a vagrant vm that I can test things in but I failed to ever make it reproducable [00:55:07] beta is either very useful or not at all, depending on whether what you want to test happens to touch one of its many idiosyncracies [00:55:13] so live patching beta is more easily accessible [00:55:31] TimStarling: pastebin a diff? (Or is there a Gerrit patch somewhere?) [00:56:00] well, the reason I am testing it live on tin is to spare you from looking at my awful first-cut python [00:56:04] tin on labs is [00:56:05] Beta is much less idiosyncratic today; Yuvi have been squashing differences right and left lately [00:56:18] owww come on [00:56:29] for scap for sure [00:57:07] All of the beta::scap::* classes I created in puppet are gone as of the end of last week [00:57:26] !log tstarling scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_2149279197/*" "/srv/mediawiki-staging/php-1.25wmf20/cache/l10n"' returned non-zero exit status 1 (duration: 04m 27s) [00:57:30] Logged the message, Master [00:57:53] 6operations, 7HTTPS: implement Public Key Pinning (HPKP) for Wikimedia domains - https://phabricator.wikimedia.org/T92002#1124034 (10faidon) [00:58:00] logmsgbot is into public shaming [00:58:52] !log Increased memory limit on HHVM app servers from 300M to 500M in an attempt to reduce the rate at which T89918 occurs [00:58:55] Logged the message, Master [00:59:18] (03PS1) 10Tim Starling: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 [00:59:27] ^ since you insist [00:59:34] (03CR) 10jenkins-bot: [V: 04-1] 
Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:01:19] !log tstarling Started scap: (no message) [01:03:48] 6operations, 10ops-eqiad: Dear eqiad@rt.wikimedia.org, Call for Submissions on Various Academic Disciplines - https://phabricator.wikimedia.org/T92930#1124056 (10emailbot) [01:04:41] !log disabled contacts.wikimedia.org - if you are an (unexpected) user please contact me or T90679 [01:04:44] Logged the message, Master [01:05:13] nice botspam there [01:07:05] yea. declined [01:07:30] spam has a project :p [01:07:53] so phabricator will be like OTRS, with gigabytes of spam per year? [01:08:04] OTRS has a spam queue [01:09:11] speaking of phabricator [01:09:22] awesome phab issue I discovered the other day [01:09:30] has somebody suggested yet to replace OTRS with phabricator? [01:09:34] try creating a new task [01:09:44] in the projects field try to add security [01:10:06] so type 'Sec', you get 'Security' suggested first, followed by Security-Core etc. 
[01:10:13] if you type 'Secu', though [01:10:20] Security just disappears :) [01:12:55] hmm, it seems to work for me [01:12:56] (03PS2) 10Tim Starling: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 [01:13:09] (03CR) 10jenkins-bot: [V: 04-1] Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:13:14] oh, no, confirmed :p weird [01:13:20] TimStarling: not your best day, eh :) [01:15:11] (03PS3) 10Tim Starling: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 [01:15:24] ori wanted it in gerrit [01:15:25] (03CR) 10jenkins-bot: [V: 04-1] Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:15:48] so now I get to play with the style bot [01:17:47] (03PS4) 10Tim Starling: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 [01:18:11] btw it appears to work now, scap is running [01:18:24] can I upload a diff? [01:18:46] that's what https://gerrit.wikimedia.org/r/197262 is [01:18:56] all 4 patchsets [01:19:11] yay flake8 success [01:19:32] PROBLEM - Apache HTTP on mw1139 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:19:42] let me look at mw1139 instead. 
[01:19:44] PROBLEM - HHVM rendering on mw1139 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:21:28] !log restarted HHVM on mw1139; bt in https://phabricator.wikimedia.org/P406 [01:21:35] Logged the message, Master [01:21:43] RECOVERY - Apache HTTP on mw1139 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.082 second response time [01:21:53] RECOVERY - HHVM rendering on mw1139 is OK: HTTP OK: HTTP/1.1 200 OK - 66730 bytes in 0.582 second response time [01:25:51] (03PS3) 10Dzahn: Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:26:42] (03CR) 10jenkins-bot: [V: 04-1] Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:27:39] (03PS4) 10Dzahn: Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:28:30] (03CR) 10jenkins-bot: [V: 04-1] Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:29:07] (03PS5) 10Dzahn: WIP - Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:29:50] (03CR) 10jenkins-bot: [V: 04-1] WIP - Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:31:24] (03PS6) 10Dzahn: Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:31:38] !log tstarling Finished scap: (no message) (duration: 30m 19s) [01:31:44] Logged the message, Master [01:32:00] that's a long bloody time [01:32:57] (03CR) 10Tim Starling: "Tested on tin, seems to work." 
[tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:33:40] (03CR) 10Dzahn: "rebased onto the new mediawiki module from mediawiki_new, fixed class name, tabs" [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:35:27] (03CR) 10Dzahn: "Uploaded 2012-12-12 - py is reviewer :)" [puppet] - 10https://gerrit.wikimedia.org/r/38307 (owner: 10J) [01:36:53] (03PS7) 10Dzahn: Add apparmor profiles for avconv/ffmpeg2theora [puppet] - 10https://gerrit.wikimedia.org/r/38307 (https://phabricator.wikimedia.org/T42099) (owner: 10J) [01:39:31] (03PS5) 10Ori.livneh: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:39:42] TimStarling: if you don't like it or it doesn't work, revert to your earlier PS [01:39:43] (03CR) 10jenkins-bot: [V: 04-1] Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:39:45] but see if you like it [01:39:46] ugh [01:41:00] (03PS6) 10Ori.livneh: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:41:12] there. 
[01:41:14] (03CR) 10jenkins-bot: [V: 04-1] Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:41:43] (03PS7) 10Ori.livneh: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:44:23] !log ori Started scap: (no message) [01:44:29] Logged the message, Master [01:44:33] !log ori scap failed: TypeError temp_dir() takes exactly 2 arguments (1 given) (duration: 00m 10s) [01:44:36] Logged the message, Master [01:45:15] !log ori Started scap: (no message) [01:45:26] !log ori scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_0.713383032704/*" "/srv/mediawiki-staging/php-1.25wmf20/cache/l10n"' returned non-zero exit status 1 (duration: 00m 10s) [01:45:29] Logged the message, Master [01:47:13] why did the cp fail? [01:48:12] damn it all to hell. i'll just revert to PS4 [01:49:24] (03PS8) 10Ori.livneh: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:49:52] (03CR) 10Ori.livneh: [C: 032] Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [01:50:00] we all learned an important lesson today [01:50:07] (03Merged) 10jenkins-bot: Run rebuildLocalisationCache.php as www-data [tools/scap] - 10https://gerrit.wikimedia.org/r/197262 (owner: 10Tim Starling) [02:03:32] !log applied I98d383a629 locally on mw1017 [02:03:37] Logged the message, Master [02:04:16] ori: What's that lesson? utils.temp_dir is unreliable?
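On the question "why did the cp fail?" at 01:47:13: the log never gives an answer, but one possible explanation is visible in the logged command itself. The `*` sits inside double quotes, and a POSIX shell does not expand glob patterns inside quotes, so `cp` is asked to copy a file literally named `*`. The sketch below demonstrates that behaviour on throwaway directories and shows one alternative that expands the pattern in Python. It assumes a POSIX shell and `cp` are available; the helper names and the file name are hypothetical and this is not the scap code.

```python
import glob
import os
import shutil
import subprocess
import tempfile


def copy_with_quoted_glob(src, dst):
    """Mimic the logged command shape: the * is inside double quotes."""
    cmd = 'cp -r "%s/*" "%s"' % (src, dst)
    # The shell passes the pattern through unexpanded, so cp looks for a
    # file literally named "*" and exits with a non-zero status.
    return subprocess.call(cmd, shell=True, stderr=subprocess.DEVNULL)


def copy_with_python_glob(src, dst):
    """Expand the pattern in Python, then copy each match into dst."""
    for path in glob.glob(os.path.join(src, "*")):
        target = os.path.join(dst, os.path.basename(path))
        if os.path.isdir(path):
            shutil.copytree(path, target)
        else:
            shutil.copy2(path, target)


def demo():
    """Run both variants on temporary directories and report the results."""
    src = tempfile.mkdtemp()
    dst = tempfile.mkdtemp()
    try:
        # Hypothetical file name standing in for a localisation cache file.
        with open(os.path.join(src, "l10n_cache-en.cdb"), "w") as handle:
            handle.write("placeholder")
        quoted_status = copy_with_quoted_glob(src, dst)
        after_shell = sorted(os.listdir(dst))
        copy_with_python_glob(src, dst)
        after_python = sorted(os.listdir(dst))
        return quoted_status, after_shell, after_python
    finally:
        shutil.rmtree(src)
        shutil.rmtree(dst)


if __name__ == "__main__":
    print(demo())
```

Leaving the `*` outside the quotes in the shell command (for example `"dir"/*`) would also let the shell expand it.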
[02:06:10] don't rush to show off because you'll be sloppy and make an ass of yourself [02:06:22] as i did above :X [02:20:54] !log l10nupdate Synchronized php-1.25wmf20/cache/l10n: (no message) (duration: 00m 03s) [02:20:57] Logged the message, Master [02:22:01] !log LocalisationUpdate completed (1.25wmf20) at 2015-03-17 02:20:58+00:00 [02:22:05] Logged the message, Master [02:34:56] !log l10nupdate Synchronized php-1.25wmf21/cache/l10n: (no message) (duration: 00m 04s) [02:35:02] Logged the message, Master [02:36:03] !log LocalisationUpdate completed (1.25wmf21) at 2015-03-17 02:35:00+00:00 [02:36:07] Logged the message, Master [03:52:04] (03PS1) 10Chmarkine: dev.wm.org - Increase HSTS max-age to 1 year [puppet] - 10https://gerrit.wikimedia.org/r/197272 [03:53:06] (03PS2) 10Chmarkine: dev.wm.org - Increase HSTS max-age to 1 year [puppet] - 10https://gerrit.wikimedia.org/r/197272 (https://phabricator.wikimedia.org/T40516) [03:55:03] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [03:55:03] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [04:07:23] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [04:07:23] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:12:12] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 203, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-307236) (#3658) [10Gbps wave]BR [06:06:03] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds [06:07:22] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212 [06:13:12] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 205, down: 0, dormant: 0, 
excluded: 0, unused: 0 [06:26:43] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 203, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-307236) (#3658) [10Gbps wave]BR [06:28:52] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:52] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:33] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:53] PROBLEM - puppet last run on elastic1030 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:53] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:53] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:22] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:23] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:23] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:43] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:12] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:23] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:32] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:32] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:03] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:12] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 205, down: 0, dormant: 0, excluded: 0, unused: 0 [06:45:14] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [06:45:33] RECOVERY - puppet last run on mw1118 is OK: 
OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:45:42] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:45:52] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [06:46:04] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:46:24] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:43] RECOVERY - puppet last run on elastic1030 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:46:53] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:46:53] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:46:53] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:46:53] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:47:03] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:47:32] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:47:42] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:47:52] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:07:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Mar 17 07:06:36 UTC 2015 (duration 6m 35s) [07:07:48] Logged the message, Master [07:14:23] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 
208.80.154.197, interfaces up: 203, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-307236) (#3658) [10Gbps wave] [07:25:14] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [07:27:30] <_joe_> lol [07:27:37] <_joe_> not even in prod, and already down [07:33:44] 6operations, 10ops-codfw, 7network, 3wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1124324 (10Joe) This is still not working. Next time verify it please, or I'm left idle for one more day On mw2080 ```... [07:33:55] 6operations, 10ops-codfw, 7network, 3wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1124325 (10Joe) a:5Joe>3RobH [07:39:02] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 205, down: 0, dormant: 0, excluded: 0, unused: 0 [07:40:39] <_joe_> !log powercycled mw2027, went down with an unresponsive console [07:40:45] Logged the message, Master [07:42:14] RECOVERY - Host mw2027 is UP: PING WARNING - Packet loss = 44%, RTA = 42.89 ms [08:30:56] (03CR) 10Mobrovac: "Is the idea here to enable it back when VE/wmf21 gets deployed onto group2 wikis? If so, I don't understand why this patch, since it's a n" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066) (owner: 10GWicke) [08:44:35] (03PS3) 10Nemo bis: Set $wgRateLimits['badcaptcha'] to counter bots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195886 (https://phabricator.wikimedia.org/T92376) [08:45:09] _joe_: it's living it up while it still can ;) [08:55:33] 6operations, 10Parsoid, 6Services: Move Parsoid config into ops/puppet - https://phabricator.wikimedia.org/T92636#1124390 (10mobrovac) The problem at hand is, in a wider context, service orchestration.
Services depends on one another, e.g. RESTBase - Parsoid or Citoid - Zotero, and thus have to be able to ac... [09:15:33] 6operations, 10Continuous-Integration: gallium.wikimedia.org disk space running low - https://phabricator.wikimedia.org/T91211#1124425 (10hashar) 5Open>3Resolved Resolved for now. Work is in progress to reduce the number of jobs being run that will help keep disk usage at a sane level. [09:17:17] (03CR) 10Mobrovac: "@Ori, cheers for this! I think we need to either set the default value for statsd_host in the citoid class, or add an entry in hiera for t" [puppet] - 10https://gerrit.wikimedia.org/r/197131 (owner: 10Ori.livneh) [09:31:06] (03CR) 10Ricordisamoa: [C: 04-1] Enable Flow post editing for autoconfirmed users on Mediawiki, English, Russian (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [09:32:44] (03PS2) 10Nemo bis: Enable Flow post editing for autoconfirmed users on MediaWiki.org, English, Russian [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [09:32:58] (03CR) 10Nemo bis: ""post-editing" or "editing of post(s)"?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [09:33:35] (03CR) 10Mobrovac: "Ah, now that I had my first coffee I realise what this patch is about :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066) (owner: 10GWicke) [09:40:12] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 702 [09:47:27] <_joe_> mobrovac: first coffee after 10 AM? 
tsk tsk [09:47:28] <_joe_> :) [09:49:18] :) [09:49:29] it's the _end_ of my first coffee [09:49:34] not the same :) [09:49:48] especially when you start looking at patches after the first sip [09:49:57] it takes a while to drink it like that [09:49:58] :) [09:50:10] <_joe_> mobrovac: oh, right, _that_ type of coffee [09:50:12] RECOVERY - check_mysql on db1008 is OK: Uptime: 414049 Threads: 2 Questions: 2707659 Slow queries: 2747 Opens: 8683 Flush tables: 2 Open tables: 64 Queries per second avg: 6.539 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [09:50:20] <_joe_> Italian coffee is an atomic transaction [09:50:22] yup [09:50:28] yep, i know [09:57:13] Depends on context [10:02:40] <_joe_> Nemo_bis: would you sip a cup of espresso for more than 5 minutes? [10:07:16] (03PS11) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) [10:09:37] 6operations, 10Math, 7Wikimedia-log-errors: Fatal in production due to outdated package: Missing "texvccheck" executable - https://phabricator.wikimedia.org/T92707#1124606 (10Aklapper) p:5Triage>3High [10:10:19] (03CR) 10Phuedx: "@Kaldari: could you review these and get 'em merged while I sort out my perms?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194354 (owner: 10Phuedx) [10:15:08] 6operations, 10Math, 7Wikimedia-log-errors: Fatal in production due to outdated package: Missing "texvccheck" executable - https://phabricator.wikimedia.org/T92707#1124634 (10Joe) a:3Joe [10:17:30] 6operations, 10Math, 7Wikimedia-log-errors: Fatal in production due to outdated package: Missing "texvccheck" executable - https://phabricator.wikimedia.org/T92707#1124639 (10Joe) the mentioned machines have probably been installed before the new version of the package was uploaded, and no one bothered to ch... 
[10:20:23] 6operations: NIC misassigned (double entries) by jessie installer - https://phabricator.wikimedia.org/T90236#1124640 (10fgiunchedi) a:3fgiunchedi yep I'll take this @coren [10:32:31] (03CR) 10Mjbmr: "Hi," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195017 (owner: 10Mjbmr) [10:33:58] _joe_: rarely :) I know people who consume one in 1 hour or more, but admittely one is German and the other sort-of-Greek [10:35:34] sort of? :) [10:38:02] ah no no, espresso is not coffee, espresso is espresso [10:38:10] please, let's not confuse things :) [10:39:09] i.e. one set of (social) rules apply to an espresso, and another to (other types of) coffee [10:42:19] (03PS8) 10Yuvipanda: ldap+yaml file puppet ENC for self hosted puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/196628 (https://phabricator.wikimedia.org/T85279) [10:42:24] _joe_: ^ fixed the ldap issue you pointed out :) [10:42:56] <_joe_> YuviPanda|zzz: how did you do it? I'm not sure you can [10:43:12] _joe_: just validated hostnames to conform to a fairly strict regex. [10:43:25] <_joe_> oh that one, right [10:43:25] _joe_: plus the user being used is a ‘public’ user - everyone has access to that one’s password :) [10:43:58] _joe_: as for performance, I think it’s ok in this context (more rationale in my response on the patch). even with 50 hosts, we’re looking at one invocation every 24s, which is fine, I think [10:44:27] <_joe_> yes, ok [10:45:42] _joe_: this will also allow me to get rid of hiera_include(‘classes’) :) and then when horizon is more stable, move the ENC stuff there, and there it’ll be properly done as a persistant service and performant. [10:45:49] 6operations, 7network: Very slow connection to wmf engineering infrastructure - https://phabricator.wikimedia.org/T92548#1124727 (10Aklapper) Wondering if {T92513} is related / one manifestation. I am afraid I cannot reproduce (being in Europe, it's 10:42UTC): ``` $:andre\> wget -v http://tools.wmflabs.org/s... 
[10:45:56] <_joe_> YuviPanda|zzz: ok [10:46:03] <_joe_> btw, lose that zzz :) [10:46:44] ah :) [10:46:44] right [10:51:03] (03PS9) 10Yuvipanda: ldap+yaml file puppet ENC for self hosted puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/196628 (https://phabricator.wikimedia.org/T85279) [10:55:24] 6operations, 7network: Gerrit (ssh) is unusably slow at certain times of day from europe - https://phabricator.wikimedia.org/T92513#1124752 (10faidon) [10:58:45] (03PS12) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) [10:59:41] s/the Dallas DC/codfw/ :P [10:59:52] !log depool restbase1006, provisioning [10:59:56] Logged the message, Master [11:00:06] we have two Dallas DCs now :P [11:00:56] (03CR) 10Giuseppe Lavagetto: [C: 031] "Thanks for addressing my concerns" [puppet] - 10https://gerrit.wikimedia.org/r/196628 (https://phabricator.wikimedia.org/T85279) (owner: 10Yuvipanda) [11:01:16] <_joe_> paravoid: we have two dallas DCs with mediawiki in it? :) [11:01:22] no, exactly ;) [11:01:38] sorry, EPEDANTIC [11:01:47] I was half kidding :) [11:02:41] aude: hey [11:02:55] or jzerebecki? [11:03:15] ? [11:03:31] paravoid: hi [11:03:31] hi [11:03:35] hey [11:03:48] I'm reading https://phabricator.wikimedia.org/T92513 [11:03:50] ok [11:03:54] (03CR) 10Alexandros Kosiaris: "This looks fine, merging. @mobrovac: yup hiera sounds perfect. statsd.eqiad.wmnet should be OK for now and we should revisit when citoid i" [puppet] - 10https://gerrit.wikimedia.org/r/197131 (owner: 10Ori.livneh) [11:04:02] (03CR) 10Alexandros Kosiaris: [C: 032] Add configurable metric reporting for citoid [puppet] - 10https://gerrit.wikimedia.org/r/197131 (owner: 10Ori.livneh) [11:04:08] do I read that right that one of the manifestations of this affects the whole WMDE office? 
[11:04:12] i can do those things [11:04:23] it affects the office, but also home and remote people [11:04:30] might be just germany, not sure [11:04:40] oh wow [11:04:47] e.g. hoo [11:04:49] _joe_: ty! [11:05:02] and umherirrender [11:05:08] (03PS10) 10Yuvipanda: ldap+yaml file puppet ENC for self hosted puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/196628 (https://phabricator.wikimedia.org/T85279) [11:05:14] FlorianSW is also experiencing issues, T92548 [11:05:16] also germany [11:05:46] so yes, traceroute + IP please [11:05:49] yea also via dtag.de [11:05:51] ok [11:06:11] <_joe_> what issues do you have exactly? connecting to gerrit? [11:06:19] is this reproducible now? [11:06:24] <_joe_> it's fine from italy [11:06:39] (03CR) 10Yuvipanda: [C: 032] ldap+yaml file puppet ENC for self hosted puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/196628 (https://phabricator.wikimedia.org/T85279) (owner: 10Yuvipanda) [11:06:55] only certain times of day, as reported in the bug [11:07:08] I know, is this one of those times? :) [11:07:12] hmm [11:07:15] puppet merge is failing [11:07:16] * YuviPanda sees [11:07:21] paravoid: it's not [11:07:28] it starts around 5pm CET [11:07:32] for a few hours [11:07:37] right [11:07:42] (03PS1) 10Filippo Giunchedi: restbase: restbase1006 back in service [puppet] - 10https://gerrit.wikimedia.org/r/197307 [11:07:42] that's congestion [11:07:51] some link, probably dtag's with one of our carriers, gets saturated [11:08:02] 6operations, 10ops-codfw, 7network, 3wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1124777 (10Joe) mw2135 is installing just fine right now, I'll create a precise map of what works/doesn't as soon as poss... 
[11:08:23] internet politics almost probably but we can try picking another path [11:08:46] seems like that is the problem [11:09:12] YuviPanda: on it [11:09:21] akosiaris: cool :) [11:09:28] mobrovac: hey, https://gerrit.wikimedia.org/r/197307 for restbase1006 [11:10:15] godog: yupiii [11:10:23] !log chown gitpuppet:gitpuppet /var/lib/git/operations/puppet/.git/logs/refs/remotes/origin/production on strontium, palladium. Somehow it was owned by root [11:10:25] provisioning now or already done? [11:10:28] Logged the message, Master [11:10:30] YuviPanda: ^ [11:10:31] there's two paths involved though (forward & reverse), plus there's no gurantee the same is not happening with the alternative path we pick as well [11:10:43] so I'd like some realtime feedback when I'm going to start making changes [11:10:51] I'll try catching one of you (or hoo|away) after 5pm CET [11:10:58] jzerebecki is getting traceroute [11:11:08] mobrovac: it is up ATM, looking at bringing up cassandra [11:11:09] (03CR) 10Mobrovac: [C: 031] restbase: restbase1006 back in service [puppet] - 10https://gerrit.wikimedia.org/r/197307 (owner: 10Filippo Giunchedi) [11:11:13] thank you :) [11:11:13] akosiaris: cool. merged yours too :D [11:11:23] no, it is still failing [11:11:24] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] restbase: restbase1006 back in service [puppet] - 10https://gerrit.wikimedia.org/r/197307 (owner: 10Filippo Giunchedi) [11:11:25] you should rebase before merging :D gets rid of merge commits [11:11:33] akosiaris: thanks! [11:11:34] it definitely goes through dtag.de for me [11:11:37] I dislike rebases [11:11:49] akosiaris: hmm, fails on strontium as well [11:12:10] akosiaris: hmm, I just like fast forwards. [11:12:30] well dtag is huge, that alone doesn't say much [11:12:31] ow, I'll hold puppet-merge, YuviPanda my changes are good to be merged whenever [11:12:33] in other news, https://test.wikidata.org/wiki/Special:Search (allof test. 
wikidata) has no javascript [11:12:55] any idea if something changed in the past day that could be related? [11:13:01] dtag == deutsche telekom, for the rest of our readers :P [11:13:09] ok [11:14:52] !log ran chown -R gitpuppet:gitpuppet /var/lib/git/operations/puppet on strontium, fix permission issues [11:14:55] Logged the message, Master [11:15:42] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 5 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [11:16:24] !log ran chown -R gitpuppet:gitpuppet /var/lib/git/operations/puppet on palladium, fix permission issues [11:16:25] how did all those root owned files ended up on strontium and palladium ? [11:16:28] Logged the message, Master [11:16:44] did someone ran git pull/merge as root ? [11:16:49] run* [11:16:53] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [11:19:38] akosiaris: not sure. [11:30:47] (03PS1) 10Alexandros Kosiaris: Followup commit for cd41322 [puppet] - 10https://gerrit.wikimedia.org/r/197309 [11:32:06] (03PS1) 10Mobrovac: Citoid: set the StatsD host to statsd.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/197310 (https://phabricator.wikimedia.org/T87496) [11:33:24] (03CR) 10Alexandros Kosiaris: [C: 032] Followup commit for cd41322 [puppet] - 10https://gerrit.wikimedia.org/r/197309 (owner: 10Alexandros Kosiaris) [11:36:07] (03CR) 10Silke Meyer: "LOL! This was even 2 years ago. Not needed any more. :)" [puppet] - 10https://gerrit.wikimedia.org/r/52043 (owner: 10Silke Meyer) [11:37:05] (03Abandoned) 10Faidon Liambotis: Install Solr and Solarium on Wikidata test repos. 
[puppet] - 10https://gerrit.wikimedia.org/r/52043 (owner: 10Silke Meyer) [11:38:01] 6operations, 10ops-codfw, 7network, 3wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1124860 (10Joe) ok, my suspicions were confirmed: @RobH configured the new servers (mw2135 onwards), but there are still... [11:38:09] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1124861 (10fgiunchedi) 3NEW [11:39:49] 6operations, 10Parsoid, 6Services: Move Parsoid config into ops/puppet - https://phabricator.wikimedia.org/T92636#1124872 (10faidon) Well, service discovery is something that we are (probably :)) going to tackle this coming quarter, likely with a Zookeeper/etcd (etc.) solution. I don't think such a system wo... [11:45:34] godog: re cassandra problem, https://gerrit.wikimedia.org/r/#/c/195483/ should help as it excludes the current node from the seeds [11:46:33] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1124876 (10fgiunchedi) looks like restbase1005 fails to stream, there are not many indications of why that is the case ATM ``` INFO [HANDSHAKE-/10.64.48.100] 2015-03-17 11:30:40,5... [11:47:42] mobrovac: heh I manually removed the node before starting cassandra so I'm not sure it is related, see also my last update [11:49:27] uf, nullpointerex, the second most informative error after segfault [11:49:28] :) [11:49:56] heheh [11:52:11] godog: the log you pasted in the comment is from rb1006 or rb1005? 
[11:54:18] (03CR) 10Alexandros Kosiaris: [C: 032] Citoid: set the StatsD host to statsd.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/197310 (https://phabricator.wikimedia.org/T87496) (owner: 10Mobrovac) [11:54:35] mobrovac: the second log is from 1005 [12:01:49] hm, the stack traces don't help much [12:01:50] :( [12:03:37] (03PS1) 10KartikMistry: Beta: Enable more languages in Beta [puppet] - 10https://gerrit.wikimedia.org/r/197312 [12:05:28] <_joe_> !log upgraded libicu48 and mediawiki-math-texvc across the cluster [12:05:31] Logged the message, Master [12:09:09] 6operations, 10Math, 7Wikimedia-log-errors: Fatal in production due to outdated package: Missing "texvccheck" executable - https://phabricator.wikimedia.org/T92707#1124905 (10Joe) The package has been upgraded across the whole appservers/api cluster [12:09:23] 6operations, 10Math, 7Wikimedia-log-errors: Fatal in production due to outdated package: Missing "texvccheck" executable - https://phabricator.wikimedia.org/T92707#1124907 (10Joe) 5Open>3Resolved [12:09:56] 6operations, 7HHVM: Switch HAT appservers to trusty's ICU - https://phabricator.wikimedia.org/T86096#1124909 (10Joe) [12:09:58] 6operations, 7HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#1124911 (10Joe) [12:09:59] 6operations, 6Commons, 6Multimedia, 7HHVM, and 3 others: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1124908 (10Joe) 5Open>3stalled [12:15:07] godog: perhaps http://stackoverflow.com/questions/22837895/restarting-a-failed-stalled-stream-during-bootstrap-of-new-node might help ? 
[12:18:47] 6operations: iridium "standard" conflict with exim in role - https://phabricator.wikimedia.org/T92879#1124939 (10Aklapper) (Workaround was https://gerrit.wikimedia.org/r/#/c/197112/ ) [12:26:42] 6operations, 7network: Gerrit (ssh) is unusably slow at certain times of day from europe - https://phabricator.wikimedia.org/T92513#1124948 (10JanZerebecki) Currently not seeing the problem, adding this to compare it to when it happens again. Public IP this test is ran from: 87.138.110.76 ``` $ mtr -c 100 --... [12:39:10] 6operations, 7network: Very slow connection to wmf engineering infrastructure - https://phabricator.wikimedia.org/T92548#1124967 (10faidon) [12:39:11] 6operations, 7network: Gerrit (ssh) is unusably slow at certain times of day from europe - https://phabricator.wikimedia.org/T92513#1124966 (10faidon) [12:39:36] 6operations, 7network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1124969 (10faidon) [12:46:16] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1124987 (10mobrovac) This seems to be a [known issue](https://issues.apache.org/jira/browse/CASSANDRA-6565). There are some [suggestions](http://stackoverflow.com/questions/28001350... 
[12:48:36] <_joe_> when I read things like https://issues.apache.org/jira/browse/CASSANDRA-6565 I usually guess what everyone would say if something like that happened with mysql [12:49:25] <_joe_> also the lack of devs responses to such an important bug for one year [12:50:20] yup [12:50:24] <_joe_> I'm not saying cassandra is a pile of crap of course - I just think most of these distributed k-v stores are extremely complex and delicate, and also relatively immature [12:51:08] from the dev perspective, testing them is not easy at all [12:51:14] <_joe_> I ditched cassandra at JOB~1 around 3 years ago when I saw it was relatively easy to break [12:51:17] reporducing real-life situations is even harder [12:51:18] <_joe_> mobrovac: oh I know [12:53:12] <_joe_> again, my point is that these stores are not a magic bullet and there is a reason why ops people urge caution in adopting new cool toys :) [12:53:20] * _joe_ loves new cool toys [12:56:30] :) [12:57:14] <_joe_> !log updating sudo across all production [12:57:18] Logged the message, Master [13:03:03] 6operations, 10ops-eqiad: Failed Raid Analytics1010 - https://phabricator.wikimedia.org/T92957#1125084 (10Cmjohnson) 3NEW [13:12:09] (03CR) 10JanZerebecki: [C: 031] dev.wm.org - Increase HSTS max-age to 1 year [puppet] - 10https://gerrit.wikimedia.org/r/197272 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [13:15:57] (03PS2) 10Thcipriani: Parameterize mail::mx role [puppet] - 10https://gerrit.wikimedia.org/r/196658 (https://phabricator.wikimedia.org/T91562) [13:17:36] 6operations, 7network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1125133 (10Florian) @Aklapper Which provider do you use? Can you make a tracert (just to see, which hops and networks your provider uses)? There are already some well-known problems with the connections f... 
[13:32:59] 6operations, 7network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1125165 (10faidon) @JanZerebecki gave us traceroutes from WMDE on T92513. They seem to be using DTAG (Deutsche Telekom) as well, which is why I merged the two tasks. The DTAG->Wikimedia path goes via Teli... [13:38:03] godog: any updates or luck ? :) [13:43:10] (03PS3) 10BBlack: dev.wm.org - Increase HSTS max-age to 1 year [puppet] - 10https://gerrit.wikimedia.org/r/197272 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [13:45:30] (03CR) 10BBlack: [C: 032] dev.wm.org - Increase HSTS max-age to 1 year [puppet] - 10https://gerrit.wikimedia.org/r/197272 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [13:46:38] 6operations, 7network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1125183 (10Florian) @faidon: Argh, right, haven't seen :) I'd be happy to assist you whenever i have time, just ping me on IRC (FlorianSW on #wikimedia-dev, #wikimedia-operations, #wikimedia-mobile...) :) [13:50:46] mobrovac: not yet no, resuming on a full stomach :) [13:51:06] good, that sounds promising :) [13:58:39] (03Abandoned) 10Ottomata: Allow base::firewall to specify an 'accept' policy by default [puppet] - 10https://gerrit.wikimedia.org/r/160480 (owner: 10Ottomata) [13:58:57] hey ottomata [13:59:05] an1010 has two failed disks, someone filed a ticket I see [13:59:25] icinga is full of varnishkafka alerts too [13:59:42] _joe_: mw20xx alerts as well [13:59:48] 112 Matching Service Entries Displayed :(( [14:01:51] 6operations, 7network: cr1/cr2-codfw QSFP+ errors every second for qsfp-0/0/0 - https://phabricator.wikimedia.org/T92616#1125192 (10mark) I didn't see those errors back then. Yes, might be related. We could swap the cable to see if it's that, or replace with a fiber if that doesn't work. 
[14:03:18] paravoid, an10 is a cisco and not used, have lots of meetings today, will look into vk [14:03:48] ottomata: can you decom it then? :) [14:03:59] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1125201 (10mobrovac) Also note that the process CPU load on restbase100[1-5] is over 150% currently, so it might just be that not enough time is given for streaming. So, increasing... [14:04:42] mark: i hadn't done so because i wanted to keep them around for some streaming framework trials, but i haven't had time to do that [14:04:50] ok [14:05:20] my thought was: i will decom them if someone else wants them, or we need to unrack them, but unless its not a hurry, i'll just keep them around [14:05:27] if you want me to decom still i can :) [14:05:44] let's not reuse these ciscos [14:05:51] i mainly want them either offline or properly managed [14:12:36] aye, i wouldn't be reusing them for anything in production, just for short trials [14:13:06] (03CR) 10Dzahn: "the ref link is https://phabricator.wikimedia.org/T92879 (missing a number)" [puppet] - 10https://gerrit.wikimedia.org/r/197112 (owner: 10Rush) [14:20:51] (03PS1) 10Dzahn: phabricator: delete legalpad.yaml [puppet] - 10https://gerrit.wikimedia.org/r/197320 [14:24:22] <_joe_> paravoid: I'd say it's a graphite/statsd failure (the alerts on mw*) [14:24:37] <_joe_> and the kafka ones too [14:25:31] (03CR) 10Jforrester: "> Nevertheless, shouldn't this patch enable it for a couple of wp's right away so that we don't have to send a subsequent patch doing that" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066) (owner: 10GWicke) [14:25:44] _joe_: ja? 
I haven't looked into them, but i thought since they were all about null datapoints, that it must be some graphite/statsd or maybe even jessie varnish migration thing [14:27:15] <_joe_> given both varnishkafka and the mw* hosts report empty datapoints in graphite checks... [14:27:19] <_joe_> godog: ^^ [14:32:26] 6operations: iridium "standard" conflict with exim in role - https://phabricator.wikimedia.org/T92879#1125265 (10Dzahn) It does not happen on OTRS (iodine) or racktables (magnesium). No puppet errors there, even though they also have "has_default_mail_relay: false". The difference seems to be only the structure... [14:36:47] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1125269 (10fgiunchedi) [[https://issues.apache.org/jira/browse/CASSANDRA-7063 | CASSANDRA-7063]] bumping `phi_convict_threshold` seems fixed already in 2.0.9 and we are running 2.1.... [14:37:44] (03CR) 10Mobrovac: "> Right now this patch has totally broken VE in Beta Cluster (because there's no RB there, and unlike wmf20 the code in master /does/ pay " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066) (owner: 10GWicke) [14:38:29] (03PS3) 10Anomie: Disable restbase on wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066) (owner: 10GWicke) [14:39:12] (03PS1) 10coren: Tool Labs: correctly handle comments in crontab [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) [14:41:01] (03CR) 10Anomie: "I don't like the mixing of unrelated comment updates with the functional change, but not enough to -1 over it." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: 10Revi) [14:42:48] YuviPanda: where's the hiera.yaml file on servers? 
i want to check some facts and : Failed to start Hiera: RuntimeError: Config file /etc/hiera.yaml not found [14:43:33] _joe_ ottomata is it tracked in phab somewhere? [14:44:19] <_joe_> godog: nope [14:45:06] (03CR) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC (035 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:45:18] godog: i would say it is reopening this: [14:45:26] https://phabricator.wikimedia.org/T76342 [14:45:42] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=8&hoststatustypes=3&serviceprops=2097162&nostatusheader [14:46:39] mutante: should be in /etc/puppet/hiera.yaml on the puppet master, using hiera from the command line requires you to specify a RUBYLIB path for hiera [14:47:26] (03CR) 10Chad: [C: 04-1] mediawiki: add configs to support the Dallas DC (034 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:47:28] 6operations, 10Analytics: Fix Varnishkafka delivery error icinga warning - https://phabricator.wikimedia.org/T76342#1125299 (10Dzahn) reopen for https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=8&hoststatustypes=3&serviceprops=2097162&nostatusheader .. or new bug? 
[14:47:36] thcipriani: thanks [14:48:03] mutante: RUBYLIB=/var/lib/puppet/lib hiera [fact] ::[toplevelfact]=fact --debug -c /etc/puppet/hiera.yaml [14:48:21] 6operations, 10Analytics: Fix Varnishkafka delivery error icinga warning - https://phabricator.wikimedia.org/T76342#1125300 (10Dzahn) 5Resolved>3Open [14:48:44] (03CR) 10Chad: mediawiki: add configs to support the Dallas DC (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [14:49:20] <^d> _joe_: Couple of nits from the refactoring, but I like the approach generally [14:49:29] <^d> (poolcounter is especially clean!) [14:50:30] manybubbles, ^d, marktraceur, thcipriani: Who wants to SWAT this morning? [14:50:48] gwicke, James_F, Revi, gi11es: Ping for SWAT in about 10 minutes [14:50:57] Sure. [14:51:00] yeah? [14:51:56] <^d> I can [14:51:59] * ^d makes a coffee [14:52:11] Revi: Just checking that you're here for the SWAT window. In a little bit ^d will merge your patch then ask you to confirm it worked. [14:52:14] ^d: ok! [14:52:19] ok :D [14:52:45] anomie: I also just added two more patches [14:52:52] * marktraceur wishes gi11es luck! [14:53:33] anomie: pong [14:53:36] hrmm ... Cannot load backend role: no such file to load -- hiera/backend/role_backend [14:54:02] * ^d makes a very strong coffee [14:54:15] * Revi prepares to go to bed [14:54:18] Glaisher: What about https://gerrit.wikimedia.org/r/#/c/196988/ ? [14:54:21] ^d: Irish? [14:54:29] <^d> Heh, no [14:54:37] <^d> Nespresso machine ftw! [14:54:47] yeah, if ^d is okay with that [14:54:49] <_joe_> ^d: thanks! [14:54:54] that would make it 8 patches [14:55:06] <_joe_> but then... you praise nespresso [14:55:29] Glaisher: Config patches are easy. 
[14:55:38] nespresso means not-espresso :) [14:55:58] <^d> It means I don't have to trek down the block to the coffee shop :p [14:56:02] James_F: this one needs db tables too [14:56:05] :) [14:56:06] mutante: file /var/lib/puppet/lib/hiera/backend/role_backend.rb —does that have output? [14:56:09] ACKNOWLEDGEMENT - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 2 failures Filippo Giunchedi waiting disk replacement [14:56:19] ACKNOWLEDGEMENT - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi waiting disk replacement [14:56:20] ^d: You could come to the office and I'd make you one whilst you SWATed. :-P [14:56:33] Glaisher: Oh, argh, yeah. In that case, we'll need a full window. [14:56:56] 6operations, 5Patch-For-Review, 7domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1125318 (10RobH) Legal transfers domains, and we add support. Ops mgmt and legal has to have a discussion if we are to stop adding these domains. [14:56:59] is what tracked godog? [14:57:15] thcipriani: yes, the file is there [14:57:19] <^d> James_F: Have you considered a weekend job as a barista at the nespresso store? [14:57:20] Glaisher: Do they realize that editing a page protected with cascading requires the 'protect' right in addition to whatever right is specified? [14:57:52] oh, I don't think they do [14:57:52] ottomata: the varnishkafka/mw issue above re: datapoints [14:58:00] I had also forgotten about it [14:58:13] probably better to delay it [14:58:20] I'll ask on phab [14:58:30] mutante: and you're doing: RUBYLIB=/var/lib/puppet/lib hiera [whatever] --debug -c /etc/puppet/hiera.yaml ? If so, I'm out of ideas :\ [14:58:35] Otherwise someone could sneak around not being able to protect pages by editing a cascade-protected page they do have permission to edit, thereby extending the cascading. [14:59:09] !log rebooting cp1072-4, cp3030-49 (none in production) [14:59:11] mutante: thanks! 
I'll take a look [14:59:16] Logged the message, Master [14:59:30] ^d: :-P [14:59:45] godog: hm, dunno, but mutante just reopened this [14:59:45] https://phabricator.wikimedia.org/T76342#1125300 [15:00:04] manybubbles, anomie, ^d, thcipriani, marktraceur: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150317T1500). Please do the needful. [15:00:13] mutante: i think that is a new issue, that one was for check_ganglia. this one is from graphite, and the data is really not in graphite [15:00:21] meh [15:00:22] anomie: ok, removed that one [15:00:24] * ^d pumps up the jams [15:00:28] <^d> Let's get this started folks [15:00:36] <^d> Config changes first [15:00:38] swat time! [15:00:41] yay [15:00:45] yoohoo [15:00:57] ^d: https://www.youtube.com/watch?v=9EcjWd-O4jI [15:00:59] ^d: A little louder now. [15:01:23] ottomata: ok, they both manifest as "unknown" or "no data" in Icinga , that's why [15:02:04] (03PS1) 10Andrew Bogott: Turn on the rpc_notifier. [puppet] - 10https://gerrit.wikimedia.org/r/197326 [15:02:13] (03CR) 10Chad: [C: 032] Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: 10Revi) [15:02:22] (03Merged) 10jenkins-bot: Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: 10Revi) [15:03:10] robh: Jeez, she has like twenty different outfits for this video. [15:03:13] (03PS2) 10Andrew Bogott: Turn on the rpc_notifier. [puppet] - 10https://gerrit.wikimedia.org/r/197326 (https://phabricator.wikimedia.org/T87280) [15:03:31] ^d: is it too late to add some swat things? [15:03:33] Budget: $20 animation, $10 green screen, $3 million wardrobe [15:03:48] <^d> Are we full yet? [15:03:55] don't know [15:04:07] aude: There's nominally space for one thing. 
[15:04:15] one thing = 4 patches [15:04:18] !log demon Synchronized wmf-config/InitialiseSettings.php: draft namespace for kowiki (duration: 00m 05s) [15:04:21] But it's ^d's call. [15:04:22] Logged the message, Master [15:04:39] <^d> Revi: kowiki should have a draft namespace now, plz confirm [15:04:43] 2 for wmf20 core, 2 for wmf21 core [15:04:52] yeah, confirmed - https://ko.wikipedia.org/wiki/%EC%B4%88%EC%95%88:Revi [15:05:12] (03CR) 10Chad: [C: 032] Add 'autopatrol' protection level to lvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196779 (https://phabricator.wikimedia.org/T92645) (owner: 10Glaisher) [15:05:16] (03CR) 10Chad: [C: 032] Update 'interface_editor' to 'interface-editor' at ckbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195934 (https://phabricator.wikimedia.org/T85731) (owner: 10Glaisher) [15:05:17] Revi: Does Korean Wikipedia now have draft namespaces working? [15:05:24] * aude was unsuccessful in staying up until midnight last night for swat [15:05:25] yeah [15:05:25] Revi: Nice. [15:05:32] xD thanks [15:05:33] uh, ^d I removed that autopatrol one [15:05:40] <^d> Oh you did? [15:05:41] <^d> Ugh [15:05:48] (looks like you missed the discussion) [15:05:48] away now, cousin has passed away and I was here only for this. [15:05:53] (03CR) 10Andrew Bogott: [C: 032] Turn on the rpc_notifier. [puppet] - 10https://gerrit.wikimedia.org/r/197326 (https://phabricator.wikimedia.org/T87280) (owner: 10Andrew Bogott) [15:05:54] Revi: https://ko.wikipedia.org/wiki/Special:Random/초안 only returns on item. :-) [15:05:59] Glaisher: Do they realize that editing a page protected with cascading requires the 'protect' right in addition to whatever right is specified? 
[15:06:01] oh, I don't think they do [15:06:03] <^d> Glaisher: Removed my +2 [15:06:05] <^d> Shouldn't merge [15:06:09] ah, ok [15:06:52] thanks James_F [15:07:32] ^d: i think we can wait [15:07:45] <^d> Let's, since it's 4 :) [15:07:49] <^d> unless it's on firw [15:07:51] <^d> *fire [15:08:10] (03CR) 10Chad: [C: 032] Disable restbase on wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066) (owner: 10GWicke) [15:08:12] (03Merged) 10jenkins-bot: Update 'interface_editor' to 'interface-editor' at ckbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195934 (https://phabricator.wikimedia.org/T85731) (owner: 10Glaisher) [15:08:40] want to wait until hoo is around [15:08:49] 6operations, 10Analytics: Fix Varnishkafka delivery error icinga warning - https://phabricator.wikimedia.org/T76342#1125358 (10Dzahn) 5Open>3Resolved re-closing per: < ottomata> mutante: i think that is a new issue, that one was for check_ganglia. this one is from graphite, and the data is really not in... [15:09:18] !log demon Synchronized wmf-config/abusefilter.php: (no message) (duration: 00m 07s) [15:09:20] <^d> Glaisher: ^^ [15:09:21] Logged the message, Master [15:09:27] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1125366 (10GWicke) Since restbase1006 was out of service for longer than gc_grace period we need to wipe its data first: ``` rm -rf /var/lib/cassandra/* ``` [15:09:52] godog: it's important that the cassandra data on restbase1006 is wiped before trying to re-join the cluster [15:10:04] ^d: looks fine. 
thanks [15:10:18] (03CR) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [15:12:09] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Added initial Debian packaging [debs/contenttranslation/apertium-hbs] - 10https://gerrit.wikimedia.org/r/195229 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [15:12:13] ^d: discussed with hoo and we are ready whenever (or can do ourselves) [15:12:35] yeah [15:12:46] well, I can't... have very spotty internet (on a train) [15:12:51] i can [15:13:07] At best merge and pull both, then sync the whole includes/jobqueue folder [15:13:13] that's easiest and save [15:13:16] https://gerrit.wikimedia.org/r/#/c/197323/ and https://gerrit.wikimedia.org/r/#/c/197321/ [15:13:22] yep [15:13:27] the phab bot is killed again? [15:13:30] * safe [15:13:48] aude: Do wmf20 first, then 21 (assuming WD is on 20) [15:13:56] it is, yes [15:14:19] then https://gerrit.wikimedia.org/r/#/c/197324/ and https://gerrit.wikimedia.org/r/#/c/197322/ for wmf21 [15:14:34] yep [15:14:36] :) [15:14:43] godog: you haven't wiped the data before on rb1006 ? 
[15:14:48] 6operations: icinga UNKNOWN Varnishkafka Delivery Errors / varnishkafka data not in graphite - https://phabricator.wikimedia.org/T92965#1125387 (10MC8) [15:15:29] (03Merged) 10jenkins-bot: Disable restbase on wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066) (owner: 10GWicke) [15:15:56] 6operations: icinga UNKNOWN Varnishkafka Delivery Errors / varnishkafka data not in graphite - https://phabricator.wikimedia.org/T92965#1125399 (10fgiunchedi) a:3fgiunchedi confirmed it is a graphite issue, namely the carbon-relay queue is too small now, will followup with a fix [15:15:57] ottomata: ^ re: those vk delivery errors, shouldn't they be warnings or something instead of UNKNOWN? otherwise we don't get alerted to the fact that we're probably losing stats info [15:15:59] <^d> Krenair: VE change merged, submodule update plz? [15:15:59] 6operations: hhvm - Icinga - UNKNOWN queue size / busy threads - https://phabricator.wikimedia.org/T92967#1125401 (10Dzahn) [15:16:21] 6operations, 7Graphite, 7HHVM, 7Icinga: hhvm - Icinga - UNKNOWN queue size / busy threads - https://phabricator.wikimedia.org/T92967#1125370 (10Dzahn) [15:16:32] mobrovac gwicke yep data was wiped before starting cassandra, if you want to give it a try too I'm all for it (looking at graphite too ATM) [15:16:45] bblack, i thikn they are unknown because graphite really has no data [15:16:59] i'm not really sure how check_graphite decides to make the alert happen [15:17:03] based on that [15:17:04] !log restarted cassandra on restbase1006 after clearing the data & removing it from its own seeds [15:17:06] 6operations, 6Analytics-Engineering, 7Graphite, 7Icinga: icinga UNKNOWN Varnishkafka Delivery Errors / varnishkafka data not in graphite - https://phabricator.wikimedia.org/T92965#1125408 (10Dzahn) [15:17:09] Logged the message, Master [15:17:11] but ja, you are right, they should alert [15:17:36] godog: ok [15:17:49] yeah I think the 
alert is on data values, therefore no-data==no-alert is how things are now, but we probably want no-data to be at least a warning, because without data we have no idea if there should have been an alert. [15:17:55] godog: it looks like it's streaming now [15:18:11] 6operations, 7Graphite, 7HHVM, 7Icinga: hhvm - Icinga - UNKNOWN queue size / busy threads - https://phabricator.wikimedia.org/T92967#1125410 (10Dzahn) graphite issue confirmed by @godog T92965#1125399 [15:18:29] !log demon Synchronized wmf-config/InitialiseSettings.php: restbase off for wikipedias (duration: 00m 06s) [15:18:32] Logged the message, Master [15:18:32] <^d> gwicke: plz confirm ^ [15:18:57] ^d: it'll do nothing [15:19:09] <^d> that it did nothing then :p [15:19:12] the VE bits were not yet up to date enough [15:19:16] It did something. [15:19:25] Beta Cluster VE now works again. [15:19:26] bblack, indeed, off the top of your head, any idea how to do that? [15:19:28] It's magic! [15:19:30] ;-) [15:19:31] we just want to avoid it coming on when they are updated to wmf21 tomorrow [15:19:33] ottomata: nope :) [15:19:36] haha [15:19:38] me neither :p [15:20:01] !log demon Synchronized php-1.25wmf21/includes/specials/SpecialUploadStash.php: (no message) (duration: 00m 07s) [15:20:04] Logged the message, Master [15:20:15] !log demon Synchronized php-1.25wmf20/includes/specials/SpecialUploadStash.php: (no message) (duration: 00m 07s) [15:20:19] Logged the message, Master [15:20:35] godog: did you delete /var/lib/cassandra/data, or all of /var/lib/cassandra/* earlier? 
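(Note on the no-data discussion above: the idea of treating "no datapoints" as at least a warning, instead of UNKNOWN, can be sketched as below. This is not how check_graphite is implemented; `classify` and its parameters are hypothetical names. The only fixed convention used is the standard Nagios/Icinga plugin return codes 0 to 3.)

```python
# Standard Nagios/Icinga plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(datapoints, warn, crit, no_data_state=WARNING):
    """Map a series of datapoints to a plugin state.

    None entries stand for missing samples. If nothing usable is left,
    return no_data_state (WARNING here, as suggested in the discussion)
    instead of silently falling through to UNKNOWN.
    """
    values = [v for v in datapoints if v is not None]
    if not values:
        return no_data_state
    worst = max(values)
    if worst >= crit:
        return CRITICAL
    if worst >= warn:
        return WARNING
    return OK


if __name__ == "__main__":
    # Hypothetical series: all samples missing, as in the alerts above.
    print(classify([None, None, None], warn=1, crit=5))
```

Passing `no_data_state=UNKNOWN` would reproduce the behaviour described in the log, where missing data shows up as UNKNOWN.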
[15:20:45] (03PS1) 10Faidon Liambotis: Tree-wide certificate path replacement, localcerts [puppet] - 10https://gerrit.wikimedia.org/r/197330 [15:20:47] (03PS1) 10Faidon Liambotis: certs: remove compatibility symlinks [puppet] - 10https://gerrit.wikimedia.org/r/197331 [15:20:49] (03PS1) 10Faidon Liambotis: certs: don't install pkcs12 certs on all systems [puppet] - 10https://gerrit.wikimedia.org/r/197332 [15:20:51] (03PS1) 10Faidon Liambotis: certs: inline create_pkcs12's only use and remove [puppet] - 10https://gerrit.wikimedia.org/r/197333 [15:20:52] <^d> gi11es: Your changes are live, plz confirm :) [15:20:53] (03PS1) 10Faidon Liambotis: certs: kill install_additional_key [puppet] - 10https://gerrit.wikimedia.org/r/197334 [15:20:55] (03PS1) 10Faidon Liambotis: Introduce a new sslcert module (to replace certs.pp) [puppet] - 10https://gerrit.wikimedia.org/r/197335 [15:20:55] ^d: Can I add a last-minute wmf21 one? [15:20:57] (03PS1) 10Faidon Liambotis: sslcert: add sslcert::ca define, use it from certs [puppet] - 10https://gerrit.wikimedia.org/r/197336 [15:20:59] (03PS1) 10Faidon Liambotis: sslcert: add sslcert::certificate [puppet] - 10https://gerrit.wikimedia.org/r/197337 [15:21:01] ^d: testing [15:21:01] (03PS1) 10Faidon Liambotis: Kill unused/old/test certificates [puppet] - 10https://gerrit.wikimedia.org/r/197338 [15:21:02] ^d: (Sorry.) [15:21:03] (03PS1) 10Faidon Liambotis: certs: remove legacy ensure => absent Files [puppet] - 10https://gerrit.wikimedia.org/r/197339 [15:21:05] (03PS1) 10Faidon Liambotis: sslcert: add sslcert::chainedcert [puppet] - 10https://gerrit.wikimedia.org/r/197340 [15:21:07] (03PS1) 10Faidon Liambotis: sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 [15:21:10] bblack: ^ [15:21:15] bblack: https://gerrit.wikimedia.org/r/#/q/topic:sslcert,n,z would be the shortcut [15:21:15] lol [15:21:18] <^d> James_F: So needy! [15:21:24] ^d: That's me. 
:-) [15:21:27] https://gerrit.wikimedia.org/r/#/c/197260/ [15:21:29] 10Ops-Access-Requests, 6operations: Add Stephen LaPorte to dns-admin alias - https://phabricator.wikimedia.org/T92968#1125420 (10RobH) 3NEW [15:21:51] 10Ops-Access-Requests, 6operations: Add Stephen LaPorte to dns-admin alias - https://phabricator.wikimedia.org/T92968#1125427 (10RobH) [15:21:52] <^d> James_F: Get me a submodule update and fine [15:22:00] <^d> Also, I still need a submodule update for VE :) [15:22:00] ^d: Will do. [15:22:00] gwicke: the latter [15:22:06] (03CR) 10Glaisher: Provide the Citoid extension for test wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187132 (owner: 10Jforrester) [15:22:17] paravoid: I know it's a WIP branch, but any idea/faith that the current work works and should be merged? [15:22:32] well [15:22:36] gwicke: however I've removed restbase1006 from the list of seeds in cassandra.yaml [15:22:38] it's untested, but it should work [15:22:43] it is, however, incomplete [15:22:51] in the sense that the goal was to ditch certs.pp altogether [15:22:52] 6operations, 10Wikimedia-Labs-Other, 7Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#1125432 (10coren) [15:22:56] (03CR) 10jenkins-bot: [V: 04-1] certs: kill install_additional_key [puppet] - 10https://gerrit.wikimedia.org/r/197334 (owner: 10Faidon Liambotis) [15:23:01] 10Ops-Access-Requests, 6operations: Add Stephen LaPorte to dns-admin alias - https://phabricator.wikimedia.org/T92968#1125435 (10Dzahn) warn: be prepared for some spam [15:23:39] (03CR) 10Chad: mediawiki: add configs to support the Dallas DC (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [15:23:41] 10Ops-Access-Requests, 6operations: Add Stephen LaPorte to dns-admin alias - https://phabricator.wikimedia.org/T92968#1125436 (10RobH) He didn't ask about the overwhelming spam 
involved, I wanted to keep that a surprise for his first day on the alias ;D [15:24:02] (03CR) 10jenkins-bot: [V: 04-1] sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 (owner: 10Faidon Liambotis) [15:24:09] ^d: https://gerrit.wikimedia.org/r/197342 [15:24:13] ^d: Adding to Deployments now. [15:25:16] godog: that's what I did as well; strange that it didn't stream properly the first time round [15:26:14] (03CR) 10Jforrester: Provide the Citoid extension for test wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187132 (owner: 10Jforrester) [15:26:27] (03PS2) 10Faidon Liambotis: certs: remove legacy ensure => absent Files [puppet] - 10https://gerrit.wikimedia.org/r/197339 [15:26:29] (03PS2) 10Faidon Liambotis: Kill unused/old/test certificates [puppet] - 10https://gerrit.wikimedia.org/r/197338 [15:26:31] (03PS2) 10Faidon Liambotis: sslcert: add sslcert::certificate [puppet] - 10https://gerrit.wikimedia.org/r/197337 [15:26:33] (03PS2) 10Faidon Liambotis: sslcert: add sslcert::ca define, use it from certs [puppet] - 10https://gerrit.wikimedia.org/r/197336 [15:26:35] (03PS2) 10Faidon Liambotis: sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 [15:26:37] (03PS2) 10Faidon Liambotis: sslcert: add sslcert::chainedcert [puppet] - 10https://gerrit.wikimedia.org/r/197340 [15:26:39] (03PS2) 10Faidon Liambotis: Introduce a new sslcert module (to replace certs.pp) [puppet] - 10https://gerrit.wikimedia.org/r/197335 [15:26:41] (03PS2) 10Faidon Liambotis: certs: kill install_additional_key [puppet] - 10https://gerrit.wikimedia.org/r/197334 [15:27:01] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1125437 (10GWicke) Just did this (and removed it from its own seeds again), and it's now streaming / joining okay. 
[15:27:25] <^d> James_F: Is someone doing a submodule update for https://gerrit.wikimedia.org/r/#/c/196514/? [15:27:38] ^d: Oh, bugger. Yes, I'll do it. [15:27:45] godog, urandom, here is a patch to avoid listing a node as its own seeds: https://gerrit.wikimedia.org/r/#/c/195483/ [15:27:51] <^d> James_F: Thanks. You're awesome :) [15:28:23] * aude kindly asks https://gerrit.wikimedia.org/r/#/c/197323/ and https://gerrit.wikimedia.org/r/#/c/197321/ [15:28:30] or can do later ourselves [15:28:37] then https://gerrit.wikimedia.org/r/#/c/197324/ and https://gerrit.wikimedia.org/r/#/c/197322/ for wmf21 [15:29:29] ^d: https://gerrit.wikimedia.org/r/197343 [15:34:13] !log demon Synchronized php-1.25wmf21/extensions/Citoid: (no message) (duration: 00m 07s) [15:34:18] Logged the message, Master [15:34:47] <^d> aude: Doing [15:34:55] thanks :) [15:35:00] shall add to the wiki [15:35:06] :) [15:36:52] 6operations, 10RESTBase-Cassandra: restbase1006 cassandra re-joining/bootstrap failure - https://phabricator.wikimedia.org/T92950#1125462 (10GWicke) Speaking of seeds: https://gerrit.wikimedia.org/r/#/c/195483/ is a patch that avoids listing a node in its own seeds in the first place. 
[15:37:18] (03PS1) 10Filippo Giunchedi: graphite: increase relay queue size [puppet] - 10https://gerrit.wikimedia.org/r/197344 (https://phabricator.wikimedia.org/T92965) [15:38:38] (03PS2) 10Filippo Giunchedi: graphite: increase relay queue size [puppet] - 10https://gerrit.wikimedia.org/r/197344 (https://phabricator.wikimedia.org/T92965) [15:38:44] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] graphite: increase relay queue size [puppet] - 10https://gerrit.wikimedia.org/r/197344 (https://phabricator.wikimedia.org/T92965) (owner: 10Filippo Giunchedi) [15:39:18] ^d: all good [15:41:25] !log demon Synchronized php-1.25wmf21/extensions/VisualEditor/: (no message) (duration: 00m 06s) [15:41:29] <^d> gi11es: ok great [15:41:30] Logged the message, Master [15:41:35] <^d> James_F: VE and Citoid done for you [15:42:22] !log restart carbon/relay on graphite1001 [15:42:25] Logged the message, Master [15:43:33] !log demon Synchronized php-1.25wmf20/includes/jobqueue/: (no message) (duration: 00m 07s) [15:43:36] <^d> aude: Done for wmf20, wmf21 still in zuul-land [15:43:36] Logged the message, Master [15:43:45] Will test [15:43:50] thanks [15:44:15] > var_dump( $jobQueueGroup->get( 'UpdateRepoOnMove' )->delayedJobsEnabled() ); [15:44:15] bool(true) [15:44:18] :) [15:44:21] yay [15:46:32] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Configure fails with" [debs/contenttranslation/apertium-mk] - 10https://gerrit.wikimedia.org/r/195244 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [15:47:22] (03CR) 10Alexandros Kosiaris: "lttoolbox build dependency is missing. 
Merged the patch but a new one should be filled to fix it" [debs/contenttranslation/apertium-hbs] - 10https://gerrit.wikimedia.org/r/195229 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [15:50:11] gtg [15:50:33] PROBLEM - puppet last run on mw2017 is CRITICAL: CRITICAL: Puppet has 1 failures [15:50:39] !log demon Synchronized php-1.25wmf21/includes/jobqueue/: (no message) (duration: 00m 09s) [15:50:40] <^d> aude: And wmf21 ^ [15:50:42] Logged the message, Master [15:52:29] ^d thanks [15:52:33] checking [15:52:54] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Using the -1 to point out that apertium-hbs is a dependency and needs to build first. Will remove it myself. The patch LGTM" [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/195232 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [15:53:44] looks good [15:54:35] <^d> \o/ [15:54:37] (03CR) 10Faidon Liambotis: [C: 04-1] "I agree with Tim. This is too complicated for little benefit. I'd rather keep it KISS and make sure that we ensure => absent it the few ti" [puppet] - 10https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: 10coren) [15:54:37] <^d> Ok, swat done [15:54:44] <^d> Come tomorrow for new prizes and adventures! [15:54:45] :) [15:54:51] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Configure fails with:" [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/195264 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [15:55:30] (03CR) 10coren: "Fair 'nuf. Hence review. :-)" [puppet] - 10https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: 10coren) [15:55:45] (03Abandoned) 10coren: WIP: new security module for security::pam [puppet] - 10https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: 10coren) [15:55:48] ^d: :-) [15:56:47] paravoid: That said: ensure => absent won't suffice. 
We'll need an exec {} stanza. [15:56:58] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Marking with a -1 until apertium-hbs is built" [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/195275 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [15:58:27] (03PS1) 10KartikMistry: Added missing Build-Depends [debs/contenttranslation/apertium-hbs] - 10https://gerrit.wikimedia.org/r/197345 [15:58:50] Coren: I meant security::pam { ...: ensure => absent } [15:59:11] (03PS2) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-mk] - 10https://gerrit.wikimedia.org/r/195244 (https://phabricator.wikimedia.org/T89936) [15:59:12] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 220, down: 0, dormant: 0, excluded: 0, unused: 0 [15:59:44] (03PS2) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-dan] - 10https://gerrit.wikimedia.org/r/195897 (https://phabricator.wikimedia.org/T91493) [15:59:59] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Dependency is missing during building:" [debs/contenttranslation/apertium-mk-bg] - 10https://gerrit.wikimedia.org/r/195284 (https://phabricator.wikimedia.org/T89936) (owner: 10KartikMistry) [16:00:59] ACKNOWLEDGEMENT - RAID on analytics1010 is CRITICAL: CRITICAL: Active: 6, Working: 6, Failed: 2, Spare: 0 ottomata Cisco! [16:01:57] (03CR) 10Alexandros Kosiaris: [C: 04-1] "A build dependency is missing:" [debs/contenttranslation/apertium-fr-es] - 10https://gerrit.wikimedia.org/r/195577 (https://phabricator.wikimedia.org/T92252) (owner: 10KartikMistry) [16:02:27] jzerebecki: hey :) it's 5pm, how is it looking now? 
[16:02:50] testing [16:03:46] (03CR) 10Alexandros Kosiaris: [C: 04-1] "A build dependency is missing:" [debs/contenttranslation/apertium-af-nl] - 10https://gerrit.wikimedia.org/r/195838 (https://phabricator.wikimedia.org/T91750) (owner: 10KartikMistry) [16:04:19] (03PS2) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-fr-es] - 10https://gerrit.wikimedia.org/r/195577 (https://phabricator.wikimedia.org/T92252) [16:05:35] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Marking as -1 until apertium-dan is built and uploaded on apt.w.o" [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/195905 (https://phabricator.wikimedia.org/T91493) (owner: 10KartikMistry) [16:06:10] (03PS3) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-af-nl] - 10https://gerrit.wikimedia.org/r/195838 (https://phabricator.wikimedia.org/T91750) [16:06:44] (03PS3) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-fr-es] - 10https://gerrit.wikimedia.org/r/195577 (https://phabricator.wikimedia.org/T92252) [16:06:52] RECOVERY - puppet last run on mw2017 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:07:12] paravoid: lag is fine for me right now [16:07:37] how about transfers? [16:07:42] FlorianSW: how about you? [16:07:58] paravoid: let me test, one minute :) [16:08:07] though gerrit is slower than it was earlier today, but I don't think that is transfer [16:10:17] paravoid: 2% [===> ] 565.010 55,1KB/s [16:10:18] :/ [16:10:36] the same as ever :( [16:10:38] FlorianSW: dumps.w.o ? 
[16:11:11] (03PS3) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-mk] - 10https://gerrit.wikimedia.org/r/195244 (https://phabricator.wikimedia.org/T89936) [16:11:51] jzerebecki: i'm waiting for git to say me the speed ;) But like i said, i have the same bad connection speed for nearly any wmf engineering infra, not only gerrit [16:12:01] same for phabricator (loading images is nearly impossible) [16:13:17] hrm, that's strange [16:13:29] but everything else works for you? [16:13:32] mh for me the transferrate varies between 38 to 65KB/s [16:13:58] which is probably slower than at other times, but need to actually measure that [16:14:19] how is it compared to other sites? [16:14:38] i.e. what's your connection's bandwidth [16:15:26] paravoid: with other sites i have no problem (Wikipedia is working fine, too, at least browsing) [16:15:54] jzerebecki: exact, i have these transfer rates all over the time :( [16:17:02] paravoid: 5,17MB/s and increasing (file was too small) for http://ftp.nl.debian.org [16:18:21] ok, how does it look now? [16:18:54] (03PS1) 1020after4: add roles for staging-rdb[12] [puppet] - 10https://gerrit.wikimedia.org/r/197348 [16:20:16] paravoid: varying around 1,34MB/s for dump.w.o [16:20:24] so clear improvement [16:20:37] good [16:20:40] FlorianSW? [16:21:00] 6operations, 6Analytics-Engineering, 7Graphite, 7Icinga, 5Patch-For-Review: icinga UNKNOWN Varnishkafka Delivery Errors / varnishkafka data not in graphite - https://phabricator.wikimedia.org/T92965#1125616 (10fgiunchedi) fix merged, seems to have recovered, pending creation of related alarms [16:21:22] paravoid: what did you change? 
[16:22:12] rerouted wikimedia->dtag path from GTT to Telia [16:22:34] (03PS1) 10Yuvipanda: sca: Don't include admin in labs [puppet] - 10https://gerrit.wikimedia.org/r/197349 (https://phabricator.wikimedia.org/T91554) [16:23:56] (03PS2) 10Yuvipanda: sca: Don't include admin in labs [puppet] - 10https://gerrit.wikimedia.org/r/197349 (https://phabricator.wikimedia.org/T91554) [16:24:28] (03CR) 10Yuvipanda: [C: 032] sca: Don't include admin in labs [puppet] - 10https://gerrit.wikimedia.org/r/197349 (https://phabricator.wikimedia.org/T91554) (owner: 10Yuvipanda) [16:26:09] interesting git fetch from gerrit feels much faster [16:26:36] FlorianSW: hey? [16:27:35] paravoid: sorry, i test again now :) [16:27:41] (03CR) 10Yuvipanda: [C: 04-1] add roles for staging-rdb[12] (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197348 (owner: 1020after4) [16:28:41] paravoid: sorry :( tools.wmflabs.org -> 1,6 MB/s (maximum connection speed) [16:28:56] so, fixed? [16:30:49] hi hoo [16:31:00] hi [16:31:13] if you're at home, you might be happy to hear that we (probably :) fixed the network congestion issue you were experiencing [16:31:15] o/ [16:31:39] 7Puppet, 6operations, 10Deployment-Systems, 10Staging: provider => trebuchet doesn't work until manual 'git deploy start' on deployment-server - https://phabricator.wikimedia.org/T92978#1125640 (10yuvipanda) [16:31:45] paravoid: Awesome :) I'm not at home today, but will be tomorrow [16:32:01] paravoid: seems so, let me test a git clon [16:32:21] whom do I poke for git-deploy things? [16:33:38] 6operations, 7network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1125643 (10faidon) 5Open>3Resolved a:3faidon OK, I downprefed the reverse path via GTT (now it happens to go via Telia) and got positive confirmation on IRC from both @JanZerebecki & @Florian. I'll r... 
[16:33:50] paravoid: seems better here [16:34:13] paravoid: git clone is working like a charm now :-) [16:34:20] awesome [16:34:22] thanks :) [16:34:35] thanks to you :) [16:34:51] downloaded a patch [16:35:02] thanks :) [16:35:51] i hope that telia.net will work now, i heared no good things about TeliaSonera <-> DTAG in the last times :/ [16:36:26] let me know if you see any more issues [16:36:28] YuviPanda: if you mean the trebuchet git deploy, then ryan lane? if there is someone else in wikimedia, please tell me. [16:36:43] we have more alternatives too, so we can avoid both telia & gtt [16:37:28] but let's only do that if there's a reason [16:37:43] 7Puppet, 6operations, 10Deployment-Systems, 10Staging: provider => trebuchet doesn't work until manual 'git deploy start' on deployment-server - https://phabricator.wikimedia.org/T92978#1125647 (10yuvipanda) [16:37:52] Awesome [16:37:55] (03PS1) 10Filippo Giunchedi: gdash: fix carbon-relay dashboard [puppet] - 10https://gerrit.wikimedia.org/r/197351 [16:37:57] (03PS1) 10Filippo Giunchedi: graphite: add error alerts [puppet] - 10https://gerrit.wikimedia.org/r/197352 (https://phabricator.wikimedia.org/T92965) [16:38:02] paravoid: good. thx. [16:38:03] I guess labs users will love you as well [16:38:05] paravoid: yes, sure, just saying and "fingers crossed" :P [16:39:07] jzerebecki: I asked bd808 and he gave me a list of names of people who are somewhat familiar with it :) [16:39:43] YuviPanda: thx. i see who you added in phab. [16:39:58] (03CR) 10Hoo man: "My personal connectivity issues have been solved, but I still think doing this is a good idea to keep things simple." 
[puppet] - 10https://gerrit.wikimedia.org/r/196964 (owner: 10Hoo man) [16:40:13] mobrovac gwicke btw restbase1006 is depooled in pybal, I think we're fine to add it back [16:41:00] godog: yep, things are looking good [16:41:18] godog: ups, wait [16:41:45] godog: rb1006 cass is still joining the cluster, let's wait until it finishes [16:43:04] (03CR) 10Alex Monk: [C: 04-1] "see ticket" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196779 (https://phabricator.wikimedia.org/T92645) (owner: 10Glaisher) [16:43:12] mobrovac: restbase should be disjoint from cassandra though? IOW if restbase itself is working it can be added back I think [16:43:24] yep [16:43:25] (03PS13) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) [16:45:31] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: fix carbon-relay dashboard [puppet] - 10https://gerrit.wikimedia.org/r/197351 (owner: 10Filippo Giunchedi) [16:53:15] paravoid: loading images on phabricator without get a new cup of tea (or something else) is a great feeling :D I can't thank you enough, really :P [16:54:58] YuviPanda: hey duty ops. Would you mind closing a RT for me? I lack the user right apparently https://rt.wikimedia.org/Ticket/Display.html?id=8855 thx! [16:55:14] hashar: sure! [16:55:53] hashar: hmm, looks like I don’t have the rights to close that either [16:57:09] YuviPanda: hashar: done [16:57:17] akosiaris: cool :) [16:57:39] danke! [16:58:01] and: are procurement tickets still handled in RT ? 
[16:58:02] 6operations, 10RESTBase, 6Scrum-of-Scrums, 6Services, and 2 others: RESTbase deployment - https://phabricator.wikimedia.org/T1228#1125707 (10fgiunchedi) [16:58:03] 7Blocked-on-Operations, 6operations, 10RESTBase, 10hardware-requests, 7RESTBase-architecture: RESTBase production hardware - 5 of 6 ready - https://phabricator.wikimedia.org/T76986#1125706 (10fgiunchedi) [16:58:04] 6operations, 10ops-eqiad, 10RESTBase, 6Services: restbase1006 faulty disk controller - https://phabricator.wikimedia.org/T89639#1125704 (10fgiunchedi) 5Open>3Resolved restbase1006 reinstalled and back in service [16:58:53] (03PS1) 10Aude: Use wikidata touch icon for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197354 (https://phabricator.wikimedia.org/T92948) [16:59:58] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: move cassandra submodule into puppet repo - https://phabricator.wikimedia.org/T92560#1125713 (10fgiunchedi) I agree, frequently-updated puppet submodules are quite clunky to work with [17:00:04] maxsem, kaldari: Dear anthropoid, the time has come. Please deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150317T1700). [17:00:26] 6operations, 10ops-esams: Dear esams@rt.wikimedia.org, Call for Submissions on Various Academic Disciplines - https://phabricator.wikimedia.org/T92982#1125714 (10emailbot) [17:00:42] hashar: I believe the quoting etc. is while actual requests are handled in phab user side [17:03:01] JohnFLewis: will figure out tomorrow morning :) thx! 
[17:07:31] !log migrateAccount.php --auto finished [17:07:38] Logged the message, Master [17:08:11] (03PS1) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [17:08:32] aawiki: Warning: fopen(/tmp/mw-cache-1.25wmf20/conf-aawiki): failed to open stream: Permission denied in /srv/mediawiki/wmf-config/CommonSettings.php on line 192 [17:08:34] uhoh [17:08:54] <_joe_> legoktm: where is that [17:09:03] <_joe_> and, what command did you launch? [17:09:08] _joe_: I'm running a maint script with foreachwiki on terbium [17:09:11] 6operations, 10ops-codfw: ms-be2007.codfw.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T92835#1125735 (10Papaul) Drive replacement complete. [17:09:26] foreachwiki ../../../../../home/legoktm/forceRenameNotif/sendForceRenameNotification.php --message /home/legoktm/forceRenameNotif/message.txt --subject /home/legoktm/forceRenameNotif/subject [17:09:32] 6operations, 10ops-codfw: ms-be2007.codfw.wmnet: slot=8 dev=sdi failed - https://phabricator.wikimedia.org/T92834#1125739 (10Papaul) Drive replacement complete. [17:09:33] <_joe_> legoktm: I guess this can be some script trying to run as apache [17:09:48] <_joe_> legoktm: checking [17:10:16] it does sudo -u www-data I think [17:10:44] <_joe_> legoktm: which in this case is probably wrong, as the cache files are owned by mwdeploy here [17:11:30] <_joe_> while on the rest of the cluster they're owned by www-data, as I would expect them to [17:11:44] <_joe_> how are those caches generated on terbium? do you have any clue? [17:12:06] umm, no idea. just normal scap? [17:12:09] bd808: ^ ? [17:13:01] <_joe_> then root@terbium:/tmp/mw-cache-1.25wmf20# find . 
-user mwdeploy [17:13:01] <_joe_> ./conf-aawiki [17:13:04] legoktm: in a meeting [17:13:28] <_joe_> legoktm: ok I guess this is some error in deploying or whatever, I'll change permissions of that file [17:13:31] paravoid: No, I understand, it's just that the spec demands a --remove before removing the file. [17:13:50] Coren: easily doable in puppet [17:13:51] <_joe_> legoktm: try again [17:14:19] _joe_: woot, thanks :D [17:14:21] * YuviPanda goes for food [17:14:38] !log started sendForceRenameNotification.php (CentralAuth/SULF) for all wikis [17:14:43] Logged the message, Master [17:14:45] paravoid: Really? You can hook something on removal of a file that gets executed before the file is removed? That would solve so many problems! [17:14:51] <_joe_> legoktm: yw [17:15:18] Coren: file { '...': ensure => absent, require => Exec[...] } [17:15:23] paravoid: Or do you mean "do an exec {} alongside the ensure => absent"? [17:15:48] All right. This'll just have to be documented really well. [17:16:44] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:16:57] godog: we need to manually re-run the deploy first [17:17:03] paravoid: Ima give the class an $ensure parameter then so that I can do the if($ensure == 'absent') to clean up, does that sound sane to you? [17:17:11] yes [17:17:20] godog: the code on restbase1006 probably isn't up to date [17:17:26] class? define you mean? [17:17:32] Yes, define. [17:17:40] gwicke: ack, let me know when it is good to go [17:17:48] * Coren goes do that now. [17:19:40] howdy opsen! is there such a thing as a prod web server I could ssh into to verify that some requests are routed properly?
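[Sketch of the pattern paravoid and Coren agree on above: a define with an $ensure parameter that, when set to 'absent', runs a cleanup exec before the file is removed. This is only an illustration under assumptions. The define name, the path under /etc/example.d, and the example-tool command are hypothetical placeholders; only the --remove step, the $ensure == 'absent' check, and the require => Exec[...] ordering come from the conversation.]

```puppet
# Hypothetical define: every name and path here is a placeholder,
# not the actual security::pam code discussed in the channel.
define example::managed_file(
    $ensure = 'present',
    $source = undef,
) {
    $path = "/etc/example.d/${title}"

    if $ensure == 'absent' {
        # Run the cleanup step while the file still exists.
        # 'example-tool --remove' stands in for whatever the spec requires.
        exec { "example-remove-${title}":
            command => "/usr/local/bin/example-tool --remove ${title}",
            onlyif  => "/usr/bin/test -e ${path}",
        }

        # require => Exec[...] orders the removal after the cleanup,
        # as in paravoid's one-liner.
        file { $path:
            ensure  => absent,
            require => Exec["example-remove-${title}"],
        }
    } else {
        file { $path:
            ensure => present,
            source => $source,
        }
    }
}
```

[Usage would then be along the lines of example::managed_file { 'foo': ensure => absent }. The onlyif guard keeps the exec from running again once the file is gone, so repeated puppet runs stay quiet.]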
[17:20:29] godog: updated the code & verified restbase working on restbase1006, so good to go [17:20:30] I'm seeing curl requests made from php to the current site that seem to hang [17:20:33] 6operations, 10ops-ulsfo: fan reversed on asw1-ulsfo - https://phabricator.wikimedia.org/T83978#1125777 (10RobH) 5stalled>3Resolved I have the paperwork to ship the mis-shipped unit (currently locked in my desk @ office) back tomorrow. I'm resolving this onsite ticket, since the power supply has been swap... [17:21:38] gwicke: ack [17:21:42] !log repool restbase1006 [17:21:48] Logged the message, Master [17:22:33] (03PS1) 10Dzahn: wikiartpedia: make empty template, link other [dns] - 10https://gerrit.wikimedia.org/r/197361 [17:22:36] 6operations, 10ops-codfw: ms-be2007.codfw.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T92835#1125783 (10fgiunchedi) pending full resync [17:22:41] 6operations, 10ops-codfw: ms-be2007.codfw.wmnet: slot=8 dev=sdi failed - https://phabricator.wikimedia.org/T92834#1125784 (10fgiunchedi) pending full resync [17:24:58] (03PS2) 10Dzahn: wikiartpedia: make empty template, link other [dns] - 10https://gerrit.wikimedia.org/r/197361 [17:26:49] 6operations, 10ops-ulsfo: cp4009 hardware fault - https://phabricator.wikimedia.org/T92476#1125791 (10RobH) I went onsite today and removed all power from the system (full cord removal) and it didn't resolve the issue. (Was a last ditch effort, since I was on site anyhow.) I need to get setup with the Dell s... [17:26:55] 6operations, 10ops-ulsfo: cp4009 hardware fault - https://phabricator.wikimedia.org/T92476#1125792 (10RobH) a:3RobH [17:32:08] (03CR) 10Anomie: "This should work, as far as it goes, but I left some suggestions for improvement." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) (owner: 10coren) [17:34:17] YuviPanda: question if you have time, pretty short. [17:34:59] nuria: eating food. 
I'll be back in 10 [17:35:05] YuviPanda: k [17:37:37] (03PS1) 10Dzahn: visualwikipedia: turn into links to empty template [dns] - 10https://gerrit.wikimedia.org/r/197362 [17:38:55] (03CR) 10Dzahn: [C: 04-1] "only after wikiartpedia.biz is actually an empty template https://gerrit.wikimedia.org/r/#/c/197361" [dns] - 10https://gerrit.wikimedia.org/r/197362 (owner: 10Dzahn) [17:40:50] (03PS1) 10Andrew Bogott: Add support for a pdns db admin account. [puppet] - 10https://gerrit.wikimedia.org/r/197367 (https://phabricator.wikimedia.org/T92984) [17:45:03] (03PS2) 10Andrew Bogott: Add support for a pdns db admin account. [puppet] - 10https://gerrit.wikimedia.org/r/197367 (https://phabricator.wikimedia.org/T92984) [17:45:28] _joe_: role classes with parameters, yes/no? [17:45:32] now that we have hiera [17:46:00] <_joe_> ottomata: nah, use parameters in the module classes [17:46:06] (03CR) 10Andrew Bogott: [C: 032] Add support for a pdns db admin account. [puppet] - 10https://gerrit.wikimedia.org/r/197367 (https://phabricator.wikimedia.org/T92984) (owner: 10Andrew Bogott) [17:46:16] <_joe_> roles should mostly include other classes that are then configured via hiera [17:46:34] <_joe_> I think if you need a class param in a role class you have badly designed something [17:46:39] <_joe_> but of course YMMV [17:46:54] ok, i thought i had seen somewhere that they were being encouraged more or something, maybe i made that up [17:47:42] (03CR) 10John F. Lewis: [C: 031] "I guess." [dns] - 10https://gerrit.wikimedia.org/r/197361 (owner: 10Dzahn) [17:48:52] nuria: am back. 'sup [17:49:19] YuviPanda: I am trying to add pageview metrics to wikimetrics via logster + statsd labs [17:49:46] YuviPanda: do i need to add statsd puppet code to wikimetrics or labs "assumes" statsd is there?
[17:50:12] we have statsd running on labmon1001.eqiad.wmnet [17:50:16] YuviPanda: cause is not like i want to run statsd myself, i want to use the one you set up on labs [17:50:19] so you can just assume statsd is running there. [17:50:47] YuviPanda: ok, so i just worry about logster then, right? [17:52:51] (03CR) 10Alex Monk: "Per T18655 perhaps we should just drop this entry" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194086 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [17:53:04] nuria: yup [17:53:23] _joe_: did you see my comments about the imagescalers ? [17:53:30] YuviPanda: thanks man, will try to get my changes in today [17:53:37] nuria: \o/ cool [17:55:01] 6operations, 6Labs: Wikitech registration for prior SVN user - https://phabricator.wikimedia.org/T90658#1125984 (10coren) Done. @Dragons_flight: please # reset your password via https://wikitech.wikimedia.org/wiki/Special:PasswordReset # log into https://gerrit.wikimedia.org/ # Add your ssh key there And you... [17:55:10] 6operations, 6Labs: Wikitech registration for prior SVN user - https://phabricator.wikimedia.org/T90658#1125985 (10coren) 5Open>3Resolved a:3coren [17:55:26] (03PS2) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [17:59:26] fucking internet, can’t keep ssh alive... [17:59:35] (03CR) 10coren: Tool Labs: correctly handle comments in crontab (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) (owner: 10coren) [17:59:46] <_joe_> matanya: nope, I'll look [17:59:47] (03PS2) 10coren: Tool Labs: correctly handle comments in crontab [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) [17:59:57] YuviPanda: use mosh [18:00:04] matanya: can’t on labs. 
[18:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150317T1800). Please do the needful. [18:00:13] mosh doesn’t do key forwarding nor proxycommand [18:00:13] <_joe_> matanya: where exactly? [18:00:13] so [18:00:41] _joe_: here yersterday, basically, i said i had some errors creating thumbnails [18:01:05] <_joe_> matanya: uhm ok, why poking me specifically? [18:01:29] (03PS3) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [18:01:37] (03PS1) 10Chad: Consistantly case ElasticSearch as Elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/197373 [18:01:41] YuviPanda: FWIW I use autossh + screen, not a solution but at least reconnects by itself :D [18:01:50] anomie: Implemented both your suggestions (which were sane): https://gerrit.wikimedia.org/r/#/c/197325/2 [18:01:59] _joe_: you converted them to HAT [18:02:10] <_joe_> matanya: I did not [18:02:19] anomie: Also, Dragons flight should be all set. [18:02:26] <_joe_> https://phabricator.wikimedia.org/T84842 is stalled [18:03:12] (03CR) 10Anomie: Tool Labs: correctly handle comments in crontab (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) (owner: 10coren) [18:03:18] Coren: One typo in there [18:03:47] (03PS3) 10BBlack: certs: remove legacy ensure => absent Files [puppet] - 10https://gerrit.wikimedia.org/r/197339 (owner: 10Faidon Liambotis) [18:03:49] oh, sorry _joe_ for some reason i got the imperssion we are a newer version, and that might cause issues. 
[18:03:49] (03PS3) 10BBlack: Kill unused/old/test certificates [puppet] - 10https://gerrit.wikimedia.org/r/197338 (owner: 10Faidon Liambotis) [18:03:51] (03PS3) 10BBlack: sslcert: add sslcert::certificate [puppet] - 10https://gerrit.wikimedia.org/r/197337 (owner: 10Faidon Liambotis) [18:03:53] (03PS3) 10BBlack: sslcert: add sslcert::ca define, use it from certs [puppet] - 10https://gerrit.wikimedia.org/r/197336 (owner: 10Faidon Liambotis) [18:03:54] sorry for the noise [18:03:55] (03PS3) 10BBlack: sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 (owner: 10Faidon Liambotis) [18:03:57] (03PS3) 10BBlack: sslcert: add sslcert::chainedcert [puppet] - 10https://gerrit.wikimedia.org/r/197340 (owner: 10Faidon Liambotis) [18:03:59] (03PS2) 10BBlack: certs: remove compatibility symlinks [puppet] - 10https://gerrit.wikimedia.org/r/197331 (owner: 10Faidon Liambotis) [18:04:01] (03PS2) 10BBlack: Tree-wide certificate path replacement, localcerts [puppet] - 10https://gerrit.wikimedia.org/r/197330 (owner: 10Faidon Liambotis) [18:04:03] (03PS3) 10BBlack: Introduce a new sslcert module (to replace certs.pp) [puppet] - 10https://gerrit.wikimedia.org/r/197335 (owner: 10Faidon Liambotis) [18:04:04] anomie: The importance of review. 
:-) [18:04:05] (03PS3) 10BBlack: certs: kill install_additional_key [puppet] - 10https://gerrit.wikimedia.org/r/197334 (owner: 10Faidon Liambotis) [18:04:07] (03PS2) 10BBlack: certs: inline create_pkcs12's only use and remove [puppet] - 10https://gerrit.wikimedia.org/r/197333 (owner: 10Faidon Liambotis) [18:04:09] (03PS2) 10BBlack: certs: don't install pkcs12 certs on all systems [puppet] - 10https://gerrit.wikimedia.org/r/197332 (owner: 10Faidon Liambotis) [18:04:11] (03PS3) 10coren: Tool Labs: correctly handle comments in crontab [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) [18:04:32] <_joe_> !log updating libavcodec53 and libavformat53 on the imagescalers and videoscalers [18:04:35] Logged the message, Master [18:05:05] Oh i see where i saw it, on tech news [18:05:12] (03CR) 10Anomie: [C: 031] "I don't have +2 here, or I would." [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) (owner: 10coren) [18:05:13] * matanya blames guillom [18:05:35] (03CR) 10Alex Monk: [C: 04-1] Convert some usages of 'wiki' to 'wikipedia' (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [18:05:53] (03CR) 10coren: [C: 032] Tool Labs: correctly handle comments in crontab [puppet] - 10https://gerrit.wikimedia.org/r/197325 (https://phabricator.wikimedia.org/T75256) (owner: 10coren) [18:05:58] <_joe_> matanya: it's not his fault, we were supposed to complete it but hit a serious roadblock [18:06:02] anomie: TY. 
[18:06:27] I see in the ticket, thanks _joe_ [18:06:37] (03PS4) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [18:07:16] <^d> YuviPanda: https://gerrit.wikimedia.org/r/197373 is pretty trivial [18:07:25] <^d> (noticed when setting up a staging-elastic* host) [18:07:42] (03PS2) 10Yuvipanda: Consistantly case ElasticSearch as Elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/197373 (owner: 10Chad) [18:08:15] (03CR) 10Yuvipanda: [C: 032 V: 032] "Let's schedule this via our change management committee with involvement from all relevant stakeholders." [puppet] - 10https://gerrit.wikimedia.org/r/197373 (owner: 10Chad) [18:08:58] 6operations, 7network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1126005 (10Aklapper) Thank you Faidon for forwarding this (and Florian and aude bringing this up)! Just for completeness, to answer @Florian's question (and trying around 18:00UTC): >>! In T92548#1125133... [18:09:42] <^d> YuviPanda: Needs a deployment plan [18:09:57] ^d: does it need a deployment window as well? do we need manager approval? [18:10:12] <^d> I think we might even need a board resolution [18:10:30] ^d: alright, you email wikimedia-l, I’ll go find some goats to sacrifice... 
[18:10:43] <_joe_> tsk [18:10:45] <_joe_> newbies [18:11:00] <_joe_> I've actually worked with an entity that did ITILv3 seriously [18:11:33] (03PS4) 10BBlack: certs: remove legacy ensure => absent Files [puppet] - 10https://gerrit.wikimedia.org/r/197339 (owner: 10Faidon Liambotis) [18:11:35] (03PS4) 10BBlack: Kill unused/old/test certificates [puppet] - 10https://gerrit.wikimedia.org/r/197338 (owner: 10Faidon Liambotis) [18:11:37] <_joe_> they did a disrupting change that brought down 4 major e-commerce sites, and would only implement the fix after they got VP-level approvals [18:11:37] (03PS4) 10BBlack: sslcert: add sslcert::certificate [puppet] - 10https://gerrit.wikimedia.org/r/197337 (owner: 10Faidon Liambotis) [18:11:39] (03PS4) 10BBlack: sslcert: add sslcert::ca define, use it from certs [puppet] - 10https://gerrit.wikimedia.org/r/197336 (owner: 10Faidon Liambotis) [18:11:41] (03PS4) 10BBlack: sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 (owner: 10Faidon Liambotis) [18:11:43] (03PS4) 10BBlack: sslcert: add sslcert::chainedcert [puppet] - 10https://gerrit.wikimedia.org/r/197340 (owner: 10Faidon Liambotis) [18:11:45] <_joe_> 2 hours of outage waiting for that [18:11:45] (03PS3) 10BBlack: certs: remove compatibility symlinks [puppet] - 10https://gerrit.wikimedia.org/r/197331 (owner: 10Faidon Liambotis) [18:11:47] (03PS3) 10BBlack: Tree-wide certificate path replacement, localcerts [puppet] - 10https://gerrit.wikimedia.org/r/197330 (owner: 10Faidon Liambotis) [18:11:49] (03PS4) 10BBlack: Introduce a new sslcert module (to replace certs.pp) [puppet] - 10https://gerrit.wikimedia.org/r/197335 (owner: 10Faidon Liambotis) [18:11:51] (03PS4) 10BBlack: certs: kill install_additional_key [puppet] - 10https://gerrit.wikimedia.org/r/197334 (owner: 10Faidon Liambotis) [18:11:53] (03PS3) 10BBlack: certs: inline create_pkcs12's only use and remove [puppet] - 10https://gerrit.wikimedia.org/r/197333 (owner: 10Faidon 
Liambotis) [18:11:55] (03PS3) 10BBlack: certs: don't install pkcs12 certs on all systems [puppet] - 10https://gerrit.wikimedia.org/r/197332 (owner: 10Faidon Liambotis) [18:14:40] (03PS2) 10Dzahn: set up bonded interface for ms1001 plus ipv6 for it [puppet] - 10https://gerrit.wikimedia.org/r/193837 (owner: 10ArielGlenn) [18:14:51] (03CR) 10BBlack: [C: 031] Tree-wide certificate path replacement, localcerts [puppet] - 10https://gerrit.wikimedia.org/r/197330 (owner: 10Faidon Liambotis) [18:15:02] _joe_: oh wow. [18:15:24] (03CR) 10BBlack: [C: 031] certs: remove compatibility symlinks [puppet] - 10https://gerrit.wikimedia.org/r/197331 (owner: 10Faidon Liambotis) [18:15:33] (03CR) 10BBlack: [C: 031] certs: don't install pkcs12 certs on all systems [puppet] - 10https://gerrit.wikimedia.org/r/197332 (owner: 10Faidon Liambotis) [18:15:46] (03CR) 10BBlack: [C: 031] certs: inline create_pkcs12's only use and remove [puppet] - 10https://gerrit.wikimedia.org/r/197333 (owner: 10Faidon Liambotis) [18:15:52] (03CR) 10BBlack: [C: 031] certs: kill install_additional_key [puppet] - 10https://gerrit.wikimedia.org/r/197334 (owner: 10Faidon Liambotis) [18:15:58] (03CR) 10BBlack: [C: 031] Introduce a new sslcert module (to replace certs.pp) [puppet] - 10https://gerrit.wikimedia.org/r/197335 (owner: 10Faidon Liambotis) [18:16:05] (03CR) 10BBlack: [C: 031] sslcert: add sslcert::ca define, use it from certs [puppet] - 10https://gerrit.wikimedia.org/r/197336 (owner: 10Faidon Liambotis) [18:16:22] (03CR) 10BBlack: [C: 031] sslcert: add sslcert::certificate [puppet] - 10https://gerrit.wikimedia.org/r/197337 (owner: 10Faidon Liambotis) [18:16:27] <_joe_> YuviPanda: for quite a long time they also used remedy, which was only barely working with chrome, but advertised as "IE ONLY" [18:16:29] (03CR) 10BBlack: [C: 031] Kill unused/old/test certificates [puppet] - 10https://gerrit.wikimedia.org/r/197338 (owner: 10Faidon Liambotis) [18:16:38] (03CR) 10BBlack: [C: 031] certs: remove legacy ensure => 
absent Files [puppet] - 10https://gerrit.wikimedia.org/r/197339 (owner: 10Faidon Liambotis) [18:16:57] (03CR) 10BBlack: [C: 031] sslcert: add sslcert::chainedcert [puppet] - 10https://gerrit.wikimedia.org/r/197340 (owner: 10Faidon Liambotis) [18:17:04] (03CR) 10BBlack: [C: 031] sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 (owner: 10Faidon Liambotis) [18:17:20] if anyone else wants to review those, btw.. :) [18:22:51] (03CR) 10MaxSem: Convert some usages of 'wiki' to 'wikipedia' (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [18:22:59] Krenair, ^^^ [18:24:06] (03Abandoned) 10Faidon Liambotis: Autogenerate chained certificates [puppet] - 10https://gerrit.wikimedia.org/r/163798 (owner: 10coren) [18:24:30] (03Abandoned) 10Faidon Liambotis: Remove obsolete explicit ca certs references [puppet] - 10https://gerrit.wikimedia.org/r/165012 (owner: 10coren) [18:24:46] _joe_: one more question: https://phabricator.wikimedia.org/T91908 would you agree to my last statement ? [18:25:07] <_joe_> matanya: I would [18:25:33] akosiaris: same question to you :) [18:25:49] MaxSem, maybe we should move the test wikis to be special wikimedia.org wikis and not wikipedia subdomains [18:26:15] then they can have the wikimedia favicon [18:26:41] arbcom_* and wg_en should keep the wikipedia favicon [18:26:54] maybe. 
but that's not blocking wrt the above commit that removes unrelated icons [18:27:00] (03CR) 10BBlack: [C: 032] Tree-wide certificate path replacement, localcerts [puppet] - 10https://gerrit.wikimedia.org/r/197330 (owner: 10Faidon Liambotis) [18:27:06] arbcom are not wps [18:27:21] they're all very small subgroups of wikipedias [18:29:05] if you move them to wikimedia.org you can never have an arbcom of another project [18:29:20] so either they are part of WP or not [18:29:34] I think you are confusing me talking about two different things, mutante [18:29:43] * mutante shuts up [18:29:52] test and test2 should not be wikipedias and should be moved [18:30:07] arbcom_* and wg_en should not be [18:30:19] ok [18:30:29] MaxSem, those sites used to be subdomains of their respective wikipedias [18:30:38] they aren't now because of the ssl issues [18:31:09] and sanity! because f sanity too! [18:31:10] (03PS1) 10Aude: Don't use bits for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) [18:32:07] also, we shouldn't use underscores in cases like that, we should use dashes [18:32:12] (03PS2) 10Gage: IPsec: big off switch [puppet] - 10https://gerrit.wikimedia.org/r/196498 [18:32:19] underscores are not technically legal in a hostname, even though they're legal in DNS data. [18:32:25] (03CR) 10MaxSem: [C: 031] Don't use bits for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) (owner: 10Aude) [18:32:35] (the usual sane use of them is for DNS metadata records that aren't hosts) [18:32:37] bblack, these are DB names not subdomains [18:32:56] arbcom-de.wikipedia.org -> arbcom_dewiki [18:33:16] and before that it used to be arbcom.de.wikipedia.org afair [18:33:20] right [18:33:27] it should have wikipedia logos etc. 
[18:33:32] which was killed for obvious cert reason [18:33:32] not wikimedia ones [18:33:35] ah ok, I thought you were talking about hostnames [18:33:38] no [18:34:59] (03CR) 10Gage: "* Added explicit blocking and non-blocking modes" [puppet] - 10https://gerrit.wikimedia.org/r/196498 (owner: 10Gage) [18:36:43] (03PS1) 1020after4: Group1 wikis to 1.25wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197382 [18:38:32] (03CR) 1020after4: [C: 032] Group1 wikis to 1.25wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197382 (owner: 1020after4) [18:38:36] (03Merged) 10jenkins-bot: Group1 wikis to 1.25wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197382 (owner: 1020after4) [18:39:31] !log Deploying 1.25wmf21 to group1 wikis. [18:39:38] Logged the message, Master [18:42:02] (03CR) 10Alex Monk: Convert some usages of 'wiki' to 'wikipedia' (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [18:43:46] (03CR) 10BryanDavis: scap: Clone mediawiki-config on all scap masters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) (owner: 10Yuvipanda) [18:44:40] (03CR) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) (owner: 10Yuvipanda) [18:46:24] (03CR) 10GWicke: "@ottomata, nothing stops people from syncing the external repo with the core puppet module. I think the occasional copy & commit is less p" [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [18:48:09] (03PS1) 10Legoktm: Add logmsgbot instance for #wikimedia-releng that listens to gallium [puppet] - 10https://gerrit.wikimedia.org/r/197386 [18:52:34] godog: you still around? 
[18:55:10] 6operations, 10RESTBase: restbase1006 not showing up in graphite cassandra metrics - https://phabricator.wikimedia.org/T92989#1126070 (10GWicke) 3NEW [18:55:45] 6operations, 10ops-ulsfo: cp4009 hardware fault - https://phabricator.wikimedia.org/T92476#1126078 (10Cmjohnson) Work Order submitted for a system board replacement. Once approved. The task will be updated [18:57:05] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.25wmf21 [18:57:15] Logged the message, Master [18:57:49] (03PS1) 10Yuvipanda: Add example PrivateSettings file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197389 [18:57:56] legoktm: thcipriani bd808 ^ [18:59:41] YuviPanda: eh. Private isn't just a symlink in prod? [19:00:02] bd808: it is a symlink to ../private/PrivateSettings.php [19:00:10] *nod* [19:00:32] bd808: so this is just an ‘example’ file, for new setups... [19:00:46] right [19:00:55] lgtm [19:02:49] bd808: cool, ty [19:12:06] (03PS7) 10Chad: Hiera-ize most of the Elasticsearch config [puppet] - 10https://gerrit.wikimedia.org/r/196640 [19:14:18] (03CR) 10BBlack: [C: 032] certs: remove compatibility symlinks [puppet] - 10https://gerrit.wikimedia.org/r/197331 (owner: 10Faidon Liambotis) [19:16:31] 6operations, 10RESTBase, 6Scrum-of-Scrums, 6Services, and 2 others: RESTbase deployment - https://phabricator.wikimedia.org/T1228#1126116 (10GWicke) [19:16:34] 7Blocked-on-Operations, 6operations, 10RESTBase, 10hardware-requests, 7RESTBase-architecture: RESTBase production hardware - 5 of 6 ready - https://phabricator.wikimedia.org/T76986#1126114 (10GWicke) 5Open>3Resolved Resolving with restbase1006 now back in operation. 
[19:16:37] (03CR) 10BBlack: [C: 032] certs: don't install pkcs12 certs on all systems [puppet] - 10https://gerrit.wikimedia.org/r/197332 (owner: 10Faidon Liambotis) [19:16:50] (03CR) 10BBlack: [C: 032] certs: inline create_pkcs12's only use and remove [puppet] - 10https://gerrit.wikimedia.org/r/197333 (owner: 10Faidon Liambotis) [19:17:25] (03CR) 10Chad: [C: 031] "Also works in staging" [puppet] - 10https://gerrit.wikimedia.org/r/196640 (owner: 10Chad) [19:17:36] <^d> YuviPanda: woot ^ [19:18:59] (03CR) 10BBlack: [C: 032] certs: kill install_additional_key [puppet] - 10https://gerrit.wikimedia.org/r/197334 (owner: 10Faidon Liambotis) [19:20:34] (03CR) 10BBlack: [C: 032] Introduce a new sslcert module (to replace certs.pp) [puppet] - 10https://gerrit.wikimedia.org/r/197335 (owner: 10Faidon Liambotis) [19:22:57] !log temporarily disabling puppet on all cp* [19:23:02] Logged the message, Master [19:23:03] PROBLEM - Host cerium is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.16.147) [19:23:23] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: puppet fail [19:23:43] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Puppet has 1 failures [19:23:52] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: Puppet has 1 failures [19:23:52] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail [19:23:53] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: puppet fail [19:23:53] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:11] oh god, puppetfail spam incoming [19:24:13] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:23] PROBLEM - puppet last run on mw1031 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:33] PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:33] PROBLEM - puppet last run on mw1072 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:43] PROBLEM - puppet last run on 
wtp1017 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:43] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:43] PROBLEM - puppet last run on mw1062 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:45] Not too bad so far. [19:24:52] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:53] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:53] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:53] PROBLEM - puppet last run on mw1089 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:53] PROBLEM - puppet last run on mw2046 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:53] PROBLEM - puppet last run on mw2028 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:58] it's going to get bad [19:25:02] PROBLEM - puppet last run on mw1134 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:02] PROBLEM - puppet last run on neptunium is CRITICAL: CRITICAL: puppet fail [19:25:03] PROBLEM - puppet last run on mw1048 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:23] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:32] (03PS1) 10BBlack: Revert "Introduce a new sslcert module (to replace certs.pp)" [puppet] - 10https://gerrit.wikimedia.org/r/197395 [19:25:44] (03CR) 10BBlack: [C: 032 V: 032] Revert "Introduce a new sslcert module (to replace certs.pp)" [puppet] - 10https://gerrit.wikimedia.org/r/197395 (owner: 10BBlack) [19:26:11] ^ that should stop it, other than the ones already in-progress coming in for another minute or two [19:26:54] bblack, out of curiosity, how long does a puppet run take these days? [19:26:56] we can just kill the bot if desired. 
puppet will bring it back [19:27:32] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: puppet fail [19:27:36] seems like we're ok spamwise [19:28:03] a puppet run on neon will take much longer than on most other servers, it depends what role they are [19:28:05] MaxSem: it varies tremendously by host. The best real cases I see commonly, when nothing is changing, are ~20s or so [19:28:18] thanks [19:28:19] but some overloaded boxes with tons of puppet code managing them can take minutes [19:28:23] RECOVERY - Host cerium is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms [19:28:34] that sounds awfully faster than when I started:P [19:29:05] that whole recurring problem with puppet umask values was one of the fallouts of one of the key puppet client perf fixups recently :) [19:29:24] PROBLEM - Host cerium is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.16.147) [19:29:24] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:25] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:25] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:25] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:28] hasharAway: hmm, you are not there. [19:29:33] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:33] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:35] who else works on jenkins? [19:29:43] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:44] PROBLEM - puppet last run on db2040 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:44] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:47] ^d do you work on jenkins? 
[19:29:52] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:53] PROBLEM - puppet last run on mw2022 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:53] PROBLEM - puppet last run on virt1001 is CRITICAL: CRITICAL: puppet fail [19:29:53] PROBLEM - puppet last run on nembus is CRITICAL: CRITICAL: puppet fail [19:29:53] PROBLEM - puppet last run on analytics1010 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:03] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: puppet fail [19:30:03] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:03] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:04] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:06] <^d> I try not to :p What's up? [19:30:13] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:23] PROBLEM - puppet last run on virt1004 is CRITICAL: CRITICAL: puppet fail [19:30:46] ottomata: Members saper, Krinkle, Addshore, hashar, chasemp [19:31:02] PROBLEM - puppet last run on mw2011 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:18] odd that those are still coming in failing, perhaps strontium sync again? [19:31:53] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:31:54] nope. beats me, perhaps just late notifications. 
[19:32:02] PROBLEM - puppet last run on virt1010 is CRITICAL: CRITICAL: puppet fail [19:32:22] the delay it takes for icinga to catch up i think [19:32:23] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: puppet fail [19:32:24] (db1021 ran fine manually for me, and is not a slow executor) [19:32:42] ^d, i'm looking for the jenkins config that does verification of commits to operations/puppet/cdh [19:33:07] <^d> `integration/config.git repo` [19:33:41] <^d> zuul/layout.yaml, most likely [19:33:57] danke, looking [19:35:14] * Krinkle is disturbed by the name 'strontium'. Considering that 'stront' is Dutch for shit [19:35:19] It's called shitium [19:36:23] (03CR) 10Ottomata: "> I think the occasional copy & commit is less painful than..." [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [19:36:35] <^d> Krinkle: "Both strontium and strontianite are named after Strontian, a village in Scotland near which the mineral was discovered in 1790 by Adair Crawford and William Cruickshank." [19:36:44] <^d> Blame the Scots [19:37:12] PROBLEM - puppet last run on virt1011 is CRITICAL: CRITICAL: puppet fail [19:37:43] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: puppet fail [19:37:51] I'm looking for an etymological connection between the foul slang word for feces and Scotland [19:39:23] It's named after the nose of fairies. [19:39:24] the virt failures are separate from the change I reverted (but from the same series), looking into that now.
[19:39:26] amazing [19:39:32] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: puppet fail [19:40:52] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:40:59] (03CR) 10Ottomata: "> @ottomata, nothing stops people from syncing the external repo with the core puppet module" [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [19:41:03] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [19:41:13] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [19:41:33] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [19:41:36] Krinkle: "From Middle Dutch stront (“shit”). Cognate with English strunt" and then on "strunt": (Scotland) spirituous liquor [19:41:53] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [19:41:54] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:41:58] (03PS5) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [19:42:03] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:42:03] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:42:03] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [19:42:04] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [19:42:04] RECOVERY - puppet last run on mc1009 is OK: OK: 
Puppet is currently enabled, last run 1 minute ago with 0 failures [19:42:11] (03PS6) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [19:42:13] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:42:13] RECOVERY - puppet last run on mw2046 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [19:42:14] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:42:22] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [19:42:22] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [19:42:23] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [19:42:33] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:42:34] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [19:42:47] (03CR) 10Krinkle: [C: 04-1] Don't use bits for test.wikidata (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) (owner: 10Aude) [19:42:52] PROBLEM - puppet last run on virt1012 is CRITICAL: CRITICAL: puppet fail [19:42:54] PROBLEM - puppet last run on virt1002 is CRITICAL: CRITICAL: puppet fail [19:43:02] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [19:43:23] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:43:32] RECOVERY - puppet last run on mw2028 is OK: OK: Puppet is currently enabled, last run 1 
minute ago with 0 failures [19:44:40] (03PS7) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [19:45:18] 6operations, 10Citoid, 6Services: Add citoid service alerts to the "services" group for SMS alerts - https://phabricator.wikimedia.org/T92887#1126224 (10Dzahn) @mobrovac let me know your phone number in a PM or so and we can start from there by creating a contact [19:45:53] RECOVERY - puppet last run on db2040 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [19:46:02] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [19:46:03] (03PS1) 10Ottomata: Use literal tab characters in eventlogging varnishkafka format [puppet] - 10https://gerrit.wikimedia.org/r/197398 [19:46:13] (03CR) 10Aude: Don't use bits for test.wikidata (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) (owner: 10Aude) [19:46:18] (03PS2) 10Aude: Don't use bits for test.wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197381 (https://phabricator.wikimedia.org/T92949) [19:46:43] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [19:46:53] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [19:47:03] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:47:03] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [19:47:03] RECOVERY - puppet last run on mw2011 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [19:47:13] RECOVERY - puppet last run on mw2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 
failures [19:47:18] (03PS1) 10BBlack: fix nova key require [puppet] - 10https://gerrit.wikimedia.org/r/197399 [19:47:23] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [19:47:23] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [19:47:33] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [19:47:33] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:47:42] (03PS2) 10BBlack: fix nova key require [puppet] - 10https://gerrit.wikimedia.org/r/197399 [19:47:53] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [19:48:03] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [19:48:03] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:48:23] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:49:08] is jenkins-bot ok? [19:49:22] well jenkins itself [19:49:27] (03PS1) 10Andrew Bogott: Switch designate back to using the proper sql server. 
[puppet] - 10https://gerrit.wikimedia.org/r/197400 [19:49:46] (03CR) 10BBlack: [C: 032 V: 032] fix nova key require [puppet] - 10https://gerrit.wikimedia.org/r/197399 (owner: 10BBlack) [19:51:25] (03PS3) 10Dzahn: set up bonded interface for ms1001 plus ipv6 for it [puppet] - 10https://gerrit.wikimedia.org/r/193837 (owner: 10ArielGlenn) [19:51:43] (03CR) 10Dzahn: [C: 031] set up bonded interface for ms1001 plus ipv6 for it [puppet] - 10https://gerrit.wikimedia.org/r/193837 (owner: 10ArielGlenn) [19:52:53] RECOVERY - Host cerium is UP: PING OK - Packet loss = 0%, RTA = 1.16 ms [19:53:40] 6operations, 6Labs, 10hardware-requests: eqiad: (5) labs virt nodes - https://phabricator.wikimedia.org/T89752#1126241 (10RobH) [19:53:58] (03PS1) 10BBlack: fix nova key source name [puppet] - 10https://gerrit.wikimedia.org/r/197401 [19:54:09] (03CR) 10BBlack: [C: 032 V: 032] fix nova key source name [puppet] - 10https://gerrit.wikimedia.org/r/197401 (owner: 10BBlack) [19:54:51] (03CR) 10Andrew Bogott: [C: 032] Switch designate back to using the proper sql server. [puppet] - 10https://gerrit.wikimedia.org/r/197400 (owner: 10Andrew Bogott) [19:55:14] andrewbogott: got yours [19:55:16] (03CR) 10Krinkle: "Does this create an extra server-side writer "relogmsgbot" to the existing logmsgbot IRC bot. Or does this create a differently nicknamed " [puppet] - 10https://gerrit.wikimedia.org/r/197386 (owner: 10Legoktm) [19:55:21] (03PS8) 10Yuvipanda: scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) [19:55:28] bblack: ok, thanks [19:55:38] (03CR) 10Yuvipanda: [C: 032 V: 032] scap: Clone mediawiki-config on all scap masters [puppet] - 10https://gerrit.wikimedia.org/r/197355 (https://phabricator.wikimedia.org/T88442) (owner: 10Yuvipanda) [19:56:13] PROBLEM - Host cerium is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.16.147) [19:56:22] uhm [19:56:25] why is puppet disabled on tin?! 
[19:56:32] good question! don't know [19:56:43] RECOVERY - puppet last run on virt1009 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [19:56:44] * YuviPanda looks at sal [19:56:53] RECOVERY - puppet last run on virt1011 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [19:57:00] > 00:01 Tim: on tin: disabling puppet for scap test. Patching scap locally [19:57:16] TimStarling: ^ can I re-enable tin on scap? [19:57:18] brrrr [19:57:22] Can I re-enable puppet on tin? [19:58:53] (03CR) 10Legoktm: "AIUI, this will create a new "relogmsgbot" that we can have say stuff on IRC (from gallium: echo "foo" | nc -q0 neon.wikimedia.org 9200; o" [puppet] - 10https://gerrit.wikimedia.org/r/197386 (owner: 10Legoktm) [19:59:08] 6operations, 6Phabricator: Delete specific user account in Phabricator - https://phabricator.wikimedia.org/T93001#1126269 (10Dzahn) [20:00:12] RECOVERY - puppet last run on virt1002 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [20:00:23] !log stashing TimStarling’s changes to scap, re-enabling puppet on tin [20:00:27] Logged the message, Master [20:00:53] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet last ran 19 hours ago [20:01:22] RECOVERY - puppet last run on virt1012 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [20:03:23] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [20:03:55] (03PS2) 10Yuvipanda: Add example PrivateSettings file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197389 [20:04:04] (03CR) 10Yuvipanda: [C: 032] Add example PrivateSettings file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197389 (owner: 10Yuvipanda) [20:04:10] (03Merged) 10jenkins-bot: Add example PrivateSettings file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197389 (owner: 10Yuvipanda) [20:04:13] 6operations, 6Phabricator: Delete 
specific user account in Phabricator - https://phabricator.wikimedia.org/T93001#1126303 (10matmarex) Confirmation: https://meta.wikimedia.org/w/index.php?title=User_talk:Wieralee&diff=11566986&oldid=11421453#Phabricator_-_a_problem_with_an_email [20:04:44] !log yuvipanda Synchronized private: (no message) (duration: 00m 06s) [20:04:49] Logged the message, Master [20:05:44] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [20:06:54] RECOVERY - puppet last run on virt1001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [20:07:03] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [20:07:17] (03PS5) 10BBlack: certs: remove legacy ensure => absent Files [puppet] - 10https://gerrit.wikimedia.org/r/197339 (owner: 10Faidon Liambotis) [20:07:17] (03PS5) 10BBlack: Kill unused/old/test certificates [puppet] - 10https://gerrit.wikimedia.org/r/197338 (owner: 10Faidon Liambotis) [20:07:19] (03PS5) 10BBlack: sslcert: add sslcert::certificate [puppet] - 10https://gerrit.wikimedia.org/r/197337 (owner: 10Faidon Liambotis) [20:07:21] (03PS5) 10BBlack: sslcert: add sslcert::ca define, use it from certs [puppet] - 10https://gerrit.wikimedia.org/r/197336 (owner: 10Faidon Liambotis) [20:07:23] (03PS5) 10BBlack: sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 (owner: 10Faidon Liambotis) [20:07:25] (03PS5) 10BBlack: sslcert: add sslcert::chainedcert [puppet] - 10https://gerrit.wikimedia.org/r/197340 (owner: 10Faidon Liambotis) [20:07:27] (03PS1) 10BBlack: Introduce a new sslcert module (to replace certs.pp) (#2) [puppet] - 10https://gerrit.wikimedia.org/r/197404 [20:07:32] RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [20:08:11] 6operations, 6Phabricator: Delete specific user account in Phabricator - 
https://phabricator.wikimedia.org/T93001#1126322 (10yuvipanda) 5Open>3Resolved Done [20:09:03] RECOVERY - puppet last run on virt1010 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [20:09:33] RECOVERY - puppet last run on virt1007 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [20:10:42] PROBLEM - Host xenon is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.0.200) [20:11:12] PROBLEM - Host praseodymium is DOWN: CRITICAL - Bogus ICMP: Port Unreachable (10.64.16.149) [20:11:44] (03CR) 10BBlack: [C: 04-1] "This passes lint and looks sane to me, but the earlier (reverted) commit of the same broke prod hosts with e.g.: "Error: Could not retriev" [puppet] - 10https://gerrit.wikimedia.org/r/197404 (owner: 10BBlack) [20:14:44] RECOVERY - puppet last run on virt1008 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [20:18:13] 6operations, 5Patch-For-Review, 7domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1126378 (10RobH) [20:19:21] (03PS2) 10RobH: adding wikimedia.xyz domain support [dns] - 10https://gerrit.wikimedia.org/r/196312 [20:25:07] (03PS2) 10Ottomata: Use literal tab characters in eventlogging varnishkafka format [puppet] - 10https://gerrit.wikimedia.org/r/197398 [20:26:39] (03CR) 10Ottomata: [C: 032] Use literal tab characters in eventlogging varnishkafka format [puppet] - 10https://gerrit.wikimedia.org/r/197398 (owner: 10Ottomata) [20:28:01] (03PS6) 10BBlack: certs: remove legacy ensure => absent Files [puppet] - 10https://gerrit.wikimedia.org/r/197339 (owner: 10Faidon Liambotis) [20:28:03] (03PS6) 10BBlack: Kill unused/old/test certificates [puppet] - 10https://gerrit.wikimedia.org/r/197338 (owner: 10Faidon Liambotis) [20:28:05] (03PS6) 10BBlack: sslcert: add sslcert::certificate [puppet] - 10https://gerrit.wikimedia.org/r/197337 (owner: 10Faidon Liambotis) [20:28:07] (03PS6) 10BBlack: sslcert: add sslcert::ca define, use it 
from certs [puppet] - 10https://gerrit.wikimedia.org/r/197336 (owner: 10Faidon Liambotis) [20:28:09] (03PS6) 10BBlack: sslcert: generate chained certs automatically [puppet] - 10https://gerrit.wikimedia.org/r/197341 (owner: 10Faidon Liambotis) [20:28:11] (03PS6) 10BBlack: sslcert: add sslcert::chainedcert [puppet] - 10https://gerrit.wikimedia.org/r/197340 (owner: 10Faidon Liambotis) [20:28:13] (03PS2) 10BBlack: Introduce a new sslcert module (to replace certs.pp) (#2) [puppet] - 10https://gerrit.wikimedia.org/r/197404 [20:29:18] (03PS3) 10RobH: adding wikimedia.xyz domain support [dns] - 10https://gerrit.wikimedia.org/r/196312 [20:30:00] (03CR) 10RobH: [C: 032] adding wikimedia.xyz domain support [dns] - 10https://gerrit.wikimedia.org/r/196312 (owner: 10RobH) [20:32:47] hmm [20:32:48] [Error] Failed to load resource: the server responded with a status of 500 (Internal Server Error) (120px-Screen_Shot_2014-10-22_at_10.10.01_AM.png, line 0) [20:33:59] reproducible and consistently... [20:34:14] something wrong with a thumbnail server ? [20:35:20] x-cachecp1062 miss (0), cp3009 miss (0), cp3016 frontend miss (0) [20:35:22] (03PS1) 10Yuvipanda: integration: Enable project-wide puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/197409 [20:35:48] (03PS2) 10Yuvipanda: integration: Enable project-wide puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/197409 [20:36:12] Hi, I have a problem with my git repo. Jenkins complains that a change can not be merge with the state of my repo https://gerrit.wikimedia.org/r/#/c/197405/ but I can not see a problem in https://git.wikimedia.org/commit/mediawiki%2Fextensions%2FMathSearch/baf9ed156345aefad0fabb63319b5e79af318ebc [20:36:43] 6operations, 5Patch-For-Review, 7domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1126446 (10RobH) DNS is now live for wikimedia.xyz. Redirection support adding @ 14:00 Pacific. [20:37:00] thedj: uh, can you give me a URL? 
[20:37:12] (03PS1) 10Nuria: [WIP] Testing pageviews and logster and wikimetrics [puppet] - 10https://gerrit.wikimedia.org/r/197411 [20:37:51] Glaisher: heh, thx for spotting the xyz typo, that would have sucked =P [20:37:52] YuviPanda: it's the history thumb at: https://commons.wikimedia.org/wiki/File:Screen_Shot_2014-10-22_at_10.10.01_AM.png [20:38:21] oh and of COURSE now it works [20:38:39] thedj: wfm. probably a small race when the apaches were restarted for robh’s change :) [20:38:42] anyway: https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Screen_Shot_2014-10-22_at_10.10.01_AM.png/120px-Screen_Shot_2014-10-22_at_10.10.01_AM.png [20:38:52] YuviPanda: i didnt restart anything yet [20:38:56] oh [20:38:58] its not scheduled until 2 [20:38:59] that’s stranger then... [20:39:26] robh: hahaha, so your C+2 on that patch happened exactly at 2AM my time, and so I just assumed... [20:39:28] timezones. [20:40:30] physikerwelt: hmm, that's weird [20:41:05] (03PS2) 10Nuria: [WIP] Testing pageviews, logster and wikimetrics [puppet] - 10https://gerrit.wikimedia.org/r/197411 [20:42:00] physikerwelt: it's parent is the head of the repo, so that shouldn't fail normally. unless jenkins didn't reset it's repo [20:42:29] thedj: Yes... that's why I went to #wikimedia-operations ... maybe something went wrong in with jenkins [20:43:28] physikerwelt: hmm, i see in the other channel, they have been playing with zuul a bit [20:44:13] thedj: it started with a commit that was not submitted... even though jenkins normally does that automatically https://gerrit.wikimedia.org/r/#/c/197267/2 [20:45:22] thedj: Do you think I should just push the change... or will this confuse jenkins even further? [20:45:34] join wikimedia-releng please [20:47:17] physikerwelt: "seems something is wrong on zuul/gerrit side" is what they say there [20:58:32] (03CR) 10Tim Landscheidt: "Doesn't this need setting the corresponding Puppet variable to the project puppet master as well?" 
[puppet] - 10https://gerrit.wikimedia.org/r/197409 (owner: 10Yuvipanda) [20:59:07] (03CR) 10Yuvipanda: "https://wikitech.wikimedia.org/w/index.php?title=Hiera%3AIntegration&diff=148696&oldid=147018 :D" [puppet] - 10https://gerrit.wikimedia.org/r/197409 (owner: 10Yuvipanda) [21:00:04] RobH: Respected human, time to deploy apache redirection support - wikimedia.xyz (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150317T2100). Please do the needful. [21:00:32] (03PS1) 10Ottomata: Make eventlogging forwarder define work with new forwarder config file format [puppet] - 10https://gerrit.wikimedia.org/r/197413 [21:02:39] YuviPanda: Tim's scap change got merged but apparently not properly deployed [21:02:48] I'll take care of it [21:02:56] bd808: ah, sweet. thanks [21:03:15] akosiaris: hey! the thing that happened with git-deploy and citoid is happening again now :( this time in staging, where I’m trying to set up sca01... [21:04:50] (03PS2) 10Ottomata: Make eventlogging forwarder define work with new forwarder config file format [puppet] - 10https://gerrit.wikimedia.org/r/197413 [21:05:25] (03CR) 10Yuvipanda: [C: 032] integration: Enable project-wide puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/197409 (owner: 10Yuvipanda) [21:05:36] !log mw1222.eqiad.wmnet and mw2003.codfw.wmnet not responding to trebuchet fetch for scap [21:05:41] Logged the message, Master [21:06:20] (03CR) 10BBlack: "This may be a puppet module autoloading bug on the master(s) and/or a timing dep there. May be best to just split commit in two (add new " [puppet] - 10https://gerrit.wikimedia.org/r/197404 (owner: 10BBlack) [21:06:36] (03CR) 10Tim Landscheidt: "Hmmm. Should Hiera:Integration not be moved to hieradata/labs/something then? 
(That's not a rhetorical question -- I haven't made my min" [puppet] - 10https://gerrit.wikimedia.org/r/197409 (owner: 10Yuvipanda) [21:07:05] (03CR) 10Yuvipanda: "I think it should be 'test on Hiera:, and then move into the repo'" [puppet] - 10https://gerrit.wikimedia.org/r/197409 (owner: 10Yuvipanda) [21:07:18] !log trebuchet checkout errors from mw1104, mw1113, mw1222. No response from mw2003 [21:07:21] Logged the message, Master [21:07:39] ok, better late than never [21:07:44] starting my apache deployment =P [21:09:11] !log Updated scap to include I6301816 (Check for content before !log updating apche redirects, disabling puppet on mw hosts [21:09:18] Logged the message, Master [21:09:21] Logged the message, Master [21:09:25] bd808: uh... you are deploying stuff? [21:09:36] just making sure we arent both hammering mw systems at same time =] [21:09:37] robh: just updating scap on tin [21:09:41] and all done [21:09:42] coolness [21:10:28] man i hate apache deployments... i have flashbacks of site outages =P [21:11:17] (03CR) 10RobH: [C: 032] adding support to redirect wikimedia.xyz to wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/196321 (owner: 10RobH) [21:15:52] whew, test change works, enabling across puppet runs on mw systems now [21:16:14] PROBLEM - puppet last run on mw2004 is CRITICAL: CRITICAL: Puppet last ran 4 days ago [21:16:45] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1126582 (10hashar) Zuul packaging as been discussed during the weekly ops meeting on 03/16. Andrew B. relayed the info... 
[21:19:53] RECOVERY - puppet last run on mw2004 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:23:27] (03CR) 10Alex Monk: "So this was never a configured namespace, but there are 47 existing links to it, half of which don't seem to refer to a valid page in the " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195017 (owner: 10Mjbmr) [21:27:14] ok, apache deploys now totally take longer [21:27:17] but are far more sane [21:27:21] my puppet runs are still going =P [21:28:27] andddd it just finished [21:28:31] csteipp: done with my deployment window [21:28:51] !log apaches updated per https://phabricator.wikimedia.org/T92547 and appear stable [21:28:58] Logged the message, Master [21:29:37] (03PS1) 10Yuvipanda: staging: Add sca role to sca nodes [puppet] - 10https://gerrit.wikimedia.org/r/197416 (https://phabricator.wikimedia.org/T91554) [21:29:46] (03CR) 10jenkins-bot: [V: 04-1] staging: Add sca role to sca nodes [puppet] - 10https://gerrit.wikimedia.org/r/197416 (https://phabricator.wikimedia.org/T91554) (owner: 10Yuvipanda) [21:29:59] (03PS2) 10Yuvipanda: staging: Add sca role to sca nodes [puppet] - 10https://gerrit.wikimedia.org/r/197416 (https://phabricator.wikimedia.org/T91554) [21:30:15] (03CR) 10Yuvipanda: [C: 032 V: 032] staging: Add sca role to sca nodes [puppet] - 10https://gerrit.wikimedia.org/r/197416 (https://phabricator.wikimedia.org/T91554) (owner: 10Yuvipanda) [21:33:32] 6operations, 5Patch-For-Review, 7domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1126641 (10RobH) 5Open>3Resolved apache redirection support is now live. So our cluster now supports wikimedia.xyz and redirects it to wikimedia.org. Resolving this request. 
[21:39:48] (03PS3) 10Ottomata: Make eventlogging forwarder define work with new forwarder config file format [puppet] - 10https://gerrit.wikimedia.org/r/197413 [21:41:41] Krinkle: can you help me figure out which patch/group to revert for that section heading issue? [21:41:55] MatmaRex: ^^ ditto [21:42:07] https://phabricator.wikimedia.org/T18691 [21:42:24] (03PS1) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [21:42:55] (03CR) 10Ottomata: [C: 04-1] "WIP, eventlogging code will need to be deployed first." [puppet] - 10https://gerrit.wikimedia.org/r/197413 (owner: 10Ottomata) [21:43:07] (03CR) 10Ottomata: [C: 04-1] "WIP, eventlogging code will need to be deployed first." [puppet] - 10https://gerrit.wikimedia.org/r/197418 (owner: 10Ottomata) [21:43:13] (03CR) 10jenkins-bot: [V: 04-1] Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 (owner: 10Ottomata) [21:43:44] greg-g: reverting only the original core patch should be enough [21:44:11] greg-g: there were more patches to skins, that should not cause problems if left there, but could be reverted too [21:44:17] kk [21:44:27] greg-g: and there were more patches to update extension parser tests, which we'll probably want to revert [21:44:39] (03PS2) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [21:44:41] greg-g: it would be easier to figure out if phab would let me view more than the last 100 comments on the task [21:45:16] MatmaRex: :/ [21:45:26] wait, really, wtf [21:45:28] (03CR) 10jenkins-bot: [V: 04-1] Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 (owner: 
10Ottomata) [21:45:32] yes [21:45:34] really [21:45:34] :( [21:45:42] supposedly fixed upstream, but we didn't update yet? [21:45:49] or the fix didn't actually fix it [21:45:52] do you know what wmfXX it went out in originally? [21:46:00] there's a task in our phab about it somewhere [21:46:14] (03PS3) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [21:46:24] MatmaRex: https://gerrit.wikimedia.org/r/#/c/186332/ that one? [21:46:32] Yeah I thought there was something broken about that task's comments [21:46:40] yes, wmf19 apparently [21:46:42] huh [21:46:43] * greg-g nods [21:46:44] (I also went to try and find that information, and didn't get very far :() [21:46:47] https://www.mediawiki.org/wiki/MediaWiki_1.25/wmf19 [21:46:48] "Included in: master, wmf/1.20wmf21, wmf/1.25wmf19, wmf/1.25wmf20, wmf/1.25wmf21" [21:46:49] the fuck [21:46:59] ? [21:47:02] where did the 1.20 come from [21:47:07] oh ignore that [21:47:15] there's a task open to kill that [21:47:18] probably a bad script run [21:47:21] yep [21:47:28] anywho, MatmaRex can you revert that, Krenair wanna push it? :) [21:47:35] Not particularly. [21:47:45] I quite like it. [21:47:54] But okay. [21:47:58] greg-g: actually, there seems to have been some follow-ups, so it won't be a simple revert [21:48:01] that's fine, but it's breaking search results, we can fix it and redo [21:48:05] redo it8 [21:48:06] gah * [21:48:12] :/ [21:48:35] https://phabricator.wikimedia.org/T92501 - Clean up erroneously created wmf/1.20wmf21 branches [21:48:43] greg-g: there is https://gerrit.wikimedia.org/r/#/c/194377/ up for review which would most likely fix that [21:48:58] (i also quite like it, but i won't oppose reverting) [21:49:17] I like it, too, but only in a fixed form :) [21:49:38] Krinkle: around? [21:50:09] greg-g: Yup, saw your ping. Will check out which commits need reverting. 
[21:50:20] greg-g: It's no longer just one. There's been a few follow-up commits in different repos as well as fixups [21:50:26] thedj: around? see ^ [21:51:00] kk, MatmaRex/ Krinkle if you want to work together on this that's cool, if it's easier for just one person to do it looks like Krinkle's on it? [21:51:02] greg-g: Krinkle: https://gerrit.wikimedia.org/r/#/c/194377/ would work as a fix for most of the technical problems, after being fixed not to break in presence of HTML caches [21:51:17] * greg-g lets you two figure out the details [21:51:29] might be easier than reverting, but it also kind of sucks to experiment on living body like this [21:51:45] i have a university deadline in one hour and nine minutes [21:51:49] so ask me again then [21:51:52] :) [21:51:55] Krinkle: you're on point then :) [21:52:00] MatmaRex: :) [21:53:18] !log running varnishncsa in a shell on cp1056 to analyze usage of ext.geshi.language.* modules. [21:53:24] MatmaRex: We don't need lang/dir on headings though [21:53:25] Logged the message, Master [21:53:36] Can be and is inherited. [21:54:07] greg-g: I'm not on it yet. [21:54:23] (03PS1) 10Yuvipanda: sca: Include realserver only in production [puppet] - 10https://gerrit.wikimedia.org/r/197419 [21:54:29] greg-g: And wouldn't mind MatmaRex doing it. Seems like he's got a better handle on it as main caretaker of the feature. [21:54:31] (03CR) 10jenkins-bot: [V: 04-1] sca: Include realserver only in production [puppet] - 10https://gerrit.wikimedia.org/r/197419 (owner: 10Yuvipanda) [21:54:42] Krinkle: he has a university deadline :/ [21:54:52] OK :) [21:55:08] greg-g: Krinkle: if it can be done tomorrow, then i can do it [21:55:19] this is a "do it now" thing, unfortunately [21:55:37] okay [21:55:38] "unbreak now!" 
in phab speak [21:55:43] then [21:55:45] for simplest fix [21:55:56] just remove the few lines of code from Linker::makeHeadline [21:56:08] nothing will break if you do that, apart from parser tests of course [21:56:12] (03PS1) 10Andrew Bogott: Fix the args for designate-sink. [puppet] - 10https://gerrit.wikimedia.org/r/197420 [21:56:51] (03CR) 10Andrew Bogott: [C: 032] Fix the args for designate-sink. [puppet] - 10https://gerrit.wikimedia.org/r/197420 (owner: 10Andrew Bogott) [21:56:53] trying to fully revert this on short notice is going to suck, but removing it like that is easy [21:56:55] MatmaRex: can you do that (the first part) and then we'll work on the follow up (parser test fix) later? [21:57:33] meh, okay [21:58:20] (03PS1) 10Yuvipanda: citoid: Specify default port for citoid in puppet itself [puppet] - 10https://gerrit.wikimedia.org/r/197421 [21:58:25] MatmaRex: If you're pressed on time, don't worry about it. I'll remove it within the next few hours. [21:58:29] (03CR) 10jenkins-bot: [V: 04-1] citoid: Specify default port for citoid in puppet itself [puppet] - 10https://gerrit.wikimedia.org/r/197421 (owner: 10Yuvipanda) [21:59:14] (03PS2) 10Yuvipanda: sca: Include realserver only in production [puppet] - 10https://gerrit.wikimedia.org/r/197419 [21:59:39] (03CR) 10Yuvipanda: [C: 032 V: 032] sca: Include realserver only in production [puppet] - 10https://gerrit.wikimedia.org/r/197419 (owner: 10Yuvipanda) [21:59:41] greg-g: Krinkle: https://gerrit.wikimedia.org/r/197424 [22:00:38] (03PS4) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [22:00:45] !log csteipp Synchronized php-1.25wmf21/includes/: (no message) (duration: 00m 11s) [22:00:46] eh, it needs rebase, and i can't pull from gerrit *again* [22:00:50] Logged the message, Master [22:00:52] it always breaks at worst times [22:03:09] (03PS2) 10Yuvipanda: citoid: Specify default port 
for citoid in puppet itself [puppet] - 10https://gerrit.wikimedia.org/r/197421 [22:03:27] (03PS5) 10Ottomata: Set up kafka processor and kafka consumer for client side events from /beacon/event.gif via varnishkafka [puppet] - 10https://gerrit.wikimedia.org/r/197418 [22:05:04] (03CR) 10Yuvipanda: [C: 032 V: 032] citoid: Specify default port for citoid in puppet itself [puppet] - 10https://gerrit.wikimedia.org/r/197421 (owner: 10Yuvipanda) [22:05:57] MatmaRex: OK. Want me to resolve conflict? [22:06:11] Krinkle: please do [22:06:27] !log deployed patches for T85848 to wmf20 and 21 [22:06:31] Logged the message, Master [22:06:37] MatmaRex: Hm.. pull from gerrit doesn't work through my VPN. [22:06:39] Weird stuff [22:06:43] Without it it works [22:09:04] (03CR) 10Yuvipanda: "This broke beta / staging, since a port was no longer specified by default, and hiera in hieradata/common doesn't actually apply to labs a" [puppet] - 10https://gerrit.wikimedia.org/r/195896 (https://phabricator.wikimedia.org/T89875) (owner: 10Mobrovac) [22:14:51] (03PS3) 10Yuvipanda: Parameterize mail::mx role [puppet] - 10https://gerrit.wikimedia.org/r/196658 (https://phabricator.wikimedia.org/T91562) (owner: 10Thcipriani) [22:15:15] (03CR) 10Yuvipanda: [C: 032 V: 032] Parameterize mail::mx role [puppet] - 10https://gerrit.wikimedia.org/r/196658 (https://phabricator.wikimedia.org/T91562) (owner: 10Thcipriani) [22:17:02] (03CR) 10Mjbmr: "Yes, they have been deleted, but once this is done, they will refer to right pages." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195017 (owner: 10Mjbmr) [22:27:19] (03CR) 10Ottomata: [C: 04-1] "WIP, eventlogging code will need to be deployed first." [puppet] - 10https://gerrit.wikimedia.org/r/197418 (owner: 10Ottomata) [22:28:07] 6operations, 10Datasets-General-or-Unknown, 6Services, 10hardware-requests: Hardware for HTML / zim dumps - https://phabricator.wikimedia.org/T91853#1126747 (10GWicke) @robh, can we go ahead with one of the spares? 
[22:28:42] PROBLEM - RAID on db1004 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [22:30:16] (03PS1) 10Legoktm: Load 3 extensions via extension registration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197427 [22:30:32] !log deployed updated patch for T73394 [22:30:37] Logged the message, Master [22:35:57] greg-g: it's not on enwiki yet, right? [22:36:39] Krinkle: since it was in wmf19, yeah, it is [22:36:56] hence the "unbreak now!" priority [22:37:02] I don't see it there, though [22:37:06] hmmm [22:37:18] Trying to test the hiding patch by loading dynamically via my chrome hack [22:37:20] §s, §s everywhere on enwiki! [22:37:44] ACKNOWLEDGEMENT - RAID on db1004 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean Pringle don't care. box is to be decommed [22:38:27] (03CR) 10Jforrester: [C: 031] "OMG." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197427 (owner: 10Legoktm) [22:38:29] k [22:38:38] legoktm: You intending to SWAT that? [22:38:42] James_F: yup! [22:38:47] legoktm: Awesomesauce. [22:39:22] Krinkle: the page had to be recently edited for the parser cache to be updated [22:39:29] Ah, right [22:41:05] legoktm, would the instantiating CR through the queue overwrite $wgGroupPermissions settings from CommonSettings.php? [22:43:16] greg-g: creating cherry-picks now [22:43:24] all need manual work since the parser tests file conflicts [22:43:48] Krinkle: :/ thanks [22:44:52] MaxSem: no. It will do a += so the one in CommonSettings.php will take precedence: https://github.com/wikimedia/mediawiki/blob/master/includes/registration/ExtensionRegistry.php#L153 [22:45:46] legoktm, what if an extension wants to overwrite a setting then? :P [22:46:16] MaxSem: in what way? [22:47:02] I dunno - change some core setting to work with this extension? [22:48:52] for now they can change it by using a "callback" and changing the global [22:53:34] greg-g: To deploy right away? 
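Note on the "+=" answer above: PHP's array union keeps every key already present on the left-hand side and only adds keys that are missing, which is why a value set earlier in CommonSettings.php would win over an extension's default. The Python below is only a rough sketch of that union behaviour, not the ExtensionRegistry code itself; the function name php_array_union and the group and right names in the sample data are made up for illustration.

```python
def php_array_union(existing, incoming):
    """Mimic PHP's `$existing += $incoming` for string-keyed arrays.

    Keys already in `existing` keep their value; only keys missing
    from `existing` are copied over from `incoming`. The union is
    shallow: nested arrays are not merged key by key.
    """
    merged = dict(existing)
    for key, value in incoming.items():
        merged.setdefault(key, value)
    return merged


# Hypothetical values: what the site config already set ...
site_config = {"sysop": {"delete": True}}
# ... and what an extension would like to register as defaults.
extension_defaults = {
    "sysop": {"delete": False, "example-right": True},
    "examplegroup": {"read": True},
}

merged = php_array_union(site_config, extension_defaults)
print(merged)
```

Because the union is shallow, the whole "sysop" entry from the site config survives and the extension's "example-right" is not added under it; only the previously unknown "examplegroup" key is picked up.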
[22:54:11] (03CR) 10Kaldari: [C: 031] [WikiGrok] Add the "filmProducer" campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194503 (owner: 10Phuedx) [22:54:21] I think it's supposed to be going on the SWAT. [22:54:28] But no one has added it yet. [22:54:51] Ah deploy is right up in an hour [22:54:52] OK [22:55:01] 6 minutes [22:55:03] :) [22:55:05] I'll add it there [22:55:58] Yeah, awfully close right? [22:57:58] (03PS3) 10Nuria: [WIP] Testing pageviews, logster and wikimetrics [puppet] - 10https://gerrit.wikimedia.org/r/197411 [22:58:31] In that case, RoanKattouw / Krenair: Sorry for merging the wmf branch commit already. Beware :) [22:58:42] Krinkle, sigh. [22:58:42] RECOVERY - uWSGI web apps on graphite2001 is OK: OK: All defined uWSGI apps are runnning. [22:59:02] Krinkle, why did you not just check the calendar? [22:59:42] Because timezones (been in NL for a few weeks, got home yesterday) and because I don't SWAT very often. [22:59:47] I should've checked, sorry. [23:00:03] It's only merged at this point. Nothing beyond that. [23:00:04] RoanKattouw, ^d, Krenair, MaxSem: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150317T2300). Please do the needful. [23:00:10] okay [23:00:11] hrmm [23:00:18] * MaxSem is here [23:00:19] So I'll do Krinkle's one first [23:00:20] someone needs to make jouncebot process the humans [23:00:22] and pluralize [23:00:29] or just human(s) in text ;D [23:00:41] Instead of simply reverting it and doing it last >_> [23:00:42] the latter scales better... [23:01:00] * legoktm is here [23:01:10] * ebernhardson is around [23:01:45] robh, it's not UDP it's multicast. every addressee receives a personalized message, it just happens to be the same :P [23:01:59] Krinkle, problem. [23:02:25] PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CRITICAL: Not all configured uWSGI apps are running. [23:08:50] Krenair: problem? [23:08:58] yes. 
[23:09:12] what is it? :P [23:10:58] Private. [23:11:03] I think I've resolved it now anyway. Hopefully. [23:12:46] Oh FFS [23:13:17] This one is your fault legoktm. [23:13:29] :< [23:13:30] always [23:13:34] 23:12:22 sync-dir failed: /srv/mediawiki-staging/php-1.25wmf20/extensions/Scribunto/engines/LuaCommon/lualib/ustring/make-tables.php has content before opening ugh [23:13:48] but [23:13:48] but [23:13:48] Trying to sync-dir php-1.25wmf20 [23:14:36] sigh [23:14:40] it's using short php tags [23:15:07] Krenair: can you sync a smaller dir? [23:15:10] also ugh - why whole wikiversion dir? [23:15:13] 3 smaller dirs? [23:15:19] I guess that's one workaround >_> [23:15:45] I do expect that to work, however. [23:16:03] !log krenair Synchronized php-1.25wmf20/includes/Linker.php: https://gerrit.wikimedia.org/r/#/c/197430/ (duration: 00m 05s) [23:16:10] Logged the message, Master [23:16:27] * legoktm will brb in 5 [23:16:31] !log krenair Synchronized php-1.25wmf20/resources: https://gerrit.wikimedia.org/r/#/c/197430/ (duration: 00m 05s) [23:16:34] Logged the message, Master [23:17:26] Krinkle, check [23:17:32] I think it's OK [23:18:09] Krenair, wfm [23:18:30] :`( [23:19:05] ValueError: /srv/mediawiki-staging/php-1.25wmf20/tests/parser/parserTests.txt has content before opening Seriously. [23:19:42] I'm going to leave this file but this needs to be fixed. [23:19:56] (03PS1) 10Tim Landscheidt: Tools: Port portgrabber to Python [puppet] - 10https://gerrit.wikimedia.org/r/197439 (https://phabricator.wikimedia.org/T91954) [23:20:06] oh heh [23:20:21] can we check images too? :P [23:22:34] !log krenair Synchronized php-1.25wmf21/includes/Linker.php: https://gerrit.wikimedia.org/r/#/c/197428/ (duration: 00m 05s) [23:22:39] Logged the message, Master [23:22:53] !log krenair Synchronized php-1.25wmf21/resources: https://gerrit.wikimedia.org/r/#/c/197428/ (duration: 00m 05s) [23:22:56] Logged the message, Master [23:22:56] Krinkle, are you there? 
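Note on the sync-dir failures above: the check that tripped seems to reject any file with content ahead of its opening PHP tag, and it fired both on a script using a short open tag and on parserTests.txt, which is not PHP at all. The Python below is a minimal sketch of how such a check could be scoped, assuming the opening tag in question is "<?php"; it is not scap's actual implementation, and the names has_content_before_php_tag and should_check are hypothetical.

```python
import os


def has_content_before_php_tag(source, accept_short_tag=False):
    """Return True if the text does not begin with a PHP opening tag.

    Assumption: "<?php" is the expected opening tag. With
    accept_short_tag=True a bare "<?" is tolerated as well.
    """
    if source.startswith("<?php"):
        return False
    if accept_short_tag and source.startswith("<?"):
        return False
    return True


def should_check(path):
    """Only run the check on files with a .php extension."""
    return os.path.splitext(path)[1] == ".php"


# A file using a short open tag is flagged unless short tags are accepted.
short_tag_source = "<?\necho 'tables';\n"
print(has_content_before_php_tag(short_tag_source))
print(has_content_before_php_tag(short_tag_source, accept_short_tag=True))

# A non-PHP file would be skipped before its content is ever inspected.
print(should_check("tests/parser/parserTests.txt"))
```

Filtering on the extension first would avoid the parserTests.txt false positive, while the short-tag switch leaves it as an explicit decision whether such files should pass.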
[23:23:00] Krenair: Yup [23:23:11] You're supposed to be confirming that these work OK [23:23:41] Krenair: Confirmed it works OK on cached pages (is now hidden() [23:23:46] files fail to be renamed and get disconnected from the DB record on move >> https://phabricator.wikimedia.org/T93009 [23:24:05] Krenair: Confirmed its gone from newly parsed pages [23:24:17] Moving on to MaxSem [23:24:23] yessir! [23:24:27] I'll look into it as soon as I can but I'll be on board a plane for a couple hours, so if you have any idea for a quick fix... [23:27:05] * legoktm is back [23:27:34] [16:19:05] ValueError: /srv/mediawiki-staging/php-1.25wmf20/tests/parser/parserTests.txt has content before opening Apparently not. [23:28:46] (03CR) 10Tim Landscheidt: "Tested this in vitro, and then copied it to all webgrid nodes, restarted a webservice of mine, and looking at the proxies' Redis data and " [puppet] - 10https://gerrit.wikimedia.org/r/197439 (https://phabricator.wikimedia.org/T91954) (owner: 10Tim Landscheidt) [23:29:14] * Krenair kicks jenkins [23:29:47] Krenair, I just force-merge the submodule updates:P [23:29:55] I think I'm going to do that. [23:31:58] !log krenair Synchronized php-1.25wmf21/extensions/GeoData/GeoDataHooks.php: https://gerrit.wikimedia.org/r/#/c/197410/ (duration: 00m 06s) [23:31:59] MaxSem [23:32:02] Logged the message, Master [23:33:06] Krenair: Krenair is that section symbol fix out/done? [23:33:12] er Krinkle / Krenair [23:33:12] Yes. [23:33:18] thanks! [23:33:22] I almost reverted it several times. [23:33:23] But yes. [23:33:38] Krenair, uploads seem to work on mw.o, but I never knew how to repro that bug anyway [23:33:49] but at least it's not worse [23:33:58] Okay, that's fine [23:34:25] legoktm, ping [23:34:45] harr harr, but deletion is borked, as stated by tgr :P [23:35:48] legoktm, yt? [23:35:54] MaxSem, what? it's broken deletion? 
[23:36:22] Krinkle: what follow up do I need to make sure happens before the next branch is cut tomorrow morning (re the section glyph removal)? [23:36:27] don't think that this patch broke, it - it was on file upload hook [23:36:44] right [23:36:49] https://www.mediawiki.org/wiki/File:Omguploadtest.svg [23:37:09] possibly related to https://phabricator.wikimedia.org/T93009 [23:37:54] No legoktm? okay, kaldari? [23:39:08] kaldari, ... these patches all depend on a commit that's not up for deployment [23:39:16] ebernhardson? [23:39:34] Krenair: my internet connection isn't stable [23:40:24] Krenair: Can you add the first patch as well: https://gerrit.wikimedia.org/r/#/c/194354/ [23:40:46] (03CR) 10Kaldari: [C: 031] [WikiGrok] Add new suggestions to the actor campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194354 (owner: 10Phuedx) [23:41:16] kaldari, we doing these all at once or one at a time? [23:41:55] Krenair: all at once. They’re not actually dependent on each other. [23:42:08] (03CR) 10Alex Monk: [C: 032] [WikiGrok] Add new suggestions to the actor campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194354 (owner: 10Phuedx) [23:42:15] (03Merged) 10jenkins-bot: [WikiGrok] Add new suggestions to the actor campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194354 (owner: 10Phuedx) [23:42:26] (03CR) 10Alex Monk: [C: 032] [WikiGrok] Create 'film director' campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194373 (owner: 10Bmansurov) [23:42:33] (03Merged) 10jenkins-bot: [WikiGrok] Create 'film director' campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194373 (owner: 10Bmansurov) [23:42:40] Krenair: would have been nice if they had all just been in a single commit, but oh well :P [23:42:48] (03CR) 10Alex Monk: [C: 032] [WikiGrok] Create 'screenwriter' campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194378 (owner: 10Bmansurov) [23:42:48] Krenair: yup [23:42:55] (03Merged) 10jenkins-bot: 
[WikiGrok] Create 'screenwriter' campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194378 (owner: 10Bmansurov) [23:43:16] (03CR) 10Alex Monk: [C: 032] [WikiGrok] Add the "filmProducer" campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194503 (owner: 10Phuedx) [23:43:23] (03Merged) 10jenkins-bot: [WikiGrok] Add the "filmProducer" campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194503 (owner: 10Phuedx) [23:43:29] That everything kaldari? [23:43:54] Krenair: yep [23:45:35] !log krenair Synchronized wmf-config/mobile.php: 194354, 194373, 194378, 194503 (duration: 00m 05s) [23:45:37] kaldari, ^ [23:45:39] Logged the message, Master [23:46:24] Krenair: testing [23:48:33] Krenair: yeah...my connection is too unstable right now. Lets postpone it until tomorrow morning [23:51:27] (03CR) 10Alex Monk: "This broke things during the deployment today." [tools/scap] - 10https://gerrit.wikimedia.org/r/196306 (https://phabricator.wikimedia.org/T92534) (owner: 10Legoktm) [23:52:47] everything ok kaldari? [23:53:03] Krenair: almost done testing, sorry... [23:54:06] Krenair: looks good [23:54:25] ok, thanks kaldari [23:54:27] ebernhardson, hi [23:54:41] Krenair: hi [23:54:51] Your patch failed to merge [23:55:06] https://gerrit.wikimedia.org/r/#/c/197434/ [23:55:06] hmm, it should have been a simple bump. [23:55:33] sec i can make another one, not sure what went wrong with that one though. [23:56:43] Krenair: https://gerrit.wikimedia.org/r/197443 is new patch, mergable [23:56:58] doh, sec i'll just give it same commit it [23:56:58] id [23:57:46] Krenair: mergable now [23:58:49] This is not a good time for my shell on tin to freeze.