[00:01:43] ori: I could use your root super powers apparently. /srv/scap/scap/*.py? need to be chmod a+r [00:02:34] I was able to change some of them but others are owned by root or sam and won't let me fix the permissions [00:03:08] `dsh -g mediawiki-installation -M -F 80 -- 'cd /srv/scap/scap ; chmod a+r *.py?'` should do it [00:03:19] Or a slat command I suppose [00:03:22] *salt [00:04:13] * bd808|deploy will be writing the patch to change the scap deploy method tomorrow :/ [00:04:14] bd808|deploy: on it [00:06:58] bd808|deploy: try now? [00:07:50] ori: Just spot checking. I still see some 0660 perms on mw1125 [00:08:11] csteipp, does fundraising need to care about bug 63251; or can I wait till monday to deploy the newest 1.22 release? [00:08:17] (the bug is still hidden) [00:08:33] mwalker: It would only be a problem if you allow random users to edit the wiki [00:08:38] Which you don't, iirc [00:08:45] yep; no random editing users [00:09:04] ori: "-rw-rw---- 1 reedy wikidev 2869 Apr 24 13:39 config.pyc" [00:11:16] pyc != py [00:11:21] i'll fix [00:11:40] how did things get that way, anyhow? [00:13:05] bd808|deploy: done [00:13:46] ori: I had the `?` in the dsh command on purpose :) As to how... it looks like things were all/mostly a-rwx in there. [00:14:37] We had the file permissions all jacked up early on. There were some root interventions and that attempt to do weird things with the git permissions via puppet [00:15:26] can you try now? [00:15:33] ori: It's still not getting fixed everywhere. I'm seeing multiple 0660 files on mw1010 [00:17:55] ori: You can see them all with `dsh -g mediawiki-installation -M -F 80 -- 'cd /srv/scap/scap ; ls -l'|grep -- ---` [00:18:18] bd808|deploy: try now [00:19:10] ori: All fixed except snapshot1002, mw1020, mw1169 and mw1180 [00:20:29] ori: Frack. I think puppet is un-fixing things [00:20:39] The list is longer now. 
[00:22:08] Weirdly it seems to only be pyc files that are reverting permissions [00:22:31] bd808|deploy: because they're created by you when you execute scap [00:22:56] But I haven't executed scap again yet [00:23:27] They were chmod by you/me and now they are back to 0660 [00:23:39] but as long as it's only pyc things will work [00:23:50] * bd808|deploy will scap again [00:23:54] !log bd808 Started scap: no-op scap to validate I24149ab and Ie967901 (try 2) [00:24:00] Logged the message, Master [00:26:16] !log File permissions on /srv/scap/scap/*.{py,pyc} were not consistently a+r which is needed for scap-rebuild-cdbs [00:26:23] Logged the message, Master [00:28:40] bd808|deploy: is that the previous scap or the current? [00:28:53] ori: previous [00:28:56] !log bd808 Finished scap: no-op scap to validate I24149ab and Ie967901 (try 2) (duration: 05m 02s) [00:29:00] It's working now [00:29:02] Logged the message, Master [00:29:46] !log Ori was able to fix permissions and second scap test worked as expected [00:29:52] Logged the message, Master [00:31:51] greg-g: All done piddling [00:32:00] ori: Thanks for the help. [00:34:57] bd808: please wipe up after yourself [00:47:50] PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/ubuntu/project/trace/carbon.wikimedia.org is over 12 hours old. [00:49:50] RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/ubuntu/project/trace/carbon.wikimedia.org is over 0 hours old. [00:53:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:53:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:54:10] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [00:54:10] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [02:10:10] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3548 MB (3% inode=99%): [02:17:10] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3446 MB (3% inode=99%): [02:22:06] Bleh. [02:23:10] RECOVERY - Disk space on virt0 is OK: DISK OK [02:37:08] !log LocalisationUpdate completed (1.24wmf1) at 2014-04-25 02:37:06+00:00 [02:37:15] Logged the message, Master [03:03:35] !log LocalisationUpdate completed (1.24wmf2) at 2014-04-25 03:03:33+00:00 [03:03:40] Logged the message, Master [03:25:10] (03CR) 10BryanDavis: "Cherry-picked into beta and in use for several days with no issues." [operations/puppet] - 10https://gerrit.wikimedia.org/r/127399 (owner: 10BryanDavis) [03:44:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Apr 25 03:44:16 UTC 2014 (duration 44m 15s) [03:44:27] Logged the message, Master [04:23:51] PROBLEM - Disk space on lvs3004 is CRITICAL: DISK CRITICAL - free space: / 1688 MB (3% inode=97%): [04:33:51] RECOVERY - Disk space on lvs3004 is OK: DISK OK [05:12:00] (03CR) 10Giuseppe Lavagetto: [C: 032] gitblit: remove nrpe from role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129143 (owner: 10Matanya) [05:16:18] (03CR) 10Giuseppe Lavagetto: [C: 031] "The whole point of HSTS is to prevent people from accessing bugzilla without https; I consider this to be much more relevant than the rela" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127256 (owner: 10JanZerebecki) [05:28:30] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "It all seems pretty good to me, however please see the comment." 
(031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129075 (owner: 10Ori.livneh) [06:30:45] (03PS7) 10Giuseppe Lavagetto: Substituting the check_graphite script. [operations/puppet] - 10https://gerrit.wikimedia.org/r/125726 [06:32:56] (03PS1) 10Matanya: fundraising: remove nrpe from role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129636 [06:34:36] (03CR) 10Edenhill: [C: 031] "Looks good!" [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/127804 (owner: 10CSteipp) [06:39:16] (03PS1) 10Matanya: poolcounter: remove nrpe from role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129637 [06:41:11] <_joe|away> matanya: I'm off for the day but you could pack all those nrpe changes in one or two big ones [06:41:28] hi _joe|away [06:41:34] there is only one more left [06:41:51] thanks for the tip [06:45:07] (03PS1) 10Matanya: otrs: remove nrpe from role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129638 [07:22:26] !log up to 5x pt-table-sync running on db1048 m2 master for eventlogging migration. ok to kill if necessary [07:22:33] Logged the message, Master [07:58:39] (03PS1) 10Reedy: WIP: Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 [08:12:32] (03CR) 10Hashar: "Thank you Sam !" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 (owner: 10Reedy) [08:12:37] Reedy: lovely twemproxy [08:13:18] (03PS1) 10Reedy: Update MWRealm to drop pmtpa [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129642 [08:13:23] hashar: ^^ [08:13:28] :-] [08:13:49] Yay for Brad already having done most of the work! [08:14:25] (03CR) 10Hashar: [C: 031] "Sounds legit. 
Too afraid to merge/deploy it though :]" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129642 (owner: 10Reedy) [08:15:04] Reedy: I have never looked at twemproxy myself and the doc is "sparse" https://wikitech.wikimedia.org/wiki/Twemproxy :-D [08:22:45] (03PS1) 10Reedy: Variable twemproxy config location [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 [08:25:48] (03CR) 10Reedy: Variable twemproxy config location (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 (owner: 10Reedy) [08:39:18] (03CR) 10Dzahn: [C: 032] otrs: remove nrpe from role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129638 (owner: 10Matanya) [08:39:31] !log reedy updated /a/common to {{Gerrit|I57b6d055e}}: Update flow cache version to 4.2 [08:39:37] Logged the message, Master [08:39:38] (03PS1) 10Reedy: Remove noncirrus noc symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129645 [08:42:18] (03PS2) 10Reedy: WIP: Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 [08:42:32] (03CR) 10Reedy: [C: 032] Remove noncirrus noc symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129645 (owner: 10Reedy) [08:42:39] (03Merged) 10jenkins-bot: Remove noncirrus noc symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129645 (owner: 10Reedy) [08:43:16] !log reedy synchronized docroot and w [08:43:23] Logged the message, Master [08:43:37] (03CR) 10Dzahn: [C: 032] "all nodes including poolcounter include standard, standard includes base, base includes nrpe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129637 (owner: 10Matanya) [08:43:46] (03PS3) 10Reedy: WIP: Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 [08:45:52] (03PS4) 10Reedy: WIP: Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 [08:47:33] (03CR) 
10Dzahn: [C: 032] "this is the civicrm. already includes standard-noexim in this same role file. and that includes base which includes nrpe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129636 (owner: 10Matanya) [08:49:25] (03PS1) 10Reedy: Add comments to map memcached ips to hostnames [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129646 [08:53:53] (03CR) 10Alexandros Kosiaris: [C: 04-1] Variable twemproxy config location (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 (owner: 10Reedy) [08:57:09] (03CR) 10Reedy: Variable twemproxy config location (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 (owner: 10Reedy) [08:57:53] (03PS4) 10Dzahn: decom the decom script [operations/puppet] - 10https://gerrit.wikimedia.org/r/125986 [08:59:13] (03CR) 10Dzahn: [C: 032] remove tampa.tech.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/129383 (owner: 10Dzahn) [09:00:16] (03CR) 10Alexandros Kosiaris: Variable twemproxy config location (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 (owner: 10Reedy) [09:01:17] (03CR) 10Dzahn: [C: 032] decom the decom script [operations/puppet] - 10https://gerrit.wikimedia.org/r/125986 (owner: 10Dzahn) [09:02:40] (03PS2) 10Reedy: Variable twemproxy config location [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 [09:02:41] ehm.. we have a puppet error in base... 
looking [09:03:14] (03PS3) 10Reedy: Vary twemproxy config location based on getRealmSpecificFilename() [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 [09:10:14] (03PS1) 10Dzahn: remove remnants of the old decom script [operations/puppet] - 10https://gerrit.wikimedia.org/r/129648 [09:10:30] (03CR) 10Alexandros Kosiaris: [C: 032] Vary twemproxy config location based on getRealmSpecificFilename() [operations/puppet] - 10https://gerrit.wikimedia.org/r/129644 (owner: 10Reedy) [09:11:32] (03PS2) 10Reedy: Update MWRealm to drop pmtpa [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129642 [09:11:38] (03CR) 10Reedy: [C: 032] Update MWRealm to drop pmtpa [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129642 (owner: 10Reedy) [09:11:47] (03Merged) 10jenkins-bot: Update MWRealm to drop pmtpa [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129642 (owner: 10Reedy) [09:12:10] (03PS2) 10Reedy: Add comments to map memcached ips to hostnames [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129646 [09:12:13] (03CR) 10Reedy: [C: 032] Add comments to map memcached ips to hostnames [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129646 (owner: 10Reedy) [09:12:24] (03Merged) 10jenkins-bot: Add comments to map memcached ips to hostnames [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129646 (owner: 10Reedy) [09:13:21] !log reedy synchronized multiversion/ 'I4a68dc8321b7b302f5e89b5adafcff096f2ac35b' [09:13:27] Logged the message, Master [09:13:49] (03CR) 10Dzahn: [C: 032] remove remnants of the old decom script [operations/puppet] - 10https://gerrit.wikimedia.org/r/129648 (owner: 10Dzahn) [09:15:03] !log reedy synchronized wmf-config/ 'I4a68dc8321b7b302f5e89b5adafcff096f2ac35b' [09:15:09] Logged the message, Master [09:34:10] andre__: ping [09:40:43] (03PS5) 10Reedy: Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 
[09:40:49] (03PS6) 10Reedy: Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 [09:41:06] (03CR) 10Reedy: [C: 032] Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 (owner: 10Reedy) [09:41:13] (03Merged) 10jenkins-bot: Initial twemproxy configs for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129641 (owner: 10Reedy) [09:41:57] !log reedy synchronized wmf-config/ [09:42:03] Logged the message, Master [09:48:46] Analytics wants to reactivate metrics.wikimedia.org (just to redirect to a different page), but the site's certificate has not been renewed after heartbleed. [09:49:22] Can we use it nonetheless, should we request a new certificate, or just fallback to redirecting http only. [09:50:10] RECOVERY - MySQL Slave Delay on db1046 is OK: OK replication delay 138 seconds [09:50:34] (03PS1) 10Alexandros Kosiaris: Revert "Vary twemproxy config location based on getRealmSpecificFilename()" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129655 [09:50:40] RECOVERY - MySQL Replication Heartbeat on db1046 is OK: OK replication delay 58 seconds [09:51:24] qchris: could you mail the same thing to 7352@rt.wikimedia.org ? [09:51:37] mutante: Ok. Thanks. 
[09:52:03] qchris: because..i'm also not sure about those questions and seems good to have them processed there [09:52:06] Prinzessin# [09:52:30] Whoops :-) [09:52:30] eh :) [09:52:41] Time to change a passphrase :-/ [09:55:31] (03PS1) 10Reedy: Cannot use mw-deployment-vars.sh in dash/upstart [operations/puppet] - 10https://gerrit.wikimedia.org/r/129656 [10:09:40] PROBLEM - MySQL Replication Heartbeat on db1046 is CRITICAL: CRIT replication delay 315 seconds [10:10:10] PROBLEM - MySQL Slave Delay on db1046 is CRITICAL: CRIT replication delay 340 seconds [10:12:52] (03PS1) 10QChris: Redirect metrics.wikimedia.org to Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129660 [10:37:16] (03CR) 10Alexandros Kosiaris: [C: 032] "The patch did not work for a number of reasons:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129655 (owner: 10Alexandros Kosiaris) [10:41:20] (03CR) 10Nikerabbit: Add comments to map memcached ips to hostnames (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129646 (owner: 10Reedy) [10:44:22] (03PS1) 10Dzahn: icinga-wm/ircecho: do not set variables in node [operations/puppet] - 10https://gerrit.wikimedia.org/r/129662 [10:45:20] PROBLEM - MySQL Recent Restart on db1046 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:45:49] (03Abandoned) 10Reedy: Cannot use mw-deployment-vars.sh in dash/upstart [operations/puppet] - 10https://gerrit.wikimedia.org/r/129656 (owner: 10Reedy) [10:46:10] RECOVERY - MySQL Recent Restart on db1046 is OK: OK 12451419 seconds since restart [10:46:35] (03CR) 10Dzahn: [C: 032] icinga-wm/ircecho: do not set variables in node [operations/puppet] - 10https://gerrit.wikimedia.org/r/129662 (owner: 10Dzahn) [10:47:02] (03PS1) 10Reedy: Vary twemproxy config location based on getRealmSpecificFilename() (take 2) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129663 [10:54:25] (03PS1) 10Dzahn: icinga-wm/echoirc: add $::realm case for labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129664 [10:57:11] (03PS1) 10Odder: Add a new namespace to Hebrew Wikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129666 [11:04:30] (03PS1) 10Reedy: Update getRealmSpecificFilename comments [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129667 [11:04:39] (03CR) 10Dzahn: [C: 032] "trying to make it easier for beta:)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129664 (owner: 10Dzahn) [11:10:09] (03PS1) 10BBlack: Add cp3014 to esams mobile cache backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/129669 [11:12:02] (03CR) 10BBlack: [C: 032 V: 032] Add cp3014 to esams mobile cache backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/129669 (owner: 10BBlack) [11:14:22] (03Abandoned) 10BBlack: Add cp301[34] to mobile caches [operations/puppet] - 10https://gerrit.wikimedia.org/r/127456 (owner: 10QChris) [11:16:36] (03PS1) 10Dzahn: echoirc,wikistats,don't quote 'default' in switch [operations/puppet] - 10https://gerrit.wikimedia.org/r/129670 [11:19:22] (03CR) 10Dzahn: [C: 04-2] "should be replaced with "nfs-home.eqiad.wmnet" once/if we have that, but in all places at once and don't remove. so abandoning for now. 
nf" [operations/dns] - 10https://gerrit.wikimedia.org/r/125952 (owner: 10Dzahn) [11:19:50] (03Abandoned) 10Dzahn: remove syslog service IP (Tampa) [operations/dns] - 10https://gerrit.wikimedia.org/r/125952 (owner: 10Dzahn) [11:20:20] (03CR) 10Dzahn: [C: 032] echoirc,wikistats,don't quote 'default' in switch [operations/puppet] - 10https://gerrit.wikimedia.org/r/129670 (owner: 10Dzahn) [11:21:02] (03PS1) 10Odder: Move queries for bugs with ASSIGNED status [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/129671 [11:22:05] (03PS8) 10BBlack: Replace Linux RPS setting with a smarter script [operations/puppet] - 10https://gerrit.wikimedia.org/r/95963 (owner: 10Faidon Liambotis) [11:23:42] (03CR) 10BBlack: [C: 032 V: 032] Replace Linux RPS setting with a smarter script [operations/puppet] - 10https://gerrit.wikimedia.org/r/95963 (owner: 10Faidon Liambotis) [11:35:54] !log reedy synchronized docroot and w [11:38:07] (03CR) 10Alexandros Kosiaris: "Please note this adds a regression as far as puppet 3 goes. Now the variables have moved from a resolvable scope (node-level) to a non res" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129662 (owner: 10Dzahn) [11:42:25] (03PS2) 10Odder: Move queries for bugs with ASSIGNED status [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/129671 [11:43:46] (03PS3) 10Odder: Move queries for bugs with ASSIGNED status [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/129671 [11:45:43] morebots: are you slacking? [11:45:43] I am a logbot running on tools-exec-09. [11:45:43] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [11:45:43] To log a message, type !log . 
[11:45:57] (03PS2) 10Alexandros Kosiaris: show what's backed up in motd (rt #469) [operations/puppet] - 10https://gerrit.wikimedia.org/r/125971 (owner: 10ArielGlenn) [11:51:18] (03CR) 10Alexandros Kosiaris: [C: 032] show what's backed up in motd (rt #469) [operations/puppet] - 10https://gerrit.wikimedia.org/r/125971 (owner: 10ArielGlenn) [12:00:07] (03CR) 10Manybubbles: [C: 032 V: 032] "I'll deploy this today because it is low risk." [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/129484 (owner: 10Manybubbles) [12:00:42] (03PS1) 10Reedy: Partial revert memcached config for labs until twemproxy update is remerged [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129674 [12:01:38] (03CR) 10Reedy: [C: 032] Partial revert memcached config for labs until twemproxy update is remerged [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129674 (owner: 10Reedy) [12:01:45] (03Merged) 10jenkins-bot: Partial revert memcached config for labs until twemproxy update is remerged [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129674 (owner: 10Reedy) [12:02:51] !log reedy synchronized wmf-config/ [12:06:14] (03PS1) 10Odder: National Library of Scotland to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129675 [12:07:22] !log stopping puppet on analytics1026 to test more frequent runs of Camus [12:07:28] Logged the message, Master [12:16:39] (03PS1) 10Dzahn: turn ircecho into a parameterized class [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 [12:17:22] (03CR) 10Dzahn: "thanks, yea. 
started the parameterized class in Change-Id: Ic3bd7c826" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129662 (owner: 10Dzahn) [12:17:37] (03CR) 10jenkins-bot: [V: 04-1] turn ircecho into a parameterized class [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 (owner: 10Dzahn) [12:19:28] (03CR) 10Dzahn: turn ircecho into a parameterized class (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 (owner: 10Dzahn) [12:20:54] (03PS2) 10Dzahn: turn ircecho into a parameterized class [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 [12:21:12] (03CR) 10Hashar: "That fixed login on beta. Thank you!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129674 (owner: 10Reedy) [12:23:49] might not be a bad idea [12:25:50] (03CR) 10Dzahn: [C: 04-1] "eh, how about line 52 and 54 in templates/default.erb? how should they look?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 (owner: 10Dzahn) [12:28:03] !log Performing rolling restart of Cirrus's Elasticsearch servers to upgrade a plugin. Low risk because it won't be used by the general public until Mondayish so a Friday push should be ok. [12:28:09] Logged the message, Master [12:46:52] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 [12:52:56] (03PS3) 10Dzahn: turn ircecho into a parameterized class [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 [12:57:30] (03PS2) 10Tim Landscheidt: Tools: Install package libxml2-utils for xmllint [operations/puppet] - 10https://gerrit.wikimedia.org/r/120187 [12:58:12] !log upgrading db1047 (analytics slave) to mariadb 10 [12:58:18] Logged the message, Master [13:02:10] PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.110 [13:06:30] elastic1003 was down for the restart the moment it checked.... 
normally it doesn't catch that [13:10:23] (03PS1) 10Dzahn: remove unused IRC bot from openstack [operations/puppet] - 10https://gerrit.wikimedia.org/r/129681 [13:16:30] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.111 [13:17:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:17:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:18:10] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:18:10] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:18:44] greg-g: any chance for me to get a javascript patch merged and deployed today to fix a regression in 1.24wmf1? [13:19:42] greg-g: https://gerrit.wikimedia.org/r/129683 [13:20:53] greg-g: this can be worked around on-wiki, if necessary, but i don't really feel like adding the same code to a dozen scripts to avoid this (and i don't know how many wikis are affected, definitely plwiki and a few other Polish wikis which borrow our scripts) [13:20:57] (03CR) 10coren: [C: 032] "I'm not sure that was actually in use, but there certainly is no point to a bot that reports about gluster now." [operations/puppet] - 10https://gerrit.wikimedia.org/r/129681 (owner: 10Dzahn) [13:23:13] MatmaRex: for what it is worth the fix looks simple and safe to me [13:23:23] and it's awful early for greg [13:25:59] oh, yeah, it's barely 6 am in SF, right? [13:27:31] thank for looking :) [13:27:36] thanks* [13:27:37] 6:30. 
[13:27:39] yeah [13:30:51] (03PS1) 10Jgreen: enable bayes_auto_learn for iodine to help rebuild the db [operations/puppet] - 10https://gerrit.wikimedia.org/r/129685 [13:35:21] (03PS2) 10Jgreen: enable bayes_auto_learn for iodine to help rebuild the db [operations/puppet] - 10https://gerrit.wikimedia.org/r/129685 [13:36:47] (03CR) 10Ottomata: Redirect metrics.wikimedia.org to Wikimetrics (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129660 (owner: 10QChris) [13:38:11] (03CR) 10Jgreen: [C: 032 V: 031] enable bayes_auto_learn for iodine to help rebuild the db [operations/puppet] - 10https://gerrit.wikimedia.org/r/129685 (owner: 10Jgreen) [13:39:50] (03CR) 10Nuria: "Code looks good. As far as I can see it's executed in production and development right? I will be testing it for a bit." [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/129542 (owner: 10Milimetric) [13:47:30] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.113 [13:50:34] (03PS2) 10QChris: Redirect metrics.wikimedia.org to Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129660 [13:51:33] (03CR) 10QChris: [C: 04-1] Redirect metrics.wikimedia.org to Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129660 (owner: 10QChris) [13:52:32] (03CR) 10QChris: Redirect metrics.wikimedia.org to Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129660 (owner: 10QChris) [13:52:40] (03CR) 10QChris: Redirect metrics.wikimedia.org to Wikimetrics (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129660 (owner: 10QChris) [13:57:00] (03CR) 10Ottomata: [C: 032 V: 032] Redirect metrics.wikimedia.org to Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129660 (owner: 10QChris) [14:01:30] (03PS2) 10Hashar: contint: apply beta natfix on Jenkins slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/127213 [14:01:52] RECOVERY - 
ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [14:02:11] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.139 [14:10:16] (03PS3) 10Yuvipanda: contint: extract android SDK dependencies to a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [14:10:18] (03CR) 10jenkins-bot: [V: 04-1] contint: extract android SDK dependencies to a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [14:16:10] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.140 [14:18:37] (03PS1) 10Hashar: contint/beta: set natfix for the labs shared proxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/129687 [14:29:30] RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [14:29:30] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [14:29:39] (03CR) 10Hashar: [C: 031 V: 032] "deployed on contint puppetmaster and confirmed to fix the issue" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129687 (owner: 10Hashar) [14:29:51] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.141 [14:34:44] (03PS4) 10Dzahn: turn ircecho into a parameterized class [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 [14:35:56] (03PS4) 10Tim Landscheidt: contint: extract android SDK dependencies to a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [14:42:50] (03CR) 10Hashar: [C: 031] "This is fine and can be merged anytime as far as contint is concerned =)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [14:43:51] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.142 [14:44:21] !log puppet stopped on iodine, doing manual spamassassin training [14:44:26] Logged the message, Master [14:46:56] !log disabled icinga notifications for iodine too... [14:47:03] Logged the message, Master [14:49:28] (03PS1) 10Dzahn: rm old wikibugs - replaced by pywikibugs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129694 [14:50:23] hashar: ^? [14:50:30] who made pywikibugs again [14:50:48] merjin [14:51:06] valhallasw ported wikibugs to python :] [14:51:11] mutante: ^^ [14:51:36] added him as a reviewer [14:51:47] hashar: that also deletes from contint :) [14:51:49] thanks [14:52:06] i don't see it being applied on a production node [14:52:10] (the old class) [14:52:25] <^d> “we once had 3 bots using ircecho and we already deleted the openstack one (formerly labs-storage-wm)” [14:52:37] <^d> I think that's 4. 
[14:52:44] <^d> grrrit-wm used to be ircecho. [14:53:14] ah, true! that was already gone [14:53:48] counted icinga-wm, labs-storage-wm and wikibugs [14:54:16] (03CR) 10Tim Landscheidt: [C: 04-1] "err: /Stage[main]/Androidsdk::Dependencies/Package[libswt-gtk-3.5-java]/ensure: change from purged to present failed: Execution of '/usr/b" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [14:54:58] they all used ircecho, i made some changes to role/echoirc so you can use it on beta.. then akosiaris commented about puppet3 compat, so i tried to turn it into a parameterized class [14:55:21] then realized we don't use it anyways.. so delete [14:55:43] (03CR) 10Nuria: [C: 031] "Tested on vagrant and db creation proceeds as it should." [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/129542 (owner: 10Milimetric) [14:56:05] that being said, would be nice to puppetize pywikibugs? [14:56:17] <^d> It's a toollabs thing. [14:56:58] hmm, is puppetizing tools a goal ? (even if very long term) [14:57:33] <^d> I have no idea. I imagine not? [14:59:04] (03PS2) 10Dzahn: rm old wikibugs - replaced by pywikibugs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129694 [14:59:47] ^d: this morning, I documented [15:01:20] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Fri 25 Apr 2014 12:01:08 PM UTC [15:01:56] (03CR) 10Dzahn: "this becomes simpler with I1c7dd567ecf7e and I6817cd1e1c9" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129676 (owner: 10Dzahn) [15:06:10] <^d> manybubbles: I see :) [15:06:42] (03CR) 10Tim Landscheidt: [C: 04-1] "Maybe not needed in Tools:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125241 (owner: 10Yuvipanda) [15:08:17] ^d: i figured it was a good thing to do between a pairing session and keeping an eye on my rolling restart [15:09:49] (03PS1) 10Springle: New config for db1047 as MariaDB 10 all-shards analytics slave. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129697 [15:10:51] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.144 [15:12:17] ^d: maybe it makes sense to build the wikis in sorted order rather than random order [15:12:34] that way each job _should_ stay on its own wiki.... [15:12:46] <^d> I usually do them alphabetically. It really really shouldn't matter though. [15:13:10] (03PS2) 10Springle: New config for db1047 as MariaDB 10 all-shards analytics slave. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129697 [15:13:40] <^d> manybubbles: When do we want new highlighter turned on for testwikis? [15:13:50] monday [15:13:57] I scheduled it for the swat deploy [15:14:18] that, more redundancy, and the highlighter settings update [15:18:01] <^d> Mmmmmk [15:22:14] <^d> Oh, so I asked you yesterday but never got a reply. Do I *have* to do the central thing? [15:23:21] (03CR) 10Springle: [C: 032] New config for db1047 as MariaDB 10 all-shards analytics slave. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129697 (owner: 10Springle) [15:23:41] ^d: it'd be really odd not to [15:25:51] RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:25:52] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:25:52] RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:26:50] <^d> manybubbles: You said I had to rename everything to be in the org.wikimedia.* package? https://github.com/wikimedia/search-highlighter/tree/master/experimental-highlighter-elasticsearch-plugin/src/main/java/org doesn't... [15:27:10] <^d> Or were we just talking past each other, and you meant the groupID was the thing that had to change, not packages too [15:27:16] ^d: yeah, thats Elasticsearch's stupid fault though [15:27:25] (03PS2) 10Dzahn: WIP - put apache sync scripts into module [operations/puppet] - 10https://gerrit.wikimedia.org/r/129399 [15:27:28] groupid has to change, and packages _should_ match the groupid [15:27:45] in this case they can't because Elasticsearch is silly [15:27:53] I should send a pull request to fix that, actually [15:28:14] <^d> Ok, I'll try renaming all of this. [15:28:22] <^d> And I'll get my JIRA ticket in today. Already made an account. [15:28:39] good luck. Eclipse has a rename thing in the refactor menu you can do to packages [15:28:49] and it has a check box that tries to rename sub-packages [15:28:54] it worked pretty well for me [15:29:24] <^d> What's your groupID? org.wikimedia? [15:29:38] <^d> Or .something after that? [15:34:45] Oh, FFS. Hint for future RT duty newbies: Do *not* use the 'wikimedia default' dashboard. It hides most incoming tickets! [15:35:46] <^d> manybubbles: How's https://github.com/demon/elasticsearch-repository-swift/commit/eb9492596704395332e44ff45b71d990be482806 look? 
[15:36:20] RECOVERY - Host ps1-c3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 33.01 ms [15:36:20] RECOVERY - Host ps1-c2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 33.31 ms [15:36:20] RECOVERY - Host ps1-d1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 33.77 ms [15:36:20] RECOVERY - Host ps1-d3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 38.14 ms [15:36:20] RECOVERY - Host ps1-c1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 38.12 ms [15:36:28] RobH: ^^ [15:36:31] cmjohnson1: ^ too [15:36:31] :) [15:36:48] ^d: org.wikimedia.search.highlighter - it normally should be everything before the first . [15:36:51] RECOVERY - Host ps1-d2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 32.95 ms [15:37:19] <^d> Ah, I should use org.wikimedia.search.* then? [15:38:10] ^d: that makes sense. org org.wikimedia.elasticsearch (because it is for elasticsearch) [15:38:11] Coren: arr, yea, indeed. that was actually an attempt to make a dashboard for _non_ Ops people to make it less confusing, so only ops-requests there [15:38:29] I'm in .search because mine doesn't _require_ elasticsearch - it'll work with plain java or lucene, but it integrates with elasticsearch [15:39:02] <^d> Ahhhh, ok [15:39:05] <^d> I'll do that then [15:39:10] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:40:39] Coren: i'd suggest hitting Edit on your personal dashboard, then add the "Unowned Tickets" widget. or you can make a custom search,save as "saved search", and add a widget with that search [15:41:40] mutante: Yeah, that's what I'm gonna do. I'm just glad there weren't too many tickets I missed. 
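Note on the groupID/package exchange above: the rule stated there is that the groupID has to change and Java package names should sit under it. A minimal sketch of that check and of the prefix rename, assuming packages are plain dotted strings. The function names and the example package `org.elasticsearch.repositories.swift` are hypothetical and are not taken from either repository.

```python
def package_under_group(package: str, group_id: str) -> bool:
    """Return True if `package` equals `group_id` or is a sub-package of it."""
    return package == group_id or package.startswith(group_id + ".")


def rename_package(package: str, old_prefix: str, new_prefix: str) -> str:
    """Move `package` from `old_prefix` to `new_prefix`, keeping the remainder."""
    if not package_under_group(package, old_prefix):
        raise ValueError(f"{package!r} is not under {old_prefix!r}")
    # Only the leading prefix is swapped; sub-package names are untouched.
    return new_prefix + package[len(old_prefix):]


if __name__ == "__main__":
    # Hypothetical example: move a plugin package under org.wikimedia.elasticsearch.
    print(rename_package("org.elasticsearch.repositories.swift",
                         "org.elasticsearch",
                         "org.wikimedia.elasticsearch"))
```

The trailing-dot comparison matters: a bare `startswith(group_id)` would also accept an unrelated package such as `org.wikimediafoo`.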
[15:42:25] <^d> manybubbles: https://github.com/demon/elasticsearch-repository-swift/commit/236bf888d2efdea88b56f27a7040cb28d243c33f [15:42:36] Coren: we should share our dashboards so any user can use them.. gotta run though. cya later [15:52:30] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.12 [15:54:03] (03PS1) 10BryanDavis: Call scap-rebuild-cdbs after sync-common [operations/puppet] - 10https://gerrit.wikimedia.org/r/129706 [15:55:06] (03CR) 10jenkins-bot: [V: 04-1] Call scap-rebuild-cdbs after sync-common [operations/puppet] - 10https://gerrit.wikimedia.org/r/129706 (owner: 10BryanDavis) [16:00:06] (03PS2) 10BryanDavis: Call scap-rebuild-cdbs after sync-common [operations/puppet] - 10https://gerrit.wikimedia.org/r/129706 [16:00:37] (03PS1) 10John F. Lewis: Remove Wikidata as an importsource for testwikidatawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129707 [16:01:08] JohnLewis: :P Was about to push my change :P [16:01:21] hoo: I'm faster then :D [16:01:33] Do I need to create a dummy bug for that patch? [16:01:34] I was doing other stuff in between [16:02:05] hoo: My computer decided to freeze mid updating mediawiki-config :p [16:02:05] JohnLewis: You missed the most important bit [16:02:21] disallowing imports from testwikidata to wikidata [16:02:44] the other way round doesn't really matter, testwikidata is a broken place anyway :/ [16:03:01] JohnLewis: ^ [16:04:00] (03PS2) 10John F. Lewis: Remove Wikidata as an importsource for testwikidatawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129707 [16:04:38] urg [16:04:48] forgot to add the change to that commit >.> [16:04:55] :P [16:05:10] (03PS3) 10John F. Lewis: Remove Wikidata as an importsource for testwikidatawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129707 [16:05:15] 3rd time lucky :p [16:05:51] this one is good... 
if I wouldn't risk being shot for deploying on a Friday, I'd push that now [16:06:07] will schedule it for monday, though [16:06:10] urg Friday >.> [16:06:11] there's more stuff anyway [16:06:46] hoo: I'll schedule it for Monday - my patch :p Also - I don't need a dummy bug for that right? [16:07:03] JohnLewis: I'll take care of everything [16:07:08] MatmaRex: interesting. it's in all live versions... yeah, get manybubbles to review/+1 and yeah, we can get that out today [16:07:51] greg-g: thanks, i appreciate [16:07:56] hoo: I'm literally just about to shove it into Mondays SWAT :P I won't if you want to deal with it though :) [16:08:24] greg-g: was already +1'd, but i had to fix a silly mistake [16:08:35] oh, it is +1'd again :) [16:08:37] :) [16:08:39] I'm not yet sure... I can just push it into Monday's SWAT than it's no longer my issue (tm), or I'll just do it myself [16:08:51] PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.13 [16:09:02] MatmaRex: cool [16:10:22] (03PS1) 10Hoo man: Add two languages not supported by MediaWiki to wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129708 [16:11:39] (03CR) 10Hoo man: [C: 031] "@The SWAT deployer: This one is for technical reasons, no consensus or stuff like that needed." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129707 (owner: 10John F. Lewis) [16:11:55] JohnLewis: I'll just schedule both for the SWAT on monday [16:12:14] hoo: Fair enough :) [16:14:27] ok, 5 more minutes... wikitech blocked my IP for to many password attempts, nice [16:14:51] RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [16:15:11] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [16:15:11] RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [16:15:11] hoo: Seriously? :p [16:15:30] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6035: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [16:15:41] JohnLewis: I tried my production password a couple of times, then my old (pre-heartbleed) production password, then I remembered that this is labs [16:15:58] :p [16:16:31] hoo: Use your production password with 'wmflabs123456789' added at the end. 100% more secure :D [16:16:48] (for those not in this channel at this moment in time reading this chat ofc) [16:19:36] (03CR) 10John F. Lewis: [C: 04-1] "reinstate hoo's -1." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127443 (owner: 10John F. Lewis) [16:19:50] hoo ^ I just remembered that :p [16:22:54] !log Elasticsearch rolling restart complete. 
[16:23:00] Logged the message, Master [16:33:47] (03PS2) 10Alexandros Kosiaris: Vary twemproxy config location based on getRealmSpecificFilename() (take 2) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129663 (owner: 10Reedy) [16:34:09] <^d> gwicke: You about? [16:34:34] Reedy: ^ this works better IMHO [16:34:40] may I merge ? [16:36:25] akosiaris: btw [16:36:29] I have packages ready [16:36:34] for twemproxy [16:36:38] for more than a year now [16:36:51] aha... [16:36:53] and ? [16:36:55] proper packages, with an init script that spawns it with a /etc/nutcracker/... config file [16:37:13] which we'd symlink to whatever location we'd want [16:37:18] well.. what are we waiting for ? [16:37:29] upstream wasn't very helpful and I gave up at some point [16:37:36] but I intend to ressurect it for the trusty/hhvm project [16:39:21] not very helpful ? like what ? not accepting patches for the various small simple problems ? [16:39:31] yeah [16:39:39] meh [16:40:19] e.g. https://github.com/twitter/twemproxy/pull/123 [16:41:50] also, https://github.com/twitter/twemproxy/pull/121 [16:56:10] (03PS3) 10Ori.livneh: Call scap-rebuild-cdbs after sync-common [operations/puppet] - 10https://gerrit.wikimedia.org/r/129706 (owner: 10BryanDavis) [16:56:43] akosiaris: I just reopened 121 [16:56:52] they weren't convinced by my "don't embed libyaml" arguments last time [16:57:14] let's see if they'll be convinced now that I added the fact that libyaml has had two security vulnerabilities since and they haven't bothered to update [16:58:24] (03CR) 10Ori.livneh: [C: 032] Call scap-rebuild-cdbs after sync-common [operations/puppet] - 10https://gerrit.wikimedia.org/r/129706 (owner: 10BryanDavis) [16:59:02] paravoid: ahahah [16:59:09] let's see... [17:08:41] !log reenabled puppet and notifications for iodine [17:08:46] Logged the message, Master [17:10:43] for a moment I was like "we host an iodine instance, wtf?!"... 
then I remembered our naming scheme :P [17:11:12] hoo: btw, seeing you here reminds me -- sorry for the delay re: startup module patch, i will re-review asap [17:12:29] :) [17:13:33] (03CR) 10Ori.livneh: [C: 031] Move beta scap source directory off of NFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/127399 (owner: 10BryanDavis) [17:17:57] (03CR) 10Ori.livneh: [C: 031] "\o/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129694 (owner: 10Dzahn) [17:18:46] (03CR) 10CSteipp: "Should we start with a max-age of 604800 (7 days) instead of 6 months, just in case we hit something unexpected?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127256 (owner: 10JanZerebecki) [17:33:16] (03PS2) 10Ottomata: Add second test wiki database [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/129542 (owner: 10Milimetric) [17:35:22] (03PS3) 10Ottomata: Add second test wiki database [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/129542 (owner: 10Milimetric) [17:35:54] (03CR) 10Ottomata: "Nuria, Dan, can you test this? My vagrant isn't currently set up to test. I think this should work but there might be kinks." [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/129542 (owner: 10Milimetric) [17:41:43] (03PS1) 10Ottomata: Running camus every 10 minutes [operations/puppet] - 10https://gerrit.wikimedia.org/r/129713 [17:42:05] Hi [17:42:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:42:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:42:31] I need someone to check whether a password reset email got sent to a certain address [17:43:00] I can check a user_email for you, if needed [17:43:09] You are WMF staff with proper NDA, right? 
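Note on the max-age review comment above: 604800 is 7 days expressed in seconds. A small worked sketch of the conversion follows. Treating "6 months" as 180 days is an assumption made here for illustration; the review comment does not give an exact figure.

```python
from datetime import timedelta


def max_age_seconds(days: int) -> int:
    """Convert a lifetime in whole days to a max-age value in seconds."""
    return int(timedelta(days=days).total_seconds())


if __name__ == "__main__":
    print(max_age_seconds(7))    # the cautious starting value from the review
    print(max_age_seconds(180))  # "6 months", assuming 180 days
```

Starting with the shorter lifetime means a bad setting stops being cached by clients after a week rather than after half a year.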
[17:43:11] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [17:43:11] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [17:43:15] (03PS2) 10Ottomata: Running camus every 10 minutes [operations/puppet] - 10https://gerrit.wikimedia.org/r/129713 [17:43:47] hoo, it's complicated, but that's irrelevant because it involves OTRS which I only access as a volunteer [17:44:11] (03PS3) 10Ottomata: Running camus every 10 minutes [operations/puppet] - 10https://gerrit.wikimedia.org/r/129713 [17:44:28] PM me with details [17:44:29] (03CR) 10Ottomata: [C: 032 V: 032] Running camus every 10 minutes [operations/puppet] - 10https://gerrit.wikimedia.org/r/129713 (owner: 10Ottomata) [17:45:20] RECOVERY - Puppet freshness on analytics1026 is OK: puppet ran at Fri Apr 25 17:45:16 UTC 2014 [17:46:17] (03PS1) 10Dr0ptp4kt: Deter keeprefreshing.com noise. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129714 [17:51:03] (03CR) 10Dr0ptp4kt: "Please allow Christian time to verify this won't break analytics things." [operations/puppet] - 10https://gerrit.wikimedia.org/r/129714 (owner: 10Dr0ptp4kt) [18:02:27] (03CR) 10Giuseppe Lavagetto: [C: 031] Deter keeprefreshing.com noise. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129714 (owner: 10Dr0ptp4kt) [18:03:38] (03CR) 10Ori.livneh: "@Giuseppe: makes sense to me; I'll amend." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129075 (owner: 10Ori.livneh) [18:09:46] (03PS1) 10Ottomata: Adding diskstat percent_io_time and io_time to ganglia Kafka view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129718 [18:10:07] (03CR) 10Ottomata: [C: 032 V: 032] Adding diskstat percent_io_time and io_time to ganglia Kafka view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129718 (owner: 10Ottomata) [18:12:34] (03PS1) 10Jforrester: Enable Flow on mw:Talk:VisualEditor/Beta Features/Language [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129721 [18:21:21] (03CR) 10QChris: [C: 031] "That should kill >99.9% of the "keeprefreshing"-related noise we're seeing." [operations/puppet] - 10https://gerrit.wikimedia.org/r/129714 (owner: 10Dr0ptp4kt) [18:28:54] (03CR) 10Hoo man: "Isn't that thing hitting on the desktop site?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129714 (owner: 10Dr0ptp4kt) [18:32:01] (03CR) 10Ori.livneh: "@BBlack: in response to your IRC ping, yes, I did test this" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127131 (owner: 10Ori.livneh) [18:43:10] (03CR) 10QChris: "> Isn't that thing hitting on the desktop site?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129714 (owner: 10Dr0ptp4kt) [19:00:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:00:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[19:01:11] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:01:11] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:08:46] (03PS1) 10Andrew Bogott: Add the decom-user resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/129728 [19:08:48] (03PS1) 10Andrew Bogott: Decom users preilly and dsc [operations/puppet] - 10https://gerrit.wikimedia.org/r/129729 [19:26:54] (03CR) 10Rush: [C: 032] "I say let's give this a try, it's not going to explode anything!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125726 (owner: 10Giuseppe Lavagetto) [19:27:28] (03PS3) 10Andrew Bogott: Decom users preilly and dsc [operations/puppet] - 10https://gerrit.wikimedia.org/r/129729 [19:27:30] (03PS3) 10Andrew Bogott: Add the decom-user resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/129728 [19:27:31] my favorite thing I've said here so far [19:38:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:38:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[19:39:11] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:39:11] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:42:37] (03CR) 10Ottomata: [C: 032 V: 032] Add second test wiki database [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/129542 (owner: 10Milimetric) [19:44:42] (03PS1) 10Ottomata: Updating wikimetrics module [operations/puppet] - 10https://gerrit.wikimedia.org/r/129784 [19:45:12] (03PS2) 10Ottomata: Updating wikimetrics module [operations/puppet] - 10https://gerrit.wikimedia.org/r/129784 [19:45:18] (03CR) 10Ottomata: [C: 032 V: 032] Updating wikimetrics module [operations/puppet] - 10https://gerrit.wikimedia.org/r/129784 (owner: 10Ottomata) [19:45:41] (03PS1) 10Jforrester: Enable Flow on mw:Talk:Content translation [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129795 [20:00:28] greg-g: mwalker: so, what's your plan about https://gerrit.wikimedia.org/r/#/c/129683/ ? (i see you merged it, thanks) [20:02:15] MatmaRex, is that live on the cluster? I didn't see the offending bug having been deployed into 1.24wmf2? [20:02:24] (03CR) 10Hoo man: [C: 031] "In that case... this looks good" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129714 (owner: 10Dr0ptp4kt) [20:02:35] MatmaRex, I'm more than happy to deploy it on monday though [20:02:37] mwalker: the bug is live on wmf1, yes [20:03:09] it's live on enwiki? [20:03:23] * mwalker may not be used to our rapid iteration cycle yet [20:03:32] yes, but i haven't heard of it causing issues there; is it causing issues on plwiki [20:03:46] it is* [20:04:16] i can work around it locally if i really have to, but it would require fixing up a bunch of scripts in an ugly way [20:04:58] how bad is it? e.g. can it wait till monday; or are you pressing for a deploy for this today? 
[20:06:19] greg-g, if we deploy today I think it would be better to revert https://gerrit.wikimedia.org/r/#/c/124567/2 rather than deploy https://gerrit.wikimedia.org/r/#/c/129683/ [20:07:36] mwalker: it seems to have killed a bunch of widely used gadgets [20:07:57] mwalker: I'll take your advice on that. [20:08:05] I give you the go ahead to do that at will :) [20:08:17] reverting sounds reasonable to me too [20:08:23] greg-g, I just dont want to take the chance to break more things [20:08:29] we can revert the revert and push the fix on monday [20:08:33] yeah [20:08:37] +1 [20:08:39] if you can't deploy because friday, then that's understandable and i'll figure something out [20:08:49] but the core patch / WikiEditor revert would be a lot cleaner [20:08:51] we can fix things on friday, reverts preferably, so this fits [20:09:14] we don't want to get a in a situation of "well, gotta muscle forward until it's fixed" on a friday [20:11:36] greg-g / MatmaRex : one of you want to give me a +2 on https://gerrit.wikimedia.org/r/#/c/129807/ [20:11:45] then I'll cherry pick that to wmf1 and 2 and deploy it [20:12:28] mwalker: done [20:12:38] thanks [20:18:01] (03PS4) 10Andrew Bogott: Decom users preilly and dsc [operations/puppet] - 10https://gerrit.wikimedia.org/r/129729 [20:18:03] (03PS4) 10Andrew Bogott: Add the decom-user resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/129728 [20:18:19] (03CR) 10Hashar: "c[_] c[_] c[_] c[_]" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129694 (owner: 10Dzahn) [20:18:43] that is just because gerrit does not let me comment 🍺 [20:19:49] haha [20:20:52] (03PS1) 10BryanDavis: [WIP] Provision scap scripts using trebuchet [operations/puppet] - 10https://gerrit.wikimedia.org/r/129814 [20:21:38] hashar: :-) [20:23:52] * hashar pass some 🍩 to James_F  [20:24:17] there is a whole of them in a unicode block http://www.fileformat.info/info/unicode/block/miscellaneous_symbols_and_pictographs/list.htm [20:24:39] I am not 
sure how well they are supported though. Wikipedia has a bunch of redirect for them though [20:24:56] So you can http://en.wikipedia.org/wiki/🌂 [20:25:18] greg-g, so my plan is to deploy the core fix in monday's swat; and if that doesn't cause problems (it shouldn't but it is changing an edge case of an api...) then I'll revert the revert on tuesday [20:25:24] and all that has been added to the calendar [20:28:45] !log mwalker synchronized php-1.24wmf1/extensions/WikiEditor/ 'Reverting some faulty WikiEditor code for bug 64289' [20:28:52] Logged the message, Master [20:29:07] !log mwalker synchronized php-1.24wmf2/extensions/WikiEditor/ 'Reverting some faulty WikiEditor code for bug 64289' [20:29:14] Logged the message, Master [20:29:22] MatmaRex, ^ can you please test to see if things are working again? [20:31:38] mwalker: seems to have fixed the issue, thank you! [20:32:42] shiney [20:34:52] MatmaRex, if you haven't looked at the bug in the last 5 minutes; whilst you're still thinking about it TheDJ asked for an example page or two [20:35:28] greg-g, I'm done; we're back to the normal broken state [20:36:14] wait, we're fixed ie: broken? :) [20:37:12] heh; we fixed the new bug by reverting to old broken behaviour that was less broken than where we were 10 minutes ago [20:37:37] the ole 1 step forward 2 steps back thing [20:37:51] 'Breaking is the new fixing' -Wikimedia Operations [20:37:58] mwalker: seen it, will reply [20:38:15] mwalker: :) [20:42:48] (03CR) 10coren: [C: 032] "Seems legit." [operations/dns] - 10https://gerrit.wikimedia.org/r/127434 (owner: 10John F. Lewis) [20:46:43] (03CR) 10coren: [C: 032] Tools: Install package libxml2-utils for xmllint [operations/puppet] - 10https://gerrit.wikimedia.org/r/120187 (owner: 10Tim Landscheidt) [20:51:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:51:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:52:38] ? [20:53:11] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [20:53:11] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [20:54:50] -? [21:27:38] (03PS2) 10MarkTraceur: FUTURE: Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 [21:27:46] (03PS2) 10MarkTraceur: FUTURE: Fifth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125035 [21:29:48] (03PS3) 10MarkTraceur: FUTURE: Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 [21:34:52] (03PS3) 10MarkTraceur: FUTURE: Fifth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125035 [21:38:32] (03PS4) 10MarkTraceur: FUTURE: Fifth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125035 [21:38:34] (03PS1) 10MarkTraceur: FUTURE: Sixth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129828 [21:39:23] chasemp: your email about admins.pp doesn't mention ldap… is it still the intent that accounts will ultimately derive from there? [21:39:34] yes [21:39:58] but that is the plan but that is part 2 or 3? [21:40:28] I would like to port the existing to yaml directly and then when that's good in actual application, figure out how to duplicate our well-loved yaml from ldap [21:40:31] andrewbogott: ping [21:40:34] and then ldap is forever authoritative [21:40:41] yep, makes sense that it's a later stage, just wanted to make sure my uid-organizing work is not in vain. [21:40:47] preilly: what's up? [21:40:59] andrewbogott: did that change remove me from labs again? [21:41:02] preilly: oh, you got pinged by my patch earlier, huh? 
[21:41:10] yeah [21:41:12] Nope, not labs-related. [21:41:21] Okay cool thanks for the clarification [21:41:31] In theory it should not affect you at all -- you have a latent entry in puppet and I'm trying to clean them up. [21:41:52] I picked two users at random to be the first test cases, you were one of them :) [21:42:29] ha ha [21:42:37] that’s a pretty funny random pairing [21:43:00] oh, I suppose so :/ I didn't even think of it [21:43:46] andrewbogott: ha ha [21:45:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:26] andrewbogott: I left you off my list of peoples in athens from discussion....sorry [21:45:38] (03CR) 10Yuvipanda: [C: 031] Decom users preilly and dsc [operations/puppet] - 10https://gerrit.wikimedia.org/r/129729 (owner: 10Andrew Bogott) [21:45:44] chasemp: I was not offended :) [21:45:50] andrewbogott: nice pairing. [21:45:54] eek [21:46:13] um… YuviPanda, if you want to look at the patch that that one depends on… that's the actual interesting part :) [21:46:28] aha! I was wondering how that is done [21:46:37] andrewbogott: no module? tch [21:46:56] YuviPanda: 'cause chase is writing a new admins module right now, didn't want to accidentally create two [21:47:01] andrewbogott: ah [21:47:02] ok [21:48:00] andrewbogott: find / looks very expensive [21:48:07] andrewbogott: does that search shared storage too? [21:48:18] YuviPanda: oh, that won't run on labs. [21:48:47] On production… yeah, it's painful but probably necessary. [21:49:10] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [21:49:10] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [21:49:10] andrewbogott: would it run on every host? [21:49:26] only on hosts where that user had an account to begin with. 
[21:49:44] Or, at least, my intent is that it checks for /home/$user and only does the giant purge if homedir is found [21:49:55] did I get that part wrong? [21:50:18] no that seems right [21:50:32] I think problem is that perhaps I've no idea how prod is setup, and am talking from a labs-only knowledge :D [21:50:47] andrewbogott: but yes, this is a cool way to get rid of it, despite the find / being scary [21:55:02] (03PS5) 10Yuvipanda: contint: extract android SDK dependencies to a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [21:55:16] scfc_de: the swt thing wasn't really required [21:55:19] scfc_de: can you test this? [21:56:14] Can anyone rattle off a clever salt command to detect whether or not $username is logged in anywhere in the cluster? [21:56:23] (03PS1) 10Rush: Imported Upstream version 2.2.9 [operations/debs/ircd-ratbox] - 10https://gerrit.wikimedia.org/r/129832 [21:56:25] (03PS1) 10Rush: gerrit .gitreview file [operations/debs/ircd-ratbox] - 10https://gerrit.wikimedia.org/r/129833 [21:56:28] I'm not convinced I can trust 'who' [21:58:00] (03CR) 10Yuvipanda: "@tim: SWT wasn't needed so I have killed it. Also sorted the package names. Can you test this?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [21:58:19] YuviPanda: One moment, please. [21:58:24] scfc_de: woo! [22:00:06] (03PS1) 10Rush: debian packaging directory with our details [operations/debs/ircd-ratbox] - 10https://gerrit.wikimedia.org/r/129835 [22:10:13] scfc_de: any luck? [22:12:27] YuviPanda: Yes, works for me on toolsbeta-puppetmaster3. [22:12:47] scfc_de: ah, cool! [22:13:05] (03CR) 10Tim Landscheidt: [C: 031] "Works for me on toolsbeta-puppetmaster3." [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [22:13:26] andrewbogott: can you merge ^? 
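Note on the decom-user discussion above: the stated intent is to check for /home/$user first and only run the expensive full-filesystem scan when that home directory exists. A minimal Python sketch of that guard follows. It is not the Puppet resource from the patch; the function names, parameters, and the use of `os.walk` in place of `find /` are assumptions for illustration.

```python
import os


def files_owned_by(uid, root):
    """Yield every path under `root` whose owner is `uid` (the expensive part)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are judged by their own owner.
                if os.lstat(path).st_uid == uid:
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue


def decom_candidates(user, uid, home_root="/home", scan_root="/"):
    """Return paths owned by `uid`, but only scan if the user's homedir exists."""
    if not os.path.isdir(os.path.join(home_root, user)):
        # No home directory: the user never had an account on this host.
        return []
    return sorted(files_owned_by(uid, scan_root))
```

The guard is a cheap single `isdir` call, so hosts where the user never had an account skip the scan entirely.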
[22:20:20] YuviPanda: just a minute, intermittently afk [22:21:10] (03PS2) 10MarkTraceur: FUTURE: Sixth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129828 [22:24:30] PROBLEM - Host db1016 is DOWN: PING CRITICAL - Packet loss = 100% [22:25:06] shits going down [22:25:14] errm... totally the wrong channel [22:25:34] * Damianz notes the randomly accurate timing though [22:25:44] :) [22:26:30] (03PS6) 10Andrew Bogott: contint: extract android SDK dependencies to a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [22:30:12] (03CR) 10Andrew Bogott: [C: 032] contint: extract android SDK dependencies to a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/126000 (owner: 10Hashar) [22:32:18] YuviPanda: puppet still runs w/out errors on gallium. [22:32:42] andrewbogott: woot! [22:32:45] andrewbogott: ty! [22:49:04] (03CR) 10Giuseppe Lavagetto: "Great work, really clean overall. Some small notes :)" (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 (owner: 10Rush)