[00:00:09] mutante, yeah I just replied [00:00:24] thanks! [00:04:05] New patchset: Pyoungmeister; "migrating all pmtpa slaves to coredb-based roles" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43970 [00:07:40] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43970 [00:08:10] merging someone's scap script stuff [00:16:34] New patchset: Pyoungmeister; "migrating pmtpa otrs dbs to coredb-based role classes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43971 [00:18:00] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43971 [00:19:18] New patchset: Asher; "deploy sharded SqlBagOStuff across pc1-3 for parsercache" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43973 [00:19:54] AaronSchulz: can you review ^^ [00:30:55] New patchset: Reedy; "Update wikiversions-labs.dat to use same versions as production" [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/43974 [00:31:53] Change merged: Reedy; [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/43974 [00:32:41] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [00:32:42] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [00:32:42] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [00:32:42] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [00:32:42] New patchset: Pyoungmeister; "some cleanup of db boxes in site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43975 [00:33:38] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43975 [00:35:24] the old intern proxy stuff is in labs now right? [00:35:32] i have an old ticket about getting it hardware i wanna resolve. [00:35:39] Ryan_Lane: perhaps you know? ^ [00:37:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43961 [00:38:45] New patchset: Pyoungmeister; "db67 (researchdb) to coredb-based role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43976 [00:39:44] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours [00:39:50] New patchset: Ryan Lane; "More threads for l10n cache rebuild" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43977 [00:40:21] RobH: no [00:40:25] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43976 [00:40:26] RobH: we're killing internproxy [00:40:28] it's on stat1 [00:40:35] oh..... [00:40:47] well, i still can kill ticket then [00:40:51] so yay? ;P [00:40:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43977 [00:40:54] heh [00:40:54] thx for info =] [00:40:57] yw [00:41:32] yay puppet finally completed on sanger [00:41:36] take that ! 
[00:42:25] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43973 [00:43:45] !log asher synchronized wmf-config/CommonSettings.php 'deploying db parsercache, sharded and replicated' [00:43:55] Logged the message, Master [00:47:36] !log the production persistent parsercache is now on pc[1-3], replicated to pc100[1-3] [00:47:49] Logged the message, Master [00:48:43] New patchset: Pyoungmeister; "moving pmtpa s5 master to coredb-based role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43978 [00:49:30] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43978 [00:49:55] merging some l10n stuff [00:54:37] New patchset: Pyoungmeister; "moving all other pmtpa masters to coredb-based role classes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43980 [00:56:51] New patchset: Asher; "helper script for mha to ensure write location consistency during an online master switch." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43981 [00:57:18] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43980 [00:57:41] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43981 [01:05:03] New patchset: Pyoungmeister; "migrating pmtpa es2 shard to coredb-based role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43983 [01:08:00] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43983 [01:09:10] !log deleting atop.log.1* in /var/log on neon to free disk space [01:09:28] Logged the message, Master [01:09:53] RECOVERY - MySQL disk space on neon is OK: DISK OK [01:10:01] LeslieCarr: <-- neon ran out of disk .. atop logs use quite a bit of it [01:10:52] !log neon - changing logrotate config for atop to rotate 7 instead of 14 [01:11:20] cool [01:11:21] wow [01:11:40] oh well shit, the main partition is only 9G ? [01:11:49] why don't we resize that [01:12:36] Logged the message, Master [01:13:57] New patchset: Pyoungmeister; "migrating pmtpa es3 shard to coredb-based role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43984 [01:14:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 326 seconds [01:16:00] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43984 [01:16:20] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 14 seconds [01:17:17] Ryan_Lane: (or whoever knows) are we using the Perl-based git-deploy or sartoris on Wednesday? [01:17:28] perl [01:17:56] Ryan_Lane: is it going to deploy to all eqiad api/app/job servers on weds? [01:18:00] yes [01:18:06] woo! [01:18:09] and tampa [01:18:09] right? [01:18:09] mutante: hrm, can you help me out ? i haven't done lvm extending - so i am on neon but lvdisplay seems to only show the swap partition as a volume ? [01:18:15] or just eqiad? [01:18:26] right [01:18:28] it's already deployed in eqiad [01:18:28] and i know that can't be right [01:19:06] it's probably testable in eqiad right now, if anyone knows how to do so :) [01:19:13] Ryan_Lane: will there be a way to deploy livehacks in case of uber-emergency? if no, can we make one? [01:19:17] yes [01:19:22] just check in locally [01:19:44] directly on the deploy host? (tin?) 
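The LVM extension that LeslieCarr works through with Ryan_Lane and binasher in the exchange that follows (vgdisplay to check free space, lvcreate for a new volume, mount it, copy /var/log over) reduces to roughly this sequence. A sketch only: the volume group name neon and the 16G size come from the conversation, while the ext4 filesystem, the temporary mount point and the cp flags are assumptions added here.

    vgdisplay neon                      # check free extents in the volume group
    lvcreate -L 16G -n logs neon        # new logical volume for /var/log, as discussed below
    mkfs.ext4 /dev/neon/logs            # assumed filesystem type; not stated in the log
    mkdir -p /mnt/newlog
    mount /dev/neon/logs /mnt/newlog    # mount it somewhere temporary
    cp -a /var/log/. /mnt/newlog/       # -a preserves owners, groups and permissions
    umount /mnt/newlog
    mount /dev/neon/logs /var/log       # remount where it belongs (and add it to /etc/fstab)
    service rsyslog restart             # restart anything still writing to the old, now-hidden path

Copying without preserving ownership is what later bites exim and rsyslog on neon (the "Permission denied: euid=105" errors further down), which is why the -a matters here.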
[01:19:47] yes [01:20:17] we should probably have a warning for people logging in as root [01:20:22] or anyone else who is knowledgeable about extending lvm [01:20:24] like "What the fuck are you doing?" [01:20:31] LeslieCarr: huh? [01:20:48] LeslieCarr: which drive are you trying to extend? [01:21:06] it's possible that it's not really using lvm [01:21:12] neon's main partition [01:21:16] most systems don't have / set as lvm [01:21:20] man, that would be crazy if only swap was using lvm [01:21:23] it really isn't using lvm [01:21:26] New patchset: Aaron Schulz; "Set sqlbagostuff log." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43986 [01:21:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:21:36] noticed that / was only 10G [01:21:53] see what space is left on the vg [01:21:55] vgdisplay [01:21:59] make another volume [01:22:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43986 [01:22:09] mount whatever is eating a lot of space under the new one [01:22:10] and mount it somewhere [01:22:32] well, yeah, as binasher says, mount it somewhere, then move the data [01:22:39] then mount it where you want it to go [01:23:48] !log aaron synchronized wmf-config/InitialiseSettings.php 'Set sqlbagostuff log' [01:23:58] Logged the message, Master [01:24:35] so i'd do "lvcreate -L 16G -n logs neon " to create the volume (it's basically /var/logs/ taking up the space in the tiny tiny partition) [01:24:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.371 seconds [01:25:19] yep [01:25:25] wait [01:25:30] -n logs neon? [01:25:43] is neon the vg? [01:25:51] yeah, neon is the vg [01:25:57] ah ok. yes, then [01:27:45] back later [01:39:15] !log restarting neon [01:39:24] Logged the message, Mistress of the network gear. [01:42:27] binasher - are you still around ? [01:42:35] and would be able to help :) [01:43:04] so i remounted /var/log on a partition [01:43:10] however, it'snot having logs written to it [01:45:52] you need to restart rsyslog [01:46:00] thanks [01:46:01] and whatever else writes to /var/log, like apache [01:46:50] hrm, that doesn't seem to be doing it [01:47:12] is it possible processes could still be writing to the old /var/log path (which is now sort of "hidden") ? [01:47:34] lsof | grep var/log [01:47:42] not if they've been restarted [01:48:44] LeslieCarr: apache is writing to /var/log/apache2 on neon.. looks ok [01:49:53] hrm, though rsyslog has been restarted [01:50:11] ah [01:50:13] nm [01:50:22] icinga isn't running which should be spamming out the logs [01:51:30] yay [01:51:31] :) [01:51:39] LeslieCarr: 2013-01-15 01:50:57 1Tuvg1-00014M-Uv Cannot open main log file "/var/log/exim4/mainlog": Permission denied: euid=105 egid=109 [01:52:01] some owner/permissions may not have been preserved [01:52:13] or directories might have not been created [01:52:43] cool [01:53:10] in this case /var/log/exim4/mainlog is there but incorrect user, group, and perms [01:53:34] i'll fix that up [01:54:41] rsyslog can't write to most of the log files either [01:57:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:58:52] i'm going through and slowly fixing it [02:02:21] !log created a new lvm for /var/log on neon, copied files over, and restarted processes. 
some file permission issues and incorrect file owners may remain [02:02:32] Logged the message, Mistress of the network gear. [02:09:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds [02:16:05] !log rebooting neon [02:16:15] Logged the message, Mistress of the network gear. [02:26:38] !log LocalisationUpdate completed (1.21wmf7) at Tue Jan 15 02:26:37 UTC 2013 [02:26:41] PROBLEM - MySQL disk space on db78 is CRITICAL: DISK CRITICAL - free space: /a 117078 MB (3% inode=99%): [02:26:47] Logged the message, Master [02:42:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.907 seconds [03:27:23] RECOVERY - MySQL disk space on db78 is OK: DISK OK [03:29:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:26] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [03:43:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.063 seconds [04:17:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:20:29] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [04:28:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.711 seconds [04:50:49] !log set pc[1-3] to replicate from pc100[1-3], full master/master all the way across the sky [04:50:59] Logged the message, Master [05:05:24] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 196 seconds [05:06:00] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 193 seconds [05:07:12] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [05:07:39] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [05:30:11] New patchset: Asher; "adding acct pkg to base::standard-packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43991 [05:30:48] any opposition to adding acct to base::standard-packages? https://gerrit.wikimedia.org/r/#/c/43991/ [05:40:21] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 184 seconds [05:42:09] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [06:46:30] hey the Wikidata interwiki prefix is broken [07:19:37] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [07:41:39] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [08:33:39] !log gallium : installing liblua5.1-0-dev package. [08:33:50] Logged the message, Master [08:38:09] New patchset: Hashar; "(bug 43819) liblua5.1-0-dev on gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43999 [08:40:12] New review: Hashar; "I have installed the package manually in production. That did fix the related bug." 
[operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/43999 [08:59:52] New review: Hashar; "Per faidon :" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/43420 [09:10:45] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [09:10:45] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [09:16:11] New patchset: Nikerabbit; "ULS config update" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44004 [09:20:48] Change merged: Nikerabbit; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44004 [09:22:35] New patchset: Nikerabbit; "Oops, add missing comma" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44007 [09:22:47] Change merged: Nikerabbit; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44007 [09:27:18] New patchset: J; "install libjpeg-turbo-progs for rotate api" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44008 [09:29:39] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 207 seconds [09:29:48] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 213 seconds [09:31:30] New patchset: Nikerabbit; "Enable ULS and disable WebFonts/Narayam on Translate wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44009 [09:34:06] New review: Siebrand; "Disable Narayam for meta, too." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/44009 [09:34:18] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Tue Jan 15 09:34:01 UTC 2013 [09:35:32] New patchset: Nikerabbit; "Enable ULS and disable WebFonts/Narayam on Translate wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44009 [09:38:21] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [09:38:39] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [09:42:34] !log nikerabbit synchronized php-1.21wmf7/extensions/UniversalLanguageSelector/ 'ULS to master' [09:42:44] Logged the message, Master [10:00:50] Change merged: Nikerabbit; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44009 [10:03:02] !log nikerabbit synchronized wmf-config/CommonSettings.php 'Updated ULS configuration' [10:03:11] Logged the message, Master [10:04:06] !log nikerabbit synchronized wmf-config/InitialiseSettings.php 'Enable ULS and disable WebFonts/Narayam on Translate wikis' [10:04:21] Logged the message, Master [10:04:45] PROBLEM - Puppet freshness on db1048 is CRITICAL: Puppet has not run in the last 10 hours [10:05:39] PROBLEM - Puppet freshness on db1007 is CRITICAL: Puppet has not run in the last 10 hours [10:05:39] PROBLEM - Puppet freshness on db1028 is CRITICAL: Puppet has not run in the last 10 hours [10:05:40] PROBLEM - Puppet freshness on db1041 is CRITICAL: Puppet has not run in the last 10 hours [10:05:40] PROBLEM - Puppet freshness on db1043 is CRITICAL: Puppet has not run in the last 10 hours [10:06:42] PROBLEM - Puppet freshness on db1024 is CRITICAL: Puppet has not run in the last 10 hours [10:06:58] Change restored: Hashar; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36552 [10:07:00] New patchset: Hashar; "validating new Jenkns job (do not submit)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36552 [10:07:45] PROBLEM - Puppet 
freshness on db1006 is CRITICAL: Puppet has not run in the last 10 hours [10:07:45] PROBLEM - Puppet freshness on db1038 is CRITICAL: Puppet has not run in the last 10 hours [10:07:45] PROBLEM - Puppet freshness on db1049 is CRITICAL: Puppet has not run in the last 10 hours [10:08:02] poor puppet [10:11:39] PROBLEM - Puppet freshness on db1001 is CRITICAL: Puppet has not run in the last 10 hours [10:11:40] PROBLEM - Puppet freshness on db1034 is CRITICAL: Puppet has not run in the last 10 hours [10:11:40] PROBLEM - Puppet freshness on db1005 is CRITICAL: Puppet has not run in the last 10 hours [10:15:42] PROBLEM - Puppet freshness on db1042 is CRITICAL: Puppet has not run in the last 10 hours [10:15:42] PROBLEM - Puppet freshness on db1018 is CRITICAL: Puppet has not run in the last 10 hours [10:15:43] PROBLEM - Puppet freshness on db1036 is CRITICAL: Puppet has not run in the last 10 hours [10:17:16] !log nikerabbit synchronized php-1.21wmf7/includes/AutoLoader.php [10:17:25] Logged the message, Master [10:17:39] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours [10:18:02] hmm [10:18:02] !log nikerabbit synchronized php-1.21wmf7/includes/Preferences.php [10:18:12] Logged the message, Master [10:18:28] !log nikerabbit synchronized php-1.21wmf7/includes/HTMLForm.php [10:18:37] Logged the message, Master [10:19:26] git is doing sooo many I/O [10:19:45] PROBLEM - Puppet freshness on db1017 is CRITICAL: Puppet has not run in the last 10 hours [10:21:42] PROBLEM - Puppet freshness on db1020 is CRITICAL: Puppet has not run in the last 10 hours [10:21:42] PROBLEM - Puppet freshness on db1021 is CRITICAL: Puppet has not run in the last 10 hours [10:21:42] PROBLEM - Puppet freshness on db1003 is CRITICAL: Puppet has not run in the last 10 hours [10:24:42] PROBLEM - Puppet freshness on db1027 is CRITICAL: Puppet has not run in the last 10 hours [10:24:42] PROBLEM - Puppet freshness on db1010 is CRITICAL: Puppet has not run in the last 10 hours [10:25:43] PROBLEM - Puppet freshness on db1019 is CRITICAL: Puppet has not run in the last 10 hours [10:25:43] PROBLEM - Puppet freshness on db1046 is CRITICAL: Puppet has not run in the last 10 hours [10:26:06] !log nikerabbit synchronized php-1.21wmf7/extensions/Translate 'Translate to master' [10:26:16] Logged the message, Master [10:27:13] PROBLEM - Puppet freshness on db1050 is CRITICAL: Puppet has not run in the last 10 hours [10:27:14] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours [10:28:25] PROBLEM - Puppet freshness on db1011 is CRITICAL: Puppet has not run in the last 10 hours [10:28:26] PROBLEM - Puppet freshness on db1035 is CRITICAL: Puppet has not run in the last 10 hours [10:29:28] PROBLEM - Puppet freshness on db1002 is CRITICAL: Puppet has not run in the last 10 hours [10:29:28] PROBLEM - Puppet freshness on db1009 is CRITICAL: Puppet has not run in the last 10 hours [10:31:25] PROBLEM - Puppet freshness on db1039 is CRITICAL: Puppet has not run in the last 10 hours [10:31:25] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [10:33:31] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [10:33:31] PROBLEM - Puppet freshness on db1026 is CRITICAL: Puppet has not run in the last 10 hours [10:33:31] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [10:33:31] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 
10 hours [10:33:31] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [10:34:25] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [10:40:43] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [10:52:54] Swift is creating quite a lot of noise in the logs [10:53:07] hm? [10:53:10] what kind of noise? [10:54:59] Reedy: ^ [10:56:30] 401s [10:56:34] Invalid responses [10:56:42] can you copy one? [10:57:04] where is this? fluorine? [10:57:24] reedy@fenari:~$ tail -n 1000 /home/wikipedia/syslog/apache.log | grep -c -i swift [10:57:24] 294 [10:57:33] thanks. [10:57:40] http://p.defau.lt/?bzAKe7gIatXMyJtc_HMMkA [10:57:44] that's quite a lot indeed [10:57:46] I know there's usually some nearly all the time [10:57:52] Just seems there's more than usual [10:58:16] I think all but 3 of those lines are swift [11:06:49] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.089 second response time [11:09:49] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:11:28] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.053 second response time [11:19:06] it got better now [11:19:16] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 182 seconds [11:19:20] and I wasn't able to capture it via tcpdump [11:19:53] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 186 seconds [11:20:45] Aye [11:24:31] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds [11:25:08] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [11:29:44] paravoid: have you wrote the README file for the puppet wikimedia module ? :-D [11:56:06] New patchset: Nikerabbit; "Fix available Translate tasks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44047 [12:03:23] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44047 [12:09:37] !log nikerabbit synchronized wmf-config/CommonSettings.php 'Translate config fix' [12:09:47] Logged the message, Master [12:38:17] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [12:55:54] !log nikerabbit synchronized php-1.21wmf7/extensions/Translate/ 'Translate to master again' [12:56:03] Logged the message, Master [13:06:11] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 181 seconds [13:07:59] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [13:13:23] PROBLEM - Apache HTTP on mw43 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:15:02] RECOVERY - Apache HTTP on mw43 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.413 second response time [13:35:53] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [14:22:07] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [16:01:32] !log Made a mistake in Zuul configuration. A slight outage while correcting it :/ [16:01:45] Logged the message, Master [16:02:47] New patchset: Silke Meyer; "Puppet files to install Wikidata repo / client on different labs instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42786 [16:04:39] New review: Silke Meyer; "Good point. My files/templates now have puppet disclaimers, too." 
[operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/42786 [16:42:13] New patchset: Cmjohnson; "Changing virt1007 to labsdb1003" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44066 [16:42:45] Hi. Is this the place where I can ask to have my contributions transfered? [16:42:55] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44066 [16:48:59] PROBLEM - Packetloss_Average on oxygen is CRITICAL: CRITICAL: packet_loss_average is 9.53584724409 (gt 8.0) [16:49:14] XTSTech: From where? To where? [16:58:07] New patchset: Mark Bergsma; "Handle RADOSGW (Swift API) url rewriting for the basic case" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44067 [16:58:08] New patchset: Mark Bergsma; "Implement thumb 404 image scaler handling" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44068 [16:59:28] Reedy: From Wikipedia.. To.. Well, Wikipedia. [16:59:38] Either way, no [17:00:00] XTSTech: See http://en.wikipedia.org/wiki/Wikipedia:Changing_username [17:14:29] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 194 seconds [17:15:14] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 216 seconds [17:20:23] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [17:27:26] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds [17:28:38] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [17:28:42] New review: Silke Meyer; ":/ Oops. Aha, the xml dump must not start with a puppet comment. Sorry." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/42786 [17:31:31] New patchset: Mark Bergsma; "Add timeline, math, score rewrites" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44072 [17:43:20] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [17:56:24] New patchset: Mark Bergsma; "Remove double slashes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44076 [18:07:20] PROBLEM - Puppet freshness on mw40 is CRITICAL: Puppet has not run in the last 10 hours [18:07:52] New patchset: Mark Bergsma; "Support project/language prefixes for math" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44077 [18:07:53] New patchset: Mark Bergsma; "Set CORS header in vcl_deliver" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44078 [18:26:18] New patchset: Jgreen; "remove fundraising-analytics apache virtualhost from aluminium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44080 [18:26:46] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44080 [18:27:01] anomie: I'm having issues with l10n on beta [18:27:14] Fatal error: require(): Failed opening required '/srv/deployment/mediawiki/common/l10n-1.21wmf7/ExtensionMessages.php' [18:27:17] Ryan_Lane- In a meeting now, should be done in a few minutes [18:27:20] ok [18:28:56] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 190 seconds [18:29:05] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 195 seconds [18:36:12] !log stopping Apache on sanger [18:36:23] Logged the message, Master [18:37:47] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [18:38:14] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds 
[18:44:08] Ryan_Lane- Ok. It looks like in the old scap world we just manually copied that file from wmfX to wmfX+1. But it looks like we can easily enough add an if ( file_exists( ... ) ) around the inclusion of that file so it can be generated from scratch; I think the point of it is just to have the extension messages available on all wikis even if the extension itself isn't actually enabled. But let's ask Reedy in case I'm missing something. [18:44:41] the thing I don't understand is that this works fine on tin [18:45:10] When I was messing with tin, I put the file in place and forgot to revisit the issue. [18:45:15] oh [18:45:18] I see [18:45:31] I wish hashar was here [18:45:36] because there's another issue [18:45:40] with mwversionsinuse --withdb [18:46:46] what's the issue with mwversionsinuse? [18:46:59] laner@deployment-bastion:~$ mwversionsinuse --withdb [18:46:59] 1.21wmf7=aawikibooks 1.21wmf6=abwiki [18:47:15] check that vs tin [18:47:35] there's wikiversions-labs.dat and wikiversions.dat [18:47:38] no clue how that works [18:47:45] but it doesn't seem to update the cdb properly [18:48:30] also, the apache configuration isn't pointing to the correct place [18:48:42] I don't really know how hashar is doing that [18:48:51] The command to update the CDB should be checking /etc/wikimedia-realm, and using wikiversions-labs.dat if it contains "labs" and wikiversions.dat otherwise. [18:49:01] it doesn't work properly [18:49:15] I don't know anything about the apache config [18:50:36] when you push in the fix to l10n, please add it to: puppet [18:50:40] err s/:// [18:50:51] puppet/modules/deployment/files/git-deploy/dependencies/l10n [18:52:13] there's already a: if [ ! -f "$MW_COMMON/l10n-$mwVerNum/ExtensionMessages.php" ] [18:52:17] touch $MW_COMMON/l10n-$mwVerNum/ExtensionMessages.php [18:52:22] well, it was -d [18:54:30] Oh. Myabe I did revisit it then. Except that it should be -e instead of -d, stupid copy-paste. Is l10nupdate-quick bombing out before getting to that line? [18:56:23] ryan. need to re-establish ldap replication between sfo-aaa1 and sanger [18:56:37] how do you know it's broken? [18:56:42] 'cause I did it. [18:56:50] how did you do it? [18:56:51] Ryan_Lane: it is broken [18:57:06] delete all replication relationships in sfo-aaa1. [18:57:10] they are setting up new google accounts and they dont get mail [18:57:13] -_- [18:57:19] why would you do such a thing? [18:57:37] * Ryan_Lane sighs [18:57:38] accidently, while trying to add a new replication relationship with sfo-intranet1 in order to provide redundancy for ldap internally. [18:57:52] you don't need redundancy internally :( [18:58:00] that's the point of sanger [18:58:27] this is a *really* bad day for this... [18:58:27] well, ok. but it is necessary to re-create the relationship now. [18:58:31] :( [18:58:33] yes [18:58:59] sanger has a problem with its admin connector as well [18:58:59] good morning ops! [18:59:07] this isn't going to be simple to fix [18:59:13] good morning (evening?) hashar [18:59:21] you may need to wait until after we do the eqiad switchover [18:59:21] LeslieCarr: evening indeed. [18:59:42] LeslieCarr: someone contacted us earlier to get a RIPE record updated. I asked him to mail network at rt.wikimedia.org [18:59:48] no idea if that mail works though [18:59:55] oh [18:59:57] i have no idea [19:00:07] noc@wikimedia.org is a good address [19:00:13] Yossie: if http://www.blacksteel.com/ is your site congrats :-] [19:00:16] what was the relevant info ? [19:00:25] hashar, thanks, it is. 
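Pieced together from the fragments quoted above, the guard anomie describes for l10nupdate-quick (test with -e rather than -d or -f) and the realm check behind the wikiversions selection look roughly like this. A sketch, not the actual scripts: $MW_COMMON and $mwVerNum are the variables quoted in the log, and the grep-based test of /etc/wikimedia-realm is an assumption based on anomie's description.

    # create an empty placeholder so the require() of ExtensionMessages.php
    # doesn't fatal before the real file has been generated
    if [ ! -e "$MW_COMMON/l10n-$mwVerNum/ExtensionMessages.php" ]; then
        touch "$MW_COMMON/l10n-$mwVerNum/ExtensionMessages.php"
    fi

    # pick the wikiversions file per realm, as described above
    if grep -q labs /etc/wikimedia-realm 2>/dev/null; then
        wikiversions=wikiversions-labs.dat
    else
        wikiversions=wikiversions.dat
    fi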
[19:00:26] greping my browser history right now [19:00:39] we tried to temporarily add aliases to mchenry to forward USER: USER@corp.wikimedia.org - shouldn't that have worked? [19:00:41] Yossie: I love the 1990's look'n feel. Goood olddays [19:01:45] i had totally forgotten about all the codes [19:01:52] :) [19:01:58] Yossie: no [19:02:18] Yossie: the point of the replication is so that our email system knows about the ldap entries [19:02:39] either way, you're kind of out of luck until we finish the switchover [19:02:40] LeslieCarr: so the request was to change the maintainer for RT744-RIPE (that is River Tarnell, who used to be a contractor for WMDE iirc) [19:02:52] oh god yes need to update that [19:03:01] LeslieCarr: it is still maintained by Wikimedia. He asked for another maintainer and of course I can't find it :-] [19:03:07] ok, i need to get my ripe updating to not be sucky [19:03:09] ryan: I undestand that. so that it can forward emails to folks in @corp.wikimedia.org to the mx for that domain which is google apps. I would think adding aliases would do that, albeit manually.. [19:03:26] I'm not going to break our email system the day before we switch datacenters [19:03:39] Ryan_Lane: that's next week [19:03:43] is it? [19:03:48] hashar, Leslie: network@rt per mail works [19:03:54] well tomorrow is the deployment system switch [19:04:01] the dc switch is next week [19:04:06] i know because i also have jury duty [19:04:08] should email about that [19:04:09] I was wondering why we were doing both at the same time :D [19:04:15] mutante: nice. Thanks for the confirmation :-] [19:04:25] LeslieCarr: anyway River's nickname was felicity [19:04:34] either way, I'm working on something that's happening tomorrow [19:04:41] Ryan_Lane- Found the problem with mwversionsinuse: Ic64aa75a broke it. I'll commit a fix in a minute. [19:04:53] and this is likely to eat most of my day [19:05:06] after the deployment system switchover I'll help [19:05:46] anomie: thanks :) [19:06:09] ryan: thanks. appreciated.. [19:08:38] Yossie: it's possible if you have all the correct info that you could set it back up yourself [19:08:47] absoluately. [19:08:48] New patchset: Anomie; "Fix mwversionsinuse on labs" [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/44082 [19:08:56] to be honest, I don't know it [19:09:01] the password, that is [19:09:16] everything except the hostname and password should be default, though [19:09:23] so, if you know the two of those it should work [19:10:00] !g 44082 |Ryan_Lane [19:10:00] Ryan_Lane: https://gerrit.wikimedia.org/r/#q,44082,n,z [19:10:16] need admin DN / password and, I think cn=admin as well - to set up replication. [19:10:44] they are all the same [19:10:48] Yossie: i think i have that here if you want to come on over ? [19:10:54] assuming fenari is accurate [19:11:07] LeslieCarr: it probably isn't [19:11:18] then nobody knows [19:11:39] which file? [19:12:00] ugh. I surely hope it isn't that, though it could be [19:12:01] :D [19:12:22] yeah [19:12:26] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [19:12:26] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [19:12:45] LeslieCarr: felicity is online :-] [19:12:56] LeslieCarr: he is going to /msg you about his ripe record. [19:13:04] ok [19:13:11] cool [19:13:21] bonjour felicity ! [19:13:26] \o/ [19:13:30] hashar told me to ask here :-) can someone please update RT744-RIPE to be mnt-bt TORCHBOX-MNT? 
(my new employer?) [19:13:35] hi leslie ;-) [19:13:44] it's great to finally meet you [19:13:47] felicity: this is LeslieCarr one of the WMF network engineer [19:14:15] LeslieCarr: this is felicity, who used to want to rewrite mediawiki Haskell (or was it java) ? :-] [19:14:25] anyway, a long time volunteer from the old school era [19:14:27] python! [19:14:37] although i actually hate python, it seemed like the most sensible option [19:14:42] :) [19:14:44] we have more and more python stuff on the cluster nowadays [19:15:23] !log authdns-update [19:15:24] how many servers now? i think when i started we had 7 ;) [19:15:35] Logged the message, RobH [19:16:08] wow, i don't have an exact count but i think something closer to 600 ? [19:17:32] over [19:17:38] 750+. [19:17:50] i think is what we had in last fundraiser banner. [19:18:29] sbernardin: Are you on site? [19:18:35] colby.mgmt isnt functioning. [19:18:40] and i need it online for an install [19:19:15] !g 44082 |hashar, review this please? Ryan seems busy. [19:19:16] hashar, review this please? Ryan seems busy.: https://gerrit.wikimedia.org/r/#q,44082,n,z [19:19:39] Change merged: Ryan Lane; [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/44082 [19:19:42] hehe [19:19:44] :) [19:20:14] anomie: good catch! [19:20:19] !g Ic64aa75a [19:20:19] https://gerrit.wikimedia.org/r/#q,Ic64aa75a,n,z [19:20:48] New review: Andrew Bogott; "> the xml dump must not start with a puppet comment" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/42786 [19:21:49] Ryan_Lane- Looks like the l10n git-deploy script should work without any additional fix. And after running that, ExtensionMessages.php will exist. [19:21:50] did someone already pull that change on beta? [19:21:56] I just did, yeah [19:22:00] without git-deploy? :) [19:22:04] oops. [19:22:07] tsk tsk ;) [19:22:40] so. here's how to fix that [19:22:46] * anomie was just about to ask how to fix that [19:22:47] I'm going to do it and I'll walk through the steps [19:22:51] git tag [19:22:58] find the last deployed tag [19:23:24] in this case I had already done git deploy start and found out that it was already pulled [19:23:31] how do you know which tags are deployed? [19:23:33] so, it's not the very last one, but the one before that [19:23:45] in reality anything before it is fine [19:24:02] we just need to have git deploy think we're in another state [19:24:09] git checkout common-20130115-010158 [19:24:29] hm [19:24:38] I say that, but this obviously isn't working [19:27:01] Ryan_Lane: no luck on the passwords [19:27:08] tried the one on fenari and all the ones in the private repo [19:30:41] New patchset: Anomie; "Fix test in l10nupdate-quick" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44084 [19:30:57] hmm, actually this is also probably a good time to ask if someone could change river@wikimedia.org to forward to /dev/null [19:31:02] as it pretty much only gets spam nowadays [19:31:51] LeslieCarr: ok. I think I have it [19:31:55] unless they changed it [19:32:02] ok [19:32:11] yes, we can do that witht he alias [19:32:19] though if you want to come back and do more work…. 
;) [19:32:26] we'd be happy to have you [19:32:32] i've heard good things [19:33:05] i actually wouldn't mind that, but i've been a bit starved for time recently [19:33:44] :) [19:34:13] felicity: let me check that for you..see query [19:36:56] RoanKattouw: So now eqiad has 3 parsoid nodes (temp, as the high performacne ones wont arrive until 22nd) [19:37:06] tampa has 5 but they are horribly underutlizied [19:37:09] RobH: Oh cool [19:37:10] so figured 3 in eqiad is good [19:37:17] Yeah the utilization isn't exactly crushing right now [19:37:18] but i have no caching servers allocated, we need for eqiad as well yes? [19:37:23] Yup [19:37:28] two seems reasonable [19:37:44] Supposedly VisualEditor is gonna be the default editor for Wikipedias in June [19:37:53] so once that happens the Parsoid utilization should go up a bit :) [19:38:05] all of the worker nodes will be replaced with higher performance dual cpu nodes by then [19:38:11] Good [19:38:15] just tyring to ensure we are good for next week swapver [19:38:25] Oh, right, of course [19:38:28] i'll spin up two servers to act as caching proxies today [19:38:39] will ping you if i need help on anything, figured it out well enough yesterday [19:38:58] As long as eqiad has the same setup, it should be fine. I'll have to test it, and configure MW to use it when running out of eqiad [19:39:21] yep, will let you know when OS is up so you can do your thing (well os and initial puppet run) [19:39:31] OK good [19:39:34] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [19:40:10] anomie: Hey we have a mechanism for setting a config variable to a different value based on whether we're in pmtpa or eqiad, right? How does that work? [19:41:16] RobH: Also, someone said something about paging and I would have to give some information to Leslie? [19:41:19] RoanKattouw- $wmfDatacenter is set to "pmtpa" in pmtpa and "eqiad" in eqiad. It works just like $wmfRealm for doing production/labs differences. [19:41:31] ah yeah [19:41:42] you guys are going to be paged about breakages ? [19:42:00] I should be paged for Parsoid LVS, yeah [19:42:13] open a ticket ( you can put the information in an email or non ticket) - i'll need email, phone, working hours if you're doing a more shifty type of thing [19:42:15] RoanKattouw: LeslieCarr #4318: add Roan to Nagios monitoring for Parsoid boxes [19:42:21] oh [19:42:22] hehe [19:42:38] oh and cell phone service provider [19:42:38] Oh, there's a ticket? [19:42:40] (for the gateway) [19:42:54] I'll put it in [19:42:59] yes, i created it after we talked the other day Roan [19:43:05] OK [19:43:08] Thanks [19:43:10] np [19:44:34] anomie: that did indeed fix it for beta [19:44:36] anomie: thanks [19:44:50] RoanKattouw: is there a simple web based thing we can check on visual editor ? [19:44:55] for watchmouse (external monitoring) [19:44:57] anomie: so, easiest way to fix the git pull issue is to reset to an older change [19:45:04] New patchset: Anomie; "Fix comment on getRealmSpecificFilename" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44085 [19:45:06] anomie: then to do git deploy start [19:45:07] RoanKattouw: yea, we can do Nagios and Watchmouse seperately [19:45:08] LeslieCarr: For Parsoid? Yes, the Nagios checks already do that [19:45:13] then switch back to the newest change [19:45:21] <^demon> How do you disable puppet on a host? 
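The recovery Ryan_Lane and anomie settle on above, for a repository on the deploy host that was updated with a plain git pull instead of through git-deploy, comes down to two commands. A sketch of what was quoted in the conversation; the repository path is only an example of where the hand-pulled checkout might live.

    cd /srv/deployment/mediawiki/common   # example path: whichever repo was pulled by hand
    git deploy start                      # open a deployment
    git deploy --force sync               # force the sync so the deployed state matches the working tree

The longer route mentioned first (git tag to list tags, git checkout an earlier deployed tag such as common-20130115-010158, then git deploy start and switch back) also works, but as noted above it isn't actually necessary.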
[19:45:22] <^demon> (in labs, fwiw) [19:45:27] The Nagios check is literally "send an HTTP request to / on the Parsoid box" [19:45:44] For Watchmouse, I suppose you'd want http://celsus.wikimedia.org/en/Main_Page [19:45:45] ^demon: puppet agent --disable [19:45:47] are those publically ip'ed though ? [19:45:47] ok [19:45:50] Ryan_Lane- Does it matter which previous change, or will HEAD^1 work? [19:45:55] so the not currently available [19:45:55] hehe [19:45:56] (where celsus is the name of the Varnish proxy in the appropriate location) [19:45:58] <^demon> mutante: ty. [19:46:00] anomie: that'll work [19:46:05] And sadly yes, the Parsoid Varnishes are publicly IPed [19:46:21] I would have slightly preferred them not to be, but it's fine [19:46:22] anomie: though that isn't even actually necessary [19:46:36] anomie: you could also do a git deploy --force sync [19:46:37] And once the firehose is unleashed on them, it's moot I suppose [19:47:08] Ryan_Lane- So the easiest fix is "git deploy start; git deploy --force sync"? [19:47:12] yes [19:47:12] ok, we'll make a separate ticket [19:47:33] LeslieCarr: I just put my phone info in the t icket [19:48:51] RoanKattouw: I dunno about the paging stuff sorry [19:48:58] about adding you that is, seems they do though [19:49:36] RobH: Already handled, sorry [19:54:16] RECOVERY - MySQL Slave Delay on db1043 is OK: OK replication delay 0 seconds [19:54:44] !log authdns-update [19:54:54] Logged the message, RobH [19:59:58] what's a "badboy key" :) [20:00:18] related to monitoring somehow, watchmouse would let me add them to user accounts [20:00:56] http://www.badboysoftware.biz/docs/keyinput.htm hmm..ok [20:05:31] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.021 second response time on port 11000 [20:05:40] PROBLEM - Puppet freshness on db1048 is CRITICAL: Puppet has not run in the last 10 hours [20:06:43] PROBLEM - Puppet freshness on db1007 is CRITICAL: Puppet has not run in the last 10 hours [20:06:43] PROBLEM - Puppet freshness on db1041 is CRITICAL: Puppet has not run in the last 10 hours [20:06:43] PROBLEM - Puppet freshness on db1028 is CRITICAL: Puppet has not run in the last 10 hours [20:06:43] PROBLEM - Puppet freshness on db1043 is CRITICAL: Puppet has not run in the last 10 hours [20:07:40] New patchset: Jgreen; "remove faulkner database from fundraisingdb dumps" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44086 [20:07:46] PROBLEM - Puppet freshness on db1024 is CRITICAL: Puppet has not run in the last 10 hours [20:08:23] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is 3.71913785714 [20:08:24] db1043? 
there's certainly more of those than i remember [20:08:40] PROBLEM - Puppet freshness on db1038 is CRITICAL: Puppet has not run in the last 10 hours [20:08:40] PROBLEM - Puppet freshness on db1049 is CRITICAL: Puppet has not run in the last 10 hours [20:08:41] PROBLEM - Puppet freshness on db1006 is CRITICAL: Puppet has not run in the last 10 hours [20:12:43] PROBLEM - Puppet freshness on db1001 is CRITICAL: Puppet has not run in the last 10 hours [20:12:43] PROBLEM - Puppet freshness on db1034 is CRITICAL: Puppet has not run in the last 10 hours [20:12:44] PROBLEM - Puppet freshness on db1005 is CRITICAL: Puppet has not run in the last 10 hours [20:13:14] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44086 [20:15:58] uuuuhhhh [20:15:59] huh [20:16:46] PROBLEM - Puppet freshness on db1036 is CRITICAL: Puppet has not run in the last 10 hours [20:16:46] PROBLEM - Puppet freshness on db1018 is CRITICAL: Puppet has not run in the last 10 hours [20:16:47] PROBLEM - Puppet freshness on db1042 is CRITICAL: Puppet has not run in the last 10 hours [20:18:43] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours [20:20:40] PROBLEM - Puppet freshness on db1017 is CRITICAL: Puppet has not run in the last 10 hours [20:20:50] !log stopping squid3 on sq48 [20:20:55] (yes, squid3) [20:21:00] Logged the message, Master [20:22:19] PROBLEM - Backend Squid HTTP on sq48 is CRITICAL: Connection refused [20:22:46] PROBLEM - Puppet freshness on db1003 is CRITICAL: Puppet has not run in the last 10 hours [20:22:46] PROBLEM - Puppet freshness on db1020 is CRITICAL: Puppet has not run in the last 10 hours [20:22:47] PROBLEM - Puppet freshness on db1021 is CRITICAL: Puppet has not run in the last 10 hours [20:24:52] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 3586000 seconds [20:25:46] PROBLEM - Puppet freshness on db1010 is CRITICAL: Puppet has not run in the last 10 hours [20:25:47] PROBLEM - Puppet freshness on db1027 is CRITICAL: Puppet has not run in the last 10 hours [20:26:40] PROBLEM - Puppet freshness on db1019 is CRITICAL: Puppet has not run in the last 10 hours [20:26:40] PROBLEM - Puppet freshness on db1046 is CRITICAL: Puppet has not run in the last 10 hours [20:27:38] New patchset: Raimond Spekking; "Change SUL icon for Wikivoyage to the current logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44091 [20:28:46] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours [20:28:47] PROBLEM - Puppet freshness on db1050 is CRITICAL: Puppet has not run in the last 10 hours [20:29:40] PROBLEM - Puppet freshness on db1035 is CRITICAL: Puppet has not run in the last 10 hours [20:29:40] PROBLEM - Puppet freshness on db1011 is CRITICAL: Puppet has not run in the last 10 hours [20:30:43] PROBLEM - Puppet freshness on db1002 is CRITICAL: Puppet has not run in the last 10 hours [20:30:43] PROBLEM - Puppet freshness on db1009 is CRITICAL: Puppet has not run in the last 10 hours [20:32:44] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [20:32:45] PROBLEM - Puppet freshness on db1039 is CRITICAL: Puppet has not run in the last 10 hours [20:34:59] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [20:34:59] PROBLEM - Puppet freshness on db1026 is CRITICAL: Puppet has not run in the last 10 hours [20:35:00] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in 
the last 10 hours [20:35:00] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [20:35:00] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [20:36:02] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [20:38:08] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 193 seconds [20:38:35] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 211 seconds [20:48:38] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [20:49:23] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [20:51:29] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 190 seconds [20:51:47] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 195 seconds [20:52:14] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 212 seconds [20:52:23] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 222 seconds [20:53:02] hm. why does the l10nupdate script work when I run it manually and not when called from the sync script? [20:56:04] Ryan_Lane- Is it being passed something odd for $1? [20:56:29] it should be passing the slot [20:56:51] I modified the script to take slots and turn them into version numbers [20:57:49] ah [20:57:51] slot0 is working [20:57:54] slot1 is failing [20:58:23] A copy of your installation's LocalSettings.php [20:58:23] must exist and be readable in the source directory. [20:58:41] New patchset: Jgreen; "critical=>true for fundraisingdb nagios replication test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44096 [20:58:58] good reason [20:59:17] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 0 seconds [20:59:22] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44096 [20:59:32] Anybody knows if the "Eqiad Migration Countdown Meeting" will be on Google Hangout, or SIP only? [20:59:53] andre__- I'd guess hangout unless it's down for SF again. Waiting on someone to post the hangout link, or tell us otherwise. [21:00:00] unless hangouts are down [21:00:09] I heard that might be the ase [21:00:11] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 16 seconds [21:00:11] *case [21:00:29] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 13 seconds [21:00:56] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 5 seconds [21:00:59] Ryan_Lane- Last I heard (about an hour ago) hangouts were back up [21:01:03] cool [21:01:05] hashar: around? [21:01:16] Ryan_Lane: I am in a conf call sorry [21:01:18] hashar: can you change the apache links in beta to use the deployment locations? [21:01:20] oh that is ryan [21:01:24] :D [21:01:54] screw you hang out [21:02:09] It's taking too long to connect you to this hangout. Try again in a few minutes. 
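A quick way to narrow down the slot1 failure above (the rebuild complaining that LocalSettings.php must exist and be readable in the source directory) would be to compare what the two slots actually contain. Everything here is hypothetical: the slot layout is only inferred from the later remark that the php-1.21wmf6 and php-1.21wmf7 symlinks point at slot1 and slot0, and the paths are illustrative.

    ls -l /data/project/apache/common-local/php-1.21wmf*    # which version symlinks into which slot
    for slot in slot0 slot1; do                              # hypothetical slot directories
        test -e "/srv/deployment/mediawiki/$slot/LocalSettings.php" \
            && echo "$slot: LocalSettings.php present" \
            || echo "$slot: LocalSettings.php missing"
    done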
[21:02:27] yeah [21:02:32] falling back to audio [21:02:33] l10n is working in beta now [21:02:48] I'm doing a deploy of slot1 with l10n rebuilding [21:03:11] I need to add some compute nodes to labs [21:03:49] it's hard to believe we're already close to capacity on 4 nodes [21:13:03] New patchset: Pyoungmeister; "fix regex for dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44098 [21:15:56] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44098 [21:17:39] !log cerium & titanium being installed as parsoid caching servers, ignore any alerts for now. [21:17:50] Logged the message, RobH [21:17:54] hashar: so, yeah, I think we're ready to start testing new deployment on beta [21:18:11] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Tue Jan 15 21:17:45 UTC 2013 [21:18:15] l10n is pushed out and working on deploy [21:19:05] RECOVERY - Puppet freshness on db1001 is OK: puppet ran at Tue Jan 15 21:18:52 UTC 2013 [21:19:24] New patchset: Jgreen; "debug fundraisingdb's vs site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44100 [21:19:41] RECOVERY - Puppet freshness on db1003 is OK: puppet ran at Tue Jan 15 21:19:21 UTC 2013 [21:19:41] RECOVERY - Puppet freshness on db1049 is OK: puppet ran at Tue Jan 15 21:19:24 UTC 2013 [21:19:47] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44100 [21:19:57] anomie: hm. it seems that l10nupdate-quick is rebuilding all message caches [21:20:05] anomie: this is going to be problematic [21:20:36] RECOVERY - Puppet freshness on db1017 is OK: puppet ran at Tue Jan 15 21:20:10 UTC 2013 [21:20:37] cause every small push that updates messages can cause 750MB of deployment [21:21:11] RECOVERY - Puppet freshness on db1042 is OK: puppet ran at Tue Jan 15 21:20:47 UTC 2013 [21:21:18] it may be fine when we switch to bittorrent, though [21:21:20] RECOVERY - Puppet freshness on db1005 is OK: puppet ran at Tue Jan 15 21:21:07 UTC 2013 [21:21:28] Ryan_Lane- You can take out the '! -z "$1" -a' bit to force it to only work with whatever matches the slot passed in $1 [21:21:39] well, that's the thing [21:21:43] it does match that [21:21:55] Oh, I misunderstood [21:22:01] but any extension update could cause every language to be rebuilt [21:22:10] I think it's only supposed to do english [21:22:27] Ryan_Lane: sorry been busy with some Jenkins job [21:22:37] * Ryan_Lane nods [21:22:41] RECOVERY - Puppet freshness on db1021 is OK: puppet ran at Tue Jan 15 21:22:32 UTC 2013 [21:23:31] It will rebuild anything whose source file changed since the last run. [21:23:49] is this also how it works with scap currently? [21:23:56] yes [21:23:58] ah [21:23:59] ok [21:24:17] ok. I'll put more effort into getting bittorrent working [21:24:54] BTW, I keep mentioning !g 42777 which will make the cron job (if that's even set up yet) regenerate fewer languages. [21:25:21] maybe we're talking about different things [21:25:28] I'm not talking about the cron [21:25:37] I know [21:25:40] ah. 
ok [21:25:41] RECOVERY - Puppet freshness on db1028 is OK: puppet ran at Tue Jan 15 21:25:06 UTC 2013 [21:25:41] RECOVERY - Puppet freshness on db1038 is OK: puppet ran at Tue Jan 15 21:25:34 UTC 2013 [21:26:08] RECOVERY - Puppet freshness on db1020 is OK: puppet ran at Tue Jan 15 21:25:42 UTC 2013 [21:26:08] RECOVERY - Puppet freshness on db1048 is OK: puppet ran at Tue Jan 15 21:25:54 UTC 2013 [21:27:03] New patchset: Jgreen; "Revert "debug fundraisingdb's vs site.pp" to see if notpeter's fix fixed the issue I'm debugging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44103 [21:27:18] Ryan_Lane: so /data/project/apache/common-local has symbolic links for php-1.21wmf6 to slot1 and wmf7 to slot0 [21:27:26] Ryan_Lane- If you run the script a second time (no changes to the source message files), it *should* make no changes. If it does make changes, I'd have to fix that. [21:27:33] anomie: it doesn't [21:27:34] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44103 [21:27:35] did we ever figure out what to do about mariadb ? the packages are preventing other needed packages from installing (trying to push out neon) [21:27:39] Ryan_Lane: so I guess the wikis that use the wmf branches are already running out of the git deployed code. [21:27:39] libmysqlclient-dev : Depends: libmysqlclient18 (= 5.5.28-0ubuntu0.12.04.3) but 5.5.28-mariadb-wmf201212041~precise is to be installed [21:27:40] E: Unable to correct problems, you have held broken packages. [21:28:02] LeslieCarr: i'll delete the rest [21:28:35] ok [21:28:47] RECOVERY - Puppet freshness on db1018 is OK: puppet ran at Tue Jan 15 21:28:19 UTC 2013 [21:28:49] hashar: they aren't I don't think [21:30:08] RECOVERY - Puppet freshness on db1024 is OK: puppet ran at Tue Jan 15 21:29:46 UTC 2013 [21:30:09] RECOVERY - Puppet freshness on db1034 is OK: puppet ran at Tue Jan 15 21:29:57 UTC 2013 [21:32:20] Ryan_Lane: at least I got blank page on http://en.wiktionary.beta.wmflabs.org :-] [21:32:41] RECOVERY - Puppet freshness on db1043 is OK: puppet ran at Tue Jan 15 21:32:38 UTC 2013 [21:32:46] supposed to run php-1.21wmf6 [21:33:20] ..8........<13>Jan 15 21:32:57 i-0000031a apache2: PHP Warning: require(MULTIVER_COMMON/wmf-config/wgConf.php) [function.require]: failed to open stream: No such file or directory in /srv/deployment/mediawiki/common/wmf-config/CommonSettings.php on line 148 [21:33:36] MULTIVER_COMMON is not set apparently [21:33:44] RECOVERY - Puppet freshness on db1035 is OK: puppet ran at Tue Jan 15 21:33:18 UTC 2013 [21:33:48] New patchset: RobH; "cerium & titanium deploying as parsoid caching" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44132 [21:35:41] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Tue Jan 15 21:35:27 UTC 2013 [21:35:41] RECOVERY - Puppet freshness on db1002 is OK: puppet ran at Tue Jan 15 21:35:33 UTC 2013 [21:37:20] New review: RobH; "this isn't self review, I'm letting another personality chime in." 
[operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44132 [21:37:21] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44132 [21:37:38] RECOVERY - Puppet freshness on db1019 is OK: puppet ran at Tue Jan 15 21:37:30 UTC 2013 [21:40:11] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Tue Jan 15 21:39:38 UTC 2013 [21:41:42] RECOVERY - Puppet freshness on db1033 is OK: puppet ran at Tue Jan 15 21:41:19 UTC 2013 [21:41:50] RECOVERY - Puppet freshness on db1027 is OK: puppet ran at Tue Jan 15 21:41:38 UTC 2013 [21:41:50] RECOVERY - Puppet freshness on db1050 is OK: puppet ran at Tue Jan 15 21:41:38 UTC 2013 [21:41:51] Ryan_Lane: Disk space on the Apaches looks good to me [21:42:04] RoanKattouw_away: same [21:42:44] RECOVERY - Puppet freshness on db1046 is OK: puppet ran at Tue Jan 15 21:42:09 UTC 2013 [21:42:44] RECOVERY - Puppet freshness on db1026 is OK: puppet ran at Tue Jan 15 21:42:33 UTC 2013 [21:43:56] RECOVERY - Puppet freshness on db1039 is OK: puppet ran at Tue Jan 15 21:43:40 UTC 2013 [21:44:05] RECOVERY - Puppet freshness on db1022 is OK: puppet ran at Tue Jan 15 21:43:55 UTC 2013 [21:44:14] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Jan 15 21:44:07 UTC 2013 [21:44:45] hashar: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version doesn't look like it's running against the new deployment location [21:44:58] but I've switched all wikis to use the new system [21:45:02] well to use the slots [21:45:26] that one isn't [21:45:27] $ grep wmf wikiversions-labs.dat [21:45:28] enwiktionary php-1.21wmf6 [21:45:28] enwikibooks php-1.21wmf7 [21:45:35] those do :-] [21:45:38] and blank page :( [21:45:44] wair [21:45:44] RECOVERY - Puppet freshness on db1036 is OK: puppet ran at Tue Jan 15 21:45:37 UTC 2013 [21:45:45] wait [21:45:50] hashar: where are you reading that file? [21:45:54] the reason I only switched two is to keep enwiki to master so feature team could still use it [21:46:01] form /data/project/apache/common-local [21:46:04] we're switching them all [21:46:14] why is it pulling from there? [21:46:27] cause that is the DocumentRoot for the apaches [21:46:29] hashar: for now we're going to switch everything [21:46:38] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Tue Jan 15 21:46:26 UTC 2013 [21:46:41] but we can just rename /data/project/apache/common-local to something else [21:46:48] yeah [21:46:54] as tim mentioned [21:46:56] and have /data/project/apache/common-local to be a symbolic link to /srv/deployment/whatever [21:47:06] mv it and link it to /srv... [21:47:14] RECOVERY - Puppet freshness on db1004 is OK: puppet ran at Tue Jan 15 21:46:53 UTC 2013 [21:47:23] RECOVERY - Puppet freshness on db1041 is OK: puppet ran at Tue Jan 15 21:47:11 UTC 2013 [21:47:32] we're likely getting a blank page because the old one doesn't know about the other versions [21:47:41] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Tue Jan 15 21:47:30 UTC 2013 [21:47:54] done [21:47:57] it's only the new system that's pointing at that properly [21:48:02] heh [21:48:11] *now* we're getting an interesting error [21:48:14] Invalid host name (docroot=/usr/local/apache/common/docroot/wikipedia.org), can't determine language. :-] [21:48:40] they are symlinks too [21:52:17] hashar: any ideas on that? 
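The docroot swap hashar performs above amounts to a move plus a symlink on the beta shared storage. A sketch: the /srv/deployment/mediawiki/common target matches the paths in the error messages that follow, but the exact target isn't spelled out in the log.

    mv /data/project/apache/common-local /data/project/apache/common-local.old   # keep the old tree around
    ln -s /srv/deployment/mediawiki/common /data/project/apache/common-local     # point the apaches at the git-deployed copy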
[21:52:17] Ryan_Lane: multi version has a regex on the path [21:52:21] ah [21:52:22] } elseif ( preg_match( '/^(?:\/usr\/local\/apache\/|\/home\/wikipedia\/)(?:htdocs|common\/docroot)\/([a-z]+)\.org/', $docRoot, $matches ) ) { [21:52:36] that's wrong, then :) [21:52:36] multiversion/MWMultiVersion.php of operations/mediawiki-config [21:52:56] fucking stupid multiversion [21:53:00] hm [21:53:03] I should rewrite that in a few lines of python [21:53:03] which line? [21:53:09] I don't see this in newdeploy [21:53:24] } elseif ( preg_match( "/^\/srv\/deployment\/mediawiki\/(?:htdocs|common\/docroot)\/([a-z0-9\-_]*)$/", $docRoot, $matches ) ) { [21:53:41] hmm [21:53:48] ahhh [21:53:48] I know [21:54:02] $docRoot = $_SERVER['DOCUMENT_ROOT'] apparently [21:54:06] there's a link for this in production [21:54:08] on tin [21:54:41] now we're getting a blank [21:54:43] where are the logs? [21:54:46] oh [21:54:47] wait [21:54:57] that link needs to exist on the apaches too, right? [21:55:02] there are no log in beta [21:55:09] no logs? really? :( [21:55:13] cause someone rejected my hack to have syslog installed on deployment-dbdump [21:55:16] :/ [21:55:18] heh [21:55:22] * hashar blame Ryan [21:55:24] :-]]]]]] [21:55:40] anyway [21:55:48] logs are sent to deployment-dbdump still [21:55:51] so just tcpdump it [21:55:52] sudo tcpdump -A -n -s0 udp port 514 [21:55:53] 2846 replace syslog permission handling on labs with root cause fix [21:55:57] hm [21:55:59] been doing that for months :-] [21:55:59] how about that, btw?:) [21:56:07] actually, why's it using /usr/local/apache/common/docroot/wiktionary.org? [21:56:12] hashar: oh [21:56:27] hashar: did you switch everything on all of the apaches? [21:56:36] no [21:56:43] I have updated the link in /data/project [21:56:48] so it happened magically :-] [21:56:49] ah [21:56:51] right [21:56:54] the docroot is on /data/project hehe [21:57:02] easy deployment :] [21:57:21] (which does not scale and has a huge spot, we now the story already) [21:57:32] what's looking for that docroot location? [21:57:58] do the systems themselves have an error log? [21:58:05] the apaches [21:58:08] mutante: on labs it is a different issue. ryslog is installed by default and our main syslog use syslog-ng which conflict with rsyslog. So we can't get syslog-ng on beta and hence have no log :-D [21:58:22] right [21:58:24] Ryan_Lane: the apaches are configured like in production, they sent everything to syslog [21:58:39] and rsyslog locally relay on a central host ( deployment-dbdump for beta ) [21:58:41] ugh [21:58:47] binasher: i saw the removal of mariadb via reprepro, yet despite an apt-get update the packages are still trying to be installed… - any ideas off the top of your head ? [21:59:02] I wonder where that error is being thrown [21:59:03] LeslieCarr: apt-get update [21:59:04] -dbdump does receive all the logs on UDP 514 thus [21:59:23] which is then happily discard by the rsyslog listening there. [21:59:31] "yet despite an apt-get update" [21:59:32] so you have to use tcpdump [21:59:33] :p [21:59:35] hashar: that one refers to me once adding class base::syslogs which makes /var/log/syslogs and /var/log/messages readable for non-roots.. and then i was asked to keep it open to replace that with the root cause fix which would be to have syslog write it with more relaxed permissions in the first place, instead of having puppet change them [21:59:51] LeslieCarr: what box? 
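Since beta has no local syslog (rsyslog on the instances conflicts with the syslog-ng setup used in production), the PHP errors being chased here can only be seen by sniffing the traffic relayed to deployment-dbdump on UDP 514, as hashar describes. A minimal sketch combining the two tcpdump invocations quoted in the discussion; the -l flag and the grep filter are added for convenience and are not shown in the log:

    # run on deployment-dbdump, which receives the relayed apache syslog traffic
    sudo tcpdump -l -A -n -s0 -i eth0 udp port 514 | grep --line-buffered 'PHP'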
[22:00:03] ah, they still appear to be in brewster [22:00:04] mutante: ah I see :-] [22:00:05] on neon [22:00:50] looks like brewster still has some packages - http://pastebin.com/MGP6ULPs [22:01:14] Ryan_Lane: something loaded on http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version :-] [22:01:53] not if you force refresh [22:02:08] that was being pulled from cache [22:02:21] :/ [22:04:52] hashar: i-0000031b apache2: PHP Fatal error: Invalid host name (docroot=/usr/local/apache/common/docroot/wikipedia.org), can't determine language.#012 in /srv/deployment/mediawiki/common/multiversion/MWMultiVersion.php on line 353 [22:04:59] tcpdump -A -s 1514 -i eth0 port 514 ;) [22:05:45] yeah :-] [22:06:05] I have no idea what that would be [22:06:12] the line is the trigger_error() call [22:06:16] which is not really useful [22:06:44] I have a feeling it's line 162 [22:07:02] } elseif ( preg_match( "/^\/srv\/deployment\/mediawiki\/(?:htdocs|co mmon\/docroot)\/([a-z0-9\-_]*)$/", $docRoot, $matches ) ) { [22:07:09] would that not match for some reason? [22:07:31] run it agains the provided docroot /usr/local/apache/common/docroot/wikipedia.org ? [22:07:46] damnit, better but now it doesn't have any mysql-5.1 available - i guess i'll grab that package from ubunut [22:07:54] http://pastebin.com/qT18wHi9 [22:08:04] Ryan_Lane: ^demon: Do you know who set up doc.wikimedia.org at gallium? [22:08:10] nope [22:08:12] I can't find any trace of it in puppet [22:08:16] But it exists [22:08:18] ironic [22:08:28] Ryan_Lane: oh yeah the regex mention /srv/deployment/ [22:08:36] <^demon> Wasn't I. [22:08:41] Especially ironic since the contents of it are auto-generated documetation about.. puppet. [22:08:45] Ryan_Lane: but the Apaches still point to the /usr/local/apache/common dir. [22:08:52] Ryan_Lane: so I guess we want either of it [22:09:18] ah.... [22:09:18] right [22:09:20] New patchset: Pyoungmeister; "a hat trick of bad regex..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44154 [22:09:21] shit [22:09:21] Ryan_Lane: is the common dir using the "new deploy" branch ? [22:09:24] yes [22:09:41] Ryan_Lane: I think Tim wrote another patch for the Apache config to switch their docroot to /srv/deployment/ [22:09:47] he did [22:09:51] binasher: would it be a bad thing to switch the mysql-client-5.1 packages to being mysqlfb-client-5.1 ? [22:09:58] We were going to simlink the shit out of everything [22:09:59] mark: mutante: Maybe it was one of you who set up http://doc.wikimedia.org (auto-generated puppet docs). This week I might add something to that (doxygen for mw-core), but can't find where the current setup is puppetized [22:10:00] Or something [22:10:03] yeah [22:10:05] Ryan_Lane: so to play it safe, I guess that regex should match both path. Cleanup later [22:10:06] e.g. whether it nukes the directory or somethjing [22:10:08] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44154 [22:10:09] yeah [22:10:12] LeslieCarr: what for? [22:10:12] should match both [22:10:23] cmjohnson1: you still @ eqiad? [22:10:36] so i don't have to backport the lucid mysql-client-5.1 from lucid [22:10:41] titanium (misc server) isnt console redirecting for me (though its drac is responsible) [22:10:51] It was done on December 5/6/7 by someone with root access [22:10:54] responding even [22:10:56] =P [22:10:57] LeslieCarr: what needs a 5.1 client on precise? 
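On the docroot mismatch above: the newdeploy regex in MWMultiVersion.php only recognises /srv/deployment/... while the Apaches still hand it the old /usr/local/apache/common/docroot/... path, so neither branch matches. A shell one-liner can sanity-check a combined pattern against both paths; the pattern below is purely illustrative and is not the change that went into https://gerrit.wikimedia.org/r/44158:

    php -r '
    $pat = "@^(?:/usr/local/apache|/srv/deployment/mediawiki)/(?:htdocs|common/docroot)/([a-z0-9\-_]+)\.org@";
    foreach ( array( "/usr/local/apache/common/docroot/wikipedia.org",
                     "/srv/deployment/mediawiki/common/docroot/wikipedia.org" ) as $docRoot ) {
        // both the old and the new location should match and capture "wikipedia"
        var_dump( preg_match( $pat, $docRoot, $m ), $m[1] );
    }'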
nothing should [22:11:15] Ryan_Lane: I guess if you deploy operations/mediawiki-config @master , that might work again :-] [22:11:27] robh: no i can head back though [22:11:31] 5.1 client is the default installed by generic::mysql::packages::client [22:11:33] no [22:11:35] it'll break [22:11:37] badly ;) [22:11:44] hm [22:11:49] which is used by both precise and lucid [22:11:55] i could switch it to a case thing [22:12:00] LeslieCarr: then that class isn't compatible with precise [22:12:03] Krinkle: If it was done by someone without root access.... [22:12:05] if lucid, client-5.1, if precise - client-5.5 [22:12:20] Reedy: It was done as root (or via puppet) [22:12:26] that would do the trick [22:12:33] cmjohnson1: no need to head back today [22:12:43] cmjohnson1: its non-emergency, ill just snag a different one and drop ticket for this one [22:12:57] * hashar waves at Leslie [22:13:02] okay..i will get it in the a.m. [22:13:04] hi hashar [22:13:20] RECOVERY - Host msfe1002 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms [22:13:32] LeslieCarr: watching you typing on your laptop right now :-] You occupy the whole right part of the meeting cam [22:13:41] ;-] [22:13:46] now i hide a bit more [22:13:47] :) [22:14:33] !log reedy synchronized wmf-config/InitialiseSettings.php 'wgUseRCPatrol on for wikivoyages' [22:14:39] Logged the message, Master [22:14:40] hashar: fixed :) [22:14:49] kind of [22:15:03] change ? :-] [22:15:07] http://en.wiktionary.beta.wmflabs.org/wiki/Special:Version [22:15:12] I made a local hack [22:15:13] !g 81d65d00 [22:15:13] https://gerrit.wikimedia.org/r/#q,81d65d00,n,z [22:15:14] !g 1a6ece73 [22:15:15] https://gerrit.wikimedia.org/r/#q,1a6ece73,n,z [22:15:16] oh man you are doing live hacksè [22:15:19] I'm using both locations [22:15:27] hashar: I'm going to take them and push them in when I'm done [22:15:29] 1.21wmf6 (8ef1a23) !! [22:15:31] congratulations [22:15:51] looks like it's working to me [22:16:09] maybe not [22:16:13] I'm getting errors [22:16:32] at least it pointed to the correct path and show something [22:16:35] hm [22:16:38] maybe I'm not [22:16:44] it looks like it's working [22:16:45] I guess [22:17:07] !g 0cfe7666 [22:17:07] https://gerrit.wikimedia.org/r/#q,0cfe7666,n,z [22:17:14] also note there is a dumb squid cache in front of the apaches which is most probably never purged [22:17:22] [22:17:22] :( [22:17:37] I'm hitting pages that don't cache [22:18:09] hashar: https://gerrit.wikimedia.org/r/#q,0cfe7666,n,z https://gerrit.wikimedia.org/r/#q,81d65d00,n,z https://gerrit.wikimedia.org/r/#q,1a6ece73,n,z [22:18:12] I say that [22:18:17] PROBLEM - SSH on msfe1002 is CRITICAL: Connection refused [22:18:20] but that's obviously not true with this squid config [22:18:21] hashar: That's implementation of doc.wikimedia.org and puppet dox gen [22:18:40] New patchset: Lcarr; "switching mysql.pp to check ubuntu distribution" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44155 [22:18:41] Krinkle: ping andrewbogott about it :-] [22:18:42] enwp for beta still shows an alpha version [22:18:53] hashar: nothing to ping, I just wanted to know how it was done and where, and I found it. [22:19:03] New patchset: Reedy; "Enable wgUseRCPatrol on wikivoyages" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44156 [22:19:07] Ryan_Lane: http://en.wikipedia.beta.wmflabs.org/wiki/Special:NewPagesFeed is a good test page. Loads javascript / css + images from the docroot. [22:19:21] why does enwp for beta show an alpha version? 
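The mysql.pp change being discussed (https://gerrit.wikimedia.org/r/44155) boils down to choosing the client package per Ubuntu release: 5.1 on lucid, 5.5 on precise. The real fix is a Puppet change; the following is only a shell analogue of the same decision, with the package versions taken from the conversation and the exact package names assumed:

    # pick the MySQL client package by distribution codename (lucid -> 5.1, precise -> 5.5)
    codename=$(lsb_release -sc)
    case "$codename" in
        lucid)   pkg="mysql-client-5.1" ;;
        precise) pkg="mysql-client-5.5" ;;
        *)       echo "unhandled distribution: $codename" >&2; exit 1 ;;
    esac
    sudo apt-get install -y "$pkg"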
[22:19:25] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44156 [22:19:29] msfe1002 just stopped responding .. anyone working on it? [22:19:32] !log authdns-update [22:19:35] or im gonna powercycle [22:19:42] does apache need to be restarted on those hosts? [22:19:42] Logged the message, RobH [22:20:02] Krinkle: oh sorry I thought you were giving me patches to review ;-) [22:20:15] yep [22:20:21] apc needed to be cleared I guess [22:20:24] now it's working [22:20:34] robla: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version [22:20:35] :] [22:20:37] achievement [22:20:41] congrats Ryan! [22:20:43] !log powercycling msfe1002 [22:20:46] ty [22:20:52] seems that's the only change we need for now [22:20:54] Logged the message, Master [22:20:57] TimStarling: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version [22:21:35] binasher - can you check out https://gerrit.wikimedia.org/r/44155 ? [22:21:37] Ryan_Lane: so you had to restart apache on the box ? [22:21:42] I did [22:21:43] yes [22:21:49] Ryan_Lane: what was the issue with the APC cache? [22:21:53] RECOVERY - SSH on msfe1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:22:06] apc likely had the old common stuff in memory [22:22:18] binasher - can you check out https://gerrit.wikimedia.org/r/44155 ? (dunno if you saw that due to disconnecting and reconnecting) [22:22:19] ahh [22:22:28] hashar: btw, is there a way to not duplicate everything twice? See https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=files/apache/sites/integration.mediawiki.org;h=afb69811fb315bcde3ace0ad984b5023cad6d152;hb=HEAD [22:22:35] Ryan_Lane: smart move :-] [22:22:39] Ryan_Lane: it is fast again! nice! [22:22:55] it's faster because it's not hitting php from gluster [22:22:56] Krinkle: yup there is :-] [22:23:05] Krinkle: Apache support including files IIRC. [22:23:10] it's surprising how slow gluster is [22:23:19] <^demon> Krinkle, hashar: Do like we do on gerrit, force redirect to https :p [22:23:22] Ryan_Lane: chrismcmahon is going to be very happy about that. [22:23:36] well, he'll be happy when we're actually using it for master ;) [22:23:51] Ryan_Lane: It's not /that/ surpristing [22:23:59] Damianz: well, true [22:24:01] so I will call it an end [22:24:04] I guess it's added latency for the filesystem [22:24:19] I need to turn off atime on glustr [22:24:21] gluster [22:24:25] LeslieCarr: that kinda kills use of $version parameter in the class, so might want to get rid of it [22:24:36] New patchset: Krinkle; "integration.mediawiki.org: Remove old testswarm routing." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44157 [22:24:37] for the underlying filesystem [22:24:40] still have to do some laundry there and my wife just came back from job [22:24:53] Ryan_Lane: well done [22:24:56] there's some other performance tweaks I need to make for it too. it'll never be fast, though [22:24:59] Ryan_Lane: and /data/project on deployment-prep is corrupted too I think [22:25:06] good point [22:25:08] hashar: yes, somewhat [22:25:31] Ryan_Lane: going to disconnect. Congrats again ryan! [22:25:31] so, let me push this change into newdeploy [22:25:38] then I'll deploy to tampa [22:25:43] and we can try out test [22:25:50] !! 
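The fix that just made Special:Version load on beta amounts to two steps: repoint the shared docroot at the git-deployed tree, then restart Apache so APC stops serving the old files from memory. A minimal sketch; the link target is taken from the error paths quoted earlier, while the .old suffix and the restart command are assumptions:

    # repoint the shared docroot (paths as seen in the error messages above)
    mv /data/project/apache/common-local /data/project/apache/common-local.old
    ln -s /srv/deployment/mediawiki/common /data/project/apache/common-local
    # APC keeps the old bytecode cached, so restart Apache on each web host to clear it
    sudo service apache2 restart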
[22:26:22] then we'll know if we're ready for tomorrow :) [22:27:10] New patchset: Lcarr; "switching mysql.pp to check ubuntu distribution" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44155 [22:27:14] binasher: ^^ [22:27:59] Ryan_Lane: will figure out from the SAL :-] [22:28:04] Ryan_Lane: I should be around for most of the next X hours.. If you need a hand with hashar leaving [22:28:11] hashar: figure out what from SAL? [22:28:19] Ryan_Lane: I am heading bed, will not be there tomorrow morning, I need a long nap. [22:28:20] Reedy: awesome. thanks. [22:28:28] ah [22:28:28] ok [22:28:31] hashar: night! [22:28:40] barely slept for the last 3 days or so :/ [22:29:00] I got a very efficient alarm clock that kick in at t7am every day :-] [22:29:02] LeslieCarr: now you need to update invocations of both of those classes that are passing in a version [22:29:14] there's one of each, also in mysql.pp [22:29:49] Reedy: and thanks for proposing to review stuff :-] [22:30:13] New patchset: Ryan Lane; "Use old and new path match" [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/44158 [22:30:33] ^^ [22:30:50] RobH: When you can, could you give me the IP address of the Parsoid Varnish in eqiad? Even if it's not up yet. I need the IP to put in the config (which is also the first test of the datacenter-dependent config mechanism) [22:31:42] we really need test coverage [22:32:23] cmjohnson1: did you send out the card on asw-c-eqiad ? [22:32:29] wondering when we'll get the fixed part [22:33:25] lesliecarr: came in late this afternoon (according to portal). [22:33:38] New review: Hashar; "That would work though that might does two preg_match() calls. The master branch has a single preg_..." [operations/mediawiki-config] (newdeploy); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44158 [22:33:39] so tomorrow [22:33:48] Ryan_Lane: ^^^^ [22:33:48] now I am sleeping :) [22:33:52] wave! [22:34:07] see ya :) [22:34:40] RoanKattouw: its going to be cerium and praseodymium which is 208.80.154.147 and 148 respectively [22:36:10] KO [22:36:15] I'm gonna use cerium for now [22:36:21] I don't have LVS groups for Parsoid Varnish set up yet [22:36:31] So I'll use 208.80.154.147 [22:36:31] Thanks man [22:37:56] New patchset: Catrope; "Vary the Parsoid IP by datacenter" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44160 [22:39:04] New patchset: Faidon; "autoinstall: switch all squid boxes to lucid" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44161 [22:39:05] New patchset: Ryan Lane; "Use old and new path match" [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/44158 [22:39:34] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44161 [22:39:35] Change merged: Ryan Lane; [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/44158 [22:41:12] ok. now to configure all tampa hosts [22:41:21] anyone have an idea of a good regex for that [22:41:22] ? [22:41:53] haha [22:42:01] * robla pokes around beta [22:42:01] errm [22:42:02] eqiad is: ^(mw).*eqiad.* [22:42:04] isn't there a site grain? 
[22:42:20] TimStarling: that still requires an idea of which systems we needto use [22:42:22] *to [22:42:27] We've mw* and srv* [22:42:40] because we'd need to include a class for them in puppet [22:42:48] and we don't have anything like that in puppet right now either [22:42:51] + snapshot and tmh, couple of search indexes, fenari, hume and spence [22:44:13] what does spence use it for? [22:44:18] memcache check? [22:44:41] PROBLEM - Host msfe1002 is DOWN: PING CRITICAL - Packet loss = 100% [22:44:43] yes [22:44:44] And job possibly job queue metircs [22:44:49] * Ryan_Lane nods [22:45:01] puppet regexes shit me [22:45:05] node /^cp10(2[1-9]|3[0-6])\.eqiad\.wmnet$/ { [22:45:06] etc. [22:45:41] it should have a range feature, like node [cp1021-1036].eqiad.wmnet [22:45:57] That'd be very useful for this nature of things [22:46:04] yep [22:46:35] /^cp(1021|1022|1023|1024|1025|1026|1027|1028|1029|1030|1031|1032|1033|1034|1035|1036).eqiad.wmnet$/ [22:46:38] simple [22:46:53] hi felicity! [22:47:00] adding grains via puppet isn't amazingly straightforward, either, thanks to the lack of iteration [22:47:02] hi tim [22:47:16] how's life? [22:47:31] shitty as always, you? [22:47:39] /^cp(102[1-9|1022|1023|1024|1025|1026|1027|1028|1029|1030|1031|1032|1033|1034|1035|1036).eqiad.wmnet$/ [22:47:45] ragrgh [22:47:54] same old [22:47:54] thankfully we don't need a regex for the cp systems ;) [22:47:57] missing ] [22:48:02] Hence the noise [22:48:15] /^cp(102[1-9]|103[0-6]).eqiad.wmnet$/ [22:49:09] mw1-mw74 skipping mw50 and mw23 [22:49:09] New patchset: Krinkle; "integration.mediawiki.org: Configure localhost:9412 for QUnit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44162 [22:49:23] New patchset: RobH; "replacing titanium role with praseodymium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44163 [22:49:28] Reedy: your works appears to be based on mine, and i did not give you permission to distribute it [22:49:43] Did you tell me that I couldn't? [22:49:59] i'm pretty sure that's not how copyright works [22:50:22] not for long [22:50:33] New review: RobH; "pay no attention to the man doing self review behind the curtain" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44163 [22:50:35] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44163 [22:50:38] however i'll remember that if i ever murder someone [22:50:41] "you didn't tell me i couldn't" [22:50:42] (^(srv|mw|snapshot).(eqiad|pmtpa).wmnet$)|(^(hume|spence|fenari).wikimedia.org$)) [22:50:47] at least if you're in the UK, there is that orphan works law right? [22:50:48] ^^ that look correct? [22:50:54] New patchset: Krinkle; "integration.mediawiki.org: Configure localhost:9412 for QUnit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44164 [22:51:08] Ryan_Lane: No numbers? [22:51:12] heh [22:51:14] whoops [22:51:33] \d{4} should be encompassing enough.. [22:51:38] .+\.wmnet # done [22:51:59] (^(srv|mw|snapshot).*.(eqiad|pmtpa).wmnet$)|(^(hume|spence|fenari).wikimedia.org$)) [22:52:15] I guess it's not going to bring any false positives [22:52:22] doubtful [22:52:24] we have no srvlol for example ;) [22:52:35] New patchset: Krinkle; "integration.mediawiki.org: Remove old testswarm routing." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/44157 [22:52:49] * Ryan_Lane tests with: salt -E '(^(srv|mw|snapshot).*.(eqiad|pmtpa).wmnet$)|(^(hume|spence|fenari).wikimedia.org$))' test.ping [22:53:00] Ryan_Lane: Nope [22:53:03] searchidx and tmh too [22:53:18] Though, it's only some searchidx hosts.. [22:53:53] searchidx2 and searchidx1001 [22:54:19] * Ryan_Lane groans [22:54:24] (^(srv|mw|snapshot|tmh|searchidx(2|1001)).*.(eqiad|pmtpa).wmnet$)|(^(hume|spence|fenari).wikimedia.org$)) [22:54:54] that'll match 2nnnn.. [22:55:34] Where's 2? esams? [22:55:42] (2XXX) [22:56:55] we better not be deploying to esams :D [22:57:27] It was more thinking we're not going to be having search indexers there anytime soon (if it at all ;)) [22:58:22] root@sockpuppet:~# salt -E '^(srv|mw|snapshot|tmh)|(searchidx2|searchidx1001).*.(eqiad|pmtpa).wmnet$|^(hume|spence|fenari).wikimedia.org$' test.ping | grep search [22:58:29] searchidx2.pmtpa.wmnet: True [22:58:29] searchidx1001.eqiad.wmnet: True [23:02:32] RECOVERY - Host msfe1002 is UP: PING OK - Packet loss = 0%, RTA = 26.63 ms [23:05:28] New patchset: Dzahn; "remove empy comment lines, for some weird reason these end up in generated HTML output and mess up the index page, while the other lines starting with # that actually have text do not" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44168 [23:06:08] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44168 [23:11:09] New patchset: RobH; "praseodymium mac address update" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44169 [23:12:16] New review: RobH; "im out of witty self review comments for today" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44169 [23:12:17] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44169 [23:13:40] a soft, silvery, malleable and ductile metal in the lanthanide group [23:14:07] root@sockpuppet:~# salt -E '^(srv|mw|snapshot|tmh)|(searchidx2|searchidx1001).*.(eqiad|pmtpa).wmnet$|^(hume|spence|fenari).wikimedia.org$' test.ping | wc [23:14:07] 338 676 8344 [23:14:27] I think I'll fan out their initialization :D [23:16:30] New patchset: Ryan Lane; "Switch all mw hosts to use tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44172 [23:18:44] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44172 [23:20:37] RoanKattouw_away: caesium is online (one of your two parsoid vanish hosts) [23:21:23] Awesome [23:21:46] had issues with the other host, so had to move service to a different server, it'll be online shortly [23:21:54] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 44015 - Add en.wikivoyage autopatrolled group' [23:22:04] Logged the message, Master [23:22:36] RobH: Wait, caesium? [23:22:40] I thought it was cerium? [23:22:43] doh [23:22:44] sorry [23:22:46] cerium [23:22:52] im doing a differnt, unrelated work on caesium [23:22:54] OK [23:22:58] Is https://gerrit.wikimedia.org/r/#/c/44160/ correct? 
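Back on the host-targeting question from earlier: the regex form below is the one actually tested on sockpuppet; the grain form is what the same targeting would look like if a site grain existed on the minions, which the discussion notes it does not yet:

    # regex targeting as tested above
    salt -E '^(srv|mw|snapshot|tmh)|(searchidx2|searchidx1001).*.(eqiad|pmtpa).wmnet$|^(hume|spence|fenari).wikimedia.org$' test.ping
    # hypothetical: with a custom "site" grain defined on every minion, this would collapse to
    salt -G 'site:pmtpa' test.ping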
[23:22:59] and losing my mind oO [23:23:20] (the IPs and hostnames in that commit that is) [23:23:22] New patchset: Reedy; "Bug 44015 - Add en.wikivoyage autopatrolled group" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44173 [23:23:34] RoanKattouw: yep, looks good to me [23:23:40] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44173 [23:23:50] Cool thanks [23:23:56] RoanKattouw: want someone to review it or you got it? [23:24:17] PROBLEM - SSH on msfe1002 is CRITICAL: Connection refused [23:24:25] RobH: I need 44160 to be reviewed but not by ops [23:24:34] It's the first commit that uses the multi-datacenter-config thingy [23:24:36] no worries then [23:24:51] praseodymium will be ready soon enough [23:25:39] Ryan_Lane: So its deploying ALL the files to ALL the servers? ;) [23:25:45] yep [23:26:00] all repos [23:26:00] wheee [23:26:01] it isn't just yet [23:26:01] but will be soon [23:26:11] Are "we" planning on replacing /usr/local/apache stuff with symlinks today? [23:26:18] ie to /srv/.. [23:26:59] New patchset: RobH; "msfe1001-1002 to decom, cleaning up decom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44174 [23:27:08] ACKNOWLEDGEMENT - SSH on msfe1002 is CRITICAL: Connection refused daniel_zahn will be renamed - this is not ms-fe1002 [23:27:31] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44174 [23:28:02] PROBLEM - NTP on msfe1002 is CRITICAL: NTP CRITICAL: No response from NTP server [23:28:24] Reedy: no [23:28:27] AaronSchulz: around? [23:28:32] !log ignore msfe1001/msfe1002 errors (the names without the dash) as they are turned off and renamed [23:28:38] Reedy: when we do that we've actually switched over [23:28:42] Logged the message, RobH [23:28:48] we could actually deploy wmf8 today [23:28:54] * AaronSchulz is just debugging some code [23:28:55] Aha [23:28:57] So just staging it [23:28:59] k [23:29:02] and then switch it tomorrow [23:29:15] paravoid: what it is? [23:29:20] for sure we should try test first :) [23:29:30] the cronjob [23:29:35] I'm re-reading it [23:29:40] oh, that [23:29:44] I'm deploying the deployment system right now [23:29:45] $tempRepo = $repo->getTempRepo(); [23:29:45] $dir = $tempRepo->getZonePath( 'thumb' ); [23:29:45] $iterator = $tempRepo->getBackend()->getFileList( array( 'dir' => $dir ) ); [23:29:48] $this->output( "Deleting old thumbnails...\n" ); [23:29:49] then I'll initialize all of tampa [23:29:52] that's temp I guess [23:29:57] so I guess I was wrong [23:29:58] then we can try test [23:30:04] yeah a thumb subdir of the temp container [23:30:10] right [23:30:38] New patchset: Dzahn; "remove global index.html.tmpl, they are created for every language" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44175 [23:32:00] ok, reviewed all the misc servers in eqiad (i missed msfe since i forgot they werent used now) [23:32:09] and now have 10 spare misc servers. [23:32:11] \o/ [23:32:20] (3 of which I have already marked for use) [23:32:21] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44175 [23:35:06] !log testing deployment destination initialization on fenari [23:35:17] Logged the message, Master [23:36:05] well, that failed [23:37:49] What did you try? 
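Before retrying the failed destination initialization on fenari, it is worth confirming that the minion answers the master and seeing what the failed run left behind. A small sketch run from the salt master; the directory to inspect is an assumption based on the /srv/deployment paths seen earlier:

    # confirm the target minion responds at all
    salt 'fenari.wikimedia.org' test.ping
    # inspect whatever the failed initialization left under the deployment root
    salt 'fenari.wikimedia.org' cmd.run 'ls -l /srv/deployment'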
[23:38:21] New patchset: RobH; "msfe1001/2 never in puppet, so no need to list here, opps" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44177 [23:38:26] it's a problem with a change I made to the deployment module [23:38:29] I'm fixing it now [23:39:03] New patchset: Ryan Lane; "Don't reference repo before checking if we use it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44178 [23:39:07] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44177 [23:40:07] New patchset: Lcarr; "switching mysql.pp to check ubuntu distribution" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44155 [23:41:16] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44178 [23:41:26] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44155 [23:41:30] ah. no that won't fix it either [23:41:36] New patchset: Ryan Lane; "Use proper reference to Parsoid" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44179 [23:41:38] but that will ^^ [23:42:00] Ryan_Lane: is your change safe to merge ? [23:42:04] yep [23:42:12] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44179 [23:42:44] RoanKattouw: ok, praseodymium is installed as parsoid varnish and puppet has run [23:42:46] I just merged everything [23:42:47] so its all yours [23:42:50] yay [23:49:22] RoanKattouw: you broke nagios [23:49:27] ? [23:49:31] please define @monitor_group { "parsoidcache_eqiad [23:49:33] I know the Parsoid Nagios checks are broken [23:49:35] ... [23:49:36] Oh [23:49:42] Ahm... [23:50:03] well, i broke nagios about 15 minutes ago [23:50:05] actually, I can do it [23:50:05] so its RoanKattouw's turn. [23:50:19] well, shit [23:50:22] Oh I see [23:50:23] fanout is going to be difficult [23:50:30] since I'm using a returner [23:50:35] Yeah that one is about 30% Rob's fault and 70% my fault [23:51:03] New patchset: Pyoungmeister; "eqiad parsoid caches need nagios group" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44180 [23:51:32] Oh looks like you beat me to it [23:51:32] no worries, I more mention so that you're aware and can not forget next time :) [23:51:47] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44180 [23:51:54] Yeah thanks, I'd forgotten to define the eqiad group ahead of time [23:51:58] Should've just done it right away [23:52:05] no probs [23:52:57] I mean, trust me, the rest of us want nagios to die more than you do... but this isn't the right way to do it [23:52:58] ;) [23:53:21] hahaha [23:54:46] : !log copying ganglios to precise-wikimedia repository [23:55:10] hrm, is morebots borked ? [23:55:17] always [23:55:24] damnit [23:55:52] LeslieCarr: http://www.vidarholen.net/contents/wordcount/ [23:56:04] hehehe [23:56:04] nice [23:59:07] let's try this again [23:59:09] !log restarted morebots [23:59:19] Logged the message, Mistress of the network gear. [23:59:40] !log copied ganglios from lucid-wikimedia to precise-wikimedia [23:59:50] Logged the message, Mistress of the network gear.
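The closing log entries record copying the ganglios package from the lucid-wikimedia distribution to precise-wikimedia on the apt host. The exact command is not shown in the log; a plausible reconstruction using reprepro's copy subcommand, run from the repository's base directory, would be:

    # copy ganglios between distributions within the same reprepro repository
    reprepro -v copy precise-wikimedia lucid-wikimedia ganglios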