[00:25:18] !log catrope synchronized php-1.22wmf8/extensions/VisualEditor 'Update VE to pick up cssText fix'
[00:25:28] Logged the message, Master
[00:25:42] !log catrope synchronized php-1.22wmf9/extensions/VisualEditor 'Update VE to pick up cssText fix'
[00:25:52] Logged the message, Master
[01:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:53:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[02:01:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:02:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[02:17:47] !log LocalisationUpdate completed (1.22wmf9) at Thu Jul 11 02:17:47 UTC 2013
[02:17:58] Logged the message, Master
[02:26:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:27:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.148 second response time
[02:37:40] !log LocalisationUpdate completed (1.22wmf8) at Thu Jul 11 02:37:40 UTC 2013
[02:37:50] Logged the message, Master
[02:46:56] PROBLEM - Puppet freshness on grosley is CRITICAL: No successful Puppet run in the last 10 hours
[02:51:17] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 11 02:51:16 UTC 2013
[02:51:27] Logged the message, Master
[04:16:49] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[04:46:32] New review: Dr0ptp4kt; "I believe yurik's already on this, but the VCL update deployment and the corresponding META config b..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73027
[05:36:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:37:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time
[06:06:18] PROBLEM - Memcached on mc15 is CRITICAL: Connection timed out
[06:07:08] RECOVERY - Memcached on mc15 is OK: TCP OK - 0.026 second response time on port 11211
[06:23:25] New review: Yurik; "Update/rename of the META pages is not critical here, because this carrier is disabled. Thus, not ha..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73027
[07:10:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:11:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time
[07:38:28] !log gallium : fix permissions for android-common nightly build (need to be jenkins owned, not jenkins-slave) {{bug|51137}}
[07:38:38] Logged the message, Master
[08:02:02] PROBLEM - Puppet freshness on db78 is CRITICAL: No successful Puppet run in the last 10 hours
[08:02:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:03:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[08:27:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:29:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[09:05:56] New patchset: Mark Bergsma; "Detect and exit when persistent storage can't mmap a file at the required address." [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/73145
[09:05:56] New patchset: Mark Bergsma; "varnish (3.0.3plus~rc1-wm13) precise; urgency=low" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/73146
[09:28:07] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[09:28:07] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[09:28:07] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[09:28:07] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[09:28:07] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[09:28:07] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[09:28:07] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[09:28:08] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[09:44:54] New patchset: Mark Bergsma; "Retry starting Varnish 3 times on temp error 75" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73151
[09:59:40] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73151
[10:20:21] !log Graceful restart of Zuul (forward to I3fc6baba8c6d65)
[10:20:31] Logged the message, Master
[10:20:37] s/restart/reload
[10:27:59] New patchset: Mark Bergsma; "Fix test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73155
[10:28:39] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73155
[10:46:38] PROBLEM - Varnish HTTP parsoid-backend on cp1058 is CRITICAL: Connection refused
[10:50:28] PROBLEM - Disk space on analytics1006 is CRITICAL: DISK CRITICAL - free space: / 705 MB (3% inode=84%):
[10:50:38] RECOVERY - Varnish HTTP parsoid-backend on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 634 bytes in 0.002 second response time
[10:53:18] PROBLEM - DPKG on cp1059 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:01:18] RECOVERY - DPKG on cp1059 is OK: All packages OK
[11:05:38] New patchset: Mark Bergsma; "Allow persistent connections for HTTP PURGE (error) responses" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72530
[11:05:38] New patchset: Mark Bergsma; "Maintain persistent connections on text cluster redirects" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73157
[11:09:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72530
[11:10:02] New patchset: Mark Bergsma; "Revert "Allow persistent connections for HTTP PURGE (error) responses"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73158
[11:10:13] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73158
[11:28:18] New patchset: Mark Bergsma; "Revert "Revert "Allow persistent connections for HTTP PURGE (error) responses""" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73159
[11:29:06] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73159
[11:36:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73157
[11:42:58] New patchset: Mark Bergsma; "Rename Varnish storage files for consistency" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73161
[11:43:37] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73161
[11:47:10] PROBLEM - Varnish HTTP text-backend on amssq47 is CRITICAL: Connection refused
[11:56:20] PROBLEM - Varnish HTTP text-backend on cp1039 is CRITICAL: Connection refused
[11:56:20] PROBLEM - DPKG on cp1039 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[12:04:10] New patchset: Mark Bergsma; "Add monitoring of esams wikidata/wikivoyage LVS services" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73162
[12:05:30] PROBLEM - Varnish HTTP text-backend on cp1040 is CRITICAL: Connection refused
[12:06:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73162
[12:12:03] Any op around that could help me with LDAP problems around gerrit and migrating SVN users?
[12:14:15] RECOVERY - Varnish HTTP text-backend on amssq47 is OK: HTTP OK: HTTP/1.1 200 OK - 190 bytes in 0.190 second response time
[12:24:15] RECOVERY - Varnish HTTP text-backend on cp1039 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.001 second response time
[12:24:25] RECOVERY - DPKG on cp1039 is OK: All packages OK
[12:25:17] New patchset: Mark Bergsma; "Set wikidata/wikivoyage checks non-critical" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73166
[12:25:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73166
[12:36:05] PROBLEM - Packetloss_Average on analytics1006 is CRITICAL: CRITICAL: packet_loss_average is 23.7303262416 (gt 8.0)
[12:47:05] PROBLEM - Puppet freshness on grosley is CRITICAL: No successful Puppet run in the last 10 hours
[13:02:32] RECOVERY - Varnish HTTP text-backend on cp1040 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.002 second response time
[13:11:39] Is our naming scheme documented somewhere?
[13:11:49] PROBLEM - Host virt3 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:27] New patchset: coren; "DHCP: rename virt[34] to labsudb[12]" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73168
[14:17:43] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[14:28:11] New patchset: Reedy; "Add new symlinks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73170
[14:28:47] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73170
[14:29:35] New patchset: coren; "Add new labsudb role for Labs users' database" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73171
[14:30:28] Sean needs to have a gerrit account set up. :-)
[14:42:50] RECOVERY - Host virt3 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms
[14:44:24] !log reedy synchronized php-1.22wmf10 'initial sync of 1.22wmf10'
[14:44:36] Logged the message, Master
[14:45:21] !log reedy synchronized docroot and w
[14:45:31] Logged the message, Master
[14:47:52] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: test2wiki to 1.22wmf10 and rebuild l10n cache
[14:48:03] Logged the message, Master
[14:49:30] PROBLEM - Host virt3 is DOWN: PING CRITICAL - Packet loss = 100%
[14:49:37] Yay, scap is broken
[14:50:35] oh joy
[14:51:15] I'm guessing it's related to the changes Tim/Ori made to fix up the problems last time
[14:52:20] RECOVERY - SSH on virt3 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:52:30] RECOVERY - Host virt3 is UP: PING OK - Packet loss = 0%, RTA = 26.67 ms
[14:52:50] RECOVERY - Disk space on analytics1006 is OK: DISK OK
[14:55:10] PROBLEM - Host virt4 is DOWN: PING CRITICAL - Packet loss = 100%
[14:59:46] !log reedy Started syncing Wikimedia installation... : test2wiki to 1.22wmf10 and rebuild l10n cache
[14:59:56] Logged the message, Master
[15:00:17] https://bugzilla.wikimedia.org/show_bug.cgi?id=51174
[15:00:48] Updating ExtensionMessages-1.22wmf8.php...
[15:00:48] done
[15:00:48] Updating LocalisationCache for 1.22wmf8... done
[15:00:57] I'm going to have to fix that newline
[15:01:16] Reedy: starting early today :-)
[15:04:45] Nope! This is the time I usually start to have time to fix up silly issues
[15:05:45] RECOVERY - Host virt4 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms
[15:06:25] PROBLEM - NTP on virt4 is CRITICAL: NTP CRITICAL: No response from NTP server
[15:07:15] PROBLEM - NTP on virt3 is CRITICAL: NTP CRITICAL: No response from NTP server
[15:10:24] New patchset: ArielGlenn; "replace prod ssh key for cmjohnson (rt 5448)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73176
[15:12:21] !log reedy Finished syncing Wikimedia installation... : test2wiki to 1.22wmf10 and rebuild l10n cache
[15:12:23] Reedy: speaking of silly issues, do you have any theory as to why a POST to api.php in beta labs would return just the API HTML doc and not actually do an API call? It's bugzilla 50622 and 50623
[15:12:25] PROBLEM - Host virt4 is DOWN: PING CRITICAL - Packet loss = 100%
[15:12:31] Logged the message, Master
[15:12:41] Usually because the request is wrong
[15:12:51] New patchset: ArielGlenn; "replace prod ssh key for cmjohnson (rt 5448)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73176
[15:14:17] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73176
[15:14:23] mark, bblack: can you guys make it to tomorrow's varnishkafka meeting with Snaps? (see Google Calendar)
[15:15:25] RECOVERY - SSH on virt4 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:15:35] RECOVERY - Host virt4 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms
[15:18:41] New patchset: Reedy; "test2wiki to 1.22wmf10" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73180
[15:19:16] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73180
[15:20:05] RECOVERY - Packetloss_Average on analytics1006 is OK: OK: packet_loss_average is -0.22630703125
[15:30:20] mark, hi, could you +2 this tiny rename pls? Analytics wants it to simplify their life. https://gerrit.wikimedia.org/r/#/c/73027/
[15:35:00] blame the analytics boys for simplifying their life ;)
[15:35:10] always!
[15:38:46] New review: Akosiaris; "(1 comment)" [operations/puppet/cdh4] (master) C: 2; - https://gerrit.wikimedia.org/r/71569
[15:53:50] New review: Ottomata; "(1 comment)" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/71569
[15:54:03] akosiaris: ^
[21:18:29] i believe i gave the go ahead today but this after I cleared sending it to eqiad - you were on that email
[21:18:32] ok, but we cannot unplug network gear without the netadmins doing things on thier end
[21:18:32] their even
[21:18:34] oh..i talked to her about it before it was unplugged
[21:18:36] ok, so she said was ok to just unplug without them doing stuff, if so thats ok
[21:18:38] lets spin up two machiens for this
[21:18:41] i've got two, going to name them tmc1 tmc2
[21:18:41] temp memcache ;]
[21:18:43] no worries about why things broke, we'll write up an outage report after
[21:20:33] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:20:37] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[21:20:53] !log authdns-update for tmc1/2 and mgmt
[21:21:07] Logged the message, RobH
[21:21:08] hrmm
[21:22:22] RobH: so...
[21:22:25] !log spage Finished syncing Wikimedia installation... : E3 deploy latest GettingStarted and VE gender survey config
[21:22:35] RobH: don't worry about bringing up some tempcs
[21:22:36] Logged the message, Master
[21:22:37] err
[21:22:40] ........
[21:22:40] temp mcs
[21:22:42] ?
[21:22:47] there's lots of stuff fucked up right now
[21:22:48] tmc
[21:22:55] like redis replication
[21:23:00] and the lack of redis
[21:23:01] so we need to bring the old ones back?
[21:23:10] no. we're going to move test and the misc jobs
[21:29:08] !log depooling mw1017 to make it the new test.wm.o
[21:29:18] Logged the message, Master
[21:36:27] RECOVERY - RAID on ms-be5 is OK: OK: State is Optimal, checked 1 logical device(s)
[21:36:37] RECOVERY - DPKG on ms-be5 is OK: All packages OK
[21:36:37] RECOVERY - Disk space on ms-be5 is OK: DISK OK
[21:36:55] !log deployed squid change to switch test to mw1017
[21:37:01] RECOVERY - swift-container-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[21:37:02] !log authdns update for zinc to move internal wiki
[21:37:05] Logged the message, Master
[21:37:11] RECOVERY - swift-object-server on ms-be5 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[21:37:14] Logged the message, RobH
[21:37:21] RECOVERY - swift-object-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[21:37:28] !log move internal ip, not wiki. bleh
[21:37:31] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[21:37:31] RECOVERY - swift-account-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[21:37:31] RECOVERY - swift-account-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[21:37:31] RECOVERY - swift-object-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[21:37:31] RECOVERY - swift-object-auditor on ms-be5 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[21:37:31] RECOVERY - swift-container-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[21:37:31] RECOVERY - swift-container-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[21:37:43] Logged the message, RobH
[21:38:42] One of our users is having troubles to log into gerrit as he has unknowingly received two LDAP accounts whose cns agree (non-case-sensitively). Whom could he contact to get his LDAP accounts fixed?
[21:39:11] RECOVERY - NTP on ms-be5 is OK: NTP OK: Offset 0.06268358231 secs
[21:41:13] New patchset: Mattflaschen; "Enable GuidedTour and VE EventLogging on wikis with survey." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73338
[21:43:17] qchris: could you send a mail to the "requests" address in the topic
[21:43:27] AaronSchulz: around?
[21:43:36] AaronSchulz: we could use your help
[21:43:44] mutante: Thanks.
[21:43:55] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73338
[21:45:32] New patchset: Dzahn; "change dsh group testwikipedia from srv193 to mw1017" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73339
[21:46:01] RECOVERY - swift-account-reaper on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[21:46:01] RECOVERY - swift-account-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[21:46:10] anyone with access to the api logs who can grep something for me?
[21:48:01] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: Connection refused
[21:48:23] YuviPanda, sure what?
[21:48:27] !log spage synchronized wmf-config/InitialiseSettings.php 'Config changes for VE survey'
[21:48:45] Logged the message, Master
[21:48:45] spagewmf: too late, MaxSem is on it
[21:48:50] New review: Dzahn; "< Ryan_Lane> !log deployed squid change to switch test to mw1017" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/73339
[21:48:51] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73339
[21:48:55] spagewmf: thanks, though. will poke next time :)
[21:49:43] New patchset: Asher; "moving misc::maintenance::update_flaggedrev_stats and misc::maintenance::geodata to eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73340
[21:50:29] New patchset: Dzahn; "delete raw-nagios-host-list dsh group file, outdated, not in use, replace with icinga list if needed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73341
[21:52:36] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73340
[21:54:57] New patchset: MaxSem; "Remove device detection from bits" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73342
[21:54:58] New patchset: MaxSem; "Switch testwiki to mw1017 in Varnish too" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73343
[21:55:13] Ryan_Lane: Yeah just read update-special-pages and it's pretty ridiculous
[21:55:24] RoanKattouw: yeah. wtf?
[21:55:39] I know
[21:56:01] RoanKattouw: so, you know what's going on right?
[21:56:10] RoanKattouw: want to help us move the remaining jobs to eqiad?
[21:58:10] What jobs?
[21:58:10] greg-g, we're running just a tad late on the deploy, probably 5-10 minutes.
[21:58:38] New patchset: MaxSem; "Remove device detection from bits" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73342
[21:58:40] superm401: well fine then.
[21:58:43] :P
[22:00:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:01:59] RoanKattouw: that's what all of the US is asking, man
[22:02:11] New patchset: MaxSem; "Switch testwiki to mw1017 in Varnish too" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73343
[22:02:15] Ryan_Lane, https://gerrit.wikimedia.org/r/73343 is also needed ^^^
[22:02:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[22:04:33] New review: Dzahn; "if ever needed again can be recreated by script if pointed to neon" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/73341
[22:04:34] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73341
[22:04:41] New patchset: Lcarr; "two fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73344
[22:05:50] New review: Dzahn; "keep the ❤'s" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73344
[22:07:41] MaxSem: ah, right, forgot about mobile
[22:10:10] New patchset: Asher; "moving misc::maintenance::refreshlinks to eqiad." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73346
[22:10:51] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73343
[22:12:04] New patchset: Asher; "moving misc::maintenance::refreshlinks to eqiad." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73346
[22:12:26] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73346
[22:13:26] New review: Lcarr; "Don't worry, the hearts are staying forever!!!!" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/73344
[22:13:27] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73344
[22:13:35] :) mutante++
[22:13:40] hehe:)
[22:14:15] New patchset: Tim Starling; "Use eqiad memcached servers for scripts running in pmtpa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73347
[22:14:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:14:43] wanted to scare peter
[22:16:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[22:17:21] MaxSem: your change should work now (and mobile should be fixed for test)
[22:18:14] Ryan_Lane, still looks slow:(
[22:18:26] it may still be running on one
[22:18:35] or two
[22:18:36] :)
[22:19:06] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73347
[22:19:30] New patchset: Asher; "if creating /home/mwdeploy/refreshlinks dir, also better create /home/mwdeploy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73349
[22:19:51] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73349
[22:22:15] New patchset: Reedy; "Update noc symlinks for display" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73350
[22:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:24:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[22:26:54] New patchset: Springle; "simplewiki change_tags indexes updated, bug 40867" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73353
[22:27:46] hey ops, two things from my scap. 1. "snapshot3: sudo: no tty present and no askpass program specified" from all 4 snapshotN hosts.
[22:29:03] Change merged: Springle; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73353
[22:30:06] 2. "cannot delete non-empty directory: php-1.22wmf2/.git/modules/extensions/WikiLove
[22:30:06] cannot delete non-empty directory: php-1.22wmf2/.git/modules/extensions" (and all its parents up to "cannot delete non-empty directory: php-1.22wmf2"
[22:31:48] 2 might because the rsync commands doesn't have the (dangerous) --force or --delete options.
[22:32:37] spagewmf: 1. RT-2644 , re-opening
[22:32:54] !log springle synchronized wmf-config/InitialiseSettings.php 'simplewiki change_tags indexes updated, bug 40867'
[22:32:54] New patchset: Tim Starling; "Use eqiad memcached servers in pmtpa also" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73354
[22:32:57] Logged the message, Master
[22:33:20] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73354
[22:33:48] https://wikitech.wikimedia.org/wiki/Scap#All-script
[22:34:03] mutante: you once +1'd https://gerrit.wikimedia.org/r/33713 , wanna also +2 now that Asher said it's ok? :) pleeeeeeease
[22:35:11] springle: https://wikitech.wikimedia.org/wiki/Deploy
[22:35:22] (possibly too detailed)
[22:36:01] !log tstarling synchronized wmf-config/twemproxy-pmtpa.yaml
[22:36:11] Logged the message, Master
[22:36:25] !log tstarling synchronized wmf-config/twemproxy.yaml
[22:36:36] Logged the message, Master
[22:36:46] Nemo_bis: but it has to be moved to terbium , right
[22:37:13] mutante: I think not, because it must hit the Tampa slaves
[22:37:24] or that's what I understood
[22:37:47] springle: You might want to consider using !log in here when you're making schema changes and such
[22:39:07] !log tstarling synchronized wmf-config/mc.php
[22:39:18] Logged the message, Master
[22:40:09] !log tstarling restarted twemproxy on all servers
[22:40:20] Logged the message, Master
[22:40:42] Reedy, ok
[22:40:57] !log tstarling synchronized wmf-config
[22:41:07] Logged the message, Master
[22:41:10] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 61704 bytes in 0.776 second response time
[22:41:41] springle: It's useful for other people to know stuff is going on, incase of any issues that seem related
[22:42:45] ie !log Updating change_tags indexes on simplewiki
[22:42:58] then !log change_tag index updates completed on simplewiki
[22:44:19] fair enough
[22:47:50] PROBLEM - Puppet freshness on grosley is CRITICAL: No successful Puppet run in the last 10 hours
[22:52:15] New patchset: Ryan Lane; "Update dsh group for mobile caches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73355
[22:52:37] !log updating change_tags indexes on remaining wikis in bug 40867 comment 6
[22:52:48] Logged the message, Master
[22:54:24] MaxSem: ok, so… the dsh group was wrong. mobile caches are on new hosts. I'm force running puppet on the new systems
[22:54:32] heh:)
[22:54:48] I also updated the dsh group
[22:55:00] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73355
[22:55:43] MaxSem: now it's fixed :)
[22:55:50] PROBLEM - Puppet freshness on mw56 is CRITICAL: No successful Puppet run in the last 10 hours
[22:56:36] {{confirmed}}
[23:02:28] Reedy, MaxSem, anyone I see regular "Fatal error: Call to undefined method Solarium_Result_Update::numRows() at /usr/local/apache/common-local/php-1.22wmf9/extensions/GeoData/solrupdate.php on line 182" , shall I file a bug?
[23:02:46] Please
[23:02:54] * Reedy throws something at MaxSem
[23:03:30] Reedy, throw it at those who moved it from a host where it worked:P
[23:04:12] New patchset: Dzahn; "set variables for etherpad_lite as suggested in labs docs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73358
[23:05:21] MaxSem: You probably should still prevent the fatals ;)
[23:05:31] MaxSem, Reedy bug 51207
[23:05:59] though, loopings
[23:06:49] fffffuuuuuu
[23:07:02] New patchset: Dzahn; "set variables for etherpad_lite as suggested in labs docs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73358
[23:07:14] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[23:07:14] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[23:07:24] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours
[23:07:54] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours
[23:07:57] nah, the problem wasn't with a move to hume
[23:08:05] } while ( $res && $res->numRows() > 0 );
[23:08:06] :D
[23:09:22] Reedy, lawl
[23:09:27] that will also fail
[23:09:41] New patchset: Pyoungmeister; "removing dsc from analytics contact group" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73359
[23:10:57] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73359
[23:12:57] New patchset: Dzahn; "set variables for etherpad_lite as suggested in labs docs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73358
[23:13:46] ehm.. Code Review 500 Internal server error
[23:13:56] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73358
[23:14:03] but.. can't reproduce.. shrug
[23:20:14] Anyone, should I file "clean out /usr/local/apache/common/php-1.22wmf2 on production machines" in RT or in bugzilla Wikimedia component? I assume the latter
[23:21:57] i figure latter, usually a dev/deployer would clean that up right?
[23:22:13] if its goign to require an ops person to do it, RT.
[23:22:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:22:35] spagewmf: rt ticket please
[23:22:39] I'll do it today or tomorrow
[23:22:45] based on the time, probably tomorrow
[23:23:01] greg-g, can I push a quick fix for GeoData fatal?
[23:24:16] RobH well a regular scap/sync-dir won't clean out the .git objects and most deployers (well, me anyway :) ) don't know how to run a dsh job. notpeter, will do.
[23:24:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[23:24:25] not urgent
[23:24:38] springle: yea, peter is right then, cool
[23:24:46] ack
[23:24:49] spagewmf: even
[23:24:52] sorry spring ;]
[23:25:04] RobH: do you even lift irc, bro?
[23:25:16] "springle" makes me hungry. taste the salty rainbow :)
[23:25:56] notpeter: yes. http://i.imgur.com/hEowN.jpg
[23:26:05] http://weknowmemes.com/wp-content/uploads/2013/06/im-so-out-of-shape.jpg
[23:31:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:32:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time
[23:33:09] RobH: clearly, you do not lift.
[23:33:19] New patchset: Dzahn; "comment mod_rewrite to quick fix duplicate definition when using with another class defining the same" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73365
[23:33:37] MaxSem: sorry I didn't see it until now. Yes.
[23:33:48] MaxSem: quickly, of course, since we're over LD time :)
[23:33:55] notpeter: do to! http://24.media.tumblr.com/230058fea2a482dd1cf9d12eaea3b837/tumblr_mgh60w8BqX1qcbo9lo1_1280.jpg
[23:34:04] too even.
[23:34:05] MaxSem: also, I'm heading out shortly, can you add it to the Deployment calendar for the LD today
[23:34:09] thx
[23:34:27] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73365
[23:34:50] RobH: I'll put you down for "maybe lifts"
[23:34:57] \o/
[23:37:40] !log catrope Started syncing Wikimedia installation... : Update VE to master
[23:37:51] Logged the message, Master
[23:38:42] greg-g, collided with Roan, will do tomorrow or someting
[23:40:09] MaxSem: not tomorrow :)
[23:40:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:41:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.141 second response time
[23:42:15] MaxSem: so, to be fair, people tried to move your job off of hume before and move it back because it was broken
[23:42:20] New patchset: Dzahn; "update apache-fast-test to use mw1070 instead of srv193 as default test host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73367
[23:42:32] and it was moved again, and partially fixed
[23:44:49] New patchset: Dzahn; "update apache-fast-test to use mw1017 instead of srv193 as default test host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73367
[23:44:56] !log catrope Finished syncing Wikimedia installation... : Update VE to master
[23:45:06] Logged the message, Master
[23:45:08] Ryan_Lane, the problem wan't with the terbium move as I later admitted:P
[23:45:16] :)
[23:46:51] notpeter , I filed RT 5455 , no rush
[23:47:16] spagewmf: thanks!
[23:48:44] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73367