[00:00:57] jamesofur, I see: https://en.wikisource.org/wiki/Special:Contributions/Jamesofur :D [00:02:40] MaxSem: My editing history (on either of my accounts :P ) does not give a very good sign of my reading or diff usage :P [00:03:02] (03CR) 10Quiddity: "Where exactly do the "wmgMFRemovableClasses => extracts" get used currently?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126226 (owner: 10Prtksxna) [00:05:18] I really should get back to my wikisource project, sometime... So many projects! >.< [00:19:59] (03PS14) 10BryanDavis: [WIP] Configure scap master and clients in beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674 [00:26:12] mwalker & Reedy, https://gerrit.wikimedia.org/r/126880 [00:59:04] yurik: why wouldn't we? [00:59:44] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [01:42:25] !log xtrabackup db63 to db60 [01:42:35] Logged the message, Master [02:03:46] !log stop mysqld on db35 (m1) for decom [02:03:54] Logged the message, Master [02:07:02] (03PS1) 10Springle: Remove db35 from m1. [operations/puppet] - 10https://gerrit.wikimedia.org/r/126898 [02:08:53] (03CR) 10Springle: [C: 032] Remove db35 from m1. [operations/puppet] - 10https://gerrit.wikimedia.org/r/126898 (owner: 10Springle) [02:11:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2911 MB (3% inode=99%): [02:18:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3450 MB (3% inode=99%): [02:22:43] * springle kicks neon [02:33:49] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-17 02:33:47+00:00 [02:33:55] Logged the message, Master [03:00:44] RECOVERY - Disk space on virt0 is OK: DISK OK [03:02:27] !log LocalisationUpdate completed (1.23wmf22) at 2014-04-17 03:02:25+00:00 [03:02:33] Logged the message, Master [03:06:29] !log deployed Parsoid 0bccf02c (deploy SHA 5e25f3b05) @ 1:30 pm PST, Apr 16th, 2014 [03:06:35] Logged the message, Master [03:49:24] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 17 03:49:19 UTC 2014 (duration 49m 18s) [03:49:29] Logged the message, Master [04:00:00] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [04:18:20] PROBLEM - Disk space on dataset1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:37:38] (03CR) 10Ori.livneh: [C: 032] Remove unneeded priority settings [operations/puppet] - 10https://gerrit.wikimedia.org/r/126828 (owner: 10Ori.livneh) [05:55:43] (03PS2) 10Nemo bis: Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [06:00:35] (03PS3) 10Nemo bis: Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [06:00:55] (03CR) 10Nemo bis: [C: 031] "Alright, did the bureaucracy for you" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [06:41:07] (03CR) 10Dzahn: "since Change-Id: I6b9c47055e7 (also see Change-Id: I6d6250e69) we replaced virtual packages and i think this is one that pulls ttf-dejavu" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126834 (owner: 10Brian Wolff) [06:44:04] (03PS3) 10Dzahn: Add ttf-dejavu to image scalers for "DejaVu (Sans|Serif) Condensed". 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/126834 (owner: 10Brian Wolff) [06:44:43] (03PS4) 10Dzahn: Add ttf-dejavu-core,ttf-dejavu-extra to image scalers for "DejaVu (Sans|Serif) Condensed". [operations/puppet] - 10https://gerrit.wikimedia.org/r/126834 (owner: 10Brian Wolff) [06:55:02] (03PS5) 10Giuseppe Lavagetto: Add ttf-dejavu-core,ttf-dejavu-extra to image scalers for "DejaVu (Sans|Serif) Condensed". [operations/puppet] - 10https://gerrit.wikimedia.org/r/126834 (owner: 10Brian Wolff) [06:56:05] (03CR) 10Giuseppe Lavagetto: [C: 032] Add ttf-dejavu-core,ttf-dejavu-extra to image scalers for "DejaVu (Sans|Serif) Condensed". [operations/puppet] - 10https://gerrit.wikimedia.org/r/126834 (owner: 10Brian Wolff) [07:01:00] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [07:09:36] (03CR) 10Dzahn: [C: 032] remove virt15 from DHCP, decom [operations/puppet] - 10https://gerrit.wikimedia.org/r/126250 (owner: 10Dzahn) [07:42:00] (03PS1) 10Dzahn: decom db35,db38, remove from dsh, dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/126928 [07:43:31] (03CR) 10Dzahn: [C: 032] decom db35,db38, remove from dsh, dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/126928 (owner: 10Dzahn) [07:45:14] !log restarting gitblit [07:45:20] Logged the message, Master [07:47:53] !log db35,db38, stop puppet and salt, revoke certs,keys [07:47:59] Logged the message, Master [08:02:46] (03PS1) 10Dzahn: remove lvs1-4 from dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/126929 [08:04:50] (03CR) 10Dzahn: [C: 032] remove lvs1-4 from dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/126929 (owner: 10Dzahn) [08:06:00] (03CR) 10Dzahn: [C: 032] rm wap.wikipedia.org apache site [operations/puppet] - 10https://gerrit.wikimedia.org/r/126227 (owner: 10Dzahn) [08:26:57] (03CR) 10Odder: [C: 031] Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [08:28:54] !log db35,db38 - shutdown [08:29:00] Logged the message, Master [08:32:40] (03PS1) 10Dzahn: add wiktionary.eu, link to wiktionary.org [operations/dns] - 10https://gerrit.wikimedia.org/r/126932 [08:33:00] twkozlowski: ^ [08:33:31] \o/ [08:33:42] (03CR) 10Dzahn: "please check status on #7304" [operations/dns] - 10https://gerrit.wikimedia.org/r/126932 (owner: 10Dzahn) [08:47:41] PROBLEM - Disk space on labstore1001 is CRITICAL: DISK CRITICAL - free space: /exp/dumps 379584 MB (3% inode=99%): [08:56:08] (03PS1) 10Dzahn: remove sumanah's LDAP admin permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/126935 [08:58:51] (03PS2) 10Dzahn: remove sumanah's LDAP admin permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/126935 [09:01:03] apergos: /ext/dumps on labstore1001 is nearly full apparently . I guess that is related to the wiki xml dumps ? :-) [09:02:03] (03CR) 10Dzahn: [C: 032] "with the key already being absent these can't be used anyways. and added a 'revoked' comment" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126935 (owner: 10Dzahn) [09:04:12] ugh really? [09:04:21] how full is nearly full? [09:04:58] (3% inode=99%): [09:05:14] let me rephrase that, how much space is it using? 
[09:07:50] what we should do is deal with this: https://bugzilla.wikimedia.org/show_bug.cgi?id=48894 [09:09:06] dont know, just relaying the icinga notification :D [09:09:17] saying there is 'only' 380GB of disk space left [09:10:52] it just cares about percentages, the larget the disk the earlier the warning [09:13:03] well that amount of space will get eaten over time because we don't cap the number of pageview files, which is what that bug is about [09:13:04] (03PS1) 10Odder: Redirect wiktionary.eu to www.wiktionary.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126937 [09:13:16] (03CR) 10jenkins-bot: [V: 04-1] Redirect wiktionary.eu to www.wiktionary.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126937 (owner: 10Odder) [09:17:23] (03CR) 10Dzahn: "ok, so i just looked at bast1001 in site.pp for something else and i see this:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116019 (owner: 10Hoo man) [09:26:25] (03CR) 10Alexandros Kosiaris: [C: 032] Sysctl: make the default priority 70 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126839 (owner: 10Ori.livneh) [09:31:12] (03CR) 10Gilles: [C: 031] Add ttf-kochi-mincho and ttf-kochi-gothic to imagescalers [operations/puppet] - 10https://gerrit.wikimedia.org/r/126729 (owner: 10Reedy) [09:32:05] (03PS2) 10Odder: Redirect wiktionary.eu to www.wiktionary.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126937 [09:36:41] RECOVERY - Disk space on labstore1001 is OK: DISK OK [09:39:05] (03PS1) 10Dzahn: remove admins::restricted from lucene role [operations/puppet] - 10https://gerrit.wikimedia.org/r/126939 [09:40:07] (03CR) 10jenkins-bot: [V: 04-1] remove admins::restricted from lucene role [operations/puppet] - 10https://gerrit.wikimedia.org/r/126939 (owner: 10Dzahn) [09:40:50] (03PS2) 10Dzahn: remove admins::restricted from lucene role [operations/puppet] - 10https://gerrit.wikimedia.org/r/126939 [09:43:43] (03CR) 10Dzahn: "this is what this removes" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126014 (owner: 10Dzahn) [09:47:31] (03PS1) 10Dzahn: remove admins::restricted from terbium,fluorine [operations/puppet] - 10https://gerrit.wikimedia.org/r/126941 [09:49:24] (03CR) 10Dzahn: "also see: Change-Id: Iad35d5707dc" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116019 (owner: 10Hoo man) [10:01:31] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [10:11:19] !log lvs1-6 - disable puppet,salt,revoke certs,keys [10:11:25] Logged the message, Master [10:15:55] niiiiice [10:15:57] !log re-deleting unaccepted salt keys for virt2,5-11 [10:16:05] Logged the message, Master [10:35:27] (03CR) 10Matanya: "what about hooft in esams?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126941 (owner: 10Dzahn) [10:38:41] apergos: can you please comment on this one? ^ does hooft have private data? 
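A note on the percentage-based disk check discussed above: because the threshold is a percentage rather than an absolute amount, a large filesystem alerts while it still has plenty of room in absolute terms. A quick back-of-the-envelope sketch in Python using the figures from the labstore1001 alert (the actual Icinga check command is not shown in the log):

def percent_free(free_mb, total_mb):
    return 100.0 * free_mb / total_mb

# Figures from the icinga alert above: 379584 MB free at 3%.
free_mb = 379584
total_mb = free_mb / 0.03  # implies a filesystem of roughly 12.7 TB
print("total: %.1f TB, free: %.1f GB (%.0f%% free)" % (
    total_mb / 1e6, free_mb / 1e3, percent_free(free_mb, total_mb)))

So the "3% free" warning on this volume still corresponds to about 380 GB, which is the point being made in the conversation.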
[10:40:47] (03PS1) 10Hashar: contint: remove ruby-bundler outdated package [operations/puppet] - 10https://gerrit.wikimedia.org/r/126953 [10:42:00] !log lvs1, lvs2 shutdown [10:42:05] Logged the message, Master [10:42:24] (03CR) 10Matanya: [C: 031] remove sudo::appserver from bastions [operations/puppet] - 10https://gerrit.wikimedia.org/r/126014 (owner: 10Dzahn) [10:43:09] (03CR) 10Hashar: [C: 031 V: 032] "Found out that some Jenkins jobs was falling because of the old bundle version :-D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126953 (owner: 10Hashar) [10:43:29] not by design, and note that mortals and restricted already have access over there now [10:48:01] (03PS1) 10Dzahn: remove lvs1-6 [operations/dns] - 10https://gerrit.wikimedia.org/r/126954 [10:50:11] (03PS2) 10Dzahn: remove lvs1-6 lvs1-6.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/126954 [10:51:31] (03PS1) 10Faidon Liambotis: ganglia_new: add esams to Swift's sites [operations/puppet] - 10https://gerrit.wikimedia.org/r/126956 [10:52:02] (03CR) 10Faidon Liambotis: [C: 032] ganglia_new: add esams to Swift's sites [operations/puppet] - 10https://gerrit.wikimedia.org/r/126956 (owner: 10Faidon Liambotis) [10:53:43] wow there they go... bye bye lvses [10:55:34] :D [10:56:13] !log lvs3,lvs4,lvs5,lvs6 - shutdown [10:56:19] Logged the message, Master [10:56:40] PROBLEM - Host ms-fe.pmtpa.wmnet is DOWN: CRITICAL - Network Unreachable (10.2.1.27) [10:56:55] heh [10:57:17] uhm [10:58:11] ACKNOWLEDGEMENT - LVS HTTP IPv4 on ms-fe.pmtpa.wmnet is CRITICAL: Connection timed out daniel_zahn LVS have been shutdown, service IPs removed [10:58:25] had removed the other monitoring but not swift [10:58:50] heh [10:58:56] (just got the page) [11:01:22] (03PS1) 10Dzahn: remove ms-fe.pmtpa monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/126957 [11:01:45] sorry, that was just left because i once thought we'd keep swift longer [11:01:50] nope [11:01:54] kill it! [11:02:06] yep, ok [11:02:12] killing it with that change above [11:02:22] the service IP/monitoring I mean [11:02:26] yea [11:02:27] the servers we can keep for another week or so [11:02:36] when robh/chris go there [11:02:49] ok [11:03:29] (03CR) 10Dzahn: [C: 032] "this was the last "pmtpa" in here" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126957 (owner: 10Dzahn) [11:04:20] (03PS5) 10Reedy: Remove further pmtpa remnants [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126673 [11:04:33] (03CR) 10Reedy: [C: 032] "Bye bye tampa, bye bye." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126673 (owner: 10Reedy) [11:04:40] (03Merged) 10jenkins-bot: Remove further pmtpa remnants [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126673 (owner: 10Reedy) [11:05:54] !log reedy synchronized wmf-config/ 'I290bd1ea628563646c02651041fa2cec4a320b56' [11:06:01] Logged the message, Master [11:18:56] (03PS2) 10Zfilipin: contint: remove ruby-bundler outdated package [operations/puppet] - 10https://gerrit.wikimedia.org/r/126953 (owner: 10Hashar) [11:24:44] (03PS3) 10Zfilipin: contint: remove ruby-bundler outdated package [operations/puppet] - 10https://gerrit.wikimedia.org/r/126953 (owner: 10Hashar) [11:25:53] (03CR) 10Zfilipin: [C: 031] contint: remove ruby-bundler outdated package [operations/puppet] - 10https://gerrit.wikimedia.org/r/126953 (owner: 10Hashar) [11:32:08] (03PS1) 10Faidon Liambotis: ganglia: add cluster Swift esams, remove Ceph esams [operations/puppet] - 10https://gerrit.wikimedia.org/r/126960 [11:33:09] (03PS2) 10Faidon Liambotis: ganglia: add group Swift esams, remove Ceph esams [operations/puppet] - 10https://gerrit.wikimedia.org/r/126960 [11:33:28] (03CR) 10Faidon Liambotis: [C: 032 V: 032] ganglia: add group Swift esams, remove Ceph esams [operations/puppet] - 10https://gerrit.wikimedia.org/r/126960 (owner: 10Faidon Liambotis) [11:46:54] (03PS1) 10Dzahn: add role ldap operations on silver [operations/puppet] - 10https://gerrit.wikimedia.org/r/126961 [11:57:23] (03CR) 10Dzahn: [C: 032] bugzilla,make Apache SSL CipherSuite configurable [operations/puppet] - 10https://gerrit.wikimedia.org/r/126204 (owner: 10Dzahn) [12:01:01] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:01:47] ignore the upcoming swift esams alerts [12:01:51] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [12:03:12] PROBLEM - swift-account-replicator on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:03:21] PROBLEM - Swift HTTP backend on ms-fe3002 is CRITICAL: Connection refused [12:03:21] PROBLEM - swift-container-updater on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [12:03:21] PROBLEM - swift-account-auditor on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:03:22] PROBLEM - swift-account-server on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:03:22] PROBLEM - swift-account-replicator on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:03:22] PROBLEM - swift-container-server on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:03:22] PROBLEM - swift-object-auditor on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:03:22] PROBLEM - swift-object-replicator on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:03:23] PROBLEM - swift-object-updater on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:03:23] PROBLEM - swift-container-auditor on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args 
^/usr/bin/python /usr/bin/swift-container-auditor [12:03:24] PROBLEM - swift-object-auditor on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:03:24] PROBLEM - swift-object-server on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:03:25] PROBLEM - swift-account-reaper on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:03:25] PROBLEM - swift-container-updater on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [12:03:31] PROBLEM - swift-account-server on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:03:31] PROBLEM - swift-object-replicator on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:03:41] PROBLEM - swift-container-server on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:03:41] PROBLEM - swift-object-server on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:03:41] PROBLEM - swift-container-server on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:03:41] PROBLEM - swift-account-replicator on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:03:41] PROBLEM - swift-account-auditor on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:03:41] PROBLEM - swift-container-updater on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [12:03:41] PROBLEM - swift-object-updater on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:03:42] PROBLEM - swift-object-replicator on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:03:42] PROBLEM - swift-container-replicator on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:03:43] PROBLEM - swift-container-auditor on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:03:51] PROBLEM - swift-object-updater on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:03:51] PROBLEM - swift-object-auditor on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:03:51] PROBLEM - swift-container-auditor on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:03:51] PROBLEM - swift-object-replicator on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:03:51] PROBLEM - swift-container-replicator on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:03:52] PROBLEM - swift-account-auditor on ms-be3003 is CRITICAL: PROCS CRITICAL: 
0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:03:52] PROBLEM - swift-object-updater on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:03:53] PROBLEM - swift-account-replicator on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:04:01] PROBLEM - Swift HTTP frontend on ms-fe3002 is CRITICAL: Connection refused [12:04:01] PROBLEM - swift-container-replicator on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:04:01] PROBLEM - Swift HTTP frontend on ms-fe3001 is CRITICAL: Connection refused [12:04:01] PROBLEM - swift-account-server on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:04:01] PROBLEM - swift-container-updater on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [12:04:01] PROBLEM - swift-account-reaper on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:04:11] PROBLEM - Swift HTTP backend on ms-fe3001 is CRITICAL: Connection refused [12:04:11] PROBLEM - swift-account-reaper on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:04:11] PROBLEM - swift-account-reaper on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:04:11] PROBLEM - swift-account-server on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:04:11] PROBLEM - swift-object-server on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:04:11] PROBLEM - swift-object-server on ms-be3004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:04:12] PROBLEM - swift-account-auditor on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:04:12] PROBLEM - swift-container-replicator on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:04:13] PROBLEM - swift-container-server on ms-be3002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:04:13] PROBLEM - swift-container-auditor on ms-be3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:04:14] PROBLEM - swift-object-auditor on ms-be3001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:05:30] heh [12:05:34] this is noisy [12:05:43] we should probably fix the checks at some point... 
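For context on the burst of swift PROBLEM and RECOVERY lines above: each of those is a process-count check that scans command lines for a regex and goes CRITICAL when zero matching processes are found, so a planned restart of the swift daemons produces exactly this kind of flood followed by recoveries a few minutes later. The production checks appear to be the standard Nagios check_procs plugin run over NRPE, judging from the output format; the Python below is only a minimal sketch of the idea, not the deployed command.

import os
import re
import sys

def count_procs(pattern):
    """Count processes whose /proc/<pid>/cmdline matches a regex."""
    rx = re.compile(pattern)
    count = 0
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/cmdline' % pid, 'rb') as f:
                cmdline = f.read().replace(b'\0', b' ').decode('utf-8', 'replace')
        except IOError:
            continue  # process exited while we were looking
        if rx.search(cmdline):
            count += 1
    return count

pattern = r'^/usr/bin/python /usr/bin/swift-object-server'
n = count_procs(pattern)
print('%s: %d processes with regex args %s' % ('OK' if n else 'CRITICAL', n, pattern))
sys.exit(0 if n else 2)  # Nagios convention: 0 = OK, 2 = CRITICAL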
[12:09:41] RECOVERY - swift-object-server on ms-be3002 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:09:41] RECOVERY - swift-container-server on ms-be3004 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:09:41] RECOVERY - swift-container-server on ms-be3003 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:09:41] RECOVERY - swift-account-replicator on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:09:41] RECOVERY - swift-object-updater on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:09:41] RECOVERY - swift-account-auditor on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:09:41] RECOVERY - swift-container-updater on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [12:09:42] RECOVERY - swift-container-replicator on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:09:42] RECOVERY - swift-object-replicator on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:09:43] RECOVERY - swift-container-auditor on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:09:44] (03CR) 10Dzahn: [C: 031] Remove mysql client from bastionhost [operations/puppet] - 10https://gerrit.wikimedia.org/r/126027 (owner: 10Hoo man) [12:09:51] RECOVERY - swift-object-auditor on ms-be3004 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:09:51] RECOVERY - swift-object-updater on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:09:51] RECOVERY - swift-container-auditor on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:09:51] RECOVERY - swift-account-auditor on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:09:51] RECOVERY - swift-container-replicator on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:09:52] RECOVERY - swift-object-replicator on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:09:52] RECOVERY - swift-object-updater on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:09:52] RECOVERY - swift-account-replicator on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:10:01] RECOVERY - Swift HTTP frontend on ms-fe3002 is OK: HTTP OK: HTTP/1.1 200 OK - 137 bytes in 0.196 second response time [12:10:01] RECOVERY - swift-container-replicator on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:10:01] RECOVERY - Swift HTTP frontend on ms-fe3001 is OK: HTTP OK: HTTP/1.1 200 OK - 137 bytes in 0.197 second response time [12:10:01] RECOVERY - swift-account-server on ms-be3003 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:10:01] RECOVERY - swift-container-updater on ms-be3003 is OK: PROCS OK: 1 process with regex args 
^/usr/bin/python /usr/bin/swift-container-updater [12:10:01] RECOVERY - swift-account-reaper on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:10:11] RECOVERY - Swift HTTP backend on ms-fe3001 is OK: HTTP OK: HTTP/1.1 200 OK - 343 bytes in 0.207 second response time [12:10:11] RECOVERY - swift-object-server on ms-be3003 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:10:11] RECOVERY - swift-account-reaper on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:10:11] RECOVERY - swift-account-server on ms-be3001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:10:11] RECOVERY - swift-account-reaper on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:10:12] RECOVERY - swift-object-auditor on ms-be3001 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:10:12] RECOVERY - swift-container-replicator on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [12:10:13] RECOVERY - swift-container-auditor on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:10:13] RECOVERY - swift-container-server on ms-be3002 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:10:14] RECOVERY - swift-object-server on ms-be3004 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [12:10:14] RECOVERY - swift-account-auditor on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:10:15] RECOVERY - swift-account-replicator on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:10:21] RECOVERY - Swift HTTP backend on ms-fe3002 is OK: HTTP OK: HTTP/1.1 200 OK - 343 bytes in 0.209 second response time [12:10:21] RECOVERY - swift-container-updater on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [12:10:21] RECOVERY - swift-account-replicator on ms-be3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [12:10:21] RECOVERY - swift-account-auditor on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [12:10:21] RECOVERY - swift-container-server on ms-be3001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [12:10:21] RECOVERY - swift-object-updater on ms-be3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [12:10:22] RECOVERY - swift-account-server on ms-be3002 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:10:22] RECOVERY - swift-object-replicator on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:10:23] RECOVERY - swift-object-auditor on ms-be3003 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:10:23] RECOVERY - swift-container-auditor on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:10:24] RECOVERY - swift-object-server on ms-be3001 is OK: PROCS OK: 101 processes with regex 
args ^/usr/bin/python /usr/bin/swift-object-server [12:10:24] RECOVERY - swift-object-auditor on ms-be3002 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [12:10:25] RECOVERY - swift-account-reaper on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [12:10:25] RECOVERY - swift-container-updater on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [12:10:31] RECOVERY - swift-account-server on ms-be3004 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [12:10:31] RECOVERY - swift-object-replicator on ms-be3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [12:25:11] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=ms-fe3002.esams.wmnet&m=cpu_report&s=by+name&mc=2&g=network_report&c=Swift+esams [12:25:18] and this folks is called a saturated gbps [12:25:35] (03CR) 10Dzahn: "do you still want this? mind if i amend? i'd add the recurse attribute as suggested above. so it's just a single file resource but you can" [operations/puppet] - 10https://gerrit.wikimedia.org/r/76678 (owner: 10Tim Starling) [12:27:51] urk [12:29:01] no, that's okay [12:29:05] I'm copying files over to esams [12:29:11] the faster it goes, the better [12:29:22] ah:) [12:29:32] all images? [12:29:39] how long do you expect it takes [12:30:05] <_joe_> paravoid: pretty saturated indeed [12:30:48] <_joe_> paravoid: we are just creating the thumbnails in esams, right? [12:31:59] not thumbs, originals [12:32:22] just to have another copy while tampa is on the move [12:32:26] in case eqiad burns down or something [12:32:30] <_joe_> uh, so it's cross-site replicated, ok [12:32:39] <_joe_> sorry, didn't know :) [12:33:46] cool, reassuring to have another copy [12:34:05] it will take less than a week [12:34:14] probably closer to 5 days [12:34:16] <_joe_> writes still go to eqiad, right? [12:35:08] <_joe_> ok, I'll look those details up on wikitech and in puppet [12:35:30] <_joe_> (and mediawiki/config, I guess) [12:36:17] everything goes to eqiad, yes [12:36:22] the esams one is not going to be used by production [12:36:48] we just have a python script running on copper (a random misc box) that copies all the files [12:37:06] it's a oneoff, but I'll probably keep it running on a loop or something [12:37:54] (03PS1) 10Dzahn: add wikisource.pl, link to wikisource.org [operations/dns] - 10https://gerrit.wikimedia.org/r/126968 [12:38:05] <_joe_> ok, less fancy than I've anticipated, more simple :) [12:38:26] yeah it's crude [12:38:29] but will do for now [12:38:47] the longer term plan is to set up a proper georeplicated swift cluster across the two DCs [12:39:00] swift supports geoclusters nowadays, understands regions/hierarchies etc. [12:39:10] the two DCs = eqiad & the new DC, not esams [12:40:18] <_joe_> paravoid: for obvious latency reasons as well as bandwidth costs, right? [12:40:37] latency mostly [12:40:47] and legal reasons [12:41:20] (03CR) 10Dzahn: "ideally link the Apache change over here" [operations/dns] - 10https://gerrit.wikimedia.org/r/126968 (owner: 10Dzahn) [12:41:30] <_joe_> oh, honestly never thought of the implications of syncronizing free content across borders :) [12:41:59] we don't do production data outside of the US [12:42:03] just caches, backups etc. 
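Regarding the copy job described a few lines up, the one-off Python script on copper that mirrors all originals from the eqiad Swift cluster to esams: the actual script is not in the log, but a minimal sketch of that approach with python-swiftclient, using placeholder auth URLs and credentials, could look like the following. In practice it would also need resumability and some rate limiting, since as noted above it can saturate a gigabit link on its own.

# Sketch of a one-off cross-datacenter copy loop for Swift originals.
# Auth URLs, account and key are placeholders, not production values.
import swiftclient
from swiftclient.exceptions import ClientException

src = swiftclient.Connection(authurl='https://swift-eqiad.example.org/auth/v1.0',
                             user='mw:media', key='SECRET')
dst = swiftclient.Connection(authurl='https://swift-esams.example.org/auth/v1.0',
                             user='mw:media', key='SECRET')

_, containers = src.get_account(full_listing=True)
for container in containers:
    name = container['name']
    try:
        dst.put_container(name)
    except ClientException:
        pass  # most containers will already exist on later passes
    _, objects = src.get_container(name, full_listing=True)
    for obj in objects:
        try:
            dst.head_object(name, obj['name'])
            continue  # already copied on a previous pass
        except ClientException as e:
            if e.http_status != 404:
                raise
        headers, body = src.get_object(name, obj['name'])
        dst.put_object(name, obj['name'], contents=body,
                       content_type=headers.get('content-type'))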
[12:42:26] because otherwise some country may come and ask us to remove a file/article/whatever [12:42:34] or at least that's my understanding of it [12:42:51] <_joe_> paravoid: yes it makes perfect sense. [12:43:20] <_joe_> I mean, not to operate under two different legislations [12:43:55] ah, while talking about images, do you know what this was? [12:44:03] imagedump.pmtpa.wmnet [12:44:22] i don't expect you will make imagedump.eqiad.wmnet? [12:45:51] I have no clue what this is [12:45:58] (03PS1) 10Odder: Redirect wikisource.pl to pl.wikisource.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126969 [12:46:04] it says "dump" so maybe apergos knows? [12:46:11] mutante: ^^ [12:47:11] (03CR) 10Odder: "See I9332650 for the Apache patchset." [operations/dns] - 10https://gerrit.wikimedia.org/r/126968 (owner: 10Dzahn) [12:47:38] nope, predates me [12:48:36] (03CR) 10Dzahn: [C: 04-2] "Dereckson, abandon? if not please get some more attention to it again, it's sitting in our queue forever otherwise i'm afraid" [operations/puppet] - 10https://gerrit.wikimedia.org/r/80760 (owner: 10Dereckson) [12:49:14] paravoid: apergos , then i'll just kill it, since it's pmtpa, thx [12:49:35] sounds great [12:51:28] twkozlowski: yes, cool [12:52:25] mutante: So I set DNS to ns1.wm.org and ns2.wm.org... when? [12:53:25] twkozlowski: in this case after the apache change and the dns change are live [12:53:34] since you have the working redirect.. right [12:53:39] OK, will keep an eye on it. [12:53:44] Yeah, I do. [12:53:49] nods [12:55:08] re [12:55:39] (03PS2) 10Dzahn: remove imagedump.pmtpa.wmnet [operations/dns] - 10https://gerrit.wikimedia.org/r/125949 [12:57:55] (03PS3) 10Dzahn: remove imagedump.pmtpa.wmnet [operations/dns] - 10https://gerrit.wikimedia.org/r/125949 [12:58:09] (03PS5) 10Manybubbles: Deploy experimental highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/123704 [12:58:51] (03CR) 10Manybubbles: "Now that If0df224b21fe589cc7dcdc7e3548d1b1693abb44 is in (and going to test sites) I'd like to get this deployed so we can try it out." [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/123704 (owner: 10Manybubbles) [13:00:33] (03CR) 10Manybubbles: [C: 031] "Chad, Andrew Otto, and I are the only folks I know who might go poking there. We only do that rarely but it is nice to be able to get in." [operations/puppet] - 10https://gerrit.wikimedia.org/r/126939 (owner: 10Dzahn) [13:00:58] (03CR) 10Dzahn: [C: 032] remove imagedump.pmtpa.wmnet [operations/dns] - 10https://gerrit.wikimedia.org/r/125949 (owner: 10Dzahn) [13:01:41] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [13:03:57] (03CR) 10Dzahn: "this would only stop them from doing that if they are not already in another admin class, like mortals, which they have if they are deploy" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126939 (owner: 10Dzahn) [13:06:05] (03PS1) 10Dzahn: remove rendering.pmtpa,rendering.svc.pmtpa [operations/dns] - 10https://gerrit.wikimedia.org/r/126971 [13:08:02] (03CR) 10Manybubbles: "Chad and I are deployers and Andrew is ops so we shouldn't need a new group." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/126939 (owner: 10Dzahn) [13:16:07] (03PS1) 10Dzahn: decom, remove db35,db38 [operations/dns] - 10https://gerrit.wikimedia.org/r/126972 [13:18:41] (03CR) 10Matanya: [C: 031] add role ldap operations on silver [operations/puppet] - 10https://gerrit.wikimedia.org/r/126961 (owner: 10Dzahn) [13:18:57] mutante: can you please verify formey only has ldap left? [13:24:47] matanya_: see the whole history in 6134 [13:24:58] it should only be that, yes [13:25:27] matanya_: well, strictly, speaking.. there is something [13:25:44] which is? [13:25:55] role::deployment::test [13:26:05] webserver::php5 [13:26:06] shrug [13:26:19] i was referring to php5 [13:26:28] webserver, almost certain just leftover from being svn [13:26:40] i guess it serves /something/ [13:26:46] but if it is svn, it is ok [13:26:48] deployment::test, dunno, but it's "test" [13:26:59] that is ryan's toy [13:27:01] iirc [13:27:21] gerrit.wikimedia.org [13:27:41] is the apache site [13:33:01] (03CR) 10Dzahn: [C: 032] add role ldap operations on silver [operations/puppet] - 10https://gerrit.wikimedia.org/r/126961 (owner: 10Dzahn) [13:37:52] ^ ok, that seems to have worked [13:37:58] (03CR) 10Ottomata: "admins::bastion sounds fine to me!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116019 (owner: 10Hoo man) [13:38:06] i can now use ldaplist on silver [13:38:13] apergos: [13:38:15] matanya: [13:38:37] (03CR) 10Ottomata: [C: 031] Deploy experimental highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/123704 (owner: 10Manybubbles) [13:38:49] so time to kill formey mutante [13:38:57] may i have the joy? :) [13:39:40] (03CR) 10Manybubbles: [C: 032 V: 032] "Good enough for me, I'll git-deploy this today and we'll start picking it up when we reinstall the nodes. I'll have to do quick rolling r" [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/123704 (owner: 10Manybubbles) [13:40:14] matanya: of making patches? sure:) [13:40:30] thanks, two minutes [13:40:45] i'm not gonna kill it right the second [13:40:49] no rush, but thanks [13:41:12] !log synced experimental highlighter to elasticsearch nodes - they'll pick it up on restart [13:41:18] Logged the message, Master [13:42:26] ottomata: morning! [13:46:45] mornin! [13:46:51] (03PS1) 10Matanya: formey: decom [operations/puppet] - 10https://gerrit.wikimedia.org/r/126976 [13:47:18] ottomata: I'm piggy backing a plugin deploy onto your server repartitions [13:47:24] ok! [13:47:33] do you need to restart all daemons? [13:48:31] which reminds me, i'm going to start moving shards off of 1016, s'ok? [13:49:18] ottomata: any news regarding emery? 
[13:49:47] ja, sqstat is off [13:50:00] i've made some commits to continue the decom, need to get erbium running on unicast udp2log stream [13:50:07] been busy with other stuff [13:50:13] but i think we can move forward with it [13:50:41] need to merge the unicast patch before [13:54:03] (03PS1) 10Matanya: formey:decom [operations/dns] - 10https://gerrit.wikimedia.org/r/126978 [14:02:35] !log reedy updated /a/common to {{Gerrit|I290bd1ea6}}: Remove further pmtpa remnants [14:02:40] Logged the message, Master [14:03:43] (03PS1) 10Reedy: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126979 [14:03:46] (03PS1) 10Reedy: testwiki to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126980 [14:03:47] (03PS1) 10Reedy: Wikipedias to 1.23wmf22 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126981 [14:03:50] (03PS1) 10Reedy: Rest of group0 to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126982 [14:03:58] (03CR) 10Reedy: [C: 032] Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126979 (owner: 10Reedy) [14:04:10] (03Merged) 10jenkins-bot: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126979 (owner: 10Reedy) [14:09:47] (03PS3) 10BBlack: Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 [14:14:59] (03PS1) 10Manybubbles: Elasticsearch site plugins [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/126986 [14:17:10] (03PS1) 10Bartosz Dziewoński: Remove $wmgUseMicroDesign [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126987 [14:20:44] manybubbles: you ok if I move shards off of 1016? [14:20:53] ottomata: go ahead! [14:21:06] k its going [14:24:31] (03PS1) 10Bartosz Dziewoński: Remove $wmgUsabilityEnforce [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126989 [14:42:39] (03PS1) 10Andrew Bogott: Added role::labs::lvm::biglogs [operations/puppet] - 10https://gerrit.wikimedia.org/r/126992 [14:43:03] Coren: ^ [14:43:41] (03CR) 10BryanDavis: [C: 031] Elasticsearch site plugins [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/126986 (owner: 10Manybubbles) [14:44:18] andrewbogott: The whole disk? [14:44:40] hm... [14:45:28] bd808: I'm around-ish for the SWAT but mobile. Hope that's OK. [14:48:33] James_F: Ok with me because I'm not doing the swat :) [14:49:39] (03CR) 10coren: [C: 031] "Might be better parametrizable, but that works." [operations/puppet] - 10https://gerrit.wikimedia.org/r/126992 (owner: 10Andrew Bogott) [14:49:44] (03CR) 10Hashar: "Random moods :-] That managed to get scap deployed on beta cluster but I have the feeling that much more work will need to be done for pro" (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674 (owner: 10BryanDavis) [14:51:41] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [14:52:21] James_F|Away, RoanKattouw_away: SWAT in about 10 minutes. Ready to test it to make sure it didn't break anything? [14:52:49] manybubbles: Unless you really want to, I'll take the SWAT this morning. [14:53:08] anomie: have fun! I'm happy to if you don't want to, though [14:53:41] manybubbles: I may as well. But then I may drop offline to concentrate on actual coding. 
[14:53:55] anomie: good man [14:55:21] PROBLEM - Varnish HTTP text-frontend on cp1053 is CRITICAL: Connection timed out [14:55:22] PROBLEM - Varnish HTTP text-frontend on cp1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:31] PROBLEM - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:38] I'm getting 502 errors. [14:55:41] PROBLEM - Varnish HTTP text-frontend on cp1067 is CRITICAL: Connection timed out [14:55:41] PROBLEM - Varnish HTTP text-frontend on cp1068 is CRITICAL: HTTP CRITICAL - No data received from host [14:55:51] PROBLEM - Varnish HTTP text-frontend on cp1055 is CRITICAL: Connection timed out [14:56:18] <_joe_> and this has to do with the number of 5xx we were seeing [14:56:21] PROBLEM - Varnish HTTP text-frontend on cp1066 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:22] PROBLEM - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: HTTP CRITICAL - No data received from host [14:56:31] PROBLEM - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:06] (03PS2) 10Andrew Bogott: Added role::labs::lvm::biglogs [operations/puppet] - 10https://gerrit.wikimedia.org/r/126992 [14:57:21] RECOVERY - Varnish HTTP text-frontend on cp1052 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 8.932 second response time [14:57:21] RECOVERY - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66493 bytes in 8.544 second response time [14:57:29] (03CR) 10Andrew Bogott: [C: 032] Added role::labs::lvm::biglogs [operations/puppet] - 10https://gerrit.wikimedia.org/r/126992 (owner: 10Andrew Bogott) [14:57:31] PROBLEM - Varnish HTTP text-frontend on cp1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:57] * anomie checks that his agent forwarding to tin is still functioning, not like Tuesday [14:58:31] PROBLEM - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL - No data received from host [14:58:32] <_joe_> mh can't see what's wrong with those varnishes [14:59:41] RECOVERY - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66373 bytes in 8.181 second response time [14:59:51] RECOVERY - Varnish HTTP text-frontend on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 9.242 second response time [15:00:21] RECOVERY - Varnish HTTP text-frontend on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 5.920 second response time [15:00:22] PROBLEM - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.040 second response time [15:01:21] RECOVERY - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66493 bytes in 6.095 second response time [15:01:31] RECOVERY - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66528 bytes in 6.197 second response time [15:01:51] RECOVERY - Varnish HTTP text-frontend on cp1068 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 8.284 second response time [15:02:32] Chase here, nickserv is giving me a fit, lots of paging but site seems up? 
Not sure what to do about it [15:03:00] <_joe_> Guest67437: I'm looking into it, not clear to me what's going on [15:04:10] !log reinstalling elastic1016 [15:04:17] Logged the message, Master [15:04:31] RECOVERY - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66405 bytes in 6.150 second response time [15:04:35] PROBLEM - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:04:39] <_joe_> ok, varnish on cp1053 is simply resetting connections [15:04:51] PROBLEM - Varnish HTTP text-frontend on cp1068 is CRITICAL: Connection timed out [15:04:55] James_F|Away, RoanKattouw_away: ping [15:05:20] * anomie waits for James_F|Away or RoanKattouw_away to be able to do their SWAT deploy [15:05:21] PROBLEM - Varnish HTTP text-frontend on cp1066 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:06:21] RECOVERY - Varnish HTTP text-frontend on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 7.744 second response time [15:06:22] RECOVERY - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66531 bytes in 5.020 second response time [15:07:01] PROBLEM - Host elastic1016 is DOWN: PING CRITICAL - Packet loss = 100% [15:07:20] 502s ? [15:07:21] PROBLEM - Varnish HTTP text-frontend on cp1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:07:23] https://www.mediawiki.org/wiki/ looks to be down [15:07:26] chrismcmahon: LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL [15:07:28] anomie: let's hold on any deploys for a little bit til this is sorted out [15:07:36] anomie: Here. [15:07:38] anomie: ^^^^ Where I said I was around but mobile. [15:07:41] <_joe_> bd808: eqiad is having problems [15:07:54] James_F: Making sure you were still around. But we're holding on apergos now. [15:08:11] <_joe_> I'm trying to figure out which problems, but I'm still *very* new to the environment [15:08:38] _joe_, i'm not sure I know much either, but maybe I can help [15:08:42] let's see! [15:08:52] RECOVERY - Varnish HTTP text-frontend on cp1068 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 9.282 second response time [15:09:05] hey [15:09:09] what's going on? 
[15:09:22] PROBLEM - Varnish HTTP text-frontend on cp1066 is CRITICAL: Connection timed out [15:09:24] eqiad varnish unhappiness I see [15:09:28] dunno, text-lb.eqiad is not happy [15:09:29] yeha [15:09:30] <_joe_> paravoid: varnishes in eqiad having troubles [15:09:31] PROBLEM - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:09:35] PROBLEM - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:09:35] <_joe_> I cannot figure out why [15:10:21] RECOVERY - Varnish HTTP text-frontend on cp1052 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 6.741 second response time [15:10:21] RECOVERY - Varnish HTTP text-frontend on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 5.634 second response time [15:10:31] RECOVERY - LVS HTTPS IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66536 bytes in 8.450 second response time [15:10:35] PROBLEM - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out [15:10:43] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Text%20caches%20eqiad&h=cp1055.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1397746916&v=19402&m=frontend.n_sess&vl=N&ti=N%20struct%20sess&z=large [15:11:01] fundraising I think [15:11:12] <_joe_> paravoid: what is that? number of sessions? [15:11:31] RECOVERY - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66406 bytes in 4.383 second response time [15:11:35] <_joe_> the timing fits [15:11:52] PROBLEM - Varnish HTTP text-frontend on cp1068 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:11:52] yes, too many requests [15:11:57] /wiki/Special:HideBanners?duration=1209600&category=fundraising [15:12:11] RECOVERY - Host elastic1016 is UP: PING OK - Packet loss = 0%, RTA = 1.55 ms [15:12:22] <_joe_> mh, but I've seen on graphite, the number of requests dropped [15:12:28] <_joe_> maybe that stat is confusing [15:12:30] (03CR) 10Ottomata: [C: 031] Elasticsearch site plugins [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/126986 (owner: 10Manybubbles) [15:13:09] (03CR) 10Manybubbles: [C: 032 V: 032] "Two +1s = +2, right? Deploying this now." 
[operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/126986 (owner: 10Manybubbles) [15:13:31] RECOVERY - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66494 bytes in 9.817 second response time [15:14:12] PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.13 [15:14:21] PROBLEM - puppet disabled on elastic1016 is CRITICAL: Connection refused by host [15:14:21] PROBLEM - check if dhclient is running on elastic1016 is CRITICAL: Connection refused by host [15:14:21] RECOVERY - Varnish HTTP text-frontend on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 3.497 second response time [15:14:21] PROBLEM - Varnish HTTP text-frontend on cp1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:14:21] PROBLEM - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is CRITICAL: HTTP CRITICAL - No data received from host [15:14:41] PROBLEM - Disk space on elastic1016 is CRITICAL: Connection refused by host [15:14:41] PROBLEM - DPKG on elastic1016 is CRITICAL: Connection refused by host [15:14:51] PROBLEM - RAID on elastic1016 is CRITICAL: Connection refused by host [15:14:51] PROBLEM - SSH on elastic1016 is CRITICAL: Connection refused [15:15:01] PROBLEM - check configured eth on elastic1016 is CRITICAL: Connection refused by host [15:15:21] RECOVERY - Varnish HTTP text-frontend on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 7.418 second response time [15:16:21] PROBLEM - Varnish HTTP text-frontend on cp1066 is CRITICAL: HTTP CRITICAL - No data received from host [15:16:52] RECOVERY - Varnish HTTP text-frontend on cp1068 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 9.724 second response time [15:17:37] !log updgraded site plugins on Elasticsearch nodes [15:17:43] Logged the message, Master [15:17:43] bd808: ^^^ whatson is pretty damn pretty [15:18:11] Yeah? I hadn't heard of it before. I'll have to check it out [15:18:21] RECOVERY - Varnish HTTP text-frontend on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 9.524 second response time [15:18:51] PROBLEM - Varnish HTTP text-frontend on cp1055 is CRITICAL: Connection timed out [15:19:31] PROBLEM - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:19:35] RECOVERY - Varnish HTTP text-frontend on cp1067 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 0.096 second response time [15:20:21] PROBLEM - Varnish HTTP text-frontend on cp1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:20:31] RECOVERY - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66493 bytes in 8.269 second response time [15:20:43] Do we need to worry about the email to SMS thing? [15:21:05] Is this going to be fixed soon? :/ [15:21:11] RECOVERY - Varnish HTTP text-frontend on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 0.403 second response time [15:21:16] pirsquared: being investigated. 
[15:21:21] https://web.archive.org/web/20140417151929/https://graphite.wikimedia.org/render/?title=HTTP%205xx%20Responses%20-8hours&from=-8hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=connected&target=color(cactiStyle(alias(reqstats.500,%22500%20resp/min%22)),%22red%22)&target=color(cactiStyle(alias(reqstats.5xx,%225xx%20resp/min%22)),%22blue%22) [15:21:22] <_joe_> pirsquared: we're working on it [15:21:27] ok [15:21:31] RECOVERY - LVS HTTP IPv4 on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66404 bytes in 5.419 second response time [15:21:50] ok [15:21:52] it's getting better [15:22:05] FR turned off banners [15:22:11] RECOVERY - Varnish HTTP text-frontend on cp1053 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 0.109 second response time [15:22:12] so this is a a single varnish that went crazy? [15:22:16] <_joe_> paravoid: that was the reason, then [15:22:20] it's /probably/ getting better, let's wait and see [15:22:29] still tons of banners though [15:22:31] <_joe_> chasemp: like the whole text cluster in eqiad [15:22:41] RECOVERY - Varnish HTTP text-frontend on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 0.003 second response time [15:22:43] ok that makes more sense and ouch [15:23:14] <_joe_> paravoid: ~ 6K RxURL for bannners on all the varnishes I'm controlling now [15:23:31] PROBLEM - Varnish HTTP text-frontend on cp1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:23:31] <_joe_> it was 10K some minutes ago [15:23:36] paravoid, _joe_: Please ping me when I'm clear to do the scheduled SWAT deploy (it's some VE/Math stuff). Thanks (: [15:25:21] RECOVERY - Varnish HTTP text-frontend on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 0.001 second response time [15:26:08] <_joe_> anomie: will do, either us or someone else [15:26:09] (03PS1) 10Faidon Liambotis: Revert "Enable CentralNotice CrossWiki Hiding" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126998 [15:26:32] (03CR) 10Faidon Liambotis: [C: 032] Revert "Enable CentralNotice CrossWiki Hiding" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126998 (owner: 10Faidon Liambotis) [15:26:51] PROBLEM - NTP on elastic1016 is CRITICAL: NTP CRITICAL: No response from NTP server [15:27:04] (03Merged) 10jenkins-bot: Revert "Enable CentralNotice CrossWiki Hiding" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126998 (owner: 10Faidon Liambotis) [15:27:28] !log faidon updated /a/common to {{Gerrit|If74ba5a52}}: Revert "Enable CentralNotice CrossWiki Hiding" [15:27:34] Logged the message, Master [15:27:51] RECOVERY - SSH on elastic1016 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [15:28:02] !log faidon synchronized wmf-config/CommonSettings.php 'disable CN CrossWiki Hiding again' [15:28:06] Logged the message, Master [15:34:34] ok, the HideBanners fix helped [15:35:01] <_joe_> paravoid: every metric is recovering as far as I can see [15:35:34] I think cause of the outage was the crosswiki hiding re-revert + fundraising running banners [15:37:49] paravoid: can you elaborate on what the crosswiki hiding re-revert is about? 
[15:38:34] (I'll wait for email if you're going to write it up anyway) [15:38:48] it's a CentralNotice feature that hides banners across all domains when you click the little "X" [15:39:04] so it basically does multiple requests for a 0-length image across all top-level domains [15:39:13] that had been enabled in the past, caused issues, I reverted it [15:39:23] matt thought he fixed it and reenabled it two days ago [15:39:47] so those requests enmass hammered varnish into submission? [15:40:14] (03PS2) 10Reedy: testwiki to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126980 [15:40:18] so enabling banners, plus hiding the banners with the cross-wiki toggle, makes clients fetch that image 0-length? [15:40:19] my guess is that these requests + fundraising running banners today + organic traffic growth due to the US waking up [15:40:19] (03CR) 10Reedy: [C: 032] testwiki to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126980 (owner: 10Reedy) [15:40:27] (03Merged) 10jenkins-bot: testwiki to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126980 (owner: 10Reedy) [15:40:38] !log reedy Started scap: testwiki to 1.24wmf1 and build l10n cache [15:40:39] this showed banners to lots of users, and several of them clicked to hide them [15:40:44] Logged the message, Master [15:40:45] times 12 requests for each of them [15:40:57] <_joe_> paravoid: oh crap [15:41:02] ouch [15:41:39] https://gerrit.wikimedia.org/r/#/c/96641/ https://gerrit.wikimedia.org/r/#/c/96643/ https://gerrit.wikimedia.org/r/#/c/126065/ https://gerrit.wikimedia.org/r/126998 [15:42:02] <_joe_> anomie: I think you are good to go now; paravoid do you agree? [15:42:10] James_F: Still around? [15:42:15] anomie: Yup. [15:42:28] yes, go ehad [15:42:31] *ahead [15:42:39] and please don't break the site :) [15:42:44] * anomie is starting the SWAT deploy process [15:43:01] paravoid: If the site breaks, blame James_F and RoanKattouw_away. It's their code ;) [15:43:12] :) [15:43:13] * James_F grins. [15:43:25] Our code already running without issues elsewhere, but yes. :-) [15:43:52] paravoid: is this feature supposed to work by the client caching the 0-length images? [15:44:04] no [15:44:21] it's supposed to work by getting a Set-Cookie back [15:45:11] but because you can't set a cookie for *.org, the code does multiple requests, once per project [15:46:02] one request per domain that you see there: https://gerrit.wikimedia.org/r/#/c/126065/1/wmf-config/CommonSettings.php [15:46:56] <_joe_> paravoid: and those requests are non-cacheable, right? [15:47:03] they should be [15:47:21] they weren't initially, mwalker fixed that [15:47:24] <_joe_> still, a 12x boost in requests is *impossible* to sustain [15:47:37] !log anomie synchronized php-1.23wmf22/extensions/Math 'SWAT: 126913 - backport to wmf22 of critical fixes for the Math extension's VisualEditor tool' [15:47:43] Logged the message, Master [15:47:51] !log anomie synchronized php-1.23wmf22/extensions/VisualEditor 'SWAT: 126913 - backport to wmf22 of critical fixes for the Math extension's VisualEditor tool' [15:47:58] Logged the message, Master [15:48:00] James_F: ^ There you go [15:48:20] * anomie is done with the SWAT deploy [15:49:12] RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5627: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [15:49:27] anomie: Thanks! [15:50:10] so, interesting [15:50:14] http://en.wikipedia.org/wiki/Special:HideBanners?duration=1209600&category=fundraising is properly cached [15:50:25] http://commons.wikimedia.org/wiki/Special:HideBanners?duration=1209600&category=fundraising does not even respond [15:50:28] it times out [15:54:21] RECOVERY - puppet disabled on elastic1016 is OK: OK [15:54:21] RECOVERY - check if dhclient is running on elastic1016 is OK: PROCS OK: 0 processes with command name dhclient [15:54:41] RECOVERY - Disk space on elastic1016 is OK: DISK OK [15:54:41] RECOVERY - DPKG on elastic1016 is OK: All packages OK [15:54:51] RECOVERY - RAID on elastic1016 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [15:55:01] RECOVERY - check configured eth on elastic1016 is OK: NRPE: Unable to read output [15:55:24] ok this is crazy [15:55:48] akosiaris: is your work with OpenStreetMaps related to the 'maps' project in labs, or are those two different things? [15:56:02] paravoid: bad redirect at that commons URL? [15:56:15] bad redirect? [15:56:25] i.e. why does it time out [15:56:35] varnish isn't very happy that it's so busy I think [16:00:21] ah dammit [16:00:21] I know [16:00:31] damn [16:02:41] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [16:04:29] yeah pretty sure I found it [16:04:51] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 519410 bytes in 9.860 second response time [16:05:01] manybubbles: moving shards back to 1016 [16:05:08] bd808: ottomata yay [16:05:25] that's the last of the non master dance nodes [16:06:14] ottomata: yay [16:06:30] so, you wanna start on masters this afternoon later? [16:06:45] !log reedy Finished scap: testwiki to 1.24wmf1 and build l10n cache (duration: 26m 06s) [16:06:51] Logged the message, Master [16:07:41] RECOVERY - NTP on elastic1016 is OK: NTP OK: Offset -0.01075088978 secs [16:07:59] yeah sure manybubbles [16:08:01] paravoid: the suspense is killing us [16:08:11] haha [16:08:14] :) [16:08:26] working on some trebuchet stuff, once i get a commit in let's start them [16:11:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:12:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 519342 bytes in 9.469 second response time [16:12:23] <_joe_> and on thursday gitblit goes donw [16:12:27] <_joe_> as usual [16:13:37] (03PS1) 10Faidon Liambotis: varnish: don't override Special:HideBanners' TTL [operations/puppet] - 10https://gerrit.wikimedia.org/r/127001 [16:13:38] _joe_: My money is still on the new branch creation across core+all extensions causing it [16:13:38] greg-g: there you go [16:14:19] <_joe_> bd808: I think you're right sir [16:14:44] <_joe_> bd808: nullpointers can be thrown due to that kind of thing [16:14:44] (03CR) 10Faidon Liambotis: [C: 032 V: 032] varnish: don't override Special:HideBanners' TTL [operations/puppet] - 10https://gerrit.wikimedia.org/r/127001 (owner: 10Faidon Liambotis) [16:16:03] paravoid: nice [16:17:30] _joe_: ^^ [16:18:23] <_joe_> paravoid: ouch [16:18:41] <_joe_> paravoid: so that was not cached. 
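The mechanism paravoid describes above can be pictured with a small, purely hypothetical sketch — the real client code is CentralNotice JavaScript, but the effect of one "hide" click is roughly one Special:HideBanners request per top-level domain, each existing only to hand back a Set-Cookie scoped to that domain, since a cookie cannot cover a bare TLD like .org. The domain list below is abbreviated from the wmf-config change linked earlier.

```python
# Hypothetical model of the behaviour described above -- the real code is CentralNotice
# JavaScript; this only shows why a single "hide" click fans out into ~a dozen requests.
# Each domain gets its own Special:HideBanners hit whose useful payload is the Set-Cookie
# header scoped to that domain; the body itself is a zero-length image.
import requests

DOMAINS = [  # abbreviated; the full list is in the wmf-config change linked above
    "en.wikipedia.org",
    "commons.wikimedia.org",
    "www.wikidata.org",
    "en.wikisource.org",
]

def hide_banners_everywhere(duration=1209600, category="fundraising"):
    for domain in DOMAINS:
        r = requests.get(
            "https://{}/wiki/Special:HideBanners".format(domain),
            params={"duration": duration, "category": category},
        )
        print(domain, r.status_code, r.headers.get("Set-Cookie", "")[:60])

# hide_banners_everywhere()
```

Multiplied by a fundraising campaign showing the close button to a lot of readers, that per-click fan-out is the roughly 12x request boost _joe_ measured on the text varnishes.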
[16:19:20] nope [16:19:56] and the object was super busy [16:20:02] and that object wasn't one, but 12 objects [16:20:24] <_joe_> how come we did not kill the mw* servers then [16:20:24] spread consistently over 8 caches, so each of the backends probably had one of each [16:20:30] varnish died first [16:20:42] varnish doesn't handle super busy objects all that well [16:21:01] <_joe_> I'd have expected the opposite... but I've never used varnish to this scale [16:21:09] ouch [16:21:23] <_joe_> ok I really gotta bail out, see you later guys [16:21:40] varnish was coalescing all these frontend requests [16:21:45] into single backend requests [16:21:48] so they all piled up [16:23:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:38] paravoid: does that mean all of the requests for this object effectively hits one machine hosting the canonical backend copy? [16:28:32] paravoid: also, now that you've fixed the TTL snafu can we let fr restore the banner campaign? [16:31:52] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 519575 bytes in 9.488 second response time [16:37:25] btw, do we have any plans to upgrade to varnish 4? [16:37:40] it looks pretty nice, and it looks like the requisite vcl migrations could be done via erb / puppet [16:38:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:38:14] though i thought mark mentioned something about the vmod experience being substantially improved w/v4 and didn't see that [16:41:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 518973 bytes in 9.598 second response time [16:47:18] (03PS5) 10Ottomata: Running update-server-info for submodules during deployment_server_init [operations/puppet] - 10https://gerrit.wikimedia.org/r/126846 [16:48:12] (03PS6) 10Ottomata: Running update-server-info for submoduleswq [operations/puppet] - 10https://gerrit.wikimedia.org/r/126846 [16:49:34] (03PS7) 10Ottomata: Running update-server-info for submoduleswq [operations/puppet] - 10https://gerrit.wikimedia.org/r/126846 [16:50:57] (03CR) 10jenkins-bot: [V: 04-1] Running update-server-info for submoduleswq [operations/puppet] - 10https://gerrit.wikimedia.org/r/126846 (owner: 10Ottomata) [16:56:02] manybubbles: ok i'm going to grab some lunch real quick [16:56:09] ottomata: later [16:56:15] ? [16:56:18] oh [16:56:19] shoudl I start moving shards off of 1002 [16:56:20] first? [16:56:23] since it takes a while? [16:56:26] oh, AND 1001? [16:56:43] go over plan with me again real quick [16:59:24] manybubbles: ^^ [16:59:37] ottomata: [16:59:38] sure [16:59:49] so the plan is to move shards off of both the nodes we're doing the dancy with [16:59:55] change the masterness in puppet [17:00:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:00:07] then rebuild one node [17:00:20] rebuild the non-master, rather [17:00:24] then rebuild the master [17:00:38] the trick is that the master will remain the master until we bounce it [17:00:44] like, as soon as the new master is up and ready, but even before shards are back on it? [17:01:11] ottomata: that is safe because we've already moved shards off the old master too [17:01:18] would we need to run puppet everywhere (and bounce elasticsearch) to get new master settings live? [17:01:22] or does es do that dyanmically? 
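Before the log moves on to the Elasticsearch master work, here is a toy model of the request coalescing just described — an illustration of the idea only, not Varnish internals: on a miss, one request per distinct URL is allowed through to the backend and everyone else asking for the same object waits on that fetch, which is exactly where a hot, effectively uncacheable object hurts.

```python
# Toy model of request coalescing -- an illustration of the concept, not Varnish code.
# Within a batch of simultaneous misses, only one request per distinct URL goes to the
# backend; the rest are parked on ("coalesced onto") that in-flight fetch.
import collections

cache = {}

def serve_batch(requests_in_flight):
    fetches = {}                            # url -> the one client allowed through
    parked = collections.defaultdict(list)  # url -> clients waiting on that fetch
    for client, url in requests_in_flight:
        if url in cache:
            continue                        # plain hit, nobody waits
        if url in fetches:
            parked[url].append(client)      # coalesced onto the existing backend fetch
        else:
            fetches[url] = client
    return fetches, parked

batch = [("client%d" % i, "/wiki/Special:HideBanners?category=fundraising")
         for i in range(1000)]
fetches, parked = serve_batch(batch)
print(len(fetches), "backend fetch,", sum(len(v) for v in parked.values()), "clients parked")
# If that one response turns out to be uncacheable, the parked clients cannot be served
# from cache and end up being handled one after another -- the pile-up described above.
```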
[17:01:58] ottomata: the only machines that need the new master settings are the ones we're bouncing any way [17:02:11] just elastic1001 and elsatic1002 in this case [17:02:19] so for a real example: [17:02:31] start moving shards off of both of those nodes [17:02:44] (now should be fine) [17:03:01] then make a puppet change that takes master-eligibility from elastic1001 and gives it to 1002 [17:03:07] then rebuild 1002 [17:03:12] don't bounce 1001 [17:03:19] ok great, shoudl we wait til 1016 has full shards back? [17:03:22] or can we start that now? [17:03:31] once 1002 is rebuilt, we can start moving shards back to 1002 and then start immediately on 1001 [17:03:36] we can start now [17:03:38] ok [17:03:45] doing that then: moving shards off of 1001 and 10012 [17:03:46] 1002 [17:03:56] its already got a bunch of shards [17:04:05] yeah [17:04:14] oh yeah it does [17:04:29] oh, i've never excluded two ips before [17:04:46] commas [17:04:48] ok [17:05:30] like this? [17:05:31] \"cluster.routing.allocation.exclude._ip\" : \"10.64.0.108\",\"10.64.0.109\" [17:05:59] manybubbles: ? [17:06:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 453634 bytes in 9.306 second response time [17:06:07] (03CR) 10BryanDavis: Running update-server-info for submoduleswq (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/126846 (owner: 10Ottomata) [17:06:24] like"1064.0.108,10.64.0.109" [17:06:49] oh, in the same quotes [17:06:50] ok [17:06:59] ok running that [17:07:18] "exclude" : { [17:07:18] "_ip" : "10.64.0.108,10.64.0.109" [17:07:18] cool [17:07:19] ? [17:07:56] there they go [17:07:57] cool [17:08:00] ok lunchtime :) [17:20:51] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [17:22:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:22:20] (03CR) 10Yurik: [C: 04-1] Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 (owner: 10BBlack) [17:30:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 453599 bytes in 9.797 second response time [17:45:55] paravoid, thanks for doing that revert [17:46:08] hey mwalker [17:46:12] I'm about to rerevert [17:46:42] makes sense; so my caching fix was not actually a fix then [17:46:53] ? [17:47:41] your fix was actually perfectly fine [17:47:49] but see https://gerrit.wikimedia.org/r/127001 [17:48:05] (03CR) 10JanZerebecki: [C: 031] Move "RewriteEngine On" earlier in www.wikimedia.org vhost [operations/apache-config] - 10https://gerrit.wikimedia.org/r/91339 (owner: 10Reedy) [17:49:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:50:31] oh! 
I'd forgotten about that [17:50:56] (03PS2) 10Ottomata: Adding ensure parameter to varnish::logging [operations/puppet] - 10https://gerrit.wikimedia.org/r/125742 [17:51:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 445125 bytes in 9.648 second response time [17:51:05] (03CR) 10Ottomata: [C: 032 V: 032] Adding ensure parameter to varnish::logging [operations/puppet] - 10https://gerrit.wikimedia.org/r/125742 (owner: 10Ottomata) [17:51:52] (03PS3) 10Ottomata: Setting up varnishncsa instance for erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/125743 [17:53:41] PROBLEM - Host ps1-b5-sdtpa is DOWN: PING CRITICAL - Packet loss = 100% [17:53:45] (03CR) 10Ottomata: [C: 032 V: 032] Setting up varnishncsa instance for erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/125743 (owner: 10Ottomata) [17:54:09] (03PS2) 10Reedy: Wikipedias to 1.23wmf22 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126981 [17:59:06] (03PS1) 10Andrew Bogott: Purge nginx logs after two days. [operations/puppet] - 10https://gerrit.wikimedia.org/r/127019 [17:59:50] (03CR) 10Andrew Bogott: [C: 032] Purge nginx logs after two days. [operations/puppet] - 10https://gerrit.wikimedia.org/r/127019 (owner: 10Andrew Bogott) [18:00:14] (03CR) 10Reedy: [C: 032] Wikipedias to 1.23wmf22 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126981 (owner: 10Reedy) [18:00:50] (03Merged) 10jenkins-bot: Wikipedias to 1.23wmf22 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126981 (owner: 10Reedy) [18:01:15] twkozlowski, yt? [18:02:04] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.23wmf22 [18:02:26] Logged the message, Master [18:06:57] (03CR) 10MaxSem: Create a FeaturedFeed for the Tech News bulletin (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124272 (owner: 10Odder) [18:07:20] ottomata: 1002 looks ready for you [18:08:29] bblack, let me know what you think about my comments. Other than a minor issue, i don't see any problems. yet :) [18:09:13] OOO yup [18:09:21] ok [18:10:01] !log stopping puppet on elastic1001 and elastic1002, reinstalling elastic1002 [18:10:06] Logged the message, Master [18:10:25] (03CR) 10Odder: [C: 031] add wiktionary.eu, link to wiktionary.org [operations/dns] - 10https://gerrit.wikimedia.org/r/126932 (owner: 10Dzahn) [18:11:36] (03PS1) 10Ottomata: elastic1002 now master eligible, elastic1001 no longer [operations/puppet] - 10https://gerrit.wikimedia.org/r/127023 [18:11:37] MaxSem: Yes, I'm here right now. [18:11:55] MaxSem: Is there any way we can enable an Atom feed? :-) [18:12:02] manybubbles: https://gerrit.wikimedia.org/r/#/c/127023/ [18:12:24] twkozlowski, do you already have the messages for https://gerrit.wikimedia.org/r/#/c/124272/ in place on meta? [18:12:36] (03CR) 10Manybubbles: [C: 031] elastic1002 now master eligible, elastic1001 no longer [operations/puppet] - 10https://gerrit.wikimedia.org/r/127023 (owner: 10Ottomata) [18:13:12] MaxSem: As I said in a comment, yes. 
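Returning to the Elasticsearch shard drain agreed on above (the comma-separated exclude._ip value around 17:05–17:07): that setting goes through the cluster settings API, and the relocating/unassigned counts in the health output are what tell you the drain has finished. A hedged sketch, with the endpoint as a placeholder:

```python
# Hedged sketch of the shard-drain step discussed around 17:04-17:07 above: ban two nodes
# by IP through the cluster settings API, then poll cluster health until nothing is
# relocating. The endpoint is a placeholder; the settings key and health fields are
# standard Elasticsearch APIs (the same fields the Icinga recoveries in this log print).
import time
import requests

ES = "http://elastic1003.eqiad.wmnet:9200"   # placeholder: any node in the cluster

requests.put(ES + "/_cluster/settings", json={
    "transient": {
        "cluster.routing.allocation.exclude._ip": "10.64.0.108,10.64.0.109"
    }
})

while True:
    health = requests.get(ES + "/_cluster/health").json()
    print(health["status"],
          "relocating:", health["relocating_shards"],
          "unassigned:", health["unassigned_shards"])
    if health["relocating_shards"] == 0 and health["unassigned_shards"] == 0:
        break
    time.sleep(30)
```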
[18:13:22] (03PS2) 10Ottomata: elastic1002 now master eligible, elastic1001 no longer [operations/puppet] - 10https://gerrit.wikimedia.org/r/127023 [18:13:28] (03CR) 10Ottomata: [C: 032 V: 032] elastic1002 now master eligible, elastic1001 no longer [operations/puppet] - 10https://gerrit.wikimedia.org/r/127023 (owner: 10Ottomata) [18:13:34] Apr 6 11:16 PM [18:13:34] okay, then I'll nominate them for swatting today [18:13:41] \o/ [18:13:58] Cool, thanks [18:14:02] (03PS2) 10Reedy: Rest of group0 to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126982 [18:14:07] (03CR) 10Reedy: [C: 032] Rest of group0 to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126982 (owner: 10Reedy) [18:14:17] (03Merged) 10jenkins-bot: Rest of group0 to 1.24wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126982 (owner: 10Reedy) [18:15:19] MaxSem: Though https://gerrit.wikimedia.org/r/#/c/124272/6/wmf-config/InitialiseSettings.php [18:15:32] I set it to weekly updates following the frwikisource example [18:15:44] But I'm having second thoughts whether it would work for us? [18:15:49] We only need it to update on Mondays. [18:16:01] PROBLEM - Host elastic1002 is DOWN: PING CRITICAL - Packet loss = 100% [18:16:44] MaxSem: https://meta.wikimedia.org/w/index.php?title=MediaWiki:Ffeed-technews-page&action=edit has the magic from the documentation at MediaWiki.org [18:17:29] eh, this is wrong [18:17:52] should be just Tech/News/{{CURRENTYEAR}}/{{CURRENTWEEK}} [18:18:15] Yeah, this would only work on daily updates, right? [18:18:29] or so I thing, haven't touched it much over the last 2 years:) [18:19:07] the page should evaluuate to feed page name for every given moment of time [18:19:09] MaxSem: I'll have someone fix it then [18:19:34] it should just change the result once in a week [18:20:56] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf1 [18:21:01] yurik: there are no comments, just a blank -1 :) [18:21:11] RECOVERY - Host elastic1002 is UP: PING OK - Packet loss = 0%, RTA = 1.02 ms [18:21:16] in any case, I have a few puppet-y odds and ends to sort out before that can go out [18:21:19] Logged the message, Master [18:22:37] (03CR) 10Yurik: "sorry, comments are on PS2 accidently, but apply to PS3" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 (owner: 10BBlack) [18:22:52] Reedy, from 22 straight to 24?:P [18:23:06] oh, 1.24 [18:23:09] wheeeeeeeeeeeeeee [18:23:21] PROBLEM - SSH on elastic1002 is CRITICAL: Connection refused [18:23:21] PROBLEM - RAID on elastic1002 is CRITICAL: Connection refused by host [18:23:31] PROBLEM - check configured eth on elastic1002 is CRITICAL: Connection refused by host [18:23:39] bblack, comments were in the wrong PS2, so they didn't post. 
let me know [18:23:41] PROBLEM - puppet disabled on elastic1002 is CRITICAL: Connection refused by host [18:23:41] PROBLEM - DPKG on elastic1002 is CRITICAL: Connection refused by host [18:23:41] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 [18:23:51] PROBLEM - check if dhclient is running on elastic1002 is CRITICAL: Connection refused by host [18:24:11] PROBLEM - Disk space on elastic1002 is CRITICAL: Connection refused by host [18:25:35] yurik: ok commented on your comments (as an aside, that's kind of annoying that gerrit doesn't show you outdated unread comments) [18:26:20] bblack, if you are using chrome, install https://github.com/jdlrobson/gerrit-be-nice-to-me [18:27:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:27:18] also, bblack you need to hit "review" button - otherwise i don't see your comments [18:27:56] (03PS3) 10Ottomata: Changing erbium's udp2log instance to use unicast on port 8419 [operations/puppet] - 10https://gerrit.wikimedia.org/r/125744 [18:28:06] (03CR) 10BBlack: [C: 04-1] Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 (owner: 10BBlack) [18:29:20] (03CR) 10Ottomata: [C: 032 V: 032] "Great, unicast logs are blasting at erbium on 8419 now. Time to switch to unicast!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125744 (owner: 10Ottomata) [18:30:12] bblack, nope, need to review the patchset that you left the comments on - i suspect you did them on PS2 [18:30:20] welcome to gerrit [18:30:21] (03CR) 10JanZerebecki: [C: 031] "Yes please!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126205 (owner: 10Dzahn) [18:30:30] !log switching erbium udp2log instance from consuming multicast relay to unicast direct from varnishes [18:30:36] (03PS4) 10Reedy: Second batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125032 (owner: 10MarkTraceur) [18:30:38] yeah, gerrit sucks at these things [18:30:42] (03CR) 10Reedy: [C: 032] Second batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125032 (owner: 10MarkTraceur) [18:30:55] (03Merged) 10jenkins-bot: Second batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125032 (owner: 10MarkTraceur) [18:30:56] Logged the message, Master [18:30:57] (03CR) 10BBlack: Update Zero netmapper data from zero.wikimedia.org (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 (owner: 10BBlack) [18:31:10] is it intentional that the lock next to https external links looks different now? [18:31:27] anyways, I'm reading further in python docs about both of your comments now, but bottom line: it works, and leaks don't matter when the script exits immediately anyways :) [18:32:14] but the python fd thing is interesting [18:32:33] the whole "with" hack to explicitly scope them because it's up to GC-randomness otherwise is... 
odd to me [18:32:56] !log reedy synchronized database lists files: I0c36c65bb9f405e03b84d3f6c6b93acda522c5c9 [18:33:15] Logged the message, Master [18:34:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 378098 bytes in 9.222 second response time [18:34:29] bblack, its the same in C# & Java - you have objects whose lifetime and proper disposing is more important than regular non-native, memory-only objects - hence you have a special IDisposable (C#) interface, and a language construct to do try: ... finaly: dispose [18:34:32] !log reedy synchronized wmf-config/InitialiseSettings.php 'Touch for I0c36c65bb9f405e03b84d3f6c6b93acda522c5c9' [18:34:40] Logged the message, Master [18:35:04] (03CR) 10Reedy: [C: 032] Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [18:35:07] (03PS4) 10Reedy: Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [18:35:20] (03PS2) 10Reedy: Remove $wmgUseMicroDesign [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126987 (owner: 10Bartosz Dziewoński) [18:35:21] PROBLEM - NTP on elastic1002 is CRITICAL: NTP CRITICAL: No response from NTP server [18:36:32] bblack, i am not sure what you mean by desired semantics? same_file_contents just compares the file, doesn't it? [18:36:59] (03PS4) 10BBlack: Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 [18:37:21] RECOVERY - SSH on elastic1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [18:38:52] yurik: my impression from reading the filecmp source was that it cares more about stat fields than we'd like, but I'll go re-read it [18:39:44] but otherwise, yes, compare_file_contents is mostly do_cmp from filecmp [18:42:08] Jeff_Green: yt? [18:42:15] wondering if this file is needed anymore [18:42:15] files/searchqa/bin/refresh_api_log [18:42:19] i don't see it installed by puppet anywhere [18:42:55] hmm. that's a good question. [18:42:57] (03CR) 10Reedy: [C: 032] Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [18:43:03] (03Merged) 10jenkins-bot: Adding '*.panoramio.com' to the wgCopyUploadsDomains array [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126384 (owner: 10Marco) [18:43:58] bblack, later, i wanted to discuss with you your old proposal - to make the whole zero.vcl data driven - IPs will map not to the XCS id, but to a string that has XCS + all relevant requirements. [18:43:59] yurik: you're right! I just got confused with the array offsets in their _sig(stat()) thing. 
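On the Python tangent above: the `with` block bblack finds odd is simply deterministic cleanup — the files are closed when the block exits rather than whenever the garbage collector gets to them — and the stdlib's filecmp.cmp() in its default shallow mode compares a stat signature (file type, size, mtime) rather than bytes, which is why a content-only comparison was worth writing. A sketch of both points (a reconstruction, not the actual netmapper script):

```python
# Sketch of the two points above (a reconstruction, not the actual script): "with" closes
# both files deterministically when the block exits, instead of whenever the garbage
# collector gets around to it; and comparing bytes directly avoids filecmp.cmp()'s
# default shallow mode, which only compares a stat signature (file type, size, mtime).
import filecmp

def same_file_contents(path_a, path_b, bufsize=8192):
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(bufsize)
            chunk_b = b.read(bufsize)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:          # hit EOF on both files at the same point
                return True

# filecmp.cmp(a, b) is shallow by default; filecmp.cmp(a, b, shallow=False) does read the
# files, which is essentially what the helper above spells out.
```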
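And on yurik's idea of making zero.vcl fully data-driven, which bblack sketches just below with a delimited carrier string: a purely hypothetical parse of such a string into the flags the current if-logic hard-codes (field order, markers and names here are invented for illustration, not anything deployed):

```python
# Purely hypothetical: one way a delimited carrier string like the "470-03|m|o|s|en,ar,zh"
# example floated just below could be unpacked into the flags that zero.vcl's if-logic
# currently hard-codes. Field order, markers and names are invented for illustration.
def parse_carrier(tag):
    xcs, subdomains, opera, https, langs = tag.split("|")
    return {
        "xcs": xcs,                          # carrier ID, e.g. "470-03"
        "m_only": subdomains == "m",         # m.* only, vs. zero.* as well
        "opera_allowed": opera == "o",
        "https_allowed": https == "s",
        "languages": langs.split(","),       # whitelisted language subdomains
    }

print(parse_carrier("470-03|m|o|s|en,ar,zh"))
```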
[18:44:11] cool :) [18:44:28] * yurik hates to write new code :D [18:44:45] (03PS5) 10BBlack: Update Zero netmapper data from zero.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 [18:44:55] ottomata: clearly that wouldn't work anymore [18:45:20] yurik: yeah, so way way back before we talked about expanding the JSON to have structured data, I figured we could just pick a decent delimiter character and stuff N fields in the string we're using for the carrier ID currently [18:45:45] ottomata: afaik all it was for was to grab a list of history from API calls, which were then used as test input [18:46:05] bblack, yes, but we are still trying to figure out all the variants that we could have - look at all the complex IFs in zero.vcl [18:46:11] so, Jeff_Green I can just remove the file from puppet repo? [18:46:21] ottomata: yeah I guess so [18:46:39] yurik: {"470-03|m|o|s|en,ar,zh" => [ ... ], ... [18:46:47] k danke, it referenced emery and I am prepping emery for decom [18:47:00] it seems like most of the if-logic could be covered with a few flags for m-or-zero, opera-or-not, ssl-allowed, and a list of languages [18:47:01] if anyone is still using that for search QA they'll need a new way to grab logs to replay :-P [18:48:06] so, manybubbles, i'm running puppet on 1002 now [18:48:14] bblack, we might have different subsets of IPs with different rules - a carrier might have a WAP proxy that does not support IP whitelisting, and even though the XCS is the same, other rules might not be. Some backend changes required [18:48:16] as soon as all is well and we are ready to move shards to it [18:48:19] we should shut down 1001, right? [18:48:44] bblack, but yes, something along those lines [18:48:50] yurik: clearly the best answer is vmod_lua, and having your json contain raw lua code to process requests for each carrier's set of ranges :) [18:49:20] bblack, ESI ;) [18:49:32] solves all this mess once and for all [18:50:20] well, with varnish4 out now, we're finally heading down the road to finding out whether ESI will be usable or not [18:50:28] (03PS3) 10Reedy: Remove $wmgUseMicroDesign [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126987 (owner: 10Bartosz Dziewoński) [18:50:33] (03CR) 10Reedy: [C: 032] Remove $wmgUseMicroDesign [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126987 (owner: 10Bartosz Dziewoński) [18:50:41] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5627: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [18:50:51] (03Merged) 10jenkins-bot: Remove $wmgUseMicroDesign [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126987 (owner: 10Bartosz Dziewoński) [18:51:25] (03PS2) 10Reedy: Remove $wmgUsabilityEnforce [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126989 (owner: 10Bartosz Dziewoński) [18:51:28] (03PS1) 10Ottomata: Removing references to emery in prep for emery's decommission [operations/puppet] - 10https://gerrit.wikimedia.org/r/127032 [18:51:34] (03CR) 10Reedy: [C: 032] Remove $wmgUsabilityEnforce [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126989 (owner: 10Bartosz Dziewoński) [18:52:02] bblack, i'll believe it when i see it )) [18:52:16] (03Merged) 10jenkins-bot: Remove $wmgUsabilityEnforce [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126989 (owner: 10Bartosz Dziewoński) [18:53:35] (03CR) 10Yurik: [C: 031] "haven't tested, but python code looks good" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126829 (owner: 10BBlack) [18:54:01] (03PS2) 10Reedy: Create autopatrolled user group on brwikimedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126251 (owner: 10Odder) [18:54:09] (03CR) 10Reedy: [C: 032] Create autopatrolled user group on brwikimedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126251 (owner: 10Odder) [18:54:11] RECOVERY - NTP on elastic1002 is OK: NTP OK: Offset -0.09839582443 secs [18:54:16] (03Merged) 10jenkins-bot: Create autopatrolled user group on brwikimedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126251 (owner: 10Odder) [18:54:51] (03PS3) 10Reedy: Remove useless "confirmed" permission assignments [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/116059 (owner: 10TTO) [18:54:55] (03CR) 10Reedy: [C: 032] Remove useless "confirmed" permission assignments [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/116059 (owner: 10TTO) [18:55:05] (03Merged) 10jenkins-bot: Remove useless "confirmed" permission assignments [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/116059 (owner: 10TTO) [18:55:32] (03PS1) 10Faidon Liambotis: Reenable CentralNotice CrossWiki Hiding [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127040 [18:55:34] mwalker: ^ [18:55:41] RECOVERY - puppet disabled on elastic1002 is OK: OK [18:55:41] RECOVERY - DPKG on elastic1002 is OK: All packages OK [18:55:51] RECOVERY - check if dhclient is running on elastic1002 is OK: PROCS OK: 0 processes with command name dhclient [18:56:11] RECOVERY - Disk space on elastic1002 is OK: DISK OK [18:56:13] !log reedy synchronized wmf-config/ [18:56:21] RECOVERY - RAID on elastic1002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [18:56:31] RECOVERY - check configured eth on elastic1002 is OK: NRPE: Unable to read output [18:56:39] paravoid, thanks :) [18:57:06] (03CR) 10Faidon Liambotis: [C: 032] Reenable CentralNotice CrossWiki Hiding [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127040 (owner: 10Faidon Liambotis) [18:57:14] (03Merged) 10jenkins-bot: Reenable CentralNotice CrossWiki Hiding [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127040 (owner: 10Faidon Liambotis) [18:57:56] oh I'm in the middle of a deployment window [18:58:01] sorry, didn't realize until now [18:58:15] 
Reedy: okay for me to sync-file CommonSettings.php for ^^ ^ [18:58:19] Logged the message, Master [18:58:23] Yup [18:58:27] Fine by me [18:58:36] thanks [18:58:51] !log faidon updated /a/common to {{Gerrit|Ie95165065}}: Reenable CentralNotice CrossWiki Hiding [18:58:58] Logged the message, Master [18:59:09] !log faidon synchronized wmf-config/CommonSettings.php 'reenable CN CrossWiki Hiding' [18:59:20] Logged the message, Master [19:01:45] manybubbles: 1002 looks good [19:01:48] moving shards there [19:03:41] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [19:04:43] !log reinstalling elastic1001 [19:04:51] Logged the message, Master [19:04:53] (03PS4) 10Reedy: All wikis with <250k pages opted in [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [19:05:12] manybubbles: yt? [19:05:22] hope that's ok! here it goes [19:05:31] (03PS5) 10Reedy: All wikis with <250k pages opted in to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [19:05:37] (03CR) 10Reedy: [C: 032] All wikis with <250k pages opted in to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [19:05:56] (03Merged) 10jenkins-bot: All wikis with <250k pages opted in to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126671 (owner: 10Chad) [19:06:15] (03PS3) 10Reedy: Raise the Elasticsearch refresh interval [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126811 (owner: 10Manybubbles) [19:06:17] iiinteresting, ok, we are throwing more search traffic at elastic nodes while we are busy reinstalling some of them, eh? [19:06:18] cool! [19:06:19] haha [19:06:19] (03CR) 10Reedy: [C: 032] Raise the Elasticsearch refresh interval [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126811 (owner: 10Manybubbles) [19:06:30] (03Merged) 10jenkins-bot: Raise the Elasticsearch refresh interval [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126811 (owner: 10Manybubbles) [19:07:31] PROBLEM - Host elastic1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:08:11] (03CR) 10Ottomata: [C: 032 V: 032] Removing references to emery in prep for emery's decommission [operations/puppet] - 10https://gerrit.wikimedia.org/r/127032 (owner: 10Ottomata) [19:09:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:38] !log reedy synchronized database lists files: I6fc44d3eb829d656d352dab652148dd327b06679 [19:09:45] Logged the message, Master [19:10:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 490301 bytes in 9.673 second response time [19:10:19] !log reedy synchronized wmf-config/ [19:12:36] (03PS1) 10Faidon Liambotis: icinga: allow private1-esams through firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/127125 [19:12:41] RECOVERY - Host elastic1001 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [19:12:43] bblack: ^ [19:13:04] (03CR) 10Faidon Liambotis: [C: 032] icinga: allow private1-esams through firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/127125 (owner: 10Faidon Liambotis) [19:13:52] paravoid: hah, that makes sense [19:14:41] PROBLEM - RAID on elastic1001 is CRITICAL: Connection refused by host [19:14:41] PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.108 [19:14:47] RobH: I'm reading Server Lifecycle doc for 
decomissioning emery [19:14:51] PROBLEM - check configured eth on elastic1001 is CRITICAL: Connection refused by host [19:14:51] PROBLEM - puppet disabled on elastic1001 is CRITICAL: Connection refused by host [19:15:11] PROBLEM - SSH on elastic1001 is CRITICAL: Connection refused [19:15:16] just double checking, since emery is in tampa and is old [19:15:21] PROBLEM - DPKG on elastic1001 is CRITICAL: Connection refused by host [19:15:21] PROBLEM - Disk space on elastic1001 is CRITICAL: Connection refused by host [19:15:21] PROBLEM - check if dhclient is running on elastic1001 is CRITICAL: Connection refused by host [19:15:22] we are decoming, not reclaiming, right? [19:15:32] (03PS1) 10Jalexander: Add meta to legalteamwiki import sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127126 [19:15:40] ottomata: lets see [19:16:03] oo Special section, reading :) [19:16:39] ottomata: I'm back [19:16:41] sorry [19:16:52] s'ok, elastif1002 is up and should be master eligible [19:16:54] ottomata: yep, decom [19:16:55] 1001 is installing [19:17:03] can you check that things are ok? [19:17:05] it was purchased in 2010, so its super old [19:17:11] ok cool, thanks RobH, yeah [19:17:23] yeah it hikn I can follow most of these steps and then update the rt ticket with what i've done [19:17:31] and leave it to someone else to do racktables and port swtiches, etc. [19:17:35] (i guess?) [19:17:38] the lifecycle is perhaps the single most polished document on wikitech at the moment [19:17:45] so im sure it'll be good [19:18:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:12] yea, on the 'decom specific' it stops for you at dropping a ticket for that stuff in the local queue [19:18:36] then chris or i will take care of wiping and unracking next week [19:18:47] k awesome [19:18:49] thanks [19:20:01] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 489100 bytes in 9.981 second response time [19:20:04] (03PS1) 10Ottomata: Decommissioning emery [operations/puppet] - 10https://gerrit.wikimedia.org/r/127127 [19:21:17] ottomata: so everything looks good. ^d is making some new indexes during this but that shouldn't impact us [19:21:22] k [19:21:23] cool [19:21:26] yeah i saw those things being merged [19:21:30] and was like...OoooOK! [19:21:42] won't hurt us [19:22:07] though, it looks like we're getting periodicly hit with slowness. have a look at terbium:/a/mw-log/CirrusSearch-slow.log [19:22:27] it looks like we get hit with a bunch of slow query from time to time. like something we do causes it? [19:22:29] I dunno [19:22:44] <^d> hmmmm [19:22:49] terbium? i thought fluorine? [19:23:01] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:23:41] fluorine, yeah [19:23:43] sorry [19:23:48] funky graph: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=es_query_time&s=by+name&c=Elasticsearch+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 [19:24:03] looks like _something_ caused a bunch of queries to stall [19:24:06] for a few seconds [19:24:41] <^d> i'm in urwiki, creatin ur indexes [19:24:49] <^d> i love language codes [19:26:02] <^d> Ok, indexes all done, you guys shouldn't notice me anymore. [19:26:15] ^d: cool - just going to populate them? 
[19:26:27] that shouldn't really cause much/any additional load either [19:27:12] PROBLEM - NTP on elastic1001 is CRITICAL: NTP CRITICAL: No response from NTP server [19:27:31] <^d> manybubbles: Yeah, starting pass 1 now. Just leaving in a screen and going back to other things. [19:27:32] (03PS1) 10RobH: replacing blog.wikimedia.org.pem [operations/puppet] - 10https://gerrit.wikimedia.org/r/127130 [19:28:01] ^d: sweet [19:28:10] ottomata: I was looking at that funky blip [19:28:11] RECOVERY - Puppet freshness on lvs3002 is OK: puppet ran at Thu Apr 17 19:28:05 UTC 2014 [19:28:11] RECOVERY - SSH on elastic1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [19:28:32] it looks like there was a network blip or something [19:28:39] (03CR) 10RobH: [C: 032 V: 032] replacing blog.wikimedia.org.pem [operations/puppet] - 10https://gerrit.wikimedia.org/r/127130 (owner: 10RobH) [19:28:39] or a ganglia aggregator [19:28:55] because the number of cpus dropped off [19:29:20] !log blog.w.o certificate swap (yes, again ;), apache may hiccup [19:29:25] (03PS2) 10Ottomata: Decommissioning emery [operations/puppet] - 10https://gerrit.wikimedia.org/r/127127 [19:29:30] (03CR) 10Ottomata: [C: 032 V: 032] Decommissioning emery [operations/puppet] - 10https://gerrit.wikimedia.org/r/127127 (owner: 10Ottomata) [19:29:45] morebots: snap to it damn you [19:29:49] Logged the message, RobH [19:29:49] I am a logbot running on tools-exec-03. [19:29:49] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [19:29:49] To log a message, type !log . [19:30:31] !log disabling puppet on emery for decommission [19:30:38] Logged the message, Master [19:31:22] (03PS1) 10Ori.livneh: Set domain to TLD on GeoIP cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/127131 [19:33:05] !log blog.w.o cert replacement successful [19:33:14] ^ bblack that one's for you :P [19:33:15] Logged the message, RobH [19:36:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 488242 bytes in 9.873 second response time [19:38:19] (03PS1) 10RobH: ticket.wikimedia.org cert replacement [operations/puppet] - 10https://gerrit.wikimedia.org/r/127133 [19:38:40] (03CR) 10RobH: [C: 032 V: 032] ticket.wikimedia.org cert replacement [operations/puppet] - 10https://gerrit.wikimedia.org/r/127133 (owner: 10RobH) [19:38:44] RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1948: active_shards: 5783: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:39:01] (03PS1) 10Jkrauska: Add jkrauska [operations/puppet] - 10https://gerrit.wikimedia.org/r/127134 [19:39:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:40:03] !log replacing ticket.wikimedia.org cert/key, apache may hiccup [19:40:14] Logged the message, RobH [19:41:24] RECOVERY - Puppet freshness on lvs3003 is OK: puppet ran at Thu Apr 17 19:41:16 UTC 2014 [19:42:55] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 486921 bytes in 9.629 second response time [19:43:27] ottomata: elastic1001 is back online and not master eligible which is great. 
no plugins yet [19:43:44] RECOVERY - RAID on elastic1001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [19:43:52] (03PS1) 10RobH: replacing star.wikimedia.org.pem, misc-web-lb.eqiad cert [operations/puppet] - 10https://gerrit.wikimedia.org/r/127135 [19:43:54] RECOVERY - check configured eth on elastic1001 is OK: NRPE: Unable to read output [19:43:54] RECOVERY - puppet disabled on elastic1001 is OK: OK [19:43:58] its still doing stuff [19:44:05] i have to run puppet twice [19:44:11] (03PS2) 10Jkrauska: Add jkrauska [operations/puppet] - 10https://gerrit.wikimedia.org/r/127134 [19:44:14] RECOVERY - DPKG on elastic1001 is OK: All packages OK [19:44:15] RECOVERY - Disk space on elastic1001 is OK: DISK OK [19:44:24] RECOVERY - check if dhclient is running on elastic1001 is OK: PROCS OK: 0 processes with command name dhclient [19:44:24] RECOVERY - Puppet freshness on lvs3001 is OK: puppet ran at Thu Apr 17 19:44:23 UTC 2014 [19:45:15] RECOVERY - Puppet freshness on lvs3004 is OK: puppet ran at Thu Apr 17 19:45:09 UTC 2014 [19:45:16] (03CR) 10RobH: [C: 032 V: 032] replacing star.wikimedia.org.pem, misc-web-lb.eqiad cert [operations/puppet] - 10https://gerrit.wikimedia.org/r/127135 (owner: 10RobH) [19:45:22] (03CR) 10jenkins-bot: [V: 04-1] Add jkrauska [operations/puppet] - 10https://gerrit.wikimedia.org/r/127134 (owner: 10Jkrauska) [19:46:04] !log power off emery [19:46:10] Logged the message, Master [19:47:06] (03PS3) 10Jkrauska: Add jkrauska [operations/puppet] - 10https://gerrit.wikimedia.org/r/127134 [19:47:24] RobH, emery is powered off and ready for the crusher or wherever it goes, what queue should I put this (or create a new) ticket in? [19:47:24] https://rt.wikimedia.org/Ticket/Display.html?id=6143&results=ddd5c126e004948b21fa6c382a0925e6 [19:47:29] https://rt.wikimedia.org/Ticket/Display.html?id=6143 [19:48:04] ottomata: so page says the appropriate datacenter specific queue [19:48:07] so if its in tampa, pmtpa queue [19:48:10] ok [19:48:30] manybubbles: 1001 is good to go, if you say ok i will begin moving shards to it [19:48:44] so just a ticket saying to decom emery, placed in pmtpa queue [19:48:44] PROBLEM - LVS HTTP IPv4 on misc-web-lb.eqiad.wikimedia.org is CRITICAL: Connection refused [19:48:48] PROBLEM - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection refused [19:48:52] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [19:48:53] arghhh [19:48:54] PROBLEM - LVS HTTP IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection refused [19:48:55] my fault [19:49:07] ottomata: go ahead! [19:49:10] fixing misc [19:49:30] cool, great! that was easy manybubbles [19:49:34] PROBLEM - LVS HTTPS IPv4 on misc-web-lb.eqiad.wikimedia.org is CRITICAL: Connection refused [19:49:37] let's do the next ones tomorrow, i'm going to leave in about an hour [19:49:41] ottomata: I'm glad! [19:49:42] puppet happened to run automatically on one when i depooled the other [19:49:45] sounds good [19:49:50] lesson learned, halt puppet on both, then do this shit [19:50:24] pages? [19:50:34] yep! 
[19:50:46] it shoudl clear up shortly [19:50:50] <_joe_> apergos: RobH was missing us [19:50:55] heh [19:50:57] <_joe_> so he called us back here [19:51:00] you guys shouldnt be paged unless yer 24/7 [19:51:04] and then thats yer own damned fault ;] [19:51:12] it's before 11 pm [19:51:18] <_joe_> RobH: it' cet awake hours :) [19:51:23] oh, then, uhh [19:51:25] sorry =] [19:51:30] s'ok [19:51:40] we need EEST waking hours !!! [19:51:42] nice to come on and see it's not a crisis [19:51:47] we would have been spared ! [19:51:49] oh, nice, cp1044 is actually just borked [19:52:07] damn puppet for doing what its supposed to do when i wasnt ready for it to do it [19:53:10] <_joe_> RobH: I will use this quote. [19:54:19] hrmm [19:54:26] why is it not alerting its back... whats up pybal. [19:54:26] <_joe_> RobH: need assistance? [19:54:38] well, nginx is restored to service on both of them [19:54:50] checking out what pybal says [19:55:33] ottomata: elastic1001 isn't gangliaing [19:56:30] _joe_: so it says they are down in pybal [19:56:36] but they seem totally fine to me =/ [19:56:44] <_joe_> RobH: which lvs? [19:56:48] lvs1002 [19:56:50] <_joe_> I can try to check it [19:56:52] misc-web-lb.eqiad [19:57:14] RECOVERY - NTP on elastic1001 is OK: NTP OK: Offset -0.01193249226 secs [19:57:24] !log both cp1043 and cp1044 seem online and serving nginx service, but pybal says they are down still working [19:57:30] Logged the message, RobH [19:57:31] !log still working on issue [19:57:38] Logged the message, RobH [19:58:47] meh, odd cert dates in directories [19:58:48] ah manybubbles were you saying I needed to reboot gmond on those nodes? [19:58:54] im shredding them all on cp1043 and rerunning puppet [19:59:00] ottomata: I just went and did it [19:59:04] if that shows up in pybal afterwards will do same on cp1044 [19:59:06] I think it needs to happen as you start the nodes [19:59:12] like, maybe a bit after. not sure [19:59:15] this is never smooth. [19:59:23] <_joe_> RobH: openssl issue I'd say [19:59:49] how so, that its false on pybal's part? [20:00:20] <_joe_> RobH: mh I was looking at the logs of pybal [20:01:03] <_joe_> RobH: nevermind, sorry. go on with your actions [20:01:20] well, im not sure my actions are going to fix it, but its rolling to both now [20:02:41] paravoid: you about? [20:02:47] yes [20:02:49] what's up? [20:02:50] Im now getting into the field of 'wtf did i do' [20:03:02] so tried to replace cert on cp1043/44 and it seems fine on the local systems [20:03:06] (03CR) 10Manybubbles: [C: 031] Only load/enable Lucene on production (not on labs) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126804 (owner: 10Reedy) [20:03:09] but pybal is refusing to pool them stating they are down [20:03:18] and im not sure if its something ive done wrong or pybal being odd. [20:03:26] the certs appear fine on each system [20:03:50] ottomata: elastic1001 really doesn't seem to be doing ganglia [20:03:55] like it reports that it is down [20:04:15] the chain was created correctly as well [20:04:18] <_joe_> RobH: openssl connects to both servers [20:04:21] the chain isn't correct [20:04:26] it isnt? [20:04:30] but that wouldn't affect pybal [20:05:00] oh its listing one of them twice [20:05:01] thats odd. [20:05:42] no, its not even that, i have no idea wtf it did. [20:06:15] paravoid: but this happened cuz i didnt halt puppet on both before merging [20:06:22] and it called in at the wrong time, so bad move on my part. 
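What _joe_ and RobH are checking by hand here — whether the chain the frontend actually serves verifies cleanly — can be probed with a few lines; a sketch assuming a host that terminates HTTPS and the local system CA bundle:

```python
# Small probe in the spirit of the manual openssl checks above: connect to the TLS
# frontend and let Python's default context verify the chain it actually serves. A
# missing or wrong intermediate shows up here as a verification error even when the
# certificate files look fine on disk. The hostname below is just an example.
import socket
import ssl

def check_chain(host, port=443):
    ctx = ssl.create_default_context()      # verifies against the system CA bundle
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                subject = dict(pair[0] for pair in tls.getpeercert()["subject"])
                print(host, "chain OK:", subject.get("commonName"))
    except ssl.SSLError as err:
        print(host, "chain problem:", err)

# check_chain("misc-web-lb.eqiad.wikimedia.org")
```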
[20:06:37] <_joe_> however, I do see that on this lvs a lot of servers are down according to pybal [20:06:54] RECOVERY - LVS HTTP IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.001 second response time [20:06:57] fixed [20:07:06] the new resolv.conf config is buggy [20:07:10] <_joe_> paravoid: how? :) [20:07:33] we have nameserver 208.80.154.239 listed as the first nameserver on lvs1005 [20:07:36] that's dns-rec-lb [20:07:42] which is behind lvs [20:07:45] RECOVERY - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 226 bytes in 0.023 second response time [20:07:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [20:07:48] so lvs1005 has this IP on loopback as well [20:08:07] but lvs1005 doesn't run a recursor, so DNS on that IP doesn't work [20:08:13] and pybal apparently isn't very happy about that [20:08:44] so back to the fact my chain is still fubar but thats ok i can hack at it [20:08:51] paravoid: thank you for fixing! [20:08:56] np [20:09:20] bblack: ^^ [20:09:23] the thing I was just telling you [20:09:28] apparently it's causing problems [20:09:32] * paravoid files an RT [20:09:57] ok, RT, then outage report for the text-lb outage, then meeting [20:12:21] (03CR) 10Andrew Bogott: "Ryan says that this was originally coded by Domas and licensed public domain. So, azatoth, I think you can go ahead and relicense everyth" [operations/debs/adminbot] - 10https://gerrit.wikimedia.org/r/68935 (owner: 10AzaToth) [20:12:38] ok [20:12:41] im still not sure wtf is wrong [20:12:46] and misc-web-lb is still down [20:12:56] no it's not? [20:12:57] i even manually fixed the chain to get it online for tweakin [20:13:04] ottomata: Apr 17 20:12:19 elastic1001 /usr/sbin/gmond[29393]: Check Operating System (kernel) limits, change or disable buffer size. Exiting.#012 [20:13:07] pybal says its not pooled [20:13:10] stuff isn't starting [20:13:35] RobH: where? [20:13:43] 2014-04-17 20:12:06.866352 [misc_web_80 IdleConnection] cp1043.eqiad.wmnet (enabled/up/pooled): Connection established. [20:13:46] 2014-04-17 20:12:07.212238 [misc_web6_80 IdleConnection] cp1044.eqiad.wmnet (enabled/up/pooled): Connection established. [20:13:49] works fine here [20:14:02] also works fine for icinga too, since we got the recoveries [20:14:03] on lvs1002? [20:14:09] why the heck am i not seeing that... [20:14:22] my last entries are all older as well [20:14:40] 1005 [20:14:42] (03PS1) 10Hashar: Pass puppet-lint on realm.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/127138 [20:14:57] ah wait [20:15:04] dammit [20:15:11] 1005 is primary for ipv6 but 1002 for ipv4 [20:15:12] fixing [20:15:19] (03CR) 10Hashar: "Potentially impacts all the production servers and labs instances!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127138 (owner: 10Hashar) [20:15:31] so i was on wrong system [20:15:31] (and I have IPv6, and you don't ;)) [20:15:34] RECOVERY - LVS HTTPS IPv4 on misc-web-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 227 bytes in 0.009 second response time [20:15:35] ? [20:15:36] no, you weren't [20:15:37] <_joe_> paravoid: to add a bit of complexity [20:15:40] I just have IPv6 at home [20:15:45] RECOVERY - LVS HTTP IPv4 on misc-web-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.007 second response time [20:15:50] oh, you fixed one of the two lvs resolf issues? 
[20:15:53] yes [20:16:03] ok, i understand, mostly [20:16:21] we have two LVS servers per "class", right? [20:16:25] for redundancy [20:16:36] so misc-web-lb is lvs1002/lvs1005 [20:17:02] one is active, the other one is backup (so if the box dies, cr1/2-eqiad will notice that because of BGP and use the other one) [20:17:05] <_joe_> one was primary for ipv6 and the other one for ipv4 [20:17:05] okay so far? [20:17:12] exactly [20:17:17] ok, i get that part [20:17:32] but on both of them, it still shows both those sysetms down in pybal log [20:17:52] <_joe_> RobH: for which pool? [20:18:01] misc_web [20:18:13] i dont know why we are getting clear pages when it shows tha both cp servers are not pooled [20:18:22] well, nm one is pooled [20:18:25] but its down and pooled [20:18:31] pybal wont depool cuz its not allowed [20:18:37] but neither are passing their checks, hence this issue [20:18:40] and puppet reverted my resolv.conf change [20:18:45] ugh [20:18:45] !log disabling puppet on lvs1002/lvs1005 [20:18:52] Logged the message, Master [20:18:54] PROBLEM - Swift HTTP backend on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:19:06] DAMN YOU PUPPET ;] [20:19:08] <_joe_> paravoid: exactly [20:19:21] <_joe_> paravoid: im on 1002, what ip for the NS? [20:19:35] paravoid: and magically they all fall back into serivce [20:19:38] thank you =] [20:19:44] RECOVERY - Swift HTTP backend on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 343 bytes in 0.034 second response time [20:19:45] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [20:19:47] !log lvs1002/1005: commenting first resolv.conf entry until we have a more permanent fix, restarting pybal [20:19:53] Logged the message, Master [20:19:54] PROBLEM - Apache HTTP on mw1158 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:19:56] i didnt trust the clear pages so i wanted to see for myself, and then it was already reverted by puppet i suppose [20:20:06] <_joe_> RobH: now we need to do something about puppet :) [20:20:12] oh crap, swift is in trouble [20:20:14] old yeller it in the corn crib! [20:20:14] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:20:30] this day is never going to end [20:20:41] !log sorry for the misc-web-lb issues folks, they should be resolved at this time (for now) [20:20:45] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.065 second response time [20:20:47] Logged the message, RobH [20:21:04] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.067 second response time [20:21:12] okay, it's back [20:21:20] ugh [20:25:47] bblack: #7308 [20:27:53] yeah, so chromium/hydrogen do listen on their "real" IPs as well (for recursive DNS) [20:27:57] upload wizard refuses to store files temporary in buffer [20:28:24] it's just a matter of figuring out how to tell puppet "only on the lvs servers that do the DNS balancing, chromium+hydrogen directly for resolv.conf" [20:28:35] !log restarting elastic1016 - it is freaking out. If it happens again I'll dig deeper, but for now I consider it a fluke of the rolling restarts today.... 
[20:28:41] Logged the message, Master [20:28:46] (03CR) 10BryanDavis: [WIP] Configure scap master and clients in beta (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674 (owner: 10BryanDavis) [20:28:55] (03PS15) 10BryanDavis: [WIP] Configure scap master and clients in beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674 [20:34:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:36:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 483883 bytes in 9.729 second response time [20:37:08] (03PS1) 10Hashar: retab role/nova.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/127146 [20:37:10] (03PS1) 10Hashar: puppet-lint role/nova.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/127147 [20:37:39] mwalker: hey, I don't see HideBanners for FR banners right now [20:37:45] mwalker: are you still running those? [20:37:54] <_joe_> !log restarting gitblit in order to prevent crippling due to the usual memory leak [20:37:58] Logged the message, Master [20:38:03] paravoid, we are; but at a much reduced volume [20:38:21] or maybe only people in the US click close buttons... *shrugs* [20:38:25] (03PS2) 10Hashar: puppet-lint role/nova.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/127147 [20:38:27] heh [20:38:58] (03CR) 10Hashar: "Forgot to change $realm to $::realm :D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127147 (owner: 10Hashar) [20:39:07] paravoid, do you want me to put something up to test? [20:39:26] what do you mean? [20:39:49] 1402 RxRequest c GET [20:39:49] 1402 RxURL c /wiki/undefined [20:39:52] lol [20:39:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:40:02] paravoid, I can put up a US banner [20:40:18] (03PS1) 10Dr0ptp4kt: Add HTTPS support for 514-02. [operations/puppet] - 10https://gerrit.wikimedia.org/r/127148 [20:41:23] !log elastic1016 restarted and not freaking out any more. [20:41:29] Logged the message, Master [20:42:07] bblack, when you have a minute, would you please review and, if appropriate, +2 merge and deploy https://gerrit.wikimedia.org/r/#/c/127148/ ? [20:42:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 484848 bytes in 9.911 second response time [20:45:46] dr0ptp4kt: this is a new carrier, which is starting with HTTPS already? [20:45:55] bblack: yes [20:45:59] ok [20:46:31] (03CR) 10BBlack: [C: 032 V: 032] Add HTTPS support for 514-02. [operations/puppet] - 10https://gerrit.wikimedia.org/r/127148 (owner: 10Dr0ptp4kt) [20:46:39] bblack: gracias [20:47:06] (03PS1) 10Ori.livneh: Follow-up to I02673456f [operations/puppet] - 10https://gerrit.wikimedia.org/r/127149 [20:47:28] (03CR) 10Ori.livneh: [C: 032 V: 032] Follow-up to I02673456f [operations/puppet] - 10https://gerrit.wikimedia.org/r/127149 (owner: 10Ori.livneh) [20:50:24] (03PS1) 10BBlack: fix recursive dns on lvs100[25] [operations/puppet] - 10https://gerrit.wikimedia.org/r/127150 [20:50:32] paravoid: ^ ugly, but would work? 
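The gotcha paravoid diagnosed above (resolv.conf listing the dns-rec-lb service IP, which lvs1005 holds on loopback without running a recursor) is the sort of thing a tiny probe catches: query each configured nameserver directly and see whether it answers. A sketch using dnspython, which is an assumption — it may well not be installed on these hosts:

```python
# Sketch of a sanity check for the situation described above: read /etc/resolv.conf and
# query each listed nameserver directly, so a resolver IP that the local box holds on
# loopback without actually running a recursor shows up as a timeout rather than
# silently breaking pybal. Uses dnspython, which is an assumption (pip install dnspython).
import dns.resolver

def nameservers_from_resolv_conf(path="/etc/resolv.conf"):
    with open(path) as f:
        return [line.split()[1] for line in f if line.startswith("nameserver")]

for ip in nameservers_from_resolv_conf():
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [ip]
    resolver.lifetime = 2                 # seconds before we call it dead
    try:
        resolver.query("wikimedia.org", "A")
        print(ip, "answers")
    except Exception as err:
        print(ip, "does not answer:", err)
```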
[20:50:46] I don't see any existing class/role designator that does that more implicitly [20:51:45] should an if-block rather than case anyways, for some reason I was thinking of more entries for codfw later, but different node stanza anyways [20:51:48] (03CR) 10jenkins-bot: [V: 04-1] fix recursive dns on lvs100[25] [operations/puppet] - 10https://gerrit.wikimedia.org/r/127150 (owner: 10BBlack) [20:52:58] (03PS2) 10BBlack: fix recursive dns on lvs100[25] [operations/puppet] - 10https://gerrit.wikimedia.org/r/127150 [20:53:16] (03CR) 10Ori.livneh: "@akosiaris, fyi" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127149 (owner: 10Ori.livneh) [20:54:09] (03CR) 10jenkins-bot: [V: 04-1] fix recursive dns on lvs100[25] [operations/puppet] - 10https://gerrit.wikimedia.org/r/127150 (owner: 10BBlack) [21:00:43] (03CR) 10Chad: [C: 031] "Fire away when you're ready." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126804 (owner: 10Reedy) [21:05:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:06:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 484993 bytes in 9.259 second response time [21:17:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:20:45] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [21:25:24] (03PS16) 10BryanDavis: [WIP] Configure scap master and clients in beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674 [21:26:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 486449 bytes in 9.490 second response time [21:29:32] (03PS17) 10BryanDavis: [WIP] Configure scap master and clients in beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674 [21:41:48] Somebody with server access here? [21:42:00] sjoerddebruin: what's the issue/question? [21:42:16] Some user needs his watchlist emptied. [21:42:35] 160,000 pages... ;/ [21:43:20] ha! [21:43:43] ? :) [21:44:10] sjoerddebruin: shell bug, probably then [21:44:28] Well, it's good to know that the old "10,000 page limit" is utterly incorrect now. (I'm at 8,900, and was trying to keep it below 10k) [21:45:20] Or would this work? https://www.mediawiki.org/wiki/Manual:Watchlist#Clearing_the_watchlist [21:46:27] might want to try it [21:47:02] greg-g: It doesn't work. User is going to file a bug. [21:47:10] kk [21:50:03] cmjohnson1, do you know anything about tantalum; specifically who set it up and when? 
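For the watchlist request above: when Special:EditWatchlist's clear option falls over at that size, the underlying operation a shell user would eventually perform amounts to a single delete against MediaWiki's watchlist table, keyed by wl_user. A hedged sketch only — host, wiki and user id are placeholders, and on the real cluster this would go through the normal shell/maintenance process rather than ad hoc SQL:

```python
# Hedged sketch of the underlying operation for the watchlist request above: MediaWiki
# keeps watched titles in the `watchlist` table keyed by wl_user, so emptying one user's
# watchlist is a single DELETE. Host, wiki and user id are placeholders; on the real
# cluster this would go through the normal shell/maintenance process, not ad hoc SQL.
import pymysql  # assumption: any MySQL client library would do

conn = pymysql.connect(host="db-placeholder", db="examplewiki",
                       read_default_file="~/.my.cnf")
user_id = 12345  # placeholder for the requesting user's user_id

with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM watchlist WHERE wl_user = %s", (user_id,))
    print("rows to remove:", cur.fetchone()[0])
    cur.execute("DELETE FROM watchlist WHERE wl_user = %s", (user_id,))
conn.commit()
```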
[21:51:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:26] (03CR) 10Chad: [C: 031] remove admins::restricted from lucene role [operations/puppet] - 10https://gerrit.wikimedia.org/r/126939 (owner: 10Dzahn) [21:52:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 491357 bytes in 9.681 second response time [21:52:57] (03CR) 10Chad: [C: 031] remove admins::restricted from terbium,fluorine [operations/puppet] - 10https://gerrit.wikimedia.org/r/126941 (owner: 10Dzahn) [21:58:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:00:00] greg-g: https://bugzilla.wikimedia.org/show_bug.cgi?id=64074 [22:00:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 489945 bytes in 9.821 second response time [22:01:54] mwalker: sometime around early march and I think robh set it up [22:03:05] !log updated default labs precise image (heartbleed fix) [22:03:11] Logged the message, Master [22:03:24] RobH, RyanLane and I are going back and forth on the labs list currently about getting an ubuntu 14.04 image in labs. apparently he experienced some puppet / ruby difficulties when he tried a couple of weeks ago. if you could share some insight that would be wonderful [22:03:35] mwalker: uhh [22:03:40] i have no idea what you are talking about ;] [22:03:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:03:58] hehe; slander! [22:04:04] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Wed 16 Apr 2014 06:54:47 AM UTC [22:04:08] ok; I'll have to ask jeff tomorrow when he gets back then [22:04:10] mwalker: that's going to be pretty much me or ryan. [22:04:19] the ;] is not a 'im joking, i know' but a sorry, but i have no clue. [22:04:28] And as far as I know Ryan is still working on it so I'm leaving it to him for now. [22:05:02] andrewbogott, yepyep, I was just hoping the person who setup tantalum could provide some information on what they had to do to make tantalum work [22:05:39] because 14.04 can clearly be installed and puppetized; but there might be some pixy dust that needs to be applied [22:06:13] afaik it just worked. [22:06:20] but that doesnt mean i installed that one [22:06:30] i cannot keep track of what i put on what server =] [22:06:44] RobH, have you put any 14.04 images out? [22:06:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 489946 bytes in 9.747 second response time [22:08:19] it seems that osmium (wont install due to disk controller trusty driver), tantalum, and copper have trusty set in their dhcpd lease entries [22:08:41] no clue what copper does, no motd entry [22:08:59] mwalker: so it seems we have some, do i personally recall installing it? not really except on osmium where it failed to work recently [22:09:18] its all on RT tickets so I dont have to recall them anyhow ;] [22:10:06] i see https://rt.wikimedia.org/Ticket/Display.html?id=5917 [22:10:15] so it looks like faidon is using server copper which runs trusty [22:10:35] I think Jeff Green also recently had to do some trusty install work, but I am uncertain if he resolved anything with it. 
[22:10:46] (03PS3) 10BBlack: fix recursive dns on lvs100[25] [operations/puppet] - 10https://gerrit.wikimedia.org/r/127150 [22:10:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:10:55] bblack: ytmnd =] [22:11:11] (i broke things earlier cuz of that, heh) [22:11:39] the fix is less-than-ideal, but I guess it's better than leaving puppet disabled + resolv.conf hacked [22:13:02] (03CR) 10BBlack: [C: 032 V: 032] fix recursive dns on lvs100[25] [operations/puppet] - 10https://gerrit.wikimedia.org/r/127150 (owner: 10BBlack) [22:18:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 487192 bytes in 9.937 second response time [22:21:45] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [22:23:42] (03PS2) 10MaxSem: Kill all vestiges of $wgMFRemovableClasses [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126826 [22:23:44] (03PS1) 10MaxSem: Normalize TextExtracts config handling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127170 [22:23:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:24:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 487196 bytes in 9.699 second response time [22:29:34] (03PS1) 10BBlack: attempt to fix ordering issue w/ nameservers_prefix [operations/puppet] - 10https://gerrit.wikimedia.org/r/127171 [22:30:36] (03CR) 10BBlack: [C: 032 V: 032] attempt to fix ordering issue w/ nameservers_prefix [operations/puppet] - 10https://gerrit.wikimedia.org/r/127171 (owner: 10BBlack) [22:57:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:58:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 478705 bytes in 9.951 second response time [23:01:00] RoanKattouw_away, mwalker, ebernhardson: I can do SWAT today [23:01:40] ok [23:01:54] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 157.866669 [23:01:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:02:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 477223 bytes in 9.918 second response time [23:03:04] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 122.333336 [23:05:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:08:59] (03PS1) 10Legoktm: Enable GlobalCssJs on testwiki & test2wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127178 [23:10:40] (03PS1) 10Ori.livneh: Add GlobalCssJs to extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127180 [23:13:05] (03CR) 10MaxSem: [C: 031] Create a FeaturedFeed for the Tech News bulletin [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124272 (owner: 10Odder) [23:14:53] !log ori synchronized php-1.23wmf22/extensions/ApiSandbox 'I9a56b2c5a: Update ApiSandbox' [23:14:58] Logged the message, Master [23:15:38] (03PS7) 10Ori.livneh: Create a FeaturedFeed for the Tech News bulletin [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124272 (owner: 10Odder) [23:15:41] (03CR) 10Legoktm: [C: 031] Add GlobalCssJs to extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127180 (owner: 
10Ori.livneh) [23:15:43] (03CR) 10Ori.livneh: [C: 032] Create a FeaturedFeed for the Tech News bulletin [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124272 (owner: 10Odder) [23:15:51] (03Merged) 10jenkins-bot: Create a FeaturedFeed for the Tech News bulletin [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124272 (owner: 10Odder) [23:15:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 474665 bytes in 9.721 second response time [23:16:56] Hmmm. ori: Thanks for the GlobalCssJs changes. I thought that had already been done in order for Beta Labs to have the extension. [23:16:56] !log ori updated /a/common to {{Gerrit|I1795c70d1}}: Create a FeaturedFeed for the Tech News bulletin [23:16:59] Guess not. [23:17:02] Logged the message, Master [23:17:19] Gloria: labs has different files than prod [23:17:20] Gloria: hm? [23:17:26] extension-list-labs versus extension-list, etc. [23:17:29] Ah. [23:17:50] I'm told there's some new requirement that extensions go to Beta Labs for a week first now. [23:17:56] !log ori synchronized wmf-config/FeaturedFeedsWMF.php 'I1795c70d1: Create a FeaturedFeed for the Tech News bulletin (1/2)' [23:17:58] I thought the point was to make this more streamlined. [23:18:02] Logged the message, Master [23:18:05] !log ori synchronized wmf-config/InitialiseSettings.php 'I1795c70d1: Create a FeaturedFeed for the Tech News bulletin (2/2)' [23:18:10] Logged the message, Master [23:18:11] Gloria: they should go to beta and think about what they've done [23:18:16] Heh. [23:20:17] MaxSem: does depend on the updates to the MobileFrontend submodule? [23:20:37] ori, no - they;re independent [23:20:45] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [23:20:46] k, just checking [23:21:15] (03PS3) 10Ori.livneh: Kill all vestiges of $wgMFRemovableClasses [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126826 (owner: 10MaxSem) [23:21:26] (03CR) 10Ori.livneh: [C: 032] Kill all vestiges of $wgMFRemovableClasses [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126826 (owner: 10MaxSem) [23:21:47] (03Merged) 10jenkins-bot: Kill all vestiges of $wgMFRemovableClasses [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/126826 (owner: 10MaxSem) [23:21:54] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [23:23:04] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [23:23:39] !log ori updated /a/common to {{Gerrit|I7841f74b0}}: Kill all vestiges of $wgMFRemovableClasses [23:23:44] Logged the message, Master [23:24:29] !log ori synchronized wmf-config/mobile.php 'I7841f74b0: Kill all vestiges of $wgMFRemovableClasses (1/2)' [23:24:35] Logged the message, Master [23:24:38] !log ori synchronized wmf-config/InitialiseSettings.php 'I7841f74b0: Kill all vestiges of $wgMFRemovableClasses (2/2)' [23:24:44] Logged the message, Master [23:24:57] (03PS2) 10Ori.livneh: Normalize TextExtracts config handling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127170 (owner: 10MaxSem) [23:25:11] (03CR) 10Ori.livneh: [C: 032] Normalize TextExtracts config handling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127170 (owner: 10MaxSem) [23:25:59] (03Merged) 10jenkins-bot: Normalize TextExtracts config handling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127170 (owner: 10MaxSem) [23:26:40] !log ori updated 
/a/common to {{Gerrit|I373df6138}}: Normalize TextExtracts config handling [23:26:47] Logged the message, Master [23:27:22] !log ori synchronized wmf-config/InitialiseSettings.php 'I373df6138: Normalize TextExtracts config handling (1/2)' [23:27:28] Logged the message, Master [23:27:31] !log ori synchronized wmf-config/CommonSettings.php 'I373df6138: Normalize TextExtracts config handling (2/2)' [23:27:36] Logged the message, Master [23:28:22] (03PS2) 10Ori.livneh: Add meta to legalteamwiki import sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127126 (owner: 10Jalexander) [23:28:30] (03CR) 10Ori.livneh: [C: 032] Add meta to legalteamwiki import sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127126 (owner: 10Jalexander) [23:28:36] (03Merged) 10jenkins-bot: Add meta to legalteamwiki import sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127126 (owner: 10Jalexander) [23:28:47] !log ori updated /a/common to {{Gerrit|I52378a4b4}}: Add meta to legalteamwiki import sources [23:28:53] Logged the message, Master [23:28:55] Whoa, exciting SWAT. [23:29:08] thanks much ori [23:29:21] !log ori synchronized wmf-config/InitialiseSettings.php 'I52378a4b4: Add meta to legalteamwiki import sources' [23:29:27] Logged the message, Master [23:32:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:34:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 474430 bytes in 9.784 second response time [23:35:24] (03PS2) 10Ori.livneh: Add GlobalCssJs to extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127180 [23:35:43] (03CR) 10Ori.livneh: [C: 032] Add GlobalCssJs to extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127180 (owner: 10Ori.livneh) [23:36:46] (03Merged) 10jenkins-bot: Add GlobalCssJs to extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127180 (owner: 10Ori.livneh) [23:37:08] !log ori updated /a/common to {{Gerrit|I2a2abd7f3}}: Add GlobalCssJs to extension-list [23:37:12] Logged the message, Master [23:38:50] !log ori Started scap: Cherry-pick Ibe8e67ebf for MobileFrontend on 1.23wmf22 and 1.24wmf1; add GlobalCssJs extension to 1.24wmf1 [23:38:54] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:38:56] Logged the message, Master [23:39:14] !log ori scap failed: CalledProcessError Command '/usr/local/bin/mw-update-l10n' returned non-zero exit status 1 (duration: 00m 24s) [23:39:20] Logged the message, Master [23:39:26] bd808|BUFFER: fun [23:39:43] Updating ExtensionMessages-1.23wmf22.php...Extension /a/common/php-1.23wmf22/extensions/GlobalCssJs/GlobalCssJs.php doesn't exist [23:39:54] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 472779 bytes in 9.944 second response time [23:45:43] !log ori Started scap: Cherry-pick Ibe8e67ebf for MobileFrontend on 1.23wmf22 and 1.24wmf1; add GlobalCssJs extension to 1.24wmf1 and 1.23wmf22 [23:45:47] Logged the message, Master [23:52:14] (03PS2) 10Legoktm: Enable GlobalCssJs on testwiki & test2wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127178 [23:57:54] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 51.733334 [23:58:24] (03CR) 10Ori.livneh: [C: 031] "LGTM; will sync after the current scap is done." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127178 (owner: 10Legoktm) [23:59:04] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 116.199997 [23:59:34] mwalker|away: can you update the DonationInterface submodule for https://gerrit.wikimedia.org/r/#/c/127123/ ?
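A footnote on the GlobalCssJs deploy above: the first scap (23:38) failed because extension-list is shared across every deployed branch, so the l10n rebuild looked for GlobalCssJs.php under php-1.23wmf22 as well as php-1.24wmf1; the second scap (23:45) added the extension to both branches first. Enabling it per wiki, as in change 127178, then follows the usual wmf-config pattern. The snippet below is only a sketch of that pattern, assuming the conventional wmgUseGlobalCssJs switch name and the default-off values implied by the change title; it is not the merged diff.

```php
<?php
// Sketch of the usual two-step wmf-config pattern, not the actual change.

// InitialiseSettings.php fragment: per-wiki switch, default off, on for the
// two wikis named in Gerrit change 127178.
$wgConf->settings['wmgUseGlobalCssJs'] = array(
	'default'   => false,
	'testwiki'  => true,
	'test2wiki' => true,
);

// CommonSettings.php fragment: only load the extension where the switch is on.
// The file path has to exist in every deployed branch (php-1.23wmf22 and
// php-1.24wmf1 here), because the l10n rebuild in scap reads the shared
// extension-list for each branch -- the error logged at 23:39 above.
if ( $wmgUseGlobalCssJs ) {
	require_once "$IP/extensions/GlobalCssJs/GlobalCssJs.php";
}
```

Keeping the require_once behind the switch means only wikis flagged in InitialiseSettings.php load the extension code at all.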