[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, RoanKattouw: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141202T0000). Please do the needful.
[00:00:12] I can do it
[00:00:53] * aude here
[00:02:01] aude, do I need any magical tricks to deploy it?
[00:02:15] no
[00:02:20] it's just a tiny change in js
[00:03:09] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 21.43% of data above the critical threshold [500.0]
[00:04:26] !log maxsem Synchronized php-1.25wmf10/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#/c/176713/ (duration: 00m 06s)
[00:04:28] Logged the message, Master
[00:04:46] MaxSem: Did you sync the sub-repo?
[00:04:57] yup
[00:05:01] Cool. Thanks!
[00:05:03] * James_F checks.
[00:06:28] !log maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#/c/176837/ (duration: 00m 12s)
[00:06:30] Logged the message, Master
[00:06:35] aude, ^^^
[00:06:52] checking
[00:06:53] MaxSem: Works, thank you.
[00:07:10] looks good
[00:07:28] thanks
[00:14:19] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[00:14:33] (CR) Ori.livneh: [C: 1] apachesync - delete sync-apache script [puppet] - https://gerrit.wikimedia.org/r/175884 (owner: Dzahn)
[00:15:28] (CR) Dzahn: [C: 2] apachesync - delete sync-apache script [puppet] - https://gerrit.wikimedia.org/r/175884 (owner: Dzahn)
[00:17:34] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[00:18:29] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[00:32:38] (CR) Dzahn: [C: 2] "yep, reasonable, checked on ocg1001" [puppet] - https://gerrit.wikimedia.org/r/176202 (owner: GWicke)
[00:38:26] (CR) Dzahn: [C: 1] "didn't see any files owned by nobody when checking on a single random appserver (mw1033)" [puppet] - https://gerrit.wikimedia.org/r/174896 (owner: Hoo man)
[00:41:18] (CR) Dzahn: Redirect wikimedia.community to www.wikimedia.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/176508 (owner: Glaisher)
[00:42:39] (CR) Dzahn: "Yuvi: just because it would now touch all different files and the title was also wrong..i'll make a follow-up" [puppet] - https://gerrit.wikimedia.org/r/173999 (owner: Dzahn)
[01:48:27] (PS1) Dzahn: facilities: move to module [puppet] - https://gerrit.wikimedia.org/r/176863
[01:48:58] (CR) Dzahn: "https://gerrit.wikimedia.org/r/#/c/176863/" [puppet] - https://gerrit.wikimedia.org/r/173999 (owner: Dzahn)
[01:49:53] (PS2) Dzahn: (WIP) facilities: move to module [puppet] - https://gerrit.wikimedia.org/r/176863
[01:53:36] (PS1) Ori.livneh: Add [debs/hhvm] (master_330) - https://gerrit.wikimedia.org/r/176864
[01:54:29] (CR) Ori.livneh: [C: 2 V: 2] Add [debs/hhvm] (master_330) - https://gerrit.wikimedia.org/r/176864 (owner: Ori.livneh)
[02:10:40] (PS1) Dzahn: remove slauerhoff and slauerhoff-array [dns] - https://gerrit.wikimedia.org/r/176868
[02:19:15] !log l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
[02:19:18] !log LocalisationUpdate completed (1.25wmf9) at 2014-12-02 02:19:18+00:00
[02:19:22] Logged the message, Master
[02:19:27] Logged the message, Master
[02:19:39] lol, 1s
[02:32:17] !log l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
[02:32:21] !log LocalisationUpdate completed (1.25wmf10) at 2014-12-02 02:32:21+00:00
[02:32:28] Logged the message, Master
[02:32:32] Logged the message, Master
[02:59:10] (PS1) Dzahn: broken edit to test jenkins check [puppet] - https://gerrit.wikimedia.org/r/176871
[03:00:09] (CR) Dzahn: [C: -2] "going to test some variations of breaking Apache config in different PS'es because hashar asked for it" [puppet] - https://gerrit.wikimedia.org/r/176871 (owner: Dzahn)
[03:01:31] (PS2) Dzahn: broken edit to test jenkins check [puppet] - https://gerrit.wikimedia.org/r/176871
[03:06:20] Any idea about list mail getting backed up?
[03:06:32] At least on social-media we're having a long delay at the moment between approve and send
[03:06:41] and email sent directly to it by list members
[03:08:56] mutante: ^
[04:05:10] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:06:41] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:06:41] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[04:12:45] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:12:45] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:13:00] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:13:34] jamesofur: Hrm.
[04:13:47] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:14:06] I swear there was a graph on ganglia or something for exim... can't find it though
[04:14:27] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:14:34] Don't see many ops around at the moment.
[04:15:29] PROBLEM - Router interfaces on mr1-esams is CRITICAL: CRITICAL: No response from remote host 91.198.174.247 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2
[04:15:46] * jamesofur assumes Jeff_Green is busy but pings anyway
[04:16:24] pong
[04:16:28] hmm, the queued messages is definitely increased from usual. Climbed up to a peak of about 4.5k (from 2k) and now been roughly even at 3.4 or so
[04:16:38] Jeff_Green: we seem to have a backlog on mailman (or mail in general somewhere)
[04:16:55] multi hour delays on at the very least 1 mailing list (social-media) haven't tried others
[04:16:56] ok
[04:17:12] you noticed it first as delays? or is something alerting too?
[04:17:38] noticed it first as delays
[04:17:45] ranging from 2.5 hours to 5 or so
[04:17:52] ok. I'll see what I can find
[04:18:52] merci beaucoup
[04:20:29] PROBLEM - Router interfaces on cr1-esams is CRITICAL: CRITICAL: No response from remote host 91.198.174.245 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2
[04:23:18] RECOVERY - Router interfaces on cr1-esams is OK: OK: host 91.198.174.245, interfaces up: 98, down: 0, dormant: 0, excluded: 0, unused: 0
[04:23:23] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 2 04:23:23 UTC 2014 (duration 23m 22s)
[04:23:26] Logged the message, Master
[04:23:48] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[04:25:48] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:25:49] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:26:37] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:27:09] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:28:06] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:28:54] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:28:54] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[04:29:06] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 2 failures
[04:30:08] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[04:33:38] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:39:29] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[04:40:58] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:42:14] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:42:30] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[04:43:18] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:43:28] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:46:23] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 2 failures
[04:48:11] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: puppet fail
[04:49:54] (PS1) Deskana: Update LegalContactPages configuration per feedback from end user. [mediawiki-config] - https://gerrit.wikimedia.org/r/176876
[04:56:31] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:00:14] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:00:15] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:00:15] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:00:51] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:01:30] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:01:30] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:01:40] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:02:26] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:02:28] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:09:46] (PS1) Ori.livneh: Set $wgTidyInternal = false to ease deployment of tidy extension [mediawiki-config] - https://gerrit.wikimedia.org/r/176879
[05:09:51] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[05:09:51] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:10:16] (PS2) Ori.livneh: Set $wgTidyInternal = false to ease deployment of tidy extension [mediawiki-config] - https://gerrit.wikimedia.org/r/176879
[05:10:31] (CR) Ori.livneh: [C: 2] Set $wgTidyInternal = false to ease deployment of tidy extension [mediawiki-config] - https://gerrit.wikimedia.org/r/176879 (owner: Ori.livneh)
[05:10:39] (Merged) jenkins-bot: Set $wgTidyInternal = false to ease deployment of tidy extension [mediawiki-config] - https://gerrit.wikimedia.org/r/176879 (owner: Ori.livneh)
[05:10:47] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[05:11:32] !log ori Synchronized wmf-config/CommonSettings.php: Set $wgTidyInternal to false unconditionally to ease deployment of tidy extension (duration: 00m 06s)
[05:11:38] Logged the message, Master
[05:11:40] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:14:43] (PS1) Ori.livneh: Enable internal tidy if extension is loaded and hostname == mw1114 [mediawiki-config] - https://gerrit.wikimedia.org/r/176880
[05:16:22] (PS1) Ori.livneh: hhvm: load tidy.so extension [puppet] - https://gerrit.wikimedia.org/r/176881
[05:29:28] (PS1) Ori.livneh: Unset $wgTidyInternal, so its default value is used. [mediawiki-config] - https://gerrit.wikimedia.org/r/176884
[05:41:10] (PS1) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-br-fr] - https://gerrit.wikimedia.org/r/176885
[05:54:23] (PS1) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-eo-en] - https://gerrit.wikimedia.org/r/176886
[06:30:04] (PS1) KartikMistry: Added initial Debian packaging for apertium-id-ms [debs/contenttranslation/apertium-id-ms] - https://gerrit.wikimedia.org/r/176889
[06:34:31] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:00] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:45] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:40:35] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:58] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:46:46] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:47:38] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:53:44] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[07:19:36] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:21:28] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: Puppet has 1 failures
[07:31:14] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:33:11] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[07:37:56] <_joe_> !log repooling mw1048-1052
[07:37:59] Logged the message, Master
[07:49:16] (CR) Nikerabbit: Add ContentTranslation in wikishared DB (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/175979 (owner: KartikMistry)
[07:49:25] (PS7) Nikerabbit: Add ContentTranslation in wikishared DB [mediawiki-config] - https://gerrit.wikimedia.org/r/175979 (owner: KartikMistry)
[07:50:29] (CR) Nikerabbit: [C: -1] Add ContentTranslation in wikishared DB (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/175979 (owner: KartikMistry)
[08:00:11] <_joe_> !log depooling mw1060-mw1067 for reimaging
[08:00:16] Logged the message, Master
[08:09:47] (PS1) Nemo bis: Redirect bugzilla alias URLs to old-bugzilla [puppet] - https://gerrit.wikimedia.org/r/176898
[08:11:36] (PS8) KartikMistry: Add ContentTranslation in wikishared DB [mediawiki-config] - https://gerrit.wikimedia.org/r/175979
[08:13:38] (PS1) Faidon Liambotis: Allow HTTP traffic WMF OIT space -> holmium (blog) [puppet] - https://gerrit.wikimedia.org/r/176899
[08:14:47] (CR) Faidon Liambotis: [C: 2] Allow HTTP traffic WMF OIT space -> holmium (blog) [puppet] - https://gerrit.wikimedia.org/r/176899 (owner: Faidon Liambotis)
[08:15:46] PROBLEM - Disk space on mw1060 is CRITICAL: Connection refused by host
[08:15:46] PROBLEM - SSH on mw1060 is CRITICAL: Connection refused
[08:15:47] PROBLEM - Apache HTTP on mw1060 is CRITICAL: Connection refused
[08:15:56] PROBLEM - RAID on mw1060 is CRITICAL: Connection refused by host
[08:15:56] PROBLEM - puppet last run on mw1060 is CRITICAL: Connection refused by host
[08:16:18] PROBLEM - check configured eth on mw1060 is CRITICAL: Connection refused by host
[08:16:36] PROBLEM - check if dhclient is running on mw1060 is CRITICAL: Connection refused by host
[08:16:49] PROBLEM - check if salt-minion is running on mw1060 is CRITICAL: Connection refused by host
[08:16:52] PROBLEM - DPKG on mw1060 is CRITICAL: Connection refused by host
[08:17:06] PROBLEM - nutcracker port on mw1060 is CRITICAL: Connection refused by host
[08:17:28] PROBLEM - nutcracker process on mw1060 is CRITICAL: Connection refused by host
[08:17:47] PROBLEM - Host mw1062 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[08:20:46] <_joe_> I did schedule downtime for those hosts... wtf
[08:21:46] <_joe_> eh, my bad
[08:23:55] PROBLEM - Host mw1064 is DOWN: PING CRITICAL - Packet loss = 100%
[08:24:36] RECOVERY - SSH on mw1060 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[08:30:43] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.129 second response time
[08:31:04] RECOVERY - Host mw1064 is UP: PING OK - Packet loss = 0%, RTA = 1.25 ms
[08:36:17] RECOVERY - check configured eth on mw1060 is OK: NRPE: Unable to read output
[08:36:37] (PS3) Glaisher: Redirect wikimedia.community to www.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/176508
[08:36:43] RECOVERY - DPKG on mw1060 is OK: All packages OK
[08:37:05] PROBLEM - Disk space on mw1064 is CRITICAL: Connection refused by host
[08:37:07] RECOVERY - check if dhclient is running on mw1060 is OK: PROCS OK: 0 processes with command name dhclient
[08:37:10] RECOVERY - check if salt-minion is running on mw1060 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:37:22] PROBLEM - RAID on mw1064 is CRITICAL: Connection refused by host
[08:37:22] RECOVERY - nutcracker port on mw1060 is OK: TCP OK - 0.000 second response time on port 11212
[08:37:30] PROBLEM - check configured eth on mw1064 is CRITICAL: Connection refused by host
[08:37:43] PROBLEM - nutcracker port on mw1064 is CRITICAL: Connection refused by host
[08:37:50] PROBLEM - SSH on mw1064 is CRITICAL: Connection refused
[08:38:02] RECOVERY - Disk space on mw1060 is OK: DISK OK
[08:38:02] RECOVERY - nutcracker process on mw1060 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:38:03] PROBLEM - check if dhclient is running on mw1064 is CRITICAL: Timeout while attempting connection
[08:38:21] RECOVERY - RAID on mw1060 is OK: OK: no RAID installed
[08:38:27] <_joe_> both hosts are in scheduled downtime, according to icinga
[08:40:20] (CR) Glaisher: Redirect wikimedia.community to www.wikimedia.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/176508 (owner: Glaisher)
[08:50:05] (PS2) Giuseppe Lavagetto: lvs: remove the HHVM specialized pools [puppet] - https://gerrit.wikimedia.org/r/175434
[08:50:07] (PS2) Giuseppe Lavagetto: mediawiki: convert hhvm appservers to be part of the common pool [puppet] - https://gerrit.wikimedia.org/r/175433
[08:51:07] (PS1) Faidon Liambotis: firewall: move allow NTP rule to base & restrict [puppet] - https://gerrit.wikimedia.org/r/176901
[08:51:29] (PS2) Faidon Liambotis: firewall: move "allow NTP" rule to role::ntp [puppet] - https://gerrit.wikimedia.org/r/176901
[08:52:01] (CR) Faidon Liambotis: [C: 2] firewall: move "allow NTP" rule to role::ntp [puppet] - https://gerrit.wikimedia.org/r/176901 (owner: Faidon Liambotis)
[08:52:09] <_joe_> argh
[08:52:16] <_joe_> another rebase :P
[08:52:18] lol
[08:52:20] sorry :)
[08:52:23] <_joe_> np
[08:54:02] (PS3) Giuseppe Lavagetto: mediawiki: convert hhvm appservers to be part of the common pool [puppet] - https://gerrit.wikimedia.org/r/175433
[08:54:18] (CR) Giuseppe Lavagetto: [C: 2] mediawiki: convert hhvm appservers to be part of the common pool [puppet] - https://gerrit.wikimedia.org/r/175433 (owner: Giuseppe Lavagetto)
[08:54:33] (PS3) Giuseppe Lavagetto: lvs: remove the HHVM specialized pools [puppet] - https://gerrit.wikimedia.org/r/175434
[08:54:42] (CR) Giuseppe Lavagetto: [C: 2] lvs: remove the HHVM specialized pools [puppet] - https://gerrit.wikimedia.org/r/175434 (owner: Giuseppe Lavagetto)
[09:06:02] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:06:03] PROBLEM - puppet last run on mw1064 is CRITICAL: CRITICAL: Puppet has 104 failures
[09:10:32] PROBLEM - puppet last run on mw1063 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:13:02] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 104 failures
[09:14:14] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: puppet fail
[09:14:42] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: puppet fail
[09:14:53] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: puppet fail
[09:15:23] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: puppet fail
[09:15:23] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail
[09:16:02] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: puppet fail
[09:16:04] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: puppet fail
[09:16:43] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: puppet fail
[09:16:48] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: $$role::cache::configuration::backends[$::realm]["hhvm_appservers"] is :undef, not a hash or array at /etc/puppet/manifests/role/cache.pp:739 on node amssq41.esams.wmnet
[09:16:52] _joe_: ^ :)
[09:17:08] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: puppet fail
[09:17:22] <_joe_> paravoid: I was just checking
[09:17:30] <_joe_> sigh
[09:17:37] <_joe_> well, not any issue live
[09:17:43] <_joe_> but those things are everywhere
[09:17:49] <_joe_> fixing...
[09:18:02] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: puppet fail
[09:18:03] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: puppet fail
[09:18:03] PROBLEM - Host hhvm-appservers.svc.eqiad.wmnet is DOWN: PING CRITICAL - Packet loss = 100%
[09:18:26] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: puppet fail
[09:19:06] greetings
[09:19:38] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: puppet fail
[09:19:38] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: puppet fail
[09:19:51] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail
[09:20:12] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: puppet fail
[09:20:48] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: puppet fail
[09:20:49] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: puppet fail
[09:21:03] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: puppet fail
[09:21:17] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: puppet fail
[09:21:45] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: puppet fail
[09:22:13] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: puppet fail
[09:22:14] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: puppet fail
[09:22:32] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: puppet fail
[09:22:45] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: puppet fail
[09:22:47] <_joe_> ciao godog
[09:22:51] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: puppet fail
[09:22:51] (PS1) Giuseppe Lavagetto: HAT: remove last references to the HHVM pools [puppet] - https://gerrit.wikimedia.org/r/176902
[09:22:52] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: puppet fail
[09:23:01] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail
[09:23:02] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: puppet fail
[09:23:11] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail
[09:23:16] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: puppet fail
[09:23:21] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: puppet fail
[09:23:24] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail
[09:23:24] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: puppet fail
[09:23:48] (CR) Giuseppe Lavagetto: [C: 2 V: 2] HAT: remove last references to the HHVM pools [puppet] - https://gerrit.wikimedia.org/r/176902 (owner: Giuseppe Lavagetto)
[09:24:01] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail
[09:24:23] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: puppet fail
[09:24:27] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: puppet fail
[09:24:27] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: puppet fail
[09:24:42] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: puppet fail
[09:24:46] hey _joe_, I take it the page was you? :)
[09:25:05] <_joe_> godog: yeah removing the pool, sorry
[09:25:16] RECOVERY - puppet last run on mw1063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:25:16] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: puppet fail
[09:25:32] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: puppet fail
[09:25:32] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: puppet fail
[09:25:43] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[09:25:48] <_joe_> and this shower of criticals should end soon
[09:26:04] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail
[09:26:33] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: puppet fail
[09:26:44] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail
[09:27:15] ah! okay
[09:27:16] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: puppet fail
[09:27:21] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: puppet fail
[09:27:37] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: puppet fail
[09:27:39] <_joe_> it's just the nagios checks that run at different times
[09:27:42] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:28:11] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: puppet fail
[09:28:12] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: puppet fail
[09:28:21] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: puppet fail
[09:28:31] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:28:31] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: puppet fail
[09:28:52] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: puppet fail
[09:28:55] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail
[09:29:05] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: puppet fail
[09:29:21] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: puppet fail
[09:29:47] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: puppet fail
[09:30:46] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail
[09:30:46] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:30:53] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: puppet fail
[09:31:03] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: puppet fail
[09:31:22] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail
[09:31:22] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: puppet fail
[09:32:15] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:32:15] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: puppet fail
[09:32:23] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:32:33] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:32:45] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[09:32:54] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[09:32:55] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:33:32] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:33:33] <_joe_> sorry for the spam
[09:33:53] <_joe_> I plainly forgot this, I was sloppy
[09:35:43] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:36:04] RECOVERY - puppet last run on cp1040 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[09:36:23] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[09:36:53] RECOVERY - puppet last run on cp1037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:36:53] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[09:37:03] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[09:37:03] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[09:37:12] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[09:37:12] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[09:37:45] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[09:37:45] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[09:37:45] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:37:46] RECOVERY - puppet last run on mw1064 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[09:37:58] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[09:37:59] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:38:24] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:38:24] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[09:38:25] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[09:38:55] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[09:39:17] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:39:37] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:39:46] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:40:35] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:40:41] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:40:46] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:40:46] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:41:15] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:41:29] (PS1) Giuseppe Lavagetto: HAT: move the last servers out of the HHVM pool [puppet] - https://gerrit.wikimedia.org/r/176907 [09:41:47] (CR) Giuseppe Lavagetto: [C: 2] HAT: move the last servers out of the HHVM pool [puppet] - https://gerrit.wikimedia.org/r/176907 (owner: Giuseppe Lavagetto) [09:41:47] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:41:55] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:42:06] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:42:12] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [09:42:31] (CR) Giuseppe Lavagetto: [V: 2] HAT: move the last servers out of the HHVM pool [puppet] - https://gerrit.wikimedia.org/r/176907 (owner: Giuseppe Lavagetto) [09:42:56] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [09:43:09] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [09:43:11] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [09:43:11] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [09:43:50] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 1
minute ago with 0 failures [09:44:22] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:44:38] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [09:44:51] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:45:26] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [09:45:41] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [09:45:52] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:46:01] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:46:01] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [09:46:01] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [09:46:32] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:46:32] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [09:46:43] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:47:02] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:47:02] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [09:51:51] (Abandoned) Giuseppe Lavagetto: Revert "mediawiki: move most servers from the hhvm to the standard pool" [puppet] - https://gerrit.wikimedia.org/r/175950 (owner: 
Giuseppe Lavagetto) [09:52:13] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:52:13] (Abandoned) Giuseppe Lavagetto: Revert "varnish: remove redirection to the hhvm pool" [puppet] - https://gerrit.wikimedia.org/r/175951 (owner: Giuseppe Lavagetto) [09:53:38] (CR) Filippo Giunchedi: [C: 1] Redirect wikimedia.community to www.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/176508 (owner: Glaisher) [09:54:24] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:54:45] <_joe_> !log repooling mw1060,mw1062-65; depooling mw1067-mw1070 for reimaging [09:54:52] Logged the message, Master [09:58:45] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:59:53] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:08:44] (PS1) Yuvipanda: base: Make atop log retention configurable [puppet] - https://gerrit.wikimedia.org/r/176909 [10:19:26] (PS2) Yuvipanda: base: Make atop log retention configurable [puppet] - https://gerrit.wikimedia.org/r/176909 [10:45:05] (PS1) Yuvipanda: shinken: Unify shinken::hosts and shinken::services [puppet] - https://gerrit.wikimedia.org/r/176911 [10:48:31] (PS1) Gilles: Enable JPG thumbnail chaining on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/176912 [10:48:36] PROBLEM - puppet last run on mw1066 is CRITICAL: CRITICAL: Puppet has 104 failures [10:48:50] (CR) Yuvipanda: [C: 2] shinken: Unify shinken::hosts and shinken::services [puppet] - https://gerrit.wikimedia.org/r/176911 (owner: Yuvipanda) [10:49:06] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: Puppet has 104 failures [10:49:45] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 104 failures [10:50:12] PROBLEM - puppet 
last run on mw1069 is CRITICAL: CRITICAL: Puppet has 104 failures [10:50:45] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: Puppet has 8 failures [10:55:36] PROBLEM - HHVM rendering on mw1069 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:16] RECOVERY - HHVM rendering on mw1069 is OK: HTTP OK: HTTP/1.1 200 OK - 70279 bytes in 0.460 second response time [11:03:33] (PS1) Yuvipanda: Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 [11:03:38] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:58] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [11:07:24] (PS2) Yuvipanda: Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 [11:07:38] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:10:01] (PS3) Yuvipanda: Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 [11:10:14] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 9 below the confidence bounds [11:12:37] (PS4) Yuvipanda: Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 [11:17:12] RECOVERY - puppet last run on mw1066 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:19:23] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:20:45] <_joe_> !log repooling mw1061,mw1066-mw1070 [11:20:49] Logged the message, Master [11:32:00] (CR) Aklapper: [C: -1] "I consider the confusion of redirecting some reports to a ticket in old-bugzilla (when 
passing an alphabetic ID parameter) and redirecting" [puppet] - https://gerrit.wikimedia.org/r/176898 (owner: Nemo bis) [11:38:32] (CR) Filippo Giunchedi: [C: -1] "some work needed for trusty vs precise (vs jessie?)" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/176909 (owner: Yuvipanda) [11:42:05] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [11:54:01] !log remove legacy symlink /home/wikipedia/syslog from lithium [11:54:07] Logged the message, Master [11:54:46] PROBLEM - puppet last run on mw1237 is CRITICAL: CRITICAL: Puppet has 1 failures [12:08:45] RECOVERY - puppet last run on mw1237 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:20:13] (CR) Nemo bis: "What's confusing? It's simple: we redirect the user to the place closest to what was being looked for." [puppet] - https://gerrit.wikimedia.org/r/176898 (owner: Nemo bis) [12:24:50] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures [12:38:52] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:39:52] godog: oh dog, that's terrible. (re: atop) [12:40:05] err, I was going to say oh god, but slipped [12:42:15] heheh [12:42:32] YuviPanda: yes it is a sorry state [12:46:09] godog: wonder if we should justr get rid of it. [12:46:21] godog: also wanna look at https://gerrit.wikimedia.org/r/#/c/176914/? [12:48:15] not building for precise since meh :) [12:49:05] YuviPanda: sure, btw I think there's value in atop if anything because it works by itself [12:49:22] hmm, yeah, enough people seem to think that [12:50:03] "by itself" I mean in isolation [12:50:21] YuviPanda: did you try to run a debdiff between old and new ircecho? [12:50:42] no, I just built the package and tested it on a labs instance...
[12:50:51] let me look up debiandiff, is still super-newbie-packager [12:52:00] sure, it is in devscripts [12:53:11] (PS5) Yuvipanda: Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 [12:55:47] godog: huh, interesting. seems .git is somehow packaged into the deb? [12:55:50] * YuviPanda finds way to paste it [12:56:48] godog: http://paste.ubuntu.com/9344390/ [12:58:30] YuviPanda: I take it that's debdiff on the .dsc files? what about on the .deb files? but yeah the git dir shouldn't show up in the source package [12:58:43] yeah, that's on the dsc [13:00:08] godog: http://paste.ubuntu.com/9344409/ seems fine. [13:07:04] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail [13:07:14] godog: hmm, interesting. the entire .git folder is in the package [13:11:50] (CR) Filippo Giunchedi: [C: 1] Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 (owner: Yuvipanda) [13:12:07] YuviPanda: heh, not a big deal for ircecho but it'd be nice if it didn't, anyways the actual changes LGTM [13:12:11] cool :) [13:12:21] I probably should spend more time on ircecho at some point [13:12:28] at least for a 'shut up, icinga-wm!' feature [13:12:36] and maybe some batching as well. [13:12:52] (CR) Yuvipanda: [C: 2] Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 (owner: Yuvipanda) [13:12:54] (Merged) jenkins-bot: Put the bin file in the /usr/bin instead of /usr/ircecho/bin [debs/ircecho] - https://gerrit.wikimedia.org/r/176914 (owner: Yuvipanda) [13:14:37] heheh indeed, at least so e.g. "puppet last run" storms are more manageable [13:15:52] yeah.
[13:16:02] or it could batch them itself intelligently [13:16:11] and so after the first 4, 5 ones it gets better [13:19:13] (CR) Alexandros Kosiaris: [C: 1] Redirect wikimedia.community to www.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/176508 (owner: Glaisher) [13:23:50] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:46:55] mark, hi, have there been any changes in public IPs recently (past month or so)? [13:49:44] PROBLEM - puppet last run on db1007 is CRITICAL: CRITICAL: Puppet has 1 failures [13:50:28] PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Puppet has 1 failures [13:50:44] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Puppet has 1 failures [13:50:45] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: Puppet has 1 failures [13:50:55] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Puppet has 1 failures [13:50:55] PROBLEM - puppet last run on mw1038 is CRITICAL: CRITICAL: Puppet has 1 failures [13:51:25] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: Puppet has 1 failures [13:51:34] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: Puppet has 1 failures [13:51:44] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Puppet has 1 failures [13:52:15] PROBLEM - puppet last run on es1009 is CRITICAL: CRITICAL: Puppet has 1 failures [13:52:18] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures [13:52:18] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Puppet has 1 failures [13:52:18] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Puppet has 1 failures [13:52:25] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: Puppet has 1 failures [13:52:45] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: Puppet has 1 failures [13:53:04] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 1 failures [13:53:10]
PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: Puppet has 1 failures [13:53:14] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: Puppet has 1 failures [14:01:18] RECOVERY - puppet last run on db1007 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [14:02:13] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [14:02:28] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [14:02:38] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [14:03:12] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:03:12] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [14:03:20] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:03:51] RECOVERY - puppet last run on es1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:03:52] RECOVERY - puppet last run on ms-be1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:03:52] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:04:18] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:04:28] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:04:49] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:04:51] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:04:51] RECOVERY - puppet last run 
on wtp1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:05:09] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:05:30] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:06:48] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:08:03] (PS1) KartikMistry: Initial Debian packaging for apertium-urd-hin [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/176924 [14:25:27] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: Puppet has 1 failures [14:25:47] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: Puppet has 1 failures [14:25:48] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:00] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:03] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:05] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:07] PROBLEM - puppet last run on mw1113 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:20] PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:57] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:58] PROBLEM - puppet last run on es1004 is CRITICAL: CRITICAL: Puppet has 1 failures [14:26:58] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:10] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:19] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:20] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:20] PROBLEM - puppet last run on elastic1029 is
CRITICAL: CRITICAL: Puppet has 1 failures [14:27:40] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:51] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:51] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:51] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:51] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: Puppet has 1 failures [14:27:59] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: Puppet has 1 failures [14:28:32] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 1 failures [14:30:51] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [14:31:11] RECOVERY - check if dhclient is running on analytics1003 is OK: PROCS OK: 0 processes with command name dhclient [14:31:29] RECOVERY - check configured eth on analytics1003 is OK: NRPE: Unable to read output [14:31:29] RECOVERY - check if salt-minion is running on analytics1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:32:20] RECOVERY - SSH on analytics1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [14:32:31] RECOVERY - Disk space on analytics1003 is OK: DISK OK [14:35:39] (PS1) Ottomata: Disable kafaktee on analytics1003 (at least tempoarily) [puppet] - https://gerrit.wikimedia.org/r/176926 [14:35:59] (CR) Ottomata: [C: 2 V: 2] Disable kafaktee on analytics1003 (at least tempoarily) [puppet] - https://gerrit.wikimedia.org/r/176926 (owner: Ottomata) [14:36:24] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:36:24] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [14:36:24] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled,
last run 13 seconds ago with 0 failures [14:36:34] RECOVERY - puppet last run on radium is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:36:34] !log depooling mw1220 to re-image as HHVM [14:36:41] Logged the message, Master [14:36:57] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:37:15] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [14:37:15] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:37:27] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:37:34] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:37:34] RECOVERY - puppet last run on virt1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:37:52] RECOVERY - puppet last run on mw1113 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:37:52] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:38:37] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:38:38] RECOVERY - puppet last run on wtp1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:38:49] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:38:49] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:38:58] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:39:07] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently 
enabled, last run 2 minutes ago with 0 failures [14:39:11] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:39:42] RECOVERY - puppet last run on es2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:40:48] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:41:19] yurikR: yes [14:41:40] Nemo_bis, do you know what they are? [14:42:21] yurikR: question too generic [14:42:38] !log depooling mw1220 for HHVM re-imaging [14:42:41] Are you asking about full IP ranges? [14:42:43] Logged the message, Master [14:44:11] RECOVERY - puppet last run on es1004 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [14:44:38] Nemo_bis, i'm looking for the IP ranges used by all our public-facing services. Mark or paravoid published them. They are at https://office.wikimedia.org/wiki/Wikipedia_Zero_Destination_IP_Addresses [14:46:16] this sounds like an XY problem, yurikR [14:46:20] what is the problem you're experiencing [14:46:56] paravoid, one of the partners is reporting that they are being blocked when testing, and asked to check if IP addresses we gave them are still valid [14:47:13] (blocked == not being zero rated) [14:47:16] what is their IP that *they* are seeing? 
[14:47:54] i told them to investigate it from their side - they don't know, and are investigating (testing was not done by an expert engineer) [14:48:11] in the mean time, i want to make sure that whatever ips we gave them are still valid [14:48:21] yurikR: well, there is https://wikitech.wikimedia.org/wiki/IP_addresses [14:48:30] You shouldn't provide IPs with any promise [14:48:44] imho [14:48:59] Nemo_bis: zero is an unfortunate exception to this general rule, I'm afraid [14:49:09] true, [14:49:16] I realised after hitting enter :) [14:49:36] we do "promise" the IPs that we hand out to the zero team [14:49:51] so that they can in turn give them to vendors [14:50:23] that's the lesser of two evils really, the other alternative was URL whitelisting which assumed clear-text traffic (no https) and deep packet inspection firewalls/gateways [14:50:27] We're not alone in this madness, see university proxies for paywalled databases :) [14:50:48] sigh ) [14:50:58] yurikR: the IPs in that page still stand, as far as I can see [14:51:10] this one? https://office.wikimedia.org/wiki/Wikipedia_Zero_Destination_IP_Addresses [14:51:14] but it's hard to debug something with such little evidence or even an accurate description of the problem, sorry [14:51:28] i understand, that's why i pushed back on it to their side [14:51:33] yes, these [14:51:49] fwiw, operations/dns, "grep addrs config-geo" [14:52:42] I don't see anything out-of-range there [14:52:53] thx, will do. In the mean time, i will re-send them all ips in the "effective nov 21, 2013" block [14:53:01] both text & multimedia [14:53:33] i assume bits and other misc stuff is also part of those ranges [14:54:14] yes they are [14:54:39] if they weren't, I assume you would have heard complaints from multiple partners, no? 
[14:55:11] heh, you assume people do proper testing [14:55:32] no, I assume that if it's not working they'd have angry customers [14:55:47] :) [14:55:54] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Puppet has 1 failures [14:56:22] it takes a very long time for there to be some customer diligent enough to notice the problem, report it, get it all the way to tech department that emails us. We are actually thinking of introducing some feedback mechanism directly [14:56:39] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: Puppet has 1 failures [14:56:41] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures [14:56:41] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Puppet has 1 failures [14:56:41] fair enough :) [14:56:49] we suspect that in case something is not zero-rated, people will simply stop using the service [14:56:52] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:03] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:03] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:03] PROBLEM - puppet last run on db2029 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:04] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:04] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:32] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:52] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:54] PROBLEM - puppet last run on es1007 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:54] PROBLEM - puppet last run on polonium is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:54] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:54] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: Puppet has 1 
failures [14:58:06] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:07] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:07] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:15] PROBLEM - puppet last run on pc1002 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:16] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:16] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:24] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:46] PROBLEM - puppet last run on mw1055 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:46] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:46] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:46] PROBLEM - puppet last run on db1016 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:54] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:56] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:57] PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: Puppet has 1 failures [14:59:00] I hate this check [14:59:28] PROBLEM - puppet last run on analytics1016 is CRITICAL: CRITICAL: Puppet has 1 failures [14:59:29] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Puppet has 1 failures [14:59:29] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures [14:59:29] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: Puppet has 1 failures [14:59:47] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Puppet has 1 failures [14:59:58] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [15:05:01] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: puppet fail [15:07:37] 
RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [15:08:05] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:09:13] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [15:09:23] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:09:34] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:09:34] RECOVERY - puppet last run on es1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:09:34] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:09:34] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:09:34] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:09:34] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:09:53] RECOVERY - puppet last run on lithium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:09:57] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:10:04] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:10:04] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:10:13] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:10:13] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, 
last run 2 minutes ago with 0 failures [15:10:14] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:10:14] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:10:23] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:10:24] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:10:33] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:10:42] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:10:43] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:10:44] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:10:55] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:10:55] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:05] RECOVERY - puppet last run on pc1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:05] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:11:15] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:17] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:17] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:25] RECOVERY - puppet last run on 
tin is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:31] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:31] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:45] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:11:53] (03PS1) 10Hoo man: Enable Wikidata data transclusion for commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176929 [15:11:59] aude: ^ FYI [15:12:12] just so that we don't duplicate change sets :P [15:12:19] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:16:06] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:21] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:17:04] (03CR) 10Manybubbles: [C: 031] "Patch looks fine. Confirmed community consent." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176686 (owner: 10Vogone) [15:18:03] (03CR) 10Manybubbles: [C: 031] "Change looks fine and confirmed community consensus." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176808 (owner: 10Vogone) [15:18:44] (03CR) 10Manybubbles: [C: 031] "If gi11es says this is a good idea then cool!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176912 (owner: 10Gilles) [15:19:25] gi11es: so you resolved the issues with chaining thumbnails? Or was it that it always worked great for JPGs but never PNGs? [15:19:36] anomie: I'll SWAT today [15:19:44] manybubbles: ok1 [15:19:49] 1 [15:19:58] !!111oneoneone [15:21:26] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: Puppet has 1 failures [15:23:47] <^d> anomie: You left out eleven. 
[15:24:07] (PS1) Manybubbles: Create new pool counter for prefix searches [mediawiki-config] - https://gerrit.wikimedia.org/r/176931 [15:24:09] (PS1) Manybubbles: Lower full text search queue [mediawiki-config] - https://gerrit.wikimedia.org/r/176932 [15:24:28] (CR) Hashar: base: Make atop log retention configurable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/176909 (owner: Yuvipanda) [15:24:49] (CR) Manybubbles: [C: -1] "Should be deployed before Cirrus support." [mediawiki-config] - https://gerrit.wikimedia.org/r/176931 (owner: Manybubbles) [15:25:53] (CR) Manybubbles: [C: -1] "We should do this after splitting the prefix search into another queue in cirrus. And we should be ready to roll it back if there are iss" [mediawiki-config] - https://gerrit.wikimedia.org/r/176932 (owner: Manybubbles) [15:26:29] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 8 failures [15:28:51] !log rebooting analytics1033 to verify bios settings [15:28:54] Logged the message, Master [15:31:09] PROBLEM - HHVM rendering on mw1220 is CRITICAL: Connection refused [15:31:42] RECOVERY - puppet last run on virt1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:31:53] (CR) Filippo Giunchedi: [C: 1] "let me know when you plan to deploy this so we'll keep an eye on the image scalers and swift" [mediawiki-config] - https://gerrit.wikimedia.org/r/176912 (owner: Gilles) [15:32:49] PROBLEM - Host analytics1033 is DOWN: PING CRITICAL - Packet loss = 100% [15:33:59] RECOVERY - HHVM rendering on mw1220 is OK: HTTP OK: HTTP/1.1 200 OK - 70271 bytes in 2.198 second response time [15:34:42] RECOVERY - Host analytics1033 is UP: PING OK - Packet loss = 0%, RTA = 2.15 ms [15:34:56] (CR) Manybubbles: "Its scheduled to go out in about 45 minutes."
[mediawiki-config] - https://gerrit.wikimedia.org/r/176912 (owner: Gilles) [15:34:59] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:36:13] manybubbles: ack (176912) [15:36:33] godog: sweet. [15:49:13] !log repooled mw1220, re-imaging to hhvm complete [15:49:16] Logged the message, Master [15:55:17] manybubbles: I resolved the rotation issue for JPGs. I stopped trying to make it work for PNGs, I figured that the results from JPGs would be sufficient to see if it's worth the effort to implement for all formats [15:57:43] (CR) Anomie: [C: 1] "Seems sane. [[meta:User:WMF Trademark Abuse]] appears to exist." [mediawiki-config] - https://gerrit.wikimedia.org/r/176876 (owner: Deskana) [15:59:28] anyone seen vogone to verify SWAT patches? [16:00:04] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141202T1600). Please do the needful. [16:00:41] Is manybubbles doing it? [16:01:04] marktraceur: He claimed it 41 minutes ago [16:01:23] Wow, on the ball [16:01:23] (CR) Manybubbles: [C: 2] Enable JPG thumbnail chaining on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/176912 (owner: Gilles) [16:01:31] (Merged) jenkins-bot: Enable JPG thumbnail chaining on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/176912 (owner: Gilles) [16:01:44] Gasp, exciting [16:01:44] marktraceur: yeah! this morning I'm being good! [16:02:47] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable jpg thumbnail chaining on commons (duration: 00m 06s) [16:02:49] Logged the message, Master [16:03:03] godog: ^^^ [16:03:08] gi11es: ^^^ [16:03:20] (CR) Aude: [C: 1] Enable Wikidata data transclusion for commons [mediawiki-config] - https://gerrit.wikimedia.org/r/176929 (owner: Hoo man) [16:03:24] manybubbles: testing...
[16:03:37] looking for vogone or anyone else that can verify https://gerrit.wikimedia.org/r/#/c/176686 and https://gerrit.wikimedia.org/r/#/c/176808 [16:03:56] manybubbles: aye, thanks! [16:03:59] gi11es: I seem to be seeing more of Dec 2 16:02:57 mw1155: PHP Warning: mwstore://local-swift-eqiad/local-thumb/f/f4/Ms61_reuil.jpg/1024px-Ms61_reuil.jpg was not stored with SHA-1 metadata. in /srv/mediawiki/php-1.25wmf9/includes/filebackend/SwiftFileBackend.php on line 668 [16:04:08] is that ok/to be expected? [16:04:26] its not a devestating flood but its happening [16:04:43] ok, I've never seen that error before but it's probably related [16:04:56] the chained sizes are powers of 2, 1024 is one of them [16:05:06] gi11es: I'm here btw if there's sth strange going on [16:06:33] gi11es and godog: I also see 512 in there too [16:06:39] yep, makes sense [16:07:07] is it ok? I mean, its a WARNING but I don't know if that means we're causing trouble for people or not [16:07:18] I have to look into what that warning means [16:07:44] I'll check if that particular image is served correctly first [16:07:49] it looks ok [16:07:55] I tried one and it was served [16:08:09] I'm willing to let the warning sit for now as long as you are looking into it [16:08:47] https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/Ms61_reuil.jpg/1024px-Ms61_reuil.jpg [16:09:23] yeah it seems it emits the warning when backfilling sha1 for a given image, so I guess we weren't serving that before [16:10:54] chaining is working, based on a test image of mine (i.e. the expected sizes already existed by the time I requested them) [16:11:22] now I'll have a look at what that warning means [16:11:41] gi11es: cool. We can live with it so long as you are working on it. a SWAT for a later date [16:11:52] yep [16:12:54] (CR) Alexandros Kosiaris: [C: -1] "Very minor comment just to be on the same grounds as the rest of the packages.
Otherwise LGTM" (2 comments) [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/176924 (owner: KartikMistry) [16:12:57] I guess it's probably time to make a vagrant role for a swift backend :) [16:13:25] gi11es: +1 [16:14:20] (PS1) Giuseppe Lavagetto: HAT: remove the last references to the hhvm pool [puppet] - https://gerrit.wikimedia.org/r/176939 [16:14:41] the warning looks harmless, as godog says it's just stating that it's repairing data as it finds it [16:14:59] <_joe_> I finally nailed down the ganglia instability :/ [16:15:01] (PS2) KartikMistry: Initial Debian packaging for apertium-urd-hin [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/176924 [16:15:16] <_joe_> well, metrics instability [16:15:32] ok, given no vogone I'm going to call SWAT done. [16:15:41] (PS2) Giuseppe Lavagetto: HAT: remove the last references to the hhvm pool [puppet] - https://gerrit.wikimedia.org/r/176939 [16:15:47] gi11es: nice, puts for swift went up ~2x and post too, http://gdash.wikimedia.org/dashboards/swift.eqiad-prod/ [16:15:53] (PS2) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-br-fr] - https://gerrit.wikimedia.org/r/176885 [16:15:56] (CR) Giuseppe Lavagetto: [C: 2] HAT: remove the last references to the hhvm pool [puppet] - https://gerrit.wikimedia.org/r/176939 (owner: Giuseppe Lavagetto) [16:15:58] (CR) Alexandros Kosiaris: [C: -1] "Same as for https://gerrit.wikimedia.org/r/#/c/176924/ and a question: That GPL2 license when every other packages is GPL3 is deliberate r" (3 comments) [debs/contenttranslation/apertium-id-ms] - https://gerrit.wikimedia.org/r/176889 (owner: KartikMistry) [16:16:23] (PS2) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-eo-en] - https://gerrit.wikimedia.org/r/176886 [16:16:31] (CR) Giuseppe Lavagetto: [V: 2] HAT: remove the last references to the hhvm pool [puppet] -
https://gerrit.wikimedia.org/r/176939 (owner: Giuseppe Lavagetto) [16:16:50] godog: the expected upside should be lower CPU usage on the scalers [16:18:51] or less spikes caused by thumbnailing large images, at least [16:18:59] (PS1) BBlack: Add yuvipanda to icinga lists for ops people [puppet] - https://gerrit.wikimedia.org/r/176940 [16:19:29] gi11es: indeed, we could even increase the workers there now, I'm not overly concerned by the additional PUTs, we'll see tomorrow how it does over 24h [16:19:31] (CR) Yuvipanda: [C: 1] "Aha!" [puppet] - https://gerrit.wikimedia.org/r/176940 (owner: BBlack) [16:19:51] (CR) BBlack: [C: 2] Add yuvipanda to icinga lists for ops people [puppet] - https://gerrit.wikimedia.org/r/176940 (owner: BBlack) [16:21:13] (CR) Alexandros Kosiaris: [C: 2] Added initial Debian packaging [debs/contenttranslation/apertium-br-fr] - https://gerrit.wikimedia.org/r/176885 (owner: KartikMistry) [16:21:21] (CR) Alexandros Kosiaris: [V: 2] Added initial Debian packaging [debs/contenttranslation/apertium-br-fr] - https://gerrit.wikimedia.org/r/176885 (owner: KartikMistry) [16:21:41] (PS2) KartikMistry: Added initial Debian packaging for apertium-id-ms [debs/contenttranslation/apertium-id-ms] - https://gerrit.wikimedia.org/r/176889 [16:22:48] (CR) KartikMistry: Added initial Debian packaging for apertium-id-ms (1 comment) [debs/contenttranslation/apertium-id-ms] - https://gerrit.wikimedia.org/r/176889 (owner: KartikMistry) [16:22:59] <_joe_> !log depooling mw1071-1080 [16:23:02] Logged the message, Master [16:30:33] PROBLEM - Host mw1070 is DOWN: PING CRITICAL - Packet loss = 100% [16:31:14] <_joe_> ugh, my bad [16:32:44] RECOVERY - Host mw1070 is UP: PING OK - Packet loss = 0%, RTA = 4.47 ms [16:33:28] (PS1) Alexandros Kosiaris: Fix mathoid.svc LVS check [puppet] - https://gerrit.wikimedia.org/r/176942 [16:40:56] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: puppet fail
[16:45:49] manybubbles: I'm on IRC now [16:46:35] Vogone: ah. I've moved back to other work. can you push them to the next SWAT? [16:47:01] well, tomorrow then :) [16:47:13] I'm usually not awake at night :p [16:48:04] Vogone: sorry! Is there a SWAT slot that lines up better with your day time? [16:48:42] well, it's no problem for me to wait 24h, the change is not for me ;) [16:49:37] yeah - its just that you'd have to be awake in 24h. if its the middle of the night for you now then it wouldn't help too much. but we have SWATs a couple hours in either direction of this one [16:51:28] sorry, I meant I'm not awake at next SWAT … the current slot is perfect, I've just been busy and forgot about SWAT, so it's my fault :) [16:52:25] ah! [16:55:03] cmjohnson: Are those new virt servers racked? [16:55:45] i havent' received them yet [16:55:58] the shipment was the labs spare disk shelf [16:56:04] lemme check on them [16:56:31] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:56:34] oh? Someone yesterday said that they were in already. Maybe I misunderstood [16:57:48] I think robh said that but I was just back in receiving and they were not there [16:58:10] <_joe_> !log uploaded hhvm 3.3.0+dfsg1-1+wm5 [16:58:15] Logged the message, Master [16:58:31] they're in the area [16:59:06] cmjohnson: ok then :) Let me know when I can start imaging. [16:59:17] cmjohnson: it tracks to say delivered [16:59:24] the info on the ticket says so [16:59:28] on the 23rd [16:59:40] oh, wait [16:59:42] im reading this wrong [16:59:50] 28-Nov-2014 17:00:00 MST Requested Delivery Date [16:59:55] 24-Nov-2014 12:00:00 GMT (est.) Est Arrival at Dest [17:00:00] im not really sure wtf [17:00:07] yeah that site sux [17:00:09] cmjohnson: did you want to call the freight company or shall i? [17:00:15] i figured you were ;] [17:00:17] I can call them [17:00:32] i love old dominion [17:00:36] they jsut email us and it works. 
[17:00:49] ceva is ok but they bash shit up a lot [17:01:04] i've never heard of these expeditors freight [17:01:33] yep [17:06:26] PROBLEM - nutcracker port on mw1074 is CRITICAL: Connection refused by host [17:06:37] PROBLEM - DPKG on mw1217 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:07:15] PROBLEM - HHVM processes on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:07:21] PROBLEM - nutcracker process on mw1074 is CRITICAL: Connection refused by host [17:07:35] PROBLEM - DPKG on mw1074 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:08:35] PROBLEM - DPKG on mw1219 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:08:56] PROBLEM - RAID on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:09:16] PROBLEM - check configured eth on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:09:35] PROBLEM - check if dhclient is running on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:09:37] RECOVERY - nutcracker port on mw1074 is OK: TCP OK - 0.000 second response time on port 11212 [17:09:52] <_joe_> mh re-scheduling downtime it is [17:09:55] PROBLEM - check if salt-minion is running on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:09:56] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: Puppet has 102 failures [17:10:17] RECOVERY - nutcracker process on mw1074 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [17:10:35] PROBLEM - nutcracker port on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[17:10:35] PROBLEM - DPKG on mw1071 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:10:35] RECOVERY - DPKG on mw1074 is OK: All packages OK [17:10:36] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Puppet has 103 failures [17:10:45] PROBLEM - nutcracker process on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:10:56] PROBLEM - DPKG on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:10:57] PROBLEM - puppet last run on mw1072 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [17:11:35] RECOVERY - DPKG on mw1219 is OK: All packages OK [17:12:47] RECOVERY - DPKG on mw1217 is OK: All packages OK [17:12:47] PROBLEM - puppet last run on mw1217 is CRITICAL: CRITICAL: Puppet has 102 failures [17:13:35] RECOVERY - DPKG on mw1071 is OK: All packages OK [17:14:35] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: Puppet has 1 failures [17:15:18] (CR) Chad: [C: 2 V: 2] Upgrade phabricator plugins to force UTF-8 for conduit [gerrit/plugins] - https://gerrit.wikimedia.org/r/176336 (owner: QChris) [17:15:20] RECOVERY - check configured eth on mw1072 is OK: NRPE: Unable to read output [17:15:27] RECOVERY - check if dhclient is running on mw1072 is OK: PROCS OK: 0 processes with command name dhclient [17:15:46] RECOVERY - check if salt-minion is running on mw1072 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [17:16:15] RECOVERY - HHVM processes on mw1072 is OK: PROCS OK: 1 process with command name hhvm [17:16:26] RECOVERY - nutcracker port on mw1072 is OK: TCP OK - 0.000 second response time on port 11212 [17:16:46] RECOVERY - nutcracker process on mw1072 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [17:16:55] RECOVERY - DPKG on mw1072 is OK: All packages OK [17:17:45] RECOVERY - RAID on mw1072 is OK: OK: no RAID installed [17:18:01] (PS1) GWicke: Use Cassandra's datacenter1 default
[puppet/cassandra] - https://gerrit.wikimedia.org/r/176954 [17:22:31] (CR) Ottomata: [C: 2 V: 2] Use Cassandra's datacenter1 default [puppet/cassandra] - https://gerrit.wikimedia.org/r/176954 (owner: GWicke) [17:24:51] (PS1) GWicke: Update the cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/176958 [17:25:36] (CR) Ottomata: [C: 2 V: 2] Update the cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/176958 (owner: GWicke) [17:28:53] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 102 failures [17:29:15] PROBLEM - DPKG on mw1211 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:29:56] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Puppet has 102 failures [17:30:34] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 101 failures [17:30:34] PROBLEM - DPKG on mw1213 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:30:52] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 102 failures [17:31:24] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 101 failures [17:33:38] RECOVERY - DPKG on mw1213 is OK: All packages OK [17:33:39] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet has 1 failures [17:34:25] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:35:54] Seems like Gerrit is really slow?
See the timestamps at the beginning of this job: https://integration.wikimedia.org/ci/job/mwext-DonationInterface-npm/484/console [17:35:56] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:38:14] RECOVERY - DPKG on mw1211 is OK: All packages OK [17:38:14] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures [17:40:03] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:40:03] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:41:57] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:42:56] (PS6) BryanDavis: logstash: Forward syslog events for apache2 + hhvm [puppet] - https://gerrit.wikimedia.org/r/176693 [17:43:11] <_joe_> awight: jenkins, not gerrit then? [17:43:30] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:43:58] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:46:46] RECOVERY - puppet last run on mw1217 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:47:14] (CR) Giuseppe Lavagetto: [C: -1] "hiera config needs fixing, and one small comment" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/176693 (owner: BryanDavis) [17:47:45] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:49:48] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:51:07] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [17:51:25] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is
currently enabled, last run 36 seconds ago with 0 failures [17:56:40] (PS3) Spage: $wgContentHandlerUseDB true everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/170129 (https://bugzilla.wikimedia.org/49193) [17:57:39] !log Deployed scap @ 6694d147a5b757dfbc747f0732185b014e82e9bb [17:57:42] Logged the message, Master [17:58:13] (PS7) BryanDavis: logstash: Forward syslog events for apache2 + hhvm [puppet] - https://gerrit.wikimedia.org/r/176693 [17:58:23] (CR) BryanDavis: logstash: Forward syslog events for apache2 + hhvm (2 comments) [puppet] - https://gerrit.wikimedia.org/r/176693 (owner: BryanDavis) [17:58:49] !log reedy Synchronized wmf-config/: nooop to test scap (duration: 00m 05s) [17:58:51] Logged the message, Master [17:59:50] (CR) Spage: "Removing my -2, but I feel it's Chris Steipp's call to +2 this. I think blocker Bug T73163 is still open." [mediawiki-config] - https://gerrit.wikimedia.org/r/170129 (https://bugzilla.wikimedia.org/49193) (owner: Spage) [18:00:05] maxsem, kaldari: Dear anthropoid, the time has come. Please deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141202T1800).
[18:00:53] (PS1) Hoo man: Adjust UpdateRepo debug log for Wikidata wmf10 deploy [mediawiki-config] - https://gerrit.wikimedia.org/r/176963 [18:06:06] !log Reverted deployment of scap 6694d147a5b757dfbc747f0732185b014e82e9bb, scap now at b8fb82eb1834e3691287a6e24f8384c6c2259710 [18:06:08] Logged the message, Master [18:06:27] !log reedy Synchronized wmf-config/: noop for scap test (duration: 00m 06s) [18:06:29] Logged the message, Master [18:06:39] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: Puppet has 102 failures [18:07:38] PROBLEM - DPKG on mw1077 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:08:04] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet has 102 failures [18:08:15] !log repooling mw121[0-9] as HHVM [18:08:18] Logged the message, Master [18:08:39] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: Puppet has 102 failures [18:09:21] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: Puppet has 102 failures [18:10:29] RECOVERY - DPKG on mw1077 is OK: All packages OK [18:10:29] PROBLEM - puppet last run on mw1077 is CRITICAL: CRITICAL: Puppet has 102 failures [18:15:29] (CR) Aude: [C: 1] Adjust UpdateRepo debug log for Wikidata wmf10 deploy [mediawiki-config] - https://gerrit.wikimedia.org/r/176963 (owner: Hoo man) [18:18:31] PROBLEM - HHVM rendering on mw1078 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:20:57] manybubbles ^d ottomata|afk are we meeting in 10?
[18:21:28] RECOVERY - HHVM rendering on mw1078 is OK: HTTP OK: HTTP/1.1 200 OK - 70363 bytes in 0.306 second response time [18:21:39] <^d> i suppose so [18:22:40] cool [18:24:11] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:29:17] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:29:27] PROBLEM - puppet last run on search1009 is CRITICAL: CRITICAL: Puppet has 1 failures [18:29:38] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: Puppet has 1 failures [18:30:17] RECOVERY - puppet last run on mw1076 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:30:38] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Puppet has 1 failures [18:30:47] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Puppet has 1 failures [18:31:07] (PS1) Hoo man: Set displayStatementsOnProperties to true for beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/176973 [18:31:23] PROBLEM - puppet last run on wtp1009 is CRITICAL: CRITICAL: Puppet has 1 failures [18:34:18] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:35:27] (PS1) GWicke: Cherry-pick trebuchet pull #15 [puppet] - https://gerrit.wikimedia.org/r/176976 [18:35:30] <_joe_> !log repooling mw1076-mw1080 [18:35:33] Logged the message, Master [18:36:29] (CR) jenkins-bot: [V: -1] Cherry-pick trebuchet pull #15 [puppet] - https://gerrit.wikimedia.org/r/176976 (owner: GWicke) [18:37:41] (PS2) GWicke: Cherry-pick trebuchet pull #15 [puppet] - https://gerrit.wikimedia.org/r/176976 [18:39:24] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [18:40:48] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[18:41:19] RECOVERY - puppet last run on search1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:41:30] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:42:47] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:43:27] RECOVERY - puppet last run on wtp1009 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:45:00] (CR) Ryan Lane: [C: 1] Cherry-pick trebuchet pull #15 [puppet] - https://gerrit.wikimedia.org/r/176976 (owner: GWicke) [18:51:04] !log depooling mw1209 for HHVM re-imaging [18:51:10] Logged the message, Master [18:54:55] (CR) Ori.livneh: [C: 1] "Has this been tested anywhere?" [puppet] - https://gerrit.wikimedia.org/r/176976 (owner: GWicke) [18:56:18] ottomata: let me know if you find some time today for #8953 or I can take a look tomorrow european morning too [18:57:12] if i could get my trusty vagrant machine up i'd just do it... [18:57:13] grrr [18:57:40] although [18:57:46] i'm wondering how it is installed in precise nodes [18:57:54] it is currently packages for lucid [18:57:59] unless someone just didn't update the changelog [18:58:27] hm, yeah, i just installed poolcounter with a simple apt-get install [18:58:29] hm [18:58:42] perhaps just copied in reprepro then [18:58:47] (CR) GWicke: "@ori: I haven't tested it, but if you look at the code it's doing the same thing for deploy.repo-name a few lines earlier, and that works " [puppet] - https://gerrit.wikimedia.org/r/176976 (owner: GWicke) [18:59:17] well, i mean, i dunno taht much about apt or reprepro, but I Just did apt-get install on my trusty box [18:59:23] and it pulled it out of apt.wm.o and installed it [18:59:40] manybubbles: ^ [19:00:05] Reedy, greg-g: Dear anthropoid, the time has come.
Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141202T1900). [19:00:09] hm. [19:00:15] iunno [19:00:49] yeah so I think what might have happened is that it got simply copied (e.g. via reprepro) from distro to distro [19:02:11] is that ok to do? [19:02:29] awight: Did you scap after deploying those new messages? Noting the l10nupdate is essentially a noop atm [19:02:52] and, if not, what should the version change to? [19:02:57] i'm not changing sources, just rebuilding [19:03:04] 1.0.2-precise? [19:03:11] sorry [19:03:12] -trusty [19:03:13] ? [19:03:15] godog: ^? [19:03:50] (PS1) Reedy: Non wikipedias to 1.25wmf10 [mediawiki-config] - https://gerrit.wikimedia.org/r/176983 [19:04:01] hoo: Did I see a bump commit earlier for wikidata? [19:04:26] huh? [19:04:34] ottomata: same version would work I think if you are not changing anything and uploading brand new to trusty [19:04:55] hoo: submodule update for wikidata in wmf10? [19:04:59] hm ok [19:05:07] Not aware of that [19:05:08] aude: ^ [19:05:15] Last update should have been yesterday [19:05:17] godog, i'm not sure about this, but i'm worrie dreprepro will delete teh old version [19:05:22] if it is named the same [19:05:54] (PS1) Aude: Bump cache epoch for wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/176985 [19:06:04] ottomata: I think it is fine to reference the same (package, version) from precise and trusty [19:06:13] hoo: Reedy yes, it was yesterday [19:06:18] ok :) [19:06:23] ok [19:06:37] (CR) Reedy: [C: 2] Bump cache epoch for wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/176985 (owner: Aude) [19:06:47] (Merged) jenkins-bot: Bump cache epoch for wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/176985 (owner: Aude) [19:07:05] (PS2) Reedy: Adjust UpdateRepo debug log for Wikidata wmf10 deploy [mediawiki-config] - https://gerrit.wikimedia.org/r/176963 (owner: Hoo man) [19:07:09] (CR)
Reedy: [C: 2] Adjust UpdateRepo debug log for Wikidata wmf10 deploy [mediawiki-config] - https://gerrit.wikimedia.org/r/176963 (owner: Hoo man) [19:07:21] (Merged) jenkins-bot: Adjust UpdateRepo debug log for Wikidata wmf10 deploy [mediawiki-config] - https://gerrit.wikimedia.org/r/176963 (owner: Hoo man) [19:07:30] (PS2) Reedy: Enable Wikidata data transclusion for commons [mediawiki-config] - https://gerrit.wikimedia.org/r/176929 (owner: Hoo man) [19:07:36] (CR) Reedy: [C: 2] Enable Wikidata data transclusion for commons [mediawiki-config] - https://gerrit.wikimedia.org/r/176929 (owner: Hoo man) [19:07:45] (Merged) jenkins-bot: Enable Wikidata data transclusion for commons [mediawiki-config] - https://gerrit.wikimedia.org/r/176929 (owner: Hoo man) [19:08:02] (PS2) Reedy: Set displayStatementsOnProperties to true for beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/176973 (owner: Hoo man) [19:08:06] (CR) Reedy: [C: 2] Set displayStatementsOnProperties to true for beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/176973 (owner: Hoo man) [19:08:15] (Merged) jenkins-bot: Set displayStatementsOnProperties to true for beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/176973 (owner: Hoo man) [19:08:45] (CR) Reedy: [C: 1] Bouncehandler: Use the API pool for API requests [puppet] - https://gerrit.wikimedia.org/r/174951 (owner: Hoo man) [19:09:11] (PS2) Reedy: Non wikipedias to 1.25wmf10 [mediawiki-config] - https://gerrit.wikimedia.org/r/176983 [19:09:24] (CR) Reedy: [C: 2] Non wikipedias to 1.25wmf10 [mediawiki-config] - https://gerrit.wikimedia.org/r/176983 (owner: Reedy) [19:09:33] (Merged) jenkins-bot: Non wikipedias to 1.25wmf10 [mediawiki-config] - https://gerrit.wikimedia.org/r/176983 (owner: Reedy) [19:10:07] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf10
[19:10:14] Logged the message, Master [19:10:25] Reedy: I left a note in the bug tracker--is scap necessary to make on-disk messages files available? [19:10:25] !log reedy Synchronized wmf-config/: Wikidata config updates (duration: 00m 06s) [19:10:28] Logged the message, Master [19:10:39] awight: Yup, you've gotta scap if you change any messages [19:10:57] aha! ok I'm a fool then [19:11:13] (PS2) Reedy: Update LegalContactPages configuration per feedback from end user. [mediawiki-config] - https://gerrit.wikimedia.org/r/176876 (owner: Deskana) [19:11:18] (CR) Reedy: [C: 2] Update LegalContactPages configuration per feedback from end user. [mediawiki-config] - https://gerrit.wikimedia.org/r/176876 (owner: Deskana) [19:11:18] that at least solves part of the mystery. [19:11:18] awight: not a fool, under informed [19:11:22] Reedy: confirmed that someone ran a scap since then and the messages are indeed updated :) [19:11:27] (Merged) jenkins-bot: Update LegalContactPages configuration per feedback from end user. [mediawiki-config] - https://gerrit.wikimedia.org/r/176876 (owner: Deskana) [19:11:34] that's good at least [19:11:39] (that they're deployed!)
[19:12:18] (PS3) Reedy: Only enable Extension:Oversight on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/169611 (https://bugzilla.wikimedia.org/60373) [19:12:24] (CR) Reedy: [C: 2] Only enable Extension:Oversight on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/169611 (https://bugzilla.wikimedia.org/60373) (owner: Reedy) [19:12:32] (Merged) jenkins-bot: Only enable Extension:Oversight on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/169611 (https://bugzilla.wikimedia.org/60373) (owner: Reedy) [19:12:45] (PS2) Reedy: Add new import sources to dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/176686 (owner: Vogone) [19:12:50] (CR) Reedy: [C: 2] Add new import sources to dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/176686 (owner: Vogone) [19:12:58] (Merged) jenkins-bot: Add new import sources to dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/176686 (owner: Vogone) [19:13:18] (PS2) Reedy: Remove -hhvm suffix from beta multiversion config [mediawiki-config] - https://gerrit.wikimedia.org/r/173486 [19:13:29] (CR) Reedy: [C: 2] Remove -hhvm suffix from beta multiversion config [mediawiki-config] - https://gerrit.wikimedia.org/r/173486 (owner: Reedy) [19:13:39] (Merged) jenkins-bot: Remove -hhvm suffix from beta multiversion config [mediawiki-config] - https://gerrit.wikimedia.org/r/173486 (owner: Reedy) [19:14:05] (PS2) Reedy: Add new namespaces to dewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/176808 (owner: Vogone) [19:14:11] (CR) Reedy: [C: 2] Add new namespaces to dewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/176808 (owner: Vogone) [19:14:19] (Merged) jenkins-bot: Add new namespaces to dewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/176808 (owner: Vogone) [19:16:06] (PS2) Reedy: Remove OpenSeachXml from extension-list
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/176714 (owner: 10BryanDavis) [19:16:33] (03CR) 10Reedy: [C: 032] "We're not going to lose anything message wise if the extension is still about" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176714 (owner: 10BryanDavis) [19:16:43] (03Merged) 10jenkins-bot: Remove OpenSeachXml from extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176714 (owner: 10BryanDavis) [19:20:16] (03PS7) 10Reedy: Extra language names configuration for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176610 (owner: 10Dereckson) [19:20:21] (03CR) 10Reedy: [C: 032] Extra language names configuration for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176610 (owner: 10Dereckson) [19:20:29] (03Merged) 10jenkins-bot: Extra language names configuration for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176610 (owner: 10Dereckson) [19:21:02] (03PS2) 10Reedy: Fix Undefined index: fulltext [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173508 [19:21:06] (03CR) 10Reedy: [C: 032] Fix Undefined index: fulltext [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173508 (owner: 10Reedy) [19:21:19] (03Merged) 10jenkins-bot: Fix Undefined index: fulltext [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173508 (owner: 10Reedy) [19:21:37] Reedy: Whee. :-) [19:21:38] Krinkle|detached: We should probably try enabling the log of luls again for a few minutes ;D [19:21:45] James_F: Which one? :P [19:22:01] Reedy: Doing 13 of them in fewer minutes. :-) [19:22:08] heh [19:23:46] !log reedy Synchronized search-redirect.php: Fix undefined index spam (duration: 00m 06s) [19:23:49] Logged the message, Master [19:24:02] !log reedy Synchronized wmf-config/: Config updates (duration: 00m 06s) [19:24:05] Logged the message, Master [19:24:55] Reedy: wanna do https://gerrit.wikimedia.org/r/#/c/175796/ too? 
:) [19:25:06] reedy@fluorine:/a/mw-log$ du --si archive/error.log-20141118.gz [19:25:06] 40M archive/error.log-20141118.gz [19:25:07] Krinkle|detached: ^^ We were right, it compresses well xD [19:25:13] (03PS2) 10Reedy: Bring in cdb library via composer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175796 (owner: 10Legoktm) [19:25:48] (03CR) 10Reedy: [C: 032] Bring in cdb library via composer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175796 (owner: 10Legoktm) [19:26:09] (03Merged) 10jenkins-bot: Bring in cdb library via composer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175796 (owner: 10Legoktm) [19:26:39] !log reedy Synchronized multiversion/: CDB updates (duration: 00m 07s) [19:26:41] Logged the message, Master [19:26:54] !log reedy Synchronized wmf-config/missing.php: CDB updates (duration: 00m 06s) [19:26:57] Logged the message, Master [19:27:35] !log reedy Synchronized wmf-config/InitialiseSettings.php: Enable error log for next 10-15 minutes for the luls (duration: 00m 07s) [19:27:37] Logged the message, Master [19:27:41] the site is still up, woot :D [19:27:44] thanks! [19:28:12] reedy@fluorine:/a/mw-log$ du --si error.log [19:28:12] 135M error.log [19:28:27] legoktm: you should follow that patch up with the autoload order change [19:28:34] ah, right [19:28:52] bd808|LUNCH: well, mw's autoloader wouldn't have even been registered at that point... 
[19:30:37] reedy@fluorine:/a/mw-log$ grep -c GESHI_LANG_ROOT error.log [19:30:37] 268800 [19:30:39] srsly [19:32:28] (PS1) Legoktm: Set prepend-autoload: false, optimize-autoload: true for multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/176994 [19:32:30] !log reedy Synchronized wmf-config/InitialiseSettings.php: disable error logging (duration: 00m 05s) [19:32:35] Logged the message, Master [19:33:03] (PS3) Dzahn: (WIP) facilities: move to module [puppet] - https://gerrit.wikimedia.org/r/176863 [19:34:45] (PS1) Kaldari: Disabling WikiGrok on enwiki now that A/B test has concluded [mediawiki-config] - https://gerrit.wikimedia.org/r/176996 [19:35:11] (CR) Kaldari: [C: 2] Disabling WikiGrok on enwiki now that A/B test has concluded [mediawiki-config] - https://gerrit.wikimedia.org/r/176996 (owner: Kaldari) [19:36:20] (CR) Kaldari: [V: 2] Disabling WikiGrok on enwiki now that A/B test has concluded [mediawiki-config] - https://gerrit.wikimedia.org/r/176996 (owner: Kaldari) [19:37:14] PROBLEM - Apache HTTP on mw1111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:37:51] PROBLEM - Apache HTTP on mw1167 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:37:51] PROBLEM - Apache HTTP on mw1175 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:37:51] PROBLEM - Apache HTTP on mw1110 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:38:10] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:38:15] !log kaldari Synchronized wmf-config/InitialiseSettings.php: Syncing InitialiseSettings for disabling WikiGrok on en.wiki (A/B test done) (duration: 00m 05s) [19:38:32] PROBLEM - Apache HTTP on mw1121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:39:01] hmm, godog [19:39:05] maybe not? [19:39:05] File "pool/main/p/poolcounter/poolcounter_1.0.2_amd64.deb" is already registered with different checksums!
[19:39:55] !log reedy Synchronized php-1.25wmf9/extensions/SyntaxHighlight_GeSHi/: Fix noise in production (duration: 00m 06s) [19:39:58] Logged the message, Master [19:46:11] (PS1) Kaldari: Deprecating wgMFWikiGrokAbTestStartDate and wgMFWikiGrokAbTestEndDate [mediawiki-config] - https://gerrit.wikimedia.org/r/177003 [19:47:01] (PS2) Kaldari: Deprecating wgMFWikiGrokAbTestStartDate and wgMFWikiGrokAbTestEndDate [mediawiki-config] - https://gerrit.wikimedia.org/r/177003 [19:47:22] (CR) Kaldari: [C: 2 V: 2] Deprecating wgMFWikiGrokAbTestStartDate and wgMFWikiGrokAbTestEndDate [mediawiki-config] - https://gerrit.wikimedia.org/r/177003 (owner: Kaldari) [19:47:44] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: Puppet has 1 failures [19:48:24] PROBLEM - check if salt-minion is running on mw1188 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [19:48:24] ottomata: do you get this while trying to upload? [19:48:32] Ree WHAT? NO REEDY! [19:48:32] https://meta.wikimedia.org/wiki/Wikimedia_servers refers people to look at https://icinga.wikimedia.org for server load, but it requires login? [19:48:52] PROBLEM - Apache HTTP on mw1188 is CRITICAL: Connection refused [19:49:14] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Puppet has 2 failures [19:49:22] !log kaldari Synchronized wmf-config/mobile.php: Deprecating WikiGrok A/B test congif vars (duration: 00m 09s) [19:49:24] Logged the message, Master [19:50:00] (CR) BBlack: "As indicated in the related bug: I think we're good to merge this on our end, but I'd like to get some last-minute confirmation from someo" [puppet] - https://gerrit.wikimedia.org/r/173078 (owner: JanZerebecki) [19:51:05] PROBLEM - HHVM rendering on mw1188 is CRITICAL: Connection refused [19:51:21] Reedy: still deploying stuff?
[19:51:39] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.074 second response time [19:51:46] else, i'll put https://gerrit.wikimedia.org/r/#/c/177006/ for swat or something [19:51:58] * aude see kaldari dpeloying [19:52:05] aude: Nope, I think I'm done [19:52:09] ok [19:52:13] think we can wait [19:52:22] Has he sync'd? [19:52:23] !log restarted apache on mw1111 [19:52:24] aude: I'm almost done [19:52:25] Logged the message, Master [19:52:34] kaldari: ok [19:52:35] (PS8) BryanDavis: logstash: Forward syslog events for apache2 + hhvm [puppet] - https://gerrit.wikimedia.org/r/176693 [19:52:37] I keep dropping from freenode [19:52:45] or i can just do it when kaldari is done :) [19:52:52] feel free [19:52:59] ok [19:56:14] godog, i got it while doing reprepro include [19:56:19] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:56:30] wasn't sure if you had left for the day...so i built a new one with -trusty1 in the version :/ [19:56:38] and reprepro included it [19:56:40] that worked [19:56:48] not certain it was the right thing to do [19:57:51] ottomata: that's fine too :) but yeah the other way is "reprepro copy" [19:59:19] (PS4) Dzahn: facilities: move to module [puppet] - https://gerrit.wikimedia.org/r/176863 [20:00:25] Reedy: Is Extension:WikimediaMessages on the deployment train? [20:00:57] Reedy: Trying to figure out when this will go live: https://gerrit.wikimedia.org/r/#/c/176877/ [20:01:39] Deskana: Yeah, it is. So it'll go to group0 tomorrow, group1 in a week [20:01:39] Unless we backport [20:01:48] Reedy: Nah, that's not necessary. [20:01:51] Reedy: Thanks. :) [20:02:17] Reedy, aude: Looks like my deployment window is over, so I'll just wait for the SWAT window to finish (if that's OK) [20:02:39] aude: did the fixes for the sitegroup memcached usage thing get deployed?
[20:02:46] (PS1) Ottomata: Disable webrequest partition checks in icinga [puppet] - https://gerrit.wikimedia.org/r/177009 [20:02:52] ori: not yet [20:02:58] will hit Wikipedias tomorrow [20:03:20] aude: I'm done now [20:03:23] Well, it's on non-Wikipedia now AFAIS [20:03:25] (PS2) Ottomata: Disable webrequest partition checks in icinga [puppet] - https://gerrit.wikimedia.org/r/177009 [20:03:35] hoo: awesome [20:03:36] ok [20:03:40] Reedy: What about that config change you merged? Does that go live immediately? [20:03:43] kaldari: there's still an hour left of this window... I'm done, and I think there is only aude waiting so you can carry on [20:03:45] Deskana: Yeah [20:03:54] ori: what hoo says [20:04:02] thanks kaldari :) [20:04:13] Reedy: thanks [20:04:16] (CR) Krinkle: [C: 1] Set prepend-autoload: false, optimize-autoload: true for multiversion [mediawiki-config] - https://gerrit.wikimedia.org/r/176994 (owner: Legoktm) [20:04:29] aude: awesome, thanks [20:04:32] (PS2) Krinkle: multiversion: Set prepend-autoload: false, optimize-autoload: true for composer [mediawiki-config] - https://gerrit.wikimedia.org/r/176994 (owner: Legoktm) [20:04:34] aude: Just let me know when you're finished and I'll sneak one more change out after you :) [20:04:53] ok [20:05:02] will be quick (as jenkins allows) [20:05:14] aude: I need to grab some lunch anyway :) [20:05:18] ok [20:05:39] (CR) Ottomata: [C: 2] Disable webrequest partition checks in icinga [puppet] - https://gerrit.wikimedia.org/r/177009 (owner: Ottomata) [20:08:07] (PS1) Ottomata: Update cassandra module (fix) [puppet] - https://gerrit.wikimedia.org/r/177013 [20:08:28] (CR) Ottomata: [C: 2 V: 2] Update cassandra module (fix) [puppet] - https://gerrit.wikimedia.org/r/177013 (owner: Ottomata) [20:08:30] Krinkle: seems we're getting a few incomplete notices etc [20:08:30] 2014-12-02 19:32:40 mw1156 commonswiki: [b6526f4c] /w/thumb_handler.php/a/ac/Portal_Pfarrkirche_Mariae_Empfaengnis_Neubeuern-1.jpg/84px-Portal_Pfarrkirche_Mariae_Empfae
[20:08:30] ngnis_Neubeuern-1.jpg ErrorException from line 2822 of /srv/mediawiki/php-1.25wmf10/includes/GlobalFunctions.php: PHP Notice: [20:08:31] next line is stack trace [20:08:32] guess the stack trace is semi telling [20:08:33] #0 [internal function]: MWExceptionHandler::handleError(1024, '', '/srv/mediawiki/...', 2822, Array) [20:08:33] #1 /srv/mediawiki/php-1.25wmf10/includes/GlobalFunctions.php(2822): trigger_error('') [20:08:34] #2 /srv/mediawiki/php-1.25wmf10/includes/GlobalFunctions.php(2922): wfShellExec(''/usr/bin/conve...', 0, Array, Array, Array) [20:08:52] Reedy: Ha [20:08:53] Yeah [20:09:05] (CR) JanZerebecki: "They currently redirect to HTTPS via JavaScript, so I don't think notification before deployment is necessary." [puppet] - https://gerrit.wikimedia.org/r/173078 (owner: JanZerebecki) [20:09:08] I guess it's getting written two from two places at once [20:10:03] Krinkle: also, the trigger_error() is @ prefixed [20:11:28] !log aude Synchronized php-1.25wmf10/extensions/Wikidata: (no message) (duration: 00m 12s) [20:11:31] Logged the message, Master [20:11:32] * ^d twitches @ @ [20:11:55] mw1188 returned [127]: bash: /srv/deployment/scap/scap/bin/sync-common: No such file or directory [20:12:02] is that a known issue? [20:12:24] That might be one of the hosts that was reimaged today? [20:12:34] (CR) Krinkle: "Syntax correct?" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/173078 (owner: JanZerebecki) [20:12:50] Reedy: Hm.. why? [20:12:51] it has nothing there [20:12:54] What can @ do? [20:12:55] in /srv/mediawiki [20:13:00] Reedy: What can trigger_Error do? [20:13:03] suppose i can ignore [20:13:49] aude: I think it's broken. I saw YuviPanda mention it in another channel just now [20:13:58] ok [20:14:13] aude: yup, that one's just broken. it't out of pybal tho.
[20:14:32] !log repooled mw1209 [20:14:35] Logged the message, Master [20:14:40] * bd808 can't wait for pybal and scap to be closer friends [20:15:06] YuviPanda: you may like my pybal shell helpers [20:15:15] ooooohhhh wheeerre [20:15:48] YuviPanda: https://github.com/wikimedia/operations-puppet/blob/production/modules/admin/files/home/ori/.hosts/palladium [20:16:07] pybal query mw1189, pybal depool mw1189, pybal repool mw1189, etc. [20:16:29] Krinkle: https://github.com/wikimedia/mediawiki/commit/e53af95c9301ca092ffa1f7de022beb24d60ea52 [20:16:30] "clear last error" [20:17:17] (CR) JanZerebecki: [C: -1] "Uh. Thx. Will amend." [puppet] - https://gerrit.wikimedia.org/r/173078 (owner: JanZerebecki) [20:17:32] ori: why not move that into /usr/local/bin? [20:17:42] YuviPanda: fine by me [20:17:53] cool, I shall do that once I've mw1188 done. [20:18:13] RECOVERY - check if salt-minion is running on mw1188 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [20:19:40] Reedy: Ah, *that* trigger_error, not the one we fire in the debugging/exception classes [20:19:54] kaldari: i am done [20:19:59] nvm, I read it wrong. There is no trigger_error anyway in that class [20:21:15] (CR) Ori.livneh: "I asked: https://ru.wikinews.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BD%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8:%D0%A4%D0%BE%D1%80%D1%83%D0%BC/%D" [puppet] - https://gerrit.wikimedia.org/r/173078 (owner: JanZerebecki) [20:30:40] aude: did you finish. Sorry got dropped for a bit. [20:31:06] kaldari|2: yes [20:32:38] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [20:32:52] ANyone able to look at mailman? The queued messages are increasing quite dramatically http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=sodium.wikimedia.org&r=hour&z=default&jr=&js=&st=1417552277&v=15367&m=exim%20queued%20messages&vl=messages&z=large
[20:33:26] RECOVERY - Apache HTTP on mw1188 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.985 second response time [20:34:26] RECOVERY - HHVM rendering on mw1188 is OK: HTTP OK: HTTP/1.1 200 OK - 69046 bytes in 1.899 second response time [20:35:13] dramatic increase, but half the size of last week's peak. related to fundraising, i'd guess? [20:36:59] Reedy: matanya: re: wikimania videos.. doesnt look very good :/ [20:37:09] rsync: read errors mapping "/media/wikimania2014/Education I - Medicine.mov": Input/output error (5) [20:37:11] why? [20:37:12] and so on [20:37:15] WARNING: Social Machines II - Limits.mov failed verification -- update discarded (will try again). [20:37:20] just checked the screen [20:37:21] broken source ? [20:37:27] it seems like it, yea [20:37:37] i have some that made it [20:37:39] mutante: it arrived corrupted it seems [20:37:41] i/o errors [20:37:48] Chris has trouble from his macbook [20:37:58] the encoding was running all day [20:38:08] oh, you already started some? cool [20:38:11] that's something [20:39:23] done 1480 sec @ 9987 kbps so far [20:39:40] (PS2) JanZerebecki: Change ru.wikinews.org to HTTPS only. [puppet] - https://gerrit.wikimedia.org/r/173078 [20:40:20] who was the original owner? fae ? [20:40:35] i just see 2 files that actually arrived :/ [20:40:35] looking at mailman/sodium: i don't see errors in the logs and icinga is green. it's running 10.04 :\ [20:40:40] nope [20:40:40] office it [20:40:46] julie ? [20:40:55] yea [20:40:58] she was on the video team [20:41:09] jgage: nope fundraising emails are sent from a non wmf server [20:41:10] and who took the role from here?
[20:41:19] *her [20:41:28] jamesofur: hmm ok thanks, i'll dig deeper [20:41:34] yeah [20:42:01] (Draft1) JanZerebecki: DO NOT MERGE: disable icinga use of naggen for labs test [puppet] - https://gerrit.wikimedia.org/r/158339 [20:42:08] matanya: ideally, we need them to format a disk with ext3/4 or whatever, and send it again [20:42:16] (CR) JanZerebecki: [C: -1] DO NOT MERGE: disable icinga use of naggen for labs test [puppet] - https://gerrit.wikimedia.org/r/158339 (owner: JanZerebecki) [20:42:21] or go via ulsfo [20:42:41] Reedy: i wouldn't mind uploading from youtube if it is some much hassle [20:42:46] jgage: There's a few boxes that still are 10.04 [20:43:02] appreciate it [20:43:35] !log added jdouglas to wmf LDAP group [20:43:40] Logged the message, Master [20:44:03] matanya: i dont know who took the role from her yet [20:44:29] ok, i'll do my best in the mean time [20:45:08] what next? trying recovery software? [20:46:07] lol [20:46:17] it'd be easier to just direct upload them to labs from the office via ulsfo [20:47:12] Reedy: would cajoel be the guy to talk to ? [20:47:13] oh, do they exist in office? [20:47:36] matanya: techsupport@ i would say [20:47:40] Yeah, what they sent to eqiad wasn't the only copy (thank god!) [20:47:47] Messages from 208.80.154.4 temporarily deferred due to user complaints - 4.16.55.1; see http://postmaster.yahoo.com/421-ts01.html [20:47:48] good :) [20:47:51] the office uses ulsfo's interenet, therefore fast upload [20:47:55] i think Yahoo blocked us [20:48:03] (CR) CSteipp: "As the security person, I'm not blocking this. The security issues have been addressed." [mediawiki-config] - https://gerrit.wikimedia.org/r/170129 (https://bugzilla.wikimedia.org/49193) (owner: Spage) [20:48:14] how hard is it to be useful :) [20:48:29] I know yesterday we were throttled by google too [20:48:30] again?
[20:48:43] when Jeff Green looked, but the queue wasn't raising at all just steady [20:48:51] (CR) Gage: [C: 2] Cherry-pick trebuchet pull #15 [puppet] - https://gerrit.wikimedia.org/r/176976 (owner: GWicke) [20:49:06] what did you do jamesofur ? again spamming the world ? [20:49:21] I wish, at least I'd have fun then at the same time [20:50:47] it looks like all Yahoo in the log [20:52:01] maybe rsync will be cool enough to just retransfer the broken bytes of those videos on next attempt [20:52:43] that'd be nice [20:53:11] mutante: can you post the list of file for me somewhere, i'll do youtube in the mean time [20:53:24] i wouldn't want to duplicate myself [20:53:36] hmm, who was the one who reached out to yahoo last time... I vaguely remember a bugzilla about it [20:54:40] matanya: https://phabricator.wikimedia.org/P118 [20:54:55] uh oh, are we still backlogged? [20:54:56] look at all the .trash dirs :) [20:55:09] thanks [20:56:01] that "Barbican Hall" subdir is broken [20:56:02] /media/wikimania2014/Barbican Hall# ls [20:56:02] ls: reading directory .: Input/output error [20:57:36] mutante: why the education IV video size on youtube is 734MB and on the list it is 11GB ? [20:57:55] because youtube scaled it down? [20:58:02] no, full HD [20:58:09] 1920*1080 [20:58:17] did they alreayd re-encode it? then maybe it's broken? [20:58:36] does the end of it look like a "natural" ending? [20:58:56] oh, that did scale it down [20:58:57] 1280x720 [20:59:03] *they [21:00:04] Jeff_Green, if you're talking about mailman mail then yes [21:00:14] so, techsupport [21:01:32] oh gee, and now rsync died in general [21:01:35] Error: Cannot find master process to attach to! [21:01:54] what, my screen is "dead" [21:02:15] great [21:02:23] There is a screen on: 7818.pts-1.terbium(11/12/2013 05:49:26 PM)(Dead ???) [21:02:27] There is no screen to be attached matching 7818. [21:02:31] can you tell me a name in tech support?
[21:02:35] schroedingers screen [21:02:49] i'll make them upload it directly to labstore [21:03:06] matanya: just use the team alias [21:03:14] i'll cc you [21:04:01] jzerebecki: when you finish with ru, move to he [21:04:54] ok [21:05:47] matanya: RT #8416 vs #8334 [21:06:29] joy [21:06:47] matanya: yes :) [21:07:09] !log re-pooled mw1188 [21:07:13] Logged the message, Master [21:08:57] sent mutante [21:09:24] btw, the google mail server rejects mail i send to wm.o sometimes [21:09:58] Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does [21:09:58] not exist. Please try 550-5.1.1 double-checking the recipient's email [21:09:58] address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1 [21:09:58] http://support.google.com/mail/bin/answer.py?answer=6596 ea1si7846393wib.83 [21:09:58] - gsmtp [21:10:16] for address i know exist and i have checked for typos [21:10:39] my guess: they have Julie still on the techsupport@ alias but her account is gone [21:24:49] (PS1) GWicke: Add statsd host in restbase config [puppet] - https://gerrit.wikimedia.org/r/177078 [21:28:48] (PS2) GWicke: Add statsd host in restbase config [puppet] - https://gerrit.wikimedia.org/r/177078 [21:29:34] (PS1) Dzahn: move apache helper scripts and kill apachesync [puppet] - https://gerrit.wikimedia.org/r/177080 [21:30:36] YuviPanda | ori | mutante | ottomata | jgage: https://gerrit.wikimedia.org/r/#/c/177078/ [21:30:50] (CR) Dzahn: "note how this also removes the "# rsyncd setup for httpd configs " part, is that correct to do?" [puppet] - https://gerrit.wikimedia.org/r/177080 (owner: Dzahn) [21:31:30] gwicke: in a meeting, can look after.
[21:32:35] (CR) Yuvipanda: Add statsd host in restbase config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/177078 (owner: GWicke) [21:33:39] (CR) Ottomata: [C: 1] Add statsd host in restbase config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/177078 (owner: GWicke) [21:34:25] (CR) Dzahn: "now also see this https://gerrit.wikimedia.org/r/#/c/177080/ instead" [puppet] - https://gerrit.wikimedia.org/r/164508 (owner: Ori.livneh) [21:34:47] (PS3) GWicke: Add statsd host in restbase config [puppet] - https://gerrit.wikimedia.org/r/177078 [21:36:52] (CR) GWicke: Add statsd host in restbase config (2 comments) [puppet] - https://gerrit.wikimedia.org/r/177078 (owner: GWicke) [21:36:58] <^d> qchris: You still need https://gerrit.wikimedia.org/r/#/q/status:open+project:gerrit,n,z? [21:37:02] <^d> They're all from '13. [21:38:34] ^d: As we want to stay closer to upstream, and don't mant to update gerrit, I do not need them any longer. [21:39:12] * ^d is just waiting for our glorious phabricator future :p [21:39:13] I guess I should abandon them. [21:40:09] Oh. I see you're doing the work for me :-) Thanks. [21:40:12] <^d> I am :) [21:40:51] ^d: the callsigns thing? is that at a "we are calling it point"? [21:41:58] <^d> Hmm. [21:42:00] (Abandoned) John F. Lewis: deployment: fix lint [puppet] - https://gerrit.wikimedia.org/r/170493 (owner: John F. Lewis)
[21:43:19] (CR) Dzahn: [C: 2] "@neon:/usr/local/lib/nagios/plugins# /usr/lib/nagios/plugins/check_http -H mathoid.svc.eqiad.wmnet -p 10042 -I mathoid.svc.eqiad.wmnet -u " [puppet] - https://gerrit.wikimedia.org/r/176942 (owner: Alexandros Kosiaris) [21:45:15] (PS1) Ottomata: Send kafka jmx stats to graphite using JMXtrans [puppet] - https://gerrit.wikimedia.org/r/177084 [21:45:23] (CR) Chad: [C: 1] Move gerrit's remaining ITS templates into gerrit module [puppet] - https://gerrit.wikimedia.org/r/176264 (owner: QChris) [21:45:25] (CR) GWicke: "@alex, daniel: Thanks for looking into this!" [puppet] - https://gerrit.wikimedia.org/r/176942 (owner: Alexandros Kosiaris) [21:46:15] (CR) Chad: [C: 1] Remove hooks-bugzilla configuration [puppet] - https://gerrit.wikimedia.org/r/176265 (owner: QChris) [21:46:34] (CR) Jdouglas: [C: 1] Add statsd host in restbase config [puppet] - https://gerrit.wikimedia.org/r/177078 (owner: GWicke) [21:47:05] (CR) Chad: [C: 1] Drop 'Phabricator' suffix from gerrit's ITS actions [puppet] - https://gerrit.wikimedia.org/r/176266 (owner: QChris) [21:47:16] (PS2) Ottomata: Send kafka jmx stats to graphite using JMXtrans [puppet] - https://gerrit.wikimedia.org/r/177084 [21:47:43] (CR) Chad: [C: 1] Switch Gerrit's 'Report Bug' url to Phabricator [puppet] - https://gerrit.wikimedia.org/r/176267 (owner: QChris) [21:47:54] <^d> qchris: That whole string of gerrit/phab changes lgtm too ^ [21:48:03] !log depooling mw1187 for re-imaging [21:48:07] Logged the message, Master [21:48:08] ^d: awesome! [21:48:14] Thanks [21:49:30] !log depooling mw1186 for re-imgaging [21:49:32] Logged the message, Master [21:50:33] YuviPanda: <3 <3 [21:50:52] ori: :) I suspect we might be done tomorrow or day after.
[21:51:04] (CR) Ottomata: [C: 2] Send kafka jmx stats to graphite using JMXtrans [puppet] - https://gerrit.wikimedia.org/r/177084 (owner: Ottomata) [21:51:50] (PS4) Ori.livneh: Add statsd host in restbase config [puppet] - https://gerrit.wikimedia.org/r/177078 (owner: GWicke) [21:51:55] <^d> chasemp: I think there's consensus.. [21:51:56] !log depooling mw1185 for re-imaging [21:51:58] Logged the message, Master [21:52:20] (CR) Ori.livneh: [C: 2] ""YAML Police" :D" [puppet] - https://gerrit.wikimedia.org/r/177078 (owner: GWicke) [21:52:41] PROBLEM - Host mw1186 is DOWN: PING CRITICAL - Packet loss = 100% [21:52:52] (CR) Ori.livneh: [V: 2] Add statsd host in restbase config [puppet] - https://gerrit.wikimedia.org/r/177078 (owner: GWicke) [21:53:31] ^d sounds good then [21:53:45] ori: thanks! [21:54:11] <^d> chasemp: I pinged the thread with a call to close the discussion. [21:54:18] thanks [21:54:20] <^d> Since I haven't seen any major objections in the last ~48h [21:54:20] !log depooling mw1184 for re-imaging [21:54:22] Logged the message, Master [21:54:35] your silence is acceptance please be advised [21:54:35] :) [21:58:26] RECOVERY - Host mw1186 is UP: PING OK - Packet loss = 0%, RTA = 3.25 ms [22:00:04] spagewmf, ebernhardson: Dear anthropoid, the time has come. Please deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141202T2200). [22:00:10] ^d: short is good, but four letters seems to be a bit inflexible [22:00:43] personally, I find PARSOID easier than MWPD or whatever [22:01:39] How about UEFS [22:02:02] sounds like some file system [22:02:13] über extra file system? [22:03:45] <^d> /[A-Z]{7}/ is just as arbitrary as {4}, only PARSOID fits ;-) [22:04:20] I would even allow 10 chars or so [22:04:23] <^d> Really, these aren't used outside of URLs.
[22:04:28] there is a strong incentive for keeping it short anyway [22:04:32] <^d> So shorter is $better+++++++ [22:04:38] (PS1) Ottomata: Update jmxtrans module with graphite fix and statsd support [puppet] - https://gerrit.wikimedia.org/r/177088 [22:05:02] shorter schmorter [22:05:04] descriptive is better [22:05:07] ! [22:05:13] <^d> No they aren't. They aren't used as descriptions. [22:05:18] (please ignore me I am not joining this convo!) [22:05:23] <^d> That's what a repo name or cloning path is. [22:05:49] we could huffman-code the prefixes [22:05:54] saves the most space [22:05:58] * ^d gives up [22:06:12] <^d> I'm going to assign the first repo "A" [22:06:14] <^d> the second "B" [22:06:16] <^d> And so forth. [22:06:17] haha [22:06:20] <^d> And everyone will be HAPPY [22:06:23] <^d> And LOVE IT [22:06:30] <^d> Or at least LEARN TO LIVE [22:06:45] mystery can make things interesting [22:07:01] now everybody is tempted to follow the links to find out [22:07:26] (PS2) Ottomata: Update jmxtrans module with graphite fix and statsd support [puppet] - https://gerrit.wikimedia.org/r/177088 [22:07:34] (CR) Ottomata: [C: 2 V: 2] Update jmxtrans module with graphite fix and statsd support [puppet] - https://gerrit.wikimedia.org/r/177088 (owner: Ottomata) [22:11:11] if this is urls [22:11:20] why are we using ca!ptial letters!? [22:11:55] BECAUSE THEY'RE AWESOME [22:12:38] callsigns are IMPORTANT [22:15:47] <^d> ottomata: It's also to help visually call out a repo commit as sha1s are [a-f]. [22:16:03] <^d> So you end up with commits like rMWabc1299ca... [22:16:11] ori: uhhh, what is the port I should use to send stuff to grpahite? [22:16:13] 2203? [22:16:14] <^d> or rOPUPabcabcabcabc [22:16:51] this is incorrect? [22:16:51] https://wikitech.wikimedia.org/wiki/Graphite [22:17:25] or 2003 is correct? [22:17:35] 2003 is graphite [22:17:36] 8165 is probably statsd [22:17:42] depends on your intentions [22:18:27] AHHH [22:18:31] analytics firewall.
[22:18:41] chasemp: yeah, actually [22:18:44] i don't really know which to use when [22:18:49] i'm going to make all my kafka jmx stats go to graphite [22:18:58] so that I can use check_graphite to do alerts [22:19:01] rather than check _ganglia [22:19:14] well if statsd does namespace mangling for sanity [22:19:22] you may really need to go through it [22:19:35] and our collection interval is 60s if I recall [22:19:49] namespace mangling? [22:19:50] but it really depends on what you want to do honestly, there is a case for all seasons [22:19:57] you send in foo.does.bar 1 [22:20:00] i'm happy to user either, it is the same to me [22:20:07] i'm just telling jmxtrans to use one of the other [22:20:11] all i need is hosntame:port [22:20:13] and it says foo => prepend.foo.does.bar 1 [22:20:13] and it will do the rest [22:20:25] go for broke just relaying what I know :) [22:20:26] ah [22:20:29] that's nice [22:20:35] so, if my metrics start with kafka [22:20:39] they will be namespaced to kafka? 
[22:20:48] I have no idea what the statsd now in use does [22:20:50] it's kind of a mess [22:21:00] but I thought it did some level of this [22:21:07] hm [22:21:09] as ppl requested we keep it if moving away [22:21:13] so anyways, give it a whirl [22:21:22] no big harm unless it's 10k metrics or something [22:25:54] RECOVERY - LVS HTTP IPv4 on mathoid.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 301 bytes in 0.010 second response time [22:26:09] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: Puppet has 1 failures [22:27:01] <_joe_> just got paged my mathoid [22:27:52] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: Puppet has 102 failures [22:29:23] I never got a page, weird [22:31:11] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Puppet has 102 failures [22:31:40] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: Puppet has 102 failures [22:33:30] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 102 failures [22:34:52] <_joe_> !log restarting apache on mw1110 mw1167 mw1175, stuck in apc futex [22:34:56] Logged the message, Master [22:36:08] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.086 second response time [22:36:52] RECOVERY - Apache HTTP on mw1175 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.086 second response time [22:37:40] RECOVERY - puppet last run on virt1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:38:02] mw1184-87 is me, my 2h window wasn't enough [22:38:22] RECOVERY - Apache HTTP on mw1167 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.079 second response time [22:38:58] should be done soon, tho. only one host left
[22:39:08] <_joe_> !log likewise on mw1121, mw1200 [22:39:11] Logged the message, Master [22:39:27] RECOVERY - Apache HTTP on mw1121 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.123 second response time [22:39:32] <_joe_> YuviPanda: np I am just looking at the damn zend apaches [22:39:40] heh, ok :) [22:39:43] <_joe_> and I am done :) [22:39:46] <_joe_> good night [22:41:04] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.083 second response time [22:42:41] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:45:29] !log repooling mw118[4-7] as HHVM! [22:45:33] Logged the message, Master [22:52:34] RECOVERY - puppet last run on mw1186 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:54:50] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:55:18] ok, the puppet failures are just transient, everything looks ok [22:55:21] off to sleep then [23:00:12] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:10:37] (CR) Manybubbles: [C: 1] Create new pool counter for prefix searches [mediawiki-config] - https://gerrit.wikimedia.org/r/176931 (owner: Manybubbles) [23:16:26] (PS1) Ori.livneh: Only force $wgInternalTidy = false on production [mediawiki-config] - https://gerrit.wikimedia.org/r/177104 [23:18:43] (CR) Ori.livneh: [C: 2] Only force $wgInternalTidy = false on production [mediawiki-config] - https://gerrit.wikimedia.org/r/177104 (owner: Ori.livneh) [23:18:51] (Merged) jenkins-bot: Only force $wgInternalTidy = false on production [mediawiki-config] - https://gerrit.wikimedia.org/r/177104 (owner: Ori.livneh) [23:21:26] (PS1) Krinkle: gerrit: Output space in commentlink "commit" before the link [puppet] - https://gerrit.wikimedia.org/r/177106
[23:26:21] (PS2) Krinkle: gerrit: Output space in commentlink "commit" before the link [puppet] - https://gerrit.wikimedia.org/r/177106 [23:27:20] (CR) Krinkle: "Untested!" [puppet] - https://gerrit.wikimedia.org/r/177106 (owner: Krinkle) [23:31:27] (CR) Bartosz Dziewoński: "It doesn't. It's custom." [puppet] - https://gerrit.wikimedia.org/r/177106 (owner: Krinkle) [23:46:00] <^d> anybody else planning to get on today's swat? [23:46:04] <^d> i have one thing [23:46:40] (PS1) Ori.livneh: Revert "Only force $wgInternalTidy = false on production" [mediawiki-config] - https://gerrit.wikimedia.org/r/177123 [23:47:43] (CR) Ori.livneh: [C: 2] Revert "Only force $wgInternalTidy = false on production" [mediawiki-config] - https://gerrit.wikimedia.org/r/177123 (owner: Ori.livneh) [23:47:51] (Merged) jenkins-bot: Revert "Only force $wgInternalTidy = false on production" [mediawiki-config] - https://gerrit.wikimedia.org/r/177123 (owner: Ori.livneh) [23:48:06] ^d: i was going to push a patch that needs scap, can push yours too [23:48:25] (i didn't quite finish in flow's deploy window which closes in 12 minutes) [23:48:37] <^d> Mine's mostly a no-op. Prepping config for stuff going into new wmf branch. [23:51:19] !log ebernhardson Started scap: Bumping flow submodule in 1.25wmf10 [23:51:21] Logged the message, Master [23:58:47] (PS1) Krinkle: gerrit: Don't match Phabricator identifiers within urls [puppet] - https://gerrit.wikimedia.org/r/177128