[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, kaldari: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141119T0000). Please do the needful.
[00:00:12] on it
[00:00:21] kaldari|2, are you ready? :P
[00:05:44] ori, I think it's safe to try repooling mw1205 - the branch that had issues with sync is not active anymore:)
[00:06:01] MaxSem: did you sync-common?
[00:06:06] yep
[00:06:16] !log repooled mw1205
[00:06:16] and then Sam did a full scap
[00:06:20] Logged the message, Master
[00:06:34] thanks - let's see how it goes
[00:07:22] aude, do you guys know about division by zero in WD?
[00:08:41] dammit, mw1025 is still hurt
[00:09:48] !log gracefulled apache on mw1205 (suspect an APC bug)
[00:09:50] Logged the message, Master
[00:14:37] (Abandoned) 20after4: Fix a bug in redirector that broke the alternate-files-domain Also, don't die when connection fails, just log the error and return so that there is no chance of this interfering with normal operation of phabricator. [puppet] - https://gerrit.wikimedia.org/r/174310 (owner: 20after4)
[00:15:29] (Abandoned) 20after4: Clean up some puppet-lint errors and warnings [puppet] - https://gerrit.wikimedia.org/r/174288 (owner: 20after4)
[00:19:32] (CR) MaxSem: [C: 2] Adding 'types of actors' WikiGrok campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/174303 (owner: Kaldari)
[00:19:44] (Merged) jenkins-bot: Adding 'types of actors' WikiGrok campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/174303 (owner: Kaldari)
[00:21:01] !log maxsem Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/174303 (duration: 00m 05s)
[00:21:05] Logged the message, Master
[00:22:51] (CR) 20after4: "@Dzahn: $passwords::mysql::phabricator::maniphest_user" [puppet] - https://gerrit.wikimedia.org/r/173483 (owner: 20after4)
[00:24:55] (PS1) Springle: Switch m2-master to db1046 [puppet] - https://gerrit.wikimedia.org/r/174326
[00:24:57] (PS1) Springle: switch m2-master CNAME to dbproxy1002 [dns] - https://gerrit.wikimedia.org/r/174327
[00:25:04] (CR) jenkins-bot: [V: -1] switch m2-master CNAME to dbproxy1002 [dns] - https://gerrit.wikimedia.org/r/174327 (owner: Springle)
[00:28:06] E: This script only supports gdnsd 2.x
[00:32:38] bblack: ^ appears I caught jenkins and dns at a bad time?
[00:32:51] (PS8) Dzahn: (WIP) generate wikimedia wiki entries from helper [dns] - https://gerrit.wikimedia.org/r/171769 (https://bugzilla.wikimedia.org/38799)
[00:35:47] (PS2) Springle: switch m2-master CNAME to dbproxy1002.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/174327
[00:37:19] now it's happy. oh well
[00:38:01] !log EventLogging: deployed 423f7dd5b2b5 & restarted.
[00:38:03] Logged the message, Master
[00:38:04] qchris: ^
[00:38:05] (CR) Springle: [C: 2] Switch m2-master to db1046 [puppet] - https://gerrit.wikimedia.org/r/174326 (owner: Springle)
[00:38:15] ori: Thanks!
[00:38:40] (CR) Dzahn: "re: comments from bblack above: hardcoded both. ran generating script on a labs instance. result is https://phabricator.wikimedia.org/P89" [dns] - https://gerrit.wikimedia.org/r/171769 (https://bugzilla.wikimedia.org/38799) (owner: Dzahn)
[00:38:48] springle: odd
[00:40:05] springle: no, that's probably my fault. I updated gallium, but I think I need to go update some other boxes that are jenkins slaves
[00:40:12] (to have 2.x on all linting boxes)
[00:40:56] yeah the lint failure happened on lanthanum
[00:42:27] hopefully it's fixed now, those are the only two prod ci slaves anyways
[00:43:56] bblack: thanks :)
[00:46:14] (CR) Springle: [C: 2] switch m2-master CNAME to dbproxy1002.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/174327 (owner: Springle)
[00:48:42] !log m2-master CNAME switch to dbproxy1002, and db1046 to primary backend
[00:48:45] Logged the message, Master
[00:51:21] (Abandoned) 20after4: preamble script to read client address from HTTP_X_FORWARDED_FOR [puppet] - https://gerrit.wikimedia.org/r/168509 (owner: 20after4)
[00:51:30] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 315 seconds
[00:51:59] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 345 seconds
[00:52:59] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds
[00:53:32] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds
[00:59:06] (CR) Rush: "I don't the why but I know that: $manifest_user = 'phmanifest'" [puppet] - https://gerrit.wikimedia.org/r/173483 (owner: 20after4)
[01:03:38] (PS1) 20after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - https://gerrit.wikimedia.org/r/174335
[01:04:25] (CR) jenkins-bot: [V: -1] Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - https://gerrit.wikimedia.org/r/174335 (owner: 20after4)
[01:07:57] (PS2) 20after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - https://gerrit.wikimedia.org/r/174335
[01:08:44] (CR) jenkins-bot: [V: -1] Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - https://gerrit.wikimedia.org/r/174335 (owner: 20after4)
[01:14:58] (PS1) Rush: phab cleaner set native dict type files.image-mime-types [puppet] - https://gerrit.wikimedia.org/r/174342
[01:15:00] (PS1) Rush: phab during migration raise upload limit [puppet] - https://gerrit.wikimedia.org/r/174343
[01:15:17] (PS2) Rush: phab cleaner set native dict type files.image-mime-types [puppet] - https://gerrit.wikimedia.org/r/174342
[01:15:40] (CR) Rush: [C: 2 V: 2] phab cleaner set native dict type files.image-mime-types [puppet] - https://gerrit.wikimedia.org/r/174342 (owner: Rush)
[01:20:22] !log ori Synchronized php-1.25wmf8/extensions/SyntaxHighlight_GeSHi: Ibb0f7c24: Update SyntaxHighlight_GeSHi for cherry-picks (duration: 00m 05s)
[01:20:30] Logged the message, Master
[01:23:29] !log ori Synchronized php-1.25wmf7/extensions/SyntaxHighlight_GeSHi: I788e1beb8: Update SyntaxHighlight_GeSHi for cherry-picks (duration: 00m 05s)
[01:23:31] Logged the message, Master
[01:28:55] !log Restarted EventLogging mysql-m2 consumer to pick up switch to dbproxy1002
[01:28:58] Logged the message, Master
[01:39:50] PROBLEM - Apache HTTP on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:40:20] PROBLEM - HHVM rendering on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:40:37] that was me, sorry
[01:40:49] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.057 second response time
[01:41:09] RECOVERY - HHVM rendering on mw1017 is OK: HTTP OK: HTTP/1.1 200 OK - 68164 bytes in 0.169 second response time
[02:13:39] ori, I think https://gerrit.wikimedia.org/r/#/c/167700/ is ready
[02:14:12] !log LocalisationUpdate completed (1.25wmf7) at 2014-11-19 02:14:12+00:00
[02:14:17] Logged the message, Master
[02:14:21] (CR) Ori.livneh: [C: 2] Add cassandra role [puppet] - https://gerrit.wikimedia.org/r/167700 (owner: Ottomata)
[02:15:17] ori: thanks!
[02:15:23] np
[02:16:14] also useful for the thing you just chatted about with James & Roan ;)
[02:21:28] you have good hearing :P
[02:26:12] !log LocalisationUpdate completed (1.25wmf8) at 2014-11-19 02:26:12+00:00
[02:26:19] Logged the message, Master
[03:04:45] (PS1) Ori.livneh: hhvm: enable perf_pid.map files w/automatic pruning [puppet] - https://gerrit.wikimedia.org/r/174356
[03:22:49] PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:34:59] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (20.00%)
[03:39:19] (PS3) 20after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - https://gerrit.wikimedia.org/r/174335
[03:39:45] (PS4) 20after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - https://gerrit.wikimedia.org/r/174335
[03:40:50] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[03:45:32] (CR) 20after4: "changes rolled into https://gerrit.wikimedia.org/r/#/c/174335/" [puppet] - https://gerrit.wikimedia.org/r/174310 (owner: 20after4)
[03:46:51] (CR) 20after4: "@rush is it manifest_user instead of maniphest_user ?" [puppet] - https://gerrit.wikimedia.org/r/173483 (owner: 20after4)
[03:48:09] (CR) 20after4: "reintroducing this change in https://gerrit.wikimedia.org/r/#/c/174335/" [puppet] - https://gerrit.wikimedia.org/r/173483 (owner: 20after4)
[03:53:40] PROBLEM - HHVM busy threads on mw1114 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [90.0]
[04:00:50] RECOVERY - HHVM busy threads on mw1114 is OK: OK: Less than 1.00% above the threshold [60.0]
[04:15:00] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 19 04:15:00 UTC 2014 (duration 14m 59s)
[04:15:02] Logged the message, Master
[04:15:25] !log mw1020: disabled puppet & restarted hhvm w/hhvm.eval.perf_pid_map = true to test
[04:15:27] Logged the message, Master
[04:36:39] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[04:42:02] !log on mw1114: testing xhprof hack for T758
[04:42:07] Logged the message, Master
[05:44:03] !log batchCAAntiSpoof finished with "34721605 user(s) done."
[05:44:08] Logged the message, Master
[05:55:49] (PS1) Tim Starling: xhprof production profiling hack [mediawiki-config] - https://gerrit.wikimedia.org/r/174372
[06:28:39] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:29:11] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:40] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:01] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:25] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:45:56] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:46:24] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:46:45] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[07:23:59] (CR) Giuseppe Lavagetto: [C: 1] Move the openstack_version setting hiera. [puppet] - https://gerrit.wikimedia.org/r/173904 (owner: Andrew Bogott)
[08:08:21] (PS5) Qgil: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - https://gerrit.wikimedia.org/r/174335 (owner: 20after4)
[09:42:18] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%)
[09:44:49] !log hoo Synchronized php-1.25wmf8/extensions/Wikidata/: Fix url and commonsMedia UI editing (duration: 00m 42s)
[09:44:56] Logged the message, Master
[09:45:00] Lydia_WMDE: Can you verify?
[09:48:13] Ok, seems to work on WD
[09:59:13] (Abandoned) Alexandros Kosiaris: ldap: replace iptables with ferm rule [puppet] - https://gerrit.wikimedia.org/r/117698 (owner: Matanya)
[10:02:44] looking in a min :)
[10:02:56] (Abandoned) Alexandros Kosiaris: openstack: remove firewall access for amanda, replaced by bacula [puppet] - https://gerrit.wikimedia.org/r/133104 (owner: Matanya)
[10:04:49] (CR) Alexandros Kosiaris: [C: 2] git buildpackage basic configuration [debs/vips] - https://gerrit.wikimedia.org/r/113098 (owner: Hashar)
[10:27:09] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[10:28:51] (CR) Alexandros Kosiaris: "The git submodule also needs to be declared or the cassandra module will not be present in the puppetmaster (as it happens now)." [puppet] - https://gerrit.wikimedia.org/r/167700 (owner: Ottomata)
[10:51:49] (PS1) Alexandros Kosiaris: WIP: Make torrus class naming more clear [puppet] - https://gerrit.wikimedia.org/r/174389
[10:55:49] (PS1) Giuseppe Lavagetto: varnish: remove cache separation for HHVM [puppet] - https://gerrit.wikimedia.org/r/174390
[11:53:11] ACKNOWLEDGEMENT - RAID on ms-be2007 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) Filippo Giunchedi RT #8902
[11:53:22] ACKNOWLEDGEMENT - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi RT #8902
[12:23:59] (PS2) Alexandros Kosiaris: WIP: Modularize torrus [puppet] - https://gerrit.wikimedia.org/r/174389
[13:00:57] (CR) Giuseppe Lavagetto: [C: 1] kill facilities.pp, move to nagios_common [puppet] - https://gerrit.wikimedia.org/r/173999 (owner: Dzahn)
[13:23:09] (CR) BBlack: "I still think we should remove Zend fallback before this step." [puppet] - https://gerrit.wikimedia.org/r/174390 (owner: Giuseppe Lavagetto)
[13:24:17] (PS1) ArielGlenn: audit ssh key use on production cluster [software] - https://gerrit.wikimedia.org/r/174408
[13:24:20] (CR) jenkins-bot: [V: -1] audit ssh key use on production cluster [software] - https://gerrit.wikimedia.org/r/174408 (owner: ArielGlenn)
[14:00:08] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[14:04:57] (CR) Hashar: Allow puppetmaster to send reports to logstash (1 comment) [puppet] - https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) (owner: BryanDavis)
[14:05:00] (Abandoned) Alexandros Kosiaris: torrus: move into a module [puppet] - https://gerrit.wikimedia.org/r/108498 (owner: Matanya)
[14:05:06] (CR) Hashar: [C: 1] Allow puppetmaster to send reports to logstash [puppet] - https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) (owner: BryanDavis)
[14:09:41] (CR) Hashar: puppetmaster: Make time to keep old reports for configurable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/174132 (https://bugzilla.wikimedia.org/73472) (owner: Yuvipanda)
[14:09:49] RECOVERY - gdash.wikimedia.org on graphite1001 is OK: HTTP OK: HTTP/1.1 200 OK - 9447 bytes in 0.051 second response time
[14:20:09] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[14:35:32] (PS1) Andrew Bogott: Lay the ground for the openstack icehouse migration. [puppet] - https://gerrit.wikimedia.org/r/174413
[14:36:24] (CR) jenkins-bot: [V: -1] Lay the ground for the openstack icehouse migration. [puppet] - https://gerrit.wikimedia.org/r/174413 (owner: Andrew Bogott)
[14:38:35] (PS2) Andrew Bogott: Lay the ground for the openstack icehouse migration. [puppet] - https://gerrit.wikimedia.org/r/174413
[14:43:32] Anyone know what's going on with beta enwiki
[14:43:33] ?
[14:44:13] I'm getting "500 hphp-invoke"
[14:46:04] Nov 19 14:45:34 10.68.17.208 hhvm: #012Fatal error: unexpected N4HPHP13DataBlockFullE: Attempted to emit 23 byte(s) into a 35651584 byte DataBlock with 17 bytes available. This almost certainly means the TC is full. If this is the case, increasing Eval.JitASize, Eval.JitAColdSize, Eval.JitAFrozenSize and Eval.JitGlobalDataSize in the configuration file when running this script or application should fix this problem.
[14:49:11] <_joe_> marktraceur: it needs a restart, and maybe some tuning
[14:49:34] <_joe_> I am chasing a prod issue, but I'll take a look
[14:49:35] /var is at 100% on deployment-mediawiki01 at least
[14:49:47] <_joe_> oh man
[14:49:56] So logging is killing it
[14:50:07] <_joe_> 1 sec
[14:52:51] <_joe_> marktraceur: it should be ok now
[14:53:07] No dice. Maybe something else is wrong...
[14:53:17] I'll keep digging
[14:53:24] <_joe_> marktraceur: another server, proabbly
[14:54:23] Now that apache can write to the logs, I'm only getting this:
[14:54:23] Nov 19 14:52:39 10.68.17.96 apache2[7692]: [proxy:error] [pid 7692] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[14:54:28] But that may be unrelated
[14:55:09] <_joe_> marktraceur: that's bogus
[14:55:17] <_joe_> marktraceur: try now?
[14:55:28] Ooh, loading
[14:55:38] Winner! Thanks :)
[14:55:41] <_joe_> it was deployment-mediawiki02
[14:55:51] K
[15:10:04] (PS3) Alexandros Kosiaris: WIP: Modularize torrus [puppet] - https://gerrit.wikimedia.org/r/174389
[15:15:11] (CR) Andrew Bogott: [C: 2] Lay the ground for the openstack icehouse migration. [puppet] - https://gerrit.wikimedia.org/r/174413 (owner: Andrew Bogott)
[15:35:49] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:43:18] thanks jgage
[15:43:20] yt?
[15:46:05] (PS1) Andrew Bogott: Update hcaked libvirt driver for icehouse [puppet] - https://gerrit.wikimedia.org/r/174420
[15:46:07] (PS1) Andrew Bogott: Update glance config for icehouse [puppet] - https://gerrit.wikimedia.org/r/174421
[15:48:24] log beginning upgrade of labs OpenStack from Havana to Icehouse
[15:48:33] um
[15:48:34] !log beginning upgrade of labs OpenStack from Havana to Icehouse
[15:48:40] Logged the message, Master
[15:48:48] (PS2) Andrew Bogott: Update glance config for icehouse [puppet] - https://gerrit.wikimedia.org/r/174421
[15:48:50] (PS2) Andrew Bogott: Update hacked libvirt driver for icehouse [puppet] - https://gerrit.wikimedia.org/r/174420
[15:49:38] !log backing up labs configs in ~andrew/osback/
[15:49:41] Logged the message, Master
[15:50:19] * anomie sees nothing for SWAT (yet)
[15:53:49] (CR) Andrew Bogott: [C: 2] Update hacked libvirt driver for icehouse [puppet] - https://gerrit.wikimedia.org/r/174420 (owner: Andrew Bogott)
[15:54:44] (CR) Andrew Bogott: [C: 2] Update glance config for icehouse [puppet] - https://gerrit.wikimedia.org/r/174421 (owner: Andrew Bogott)
[15:57:21] !log backed up all labs openstack databases to virt1000:~andrew/osback/havana-db-backup.sql
[15:57:25] Logged the message, Master
[15:58:20] I thought gi11es had something for SWAT
[15:58:29] Maybe he forgot to add it
[15:58:48] Oh, tgr|away is doing it tonight
[15:58:50] nvm
[16:00:05] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141119T1600).
[16:00:13] nope.jpg
[16:02:05] (PS1) Andrew Bogott: Consolidate entries for virt1001-1009. [puppet] - https://gerrit.wikimedia.org/r/174425
[16:02:06] (PS1) Andrew Bogott: Update virt1000 (labs controller node) to icehouse [puppet] - https://gerrit.wikimedia.org/r/174426
[16:02:09] (PS1) Andrew Bogott: Upgrade labnet1001 to icehouse [puppet] - https://gerrit.wikimedia.org/r/174427
[16:02:11] (PS1) Andrew Bogott: Update labs compute nodes to icehouse [puppet] - https://gerrit.wikimedia.org/r/174428
[16:04:05] (CR) Andrew Bogott: [C: 2] Consolidate entries for virt1001-1009. [puppet] - https://gerrit.wikimedia.org/r/174425 (owner: Andrew Bogott)
[16:12:11] !log Jenkins: uninstalled Jenkins statsd plugin ( https://phabricator.wikimedia.org/T1278 ). It is overloading the statsd server with a bunch of metrics we don't care about ( https://phabricator.wikimedia.org/T1075 )
[16:12:19] Logged the message, Master
[16:16:32] !log moved virt1000 db backup to /a/osback because it was /way/ too big to fit in my homedir
[16:16:39] Logged the message, Master
[16:17:45] (PS1) Giuseppe Lavagetto: varnishkafka: do not reload on every puppet run [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/174429
[16:18:01] <_joe_> bblack: ^^
[16:18:41] <_joe_> (20 lines of puppet instead of 4 lines of bash)
[16:19:04] <_joe_> of course now I need to double-commit this
[16:19:17] (CR) Giuseppe Lavagetto: [C: 2 V: 2] varnishkafka: do not reload on every puppet run [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/174429 (owner: Giuseppe Lavagetto)
[16:19:28] <_joe_> ottomata: ^^
[16:19:44] <_joe_> would you do the honours of merging this into the main puppet tree?
[16:20:06] <_joe_> we've been without ganglia since you and ori did the varnishkafka change yesterday
[16:20:29] <_joe_> (as I told you guys, btw)
[16:21:10] (PS1) Yuvipanda: icinga: Scream in -operations too when betacluster has issues [puppet] - https://gerrit.wikimedia.org/r/174430
[16:21:25] <_joe_> -2 Abandoned
[16:21:42] <_joe_> YuviPanda: we have too much noise here already, IMO
[16:21:48] https://phabricator.wikimedia.org/T1334
[16:22:16] betacluster has had broken puppet stuff for about a month because changes to prod-related stuff that nobody checked on betacluster
[16:22:28] and 5 instances still have broken puppet, and nobody's workin on 'em atm.
[16:22:31] I think it should be here
[16:24:11] Yeah, I think we need some system for making Ops care when they break beta.
[16:24:16] Not that I'm a fan of more noise in here...
[16:24:23] hm, aye, thanks _joe_
[16:24:36] what, we've been wtihout ganglia?
[16:24:40] <_joe_> ottomata: are you merging it then?
[16:24:43] yes can do
[16:24:49] (was still reading backlog)
[16:25:02] <_joe_> ottomata: I told you guys when you applied your changes on cp1056
[16:25:09] <_joe_> that ganglia was failing
[16:25:10] springle: https://phabricator.wikimedia.org/T1254, in case you missed it/emails :)
[16:25:12] _joe_ I did not hear it!
[16:25:25] <_joe_> the reason, turns out, is that ganglia is being restarted on every run
[16:25:53] (PS1) Ottomata: Update varnishkafka module with ganglia fix [puppet] - https://gerrit.wikimedia.org/r/174431
[16:25:55] <_joe_> kart_: I have one apache change of yours in CR, it seeems good, should I merge it?
[16:26:11] (CR) Ottomata: [C: 2 V: 2] Update varnishkafka module with ganglia fix [puppet] - https://gerrit.wikimedia.org/r/174431 (owner: Ottomata)
[16:26:39] _joe_: Feel free. We're holding of adding woff2 fonts as of now, but config is okay.
[16:26:41] _joe_ merged.
[16:26:44] thank you.
[16:26:56] <_joe_> ottomata: ok, checking now
[16:27:28] <_joe_> ottomata: when we're in SF, I'll make you offer me a beer for every submodule you created :P
[16:28:28] hahah
[16:28:31] _joe_, ok!
[16:28:55] and all these vagrant and labs users will buy me a beer! :p
[16:29:11] or...maybe that's just me
[16:29:15] maybe i'll buy myself a beer
[16:29:17] so many beers!
[16:29:25] <_joe_> ottomata: I don't think labs, but vagrant maybe
[16:29:33] labs-vagrant, yo! :)
[16:29:43] * bd808 would like some time to work on that again
[16:30:03] <_joe_> ottomata: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Text%2520caches%2520eqiad&tab=m&vn=&hide-hf=false
[16:30:10] Jeff_Green: will buy me a beer then
[16:30:13] for kafkatee in frack submodule
[16:30:17] AT LEAST!
[16:30:22] I want to get the lxc provider working with mw-vagrant so that labs-vagrant is really running vagrant
[16:32:48] no beer for you!
[16:32:56] mwuhahahhaha.
[16:33:20] rats!
[16:33:36] that's good anyway, i can really only handle one beer anyway
[16:33:40] any more than that and I just get sleepy
[16:34:10] _joe_...is that good? or just what was happening?
[16:34:54] <^d> beer?
[16:34:58] <^d> Somebody say something about beer?
[16:35:08] <^d> (i guess it's 5 o'clock somewhere)
[16:35:13] it's 10PM here
[16:35:18] so about 6hours before my beer o'clock
[16:36:19] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[16:36:44] !log upgrading virt1000
[16:36:49] Logged the message, Master
[16:36:58] (CR) Andrew Bogott: [C: 2] Update virt1000 (labs controller node) to icehouse [puppet] - https://gerrit.wikimedia.org/r/174426 (owner: Andrew Bogott)
[16:39:15] YuviPanda: oh. I thought Kerala is in different timezone ;)
[16:39:21] kart_: :)
[16:39:29] kart_: you can still get illegal beer here
[16:39:35] quasi legal, at least. is wrapped in paper
[16:39:56] <_joe_> ottomata: that was what was happening
[16:40:07] <_joe_> do you see the 20-min cycle of all of it?
[16:41:42] ja
[16:42:54] PROBLEM - DPKG on virt1000 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:44:03] RECOVERY - DPKG on virt1000 is OK: All packages OK
[16:46:08] !log starting trusty upgrade of analytics1034
[16:47:26] (PS1) Chad: enwiki gets Cirrus as primary [mediawiki-config] - https://gerrit.wikimedia.org/r/174434
[16:48:09] (CR) Giuseppe Lavagetto: [C: 1] "\o/" [mediawiki-config] - https://gerrit.wikimedia.org/r/174434 (owner: Chad)
[16:48:32] (PS2) Andrew Bogott: Update labs compute nodes to icehouse [puppet] - https://gerrit.wikimedia.org/r/174428
[16:48:34] (PS2) Andrew Bogott: Upgrade labnet1001 to icehouse [puppet] - https://gerrit.wikimedia.org/r/174427
[16:48:36] (PS1) Andrew Bogott: Update keystone config file for icehouse [puppet] - https://gerrit.wikimedia.org/r/174435
[16:49:50] * anomie is going to SWAT after all
[16:50:02] (PS3) Andrew Bogott: Update labs compute nodes to icehouse [puppet] - https://gerrit.wikimedia.org/r/174428
[16:50:03] (PS3) Andrew Bogott: Upgrade labnet1001 to icehouse [puppet] - https://gerrit.wikimedia.org/r/174427
[16:50:05] (PS2) Andrew Bogott: Update keystone config file for icehouse [puppet] - https://gerrit.wikimedia.org/r/174435
[16:50:07] (CR) Yuvipanda: "(=`ω´=)" [mediawiki-config] - https://gerrit.wikimedia.org/r/174434 (owner: Chad)
[16:50:13] <^d> T-10 minutes on Cirrus for enwiki.
[16:50:33] <^d> Make sure to have your hard hats.
[16:50:35] ^d: I love that I got it twice.
[16:50:38] two offers of cake!
[16:50:38] <^d> Make note of the closest exit.
[16:50:44] you want me to sync?
[16:51:00] (CR) Jforrester: "Yay." [mediawiki-config] - https://gerrit.wikimedia.org/r/174434 (owner: Chad)
[16:51:05] (CR) Andrew Bogott: [C: 2] Update keystone config file for icehouse [puppet] - https://gerrit.wikimedia.org/r/174435 (owner: Andrew Bogott)
[16:51:11] <^d> manybubbles: I don't care either way :)
[16:51:15] <^d> 9 minutes.
[16:51:43] PROBLEM - DPKG on analytics1034 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:51:51] * bd808 grabs popcorn
[16:51:59] hmm, I've no food item with me
[16:52:34] <^d> Well you have 8 minutes to remedy that.
[16:52:53] Just ran one last performance test. seems to be running just fine
[16:53:05] everything is on the beach, and if I leave now I'm not coming back in 8mins.
[16:53:09] I'll just grab some cold water
[16:54:00] !log anomie Synchronized php-1.25wmf8/extensions/SecurePoll: SWAT: Fix SecurePoll jump wiki jumping [[gerrit:174436]] (duration: 00m 10s)
[16:54:07] * anomie tests
[16:55:16] Works for enwiki, accidentally. That's probably good enough until the train tomorrow.
[16:55:49] anomie: Train is today :)
[16:55:58] Reedy: I forgot that! Even better.
[16:58:15] ^d: I'll build the commit then
[16:58:39] <^d> I did already :)
[16:58:52] <^d> https://gerrit.wikimedia.org/r/#/c/174434/
[16:59:12] (PS1) Manybubbles: Use CirrusSearch by default for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/174438
[16:59:26] Yay
[16:59:32] <^d> Now we have 2 patches :D
[16:59:57] <^d> Now to decide which one is better :D
[17:00:04] manybubbles, ^d: Dear anthropoid, the time has come. Please deploy Search (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141119T1700).
[17:00:14] (Abandoned) Manybubbles: Use CirrusSearch by default for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/174438 (owner: Manybubbles)
[17:01:01] (CR) Manybubbles: [C: 2] enwiki gets Cirrus as primary [mediawiki-config] - https://gerrit.wikimedia.org/r/174434 (owner: Chad)
[17:01:10] (Merged) jenkins-bot: enwiki gets Cirrus as primary [mediawiki-config] - https://gerrit.wikimedia.org/r/174434 (owner: Chad)
[17:01:20] ^d: ok. I will sync then?
[17:01:32] die lucene, die die
[17:01:33] <^d> Make the sync msg fun :)
[17:01:33] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Puppet has 13 failures
[17:01:41] <^d> "die lucene die die" is acceptable :)
[17:02:14] PROBLEM - DPKG on virt1000 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[17:02:44] RECOVERY - DPKG on analytics1034 is OK: All packages OK
[17:03:27] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: Make CirrusSearch default search for enwiki. I am so excited. (duration: 00m 04s)
[17:03:40] hmm pool queue errors
[17:04:08] And you just can't hide it?
[17:04:17] on prefix search
[17:04:21] * ^d goes swimming in the pool
[17:04:41] they look to be stopping
[17:05:31] ^d: that was more load then the performance tests said we'd get: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Elasticsearch+cluster+eqiad&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
[17:06:42] <^d> Still not bad though.
[17:07:59] <^d> manybubbles: Think we need to bump the queue settings a tad?
[17:08:18] ^d: maybe a tad but we're not hitting them any more
[17:08:23] RECOVERY - DPKG on virt1000 is OK: All packages OK
[17:08:24] that was mostly a cache warming thing
[17:08:25] (PS1) Ottomata: Include spark CDH .deb in apt [puppet] - https://gerrit.wikimedia.org/r/174440
[17:08:29] <^d> Well probably because the cache filled.
[17:08:30] <^d> Yeah
[17:08:38] (PS2) Ottomata: Include spark CDH .deb in apt [puppet] - https://gerrit.wikimedia.org/r/174440
[17:10:52] ^d: if we have trouble when we do the next rolling restart we need to have a think about it
[17:11:01] <^d> Mhmm
[17:11:09] <^d> Nother spat of pool counter full.
[17:11:24] PROBLEM - Host analytics1034 is DOWN: PING CRITICAL - Packet loss = 100%
[17:11:43] (PS3) Ottomata: Include spark CDH .deb in apt [puppet] - https://gerrit.wikimedia.org/r/174440
[17:12:05] <^d> manybubbles: Maybe we should segment enwiki from the rest like you'd proposed before.
[17:12:13] RECOVERY - Host analytics1034 is UP: PING OK - Packet loss = 0%, RTA = 2.04 ms
[17:12:38] ^d: I think we should.
[17:12:43] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[17:12:44] <^d> Let's do it
[17:12:47] no need to rush into it, but I think we should at some point
[17:12:48] k
[17:12:54] you wanna propose a patch?
[17:12:57] its 100% config I think
[17:14:13] <^d> Working on it
[17:14:14] <^d> Yeah
[17:15:22] (PS1) Chad: Segment enwiki's cirrus poolcounter traffic from the rest [mediawiki-config] - https://gerrit.wikimedia.org/r/174442
[17:15:27] <^d> ^ manybubbles
[17:16:01] (CR) Ottomata: [C: 2] Include spark CDH .deb in apt [puppet] - https://gerrit.wikimedia.org/r/174440 (owner: Ottomata)
[17:16:19] (CR) Manybubbles: "Do we also want to lower the queue depths or are we ok there?" [mediawiki-config] - https://gerrit.wikimedia.org/r/174442 (owner: Chad)
[17:17:36] <^d> I think we'll be fine
[17:19:27] !log starting trusty upgrade of analytics1035
[17:20:38] (CR) Chad: [C: 2] "It'll probably be fine for now." [mediawiki-config] - https://gerrit.wikimedia.org/r/174442 (owner: Chad)
[17:20:47] (Merged) jenkins-bot: Segment enwiki's cirrus poolcounter traffic from the rest [mediawiki-config] - https://gerrit.wikimedia.org/r/174442 (owner: Chad)
[17:21:35] ^d: cool
[17:21:48] ^d: I'm still logged into tin if you want me to sync it
[17:22:03] !log demon Synchronized wmf-config/CirrusSearch-production.php: two pools > one pool (duration: 00m 04s)
[17:22:06] <^d> ALready done
[17:22:11] cool
[17:22:23] going to go help give meds to my 2 year old......
[17:22:43] !log puppetmaster dead on vir1000, investigating. Port seems already in use.
[17:22:52] <^d> manybubbles|away: Care care.
[17:24:07] <^d> *take care
[17:24:11] <^d> Words are hard
[17:26:34] PROBLEM - DPKG on analytics1035 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[17:27:33] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Puppet has 20 failures
[17:31:14] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:31:46] "PHP Warning: Division by zero in /srv/mediawiki/php-1.25wmf8/extensions/Wikidata/vendor/data-values/geo/src/Formatters/GeoCoordinateFormatter.php on line 212" <-- known issue?
[17:36:32] bd808: Seems there's a bug for it already
[17:36:56] Reedy: Cool. I saw you even asked in the right place to find out :)
[17:38:37] bd808: already fixed but need to backport
[17:39:11] (PS4) Andrew Bogott: Update labs compute nodes to icehouse [puppet] - https://gerrit.wikimedia.org/r/174428
[17:39:13] (PS4) Andrew Bogott: Upgrade labnet1001 to icehouse [puppet] - https://gerrit.wikimedia.org/r/174427
[17:39:15] (PS1) Andrew Bogott: Replaced some broken single-quotes [puppet] - https://gerrit.wikimedia.org/r/174447
[17:41:24] (CR) Andrew Bogott: [C: 2] Replaced some broken single-quotes [puppet] - https://gerrit.wikimedia.org/r/174447 (owner: Andrew Bogott)
[17:41:47] <^d> manybubbles: Segmenting enwiki did the trick. Just a few trickle of ArticleView timeouts which are normal.
[17:42:32] ^d: I wasn't seeing any timeouts from it before the segmenting either though
[17:42:39] I only saw them right when we went live
[17:43:02] <^d> I'm not worried about 3 or 4 of those trickling in.
[17:45:24] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[17:47:12] (PS1) BBlack: s/eqiad/ulsfo/ as backup for AS/* + RU with esams primary [dns] - https://gerrit.wikimedia.org/r/174449
[17:49:05] (CR) BBlack: [C: 2] s/eqiad/ulsfo/ as backup for AS/* + RU with esams primary [dns] - https://gerrit.wikimedia.org/r/174449 (owner: BBlack)
[17:50:58] (PS5) Andrew Bogott: Update labs compute nodes to icehouse [puppet] - https://gerrit.wikimedia.org/r/174428
[17:51:00] (PS5) Andrew Bogott: Upgrade labnet1001 to icehouse [puppet] - https://gerrit.wikimedia.org/r/174427
[17:51:02] (PS1) Andrew Bogott: Turn off keystone/redis for the icehouse upgrade. [puppet] - https://gerrit.wikimedia.org/r/174450
[17:52:06] (CR) Andrew Bogott: [C: 2] Turn off keystone/redis for the icehouse upgrade. [puppet] - https://gerrit.wikimedia.org/r/174450 (owner: Andrew Bogott)
[17:53:58] (PS1) Gilles: Revert "Revert "Enable JPG thumbnail chaining on all wikis except commons"" [mediawiki-config] - https://gerrit.wikimedia.org/r/174451
[17:57:09] !log disabled keystone-redis because the current package doesn't work with icehouse
[17:57:13] Logged the message, Master
[18:00:04] tonythomas, Jeff_Green: Dear anthropoid, the time has come. Please deploy BounceHandler (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141119T1800).
[18:02:35] lol
[18:05:27] * tonythomas googles anthropoid
[18:05:51] :)
[18:06:01] ha
[18:06:22] anthropoid = a higher primate, especially an ape or apeman.
[18:06:23] :D
[18:06:29] Reedy: time to merge ?
[18:06:41] I'm just looking at the backports [18:06:45] okey [18:06:59] (03PS1) 10BBlack: Add ESAMS-EQIAD comments matching prev commit [dns] - 10https://gerrit.wikimedia.org/r/174454 [18:07:40] (03CR) 10BBlack: [C: 032] Add ESAMS-EQIAD comments matching prev commit [dns] - 10https://gerrit.wikimedia.org/r/174454 (owner: 10BBlack) [18:09:20] c'mon jenkins [18:13:06] tonythomas: I don't see any benefit from backporting the other 3 commits [18:14:03] Reedy: true [18:14:03] RECOVERY - DPKG on analytics1035 is OK: All packages OK [18:15:26] (03PS8) 10Reedy: Deploy BounceHandler extension to production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: 10Legoktm) [18:15:32] (03CR) 10Reedy: [C: 032] Deploy BounceHandler extension to production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: 10Legoktm) [18:15:41] (03Merged) 10jenkins-bot: Deploy BounceHandler extension to production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: 10Legoktm) [18:16:03] Reedy: legoktm Jeff_Green ! yay ! [18:16:08] finally ! [18:16:13] :D [18:16:18] well, not synced yet :P [18:16:21] true [18:16:26] and generate VERP = false :\ [18:16:42] when can we switch that on atleast here in test2wiki ? [18:17:15] !log reedy Synchronized php-1.25wmf8/extensions/BounceHandler/: Bump (duration: 00m 14s) [18:17:17] Logged the message, Master [18:18:50] next should be https://gerrit.wikimedia.org/r/#/c/168622/ [18:18:57] !log reedy Synchronized wmf-config/: BounceHandler (duration: 00m 15s) [18:19:00] Logged the message, Master [18:19:01] waiting for things to show up in https://test2.wikipedia.org/wiki/Special:Version :) [18:19:44] It's there now [18:19:47] yay ! its showing up! [18:21:44] are we ready for the exim config? 
[18:22:30] and guess what - $wgGenerateVERP = true; [18:22:39] and now I got my email with VERPed return PATH [18:22:46] andrewbogott: hahaha, puppet on labs now thinks it's on EC2 and tries to connect to a hardcoded EC2 related address [18:22:50] 169.254.169.254 [18:22:59] https://dpaste.de/6bOm [18:23:08] um… ok, I'm going to ignroe that for now :) [18:23:18] andrewbogott: yeah, ok. [18:24:17] Jeff_Green: Yup, should be [18:24:47] should I be seeing outgoing messages with the verp envelop address yet? [18:25:17] YuviPanda: you mean every instance is trying to connect to that? [18:25:22] Or just on virt1000? [18:25:22] Reedy: why am I not seeing the API yet ? https://test2.wikipedia.org/wiki/api.php?action=bouncehandler [18:25:22] yep. [18:25:26] every instance [18:25:29] nothing wrong with virt1000 [18:25:32] um… what the heck? [18:25:32] I'm digging [18:25:34] yeah [18:25:39] it's a facter feature :) [18:25:54] PROBLEM - Host analytics1035 is DOWN: PING CRITICAL - Packet loss = 100% [18:25:56] I can't imagine how my changes caused that [18:25:59] although surely they did :( [18:26:38] facter 1.7.5 has a regression causing this bug again, downgrading to 1.6.5 fixes it. [18:26:43] RECOVERY - Host analytics1035 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms [18:27:14] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [18:27:48] YuviPanda: but… did I somehow upgrade facter on the instances? [18:27:56] I guess maybe it's because of pluginsync [18:27:57] * legoktm reads up [18:28:16] (03PS6) 10Andrew Bogott: Update labs compute nodes to icehouse [puppet] - 10https://gerrit.wikimedia.org/r/174428 [18:28:18] (03PS6) 10Andrew Bogott: Upgrade labnet1001 to icehouse [puppet] - 10https://gerrit.wikimedia.org/r/174427 [18:28:20] (03PS1) 10Andrew Bogott: Another icehouse glance update. 
[puppet] - 10https://gerrit.wikimedia.org/r/174459 [18:28:25] tonythomas: I think you may have to rebase https://gerrit.wikimedia.org/r/#/c/168622/ ? [18:28:35] tonythomas: it's at https://test2.wikipedia.org/w/api.php?action=bouncehandler [18:28:39] 13:27 < whatami> This might have been mentioned already, but AT&T users are having trouble logging in. [18:28:42] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Can_not_log_in_on_AT.26T has all the information that I have. [18:28:46] /w instead of /wiki [18:29:20] legoktm: oh true 'appservers.svc.${::mw_primary}.wmnet/w/api.php' [18:29:21] !log starting trusty upgrade of 1036 [18:29:25] Logged the message, Master [18:29:27] Jeff_Green: will rebase in a while [18:29:43] (03PS9) 1001tonythomas: Make BounceHandler extension work on test2wiki [puppet] - 10https://gerrit.wikimedia.org/r/168622 [18:29:58] Jeff_Green: Gerrit Rebase button worked ! [18:29:58] yay [18:30:08] (03CR) 10Andrew Bogott: [C: 032] Another icehouse glance update. [puppet] - 10https://gerrit.wikimedia.org/r/174459 (owner: 10Andrew Bogott) [18:30:26] hmm, ec2id fucking up again [18:30:57] Jeff_Green: https://gerrit.wikimedia.org/r/#/c/168622/ - looks okey ? [18:32:03] tonythomas: https://test2.wikipedia.org/w/api.php?action=bouncehandler WFM [18:32:54] Reedy: yeah :) I gave wiki/ [18:33:01] the API is up ! [18:33:05] !log upgraded glance on virt1000 to version icehouse [18:33:07] Logged the message, Master [18:33:12] tonythomas: yeah better. looking [18:33:48] (03CR) 10Jgreen: [C: 031 V: 032] Make BounceHandler extension work on test2wiki [puppet] - 10https://gerrit.wikimedia.org/r/168622 (owner: 1001tonythomas) [18:34:07] (03CR) 10Jgreen: [C: 032 V: 031] Make BounceHandler extension work on test2wiki [puppet] - 10https://gerrit.wikimedia.org/r/168622 (owner: 1001tonythomas) [18:34:43] Jeff_Green: yay ! 
[18:35:11] (03CR) 10Andrew Bogott: [C: 032] Upgrade labnet1001 to icehouse [puppet] - 10https://gerrit.wikimedia.org/r/174427 (owner: 10Andrew Bogott) [18:35:26] running puppet on polonium [18:35:43] * tonythomas bite nails [18:36:23] i see: /usr/bin/curl -H 'Host: test2.wikipedia.org' -- that's as expected? [18:36:55] !log upgrading labnet1001 [18:36:57] Logged the message, Master [18:37:03] Jeff_Green: let me check [18:37:04] one sec [18:37:16] yup [18:37:22] can you paste that entire line / [18:37:27] command = [18:37:37] command = /usr/bin/curl -H 'Host: test2.wikipedia.org' appservers.svc.eqiad.wmnet/w/api.php -d "action=bouncehandler" --data-urlencode "email@-" -o /dev/null [18:38:22] this looks good, right ? [18:40:27] where's ec2id fact set? [18:42:23] legoktm: you have shell access to test2.wikipedia.org, right ? [18:42:38] I think so [18:42:43] tonythomas: I don't know, I was surprised to see us using test2 as the http endpoint [18:42:45] we would want a dummy user setup there with a fake email id ( confirmed ) [18:44:03] Jeff_Green: executing /usr/bin/curl -H 'Host: test2.wikipedia.org' appservers.svc.eqiad.wmnet/w/api.php -d "action=bouncehandler" from polonium gives something / [18:44:05] ? [18:44:56] yes. big blob of response [18:45:07] You are looking at the HTML representation of the JSON format. HTML is good for debugging, but is unsuitable for application use. blah blah blah [18:45:32] Could not retrieve ec2id: private method `chomp' called for nil:NilClass [18:45:33] err [18:45:40] Facter::Util::Resolution.exec("curl -f http://169.254.169.254/1.0/meta-data/instance-id 2> /dev/null").chomp [18:45:41] Jeff_Green: need to pass &format=json ? [18:45:41] wat [18:45:43] Jeff_Green: in that case it would look good, right ? [18:45:50] but I don't think anything cares about the response [18:45:54] so it could even use format=none [18:45:56] "The email parameter must be set" [18:45:57] ya [18:46:04] Jeff_Green: yay ! 
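For reference, the curl invocation pasted above can be approximated in Python. This is a minimal sketch, not the exim router's actual implementation: the endpoint and Host header are copied from the pasted command, while the function name `build_bounce_request` and the `fmt` parameter are illustrative assumptions (the log only floats `format=json` or `format=none` as ideas; the deployed command passes no format). The request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Values copied from the curl command pasted in the log.
ENDPOINT = "http://appservers.svc.eqiad.wmnet/w/api.php"
HOST_HEADER = "test2.wikipedia.org"


def build_bounce_request(raw_email: str, fmt: str = "none") -> Request:
    """Build (but do not send) a POST roughly equivalent to the curl call.

    `fmt` is an assumption: the chat only suggests format=json or
    format=none, and the deployed command does not pass a format at all.
    """
    # Like `-d action=bouncehandler --data-urlencode email@-`, the raw
    # message ends up URL-encoded in the form field named "email".
    body = urlencode(
        {"action": "bouncehandler", "format": fmt, "email": raw_email}
    ).encode("ascii")
    return Request(ENDPOINT, data=body, headers={"Host": HOST_HEADER}, method="POST")


if __name__ == "__main__":
    req = build_bounce_request("From: mailer-daemon@example.org\n\nbounce body")
    print(req.get_method(), req.full_url)
    print(req.get_header("Host"))
    print(req.data.decode("ascii"))
```

Leaving out the `email` field is what produces the "The email parameter must be set" response quoted above, which is why that error was taken as proof that the request reached the API.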
[18:46:08] it means it hit the API [18:46:22] why are we hitting the API on test2 though? [18:46:24] PROBLEM - DPKG on analytics1036 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:46:25] what IP is that [18:46:32] Jeff_Green: we are, right ? [18:46:38] YuviPanda: that happens when metadata is broken [18:46:46] I think that was paravoid's plan that we start from test2wiki [18:46:47] Which I might have just turned off for a moment… does it still happen? [18:46:58] checking [18:46:58] and then roll out to other wiki's ? [18:47:09] nope [18:47:11] andrewbogott: is all fine [18:47:12] now [18:47:15] ok [18:47:18] however, what IP is that? [18:47:37] tonythomas: he was talking about gradually rolling out verp addresses, not the http endpoint [18:47:42] YuviPanda: No DHCP? [18:47:48] tonythomas: at least that is my understanding [18:48:10] andrewbogott: things work fine now even without the facter degrade, since that IP resolves now. [18:48:24] ok... [18:48:53] YuviPanda: have a look at the ec2id fact. Is that the same IP? [18:48:56] If so that's the metatdata service [18:48:59] Jeff_Green: I was under the impression - we make sure things work with test2wiki - and then move. Never knew about setting a different http endpoint [18:49:19] we can set the HTTP end point to a wiki that have BounceHandler installed only right / [18:49:24] tonythomas: you've switched the http endpoint from login.wm.o to test2.wm.o [18:49:25] (03CR) 10Andrew Bogott: [C: 032] Update labs compute nodes to icehouse [puppet] - 10https://gerrit.wikimedia.org/r/174428 (owner: 10Andrew Bogott) [18:49:45] andrewbogott: yeah, that is from the ec2id fact. 
[18:49:55] andrewbogott: right, so the problem from the start was the metadata service being down [18:49:56] YuviPanda: ok, was probably just because I briefly broke metadata [18:49:59] cool [18:50:00] yeah [18:50:01] ok [18:50:11] !log upgrading virt1006 [18:50:13] * YuviPanda re-upgrades facter on tools-dev [18:50:14] Logged the message, Master [18:52:51] andrewbogott: heading back to shinken IRC stuff now, poke again if you need me to look at anything [18:53:09] !log reedy Synchronized php-1.25wmf8/extensions/Wikidata: Ie105a80aa776769eb0dae8a44cda0b7dbe018fb5 (duration: 00m 22s) [18:53:11] Logged the message, Master [18:53:54] YuviPanda: sounds good, thank you! [18:58:55] (03PS1) 10Andrew Bogott: Nova config changes for icehouse [puppet] - 10https://gerrit.wikimedia.org/r/174466 [18:59:35] RECOVERY - DPKG on analytics1036 is OK: All packages OK [19:00:04] Reedy, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141119T1900). Please do the needful. [19:00:44] (03PS2) 10Andrew Bogott: Nova config changes for icehouse [puppet] - 10https://gerrit.wikimedia.org/r/174466 [19:01:48] (03CR) 10Andrew Bogott: [C: 032] Nova config changes for icehouse [puppet] - 10https://gerrit.wikimedia.org/r/174466 (owner: 10Andrew Bogott) [19:02:14] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Puppet has 3 failures [19:10:24] PROBLEM - Host analytics1036 is DOWN: PING CRITICAL - Packet loss = 100% [19:11:33] RECOVERY - Host analytics1036 is UP: PING OK - Packet loss = 0%, RTA = 1.51 ms [19:12:23] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [19:12:56] csteipp: was that email about API localization meant for the other Andrew? (I don't object to review requests, just, it's the first I've heard of it.) 
[19:13:02] (03PS1) 10Ori.livneh: check_keyholder: make agent unarmed alerts CRIT [puppet] - 10https://gerrit.wikimedia.org/r/174469 [19:13:52] (03CR) 10Ori.livneh: [C: 032 V: 032] check_keyholder: make agent unarmed alerts CRIT [puppet] - 10https://gerrit.wikimedia.org/r/174469 (owner: 10Ori.livneh) [19:14:18] ottomata: i replied on https://gerrit.wikimedia.org/r/#/c/174195/ [19:14:24] ottomata: how did the talk go?? [19:17:16] andrewbogott: You were the one Andrew said might be working on the OpenStackManger extension [19:17:26] Oh, yeah, that'd be me. [19:17:30] I will try to catch up later on [19:17:47] andrewbogott: Thanks, that would help. Just that one review (Ex:OSM) [19:18:16] ori: talk went really well! [19:18:28] :) was it recorded? [19:18:33] yes, don't know the link yet [19:18:34] but ja [19:18:39] sure sure :P [19:18:49] Reedy tonythomas, I need to grab food before i pass out. back in ~10 [19:19:11] ottomata: you got some positive feedback at http://www.meetup.com/Apache-Kafka-NYC/events/206917572/ [19:20:12] :) [19:20:51] oh crazy i used to work on that block! [19:21:40] Reedy: can we have one configuration change in BounceHandler ? we want $wgVERPdomainPart = 'wikimedia.org'; so that polonium routes it through our verp_api router [19:21:47] hm, ok, ori, so stats vs metrics vs something [19:21:58] i kinda like metrics better... [19:22:00] hm [19:22:04] metrics is fine by me [19:22:11] shall we settle on that? [19:22:23] well, debate with me! let's thinkg about it. once we pick it this name is gonna stick [19:22:31] (03PS1) 10Andrew Bogott: Revert most of "Nova config changes for icehouse" [puppet] - 10https://gerrit.wikimedia.org/r/174470 [19:22:37] what are you going to consume these messages with? 
[19:22:38] (03PS1) 10Legoktm: extdist: Support distributing skins [puppet] - 10https://gerrit.wikimedia.org/r/174471 [19:22:45] (03CR) 10jenkins-bot: [V: 04-1] extdist: Support distributing skins [puppet] - 10https://gerrit.wikimedia.org/r/174471 (owner: 10Legoktm) [19:23:46] (03CR) 10Andrew Bogott: [C: 032] Revert most of "Nova config changes for icehouse" [puppet] - 10https://gerrit.wikimedia.org/r/174470 (owner: 10Andrew Bogott) [19:24:07] ottomata: a backend i haven't written yet that will pipe into a time series db like graphite, opentsdb, or influxdb [19:24:41] it's going to mostly be a way of extended statsd to have a web-accessible endpoint, so 'stats' seemed consonant with that [19:25:11] aye [19:25:12] hm [19:25:18] (03PS1) 10Reedy: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174473 [19:25:20] (03PS1) 10Reedy: testwiki to 1.25wmf9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174474 [19:25:22] (03PS1) 10Reedy: Wikipedias to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174475 [19:25:24] !log disarming keyholder agent on tin to test alerts [19:25:24] (03PS1) 10Reedy: group0 to 1.25wmf9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174476 [19:25:27] Logged the message, Master [19:25:38] (03CR) 10Reedy: [C: 032] Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174473 (owner: 10Reedy) [19:25:42] and, all of the data is passed through as query params? 
[19:25:47] (03Merged) 10jenkins-bot: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174473 (owner: 10Reedy) [19:25:56] (03CR) 10Reedy: [C: 032] testwiki to 1.25wmf9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174474 (owner: 10Reedy) [19:26:04] (03Merged) 10jenkins-bot: testwiki to 1.25wmf9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174474 (owner: 10Reedy) [19:26:23] ottomata: yes, but i made the RxURL regexp capture any trailing path components too, since i wanted the option of attaching some meaning to that [19:26:48] ottomata: dsc of all people had a good idea for that ages ago [19:26:56] ori, just a thought, would it be slicker if the data was in some other header as a plain ol json object? [19:27:02] aye, that's cool [19:27:30] rather than json (I assume?) encode in query params? [19:27:36] encoded* [19:27:57] then you wouldn't have to parse uri_query to pull out your data [19:28:01] ottomata: the cheapest way to push data to a server from javascript if you don't care about the response is to create an image element and set its src attribute to the URL you want to ping. The downside of this approach is that you cannot craft special request headers, like you can with AJAX [19:28:08] the full value of the header would be a json object [19:28:22] hm [19:28:31] (03PS2) 10Legoktm: extdist: Support distributing skins [puppet] - 10https://gerrit.wikimedia.org/r/174471 [19:28:33] PROBLEM - Keyholder SSH agent on tin is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. [19:28:34] aye because all you have to do is make a simple request [19:28:36] hm [19:28:50] hm [19:28:50] aye [19:29:03] ok fair point. this woul dalso work in non JS browsers too [19:29:07] since all of the data is in the uri [19:29:08] yep [19:29:08] hm [19:29:33] RECOVERY - Keyholder SSH agent on tin is OK: OK: Keyholder is armed with all configured keys. 
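The image-beacon approach ori describes above puts every piece of data into the request URL, since no custom headers can be set. A minimal sketch of what such a URL might look like, assuming flat key/value metrics; the base URL `https://bits.example.org/beacon` and the helper name `beacon_url` are hypothetical and do not come from the log:

```python
from urllib.parse import urlencode


def beacon_url(base: str, metrics: dict) -> str:
    """Return a URL carrying all metrics as query parameters.

    A client would request this URL (for example as an image src) and
    ignore the response; the server only needs to log the query string.
    """
    return f"{base}?{urlencode(metrics)}"


if __name__ == "__main__":
    # Hypothetical base URL and metric name, for illustration only.
    print(beacon_url("https://bits.example.org/beacon", {"example.timing": 250}))
```

Because the payload lives entirely in the URI, the same request can be issued without JavaScript, which is the point ottomata concedes above.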
[19:29:44] i wouldn't use uri-encoded json, just key=val pairs, i think, since there's no needed for nested objects [19:29:51] aye cool [19:30:02] ori, if possible, i think i'd like to find a name that was more specific. [19:30:10] what kind of metrics/stats do you plan on sending thorugh? [19:30:18] !log upgrading other compute hosts: virt1001-1009 [19:30:22] got a few sample requests? [19:30:26] Logged the message, Master [19:30:45] timing data for ajax requests, timing data for ui interactions, counters for feature usage [19:31:32] tonythomas: Do we need something for labs too? [19:31:43] ottomata: here's an example: ve.performance.system.activation=1404,ve.performance.system.domLoad=1134 [19:32:04] (03PS1) 10Reedy: Set $wgVERPdomainPart = 'wikimedia.org'; [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174478 [19:32:31] ori, would you consider these all app metrics|stats? [19:32:38] (03PS1) 10Legoktm: Add SkinDistributor configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174480 [19:32:52] ottomata: yeah, i guess so [19:32:53] Reedy: I think we are only listening to bounces from mediawiki-verp.wmflabs.org in labs :\ [19:33:05] or, does that conflate with our use of 'app' as in appserver (mw?) [19:33:26] Reedy: https://gerrit.wikimedia.org/r/#/c/168622/9/manifests/role/mail.pp [19:33:38] !log reedy Started scap: testwiki to 1.25wmf9 and build l10n cache [19:33:38] the $verp_domains [19:33:40] Logged the message, Master [19:34:09] !log starting trusty upgrade of analytics1037 [19:34:12] Logged the message, Master [19:34:15] (03CR) 10Cscott: "This only partially solves the problem, as I'm not a parsoid-root (I'm just a parsoid-admin). But it's a step." [puppet] - 10https://gerrit.wikimedia.org/r/172780 (owner: 10Cscott) [19:34:26] (03CR) 1001tonythomas: [C: 031] Set $wgVERPdomainPart = 'wikimedia.org'; [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174478 (owner: 10Reedy) [19:34:44] Reedy: +2 ? 
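ori's sample payload above (`ve.performance.system.activation=1404,ve.performance.system.domLoad=1134`) is simple enough to parse without JSON. A minimal sketch, assuming that pairs are comma-separated and that values are integers, as in that one example; the function name `parse_metrics` is hypothetical, and the eventual backend may well use a different format:

```python
from typing import Dict


def parse_metrics(payload: str) -> Dict[str, int]:
    """Parse 'name=value,name=value' into a dict of integer metrics."""
    metrics: Dict[str, int] = {}
    for pair in payload.split(","):
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        metrics[name] = int(value)  # assumption: values are integers
    return metrics


if __name__ == "__main__":
    sample = "ve.performance.system.activation=1404,ve.performance.system.domLoad=1134"
    print(parse_metrics(sample))
```

Flat pairs like these map directly onto the metric names a time series store expects, which fits ori's remark that nested objects are not needed.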
[19:34:49] appmeter [19:34:50] appmetrics [19:34:56] app-metrics [19:35:11] actually, hypnens were annoying for some reason, i wanted to avoid them in topic names [19:35:17] app_metrics? [19:35:18] app_meter [19:35:19] ? [19:35:20] appstats [19:35:29] application_metrics [19:35:29] ? [19:35:32] application_stats? [19:35:35] application_meter? [19:36:01] the 'app' / 'application' prefix seems kinda verbose and annoying without making the purpose all that clearer imo [19:36:42] stats/metrics is just so generic though. i think it helps [19:36:49] if i saw a kafka topic called stats or metrics [19:36:54] i'm not so sure I would know what it was for [19:36:55] (03CR) 10Alexandros Kosiaris: "Aside from Brandon's valid comments, and a pedantic whitespace comment on my part, LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/174390 (owner: 10Giuseppe Lavagetto) [19:36:57] wikimetrics maygbe? [19:37:00] stats.wikimedia.org? [19:37:03] heh [19:37:03] something on the stat servers? [19:37:08] app_metrics seems fine [19:37:14] app_stats too [19:37:34] PROBLEM - DPKG on virt1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:37:36] or we can come up with some coolio name for this whole pipeline [19:37:51] 'statsv'! [19:37:53] haha [19:37:57] hm [19:38:00] as in varnish? [19:38:01] v? [19:38:05] vandetta [19:38:07] haha [19:38:10] hahahha [19:38:13] statsv as in systemv [19:38:25] and then replace with statsd as in systemd? [19:38:33] PROBLEM - DPKG on virt1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:38:44] PROBLEM - DPKG on virt1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:38:54] PROBLEM - DPKG on virt1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:38:54] PROBLEM - DPKG on virt1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:38:55] heh [19:38:58] statsv is actually no bad ori [19:39:01] not [19:39:09] i'm for it [19:39:12] its kinda like statsd, but a varnish endpoint? 
[19:39:14] PROBLEM - DPKG on virt1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:39:14] PROBLEM - DPKG on virt1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:39:16] yep [19:39:18] exactly [19:39:28] (03PS4) 10Alexandros Kosiaris: WIP: Modularize torrus [puppet] - 10https://gerrit.wikimedia.org/r/174389 [19:39:34] PROBLEM - DPKG on virt1009 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:39:35] hmm, ok! [19:39:36] cool! [19:39:39] let's do it, i like it [19:39:43] so [19:39:43] :D [19:39:45] * ori amends patch [19:39:58] ok, on with the esams show! [19:40:38] ok, ori, mind if I make an amendment after yours? I thikn if we are going to do this multi instance thing, i'd like to rename the main class too [19:40:42] the main varnishkafka instance class [19:40:44] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: Puppet has 2 failures [19:40:54] ottomata: sure [19:40:58] k [19:41:31] (03PS3) 10Ori.livneh: add varnish::logging::statslistener [puppet] - 10https://gerrit.wikimedia.org/r/174195 [19:41:33] ottomata: ^ all yours [19:42:01] oh, i guess the commit message needs to be amended [19:42:12] i'll leave that to you since you're amending anyhow [19:42:14] ori, i can do that [19:42:14] yeah [19:42:14] k [19:42:26] awesome, thanks very much [19:42:34] PROBLEM - puppet last run on virt1002 is CRITICAL: CRITICAL: Puppet has 5 failures [19:42:38] (03CR) 10Legoktm: "Depends on the https://gerrit.wikimedia.org/r/#/c/158055/ in the ExtensionDistributor extension." [puppet] - 10https://gerrit.wikimedia.org/r/174471 (owner: 10Legoktm) [19:43:42] YuviPanda: will you be around at 18:00 UTC to merge ^ for ExtensionDistributor? https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=135023&oldid=135011 [19:43:52] when's 18:00 UTC? [19:43:54] * YuviPanda checks [19:43:59] 10PST :P [19:44:11] tomorrow? isn't that already past now? 
[19:44:14] RECOVERY - DPKG on virt1007 is OK: All packages OK [19:44:14] RECOVERY - DPKG on virt1003 is OK: All packages OK [19:44:34] PROBLEM - DPKG on analytics1037 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:44:43] RECOVERY - DPKG on virt1001 is OK: All packages OK [19:44:44] RECOVERY - DPKG on virt1002 is OK: All packages OK [19:44:53] RECOVERY - DPKG on virt1008 is OK: All packages OK [19:44:54] RECOVERY - DPKG on virt1004 is OK: All packages OK [19:45:02] (03CR) 10Yuvipanda: [C: 04-1] "Does this need to be two cron jobs, two conf files, etc?" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/174471 (owner: 10Legoktm) [19:45:08] RECOVERY - DPKG on virt1005 is OK: All packages OK [19:45:22] (03PS5) 10Alexandros Kosiaris: WIP: Modularize torrus [puppet] - 10https://gerrit.wikimedia.org/r/174389 [19:45:36] YuviPanda: it's much easier doing it that way... also see https://gerrit.wikimedia.org/r/#/c/174468/ [19:46:01] legoktm: hmm, in that case, ok. but rename $settings to $ext_settings? 
[19:46:07] yup, doing [19:46:34] RECOVERY - DPKG on virt1009 is OK: All packages OK [19:46:48] oof i hate the nested classes in cache.pp [19:47:03] (03PS3) 10Legoktm: extdist: Support distributing skins [puppet] - 10https://gerrit.wikimedia.org/r/174471 [19:47:13] (03CR) 10Legoktm: extdist: Support distributing skins (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/174471 (owner: 10Legoktm) [19:47:15] (03PS1) 10BBlack: esams drain: RU->ulsfo, 8x->eqiad [dns] - 10https://gerrit.wikimedia.org/r/174484 [19:47:53] (03CR) 10BBlack: [C: 032] esams drain: RU->ulsfo, 8x->eqiad [dns] - 10https://gerrit.wikimedia.org/r/174484 (owner: 10BBlack) [19:48:59] !log starting the long slow process of draining out esams traffic ahead of power maint event [19:49:06] Logged the message, Master [19:49:10] (03CR) 10Legoktm: "Added to SWAT: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=135024&oldid=135023" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174480 (owner: 10Legoktm) [19:49:30] legoktm: why does the patch merging need to wait until SWAT? [19:49:39] YuviPanda: that's a different patch! [19:50:01] legoktm: oh, you wanted me to be around for the extdist extension patch, not the ops patch? [19:50:02] * YuviPanda is confused [19:50:10] no, to be around for the puppet patch [19:50:13] there are 4 patches [19:50:20] https://gerrit.wikimedia.org/r/#/q/status:open+topic:skindistributor,n,z [19:50:36] puppet patch can happen anytime no? [19:50:38] the mediawiki-config one is a no-op until the extension code is live, so we can do it early [19:50:40] uhh no [19:50:47] it should happen near the same time as the extension one [19:51:02] I'm confused again, but ok. [19:51:43] RECOVERY - DPKG on analytics1037 is OK: All packages OK [19:53:14] PROBLEM - puppet last run on analytics1037 is CRITICAL: Connection refused by host [19:53:36] Coren: interruptible? I did dist-upgrade on the virt hosts and I got a bunch of scary warnings from grub on virt1009. 
[19:53:43] It's fine now but I'm worried it won't boot now [19:54:08] ori, does this RxURL match any url that starts with s/? [19:55:14] PROBLEM - Host analytics1037 is DOWN: PING CRITICAL - Packet loss = 100% [19:55:58] ottomata: yes, and i promise that that's fine [19:56:12] we don't have any single-letter URLs, our URL patterns are extremely predictable and fixed [19:56:14] RECOVERY - Host analytics1037 is UP: PING OK - Packet loss = 0%, RTA = 3.25 ms [19:56:50] how about [19:56:51] statsv/ [19:56:51] ? [19:57:13] lame! why not s/? [19:57:19] cause it is explicit! [19:57:40] fine. chars in urls are a finite resource! :P [19:58:05] * YuviPanda has no idea what you guys are talking about, but s/ is already used for Ext:ShortURL in a couple of Indian language wikis [19:58:18] but not bits [19:58:24] ah, bits. ok [19:58:53] RECOVERY - puppet last run on virt1009 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:58:55] ottomata: s/ is best; sv/ is good; statsv/ is fine. [19:58:58] i wouldn't push this one if you really want s/, ori, but statsv/ seems more consistent and obvious as to waht is going on [19:59:04] these urls will show up in teh regular webrequest logs too [19:59:09] let's go with statsv/ [19:59:11] ok [19:59:28] varnish_opts => { 'm' => 'RxURL:^/statsv\//', }, [19:59:30] can you make that change? since you're amending anyhow? 
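The trade-off between `s/` and `statsv/` discussed above can be illustrated with two anchored patterns. This is only a sketch: the Python regexes below are a simplified reading of the RxURL filter, not the exact expression from the patch, and the sample paths (including the shape of a ShortURL-style path) are hypothetical.

```python
import re

# Simplified stand-ins for the two candidate URL prefixes.
SHORT_PREFIX = re.compile(r"^/s/")
EXPLICIT_PREFIX = re.compile(r"^/statsv/")

# Hypothetical request paths, for illustration only.
SAMPLE_PATHS = [
    "/s/4k2",
    "/statsv/?ve.performance.system.domLoad=1134",
    "/static/logo.png",
]


def matching(pattern, paths):
    """Return the paths that the anchored pattern matches."""
    return [path for path in paths if pattern.search(path)]


if __name__ == "__main__":
    print("short:   ", matching(SHORT_PREFIX, SAMPLE_PATHS))
    print("explicit:", matching(EXPLICIT_PREFIX, SAMPLE_PATHS))
```

The short prefix also catches any other path that happens to live under `/s/`, while the explicit prefix matches only the stats endpoint and is easier to recognise in the regular webrequest logs.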
[19:59:31] yep [20:00:44] RECOVERY - puppet last run on virt1002 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [20:01:20] yeah [20:04:03] (03PS1) 10BBlack: esams drain: CH/CZ/DE->eqiad, TR/UA->ulsfo [dns] - 10https://gerrit.wikimedia.org/r/174488 [20:04:29] (03CR) 10BBlack: [C: 032] esams drain: CH/CZ/DE->eqiad, TR/UA->ulsfo [dns] - 10https://gerrit.wikimedia.org/r/174488 (owner: 10BBlack) [20:05:54] (03PS4) 10Ottomata: add varnish::kafka::statsv [puppet] - 10https://gerrit.wikimedia.org/r/174195 (owner: 10Ori.livneh) [20:06:40] (03CR) 10jenkins-bot: [V: 04-1] add varnish::kafka::statsv [puppet] - 10https://gerrit.wikimedia.org/r/174195 (owner: 10Ori.livneh) [20:07:20] ottomata: are you sure you can do Varnishkafka::Instance { require +> Rsyslog::Conf['varnishkafka'] } ? [20:07:33] (03PS5) 10Ottomata: add varnish::kafka::statsv [puppet] - 10https://gerrit.wikimedia.org/r/174195 (owner: 10Ori.livneh) [20:07:36] i guess i can't do the +> [20:07:40] i thougth i could [20:07:42] i'm pretty sure i can do => [20:07:50] ottomata: I think you might need: [20:07:54] and its ok here, because none of these varnishkafka::instances are requireing [20:08:07] Rsyslog::Conf['varnishkafka'] -> Varnishkafka::Instance <| |> [20:08:17] hmm, that is probably moire flexible [20:08:26] this is wa [20:08:27] ori [20:08:27] https://docs.puppetlabs.com/puppet/latest/reference/lang_defined_types.html#resource-defaults [20:08:29] that's what i was doing [20:08:52] yeah, but you're overriding using the plusignment operator [20:09:02] which you can only do from subclasses of the class that declares the resource [20:09:04] i was trying yeah [20:09:06] can't do taht [20:09:17] my puppet syntax checker hasn't been working since I got a new computer :/ [20:09:25] Rsyslog::Conf['varnishkafka'] -> Varnishkafka::Instance <| |> should do it [20:09:25] but yours is better [20:09:39] i could override the require outright, but yours allows future subclasses to still use 
the require on their varnishkafka::instances [20:09:39] k [20:10:27] (03PS6) 10Ottomata: add varnish::kafka::statsv [puppet] - 10https://gerrit.wikimedia.org/r/174195 (owner: 10Ori.livneh) [20:12:16] (03CR) 10Ori.livneh: [C: 031] add varnish::kafka::statsv [puppet] - 10https://gerrit.wikimedia.org/r/174195 (owner: 10Ori.livneh) [20:23:14] !log starting trusty upgrade of analytics1038 [20:23:17] Logged the message, Master [20:24:44] (03PS5) 10Andrew Bogott: Move the openstack_version setting hiera. [puppet] - 10https://gerrit.wikimedia.org/r/173904 [20:27:05] (03PS1) 10BBlack: esams drain: 7x->ulsfo + 7x->eqiad [dns] - 10https://gerrit.wikimedia.org/r/174497 [20:27:26] (03PS1) 10Andrew Bogott: Changed some incorrect puppet paths in comments. [puppet] - 10https://gerrit.wikimedia.org/r/174498 [20:28:39] (03CR) 10Andrew Bogott: [C: 032] Changed some incorrect puppet paths in comments. [puppet] - 10https://gerrit.wikimedia.org/r/174498 (owner: 10Andrew Bogott) [20:29:54] (03PS5) 10Andrew Bogott: Allow sshd to pull ssh keys from ldap on Trusty. [puppet] - 10https://gerrit.wikimedia.org/r/173066 [20:30:59] (03PS3) 10Ottomata: Link aggregator dataset into wikimetrics public webspace [puppet] - 10https://gerrit.wikimedia.org/r/172285 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [20:33:04] (03PS2) 10Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage [puppet] - 10https://gerrit.wikimedia.org/r/173336 [20:33:53] PROBLEM - DPKG on analytics1038 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:36:04] (03CR) 10BBlack: [C: 032] esams drain: 7x->ulsfo + 7x->eqiad [dns] - 10https://gerrit.wikimedia.org/r/174497 (owner: 10BBlack) [20:45:03] RECOVERY - DPKG on analytics1038 is OK: All packages OK [20:48:25] (03PS1) 10BBlack: esams drain: rest of AS esams->ulsfo [dns] - 10https://gerrit.wikimedia.org/r/174499 [20:48:45] (03PS6) 1020after4: Set up redirects for bugzilla urls to redirect to phabricator. 
[puppet] - https://gerrit.wikimedia.org/r/174335
[20:49:42] andrewbogott: Sorry, never saw that ping. You all sorted out or do you want me to take a look?
[20:50:08] Coren: take a look please? I haven't dug into it much
[20:50:18] andrewbogott: Lemme go see.
[20:50:20] There are lots of scary grub warnings in the apt logs on virt1009. That's about all I know so far
[20:50:23] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Puppet has 3 failures
[20:50:31] (CR) BBlack: [C: 2] esams drain: rest of AS esams->ulsfo [dns] - https://gerrit.wikimedia.org/r/174499 (owner: BBlack)
[20:51:45] (PS7) Ottomata: add varnish::kafka::statsv [puppet] - https://gerrit.wikimedia.org/r/174195 (owner: Ori.livneh)
[20:52:03] bblack, ori wants to deploy this new varnishkafka endpoint on bits
[20:52:10] you look like you are doing some varnish stuff
[20:52:12] should we wait?
[20:52:47] yes please wait
[20:52:50] until like tomorrow :)
[20:52:57] okay, no problem
[20:53:02] ottomata: ^ that's fine by me
[20:54:14] PROBLEM - Host analytics1038 is DOWN: PING CRITICAL - Packet loss = 100%
[20:54:44] ja can do
[20:54:54] ori, earlier is better tomorrow
[20:54:57] i'm leaving early to drive to va
[20:55:08] ottomata: what time would be best for you?
[20:55:33] RECOVERY - Host analytics1038 is UP: PING OK - Packet loss = 0%, RTA = 1.02 ms
[20:56:24] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[20:57:30] !log starting trusty upgrade of analytics1038
[20:57:33] bah
[20:57:35] 1039
[20:58:28] (PS1) BBlack: esams drain: AF + 6x countries esams->eqiad [dns] - https://gerrit.wikimedia.org/r/174500
[20:58:50] (CR) BBlack: [C: 2] esams drain: AF + 6x countries esams->eqiad [dns] - https://gerrit.wikimedia.org/r/174500 (owner: BBlack)
[21:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come.
Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141119T2100).
[21:02:31] (CR) Ori.livneh: "Does the fallback work? We discovered a week or two ago that it wasn't; I'm not sure if we ever got it working." [puppet] - https://gerrit.wikimedia.org/r/174390 (owner: Giuseppe Lavagetto)
[21:02:53] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:05:58] (CR) BryanDavis: "Some minor nits inline, but it reads right I think. Have you tried it on beta yet?" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/173336 (owner: Anomie)
[21:06:43] (PS1) BBlack: esams drain: GB -> ulsfo [dns] - https://gerrit.wikimedia.org/r/174502
[21:06:54] PROBLEM - DPKG on analytics1039 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[21:10:03] (CR) BBlack: [C: 2] esams drain: GB -> ulsfo [dns] - https://gerrit.wikimedia.org/r/174502 (owner: BBlack)
[21:11:43] (PS1) Ottomata: Add libgoogle-glog-dev on stat1002 and stat1003 for Ironholds [puppet] - https://gerrit.wikimedia.org/r/174506
[21:14:14] (PS3) Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage [puppet] - https://gerrit.wikimedia.org/r/173336
[21:14:24] (CR) Anomie: "Not on Beta yet, I wanted someone else to +1 it first." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/173336 (owner: Anomie)
[21:15:16] (CR) BryanDavis: [C: 1] "Try it out in beta.
:)" [puppet] - https://gerrit.wikimedia.org/r/173336 (owner: Anomie)
[21:18:04] RECOVERY - DPKG on analytics1039 is OK: All packages OK
[21:18:45] !log reedy Finished scap: testwiki to 1.25wmf9 and build l10n cache (duration: 105m 06s)
[21:18:48] Logged the message, Master
[21:19:12] That is not cool scap, not cool at all
[21:19:58] (CR) Reedy: [C: 2] Wikipedias to 1.25wmf8 [mediawiki-config] - https://gerrit.wikimedia.org/r/174475 (owner: Reedy)
[21:20:22] (Merged) jenkins-bot: Wikipedias to 1.25wmf8 [mediawiki-config] - https://gerrit.wikimedia.org/r/174475 (owner: Reedy)
[21:21:01] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf8
[21:21:04] Logged the message, Master
[21:21:07] wwaaaaaaaaaaaaaaaaa
[21:21:32] (CR) Reedy: [C: 2] group0 to 1.25wmf9 [mediawiki-config] - https://gerrit.wikimedia.org/r/174476 (owner: Reedy)
[21:21:34] PROBLEM - Host analytics1039 is DOWN: PING CRITICAL - Packet loss = 100%
[21:21:41] (Merged) jenkins-bot: group0 to 1.25wmf9 [mediawiki-config] - https://gerrit.wikimedia.org/r/174476 (owner: Reedy)
[21:21:51] seriously, wtf
[21:21:51] (PS2) Ottomata: Add libgoogle-glog-dev on stat1002 and stat1003 for Ironholds [puppet] - https://gerrit.wikimedia.org/r/174506
[21:22:01] greg-g: We crushed two of the sync servers
[21:22:01] (CR) Ottomata: [C: 2 V: 2] Add libgoogle-glog-dev on stat1002 and stat1003 for Ironholds [puppet] - https://gerrit.wikimedia.org/r/174506 (owner: Ottomata)
[21:22:10] why?
[21:22:19] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf9
[21:22:20] Logged the message, Master
[21:22:21] what'd they do to you?
[21:22:24] (CR) Reedy: [C: 2] Set $wgVERPdomainPart = 'wikimedia.org'; [mediawiki-config] - https://gerrit.wikimedia.org/r/174478 (owner: Reedy)
[21:22:28] * greg-g isn't helping
[21:22:32] (Merged) jenkins-bot: Set $wgVERPdomainPart = 'wikimedia.org'; [mediawiki-config] - https://gerrit.wikimedia.org/r/174478 (owner: Reedy)
[21:22:33] RECOVERY - Host analytics1039 is UP: PING OK - Packet loss = 0%, RTA = 2.30 ms
[21:22:36] crush, lament, etc
[21:22:50] Conan told me it was the meaning of life
[21:22:55] mmm, 'puppet is running' is a very nice analog to 'my code is compiling
[21:22:57] '
[21:23:51] greg-g: Things were very uneven -- https://phabricator.wikimedia.org/P93
[21:24:12] greg-g: Sam's going to look into rearranging the slaves a bit
[21:25:29] !log starting trusty upgrade of analytics1040
[21:25:34] Logged the message, Master
[21:25:57] (PS4) Reedy: Add contact pages for legal to testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/119873
[21:27:37] (PS1) BBlack: esams drain: IE/IS/PT->ulsfo, FR/ES eqiad->ulsfo [dns] - https://gerrit.wikimedia.org/r/174512
[21:27:39] (PS1) BBlack: esams drain: 13x->eqiad [dns] - https://gerrit.wikimedia.org/r/174513
[21:27:48] (PS2) GWicke: Give parsoid-roots access to ruthenium and xenon.
[puppet] - https://gerrit.wikimedia.org/r/172780 (owner: Cscott)
[21:28:13] bd808: gotcha
[21:28:29] (CR) BBlack: [C: 2] esams drain: IE/IS/PT->ulsfo, FR/ES eqiad->ulsfo [dns] - https://gerrit.wikimedia.org/r/174512 (owner: BBlack)
[21:29:44] (PS5) Reedy: Add contact pages for legal to testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/119873
[21:29:53] (CR) BBlack: [C: 2] esams drain: 13x->eqiad [dns] - https://gerrit.wikimedia.org/r/174513 (owner: BBlack)
[21:30:03] (CR) Manybubbles: Configure Logstash and Elasticsearch for ApiFeatureUsage (1 comment) [puppet] - https://gerrit.wikimedia.org/r/173336 (owner: Anomie)
[21:31:03] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[21:31:40] (CR) BryanDavis: Configure Logstash and Elasticsearch for ApiFeatureUsage (1 comment) [puppet] - https://gerrit.wikimedia.org/r/173336 (owner: Anomie)
[21:32:54] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[21:33:01] (PS6) Reedy: Add contact pages for legal to metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/119873
[21:33:34] (CR) Reedy: [C: 2] Add contact pages for legal to metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/119873 (owner: Reedy)
[21:33:43] (Merged) jenkins-bot: Add contact pages for legal to metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/119873 (owner: Reedy)
[21:34:45] !log reedy Synchronized docroot and w: (no message) (duration: 00m 15s)
[21:34:47] Logged the message, Master
[21:35:16] !log reedy Synchronized wmf-config/: ContactPage for legal (duration: 00m 17s)
[21:35:18] Logged the message, Master
[21:36:35] Deskana|Away: https://meta.wikimedia.org/wiki/Special:Contact/requestlicense
[21:37:25] PROBLEM - DPKG on analytics1040 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[21:38:06] !log restarted txstatsd & carbon on labmon1001, recovering from missing points now
[21:38:09] Logged the message, Master
[21:38:12] ^ not very confidence inspiring
[21:38:22] iowait was ok as well (~3)
[21:39:33] (PS4) Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage [puppet] - https://gerrit.wikimedia.org/r/173336
[21:46:12] (PS1) BBlack: esams drain: GR/HU/NL/NO/PL/RO -> eqiad, LU/IM/IT -> ulsfo [dns] - https://gerrit.wikimedia.org/r/174574
[21:46:13] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: Puppet has 39 failures
[21:46:34] RECOVERY - DPKG on analytics1040 is OK: All packages OK
[21:46:38] (CR) BBlack: [C: 2] esams drain: GR/HU/NL/NO/PL/RO -> eqiad, LU/IM/IT -> ulsfo [dns] - https://gerrit.wikimedia.org/r/174574 (owner: BBlack)
[21:47:04] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail
[21:48:25] (PS5) Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage [puppet] - https://gerrit.wikimedia.org/r/173336
[21:50:53] (PS6) Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage
[puppet] - https://gerrit.wikimedia.org/r/173336
[21:56:11] (PS3) GWicke: Give parsoid-roots access to ruthenium; split cassandra test hosts [puppet] - https://gerrit.wikimedia.org/r/172780 (owner: Cscott)
[21:57:28] PROBLEM - Host analytics1040 is DOWN: PING CRITICAL - Packet loss = 100%
[21:58:33] RECOVERY - Host analytics1040 is UP: PING OK - Packet loss = 0%, RTA = 2.18 ms
[21:59:33] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[22:00:04] yurik: Dear anthropoid, the time has come. Please deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141119T2200).
[22:00:24] (PS7) Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage [puppet] - https://gerrit.wikimedia.org/r/173336
[22:01:16] !log starting trusty upgrade of analytics1041
[22:01:18] Logged the message, Master
[22:05:24] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[22:11:04] PROBLEM - DPKG on analytics1041 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[22:16:13] RECOVERY - DPKG on analytics1041 is OK: All packages OK
[22:21:54] PROBLEM - Host analytics1041 is DOWN: PING CRITICAL - Packet loss = 100%
[22:22:34] RECOVERY - Host analytics1041 is UP: PING OK - Packet loss = 0%, RTA = 2.52 ms
[22:45:14] (PS8) Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage [puppet] - https://gerrit.wikimedia.org/r/173336
[22:45:16] (CR) Cscott: [C: 1] "Works for me. Do you want to give parsoid-admins some sudo rights on ruthenium as well?
It would be nice if parsoid-admins could tickle " [puppet] - https://gerrit.wikimedia.org/r/172780 (owner: Cscott)
[22:46:41] (PS9) Anomie: Configure Logstash and Elasticsearch for ApiFeatureUsage [puppet] - https://gerrit.wikimedia.org/r/173336
[22:46:56] (PS1) Aklapper: Phab: Change user visible strings "Execute Query" and "Real Name" [puppet] - https://gerrit.wikimedia.org/r/174583
[22:53:55] (PS2) Legoktm: Phab: Change user visible strings "Execute Query" and "Real Name" [puppet] - https://gerrit.wikimedia.org/r/174583 (owner: Aklapper)
[22:53:57] (PS1) Cmjohnson: Adding dns records for mgmt and production for mw1220 -mw1246 [dns] - https://gerrit.wikimedia.org/r/174584
[22:59:20] (PS1) Kaldari: Adding Wikipedia wordmark for mobile and switching to it [mediawiki-config] - https://gerrit.wikimedia.org/r/174585 (https://bugzilla.wikimedia.org/58886)
[23:01:32] (PS1) Bmansurov: Add 'types of albums' WikiGrok campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/174586
[23:02:03] (CR) Qgil: [C: 1] Phab: Change user visible strings "Execute Query" and "Real Name" [puppet] - https://gerrit.wikimedia.org/r/174583 (owner: Aklapper)
[23:03:28] (CR) Cmjohnson: [C: 2] Adding dns records for mgmt and production for mw1220 -mw1246 [dns] - https://gerrit.wikimedia.org/r/174584 (owner: Cmjohnson)
[23:10:44] (CR) GWicke: "I think I'd prefer to add you to parsoid-roots. That way you can also attach a debugger to production parsoid instances as user 'parsoid'."
[puppet] - https://gerrit.wikimedia.org/r/172780 (owner: Cscott)
[23:20:45] (PS3) Aaron Schulz: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/173472
[23:44:14] (CR) Chad: [C: 1] Add SkinDistributor configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/174480 (owner: Legoktm)
[23:47:20] (CR) Chad: [C: 1] extdist: Support distributing skins [puppet] - https://gerrit.wikimedia.org/r/174471 (owner: Legoktm)
[23:55:55] (CR) Giuseppe Lavagetto: "The fallback has been fixed last week, but it didn't work for a long time; so while it was left here as a safeguard; I think it could be e" [puppet] - https://gerrit.wikimedia.org/r/174390 (owner: Giuseppe Lavagetto)
[23:59:38] <^d> Gather round the campfire kids. It's time for SWAT! We have prizes today!
[23:59:57] (PS2) Kaldari: Adding Wikipedia wordmark for mobile and switching to it [mediawiki-config] - https://gerrit.wikimedia.org/r/174585 (https://bugzilla.wikimedia.org/58886)
[23:59:57] <^d> Pinging legoktm, tgr and kaldari in no particular order for swat :)