[00:01:42] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: Epic puppet fail
[00:02:13] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[00:03:41] !log ori Finished scap: SWAT: d3de89777, 7abfe0d5e7, 8ec9853c32b, 476e9e90bd01 (duration: 06m 29s)
[00:03:47] Logged the message, Master
[00:05:24] James_F: looks ok?
[00:06:57] (PS7) Ori.livneh: Add 'trebuchet' package provider and role. [operations/puppet] - https://gerrit.wikimedia.org/r/155603 (https://bugzilla.wikimedia.org/59931)
[00:11:02] (PS26) BBlack: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[00:11:04] (PS5) BBlack: Create internal LVS cluster 'hhvm_appservers' [operations/puppet] - https://gerrit.wikimedia.org/r/152908 (owner: Mark Bergsma)
[00:11:06] (PS5) BBlack: Add monitoring for LVS service hhvm-appservers.svc.eqiad.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/152909 (owner: Mark Bergsma)
[00:17:26] (CR) BBlack: [C: 2] "What could possibly go wrong? :p" [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[00:17:46] (CR) BBlack: [C: 2] Create internal LVS cluster 'hhvm_appservers' [operations/puppet] - https://gerrit.wikimedia.org/r/152908 (owner: Mark Bergsma)
[00:19:13] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[00:19:19] grr fuck you strontium
[00:20:43] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[00:28:42] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[00:28:43] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[00:29:44] (PS1) Dzahn: Revert "switch svn.wm.org over to misc-web-lb.eqiad" [operations/dns] - https://gerrit.wikimedia.org/r/155676
[00:32:53] (CR) Dzahn: "antimony backend is configured to port 8080" [operations/dns] - https://gerrit.wikimedia.org/r/155676 (owner: Dzahn)
[00:33:57] (CR) Dzahn: [C: 2] "antimony backend is configured to port 8080" [operations/dns] - https://gerrit.wikimedia.org/r/155676 (owner: Dzahn)
[00:40:06] ori: Sorry, yes, looks OK; did RoanKattouw_away confirm in my absence or did he forget? :-)
[00:40:22] (CR) BBlack: [C: 2] Add monitoring for LVS service hhvm-appservers.svc.eqiad.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/152909 (owner: Mark Bergsma)
[00:40:39] i hadn't pinged him
[00:40:47] all good though
[00:41:42] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:41:42] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[00:43:23] (CR) Dzahn: "since the misc. varnish is configured to have the antimony backend on port 8080:" [operations/puppet] - https://gerrit.wikimedia.org/r/154973 (owner: Dzahn)
[00:45:01] (CR) Dzahn: "antimony backend is configured to port 8080" [operations/puppet] - https://gerrit.wikimedia.org/r/155077 (owner: Dzahn)
[01:07:35] (PS1) Chad: WIP: Collection of fun bash scripts for managing elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/155679
[01:10:23] (PS1) Dzahn: add W3C wikis table [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/155680
[01:13:22] (PS1) Ori.livneh: Trusty app servers should use HHVM app server LVS pool [operations/puppet] - https://gerrit.wikimedia.org/r/155681
[01:13:26] ^ bblack
[01:14:23] (CR) Ori.livneh: "To answer the obvious question: there are no non-HHVM Trusty app servers; we use the distribution name elsewhere to choose the interpreter" [operations/puppet] - https://gerrit.wikimedia.org/r/155681 (owner: Ori.livneh)
[01:17:46] (PS1) Dzahn: wikistats - retab it all [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/155682
[01:18:10] (CR) BBlack: [C: 2] Trusty app servers should use HHVM app server LVS pool [operations/puppet] - https://gerrit.wikimedia.org/r/155681 (owner: Ori.livneh)
[02:01:22] (PS1) Dzahn: wikistats - move wsa out of /usr/local/bin [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/155684
[02:05:04] (PS1) Dzahn: wikistats - bump up package version [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/155685
[02:07:18] (CR) Dzahn: [C: 2] wikistats - move wsa out of /usr/local/bin [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/155684 (owner: Dzahn)
[02:07:38] (PS2) Dzahn: wikistats - bump up package version [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/155685
[02:08:02] (CR) Dzahn: "recheck" [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/155682 (owner: Dzahn)
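The HTTP 5xx alerts above report what share of recent datapoints crossed a threshold, not a single reading. A minimal Python sketch of that kind of evaluation, assuming a window of datapoints, a warning threshold of 250 and a critical threshold of 500 (the numbers in the alert text), and a hypothetical `max_percent` cutoff of 1%. The function names and exact semantics are illustrative, not the actual check plugin. The figures are consistent with short windows: 1 bad point out of 15 gives 6.67%, and 1 out of 14 gives 7.14%.

```python
from typing import Optional, Sequence, Tuple


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]  # skip gaps (None) in the series
    if not values:
        return 0.0
    over = sum(1 for v in values if v > threshold)
    return 100.0 * over / len(values)


def evaluate(
    datapoints: Sequence[Optional[float]],
    warning: float = 250.0,
    critical: float = 500.0,
    max_percent: float = 1.0,  # hypothetical cutoff, suggested by "Less than 1.00%"
) -> Tuple[str, str]:
    """Return (state, message); alert once more than max_percent% of points cross a threshold."""
    crit_pct = percent_above(datapoints, critical)
    if crit_pct > max_percent:
        return "CRITICAL", f"{crit_pct:.2f}% of data above the critical threshold [{critical}]"
    warn_pct = percent_above(datapoints, warning)
    if warn_pct > max_percent:
        return "WARNING", f"{warn_pct:.2f}% of data above the warning threshold [{warning}]"
    return "OK", f"Less than {max_percent:.2f}% above the threshold [{warning}]"
```

For example, `evaluate([100.0] * 14 + [600.0])` yields a CRITICAL state with the same message text as the 00:28 alerts. Counting a percentage of the window, rather than alerting on the latest value, keeps one noisy sample from paging anyone on its own.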
[02:10:12] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:10:31] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:11:11] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:11:31] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:13:39] (CR) Dzahn: puppetmaster - use ssl_ciphersuite (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153986 (owner: Dzahn)
[02:14:17] (PS3) Dzahn: puppetmaster - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153986
[02:33:14] !log LocalisationUpdate completed (1.24wmf17) at 2014-08-22 02:32:11+00:00
[02:33:23] Logged the message, Master
[02:41:51] (PS1) BryanDavis: Do not define MEDIAWIKI before loading WebStart.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155687
[02:47:12] PROBLEM - DPKG on dataset1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:48:01] RECOVERY - DPKG on dataset1001 is OK: All packages OK
[03:19:48] !log LocalisationUpdate completed (1.24wmf18) at 2014-08-22 03:18:44+00:00
[03:19:54] Logged the message, Master
[03:34:29] (PS2) Chad: WIP: Collection of fun bash scripts for managing elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/155679
[04:11:54] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 22 04:10:47 UTC 2014 (duration 10m 46s)
[04:11:59] Logged the message, Master
[04:27:51] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:44:51] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:16:21] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time
[06:22:21] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.007 second response time
[06:22:22] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 37 data above and 0 below the confidence bounds
[06:22:22] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 37 data above and 0 below the confidence bounds
[06:25:32] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: / 234 MB (0% inode=96%):
[06:28:01] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Epic puppet fail
[06:28:21] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:21] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Epic puppet fail
[06:28:21] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:42] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:11] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:11] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:21] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:11] RECOVERY - Disk space on ms1004 is OK: DISK OK
[06:34:12] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 679 MB (3% inode=94%): /var/lib/ureadahead/debugfs 679 MB (3% inode=94%):
[06:35:51] PROBLEM - puppet last run on db1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:22] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:46:11] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:46:21] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:46:21] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[06:46:41] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:47:01] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:47:11] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:47:21] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:53:51] RECOVERY - puppet last run on db1047 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:58:32] RECOVERY - Disk space on elastic1016 is OK: DISK OK
[07:22:21] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Epic puppet fail
[07:41:21] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[08:18:51] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 06:17:47 UTC
[08:24:31] PROBLEM - mailman archives on sodium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:25:21] RECOVERY - mailman archives on sodium is OK: HTTP OK: HTTP/1.1 200 OK - 54206 bytes in 0.016 second response time
[08:30:12] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[08:30:31] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[08:31:12] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[08:31:32] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[09:16:01] Reedy: there? I have a question re: sync-common file not found on mw1019
[09:18:57] hashar_: the new gerrit output from jenkins-bot is fancy!
[09:21:56] (CR) Filippo Giunchedi: [C: 1] remove HTTPS config from gitblit template (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/154973 (owner: Dzahn)
[09:24:22] (CR) Filippo Giunchedi: "is the change to cdh submodule intended?" [operations/puppet] - https://gerrit.wikimedia.org/r/154371 (owner: Dzahn)
[09:26:24] (CR) Filippo Giunchedi: [C: 1] remove blog.wikmedia.org related things [operations/puppet] - https://gerrit.wikimedia.org/r/153117 (owner: Dzahn)
[09:26:59] (PS4) Nemo bis: remove blog.wikimedia.org related things [operations/puppet] - https://gerrit.wikimedia.org/r/153117 (owner: Dzahn)
[09:27:34] (CR) Filippo Giunchedi: [C: 1] download.wm.org - use apache::site method [operations/puppet] - https://gerrit.wikimedia.org/r/153817 (owner: Dzahn)
[09:30:42] (CR) Filippo Giunchedi: contint: phpcs mw standard on labs slaves (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155544 (https://bugzilla.wikimedia.org/64858) (owner: Hashar)
[09:31:19] (CR) Filippo Giunchedi: [C: 2 V: 2] gerrit secure.config.erb - variable access [operations/puppet] - https://gerrit.wikimedia.org/r/155433 (owner: Dzahn)
[09:31:52] godog: I am happy to know you like the pretty format. Honestly, I have just copy pasted code from OpenStack and qchris kindly enhanced the crazy regex :D
[09:32:18] godog: for sync-common, ori said yesterday it was some PATH issue apparently
[09:33:53] (CR) Filippo Giunchedi: [C: 2 V: 2] "seems so, was removed in Ia9ed6329" [operations/puppet] - https://gerrit.wikimedia.org/r/153226 (owner: Dzahn)
[09:34:52] hashar: yes it is! I was curious what script(s) generated that because they might want to source mw-deployment-vars.sh
[09:35:00] which will contain the adjusted PATH
[09:36:06] either that or we restore the symlinks from /srv/deployment into /usr/local/bin
[09:36:27] the idea is to get rid of the symlink iirc
[09:38:06] yep they are gone from new machines/reinstalls
[09:47:16] (CR) Filippo Giunchedi: jenkins: use openjdk-7-jre-headless (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153764 (owner: Hashar)
[09:55:57] (CR) Hashar: jenkins: use openjdk-7-jre-headless (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153764 (owner: Hashar)
[09:56:24] godog: basically java 6 -> 7 could yield more perf / less mem usage
[09:58:06] hashar: ah ok, are there many jobs using jvm btw? I think we should just go for 7 everywhere
[09:58:47] not sure
[09:58:56] _joe_ proposed to tweak Debian alternative
[09:59:00] but I am not sure of the side effect
[09:59:21] maybe all jobs already point to java 7 (the jvm to use is configurable in the jobs config)
[10:09:34] would it be hard to find how many are using what java binary?
[10:11:35] I'm asking because eventually (trusty?) it'd make sense to move everything to 7 unless we have a good reason not to
[10:13:37] I guess so :-D
[10:13:59] as for finding out jobs using java , I am not sure
[10:14:06] I should grep the config files maybe
[10:18:21] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: Puppet last ran 14419 seconds ago, expected 14400
[10:19:51] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 06:17:47 UTC
[10:23:41] RECOVERY - Puppet freshness on elastic1016 is OK: puppet ran at Fri Aug 22 10:23:37 UTC 2014
[10:24:21] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[10:52:41] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[10:58:43] (PS1) Filippo Giunchedi: elasticsearch: decrease ganglia stats timeout [operations/puppet] - https://gerrit.wikimedia.org/r/155703
[10:59:28] (CR) jenkins-bot: [V: -1] elasticsearch: decrease ganglia stats timeout [operations/puppet] - https://gerrit.wikimedia.org/r/155703 (owner: Filippo Giunchedi)
[11:01:22] (PS2) Filippo Giunchedi: elasticsearch: decrease ganglia stats timeout [operations/puppet] - https://gerrit.wikimedia.org/r/155703
[11:02:47] (CR) Filippo Giunchedi: [C: 2 V: 2] elasticsearch: decrease ganglia stats timeout [operations/puppet] - https://gerrit.wikimedia.org/r/155703 (owner: Filippo Giunchedi)
[11:47:41] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[12:29:09] any staff'er want to folow up a requested Common.js change on de.wp (not MMV related): https://bugzilla.wikimedia.org/show_bug.cgi?id=69897
[12:30:01] thedjNotWMF: do you use the +volunteer mode on IRC ? :p
[12:30:14] PierreSelim: deifnetly
[12:47:42] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.02
[13:04:25] (PS1) Filippo Giunchedi: filippo: let .bash_profile call .bashrc [operations/puppet] - https://gerrit.wikimedia.org/r/155705
[13:04:46] (CR) Filippo Giunchedi: [C: 2 V: 2] filippo: let .bash_profile call .bashrc [operations/puppet] - https://gerrit.wikimedia.org/r/155705 (owner: Filippo Giunchedi)
[13:16:31] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:17:19] (PS1) Hashar: contint: switch localvhost to apache::conf [operations/puppet] - https://gerrit.wikimedia.org/r/155707 (https://bugzilla.wikimedia.org/68256)
[13:17:24] (PS1) Hashar: contint: migrate localvhost to apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/155708
[13:24:38] (CR) Hashar: [C: 1 V: 2] "Pass on:" [operations/puppet] - https://gerrit.wikimedia.org/r/155707 (https://bugzilla.wikimedia.org/68256) (owner: Hashar)
[13:28:32] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[13:28:51] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[13:30:20] (CR) Hashar: [C: -1] contint: switch localvhost to apache::conf (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155707 (https://bugzilla.wikimedia.org/68256) (owner: Hashar)
[13:31:08] (PS2) Hashar: contint: switch localvhost to apache::conf [operations/puppet] - https://gerrit.wikimedia.org/r/155707 (https://bugzilla.wikimedia.org/68256)
[13:33:50] (PS2) Hashar: contint: migrate localvhost to apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/155708
[13:34:14] (CR) Hashar: [C: 1 V: 2] "Fixed up the replaces => parameter which expects a relative path." [operations/puppet] - https://gerrit.wikimedia.org/r/155707 (https://bugzilla.wikimedia.org/68256) (owner: Hashar)
[13:34:31] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[13:37:42] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[13:38:24] (CR) Hashar: [C: 1 V: 2] "Deployed on integration puppetmaster." [operations/puppet] - https://gerrit.wikimedia.org/r/155708 (owner: Hashar)
[13:41:32] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[13:41:51] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:50:30] (PS1) Mark Bergsma: Add asw-[a-c]-codfw.mgmt.codfw.wmnet to RANCID [operations/puppet] - https://gerrit.wikimedia.org/r/155712
[13:51:18] (CR) Mark Bergsma: [C: 2] Add asw-[a-c]-codfw.mgmt.codfw.wmnet to RANCID [operations/puppet] - https://gerrit.wikimedia.org/r/155712 (owner: Mark Bergsma)
[14:04:50] (CR) Hashar: contint: phpcs mw standard on labs slaves (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155544 (https://bugzilla.wikimedia.org/64858) (owner: Hashar)
[14:05:04] (PS2) Hashar: contint: phpcs mw standard on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/155544 (https://bugzilla.wikimedia.org/64858)
[14:13:21] (PS4) Ottomata: Add cron job to drop old data in HDFS [operations/puppet] - https://gerrit.wikimedia.org/r/155549
[14:13:27] (CR) Ottomata: [C: 2 V: 2] Add cron job to drop old data in HDFS [operations/puppet] - https://gerrit.wikimedia.org/r/155549 (owner: Ottomata)
[14:26:27] (CR) Filippo Giunchedi: [C: 2 V: 2] contint: phpcs mw standard on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/155544 (https://bugzilla.wikimedia.org/64858) (owner: Hashar)
[14:32:01] chasemp: i think this documentation is out of date, right?
[14:32:01] https://wikitech.wikimedia.org/wiki/RT_Triage_Duty#Creating_new_shell_users
[14:32:13] trying to find how I look up a user's ldap uid
[14:32:15] i should use that, right?
[14:38:39] (PS1) Mark Bergsma: Cleanup now unused Tampa IPv6 subnets and IPs [operations/dns] - https://gerrit.wikimedia.org/r/155724
[14:39:06] (CR) Mark Bergsma: [C: 2] Cleanup now unused Tampa IPv6 subnets and IPs [operations/dns] - https://gerrit.wikimedia.org/r/155724 (owner: Mark Bergsma)
[14:39:40] !log Removed IPv6 subnets 2620:0:860:1::/64 (squid subnet) and 2620:0:860:3::/64 (sandbox subnet) from cr2-pmtpa configuration
[14:39:47] Logged the message, Master
[14:43:30] (CR) Filippo Giunchedi: [C: 1] Add 'trebuchet' package provider and role. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155603 (https://bugzilla.wikimedia.org/59931) (owner: Ori.livneh)
[14:44:49] (PS1) Mark Bergsma: Remove remaining Tampa IPv6 addresses in use [operations/dns] - https://gerrit.wikimedia.org/r/155725
[14:47:47] (CR) Mark Bergsma: [C: 2] Remove remaining Tampa IPv6 addresses in use [operations/dns] - https://gerrit.wikimedia.org/r/155725 (owner: Mark Bergsma)
[14:48:04] (PS1) Ottomata: Add new shell account for Elliott Eggleston (ejegg), add to deployment group [operations/puppet] - https://gerrit.wikimedia.org/r/155726
[14:49:00] (PS2) Ottomata: Add new shell account for Elliott Eggleston (ejegg), add to deployment group [operations/puppet] - https://gerrit.wikimedia.org/r/155726
[14:49:07] (CR) Ottomata: [C: 2 V: 2] Add new shell account for Elliott Eggleston (ejegg), add to deployment group [operations/puppet] - https://gerrit.wikimedia.org/r/155726 (owner: Ottomata)
[14:51:29] !log switched s1 sanitarium and labsdb replication to db1069:3311 mariadb 10
[14:51:36] Logged the message, Master
[14:53:49] RECOVERY - SSH on mw1130 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
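hashar suggests grepping the job config files to see which Java each job uses. A minimal Python sketch of that survey, assuming the standard Jenkins layout of one `jobs/<name>/config.xml` per job and that a job pinned to a specific JDK records it in a top-level `<jdk>` element. Jobs without one are counted as `<default>`, meaning whatever `java` the node provides. `count_jdks` is a hypothetical helper, not an existing tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_jdks(jobs_dir: str) -> Counter:
    """Tally the JDK each Jenkins job selects, reading jobs/<name>/config.xml."""
    counts: Counter = Counter()
    for config in sorted(Path(jobs_dir).glob("*/config.xml")):
        try:
            root = ET.parse(config).getroot()
        except ET.ParseError:
            counts["<unparseable>"] += 1
            continue
        # Assumption: the job's JDK choice is a direct <jdk> child of the root element.
        jdk = (root.findtext("jdk") or "").strip()
        counts[jdk or "<default>"] += 1
    return counts
```

Run against the Jenkins home's `jobs` directory (for example `/var/lib/jenkins/jobs` on a Debian-style install, also an assumption), `count_jdks(...).most_common()` would list how many jobs still rely on the default JVM before the Debian alternative is changed.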
[15:00:09] (PS1) Springle: Reassign db1053 to s4 [operations/puppet] - https://gerrit.wikimedia.org/r/155731
[15:00:56] (PS1) Aklapper: When exporting Bugzilla tickets via Chase's script we run into an API bug with specific Unicode letters for https://bugzilla.wikimedia.org/show_bug.cgi?id=9444#c0. This is applying a hackish upstream workaround described in https://bugzilla.mozilla.org/sh [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/155732 (https://bugzilla.wikimedia.org/69747)
[15:01:36] (CR) Springle: [C: 2] Reassign db1053 to s4 [operations/puppet] - https://gerrit.wikimedia.org/r/155731 (owner: Springle)
[15:01:59] (PS2) Aklapper: Work around Bugzilla XML RPC bug with special Unicode characters [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/155732 (https://bugzilla.wikimedia.org/69747)
[15:08:29] !log Still no apache2.log on fluorine or in logstash. Log seems to be available on fenari.
[15:08:35] Logged the message, Master
[15:09:08] Any opsen want to figure out what broke that ^^ in the last few weeks?
[15:09:31] It makes logstash much less useful to not see apache fatals
[15:18:35] !log upgrade & restart db1053, fs check
[15:18:42] Logged the message, Master
[15:20:48] bd808: sorry I can't right now, perhaps later :( I do have one question though re: the failures during deployment with "bash: sync-common not found" on mw1019, I think we could either restore scap symlinks in /usr/local/bin or (preferred I think) make the callers source mw-deployment-vars so they have scap in PATH
[15:22:00] godog: Ummm... sure. "callers" is either an ssh connection initiated by scap, puppet, or a user in their home shell
[15:22:56] ori wanted to kill the /usr/local/bin symlinks out of puppet OCD more than anything else I think
[15:24:03] I'm not sure how I would change scap internally to ensure that the path is updated before any calls.
[15:25:00] Ori's assumption was that he was setting up the system path to include /srv/deployment/scap/scap/bin I think.
[15:26:01] yep but that doesn't always work, because /etc/profile.d where the path is added only gets invoked by login shells
[15:26:03] ssh mw1017.eqiad.wmnet 'echo $PATH'
[15:26:16] It totally doesn't work :(
[15:27:09] So either we put the symlinks back or I figure out how to make scap invoke a login shell on every ssh connection. I'd vote for symlinks personally
[15:27:19] if you login and hit echo $PATH it works
[15:27:26] yeah I now think that too
[15:29:37] bd808: thanks for your help!
[15:29:38] I think sync-common and scap-rebuild-cdbs are the only things that are called via ssh
[15:29:58] okay we can start with those
[15:30:17] Well there are other things (rm and rsync) but they are already in the default path
[15:33:45] <^d> !log elastic1008: fixed /etc/hosts to point to actual IP instead of loopback
[15:33:52] Logged the message, Master
[15:34:22] <^d> godog: ^ should pick up next time elastic restarts on it, not going to roll it now for no other reason.
[15:35:51] ^d: yep, thanks! for reference that's RT #8130
[15:37:26] <^d> noted, thx for the help debugging it
[15:39:31] RECOVERY - Apache HTTP on mw1130 is OK: HTTP OK: HTTP/1.1 200 OK - 454 bytes in 0.007 second response time
[15:39:33] (PS1) Filippo Giunchedi: mediawiki: restore scap symlinks [operations/puppet] - https://gerrit.wikimedia.org/r/155736
[15:39:38] bd808: ^
[15:40:11] bd808: I reinstalled mw1130 and will be adding back shortly..anything you want me to do first?
[15:40:24] (CR) BryanDavis: [C: 1] "LGTM" [operations/puppet] - https://gerrit.wikimedia.org/r/155736 (owner: Filippo Giunchedi)
[15:40:35] ^d: no problem, I'm thinking of at least adding a check to puppet so we are aware
[15:41:07] cmjohnson1: Make sure ti works? :) Run sync-common followed by scap-rebuild-cdbs to get it on the latest code.
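The root cause discussed here: `/etc/profile.d` snippets are only read by login shells, and `ssh host 'cmd'` runs the command in a non-login, non-interactive shell, so the scap directory never reaches PATH. bd808's alternative to restoring symlinks was to have scap run every remote command through a login shell. A minimal Python sketch of building such an ssh command line; `ssh_login_shell_argv` is a hypothetical helper, not scap's actual code.

```python
import shlex
from typing import List


def ssh_login_shell_argv(host: str, remote_cmd: str) -> List[str]:
    """Build an ssh argv that runs remote_cmd inside a bash login shell on host.

    ssh joins its remote arguments into one string that the remote user's
    shell parses, so remote_cmd is quoted exactly once for that shell.
    `bash -l` reads /etc/profile, which on Debian/Ubuntu sources /etc/profile.d/*.sh.
    """
    return ["ssh", host, "bash -lc " + shlex.quote(remote_cmd)]
```

Used as `subprocess.run(ssh_login_shell_argv("mw1019.eqiad.wmnet", "sync-common"), check=True)`, the remote PATH would include whatever `/etc/profile.d` adds. One trade-off is that every connection then also runs everything else in the profile scripts, which plausibly makes symlinks on the default PATH the simpler option.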
[15:41:17] (CR) Filippo Giunchedi: [C: 2 V: 2] mediawiki: restore scap symlinks [operations/puppet] - https://gerrit.wikimedia.org/r/155736 (owner: Filippo Giunchedi)
[15:41:21] k
[15:41:24] I think that puppet is not doing that at the moment
[15:41:48] bd808: thanks!
[15:43:17] (CR) BryanDavis: "You can validate that the profile.d hook is not working for non-interactive shells with `ssh mw1017.eqiad.wmnet 'echo $PATH'`" [operations/puppet] - https://gerrit.wikimedia.org/r/155736 (owner: Filippo Giunchedi)
[15:45:50] RECOVERY - Puppet freshness on mw1130 is OK: puppet ran at Fri Aug 22 15:45:41 UTC 2014
[15:47:53] (PS1) Ottomata: Run webstatscollector (modified) with kafkatee on analytics1003 [operations/puppet] - https://gerrit.wikimedia.org/r/155740
[15:48:06] okay deployments should be working as expected on mw1019 too now (hhvm)
[15:48:06] (PS1) KartikMistry: Enable webfonts by default for Divehi (dv) wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155741 (https://bugzilla.wikimedia.org/69860)
[15:50:30] PROBLEM - Apache HTTP on mw1130 is CRITICAL: Connection refused
[15:51:20] (CR) Glaisher: [C: 1] Enable webfonts by default for Divehi (dv) wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155741 (https://bugzilla.wikimedia.org/69860) (owner: KartikMistry)
[15:51:51] (PS2) Ottomata: Run webstatscollector (modified) with kafkatee on analytics1003 [operations/puppet] - https://gerrit.wikimedia.org/r/155740
[15:52:00] (CR) Ottomata: [C: 2 V: 2] Run webstatscollector (modified) with kafkatee on analytics1003 [operations/puppet] - https://gerrit.wikimedia.org/r/155740 (owner: Ottomata)
[15:52:05] aaaaah beta wikidata broke again
[15:52:18] wonder why it works for a few days then breaks
[15:53:00] !log springle Synchronized wmf-config/db-eqiad.php: depool db1056 while cloning (duration: 00m 07s)
[15:53:06] Logged the message, Master
[15:54:16] !log xtrabackup db1056 to db1053
[15:54:22] Logged the message, Master
[15:55:59] bd808: btw fluorine hasn't got the mediawiki config for rsyslog likely the reason why apache2.log isn't there, has it ever been the case?
[15:56:45] ottomata: man I think that was out of date before my time :) in the README for the admin module there is a command I think to lookup their ldap stuff
[15:56:50] I think that fenari used to turn the syslog feed into udp2log feed aimed at flourine
[15:57:19] i just updated it today, let's see if I got the right command!
[15:57:29] So either udp2log is down on fenari or udp communication is blocked somewhere
[15:57:31] RECOVERY - check if dhclient is running on mw1130 is OK: PROCS OK: 0 processes with command name dhclient
[15:57:31] RECOVERY - Disk space on mw1130 is OK: DISK OK
[15:57:31] RECOVERY - DPKG on mw1130 is OK: All packages OK
[15:57:31] RECOVERY - RAID on mw1130 is OK: OK: no RAID installed
[15:57:31] RECOVERY - check configured eth on mw1130 is OK: NRPE: Unable to read output
[15:57:44] chasemp: where do you run ldaplist?
[15:57:46] i couldn't get that to work for me
[15:57:53] oh on silver...
[15:58:28] godog: I think I saw m.ark say something about pulling ipv6 from fenari, maybe that broke it?
[15:58:30] yeah, that command doesn't work for me
[15:58:33] The database you selected does not exist. Please use "ldaplist -h" to see available databases.
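The fix merged above ("mediawiki: restore scap symlinks") puts the scap entry points back on the default PATH through /usr/local/bin, so non-login ssh commands find them again. The real change is Puppet code. As an illustration only, here is a minimal idempotent Python sketch of the same step, assuming the scripts live in /srv/deployment/scap/scap/bin and that only the two commands bd808 lists as ssh-invoked (sync-common and scap-rebuild-cdbs) need links. `ensure_symlinks` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Iterable, List


def ensure_symlinks(src_dir: str, dest_dir: str, names: Iterable[str]) -> List[str]:
    """Point dest_dir/<name> at src_dir/<name> for each name; return the names changed."""
    changed = []
    for name in names:
        target = Path(src_dir) / name
        link = Path(dest_dir) / name
        if link.is_symlink() and os.readlink(link) == str(target):
            continue  # already correct, nothing to do
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale symlink or a plain file
        link.symlink_to(target)
        changed.append(name)
    return changed
```

Run as root, `ensure_symlinks("/srv/deployment/scap/scap/bin", "/usr/local/bin", ["sync-common", "scap-rebuild-cdbs"])` returns an empty list once the links are already in place. This mirrors how Puppet only reports a change when a resource actually differs.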
[15:58:38] OHHHH
[15:58:46] passwd is not mean to be replaced with ldap passwd
[15:58:49] that IS a database name
[15:58:50] got it
[15:58:54] i was trying the uids database
[16:00:35] RECOVERY - nutcracker process on mw1130 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[16:00:36] RECOVERY - nutcracker port on mw1130 is OK: TCP OK - 0.000 second response time on port 11212
[16:00:47] thanks chasemp, i updated that page to link to the admin README
[16:01:01] hey good idea
[16:01:03] thanks to you
[16:09:26] bd808: mmhh judging from puppet and the timestamps I think it was I98f9fc20
[16:09:39] e.g. mw1020
[16:09:40] -rw-r----- 1 syslog adm 40156150 Aug 15 15:17 /var/log/apache2.log.1
[16:09:43] -rw-r----- 1 syslog adm 0 Aug 17 06:25 /var/log/apache2.log
[16:10:03] you know what could be nice
[16:10:12] if puppet could leave some file metadata behind every time it touches everything
[16:10:21] with the hash of the catalog version or whatever
[16:10:27] s/everything/anything/
[16:11:37] yep that'd be nice auditing
[16:11:38] probably a bit much overhead in some cases
[16:16:01] It sort of does that with the stuff it keeps in /var/lib/puppet/clientbucket but you have to dig through run reports to find the hashes of files that were archived
[16:18:45] true that
[16:20:31] it keeps the replaced files when it does a swap, there are a few libs to restore
[16:20:55] https://github.com/andytinycat/puppet-clientbucket-restore
[16:21:18] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[16:22:28] RECOVERY - Apache HTTP on mw1130 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.205 second response time
[16:22:54] neat. That script brute force searches by reading the 'paths' metadata file for everything in the cache.
[16:23:39] bd808: btw is there a bug already for that apache2.log mishap?
[16:24:24] godog: Nope. I've just been whining about it in SAL
[16:25:15] When I first whined ori made noises like he would look into it, but I think he lost track
[16:25:18] (PS1) Cmjohnson: Revert "Removing mw1130 from dsh files to replace disk and re-install" [operations/puppet] - https://gerrit.wikimedia.org/r/155747
[16:26:45] (CR) Cmjohnson: [C: 2] Revert "Removing mw1130 from dsh files to replace disk and re-install" [operations/puppet] - https://gerrit.wikimedia.org/r/155747 (owner: Cmjohnson)
[16:26:52] oh ok so he's aware
[16:27:36] godog: He at least was aware at some point
[16:30:51] bd808: at the devops kansas city thing last night speaker from elasticsearch used wikipedia en search as the third slide example :)
[16:31:12] saw new marval && logstash stuff, it is sweet
[16:31:40] I need to get logstash and kibana upgraded at some point
[16:31:46] too many things
[16:32:16] Right now I'm wasting time trying to get registered for a damn HSA account
[16:32:19] they finally committed to doing real security groupings for the interface from dashboards down to indexes
[16:32:29] Bank websites are the worst pile of crap evar
[16:32:49] chasemp: Oh? That would be awesome
[16:33:03] I wonder how they will make that happen
[16:33:20] elastic has no idea about auth
[16:33:29] <^d> chasemp: example of a new user of it?
[16:33:32] and kibana is all client side
[16:33:35] <^d> Or example of someone who should use it :p
[16:33:50] I will link to the slides here let me find them
[16:34:02] (CR) Cmjohnson: [C: 2] "I agree we can eliminate this +2 and merging." [operations/puppet] - https://gerrit.wikimedia.org/r/153227 (owner: Dzahn)
[16:35:04] lame, dude said he would add them post-talk but fail
[16:35:40] so not sure on implementation details, only that the es guys have taken on logstash as an in-house thing pretty much and they were showing up new UI using NY state car wreck data which was super neat
[16:35:43] <^d> No worries, I know what our search page looks like :)
[16:35:54] any non-time series data in logstash is interesting to me
[16:35:55] :)
[16:36:09] (PS1) Yurik: Zero: 437-01 https support, unified; clarified analytics [operations/puppet] - https://gerrit.wikimedia.org/r/155748
[16:36:12] bblack, ^
[16:36:13] bd808: Feel like talking about wikitech a bit today? I think you offered to help me get it on the deployment train.
[16:36:38] I did offer. me and my big mount
[16:36:42] *mouth
[16:37:03] andrewbogott: hangout or irc chat or ???
[16:37:09] irc is fine w/me.
[16:37:13] (CR) BBlack: [C: 2] Zero: 437-01 https support, unified; clarified analytics [operations/puppet] - https://gerrit.wikimedia.org/r/155748 (owner: Yurik)
[16:37:24] thx :)
[16:37:40] np
[16:37:52] I don't really know enough about the deployment system to know where to start. I presume that 'deploying' is more complicated than just git fetch; git rebase...
[16:39:20] andrewbogott: it actually boils down to that towards the end. The big thing we need to figure out is how to work wikitech's config into https://github.com/wikimedia/operations-mediawiki-config
[16:40:03] (CR) Legoktm: "Has this been tested for its impact upon things like the parser cache?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154408 (https://bugzilla.wikimedia.org/67709) (owner: Jforrester)
[16:40:21] bd808: does that already have per-host switching, or is there currently only one big config?
[16:45:12] (PS1) Springle: Point m3-slave CNAME to db1048 [operations/dns] - https://gerrit.wikimedia.org/r/155750
[16:45:56] andrewbogott: There is per wiki switching
[16:46:45] it's layered with global settings, per-environment settings, and per-wiki settings
[16:46:53] chasemp: ^ that CNAME doesn't actually affect anything yet, afaics. not in puppet.. care to confirm?
[16:47:09] (PS1) Rangilo Gujarati: Have added two blog feed as requested on Meta here --> https://meta.wikimedia.org/wiki/Planet_Wikimedia [operations/puppet] - https://gerrit.wikimedia.org/r/155752
[16:47:11] Hm, ok. So maybe that's straightforward? I presume there's some system for embedding secret passwords from puppet as well?
[16:47:24] Or should I just import a php file that's not in the confg and make sure puppet creates it?
[16:47:26] nothing is pointed directly to the m3-slave atm
[16:47:30] thanks
[16:47:41] (CR) Springle: [C: 2] Point m3-slave CNAME to db1048 [operations/dns] - https://gerrit.wikimedia.org/r/155750 (owner: Springle)
[16:47:50] (PS1) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [operations/puppet] - https://gerrit.wikimedia.org/r/155753
[16:48:14] andrewbogott: There is a "private" file on tin that passwords go in. See tin:/a/common/private
[16:48:20] (CR) Ori.livneh: Add 'trebuchet' package provider and role. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155603 (https://bugzilla.wikimedia.org/59931) (owner: Ori.livneh)
[16:48:42] bd808: ok, I will read that config repo and bother you further in the afternoon.
[16:48:49] It used to be completely unversioned *shudder* but Sam made it a local repo recently
[16:48:54] Or… in my afternoon. You're in SF?
[16:49:14] Oh, that doesn't derive from puppet? That's kind of bad.
[16:49:15] andrewbogott: I'm GMT-6 and I'll be out this afternoon
[16:49:19] But I can probably hack around it.
[16:49:31] andrewbogott: Yeah no puppet in the cluster deploy process
[16:49:43] GM-6 is CDT right?
[16:49:53] MDT
[16:50:07] SF +1h
[16:50:19] EDT-2h
[16:50:30] timezones are so awful, especially with DST
[16:50:47] they aren't as bad as date math!
[16:50:47] embrace UTC time on your laptop and irc client :)
[16:50:47] Does GMT observe DST?
[16:50:52] So GMT = UTC-1 right now?
[16:50:59] the city of grenwich does, but UTC does not
[16:51:08] DST is +1, not -1
[16:51:10] BST (brittish summer time)
[16:51:11] and UTC and GMT are functionally equivalent for things that matter to us
[16:51:11] Right, but when people say 'GMT'
[16:51:12] dammit
[16:51:27] it's preferably to say UTC rather than GMT today, though
[16:51:43] it's the newer/better standard, but the practical difference are immaterial for most things
[16:51:50] I have my little mac widget set to Rekjavik because it doesn't allow for UTC and that's the only city I could find that doesn't switch per season...
[16:52:02] The difference is something about leap seconds isn't it? UTC never runs backwards
[16:52:07] So GMT != 'what time it is in greenwich'?
[16:52:09] yeah
[16:52:15] there's a wikipedia page or two that explains it all
[16:52:17] If I want a new router in exim4.conf to POST to a mediawiki API using curl, What IP should I give in the command, like curld -d
[16:52:21] andrewbogott: right
[16:52:21] (CR) Filippo Giunchedi: Add 'trebuchet' package provider and role. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155603 (https://bugzilla.wikimedia.org/59931) (owner: Ori.livneh)
[16:52:32] http://en.wikipedia.org/wiki/Coordinated_Universal_Time
[16:52:34] 'key=foo" https://localhost/api.php ?
[16:52:42] goes over the relation between UTC UT0 UT1 GMT etc
[16:52:42] well, anyway, bd808, if that's the place to start, then I will start!
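The exim question above comes down to what `curl -d 'key=foo' https://localhost/api.php` actually sends: a POST whose body is `application/x-www-form-urlencoded`. Below is a minimal Python sketch that builds the equivalent request without sending it. The URL and `key=foo` are the placeholders from the question, not a confirmed WMF endpoint, and `build_form_post` is a hypothetical helper. The log leaves open which host exim should target.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(endpoint: str, fields: dict) -> Request:
    """Build (without sending) the kind of request `curl -d` sends.

    Hypothetical helper for illustration; the endpoint is a placeholder.
    """
    # curl -d uses POST with an application/x-www-form-urlencoded body.
    body = urlencode(fields).encode("ascii")
    req = Request(endpoint, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


if __name__ == "__main__":
    req = build_form_post("https://localhost/api.php", {"key": "foo"})
    print(req.get_method(), req.full_url, req.data)
    # Actually sending it would be urllib.request.urlopen(req),
    # once the right host is known.
```

Unlike `curl -d`, `urlencode` percent-encodes the values. For simple fields like `key=foo`, the resulting body is identical.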
[16:53:20] djb (always ahead of the curve, sometimes by unreasonable amounts) prefers to use TAI for all computer time representations
[16:53:20] And the OSX fix for UTC is `sudo ln -sf /usr/share/zoneinfo/UTC /etc/localtime`
[16:53:55] That'll make my mac's actual clock always read UTC, right? That seems less useful...
[16:54:06] Since on occasion I interact with local humans who use local clocks :)
[16:54:13] bblack: it is then "| tai64nlocal" all the way down
[16:54:15] andrewbogott: I haven't done it myself yet but I keep thinking about it
[16:54:20] :)
[16:54:24] I know what time it is, but what time is it /here/?
[16:54:43] blerg -- tai64nlocal is the devil
[16:54:53] I find since I mostly work on computers with international people and machines that are set to UTC, etc
[16:55:03] it's simpler to keep my machines and IRC and logs and everything in UTC
[16:55:20] if I need to know what time it is here, I'll look at the clock on my microwave, or just look at how bright it is out the window :)
[16:55:36] andrewbogott: I run geektool and have this in the corner of my desktop `printf 'UTC offset: ';date +%z;date -u +'%Y-%m-%dT%H:%MZ'`
[16:55:56] (or subtract some number of hours that changes in the summer for really stupid reasons)
[16:56:03] Makes sense to me, but I think I'm already sufficiently detached from my actual surroundings :)
[16:56:30] And anyway, Rekjavik works fine, it's just weird that OSX doesn't supply a UTC option
[16:57:23] break free from the chains of the man, don't accept his abitrary UTC offsets!
[16:58:40] If I had my way, we'd all use live-updated GPS-local continuous timezones.
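bd808's geektool one-liner prints the local UTC offset and then the current time in UTC. Here is a minimal Python sketch that produces the same two lines. `widget_lines` is a hypothetical name. Passing an explicit `local_tz` makes the output reproducible; with `None` it falls back to the machine's own zone, as `date +%z` does.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def widget_lines(now: datetime, local_tz: Optional[timezone] = None) -> Tuple[str, str]:
    """Mimic: printf 'UTC offset: ';date +%z;date -u +'%Y-%m-%dT%H:%MZ'

    `now` must be timezone-aware; local_tz=None means the system's local zone.
    """
    local = now.astimezone(local_tz)      # like `date +%z`
    utc = now.astimezone(timezone.utc)    # like `date -u`
    return ("UTC offset: " + local.strftime("%z"),
            utc.strftime("%Y-%m-%dT%H:%MZ"))


if __name__ == "__main__":
    mdt = timezone(timedelta(hours=-6))   # bd808's "GMT-6", i.e. MDT
    for line in widget_lines(datetime.now(timezone.utc), mdt):
        print(line)
```

A fixed-offset `timezone` never changes for DST. To get the seasonal shift, you would pass a real zone such as `zoneinfo.ZoneInfo("America/Denver")` instead.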
[16:59:12] http://en.wikipedia.org/wiki/Universal_time#Versions <- is informative as well
[17:00:10] heh that would be awesome, kinda
[17:00:43] except when two people talked over a phone with a 43 minute time difference between them and tried to decide where to meet 31 minutes in another direction :)
[17:00:58] s/where/when/
[17:01:37] on the plus side, though, if you were in a jet going 1,000 mph in the correct direction, your clock would never change time (but the date would jump each time you hit the international dateline)
[17:01:58] Time would always be hh:mm:ddd:mm:ss where the second mm and ss are minutes and seconds of latitude
[17:02:02] It's really not so complicated :p
[17:03:04] yeah but imagine if someone says "let's meet at this place 100 miles away at 10:30 local time at the destination", everyone would have to do some math to figure out when to leave to get there on time
[17:03:05] I guess to account for relativity you'd have to include velocity relative to something… or acceleration? Anyway I'm sure my phone can take care of all this.
[17:03:37] I already tell google navigation where I'm going and when I want to get there, so that'd be the same :)
[17:03:46] heh :)
[17:04:13] it's entirely reasonable, though, for everyone to just use UTC
[17:04:27] you just have to get used to the idea that sunrise might be at 15:00 or whatever
[17:04:48] (well, and that the date would change in the midst of a day)
[17:04:53] Doesn't China do that already, sort of? Everyone uses Beijing time regardless?
[17:05:07] yeah china is one timezone across a pretty broad number of natural hours
[17:05:09] I forget how many
[17:05:47] 5 apparently
[17:05:54] so no http://en.wikipedia.org/wiki/Swatch_Internet_Time ? I am disappoint
[17:06:26] bd808: in this repo, does 'labs' mean 'beta'?
[17:06:41] andrewbogott: yes
[17:06:49] yay
[17:07:19] godog: that actually sounds pretty good.
[17:07:59] andrewbogott: yeah it isn't much different than "everyone! utc!"
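The Swatch Internet Time link above describes a day divided into 1000 "beats" of 86.4 seconds each, counted in Biel Mean Time (a fixed UTC+1 with no DST). Here is a minimal Python sketch of the conversion. `swatch_beats` is a hypothetical name, and fractional beats are dropped.

```python
from datetime import datetime, timedelta, timezone

# Biel Mean Time: a fixed UTC+1 offset that never changes for DST.
BMT = timezone(timedelta(hours=1))


def swatch_beats(moment: datetime) -> int:
    """Return the .beat (0-999) for a timezone-aware datetime."""
    t = moment.astimezone(BMT)
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    # One beat is 86.4 s; integer math avoids float rounding at boundaries.
    return seconds * 10 // 864


if __name__ == "__main__":
    noon_bmt = datetime(2014, 8, 22, 11, 0, tzinfo=timezone.utc)
    print(f"@{swatch_beats(noon_bmt):03d}")  # prints "@500"
```

The offset is fixed, so every location gets the same number at the same instant. That is the "everyone! utc!" idea with a different epoch and unit.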
[17:08:19] It combines all the best features of UTC and unix
[17:08:46] I actually owed a beats watch at some point in '96 or '97
[17:08:55] *owned
[17:09:15] It didn't work out well with all the meat space people I had to interact with
[17:09:46] bd808: so this repo uses a bunch of submodules? I could use a brief guided tour of how this is organized.
[17:10:33] andrewbogott: Sure. https://wikitech.wikimedia.org/wiki/Het_deploy is a reasonable palce to start
[17:10:37] *palce
[17:10:40] *place
[17:11:19] The repo I linked you to is called /a/common/wmf-config in that doc
[17:11:48] er. I guess it's /a/common actually
[17:12:27] and then the release branches are checked out there separately as /a/common/php-1.XwmfY
[17:14:14] (PS8) Ori.livneh: Add 'trebuchet' package provider and role. [operations/puppet] - https://gerrit.wikimedia.org/r/155603 (https://bugzilla.wikimedia.org/59931)
[17:14:23] andrewbogott: Depending on how tricky things are, we may need to invent a wikitech realm to isolate things for you.
[17:14:56] (PS9) Ori.livneh: Add 'trebuchet' package provider and role. [operations/puppet] - https://gerrit.wikimedia.org/r/155603 (https://bugzilla.wikimedia.org/59931)
[17:15:02] (CR) Ori.livneh: [C: 2 V: 2] Add 'trebuchet' package provider and role. [operations/puppet] - https://gerrit.wikimedia.org/r/155603 (https://bugzilla.wikimedia.org/59931) (owner: Ori.livneh)
[17:16:23] andrewbogott: Reedy would be a good resource for both of us on this. He knows way more than I do about the MWMultiVersion aspects. But none of it is rocket surgery
[17:17:08] (PS3) Ori.livneh: Use Trebuchet package provider for RCStream [operations/puppet] - https://gerrit.wikimedia.org/r/155648
[17:17:15] Right now I'm pondering the merits of "Start with a default mw config and add the special things that wikitech needs" vs "import just the exact config that wikitech is already using"
[17:17:22] The latter seems much safer but ugly
[17:17:36] The former will be much more awesome in the long term
[17:17:45] yeah
[17:17:56] Probably possible. Wikitech isn't /that/ different.
[17:18:21] I just hate having to break production while I'm learning
[17:18:27] (CR) Ori.livneh: [C: 2] Use Trebuchet package provider for RCStream [operations/puppet] - https://gerrit.wikimedia.org/r/155648 (owner: Ori.livneh)
[17:19:40] PROBLEM - puppet last run on nickel is CRITICAL: CRITICAL: Epic puppet fail
[17:20:29] (PS1) Mark Bergsma: Remove IPv6 address from fenari [operations/puppet] - https://gerrit.wikimedia.org/r/155758
[17:21:28] PROBLEM - puppet last run on es4 is CRITICAL: CRITICAL: Epic puppet fail
[17:22:07] (PS1) Ori.livneh: Use Trebuchet package provider for scap [operations/puppet] - https://gerrit.wikimedia.org/r/155759
[17:22:44] ori, are you on top of those puppet failures?
[17:22:52] nickel says "Could not autoload package: Could not autoload /var/lib/puppet/lib/puppet/provider/package/trebuchet.rb: no such file to load -- json"
[17:23:11] weird, works elsewhere. nickel is ancient, right?
[17:23:21] maybe it has a version of ruby the requires the json gem?
[17:23:27] PROBLEM - puppet last run on ms1001 is CRITICAL: CRITICAL: Epic puppet fail
[17:24:12] lucid
[17:24:46] ok, i got it
[17:25:02] same thing on es4, also lucid
[17:25:28] PROBLEM - puppet last run on tridge is CRITICAL: CRITICAL: Epic puppet fail
[17:25:35] calling nickel ancient
[17:25:40] you youngsters
[17:26:17] PROBLEM - puppet last run on ms1004 is CRITICAL: CRITICAL: Epic puppet fail
[17:26:27] It's nearly four years old! Four years ago I was…
[17:26:49] it's just passed the stage of "a new box" for me
[17:27:33] andrewbogott: this is a variant of https://tickets.puppetlabs.com/si/jira.issueviews:issue-html/PUP-2164/PUP-2164.html btw
[17:28:29] (PS1) Ori.livneh: Trebuchet package provider: require 'rubygems' [operations/puppet] - https://gerrit.wikimedia.org/r/155760
[17:29:37] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Epic puppet fail
[17:30:08] PROBLEM - puppet last run on nfs1 is CRITICAL: CRITICAL: Epic puppet fail
[17:30:28] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Epic puppet fail
[17:31:34] ah, sorry. fix in a moment. the json situation on 1.8.7 is stupid.
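The nickel/es4 failure (`no such file to load -- json`) came from the Ruby 1.8.7 on those lucid hosts, where JSON is not part of the standard library. According to its title, the change that fixed it, "use PSON, not JSON", switched the provider to Puppet's bundled PSON. The sketch below shows the general pattern as an analogy only, in Python rather than Ruby: try optional modules first and fall back to one that always ships with the runtime. `load_first_available` is a hypothetical helper.

```python
import importlib
from types import ModuleType


def load_first_available(candidates, fallback: str = "json") -> ModuleType:
    """Import the first module in `candidates` that exists, else `fallback`.

    Hypothetical helper: the fallback should be something guaranteed to ship
    with the runtime (here the stdlib json module).
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed on this host; try the next one
    return importlib.import_module(fallback)


if __name__ == "__main__":
    codec = load_first_available(["simplejson"])
    print(codec.__name__, codec.dumps({"package": "rcstream"}))
```

Depending only on what the platform itself bundles, as the PSON change does, removes the need for a fallback path at all.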
[17:32:37] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[17:32:47] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[17:33:36] (PS2) Ori.livneh: Trebuchet package provider: use PSON, not JSON [operations/puppet] - https://gerrit.wikimedia.org/r/155760
[17:34:14] (PS3) Ori.livneh: Trebuchet package provider: use PSON, not JSON [operations/puppet] - https://gerrit.wikimedia.org/r/155760
[17:34:28] PROBLEM - puppet last run on sanger is CRITICAL: CRITICAL: Epic puppet fail
[17:34:30] (CR) Ori.livneh: [C: 2 V: 2] "fixes breakage" [operations/puppet] - https://gerrit.wikimedia.org/r/155760 (owner: Ori.livneh)
[17:36:38] RECOVERY - puppet last run on nickel is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[17:36:47] that fixed it
[17:37:08] RECOVERY - puppet last run on nfs1 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[17:37:24] bd808: Apart from 'labs' I don't actually see any switching based on wiki. Am I missing something important?
[17:37:28] PROBLEM - puppet last run on linne is CRITICAL: CRITICAL: Epic puppet fail
[17:37:28] RECOVERY - puppet last run on tridge is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[17:37:28] RECOVERY - puppet last run on sanger is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[17:38:27] RECOVERY - puppet last run on ms1001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[17:38:29] andrewbogott: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php
[17:39:11] It's a lot of wgConf magic
[17:39:27] RECOVERY - puppet last run on linne is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[17:39:28] RECOVERY - puppet last run on es4 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[17:39:30] andrewbogott: thanks for the ping about that failure
[17:39:44] ori: Sure, thanks for fixing!
[17:40:19] bd808: ok, starting to understand
[17:41:08] andrewbogott: $wgConf is a SiteConfiguration object. It generates the globals based on wiki and does fancy stuff like merging wiki specific bits with cluster wide defaults
[17:41:22] (PS1) Ottomata: Move misc udp2log instance (sqstat) from analytics1003 to analytics1026 [operations/puppet] - https://gerrit.wikimedia.org/r/155762
[17:41:30] <^d> Ugh wgconf.
[17:41:36] which means I have to form roughly 1000 opinions now
[17:41:47] !log nuking /srv/deployment/rcstream on rcs1002 to verify trebuchet package provider reprovisions it
[17:41:52] Logged the message, Master
[17:42:29] ^d: Are you to blame for SiteConfiguration? :P
[17:42:42] <^d> I had nothing to do with that!
[17:42:45] <^d> Not my fault!
[17:42:48] <^d> :)
[17:43:37] <^d> If I had to guess I'd say Tim wrote the initial version.
[17:43:53] !log moving sqstat udp2log filter from analytics1003 to analytics1026, reqstats might blip for a sec...
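bd808's description of `$wgConf` (per-setting values picked by wiki, falling back to cluster-wide defaults) can be pictured with a small Python model. This is a simplified sketch, not MediaWiki's `SiteConfiguration` code. It assumes a wiki-specific entry beats a tag (dblist-style group) entry, which in turn beats `default`, and it ignores the merging behaviour bd808 mentions. The setting names come from this log; the wiki names, tags, and values are made up.

```python
from typing import Any, Dict, Iterable

# Illustrative data only: setting names from the log, everything else invented.
SETTINGS: Dict[str, Dict[str, Any]] = {
    "wmgUseTimeline": {"default": True, "wikitech": False},
    "wmgMemoryLimit": {"default": "256M", "wikipedia": "512M"},
}


def resolve_for_wiki(settings: Dict[str, Dict[str, Any]], wiki: str,
                     tags: Iterable[str] = ()) -> Dict[str, Any]:
    """Pick each setting's value: wiki-specific, then first matching tag, then default."""
    tags = list(tags)
    resolved: Dict[str, Any] = {}
    for name, by_scope in settings.items():
        if wiki in by_scope:
            resolved[name] = by_scope[wiki]
            continue
        for tag in tags:
            if tag in by_scope:
                resolved[name] = by_scope[tag]
                break
        else:  # no tag matched
            if "default" in by_scope:
                resolved[name] = by_scope["default"]
    return resolved


if __name__ == "__main__":
    print(resolve_for_wiki(SETTINGS, "enwiki", tags=["wikipedia"]))
    print(resolve_for_wiki(SETTINGS, "wikitech"))
```

Under this model, wikitech only needs entries for the settings where it differs from the defaults. That matches the "start with a default mw config and add the special things" option discussed earlier.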
[17:43:57] (CR) Ottomata: [C: 2 V: 2] Move misc udp2log instance (sqstat) from analytics1003 to analytics1026 [operations/puppet] - https://gerrit.wikimedia.org/r/155762 (owner: Ottomata)
[17:44:00] Logged the message, Master
[17:44:01] I've had the good sense to quit all the companies where I made horrible software design decisions so I wouldn't have to continue living with them.
[17:44:18] PROBLEM - Recent Changes Stream Python backend on rcs1002 is CRITICAL: CRITICAL: Not all configured rcstream instances are running.
[17:44:37] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[17:44:47] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:44:48] SiteConfig does neat stuff but it's so easy to mess up the config for
[17:45:17] RECOVERY - puppet last run on ms1004 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[17:45:17] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:46:18] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[17:46:18] RECOVERY - Recent Changes Stream Python backend on rcs1002 is OK: OK: All defined rcstream jobs are runnning.
[17:46:40] <^d> bd808: Maybe we can make configuration both *neat* and *easy*
[17:46:48] <^d> :)
[17:46:58] yaml!
[17:48:33] The system I built at $DAYJOB-2 was awesome. All db driven, only static config was db connection information, lots of handy forms and scripts for adding new things. Then I had to set up a second copy from scratch... head esplode
[17:48:38] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[17:48:54] (PS1) Ori.livneh: Use Trebuchet's deploy.checkout call rather than Git [operations/puppet] - https://gerrit.wikimedia.org/r/155765
[17:49:31] (PS2) Ori.livneh: Use Trebuchet's deploy.checkout call rather than Git [operations/puppet] - https://gerrit.wikimedia.org/r/155765
[17:50:17] godog: ^ easy
[17:50:28] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[17:51:09] (CR) BryanDavis: [C: 1] "How did I miss that before?" [operations/puppet] - https://gerrit.wikimedia.org/r/155765 (owner: Ori.livneh)
[17:51:44] (CR) Filippo Giunchedi: [C: 1] Use Trebuchet's deploy.checkout call rather than Git [operations/puppet] - https://gerrit.wikimedia.org/r/155765 (owner: Ori.livneh)
[17:51:47] ori: yup
[17:51:48] thanks
[17:51:55] (CR) Ori.livneh: [C: 2] Use Trebuchet's deploy.checkout call rather than Git [operations/puppet] - https://gerrit.wikimedia.org/r/155765 (owner: Ori.livneh)
[17:52:10] I'm off, have a good weekend!
[17:52:57] godog: you too!
[17:53:02] What is wmgMemoryLimit vs wgMemoryLimit?
[17:53:08] Is one of them a typo?
[17:53:28] woooooooooo
[17:53:30] Notice: /Stage[main]/Rcstream/Package[rcstream]/ensure: ensure changed 'purged' to 'present'
[17:53:30] Info: /Stage[main]/Rcstream/Package[rcstream]: Scheduling refresh of Service[rcstream]
[17:53:30] Notice: /Stage[main]/Rcstream/Service[rcstream]/ensure: ensure changed 'stopped' to 'running'
[17:53:32] \o/
[17:54:28] andrewbogott: wmg* are cluster config globals that will be used to set wm* globals later generally
[17:54:41] ok
[17:54:57] our global poop is poop
[17:55:05] andrewbogott: to answer the question you're not asking: "yes, it's retarded"
[17:55:33] What's the 'm' for?
[17:55:56] my guess was "wikimedia" vs "wiki"
[17:56:23] although maybe "wiki meta" vs "wiki"?
[17:57:02] "wg" stands for wikipedia global, so "wmg" is probably wikimedia global
[17:57:30] I guess there's no real reason for wikitech to not get its static content from bits like everything else does...
[17:57:30] wmg = Wikimedia global, yes
[17:57:44] wmg* vars are usually referenced in CommonSettings.php
[17:57:58] ini_set( 'memory_limit', $wmgMemoryLimit );
[17:58:30] And often act as feature flags there
[17:58:36] if ( $wmgUseTimeline ) {
[17:58:37] * andrewbogott is now reading through every single config option on wikitech to determine which ones are defaults already
[17:58:39] woo
[18:03:05] (PS3) Chad: WIP: Collection of fun bash scripts for managing elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/155679
[18:07:35] (CR) Ori.livneh: [C: 2] Do not define MEDIAWIKI before loading WebStart.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155687 (owner: BryanDavis)
[18:08:33] (Merged) jenkins-bot: Do not define MEDIAWIKI before loading WebStart.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155687 (owner: BryanDavis)
[18:10:41] !log ori updated /a/common to {{Gerrit|I338d72a47}}: Do not define MEDIAWIKI before loading WebStart.php
[18:10:47] Logged the message, Master
[18:11:19] cool. less log spam
[18:23:23] (PS1) Ottomata: kafkatee webstatscollector only needs mobile and text roles [operations/puppet] - https://gerrit.wikimedia.org/r/155769
[18:24:26] (CR) Ottomata: [C: 2 V: 2] kafkatee webstatscollector only needs mobile and text roles [operations/puppet] - https://gerrit.wikimedia.org/r/155769 (owner: Ottomata)
[18:55:32] (PS1) Aaron Schulz: Use a different profile ID for job requests [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155778
[19:06:00] (PS2) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [operations/puppet] - https://gerrit.wikimedia.org/r/155753
[19:08:14] (PS1) Yurik: Zero: Switched 515-03 to ip-based unified [operations/puppet] - https://gerrit.wikimedia.org/r/155783
[19:08:17] bblack, ^
[19:09:19] (CR) BBlack: [C: 2] Zero: Switched 515-03 to ip-based unified [operations/puppet] - https://gerrit.wikimedia.org/r/155783 (owner: Yurik)
[19:18:39] (PS1) Yurik: Zero: Added 515-05 for unified [operations/puppet] - https://gerrit.wikimedia.org/r/155785
[19:19:02] bblack, hold on on ^ for a sec, i think i will add one more ID in tehre
[19:19:17] ok
[19:19:17] (this one is different from what you +2ed 2 min ago)
[19:24:48] (PS1) Ottomata: Send filtered logs to webstats-collector over network [operations/puppet] - https://gerrit.wikimedia.org/r/155788
[19:25:06] (PS2) Yurik: Zero: Added 255-03, 515-05 for unified [operations/puppet] - https://gerrit.wikimedia.org/r/155785
[19:25:17] (PS2) Ottomata: Send filtered logs to webstats-collector over network [operations/puppet] - https://gerrit.wikimedia.org/r/155788
[19:25:21] bblack, ok, added one more. I won't be here next week, that's why adding them a bit ahead of time. ^^
[19:25:22] (CR) Ottomata: [C: 2 V: 2] Send filtered logs to webstats-collector over network [operations/puppet] - https://gerrit.wikimedia.org/r/155788 (owner: Ottomata)
[19:25:35] (PS1) Andrew Bogott: Random stab at getting wikitech config in here. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155789
[19:25:43] (CR) jenkins-bot: [V: -1] Random stab at getting wikitech config in here. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155789 (owner: Andrew Bogott)
[19:26:47] (PS2) Andrew Bogott: Random stab at getting wikitech config in here. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155789
[19:27:15] Reedy, bd808|BUFFER, I welcome your thoughts about the mess that is https://gerrit.wikimedia.org/r/#/c/155789/
[19:27:51] oh, cool!
[19:27:51] (PS3) BBlack: Zero: Added 255-03, 515-05 for unified [operations/puppet] - https://gerrit.wikimedia.org/r/155785 (owner: Yurik)
[19:27:57] (CR) BBlack: [C: 2 V: 2] Zero: Added 255-03, 515-05 for unified [operations/puppet] - https://gerrit.wikimedia.org/r/155785 (owner: Yurik)
[19:34:53] (PS3) Andrew Bogott: Random stab at getting wikitech config in here. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155789
[19:36:25] (CR) Dzahn: "would be nice if you could split this into 2 changes, one just adding the file to repo and one applying the actual hack. that way one can " [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/155732 (https://bugzilla.wikimedia.org/69747) (owner: Aklapper)
[19:42:45] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[19:42:54] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[19:45:17] (CR) Legoktm: Added the bouncehandler router to catch in all bounce emails (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155753 (owner: 01tonythomas)
[19:47:19] (CR) Dzahn: [C: 2] "just changes a comment, but since ferm is used in many places it removes a whole bunch of compiler warnings because of deprecated variable" [operations/puppet] - https://gerrit.wikimedia.org/r/154373 (owner: Dzahn)
[19:48:51] (PS1) Ori.livneh: Trebuchet: call saltutil.{sync_all,refresh_pillars} when setting grains [operations/puppet] - https://gerrit.wikimedia.org/r/155793
[19:50:03] (CR) Ori.livneh: [C: 2] Trebuchet: call saltutil.{sync_all,refresh_pillars} when setting grains [operations/puppet] - https://gerrit.wikimedia.org/r/155793 (owner: Ori.livneh)
[19:55:44] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[19:55:55] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:04:41] hi ori ! any thoughts on where I should find the API in WMF setup ? is there something generic like api.wikimedia.org or sth where I can send in bounce emails from exim running on the mail server ?
[20:04:53] as in https://gerrit.wikimedia.org/r/#/c/155753/2/templates/exim/exim4.conf.SMTP_IMAP_MM.erb
[20:16:09] tonythomas: not sure, really. maybe other folks here know
[20:27:44] (PS1) Ori.livneh: declare hhvm_appservers lvs service ip for labs, too [operations/puppet] - https://gerrit.wikimedia.org/r/155806
[20:28:12] bblack: ^
[20:28:50] (PS2) Ori.livneh: declare hhvm_appservers lvs service ip for labs, too [operations/puppet] - https://gerrit.wikimedia.org/r/155806 (https://bugzilla.wikimedia.org/69921)
[20:32:06] (CR) Ori.livneh: "I cherry-picked this on the Labs puppet master and it fixed bits." [operations/puppet] - https://gerrit.wikimedia.org/r/155806 (https://bugzilla.wikimedia.org/69921) (owner: Ori.livneh)
[20:33:24] (CR) BBlack: [C: 2] declare hhvm_appservers lvs service ip for labs, too [operations/puppet] - https://gerrit.wikimedia.org/r/155806 (https://bugzilla.wikimedia.org/69921) (owner: Ori.livneh)
[20:33:52] (CR) Andrew Bogott: [C: 2] wikitech - use ssl_ciphersuite to add HSTS [operations/puppet] - https://gerrit.wikimedia.org/r/154368 (owner: Chmarkine)
[20:33:52] thanks bblack
[20:34:13] ori: re POST, I still haven't really managed to logic out why it's happening. I think varnish is doing it intentionally, but when I read the VCL I don't see why. The "obvious" req.backend= clause for POST shouldn't apply when the hhvm backend decision is being made.
[20:34:18] np
[20:34:41] (CR) Dzahn: "13 'labs' => $::site ? {" [operations/puppet] - https://gerrit.wikimedia.org/r/155806 (https://bugzilla.wikimedia.org/69921) (owner: Ori.livneh)
[20:36:08] bblack: are we sure that it really is consistently proxying POSTs to zend? is it possible that instead the backend was marked as sick when we were looking?
[20:36:38] it seemed consistent when I tried it several times, while my GETs were going to hhvm
[20:37:04] hmmm.
[20:37:14] the POST (and CentralAuth) -specific clause that messes with req.backend= seems too close to not be part of the problem. But, again, it doesn't seem to be the problem.
[20:37:40] (CR) Andrew Bogott: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/154368 (owner: Chmarkine)
[20:37:54] (because on text-backend in tier1, where we end up setting req.backend=hhvm_appservers, that bit about req.backend=backend_random doesn't apply)
[20:39:38] but when you trace it with varnishlog on the final eqiad backend for the request, it doesn't log any kind of failure or anything, it just goes straight to 10.2.2.1 like it was told to do that by VCL somehow
[20:42:55] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/247/change/153976/diff/fenari.wikimedia.org.diff.formatted" [operations/puppet] - https://gerrit.wikimedia.org/r/153976 (owner: Dzahn)
[20:49:31] bblack: is something incrementing req.restarts perhaps?
[20:49:55] I don't *think* so, that should only happen on backend fail
[20:50:03] but since I don't know what's going on yet, who knows :)
[20:56:18] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/248/change/153849/diff/ytterbium.wikimedia.org.diff.formatted" [operations/puppet] - https://gerrit.wikimedia.org/r/153849 (owner: Dzahn)
[20:57:53] (CR) Ori.livneh: gerrit - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153849 (owner: Dzahn)
[20:58:06] (CR) Dzahn: [C: 2] "ah thanks, yea, wikiedu.org obviously fine, and the other feed also looks relevant" [operations/puppet] - https://gerrit.wikimedia.org/r/155752 (owner: Rangilo Gujarati)
[20:59:27] (CR) Dzahn: [V: 2] "manual verify neeed" [operations/puppet] - https://gerrit.wikimedia.org/r/155752 (owner: Rangilo Gujarati)
[21:04:31] (CR) Dzahn: "thanks Rangilo, ran puppet and a feed update on en.planet" [operations/puppet] - https://gerrit.wikimedia.org/r/155752 (owner: Rangilo Gujarati)
[21:12:47] (PS1) Dzahn: en.planet - some feed URLs have moved [operations/puppet] - https://gerrit.wikimedia.org/r/155820
[21:17:54] hmm, so if there is a gerrit group "planet" and a person "planetenxin", it seems impossible to add the group as a reviewer, it always expands to the user
[21:19:30] renaming groups works though, so i'm good :)
[21:20:03] (CR) Dzahn: [C: 2] en.planet - some feed URLs have moved [operations/puppet] - https://gerrit.wikimedia.org/r/155820 (owner: Dzahn)
[21:23:15] (CR) Dzahn: gerrit - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153849 (owner: Dzahn)
[21:26:54] (CR) Odder: "Please don't copy me onto future patch sets. Thank you!" [operations/puppet] - https://gerrit.wikimedia.org/r/155820 (owner: Dzahn)
[21:27:14] (CR) Dzahn: remove HTTPS config from gitblit template (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/154973 (owner: Dzahn)
[21:28:43] (CR) Dzahn: "you'll have to be deleted from the gerrit group.. sigh" [operations/puppet] - https://gerrit.wikimedia.org/r/155820 (owner: Dzahn)
[21:31:27] (CR) Dzahn: [C: -2] "http://puppet-compiler.wmflabs.org/249/change/153986/html/" [operations/puppet] - https://gerrit.wikimedia.org/r/153986 (owner: Dzahn)
[21:31:45] (PS1) BBlack: allow POST to use hhvm [operations/puppet] - https://gerrit.wikimedia.org/r/155824
[21:32:12] ori: ^
[21:33:18] (Abandoned) Dzahn: limn - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153961 (owner: Dzahn)
[21:33:34] eh
[21:34:03] (PS2) Ori.livneh: Allow POST to use HHVM [operations/puppet] - https://gerrit.wikimedia.org/r/155824 (owner: BBlack)
[21:34:20] bblack: (amended the commit message for posterity)
[21:34:29] (CR) Ori.livneh: [C: 1] Allow POST to use HHVM [operations/puppet] - https://gerrit.wikimedia.org/r/155824 (owner: BBlack)
[21:34:33] (CR) BBlack: [C: 2] Allow POST to use HHVM [operations/puppet] - https://gerrit.wikimedia.org/r/155824 (owner: BBlack)
[21:34:43] (CR) BBlack: [V: 2] Allow POST to use HHVM [operations/puppet] - https://gerrit.wikimedia.org/r/155824 (owner: BBlack)
[21:36:18] (CR) Dzahn: "sorry, gave up, i just wanted to fix apache::site no the entire db setup of instance proxy" [operations/puppet] - https://gerrit.wikimedia.org/r/153961 (owner: Dzahn)
[21:39:58] (CR) Dzahn: [C: -2] "no, definitely not intended to touch submodules, it's just super annoying that it happens all the time :P" [operations/puppet] - https://gerrit.wikimedia.org/r/154371 (owner: Dzahn)
[21:40:20] (PS3) Dzahn: exim templates - deprecated variable syntax [operations/puppet] - https://gerrit.wikimedia.org/r/154371
[21:41:09] (CR) Dzahn: exim templates - deprecated variable syntax [operations/puppet] - https://gerrit.wikimedia.org/r/154371 (owner: Dzahn)
[21:43:34] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/250/change/153843/html/" [operations/puppet] - https://gerrit.wikimedia.org/r/153843 (owner: Dzahn)
[21:45:28] (CR) Dzahn: "needs manual rebase" [operations/puppet] - https://gerrit.wikimedia.org/r/153987 (owner: Dzahn)
[21:45:54] (PS4) Dzahn: puppetmaster - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153986
[21:51:41] (PS2) Dzahn: puppetmaster Apache template - retab [operations/puppet] - https://gerrit.wikimedia.org/r/153987
[21:53:00] (CR) Dzahn: ishmael behind varnish, make neon a backend [operations/puppet] - https://gerrit.wikimedia.org/r/154969 (owner: Dzahn)
[21:56:11] (PS1) Dzahn: gitblit apache template - retab [operations/puppet] - https://gerrit.wikimedia.org/r/155829
[22:02:16] (CR) Dzahn: [C: -2] "http://memegenerator.net/instance/53698370" [operations/puppet] - https://gerrit.wikimedia.org/r/155829 (owner: Dzahn)
[22:05:13] (PS2) Dzahn: gitblit apache template - retab [operations/puppet] - https://gerrit.wikimedia.org/r/155829
[22:06:17] (PS3) Dzahn: gitblit apache template - retab [operations/puppet] - https://gerrit.wikimedia.org/r/155829
[22:06:56]
(03CR) 10Dzahn: remove HTTPS config from gitblit template (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/154973 (owner: 10Dzahn) [22:11:02] (03PS5) 10Dzahn: Remove brion from the "dataset-admin" group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153034 (owner: 10Hoo man) [22:11:56] (03CR) 10Dzahn: "how did it touch that last line as well?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153034 (owner: 10Hoo man) [22:16:10] (03CR) 10Ori.livneh: [C: 032] Use Trebuchet package provider for scap [operations/puppet] - 10https://gerrit.wikimedia.org/r/155759 (owner: 10Ori.livneh) [22:17:48] (03CR) 10Dzahn: [C: 032] "i'm merging it as a manual maintenance script to schedule downtimes on icinga from shell, it's not used by anything automatic though, we m" [operations/puppet] - 10https://gerrit.wikimedia.org/r/144839 (owner: 10Dzahn) [22:18:22] (03CR) 10Tim Landscheidt: ""No newline at end of file" (original)?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153034 (owner: 10Hoo man) [22:20:07] (03PS6) 10Dzahn: Remove brion from the "dataset-admin" group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153034 (owner: 10Hoo man) [22:20:34] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 1 failures [22:20:41] that's me [22:20:42] fix in a sec [22:20:44] PROBLEM - puppet last run on mw1041 is CRITICAL: CRITICAL: Puppet has 1 failures [22:20:46] (03CR) 10Dzahn: "+3 -2 ? 
oh come on :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153034 (owner: 10Hoo man) [22:20:48] (03PS1) 10Ori.livneh: Trebuchet provider: avoid prepend (Ruby 1.9+) [operations/puppet] - 10https://gerrit.wikimedia.org/r/155835 [22:20:54] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 1 failures [22:20:54] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 1 failures [22:20:54] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:03] (03CR) 10Ori.livneh: [C: 032 V: 032] Trebuchet provider: avoid prepend (Ruby 1.9+) [operations/puppet] - 10https://gerrit.wikimedia.org/r/155835 (owner: 10Ori.livneh) [22:21:14] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:15] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:15] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:17] (03PS7) 10Dzahn: Remove brion from the "dataset-admin" group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153034 (owner: 10Hoo man) [22:21:24] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:24] PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:24] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:25] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:25] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:45] these will go away in a moment [22:21:54] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:55] PROBLEM - puppet last run on mw1096 is CRITICAL: CRITICAL: Puppet has 1 failures [22:22:02] killing bot [22:22:04] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: Puppet has 1 failures [22:22:05] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet 
has 1 failures [22:22:08] thanks [22:22:20] np, it will come back next run [22:22:39] yeah they're recovering now [22:27:13] (03PS8) 10Dzahn: Remove brion from the "dataset-admin" group [operations/puppet] - 10https://gerrit.wikimedia.org/r/153034 (owner: 10Hoo man) [22:29:10] (03PS1) 10Ori.livneh: salt::grain: fix 'contains' check [operations/puppet] - 10https://gerrit.wikimedia.org/r/155838 [22:29:54] (03CR) 10Ori.livneh: [C: 032 V: 032] salt::grain: fix 'contains' check [operations/puppet] - 10https://gerrit.wikimedia.org/r/155838 (owner: 10Ori.livneh) [22:35:57] (03CR) 10Dzahn: "this would still be a difference between beta and prod now.. we should please get them closer together again.." [operations/puppet] - 10https://gerrit.wikimedia.org/r/152943 (owner: 10BryanDavis) [22:36:25] (03CR) 10Dzahn: "reverts an apache change that breaks beta" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152943 (owner: 10BryanDavis) [22:37:16] (03CR) 10Dzahn: "removing myself, -> apergos" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [22:39:18] (03CR) 10Dzahn: [C: 031] "-> ops on duty :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/122621 (owner: 10Reedy) [22:39:38] (03CR) 10Dzahn: "are you still going to use this?" 
[operations/dns] - 10https://gerrit.wikimedia.org/r/147168 (https://bugzilla.wikimedia.org/68769) (owner: 10Scottlee) [22:40:24] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:40:24] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [22:40:25] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:40:37] (03CR) 10Dzahn: "no idea, it has negative reviews, i was just on it because at one point this has caused fatals" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/150301 (https://bugzilla.wikimedia.org/68815) (owner: 10Reedy) [22:40:44] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [22:40:55] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [22:40:55] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [22:40:55] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:41:14] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:41:15] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:41:15] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [22:41:15] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:41:15] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [22:41:25] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 
1 seconds ago with 0 failures [22:41:44] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:41:45] (03CR) 10Dzahn: "removing self" [operations/dns] - 10https://gerrit.wikimedia.org/r/115093 (owner: 10coren) [22:41:54] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [22:42:04] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [22:42:14] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:42:14] RECOVERY - puppet last run on mw1005 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:42:25] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:42:25] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [22:42:25] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [22:42:34] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [22:42:44] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:42:55] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [22:43:05] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [22:43:15] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [22:43:25] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:43:44] RECOVERY - puppet last run on searchidx1001 is 
OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:44:00] (03CR) 10Dzahn: "i guess it woul make sense for this to get a review from analytics" [operations/puppet] - 10https://gerrit.wikimedia.org/r/123601 (owner: 10Matanya) [22:44:41] (03CR) 10Dzahn: "no clue if we still care at this point, probably we just focues on mapping to phab" [operations/puppet] - 10https://gerrit.wikimedia.org/r/80577 (owner: 10Faidon Liambotis) [22:47:04] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:54] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [22:48:19] (03CR) 10Dzahn: "let's get back to this, why exactly did you vote it down again? are you suggesting we don't use ChainFile ?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111387 (owner: 10Jeremyb) [22:49:20] (03CR) 10Dzahn: "bump" [operations/puppet] - 10https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) (owner: 10BryanDavis) [22:50:26] (03CR) 10Dzahn: "Ariel, i think this needs you because those files weren't actually installed by puppet or something" [operations/puppet] - 10https://gerrit.wikimedia.org/r/144640 (owner: 10ArielGlenn) [22:51:43] (03CR) 10Dzahn: "who wants to deploy ?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147488 (owner: 10Reedy) [22:53:14] (03CR) 10Dzahn: "jeremyb, this is meanwhile in the wrong repo, somebody might want to create a new patch because it's now part of the mediawiki module" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/24407 (owner: 10Jeremyb) [22:53:48] (03CR) 10Dzahn: "link apache change?" 
[operations/dns] - 10https://gerrit.wikimedia.org/r/143086 (owner: 10Reedy) [22:54:34] (03CR) 10Dzahn: "still needs adjusting on Apache refactoring" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130296 (owner: 10ArielGlenn) [22:54:56] (03CR) 10Dzahn: [C: 04-2] keep two weeks of apache logs instead of a year [operations/puppet] - 10https://gerrit.wikimedia.org/r/130296 (owner: 10ArielGlenn) [22:56:10] (03CR) 10Dzahn: [C: 04-2] "meanwhile wrong repo, don't know if this should be an entirely new patch after Apache stuff has been moved" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 (owner: 10BryanDavis) [22:56:46] (03CR) 10Dzahn: "hashar, has the zuul refactoring been going on meanwhile?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/144503 (owner: 10Matanya) [22:58:25] (03CR) 10Dzahn: "a role class within a module seems odd/wrong" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117698 (owner: 10Matanya) [22:58:33] (03CR) 10Hashar: [C: 04-1] "Yeah the Zuul manifests have been heavily refactored to be role based (server/cloner/merger) instead of realm based (prod/labs)." [operations/puppet] - 10https://gerrit.wikimedia.org/r/144503 (owner: 10Matanya) [22:58:51] (03CR) 10Dzahn: "akosiaris, welcome back from vacation, heh" [operations/puppet] - 10https://gerrit.wikimedia.org/r/108498 (owner: 10Matanya) [22:59:15] mutante: sleeping now. Have a good weekend :] [22:59:46] (03CR) 10Dzahn: [C: 04-1] check-raid syntax fixes, check all raids on system [operations/puppet] - 10https://gerrit.wikimedia.org/r/145018 (owner: 10ArielGlenn) [23:00:25] (03CR) 10Dzahn: "was it intended to also make a change to the favicon stuff here?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/147487 (owner: 10Reedy) [23:00:52] hasharMaybeWMF: have a nice weekend [23:01:01] * mutante likes MaybeWMF [23:01:34] (03CR) 10Dzahn: "i don't know, i still can only say that it's somehow cool and evil at the same time, needs platform" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 (owner: 10Reedy) [23:02:37] (03CR) 10Dzahn: [C: 031] "per "probably not being used at all"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117673 (owner: 10Matanya) [23:07:53] (03CR) 10Dzahn: "_might_ be the same that Change-Id: I359299e7e19679e16c fixes" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152943 (owner: 10BryanDavis) [23:08:17] (03CR) 10Dzahn: "also see Change-Id: I6f9b7ab038aba094" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154401 (owner: 10Hashar) [23:09:33] (03CR) 10Dzahn: [C: 04-1] "yep, please create a ticket" [operations/puppet] - 10https://gerrit.wikimedia.org/r/155137 (owner: 10Yuvipanda) [23:11:09] (03CR) 10Dzahn: "seemed legitimate to me to ask for revert when it causes duplicate definition" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154329 (https://bugzilla.wikimedia.org/69590) (owner: 10Hashar) [23:14:06] (03CR) 10Dzahn: Added the bouncehandler router to catch in all bounce emails (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/155753 (owner: 1001tonythomas) [23:16:53] (03CR) 10Dzahn: "this is a dependency for the quarry change, but seems unrelated?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153600 (owner: 10Yuvipanda) [23:17:53] (03CR) 10Dzahn: androidsdk: Make sure that JDK is present (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/153600 (owner: 10Yuvipanda) [23:18:52] (03CR) 10Dzahn: "no more ServerAdmin setting?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/154027 (owner: 10Ori.livneh) [23:19:23] (03CR) 10Ori.livneh: "modules/apache/files/defaults.conf:14:ServerAdmin webmaster@wikimedia.org" [operations/puppet] - 10https://gerrit.wikimedia.org/r/154027 (owner: 10Ori.livneh) [23:20:46] mutante: (that's included by default wherever the apache module is used) [23:22:38] ori: gotcha, ok [23:23:38] (03CR) 10Dzahn: [C: 031] ""aims to be a generic syntax highlighter for general use "" [operations/puppet] - 10https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: 10Hedonil) [23:23:48] (03PS4) 10Dzahn: tools: Install package python-pygments [operations/puppet] - 10https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: 10Hedonil) [23:25:27] (03CR) 10Dzahn: "this is just adding files to icinga, not an actual icing service, is that going to be a separate change?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136095 (owner: 10Christopher Johnson (WMDE)) [23:28:54] (03CR) 10Dzahn: [C: 04-1] public_html directory service, see RT #6862 (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149890 (owner: 10ArielGlenn) [23:41:54] PROBLEM - puppet last run on tarin is CRITICAL: CRITICAL: Epic puppet fail [23:47:53] icinga-wm: disagree [23:47:54] RECOVERY - puppet last run on tarin is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [23:48:04] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[23:48:55] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[23:48:55] (CR) Dzahn: [C: 2] "it's just a syntax highlighter" [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[23:56:58] (Abandoned) BryanDavis: Send Vary header on http to https redirect [operations/apache-config] - https://gerrit.wikimedia.org/r/111925 (owner: BryanDavis)